Liberty BASIC Community Forum
Liberty BASIC Programming Discussions >> Novice >> finding the min distance between 2 sets of points

finding the min distance between 2 sets of points
Post by blessed2b3rd on Sep 24th, 2015, 10:42am

Hello again,

Firstly thank you for all the very helpful solutions you have all provided. Its been nine months since my coding conception and although the curve has been steep its been an oddly pleasing balance of frustration and immense satisfaction.

I have a new issue. I have two sets of points in two separate .csv files. Lets call them set A and set B. Both sets of points are defined by XYZ coordinates. Say set A has 1000 points and set B has 50000. What I want to do is cycle through each point in set A and find which point in set B is closest.

Im figuring the logic is this:

Take the first point in set A and using the X value, cycle through every X value in set B. By minus-ing each value in set B from each value in set A I can then figure which value is smallest, thus being closest on the X plane. Doing the same for each Y value will do the same for the Y plane. Adding the distances of the X value and Y value for each point will give me an overall value and thus filtering the smallest will tell me which points are closest. I am hoping to end up with a third .csv file listing the compete set A with the closest 1000 set B points next to them.

To speed up the code I use only points that are within the range of set A so as not to cycle through ach and every 50000 set B points.

I realize this may be difficult to visualize so i'll attempt a analogy. Picture a cargo net hanging from its four corners from four posts. The cargo net has 50000 thread cross-over points where there is 1000 tennis balls hanging and tied via a piece of string to a thread cross-over at randomly chosen spots. I want to find which cross over point is closest to each tennis ball.

I'm probably making it worse but here is the logic code I've come up with so far but I'm getting stuck in the loops. Can anyone help?

Set A = CSVData
Set B = CSVData1

'Scan Co-ords for max and min values

Xmin = 100000000000
Xmax = -100000000000
Ymin = 100000000000
Ymax = -100000000000

for i = 1 to countdata

X = CSVData(i,nX)
Y = CSVData(i,nY)

    if CSVData(i,nX) < Xmin then Xmin = X
    if CSVData(i,nX) > Xmax then Xmax = X
    if CSVData(i,nY) < Ymin then Ymin = Y
    if CSVData(i,nY) > Ymax then Ymax = Y
next i

print Xmin;" "; Xmax;" "; Ymin;" "; Ymax

PXmin = 100000000000
PXmax = -100000000000
PYmin = 100000000000
PYmax = -100000000000

for i = 1 to countdata

PX = CSVData1(i,nPitX)
PY = CSVData1(i,nPitY)

    if CSVData1(i,nPitX) < PXmin then PXmin = PX
    if CSVData1(i,nPitX) > PXmax then PXmax = PX
    if CSVData1(i,nPitY) < PYmin then PYmin = PY
    if CSVData1(i,nPitY) > PYmax then PYmax = PY
next i

print PXmin;" "; PXmax;" "; PYmin;" "; PYmax

'Cycle through points

MindistX = 999999999
MindistY = 999999999

for i = 1 to countdata
    for j = 1 to countdata

DX = CSVData1(i,nMinDistX)
DY = CSVData1(i,nMinDistY)

        if CSVData(i,nX) > PXmin and CSVData(i,nX) < PXmax then
            distX = CSVData(i,nX) - CSVData1(j,nPitX)
        end if
            if CSVData(i,ny) > Pymin and CSVData(i,nY) < PYmax then
                distY = CSVData(i,nY) - CSVData1(j,nPitY)
            end if
                if distY < MindistY then MindistY = CSVData1(i,nMinDistY)
                end if
        if distX < MindistX then MindistX = CSVData1(i,nMinDistX)
        end if
    next j
next i

I can provide the entire code where I import all the points and provide the two sets of points but the files are large and you guys seem to come up with a clickety-click method to ensure the code works on a smaller set first.
Re: finding the min distance between 2 sets of poi
Post by tenochtitlanuk on Sep 24th, 2015, 4:18pm

The following MAY be the kind of thing you're wanting.
I ASSUME the z values are irrelevant.
The data is held in a CSV string for convenience and compatibility with files.

The graphic representation shows the two overlapping data sets.
NB I'm using much smaller sets so i can see results in a few seconds. Haven't checked how long it takes for sets of 1000 and 50000!! ( ?about one day??)

I generate the two datasets internally- the 'randomize' means I get the same data each time for test purposes.
I use pythagoras to find separations ( actually use sepn^2 to save time.) then re-plot.
The new file appends nearest neighbour data to each of the first dataset's terms. Easily saved to a file.

Is this the kind of thing you are after??

'   nomainwin

randomize( 0.666)  '   seed PRNGenerator

WindowWidth  =600
WindowHeight =600

'                                       Generate two data sets to play with....
s1 =100 '1000
s2 =300 '50000

dim dataSet1$( s1)    '   string holds x y and z as csv string
dim dataSet2$( s2)
dim dataSet3$( s1)    '   will add the closest from set2 to set1

open "Display" for graphics_nsb_nf as #wg

#wg "trapclose [quit]"
#wg "down ; fill 50 50 50 ; size 2"
'                                       Show them on screen....
#wg "color red"
for i =1 to s1
    x =50 +int( 400 *rnd( 1)): y =50 +int( 400 *rnd( 1)): z =int( 256 *rnd( 1))
    dataSet1$( i) =str$( x) +"," +str$( y) +"," +str$( z)
    #wg "set "; 50 +x; " ";50 +y
next i

#wg "color cyan"
for i =1 to s2
    x =int( 500 *rnd( 1)): y =int( 500 *rnd( 1)): z =int( 256 *rnd( 1))
    dataSet2$( i) =str$( x) +"," +str$( y) +"," +str$( z)
    #wg "set "; 50 +x; " "; 50 +y
next i

'                                                   Go through all data in set1 and find closest in set2
for i =1 to s1
    sepnSqd =1E12
    Jn       =0
    for j =1 to s2
        dX =val( word$( dataSet1$( i), 1, ",")) -val( word$( dataSet2$( j), 1, ","))
        dY =val( word$( dataSet1$( i), 2, ",")) -val( word$( dataSet2$( j), 2, ","))
        dSqd =dX^2 +dY^2
        if dSqd <sepnSqd then sepnSqd =dSqd: Jn =j   '   Jn ends pointing to closest in set 2
    next j
    '                                               Create new dataset3
    dataSet3$( i) =dataSet1$( i) +"," +dataSet2$( Jn)
next i

'                                                   Clear screen and show only set1 and nearest neighbours
#wg "fill darkblue"

for i =1 to s1
    print dataSet3$( i)
    xS =int( val( word$( dataSet3$( i), 1, ",")))
    yS =int( val( word$( dataSet3$( i), 2, ",")))
    #wg "color red  ; set "; 50 +xS; " "; 50 +yS

    xS =int( val( word$( dataSet3$( i), 4, ",")))
    yS =int( val( word$( dataSet3$( i), 5, ",")))
    #wg "color cyan ; set "; 50 +xS; " "; 50 +yS
next i


close #wg

Sounds like you find programming as addictive and satisfying as I do!
Re: finding the min distance between 2 sets of poi
Post by AndyAmaya on Sep 24th, 2015, 4:26pm

I would suggest using the distance formula

Don't take the square root to get the actual distance (it will take too much time). Just leave the values squared and compare squared values.

Let's say you want to find the squared distance between two 3D points (x1,y1,z1 and x2,y2,z2).

xd = (x2-x1)
yd = (y2-y1)
zd = (y2-z1)

distSquared = xd*xd + yd*yd + zd*zd

Save the distSquared and the record numbers of the A and B files in a comma separated value (.CSV) file.

Load the .csv file into Excel or Libre Office spreadsheet and sort by the distSquared column.

The smallest distances will be at the top and you will see the associated A and B record numbers in the second and third column.

That's the best I can come up with to ease your pain smiley

Good Luck!

Re: finding the min distance between 2 sets of poi
Post by tsh73 on Sep 25th, 2015, 03:56am

Here is it, simplest way (brute force), finding nearest from 50000 B points, to 10 A points. That is, 1/100 of the actual task.

On my machine, it takes 12 seconts

12sec*100 make 20 hours.
EDIT stupid me. 20 minutes. I sure can wait 20 minutes - just go for some coffee.

Now, I guess that something like C/C# is some 100x times faster so it will be under 1 minute.

So the obvious choise is - recode it in a faster language
or implement some smart algorythm.
Like may be Nearest neighbour search on K-d_tree

'Say set A has 1000 points and set B has 50000.
'What I want to do is cycle through each point in set A
'and find which point in set B is closest.

dim A.x(numA),A.y(numA),A.z(numA)
dim AtoB(numA)
dim B.x(numB),B.y(numB),B.z(numB)

'generate randomly
print "generate randomly"
randomize 0.5   'repeatable "randomly"
rangeMin = 1000: rangeMax = 10000
for i = 1 to numA
    A.x(i) = rangeMin + rnd(1)*(rangeMax - rangeMin)
    A.y(i) = rangeMin + rnd(1)*(rangeMax - rangeMin)
    A.z(i) = rangeMin + rnd(1)*(rangeMax - rangeMin)
    'print i, A.x(i), A.y(i), A.z(i)
for i = 1 to numB
    B.x(i) = rangeMin + rnd(1)*(rangeMax - rangeMin)
    B.y(i) = rangeMin + rnd(1)*(rangeMax - rangeMin)
    B.z(i) = rangeMin + rnd(1)*(rangeMax - rangeMin)

print "Finding nearest point "
for i = 1 to 10'numA
    Ax=A.x(i): Ay=A.y(i): Az=A.z(i)
    for j = 1 to numB
        'Time taken, s 13.281 for 10x50000
        'Time taken, s 11.781 then cashing Ax Ay Az
        'print j, d, minD, minJ
        if d < minD then minD=d: minJ = j
print "Time taken, s "; (t1-t0)/1000
print "over"

for i = 1 to 20'numA
    print i, AtoB(i)

Re: finding the min distance between 2 sets of poi
Post by tsh73 on Sep 25th, 2015, 05:10am

>>recode it in a faster language
Sometimes one just *need* another language. And to object it is sheer badass stubborness.

Here same code converted to MS VC# Express 2010
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
    class Program
        static void Main(string[] args)
            int numA=1000;
            int numB=50000;
            int[] Ax=new int[numA];
            int[] Ay=new int[numA];
            int[] Az=new int[numA];
            int[] AtoB=new int[numA];

            int[] Bx=new int[numB];
            int[] By=new int[numB];
            int[] Bz=new int[numB];
            Console.WriteLine("generate randomly");
            Random r = new Random(1);   //'repeatable "randomly"
            int rangeMin = 1000,  rangeMax = 10000;
            for (int i = 0; i < numA; i++)
			    Ax[i]=r.Next(rangeMin,  rangeMax );
			    Ay[i]=r.Next(rangeMin,  rangeMax );
			    Az[i]=r.Next(rangeMin,  rangeMax );
            for (int i = 0; i < numB; i++)
			    Bx[i]=r.Next(rangeMin,  rangeMax );
			    By[i]=r.Next(rangeMin,  rangeMax );
			    Bz[i]=r.Next(rangeMin,  rangeMax );

            Console.WriteLine("Finding nearest point");
            DateTime t0 = DateTime.Now;

            double minD, d;
            int minJ=-1;
            for (int i = 0; i < numA; i++)
			    for (int j = 0; j < numB; j++)
                    if (d < minD)
		                 minD=d; minJ = j;

            DateTime t1 = DateTime.Now;

            Console.WriteLine("Time taken: {0}", t1 - t0);
            for (int i = 0; i < 20; i++)
                Console.WriteLine("{0} {1}", i, AtoB[i]);

generate randomly
Finding nearest point
Time taken: 00:00:02.9687690
0 48171
1 36077
2 45538
3 32153
4 11737
5 32412
6 32864
7 1619
8 8320
9 24538
10 35109
11 31782
12 15209
13 10501
14 40535
15 40947
16 40871
17 12194
18 43724
19 26686

That's 3 seconds for *whole* task. 3 seconds compared to 20 hours. Oh boy. I should miscalculate something...
Quick check:
12 seconds * 100 is 1200 seconds is 20 MINUTES!!!

Re: finding the min distance between 2 sets of poi
Post by tenochtitlanuk on Sep 25th, 2015, 06:21am

Agree about language choice- we say in UK 'Horses for courses', ie in a horse race the particular horse to bet on varies depending on the particular place they are running.
Anyway, here's my prog taking 1min to display 50 x 10000 points.
User Image
EDIT full 1000x50000 took under two hours...
Re: finding the min distance between 2 sets of poi
Post by blessed2b3rd on Oct 15th, 2015, 5:53pm

Hi All,

Thanks for the replies. Sorry its taken so long to get back to this but a mix of travel for work and the Rugby World Cup things sorta took a sidetrack.

The formula and technique provided by tsh73 seems to be the most straightforward for my novice needs. I can write the code to figure out what is the shortest distance but how do I write the culpable X and Y coordinates to a new csv. The writing to the .csv file isn't the issue its identifying the X and Y values that were used to figure out the minimum distance so I can then write these back to the corresponding set A. Here is my code:

MinD = 999999999

for i = 1 to countdata
    for j = 1 to countdata

        if CSVData(i,nX) > PXmin and CSVData(i,nX) < PXmax and CSVData(i,ny) > Pymin and CSVData(i,nY) < PYmax then
            d = (CSVData(i,nX) - CSVData1(j,nPitX))^2 + (CSVData(i,ny) - CSVData1(j,nPitY))^2
                if d < MinD then MinD = d
                end if
        end if
    next j
next i

For each point in set A there is a MinD. How do I write the XYs from set B that gave me the MinD?
Re: finding the min distance between 2 sets of poi
Post by Rod on Oct 16th, 2015, 01:42am

At the point you decide if minD is to be overwritten then also store minX and minY. Then as you exit the for next loop write minD minX and minY to your csv file.