Author 
Topic: finding the min distance between 2 sets of points (Read 389 times) 

blessed2b3rd
Junior Member
member is offline
Posts: 65


finding the min distance between 2 sets of points
« Thread started on: Sep 24^{th}, 2015, 10:42am » 

Hello again,
Firstly thank you for all the very helpful solutions you have all provided. Its been nine months since my coding conception and although the curve has been steep its been an oddly pleasing balance of frustration and immense satisfaction.
I have a new issue. I have two sets of points in two separate .csv files. Lets call them set A and set B. Both sets of points are defined by XYZ coordinates. Say set A has 1000 points and set B has 50000. What I want to do is cycle through each point in set A and find which point in set B is closest.
Im figuring the logic is this:
Take the first point in set A and using the X value, cycle through every X value in set B. By minusing each value in set B from each value in set A I can then figure which value is smallest, thus being closest on the X plane. Doing the same for each Y value will do the same for the Y plane. Adding the distances of the X value and Y value for each point will give me an overall value and thus filtering the smallest will tell me which points are closest. I am hoping to end up with a third .csv file listing the compete set A with the closest 1000 set B points next to them.
To speed up the code I use only points that are within the range of set A so as not to cycle through ach and every 50000 set B points.
I realize this may be difficult to visualize so i'll attempt a analogy. Picture a cargo net hanging from its four corners from four posts. The cargo net has 50000 thread crossover points where there is 1000 tennis balls hanging and tied via a piece of string to a thread crossover at randomly chosen spots. I want to find which cross over point is closest to each tennis ball.
I'm probably making it worse but here is the logic code I've come up with so far but I'm getting stuck in the loops. Can anyone help?
Set A = CSVData Set B = CSVData1
Code:'************************************************************************
'Scan Coords for max and min values
'************************************************************************
Xmin = 100000000000
Xmax = 100000000000
Ymin = 100000000000
Ymax = 100000000000
for i = 1 to countdata
X = CSVData(i,nX)
Y = CSVData(i,nY)
if CSVData(i,nX) < Xmin then Xmin = X
if CSVData(i,nX) > Xmax then Xmax = X
if CSVData(i,nY) < Ymin then Ymin = Y
if CSVData(i,nY) > Ymax then Ymax = Y
next i
print Xmin;" "; Xmax;" "; Ymin;" "; Ymax
PXmin = 100000000000
PXmax = 100000000000
PYmin = 100000000000
PYmax = 100000000000
for i = 1 to countdata
PX = CSVData1(i,nPitX)
PY = CSVData1(i,nPitY)
if CSVData1(i,nPitX) < PXmin then PXmin = PX
if CSVData1(i,nPitX) > PXmax then PXmax = PX
if CSVData1(i,nPitY) < PYmin then PYmin = PY
if CSVData1(i,nPitY) > PYmax then PYmax = PY
next i
print PXmin;" "; PXmax;" "; PYmin;" "; PYmax
'************************************************************************
'Cycle through points
'************************************************************************
MindistX = 999999999
MindistY = 999999999
for i = 1 to countdata
for j = 1 to countdata
DX = CSVData1(i,nMinDistX)
DY = CSVData1(i,nMinDistY)
if CSVData(i,nX) > PXmin and CSVData(i,nX) < PXmax then
distX = CSVData(i,nX)  CSVData1(j,nPitX)
end if
if CSVData(i,ny) > Pymin and CSVData(i,nY) < PYmax then
distY = CSVData(i,nY)  CSVData1(j,nPitY)
end if
if distY < MindistY then MindistY = CSVData1(i,nMinDistY)
end if
if distX < MindistX then MindistX = CSVData1(i,nMinDistX)
end if
next j
next i
I can provide the entire code where I import all the points and provide the two sets of points but the files are large and you guys seem to come up with a clicketyclick method to ensure the code works on a smaller set first.


Logged




tenochtitlanuk
Board Moderator
member is offline
Gender:
Posts: 1160


Re: finding the min distance between 2 sets of poi
« Reply #1 on: Sep 24^{th}, 2015, 4:18pm » 

The following MAY be the kind of thing you're wanting. I ASSUME the z values are irrelevant. The data is held in a CSV string for convenience and compatibility with files.
The graphic representation shows the two overlapping data sets. NB I'm using much smaller sets so i can see results in a few seconds. Haven't checked how long it takes for sets of 1000 and 50000!! ( ?about one day??)
I generate the two datasets internally the 'randomize' means I get the same data each time for test purposes. I use pythagoras to find separations ( actually use sepn^2 to save time.) then replot. The new file appends nearest neighbour data to each of the first dataset's terms. Easily saved to a file.
Is this the kind of thing you are after??
Code:' nomainwin
randomize( 0.666) ' seed PRNGenerator
WindowWidth =600
WindowHeight =600
' Generate two data sets to play with....
s1 =100 '1000
s2 =300 '50000
dim dataSet1$( s1) ' string holds x y and z as csv string
dim dataSet2$( s2)
dim dataSet3$( s1) ' will add the closest from set2 to set1
open "Display" for graphics_nsb_nf as #wg
#wg "trapclose [quit]"
#wg "down ; fill 50 50 50 ; size 2"
' Show them on screen....
#wg "color red"
for i =1 to s1
x =50 +int( 400 *rnd( 1)): y =50 +int( 400 *rnd( 1)): z =int( 256 *rnd( 1))
dataSet1$( i) =str$( x) +"," +str$( y) +"," +str$( z)
#wg "set "; 50 +x; " ";50 +y
scan
next i
#wg "color cyan"
for i =1 to s2
x =int( 500 *rnd( 1)): y =int( 500 *rnd( 1)): z =int( 256 *rnd( 1))
dataSet2$( i) =str$( x) +"," +str$( y) +"," +str$( z)
#wg "set "; 50 +x; " "; 50 +y
scan
next i
' Go through all data in set1 and find closest in set2
for i =1 to s1
sepnSqd =1E12
Jn =0
for j =1 to s2
dX =val( word$( dataSet1$( i), 1, ",")) val( word$( dataSet2$( j), 1, ","))
dY =val( word$( dataSet1$( i), 2, ",")) val( word$( dataSet2$( j), 2, ","))
dSqd =dX^2 +dY^2
if dSqd <sepnSqd then sepnSqd =dSqd: Jn =j ' Jn ends pointing to closest in set 2
next j
' Create new dataset3
dataSet3$( i) =dataSet1$( i) +"," +dataSet2$( Jn)
next i
' Clear screen and show only set1 and nearest neighbours
#wg "fill darkblue"
for i =1 to s1
print dataSet3$( i)
xS =int( val( word$( dataSet3$( i), 1, ",")))
yS =int( val( word$( dataSet3$( i), 2, ",")))
#wg "color red ; set "; 50 +xS; " "; 50 +yS
xS =int( val( word$( dataSet3$( i), 4, ",")))
yS =int( val( word$( dataSet3$( i), 5, ",")))
#wg "color cyan ; set "; 50 +xS; " "; 50 +yS
scan
next i
wait
[quit]
close #wg
end
Sounds like you find programming as addictive and satisfying as I do!


Logged




AndyAmaya
Senior Member
member is offline
Gender:
Posts: 265


Re: finding the min distance between 2 sets of poi
« Reply #2 on: Sep 24^{th}, 2015, 4:26pm » 

I would suggest using the distance formula
Don't take the square root to get the actual distance (it will take too much time). Just leave the values squared and compare squared values.
ex. Let's say you want to find the squared distance between two 3D points (x1,y1,z1 and x2,y2,z2).
xd = (x2x1) yd = (y2y1) zd = (y2z1)
distSquared = xd*xd + yd*yd + zd*zd
Save the distSquared and the record numbers of the A and B files in a comma separated value (.CSV) file.
Load the .csv file into Excel or Libre Office spreadsheet and sort by the distSquared column.
The smallest distances will be at the top and you will see the associated A and B record numbers in the second and third column.
That's the best I can come up with to ease your pain
Good Luck!
Andy


Logged

Knowledge liberates minds. It does not diminish when shared with more minds. Instead, the more it is shared, the more it grows.



tsh73
Board Moderator
member is offline
Anatoly (real name)
Gender:
Posts: 1680


Re: finding the min distance between 2 sets of poi
« Reply #3 on: Sep 25^{th}, 2015, 03:56am » 

Here is it, simplest way (brute force), finding nearest from 50000 B points, to 10 A points. That is, 1/100 of the actual task.
On my machine, it takes 12 seconts
12sec*100 make 20 hours. EDIT stupid me. 20 minutes. I sure can wait 20 minutes  just go for some coffee.
Now, I guess that something like C/C# is some 100x times faster so it will be under 1 minute.
So the obvious choise is  recode it in a faster language or implement some smart algorythm. Like may be Nearest neighbour search on Kd_tree
Code:'Say set A has 1000 points and set B has 50000.
'What I want to do is cycle through each point in set A
'and find which point in set B is closest.
numA=1000
numB=50000
dim A.x(numA),A.y(numA),A.z(numA)
dim AtoB(numA)
dim B.x(numB),B.y(numB),B.z(numB)
'generate randomly
print "generate randomly"
randomize 0.5 'repeatable "randomly"
rangeMin = 1000: rangeMax = 10000
for i = 1 to numA
A.x(i) = rangeMin + rnd(1)*(rangeMax  rangeMin)
A.y(i) = rangeMin + rnd(1)*(rangeMax  rangeMin)
A.z(i) = rangeMin + rnd(1)*(rangeMax  rangeMin)
'print i, A.x(i), A.y(i), A.z(i)
next
for i = 1 to numB
B.x(i) = rangeMin + rnd(1)*(rangeMax  rangeMin)
B.y(i) = rangeMin + rnd(1)*(rangeMax  rangeMin)
B.z(i) = rangeMin + rnd(1)*(rangeMax  rangeMin)
next
print "Finding nearest point "
t0=time$("ms")
for i = 1 to 10'numA
Ax=A.x(i): Ay=A.y(i): Az=A.z(i)
minD=1e300
for j = 1 to numB
'd=(A.x(i)B.x(j))^2+(A.y(i)B.y(j))^2+(A.z(i)B.z(j))
'Time taken, s 13.281 for 10x50000
d=(AxB.x(j))^2+(AyB.y(j))^2+(AzB.z(j))
'Time taken, s 11.781 then cashing Ax Ay Az
'print j, d, minD, minJ
if d < minD then minD=d: minJ = j
next
AtoB(i)=minJ
next
t1=time$("ms")
print "Time taken, s "; (t1t0)/1000
print "over"
for i = 1 to 20'numA
print i, AtoB(i)
next

« Last Edit: Sep 25^{th}, 2015, 05:13am by tsh73 » 
Logged

damned Dog in the Manger



tsh73
Board Moderator
member is offline
Anatoly (real name)
Gender:
Posts: 1680


Re: finding the min distance between 2 sets of poi
« Reply #4 on: Sep 25^{th}, 2015, 05:10am » 

>>recode it in a faster language Sometimes one just *need* another language. And to object it is sheer badass stubborness.
Here same code converted to MS VC# Express 2010 Code:using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
int numA=1000;
int numB=50000;
int[] Ax=new int[numA];
int[] Ay=new int[numA];
int[] Az=new int[numA];
int[] AtoB=new int[numA];
int[] Bx=new int[numB];
int[] By=new int[numB];
int[] Bz=new int[numB];
Console.WriteLine("generate randomly");
Random r = new Random(1); //'repeatable "randomly"
int rangeMin = 1000, rangeMax = 10000;
for (int i = 0; i < numA; i++)
{
Ax[i]=r.Next(rangeMin, rangeMax );
Ay[i]=r.Next(rangeMin, rangeMax );
Az[i]=r.Next(rangeMin, rangeMax );
}
for (int i = 0; i < numB; i++)
{
Bx[i]=r.Next(rangeMin, rangeMax );
By[i]=r.Next(rangeMin, rangeMax );
Bz[i]=r.Next(rangeMin, rangeMax );
}
Console.WriteLine("Finding nearest point");
DateTime t0 = DateTime.Now;
double minD, d;
int minJ=1;
for (int i = 0; i < numA; i++)
{
minD=1e300;
for (int j = 0; j < numB; j++)
{
d=(Ax[i]Bx[j])*(Ax[i]Bx[j])
+(Ay[i]By[j])*(Ay[i]By[j])
+(Az[i]Bz[j])*(Az[i]Bz[j]);
if (d < minD)
{
minD=d; minJ = j;
}
}
AtoB[i]=minJ;
}
DateTime t1 = DateTime.Now;
Console.WriteLine("over");
Console.WriteLine("Time taken: {0}", t1  t0);
for (int i = 0; i < 20; i++)
{
Console.WriteLine("{0} {1}", i, AtoB[i]);
}
Console.ReadKey();
}
}
}
results: Code:generate randomly
Finding nearest point
over
Time taken: 00:00:02.9687690
0 48171
1 36077
2 45538
3 32153
4 11737
5 32412
6 32864
7 1619
8 8320
9 24538
10 35109
11 31782
12 15209
13 10501
14 40535
15 40947
16 40871
17 12194
18 43724
19 26686
That's 3 seconds for *whole* task. 3 seconds compared to 20 hours. Oh boy. I should miscalculate something... Quick check: 12 seconds * 100 is 1200 seconds is 20 MINUTES!!! damn.


Logged

damned Dog in the Manger



tenochtitlanuk
Board Moderator
member is offline
Gender:
Posts: 1160


Re: finding the min distance between 2 sets of poi
« Reply #5 on: Sep 25^{th}, 2015, 06:21am » 

Agree about language choice we say in UK 'Horses for courses', ie in a horse race the particular horse to bet on varies depending on the particular place they are running. Anyway, here's my prog taking 1min to display 50 x 10000 points.
EDIT full 1000x50000 took under two hours...




blessed2b3rd
Junior Member
member is offline
Posts: 65


Re: finding the min distance between 2 sets of poi
« Reply #6 on: Oct 15^{th}, 2015, 5:53pm » 

Hi All,
Thanks for the replies. Sorry its taken so long to get back to this but a mix of travel for work and the Rugby World Cup things sorta took a sidetrack.
The formula and technique provided by tsh73 seems to be the most straightforward for my novice needs. I can write the code to figure out what is the shortest distance but how do I write the culpable X and Y coordinates to a new csv. The writing to the .csv file isn't the issue its identifying the X and Y values that were used to figure out the minimum distance so I can then write these back to the corresponding set A. Here is my code:
Code:MinD = 999999999
for i = 1 to countdata
for j = 1 to countdata
if CSVData(i,nX) > PXmin and CSVData(i,nX) < PXmax and CSVData(i,ny) > Pymin and CSVData(i,nY) < PYmax then
d = (CSVData(i,nX)  CSVData1(j,nPitX))^2 + (CSVData(i,ny)  CSVData1(j,nPitY))^2
if d < MinD then MinD = d
end if
end if
next j
next i
For each point in set A there is a MinD. How do I write the XYs from set B that gave me the MinD?


Logged




Rod
Global Moderator
member is offline
Graphics = goosebumps!
Gender:
Posts: 5471


Re: finding the min distance between 2 sets of poi
« Reply #7 on: Oct 16^{th}, 2015, 01:42am » 

At the point you decide if minD is to be overwritten then also store minX and minY. Then as you exit the for next loop write minD minX and minY to your csv file.


Logged




