Liberty BASIC Programming Discussions >> Novice >>

Post by

Hello again,

Firstly thank you for all the very helpful solutions you have all provided. Its been nine months since my coding conception and although the curve has been steep its been an oddly pleasing balance of frustration and immense satisfaction.

I have a new issue. I have two sets of points in two separate .csv files. Lets call them set A and set B. Both sets of points are defined by XYZ coordinates. Say set A has 1000 points and set B has 50000. What I want to do is cycle through each point in set A and find which point in set B is closest.

Im figuring the logic is this:

Take the first point in set A and using the X value, cycle through every X value in set B. By minus-ing each value in set B from each value in set A I can then figure which value is smallest, thus being closest on the X plane. Doing the same for each Y value will do the same for the Y plane. Adding the distances of the X value and Y value for each point will give me an overall value and thus filtering the smallest will tell me which points are closest. I am hoping to end up with a third .csv file listing the compete set A with the closest 1000 set B points next to them.

To speed up the code I use only points that are within the range of set A so as not to cycle through ach and every 50000 set B points.

I realize this may be difficult to visualize so i'll attempt a analogy. Picture a cargo net hanging from its four corners from four posts. The cargo net has 50000 thread cross-over points where there is 1000 tennis balls hanging and tied via a piece of string to a thread cross-over at randomly chosen spots. I want to find which cross over point is closest to each tennis ball.

I'm probably making it worse but here is the logic code I've come up with so far but I'm getting stuck in the loops. Can anyone help?

Set A = CSVData

Set B = CSVData1

'************************************************************************ 'Scan Co-ords for max and min values '************************************************************************ Xmin = 100000000000 Xmax = -100000000000 Ymin = 100000000000 Ymax = -100000000000 for i = 1 to countdata X = CSVData(i,nX) Y = CSVData(i,nY) if CSVData(i,nX) < Xmin then Xmin = X if CSVData(i,nX) > Xmax then Xmax = X if CSVData(i,nY) < Ymin then Ymin = Y if CSVData(i,nY) > Ymax then Ymax = Y next i print Xmin;" "; Xmax;" "; Ymin;" "; Ymax PXmin = 100000000000 PXmax = -100000000000 PYmin = 100000000000 PYmax = -100000000000 for i = 1 to countdata PX = CSVData1(i,nPitX) PY = CSVData1(i,nPitY) if CSVData1(i,nPitX) < PXmin then PXmin = PX if CSVData1(i,nPitX) > PXmax then PXmax = PX if CSVData1(i,nPitY) < PYmin then PYmin = PY if CSVData1(i,nPitY) > PYmax then PYmax = PY next i print PXmin;" "; PXmax;" "; PYmin;" "; PYmax '************************************************************************ 'Cycle through points '************************************************************************ MindistX = 999999999 MindistY = 999999999 for i = 1 to countdata for j = 1 to countdata DX = CSVData1(i,nMinDistX) DY = CSVData1(i,nMinDistY) if CSVData(i,nX) > PXmin and CSVData(i,nX) < PXmax then distX = CSVData(i,nX) - CSVData1(j,nPitX) end if if CSVData(i,ny) > Pymin and CSVData(i,nY) < PYmax then distY = CSVData(i,nY) - CSVData1(j,nPitY) end if if distY < MindistY then MindistY = CSVData1(i,nMinDistY) end if if distX < MindistX then MindistX = CSVData1(i,nMinDistX) end if next j next i

I can provide the entire code where I import all the points and provide the two sets of points but the files are large and you guys seem to come up with a clickety-click method to ensure the code works on a smaller set first.

Post by

The following MAY be the kind of thing you're wanting.

I ASSUME the z values are irrelevant.

The data is held in a CSV string for convenience and compatibility with files.

The graphic representation shows the two overlapping data sets.

NB I'm using much smaller sets so i can see results in a few seconds. Haven't checked how long it takes for sets of 1000 and 50000!! ( ?about one day??)

I generate the two datasets internally- the 'randomize' means I get the same data each time for test purposes.

I use pythagoras to find separations ( actually use sepn^2 to save time.) then re-plot.

The new file appends nearest neighbour data to each of the first dataset's terms. Easily saved to a file.

Is this the kind of thing you are after??

' nomainwin randomize( 0.666) ' seed PRNGenerator WindowWidth =600 WindowHeight =600 ' Generate two data sets to play with.... s1 =100 '1000 s2 =300 '50000 dim dataSet1$( s1) ' string holds x y and z as csv string dim dataSet2$( s2) dim dataSet3$( s1) ' will add the closest from set2 to set1 open "Display" for graphics_nsb_nf as #wg #wg "trapclose [quit]" #wg "down ; fill 50 50 50 ; size 2" ' Show them on screen.... #wg "color red" for i =1 to s1 x =50 +int( 400 *rnd( 1)): y =50 +int( 400 *rnd( 1)): z =int( 256 *rnd( 1)) dataSet1$( i) =str$( x) +"," +str$( y) +"," +str$( z) #wg "set "; 50 +x; " ";50 +y scan next i #wg "color cyan" for i =1 to s2 x =int( 500 *rnd( 1)): y =int( 500 *rnd( 1)): z =int( 256 *rnd( 1)) dataSet2$( i) =str$( x) +"," +str$( y) +"," +str$( z) #wg "set "; 50 +x; " "; 50 +y scan next i ' Go through all data in set1 and find closest in set2 for i =1 to s1 sepnSqd =1E12 Jn =0 for j =1 to s2 dX =val( word$( dataSet1$( i), 1, ",")) -val( word$( dataSet2$( j), 1, ",")) dY =val( word$( dataSet1$( i), 2, ",")) -val( word$( dataSet2$( j), 2, ",")) dSqd =dX^2 +dY^2 if dSqd <sepnSqd then sepnSqd =dSqd: Jn =j ' Jn ends pointing to closest in set 2 next j ' Create new dataset3 dataSet3$( i) =dataSet1$( i) +"," +dataSet2$( Jn) next i ' Clear screen and show only set1 and nearest neighbours #wg "fill darkblue" for i =1 to s1 print dataSet3$( i) xS =int( val( word$( dataSet3$( i), 1, ","))) yS =int( val( word$( dataSet3$( i), 2, ","))) #wg "color red ; set "; 50 +xS; " "; 50 +yS xS =int( val( word$( dataSet3$( i), 4, ","))) yS =int( val( word$( dataSet3$( i), 5, ","))) #wg "color cyan ; set "; 50 +xS; " "; 50 +yS scan next i wait [quit] close #wg end

Sounds like you find programming as addictive and satisfying as I do!

Post by

I would suggest using the

Don't take the square root to get the actual distance (it will take too much time). Just leave the values squared and compare squared values.

ex.

Let's say you want to find the squared distance between two 3D points (x1,y1,z1 and x2,y2,z2).

xd = (x2-x1)

yd = (y2-y1)

zd = (y2-z1)

distSquared = xd*xd + yd*yd + zd*zd

Save the distSquared and the record numbers of the A and B files in a comma separated value (.CSV) file.

Load the .csv file into Excel or Libre Office spreadsheet and sort by the distSquared column.

The smallest distances will be at the top and you will see the associated A and B record numbers in the second and third column.

That's the best I can come up with to ease your pain

Good Luck!

Andy

Post by

Here is it, simplest way (brute force), finding nearest from 50000 B points, to 10 A points. That is, 1/100 of the actual task.

On my machine, it takes 12 seconts

12sec*100 make

EDIT stupid me. 20 minutes. I sure can wait 20 minutes - just go for some coffee.

Now, I guess that something like C/C# is some 100x times faster so it will be under 1 minute.

So the obvious choise is - recode it in a faster language

or implement some smart algorythm.

Like may be Nearest neighbour search on

'Say set A has 1000 points and set B has 50000. 'What I want to do is cycle through each point in set A 'and find which point in set B is closest. numA=1000 numB=50000 dim A.x(numA),A.y(numA),A.z(numA) dim AtoB(numA) dim B.x(numB),B.y(numB),B.z(numB) 'generate randomly print "generate randomly" randomize 0.5 'repeatable "randomly" rangeMin = 1000: rangeMax = 10000 for i = 1 to numA A.x(i) = rangeMin + rnd(1)*(rangeMax - rangeMin) A.y(i) = rangeMin + rnd(1)*(rangeMax - rangeMin) A.z(i) = rangeMin + rnd(1)*(rangeMax - rangeMin) 'print i, A.x(i), A.y(i), A.z(i) next for i = 1 to numB B.x(i) = rangeMin + rnd(1)*(rangeMax - rangeMin) B.y(i) = rangeMin + rnd(1)*(rangeMax - rangeMin) B.z(i) = rangeMin + rnd(1)*(rangeMax - rangeMin) next print "Finding nearest point " t0=time$("ms") for i = 1 to 10'numA Ax=A.x(i): Ay=A.y(i): Az=A.z(i) minD=1e300 for j = 1 to numB 'd=(A.x(i)-B.x(j))^2+(A.y(i)-B.y(j))^2+(A.z(i)-B.z(j)) 'Time taken, s 13.281 for 10x50000 d=(Ax-B.x(j))^2+(Ay-B.y(j))^2+(Az-B.z(j)) 'Time taken, s 11.781 then cashing Ax Ay Az 'print j, d, minD, minJ if d < minD then minD=d: minJ = j next AtoB(i)=minJ next t1=time$("ms") print "Time taken, s "; (t1-t0)/1000 print "over" for i = 1 to 20'numA print i, AtoB(i) next

Post by

>>recode it in a faster language

Sometimes one just *need* another language. And to object it is sheer badass stubborness.

Here same code converted to MS VC# Express 2010

using System; using System.Collections.Generic; using System.Linq; using System.Text; namespace ConsoleApplication1 { class Program { static void Main(string[] args) { int numA=1000; int numB=50000; int[] Ax=new int[numA]; int[] Ay=new int[numA]; int[] Az=new int[numA]; int[] AtoB=new int[numA]; int[] Bx=new int[numB]; int[] By=new int[numB]; int[] Bz=new int[numB]; Console.WriteLine("generate randomly"); Random r = new Random(1); //'repeatable "randomly" int rangeMin = 1000, rangeMax = 10000; for (int i = 0; i < numA; i++) { Ax[i]=r.Next(rangeMin, rangeMax ); Ay[i]=r.Next(rangeMin, rangeMax ); Az[i]=r.Next(rangeMin, rangeMax ); } for (int i = 0; i < numB; i++) { Bx[i]=r.Next(rangeMin, rangeMax ); By[i]=r.Next(rangeMin, rangeMax ); Bz[i]=r.Next(rangeMin, rangeMax ); } Console.WriteLine("Finding nearest point"); DateTime t0 = DateTime.Now; double minD, d; int minJ=-1; for (int i = 0; i < numA; i++) { minD=1e300; for (int j = 0; j < numB; j++) { d=(Ax[i]-Bx[j])*(Ax[i]-Bx[j]) +(Ay[i]-By[j])*(Ay[i]-By[j]) +(Az[i]-Bz[j])*(Az[i]-Bz[j]); if (d < minD) { minD=d; minJ = j; } } AtoB[i]=minJ; } DateTime t1 = DateTime.Now; Console.WriteLine("over"); Console.WriteLine("Time taken: {0}", t1 - t0); for (int i = 0; i < 20; i++) { Console.WriteLine("{0} {1}", i, AtoB[i]); } Console.ReadKey(); } } }

results:

generate randomly Finding nearest point over Time taken: 00:00:02.9687690 0 48171 1 36077 2 45538 3 32153 4 11737 5 32412 6 32864 7 1619 8 8320 9 24538 10 35109 11 31782 12 15209 13 10501 14 40535 15 40947 16 40871 17 12194 18 43724 19 26686

That's 3 seconds for *whole* task. 3 seconds compared

Quick check:

12 seconds * 100 is 1200 seconds is 20 MINUTES!!!

damn.

Post by

Agree about language choice- we say in UK 'Horses for courses', ie in a horse race the particular horse to bet on varies depending on the particular place they are running.

Anyway, here's my prog taking 1min to display 50 x 10000 points.

EDIT full 1000x50000 took under two hours...

Post by

Hi All,

Thanks for the replies. Sorry its taken so long to get back to this but a mix of travel for work and the Rugby World Cup things sorta took a sidetrack.

The formula and technique provided by tsh73 seems to be the most straightforward for my novice needs. I can write the code to figure out what is the shortest distance but how do I write the culpable X and Y coordinates to a new csv. The writing to the .csv file isn't the issue its identifying the X and Y values that were used to figure out the minimum distance so I can then write these back to the corresponding set A. Here is my code:

MinD = 999999999 for i = 1 to countdata for j = 1 to countdata if CSVData(i,nX) > PXmin and CSVData(i,nX) < PXmax and CSVData(i,ny) > Pymin and CSVData(i,nY) < PYmax then d = (CSVData(i,nX) - CSVData1(j,nPitX))^2 + (CSVData(i,ny) - CSVData1(j,nPitY))^2 if d < MinD then MinD = d end if end if next j next i

For each point in set A there is a MinD. How do I write the XYs from set B that gave me the MinD?

Post by

At the point you decide if minD is to be overwritten then also store minX and minY. Then as you exit the for next loop write minD minX and minY to your csv file.