Nearpt3 - Nearest Point Query on 184M Points in E3 with a Uniform Grid
Rensselaer Polytechnic Institute
Nearpt3 is a pair of subroutines to find nearest points in E3. Preprocess takes a list of fixed points, and preprocesses them into a data structure. Query takes a query point, and returns the closest fixed point to it.
Nearpt3 is very fast, very small, and can process very nonhomogeneous data. The largest example run on a laptop computer was St Matthew from the Stanford Digital Michelangelo Project Archive, with 184,088,599 fixed points.
Preprocessing the David data set, with 28,158,109 fixed points, into a 7233 grid took 0.6 microseconds per fixed point. Each query required 6 microseconds. The platform was an IBM T43p laptop computer with a 2GHz Intel Pentium M processor and 2GB of memory.
Please see this quite complete paper describing Nearpt3. Some highlights are as follows.
- Test results on the following 16 datasets, shown as
unstructured point clouds, which is how I used them.
Bunny Hand Dragon Bone6 Blade Uniform i.i.d. with 100K, 300K, 1M, 3M, 10M, 30M, or 100M Points UNC Complete Powerplant Lucy David St Matthew
The data had the following sources.
- Tradeoff between grid size and speed.
- Distribution of fixed points per grid cell for two datasets:
- a uniform distribution of 108 points.
- Michelangelo's David, with 28,158,109 fixed points. The points of this dataset are quite evenly distributed, with a local topology that is often two dimensional.
- Distribution of the number of cells searched per query point for the same two datasets.
- Comparison to ANN (Approximate Nearest Neighbors).
Details not appropriate for the paper follow on this web page.
- Nearpt3 is templated on the coordinate type, so decide what type your coordinates will be, say float. (However, each point's coordinates are converted to double before any operations, like computing which cell contains it. Therefore types like rational or bignum would require editing the source.)
- Nearpt3 uses the Boost array class.
- Read your nfixpts fixed points into a standard C array, p whose elements are boost::array<float,3>.
- Declare a grid variable to hold the preprocessed data, and
preprocess the fixed points into it:
nearpt3::Grid_T<float> *g = nearpt3::Preprocess(nfixpts, p);
- Read a query point, q, of type boost:array<float,3>.
- Make a query:
int closestpt = nearpt3::Query(g, q);Note that closestpt is the index in p of the closest point.
- Repeat as desired.
The tarball has some different main programs to call Preprocess and Query, which are in the file nearpt3.cc.
This is the smallest sample main program to demonstrate nearpt3.
- main5 reads fixed points as 3 floats from cin ...
- terminated by a single number <= -1e30.
- Then main5 calls Preprocess with a default value of ng_factor.
- Next it reads query points from cin and for each, finds the closest fixed point.
This is the standard version used in my time tests.
- main7 requires one float command line arg: ng_factor. If in doubt, use 1.
- main7 reads fixed points from file fixpts.
- main7 reads queries from file qpts. Both files contain triples of binary unsigned short ints.
- Why binary? Reading ascii files would take more time than the whole rest of nearpt3's computations.
- main7 writes results to binary file pairs. Each result is a query followed by the nearest fixed point, both as triples of unsigned short int coords.
- main7 writes a line of statistics preceded by a line of documentation. The line of statistics is formatted for inclusion in a LaTeX table.
- If macro EXHAUSTIVECHECK is defined when main7 is compiled, then each query result is exhaustively checked against all the fixed points to see whether it is really the closest. This can be very slow.
- If macro STATS is defined when main7 is compiled, then extra statistics are computed and printed. They include histograms of the number of fixed points per grid cell and cells searched per query.
- If two fixed points are equally close to the query, then Nearpt3 returns the one with smallest index.
- You may compute and query several sets of fixed points, stored in different grid variables, simultaneously.
- However, you must not change p, the array of fixed points while querying, since the grid doesn't duplicatively store the points, but stores &p.
- The following files and utility programs are included in
- nearpt3.cc The nearpt3 classes and subroutines.
- The various main programs listed above.
- compcellsearchorder.cc computes the file cellsearchorder that nearpt3.cc includes when being compiled. Generally, I use only the first 30,000 lines of cellsearchorder.
- makerandpts.cc generates i.i.d. random
points in [0,1] forever. I used, e.g.,
makerandpts | head -1000000to make the uni1m test data.
- splitfq2.cc reads ASCII file points and writes BINARY files fixpts and qpts. The binary files have 6 bytes per point, for 3 unsigned short ints.
- asc2binpts.cc Read ASCII points and write in BINARY.
- printdist.cc Read a pairs file and print the distances between the queries and answers.
- One way to plot the data, if points is ascii:
- and capture the output with xv
- This version of nearpt3 is much faster than earlier versions, particularly on nonuniform data.
Nearpt3 is freely available for nonprofit research and education. I would appreciate an acknowledgement, and a link to my homepage.