K-dimensional tree. Ray traversal in a kd-tree on the CPU

Consider a binary spatial partitioning structure called a kd-tree. This structure is a binary tree of bounding boxes nested inside each other. Each box in the kd-tree is divided by a plane, perpendicular to one of the coordinate axes, into two child boxes.

The whole scene is entirely contained inside the root box, and by continuing the recursive splitting of boxes we arrive at a tree in which each leaf box contains only a small number of primitives. Thus, the kd-tree allows binary search to be used to find the primitive that a ray intersects.

Building a kd-tree

The algorithm for constructing a kd-tree can be outlined as follows (we will call the bounding rectangular parallelepiped simply a "box"):

1. Add all primitives to a bounding box. That is, build a box that bounds all primitives; it will correspond to the root node of the tree.

2. If there are few primitives in the node, or the tree depth limit is reached, stop.

3. Select a split plane that divides the given node into two children. We will call them the left and right nodes of the tree.

4. Assign the primitives intersecting the box of the left node to the left node, and the primitives intersecting the box of the right node to the right node.

The most difficult part of building a kd-tree is step 3. The efficiency of the acceleration structure depends directly on it. There are several ways to select the split plane; we will consider them in order.

1. Splitting through the center.

The easiest way is to place the plane at the center of the box: first select the axis (x, y, or z) along which the box has the maximum size, then split the box at the center of that axis.

Figure 1: Center split

This method has poor adaptability. The classic octree has the same drawback. The reason for poor performance on such trees can be described intuitively: each node contains a lot of empty space through which rays are traversed (passed through the tree during the search), whereas empty space should be discarded as soon as possible. A more rigorous explanation will be given a little later.

2. Splitting by the median.

You might think it would be good to split the node into two children each time so that the number of primitives in the left and right subtrees is the same. We would thus build a balanced tree, which should speed up the search.

Figure 2: Median split

This is not a good idea. The thing is that balanced trees only help if the sought element ends up in a random node each time, that is, if the distribution of rays among nodes during the search is uniform. In reality this is not so: on average, more rays will pass through the node with the larger surface area, and with a median split these areas can differ considerably.

3. Surface Area Heuristic (SAH)

What are the criteria of a well-built kd-tree? At an intuitive level this is actually not very clear, although the phrase "as much empty space as possible should be discarded as early as possible" will probably do.

Let us use a formal approach. We introduce a traversal cost function, which reflects how expensive it is, in terms of computing resources, to trace a given node with a random set of rays. We compute it by the following formula:

SAH(x) = CostEmpty + SurfaceArea(Left) * N(Left) + SurfaceArea(Right) * N(Right)

Here CostEmpty is the cost of tracing an empty node (some constant), SurfaceArea is the surface area of the corresponding node, and N is the number of primitives in it. The split plane must be chosen so as to minimize this function. The argument x of the function is the one-dimensional coordinate of the split plane.

The boundaries of primitives serve as good candidates for the SAH minimum. A simple construction algorithm is as follows: each time a plane is selected, iterate over all primitive boundaries along the three axes, compute the value of the cost function at each, and take the minimum among all these values. When computing SAH for a plane, we need to know N(left) and N(right), the numbers of primitives to the left and right of the plane. If N is computed by simple enumeration, the construction algorithm ends up quadratic in complexity.
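As an illustration, here is a minimal C++ sketch of this naive quadratic selection. The Aabb3f type, the SurfaceArea() helper, and the CostEmpty constant are assumptions of the sketch that mirror the names used in the listings later in the article:

#include <vector>
#include <limits>
#include <initializer_list>

struct Aabb3f { float vmin[3]; float vmax[3]; };

static float SurfaceArea(const Aabb3f& b)
{
    float dx = b.vmax[0] - b.vmin[0];
    float dy = b.vmax[1] - b.vmin[1];
    float dz = b.vmax[2] - b.vmin[2];
    return 2.0f * (dx*dy + dy*dz + dx*dz);
}

static const float CostEmpty = 1.0f; // cost of an empty node (an assumed constant)

// Returns the minimal SAH found; bestSplit/bestAxis receive the chosen plane.
float FindBestSplit(const std::vector<Aabb3f>& prims, const Aabb3f& a_box,
                    float& bestSplit, int& bestAxis)
{
    float minSah = std::numeric_limits<float>::max();
    for (int axis = 0; axis < 3; axis++)
        for (size_t i = 0; i < prims.size(); i++)
            for (float split : { prims[i].vmin[axis], prims[i].vmax[axis] })
            {
                int nLeft = 0, nRight = 0;            // O(N) counting pass
                for (size_t j = 0; j < prims.size(); j++)
                {
                    if (prims[j].vmin[axis] < split) nLeft++;
                    if (prims[j].vmax[axis] > split) nRight++;
                }
                Aabb3f boxL = a_box, boxR = a_box;
                boxL.vmax[axis] = split;              // left part of the node
                boxR.vmin[axis] = split;              // right part of the node
                float sah = CostEmpty + SurfaceArea(boxL) * nLeft
                                      + SurfaceArea(boxR) * nRight;
                if (sah < minSah)
                {
                    minSah = sah; bestSplit = split; bestAxis = axis;
                }
            }
    return minSah;
}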

Figure 3: Partition based on SAH

In Figure 4 you can see that SAH immediately discards large empty spaces, tightly enclosing the geometry.


Figure 4: kd-tree built with SAH

The Surface Area Heuristic also yields a more intelligent stopping criterion than a tree depth limit or a small primitive count. If, for the selected split plane, the total cost of tracing the child nodes is greater than the cost of tracing the current node as a leaf (computed by the formula SurfaceArea(Node) * N(Node)), then subdivision should be stopped.
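In code, this criterion is a one-line comparison; a minimal sketch, reusing the assumed SurfaceArea() and Aabb3f from the sketch above:

// Keep the node as a leaf if the best found split is no cheaper
// than tracing the node's primitives directly.
bool ShouldMakeLeaf(float minSah, const Aabb3f& nodeBox, int numPrims)
{
    float leafCost = SurfaceArea(nodeBox) * numPrims; // SurfaceArea(Node) * N(Node)
    return minSah >= leafCost;
}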

4. Fast kd-tree construction

Option 0

Split at the center, choosing the axis along which the box size is maximal. This method is the fastest, but in some scenes ray tracing in such a tree is about 2-3 times slower.

Option 1

The simplest thing that can be done to speed up construction is to reduce the number of candidate planes. Instead of examining N planes (where N is the number of primitives), SAH can be computed only at a small number of pre-fixed positions. The input box is divided evenly along each axis into M parts. Typically M ranges from a few tens to a couple of hundred. N(left) and N(right) are still computed by enumeration, but now a full pass over the array is needed only M times rather than N. Thus the algorithm has complexity O(N * M). In effect, we gained speed by coarsening SAH through discretization. It turns out, however, that the minimum of the resulting coarse function does not, as a rule, differ much from the exact minimum.

Option 2

What can be done to speed up option 1? We must get rid of exhaustive enumeration when computing N(left) and N(right). To do this, build a tree, each node of which stores a dividing plane and the numbers of primitives to the right and left of that plane:

struct IntervalTree
{
  float split;
  int   numPrimsOnLeftSide;
  int   numPrimsOnRightSide;
  IntervalTree* left;
  IntervalTree* right;
};

In fact, three such trees are needed at each level, one for each of x, y, z. The input interval is divided in half each time. With such a tree, the number of primitives to the right and left of a plane can be determined exactly in log(N) operations. The search algorithm in the tree looks quite simple:

vec2i IntervalTree::TraversePlane(float a_split) const
{
  vec2i res; // vector of two integers: [0] - primitives on the left, [1] - on the right

  if (a_split < split - EPSILON)
  {
    res[1] = numPrimsOnRightSide;
    if (left != 0)
      res += left->TraversePlane(a_split);
    // if for some reason the number of primitives is zero, count at least those
    // primitives that are in this node so that SAH does not vanish
    if (res[0] == 0)
      res[0] = numPrimsOnLeftSide;
  }
  else if (a_split > split + EPSILON)
  {
    res[0] = numPrimsOnLeftSide;
    if (right != 0)
      res += right->TraversePlane(a_split);
    // if for some reason the number of primitives is zero, count at least those
    // primitives that are in this node so that SAH does not vanish
    if (res[1] == 0)
      res[1] = numPrimsOnRightSide;
  }
  else // hit exactly on the node plane
  {
    res[0] = numPrimsOnLeftSide;
    res[1] = numPrimsOnRightSide;
  }

  return res;
}

The complexity of building the tree of planes is O(N * log(N)). The complexity of evaluating SAH is O(M * log(N)), since SAH is computed only at M fixed planes. Thus the total complexity of the algorithm is O(N * log(N)), because M is much smaller than N.

In principle, before constructing the kd-tree one can build any auxiliary acceleration structure solely to speed up the subsequent evaluation of SAH.

Option 3 (almost exact, in O(N * log(N)))

We use sorting. For each axis (x, y, z), sort the primitives first by the maxima and then by the minima of their bounding boxes. Call the array sorted by maxima sorted_max, and the array sorted by minima sorted_min.

Passing through the sorted_min array from left to right, at every step we know exactly how many primitives (or rather, their bounding boxes) lie to the right of the plane (and almost exactly how many lie to the left). Passing through the sorted_max array from right to left, we know exactly how many primitives lie to the left of the plane and almost exactly how many lie to the right.

for (int axis = 0; axis < 3; axis++)
{
  // sort primitives along the current axis by the min bounds of their boxes
  CompareMin comparator(axis);
  std::sort(prims.begin(), prims.end(), comparator);

  for (int i = 1; i < prims.size(); i++)
  {
    int onLeftSide  = i;
    int onRightSide = prims.size() - i;

    float split = prims[i].Box().vmin[axis];
    if (split == a_box.vmin[axis])
      continue;

    // evaluate SAH
    Aabb3f box_left  = a_box;
    Aabb3f box_right = a_box;
    box_left.vmax[axis]  = split;
    box_right.vmin[axis] = split;

    float sah = EMPTY_COST + onLeftSide * SurfaceArea(box_left)
                           + onRightSide * SurfaceArea(box_right);
    if (sah < min_sah)
    {
      min_sah   = sah;
      min_split = split;
      min_axis  = axis;
    }
  }

  // sort primitives along the current axis by the max bounds of their boxes
  CompareMax comparator2(axis);
  std::sort(prims.begin(), prims.end(), comparator2);

  for (int i = prims.size() - 2; i >= 0; i--)
  {
    int onLeftSide  = i + 1;
    int onRightSide = prims.size() - i - 1;

    float split = prims[i].Box().vmax[axis];
    if (split == a_box.vmax[axis])
      continue;

    // evaluate SAH
    Aabb3f box_left  = a_box;
    Aabb3f box_right = a_box;
    box_left.vmax[axis]  = split;
    box_right.vmin[axis] = split;

    float sah = EMPTY_COST + onLeftSide * SurfaceArea(box_left)
                           + onRightSide * SurfaceArea(box_right);
    if (sah < min_sah)
    {
      min_sah   = sah;
      min_split = split;
      min_axis  = axis;
    }
  }
}

In theory, the method has a problem. Consider the sorted_min array. We pass through it from left to right in a loop:

for (int i = 0; i < prims.size(); i++)
{
  int onLeftSide  = i;
  int onRightSide = prims.size() - i;
  // ...
}

We know onLeftSide exactly, but onRightSide only approximately. The point is that the plane can cut some primitives, in which case the same primitive lies both to the right and to the left of the plane, which this counting does not take into account. In practice, this problem hardly ever manifests itself.

Option 4

The search for the SAH minimum can be accelerated using some minimization method with several starting approximations, for example golden-section search. This also eliminates exhaustive enumeration. One only needs to keep in mind that SAH takes on a more or less smooth shape only with a large number of primitives. The advantage of this approach is that brute-force SAH evaluation maps easily onto the GPU. So, by evaluating SAH with brute force each time but reducing the number of evaluations to a small count (~10-20 at most), a kd-tree can be built very quickly this way.
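A sketch of this idea, assuming SAH is close to unimodal on the bracketed interval (in practice one would run it from several starting brackets, as noted above); the function names are illustrative:

#include <cmath>
#include <functional>

// Minimize sah(x) on [a, b] by golden-section search; returns the approximate
// position of the minimum after a fixed number of evaluations.
float GoldenSectionMin(std::function<float(float)> sah, float a, float b,
                       int iters = 16)
{
    const float phi = 0.5f * (std::sqrt(5.0f) - 1.0f); // ~0.618
    float x1 = b - phi * (b - a);
    float x2 = a + phi * (b - a);
    float f1 = sah(x1), f2 = sah(x2);
    for (int i = 0; i < iters; i++)   // ~10-20 evaluations, as noted above
    {
        if (f1 < f2) { b = x2; x2 = x1; f2 = f1; x1 = b - phi * (b - a); f1 = sah(x1); }
        else         { a = x1; x1 = x2; f1 = f2; x2 = a + phi * (b - a); f2 = sah(x2); }
    }
    return 0.5f * (a + b);
}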

Option 5 (Binning)

This option is often used. The essence is as follows:

1. Divide the space into regular intervals along x, y, and z. Each such interval is called a bin. The number of bins is usually small, around 32.

2. Take the centers of the triangles and drop them into the bins. This means iterating over all triangles and computing their centers. Then, for each bin, count how many points (centers) fall into it. This is easy to do: when computing a center, simply increment the corresponding counter. Since the binning is regular, the coordinate of a point immediately determines which bin it falls into (a small sketch of this step is given below).
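A sketch of this counting step (the bin count, the layout of the Bins structure, and the function name are assumptions for illustration):

#include <array>
#include <vector>
#include <algorithm>

const int BIN_COUNT = 32;                 // a typical small number of bins

struct Bins { int count[3][BIN_COUNT]; }; // per-axis counters

Bins BinCenters(const std::vector<std::array<float, 3>>& centers,
                const float sceneMin[3], const float sceneMax[3])
{
    Bins bins = {};                       // zero-initialize all counters
    for (const auto& c : centers)
        for (int axis = 0; axis < 3; axis++)
        {
            // normalize the coordinate to [0, 1); the bin index follows directly
            float t = (c[axis] - sceneMin[axis]) / (sceneMax[axis] - sceneMin[axis]);
            int idx = std::min(BIN_COUNT - 1, std::max(0, int(t * BIN_COUNT)));
            bins.count[axis][idx]++;      // just increment the matching counter
        }
    return bins;
}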

Ray traversal in a kd-tree on the CPU


The classical binary search algorithm in kd-trees (kd-tree traversal in the English literature), used in most CPU implementations, works approximately as follows. At the first step, compute the intersection of the ray with the root box bounding the scene and remember the intersection as two coordinates (in ray space): t_near and t_far, denoting the intersections with the near and far planes respectively. At each subsequent step, only the current node (its address) and these two coordinates are needed.

There is no need to compute the intersection of the ray with a child box; it is enough to find the intersection with the plane dividing the box (denote the corresponding coordinate t_split). Every non-leaf node of the kd-tree has two child nodes. Fig. 5 depicts the three cases that can occur when traversing a ray.

Figure 5: Ray traversal in a kd-tree

In case A (t_split >= t_far) or case B (t_split < t_near), the ray crosses only one child node, so the other child (the far or the near one, respectively) can simply be discarded and the search for the intersection continued in the remaining node.

In case C, the ray crosses both child nodes, so the intersection must first be sought in the near node, and if it is not found there, in the far one. Since in general it is not known how many times this last case will occur, a stack is needed. Each time the ray crosses both child nodes, the address of the far node together with its t_near and t_far is pushed onto the stack, and the search continues in the near node. If no intersection is found in the near node, the far node's address, t_near, and t_far are popped from the stack and the search continues in the far node.
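To make the three cases concrete, here is a simplified sketch of such a stack-based traversal. The node layout and the IntersectLeaf() helper are assumptions of the sketch; real implementations also handle the degenerate case where the ray origin lies exactly on a split plane:

struct KdNode
{
    bool    isLeaf;
    int     axis;     // split axis of an inner node
    float   split;    // split coordinate
    KdNode* left;
    KdNode* right;
    // leaf data omitted in this sketch
};

// assumed helper: test the ray against the primitives of a leaf
bool IntersectLeaf(const KdNode* leaf, const float org[3], const float dir[3],
                   float tNear, float tFar);

struct StackEntry { const KdNode* node; float tNear, tFar; };

bool Traverse(const KdNode* root, const float org[3], const float dir[3],
              float tNear, float tFar)
{
    StackEntry stack[64];
    int top = 0;
    const KdNode* node = root;
    while (true)
    {
        while (!node->isLeaf)
        {
            int   a      = node->axis;
            float tSplit = (node->split - org[a]) / dir[a];
            const KdNode* nearN = (org[a] < node->split) ? node->left  : node->right;
            const KdNode* farN  = (org[a] < node->split) ? node->right : node->left;
            if (tSplit >= tFar || tSplit <= 0.0f)   // case A: only the near child
                node = nearN;
            else if (tSplit < tNear)                // case B: only the far child
                node = farN;
            else                                    // case C: both children
            {
                stack[top].node  = farN;            // push the far child
                stack[top].tNear = tSplit;
                stack[top].tFar  = tFar;
                top++;
                node = nearN;                       // continue in the near child
                tFar = tSplit;
            }
        }
        if (IntersectLeaf(node, org, dir, tNear, tFar))
            return true;                            // intersection found
        if (top == 0)
            return false;                           // stack is empty - a miss
        top--;                                      // pop the far node and continue
        node  = stack[top].node;
        tNear = stack[top].tNear;
        tFar  = stack[top].tFar;
    }
}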

Ray traversal in a kd-tree on the GPU

Since the GPU originally had no stack, stackless algorithms and algorithms using a short artificial stack appeared. Five algorithms are known for ray traversal in a kd-tree on the GPU: restart, backtrack, push-down, short stack, and traversal in a kd-tree with ropes.

kd-tree restart

Figure 2: operation of the restart algorithm.

When the ray finds no intersection in a leaf, its t_near is set to t_far and the search continues from the root of the kd-tree (Fig. 2). Roughly speaking, the ray origin (its emission point) is simply moved forward and the search starts over. As a result the ray passes through the same nodes many times, which is inefficient.
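A sketch of the restart loop, reusing the node layout and the IntersectLeaf() helper assumed in the CPU traversal sketch above:

bool TraverseRestart(const KdNode* root, const float org[3], const float dir[3],
                     float tNear, float tFar)
{
    while (tNear < tFar)
    {
        const KdNode* node = root;   // always start over from the root
        float tMax = tFar;           // exit coordinate of the leaf we will reach
        while (!node->isLeaf)
        {
            int   a      = node->axis;
            float tSplit = (node->split - org[a]) / dir[a];
            const KdNode* nearN = (org[a] < node->split) ? node->left  : node->right;
            const KdNode* farN  = (org[a] < node->split) ? node->right : node->left;
            if (tSplit >= tMax || tSplit <= 0.0f) node = nearN;
            else if (tSplit < tNear)              node = farN;
            else { node = nearN; tMax = tSplit; } // no stack: just shrink the interval
        }
        if (IntersectLeaf(node, org, dir, tNear, tMax))
            return true;
        tNear = tMax;                // "move the ray origin" forward and restart
    }
    return false;
}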

kd-tree push-down

A small modification of the restart algorithm: the ray restarts not from the root of the tree but from some subtree. However, a node can be selected as the new restart root only if, during the whole descent to it, no node was encountered in which the ray intersects both children.

That is, if while descending to the leaves of the kd-tree we met at least one node in which the ray intersects both the near and the far child, it is that node's far child that must be remembered as the restart subtree. Then, if the ray misses, the restart is made from the remembered far node, and one can again try to find a new subtree. In effect, this is an attempt to make a stack one element long.

kd-tree short stack

The authors also used a short stack. As long as the stack capacity suffices, it fills up just as in the classical algorithm. When the stack overflows, it starts working as a ring buffer. If the stack becomes empty, a restart is needed. For example, if a stack of length 4 contains nodes (1), (4), (6), (8), then after pushing a new element (12) the stack looks like: (12), (4), (6), (8); that is, the first element is overwritten. Elements are popped in the reverse order of being pushed (first 12, then 8, 6, and finally 4), but when element (4) is popped, a restart is required, since element (1) was overwritten. The point of the short stack is that it greatly reduces the number of restarts the ray has to make.

kd-tree backtrack

If each kd-tree node stores a link to its parent, then on a miss this link can be used to climb back to the parent. In addition, each node must store its bounding parallelepiped in order to compute a new t_far on a miss.

With t_near one can do the same as in the restart algorithm. This requires an extra 6*4 bytes (for the box coordinates) + 4 bytes (for the parent reference) = 28 bytes per node. Since GPU memory is rather limited, such squandering can create problems. Besides, every time the ray climbs up, the intersection of the ray with an axis-aligned box must be computed, which is certainly not free in terms of computation. Note that a kd-tree with stored boxes will occupy much more memory than a well-built BVH over the same scene; the main reason is that in a kd-tree the boxes share more points, which end up duplicated.

kd-tree with ropes

In this variant, each node of the kd-tree additionally stores six links to the neighboring nodes of the tree (the nodes adjacent to its faces) (Fig. 3). If the ray misses in the current node, these ropes lead directly to the next nodes in which the ray must be traced.

This algorithm, like backtrack, avoids repeated passes through the same tree nodes. However, the six links require an extra 24 bytes of memory, which together with the existing 8 gives 32 bytes per node.

Figure 3: kd-tree with ropes.

The benefits of kd trees

1. A very simple and efficient traversal algorithm, even on the GPU.
  2. They take up little memory (8 bytes per node).

Disadvantages of kd trees

1. Time-consuming construction, namely the search for the split with minimal SAH.

2. Greater depth than a BVH, and hence more traversal steps.

Conclusion

Summing up, we can say that the kd-tree is ideal for ray tracing, both on the CPU and on the GPU.

Literature

1. Wald I. Realtime Ray Tracing and Interactive Global Illumination. PhD thesis, Saarland University, 2004.

2. Shevtsov M., Soupikov A., Kapustin A. Highly Parallel Fast KD-tree Construction for Interactive Ray Tracing of Dynamic Scenes. In Proceedings of the EUROGRAPHICS conference, 2007.

3. Foley T., Sugerman J. KD-Tree Acceleration Structures for a GPU Raytracer. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics Hardware, pp. 15-22, 2005.

4. Horn D., Sugerman J., Houston M., Hanrahan P. Interactive k-D Tree GPU Raytracing. In Proceedings of the Symposium on Interactive 3D Graphics and Games, pp. 167-174, 2007.

5. Popov S., Günther J., Seidel H.-P., Slusallek P. Stackless KD-Tree Traversal for High Performance GPU Ray Tracing. In Proceedings of the EUROGRAPHICS conference, 2007.

Mathematical description

k-d trees are a special kind of binary search tree. The k-d tree is an unbalanced search tree for storing points from $\mathbb{R}^k$. It offers an R-tree-like search capability in a given range of keys. In exchange for simplicity of queries, the memory requirement is $O(kn)$ instead of $O((\log n)^{k-1})$.

There are homogeneous and heterogeneous k-d trees. In homogeneous k-d trees, each node stores a record. In a heterogeneous version, internal nodes contain only keys, leaves contain links to records.

In a heterogeneous k-d tree, let $H_i(t) = (x_1, x_2, \ldots, x_{i-1}, t, x_{i+1}, \ldots, x_k)$ for $1 \le i \le k$ denote the $(k-1)$-dimensional hyperplane perpendicular to axis $i$ at point $t$. For the root, divide the points by the hyperplane $H_1(t)$ into two sets of as nearly equal size as possible and store $t$ in the root; to the left all points with $x_1 < t$ are kept, to the right those with $x_1 > t$. For the left subtree, divide the points again by a new "split plane" $H_2(t)$, storing its $t$ in the internal node; to its left all points with $x_2 < t$ are kept. This continues recursively through all coordinates; after the last coordinate, the process starts again from the first, until every point can be uniquely identified by a hyperplane.

A k-d tree can be built in $O(n(k + \log n))$. A range search can be performed in $O(n^{1 - \frac{1}{k}} + a)$, where $a$ is the size of the answer. The memory requirement for the tree itself is $O(kn)$.

Operations with k-d trees

Structure

The tree structure described in C ++:

const int N = 10;    // number of key spaces (dimensions)

struct Item          // element structure
{
  int   key[N];      // array of keys defining the element
  char* info;        // element information
};

struct Node          // tree node structure
{
  Item  i;           // element
  Node* left;        // left subtree
  Node* right;       // right subtree
};

The tree structure may vary depending on the details of the implementation of the algorithm. For example, a node may contain not one element, but an array, which increases the efficiency of the search.

Element Search Analysis

Obviously, the minimum number of elements examined is $1$, and the maximum is $O(h)$, where $h$ is the height of the tree. It remains to calculate the average number of elements examined, $A_n$.

Let $t$ be the element being searched for.

Consider the case $h = 3$. The elements found can be:

find($t_1$): $[(x_0 = t_1)]$; $A = 1$.

find($t_2$): $[(x_0 < t_1) \land (x_0 = t_2)]$; $A = 2$.

find($t_3$): $[(x_0 > t_1) \land (x_0 = t_3)]$; $A = 2$.

find($t_4$): $[(x_0 < t_1) \land (x_0 < t_2) \land (x_0 = t_4)]$; $A = 3$.

find($t_5$): $[(x_0 < t_1) \land (x_0 > t_2) \land (x_0 = t_5)]$; $A = 3$.

find($t_6$): $[(x_0 > t_1) \land (x_0 < t_3) \land (x_0 = t_6)]$; $A = 3$.

find($t_7$): $[(x_0 > t_1) \land (x_0 > t_3) \land (x_0 = t_7)]$; $A = 3$.

and so on for each key space. The average search length in one space is:

$A = \frac{1 + 2 + 2 + 3 + 3 + 3 + 3}{7} = \frac{17}{7} \approx 2.4$.

The average value is calculated by the formula: $A_n = \sum_{k=1}^{n} k \, p_{n,k}$

It remains to find the probability $p_{n,k}$. It equals $p_{n,k} = \frac{p_{A,k}}{p_n}$, where $p_{A,k}$ is the number of cases in which $A = k$, and $p_n$ is the total number of cases. It is not difficult to see that $p_{n,k} = \frac{2^{k-1}}{2^n - 1}$.

Substituting this into the formula for the average value:

$A_n = \sum_{k=1}^{n} k \, p_{n,k} = \frac{1}{2^n - 1} \sum_{k=1}^{n} k \, 2^{k-1} = \frac{2^n (n - 1) + 1}{2^n - 1}$

that is, $A_h = \frac{2^h (h - 1) + 1}{2^h - 1} = O(h)$, where $h$ is the height of the tree.

If we pass from the height of the tree to the number of elements, then:

$A_n \approx O\left(\log_2\left(\frac{n}{N} + 1\right)\right)$,

where $N$ is the number of elements in a node and $n$ is the total number of elements.

From this one can conclude that the more elements are contained in a node, the faster the search in the tree, since the tree height remains minimal; however, one should not store a huge number of elements per node, since with this method the whole tree can degenerate into an ordinary array or list.

Adding Items

Elements are added in the same way as in a regular binary search tree, with the only difference being that each level of the tree will also be determined by the space to which it belongs.

Tree descent algorithm:

for (int i = 0; tree; i = (i + 1) % N)  // i is the number of the current space
  if (x[i] < tree->t)                   // t is the median stored in the node
    tree = tree->left;                  // go to the left subtree
  else
    tree = tree->right;                 // go to the right subtree

Insertion is performed in $O(h)$, where $h$ is the height of the tree.

Deleting items

When deleting tree elements, several situations may arise.

  • Deleting a tree leaf is a fairly simple deletion when one node is deleted and the pointer of the ancestor node is simply nullified.
  • Removing a tree node (not a leaf) is a very complicated procedure in which you have to rebuild the entire subtree for a given node.

Sometimes the process of deleting a node is solved by modifications of the k-d tree. For example, if our node contains an array of elements, then when you delete the entire array, the tree node remains, but new elements are no longer written there.

Search for a range of items

The search is based on the usual tree descent, where each node is checked against the range. If the median of a node is less than or greater than the given range in the given space, the traversal continues along one of the branches of the tree. If the median of the node falls within the specified range, both subtrees must be visited.

Algorithm

Z - tree node
[(x_0_min, x_1_min, ..., x_n_min), (x_0_max, x_1_max, ..., x_n_max)] - the given range

void Array(Node*& Z)                       // range search
{
  if ([x_0_max, ..., x_n_max] < Z)         // the whole range lies left of the median
  {
    Z = Z->left;                           // left subtree
  }
  else if ([x_0_min, ..., x_n_min] > Z)    // the whole range lies right of the median
  {
    Z = Z->right;                          // right subtree
  }
  else                                     // the median falls inside the range:
  {
    Array(Z->right);                       // run the function for the right subtree
    Z = Z->left;                           // and view the left subtree
  }
}

Analysis

Obviously, the minimum number of elements examined is $O(h)$, where $h$ is the height of the tree. It is also obvious that the maximum is $O(2^h - 1)$, that is, viewing all elements of the tree. It remains to calculate the average number examined, $A_n$.

Let $[(x_{0_{min}}, \ldots, x_{n_{min}}), (x_{0_{max}}, \ldots, x_{n_{max}})]$ be the given range.

The original article about k-d trees gives the characteristic $A_n = O(h \cdot \log h)$ for a fixed range.

If we pass from the height of the tree to the number of elements, with $h \approx \log n$ this gives roughly $A_n = O(\log n \cdot \log \log n)$.

Finding the nearest neighbor

The search for the nearest element splits into two subtasks: determining a candidate nearest element, and searching for the nearest elements within a given range.

Given a tree, we descend it to a leaf by comparing the query coordinates with the node medians, and determine the probable nearest element by the condition $l_{min} = \sqrt{(x_0 - x[i]_0)^2 + (x_1 - x[i]_1)^2 + \ldots + (x_n - x[i]_n)^2}$. After that, the algorithm for finding the nearest element within a given range, determined by the radius $R = l_{min}$, is run from the root of the tree.

In this case, when finding a closer element, the search radius is adjusted.

Algorithm

Z - tree root | List - list of nearest elements | [x_0, x_1, ..., x_n] - the element whose nearest neighbor is sought | Len - current minimum length

Maybe_Near(Node*& Z)                        // search for a candidate nearest element
{
  while (Z)
  {
    for (i = 0; i < N; i++)                 // check the elements in the node
    {
      len_cur = sqrt((x_0 - x[i]_0)^2 + (x_1 - x[i]_1)^2 + ... + (x_n - x[i]_n)^2);
      if (Len > len_cur)                    // found a closer element
      {
        Len = len_cur;                      // set the new length
        Delete(List);                       // clear the list
        Add(List);                          // add the new element to the list
      }
      if ((x_0 == x[i]_0) && (x_1 == x[i]_1) && ... && (x_n == x[i]_n))
        return 1;                           // exact match - stop
    }
    if ([x_0, ..., x_n] < Z) Z = Z->left;   // left subtree
    else                     Z = Z->right;  // right subtree
  }
}

Near(Node*& Z)                              // nearest search in the given range
{
  while (Z)
  {
    for (i = 0; i < N; i++)                 // check the elements in the node
    {
      len_cur = sqrt((x_0 - x[i]_0)^2 + ... + (x_n - x[i]_n)^2);
      if (Len > len_cur)                    // found a closer element
      {
        Len = len_cur;                      // set the new length
        Delete(List);                       // clear the list
        Add(List);                          // add the new element to the list
      }
      else if (Len == len_cur)
        Add(List);                          // equal distance - also add to the list
    }
    if ([x_0, ..., x_n] + Len > Z)          // the search sphere crosses the median
    {
      Near(Z->right);                       // view both subtrees
      Z = Z->left;
    }
    else if ([x_0, ..., x_n] < Z) Z = Z->left;   // left subtree
    else                          Z = Z->right;  // right subtree
  }
}

Analysis

Obviously, the minimum number of elements examined is $O(h)$, where $h$ is the height of the tree. It is also obvious that the maximum is $O(2^h - 1)$, that is, viewing all nodes. It remains to calculate the average, $A_n$.

Given the element whose nearest neighbor is sought, the problem splits in two: finding the nearest element in a node, and finding the nearest element within a given range. The first part requires one descent through the tree, i.e. $O(h)$.

For the second part, as we calculated above, searching for elements in a given range costs $O(h \cdot \log h)$. To obtain the average, simply add the two values: $A_n = O(h) + O(h \cdot \log h) = O(h \cdot \log h)$.


References

  • libkdtree++, an open-source STL-like implementation of k-d trees in C++.
  • FLANN and its fork nanoflann, efficient C++ implementations of k-d tree algorithms.
  • kdtree, a simple C library for working with k-d trees.
  • libANN, an Approximate Nearest Neighbor Library that includes a k-d tree implementation.
  • Caltech Large Scale Image Search Toolbox: a Matlab toolbox implementing a randomized k-d tree for fast approximate nearest neighbor search, in addition to LSH, Hierarchical K-Means, and Inverted File search algorithms.
  • Heuristic Ray Shooting Algorithms, pp. 11 and after.
  • Into, which contains open-source implementations of exact and approximate (k)NN search methods using k-d trees in C++.

This article is entirely devoted to KD-Trees: I describe the subtleties of building KD-Trees and of implementing the "nearest neighbor" search functions in a KD-Tree, as well as possible pitfalls that arise while solving particular subtasks of the algorithm. In order not to confuse the reader with terminology (plane, hyperplane, etc.), and for convenience in general, it is assumed that the action takes place in three-dimensional space; where necessary, I note that we work in a space of a different dimension. In my opinion, the article will be useful both to programmers and to anyone interested in algorithms: someone will find something new, and someone will simply review the material and perhaps add to the article in the comments. In any case, read on.

Introduction

A kd-tree (K-dimensional tree) is a special "geometric" data structure that splits K-dimensional space into "smaller parts" by cutting it with hyperplanes (K > 3), planes (K = 3), lines (K = 2), or, in the one-dimensional case, points (searching in such a tree then resembles binary search).

It is logical that such a partition is usually used to narrow the search range in K-dimensional space: for example, searching for the object (vertex, sphere, triangle, etc.) nearest to a point, projecting points onto a 3D grid, ray tracing (actively used in Ray Tracing), etc. The objects of the space are placed in special parallelepipeds, bounding boxes (we call a bounding box a parallelepiped that encloses the original set of objects, or a single object if we build the box for it alone; for a point, a bounding box with zero surface area and zero volume is taken), whose sides are parallel to the coordinate axes.

Node division process

So, before using a KD-Tree, it must be built. All objects are placed in one large bounding box that encloses the objects of the original set (each object is bounded by its own box), which is then divided by a plane parallel to one of its sides into two. Two new nodes are added to the tree. Each of the resulting parallelepipeds is divided in the same way, and so on. The process ends either by a special criterion (see SAH), or upon reaching a certain tree depth, or upon reaching a certain number of elements inside a tree node. Some elements can enter both the left and the right node simultaneously (for example, when triangles are the tree elements).

I will illustrate this process with the example of many triangles in 2D:


Fig. 1 shows the original set of triangles. Each triangle is placed in its own bounding box, and the whole set is inscribed in one large root bounding box.


In fig. 2 the root node is divided into two along OX: triangles 1, 2, 5 fall into the left node, and 3, 4, 5 into the right.


In fig. 3 the resulting nodes are divided in two again, and triangle 5 enters each of them. When the process ends, we get 4 leaf nodes.

The choice of the splitting plane for a tree node is of great importance. There are a huge number of ways to do this; I give only some of the methods most common in practice (it is assumed that the original objects are placed in one large bounding box and the split is parallel to one of the coordinate axes):

Method 1: Select the largest side of the bounding box; the cutting plane then passes through the middle of the selected side.

Method 2: Split by the median: sort all primitives by one of the coordinates and take as the median the element (or the center of the element) in the middle position of the sorted list. The cutting plane passes through the median so that the numbers of elements on the left and right are approximately equal.

Method 3: Alternate the sides when splitting: at depth 0 we split through the middle of the side parallel to OX, at the next depth level through the middle of the side parallel to OY, then along OZ, etc. If we have "walked along all the axes," the cycle starts anew. The exit criteria are described above.

Method 4: The "smartest" option is to use SAH (Surface Area Heuristic), a function evaluating the cost of splitting the bounding box (described in detail below). SAH also provides a universal criterion for stopping node division.

Methods 1 and 3 are good when tree construction speed matters (since it is trivial to find the middle of a side and draw a plane through it, sifting the elements of the original set to the "left" and "right"). At the same time, they quite often give a loose partition of space, which negatively affects the basic KD-Tree operations (especially ray traversal in the tree).

Method 2 allows you to achieve good results, but requires a considerable amount of time that is spent on sorting the elements of the node. Like methods 1, 3, it is quite simple to implement.

Of greatest interest is the method using "smart" SAH heuristics (method 4).

The bounding box of a tree node is divided by N planes (parallel to the axes) into N + 1 parts on each side (the parts are usually equal; in fact, the number of planes per side can be arbitrary, but for simplicity and efficiency a constant is taken). Then, at each possible intersection point of a plane with the bounding box, the value of the special SAH function is computed, and the division is performed by the plane with the smallest SAH value (in the figure below I assumed the minimum is reached at one particular plane, so the division is performed by it; the figure shows the 2D case for clarity):

The SAH value for the current plane is calculated in the standard form:

SAH = cT + cI * (SAL / SA * NL + SAR / SA * NR)

where SA is the surface area of the current node, SAL and SAR are the surface areas of the left and right children, NL and NR are the numbers of elements falling into them, and cT and cI are the cost constants discussed below.

In my KD-Tree implementation, I divide each side into 33 equal parts using 32 planes. According to my test results, this hits the "golden mean" between the speed of the tree's basic operations and the speed of tree construction.

As a SAH heuristic, I use the function shown in the figure above.

Note that to make the decision we only need the minimum of this function over all cutting planes. Therefore, using elementary properties of inequalities, and dropping the multiplication by 2 when computing node surface areas in 3D (SAR, SAL, SA), this formula can be greatly simplified; the full computation is performed only once per node, when evaluating the exit criterion of the division function. This simple optimization gives a significant increase in tree construction speed (x2.5).

To compute the SAH value efficiently, one must be able to quickly determine how many node elements lie to the right of a given plane and how many to the left. The results will be unsatisfactory with a crude, so-called brute-force approach of quadratic complexity. However, with the binned method the situation changes significantly for the better. A description of this method follows.

Suppose we split a side of the bounding box into N equal parts (the number of planes is N-1) and store each bounding box as a pair of coordinates (pointMin, pointMax; see fig. 1). Then in one pass over all the node's elements we can determine exactly, for each plane, how many elements lie to the right of it and how many to the left. Create two arrays of N elements each, initialized with zeros; call them aHigh and aLow. Run sequentially through all the elements of the node. For the current element, determine into which part its pointMin and pointMax fall; this gives two indices per element, call them iHigh (for pointMax) and iLow (for pointMin). Then do: aHigh[iHigh] += 1, aLow[iLow] += 1.

After the pass over all elements we have the filled arrays aHigh and aLow. For the aHigh array compute prefix sums, and for the aLow array compute suffix sums (see the figure and the sketch below). It turns out that the number of elements strictly to the right of the plane with index i equals aLow[i], the number lying strictly to the left is aHigh[i-1], and the number of elements entering both the left and the right node is AllElements - aLow[i] - aHigh[i-1].
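A compact sketch of this counting scheme, with plane i taken to lie between bins i-1 and i (the function itself and its names are an illustration, not the article's code):

#include <vector>

struct SideCounts { int left, right, both; };

SideCounts CountSides(const std::vector<int>& iLow,  // bin of pointMin per element
                      const std::vector<int>& iHigh, // bin of pointMax per element
                      int numBins, int plane)
{
    std::vector<int> aLow(numBins, 0), aHigh(numBins, 0);
    for (size_t e = 0; e < iLow.size(); e++)
    {
        aLow[iLow[e]]   += 1;
        aHigh[iHigh[e]] += 1;
    }
    // prefix sums over aHigh: aHigh[i] = elements whose box ends in bins 0..i
    for (int i = 1; i < numBins; i++) aHigh[i] += aHigh[i - 1];
    // suffix sums over aLow: aLow[i] = elements whose box starts in bins i..N-1
    for (int i = numBins - 2; i >= 0; i--) aLow[i] += aLow[i + 1];

    int all   = (int)iLow.size();
    int left  = (plane > 0) ? aHigh[plane - 1] : 0;  // entirely left of the plane
    int right = (plane < numBins) ? aLow[plane] : 0; // entirely right of the plane
    return { left, right, all - left - right };      // elements entering both nodes
}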

The problem is solved, and an illustration of this simple process is given below:

I would like to note that knowing in advance the number of elements to the left and right of the cutting plane allows us to pre-allocate the required amount of memory for them (after finding the SAH minimum we must pass through all the elements again and place each into the proper array; a naive push_back without a prior reserve leads to constant memory reallocation, a very expensive operation), which has a positive effect on the speed of the tree construction algorithm (x3.3).

Now I would like to describe in more detail the purpose of the constants used in the SAH formula, and to discuss the criteria for stopping the division of a node.

By tuning the constants cI and cT one can achieve a denser (or, conversely, looser) tree structure at the cost of running time. In many articles devoted primarily to building KD-Trees for Ray Tracing rendering, the authors use the value cI = 1.0 and various values of cT: the higher the cT value, the faster the tree is built, and the worse the ray traversal results in such a tree.

In my implementation, I use the tree to search for a "neighbor"; paying due attention to the test results while searching for suitable coefficients, one can see that high cT values produce nodes that are not completely filled with elements. To avoid this, I decided to set cT to 1.0 and to test the cI value on data sets of different sizes. As a result, a fairly dense tree structure was obtained, at the price of a noticeable increase in construction time. This had a positive effect on the "neighbor" search results: the search speed increased.

The criterion for stopping the division of a node is quite simple:

In other words: if the cost of tracing the child nodes is greater than the cost of tracing the parent node, then the division must be stopped.

Now that we know how to divide a KD-Tree node, I will describe the degenerate cases in which the number of elements in a node is quite large while the element-count stopping criterion would drive the algorithm to infinite depth. Attention to the picture (triangles in 2D, for example):

I call such situations a "fan" (objects having a common contact point; coincident objects also fall into this category). It can be seen that no matter how we draw the cutting plane, the central point falls into one of the nodes, and with it the triangles for which it is common.

Tree building process

We have learned to split a tree node; now it remains to apply this knowledge to building the whole tree. Below I describe my implementation of KD-Tree construction.

The tree is built from the root. Each tree node stores pointers to the left and right subtrees; if a node has none, it is considered a leaf. Each node stores a bounding box describing the node's objects. In leaf (and only leaf!) nodes I store the indices of the objects that fall into the node. During construction, memory for nodes is allocated in portions (i.e., for K nodes at once): first, it is more efficient for the memory manager; second, contiguous elements are ideal for caching. Such an approach makes it impossible to store tree nodes in a vector, because adding new elements to a vector can lead to reallocation of memory for all existing elements elsewhere.

Accordingly, pointers to subtrees would lose all meaning. I use a container of type std::list, each element of which is a std::vector with a predefined (constant) size. I build the tree multithreaded (using OpenMP), that is, each subtree is built in a separate thread (x4 speedup). Move semantics (C++11) (+10% speed) are ideal for transferring indices into a leaf node.

Finding a "neighbor" to a point in the KD Tree

  So, the tree is built, let's move on to the description of the implementation of the search operation in the KD-Tree.

Suppose that we carry out a search in a set of triangles: a point is given, it is required to find the triangle closest to it.

Solving the problem by brute force is unprofitable: for a set of N points and M triangles the complexity is O(N * M). In addition, we would like the algorithm to compute the distance from a point to a triangle "honestly" and to do it quickly.

We will use the KD Tree and do the following:

Step 1. Find the leaf node closest to the given point (in each node, as we know, we store a bounding box, so we can simply compute the distance to the center ((pointMax + pointMin) * 0.5) of the node's bounding box); denote it nearestNode.

Step 2. Search by enumeration among all the elements of the found node (nearestNode). Denote the resulting minimum distance minDist.

Step 3. Construct a sphere centered at the query point with radius minDist. Check whether this sphere lies entirely inside nearestNode (that is, without any intersections with the sides of the node's bounding box). If it does, the nearest element has been found; if not, go to step 4.

Step 4. Starting from the root of the tree, search for the nearest element within the radius: descending the tree, check whether the right or left node intersects the sphere (a node may also lie entirely inside the sphere, or the sphere inside a node). If some node is intersected, perform the same check for its internal nodes. Upon reaching a leaf node, run an exhaustive neighbor search in it and compare the result with the sphere's radius; if the radius is greater than the found distance, update the radius with the computed minimum. Subsequent descents of the tree use the updated sphere radius (in a recursive implementation the radius is simply passed to the function by reference and updated where necessary).

The following figure helps to understand the essence of the algorithm described above:

According to the drawing: suppose we found the leaf node nearest to the given point (blue in the figure, the point highlighted in red); then a neighbor search inside the node yields the triangle with index 1; however, as you can see, this is not correct. The circle with radius R intersects the adjacent node, so we must also search in that node and then compare the newly found minimum with the current one. As a result, it becomes apparent that the true neighbor is the triangle with index 2.
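As a compact illustration of steps 3-4, here is a recursive sketch of the radius search. The SBBox and D3 types mirror the naming used in the code below, but the node layout and the helpers are assumptions of the sketch:

struct D3 { double x, y, z; };
struct SBBox { D3 m_minBB, m_maxBB; };

struct KDNode
{
  SBBox   box;
  KDNode* left;
  KDNode* right;
  // leaf: indices of the contained objects, omitted in this sketch
};

// squared distance from a point to a box (zero when the point is inside)
double SqDistPointBBox(const D3& p, const SBBox& b)
{
  double d = 0.0, t;
  t = (p.x < b.m_minBB.x) ? b.m_minBB.x - p.x : (p.x > b.m_maxBB.x ? p.x - b.m_maxBB.x : 0.0); d += t * t;
  t = (p.y < b.m_minBB.y) ? b.m_minBB.y - p.y : (p.y > b.m_maxBB.y ? p.y - b.m_maxBB.y : 0.0); d += t * t;
  t = (p.z < b.m_minBB.z) ? b.m_minBB.z - p.z : (p.z > b.m_maxBB.z ? p.z - b.m_maxBB.z : 0.0); d += t * t;
  return d;
}

void NearestInRadius(const KDNode* node, const D3& p, double& minDist)
{
  if (node == 0 || SqDistPointBBox(p, node->box) > minDist * minDist)
    return;                                  // the search sphere cannot reach this node
  if (node->left == 0 && node->right == 0)   // a leaf: brute-force over its objects,
  {                                          // shrinking minDist on every closer hit
    // minDist = std::min(minDist, DistToLeafObjects(node, p));  (assumed helper)
    return;
  }
  NearestInRadius(node->left,  p, minDist);  // the updated radius propagates by
  NearestInRadius(node->right, p, minDist);  // reference, as described in step 4
}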

Now I would like to talk about the effective implementation of intermediate operations used in the “near” search algorithm.

When searching for a neighbor in a node, you must be able to quickly calculate the distance from a point to a triangle. I will describe the simplest algorithm:

Find the projection of point A (the point whose nearest element we are seeking) onto the triangle's plane; denote the found point P. If P lies inside the triangle, the distance from A to the triangle equals the length of segment AP; otherwise, find the distances from A to each of the triangle's sides (segments) and choose the minimum. The problem is solved.
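A sketch of this simple method; Vec3 and its operators are local helpers of the sketch, and a non-degenerate triangle is assumed:

#include <cmath>
#include <algorithm>

struct Vec3 { double x, y, z; };

static Vec3   operator-(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3   operator+(Vec3 a, Vec3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static Vec3   operator*(Vec3 a, double s) { return { a.x * s, a.y * s, a.z * s }; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3   cross(Vec3 a, Vec3 b)
{ return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x }; }
static double length(Vec3 a) { return std::sqrt(dot(a, a)); }

// distance from point p to segment [a, b]
static double DistPointSegment(Vec3 p, Vec3 a, Vec3 b)
{
  Vec3 ab = b - a;
  double t = dot(p - a, ab) / dot(ab, ab);
  t = std::max(0.0, std::min(1.0, t));      // clamp the projection onto the segment
  return length(p - (a + ab * t));
}

double DistPointTriangle(Vec3 p, Vec3 a, Vec3 b, Vec3 c)
{
  Vec3 n = cross(b - a, c - a);             // (unnormalized) triangle normal
  double nn = dot(n, n);
  Vec3 P = p - n * (dot(p - a, n) / nn);    // projection of p onto the plane
  // same-side test: is the projection P inside the triangle?
  bool inside = dot(cross(b - a, P - a), n) >= 0.0 &&
                dot(cross(c - b, P - b), n) >= 0.0 &&
                dot(cross(a - c, P - c), n) >= 0.0;
  if (inside)
    return length(p - P);                   // the distance is the length of AP
  return std::min({ DistPointSegment(p, a, b),
                    DistPointSegment(p, b, c),
                    DistPointSegment(p, c, a) });
}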

The described algorithm is not the most efficient. A more effective approach relies on searching and analyzing (finding the minimum of the gradient, etc.) a function whose values are all possible distances from the given point to any point of the triangle. The method is rather complicated to grasp and, in my opinion, deserves a separate article (it is implemented in my code, and you will find a link to the code below). According to the test results, this method turned out to be 10 times faster than the one described above.

Determining whether a sphere centered at point O with radius R lies entirely inside a specific node, represented by its bounding box, is simple (3D):

inline bool isSphereInBBox(const SBBox& bBox, const D3& point, const double& radius)
{
  return (bBox.m_minBB.x < point.x - radius && bBox.m_maxBB.x > point.x + radius &&
          bBox.m_minBB.y < point.y - radius && bBox.m_maxBB.y > point.y + radius &&
          bBox.m_minBB.z < point.z - radius && bBox.m_maxBB.z > point.z + radius);
}
With the algorithm for determining the intersection of a sphere with a node's bounding box (or a node inside a sphere, or a sphere inside a node), things are somewhat different. Again, I will illustrate (the picture is borrowed) and give the code that performs this procedure (in 2D; 3D is similar):


bool intersects(CircleType circle, RectType rect)
{
  circleDistance.x = abs(circle.x - rect.x);
  circleDistance.y = abs(circle.y - rect.y);

  if (circleDistance.x > (rect.width/2 + circle.r))  { return false; }
  if (circleDistance.y > (rect.height/2 + circle.r)) { return false; }

  if (circleDistance.x <= (rect.width/2))  { return true; }
  if (circleDistance.y <= (rect.height/2)) { return true; }

  cornerDistance_sq = (circleDistance.x - rect.width/2)^2 +
                      (circleDistance.y - rect.height/2)^2;

  return (cornerDistance_sq <= (circle.r^2));
}
First (the first pair of lines) we reduce the calculation of the four quadrants to one. The next pair of lines checks whether the circle lies in the green area; if so, there is no intersection. The next pair checks whether the circle enters the orange or gray areas of the picture; if so, an intersection is detected.

In effect, this computation returns false for all circles whose center lies within the red region, and true for all circles whose center lies within the white region.

In general, this code provides what we need (I gave here an implementation of the code for 2D, however, in my KD-Tree code I use the 3D version).

It remains to talk about the speed of the search algorithm, as well as about critical situations that slow down the search in the KD-Tree.

As stated above, "fan" situations generate a large number of elements inside a node; the more there are, the slower the search. Moreover, if all elements are equidistant from the given point, the search runs in O(N) (a set of points lying on the surface of a sphere, with the nearest one sought from the sphere's center). However, if these situations are removed, the average search amounts to a descent through the tree with enumeration of elements in several nodes, i.e. O(log(N)). Clearly, the search speed depends on the number of leaf nodes of the tree visited.

Consider the following two figures:


The essence of these figures: if the point whose nearest element we seek is located very far from the original bounding box of the set, then a sphere with radius minDist (the distance to the current nearest) will intersect many more nodes than the same sphere centered at a point much closer to the original bounding box (naturally, minDist changes accordingly). In general, the search for the neighbors of a very distant point is slower than for a point close to the original set. My tests confirmed this.

Results and Summary

As a result, I would like to add a few words about my implementation of the KD-Tree and give the results. The code was designed so that it can easily be adapted to any objects of the original set (triangles, spheres, points, etc.): all you need to do is create a derived class with overridden virtual functions. My implementation also allows passing a special user-defined Splitter class: this class, or rather its virtual split method, determines exactly where the cutting plane will pass. In my implementation I provide a class that makes decisions based on SAH. Here I note that in many articles devoted to accelerating SAH-based KD-Tree construction, the authors use simpler techniques for finding the cutting plane (such as center or median splits) at the initial tree depths (in general, when the number of elements in a node is large), applying the SAH heuristic only when the number of elements in a node becomes small.

My implementation contains no such tricks, but they can be added quickly (one only needs to extend the KD-Tree constructor with a new parameter and call the construction member function with the desired splitter, observing the necessary restrictions). Searching in the tree is multithreaded. All calculations are performed in double precision (double). The maximum tree depth is set by a constant (32 by default). Several #defines allow, for example, traversing the tree during search without recursion (with recursion, however, exit is faster; all nodes are elements of a certain vector, i.e. located close together in memory, and are therefore cached well). Together with the code I provide test data sets (3D models in a "modified OFF format" with various numbers of triangles, from 2 to 3,000,000). The user can dump the constructed tree (DXF format) and view it in a suitable graphics program. The program also logs (this can be enabled/disabled) the quality of the tree: the tree depth, the maximum number of elements in a leaf node, the average number of elements in leaf nodes, and the running time. I by no means claim that the resulting implementation is ideal; I myself know the places where I cut corners (for example, the allocator is not passed as a template parameter, there is frequent C-style code (I do not read and write files using streams), possibly undetected bugs, etc.; there is something to fix). And of course, the tree is made and optimized strictly for working in 3D space.

I tested the code on a laptop with the following characteristics: Intel Core i7-4750HQ, 4 cores (8 threads) at 2 GHz, 8 GB RAM, a Win x64 application on Windows 10. The SAH coefficients were cT = 1.0, cI = 1.5. Speaking of results: on 1.5 million triangles the tree is built in less than 1.5 seconds, and on 3 million in 2.4 seconds. For 1.5 million points and 1.5 million triangles (with the points located not very far from the original model), the search completed in 0.23 seconds; if the points are moved away from the model, the time grows, up to 3 seconds. For 3 million points (again, located close to the model) and 3 million triangles, the search takes about 0.7 seconds. I hope nothing is mixed up. Finally, an illustration of the constructed KD-Tree:

). k-d trees are a special kind of binary search tree.

Mathematical description

K-tree is an unbalanced search tree for storing points from \\ mathbb (R) ^ k. It offers an R-tree-like search capability in a given range of keys. To the detriment of simplicity of queries, memory requirements O (kn)   instead O ((log (n)) ^ (k-1)).

There are homogeneous and heterogeneous k-d trees. In homogeneous k-d trees, each node stores a record. In a heterogeneous version, internal nodes contain only keys, leaves contain links to records.

In a heterogeneous k-d tree H_i (t) \u003d (x_1, x_2, \\ ldots, x_ (i-1), t, x_ (i + 1), \\ ldots, x_k)   at 1 \\ leq i \\ leq k   parallel to the axis (k-1)-dimensional hyperplane at a point t. For the root, you need to divide the points through the hyperplane H_1 (t)   into two possibly equally large sets of points and write t   to the root, to the left of this, all points at which x_1 on the right are those with x_1\u003e t. For the left subtree, you need to divide the points again into a new "divided plane" H_2 (t), a t   stored in the internal node. To the left of this, all points at which x_2 . This continues recursively over all spaces. Then everything starts again from the first space until each point can be clearly identified through a hyperplane.

K-d tree can be built in O (n (k + log (n))). Range search can be done in O (n ^ (1- \\ frac (1) (k)) + a), wherein a   indicates the size of the response. The memory requirement for the tree itself is limited. O (kn).

Operations with k-d trees

Structure

The tree structure described in C ++:

const int N \u003d 10; // number of key spaces struct Item (// element structure int key [N]; // array of keys defining the element char * info; // element information); struct Node (// structure of the tree node Item i; // element Node * left; // left subtree Node * right; // right subtree)

The tree structure may vary depending on the details of the implementation of the algorithm. For example, a node may contain not one element, but an array, which increases the efficiency of the search.

Element Search Analysis

Obviously, the minimum number of items viewed is 1, and the maximum number of items viewed is O (h)where h   is the height of the tree. It remains to calculate the average number of items viewed A_n.

  - the given element.

Consider the case h \u003d 3. Items found can be:

find (t_1): [(x_0 \u003d t_1)]; A \u003d 1.

find (t_2): [(x_0

find (t_3): [(x_0\u003e t_1) \\ land (x_0 \u003d t_3)]; A \u003d 2.

find (t_4): [(x_0

find (t_5): [(x_0 t_2) \\ land (x_0 \u003d t_5)]; A \u003d 3.

find (t_6): [(x_0

find (t_7): [(x_0 t_3) \\ land (x_0 \u003d t_7)]; A \u003d 3.

and so for each key space. The average search length in one space is:

A \u003d \\ frac (1 + 2 + 2 + 3 + 3 + 3 + 3) (7) \u003d \\ frac (17) (7) \\ approx 2,4.

The average value is calculated by the formula: A_n \u003d \\ sum_ (k \u003d 1) ^ n kp_ (n, k)

It remains to find the probability p_ (n, k). She is equal p_ (n, k) \u003d \\ frac (p_ (A, k)) (p_ (n))where p_ (A, k)   - the number of cases when A \u003d k   and p_ (n)   - total number of cases. It’s not difficult to guess that p_ (n, k) \u003d \\ frac (2 ^ (k-1)) (2 ^ n-1).

Substitute this in the formula for the average value:

A_n \u003d \\ sum_ (k \u003d 1) ^ n kp_ (n, k) \u003d \\ sum_ (k \u003d 1) ^ n (k \\ frac (2 ^ (k-1)) (2 ^ n-1)) \u003d \\ \\ frac (1) (2 ^ n-1) \\ sum_ (k + 1 \u003d 1) ^ n (((k + 1)) 2 ^ k) \u003d \\ frac (1) (2 ^ n-1) (\\ \\ frac (1) (2 ^ n-1) (\\ sum_ (k \u003d 1) ^ n (k2 ^ k) + \\ sum_ (k \u003d 1) ^ n (2 ^ k) - 2 ^ n - n2 ^ n )

\u003d \\ frac (1) (2 ^ n-1) (n2 ^ (n + 2) - (n + 1) 2 ^ (n + 1) +2 - 2 ^ n + 2 ^ 3 -1 - n2 ^ n ) \u003d \\ frac (2 ^ n (n-1) +1) (2 ^ n-1)

i.e

A_h \u003d \\ frac (2 ^ h (h-1) +1) (2 ^ h-1),

  - tree height. A_n \u003d ~ O (\\ frac (2 ^ h (h-1) +1) (2 ^ h-1)) \u003d ~ O (h \\ frac (2 ^ h) (2 ^ h-1) - 1) \u003d ~ O (log (\\ frac (n) (N) +1) \\ frac (2 ^ (log (\\ frac (n) (N) +1))) (2 ^ (log (\\ frac (n) (N ) +1)) - 1) - 1) \u003d ~ O (log (\\ frac (n) (N) +1) \\ frac (n + N) (n) -1) \u003dwhere h   \u003d ~ O (log (\\ frac (n) (N) +1) ^ (\\ frac (n + N) (n)) - 1)

If we go from the height of the tree to the number of elements, then:

N

  - the number of elements in the node.where Elements are added in the same way as in a regular binary search tree, with the only difference being that each level of the tree will also be determined by the space to which it belongs. for (int i \u003d 0; tree; i ++) // i is the number of the space if (tree-\u003e x [i]

From this you can make conclusionso that the more elements are contained in the node, the faster the search will take place in the tree, since the height of the tree will remain minimal, but you should not store a huge number of elements in the node, since with this method the whole tree can degenerate into a regular array or list .

Adding Items

t) // t - median tree \u003d tree-\u003e left; // go to the left subtree else tree \u003d tree-\u003e right; // go to the right subtree

Tree promotion algorithm:

adding is done in< tree->when deleting tree elements, several situations may arise:

Deleting a tree leaf is a fairly simple deletion when one node is deleted and the pointer of the ancestor node is simply nullified. O (h)where h   \u003d ~ O (log (\\ frac (n) (N) +1) ^ (\\ frac (n + N) (n)) - 1)

Delete items

When deleting tree elements, several situations may arise:

  • Deleting a leaf of the tree is a fairly simple deletion: one node is removed, and the corresponding pointer of its parent is simply set to null. It takes O(h), where h = O(\log_2(\frac{n}{N} + 1)) is the height of the tree.
  • Removing an internal node of the tree (not a leaf) is a much more complicated procedure, in which the entire subtree rooted at that node has to be rebuilt.

Sometimes deletion is handled by a modification of the k-d tree. For example, if a node contains an array of elements, then when the whole array is deleted, the tree node remains, but new elements are no longer written to it.

Search for a range of items

The search is based on a normal descent of the tree, in which each node is checked against the range. If the median of a node is less than or greater than the given range in the given space, the traversal continues along one of the branches of the tree. If the median of the node falls within the given range, both subtrees must be visited.

Algorithm

Z — a node of the tree
[(x_{0_{min}}, x_{1_{min}}, x_{2_{min}}, ..., x_{n_{min}}), (x_{0_{max}}, x_{1_{max}}, x_{2_{max}}, ..., x_{n_{max}})] — the given range

    Function Array(Node*& Z) {
        If (the range lies entirely below the median of Z)
            Z = Z->left;          // go to the left subtree
        Else If (the range lies entirely above the median of Z)
            Z = Z->right;         // go to the right subtree
        Else {                    // view both subtrees
            Array(Z->right);      // run the function for the right subtree
            Z = Z->left;          // view the left subtree
        }
    }
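For comparison, here is a runnable C++ sketch of the same range search, assuming each node stores a single point and the splitting axis cycles with depth; the names are illustrative:

    #include <array>
    #include <vector>

    constexpr int K = 3;  // dimensionality (an assumption of the sketch)
    struct Node { std::array<double, K> x; Node* left; Node* right; };

    // Collect every point that lies inside [lo, hi] in all coordinates,
    // descending only into subtrees the range can intersect.
    void range_search(const Node* z, const std::array<double, K>& lo,
                      const std::array<double, K>& hi,
                      std::vector<const Node*>& out, int depth = 0) {
        if (!z) return;
        int i = depth % K;                   // splitting axis at this level
        bool inside = true;
        for (int d = 0; d < K; ++d)
            if (z->x[d] < lo[d] || z->x[d] > hi[d]) { inside = false; break; }
        if (inside) out.push_back(z);        // the node itself is in the range
        if (lo[i] <= z->x[i])                // the range reaches the left half-space
            range_search(z->left, lo, hi, out, depth + 1);
        if (hi[i] >= z->x[i])                // the range reaches the right half-space
            range_search(z->right, lo, hi, out, depth + 1);
    }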

Analysis

Obviously, the minimum number of items viewed is O(h), where h is the tree height. It is also obvious that the maximum number of items viewed is O(2^h - 1), that is, viewing all elements of the tree. It remains to calculate the average number of items viewed, A_n.

[(x_{0_{min}}, x_{1_{min}}, x_{2_{min}}, ..., x_{n_{min}}), (x_{0_{max}}, x_{1_{max}}, x_{2_{max}}, ..., x_{n_{max}})] — the given range.

The original article about k-d trees gives this characteristic: A_n = O(h \cdot \log h) for a fixed range.

If we go from the height of the tree to the number of elements, then with h \approx \log_2 n this becomes: A_n = O(\log n \cdot \log(\log n)).

Finding the nearest neighbor

The search for the nearest element is divided into two subtasks: determining a candidate nearest element and searching for the nearest elements in a given range.

Given a tree tree. We descend the tree to a leaf, at each level choosing a branch by the condition tree->x[i] < t (or \ge t), and take the element reached as the candidate nearest element, at distance

l_{min} = \sqrt{(x_0 - x[i]_0)^2 + (x_1 - x[i]_1)^2 + ... + (x_n - x[i]_n)^2}.

After that, the algorithm for finding the nearest element in a given range, with the range determined by the radius R = l_{min}, is started from the root of the tree.

The search radius is adjusted when a closer element is found.

Algorithm

Z — tree root | List — list of the nearest elements | x = (x_0, x_1, ..., x_n) — the element for which the nearest is sought | Len — minimum length

    Maybe_Near(Node*& Z) {                // search for the closest possible element
        While (Z) {
            // check the elements in the node
            for (i = 0; i < N; i++) {
                len_cur = distance from x to the i-th element of Z;
                If (len_cur < Len) {      // the new length is less than the current one
                    Len = len_cur;        // set the new length
                    Delete(List);         // clear the list
                    Add(List);            // add the new element to the list
                } Else If (the lengths are equal)
                    Add(List);            // add the new element to the list
                If ((x_0 == x[i]_0) && (x_1 == x[i]_1) && ... && (x_n == x[i]_n))
                    Return 1;             // exact match: the nearest element is found
            }
            If (x < Z) Z = Z->left;       // left subtree
            If (x > Z) Z = Z->right;      // right subtree
        }
    }

    Near(Node*& Z) {                      // search for the nearest element in the given range
        While (Z) {
            // check the elements in the node, exactly as in Maybe_Near
            ...
            If (x + Len > Z) {            // the range extends past the median
                Near(Z->right);           // view both subtrees
                Z = Z->left;
            }
            If (x < Z) Z = Z->left;       // left subtree
            If (x > Z) Z = Z->right;      // right subtree
        }
    }
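Below is a compact runnable C++ sketch of nearest-neighbor search. It folds the two phases described above into a single branch-and-bound descent, shrinking the radius as closer elements are found; the squared-distance convention and all names are choices of the sketch:

    #include <array>
    #include <limits>

    constexpr int K = 3;  // dimensionality (an assumption of the sketch)
    struct Node { std::array<double, K> x; Node* left; Node* right; };

    double dist2(const std::array<double, K>& a, const std::array<double, K>& b) {
        double s = 0.0;  // squared distance; the square root is not needed for comparisons
        for (int d = 0; d < K; ++d) s += (a[d] - b[d]) * (a[d] - b[d]);
        return s;
    }

    // First follow the side of the splitting plane containing the query,
    // then revisit the other side only if the current radius still crosses it.
    void nearest(const Node* z, const std::array<double, K>& q,
                 const Node*& best, double& best_d2, int depth = 0) {
        if (!z) return;
        double d2 = dist2(z->x, q);
        if (d2 < best_d2) { best_d2 = d2; best = z; }  // shrink the search radius
        int i = depth % K;
        double diff = q[i] - z->x[i];
        const Node* first  = diff < 0 ? z->left  : z->right;
        const Node* second = diff < 0 ? z->right : z->left;
        nearest(first, q, best, best_d2, depth + 1);
        if (diff * diff < best_d2)                     // the radius crosses the plane
            nearest(second, q, best, best_d2, depth + 1);
    }

    // Usage: const Node* best = nullptr;
    //        double best_d2 = std::numeric_limits<double>::infinity();
    //        nearest(root, q, best, best_d2);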

Analysis

Obviously, the minimum number of items viewed is O(h), where h is the height of the tree. It is also obvious that the maximum number of items viewed is O(2^h - 1), that is, viewing all nodes. It remains to calculate the average number of items viewed.

(x_0, x_1, x_2, ..., x_n) — the given element for which the nearest one is sought. The task splits into two subtasks: finding a candidate nearest element in a node and finding the nearest element in a given range. Solving the first subtask requires one descent of the tree, i.e. O(h).

For the second subtask, as we have already calculated, the search for elements in a given range is performed in O(h \cdot \log h). To find the average, it is enough to add these two values:

A_n = O(h) + O(h \cdot \log h) = O(h) \cdot (O(\log h) + 1).


