Given insertion order into binary search tree (BST), find distance between two nodes

Another interesting problem around binary search trees ->
Given the insertion order of nodes in a binary search tree, find the distance between two nodes in the tree.

The brute force way to approach this will be -
- To build the tree [which is O(nlogn)]
- Look for the two nodes in the tree and construct paths from root to node [which is O(logn)]
- Find the distance between two nodes based on the path found above [ worst case O(n) in time complexity but average case of O(log n) as well]
Hence, giving us an effective time complexity of O(nlogn)

But there is a better way to get to the distance without building the tree [thanks to a colleague of mine, Harish for suggesting the solution]
We rely on narrowing down the search space(maximum and minimum values) on the insertion order as we go inserting/finding nodes, pretty much similar to finding if a given tree is a valid binary search tree.

The solution for the problem relies on two variables 'rangeMin' and 'rangeMax' for keeping tabs on the current search space/insertion space of a node. Using this, we figure out the path from the root node to the given node if a binary search tree was to be constructed using the nodes inserted in the order given without any balancing of the tree.
- rangeMin is updated when given node is greater than current node and current node is greater than current rangeMin(hence moving nearer to the given node from the bottom search space)
- rangeMax is updated when given node is lesser than current node and current node is lesser than current rangeMax(hence moving nearer to the given node from the upper search space)

For example, take the insertion order {9,1,5,7,8,10,12,4,6}
If a tree were to be constructed using the same, it would look like -
                                                  fig 1 - Binary Search Tree constructed using given insertion order

If we were looking to find path from root to node 4, the rangeMin, rangeMax changes and node path from root to node would look like this ->
rangeMin rangeMax Path from root to node
MIN_INT MAX_INT [ ]
MIN_INT 9 [9]
1 9 [9,1]
1 5 [9,1,5]
...Skipped as out of range ...Skipped as out of range ...Stays the same until 4 encountered
1 5 [9,1,5,4]

Similarly, the path from root to node 6 goes as follows -
rangeMin rangeMax Path from root to node
MIN_INT MAX_INT [ ]
MIN_INT 9 [9]
1 9 [9,1]
5 9 [9,1,5]
5 7 [9,1,5,7]
...Skipped as out of range ...Skipped as out of range ...Stays the same until 6 encountered
5 7 [9,1,5,7,6]


The code is available here

The time complexity for the above solution is -
- Look for the two nodes in the insertion order and construct paths from root to node [which is O(n)]
- Find the distance between two nodes based on the path found above [ worst case O(n) but average case of O(log n) as well]


Giving us an effective time complexity of O(n)

0 comments :