The “Optimal Binary Search Tree” problem is a classical dynamic programming problem where the goal is to construct a binary search tree (BST) with a given set of keys and associated probabilities, such that the expected cost of searching for any key is minimized.
Problem Overview
Given a set of keys k1,k2, along with their associated search probabilities p1,p2,(the probability of each key being searched), and possibly some “dummy” keys with their own probabilities q0,q1, (representing gaps between the actual keys), you are asked to construct the binary search tree in such a way that the total cost of searching for any key is minimized.
Cost of a Binary Search Tree
In a binary search tree, the cost of a search is proportional to the depth of the key being searched. The objective here is to minimize the expected search cost, which involves balancing the keys in a way that minimizes the weighted sum of depths.
Dynamic Programming Approach
Let’s define a few things:
- e[i,j] will represent the expected search cost of the optimal binary search tree that contains keys ki.
- w[i,j] will represent the total probability (including dummy keys) for the subtree containing keys ki.
- root[i,j] will represent the index of the key that serves as the root of the optimal binary search tree for keys ki.
Initialization
For the base case where there are no keys in the subtree:
- e[i,i−1]=qi i.e., the expected cost when there are no real keys is just the probability of the dummy key.
- w[i,i−1]=qi since the total probability when there are no real keys is just the probability of the dummy key.
Recurrence Relation
For a given range of keys ki, the expected search cost is minimized by selecting one of the keys as the root. The recurrence relation for e[i,j
Where:
- e[i,r−1] is the cost of the left subtree.
- e[r+1,j] is the cost of the right subtree.
Algorithm Steps:
- Precompute the total probabilities w[i,j] for each range of keys.
- Use dynamic programming to compute e[i,j]for all ranges of keys.
- Backtrack using the
root[i, j]
table to recover the structure of the optimal binary search tree.
Time Complexity
The time complexity of the solution is O(n3)O(n^3) due to the three nested loops used for filling the DP table (to compute e[i,j] for all pairs i,ji, j and to find the optimal root for each pair).
However, some optimized solutions can bring this down to O(n2) using a more efficient approach for selecting the root.