Binary Search Tree (BST)#
We have already seen binary search trees in our discussions about binary search in Chapter 4 on complexity and again in Chapter 7 where we made binary search with recursion. Both of these algorithms worked on sorted arrays, which we envisioned as binary search trees. The data could also be stored directly in a binary tree structure with links. This is important when we do not know the size of our data set in advance and we want the ability to add to grow our data structure dynamically.
A binary search tree is a special binary tree that is organized by a key. For any node, the nodes to the left contain keys that are “less than” the key of the node. Similarly, for any node, the nodes to the right contain keys that are “greater than” the key of the node. Just as in the array-based search, a search through a linked binary tree will find the item with one path down from root to a leaf, that is, in time \(\theta (\log n)\), provided that the tree is sufficiently balanced and is not linear or nearly linear. There are many algorithms for working with binary trees, including how to maintain balanced trees, which you may study in other textbooks. Here we will focus on a few algorithms to move around a binary search tree, with both a full search (in \(\theta (n)\)) and single path searches (in \(\theta (\log n)\) for a balanced tree).
Inserting an Item into a BST#
When an item is inserted into a BST, it must be attached to a leaf, unless it is the first item where it simply becomes the root. The trick is that the tree must remain sorted. This means that we must progress down one path in the tree to find where to insert our new node. Here is an algorithm:
Algorithm insertBST(newNode, cursor)
Assumes: newNode has a key (for example, an id number), data is filled in, and links are set tonull
\(\;\;\;\;\;\)\(\;\;\;\;\;\)\(\;\;\;\) cursor starts at the root of the tree on first call
\(\;\;\;\;\;\)\(\;\;\;\;\;\)\(\;\;\;\) keys are unique
\(\;\;\;\;\;\)\(\;\;\;\;\;\)\(\;\;\;\) root is notnull
if newNode’s key < cursor’s key then go left
\(\;\;\;\;\;\) a. next ← cursor’s left child
\(\;\;\;\;\;\) b. if next isnull
then we have found insertion spot, so insert the node
\(\;\;\;\;\;\)\(\;\;\;\;\;\) - set the left child of cursor to newNode
\(\;\;\;\;\;\) c. otherwise next is notnull
so recursively move down the tree:
\(\;\;\;\;\;\)\(\;\;\;\;\;\) - insertBST(newNode, next)otherwise newNode’s key > cursor’s key so go right
\(\;\;\;\;\;\) a. next ← cursor’s right child
\(\;\;\;\;\;\) b. if next isnull
then we have found insertion spot, so insert the node
\(\;\;\;\;\;\)\(\;\;\;\;\;\) - set the right child of cursor to newNode
\(\;\;\;\;\;\) c. otherwise next is notnull
so recursively move down the tree:
\(\;\;\;\;\;\)\(\;\;\;\;\;\) - insertBST(newNode, next)
e. increment size of the tree, if tracking size
Note that this algorithm does not prevent the tree from containing linear sections. Such an algorithm is essential but beyond the scope of this first-level textbook.
Finding an Item in a BST#
There are two kinds of finding in a binary search tree:
When we are finding a key, which is the property that the tree is sorted on
When we are finding a field that is not the property that the tree is sorted on.
Given a Key#
If we need to find a key, the attribute on which the tree is sorted, we just need to search one path down the tree. Our algorithm will run in \(\theta (\log n)\) if the tree is reasonably nonlinear.
Algorithm findFromKey(whatToFind, cursor)
Assumes: cursor is a node and starts at root on first call
If cursor is
null
\(\;\;\;\;\;\) a. Return a signal that the item is not foundIf the cursor’s appropriate data field matches whatToFind
\(\;\;\;\;\;\) a. Return the cursorIf the cursor’s appropriate data field is less than whatToFind
, go left
\(\;\;\;\;\;\) a. leftChild ← cursor’s left child
\(\;\;\;\;\;\) b. Return findFromKey(whatToFind, leftChild)Otherwise, if the cursor’s appropriate data field is less than whatToFind
, go right
\(\;\;\;\;\;\) a. rightChild ← cursor’s right child
\(\;\;\;\;\;\) b. Return findFromKey(whatToFind, rightChild)
Given Another Data Field Not the Key#
If we need to find a data item that is not a key, then we must search the entire tree to look for it. This is similar to sequential search, looking at each node once until the item is found or we have checked everywhere. The time complexity will be in \(\theta (n)\). We can choose to go through the tree with any of the traversals we know. Here is an algorithm using pre-order traversal:
Algorithm findFromNonKey(whatToFind, cursor)
Assumes: cursor is a node and starts at root on first call
If cursor is
null
\(\;\;\;\;\;\) a. Return a signal that the item is not foundIf the cursor’s appropriate data field matches whatToFind
\(\;\;\;\;\;\) a. Return the cursorGo left
\(\;\;\;\;\;\) a. leftChild ← cursor’s left child
\(\;\;\;\;\;\) b. possibleItem ← findFromNonKey(whatToFind, leftChild)If the item was not found to the left, then look to the right
\(\;\;\;\;\;\) a. rightChild ← cursor’s right child
\(\;\;\;\;\;\) b. possibleItem ← findFromNonKey(whatToFind, rightChild)Return possibleItem
Practice Questions#
In Java, write the node class for a BST that contains a student’s name, their id number, and their major.
In Java, write the start of a BST made of nodes from question 1. Your file should just have attributes and a constructor to make an empty tree.
Write the Java code for inserting an item into the BST from question 2. Keep the tree sorted by id number.
Write the Java code for finding an item in the BST from question 3, that is, if given the id number return the corresponding node, or
null
if the id number is not found.Write a
toString()
method for the tree in question 3.Write the Java code to find the children of a given node in the BST from question 3. The result should be a list of two nodes, with the first item being the left child and the second item being the right child.