Splay trees are self-adjusting binary search trees. If the set of frequently accessed elements is a small subset of the elements in the tree, splay trees perform better than other search trees [1]. Splay trees are typically used in the implementation of caches, memory allocators, routers, garbage collectors, data compression, and ropes (a replacement for strings when handling very long text). Additionally, I've used splay trees as the internal representation of associative container classes with the same interface as the associative containers in the C++ Standard Library. The implementation I present in this article is a plug-in replacement for the standard associative containers, the only difference being performance characteristics. The source code for my containers is available at http://www.dtilib.dk.
Setting the Stage
The C++ Standard Library describes a set of containers based on the concept of associative containers. They all belong to a category of data structures referred to as "dictionaries." Dictionaries let you make fast searches for their elements. One popular group of dictionary implementations is based on binary search trees. All implementations of the Standard Library I'm aware of use red-black trees to implement associative containers. Here, I present a splay-tree-based alternative implementation of associative containers. Splay trees have an interesting property compared to red-black trees. They have a sort of caching behavior: Elements recently accessed have fast access times. If some elements are accessed more frequently than others, splay trees perform the searches faster than traditional balanced search trees.
The Standard defines the set of operations that each associative container must support. For each operation, it not only defines how the operation is supposed to work, but also how efficient the operation must be. I wanted to understand how these requirements affected an implementation. The performance characteristics of splay trees do not quite fit the requirements of the Standard. This wasn't a big issue for me because most problems associated with implementing associative containers are related to the structure of binary trees, not the precise rebalancing algorithm. Consequently, I decided to base my implementation on splay trees because they represent an interesting alternative to traditional red-black implementations.
Finding the Data Fast
When searching is the primary operation on your data, it pays to structure the data so that searching is efficient. A general strategy is to always keep the data sorted and use bisection (binary search) for searching [2]. If efficient insertion and deletion are also needed, a binary search tree (BST) is often the data structure of choice. A BST is based on a set of nodes. Each node holds one data element and two pointers to child nodes (left and right child). The nodes are organized so that all elements in the left subtree are less than or equal to the element in the parent node. In the same manner, all elements in the right subtree are greater than or equal to the element in the parent node (Figure 1).
Figure 1: Binary search tree.
A search for a specific element starts at the top of the tree (the "root"). With two comparisons, it can be decided if the search has to continue in the left or right subtree or if the search has ended. A node without children is a "leaf node." The time a search takes for an insertion/deletion is a function of the distance from the root to the wanted node. In the worst case, this is the longest distance from the root to a leaf node, referred to as the "height" of the tree. A BST of height n can hold at most 2^n - 1 nodes, which is also the key to the efficiency of binary search trees. If balanced properly, the height is logarithmic in the number of nodes. The logarithm function is a slowly growing function. Consequently, if there are many nodes in the tree, the height is still a small number compared to the number of nodes, making operations fast. There is a catch. You must ensure that the left and right subtrees hold approximately the same number of nodes on all levels; otherwise, there's a risk that the tree degenerates to a sorted chained list of nodes. So how do you keep the tree balanced when nodes are deleted and new ones are inserted?
Enter Rebalancing
If the tree is unbalanced, "rotations" can help (Figure 2). Rotations change the structure of the tree without disturbing the sorting order. Balanced search trees do just that in an ordered manner. They keep some kind of information in each node to help rebalancing the tree when needed, typically when a node is deleted or a new one inserted. The most commonly known variants of balanced search trees are AVL trees and red-black trees [2]; the latter is typically used in implementations of the associative containers in the Standard Library. The difference between AVL trees and red-black trees is in how balanced they guarantee the tree is:
Figure 2: Rotation operation.
- AVL trees guarantee that, for every node, the heights of its two subtrees differ by at most one.
- Red-black trees only guarantee that the longest path from the root to a leaf node is at most twice as long as the shortest path from the root to a leaf node.
Both rebalancing heuristics keep the tree sufficiently balanced to guarantee good performance. There is a trade-off: Guaranteeing a more perfectly balanced tree requires more work when changing the tree structure. Searches in a tree that is guaranteed to be well balanced are fast, but insertions/deletions can be slow.
Another Approach: Splay Trees
What is important about balanced search trees is that performance guarantees are based on the fact that the longest search path is only logarithmic in the number of nodes in the tree. This guarantees that any individual operation is also fast, making the overall performance good. The tree in Figure 1 is just such a balanced tree. However, is it as fast as it can be if the most frequently accessed node is node-6, while node-4 is only seldom accessed? The answer is "probably not." A tree like that in Figure 3 probably performs better overall, even though the tree isn't as balanced as the tree in Figure 1.
Figure 3: A tree that performs well, even though it isn't as balanced as Figure 1.
In other words, the performance gain by moving node-6 to the top outweighs the extra time used to access node-4, because node-4 is only infrequently accessed. If you know the access patterns in advance, you can arrange the nodes for best performance, but this is seldom the case and difficult to implement if insertion/deletion are also allowed. Splay trees make access to frequently accessed nodes fast by using the splaying heuristic: Move the accessed node to the top, but try to keep nodes already close to the top still close. Not all nodes can stay at the top, of course, so nodes that are not accessed gradually move down the tree.
After each search is performed, a "splay" operation is executed. The splay operation moves the just-found node to the top of the tree, making it the new root. To do this, it uses three primitives based on rotations: zig, zig-zig, and zig-zag (I didn't make up these names). The operations zig-zig and zig-zag are "double rotations" that are implemented using two rotations. Zig is an ordinary rotation. Figure 4 shows the different kinds of operations minus symmetric cases. In all cases, the node named x is moved to the top. The zig operation is only performed if x is a direct child of the root node. In all other cases, zig-zag and zig-zig operations are performed.
Figure 4: Splay operations.
This is referred to as "bottom-up splaying" because it moves a node to the top of the tree starting at the node itself. You can combine the search and the splay operation into one operation so the splaying is done while the search is moving down the tree. In this case, the splay operation is referred to as a "top-down splay." Usually, only one of the splay methods is implemented. The performance of both splaying types is about the same, but top-down splaying tends to be slightly faster if the bottom-up splay has to first search for the node.
The splay operation has a remarkable property: the distance from the root to any node visited during a splay operation is approximately halved. This has the effect that no matter how unbalanced the tree is, it won't stay unbalanced for long. The efficiency guarantee for splay trees is not based on a promise that individual operations are fast. Instead, a splay tree promises that a sequence of operations performs well. For most uses, this distinction isn't important because it is only the total time spent that counts. For time-critical applications, however, the distinction may be important. Splay trees can exploit this freedom and not always rebalance the tree after an update. The tree can be left unbalanced if this is faster, then rebalanced at a later, more convenient time. This type of performance guarantee is known from the push_back function in the standard container vector: Some calls to push_back are slow, but most are so fast that you won't notice the slow ones. These are "amortized performance guarantees." The splay operations restructure the tree in a similar way to the rebalancing operations of balanced search trees, but with the extra twist that if the access pattern has some sort of locality (many searches for the same element), the splay tree exploits this to boost performance. This is basically the same as a disk cache: exploiting locality to boost the performance of disk access.
Implementation
The current C++ Standard describes four associative containers: map, multimap, set, and multiset. Their implementations differ only slightly, making it possible to base all four containers on a single implementation. They differ in two ways:
- The first is whether the sorting key is part of the inserted value, as with set and multiset, or a separate entity, as with map and multimap. For map and multimap, this is handled by storing pairs in each node, combined with a functor that extracts the key part for comparisons between nodes.
- The second difference is that set and map only allow unique keys, while multiset and multimap allow duplicates (equivalent keys). The only place where this difference is important is in the insert member functions. Consequently, my implementation has two versions of the insert operation: one for unique-key containers and another for equivalent-key containers.

The four container classes basically keep an instance of the common implementation class and delegate their operations to this instance.