Fast and Small Resizable Arrays

By Erik Demaine, July 01, 2001

Dynamically resizable arrays are an important answer to many problems. Erik describes several ways to resize an array, each with its own advantages and disadvantages.

Jul01: Algorithm Alley

Erik is a Ph.D. student at the University of Waterloo in Ontario, Canada, with a speciality in algorithms. He can be reached at [email protected].

A basic problem that arises in many applications is accumulating elements into a list when the number of elements is unknown ahead of time. There are several list structures that support this application, and they each have various advantages and disadvantages. Two operations needed from such a structure are the ability to append an element to the end of the structure (growth), and some method of accessing the elements currently in the structure. If you are implementing a stack or similar structure, you would like the additional operation of removing the last element from the structure (shrinking).

One simple solution is a linked list, which can easily grow to any size and supports sequential traversal but slow random access. Usually, however, random access is unnecessary, so this is a viable solution.

Another solution is a dynamically resizable array, the topic of this article. The array is a powerful structure, and may seem like overkill for accumulating elements when random access is often unnecessary. However, it turns out that in several ways, an array is the most appropriate structure, even for the simple task of accumulation.

An array is a simple example of what the data-structuring community calls an "implicit data structure," meaning that it organizes data without actually using any pointers to store that organization: The organization is implicit from the data itself, and from the order in which it is stored. This fact has two important implications:

Arrays have basically no storage overhead: All that is stored (in a static array) is the data itself. In comparison, a linked list wastes at least half of the storage because each node has at least one pointer. Of course, this wastage factor can be improved by storing several elements in each node, but a constant fraction of space is still wasted in a linked list.
Arrays have a linear layout in memory, meaning that scanning the elements in order has ideal cache performance. Cache lines will be fully utilized (for this property it is also crucial that no space is wasted), and the prefetching mechanism should also detect a linear scan, unlike a linked list.

As a consequence, arrays are ideal for accumulating lists of data and for maintaining stacks. It is also possible to efficiently maintain a queue using two arrays. In addition, because arrays support random access, they can efficiently implement heaps (priority queues) and randomized queues (supporting removal of a random element). But to support a varying number of elements, we need to be able to dynamically grow and shrink an array.

In this article, I'll examine several techniques for making arrays resizable — three of which have already been described in DDJ, and one new approach.

Growing Arrays by Doubling

Because of the interest in this problem, there have already been two DDJ articles about resizable arrays. John Boyer's "Resizable Data Structures" (DDJ, January 1998), describes two approaches based on C's realloc function and applies them to resizable arrays and hash tables. The first approach is standard but highly inefficient: Whenever the array runs out of space, grow it by a few elements by calling realloc. Unfortunately, each realloc call can involve copying all of the elements into a new version of the array, and this event is likely when there are multiple resizable arrays growing at the same time. As a result, the time to insert an element can be proportional to the length of the array for a quadratic total cost of inserting many elements into an empty array. This performance is unreasonable, in particular far slower than a linked list, which requires only constant time per append or access.

The second approach, array doubling, is a simple modification that greatly improves performance. Instead of growing by a few elements whenever the array becomes full, double its size. This small change has the important consequence that in between two consecutive growths of an array, half of the elements were inserted requiring growths themselves. Thus, the cost of copying all the elements in one growth is balanced roughly equally by the cheapness of inserts since the last growth. In this sense, the amortized cost of appending an element is constant.

This array-doubling approach is simple to implement, has ideal cost for accessing elements (the same as a static array), and has optimal amortized performance for appends. Of course, the worst-case performance can be quite high: All the elements may need to be copied during a realloc call for a linear cost. A potentially more serious problem is that a large portion of storage is usually wasted. Up to half of the array itself can be empty, and during a realloc operation, two copies of the array may be in use simultaneously for a total wastage of 67 percent in the worst case. As with linked lists, this factor can be improved by growing the array by less more often (for example, only growing by 50 percent), but a constant fraction of space is still wasted.

In many cases, array doubling is the right tool for the job. In many other cases, the space wastage is too high. It may be surprising that you can waste less than a constant fraction of space and still resize arrays in amortized constant time. But it is possible.

HATs Use Less Space

A third approach was proposed by Edward Sitarski in "HATs: Hashed Array Trees" (DDJ, September 1996). The HAT structure has two levels. The top level consists of a single memory block pointing to all the memory blocks at the bottom level, which store elements in the array. All memory blocks have the same size, which is a power of two. This means that each block has a size around n, because there are two levels and (n)2=n.

Growing a HAT is conceptually simple: Attempt to add an element to the end of the last memory block at the bottom level. If this memory block becomes full, you add another memory block at the bottom level. In this case, you have to add a pointer to the end of the top memory block. When the top memory block becomes full, you resize every memory block to an appropriate new size — one at a time. As with array doubling, this global resize happens infrequently, so the amortized performance is still constant. As with array doubling, the worst-case performance can be high — O(n).

The key advantage of the HAT approach is the wasted space: Only the last bottom block and the top block may be partially empty, so the structure wastes only O(n) space, instead of O(n) as in linked lists and array doubling. The leading constant in O(n) is at most around two. It is important to stress how important this improvement is, particularly for large arrays and for many arrays. For example, if you had a megabyte of elements, around 2 KB would be wasted at any time. If you had a gigabyte of elements, only around 64 KB would be wasted.

Incidentally, although it isn't mentioned in Sitarski's article, the HAT structure can be grown and shrunk at the beginning as well as at the end. Whenever the first block at the bottom level becomes full, you add another block at the beginning of the bottom level. Now the top-level block just needs to be stored as a circular queue, using a single index to say where the queue begins. This extension allows HATs to be used as double-ended queues (deques).

A New Approach Resizes Less

The resizable array structure I present here has two levels (like a HAT), but the blocks all have different sizes (unlike a HAT). This variety lets you avoid resizing any blocks, ever, except for the top-level block. As the array grows, the sizes of the bottom-level blocks will follow a curious sequence of powers of twos:

1; 2; 2, 2; 4, 4; 4, 4, 4, 4; 8, 8, 8, 8; 8, 8, 8, 8, 8, 8, 8, 8; 16,...

The semicolons give some reason to this sequence: The groups in between semicolons alternate between doubling in value and doubling in size. Thus, the first and second groups have one number; the third and fourth groups have two numbers; the fifth and sixth groups have four numbers; and so on. On the other hand, the first group has value 1; the second and third groups have value 2; the fourth and fifth groups have value 4; and so on. The collection of blocks in between two semicolons is called a "superblock."

This sequence may seem strange, but it is easy to implement and turns out to work well for finding elements in blocks. Now whenever the last bottom-level block becomes full, you add a new bottom-level block, sized according to the sequence just mentioned. Only when the top-level block becomes full must you realloc anything, and then you only realloc the top block, which has size O(n). This makes array growth simpler and somewhat faster, and has the consequence that the worst-case performance of appending an element is O(n). As with HATs, the amortized performance is constant, and the space wastage is O(n) with a leading constant around two. Incidentally, it is possible to prove that this space bound is the best possible.

Listing One is C++ code for appending an element to an array, while Listing Two shows how easy it is to scan through the elements in the array, in this case for the purpose of printing the array. See Listing Three for the class definition, which lists the variables used for the data structure's state and shows how to initialize this state.

Locating an Element

Listing Four shows how to access the ith element of the array, which is unfortunately somewhat more difficult with this structure. To locate the ith element in the structure, you need to find the leftmost 1 bit in the binary representation of i+1. The index of this bit, counted from the right, is the number of the superblock containing the element. The block number within that superblock is given by the first half of the bit string after the 1 bit, and the element index within that block is given by the remainder of the bit string. These properties are all interesting consequences of the choice of the block-size sequence.

How do you find the leftmost 1 bit in a computer word? This operation is an instruction on many architectures; for example, Listing Five shows code for the Intel 80386 and its successors. (The method for inlining assembly code depends on the compiler; this code is tested on the GNU g++ compiler.) Modern Pentiums perform bsr as fast as an integer addition; older chips around the same time as integer multiplication. Assembly code for finding the rightmost 1 bit on several other architectures can be found in the source code for the Linux kernel (in include/asm-*/bitops.h), and in most cases this code can be adapted to find the leftmost 1 bit on many machines.

On a machine that does not have this operation (such as a RISC chip) or for a more portable solution, there are several possibilities. You could count the number of shifts necessary to zero the word. This could take up to 32 or 64 shifts, depending on the word size of the machine and the maximum size of an array. Alternatively, you could binary search on the bits, as in Listing Six. Finally, likely the fastest solution is to build a lookup table of 64K=216 bytes, and look up every 16 bits in sequence, as in Listing Seven. If space is tight, you can instead look up every 8 bits in a table of 256 bytes. It is important to note that this lookup table is global; you need only one, even if you have several resizable arrays.

Shrinking

The ability to shrink an array by removing the last element makes it easy to implement a stack or heap using a resizable array. Fortunately, shrinking any of the resizable arrays structures is roughly the same as growth, but with a different threshold. With array doubling, we should only shrink the array when it becomes a quarter full (or a third full, or any fraction less than a half), to avoid repeatedly crossing the half-full threshold. Listing Eight shows the code for the new structure, where the equivalent trick is to keep one empty data block in case it is needed soon.

Conclusion

Dynamically resizable arrays are important for many problems; even when a simple linked list would suffice, arrays offer low storage overhead and efficient cache performance. I have described several ways to resize an array, each with their own advantages and disadvantages. Array doubling is simple but uses a constant fraction of extra space. HATs and the new approach use little extra space, but are somewhat more complicated. In terms of accessing elements, array doubling is fastest (matching static arrays), followed by HATs, followed by the new approach. In terms of growth and shrinking, the new approach is fastest because it avoids reallocating blocks. If you want to waste little space yet have fast element accesses (roughly double the cost for static arrays), HATs are the best approach.

Acknowledgment

The structure presented here is the joint work of Andrej Brodnik, Svante Carlsson, J. Ian Munro, Robert Sedgewick, and myself. It appears in our paper, "Resizable Arrays in Optimal Time and Space," in the Proceedings of the 6th International Workshop on Algorithms and Data Structures, August 1999 (Volume 1663 of Lecture Notes in Computer Science). A copy is available from http://daisy.uwaterloo.ca/~eddemain/papers/WADS99a/.

DDJ

Listing One

template <class Element>
void ResizableArray<Element>::append (Element e) {
  // If the last nonempty data block is full, add a new block.
  if (elements_in_last_block >= last_block_size) {
    // If the last superblock is full, add a new superblock.
    if (blocks_in_last_superblock >= last_superblock_size) {
      num_superblocks++;
      // If the number of superblock is odd, double the number of 
      //                                    data blocks in a superblock.
      if (num_superblocks % 2 == 1)
        last_superblock_size *= 2;
      // Otherwise, double the number of elements in a data block.
      else
        last_block_size *= 2;
      // Set the occupancy of the last superblock to empty.
      blocks_in_last_superblock = 0;
    }
    // If there are no empty data blocks:
    if (! empty_block) {
      // If the index block is full, realloc it to twice its current size.
      if (num_blocks >= index_size)
        assert (index = (Block<Element> *) realloc (index, 
                              sizeof (Block<Element>) * (index_size *= 2)));
      // Allocate a new last data block, 
      //             and store a pointer to it in the index block.
      index[num_blocks].elements = 
                    new Element [index[num_blocks].size = last_block_size];
    } else
      empty_block = 0;
    // Increment the number of blocks in total and in this superblock.
    num_blocks++;
    blocks_in_last_superblock++;
    // Set the occupancy of the last data block to empty.
    elements_in_last_block = 0;
  }
  // Store the new element at the end of the last data block.
  index[num_blocks-1].elements[elements_in_last_block++] = e;
  num_elements++;
}

Back to Article

Listing Two

template <class Element>
ostream &operator<< (ostream &os, ResizableArray<Element> &array) {
  for (int block = 0; block < array.num_blocks - 1; block++)
    for (int i = 0; i < array.index[block].size; i++)
      os << array.index[block].elements[i] << endl;
  for (int i = 0; i < array.elements_in_last_block; i++)
    os << array.index[array.num_blocks-1].elements[i] << endl;
  return os;
}

Back to Article

Listing Three

#include <assert.h>
#include <stdlib.h>
#include <iostream>

template <class Element>
struct Block {
  Element *elements;
  int size;
};
template <class Element>
class ResizableArray {
 public:
  Block<Element> *index;
  int num_elements, index_size;
  int num_blocks, empty_block, elements_in_last_block, last_block_size;
  int num_superblocks, blocks_in_last_superblock, last_superblock_size;
 
  public:
  ResizableArray () {
    num_elements = elements_in_last_block = 0;
    assert (index = (Block<Element> *) 
                     malloc (sizeof (Block<Element>) * (index_size = 1)));
    index[0].elements = new Element [index[0].size = last_block_size = 1];
    num_blocks = num_superblocks = 1;
    empty_block = 0;
    blocks_in_last_superblock = last_superblock_size = 1;
  }
  void append (Element e);
  void shrink ();
  int length () const { return num_elements; }
  inline Element &operator[] (int i);
  friend ostream &operator<< <Element> (ostream &os, 
                                       ResizableArray<Element> &array);
};

Back to Article

Listing Four

template <class Element>
inline Element &ResizableArray<Element>::operator[] (int i) {
  assert (i < num_elements);
  int j = i + 1;
  int most_significant_1_index = find_most_significant_1_index (j);
  int most_significant_1 = 1 << most_significant_1_index;
  int half_ceiling = (most_significant_1_index + 1) >> 1;
  int two_power_half_ceiling_m1 = (1 << half_ceiling) - 1;
  return index[two_power_half_ceiling_m1 +
               (1 << most_significant_1_index - half_ceiling) - 1 +
              // part above is number of blocks before this superblock
               ((j ^ most_significant_1) >> half_ceiling)]
     .elements[j & two_power_half_ceiling_m1];
}

Back to Article

Listing Five

#ifdef __i386__
// Note: Return value is undefined for j == 0.
// Append the code "\n jnz 1f\n movl $-1,%0\n 1:" to support this case.
inline int find_most_significant_1_index (int j) {
  __asm__("bsrl %1,%0" :"=r" (j) :"r" (j));
  return j;
}
#endif // __i386__

Back to Article

Listing Six

#ifdef BINARY_SEARCH
// Assumes a 32-bit machine (or that j < 2^32), and that j != 0.
inline int find_most_significant_1_index (int j) {
  int new_j, r = 0;
  if ((new_j = j & 0xFFFF0000)) { r |= 16; j = new_j; }
  if ((new_j = j & 0xFF00FF00)) { r |=  8; j = new_j; }
  if ((new_j = j & 0xF0F0F0F0)) { r |=  4; j = new_j; }
  if ((new_j = j & 0xCCCCCCCC)) { r |=  2; j = new_j; }
  if (         j & 0xAAAAAAAA)    r |=  1;
  return r;
}
#endif // BINARY_SEARCH

Back to Article

Listing Seven

#ifdef LOOKUP
#define lookup_shift 16
#define lookup_mask ((1 << lookup_shift) - 1)

char lookup[lookup_mask + 1];
int lookup_init () {
  int most_significant_1_index = -1;
  int next_power_of_two = 1;
  for (int i = 0; i <= lookup_mask; i++) {
    if (i == next_power_of_two) {
      most_significant_1_index++;
      next_power_of_two <<= 1;
    }
    lookup[i] = most_significant_1_index;
  }
  return 0;
}
int lookup_garbage = lookup_init ();
// Assumes a 32-bit machine (or that j < 2^32).
inline int find_most_significant_1_index (int j) {
  int shifted_j;
# if lookup_shift == 16
    if ((shifted_j = j >> 16))
      return lookup[shifted_j] + 16;
    else
      return lookup[j];
# elif lookup_shift == 8
    if ((shifted_j = j >> 16)) {
      j = shifted_j;
      if ((shifted_j = j >> 8))
        return lookup[shifted_j] + 24;
      else
        return lookup[j] + 16;
    } else
      if ((shifted_j = j >> 8))
        return lookup[shifted_j] + 8;
      else
        return lookup[j];
# else
#   error Only lookup table shifts of 8 or 16 are currently supported.
# endif
}
#endif // LOOKUP

Back to Article

Listing Eight

template <class Element>
void ResizableArray<Element>::shrink () {
  // Remove element.
  assert (num_elements-- > 0);
  elements_in_last_block--;
  // If the last (but not first) data block is empty:
  if (elements_in_last_block <= 0 && num_blocks > 1) {
    // If there is another empty data block, deallocate it.
    if (empty_block)
      delete index[num_blocks].elements;
    else
      empty_block = 1;
    // If the index block is a quarter full, realloc it to half its size.
    if (num_blocks <= index_size / 4)
      assert (index = (Block<Element> *) realloc 
                    (index, sizeof (Block<Element>) * (index_size /= 2)));
    // Decrement number of data blocks in total and in this superblock.
    num_blocks--;
    blocks_in_last_superblock--;
    // If last (but not first) superblock is empty, remove it.
    if (blocks_in_last_superblock <= 0) {
      num_superblocks--;
      // If the number of superblocks is even, 
      //                halve the number of data blocks in a superblock.
      if (num_superblocks % 2 == 0)
        last_superblock_size /= 2;
      // Otherwise, halve the number of elements in a data block.
      else
        last_block_size /= 2;
      // Set the occupancy of the last superblock to full.
      blocks_in_last_superblock = last_superblock_size;
    }
    // Set the occupancy of the last data block to full.
    elements_in_last_block = last_block_size;
  }
}

Back to Article

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Fast and Small Resizable Arrays

Growing Arrays by Doubling

HATs Use Less Space

A New Approach Resizes Less

Locating an Element

Shrinking

Conclusion

Acknowledgment

Listing One

Listing Two

Listing Three

Listing Four

Listing Five

Listing Six

Listing Seven

Listing Eight

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Fast and Small Resizable Arrays

Growing Arrays by Doubling

HATs Use Less Space

A New Approach Resizes Less

Locating an Element

Shrinking

Conclusion

Acknowledgment

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content