Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

Associative Arrays in C++


SP95: Associative Arrays in C++

David has been programming since 1971. He can be reached on CompuServe at 75267,1632.


Although at first glance they may seem simple, associative arrays are nothing less than a database indexed by a single unique text key. In fact, a considerable number of database applications require no more than that. And like a well-written B-tree database, associative arrays are both fast and space efficient. As I migrated into the C++ world, I found the old C module I was using for associative arrays distant and dated. Consequently, I wrote a complete C++ template implementation that handles objects, pointers to objects, and pointers to functions. The implementation also deals with deep copying the arrays and automatically extends itself when it runs out of room. In this article, I'll explain these ideas and present the C++ code. Along the way I'll also detail general ideas for efficient template design, hazards for the C++ unwary, and what the C++ version offers that the C version missed.

An associative array looks like a regular C array indexed by a string. In AWK, for example, you can perform the operations in Figure 1(a). The first statement retrieves a dollar value, which is an integer, indexed by year and name of automobile. However, you aren't limited to integer arrays. Any arbitrarily complex structure is a candidate for an array. With AWK, the structure consists of text fields in a string. With C or C++, the array can contain structs, classes, or pointers to either. Thus, any form of consistently organized data can be stuffed into an associative array. Our C++ template has the same basic capability as the AWK example and tries, within the limitations of its grammar, to mimic the syntax.

Listing One is the header file for the associative array. Note that the capability splits across two classes, ASSOCIATION_BASE and ASSOCIATION. The first is a standard C++ class, while the second is a class template. ASSOCIATION derives from ASSOCIATION_BASE, which has no public interface--it is a "hidden" class.

You may, at this point, wonder why the capability is not simply placed in the ASSOCIATION class. The reason is efficiency. The first time I saw examples of header files with long C++ template listings, I was struck by déjà vu: Here was macro madness all over again. In the past, it was common practice to fill header files with long macros that, when invoked in a module, gave legibility to the code and also granted speed. Unfortunately, a side effect was code bloat. Every invocation replicated the macro code. Another problem was large, unreadable headers. Templates attempt with a scalpel what macros tried with a chain saw, but they have the same inherent failures. Every template you instance for a new object type replicates all the code of the template. Further, the template code must be visible at the point that an instance is made. This means stuffing the entire template into the header. When the template is small and simple, these problems are insignificant. Unfortunately, most things in software are neither small nor simple.

My solution is to split the template by creating a functional core using a standard C++ class (but having no public interface), then designing a template that has only the interface and derives its operation from the core. I then use a universal format for passing the information between the template and the core. In this case, the universal media are chars and pointers to chars. The template casts types between the universal format and the template instance type. Even though casts are usually considered dangerous, these casts are contained in a class and hidden from user access. Most important, the cast object is not reinterpreted--the functional core is simply a byte bag that does not interpret or presume upon the bytes it stores. The interpretation occurs in the template layer, where strong typing keeps users from getting their hands caught in the casting machinery.

You make an associative array using a template constructor that defines the data type and, optionally, takes a parameter that is a best guess of the maximum size. If that maximum is exceeded, the array will automatically double in size. Once created, you can stuff associations into the array with the insert method. You can access keys with the find function. find returns a pointer to a data object, not the data object itself, and can therefore be used as an existence operator. If the key does not exist in the array, find will return 0. The remove function deletes an entry from an array, but it does not shrink the array. Instead, it opens up space in the array that is reusable by another insert. The first and next methods iterate over the object in the array. Since there is no standardization of iterators, I picked yet another neutral format. You can use these to make your preferred iterator, whether it be a Windows-style enumerator function or a Standard Template Library (STL) iterator object. Notice in Figure 1(b) that I do not use AWK-style syntax. The reason will become apparent later, when I overload C++ operators to give the class a quasi-AWK look.

References and Other Riddles

The definition for the insert function in Listing One takes template data objects as parameters rather than as references to an object. This is unconventional. When I first wrote the class, I used the typical approach and inserted the data by reference, but this had undesirable effects. I wanted the array to handle three data formats--objects, pointers to objects, and pointers to functions--without using function overloading or custom variants of the template interface. One compiler had problems referencing a pointer to a function. This is understandable. A function pointer exists only as a pointer; there is no tangible data object behind it. Taking references of function pointers, although perfectly legal, seems to be treading near the edge of the compiler's abilities.

Another problem with references is overhead. For example, Figure 1(c) stuffs a simple array of integers with constants. With references, the compiler builds a temporary image of the numeric constant and then pushes a pointer to that temporary onto the stack. Without references, just the constant is pushed, which is much faster. Don't get me wrong: References are a valuable addition to the language, and I will need them to implement the array operator. Their intended use is avoiding the overhead of copying large objects on the stack. Passing an object with a hidden reference pointer is easier than passing the entire data area. However, when I look over my use of associative arrays I find that big objects, being dynamically allocated on the heap, already exist as pointers. It is simpler to use the pointer-based array in Figure 1(d).

Operator overloading allows an AWK-like syntax. The array operator [] gives both rvalue and lvalue access to associative arrays. Since it is impossible to add keywords to the C++ language through the overloading mechanism, the AWK keyword "in" cannot be duplicated. Instead, I use the function operator for existence checking; see Figure 2(a). Similarly, object removal and iteration cannot have an AWK syntax. I stick with the named methods remove, first, and next. The hazard with this scheme is that the array operator cannot distinguish between lvalues and rvalues , so it creates storage for new keys, even when they are used only as rvalues. With named methods, insert and find, the lvalue/rvalue dichotomy is kept separate. With the array operator, they merge. Figure 2(b) will create a zero-filled object if the key does not exist, so the existence operator is essential.

A class containing pointers needs special handling when it is copied or assigned. By default, the compiler makes an exact, byte-for-byte copy of the class, known as a "shallow copy." However, this can create aliases. Consider class instance A, which has an internal pointer, and class instance B, which is a shallow copy of A. B's internal pointer is the same as A. Delete class A, and you delete what it points to. Now, the pointer in B points to nothing, so using the B pointer will bring the walls tumbling down. The correct approach is to make a new copy of the internal object and direct B's pointer towards it. This is known as "deep copying." ASSOCIATION_BASE handles this properly. Examine the header in Listing One and the engine in Listing Two, following the code for the copy constructor and the assignment operator. Note how the copy constructor piggybacks on the assignment operator rather than replicating its code.

ASSOCIATION_BASE uses pointer-independent linkage to streamline the copying of an array. This underscores a C++ design rule--minimize the number of pointers in a class. The internal data structure for the associative array is a hash table with collisions resolved by chaining. The linked list for the chain typically looks like Figure 3(a). The next pointer in this structure is a speed trap. When you deep copy the array into a new data area you must walk every chain and reconnect the links. Not too swift. But if you store the links as offsets from some base pointer, you can bulk copy the linkage (memcpy) into the newly allocated area; see Figure 3(b).

The ASSOCIATION class forces the programmer to maintain storage for the keys. That isn't a problem if the keys are static. However, if the keys are dynamic, built in a For loop, or read from a stream, then the overhead is an immense hassle. ASSOC_STORED is a sister class that does everything ASSOCIATION does while managing heap space for copies of the keys. Use the class that best fits your intended purpose.

Wrap Up

Keeping up with the fast moving changes in the C++ language means more than continuously spending money for compiler upgrades. It also means writing software that is portable across a range of language variants. Listing One uses preprocessor #defines to control exception handling and new-style casts. Set them to 0 for older compilers, and turn them on as new features appear in your favorite compiler.

The proposed string class for the C++ standard library dynamically adjusts the case sensitivity of its comparison function using the set_case_sensitive method. The ASSOCIATION class uses statically controlled case sensitivity that you can modify by editing the ASSOC_CASE_SENSTIVE preprocessor directive in the header file, then recompiling. It would have been easy to make it dynamic; just use a function pointer for the comparison operation and toggle it between strcmp and stricmp, but I am always haunted by the ghosts of performance lost. By statically hardwiring the comparison function, the compiler can elect to optimize the code with an intrinsic inline replacement that a function pointer would disallow.

What is gained by moving the software from C to C++? Both do their job equally well. However, there is a payoff for the translation effort. On a simplistic level, C++ allows a more-natural, AWK-like syntax, which C could never achieve. Beyond this, the real gain is safety. My C version used untyped pointers, which, if you didn't keep your eye on them, could run amok. The C++ version is strongly typed. You can't accidentally shove the wrong thing into the array. The C++ version is also easier to modify, with shades of reusability. The C++ ASSOC_STORED class, as a variant of the ASSOCIATION class, required only 90 lines of code. A similar variation in C would be much more expensive. The C++ version feels more comfortable. It has the right heft and balance. These intangible features are precisely the characteristics that make a tool useful in a craftsman's hands.

Figure 1: (a) Associative-array operations as used in the AWK language; (b) associative-array operations in C++; (c) filling an array with integers; (d) using pointer-based arrays.

(a)
price = bluebook["1967 VW"];        # access rvalue

bluebook["1967 VW"] -= valuedecay;   # set lvalue

if ("1928 Durant" in bluebook)      # check existence

delete bluebook["Stanley Steamer"];  # remove entry

for (pv in bluebook)                # iterate over array


(b)
ASSOCIATION<myclass> test(100); // creation

test.insert("key1",instance1);  // lvalue access

test.find("key1");             // rvalue and existence

test.remove("key1");           // remove entry

test.first(), test.next();      // iterate

(c)
ASSOCIATION<int> test2;

test2.insert("one",1);         // inserting constants

test2.insert("two",2);

(d)
ASSOCIATION<bigclass *> test3;  // note: pointer

bigclass *instance1 = new bigclass(construction parameters);

test3.insert("bigkey1",instance1);

Figure 2: (a) Using a function operator for existence checking; (b) using an existence test to prevent the array operator from unconditionally instantiating rvalues.

(a)
ASSOCIATION<maillist> test4;

dial(test4["Bob Smith"].phone_number);  // rvalue

test4["Jane Thomas"].state = "CO";      // lvalue

if (test4("John Doe"))                 // existence

(b)
// Creates "Mary Lamb" if she did not previously exist

cout << test4["Mary Lamb"].city;        // even though an rvalue

// Use this instead

if (test4("Mary Lamb"))

cout << test4["Mary Lamb"].city;

Figure 3: (a) Linked-list struct; (b) creating an offset from the base pointer.

(a)
struct ELEMENT

{

some data stuff;

struct ELEMENT *next;

};

(b)
struct ELEMENT

{

some data stuff;

unsigned int next;

// offset from "base"

};

struct ELEMENT *base;

Listing One

/***************************************************************
 * file: ASSOC.HPP
 * purpose: template class for associative arrays
 * contains:
 *    ASSOCIATION_BASE - core routines for the ASSOCIATION template
 *    ASSOCIATION - association between strings and data objects
 * copyright: 1994 by David Weber.  Unlimited use granted in EXE, OBJ,
 *  or LIB formats.  Do not sell as source without asking first.
 * environment: tested Borland C++ 4.01 and Zortech C++ 3.1r2
 * history: 10-02-94 - initial code, based on an earlier C module
 **************************************************************/
#ifndef _ASSOC
#define _ASSOC
// needed goodies
#include <string.h>
// Feature controls
#define ASSOC_CASE_SENSITIVE 1    // string case sensitivity (1 or 0)
#define ASSOC_EXCEPTIONS 0        // environment supports C++ exceptions
#define ASSOC_NEW_CASTS 0         // environment supports new C++ casts
// case sensitivity - This could be done dynamically with a function ptr in the
//    class, but would disallow the compiler to do optimization via intrinsic
//    function expansion.
#if ASSOC_CASE_SENSITIVE
#define ASSOC_STRCMP strcmp
#define ASSOC_MAP(c) (c)
#else
#include <ctype.h>
#define ASSOC_STRCMP stricmp
#define ASSOC_MAP(c) toupper(c)
#endif
// The only place exceptions occur are resource failure with the "new" calls.
// If the environment supports C++ exceptions and a failure occurs, a handler
// somewhere, even if it's terminate(), will take care of it.  Without
// exception handling we just shut down the farm via assert() and abort()
#if ASSOC_EXCEPTIONS
#define ASSOC_MEM_CHECK(p)
#else
#include <stdlib.h>
#include <assert.h>
#define ASSOC_MEM_CHECK(p) { if ((p) == 0) { assert((p) != 0); abort(); } }
#endif
// old versus new casting, not much gained here except you can search for casts
#if ASSOC_NEW_CASTS
#define ASSOC_CAST(cast,item) reinterpret_cast<cast>(item)
#else
#define ASSOC_CAST(cast,item) (cast)(item)
#endif
// defines
const int DEFAULT_ASSOC_SIZE = 64;  // default estimated size of array
const unsigned int ASSOC_NULL = ~0; // end of chain
// The base class for associative arrays.  You should NOT make an instance
// of this class.  Instead use the ASSOCIATION template below
class ASSOCIATION_BASE
  {
  protected:                    // protect everything
    ASSOCIATION_BASE(unsigned int data_size,unsigned int estimated_size);
    ~ASSOCIATION_BASE();        // destructor - make virtual if extending
                                // inheritance and/or pointing along the chain
    ASSOCIATION_BASE(const ASSOCIATION_BASE &);           // copy constructor
    ASSOCIATION_BASE &operator=(const ASSOCIATION_BASE &);// assignment
    ASSOCIATION_BASE(const char **keys,const char *data,  // static initializer
            unsigned int data_size,unsigned int count);
    unsigned int size(void)       // how many data elements in array
      { return fill_level; }
    int insert(const char *key,const char *data);         // add an element
    int remove(const char *key);  // remove an element
    char *find(const char *key);  // find an element
    const char *first(void);      // traversal functions
    const char *next(void);
    int operator!()               // test array for problems
      { return hash_list==0 || key_list==0 || link_list==0 || data_list==0; }
    int operator()(const char *key)    // existence operator
      { return find(key) != 0; }
    char &operator[](const char *key)  // access operator
      { return *reference(key); }
    char *reference(const char *key);  // get a reference to data or insert
    unsigned int hash(const char *key);// hash function
    void allocate_array(void);         // get enough space for the array
    int expand_array(void);            // resize array
    void clear_pointers(void)          // clear all pointers
      { hash_list = 0; key_list = 0; link_list = 0; data_list = 0; }
    void delete_pointers(void)         // delete all pointers
      { delete [] hash_list; delete [] key_list;
        delete [] link_list; delete [] data_list; }
    unsigned int array_size;           // full size of array
    unsigned int fill_level;           // data entries currently in array
    unsigned int *hash_list;           // hash indexed array of chains
    const char **key_list;             // storage for key pointers
    unsigned int *link_list;           // storage for key linkages
    char *data_list;                   // storage for data expressed as char
    unsigned int sizeofdata;           // size of data objects in bytes
    unsigned int iteration;            // current iteration position in data
  };
// ASSOCIATION - associative array template
//  Use this class template for creating instances when you will be
//  storing associations between character strings and data objects.
//  There are three ways to use this template:
//    direct storage - data are stored directly in the associative array.
//      good for small classes or native types
//      ASSOCIATION<myclass> direct_array(estimated_size);
//      value = *direct_array.find("key");
//    indirect storage - pointers to data are stored in the array.
//      good for large classes or pointers to things that vary in size
//      ASSOCIATION<myclass *> indirect_array(estimated_size);
//      ptr_to_value = *indirect_array.find("key");
//      value = **indirect_array.find("key");
//    function pointer storage - pointers to functions are stored.
//      ASSOCIATION<int (*)(int)> func_ptr_array(estimated_size);
//      int (**fptr)(int);
//      if ((fptr = func_ptr_array.find("key")) != 0)    // always test
//        function_return = (**fptr)(parameter);
//  You are responsible for the string storage.  In the case of indirect
//  storage you are also responsible for storing the data.
//  example:
//    ASSOCIATION<myclass> assoc_array(estimated_size);  // declaration
//    if (!assoc_array) { class is unusable; }           // validity
//    assoc_array.insert("xray",myclass_instance1);      // insert,
//    assoc_array.insert("yodel",myclass_instance2);     // returns 0 on failure
//    assoc_array.insert("zorro",myclass_instance3);
//    assoc_array.remove("yodel");             // delete, returns 0 if not there
//    cout << "Size is " << assoc_array.size() << "\n";  // size is 2
//    cout << "zorro is " << *assoc_array.find("zorro") << "\n"; // find
//    assert(assoc_array.find("garbage") == 0);        // failed find returns 0
//    if ((const char *p = assoc_array.first()) != 0)  // iterate over set
//      {                        // do not insert or remove while iterating
//      do
//        {
//        cout << p << "\t\t" << *assoc_array.find(p) << "\n";
//        } while ((p = assoc_array.next()) != 0);
//      }
//  other uses:
//    copy constructor
//      ASSOCIATION<int> x;      // fill x with stuff
//      ASSOCIATION<int> y(x);
//    assignment
//      ASSOCIATION<int> x;      // fill x with stuff
//      ASSOCIATION<int> y;
//      y = x;
//    static initialization
//      static const char *keys[] = { "key1","key2", "key3" }; // note: const
//      static int data[] = { 1,2,3 };
//      ASSOCIATION<int> x(keys,data,sizeof(keys)/sizeof(keys[0]));
template <class T> class ASSOCIATION : public ASSOCIATION_BASE
  {
  public:
    ASSOCIATION(unsigned int estimated_size = DEFAULT_ASSOC_SIZE) :
      ASSOCIATION_BASE(sizeof(T),estimated_size) { }    // default constructor
    // destructor and copy constructor found in ASSOCIATION_BASE
    ASSOCIATION<T> &operator=(const ASSOCIATION<T> &original)  // assignment
      { ASSOCIATION_BASE::operator=(original); return *this; }
                                                          // static initializer
    ASSOCIATION(const char **keys,T *data,unsigned int count) :
      ASSOCIATION_BASE(keys,ASSOC_CAST(const char *,data),sizeof(T),count) { }
    unsigned int size(void)                  // how many data elements in array
      { return fill_level; }
    int insert(const char *key,T data)       // add an element
      { return ASSOCIATION_BASE::insert(key,ASSOC_CAST(const char *,&data)); }
    int remove(const char *key)              // remove an element
      { return ASSOCIATION_BASE::remove(key); }
    T *find(const char *key)                 // find an element
      { return ASSOC_CAST(T *,ASSOCIATION_BASE::find(key)); }
    const char *first(void)                  // traversal functions
      { return ASSOCIATION_BASE::first(); }
    const char *next(void)
      { return ASSOCIATION_BASE::next(); }
    int operator!()                          // test array for problems
      { return ASSOCIATION_BASE::operator!(); }
    int operator()(const char *key)          // existence operator
      { return ASSOCIATION_BASE::find(key) != 0; }
    T &operator[](const char *key)           // access operator
      { return *(ASSOC_CAST(T *,ASSOCIATION_BASE::reference(key))); }
  };
// ASSOC_STORED - ASSOCIATION class with storage for keys
// This class is almost identical to ASSOCIATION except that it maintains the
// storage for the keys.  The interface is the same but the static initializer
// is left out.  If it is static, the keys are already stored. So why bother?
template <class T> class ASSOC_STORED : public ASSOCIATION_BASE
  {
  public:
    ASSOC_STORED(unsigned int estimated_size = DEFAULT_ASSOC_SIZE) :
      ASSOCIATION_BASE(sizeof(T),estimated_size) { }   // default constructor
    ~ASSOC_STORED();                         // destructor
    ASSOC_STORED(const ASSOC_STORED<T> &original) : ASSOCIATION_BASE(original)
      { dup_keys(); }                        // copy constructor
    ASSOC_STORED<T> &operator=(const ASSOC_STORED<T> &original)
      {                                      // assignment
      ASSOCIATION_BASE::operator=(original);
      dup_keys();
      return *this;
      }
    unsigned int size(void)                  // how many data elements in array
      { return fill_level; }
    int insert(const char *key,T data)       // add an element
      {
      if (key == 0)
        return 0;
      char *p = new char[strlen(key)+1];
      ASSOC_MEM_CHECK(p);
      strcpy(p,key);
      return ASSOCIATION_BASE::insert(p,ASSOC_CAST(const char *,&data));
      }
    int remove(const char *key);             // remove an element
    T *find(const char *key)                 // find an element
      { return ASSOC_CAST(T *,ASSOCIATION_BASE::find(key)); }
    const char *first(void)                  // traversal functions
      { return ASSOCIATION_BASE::first(); }
    const char *next(void)
      { return ASSOCIATION_BASE::next(); }
    int operator!()                          // test array for problems
      { return ASSOCIATION_BASE::operator!(); }
    int operator()(const char *key)          // existence operator
      { return ASSOCIATION_BASE::find(key) != 0; }
    T &operator[](const char *key)           // access operator
      {
      char *p = ASSOCIATION_BASE::find(key);
      if (p == 0)
        {
        if (key == 0)
          return *(ASSOC_CAST(T *,0));       // garbage in, garbage out
        char *p = new char[strlen(key)+1];
        ASSOC_MEM_CHECK(p);
        strcpy(p,key);
        return *(ASSOC_CAST(T *,ASSOCIATION_BASE::reference(p)));
        }
      return *(ASSOC_CAST(T *,p));
      }
  private:
    void dup_keys(void);
  };
// functions out of line cuz looped, large and/or called infrequently
// destructor
template <class T> ASSOC_STORED<T>::~ASSOC_STORED()
  {
  for (unsigned int i = 0 ; i < fill_level ; i++)
    delete [] ASSOC_CAST(char *,key_list[i]);
  }
// remove
template <class T> int ASSOC_STORED<T>::remove(const char *key)
  {
  char *data = ASSOCIATION_BASE::find(key);
  if (data != 0)
    {
    data = ASSOC_CAST(char *,key_list[(data-data_list)/sizeofdata]);
    if (ASSOCIATION_BASE::remove(key))
      {
      delete [] data;
      return 1;
      }
    }
  return 0;
  }
// duplicate the full set of keys
template <class T> void ASSOC_STORED<T>::dup_keys(void)
  {
  char *newkey;
  const char *oldkey;
  for (unsigned int i = 0 ; i < fill_level ; i++)
    {    // duplicate keys
    oldkey = key_list[i];
    newkey = new char[strlen(oldkey)+1];
    ASSOC_MEM_CHECK(newkey);
    strcpy(newkey,oldkey);
    key_list[i] = newkey;
    }
  }
#endif   // _ASSOC

Listing Two

/***************************************************************
 * file: ASSOC.CPP
 * purpose: template class for associative arrays
 * contains:
 *    ASSOCIATION_BASE - core routines for the ASSOCIATION template
 *    ASSOCIATION - association between strings and data objects
 * copyright: 1994 by David Weber.  Unlimited use granted in EXE, OBJ,
 *  or LIB formats.  Do not sell as source without asking first.
 * environment: tested Borland C++ 4.01 and Zortech C++ 3.1r2
 * history: 10-02-94 - initial code, based on an earlier C module
 **************************************************************/
#include "assoc.hpp"
/************************************************
* function:ASSOCIATION_BASE(unsigned int data_size,unsigned int estimated_size)
*  Constructor for an associative array
* parameters: byte size of a data element and the estimated size of array
* returns: nothing
************************************************/
ASSOCIATION_BASE::ASSOCIATION_BASE(unsigned int data_size,
                                   unsigned int estimated_size)
  {
  clear_pointers();            // preset as invalid
  sizeofdata = data_size;      // set up sizes
  array_size = estimated_size;
  fill_level = 0;              // empty to start
  allocate_array();            // allocate space
  }
/************************************************
 * function: ~ASSOCIATION_BASE()
 *  Destructor for an associative array
 * parameters: none
 * returns: nothing
 ************************************************/
ASSOCIATION_BASE::~ASSOCIATION_BASE()
  {
  delete_pointers();           // delete storage
  clear_pointers();            // just for tidiness
  }
/************************************************
 * function: ASSOCIATION_BASE(const ASSOCIATION_BASE &original)
 *  copy constructor for class
 * parameters: previous associative array to copy
 * returns: nothing
 ************************************************/
ASSOCIATION_BASE::ASSOCIATION_BASE(const ASSOCIATION_BASE &original)
  {
  clear_pointers();            // no heap to start
  *this = original;            // piggyback on assignment operator
  }
/************************************************
 * function: ASSOCIATION_BASE & operator=(const ASSOCIATION_BASE &original)
 *  assigment operator for class
 * parameters: previous associative array to copy
 * returns: *this
 ************************************************/
ASSOCIATION_BASE & ASSOCIATION_BASE::operator=(const ASSOCIATION_BASE &original)
  {
  delete_pointers();                 // remove old storage
  clear_pointers();                  // no heap to start
  array_size = original.array_size;  // essential data
  fill_level = original.fill_level;
  sizeofdata = original.sizeofdata;
  allocate_array();                  // allocate heap
  if (!*this)                        // valid?
    return *this;
  // copy hash, keys, links and data; this only works cuz linkage is via offsets
  memcpy(hash_list,original.hash_list,array_size * sizeof(unsigned int));
  memcpy(key_list,original.key_list,fill_level * sizeof(const char *));
  memcpy(link_list,original.link_list,fill_level * sizeof(unsigned int));
  memcpy(data_list,original.data_list,array_size * sizeofdata);
  return *this;
  }
/************************************************
 * function: ASSOCIATION_BASE(const char **keys,const char *data,
 *                            unsigned int data_size,unsigned int count)
 *  static initialization constructor, build an associative array with
 *  pre-existing key and data arrays.
 * parameters: pointer to an array of keys, pointer to an array of data
 *         expressed as chars, size of array in elements
 * returns: nothing
 ************************************************/

ASSOCIATION_BASE::ASSOCIATION_BASE(const char **keys,const char *data,
                                   unsigned int data_size,unsigned int count)
  {
  unsigned int i;
  clear_pointers();            // mark storage as invalid
  sizeofdata = data_size;      // set up sizes
  array_size = count;
  fill_level = 0;              // empty to start
  allocate_array();            // allocate space
  if (!*this)                  // valid?
    return;
  for (i = 0 ; i < count ; i++, keys++, data += sizeofdata)
    insert(*keys,data);        // add info
  }
/************************************************
 * function: int insert(const char *key,const char *data)
 *  Insert an entry into the associative array.  The caller is responsible for
 *  storage of the key;
 * parameters: const pointer to key string and const/non const pointer to data
 * returns: 1 if inserted OK or 0 if a bad key
 ************************************************/
int ASSOCIATION_BASE::insert(const char *key,const char *data)
  {
  unsigned int index,k;
  if (key == 0)                // no null keys allowed
    return 0;
  if (fill_level >= array_size)
    if (!expand_array())
      return 0;
  index = hash(key);           // find start in array
  for (k = hash_list[index] ; k != ASSOC_NULL ; k = link_list[k])
    {                          // see if already exists
    if (ASSOC_STRCMP(key,key_list[k]) == 0)
      {                        // replace current data
      memcpy(data_list+k*sizeofdata,data,sizeofdata);
      return 1;
      }
    }
  key_list[fill_level] = key;  // put in new data
  link_list[fill_level] = hash_list[index];
  hash_list[index] = fill_level;
  memcpy(data_list+fill_level*sizeofdata,data,sizeofdata);
  fill_level++;
  return 1;
  }
/************************************************
 * function: int remove(const char *key)
 *  Remove an entry from the associative array
 * parameters: const pointer to the key string
 * returns: 1 if removed or 0 if not found in the array
 ************************************************/
int ASSOCIATION_BASE::remove(const char *key)
  {
  unsigned int k,*l;
  if (key == 0)                // no null keys allowed
    return 0;
  for (l=hash_list+hash(key), k=*l ; k != ASSOC_NULL ; l=link_list+k, k=*l)
    {                          // find entry
    if (ASSOC_STRCMP(key,key_list[k]) == 0)
      {
      *l = link_list[k];       // unlink
      fill_level--;            // level shrinks
      key_list[k] = key_list[fill_level];  // move top of pile into "removed"
      link_list[k] = link_list[fill_level];
      memcpy(data_list+k*sizeofdata,data_list+fill_level*sizeofdata,sizeofdata);
      for (l=hash_list+hash(key_list[k]) ; *l != ASSOC_NULL ; l=link_list + *l)
        if (*l == fill_level)
          {                    // fix link to what was top of pile
          *l = k;
          break;
          }
      return 1;
      }
    }
  return 0;                    // doesn't exist
  }
/************************************************
 * function: char *find(const char *key)
 *  Lookup an entry in the associative array and return a non const pointer
 * parameters: const pointer to the key string
 * returns:non const pointer to the data associated with the key
 ************************************************/
char *ASSOCIATION_BASE::find(const char *key)
  {
  unsigned int k;
  if (key == 0)                // no null keys allowed
    return 0;
  for (k = hash_list[hash(key)] ; k != ASSOC_NULL ; k = link_list[k])
    {                          // follow chain in array
    if (ASSOC_STRCMP(key,key_list[k]) == 0)
      return data_list + k * sizeofdata;
    }
  return 0;                    // not found
  }
/************************************************
 * function: const char *first(void)
 *  Find the first key string in the array.  Follow this by
 *  next() calls until a 0 is returned.  Inserting or Removing
 *  from an array while you are iterating will invalidate the
 *  iteration sequence (but doesn't mess up the array).
 * parameters: none
 * returns: first key string encountered or 0 if none
 ************************************************/
const char *ASSOCIATION_BASE::first(void)
  {
  iteration = 0;               // start from beginning
  return next();               // and search
  }
/************************************************
 * function: const char *next(void)
 *  Find the next key string in the array, call first()
 *  to start iteration.
 * parameters: nothing
 * returns: next key string encountered or 0 if all done
 ************************************************/
const char *ASSOCIATION_BASE::next(void)
  {
  while (iteration < fill_level)   // until end of data
    return key_list[iteration++];  // return key
  return 0;
  }
/************************************************
 * function: char *reference(const char *key)
 *  find a key and return a reference to its data if it is there
 *  otherwise insert a place for it and return a reference to the
 *  zeroed out hole
 * parameters: const pointer to the key string
 * returns: pointer to associated data (which may be zeroed out)
 ************************************************/
char *ASSOCIATION_BASE::reference(const char *key)
  {
  unsigned int k,index;
  if (key == 0)                                // no null keys allowed
    return 0;
  index = hash(key);
  for (k = hash_list[index] ; k != ASSOC_NULL ; k = link_list[k])
    {                                          // follow chain in array
    if (ASSOC_STRCMP(key,key_list[k]) == 0)
      return data_list + k * sizeofdata;       // found it
    }
  if (fill_level >= array_size)                // expand if necessary
    if (!expand_array())
      return 0;
  key_list[fill_level] = key;                  // put in hole for new data
  link_list[fill_level] = hash_list[index];
  hash_list[index] = fill_level;
  memset(data_list+sizeofdata*fill_level,0,sizeofdata);
  return data_list + sizeofdata*fill_level++;  // return pointer to hole
  }
/************************************************
 * function: unsigned int hash(const char *key)
 *  local function calculates a hash.  Designed to minimize clustering.
 * parameters: key string
 * returns: hash value clipped to array size
 ************************************************/
unsigned int ASSOCIATION_BASE::hash(const char *key)
  {
  unsigned int index;
  const unsigned char *k;
  for (index=0x5555, k=ASSOC_CAST(const unsigned char *,key) ; *k != 0 ; k++)
    index = (index << 1) ^ ASSOC_MAP(*k);      // hash key
  return index % array_size;                   // fit in array
  }
/************************************************
 * function: void allocate_array(void)
 *  local function allocates and initializes the array
 * parameters: none
 * returns: nothing
 ************************************************/
void ASSOCIATION_BASE::allocate_array(void)
  {
  unsigned int i;
  hash_list = new unsigned int[array_size];    // allocate hash array
  key_list = new const char *[array_size];     // allocate key pointers
  link_list = new unsigned int[array_size];    // allocate key linkage
  data_list = new char[array_size*sizeofdata]; // allocate storage for data
  ASSOC_MEM_CHECK(hash_list);                  // validate resources
  ASSOC_MEM_CHECK(key_list);
  ASSOC_MEM_CHECK(link_list);
  ASSOC_MEM_CHECK(data_list);
  for (i = 0 ; i < array_size ; i++)
    hash_list[i] = ASSOC_NULL;                 // preset with nothing
  }
/************************************************
 * function: int expand_array(void)
 *  double the size of the array
 * parameters: none
 * returns: 1 if expanded OK or 0 if failed
 ************************************************/
int ASSOCIATION_BASE::expand_array(void)
  {                            // if array full, increase size
  const char **old_key;
  char *old_data;
  unsigned int i,index;
  old_key = key_list;          // save old data
  old_data = data_list;
  delete [] hash_list;         // remove pointer storage
  hash_list = 0;
  delete [] link_list;
  link_list = 0;
  array_size <<= 1;            // new size
  allocate_array();            // new array
  if (!*this)                  // valid?
    return 0;
  memcpy(key_list,old_key,fill_level*sizeof(const char *));
  memcpy(data_list,old_data,sizeofdata*fill_level);
  for (i = 0 ; i < fill_level ; i++)
    {                          // rehash old data into new array
    index = hash(old_key[i]);
    link_list[i] = hash_list[index];
    hash_list[index] = i;
    }
  delete [] old_key;           // blow away old storage
  delete [] old_data;
  return 1;
  }
  
End Listings


Copyright © 1995, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.