Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

Persistence for C++


October 1996: Persistance For C++

Persistence for C++

Persistence for a nonpersistent language

David Channon

David is a PhD candidate in the department of computer science at the University of Newcastle. He can be contacted at [email protected].


Persistent objects are data elements that can live after the program that created them has ceased executing. Ideally, persistent objects can outlive different versions of the creation programs, the operating system, or even the hardware on which the operating system was running when the persistent objects were created. Persistence is a desirable addition for the next generation of software and computing systems: It can simplify multimedia management and tools, distributed operating systems, and shared-memory multiprocessor machines.

The most common applications using persistence are object-oriented databases. But persistence can also be used by other applications. For example, compilers could provide true incremental compilation. The persistent layer allows the parse tree to be saved without needing to provide intermediate object code. The language that generated the parse-tree segment becomes immaterial, allowing team members to develop in their chosen language. In the final stage, the parse trees are integrated to produce the targeted system executable.

Persistence is commonly applied to save an application's state. For example, a computer simulator may use a persistent layer to build a checkpoint mechanism that provides fault tolerance against system downtime. The primary application is the simplification of applications that process large amounts of data. Rather than implementing different data-organization structures in memory and on secondary storage, the data is maintained with an identical structure, irrespective of its presence in primary or secondary memory.

When integrated into an application, persistence provides a functional data-management service. Although the persistent layer is hidden, many applications do not need all the services offered. The target application may only need minimal persistent functionality-which is best satisfied by a handcrafted solution. Although this article does not provide a complete solution, I'll introduce some basic issues for implementing a persistent layer for C++, which does not formally include persistence as a language feature.

Persistence Techniques

There are several approaches to providing persistence using C++. In general, generic solutions use a preprocessor to automate the integration of the persistent layer. The preprocessor usually operates on the original source code that is augmented with special persistent directives, which guides the inclusion of the persistent layer operations. I'll restrict my discussion here to general introductory storage and restoration techniques, leaving any preprocessor-automation implementation for a future article.

To make objects persistent, two issues need to be addressed: object storage and object restoration. Object storage, includes class identification, encoding of data, and encoding of complex data such as contained objects and pointers. Object restoration includes creating an actual instance using a symbolic class name, retrieving data, and restoring complex data.

The core aspect of persistence is the ability to save and restore objects to and from secondary storage. There are two main forms of data transfer provided by C++-a text mode and a binary mode, each offering different challenges:

Text-based. C++ streams are close to offering some level of language-supported persistence. All data is converted to a common ASCII text format, allowing data to be easily transferred back and forth from secondary storage. It is the most popular method of providing persistence, since it is well suited for small amounts of data, or limited and simple data types.

Binary-based. Objects are saved directly to disk without text conversion. Binary is faster than text-based data transfer because there is no ASCII conversion. Although saving the data is simple, restoration needs to address a number of issues, including the hidden-pointer problem (see "Making C++ Objects Persistent: The Hidden Pointer Problem," by A. Biliris, S. Dar, and N. Gehani, Software, December 1993).

To provide persistence within a nonpersistent language like C++, the abstraction mechanisms of the native language are used to build the persistent layer. The implementation issues that must be considered are:

  • Extensibility: the ability to add persistent properties to application objects.
  • Typing: type versus shape versus value (object-data management), static versus dynamic establishment, and static versus dynamic recovery.
  • Modularity: modules' independence from persistence.
  • Information hiding: modular interaction among system components, with hidden intermedial representation.
  • Program construction: minimization of additional syntax and semantic baggage in writing programs that make use of the mechanism.

Not all persistent-layer solutions provide support for variant (unions), aggregate (structures), nonaggregate (built-in) and aliased (typedef) types. To provide uniform persistence, the schema (a database of type information) needs to be maintained. The schema describes each type's content and structure. This information can be used to view and manage the stored objects and the relationships between the objects. The schema may include additional information, such as compact-class information, to assist with the development of browsing tools.

Eventually, the application that originally created the object store will be developed or replaced. When the object store outlives the application, changes in the application's object structure are inevitable. Schema evolution is the ability to update the persistent data and the relationships between the stored objects in a new version of the application. The original schema information provides the framework for schema evolution, thus preserving data integrity over time.

Stream-Based Persistence

According to Bjarne Stroustrup, "The stream I/O family in C++ is exclusively concerned with the process of converting typed objects into sequences of discrete characters and vice versa," (The C++ Programming Language, second edition. Addison-Wesley, 1992). Stream-based persistence stores objects into a text-based data object store. The C++ streams library converts and transfers the built-in types. Aggregate types (structures and classes) are the programmer's responsibility to convert and transfer to disk.

There are two main approaches when incorporating persistence based on C++ streams: Adding methods or overloaded stream operators to perform the save/ restore operations; and inheriting a persistent base class that controls the save/ restore operations.

The most basic level of persistence is provided by overloading the shift operators to allow the aggregate types' data to be streamed. For example, a class designed to represent a line uses the start and end coordinates as the basic data necessary to describe the line object. The line aggregate type consists of four built-in integer data items representing the coordinates.

Two overloaded friend shift operators provide the input and output operators necessary to save and restore the data values of the class. Listing One is an example declaration including the overloaded stream operators. The example class contains the data members, which contain the start and finish points of a line. When saved, each value is separated by a space to ensure correct interpretation of the value when the data is restored. You must be careful to ensure that restoration of the data is performed in the same sequence as when it was stored; see Listing Two. In addition to the correct order of instance data, the order in which the objects themselves are saved and restored must be maintained.

Class data items are not restricted to the simple built-in types and may include nested aggregate types, variant types, aliased types, and relational pointers. Aggregate types are not as difficult to stream as it may first appear, since they can be recursively broken down into a set of built-in types and streamed. Variant types, aliased types, and relational pointers require significant management of the type information. Among these, relational-pointer management offers the most difficult challenge.

Nontrivial software involves dynamic structures: When dynamic structures are restored, the relational-pointer chains must be maintained. Updating the pointer values upon restoration is necessary, as the location of each object is likely to have changed. Therefore, a persistent data structure may not need to explicitly store the relational-pointer values. Many data structures can have their original shape inferred. It is important that the stream contains enough information to be able to rebuild the original structure. For example, a persistent binary-tree data structure can be inferred through a preorder traversal. To maintain the structure, you must insert additional information into the stream to indicate that a child node does not exist. The output stream of a binary tree of integers would have the data values dispersed with appropriate no-child indicators such as "nil." Figure 1 shows how a binary-tree structure is inferred from the flattened output of the traversal rules.

An application's persistent store would normally contain more than a single data type. For example, a line class and a rectangle class are created, both using a pair of coordinates. The line coordinates describe the end points, and the rectangle coordinates represent the location of the opposite corners. For the items to be restored correctly, the application needs to be aware of the order in which each object was saved. Without the explicit knowledge of object restoration order, the application needs to determine the owner of each data item. To determine ownership of each object in the stream, each object's data is preceded by an identification string, such as the object's class name; line 1 0 100 200, rectangle 5 20 50 60, for example. This approach provides the added benefit of allowing the persistent store to outlive the application.

The object-restoration process consists of determining the next object in the stream, creating an instance of the class by an object factory, then restoring the data. The object-factory restoration approach provides a clean restoration technique that also addresses issues concerning pointers to base classes. C++ polymorphism allows a base-class pointer to refer to a base-class object and any object derived from it. To restore a polymorphic pointer correctly, the object-data ownership issue resurfaces. A base-class pointer must be restored with a pointer to an instance of the original polymorphic type. The solution, again, requires inserting object identifications within the stream.

To help simplify many of the issues, you can use a persistent base class that encapsulates the three common operations necessary to implement the persistent layer. To make a class persistent, it must inherit the base class that provides the persistence interface for the object. The save/restore operations and an object factory, coupled with object identification, are implemented using protected virtual functions, as in Listing Three. The protected persistence methods are accessed by the stream operators. The output stream operation saves the object identification by first calling the virtual name method to output the object identification, then adding the virtual write function to output the instance data; see Listing Four. Restoration requires that the object factory be made aware of all the object names. The factory can be implemented as a global function, taking the object identification as an argument and returning a pointer to the created instance (Listing Five). The restore operation reads the identification from the stream, then uses the factory to create the related object instance that proceeds to read the instance data, as in Listing Six.

The use of the object factory as shown assumes all persistent objects will be created on the heap. Implementing persistence only through dynamically allocated objects is a common limitation among persistent stores. A distinction must be made between stream output for dynamic, polymorphic objects, and nonpolymorphic objects to allow static objects to be persistent. This distinction is made by augmenting the persistent base class with additional stream operators, specifically for dynamic recovery and additional virtual functions to provide the nondynamic restoration; see Listing Seven.

Binary-Based Persistence

As mentioned earlier, binary transfer of data to secondary storage is more efficient than the text-based approach. In addition to avoiding the penalty that text conversion incurs, a binary approach bypasses several layers of abstraction in the standard C++ I/O implementation.

Unlike streams, binary-based persistence is generally considered an external mechanism. This means the save/restore operations are performed outside the class definition, usually by low-level block I/O functions. In the save operation, the persistent store is opened and the entire object is saved as a continuous block of bytes using a low-level write system call; see Listing Eight. In the restore operation, the persistent store is opened, the object instance is created, and the data values from the store are read; see Listing Nine. The basic mechanism behind binary-based persistence is simple; however, a number of challenges exist.

Virtual Functions and Classes

The most difficult aspect is the restoration of objects with virtual functions and virtual base classes. Most C++ programmers are unaware of the existence of hidden pointers. The C++ compiler cannot determine all bindings at compile time, when polymorphism is being employed. Run-time binding is implemented by inserting hidden pointers into the class data. There are two types of hidden pointers inserted by C++: the virtual class pointer (vcp) and the virtual function pointer (vfp). This is more commonly referred to as "the hidden-pointer problem."

The vcp is inserted for each base class inherited with the keyword virtual. The base-class pointer refers to the location of the base class, so only a single instance of the base-class data will exist even when more than one copy of the base class is inherited within the inheritance hierarchy. The vfp refers to a virtual pointer table. Each entry in the table is a pointer to the virtual function's implementation. Late binding uses the virtual function table to select the correct function to execute. For example, a class declaration with a virtual print function in the base class includes a vfp, in addition to the class-instance data. The inserted vfp refers to the entry in the virtual pointer table that points to the implementation of the function. A vcp is also present when this same class inherits the base class virtually, as illustrated by Listing Ten and Figure 2.

Virtual pointers present difficulties when implementing binary-based persistence. Virtual pointers are considered volatile: Their values will vary among programs dealing with the same persistent data. Additionally, the C++ standard does not include any operators directed toward virtual pointer manipulation. As a result, the implementations of virtual pointers vary among compilers. (The standard does not dictate the implementation because the application program is not expected to manipulate them directly.) Differences include the location of the pointers within the data and the type of pointer-some compilers implement base-class pointers as an index rather than as memory-location pointers. The advantage of the base-class index is that it reduces the amount of memory occupied by the object, though it limits the total size of object-instance data. Any solution dealing with hidden pointers must consider the possible differences in virtual-pointer implementation by the compiler vendors.

The hidden-pointer problem only needs to be considered upon object restoration. There are two ways to initialize the hidden pointers. The first approach uses a member-wise copy to an instance of the object with the correctly initialized hidden pointers. Alternatively, the hidden pointers are initialized directly without disturbing the data.

Directly updating the hidden pointers minimizes the data-copying overhead. To restore the class correctly, the language is tricked into updating the hidden pointers through the use of a special version of the overloaded new operator. new allocates memory and, if no errors are generated, initializes the hidden pointers, then executes the appropriate constructor method. The special overloaded version of new designed for restoration uses a pointer parameter to distinguish this version from the original version of new, as in Listing Eleven. The code returns the pointer without allocating any memory, so no errors are reported and the language proceeds to initialize the hidden pointers.

The overloaded new will call the default constructor of the object that has the potential to destroy the restored data. One solution to controlling the execution of the constructor uses the value of a global variable indicating whether the constructor has been invoked by the overloaded restoration version of new. When a restoration operation is necessary, the global-restoration flag is set; the overloaded restoration new initializes the hidden pointers, and the global flag is reset. This completes the restoration. To help manage the entire process, a static member function encapsulating the three actions is incorporated into the persistent class; see Listing Twelve.

The constructor needs to be rearranged to modify only the instance data based on the setting of the global flag. The constructor body is protected by an encompassing conditional statement. However, further management is necessary because of a C++ feature that allows the constructor to include an initialization list. One obvious solution is to disallow the use of initialization lists and move existing code into the constructor body. This approach is not workable due to restrictions on initialization of inherited classes. An alternative approach is to use the inline conditional statement embedded in the argument of each entry in the initialization list. The inline conditional passes either the initialization value or the data's current value based on the global flag, as illustrated in Listing Thirteen.

There remains a possible problem with polymorphic base-class pointers to persistent objects. To correctly maintain C++ semantics, the polymorphic pointer may need updating to point at the appropriate base class within the restored object. If so, object-management routines that maintain a list of all the types and the associated base classes in the type can be used. Contained within this information are the offsets of the base classes, which are used to update the polymorphic base-class pointer.

The previous examples illustrate techniques for initializing the virtual hidden pointers directly. Alternative techniques employ member-wise copying.

The first approach requires the insertion of persistence member functions similar to the streams examples. The member functions read and write each of the individual data values. The disadvantages include the programming overhead of the additional methods and the execution cost of a function call for every instance-data value.

A second technique uses the assignment operator as its default behavior is to perform a member-wise copy:

1. Create space for the to be restored object and read in the data.

2. Create an instance of the object that will contain the correct hidden pointers.

3. Use the assignment operator to copy the values from the allocated space containing the object's data to the new instance.

The disadvantages of this solution are the greater allocation of memory (twice for each object), and reliance on the expected behavior of the assignment operator. (If the assignment operator is overloaded, the semantics may not hold, thus invalidating the solution.)

A general solution for object restoration using member-wise copying uses type information for all the persistent objects. The location of the hidden pointers must be known to perform the copy. A metadatatype mask is created by comparing a zeroed object with valid hidden pointers to a second instance of the class. The generated mask is used to guide the member-wise copy. The procedure follows and will operate with any C++ compiler:

    1. Overload the new operator. Use calloc() to create the space filled with zeros. The resultant call to new will leave a zero-filled object with correct hidden pointers.

    2. Create a second instance of the class and compare the values byte by byte. The equal, nonzero values indicate a virtual pointer table or a base pointer implemented as an index. Otherwise, they represent a true base pointer.

    3. With the knowledge of the size of each pointer, a mask can be built that is sufficient to guide the data-member copy.

This approach at first appears impractical, due to the simplicity of just overloading new. However, metadata is essential for fast storage of entire pages of memory.

Conclusion

This article does not provide a complete solution for persistence in C++. Although persistence can be implemented using various approaches, the necessity for type information is a common factor. Run-type information is very important for schema evolution, polymorphic-object restoration, and for distributed persistence.

References

Biliris, A., S. Dar, and N. Gehani. "Making C++ Objects Persistent: The Hidden Pointer Problem," Software, December 1993.

Laurent, Philip and Nino Silverio. "Persistence in C++," Journal of Object-

Oriented Programming, October 1993.

Soukup, Jiri. Taming C++: Pattern Classes and Persistence for Large Projects. Reading, MA: Addison-Wesley, 1994.

Wolf, Alexander L. "Abstraction Mechanisms and Persistence," Proceedings

of the 4th Workshop on Persistent Object Systems, 1990.

Figure 1: A persistent binary tree.

Figure 2: Memory layout.

Listing One


// Example of Stream operator declaration
class line {
    friend istream& operator>>(istream&, line&);
    friend ostream& operator<<(ostream&, const line&);
    public:
         void Draw();
    private:
         int startx, starty;
         int finishx, finishy;
 };

Listing Two


// Streaming the line aggregate type
ostream& operator<<(ostream &os, const line &li) {
      return os << li.startx << " " << li.starty << " " <<
           li.finishx << " " << li.finishy;
}
istream& operator>>(istream &is, line &li) {
      return is >> li.startx >> li.starty >>
           li.finishx >> li.finishy;
}

Listing Three


// Declaration of a Persistence base class
class pbase {
      friend istream& operator>>(istream&, pbase *&);
      friend ostream& operator<<(ostream&, const pbase *);

      protected:
     virtual void Read(istream&) = 0;
     virtual void Write(ostream&) = 0;
     virtual char *Name() = 0;
 };

Listing Four


// Store the object using virtual methods
 ostream& operator<<(ostream &os, const pbase *base) {
        os << base->Name() << endl;
        base->Write(os);
        return os;
 }

Listing Five


// Prototype of object factory
pbase *Create(char*);

Listing Six


// Object restoration operation
istream& operator>>(istream &is, pbase *&base) {
        char theName[30];
        is >> theName;
        is.get();                   // endl
            base = Create(theName);         base->Read(is);
        return is;
}1

Listing Seven


// Heap and non-heap based restoration persistent base class
class ipstream: public ifstream {};
class opstream: public ofstream {};

class pbase {
friend ipstream& operator>>(ipstream&, pbase *&);
friend opstream& operator<<(opstream&, const pbase *);
friend istream&  operator>>(istream&, pbase *&);
friend ostream&  operator<<(ostream&, const pbase *);
      protected:
       virtual void Read(ipstream&) = 0;
       virtual void Write(opstream&) = 0;
       virtual char *Name() = 0;
     public:
       virtual void Input(istream&)=0;
       virtual void Output(ostream&) = 0;
};

Listing Eight


// Program to save an object to disk
       main() {
         line A(1, 0, 100, 50);
         int fd = create("data.store", FILE_MODE);
         write(fd, &A, sizeof(line));
       }

Listing Nine



// Program to restore a saved object
       main() {
         int fd = open("data.store");
         Line A;
         read(fd, &A, sizeof(line));
         A.print();
       }

Listing Ten


// Code which inserts hidden pointers
      class shape {
       public:
        virtual void print();
       protected:
        int startx, starty;
       };
       class line: virtual public shape {
       public:
        void print();
       private:
        int finishx, finishy;
       }; 

Listing Eleven


// Overload new and restoration of a class example
    class _pce { };

    void operator new(size_t, _pce *p) {
         return (void*) p;
    }

    new((_pce*)p) employee;

Listing Twelve


// static restore method.
    extern short _fix_hidden;
    static void D::reinit(void *p) {
          _fix_hidden = 1;
          new((_pce*)p)D;
          _fix_hidden = 0;
    }

Listing Thirteen


// Modification to constructor and Initialization list.
//        D::D(parameter declarations) initization list {}
     employee::employee(): sal(_fix_hidden ? sal: 3000) {
           if(!_fix_hidden) {

           }
     }


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.