August 01, 1997
C++ Theory and Practice
Maybe It Wasn't Such a Good Idea After All
Dan Saks
August 1997/C++ Theory and Practice
Dan has second thoughts about an early C++ design decision.
Copyright © 1997 by Dan Saks
In C++, classes and structs are
essentially the same construct. They obey nearly identical
rules. Rather than repeatedly refer to "class or
struct," the draft C++ Standard neatly folds structs
into classes with this single statement:
A structure is a class defined with
the class-key struct; its members and base classes are
public by default.
(The identical statement appears in
the ARM[1]
.) Aside from coding
examples, the draft hardly mentions structs. It speaks only
of classes with the understanding that classes include
structs.
Thus, a struct in C++ can have members
(including constructors and a destructor), access
specifiers, and base classes. The only real difference
between a class declared as a class and a class declared as
a struct is in the default access to their bases and
members. For example, in
class B : D
{
T m;
};
D is a private base of B and m is a
private member of B. If you simply change the keyword class
to struct, as in
struct B : D
{
T m;
};
then D is a public base and m is a
public member. If you always specify the access on the base
class and the first member, it doesn't matter whether you
use the keyword class or struct.
Bjarne Stroustrup[
2
]
explained his reasons for defining structs as classes
as follows:
Maybe we could have lived with two
sets of rules, but a single concept provides a smoother
integration of features and simpler implementations. I was
convinced that if struct came to mean "C and
compatibility" to users and class to mean "C++ and
advanced features," the community would fall into two
distinct camps that would soon stop communicating... Only a
single concept would support my ideas of a smooth and
gradual transition from "traditional C-style
programming," through data abstraction, to
object-oriented programming. Only a single concept would
support the notion of "you only pay for what you use"
ideal.
He later added that:
I think the idea of keeping
struct and class the same concept saved us from classes
supporting an expensive, diverse, and rather different set
of features that we have now. In other words, the "a
struct is a class" notion is what has stopped C++ from
drifting into becoming a much higher-level language with a
disconnected low-level subset."
Like many beginning C++ programmers, I
was surprised when I learned that a struct is a class. But
it appealed to me right away. I have always liked designs
that unify seemingly disparate concepts within a common set
of rules provided the common set of rules are indeed
simpler than the separate of sets of rules would have been.
Therein lies my current
disillusionment. If the Grand Unification (of classes and
structs) had indeed been a good idea, it would have led to a
simpler and safer language. I don't believe it did. I now
believe it led to a language that is more complicated and
perilous than it had to be.
One of the direct consequences of the
Grand Unification is the language's penchant for generating
copy constructors and copy assignment operators. These
generated functions are a well-known source of bugs.
Generated Copy
Operations
The copy constructor for a class T is
a constructor that can be called with a single argument of
type T. The typical declaration for a copy constructor has
the form
T(T const &);
(My habit of writing T const &
rather than const T & is purely a matter of style.)
A copy assignment operator for class T
is an operator= that can be called with an argument of type
T. The typical declaration for a copy assignment operator
has the form
T &operator=(T const &);
Although the term copy assignment
sounds redundant, it isn't. A class can have numerous
assignment operators. The copy assignment is the only one
that compilers can generate. The term copy assignment
distinguishes this assignment operator from all other
assignment operators.
A generated copy constructor uses
memberwise construction. For example, if class T has members
m and n, then the generated copy constructor behaves as if
it were defined as:
T::T(T const &t) : m(t.m), n(t.n)
{
}
For any member that has a scalar type,
such as int or char *, then the compiler turns its member
initializer into an assignment.
Similarly, a generated copy assignment
uses memberwise assignment. For class T, copy assignment
behaves as if it were defined as:
T &T::operator=(T const &t)
{
m = t.m;
n = t.n;
return *this;
}
Copy constructors and copy assignment
operators are profoundly important in C++. Initialization
and assignment are fundamental operations available to all
types in C, and these functions are the means by which C++
extends them to class types. C++ uses copy constructors not
only for declarations, but also for passing function
arguments and returning function results.
It often appears that C++ is doing
you a favor by generating the copy operations for you. After
all, most classes need them and, at least for many simple
classes, the compiler-generated functions are just right.
This allows rank beginners to write code that successfully
passes class objects by value even before they know enough
about const and references to write a decent copy
constructor.
For example, you might have a class
representing rational numbers (exact fractions) defined as:
class rational
{
...
private:
long num, denom;
};
For this class, the compiler-generated
copy constructor is exactly as it should be. In this case,
the compiler has indeed done you a favor.
On the other hand, many interesting
classes contain pointers to dynamically-allocated memory.
For example, a typical vector class contains a pointer to a
dynamically-allocated array that holds the vector elements:
template <class T>
class vector
{
...
private:
T *v;
size_t n;
};
For these vector types, the generated
copy operations produce objects that share
dynamically-allocated arrays without realizing that they are
doing so. This leads to real confusion about who owns what.
Programs with such objects usually wind up leaking memory or
scrambling the free store.
A Question of
Compatibility
If a compiler might generate erroneous
copy operations, why does it generate them at all? As
Stroustrup[
2]
explained:
I personally consider it
unfortunate that copy operations are defined by default...
However, C++ inherited its default assignment and copy
constructors from C, and they are frequently used.
(By "default assignment," he
meant what the draft now calls "copy assignment.")
In C, you can assign an object of a
struct type to another object of the same struct type. You
can also pass and return objects of struct type by value.
Struct copy operations (assignment and initialization)
assign each member of the source object to the corresponding
member of the destination object. That is, they perform what
C++ calls memberwise assignment.
When you compile C code as C++, C++
treats the structs as classes. If that code performs a
struct assignment, the C++ compiler will look for a copy
assignment that will carry out the assignment. Since the
code is C, the compiler will not find what it's looking for.
If the compiler is to treat structs as classes, and yet
succeed at compiling the code, it must generate a copy
assignment that will behave just like a C struct assignment.
By the same token, the compiler must generate a copy
constructor as needed for struct initialization.
But, did C++ really inherit the copy
assignment and copy constructor from C, as Stroustrup
suggested? I don't think so. C++ inherited struct
assignment, that's all. C has copy assignments and copy
constructors only if you first posit that "a struct is
a class." By abandoning the dream of the Grand
Unification, we could have had a language without a
propensity for generating inappropriate copy operations.
The Unification That
Wasn't
The real irony of the Grand
Unification is that it was an illusion. The ARM preserved
the illusion by failing to address various incompatibilities
with C. C makes various promises about the layout of members
within a struct. One such promise is that you can convert a
pointer to a struct into a pointer to the first member of
that struct. Since a C++ class object can have hidden data
such as a vptr (a pointer to a virtual function table), C++
cannot extend those promises to all class types.
In removing the incompatibility, the
draft C++ Standard added a definition for a restricted
category of classes called POD classes. POD stands for "plain
ol' data," and (surprise!) it's pretty much a struct or
union as defined in C.
PODs are defined in terms of
aggregates. An aggregate is an array or a class with no
user-declared constructors, no private or protected
non-static data members, no base classes, and no virtual
functions. In essence, an aggregate is a class type that can
be brace-initialized. For example,
struct date
{
int mm, dd, yy;
};
is an aggregate. You can declare a
date d_day and initialize it with a sequence of expressions
enclosed in braces, as in
date d_day = { 6, 6, 1944 };
On the other hand, if a date has
private members and a constructor, as in:
class date
{
public:
date(int m, int d, int y);
private:
int mm, dd, yy;
};
then it is not an aggregate, and you
cannot brace initialize it. Rather, you must initialize a
date by calling the constructor:
date d(6, 6, 1944);
An aggregate may sound like it's just
a C struct, but it's more than that. An aggregate can have
members with non-aggregate types (such as classes with
constructors). It can also have members that are references
or pointers to members.
The draft defines a POD as an
aggregate that has no non-static data members of type
pointer to member or non-POD (or array of such types) or
reference, and has no user-defined copy assignment operator
and no user-defined destructor.
How is a POD different from a C
struct? A POD can have static data members and non-virtual
member functions (other than the ones specifically
prohibited). I believe that's the only difference.
In addition to eliminating the
incompatibility with C, the notion of POD types is useful in
describing what kinds of class objects can reside in
read-only memory (ROM). Although the draft doesn't say so
explicitly, I believe it implies that a const-qualified
class object can reside in ROM only if it has a POD type.
The current draft Standard makes
distinctions between POD and non-POD classes in a dozen or
so places. It makes distinctions between aggregate and
non-aggregate classes in a few more. If the concepts of
class and struct were truly unified, I don't think we'd see
so many such distinctions.
What to Do
As I mentioned earlier, I have always
liked designs that unify seemingly disparate concepts within
a common set of rules provided the common set of rules
are indeed simpler than the separate of sets of rules would
have been. I don't like the Grand Unification of classes and
structs; the rules aren't any simpler, and they have some
very bad effects.
Well, it's interesting to critique
the language and think about what we might have done if we
had the power to do it over. But we're not about to change
C++. Nonetheless, I think the insights are helpful. They
lead us investigate programming techniques we can use to
avoid the traps built into the language.
Next month I will look at some other
problems that arise from the Grand Unification, and look at
what we can do to cure those problems.
References
[1]
Margaret
Ellis and Bjarne Stroustrup. The Annotated C++ Reference
Manual (Addison-Wesley, 1990).
[2]
Bjarne
Stroustrup. The Design and Evolution of C++ (Addison-Wesley,
1994).
Dan Saks is the president of Saks &
Associates, which offers training and consulting in C++ and C. He
is active in C++ standards, having served nearly seven years as
secretary of the ANSI and ISO C++ standards committees. Dan is
coauthor of C++ Programming Guidelines, and codeveloper of the
Plum Hall Validation Suite for C++ (both with Thomas Plum). You
can reach him at 393 Leander Dr., Springfield, OH 45504-4906 USA,
by phone at +1-937-324-3601, or electronically at
dsaks@wittenberg.edu.