Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



The Standard Librarian: User-Defined Format Flags


February 2001 C++ Experts Forum/The Standard Librarian


When you write a class, you'll probably want to define operations to read and write objects of that class. The usual technique in C is to define functions with names like read_X and write_X, but in C++ we have a better way. I/O functions in C, like printf and scanf, involve format strings that describe a fixed set of types like int, double, and char*: you write printf("x = %d", x) if x is an int, printf("x = %g", x) if x is a double, and so on. The C I/O library is not extensible, so we use one mechanism, printf, for built-in types, and another mechanism, functions like write_X, for user-defined types.

The C++ I/O library is extensible. It relies on overloading, not on format strings: you write cout << "x = " << x whether x is an int, a double, or any other type. To make this mechanism work with your own type, all you have to do is define a function with the signature

std::ostream& operator<<(std::ostream&, const my_type&)

Built-in types and user-defined types appear on an equal footing; in both cases, the appropriate function is selected by ordinary overload resolution. It's even possible to write generic I/O components parameterized by type, such as ostream_iterator.
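As a minimal sketch of that equal footing, here is an output operator with the signature above and a generic component, std::ostream_iterator, that uses it through ordinary overload resolution. The point type and the join_points function are hypothetical, invented only for this illustration:

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical user-defined type, for illustration only.
struct point {
  int x;
  int y;
};

// The output operator has the standard signature, so ordinary overload
// resolution selects it, just as it selects the library's operator for int.
std::ostream& operator<<(std::ostream& os, const point& p) {
  return os << '(' << p.x << ", " << p.y << ')';
}

// A generic I/O component, ostream_iterator, works with point unchanged.
std::string join_points(const std::vector<point>& pts) {
  std::ostringstream out;
  std::copy(pts.begin(), pts.end(),
            std::ostream_iterator<point>(out, " "));
  return out.str();
}
```

Nothing in join_points is specific to point; the same std::copy call would work for a vector of int.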

This is all straightforward, and it's covered in every introductory C++ book [1]. There's just one annoying complication: what do you do if there's more than one good choice for the class's printed representation?

This is hardly a rare occurrence. If you're writing a class to store IPv6 addresses, for example, you can print an address as 0:0:0:0:0:FFFF:8190:3426, ::FFFF:129.144.52.38, or in several other ways. If you're writing a class to represent physical quantities, should you print a length as 0.5m, 50cm, or 20in? If you're writing a date and time class, do you print "June 16, 1904", "16/6/04", or any of a dozen other possibilities? (The time_put and time_get locale facets give some limited guidance on these issues, but they're mostly a low-level framework.) We even have this problem with a humble complex number class: should we write a complex number as (5, 3), 5 + 3i, 5 + 3j, or perhaps 5.8*exp(0.54i)?

In all but the most trivial cases, there's more than one sensible way to print an object. If you're writing a class that's only intended for a single use in a specific project, maybe it's acceptable to hard-wire a specific policy. If you're writing a class for reuse, however, you don't have that luxury. It's the class's user, not the class's designer, who knows which choice is appropriate in a given context.

An obvious bad idea would be to give your class a static member variable that specifies formatting options. It's obvious because it gives the user a choice, and it's a bad idea because the choice is global. The usual objections to global data apply: a formatting style that's appropriate in one part of a large program may not be appropriate in another part. Global data means that everything in your program potentially depends on everything else.

If we were writing a print_X function, there would be a very easy way to solve this problem: we would just give it an extra argument where the caller could provide format flags. (You've probably written functions like that.) We can't do that here because operator<< only takes two arguments, the stream and the object to be printed; there's no obvious place to put format flags. Where should they go?

What Would int Do?

Scott Meyers's rule of thumb for user-defined classes [2] applies here: "When in doubt, do as the ints do." Let's look at how the C++ Standard library solves the problem of format flags for built-in types like int.

You have a great many formatting choices when you write an int! Here's an incomplete list:

  • What base to display it in; the choices are decimal, octal, and hexadecimal.
  • Whether to put in a distinguishing prefix that shows what base it's displayed in.
  • Whether to put in a + sign when the number is positive.
  • The minimum field width. If you're printing an integer with fewer digits than the width, the library will add padding.
  • Whether the padding goes on the left, on the right, or in between the sign and the rest of the number.

These options are stored in the ios_base class, which istream and ostream inherit from. Of the options listed above, the width is an integer (accessed by the width member function), and the others are bits within a single format flag word (accessed by the flags and setf member functions). So, for example, the following snippet will print an integer in hexadecimal, padding it to 10 characters so that the number appears on the right-hand side of the field:

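#include <iostream>
int main() {
  // Select hexadecimal (clears the other basefield bits).
  std::cout.setf(std::ios_base::hex,
                 std::ios_base::basefield);
  // Right-justify (clears the other adjustfield bits).
  std::cout.setf(std::ios_base::right,
                 std::ios_base::adjustfield);
  // Minimum field width; it applies to the next output only.
  std::cout.width(10);
  std::cout << 42 << std::endl;   // prints "        2a"
}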

Or, as an abbreviation, you can use manipulators instead:

#include <iostream>
#include <iomanip>
int main() {
  // The same settings, expressed as manipulators.
  std::cout << std::hex << std::right << std::setw(10)
            << 42 << std::endl;
}

There's nothing magic about manipulators; they're just another example of an overloaded operator<<, and, in the end, they just invoke member functions like setf and width. Whether we use manipulators or call setf and width directly, all we're doing in this code sample is changing the state of a specific ostream object, cout.
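To see that concretely, the sketch below (hex_three_ways is a name invented for this illustration) selects hexadecimal output three ways: with std::hex inside an output expression, by calling std::hex directly as the ordinary function it is, and by calling setf, which is what the standard specifies std::hex does. All three leave their streams in the same state:

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: selects hexadecimal output in three equivalent ways.
std::string hex_three_ways() {
  std::ostringstream a, b, c;

  a << std::hex << 255;   // manipulator inside an output expression

  std::hex(b);            // the same manipulator called as a plain function
  b << 255;

  c.setf(std::ios_base::hex, std::ios_base::basefield);  // what std::hex does
  c << 255;

  return a.str() + " " + b.str() + " " + c.str();
}
```

Each stream prints 255 as "ff", so the function returns "ff ff ff".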

Adding Your Own Format Flags

It might seem that we have backed ourselves into a corner. If we want our own class to behave the same way as int, then we should examine and set format flags in ios_base. But the only reason the library can do that for int is that flags like hex and right were built into ios_base from the beginning. We can't add new flags to an existing class. Or can we?

We can't add new member variables to ios_base, but we can do something that's just as good. Every ios_base object contains an array of user data, accessed through the iword member function; os.iword(n) returns a reference to a long. There is no upper limit on n: the array is automatically extended whenever you give iword a larger argument than it has seen before. (The Standard guarantees that the newly created element is initialized to zero.) So if you choose some fixed integer my_index, you can treat os.iword(my_index) as your own private format flag. The syntax is slightly different from that of setf, but you'll probably want to hide such low-level details behind a manipulator anyway.
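A small sketch of these properties follows; the names iword_observation and observe_iword are illustrative only. It takes its index from xalloc, described next, rather than hard-coding a number, since a fixed index could collide with some other user of iword:

```cpp
#include <ios>
#include <sstream>

// Hypothetical record of what iword returns at each step.
struct iword_observation {
  long fresh;         // a slot no one has written to yet
  long after_set;     // the same slot after assigning through the reference
  long other_stream;  // the same index on a different stream
};

iword_observation observe_iword() {
  // Take the index from xalloc rather than hard-coding a number.
  static const int idx = std::ios_base::xalloc();
  std::ostringstream s1, s2;
  iword_observation r;
  r.fresh = s1.iword(idx);         // newly created elements start at 0
  s1.iword(idx) = 42;              // iword returns long&, so it is assignable
  r.after_set = s1.iword(idx);     // now 42
  r.other_stream = s2.iword(idx);  // still 0: each stream has its own array
  return r;
}
```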

The only problem that's left is how to choose my_index; we don't want to have a clash when two different classes decide they want to store their format flags in the same place. There's a simple solution: ios_base has a static member function called xalloc, which returns a different number each time it's called. The intention is that you get an index from xalloc and then use that index whenever you access your format flag. If everyone uses the xalloc protocol, there will be no clash. Or, putting these words into code:

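#include <ios>

// Returns a reference to this program's private format flag in s.
long& my_format_flag(std::ios_base& s) {
  // xalloc runs once, on the first call, and reserves an index
  // that no other xalloc user will receive.
  static int my_index = std::ios_base::xalloc();
  return s.iword(my_index);
}

long get_my_flag(std::ios_base& s) {
  return my_format_flag(s);
}

void set_my_flag(std::ios_base& s, long n) {
  my_format_flag(s) = n;
}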

The static variable my_index will be initialized the first time my_format_flag gets called.
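The payoff of the xalloc protocol is that independent pieces of code never share a slot. In this sketch, other_format_flag is a hypothetical second flag, written as if by some other library; because it also gets its index from xalloc, setting one flag leaves the other untouched, and neither affects other streams:

```cpp
#include <ios>
#include <sstream>

// The accessors from the text.
long& my_format_flag(std::ios_base& s) {
  static int my_index = std::ios_base::xalloc();
  return s.iword(my_index);
}
long get_my_flag(std::ios_base& s) { return my_format_flag(s); }
void set_my_flag(std::ios_base& s, long n) { my_format_flag(s) = n; }

// Hypothetical flag belonging to some other library. Its index also
// comes from xalloc, so it is guaranteed to differ from my_index.
long& other_format_flag(std::ios_base& s) {
  static int other_index = std::ios_base::xalloc();
  return s.iword(other_index);
}
```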

Putting It All Together

Let's look at a simple example of a class where we might want to define our own format flags: a class to store a name. This class isn't nearly complicated enough to accommodate all of the cultural variation in personal names, but it suffices as an illustration:

#include <string>

struct name {
public:
  name(const std::string& g, const std::string& f)
    : given(g), family(f)
    { }
  std::string given;
  std::string family;
};

In the US, there are two customary ways of printing a person's name; in some contexts we write a name as "Leopold Bloom" and in others as "Bloom, Leopold." We'll define an output operator that accommodates both formats.

First we'll add a little bit of formatting machinery to the name class, defining an enum to make the code clearer.

struct name {
   ...
public:
  enum order { given_first = 0, family_first = 1 };

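  // Store or read the ordering flag kept in s's iword array.
  static void set_order(std::ios_base& s, order o)
    { flag(s) = o; }
  static order get_order(std::ios_base& s)
    { return static_cast<order>(flag(s)); }

private:
  // One xalloc index, obtained on first use, shared by all names.
  static long& flag(std::ios_base& s) {
    static int n = std::ios_base::xalloc();
    return s.iword(n);
  }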
};

By defining given_first to have the value 0, we're choosing it as the default.

Second, we'll define some manipulators. This is purely syntactic sugar, but it's nice to let users write code like

std::cout << "Your name is " << given_first << your_name;

instead of forcing them to write

name::set_order(std::cout, name::given_first);
std::cout << "Your name is " << your_name;

Fortunately, defining manipulators is very easy:

std::ostream& given_first(std::ostream& os) {
  name::set_order(os, name::given_first);
  return os;
}

std::ostream& family_first(std::ostream& os) {
  name::set_order(os, name::family_first);
  return os;
}

These functions may not look like manipulators, but they are. The C++ Standard library contains an overload for operator<< where the first argument is an ostream& and the second is a pointer to a function that has a single argument of type ostream& -- like one of these functions. If you write cout << given_first, this overload will be selected and operator<< will in turn invoke given_first(cout). (Defining a manipulator that takes an argument, as setw does, is a bit more complicated.)
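For the more complicated case, one common approach is to have the manipulator return a small object that remembers its argument, and to overload operator<< for that object. The sketch below assumes a trimmed-down name class that carries only the ordering machinery; name_order_setter, set_name_order, and demo_order are hypothetical names, not part of the standard library:

```cpp
#include <ios>
#include <ostream>
#include <sstream>

// Trimmed version of the name class: only the ordering machinery.
struct name {
  enum order { given_first = 0, family_first = 1 };
  static void set_order(std::ios_base& s, order o) { flag(s) = o; }
  static order get_order(std::ios_base& s)
    { return static_cast<order>(flag(s)); }
private:
  static long& flag(std::ios_base& s) {
    static int n = std::ios_base::xalloc();
    return s.iword(n);
  }
};

// Hypothetical manipulator with an argument: set_name_order(o) returns
// a small object that remembers o; inserting that object applies o.
struct name_order_setter {
  name::order o;
};

inline name_order_setter set_name_order(name::order o) {
  return name_order_setter{o};
}

inline std::ostream& operator<<(std::ostream& os, name_order_setter m) {
  name::set_order(os, m.o);
  return os;
}

// Usage: apply the manipulator to a fresh stream and report the result.
name::order demo_order() {
  std::ostringstream os;
  os << set_name_order(name::family_first);
  return name::get_order(os);
}
```

With this in place, os << set_name_order(name::family_first) has the same effect as os << family_first.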

Finally, we'll put all of the formatting information together in an output operator. It isn't very complicated, but, as usual with I/O, there are a lot of different cases to consider. We examine our new format flag, using name::get_order, to find which component comes first. Then we pad the name with blanks, if necessary, to bring its length up to os.width. We'll respect the user's choice of where to put the padding: on the left, on the right, or internally, between the two parts of the name.

std::ostream& operator<<(std::ostream& os, const name& n)
{
  // Determine order of components.
  std::string first, last, sep;
  if (name::get_order(os) == name::given_first) {
    first = n.given;
    sep = " ";
    last = n.family;
  }
  else {
    first = n.family;
    sep= ", ";
    last = n.given;
  }

  // Compute how much padding to add (std::min needs <algorithm>).
  const std::streamsize len =
    first.size() + sep.size() + last.size();
  const std::streamsize npad =
    os.width() - std::min(os.width(), len);

  // Find out where to add padding.
  switch (os.flags() & std::ios_base::adjustfield) {
  case std::ios_base::left:
    last.append(npad, ' ');
    break;
  case std::ios_base::right:
    first.insert(first.begin(), npad, ' ');
    break;
  case std::ios_base::internal:
    sep.append(npad, ' ');
    break;
  default:
    // No adjustment flag set: string's own operator<<
    // pads on the left, as for right adjustment.
    break;
  }

  // Write the name.
  os << first + sep + last;
  return os;
}

The only slight subtlety is that we turn the name into a single string that we write all at once, rather than writing each part separately. This matters for two reasons. First, it means we don't have to write our own error handling code: we can let string's ordinary output operator take care of all that. Second, it's the easiest way to make sure that the width gets handled properly. We want to make sure that the name as a whole gets padded, not just the first field.
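The sketch below assembles the fragments above into one compilable unit so the formats can be checked side by side. It condenses the choice of components into conditional expressions and adds a default case to the switch; otherwise it follows the code in the text:

```cpp
#include <algorithm>
#include <ios>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

struct name {
  name(const std::string& g, const std::string& f)
    : given(g), family(f) { }
  std::string given;
  std::string family;

  enum order { given_first = 0, family_first = 1 };
  static void set_order(std::ios_base& s, order o) { flag(s) = o; }
  static order get_order(std::ios_base& s)
    { return static_cast<order>(flag(s)); }

private:
  static long& flag(std::ios_base& s) {
    static int n = std::ios_base::xalloc();
    return s.iword(n);
  }
};

std::ostream& given_first(std::ostream& os) {
  name::set_order(os, name::given_first);
  return os;
}

std::ostream& family_first(std::ostream& os) {
  name::set_order(os, name::family_first);
  return os;
}

std::ostream& operator<<(std::ostream& os, const name& n) {
  // Determine order of components.
  const bool given = name::get_order(os) == name::given_first;
  std::string first = given ? n.given : n.family;
  std::string sep = given ? " " : ", ";
  std::string last = given ? n.family : n.given;

  // Pad the whole name up to os.width(), honoring adjustfield.
  const std::streamsize len = first.size() + sep.size() + last.size();
  const std::streamsize npad = os.width() - std::min(os.width(), len);
  switch (os.flags() & std::ios_base::adjustfield) {
  case std::ios_base::left:     last.append(npad, ' '); break;
  case std::ios_base::right:    first.insert(first.begin(), npad, ' '); break;
  case std::ios_base::internal: sep.append(npad, ' '); break;
  default: break;  // string's operator<< applies its default padding
  }
  return os << first + sep + last;  // one write; width then resets to 0
}
```

For example, family_first, std::left, and std::setw(20) produce "Bloom, Leopold" followed by six spaces, while std::internal puts the padding after the comma. The ordering flag, like hex, stays in effect until it is changed; the width applies only to the next output.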

Letting the user choose between given_first and family_first makes the code more complicated, but not much more. Using xalloc and iword, it's easy to add this kind of flexibility.

An Advanced Feature: pword

You might occasionally find that you want even more flexibility than this. You can use iword to store an integer, a character, or a few boolean flags, but what if, for some reason, you need to store more information than that? What if, instead of a boolean flag, you need to store a string? You certainly can't fit a variable-length string into a long!

What you can do instead is store a pointer to a string: ios_base has another member function, pword, that's just like iword except that it refers to an array of void* instead of an array of long. You can store a string by writing

os.pword(my_index) = new std::string(my_string);

and you can retrieve it by writing

std::string* p = (std::string*) os.pword(my_index);

As usual, you get my_index from xalloc.

Unfortunately, this raises as many questions as it answers. We're using dynamic memory allocation; it's the only reasonable option. (We can't very well store a pointer to a local variable -- that would be an invitation to dangling pointer bugs.) But if you're allocating memory, you also have to delete it. The I/O library won't do it for you; it can't. As far as the I/O library is concerned, you're just storing some random void*. It doesn't know that you're going to interpret that void* as a pointer to a string.

You have to delete the string when you store another string into the same slot, when the ios_base object itself is destroyed, or when you copy the format flags from another ios_base (with copyfmt). Similarly, to keep the string from being deleted twice, you need to make sure that copying format flags to another ios_base leaves you with two strings instead of two pointers to the same string.

Deleting an old string when you store a new one into the same slot is easy; you can write a single function to perform both operations so that you don't forget. The other cases are more complicated, since you're responding to an external event: you need to use a callback function. The standard library does provide hooks for such callback functions, since pword would be useless without them. Listing 1 shows how to use pword to store a string in an ios_base object.
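What follows is one possible sketch of that bookkeeping, not necessarily the approach Listing 1 takes; set_label, get_label, label_index, label_callback, and copy_gets_own_label are hypothetical names. It relies on the standard behavior of ios_base::register_callback: registered callbacks receive erase_event when the stream is destroyed and at the start of copyfmt, and copyfmt_event after copyfmt has copied the iword and pword arrays (the pointers themselves) and the callback list from the source stream. The iword slot at the same index records whether the callback is already registered, so it is never registered twice:

```cpp
#include <ios>
#include <sstream>
#include <string>

// One xalloc index, used for both the pword slot (the string pointer)
// and the iword slot (a "callback registered" flag).
inline int label_index() {
  static const int i = std::ios_base::xalloc();
  return i;
}

inline void label_callback(std::ios_base::event ev, std::ios_base& s, int idx) {
  if (ev == std::ios_base::erase_event) {
    // Stream destroyed, or about to be overwritten by copyfmt.
    delete static_cast<std::string*>(s.pword(idx));
    s.pword(idx) = nullptr;
  } else if (ev == std::ios_base::copyfmt_event) {
    // copyfmt copied the pointer itself; give this stream its own copy.
    if (void* p = s.pword(idx))
      s.pword(idx) = new std::string(*static_cast<std::string*>(p));
  }
}

// Store a copy of value in s, deleting any string stored there before.
inline void set_label(std::ios_base& s, const std::string& value) {
  const int i = label_index();
  if (s.iword(i) == 0) {
    s.register_callback(label_callback, i);
    s.iword(i) = 1;  // copyfmt copies this flag along with the callbacks
  }
  std::string* fresh = new std::string(value);
  delete static_cast<std::string*>(s.pword(i));
  s.pword(i) = fresh;
}

// Returns the stored string, or a null pointer if none has been set.
inline const std::string* get_label(std::ios_base& s) {
  return static_cast<const std::string*>(s.pword(label_index()));
}

// Usage: copying the format flags gives the target its own string.
inline bool copy_gets_own_label() {
  std::ostringstream a, b;
  set_label(a, "audit");
  b.copyfmt(a);
  return get_label(b) && *get_label(b) == "audit" && get_label(b) != get_label(a);
}
```

One caveat: the standard requires registered callbacks not to throw, and the new in the copyfmt_event branch could in principle throw bad_alloc; this sketch does not guard against that.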

Advice

I described pword for completeness. You shouldn't use it: it's one of the most obscure and complicated corners of the C++ I/O library. It's hard to understand code that uses such features and easy to get it wrong, and with a little bit of thought, you can almost certainly come up with an alternative design that sidesteps the whole mess. None of the built-in I/O operators in the C++ Standard library use pword or anything like it. I haven't shown an example where pword is necessary, because most of the examples that I've seen seem a bit artificial.

There's no reason, however, for you to be afraid of using iword. There's nothing complicated about it. With a few lines of boilerplate code, you can add your own customized format flags to ios_base, and you can use those format flags in just the same way as the standard library uses its built-in flags. Just as the standard library gives users many choices about how to display ints, you should give such choices to the users of your classes.

The C++ Standard library makes it possible to treat built-in types and user-defined types on an equal footing, and user-defined format flags are one of the essential features that allow this.

Notes

[1] See, for example, section 12.3.1 of Koenig and Moo's Accelerated C++.

[2] See items 6 and 32 in More Effective C++.

Matt Austern is the author of Generic Programming and the STL and the chair of the C++ standardization committee’s library working group. He works at AT&T Labs — Research and can be contacted at [email protected]

