Dr. Dobb's is part of the Informa Tech Division of Informa PLC

Programming Language Format String Vulnerabilities


Hal is a Vulnerability Researcher at CERT. He can be contacted at www.hburch.com.
Robert C. Seacord is Senior Vulnerability Analyst for CERT/CC. He can be reached at [email protected].


Although not as well known as other vulnerability types such as buffer overflows, format string vulnerabilities have been known to exist in C and C++ programs since at least 1999, when a format string vulnerability was found in AnswerBook2 (cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-1999-1417). Formatted output became a major focus of the security community in June 2000, when a format string vulnerability was discovered in the Washington University ftpd (WU-FTPD) software package (www.kb.cert.org/vuls/id/29823).

But format string vulnerabilities are not limited to programs written in C and C++. Other languages that support format strings include Perl, PHP, Java, Python, and Ruby. While these languages are relatively immune to buffer overflows because they manage dynamic arrays and strings on the programmer's behalf, programs written in them may still contain format string vulnerabilities.

Format string vulnerabilities result from including data from an untrusted source, such as a user, in a format string. Format strings are used by input and output routines to specify a conversion between a character string and a set of data values. The following example shows how the C function printf() accepts a format string and a set of values:

printf ("%s Pop: %11d\n",
    country, pop);

and produces a string:

United States Pop:   295734134

In the format string, the % begins a conversion specification. This is followed by a set of formatting parameters and the data type. The %s conversion specifier instructs printf() to output a string value (the value passed as an argument). The %11d conversion specifier instructs printf() to output a decimal value (the "d") in an 11-character field. Format strings can be much more complicated, including flags, precisions, length modifiers, and even variable widths specified in parameters.
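As a minimal sketch of the more complicated cases mentioned above, the following helper combines a precision, a length modifier, and a variable width passed as an argument. The function name format_population and its parameters are hypothetical, introduced only for illustration; the format string itself is a constant, so nothing here is attacker-controlled.

```c
#include <stdio.h>

/* Hypothetical helper (not from the article): formats a line such as
 * "United States Pop:   295734134" into buf.
 *   %.*s  prints at most name_max characters of country (precision
 *         supplied as an int argument),
 *   %*ld  prints pop as a long (length modifier 'l') in a field whose
 *         width is supplied as an int argument.
 * Returns the value returned by snprintf(). */
int format_population(char *buf, size_t size, const char *country,
                      int name_max, long pop, int width)
{
    return snprintf(buf, size, "%.*s Pop: %*ld",
                    name_max, country, width, pop);
}
```

Calling format_population(buf, sizeof(buf), "United States", 13, 295734134L, 11) should produce the same line as the printf() call shown earlier; passing a smaller name_max truncates the country name.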

Directly including user input in a format string lets an attacker inject conversion specifications into the format string. This is particularly problematic in programming languages that support the little-known %n specification. This unusual specification causes the number of characters successfully written so far to be stored in the integer whose address is given as the argument. If attackers can write data values to memory, they can often leverage that to gain control of the system. Even if the language does not support %n, an attacker may cause the format string to include more specifications than there are arguments. Depending on what stack protection exists in the language, an attacker may be able to access private data, avoid logging, or crash the program. (Writing exploits for a format string vulnerability is beyond the scope of this work. For a more detailed explanation, see Robert Seacord's Secure Coding in C and C++; Addison-Wesley, 2005.)
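To make the behavior of %n concrete, here is a small, benign sketch in which the programmer, not an attacker, controls the format string. The helper name format_with_offset is hypothetical. Note the assumption that the C library honors %n: some libraries restrict or disable it (for example, when the format string is not a constant), so treat this as an illustration rather than portable practice.

```c
#include <stdio.h>

/* Hypothetical helper (not from the article): writes "name=value" into
 * buf and uses %n to record how many characters precede the value.
 * %n writes through the int pointer passed as its argument, which is
 * exactly the capability an attacker abuses when they control the
 * format string. Assumes the C library supports %n. */
int format_with_offset(char *buf, size_t size, const char *name,
                       int value, int *value_offset)
{
    *value_offset = -1;  /* sentinel in case %n is not honored */
    return snprintf(buf, size, "%s=%n%d", name, value_offset, value);
}
```

With name "pop" and value 42, the buffer holds "pop=42" and the stored offset is 4, the number of characters written before the %n was reached.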

Format string vulnerabilities often result from a programmer being unaware that a particular routine takes a format string. For example, a programmer might write:

snprintf(str, sizeof(str),
    "Wrong password for email %s", email);
syslog(LOG_WARNING, str);

Unfortunately, the syslog() routine treats its second parameter as a format string. As a result, if an attacker supplies an e-mail address such as "webmaster%s%s%s%[email protected]", syslog() looks for arguments to match the %s conversion specifiers in the format. Because no such arguments were passed, it interprets unrelated memory as string pointers, most likely crashing the program. A more advanced attack may use %n to gain control of the system.
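One way to repair the syslog() call is to pass a constant format string and supply the message as an argument. The sketch below assumes a POSIX system that provides <syslog.h>; the helper name log_wrong_password and the 256-byte buffer are hypothetical choices made for illustration.

```c
#include <stdio.h>
#include <syslog.h>

/* Hypothetical helper (not from the article): logs a failed login.
 * The e-mail address only ever appears as an argument, never as part
 * of a format string, so any '%' characters in it are logged
 * literally. Returns the value returned by snprintf(). */
int log_wrong_password(const char *email)
{
    char str[256];  /* assumed size, chosen for illustration */
    int n = snprintf(str, sizeof(str),
                     "Wrong password for email %s", email);

    /* Constant "%s" format: str is treated as data, not as a format. */
    syslog(LOG_WARNING, "%s", str);
    return n;
}
```

Because syslog() accepts printf-style arguments, the intermediate buffer could also be dropped in favor of syslog(LOG_WARNING, "Wrong password for email %s", email).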

Another common source of format string vulnerabilities is code that writes an error message to more than one location. For simplicity, you may construct the string using snprintf() and then use one routine to print the message to a log and another routine to output the message to the end user in some way, such as in a message box. If either routine accepts a format string, you must be careful to include the format specification in the call:

fprintf(log, "%s", logmessage);

instead of neglecting it as in the following call:

fprintf(log, logmessage);

The first invocation is the correct one. It avoids a format string vulnerability by specifying that a string (%s) should be output and then providing that string as an argument. Because the second is shorter and may correspond to how you are thinking about the desired behavior, you may write the statement in this fashion without considering the consequences.
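A simple way to make the correct form the default is to route every already-built message through one small function. The sketch below is only an illustration; the name write_message is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper (not from the article): writes a message that
 * has already been constructed to the given stream. The message is
 * passed as an argument to a constant "%s" format, so any conversion
 * specifications it contains are written literally instead of being
 * interpreted. Returns the value returned by fprintf(). */
int write_message(FILE *out, const char *message)
{
    return fprintf(out, "%s", message);
}
```

Calling write_message(log, logmessage) for the log, and the equivalent for any other destination, keeps untrusted text out of the format position. fputs(message, out) achieves the same effect for plain streams, and GCC's -Wformat-security option can flag calls such as fprintf(log, logmessage) at compile time.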

In this article, we explore the potential consequences of format string vulnerabilities in Perl, PHP, Java, Python, and Ruby programs.

