Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

A Frame-Based Message-Passing Parser for C


A message-passing parser makes it easy to evaluate and interpret object-oriented languages, because the parser design allows the language interpreter to evaluate language references and the objects they refer to within their context.

The program described in this article, ctalk, is an object-oriented preprocessor for ANSI C that allows programmers to include language features such as classes, objects, and methods in C programs.

Ctalk is an offshoot of a project to write an interpreter for the Smalltalk language, and ctalk's design owes much to the object-oriented concepts and terminology of Smalltalk's design.

In addition, unlike object-oriented extensions to the C language, such as C++ and Objective C, ctalk's design is intended to require as little change to the C source code as possible.

Lexical Analysis And Messages

As with most languages, evaluation requires multiple passes through the input source code. Ctalk's first parser pass performs lexical analysis, splits the input into messages that are stored on the parser's message stack, and adds stack frames for use by the language interpreter in later evaluation steps.

Figure 1 shows the steps the parser uses to create messages and frames, and analyze the messages. The creation and function of stack frames is subsequently described.

Figure 1: The steps the parser uses to create messages and frames, and analyze the messages.

Listing One shows part of the subroutine lexical(), which, like most lexical analyzers, splits the source program's text input contained in the array buf[] into tokens, and returns a value corresponding to the input's lexical type.

Listing One

int lexical (char *buf, long long *idx, MESSAGE *m) {

  int c, i, j;
  int is_float = 0;
  int tmpint;
  double tmpfloat;
  char tmpbuf[MAXMSG];
  char *q;

  c = buf[(*idx)++];

  /* WHITESPACE */
  if (c == ' ' || c == '\t' || c == '\f') {
    if (!isspace ((int)buf[*idx])) {
      sprintf (m -> name, "%c", c);
      m -> tokentype = WHITESPACE;
      return WHITESPACE;
    }
    for (i = 0, (*idx)--;
	 ((buf[*idx + i] == ' ') || (buf[*idx + i] == '\t') || 
	  (buf[*idx] == '\f'));
	 i++, (*idx)++)
      m -> name[i] = buf[*idx];
    m -> name[i] = 0;
    m -> tokentype = WHITESPACE;
    return WHITESPACE;
  }

  /* LABEL */
  if (isalnum ((int)c) || c == '_') {
    (*idx)--;
    q = tmpbuf;
    while (isalnum ((int)buf[*idx]) || (buf[*idx] == '_'))
      *q++ = buf[(*idx)++];
    *q = 0;
    strcpy (m -> name, tmpbuf);
    m -> tokentype = LABEL;
    return LABEL;
  }
    
.
.
.

  sprintf (m -> name, "%c", c);
  m -> tokentype = CHAR;
  return c;
}

The third argument to lexical(), m, is a MESSAGE structure typedef that contains the token's text, its token type, and other attributes of the token. Listing Two shows the MESSAGE structure typedef. Lexical() fills in only the name and tokentype members, and the interpreter fills in the other attributes during later parser passes.

Listing Two

typedef struct _message {
  char sig[7];
  char name[MAXLABEL];
  char value[MAXLABEL];
  OBJECT *value_obj;
  OBJECT *obj;
  int tokentype; 
  int evaled;
  int output;
  int error_line;
  int error_column;
  OBJECT *receiver_obj;
  struct _message *receiver_msg;
} MESSAGE;

After the token is analyzed, the parser pushes the message onto the message stack. Ctalk could easily pass MESSAGE structures among functions in the parser and interpreter, but instead it passes the message's stack index, and most of the parser and interpreter functions require only the stack index of a message as an argument.

Most of the data types in ctalk contain a sig member, which allows the program to check the validity of the data when necessary. The following macro checks for a valid message:

#define IS_MESSAGE(x) (!memcmp 
	     ((void *)x, "MESSAGE", 7))

Ctalk's other data types have corresponding macros. If an error occurs within the interpreter, the program can break off processing with a statement like the following:

if (!m || !IS_MESSAGE (m))

 error ("%s is not a message.", 
			m -> name);

Stack Frames

The first pass of the parser also creates stack frames for each language statement and function in the input source code. Listing Three shows the parser's FRAME structure typedef.

Listing Three

typedef struct _frame {
  int scope;
  int ret_frame_top;
  int var_frame_top;
  int message_frame_top;
  int arg_frame_top;
} FRAME;

Each frame records the statement's index in the message stack as well in the argument, variable, and return value stacks, which the interpreter uses in later steps.

The first parser pass also determines if the scope of each frame is global or local. The frame's scope in turn determines the scope of newly created objects, and tells the interpreter to delete local objects when they go out of scope.

The use of frames simplifies the parser and interpreter design considerably. The frames delimit the boundaries of each statement in the parser's message stack and the other stacks used by the interpreter, so it is easy to determine the beginning and end of each statement, and the statement's corresponding indexes in the other interpreter stacks.

The second and later parser passes need only the frame's stack index to interpret each statement of the language, and the second pass can evaluate the messages in a stack frame as many times as necessary to resolve the statement to a single value.

Interpreting C

In addition to the main parser, ctalk also has other simpler and more specialized parsers that interpret C preprocessor statements and ANSI C statements.

These parsers maintain their own message stacks, and although they are similar in design to the main parser, they can be much simpler because the values they derive from the input are also much simpler.

This strategy also simplifies the design of the interpreter. By handing off complex but specialized chores to subroutines, the ctalk interpreter can concentrate on evaluating its own language constructs and generating output code.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.