C/C++

A Frame-Based Message-Passing Parser for C

By Robert Kiesling, February 01, 2006

Ctalk, an object-oriented preprocessor for ANSI C, contains a novel parser design that's well suited to interpreting object-oriented languages.

A message-passing parser makes it easy to evaluate and interpret object-oriented languages, because the parser design allows the language interpreter to evaluate language references and the objects they refer to within their context.

The program described in this article, ctalk, is an object-oriented preprocessor for ANSI C that allows programmers to include language features such as classes, objects, and methods in C programs.

Ctalk is an offshoot of a project to write an interpreter for the Smalltalk language, and ctalk's design owes much to the object-oriented concepts and terminology of Smalltalk's design.

In addition, unlike object-oriented extensions to the C language, such as C++ and Objective C, ctalk's design is intended to require as little change to the C source code as possible.

Lexical Analysis And Messages

As with most languages, evaluation requires multiple passes through the input source code. Ctalk's first parser pass performs lexical analysis, splits the input into messages that are stored on the parser's message stack, and adds stack frames for use by the language interpreter in later evaluation steps.

Figure 1 shows the steps the parser uses to create messages and frames, and analyze the messages. The creation and function of stack frames is subsequently described.

Figure 1: The steps the parser uses to create messages and frames, and analyze the messages.

Listing One shows part of the subroutine lexical(), which, like most lexical analyzers, splits the source program's text input contained in the array buf[] into tokens, and returns a value corresponding to the input's lexical type.

Listing One

int lexical (char *buf, long long *idx, MESSAGE *m) {

  int c, i, j;
  int is_float = 0;
  int tmpint;
  double tmpfloat;
  char tmpbuf[MAXMSG];
  char *q;

  c = buf[(*idx)++];

  /* WHITESPACE */
  if (c == ' ' || c == '\t' || c == '\f') {
    if (!isspace ((int)buf[*idx])) {
      sprintf (m -> name, "%c", c);
      m -> tokentype = WHITESPACE;
      return WHITESPACE;
    }
    for (i = 0, (*idx)--;
	 ((buf[*idx + i] == ' ') || (buf[*idx + i] == '\t') || 
	  (buf[*idx] == '\f'));
	 i++, (*idx)++)
      m -> name[i] = buf[*idx];
    m -> name[i] = 0;
    m -> tokentype = WHITESPACE;
    return WHITESPACE;
  }

  /* LABEL */
  if (isalnum ((int)c) || c == '_') {
    (*idx)--;
    q = tmpbuf;
    while (isalnum ((int)buf[*idx]) || (buf[*idx] == '_'))
      *q++ = buf[(*idx)++];
    *q = 0;
    strcpy (m -> name, tmpbuf);
    m -> tokentype = LABEL;
    return LABEL;
  }
    
.
.
.

  sprintf (m -> name, "%c", c);
  m -> tokentype = CHAR;
  return c;
}

The third argument to lexical(), m, is a MESSAGE structure typedef that contains the token's text, its token type, and other attributes of the token. Listing Two shows the MESSAGE structure typedef. Lexical() fills in only the name and tokentype members, and the interpreter fills in the other attributes during later parser passes.

Listing Two

typedef struct _message {
  char sig[7];
  char name[MAXLABEL];
  char value[MAXLABEL];
  OBJECT *value_obj;
  OBJECT *obj;
  int tokentype; 
  int evaled;
  int output;
  int error_line;
  int error_column;
  OBJECT *receiver_obj;
  struct _message *receiver_msg;
} MESSAGE;

After the token is analyzed, the parser pushes the message onto the message stack. Ctalk could easily pass MESSAGE structures among functions in the parser and interpreter, but instead it passes the message's stack index, and most of the parser and interpreter functions require only the stack index of a message as an argument.

Most of the data types in ctalk contain a sig member, which allows the program to check the validity of the data when necessary. The following macro checks for a valid message:

#define IS_MESSAGE(x) (!memcmp 
	     ((void *)x, "MESSAGE", 7))

Ctalk's other data types have corresponding macros. If an error occurs within the interpreter, the program can break off processing with a statement like the following:

if (!m || !IS_MESSAGE (m))

 error ("%s is not a message.", 
			m -> name);

Stack Frames

The first pass of the parser also creates stack frames for each language statement and function in the input source code. Listing Three shows the parser's FRAME structure typedef.

Listing Three

typedef struct _frame {
  int scope;
  int ret_frame_top;
  int var_frame_top;
  int message_frame_top;
  int arg_frame_top;
} FRAME;

Each frame records the statement's index in the message stack as well in the argument, variable, and return value stacks, which the interpreter uses in later steps.

The first parser pass also determines if the scope of each frame is global or local. The frame's scope in turn determines the scope of newly created objects, and tells the interpreter to delete local objects when they go out of scope.

The use of frames simplifies the parser and interpreter design considerably. The frames delimit the boundaries of each statement in the parser's message stack and the other stacks used by the interpreter, so it is easy to determine the beginning and end of each statement, and the statement's corresponding indexes in the other interpreter stacks.

The second and later parser passes need only the frame's stack index to interpret each statement of the language, and the second pass can evaluate the messages in a stack frame as many times as necessary to resolve the statement to a single value.

Interpreting C

In addition to the main parser, ctalk also has other simpler and more specialized parsers that interpret C preprocessor statements and ANSI C statements.

These parsers maintain their own message stacks, and although they are similar in design to the main parser, they can be much simpler because the values they derive from the input are also much simpler.

This strategy also simplifies the design of the interpreter. By handing off complex but specialized chores to subroutines, the ctalk interpreter can concentrate on evaluating its own language constructs and generating output code.

1 2 3 4 5 6 7 8 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

C/C++

A Frame-Based Message-Passing Parser for C

Lexical Analysis And Messages

Figure 1: The steps the parser uses to create messages and frames, and analyze the messages.

Listing One

Listing Two

Stack Frames

Listing Three

Interpreting C

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

C/C++

A Frame-Based Message-Passing Parser for C

Lexical Analysis And Messages

Figure 1: The steps the parser uses to create messages and frames, and analyze the messages.

Listing One

Listing Two

Stack Frames

Listing Three

Interpreting C

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content