May 31, 2007
Evolving a Domain-Specific Language
Appropriate Notation: An Example with Regular Expressions
As software designers, we are always strongly aware of the tree-structured nature of just about everything we deal with. In designing a DSL notation, there is always an inclination to represent this tree directly in the syntax. However, this is not always the most usable option.
Regular expressions are an example of a textual DSL. Their textual notation is very compact: powerful for those who have learned it, but often opaque to occasional users. The goal in this example is to create a DSL in which regular expressions can be constructed with a graphical notation. The expected benefits include:
Reminder about Regular Expressions
Regular expressions can seem very arcane to the uninitiated, but the basic idea is simple. Suppose you are processing some text, say an HTML file, and you want to find the next HTML element (between < and >), then extract the tag and the parameter pairs of the element into separate variables. So you call:
foreach (Match m in
         Regex.Matches(yourHtmlString, theRegularExpression))
{
    // ... each m is a match to part of the string ...
}
The regular expression contains a sequence of characters that you expect to find in the string, and certain characters (parentheses, * + ? [ ] and one or two others) play special roles. * signifies zero or more repetitions of what went immediately before, so that < * matches a < followed by any number of spaces, including none. + is similar, but insists on at least one occurrence. Square brackets match any single character in the range defined within, so that [A-Z]+ matches any sequence of at least one capital letter. Parentheses demarcate a match that you wish to capture into a variable, so that ([A-Za-z]+) should return to you a word made of one or more letters. (?:...)* repeatedly performs the matches within the parentheses without capturing the whole thing to a variable. "|" specifies alternative matches. (?
< *([A-Za-z]+) +(?:([A-Za-z]+) *= *(?<quote>"|')([^"']*)${quote} *)*/?>
matches, for example:
< table bgcolor= "#ffddff" border="1" >
as illustrated in Figure 1.
< *([A-Za-z]+) +(?:([A-Za-z]+) *= *(?<quote>"|')([^"']*)${quote} *)*/?>
Figure 1: Interpretation of a regular expression.
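To make the interpretation concrete, here is a minimal C# sketch that runs the expression over a sample line in which each value is closed by the same quotation mark that opens it. It rests on a few assumptions that are not part of the original text: the class name TagMatchSketch is hypothetical, and the named back-reference is written as \k<quote>, because inside a .NET pattern that is the construct that refers back to a named group (the ${quote} form is .NET's substitution syntax for replacement strings). The sketch also relies on the .NET rule that unnamed groups are numbered before named ones, so the tag, parameter name, and parameter value are groups 1, 2, and 3.

```csharp
using System;
using System.Text.RegularExpressions;

class TagMatchSketch  // hypothetical name, for illustration only
{
    static void Main()
    {
        // The article's expression, with the named back-reference written
        // in .NET pattern syntax (\k<quote>); this adaptation is an assumption.
        const string pattern =
            @"< *([A-Za-z]+) +(?:([A-Za-z]+) *= *(?<quote>""|')([^""']*)\k<quote> *)*/?>";

        // Sample line: < table bgcolor= "#ffddff" border="1" >
        const string html = @"< table bgcolor= ""#ffddff"" border=""1"" >";

        foreach (Match m in Regex.Matches(html, pattern))
        {
            // Group 1 is the tag name.
            Console.WriteLine("tag: " + m.Groups[1].Value);

            // Groups 2 and 3 sit inside the repeated (?:...)* part, so each
            // one keeps a separate Capture for every repetition.
            CaptureCollection names = m.Groups[2].Captures;
            CaptureCollection values = m.Groups[3].Captures;
            for (int i = 0; i < names.Count; i++)
            {
                Console.WriteLine(names[i].Value + " = " + values[i].Value);
            }
        }
    }
}
```

Under these assumptions the sketch should report the tag table followed by the pairs bgcolor = #ffddff and border = 1. Reading the Captures collection rather than the group's Value matters here, because Value only holds the last repetition of a repeated group.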
The objective of this DSL is to make a more accessible notation for regular expressions.