Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Using XML Schema for Validating User Input

Alex is a technical lead at Nokia Siemens Networks.

One of the most frequently repeated (and mundane) tasks in software development is capturing and validating user input. These tasks are important nonetheless, and the most common way to accomplish them is with if-else statements in the code. This approach, however, leads to bloated, inflexible code that is hard to unit test.

Another approach is to use something like Apache Commons CLI (Command-Line Interface). Here, though, the fields and parameters have to be defined in the code, which again makes the code inflexible: whenever the parameters or their acceptable ranges or values change, the code has to be updated and recompiled.

However, using an XML Schema for validation decouples the code from the validation task completely. For example:

1. The user enters a command via the GUI/CLI to create a "route" between two "servers":

CR_ROUTE source="" dest="" vrf_bit="0" name="Server1" source2="" prim_mask=""

2. This is converted to XML (with the schema embedded):

<CLISYNTAX xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ...>
  <CR_ROUTE prim_addr="" dest="" vrf_bit="0" name="Server1" source2="" .../>
</CLISYNTAX>

3. This XML is validated against the schema CLISyntaxData.xsd using, for instance, Xerces. Listing One is an excerpt of the schema.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Command database for user syntax validation -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" ...>
    <xs:element name="CLISYNTAX">
        ...
                <xs:element name="CR_ROUTE">
                    ...
                        <xs:attributeGroup ref="CR_ROUTE_Attrs"/>
                    ...
                <xs:element name="DISP_ROUTE">
                    ...
                        <xs:attributeGroup ref="DISP_ROUTE_Attrs"/>
        ...

    <!-- Command-line attributes for CR_ROUTE command -->
    <xs:attributeGroup name="CR_ROUTE_Attrs">
        <xs:attribute name="dest" type="IPaddr" use="required"/>
        <xs:attribute name="name" type="xs:string" use="required"/>
        <xs:attribute name="prim_addr" type="IPaddr" use="required"/>
        <xs:attribute name="prim_mask" type="IPaddr" use="required"/>
        <xs:attribute name="source2" type="IPaddr" use="required"/>
        <xs:attribute name="vrf_bit" type="xs:integer" use="required"/>
    </xs:attributeGroup>

    <!-- Command-line attributes for DISP_ROUTE command -->
    <xs:attributeGroup name="DISP_ROUTE_Attrs">
        <xs:attribute name="vrf" type="xs:integer"/>
    </xs:attributeGroup>

    <!-- Derived types for syntax validation -->
    <xs:simpleType name="IPaddr">
        <xs:restriction base="xs:string">
            ...

Listing One

The advantages of this approach are:

  • All syntax validation is done in one shot.
  • No code is required for the validation itself (only for forming the XML and calling the Xerces API).
  • New command-line interface commands can be added without adding any code.
  • The formed XML can be used to extract user inputs.

Now let's see how you can do this using Java. Here, I illustrate the same approach by getting the input from the command line. The command-line input follows the pattern fieldname<space>value, and I use pattern matching via regular expressions to extract the fields. Listing Two is an excerpt of the main method:

public static void main(String[] args) {
    // Join all command-line arguments into a single input string
    String input = "";
    for (int i = 0; i < args.length; i++) {
        input = input + " " + args[i];
    }
    input = input.trim();
    commandLineTokenizer regToken = new commandLineTokenizer();
    ...

Listing Two

You can see how the code works in the parseCmdline method below. Since I am using regular expressions, I import these Java classes:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public void parseCmdline(String inputStr) {

You first declare strings holding the regular expressions, one for "-flag value" pairs and one for "name=value" pairs:

String patternStr1 = "-(\\S++)\\s*(\\S+)\\s*";
String patternStr2 = "(\\S+)\\s*=\\s*(\\S+)\\s*";

Predefined Character Classes
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]

Greedy    Reluctant  Possessive  Meaning
X?        X??        X?+         X, once or not at all
X*        X*?        X*+         X, zero or more times
X+        X+?        X++         X, one or more times
X{n}      X{n}?      X{n}+       X, exactly n times
X{n,}     X{n,}?     X{n,}+      X, at least n times
X{n,m}    X{n,m}?    X{n,m}+     X, at least n but not more than m times
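The difference between the three quantifier modes in the table is easiest to see on a small input. The following is a minimal sketch; the class name QuantifierDemo and the helper firstMatch are made up for illustration and are not part of the article's source code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring of input matched by regex, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "\"a\" and \"b\"";
        // Greedy: .+ grabs as much as possible, then backtracks to the last quote.
        System.out.println(firstMatch("\".+\"", input));
        // Reluctant: .+? grabs as little as possible, stopping at the first closing quote.
        System.out.println(firstMatch("\".+?\"", input));
        // Possessive: .++ grabs everything and never gives it back, so the final quote cannot match.
        System.out.println(firstMatch("\".++\"", input));
    }
}
```

The possessive form never backtracks, which is why it is safe in a pattern like -(\\S++) only when the text after the flag is guaranteed to be separated from it by whitespace.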

Note also that both patterns use capturing groups (the parenthesized parts), so the name and the value of each match can be retrieved separately.

Once you create the pattern, you compile it:

// Compile regular expression
Pattern pattern1 = Pattern.compile(patternStr1);
Pattern pattern2 = Pattern.compile(patternStr2);
//Pattern pattern3 = Pattern.compile(patternStr3);
//To get all the flags except (-)
// and the CLI command
Matcher matcher = pattern1.matcher(inputStr);

The flags and values will be in two groups that are then printed out:

System.out.println("Command=" + matcher.group(2));
System.out.println("flag=" + matcher.group(1));
System.out.println("flagValue=" + matcher.group(2));
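For readers who want to try the two patterns in isolation, here is a minimal sketch of a matching loop that collects each pair into a map. The class FlagExtractor and its extract method are hypothetical names for illustration; only the two regular expressions are taken from the text above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagExtractor {
    // Same expressions as patternStr1 and patternStr2 above.
    static final Pattern FLAG = Pattern.compile("-(\\S++)\\s*(\\S+)\\s*");
    static final Pattern PARAM = Pattern.compile("(\\S+)\\s*=\\s*(\\S+)\\s*");

    // Collects group(1) -> group(2) for every match, in input order.
    static Map<String, String> extract(Pattern pattern, String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {   // group() may only be called after a successful find()
            result.put(matcher.group(1), matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract(FLAG, "-n neName -g ugname"));
        System.out.println(extract(PARAM, "dest=10.0.0.1 vrf=1"));
    }
}
```

One thing to watch for with the second pattern: because \\s* is allowed after the equals sign, an empty value such as "dest= vrf=1" is read as the parameter dest with the value "vrf=1". Inputs with empty values may therefore need separate handling before or after matching.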

Once the flags and values are separated, it is easy to create XML from them:

public boolean createXMLFromInput(String filename) {
    try {
        BufferedWriter out = new BufferedWriter(new FileWriter(filename));
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        out.write("<CLISYNTAX xmlns=\"http://www.scs.org\">\n");
        out.write("<" + szCommand + " ");
        // e.g. CR_ROUTE source="" dest="" vrf_bit="0" name="Server1" source2=""
        // Now write the parameters and the values
        szCommandLineString = szCommand + " ";
        // Iterate over the keys in the map
        Iterator itParam = cmdlineParamMap.keySet().iterator();
        while (itParam.hasNext()) {
            // Get key
            String param = itParam.next().toString();
            out.write(param + "=");
            szCommandLineString += param + "=";
            String value = cmdlineParamMap.get(param).toString();
            out.write("\"" + value + "\"" + " ");
            szCommandLineString += "\"" + value + "\"" + " ";
        }
        // Close the command element and the root element
        out.write("/>\n</CLISYNTAX>\n");
        out.close();
        return true;
    } catch (IOException e) {
        return false;
    }
}
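Because the attribute values come straight from the user, characters such as &, < and " would produce XML that is not well formed. The sketch below shows one way to build the same document in memory with the values escaped. The class CommandXmlBuilder and its methods are hypothetical, and the namespace declaration is left out to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CommandXmlBuilder {
    // Escapes the characters that are not allowed verbatim inside a double-quoted attribute value.
    static String escape(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;");
    }

    // Builds <CLISYNTAX><COMMAND name="value" .../></CLISYNTAX> from the parsed parameters.
    static String toXml(String command, Map<String, String> params) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<CLISYNTAX>\n");
        xml.append("  <").append(command);
        for (Map.Entry<String, String> entry : params.entrySet()) {
            xml.append(' ').append(entry.getKey())
               .append("=\"").append(escape(entry.getValue())).append('"');
        }
        xml.append("/>\n</CLISYNTAX>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", "Server1");
        params.put("vrf_bit", "0");
        System.out.print(toXml("CR_ROUTE", params));
    }
}
```

The ampersand is replaced first so that the entities introduced by the later replacements are not escaped a second time.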

Once the XML is created from the input, it can be validated against the schema using the Apache Xerces DOMParser. I import org.apache.xerces.parsers.DOMParser for this functionality:

DOMParser domParser = new DOMParser();
CustomErrorHandler handler = new CustomErrorHandler();
try {
    domParser.setFeature("http://xml.org/sax/features/namespaces", true);
    domParser.setFeature("http://xml.org/sax/features/validation", true);
    ... );
    ... );
    ... CLISyntaxData.xsd");

Now the ever important method call:


In case of an error, this is caught in the error-handler object of the CustomErrorHandler class that was passed to DOMParser:

if(handler.getError() > 0)
  System.out.println("Error in Parsing-Invalid Input" );
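The same validate-and-collect-errors flow can be tried without Xerces on the classpath. The following sketch uses the JDK's built-in javax.xml.validation API instead of the Xerces DOMParser used in this article, so it is an alternative technique, not the article's code. The class SchemaCheck, the cut-down inline schema, and the list-based error handler are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class SchemaCheck {
    // Hypothetical, cut-down schema: one command with two required attributes.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"CLISYNTAX\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"CR_ROUTE\"><xs:complexType>"
      + "<xs:attribute name=\"name\" type=\"xs:string\" use=\"required\"/>"
      + "<xs:attribute name=\"vrf_bit\" type=\"xs:integer\" use=\"required\"/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Validates xml against XSD and returns every error message; an empty list means valid input.
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                // Recording instead of throwing lets validation continue and report all errors.
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            // Fatal problems, such as XML that is not well formed, end up here.
            errors.add(e.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) {
        String bad = "<CLISYNTAX><CR_ROUTE nme=\"Server1\" vrf_bit=\"0\"/></CLISYNTAX>";
        for (String error : validate(bad)) {
            System.out.println("Error= " + error);
        }
    }
}
```

Collecting the messages in the handler rather than stopping at the first one is what makes "all syntax validation in one shot" possible: a single pass reports every missing or unexpected attribute.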

Using pattern matching, you can derive a user-friendly error message from the cryptic error message of the SAXParseException object:

if (handler.getColumNumber() > 0) {
    String cmdline = regToken.getParsedCommandLineString();
    int len = handler.getColumNumber();
    ...
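As a complementary illustration, the parser's message text itself can be matched with regular expressions and reworded. This sketch assumes messages shaped like the ones in the sample output at the end of this article ("Attribute '...' must appear on element '...'"); the class FriendlyError and the wording of the rewritten messages are invented for the example, and it does not use the column number as the excerpt above does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FriendlyError {
    static final Pattern MISSING =
        Pattern.compile("Attribute '([^']+)' must appear on element '([^']+)'");
    static final Pattern UNKNOWN =
        Pattern.compile("Attribute '([^']+)' is not allowed to appear in element '([^']+)'");

    // Rewords known validation messages; anything unrecognized is returned unchanged.
    static String describe(String parserMessage) {
        Matcher m = MISSING.matcher(parserMessage);
        if (m.find()) {
            return "Command " + m.group(2) + " requires the parameter '" + m.group(1) + "'.";
        }
        m = UNKNOWN.matcher(parserMessage);
        if (m.find()) {
            return "Command " + m.group(2) + " does not accept the parameter '" + m.group(1) + "'.";
        }
        return parserMessage;
    }

    public static void main(String[] args) {
        System.out.println(describe("Attribute 'prim_addr' must appear on element 'CR_TUNNEL'."));
        System.out.println(describe("Attribute 'dsasd' is not allowed to appear in element 'CR_TUNNEL'."));
    }
}
```

Matching on message text is fragile if the parser version or locale changes the wording, so falling back to the original message, as done here, keeps unrecognized errors visible.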

That's it. You can find the complete SchemaParser source code here.

Assuming you're on Windows, the environment variables to set are:


The invocation of the Java class is:

\JavaXML\SchemaParser\bin>java -classpath %CLASSPATH% java_xml/schemaparser

and the sample input and corresponding output are:

\JavaXML\SchemaParser\bin>java -classpath %CLASSPATH%
java_xml/schemaparser "-n neName -g ugname -l fsdf,fdsfsd,fsdfsdf -i
CR_TUNNEL prsim_addr= dest= vrf=1 
ggsn=String source= prim_mask = dsasd=0"
Error in Parsing-Invalid Output
Error= Attribute 'prim_addr' must appear on element 'CR_TUNNEL'.
Error= Attribute 'prsim_addr' is not allowed to appear in element
Error= Attribute 'dsasd' is not allowed to appear in element 'CR_TUNNEL'.
