XML and the desperate Tcl hacker

Steve Ball

Plume Project, Australian National University
Steve.Ball@tcltk.anu.edu.au

Abstract
A stated design goal of XML [1] is that it should be easy to write programs that process XML documents. More informally, it has been said that XML documents should be manageable with scripting languages such as Perl, by the typical "Desperate Perl Hacker" (DPH). Tcl hackers are just as desperate as Perl hackers, so I have developed a package, imaginatively called "TclXML", that supports XML documents in the Tcl scripting language [Ouster]. The package includes a validating parser and a facility for generating XML document text programmatically from a Tcl script. TclXML is suitable for a variety of applications and, like Tcl itself, is highly embeddable, for example into an existing (legacy) application.

Keywords
XML; Tcl; Scripting

1. Scripting languages and XML

Scripting languages, for example Tcl and Perl, generally process XML using regular expressions. Regular expressions improve performance by processing the text "in bulk". This is necessary because a lexical analyzer written in a scripting language, iterating over the characters of the text, would be extremely slow. However, the bulk-processing technique makes it difficult to cope with certain irregularities in the language and with some nesting constructs. The regularity of XML therefore determines how easily it can be processed by a scripting language using regular-expression-based techniques. This poster presents how the TclXML package supports access to XML documents using only the Tcl scripting language, and assesses how well XML meets its aim of being easy to process with scripting languages.
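The bulk-processing idea can be illustrated with a short sketch. It is not TclXML's actual implementation: the procedure name tokenize and the token layout are assumptions made for illustration, and the code assumes Tcl 8.5 or later. A single regular expression splits the whole document into tags and the text that follows each tag, so no per-character loop is written in Tcl.

```tcl
# Hypothetical helper (not part of TclXML): split XML text into tokens in bulk.
# Each token is a list: {start name attrText}, {end name} or {text chars}.
proc tokenize {xml} {
    # Groups: optional "/", tag name, raw attribute text, text after the tag.
    set pattern {<(/?)([^\s/>]+)\s*([^>]*)>([^<]*)}
    set tokens {}
    foreach {all slash name attrs text} \
            [regexp -all -inline -- $pattern $xml] {
        if {$slash eq "/"} {
            lappend tokens [list end $name]
        } else {
            lappend tokens [list start $name $attrs]
        }
        # Keep character data only when it is not pure white space.
        if {[string trim $text] ne ""} {
            lappend tokens [list text $text]
        }
    }
    return $tokens
}

set tokens [tokenize {<MEMO PRIORITY="Important"><TO>All WWW7 Attendees</TO></MEMO>}]
foreach token $tokens {
    puts $token
}
```

Comments, CDATA sections, empty-element tags and a ">" inside an attribute value would all defeat this one pattern, which is the kind of irregularity that the bulk technique struggles with.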

2. Parsing documents

TclXML parses documents into a format known as XAPI-Tcl, which a Tcl application can then use to process the parsed document. XAPI-Tcl presents the document as a tree-structured, hierarchical list (a grove), which the Tcl application can process using Tcl list commands. For example, the command:

	xml::parse {<MEMO PRIORITY="Important"><TO>All WWW7 Attendees</TO><FROM>Steve Ball</FROM><MESSAGE>XML is terrific!</MESSAGE></MEMO>}

returns the following Tcl list:

	parse:element MEMO {PRIORITY Important} {
		parse:element TO {} {
			parse:text {All WWW7 Attendees} {} {}
		}
		parse:element FROM {} {
			parse:text {Steve Ball} {} {}
		}
		parse:element MESSAGE {} {
			parse:text {XML is terrific!} {} {}
		}
	}
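As an illustration of processing such a list with ordinary Tcl list commands, the sketch below walks the structure and gathers the character data. The procedure name collectText is hypothetical, and the list is written out literally rather than obtained from xml::parse; the only thing taken from TclXML is the four-element layout shown above (marker, name or text, attributes, children).

```tcl
# The parsed structure from the example, written as a literal Tcl list.
set grove {
    parse:element MEMO {PRIORITY Important} {
        parse:element TO {} {
            parse:text {All WWW7 Attendees} {} {}
        }
        parse:element FROM {} {
            parse:text {Steve Ball} {} {}
        }
        parse:element MESSAGE {} {
            parse:text {XML is terrific!} {} {}
        }
    }
}

# Hypothetical helper: return every text node in document order.
proc collectText {grove} {
    set result {}
    # Each node occupies four consecutive list elements.
    foreach {marker name attrs children} $grove {
        switch -- $marker {
            parse:text {
                # For a text node the second element holds the characters.
                lappend result $name
            }
            parse:element {
                # Recurse into the child list of the element.
                foreach chars [collectText $children] {
                    lappend result $chars
                }
            }
        }
    }
    return $result
}

set texts [collectText $grove]
puts [join $texts "\n"]
```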

The parser accepts options that configure how it constructs the grove. For example, the markers "parse:element" and "parse:text" may be replaced with other values. Normally the parser transparently removes comments and expands character entity references, but these may instead be included in the returned structure by using the -commentcommand or -entitycommand option respectively. In other words, the configuration options given to the parser are its grove plan.

The handling of errors in the document is under the control of the application. By default, TclXML attempts to recover from document errors and produces a best-guess approximation of the document structure. However, an application may modify this behaviour by specifying a callback script with the -errorcommand option.

The Tcl application may manipulate the parsed structure as a list, or it can evaluate the parsed structure as a script to provide an alternate, convenient processing method. By applying appropriate special character quoting, TclXML ensures that this is safe. To process a document using the evaluation method, the application simply defines a procedure for each of the markers used in the XAPI-Tcl structure, such as parse:element.
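A minimal sketch of the evaluation method follows. It assumes the structure is laid out as printed in Section 2, with one marker command per line, and it uses a literal copy of that structure instead of calling the parser. The variables ::depth and ::outline are illustrative names; only the marker names parse:element and parse:text come from XAPI-Tcl.

```tcl
# Literal copy of the parsed structure; evaluated below as a Tcl script.
set grove {
    parse:element MEMO {PRIORITY Important} {
        parse:element TO {} {
            parse:text {All WWW7 Attendees} {} {}
        }
        parse:element FROM {} {
            parse:text {Steve Ball} {} {}
        }
        parse:element MESSAGE {} {
            parse:text {XML is terrific!} {} {}
        }
    }
}

set ::depth 0      ;# current nesting level (illustrative)
set ::outline {}   ;# accumulated outline lines (illustrative)

# Called once per element: record its name, then evaluate its children.
proc parse:element {name attrs children} {
    set indent [string repeat "  " $::depth]
    lappend ::outline "$indent$name"
    incr ::depth
    uplevel #0 $children
    incr ::depth -1
}

# Called once per run of character data.
proc parse:text {chars args} {
    set indent [string repeat "  " $::depth]
    lappend ::outline "$indent\"$chars\""
}

# Evaluating the structure invokes the two procedures in document order.
uplevel #0 $grove
puts [join $::outline "\n"]
```

The recursion is supplied by the document itself: each element's child list is just another script, so the application writes no explicit tree-walking loop.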

3. Creating XML documents

The TclXML document generation facility allows Tcl scripts to emit XML documents conveniently, optionally validating the document on the fly as it is generated. The facility works by parsing the document's DTD and then creating a Tcl command for each element and entity defined in the DTD. The Tcl application can then create a document by invoking these commands. Element attributes are given as arguments to the commands. Say goodbye to angle brackets!

For example, to generate the XML document given above, the following commands would be used:

	xml::generate [xml::parseDTD $dtd]
	set xmlDocument [MEMO PRIORITY Important {
		TO {xml::text "All WWW7 Attendees"}
		FROM {xml::text "Steve Ball"}
		MESSAGE {xml::text "XML is terrific!"}
	}]
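How one command per element might be created can be sketched as follows. This is an assumption-laden illustration, not the TclXML implementation: the sketch namespace, defineElement, sketch::text and sketch::escape are hypothetical names, the element names are listed by hand instead of being read from a DTD, and no validation is performed. It assumes Tcl 8.5 or later.

```tcl
namespace eval sketch {
    variable out ""   ;# content generated so far for the enclosing element
}

# Replace the characters that are special in XML text and attribute values.
proc sketch::escape {chars} {
    return [string map {& &amp; < &lt; > &gt; {"} &quot;} $chars]
}

# Emit character data into the element currently being generated.
proc sketch::text {chars} {
    variable out
    append out [escape $chars]
    return
}

# Create a global command named after the element. Its arguments are
# attribute name/value pairs followed by a body script for the content.
proc sketch::defineElement {name} {
    set template {
        set body  [lindex $args end]
        set attrs [lrange $args 0 end-1]
        set saved $::sketch::out
        set ::sketch::out ""
        uplevel 1 $body
        set xml "<%NAME%"
        foreach {attr value} $attrs {
            append xml " $attr=\"[::sketch::escape $value]\""
        }
        append xml ">$::sketch::out</%NAME%>"
        set ::sketch::out "$saved$xml"
        return $xml
    }
    proc ::$name {args} [string map [list %NAME% $name] $template]
}

# Stand-in for the element names that would come from a parsed DTD.
foreach element {MEMO TO FROM MESSAGE} {
    sketch::defineElement $element
}

set xmlDocument [MEMO PRIORITY Important {
    TO {sketch::text "All WWW7 Attendees"}
    FROM {sketch::text "Steve Ball"}
    MESSAGE {sketch::text "XML is terrific!"}
}]
puts $xmlDocument
```

Each element command evaluates its body in the caller's context, so nested element commands and plain Tcl control structures such as foreach can be mixed freely inside a document.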

4. Future directions

Apart from tracking the development of the XML language specification, further work will extend the definition of XAPI-Tcl and add functions for searching and manipulating the parsed document structure, in order to support standards such as XLL [2] and DOM [3].

The TclXML package will feature in the next release of the Plume World Wide Web browser, since XML is now used in many Web standards, such as RDF [5] and SMIL [6], as well as Microsoft's CDF. Plume [Ball] will also support the display of XML documents, and there is progress on the use of stylesheets, such as XSL [4], as well as document scripting using Tcl/Tk.

References

[1]
T. Bray, J. Paoli, and C.M. Sperberg-McQueen, Extensible Markup Language (XML) Specification: Part 1. Language, W3C Working Draft 17th November 1997,
http://www.w3c.org/TR/
[2]
T. Bray and S. DeRose, Extensible Markup Language (XML) Specification: Part 2. Linking, W3C Working Draft 31st July 1997,
http://www.w3c.org/TR/
[3]
L. Wood, Document Object Model Specification, W3C Working Draft 9th October 1997,
http://www.w3c.org/TR/
[4]
S. Adler et al., A proposal for XSL, W3C NOTE 27th August 1997,
http://www.w3c.org/TR/
[5]
O. Lassila and R.R. Swick, Resource Description Framework (RDF) Model and Syntax, W3C Working Draft 2nd October 1997,
http://www.w3c.org/TR/
[6]
P. Hoschka, Synchronized Multimedia Integration Language, W3C Working Draft 9th November 1997,
http://www.w3c.org/TR/
[Ouster]
J.K. Ousterhout, Tcl and the Tk Toolkit. Addison Wesley, Reading, MA, 1994.
[Ball]
S. Ball, SurfIt! A WWW Browser, in: Proc. of the 4th Annual Tcl/Tk Workshop, Monterey, CA USA July 1996, pp. 161–171,
http://tcltk.anu.edu.au/
