Journal reference: Computer Networks and ISDN Systems, Volume 28, issues 7–11, p. 1095.

Extending the Web's Tag Set Using SGML:

The Grif Symposia Authoring Tool

Jean Paoli, Technical Director, GRIF S.A., 2 Boulevard Vauban, BP 266, 78053 St Quentin en Yvelines, Cedex, France


HTML suffers from its lack of extensibility, and anarchical tag proliferation is in danger of breaking the WWW.

In our view, the text model must be extensible, and we should develop and make extensive use of SGML to extend the current HTML model, not only by defining other DTDs that could replace HTML but also by proposing an extensibility scheme that gives Web users rules for extending the HTML DTD themselves.

This approach has been developed in the Grif Symposia authoring tool.

Grif Symposia, a joint INRIA / GRIF S.A. project, is an integrated authoring-browsing environment and will soon be shipped with full extensibility capabilities for handling mixed HTML/SGML data models.

In this paper we discuss the advantages of using a mixed HTML/SGML data model for the WWW, drawing on our work in developing Grif Symposia. We present the different layers developed for Grif Symposia and highlight the advantages of authoring information in a mixed SGML/HTML environment.

Keywords: Extensibility, Authoring, HTML, Grif Symposia, SGML, Style Sheets.

1 Extending the Web's Tag Set using SGML

There is a great need for the data model supported by the WWW to be extensible. The WWW data model could be roughly described as a text model formalized by HTML which embeds other data or live objects like images, video, Java or OLE objects.

The current trend observed on the WWW is to define a way of extending this model to support more data or live objects [Berners 95], while nothing is being done to define how the text model could be extended.

There is no doubt that the great success of the WWW came from the sudden availability of a large amount of searchable textual information worldwide. This searchability was made possible because the information was expressed in a public, non-opaque, non-binary text format: HTML.

While it is undoubtedly necessary to enable the WWW to handle more and more data and live objects, these are often expressed in non-public or binary formats. The proliferation of this kind of data, which slowly but surely marginalizes textual data expressed in HTML, could in the long term jeopardize the searchable, exploitable, text-based working scheme that has made the WWW successful.

1.1 Tag Proliferation

HTML suffers from its lack of extensibility, and anarchical tag proliferation is in danger of breaking the WWW. Much of the recent tag proliferation came from the need to offer improved presentation and layout of online documents. In that area, HTML is clearly far from the sophisticated layout formats that exist in pre-press software such as Adobe PageMaker or QuarkXPress.

In our view, the need for improved presentation, or at least for more flexibility in the current HTML scheme, is best addressed by associating style sheets with HTML tags, and the work of the W3C in that direction (CSS [Lie 95], DSSSL [Bozak 95]) is a solution that should be adopted.

1.2 Semantic tags

Another reason for tag proliferation has been the need to express and use more complex structured information than is found in the HTML structure model.

Documentary data are more clearly identified by semantic tags [Paoli 94]. Semantic tags are used to precisely identify corporate or industry specific information such as motors, product parts, transistors, or other objects which require a very precise description. This is one of the strong points of SGML and there is always a need for mission-critical data to be formalized and stored using such markup.

It is in specialized corporate contexts that we see the greatest need for such markup. It is generally accepted that people in this kind of corporate environment often manipulate the same kind of structured information which describes very precisely the nature of their business or technical data. In the case of an electronic components manufacturer, for example, it would not be uncommon for engineers to work together on the definition of a new electronic device and on the documentary data which could best describe it.

1.3 An extension mechanism for HTML

In our view, the extensibility of the text model (a text/hypertext framework that contains as much text as possible and embeds other live objects) is necessary, and we should develop and make extensive use of SGML [Sperberg 94] to extend the current HTML model, not only by defining other DTDs that could replace HTML but also by proposing an extensibility scheme that gives Web users rules for extending the HTML DTD themselves.

This approach has been implemented in the Grif Symposia authoring tool. We present here the different layers developed for Grif Symposia and highlight the advantages of authoring information in a mixed SGML/HTML environment.

There are four different points that must be examined to enable the extensibility of the text model of the WWW:

  1. New tags: Text documents that navigate on the WWW are written today using the HTML DTD. It must be possible to add new tags, either by enabling end users to use a completely different DTD or by letting them extend the HTML DTD with their own set of tags.
  2. Navigation: Text documents navigate using the HTTP protocol (or other protocols).
  3. Links: Text documents are linked to other files using the <A> anchor tag.
  4. Display: Text documents are displayed by browsers that interpret, in a semi-uniform way, the set of tags defined in the different versions of HTML (HTML, HTML 2.0, HTML 3.0).

2 Grif Symposia

Grif Symposia is a WYSIWYG authoring tool that enables you to create and modify HTML and SGML documents directly on the WWW. Grif Symposia was developed by GRIF S.A. and INRIA as part of the European effort to create more powerful tools for the World Wide Web.

Grif Symposia is built on top of Grif SGML editor, a WYSIWYG native SGML editor that allows files based on any DTD to be loaded, edited and saved in SGML format [Quint 95]. Functions for handling each of the different HTML versions have been incorporated into the Grif software, and the SGML parser has been modified so as to accept HTML documents that are not strictly valid.

The CERN/W3C network library has been integrated with the SGML Editor, and OpenURL and SaveURL commands have been added using the PUT method of the HTTP protocol. This allows documents to be created and saved directly on remote servers. Various cooperative strategies have been studied to allow collaborative authoring on the WWW, and a simple strategy (locking/unlocking files) has been developed [Paoli 95].

Special user-friendly (point-and-click) editing commands for creating and modifying anchors have been written, and links can be followed over the network immediately after being created. The tool accepts any of the HTML 'dialects', and any other SGML DTD can be incorporated in the tool using the standard features of the Grif environment. This allows multiple authors on the network to create, edit and remotely save documents in their original SGML format.

Further work is underway in the field of collaborative authoring on the WWW, including the incorporation of annotated documents, handling large documents, support for different versions, and the interactive incorporation of various fragments of data from different servers into one document.

A freeware version of Grif Symposia can be downloaded from the INRIA WWW server. A Pro version is sold commercially by GRIF S.A.

Because Grif Symposia makes use of Grif SGML Editor's system for handling generic structured documents, incorporating new DTDs in Grif Symposia is simply a matter of following the process previously described.

3 Adding new tags by defining SGML DTDs

In Grif Symposia, our approach to adding new tags is to define and use SGML DTDs. A new DTD expresses a new set of tags together with the structural rules that govern them. As an example, we use an SGML environment that represents an illustrated parts list catalog.

<!-- A Part List contains multiple block items-->
<!-- which describe parts and assemblies -->
<!ELEMENT partlist - - (blockitem)*>
<!ELEMENT blockitem - -
(supplier, supplierref, partref, type, expire?, quantity) >

A partlist describes the different parts of a manufactured product. It is a list of blockitem elements, each of which describes one part of the product in detail: the supplier, the supplier reference number, the part reference number in a database, the type of the part, the expiration date of this information and the quantity of this kind of part in the product.
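The blockitem rule is a plain sequence with one optional element. As a minimal sketch (not Grif's actual parser), the C function below checks whether a list of child element names conforms to that content model. The names `ModelToken`, `sequence_matches` and `blockitem_is_valid` are hypothetical, and only sequences whose tokens carry at most the `?` occurrence indicator are handled.

```c
#include <stddef.h>
#include <string.h>

/* One token of a sequence content model: an element name plus
   whether it carries the '?' (optional) occurrence indicator. */
typedef struct {
    const char *name;
    int optional;
} ModelToken;

/* (supplier, supplierref, partref, type, expire?, quantity) */
static const ModelToken blockitem_model[] = {
    {"supplier", 0}, {"supplierref", 0}, {"partref", 0},
    {"type", 0},     {"expire", 1},      {"quantity", 0},
};

/* Returns 1 if the child names match the sequence model in order.
   SGML forbids ambiguous content models, so a greedy left-to-right
   match never needs to backtrack. */
int sequence_matches(const ModelToken *model, size_t model_len,
                     const char *const children[], size_t n_children)
{
    size_t c = 0;
    for (size_t m = 0; m < model_len; m++) {
        if (c < n_children && strcmp(children[c], model[m].name) == 0)
            c++;                      /* token matched: consume the child */
        else if (!model[m].optional)
            return 0;                 /* required element missing or misplaced */
    }
    return c == n_children;           /* reject unexpected trailing children */
}

int blockitem_is_valid(const char *const children[], size_t n)
{
    return sequence_matches(blockitem_model,
                            sizeof blockitem_model / sizeof blockitem_model[0],
                            children, n);
}
```

For example, a blockitem whose children are supplier, supplierref, partref, type and quantity is accepted (expire is optional), while one lacking supplierref is rejected. Grif Symposia, as described below, prevents invalid structures up front through contextual menus; a checker like this is the after-the-fact counterpart.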

3.1 Using another DTD: the Grif Symposia approach

The first approach to incorporating the partlist set of tags is to define documents that contain only the partlist set of tags.

Grif Symposia is built on top of Grif SGML editor, a WYSIWYG native SGML editor that allows files based on any DTD to be loaded, edited and saved in SGML format.


The DOCTYPE declaration, which precedes the tag definitions, declares this set of tags as a separate DTD. The Grif Symposia environment includes an SGML parser that analyzes the structural rules expressed in this DTD and generates a structured authoring environment which, through contextual menus, ensures that every document contains structurally valid tagged text.

In Grif Symposia, a valid document starts with the following:

<!DOCTYPE partlist PUBLIC "-//MY COMPANY//DTD PARTLIST//EN">
<?SymposiaStyle partlistP PUBLIC "-//MY COMPANY//STYLE PARTLIST//EN">

The string "-//MY COMPANY//DTD PARTLIST//EN" is a public identifier that designates the DTD definition.

In Grif Symposia, we use the SGML Open catalog format to associate a physical address (a URL) with this public identifier.

In an SGML Open catalog definition, each entry in the catalog associates a "Storage Object Identifier" (SOI), such as a file name, with information about the external entity that appears in the SGML document.

For example, the following is a possible catalog entry that associates a public identifier with an SOI:

PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN" ""
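To make the lookup concrete, here is a minimal C sketch of resolving a public identifier through catalog entries of the form shown above. It assumes a simplified catalog: only PUBLIC entries, the keyword in upper case, double-quoted literals and no comments, whereas the SGML Open catalog format also allows other entry types (such as SYSTEM and DOCTYPE), case-insensitive keywords, single-quoted literals and `--` comments. The function names and the example URL in the usage note are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Simplified catalog entry: PUBLIC "public-id" "storage-object-id" */
typedef struct {
    char pubid[256];
    char soi[256];
} CatalogEntry;

/* Copies the next double-quoted literal at *p into out (capacity cap).
   Returns 1 on success and advances *p past the closing quote. */
static int read_literal(const char **p, char *out, size_t cap)
{
    const char *s = *p;
    while (*s == ' ' || *s == '\t')
        s++;
    if (*s != '"')
        return 0;
    const char *end = strchr(++s, '"');
    if (end == NULL || (size_t)(end - s) >= cap)
        return 0;
    memcpy(out, s, (size_t)(end - s));
    out[end - s] = '\0';
    *p = end + 1;
    return 1;
}

/* Parses one PUBLIC entry line into *e; returns 1 on success. */
int parse_public_entry(const char *line, CatalogEntry *e)
{
    while (*line == ' ' || *line == '\t')
        line++;
    if (strncmp(line, "PUBLIC", 6) != 0)
        return 0;
    line += 6;
    return read_literal(&line, e->pubid, sizeof e->pubid)
        && read_literal(&line, e->soi, sizeof e->soi);
}

/* Returns the SOI (for instance a URL) registered for pubid, or NULL. */
const char *resolve_public(const CatalogEntry *cat, size_t n, const char *pubid)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(cat[i].pubid, pubid) == 0)
            return cat[i].soi;
    return NULL;
}
```

After parsing a line such as `PUBLIC "-//MY COMPANY//DTD PARTLIST//EN" "http://www.example.com/partlist.dtd"`, `resolve_public` maps that public identifier to the URL, and returns NULL for identifiers the catalog does not know.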

A valid document could also start with:

<!DOCTYPE partlist SYSTEM "">

This indicates directly the physical location of the DTD definition.

In the processing instruction <?SymposiaStyle partlistP PUBLIC "-//MY COMPANY//STYLE PARTLIST//EN">, the string "-//MY COMPANY//STYLE PARTLIST//EN" is a public identifier that designates a style sheet associated with the DTD and used to display the tags. The format of style sheets is discussed in Section 6 of this paper. The Grif style language is called P, and partlistP is a style sheet file associated with the partlist DTD. <?SymposiaStyle> is a processing instruction (like a comment) that only Grif Symposia interprets, in order to find the style sheet definition. This solution was adopted before the recent work on HTML and style sheets, and we intend to replace this processing instruction with the LINK tag of HTML 2, as specified by the W3C, whose definition is:

<!ATTLIST link
    href CDATA #REQUIRED -- Uniform Resource Locator --
    title CDATA #IMPLIED -- advisory title string --
    rel CDATA #IMPLIED -- forward link type --
    rev CDATA #IMPLIED -- reverse link type --
    type CDATA #IMPLIED -- advisory Internet media type -- >

It would then be used as follows:

<LINK TITLE="partlistP" REL=stylesheet HREF="" TYPE="text/p">
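Both forms carry the same information: a style name and a reference to the style sheet. The C sketch below assumes the processing instruction always has exactly the shape shown above; it extracts the style name and public identifier with `sscanf`, then writes the equivalent LINK tag once the public identifier has been resolved to a URL (for example through the catalog). The function names and the URL in the usage note are hypothetical.

```c
#include <stdio.h>

/* Extracts the style name (e.g. "partlistP") and the public identifier
   from <?SymposiaStyle NAME PUBLIC "PUBLIC-ID">. Returns 1 on success.
   name must hold 64 bytes and pubid 256 bytes. */
int parse_symposia_style(const char *pi, char name[64], char pubid[256])
{
    return sscanf(pi, "<?SymposiaStyle %63s PUBLIC \"%255[^\"]\"",
                  name, pubid) == 2;
}

/* Writes the equivalent HTML LINK tag for a style sheet located at url.
   Returns the snprintf result (the length the complete tag requires). */
int make_link_tag(const char *name, const char *url, char *out, size_t cap)
{
    return snprintf(out, cap,
                    "<LINK TITLE=\"%s\" REL=stylesheet HREF=\"%s\" TYPE=\"text/p\">",
                    name, url);
}
```

With the processing instruction above and a style sheet URL such as `http://www.example.com/styles/partlist.p`, the sketch yields `<LINK TITLE="partlistP" REL=stylesheet HREF="http://www.example.com/styles/partlist.p" TYPE="text/p">`, which is how the migration from the processing instruction to LINK could be automated.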

3.2 Extending the HTML DTD: The Grif Symposia approach

The second approach to incorporating the partlist set of tags is to define how an HTML document can contain these tags. Our approach in Grif Symposia was similar to the one taken by the W3C with the INSERT tag [Berners 95], which allows HTML authors to specify the data to be inserted into HTML documents (including persistent data and/or properties and parameters for initializing objects), as well as the code that can be used to display or manipulate that data. The INSERT tag extends HTML dynamically, i.e. without having to modify the DTD for each new data format.

<!-- INSERT is a character-like element for inserting objects --> 
<!ELEMENT insert - - (param*, bodytext)>  
<!ATTLIST insert %attrs --id, class, style, lang, dir -- 
          data %URL #IMPLIED -- ref to object's data -- 
          code %URL #IMPLIED -- ref to object's code --  
          ... >

In Grif Symposia, we added a NATURE tag, similar to the INSERT tag, to the HTML DTD. Like INSERT, NATURE extends HTML dynamically, i.e. without modifying the DTD for each new set of tags:

<!-- NATURE is a character-like element for inserting SGML DTDs --> 
<!ELEMENT nature - - (dtd, -- ref to the dtd -- 
                      link, -- ref to the style sheet to use to display dtd -- 
                      ANY) -- placeholder for the SGML fragment written using dtd -- >

This approach is interesting because any HTML document can contain, enclosed in a NATURE element, a set of well-defined, well-structured data displayed using a style sheet. A document would then contain, for example:

<BODY> ...

4 Using HTTP PUT to save and retrieve data

HTML is a specific SGML DTD [Berners 94], but the HTTP protocol is completely independent of this DTD and of the tags it defines (which explains why the set of supported tags can evolve across the different versions of HTML). This is why a different SGML DTD, which simply defines another set of tags, can easily be supported.
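Because HTTP is independent of the DTD, saving an SGML document with PUT differs from saving an HTML document only in the media type announced in the request. The minimal C sketch below only formats such an HTTP/1.0 request and does not open a connection. The path, host and the choice of `text/sgml` (the media type registered for SGML by RFC 1874) are illustrative assumptions, and `build_put_request` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Formats an HTTP/1.0 PUT request that stores body at path on host.
   Only Content-Type differs between HTML (text/html) and other SGML
   documents (e.g. text/sgml). Returns the snprintf result. */
int build_put_request(char *out, size_t cap, const char *path,
                      const char *host, const char *media_type,
                      const char *body)
{
    return snprintf(out, cap,
                    "PUT %s HTTP/1.0\r\n"
                    "Host: %s\r\n"
                    "Content-Type: %s\r\n"
                    "Content-Length: %zu\r\n"
                    "\r\n"
                    "%s",
                    path, host, media_type, strlen(body), body);
}
```

Saving `<partlist></partlist>` to `/docs/parts.sgm` on `www.example.com` produces a request line `PUT /docs/parts.sgm HTTP/1.0`, a `Content-Type: text/sgml` header and `Content-Length: 21`; swapping in `text/html` is the only change needed for an HTML page.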

Grif Symposia is directly wired to the WWW. The W3C (formerly CERN) network library has been integrated with the SGML Editor, and the OpenURL and SaveURL commands have been implemented using the HTTP PUT method.

The PUT method of HTTP allows documents to be saved at a location specified by a URL. PUT can be enabled without recompiling the server by adding some configuration files which, among other things, designate the directories on the remote server where files may be stored. In our authoring environment, we use a simple strategy: at load time, the editor fetches the last modification time of the file, and at save time it sends this date together with the new content of the file to a CGI script on the remote server. The script can then check whether someone else has modified the file in the meantime.
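A minimal C sketch of the comparison such a CGI script could perform. It assumes the client sends the load-time modification time as a `time_t` in seconds and that the script runs on a POSIX system, so it reads the current time with `stat()`; the function names are hypothetical. Like the strategy it illustrates, it does not lock the file between the check and the write, so two saves arriving at the same moment could both pass.

```c
#include <errno.h>
#include <sys/stat.h>
#include <time.h>

/* The client's copy is current only if the server file's modification
   time still equals the one recorded when the client loaded it. */
int copy_is_current(time_t mtime_at_load, time_t mtime_now)
{
    return mtime_now == mtime_at_load;
}

/* Server-side check run before accepting new content for path.
   Returns 1 if the save may proceed, 0 if someone else modified the
   file (or it cannot be examined). A file that does not exist yet is
   treated as safe to create, which is an assumption of this sketch. */
int save_is_safe(const char *path, time_t mtime_at_load)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return errno == ENOENT;
    return copy_is_current(mtime_at_load, st.st_mtime);
}
```

If the script refuses the save, the editor can warn the author and reload the file rather than silently overwriting a colleague's changes.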

5 Managing the Links

When new sets of tags are added to the documents travelling on the network, one must also be able to define the hypertext capabilities that any document used on the Web must include.

An easy solution to the problem of being able to point from any SGML document to any other file on the network is to introduce the <A> anchor tag (with exactly the same definition that we find in HTML 2.0) into every SGML DTD to be used on the Net. There are no technical problems preventing this.

Other easy solutions could be found by using the HyTime standard [DeRose 94] or by adding an adequate association in a DSSSL description. The important thing here is to be able to tell the browser or editor which tag of the DTD marks the start and end of the link (i.e. the tag that plays the role of the <A> anchor tag) and which attributes contain the URL (i.e. the attributes that play the role of the HREF attribute) [Paoli 94] [Quint 94].

In Grif Symposia, we implemented the first solution (introducing the <A> tag in any set of tags we want to use) but we gave it a little more flexibility in naming the A tag and its attributes.

A resource file named forms.cfg defines which tag, for each DTD, looks like the <A> tag and its attributes.

In this way, any element of any DTD that has at least the minimum set of attributes of the <A> tag can be used to build links between documents on the WWW.

A forms.cfg file looks like:


This indicates that for the HTML DTD, the roles of A, HREF and NAME are played by A, HREF and NAME, and that for the PARTLIST DTD they are played by ANCHOR, HYPERTEXT and PNAME. Note that the ANCHOR tag does not need to have the same content model as <A>.
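The C sketch below models only the information a forms.cfg file is described as holding (for each DTD, the element acting as <A> and the attributes acting as HREF and NAME); it does not reproduce the file's actual syntax. The struct and function names are hypothetical, and names are compared case-insensitively with POSIX `strcasecmp`, on the assumption that these DTDs fold the case of names as HTML does.

```c
#include <stddef.h>
#include <strings.h>

/* Which element plays the role of <A> in a DTD, and which attributes
   play the roles of HREF and NAME. */
typedef struct {
    const char *dtd;
    const char *anchor;
    const char *href_attr;
    const char *name_attr;
} AnchorMapping;

/* The two mappings described in the text. */
static const AnchorMapping anchor_map[] = {
    {"HTML",     "A",      "HREF",      "NAME"},
    {"PARTLIST", "ANCHOR", "HYPERTEXT", "PNAME"},
};

/* Returns the mapping for dtd, or NULL if no link element is declared. */
const AnchorMapping *find_anchor_mapping(const char *dtd)
{
    for (size_t i = 0; i < sizeof anchor_map / sizeof anchor_map[0]; i++)
        if (strcasecmp(anchor_map[i].dtd, dtd) == 0)
            return &anchor_map[i];
    return NULL;
}

/* Returns 1 if elem is the link element of dtd, so a browser or editor
   should read its href_attr as a URL. */
int is_link_element(const char *dtd, const char *elem)
{
    const AnchorMapping *m = find_anchor_mapping(dtd);
    return m != NULL && strcasecmp(m->anchor, elem) == 0;
}
```

For a PARTLIST document, an `anchor` element is recognized as a link source and its HYPERTEXT attribute is read as the URL, while the same element name means nothing special in an HTML document.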

6 Dealing with Style Sheets

6.1 Displaying tags

The display of SGML tags is the most serious problem we had to consider. The display of structured information has always been recognized as difficult, mainly because structured information implies managing a tree representation of a document, whereas display means a two-dimensional representation of that tree on screen or on paper.

The DSSSL standard [Clark 95], which is intended to describe style sheets adapted to structured information in a standardized way, is at last being finalized and concrete action has already been taken (DSSSL Online [Bozak 95]) by vendors and tool builders involved in the SGML community (SGML Open) and in the WWW community. DSSSL Online is intended to define a suitable subset of DSSSL which will allow the display of SGML tags for editing and browsing purposes.

The CSS [Lie 95] effort from the W3C proposes a way of defining styles and associating them with the current HTML tags. The current draft of CSS (named level 1) is not intended for formatting arbitrary sets of SGML tags. The W3C plans to extend CSS in that direction once CSS level 1 has been adopted, and a DSSSL/CSS convergence is being studied.

6.2 P: Grif Symposia Style Sheets

Grif Symposia allows generic presentation rules to be defined for any element of any DTD. These rules are expressed separately from the DTD in a declarative language called P [Quint 91] (P for Presentation).

Grif Symposia's P style language is a declarative language used to associate layout descriptions with SGML elements. Interactive formatting of the SGML document is thus done by applying these descriptions (or style sheets) when the document is displayed.

The P language describes hierarchies of boxes associated with the SGML elements. A P description [Roisin 93] contains rules that describe two different subsets of properties.

Presentation boxes can be associated with elements by means of creation rules such as Create. These boxes are automatically generated when the formatter displays the corresponding element; they may contain computed information (the section number, for example), the repeated content of an element (the title of the current section at the top of each page) or a static value (the string "Abstract" added before the abstract element, a colored box in the background, horizontal and/or vertical hairlines, etc.).

The presentation rules associated with SGML elements can be fairly sophisticated and can also use the attribute values of SGML elements. For example, if a set of tags defines a rectangle as:

<!-- A rectangle is defined by its coordinates--> 
<!-- in relation to the image and by its width--> 
<!-- and height --> 
<!ELEMENT rectangle - O EMPTY > 
<!ATTLIST rectangle 
       rectx NUMBER  #IMPLIED
       recty NUMBER  #IMPLIED
       rectw NUMBER  #IMPLIED
       recth NUMBER  #IMPLIED >

The P description of the rectangle layout may be expressed as follows:

RECTX (RECTANGLE): HorizPos: Left = Enclosing GRAPHIC.Left + RECTX pt;
RECTY (RECTANGLE): VertPos: Top = Enclosing GRAPHIC.Top

The first rule indicates that the horizontal coordinate of the RECTANGLE element which has an attribute RECTX is equal to the coordinate of the left border of the GRAPHIC element plus the value of the RECTX attribute.

The third rule indicates that the width of the RECTANGLE element which has an attribute RECTW is equal to the value of that attribute.
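To show how a formatter might evaluate such attribute-dependent rules, here is a minimal C sketch. It implements the RECTX and RECTW rules as described above and assumes that the RECTY and RECTH rules work the same way (top offset from the enclosing GRAPHIC box, height taken from the attribute). `Box`, `RectAttrs`, `layout_rectangle` and the `defaults` fallback for absent #IMPLIED attributes are hypothetical and are not part of P.

```c
/* A formatted box, in points. */
typedef struct {
    double left, top, width, height;
} Box;

/* RECTANGLE attributes; all are #IMPLIED, so each has a presence flag. */
typedef struct {
    int has_x, has_y, has_w, has_h;
    double x, y, w, h;
} RectAttrs;

/* Evaluates the rectangle's rules against its enclosing GRAPHIC box.
   Each rule applies only when its attribute is present; otherwise the
   corresponding value from defaults is kept. */
Box layout_rectangle(Box graphic, RectAttrs a, Box defaults)
{
    Box r = defaults;
    if (a.has_x) r.left = graphic.left + a.x;  /* Left = GRAPHIC.Left + RECTX */
    if (a.has_y) r.top = graphic.top + a.y;    /* assumed: Top = GRAPHIC.Top + RECTY */
    if (a.has_w) r.width = a.w;                /* Width = RECTW */
    if (a.has_h) r.height = a.h;               /* assumed: Height = RECTH */
    return r;
}
```

For a GRAPHIC box whose left edge is at 100 pt, a rectangle with RECTX = 10 is placed at 110 pt; because the rules reference the enclosing box rather than absolute positions, moving the graphic moves every rectangle with it.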

Therefore, by using style sheet definitions associated with specific DTDs, as described in Sections 3.1 and 3.2, Grif Symposia is able to format any set of SGML tags defined in a DTD, whether the DTD is used on its own or as an extension to HTML. Future work is planned to compare P with DSSSL Online and CSS and to support them.

7 Integrating and authoring data through the Net: the Symposia API

The overall approach taken by Grif Symposia makes it possible to build a powerful authoring environment that integrates multiple tools and multiple data on the WWW seamlessly for the end user.

Grif Symposia has an API, which we developed, that embodies a set of solid principles for extending the user interface, document management, network extensibility and interactive behavior of document fragments.

The Grif Symposia API makes it possible to build authoring environments that present a kind of user interface we call a Document Oriented user Interface (DOI): basic user operations on the document launch tools that operate specifically on the selected portion of the document. We call this the document paradigm because complex applications can be disguised as the behavior of document components. Data is generated within a document and sent from tool to tool and from server to server, and the document is used as the natural vehicle between users and computers [Bier 90].

The Grif Symposia API is organized in two layers:

  1. The first layer handles activity tracking: to use structured documents (by scripting them) as user interfaces, the key issue becomes how to specify behavior (or 'activity tracking') in a document.
  2. The second layer provides a set of services.

7.1 Activity Tracking in Symposia

In Grif Symposia, our approach to activity tracking is an object-oriented approach for tailoring the behavior of SGML elements: SGML elements receive event messages reflecting user interaction and respond by executing an appropriate action. Actions are written in C [JPaoli 95].

Basic user interaction generates messages, but, most importantly, changes to the structure and content of elements also generate messages. The supported message list contains almost all of the SGML ESIS events related to the creation and modification of SGML elements and attributes. Content modification, such as modification of PCDATA text, also generates messages to the appropriate element.

7.2 Services provided by the Symposia API

The Grif Symposia API [JPaoli 95] is an SGML API and provides a programming interface to the HTML/SGML structure and content.

The API supports element and attribute creation and manipulation, content modification, structural searches, and incorporation of fragments into the document.

8 An SGML Online Authoring Application using Grif Symposia

Multiple Online Authoring Applications could be designed to make use of the extensibility features of Grif Symposia.

For example, when creating manuals, an author often needs to gain access to information which has been created previously. This may involve taking fragments of data from various sources and integrating them into a single document.

The viewing process might require that the viewing tool provide the response to a user query regarding information stored on the network. In order to be accurate, the reply provided by the viewer should take account of certain constraints such as data contained in the document. The query could then be refined if such data was encoded within the document as SGML data.

The Parts Lists Online Authoring Application
The example we have built using Grif Symposia uses an SGML environment that represents an illustrated parts list catalog. In this catalog, typing the content of a part reference automatically queries a remote server for the part description, while typing the same content in the title of the document does nothing. A Database menu is also available for the part reference; it retrieves the list of valid choices through the network.
<!-- A Part List contains multiple block items-->
<!-- which describe parts and assemblies -->
<!ELEMENT partlist - - (blockitem)*>
<!ATTLIST partlist URL %URL #IMPLIED>
<!ELEMENT blockitem - -
(supplier, supplierref, partref, type, expire?, quantity) >

The partlist has an attribute which indicates the URL of a database (or a CGI script) which gives back the blockitem SGML fragment corresponding to a particular partref.
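The text does not specify how the partref value is passed to the database or CGI script. Assuming a GET request with a query string, here is a minimal C sketch that appends the value, percent-encoded, to the partlist URL; the parameter name `partref`, the function name and the example URL are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Builds "<base>?partref=<value>" with value percent-encoded so that
   spaces and reserved characters survive in the query string.
   Returns 1 on success, 0 if out (of size cap) is too small. */
int build_query_url(const char *base, const char *value, char *out, size_t cap)
{
    int n = snprintf(out, cap, "%s?partref=", base);
    if (n < 0 || (size_t)n >= cap)
        return 0;
    size_t len = (size_t)n;
    for (const unsigned char *p = (const unsigned char *)value; *p; p++) {
        if (isalnum(*p) || strchr("-._~", *p) != NULL) {
            if (len + 1 >= cap) return 0;       /* keep room for '\0' */
            out[len++] = (char)*p;              /* unreserved: copy as is */
        } else {
            if (len + 3 >= cap) return 0;
            snprintf(out + len, cap - len, "%%%02X", (unsigned)*p);
            len += 3;                           /* e.g. ' ' becomes %20 */
        }
    }
    out[len] = '\0';
    return 1;
}
```

With a partlist URL of `http://www.example.com/cgi-bin/parts` and a part reference `A 12/B`, the sketch produces `http://www.example.com/cgi-bin/parts?partref=A%2012%2FB`.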

The description of the behavior of the partref element may be expressed as follows:

                StdMsgTextModify: ApplicationTextModify;

The action ApplicationTextModify is executed when the message StdMsgTextModify (text has been modified) is sent to the element partref.

ApplicationTextModify (Element element) { 
           Fetch The Value of PARTREF 
           Move up to the partlist element 
           Fetch its URL 
           Query through the network with the URL and the PARTREF  
           for receiving the corresponding BLOCKITEM 
           Move to the element SUPPLIER 
           Fill in this element from the Query Result 
           Move to the element SUPPLIERREF 
           Fill in this element from the Query Result ... }

Note that this action is executed (and a query is sent over the network) only when a user types a part reference number into a partref element.
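As a rough, runnable rendering of the pseudocode above, the following C sketch performs the same steps over a minimal in-memory element tree. Everything in it is hypothetical, since the actual Symposia API calls are not given here: the `Element` structure, the helper names, and `demo_query`, which stands in for the network query and returns canned data for a single part number.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Minimal element tree (hypothetical; not the Symposia API). */
typedef struct Element {
    const char *type;
    const char *url;              /* URL attribute, set on partlist only */
    char text[64];                /* PCDATA content */
    struct Element *parent;
    struct Element *children[8];
    size_t n_children;
} Element;

/* Subset of the BLOCKITEM fields returned by the query. */
typedef struct { char supplier[64]; char supplierref[64]; } PartInfo;

/* Query callback: fills *out for (url, partref); returns 1 on success. */
typedef int (*PartQuery)(const char *url, const char *partref, PartInfo *out);

static Element *ancestor_of_type(Element *e, const char *type)
{
    for (e = e->parent; e != NULL; e = e->parent)
        if (strcmp(e->type, type) == 0) return e;
    return NULL;
}

static Element *sibling_of_type(Element *e, const char *type)
{
    Element *p = e->parent;
    for (size_t i = 0; p != NULL && i < p->n_children; i++)
        if (strcmp(p->children[i]->type, type) == 0) return p->children[i];
    return NULL;
}

/* Follows the steps of ApplicationTextModify for a modified partref. */
int application_text_modify(Element *partref, PartQuery query)
{
    Element *list = ancestor_of_type(partref, "partlist");  /* move up */
    if (list == NULL || list->url == NULL) return 0;         /* fetch URL */
    PartInfo info;
    if (!query(list->url, partref->text, &info)) return 0;  /* query */
    Element *supplier = sibling_of_type(partref, "supplier");
    Element *supplierref = sibling_of_type(partref, "supplierref");
    if (supplier != NULL)                                    /* fill in */
        snprintf(supplier->text, sizeof supplier->text, "%s", info.supplier);
    if (supplierref != NULL)
        snprintf(supplierref->text, sizeof supplierref->text, "%s", info.supplierref);
    return 1;
}

/* Hypothetical stand-in for the network query, with canned data. */
int demo_query(const char *url, const char *partref, PartInfo *out)
{
    (void)url;
    if (strcmp(partref, "P-1234") != 0) return 0;
    snprintf(out->supplier, sizeof out->supplier, "%s", "Example Supplier");
    snprintf(out->supplierref, sizeof out->supplierref, "%s", "S-0042");
    return 1;
}
```

Passing the query as a function pointer keeps the tree-walking steps independent of how the network request is actually made, so the same action could be tested offline or pointed at a real CGI script.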

9 Conclusion

The experience gained in implementing and using Grif Symposia allows us to draw some conclusions about defining, authoring and using new tags in WWW documents:

10 Acknowledgments

We would like to thank V. Quint and I. Vatton from INRIA, and G. Marichal and P. Telegone from GRIF S.A., for the very interesting and fruitful discussions that made possible the definition and implementation of Grif Symposia. We would also like to thank Stuart Culshaw at GRIF S.A. for his help in reviewing this paper.

11 References

[Berners 94] T. Berners-Lee, D. Connolly, 'Hypertext Markup Language Specifications - 2.0', Internet Draft, hypertext/WWW/MarkUp/html-spec/html-spec_2.html, May 1995.

[Berners 95] T. Berners-Lee, L. Montulli, E. Sink, W. Gramlich, J. Hirschman, D. Connolly, 'Inserting multimedia objects into HTML 3', W3C Working Draft, December 1995.

[Bier 90] E. Bier and A. Goodisman, 'Documents as User Interfaces', EP 90, Proceedings of the International Conference on Electronic Publishing, Document Manipulation & Typography, R. Furuta ed., pp. 249-262, Cambridge University Press, September 1990.

[Quint 91] V. Quint, 'The Languages of Grif', The Grif software documentation, 1991.

[Quint 95] V. Quint, C. Roisin, I. Vatton, 'A structured authoring environment for the World-Wide Web', Proceedings of the Third International World Wide Web Conference, edited by Computer Networks and ISDN Systems, pp. 831-840, April 1995.

[Roisin 93] C. Roisin, I. Vatton, 'Formatting Structured Documents', Rapport de Recherche, INRIA, September 1993.

[Clark 95] J. Clark, 'Document Style Semantics and Specification Language', February 1995.

[Bozak 95] J. Bosak, 'DSSSL Online Application Profile', February 1995.

[DeRose 94] S. DeRose, D. Durand, 'Making Hypermedia Work, A User's Guide to HyTime', Kluwer Academic Publishers, 1994.

[Paoli 95] J. Paoli, 'Cooperative work on the network: edit the WWW!', Proceedings of the Third International World Wide Web Conference, edited by Computer Networks and ISDN Systems, pp. 841-847, April 1995.

[Paoli 94] J. Paoli, 'Creating SGML objects for End-Users - Establishing SGML in an interactive world', Proceedings of SGML '94, GCA ed., pp. 323-333, December 1994.

[JPaoli 95] J. Paoli, 'Rules for extending a WWW client: The Symposia API', Proceedings of the Fourth International World Wide Web Conference, December 1995.

[Lie 95] Håkon W. Lie, Bert Bos, 'Cascading Style Sheets', Fifth Draft Specifications, W3C, November 1995.

[Quint 94] V. Quint, I. Vatton, 'Making Structured Documents Active', Electronic Publishing - Origination, Dissemination and Design, vol. 7, num. 3, 1994.

[Sperberg 94] C. M. Sperberg-McQueen, Robert F. Goldstein, 'HTML to the Max: A Manifesto for Adding SGML Intelligence to the World-Wide Web', October 1994.

About the author

Jean Paoli is the Technical Director and a co-founder of GRIF S.A., a leader in the creation of SGML and WWW authoring tools. He supervises the development and implementation of Grif’s WYSIWYG SGML products, the latest being Grif SGML Editor for Macintosh and GATE, an interactive SGML API. Paoli manages GRIF S.A. application consulting groups as well as research and strategic planning toward the Grif technology.

He is currently driving a joint INRIA/GRIF S.A. project for the development of Grif Symposia, a WWW editor that enables collaborative authoring on the network.

Paoli draws on more than 10 years of experience in the structured editing field. Before co-founding GRIF S.A., he worked on structured editors for programming languages with the leading French software house SEMA-GROUP and with France’s leading computing research institute, INRIA. Jean holds a specialization in software engineering and is a graduate of the Ecole Nationale des Ponts et Chaussées.