Journal reference: Computer Networks and ISDN Systems, Volume 28, issues 7–11, p. 1085.

HTML Generation and Semantic Markup for Telepathology

Vincenzo Della Mea, Carlo Alberto Beltrami
Dept. of Anatomic Pathology, University of Udine, Italy

dellamea@dimi.uniud.it; 
beltrami@uniud.it

http://www.uniud.it/drmm/drmmeng.html

Vito Roberto and Davide Brunato
Machine Vision Lab, Dept. of Informatics, University of Udine, Italy

v.roberto@dimi.uniud.it; 
brunato@dimi.uniud.it

http://www-iapr-ic.dimi.uniud.it/Udine

Abstract

The paper presents a new strategy for the authoring of hypermedia documents, and describes an HTML generator called HistMaker and its application to the domain of Anatomic Pathology. A simple extension to HTML is presented, whose aim is to introduce a general-purpose grouping construct that allows the semantic markup of hierarchically structured hypermedia documents. Such structural information can be used for the effective authoring, browsing and searching of documents. The authoring tool HistMaker is introduced on the basis of a model of a pathologic case; its implementation and practical results are also discussed.

1 Introduction

In this paper we present a novel approach to the authoring and maintenance of hypermedia documents, applied to the domain of Anatomic Pathology. The approach comprises a simple semantic markup addition to HTML, a model of a histopathologic case, and an authoring environment - HistMaker - that allows the generation of maintainable, multilingual hypermedia medical cases for diagnostic reference and training. HistMaker has been developed in Hypercard on an Apple Macintosh.

This work is a part of the MULTIPATH project [Multipath95] aimed at developing distributed multimedia services for Telepathology.

The paper is organised as follows. Section 2 introduces the basic concepts underlying our approach, and its possible implications; in the next Section we present the model of a histopathologic case that has been used to develop and test HistMaker; the latter environment is presented in Section 4 with its implementation and results. Our conclusions are in Section 5.

2 Semi-structured hypermedia documents on the WWW

2.1 Motivations

The World Wide Web consists of a document representation language, HTML [Berners-Lee94], and a communication protocol, HTTP [Berners-Lee93].

Although HTML allows substantial freedom in creating documents, in many cases the author needs to be constrained to follow specific guidelines. This may be required when multiple, repetitive documents (such as simple database records) are to be edited while maintaining the same overall structure in each of them.

An authoring tool for structured documents is useful in order to avoid syntactic and stylistic errors. The tool may introduce in the document some form of semantic markup, in order to identify the components of the structure, thereby making it possible to re-edit the documents themselves, and render them automatically processable.

Finally, a structured authoring tool is helpful whenever multilingual versions of structured documents have to be maintained.

As for standard hypertexts, the possibility of inserting some type of free link is also important: we call semi-structured hypermedia a structured document which includes a number of free links connecting to internal and/or external fragments.

2.2 Related work

Authoring tools are a fundamental topic in the research on hypermedia.

Several Authors [Kaindl91, Quint95, Varela94, Dobson95, Hardman93, Kesseler95] acknowledge the role of structure in hypermedia documents: it is needed for carrying more complex and detailed information, and it is also the way databases usually organize data.

Based on a pre-existing SGML tool [Quint86], Quint et al. [Quint95] proposed an environment in which the structure of the document is described by an SGML DTD. In this way, the author is helped - or sometimes obliged - to produce a document consistent with the selected structure. Their proposal aims at avoiding syntactic and stylistic errors while generating large documents, and also at exploiting advanced features of HTML that are difficult to use with a standard editor, or without a detailed knowledge of HTML. They also define the concept of presentation model, that is, the set of rules used to graphically show a structured document. Several different presentation models may be associated with a structure definition.

An important rationale is the reuse and interfacing of existing databases. Varela and Hayes [Varela94] developed a schema-based method for the creation of soft database applications on the World Wide Web by extending HTML with directives for document generation. Their work is based on the presence of an underlying structure in the data, that may be used for the automatic generation and modification of soft user interfaces realised in the WWW. They recognize the flexibility offered by automatically modifying the interface according to the user needs and the database transactions.

Dobson and Burrill [Dobson95] evaluated the usability of HTML for generating the so-called "lightweight databases", i.e., small database applications with some of the typical features of databases, such as searching and indexing. To this aim, they proposed a limited extension to HTML that introduces entities, attributes and relations for semantic markup, in such a way that the conceptual structure of information may be put forward.

Hardman et al. [Hardman93] presented a structured multimedia authoring environment for the specific field of creating multimedia presentations. The latter is a complex task, that may be simplified by adopting a model of the multimedia document to be generated. The Authors argue that most authors already use implicit structures, and better results may be obtained by making that structure explicit, in such a way that the structure itself may be manipulated as a component of the document.

The manipulation of large archives of regularly structured hypertexts is the aim of Kesseler's work [Kesseler95]: the objects composing the hypertext and the relationships among them are represented in a schema, which is used by the author to edit and update the archive. An incremental compilation technique is adopted to support schema evolution.

The papers reported - each of which focussed on a different issue - indicate that there exists a class of hypertexts that may be treated by taking into consideration the structure underlying the documents, in order to make their production more efficient.

2.3 A simple semantic markup technique

In HTML, the hierarchical features of documents are flattened to an essentially single layer, if we exclude the depth appearing in lists. As a matter of fact, there are classes of documents that may be described with a more detailed model; in addition, if the latter model is embedded in the document, some operations may be carried out automatically by taking advantage of it.

A way to achieve this goal is to embed some meta-information in the document. The meta-information should not disturb the rendering of the document by the client, and should also be as light as possible in order not to overcharge documents with external notations. A good example is presented in [Dobson95], where three new tags are introduced that allow a hierarchical and relational representation of documents, thus adapting HTML to small database applications.

Our approach is even simpler, providing a minimal extension that allows the semantic markup of hierarchically structured hypermedia documents.

This is done by introducing the section element:

<SECTION NAME=section_name>
</SECTION>

This tag is used to identify the start and end of named sections of a hypertext, and can be recursively nested in order to represent hierarchical structures.

The section construct we propose is similar to an ordinary record structure, or a LISP language list.

We are currently developing the Document Type Definition (DTD) describing our extension.

As an example, let us consider the problem of representing patient data on the WWW. A minimal set of such data comprises the birthdate, sex and weight. The data may be represented by an HTML file with the following content (indented only for the sake of readability):

<H1>PATIENT DATA</H1>
<SECTION NAME=patient>
		<H2>BirthDate: </H2>
		<SECTION NAME=birthdate> 6-1-1968 </SECTION>
		<P>
		<H2>Sex: </H2>
		<SECTION NAME=sex>F</SECTION>
		<H2>WEIGHT: </H2>
		<SECTION NAME=weight>53</SECTION>
<P>
The file may contain additional text.
</SECTION>
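
To illustrate how such markup could be processed automatically, the following Python sketch reads the patient example back into a nested structure. It is not part of HistMaker: the class and function names, and the dictionary layout of each node, are assumptions made for this illustration, and only the standard library HTML parser is used.

```python
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Collect nested <SECTION NAME=...> elements into a tree of dicts."""

    def __init__(self):
        super().__init__()
        # Hypothetical node layout: name, own text, child sections.
        self.root = {"name": None, "text": "", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names.
        if tag == "section":
            node = {"name": dict(attrs).get("name"), "text": "", "children": []}
            self.stack[-1]["children"].append(node)
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag == "section" and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        # Text is credited to the innermost open section only.
        self.stack[-1]["text"] += data


def parse_sections(html_text):
    """Return the list of top-level sections found in html_text."""
    parser = SectionParser()
    parser.feed(html_text)
    parser.close()
    return parser.root["children"]


def to_record(node):
    """Flatten a section whose children are leaf fields into a dict."""
    return {child["name"]: child["text"].strip() for child in node["children"]}


PATIENT_HTML = """
<H1>PATIENT DATA</H1>
<SECTION NAME=patient>
  <H2>BirthDate: </H2>
  <SECTION NAME=birthdate> 6-1-1968 </SECTION>
  <P>
  <H2>Sex: </H2>
  <SECTION NAME=sex>F</SECTION>
  <H2>WEIGHT: </H2>
  <SECTION NAME=weight>53</SECTION>
  <P>
  The file may contain additional text.
</SECTION>
"""

patient = parse_sections(PATIENT_HTML)[0]
record = to_record(patient)
print(record)
```

The headings and the free text stay in the enclosing patient section, while the three nested sections become the fields of a record, which matches the comparison with a record structure made above.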

The structural information embedded in this way can be used for three distinct purposes:

Structure-aware authoring of hypertexts: when authoring repetitive documents, all the common elements can be associated with the structure and automatically generated, thus reducing the time spent in the authoring itself;
Structure-aware browsing of hypertexts: for particular aims, specific browsers may be developed that are able to show structurally marked hypertexts in the most appropriate ways;
Structure-aware searching of hypertexts: an important feature obtainable with the structural markup is the capability of searching for specific keywords only in semantically relevant portions of the documents. This allows more effective and efficient searches.

In the present work we will mainly explore the first opportunity, although a few words will be devoted to the remaining ones.
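
As an illustration of the third purpose, structure-aware searching, the sketch below restricts a keyword search to one named section. It is only an example of what a structure-aware search tool might do: the function names, the section names "clinical_history" and "diagnosis", and the sample text are all invented here.

```python
from html.parser import HTMLParser


class SectionText(HTMLParser):
    """Accumulate the text found inside each named section."""

    def __init__(self):
        super().__init__()
        self.open_sections = []   # names of the currently open sections
        self.text_by_name = {}    # section name -> text, nested text included

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            name = dict(attrs).get("name", "")
            self.open_sections.append(name)
            self.text_by_name.setdefault(name, "")

    def handle_endtag(self, tag):
        if tag == "section" and self.open_sections:
            self.open_sections.pop()

    def handle_data(self, data):
        # Credit the text to every enclosing section, so that a search in
        # an outer section also covers the sections nested inside it.
        for name in set(self.open_sections):
            self.text_by_name[name] += data


def search_in_section(html_text, section_name, keyword):
    """Return True if keyword occurs (case-insensitively) in that section."""
    parser = SectionText()
    parser.feed(html_text)
    parser.close()
    text = parser.text_by_name.get(section_name, "")
    return keyword.lower() in text.lower()


# Invented sample document; section names and text are placeholders.
CASE_HTML = """
<SECTION NAME=case>
<SECTION NAME=clinical_history>No previous carcinoma reported.</SECTION>
<SECTION NAME=diagnosis>Benign lesion.</SECTION>
</SECTION>
"""

print(search_in_section(CASE_HTML, "diagnosis", "carcinoma"))
print(search_in_section(CASE_HTML, "clinical_history", "carcinoma"))
```

A plain full-text search would match this document for "carcinoma" even though the word occurs only in the clinical history; restricting the search to the diagnosis section avoids that false hit.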

2.4 From structures to documents

By using the semantic markup technique previously described, we are able to annotate a hypertext with basic indications about its content. The annotations may be used by a structure-aware HTML editor to generate documents starting with a set of structured data, provided that knowledge about how to represent each section is given.

Following the approach presented in [Quint95], the concept of presentation model can be defined in association with a structure. The presentation model contains all the knowledge needed for representing a series of hypertexts sharing the same structure. In addition, many different presentation models may be associated with the same structure, in order to fulfill different needs.

The knowledge may be given in the form of rules: for example, we can describe the graphical appearance of a section with a basic set of three rules:

preceded_by
default_style
followed_by

The "preceded_by" rule describes the text to be inserted immediately before the section: for example, it may include the section title and a named anchor. The "default_style" rule regards the representation of the whole section (that may be modified when authoring a specific section). The "followed_by" rule can be used for closing the section. The quoted rules are used by the editor in a compilation process that generates an HTML document starting from the data of each section, its structure and the selected presentation model.
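
The compilation process based on these three rules can be sketched as follows. This is a minimal illustration, not the actual implementation: the function names and the dictionary-based presentation model are assumptions, and "default_style" is assumed here to be a template with a {content} slot, since its concrete form is not specified above.

```python
def compile_section(name, content, rules):
    """Emit one section: preceded_by, the marked-up content, followed_by."""
    rule = rules.get(name, {})
    # Assumption: default_style is a template with a {content} placeholder.
    styled = rule.get("default_style", "{content}").format(content=content)
    return (
        rule.get("preceded_by", "")
        + f"<SECTION NAME={name}>{styled}</SECTION>"
        + rule.get("followed_by", "")
    )


def compile_document(structure, data, rules):
    """structure: ordered section names; data: section name -> text."""
    return "\n".join(
        compile_section(name, data.get(name, ""), rules) for name in structure
    )


# Hypothetical presentation model for two of the patient fields.
PRESENTATION = {
    "sex": {
        "preceded_by": '<A NAME="sex"></A><H2>Sex: </H2>',
        "default_style": "<B>{content}</B>",
        "followed_by": "<P>",
    },
    "weight": {"preceded_by": "<H2>Weight: </H2>", "followed_by": "<HR>"},
}

html_out = compile_document(
    ["sex", "weight"], {"sex": "F", "weight": "53"}, PRESENTATION
)
print(html_out)
```

Keeping the rules in a table separate from the data is what lets several presentation models share one structure: only the table changes, while the structure and the data stay the same.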

3 The hypermedia histopathologic case

3.1 Reference cases in Pathology

As in any medical specialty, expert knowledge in Pathology derives from the continuous practice of problem solving. In particular, Anatomic Pathology is an image-based discipline, in which the diagnosis is reached by visually examining features in a large number of images acquired through a microscope. Rare cases are resolved by reference, either by checking similar, previously encountered cases, or by consulting the literature in order to identify discriminating features. A possible solution is to have a number of paradigmatic cases available for comparison. Usually, such cases are available in paper form as case reports published in scientific journals.

With the aim of providing an easily searchable base of cases, a Reference Case Archive is of great help, provided that it is built with the contributions of several pathologists and Institutes. When many different pathologists supply cases, the problem arises of identifying common guidelines for the description of the cases.

Medical students may also gain knowledge from such a distributed archive, by means of self-training and problem-based learning. To this end, a slightly different view of the same case may be useful (e.g., in the form of exercises, with descriptions and diagnoses initially obscured).

3.2 The structure of the histopathologic case

Different data contribute to the description of a histopathologic case. In fact, the anatomo-pathologic findings alone are often insufficient to reach a correct diagnosis: the demographic data and clinical history of the patient can put the case in the right frame.

A standard data analysis has been carried out, and a model of the histopathologic case has been developed as shown in Figure 1. The model also takes into account the needs of the pathologists, such as the external references.

Figure 1 - the histopathologic case: overall description

3.3 Text and images

Images may appear in three sections of a hypermedia case document: clinical history, macroscopic description and microscopic description. In our model of histopathologic cases, emphasis is placed on pathologic images, either macroscopic or microscopic; however, we also provided the possibility of inserting images in the clinical history.

Pathologic images have been divided into two lists: gross anatomy photos and images related to microscopic descriptions. In the latter class, images coming from light microscopes play a major role, but other images have also been considered: electron microscope images and DNA content histograms obtained by cytofluorimetry.

Images should be accompanied by the textual information necessary for their correct interpretation. One relevant item to be included in any image description is the file size, since network traffic may discourage users from loading larger files. Other data depend on the image type: generally speaking, images should be accompanied by a description of how they were obtained. In particular, for microscopic images, details about staining and magnification are to be supplied in order to properly guide the inspecting pathologist.

Figure 2 summarizes the features that have been considered: common to all images, there are the file size and a short textual description.

Figure 2 - Images appearing in a case, with their features, as a part of the model in Fig. 1

A useful functionality - not included in the model outlined above - is the possibility of linking images to portions of a text. In fact, the connections are made by means of free links, which cannot easily be reduced to a rigid description.

3.4 External references

When dealing with rare or interesting cases, the appropriate scientific literature should be available, either in the form of case reports or of deeper studies on the details of the disease. Thus, when describing a case in hypermedia form, the author should be able to access some kind of reference external to the case. If the latter is a simple bibliographic citation, no authoring problems occur, since it is only a textual addition to the document. A more interesting situation arises from the availability of networked medical resources: in fact, useful information is supplied not only by papers and books, but also by other networked case archives, bibliographic databases, hypertexts, and even information services such as those provided by Cancernet [Fare91], the Virtual Hospital [Galvin94], and others. Distributed services of the latter type greatly extend the resources available to the pathologist.

This feature, together with image linking, turns the hypermedia histopathologic case from a structured to a semi-structured document model, making it a starting point for cognitive explorations on the Internet.

3.5 PathGallery: a distributed archive of hypermedia cases

The environment described in the present paper has been designed as a support to realise a distributed hypermedia archive of medical cases, called PathGallery [PathGallery95], which is a goal of the MULTIPATH project.

Visual databases of images [Kayser93] and hypermedia histopathologic cases [Della Mea95] are among the most interesting applications of Telepathology. Such applications are even more relevant in remote or distributed environments, because of the difficulty in gathering cases within a single Institution. The Internet offers the ideal services for such archives.

4 HistMaker: an authoring tool for semi-structured histopathologic cases

4.1 Functionalities

With the aim of testing the approach reported so far, and enabling the pathologist to compose the HTML documents describing the cases, we developed HistMaker, a tool for authoring hypermedia histopathologic cases.

The tool allows the pathologist to:

edit a histopathologic case;
insert hyperlinks between text and images;
generate an HTML file;
re-edit previously generated HTML files;
change the representation model of histopathologic cases in an easy way.

The structure of Figure 1 acts as a frame for the authoring of cases, helping the pathologist to insert the correct data; some of these, such as sex and age, are compulsory. Many different presentation models may be associated with the structure, which are actually implemented as simple rules of the type "preceded_by" and "followed_by". In this way, for each section pertaining to the structure, the final HTML description is generated by means of a compilation process, as follows:

preceded_by

<section name=section_name>

.... free hypertext ....

</section>

followed_by

where preceded_by and followed_by are portions of HTML code containing everything useful to identify the section - e.g., a printable name, horizontal rules, and so on. No default style is currently implemented.

Our technique also allows the generation of the same case in different languages, but with the same overall structure. In fact, with the same conceptual structure, the author needs to change only the rules in order to reflect different section headings.

The tool also provides an easy way to connect images to a text. In fact, the syntax of anchors and URLs is a possible source of problems in HTML authoring, especially when the author is a beginner.

A fundamental feature of the tool is the ability to re-edit generated files, using the semantic markup to identify the sections and associate them with the structure. This mechanism makes it possible to upgrade and maintain files, as well as to generate multilingual versions. Debugging and maintenance may be carried out by simply loading previously generated files and editing text or links. Documents can be upgraded by creating an upgraded presentation model associated with the same structure, loading a file, and then saving it after changing the presentation model.
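
The re-editing cycle described above (load a generated file, edit it, save it under a different presentation model) can be sketched in Python. This is an illustration under simplifying assumptions, not HistMaker code: sections are assumed to be flat rather than nested, so that a regular expression is enough to recover them, and the function names and the English and Italian rule tables are invented for the example.

```python
import re

# Assumption: flat (non-nested) sections, so a regular expression suffices.
SECTION_RE = re.compile(
    r"<section\s+name=(\w+)>(.*?)</section>", re.IGNORECASE | re.DOTALL
)


def load_case(html_text):
    """Recover section name -> content from a previously generated file."""
    return {m.group(1).lower(): m.group(2) for m in SECTION_RE.finditer(html_text)}


def save_case(structure, data, rules):
    """Regenerate the file from the data and a presentation model.

    rules maps each section name to a (preceded_by, followed_by) pair.
    """
    lines = []
    for name in structure:
        preceded_by, followed_by = rules[name]
        lines.append(
            f"{preceded_by}<section name={name}>{data[name]}</section>{followed_by}"
        )
    return "\n".join(lines)


STRUCTURE = ["sex", "weight"]
# Two hypothetical presentation models sharing the same structure.
ENGLISH = {"sex": ("<H2>Sex: </H2>", "<P>"), "weight": ("<H2>Weight: </H2>", "<HR>")}
ITALIAN = {"sex": ("<H2>Sesso: </H2>", "<P>"), "weight": ("<H2>Peso: </H2>", "<HR>")}

original = save_case(STRUCTURE, {"sex": "F", "weight": "53"}, ENGLISH)

# Re-edit: load the generated file, change one field, switch the model.
data = load_case(original)
data["weight"] = "54"
translated = save_case(STRUCTURE, data, ITALIAN)
print(translated)
```

Because the headings live in the presentation model and not in the data, loading the file discards them and saving regenerates them, so the same file can be reissued in another language or layout without retyping its content.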

Finally, the tool obviously takes into account the presence of character entities to be translated into the corresponding HTML codes.
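
A possible way to perform this translation is sketched below in Python. The entity table used by HistMaker is not described here, so the sketch relies on the table of named entities shipped with the Python standard library, which is an assumption of this example; the function name is also hypothetical. The function is meant for user-typed text only, since it would also escape the angle brackets of real markup.

```python
import unicodedata
from html.entities import codepoint2name


def encode_entities(text):
    """Replace characters that have a named HTML entity with &name;."""
    # Normalise first so that accented letters are single code points.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if name else ch)
    return "".join(out)


print(encode_entities("Universit\u00e0 & citt\u00e0"))
```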

4.2 Implementation

The prototype tool has been developed on an Apple Macintosh (Apple, Cupertino, USA) using the well-known authoring tool Hypercard (Claris Corp., Santa Clara, USA), that is equipped with an interface system and an embedded programming language suitable to our project. The more time-consuming operations have been written as external commands (XCMD) using the C language (Symantec, Cupertino, USA).

The textual part in the structure of a case may be directly edited within a series of user-fillable fields. In addition, three lists of images may be inserted together with their distinctive features, for clinical history, macroscopic and microscopic descriptions.

Adding a link between a portion of a text and an image is easy: once the image has been selected from a popup menu, the corresponding text is selected with the mouse. Then, by selecting "connect to image" from a menu, the text is automatically marked as linked with a double tag: the characters become underlined and bracketed, indicating that the text is already linked to an image. The software then checks any attempt to modify the linked text, and issues a warning message. The author may disconnect an image from the text, if necessary.
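
The HTML that such a "connect to image" operation has to produce can be illustrated with a short Python sketch. The function name, the sample sentence and the image path are invented for the example; the point is only that the anchor syntax and the escaping of the URL are generated for the author instead of being typed by hand.

```python
from html import escape
from urllib.parse import quote


def link_text_to_image(text, start, end, image_url):
    """Wrap text[start:end] in an anchor pointing at image_url."""
    if not 0 <= start < end <= len(text):
        raise ValueError("selection out of range")
    # Percent-encode unsafe characters, then escape for use in an attribute.
    href = escape(quote(image_url, safe="/:#?&=%"), quote=True)
    return f'{text[:start]}<A HREF="{href}">{text[start:end]}</A>{text[end:]}'


# Invented sample sentence and image path.
sentence = "See the low-power view of the specimen."
selection = "low-power view"
start = sentence.index(selection)
linked = link_text_to_image(
    sentence, start, start + len(selection), "images/micro 01.gif"
)
print(linked)
```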

A separate window is dedicated to the parameter settings, the most important of which is surely the structure to be used for generating the hypermedia case.

A limited editor for the structure and presentation model is present in that window, allowing the creation of identical or different structures with different presentation models. The structure may be "compiled" into the corresponding field interface, at the moment with some limitations, because ad hoc solutions have been adopted to make the user input process as easy as possible, in particular when editing image links. However, this is sufficient for our application, while a more general approach to structured authoring would require more complex solutions.

A particular approach has been adopted to enable the pathologist to set links to networked resources. With the aim of avoiding a direct interaction of the pathologist with the HTML language, we devised a method that takes directly from the user's environment the knowledge about the resources that may be inserted in the external reference field of the cases. This can be done by connecting the authoring environment with the tools normally used by the pathologist in his/her visits to the World Wide Web. More specifically, we take advantage of the bookmarks gathered by the pathologist when he/she browses the WWW: the set of bookmarks represents a sketch of the network resources that, at a given time, are useful for the diagnostic, research and teaching interests of the pathologist. Figure 3 shows HistMaker.
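
As an illustration of the bookmark-based method, the Python sketch below extracts titles and URLs from a bookmark file and turns them into candidate external references. The browser and the bookmark format are not specified above, so the sketch assumes that the bookmarks are stored as an HTML file of anchors; the function names, the sample file and its example.org addresses are placeholders.

```python
from html import escape
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (title, url) pairs from the anchors of a bookmark file."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._href = None
        self._title = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._title = []

    def handle_data(self, data):
        if self._href is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.bookmarks.append(("".join(self._title).strip(), self._href))
            self._href = None


def parse_bookmarks(html_text):
    parser = BookmarkParser()
    parser.feed(html_text)
    parser.close()
    return parser.bookmarks


def references_html(bookmarks):
    """Render the bookmarks as an HTML list for the external reference field."""
    items = [
        f'<LI><A HREF="{escape(url, quote=True)}">{escape(title)}</A>'
        for title, url in bookmarks
    ]
    return "<UL>\n" + "\n".join(items) + "\n</UL>"


# Placeholder bookmark file; the format is an assumption of this sketch.
BOOKMARK_FILE = """
<DL><p>
<DT><A HREF="http://example.org/cases/">Example case archive</A>
<DT><A HREF="http://example.org/biblio/">Example bibliographic database</A>
</DL><p>
"""

bookmarks = parse_bookmarks(BOOKMARK_FILE)
print(references_html(bookmarks))
```

In an authoring tool, the list of titles would typically be offered in a menu, so that the author picks a resource by name and never has to type its URL.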

Figure 3 - Six snapshots from the user interface of HistMaker, filled with the data corresponding to a case.

The whole environment is not intended as a complete archive management system, but only as a facility for case authoring, leaving the other operations on the WWW site - such as linkage of new cases to the archive, file management, and so on - to the webmanager. This is because the management of a truly complete distributed case archive remains a complex task, better accomplished by a dedicated technician than by an ordinary WWW user.

4.3 An example

The approach we propose is easier to understand with an example case obtained with our authoring environment. Figure 4 shows the result of the compilation process carried out on the same data that appear in Figure 3.

Figure 4 - The case presented in Figure 3, automatically converted into HTML form.

5 Conclusions

In this paper we have presented an approach to the authoring and maintenance of semi-structured hypermedia documents. The proposed solution involves a semantic markup technique, a conceptual data model, and the development of a suitable tool, HistMaker, working with histopathologic case descriptions.

The semantic markup technique is conceptually very simple, but significantly extends the expressive power of HTML documents: it introduces a general-purpose grouping construct that can be used to structure parts of a document as database records, and to automatically generate HTML codes, thereby enabling more effective authoring, browsing and searching of hypertexts.

A model of the histopathologic cases has been presented, whose aim is twofold: on one hand, it acts as a database schema for the hypermedia archive of cases on the WWW, on the other hand it is a knowledge representation tool, used to provide a number of model-based, high-level supports to a homogeneous class of users - i.e., the pathologists.

HistMaker is an environment designed and realised in order to test our approach and evaluate its performance in the domain of telepathology. It enables the user to construct hypertextual documents under his/her complete control, to generate HTML files, and to make effective use of the specialised services available on the Internet.

Not all the requested functionalities are currently implemented in the prototype; tests are under way by the pathologists involved in the MULTIPATH project. The realization of PathGallery, an archive of hypermedia reference cases, is also under way.

The proposed approach presents interesting challenges. As an example, we mention the development of a general-purpose, adaptive system that, on the basis of structural descriptions, generates the most adequate user interface for a specified document.

From an application point of view, a more general tool can be developed on the basis of HistMaker; it should be able to deal with generic structures - defined in an appropriate way - to be used for tailoring specific user interfaces and generating HTML documents accordingly.

Acknowledgments

We thank the anonymous Referees for their stimulating suggestions and comments.

References

[Berners-Lee93] Berners-Lee T. Hypertext Transfer Protocol, Internet Draft. 5 Nov 1993.

[Berners-Lee94] Berners-Lee T, Connolly D. Hypertext markup language specification - 2.0. IETF HTML Working Group, RFC1866, 1995.

[Della Mea95] Della Mea V, Puglisi F, Brunato D, Roberto V, Forti S, Dalla Palma P, Beltrami CA. Histopathologic reference cases on Internet: an hypermedia approach for training, reference and education. Proceedings of 9th International Conference on Diagnostic Quantitative Pathology, Heidelberg, Germany, 1995.

[Dobson95] Dobson SA, Burrill VA. Lightweight Databases. Proceedings of the 3rd International World Wide Web Conference, Darmstadt, Germany (1995).

[Fare91] Fare C, Ugolini D. The PDQ (Physician Data Query), the cancer database, in oncological clinical practice. Cancer Treatment Reviews 1991;18(2):137-143.

[Galvin94] Galvin JR, D'Alessandro MP, Erkonen WE, Lacey DL, Santer DM. The Virtual Hospital: A link between academia and practitioners (Letter to the Editor). Acad. Med, 1994;69:130.

[Hardman93] Hardman L, van Rossum G, Bulterman DCA. Structured multimedia authoring. Proceedings of 1st ACM Conference on Multimedia, pp. 283-289, Anaheim, CA, USA, Aug 1-6, 1993.

[Kaindl91] Kaindl H, Snaprud M. Hypertext and structured object representation: a unifying view. Proceedings of ACM Hypertext 91 pp. 345-358, 1991.

[Kayser93] Kayser K. Progress in Telepathology. In Vivo 7(4), pp 331-3, 1993.

[Kesseler95] Kesseler M. A schema based approach to HTML authoring. Proceedings of the 4th International World Wide Web Conference, Boston, MA, USA (1995).

[Multipath95] http://www-iapr-ic.dimi.uniud.it/Udine/Respro/Multipath/multipath.html

[PathGallery95] http://www.uniud.it/drmm/anpat/gallery/pathgallery.html

[Quint95] Quint V, Roisin C, Vatton I. A structured authoring environment for the World-Wide Web. Proceedings of the 3rd International World Wide Web Conference, Darmstadt, Germany (1995).

[Quint86] Quint V, Vatton I. Grif: an interactive system for structured document manipulation. Text processing and document manipulation, Proceedings of the International Conference, J. C. van Vliet, ed., pp. 200-213, Cambridge University Press, 1986.

[Varela94] Varela CA, Hayes CC. Zelig: schema-based generation of soft WWW database applications. Proceedings of the 2nd International World Wide Web Conference (1994).

About the authors

Vincenzo Della Mea was born in Como, Italy, in 1967. He received his M.Sc. in Computer Science from the University of Udine, Italy, in 1992. He is currently a Ph.D. candidate at the same University. His main research interests are image processing and hypermedia, with their applications in medicine and telemedicine.

Vito Roberto is Associate Professor at the Computer Science Faculty, University of Udine, Italy. He got the "Laurea" degree in Physics in 1973. Since then, he has been working on computational aspects of signal and image analysis. His current research activity concerns model-based techniques for machine vision and image communication. In particular, he is currently leading research projects in the fields of multi-sensor data fusion and multi-agent systems, in the application domains of industrial inspection and telepathology. Prof. Roberto is the author of several articles, and editor of volumes in the fields of Perceptual Systems, Artificial Intelligence and Pattern Recognition. He is a member of the International Association for Pattern Recognition and the American Association for Artificial Intelligence.

Davide Brunato was born in 1968. He is currently a student of Computer Science at the University of Udine, Italy, and he is writing his M.Sc. thesis on semi-structured hypermedia authoring. His research interests are mainly image processing and hypermedia.

Carlo Alberto Beltrami is Full Professor and Head of Pathology at the University of Udine, Italy. He obtained his Medicine degree at the University of Ferrara, Italy, in 1967. He specialised in Clinical Pathology, Oncology and Pathological Anatomy. From 1971 to 1983 he worked as Assistant Professor of Pathology at the Universities of Ferrara and Ancona, Italy. From 1983 to 1985 he was Associate Professor of Pathology at the University of Ancona, Italy. In 1985 he became Full Professor of Pathology; since 1988 he has been at the University of Udine, Italy. Recently he became Head of the University Hospital of Udine, Italy. His research interests are cardiovascular pathology, oncology, telepathology and the applications of image processing in quantitative pathology.