Journal reference: Computer Networks and ISDN Systems, Volume 28, issues 7-11, p. 1085.
Vito Roberto and Davide Brunato
This work is a part of the MULTIPATH project [Multipath95] aimed at developing distributed multimedia services for Telepathology.
The paper is organised as follows. Section 2 introduces the basic concepts underlying our approach, and its possible implications; in the next Section we present the model of a histopathologic case that has been used to develop and test HistMaker; the latter environment is presented in Section 4 with its implementation and results. Our conclusions are in Section 5.
Although HTML allows substantial freedom in creating documents, in many cases the author needs to be constrained to follow specific guidelines. This may be required when multiple, repetitive documents (such as simple database records) are to be edited, while maintaining the same overall structure in each of them.
An authoring tool for structured documents helps to avoid syntactic and stylistic errors. The tool may introduce some form of semantic markup into the document in order to identify the components of the structure, thereby making it possible to re-edit the documents and to process them automatically.
Finally, a structured authoring tool is helpful whenever multilingual versions of structured documents must be maintained.
As for standard hypertexts, the possibility of inserting some type of free link is also important: we call semi-structured hypermedia a structured document which includes a number of free links connecting to internal and/or external fragments.
Several authors [Kaindl91, Quint95, Varela94, Dobson95, Hardman93, Kesseler95] acknowledge the role of structure in hypermedia documents: it is needed for carrying more complex and detailed information, and it is also the way databases usually organize data.
Based on a pre-existing SGML tool [Quint86], Quint et al. [Quint95] proposed an environment in which the structure of the document is described by an SGML DTD. In this way, the author is helped - or sometimes obliged - to produce a document consistent with the selected structure. Quint's proposal aims at avoiding syntactic and stylistic errors while generating large documents, and also at exploiting advanced features of HTML that are difficult to use with a standard editor, or without a detailed knowledge of HTML. The work also defines the concept of presentation model, that is, the set of rules used to display a structured document graphically. Several different presentation models may be associated with a structure definition.
An important rationale is the reuse and interfacing of existing databases. Varela and Hayes [Varela94] developed a schema-based method for creating soft database applications on the World Wide Web by extending HTML with directives for document generation. Their work relies on the presence of an underlying structure in the data, which may be used for the automatic generation and modification of soft user interfaces on the WWW. They recognize the flexibility offered by automatically modifying the interface according to the user's needs and the database transactions.
Dobson and Burrill [Dobson95] evaluated the usability of HTML for generating the so-called "lightweight databases", i.e., small database applications with some of the typical features of databases, such as searching and indexing. To this aim, they proposed a limited extension to HTML that introduces entities, attributes and relations for semantic markup, in such a way that the conceptual structure of information may be put forward.
Hardman et al. [Hardman93] presented a structured authoring environment for creating multimedia presentations. This is a complex task, which may be simplified by adopting a model of the multimedia document to be generated. They argue that most authors already use implicit structures, and that better results may be obtained by making the structure explicit, so that it may be manipulated as a component of the document.
The manipulation of large archives of regularly structured hypertexts is the aim of Kesseler's work [Kesseler95]: the objects composing the hypertext and the relationships among them are represented in a schema, which the author uses to edit and update the archive. An incremental compilation technique is adopted to support schema evolution.
The papers reviewed above, each focussed on a different issue, indicate that there exists a class of hypertexts whose production can be made more efficient by taking the underlying document structure into account.
One way to achieve this goal is to embed some meta-information in the document. The meta-information should not disturb the rendering of the document by the client, and should be as light as possible so as not to overload documents with external notations. A good example is presented in [Dobson95], where three new tags are introduced that allow a hierarchical and relational representation of documents, thus adapting HTML to small database applications.
Our approach is even simpler, providing a minimal extension that allows the semantic markup of hierarchically structured hypermedia documents.
This is done by introducing the section element:
<SECTION NAME=section_name> </SECTION>
This tag is used to identify the start and end of named sections of a hypertext, and can be recursively nested in order to represent hierarchical structures.
The section construct we propose is similar to an ordinary record structure, or a LISP list.
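The analogy with records and nested lists can be made concrete with a short sketch. The Python code below is not part of HistMaker; it is a minimal illustration, assuming well-nested SECTION tags with a NAME attribute, of how such markup could be read back into a nested record. The class and function names (SectionParser, parse_sections, to_record) and the sample document are hypothetical.

```python
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Collect nested SECTION elements into a tree of dicts."""

    def __init__(self):
        super().__init__()
        # Artificial root node that holds the top-level sections.
        self.root = {"name": None, "text": "", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag == "section":
            node = {"name": dict(attrs).get("name"), "text": "", "children": []}
            self.stack[-1]["children"].append(node)
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag == "section" and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        # Text outside any section is ignored; other tags are skipped.
        if len(self.stack) > 1:
            self.stack[-1]["text"] += data


def parse_sections(html_text):
    """Return the list of top-level section nodes found in html_text."""
    parser = SectionParser()
    parser.feed(html_text)
    parser.close()
    return parser.root["children"]


def to_record(node):
    """Leaf sections become strings, inner sections become dicts."""
    if not node["children"]:
        return node["text"].strip()
    return {child["name"]: to_record(child) for child in node["children"]}


# Hypothetical sample document, used only for illustration.
doc = (
    "<H1>Report</H1>"
    "<SECTION NAME=specimen>"
    "<SECTION NAME=site> colon </SECTION>"
    "<SECTION NAME=size>12</SECTION>"
    "</SECTION>"
)
record = to_record(parse_sections(doc)[0])
print(record)
```

In this sketch a section with sub-sections is treated as a record whose fields are the sub-section names, while any text placed directly inside it is ignored by to_record; other conventions are equally possible.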
We are currently developing the Document Type Definition (DTD) describing our extension.
As an example, let us consider the problem of representing patient data on the WWW. A minimal set comprises the birthdate, sex and weight. Such data may be represented by an HTML file with the following content (indented only for the sake of readability):
<H1>PATIENT DATA</H1>
<SECTION NAME=patient>
  <H2>BirthDate: </H2>
  <SECTION NAME=birthdate> 6-1-1968 </SECTION>
  <P>
  <H2>Sex: </H2>
  <SECTION NAME=sex>F</SECTION>
  <H2>WEIGHT: </H2>
  <SECTION NAME=weight>53</SECTION>
  <P>
  The file may contain additional text.
</SECTION>
The structural information embedded in this way can be used for three distinct purposes:
Following the approach presented in [Quint95], the concept of presentation model can be defined in association with a structure. The presentation model contains all the knowledge needed for representing a series of hypertexts sharing the same structure. In addition, many different presentation models may be associated with the same structure, in order to fulfill different needs.
The knowledge may be given in form of rules: for example, we can describe the graphical appearance of a section with a basic set of three rules:
To provide an easily searchable base of cases, a Reference Case Archive is of great help, provided that it is built with the contributions of several pathologists and institutes. When many different pathologists supply cases, the problem arises of identifying common guidelines for describing them.
Medical students may also gain knowledge from such a distributed archive, by means of self-training and problem-based learning. To this end, a slightly different view of the same case may be useful (e.g., in the form of exercises, with descriptions and diagnoses initially obscured).
A standard data analysis has been carried out, and a model of the histopathologic case has been developed as shown in Figure 1. The model also takes into account the needs of the pathologists, such as the external references.
Figure 1 - the histopathologic case: overall description
Pathologic images have been divided into two lists: gross anatomy photos and images related to microscopic descriptions. In the latter class, images coming from light microscopes play a major role, but other images have also been considered: electron microscope images and DNA content histograms obtained by cytofluorimetry.
Images should be accompanied by the textual information necessary for their correct interpretation. One relevant item to include in any image description is the file size, since network traffic may discourage users from loading larger files. Other data depend on the image type: generally speaking, images should be accompanied by a description of how they were obtained. In particular, for microscopic images, details about staining and magnification should be supplied in order to properly guide the inspecting pathologist.
Figure 2 summarizes the features that have been considered: common to all images, there are the file size and a short textual description.
Figure 2 - Images appearing in a case, with their features, as a part of the model in Fig. 1
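The image features described above can be sketched as a small data structure. The following Python fragment is only an illustration under stated assumptions: the class and field names (CaseImage, MicroscopicImage, size_bytes, staining, magnification) and the caption format are hypothetical and are not taken from Figure 2 or from HistMaker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaseImage:
    """Features common to all images: file, size and a short description."""
    file_name: str
    size_bytes: int
    description: str


@dataclass
class MicroscopicImage(CaseImage):
    """Microscopic images also record how they were obtained."""
    staining: Optional[str] = None
    magnification: Optional[str] = None


def caption(image):
    """Build a one-line caption that shows the file size before loading."""
    parts = [image.description, f"{image.size_bytes // 1024} KB"]
    if isinstance(image, MicroscopicImage):
        if image.staining:
            parts.append(image.staining)
        if image.magnification:
            parts.append(image.magnification)
    return ", ".join(parts)


# Hypothetical example values.
gross = CaseImage("gross1.gif", 51200, "Resected specimen")
micro = MicroscopicImage("micro1.gif", 204800, "Glandular pattern", "H&E", "x40")
print(caption(gross))
print(caption(micro))
```

Showing the size in the caption reflects the remark that network traffic may discourage users from loading larger files.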
A useful functionality, not included in the model outlined above, is the possibility of linking images to portions of a text. These connections are made by means of free links, which cannot easily be reduced to a rigid description.
This feature, together with image linking, turns the hypermedia histopathologic case from a structured to a semi-structured document model, making it a starting point for cognitive explorations on the Internet.
Visual databases of images [Kayser93] and hypermedia histopathologic cases [Della Mea95] are among the most interesting applications of Telepathology. Such applications are even more relevant in remote or distributed environments, because of the difficulty in gathering cases within a single Institution. Internet offers the ideal services for such archives.
The tool allows the pathologist to:
preceded_by <section name=section_name> .... free hypertext .... </section> followed_by
where preceded_by and followed_by are portions of HTML code containing everything useful to identify the section - e.g., a printable name, horizontal rules, and so on. No default style is currently implemented.
Our technique also allows the same case to be generated in different languages with the same overall structure. Using the same conceptual structure, the author need only change the rules in order to reflect different section headings.
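The rule pattern and its multilingual use can be illustrated with a minimal sketch. In the Python code below, a presentation model is assumed to be a mapping from section names to (preceded_by, followed_by) pairs; this representation, the function render_section, and the two sample models are hypothetical and do not describe HistMaker's internal format.

```python
def render_section(name, body, model):
    """Wrap body in section markup, framed by the model's rule for name.

    A section without a rule gets no decoration, which mirrors the
    absence of a default style.
    """
    preceded_by, followed_by = model.get(name, ("", ""))
    return f"{preceded_by}<section name={name}>{body}</section>{followed_by}"


# Two hypothetical presentation models for the same structure.
ENGLISH = {"diagnosis": ("<H2>Diagnosis</H2>\n", "\n<HR>")}
ITALIAN = {"diagnosis": ("<H2>Diagnosi</H2>\n", "\n<HR>")}

body = "Tubular adenoma."
print(render_section("diagnosis", body, ENGLISH))
print(render_section("diagnosis", body, ITALIAN))
```

Only the headings supplied by the model differ between the two outputs; the section name, and therefore the structure seen by other tools, stays the same.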
The tool also provides an easy way to connect images to a text. The syntax of anchors and URLs is a possible source of problems in HTML authoring, especially when the author is a beginner.
A fundamental feature of the tool is the ability to re-edit generated files, using the semantic markup to identify the sections and associate them with the structure. This mechanism makes it possible to upgrade and maintain files, as well as to generate multilingual versions. Debugging and maintenance may be carried out by simply loading previously generated files and editing text or links. Documents can be upgraded by creating an upgraded presentation model associated with the same structure, loading a file, and saving it after changing the presentation model.
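The load, change model, save cycle can be sketched as follows. This Python fragment is a simplified illustration, not the tool's actual mechanism: it assumes flat (non-nested) sections written as <section name=...>, and a presentation model given as a mapping from section names to (preceded_by, followed_by) pairs. All names are hypothetical.

```python
import re

# Matches one flat section; nested sections would need a real parser.
SECTION_RE = re.compile(
    r"<section name=(\w+)>(.*?)</section>", re.IGNORECASE | re.DOTALL
)


def extract_sections(html_text):
    """Recover (name, body) pairs from a previously generated file."""
    return [(m.group(1), m.group(2)) for m in SECTION_RE.finditer(html_text)]


def regenerate(html_text, model):
    """Re-emit every section with the decoration of a new model.

    The old decoration lies outside the section tags and is dropped;
    the section bodies, including any free links, are kept verbatim.
    """
    parts = []
    for name, body in extract_sections(html_text):
        before, after = model.get(name, ("", ""))
        parts.append(f"{before}<section name={name}>{body}</section>{after}")
    return "\n".join(parts)


# Hypothetical old file and upgraded presentation model.
old_file = "<H2>Diagnosis</H2><section name=diagnosis>Adenoma</section><HR>"
new_model = {"diagnosis": ("<H3>Diagnosi</H3>", "")}
print(regenerate(old_file, new_model))
```

Because the decoration is regenerated from the model, it never needs to be parsed; only the semantic markup has to survive in the saved file.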
Finally, the tool handles special characters, translating them into the corresponding HTML character entities.
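A translation of this kind might look like the sketch below, which uses the entity table from Python's standard library. The function name encode_entities is hypothetical, and the choice to fall back to numeric references for characters without a named entity is an assumption, not a documented behaviour of the tool.

```python
from html.entities import codepoint2name


def encode_entities(text):
    """Replace markup-significant and non-ASCII characters with entities.

    Intended for text typed by the author, not for generated markup.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "&<>" or code > 127:
            name = codepoint2name.get(code)
            # Fall back to a numeric reference when no named entity exists.
            out.append(f"&{name};" if name else f"&#{code};")
        else:
            out.append(ch)
    return "".join(out)


print(encode_entities("età < 5 & più"))
```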
The textual part in the structure of a case may be directly edited within a series of user-fillable fields. In addition, three lists of images may be inserted together with their distinctive features, for clinical history, macroscopic and microscopic descriptions.
Adding a link between a portion of a text and an image is easy: once the image has been selected from a popup menu, the corresponding text is selected with the mouse. Then, by choosing "connect to image" from a menu, the text is automatically marked as linked with a double tag: it becomes underlined and bracketed, indicating that it is already linked to an image. The software then checks any attempt to modify the linked text, and issues a warning message. The author may disconnect an image from the text, if necessary.
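In the generated HTML, such a connection presumably ends up as an ordinary anchor around the selected text. The Python sketch below shows one way this could be produced from a selection range; the function name link_to_image, the character-offset interface and the sample path are assumptions made for illustration only.

```python
import html


def link_to_image(text, start, end, image_url):
    """Wrap text[start:end] in an anchor pointing to image_url."""
    if not 0 <= start < end <= len(text):
        raise ValueError("selection must be a non-empty range inside the text")
    href = html.escape(image_url, quote=True)
    anchor = f'<A HREF="{href}">{text[start:end]}</A>'
    return text[:start] + anchor + text[end:]


# Hypothetical selection of the word "nuclei" (offsets 8 to 14).
sentence = "see the nuclei here"
print(link_to_image(sentence, 8, 14, "img/micro1.gif"))
```

Generating the anchor from a selection spares the author the anchor and URL syntax, which is the difficulty the tool is meant to remove.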
A separate window is dedicated to the parameter settings, the most important of which is surely the structure to be used for generating the hypermedia case.
A limited editor for the structure and presentation model is present in that window, allowing the creation of identical or different structures with different presentation models. The structure may be "compiled" into the corresponding field interface, at the moment with some limitations, because ad hoc solutions have been adopted to make the user input process as easy as possible, in particular when editing image links. This is sufficient for our application, while a more general approach to structured authoring would require more complex solutions.
A particular approach has been adopted to enable the pathologist to set links with networked resources. With the aim of avoiding a direct interaction of the pathologist with the HTML language, we devised a method that takes directly from the user's environment the knowledge about the resources that may be inserted in the external reference field of the cases. This is done by connecting the authoring environment with the tools normally used by the pathologist in his/her visits to the World Wide Web. More specifically, we take advantage of the bookmarks gathered by the pathologist when he/she browses the WWW: the set of bookmarks represents a sketch of the network resources that, at a given time, are useful for the diagnostic, research and teaching interests of the pathologist. Figure 3 shows HistMaker.
Figure 3 - Six snapshots from the user interface of HistMaker, filled with the data corresponding to a case.
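The bookmark-based method for external references can be sketched as follows. The code assumes that the browser stores bookmarks as an HTML file of anchors, which is the case for Netscape-style bookmark files; how HistMaker actually reads them is not specified here, and the names BookmarkParser, read_bookmarks and external_reference, as well as the sample URL, are hypothetical.

```python
import html
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (title, url) pairs from the anchors of a bookmark file."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._href = None
        self._title = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._title = []

    def handle_data(self, data):
        if self._href is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.bookmarks.append(("".join(self._title).strip(), self._href))
            self._href = None


def read_bookmarks(html_text):
    parser = BookmarkParser()
    parser.feed(html_text)
    parser.close()
    return parser.bookmarks


def external_reference(title, url):
    """Build the anchor to place in the external reference field."""
    return f'<A HREF="{html.escape(url, quote=True)}">{html.escape(title)}</A>'


# Hypothetical bookmark file content.
sample = '<DL><DT><A HREF="http://example.org/atlas">Pathology Atlas</A></DL>'
for title, url in read_bookmarks(sample):
    print(external_reference(title, url))
```

The pathologist would then pick a title from the resulting list, and the corresponding anchor would be inserted without any direct contact with HTML syntax.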
The whole environment is not intended as a complete archive management system, but only as a facility for case authoring, leaving the other operations on the WWW site - such as linkage of new cases to the archive, file management, and so on - to the webmanager. This is because the management of a truly complete distributed case archive remains a complex task, better accomplished by a dedicated technician than by an ordinary WWW user.
Figure 4 - The case presented in Figure 3 has been automatically converted into HTML form.
The semantic markup technique is conceptually very simple, but significantly extends the expressive power of HTML documents: it introduces a general-purpose grouping construct that can be used to structure parts of a document as database records, and to automatically generate HTML codes, thereby enabling more effective authoring, browsing and searching of hypertexts.
A model of the histopathologic cases has been presented, whose aim is twofold: on one hand, it acts as a database schema for the hypermedia archive of cases on the WWW, on the other hand it is a knowledge representation tool, used to provide a number of model-based, high-level supports to a homogeneous class of users - i.e., the pathologists.
HistMaker is an environment designed and realised in order to test our approach and evaluate its performance in the domain of telepathology. It enables the user to construct hypertextual documents under his/her complete control; to generate HTML files; and to effectively use the specialised services available on the Internet.
Not all the requested functionalities are currently implemented in the prototype; tests are under way by the pathologists involved in the MULTIPATH project. The realization of PathGallery, an archive of hypermedia reference cases, is also under way.
The proposed approach presents interesting challenges. As an example, we mention the development of a general-purpose, adaptive system that, on the basis of structural descriptions, generates the most adequate user interface for a specified document.
From an application point of view, a more general tool can be developed on the basis of HistMaker; it should be able to deal with generic structures - defined in an appropriate way - to be used for tailoring specific user interfaces and generating HTML documents accordingly.
[Berners-Lee94] Berners-Lee T, Connolly D. Hypertext markup language specification - 2.0. IETF HTML Working Group, RFC1866, 1995.
[Della Mea95] Della Mea V, Puglisi F, Brunato D, Roberto V, Forti S, Dalla Palma P, Beltrami CA. Histopathologic reference cases on Internet: an hypermedia approach for training, reference and education. Proceedings of 9th International Conference on Diagnostic Quantitative Pathology, Heidelberg, Germany, 1995.
[Dobson95] Dobson SA, Burrill VA. Lightweight Databases. Proceedings of the 3rd International World Wide Web Conference, Darmstadt, Germany (1995).
[Fare91] Fare C, Ugolini D. The PDQ (Physician Data Query), the cancer database, in oncological clinical practice. Cancer Treatment Reviews 1991;18(2):137-143.
[Galvin94] Galvin JR, D'Alessandro MP, Erkonen WE, Lacey DL, Santer DM. The Virtual Hospital: A link between academia and practitioners (Letter to the Editor). Acad. Med, 1994;69:130.
[Hardman93] Hardman L, van Rossum G, Bulterman DCA. Structured multimedia authoring. Proceedings of 1st ACM Conference on Multimedia, pp. 283-289, Anaheim, CA, USA, Aug 1-6, 1993.
[Kaindl91] Kaindl H, Snaprud M. Hypertext and structured object representation: a unifying view. Proceedings of ACM Hypertext 91 pp. 345-358, 1991.
[Kayser93] Kayser K. Progress in Telepathology. In Vivo 7(4), pp 331-3, 1993.
[Kesseler95] Kesseler M. A schema based approach to HTML authoring. Proceedings of the 4th International World Wide Web Conference, Boston, MA, USA (1995).
[Quint95] Quint V, Roisin C, Vatton I. A structured authoring environment for the World-Wide Web. Proceedings of the 3rd International World Wide Web Conference, Darmstadt, Germany (1995).
[Quint86] Quint V, Vatton I. Grif: an interactive system for structured document manipulation. Text processing and document manipulation, Proceedings of the International Conference, J. C. van Vliet, ed., pp. 200-213, Cambridge University Press, 1986.
[Varela94] Varela CA, Hayes CC. Zelig: schema-based generation of soft WWW database applications. Proceedings of the 2nd International World Wide Web Conference (1994).
Vito Roberto is Associate Professor at the Computer Science Faculty, University of Udine, Italy. He received the "Laurea" degree in Physics in 1973. Since then, he has been working on computational aspects of signal and image analysis. His current research activity concerns model-based techniques for machine vision and image communication. In particular, he is currently leading research projects in the fields of multi-sensor data fusion and multi-agent systems, in the application domains of industrial inspection and telepathology. Prof. Roberto is the author of several articles, and editor of volumes in the fields of Perceptual Systems, Artificial Intelligence and Pattern Recognition. He is a member of the International Association for Pattern Recognition and the American Association for Artificial Intelligence.
Davide Brunato was born in 1968. He is currently a student of Computer Science at the University of Udine, Italy, and is writing his M.Sc. thesis on semi-structured hypermedia authoring. His research interests are mainly image processing and hypermedia.
Carlo Alberto Beltrami is Full Professor and Head of Pathology at the University of Udine, Italy. He obtained his Medicine degree at the University of Ferrara, Italy, in 1967. He specialised in Clinical Pathology, Oncology and Pathological Anatomy. From 1971 to 1983 he worked as Assistant Professor of Pathology at the Universities of Ferrara and Ancona, Italy. From 1983 to 1985 he was Associate Professor of Pathology at the University of Ancona, Italy. In 1985 he became Full Professor of Pathology; since 1988 he has been at the University of Udine, Italy. Recently he became Head of the University Hospital of Udine, Italy. His research interests are cardiovascular pathology, oncology, telepathology and the applications of image processing in quantitative pathology.