On-the-Fly Translation and Role-Based Authorization For Intranet Document Systems

Nancy Grady, Noel Nachtigal, Mark Elmore, James Kohl, James Rome, Joel Reed, Philip Papadapoulos

Abstract

Introduction

Web-based intranet information systems must meet two main requirements for automated on-line delivery: security and presentation.

In a decentralized web, corporate information can be distributed across many servers to ease traffic demands and to place the information near its owners and maintainers. In a large corporation, the standard methods of access control are manually intensive and cumbersome to maintain. Personnel move frequently within the organization, and the organization itself often changes structure. Authorization based on individual identification therefore requires constant maintenance on every server that restricts access to information.

Presentation of information is also a significant issue. Information is most useful if it is available in many formats, not only in a static presentation on the web, but in exchange formats to be used within other applications. As HTML and graphics standards change, older pages need to be updated to maintain a consistent look and feel over time. Separating structure and style is important to the long term vitality of legacy information.

Oak Ridge National Laboratory (ORNL) has designed the Financial Automated On-Line User System (FaMOUS) with these issues in mind. FaMOUS is a system of interconnected reports which present financial information to managers across the web. The reports present a concise set of information to help managers monitor their project expenditures and the time demands on their staff. Hyperlinks provide easy navigation among the reports for drilling down through the information. The financial system at ORNL is processed once each month, at which point all costs become official. The data for this set of designed reports is then downloaded to a local server, and the reports are created. This distributed set of reports is a presentation layer that insulates the staff from changes in the underlying accounting system and removes the requirement that managers learn a database query language.

This paper will describe the approach taken in the FaMOUS system to address the needs for security and presentation of reports in a restricted intranet environment.

Document Structure

Reports within FaMOUS are designed to provide a presentation layer to financial information. This information can be text, tables, or graphs. The advantage of web-based delivery is that the reports can be interconnected, with related entries cross-referenced between them. While the web is an excellent delivery medium, it is not sufficient alone. Managers need to be able to print reports to carry with them or archive in notebooks, and they need to be able to retrieve the data into other programs, such as spreadsheets, for further analysis. It is impractical to save each report in a large number of presentation formats. The clear solution is to mark the structure of reports and perform the translation to a presentation format when requested. SGML, as the standard for text markup, is the obvious medium to achieve this. By designing a document type definition (DTD) with the elements needed for our reports, we can store each report once. This DTD, given in FaMOUS.DTD, documents the allowed structure of the reports. Translation to a web presentation format, for example, then consists of converting a report from this DTD into one appropriate for the web, such as the HTML 3.2 DTD.

A number of decisions must be made in the design of a DTD in terms of the specificity of each data element. The crucial decision concerns the need for retrieval of elements. If data within the reports must be indexed, each element needs to be addressed by content. In a table, for example, a cell could be labeled by its content, such as AccountName, or it could be labeled simply as a cell. Since the FaMOUS reports are a presentation layer, not a duplication of the functionality of the underlying database, we felt that a streamlined design would label only the simple structure of the reports, such as table rows and cells. In this way all reports can be rendered from a single structure-oriented DTD. This greatly simplifies the maintenance of the automated system as new reports are created, since items are tagged not for content but only for structure. Content information is provided in tables by the labels in a header row.

The second step in the creation of the FaMOUS DTD was to describe the hyperlink structure between reports. Each report is given a unique identifier. This consists of the report name, year, month, organization, optional program name, and location within the page. This unique name, much in the vein of the earlier URI work, uniquely identifies the document. Since a translation program, described below, renders the report, this identifier can be translated to a location on disk for the SGML file during rendering. Links are thus represented as

<Link urla="report.year.month.organization#location">ReportText</Link>

This represents a unique reference within the set of reports. Separation from the disk structure also provides flexibility for future restructuring.
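
The translation of such an identifier into a disk location can be sketched as follows. This is a minimal Python illustration, not the FaMOUS code: the function name resolve_report, the directory layout, and the .sgml file extension are all assumptions, and the optional program name is omitted for brevity.

```python
from pathlib import PurePosixPath


def resolve_report(urla, root="/reports"):
    """Split 'report.year.month.organization#location' and build a file path.

    The layout root/year/month/organization/report.sgml is hypothetical;
    the paper does not describe the actual disk structure.
    """
    ident, _, location = urla.partition("#")
    parts = ident.split(".")
    if len(parts) != 4:
        raise ValueError("expected report.year.month.organization, got %r" % ident)
    report, year, month, organization = parts
    path = PurePosixPath(root) / year / month / organization / (report + ".sgml")
    # The location within the page is optional.
    return str(path), (location or None)
```

Because only this one function knows the layout, the files could be reorganized on disk without touching any link stored in a report.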

The FaMOUS DTD provides a simple structural markup that can handle reports which are a mixture of tables, text, and graphics. Reports created by the database report writer are checked for consistency using the NSGMLS parser by James Clark.

Access Security

The next consideration is the protection of the information contained in the reports as they are delivered across the internet. Security can be considered in three steps: authentication, authorization, and encryption. Authentication is typically handled on the server using the standard username-password combination, with security coming through encryption between browser and server. This requires each server to maintain a list of passwords for all users. Authorization is likewise handled on the server, where each directory has an associated access group and membership is assigned through a standard group file. This scheme is difficult to maintain for a set of distributed servers. Distributing passwords opens a potential vulnerability, since many copies exist on many servers. As personnel move within the organization, it becomes a maintenance nightmare to update all the server password lists. FaMOUS addresses these concerns for distributed authentication and authorization.

Encryption

Apache servers are used to provide secure transmission between server and browser. Certificates are issued for each server to be accepted by the user.

Authentication

To avoid the maintenance and vulnerability of password lists on each server, the authentication subroutine in the Apache code was altered. Instead of comparing the username and password to the server's password list, the pair is encrypted and sent to a central server along with the request for a specific web page. The web page on the central server is restricted to current staff members. If the central page is returned successfully, then the username-password pair is accepted on the local server. This allows the distributed servers to use the central server's authentication process rather than having to duplicate it.
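
The delegation logic can be sketched in a few lines of Python. This is an illustration only: the actual change was made inside the Apache authentication routine, and the paper does not say how the credentials are packaged. The use of an HTTP Basic header, the names basic_auth_header, authenticate, and fetch_status, and the rule that only status 200 counts as success are assumptions of this sketch.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header value (assumed packaging)."""
    raw = (username + ":" + password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def authenticate(username, password, fetch_status):
    """Accept the pair only if the central staff-only page is returned.

    fetch_status is a caller-supplied function (hypothetical) that requests
    the restricted page on the central server over an encrypted channel,
    passing the given header, and returns the HTTP status code.
    """
    status = fetch_status(basic_auth_header(username, password))
    return status == 200
```

The local server stores no passwords at all; when a staff member leaves, removing them from the central list is enough to lock them out of every participating server.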

Role-based Authorization

Once the user is authenticated, they must also be authorized for the class of information they are allowed to access. An authorization scheme was designed that is based both upon a person's role within an organization and upon their individual identity. NIST has developed a role-based authorization scheme, where access is granted for functional classes. For financial reports, however, functional classes turned out to be too broad; instead, access is hierarchical, based on the organizational chart. A new access system was designed that places individuals within the hierarchy and then assigns authorization based upon their position. Authorization depends not only on the depth within the chart, but also upon the branch. Authorization can be granted or denied to a box in the chart, regardless of who currently occupies that position, or to a given individual, regardless of their position in the organization. In addition, authorization can also be granted or denied by a box in the chart to its supervisors or to its subordinates. Authorization by individual automatically follows the person when they change positions within the organization. Authorization by position affects whoever is identified as the occupant of the box.

The hierarchical tree is stored as a linked list where each position has a unique supervisor. This is important for future maintenance, since a reorganization would require only the identification of a new supervisor, and the entire branch of the organizational tree would then be completely moved with just this one change. To populate or update the tree, a CGI-based tool was designed that allows a privileged account (authorization administrator) to perform the necessary maintenance functions, as shown in Figure 1. New hierarchies can be created, positions can be added to or deleted from a hierarchy, new supervisors can be specified for a position, and the occupants of the positions can be updated. For example, ORNL is a matrix organization that has not only line management, but also program management. Thus, a hierarchy for program managers would consist of the creation of a second tree appropriate to the program. If external organizations need to be authorized, such as funding agencies, then additional trees would be constructed for them. The system is flexible enough to allow a person to occupy several positions, whether in the same hierarchy or in several hierarchies.
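
The supervisor-link representation can be sketched as below. This is a minimal Python illustration under stated assumptions: the class name OrgChart, its method names, and the in-memory dictionaries are hypothetical, and the paper does not describe the actual storage format or the CGI tool's interface.

```python
class OrgChart:
    """Positions linked to a single supervisor; occupants stored separately."""

    def __init__(self):
        self.supervisor = {}  # position -> supervising position (None at the root)
        self.occupants = {}   # position -> set of person identifiers

    def add_position(self, position, supervisor=None, occupants=()):
        self.supervisor[position] = supervisor
        self.occupants[position] = set(occupants)

    def set_supervisor(self, position, new_supervisor):
        # Changing this one link relocates the whole branch below `position`.
        self.supervisor[position] = new_supervisor

    def chain_of_command(self, position):
        """Return the supervisors of `position`, nearest first."""
        chain = []
        current = self.supervisor[position]
        while current is not None:
            chain.append(current)
            current = self.supervisor[current]
        return chain

    def positions_of(self, person):
        """A person may occupy several positions."""
        return sorted(p for p, who in self.occupants.items() if person in who)
```

Keeping occupants separate from positions is what lets an authorization attach either to the box or to the person. A second hierarchy, such as one for program management, would simply be another OrgChart instance.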

Once the organizational chart is in place, an authorization tag is associated with each report, describing the allowed access to that report in terms of positions in the hierarchy and their supervisors and subordinates, or specific individuals. If several reports have the same logical access groups (for example, sick leave reports might always be accessible to the individual concerned and their supervisory chain), then generic authorization tag templates can be used to encapsulate the information. These tag templates use the regular expression language (REGEX) to extract information from the report names and build the corresponding authorization tags on the fly. Such constructs turn out to be extremely powerful. For example, the template describing the set of positions allowed access to the sick leave reports described above would have only one line; it would instantiate only one authorization tag when matched against a particular report, and thus would require only one verification for access. Currently the actual list of regular expressions is constructed manually.
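
The sick leave example can be sketched as a one-line template. This is a Python illustration, not the FaMOUS syntax: the report naming pattern, the tag format "self-and-supervisors:<person>", the function names, and the simplification of the hierarchy to a person-to-supervisor map are all assumptions made here.

```python
import re

# One template line: a pattern on the report identifier and a tag to instantiate.
# Both the identifier layout and the tag syntax are invented for this sketch.
TEMPLATES = [
    (re.compile(r"^sickleave\.\d{4}\.\d{2}\.(?P<person>\w+)$"),
     "self-and-supervisors:{person}"),
]


def authorization_tag(report_id):
    """Instantiate the tag of the first template that matches, else None."""
    for pattern, tag_template in TEMPLATES:
        match = pattern.match(report_id)
        if match:
            return tag_template.format(**match.groupdict())
    return None


def is_authorized(user, tag, supervisor_of):
    """Verify a 'self-and-supervisors' tag against a person -> supervisor map."""
    if tag is None:
        return False
    kind, _, person = tag.partition(":")
    if kind != "self-and-supervisors":
        return False
    current = person
    while current is not None:  # walk up the supervisory chain
        if current == user:
            return True
        current = supervisor_of.get(current)
    return False
```

A single template covers every sick leave report for every month and every employee, which is why no per-report access list needs to be maintained.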

FaMOUS provides for encryption using Apache servers, distributed authentication through a modification of the servers, and authorization through a new system which matches an organizational chart to a REGEX description of report identifiers.

Textual Report Translation

The reports within the FaMOUS system are translated on the fly, which imposes significant performance constraints. While a number of general translation systems are available, such as the SGML-tools suite (formerly known as linuxdoc-sgml) and Perl programming tools such as SGMLSPL, the limited structural variety within this system and the performance criteria indicated that a simple lexical translation would be best. Lex is a lexical analyzer generator designed for building compilers, and it is well suited to document conversion.

In the lexical program, the reports are processed as a sequence of tokens, and actions are specified for each token found. A lex program was constructed which parses the SGML report into three types of elements: tags (with attributes), endtags, and text. The list of tags and attributes is taken from a table built from the specifications of the FaMOUS DTD. This simple table consists of a list of tag elements and attributes, and the actions to be taken for each entry.
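
The three-way split into tags, endtags, and text can be illustrated with a small regular-expression tokenizer. This is a Python sketch standing in for the lex rules, which the paper does not list; the patterns and the function name tokenize are assumptions, and SGML features such as comments, entities, and markup minimization are ignored.

```python
import re

# One alternative per token type: end tag, start tag (with attributes), text.
TOKEN = re.compile(
    r"</(?P<end>[A-Za-z][\w.-]*)\s*>"
    r"|<(?P<start>[A-Za-z][\w.-]*)(?P<attrs>[^>]*)>"
    r"|(?P<text>[^<]+)"
)
# Attribute values may be quoted or bare, as in type=cgi_graph.
ATTR = re.compile(r'([A-Za-z][\w.-]*)\s*=\s*(?:"([^"]*)"|([^\s>]+))')


def tokenize(sgml):
    """Return a list of ('tag', name, attrs), ('endtag', name), ('text', data)."""
    tokens = []
    for m in TOKEN.finditer(sgml):
        if m.group("end"):
            tokens.append(("endtag", m.group("end")))
        elif m.group("start"):
            attrs = {name: quoted or bare
                     for name, quoted, bare in ATTR.findall(m.group("attrs"))}
            tokens.append(("tag", m.group("start"), attrs))
        else:
            tokens.append(("text", m.group("text")))
    return tokens
```

A single pass over the input with no tree construction is what keeps this style of translation fast enough to run on every request.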

Table 1. Translation control for elements
Type	Attributes	MixedWithText	OpenEnded	Name	StartTag	TagText	EndTag
ELEMENT	1 or 0	1 or 0	1 or 0	TagName	NULL	echo	action
ELEMENT	0	0	0	Title	<center><h1>	echo	</h1></center>

For each element, a processing action is specified for its StartTag, its contents, and its EndTag. Table 1 shows the general form of an entry and a sample entry for a tag element. Each entry contains either an action or a substitute string to be used. An "echo" entry specifies that the element should be output as is. A NULL indicates that the element is to be ignored. An "action" entry indicates that a function should be called to specify the processing for that element. The hierarchy of tags is maintained, so the actions can depend on the tag ancestry. For ease of processing, flags indicate whether the tag contains attributes, whether it is mixed with the text of other tags, and whether it can be open-ended, i.e., lack a closing tag. If the SGML is a transliteration of the presentation markup, that translation can be specified in the table and will be used directly. For example, in Table 1 the report Title element is translated to HTML as centered H1 text, with the text contents of the Title StartTag-EndTag pair output verbatim. For attributes the table entries are simpler, as shown in Table 2.

Table 2. Translation control for attributes
Type	Name	Action
ATTRIBUTE	AttributeName	save

The save action indicates that the list of attributes is to be saved in a structure for later processing. This translation table is sufficient to control the compilation of the SGML report into other formats. More significant actions are given as function calls within the lex program itself. The lex program is constructed in blocks which separate the user variable declarations and actions from the document structure handling functionality.
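
How such a table can drive the translation is sketched below in Python. The Title row follows Table 1; everything else is an assumption of this sketch, including the Link and Note rows, the /cgi-bin/report URL, the function names, and the token format, which is the (kind, name, attributes) tuple form used here for illustration. Open-ended tags and the attribute flags are not handled.

```python
from html import escape

ECHO = object()  # output the token unchanged
NULL = None      # ignore the token


def link_start(original, attrs):
    # An "action" entry: called with the original markup and the saved attributes.
    # The URL layout is hypothetical.
    return '<a href="/cgi-bin/report?id=%s">' % escape(attrs.get("urla", ""))


TABLE = {
    # Name: (StartTag, TagText, EndTag)
    "Title": ("<center><h1>", ECHO, "</h1></center>"),
    "Link": (link_start, ECHO, "</a>"),
    "Note": (NULL, NULL, NULL),
}


def render(entry, original, attrs):
    if entry is NULL:
        return ""
    if entry is ECHO:
        return original
    if callable(entry):
        return entry(original, attrs)
    return entry  # plain substitute string


def translate(tokens, table=TABLE):
    out, stack = [], []  # the stack records the tag ancestry
    for token in tokens:
        if token[0] == "tag":
            _, name, attrs = token
            stack.append(name)
            entry = table.get(name, (ECHO, ECHO, ECHO))[0]
            out.append(render(entry, "<%s>" % name, attrs))
        elif token[0] == "endtag":
            name = token[1]
            entry = table.get(name, (ECHO, ECHO, ECHO))[2]
            out.append(render(entry, "</%s>" % name, {}))
            if stack:
                stack.pop()
        else:
            entry = table.get(stack[-1], (ECHO, ECHO, ECHO))[1] if stack else ECHO
            out.append(render(entry, token[1], {}))
    return "".join(out)
```

Supporting another output format, such as comma-separated values, then amounts to supplying a different table rather than a different program.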

As a final step in translation to HTML, the program adds a dynamic HTML header and footer that are not contained in the reports themselves. This allows links to help files, project pages, and bug report scripts to be inserted for each report. Because this information is generated at translation time, it stays current and does not expire as documents age.

Subsequent to the start of this project, a lexical analyzer for HTML and SGML has also been constructed by Dan Connolly. The program described here is customizable to a user-defined DTD, as described in the translation table, and supports include statements within the SGML documents. Currently supported translations are SGML, HTML, comma-separated values, and LaTeX. Current plans are to take the LaTeX translation to a PDF file through use of the Adobe Distiller program.

Graphical Report Translation

Equally important to a textual representation of financial information is the graphical representation. Tables within the FaMOUS reports are saved into individual data files. Within the text-based reports, the graphical data files are linked by their unique identifier:

<Link type=cgi_graph urla="report.year.month.type">
     <Img type=cgi_tnail urla="report.year.month.type">Type</Link>

This tag sequence is translated by the lex program into a call to a graphing program that dynamically creates an in-line GIF thumbnail of the actual graph. An example of such in-line thumbnails is shown in Figure 2. These thumbnails are treated as icons which are linked to the graphing program to render the full image.
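
The HTML that such a translation might emit can be sketched as follows. This is a Python illustration only: the CGI path /cgi-bin/graph, the id and size query parameters, and the function name graph_link_html are assumptions, since the paper does not give the graphing program's actual interface.

```python
from html import escape
from urllib.parse import urlencode

GRAPH_CGI = "/cgi-bin/graph"  # hypothetical location of the graphing program


def graph_link_html(urla, label):
    """Emit a thumbnail image that links to the full-size graph."""
    full_url = GRAPH_CGI + "?" + urlencode({"id": urla})
    thumb_url = GRAPH_CGI + "?" + urlencode({"id": urla, "size": "thumbnail"})
    return '<a href="{0}"><img src="{1}" alt="{2}">{2}</a>'.format(
        escape(full_url), escape(thumb_url), escape(label))
```

Both URLs point at the same data file identifier, so the thumbnail and the full graph are always drawn from the same data.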

When called, the graphing program constructs a large GIF from the data file and allows it to be redrawn in differing chart formats such as pie, bar, and line; it can also construct a PostScript file for printing. It also provides a number of customization features, such as graph height, width, and the number of data columns to graph (with offset), as shown in Figure 3.

Conclusions and Future Directions

The FaMOUS system was designed to provide automated report delivery on the web, combining integrated text and graphics, controlled through a flexible authorization and distributed authentication scheme. This project has demonstrated that a flexible authorization scheme can be used to assign access according to organizational structure, and that authentication can be delegated to a central system. It has further shown that a system of reports can be translated on the fly into HTML, including text and the actual graphs as thumbnail images. The same translation can easily be adapted to other formats, and has been extended to comma-separated values for the tables and text, and to LaTeX. This system is currently running at ORNL, and has pioneered a new paradigm for intranet information delivery.

Future enhancements should include an automated toolkit for constructing the REGEX report identifiers that are matched to the authorization structure. The current report translation to Adobe PDF is provided only through LaTeX, and direct translation should be considered. The new style sheet specification CSS should be incorporated into the HTML translation for greater control over the web presentation of the reports. Finally, the FaMOUS DTD should be reevaluated in light of the recent work on XML.

Acknowledgments

The authors (Grady and Elmore) wish to thank James D. Mason of ORNL, convenor of the WG8 working group on Document Description and Processing Languages, for several fruitful discussions on the design of DTDs.

References

Adobe Distiller converts PostScript to the Portable Document Format (PDF). Further information can be obtained at http://www.adobe.com/