WWW94: converting formatted documents to HTML
presented by jon stephenson von tetzchner, norwegian telecom research
the author presented a program called "fm2html" that converts documents
from FrameMaker Interchange Format (MIF) into HTML.
a MIF document can roughly be divided into four sections:
- specification of document structures (style sheets, etc.)
- definition of tables and frames
- page layout information
HTML documents, by contrast, contain very little information about the layout
of the document, because it is up to the client software to decide how the
data is presented.
the conversion process
the conversion process is broken up into the following stages:
- convert the FrameMaker file into a FrameMaker MIF file using fmbatch
- convert the MIF file into an HTML file and, at the same time,
extract figures and convert them into GIF files. a table of contents
is also generated automatically.
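the notes do not say how fm2html builds its table of contents. as a minimal sketch, assuming the converter has already collected the headings as (level, title) pairs, a contents list with matching anchors could be produced like this. the function names and the flat (non-nested) list are assumptions for illustration, not part of fm2html.

```python
import re
from html import escape


def slugify(title, seen):
    """make a unique anchor name from a heading title (hypothetical helper)."""
    base = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "section"
    slug, n = base, 2
    while slug in seen:
        # a repeated title gets a numeric suffix so anchors stay unique
        slug = f"{base}-{n}"
        n += 1
    seen.add(slug)
    return slug


def build_toc(headings):
    """headings: list of (level, title) tuples in document order.

    returns (toc_html, anchored_headings): the contents list and the
    heading lines carrying the anchors that the list points at.
    """
    seen = set()
    toc_lines = ["<UL>"]
    anchored = []
    for level, title in headings:
        name = slugify(title, seen)
        text = escape(title)
        toc_lines.append(f'<LI><A HREF="#{name}">{text}</A>')
        anchored.append(f'<H{level}><A NAME="{name}">{text}</A></H{level}>')
    toc_lines.append("</UL>")
    return "\n".join(toc_lines), anchored
```

calling build_toc([(1, "Introduction"), (2, "The conversion process")]) gives one list entry per heading, each linking to an anchor of the same name inside the heading itself.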
the conversion of plain text is quite straightforward. one of the problems
is that HTML does not support tabs. tabs are therefore removed, except
where a paragraph is mapped to the HTML "preformatted" construct.
hypertext links are also converted, except those that are page based: since
HTML is not page oriented, these links are ignored.
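the two rules above (tabs kept only in preformatted paragraphs, page-based links dropped) can be sketched as follows. this is an illustration, not fm2html's code: the link dictionaries and their "kind" key are hypothetical, and replacing each tab with a single space (rather than deleting it outright) is an assumption made so that adjacent words do not run together.

```python
from html import escape


def convert_paragraph(text, preformatted=False):
    """return html for one paragraph of plain text."""
    if preformatted:
        # inside <PRE> whitespace is shown as-is, so tabs can stay
        return "<PRE>" + escape(text) + "</PRE>"
    # elsewhere tabs have no meaning in html; assumption: turn each
    # one into a single space instead of deleting it
    return "<P>" + escape(text.replace("\t", " ")) + "</P>"


def convertible_links(links):
    """keep only links that do not point at a page.

    links: list of dicts with a hypothetical "kind" key, where
    "page" marks a page-based link.
    """
    return [link for link in links if link["kind"] != "page"]
```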
the process of converting figures has seven stages:
- extract each figure from the original document into a separate MIF file
- convert the figure to PostScript format
- convert the figure to PPM format
- remove excess space
- add a border
- convert to GIF
- include figure into resulting HTML document
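the notes do not name the tools used for each stage. as a rough sketch of the two image-cleanup stages only (remove excess space, add a border), the following works on an image held as rows of (r, g, b) tuples. it assumes that "excess space" means a surrounding margin in the background colour and that the border is a solid line of fixed width; both are assumptions, and all names are illustrative.

```python
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)


def crop(pixels, background=WHITE):
    """drop the all-background rows and columns around the figure."""
    rows = [y for y, row in enumerate(pixels)
            if any(p != background for p in row)]
    if not rows:
        return []  # the image holds nothing but background
    cols = [x for x in range(len(pixels[0]))
            if any(row[x] != background for row in pixels)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [list(row[left:right + 1]) for row in pixels[top:bottom + 1]]


def add_border(pixels, width=1, colour=BLACK):
    """surround the image with a solid border of the given width."""
    if not pixels:
        return []
    new_width = len(pixels[0]) + 2 * width
    top = [[colour] * new_width for _ in range(width)]
    bottom = [[colour] * new_width for _ in range(width)]
    body = [[colour] * width + list(row) + [colour] * width
            for row in pixels]
    return top + body + bottom
```

applied in order, add_border(crop(image)) trims the margin left over from the page-sized PostScript rendering and then frames what remains, ready for the GIF conversion stage.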
tables are converted using the HTML "preformatted" tag.
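one way to render a table with the "preformatted" tag is to pad every cell to the width of its column and let the fixed-width font do the alignment. the sketch below assumes the table has already been read into rows of cell strings. the function name and the two-space column gap are choices made for illustration, not details taken from fm2html.

```python
from html import escape


def table_to_pre(rows, gap="  "):
    """rows: list of rows, each a list of cell strings."""
    ncols = max(len(row) for row in rows)
    # pad short rows so every row has the same number of cells
    padded = [list(row) + [""] * (ncols - len(row)) for row in rows]
    widths = [max(len(row[c]) for row in padded) for c in range(ncols)]
    lines = []
    for row in padded:
        line = gap.join(cell.ljust(w) for cell, w in zip(row, widths))
        # escape after padding so alignment follows the visible text
        lines.append(escape(line.rstrip()))
    return "<PRE>\n" + "\n".join(lines) + "\n</PRE>"
```

the price of this approach is that cell borders, spanning cells and fonts are lost.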
in the future, the proposed HTML+ format would allow more accurate automated
conversion of formatted documents.
i have no link information for this paper on the web.
13-jun-94 (ra)