WWW5 tutorial: collecting and serving information
introduction:
this tutorial covered exactly the problems we are currently facing at the
ETH in general and the "informatikdienste" in particular: the pros and
cons of having multiple independent WWW servers versus one centralized WWW
server.
of course, there is no way to centralize ALL HTML documents currently
available at the ETH in ONE PLACE, and it would not make any sense to try
to do so. nevertheless, there are now discussions about subjects addressed
by this tutorial, such as:
- uniform style over all documents
- ease of navigation within the organization's information
- searching and indexing
- consistent and persistent URLs
- valid HTML syntax
different models of WWW server structures:
there are various ways to structure one or more WWW servers within one
organization. the two major approaches are "distributed structures"
and "centralized structures". the tutorial distinguished the following
five structures, ranging from "anarchy" to "dictatorship", with the first
two structures considered "distributed":
- multiple independent servers
- multiple coordinated servers
- generally accessible central server
- subscription server
- controlled centralized server
three of the above models (the first, the third and the fourth in the list)
were discussed in greater detail, and their advantages and disadvantages were
compared from the user's, the information provider's and a global point of
view. at the ETH, we basically have dozens of more or less independent servers
(there is some minimal coordination, particularly in the addressing scheme of
the servers), but at the same time, we run some generally accessible central
servers (APACHE, ezInfo, WAWONA) that are used by many different organizational
units which do not run their own WWW server for various reasons.
multiple independent servers:
many organizational units inside the organization run their own WWW server.
at the ETH, these organizational units are departments, institutes, working
groups etc. these organizational units operate their server on their own, but
can get support from the "informatikdienste" if they want to.
advantages:
- the information providers can choose their preferred platform
(hardware, operating system, HTTP daemon, etc.) and operate the server the way
they like it best.
- users benefit from multiple servers because no one server is a single
point of failure, and they may prefer the shorter URLs.
- in general, the organizational units do not depend on a central service.
disadvantages:
- because the information providers run their own server, they have to
act as WebMasters (taking care of the server software, etc).
- information may be duplicated due to the lack of coordination.
- the structure of the URLs may be inconsistent within the organization.
- URLs may be less persistent as changes may occur more frequently.
- searching over all Web servers of the organization is much more difficult.
- it may be more difficult to apply a uniform style throughout the
organization's Web pages.
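the point about inconsistent URL structures can be made concrete with a small check. the following is only a minimal sketch: the naming scheme (hosts of the form www.<unit>.example.org, lower-case paths) is an assumption made up for illustration and is not the addressing scheme used at the ETH, and the function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# hypothetical organization-wide scheme, invented for this sketch:
# every server is called www.<unit>.example.org and paths are lower-case
HOST_PATTERN = re.compile(r"^www\.[a-z0-9-]+\.example\.org$")


def check_url(url):
    """return a list of problems found in one URL (empty list = consistent)."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme != "http":
        problems.append("unexpected scheme")
    # urlsplit() already lower-cases the host name
    host = parts.hostname or ""
    if not HOST_PATTERN.match(host):
        problems.append("host does not follow www.<unit>.example.org")
    if parts.path != parts.path.lower():
        problems.append("path contains upper-case letters")
    return problems


def report(urls):
    """map each inconsistent URL to its list of problems."""
    result = {}
    for url in urls:
        problems = check_url(url)
        if problems:
            result[url] = problems
    return result
```

run over a list of URLs collected from all servers, such a report would show where the independent servers have drifted away from whatever scheme the organization agrees on.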
generally accessible central server:
the information providers use a central server to create and maintain their
own documents. at the ETH, the "informatikdienste" operate three
central servers that are used by various organizational units to provide their
Web pages. once the so-called Web moderator has received a username and a
password to access the server, she or he creates and maintains her/his documents
on her/his own responsibility. most providers create the documents on their own
workstation or PC and transfer the documents to the server when they are ready
to be published. others may create and maintain the documents directly on the
server. it is also possible to use NFS or AFS so that the provider can maintain
the documents on her/his private workstation while the files still reside on the
server. another convenient way would be the use of an authoring tool that
supports remote save.
advantages:
- the provider can benefit from a central server because she or he does
not have to deal with the server software itself. also data backup and all other
management tasks will be covered by a central service. in addition, the central
server may provide other services, such as CGI scripts, HTML syntax and link
checking, generation of statistical information and so on.
- the user may benefit from a centralized search index and more consistent
presentation of the information.
- in general, system management efforts may be reduced if only one server,
or at least only a small number of servers, has to be maintained.
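as an illustration of the link checking service mentioned among the advantages above, here is a minimal sketch using only the Python standard library. it is an assumption about how such a check could look, not a description of the tools running on the ETH servers; the names LinkCollector, extract_links and broken_local_links are hypothetical, and only local links are checked against a given set of known file names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """collect the href values of all <a> tags in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


def broken_local_links(html_text, known_paths):
    """return the local links whose target is not among known_paths."""
    broken = []
    for link in extract_links(html_text):
        # external, mail and in-page links are not checked in this sketch
        if "://" in link or link.startswith("mailto:") or link.startswith("#"):
            continue
        target = link.split("#", 1)[0]  # drop a trailing fragment
        if target not in known_paths:
            broken.append(link)
    return broken
```

a central server could run a check of this kind whenever a provider transfers new documents, and mail the list of broken links back to the Web moderator.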
disadvantages:
- information providers must rely on a central service and may have to
follow some policies.
- a single server represents a single point of failure. if the server is
down, none of the organization's information will be available on the Web.
subscription server:
a subscription server is a centralized server which does not give
information providers direct access to their Web pages. instead, it provides
means to transfer data to the server, usually via HTML forms. documents may not
only be transferred from the provider's system to the central server, but they
may also be checked, converted and registered in a database. currently, we do
not operate a subscription server at the ETH.
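the check-and-register step described above could be sketched as follows. this is only an illustration under stated assumptions: the table layout, the list of accepted formats and the status values are invented for the example, and an in-memory SQLite database stands in for whatever registration database a real subscription server would use.

```python
import sqlite3

# hypothetical list of formats the server team is willing to convert
ACCEPTED_FORMATS = {"html", "postscript", "word"}


def open_registry():
    """create an in-memory registration database (for illustration only)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "author TEXT NOT NULL, "
        "source_format TEXT NOT NULL, "
        "status TEXT NOT NULL)"
    )
    return db


def submit(db, title, author, source_format):
    """check a submission and register it; return the new document id."""
    if not title.strip() or not author.strip():
        raise ValueError("title and author are required")
    if source_format not in ACCEPTED_FORMATS:
        raise ValueError("format not accepted: " + source_format)
    # HTML can be published as it is, everything else must be converted first
    status = "published" if source_format == "html" else "awaiting conversion"
    cursor = db.execute(
        "INSERT INTO documents (title, author, source_format, status) "
        "VALUES (?, ?, ?, ?)",
        (title, author, source_format, status),
    )
    db.commit()
    return cursor.lastrowid


def documents_by(db, author):
    """list (title, status) of all documents registered for one author."""
    rows = db.execute(
        "SELECT title, status FROM documents WHERE author = ? ORDER BY id",
        (author,),
    )
    return rows.fetchall()
```

in this sketch the provider never touches the published files; a submission only creates a database entry, which is where the delay mentioned under the disadvantages comes from.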
advantages:
- information providers may submit their documents in their preferred
format, for example as a PostScript file, a Word document or even as a
printed paper. the team operating the central server will convert the documents
to the appropriate format, e.g. HTML, PDF etc.
- because all documents are processed in some way, they may all have a unified
layout, the links will be more consistent and persistent, and users may get
valuable information about the documents and the authors from the
registration database.
- searching and indexing are much easier.
disadvantages:
- information providers have no direct access to their Web pages and
updates may take longer because they have to be processed.
- if documents are stored in different formats, they need a lot of disk
space. if conversion is done on the fly, access to documents may be slow.
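the advantage that searching and indexing are easier on a central server can be illustrated with a very small inverted index. this is a sketch under simplifying assumptions (all documents are already available as plain text in one place, words are split on anything that is not a letter or digit); the function names are hypothetical and no particular search engine is implied.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(documents):
    """documents: dict mapping URL -> plain text. returns word -> set of URLs."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in WORD.findall(text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """return the URLs that contain every word of the query, sorted."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)
```

with independent servers, the dictionary passed to build_index would first have to be collected from every server in the organization, which is the hard part; on a central server it is simply the local document tree.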
conclusion:
all of the above mentioned structures have their advantages and
disadvantages. in an organization such as the ETH, a mixture of distributed and
centralized servers is probably the only practical solution.
in my opinion,
centralized servers are good for:
- organizational units that cannot or do not want to operate their own
server, either because they lack people with the required knowledge, because
they do not want to spend time managing a server or because they prefer to rely
on a system that is managed by professionals.
- important documents that shall be presented in a consistent style, such as
the homepage of the organization itself and homepages of important and highly
visible organizational units, such as public relations.
- documents that do not change very often and whose URLs shall be short,
simple and persistent.
on the other hand, pages that change very often shall be located on the
organizational unit's server. this is particularly true if the information is
dynamically created either from a database or by a CGI script.
information shall never be duplicated; it should always be
available from its source!
the slides of this presentation are available
on the Web.
WWW5 tutorial 11 /
22-may-96 (ra) /
reto ambühler
!!! This document is stored in the
ETH Web archive and is no longer maintained !!!