Web Content Adaptation and Transcoding based on CC/PP and Semantic Templates

Minsu Jang
Electronics & Telecommunications Research Institute
Internet Computing Dept., ETRI, Yuseong P.O.Box 106
Daejeon-si, 305-350, South Korea
+82-42-860-1250
minsu@etri.re.kr
Jaehong Kim
Electronics & Telecommunications Research Institute
Internet Computing Dept., ETRI, Yuseong P.O.Box 106
Daejeon-si, 305-350, South Korea
+82-42-860-1783
jhkim504@etri.re.kr
Joo-Chan Sohn
Electronics & Telecommunications Research Institute
Internet Computing Dept., ETRI, Yuseong P.O.Box 106
Daejeon-si, 305-350, South Korea
+82-42-860-5660
jcsohn@etri.re.kr

ABSTRACT

We present a web content adaptation and transcoding system built on W3C's CC/PP and device independence guidelines. We describe the system in terms of seven feature elements of web content adaptation and transcoding that we identified while developing it. We also introduce the idea of semantic web templates for higher-quality transcoding. We conclude that semantic web templates make it possible to create and organize device independent web content efficiently and systematically.

Keywords

CC/PP, Device Independence, Transcoding, Adaptation, Semantic Annotation, Semantic Web Templates

1. INTRODUCTION

As the post-PC age approaches, device independent web access is becoming a practical requirement. Recognizing this, W3C has set up working groups devoted to device independence, delivery context representation, web accessibility, and related topics. As of this writing, some preliminary results from these working groups have appeared. We consider a content transcoding system an essential element of device independence, because a device independent web content service needs some way to adapt original content to any given delivery context.

We developed a web content adaptation and transcoding system called CATS (Content Adaptation and Transcoding System), which is extensible and highly configurable and is based on internet technologies such as CC/PP, UAProf, RDF, XML, and XSL [1]. CATS is composed of two major parts: an annotation system and a transcoding system. The annotation system makes it possible to add semantic annotations to any web content; the transcoding system in turn uses these annotations to adapt the original content. The transcoding system contains a collection of transcoding function primitives and a decision subsystem. Based on the given delivery context information, the decision subsystem determines the sequence of transcoding primitives needed to properly transcode the adapted content. Because the two parts address different concerns, transcoding results were of higher quality and the system architecture was more flexible and extensible.

2. SYSTEM FEATURES

While developing CATS, we identified seven primary feature elements of content adaptation and transcoding: delivery context representation, delivery context transportation, delivery context interpretation, content generation, content adaptation, delivery context aggregation, and content transcoding. We describe the features of CATS according to these elements.

2.1 Delivery Context Representation

CATS represents delivery context based on W3C's CC/PP [2] and WAP Forum's UAProf [3]. Most of the property vocabulary needed to represent device capabilities is derived from UAProf, but we defined a few additional properties, such as Doctypes and DeviceType, for more accurate transcoding. To support current devices that do not send CC/PP profiles, we implemented a device detection module and a repository of pre-written device capability profiles, so that a device profile can be retrieved once the type of the client device has been detected.
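The fallback for devices that send no profile can be sketched as follows. This is a minimal illustration, not the CATS implementation: the repository contents, the User-Agent markers, and the function names are all hypothetical, and the property names are only meant to resemble UAProf-style attributes plus the DeviceType property mentioned above.

```python
from typing import Optional

# Hypothetical repository of pre-written capability profiles, keyed by device type.
PROFILE_REPOSITORY = {
    "wap-phone": {"DeviceType": "phone", "ScreenSize": "101x80",
                  "CcppAccept": ["text/vnd.wap.wml"]},
    "pda": {"DeviceType": "pda", "ScreenSize": "240x320",
            "CcppAccept": ["text/html"]},
    "desktop": {"DeviceType": "desktop", "ScreenSize": "1024x768",
                "CcppAccept": ["text/html"]},
}

# Hypothetical detection rules: (User-Agent substring, device type).
DETECTION_RULES = [("Nokia", "wap-phone"), ("Windows CE", "pda")]


def detect_device_type(user_agent: str) -> str:
    """Guess the device type from the User-Agent string; default to desktop."""
    lowered = user_agent.lower()
    for marker, device_type in DETECTION_RULES:
        if marker.lower() in lowered:
            return device_type
    return "desktop"


def resolve_profile(headers: dict, supplied_profile: Optional[dict] = None) -> dict:
    """Prefer a profile sent by the client; otherwise fall back to the repository."""
    if supplied_profile is not None:
        return supplied_profile
    return PROFILE_REPOSITORY[detect_device_type(headers.get("User-Agent", ""))]
```

A request whose User-Agent contains "Nokia" and carries no profile would be served with the pre-written "wap-phone" profile in this sketch.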

2.2 Delivery Context Transportation

As of this writing, there is no standard or recommended way of transporting CC/PP profiles, but the WAP Forum published a CC/PP transportation protocol called W-HTTP as part of UAProf. CATS supports this protocol, as does DELI [4], a CC/PP support library for Java servlets.
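As a rough illustration of what receiving a profile over HTTP involves, the sketch below extracts profile references from a request header. It assumes the `x-wap-profile` header associated with UAProf over HTTP, carrying a comma-separated list of quoted values; the value grammar is deliberately simplified, and the function names and example URL are hypothetical. It is not CATS or DELI code.

```python
import re


def parse_profile_header(value: str):
    """Split a simplified profile header value into profile URLs and diff references."""
    quoted = re.findall(r'"([^"]*)"', value)
    urls = [q for q in quoted if q.startswith(("http://", "https://"))]
    # Assumption: a diff reference looks like "<sequence number>-<digest>".
    diffs = [q for q in quoted if re.fullmatch(r"\d+-.+", q)]
    return urls, diffs


def profile_references(headers: dict):
    """Find the profile header case-insensitively and parse it."""
    for name, value in headers.items():
        if name.lower() == "x-wap-profile":
            return parse_profile_header(value)
    return [], []
```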

2.3 Delivery Context Interpretation

CATS contains a simple CC/PP parser called CCPPAR (CC/PP PARser), which it uses to access the components and properties specified in CC/PP documents. CCPPAR parses a CC/PP document and constructs a DOM-like object structure, resolving duplicated properties when necessary. CCPPAR can convert a CC/PP document into a set of flat name-value pairs or a list of JESS [5] facts, either of which is then delivered as input to the delivery context aggregation stage.
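The conversion of a profile into flat name-value pairs can be sketched as below. This is not CCPPAR itself: the `example.org` namespaces are placeholders rather than the real CC/PP or UAProf schema URIs, the sample profile is contrived, and "the last value wins" is an assumed rule for resolving duplicated properties.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Contrived profile; the ccpp/prf namespace URIs are placeholders.
SAMPLE_PROFILE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ccpp="http://example.org/ccpp#"
         xmlns:prf="http://example.org/prf#">
  <rdf:Description rdf:about="http://example.org/profile#Device">
    <ccpp:component>
      <rdf:Description rdf:about="http://example.org/profile#Hardware">
        <prf:ScreenSize>101x80</prf:ScreenSize>
        <prf:ColorCapable>No</prf:ColorCapable>
      </rdf:Description>
    </ccpp:component>
    <ccpp:component>
      <rdf:Description rdf:about="http://example.org/profile#Software">
        <prf:CcppAccept>
          <rdf:Bag>
            <rdf:li>text/vnd.wap.wml</rdf:li>
            <rdf:li>image/vnd.wap.wbmp</rdf:li>
          </rdf:Bag>
        </prf:CcppAccept>
        <!-- contrived duplicate to exercise the override rule -->
        <prf:ScreenSize>120x100</prf:ScreenSize>
      </rdf:Description>
    </ccpp:component>
  </rdf:Description>
</rdf:RDF>
"""


def local_name(tag: str) -> str:
    """Turn '{namespace}name' into 'name'."""
    return tag.rsplit("}", 1)[-1]


def flatten_profile(xml_text: str) -> dict:
    """Return {property name: value}; list-valued properties become Python lists."""
    root = ET.fromstring(xml_text)
    pairs = {}
    for element in root.iter():
        if local_name(element.tag) != "component":
            continue
        for description in element.findall(f"{{{RDF}}}Description"):
            for prop in description:
                items = [(li.text or "").strip() for li in prop.iter(f"{{{RDF}}}li")]
                value = items if items else (prop.text or "").strip()
                pairs[local_name(prop.tag)] = value  # assumed rule: last value wins
    return pairs
```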

2.4 Content Generation

Content generation is the task of web servers or web applications, and CATS is not in a position to intervene in it.

2.5 Content Adaptation

Content adaptation is the task of restructuring original content based on pre-authored semantic annotations. Semantic annotations include semantic information and adaptation commands. Semantic information consists of semantic properties such as importance, fidelity, and role, which serve as reference information for adaptation. Adaptation commands are simple content handling primitives. CATS has three commands: 'remove', 'keep', and 'replace'. Adaptation commands are assigned to sub-trees of the original content's DOM tree, and each command is activated only when its condition is satisfied. A condition is a statement of the form: "if the current delivery context states that property x is of value y, then execute the following adaptation command."
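A minimal sketch of conditional adaptation commands applied to sub-trees is given below. The annotation format (a dictionary with a target path, a property/value condition, and a command) is hypothetical and much simpler than a real annotation document, and 'keep' is assumed here to mean "leave the sub-tree untouched".

```python
import xml.etree.ElementTree as ET

PAGE = ("<body><div id='news'>Headline</div>"
        "<div id='ad'><img src='banner.gif'/></div>"
        "<table id='nav'><tr><td>Home</td></tr></table></body>")

# Hypothetical annotations: run `command` on `target` when the delivery
# context says that `property` has the value `value`.
ANNOTATIONS = [
    {"target": ".//div[@id='ad']", "property": "DeviceType", "value": "phone",
     "command": "remove"},
    {"target": ".//table[@id='nav']", "property": "DeviceType", "value": "phone",
     "command": "replace", "with": "<p id='nav'>Home</p>"},
    {"target": ".//div[@id='news']", "property": "DeviceType", "value": "phone",
     "command": "keep"},
]


def apply_annotations(root, annotations, context):
    """Apply each annotation whose condition holds in the delivery context."""
    parents = {child: parent for parent in root.iter() for child in parent}
    for ann in annotations:
        if context.get(ann["property"]) != ann["value"]:
            continue  # condition not satisfied: command stays inactive
        for node in root.findall(ann["target"]):
            parent = parents.get(node)
            if parent is None:
                continue  # the root itself, or a node inserted by a replace
            if ann["command"] == "remove":
                parent.remove(node)
            elif ann["command"] == "replace":
                replacement = ET.fromstring(ann["with"])
                replacement.tail = node.tail
                index = list(parent).index(node)
                parent.remove(node)
                parent.insert(index, replacement)
            # 'keep': nothing to do under the assumption stated above
    return root
```

With a delivery context of `{"DeviceType": "phone"}`, the ad is removed and the navigation table is replaced by a paragraph; with any other device type, the page is left unchanged.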

CATS contains a WYSIWYG annotation editor that can browse web pages and attach semantic annotations to them. Annotation documents are converted into XSL documents and then saved on an annotation server. At content service time, the transcoding system retrieves annotation documents from a pre-designated annotation server and applies them to the original web content.

2.5.1 Annotation by multiple parties

CATS allows annotations to be authored by multiple parties, such as content authors, content providers, and content consumers. Content authors will want to attach sufficient semantic information and some constraints to their content so that specific parts are not removed or degraded unexpectedly by the transcoding process. Content providers will concentrate on supporting a wide range of devices. Content consumers will try to trim down content or replace some parts of it as a form of personalization. CATS provides a different annotation vocabulary for each of these parties and contains a conflict resolution algorithm that is used when multiple parties create different annotations for the same web page.

2.6 Delivery Context Aggregation

Delivery context aggregation is composed of three sub-stages: 1) collecting additional profiles, 2) aggregating all the collected profiles, and 3) producing a sequence of transcoding commands.

In CATS, the delivery context aggregation stage can be executed using either a rule-based system, JESS, or an XSL engine, Xalan. Aggregation rules are specified as if-then rules when using JESS, or as XSL if-test statements (xsl:if elements with a test attribute) when using Xalan. Aggregation rules test conditions on delivery context property values obtained from the various profiles to conclude whether a specific transcoding feature is needed. Aggregation rules are thus key elements in shaping transcoding results.

Because aggregation rules are specified in a separate rule base or XSL file, one can fine-tune transcoding features to suit one's own needs simply by editing the rules.
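The role of aggregation rules can be sketched with plain if-then pairs, as below. CATS expresses such rules in JESS or XSL; this Python version only illustrates the idea. The property names resemble UAProf-style attributes, while the command names, the width threshold, and the merge policy (later profiles override earlier ones) are assumptions.

```python
def aggregate(*profiles: dict) -> dict:
    """Merge the collected profiles; later profiles override earlier ones."""
    merged = {}
    for profile in profiles:
        merged.update(profile)
    return merged


def screen_width(ctx: dict) -> int:
    """Read the width from a 'WIDTHxHEIGHT' screen size value."""
    return int(ctx.get("ScreenSize", "1024x768").split("x")[0])


def accepts(ctx: dict, mime: str) -> bool:
    return mime in ctx.get("CcppAccept", [])


# Each rule is (condition on the aggregated context, transcoding command).
# The list order is the order in which the commands are emitted.
RULES = [
    (lambda c: c.get("JavaScriptEnabled") == "No", "filter-script-tags"),
    (lambda c: c.get("TablesCapable") == "No", "serialize-tables"),
    (lambda c: accepts(c, "text/vnd.wap.wml") and not accepts(c, "text/html"),
     "html-to-wml"),
    (lambda c: screen_width(c) < 200, "fragment"),
]


def plan(context: dict) -> list:
    """Produce the sequence of transcoding commands for a delivery context."""
    return [command for condition, command in RULES if condition(context)]
```

Editing the `RULES` list changes which transcoding features are triggered without touching the rest of the pipeline, which mirrors the benefit of keeping the rules in a separate file.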

2.7 Transcoding

CATS interprets a transcoding command sequence and calls transcoding components. A transcoding component implements a transcoding command.

In CATS, we implemented various transcoding components, such as tag filters, HTML table serializers, HTML and WML fragmentation components, markup converters (HTML-to-XHTML Basic, HTML-to-WML, HTML-to-mHTML, WML-to-mHTML, mHTML-to-WML), and resource adapters. A resource adapter dynamically selects, from a set of prepared resources of various fidelity, the one most appropriate for a given delivery context.
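The selection performed by a resource adapter might look like the following sketch. The resource records, the numeric fidelity scale, and the two constraints (accepted media types and maximum width) are assumptions chosen for illustration.

```python
# Hypothetical prepared variants of one resource, from high to low fidelity.
RESOURCES = [
    {"name": "logo-hi.png", "type": "image/png", "width": 640, "fidelity": 3},
    {"name": "logo-lo.gif", "type": "image/gif", "width": 96, "fidelity": 2},
    {"name": "logo.txt", "type": "text/plain", "width": 0, "fidelity": 1},
]


def select_resource(resources, accepted_types, max_width):
    """Pick the highest-fidelity variant that the delivery context can render."""
    candidates = [r for r in resources
                  if r["type"] in accepted_types and r["width"] <= max_width]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["fidelity"])
```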

CATS provides a simple Java API with which one can develop transcoding components and install them into CATS.
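The actual CATS API is written in Java and is not specified here, so the following is only a sketch, in Python, of what a pluggable component interface could look like. The class names, the `command` attribute, and the `install`/`run` methods are hypothetical, and the regular-expression tag filter is a simplification that a production filter would replace with a real HTML parser.

```python
import re
from abc import ABC, abstractmethod


class TranscodingComponent(ABC):
    """A component implements exactly one transcoding command."""
    command: str

    @abstractmethod
    def transcode(self, content: str, context: dict) -> str:
        ...


class TagFilter(TranscodingComponent):
    command = "filter-tags"

    def __init__(self, tags=("script", "style")):
        self._patterns = [
            re.compile(rf"<{tag}\b.*?</{tag}\s*>", re.IGNORECASE | re.DOTALL)
            for tag in tags
        ]

    def transcode(self, content, context):
        for pattern in self._patterns:
            content = pattern.sub("", content)
        return content


class WhitespaceCompactor(TranscodingComponent):
    command = "compact-whitespace"

    def transcode(self, content, context):
        return re.sub(r"\s+", " ", content).strip()


class ComponentRegistry:
    """Maps command names to installed components and runs command sequences."""

    def __init__(self):
        self._components = {}

    def install(self, component: TranscodingComponent) -> None:
        self._components[component.command] = component

    def run(self, sequence, content: str, context: dict) -> str:
        for command in sequence:
            content = self._components[command].transcode(content, context)
        return content
```

A new component is added by subclassing `TranscodingComponent` and calling `install`; the command sequence produced by the aggregation stage then decides which installed components run and in what order.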

3. SEMANTIC WEB TEMPLATES

Though CATS performed its job reasonably well, content providers who were interested in CATS raised two major complaints.

3.1 Problems: Performance and Quality

The first was low performance. Though annotation documents are authored statically, CATS is basically a dynamic transcoding system in that every stage of adaptation and transcoding is executed at content service time. This dynamic behavior is essential because both the original source content and the delivery context change dynamically, and CATS must respond actively to the changing environment. It was also, however, the major performance bottleneck for CATS.

The second was the compromised quality of transcoded results. Though CATS uses semantic annotations for higher-quality transcoding, the quality still did not match that of human-authored content. Some way was needed to raise the quality substantially.

3.2 An Answer: Semantic Web Templates

A web page is composed of a set of content components. The structure of a web page can be specified with a web page template. A template is composed of a number of regions in which content components are placed.

A semantic web template is a web page template augmented with semantic information that explains and characterizes the content component placed in each region of the template. An example of semantic information is: "The content in this region should be a flash animation of size 120x50, and it's an ad of highest importance for our site." Semantic web templates are content specifications. Authors and content providers should create content components that conform to the semantic web template of a page and place them into that page.

With semantic web templates, CATS can determine a proper transcoding sequence for each content component and template almost statically. At content service time, the pre-determined transcoding sequences can simply be applied, mostly without delivery context aggregation, which can boost performance. And with more semantic information about the content, transcoding quality should improve.
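The idea of determining transcoding sequences ahead of time can be sketched as a lookup table built once per template. Everything in this example is hypothetical: the template structure, the device classes, the planning rules, and the command names are illustrative assumptions rather than the CATS design.

```python
# Hypothetical semantic web template: each region carries semantic information.
TEMPLATE = {
    "name": "front-page",
    "regions": {
        "banner": {"media": "flash", "size": (120, 50), "role": "ad",
                   "importance": "highest"},
        "headlines": {"media": "html", "role": "news", "importance": "high"},
        "sidebar": {"media": "html", "role": "links", "importance": "low"},
    },
}

# Hypothetical device classes known in advance.
DEVICE_CLASSES = {
    "phone": {"supports_flash": False, "markup": "wml"},
    "desktop": {"supports_flash": True, "markup": "html"},
}


def plan_region(region: dict, device: dict) -> list:
    """Decide the transcoding sequence for one region on one device class."""
    if region["importance"] == "low" and device["markup"] == "wml":
        return ["remove"]
    sequence = []
    if region["media"] == "flash" and not device["supports_flash"]:
        sequence.append("replace-with-static-image")
    if region["media"] == "html" and device["markup"] == "wml":
        sequence.append("html-to-wml")
    return sequence


def precompute(template: dict, device_classes: dict) -> dict:
    """Build the plan table once, before any content is served."""
    return {
        (region_name, device_name): plan_region(region, device)
        for region_name, region in template["regions"].items()
        for device_name, device in device_classes.items()
    }


PLANS = precompute(TEMPLATE, DEVICE_CLASSES)


def sequence_for(region_name: str, device_name: str) -> list:
    """At service time, a dictionary lookup stands in for rule evaluation."""
    return PLANS[(region_name, device_name)]
```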

4. SUMMARY AND FUTURE WORK

We described a web content adaptation and transcoding system developed on the basis of W3C's CC/PP. We identified seven primary feature elements of web content transcoding and presented the idea of semantic web templates.

We plan to extend our system into a full-fledged device independent web publishing system. We believe that semantic web templates and semantic annotation will play a crucial role in device independent authoring.

5. REFERENCES

  1. Minsu Jang, Jaehong Kim, Joo-Chan Sohn. CATS - Content Adaptation & Transcoding System. Proceedings of International Conference on Advanced Communication Tech., pp. 322-326, January 2003.
  2. Graham Klyne, Franklin Reynolds, Chris Woodrow, Hidetaka Ohto, Mark H. Butler. Composite Capability/Preference Profiles (CC/PP): Structure and Vocabularies. W3C Working Draft, 08 November 2002. http://www.w3.org/TR/CCPP-struct-vocab/
  3. WAP Forum. UAProf User Agent Profile Specification. Version 21-Jun-2000. http://www1.wapforum.org/tech/documents/WAP-174_100-UAProf-20000621-a.pdf
  4. Mark H. Butler. DELI: A Delivery context Library for CC/PP and UAProf. 08 February 2002. http://www-uk.hpl.hp.com/people/marbut/DeliUserGuideWEB.htm
  5. Sandia National Laboratories. JESS: An Expert System Shell for the Java Platform. http://herzberg.ca.sandia.gov/jess