| Skip to main content | Skip to navigation |

Using Graph Matching Techniques to Wrap Data from PDF Documents

  • Tamir Hassan, Vienna University of Technology, Austria
  • Robert Baumgartner, Vienna University of Technology, Austria

Full text:

Track: Posters

Wrapping is the process of navigating a data source, semi-automatically extracting data and transforming it into a form suitable for data processing applications. There are currently a number of established products on the market for wrapping data from web pages. One such approach is Lixto, a product of research performed at our institute. Our work is concerned with extending the wrapping functionality of Lixto to PDF documents. As the PDF format is relatively unstructured, this is a challenging task. We have developed a method to segment the page into blocks, which are represented as nodes in a relational graph. This paper describes our current research in the use of relational matching techniques on this graph to locate wrapping instances.

Organised by

ECS Logo

in association with

BCS Logo ACM Logo

Platinum Sponsors

Sponsor of The CIO Dinner

Valid XHTML 1.0! IFIP logo WWW Conference Committee logo Web Consortium logo Valid CSS!