Understanding what users write on the web...

Welcome to CAW2 (Content Analysis in Web 2.0)

Web mining deals with understanding, and discovering information in, the World Wide Web. It focuses on analyzing three sources of information: web structure, user activity and content. In the Web 2.0, data on web structure and user activity can be handled in much the same way as in the traditional Web. For content, however, conventional analysis and mining procedures are no longer suitable. This is mainly because Web 2.0 content is generated by users, who use language very freely and constantly incorporate new communication elements that are generally context dependent. This kind of language can also be found in chats, SMS, e-mails and other channels of informal textual communication.

Visualising Slashdot

To help you analyse our data, we have made available a visual tool called WET, which has been prepared to visualise Slashdot threads. The tool provides an overview of a single conversation, shows details when you hover over a comment, and lets you map a set of Slashdot metrics (e.g. score) onto a set of visual attributes such as colour, size or shape. To try out the tool, just click here and select the file of the thread that you would like to visualise.

Once you click on the name of the desired conversation, you will have to open the thread:


Submissions will be made through EasyChair - CAW2.0


Test Set

General Description

The test set will be sent by e-mail to the participants after the paper submission. It will consist of data that is "similar" to the training set, although some metadata such as labels or ratings may have been removed. The training and test sets constitute a partition of the same original data set.
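As an illustration only, the relationship between the two sets can be sketched in Python. This is not the organizers' actual procedure: the field names (`label`, `rating`), the 20% test fraction and the helper `split_and_mask` are all assumptions made for the example.

```python
import random


def split_and_mask(records, test_fraction=0.2,
                   hidden_fields=("label", "rating"), seed=0):
    """Partition records into two disjoint sets and hide metadata in the test part.

    All parameter values here are hypothetical; the real split is not published.
    """
    rng = random.Random(seed)
    shuffled = list(records)          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    test_raw, train = shuffled[:n_test], shuffled[n_test:]

    # Remove the hidden metadata fields from every test record.
    test = [{k: v for k, v in r.items() if k not in hidden_fields}
            for r in test_raw]
    return train, test


# Toy data standing in for user comments (invented for this example).
comments = [{"id": i, "text": f"comment {i}", "label": i % 2, "rating": i % 5}
            for i in range(10)]
train, test = split_and_mask(comments)
```

Each record ends up in exactly one of the two sets, and the test records keep their text but no longer carry the hidden fields.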

Call for Papers

The organizing committee of the first CAW2 (Content Analysis in Web 2.0) workshop, the Universitat Pompeu Fabra and the Barcelona Media Innovation Center invite you to participate in this workshop, which is part of the International World Wide Web Conference WWW-2009, to be held in Madrid.

The general topic of this workshop is the analysis of content generated by users in the Web 2.0. In this first year, the content is restricted to text. Two different paper modalities will be considered for the workshop:

Training Dataset (data already available for download)

Here you can download the training dataset for the CAW2 2009 shared tasks. You must be logged in to download the data. If you have not yet created an account, please do so now. Once you are logged in, come back to this page and you will see the download link below.
