Organizers

Beihang University

Host of the Conference Series

International World Wide Web Conferences Steering Committee

In Cooperation With

ACM SIGWEB
ACM SIGecom

China Computer Federation

International Federation for Information Processing

World Wide Web Consortium

Sponsors are Listed In Alphabetic Order

Poster

Track: Posters

Paper Title:
A Larger Scale Study of Robots.txt

Authors:

Santanu Kolay (Yahoo! Inc.)
Paolo D'Alberto (Yahoo! Inc.)
Ali Dasdan (Yahoo! Inc.)
Arnab Bhattacharjee (Yahoo! Inc.)

Abstract:
A website can regulate search engine crawler access to its content using the robots exclusion protocol, specified in its robots.txt file. The rules in the protocol enable the site to allow or disallow part or all of its content to certain crawlers, resulting in a favorable or unfavorable bias towards some of them. A 2007 survey of robots.txt usage across 7,593 sites found some evidence of such biases, and news of the finding led to widespread discussion on the web. In this paper, we report on our survey of about 6 million sites. Our survey tries to correct the shortcomings of the previous survey and finds no significant preference towards any particular search engine.
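To make the notion of crawler bias concrete, the following is a minimal sketch, not the authors' survey method, of how a single robots.txt file can treat crawlers differently. It uses Python's standard `urllib.robotparser` module. The crawler names `favoredbot` and `otherbot`, the `example.com` URL, the rules themselves, and the helper names `allowed` and `access_by_crawler` are all hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the crawler named "favoredbot" may fetch
# everything (an empty Disallow means no restriction), while every
# other crawler is kept out of /private/.
ROBOTS_TXT = """\
User-agent: favoredbot
Disallow:

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # access is needed for this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def access_by_crawler(robots_txt: str, user_agents: list[str], url: str) -> dict[str, bool]:
    """Map each crawler name to whether it may fetch url."""
    return {ua: allowed(robots_txt, ua, url) for ua in user_agents}


if __name__ == "__main__":
    url = "http://example.com/private/page.html"
    print(access_by_crawler(ROBOTS_TXT, ["favoredbot", "otherbot"], url))
```

Under these assumed rules, the two crawlers get different answers for the same URL under /private/, which is the kind of per-crawler difference a survey of robots.txt files would have to count across many sites.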

PDF version

Inquiries can be sent to: program-chairs at www2008.org
