What is it about?

The Robots Exclusion Protocol is used by website owners across the internet to communicate privacy directives to automated crawlers, but it is not a method of enforcement. This paper explores how effective the protocol is at preventing unwanted web traffic to websites, and whether some of its directives are more likely to be complied with than others.
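To illustrate the protocol, the sketch below shows how a well-behaved crawler consults a site's robots.txt before fetching pages. The site, crawler name, and rules are hypothetical examples, not taken from the paper; Python's standard-library `urllib.robotparser` does the parsing.

```python
# Sketch: how a compliant crawler checks robots.txt directives
# before fetching a URL. Site, paths, and "ExampleBot" are hypothetical.
from urllib.robotparser import RobotFileParser

# In practice this file would be fetched from https://example.com/robots.txt;
# here we parse an example inline.
robots_txt = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Any crawler may fetch public pages, but is asked to avoid /private/.
print(parser.can_fetch("*", "https://example.com/index.html"))          # True
print(parser.can_fetch("*", "https://example.com/private/a.html"))      # False

# The named crawler is asked to stay off the entire site.
print(parser.can_fetch("ExampleBot", "https://example.com/index.html")) # False
```

Nothing in the protocol prevents a scraper from skipping this check entirely, which is exactly the compliance question the paper studies.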


Why is it important?

Our findings show that stricter directives in the robots.txt files of the Robots Exclusion Protocol are less likely to be followed. Consequently, more enforceable alternatives to the protocol are needed to protect user data on websites across the web.

Read the Original

This page is a summary of: Scrapers Selectively Respect robots.txt Directives: Evidence From a Large-Scale Empirical Study, October 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3730567.3764471.
