Publishers punt new web crawler blocking standards

Rusty robots.txt for the scrapheap?

Thu 29 Nov 2007 // 11:50 UTC

A long-awaited new standard designed to give webmasters more control over how search engines and newsreaders access their content will be unveiled in New York today.

After a year-long pilot the Automated Content Access Protocol (ACAP) will be launched at the headquarters of the Associated Press. It aims to improve on the current robots.txt permission file for spiders and other bots.

ACAP will include commands designed to let web publishers limit how long content can be indexed and how much of an article news aggregators are allowed to display.

A standard "Follow" command will let publishers allow or block crawlers from following the links in a page - links being the basis of Google's PageRank algorithm. Google currently obeys the non-standard HTML "NOFOLLOW" meta tag.
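For readers who have not met the existing meta tag, here is a rough Python sketch of how a crawler might check a page for it before following links. The class and function names are made up for illustration, and the sketch covers only the HTML robots meta tag, not ACAP's own "Follow" syntax, which the article does not spell out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def may_follow_links(html):
    """Return False if the page asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "nofollow" not in parser.directives


# Hypothetical pages, not taken from any real site.
BLOCKED_PAGE = '<html><head><meta name="robots" content="index, NOFOLLOW"></head></html>'
OPEN_PAGE = "<html><head><title>Story</title></head></html>"
```

A polite crawler would call may_follow_links() on each fetched page and skip link extraction when it returns False. As with robots.txt, nothing forces a bot to do so.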

Robots.txt was created by consensus way back in 1994 and is voluntary, though all the major search engines comply. The campaign for a new protocol was fired by the emergence of Google News and other aggregators.
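To show what "voluntary" means in practice, the following is a minimal sketch of a well-behaved crawler consulting robots.txt rules before fetching, using Python's standard urllib.robotparser module. The bot names, paths and rules are invented for the example. The check happens entirely on the crawler's side, so a bot that skips it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: one aggregator is kept out of the archive,
# and every other bot is kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleNewsBot
Disallow: /archive/

User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(parser, agent, path):
    """Ask whether the named bot is permitted to fetch the path."""
    return parser.can_fetch(agent, path)


parser = build_parser(ROBOTS_TXT)

# A compliant crawler checks before every request.
for agent, path in [
    ("ExampleNewsBot", "/archive/2007/story.html"),
    ("ExampleNewsBot", "/news/today.html"),
    ("SomeOtherBot", "/private/drafts.html"),
]:
    verdict = "allowed" if may_fetch(parser, agent, path) else "blocked"
    print(agent, path, verdict)
```

Note that the rules are all-or-nothing per path. There is no way to express the time limits or excerpt lengths that ACAP is said to add.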

More traditional news organisations including AFP and the Telegraph have engaged in sabre-rattling over such indexes, which they said parasitise their journalism.

AFP eventually got what it wanted - a revenue-sharing deal - after it threatened a landmark test case in the US. A Belgian newspaper group has led the anti-indexing charge lately.

ACAP is being pushed by the World Association of Newspapers, the European Publishers Council and the International Publishers Association. It's an attempt to soothe their industry's web worries by handing more control back to the producers of news.

The new standards have been cautiously welcomed by Google, according to AP, but the firm is still "evaluating" the new system.

There's more info on version 1.0 of ACAP here. More features are planned, including permissions for indexing web video. ®
