Full-up Google choking on web spam?

Buddy, can you spare a server?


Webmasters have been seething at Google since it introduced its 'Big Daddy' update in January, the biggest revision to the way its search engine operates for years.

Alarm usually accompanies changes to Google's algorithms, as the new rankings can cause websites to be demoted, or disappear entirely. But four months on from the introduction of "Big Daddy," it's clear that the problems are more serious than with any previous revision - and they're getting worse.

Webmasters now report sites not being crawled for weeks, with Google SERPs (search engine results pages) returning old pages, and failing to return results for phrases that used to bear fruit.

"Some sites have lost 99 per cent of their indexed pages," reports one member of the Webmaster World forum. "Many cache dates go back to 2004 January." Others report long-extinct pages showing up as "Supplemental Results."

This thread is typical of the problems.

With creating junk web pages so cheap and easy, Google is engaged in an arms race with search engine optimizers. Each innovation designed to bring clarity to the web, such as tagging, is rapidly exploited by spammers or site owners wishing to harvest some classified advertising revenue.

Recently, we featured Blog Mass Installer, a software tool that can create 100 Blogger weblogs in 24 minutes. A subterranean industry of sites providing "private label articles," or PLAs, exists to flesh out "content" for these freshly minted sites. As a result, legitimate sites are often caught in the crossfire.

But the new algorithms may not be solely to blame. Google's chief executive Eric Schmidt has hinted at another reason for the recent chaos. In Google's earnings conference call last month, Schmidt was frank about the extent of the problem.

"Those machines are full," he said. "We have a huge machine crisis."

And there's at least some anecdotal evidence to support the theory that hardware limitations are to blame.

"The issue I have now is Googlebot is SLAMMING my sites since last week, but none of it makes it into the index. If it's old pages being re-indexed or new pages for the first page, they don't show up," writes one webmaster.

The confusion has several consequences which we've rarely seen discussed outside web circles.

Giving Google the benefit of the doubt, and assuming the changes are intentional, one webmaster writes: "In which case Google's index, and hence effectively 'the Web as most people know it' is set to become a whole lot smaller in the coming weeks."

It's barely more than a year since Yahoo! and Google were engaged in a willy-waving exercise over who had the largest index. (See My spam-filled search index is bigger than yours!)

Now size, it seems, doesn't matter.

There's also the intriguing question raised by search engines that are unable to distinguish between nefarious sites and legitimate SEO (search engine optimization) techniques. The search engines can't, we now know, blacklist a range of well-established techniques without causing chaos. In future, will the search engines need to code for backward bug compatibility?

And lingering in the background is the question of whether the explosion of junk content - estimates put robot-generated spam at anywhere between one-fifth and one-third of the Google index - can be tamed.

"At this rate," writes one poster on the Google Sitemaps Usenet group, in a year the SERPS will be nothing but Amazon affiliates, Ebay auctions, and Wiki clones.  Those sites don't seem to be affected one bit by supplemental hell, 301s, and now deindexing."

With $8 billion in the bank, Google is better resourced and more focussed than anyone - but it's still struggling. Financial analysts noted that its R&D expenditure now matches that of a wireline telco.

Only a cynic would suggest that poor SERPs drive desperate businesses to the search engines' own classified ad departments - so if you want to play, you have to pay. Banish that unworthy thought at once.

(Thanks to Isham Research's Phil Payne for the tip).®

Bootnote: Something called OneWebDay - we're not kidding - is encouraging you to celebrate the web with a "special hand signal": you extend your middle three fingers and have your thumb and little finger touch in a circle. Not the gesture many webmasters are making this week.

