Security precogs divine web vulnerabilities BEFORE THEY EXIST

Three million web properties will go under the pwned hammer


Wayback is way ahead: Three million webpages are set to become hacker fodder, according to research that could predict which websites will become vulnerable ahead of time.

The research by Kyle Soska and Nicolas Christin of Carnegie Mellon University used an engine which divined the future by looking at the past - more specifically, by trawling the Wayback Machine, with its 391 billion stored pages, for sites that had become malicious.

It determined that of 4,916,203 current benign webpages (tied to 444,519 websites) about 3 million would become vulnerable within a year.

The work was a boon to search engines assessing malicious hits, to blacklist operators, and to website admins who could be warned ahead of a potential compromise, according to Soska and Christin.

Their predictions, made with 66 per cent accuracy, were determined by an intelligent algorithm fed with two sets of samples: malicious ones drawn from blacklists including PhishTank, and benign ones drawn from the .com zone file. An astonishing 89 per cent of these samples had been captured by the Wayback Machine.

It was then a matter of looking back between three and 12 months before a site was compromised to acquire indicators of why it was popped.
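The look-back step can be pictured with a short Python sketch. It is an illustration only: the article does not say how the researchers fetched archived pages, so the use of the Wayback Machine's CDX query endpoint here is an assumption, and the helper names `months_before`, `lookback_window` and `cdx_query_url` are hypothetical. The code only builds the query URL and does not contact the archive.

```python
import calendar
from datetime import date
from urllib.parse import urlencode

# Assumption: the public Wayback Machine CDX endpoint is used to list snapshots.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def months_before(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`.

    The day of month is clamped so that, for example, one month before
    31 March lands on the last day of February.
    """
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def lookback_window(compromised_on: date, nearest: int = 3, farthest: int = 12):
    """Return (start, end) of the window 12 to 3 months before the compromise."""
    return months_before(compromised_on, farthest), months_before(compromised_on, nearest)


def cdx_query_url(site: str, compromised_on: date) -> str:
    """Build a snapshot-listing URL restricted to the look-back window."""
    start, end = lookback_window(compromised_on)
    params = {
        "url": site,
        "from": start.strftime("%Y%m%d"),
        "to": end.strftime("%Y%m%d"),
        "output": "json",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)
```

For a hypothetical site blacklisted on 20 August 2014, `lookback_window(date(2014, 8, 20))` spans 20 August 2013 to 20 May 2014, and the snapshots returned for that period would be the raw material for spotting early warning signs.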

Those indicators included sudden increases in traffic, the presence of certain files, such as those of a WordPress CMS that may be unpatched, and particular HTML tags.
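To give a flavour of what content-based indicators might look like in practice, here is a minimal Python sketch that turns a page into a set of features using the standard library's `html.parser`. The feature naming (`tag:...`, `generator:...`) and the choice to read the `<meta name="generator">` tag as a CMS hint are assumptions made for illustration, not details taken from the paper.

```python
from html.parser import HTMLParser


class TagFeatureParser(HTMLParser):
    """Collect a set of simple features from the tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.features = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        self.features.add(f"tag:{tag}")
        attr_map = dict(attrs)
        # Assumption: a generator meta tag hints at the CMS and its version.
        if tag == "meta" and (attr_map.get("name") or "").lower() == "generator":
            content = (attr_map.get("content") or "").strip().lower()
            self.features.add(f"generator:{content}")


def extract_features(html: str) -> set:
    """Return the feature set for one page of HTML."""
    parser = TagFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.features
```

Fed a page containing `<meta name="generator" content="WordPress 3.5.1">`, the sketch yields a `generator:wordpress 3.5.1` feature alongside one feature per tag seen, which is the sort of signal that could flag an ageing CMS install.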

User-generated content was filtered out of the assessed website data, as it was not useful for determining which sites would become vulnerable in the future.

"Our approach relies on an online classification algorithm that can automatically detect whether a server is likely to become malicious," the duo wrote in their paper Automatically detecting vulnerable websites before they turn malicious[PDF].

"At a high level, the classifier determines if a given website shares a set of features with websites known to have been malicious. A key aspect of our approach is that the feature list used to make this determination is automatically extracted from a training set of malicious and benign webpages, and is updated over time, as threats evolve."

The classifier was efficient, interpretable, robust to imbalanced data and missing features, and adaptive to drastic changes over time, they said.
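The idea the researchers describe, a feature list pulled automatically from labelled pages and then used to judge new ones, can be sketched in a few lines of Python. This is a deliberately simplified stand-in and not the authors' classifier: the frequency-gap ranking, the `top_n` cut-off and the function names `select_features` and `risk_score` are all assumptions for illustration. Each page is represented as a set of feature strings.

```python
from collections import Counter


def select_features(malicious, benign, top_n=3):
    """Pick the features most over-represented in the malicious samples.

    `malicious` and `benign` are non-empty lists of feature sets, one per page.
    """
    mal_counts = Counter(f for page in malicious for f in page)
    ben_counts = Counter(f for page in benign for f in page)

    def gap(feature):
        # Share of malicious pages with the feature minus share of benign pages.
        return mal_counts[feature] / len(malicious) - ben_counts[feature] / len(benign)

    # Largest gap first; ties broken alphabetically for stable output.
    ranked = sorted(mal_counts, key=lambda f: (-gap(f), f))
    return [f for f in ranked[:top_n] if gap(f) > 0]


def risk_score(page, feature_list):
    """Return the fraction of selected features that the page shares."""
    if not feature_list:
        return 0.0
    return sum(1 for f in feature_list if f in page) / len(feature_list)


# Toy training data (made up for the example).
malicious = [{"wp-3.5", "tag:iframe", "tag:p"}, {"wp-3.5", "tag:p"}]
benign = [{"tag:p", "wp-3.9"}, {"tag:p"}]

features = select_features(malicious, benign)
print(features)                                   # ['wp-3.5', 'tag:iframe']
print(risk_score({"wp-3.5", "tag:p"}, features))  # 0.5
```

Re-running `select_features` on a fresh training set is the toy equivalent of the feature list being "updated over time, as threats evolve": a tag common to every page, such as `tag:p` here, drops out because it does not separate the two groups.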

Plenty of systems existed to detect vulnerable and compromised websites, but all were reactive, prompting the duo to develop a means to divine future flaws.

Determining vulnerable websites ahead of time could help curb black hat search engine poisoning and redirection, which were increasingly common. Sophos said in its 2013 threat report that 80 per cent of malware-foisting websites were hacked web servers owned by innocent third parties.

There were limitations in the system's assumption that a potentially vulnerable site could be identified by its traffic and content: attackers could compromise sites by brute-forcing passwords, or could host their own sites with malicious intent.

The software developed in the research would later be released publicly. Further technical details were available in the Usenix paper. ®


Biting the hand that feeds IT © 1998–2022