Google open sources file-identifying Magika AI for malware hunters and others

Cool, but it's 2024 – needs more hype, hand wringing, and flashy staged demos to be proper ML


Google has open sourced Magika, an in-house machine-learning-powered file identifier, as part of its AI Cyber Defense Initiative, which aims to give IT network defenders and others better automated tools.

Working out the true contents of a user-submitted file is perhaps harder than it looks. It's not safe to assume the file type from, say, its extension, and relying on heuristics and human-crafted rules – such as those in the widely used libmagic – to identify the actual nature of a document from its data is, in Google's view, "time consuming and error prone."
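To make the hand-crafted approach concrete, here's a minimal sketch in Python of a signature-based check. It is neither libmagic's rule set nor anything from Magika: the `sniff_type` name and the tiny signature table are our own illustrative assumptions.

```python
# Illustrative only: a tiny hand-written signature table in the spirit of
# rule-based identifiers. Not libmagic's actual rules, and not Magika.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"%PDF-", "pdf"),
    (b"MZ", "pe-executable"),
    (b"\x7fELF", "elf-executable"),
    (b"#!", "script"),
]


def sniff_type(data: bytes) -> str:
    """Return a coarse type label based on the leading bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


# A shell script is still a shell script, whatever its filename claims.
print(sniff_type(b"#!/bin/sh\necho hello\n"))  # prints "script"
```

Every new format means another rule, and text formats with no fixed header, such as most source code, simply come back as "unknown" here. That maintenance burden is the sort of thing the "time consuming and error prone" complaint is about.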

Basically, if someone uploads a .JPG to your online service, you want to be sure it's a JPEG image and not some script masquerading as one, which could later bite you in the ass. Enter Magika, which uses a trained model to rapidly identify file types from file data, and it's an approach the Big G thinks works well enough to use in production. Magika is, we're told, used by Gmail, Google Drive, Chrome's Safe Browsing, and VirusTotal to properly identify and route data for further processing.

Your mileage may vary. Libmagic, for one, might work well enough for you. In any case, Magika is an example of Google using artificial intelligence internally to reinforce its security, and the company hopes others can benefit from that tech, too. Another example would be RETVec, which is a multi-language text-processing model used to detect spam. This comes at a time when we're all being warned that miscreants are apparently making more use of machine-learning software to automate intrusions and vulnerability research.

"AI is at a definitive crossroads — one where policymakers, security professionals and civil society have the chance to finally tilt the cybersecurity balance from attackers to cyber defenders," Phil Venables, chief information security officer at Google Cloud, and Royal Hansen, veep of engineering for privacy, safety, and security, said on Friday. 

"At a moment when malicious actors are experimenting with AI, we need bold and timely action to shape the direction of this technology."

The pair believe Magika can be used by network defenders to identify, fast and at scale, the true content of files, which is a first step in malware analysis and intrusion detection. To be honest, this deep-learning model could be useful for anyone who needs to scan user-provided documents: Videos that are actually executables, for instance, ought to set off some alarm and require closer inspection. Email attachments that aren't what they say they are ought to be quarantined. You get the idea.
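As a rough sketch of that idea in Python: compare what the filename claims with what a content detector reports, then decide what to do with the file. The `triage` function, the label names, and both lookup tables are hypothetical. The detector is passed in as a plain callable, so any identifier that maps bytes to a label could be slotted in; the one used below is a deliberately trivial stand-in, not Magika.

```python
from pathlib import PurePath
from typing import Callable

# Hypothetical mapping from file extension to the content label we expect.
EXPECTED = {
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".png": "png",
    ".pdf": "pdf",
    ".mp4": "mp4",
}

# Hypothetical labels a detector might use for runnable content.
EXECUTABLE_LABELS = {"pe-executable", "elf-executable", "script"}


def triage(filename: str, data: bytes, detect: Callable[[bytes], str]) -> str:
    """Return 'accept', 'inspect', or 'quarantine' for an uploaded file."""
    detected = detect(data)
    claimed = EXPECTED.get(PurePath(filename).suffix.lower())
    if claimed is not None and detected in EXECUTABLE_LABELS:
        return "quarantine"  # media or document extension, runnable content
    if claimed is not None and detected != claimed:
        return "inspect"  # mismatch, but not obviously executable
    return "accept"  # matches, or an extension this sketch has no opinion on


def toy_detector(data: bytes) -> str:
    """Stand-in detector for the demo: flags Windows executables only."""
    return "pe-executable" if data.startswith(b"MZ") else "unknown"


print(triage("holiday.mp4", b"MZ\x90\x00", toy_detector))  # prints "quarantine"
```

The policy here is intentionally crude: extensions missing from the table are waved through, which a real deployment would probably not want.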

More generally speaking, in the context of cybersecurity, AI models can not only inspect files for suspicious content and source code for vulnerabilities, they can also generate patches to fix bugs, the Googlers asserted. The mega-corp's engineers have been experimenting with Gemini to improve the automated fuzzing of open source projects, too.

Google claims Magika is 50 percent more accurate at identifying file types than the biz's previous system of handcrafted rules, takes milliseconds to identify a file type, and is said to have at least 99 percent accuracy in tests. It isn't perfect, however, and fails to classify file types about three percent of the time. It's licensed under Apache 2.0, the code is on GitHub, and its model weighs in at 1MB.

Moving away from Magika, the Chocolate Factory will also, as part of this new AI Cyber Defense Initiative, partner up with 17 startups in the UK, US, and Europe, and train them to use these types of automated tools to improve their security. 

It will also expand its $15 million Cybersecurity Seminars Program to help universities train more European students in security. Closer to home, it pledged $2 million in grants to support academics at the University of Chicago, Carnegie Mellon, and Stanford, funding research in cyber-offense as well as large language models.

"The AI revolution is already underway. While people rightly applaud the promise of new medicines and scientific breakthroughs, we're also excited about AI's potential to solve generational security challenges while bringing us close to the safe, secure and trusted digital world we deserve," Venables and Hansen concluded. ®
