Trojan Source attack: Code that says one thing to humans tells your compiler something very different, warn academics

Bidirectional character attack – simple and nightmarish

Updated The way Unicode handles text that mixes left-to-right and right-to-left languages could be misused to write malicious code that says one thing to humans and another to compilers, academics are warning.

"What if it were possible to trick compilers into emitting binaries that did not match the logic visible in source code?" ask Cambridge student Nicholas Boucher and Professor Ross Anderson in a paper published today.

They say it is possible, and outline a new threat [PDF] that could be deployed by future supply chain attackers – making detection of something like the SolarWinds attack at code level even harder than it is already.

The duo's research, tracked as CVE-2021-42574, focused on so-called bidirectional ("bidi") control characters in Unicode. These are used so words written in right-to-left languages (such as Arabic and Hebrew) can be inserted into sentences written in left-to-right languages (such as English). Boucher and Anderson discovered that they can be misused to misrepresent source code.

"Embedding multiple layers of LRI and RLI within each other enables the near-arbitrary reordering of strings," says their paper. "Our key insight is that we can reorder source code characters in such a way that the resulting display order also represents syntactically valid source code."

"In effect, we anagram program A into program B."

Concerningly, the academics say that Microsoft's VS Code and Apple's Xcode text editors don't highlight the use of bidi characters as prominently as they might – while praising Vim for showing them as "numerical code points."
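To give a feel for what "showing them as numerical code points" buys a reviewer, here is a minimal Python sketch that swaps each bidi control character for a visible tag. The function name `reveal_bidi`, the tag format and the sample line are our own illustrative choices, not anything taken from the paper or from Vim.

```python
# The nine Unicode bidi control characters and their usual abbreviations.
BIDI_CONTROLS = {
    "\u202a": "LRE",  # left-to-right embedding
    "\u202b": "RLE",  # right-to-left embedding
    "\u202c": "PDF",  # pop directional formatting
    "\u202d": "LRO",  # left-to-right override
    "\u202e": "RLO",  # right-to-left override
    "\u2066": "LRI",  # left-to-right isolate
    "\u2067": "RLI",  # right-to-left isolate
    "\u2068": "FSI",  # first strong isolate
    "\u2069": "PDI",  # pop directional isolate
}


def reveal_bidi(text: str) -> str:
    """Replace every bidi control character with a visible <U+XXXX NAME> tag."""
    return "".join(
        f"<U+{ord(ch):04X} {BIDI_CONTROLS[ch]}>" if ch in BIDI_CONTROLS else ch
        for ch in text
    )


# Hypothetical source line with a right-to-left override hidden in a string.
line = "access_level = 'user\u202e'"
print(reveal_bidi(line))  # access_level = 'user<U+202E RLO>'
```

Once the invisible character is spelled out like this, a reviewer can no longer be fooled by the way the line is rendered.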

Professor Anderson told The Register: "Most programming languages let you put [bidi characters] in string literals and in comments, so you can use them in source code: code that appears innocuous to a human reviewer can actually do something nasty. That's bad news for projects like Linux and Webkit that accept contributions from random people, subject them to manual review, then incorporate them into critical code."

The problem is not merely academic: Rust's maintainers patched rustc against the attack over the weekend after the researchers used it for a successful proof-of-concept, even though Rust acknowledged it has not seen the technique deployed in the wild.

Snippets of the technique exist on GitHub, although the Cambridge pair's paper says that none of them seemed to be malicious.

Break comment, receive code

Boucher and Anderson's paper included several examples of this novel attack technique. One, in Python, is presented below.

Code snippet demonstrating the bidirectional character Trojan Source attack

In figure 2 'alice' is defined as being worth 100, followed by a function that subtracts funds from Alice. The final line calls that function with a value of 50, so when executed that little program should give us a result of 50.

However, figure 1 shows us how bidi characters can be used to frustrate the program's intent: by inserting an RLI (right-to-left isolate) character we change the text direction from conventional English to right-to-left. The output of figure 1 becomes 100 in spite of our subtract funds function.

"This is because the word return in the docstring is actually executed due to a bidi override, causing the function to return prematurely and the code which subtracts value from a user's bank account to never run," explains the paper.

The same principle can be applied to other languages, including C, C#, C++ and JavaScript as well as Rust – though for the latter, yesterday's update to version 1.56.1 sees Rust rejecting code containing bidi characters.
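The Python early-return trick described above can be reproduced in a few lines. What follows is a sketch reconstructed from the description in this article, not a verbatim copy of the paper's listing: the exact docstring wording, the helper names `build_program` and `run_and_get_balance`, and the "honest" comparison program are our assumptions. The RLI character is written as the escape `\u2067` so that it stays visible here.

```python
RLI = "\u2067"  # right-to-left isolate


def build_program(docstring_line: str) -> str:
    """Assemble the little bank program around a given docstring line."""
    return (
        "bank = {'alice': 100}\n"
        "def subtract_funds(account, amount):\n"
        f"    {docstring_line}\n"
        "    bank[account] -= amount\n"
        "    return\n"
        "subtract_funds('alice', 50)\n"
    )


def run_and_get_balance(source: str) -> int:
    """Execute the program text in a fresh namespace and read Alice's balance."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["bank"]["alice"]


# What a reviewer believes they are reading: "return" sits inside the docstring.
honest = build_program("''' Subtract funds from bank account then return '''")

# What the interpreter receives: the docstring closes right after the RLI,
# so ";return" is a real statement that exits before the subtraction.
trojan = build_program(
    "''' Subtract funds from bank account then " + RLI + "''' ;return"
)

print(run_and_get_balance(honest))  # 50
print(run_and_get_balance(trojan))  # 100
```

In an editor that applies bidi rendering, the two docstring lines can look alike, which is the whole point of the attack.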

Surely highlighting solves this

Most text editors used by devs apply syntax highlighting, so you'd imagine a bidi attack would give itself away through an immediate change in colouring. Unfortunately, this isn't as reliable a defence as you might imagine: the academics say their "experience was mixed" on this front.

"Some attacks provided strange highlighting in a subset of editors, which may suffice to alert developers that an encoding issue is present. However, all syntax highlighting nuances were editor-specific, and other attacks did not show abnormal highlighting in the same settings," the paper says.

Defending against the attack technique could be as straightforward as rewriting software build pipelines to halt if they encounter a bidi character, suggest the academics.
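A check of that sort could look something like the Python sketch below. It is one possible shape for such a gate under our own assumptions, not the researchers' tooling: the names `find_bidi_controls` and `check_files`, the report format and the choice to read files as UTF-8 are all hypothetical.

```python
from pathlib import Path

# The nine Unicode bidi control characters and their usual abbreviations.
BIDI_CONTROLS = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO", "\u2066": "LRI",
    "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}


def find_bidi_controls(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, abbreviation) for each bidi control, 1-indexed."""
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col_no, BIDI_CONTROLS[ch]))
    return findings


def check_files(paths: list[str]) -> int:
    """Report every finding and return 1 if any file contains one, else 0."""
    status = 0
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for line_no, col_no, name in find_bidi_controls(text):
            print(f"{path}:{line_no}:{col_no}: bidi control character {name}")
            status = 1
    return status
```

A build step could pass the changed files to `check_files` and stop the pipeline on a non-zero result. Projects that legitimately ship right-to-left text in string literals would need an allow-list, which this sketch leaves out.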

The same technique could be used to insert homoglyphs – those irritating non-Latin characters used by fraudsters in domain names for years in order to phish the unwary.
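For the homoglyph case, a crude first-pass check is to flag any letter in an identifier whose Unicode name does not begin with "LATIN". The sketch below uses Python's standard `unicodedata` module; the function name `non_latin_letters` and the sample identifiers are ours, and the heuristic will also flag perfectly legitimate identifiers written in other scripts.

```python
import unicodedata


def non_latin_letters(identifier: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each letter that is not a Latin letter."""
    findings = []
    for index, ch in enumerate(identifier):
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            if not name.startswith("LATIN"):
                findings.append((index, name))
    return findings


print(non_latin_letters("paypal"))       # []
print(non_latin_letters("p\u0430ypal"))  # [(1, 'CYRILLIC SMALL LETTER A')]
```

The second identifier renders almost identically to the first in most fonts, yet a compiler treats it as a different name.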

Martin Lee, EMEA outreach manager for Cisco Talos, commented to The Register: "Managing security risk is all the more difficult when threat actors are able to compromise source code, or software update systems, in order to integrate malicious functionality within otherwise legitimate software.

"This research underlines the fact that threat actors may bypass even the most secure perimeter defences. Organisations need to be constantly vigilant for evidence of incursion using both endpoint and network based security systems." ®

Updated to add

Interestingly, Atlassian issued a security advisory for CVE-2021-42574 affecting a collection of its products, from Confluence to Jira, with multiple software updates to address the issue.

"A vulnerability has been identified affecting multiple Atlassian products where special characters, known as Unicode bidirectional override characters, are not rendered or displayed in the affected applications," the IT giant said.

"These special characters are typically not displayed by the browser or code editors but can affect the meaning of the source code when it is processed by a compiler or an interpreter."


Boucher and Anderson's paper observes: "When writing vulnerability disclosures, descriptions that personalise the potential impact can be needed to drive action. Neutral disclosures like those found in academic papers are less likely to evoke a response than disclosures stating that named products are immediately at risk".

We reserve the right to arbitrarily rename the next security discovery FLAMINGHELLDEATHPWNAGE. Tenders will be issued in due course for design of a logo and procurement of a snappy domain name.

Biting the hand that feeds IT © 1998–2021