Get coding or you'll bounce email from new dot-thing domains
Expansion in DNS means you may struggle to handle email from Chinese or Arabic domains
Sysadmins need to update the code they use to handle email addresses to keep up with changes to the global DNS, or they risk losing customers and reputation.
That's according to a number of domain experts meeting at the NamesCon conference in Las Vegas this week.
Representatives from Google, ICANN, DotAsia and the Domain Name Association (DNA), among others, met to discuss the issue of "universal acceptance" and figure out how to reach the millions of code bases out there on the internet that are using outdated and potentially buggy code.
At the heart of the issue is the expansion of the internet both in terms of top-level registries and global users. Among the hundreds of new top-level domains are several dozen in non-Latin scripts: Arabic, Chinese, parts of French and the like.
Increasingly these registries and their domains are not only being used in browsers but for email. And that is causing serious problems.
Who the hell configured this?
Most systems, including mail servers, accept email addresses only after some level of checking aimed at verifying that each one is an actual email address. Anything from simply checking there is an '@' symbol, to limiting entries to particular characters, to running a whitelist that excludes anything that isn't a recognized domain (some even exclude free email domains in an effort to limit spamming).
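To make the range of checks concrete, here is a minimal Python sketch. The regular expression is a hypothetical example of the kind of old-style pattern being described, not code taken from any particular mail system, and the addresses used with it are illustrative only.

```python
import re

# Hypothetical "old-style" pattern: ASCII characters only, and a
# top-level domain limited to two to four letters.
LEGACY_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")


def legacy_check(address: str) -> bool:
    """Strict check of the kind that rejects newer domains."""
    return LEGACY_PATTERN.match(address) is not None


def lenient_check(address: str) -> bool:
    """Only require a non-empty part on each side of the last '@'."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    # An empty label at either end of the domain is never valid.
    return not domain.startswith(".") and not domain.endswith(".")


if __name__ == "__main__":
    for addr in ("user@example.com",
                 "user@example.photography",
                 "user@example.xn--fiqs8s"):
        print(addr, legacy_check(addr), lenient_check(addr))
```

The strict pattern accepts the first address and rejects the other two, because the long new ending and the punycode ending both fall outside its assumptions. The lenient check accepts all three, but on its own it does nothing about the spoofing risks described further down.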
The issue of resolving websites with new internet domains in browsers has been largely fixed thanks to Mozilla's public suffix list and Chrome referring to the authorized list of top-level domains published by IANA.
But internationalized domain names (IDNs) present a more difficult problem since they are sent in punycode and start with the prefix "xn--", which means that any software handling domains has to reference an IDN repository to know how to display them.
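As a rough illustration of what the "xn--" form is, the sketch below converts a domain between its Unicode and punycode spellings using the "idna" codec that ships with Python. The helper names and the example domain are ours. Note that this built-in codec follows the older IDNA 2003 rules; software that needs the current IDNA 2008 rules would typically use a separate library.

```python
def to_ascii(domain: str) -> str:
    """Encode a Unicode domain name into its ASCII 'xn--' form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Decode an ASCII 'xn--' domain name back to Unicode for display."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_ascii("b\u00fccher.example"))      # xn--bcher-kva.example
    print(to_unicode("xn--bcher-kva.example"))  # the original Unicode name
```

Converting is the easy part. Deciding whether it is safe to show the Unicode form to a user is a separate policy question.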
Emails are one step further in complication since rules need to be developed both for the username before the '@' symbol and the domain ending. Since a lot of email coding pre-dates the introduction of IDNs, systems will often simply reject emails that don't work within the old DNS model - refusing to allow dashes for example.
Some systems have coding that can handle the domain part of an email address but not the username (if presented as an IDN) - mostly because there still isn’t a standard to encode that part of an email.
But as more and more people come online and more non-English users start adopting emails in their own language, the failure to accept someone's email address as valid will start leading to a loss in global customers and will make you look silly.
It isn't as simple as allowing anything in the email box, however. A famous example of a potential security issue is the use of Cyrillic to create what looks exactly like a Latin-script "a" but is in fact a completely different character (there are many other examples). That means, for example, that the domain name "raural.com" in Cyrillic can look exactly like "paypal.com" in Latin. The opportunities for spoofing and phishing are obvious.
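A few lines of Python show the lookalike problem at the character level. This is only a demonstration using the standard unicodedata module; the describe helper is our own name.

```python
import unicodedata

LATIN_A = "a"          # U+0061
CYRILLIC_A = "\u0430"  # U+0430, drawn identically in most fonts


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    print(describe(LATIN_A))       # U+0061 LATIN SMALL LETTER A
    print(describe(CYRILLIC_A))    # U+0430 CYRILLIC SMALL LETTER A
    print(LATIN_A == CYRILLIC_A)   # False
```

To a person the two characters are indistinguishable on screen, but to software they are simply different strings, so a domain built from one will never match a domain built from the other.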
Practice makes perfect
And so, explained DotAsia CEO Edmon Chung, a group of domain experts is developing best-practice guidelines and policies that will help mitigate potential problems - for example, code that will not accept a mix of scripts in the same email address or domain.
That would be a start-point, explained Chung, but it needs refinement since there are legitimate uses for mixed scripts in both Japanese and Chinese, not to mention the fact that many people use Latin usernames with IDN domains.
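What a "no mixed scripts" rule with exceptions might look like is sketched below in Python. This is not the group's guideline, which the article does not spell out. It assumes that the first word of a character's Unicode name is a good enough stand-in for its script, and the list of permitted combinations is a hypothetical one covering ordinary Japanese writing. Each part of the address is checked on its own, so a Latin username with an IDN domain passes.

```python
import unicodedata

# Hypothetical allow-list: Japanese text legitimately mixes these three.
ALLOWED_MIXES = [{"CJK", "HIRAGANA", "KATAKANA"}]


def scripts_in(text: str) -> set:
    """Approximate the scripts used, from the first word of each letter's name."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_suspicious_label(label: str) -> bool:
    """Flag a label that mixes scripts outside the allow-list."""
    found = scripts_in(label)
    if len(found) <= 1:
        return False
    return not any(found <= allowed for allowed in ALLOWED_MIXES)


def is_suspicious_address(address: str) -> bool:
    """Check the username and each domain label separately."""
    local, _, domain = address.rpartition("@")
    return any(is_suspicious_label(part) for part in [local] + domain.split("."))


if __name__ == "__main__":
    print(is_suspicious_address("user@p\u0430ypal.com"))  # Cyrillic "a" in a Latin label
    print(is_suspicious_address("customer.care@\u4f8b\u3048.\u30c6\u30b9\u30c8"))
```

The first address is flagged and the second is not. A label written entirely in one lookalike script would still slip through, which is part of why a simple rule like this needs the refinement Chung describes.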
Google has been digging into this issue for some time. Gmail is one of the very few email services that accept full IDN email addresses. It currently deals with email addresses that don't follow typical patterns by showing an alert box. But, as program manager Brent London pointed out, there are additional complications.
For example, Arabic is written right-to-left. Systems set up to handle Arabic script typically switch when the first Arabic character is entered. But with a mixed script email address such as 'customer.care@[IDN domain].IDN', the system needs to be able to handle both left-to-right and right-to-left scripting.
This provides another potential security hole: if someone registers the domain name "customer.care" in this scenario they could type in the Arabic script first (username), causing an email system to switch to right-to-left and then put in the domain "customer.care". It would appear in the box as though "customer.care" was the username on the left-hand side of the email address. The only difference would be that the whole address would be aligned right.
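One way software could at least notice this situation is to look at the direction of the characters in an address. The Python sketch below uses the bidirectional classes from the standard unicodedata module. The function names and the Arabic sample string are our own, and what to do with a flagged address (an alert, or showing the username and domain in separate fields) is left open.

```python
import unicodedata

# Unicode bidirectional classes for strongly right-to-left characters.
RTL_CLASSES = {"R", "AL"}


def directions(text: str) -> set:
    """Return which strong directions ('ltr', 'rtl') appear in the text."""
    found = set()
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            found.add("ltr")
        elif bidi in RTL_CLASSES:
            found.add("rtl")
    return found


def mixes_directions(address: str) -> bool:
    """True when an address contains both left-to-right and right-to-left text."""
    return directions(address) == {"ltr", "rtl"}


if __name__ == "__main__":
    arabic_username = "\u0645\u062b\u0627\u0644"  # sample Arabic letters
    print(mixes_directions("customer.care@example.com"))           # False
    print(mixes_directions(arabic_username + "@customer.care"))    # True
```

The check is deliberately blunt: it also flags perfectly legitimate addresses that pair a Latin username with an Arabic domain, so it is better suited to triggering a warning than to rejecting mail.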
While London noted that due to the relatively small take-up of IDN emails at the moment this is not a big security risk, it is something that is likely to get increasingly problematic over time.