Changes to the way string comparisons work in the soon-to-be-released .NET 5.0 may break existing code on Windows.
The issue came to light when developer Jimmy Bogard was upgrading a library to support .NET 5.0 and noticed a test failing.
His code could find a string within another string using the Contains method (which answers the question "is the searched string contained in the target string?"), but when he used the IndexOf method to find the location of the string, it returned -1, meaning not found. On .NET Core 3.0 or 3.1 on Windows, however, the same code worked as expected and IndexOf returned the location of the searched string. The apparent anomaly occurs only with strings that contain special characters such as carriage returns, or perhaps certain diacritical marks.
He raised the issue on GitHub and a Microsoft engineer informed him: "This is by design as in .NET 5.0 we have switched using ICU instead of NLS."
ICU refers to International Components for Unicode, a widely used set of open-source Unicode and globalization libraries, and NLS refers to National Language Support, the Windows globalization APIs. The Contains method performs an ordinal comparison, which is case-sensitive but culture-insensitive, while the IndexOf(string) method is culture-sensitive, meaning that some characters, such as soft hyphens, are ignored and others may be considered equivalent. A culture-sensitive comparison that does not specify a culture uses the current culture, which depends on the settings of the machine and user running the code, so its results can be hard to predict.
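A minimal C# sketch of the discrepancy follows. It assumes a sample string containing a Windows line ending ("\r\n"); the article does not give the exact string from Bogard's test, so the input is illustrative. The culture-sensitive result depends on the runtime, operating system and current culture, so only the ordinal results are fixed.

```csharp
using System;

// Illustrative input (an assumption, not the string from the original test):
// a string containing a Windows line ending.
string s = "Hello\r\nWorld!";

// String.Contains(string) performs an ordinal comparison: it matches UTF-16
// code units exactly, regardless of culture.
bool contains = s.Contains("\n");

// String.IndexOf(string) with no StringComparison argument is culture-sensitive
// and uses the current culture. Under ICU (.NET 5.0 on Windows) "\r\n" is treated
// as a single unit, so this can return -1; under NLS (.NET Core 3.x on Windows)
// the match is found at index 6.
int cultureIndex = s.IndexOf("\n");

// Passing StringComparison.Ordinal makes IndexOf agree with Contains on every platform.
int ordinalIndex = s.IndexOf("\n", StringComparison.Ordinal);

Console.WriteLine($"Contains: {contains}");
Console.WriteLine($"IndexOf, current culture: {cultureIndex}");
Console.WriteLine($"IndexOf, ordinal: {ordinalIndex}");
```

The only change needed to make the two calls consistent is the explicit StringComparison argument; the input string stays the same.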
Microsoft recommends that developers "use StringComparison.OrdinalIgnoreCase for comparisons as your safe default for culture-agnostic string matching".
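Following that advice means passing the comparison explicitly. A short sketch, using the same illustrative string as a stand-in for real input:

```csharp
using System;

// Illustrative input, not taken from the original report.
string s = "Hello\r\nWorld!";

// OrdinalIgnoreCase compares UTF-16 code units, ignoring case with
// culture-independent casing rules, so results are the same on every machine.
bool containsIgnoreCase = s.Contains("world", StringComparison.OrdinalIgnoreCase);
int indexIgnoreCase = s.IndexOf("WORLD", StringComparison.OrdinalIgnoreCase);

// The same choice applies to equality checks, e.g. for file extensions.
bool equalIgnoreCase = string.Equals("file.TXT", "FILE.txt", StringComparison.OrdinalIgnoreCase);

Console.WriteLine(containsIgnoreCase); // True
Console.WriteLine(indexIgnoreCase);    // 7
Console.WriteLine(equalIgnoreCase);    // True
```

Ordinal comparisons suit identifiers, file names and protocol text; culture-sensitive comparison remains the better fit for text shown to users, such as sorted lists.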
We were able to reproduce the .NET 5.0 string puzzle, in which Contains says Yes and IndexOf says No. Results differ on earlier versions of .NET Core on Windows.
While that sounds fair enough, how many developers have used the IndexOf method without appreciating these complexities? Such code may break when upgraded from .NET Core 3.1 to .NET 5.0, and worse, the breakage may go unnoticed unless unit tests cover the subset of string comparisons that behave differently.
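Tests of that kind need inputs drawn from the strings that behave differently. The hypothetical check below (the sample pairs and the FindMismatches and Escape helpers are illustrative, not part of any library) flags inputs where ordinal Contains and culture-sensitive IndexOf disagree. Which pairs it reports depends on the runtime, operating system and current culture.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical (haystack, needle) pairs. The first two contain characters whose
// culture-sensitive handling differs from an ordinal match; the last is plain ASCII.
var samples = new (string Haystack, string Needle)[]
{
    ("Hello\r\nWorld!", "\n"),     // CR LF: ICU treats "\r\n" as one unit
    ("co\u00ADoperate", "coop"),   // U+00AD soft hyphen, typically ignored by culture-sensitive search
    ("plain ASCII text", "ASCII"),
};

// Returns every pair where ordinal Contains and culture-sensitive IndexOf
// disagree about whether the needle is present.
List<(string Haystack, string Needle)> FindMismatches(
    IEnumerable<(string Haystack, string Needle)> pairs)
{
    var found = new List<(string Haystack, string Needle)>();
    foreach (var (haystack, needle) in pairs)
    {
        bool foundOrdinal = haystack.Contains(needle);       // ordinal
        bool foundCultural = haystack.IndexOf(needle) >= 0;  // current culture
        if (foundOrdinal != foundCultural)
        {
            found.Add((haystack, needle));
        }
    }
    return found;
}

var report = FindMismatches(samples);
foreach (var item in report)
{
    Console.WriteLine($"Mismatch: {Escape(item.Needle)} in {Escape(item.Haystack)}");
}

// Makes control and invisible characters readable in the report.
static string Escape(string value) =>
    value.Replace("\r", "\\r").Replace("\n", "\\n").Replace("\u00AD", "\\u00AD");
```

Running such a check on each target runtime before and after an upgrade is one way to surface the differences rather than discover them in production.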
"It is not right to compare the results of IndexOf without the StringComparison parameters," said a Microsoft engineer, but applications may do things that are "not right" and work perfectly for years.
On the positive side, the move to ICU on both Windows and Linux means that cross-platform code which previously behaved differently on the two should now behave the same.
That said, developers hate breaking changes, and the discovery of this one (which does not seem to have been flagged prominently by Microsoft before now) raises worries that there may be other obscure behavioural differences.
"I'm just not excited at the prospect of .NET 5 introducing a new crop of unknown unknowns and revisiting those fixes for not just my libraries, but our downstream dependencies too. That's significant economic cost to us that doesn't create new productivity improvements for our users," said a library author. Some are requesting an analyzer to uncover such issues before recompiling an application or library.
As another developer remarked: "Someone who is doing informal string munging, indifferent to the obscurities of characters, grapheme clusters, or locale, would take it as given that if str.Contains(whatever) succeeds, there is no need to inspect the result from str.IndexOf(whatever) because we were just told it is in there and therefore can be found." That does not seem unreasonable.
There is a workaround for this particular issue: developers can set an option in the project (System.Globalization.UseNls) or an environment variable (DOTNET_SYSTEM_GLOBALIZATION_USENLS) to continue using NLS with .NET 5.0. As is so often the case, though, the key thing is not how you solve the problem, but how you discover it. ®