Keen to go _ExtInt? LLVM Clang compiler adds support for custom width integers
'Standard C integer types are incredibly wasteful' for certain types of programming
Erich Keane, a compiler frontend engineer at Intel, has committed a patch to the LLVM Clang project that enables custom-width integers, such as 31-bit, 3-bit or 512-bit.
The assumption of power-of-two integer sizes is baked into computing and into the C language. "Historically these types have been sufficient for nearly all programming architectures, since power-of-two representation of integers is convenient and practical," said Keane.
There is a problem, though, when it comes to configuring FPGA (Field Programmable Gate Array) chips – integrated circuits designed to be customised for specific applications. Tools called High Level Synthesis Compilers are used to generate transistor layouts for FPGAs, he added, but standard C integer types are "incredibly wasteful."
"A vast majority of the time programmers are not using the full width of their integer types," he said.
It is true: most of the time the numbers stored are far smaller than the maximum a 32-bit integer allows (2,147,483,647 for a signed int, for example). This does not normally make any difference, since the CPU is designed for those types, but "on FPGAs logic gates are an incredibly valuable resource, and HLS compilers should not be required to waste bits on large power of two integers when they only need a small subset of that!" Keane said.
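To put a rough number on that waste, here is a minimal sketch in portable C that counts how many bits a value actually needs and how many are left idle in a 32-bit container. The helper names bits_needed and bits_unused are hypothetical, invented for this illustration, and are not part of Keane's patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: minimum number of bits needed to store `value`
   as an unsigned quantity. Zero is counted as needing one bit. */
unsigned bits_needed(uint32_t value) {
    unsigned bits = 1;
    while ((value >>= 1) != 0) {
        ++bits;
    }
    return bits;
}

/* Hypothetical helper: bits left idle when the largest value ever stored
   is `max_value` and the container is `container_bits` wide. */
unsigned bits_unused(uint32_t max_value, unsigned container_bits) {
    unsigned needed = bits_needed(max_value);
    assert(needed <= container_bits);
    return container_bits - needed;
}
```

A counter that never exceeds 100, for instance, needs 7 bits, so bits_unused(100, 32) reports 25 idle bits. On a CPU that costs nothing; on an FPGA each of those bits may translate into logic that does no useful work.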
The C language also promotes operations on types smaller than int to operations on int, he said. The result of these two factors is "massively larger FPGA/HLS programs than the programmer needed, and likely much larger than they intended. Worse, there was no way for the programmer to express their intent in the cases where they do not need the full width of a standard integer type."
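The promotion rule can be seen in standard C without any FPGA tooling. The sketch below uses hypothetical function names (add_promoted, add_narrowed, promoted_size) and assumes a typical platform where int is wider than 8 bits, which the C standard guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Both uint8_t operands are promoted to int before the addition,
   so the sum is computed at int width and can exceed 255. */
int add_promoted(uint8_t a, uint8_t b) {
    return a + b;               /* the expression a + b has type int */
}

/* The same addition, explicitly narrowed back to 8 bits.
   The conversion wraps modulo 256. */
uint8_t add_narrowed(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Size in bytes of the expression a + b for 8-bit operands. */
size_t promoted_size(void) {
    uint8_t a = 0, b = 0;
    return sizeof(a + b);
}

/* Sanity checks for the comments above. */
void check_promotion(void) {
    assert(add_promoted(200, 100) == 300);
    assert(add_narrowed(200, 100) == 44);   /* 300 mod 256 */
    assert(promoted_size() == sizeof(int));
}
```

Even though the programmer only asked for 8-bit values, the arithmetic itself happens at int width. A compiler targeting a CPU does not care, but an HLS compiler that follows the language rules literally would have to build the wider adder.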
The LLVM-IR (Intermediate Representation) assembly language can represent integers of any bitwidth between 1 and 16,777,215, so the patch enables coders to use the new _ExtInt class of types, which translate into the corresponding LLVM-IR types. For example, "unsigned _ExtInt(9) foo;" declares a variable foo that is an unsigned integer type taking up 9 bits and represented as an i9 in LLVM-IR, said the Intel engineer.
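For readers without a Clang build that includes the patch, the arithmetic of a 9-bit unsigned integer can be approximated in portable C by masking. This is only an emulation sketch: wrap_to_width and add_u9 are hypothetical helpers invented here, and the result still occupies a full 32-bit variable, which is exactly the waste _ExtInt is meant to avoid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep only the low `bits` bits of x.
   Valid for 1 <= bits <= 32. */
uint32_t wrap_to_width(uint32_t x, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    if (bits == 32) {
        return x;               /* avoid shifting by the full type width */
    }
    return x & ((UINT32_C(1) << bits) - 1u);
}

/* Hypothetical 9-bit unsigned add: the result wraps modulo 2^9 = 512,
   which is how an unsigned 9-bit integer would be expected to behave.
   With the patch, the article's declaration would be:
       unsigned _ExtInt(9) foo;                                        */
uint32_t add_u9(uint32_t a, uint32_t b) {
    return wrap_to_width(a + b, 9);
}
```

Here add_u9(511, 1) wraps to 0 and add_u9(300, 300) gives 88, since 600 minus 512 is 88. The difference with a real custom-width type is that the compiler, rather than the programmer, knows the width and can size the hardware accordingly.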
This is the fourth attempt since 2017 to implement this feature, which was requested by Intel's FPGA group, and the work is "very far from over", Keane said, since it will be subject to approval by the ISO/IEC WG14 standards committee, which specifies the C programming language. A paper [PDF] has been submitted and received "near unanimous support" at the Spring WG14 committee meeting. It could potentially be approved, with amendments, at the next WG14 meeting (set for October 2020 in Minneapolis, COVID-19 allowing), when it would become part of the language. The committee is rightly cautious about adding stuff to C so it might not happen, or it might be delayed.
Is this feature of any use outside FPGA programming? A discussion on Hacker News is inconclusive. "Arbitrary bit-width integers are great for writing computer emulator code," one commenter noted, though they observed that you could use Zig, which already supports arbitrary bit-width integers up to 128 bits.
C developers with an opinion on _ExtInt are invited to contact Keane or other WG14 committee members with their views.