Any fool can write a language: It takes compilers to save the world

The language wars were fun, but they're done

Opinion Here's a recipe for happiness. Don't get overexcited by the latest "C is not a language" kerfuffle.

Proper coders have known since its inception that C is as much a glorified library of assembler macros as anything else. Don't sweat it. That business with operating systems being infected by their old C genes, crippling all the new cool Rusts and Swifts? So what? If your code is limited by its OS interactions, you should probably go write a kernel.

There is one place, and one place only, where you should invest your emotional and intellectual energies. Compilers. They saved the world once, and they're about to do it again.

For most young code-slingers, compilers are slightly mystical interfaces to the data divinities. If things are good, the completion messages flow down the terminal window like a high priest's blessings on the faithful. If you've done bad, the compiler heaps fire and brimstone on you, and you must repent. When judgement day comes, your executable is your reward.

Graybeards know compiler technology is much more complex than simple go/no-go code parsing. In the 70-odd years since Grace Hopper brought compilers into existence, every milestone in processor design has been blessed or cursed by compiler advances or the lack of them. Itanium died because Intel couldn't make the compiler fly. Arm prospered because its compilers made its performance available.

Only compilers can unify the market: C wouldn't be more than a footnote if its compiler technology weren't easy to port to multiple platforms. Conversely, compilers can divide the market. Think of the 1980s, with multiple monolithic C compilers trying to lock products into platforms by making porting painful. They weren't very good, they weren't very compatible, and life was slower and harder than it needed to be.

It is hard to overstate the difference that GCC made to that environment. Compilers can be split into three components: a front end, an optimizer, and a back end. The front end takes your code and digests it into a form the optimizer can then arrange according to rules, and the back end translates that output into the right object format for the target system. GCC liberated this structure, so if you wanted to write a new language, you just had to worry about the front end. A new target instruction set, just the back end. A particular cache structure, hit the optimizer. The benefits are shared with everyone.
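That three-part split can be sketched in a few lines. What follows is a toy under loud assumptions, not how GCC organises its internals: the tuple-based intermediate form, the stack-machine instruction names (PUSH, LOAD, ADD, MUL) and the function names front_end, optimize, back_end_stack and compile_expr are all invented for illustration. The front end borrows Python's standard ast module to parse an arithmetic expression, the optimizer folds constants, and the back end emits instructions for an imaginary stack machine.

```python
import ast

# Map the Python AST operator types we accept onto our toy IR opcodes.
_OPS = {ast.Add: "add", ast.Mult: "mul"}


def front_end(source):
    """Front end: source text -> a tiny IR made of nested tuples."""
    def lower(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return ("num", node.value)
        if isinstance(node, ast.Name):
            return ("var", node.id)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return (_OPS[type(node.op)], lower(node.left), lower(node.right))
        raise ValueError("unsupported syntax")
    return lower(ast.parse(source, mode="eval").body)


def optimize(ir):
    """Optimizer: fold operations whose operands are both constants."""
    if ir[0] in ("num", "var"):
        return ir
    op, left, right = ir[0], optimize(ir[1]), optimize(ir[2])
    if left[0] == "num" and right[0] == "num":
        value = left[1] + right[1] if op == "add" else left[1] * right[1]
        return ("num", value)
    return (op, left, right)


def back_end_stack(ir):
    """Back end: IR -> instructions for a hypothetical stack machine."""
    if ir[0] == "num":
        return [f"PUSH {ir[1]}"]
    if ir[0] == "var":
        return [f"LOAD {ir[1]}"]
    return back_end_stack(ir[1]) + back_end_stack(ir[2]) + [ir[0].upper()]


def compile_expr(source, back_end=back_end_stack):
    """Chain the three stages; the back end is swappable per target."""
    return back_end(optimize(front_end(source)))


print(compile_expr("x * (2 + 3)"))  # ['LOAD x', 'PUSH 5', 'MUL']
```

Supporting a new language means writing another front_end that produces the same tuples; a new target means passing a different back_end. The optimizer in the middle never has to know about either, which is the whole trick.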

That's where the true magic of compilerdom lies. You can have the world's most thoughtfully designed language, but if your compiler makes slow code, nobody will care. With GCC, you got the same kit of parts as everyone else and you could concentrate on what makes your language or processor special. When GCC stopped being the GNU C Compiler and became the GNU Compiler Collection, the economics and scope of compilers skyrocketed, and systems innovation followed them into orbit.

Then LLVM went a step further and broke compiler technology into a set of libraries, meaning that some of the things compilers do very well, such as optimally managing data structures and parsing complex commands, can be added into databases and browsers and all manner of applications. Sure, languages are cool, but a complete architectural component model is cooler. Do you want magic in a bottle or to badly reinvent the wheel?

This isn't history. This is the future. Ignore what Intel says: Moore's Law is dead. Single program performance is done. The industry is moving to accelerators, task-specific designs around heterogeneous multi-core, complex memory, massively parallel concepts.

They're all different; the code you write for one won't port to another, or even to the next version. They are domain-specific architectures needing domain-specific languages, which the accelerator makers build, and you take what you're given. We're back in the 1980s C compiler hellscape. Only this time, it's like that for all the languages.

It's all very well for the C-spiters to say that your computer is not a souped-up PDP-11. But their computer increasingly isn't even von Neumann or Harvard, with a nice unified memory space and homogeneous machine code. Their language assumptions are wrong and getting worse.

The good news is we're learning that massively parallel systems have much in common. They have control CPUs looking after memory hierarchies, tiling, security, and power management that do the same jobs whether you're accelerating TensorFlow or 5G.

If those can be abstracted sensibly, they can be handled in a common way – and indeed, a new outgrowth of LLVM, Multi-Level Intermediate Representation (MLIR), is a project doing exactly that, bringing the proven miracle of open-source compilation right back where it's needed. As for the instruction set for the hardware, it'd be great if that too were open source and highly modular with a rich developmental ecosystem.

RISC-V, anyone? The synergies between that and LLVM have been there from the start.

Most of this is still to come, but it's a way forward that has all the power of history behind it and an economic and technical potential for the industry way above any alternative. By all means, enjoy the language wars; they're pleasant, harmless LARPery. Compiling the future, however, is where the real fun's at. ®
