The future of AI is ... analog? Upstart bags $100M to push GPU-like brains on less juice

EnCharge claims 150 TOPS/watt, a 20x performance-per-watt edge

Interview AI chip startup EnCharge claims its analog artificial intelligence accelerators could rival desktop GPUs while using just a fraction of the power. Impressive, on paper at least. Now comes the hard part: proving it in the real world.

The outfit boasts it has developed a novel in-memory compute architecture for AI inferencing that replaces transistor-based multiply-accumulate logic with analog capacitors to achieve a claimed 20x performance-per-watt advantage over digital accelerators, like GPUs.

According to CEO Naveen Verma, EnCharge's inference chip delivers 150 TOPS of AI compute at 8-bit precision on just one watt of power. Scale it up to 4.5 watts, and Verma claims it could match desktop GPUs — but with 1/100th the power draw. That's the pitch, at least.
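Taking those claimed figures at face value, the arithmetic works out roughly as follows. Note the GPU comparison point below is an illustrative assumption, not a spec quoted by EnCharge:

```python
# Back-of-envelope check on EnCharge's claimed efficiency figures.
# The GPU power figure is a hypothetical illustration for comparison.

encharge_tops_per_watt = 150   # claimed INT8 efficiency
scaled_power_w = 4.5           # claimed scaled-up operating point

scaled_tops = encharge_tops_per_watt * scaled_power_w
print(f"Scaled chip: {scaled_tops:.0f} TOPS at {scaled_power_w} W")  # 675 TOPS

# A hypothetical desktop GPU with comparable INT8 throughput at ~450 W:
gpu_power_w = 450
print(f"Power ratio vs GPU: {gpu_power_w / scaled_power_w:.0f}x")    # 100x
```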

However, this isn't all theoretical. EnCharge's chips were spun out of Verma's lab at Princeton where they were developed with support from the United States Defense Advanced Research Projects Agency, aka DARPA, and Taiwanese chip factory giant TSMC. Verma told us that the biz has now taped out several test chips to prove the architecture can work.

"The products we're building are actually based on a fundamental technology that came out of my research lab," he said. "We've really had an opportunity here to look at, fundamentally, what are the challenges with AI compute."

With $100 million in new series-B funding from Tiger Global, RTX, and others, EnCharge plans to tape out its first production chips for mobile, PCs, and workstations later this year.

Verma claims the real difference is in how and where the chip handles computation. The vast majority of genAI compute today is done using many, many multiply-accumulate units, or MACs for short.

In traditional architectures, these are built using billions of transistor gates, which ultimately operate on discrete values due to the way the numbers are represented using binary ones and zeroes. Verma argues this approach can be improved upon, and made more efficient and precise, by using continuous values rather than discrete ones.
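For readers unfamiliar with the operation in question, a MAC simply multiplies pairs of numbers and sums the results — the core of every matrix multiply in a neural network. A minimal sketch:

```python
# A multiply-accumulate (MAC) unit computes sum(a_i * w_i) over pairs of
# activations and weights -- the workhorse operation behind the matrix
# multiplies in neural networks. Digital hardware does this with binary
# arithmetic on discrete values; EnCharge's pitch is doing the accumulate
# step with continuous analog signals instead.

def mac(activations, weights):
    acc = 0
    for a, w in zip(activations, weights):
        acc += a * w  # one multiply-accumulate step per pair
    return acc

print(mac([1, 2, 3], [4, 5, 6]))  # 4 + 10 + 18 = 32
```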

Thus, EnCharge's MACs are built using analog capacitors, which can represent arbitrary continuous signal values based on their charge level. And because capacitors are basically just two conductors separated by a dielectric material, they can easily be etched into silicon using existing CMOS technologies, Verma said.

The second element of EnCharge's design is that analog computation is handled in memory.

In-memory compute is by no means a new concept, and several companies have been working to commercialize AI accelerators based on it for years. The idea is that by embedding compute — often in the form of a bunch of math circuits — into the memory itself, matrix operations can be carried out in place rather than having to shuttle data back and forth all the time.

With EnCharge's design, the analog capacitors are now responsible for carrying out this calculation, by adding up the charges.

"When you drive any one of these capacitors, the output of the capacitive line that's coupled basically goes to the average value of the signal," he said. "An average is an accumulation. It should normalize to the number of terms you're averaging."

Achieving this took eight years of research and development, and involved not only the development of an in-memory analog matrix accumulate unit, but also all the other stuff necessary to make them programmable.

"We recognized that what you have to do when you have these fundamental technology breakthroughs, is also build a full architecture, and build all of the software," Verma said.

And speaking of programmability, EnCharge's chip supports a variety of AI workloads ranging from convolutional neural networks to the transformer architectures behind large language and diffusion models.

As an inference chip, the design will vary depending on the target workload. For some workloads, factors such as memory capacity and bandwidth may have a bigger impact on performance than raw compute.

Large language models, for example, tend to be heavily memory bound with memory capacity and bandwidth often having a larger impact on perceived performance than the number of TOPS it can churn out. So, Verma says, an EnCharge chip targeting those kinds of workloads might dedicate less die area to compute to make room for a bigger memory bus.

On the flip side, for something like diffusion models – which aren't nearly as memory-bound – you might want more compute in order to generate images faster.
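A rough back-of-envelope shows why the memory wall dominates for token generation: each generated token must read every weight once. All the figures below are hypothetical examples chosen for illustration; only the 150 TOPS number comes from EnCharge's claims:

```python
# Illustrative arithmetic for why LLM token generation is memory-bound.
# Model size and bandwidth are assumed example figures, not specs.

params = 7e9                 # 7B-parameter model
bytes_per_param = 1          # 8-bit weights
mem_bandwidth = 100e9        # 100 GB/s memory bus (assumed)
compute_ops = 150e12         # 150 TOPS of INT8 compute (claimed figure)

bytes_per_token = params * bytes_per_param
ops_per_token = 2 * params   # one multiply + one add per weight

time_memory = bytes_per_token / mem_bandwidth   # seconds bound by bandwidth
time_compute = ops_per_token / compute_ops      # seconds bound by compute

print(f"memory-limited:  {time_memory * 1e3:.1f} ms/token")
print(f"compute-limited: {time_compute * 1e3:.2f} ms/token")
# The memory term dominates by orders of magnitude, so extra TOPS buy
# little here; a wider memory bus helps far more.
```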

For now, EnCharge is sticking to M.2 and PCIe add-in cards for ease of adoption. We've previously seen lower-power accelerators packaged in these form factors, like Google's Coral TPU and Hailo's NPUs.

In the long run, the technology could be adapted for larger, higher-wattage applications, Verma said. "Fundamentally, the ability to grow to 75 watt PCIe cards and so on is all there."

The initial batch of production EnCharge chips is expected to tape out later this year, though Verma notes it'll take a little while longer before they see widespread adoption as the startup works to integrate the chips into its customers' designs and build out its software pipeline. ®
