Apple releases OpenELM, a slightly more accurate LLM

It's not the fastest machine learning model, but you can't have everything

Apple, not normally known for its openness, has released a generative AI model called OpenELM which apparently outperforms a set of other language models trained on public data sets.

It's not by much: compared to OLMo, which debuted in February, OpenELM is 2.36 percent more accurate while using half as many pretraining tokens. But it's perhaps enough to remind people that Apple is no longer content to be the wallflower at the industry AI rave.

Apple's claim to openness comes from its decision to release not just the model, but its training and evaluation framework.

"Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations," explain eleven Apple researchers in the associated technical paper.

And diverging from academic practice, the authors' email addresses are not listed. Chalk it up to Apple's interpretation of openness, which is somewhat comparable to the not-very-open OpenAI.

The accompanying software release is not covered by a recognized open source license. The terms are not unduly restrictive, but they do make clear that Apple reserves the right to file a patent claim if any derivative work based on OpenELM is deemed to infringe on its rights.

OpenELM utilizes a technique called layer-wise scaling to allocate parameters more efficiently in the transformer model. So instead of each layer having the same set of parameters, OpenELM's transformer layers have different configurations and parameters. The result is better accuracy, shown in the percentage of correct predictions from the model in benchmark tests.
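For the curious, the idea can be sketched in a few lines of Python. This is a minimal illustration, assuming each layer's attention head count and feed-forward width are linearly interpolated between a minimum and maximum multiplier across the depth of the network. The function name, the parameter names, and the numbers in the usage line are all hypothetical and chosen for readability; they are not OpenELM's actual configuration or code.

```python
def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha_min, alpha_max, beta_min, beta_max):
    """Return a per-layer config where width grows with depth.

    alpha_* scale the number of attention heads, beta_* scale the
    feed-forward (FFN) hidden size. All names here are illustrative.
    """
    configs = []
    for i in range(num_layers):
        # Position of this layer in [0, 1]; a single layer sits at 0.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        alpha = alpha_min + (alpha_max - alpha_min) * t
        beta = beta_min + (beta_max - beta_min) * t
        configs.append({
            "layer": i,
            # At least one head, rounded to the nearest whole head.
            "num_heads": max(1, round(alpha * d_model / head_dim)),
            "ffn_dim": round(beta * d_model),
        })
    return configs


# Illustrative values only: four layers, early ones narrow, later ones wide.
example = layerwise_scaling(num_layers=4, d_model=1024, head_dim=64,
                            alpha_min=0.5, alpha_max=1.0,
                            beta_min=1.0, beta_max=4.0)
for cfg in example:
    print(cfg)
```

A uniform transformer would be the special case where the minimum and maximum multipliers are equal, so every layer ends up with the same head count and feed-forward width.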

We're told that OpenELM was pre-trained using the RedPajama dataset (drawn from GitHub, a ton of books, Wikipedia, StackExchange posts, ArXiv papers, and more) and the Dolma set (drawn from Reddit, Wikibooks, Project Gutenberg, and more). The model can be used as you might expect: You give it a prompt, and it attempts to answer or auto-complete it.

One noteworthy aspect of the release is that it is accompanied by "code to convert models to MLX library for inference and fine-tuning on Apple devices."

MLX is a framework released last year for running machine learning on Apple silicon. The ability to operate locally on Apple devices, rather than over the network, should make OpenELM more interesting to developers.

"Apple's OpenELM release marks a significant advancement for the AI community, offering efficient, on-device AI processing ideal for mobile apps and IoT devices with limited computing power," Shahar Chen, CEO and co-founder of AI service biz Aquant, told The Register. "This enables quick, local decision-making essential for everything from smartphones to smart home devices, expanding the potential for AI in everyday technology."

Apple is keen to show the merits of its homegrown chip architecture for machine learning, which its hardware has specifically supported since Cupertino introduced its Neural Engine in 2017. Nonetheless, OpenELM, while it may score higher on accuracy benchmarks, comes up short in terms of speed.

"Despite OpenELM’s higher accuracy for a similar parameter count, we observe that it is slower than OLMo," the paper explains, citing tests run using Nvidia's CUDA on Linux as well as the MLX version of OpenELM on Apple Silicon.

The reason for the less-than-victorious showing, Apple's boffins say, is their "naive implementation of RMSNorm," a technique for normalizing data in machine learning. In the future, they plan to explore further optimizations.
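For readers unfamiliar with RMSNorm, here is a minimal sketch of the standard formula in plain Python: each value in a vector is divided by the root mean square of the whole vector, then multiplied by a learned per-element weight. The function name and the `eps` default are assumptions made for illustration. This is not Apple's code, and it says nothing about why their version runs slowly.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Normalize x by its root mean square, then apply a per-element weight.

    x and weight are equal-length lists of floats. eps guards against
    division by zero; its default here is an illustrative choice.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


# With unit weights, the output has a root mean square of roughly 1.
normalized = rms_norm([3.0, 4.0], [1.0, 1.0])
print(normalized)
```

Unlike LayerNorm, this does not subtract the mean first, which is what makes it the cheaper of the two on paper.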

OpenELM is available in pretrained and instruction-tuned models with 270 million, 450 million, 1.1 billion, and 3 billion parameters. Those using it are warned to exercise due diligence before trying the model for anything meaningful.

"The release of OpenELM models aims to empower and enrich the open research community by providing access to state-of-the-art language models," the paper says. "Trained on publicly available datasets, these models are made available without any safety guarantees." ®
