Opinion The Open Source Initiative (OSI) and its allies are getting closer to a definition of open source AI. If all goes well, Stefano Maffulli, the OSI's executive director, expects to announce the OSI open source AI definition at All Things Open in late October. But some open source leaders already want nothing to do with it.
Let's start with some background. Lots of companies – I'm looking at you, Meta – have been claiming that their AI models are open source. They're not. They're not even close.
So the OSI and a host of other companies and groups have been working on creating a comprehensive open source AI definition. After all, the OSI is the same organization that defines open source software with the Open Source Definition.
The latest draft – the Open Source AI Definition, draft v. 0.0.9, announced at KubeCon and Open Source Summit China in Hong Kong – includes significant changes that grated on the nerves of some open source supporters. These are:
- Role of training data: Training data is beneficial but not required for modifying AI systems. This decision reflects the complexities of sharing data, including legal and privacy concerns. The draft categorizes training data into open, public, and unshareable non-public data, each with specific guidelines to enhance transparency and understanding of AI system biases.
- Separation of checklist: The license evaluation checklist has been separated from the main definition document, aligning with the Model Openness Framework (MOF). This separation allows for a focused discussion on identifying open source AI while maintaining general principles in the definition.
As Linux Foundation executive director Jim Zemlin detailed at the KubeCon and Open Source Summit China, the MOF "is a way to help evaluate if a model is open or not open. It allows people to grade models."
Within the MOF, Zemlin added, there are three tiers of openness. "The highest level, level one, is an open science definition where the data, every component used, and all of the instructions must go and create your model the same way. Level two is a subset where not everything is open, but most are. Then, on level three, you have areas where the data may not be available, and the data that describe the data sets would be available. And you can understand that – even though the model is open – not all the data is available."
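To make Zemlin's three tiers concrete, here is a loose, hypothetical sketch – not an official MOF tool, and the artifact names are my own labels rather than MOF terminology – of how a model release might be graded by which components it actually ships:

```python
# Hypothetical illustration of the three openness tiers Zemlin describes.
# Artifact labels ("weights", "code", etc.) are invented for this sketch.

def mof_tier(artifacts: set) -> int:
    """Return 1 (most open) through 3, or 0 if the release fits no tier."""
    if {"weights", "code", "training_data"} <= artifacts:
        # Tier 1: open science -- the data and every component needed
        # to recreate the model the same way are shared.
        return 1
    if {"weights", "code", "data_documentation"} <= artifacts:
        # Tier 2: not everything is open, but most components are.
        return 2
    if {"weights", "data_documentation"} <= artifacts:
        # Tier 3: the model is open, the data is not, but descriptions
        # of the datasets are available.
        return 3
    return 0

print(mof_tier({"weights", "code", "training_data"}))      # 1
print(mof_tier({"weights", "data_documentation"}))         # 3
```

The point of a graded scheme like this, rather than a single pass/fail test, is exactly what the critics below object to: a release can withhold its training data and still land on the "open" spectrum.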
This doesn't fly with some people. Tara Tarakiyee, FOSS Technologist for the Sovereign Tech Fund, writes: "A system that can only be built on proprietary data can only be proprietary. It doesn't get simpler than this self-evident axiom."
Tarakiyee adds: "The new definition contains so many weasel words that you can start a zoo... These words provide a barn-sized backdoor for what are essentially proprietary AI systems to call themselves open source."
Open source leader julia ferraioli agrees: "The Open Source AI Definition in its current draft dilutes the very definition of what it means to be open source. I am absolutely astounded that more proponents of open source do not see this very real, looming risk."
AWS principal open source technical strategist Tom Callaway said before the latest draft appeared: "It is my strong belief (and the belief of many, many others in open source) that the current Open Source AI Definition does not accurately ensure that AI systems preserve the unrestricted rights of users to run, copy, distribute, study, change, and improve them."
Afterwards, in a more sorrowful than angry statement, Callaway wrote: "I am deeply disappointed in the OSI's decision to choose a flawed definition. I had hoped they would be capable of being aspirational. Instead, we get the same excuses and the same compromises wrapped in a facade of an open process."
Chris Short, an AWS senior developer advocate for Open Source Strategy & Marketing, agreed, responding to Callaway: "I 100 percent believe in my soul that adopting this definition is not in the best interests of the OSI, and that open source at large will get completely diluted."
Steve Pousty, a developer advocacy consultant, commented on the OSI AI draft: "This definition does not grant the freedom to modify and is unacceptable as an Open Source Definition. With AI models, the weights are the user interface. I can use them directly as a user. They are what is typically distributed to everyone."
That's all well and good, but Maffulli doesn't feel a purely idealistic approach to the open source AI definition will work, because no one would be able to meet such a definition. Hence the OSI's support for the MOF's levels-of-openness approach.
Callaway concluded: "They had a chance to lead, and they chose not to. I suppose the question is now: who will choose to lead in their place?"
That is indeed the question. Or will the community decide that the OSI AI Definition is the best practical way forward? Stay tuned. I fear this debate is going to last for years.
The real question to my mind is whether this will become a meaningless tech argument, such as vi vs EMACS (the answer's vi, by the way), while AI goes its merry way without referencing "open source" except as a marketing term. ®