The Linux Foundation dives into machine learning with Open Voice Network, dataset licence launches
Aims to make such things simpler to share
The Linux Foundation has announced two projects with which it aims to help settle the choppy waters of machine learning: the Open Voice Network (OVN), and the CDLA-Permissive-2.0 licence for machine learning datasets.
"Voice is expected to be a primary interface to the digital world, connecting users to billions of sites, smart environments and AI bots," said Mike Dolan, senior veep and general manager of projects at the Linux Foundation. "It is already increasingly being used beyond smart speakers to include applications in automobiles, smartphones and home electronics devices of all types.
"Key to enabling enterprise adoption of these capabilities and consumer comfort and familiarity is the implementation of open standards. The potential impact of voice on industries including commerce, transportation, healthcare and entertainment is staggering and we're excited to bring it under the open governance model of the Linux Foundation to grow the community and pave a way forward."
That open governance model is behind the launch of the Open Voice Network, a Linux Foundation project that focuses on three areas: developing open standards designed to promote user choice and trust in voice systems; identifying and sharing best practices for conversational AI; and advocacy work alongside existing industry associations – including a look at regulatory and legislative issues surrounding data privacy.
The OVN counts among its founding members Deutsche Telekom, Microsoft, Schwarz Gruppe, Target, Veritone, and Wegmans Food Markets, each of which has pledged "a commitment of resources in support of its research, awareness and advocacy activities and active participation in its symposia and workshops."
At the same time, the Linux Foundation launched a new licence: the Community Data License Agreement Permissive 2.0, or CDLA-Permissive-2.0, specifically targeting the creation, distribution, and use of machine learning datasets – whether relating to voice or otherwise.
Designed as a successor to the original CDLA, which was released back in October 2017, the new variant combines work from the CDLA with Microsoft's Open Use of Data Agreement (O-UDA) following the company's decision to grant stewardship of the latter to the CDLA project.
The new agreement is shorter, at under a page in length, and aims for simplicity with a permissive model that doesn't require attribution when a dataset is used – whether for a commercial purpose or otherwise.
"Data is an essential component of how companies build their operations today, particularly around Open Data sets that are available for public use," Amanda Brock, chief executive of OpenUK, opined of the new licence.
"We welcome the CDLA-Permissive-2.0 license as a tool to make Open Data more available and more manageable over time, which will be key to addressing the challenges that organisations have coming up. This new approach will make it easier to collaborate around Open Data and we hope to use it in our upcoming work in this space."
The licence is already in use: IBM has confirmed it will relicense its public datasets under CDLA-Permissive-2.0, beginning with the large-scale source code dataset Project CodeNet, while Microsoft has announced the release of the Hippocorpus short-story, Public Perception of Artificial Intelligence, Xbox Avatars Descriptions, Dual Word Embeddings, and GPS Trajectory datasets under the licence.