Linux data-sharing licences: So, will big data hogs take the plunge?

Experts weigh in


With its new open data licensing framework, announced on Tuesday, the Linux Foundation has drawn up legal terms for sharing raw, unorganised data, in the hope of tempting generous companies, nonprofits, government agencies and researchers to share theirs.

But one expert says the licences' current ambiguity makes them risky, and others are concerned about compatibility with existing licences.

Mike Dolan, the Linux Foundation's VP of strategic programs – who helped draft the licences – told El Reg that individuals or organisations working on machine learning, traffic flow or other data-heavy systems could gain a lot from sharing, for example by improving their algorithms and gaining access to more resources.

But today (excluding sensitive data covered by law), you either keep your raw data a trade secret or release it with no IP restrictions, said Estelle Derclaye, an IP lawyer at the University of Nottingham. There are already comprehensive licence agreements for sharing and attributing data organised in a database (such as CC-BY, the Open Data Commons Open Database License, or the Open Data Commons Attribution License).

When Derclaye reviewed one of the two new licence agreements at The Reg's request, she told us: "I wouldn't want to sign it."

Why a new licence?

Dolan said the aim was "to ensure that data providers and users had clarity about their ability to curate, use, and share" data in order to enable "the creation of open, collaborative data communities". Drafting began in the third quarter of 2016 because of a perceived gap in one-stop licence agreements.

He gave the example of training a drone to fly autonomously – what if a dataset didn't include any examples of trees, a user trained its drone on the data, and it crashed into one? Whose fault would that be?

One licence agreement requires that changes to the data be shared. The other is permissive, and Dolan expects it to be the more popular of the two because it involves less legal approval legwork.

Neither version of the Community Data License Agreement (CDLA) places any restrictions on results produced by processing and analysing the data.

Dolan said that "well over 100 lawyers" had reviewed the agreements and that the licences take into account differences between countries' laws. Nevertheless, the framework is open to iteration: the team is opening a mailing list for public feedback and will "monitor" the discussion.

Whose data is it anyway?

Daniel Himmelstein, a data science postdoctoral researcher at the University of Pennsylvania, told The Register: "Until recently, there was little awareness that data licensing could be an issue. This is an exciting development since it reflects that major players are now considering the importance. If more people feel comfortable releasing data openly because of these licences, then that's a win".

However, he was unsure what a more "data-focused" licence agreement would offer over Creative Commons licences. "I will likely continue to release most of my data under a CC0 public domain dedication," he said.

Dolan responded: "We did not draft the CDLA agreements to cure or fix any specific issues with other licences, but rather to look at what the current use cases required and build an agreement from that point of view based on what we've learned in open-source software licensing.

"All that we say is if there are attribution notices, you cannot remove them... In many jurisdictions there can be severe penalties for removing attribution notices so we wanted to prevent that from happening."

Derclaye, the author of The Legal Protection of Databases, said she could understand why the Linux Foundation had created these restrictive licence agreements for raw data, since they would give organisations an incentive to disclose it. At the same time, she thinks it "wouldn't have been much work" to modify existing Creative Commons licences, such as the commonly used CC-BY, to accommodate raw data instead of creating something from scratch.

Room to improve

But, she said, what the Linux Foundation ended up writing is "too vague" and "might create problems". Her concerns were as follows:

  1. Unlike CC-BY, the sharing licence agreement does not explicitly state that the data is royalty-free. The licensee would need to check with the Linux Foundation.
  2. The licence does not include language for removing technological protection measures, such as encryption or other anti-copy tech. (Dolan claims the licence does have this, though "we made it even more broad than just technological protection measures").
  3. The agreement does not explicitly state that the licensee can sub-license the data to other parties without the Linux Foundation's approval. This might come up if a PhD student switched labs and wanted to sub-license data to their new boss. (Dolan said: "Everyone gets a licence to use, modify and distribute it to anyone under the licence they're all agreeing to use – the CDLA").
  4. CC-BY has language explicitly preserving fair use in the US and copyright exceptions in other countries, whereas the Linux Foundation's licences do not touch on them. (Dolan responded that open-source software licences routinely don't explicitly reference such exceptions and that they would be dealt with by "applicable law").
  5. The licence adds explicit language stating that the data will not be considered a work of joint authorship – but the actual definition is unclear.
  6. The licence's provisions on moral obligations and attribution contradict each other and are confusing; CC-BY is clearer.

The database law prof said it's better to be clear, even if that makes an agreement more restrictive than it would otherwise be. Because of the vagueness, she added, combining data under a Linux Foundation agreement with data under a different licence in a shared resource could lead to conflicts. "If they really want it to be useful, it's good to be as aligned as possible."

Leigh Dodds, of the UK's Open Data Institute, told The Register: "Clear licensing, which gives anyone the permission to access, use and share data, is fundamental to the open data movement.

"While we welcome the efforts of the Linux Foundation, we are not yet clear on what these new licences bring to the ecosystem. Users need to understand how these new licenses are compatible with, or different from, existing creative commons licences (especially CC-BY 4.0 and CC-BY-SA 4.0), and whether they allow for relicensing."

The org will "continue to recommend use of CC-BY 4.0 as it is already well adopted internationally". ®


Biting the hand that feeds IT © 1998–2021