Waymo sues California's DMV to block autonomous car crash data from publication
Plus: Non-profit suicide hotline criticised for sharing mental health crisis texts with AI startup, and more
In brief: Waymo is suing California's Department of Motor Vehicles in an attempt to keep information about its autonomous car crashes and other operational details private, arguing that the data is a trade secret.
California's DMV is strict about giving permits to companies testing self-driving cars on real roads. Companies have to disclose operational and safety data before they're approved to drive in the state. But Waymo doesn't want that kind of information getting out.
The DMV received a public records request for the self-driving car test permit application Waymo filed last year. Waymo gave the department a redacted version to release, but the person requesting the information challenged the redactions. The DMV then notified Waymo that it would hand over the unredacted report unless the company "sought an injunction prohibiting disclosure of the material in unredacted form" by January 31, 2022, according to the lawsuit [PDF].
If the data is made public, it "could provide strategic insight to Waymo's competitors and third parties regarding Waymo's assessment of those collisions from a variety of different perspectives, including potential technological remediation," the Google spin-off claimed in the court docs.
Should mental health crisis text conversations be used to train customer service chatbots?
A suicide hotline was slammed for handing text conversations from people seeking mental health support to its commercial spinoff to, erm, improve customer service AI chatbots.
Crisis Text Line, a global non-profit supporting those with mental health problems, strips personally identifiable information from the data before giving it to Loris.ai, a startup focused on making customer service chatbots more empathetic. Crisis Text Line has a close working relationship with Loris.ai; it owns part of the startup and makes money from it. The two organizations even had the same CEO at one point, according to Politico.
Experts questioned whether it was right to share text data generated from vulnerable people for commercial purposes, even if it wasn't illegal. "The nonprofit may have legal consent, but do they have actual meaningful, emotional, fully understood consent?" said Jennifer King, the privacy and data policy fellow at the Stanford University Institute for Human-Centered Artificial Intelligence.
A volunteer who has spent hundreds of hours helping Crisis Text Line said she wasn't aware the non-profit was sharing people's sensitive conversations. "Mental health and people cutting themselves adapted to customer service?" said Beck Bamberger. "That sounds ridiculous. Wow."
OpenAI trains new language models more aligned with humans
Researchers at OpenAI have developed InstructGPT, a new class of language model that promises to better understand user intentions and follow instructions to generate more relevant text.
For example, given the prompt "Explain the moon landing to a six-year-old in a few sentences," OpenAI's previous GPT-3 model would spit out: "Explain the theory of gravity to a six-year-old. Explain the theory of relativity to a six-year-old in a few sentences. Explain the Big Bang Theory to a six-year-old." GPT-3 tends to regurgitate the structure of a given prompt as it completes text, rather than follow the instruction.
InstructGPT, however, replied: "People went to the Moon, and they took pictures of what they saw, and sent them back to the Earth so we could all see them."
This output is arguably more useful than the previous one. OpenAI said it trained InstructGPT using reinforcement learning from human feedback: human labelers ranked the quality of the model's replies to given prompts, and those rankings were used to fine-tune it.
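The idea of learning from human rankings of replies can be sketched in a few lines. What follows is a minimal illustration, not OpenAI's code: it assumes a hypothetical reward model has already assigned a numeric score to each reply, and shows the pairwise loss commonly used to push the score of the reply humans preferred above the score of the one they rejected. The function names and the example scores are made up for illustration.

```python
import math


def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Loss for one human comparison: -log(sigmoid(score_preferred - score_rejected)).

    The loss is small when the (hypothetical) reward model already scores the
    human-preferred reply higher, and large when it scores it lower.
    """
    diff = score_preferred - score_rejected
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch to avoid overflow in exp().
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def mean_preference_loss(comparisons):
    """Average the pairwise loss over (score_preferred, score_rejected) pairs."""
    losses = [pairwise_preference_loss(p, r) for p, r in comparisons]
    return sum(losses) / len(losses)


# Illustrative scores only: in each pair, the first is the reply labelers preferred.
example_comparisons = [(1.8, 0.2), (0.5, 0.9), (2.0, -1.0)]
print(mean_preference_loss(example_comparisons))
```

In a full pipeline, a loss like this would be minimized to train the reward model, and the language model would then be tuned with reinforcement learning to produce replies that score highly. Both of those steps are omitted here.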
"The resulting InstructGPT models are much better at following instructions than GPT-3. They also make up facts less often, and show small decreases in toxic output generation," OpenAI said in a blog post. "Our labelers prefer outputs from our 1.3B InstructGPT model over outputs from a 175B GPT-3 model, despite having more than 100x fewer parameters."
"GPT-3 Embeddings by @OpenAI was announced this week.
📈 I was excited and tested them on 20 datasets
😢 Sadly they are worse than open models that are 1000 x smaller
💰 Running @OpenAI models can be a 1 million times more expensive https://t.co/vY1rsakLZM"
Nils Reimers (@Nils_Reimers), January 28, 2022
InstructGPT is now the default class of models offered via OpenAI's API. But one natural language processing researcher from Hugging Face, Nils Reimers, found that OpenAI's newly announced GPT-3 embeddings models performed worse than many smaller open source models across a range of tasks, including text retrieval and search. ®