Please check your data: A self-driving car dataset failed to label hundreds of pedestrians, thousands of vehicles

Plus: Trump budget favors AI, 'patent troll' backs down, and CEO quits amid sex pest claims

Roundup It's a long weekend in the US, though sadly not in Blighty. So, for those of you starting your week, here's some bite-sized machine-learning news, beyond what we've recently covered, if that's your jam.

Check your training data: A popular dataset for training self-driving vehicles, including an open-source autonomous car system, failed to correctly label hundreds of pedestrians and thousands of vehicles.

Brad Dwyer, founder of Roboflow, a startup focused on building data science tools, discovered the errors when he started digging into the dataset compiled by Udacity, an online education platform.

“I first noticed images that were missing annotations,” Dwyer told The Register. “That led me to dig in deeper and check some of the other images. I found so many errors I ended up going through all 15,000 images because I didn’t want to re-share a dataset that had such obvious errors.”

After flicking through each image, he found that 33 per cent of them contained mistakes. Thousands of vehicles, hundreds of pedestrians, and dozens of cyclists were not labelled. Some of the bounding boxes around objects were duplicated or needlessly oversized too.
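For the curious, here is a minimal sketch of the sort of automated sanity check that can surface some of these problems before anyone has to eyeball 15,000 frames. It is not Dwyer's method, which was a manual review. The function name `audit_annotations`, the tuple layout of the annotations, the frame size, and the 90 per cent area threshold are all assumptions made up for illustration. Note the limits: code like this can flag frames with no labels at all, exact duplicate boxes, and boxes that swallow nearly the whole frame, but it cannot spot an unlabelled pedestrian in a frame that already has other labels.

```python
from collections import defaultdict


def audit_annotations(image_ids, annotations, image_size, max_area_fraction=0.9):
    """Flag images with no labels, exact duplicate boxes, and very large boxes.

    annotations: iterable of (image_id, label, x_min, y_min, x_max, y_max)
    tuples in pixel coordinates. This layout is a hypothetical format.
    """
    width, height = image_size
    boxes_by_image = defaultdict(list)
    for image_id, label, x_min, y_min, x_max, y_max in annotations:
        boxes_by_image[image_id].append((label, x_min, y_min, x_max, y_max))

    # Frames with no boxes may be genuinely empty, so treat these as
    # candidates for human review rather than confirmed errors.
    unlabeled = [i for i in image_ids if i not in boxes_by_image]

    duplicates, oversized = [], []
    for image_id, boxes in boxes_by_image.items():
        seen = set()
        for box in boxes:
            if box in seen:
                duplicates.append((image_id, box))
            seen.add(box)
            _, x_min, y_min, x_max, y_max = box
            area = max(0, x_max - x_min) * max(0, y_max - y_min)
            # Assumed heuristic: a box covering most of the frame is suspect.
            if area > max_area_fraction * width * height:
                oversized.append((image_id, box))

    return {"unlabeled": unlabeled, "duplicates": duplicates, "oversized": oversized}


# Toy data, invented for this example.
image_ids = ["a.jpg", "b.jpg", "c.jpg"]
annotations = [
    ("a.jpg", "car", 10, 10, 50, 40),
    ("a.jpg", "car", 10, 10, 50, 40),            # exact duplicate
    ("b.jpg", "pedestrian", 0, 0, 1900, 1190),   # nearly the whole frame
]
report = audit_annotations(image_ids, annotations, image_size=(1920, 1200))
print(report)
```

Anything a script like this flags still needs a human to look at it, which is roughly where Dwyer ended up anyway.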

Training an autonomous car on such an incomplete dataset could potentially be dangerous. The collection was pulled together to make it easier for engineers to collaborate and build a self-driving car. Thankfully, a project to develop such a system using this information seems to have died down since it launched more than three years ago.

“Udacity created this dataset years ago as a tool purely for educational purposes, back when self-driving car datasets were very hard to come by, and those learning the skills needed to develop a career in this field lacked adequate training resources,” a Udacity spokesperson told El Reg.

“At the time it was helpful to the researchers and engineers who were transitioning into the autonomous vehicle community. In the intervening years, companies like Waymo, nuTonomy, and Voyage have published newer, better datasets intended for real-world scenarios. As a result, our project hasn't been active for three years.

“We make no representations that the dataset is fully labeled or complete. Any attempts to show this educational data set as an actual dataset are both misleading and unhelpful. Udacity's self-driving car currently operates for educational purposes only on a closed test track. Our car has not operated on public streets for several years, so our car poses no risk to the public.”

Roboflow has since corrected the errors on the dataset, and issued an improved version.

Standing up to patent trolls works: Mycroft AI, a startup building an open-source voice-controlled assistant for Linux-based devices, was sued for allegedly infringing a couple of patents, as we reported earlier this month.

Mycroft’s CEO Joshua Montgomery spoke to The Register about his strong suspicions that he was being targeted by a so-called patent troll. His biz was told by a lawyer representing the patents' owner to cough up a license fee, and when Montgomery ignored the request, a patent-infringement lawsuit was filed against his company.

The mysterious patent owner, Voice Tech Corp, turned out to be a brand new company in Texas, USA, and its address was someone’s bungalow, according to court filings. All of that fueled the growing speculation that, yes, Voice Tech Corp was probably a patent troll.

Now, after facing sufficient resistance from Mycroft, Voice Tech Corp has dropped its case. Montgomery had threatened to fight the lawsuit all the way and get Voice Tech Corp’s patents invalidated so that no other startup would have to face the same problem.

More Clearview drama: The controversial facial-recognition outfit that admitted to harvesting more than three billion publicly shared photos from social media sites is back in the news again.

The American Civil Liberties Union (ACLU) revealed it is trying to get Clearview to remove the claim from its marketing that its facial recognition code was verified using a “methodology used by the ACLU.” The rights warriors said they had no involvement in the product and do not endorse it. In fact, the union is pretty much against everything Clearview is doing.

Clearview boasts that its technology is 99 per cent accurate following numerous tests. Buzzfeed News, however, reckons it is nowhere near that good. The upstart previously said its algorithms helped police in New York City catch a terrorist planning to plant fake bombs on the subway. NYPD denied using Clearview’s software.

Google, YouTube, Twitter, and Facebook have sent Clearview cease-and-desist letters demanding the startup stop scraping images from their platforms, and delete those in its database. In a bizarre interview, Clearview’s CEO fought back and said he believed that since all the photos were public, his stateside company, therefore, had a “First Amendment right to public information.” Er, yeah right.

Public funding for AI, 5G: President Donald Trump has vowed to spend more of US taxpayers' money on the research and development of emergent technologies, such as AI, quantum computing, and 5G, than traditional sciences.

“The Budget prioritizes accelerating AI solutions,” according to a proposal, subject to congressional approval, published this week. “Along with quantum information sciences, advanced manufacturing, biotechnology, and 5G research and development (R&D), these technologies will be at the forefront of shaping future economies.

“The Budget proposes large increases for key industries, including doubling AI and quantum information sciences R&D by 2022 as part of an all-of-Government approach to ensure the United States leads the world in these areas well into the future.”

Trump pledged to spend $142.2bn on R&D for the next fiscal year, nine per cent less than this year. While AI and quantum computing are favored, there's less federal funding for general research and development in the other sciences.

The Department of Energy, the National Science Foundation, the National Institutes of Health, and others, will see cuts. The DOE’s Advanced Research Projects Agency-Energy (ARPA-E) will be particularly hard hit: not only does the proposed budget effectively eliminate the agency, it must pay back $311m to the treasury.

You can read more about the proposed budget for the fiscal year of 2021, here.

CEO of AI startup steps down over allegations: The CEO of Clinc, a small artificial-intelligence outfit spun out of the University of Michigan, has resigned following claims he sexually harassed employees and customers.

Jason Mars, an assistant professor of computer science at the university, was accused of physically accosting clients, making lewd comments about female employees and interns, and hiring a prostitute during a work trip.

In an email to employees at Clinc, first reported by The Verge, Mars said the allegations against him were “rife with embellishments and fabrications.” He did, however, admit to drinking too much and partying with staff in “a way that’s not becoming of a CEO.” ®

Biting the hand that feeds IT © 1998–2022