Nearly one in two industry pros scaled back open source use over security fears

Log4j being the main driver, this data science poll claims

About 40 percent of industry professionals say their organizations have reduced their usage of open source software due to concerns about security, according to a survey conducted by data science firm Anaconda.

The company's 2022 State of Data Science report solicited opinions in April and May from 3,493 individuals from 133 countries and regions, targeting academics, industry professionals, and students. About 16 percent of respondents identified as data scientists.

About 33 percent of surveyed industry professionals said they had not scaled back on open source, 7 percent said they had increased usage, and 20 percent said they weren't sure. The remaining 40 percent said they had.

By industry professionals, or commercial respondents as Anaconda puts it, the biz means a data-science-leaning mix of business analysts, product managers, data and machine-learning scientists and engineers, standard IT folks such as systems administrators, and others in technology, finance, consulting, healthcare, and so on.

And by scale back, that doesn't mean stop: 87 percent of commercial respondents said their organization still allowed the use of open source. It appears a good number of them, though, are seeking to reduce the risk from relying on too many open source dependencies.

Anaconda's report found that incidents like Log4j and reports of "protestware" prompted users of open source software to take security concerns more seriously. Of the 40 percent who scaled back usage of open source, more than half did so after the Log4j fiasco.

Some 31 percent of respondents said security vulnerabilities represent the biggest challenge in the open source community today.

Most organizations use open source software, according to Anaconda. But among the 8 percent of respondents indicating that they don't, more than half (54 percent, up 13 percent since last year) cited security risks as the reason.

Other reasons for not using open source software include: lack of understanding (38 percent); lack of confidence in organizational IT governance (29 percent); "open-source software is deemed insecure, so it's not allowed" (28 percent); and not wanting to disrupt current projects (26 percent).

Anaconda's survey also registered worries about lack of technical skills, with 90 percent of professional respondents fretting about a talent shortage. Some 64 percent said their biggest concern was being able to recruit and retain talent and 56 percent opined that lack of data science talent represented one of the major obstacles in enterprise data science efforts.

"Organizations should bolster the tools and resources available for continued learning, and academic institutions should fill in the skills gaps for students and turn them into strengths as they prepare to enter the workforce," said Jessica Reeves, SVP of Operations at Anaconda, in a statement.

Reeves argues that training existing workers in data science and permitting more appealing remote work options can help with talent acquisition and retention.

Python continues to be the preferred language for data science types. Among survey respondents, 31 percent said they use it "always" and 27 percent said they use it "frequently." Julia, as a point of comparison, scored 3 percent "always" and 12 percent "frequently."

Attention to ethics in data science continues to be underwhelming. The survey found 24 percent of respondents said their organizations don't have standards, policies, or measurement tools to address algorithmic fairness and bias. An additional 15 percent aren't sure how their organizations confront such challenges.

Academic institutions come out even worse in the survey. Among academic-track respondents, just 19 percent said their institutions teach ethics in data science and machine learning and 20 percent said that ethics is covered in the coursework for their respective fields.

Just 23 percent of academic respondents and 21 percent of students said bias in AI/ML/data science is taught regularly. About 39 percent said it's rarely taught and 36 percent said it's never taught – which Anaconda notes is at least a 9 percent year-on-year decrease from the company's 2021 survey.

About a third (32 percent) of respondents said that the social effects of bias in data and models are the biggest problem in AI/ML/data science today.

"[T]here is certainly room for a greater emphasis on ethics and bias in the educational sphere," the survey says. ®

Biting the hand that feeds IT © 1998–2022