Open access journals are vanishing from the web, Internet Archive stands ready to fill in the gaps

Diamonds are forever, scholarly articles not so much, it seems


Academics studying the longevity of online scholarship say that 176 digital open-access journals have vanished from the internet over the past two decades.

The findings are documented in a research paper distributed via ArXiv, "Open is not forever: a study of vanished open access journals."

Authors Mikael Laakso of Hanken School of Economics in Helsinki, Lisa Matthias of the Free University of Berlin, and Najko Jahn of the University of Göttingen surveyed various bibliographic indexes and the Directory of Open Access Journals, and tried to trace the publications through the Internet Archive's Wayback Machine.
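For readers curious what tracing a publication through the Wayback Machine can look like in practice, here is a minimal sketch using the Internet Archive's public availability endpoint. It is not the authors' procedure, which this article does not describe; the helper names and the journal.example.org address are hypothetical, and the response shape is assumed to follow the endpoint's documented JSON layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build the availability query for a site, optionally near a timestamp (YYYYMMDD...)."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the closest archived snapshot URL from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(url, timestamp=None):
    """Query the endpoint over the network and return the closest snapshot URL, if any."""
    with urlopen(build_query(url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))


# Hypothetical journal address, used only to show the query that would be sent.
print(build_query("journal.example.org", "20150101"))
```

Calling lookup() needs network access; a None result means the endpoint reported no capture for that address, which is only a rough signal and not proof that a journal has vanished.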

Open access journals, as the term suggests, are made accessible to internet users at no charge, in contrast to traditional subscription-based journals. There's an ongoing debate among researchers about whether to publish in an OA journal or a subscription-based title. The decision affects article distribution, publication speed, publication cost, professional prestige, and other factors.

The issue surfaced last year when the University of California terminated its subscription with scientific publishing giant Elsevier over the price it demanded for access to its collection of more than 2,000 journals. The UC system wants Elsevier to provide open access publishing for UC-authored articles and to provide UC scholars with access to journal content at a reasonable price. Negotiations remain ongoing.

The restrictions and costs imposed by traditional academic publishers like Elsevier led Alexandra Asanovna Elbakyan in 2011 to found Sci-Hub, a website that shares research papers at no cost and with no regard for US copyright law.

OA journals by design are more affordable and accessible than their subscription-based kin, but their lack of a funding model makes their ongoing survival more precarious.

The slow creep of information loss

Laakso, Matthias, and Jahn note that while OA journals "are not vanishing in vast numbers," their disappearance affects certain disciplines disproportionately. Journals associated with academic institutions or scholarly groups in North America, and with social sciences and humanities research, represented a larger portion of the disappeared content than others.

"Our findings suggest that current approaches to digital preservation are successful in archiving content from larger journals and established publishing houses but leave behind those that are more at risk," the authors state in the paper. "Hence, preservation initiatives may need to re-evaluate their current strategy and develop alternative pathways—ideally in close collaboration and consultation with university and society journals—that are better suited for smaller journals that operate without the support of large, professional publishers."

In an email to The Register, Brewster Kahle, director and co-founder of the Internet Archive, said, "We have found much the same thing with open access journals – they are often untended," adding that he's glad the issue is being formally studied.

Kahle said the Internet Archive has been crawling the web for 25 years but started focusing on academic periodicals in 2018. The Mellon Foundation, he said, gave the Internet Archive a grant along these lines in 2019.

"So we are now crawling long tail journal literature and also datasets as they are available on the web," he said. "We are geared up significantly because of these grants, and frankly the discoveries we have made about how much support the Commons needs."

Bryan Newbold of the Internet Archive said that by the Internet Archive's analysis, "18 per cent of all open access articles since 1945, over three million, are not independently archived by us or another preservation organization, other than the publishers themselves."

Newbold said the Internet Archive has been in contact with the paper's authors and is working to improve article crawling. "There is a 'save paper now' feature, as well as an API for bots," he said. "Organizations like DOAJ, ISSN, DOI registrars (Crossref, Datacite, others) are crucial for this."

The Internet Archive aims to augment its efforts by partnering with other preservation services and archives, such as LOCKSS, Portico, JSTOR, and institutional repositories.

Newbold said it would be helpful to have the equivalent of youtube-dl – a tool for downloading YouTube videos – for open access papers.

"There is a lot of content on large platforms and publishers which have anti-crawling measures (even for gold OA and hybrid content!), as well as a long tail of small publishers that don't use simple/common mechanisms like OAI-PMH and the 'citation_pdf_url' HTML meta tag to identify full text content. The OAI-PMH ecosystem sadly is not very complete or helpful for the use case of mirroring," said Newbold. ®
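As a rough illustration of the 'citation_pdf_url' mechanism Newbold mentions, the sketch below reads a landing page's HTML and collects the full-text links declared in that meta tag. It uses only Python's standard library; the class and function names and the journal.example.org page are hypothetical, and this is not a description of how the Internet Archive's crawlers work.

```python
from html.parser import HTMLParser


class CitationPdfFinder(HTMLParser):
    """Collects the content of every <meta name="citation_pdf_url"> tag."""

    def __init__(self):
        super().__init__()
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; skip attributes without a value.
        attr_map = {name: value for name, value in attrs if value is not None}
        if attr_map.get("name", "").lower() == "citation_pdf_url":
            content = attr_map.get("content")
            if content:
                self.pdf_urls.append(content)


def find_pdf_urls(html_text):
    """Return the full-text PDF links a page advertises via citation_pdf_url."""
    finder = CitationPdfFinder()
    finder.feed(html_text)
    return finder.pdf_urls


# Hypothetical landing page for an open access article.
sample_page = """<html><head>
<meta name="citation_title" content="An example article">
<meta name="citation_pdf_url" content="https://journal.example.org/article/1.pdf">
</head><body></body></html>"""

print(find_pdf_urls(sample_page))
```

A page that omits the tag yields an empty list, which is the situation Newbold describes for the long tail of small publishers: a crawler then has no simple, common signal for where the full text lives.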


Biting the hand that feeds IT © 1998–2020