We've launched a site, https://github.com/petermr/openVirus, to search the whole open literature for content which could help tackle the pandemic. We're looking for volunteers (tech, biblio/library, documenters) to help.
It's now clear that knowledge is one of the key tools in tackling this COVID-19 epidemic, and also that citizens across the world are desperate for knowledge. To address this, some organizations are lifting restrictions on all their IP for as long as the epidemic lasts plus one year (https://opencovidpledge.org/):
Immediate action is required to halt the COVID-19 Pandemic and treat those it has affected. It is a practical and moral imperative that every tool we have at our disposal be applied to develop and deploy technologies on a massive scale without impediment. We therefore pledge to make all intellectual property under our control available to any group or individual for use in ending the COVID-19 pandemic and minimizing the impact of the disease, free of charge and without encumbrances.
We will implement this pledge expeditiously in accordance with the rules and regulations under which we operate.
The COVID-19 outbreak has drawn a minimal response from scholarly publishing, both commercial and academic (e.g. repositories). One publisher, The Royal Society, has made ALL its publications freely accessible without restriction. This is the minimum that makes any difference. The only other response I know of is the CORD-19 dataset (https://cset.georgetown.edu/covid-19-open-research-dataset-cord-19/):
CORD-19 contains 29,000 full-text articles with a wealth of information about the novel coronavirus (SARS-CoV-2), the associated illness COVID-19, and related viruses. The collection will be updated as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.
At the request of the White House Office of Science and Technology Policy, CSET leads this effort in partnership with the Allen Institute for AI, Chan Zuckerberg Initiative, Microsoft Research and the National Library of Medicine of the National Institutes of Health.
I have worked with this dataset and had helpful discussions with Allen AI. But I believe this response is minimal and can only be used by a very small proportion of the world. (I have no criticism of Allen AI, but I have a major criticism of the scholarly publishing industry.)
This dataset (29K articles) is a minute fraction of the scholarly publication relevant to epidemics, between 0.1% and 1%. Half of it is public anyway in Europe/PMC and the rxivs, so the amount contributed by publishers is even less. It assumes that (a) the publishers know what people want (they don't) and (b) the only people who need help are datamining AI academics. The dataset is not readable by humans (the papers have been cast into JSON and the metadata removed into a separate CSV file).
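To make the JSON-plus-CSV point concrete, here is a minimal sketch of the work a reader would have to do just to get one paper back into readable form. It is an illustration under stated assumptions, not the dataset's documented schema: the field names (`paper_id`, `body_text`, `section`, `text`, `sha`, `title`, `journal`) and the sample records are hypothetical stand-ins.

```python
import csv
import io
import json

# Hypothetical miniature stand-ins for the two parts of the dataset:
# one parsed-paper JSON record and one row of the separate metadata CSV.
# All field names and values are assumptions made for illustration.
PAPER_JSON = """
{
  "paper_id": "abc123",
  "body_text": [
    {"section": "Introduction", "text": "First paragraph."},
    {"section": "Introduction", "text": "Second paragraph."},
    {"section": "Methods", "text": "Third paragraph."}
  ]
}
"""

METADATA_CSV = """sha,title,journal
abc123,An example title,An example journal
"""


def load_metadata(csv_text):
    """Index metadata rows by the identifier shared with the JSON records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["sha"]: row for row in reader}


def render_paper(paper, metadata):
    """Rejoin one JSON paper with its metadata row as plain readable text."""
    meta = metadata.get(paper["paper_id"], {})
    lines = [
        meta.get("title", "(title not found)"),
        meta.get("journal", "(journal not found)"),
        "",
    ]
    current_section = None
    for para in paper["body_text"]:
        # Emit a heading only when the section name changes.
        if para["section"] != current_section:
            current_section = para["section"]
            lines.append(current_section.upper())
        lines.append(para["text"])
    return "\n".join(lines)


metadata = load_metadata(METADATA_CSV)
paper = json.loads(PAPER_JSON)
text = render_paper(paper, metadata)
print(text)
```

Even this toy version needs a join on an identifier and a pass over the paragraph list before a human can read anything, which is a step most citizens cannot be expected to take.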
The terms "COVID", "SARS", "coronavirus" only reach a small amount of the literature. I'm on a Cambridge Slack where my colleagues are discussing many different aspects of tackling the epidemic. Here are some:
- aerosols
- communications
- early detection
- epidemic modelling
- law
- masks
- molecular modelling
- strategy
- surfaces
- ventilators
None of these will be in CORD-19.
It's clear that some strategies depend heavily on human behaviour and political systems. We need papers on history, law, psychology, economics, literature, maths, statistics, education, politics ... in fact everything.
Every subject researched in universities is relevant to this fight. And the majority of citizens will be able to understand and use a large amount of the scholarly literature. You don't need to know quantum mechanics to read papers on how previous epidemics have been controlled.
Humans must now have a basic right to read any publicly funded research without restriction. Charging them 35-50 USD to read a single paper for one day is an insult. For CEOs to trumpet CORD-19 as a great contribution is unbearably arrogant. People are losing their livelihoods and lives, yet they are being charged exorbitant amounts to read about how to stay alive.
"Food rationing" is a possible strategy in compliance. I've searched Elsevier and Taylor and Francis for this, and of the top 20/25 hits there is ZERO access for citizens, even for papers 50 years old. It's time we started thinking about READERS, not authors.
It's also critical that we use machines to read ALL the literature as it is published, perhaps 10K papers per day. And also theses. It's not a huge task, it's just horribly messy because we don't have a tradition of wanting the output to be read or used. (If we had, the Ebola outbreak prediction would have been made public many years ago.)
The only modern way to use the fruits of public scholarship is to:
- create all material openly
- with a semantic version
- review as necessary in public
- remove any access barriers to authoring or reading or re-using
- use machines to process all material and index it with a single point of access. (Individual publishers with their own brands are a massive friction in the system. Individual university repositories are a massive friction.)
- annotate, split, combine, compute. The human/machine readership should be the judge of what's useful and needed.
- pay for service, not rent/ownership. The preprint servers have shown that the costs are very low. Latin America has shown that the costs are very low.
This means that publishers must adapt or die. Other industries are doing that: planes, manufacturing, food, ... People are dying. There is no longer a right to make money by restricting access to knowledge (paywalls, lawyers, glass screens, etc.). Publishers, if they are needed at all, must put the dissemination of public knowledge at the top of their mission.
And if you've read this polemic this far and have something to contribute, please join us at https://github.com/petermr/openVirus