This is a collection of scripts that I used to download a bunch of articles for a lit review. If you want to know how to download a bunch of articles that you do not have rights to download, this likely will not help. If you do have access to journal articles through your library (presumably an academic library that you have sanctioned access to), this set of scripts might make it easier for you to download them without clicking in your web browser 4-20 times per article.
There's even a script to rename the PDFs so they make some sense.
The scripts that I think work are in bin. They were mostly used on a Linux box, but should work on a Mac. They probably work under WSL on Windows, but I was unable to test that, in spite of eventually gaining access to a Windows box in order to download over 2000 files from Taylor and Francis in just over half an hour. (If you want to do that, talk to your librarian; it requires that you do the work from a specific IP address that your librarian registers with T&F for a specific project and time period. If you need to download more than a couple hundred articles, contact your librarian as soon as you have a good enough description of your project that they can fill out the paperwork.)
For some journals we were unable to do a full-text search that I trusted to get a list of DOIs matching those keywords, so I downloaded the entire journal and then used Zotero to
# Look up a single article's DOI by journal title, volume, issue, first
# page, and year. Set CROSSREF_EMAIL to your email address; Crossref's
# openurl endpoint takes it as the pid parameter.
USER=$CROSSREF_EMAIL
JNL="TESOL%20Quarterly"
VOL=44
ISSUE=1
PAGE=4
YEAR=2010
curl -s -L "https://doi.crossref.org/openurl?pid=$USER&title=$JNL&volume=$VOL&issue=$ISSUE&spage=$PAGE&date=$YEAR&redirect=false"
# The same query without a page number returns the DOIs for a whole issue.
USER=$CROSSREF_EMAIL
JNL="International%20Multilingual%20Research%20Journal"
VOL=10
ISSUE=1
YEAR=2016
curl -s -L "https://doi.crossref.org/openurl?pid=$USER&title=$JNL&volume=$VOL&issue=$ISSUE&date=$YEAR&redirect=false" | less
1931-3152
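As an alternative to the openurl queries above, a journal's DOIs can be listed by ISSN (such as the one on the line above) with the Crossref REST API. This is a sketch, not one of the scripts in bin; extract_dois is a crude scrape of the flat "DOI" fields so that jq is not required, and both function names are mine.

```shell
# Pull the "DOI" values out of a Crossref REST API JSON response.
extract_dois() {
  grep -o '"DOI":"[^"]*"' | sed 's/^"DOI":"//; s/"$//'
}

# List up to 100 DOIs for a journal by ISSN, one per line.
list_dois() {  # usage: list_dois 1931-3152
  curl -s "https://api.crossref.org/journals/$1/works?rows=100&select=DOI" |
    extract_dois
}
```

The rows parameter caps each page of results; for a full journal you would page through with the API's offset or cursor parameters.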
Pulls the DOI out of the XML returned by Crossref.
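A sketch of that extraction, assuming the DOI sits in a <doi> element as in Crossref's query-result XML (the actual script in bin may differ):

```shell
# Print the first DOI found in a <doi ...>...</doi> element on stdin.
extract_doi() {
  sed -n 's#.*<doi[^>]*>\([^<]*\)</doi>.*#\1#p' | head -n 1
}

# Canned example; a real run would pipe in the curl output from above.
printf '<doi type="journal_article">10.1234/example</doi>\n' | extract_doi
```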
# Skim the converted text files for "positionality", with context:
for x in *txt; do grep -i positionality -A 10 -B 5 "$x"; echo "=========================" ; done | less
Runs pdftotext on all PDF files in the current directory.
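A sketch of that loop (it assumes poppler's pdftotext is on your PATH, and skips PDFs that already have a matching .txt):

```shell
# Convert every PDF in the current directory to a .txt alongside it.
pdfs_to_text() {
  for f in *.pdf; do
    [ -e "$f" ] || continue          # no PDFs: glob stayed literal, skip it
    txt="${f%.pdf}.txt"
    [ -e "$txt" ] || pdftotext "$f" "$txt"
  done
}
```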
Looks for "To link to this article: " followed by https://doi.org/... and renames each .txt and .pdf file to its DOI, creating the directory for the DOI prefix. I first tried using "DOI:" to find just the DOI, but sometimes "DOI:" came at the end of a line and the DOI was on the next line. The "To link to this article" approach worked for the files from T&F.
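A sketch of that renaming, assuming the converted text contains a line like "To link to this article: https://doi.org/10.1080/..." (the exact phrasing and the directory layout are guesses from the description above):

```shell
# Rename a .txt and its matching .pdf to the DOI found in the text,
# creating the DOI-prefix directory (e.g. 10.1080/) as needed.
rename_to_doi() {
  txt="$1"
  doi=$(sed -n 's#.*To link to this article: *https://doi\.org/\([^ ]*\).*#\1#p' "$txt" |
        tr -d '\r' | head -n 1)
  [ -n "$doi" ] || { echo "no DOI found in $txt" >&2; return 1; }
  mkdir -p "$(dirname "$doi")"
  mv -- "$txt" "$doi.txt"
  pdf="${txt%.txt}.pdf"
  if [ -e "$pdf" ]; then mv -- "$pdf" "$doi.pdf"; fi
}
```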
Takes a DOI and downloads the file using the Wiley API. Must be run from on campus. Works only for journals that your library has a subscription for.
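A sketch of such a download. The endpoint and header name below reflect my understanding of Wiley's text-and-data-mining API and may be out of date; check Wiley's current TDM documentation, set WILEY_TDM_TOKEN first, and note the on-campus requirement described above.

```shell
# Build the TDM article URL for a DOI; the DOI's slash must be
# percent-encoded in the path segment.
wiley_url() {
  printf 'https://api.wiley.com/onlinelibrary/tdm/v1/articles/%s\n' \
    "$(printf '%s' "$1" | sed 's#/#%2F#g')"
}

# Fetch the PDF for a DOI, saving it with the slash turned into "_".
wiley_fetch() {  # usage: wiley_fetch 10.1111/...
  curl -sfL -H "Wiley-TDM-Client-Token: $WILEY_TDM_TOKEN" \
    -o "$(printf '%s' "$1" | tr '/' '_').pdf" \
    "$(wiley_url "$1")"
}
```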