forked from livgust/covid-vaccine-scrapers
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge current scrape with previous results (livgust#52)
* Interim MAimmunizations improvement. (livgust#50)

* Test for the Mass. error page

As of 11am, it returns a Heroku error page about half the time. Catch that and fail fast. Also, check for other CSS selector search failures, although it's not clear under what circumstances that might happen.

Previously we got:

TypeError: Cannot read property 'evaluate' of undefined
at ScrapeWebsiteData (/Users/jhawk/src/covid-vaccine-scrapers/site-scrapers/MAImmunizations.js:20:48)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
at async GetAvailableAppointments (/Users/jhawk/src/covid-vaccine-scrapers/site-scrapers/MAImmunizations.js:5:18)
at async Promise.all (index 0)
at async gatherData (/Users/jhawk/src/covid-vaccine-scrapers/main.js:41:19)
at async execute (/Users/jhawk/src/covid-vaccine-scrapers/main.js:116:2)
at async /Users/jhawk/src/covid-vaccine-scrapers/main.js:124:3
The following data would be published:

attempting to iterate over the page navigation.

The Heroku error page looks like this:

---snip
<!DOCTYPE html><html><head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta charset="utf-8">
<title>Application Error</title>
<style media="screen">
html,body,iframe { margin: 0; padding: 0; }
html,body { height: 100%; overflow: hidden; }
iframe { width: 100%; height: 100%; border: 0; }
</style>
</head>
<body>
<iframe src="//www.herokucdn.com/error-pages/application-error.html"></iframe>
</body></html>
---snip

so we check the page title to determine whether we've landed on it.

* MAImmunizations.js: Also return false for other failure

Avoid the traceback for every instance of missing page navigation, not just the Heroku error page.

* Return static copy of MA scrape if we fail

* Fetch the current data.json, use it if necessary

Note we have a bootstrapping problem, which is why the baseline is here. Otherwise we can never update ourselves because the retrieved data never has the data to update.

Adds a node-fetch dependency.
* add defaulting data, next step is adding timestamps to scrapers
* forgot to add time allowance
* add timestamps for all scrapers
* respond to codecheck comments

Co-authored-by: John Hawkinson <[email protected]>
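The title check described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `isHerokuErrorPage`, `extractTitle`, and `scrapeOrFail` are hypothetical names, and the title is pulled from raw HTML with a regular expression only so the sketch runs without a browser.

```javascript
// Hypothetical helpers (assumptions, not taken from the commit).
// The title string comes from the Heroku error page quoted above.
const HEROKU_ERROR_TITLE = "Application Error";

// True when the page title matches Heroku's generic error page.
function isHerokuErrorPage(title) {
    return typeof title === "string" && title.trim() === HEROKU_ERROR_TITLE;
}

// Pull the <title> text out of raw HTML; returns null when there is none.
function extractTitle(html) {
    const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
    return match ? match[1].trim() : null;
}

// Fail fast: return false instead of walking page navigation that is absent.
function scrapeOrFail(html, scrape) {
    if (isHerokuErrorPage(extractTitle(html))) {
        return false;
    }
    return scrape(html);
}

const sample =
    "<html><head><title>Application Error</title></head><body></body></html>";
console.log(scrapeOrFail(sample, () => ["would scrape here"])); // false
```

Returning `false` rather than throwing lets the caller decide whether to fall back to previously published data.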
1 parent 982d0c5 commit 4ac5dc6
Showing 17 changed files with 1,082 additions and 266 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
/* mergeResults | ||
* | ||
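* Merges cachedResults into currentResults. A cached entry is skipped when | ||
* a current entry has the same generateKey() value. If secondsOfTolerance | ||
* is set (non-zero), only cached entries with a timestamp newer than | ||
* now - secondsOfTolerance are merged in. | ||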
*/ | ||
function mergeResults(currentResults, cachedResults, secondsOfTolerance) { | ||
if (!(cachedResults && cachedResults.length)) { | ||
return currentResults; | ||
} else { | ||
const combinedResults = []; | ||
const currentResultsMap = {}; | ||
currentResults.forEach((result) => { | ||
combinedResults.push(result); | ||
currentResultsMap[generateKey(result)] = 1; | ||
}); | ||
| ||
cachedResults.forEach((cachedResult) => { | ||
if (!currentResultsMap[generateKey(cachedResult)]) { | ||
if (secondsOfTolerance) { | ||
const lowerTimeBound = | ||
new Date() - secondsOfTolerance * 1000; | ||
if ( | ||
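cachedResult.timestamp && | ||
// Wrap in Date so Date objects, epoch milliseconds, and ISO | ||
// strings (e.g. from parsed JSON) all compare correctly. | ||
new Date(cachedResult.timestamp) >= lowerTimeBound | ||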
) { | ||
combinedResults.push(cachedResult); | ||
} | ||
} else { | ||
combinedResults.push(cachedResult); | ||
} | ||
} | ||
}); | ||
| ||
return combinedResults; | ||
} | ||
} | ||
| ||
function generateKey(entry) { | ||
let uniqueIdentifier = ""; | ||
["name", "street", "city", "zip"].forEach((key) => { | ||
if (entry[key]) { | ||
// Coerce to a string so a numeric field (e.g. a numeric zip) does not throw. | ||
uniqueIdentifier += `${String(entry[key]) | ||
.toLowerCase() | ||
.replace(/[^\w]/g, "")}|`; | ||
} | ||
}); | ||
| ||
return uniqueIdentifier; | ||
} | ||
| ||
module.exports.mergeResults = mergeResults; | ||
module.exports.generateKey = generateKey; |