-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
merge-trees-to-database #15
base: main
Are you sure you want to change the base?
Conversation
- added a merge step to upload to the database - added the db config files similar to the wtt_server - added command line arguments for the merge statement - the merge statement runs ogr2ogr then calls a stored procedure - still some TODOs for more async code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice work!!
How much testing have you done on it? Has any source failed or does it time out or exit on the larger datasets?
@@ -12,6 +12,7 @@ | |||
"convert": "node bin/convert.js", | |||
"normalize": "node bin/normalize.js", | |||
"concatenate": "node bin/concatenate.js", | |||
"merge": "node bin/merge.js", | |||
"save": "node bin/save.js", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge will get rid of the need for the save step I think. Do you want to handle that or should I assign myself?
* source: | ||
* https://github.com/vitaly-t/pg-promise/issues/78#issuecomment-171951303 | ||
*/ | ||
function camelizeColumns(data) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note, we have this function replicated here: https://github.com/waterthetrees/wtt_server/blob/68e22644d31da834a69b4435373e8b9aeac9ad5e/server/db/pg-promise-config.js
Probably not worth calling between repos tho since its a common utility function.
console.log(`Running for ${source.destinations.normalized.path}`); | ||
// FIXME use the async version of exec, but that means a new dependecy | ||
const command = `ogr2ogr -f "PostgreSQL" PG:"host=${dbConfig.host} user=${dbConfig.user} password=${dbConfig.password} dbname=${dbConfig.database}" ${source.destinations.normalized.path} -nln tree_staging -geomfield geom -append` | ||
exec(command, async (error, stdout, stderr) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer spawn here as it has larger buffer for longer running processes. Could use spawnSync.
This PR adds a merge step so that we can move trees into the database based on the geojson files. To do this, I've added a
merge
step to the data pipeline. This would run sometime after normalization using the geojsons from that step.The process does two things:
There are still a few TODOs in terms of async code we can add, albeit with additional libraries.
The performance of this step depends on the size of the file.
ogr2ogr
uploads 20k rows at a time. A file with 600k rows (like NYC) can take on the order of minutes, while a file with 20k rows (like Alameda) is within seconds. This is expected to run monthly, so that is not much of a concern.Not included in here is the
.env
file that will be necessary holding the database info, similar to the wtt_server repo.