merge-trees-to-database #15

tzinckgraf · 2023-02-01T23:25:34Z

This PR adds a merge step to the data pipeline so that we can load trees into the database from the GeoJSON files. It would run sometime after normalization, using the GeoJSON files that step produces.
The process does two things:

1. Use ogr2ogr to append a GeoJSON file to a database table, in this case a staging table.
2. Use pg-promise to call a stored procedure that merges the staging table into the main table.
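The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `buildOgr2ogrArgs`, `mergeSource`, the injected `run` helper, and the procedure name `merge_tree_staging` are all hypothetical, and it assumes the merge is exposed as a PostgreSQL procedure that can be invoked with `CALL`. The ogr2ogr flags are the ones used in `src/stages/merge.js`.

```javascript
// Sketch only: function names and merge_tree_staging are assumptions,
// not taken from the PR.

// Step 1: arguments for appending a GeoJSON file to the staging table.
// Passed as an array, so no shell quoting is needed around the PG: string.
function buildOgr2ogrArgs(dbConfig, geojsonPath) {
  const connection =
    `PG:host=${dbConfig.host} user=${dbConfig.user} ` +
    `password=${dbConfig.password} dbname=${dbConfig.database}`;
  return [
    '-f', 'PostgreSQL', connection, geojsonPath,
    '-nln', 'tree_staging', '-geomfield', 'geom', '-append',
  ];
}

// Runs both steps in order. `run` executes a command and `db` is a
// pg-promise style database object; both are injected so the flow can be
// exercised without a real database.
async function mergeSource({ run, db, dbConfig, geojsonPath }) {
  await run('ogr2ogr', buildOgr2ogrArgs(dbConfig, geojsonPath));
  // Step 2: merge the staging table into the main table (hypothetical name).
  await db.none('CALL merge_tree_staging()');
}
```

Keeping the stored procedure call behind the `await` on ogr2ogr means the merge only starts once the staging table has been fully appended.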

A few TODOs remain around making the code more async, though that would require additional libraries.

The performance of this step depends on the size of the file. ogr2ogr uploads 20k rows at a time. A file with 600k rows (like NYC) can take on the order of minutes, while a file with 20k rows (like Alameda) finishes within seconds. This step is expected to run monthly, so the runtime is not much of a concern.

Not included here is the .env file holding the database info. It is required, and is similar to the one in the wtt_server repo.
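As a rough idea of what that configuration could look like once the environment is populated (for example by a loader such as dotenv), here is a small sketch. The variable names `DB_HOST`, `DB_PORT`, `DB_USER`, `DB_PASSWORD`, and `DB_NAME` and the `loadDbConfig` helper are assumptions; the actual keys in the .env file are not shown in this PR.

```javascript
// Sketch only: the environment variable names are hypothetical.
function loadDbConfig(env = process.env) {
  const required = ['DB_HOST', 'DB_USER', 'DB_PASSWORD', 'DB_NAME'];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    // Fail early with a readable message instead of a connection error later.
    throw new Error(`Missing database settings: ${missing.join(', ')}`);
  }
  return {
    host: env.DB_HOST,
    port: Number(env.DB_PORT || 5432), // PostgreSQL's default port
    user: env.DB_USER,
    password: env.DB_PASSWORD,
    database: env.DB_NAME,
  };
}
```

The returned object uses the same `host`, `user`, `password`, and `database` fields that the merge stage reads from `dbConfig`.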

- added a merge step to upload to the database
- added the db config files similar to the wtt_server
- added command line arguments for the merge statement
- the merge statement runs ogr2ogr then calls a stored procedure
- still some TODOs for more async code

zoobot

LGTM, nice work!!
How much testing have you done on it? Has any source failed, timed out, or exited early on the larger datasets?

zoobot · 2023-02-02T02:42:57Z

package.json

@@ -12,6 +12,7 @@
    "convert": "node bin/convert.js",
    "normalize": "node bin/normalize.js",
    "concatenate": "node bin/concatenate.js",
+    "merge": "node bin/merge.js",
    "save": "node bin/save.js",


Merge will get rid of the need for the save step, I think. Do you want to handle that, or should I assign myself?

zoobot · 2023-02-02T02:47:33Z

src/db/pg-promise-config.js

+ * source:
+ * https://github.com/vitaly-t/pg-promise/issues/78#issuecomment-171951303
+ */
+function camelizeColumns(data) {


Just a note, we have this function replicated here: https://github.com/waterthetrees/wtt_server/blob/68e22644d31da834a69b4435373e8b9aeac9ad5e/server/db/pg-promise-config.js

Probably not worth sharing between repos though, since it's a common utility function.
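For readers unfamiliar with the utility, a dependency-free sketch of what such a function does is shown below. It is not the code from this PR or from wtt_server: the linked pg-promise discussion relies on the library's own `pgp.utils.camelize`, while the local `camelize` helper here is an assumption made to keep the example self-contained. In pg-promise this kind of function is typically wired into the `receive` event of the initialization options.

```javascript
// Sketch only: a stand-in for a library camelize helper.
// Converts snake_case to camelCase, e.g. tree_id -> treeId.
function camelize(name) {
  return name.replace(/_+([a-z0-9])/g, (_, ch) => ch.toUpperCase());
}

// Renames the columns of every row in place. All rows in one result set
// share the same columns, so the first row decides what gets renamed.
function camelizeColumns(rows) {
  if (rows.length === 0) return;
  for (const column of Object.keys(rows[0])) {
    const camel = camelize(column);
    // Skip names that are already camelCase or would overwrite a column.
    if (camel === column || camel in rows[0]) continue;
    for (const row of rows) {
      row[camel] = row[column];
      delete row[column];
    }
  }
}
```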

zoobot · 2023-02-02T02:55:53Z

src/stages/merge.js

+    console.log(`Running for ${source.destinations.normalized.path}`);
+    // FIXME use the async version of exec, but that means a new dependecy
+    const command = `ogr2ogr -f "PostgreSQL" PG:"host=${dbConfig.host} user=${dbConfig.user} password=${dbConfig.password} dbname=${dbConfig.database}" ${source.destinations.normalized.path} -nln tree_staging -geomfield geom -append`
+    exec(command, async (error, stdout, stderr) => {


Prefer spawn here: it streams stdout/stderr instead of buffering them, so a long-running process with a lot of output won't hit exec's maxBuffer limit. spawnSync would also work.
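One way to act on this without a new dependency is to wrap `spawn` in a Promise. The sketch below is only an illustration of that idea; the `run` helper name is hypothetical and not part of the PR.

```javascript
const { spawn } = require('child_process');

// Sketch only: `run` is a hypothetical helper.
// Runs a command and resolves once it exits with code 0. Output is streamed
// to the parent's stdout/stderr rather than collected in a buffer.
function run(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: 'inherit' });
    child.on('error', reject); // e.g. the command could not be started
    child.on('close', (code) => {
      if (code === 0) resolve();
      else reject(new Error(`${command} exited with code ${code}`));
    });
  });
}
```

Because the arguments are passed as an array, the connection string and file path do not go through a shell, which also avoids quoting problems with paths that contain spaces.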

merge-trees-to-database

ccc884d


zoobot reviewed Feb 2, 2023


zoobot mentioned this pull request Feb 17, 2023

Sources Integration with FE and vector tiles and db waterthetrees/.github#1

Open

6 tasks
