-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature/merge-treedata-staging #56
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,138 @@ | ||
|
||
create or replace procedure public.merge_treedata ( | ||
c_id_source_name character varying(255) | ||
) | ||
language plpgsql | ||
as $$ | ||
begin | ||
|
||
DROP TABLE IF EXISTS _tree; | ||
|
||
CREATE TEMPORARY TABLE _tree AS | ||
SELECT geom, | ||
id, | ||
id_reference, | ||
id_source_name, | ||
city, | ||
country, | ||
email, | ||
download, | ||
info, | ||
lat, | ||
lng, | ||
count, | ||
|
||
-- crosswalk fields, as per opentrees schema | ||
scientific, | ||
genus, | ||
species, | ||
variety, | ||
common, | ||
dbh, | ||
health, | ||
height, | ||
crown, | ||
spread, | ||
ule, | ||
updated, | ||
planted, | ||
note, | ||
address | ||
FROM treedata_staging ts | ||
WHERE 1=1 | ||
AND (c_id_source_name is null or ts.id_source_name = c_id_source_name); | ||
|
||
/* | ||
Insert with a left join based on the id_tree column | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Upserts can work, but left joins can be faster and more importantly, cleaner. The code to do an upsert on a bunch of columns in one sql statement can get big and cumbersome. It can be nicer to split it out. |
||
*/ | ||
INSERT INTO treedata ( | ||
id_tree, | ||
id_reference, | ||
id_source_name, | ||
geom, | ||
city, | ||
country, | ||
email, | ||
lat, | ||
lng, | ||
count, | ||
scientific, | ||
genus, | ||
species, | ||
variety, | ||
common, | ||
dbh, | ||
health, | ||
height, | ||
date_planted, | ||
note, | ||
address | ||
) | ||
SELECT id as id_tree, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note: Be careful with the ids, there's a lot of them... Wondering if we shot ourselves in the foot with calling the treehash "id"... id is the unique hash of the tree that we create with our tree-id repo. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think given all the changes lately, I need to refresh my database. In the latest version I have, I don't have an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The id might be hiding midway down the table. Its sandwiched
|
||
id_reference, | ||
id_source_name, | ||
geom, | ||
city, | ||
country, | ||
email, | ||
lat, | ||
lng, | ||
count, | ||
scientific, | ||
genus, | ||
species, | ||
variety, | ||
common, | ||
dbh, | ||
health, | ||
height, | ||
date_planted, | ||
note, | ||
address | ||
FROM _tree | ||
LEFT JOIN treedata on ( | ||
treedata.id_source_name = _tree.id_source_name | ||
AND treedata.id_tree = _tree.id | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These should not be the same There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. They shouldn't, but they are. I need to fix the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unless I am reading it wrong, I was looking at this code There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mean these: treedata.id_tree = _tree.id shouldnt be the same. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. lines 406, 408, 446, and 447 are all different. id_wtt needs to be removed at some point as it is no longer in use. reposting here: |
||
) | ||
WHERE 1=1 | ||
AND treedata.id_tree is null | ||
; | ||
|
||
/* | ||
Update with a join. Add more columns. | ||
*/ | ||
UPDATE treedata | ||
SET dbh = _tree.dbh, | ||
address = _tree.address, | ||
health = _tree.health | ||
FROM _tree | ||
WHERE 1=1 | ||
AND _tree.id = treedata.id_tree | ||
-- check any columns that changed | ||
-- FIXME add more columns | ||
AND ( | ||
_tree.dbh <> treedata.dbh | ||
OR _tree.address <> treedata.address | ||
OR _tree.health <> treedata.health | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I need to add some more columns here, but this is just a draft PR to get me started. |
||
) | ||
; | ||
|
||
/* | ||
For the delete, we will probably want to keep track of old treedata rows. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What do we want to do in the case of deletes? We might not be able to actually delete rows because of foreign keys. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We've talked about this a ton and we are almost there with functionality to return our data back to the source cities but we'll need to gather their emails and or build out a 311 api. What are you planning on deleting? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We may have trees in our db that do not exist in the source data... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm thinking of the case where a tree exists, then dies and gets cut down. The city the next season goes and plants a new tree in that same tree pit (because new tree pits are expensive). They record the new tree and maybe delete the old tree, assuming it was in the database. What do we do with the old tree in our database in that scenario? This is all hypothetical. I'm not sure any city does that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Similar to the update, we can delete based on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
They do replant in the same hole pretty much every time a tree dies so we need to figure it out. We have functionality on the FE to edit and delete but we removed it last year because it is a can of worms. The question is, how do we know for sure which is the correct data. What if a user replanted something and the city has old data. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah that's a real issue. Because the id_source_name IS the city name. I suppose we need to differentiate between what's city data and what is user data coming in from the field via the FE. Even still, which takes precedence. We could have a "field_data" vs "source_data" column value or something and then just grab whichever has the most recent date. A lot of sources don't have a modified or last maintained fiield tho. |
||
Do we want some sort of audit trail for deleted trees? | ||
Maybe an "active" column that we can use. | ||
We probably want an "active" column because of foreign key constraints | ||
*/ | ||
|
||
/* | ||
Delete the rows from the staging table based on the | ||
id_source_name | ||
*/ | ||
DELETE | ||
FROM treedata_staging | ||
WHERE 1=1 | ||
AND c_id_source_name is null or id_source_name = c_id_source_name | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FIXME: Put the |
||
; | ||
|
||
end; $$ | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to add some more columns here, like
dbh_min
anddbh_max
. I just need some more sample data to play with to test this out.We just merged the PR to run the staging tables from the tree-sources process, so this should be reasonable to test.