Merge trees to database #48
base: main
- added a script to fetch the tables dynamically and pull schemas into files
- added the table schema for individual tables
- updated README with database update instructions
- First iteration to create a table to hold all tree data
- data was taken based on the fields pulled in the tree-source repo
- added a merge stored procedure split into 3 different pieces of the upserts for performance
- staging table is an unlogged table to prevent WAL writes
- indexes added similar to the treedata table
- migration script to move the treedata data into the tree table
Nice start to this!!
For now we should probably name the db serial id_source_staging, or something similar, to follow the convention of the other tables' serials and avoid confusion. We use id as the id that gets passed around to the front end. In the tree-sources code base, id is used both as the city id (basically the city name) and as the treedata tree id that we create (not the db's serial). Basically, we currently use the db's serial for nothing at all.
Are tree and tree_staging temporary tables?
If these are temporary tables, can we name them after the tables they are merging into?
Can you add an in-depth commit description for this? :)
Also just putting this here for your perusal:
https://standards.opencouncildata.org/#/trees
We've deviated from it. We should maybe submit a PR requesting more fields...
drop table if exists _tree;

create temporary table _tree as
SELECT t.id as id_tree,
id and id_tree are different: id is the id that we create with the tree-id repo, while id_tree is the db serial.
For the purposes of this stored procedure, `id_tree` just lets me know that we are using the id that connects to the tree table, as opposed to the staging table. The difference is mainly used below for determining missing / matching data.
create temporary table _tree as
SELECT t.id as id_tree,
ts.id as id_tree_staging,
ts.ref as id_reference,
id_reference is what we were using, and then we moved it to ref because of the open data standards. Unfortunately, ref is a special name in React, so it is kind of confusing as an open data standard, imo.
@@ -0,0 +1,86 @@
-- drop table public.tree
CREATE TABLE public.tree (
    id bigserial NOT NULL primary key,
id_treedata_staging
Add an id based on one we create
@@ -0,0 +1,86 @@
-- drop table public.tree
CREATE TABLE public.tree (
treedata_staging
SELECT 1;

-- drop table public.tree_staging;
CREATE UNLOGGED TABLE public.tree_staging (
tree_sources_staging
country character varying(255),
neighborhood character varying(255),
health character varying(255),
dbh double precision,
`dbh_min` and `dbh_max`
Hi @tzinckgraf, are you still interested in working on this, or should I un-assign?
This is the first iteration for keeping the tree data in the database.
This PR introduces two new tables, `trees` and `trees_staging`.
The `trees` table would ideally replace the `treedata` table. The `treedata` table is currently a mix of time-based data and reference data. There was a discussion around normalizing it further. However, we can also use the original `treedata` table instead of the proposed `tree` table with minimal updates. There is a migration file that shows how the data is interchangeable between those two.
The `trees_staging` table is an unlogged table. This is similar to a temporary table in that there is no write to the WAL. It differs in that the data is global: you can write with one connection and then use that data with another. A temporary table is only visible to the connection that created it, so the data and every operation on it would have to stay on the same connection, which is a problem when using something like `ogr2ogr` for writing data.
There is also a stored procedure that merges the two tables. This does the full upsert logic in three parts using a temporary table. The procedure is idempotent.
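For readers who want a concrete picture of the staging-plus-merge flow described above, here is a minimal PostgreSQL sketch. It is not the procedure from this PR: the trimmed column list, the procedure name `merge_tree_sketch`, the choice of `ref` as the matching key, and the way the work is split into three steps (pair rows in a temporary table, update matches, insert missing rows) are all assumptions made for illustration. It also assumes PostgreSQL 11 or later for `CREATE PROCEDURE`, and is best tried in a scratch database.

```sql
-- Hypothetical, trimmed-down schema; only a few illustrative columns.
CREATE TABLE IF NOT EXISTS tree (
    id     bigserial PRIMARY KEY,
    ref    varchar(255) NOT NULL UNIQUE,  -- assumed key used to match rows
    health varchar(255),
    dbh    double precision
);

-- UNLOGGED skips WAL writes but, unlike a temporary table,
-- the table is visible to every connection.
CREATE UNLOGGED TABLE IF NOT EXISTS tree_staging (
    id     bigserial PRIMARY KEY,
    ref    varchar(255) NOT NULL UNIQUE,  -- assumes one staging row per ref
    health varchar(255),
    dbh    double precision
);

CREATE OR REPLACE PROCEDURE merge_tree_sketch()
LANGUAGE plpgsql
AS $$
BEGIN
    -- Step 1: pair each staging row with its existing tree row, if any.
    DROP TABLE IF EXISTS _tree;
    CREATE TEMPORARY TABLE _tree AS
    SELECT t.id   AS id_tree,           -- NULL when the row is missing from tree
           ts.id  AS id_tree_staging,
           ts.ref AS id_reference
    FROM tree_staging ts
    LEFT JOIN tree t ON t.ref = ts.ref;

    -- Step 2: update matching rows, but only when a value actually changed.
    UPDATE tree t
    SET health = ts.health,
        dbh    = ts.dbh
    FROM _tree m
    JOIN tree_staging ts ON ts.id = m.id_tree_staging
    WHERE t.id = m.id_tree
      AND (t.health, t.dbh) IS DISTINCT FROM (ts.health, ts.dbh);

    -- Step 3: insert rows that exist in staging but not yet in tree.
    INSERT INTO tree (ref, health, dbh)
    SELECT ts.ref, ts.health, ts.dbh
    FROM _tree m
    JOIN tree_staging ts ON ts.id = m.id_tree_staging
    WHERE m.id_tree IS NULL;
END;
$$;

-- Usage: load tree_staging from any connection (for example with ogr2ogr),
-- then run the merge.
CALL merge_tree_sketch();
```

Under these assumptions a second `CALL` with unchanged staging data updates and inserts nothing, which is the sense in which such a merge is idempotent.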