Commit
Merge pull request #117 from technologiestiftung/staging
feat: refactoring & cleanup (#116)
Jaszkowic authored Apr 29, 2024
2 parents 482c4b7 + 95f7196 commit 57c9503
Showing 39 changed files with 1,019 additions and 1,040 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test-harvest.yml
@@ -64,7 +64,7 @@ jobs:
id: api-start
run: cd api && supabase start | grep -w "service_role key" | cut -d ":" -f 2 | xargs | tr -d '\n' | awk '{print "service_role_key="$1}' >> "$GITHUB_OUTPUT" && cd ..
- name: run the harvester
run: docker run --env PG_SERVER='0.0.0.0' --env SKIP_MAPBOX --env PG_DB --env PG_PORT --env PG_USER --env PG_PASS --env SUPABASE_URL --env SUPABASE_SERVICE_ROLE_KEY='${{ steps.api-start.outputs.service_role_key }}' --env SUPABASE_BUCKET_NAME --env MAPBOXTOKEN --env MAPBOXUSERNAME --env LOGGING --env OUTPUT --network host technologiestiftung/giessdenkiez-de-dwd-harvester:test
run: docker run --env PG_SERVER='0.0.0.0' --env SKIP_MAPBOX --env PG_DB --env PG_PORT --env PG_USER --env PG_PASS --env SUPABASE_URL --env SUPABASE_SERVICE_ROLE_KEY='${{ steps.api-start.outputs.service_role_key }}' --env LIMIT_DAYS='30' --env SURROUNDING_SHAPE_FILE='/app/assets/buffer.shp' --env SUPABASE_BUCKET_NAME --env MAPBOXTOKEN --env MAPBOXUSERNAME --env MAPBOXTILESET --env MAPBOXLAYERNAME --env LOGGING --env OUTPUT --network host technologiestiftung/giessdenkiez-de-dwd-harvester:test
- name: stop the api
run: cd api && supabase stop && cd ..
release:
11 changes: 11 additions & 0 deletions .gitignore
@@ -292,3 +292,14 @@ dist
.yarn/build-state.yml
.yarn/install-state.gz
.pnp.*

# Generated files
harvester/assets/berlin.shx
harvester/assets/buffer.cpg
harvester/assets/buffer.dbf
harvester/assets/buffer.prj
harvester/assets/buffer.shp
harvester/assets/buffer.shx

harvester/.vscode
67 changes: 58 additions & 9 deletions README.md
@@ -1,6 +1,9 @@
![](https://img.shields.io/badge/Built%20with%20%E2%9D%A4%EF%B8%8F-at%20Technologiestiftung%20Berlin-blue)

<!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->

[![All Contributors](https://img.shields.io/badge/all_contributors-7-orange.svg?style=flat-square)](#contributors-)

<!-- ALL-CONTRIBUTORS-BADGE:END -->

# giessdenkiez-de-dwd-harvester
@@ -47,20 +50,67 @@ The current python binding of gdal is fixed to GDAL==2.4.2. If you get another g

Copy the `sample.env` file and rename it to `.env`, then update the parameters, most importantly the database connection parameters.

```
PG_SERVER=localhost
PG_PORT=54322
PG_USER=postgres
PG_PASS=postgres
PG_DB=postgres
SUPABASE_URL=http://127.0.0.1:54321
SUPABASE_SERVICE_ROLE=eyJh...
SUPABASE_BUCKET_NAME=data_assets
MAPBOXUSERNAME=your_mapbox_username
MAPBOXTOKEN=your_mapbox
MAPBOXTILESET=your_mapbox_tileset_id
MAPBOXLAYERNAME=your_mapbox_layer_name
SKIP_MAPBOX=False
LIMIT_DAYS=30
SURROUNDING_SHAPE_FILE=./assets/buffer.shp
```
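Since the harvester reads all of its settings from the environment, it helps to fail fast when one is missing. A minimal, hypothetical sketch of such a check (the variable list is taken from the sample above; the repository's actual validation may differ):

```python
import os

# Required settings, as listed in the sample .env above
REQUIRED_VARS = [
    "PG_SERVER", "PG_PORT", "PG_USER", "PG_PASS", "PG_DB",
    "SUPABASE_URL", "SUPABASE_SERVICE_ROLE", "SUPABASE_BUCKET_NAME",
    "MAPBOXUSERNAME", "MAPBOXTOKEN", "MAPBOXTILESET", "MAPBOXLAYERNAME",
    "SKIP_MAPBOX", "LIMIT_DAYS", "SURROUNDING_SHAPE_FILE",
]

def missing_settings(env=os.environ):
    """Return the names of required settings that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```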

## Running

### Preparing the Buffer Shape
`harvester/prepare.py` shows how `assets/buffer.shp` was created. If a bigger buffer is needed, change `line 10` accordingly and re-run.
Starting from an empty database, the complete process of running the DWD harvester consists of three steps:

1. Preparing the buffered shapefile
2. Creating the grid structure for the `radolan_geometry` table
3. Harvesting the DWD data

### Creating the Grid Structure
`harvester/grid/grid.py` can be used to populate the `radolan_geometry` table. This table contains vector data for the target city; the harvest process needs this data to find the rain data for the target city area.
### 1. Preparing the buffered shapefile

This tool currently works for Berlin. To make use of it for another city, just replace the `harvester/grid/buffer.shp` file with a suitable shape (which can be generated by `harvester/prepare.py`, for example; see above).
First, a buffered shapefile is needed; it is created with the following commands. This step uses the `harvester/assets/berlin.prj` and `harvester/assets/berlin.shp` files. Make sure the environment variables are set properly before running this step.

### Running the Harvest Process
`harvester/harvester.py` is the actual file for harvesting the data. Simply run it; there are no command-line parameters, and all settings are in `.env`.
- `cd harvester/prepare`
- `SHAPE_RESTORE_SHX=YES python create-buffer.py`
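`create-buffer.py` itself is not reproduced here. Conceptually, buffering grows the city outline outward by a fixed margin so that grid cells touching the border are fully covered. A pure-Python sketch of that idea for the special case of a rectangular bounding box (the real script presumably buffers arbitrary polygons with a geometry library):

```python
def buffer_bbox(bbox, margin):
    """Expand a (min_x, min_y, max_x, max_y) bounding box outward by `margin`.

    This mimics what a geometric buffer does for the special case of a
    rectangle; real shapefile buffering operates on arbitrary polygons.
    """
    min_x, min_y, max_x, max_y = bbox
    return (min_x - margin, min_y - margin, max_x + margin, max_y + margin)
```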

The code in `harvester/harvester.py` tries to clean up after itself. But when running this in a container, since the script is completely standalone, it's probably best to just destroy the whole container and start from scratch next time.
### 2. Creating the grid structure for the `radolan_geometry` table

Second, the `radolan_geometry` table needs to be populated. The buffered shapefile from the previous step must have been created and be available in `../assets`. The `radolan_geometry` table contains vector data for the target city; the harvest process needs it to find the rain data for the target city area. This repository contains shapefiles for the Berlin area. To use it for another city, replace the `harvester/assets/berlin.prj` and `harvester/assets/berlin.shp` files. Run the following commands to create the grid structure in the database:

- `cd harvester/prepare`
- `python create-grid.py`
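The essential idea of `create-grid.py` is to tile the buffered area with rectangular cells and store one polygon per cell in `radolan_geometry`. A hypothetical sketch that produces such cells as WKT strings using only the standard library (the cell size and the database insert step are left out):

```python
def grid_cells_wkt(min_x, min_y, max_x, max_y, cell_size):
    """Return one rectangular cell per grid position as a WKT POLYGON string."""
    cells = []
    y = min_y
    while y < max_y:
        x = min_x
        while x < max_x:
            x2, y2 = x + cell_size, y + cell_size
            # Close the ring by repeating the first vertex at the end
            cells.append(
                f"POLYGON(({x} {y}, {x2} {y}, {x2} {y2}, {x} {y2}, {x} {y}))"
            )
            x = x2
        y += cell_size
    return cells
```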

### 3. Harvesting the DWD data

Make sure to set the environment variables properly before running the script, and make sure that you have successfully run the previous steps for preparing the buffered shapefile and creating the grid structure for the `radolan_geometry` table. The file `harvester/src/run_harvester.py` contains the script for running the DWD harvester; it does the following:

- Checks for the existence of all required environment variables
- Sets up the database connection
- Gets the start and end date of the current harvesting run (for incremental daily harvesting)
- Downloads all daily RADOLAN files from the DWD server
- Extracts the daily RADOLAN files into hourly RADOLAN files
- For each hourly RADOLAN file:
  - Projects the data to Mercator and cuts out the area of interest, using the `gdalwarp` tool
  - Produces a polygon feature layer, using the `gdal_polygonize.py` tool
  - Extracts the raw RADOLAN values from the generated feature layer
  - Uploads the extracted RADOLAN values to the database
- Cleans up old RADOLAN values in the database (keeps only the last 30 days)
- Builds a RADOLAN grid holding the hourly RADOLAN values of the last 30 days for each polygon in the grid
- Updates the `radolan_sum` and `radolan_values` columns in the database's `trees` table
- Updates the Mapbox trees layer:
  - Builds a `trees.csv` file based on all trees (with updated RADOLAN values) in the database
  - Preprocesses `trees.csv` using the `tippecanoe` tool
  - Starts the creation of the updated Mapbox layer
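The incremental date-window logic in the steps above can be sketched as follows. This is an illustrative sketch under the assumption, suggested by `LIMIT_DAYS`, that the harvester fills the gap between the newest already-harvested day and today, going back at most `LIMIT_DAYS` days; the function name is hypothetical, not the repository's actual API.

```python
from datetime import date, timedelta

def harvest_window(last_harvested, today, limit_days=30):
    """Return the (start, end) dates to harvest.

    If nothing was harvested yet (last_harvested is None), go back the
    full limit_days; otherwise resume the day after the last harvest,
    but never further back than limit_days.
    """
    earliest = today - timedelta(days=limit_days)
    if last_harvested is None or last_harvested < earliest:
        start = earliest
    else:
        start = last_harvested + timedelta(days=1)
    return start, today
```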

## Docker

@@ -96,7 +146,6 @@ docker-compose up --build

```


## Contributors ✨

Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
10 changes: 10 additions & 0 deletions action.yml
@@ -50,6 +50,14 @@ inputs:
description: "Set to 'True' to skip the Mapbox Tileset generation (for testing pipelines)"
required: true
default: "False"
LIMIT_DAYS:
description: "The number of days to harvest DWD data for"
required: true
default: "30"
SURROUNDING_SHAPE_FILE:
description: "The path to the shape file of the area of interest"
required: true
default: "assets/buffer.shp"
runs:
using: "docker"
image: "harvester/Dockerfile"
@@ -69,3 +77,5 @@ runs:
LOGGING: ${{ inputs.LOGGING }}
DATABASE_URL: ${{ inputs.DATABASE_URL }}
SKIP_MAPBOX: ${{ inputs.SKIP_MAPBOX }}
LIMIT_DAYS: ${{ inputs.LIMIT_DAYS }}
SURROUNDING_SHAPE_FILE: ${{ inputs.SURROUNDING_SHAPE_FILE }}
7 changes: 2 additions & 5 deletions harvester/Dockerfile
@@ -27,11 +27,8 @@ COPY --from=builder /install /usr/local
RUN apt-get update && apt-get -y install git && apt-get -y install make
RUN git clone https://github.com/mapbox/tippecanoe.git && cd tippecanoe && make -j && make install

# COPY harvester.py /app/
# COPY prepare.py /app/
# COPY grid/ grid/
# COPY assets/ /app/assets
COPY . /app/

RUN cd /app/prepare && SHAPE_RESTORE_SHX=YES python create-buffer.py

CMD python /app/harvester.py && python /app/mapbox_tree_update.py
CMD python /app/src/run_harvester.py
1 change: 0 additions & 1 deletion harvester/assets/Berlin.cpg

This file was deleted.

Binary file removed harvester/assets/Berlin.dbf
Binary file not shown.
Binary file removed harvester/assets/Berlin.shx
Binary file not shown.