feat: populate database with admin boundaries #852

Open
wants to merge 9 commits into
base: main

@cka-y cka-y commented Dec 16, 2024

Summary:

This pull request introduces the new reverse_geolocation_populate Cloud Function, designed to populate a PostgreSQL database with administrative boundary polygons and metadata for specified country codes. The function integrates BigQuery for querying OpenStreetMap data and handles administrative levels dynamically based on the input.

Key Changes:

  • New Functionality:

    • Added reverse_geolocation_populate to handle:
      • Parsing country_code and admin_levels from the request payload.
      • Querying OpenStreetMap data via BigQuery to fetch administrative polygons.
      • Inserting or updating the database with geometry data using geoalchemy2.

    File Affected:

    • functions-python/reverse_geolocation_populate/src/main.py
  • Workflow Integration:

    • Updated the associated GCP workflow to trigger reverse_geolocation_populate for all ISO-3166-1 country codes, with a concurrency limit of 5.
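For reviewers, here is a minimal sketch of the payload-parsing step. It is not the code in main.py: parse_request_payload is a hypothetical helper, and the comma-separated format for admin_levels is taken from the README excerpt in this PR. Validation beyond "two letters" and "integers" is left out.

```python
from typing import Optional


def parse_request_payload(payload: dict) -> tuple[str, Optional[list[int]]]:
    """Hypothetical helper: normalize country_code and admin_levels.

    Returns (country_code, levels); levels is None when admin_levels is
    absent, which signals the caller to derive the levels automatically.
    """
    country_code = payload.get("country_code")
    if not isinstance(country_code, str):
        raise ValueError("country_code is required")
    country_code = country_code.strip().upper()
    if len(country_code) != 2 or not country_code.isalpha():
        raise ValueError("country_code must be an ISO 3166-1 alpha-2 code")

    raw_levels = payload.get("admin_levels")
    if raw_levels is None or not str(raw_levels).strip():
        return country_code, None

    # "2,4,6" -> [2, 4, 6]; duplicates are dropped and the result is sorted.
    # int() raises ValueError on non-numeric entries.
    levels = sorted({int(part) for part in str(raw_levels).split(",") if part.strip()})
    return country_code, levels
```

With this sketch, a payload of {"country_code": "fr", "admin_levels": "2,4,6"} yields ("FR", [2, 4, 6]), and a payload with only country_code yields None for the levels.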

Expected Behavior:

  1. Country-Level Populations:

    • The function populates the database with administrative boundary polygons for specified country codes, supporting up to the 8th level of administrative boundaries as defined in the [OSM documentation].

    • If admin_levels is not provided in the request, the levels are determined dynamically:

      1. Country-Level (ISO 3166-1): The function identifies the base administrative level for the country (e.g., admin_level = 2 for most countries).
      2. Subdivision Levels (ISO 3166-2): The function automatically extracts subdivision levels (e.g., states, provinces) based on the ISO 3166-2 definitions.
      3. Additional Levels: Two levels beyond the highest ISO 3166-2-compliant level are included to capture finer subdivisions where applicable.
  2. Example Outputs:

    • Canada (DEV):
      Populates provincial boundaries (only ISO 3166-2 compliant locations are shown, but more administrative boundaries are extracted for Canada).
      [Screenshot 2024-12-16 at 12:22:03 PM]
    • France (DEV):
      Populates regions and departments (only ISO 3166-2 compliant locations are shown, but more administrative boundaries are extracted for France).
      [Screenshot 2024-12-16 at 12:23:25 PM]
  3. Full Workflow Run:

    • Example execution of the population workflow:
      [View Workflow Execution]
    • Note: This workflow should be run once per environment for initial population.
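The dynamic level selection described under "Country-Level Populations" can be sketched as follows. This is an illustration under stated assumptions, not the implementation in this PR: resolve_admin_levels is a hypothetical name, the country level and the ISO 3166-2 levels are assumed to have been extracted from the OSM data already, "two levels beyond" is read as the next two numeric admin_level values, and "up to the 8th level" is read as a cap of admin_level 8.

```python
MAX_ADMIN_LEVEL = 8  # assumption: "up to the 8th level" means admin_level <= 8


def resolve_admin_levels(
    country_level: int,
    subdivision_levels: list[int],
    extra_levels: int = 2,
    max_level: int = MAX_ADMIN_LEVEL,
) -> list[int]:
    """Hypothetical helper: combine country, ISO 3166-2, and extra levels."""
    # Highest ISO 3166-2 level; fall back to the country level if none exist.
    highest = max(subdivision_levels, default=country_level)
    # Assumption: the "additional levels" are the next numeric levels.
    extra = range(highest + 1, highest + 1 + extra_levels)
    candidates = {country_level, *subdivision_levels, *extra}
    return sorted(level for level in candidates if level <= max_level)
```

For example, with a country level of 2 and subdivision levels 4 and 6 as input, the sketch returns [2, 4, 6, 7, 8]; with a single subdivision level 4 it returns [2, 4, 5, 6].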

Testing Steps:

  1. Database Validation:

    • Connect to the DEV database.
    • Query the GeoPolygon table to explore populated geometries and verify data integrity.
  2. Workflow Validation:

    • Trigger the associated workflow with all ISO country codes.
    • Monitor the logs and ensure successful execution for each country.

Notes:

  • The function ensures SRID 4326 compliance for geometries.
  • Indexing has been optimized for spatial queries using GIST.
  • Workflow concurrency is limited to 5 requests at a time to balance resource usage.
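In this PR the concurrency limit is applied by the GCP workflow itself. For anyone who wants to reproduce the same bound from a local script, the Python sketch below shows one way to cap in-flight calls at 5 with a thread pool. populate_all and the populate_country callable are hypothetical; the callable stands in for whatever sends the request for one country code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Optional


def populate_all(
    country_codes: Iterable[str],
    populate_country: Callable[[str], None],  # hypothetical per-country call
    max_concurrency: int = 5,
) -> dict[str, Optional[str]]:
    """Run populate_country for every code, at most max_concurrency at once.

    Returns a mapping of country code to None on success, or to the repr of
    the exception that the call raised.
    """
    results: dict[str, Optional[str]] = {}
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = {pool.submit(populate_country, code): code for code in country_codes}
        for future in as_completed(futures):
            code = futures[future]
            error = future.exception()
            results[code] = None if error is None else repr(error)
    return results
```

Collecting failures per country instead of raising on the first one matches the testing step of checking the logs for each country after a full run.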

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with ./scripts/api-tests.sh to make sure you didn't break anything
  • Add or update any needed documentation to the repo
  • Format the title like "feat: [new feature short description]". The title must follow the Conventional Commits specification (https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

@cka-y cka-y linked an issue Dec 16, 2024 that may be closed by this pull request
@cka-y cka-y marked this pull request as ready for review December 16, 2024 17:36

cka-y commented Dec 16, 2024

This should only be merged after the 1.5.1 release.

@davidgamez davidgamez left a comment

I added a few non-blocking comments. The full functionality is hard to test because it integrates with BigQuery. Since this is a non-user-facing function, we can do a full testing exercise after merging to QA. If any issues are encountered, we can fix them in a future PR.

### Input
The function expects an HTTP POST request with a JSON payload containing:
- **`country_code`** (required): The ISO 3166-1 alpha-2 code of the country (e.g., `"FR"` for France).
- **`admin_levels`** (optional): A comma-separated list of administrative levels to process (e.g., `"2,4,6"`). If not provided, the function calculates levels automatically.

Member

[question]: Since admin_levels is optional, what is the behavior when it is not specified?

Contributor Author

I added this in the PR description, but I should probably add it to the README as well.
If admin_levels is not provided in the request, the levels are determined dynamically:

  • Country-Level (ISO 3166-1): The function identifies the base administrative level for the country (e.g., admin_level = 2 for most countries).
  • Subdivision Levels (ISO 3166-2): The function automatically extracts subdivision levels (e.g., states, provinces) based on the ISO 3166-2 definitions.
  • Additional Levels: Two levels beyond the highest ISO 3166-2-compliant level are included to capture finer subdivisions where applicable.

"description": "Populate the database with reverse geolocation data",
"entry_point": "reverse_geolocation_populate",
"timeout": 3600,
"memory": "8Gi",

Member

Should we remove the memory field? I think we are using memory and available_memory for the same purpose in different functions.

cloudevents~=1.10.1

# Additional packages for this function
gtfs-kit

Member

Where is this dependency being used?

Successfully merging this pull request may close these issues.

Add geolocation information to the database
2 participants