
Commit

Merge pull request #132 from ASFHyP3/develop
Release 0.3.0
jtherrmann authored Dec 8, 2022
2 parents ca6b920 + 707da08 commit ca6c75d
Showing 19 changed files with 304 additions and 65 deletions.
14 changes: 8 additions & 6 deletions .github/workflows/tests.yml
@@ -5,17 +5,19 @@ on: push
 jobs:
   pytest:
     runs-on: ubuntu-latest
+    defaults:
+      run:
+        shell: bash -l {0}

     steps:
       - uses: actions/checkout@v3

-      - uses: actions/setup-python@v4
+      - uses: conda-incubator/setup-miniconda@v2
         with:
-          python-version: 3.9
-
-      - run: |
-          python -m pip install --upgrade pip
-          make install
+          mamba-version: '*'
+          python-version: '3.9'
+          activate-environment: asf-stac
+          environment-file: environment.yml

       - name: run pytest
         run: make test
1 change: 0 additions & 1 deletion .gitignore
@@ -131,7 +131,6 @@ dist/
 downloads/
 eggs/
 .eggs/
-lib/
 lib64/
 parts/
 sdist/
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,10 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [0.3.0]
+### Added
+- Created a STAC item collection for the `glo-30-hand` dataset.
+
 ## [0.2.0]
 ### Added
 - Created a STAC item collection for the `sentinel-1-global-coherence` dataset.
2 changes: 1 addition & 1 deletion Makefile
@@ -55,7 +55,7 @@ run-api:
 	python -m stac_fastapi.pgstac.app

 test:
-	PYTHONPATH=${PWD}/collections/sentinel-1-global-coherence/ python -m pytest tests/
+	PYTHONPATH=${PWD}/collections/sentinel-1-global-coherence/:${PWD}/collections/glo-30-hand/ python -m pytest tests/

 static: flake8 cfn-lint
47 changes: 46 additions & 1 deletion README.md
@@ -4,9 +4,11 @@ Creation and hosting of STAC catalogs by the ASF Tools team.

 **Production API:** <https://stac.asf.alaska.edu>
 * *Swagger UI:* <https://stac.asf.alaska.edu/api.html>
+* *STAC Browser:* <https://radiantearth.github.io/stac-browser/#/external/stac.asf.alaska.edu/>

 **Test API:** <https://stac-test.asf.alaska.edu>
 * *Swagger UI:* <https://stac-test.asf.alaska.edu/api.html>
+* *STAC Browser:* <https://radiantearth.github.io/stac-browser/#/external/stac-test.asf.alaska.edu/>

 ## Developer setup

@@ -20,6 +22,18 @@ conda env create -f environment.yml
 conda activate asf-stac
 ```

+If you ever see the following warning from `gdal`...
+
+```
+Warning 1: PROJ: proj_create_from_database: Open of /home/.../miniconda3/envs/asf-stac/share/proj failed
+```
+
+...you can run the following command and then re-activate your Conda environment:
+
+```
+conda env config vars set PROJ_LIB=${CONDA_PREFIX}/share/proj
+```
+
 ## Requirements for connecting to the database

 The database only accepts connections from within the ASF Full VPN or from clients

@@ -57,7 +71,38 @@ Finally, ingest the dataset:

 ```
 cd ../../
-make pypgstac-load db_host=<host> db_admin_password=<password> table=items ndjson_file=sentinel-1-global-coherence.ndjson
+make pypgstac-load db_host=<host> db_admin_password=<password> table=items ndjson_file=collections/sentinel-1-global-coherence/sentinel-1-global-coherence.ndjson
 ```

+## Creating and ingesting the HAND dataset
+
+We must create and ingest the HAND dataset after running a new STAC API deployment. We must also
+re-create and re-ingest the dataset after making changes to how the STAC items are structured.
+
+Fetch the list of S3 objects:
+
+```
+cd collections/glo-30-hand/
+./list-hand-objects
+wc -l hand-s3-objects.txt
+```
+
+Confirm that the number of lines is `26450` (one per object).
+
+Next, create the dataset:
+
+```
+python create_hand_items.py hand-s3-objects.txt
+wc -l glo-30-hand.ndjson
+```
+
+Again, confirm that the number of lines is the same as in the previous step.
+
+Finally, ingest the dataset:
+
+```
+cd ../../
+make pypgstac-load db_host=<host> db_admin_password=<password> table=items ndjson_file=collections/glo-30-hand/glo-30-hand.ndjson
+```

 ## Manually connecting to the database
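After ingest, the new collection should be visible through the STAC API. A quick sanity check (a hypothetical snippet, not part of this commit; it assumes the production URL from the README and the standard STAC API `/collections/{id}` and `/search` endpoints):

```
import json
import urllib.request

# Production API from the README; use https://stac-test.asf.alaska.edu for the test deployment.
api = 'https://stac.asf.alaska.edu'

# Fetch the collection record to confirm the collection ingest worked.
with urllib.request.urlopen(f'{api}/collections/glo-30-hand') as response:
    collection = json.load(response)
print(collection['id'], collection['extent']['temporal']['interval'])

# Request a single item to confirm the item ingest worked too.
with urllib.request.urlopen(f'{api}/search?collections=glo-30-hand&limit=1') as response:
    results = json.load(response)
print(len(results['features']), 'item(s) returned')
```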
4 changes: 2 additions & 2 deletions apps/database/buildspec.yml
@@ -12,5 +12,5 @@ phases:
   build:
     commands:
       - make configure-database db_host=$PGHOST db_admin_password=$PGPASSWORD db_read_password=$READ_PASSWORD
-      - python convert_collection_to_ndjson.py collections/sentinel-1-global-coherence/sentinel-1-global-coherence.json
-      - make pypgstac-load db_host=$PGHOST db_admin_password=$PGPASSWORD table=collections ndjson_file=collections/sentinel-1-global-coherence/sentinel-1-global-coherence.ndjson
+      - python convert_collections_to_ndjson.py collections/sentinel-1-global-coherence/sentinel-1-global-coherence.json collections/glo-30-hand/glo-30-hand.json
+      - make pypgstac-load db_host=$PGHOST db_admin_password=$PGPASSWORD table=collections ndjson_file=collections.ndjson
83 changes: 83 additions & 0 deletions collections/glo-30-hand/create_hand_items.py
@@ -0,0 +1,83 @@
import argparse
import urllib.parse
from datetime import datetime, timezone
from pathlib import Path, PurePath

import asf_stac_util
import boto3
from osgeo import gdal
from shapely import geometry

gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN', 'EMPTY_DIR')

s3 = boto3.client('s3')

COLLECTION_ID = 'glo-30-hand'


def get_s3_url() -> str:
    bucket = 'glo-30-hand'
    location = s3.get_bucket_location(Bucket=bucket)['LocationConstraint']
    return f'https://{bucket}.s3.{location}.amazonaws.com/'


def write_stac_items(s3_keys: list[str], s3_url: str, output_file: Path) -> None:
    with output_file.open('w') as f:
        for count, s3_key in enumerate(s3_keys, start=1):
            print(f'Creating STAC items: {count}/{len(s3_keys)}', end='\r')
            gdal_info_output = gdal_info(s3_key, s3_url)
            stac_item = create_stac_item(s3_key, s3_url, gdal_info_output)
            f.write(asf_stac_util.jsonify_stac_item(stac_item) + '\n')


def gdal_info(s3_key: str, s3_url: str) -> dict:
    url = f'/vsicurl/{urllib.parse.urljoin(s3_url, s3_key)}'
    return gdal.Info(url, format='json')


def create_stac_item(s3_key: str, s3_url: str, gdal_info_output: dict) -> dict:
    item_id = PurePath(s3_key).stem
    item_geometry = gdal_info_output['wgs84Extent']
    return {
        'type': 'Feature',
        'stac_version': '1.0.0',
        'id': item_id,
        'properties': {
            'datetime': None,
            'start_datetime': datetime(2010, 12, 1, tzinfo=timezone.utc),
            'end_datetime': datetime(2015, 2, 1, tzinfo=timezone.utc),
        },
        'geometry': item_geometry,
        'assets': {
            'data': {
                'href': urllib.parse.urljoin(s3_url, s3_key),
                'type': 'image/tiff; application=geotiff',
            },
        },
        'bbox': geometry.shape(item_geometry).bounds,
        'stac_extensions': [],
        'collection': COLLECTION_ID,
    }


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('s3_objects', type=Path, help='Path to a text file containing the list of S3 objects')
    parser.add_argument('-o', '--output-file', type=Path, help='Path for the output file',
                        default='glo-30-hand.ndjson')
    parser.add_argument('-n', '--number-of-items', type=int, help='Number of items to create')
    return parser.parse_args()


def main():
    args = parse_args()

    with args.s3_objects.open() as f:
        s3_keys = f.read().splitlines()[:args.number_of_items]

    s3_url = get_s3_url()
    write_stac_items(s3_keys, s3_url, args.output_file)


if __name__ == '__main__':
    main()
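A note on the `bbox` field above: `gdal.Info` reports each COG's footprint as a GeoJSON polygon under `wgs84Extent`, and `shapely`'s `.bounds` converts that polygon to the `(minx, miny, maxx, maxy)` ordering a STAC item `bbox` expects. A minimal illustration of that step (the tile coordinates below are made up):

```
from shapely import geometry

# A stand-in for gdal_info_output['wgs84Extent']; coordinates are hypothetical.
wgs84_extent = {
    'type': 'Polygon',
    'coordinates': [[
        [-97.0, 45.0],
        [-97.0, 44.0],
        [-96.0, 44.0],
        [-96.0, 45.0],
        [-97.0, 45.0],
    ]],
}

# .bounds returns (minx, miny, maxx, maxy), matching the STAC bbox ordering.
print(geometry.shape(wgs84_extent).bounds)
# prints (-97.0, 44.0, -96.0, 45.0)
```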
34 changes: 34 additions & 0 deletions collections/glo-30-hand/glo-30-hand.json
@@ -0,0 +1,34 @@
{
  "type": "Collection",
  "id": "glo-30-hand",
  "title": "Global 30m Height Above Nearest Drainage (HAND)",
  "stac_version": "1.0.0",
  "description": "Height Above Nearest Drainage (HAND) is a terrain model that normalizes topography to the relative heights along the drainage network and is used to describe the relative soil gravitational potentials or the local drainage potentials. Each pixel value represents the vertical distance to the nearest drainage. The HAND data provides near-worldwide land coverage at 30 meters and was produced from the 2021 release of the Copernicus GLO-30 Public DEM as distributed in the Registry of Open Data on AWS. To generate the GLO-30 HAND, we used the ASF Tools Python package, which is based on the HydroSAR Big_Hand_notebook.ipynb and uses PySheds, a library for simple and fast watershed delineation in Python. The HAND data are provided as a tiled set of Cloud Optimized GeoTIFFs (COGs) with 30-meter (1 arcsecond) pixel spacing. The COGs are organized into the same 1 degree by 1 degree grid tiles as the GLO-30 DEM, and individual tiles are pixel-aligned to the corresponding COG DEM tile.",
  "links": [
    {
      "rel": "about",
      "href": "https://glo-30-hand.s3.us-west-2.amazonaws.com/readme.html",
      "type": "text/html"
    }
  ],
  "extent": {
    "spatial": {
      "bbox": [
        -180.0,
        -90.0,
        180.0,
        90.0
      ]
    },
    "temporal": {
      "interval": [
        [
          "2010-12-01T00:00:00Z",
          "2015-02-01T00:00:00Z"
        ]
      ]
    }
  },
  "stac_extensions": [],
  "license": "Creative Commons Attribution 4.0 International Public License."
}
3 changes: 3 additions & 0 deletions collections/glo-30-hand/list-hand-objects
@@ -0,0 +1,3 @@
@@ -0,0 +1,3 @@
#!/usr/bin/env bash

aws s3 ls --no-sign-request --recursive s3://glo-30-hand/v1/2021/ | cut -c '32-' | grep '.tif$' > hand-s3-objects.txt
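In the script above, `aws s3 ls --recursive` prints fixed-width date, time, and size columns (31 characters) before each key, so `cut -c '32-'` keeps just the object key. A rough `boto3` equivalent (a sketch, not part of this commit; it assumes the bucket allows anonymous listing, per `--no-sign-request`):

```
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Unsigned requests mirror aws s3 ls --no-sign-request.
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

paginator = s3.get_paginator('list_objects_v2')
with open('hand-s3-objects.txt', 'w') as f:
    for page in paginator.paginate(Bucket='glo-30-hand', Prefix='v1/2021/'):
        for obj in page.get('Contents', []):
            if obj['Key'].endswith('.tif'):  # same filter as grep '.tif$'
                f.write(obj['Key'] + '\n')
```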
20 changes: 3 additions & 17 deletions collections/sentinel-1-global-coherence/create_coherence_items.py
@@ -1,11 +1,11 @@
 import argparse
-import json
 import urllib.parse
 from dataclasses import dataclass

 from datetime import datetime, timezone
 from pathlib import Path, PurePath

+import asf_stac_util
 import boto3
 from shapely import geometry

@@ -70,17 +70,7 @@ def write_stac_items(s3_keys: list[str], s3_url: str, output_file: Path) -> None
         for count, s3_key in enumerate(s3_keys, start=1):
             print(f'Creating STAC items: {count}/{len(s3_keys)}', end='\r')
             stac_item = create_stac_item(s3_key, s3_url)
-            f.write(jsonify_stac_item(stac_item) + '\n')
-
-
-def jsonify_stac_item(stac_item: dict) -> str:
-    class DateTimeEncoder(json.JSONEncoder):
-        def default(self, obj):
-            if isinstance(obj, datetime) and obj.tzinfo == timezone.utc:
-                return obj.isoformat().removesuffix('+00:00') + 'Z'
-            return json.JSONEncoder.default(self, obj)
-
-    return json.dumps(stac_item, cls=DateTimeEncoder)
+            f.write(asf_stac_util.jsonify_stac_item(stac_item) + '\n')

@@ -122,7 +112,7 @@ def create_stac_item(s3_key: str, s3_url: str) -> dict:


 def parse_s3_key(s3_key: str) -> ItemMetadata:
-    item_id = item_id_from_s3_key(s3_key)
+    item_id = PurePath(s3_key).stem
     parts = item_id.split('_')
     if len(parts) == 3:
         tile, _, product = parts

@@ -152,10 +142,6 @@ def parse_s3_key(s3_key: str) -> ItemMetadata:
     return metadata


-def item_id_from_s3_key(s3_key: str) -> str:
-    return PurePath(s3_key).stem
-
-
 def bounding_box_from_tile(tile: str) -> geometry.Polygon:
     # "Tiles in the data set are labeled by the upper left coordinate of each 1x1 degree tile"
     # http://sentinel-1-global-coherence-earthbigdata.s3-website-us-west-2.amazonaws.com/#organization
13 changes: 0 additions & 13 deletions convert_collection_to_ndjson.py

This file was deleted.

15 changes: 15 additions & 0 deletions convert_collections_to_ndjson.py
@@ -0,0 +1,15 @@
import argparse
import json
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument('--output-file', type=Path, default='collections.ndjson')
parser.add_argument('collections', type=Path, nargs='+')
args = parser.parse_args()

with args.output_file.open('w') as output_file:
    for collection in args.collections:
        with collection.open() as f:
            data = json.load(f)
        json.dump(data, output_file)
        output_file.write('\n')
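NDJSON here just means one JSON document per line, which is the format the `pypgstac-load` Makefile target ingests. A quick way to eyeball the output (a hypothetical check, not part of this commit):

```
import json

# Each line of collections.ndjson should parse as a standalone STAC collection.
with open('collections.ndjson') as f:
    for line in f:
        print(json.loads(line)['id'])
# Expected: sentinel-1-global-coherence, then glo-30-hand
```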
1 change: 1 addition & 0 deletions environment.yml
@@ -5,6 +5,7 @@ channels:
 dependencies:
   - python=3.9
   - pip
+  - gdal=3.6.0
   - postgresql
   - pip:
     - -r requirements.txt
12 changes: 12 additions & 0 deletions lib/asf-stac-util/asf_stac_util/__init__.py
@@ -0,0 +1,12 @@
import json
from datetime import datetime, timezone


def jsonify_stac_item(stac_item: dict) -> str:
    class DateTimeEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, datetime) and obj.tzinfo == timezone.utc:
                return obj.isoformat().removesuffix('+00:00') + 'Z'
            return json.JSONEncoder.default(self, obj)

    return json.dumps(stac_item, cls=DateTimeEncoder)
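A usage sketch (hypothetical item) showing why the custom encoder exists: plain `json.dumps` raises `TypeError` on `datetime` objects, while this helper renders UTC datetimes in the RFC 3339 `Z` form that STAC uses:

```
from datetime import datetime, timezone

import asf_stac_util

item = {
    'id': 'example-item',  # hypothetical
    'properties': {'start_datetime': datetime(2010, 12, 1, tzinfo=timezone.utc)},
}

print(asf_stac_util.jsonify_stac_item(item))
# {"id": "example-item", "properties": {"start_datetime": "2010-12-01T00:00:00Z"}}
```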
10 changes: 10 additions & 0 deletions lib/asf-stac-util/setup.py
@@ -0,0 +1,10 @@
from setuptools import find_packages, setup

setup(
    name='asf-stac-util',
    license='BSD',
    include_package_data=True,
    install_requires=[],
    python_requires='~=3.9',
    packages=find_packages(),
)
1 change: 1 addition & 0 deletions requirements.txt
@@ -1,5 +1,6 @@
 -r requirements-apps-api.txt
 -r requirements-run-codebuild.txt
+./lib/asf-stac-util/
 boto3==1.26.21
 cfn-lint==0.72.1
 flake8==6.0.0