House keeping #16

Merged
merged 3 commits into from
Nov 18, 2024
2 changes: 1 addition & 1 deletion .github/workflows/python-package.yml
@@ -36,7 +36,7 @@ jobs:
python -m pip install -r requirements.txt
- name: Lint with pylint
run: |
python3 -m pylint deltap predictors prepare_layers prepare_species
python3 -m pylint deltap prepare_layers prepare_species
- name: Tests
run: |
python3 -m pytest ./tests
96 changes: 6 additions & 90 deletions README.md
@@ -1,95 +1,11 @@
# LIFE implementation

Code for calculating persistence values.

Originally derived from and using IUCN modlib and aoh lib by Daniele Baisero.

## Pipeline

The code is designed to run as a series of independent stages to minimise re-running code. The stages currently are:

1. Input generation: running `speciesgenerator.py` generates a CSV list of species/seasonality/experiment tuples.
2. AoH calculation: using a tool like [littlejohn](https://github.com/carboncredits/littlejohn) you can then process each line of the generated data, producing a new CSV file that contains the inputs plus the calculated area.
3. Persistence calculation: TODO - add a script that processes the output of stage 2 to generate persistence values as CSV.

This is currently encoded in the included makefile.
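
As a rough sketch of how the stage outputs chain together, the following reads the stage 1 CSV and writes the stage 2 CSV with an extra area column. The file names, column names, and the `calculate_aoh_area` helper are hypothetical, for illustration only:

```python
# Illustrative only: file names, column names and calculate_aoh_area() are
# assumptions, not part of this repository's API.
import csv

def calculate_aoh_area(species: str, seasonality: str, experiment: str) -> float:
    """Placeholder for the per-species AoH calculation performed in stage 2."""
    return 0.0

with open("species_list.csv", newline="") as src, \
     open("species_areas.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)  # stage 1 output: species/seasonality/experiment tuples
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + ["area"])
    writer.writeheader()
    for row in reader:
        row["area"] = calculate_aoh_area(row["species"], row["seasonality"], row["experiment"])
        writer.writerow(row)      # stage 2 output: the inputs plus the calculated area
```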

## Configuration

The main program `run.py` takes its configuration from a JSON file, which should be called `config.json` or be specified using the `--config` parameter. The contents of the file should look like this:

```json
{
"iucn": {
"api_key": "YOUR_IUCN_API_KEY"
},
"experiments": {
"ae491-jung": {
"translator": "jung",
"habitat": "S:\\aoh_trial\\jung_aoh_basemaps\\HabMap_merge.tif",
"elevation": "S:\\aoh_trial\\jung_aoh_basemaps\\Modlib_DEM_merge.tif",
"area": "S:\\aoh_trial\\jung_aoh_basemaps\\small.tiff",
"range": "S:\\aoh_trial\\mammals_terrestrial_filtered_collected_fix.gpkg",
"iucn_batch": "S:\\4C\\data\\aoh_trial\\MAMMALS"
},
"gpu_tests": {
"translator": "esacci",
"habitat": "S:\\aoh_trial\\esacci_aoh_basemaps\\esacci_2020.tif",
"elevation": "S:\\aoh_trial\\esacci_aoh_basemaps\\esacci_dem.tif",
"area": "S:\\aoh_trial\\esacci_aoh_basemaps\\small_area.tif",
"range": "S:\\aoh_trial\\mammals_terrestrial_filtered_collected_fix.gpkg"
}
}
}
```

| Key | Optional | Meaning |
| --- | -------- | ------- |
| iucn | yes | Contains IUCN API access data. |
| api_key | yes | Your key for accessing the IUCN redlist API. |
| experiments | no | A dictionary of data required to run an experiment. Use the `--experiment` option on `run.py` to select the one you want for a particular invocation. |
| translator | no | Which translator should be used to convert the range map data to match the raster map data. Valid values are "jung" and "esacci". |
| habitat | no | Raster habitat map file location. |
| elevation | no | Raster elevation map file location. |
| area | yes | Raster pixel-area map file location. If not provided you'll get a count of pixels rather than a total area. |
| range | no | Vector species range map file location. |
| iucn_batch | yes | The location of canned/pre-downloaded IUCN data. If present, this will be used in preference to making API lookups. |
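
As a minimal sketch of how such a configuration might be consumed, assuming only the `--config` and `--experiment` options and the JSON layout described above (the actual `run.py` may be structured quite differently):

```python
# Sketch only: loads config.json and picks one experiment. The actual run.py
# may differ; only the option names and JSON keys come from the text above.
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("--config", default="config.json", help="path to the JSON configuration file")
parser.add_argument("--experiment", required=True, help="key under 'experiments' to use")
args = parser.parse_args()

with open(args.config, encoding="utf-8") as f:
    config = json.load(f)

experiment = config["experiments"][args.experiment]
habitat_path = experiment["habitat"]             # required
elevation_path = experiment["elevation"]         # required
range_path = experiment["range"]                 # required
area_path = experiment.get("area")               # optional: without it, pixels are counted rather than summed to an area
iucn_batch = experiment.get("iucn_batch")        # optional: pre-downloaded IUCN data
api_key = config.get("iucn", {}).get("api_key")  # optional IUCN API key
```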


## GPU Support

CUDA support is provided if cupy is installed.
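
A common way to keep CUDA optional is to fall back to NumPy when cupy cannot be imported; this is a generic sketch of that pattern, not necessarily how this codebase arranges it:

```python
# Generic optional-cupy pattern: use GPU arrays when cupy is installed,
# otherwise fall back to NumPy. Not necessarily how this repository does it.
try:
    import cupy as xp          # GPU arrays via CUDA
    GPU_ENABLED = True
except ImportError:
    import numpy as xp         # CPU fallback
    GPU_ENABLED = False

def scale_and_sum(values):
    arr = xp.asarray(values)
    return float((arr * 2.0).sum())
```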

# H3AreaCalculator

This is the script for calculating the area of a species' AoH per individual hex tile, based on the [H3 tile system](https://h3geo.org/).

## Usage

To run for a set of species do:

```shell
$ python ./calculate.py CURRENT_RASTERS_DIR RANGE_FILE OUTPUT_DIR
```

Where the arguments are:

* CURRENT_RASTERS_DIR - A directory of AoH GeoTIFFs, where each pixel contains the area of suitable habitat for the species within the land covered by that pixel. We currently assume the ID of the species is in the filename.
* RANGE_FILE - A vector range file that contains the range for all species in CURRENT_RASTERS_DIR.
* OUTPUT_DIR - A directory in which to write the output.

The output is a CSV file per species that contains the area per tile. If you want to see what those look like you can load them into [Kepler GL](https://kepler.gl/).
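
For example, one species' output could be totalled like this; the file name and the `area` column name are assumptions for illustration:

```python
# Sum the per-tile areas in one species' output CSV. The file name and the
# "area" column name are assumptions for illustration.
import csv

total = 0.0
with open("output/12345.csv", newline="") as f:  # hypothetical per-species file
    for row in csv.DictReader(f):
        total += float(row["area"])
print(f"Total AoH area across H3 tiles: {total}")
```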

## Notes

This is a test of using H3 as the basis for doing equal-area calculations on non-uniform map projections. Currently it just calculates the area of habitat per H3 tile.

The key so far has been using parallelism to make things work well, while avoiding GDAL in any concurrent context, as GDAL is both thread-unsafe and leaks memory. Thus we use Python multiprocessing for parallelism, which uses a new process per worker, and a new GDAL context in each worker so we don't accumulate leaked memory: it goes away when the worker goes away.
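
A minimal sketch of that pattern, with the raster path, H3 cell IDs, and worker function names assumed for illustration; this is not the repository's actual code:

```python
# Each worker process opens its own GDAL dataset; the parent never touches
# GDAL, so there is no shared GDAL state and any leaked memory is reclaimed
# when the worker process exits. Names and paths are illustrative.
from multiprocessing import Pool

from osgeo import gdal

RASTER_PATH = "aoh.tif"   # illustrative path
_dataset = None           # one GDAL dataset per worker process

def _init_worker():
    global _dataset
    _dataset = gdal.Open(RASTER_PATH)   # opened inside the worker, not the parent

def process_tile(tile_id):
    band = _dataset.GetRasterBand(1)
    # ... read the window covering this tile and sum the habitat area ...
    return tile_id, 0.0

if __name__ == "__main__":
    tiles = ["8928308280fffff"]          # hypothetical H3 cell IDs
    with Pool(initializer=_init_worker) as pool:   # one worker per CPU core by default
        results = pool.map(process_tile, tiles)
```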

Working out the H3 tiles to use is a two-stage process, again due to GDAL concurrency limitations. We first take a single-threaded approach to get all the polygons for a species range, and once we have the polygon data we can parallelise turning those polygons into H3 tile IDs. If the individual polygons are very big (often we see one very large polygon and a lot of smaller satellites), we split the large polygon into smaller sections to aid parallelism. The current way of doing this is naive (just bands of 2 degrees longitude) but is enough to get acceptable performance for the proof of concept.
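
A sketch of that naive banding, using shapely purely for illustration (not the repository's implementation):

```python
# Naive split of a large polygon into 2-degree longitude bands, as described
# above. Purely illustrative; the repository's code may differ.
import math

from shapely.geometry import Polygon, box

def split_into_bands(polygon: Polygon, band_width: float = 2.0):
    minx, miny, maxx, maxy = polygon.bounds
    lon = math.floor(minx / band_width) * band_width
    pieces = []
    while lon < maxx:
        band = box(lon, miny, lon + band_width, maxy)
        piece = polygon.intersection(band)
        if not piece.is_empty:
            pieces.append(piece)   # each piece can be converted to H3 cells in parallel
        lon += band_width
    return pieces
```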

Then we just process each hex tile in as many concurrent workers as there are CPU cores on the machine. I suspect we hit an overhead from having to repeatedly search the GDAL raster file for data: on hipp we go from 10k hex tiles per second to 1k hex tiles per second as the raster gets significantly larger. An optimisation to investigate is opening the raster once and then using shared memory between the workers. But even without this, the largest example we have takes just 4.4 hours on hipp, with others being considerably less. So again, for a proof of concept, this seems a good place to be.

# IUCN Data Importer

See IUCN-importer for scripts for combining CSV and SHP files from IUCN.

This repository implements the LIFE extinction risk methodology as published in [Eyres et al](https://www.cambridge.org/engage/coe/article-details/66866978c9c6a5c07a3e07fa). The code generates maps of the impact on extinction risk in an area under two scenarios: conversion of the land to arable use, and conversion of the land back to its pre-human state.

## Running the code

The methodology is explained in more detail in [method.md](method.md), but there is also a script, `./scripts/run.sh`, that performs an entire run of the pipeline.

## Credits

Originally derived from and using IUCN modlib and aoh lib by Daniele Baisero.
257 changes: 0 additions & 257 deletions predictors/endemism.py

This file was deleted.
