[![pipeline status](https://gitlab.com/ClimateImpactLab/Impacts/integration/badges/main/pipeline.svg)](https://gitlab.com/ClimateImpactLab/Impacts/integration/-/commits/master)
[![docs page](https://img.shields.io/badge/docs-latest-blue)](https://climateimpactlab.gitlab.io/Impacts/integration/)
[![coverage report](https://gitlab.com/ClimateImpactLab/Impacts/integration/badges/main/coverage.svg)](https://gitlab.com/ClimateImpactLab/Impacts/integration/-/commits/main)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

# DSCIM: The Data-driven Spatial Climate Impact Model
This Python library enables the calculation of sector-specific partial social costs of greenhouse gases (SCGHGs), as well as SCGHGs combined across sectors, using a variety of valuation methods and assumptions. The main purpose of this library is to parse the monetized spatial damages from different sectors and integrate them using different options ("menu options") that encompass different decisions, such as discount levels, discount strategies, and different considerations related to economic and climate uncertainty.
## Documentation

Full documentation is available here: https://climateimpactlab.gitlab.io/Impacts/integration/

## Setup
To begin we assume you have a system with `conda` available from the command line, and some familiarity with it. A conda distribution is available from [miniconda](https://docs.conda.io/en/latest/miniconda.html), [Anaconda](https://www.anaconda.com/), or [mamba](https://mamba.readthedocs.io/en/latest/). This helps to ensure required software packages are correctly compiled and installed, replicating the analysis environment.

Note that running `damage_fun_runs/Directory_setup.py` will download several gigabytes of data and may take several minutes, depending on your connection speed.
## Running SCGHGs

After setting up your environment and the input data, you can run SCGHG calculations under different conditions with
```bash
python damage_fun_runs/command_line_scc.py
```

and follow the on-screen prompts. When the selector is a caret, you may select only one option: use the arrow keys on your keyboard to highlight your desired option and press Enter to submit. When you are presented with `X` and `o` selectors, you may use the spacebar to select (`X`) or deselect (`o`) options, then press Enter to submit once you have chosen your desired set of parameters. Once you have completed all of the options, the DSCIM run will begin.
### Command line options

Below is a short summary of what each command line option does. To view a more detailed description of what the run parameters do, see the [Documentation](https://impactlab.org/research/dscim-user-manual-version-092022-epa) for the Data-driven Spatial Climate Impact Model (DSCIM).
#### Sector

The user may select only one sector per run. The sector choice determines whether the run produces the SCGHG combined across sectors or the partial SCGHG of a single chosen sector.
#### Discount rate

These runs use endogenous Ramsey discounting that is targeted to begin at the chosen near-term discount rate(s).
#### Pulse years

The pulse year is the year in which the pulse of greenhouse gas (GHG) is emitted; an SCGHG is produced for each chosen pulse year.
#### Domain of damages

The default is a global SCGHG accounting for global damages in response to a pulse of GHG. The user has the option to instead compute a domestic SCGHG accounting only for United States damages.
#### Optional files

By default, the script will produce the expected SCGHGs as a `.csv`. The user also has the option to save the full distribution of SCGHGs -- across emissions, socioeconomics, and climate uncertainty -- as a `.csv`, and the option to save global consumption net of baseline climate damages ("global_consumption_no_pulse") as a netCDF `.nc4` file.
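
As a quick illustration of working with these outputs, the sketch below loads an SCGHG `.csv` with `pandas` and the optional `global_consumption_no_pulse` netCDF with `xarray`. The file names here are hypothetical; the actual output paths depend on your run configuration.

```python
import pandas as pd
import xarray as xr

# Hypothetical output names -- the real paths depend on your run configuration.
scghgs = pd.read_csv("scghgs.csv")
print(scghgs.head())

# Optional output: global consumption net of baseline climate damages.
consumption = xr.open_dataset("global_consumption_no_pulse.nc4")
print(consumption)
```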
## Structure and logic

The library is split into several components that implement the hierarchy defined by the menu options. These are the main elements of the library and serve as the main classes to call different menu options.
```mermaid
graph TD

subgraph "Recipe Book"
A[StackedDamages] --> B[MainMenu]
B[MainMenu] --> C[AddingUpRecipe];
B[MainMenu] --> D[RiskAversionRecipe];
B[MainMenu] --> E[EquityRecipe]
end
```

and these elements can be used for the menu options:
- `AddingUpRecipe`: Adds up all damages and collapses them to calculate a general SCC without valuing uncertainty.
- `RiskAversionRecipe`: Adds a risk-aversion certainty equivalent to the consumption calculations, valuing uncertainty over econometric and climate draws.
- `EquityRecipe`: Adds risk aversion and equity to the consumption calculations. Equity includes taking a certainty equivalent over spatial impact regions.

### Documentation and contributing

Learn more about how to contribute to the library by checking our [contribution guidelines](./CONTRIBUTING.md) and the official [documentation][8].

## For Developers

### Contained environment

Additionally, we have built a contained environment compatible with most HPC systems using Singularity. You can learn more about how to use Singularity in [its quick start guide][5]. In a nutshell, our Singularity container is an Ubuntu OS with a Python (`miniconda3`) environment with all the needed dependencies installed. We provide options to open Jupyter notebooks that are compatible with `Dask`. At the same time, you can build your own scripts and run them against the same environment.
**A note on Singularity remote builds**: `singularity build` needs root access, which might be impossible to have if you live under the HPC admin tyranny. But Singularity has your back with remote builds: `singularity build --remote`. This means that the building process happens remotely on [Sylabs][6] servers and the image gets automatically downloaded to the local machine. To make use of this option, you need to open a Sylabs account and authenticate; you can start this process by running `singularity remote login`. A link will appear to create an account and an API key. Later, a prompt will appear asking for your API key; just copy and paste it into your terminal.

You can build the container using our `Makefile`:
```bash
make Makefile container
```

After running this you will have an `images/` directory with the container file. This container will contain the same libraries as the [pangeo environment][7], but the current version of `dscim` will not be installed. In this repo we added some tools to install `dscim` and open a Jupyter notebook to explore data or run SCC calculations. `infrastructure/run_in_singularity.sh` is a script that installs this repo and opens a Jupyter notebook inside the container:
```bash | ||
age: ${0} {build|notebook} | ||
OPTIONS: | ||
-h|help Show this message | ||
-b|--build | ||
-n|--notebook | ||
INFRASTRUCTURE: | ||
Build the infrastructure and output python | ||
$ ./run_singularity.sh --build | ||
Run notebook inside Singularity image. This function takes arguments | ||
for both IP and port to use in Jupyterlab | ||
$ ./run_singularity.sh --notebook 0.0.0.0 8888 | ||
``` | ||

We have wrapped this process within the same `Makefile` we use to build the Singularity container, so you can just do:

```bash
make Makefile run-jupyter
```

The Jupyter `--port` option is hardcoded in the notebook, and automatic SSH forwarding is enabled via the `--ip` flag. Be aware that you do not need to build the image on each run; the image will live in the `images/` folder, and you can use the `run-jupyter` target to run the Jupyter notebook. Also, every time you build the notebook, a fresh version of the code will be installed in the notebook (this might take a while due to compilation issues).

## Requirements

The library runs on Python 3.6+ and expects all requirements to be installed before running any code (see Installation). The integration process stacks different damage outcomes from several sectors at the impact-region level, so you will need several tricks to deal with the data I/O.
## Computing

### Computing introduction

One of the tricks we rely on is the extensive use of `Dask` and `xarray` to read raw damage data in `nc4` or `zarr` format (the latter is how coastal damages are provided). Hence, you will need a `Dask` `distributed.Client` to harness the power of distributed computing. The computing requirements will vary depending on which menu options you execute and the number of sectors you are aggregating. These are some general rules about computational intensity:
1. For recipes, `EquityRecipe > RiskAversionRecipe > BaselineRecipe`.
2. For discounting, `euler_gwr > euler_ramsey > naive_gwr > naive_ramsey > constant > constant_model_collapsed`.
3. More options (i.e., a greater number of SSPs or a greater number of sectors) means more computing resources required.
4. `Dask` does not perfectly release memory after each menu run. Thus, if you are running several menu options, in loops or otherwise, you may need to execute a `client.restart()` partway through to force `Dask` to empty its memory (see the sketch after this list).
5. Inclusion of coastal damages increases memory usage dramatically (due to the 500 batches and 10 GMSL bins against which other sectors' damages must be broadcast). Be careful and smart when running this option, and don't be afraid to reconsider chunking for the files being read in.
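
As a rough illustration of rule 4, the sketch below restarts the client between runs when looping over menu options. The `menu_configs` list and `run_menu_option` helper are hypothetical placeholders for however you configure and launch each run.

```python
from dask.distributed import Client

client = Client()  # or a cluster-backed client, as described below

# Hypothetical placeholders standing in for your own loop over menu options.
menu_configs = [{"recipe": "adding_up"}, {"recipe": "risk_aversion"}]

def run_menu_option(config):
    """Stand-in for whatever call executes one menu run."""
    ...

for i, config in enumerate(menu_configs, start=1):
    run_menu_option(config)
    if i % 2 == 0:
        client.restart()  # drop lingering tasks and memory before the next runs
```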
### Setting up a Dask client

Ensure that the following packages are installed and updated: [Dask](https://docs.dask.org/en/latest/install.html), [distributed](https://distributed.dask.org/en/latest/install.html), the [Jupyter Dask extension](https://github.com/dask/dask-labextension), and `dask_jobqueue`. Ensure that your JupyterLab has add-ons enabled so that you can access Dask as an extension.

You have two options for setting up a Dask client.

#### Local client

<details><summary>Click to expand</summary>

If your local node has sufficient memory and computational power, you will only need to create a local Dask client. _If you are operating on Midway3, you should be able to run the menu in its entirety. Each `caslake` computing node on Midway3 has 193 GB of memory and 48 CPUs. This is sufficient for all options._

- Open the Dask tab on the left side of your JupyterLab page.
- Click `New +` and wait for a cluster to appear.
- Drag and drop the cluster into your notebook and execute the cell.
- You now have a new Dask client!
- Click on the `CPU`, `Worker Memory`, and `Progress` tabs to track progress. You can arrange them in a sidebar of your Jupyter notebook to keep them all visible at the same time.
- Note that opening 2 or 3 local clients does _not_ get you 2 or 3 times the compute space. These clients will share the same node, so computing may in fact be slower as they fight for resources. (_check this, it's a hypothesis_)

![](images/dask_example.png)
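
If you prefer to create the local client in code rather than through the JupyterLab extension, a minimal sketch is below. The worker count and per-worker memory limit are illustrative and should be sized to your node.

```python
from dask.distributed import Client, LocalCluster

# Illustrative sizing -- adjust workers and memory to the node you are on.
cluster = LocalCluster(n_workers=8, threads_per_worker=2, memory_limit="20GB")
client = Client(cluster)
client  # in a notebook, this displays the dashboard link
```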

</details>

#### Distributed client

<details><summary>Click to expand</summary>

If your local node does not have sufficient computational power, you will need to manually request separate nodes with `dask_jobqueue` and `dask.distributed`:
```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster()     # worker CPUs and memory come from ~/.config/dask/jobqueue.yaml
print(cluster.job_script())  # inspect the SLURM job script each worker will submit
cluster.scale(10)            # request 10 workers
client = Client(cluster)
client
```

You can adjust the number of workers by changing the integer inside `cluster.scale()`. You can adjust the CPUs and memory per worker inside `~/.config/dask/jobqueue.yaml`.

To track progress of this client, copy the "Dashboard" IP address and forward it over SSH. Example code:

```
ssh -N -f -L 8787:10.50.250.7:8510 [email protected]
```

Then go to `localhost:8787` in your browser to watch the magic.

</details>
### Dask troubleshooting

Most Dask issues in the menu come from one of two sources:
1. Requesting Dask to compute too many tasks (your chunks are too small), which results in a sort of "hung state" and an empty progress bar.
2. Requesting Dask to compute tasks that are too _large_ (your chunks are too big). In this case, you will see memory in the `Worker Memory` tab shoot off the charts, and then your kernel will likely be killed by SLURM.

How can you avoid these situations?
1. Start with `client.restart()`. Sometimes Dask does not properly release tasks from memory and this plugs up the client. Doing a fresh restart (and perhaps a fresh restart of your notebook) will fix the problem.
2. Next, check your chunks! Ensure that any `xr.open_dataset()` or `xr.open_mfdataset()` commands have a `chunks` argument passed (see the sketch after this list). If not, Dask's default is to load the entire file into memory before rechunking later. This is very bad news for impact-region-level damages, which are 10 TB of data.
3. Start executing the menu object by object. Call an object, select a small slice of it, and add `.compute()`. If the object computes successfully without overloading memory, it's not the source of the memory leak. Keep moving through the menu until you find the source of the error. _Hot tip: it's usually the initial reading-in of files where nasty things happen._ Check each object in the menu to ensure three things:
    - Chunks should be a reasonable size ('reasonable' is relative, but approximately 250-750 MB is typically successful on a Midway3 `caslake` computing node).
    - Not too many chunks! Again, this is relative, but more than 10,000 likely means you should reconsider your chunk size.
    - Not too many tasks per chunk. Again, relative, but more than 300,000 tasks early in the menu is unusual and should be checked to make sure there aren't any unnecessary rechunking operations being forced upon the menu.
4. Consider rechunking your inputs. If your inputs are chunked in a manner that's orthogonal to your first few operations, Dask will have a nasty time trying to rechunk all those files before executing things on them. Rechunking and resaving usually takes a few minutes; rechunking in the middle of an operation can take hours.
5. If this has all been done and you are still getting large memory errors, it's possible that Dask isn't correctly separating and applying operations to chunks. If this is the case, consider adding a `map_blocks` call, which explicitly tells Dask to apply the operation chunk by chunk.
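
A hedged sketch of the chunking advice above: the file paths, dimension names, and chunk sizes are illustrative and need to be matched to your actual damages data.

```python
import xarray as xr

# Pass `chunks` up front so Dask never loads whole files into memory.
damages = xr.open_mfdataset(
    "damages/*.nc4",
    combine="by_coords",
    chunks={"batch": 15, "region": 5000},  # illustrative dimension names and sizes
)
print(damages.chunks)  # sanity-check the number and size of chunks

# If the native chunking is orthogonal to your first operations, rechunk and
# resave once (minutes) rather than letting Dask rechunk mid-computation (hours).
damages.chunk({"batch": -1, "region": 1000}).to_zarr("damages_rechunked.zarr")
```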

For more information about how to use `Dask` and the `dask-jobqueue` library (in case you are on a computing cluster), refer to the [Dask Distributed][3] and [Dask-Jobqueue][4] documentation. You can find several use-case examples in the notebooks under `examples`.
### Priority

Maintaining priority is important when given tight deadlines to run menu options. To learn more about priority, click [here](https://rcc.uchicago.edu/docs/tutorials/rcc-tips-and-tricks.html#priority).

In general, following these hygiene rules will keep priority high:
1. Kill all notebooks/clusters when not in use.
2. Only request what you need (in terms of `WALLTIME`, `WORKERS`, and `WORKER MEMORY`).
3. Run things right the first time around. Your notebook text is worth an extra double check :)

[3]: https://distributed.dask.org/en/latest/
[4]: https://jobqueue.dask.org/en/latest/
[5]: https://sylabs.io/guides/3.5/user-guide/quick_start.html
[6]: https://sylabs.io/
[7]: https://pangeo.io/setup_guides/hpc.html
[8]: https://climateimpactlab.gitlab.io/Impacts/integration/