finalized docs for single table bridge, starting with mrio bridge
konstantinstadler committed Jul 18, 2024
1 parent f5e7aff commit 4d2efb8
Showing 2 changed files with 81 additions and 49 deletions.
95 changes: 58 additions & 37 deletions doc/source/notebooks/convert.py
@@ -41,21 +41,22 @@

# %% [markdown]
# All conversion relies on a *mapping table* that maps (bridges)
# the indices of the source data to the indices of the target data.
# the index/columns of the source data to the indices of the target data.

# %% [markdown]
# This table requires headers (columns) corresponding to the column headers
# of the source data as well as bridge columns which specify the new target index.
# This table requires headers (columns) corresponding to the
# index.names and columns.names of the source data (constraining data)
# as well as bridge data which specify the new target index.
# The latter are indicated by "NewIndex__OldIndex" - **the important part is
# the two underscores in the column name**. Another column named "factor" specifies
# the two underscores in the column name**. Another (optional)
# column named "factor" specifies
# the multiplication factor for the conversion.
# Finally, additional columns can be used to indicate units and other information.
# TODO:CHECK Finally, additional columns can be used to indicate units and other information.
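
The naming convention described above can be sketched in plain Python. This is only an illustration of how the column names of a mapping table fall into three groups; `split_mapping_columns` is a hypothetical helper and not part of pymrio, and the column names are taken from the docstring example of `convert`.

```python
def split_mapping_columns(columns):
    """Classify mapping-table column names (hypothetical helper, not pymrio API)."""
    bridges = {}       # new index name -> old index name
    constraints = []   # columns that restrict where a factor applies
    has_factor = False
    for name in columns:
        if name == "factor":
            has_factor = True
        elif "__" in name:
            # "NewIndex__OldIndex": split on the first double underscore
            new_name, old_name = name.split("__", 1)
            bridges[new_name] = old_name
        else:
            # Simplification: extra columns such as units would also land here
            constraints.append(name)
    return bridges, constraints, has_factor


bridges, constraints, has_factor = split_mapping_columns(
    ["stressor", "compartment", "region", "impact__stressor", "factor"]
)
print(bridges)      # {'impact': 'stressor'}
print(constraints)  # ['stressor', 'compartment', 'region']
print(has_factor)   # True
```

Here "impact__stressor" is read as "rename the index level stressor to impact", while "region" only constrains where the factor applies.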

# %% [markdown]
# All mapping occurs on the index of the original data.
# Thus the data to be converted needs to be in long matrix format, at least for the index
# levels which are considered in the conversion.
# TODO: In case conversion happens on MRIO Extensions this conversion happens automatically.
# Constraining data columns can refer to either the columns or the index.
# However, any constraining data to be bridged/mapped to a new name needs to be
# in the index of the original data.

# %% [markdown]
# The first example below shows the simplest case of renaming a single table.
@@ -186,7 +187,8 @@
ghg_new_kg

# %% [markdown]
# In case of unit conversion of pymrio satellite accounts, we can also check the unit before and set the unit after conversion:
# In case of unit conversion of pymrio satellite accounts,
# we can also check the unit before and set the unit after conversion:
# TODO: unit conversion extensions


@@ -261,8 +263,8 @@


# %% [markdown]
# A more complex example is the application of regional specific characterization factors.
# (The same principle applies to sector specific factors.)
# A more complex example is the application of region-specific characterization
# factors (the same principle applies to sector-specific factors).
# For that, we assume some land use results for different regions:

# %%
@@ -292,7 +294,10 @@
# %% [markdown]
# Now we set up a pseudo characterization table for converting the land use data into
# biodiversity impacts. We assume that the characterization factors vary based on
# land use type and region.
# land use type and region. However, the "region" information is a pure
# constraining column (specifying the region for which the factor applies) without
# any bridge column mapping it to a new name. Thus, the "region" can either be in the index
# or in the columns of the source data - in the given case it is in the columns.

# %%
landuse_characterization = pd.DataFrame(
@@ -313,43 +318,59 @@
)
landuse_characterization

biodiv_result = pymrio.convert(land_use_result, landuse_characterization)
biodiv_result


# CONT: Explain the biodiv_result - difference between bridge and constraining column

# CONT: finalize docs for biodiv
# CONT: start working on convert for extensions/mrio method


# %% [markdown]
# Irrespective of the table or the mrio system, the convert function always follows the same pattern.
# It requires a bridge table, which contains the mapping of the indices of the source data to the indices of the target data.
# This bridge table has to follow a specific format, depending on the table to be converted.
# The table shows several possibilities to specify factors which apply to several
# regions/stressors.
# All of them are based on [regular expressions](https://docs.python.org/3/howto/regex.html):
#
# - In the first data line we use the "or" operator "|" to specify that the
# same factor applies to Wheat and Maize.
# - On the next line we use the grouping capabilities of regular expressions
# to indicate the same factor for Region 2 and 3.
# - In the last four lines, .* matches any number of characters. This
# allows specifying the same factor for both forest types or abbreviating
# the naming of the stressor (last 2 lines).
#
# The use of regular expressions is optional; one can also use one line per factor.
# In the example above, we indicate the factor for Rice in 3 subsequent entries.
# This would be equivalent to ```["Rice", "BioImpact", "Region[1,2,3]", 12]```.
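
The three regular expression forms listed above can be tried out with Python's standard `re` module. This is a minimal sketch that assumes an entry has to match the full index value; the exact matching rule used inside `convert` is not shown in this diff, and the forest and region labels below are made up for illustration.

```python
import re

# Illustrative labels only, not the actual data of the notebook
land_use_types = ["Wheat", "Maize", "Rice", "Forest extensive", "Forest intensive"]
regions = ["Region1", "Region2", "Region3"]


def matching(pattern, candidates):
    """Return all candidates that the pattern matches completely."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


wheat_maize = matching("Wheat|Maize", land_use_types)  # "or" operator
region_2_3 = matching("Region[2,3]", regions)          # character class
forests = matching("Forest.*", land_use_types)         # any trailing characters

print(wheat_maize)  # ['Wheat', 'Maize']
print(region_2_3)   # ['Region2', 'Region3']
print(forests)      # ['Forest extensive', 'Forest intensive']
```

Note that the comma inside a character class such as `[2,3]` is itself a literal character, so `Region[2,3]` would also match the string "Region,"; `Region[23]` is the stricter spelling.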


# %% [markdown]
# Let's assume a table with the following structure (the table to be converted):
# With that setup we can now characterize the land use data in land_use_result.

# %% [markdown]
# TODO: table from the test cases
# %%
biodiv_result = pymrio.convert(land_use_result, landuse_characterization)
biodiv_result

# %% [markdown]
# A potential bridge table for this table could look like this:
# Note that in this example the region is not in the index
# but in the columns.
# The convert function can handle both cases.
# The only difference is that constraints which are
# in the columns will never be aggregated but keep the column resolution in the
# output. Thus the result is equivalent to:

# %% [markdown]
# TODO: table from the test cases
# %%
land_use_result_stacked = land_use_result.stack(level="region")
biodiv_result_stacked = pymrio.convert(
    land_use_result_stacked,
    landuse_characterization,
    drop_not_bridged_index=False,
)
biodiv_result_stacked.unstack(level="region")[0]

# %% [markdown]
# Describe the column names, and which entries can be regular expressions
# In this case we have to specify not to drop the non-bridged "region" index.
# We then unstack the result again, and have to select the first element ([0]),
# since there were no other columns left after stacking them before the
# characterization.

# CONT: start working on convert for extensions/mrio method

# %% [markdown]
# Once everything is set up, we can continue with the actual conversion.

# %% [markdown]
# ## Converting a single data table
# Irrespective of the table or the mrio system, the convert function always follows the same pattern.
# It requires a bridge table, which contains the mapping of the indices of the source data to the indices of the target data.
# This bridge table has to follow a specific format, depending on the table to be converted.


# %% [markdown]
# ## Converting a pymrio extension
35 changes: 23 additions & 12 deletions pymrio/tools/ioutil.py
@@ -1014,23 +1014,36 @@ def convert(df_orig, df_map, agg_func="sum", drop_not_bridged_index=True):
----------
df_orig : pd.DataFrame
The DataFrame to process.
The index levels need to be named (df.index.name needs to
be set for all levels). All index levels to be bridged to new
names need to be in the index (these are columns
The index/columns levels need to be named (df.index.name
and df.columns.names need to be set for all levels).
All index levels to be bridged to new names need to be in the index (these are columns
indicated with two underscores '__' in the mapping dataframe, df_map).
Other constraining conditions (e.g. regions, sectors) can be either
in the index or columns. The values in index are preferred.
in the index or columns. If the same name exists in the
index and columns, the values in index are preferred.
df_map : pd.DataFrame
The DataFrame with the mapping of the old to the new classification.
This requires a specific structure, which depends on the structure of the
dataframe to be characterized: one column for each index level in the dataframe
and one column for each new index level in the characterized result dataframe.
dataframe to be characterized:
- Constraining data (e.g. stressors, regions, sectors) can be
either in the index or columns of df_orig. They need to have the same
name as the named index or column in df_orig. The algorithm searches
for matching data in df_orig based on all constraining columns in df_map.
- Bridge columns are columns with '__' in the name. These are used to
map (bridge) some/all of the constraining columns in df_orig to the new
classification.
- One column "factor", which gives the multiplication factor for the
conversion. If it is missing, it is set to 1.
This is better explained with an example.
Assuming an original dataframe df_orig with
index names 'stressor' and 'compartment'
the characterizing dataframe would have the following structure (column names):
index names 'stressor' and 'compartment' and column name 'region',
the characterizing dataframe could have the following structure (column names):
stressor ... original index name
compartment ... original index name
@@ -1054,15 +1067,13 @@ def convert(df_orig, df_map, agg_func="sum", drop_not_bridged_index=True):
"region" is a constraining column; it can be either in the index or in the columns
of df_orig. In case both exist, the one in the index is preferred.
The structure "stressor" and "impact__stressor" is important.
agg_func : str or func
the aggregation function to use for multiple matchings (summation by default)
drop_not_bridged_index : bool, optional
What to do with index levels in df_orig not appearing in the bridge columns.
If True, drop them (aggregation across these), if False,
If True, drop them after aggregation across these, if False,
pass them through to the result.
*Note:* Only index levels will be dropped, not columns.
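
The interplay of the bridge column, the optional factor, the aggregation and `drop_not_bridged_index` can be illustrated with a small standard-library sketch. It is not the pymrio implementation: `convert_sketch` is a hypothetical stand-in, the factor default is applied per row instead of per column, the aggregation is fixed to summation, and all numbers are invented.

```python
from collections import defaultdict

# Source data indexed by (stressor, compartment); values are invented
source = {("CO2", "air"): 10.0, ("CH4", "air"): 2.0, ("CH4", "water"): 1.0}

# Mapping rows: "stressor" constrains, "impact__stressor" bridges to a new name
mapping = [
    {"stressor": "CO2", "impact__stressor": "GWP"},  # no factor given -> 1
    {"stressor": "CH4", "impact__stressor": "GWP", "factor": 25.0},
]


def convert_sketch(source, mapping, drop_not_bridged_index=True):
    """Multiply matched values by their factor and sum them per new index."""
    result = defaultdict(float)
    for (stressor, compartment), value in source.items():
        for row in mapping:
            if row["stressor"] != stressor:
                continue
            factor = row.get("factor", 1.0)
            new_name = row["impact__stressor"]
            # "compartment" is not bridged: drop it or pass it through
            key = (new_name,) if drop_not_bridged_index else (new_name, compartment)
            result[key] += value * factor
    return dict(result)


dropped = convert_sketch(source, mapping)
kept = convert_sketch(source, mapping, drop_not_bridged_index=False)
print(dropped)  # {('GWP',): 85.0}
print(kept)     # {('GWP', 'air'): 60.0, ('GWP', 'water'): 25.0}
```

With the default setting the compartment level disappears and its entries are summed; with `drop_not_bridged_index=False` the level is kept and aggregation only happens within each compartment.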
@@ -1073,7 +1084,7 @@ def convert(df_orig, df_map, agg_func="sum", drop_not_bridged_index=True):
Extension for extensions:
extensino ... extension name
extension ... extension name
unit_orig ... the original unit (optional, for double check with the unit)
unit_new ... the new unit to be set for the extension
