Attribute aggregation/transformation + plotting & evaluation analyses #34

Open: wants to merge 109 commits into main

glitt13 (Collaborator) commented Dec 19, 2024

An approach to aggregate and transform existing attribute data to create new attribute data.
Additionally, create and save plots that visualize results and aid in evaluating algorithm performance.
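A minimal sketch of config-driven aggregation, assuming a much simpler layout than the real xssa_attrs_tform.yaml: the parsed config here is a Python dict that maps each new attribute name to an aggregation function and its source attributes. The attribute names (TOT_PPT_DEC etc.), the config keys "func" and "vars", and apply_attr_tforms are hypothetical and are not the tfrm_attr.py API.

```python
import statistics
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL config layout; the real xssa_attrs_tform.yaml schema may differ.
# Each new attribute names an aggregation function and the source attributes it combines.
TFORM_CONFIG: Dict[str, dict] = {
    "TOT_PPT_WINTER_mean": {"func": "mean", "vars": ["TOT_PPT_DEC", "TOT_PPT_JAN", "TOT_PPT_FEB"]},
    "TOT_PPT_WINTER_sum": {"func": "sum", "vars": ["TOT_PPT_DEC", "TOT_PPT_JAN", "TOT_PPT_FEB"]},
}

# Map function names used in the config to Python callables.
AGG_FUNCS: Dict[str, Callable[[List[float]], float]] = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "sum": sum,
    "min": min,
    "max": max,
}


def apply_attr_tforms(attrs: Dict[str, Dict[str, float]], config: Dict[str, dict]):
    """Hypothetical helper: build new attributes per comid from existing ones.

    Returns (new_attrs, missing), where missing lists (comid, new_attr, absent_vars)
    for every transformation that could not be computed.
    """
    new_attrs: Dict[str, Dict[str, float]] = {}
    missing: List[Tuple[str, str, List[str]]] = []
    for comid, vals in attrs.items():
        out: Dict[str, float] = {}
        for new_name, spec in config.items():
            absent = [v for v in spec["vars"] if v not in vals]
            if absent:
                # Record the gap instead of silently dropping it.
                missing.append((comid, new_name, absent))
                continue
            out[new_name] = AGG_FUNCS[spec["func"]]([vals[v] for v in spec["vars"]])
        new_attrs[comid] = out
    return new_attrs, missing


if __name__ == "__main__":
    attrs = {
        "101": {"TOT_PPT_DEC": 90.0, "TOT_PPT_JAN": 100.0, "TOT_PPT_FEB": 80.0},
        "202": {"TOT_PPT_DEC": 50.0, "TOT_PPT_JAN": 60.0},  # February missing
    }
    new_attrs, missing = apply_attr_tforms(attrs, TFORM_CONFIG)
    print(new_attrs["101"])  # {'TOT_PPT_WINTER_mean': 90.0, 'TOT_PPT_WINTER_sum': 270.0}
    print(missing)
```

Comids that lack a source attribute are collected in missing rather than dropped silently, loosely mirroring the idea of writing a missing comid-attribute file that a follow-up step can try to fill.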

Additions

  • xssa_attrs_tform.yaml: example configuration file describing how variables are aggregated and transformed.
  • fs_tfrm_attrs.py: the main processing script, which calls functions in tfrm_attr.py.
  • tfrm_attr.py: new functions for the fs_proc package.
  • test_tfrm_attr.py: unit tests.
  • fs_attrs_miss.R: reads the missing comid-attribute file that fs_tfrm_attrs.py sometimes generates and attempts to find the missing attribute data. When missing attribute data exist, fs_tfrm_attrs.py calls this Rscript to try to acquire the data and use it in the attribute transformation.
  • Principal component analysis (PCA) of the predictor dataset and response variable, including generation of PCA plots: pca_stdscaled_tfrm, plot_pca_stdscaled_tfrm, plot_pca_stdscaled_cumulative_var, and std_pca_plot_path, all wrapped by plot_pca_save_wrap.
  • Random forest feature importance; all relevant functions are wrapped by save_feat_imp_fig_wrap.
  • Algorithm evaluation using learning-curve analysis, including the AlgoEvalPlotLC class; its functions are wrapped by plot_learning_curve_save_wrap.
  • Create and save a predicted-vs-observed regression plot, wrapped by plot_pred_vs_obs_wrap.
  • Generate a map of predicted response variables; all relevant functions are wrapped by plot_map_pred_wrap.
  • Generate a map of the best prediction across multiple datasets; all relevant functions are wrapped by plot_best_algo_wrap.
  • AGU 2024 analysis scripts: the ealstm analyses, such as /scripts/analysis/fs_proc_viz_best_ealstm.py and the more formal /scripts/eval_ingest/ealstm/proc_ealstm_agu24.py, plus associated config files in the ealstm/ directory.
  • fs_proc_algo_viz.py: an updated version of fs_proc_algo.py for algorithm training/testing evaluation, with new evaluation and plotting features.
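The standardize-then-PCA step can be sketched with NumPy alone: scale each predictor to zero mean and unit variance, take the SVD, and report the explained-variance ratio and its cumulative sum (the quantity a cumulative-variance plot would presumably show). This swaps in a plain SVD for whatever library pca_stdscaled_tfrm actually uses; pca_std_scaled is a hypothetical name.

```python
import numpy as np


def pca_std_scaled(X, n_components: int):
    """Hypothetical helper: PCA on standardized columns via SVD.

    Assumes a 2-D array with at least one non-constant column.
    Returns (scores, explained_variance_ratio[:n_components], cumulative_ratio).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by zero
    Z = (X - mean) / std  # each column now has mean 0 and variance 1

    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    var = S**2 / (Z.shape[0] - 1)      # variance captured by each component
    ratio = var / var.sum()            # explained-variance ratio
    scores = Z @ Vt[:n_components].T   # project samples onto the leading components
    return scores, ratio[:n_components], np.cumsum(ratio)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4))
    X[:, 1] += 2.0 * X[:, 0]  # make two predictors strongly correlated
    scores, ratio, cum = pca_std_scaled(X, n_components=2)
    print(scores.shape, ratio.round(3), cum.round(3))
```

Standardizing first matters because attributes come in very different units; without it, the highest-variance attribute would dominate the first component.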

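Picking the best prediction across datasets reduces to an argmax per comid. The sketch below assumes values arrive as nested dicts {dataset: {comid: value}} (for example a skill score where higher is better); the function best_source_per_comid and this input layout are hypothetical and are not the interface of plot_best_algo_wrap. The dataset label chosen per comid is what a best-prediction map would color by.

```python
import math
from typing import Dict, Tuple


def best_source_per_comid(
    preds: Dict[str, Dict[str, float]], higher_is_better: bool = True
) -> Dict[str, Tuple[str, float]]:
    """Hypothetical helper: for each comid, pick the dataset with the best value.

    preds maps dataset name -> {comid: value}; NaN values are ignored.
    Ties keep the dataset encountered first.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for dataset, by_comid in preds.items():
        for comid, value in by_comid.items():
            if math.isnan(value):
                continue
            current = best.get(comid)
            if current is None:
                best[comid] = (dataset, value)
            elif (value > current[1]) if higher_is_better else (value < current[1]):
                best[comid] = (dataset, value)
    return best


if __name__ == "__main__":
    preds = {
        "dataset_a": {"101": 0.62, "202": 0.40},
        "dataset_b": {"101": 0.55, "202": 0.71, "303": float("nan")},
    }
    print(best_source_per_comid(preds))
    # {'101': ('dataset_a', 0.62), '202': ('dataset_b', 0.71)}
```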
Removals

Changes

  • Converted scripts/config/attr_gen_camels.R from hard-coded values into a generalized form that reads the config file scripts/config/attr_gen_camels_config.yaml.
  • proc.attr.hydfab: refactored attribute retrieval to pull multiple comids and attributes at once. Retrieving attributes now takes tens of minutes rather than days.
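Why batching helps, assuming retrieval cost is dominated by per-request overhead: one request per (comid, attribute) pair scales with their product, while batched requests scale only with the number of comid chunks. The sketch below is in Python rather than R, and FakeAttrStore, fetch_one_at_a_time, fetch_batched, and batch_size are hypothetical stand-ins for the real proc.attr.hydfab S3 retrieval.

```python
from typing import Dict, Iterable, List


class FakeAttrStore:
    """Hypothetical stand-in for a remote attribute source; counts requests made."""

    def __init__(self, table: Dict[str, Dict[str, float]]):
        self.table = table
        self.n_requests = 0

    def fetch(self, comids: Iterable[str], attrs: Iterable[str]) -> Dict[str, Dict[str, float]]:
        """One 'request' returning the requested attributes for the requested comids."""
        self.n_requests += 1
        attrs = list(attrs)
        return {
            c: {a: self.table[c][a] for a in attrs if a in self.table.get(c, {})}
            for c in comids
        }


def fetch_one_at_a_time(store: FakeAttrStore, comids: List[str], attrs: List[str]):
    """Old pattern: one request per (comid, attribute) pair."""
    out: Dict[str, Dict[str, float]] = {}
    for c in comids:
        for a in attrs:
            out.setdefault(c, {}).update(store.fetch([c], [a])[c])
    return out


def fetch_batched(store: FakeAttrStore, comids: List[str], attrs: List[str], batch_size: int = 500):
    """Batched pattern: one request per chunk of comids, all attributes at once."""
    out: Dict[str, Dict[str, float]] = {}
    for i in range(0, len(comids), batch_size):
        out.update(store.fetch(comids[i : i + batch_size], attrs))
    return out


if __name__ == "__main__":
    table = {str(c): {"slope": c * 0.1, "elev": c * 10.0} for c in range(1, 1001)}
    comids, attrs = list(table), ["slope", "elev"]

    slow, fast = FakeAttrStore(table), FakeAttrStore(table)
    assert fetch_one_at_a_time(slow, comids, attrs) == fetch_batched(fast, comids, attrs)
    print(slow.n_requests, fast.n_requests)  # 2000 2
```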

Testing

  1. Unit testing with test_tfrm_attr.py has been difficult to implement with a standard unittest approach because of an unexplained error involving dask.dataframe (imported as dd). As a work-around, most uses of classes were removed, so the package is only partially tested.
  2. proc.attr.hydfab: revised and improved unit tests to cover the refactoring described above.
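A sketch of the class-free testing style, assuming pytest's default discovery (module-level functions whose names start with test, in files named test_*.py); such functions can also be called directly without any runner. safe_log10 is a hypothetical function under test, not part of tfrm_attr.py.

```python
import math


def safe_log10(x: float) -> float:
    """Hypothetical transformation under test: log10, or NaN for non-positive input."""
    return math.log10(x) if x > 0 else float("nan")


# Plain module-level test functions: no unittest.TestCase subclass required.
def test_safe_log10_positive():
    assert safe_log10(100.0) == 2.0
    assert safe_log10(1.0) == 0.0


def test_safe_log10_nonpositive_returns_nan():
    assert math.isnan(safe_log10(0.0))
    assert math.isnan(safe_log10(-5.0))


if __name__ == "__main__":
    # Without pytest, the same functions run as ordinary calls.
    test_safe_log10_positive()
    test_safe_log10_nonpositive_returns_nan()
    print("all tests passed")
```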

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is usable without CSS
  • Is usable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

…at strings just-in-case user doesn't use f'{dir_base}'
…ssing comids or variables have been identified, else write message that there could be an issue in the logic
glitt13 and others added 8 commits December 17, 2024 09:48
…yle theme (#33)

* Create custom matplotlib stylesheet for RaFTS plots

* Flip axes on scatter; change perf to pred for clarity

* Change perf to pred for clarity

* Read in mplstyle file directly from fs_algo

* incorporate plotting functions into fs_perf_viz.py

* Use functions for creating file output paths

* Change perf_map to pred_map

---------

Co-authored-by: glitt13
…fficient s3 retrievals of basin attribute data with proc_attr_mlti_wrap. Still needs integration into full processing.
…ibutes all at once; doc: update documentation pertaining to refactoring
… change in script to a different config file path
…he attribute grabbing needed when creating new transformation attributes
…nd response dataset; refactor: train/test split logic now considers common indices for simplicity
…e same comid; ensure unique comids b/w train/test split, ensure NA and duplicates consistently handled across multiple steps with the creation of combine_resp_gdf_comid_wrap()
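The split described in the last two commit messages (unique comids between train and test, consistent handling of NA and duplicate rows) can be sketched in plain Python. The row layout (comid, response), the function split_by_comid, and its test_frac and seed parameters are hypothetical; this is not combine_resp_gdf_comid_wrap.

```python
import math
import random
from typing import List, Optional, Set, Tuple

# HYPOTHETICAL row layout: (comid, response value); None or NaN marks a missing response.
Row = Tuple[str, Optional[float]]


def split_by_comid(rows: List[Row], test_frac: float = 0.3, seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Hypothetical helper: drop NA/duplicate rows, then split so no comid is in both sets."""
    # 1. Drop rows whose response is missing.
    clean = [(c, y) for c, y in rows if y is not None and not math.isnan(y)]

    # 2. Drop exact duplicate rows, keeping the first occurrence (order preserved).
    deduped = list(dict.fromkeys(clean))

    # 3. Split on unique comids, so every row of a comid lands on the same side.
    #    Sorting before shuffling makes the split reproducible for a given seed.
    comids = sorted({c for c, _ in deduped})
    random.Random(seed).shuffle(comids)
    n_test = max(1, round(test_frac * len(comids))) if comids else 0
    test_ids: Set[str] = set(comids[:n_test])

    train = [r for r in deduped if r[0] not in test_ids]
    test = [r for r in deduped if r[0] in test_ids]
    return train, test


if __name__ == "__main__":
    rows = [("1", 1.0), ("1", 1.0), ("1", 2.0), ("2", None), ("2", 3.0),
            ("3", float("nan")), ("4", 4.0), ("5", 5.0), ("6", 6.0)]
    train, test = split_by_comid(rows)
    print(len(train) + len(test))  # 6
```

Splitting on comids rather than rows keeps repeated observations of one basin from leaking between the training and testing sets.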