diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 0c6e6ebc..46e87506 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -2,8 +2,6 @@ 💫⚙️🤖 We're excited that you're here and want to contribute. 🤖⚙️💫 -By joining our efforts, you will be helping to democratise emulation for Digital Twins and beyond. - We want to ensure that every user and contributor feels welcome, included and supported to participate in the AutoEmulate community. Whether you're a seasoned developer, a machine learning researcher, a data scientist, or just someone eager to learn and contribute, **you are welcome here**. We value every contribution, be it big or small, and we appreciate the unique perspectives you bring to the project. We hope that the information provided in this document will make it as easy as possible for you to get involved. If you find that you have questions that are not discussed below, please let us know through one of the many ways to [get in touch](#get-in-touch). @@ -12,36 +10,26 @@ We hope that the information provided in this document will make it as easy as p If you'd like to find out more about AutoEmulate, make sure to check out: -1. **README**: For a high-level overview of the project, please refer to our README. +1. **README**: For a high-level overview of the project, please refer to our [README](https://github.com/alan-turing-institute/autoemulate/blob/main/README.md). 2. **Documentation**: For more detailed information about the project, please refer to our [documentation](https://alan-turing-institute.github.io/autoemulate). -3. **Project Roadmap**: Familiarise yourself with our direction and goals by checking out [the project's project board](https://github.com/orgs/alan-turing-institute/projects/185/views/4) in lieu of a formal product roadmap. - -## Get in touch - -The easiest way to get involved with the active development of AutoEmulate is to join our sprints. 
If you are looking to become part of the core development team in this way, reach out to Research Application Manager Kalle Westerling via email kwesterling@turing.ac.uk to request an invite. - - - +## How to Contribute -**Email**: If you prefer a formal communication method or have specific concerns, please reach us via lead Research Software Engineer Martin Stoffel, mstoffel@turing.ac.uk. +This section provides a high-level guide to contributing to AutoEmulate, designed for those with little or no experience with open source projects. For more detailed information, please also refer to the docs for: -## How to Contribute +* [contributing emulators](contributing-emulators.md) +* [contributing to the docs](contributing-docs.md) We welcome contributions of all kinds, be it code, documentation, or community engagement. We encourage you to read through the following sections to learn more about how you can contribute to the package. -We are always interested in adding more simulations or simulation input/output datasets from any field (see https://github.com/alan-turing-institute/autoemulate/issues/4). - ## How to Submit Changes We follow the same instructions for submitting changes to the project as those developed by [The Turing Way](https://github.com/the-turing-way/the-turing-way/blob/main/CONTRIBUTING.md#making-a-change-with-a-pull-request). In short, there are five steps to adding changes to this repository: 1. **Fork the Repository**: Start by [forking the AutoEmulate repository](https://github.com/alan-turing-institute/autoemulate/fork). -1. **Make Changes**: Ensure your code adheres to the style guidelines and passes all tests. -2. **Commit and Push**: Use clear commit messages. -3. **Open a Pull Request**: Ensure you describe the changes made and any additional details. +2. **Make Changes**: Ensure your code follows the existing code style ([PEP 8](https://peps.python.org/pep-0008/)) and passes all tests. +3. **Commit and Push**: Use clear commit messages. 
+4. **Open a Pull Request**: Ensure you describe the changes made and any additional details. ### 1. Fork the Repository @@ -49,49 +37,42 @@ Once you have [created a fork of the repository](https://github.com/alan-turing- Make sure to [keep your fork up to date](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork) with the main repository, otherwise, you can end up with lots of dreaded [merge conflicts](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/about-merge-conflicts). -If you prefer working with GitHub in the browser, [these instructions](https://github.com/KirstieJane/STEMMRoleModels/wiki/Syncing-your-fork-to-the-original-repository-via-the-browser) describe how to sync your fork to the original repository. - ### 2. Make Changes -Try to keep the changes focused. If you submit a large amount of work all in one go it will be much more work for whoever is reviewing your pull request. Help them help you! :wink: +After writing new code or modifying existing code, please make sure to: -Are you new to Git and GitHub or just want a detailed guide on getting started with version control? Check out the [Version Control chapter](https://the-turing-way.netlify.com/version_control/version_control.html) in _The Turing Way_ Book! +* write [numpy-style docstrings](https://numpydoc.readthedocs.io/en/latest/format.html). +* write tests in the `tests/` directory using [pytest](https://docs.pytest.org/en/7.4.x/). +* format the code using [black](https://github.com/psf/black) -### 3. Commit and Push +It would be great if you could also [update the documentation](contributing-docs.md) to reflect the changes you've made. If you plan to add a new emulator have a look at the [contributing emulators docs](contributing-emulators.md). -While making your changes, commit often and write good, detailed commit messages. 
[This blog](https://chris.beams.io/posts/git-commit/) explains how to write a good Git commit message and why it matters. It is also perfectly fine to have a lot of commits - including ones that break code. A good rule of thumb is to push up to GitHub when you _do_ have passing tests then the continuous integration (CI) has a good chance of passing everything. 😸 +### 3. Commit and Push -Please do not re-write history! That is, please do not use the [rebase](https://help.github.com/en/articles/about-git-rebase) command to edit previous commit messages, combine multiple commits into one, or delete or revert commits that are no longer necessary. +While making your changes, commit often and write good, detailed commit messages. [This blog](https://chris.beams.io/posts/git-commit/) explains how to write a good Git commit message and why it matters. ### 4. Open a Pull Request -We encourage you to open a pull request as early in your contributing process as possible. This allows everyone to see what is currently being worked on. It also provides you, the contributor, feedback in real-time from both the community and the continuous integration as you make commits (which will help prevent stuff from breaking). - -GitHub has a [nice introduction](https://guides.github.com/introduction/flow) to the pull request workflow, but please [get in touch](#get-in-touch) if you have any questions :balloon:. +We encourage you to open a pull request as early in your contributing process as possible. This allows everyone to see what is currently being worked on. It also provides you, the contributor, feedback in real-time. GitHub has a [nice introduction](https://guides.github.com/introduction/flow) to the pull request workflow. ## First-timers' Corner -If you're new to the project, we recommend starting with issues labeled as ["good first issue"](https://github.com/alan-turing-institute/autoemulate/issues?q=is:issue+is:open+label:%22good+first+issue%22). 
These are typically simpler tasks that offer a great starting point. - -There's also the label ["thoughts welcome"](https://github.com/alan-turing-institute/autoemulate/issues?q=is:issue+is:open+label:%22thoughts+welcome%22), which allows for you to contribute with discussion points in the issues, even if you don't want to -or cannot contribute to the codebase. - -If you feel ready for it, you can also open a new issue. Before you open a new issue, please check if any of [our open issues](https://github.com/alan-turing-institute/autoemulate/issues) cover your idea already. If you open a new issue, please follow our basic guidelines laid out in our issue templates, which you should be able to see if you [open a new issue](https://github.com/alan-turing-institute/autoemulate/issues/new/choose). +Just to reiterate: we welcome all contributions, no matter how big or small! If anything in this guide is unclear, please reach out or simply ask in a PR or issue. ## Reporting Bugs Found a bug? Please open an issue here on GitHub to report it. We have a template for opening issues, so make sure you follow the correct format and ensure you include: -- A clear title. -- A detailed description of the bug. -- Steps to reproduce it. -- Expected versus actual behavior. +* A clear title. +* A detailed description of the bug. +* Steps to reproduce it. +* Expected versus actual behavior. ## Recognising Contributions -We value and recognise every contribution. All contributors will be acknowledged in the [contributors](https://github.com/alan-turing-institute/autoemulate/tree/main#contributors) section of the README. Notable contributions will also be highlighted in our sprint demo meetings. +All contributors will be acknowledged in the [contributors](https://github.com/alan-turing-institute/autoemulate/tree/main#contributors) section of the README. -AutoEmulate follows the [all-contributors](https://github.com/kentcdodds/all-contributors#emoji-key) specifications.
The all-contributors bot usage is described [here](https://allcontributors.org/docs/en/bot/usage). You can see a list of current contributors here. +AutoEmulate follows the [all-contributors](https://github.com/kentcdodds/all-contributors#emoji-key) specifications. The all-contributors bot usage is described [here](https://allcontributors.org/docs/en/bot/usage). To add yourself or someone else as a contributor, comment on the relevant Issue or Pull Request with the following: @@ -108,8 +89,8 @@ What happens if you accidentally run the bot before the previous run was merged If you're stuck or need assistance: -- Reach out via email for personalised assistance. (See ["Get in touch"](#get-in-touch) above for links.) -- Consider pairing up with a another contributor for guidance. Contact us for guidance on this topic +* Reach out via email for personalised assistance. (See [Get in touch](#get-in-touch) below for links.) +* Consider pairing up with another contributor for guidance. Contact us for guidance on this topic. **Once again, thank you for considering contributing to AutoEmulate! We hope you enjoy your contributing experience.** @@ -122,3 +103,13 @@ Every contributor is expected to adhere to our Code of Conduct. It outlines our ---- These Contributing Guidelines have been adapted from the [Contributing Guidelines](https://github.com/the-turing-way/the-turing-way/blob/main/CONTRIBUTING.md#recognising-contributions) of [The Turing Way](https://github.com/the-turing-way/the-turing-way)! (License: CC-BY) + +## Get in touch + +**Email**: For any inquiries, please reach out to lead Developer Martin Stoffel, .
+ + + + \ No newline at end of file diff --git a/README.md b/README.md index ee00cfc0..f1d20d39 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# AutoEmulate +# AutoEmulate ![CI](https://github.com/alan-turing-institute/autoemulate/actions/workflows/ci.yaml/badge.svg) [![codecov](https://codecov.io/gh/alan-turing-institute/autoemulate/graph/badge.svg?token=XD1HXQUIGK)](https://codecov.io/gh/alan-turing-institute/autoemulate) @@ -7,14 +7,11 @@ [![Documentation](https://img.shields.io/badge/documentation-blue)](https://alan-turing-institute.github.io/autoemulate/) - -Simulations of physical systems are often slow and need lots of compute, which makes them unpractical for real-world applications like digital twins, or when they have to run thousands of times for sensitivity analyses. The goal of `AutoEmulate` is to make it easy to replace simulations with fast, accurate emulators. To do this, `AutoEmulate` automatically fits and compares various models, ranging from simple models like Radial Basis Functions and Second Order Polynomials to more complex models like Support Vector Machines, Gaussian Processes and Conditional Neural Processes to find the best emulator for a simulation. +Simulations of physical systems are often slow and need lots of compute, which makes them impractical for real-world applications like digital twins, or when they have to run thousands of times for sensitivity analyses. The goal of `AutoEmulate` is to make it easy to replace simulations with fast, accurate emulators. To do this, `AutoEmulate` automatically fits and compares various emulators, ranging from simple models like Radial Basis Functions and Second Order Polynomials to more complex models like Support Vector Machines, Gaussian Processes and Conditional Neural Processes to find the best emulator for a simulation. The project is in early development.
- - -## installation +## Installation There's currently a lot of development, so we recommend installing the most current version from GitHub: @@ -35,7 +32,7 @@ cd autoemulate poetry install ``` -## quick start +## Quick start ```python import numpy as np @@ -70,7 +67,7 @@ si = ae.sensitivity_analysis(emulator) ae.plot_sensitivity_analysis(si) ``` -## documentation +## Documentation You can find tutorials, FAQs and the API reference [here](https://alan-turing-institute.github.io/autoemulate/). The documentation is still work in progress. diff --git a/docs/_toc.yml b/docs/_toc.yml index 9a3473fd..cfac0018 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -20,6 +20,8 @@ chapters: - file: community/index sections: - file: community/contributing + - file: community/contributing-emulators + - file: community/contributing-docs - file: community/code-of-conduct - file: community/faq/index sections: diff --git a/docs/community/contributing-docs.md b/docs/community/contributing-docs.md new file mode 100644 index 00000000..f7092091 --- /dev/null +++ b/docs/community/contributing-docs.md @@ -0,0 +1,52 @@ +# Contributing to the docs + +We welcome all documentation contributions, from fixing small typos to adding comprehensive tutorials. This guide will help you get started. + +## Prerequisites + +Before contributing, please read our [contributing guide](contributing.md) to set up your development environment and understand our workflow. + +## Types of Documentation Contributions + +### 1. Fixing typos and small changes + +1. Navigate to the relevant file in the `docs/` directory +2. Make your changes +3. Build the docs locally to verify your changes: + + ```bash + jupyter-book build docs --all + ``` + +4. Open the generated file `docs/_build/html/index.html` in your browser to preview. + +### 2. Adding tutorials + +1. Create a new Jupyter notebook in `docs/tutorials/` +2. Include: + - Clear introduction and objectives + - Step-by-step instructions + - Code examples +3. 
Add your tutorial to the table of contents: + - Open `_toc.yml` in the `docs/` directory + - Add an entry for your new tutorial +4. Build and verify the docs as described above + +### 3. Updating API documentation + +The API documentation is generated from source code docstrings. There are two scenarios: + +#### Modifying existing API docs + +Simply update the docstring in the source code and rebuild: + +```bash +jupyter-book build docs --all +``` + +#### Adding new API docs + +1. Create a new `.rst` file in `docs/community/reference/` +2. Add the file to `_toc.yml` +3. Ensure your source code has comprehensive docstrings +4. Build the documentation diff --git a/docs/community/contributing-emulators.md b/docs/community/contributing-emulators.md new file mode 100644 index 00000000..9778314d --- /dev/null +++ b/docs/community/contributing-emulators.md @@ -0,0 +1,71 @@ +# Contributing emulators + +This guide explains how to contribute new emulator models to `AutoEmulate`. + +## Emulator structure + +All emulators in AutoEmulate are implemented as `scikit-learn` estimators, making them compatible with scikit-learn's cross-validation, grid-search, and pipeline functionality. Have a look at the [scikit-learn estimator developer guide](https://scikit-learn.org/1.5/developers/develop.html#rolling-your-own-estimator) for more details on how to implement a new emulator. + +**Note**: When contributing emulators, keep in mind that AutoEmulate doesn't currently support time-series or spatial data. + +### Core Requirements + +Each emulator class must: + +1. Live in `autoemulate/emulators/` +2. Inherit from `sklearn.base`'s `BaseEstimator` and `RegressorMixin` +3. Implement the `fit` and `predict` methods +4.
Include these additional methods/properties: + + - `get_grid_params()`: Returns a dictionary of parameter values for grid search over hyperparameters + - `model_name`: Property that returns the emulator name (usually `self.__class__.__name__`) + - `_more_tags()`: Defines emulator properties like multioutput support + +### Getting Started + +The easiest way to create a new emulator is to: + +1. Look at existing emulators in `autoemulate/emulators/` as templates +2. Run the scikit-learn estimator tests in `tests/test_estimators.py` early to catch any implementation issues +3. Add your own tests in `tests/models/` + +### Naming Conventions + +The `model_name` property allows the emulator to be accessed with both long and short names: + +- Long name: The class name (e.g., "RadialBasisFunctions") +- Short name: The uppercase letters of the long name, in lower case (e.g., "rbf") + +Make sure your chosen class name: + +- Doesn't conflict with existing emulators +- Contains some uppercase letters for the short name +- Is descriptive of the emulation technique + +## Testing emulators + +We use two types of tests: + +1. **Scikit-learn Test Suite**: Add your emulator to `tests/test_estimators.py` to verify scikit-learn compatibility. Not all tests need to pass - use `_more_tags()` to skip incompatible tests. See the [estimator tags overview](https://scikit-learn.org/1.5/developers/develop.html#estimator-tags) for details. + +2. **Custom Tests**: Add specific tests for your emulator in `tests/models/` to verify its core functionality (e.g., the end-to-end behaviour of components such as parameter search). + +## Registering an emulator + +After your emulator passes tests: + +1. Add it to `model_registry` in `autoemulate/emulators/__init__.py` +2. Set `is_core=False` to make it available but not a default model + +## PyTorch emulators + +PyTorch emulators require special handling: + +1. Put the model architecture in `autoemulate/emulators/neural_networks/` +2.
Put the main emulator class in `autoemulate/emulators/` +3. Use [skorch](https://skorch.readthedocs.io/) for scikit-learn compatibility: + - Create `self.model_` as a `NeuralNetRegressor` instance + - Pass the model architecture as the first argument + - Use `self.model_` in `fit` and `predict` methods + +See existing PyTorch emulators like `conditional_neural_process.py` for examples. diff --git a/docs/reference/emulators/light_gbm.rst b/docs/reference/emulators/light_gbm.rst index 25b1afa1..d92fdc06 100644 --- a/docs/reference/emulators/light_gbm.rst +++ b/docs/reference/emulators/light_gbm.rst @@ -1,5 +1,5 @@ autoemulate.emulators.light_gbm -============================= +=============================== .. automodule:: autoemulate.emulators.light_gbm :members: diff --git a/docs/tutorials/02_speed.ipynb b/docs/tutorials/02_speed.ipynb index dde536f8..ffb7f154 100644 --- a/docs/tutorials/02_speed.ipynb +++ b/docs/tutorials/02_speed.ipynb @@ -190,7 +190,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### 1) parallise model fits using `n_jobs`\n", + "## 1) parallelise model fits using `n_jobs`\n", "The n_jobs parameter allows you to specify the number of CPU cores to use for parallel processing. Setting n_jobs = -1 uses all available cores, speeding up computations when working with large datasets.\n", "\n", "Note: Maxing out all available cores might not always lead to faster computation times. Due to overhead from parallelization, memory bandwidth limitations, and potential load imbalances, using more cores can sometimes result in diminishing returns or even slower performance.\n", @@ -240,8 +240,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\n", - "### 2) restrict the range of models\n", + "## 2) restrict the range of models\n", "\n", "Another approach is to limit the range of models by selecting a subset of relevant types based on your domain and problem expertise.
This selection process typically considers factors such as the nature of the problem, data characteristics or the need for interpretability. By narrowing down the types of models, you can reduce the computational burden and focus on the most promising architectures for your specific task." ] @@ -294,7 +293,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### 3) reduce the number of folds in cross validation using `cross_validator` " + "## 3) reduce the number of folds in cross validation using `cross_validator` " ] }, { @@ -348,7 +347,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### 4) modify hyperparameter search\n", + "## 4) modify hyperparameter search\n", "\n", "If we want to use hyperparameter search, we suddenly have to fit many more models. For each model, we might have 20 different parameter combinations, and because we cross validate each combination, we are running 20 * 5 = 100 model fits per model. It's therefore recommended to focus on a few models of interest when using hyperparameter search.\n", "\n", @@ -627,7 +626,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.9" + "version": "3.11.10" } }, "nbformat": 4, diff --git a/misc/AE_logo_final.png b/misc/AE_logo_final.png new file mode 100644 index 00000000..c2a168c4 Binary files /dev/null and b/misc/AE_logo_final.png differ