Integrate papermill for automated testing of Jupyter Notebooks #70
Labels
enhancement
New feature or request
Jupyter Notebook Testing Overview
In data science and analysis, Jupyter Notebooks are a handy tool because they provide interactive computing and data visualization. As projects scale and notebooks become part of complex workflows, the notebooks need to be integrated with automated processes.
Here are four tools that enable testing of Jupyter Notebooks and integration with CI/CD pipelines: nbconvert, nbval, papermill, and pytest-notebook.
nbconvert
nbconvert converts Jupyter Notebooks to other formats, including Python scripts, HTML, PDF, and Markdown. This is useful for sharing analyses in different formats or for feeding notebooks into different stages of a development pipeline.
Key Features:
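As a rough illustration of the nbconvert conversions described above, here is a minimal sketch using nbconvert's Python exporters. The helper names `target_path` and `convert_notebook` are hypothetical, and PDF is left out because the default PDF exporter additionally needs a LaTeX installation.

```python
from pathlib import Path

# File extension written for each target format supported by this sketch.
EXTENSIONS = {"html": ".html", "python": ".py", "markdown": ".md"}


def target_path(notebook_path, to):
    """Return the output path next to the notebook for the given format."""
    return Path(notebook_path).with_suffix(EXTENSIONS[to])


def convert_notebook(notebook_path, to="html"):
    """Convert a notebook and write the result next to it (hypothetical helper)."""
    # Imported here so target_path() stays usable without nbconvert installed.
    from nbconvert import HTMLExporter, MarkdownExporter, PythonExporter

    exporters = {
        "html": HTMLExporter,
        "python": PythonExporter,
        "markdown": MarkdownExporter,
    }
    # from_filename() returns the converted text plus a resources dict.
    # Extracted images held in the resources are not written in this sketch.
    body, _resources = exporters[to]().from_filename(str(notebook_path))
    out = target_path(notebook_path, to)
    out.write_text(body, encoding="utf-8")
    return out
```

Under these assumptions, `convert_notebook("analysis.ipynb", to="python")` would write `analysis.py` beside the notebook.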
nbval
nbval is a pytest plugin that runs notebooks as tests within the pytest framework. It lets you check that an analysis is reproducible and include notebooks in the continuous integration loop.
Key Features:
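To make the nbval description concrete, the sketch below builds the pytest arguments for a set of notebooks and runs them through `pytest.main`. The function names are hypothetical; `--nbval` and `--nbval-lax` are the flags the plugin adds to pytest, and the example assumes both pytest and nbval are installed.

```python
def nbval_args(notebooks, lax=False):
    """Build the pytest argument list for validating notebooks with nbval."""
    # --nbval re-runs each notebook and compares fresh outputs with stored ones.
    # --nbval-lax only checks that the cells run without raising errors.
    flag = "--nbval-lax" if lax else "--nbval"
    return [flag, *[str(nb) for nb in notebooks]]


def run_nbval(notebooks, lax=False):
    """Run nbval over the notebooks and return pytest's exit code."""
    # Imported here so nbval_args() can be used without pytest installed.
    import pytest

    return int(pytest.main(nbval_args(notebooks, lax=lax)))
```

In a CI job, a non-zero return value from `run_nbval` could be passed to `sys.exit` so that a failing notebook fails the build.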
papermill
papermill executes Jupyter Notebooks with parameters. The same notebook can be run with varying inputs, which makes it useful for batch processing, automated reporting, and parameterized analysis.
Key Features:
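The batch-processing use case for papermill could look roughly like the sketch below. It expands a parameter grid and runs the notebook once per combination with `papermill.execute_notebook`. The helper names, the grid keys, and the output naming scheme are assumptions made for illustration; the input notebook is expected to have a cell tagged `parameters` so that papermill knows where to inject the values.

```python
from itertools import product
from pathlib import Path


def expand_grid(grid):
    """Turn {"a": [1, 2], "b": [3]} into one dict per parameter combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def output_name(input_nb, params):
    """Derive an output file name that encodes the parameters used."""
    suffix = "_".join(f"{key}-{value}" for key, value in sorted(params.items()))
    src = Path(input_nb)
    return f"{src.stem}_{suffix}{src.suffix}"


def run_batch(input_nb, output_dir, grid):
    """Execute the notebook once per parameter combination (hypothetical helper)."""
    # Imported here so the helpers above work without papermill installed.
    import papermill as pm

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for params in expand_grid(grid):
        out_path = out_dir / output_name(input_nb, params)
        pm.execute_notebook(str(input_nb), str(out_path), parameters=params)
        outputs.append(out_path)
    return outputs
```

For example, `run_batch("report.ipynb", "runs", {"region": ["eu", "us"], "year": [2023]})` would produce two executed notebooks, one per region.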
pytest-notebook
pytest-notebook is a pytest plugin that tests notebooks in more sophisticated ways. For example, it can compare the outputs found in the notebook with expected outputs. Last checked, the latest version is 0.10.
Key Features:
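A sketch of the output comparison offered by pytest-notebook, assuming its `NBRegressionFixture` class from `pytest_notebook.nb_regression` is available. The helper names, the directory layout, and the timeout value are hypothetical.

```python
from pathlib import Path


def collect_notebooks(root):
    """Return notebooks under root, sorted, skipping Jupyter checkpoint copies."""
    return sorted(
        path
        for path in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


def check_notebooks(root, exec_timeout=120):
    """Re-run every notebook and compare new outputs with the stored ones."""
    # Imported here so collect_notebooks() works without pytest-notebook installed.
    from pytest_notebook.nb_regression import NBRegressionFixture

    fixture = NBRegressionFixture(exec_timeout=exec_timeout)
    for notebook in collect_notebooks(root):
        # check() raises an error when the outputs differ from the stored ones.
        fixture.check(str(notebook))
```

Inside a pytest test module, the plugin's `nb_regression` fixture exposes the same `check` method, so a test can be as short as `nb_regression.check("notebook.ipynb")`.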
Proposal: Adopting Papermill for Enhanced Notebook Testing
After reviewing the capabilities of these tools, I propose adopting papermill for our workflows. Its core strengths are stability and flexibility: it supports automated, parameterized execution of Jupyter Notebooks as well as systematic testing of their outputs, all the way down to binary data. That makes it the best fit for our case, particularly for advanced data analysis work that needs both flexible execution and reliable output verification.
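One way the output verification mentioned above might be sketched: papermill writes the executed notebook to disk, and since the notebook file is plain JSON, its outputs (including base64-encoded images) can be inspected directly. The function names and the idea of comparing SHA-256 digests of PNG outputs are assumptions for illustration, not an established project convention.

```python
import base64
import hashlib
import json


def find_errors(nb):
    """Return (cell_index, ename, evalue) for every error output in the notebook."""
    errors = []
    for index, cell in enumerate(nb.get("cells", [])):
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename"), output.get("evalue")))
    return errors


def png_digests(nb):
    """Return the SHA-256 digest of each PNG output, in notebook order."""
    digests = []
    for cell in nb.get("cells", []):
        for output in cell.get("outputs", []):
            png = output.get("data", {}).get("image/png")
            if png is not None:
                # Notebook JSON may store multi-line strings as lists of lines.
                text = "".join(png) if isinstance(png, list) else png
                digests.append(hashlib.sha256(base64.b64decode(text)).hexdigest())
    return digests


def run_and_verify(input_nb, output_nb, parameters, expected_digests):
    """Execute with papermill, then check the saved outputs (hypothetical helper)."""
    # Imported here so the checks above work without papermill installed.
    import papermill as pm

    pm.execute_notebook(input_nb, output_nb, parameters=parameters)
    with open(output_nb, encoding="utf-8") as handle:
        nb = json.load(handle)
    assert not find_errors(nb), "executed notebook contains error outputs"
    assert png_digests(nb) == expected_digests, "binary outputs changed"
```

Byte-for-byte image comparison is strict: plotting libraries can render slightly different PNGs across versions or platforms, so a real test suite may need a looser comparison for figures.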