Commit 3041942

Merge pull request #413 from pyiron/minor_bump

[minor] 0.10.0

liamhuber authored Aug 22, 2024
2 parents 3bd6241 + cef10fb commit 3041942
Showing 54 changed files with 4,545 additions and 7,086 deletions.
4 changes: 0 additions & 4 deletions .binder/environment.yml
@@ -7,11 +7,7 @@ dependencies:
- cloudpickle =3.0.0
- executorlib =0.0.1
- graphviz =9.0.0
- h5io =0.2.4
- h5io_browser =0.0.16
- pandas =2.2.2
- pyiron_base =0.9.12
- pyiron_contrib =0.1.18
- pyiron_snippets =0.1.4
- python-graphviz =0.20.3
- toposort =1.10
4 changes: 0 additions & 4 deletions .ci_support/environment.yml
@@ -7,11 +7,7 @@ dependencies:
- cloudpickle =3.0.0
- executorlib =0.0.1
- graphviz =9.0.0
- h5io =0.2.4
- h5io_browser =0.0.16
- pandas =2.2.2
- pyiron_base =0.9.12
- pyiron_contrib =0.1.18
- pyiron_snippets =0.1.4
- python-graphviz =0.20.3
- toposort =1.10
4 changes: 0 additions & 4 deletions .ci_support/lower_bound.yml
@@ -7,11 +7,7 @@ dependencies:
- cloudpickle =3.0.0
- executorlib =0.0.1
- graphviz =9.0.0
- h5io =0.2.2
- h5io_browser =0.0.14
- pandas =2.2.0
- pyiron_base =0.9.12
- pyiron_contrib =0.1.18
- pyiron_snippets =0.1.4
- python-graphviz =0.20.0
- toposort =1.10
148 changes: 100 additions & 48 deletions docs/README.md
@@ -22,88 +22,140 @@ By allowing (but not demanding, in the case of data DAGs) users to specify the e

By scraping type hints from decorated functions, both new data values and new graph connections are (optionally) required to conform to hints, making workflows strongly typed.
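
For instance, hints go right in the decorated function's signature. A minimal sketch in the same doctest style used below (hint enforcement on connections follows the library's defaults):

```python
>>> from pyiron_workflow import Workflow
>>>
>>> @Workflow.wrap.as_function_node
... def AddOne(x: int = 0) -> int:
...     y = x + 1
...     return y
>>>
>>> AddOne()()  # a single output is returned directly when the node runs
1

```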

Individual node computations can be shipped off to parallel processes for scalability. (This is a beta-feature at time of writing; the `Executor` executor from [`executorlib`](https://github.com/pyiron/exectorlib) is supported and tested; automated execution flows do not yet fully leverage the efficiency possible in parallel execution, and `executorlib`'s more powerful flux- and slurm-based executors have not been tested and may fail.)
Individual node computations can be shipped off to parallel processes for scalability. (This is a beta-feature at time of writing; standard python executors like `concurrent.futures.ThreadPoolExecutor` and `ProcessPoolExecutor` work, and the `Executor` executor from [`executorlib`](https://github.com/pyiron/exectorlib) is supported and tested; `executorlib`'s more powerful flux- and slurm-based executors have not been tested and may fail.)
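
As a rough sketch of the intent (the `executor` attribute and node label here are assumptions, not the confirmed API -- see the demo notebooks for the supported pattern):

```python
# Hedged sketch: ship one node's computation to a separate process by
# attaching a standard-library executor to it. The attribute name and
# node label are assumptions for illustration only.
from concurrent.futures import ProcessPoolExecutor

wf.expensive_step.executor = ProcessPoolExecutor()  # hypothetical node label
out = wf()  # the rest of the graph runs as usual
```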

Once you're happy with a workflow, it can easily be turned into a macro for use in other workflows. This allows the clean construction of increasingly complex computation graphs by composing simpler graphs.

Nodes (including macros) can be stored in plain text as python code, and registered by future workflows for easy access. This encourages and supports an ecosystem of useful nodes, so you don't need to re-invent the wheel. (This is a beta-feature, with full support of [FAIR](https://en.wikipedia.org/wiki/FAIR_data) principles for node packages planned.)
Nodes (including macros) can be stored in plain text as python code, and imported by future workflows for easy access. This encourages and supports an ecosystem of useful nodes, so you don't need to re-invent the wheel. When these python files are in a properly managed git repository and released in a stable channel (e.g. conda-forge), they fulfill most requirements of the [FAIR](https://en.wikipedia.org/wiki/FAIR_data) principles.
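
For example, a reusable node might live in a plain python module (file and node names here are hypothetical) and then be imported like any other python object, e.g. `from my_nodes import Greet`:

```python
# my_nodes.py -- a hypothetical module collecting reusable nodes
from pyiron_workflow import Workflow

@Workflow.wrap.as_function_node
def Greet(greeting="Hello", subject="World"):
    message = f"{greeting}, {subject}!"
    return message
```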

Executed or partially-executed graphs can be stored to file, either by explicit call or automatically after running. When creating a new node(/macro/workflow), the working directory is automatically inspected for a save-file and the node will try to reload itself if one is found. (This is an alpha-feature, so it is currently only possible to save entire graphs at once and not individual nodes within a graph, all the child nodes in a saved graph must have been instantiated by `Workflow.create` (or equivalent, i.e. their code lives in a `.py` file that has been registered), and there are no safety rails to protect you from changing the node source code between saving and loading (which may cause errors/inconsistencies depending on the nature of the changes).)
Executed or partially-executed graphs can be stored to file, either by explicit call or automatically after running. These can be reloaded (automatically on instantiation, in the case of workflows) and examined/rerun, etc.
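
In sketch form (the explicit call is assumed to be spelled `save()` here; automatic reloading by matching name on instantiation is what the paragraph above describes):

```python
# Hedged sketch of explicit storage and reload by workflow name.
from pyiron_workflow import Workflow

wf = Workflow("my_project")
# ... add nodes and run ...
wf.save()  # write the graph and its current state to file

wf_again = Workflow("my_project")  # finds the save file and reloads the state
```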

## Example
## Installation

`conda install -c conda-forge pyiron_workflow`

## User introduction

`pyiron_workflow` offers a single-point-of-entry in the form of the `Workflow` object, and uses decorators to make it easy to turn regular python functions into "nodes" that can be put in a computation graph.

Nodes can be used by themselves and -- other than being "delayed" in that their computation needs to be requested after they're instantiated -- they feel an awful lot like the regular python functions they wrap:
Decorating your python function as a node means that it's actually now a class, so you'll need to instantiate it before you can call it -- but otherwise it's a _lot_ like a regular python function. You can put regular python code inside it, and that code will run whenever you run the node.

```python
>>> from pyiron_workflow import Workflow
>>>
>>> @Workflow.wrap.as_function_node("y")
... def AddOne(x):
...     return x + 1
>>> @Workflow.wrap.as_function_node
... def HelloWorld(greeting="Hello", subject="World"):
...     hello = f"{greeting} {subject}"
...     return hello
>>>
>>> AddOne(AddOne(AddOne(x=0)))()
3
>>> hello_node = HelloWorld() # Instantiate a node instance
>>> hello_node(greeting="Salutations") # Use it just like a function
'Salutations World'

```

But the intent is to collect them together into a workflow and leverage existing nodes. We can directly perform (many but not quite all) python actions natively on output channels, can build up data graph topology by simply assigning values (to attributes or at instantiation), and can package things together into reusable macros with customizable IO interfaces:
The intent of this node form is to build up a collection of function calls into a _directed graph_ that gives a formal definition of your workflow. Under the hood, the node above has labelled input and output data channels:

```python
>>> import math
>>>
>>> @Workflow.wrap.as_function_node("y")
... def AddOne(x):
...     return x + 1
>>>
>>> @Workflow.wrap.as_function_node("permutations")
... def Permutations(n, choose=None):
...     return math.perm(n, choose)
>>>
>>> @Workflow.wrap.as_macro_node
... def PermutationDifference(self, n, choose=None):
...     self.p = Permutations(n, choose=choose)
...     self.plus_1 = AddOne(n)
...     self.p_plus_1 = Permutations(self.plus_1, choose=choose)
...     self.difference = self.p_plus_1 - self.p
...     return self.difference
>>>
>>> wf = Workflow("toy_example")
>>>
>>> wf.choose = Workflow.create.standard.UserInput(2)
>>> wf.small = PermutationDifference(5, choose=wf.choose)
>>> wf.large = PermutationDifference(25, choose=wf.choose)
>>>
>>> print(hello_node.inputs.labels)
['greeting', 'subject']

>>> hello_node.outputs.hello.value
'Salutations World'

```

Each time it runs, the `Function` node takes its input, passes it to the function we decorated, executes it, and puts the result into the node's output channels. These inputs and outputs can be chained together to form a computational graph. Inputs and outputs aren't just the data they hold -- they are data channels -- but you can perform most python operations on them _as though_ they were raw objects. If a node has only a single output, you can reference the node directly in place of its output channel. This dynamically creates a new node to delay the operation and handle it at runtime:

```python
>>> first = HelloWorld("Welcome", "One")
>>> second = HelloWorld("Greetings", "All")
>>> combined = first + " and " + second
>>> print(type(combined))
<class 'pyiron_workflow.nodes.standard.Add'>
>>> combined()
'Welcome One and Greetings All'

```

<aside style="background-color: #a86932; border-left: 5px solid #ccc; padding: 10px;">
Nodes couple input values to output values. To keep this connection truthful, it is best practice to write nodes that do not mutate mutable data, i.e. that are functional and idempotent. Otherwise, a downstream node operation may silently alter the output of some upstream node! This is python, and idempotency is only a best practice, not a strict requirement; it's up to you to decide whether your nodes mutate data, and to take care of any side effects.
</aside>
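
As a small illustration of the risk (pure python, using the decorator from above): a downstream node that appends to a list it received will also change the list sitting in the upstream node's output channel, since both reference the same object.

```python
>>> @Workflow.wrap.as_function_node
... def MakeList():
...     data = [1, 2, 3]
...     return data
>>>
>>> @Workflow.wrap.as_function_node
... def AppendFour(data):
...     data.append(4)  # mutates the caller's list in place -- avoid this
...     return data
>>>
>>> maker = MakeList()
>>> appender = AppendFour(maker)
>>> appender()
[1, 2, 3, 4]
>>> maker.outputs.data.value  # the upstream output was silently changed too
[1, 2, 3, 4]

```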

Sets of nodes can be collected under the umbrella of a living `Workflow` object, which can have nodes added to and removed from it. Let's build the above graph as a `Workflow`, and leverage one of the built-in `standard` nodes to hold input and fork it to two different downstream nodes:

```python
>>> wf = Workflow("readme")
>>> wf.greeting = Workflow.create.standard.UserInput("Hi")
>>> wf.first = HelloWorld(greeting=wf.greeting)
>>> wf.second = HelloWorld(greeting=wf.greeting)
>>> wf.combined = wf.first + " and " + wf.second
>>> wf()
{'small__difference': 10, 'large__difference': 50}
{'combined__add': 'Hi World and Hi World'}

```

Packaging as a workflow/macro makes it easy to re-run calculations:
Here we see that the output comes as a dictionary, with keys built from the node label (`'combined'`) and the channel name (`'add'`). Workflows return all unconnected output, and take any unconnected input as keyword arguments following the same naming rule. Let's exploit this to easily re-run our workflow with different values:

```python
>>> wf(choose__user_input=5)
{'small__difference': 600, 'large__difference': 1518000}
>>> wf(greeting__user_input="Hey", first__subject="you")
{'combined__add': 'Hey you and Hey World'}

```

We can also visualize our workflow, at a high-level:
Once we have a workflow we like and think is useful, we may wish to package it as a `Macro` node. These are a lot like workflows, but "crystallized": like `Function` nodes, they have a fixed set of input and output. They also give you a bit more control over what gets exposed as IO, unlike workflows, which (by default) expose all the unconnected bits. Defining a `Macro` is also a lot like defining a `Function` -- it can be done by decorating a simple python function. However, where `Function` nodes execute their decorated function at each run and can hold arbitrary python code, the function a `Macro` node decorates defines the graph it holds: it is executed _once_ at instantiation, its input values are themselves data channels rather than raw data, and from then on running the node runs that entire graph:

![](_static/readme_diagram_shallow.png)
```python
>>> @Workflow.wrap.as_macro_node
... def Combined(wf, greeting="Hey", subject1="You", subject2="World"):
...     wf.first = HelloWorld(greeting=greeting, subject=subject1)
...     wf.second = HelloWorld(greeting=greeting, subject=subject2)
...     wf.combined = wf.first + " and " + wf.second
...     return wf.combined
>>>
>>> hello_macro = Combined()
>>> hello_macro(subject2="everyone")
{'combined': 'Hey You and Hey everyone'}

Or diving in and resolving macro nodes to a specified depth:
```

Not only does this give us a bit more control over how people interface with the graph (i.e. what IO to expose and what defaults (if any) to use), but `Macro` nodes are _composable_ -- we can stick them into other macros or workflows as nodes, i.e. we can nest a sub-graph inside our graph. Let's do that, and also give a first example of a node with multiple outputs:

![](_static/readme_diagram_deep.png)
```python
>>> @Workflow.wrap.as_macro_node
... def Composition(self, greeting):
...     self.compose = Combined(greeting=greeting)
...     self.simple = greeting + " there"
...     return self.compose, self.simple
>>>
>>> composed = Composition()
>>> composed(greeting="Hi")
{'compose': 'Hi You and Hi World', 'simple': 'Hi there'}

```

(`diagram` in this case is a `graphviz.graphs.Digraph` object, which will render nicely in a Jupyter notebook, but in this case was saved as a png using the `diagram.render` method.)
(Note that we also renamed the first variable to python's canonical `self`. It doesn't matter what the first variable is called -- but it must be there and represents the macro instance! If it's easier to use python's `self`, go for it; if you're copying and pasting from a workflow you wrote, `wf` or whatever your workflow variable was will be easier.)

## Installation
Although the macro exposes only particular data for IO, you can always dig into the object to see what's happening:

`conda install -c conda-forge pyiron_workflow`
```python
>>> composed.compose.second.outputs.hello.value
'Hi World'

```

This lets us build increasingly complex workflows by composing simpler blocks. These building blocks can be shared and reused by storing your macros in a `.py` file, or even by releasing them as a python package. These workflows are formally defined, so unlike a plain python script it's easy to give them non-code representations; e.g. we can `.draw` our workflows or nodes at a high level:

![](_static/readme_diagram_shallow.png)

Or dive in and resolve macro nodes to a specified depth:

![](_static/readme_diagram_deep.png)
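
These diagrams can be produced from the graph itself; as noted above, `draw` gives back a `graphviz.graphs.Digraph` that can be saved with its `render` method. A rough sketch (the keyword controlling how deeply macros are resolved is an assumption):

```python
# Hedged sketch: render the workflow's graph to png files.
diagram = wf.draw()                        # graphviz.graphs.Digraph
diagram.render("readme_diagram_shallow", format="png")

deep = wf.draw(depth=2)                    # nesting-depth keyword assumed
deep.render("readme_diagram_deep", format="png")
```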

## Learning more
To explore other benefits of `pyiron_workflow`, look at the `quickstart.ipynb` in the demo [notebooks](../notebooks). There we explore
- Making nodes (optionally) strongly-typed
- Saving and loading (perhaps partially) executed workflows
- Parallelizing workflow computation by assigning executors to specific nodes
- Iterating over data with for-loops

Check out the demo [notebooks](../notebooks), read through the docstrings, and don't be scared to raise an issue on this GitHub repo!
For more advanced topics, like cyclic graphs, check the `deepdive.ipynb` notebook, explore the docstrings, or look at the
Binary file modified docs/_static/readme_diagram_deep.png
Binary file modified docs/_static/readme_diagram_shallow.png
4 changes: 0 additions & 4 deletions docs/environment.yml
@@ -13,11 +13,7 @@ dependencies:
- cloudpickle =3.0.0
- executorlib =0.0.1
- graphviz =9.0.0
- h5io =0.2.4
- h5io_browser =0.0.16
- pandas =2.2.2
- pyiron_base =0.9.12
- pyiron_contrib =0.1.18
- pyiron_snippets =0.1.4
- python-graphviz =0.20.3
- toposort =1.10
