docs: apply links to functions and new sphinx extension to guide #276

Open
wants to merge 2 commits into
base: epic/195-rewrite-user-guide
2 changes: 1 addition & 1 deletion docs/developer_guide/internal-workflow.md
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ Epic issues are large tasks divided into smaller sub-tasks, marked with the `epi

Example:

```
```markdown
- [ ] Sub-task 1
- [ ] Sub-task 2
- [ ] Sub-task 3
Expand Down
1 change: 0 additions & 1 deletion docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,6 @@ developer_guide/index

The MedModels documentation is your go-to resource for exploring the package. It offers complete API descriptions and a detailed user guide, giving you everything you need to effectively utilize its features.


```{only} html
[![black](https://img.shields.io/badge/code_style-black-black.svg)](https://black.readthedocs.io/en/stable/)
![python versions](https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue)
Expand Down
224 changes: 146 additions & 78 deletions docs/user_guide/02_medrecord.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,11 @@
02b_query_engine
```


## Preface

Every major library has a central object that constitutes its core. For [PyTorch](https://pytorch.org/), it is the `torch.Tensor`, whereas for [NumPy](https://numpy.org/), it is the `np.ndarray`. In our case, MedModels centres around the `mm.MedRecord` as its foundational structure.

MedModels delivers advanced data analytics methods out-of-the-box by utilizing a structured approach to data storage. This is enabled by the MedRecord class, which organizes data of any complexity within a graph structure. With its Rust backend implementation, MedRecord guarantees high performance, even when working with extremely large datasets.
Every major library has a central object that constitutes its core. For [PyTorch](https://pytorch.org/), it is the `torch.Tensor`, whereas for [NumPy](https://numpy.org/), it is the `np.ndarray`. In our case, MedModels centres around the [`MedRecord`](medmodels.medrecord.medrecord.MedRecord){target="_blank"} as its foundational structure.

MedModels delivers advanced data analytics methods out-of-the-box by utilizing a structured approach to data storage. This is enabled by the [`MedRecord`](medmodels.medrecord.medrecord.MedRecord){target="_blank"} class, which organizes data of any complexity within a graph structure. With its Rust backend implementation, MedRecord guarantees high performance, even when working with extremely large datasets.

```{literalinclude} scripts/02_medrecord_intro.py
---
Expand All @@ -32,34 +30,34 @@ Let's begin by introducing some sample medical data:
:widths: 15 15 15 15
:header-rows: 1

* - ID
- Age
- Sex
- Loc
* - Patient 01
- 72
- M
- USA
* - Patient 02
- 74
- M
- USA
* - Patient 03
- 64
- F
- GER
- - ID
- Age
- Sex
- Loc
- - Patient 01
- 72
- M
- USA
- - Patient 02
- 74
- M
- USA
- - Patient 03
- 64
- F
- GER
:::

This data, stored for example in a Pandas DataFrame, looks like this:

```{literalinclude} scripts/02_medrecord_intro.py
---
language: python
lines: 5-13
lines: 6-13
---
```

In the example below, we create a new MedRecord using the builder pattern. We instantiate a `MedRecordBuilder` and instruct it to add the Pandas DataFrame as nodes, using the 'ID' column for indexing. Additionally, we assign these nodes to the group 'Patients'.
In the example below, we create a new MedRecord using the builder pattern. We instantiate a [`MedRecordBuilder`](medmodels.medrecord.builder.MedRecordBuilder){target="_blank"} and instruct it to add the Pandas DataFrame as nodes, using the _'ID'_ column for indexing. Additionally, we assign these nodes to the group 'Patients'.
The Builder Pattern simplifies creating complex objects by constructing them step by step. It improves flexibility, readability, and consistency, making it easier to manage and configure objects in a controlled way.

```{literalinclude} scripts/02_medrecord_intro.py
Expand All @@ -69,19 +67,29 @@ lines: 30
---
```

:::{dropdown} Methods used in the snippet

- [`builder()`](medmodels.medrecord.medrecord.MedRecord.builder){target="_blank"} : Creates a new [`MedRecordBuilder`](medmodels.medrecord.builder.MedRecordBuilder){target="_blank"} instance to build a [`MedRecord`](medmodels.medrecord.medrecord.MedRecord){target="_blank"}.
- [`add_nodes()`](medmodels.medrecord.builder.MedRecordBuilder.add_nodes){target="_blank"} : Adds nodes to the MedRecord from different data formats and optionally assigns them to a group.
- [`build()`](medmodels.medrecord.builder.MedRecordBuilder.build){target="_blank"} : Constructs a MedRecord instance from the builder’s configuration.
:::
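
To make the builder pattern itself more concrete, here is a minimal sketch in plain Python. It is not the MedModels implementation: `ToyRecord` and `ToyRecordBuilder` are hypothetical names, and rows are plain dictionaries instead of a DataFrame. It only illustrates how chained calls collect configuration that `build()` later turns into an object.

```python
# Toy illustration of the builder pattern; NOT the MedModels implementation.
# ToyRecord and ToyRecordBuilder are hypothetical names used for this sketch.
from dataclasses import dataclass, field


@dataclass
class ToyRecord:
    nodes: dict = field(default_factory=dict)   # node index -> attributes
    groups: dict = field(default_factory=dict)  # group name -> list of node indices


class ToyRecordBuilder:
    def __init__(self):
        # (rows, index_column, group) tuples that build() applies in order
        self._pending = []

    def add_nodes(self, rows, index_column, group=None):
        self._pending.append((rows, index_column, group))
        return self  # returning self is what enables method chaining

    def build(self):
        record = ToyRecord()
        for rows, index_column, group in self._pending:
            for row in rows:
                index = row[index_column]
                # Every column except the index column becomes a node attribute.
                record.nodes[index] = {
                    key: value for key, value in row.items() if key != index_column
                }
                if group is not None:
                    record.groups.setdefault(group, []).append(index)
        return record


patients = [
    {"ID": "Patient 01", "Age": 72, "Sex": "M", "Loc": "USA"},
    {"ID": "Patient 02", "Age": 74, "Sex": "M", "Loc": "USA"},
    {"ID": "Patient 03", "Age": 64, "Sex": "F", "Loc": "GER"},
]

toy_record = ToyRecordBuilder().add_nodes(patients, "ID", group="Patients").build()
print(toy_record.groups["Patients"])
```

Nothing is constructed until `build()` is called, so further `add_nodes` calls can be chained in any number before the final object exists.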

The MedModels MedRecord object, `record`, now contains three patients. Each patient is identified by a unique index and has specific attributes, such as age, sex, and location. These patients serve as the initial nodes in the graph structure of our MedRecord, and are represented as follows:

```{image} https://raw.githubusercontent.com/limebit/medmodels-static/main/imgs/user_guide/02/02_medrecord_intro_01.png
:class: transparent-image
```

We can now proceed by adding additional data, such as the following medications.

```{literalinclude} scripts/02_medrecord_intro.py
---
language: python
lines: 15-18
lines: 16-18
---
```
Using the builder pattern to construct the MedRecord allows us to pass as many nodes and edges as needed. If nodes are not added during the initial graph construction, they can easily be added later to an existing MedRecord by calling `add_nodes`, where you provide the DataFrame and specify the column containing the node indices.

Using the builder pattern to construct the MedRecord allows us to pass as many nodes and edges as needed. If nodes are not added during the initial graph construction, they can easily be added later to an existing MedRecord by calling [`add_nodes()`](medmodels.medrecord.medrecord.MedRecord.add_nodes){target="_blank"}, where you provide the DataFrame and specify the column containing the node indices.

```{literalinclude} scripts/02_medrecord_intro.py
---
Expand All @@ -90,11 +98,17 @@ lines: 32
---
```

:::{dropdown} Methods used in the snippet

- [`add_nodes()`](medmodels.medrecord.medrecord.MedRecord.add_nodes){target="_blank"} : Adds nodes to the MedRecord from different data formats and optionally assigns them to a group.
:::

This will expand the MedRecord, adding several new nodes to the graph. However, these nodes are not yet connected, so let's establish relationships between them!

```{image} https://raw.githubusercontent.com/limebit/medmodels-static/main/imgs/user_guide/02/02_medrecord_intro_02.png
:class: transparent-image
```

## Adding Edges to a MedRecord

To capture meaningful relationships between nodes, such as linking patients to prescribed medications, we add edges to the MedRecord. These edges must be specified in a relation table, such as the one shown below:
Expand All @@ -103,18 +117,18 @@ To capture meaningful relationships between nodes, such as linking patients to p
:widths: 15 15 15
:header-rows: 1

* - Pat_ID
- Med_ID
- time
* - Patient 02
- Med 01
- 2020/06/07
* - Patient 02
- Med 02
- 2018/02/02
* - Patient 03
- Med 02
- 2019/03/02
- - Pat_ID
- Med_ID
- time
- - Patient 02
- Med 01
- 2020/06/07
- - Patient 02
- Med 02
- 2018/02/02
- - Patient 03
- Med 02
- 2019/03/02
:::

We can then add these edges to our MedRecord graph:
Expand All @@ -125,21 +139,36 @@ language: python
lines: 34
---
```

:::{dropdown} Methods used in the snippet

- [`add_edges()`](medmodels.medrecord.medrecord.MedRecord.add_edges){target="_blank"} : Adds edges to the MedRecord from different data formats and optionally assigns them to a group.

:::
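
As a rough mental model of what a relation table contributes, the sketch below turns the three rows above into attributed edges using only the standard library. The dictionary layout and the `medications_of` helper are assumptions made for illustration and do not reflect how MedModels stores edges internally.

```python
# Toy sketch: a relation table becomes a list of attributed edges.
# Plain Python for illustration only; this is not the MedModels API.
from datetime import datetime

relation_table = [
    {"Pat_ID": "Patient 02", "Med_ID": "Med 01", "time": "2020/06/07"},
    {"Pat_ID": "Patient 02", "Med_ID": "Med 02", "time": "2018/02/02"},
    {"Pat_ID": "Patient 03", "Med_ID": "Med 02", "time": "2019/03/02"},
]

# One edge per row: source and target come from the two ID columns,
# the remaining column is kept as an edge attribute.
edges = [
    {
        "source": row["Pat_ID"],
        "target": row["Med_ID"],
        "attributes": {"time": datetime.strptime(row["time"], "%Y/%m/%d")},
    }
    for row in relation_table
]


def medications_of(patient):
    """Return the targets of all edges that start at the given patient (hypothetical helper)."""
    return [edge["target"] for edge in edges if edge["source"] == patient]


print(medications_of("Patient 02"))
```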

This results in an enlarged Graph with more information.

```{image} https://raw.githubusercontent.com/limebit/medmodels-static/main/imgs/user_guide/02/02_medrecord_intro_03b.png
:class: transparent-image
```

## Adding Groups to a MedRecord

For certain analyses, we may want to define specific subcohorts within our MedRecord for easier access. We can do this by defining named groups withing our MedRecored.
For certain analyses, we may want to define specific subcohorts within our MedRecord for easier access. We can do this by defining named groups within our MedRecord.

```{literalinclude} scripts/02_medrecord_intro.py
---
language: python
lines: 36
---
```

:::{dropdown} Methods used in the snippet

- [`add_group()`](medmodels.medrecord.medrecord.MedRecord.add_group){target="_blank"} : Adds a group to the MedRecord instance with an optional list of node indices.

:::
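
Conceptually, a group can be thought of as a named set of node indices. The following sketch assumes exactly that (it is a simplification, not the MedModels internals) and shows how one node can belong to several groups at once; `groups_of` is a hypothetical helper.

```python
# Toy sketch: groups as named sets of node indices; not MedModels internals.
groups = {
    "Patients": {"Patient 01", "Patient 02", "Patient 03"},
    "Medications": {"Med 01", "Med 02"},
}

# Defining a subcohort is just registering another named set.
groups["US-Patients"] = {"Patient 01", "Patient 02"}


def groups_of(node):
    """Return the sorted names of all groups that contain the node (hypothetical helper)."""
    return sorted(name for name, members in groups.items() if node in members)


print(groups_of("Patient 01"))
print(groups_of("Patient 03"))
```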

This group will include all the defined nodes, allowing for easier access during complex analyses. Both nodes and edges can be added to a group, with no limitations on group size. Additionally, nodes and edges can belong to multiple groups without restriction.

```{image} https://raw.githubusercontent.com/limebit/medmodels-static/main/imgs/user_guide/02/02_medrecord_intro_04.png
Expand All @@ -153,81 +182,120 @@ When building a MedRecord, you may want to save it to create a persistent versio
```{literalinclude} scripts/02_medrecord_intro.py
---
language: python
lines: 58-59
lines: 68-69
---
```

## Printing Overview Tables
:::{dropdown} Methods used in the snippet

The MedModels MedRecord class is designed to efficiently handle large datasets while maintaining a standardized data structure that supports complex analysis methods. As a result, the structure within the MedRecord can become intricate and difficult to manage. To address this, MedModels offers tools to help keep track of the graph-based data. One such tool is the `print_node_overview` method:
- [`to_ron()`](medmodels.medrecord.medrecord.MedRecord.to_ron){target="_blank"} : Writes the MedRecord instance to a RON file.
- [`from_ron()`](medmodels.medrecord.medrecord.MedRecord.from_ron){target="_blank"} : Creates a MedRecord instance from a RON file.
:::
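
The save-and-restore round trip can be sketched without MedModels as well. The example below deliberately swaps in JSON from the Python standard library for RON, so it demonstrates only the general pattern (write to a file, read it back, compare), not the RON format or the `to_ron()`/`from_ron()` signatures.

```python
# Save/load round trip using JSON as a stand-in for RON.
# MedModels itself uses to_ron()/from_ron(); this only shows the pattern.
import json
import os
import tempfile

toy_record = {
    "nodes": {"Patient 01": {"Age": 72, "Sex": "M", "Loc": "USA"}},
    "groups": {"Patients": ["Patient 01"]},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_record.json")

    # Persist the structure to disk ...
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(toy_record, handle)

    # ... and restore it from the same file.
    with open(path, encoding="utf-8") as handle:
        restored = json.load(handle)

print(restored == toy_record)
```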

```{literalinclude} scripts/02_medrecord_intro.py
## Overview Tables

The MedRecord class is designed to efficiently handle large datasets while maintaining a standardized data structure that supports complex analysis methods. As a result, the structure within the MedRecord can become intricate and difficult to manage. To address this, MedModels offers tools to help keep track of the graph-based data. One such tool is the [`overview_nodes()`](medmodels.medrecord.medrecord.MedRecord.overview_nodes){target="_blank"} method, which prints an overview of all nodes in the MedRecord.

```{exec-literalinclude} scripts/02_medrecord_intro.py
---
language: python
lines: 38
setup-lines: 1-45
lines: 47
---
```

It will print an overview over all grouped nodes in the MedRecord.
:::{dropdown} Methods used in the snippet

```
-------------------------------------------------------
Nodes Group Count Attribute Info
-------------------------------------------------------
Medications 2 Name Values: Insulin, Warfarin
Patients 3 Age min: 64
max: 74
mean: 70.00
Loc Values: GER, USA
Sex Values: F, M
US-Patients 2 Age min: 72
max: 74
mean: 73.00
Loc Values: USA
Sex Values: M
-------------------------------------------------------
```
As shown, we have two groups of nodes - Patients and Medications - created when adding the nodes. Additionally, there’s a group called 'US-Patients' that we created. For each group of nodes, we can view their attributes along with a brief statistical summary, such as the minimum, maximum, and mean for numeric variables.
- [`overview_nodes()`](medmodels.medrecord.medrecord.MedRecord.overview_nodes){target="_blank"} : Gets a summary for all nodes in groups and their attributes.
:::

We can do the same to get an overview over edges in our MedRecord by using the `print_edge_overview` method:
As shown, we have two groups of nodes - Patients and Medications - created when adding the nodes. Additionally, there’s a group called _'US-Patients'_ that we created. For each group of nodes, we can view their attributes along with a brief statistical summary, such as the minimum, maximum, and mean for numeric variables.
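
To illustrate what such a per-group summary computes, here is a small stand-alone sketch over the sample patients. The `summarize` function is a hypothetical helper, and the rule "numeric attributes get min/max/mean, everything else gets its distinct values" is an assumption modelled on the description above rather than the actual `overview_nodes()` logic.

```python
# Toy per-group attribute summary; not the overview_nodes() implementation.
from statistics import mean

nodes = {
    "Patient 01": {"Age": 72, "Sex": "M", "Loc": "USA"},
    "Patient 02": {"Age": 74, "Sex": "M", "Loc": "USA"},
    "Patient 03": {"Age": 64, "Sex": "F", "Loc": "GER"},
}
groups = {
    "Patients": ["Patient 01", "Patient 02", "Patient 03"],
    "US-Patients": ["Patient 01", "Patient 02"],
}


def summarize(group, attribute):
    """Summarize one attribute over all nodes of a group (hypothetical helper)."""
    values = [nodes[index][attribute] for index in groups[group]]
    if all(isinstance(value, (int, float)) for value in values):
        # Numeric attributes: brief statistical summary.
        return {"min": min(values), "max": max(values), "mean": mean(values)}
    # Non-numeric attributes: list the distinct values.
    return {"values": sorted(set(values))}


for group in groups:
    print(group, summarize(group, "Age"), summarize(group, "Loc"))
```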

```{literalinclude} scripts/02_medrecord_intro.py
We can do the same to get an overview of the edges in our MedRecord by using the [`overview_edges()`](medmodels.medrecord.medrecord.MedRecord.overview_edges){target="_blank"} method:

```{exec-literalinclude} scripts/02_medrecord_intro.py
---
language: python
lines: 40
setup-lines: 1-45
lines: 49
---
```

However, edges need to belong to a group for their attributes to be shown in the overview.

```{exec-literalinclude} scripts/02_medrecord_intro.py
---
language: python
setup-lines: 1-45
lines: 52, 54
---
```
---------------------------------------------------------------------
Edges Groups Count Attribute Info
---------------------------------------------------------------------
Patients -> Medications 3 Date min: 2018-02-02 00:00:00
max: 2020-06-07 00:00:00
US-Patients -> Medications 2 Date min: 2018-02-02 00:00:00
max: 2020-06-07 00:00:00
---------------------------------------------------------------------
```

:::{dropdown} Methods used in the snippet

- [`overview_edges()`](medmodels.medrecord.medrecord.MedRecord.overview_edges){target="_blank"} : Gets a summary for all edges in groups and their attributes.
:::

## Accessing Elements in a MedRecord

Now that we have stored some structured data in our MedRecord, we might want to access certain elements of it. The main way to do this is by either selecting the data with their indices or via groups that they are in.

```{literalinclude} scripts/02_medrecord_intro.py
We can, for example, get all available nodes:

```{exec-literalinclude} scripts/02_medrecord_intro.py
---
language: python
lines: 42-56
setup-lines: 1-52
lines: 57
---
```

The MedRecord can be queried in very advanced ways in order to find very specific nodes based on time, relations, neighbors or other. These advanced querying methods are covered in one of the next sections of the user guide.
Or access the attributes of a specific node:

```{exec-literalinclude} scripts/02_medrecord_intro.py
---
language: python
setup-lines: 1-52
lines: 60
---
```

Or get all available groups:

```{exec-literalinclude} scripts/02_medrecord_intro.py
---
language: python
setup-lines: 1-52
lines: 63
---
```

Or get all nodes that belong to a certain group:

```{exec-literalinclude} scripts/02_medrecord_intro.py
---
language: python
setup-lines: 1-52
lines: 66
---
```

:::{dropdown} Methods used in the snippet

- [`nodes()`](medmodels.medrecord.medrecord.MedRecord.nodes){target="_blank"} : Lists the node indices in the MedRecord instance.
- [`node[]`](medmodels.medrecord.medrecord.MedRecord.node){target="_blank"} : Provides access to node attributes within the MedRecord instance via an indexer.
- [`groups()`](medmodels.medrecord.medrecord.MedRecord.groups){target="_blank"} : Lists the groups in the MedRecord instance.
- [`nodes_in_group()`](medmodels.medrecord.medrecord.MedRecord.nodes_in_group){target="_blank"} : Retrieves the node indices associated with the specified group(s) in the MedRecord.
:::
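
The two access paths described in this section, by index and by group, can be mimicked with plain dictionaries. The sketch below is a simplified stand-in, not the MedRecord API; it shows the four lookups and how they combine into a simple filter.

```python
# Toy sketch of index-based and group-based access; not the MedRecord API.
nodes = {
    "Patient 01": {"Age": 72, "Sex": "M", "Loc": "USA"},
    "Patient 02": {"Age": 74, "Sex": "M", "Loc": "USA"},
    "Patient 03": {"Age": 64, "Sex": "F", "Loc": "GER"},
}
groups = {
    "Patients": ["Patient 01", "Patient 02", "Patient 03"],
    "US-Patients": ["Patient 01", "Patient 02"],
}

all_node_indices = list(nodes)           # every node index
patient_01 = nodes["Patient 01"]         # attributes of one node, selected by index
all_group_names = list(groups)           # every group name
us_patient_indices = groups["US-Patients"]  # node indices of one group

# Both paths combined: members of a group filtered by an attribute value.
patients_over_70 = [
    index for index in groups["Patients"] if nodes[index]["Age"] > 70
]

print(all_node_indices)
print(patient_01)
print(all_group_names)
print(us_patient_indices)
print(patients_over_70)
```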

The MedRecord also supports advanced queries that find specific nodes based on time, relations, neighbors, or other criteria. These advanced querying methods are covered in one of the next sections of the user guide.

## Full Example Code

The full code examples for this chapter can be found here:
The full code examples for this chapter can be found here:

```{literalinclude} scripts/02_medrecord_intro.py
---
language: python
---
```
```
2 changes: 1 addition & 1 deletion docs/user_guide/02a_schema.md
Original file line number Diff line number Diff line change
@@ -1 +1 @@
# MedRecord Schema
# MedRecord Schema