diff --git a/docs/developer_guide/internal-workflow.md b/docs/developer_guide/internal-workflow.md index 9698ab0..362ce5e 100644 --- a/docs/developer_guide/internal-workflow.md +++ b/docs/developer_guide/internal-workflow.md @@ -90,7 +90,7 @@ Epic issues are large tasks divided into smaller sub-tasks, marked with the `epi Example: -``` +```markdown - [ ] Sub-task 1 - [ ] Sub-task 2 - [ ] Sub-task 3 diff --git a/docs/index.md b/docs/index.md index 1dae54f..3139600 100644 --- a/docs/index.md +++ b/docs/index.md @@ -44,7 +44,6 @@ developer_guide/index The MedModels documentation is your go-to resource for exploring the package. It offers complete API descriptions and a detailed user guide, giving you everything you need to effectively utilize its features. - ```{only} html [![black](https://img.shields.io/badge/code_style-black-black.svg)](https://black.readthedocs.io/en/stable/) ![python versions](https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue) diff --git a/docs/user_guide/02_medrecord.md b/docs/user_guide/02_medrecord.md index a24f1d6..26abd3d 100644 --- a/docs/user_guide/02_medrecord.md +++ b/docs/user_guide/02_medrecord.md @@ -9,13 +9,11 @@ 02b_query_engine ``` - ## Preface -Every major library has a central object that consitutes its core. For [PyTorch](https://pytorch.org/), it is the `torch.Tensor`, whereas for [Numpy](https://numpy.org/), it is the `np.array`. In our case, MedModels centres around the `mm.MedRecord` as its foundational structure. - -MedModels delivers advanced data analytics methods out-of-the-box by utilizing a structured approach to data storage. This is enabled by the MedRecord class, which organizes data of any complexity within a graph structure. With its Rust backend implementation, MedRecord guarantees high performance, even when working with extremely large datasets. +Every major library has a central object that constitutes its core. 
For [PyTorch](https://pytorch.org/), it is the `torch.Tensor`, whereas for [Numpy](https://numpy.org/), it is the `np.array`. In our case, MedModels centres around the [`MedRecord`](medmodels.medrecord.medrecord.MedRecord){target="_blank"} as its foundational structure. +MedModels delivers advanced data analytics methods out-of-the-box by utilizing a structured approach to data storage. This is enabled by the [`MedRecord`](medmodels.medrecord.medrecord.MedRecord){target="_blank"} class, which organizes data of any complexity within a graph structure. With its Rust backend implementation, MedRecord guarantees high performance, even when working with extremely large datasets. ```{literalinclude} scripts/02_medrecord_intro.py --- @@ -32,22 +30,22 @@ Let's begin by introducing some sample medical data: :widths: 15 15 15 15 :header-rows: 1 -* - ID - - Age - - Sex - - Loc -* - Patient 01 - - 72 - - M - - USA -* - Patient 02 - - 74 - - M - - USA -* - Patient 03 - - 64 - - F - - GER +- - ID + - Age + - Sex + - Loc +- - Patient 01 + - 72 + - M + - USA +- - Patient 02 + - 74 + - M + - USA +- - Patient 03 + - 64 + - F + - GER ::: This data, stored for example in a Pandas DataFrame, looks like this: @@ -55,11 +53,11 @@ This data, stored for example in a Pandas DataFrame, looks like this: ```{literalinclude} scripts/02_medrecord_intro.py --- language: python -lines: 5-13 +lines: 6-13 --- ``` -In the example below, we create a new MedRecord using the builder pattern. We instantiate a `MedRecordBuilder` and instruct it to add the Pandas DataFrame as nodes, using the 'ID' column for indexing. Additionally, we assign these nodes to the group 'Patients'. +In the example below, we create a new MedRecord using the builder pattern. We instantiate a [`MedRecordBuilder`](medmodels.medrecord.builder.MedRecordBuilder){target="_blank"} and instruct it to add the Pandas DataFrame as nodes, using the _'ID'_ column for indexing. Additionally, we assign these nodes to the group 'Patients'. 
The Builder Pattern simplifies creating complex objects by constructing them step by step. It improves flexibility, readability, and consistency, making it easier to manage and configure objects in a controlled way. ```{literalinclude} scripts/02_medrecord_intro.py @@ -69,19 +67,29 @@ lines: 30 --- ``` +:::{dropdown} Methods used in the snippet + +- [`builder()`](medmodels.medrecord.medrecord.MedRecord.builder){target="_blank"} : Creates a new [`MedRecordBuilder`](medmodels.medrecord.builder.MedRecordBuilder){target="_blank"} instance to build a [`MedRecord`](medmodels.medrecord.medrecord.MedRecord){target="_blank"}. +- [`add_nodes()`](medmodels.medrecord.builder.MedRecordBuilder.add_nodes){target="_blank"} : Adds nodes to the MedRecord from different data formats and optionally assigns them to a group. +- [`build()`](medmodels.medrecord.builder.MedRecordBuilder.build){target="_blank"} : Constructs a MedRecord instance from the builder’s configuration. +::: + The MedModels MedRecord object, `record`, now contains three patients. Each patient is identified by a unique index and has specific attributes, such as age, sex, and location. These patients serve as the initial nodes in the graph structure of our MedRecord, and are represented as follows: ```{image} https://raw.githubusercontent.com/limebit/medmodels-static/main/imgs/user_guide/02/02_medrecord_intro_01.png :class: transparent-image ``` + We can now proceed by adding additional data, such as the following medications. + ```{literalinclude} scripts/02_medrecord_intro.py --- language: python -lines: 15-18 +lines: 16-18 --- ``` -Using the builder pattern to construct the MedRecord allows us to pass as many nodes and edges as needed. If nodes are not added during the initial graph construction, they can easily be added later to an existing MedRecord by calling `add_nodes`, where you provide the DataFrame and specify the column containing the node indices. 
+ +Using the builder pattern to construct the MedRecord allows us to pass as many nodes and edges as needed. If nodes are not added during the initial graph construction, they can easily be added later to an existing MedRecord by calling [`add_nodes()`](medmodels.medrecord.medrecord.MedRecord.add_nodes){target="_blank"}, where you provide the DataFrame and specify the column containing the node indices. ```{literalinclude} scripts/02_medrecord_intro.py --- @@ -90,11 +98,17 @@ lines: 32 --- ``` +:::{dropdown} Methods used in the snippet + +- [`add_nodes()`](medmodels.medrecord.medrecord.MedRecord.add_nodes){target="_blank"} : Adds nodes to the MedRecord from different data formats and optionally assigns them to a group. +::: + This will expand the MedRecord, adding several new nodes to the graph. However, these nodes are not yet connected, so let's establish relationships between them! ```{image} https://raw.githubusercontent.com/limebit/medmodels-static/main/imgs/user_guide/02/02_medrecord_intro_02.png :class: transparent-image ``` + ## Adding Edges to a MedRecord To capture meaningful relationships between nodes, such as linking patients to prescribed medications, we add edges to the MedRecord. 
These edges must be specified in a relation table, such as the one shown below: @@ -103,18 +117,18 @@ To capture meaningful relationships between nodes, such as linking patients to p :widths: 15 15 15 :header-rows: 1 -* - Pat_ID - - Med_ID - - time -* - Patient 02 - - Med 01 - - 2020/06/07 -* - Patient 02 - - Med 02 - - 2018/02/02 -* - Patient 03 - - Med 02 - - 2019/03/02 +- - Pat_ID + - Med_ID + - time +- - Patient 02 + - Med 01 + - 2020/06/07 +- - Patient 02 + - Med 02 + - 2018/02/02 +- - Patient 03 + - Med 02 + - 2019/03/02 ::: We can add these edges then to our MedRecord Graph: @@ -125,14 +139,22 @@ language: python lines: 34 --- ``` + +:::{dropdown} Methods used in the snippet + +- [`add_edges()`](medmodels.medrecord.medrecord.MedRecord.add_edges){target="_blank"} : Adds edges to the MedRecord from different data formats and optionally assigns them to a group. + +::: + This results in an enlarged Graph with more information. ```{image} https://raw.githubusercontent.com/limebit/medmodels-static/main/imgs/user_guide/02/02_medrecord_intro_03b.png :class: transparent-image ``` + ## Adding Groups to a MedRecord -For certain analyses, we may want to define specific subcohorts within our MedRecord for easier access. We can do this by defining named groups withing our MedRecored. +For certain analyses, we may want to define specific subcohorts within our MedRecord for easier access. We can do this by defining named groups within our MedRecord. ```{literalinclude} scripts/02_medrecord_intro.py --- @@ -140,6 +162,13 @@ language: python lines: 36 --- ``` + +:::{dropdown} Methods used in the snippet + +- [`add_group()`](medmodels.medrecord.medrecord.MedRecord.add_group){target="_blank"} : Adds a group to the MedRecord instance with an optional list of node indices. + +::: + This group will include all the defined nodes, allowing for easier access during complex analyses. Both nodes and edges can be added to a group, with no limitations on group size. 
Additionally, nodes and edges can belong to multiple groups without restriction. ```{image} https://raw.githubusercontent.com/limebit/medmodels-static/main/imgs/user_guide/02/02_medrecord_intro_04.png @@ -153,81 +182,120 @@ When building a MedRecord, you may want to save it to create a persistent versio ```{literalinclude} scripts/02_medrecord_intro.py --- language: python -lines: 58-59 +lines: 68-69 --- ``` -## Printing Overview Tables +:::{dropdown} Methods used in the snippet -The MedModels MedRecord class is designed to efficiently handle large datasets while maintaining a standardized data structure that supports complex analysis methods. As a result, the structure within the MedRecord can become intricate and difficult to manage. To address this, MedModels offers tools to help keep track of the graph-based data. One such tool is the `print_node_overview` method: +- [`to_ron()`](medmodels.medrecord.medrecord.MedRecord.to_ron){target="_blank"} : Writes the MedRecord instance to a RON file. +- [`from_ron()`](medmodels.medrecord.medrecord.MedRecord.from_ron){target="_blank"} : Creates a MedRecord instance from a RON file. +::: -```{literalinclude} scripts/02_medrecord_intro.py +## Overview Tables + +The MedRecord class is designed to efficiently handle large datasets while maintaining a standardized data structure that supports complex analysis methods. As a result, the structure within the MedRecord can become intricate and difficult to manage. To address this, MedModels offers tools to help keep track of the graph-based data. One such tool is the [`overview_nodes()`](medmodels.medrecord.medrecord.MedRecord.overview_nodes){target="_blank"} method, which prints an overview over all nodes in the MedRecord. + +```{exec-literalinclude} scripts/02_medrecord_intro.py --- language: python -lines: 38 +setup-lines: 1-45 +lines: 47 --- ``` -It will print an overview over all grouped nodes in the MedRecord. 
+:::{dropdown} Methods used in the snippet -``` -------------------------------------------------------- -Nodes Group Count Attribute Info -------------------------------------------------------- -Medications 2 Name Values: Insulin, Warfarin -Patients 3 Age min: 64 - max: 74 - mean: 70.00 - Loc Values: GER, USA - Sex Values: F, M -US-Patients 2 Age min: 72 - max: 74 - mean: 73.00 - Loc Values: USA - Sex Values: M -------------------------------------------------------- -``` -As shown, we have two groups of nodes - Patients and Medications - created when adding the nodes. Additionally, there’s a group called 'US-Patients' that we created. For each group of nodes, we can view their attributes along with a brief statistical summary, such as the minimum, maximum, and mean for numeric variables. +- [`overview_nodes()`](medmodels.medrecord.medrecord.MedRecord.overview_nodes){target="_blank"} : Gets a summary for all nodes in groups and their attributes. +::: -We can do the same to get an overview over edges in our MedRecord by using the `print_edge_overview` method: +As shown, we have two groups of nodes - Patients and Medications - created when adding the nodes. Additionally, there’s a group called _'US-Patients'_ that we created. For each group of nodes, we can view their attributes along with a brief statistical summary, such as the minimum, maximum, and mean for numeric variables. -```{literalinclude} scripts/02_medrecord_intro.py +We can do the same to get an overview over edges in our MedRecord by using the [`overview_edges()`](medmodels.medrecord.medrecord.MedRecord.overview_edges){target="_blank"} method: + +```{exec-literalinclude} scripts/02_medrecord_intro.py --- language: python -lines: 40 +setup-lines: 1-45 +lines: 49 --- ``` +However, edges need to belong to a group in order to show their attributes in the overview. 
+ +```{exec-literalinclude} scripts/02_medrecord_intro.py +--- +language: python +setup-lines: 1-45 +lines: 52, 54 +--- ``` ---------------------------------------------------------------------- -Edges Groups Count Attribute Info ---------------------------------------------------------------------- -Patients -> Medications 3 Date min: 2018-02-02 00:00:00 - max: 2020-06-07 00:00:00 -US-Patients -> Medications 2 Date min: 2018-02-02 00:00:00 - max: 2020-06-07 00:00:00 ---------------------------------------------------------------------- -``` + +:::{dropdown} Methods used in the snippet + +- [`overview_edges()`](medmodels.medrecord.medrecord.MedRecord.overview_edges){target="_blank"} : Gets a summary for all edges in groups and their attributes. +::: ## Accessing Elements in a MedRecord Now that we have stored some structured data in our MedRecord, we might want to access certain elements of it. The main way to do this is by either selecting the data with their indices or via groups that they are in. -```{literalinclude} scripts/02_medrecord_intro.py +We can for example, get all available nodes: + +```{exec-literalinclude} scripts/02_medrecord_intro.py --- language: python -lines: 42-56 +setup-lines: 1-52 +lines: 57 --- ``` -The MedRecord can be queried in very advanced ways in order to find very specific nodes based on time, relations, neighbors or other. These advanced querying methods are covered in one of the next sections of the user guide. 
+Or access the attributes of a specific node: + +```{exec-literalinclude} scripts/02_medrecord_intro.py +--- +language: python +setup-lines: 1-52 +lines: 60 +--- +``` +Or get all available groups: + +```{exec-literalinclude} scripts/02_medrecord_intro.py +--- +language: python +setup-lines: 1-52 +lines: 63 +--- +``` + +Or get all nodes belonging to a certain group: + +```{exec-literalinclude} scripts/02_medrecord_intro.py +--- +language: python +setup-lines: 1-52 +lines: 66 +--- +``` + +:::{dropdown} Methods used in the snippet + +- [`nodes()`](medmodels.medrecord.medrecord.MedRecord.nodes){target="_blank"} : Lists the node indices in the MedRecord instance. +- [`node[]`](medmodels.medrecord.medrecord.MedRecord.node){target="_blank"} : Provides access to node attributes within the MedRecord instance via an indexer. +- [`groups()`](medmodels.medrecord.medrecord.MedRecord.groups){target="_blank"} : Lists the groups in the MedRecord instance. +- [`nodes_in_group()`](medmodels.medrecord.medrecord.MedRecord.nodes_in_group){target="_blank"} : Retrieves the node indices associated with the specified group/s in the MedRecord. +::: + +The MedRecord can be queried in very advanced ways in order to find very specific nodes based on time, relations, neighbors, or other criteria. These advanced querying methods are covered in one of the next sections of the user guide. 
## Full example Code -The full code examples for this chapter can be found here: +The full code examples for this chapter can be found here: + ```{literalinclude} scripts/02_medrecord_intro.py --- language: python --- -``` \ No newline at end of file +``` diff --git a/docs/user_guide/02a_schema.md b/docs/user_guide/02a_schema.md index 0bc9236..24cb2f0 100644 --- a/docs/user_guide/02a_schema.md +++ b/docs/user_guide/02a_schema.md @@ -1 +1 @@ -# MedRecord Schema \ No newline at end of file +# MedRecord Schema diff --git a/docs/user_guide/02b_query_engine.md b/docs/user_guide/02b_query_engine.md index d8916a5..06ad0bd 100644 --- a/docs/user_guide/02b_query_engine.md +++ b/docs/user_guide/02b_query_engine.md @@ -208,7 +208,7 @@ lines: 58-67 :::{dropdown} Methods used in the snippet -- [`neighbors()`](medmodels.medrecord.querying.NodeOperand.neighbors){target="_blank"} : Returns a [`NodeOperand()`](medmodels.medrecord.querying.NodeOperand){target="_blank"} to query the neighbors of those nodes. +- [`neighbors()`](medmodels.medrecord.querying.NodeOperand.neighbors){target="_blank"} : Returns a [`NodeOperand`](medmodels.medrecord.querying.NodeOperand){target="_blank"} to query the neighbors of those nodes. - [`attribute()`](medmodels.medrecord.querying.NodeOperand.attribute){target="_blank"} : Returns a [`MultipleValuesOperand()`](medmodels.medrecord.querying.MultipleValuesOperand){target="_blank"} to query on the values of the nodes for that attribute. - [`lowercase()`](medmodels.medrecord.querying.MultipleValuesOperand.lowercase){target="_blank"} : Converts the values that are strings to lowercase. - [`contains()`](medmodels.medrecord.querying.NodeIndexOperand.contains){target="_blank"} : Query node indices containing that argument. @@ -250,9 +250,9 @@ lines: 79-88 - [`in_group()`](medmodels.medrecord.querying.EdgeOperand.in_group){target="_blank"} : Query nodes that belong to that group. 
- [`attribute()`](medmodels.medrecord.querying.EdgeOperand.attribute){target="_blank"} : Returns a [`MultipleValuesOperand()`](medmodels.medrecord.querying.MultipleValuesOperand){target="_blank"} to query on the values of the nodes for that attribute. - [`less_than()`](medmodels.medrecord.querying.MultipleValuesOperand.less_than){target="_blank"} : Query values that are less than that value. -- [`source_node()`](medmodels.medrecord.querying.EdgeOperand.source_node){target="_blank"} : Returns a [`NodeOperand()`](medmodels.medrecord.querying.NodeOperand) to query on the source nodes for those edges. +- [`source_node()`](medmodels.medrecord.querying.EdgeOperand.source_node){target="_blank"} : Returns a [`NodeOperand`](medmodels.medrecord.querying.NodeOperand) to query on the source nodes for those edges. - [`is_max()`](medmodels.medrecord.querying.MultipleValuesOperand.is_max){target="_blank"} : Query on the values that hold on the maximum value among all of the. -- [`target_node()`](medmodels.medrecord.querying.EdgeOperand.target_node){target="_blank"} : Returns a [`NodeOperand()`](medmodels.medrecord.querying.NodeOperand){target="_blank"} to query on the target nodes for those edges. +- [`target_node()`](medmodels.medrecord.querying.EdgeOperand.target_node){target="_blank"} : Returns a [`NodeOperand`](medmodels.medrecord.querying.NodeOperand){target="_blank"} to query on the target nodes for those edges. - [`contains()`](medmodels.medrecord.querying.NodeIndexOperand.contains){target="_blank"} : Query node indices containing that argument. - [`select_edges()`](medmodels.medrecord.medrecord.MedRecord.select_edges){target="_blank"} : Select edges that match that query. @@ -395,6 +395,21 @@ lines: 162 --- ``` +:::{dropdown} Methods used in the snippet + +- [`in_group()`](medmodels.medrecord.querying.EdgeOperand.in_group){target="_blank"} : Query edges that belong to that group. 
+- [`index()`](medmodels.medrecord.querying.NodeOperand.index){target="_blank"} : Returns a [`NodeIndexOperand`](medmodels.medrecord.querying.NodeIndexOperand){target="_blank"} to query on the indices. +- [`contains()`](medmodels.medrecord.querying.NodeIndexOperand.contains){target="_blank"} : Query node indices containing that argument. +- [`attribute()`](medmodels.medrecord.querying.EdgeOperand.attribute){target="_blank"} : Returns a [`MultipleValuesOperand()`](medmodels.medrecord.querying.MultipleValuesOperand){target="_blank"} to query on the values of the edges for that attribute. +- [`mean()`](medmodels.medrecord.querying.MultipleValuesOperand.mean){target="_blank"} : Returns a [`SingleValueOperand`](medmodels.medrecord.querying.SingleValueOperand){target="_blank"} containing the mean of those values. +- [`clone()`](medmodels.medrecord.querying.SingleValueOperand.clone){target="_blank"} : Returns a clone of the operand. +- [`subtract()`](medmodels.medrecord.querying.SingleValueOperand.subtract){target="_blank"} : Subtract the argument from the single value operand. +- [`greater_than()`](medmodels.medrecord.querying.MultipleValuesOperand.greater_than){target="_blank"} : Query values that are greater than that value. +- [`less_than()`](medmodels.medrecord.querying.MultipleValuesOperand.less_than){target="_blank"} : Query values that are less than that value. +- [`select_nodes()`](medmodels.medrecord.medrecord.MedRecord.select_nodes){target="_blank"} : Select nodes that match that query. 
+ +::: + ## Full example Code The full code examples for this chapter can be found here: diff --git a/docs/user_guide/03_treatment_effect.md b/docs/user_guide/03_treatment_effect.md index a4931fd..d11c337 100644 --- a/docs/user_guide/03_treatment_effect.md +++ b/docs/user_guide/03_treatment_effect.md @@ -1 +1 @@ -# Treatment Effect Calculation \ No newline at end of file +# Treatment Effect Calculation diff --git a/docs/user_guide/04_medrecord_comparer.md b/docs/user_guide/04_medrecord_comparer.md index 1ca1128..231916c 100644 --- a/docs/user_guide/04_medrecord_comparer.md +++ b/docs/user_guide/04_medrecord_comparer.md @@ -1 +1 @@ -# Data Comparer \ No newline at end of file +# Data Comparer diff --git a/docs/user_guide/getstarted.md b/docs/user_guide/getstarted.md index dad1e61..725e71b 100644 --- a/docs/user_guide/getstarted.md +++ b/docs/user_guide/getstarted.md @@ -2,12 +2,12 @@ ## MedRecord -A _MedRecord_ is a data class that contains medical data in a network structure. It is based on _nodes_ and _edges_, which are connections between nodes. The MedRecord makes it easy to connect a dataset with different medical data tables or DataFrames into one structure with the necessary relationships. +A [`MedRecord`](medmodels.medrecord.medrecord.MedRecord){target="_blank"} is a data class that contains medical data in a network structure. It is based on _nodes_ and _edges_, which are connections between nodes. The MedRecord makes it easy to connect a dataset with different medical data tables or DataFrames into one structure with the necessary relationships. The MedModels framework is based on the MedRecord class and all MedModels methods take a MedRecord as input. ## Nodes -Nodes are the core components of a MedRecord. Each data entry, such as patient, diagnoses or procedure entries, is an indivual node in the MedRecord. Each node needs to have a unique identifier and can have different attributes associated to it. 
For example the patient data would have the _patient_id_ column as the unique identifier, and _gender_ and _age_ could be additional attributes for each patient. +Nodes are the core components of a [`MedRecord`](medmodels.medrecord.medrecord.MedRecord){target="_blank"}. Each data entry, such as patient, diagnoses or procedure entries, is an individual node in the MedRecord. Each node needs to have a unique identifier and can have different attributes associated with it. For example, the patient data would have the _patient_id_ column as the unique identifier, and _gender_ and _age_ could be additional attributes for each patient. ```python # nodes - patient information @@ -246,6 +246,12 @@ edges = [ medrecord = MedRecord.from_tuples(nodes, edges) ``` +:::{dropdown} Methods used in the snippet + +* [`from_tuples()`](medmodels.medrecord.medrecord.MedRecord.from_tuples){target="_blank"} : Creates a MedRecord instance from lists of node and edge tuples. + +::: + ### Pandas DataFrames If the MedRecord is created from a Pandas DataFrame, nodes and edges can be either a single DataFrame or a list of DataFrames. Edges are optional, but nodes need to be created to continue. @@ -274,6 +280,12 @@ medrecord = MedRecord.from_pandas( ) ``` +:::{dropdown} Methods used in the snippet + +* [`from_pandas()`](medmodels.medrecord.medrecord.MedRecord.from_pandas){target="_blank"} : Creates a MedRecord from Pandas DataFrames of nodes and optionally edges. + +::: + #### Adding Nodes Nodes and Edges can be added to an existing MedRecord later, either as single DataFrames or a list of DataFrames. @@ -284,6 +296,12 @@ drug.set_index("drug_code", inplace=True) medrecord.add_nodes(nodes=drug) ``` +:::{dropdown} Methods used in the snippet + +* [`add_nodes()`](medmodels.medrecord.medrecord.MedRecord.add_nodes){target="_blank"} : Adds nodes to the MedRecord from different data formats and optionally assigns them to a group. 
+ +::: + ### Polars Dataframes When adding a Polars DataFrame to a MedRecord, the index columns must be specified with the DataFrame because there are no index columns in a Polars DataFrame. @@ -301,9 +319,15 @@ patient_drug_edges = medrecord.add_edges_polars( ) ``` +:::{dropdown} Methods used in the snippet + +* [`add_edges_polars()`](medmodels.medrecord.medrecord.MedRecord.add_edges_polars){target="_blank"} : Adds edges to the MedRecord from Polars DataFrames and optionally assigns them to a group. + +::: + ### Removing entries -Nodes and edges can be easily removed by their identifier. To check if a node or edge exists, the `contains_node()` or `contains_edge()` functions can be used. If a node is deleted from the MedRecord, its corresponding edges will also be removed. +Nodes and edges can be easily removed by their identifier. To check if a node or edge exists, the [`contains_node()`](medmodels.medrecord.medrecord.MedRecord.contains_node){target="_blank"} or [`contains_edge()`](medmodels.medrecord.medrecord.MedRecord.contains_edge){target="_blank"} functions can be used. If a node is deleted from the MedRecord, its corresponding edges will also be removed. ```python # returns attributes for the node that will be removed @@ -313,6 +337,14 @@ medrecord.contains_node("pat_6") or medrecord.contains_edge(edge_pat6_pat2_id) False +:::{dropdown} Methods used in the snippet + +* [`remove_nodes()`](medmodels.medrecord.medrecord.MedRecord.remove_nodes){target="_blank"} : Removes a node or multiple nodes from the MedRecord and returns their attributes. +* [`contains_node()`](medmodels.medrecord.medrecord.MedRecord.contains_node){target="_blank"} : Checks whether a specific node exists in the MedRecord. +* [`contains_edge()`](medmodels.medrecord.medrecord.MedRecord.contains_edge){target="_blank"} : Checks whether a specific edge exists in the MedRecord. 
+ +::: + ### Size of a MedRecord The size of a MedRecord instance is determined by the number of nodes and their connecting edges. @@ -325,6 +357,13 @@ print( The medrecord has 73 nodes and 160 edges. +:::{dropdown} Methods used in the snippet + +* [`node_count()`](medmodels.medrecord.medrecord.MedRecord.node_count){target="_blank"} : Returns the total number of nodes currently managed by the MedRecord. +* [`edge_count()`](medmodels.medrecord.medrecord.MedRecord.edge_count){target="_blank"} : Returns the total number of edges currently managed by the MedRecord. + +::: + ## Nodes ### Getting node attributes @@ -344,6 +383,11 @@ print(f"Age of multiple patients: {medrecord.node[['pat_2', 'pat_3', 'pat_4'], ' Gender of first patient: M Age of multiple patients: {'pat_4': 19, 'pat_2': 22, 'pat_3': 96} +:::{dropdown} Methods used in the snippet + +* [`node[]`](medmodels.medrecord.medrecord.MedRecord.node){target="_blank"} : Provides access to node attributes within the MedRecord instance via an indexer. +::: + ### Setting and updating node attributes With the same indexing concept, attributes can also be updated or new attributes can be added. @@ -364,14 +408,19 @@ print(f"First patient attributes: {medrecord.node[first_patient]}") First patient attributes: {'gender': 'F', 'death': True, 'age': 42} First patient attributes: {'gender': 'F', 'age': 42} +:::{dropdown} Methods used in the snippet + +* [`node[]`](medmodels.medrecord.medrecord.MedRecord.node){target="_blank"} : Provides access to node attributes within the MedRecord instance via an indexer. +::: + ### Selecting nodes and grouping -Nodes can be selected using the MedRecords query engine. The `select_nodes()` function works with logical operators on node properties, attributes or the node index and returns a list of node indices. +Nodes can be selected using the MedRecords query engine. 
The [`select_nodes()`](medmodels.medrecord.medrecord.MedRecord.select_nodes){target="_blank"} function works with logical operators on node properties, attributes or the node index and returns a list of node indices. Nodes and edges can be organized in groups for easier access. Nodes can be added to a group by their indices. ```python -# select all indeces for node +# select all indices for node patient_ids = medrecord.select_nodes(node().index().starts_with("pat")) medrecord.add_group(group="Patient", node=patient_ids) @@ -380,6 +429,16 @@ print(f"Patients: {medrecord.select_nodes(node().in_group('Patient'))}") Patients: ['pat_5', 'pat_1', 'pat_4', 'pat_3', 'pat_2'] +:::{dropdown} Methods used in the snippet + +* [`select_nodes()`](medmodels.medrecord.medrecord.MedRecord.select_nodes){target="_blank"} : Select nodes that match that query. +* [`node()`](medmodels.medrecord.querying.NodeOperand){target="_blank"} : Returns a [`NodeOperand`](medmodels.medrecord.querying.NodeOperand){target="_blank"} to query on the nodes of the MedRecord. +* [`index()`](medmodels.medrecord.querying.NodeOperand.index){target="_blank"} : Returns a [`NodeIndexOperand`](medmodels.medrecord.querying.NodeIndexOperand){target="_blank"} to query on the node indices of the node operand. +* [`starts_with()`](medmodels.medrecord.querying.NodeIndexOperand.starts_with){target="_blank"} : Query the node indices that start with that argument. +* [`add_group()`](medmodels.medrecord.medrecord.MedRecord.add_group){target="_blank"} : Adds a group to the MedRecord, optionally with node and edge indices. +* [`in_group()`](medmodels.medrecord.querying.NodeOperand.in_group){target="_blank"} : Query the nodes that are in that given group. +::: + ### Creating sub populations Grouping can also be used to make sub populations that share the same properties. The nodes can be added to a group either by their indices or directly by giving a node operation to the node parameter. 
@@ -394,10 +453,20 @@ medrecord.add_group(group="Young", node=young_id) medrecord.add_group(group="Woman", node=node().attribute("gender").equal_to("F")) ``` -The nodes of a group or a list of groups can be easily accessed with `group()`. The return is either a list of node indices for a single group or a dictionary with each group name, +:::{dropdown} Methods used in the snippet + +* [`select_nodes()`](medmodels.medrecord.medrecord.MedRecord.select_nodes){target="_blank"} : Select nodes that match that query. +* [`node()`](medmodels.medrecord.querying.NodeOperand){target="_blank"} : Returns a [`NodeOperand`](medmodels.medrecord.querying.NodeOperand){target="_blank"} to query on the nodes of the MedRecord. +* [`attribute()`](medmodels.medrecord.querying.NodeOperand.attribute){target="_blank"} : Returns a [`MultipleValuesOperand`](medmodels.medrecord.querying.MultipleValuesOperand){target="_blank"} to query on the values of the nodes for that attribute. +* [`less_than()`](medmodels.medrecord.querying.MultipleValuesOperand.less_than){target="_blank"} : Query values that are less than that value. +* [`equal_to()`](medmodels.medrecord.querying.MultipleValuesOperand.equal_to){target="_blank"} : Query values equal to that value. +* [`add_group()`](medmodels.medrecord.medrecord.MedRecord.add_group){target="_blank"} : Adds a group to the MedRecord, optionally with node and edge indices. +::: + +The nodes of a group or a list of groups can be easily accessed with [`group()`](medmodels.medrecord.medrecord.MedRecord.group){target="_blank"}. The return is either a list of node indices for a single group or a dictionary with each group name, mapping to its list of node indices in case of multiple groups. -To get all groups in which a node or a list of nodes is categorized, the function `groups_of_nodes()` can be used. 
+To get all groups in which a node or a list of nodes is categorized, the function [`groups_of_node()`](medmodels.medrecord.medrecord.MedRecord.groups_of_node){target="_blank"} can be used. ```python print( @@ -414,13 +483,19 @@ medrecord.group(["Young", "Woman"]) {'Young': ['pat_4'], 'Woman': ['pat_3', 'pat_1', 'pat_2']} +:::{dropdown} Methods used in the snippet + +* [`group()`](medmodels.medrecord.medrecord.MedRecord.group){target="_blank"} : Returns the node and edge indices associated with the specified group/s in the MedRecord. +* [`groups_of_node()`](medmodels.medrecord.medrecord.MedRecord.groups_of_node){target="_blank"} : Retrieves the groups associated with the specified node(s) in the MedRecord. +::: + Nodes can also be added to an existing group later. ```python higher_age = 25 additional_young_id = medrecord.select_nodes( - node().attribute("age").greater_or_equal(young_age) - & node().attribute("age").less(higher_age) + node().attribute("age").greater_than_or_equal_to(young_age) + & node().attribute("age").less_than(higher_age) ) medrecord.add_nodes_to_group(group="Young", nodes=additional_young_id) @@ -431,6 +506,17 @@ print( Patients in Group 'Young' if threshold age is 25: ['pat_4', 'pat_2'] +:::{dropdown} Methods used in the snippet + +* [`select_nodes()`](medmodels.medrecord.medrecord.MedRecord.select_nodes){target="_blank"} : Select nodes that match that query. +* [`node()`](medmodels.medrecord.querying.NodeOperand){target="_blank"} : Returns a [`NodeOperand`](medmodels.medrecord.querying.NodeOperand){target="_blank"} to query on the nodes of the MedRecord. +* [`attribute()`](medmodels.medrecord.querying.NodeOperand.attribute){target="_blank"} : Returns a [`MultipleValuesOperand`](medmodels.medrecord.querying.MultipleValuesOperand){target="_blank"} to query on the values of the nodes for that attribute. 
+* [`greater_than_or_equal_to()`](medmodels.medrecord.querying.MultipleValuesOperand.greater_than_or_equal_to){target="_blank"} : Query values that are greater than or equal to a specific value. +* [`less_than()`](medmodels.medrecord.querying.MultipleValuesOperand.less_than){target="_blank"} : Query values that are less than that value. +* [`group()`](medmodels.medrecord.medrecord.MedRecord.group){target="_blank"} : Returns the node and edge indices associated with the specified group/s in the MedRecord. +* [`add_nodes_to_group()`](medmodels.medrecord.medrecord.MedRecord.add_nodes_to_group){target="_blank"} : Adds one or more nodes to the specified group in the MedRecord. +::: + It is possible to remove nodes from groups and to remove groups entirely from the MedRecord. ```python @@ -458,20 +544,35 @@ print( medrecord.add_group(group="Diagnosis", node=node().index().starts_with("diagnosis")) ``` +:::{dropdown} Methods used in the snippet + +* [`remove_nodes_from_group()`](medmodels.medrecord.medrecord.MedRecord.remove_nodes_from_group){target="_blank"} : Removes one or more nodes from the specified group in the MedRecord. +* [`select_nodes()`](medmodels.medrecord.medrecord.MedRecord.select_nodes){target="_blank"} : Select nodes that match that query. +* [`node()`](medmodels.medrecord.querying.NodeOperand){target="_blank"} : Returns a [`NodeOperand`](medmodels.medrecord.querying.NodeOperand){target="_blank"} to query on the nodes of the MedRecord. +* [`in_group()`](medmodels.medrecord.querying.NodeOperand.in_group){target="_blank"} : Query the nodes that are in that given group. +* [`group_count()`](medmodels.medrecord.medrecord.MedRecord.group_count){target="_blank"} : Returns the total number of groups currently defined within the MedRecord. +* [`remove_groups()`](medmodels.medrecord.medrecord.MedRecord.remove_groups){target="_blank"} : Removes one or more groups from the MedRecord instance. 
+* [`contains_group()`](medmodels.medrecord.medrecord.MedRecord.contains_group){target="_blank"} : Checks whether a specific group exists in the MedRecord. +::: + ## Edges ### Getting edge indices -Edges are assigned a unique index when they are added to the MedRecord. To retrieve the indices for a specific edge, the corresponding source and target node have to be specified in `edges_connecting()`. The same concept can also be used to get a list of all edge indices that are connecting a group of source nodes to a group of target nodes. +Edges are assigned a unique index when they are added to the MedRecord. To retrieve the indices for a specific edge, the corresponding source and target node have to be specified in [`edges_connecting()`](medmodels.medrecord.medrecord.MedRecord.edges_connecting){target="_blank"}. The same concept can also be used to get a list of all edge indices that are connecting a group of source nodes to a group of target nodes. ```python -# next PR patient_diagnosis_edges = medrecord.edges_connecting( source_node=medrecord.group("Patient"), target_node=medrecord.group("Diagnosis") ) ``` -All outgoing or incoming edges of a node or a list of nodes can be retrieved with the functions `outgoing_edges()` or `incoming_edges()` respectively. If the edges of a list of nodes is requested, the return will be a dictionary with the nodes as keys and their edges as values in lists. Otherwise for a single node, the return will be a simple list. +:::{dropdown} Methods used in the snippet + +* [`edges_connecting()`](medmodels.medrecord.medrecord.MedRecord.edges_connecting){target="_blank"} : Retrieves the edges connecting the specified source and target nodes in the MedRecord. 
+::: + +All outgoing or incoming edges of a node or a list of nodes can be retrieved with the functions [`outgoing_edges()`](medmodels.medrecord.medrecord.MedRecord.outgoing_edges){target="_blank"} or [`incoming_edges()`](medmodels.medrecord.medrecord.MedRecord.incoming_edges){target="_blank"} respectively. If the edges of a list of nodes are requested, the result is a dictionary with the nodes as keys and lists of their edges as values. For a single node, the result is a plain list. The outgoing edges of a node are only the ones where the node is defined as the source node, while incoming edges of a node are the edges, where the specific node is defined as a target node. @@ -484,6 +585,7 @@ print( The first patient has 24 outgoing edges and 0 incoming edges. + ```python # diagnosis edges diabetes_diagnosis = medrecord.select_nodes( @@ -497,7 +599,17 @@ print( The diabetes diagnosis has 0 outgoing edges and the following incoming edges: [8, 25]. -From the edge indices, the source and target nodes can be retrieved with `edge_endpoints()`. +:::{dropdown} Methods used in the snippet + +* [`outgoing_edges()`](medmodels.medrecord.medrecord.MedRecord.outgoing_edges){target="_blank"} : Lists the outgoing edges of the specified node(s) in the MedRecord. +* [`incoming_edges()`](medmodels.medrecord.medrecord.MedRecord.incoming_edges){target="_blank"} : Lists the incoming edges of the specified node(s) in the MedRecord. +* [`select_nodes()`](medmodels.medrecord.medrecord.MedRecord.select_nodes){target="_blank"} : Select nodes that match that query. +* [`node()`](medmodels.medrecord.querying.NodeOperand){target="_blank"} : Returns a [`NodeOperand`](medmodels.medrecord.querying.NodeOperand){target="_blank"} to query on the nodes of the MedRecord. 
+* [`attribute()`](medmodels.medrecord.querying.NodeOperand.attribute){target="_blank"} : Returns a [`MultipleValuesOperand`](medmodels.medrecord.querying.MultipleValuesOperand){target="_blank"} to query on the values of the nodes for that attribute. +* [`contains()`](medmodels.medrecord.querying.NodeIndexOperand.contains){target="_blank"} : Query node indices containing that argument. +::: + +From the edge indices, the source and target nodes can be retrieved with [`edge_endpoints()`](medmodels.medrecord.medrecord.MedRecord.edge_endpoints){target="_blank"}. ```python medrecord.edge_endpoints(diabetes_incoming_edges) @@ -505,6 +617,11 @@ medrecord.edge_endpoints(diabetes_incoming_edges) {25: ('pat_3', 'diagnosis_15777000'), 8: ('pat_1', 'diagnosis_15777000')} +:::{dropdown} Methods used in the snippet + +* [`edge_endpoints()`](medmodels.medrecord.medrecord.MedRecord.edge_endpoints){target="_blank"} : Retrieves the source and target nodes of the specified edge(s) in the MedRecord. +::: + ### Getting edge attributes Retrieving attributes for edges works with the same indexing principles as retrieving attributes for nodes. @@ -519,6 +636,11 @@ print( {25: {'diagnosis_time': '1981-01-04', 'duration_days': None}, 8: {'diagnosis_time': '2020-05-12', 'duration_days': None}} The first drug of the first patient costs 215.58$. +:::{dropdown} Methods used in the snippet + +* [`edge[]`](medmodels.medrecord.medrecord.MedRecord.edge){target="_blank"} : Provides access to edge attributes within the MedRecord instance via an indexer. +::: + ### Setting and updating attributes New attributes for edges can be created or existing attributes can be updated with the indexing method. 
@@ -533,6 +655,11 @@ print(medrecord.edge[patient_drug_edges[0]]) {'start_time': '2014-04-08T12:54:59Z', 'cost': 100, 'price_changed': True, 'quantity': 3} +:::{dropdown} Methods used in the snippet + +* [`edge[]`](medmodels.medrecord.medrecord.MedRecord.edge){target="_blank"} : Provides access to edge attributes within the MedRecord instance via an indexer. +::: + ### Selecting edges Edges can also be selected using the query engine. The logic operators and functions are similar to the ones used for `select_nodes()`. @@ -543,19 +670,32 @@ medrecord.select_edges(edge().attribute("cost").greater_than(500)) [114, 117, 124] +:::{dropdown} Methods used in the snippet + +* [`select_edges()`](medmodels.medrecord.medrecord.MedRecord.select_edges){target="_blank"} : Select edges that match that query. +* [`edge()`](medmodels.medrecord.querying.EdgeOperand){target="_blank"} : Returns an [`EdgeOperand`](medmodels.medrecord.querying.EdgeOperand){target="_blank"} to query on the edges of the MedRecord. +* [`attribute()`](medmodels.medrecord.querying.EdgeOperand.attribute){target="_blank"} : Returns a [`MultipleValuesOperand`](medmodels.medrecord.querying.MultipleValuesOperand){target="_blank"} to query on the values of the edges for that attribute. +* [`greater_than()`](medmodels.medrecord.querying.MultipleValuesOperand.greater_than){target="_blank"} : Query values that are greater than that value. +::: + ## Saving the MedRecord A MedRecord instance and all its data can be saved as a RON (Rusty Object Notation) file. From there, it can also be loaded and a new MedRecord instance can be created from an existing RON file. ```python medrecord.to_ron("medrecord.ron") - medrecord_loaded = MedRecord.from_ron("medrecord.ron") ``` +:::{dropdown} Methods used in the snippet + +* [`to_ron()`](medmodels.medrecord.medrecord.MedRecord.to_ron){target="_blank"} : Writes the MedRecord instance to a RON file. 
+* [`from_ron()`](medmodels.medrecord.medrecord.MedRecord.from_ron){target="_blank"} : Creates a MedRecord instance from a RON file. +::: + ## Clearing the MedRecord -All data can be removed from the MedRecord with the `clear()` function. +All data can be removed from the MedRecord with the [`clear()`](medmodels.medrecord.medrecord.MedRecord.clear){target="_blank"} function. ```python medrecord.clear() @@ -563,3 +703,9 @@ medrecord.node_count() ``` 0 + +:::{dropdown} Methods used in the snippet + +* [`clear()`](medmodels.medrecord.medrecord.MedRecord.clear){target="_blank"} : Clears all data from the MedRecord instance. +* [`node_count()`](medmodels.medrecord.medrecord.MedRecord.node_count){target="_blank"} : Returns the total number of nodes currently managed by the MedRecord. +::: diff --git a/docs/user_guide/index.md b/docs/user_guide/index.md index 8a7552a..a290341 100644 --- a/docs/user_guide/index.md +++ b/docs/user_guide/index.md @@ -16,4 +16,4 @@ self 02_medrecord 03_treatment_effect 04_medrecord_comparer -``` \ No newline at end of file +``` diff --git a/docs/user_guide/scripts/02_medrecord_intro.py b/docs/user_guide/scripts/02_medrecord_intro.py index 437a674..da6eb9d 100644 --- a/docs/user_guide/scripts/02_medrecord_intro.py +++ b/docs/user_guide/scripts/02_medrecord_intro.py @@ -33,27 +33,37 @@ record.add_edges((patient_medication, "Pat_ID", "Med_ID")) -record.add_group("US-Patients", ["Patient 01", "Patient 02"]) +record.add_group("US-Patients", nodes=["Patient 01", "Patient 02"]) + +record.add_nodes( + ( + pd.DataFrame( + [["Patient 04", 65, "M", "USA"]], columns=["ID", "Age", "Sex", "Loc"] + ), + "ID", + ), +) record.overview_nodes() record.overview_edges() +# Adding edges to a certain group so that they are shown in the overview +record.add_group("Patient-Medication", edges=record.edges) + +record.overview_edges() + # Getting all available nodes record.nodes -# ['Patient 03', 'Med 01', 'Med 02', 'Patient 01', 'Patient 02'] # Accessing a certain node 
record.node["Patient 01"] -# {'Age': 72, 'Loc': 'USA', 'Sex': 'M'} # Getting all available groups record.groups -# ['Medications', 'Patients', 'US-Patients'] # Getting the nodes that are within a certain group record.nodes_in_group("Medications") -# ['Med 02', 'Med 01'] record.to_ron("record.ron") new_record = mm.MedRecord.from_ron("record.ron") diff --git a/medmodels/medrecord/medrecord.py b/medmodels/medrecord/medrecord.py index ff1767c..740b857 100644 --- a/medmodels/medrecord/medrecord.py +++ b/medmodels/medrecord/medrecord.py @@ -868,7 +868,7 @@ def add_group( nodes: Optional[Union[NodeIndex, NodeIndexInputList, NodeQuery]] = None, edges: Optional[Union[EdgeIndex, EdgeIndexInputList, EdgeQuery]] = None, ) -> None: - """Adds a group to the MedRecord instance with an optional list of node indices. + """Adds a group to the MedRecord, optionally with node and edge indices. If node indices are specified, they are added to the group. If no nodes are specified, the group is created without any nodes.