fix bugs, add new metrics and tasks #169

Open
wants to merge 9 commits into
base: develop
1 change: 1 addition & 0 deletions docs/api/datasets.rst
@@ -10,6 +10,7 @@ Datasets
datasets/pyhealth.datasets.SampleEHRDataset
datasets/pyhealth.datasets.SampleSignalDataset
datasets/pyhealth.datasets.MIMIC3Dataset
datasets/pyhealth.datasets.MIMICExtractDataset
datasets/pyhealth.datasets.MIMIC4Dataset
datasets/pyhealth.datasets.eICUDataset
datasets/pyhealth.datasets.OMOPDataset
15 changes: 15 additions & 0 deletions docs/api/datasets/pyhealth.datasets.MIMICExtractDataset.rst
@@ -0,0 +1,15 @@
pyhealth.datasets.MIMICExtractDataset
=====================================

The open Medical Information Mart for Intensive Care (MIMIC-III) database; refer to the `doc <https://mimic.mit.edu/>`_ for more information. We process this database into a well-structured dataset object that gives users the **best flexibility and convenience** for modeling and analysis.

.. autoclass:: pyhealth.datasets.MIMICExtractDataset
    :members:
    :undoc-members:
    :show-inheritance:






2 changes: 1 addition & 1 deletion docs/api/datasets/pyhealth.datasets.OMOPDataset.rst
@@ -1,7 +1,7 @@
pyhealth.datasets.OMOPDataset
===================================

We can process any OMOP-CDM formatted database, refer to `doc <https://www.ohdsi.org/data-standardization/the-common-data-model/>`_ for more information. We it into well-structured dataset object and give user the **best flexibility and convenience** for supporting modeling and analysis.
We can process any OMOP-CDM formatted database; refer to the `doc <https://www.ohdsi.org/data-standardization/the-common-data-model/>`_ for more information. The raw data is processed into a well-structured dataset object that gives users the **best flexibility and convenience** for modeling and analysis.

.. autoclass:: pyhealth.datasets.OMOPDataset
    :members:
1 change: 1 addition & 0 deletions docs/api/models.rst
@@ -15,6 +15,7 @@ We implement the following models for supporting multiple healthcare predictive
models/pyhealth.models.GAMENet
models/pyhealth.models.MICRON
models/pyhealth.models.SafeDrug
models/pyhealth.models.MoleRec
models/pyhealth.models.Deepr
models/pyhealth.models.ContraWR
models/pyhealth.models.SparcNet
14 changes: 14 additions & 0 deletions docs/api/models/pyhealth.models.MoleRec.rst
@@ -0,0 +1,14 @@
pyhealth.models.MoleRec
===================================

The separate callable MoleRecLayer and the complete MoleRec model.

.. autoclass:: pyhealth.models.MoleRecLayer
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: pyhealth.models.MoleRec
    :members:
    :undoc-members:
    :show-inheritance:
64 changes: 32 additions & 32 deletions docs/log.rst
@@ -4,121 +4,121 @@ We track the new development here:

**May 9, 2023**

.. code-block:: bash
.. code-block:: rst

1. add MIMIC-Extract dataset `#136 <https://github.com/sunlabuiuc/PyHealth/pull/136>`_
1. add MIMIC-Extract dataset `#136`
2. add new maintainer members for pyhealth: Junyi Gao and Benjamin Danek

**May 6, 2023**

.. code-block:: bash
.. code-block:: rst

1. add new parser functions (admissionDx, diagnosisStrings) and prediction tasks for eICU dataset `#148 <https://github.com/sunlabuiuc/PyHealth/pull/148>`_
1. add new parser functions (admissionDx, diagnosisStrings) and prediction tasks for eICU dataset `#148`

**Apr 27, 2023**

.. code-block:: bash
.. code-block:: rst

1. add MoleRec model (WWW'23) for drug recommendation `#122 <https://github.com/sunlabuiuc/PyHealth/pull/122>`_
1. add MoleRec model (WWW'23) for drug recommendation `#122`

**Apr 26, 2023**

.. code-block:: bash
.. code-block:: rst

1. fix bugs in GRASP model `#141 <https://github.com/sunlabuiuc/PyHealth/pull/141>`_
2. add pandas install <2 constraints `#135 <https://github.com/sunlabuiuc/PyHealth/pull/135>`_
3. add hcpcsevents table process in MIMIC4 dataset `#134 <https://github.com/sunlabuiuc/PyHealth/pull/134>`_
1. fix bugs in GRASP model `#141`
2. add pandas install <2 constraints `#135`
3. add hcpcsevents table process in MIMIC4 dataset `#134`

**Apr 10, 2023**

.. code-block:: bash
.. code-block:: rst

1. fix Ambiguous datetime usage in eICU (https://github.com/sunlabuiuc/PyHealth/pull/132)

**Mar 26, 2023**

.. code-block:: bash
.. code-block:: rst

1. add the entire uncertainty quantification module (https://github.com/sunlabuiuc/PyHealth/pull/111)

**Feb 26, 2023**

.. code-block:: bash
.. code-block:: rst

1. add 6 EHR prediction models: Adacare, Concare, Stagenet, TCN, Grasp, Agent

**Feb 24, 2023**

.. code-block:: bash
.. code-block:: rst

1. add unittest for omop dataset
2. add github action triggered manually, check #104
2. add github action triggered manually, check `#104`

**Feb 19, 2023**

.. code-block:: bash
.. code-block:: rst

1. add unittest for eicu dataset
2. add ISRUC dataset (and task function) for signal learning

**Feb 12, 2023**

.. code-block:: bash
.. code-block:: rst

1. add unittest for mimiciii, mimiciv
2. add SHHS datasets for sleep staging task
3. add SparcNet model for signal classification task

**Feb 08, 2023**

.. code-block:: bash
.. code-block:: rst

1. complete the biosignal data support, add ContraWR [1] model for general purpose biosignal classification task ([1] Yang, Chaoqi, Danica Xiao, M. Brandon Westover, and Jimeng Sun.
"Self-supervised eeg representation learning for automatic sleep staging."
arXiv preprint arXiv:2110.15278 (2021).)

**Feb 07, 2023**

.. code-block:: bash
.. code-block:: rst

1. Support signal dataset processing and split: add SampleSignalDataset, BaseSignalDataset. Use SleepEDFcassette dataset as the first signal dataset. Use example/sleep_staging_sleepEDF_contrawr.py
2. rename the dataset/ parts: previous BaseDataset becomes BaseEHRDataset and SampleDataset becomes SampleEHRDataset. Right now, BaseDataset will be inherited by BaseEHRDataset and BaseSignalDataset. SampleBaseDataset will be inherited by SampleEHRDataset and SampleSignalDataset.

**Feb 06, 2023**

.. code-block:: bash
.. code-block:: rst

1. improve readme style
2. add the pyhealth live 06 and 07 link to pyhealth live

**Feb 01, 2023**

.. code-block:: bash
.. code-block:: rst

1. add unittest of PyHealth MedCode and Tokenizer

**Jan 26, 2023**

.. code-block:: bash
.. code-block:: rst

1. accelerate MIMIC-IV, eICU and OMOP data loading by using multiprocessing (pandarallel)

**Jan 25, 2023**

.. code-block:: bash
.. code-block:: rst

1. accelerate the MIMIC-III data loading process by using multiprocessing (pandarallel)

**Jan 24, 2023**

.. code-block:: bash
.. code-block:: rst

1. Fix the code typo in pyhealth/tasks/drug_recommendation.py for issue #71.
1. Fix the code typo in pyhealth/tasks/drug_recommendation.py for issue `#71`.
2. update the pyhealth live schedule

**Jan 22, 2023**

.. code-block:: bash
.. code-block:: rst

1. Fix the list of list of vector problem in RNN, Transformer, RETAIN, and CNN
2. Add initialization examples for RNN, Transformer, RETAIN, CNN, and Deepr
@@ -128,42 +128,42 @@ We track the new development here:

**Jan 21, 2023**

.. code-block:: bash
.. code-block:: rst

1. Added a new model, Deepr (models.Deepr)

**Jan 20, 2023**

.. code-block:: bash
.. code-block:: rst

1. add the pyhealth live 05
2. add slack channel invitation in pyhealth live page

**Jan 13, 2023**

.. code-block:: bash
.. code-block:: rst

1. add the pyhealth live 03 and 04 video link to the navigation
2. add future pyhealth live schedule

**Jan 8, 2023**

.. code-block:: bash
.. code-block:: rst

1. Changed BaseModel.add_feature_transform_layer in models/base_model.py so that it accepts special_tokens if necessary
2. fix an int/float bug in dataset checking (transform int to float and then process them uniformly)

**Dec 26, 2022**

.. code-block:: bash
.. code-block:: rst

1. add examples to pyhealth.data, pyhealth.datasets
2. improve jupyter notebook tutorials 0, 1, 2


**Dec 21, 2022**

.. code-block:: bash
.. code-block:: rst

1. add the development logs to the navigation
2. add the pyhealth live schedule to the navigation
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -1,5 +1,5 @@
Sphinx==5.2.3
sphinx-automodapi>
sphinx-automodapi
sphinx-autodoc-annotation
sphinx_last_updated_by_git
sphinxcontrib-spelling
113 changes: 113 additions & 0 deletions pyhealth/metrics/early_prediction_score.py
@@ -0,0 +1,113 @@
import numpy as np
from typing import Dict, Optional


def calculate_confusion_matrix_value_result(outcome_pred, outcome_true):
    outcome_pred = 1 if outcome_pred > 0.5 else 0
    if outcome_pred == 1 and outcome_true == 1:
        return "tp"
    elif outcome_pred == 0 and outcome_true == 0:
        return "tn"
    elif outcome_pred == 1 and outcome_true == 0:
        return "fp"
    elif outcome_pred == 0 and outcome_true == 1:
        return "fn"
    else:
        raise ValueError("Unknown value occurred")

def calculate_es(los_true, threshold, penalty, case="tp"):
    metric = 0.0
    if case == "tp":
        if los_true >= threshold:  # predict correct in early stage
            metric = 1
        else:
            metric = los_true / threshold
    elif case == "fn":
        if los_true >= threshold:  # predict wrong in early stage
            metric = 0
        else:
            metric = los_true / threshold - 1
    elif case == "tn":
        metric = 0.0
    elif case == "fp":
        metric = penalty  # penalty term
    return metric


def early_prediction_score(
    y_true_outcome: np.ndarray,
    y_true_los: np.ndarray,
    y_prob: np.ndarray,
    late_threshold: Optional[float] = None,
    fp_penalty: Optional[float] = -0.1
) -> Dict[str, float]:
    """Computes early prediction score for binary classification.

    Paper: Junyi Gao, et al. A Comprehensive Benchmark for COVID-19 Predictive Modeling
    Using Electronic Health Records in Intensive Care: Choosing the Best Model for
    COVID-19 Prognosis. arXiv preprint arXiv:2209.07805, 2023.

    Args:
        y_true_outcome: True target outcome of shape (n_samples,).
        y_true_los: Time to true target outcome of shape (n_samples,).
        y_prob: Predicted probabilities of shape (n_samples,).
        late_threshold: Threshold gamma for late prediction penalties. Default is 0.5 *
            mean(y_true_los).
        fp_penalty: Penalty term for false positive predictions. Default is -0.1.

    Returns:
        Dictionary of metrics whose keys are the metric names and values are
            the metric values.

    Examples:
        >>> import numpy as np
        >>> from pyhealth.metrics import early_prediction_score
        >>> y_true_outcome = np.array([0, 1, 1, 1])
        >>> y_true_los = np.array([5, 3, 8, 1])
        >>> y_prob = np.array([0.1, 0.4, 0.7, 0.8])
        >>> early_prediction_score(y_true_outcome, y_true_los, y_prob)
        {'score': 0.5952380952380952, 'late_threshold': 2.125, 'fp_penalty': -0.1}
    """
    metric = []
    metric_optimal = []
    num_records = len(y_prob)

    if late_threshold is None:
        late_threshold = 0.5 * np.mean(y_true_los)

    for i in range(num_records):
        cur_outcome_pred = y_prob[i]
        cur_outcome_true = y_true_outcome[i]
        cur_los_true = y_true_los[i]
        prediction_result = calculate_confusion_matrix_value_result(cur_outcome_pred, cur_outcome_true)
        # Score an ideal predictor (prediction equals the label) for normalization.
        prediction_result_optimal = calculate_confusion_matrix_value_result(cur_outcome_true, cur_outcome_true)
        metric.append(
            calculate_es(
                cur_los_true,
                late_threshold,
                penalty=fp_penalty,
                case=prediction_result,
            )
        )
        metric_optimal.append(
            calculate_es(
                cur_los_true,
                late_threshold,
                penalty=fp_penalty,
                case=prediction_result_optimal,
            )
        )
    metric = np.array(metric)
    metric_optimal = np.array(metric_optimal)
    result = 0.0
    # Normalize by the ideal predictor's total; the score stays 0.0 if that total is not positive.
    if metric_optimal.sum() > 0.0:
        result = metric.sum() / metric_optimal.sum()
    result = max(result, -1.0)
    if isinstance(result, np.float64):
        result = result.item()
    return {"score": result, 'late_threshold': late_threshold, 'fp_penalty': fp_penalty}

if __name__ == "__main__":
    y_true_outcome = np.array([0, 1, 1, 1])
    y_true_los = np.array([5, 3, 8, 1])
    y_prob = np.array([0.1, 0.4, 0.7, 0.8])
    print(early_prediction_score(y_true_outcome, y_true_los, y_prob))
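
The per-record loop above could also be written with NumPy masks. The following is a minimal sketch, not part of this PR: the function name `early_prediction_score_vectorized` and the helper `per_record` are hypothetical, and it assumes 0/1 labels and a non-zero `late_threshold`, as the loop version does. It returns only the score rather than the full dictionary.

```python
import numpy as np


def early_prediction_score_vectorized(y_true_outcome, y_true_los, y_prob,
                                      late_threshold=None, fp_penalty=-0.1):
    """Hypothetical vectorized sketch of the loop-based score in the diff."""
    y_true = np.asarray(y_true_outcome).astype(int)
    los = np.asarray(y_true_los, dtype=float)
    # Same 0.5 cut-off as calculate_confusion_matrix_value_result.
    y_pred = (np.asarray(y_prob, dtype=float) > 0.5).astype(int)

    if late_threshold is None:
        late_threshold = 0.5 * los.mean()

    # los / threshold, capped at 1 once the record is at or beyond the threshold.
    early = np.minimum(los / late_threshold, 1.0)

    def per_record(pred):
        # Per-record scores matching the tp / fn / fp / tn cases of calculate_es.
        tp = (pred == 1) & (y_true == 1)
        fn = (pred == 0) & (y_true == 1)
        fp = (pred == 1) & (y_true == 0)
        score = np.zeros_like(los)  # true negatives stay at 0
        score[tp] = early[tp]
        score[fn] = early[fn] - 1.0
        score[fp] = fp_penalty
        return score

    achieved = per_record(y_pred).sum()
    optimal = per_record(y_true).sum()  # ideal predictor: prediction == label
    result = achieved / optimal if optimal > 0.0 else 0.0
    return max(float(result), -1.0)


score = early_prediction_score_vectorized(
    np.array([0, 1, 1, 1]), np.array([5, 3, 8, 1]), np.array([0.1, 0.4, 0.7, 0.8])
)
print(score)  # 25/42, roughly 0.595, matching the __main__ example above
```

On the `__main__` inputs the threshold is 2.125. The second record is a false negative at or beyond the threshold (score 0), the third is a true positive beyond it (score 1), and the fourth is a true positive before it (score 1/2.125). The ideal predictor scores 1 on the second record instead, which gives (25/17) / (42/17) = 25/42.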