sunlabuiuc · v1xerunt · May 10, 2023
diff --git a/docs/api/datasets.rst b/docs/api/datasets.rst
@@ -10,6 +10,7 @@ Datasets
     datasets/pyhealth.datasets.SampleEHRDataset
     datasets/pyhealth.datasets.SampleSignalDataset
     datasets/pyhealth.datasets.MIMIC3Dataset
+    datasets/pyhealth.datasets.MIMICExtractDataset
     datasets/pyhealth.datasets.MIMIC4Dataset
     datasets/pyhealth.datasets.eICUDataset
     datasets/pyhealth.datasets.OMOPDataset

diff --git a/docs/api/datasets/pyhealth.datasets.MIMICExtractDataset.rst b/docs/api/datasets/pyhealth.datasets.MIMICExtractDataset.rst
@@ -0,0 +1,15 @@
+pyhealth.datasets.MIMICExtractDataset
+=====================================
+
+The open Medical Information Mart for Intensive Care (MIMIC-III) database; refer to `doc <https://mimic.mit.edu/>`_ for more information. We process this database into a well-structured dataset object and give users the **best flexibility and convenience** for supporting modeling and analysis.
+
+.. autoclass:: pyhealth.datasets.MIMICExtractDataset
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+
+
+
+
+
diff --git a/docs/api/datasets/pyhealth.datasets.OMOPDataset.rst b/docs/api/datasets/pyhealth.datasets.OMOPDataset.rst
@@ -1,7 +1,7 @@
 pyhealth.datasets.OMOPDataset
 ===================================
 
-We can process any OMOP-CDM formatted database, refer to `doc <https://www.ohdsi.org/data-standardization/the-common-data-model/>`_ for more information. We it into well-structured dataset object and give user the **best flexibility and convenience** for supporting modeling and analysis.
+We can process any OMOP-CDM formatted database; refer to `doc <https://www.ohdsi.org/data-standardization/the-common-data-model/>`_ for more information. The raw data is processed into a well-structured dataset object, giving users the **best flexibility and convenience** for supporting modeling and analysis.
 
 .. autoclass:: pyhealth.datasets.OMOPDataset
     :members:

diff --git a/docs/api/models.rst b/docs/api/models.rst
@@ -15,6 +15,7 @@ We implement the following models for supporting multiple healthcare predictive
     models/pyhealth.models.GAMENet
     models/pyhealth.models.MICRON
     models/pyhealth.models.SafeDrug
+    models/pyhealth.models.MoleRec
     models/pyhealth.models.Deepr
     models/pyhealth.models.ContraWR
     models/pyhealth.models.SparcNet

diff --git a/docs/api/models/pyhealth.models.MoleRec.rst b/docs/api/models/pyhealth.models.MoleRec.rst
@@ -0,0 +1,14 @@
+pyhealth.models.MoleRec
+===================================
+
+The separate callable MoleRecLayer and the complete MoleRec model.
+
+.. autoclass:: pyhealth.models.MoleRecLayer
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+.. autoclass:: pyhealth.models.MoleRec
+    :members:
+    :undoc-members:
+    :show-inheritance:
diff --git a/docs/log.rst b/docs/log.rst
@@ -4,121 +4,121 @@ We track the new development here:
 
 **May 9, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
-    1. add MIMIC-Extract dataset  `#136 <https://github.com/sunlabuiuc/PyHealth/pull/136>`_
+    1. add MIMIC-Extract dataset  `#136`
     2. add new maintainer members for pyhealth: Junyi Gao and Benjamin Danek
 
 **May 6, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
-    1. add new parser functions (admissionDx, diagnosisStrings) and prediction tasks for eICU dataset  `#148 <https://github.com/sunlabuiuc/PyHealth/pull/148>`_
+    1. add new parser functions (admissionDx, diagnosisStrings) and prediction tasks for eICU dataset `#148`
 
 **Apr 27, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
-    1. add MoleRec model (WWW'23) for drug recommendation `#122 <https://github.com/sunlabuiuc/PyHealth/pull/122>`_
+    1. add MoleRec model (WWW'23) for drug recommendation `#122`
 
 **Apr 26, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
-    1. fix bugs in GRASP model `#141 <https://github.com/sunlabuiuc/PyHealth/pull/141>`_
-    2. add pandas install <2 constraints `#135 <https://github.com/sunlabuiuc/PyHealth/pull/135>`_
-    3. add hcpcsevents table process in MIMIC4 dataset `#134 <https://github.com/sunlabuiuc/PyHealth/pull/134>`_
+    1. fix bugs in GRASP model `#141`
+    2. add pandas install <2 constraints `#135` 
+    3. add hcpcsevents table process in MIMIC4 dataset `#134`
 
 **Apr 10, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. fix Ambiguous datetime usage in eICU (https://github.com/sunlabuiuc/PyHealth/pull/132)
 
 **Mar 26, 2023**    
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. add the entire uncertainty quantification module (https://github.com/sunlabuiuc/PyHealth/pull/111)
 
 **Feb 26, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. add 6 EHR prediction models: Adacare, Concare, Stagenet, TCN, Grasp, Agent
 
 **Feb 24, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. add unittest for omop dataset
-    2. add github action triggered manually, check #104
+    2. add github action triggered manually, check `#104`
 
 **Feb 19, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. add unittest for eicu dataset
     2. add ISRUC dataset (and task function) for signal learning
 
 **Feb 12, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. add unittest for mimiciii, mimiciv
     2. add SHHS datasets for sleep staging task
     3. add SparcNet model for signal classification task
 
 **Feb 08, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. complete the biosignal data support, add ContraWR [1] model for general purpose biosignal classification task ([1] Yang, Chaoqi, Danica Xiao, M. Brandon Westover, and Jimeng Sun. 
         "Self-supervised eeg representation learning for automatic sleep staging."
         arXiv preprint arXiv:2110.15278 (2021).)
 
 **Feb 07, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. Support signal dataset processing and split: add SampleSignalDataset, BaseSignalDataset. Use SleepEDFcassette dataset as the first signal dataset. Use example/sleep_staging_sleepEDF_contrawr.py
     2. rename the dataset/ parts: previous BaseDataset becomes BaseEHRDataset and SampleDataset becomes SampleEHRDataset. Right now, BaseDataset will be inherited by BaseEHRDataset and BaseSignalDataset. SampleBaseDataset will be inherited by SampleEHRDataset and SampleSignalDataset.
 
 **Feb 06, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. improve readme style
     2. add the pyhealth live 06 and 07 link to pyhealth live
 
 **Feb 01, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. add unittest of PyHealth MedCode and Tokenizer
 
 **Jan 26, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. accelerate MIMIC-IV, eICU and OMOP data loading by using multiprocessing (pandarallel)
 
 **Jan 25, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. accelerate the MIMIC-III data loading process by using multiprocessing (pandarallel)
 
 **Jan 24, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
-    1. Fix the code typo in pyhealth/tasks/drug_recommendation.py for issue #71.
+    1. Fix the code typo in pyhealth/tasks/drug_recommendation.py for issue `#71`.
     2. update the pyhealth live schedule 
 
 **Jan 22, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. Fix the list of list of vector problem in RNN, Transformer, RETAIN, and CNN
     2. Add initialization examples for RNN, Transformer, RETAIN, CNN, and Deepr
@@ -128,42 +128,42 @@ We track the new development here:
 
 **Jan 21, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. Added a new model, Deepr (models.Deepr)
 
 **Jan 20, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. add the pyhealth live 05
     2. add slack channel invitation in pyhealth live page
 
 **Jan 13, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. add the pyhealth live 03 and 04 video link to the navigation
     2. add future pyhealth live schedule
 
 **Jan 8, 2023**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. Changed BaseModel.add_feature_transform_layer in models/base_model.py so that it accepts special_tokens if necessary
     2. fix an int/float bug in dataset checking (transform int to float and then process them uniformly)
 
 **Dec 26, 2022**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. add examples to pyhealth.data, pyhealth.datasets
     2. improve jupyter notebook tutorials 0, 1, 2
 
 
 **Dec 21, 2022**
 
-.. code-block:: bash
+.. code-block:: rst
 
     1. add the development logs to the navigation
     2. add the pyhealth live schedule to the navigation
diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -1,5 +1,5 @@
 Sphinx==5.2.3
-sphinx-automodapi>
+sphinx-automodapi
 sphinx-autodoc-annotation
 sphinx_last_updated_by_git
 sphinxcontrib-spelling

diff --git a/pyhealth/metrics/early_prediction_score.py b/pyhealth/metrics/early_prediction_score.py
@@ -0,0 +1,113 @@
+import numpy as np
+from typing import Dict, Optional
+
+
+def calculate_confusion_matrix_value_result(outcome_pred, outcome_true):
+    outcome_pred = 1 if outcome_pred > 0.5 else 0
+    if outcome_pred == 1 and outcome_true == 1:
+        return "tp"
+    elif outcome_pred == 0 and outcome_true == 0:
+        return "tn"
+    elif outcome_pred == 1 and outcome_true == 0:
+        return "fp"
+    elif outcome_pred == 0 and outcome_true == 1:
+        return "fn"
+    else:
+        raise ValueError("Unknown value occurred")
+
+def calculate_es(los_true, threshold, penalty, case="tp"):
+    metric = 0.0
+    if case == "tp":
+        if los_true >= threshold:  # predict correct in early stage
+            metric = 1
+        else:
+            metric = los_true / threshold
+    elif case == "fn":
+        if los_true >= threshold:  # predict wrong in early stage
+            metric = 0
+        else:
+            metric = los_true / threshold - 1
+    elif case == "tn":
+        metric = 0.0
+    elif case == "fp":
+        metric = penalty # penalty term
+    return metric
+
+
+def early_prediction_score(
+    y_true_outcome: np.ndarray,
+    y_true_los: np.ndarray,
+    y_prob: np.ndarray,
+    late_threshold: Optional[float] = None,
+    fp_penalty: Optional[float] = -0.1
+) -> Dict[str, float]:
+    """Computes early prediction score for binary classification.
+
+    Paper: Junyi Gao, et al. A Comprehensive Benchmark for COVID-19 Predictive Modeling
+    Using Electronic Health Records in Intensive Care: Choosing the Best Model for
+    COVID-19 Prognosis. arXiv preprint arXiv:2209.07805, 2023.
+
+    Args:
+        y_true_outcome: True target outcome of shape (n_samples,).
+        y_true_los: Time to true target outcome of shape (n_samples,).
+        y_prob: Predicted probabilities of shape (n_samples,).
+        late_threshold: Threshold gamma for late prediction penalties. Default is
+            0.5 * mean(y_true_los).
+        fp_penalty: Penalty term for false positive predictions. Default is -0.1.
+
+    Returns:
+        Dictionary of metrics whose keys are the metric names and values are
+            the metric values.
+
+    Examples:
+        >>> from pyhealth.metrics import early_prediction_score
+        >>> y_true_outcome = np.array([0, 1, 1, 1])
+        >>> y_true_los = np.array([5, 3, 8, 1])
+        >>> y_prob = np.array([0.1, 0.4, 0.7, 0.8])
+        >>> early_prediction_score(y_true_outcome, y_true_los, y_prob)
+        {'score': 0.5952380952380952, 'late_threshold': 2.125, 'fp_penalty': -0.1}
+    """
+    metric = []
+    metric_optimal = []
+    num_records = len(y_prob)
+
+    if late_threshold is None:
+        late_threshold = 0.5 * np.mean(y_true_los)
+
+    for i in range(num_records):
+        cur_outcome_pred = y_prob[i]
+        cur_outcome_true = y_true_outcome[i]
+        cur_los_true = y_true_los[i]
+        prediction_result = calculate_confusion_matrix_value_result(cur_outcome_pred, cur_outcome_true)
+        prediction_result_optimal = calculate_confusion_matrix_value_result(cur_outcome_true, cur_outcome_true)
+        metric.append(
+            calculate_es(
+                cur_los_true,
+                late_threshold,
+                penalty=fp_penalty,
+                case=prediction_result,
+            )
+        )
+        metric_optimal.append(
+            calculate_es(
+                cur_los_true,
+                late_threshold,
+                penalty=fp_penalty,
+                case=prediction_result_optimal,
+            )
+        )
+    metric = np.array(metric)
+    metric_optimal = np.array(metric_optimal)
+    result = 0.0
+    if metric_optimal.sum() > 0.0:
+        result = metric.sum() / metric_optimal.sum()
+    result = max(result, -1.0)
+    if isinstance(result, np.float64):
+        result = result.item()
+    return {"score": result, 'late_threshold': late_threshold, 'fp_penalty': fp_penalty}
+
+if __name__ == "__main__":
+    y_true_outcome = np.array([0, 1, 1, 1])
+    y_true_los = np.array([5, 3, 8, 1])
+    y_prob = np.array([0.1, 0.4, 0.7, 0.8])
+    print(early_prediction_score(y_true_outcome, y_true_los, y_prob))
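
Note on the scoring logic in `early_prediction_score.py`: the sketch below restates the loop above in plain Python so the arithmetic can be checked by hand. The name `early_score_sketch` and its helper `es` are hypothetical and not part of PyHealth; the sketch assumes binary 0/1 labels and the same defaults as the diff (threshold of half the mean time-to-outcome, false-positive penalty of -0.1). It is an illustration of the logic, not a replacement for the committed function.

```python
from statistics import mean


def early_score_sketch(y_true, los, y_prob, late_threshold=None, fp_penalty=-0.1):
    """Hypothetical pure-Python restatement of the scoring loop in the diff."""
    if late_threshold is None:
        # Same default as the diff: half of the mean time-to-outcome.
        late_threshold = 0.5 * mean(los)

    def es(pred, true, t):
        if pred == 1 and true == 1:   # true positive: full credit if early enough
            return 1.0 if t >= late_threshold else t / late_threshold
        if pred == 0 and true == 1:   # false negative: penalised only when late
            return 0.0 if t >= late_threshold else t / late_threshold - 1.0
        if pred == 1 and true == 0:   # false positive: fixed penalty
            return fp_penalty
        return 0.0                    # true negative: no contribution

    # Score achieved by the thresholded predictions (probability > 0.5 -> 1).
    got = sum(es(int(p > 0.5), y, t) for p, y, t in zip(y_prob, y_true, los))
    # Score a perfect predictor would achieve on the same records.
    best = sum(es(y, y, t) for y, t in zip(y_true, los))
    score = got / best if best > 0 else 0.0
    return max(score, -1.0)


# Inputs from the __main__ block of the diff: cases are tn, fn, tp, tp.
score = early_score_sketch([0, 1, 1, 1], [5, 3, 8, 1], [0.1, 0.4, 0.7, 0.8])
print(score)
```

With these inputs the threshold is 2.125, the achieved sum is 0 + 0 + 1 + 1/2.125 and the optimal sum is 0 + 1 + 1 + 1/2.125, so the ratio works out to 25/42. When every prediction is correct, the two sums coincide and the score is 1.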