- Led a Generative AI team in collaboration with Providence Health to use Large Language Models to extract complex unstructured data from 25,217 notes from 795 pregnant patients.
- Used Chain-of-Thought prompting and few-shot learning to extract SDoH (Social Determinants of Health) data including housing insecurity with higher recall than human annotators (0.92) and a precision of 0.85.
- More efficient methods for obtaining structured SDoH data can help accelerate inclusion of exposome variables in biomedical research, and support healthcare systems in identifying patients who could benefit from proactive outreach.
Comparison of recall and precision for Regex, GPT-3.5, GPT-4, and manual annotation in identifying notes with current or past housing instability, measured on 539 manually annotated notes.
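
As a rough illustration of the prompting and evaluation steps described above, the sketch below assembles a few-shot, Chain-of-Thought style prompt and computes precision and recall against manual annotations. The example notes, the prompt wording, and the function names `build_prompt` and `precision_recall` are all hypothetical; the project's actual prompts and evaluation code are not shown in this summary.

```python
# Hypothetical few-shot examples; the note text is invented for illustration only.
FEW_SHOT_EXAMPLES = [
    {
        "note": "Patient reports staying with a friend after losing her apartment.",
        "reasoning": "Losing an apartment and staying with a friend suggests unstable housing.",
        "label": "yes",
    },
    {
        "note": "Patient lives with her spouse in a home they own.",
        "reasoning": "Owning a home with no stated concerns suggests stable housing.",
        "label": "no",
    },
]


def build_prompt(note, examples=FEW_SHOT_EXAMPLES):
    """Assemble a few-shot prompt that asks for step-by-step reasoning before the answer."""
    parts = [
        "Decide whether the note mentions current or past housing instability.",
        "Reason step by step, then answer yes or no.",
    ]
    for ex in examples:
        parts.append(
            f"Note: {ex['note']}\nReasoning: {ex['reasoning']}\nAnswer: {ex['label']}"
        )
    # The final note is left open so the model continues with its own reasoning.
    parts.append(f"Note: {note}\nReasoning:")
    return "\n\n".join(parts)


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 means 'housing instability'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (not project data): manual annotations vs. model predictions.
gold = [1, 1, 1, 0, 0, 1, 0, 0]
pred = [1, 1, 0, 1, 0, 1, 0, 0]
precision, recall = precision_recall(gold, pred)
prompt = build_prompt("Patient was recently evicted and is living in her car.")
```

Recall matters most when the goal is to avoid missing patients who need outreach, while precision controls how many flagged notes turn out to be false alarms.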
- Developed an XGBoost model with an AUROC of 0.83 to identify patients at high risk of CKD (Chronic Kidney Disease).
- Developed CROP (Clinical Recruitment Optimization Pipeline), a statistical method that improves ML predictions to identify high-risk patients for clinical trials.
- Combining CROP with the CKD risk model produced a six-fold decrease in the total number of patients needed for clinical trial recruitment.
The CROP procedure for adjusting model probabilities and estimating the number of transitions under the Poisson model.
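
The sketch below is not CROP itself, whose details are not given here. It only illustrates the general idea of why ranking patients by predicted risk can shrink a recruitment pool: if each patient has an (already adjusted) probability of an event, the expected number of events is the sum of those probabilities, and the event count can be approximated as Poisson with that mean. The function names, the toy probabilities, and the target of five events are all assumptions made for illustration.

```python
import math


def poisson_prob_at_least(k, lam):
    """P(N >= k) for N ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k))
    return 1.0 - cdf


def patients_needed(probs, target_events):
    """Smallest number of top-ranked patients whose expected event count
    reaches target_events, or None if the cohort is too small."""
    expected = 0.0
    for n, p in enumerate(sorted(probs, reverse=True), start=1):
        expected += p
        if expected >= target_events:
            return n
    return None


# Toy cohort (not project data): 10 high-risk and 90 low-risk patients.
probs = [0.6] * 10 + [0.1] * 90
target = 5

n_ranked = patients_needed(probs, target)

# Without a risk model, patients are drawn at the cohort's average risk.
mean_risk = sum(probs) / len(probs)
n_unranked = math.ceil(target / mean_risk)

# Chance of observing at least `target` events among the top-ranked patients,
# using a Poisson approximation with mean equal to their summed probabilities.
lam = sum(sorted(probs, reverse=True)[:n_ranked])
chance = poisson_prob_at_least(target, lam)
```

In this toy cohort, risk-based ranking needs far fewer patients than unranked recruitment to reach the same expected number of events; the size of the reduction depends entirely on how well the model separates high-risk from low-risk patients.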
- Utilized the Snorkel system to build a training set of labeled biomimicry papers.
- Labeling our data by hand was prohibitively slow, so we turned to a weak supervision approach using labeling functions (LFs) in Snorkel.
- LFs are noisy, programmatic rules and heuristics that assign labels to unlabeled training data.
- We trained a classifier that predicts the label a given biomimicry paper should receive with 95% accuracy.
An overview of the Snorkel system. (1) Subject matter expert (SME) users write labeling functions (LFs) that express weak supervision sources such as distant supervision, patterns, and heuristics. (2) Snorkel applies the LFs over unlabeled data and learns a generative model to combine the LFs' outputs into probabilistic labels. (3) Snorkel uses these labels to train a discriminative classification model, such as a deep neural network. Adapted from Ratner et al. (2017).
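
To make the idea of labeling functions concrete, here is a minimal plain-Python sketch that does not use the Snorkel library. The keyword rules and the binary label scheme are hypothetical, not the LFs used in the project. Snorkel combines LF outputs with a learned generative model; this sketch substitutes a simple majority vote as a stand-in for that step.

```python
from collections import Counter

# Hypothetical label scheme for illustration; the project's labels may differ.
ABSTAIN = -1
NOT_BIOMIMICRY = 0
BIOMIMICRY = 1


def lf_biomimetic_keyword(text):
    """Vote BIOMIMICRY when an explicit biomimicry term appears."""
    t = text.lower()
    return BIOMIMICRY if "biomimetic" in t or "bio-inspired" in t else ABSTAIN


def lf_inspired_by(text):
    """Vote BIOMIMICRY on the phrase 'inspired by' (a noisy heuristic)."""
    return BIOMIMICRY if "inspired by" in text.lower() else ABSTAIN


def lf_clinical_trial(text):
    """Vote NOT_BIOMIMICRY for papers that look like clinical trial reports."""
    return NOT_BIOMIMICRY if "clinical trial" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_biomimetic_keyword, lf_inspired_by, lf_clinical_trial]


def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Combine LF votes by majority; abstain when no LF fires or on a tie.
    This replaces Snorkel's learned generative label model with a simpler rule."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


# Toy abstracts (invented for illustration).
label_a = majority_vote("A biomimetic adhesive inspired by gecko feet.")
label_b = majority_vote("Results of a randomized clinical trial of statins.")
label_c = majority_vote("A survey of database indexing methods.")
```

Each function may be wrong or abstain on any given paper; the value comes from combining many such noisy signals into training labels much faster than hand annotation.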
For questions contact Alexandra Ralevski ([email protected])