Skip to content

dotimplement/HealthChain

Repository files navigation

HealthChain πŸ’« πŸ₯

HealthChain Logo

GitHub License PyPI Version Python Versions Downloads

Simplify developing, testing and validating AI and NLP applications in a healthcare context πŸ’« πŸ₯.

Building applications that integrate with electronic health record systems (EHRs) is complex, and so is designing reliable, reactive algorithms involving unstructured data. Let's try to change that.

pip install healthchain

First time here? Check out our Docs page!

Came here from NHS RPySOC 2024 ✨? CDS sandbox walkthrough

Features

  • πŸ› οΈ Build custom pipelines or use pre-built ones for your healthcare NLP and ML tasks
  • πŸ—οΈ Add built-in CDA and FHIR parsers to connect your pipeline to interoperability standards
  • πŸ§ͺ Test your pipelines in full healthcare-context aware sandbox environments
  • πŸ—ƒοΈ Generate synthetic healthcare data for testing and development
  • πŸš€ Deploy sandbox servers locally with FastAPI

Why use HealthChain?

  • EHR integrations are manual and time-consuming - HealthChain abstracts away complexities so you can focus on AI development, not EHR configurations.
  • It's difficult to track and evaluate multiple integration instances - HealthChain provides a framework to test the real-world resilience of your whole system, not just your models.
  • Most healthcare data is unstructured - HealthChain is optimized for real-time AI and NLP applications that deal with realistic healthcare data.
  • Built by health tech developers, for health tech developers - HealthChain is tech stack agnostic, modular, and easily extensible.

Pipeline

Pipelines provide a flexible way to build and manage processing pipelines for NLP and ML tasks that can easily integrate with complex healthcare systems.

Building a pipeline

from healthchain.io.containers import Document
from healthchain.pipeline import Pipeline
from healthchain.pipeline.components import TextPreProcessor, SpacyNLP, TextPostProcessor

# Initialize the pipeline
nlp_pipeline = Pipeline[Document]()

# Add TextPreProcessor component
preprocessor = TextPreProcessor(tokenizer="spacy")
nlp_pipeline.add_node(preprocessor)

# Add Model component (assuming we have a pre-trained model)
spacy_nlp = SpacyNLP.from_model_id("en_core_sci_md", source="spacy")
nlp_pipeline.add_node(spacy_nlp)

# Add TextPostProcessor component
postprocessor = TextPostProcessor(
    postcoordination_lookup={
        "heart attack": "myocardial infarction",
        "high blood pressure": "hypertension"
    }
)
nlp_pipeline.add_node(postprocessor)

# Build the pipeline
nlp = nlp_pipeline.build()

# Use the pipeline
result = nlp(Document("Patient has a history of heart attack and high blood pressure."))

print(f"Entities: {result.nlp.spacy_doc.ents}")

Adding connectors

Connectors give your pipelines the ability to interface with EHRs.

from healthchain.io import CdaConnector
from healthchain.models import CdaRequest

cda_connector = CdaConnector()

pipeline.add_input(cda_connector)
pipeline.add_output(cda_connector)

pipe = pipeline.build()

cda_data = CdaRequest(document="<CDA XML content>")
output = pipe(cda_data)

Using pre-built pipelines

Pre-built pipelines are use case specific end-to-end workflows that already have connectors and models built-in.

from healthchain.pipeline import MedicalCodingPipeline
from healthchain.models import CdaRequest

# Load from model ID
pipeline = MedicalCodingPipeline.from_model_id(
    model="blaze999/Medical-NER", task="token-classification", source="huggingface"
)

# Or load from local model
pipeline = MedicalCodingPipeline.from_local_model("./path/to/model", source="spacy")

cda_data = CdaRequest(document="<CDA XML content>")
output = pipeline(cda_data)

Sandbox

Sandboxes provide a staging environment for testing and validating your pipeline in a realistic healthcare context.

Clinical Decision Support (CDS)

CDS Hooks is an HL7 published specification for clinical decision support.

When is this used? CDS hooks are triggered at certain events during a clinician's workflow in an electronic health record (EHR), e.g. when a patient record is opened, when an order is elected.

What information is sent: the context of the event and FHIR resources that are requested by your service, for example, the patient ID and information on the encounter and conditions they are being seen for.

What information is returned: β€œcards” displaying text, actionable suggestions, or links to launch a SMART app from within the workflow.

import healthchain as hc

from healthchain.pipeline import SummarizationPipeline
from healthchain.use_cases import ClinicalDecisionSupport
from healthchain.models import Card, CdsFhirData, CDSRequest
from healthchain.data_generator import CdsDataGenerator
from typing import List

@hc.sandbox
class MyCDS(ClinicalDecisionSupport):
    def __init__(self) -> None:
        self.pipeline = SummarizationPipeline.from_model_id(
            "facebook/bart-large-cnn", source="huggingface"
        )
        self.data_generator = CdsDataGenerator()

    # Sets up an instance of a mock EHR client of the specified workflow
    @hc.ehr(workflow="encounter-discharge")
    def ehr_database_client(self) -> CdsFhirData:
        return self.data_generator.generate()

    # Define your application logic here
    @hc.api
    def my_service(self, data: CDSRequest) -> CDSRequest:
        result = self.pipeline(data)
        return result

Clinical Documentation

The ClinicalDocumentation use case implements a real-time Clinical Documentation Improvement (CDI) service. It helps convert free-text medical documentation into coded information that can be used for billing, quality reporting, and clinical decision support.

When is this used? Triggered when a clinician opts in to a CDI functionality (e.g. Epic NoteReader) and signs or pends a note after writing it.

What information is sent: A CDA (Clinical Document Architecture) document which contains continuity of care data and free-text data, e.g. a patient's problem list and the progress note that the clinician has entered in the EHR.

import healthchain as hc

from healthchain.pipeline import MedicalCodingPipeline
from healthchain.use_cases import ClinicalDocumentation
from healthchain.models import CcdData, CdaRequest, CdaResponse

@hc.sandbox
class NotereaderSandbox(ClinicalDocumentation):
    def __init__(self):
        self.pipeline = MedicalCodingPipeline.from_model_id(
            "en_core_sci_md", source="spacy"
        )

    # Load an existing CDA file
    @hc.ehr(workflow="sign-note-inpatient")
    def load_data_in_client(self) -> CcdData:
        with open("/path/to/cda/data.xml", "r") as file:
            xml_string = file.read()

        return CcdData(cda_xml=xml_string)

    @hc.api
    def my_service(self, data: CdaRequest) -> CdaResponse:
        annotated_ccd = self.pipeline(data)
        return annotated_ccd

Running a sandbox

Ensure you run the following commands in your mycds.py file:

cds = MyCDS()
cds.run_sandbox()

This will populate your EHR client with the data generation method you have defined, send requests to your server for processing, and save the data in the ./output directory.

Then run:

healthchain run mycds.py

By default, the server runs at http://127.0.0.1:8000, and you can interact with the exposed endpoints at /docs.

Road Map

  • πŸŽ›οΈ Versioning and artifact management for pipelines sandbox EHR configurations
  • ❓ Testing and evaluation framework for pipelines and use cases
  • 🧠 Multi-modal pipelines that that have built-in NLP to utilize unstructured data
  • ✨ Improvements to synthetic data generator methods
  • πŸ‘Ύ Frontend UI for EHR client and visualization features
  • πŸš€ Production deployment options

Contribute

We are always eager to hear feedback and suggestions, especially if you are a developer or researcher working with healthcare systems!

Acknowledgement

This repository makes use of CDS Hooks developed by Boston Children’s Hospital.