MedicalCodingPipeline and SummarizationPipeline implementations #95

jenniferjiangkells · 2024-10-31T18:32:55Z

Description

Implement MedicalCodingPipeline and SummarizationPipeline

Related Issue

#55

Changes Made

I come, once again, bearing breaking changes.

💥 Changes to the Document container class: data is now organised into the sub-containers nlp, concepts, hl7, cds, and models. Each attribute handles one specific kind of data, usually via getter and setter functions.
- Changed .add_huggingface_output() etc. to .add_output(integration_name, task, output), which is easier to access and manage
- Added models.get_generated_text() method
Changes to CcdData: it now uses a ConceptLists dataclass to hold problems, medications, and allergies concepts, giving a cleaner interface with the Document class.
Changes to the .load() method of BasePipeline: this method now configures the pipeline with additional logic that parses a model and a model source into a ModelConfig object. The model is either a string (the name of a model or the path to one) or a callable (a langchain chain object).
Added ModelRouter, a helper which returns the appropriate integration component given a ModelConfig
Templates: Users can pass in a Jinja template for custom CDS cards (this will extend to CDAs too, but that's a matter for a different issue).
Added CdsCardCreator: this component either extracts generated text from model outputs in the pipeline or takes in specified static content and parses this into a CDS Card object using Jinja templates (a default is used if not provided).
Renamed integration components to be more descriptive: SpacyComponent -> SpacyNLP, HuggingFaceComponent -> HFTransformer, LangchainComponent -> LangChainLLM
- Also pass kwargs to integration components
Added ._add_concepts_to_hc_doc() helper method to SpacyNLP, which takes the entities from the spacy doc, parses them into Concept objects, and adds them to the .concepts attribute in Document. This is hard-coded to always add new concepts as SNOMED Problems for now, but will be made configurable in future.
Removed the default spacy tokenizer in TextPreprocessor: it is redundant because users can just use SpacyNLP. For better separation of concerns, this component now only does very simple text preprocessing. The default tokenizer is .split(), but users can also pass in a tokenizer object (Callable) to use with the component.
And finally, added MedicalCodingPipeline and SummarizationPipeline implementation.
- the pipelines do some internal coercion to set the task to either ner or summarization, but there is no strict validation yet
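
To illustrate the model parsing and routing described above, here is a minimal, self-contained sketch. It is not the code in this PR: the ModelSource enum, the ModelConfig fields, and the parse_model and route functions are hypothetical stand-ins, and the real ModelRouter returns component instances rather than names. Only the component names SpacyNLP, HFTransformer, and LangChainLLM come from the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional, Union


class ModelSource(Enum):
    """Hypothetical enum of supported integrations."""
    SPACY = "spacy"
    HUGGINGFACE = "huggingface"
    LANGCHAIN = "langchain"


@dataclass
class ModelConfig:
    """Hypothetical config; the real ModelConfig may have different fields."""
    source: ModelSource
    task: str
    model: Union[str, Callable[..., Any]]


def parse_model(
    model: Union[str, Callable[..., Any]],
    source: Optional[str] = None,
    task: str = "ner",
) -> ModelConfig:
    """Turn a model (name, path, or callable) plus a source into a config."""
    # Assumption for this sketch: a callable is treated as a langchain chain.
    if callable(model):
        return ModelConfig(ModelSource.LANGCHAIN, task, model)
    if source is None:
        raise ValueError("a source is required when the model is a string")
    return ModelConfig(ModelSource(source.lower()), task, model)


def route(config: ModelConfig) -> str:
    """Return the name of the integration component for a config."""
    component_names = {
        ModelSource.SPACY: "SpacyNLP",
        ModelSource.HUGGINGFACE: "HFTransformer",
        ModelSource.LANGCHAIN: "LangChainLLM",
    }
    return component_names[config.source]
```

In this sketch, parse_model("some-model", source="spacy") routes to SpacyNLP, while passing any callable routes to LangChainLLM regardless of the source argument.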

Testing

Added tests for:

CdsCardCreator: test_card_creator.py
ModelRouter: test_modelrouter.py
pipeline .load() method: test_pipeline_load.py
Pipeline implementations: test_medicalcoding.py, test_summarization.py
check that kwargs are properly propagated in integration components: test_integrations.py
check that TextPreprocessor initializes tokenizer object - test_preprocessor.py
updated tests for Document methods - test_containers.py
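
As a rough idea of what the tokenizer check exercises, here is a minimal sketch of a preprocessor that defaults to .split() and accepts any callable tokenizer. SimplePreprocessor is a hypothetical stand-in written for this example, not the TextPreprocessor class from this PR, and its constructor arguments are assumptions.

```python
import re
from typing import Callable, List, Optional


class SimplePreprocessor:
    """Hypothetical stand-in for a preprocessor with a pluggable tokenizer."""

    def __init__(self, tokenizer: Optional[Callable[[str], List[str]]] = None):
        # Fall back to whitespace splitting when no tokenizer is given.
        self.tokenizer = tokenizer if tokenizer is not None else str.split

    def __call__(self, text: str) -> List[str]:
        return self.tokenizer(text)


def word_tokenizer(text: str) -> List[str]:
    """Example custom tokenizer: keep runs of word characters only."""
    return re.findall(r"\w+", text)


default_tokens = SimplePreprocessor()("Patient has fever.")
custom_tokens = SimplePreprocessor(tokenizer=word_tokenizer)("Patient has fever.")
```

With the default, punctuation stays attached to the last token; the custom tokenizer drops it.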

Documentation

Updated relevant documentation
Updated cookbook examples

jenniferjiangkells and others added 30 commits October 10, 2024 16:51

Added connector modules

e03111b

Fix typo

9fc2f36

Added processing of io connectors in pipelines

453b636

Refactored CDA related processing in use case to connectors

7427aa8

Added tests

ffe36a4

Added CdsFhirConnector

c45a018

first pass at adding spacy and hf integrations

5a15cdb

Updated use case functions and tests

dad5336

WIP connector usage in pipelines and components

ee797da

Fix model import name in docs

83d3299

Update Bundle validator method to dynamically import nested resource types

32fa8bb

Update CdsFhirConnector input method validations

ad4a4a7

Add create method to CdsFhirData

45400fa

Fixed CdsResponse should return list of actions

ba6f847

Added tests

c5aa473

Added pipeline tests

6f6a2f6

fix pyproject

ca73827

adding langchain and modifying document

b931229

added testing

7a3b904

Changed .add() -> .add_node() to make more explicit and use convention of BaseObject and base.py in modules

0a30447

Update documentation to reflect changes in this PR

1ffa34b

Merge branch 'main' of https://github.com/dotimplement/HealthChain into feature/cda-connector

c921aba

adding docs

1e057c9

finish docs

86c3b9f

fix test

f1dc664

fix test2

d1798dd

WIP

6a38c72

skip transformers test

c01b61d

fix tests

694cd74

adding magicmock for iterable

4b8c91d

jenniferjiangkells added 26 commits November 6, 2024 15:05

Change load method to use source parameter

3afdb6a

Renamed integration components

03b0def

Remove spacy from preprocessor component and allow callable instead

4086a8c

Pass kwargs to integration components

d9f2a6d

Added CdsCardCreator implementation

1f9dd08

Updated tests for prebuilt pipelines

0b24f70

Added tests for pipeline loading method and modelrouter

c26d08e

Update test for spacy integration

44e6475

Tweak fixture

981e9d5

Use Mixin for ModelRouter

1119e8c

Clean up __init__ imports

3ee43ba

Fix resourceType not showing up by explicitly passing it in when called in data generator

a4e5bfa

Parse text from DocumentReference in cdsfhir

22a5c49

Add delimiter to create multiple cards and basic text cleaner for templates to card reader

1ba3a73

Make model loading more explicit and added langchain routing

46453f0

Update prebuilt pipeline initialization methods

0bec7a0

Update tests

0ae269d

Added cookbook

a295166

Moved default mapping initialization inside data generator

cba3b17

Split .load method to from_model_id and from_local_model and added template path as init option

3977f75

Update tests and docs

9970e0b

Tidy up docstrings and .load usage

69c5f0f

Update tests

16c56ed

Update documentation

16c6197

Add cookbook examples

ac7009a

Update dependencies

4660b7b

jenniferjiangkells marked this pull request as ready for review November 20, 2024 12:09

jenniferjiangkells requested a review from adamkells November 20, 2024 12:09

jenniferjiangkells linked an issue Nov 20, 2024 that may be closed by this pull request

Add out-of-the-box pipelines and components for common NLP / LLM use case #55

Closed

jenniferjiangkells merged commit 6e80363 into main Nov 20, 2024
5 checks passed
