MedicalCodingPipeline and SummarizationPipeline implementations #95
Merged
…n of BaseObject and base.py in modules
…to feature/cda-connector
…ed in data generator
…plates to card reader
…mplate path as init option
Description
Implement `MedicalCodingPipeline` and `SummarizationPipeline`.
Related Issue
#55
Changes Made
I come, once again, bearing breaking changes.
💥 Changes to the `Document` container class: it is now organised into the sub-containers `nlp`, `concepts`, `hl7`, `cds`, and `models`. Each attribute handles one kind of data, usually via getter and setter functions. `.add_huggingface_output()` etc. become `.add_output(integration_name, task, output)`, which is easier to access and manage, and there is a `models.get_generated_text()` method.
Changes to `CcdData`: it uses a `ConceptLists` dataclass to hold problem, medication, and allergy concepts, for a better interface with the `Document` class.
Changes to the `.load()` method of `BasePipeline`: this method now configures the pipeline with additional logic that parses a model and model source (either a string, i.e. the name of or path to a model, or a callable, i.e. a LangChain chain object) into a `ModelConfig` object.
Added `ModelRouter`, a helper that returns the appropriate integration component given a `ModelConfig`.
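To make the `.load()` and `ModelRouter` changes concrete, here is a minimal sketch of the idea, not the code in this PR. The `ModelConfig` field names, the `parse_model` function, the `register`/`get_component` methods, the source keys (`"spacy"`, `"huggingface"`, `"langchain"`), and `StubComponent` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Union


@dataclass
class ModelConfig:
    """Illustrative config; the field names are assumptions, not the PR's."""
    source: str
    task: Optional[str] = None
    model_id: Optional[str] = None               # model name or path (string case)
    pipeline_object: Optional[Callable] = None   # e.g. a chain-like callable
    kwargs: Dict[str, Any] = field(default_factory=dict)


def parse_model(model: Union[str, Callable], source: str,
                task: Optional[str] = None, **kwargs: Any) -> ModelConfig:
    """Hypothetical parser: accept a string or a callable and build a config."""
    if callable(model):
        return ModelConfig(source=source.lower(), task=task,
                           pipeline_object=model, kwargs=kwargs)
    if isinstance(model, str):
        return ModelConfig(source=source.lower(), task=task,
                           model_id=model, kwargs=kwargs)
    raise TypeError(f"model must be a str or a callable, got {type(model).__name__}")


@dataclass
class StubComponent:
    """Stand-in for an integration component; only records what it was given."""
    name: str
    config: ModelConfig


class ModelRouter:
    """Hypothetical router: maps a config's source to a component factory."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[ModelConfig], Any]] = {}

    def register(self, source: str, factory: Callable[[ModelConfig], Any]) -> None:
        self._factories[source.lower()] = factory

    def get_component(self, config: ModelConfig) -> Any:
        factory = self._factories.get(config.source)
        if factory is None:
            raise ValueError(f"No integration registered for source '{config.source}'")
        return factory(config)


router = ModelRouter()
for source, name in [("spacy", "SpacyNLP"),
                     ("huggingface", "HFTransformer"),
                     ("langchain", "LangChainLLM")]:
    # Bind `name` as a default argument so each factory keeps its own value.
    router.register(source, lambda cfg, name=name: StubComponent(name, cfg))

config = parse_model("en_core_web_sm", "spacy", task="ner")
component = router.get_component(config)
```

Keeping the parsing step separate from the routing step means the router only ever sees a validated `ModelConfig`, whichever form the user passed in.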
Templates: Users can pass in a Jinja template for custom CDS cards (this will extend to CDAs too, but that's a matter for a different issue).
Added `CdsCardCreator`: this component either extracts generated text from model outputs in the pipeline or takes specified static content, and parses it into a CDSCard object using Jinja templates (a default template is used if none is provided).
Renamed integration components to be more descriptive: `SpacyComponent` -> `SpacyNLP`, `HuggingFaceComponent` -> `HFTransformer`, `LangchainComponent` -> `LangChainLLM`.
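The `CdsCardCreator` flow described above (generated text or static content, rendered through a template, then parsed into a card) might look roughly like the sketch below. To stay dependency-free it uses the standard library's `string.Template` (`$name` placeholders) in place of Jinja, so the template syntax differs from what the PR uses. The `Card` dataclass, the `create_card` function, the default template, and the 140-character summary cut-off are all assumptions for illustration.

```python
import json
from dataclasses import dataclass
from string import Template
from typing import Optional

# Assumed default template; the PR's actual default template is not shown here.
DEFAULT_TEMPLATE = '{"summary": $summary, "indicator": "info", "detail": $detail}'


@dataclass
class Card:
    """Simplified, hypothetical stand-in for the CDSCard object."""
    summary: str
    indicator: str
    detail: str


def create_card(generated_text: Optional[str] = None,
                static_content: Optional[str] = None,
                template: str = DEFAULT_TEMPLATE) -> Card:
    """Render content into a JSON card template and parse it into a Card."""
    # Prefer model output; fall back to static content if none was generated.
    content = generated_text if generated_text is not None else static_content
    if content is None:
        raise ValueError("Provide either generated_text or static_content")
    # json.dumps quotes and escapes each value so the rendered string is valid JSON.
    rendered = Template(template).substitute(
        summary=json.dumps(content[:140]),
        detail=json.dumps(content),
    )
    return Card(**json.loads(rendered))


card = create_card(generated_text='Patient reports "mild" wheezing at night.')
```

A custom template is passed the same way, for example one that sets `"indicator": "warning"` instead of `"info"`.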
Integration components now accept `kwargs`.
Added a `._add_concepts_to_hc_doc()` helper method to `SpacyNLP`, which takes the entities from the spaCy doc, parses each into a `Concept`, and adds it to the `.concepts` attribute of `Document`. This is hard-coded to always add new concepts as SNOMED Problems for now, but will be made configurable in future.
Removed the default spaCy tokenizer in `TextPreprocessor`: it is redundant because `SpacyNLP` can be used instead. For better separation of concerns, this component is only for very simple text preprocessing: the default is `.split()`, but users can also pass in a tokenizer object (a `Callable`) to use with the component.
And finally, added the `MedicalCodingPipeline` and `SummarizationPipeline` implementations (`ner` or `summarization`, but no strict validation yet).
Testing
Added tests for:
`CdsCardCreator`: `test_card_creator.py`
`ModelRouter`: `test_modelrouter.py`
`.load()` method: `test_pipeline_load.py`
`test_medicalcoding.py`, `test_summarization.py`
`test_integrations.py`
`TextPreprocessor` initializes tokenizer object: `test_preprocessor.py`
`Document` methods: `test_containers.py`
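As an illustration of the `TextPreprocessor` tokenizer behaviour described under Changes Made (default `.split()`, or a user-supplied callable), here is a small sketch. The `SimpleTextPreprocessor` class and its `lowercase` option are hypothetical and are not the component's actual interface.

```python
import re
from typing import Callable, List, Optional


class SimpleTextPreprocessor:
    """Hypothetical preprocessor: whitespace split unless a tokenizer is given."""

    def __init__(self,
                 tokenizer: Optional[Callable[[str], List[str]]] = None,
                 lowercase: bool = False) -> None:
        if tokenizer is not None and not callable(tokenizer):
            raise TypeError("tokenizer must be a callable taking a str")
        # Fall back to str.split, which splits on runs of whitespace.
        self.tokenizer = tokenizer if tokenizer is not None else str.split
        self.lowercase = lowercase

    def __call__(self, text: str) -> List[str]:
        if self.lowercase:
            text = text.lower()
        return self.tokenizer(text)


default_tokens = SimpleTextPreprocessor()("Patient has  asthma")

# Any callable that maps a string to a list of tokens can be swapped in.
word_tokens = SimpleTextPreprocessor(
    tokenizer=lambda t: re.findall(r"\w+", t)
)("BP: 120/80")
```

A test in this style only needs to check that the stored tokenizer is the one passed in, and that the default falls back to whitespace splitting.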
Documentation