-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Valuset builder #298
Merged
Merged
Valuset builder #298
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
e422128
Valuset builder
dogversioning 9073825
PR feedback
dogversioning 01e05c8
Unit test cleanup
dogversioning f739396
removed unused test table
dogversioning be75cf4
Uprev min DuckDB to 1.1
dogversioning 2d74329
coverage
dogversioning 4d01bad
more coverage
dogversioning File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,9 @@ | ||
"""Package metadata""" | ||
|
||
from .base_utils import StudyConfig | ||
from .builders.base_table_builder import BaseTableBuilder | ||
from .builders.counts import CountsBuilder | ||
from .study_manifest import StudyManifest | ||
from cumulus_library.base_utils import StudyConfig | ||
from cumulus_library.builders.base_table_builder import BaseTableBuilder | ||
from cumulus_library.builders.counts import CountsBuilder | ||
from cumulus_library.study_manifest import StudyManifest | ||
|
||
__all__ = ["BaseTableBuilder", "CountsBuilder", "StudyConfig", "StudyManifest"] | ||
__version__ = "3.1.0" | ||
__version__ = "4.0.0" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
133 changes: 133 additions & 0 deletions
133
cumulus_library/builders/valueset/additional_rules_builder.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,133 @@ | ||
"""Builder for generating subsets of RxNorm data from a given valueset""" | ||
|
||
import pathlib | ||
|
||
from cumulus_library import BaseTableBuilder, base_utils, study_manifest | ||
from cumulus_library.builders.valueset import valueset_utils | ||
from cumulus_library.template_sql import base_templates | ||
|
||
|
||
class AdditionalRulesBuilder(BaseTableBuilder): | ||
display_text = "Generating rulesets..." | ||
base_path = pathlib.Path(__file__).resolve().parent | ||
|
||
def prepare_queries( | ||
self, | ||
*args, | ||
config: base_utils.StudyConfig, | ||
manifest: study_manifest.StudyManifest, | ||
valueset_config: valueset_utils.ValuesetConfig, | ||
**kwargs, | ||
): | ||
study_prefix = manifest.get_prefix_with_seperator() | ||
table_prefix = "" | ||
if valueset_config.table_prefix: | ||
table_prefix = f"{valueset_config.table_prefix}_" | ||
self.queries.append( | ||
base_templates.get_base_template( | ||
"create_search_rules_descriptions", | ||
self.base_path / "template_sql", | ||
study_prefix=study_prefix, | ||
table_prefix=table_prefix, | ||
) | ||
) | ||
self.queries.append( | ||
base_templates.get_create_table_from_tables( | ||
table_name=f"{study_prefix}{table_prefix}potential_rules", | ||
# From a domain logic perspective, the _rela table is | ||
# the leftmost table and we're annotating with the | ||
# data from rxnconso. Since rxnconso is much, much | ||
# larger, we're moving it to the left in the actual | ||
# constructed join for athena performance reasons | ||
tables=[ | ||
f"{study_prefix}{table_prefix}all_rxnconso_keywords", | ||
f"{study_prefix}{table_prefix}rela", | ||
], | ||
table_aliases=["r", "s"], | ||
columns=[ | ||
"s.rxcui", | ||
"r.rxcui", | ||
"s.tty", | ||
"r.tty", | ||
"s.rui", | ||
"s.rel", | ||
"s.rela", | ||
"s.str", | ||
"r.str", | ||
"r.keyword", | ||
], | ||
column_aliases={ | ||
"s.rxcui": "rxcui1", | ||
"s.tty": "tty1", | ||
"s.str": "str1", | ||
"r.rxcui": "rxcui2", | ||
"r.tty": "tty2", | ||
"r.str": "str2", | ||
}, | ||
join_clauses=[ | ||
"s.rxcui2 = r.rxcui", | ||
( | ||
"s.rxcui2 NOT IN (SELECT DISTINCT RXCUI FROM " # noqa: S608 | ||
f"{study_prefix}{table_prefix}rxnconso_keywords)" | ||
), | ||
], | ||
) | ||
) | ||
self.queries.append( | ||
base_templates.get_create_table_from_tables( | ||
table_name=f"{study_prefix}{table_prefix}included_rels", | ||
tables=[ | ||
f"{study_prefix}{table_prefix}potential_rules", | ||
f"{study_prefix}{table_prefix}search_rules", | ||
], | ||
table_aliases=["r", "e"], | ||
columns=[ | ||
"r.rxcui1", | ||
"r.rxcui2", | ||
"r.tty1", | ||
"r.tty2", | ||
"r.rui", | ||
"r.rel", | ||
"r.rela", | ||
"r.str1", | ||
"r.str2", | ||
"r.keyword", | ||
], | ||
join_clauses=[ | ||
"r.REL NOT IN ('RB', 'PAR')", | ||
"e.include = TRUE", | ||
"r.TTY1 = e.TTY1", | ||
"r.TTY2 = e.TTY2", | ||
"r.RELA = e.RELA", | ||
], | ||
) | ||
) | ||
self.queries.append( | ||
base_templates.get_base_template( | ||
"create_included_keywords", | ||
self.base_path / "template_sql", | ||
study_prefix=study_prefix, | ||
table_prefix=table_prefix, | ||
) | ||
) | ||
self.queries.append( | ||
base_templates.get_create_table_from_union( | ||
table_name=f"{study_prefix}{table_prefix}combined_ruleset", | ||
tables=[ | ||
f"{study_prefix}{table_prefix}included_keywords", | ||
f"{study_prefix}{table_prefix}included_rels", | ||
], | ||
columns=[ | ||
"rxcui1", | ||
"rxcui2", | ||
"tty1", | ||
"tty2", | ||
"rui", | ||
"rel", | ||
"rela", | ||
"str1", | ||
"str2", | ||
"keyword", | ||
], | ||
) | ||
) |
98 changes: 98 additions & 0 deletions
98
cumulus_library/builders/valueset/lookup_drug_from_ingredient_rules.tsv
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,98 @@ | ||
TTY1 RELA TTY2 INCLUDE | ||
BN reformulated_to BN Yes | ||
BN reformulation_of BN Yes | ||
BN tradename_of IN Keyword | ||
BN has_precise_ingredient PIN Keyword | ||
BN ingredient_of SBD Yes | ||
BN ingredient_of SBDC Yes | ||
BN ingredient_of SBDF Yes | ||
BN ingredient_of SBDG Yes | ||
BPCK has_dose_form DF No | ||
BPCK tradename_of GPCK Yes | ||
BPCK contains SBD Yes | ||
BPCK contains SCD Yes | ||
DF No | ||
DFG No | ||
GPCK has_tradename BPCK Yes | ||
GPCK has_dose_form DF No | ||
GPCK contains SCD Keyword | ||
IN has_tradename BN Keyword | ||
IN part_of MIN Keyword | ||
IN has_form PIN Keyword | ||
IN ingredient_of SCDC Keyword | ||
IN ingredient_of SCDF Keyword | ||
IN ingredient_of SCDG Keyword | ||
IN boss_of SCDFP Keyword | ||
MIN has_part IN Keyword | ||
MIN has_part PIN Keyword | ||
MIN ingredients_of SCD Yes | ||
PIN precise_ingredient_of BN Yes | ||
PIN form_of IN Keyword | ||
PIN part_of MIN Yes | ||
PIN precise_ingredient_of SCDC Yes | ||
PIN boss_of SCDFP Yes | ||
SBD has_ingredient BN Yes | ||
SBD contained_in BPCK Yes | ||
SBD has_dose_form DF No | ||
SBD quantified_form_of SBD Yes | ||
SBD has_quantified_form SBD Yes | ||
SBD consists_of SBDC Yes | ||
SBD isa SBDF Keyword | ||
SBD isa SBDFP Keyword | ||
SBD isa SBDG Keyword | ||
SBD tradename_of SCD Yes | ||
SBD consists_of SCDC Yes | ||
SBDC has_ingredient BN Yes | ||
SBDC constitutes SBD Yes | ||
SBDC tradename_of SCDC Yes | ||
SBDF has_ingredient BN Yes | ||
SBDF has_dose_form DF No | ||
SBDF inverse_isa SBD Yes | ||
SBDF isa SBDG Yes | ||
SBDF tradename_of SCDF Yes | ||
SBDF has_form SBDFP Yes | ||
SBDFP form_of SBDF Yes | ||
SBDFP tradename_of SCDFP Yes | ||
SBDFP inverse_isa SBD Yes | ||
SBDG has_ingredient BN Yes | ||
SBDG has_doseformgroup DFG No | ||
SBDG inverse_isa SBD Yes | ||
SBDG inverse_isa SBDF Yes | ||
SBDG tradename_of SCDG Keyword | ||
SBDG tradename_of SCDGP Yes | ||
SCD contained_in BPCK Yes | ||
SCD has_dose_form DF No | ||
SCD contained_in GPCK Yes | ||
SCD has_ingredients MIN Yes | ||
SCD has_tradename SBD Yes | ||
SCD quantified_form_of SCD Yes | ||
SCD has_quantified_form SCD Yes | ||
SCD consists_of SCDC Yes | ||
SCD isa SCDF Keyword | ||
SCD isa SCDG Keyword | ||
SCD isa SCDFP Yes | ||
SCDC has_ingredient IN Keyword | ||
SCDC has_precise_ingredient PIN Keyword | ||
SCDC constitutes SBD Yes | ||
SCDC has_tradename SBDC Yes | ||
SCDC constitutes SCD Yes | ||
SCDF has_dose_form DF No | ||
SCDF has_ingredient IN Keyword | ||
SCDF has_tradename SBDF Keyword | ||
SCDF inverse_isa SCD Yes | ||
SCDF isa SCDG Keyword | ||
SCDF has_form SCDFP Yes | ||
SCDFP inverse_isa SCD Yes | ||
SCDFP has_tradename SBDFP Yes | ||
SCDFP form_of SCDF Yes | ||
SCDFP isa SCDGP Yes | ||
SCDFP has_boss IN Keyword | ||
SCDFP has_boss PIN Keyword | ||
SCDG has_doseformgroup DFG No | ||
SCDG has_ingredient IN Keyword | ||
SCDG has_tradename SBDG Keyword | ||
SCDG inverse_isa SCD Yes | ||
SCDG inverse_isa SCDF Yes | ||
SCDGP has_tradename SBDG Keyword | ||
SCDGP inverse_isa SCDFP Yes | ||
SCDGP form_of SCDG Keyword |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: your own life might be better if you re-exported all the individual builder classes in
builders/__init__.py
, so that you could do the following:This also avoids some churn as you add/remove/rename builders.
(I've wound up on this pattern in the ETL anyway - all the cumulus_etl.* modules are toplevel concepts like "loaders" or "nlp" and then those modules are little fiefdoms that export their internals out to the top level. Helps me mentally map the code structure too, to keep semantic meaning one level deep. I don't always hit that mark, but I try 😄)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm slowly coming around to this view of things - i may do it in a separate PR for my own sanity.