-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generic populated field detection #196
Conversation
9769269
to
f694f72
Compare
target_field: str, | ||
expected=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some part of these two params is still lightly confusing to me. Maybe it just needs some explanation in the docstring?
target_field
is required, but only used if expected
is not provided. Is it better to require a full expected list all the time? Or always add target_field
to the expected list regardless of whether it was provided or not? The current approach just feels a little surprise-inducing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is back to 'the default is to treat the field as if it's a codeableconcept, unless you specify otherwise', which the more i think about it, the worse it feels.
i could've sworn i put a docstring on this, but might've lost it switching branches - i'll get something in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, added a docstring for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah that helps, but it's a little odd now.
:param target_field: The name of the nested value to check for inside the
schema of source_col
:param expected: a list of elements that should be present in target_field.
If none, we assume it is a CodeableConcept.
target_field
actually isn't used at all in the current code and should likely be removed from the args list. And expected
's docstring should probably say "present in source_col" - cause it's frequently looking for fields anywhere up and down the hierarchy (like codeable concepts looking for both coding and system)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, updated this docstring again - too many moving targets in this thing.
c86bfa8
to
81ad97a
Compare
source_table="documentreference", | ||
source_id="id", | ||
column_name="content.format", | ||
is_array=False, | ||
column_hierarchy=[("format", dict), ("content", dict)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is flipped, right? If so, you should probably add a unit test that would catch that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well a test would have caught it if the test data had been correct, so that should be addressed now.
column_name="code", | ||
is_array=False, | ||
column_hierarchy=[("code", dict)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this consolidation of arguments makes sense, yeah.
nit: maybe now that there's less overload on the word, you could just use columns
instead of column_hierarchy
which is easy to say or type wrong. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't like columns
because ordering is important and not implicit in that name. open to other ideas.
parent_field=None | ||
if len(config.column_hierarchy) == 1 | ||
else config.column_hierarchy[0][0], | ||
is_array=True if config.column_hierarchy[0][1] == list else False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True if bool_expression else False
is a verbose way to say bool_expression
. I'm guessing it looked odd to have a bare equality check in the middle of the kwargs. Maybe throw a parens around it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personally for something inline i find this verbose form more readable, but i can do this.
column_name=config.column_hierarchy[1][0], | ||
parent_field=config.column_hierarchy[0][0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is uncomfortably hardcoded - you always have exactly depth=2 calls into this method? I get that it's a temporary function, but still.
How about throwing an assert at the top of this method for that assumption, to safeguard against accidents during the temporary period.
(And likewise, in get_codeable_concept_denormalize_query
above, maybe assert len(config.column_hierarchy) <= 2
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently it's used in one place. but i can block with assert for now.
|
||
@dataclass(kw_only=True) | ||
class FhirObjectDNConfig(abc.ABC): | ||
"""ABC for handling table detection/denormalization of objects in SQL""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
micro nit: ABC is a bit jargony for a docstring, to my eyes. Especially because it can be displayed in contexts removed from the abc.ABC
inheritance a line above.
Speaking of jargon, I'm lightly leery of encoding DN
in this class name.
I don't find "denormalization" to be super evocative. And definitely not on the tip of my tongue enough to know what DN means without seeing the longer word right next to it. Like, I only know what you mean by denormalization from context. If I look it up online, I get stuff about injecting duplicate data into databases (which I guess the unnesting process does do, in the columns you don't care about, but that's not like the main thrust of it in my mind). "Flattening" feels more natural of a description to me. Or simply call it by its keyword of "unnesting." But maybe there's a yet better word.
But already in this docstring you have two use cases mentioned - schema detection and denormalization, but DN gets the honor of being in the class name. Anyway, not important - and you can keep using denormalization for it. Just speaking as a filthy casual over here, that it's not an intuitive word to my ears.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed this to say 'base class' and named the thing a BaseConfig
.
target_field: str, | ||
expected=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah that helps, but it's a little odd now.
:param target_field: The name of the nested value to check for inside the
schema of source_col
:param expected: a list of elements that should be present in target_field.
If none, we assume it is a CodeableConcept.
target_field
actually isn't used at all in the current code and should likely be removed from the args list. And expected
's docstring should probably say "present in source_col" - cause it's frequently looking for fields anywhere up and down the hierarchy (like codeable concepts looking for both coding and system)
7d5d81c
to
dcd9de1
Compare
This PR makes the following changes:
Checklist
docs/
) needs to be updated