-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pushing bug fix for metacat #487
Conversation
2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the lazy logging formatting.
Also, perhaps that should be a logger.debug
instead?
PS:
At initial look it seems like you're changing the order of the returned arguments in the method as well as when unpacking. And that wouldn't change the behaviour. But for the lines changed, that is the intention.
What the change actually does is force the use of encode_category_values
in line 354 to use the full data rather than the undersampled data.
So there's a little bit of a confusing path of fixing an issue. In order to fix an issue of unpacking data in line 354, the order of the returned arguments is changed everywhere else.
But this order probably does make sense overall.
medcat/utils/meta_cat/data_utils.py
Outdated
@@ -210,6 +210,8 @@ def encode_category_values(data: Dict, existing_category_value2id: Optional[Dict | |||
for i in range(len(data)): | |||
if data[i][2] in category_value2id.values(): | |||
label_data_[data[i][2]] = label_data_[data[i][2]] + 1 | |||
|
|||
logger.info(f"Original label_data: {label_data_}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to use lazy formatting for logging. So instead of an f-string, use %s
formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes I'll fix that!
Regarding the ordering of unpacking, I was going to only change the order for link 354, however Tom pointed out that it would make the most sense to have it full_data
and then undersampled_data
That's why the long route
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I make the change for all logging statements to use lazy logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally don't want to change the purpose of a PR.
But I don't think there's too many. So why not.
As for unpacking - yes, I agree. This order makes more sense in general. Just thought I'd mention it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lazy logging means allowing the logger to parse in the %
formats if (and only if) the logged message has a handler (i.e if it needs to be shown/written somewhere).
So a messages gets logged as follows:
logger.info("Messages about %s and nr %d", "topic 1", 4)
Fixed the lazy logging formatting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
* Pushing bug fix for metacat 2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data * Pushing change for lazy logging * Pushing update for lazy logging * Pushing lint fix
* Pushing bug fix for metacat 2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data * Pushing change for lazy logging * Pushing update for lazy logging * Pushing lint fix
2-phase learning for MetaCAT utilises
data_undersampled
.Fixed a bug in the
eval
function, which was incorrectly using thedata_undersampled
instead of thefull_data