Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pushing bug fix for metacat #487

Merged
merged 4 commits into from
Sep 5, 2024
Merged

Pushing bug fix for metacat #487

merged 4 commits into from
Sep 5, 2024

Conversation

shubham-s-agarwal
Copy link
Collaborator

2-phase learning for MetaCAT utilises data_undersampled.
Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data

2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data
Copy link
Collaborator

@mart-r mart-r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the lazy logging formatting.
Also, perhaps that should be a logger.debug instead?

PS:
At initial look it seems like you're changing the order of the returned arguments in the method as well as when unpacking. And that wouldn't change the behaviour. But for the lines changed, that is the intention.
What the change actually does is force the use of encode_category_values in line 354 to use the full data rather than the undersampled data.
So there's a little bit of a confusing path of fixing an issue. In order to fix an issue of unpacking data in line 354, the order of the returned arguments is changed everywhere else.
But this order probably does make sense overall.

@@ -210,6 +210,8 @@ def encode_category_values(data: Dict, existing_category_value2id: Optional[Dict
for i in range(len(data)):
if data[i][2] in category_value2id.values():
label_data_[data[i][2]] = label_data_[data[i][2]] + 1

logger.info(f"Original label_data: {label_data_}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to use lazy formatting for logging. So instead of an f-string, use %s formatting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes I'll fix that!
Regarding the ordering of unpacking, I was going to only change the order for link 354, however Tom pointed out that it would make the most sense to have it full_data and then undersampled_data
That's why the long route

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I make the change for all logging statements to use lazy logging?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally don't want to change the purpose of a PR.
But I don't think there's too many. So why not.

As for unpacking - yes, I agree. This order makes more sense in general. Just thought I'd mention it.

@shubham-s-agarwal shubham-s-agarwal marked this pull request as draft September 4, 2024 12:44
Copy link
Collaborator

@mart-r mart-r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lazy logging means allowing the logger to parse in the % formats if (and only if) the logged message has a handler (i.e if it needs to be shown/written somewhere).

So a messages gets logged as follows:

logger.info("Messages about %s and nr %d", "topic 1", 4)

@shubham-s-agarwal shubham-s-agarwal marked this pull request as ready for review September 4, 2024 14:06
@shubham-s-agarwal
Copy link
Collaborator Author

Fixed the lazy logging formatting

Copy link
Collaborator

@mart-r mart-r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@shubham-s-agarwal shubham-s-agarwal merged commit 6127f77 into master Sep 5, 2024
8 checks passed
@shubham-s-agarwal shubham-s-agarwal deleted the metacat_bug_fix branch September 5, 2024 08:12
mart-r pushed a commit to mart-r/MedCAT that referenced this pull request Oct 14, 2024
* Pushing bug fix for metacat

2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data

* Pushing change for lazy logging

* Pushing update for lazy logging

* Pushing lint fix
mart-r pushed a commit that referenced this pull request Oct 14, 2024
* Pushing bug fix for metacat

2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data

* Pushing change for lazy logging

* Pushing update for lazy logging

* Pushing lint fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants