Extending Problem Catalogue - Limits on complexity ? #78

Open
atiro opened this issue Jul 18, 2021 · 0 comments

atiro commented Jul 18, 2021

I'm wondering how complex the problem catalogue rule code should get, or whether it's better to leave some of these calculations to a later stage. For example, identifying:

  • object types that appear in both singular and plural forms,
  • object types that appear in both upper- and lower-case forms,
  • object types that are aggregations of other object types (e.g. "watercolour and ink drawing").
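As a rough sketch of the first two checks in Pandas: group the distinct values by a normalised key and report any key shared by more than one spelling. The column name `objectType`, the sample rows, and the helper names are hypothetical, and the plural rule is a deliberately naive English heuristic (drop one trailing "s" unless it follows another "s"), so it will miss "-es" and irregular plurals.

```python
import io
import re
from typing import Callable, Dict, List

import pandas as pd

# Hypothetical extract of MQAF's CSV output; "objectType" is an assumed column name.
CSV_TEXT = """objectType
Drawing
drawing
drawings
Print
prints
watercolour and ink drawing
Glass
"""


def variant_groups(values: pd.Series, normalise: Callable[[str], str]) -> Dict[str, List[str]]:
    """Return {normalised key: distinct spellings} for keys shared by more than one spelling."""
    distinct = pd.Series(values.dropna().unique())
    keys = distinct.map(normalise)
    # How many distinct spellings map onto each normalised key.
    shared = keys.map(keys.value_counts()) > 1
    return {k: sorted(g) for k, g in distinct[shared].groupby(keys[shared])}


def case_key(value: str) -> str:
    return value.lower()


def naive_plural_key(value: str) -> str:
    # Naive heuristic (an assumption): lower-case, then drop one trailing "s" not preceded by "s".
    return re.sub(r"(?<!s)s$", "", value.lower())


df = pd.read_csv(io.StringIO(CSV_TEXT))
print(variant_groups(df["objectType"], case_key))
print(variant_groups(df["objectType"], naive_plural_key))
```

Because the plural key is built on top of the lower-cased value, the plural check also folds in case differences; keeping the two checks separate would mean choosing a plural key that preserves case.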

These are relatively simple to do in Pandas (well, except the last one) by reading the CSV output from MQAF with the relevant field extracted. They could also be added to MQAF as problem catalogue rules, but it feels like that would mean writing a lot of code to replicate the kind of large-scale data analysis that Pandas already handles very well.
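For the aggregation case, one crude starting point (a heuristic assumed here, not an existing MQAF rule) is to flag any value that contains another known object type as a whole word, so "watercolour and ink drawing" is linked to "drawing" if that also appears as a value. The function name is hypothetical. It compares every pair of distinct values, so its cost grows quadratically with the number of distinct object types.

```python
import re
from typing import Dict, Iterable, List

import pandas as pd


def find_compounds(values: Iterable) -> Dict[str, List[str]]:
    """Map each lower-cased value to the other known values it contains as whole words."""
    # Skip missing values (NaN/None) and blanks; compare case-insensitively.
    distinct = sorted({v.strip().lower() for v in values if isinstance(v, str) and v.strip()})
    result: Dict[str, List[str]] = {}
    for candidate in distinct:
        parts = [
            other
            for other in distinct
            if other != candidate and re.search(rf"\b{re.escape(other)}\b", candidate)
        ]
        if parts:
            result[candidate] = parts
    return result


# Hypothetical sample values standing in for an extracted MQAF column.
sample = pd.Series(["Drawing", "drawings", "watercolour and ink drawing", "Ink", "print", None])
print(find_compounds(sample))
```

The word-boundary match keeps "drawing" from matching inside "drawings", but the heuristic still cannot tell a true aggregation from a compound name that merely contains another term.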

I'm trying to decide where to draw the line. Maybe if a rule requires the entire dataset to be in memory, that's where something like Pandas should come in?
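One data point for where that line might fall: the case check does not need every row in memory, only the set of distinct values seen so far. The sketch below (hypothetical function and column names) does it in a single pass with Python's standard `csv` module, so memory grows with the number of distinct values rather than the number of records. The aggregation check, by contrast, needs the complete set of distinct values before any comparison can be made.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def stream_case_variants(rows: Iterable[dict], column: str) -> Dict[str, List[str]]:
    """Single pass over rows; keeps only the distinct spellings per lower-cased key."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        value = (row.get(column) or "").strip()
        if value:
            seen[value.lower()].add(value)
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# csv.DictReader yields one row at a time, so the whole file is never loaded at once.
sample = io.StringIO("objectType\nDrawing\ndrawing\nPrint\ndrawing\n")
print(stream_case_variants(csv.DictReader(sample), "objectType"))
```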
