I wondered how complex the problem catalogue rules code should get, or whether it's better to leave some of these calculations to a later stage. For example, to identify:
- object types that appear in both singular and plural form,
- object types that appear in both upper and lower case form,
- object types that are aggregations of other object types (e.g. "watercolour and ink drawing").
These are relatively simple to do in Pandas (well, except the last one) by reading the CSV output from MQAF with the relevant field extracted. They could also be added to MQAF as problem catalogue rules, but it feels like that would mean writing a lot of code to replicate large-scale data analysis that Pandas already handles very well.
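As a rough illustration of the Pandas route, here is a minimal sketch. It assumes the MQAF CSV has one row per record with the extracted value in a column called `objectType`; that column name and the inline sample data are hypothetical. The plural check is a naive "add an s" heuristic and the aggregation check only flags values containing " and ", which is why the last case is the hard one.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV exported by MQAF; the "objectType"
# column name is an assumption, not MQAF's actual output format.
CSV = """recordId,objectType
1,drawing
2,Drawing
3,drawings
4,watercolour
5,watercolour and ink drawing
6,print
"""

df = pd.read_csv(io.StringIO(CSV))
types = df["objectType"].dropna().str.strip()
lower = types.str.lower()

# Upper/lower case: group the original spellings by their lowercased form;
# more than one distinct spelling means the type occurs in several cases.
case_variants = types.groupby(lower).unique()
case_variants = case_variants[case_variants.map(len) > 1]

# Singular/plural: naive heuristic, only catches plurals formed with "s".
distinct = set(lower)
plural_pairs = sorted((v, v + "s") for v in distinct if v + "s" in distinct)

# Aggregations: naive heuristic, just flags values joined with " and ".
aggregates = sorted(v for v in distinct if " and " in v)

print(case_variants.to_dict())
print(plural_pairs)
print(aggregates)
```

A real version would need a proper pluraliser and a smarter way of splitting compound types, but the grouping itself is only a couple of lines.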
I'm trying to decide where the line should be drawn. Maybe if a rule requires the entire dataset to be in memory, that's where something like Pandas should come in?
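One possible refinement of that criterion, offered as a sketch rather than anything MQAF does today: some cross-record rules only need the set of distinct values, not every record. The case check, for instance, can be done in a single streaming pass that keeps just the spellings seen per lowercased value. The function name and the `objectType` column are hypothetical.

```python
import csv
import io
from collections import defaultdict


def case_variants_streaming(lines, column="objectType"):
    """Hypothetical helper: one pass over the CSV rows, keeping only the
    distinct spellings per lowercased value rather than the records."""
    spellings = defaultdict(set)  # lowercased value -> original spellings seen
    for row in csv.DictReader(lines):
        value = (row.get(column) or "").strip()
        if value:
            spellings[value.lower()].add(value)
    # Keep only values that were seen with more than one spelling.
    return {key: sorted(seen) for key, seen in spellings.items() if len(seen) > 1}


sample = io.StringIO("recordId,objectType\n1,drawing\n2,Drawing\n3,print\n")
print(case_variants_streaming(sample))  # {'drawing': ['Drawing', 'drawing']}
```

Memory then grows with the number of distinct object types rather than the number of records, which might be a more useful place to draw the line than "needs the whole dataset".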