Valency: do not show duplicate instances #955
Labels
backend
enhancement
Based on a discussion with J. Normanskaya, we should hide duplicate instances on the valency instance approval page (/valency).
Consider a use case in which a user slightly modifies a source document from the corpus, adds the modified version to the corpus, parses it and then updates the valency data. Depending on the extent of the changes, a considerable number of the new instances may duplicate already existing instances. Even if the user deletes the previous version of the document, we still have to show any approved instances sourced from it, so duplicate instances will remain; see #775 (comment).
Instances are considered duplicates if they are in the same position in the same sentence of the same source, where a sentence is identified by its sequence of tokens together with their parsed attributes, including grammar.
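The duplicate criterion above can be turned into a single comparable key by hashing a deterministic serialization of (source, sentence, position). The sketch below is illustrative only: `Token`, `instance_hash`, the `source_id` string and the use of a token index as the instance "position" are all hypothetical assumptions, not the project's actual data model.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Token:
    """Hypothetical parsed token: surface form plus parser-assigned attributes."""
    text: str
    # e.g. {"lemma": "read", "gr": "V,praes,sg,3"}; grammar is part of the identity
    attributes: Dict[str, str] = field(default_factory=dict)


def instance_hash(source_id: str, sentence: List[Token], position: int) -> str:
    """Hash identifying an instance by source, sentence (tokens with attributes) and position.

    Assumption: `position` is the index of the instance's token within the sentence.
    """
    key = {
        "source": source_id,
        # Sort attribute items so that dict insertion order does not affect the hash
        "sentence": [[t.text, sorted(t.attributes.items())] for t in sentence],
        "position": position,
    }
    # sort_keys and fixed separators make the serialization deterministic
    payload = json.dumps(key, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two instances whose sentences have identical tokens and attributes get the same hash, while a change in any grammar tag, in the position or in the source yields a different one.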
A possible solution is to store instance hashes and filter out duplicates when querying the DB, keeping for each group of duplicates the earliest instance from the earliest source.
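The filtering step could be done in the query itself, so that duplicates are never loaded. The following is a minimal sketch using Python's built-in `sqlite3` with a hypothetical `instance` table (`id`, `source_id`, precomputed `hash`); it assumes that a smaller `source_id` means an earlier source and a smaller `id` an earlier instance, which may not match the real schema. A correlated `NOT EXISTS` keeps a row only if no row with the same hash precedes it.

```python
import sqlite3

# Hypothetical schema: one row per valency instance with a precomputed hash
SCHEMA = """
CREATE TABLE instance (
    id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL,
    hash TEXT NOT NULL
);
CREATE INDEX instance_hash_idx ON instance (hash);
"""

# Keep an instance only if no other instance with the same hash comes from an
# earlier source, or from the same source with a smaller id.
DEDUP_QUERY = """
SELECT i.id, i.source_id, i.hash
FROM instance AS i
WHERE NOT EXISTS (
    SELECT 1 FROM instance AS j
    WHERE j.hash = i.hash
      AND (j.source_id < i.source_id
           OR (j.source_id = i.source_id AND j.id < i.id))
)
ORDER BY i.source_id, i.id
"""


def unique_instances(conn: sqlite3.Connection) -> list:
    """Return one representative (the earliest) instance per hash."""
    return conn.execute(DEDUP_QUERY).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO instance (id, source_id, hash) VALUES (?, ?, ?)",
        [(1, 1, "a"), (2, 1, "b"), (3, 2, "a"), (4, 2, "c")],
    )
    print(unique_instances(conn))  # [(1, 1, 'a'), (2, 1, 'b'), (4, 2, 'c')]
```

Duplicates are kept in the table rather than rejected by a unique constraint, since approved instances from deleted or older documents must still be shown; the deduplication only affects what the approval page lists.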