Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table "keywords" lacks UNIQUE constraint on field "value" #235

Open
peterjdann opened this issue Jun 28, 2024 · 4 comments
Open

Table "keywords" lacks UNIQUE constraint on field "value" #235

peterjdann opened this issue Jun 28, 2024 · 4 comments
Labels

Comments

@peterjdann
Copy link
Contributor

It should not be possible, even accidentally or by the result of a programming error, to add an entry to table "keywords" that duplicates a value already present.

You have check the number of duplicate values in that table by running the following SQL:

SELECT value, COUNT(value)
FROM keywords
GROUP BY value
HAVING COUNT(value) > 1

In the scrubbed development database, this currently shows three duplicate values.

If/when administrators become more conscious of what keywords have been applied to audiobooks as a result of mooted changes that are currently the subject of two pull requests, it is possible they may seek to edit the contents of some keywords and, in the process (depending on what safeguards the Librivox software implements short of a DB field constraint) they may be able, in the process, to create further duplicates in this table, thus rendering any search-by-keywords process inaccurate.

@redrun45
Copy link
Collaborator

Hmm... our database contains a number of things that were put in, before the code was updated to disallow them in future. If the current code fails to detect existing keywords, so that it creates duplicates, I would expect to see more than 3.

Are you able to recreate whatever bug this was, and trick the code into making duplicates now?

@redrun45
Copy link
Collaborator

*reproduce, I mean, not recreate. Are you able to reproduce whatever conditions caused duplicate keywords, so that the current code will make more?

@peterjdann
Copy link
Contributor Author

The current code for managing potential duplicate keywords when editing a project works very well indeed at preventing duplicates. Nonetheless, given the number of obvious typos in our keywords values, it might at some stage in the future be tempting to "fix" this problem by directly (not programmatically) editing the database entries. I was surprised to find that "mere" software was being used to protect the integrity of this database column, without any second line of defence, as it were, in the table definition itself. As well as acting as a second layer of defence, setting a UNIQUE constraint on the keywords->value column could serve a useful documentary function.

I can see this is not an urgent matter, however, if it warrants attention at all.

@redrun45
Copy link
Collaborator

That being the case, it's at least worth having as a warning note, for next time we touch the relevant code. Perhaps at some point, a manual schema change like this will be convenient to do along with some other update process. Now that you've pointed it out, it's at least on the radar. 😉

Remembering that this code and schema was created by a paid professional, you may well find the number of irregularities surprising. That all of those irregularities have not been fixed by the very few volunteers who have stopped by over the years, and usually without having this particular professional domain knowledge, should not be surprising. A lot of work goes into managing even the smallest change. Myself, I would struggle to motivate myself for much of what I do, if I didn't get to see people benefit from it directly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants