implementing a database interface #24
After some reading, I noticed that TinyDB is restricted to single-threaded operation. While that is not in itself a problem, it might be worthwhile to look into an alternative such as SQLite, which could be useful in the long run if multithreading becomes necessary. SQLite would be slightly more future-proof while (hopefully) not requiring as much time to implement as MongoDB. SQLite also touts itself as being better equipped to handle tables/graphs, though it is unclear what those additional benefits are. If SQLite turns out to be the inferior choice after further investigation, implementing TinyDB sounds like a worthwhile addition!
SQLite seems somewhat easier than setting up MongoDB and somewhat harder than working with TinyDB. Since the UI components representing/updating the data will need to be changed anyway, having support for multi-threaded access is a nice option to have. Not sure if it will be necessary or appropriate to use, but nice to have. At any rate, learning some SQL is probably a good thing, just on principle. Still undecided; will do some more pondering. Related reading:
```python
import json
import sqlite3

# Store dicts/lists as JSON text in a column declared as 'json1',
# converting transparently on insert/select.
sqlite3.register_converter('json1', json.loads)
sqlite3.register_adapter(dict, json.dumps)
sqlite3.register_adapter(list, json.dumps)

db = sqlite3.connect(
    ':memory:', detect_types=sqlite3.PARSE_DECLTYPES
)
with db:
    db.execute(
        'create table docs (i integer primary key, d json1)'
    )

doc1 = {'a': 1, 'b': ['x', 'y', 'z'], 'c': None}
db.execute('insert into docs (d) values (?)', (doc1,))
doc2 = db.execute('select d from docs').fetchone()[0]
assert doc1 == doc2
```
the issue
ScaleWiz currently manages data storage by looking for JSON files.
It performs some bulk data mutation and serialization occasionally.
When updates to data do occur, the entire file must be rewritten and read back in. This is inefficient.
Further, the lack of a centralized store means data can be duplicated 2-3 times in memory (`TestHandler`, `ProjectEditor`, `EvaluationWindow`), with each copy pending mutation by some UI widget. Problematic operations like this are currently disabled at the GUI layer. Competing edits could result in data loss, so it would be better to just use a real database.
Data security here should be worth the overhead of having to make database retrievals/insertions.
possible solutions
The two best options appear to be MongoDB and TinyDB.
MongoDB would be the more comprehensive, robust option. However, it requires talking to an actual instance of MongoDB, either locally or remotely. This makes ScaleWiz harder to install/set up initially, with some potential long-term benefits.
TinyDB is a less comprehensive option, but is user-friendly and easily extensible. It is pure Python and doesn't require talking to an external process / server instance. ScaleWiz could still simply be pip installed.
Both options work by serializing data in a very dict-compatible format: MongoDB uses BSON and TinyDB uses JSON (or YAML, etc.). ScaleWiz models are already stored as JSON, so migrating should be pretty easy. Migrating from TinyDB to something like MongoDB should be similarly easy if the need arises.
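A quick illustration of that dict-compatibility (the document fields below are invented for the example, not ScaleWiz's actual model):

```python
import json

# A ScaleWiz model serialized as JSON becomes a plain dict once loaded --
# exactly the shape a TinyDB or MongoDB document takes.
raw = '{"name": "test project", "tests": [{"name": "blank"}]}'
doc = json.loads(raw)
assert isinstance(doc, dict)

# from here, either backend accepts the dict more or less directly:
#   TinyDB:  db.insert(doc)
#   MongoDB: collection.insert_one(doc)
```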
the work
Resolving this issue has a couple of implementation goals.
structure
One of the first and most important implementation details to sort out will be how to actually find the documents once they are in the database. Right now we rely on the user navigating the filesystem to show us where the relevant chunk of JSON is. This file's path is effectively its 'universal identifier'.
A possible solution is to add to each `Project` a `uuid`, which could be stored as a hex string such as `1d849bddb8d14f4da6743770033e463f`. We can generate these on the fly relatively cheaply, and they are almost guaranteed to be collision-free.
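Generating one is a stdlib one-liner:

```python
import uuid

# uuid4 is random; its hex form is a 32-character string like
# '1d849bddb8d14f4da6743770033e463f', suitable as a document key.
project_id = uuid.uuid4().hex
assert len(project_id) == 32
```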
A TinyDB instance could be defined in the `__init__.py` of the package for easy access. All modifications to it should be passed through the main tkinter thread, though (much like the GUI updates).
getting started will be kind of awkward. right now we rely on the user to navigate the filesystem to point us to the data; instead we will need to be able to query for it reliably. we also need to be able to relay a visual representation of the available data to the user, probably with something like a treeview
also see the difference between these possible schemas:
https://github.com/teauxfu/scalewiz/blob/5523105262126654d62f7c37ed70d81190302bf6/sampledb.json#L1-L42
https://github.com/teauxfu/scalewiz/blob/5523105262126654d62f7c37ed70d81190302bf6/sampledb.yaml#L1-L33
JSON vs YAML as a format
hooking up
the current data cycle is like
JSON file -> Python dict -> Python object (tkinter variable model) -> Python dict -> JSON
this isn't very elegant, but the tkinter variables are easy to add traces to (fire a callback when the variable is updated).
this affords some nice functionality (traces on `Test.name` and `Project.name`, for example); related #21
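A tkinter trace is essentially an observer callback fired on write. A rough stdlib analogue of the pattern (the `ObservableVar` name is hypothetical, not ScaleWiz or tkinter API):

```python
# Minimal observable value mimicking tkinter's StringVar.trace_add('write', ...)
class ObservableVar:
    def __init__(self, value=None):
        self._value = value
        self._callbacks = []

    def trace_add(self, callback):
        # register a callback fired on every write
        self._callbacks.append(callback)

    def get(self):
        return self._value

    def set(self, value):
        self._value = value
        for cb in self._callbacks:
            cb(value)

seen = []
name = ObservableVar('test project')
name.trace_add(seen.append)   # e.g. push the new value to the database
name.set('renamed project')
assert seen == ['renamed project']
```

With real tkinter variables, the same hook is `name.trace_add('write', callback)`, which is what makes the two-way flow to the database attractive.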
a possibly better two-way flow might be like
database (manages its own file, we don't care what kind) <-> Python object (tkinter vars)
a pain point in the UI previously has been keeping multiple copies of the same `Project` in memory, each with its own view, and having to do lots of file reading/writing to rebuild the UI:

- `TestHandler` must read experiment params from the `Project` and appends new `Test`s to the `Project`; it can create new `Project`s, but never independently mutates the metadata on the `Project` nor its existing `Test`s
- `ProjectEditor` can make new `Project`s and read/write all of a `Project`'s metadata; it cannot mutate existing `Test`s
- `EvaluationWindow` can read/write the list of `Test`s in a `Project`, and read/write a couple of `Project` metadata fields (`plot` and `path`, iirc)

the new database implementation should avoid duplicating the `Project` object in this way; instead, each component should query the database for what it needs to build the GUI

updates could be done immediately, or in batches by keeping a memo of update operations and sending them all off when the user clicks save, etc. often it makes sense to let the user preview changes before persisting them
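The batched-update idea could be sketched roughly like this (all names here, `PendingEdits`, `stage`, `flush`, are hypothetical, not ScaleWiz API):

```python
# Memo of pending edits, flushed when the user clicks Save.
# Keying by field keeps only the latest value, so repeated edits
# to the same field collapse into a single database write.
class PendingEdits:
    def __init__(self):
        self._ops = {}

    def stage(self, field, value):
        self._ops[field] = value

    def preview(self):
        # lets the UI show the user what will change before persisting
        return dict(self._ops)

    def flush(self, write):
        # write(field, value) performs the real database update
        for field, value in self._ops.items():
            write(field, value)
        self._ops.clear()

edits = PendingEdits()
edits.stage('name', 'draft')
edits.stage('name', 'final name')   # supersedes the draft
edits.stage('path', '/tmp/out.json')

applied = {}
edits.flush(lambda f, v: applied.__setitem__(f, v))
assert applied == {'name': 'final name', 'path': '/tmp/out.json'}
```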
porting
Whatever the implementation ends up being, it must provide a path to migrate older data. There seem to be two options here.
The data as it is currently stored doesn't use a specific JSON schema. At the Python/tkinter layer we just hope the right fields are there, using the `dict`'s `get` method to try to find the right key while offering fallbacks. This is somewhat useful for kind-of correcting old JSON files on the fly, but also cumbersome, especially as each undocumented change to the non-existent schema has to be added in as extra Python logic.
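The fallback pattern described above looks something like this (the field names are invented for illustration, not ScaleWiz's actual keys):

```python
# An old document written before a hypothetical field rename.
old_doc = {'name': 'test project', 'outputFormat': 'CSV'}

# Newer code expects 'output_format'; fall back to the old key,
# then to a default, so older JSON files still load.
output_format = old_doc.get('output_format', old_doc.get('outputFormat', 'JSON'))
assert output_format == 'CSV'
```

Every schema drift adds another chain like this, which is exactly the cumbersome part.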