Skip to content

Database Planning

sustained edited this page Jun 15, 2019 · 2 revisions

Figuring out how the database should be laid out. Includes example data.

Users table

id email username email_verified_at password remember_token created_at updated_at
1 [email protected] foo NULL HASH NULL TIMESTAMP TIMESTAMP

Courses table

id author_id title description source_language target_language
1 1 Swedish 101 Learn Swedish! sv en

Levels table

id course_id title created_at updated_at
1 1 Basics 1 TIMESTAMP TIMESTAMP

Words table

id level_id course_id source target created_at updated_at
1 1 1 I jag TIMESTAMP TIMESTAMP
2 1 1 is, am, are är TIMESTAMP TIMESTAMP
3 1 1 a, an en TIMESTAMP TIMESTAMP
4 1 1 a, an ett TIMESTAMP TIMESTAMP
  • ❗ We haven't yet planned for synonyms/alternative answers, or typo detection.
  • ❗ We don't have any concept of attributes yet, like Memrise (e.g. gender, part of speech, ...). Ideally we want to support custom/arbitrary attributes too so we need to think about this one.

Sessions table

Stores the basics of an individual learning session.

id user_id completed started_at finished_at
1 1 true TIMESTAMP TIMESTAMP
  • ❓ Probably useful to know if a learning session wasn't actually completed.
  • ❓ Storing the started/finished at times can probably aid cheating-detection.

In the above example, user 1 started a learning session at some timestamp and finished it at some other timestamp. Any entries for that session will be stored in the session_entries table with the session_id of 1.

Session Entries table

Stores the actual individual entries for each item within the learning session.

id session_id word_id was_correct was_typo time_taken answer created_at
1 1 1 true false 3000 jag TIMESTAMP
  • ❓ How to track multiple choice questions etc.?
  • ❓ Should we store the prompt? We have the word_id, so we can get a hold of the prompt...

Statistics table

Basically tracks statistics from all the sessions for any given word.

id word_id user_id avg_time_taken num_reviews num_correct num_incorrect num_typos created_at updated_at
1 1 1 4720 (ms) 7 5 3 1 TIMESTAMP TIMESTAMP
  • 👍 Allows us to regularly purge the session and session entries tables which will likely grow huge, fast while still allowing us to provide most of the information from those sessions.

  • 👍 Not losing session history could potentially be a "pro" feature.

  • 👎 Means that we of course need to keep this table in sync with the others and have some data duplication.

  • ℹ️ Typos are separate because a typo is considered a correct answer.

Reviews table

Stores when words need to be reviewed.

???