Ebisu assumes that half-lives do not change after reviews #43
Thanks for opening an issue, and for raising the initial issue and chatting about it on Reddit! I haven't forgotten about this! I've been trying a few different ways to achieve the goal of evolving the halflife or the probability of recall, with a straightforward statistical model and compact predict/update equations, but haven't been able to make much progress yet.
To elaborate on this for future readers, another way to see this is—take a representative flashcard from a real Anki deck, where you've passed a bunch of quizzes and failed a few, and try to fit a maximum likelihood estimate for the initial model: either the full 2D case (find the alpha=beta and the initial halflife that maximize the likelihood) or the simpler 1D case (fix alpha=beta=2, say, and find the initial halflife that maximizes the likelihood). You'll find that the maximum likelihood estimate of the initial halflife is something like 10,000 hours, which is obviously wrong. The math isn't wrong; the underlying model is broken.

The easiest way to see why is to ignore exponential decay of memory for now and assume you take each quiz at the exact halflife. Then the model simplifies to a Beta random variable that just accumulates the number of successes and failures—the classic conjugate prior to Bernoulli trials. If quizzes at halflives were flips of a weighted coin, Ebisu would be estimating the weight of that coin; but memory isn't a fixed coin flip: the weight of the coin (the strength of recall just after each quiz) changes with the number of quizzes.

The goal for an improved model is to explicitly track the strength of the memory over time. It's unclear whether the Beta/Bernoulli model has a place in this. You can imagine a random process (e.g., a white-noise Gaussian process) trying to estimate this hidden memory strength as reviews come in, but I'm struggling to separate the natural evolution of the memory strength (independent of reviews) from the exponential decay after each review. I will update this thread when I have something solid! I am happy to get proposals for algorithms too!

For Ebisu users, this explains why you can't just schedule reviews when cards drop to 80% or 50%: in the past I thought those predictions weren't realistic because we were very handwavy about our initial models, but actually it's because of the issue above. If you are using Ebisu now, you most likely already have a workaround, perhaps reviewing when recall drops to 10% or reporting recall probability with a visual meter (see here), and that will continue to work! If you are evaluating Ebisu, know that the output of
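To make the "weighted coin" picture above concrete, here is a tiny numerical illustration. This is not the ebisu library; the prior and the pass/fail counts are made up:

```python
# Toy illustration of the "weighted coin" argument: if every quiz happens
# exactly at the current halflife, the Bayesian update reduces to
# Beta/Bernoulli bookkeeping.
from scipy.stats import beta

a, b = 2.0, 2.0             # prior on recall probability at the halflife: Beta(2, 2), mean 0.5
successes, failures = 8, 2  # a made-up review history: 8 passes, 2 fails

posterior = beta(a + successes, b + failures)
print(f"posterior mean recall at the halflife: {posterior.mean():.2f}")  # ~0.71

# The posterior just converges to the long-run pass rate, as if recall were a
# fixed coin weight; it has no way to express that the memory itself got
# stronger after each of those ten reviews.
```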
Posted an interim detailed update at #35 (comment).
A quick update—I've basically found a solution: a Bayesian implementation of what @cyphar above calls "ease", i.e., the growth of the halflife as a function of reviews. In a nutshell, the idea is that, as part of the
I'm really disappointed that I might have to change the Ebisu update process to use Monte Carlo—a big part of what Ebisu meant to me was fully analytical updates. But I'm slowly making peace with this: moving to Monte Carlo actually makes everything a lot simpler and may give us a lot of powerful tools to do things we've wanted to do.

For example, one of those things is how we model correlations between flashcards. We know that there are pairs of cards where, if you review one, then the other is (very) likely to be a success. How do we detect those? We also know that flashcards can have a lot of metadata associated with them—Duolingo's halflife regression paper got all of us excited about the possibilities here. Often this metadata lets us predict recall for cards that the user hasn't even studied.

Instead of rushing out a big Ebisu version that explicitly models each card's ease (I was calling it "boost") as a Bayesian parameter, I'm spending some time experimenting with how to make Ebisu more general, to accommodate things like correlations and metadata, which are now a lot easier to handle since we're using Monte Carlo. I'm hoping to timebox that effort, though, so if I don't make any concrete breakthroughs there, I'll try to release that new Ebisu version that accounts for ease so folks can start using it.
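For readers wondering what a Monte Carlo update might look like in practice, here is a rough, self-contained sketch of the general idea. This is not the actual new Ebisu code; the prior, the quiz history, and the fixed-halflife toy model are all placeholders for illustration:

```python
# Rough sketch of a Monte Carlo (importance-sampling) update: draw candidate
# halflives from a prior, weight each candidate by the likelihood of the
# observed quiz history, and read posterior quantities off the weighted
# samples. The toy model keeps the halflife fixed across the whole history
# just to keep the sketch short.
import numpy as np

rng = np.random.default_rng(0)

# prior over the halflife in hours: a wide lognormal, an arbitrary choice
halflives = rng.lognormal(mean=np.log(24.0), sigma=1.0, size=100_000)

# quiz history: (hours elapsed since the previous quiz, passed?)
history = [(20.0, True), (30.0, True), (50.0, False), (40.0, True)]

logWeights = np.zeros_like(halflives)
for elapsed, passed in history:
    pRecall = 2.0 ** (-elapsed / halflives)  # exponential forgetting
    logWeights += np.log(pRecall if passed else 1.0 - pRecall)

weights = np.exp(logWeights - logWeights.max())
weights /= weights.sum()

print(f"posterior mean halflife: {np.sum(weights * halflives):.1f} hours")
```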
Another quick update. I think I've mostly finished the math and the code for the new version of Ebisu, and I'm working on tests that are exposing various bugs—hopefully there are no more unknown unknowns. I'm hoping to post an RFC with the new API in a few weeks. When I posted my last comment in September, I was afraid we'd have to use Monte Carlo, but luckily, there's a simpler and less computationally intensive way to handle updates, via MAP (maximum a posteriori) estimates of the halflife and the boost factor (akin to Anki's ease factor). As a reminder, the new version will:
Timeline: RFC in a few weeks, release in another few weeks (documentation takes me as long to write as the mathematical analysis…). Edit: the progress is happening in another repo: https://github.com/fasiha/ebisu-likelihood-analysis/commits/main
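For the curious, here is one way to read the "MAP estimate of the halflife and the boost" idea. This is a hedged sketch, not the code in the ebisu-likelihood-analysis repo; the priors, the quiz history, and the numbers are all placeholders:

```python
# Hedged sketch of a MAP fit: treat the initial halflife h0 and a per-success
# boost b as unknowns, put weak priors on them, and maximize the log-posterior
# of the quiz history over both.
import numpy as np
from scipy.optimize import minimize

# quiz history: (hours since the previous quiz, passed?)
history = [(24.0, True), (40.0, True), (80.0, True), (200.0, False), (150.0, True)]

def negLogPosterior(params):
    logH0, logB = params
    h, b = np.exp(logH0), np.exp(logB)
    logp = 0.0
    # weak lognormal-style priors: initial halflife around a day, boost around 2x
    logp += -0.5 * ((logH0 - np.log(24.0)) / 1.0) ** 2
    logp += -0.5 * ((logB - np.log(2.0)) / 0.5) ** 2
    for elapsed, passed in history:
        pRecall = 2.0 ** (-elapsed / h)
        logp += np.log(pRecall if passed else 1.0 - pRecall)
        if passed:
            h *= b  # the halflife grows by the boost after each success
    return -logp

fit = minimize(negLogPosterior, x0=[np.log(24.0), np.log(2.0)], method="Nelder-Mead")
h0, boost = np.exp(fit.x)
print(f"MAP initial halflife: {h0:.0f} h, boost: {boost:.2f}x per success")
```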
This thread cleared some things up for me. I never did understand how ebisu was modeling the increase in half-lives due to reviews. I thought I just wasn't understanding the algorithm well enough. I also don't schedule flashcards at all; ebisu tells me what to study, but not when to study it. So I guess I'm not that affected :-) I'm looking forward to seeing your new version.
This is a summary of this reddit discussion we had some time ago, and I'm mostly posting this here so that:
(You mentioned you'd open an issue about it, but I guess other things got in the way. 😸)
The main concern I have with ebisu at the moment is that it has an implicit assumption that the half-life of a card is a fundamental property of that card -- this means that, independently of how many times you've reviewed a card, that card will be forgotten at approximately the same rate (note that because ebisu uses Bayes, this half-life does grow with each review, but the fundamental assumption is still there). This has the net effect of causing you to do far more reviews than necessary (at least this is the case if you use it in an Anki-style application where you quiz cards that have fallen below a specific expected recall probability -- I'm not sure if ebisu used in its intended application would show you a card you know over a card you don't).
To use a practical metric: if you take a real Anki deck (with a historical recall probability of >80%) and apply ebisu to the historical review history, ebisu will predict that, for the vast majority of cards, the half-life has already elapsed or the predicted recall is below 50%. In addition, if you construct a fake review history of cards that are always passed, ebisu will only grow the interval by ~1.3x each review (see the simulation sketched below). This is a problem because we know that Anki's (flawed) method of applying a 2.5x multiplier to the interval works (even for cards without perfect recall), so ebisu is clearly systematically underestimating how much the half-life of a card changes after each quiz.
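A minimal simulation of the ~1.3x growth, assuming the ebisu v2 Python API (`defaultModel`, `updateRecall`, `modelToPercentileDecay`; exact signatures may differ in other versions):

```python
# Simulate an always-passed card: each review happens exactly at the card's
# current estimated halflife and is marked correct.
import ebisu

model = ebisu.defaultModel(24.0)  # prior halflife: 24 hours
halflife = ebisu.modelToPercentileDecay(model)
for i in range(10):
    # quiz at the current halflife and pass it (1 success out of 1 attempt)
    model = ebisu.updateRecall(model, 1, 1, halflife)
    newHalflife = ebisu.modelToPercentileDecay(model)
    print(f"review {i + 1}: halflife {newHalflife:.1f} h ({newHalflife / halflife:.2f}x)")
    halflife = newHalflife

# The per-review growth factor stays modest (roughly 1.2-1.3x), far below
# Anki's 2.5x ease, which is the systematic underestimate described above.
```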
In my view this is a flaw in what ebisu is trying to model -- by basing the model around a fundamental half-life quantity, ebisu ends up modelling, as a constant, a second-order effect that actually varies with each review. As discussed on Reddit, you had the idea that we should model the derivative of the half-life explicitly (which you called the velocity) -- in Anki terminology this would be equivalent to modelling the ease factor explicitly. I completely agree this would be a far more accurate model, since it seems to me that the ease factor of a card is a far more stable, intrinsic property of the card (it might be the case that the ease factor evolves as a card moves to long-term memory, but at the least it should be a slowly-varying quantity).
This was your comment on how we might do this:
(I am completely clueless about Kalman filters, and I honestly struggled to understand the Beta/GB1 framework so sadly I'm not sure I can be too much of a help here. Maybe I should've taken more stats courses.)