-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rate games accounting for first move advantage (fix #6818) #16281
Conversation
8320559
to
d9d4b2a
Compare
Love to see this! I analysed this a while back and concluded that adding this parameter is a significant improvement, but somehow never got around to implementing it. It's important to get the parameter right. If 35 overshoots too much, it could lead to less accurate results than 0. It should be possible to calculate the optimum value from aggregate results, like opening explorer data. We want that White's average score equals the rating system prediction for a game between two stable players at 1500. |
Sampling rated blitz games with rating=1500, I see:
These numbers don't match the Opening Explorer numbers, or any other case study I could find. I selected blitz games because the rapid rating distribution is skewed, and the blitz rating distribution is not skewed. Using this calculator, I see that an Elo difference of +7.786 produces an expected score of 0.510870. |
|
||
final class GameResult(winner: Rating, loser: Rating, isDraw: Boolean) extends Result: | ||
final class GameResult(val first: Rating, val second: Rating, outcome: Option[Boolean]) extends Result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that you changed the semantics of this class while keeping typing compatibility.
winner: Rating, loser: Rating
and val first: Rating, val second: Rating
have the same types, but mean different things. Code using it might break without the compiler noticing.
I reckon the class should at least have a different name, to force us to review and fix the code wherever it's used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I changed the only other consumer of this class (PuzzleFinisher.scala
) to consume BinaryResult
instead.
I tried to think of a better name than GameResult
to describe this concept... I needed a way to identify which player had the first move advantage.
private val VOLATILITY = lila.rating.Glicko.default.volatility | ||
private val TAU = 0.75d | ||
private val calculator = glicko2.RatingCalculator(VOLATILITY, TAU) | ||
private val calculator = lila.rating.Glicko.system |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what the fudge... there was a horrible bug here.
private val calculator = glicko2.RatingCalculator(VOLATILITY, TAU)
was completely wrong here, as the constructor goes RatingCalculator(tau: Double = 0.75d, ratingPeriodsPerDay: Double = 0)
.
And that's what happens when we use weak types like Double
. Arguments get swapped during some code change and no-one notices.
Gonna change these to proper opaque type
so that it doesn't happen again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Goodness, that is quite a mess and I feel somewhat bad for having introduced it years ago.
Parameter tau
being near-zero may have caused solvers' volatility to increase slower when on a hot streak:
Smaller values of τ prevent the volatility measures from changing by large amounts, which in turn prevent enormous changes in ratings based on very improbable results.
http://www.glicko.net/glicko/glicko2.pdf
Parameter ratingPeriodsPerDay
being 0;75d
instead of 0.21436d
may have caused solvers' RD to increase faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I wasn't blaming you for the bug. In fact I didn't check who introduced it and assumed it was me.
Thanks for fixing it tho.
looks like I cannot push to this PR, that's a bit unpractical |
|
||
def getOpponent(player: Rating): Rating | ||
def getAdvantage(advantage: Double, player: Rating): Double = | ||
if player == first then advantage / 2.0d else -advantage / 2.0d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: It's a bit insane that players are identified by their exact Rating
. Not introduced by this PR, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, there is a dilemma when two players have the exact same Rating
. (I can't tell at the moment whether this PR either creates or fixes that edge case.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that comes from https://github.com/goochjs/glicko2. We could fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to fix it... but also, there's a case of dependency inversion since RatingCalculator
shouldn't need to inspect RatingPeriodResults
to identify what players, puzzles, tutor-exercises, etc. need ratings updated.
Trait RatingPeriodResults
should be replaceable by List[Result]
, because each consumer of RatingCalcuator#updateRatings
already has that context, especially because we're only rating games and puzzles serially (one at a time) anyway (except for modules/tutor/src/test/GlickoTest.scala
which rates 2 results concurrently).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in any case this PR is not the time and place to fix it.
[EDIT] I'm able to push after accepting the github invitation to contribute.
It happened before on a couple PRs and has remained a mystery. |
|
Given excitement in our team channel about crazyhouse, I did a little analysis (and maybe should update this PR): This month, 1228 rated crazyhouse games were played, with 628 (51%) White wins and 575 (47%) Black wins, a 15.171 Elo advantage (as compared to 7.786 Elo for standard blitz). |
fixes clicking the reset password email link after obtaining a gdpr erasure
players: Iterable[Rating], | ||
results: RatingPeriodResults[?], | ||
skipDeviationIncrease: Boolean = false | ||
) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is strange, or I'm misunderstanding something.
This updateRatings
function now takes an additional argument players
.
That info used to come from results
which also contains the players as results.players
.
Which means there are now 2 sources of truth for the players. Both players
and results.players
.
I see that results.players
has been made private and is completely unused (?!) but it is still a code smell in my opinion, and a potential source of bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this PR exposes what in my mind was previously a code smell: RatingCalculator.scala
looking at results.players
to figure out "which ratings need to be updated?" was unnecessarily complex (besides that previous code being confusing in other ways with trying to figure out if a player played any games in that RatingPeriodResults
or not).
Tutor could have exercises with fixed ratings: in fact, in chess literature (books, magazines, CDs, other training materials) exercises are provided with "if you get this score, we estimate your rating is X." But this PR might be prematurely forcing us to make that design decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For others' benefit; we discussed on stream that we don't think results.players
needs to exist, but it's an artifact from a previous reference implementation prematurely optimized straight from java hell. https://xkcd.com/2347/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding result.players
, class Result
is perhaps the world's least efficient implementation mapping Pair[Rating, Pair[OpponentRating, Score]]
or maybe Map[Rating, Pair[OpponentRating, Score]]
(for a pair of players playing a game, for a player and a puzzle, or for a player attempting a fixed-rating tutor exercise). At present my head hurts trying to think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait... Result
should actually be Tuple[OpponentRating, Score]
and there's no need for any functions. Sure, a game has a winner and a loser, so a game should produce a Pair[Result]
.
val results = glicko2.GameRatingPeriodResults( | ||
List( | ||
game.winnerColor match | ||
case None => glicko2.GameResult(white, black, true) | ||
case Some(chess.White) => glicko2.GameResult(white, black, false) | ||
case Some(chess.Black) => glicko2.GameResult(black, white, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we go from
case Some(chess.White) => glicko2.GameResult(white, black, false)
case Some(chess.Black) => glicko2.GameResult(black, white, false)
to
case Some(chess.White) => glicko2.GameResult(white, black, Some(true))
case Some(chess.Black) => glicko2.GameResult(black, white, Some(false))
which I'm pretty sure has a bug, not fixed by more recent commits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a best effort to correct this (and pushed my commits), and am overwhelmed trying to understand compile errors.
This reverts commit 22a850a.
I'm making a pure glicko calculator API, moving it to scalachess, and integrating it back into lila |
Thanks! I see that PR successfully merged into master! |
On |
Case studies measure a first move advantage of 35 Elo, for example:
2002 - https://en.chessbase.com/post/the-sonas-rating-formula-better-than-elo
2024 - https://www.robweir.com/blog/2014/01/first-move-advantage-in-chess.html
This PR does not tune tau (see http://www.glicko.net/glicko/glicko2.pdf ),
although doing so may help stabilize variant and ultrabullet ratings.