-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Porting certainty propagation to new codebase. Notes and Outline. #13
Comments
@Videodr0me @dubslow I originally came over to this project with the thought of doing certainty propagation. If I have some time this weekend, I'll try to get this implemented. |
@RedDenver Great! Feel free to add any questions you might have to this issue, i will try to clarify and can also look at your code and give input. Don't expect to much, though. In my testing doing this is approx. +10 Elo without TB in selfplay, I suspect that this will get more valuable with TB and some tests indicate it also helps a little more against a/b opponents, but nothing major. It is kind of nice that it finds shallow mates, sometimes missed by standard lc0 and also is a little better than human mating abilities, sometimes finding "hidden" mates in 8 or 9. |
Of course once this makes it into the training it will be worth at least 100 elo, is my guess |
Should I wait for egtb to get merged and then build off of that since tb hits will be certain? |
I am under the impression that the merge of TB code is imminent. So you might wait, even though there are almost no interactions between two code changes. The main one is that TB scored nodes are also made certain (and not terminal - if you do not want to "overload" the terminal bool), except for that almost nothing should change in the TB code by adding certainty prop. |
I would say also wait on, or start from, the |
I'll have some time tonight to get started on this, so let me know if you branch; otherwise, I'll just start off of current master. |
I'm having issues with protobuf and can't really do much since I can't run it: LeelaChessZero/lc0#283 |
I hope the protobug issues are sorted now!? I would recommend to make this a separate branch at first. Actually i am advocating for a separate search extension branch in general and hope it gathers some traction. All search changes (for example: minimaxq, certainty prop, sticky checkmate) would after some testing go to that branch first. Only after a second round of testing and some experience in the "wild" they would get merged to master. This "search" branch would be the cutting edge and somewhat experimental. It would prevent frequent changes to master so that contributors have a stable version for a longer period of time to contribute training games. |
Was that a typo? That's huge. If it wasn't a typo, it might be a good idea to wait for this before any reset. But is 100 Elo realistic? |
Interesting game from self-play, where Leela misses many opportunities for a mate in 3. https://lichess.org/o6XsH9X6#180 Would certainty propagation solve this? |
Depends on how temperature interacts with certainty. Playing mates in 1 should never be temperature-ified, as per my sticky checkmate code, but I'm thinking we still want temperature even with certain mates seen, because certain mates doesn't exclude possibly shorter mates. I do think the constant t=1 is rather suboptimal, but that's slightly outside the scope of this topic. In scenarios where we improve the temperature scheme, we can "reduce the temperature when certain mates are found", so to speak. |
@dubslow how will temperature help find shorter mates? Leela's training says that mate is mate, so policy of different mating positions won't change much on average. |
Actually this isn't true at the moment. Even in training, policy is trained that non-mate alternatives to mate should get some portion of the policy. I think this is an oversight and proposed fixing that some weeks ago, but it was tabled. And shorter mates doesn't have anything to do with the mating move, but rather with finding the moves before the mate that lead to that quicker mate. |
As Dubslow is MIA, i am porting it right now. A basic version will probably be ready for bug-hunting in a week. |
Outdated unfortunately. But worth keeping in documentation. |
I agree, it may serve as an educational basis for those of lc0-wide willing to enter the discussions seriously by exposing the points where one needs to be careful (logical joints in the discussion). The type of ideas that were receivable, and the keywords. Would help me build a discussion glossary a bit like I started in the forum, from sifting though those issues. |
Certainty propagation
Certainty propagation is known in literature as MCTS-Solver (Winands et. al). If we reach terminal (which are "certain") nodes, we can propagate the "certain" results from these nodes efficiently up the search-tree. For example if one side can play a move thats leads to a terminal win node, we can make the parent node a "certain" node scored as a loss from the opponents view. Further, nodes whose children are all certain, become certain themselves with the -max q among the certain childs. With this even MCTS can solve some shallow mates, and steer exploring the tree more efficiently and avoid branches that are "proven" to no longer need exploration. See also https://github.com/Videodr0me/leela-chess-experimental/wiki/MCTS-Solver---Certainty-Propagation-and-Autoextending
Features of full Implementation (as in my experimental build):
Note on the basic implementation (what are the essentials):
-IsCertain(). Nodes get a new attribute "certain". All terminal nodes are certain, but not all certain nodes are terminal. MakeCertain(v) makes a node certain. MakeTerminal(result) makes a node terminal and certain. Most checks in the main loop are changed from IsTerminal() to IsCertain(), This changes nothing in regard to terminals (all terminals are also certain). The terminal state can be "overloaded" with this, but i opted for a separate bool for clarity and so i can turn the feature of more easily via uci options (for self-play testing). Later this became useful as for example two-fold draw-scoring needs to differentiate between terminal draws and two-fold draws. I could imagine something similar with TB and a "swindle" mode when playing against non TB opponents.
-The main propagation is done in the (new codebase) DoBackupUpdate(). I store the certain scores in both Q and V, and opted to fetch them via V. But fetching them via Q would work as well. I use hacky float == x.x compares, which works because i set them exactly as such, but there are nicer ways (for example use game:result or something). For efficiency I only do the certainty check if the update originated from a certain node (which can also be a terminal) -> v_is_certain = true. Root is never made certain (actually it is made uncertain before search, if it was certain due to tree-reuse). I "overloaded" the code to at root also aid in the best move determination (thats what the n==root_node_ condition is for) and for tie breaking (two-folds, certain wins vs. terminal wins.). This is a simplified version without the overloading, its actually really simple:
This is basically it, if all childs are certain we make the node certain with the -max of the certain childs. Or if we find at least one certain winning child, then the node is a certain loss for the opponent. This gets backpropagated in the tree.
The full version of this with the following best move determination and the tie break rules are here https://github.com/Videodr0me/leela-chess-experimental/blob/master/src/mcts/search.cc#L339 I currently do best move determination in two stages (mainly because of the not searching certain childs of root). First the normal update (which is preliminary) and then at root (one ply higher) I resuse the info about the best certain move from the linked full version. This again can be done simpler initially.
The changes to GetBestMove are here https://github.com/Videodr0me/leela-chess-experimental/blob/d2088837fff01779dbf8fc606af6e92c98c1a449/src/mcts/search.cc#L505 These compute normal best move and best certain move, and then decide between them. This can greatly be simplified if we initially continue searching certain childs at root (basically prioritizing everything over certain losses and always take certain wins.). Also the tiebreak rules can safely be ignored (not really sure they yield any meaningful elo anyhow for example in prefering a certain loss over a terminal loss, in hope the opponent did not see the mate).
The changes only affect node.h, node.cc, search.h and search.cc. Only conditionals with if (kCertaintyProb) are relevant. And of those only those that are general "kCertaintyProb" or "kCertaintyProp==1". The rest is irrelevant. I hope this helps for a start. The changes in ExtendNode (old codebase) are only relevant for the one-ply lookahead and two-fold draw scoring (and the tree balancing).
The text was updated successfully, but these errors were encountered: