Update split_one_chevs balancemode v2 #328
Add split-one-chevs-v2 Update moduledoc
fc47059 to 0dd77e3
This reverts commit 0dd77e3.
4f05643 to dacd9fd
Check for bad balancer
c09fbef to 95d352e
0dd4639 to d638612
Refactor minor
d638612 to a2c5976
Minor update
0497a07 to 2cd66d9
Sample video of testing the balancer tab on the Integration server. Note that the balancer tab only appears for rated games. Also, the chevron level of players is always based on current data (since we don't store a history of this).
ed2850c to 21f8613
More improvements
21f8613 to 182e034
Local Dev Tests
You must rerun the fake data task. If you ran it previously, the fake users will have permissions that are too high. I modified the fakedata task so that fake users have normal permissions. The task now also adds fake playtime data.
Launch the website
Login to the website using
There is a dropdown with the label "Balance Algorithm" near the top. Change this to
Testing permissions of normal users
Teiserver doesn't know the rank shown to the user in Chobby. Chobby gets the rank on login and then never updates. Teiserver, therefore, may classify a user as 2Chev but they might be shown as 1Chev in Chobby.
I thought rank was only calculated on login, when/where else does Teiserver calculate rank?
@@ -96,7 +96,7 @@ defmodule Mix.Tasks.Teiserver.Fakedata do
name: generate_throwaway_name() |> String.replace(" ", ""),
email: UUID.uuid1(),
password: root_user.password,
permissions: ["admin.dev.developer"],
Why this change?
To test that a user cannot see the balance tab, they must have zero permissions. The admin.dev.developer permission basically gives them the admin role.
Would it be good to integrate FakePlaytime into Fakedata to avoid the need for 2 commands?
I'll investigate to see if I can do it. This is much more about my Elixir skills not being strong enough to safely modify this code.
  socket
- |> mount_require_any(["Reviewer"])
+ |> mount_require_any(["Reviewer", "Contributor"])
Ratings were locked behind Overwatch, balance behind Reviewer, and neither was available to Contributor, which in my opinion would find the data more useful, mainly for debugging purposes, than Overwatch would for moderation. This doesn't make much sense to me and I would change it all to Contributor, but this is something for Beherith to decide.
  end)
  end)
  |> List.flatten()

  past_balance =
- BalanceLib.create_balance(groups, match.team_count, mode: :loser_picks)
+ BalanceLib.create_balance(groups, match.team_count, algorithm: balancer)
Are balancer and algorithm the same thing in this context? Some places use one, some the other; it might be nice to always use the same term.
Yeah I can update my word usage to be more consistent. Will look into it.
end

@spec has_enough_noobs?([BT.expanded_group()]) :: bool()
def has_enough_noobs?(expanded_group) do
has_enough_noobs only checks if there are noobs, not how many?
Is this intended or not?
I had some ideas, such as defining "enough noobs" as:
- At least a single 1chev, or
- At least two 2chevs
But I decided to keep it simple for now. So the function is more about allowing additional complexity in the future.
A single noob actually gets treated differently compared to Teifion's balancer. Teifion's balancer will pick the noob higher (since it's based on their OS). My algorithm will always pick the noob last irrespective of OS.
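The difference between the simple check and the stricter idea floated above can be sketched as follows. This is only an illustration: the module name, the function names and the player map shape (a 1-based `chev` field) are assumptions for the example, not the PR's actual code or the real `BT.expanded_group()` type.

```elixir
defmodule NoobCheckSketch do
  # Hypothetical player shape: %{name: String.t(), chev: pos_integer()}.
  # The real balancer works on BT.expanded_group(), which has other fields.

  # Simple rule: true as soon as any 1Chev or 2Chev is present.
  def has_any_noob?(players) do
    Enum.any?(players, fn p -> p.chev <= 2 end)
  end

  # Stricter rule discussed in the review (not implemented in the PR):
  # at least one 1Chev, or at least two 2Chevs.
  def has_enough_noobs_strict?(players) do
    one_chevs = Enum.count(players, fn p -> p.chev == 1 end)
    two_chevs = Enum.count(players, fn p -> p.chev == 2 end)
    one_chevs >= 1 or two_chevs >= 2
  end
end
```

With a lobby containing a single 2Chev, the simple rule returns true while the stricter rule returns false, which is the case the review question is about.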
To be honest this bug really baffled me. I still have no idea why they are out of sync. From my searching it only gets calculated on login.
That was my understanding as well. I checked again, not sure what we are missing...
Currently at least there is an issue for it: #332
@L-e-x-o-n I have updated the PR now with the following changes:
# which has an expiry of 60s
# See application.ex for cache settings
rating_type_id = MatchRatingLib.rating_type_name_lookup()[rating_type]
rating = get_user_balance_rating_value(userid, rating_type_id)
{skill, uncertainty} = get_user_rating_value_uncertainty_pair(userid, rating_type_id)
rating = calculate_rating_value(skill, uncertainty)
rating = fuzz_rating(rating, fuzz_multiplier)
This is a mistake. calculate_rating_value should not be called since we already have the rating.
Context
I played a game with split_one_chevs on and noticed a few issues. The 1Chevs were not split. This was because Macwhite was recognised by Teiserver as a 2Chev whereas Chobby showed them as a 1Chev. This is likely because Chobby gets a player's rank on login and then never updates it, so it gets out of sync with Teiserver.
How to resolve?
To resolve this, my algorithm will now group both 1Chevs and 2Chevs into the "Noob" bucket. There is also a slight change in how we draft players who are in the "Noob" bucket, detailed below.
Other Findings
When putting the replay into https://openskill-test.web.app/ I noticed that the library expects Team 1 to win, even though most humans would bet on Team 2.
Since the library expects Team 1 to win, if my team were to win I would gain a lot of OS, i.e. +1. If my team were to lose, I would only lose 0.37.
From this we can conclude that if you were to choose from the three "Noobs": CindersFire, Macwhite, Victorious_Dead you probably want to avoid the overrated players, which are likely the ones with highest uncertainty. If an overrated player is on the other team, you stand to win more and lose less, and since they're overrated, you're also more likely to win. Therefore, for those in the "Noobs" category, we probably want to pick those with low uncertainty as they are less likely to be overrated.
split_one_chevs Algorithm v2
Based on these findings, the algorithm will now draft players based on these criteria:
This draft mimics how a human might draft players with the given visible information in a lobby. It's not super mathematical. Players generally look at chevron level to determine how overrated someone might be. Someone did complain in chat about the lobby balance in the game I played mentioned above. They were obviously eyeballing the chevron levels and assuming those two players were overrated.
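A minimal sketch of the drafting idea described in this PR: noobs (1Chevs and 2Chevs) are always picked last, and within the noob bucket the players with the lowest uncertainty come first. The module, the function names and the player map shape are hypothetical, and sorting the experienced players by rating is an assumption made only to complete the example; the PR's real implementation may differ.

```elixir
defmodule DraftOrderSketch do
  # Hypothetical player shape; the real balancer uses BT.expanded_group().
  # chev: chevron level (1-based), rating: OS rating, uncertainty: OS uncertainty.

  # 1Chevs and 2Chevs both fall into the "Noob" bucket.
  def noob?(player), do: player.chev <= 2

  # Experienced players first (highest rating first, an assumption here),
  # then noobs ordered by lowest uncertainty first, irrespective of rating.
  def draft_order(players) do
    {noobs, experienced} = Enum.split_with(players, &noob?/1)

    Enum.sort_by(experienced, & &1.rating, :desc) ++
      Enum.sort_by(noobs, & &1.uncertainty, :asc)
  end
end
```

Putting the noob bucket at the end of the list regardless of OS is what distinguishes this ordering from a purely rating-based draft, where a highly rated 1Chev would be picked early.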
Further enhancements
has_parties?. If this is false we do not need to rerun the balancer again.
loser_picks) will be called. This is the default algo and it supports parties.
Known Bugs
Teiserver doesn't know the rank shown to the user in Chobby. Chobby gets the rank on login and then never updates. Teiserver, therefore, may classify a user as 2Chev but they might be shown as 1Chev in Chobby.
Unit Tests
Run this to execute the unit tests that relate to balance
Local Dev Tests
See comment here for test steps.
Theoretical Testing on past replays
Go here: https://balance-algo-web.web.app/
Enter a past replay, then change the algorithm to Split One Chevs v2.