==== Feb 22, 2012
Experiment comparing iterative (incremental) solving to exact least-squares solving.
*** BigMoneyUltimate
alpha = 0.01
initial weights = [0,1,0,-1]
model:
my money density / 3.0
my vp / 43.0
opponent money density / 3.0
opponent vp / 43.0
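For reference, a rough sketch of the two approaches being compared--this assumes
a linear value model trained toward a scalar target such as the final game
outcome; the exact update rule and target aren't pinned down in these notes:

    import numpy as np

    def feature_vector(my_money_density, my_vp, opp_money_density, opp_vp):
        # The four features of the model above, with the same normalizations.
        return np.array([my_money_density / 3.0, my_vp / 43.0,
                         opp_money_density / 3.0, opp_vp / 43.0])

    def incremental_update(w, x, target, alpha):
        # One LMS-style step: nudge w so that w . x moves toward the target.
        return w + alpha * (target - w.dot(x)) * x

    def exact_solve(X, y):
        # "Best" solution: solve the accumulated least-squares problem
        # (rows of X are feature vectors, y the corresponding targets).
        return np.linalg.solve(X.T.dot(X), X.T.dot(y))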
A run of 100 games:
Best: [ 4.29508338 2.93704835 -4.54502706 -2.87357622]
Found: [ 2.03326253e-03 1.03804614e+00 -2.27004143e-04 -1.06592063e+00]
Another:
Best: [ 5.58029168 3.39851064 -5.90434826 -3.31513852]
Found: [ 4.65158804e-03 1.03557099e+00 -9.93641494e-04 -1.08492468e+00]
A run of 1000 games:
Best: [ 4.91180147 3.11083833 -5.19287036 -3.03654276]
Found: [ 0.05964249 1.44408752 -0.0588232 -1.51734499]
So, the best solution seems reasonably consistent.
The iterative convergence is much too slow!
Try alpha = 0.2. This might risk divergence, but at least convergence should
be faster if it occurs.
Run of 1000 games:
Best: [ 5.23274332 3.10012048 -5.5334671 -3.00607252]
Found: [ 2.48050491 2.59399158 -2.94985874 -2.72150569]
Convergence STILL too slow! Try alpha = 1.0!
Note that the amount of data is apparently still too low to get a very reliable
best solution, as there is a fair bit of difference from the previous one.
With alpha = 1.0, got severe divergence:
Best: [ 4.79729666 2.98188381 -5.07034035 -2.90589255]
Found: [ 5.01320614e+145 -7.05181414e+144 5.59635992e+145 2.15306331e+145]
The divergence was experienced even after just the 2nd game,
so it is quite fast.
Let's see where the critical point is for alpha.
alpha = 0.5?
Best: [ 4.98041757 2.95815444 -5.26772694 -2.86182576]
Found: [ 4.30374657 2.81009816 -4.69511505 -3.28508863]
Converged fairly well, with no crazy divergence.
Let's see how it would do over 10,000 games, again with alpha = 0.5.
Visually, convergence seems worse than the previous run: e.g. it's at about 2000
games now, and the first coefficient is still less than 4.
Best: [ 4.80675988 3.00011302 -5.08273628 -2.92180695]
Found: [ 4.26385022 2.6779444 -4.24603723 -3.53405828]
Visually, there seemed to be fairly large fluctuations in the found coefficients
throughout the algorithm, so probably alpha is too large for later training.
<=== NOTE: The above "best" weights are strong. ===>
Out of curiosity, how well does SimpleAI perform with the "best" weights from
the previous trial? Before we try, observe that only the first two weights will
affect the performance of the AI, because the other weights correspond to features
of the opponent player that aren't changed by our buy decision. The ratio of
the VP weight to the money-density weight is roughly 3.00 / 4.81 ~= 0.625, whereas
hand optimization found that a ratio of roughly 0.6 performed best--about the
same number!
To observe the performance of these weights, we ran 1000 trials of SimpleAI
against BMU. Results (with 99% confidence intervals):
Win: 36.9% +/- 3.9%
Loss: 58.1% +/- 4.0%
Tie: 5.0% +/- 1.8%
For comparison, consider the performance of BasicBigMoney against BMU:
Win: 17.8% +/- 3.1%
Loss: 80.2% +/- 3.2%
Tie: 2.0% +/- 1.1%
Quite a bit worse.
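These intervals look like normal-approximation binomial confidence intervals;
assuming that's how they were computed, a quick sanity check:

    import math

    def ci99_half_width(p, n):
        # Half-width of a 99% normal-approximation CI for a proportion
        # (z ~ 2.576 for 99%).
        return 2.576 * math.sqrt(p * (1 - p) / n)

    # e.g. ci99_half_width(0.369, 1000) ~ 0.039, matching 36.9% +/- 3.9%.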
Let's see if we can get better performance by training SimpleAI directly
against BMU. Let's use a small value of alpha since we already have a
decent starting point--try alpha=0.05, and to start, 1000 trials.
Experiment rate is set to zero. Unfortunately, even by game ~700 the
coefficient for money-density has dropped below the coefficient for
VP, and now SimpleAI is performing extremely poorly, in total winning
fewer than 10% of games. Final weights:
Best: [ 5.22224107 0.59376631 -5.53730531 -0.39199473]
Found: [ 2.11569303 3.57695386 -6.45648768 -1.00935343]
Really bizarre best solution, as well. Perhaps this can be explained as
follows: in early games, when the weights were close to the originals,
SimpleAI would invest in both money and VP and sometimes win. In later
games, with the broken weights, SimpleAI would heavily overinvest in
VP and never win--thus the best solution tends to recognize the presence of
VP as overly bad and the presence of money density as overly good, since
these correspond to periods of poor play and good play, respectively.
The obvious question is... why is it that training weights via observing
BMU play yields a good AI, but when the AI observes its own play vs BMU to
train weights, it does terribly?
Some things to test:
- SimpleAI vs SimpleAI: was it just learning bad habits against BMU due to
losing often? Running this now with the same parameters.
- SimpleAI vs BMU, but without incremental updates, and then compute best
solution: this would help determine if the instability of incremental
updates is causing issues, since if the best solution here is also
actually a poor solution, that would indicate a problem.
Result of SimpleAI vs SimpleAI:
Best: [ 6.00857446 3.15251864 -6.38471951 -3.00783411]
Found: [ 5.02746337 3.0508573 -5.34550722 -2.92599645]
Visual observation showed that the found weights were varying somewhat,
but they didn't stray too far from the initial weights, so that is good.
Strangely enough, the best solution seems to overvalue money density.
Let's try this solution vs BMU (without updates) to see the result:
Win: 27.9% +/- 3.7%
Loss: 67.7% +/- 3.8%
Tie: 4.4% +/- 1.7%
Definitely worse than before. Note that the key ratio (VP weight over
money-density weight) is about 3.15 / 6.01 ~= 0.525 here, well below the
hand-optimized value of roughly 0.6.
Why would the best solution overvalue money density in self-play?
The fact that the two players are using the same weights means that the
explanation in the vs-BMU case no longer really makes sense, because the
winrate between the players should always be around 50%.
TODO!!!!!
For the record, best weights:
Best: [ 1.32668138 2.60300329 -1.39226533 -2.71456454]
Found: [ 6.00857446 3.15251864 -6.38471951 -3.00783411]
I wonder whether the emphasis on VP is influenced by SimpleAI's underdog
status. As an underdog, it depends on scoring lucky provinces to win--
when this occurs, its VP count is higher than usual. On the other hand,
if it hits $7, it will buy golds, increasing its money density, which
then looks bad.
Let's try playing against BMU with the good weights, without incremental
updates, and then computing the best solution, to see if it's the
incremental updates that are problematic. (1000 plays.)
Note that ONLY the TD updates from SimpleAI's perspective will be used
to compute weights.
Performance (as expected, similar to before):
Win: 36.3% +/- 3.9%
Loss: 59.6% +/- 4.0%
Tie: 4.1% +/- 1.6%
Weights:
Best: [ 3.01378615 2.71707892 -4.02457252 -2.46092754]
Found: [ 4.80675988 3.00011302 -5.08273628 -2.92180695]
Again the problem with undervaluing money density and overvaluing VP!
Really vexing.
What are the best weights for self-play without incremental updates?
Best: [ 5.76272973 3.0919074 -6.12896186 -2.94875218]
Found: [ 4.80675988 3.00011302 -5.08273628 -2.92180695]
Now money density is OVER-valued. ???
I guess it might be the case that these best weights would give favourable
results in self-play, even if they are not the best choice against BMU.
This seems worth testing. It requires a bit of a change in program design
so that different instances of SimpleAI can use different weights, but
this is a worthwhile change to make in any case because the old design
is terrible.
The new design will be: instead of a PlayerStrategy instance corresponding
directly to a player in a game, a PlayerStrategy will have a create_player
method that returns a Player that delegates strategic decisions to the
strategy. Weights and compressed training data are maintained as instance
variables of PlayerStrategy, whereas the experimented flag and the
prev_features vector are instance variables of Player. This way, a single
strategy can be trained by two different players concurrently.
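Roughly, the intended shape (a sketch only; method bodies and the
training_data attribute name are placeholders, not the real code):

    class PlayerStrategy:
        def __init__(self, weights):
            self.weights = weights        # shared by every player using this strategy
            self.training_data = []       # compressed training data accumulates here

        def create_player(self):
            # Each seat in a game gets its own Player delegating to this strategy.
            return Player(self)

        def choose_buy(self, player, game_state):
            # Placeholder: pick the buy maximizing weights . features.
            raise NotImplementedError

    class Player:
        def __init__(self, strategy):
            self.strategy = strategy
            self.experimented = False     # per-player flag
            self.prev_features = None     # per-player features from the previous decision

        def choose_buy(self, game_state):
            return self.strategy.choose_buy(self, game_state)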
Done! So now SimpleAI can self-play with different weights.
Keep in mind that the "smart" weights might have been flukes, since we
only ran 1000 games.
Results (for "smart" weights vs original BMU-trained weights):
Win: 35.8% +/- 3.9%
Loss: 52.1% +/- 4.1%
Tie: 12.1% +/- 2.7%
Definitely seems like the "smart" weights are actually pretty dumb.
The weights computed from this run's updates would be:
Best: [ 5.20467152 2.88381559 -6.07492049 -2.7463815 ]
Found: [ 5.76272973 3.0919074 -6.12896186 -2.94875218]
Note how these weights are pessimistic, since this version of SimpleAI
tends to lose.
Now for something fun!
We can write
money_density = copper_density + 2*silver_density + 3*gold_density;
total_vp = estate_count + 3*duchy_count + 6*province_count.
So, splitting money density and VP into these individual features should not
damage our model's power, and might even improve it. Let's try!
New model (for me, then for opponent):
copper density
silver density
gold density
estate count / 7
duchy count / 4
province count / 4
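In code, the expanded feature vector looks roughly like this (a sketch; the
deck object with per-card counts is assumed, not the real interface):

    import numpy as np

    def player_features(deck):
        n = float(deck.size)
        return [deck.coppers / n,
                deck.silvers / n,
                deck.golds / n,
                deck.estates / 7.0,
                deck.duchies / 4.0,
                deck.provinces / 4.0]

    def feature_vector(my_deck, opp_deck):
        # 12 features: 6 for me followed by 6 for the opponent.
        return np.array(player_features(my_deck) + player_features(opp_deck))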
First observation while training by observing BMU: even alpha=0.01 saw
divergence with incremental updates (although note this starts with
random weights drawn from N(0,1)). So the increase in features maybe
leads to instability. alpha=0.002 seems to be stable.
After 1000 games, weights:
Best: [ 14.71643228 12.2656796 13.95768907 0.52691816 0.64333316
0.86416075 -14.89968981 -12.28664605 -14.23262165 -0.51322355
-0.67777177 -0.87454557]
Found: [ 1.6289053 -0.52451952 -0.22484738 -0.17929265 0.09038819 0.28776102
-0.84303723 0.41713313 0.37942689 0.04371454 -0.09460905 -0.26615444]
The "best" weights seem very extreme. A coefficient of ~15 for copper density,
really? And a whopping ~14 for gold density? Meanwhile, province count receives
a minuscule 0.86! The copper density coefficient may be damaged due to the
feature essentially being "7 / (10 + # buys made)", since BMU does not buy
copper. Excluding copper density is tempting, but that would give less power
than our original model.
Let's try a few things:
- Include a 0/1 variable which is 0 for the first player, 1 for the second player.
- Run for 5000 games.
- Start at zero weights.
Weights:
Best: [ 14.42726814 11.94976359 13.63191231 0.50696871 0.60690088
0.84562124 -14.54334944 -11.94634136 -13.8635629 -0.493736
-0.64088102 -0.85026777 -0.07545339]
Found: [ 0.0042212 -0.12199819 0.15038786 -0.07401236 0.01654467 0.26061165
0.04979619 0.11603943 -0.23253175 0.09791922 -0.03610467 -0.24584386
-0.25333274]
Well for a start, the incremental updates completely failed to converge to
anything reasonable whatsoever. The best weights here are simply bizarre.
Apparently copper is better than gold, and silver is the worst of all.
I really find this hard to understand. Anyway, let's pit it vs BMU and see what
happens!
Unsurprisingly, it is getting demolished, due to making such bizarre moves as
opening copper/copper. In fact, out of 1000 games, it wins none. =(
Let's try to figure out what's up with this. First of all, the copper density
feature (as observed before) is basically just the reciprocal of the deck size.
Why it's so good to have a small deck, I do not know. Anyway, this makes it
clear that watching BMU doesn't help with figuring out when to buy coppers,
since, well, BMU does not buy coppers!
Why do estates (and to a lesser extent duchies) look so good and provinces so
bad? Maybe because, in the endgame, if the provinces split evenly, then estates
actually are really important. In a lot of games, BMU will split the provinces
evenly, so there you go. More of a functional thought: the VP coefficients
are mainly useful for indicating when to switch from buying money to buying
VP, as long as the coefficients are ordered correctly. So, the high coefficient
on estate basically means that it's correct to switch to buying estates not
long after buying provinces, which seems totally reasonable.
The oversights seem to be explainable by not understanding situations that
don't occur in BMU games:
- The bot doesn't understand that buying copper is bad, because it never sees
a situation where buying copper causes a loss.
- The bot doesn't understand that a province is better than two estates,
because this isn't really relevant when there is no +buy.
Let's try disabling the copper-density feature altogether. This perhaps leaves
us with less power than the old model, because now a copper and an estate look
the same in the deck... but oh well. The problem is that it's difficult to
include a replacement feature without making copper density recoverable as a
linear combination of the remaining features--and if it is recoverable, the
solver will exploit it.
Weights from that:
Best: [ 2.11500823 3.88785563 0.05612424 0.16940762 0.37110225 -2.19524308
-3.8747309 -0.05391813 -0.18900806 -0.35788766 -0.14009059]
Found: [-0.05661241 0.04575251 -0.07635714 0.06280407 0.23825626 0.03901031
-0.09023782 0.1089921 -0.09511622 -0.24295026 -0.24114975]
Seems a lot more reasonable, I think.
Now, Gold is rated about 1.84x as good as Silver, not too ridiculous.
Also, Duchy is 3.02x as good as Estate, and Province 2.20x as good as Duchy.
In fact, these coefficients are so reasonable, it makes me wonder why including
copper density had such a disastrous effect!
Let's try it out vs BMU. I suspect performance might be worse than the simpler
model, simply because now it's a bit hampered in its ability to appreciate how
much VP dilutes money. It can understand the dilution of silver and gold, but
not the dilution of copper. On the other hand, it should have a more nuanced
ability to understand the trade-offs between individual card types.
Results:
Win: 22.6% +/- 3.4%
Loss: 74.0% +/- 3.6%
Tie: 3.4% +/- 1.5%
Quite a lot worse than the simpler model.
==== Feb 23, 2012
Trying a new model:
Keep different features for each turn of 1-25, and then for turns 26+.
The features are: # silver, # gold, # estates, # duchies, # provinces,
am-i-second-player.
(Since BMU will not buy copper, don't include # copper.)
Except for the am-i-second-player bias parameter, each feature is present
for self and opponent.
Ran 12,000 games of BMU against itself to gather training data.
Because of linear dependencies among features, could not use scipy.linalg.solve,
so used scipy.linalg.lstsq instead.
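In other words, roughly (A and b stand for the accumulated normal-equations
system A w = b; how the training pairs are compressed into them isn't shown
here):

    import scipy.linalg

    def solve_weights(A, b):
        # With linearly dependent features, A is singular and
        # scipy.linalg.solve(A, b) fails; lstsq returns a least-squares
        # solution regardless of rank.
        w, residues, rank, sing_vals = scipy.linalg.lstsq(A, b)
        return w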
With obtained weights, ran 1000 games of SimpleAI against BMU:
Win: 23.6% +/- 3.5%
Loss: 74.1% +/- 3.6%
Tie: 2.3% +/- 1.2%
So... it's much weaker than what we got doing 10,000 games with the original
4-feature model. =(
Potentially the problem here is that 12,000 games for 275 features is much less
training than 10,000 for 4 features. So, let's try 30,000 games and see if that
gives better results:
Win: 21.0% +/- 3.3%
Loss: 76.6% +/- 3.4%
Tie: 2.4% +/- 1.2%
About the same.
Ran for 100,000 games!!!
Results:
SimpleAI: 13.2% +/- 2.8%
Tie: 1.3% +/- 0.9%
BigMoneyUltimate: 85.5% +/- 2.9%
What.