# Personal records of game score
22-12-17: 190
# Linking with AI
Measurements for 30 iterations of the 0th generation
Note that each iteration had a typical score of 8 to 12.
Uncapping FPS:
147.70 s -> 19.96 s (7.4x improvement)
Headless:
19.96 s -> 0.83 s (24x improvement)
Assuming that a typical run with a trained network takes 10x longer,
and that performance scales perfectly across 32 cores,
it takes 0.83 * 10 / 30 / 32 == 0.0086 seconds per iteration.
Let's run each Breaker 5 times, for example, so that the middle 3 values can be averaged for the fitness.
If we have n Breakers in each generation,
we run 5 * n iterations total, giving 0.0086 * 5 * n == 0.043 * n.
Let's just say (0.05 * n) seconds for each generation.
If we want to limit each generation to about 1 minute,
we get n == 1200.
I don't think that a larger population size will be critical in the end result,
so let's have n == 1024 to make things nice.
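For the record, the same arithmetic as a quick Python snippet (the 10x slowdown and perfect 32-core scaling are the assumptions stated above):
```python
# Generation-time budget, using the measurements above.
headless_time = 0.83       # s for 30 iterations of gen 0, headless
trained_slowdown = 10      # assumption: a trained network survives ~10x longer
cores = 32                 # assumption: perfect scaling across 32 cores

per_iteration = headless_time * trained_slowdown / 30 / cores   # ~0.0086 s
runs_per_breaker = 5       # middle 3 of the 5 scores averaged for fitness
per_breaker = per_iteration * runs_per_breaker                  # ~0.043 s, call it 0.05

print(60 / 0.05)           # -> 1200.0; round down to n = 1024
```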
Testing scaling with multiprocessing
1 process, n=1024 at gen 0 took 35 seconds.
32 processes, n=1024 at gen 0 took 2.9 seconds.
--> This is a 12x improvement. Hmm..
1 process, n=4096 at gen 0 took 139 seconds.
16 processes, n=4096 at gen 0 took 15.3 seconds.
32 processes, n=4096 at gen 0 took 11.4 seconds.
--> Still a 12x improvement. I guess that's just how it scales.
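The scaling test itself is nothing fancy - roughly this, with evaluate_breaker standing in for whatever runs one Breaker headlessly and returns its fitness:
```python
import time
from multiprocessing import Pool

def evaluate_breaker(breaker):
    # stand-in: run one Breaker headlessly and return its fitness score
    return 0

def time_generation(breakers, processes):
    # evaluate the whole population with a worker pool and time it
    start = time.perf_counter()
    with Pool(processes) as pool:
        scores = pool.map(evaluate_breaker, breakers)
    return time.perf_counter() - start, scores

if __name__ == "__main__":
    population = list(range(1024))          # placeholder for n=1024 Breakers
    for procs in (1, 16, 32):
        elapsed, _ = time_generation(population, procs)
        print(procs, elapsed)
```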
---
# New calculations for bootstrapping-based AI
From the previous AI attempt using GA, we have:
AVG score 70, 5120 iterations --> 230 seconds
230 / 5120 == 45 ms / iteration
45 / 70 == 0.64 ms / round
We can estimate that a single firing round takes about 0.6 ms on 32 cores.
That translates to about 8 ms on one core (assuming the 12x scaling observed earlier).
If we divide the angle of fire into 1000 incremental steps,
it will take 8 seconds on one core to simulate all of them.
Yikes.
Good thing we have 32 cores, eh?
Back to 0.6 ms per simulated round on 32 cores.
That yields 0.6 seconds to simulate all 1000 angles for each actual round.
Assume a game length of 100 rounds.
That yields 60 seconds per game. (This is even ignoring the evaluation overhead)
To acquire 10,000 samples, we need 100 minutes.
To acquire 100,000 samples, we need 1000 minutes, i.e. 17 hours.
...I think a speedup is necessary.
Why don't we try preprocessing the collision detections?
Position can be in increments of 0.1 pixels, and the angle in 21600 steps (1 arcmin each).
There are 6 possible wall configurations.
((120 / 2) / 0.1) * ((75 / 2) / 0.1) * 21600 * 6 == 29,160,000,000
If we do 10 million calculations per second, this will take less than an hour to preprocess.
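If I were to actually build it, the table would just be a flat array indexed like this (granularities from the estimate above; never built in the end):
```python
# Sizing and indexing of the hypothetical precomputed collision table.
X_STEPS = int((120 / 2) / 0.1)   # 600 x positions, 0.1 px apart
Y_STEPS = int((75 / 2) / 0.1)    # 375 y positions, 0.1 px apart
ANGLE_STEPS = 21600              # 1 arcmin per step
WALL_CONFIGS = 6                 # possible wall configurations

def table_index(xi, yi, ai, wi):
    # flatten (x, y, angle, wall config) into a single array index
    return ((xi * Y_STEPS + yi) * ANGLE_STEPS + ai) * WALL_CONFIGS + wi

total = X_STEPS * Y_STEPS * ANGLE_STEPS * WALL_CONFIGS
print(f"{total:,}")                 # 29,160,000,000 entries
print(total / 10_000_000 / 3600)    # ~0.81 hours at 10 million calcs/s
```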
Will the speedup be significant?
I think the quickest way to find out... is to just give it a try.
It isn't even hard to implement.
I just tried timing how long the collision handling logic takes.
It was responsible for 15 seconds out of the 230 seconds for each GA generation.
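A check like that is just wall-clock accumulation around the collision call - something like this sketch (names are placeholders for the real game code):
```python
import time

collision_seconds = 0.0

def handle_collisions(ball, bricks):
    # placeholder for the real collision handling logic
    return []

def step_round(ball, bricks):
    # one game step, with the collision part timed separately
    global collision_seconds
    start = time.perf_counter()
    hits = handle_collisions(ball, bricks)
    collision_seconds += time.perf_counter() - start
    # ...rest of the game step...
    return hits

# after a full GA generation, print(collision_seconds) -> ~15 s of the 230 s
```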
No point in the optimization mentioned earlier.
Abort!
# Performance fix
Just found out that there were a lot of redundant AI calculations going on,
because the AI was being run even when the game wasn't in its responsive state.
Resolving this brought the GA runtime from 230 seconds down to 168 seconds.
That's a 27% reduction in runtime. Not bad.
Running the explorer now.
We can collect 1000 runs, then try to learn stuff from them, and see how it goes.
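The explorer loop, roughly (function names here are placeholders, not the real game hooks; the angle range and what exactly gets recorded are still up for tuning):
```python
import random

def simulate_round(state, angle):
    # placeholder: run one firing round headlessly at this angle,
    # return (resulting state, bricks cleared)
    return state, random.randint(0, 10)

def encode_state(state):
    # placeholder: turn a game state into a feature vector
    return state

def explore(state, n_angles=1000):
    # simulate every candidate firing angle and record what happened,
    # so the evaluator network can later learn which states are good
    samples = []
    for i in range(n_angles):
        angle = 180.0 * (i + 0.5) / n_angles   # assumed: fire anywhere in the upper half
        next_state, cleared = simulate_round(state, angle)
        samples.append((encode_state(next_state), cleared))
    return samples
```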
---
We have the first of generation 1!
They perform excellently, except for some rare hiccups.
Dividing the angle into 128 steps looked good enough, just from eyeballing it.
I am still tuning the hyperparameters for the evaluator.
I just trained the network for some more epochs, up to 10000, and now it performs insanely well.
Almost every move results in all the bricks being cleared.
It looks very much like the AI could just keep going indefinitely.
That means that we can't really continue a bootstrapped learning loop,
as the AI has already beaten the game at gen-1.
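For reference, the evaluator training boils down to roughly this - a minimal sketch only; the real state encoding, architecture, and framework aren't written down here:
```python
import torch
from torch import nn

STATE_DIM = 64    # placeholder size of the encoded game state

evaluator = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

def train(states, scores, epochs=10_000, lr=1e-3):
    # states: (N, STATE_DIM) float tensor, scores: (N, 1) float tensor
    opt = torch.optim.Adam(evaluator.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(evaluator(states), scores)
        loss.backward()
        opt.step()
```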
What now?
- I could try training the network even more, as even at 10000 epochs, there are no signs of overfitting.
--> I think we're starting to see diminishing returns with the epochs. And as gen-0's network was random,
I think it is inherently impossible to predict the labels perfectly. Don't think it's worth it.
- As we now know that n=128 is okay for angle division, we can simulate generation 0 again, but much faster.
We could create a much larger dataset, and try learning from it again.
--> Again, I don't think having a larger dataset will achieve anything much better. Not worth it.
- I could implement my own hard-coded AI, also explorer (angle-simulation) based, for comparison.
--> Won't be too hard to do. Given how the gen-1 AI is able to clear all blocks, a hard-coded AI will
also clear all blocks, even more reliably. So a performance comparison will be hard to do, because...
both AIs just don't die. We could introduce new metrics to measure performance like how 'reliably' the bricks are
being cleared, but the time and effort to go all the way will not be worth it, I think.
From what I saw today, I'm convinced that a calculation-based AI will just be invincible.
Why should I create a hard-coded AI?
... Because it doesn't take much effort, I guess. I could still just compare the two AIs, not necessarily with
a justifiable metric, but by simply watching them play.
- I could add an AI-assist functionality in the regular game mode.
--> No reason not to! Let's do it. Which AI to use, I can decide after I test them.
- I could try training a non-simulation-based AI using this simulation-based AI.
--> Before I burn another day on this... I don't think this will work.
The AI we have only checks a finite set of angle steps, and the state it aims for is not a clear-cut target.
This AI's output is far from a fixed answer - it is just... this AI's output.
Considering all this, it would be really weird for another network to learn this simulation AI's behaviour.
Let's not.
- Write a README or something.
--> Yeah. I should. Sigh.
---
# Conclusions
1. This, as in creating a Brick Breaker AI, is a hard problem.
We don't have labeled answers to game states, so we must either use some form of reinforcement learning or
genetic algorithms (or some other approach that I'm not familiar with).
The GA approach failed. If I really take my time to learn and apply an RL model, it might work. Who knows.
But for time's sake, I decided to switch to a more computational, simulation-based approach.
If I am to learn RL, I should work on a more established problem, so that I can actually learn,
instead of having no idea what I'm doing.
2. I didn't want to do the simulation-based approach.
It seemed less Artificial Intelligence and more... simulation.
But I had made my compromise.
There was still the task of deciding which game states were good and which were bad, and I had a plan to train a
network to do this...
And I did, and it worked!
So I can safely call this a success.
Stop feeling so bad.
3. What to do next?
Write my hard-coded AI, integrate the AI-assist function.
Write README.
This was a toy project, from the start.
I need to get better at wrapping things up.
Let's move on to some real stuff now!