Machine Learning applied to the problem of finding Wally's keys.
Where's Wally is a heavily illustrated puzzle book series created by Martin Handford, in which you have to find a little guy hidden in very chaotic scenes. It seemed to me like a good problem for a computer to solve using Machine Learning (ML) techniques.
After a web search returned a first page full of people who had already done exactly that, I switched to an arguably more difficult task (for humans, at least): finding Wally's keys, one of the many side goals of the books.
As an example, here is one of the book scenes:
And here is Wally's key in that scene:
(Go ahead and try it yourself. If you give up, here is the answer.)
All the data I used to train this model comes from the Where-is-Waldo-Wally DeviantArt gallery, which has scans of every scene from books #2 to #5 with convenient marks pointing to the answers. All the original images can be found in the data/1-original-images folder.
From those images, I cut out the answer panels (data/2-no-panels folder), then chopped the resulting images into 40x40-pixel slices (data/3-slices folder and its subfolders).
The slice filenames follow the 00-00-k.jpg format, where the numbers are the coordinates of the slice and the -k suffix marks slices that contain a key.
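The slicing step is simple enough to sketch. Below is a minimal, hypothetical version of it using Pillow; the function name, paths, and the way key locations are supplied are assumptions for illustration, not the repo's actual script:

```python
# Hypothetical sketch of the slicing step described above: chop a scene into
# 40x40 tiles and save each one as row-col.jpg, appending -k when the tile
# is known to contain the key. The key_tiles lookup is an assumption.
from PIL import Image
import os

SLICE = 40

def slice_scene(image_path, out_dir, key_tiles=()):
    """key_tiles: assumed set of (row, col) pairs that contain the key."""
    img = Image.open(image_path)
    os.makedirs(out_dir, exist_ok=True)
    rows, cols = img.height // SLICE, img.width // SLICE
    for row in range(rows):
        for col in range(cols):
            box = (col * SLICE, row * SLICE,
                   (col + 1) * SLICE, (row + 1) * SLICE)
            suffix = "-k" if (row, col) in key_tiles else ""
            img.crop(box).save(os.path.join(out_dir, f"{row:02d}-{col:02d}{suffix}.jpg"))
```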
In the 3-slices folder there is also a subfolder named augmented, containing rotated versions of the key slices to address the class imbalance (there are many more slices without keys than with them). If you want to know more, Bharath Raj wrote a nice article about data augmentation.
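A rotation-only augmentation pass like that can be as simple as the sketch below (the -r<angle> suffix and the exact glob pattern are my own assumptions):

```python
# Sketch of rotation-based augmentation for the minority ("key") class.
# Directory layout matches the folders above; filenames are assumptions.
from PIL import Image
import glob
import os

src_dir = "data/3-slices"
dst_dir = "data/3-slices/augmented"
os.makedirs(dst_dir, exist_ok=True)

for path in glob.glob(os.path.join(src_dir, "**", "*-k.jpg"), recursive=True):
    img = Image.open(path)
    stem = os.path.splitext(os.path.basename(path))[0]
    for angle in (90, 180, 270):
        # right-angle rotations of a square 40x40 tile lose no pixels
        img.rotate(angle).save(os.path.join(dst_dir, f"{stem}-r{angle}.jpg"))
```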
Unfortunately, my model couldn't take full advantage of all those images: I had to discard most of them before training because my computer couldn't handle the load.
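If you run into the same constraint, one simple way to thin the dataset while keeping every positive example is to subsample only the no-key class. This is a sketch of the idea, not the repo's code; the keep ratio is arbitrary:

```python
# Sketch: keep all "key" slices, randomly subsample the far more numerous
# "no key" slices. The 1-in-4 ratio and glob patterns are assumptions.
import glob
import random

all_files = glob.glob("data/3-slices/**/*.jpg", recursive=True)
key_files = [f for f in all_files if "-k" in f]
no_key_files = [f for f in all_files if "-k" not in f]

random.seed(42)  # reproducible subsample
kept = key_files + random.sample(no_key_files, len(no_key_files) // 4)
```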
The model was built with evilsocket's Ergo framework, which makes it easier to build models on top of Keras.
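A typical Ergo workflow looks roughly like this (commands from my reading of the Ergo README; exact names and flags may differ between versions):

```bash
ergo create wally-keys                    # scaffolds model.py, prepare.py, train.py
# ... edit the scaffolded files to define the dataset and the network ...
ergo train wally-keys --dataset data.csv  # train on the prepared dataset
ergo view wally-keys                      # inspect training history and metrics
```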
I first tried the same model evilsocket had used to demonstrate Ergo, a vanilla Convolutional Neural Network (CNN) for airplane detection in satellite imagery, but unfortunately the results were not very good.
I ended up making the network deeper, and things improved. Here is the final model (the exact definition ships with the project):
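To give a flavor of what "deeper" means here, this is only a Keras sketch of such a CNN for 40x40 RGB slices; the layer counts and sizes are illustrative assumptions, not necessarily the exact trained architecture:

```python
# Illustrative sketch of a deeper CNN for 40x40 RGB slices, in the spirit
# of the model described above. Layer sizes are assumptions.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(40, 40, 3)),
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Conv2D(64, (3, 3), activation="relu"),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation="relu"),
    Dropout(0.5),
    Dense(2, activation="softmax"),  # two classes: key / no key
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```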
I was able to get the following results:
Training --------------------------------------------

                  precision    recall  f1-score   support

               0       1.00      1.00      1.00     23789
               1       1.00      1.00      1.00     12040

       micro avg       1.00      1.00      1.00     35829
       macro avg       1.00      1.00      1.00     35829
    weighted avg       1.00      1.00      1.00     35829

    confusion matrix:
    [[23787     2]
     [    3 12037]]

Validation ------------------------------------------

                  precision    recall  f1-score   support

               0       1.00      1.00      1.00      5030
               1       0.99      1.00      1.00      2647

       micro avg       1.00      1.00      1.00      7677
       macro avg       1.00      1.00      1.00      7677
    weighted avg       1.00      1.00      1.00      7677

    confusion matrix:
    [[5011   19]
     [   1 2646]]

Test ------------------------------------------------

                  precision    recall  f1-score   support

               0       1.00      1.00      1.00      5085
               1       0.99      1.00      1.00      2592

       micro avg       1.00      1.00      1.00      7677
       macro avg       1.00      1.00      1.00      7677
    weighted avg       1.00      1.00      1.00      7677

    confusion matrix:
    [[5060   25]
     [   0 2592]]
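These reports follow scikit-learn's classification_report format, which is what Ergo prints after training as far as I can tell. For reference, a standalone equivalent (the label arrays below are placeholders, not real data):

```python
# Produce a classification report and confusion matrix like the ones above.
# y_true / y_pred are placeholders for real labels and model predictions.
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 0, 1, 1]  # placeholder ground-truth labels
y_pred = [0, 1, 1, 1]  # placeholder model predictions

print(classification_report(y_true, y_pred))
print("confusion matrix:")
print(confusion_matrix(y_true, y_pred))
```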
Thanks to all the devs whose work allowed me to create this project.
Thank you, Gaby (personal cheerleader).