
What is common choice for vocabulary size? #1

insikk commented Dec 26, 2017

Intuitively, a vocabulary that is too small gives less discriminative power, but we can build the vocabulary faster (see the sketch below).

  • i.e. k-means clustering with a smaller k
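As a rough illustration (not code from this repo), vocabulary construction is typically a k-means run over the pooled local descriptors. The descriptor array, the `vocab_size` value, and the use of scikit-learn's `MiniBatchKMeans` below are assumptions for the sketch:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical stack of local descriptors (e.g. SIFT) pooled from all
# training images: one row per descriptor, 128 dimensions each.
descriptors = np.random.rand(200_000, 128).astype(np.float32)

vocab_size = 10_000  # k: a smaller k clusters faster but discriminates less
kmeans = MiniBatchKMeans(n_clusters=vocab_size, batch_size=10_000, random_state=0)
kmeans.fit(descriptors)

vocabulary = kmeans.cluster_centers_  # (vocab_size, 128): one centroid per visual word
```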

A moderately sized vocabulary gives a compact representation while retaining satisfactory discriminative power. The hope is that similar descriptors are grouped into the same visual word.
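For instance, the grouping step can be sketched as brute-force nearest-centroid assignment followed by counting (for vocabularies in the millions of words one would instead use an approximate assignment tree, as in the paper quoted below). The function name and arrays here are hypothetical:

```python
import numpy as np

def bow_histogram(image_descriptors, vocabulary):
    """Map each descriptor to its nearest visual word and count occurrences."""
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((image_descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest word for each descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float32)
    return hist / max(hist.sum(), 1.0)  # L1-normalized bag-of-words vector
```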

When the vocabulary size grows to the number of available descriptors, there is no reason to use a vocabulary at all.

According to the paper "Large Vocabularies over a Fine Quantization" by Andrej Mikulik, Michal Perdoch, Ondrej Chum, and Jiri Matas:

5.2 Size of the vocabulary.
There are different opinions about the number of visual words in the vocabulary for image retrieval. Philbin et al. in [Philbin et al., 2007] achieved the best mAP for object recognition with a vocabulary of 1M visual words and predict a performance drop for larger vocabularies. We attribute the result in [Philbin et al., 2007] to a too small training dataset (16.7M descriptors). In our case the vocabularies with up to 64M words are built using 11G training descriptors. Experiments show that the larger the vocabulary is, the better performance is achieved, even for plain bag-of-words retrieval. Introducing the alternative words, the situation is changed even more rapidly and, as expected, they are more useful for larger vocabularies (Figure 5). We have not built vocabularies larger than 64M because the memory footprint of the assignment tree started to be impractical and the performance has almost converged.

With the above reasoning, for the Oxford 5k images (16M descriptors), a vocabulary size of 1M may work best.
