Index is larger than image set? #22
Comments
I think the problem here is that Elasticsearch is writing my data out to disk before indexing it, and the disk is filling up because my application writes images more quickly than Elasticsearch can handle them. When I stop my application from adding more data, Elasticsearch takes a few minutes and then the index shrinks to <200MB. Elasticsearch then unassigns itself from my index and appears to lose a lot of the image information. Do you know how I can force Elasticsearch to perform the indexing through an API call, so that my application waits for it to complete?
I guess my main question is: how do I perform the initial indexing of a large image set?
I index using the bulk API, sending around 10 images per request, so indexing at least stays stable. I have not indexed a large image set yet, so I am not sure what will happen.
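For anyone following along, here is a minimal sketch of the batching approach described above, written with only the Python standard library. It rests on several assumptions that are not confirmed in this thread: the host, index and type are copied from the curl command in the issue, the image is sent base64-encoded in the `img` field, and the document IDs are hypothetical. A bulk request only returns once its items have been processed, so waiting for each response before sending the next batch stops the application from outrunning the cluster. The final `_refresh` call makes the indexed documents searchable.

```python
import base64
import json
import urllib.request
from typing import Iterable, Iterator, List, Tuple

BATCH_SIZE = 10  # "around 10 images per request", as mentioned in the thread


def batches(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(images: List[Tuple[str, bytes]], field: str = "img") -> bytes:
    """Build a newline-delimited bulk payload: one action line and one
    document line per image. The base64 encoding of the image field is an
    assumption about what the image mapping expects."""
    lines = []
    for doc_id, raw in images:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps({field: base64.b64encode(raw).decode("ascii")}))
    # The bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def post(url: str, body: bytes = b"") -> dict:
    """POST and block until Elasticsearch has answered."""
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def index_all(images: Iterable[Tuple[str, bytes]],
              host: str = "http://localhost:9201",
              index: str = "images", doc_type: str = "art") -> None:
    """Send one batch at a time, then refresh the index once at the end."""
    for batch in batches(images, BATCH_SIZE):
        result = post(f"{host}/{index}/{doc_type}/_bulk", bulk_body(batch))
        if result.get("errors"):
            raise RuntimeError("bulk request reported item failures")
    post(f"{host}/{index}/_refresh")
```

Calling `index_all(...)` with an iterable of `(doc_id, image_bytes)` pairs would index the whole set sequentially. Whether this avoids the disk growth described above has not been verified here.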
I'm trying to index 120,000 images (around 50GB) from a spinning HDD onto a 128GB SSD using SCALABLE_COLOR for testing purposes. To my surprise, after just 20,000 images the index has swelled to 60GB. At that rate the finished index would be around 360GB, roughly seven times the size of my source images.
Is this expected behaviour? Am I accidentally storing the entire image in Elasticsearch?
For the record, my mapping is:
curl -XPUT 'localhost:9201/images/art/_mapping' -d '{
  "my_image_item": {
    "properties": {
      "img": {
        "type": "image",
        "feature": {
          "SCALABLE_COLOR": { "hash": ["LSH"] }
        }
      }
    }
  }
}'
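As a rough sanity check on the numbers in this report, the sketch below extrapolates the observed index size linearly and compares it with what storing every image as base64 would cost. Both the linear growth and the idea that each document keeps a base64 copy of the image in `_source` are assumptions, not something confirmed in this thread. The function names are made up for illustration.

```python
def projected_index_gb(index_gb_so_far: float, images_so_far: int,
                       total_images: int) -> float:
    """Linear extrapolation of index size to the full image set."""
    return index_gb_so_far * total_images / images_so_far


def base64_gb(raw_gb: float) -> float:
    """Base64 encodes every 3 bytes as 4 characters, so size grows by 4/3."""
    return raw_gb * 4 / 3


# Figures quoted in the report: 60GB after 20,000 of 120,000 images, 50GB of sources.
projected = projected_index_gb(60, 20_000, 120_000)
print(f"projected index size: {projected:.0f} GB")
print(f"ratio to source set:  {projected / 50:.1f}x")
print(f"base64 copy of all sources: {base64_gb(50):.1f} GB")
```

If the assumptions hold, a stored base64 copy of all the sources accounts for only about 67GB in total, so it would not by itself explain 60GB after the first 20,000 images.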