Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Support custom analysers #111

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

Treora
Copy link
Contributor

@Treora Treora commented Dec 28, 2014

To support custom analysers, it seems best to abandon model.create_all
in favour of an index-wide method (es.create_models) that sets all
mappings (and custom analysis settings) at once.

Code is largely transplanted from hypothesis/h#1825

Details:
Custom analysers(&filters&tokenisers) are shared in Elasticsearch among
all document types (models). To update a model's mappings one needs to
make sure the custom analysers are defined first, but updating those
just for one model is inelegant; The index would have to be closed for
each update, and one cannot check for duplicate definitions of analysers
as it is not visible which model defined a particular analyser. Treating
index-wide settings as per-model settings creates leaky abstractions.

@tilgovi, @nickstenning (and anyone concerned): Please let me know if you disagree with abandoning the model-centric abstraction in this case. I think it is better this way because Elasticsearch does not group analysers by document type, so if we do pretend that it does we either get leaky abstractions with possibly unreliable edge cases, or we need to do more effort than it's worth to pertain the model-centric abstraction.
We could even consider to just define analysers centrally instead of in each model, to simplify things even further (no merging of settings required, it would just match ES's structure), but I think I'm okay with this approach.

I will do tests, add tests and polish things when the approach is considered satisfactory.

@tilgovi
Copy link
Member

tilgovi commented Dec 28, 2014

I think there should be symmetry in names, create_all and drop_all.

I agree it doesn't make sense to have the settings defined on the models so let's just not. Maybe we should configure it on the es object.

Finally, the natural conclusion of this to me would be to use metaclass techniques (or have an API on the es object that such metaprogramming would use) to register models. That kind of magic fits with magic properties, like __mapping__, but doesn't need a get_mapping.

How far do we want to go? I'd be fine to leave it all alone, TBH, because I'd like to invest instead in looking toward other storage.

@Treora
Copy link
Contributor Author

Treora commented Dec 28, 2014

I think there should be symmetry in names, create_all and drop_all.

I was thinking that drop_all could perhaps more clearly be named delete_index or so.

I'm not sure what you mean with metaclass/metaprogramming techniques. But I also think that creating nice abstractions and APIs is just a waste of effort. Creating abstractions with less effort simply leads to more bugs, maintenance, leaky abstractions and nasty edge cases, so I would rather just make things simple and stay close to the structure Elasticsearch provides for now.

I'll look how clean things become if settings is just defined separately rather than possibly distributed over the models.

To support custom analysers, it seems best to abandon model.create_all
in favour of an index-wide method (es.create_models) that sets all
mappings (and custom analysis settings) at once.

Code is largely transplanted from hypothesis/h#1825

Details:
Custom analysers(&filters&tokenisers) are shared in Elasticsearch among
all document types (models). To update a model's mappings one needs to
make sure the custom analysers are defined first, but updating those
just for one model is inelegant; The index would have to be closed for
each update, and one cannot check for duplicate definitions of analysers
as it is not visible which model defined a particular analyser. Treating
index-wide settings as per-model settings creates leaky abstractions.
The function deletes the whole index, not just one type of documents, so
this makes much more sense.
Note that also when the index is an alias index_exists will be true.
@Treora Treora force-pushed the support_custom_analysers branch from 2d00201 to fe6e900 Compare December 30, 2014 13:50
The approach where each model can define custom analysers did not match
Elasticsearch's structure well, and created more complexity than it was
worth.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants