[WIP] Support custom analysers #111

Treora · 2014-12-28T00:33:57Z

To support custom analysers, it seems best to abandon model.create_all
in favour of an index-wide method (es.create_models) that sets all
mappings (and custom analysis settings) at once.

Code is largely transplanted from hypothesis/h#1825.

Details:
Custom analysers (as well as filters and tokenisers) are shared in
Elasticsearch among all document types (models). To update a model's
mappings one needs to make sure the custom analysers are defined first,
but updating those just for one model is inelegant: the index would have
to be closed for each update, and one cannot check for duplicate
definitions of analysers, as it is not visible which model defined a
particular analyser. Treating index-wide settings as per-model settings
creates leaky abstractions.
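
To make the index-wide idea concrete, here is a minimal sketch; it is not the code in this PR. It assumes each model exposes a `__type__` name and a `__mapping__` dict, and that the request body follows the Elasticsearch 1.x shape (`settings.analysis` next to per-type `mappings`). The names `build_index_body`, `Annotation`, `Document` and the `lowercase_keyword` analyser are made up for illustration. It only builds the body that would be sent once, when the index is created.

```python
def build_index_body(models, analysis=None):
    """Combine index-wide analysis settings with per-model mappings.

    Hypothetical helper: `models` are classes with `__type__` and
    `__mapping__` attributes; `analysis` is a dict in the shape that
    Elasticsearch expects under settings.analysis.
    """
    body = {"mappings": {}}
    if analysis:
        # Analysers live at index level, so they are set exactly once here.
        body["settings"] = {"analysis": analysis}
    for model in models:
        body["mappings"][model.__type__] = {"properties": model.__mapping__}
    return body


# Illustrative models; the field names are assumptions.
class Annotation(object):
    __type__ = "annotation"
    __mapping__ = {"uri": {"type": "string", "analyzer": "lowercase_keyword"}}


class Document(object):
    __type__ = "document"
    __mapping__ = {"title": {"type": "string"}}


# One central definition of the custom analyser, shared by all types.
ANALYSIS = {
    "analyzer": {
        "lowercase_keyword": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": ["lowercase"],
        }
    }
}

body = build_index_body([Annotation, Document], ANALYSIS)
```

Because the analysers and all mappings end up in a single body, nothing has to be merged per model and the index does not need to be closed and reopened for each model.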

@tilgovi, @nickstenning (and anyone concerned): Please let me know if you disagree with abandoning the model-centric abstraction in this case. I think it is better this way because Elasticsearch does not group analysers by document type. If we pretend that it does, we either get leaky abstractions with possibly unreliable edge cases, or we have to put in more effort than it is worth to maintain the model-centric abstraction.
We could even consider just defining analysers centrally instead of in each model, to simplify things even further (no merging of settings required, it would just match ES's structure), but I think I'm okay with this approach.

I will test this, add tests and polish things once the approach is considered satisfactory.

tilgovi · 2014-12-28T01:01:40Z

I think there should be symmetry in names, create_all and drop_all.

I agree it doesn't make sense to have the settings defined on the models so let's just not. Maybe we should configure it on the es object.

Finally, the natural conclusion of this to me would be to use metaclass techniques (or have an API on the es object that such metaprogramming would use) to register models. That kind of magic fits with magic properties, like __mapping__, but doesn't need a get_mapping.
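
For illustration only, metaclass-based registration could look roughly like the sketch below. Everything in it (`ModelRegistry`, `Model`, the two example models) is hypothetical and not part of the existing code base; it just shows how defining a model class could register it automatically, so that an index-wide create_all would know which mappings to collect.

```python
class ModelRegistry(type):
    """Metaclass that records every model class as it is defined."""

    models = []

    def __init__(cls, name, bases, namespace):
        super(ModelRegistry, cls).__init__(name, bases, namespace)
        # Skip the abstract base, which declares no document type.
        if namespace.get("__type__") is not None:
            ModelRegistry.models.append(cls)


# Calling the metaclass directly works on both Python 2 and Python 3.
Model = ModelRegistry("Model", (object,), {"__type__": None})


class Annotation(Model):
    __type__ = "annotation"
    __mapping__ = {"text": {"type": "string"}}


class Document(Model):
    __type__ = "document"
    __mapping__ = {"title": {"type": "string"}}


# The registry now lists the concrete models in definition order.
registered = [m.__type__ for m in ModelRegistry.models]
```

With a registry like this, the magic `__mapping__` property is read directly from each registered class, so no separate get_mapping is needed.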

How far do we want to go? I'd be fine to leave it all alone, TBH, because I'd like to invest instead in looking toward other storage.

Treora · 2014-12-28T11:33:11Z

> I think there should be symmetry in names, create_all and drop_all.

I was thinking that drop_all could perhaps more clearly be named delete_index or so.

I'm not sure what you mean by metaclass/metaprogramming techniques. But I also think that creating nice abstractions and APIs is not worth the effort here, and creating abstractions with less effort simply leads to more bugs, maintenance, leaky abstractions and nasty edge cases. So I would rather keep things simple and stay close to the structure Elasticsearch provides for now.

I'll see how clean things become if the settings are just defined separately rather than possibly distributed over the models.

Remove needless context

2d97f29

Treora added 3 commits December 30, 2014 13:23

Move Model.drop_all -> es.drop_all

1fe0464

The function deletes the whole index, not just one type of documents, so this makes much more sense.

Test for index existence before creation

36ec42f

Note that index_exists will also be true when the index name is an alias.

Treora force-pushed the support_custom_analysers branch from 2d00201 to fe6e900 Compare December 30, 2014 13:50

Treora added 3 commits December 30, 2014 15:07

Take analysis settings out of models

a41ab89

The approach where each model can define custom analysers did not match Elasticsearch's structure well, and created more complexity than it was worth.

Rename create_models -> create_all

bbf71eb

Update changelog

458789c

Treora force-pushed the support_custom_analysers branch from fe6e900 to 458789c Compare December 30, 2014 14:08

tilgovi force-pushed the master branch from dedde9e to 06d0f65 Compare May 18, 2015 21:55
