I tried to construct a trie with tuples as keys, which gave me AttributeError: 'tuple' object has no attribute 'encode'. So I suppose the library accepts only strings, but sometimes you want other structures.
@dragoon I'm not sure adding support for having any object as a key is a good idea - because I don't know how to implement it efficiently.
We can't store just the id of an object (it defeats the purpose of marisa-trie), so we would have to serialize the key to bytes somehow in order to use it as a key. For strings the wrapper encodes unicode input to UTF-8.
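To illustrate where the error in the report presumably comes from, here is a minimal sketch in plain Python (it does not call marisa-trie at all): strings have an `encode` method, tuples do not, so any code path that encodes its keys fails on a tuple.

```python
# A string key can be encoded to UTF-8 bytes.
key = "new york"
encoded = key.encode("utf8")
print(encoded)

# A tuple has no .encode() method, so the same call raises AttributeError.
try:
    ("new", "york").encode("utf8")
except AttributeError as exc:
    message = str(exc)
    print(message)
```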
To support arbitrary objects we could use pickle, but I'm not sure how compressible the result is, and better task-specific serialization methods usually exist. For example, in your case (a tuple with 2 strings) it makes sense to join the strings using some separator before adding them to the trie and to split by this separator when retrieving. You don't need marisa-trie support to do this.
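A minimal sketch of the join/split approach, assuming the separator `"\x1f"` never occurs inside a tuple element. The helper names `join_key` and `split_key` and the choice of separator are hypothetical, not part of marisa-trie.

```python
# Assumed separator: the ASCII "unit separator" control character.
# It must never occur inside a tuple element for this to be reversible.
SEP = "\x1f"


def join_key(parts):
    """Turn a tuple of strings into a single string key."""
    for part in parts:
        if SEP in part:
            raise ValueError("separator found inside tuple element: %r" % (part,))
    return SEP.join(parts)


def split_key(key):
    """Turn a joined string key back into a tuple of strings."""
    return tuple(key.split(SEP))


ngrams = [("new", "york"), ("new", "jersey")]

# These are ordinary strings, so they can be handed to the trie
# instead of the tuples themselves.
keys = [join_key(ngram) for ngram in ngrams]

# When reading keys back from the trie, split them again.
restored = [split_key(key) for key in keys]
```

Because the first tuple element comes first in the joined string, tuples that share leading elements also share a string prefix, so prefix lookups still work on the joined keys.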
It's true, though, that there are some edge cases (what if the separator occurs inside a tuple element?), that splitting/joining tuples could be more efficient if implemented in Cython, and that storing tuples of strings is quite common. So I think adding a trie subclass that allows tuples of strings as keys is a good idea - ngram storage is a common use case. Pull requests are welcome :)
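One possible way to handle the "separator inside the tuple element" edge case is to escape it. This is only a pure-Python sketch under my own assumptions: the `SEP` and `ESC` characters and the helper names are hypothetical, and this is not how any existing marisa-trie class works.

```python
SEP = "\x1f"  # assumed separator between tuple elements
ESC = "\x1b"  # assumed escape character


def escape(part):
    """Escape ESC first, then SEP, so every literal SEP is preceded by ESC."""
    return part.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)


def join_escaped(parts):
    """Join a tuple of arbitrary strings into one string key."""
    return SEP.join(escape(part) for part in parts)


def split_escaped(key):
    """Reverse join_escaped, honouring escaped characters."""
    parts, current, i = [], [], 0
    while i < len(key):
        ch = key[i]
        if ch == ESC:
            # Escaped character: take the next character literally.
            current.append(key[i + 1])
            i += 2
        elif ch == SEP:
            # Unescaped separator: the current element is complete.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return tuple(parts)


tricky = ("a\x1fb", "c\x1bd")
round_trip = split_escaped(join_escaped(tricky))
```

The character-by-character loop in `split_escaped` is the kind of code that would likely benefit from a Cython implementation.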