urns and language models #9
and more tests
with nginx and gunicorn for production
direction = fields.Str(
    required=False,
    description=gettext("Text direction."),
    missing="ltr",
    validate=validate.OneOf(["ltr", "rtl"])
)
It's not really used at the moment, but it's in the API in case we need it in the future.
Looks ok
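For reference, the behaviour the field declaration above is meant to give (optional, defaulting to "ltr", restricted to "ltr" or "rtl") can be sketched in plain Python. This is only an illustration that avoids the schema library altogether; `resolve_direction` and the two constants are hypothetical names, not part of this PR.

```python
# Hypothetical names for illustration; the PR itself uses a schema field.
ALLOWED_DIRECTIONS = ("ltr", "rtl")
DEFAULT_DIRECTION = "ltr"


def resolve_direction(params):
    """Return the text direction from request params, defaulting to 'ltr'."""
    direction = params.get("direction", DEFAULT_DIRECTION)
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError(
            f"direction must be one of {ALLOWED_DIRECTIONS}, got {direction!r}"
        )
    return direction
```

A request without `direction` resolves to "ltr", and any value other than "ltr" or "rtl" is rejected.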
@@ -0,0 +1,169 @@
import importlib


LANGUAGE = {
I think it would be more flexible to move such long lists into separate JSON files.
Probably true :-) I will file an issue for that.
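A minimal sketch of what that could look like, assuming the list is stored as a single JSON object next to the module. The helper name `load_language_data` and the file name `language.json` are hypothetical and not part of this PR.

```python
import json
from pathlib import Path


def load_language_data(path):
    """Read a JSON file and return its top-level object as a dict."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError(f"{path} must contain a JSON object")
    return data


# Possible use in the module (file name is an assumption):
# LANGUAGE = load_language_data(Path(__file__).parent / "language.json")
```

Keeping the data in JSON would let the lists be edited without touching the Python module, at the cost of one file read at import time.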
@@ -179,8 +189,7 @@ def _segmentize(
     'index': tokenIndex,
     'docIndex': token.i,
     'text': token.text,
-    'punct': token.is_punct,
-    'metadata': {}
+    'punct': token.is_punct
Did you decide to map metadata to the other data model rather than to a segment?
No, it's just that originally I had some of the metadata in a sub-object. In the end I made all metadata direct properties of the segment. I went back and forth on how to do this, so the code wasn't quite the same everywhere because of my wishy-washy refactoring.
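To illustrate the difference between the two shapes, here is a rough sketch that promotes a nested `metadata` sub-object to direct properties. `flatten_metadata` and the example keys are hypothetical; the PR's actual segment fields may differ.

```python
def flatten_metadata(segment):
    """Return a copy of segment with nested 'metadata' keys moved to the top level."""
    flat = {key: value for key, value in segment.items() if key != "metadata"}
    for key, value in segment.get("metadata", {}).items():
        # Existing top-level keys win, so core fields are never overwritten.
        flat.setdefault(key, value)
    return flat


# Example with made-up keys:
nested = {"index": 0, "text": "arma", "metadata": {"lang": "lat"}}
flat = flatten_metadata(nested)
```

With the flat shape, consumers read every property the same way and do not need to know which values were once treated as metadata.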
Fixes #2
Adds a direction parameter to the API for #4
Fixes #6
Begins a docker-compose setup for production deployment
Ports functionality and tests from llt-tokenizer to new Ancient Greek and Latin language models (incomplete for Latin; see #7)
Adds support for other available spaCy language models