
urns and language models #9

Merged: 25 commits merged into master on Sep 22, 2020

Conversation

@balmas (Member) commented on Sep 21, 2020

- Fixes #2
- Adds direction parameter to the API for #4
- Fixes #6
- Beginning of a docker-compose setup for production deployment
- Ports functionality and tests from llt-tokenizer to the new Ancient Greek and Latin language models (incomplete for Latin, see #7)
- Adds support for other available spaCy language models (see the sketch below)
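As a rough illustration of that last point (not code from this PR, and the model names below are hypothetical), dynamic spaCy model support generally comes down to mapping a language code to an installed model package and loading it on demand:

```python
# Hypothetical sketch only -- illustrative, not the repository's code.
import importlib

import spacy

# Minimal stand-in for the PR's much larger LANGUAGE mapping.
SPACY_MODELS = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
}

def load_spacy_model(lang: str):
    """Return a loaded spaCy pipeline for a supported language code."""
    package = SPACY_MODELS.get(lang)
    if package is None:
        raise ValueError(f"No spaCy model configured for language '{lang}'")
    # Importing the package first gives a clearer error when it isn't installed.
    importlib.import_module(package)
    return spacy.load(package)
```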

@balmas changed the title from "urns and language modles" to "urns and language models" on Sep 21, 2020
Comment on lines +25 to +30
direction = fields.Str(
    required=False,
    description=gettext("Text direction."),
    missing="ltr",
    validate=validate.OneOf(["ltr", "rtl"])
)
Member Author (balmas):
It's not really used at the moment, but it's in the API in case we need it in the future.
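For illustration only, a hypothetical client call showing how the parameter could be supplied once it is used; the endpoint, port, and payload shape are assumptions, and only the direction field with its ltr/rtl values comes from the schema excerpt above:

```python
import requests

# Hypothetical endpoint and payload; only "direction" (ltr/rtl) is taken
# from the schema excerpt shown in the review comment.
response = requests.post(
    "http://localhost:5000/tokenize",
    json={"text": "example text to tokenize", "direction": "rtl"},
)
response.raise_for_status()
print(response.json())
```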

@balmas requested a review from irina060981 on September 21, 2020 at 20:58
@irina060981 (Member) left a comment:

Looks ok

@@ -0,0 +1,169 @@
import importlib

LANGUAGE = {
Member:

I think it would be more flexible to place such long lists in separate JSON files.

Member Author (balmas):

probably true :-) I will enter an item for that.
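A minimal sketch of the suggestion, assuming a hypothetical languages.json file next to the module (not the repository's actual layout): the long mapping is plain data, so it can live in JSON and be loaded at import time.

```python
import json
from pathlib import Path

# Hypothetical file location; the LANGUAGE mapping itself stays unchanged,
# it just moves out of the Python source into a JSON file.
_CONFIG_PATH = Path(__file__).parent / "languages.json"

with _CONFIG_PATH.open(encoding="utf-8") as config_file:
    LANGUAGE = json.load(config_file)
```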

@@ -179,8 +189,7 @@ def _segmentize(
                 'index': tokenIndex,
                 'docIndex': token.i,
                 'text': token.text,
-                'punct': token.is_punct,
-                'metadata': {}
+                'punct': token.is_punct
Member:

Did you decide to map the metadata to the other data model, rather than to a segment?

Member Author (balmas):

No, it's just that originally I had some of the metadata in a sub-object. In the end I made all metadata direct properties of the segment. I went back and forth on how to do this, so the code wasn't quite the same everywhere due to my wishy-washy refactoring.
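For clarity, a hypothetical example of the resulting shape; the token fields come from the diff above, while the segment-level names and values are illustrative assumptions:

```python
# Illustrative only: metadata values sit directly on the segment dict
# rather than in a nested 'metadata' sub-object.
segment = {
    "index": 0,
    "lang": "grc",        # hypothetical metadata, now a direct property
    "direction": "ltr",   # hypothetical metadata, now a direct property
    "tokens": [
        {
            "index": 0,    # token fields as shown in the diff
            "docIndex": 0,
            "text": "μῆνιν",
            "punct": False,
        },
    ],
}
```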

@balmas merged commit 3d05d5f into master on Sep 22, 2020
balmas pushed a commit that referenced this pull request Dec 7, 2020
balmas pushed a commit that referenced this pull request Dec 7, 2020

Successfully merging this pull request may close these issues.

- citation is not applied correctly without the first empty line
- add token exception for cts urns and uris