This repository has been archived by the owner on Jul 5, 2018. It is now read-only.

Sentence segmentation with apostrophes #1

Open
fbaumgardt opened this issue Oct 21, 2013 · 1 comment

Comments

@fbaumgardt

Apostrophes (ʼ) are not parsed correctly. They sometimes appear in pairs to mark quotations, and the second (closing) apostrophe usually gets assigned to the following sentence. If there is no following sentence (i.e., at the end of a chapter), it is assigned a sentence of its own with length=1. You can find these locations by searching for `"1".*\n\s{3}</`.
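A minimal sketch of running that search over a file's text, reporting line numbers. The regex is the one quoted above, used verbatim; the sample markup, the `sentence`/`word` element names, and the helper `find_suspect_lines` are hypothetical, invented for illustration. Under that assumed layout, the pattern flags a word with id "1" that is immediately followed by a sentence's closing tag, i.e. a one-word sentence.

```python
import re

# The search pattern quoted in the issue, used verbatim: a line containing "1",
# immediately followed by a line that begins with three whitespace characters
# and a closing tag.
SUSPECT_RE = re.compile(r'"1".*\n\s{3}</')

# Hypothetical treebank-style markup, invented for illustration only.
sample = (
    '  <sentence id="41">\n'
    '    <word id="1" form="λέγει"/>\n'
    '    <word id="2" form="."/>\n'
    '   </sentence>\n'
    '  <sentence id="42">\n'
    '    <word id="1" form="ʼ"/>\n'
    '   </sentence>\n'
)


def find_suspect_lines(xml_text):
    """Return the 1-based line numbers on which SUSPECT_RE matches start."""
    return [xml_text.count("\n", 0, m.start()) + 1
            for m in SUSPECT_RE.finditer(xml_text)]


print(find_suspect_lines(sample))  # → [6]
```

Only the second sentence (the stray apostrophe) is reported; in the first sentence, the word with id "1" is followed by another word rather than a closing tag.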

I am not familiar with the sentence ID scheme used here. How can we fix a bug that affects sentence segmentation?

ghost assigned balmas on Nov 15, 2013
@balmas
Collaborator

balmas commented Nov 15, 2013

This is a bug in the old Perseus segmentation code. It should be noted as a requirement for the Annotation Service and for any tokenization services we use in Perseids.
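One way a segmenter could meet this requirement is sketched below. It is not the Perseus code; it rests on a stated assumption: a closing apostrophe is written directly after the sentence-final punctuation (".ʼ"), while an opening apostrophe follows whitespace (" ʼ"). Under that rule a closing quote stays with its own sentence, even at the end of a chapter. Because ʼ also marks elision in Greek (e.g. δʼ), the sketch treats an apostrophe as closing only when it is glued to punctuation. The punctuation set and the function name `segment` are illustrative choices.

```python
import re

# Sentence-final marks assumed here: full stop, ASCII ';' and U+037E (Greek
# question mark), U+0387 / U+00B7 (ano teleia / middle dot), '!' and '?'.
# Closing quotation marks assumed: U+02BC (modifier letter apostrophe) and U+2019.
BOUNDARY_RE = re.compile(
    r"[.;\u037e\u0387\u00b7!?]+"   # one or more sentence-final marks
    r"[\u02bc\u2019]*"             # closing apostrophes glued to the punctuation
    r"(?=\s|$)"                    # boundary only before whitespace or end of text
)


def segment(text):
    """Split text into sentences, keeping a closing apostrophe with its sentence."""
    sentences = []
    start = 0
    for match in BOUNDARY_RE.finditer(text):
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    # Trailing text without final punctuation still forms a sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    chapter_end = "ὁ δὲ εἶπεν· ʼχαῖρε, ὦ φίλε.ʼ"
    for s in segment(chapter_end):
        print(s)
```

With this rule, the quotation closing a chapter keeps its apostrophe instead of producing a one-token sentence. Texts that put a space before the closing apostrophe would need a different heuristic, for example pairing opening and closing marks.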

Development

No branches or pull requests

2 participants