Strippable affix class regexes #1333
Comments
EDIT: Fix the POSIX regex.
I found a better solution that all the regex libraries support:
I will change the code to support this too.
EDIT yet again:
EDIT:
@linas, however, it seems there is a simpler solution that doesn't use a regex affix: Use
Another thing:
I said above:
If a word contains two kinds of punctuation, one that has to be separated and one that should not, this approach wouldn't work, since a word can only either match or not match a regex. So I will send the PR that splits by affix regexes. EDIT:
This was a bad idea, since in general such matches have nothing to do with word splits.
I finished implementing and testing it, and here are the examples I used:
In corpus-fixes.batch:
LPUNC and RPUNC also support regexes, and I tested them with /^[[:punct:]]/ and /[[:punct:]]$/ (respectively) in amy.
However, there is a problem: this is supported only when configured with PCRE2. When configured with C++, compilation of the lookbehind regex fails (lookbehind is not supported by C++). POSIX regexes (C library and TRE) also fail. (This is not really a problem for amy etc., since we don't need to support other regex libraries there.)
Possible solutions:
3. … HAVE_POCRE2_H.
4. Only support PCRE2 on POSIX systems. (BTW, it is now easy for me to add PCRE2 support on MS-Windows too.)
5. … "/(?<!\d)[,:]|[,:](?!\d)/PCRE2" (or even flag "e" for "extended").
I am for (5) and otherwise for (2) or (1).
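To illustrate what the lookaround pattern quoted above is meant to match, here is a minimal sketch in Python. Python's re module is not one of the regex libraries discussed in this issue; it is used only because it accepts the same lookbehind/lookahead syntax for this particular pattern. The helper name strippable_positions is hypothetical and is not part of the library.

```python
import re

# Pattern from the issue text: a comma or colon that is either
# not preceded by a digit, or not followed by a digit.
# Punctuation sitting between two digits (as in "1,000") does not match.
SPLIT_PUNCT = re.compile(r"(?<!\d)[,:]|[,:](?!\d)")


def strippable_positions(token):
    """Return the indices of commas/colons in token that the pattern matches.

    Hypothetical helper for illustration only.
    """
    return [m.start() for m in SPLIT_PUNCT.finditer(token)]


if __name__ == "__main__":
    for token in ["1,000", "10:30", "word,", "a:b", "3,", ",5"]:
        print(token, strippable_positions(token))
```

With this pattern, the comma in "1,000" and the colon in "10:30" are left alone, while the punctuation in "word," and "a:b" is matched. A trailing comma after a digit, as in "3,", still matches through the second alternative. This relies on the lookbehind, which is the construct reported above as unsupported by the C++ and POSIX regex back ends.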