Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] AnnotationTable.TokenizedAnnotationTable #39

Closed
HLWeil opened this issue Oct 20, 2023 · 4 comments
Closed

[Discussion] AnnotationTable.TokenizedAnnotationTable #39

HLWeil opened this issue Oct 20, 2023 · 4 comments

Comments

@HLWeil
Copy link
Member

HLWeil commented Oct 20, 2023

I think we should reconsider the current design of this type as it's kind of an awkward state:

Currently it is split into a list of IO columns and a list of Term Columns. This has two-fold problems according to the current proposed state of the ARC specification 1.2:

  1. What about non-term and non-IO columns like Protocol REF?
  2. There MUST be at most 1 Input and 1 Output Column, so a list seems counterintuitive.

Alternatively to trying to design this in some specific way, we could also keep it more naive and just have a list of columns (including terms, IOs and whatever)?

#25

@kMutagene
Copy link
Member

i am thinking of a complete rework of the parsing. I think we should use ARCtrl's composite column model.

@kMutagene
Copy link
Member

kMutagene commented Oct 24, 2023

iirc ARCtrl parses annotation tables like this:

  • pattern match and assign grouping
  • everything not assignes is Freetext

if that is true, then it should be easy to use for tokenization as well, by filling these composite columns with CvParams in an additional step.

Sounds good? @HLWeil

@HLWeil
Copy link
Member Author

HLWeil commented Oct 25, 2023

Yup that's pretty much it.

It sounds fine with me, provided that it doesn't fail in some specific cases which should be checked. But as a starting point for getting your tokens for further use it should be good!

@kMutagene
Copy link
Member

Closing this as we use ARCtr's ARCTable parser now, which we then tokenize. See #48

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants