Add text parser. #11

Merged
merged 15 commits into from
Mar 10, 2020

@siler siler commented Feb 26, 2020

Fixes issue #10.

Alright, here's the parser. I've still got some qualms with it but I wanted to get a CR up so we could start a discussion.

I basically translated the ANTLR grammar directly and included snippets of it where appropriate (similar to what exists in binary.rs). Perhaps not as important compared to the binary format, but I thought it would be helpful when reading. The code is laid out in the same order as the text encoding. I haven't integrated other text from the Ion spec but it is probably worth doing to explain the behavior in some locations.

Some items I'm thinking about:

  • impl From<&str> for SymbolToken is probably not a reasonable usage.
  • Parsing using char instead of u8 is probably really slow, but the primary mode of communication with Ion should probably be the binary encoding so I'm not sure it is worth worrying about, especially before benchmarks.
  • Need to remove the iterator experiment for the TLV stream and integrate it properly with the parser module.
  • There is a bunch of duplication in the test harnesses currently.
  • I'd like to get the error reporting to something nice that can point out what went wrong; I haven't dug into this too far yet.
  • The "bad" folder is not manually verified against specific errors (this is related to the above point).
  • There's a bunch of allocations happening for Strings and other things but I'm not really sure how to get rid of them since we have to deal with underscore separated numerals and escaped strings. Maybe Cow could help here in cases where those aren't present?
  • There's a couple dependencies I was fooling around with that I still need to clean up. Running out of time this morning though.
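The Cow idea from the allocation bullet can be sketched roughly as follows. This is a minimal illustration and not code from this PR: `strip_underscores` is a hypothetical helper name, and it only covers the underscore-separated numeral case. It borrows the input when no separator is present and allocates only when one has to be removed.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of this PR): removes `_` digit separators
/// from a numeric literal, allocating only when a separator is present.
fn strip_underscores(literal: &str) -> Cow<'_, str> {
    if literal.contains('_') {
        // At least one separator: build a new String without them.
        Cow::Owned(literal.chars().filter(|&c| c != '_').collect())
    } else {
        // Nothing to strip: hand back the original slice, no allocation.
        Cow::Borrowed(literal)
    }
}
```

Calling `strip_underscores("12345")` would return the borrowed variant, while `strip_underscores("1_000_000")` would return an owned `String`. The same shape could plausibly apply to escaped strings, borrowing when no escape sequence occurs.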

Anyway, excited for some feedback. Thanks for taking a look!

@PeytonT

@siler siler changed the title from "Squish" to "Add text parser." Feb 26, 2020
@PeytonT
Owner

PeytonT commented Feb 29, 2020

Hi there!

I expect I will have some time over this weekend to review this. I'll also put some thought into the other topics you mentioned here.

Thanks for the contribution!

@siler
Author

siler commented Feb 29, 2020

Yep!

Cleaned it up a bit more tonight. Couple other notes I wanted to mention:

  • I included the time crate. Chrono is way too strict for Ion, but we can at least use the Date part of time to verify months/leap years, etc. to start with. I kind of hacked it in just to ensure all the tests pass for now.
  • Using a top level value iterator is actually rather nice since you can keep the remaining data and current state of the symbol table together.
  • Test harness duplication from my first comment is fixed.
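A rough sketch of the top-level value iterator idea, under stated assumptions: `SymbolTable` and `TopLevelValues` are hypothetical stand-ins rather than the types in this PR, and the "grammar" here is just whitespace-separated tokens instead of Ion text. The point is only that one struct holds both the remaining input and the symbol table state.

```rust
/// Hypothetical stand-in for the crate's symbol table type.
#[derive(Debug, Default)]
struct SymbolTable {
    symbols: Vec<String>,
}

/// Iterates over top-level values, keeping the unparsed input and the
/// current symbol table together in one place.
struct TopLevelValues<'a> {
    remaining: &'a str,
    symbol_table: SymbolTable,
}

impl<'a> TopLevelValues<'a> {
    fn new(input: &'a str) -> Self {
        TopLevelValues {
            remaining: input,
            symbol_table: SymbolTable::default(),
        }
    }
}

impl<'a> Iterator for TopLevelValues<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<Self::Item> {
        // Toy grammar (assumption): values are whitespace-separated tokens.
        let input = self.remaining.trim_start();
        if input.is_empty() {
            self.remaining = input;
            return None;
        }
        let end = input.find(char::is_whitespace).unwrap_or(input.len());
        let (token, rest) = input.split_at(end);
        self.remaining = rest;
        // Toy rule (assumption): record each distinct token as a symbol,
        // so later values are parsed against the updated table.
        if !self.symbol_table.symbols.iter().any(|s| s == token) {
            self.symbol_table.symbols.push(token.to_string());
        }
        Some(token)
    }
}
```

Because the state lives in the iterator, a caller can stop partway through, inspect `remaining` and `symbol_table`, and resume later without threading either value through separate function arguments.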

Owner

@PeytonT PeytonT left a comment

Whew! That was a large pull request indeed. Many thanks for the significant effort that clearly went into this!

I can tell by some of the comments that you ran up against some of the broken edges of the specification and tests. It would be great if you could open issues about the errors you found against the spec. If you'd rather not, please respond with a bit more detail to some of the comments I left, and I'll get around to it. I did open one already, because it's relevant to the code in this pull request. Badgering the Ion maintainers is becoming something of a trend for this project.

I wasn't able to study everything in as much detail as I would have liked and still get the review done over this weekend, but I did get through everything. I left an enormous pile of comments, but I don't want that to detract from how appreciative I am of the effort here. There are a lot of comments because there is a lot of code! Pretty much all of the comments are about style, a few are about correctness. Let me know what you think, a lot of them involve questions for you.

I'm a fan of the Google CR guide, and I'd be happy to merge this as-is and then make the changes I asked after myself. But this can sometimes make people grumpy, or cause them to feel disrespected, so I'd be happy for you to weigh in on my suggestions.

Thanks again! :)

src/parser/ion_1_0/current_symbol_table.rs (outdated)
src/error.rs (outdated)
src/error.rs (outdated)
src/parser/constants.rs (outdated)
src/parser/ion_1_0/current_symbol_table.rs (outdated)
src/parser/ion_1_0/text/mod.rs (outdated)
src/parser/ion_1_0/text/mod.rs (outdated)
src/parser/ion_1_0/text/mod.rs (outdated)
src/parser/ion_1_0/text/mod.rs
src/parser/ion_1_0/text/mod.rs (outdated)
@PeytonT
Owner

PeytonT commented Mar 2, 2020

Some combination of extremely long lines and possibly many comments seems to have screwed up the GitHub discussion UI. I recommend looking through the PR itself to see which sections the comments actually refer to.

@siler
Author

siler commented Mar 5, 2020

> Whew! That was a large pull request indeed. Many thanks for the significant effort that clearly went into this!
>
> I can tell by some of the comments that you ran up against some of the broken edges of the specification and tests. It would be great if you could open issues about the errors you found against the spec. If you'd rather not, please respond with a bit more detail to some of the comments I left, and I'll get around to it. I did open one already, because it's relevant to the code in this pull request. Badgering the Ion maintainers is becoming something of a trend for this project.
>
> I wasn't able to study everything in as much detail as I would have liked and still get the review done over this weekend, but I did get through everything. I left an enormous pile of comments, but I don't want that to detract from how appreciative I am of the effort here. There are a lot of comments because there is a lot of code! Pretty much all of the comments are about style, a few are about correctness. Let me know what you think, a lot of them involve questions for you.
>
> I'm a fan of the Google CR guide, and I'd be happy to merge this as-is and then make the changes I asked after myself. But this can sometimes make people grumpy, or cause them to feel disrespected, so I'd be happy for you to weigh in on my suggestions.
>
> Thanks again! :)

Probably should have commented on this earlier, but yeah that all sounds fine. As evidenced by my activity I'd like to polish it up. I think at this point I've got most of the items sorted out that aren't dependent on either input from you or an issue with the Ion team. There's a couple loose ends I'm working on (for example I just filed this: amazon-ion/ion-tests#65) to avoid a different commit for the tests subproject, but I expect to have that squared away in the next couple minutes.

@siler siler requested a review from PeytonT March 5, 2020 03:52
@siler
Author

siler commented Mar 5, 2020

Apologies, didn't mean to request a review. Finger got heavy at the wrong moment, doesn't seem to be a way to cancel it. :)

@siler
Author

siler commented Mar 8, 2020

Alright, I think this is cleaned up enough and aligns with the discussion in issues #12 and #13. Of course if I missed anything feel free to add an issue and I'll keep an eye on them, or just change it if you prefer. I'd also like to add some issues for some specific tasks I'm considering or that we've identified (I'd like to continue working on this crate, I'm currently using bincode as a stop-gap in a personal project).

src/value.rs (outdated)
@PeytonT
Owner

PeytonT commented Mar 9, 2020

> Alright, I think this is cleaned up enough and aligns with the discussion in issues #12 and #13. Of course if I missed anything feel free to add an issue and I'll keep an eye on them, or just change it if you prefer. I'd also like to add some issues for some specific tasks I'm considering or that we've identified (I'd like to continue working on this crate, I'm currently using bincode as a stop-gap in a personal project).

Looks pretty good to me! Is there anything else you are planning to revise in this PR? This has gotten large enough and touches enough parts of the code that I'm cautious about causing merge conflicts with other changes.

@siler
Author

siler commented Mar 10, 2020

Yeah, I think any other revisions should be separate, too much is outstanding and I definitely don't want to hold you back. :) Seems like this should be a good start for the text side though.

@PeytonT
Owner

PeytonT commented Mar 10, 2020

> Yeah, I think any other revisions should be separate, too much is outstanding and I definitely don't want to hold you back. :) Seems like this should be a good start for the text side though.

Sounds good to me, and yes, certainly a very significant expansion of the crate's functionality!

One last thing remains. Can you confirm for me that you are the copyright holder of these changes, and that you are contributing them in accordance with the dual MIT/Apache-2.0 license of this crate?

@siler
Author

siler commented Mar 10, 2020

I am, and yes.

@PeytonT
Owner

PeytonT commented Mar 10, 2020

Excellent

@PeytonT PeytonT merged commit c08ddc7 into PeytonT:master Mar 10, 2020