Question (and perhaps suggestion): transforming parsers to limit input size, the "proper way"? #2895
Comments
A fold can be converted to a parser without any problem, but a parser cannot be converted to a fold because we will lose the error. It is a one way street. Therefore, a function like Parsers were designed to just succeed or fail without any type level distinction on why it failed. Failure is used for the Alternative implementation, if a parser fails for whatever reason we can opt to use an alternative parser. But we cannot make that decision based on the contents of the error. Parameterizing the Parser type with an error type would be a bigger change. However, the wrapper parsers are free to handle the underlying parser's error in whatever way.
You can case match on the inner parser result, handle the error case and transform the error string as you want. Thus the contents of the error string can be determined by this outer parser and not the inner one. You can tag the error with some keyword to indicate whether the error is due to inner parser failure or size overflow. The error would still be a String but you control the format of the string, so an outer parser can handle it deterministically if required.
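A minimal sketch of the tagging approach described above. It uses a toy, pure parser model rather than the library's real `Parser` type, so `Step`, `Parser`, `runParser`, `limitInput`, `classify`, `field` and the two tag strings are all hypothetical names chosen for illustration. The wrapper case matches on the inner parser's step result and prefixes every error string with a keyword, so a caller can tell a size overflow from an inner failure without guessing at message contents.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (stripPrefix)

-- Toy stand-in for a streaming parser. This is NOT the library's API:
-- it is pure and has only three step results.
data Step s b
    = Partial s      -- need more input, carry state forward
    | Done b         -- finished with a result
    | Failed String  -- failed with an untyped message

data Parser a b = forall s. Parser
    (s -> a -> Step s b)    -- consume one input element
    s                       -- initial state
    (s -> Either String b)  -- called when the input ends

-- Drive a toy parser over a list of inputs.
runParser :: Parser a b -> [a] -> Either String b
runParser (Parser step initial extract) = go initial
  where
    go s [] = extract s
    go s (x : xs) = case step s x of
        Partial s' -> go s' xs
        Done b -> Right b
        Failed e -> Left e

-- Hypothetical tags; any unambiguous prefixes would do.
limitTag, innerTag :: String
limitTag = "SIZE_LIMIT: "
innerTag = "INNER: "

-- Wrap a parser so each input carries a size and the total is bounded.
-- Every error leaving the wrapper is tagged with its origin.
limitInput :: Int -> Parser a b -> Parser (Int, a) (Int, b)
limitInput maxSize (Parser step initial extract) =
    Parser step' (0, initial) extract'
  where
    step' (used, s) (size, x)
        | used' > maxSize =
            Failed (limitTag ++ "input exceeds " ++ show maxSize ++ " units")
        | otherwise = case step s x of
            Partial s' -> Partial (used', s')
            Done b -> Done (used', b)
            Failed e -> Failed (innerTag ++ e)
      where
        used' = used + size
    extract' (used, s) = case extract s of
        Right b -> Right (used, b)
        Left e -> Left (innerTag ++ e)

-- What an outer layer can recover from the tagged string.
data Failure = SizeLimit String | Inner String | Untagged String
    deriving (Eq, Show)

classify :: String -> Failure
classify e = case (stripPrefix limitTag e, stripPrefix innerTag e) of
    (Just msg, _) -> SizeLimit msg
    (_, Just msg) -> Inner msg
    _ -> Untagged e

-- Example inner parser: collect characters up to a comma, reject quotes.
field :: Parser Char String
field = Parser step "" extract
  where
    step acc c
        | c == ',' = Done (reverse acc)
        | c == '"' = Failed "unexpected quote"
        | otherwise = Partial (c : acc)
    extract acc = Right (reverse acc)

-- Give every character a size of 1.
sized :: String -> [(Int, Char)]
sized = map (\c -> (1, c))
```

In GHCi, `runParser (limitInput 5 field) (sized "abc,rest")` evaluates to `Right (4,"abc")`, while feeding `sized "abcdefgh"` yields a `Left` whose message `classify` maps to `SizeLimit "input exceeds 5 units"`. The error is still a `String`, as noted above; only the prefix convention makes it machine-readable.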
Thanks for your quick reply @harendra-kumar! Would you be open to a PR to add an error type to Something like the following:
And replacing all wherever one sees I think this will need some further changes also. Some functions could just have their signature changed. For example:
just needs to be changed to:
But wherever there is a hardcoded string in an
I think the best way to change it will be like this:
but I'm not sure if the
Perhaps this interferes less with optimisations, and the compiler can more easily recover the original I ask all this because, whilst I'm happy to work on a PR myself to achieve this (I suspect it will be in the new year sometime), I just wanted to check with you whether you thought this was a reasonable thing to do before I dive in.
I am not averse to the idea, however, we need to think about a few things:
For position tracking and line number reporting we have another change planned, see PR #2861. Basically we will have the stream position available everywhere, which can be used to compute the location including line numbers. That PR has some performance impact; we need to see if we want to absorb the perf impact or have position tracking parsers separated from non-tracking ones. Given that we can have position tracking separately, we need to see what additional benefits we will have with a parameterized error. It can possibly be used to compose errors in a better way, but I would like some good, practical end user benefits listed. Thanks for proposing to work on this. Again, I am not against the idea but want to vet the pros and cons thoroughly before deciding to merge. A minimal prototype can bring out how complicated things become with this, and what we are getting into.
I'm actually doing some custom CSV parsing. I've looked on Hackage and I can't find a library that does all of the below:
So I figure it would be less work to write something from scratch than try to completely change the architecture of an existing library. I think I'll just get something working with So then I might try to patch your library, and try to convince the boss to allow me to open source the core of the CSV parsing library I'm writing, so you can see the PR and a use case together. That's likely only to be a new year thing though.
So I want to do some bounded space parsing. I'm collecting a bunch of tokens (in my case `Word8`, but it doesn't really matter) and combining them into a new token.
I wanted to just write my parser logic independently, and then "wrap" the parser in a way that transforms it into a new parser that fails if the input length is too large before generating a token. I don't want this "limit" logic to be part of every parser I write, I want to separate these concerns.
So I came up with something like this (excuse the verboseness, I've been quite explicit about the types):
So, basically, given a `Parser a m b`, I can generate a `Parser (Int, a) m (Int, b)`, which means I can give a "size" to all my input elements, and set a "max size" for the output.
So far so good. Well almost. The issue is that if `Parser (Int, a) m (Int, b)` fails, I just get back a string, but if the underlying `Parser a m b` fails, I also just get a string. Short of hacky pattern matching on the string, I don't know where each one came from.
So then I tried this, instead writing a `Parser` wrapper that never fails, but instead throws its error in `ExceptT`. Now I can type the error:
But now I've got a parser transformer that only fails if the underlying parser fails. But because the transformer itself doesn't introduce failures, maybe I could write this as a fold transformer. And I can:
But now I can only use this to transform folds. Although I feel like there should be a function like:
Although I haven't written one myself to prove it exists, and I couldn't find it in the docs.
So I think I've just got a bunch of vague questions, and would appreciate some guidance:
`ExceptT`, and are there any sneaky space leak issues I need to watch out for?
`Parser a m b` be instead `Parser e a m b`, and the constructor `Error String` be instead `Error e`? This would allow one to write (I think):
`limitParserInput :: (Applicative m) => Int -> Parser e a m b -> Parser (StreamSizeLimitError, e) (Int, a) m (Int, b)`
and perhaps this is better for efficiency/ergonomics?
My apologies for the somewhat verbose question, I just thought the best way to not completely go down the wrong rabbit hole was just to put it all out there the way I've been going.
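To make the error-parameterized idea from the second question concrete, here is a hedged sketch on a toy, pure parser model. It is not the library's `Parser` type and omits the monad parameter `m`; `Step`, `Parser`, `runParser`, `FieldError`, `field` and `sized` are hypothetical, and the shape of `StreamSizeLimitError` is assumed. One deliberate substitution: the combined error here is `Either StreamSizeLimitError e` rather than the tuple `(StreamSizeLimitError, e)` written in the question, because in this sketch a failure has exactly one origin.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Toy model only: a pure, simplified parser carrying an error type e.
data Step e s b
    = Partial s  -- need more input, carry state forward
    | Done b     -- finished with a result
    | Failed e   -- failed with a typed error

data Parser e a b = forall s. Parser
    (s -> a -> Step e s b)  -- consume one input element
    s                       -- initial state
    (s -> Either e b)       -- called when the input ends

-- Drive a toy parser over a list of inputs.
runParser :: Parser e a b -> [a] -> Either e b
runParser (Parser step initial extract) = go initial
  where
    go s [] = extract s
    go s (x : xs) = case step s x of
        Partial s' -> go s' xs
        Done b -> Right b
        Failed e -> Left e

-- Assumed shape: the configured limit and the size that exceeded it.
data StreamSizeLimitError = StreamSizeLimitError Int Int
    deriving (Eq, Show)

-- The wrapper adds its own error type without inspecting or
-- stringifying the inner parser's error.
limitParserInput
    :: Int
    -> Parser e a b
    -> Parser (Either StreamSizeLimitError e) (Int, a) (Int, b)
limitParserInput maxSize (Parser step initial extract) =
    Parser step' (0, initial) extract'
  where
    step' (used, s) (size, x)
        | used' > maxSize = Failed (Left (StreamSizeLimitError maxSize used'))
        | otherwise = case step s x of
            Partial s' -> Partial (used', s')
            Done b -> Done (used', b)
            Failed e -> Failed (Right e)
      where
        used' = used + size
    extract' (used, s) = case extract s of
        Right b -> Right (used, b)
        Left e -> Left (Right e)

-- Example inner parser with its own error type.
data FieldError = UnexpectedQuote
    deriving (Eq, Show)

field :: Parser FieldError Char String
field = Parser step "" extract
  where
    step acc c
        | c == ',' = Done (reverse acc)
        | c == '"' = Failed UnexpectedQuote
        | otherwise = Partial (c : acc)
    extract acc = Right (reverse acc)

-- Give every character a size of 1.
sized :: String -> [(Int, Char)]
sized = map (\c -> (1, c))
```

With this model, `runParser (limitParserInput 5 field) (sized "abcdefgh")` evaluates to `Left (Left (StreamSizeLimitError 5 6))`, and an input containing a quote yields `Left (Right UnexpectedQuote)`, so the caller can pattern match on the origin of the failure. Whether the extra type parameter helps or hurts optimisation in the real library is not something this toy can show.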