Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plus signs to spaces. #219

Closed
mgenereu opened this issue Oct 4, 2020 · 8 comments · Fixed by #225
Closed

Plus signs to spaces. #219

mgenereu opened this issue Oct 4, 2020 · 8 comments · Fixed by #225
Assignees

Comments

@mgenereu
Copy link

mgenereu commented Oct 4, 2020

Describe the bug
Plus signs aren't being URL decoded to spaces. Looks like there's support for encoding to plus signs from spaces.

To Reproduce
https://scastie.scala-lang.org/mgenereu/KsC9oHBvR5qd84JVublZuA/9

The expected result is space and not a plus sign in the name.

I think I can do the PR for this if it's a confirmed bug.

@theon
Copy link
Member

theon commented Oct 4, 2020

Thanks for raising this. I'm surprised this isn't already supported and it's definitely something we should support.

It'll have to be configurable as we'll need to support both users that want to decode + as spaces and those who want them left as +. The question is what the default behaviour should be?

I don't think RFC3986 mentions anything about this. I'm not familiar enough with the WHATWG URL spec to know if it has an opinion, but I'll take a look later today. Opting in would mirror the behaviour for encoding spaces in scala-uri, so maybe that is best for now? Let me know if you have any thoughts.

@theon
Copy link
Member

theon commented Oct 4, 2020

Had a read through the WHATWG spec and it's pretty confusing. It looks like it requires space/plus encoding/decoding only for query parameters and only by their URLSearchParams class and not by their URL class

Notice how in the note under 6.2. URLSearchParams class, when the URL class from the spec is used, the space is encoded as %20, but when URLSearchParams comes into play, the space is encoded as +:

As a URLSearchParams object uses the application/x-www-form-urlencoded format underneath there are some difference with how it encodes certain code points compared to a URL object (including href and search). This can be especially surprising when using searchParams to operate on a URL’s query.

const url = new URL('https://example.com/?a=b ~');
console.log(url.href);   // "https://example.com/?a=b%20~"
url.searchParams.sort();
console.log(url.href);   // "https://example.com/?a=b+%7E"

URLSearchParams objects will percent-encode anything in the application/x-www-form-urlencoded percent-encode set, and will encode U+0020 SPACE as U+002B (+).

I think in scala-uri we should:

  • Create a new implementation of UriDecoder that replaces + with (space)
  • Facilitate the ability for users to something like this:
import io.lemonlabs.uri.decoding._
implicit val config = UriConfig(queryDecoder = percentDecode + plusAsSpace)

This is backwards compatible and mirrors what we already do this on the encoding side

Let me know what you think. I can implement this within the next couple weeks, or feel free to raise a PR if you'd like it sooner 👍

@mgenereu
Copy link
Author

mgenereu commented Oct 5, 2020

Plus has been the encoding for space in a URL for 25 years and I've been using it for 25 years. Nobody would be surprised if you made decoding plus to space and %2B to plus as the default, announce a breaking change, and provided a way to disable. It's been true on every web project I've ever been on. My specific usage case here is in the browser:

const render_url = new URL("render", document.location.href);
render_url.searchParams.append("name", $("#name").val());
render_url.searchParams.append("title", $("#title").val());
render_url.searchParams.append("location", $("#location").val());
$("#render").attr("src", render_url.href)

I can go and find you the decoding in every web framework out there if you want precedence on this.

My intention isn't to be right but rather the incredible weight behind this functionality based on history.

@mgenereu
Copy link
Author

mgenereu commented Oct 5, 2020

@mgenereu
Copy link
Author

mgenereu commented Oct 5, 2020

Reading this: https://en.wikipedia.org/wiki/Percent-encoding#The_application/x-www-form-urlencoded_type
tilts me towards the methods that you're saying. All these frameworks are assuming form/JavaScript submissions via browsers and I'm guessing you're thinking a bigger scope? I've not been to exposed to query parameters outside of the web.

https://tools.ietf.org/html/rfc1630 (search for "QUERY STRINGS")

@theon
Copy link
Member

theon commented Oct 16, 2020

After looking back over a few older related issues, I agree that encoding/decoding space to + in query params should be default; I think this will suit most users of this library. Any users who wish to use %20 instead can configure this behaviour

I'll release this in version 3.0.0 shortly with a migration note.

Related issues:

@theon
Copy link
Member

theon commented Oct 17, 2020

This is now released as version 3.0.0; let me know how you get on. Thanks again for raising this 👍

@mgenereu
Copy link
Author

Works perfect. Removed all my silly .map(_.replace("+", " ")) hacks. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants