url_parse does not parse correctly with google maps url #98

Open
shunyamaya opened this issue Oct 8, 2019 · 5 comments

Comments

@shunyamaya

Hi, thanks for developing the package. I noticed that url_parse, and all of the other functions that depend on it, behave strangely with Google Maps URLs.

library(urltools)

# A Google Maps URL whose path contains an unescaped "@"
google_maps <- "https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519"
url_parse(google_maps)

  scheme                       domain port
1  https 40.7519848,-74.0015045,14.7z <NA>
                                                                                path
1 data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519
  parameter fragment
1      <NA>     <NA>

Can this be fixed? Thanks!

@Ironholds
Owner

I don't think so? Google's URLs are...very much not one's friend :(. One way of fixing it might be to url_encode the path for the parsing operation? How consistent are the URL portions /before/ the path?
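
A minimal sketch of that pre-encoding idea, in base R only. The helper name `escape_path_at` is hypothetical and not part of urltools. It assumes the URL starts with `scheme://` and percent-encodes only the "@" characters that come after the authority, so a genuine `user@host` prefix is left alone.

```r
# Hypothetical helper (not part of urltools): percent-encode "@" only in the
# part of the URL that follows the authority, leaving "user@host" untouched.
escape_path_at <- function(urls) {
  # Group 1: "scheme://authority", where the authority ends at the first
  # "/", "?" or "#". Group 2: everything after it.
  pattern <- "^([A-Za-z][A-Za-z0-9+.-]*://[^/?#]*)(.*)$"
  has_authority <- grepl(pattern, urls)
  head <- ifelse(has_authority, sub(pattern, "\\1", urls), "")
  rest <- ifelse(has_authority, sub(pattern, "\\2", urls), urls)
  out <- paste0(head, gsub("@", "%40", rest, fixed = TRUE))
  # paste0() turns NA into the string "NA", so restore missing values.
  out[is.na(urls)] <- NA_character_
  out
}

escape_path_at("https://www.google.com/maps/@42.4939588,-54.8994772,3z?entry=ttu")
escape_path_at("ftp://user@example.com/dir@name")
```

The escaped string could then be handed to url_parse(); whether every downstream function copes with the `%40` is something to check against your own data.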

@hrbrmstr
Collaborator

{curlparse} handles this if you need something in the interim.

dplyr::glimpse(
  curlparse::parse_curl("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519")
)
Observations: 1
Variables: 9
$ scheme   <chr> "https"
$ user     <chr> NA
$ password <chr> NA
$ host     <chr> "www.google.com"
$ port     <chr> "443"
$ path     <chr> "/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420…
$ options  <chr> NA
$ query    <chr> NA
$ fragment <chr> NA

@BernhardClemm

My current, hacky way of dealing with this is to percent-encode the @ in the URL before applying urltools:

url <- "https://www.google.com/maps/@42.4939588,-54.8994772,3z?entry=ttu"
domain <- urltools::domain(gsub("@", "%40", url))

So it seems that the @ is causing the problem? Is there no way to fix this within the package?
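
For context on why the @ matters: in RFC 3986 an "@" separates userinfo from the host only inside the authority, and the authority ends at the first "/", "?" or "#". An unescaped "@" further along is an ordinary path character. The output in the original report is consistent with the "@" in the path being read as that userinfo separator, though this is an inference from the output, not from the urltools source. The sketch below uses the component-splitting regular expression from RFC 3986, Appendix B, in base R; the function name `rfc3986_split` is made up for illustration.

```r
# Split a URI into its generic components with the regular expression from
# RFC 3986, Appendix B. Base R only; no urltools functions are used here.
rfc3986_split <- function(url) {
  pattern <- "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?"
  m <- regmatches(url, regexec(pattern, url))[[1]]
  # m[1] is the full match, so capture group k sits at m[k + 1].
  # Groups 2, 4, 5, 7 and 9 hold scheme, authority, path, query and fragment.
  parts <- m[c(3, 5, 6, 8, 10)]
  names(parts) <- c("scheme", "authority", "path", "query", "fragment")
  parts
}

rfc3986_split("https://www.google.com/maps/@42.4939588,-54.8994772,3z?entry=ttu")
```

Here the authority comes out as "www.google.com" and the "@" stays in the path, "/maps/@42.4939588,-54.8994772,3z".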

@hrbrmstr
Collaborator

hrbrmstr commented Sep 1, 2023

or you could just use that curlparse package?

@BernhardClemm

@hrbrmstr because I also need the suffix_extract() function from urltools, and I don't want to import more packages than necessary.

I see that your other package psl has some useful functions in that regard, but it's not on CRAN :(
