Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p.a.redirector redirections get no redirection for crawlers (via SSR) and volto redirects to incorrect ++api++ url depending on rewrite rule setup #4800

Closed
3 tasks
djay opened this issue May 23, 2023 · 6 comments

Comments

@djay
Copy link
Member

djay commented May 23, 2023

Describe the bug

Really this is two bugs

  • if a crawler crawls a page with p.a.redirector redirection (such as manual url management or via changing the short url, it will get a 200 with the content of the target SSR page. ie it won't get a 301 and won't know the page has moved.
  • if accessed in a browser, under certain circumstances, the page will load but end up with the wrong url in the browser bar (++api++ added to the url) so if you reload the page it will break. Note this doesn't happen on out-of-the-box plone or 6.demo.plone.org
    • seems to depend on how you do your rewrite rules. externally vs internally. no rewrite rules is broken (results in ++api++ in the url)

Relevant tickets in volto #1095 and restapi plone/plone.restapi#181 don't make it clear if the ++api++ is meant to be left in or not. but it makes most sense an api call should redirect to an api call.

To Reproduce

Steps to reproduce the behavior:

Out of the box plone with volto

  1. Create a page
  2. change the short name
  3. access old url using curl
  4. you will get 200 not 30x from SSR
  5. if using a browser then volto will make a content api call that will then result in a 30x redirect
  6. If you use the nginx and rewrite rules then you will get api redirect without ++api++ in it. e.g this reproduces this case
    $ curl -H "Accept: application/json" -v http://localhost:8080/VirtualHostBase/http/www.blah.com/++api++/Plone/VirtualHostRoot/test1
    ...
    < HTTP/1.1 302 Found
    ...
    < Location: http://www.blah.com:8080/test2
    
  7. but if no rewrite rules or internal VHM then redirect is with ++api++ e.g.
    $ curl -H "Accept: application/json" -v http://localhost:8080/++api++/Plone/test1
    ... 
    < HTTP/1.1 302 Found
    ...
    Location: http://localhost:8080/++api++/Plone/test2
    
  8. This url is directly put into the window.location and if you refresh you get json

It seems that volto relies on the ++api++ being missing in the redirect and if it gets a redirect from an api with ++api++ then it puts ++api++ into the browser url causing the bug

Expected behavior

  • direct access to a changed url should result in a 30x from SSR, not 200.
  • clicking on a changed url in the browser should result the correct new url in the browser no matter what rewrite rules or production setup you have
  • an api redirect should redirect to an api url regardless of VHM

proposed solution

Screenshots

You can see the redirect without ++api++ in demo.plone.org

image (4)

Software (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Volto Version [e.g. 8.5.0]
  • Plone Version [e.g. 5.2.2]
  • Plone REST API Version [e.g. 7.0.1]

Additional context

Add any other context about the problem here.

@djay djay changed the title p.a.redirector redirections result in broken ++api++ links in some cases and no redirection for crawlers p.a.redirector redirections get no redirection for crawlers (via SSR) and sometimes result in broken ++api++ links in browsers May 23, 2023
@djay djay changed the title p.a.redirector redirections get no redirection for crawlers (via SSR) and sometimes result in broken ++api++ links in browsers p.a.redirector redirections get no redirection for crawlers (via SSR) and browsers sometimes incorrectly redirect to ++api++ links May 23, 2023
@djay djay changed the title p.a.redirector redirections get no redirection for crawlers (via SSR) and browsers sometimes incorrectly redirect to ++api++ links p.a.redirector redirections get no redirection for crawlers (via SSR) and volto redirects to incorrect ++api++ url depending on rewrite rule setup May 24, 2023
@djay
Copy link
Member Author

djay commented May 24, 2023

Managed to reproduce it locally so updated the ticket.

@djay
Copy link
Member Author

djay commented May 24, 2023

@davisagli wondering if you can review the proposed fix before we start working on it.

@davisagli
Copy link
Member

davisagli commented May 26, 2023

@djay Thanks for taking a look at this. I think you're mostly on the right track about the proposed fixes but I'll give some commentary...

fix volto so it can handle ++api++ in the redirect

Makes sense. In general, volto is responsible for adding or removing ++api++ from the URL, depending on whether it is using the URL for something on the client-side (e.g. setting the browser location -- shouldn't have ++api++) or a call to the server (should have ++api++).

All the volto sites I've worked on use rewrites, so I was a bit surprised that not using them causes the backend to return URLs that include ++api++. But, if that is the case, then yes, volto needs to fix it before redirecting the browser.

fix restapi to ensure ++api++ is in the redirect ... An api call should redirect to an api call.

I was thinking the same thing, but then I realized that this might be tricky to implement correctly for all cases. The problem is that the backend returns some URLs that are meant to be used for subsequent calls to the backend (with ++api++) and some URLs that are meant to be used on the client side (e.g. these redirects, or things like action URLs). So we would need to come up with a good way for code in the backend to distinguish what kind of URL it's producing, and I don't think we don't have anything like that yet.

This shouldn't be strictly necessary as long as volto fixes whatever URL it has before using it, so I wouldn't prioritize this part of the fix. Don't take this as stop energy -- I'm certainly interested if you have ideas how to handle this distinction better -- I'm just trying to prioritize where to spend effort.

adjust SSR code to pass on the redirect that it gets from the api redirect

Yep, we want this, it just hasn't been implemented yet.

There's an important nuance: We want volto's server side to serve a permanent redirect (301 or 308) to logged out users, but a temporary redirect (302 or 307) to logged in users. The reason is that we need permanent redirects for SEO, but we need temporary redirects to avoid poisoning the cache of editors who are moving content around.

@djay
Copy link
Member Author

djay commented May 26, 2023

@davisagli there is seperate restapi code to deal with redirects. It already deals with the case of putting ++API++ back in, just not in all cases so it's a bug. The only case it seems to be designed to give a non API result is if the original wasn't an API call or if the redirect is external (not sure how that is possible)

I suspect the place to put the logic around different redirect codes for logged in or not is in the restapi redirect code. Then volto just passes on whatever code it gets from the API call. Will that work?

@davisagli
Copy link
Member

I suspect the place to put the logic around different redirect codes for logged in or not is in the restapi redirect code. Then volto just passes on whatever code it gets from the API call. Will that work?

I was thinking of doing it only in volto. (We don't really need this distinction at the API level, because we don't really care about SEO of JSON responses.) But I don't see a problem with doing it in the API like you said, and it does sound a bit more consistent.

@davisagli
Copy link
Member

This was fixed a while back in #4854

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants