Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request never completes when using a custom matcher/transformer class and StringIO #65

Closed
ndbroadbent opened this issue Jan 2, 2019 · 1 comment

Comments

@ndbroadbent
Copy link

ndbroadbent commented Jan 2, 2019

Just posting this here in case someone runs into the same issue. This is kind of related to #36.

TL:DR; If you're going to change the body, make sure you update the Content-Length header!

I found #51 and 2a0174f, so I followed the example in the test and set up my own matcher/transformer class like this:

class BlogProxyMatcherAndTransformer
  def self.match(path)
    return BlogProxyMatcherAndTransformer.new if path.match?(/^\/blog\//)
    false
  end

  def url(path)
    blog_path = path.sub(/^\/blog\//, '/')
    [ENV['BLOG_ROOT_URL'], blog_path].join('')
  end

  def transform(response, request_uri)
    Rails.logger.info "[rack-reverse-proxy]: #{request_uri}"
    status, headers, body = response

    headers['Cache-Control'] = 'public, max-age=31464028'
    headers['Expires'] = 1.year.from_now.httpdate

    unless response[1]['Content-Type']&.include?('text/html')
      return [status, headers, body] 
    end

    replaced_body = body.to_s
      .gsub(/<meta name="?robots"? content="?noindex"?>/, '')
    [status, headers, StringIO.new(replaced_body)]
  end
end

reverse_proxy BlogProxyMatcherAndTransformer

I didn't want any search engines to crawl the blog on a different domain, in case they find it. So I added the noindex meta tag, and then stripped it from the proxied response.

I found the StringIO example in this comment and thought I'd try that (but I wasn't calling target_response.send(:session).finish.)

I tried this out, but was getting errors on staging:

screen shot 2019-01-03 at 2 10 42 am

$ curl https://mystagingapp.com/blog/
...
curl: (92) HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)

Also:

$ curl http://localhost:3000/blog/
...
curl: (18) transfer closed with 34 bytes remaining to read

Then I saw that I could just use a plain array (in #36), so I tried that.

This still gave me the same error when trying it locally. The curl request would hang, and then give me the error:

curl: (18) transfer closed with 34 bytes remaining to read

So I went back to #41 and tried adding the #finish method as well. That didn't work - when I called to_s on the body, this already closes the HTTP session.

I started looking at rack-proxy. Realized why #to_s was closing the HTTP session.

At this point I double-checked to make sure my code was causing the issue. I just returned the original instance of Rack::HttpStreamingResponse, and this worked fine (the curl request completed normally.)

I tried a new StreamWrapper class:

        class StreamingResponseWrapper
          def initialize(streaming_response)
            @streaming_response = streaming_response
          end

          def each
            @streaming_response.each do |string|
              replaced_string = string
                .gsub(/<meta name="?robots"? content="?noindex"?>/, '')
              yield(replaced_string)
            end
          end
        end

But this had the same issue. When I returned the original string, everything worked. When I removed the meta tag, the curl request hung, and it also didn't work in the browser.

Then I finally realized that it was because I was sending the original Content-Length header! I was removing bytes from the response, and curl was still waiting for me to send them.

I tried deleting the Content-Length header to see if Rack would recreate it, but it just sent "Content-Length: 0".

So I finally got it working by calling: headers['Content-Length'] = replaced_body.length.to_s

Then I also realized that I was doing this for HEAD requests with curl -I, which set the Content-Length to 0 (because body was an empty string), so that also needed to be fixed. I was able to find the request method by calling body.instance_variable_get('@request').method, and then checking that it equals GET. (Then this actually returns the wrong Content-Length, because it doesn't know about the removed bytes. I don't know if this really matters, since browsers will only be making GET requests.)

Here's the working code:

class BlogProxyMatcherAndTransformer
  def self.match(path)
    if path.match?(/^\/blog\//)
      return BlogProxyMatcherAndTransformer.new
    end

    false
  end

  def url(path)
    blog_path = path.sub(/^\/blog\//, '/')
    [ENV['BLOG_ROOT_URL'], blog_path].join('')
  end

  def transform(response, request_uri)
    Rails.logger.info "[rack-reverse-proxy]: #{request_uri}"
    status, headers, body = response

    headers['Cache-Control'] = 'public, max-age=31464028'
    headers['Expires'] = 1.year.from_now.httpdate

    method = body.instance_variable_get('@request').method

    # Ignore HEAD requests and anything that's not HTML
    unless method == 'GET' && headers['Content-Type']&.include?('text/html')
      return response
    end

    replaced_body = body.to_s
      .gsub(/<meta name="?robots"? content="?noindex"?>/, '')

    # IMPORTANT: Also update the content length!
    headers['Content-Length'] = replaced_body.length.to_s

    [status, headers, [replaced_body]]
  end
end

reverse_proxy BlogProxyMatcherAndTransformer
@ndbroadbent
Copy link
Author

Hope that helps someone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant