Request never completes when using a custom matcher/transformer class and StringIO #65

ndbroadbent · 2019-01-02T20:07:53Z

Just posting this here in case someone runs into the same issue. This is kind of related to #36.

TL:DR; If you're going to change the body, make sure you update the `Content-Length` header!

I found #51 and 2a0174f, so I followed the example in the test and set up my own matcher/transformer class like this:

class BlogProxyMatcherAndTransformer
  def self.match(path)
    return BlogProxyMatcherAndTransformer.new if path.match?(/^\/blog\//)
    false
  end

  def url(path)
    blog_path = path.sub(/^\/blog\//, '/')
    [ENV['BLOG_ROOT_URL'], blog_path].join('')
  end

  def transform(response, request_uri)
    Rails.logger.info "[rack-reverse-proxy]: #{request_uri}"
    status, headers, body = response

    headers['Cache-Control'] = 'public, max-age=31464028'
    headers['Expires'] = 1.year.from_now.httpdate

    unless response[1]['Content-Type']&.include?('text/html')
      return [status, headers, body] 
    end

    replaced_body = body.to_s
      .gsub(/<meta name="?robots"? content="?noindex"?>/, '')
    [status, headers, StringIO.new(replaced_body)]
  end
end

reverse_proxy BlogProxyMatcherAndTransformer

I didn't want any search engines to crawl the blog on a different domain, in case they find it. So I added the noindex meta tag, and then stripped it from the proxied response.

I found the StringIO example in this comment and thought I'd try that (but I wasn't calling target_response.send(:session).finish.)

I tried this out, but was getting errors on staging:

$ curl https://mystagingapp.com/blog/
...
curl: (92) HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)

Also:

$ curl http://localhost:3000/blog/
...
curl: (18) transfer closed with 34 bytes remaining to read

Then I saw that I could just use a plain array (in #36), so I tried that.

This still gave me the same error when trying it locally. The curl request would hang, and then give me the error:

curl: (18) transfer closed with 34 bytes remaining to read

So I went back to #41 and tried adding the #finish method as well. That didn't work - when I called to_s on the body, this already closes the HTTP session.

I started looking at rack-proxy. Realized why #to_s was closing the HTTP session.

At this point I double-checked to make sure my code was causing the issue. I just returned the original instance of Rack::HttpStreamingResponse, and this worked fine (the curl request completed normally.)

I tried a new StreamWrapper class:

        class StreamingResponseWrapper
          def initialize(streaming_response)
            @streaming_response = streaming_response
          end

          def each
            @streaming_response.each do |string|
              replaced_string = string
                .gsub(/<meta name="?robots"? content="?noindex"?>/, '')
              yield(replaced_string)
            end
          end
        end

But this had the same issue. When I returned the original string, everything worked. When I removed the meta tag, the curl request hung, and it also didn't work in the browser.

Then I finally realized that it was because I was sending the original Content-Length header! I was removing bytes from the response, and curl was still waiting for me to send them.

I tried deleting the Content-Length header to see if Rack would recreate it, but it just sent "Content-Length: 0".

So I finally got it working by calling: headers['Content-Length'] = replaced_body.length.to_s

Then I also realized that I was doing this for HEAD requests with curl -I, which set the Content-Length to 0 (because body was an empty string), so that also needed to be fixed. I was able to find the request method by calling body.instance_variable_get('@request').method, and then checking that it equals GET. (Then this actually returns the wrong Content-Length, because it doesn't know about the removed bytes. I don't know if this really matters, since browsers will only be making GET requests.)

Here's the working code:

class BlogProxyMatcherAndTransformer
  def self.match(path)
    if path.match?(/^\/blog\//)
      return BlogProxyMatcherAndTransformer.new
    end

    false
  end

  def url(path)
    blog_path = path.sub(/^\/blog\//, '/')
    [ENV['BLOG_ROOT_URL'], blog_path].join('')
  end

  def transform(response, request_uri)
    Rails.logger.info "[rack-reverse-proxy]: #{request_uri}"
    status, headers, body = response

    headers['Cache-Control'] = 'public, max-age=31464028'
    headers['Expires'] = 1.year.from_now.httpdate

    method = body.instance_variable_get('@request').method

    # Ignore HEAD requests and anything that's not HTML
    unless method == 'GET' && headers['Content-Type']&.include?('text/html')
      return response
    end

    replaced_body = body.to_s
      .gsub(/<meta name="?robots"? content="?noindex"?>/, '')

    # IMPORTANT: Also update the content length!
    headers['Content-Length'] = replaced_body.length.to_s

    [status, headers, [replaced_body]]
  end
end

reverse_proxy BlogProxyMatcherAndTransformer

The text was updated successfully, but these errors were encountered:

ndbroadbent · 2019-01-02T20:11:24Z

Hope that helps someone!

ndbroadbent closed this as completed Jan 2, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Request never completes when using a custom matcher/transformer class and StringIO #65

Request never completes when using a custom matcher/transformer class and StringIO #65

ndbroadbent commented Jan 2, 2019 •

edited

Loading

ndbroadbent commented Jan 2, 2019

Request never completes when using a custom matcher/transformer class and StringIO #65

Request never completes when using a custom matcher/transformer class and StringIO #65

Comments

ndbroadbent commented Jan 2, 2019 • edited Loading

TL:DR; If you're going to change the body, make sure you update the Content-Length header!

ndbroadbent commented Jan 2, 2019

ndbroadbent commented Jan 2, 2019 •

edited

Loading

TL:DR; If you're going to change the body, make sure you update the `Content-Length` header!