Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] Support custom retry logic per method #1081

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

mgyucht
Copy link
Contributor

@mgyucht mgyucht commented Nov 4, 2024

Changes

Some requests can be safely retried when the API returns a 504 Gateway Timeout, but not all. In this PR, we refactor the retry logic slightly to allow discriminating based on the request and response as well as any potential error returned from the Go HTTP client or parsed from the API response. The logic for this is consolidated into a single place, the ErrorRetriable field of the HTTP Client's config.

In addition, we add retries for the Get Permissions API, which can fail in this way, to provide better stability for callers of this endpoint. Today, we don't have a clear indication of which API methods are idempotent, so the approach is very conservative, only retrying requests whose method and path match a given pattern.

This PR incidentally rewrites the retry logic for the DefaultErrorRetriable, reading the status code directly from the *http.Response instead of from the parsed error. This is less brittle in case a custom error parser is provided that doesn't deserialize 4xx/5xx responses using the HttpError type.

Tests

The first commit is a refactor of the existing code. All existing tests pass.

A new test is added that verifies that the Get Permissions API retries when the API responds with a 504. I verified that this fails on the initial commit.

  • make test passing
  • make fmt applied
  • relevant integration tests applied

@mgyucht mgyucht changed the title [Fix][WIP] Support custom retry logic per method. [Fix][WIP] Support custom retry logic per method Nov 4, 2024
@mgyucht mgyucht temporarily deployed to test-trigger-is November 4, 2024 13:25 — with GitHub Actions Inactive
@mgyucht mgyucht temporarily deployed to test-trigger-is November 4, 2024 13:25 — with GitHub Actions Inactive
@mgyucht mgyucht temporarily deployed to test-trigger-is November 4, 2024 13:46 — with GitHub Actions Inactive
@mgyucht mgyucht changed the title [Fix][WIP] Support custom retry logic per method [Fix] Support custom retry logic per method Nov 4, 2024
@mgyucht mgyucht temporarily deployed to test-trigger-is November 4, 2024 13:46 — with GitHub Actions Inactive
…ent" requests varies somewhat from service to service
@mgyucht mgyucht temporarily deployed to test-trigger-is November 4, 2024 14:44 — with GitHub Actions Inactive
Copy link

github-actions bot commented Nov 4, 2024

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-go

Inputs:

  • PR number: 1081
  • Commit SHA: 1e51e8bf2d0c68f24fac87b3a9ace8f0d615b2af

Checks will be approved automatically on success.

@mgyucht mgyucht temporarily deployed to test-trigger-is November 4, 2024 14:44 — with GitHub Actions Inactive
@@ -63,8 +63,7 @@ func (c *DatabricksClient) GetOAuthToken(ctx context.Context, authDetails string

// Do sends an HTTP request against path.
func (c *DatabricksClient) Do(ctx context.Context, method, path string,
headers map[string]string, request, response any,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add this back in if desired, just a small formatting change.

}
return false
},
httpclient.RetryUrlErrors,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this out of the ApiClient to have a single codesite where the retry logic is defined for the client. The downside is that you need to add this explicitly in your ErrorRetriable if you don't specify DefaultErrorRetriable. Happy to make this a default behavior, let me know what you think.

@eng-dev-ecosystem-bot
Copy link
Collaborator

Test Details: go/deco-tests/11666553890

@@ -63,8 +63,7 @@ func (c *DatabricksClient) GetOAuthToken(ctx context.Context, authDetails string

// Do sends an HTTP request against path.
func (c *DatabricksClient) Do(ctx context.Context, method, path string,
headers map[string]string, request, response any,
visitors ...func(*http.Request) error) error {
headers map[string]string, request, response any, visitors ...func(*http.Request) error) error {
Copy link
Contributor

@renaudhartert-db renaudhartert-db Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] I would go with one parameter per line to echo the way function calls and struct declarations are made:

func (c *DatabricksClient) Do(
	ctx      context.Context, 
	method   string,
	path     string,
	headers  map[string]string, 
	request  any, 
	response any, 
	visitors ...func(*http.Request) error
) error {

There's a couple of similar patterns in the Go standard library but not many. One of the reason is that long lists of parameters are usually substituted with a struct (https://google.github.io/styleguide/go/best-practices#option-structure). I actually wanted to make that change for quite sometime but it didn't feel right sending one PR just for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. This Do() method should be formulated either as a struct argument or with functional options, since everything after path is optional. I'd prefer not to change this signature unless necessary, as it is used in multiple places in the TF provider.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with not using a struct as this is out of scope for this PR. Let's at least format idiomatically then. Our Go style does not mandate a 80 char line length which makes the current formatting quite arbitrary. I'm fine with either having everything on a single line or one argument per line.

Comment on lines +1 to +54
package config

import (
"context"
"fmt"
"io"
"net/http"
"strings"
"testing"

"github.com/databricks/databricks-sdk-go/httpclient"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)

type hc func(r *http.Request) (*http.Response, error)

func (cb hc) RoundTrip(r *http.Request) (*http.Response, error) {
return cb(r)
}

func (cb hc) SkipRetryOnIO() bool {
return true
}

func TestApiClient_RetriesGetPermissionsOnGatewayTimeout(t *testing.T) {
requestCount := 0
c := &Config{
HTTPTransport: hc(func(r *http.Request) (*http.Response, error) {
initialRequestCount := requestCount
requestCount++
if initialRequestCount == 0 {
return &http.Response{
Request: r,
StatusCode: http.StatusGatewayTimeout,
Body: io.NopCloser(strings.NewReader(
fmt.Sprintf(`{"error_code":"TEMPORARILY_UNAVAILABLE", "message":"The service at %s is taking too long to process your request. Please try again later or try a faster operation."}`, r.URL))),
}, nil
}
return &http.Response{
Request: r,
StatusCode: http.StatusOK,
Body: io.NopCloser(strings.NewReader(`{"permissions": ["can_run_queries"]}`)),
}, nil
}),
}
client, err := c.NewApiClient()
require.NoError(t, err)
ctx := context.Background()
var res map[string][]string
err = client.Do(ctx, "GET", "/api/2.0/permissions/object/id", httpclient.WithResponseUnmarshal(&res))
assert.NoError(t, err)
assert.Equal(t, map[string][]string{"permissions": {"can_run_queries"}}, res)
}
Copy link
Contributor

@renaudhartert-db renaudhartert-db Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend structuring the unit tests differently by having this test focused on how ApiClient manages mocked ErrorRetrier (or the absence of an ErrorRetrier). The tests of the ErrorRetrier themselves (e.g. verify that the path pattern match properly) should happen in httpclient/errors_test.go.

This test could look like the following (I did not verify that the code works):

type mock struct {
	MaxFails      int            // number of times the failed Response is returned
	FailResponse  *http.Response // response to return in case of fail
	FailError     error          // error to return in case of fail
	NumCalls      int            // total number of calls
}

func (m *mock) RoundTrip(r *http.Request) (*http.Response, error) {
	m.NumCalls++
	if m.NumCalls <= m.MaxFails {
		return m.FailResponse, n.FailError
	}
	return &http.Response{
		Request:    r,
		StatusCode: http.StatusOK,
		Body:       io.NopCloser(strings.NewReader(`{}`)),
	}, nil
}

func (m *mock) SkipRetryOnIO() bool {
	return true
}

func TestApiClient_Do_retries(t *testing.T) {
	testCases := []struct{
		desc         string
		config       *Config
		errorRetrier ErrorRetrier
		wantNumCalls int
	} {
		{
			desc: "nil retrier",
			mock: &mock{
				MaxFails: 1,
				FailResponse: &http.Response{StatusCode: http.StatusGatewayTimeout}
			}
			wantNumCalls: 1,
		},
		{
			desc: "no retry",
			mock: &mock{
				MaxFails: 1,
				FailResponse: &http.Response{StatusCode: http.StatusGatewayTimeout}
			}
			errorRetrier: func(context.Context, *http.Request, *common.ResponseWrapper, error) bool {
				return false
			},
			wantNumCalls: 1,
		},
		{
			desc: "retry 1 time",
			mock: &mock{
				MaxFails: 1,
				FailResponse: &http.Response{StatusCode: http.StatusGatewayTimeout}
			}
			errorRetrier: func(context.Context, *http.Request, *common.ResponseWrapper, error) bool {
				return true
			},
			wantNumCalls: 2,
		},
		{
			desc: "retry 2 times",
			mock: &mock{
				MaxFails: 2,
				FailResponse: &http.Response{StatusCode: http.StatusGatewayTimeout}
			}
			errorRetrier: func(_ context.Context, _ *http.Request, _ *common.ResponseWrapper, _ error) bool {
				return true
			},
			wantNumCalls: 3,
		},
		{
			desc: "retry 3 times",
			mock: &mock{
				MaxFails: 3,
				FailResponse: &http.Response{StatusCode: http.StatusGatewayTimeout}
			}
			errorRetrier: func(_ context.Context, _ *http.Request, _ *common.ResponseWrapper, _ error) bool {
				return true
			},
			wantNumCalls: 4,
		},
	} 


	func _, tc := range testCases {
		t.Run(tc.desc, func(t *testing.T) {
			cfg := &Config{HTTPTransport: tc.mock} 
			client, err := cfg.NewApiClient()
			client.ErrorRetrier = tc.errorRetrier

			err = client.Do(context.Background(), "GET", "test-path") 
			gotNumCalls = tc.mock.NumCalls

			if gotNumCalls != tc.wantNumCalls {
				t.Errorf("got %d calls, want %d", gotNumCalls, tc.wantNumCalls)
			}
		})
	}
}

Please feel free to ignore this comment if this is too much work or if the ApiClient cannot be instrumented that easily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely can be instrumented this way, and this is a nice test case to use (I'll adapt it and include it in this PR). However, I did want to specifically test the get permissions pathway. Essentially, this tests that "the client returned by Config.GetApiClient() correctly implements retry on 504." I will add more test cases here though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did want to specifically test the get permissions pathway

Sounds good to me as long as this complements the overall testing of the retry logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants