Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: prevent race condition when marshalling cached operations #96

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

grevych
Copy link
Contributor

@grevych grevych commented Aug 4, 2022

What this PR does / why we need it

Re-bootstrapping with a recent version flags a race condition when running tests:

 :: Running go test (or_test)


  60ms . ·······················↷·································✖✖✖✖✖✖
       graphql_test

 63 tests, 1 skipped, 6 failures in 8.945s

=== Skipped
=== SKIP: . TestDoCustom (0.00s)
    do_test.go:51:

=== Failed
=== FAIL: . TestCustomOperationWithHeaders (0.01s)
    testing.go:1152: race detected during execution of test

=== FAIL: . TestQuery (0.01s)
    testing.go:1152: race detected during execution of test

=== FAIL: . TestCustomOperation (0.01s)
    testing.go:1152: race detected during execution of test

=== FAIL: . TestQueryWithHeaders (0.01s)
    testing.go:1152: race detected during execution of test

=== FAIL: . TestMutate (0.01s)
    testing.go:1152: race detected during execution of test

=== FAIL: . TestMutateWithHeaders (0.01s)
    testing.go:1152: race detected during execution of test

DONE 63 tests, 1 skipped, 6 failures in 8.963s
make: *** [test] Error 1

The issue occurs in the marshal method when multiple requests try to update the declaration name of the same cached operation

goql/query.go

Line 636 in 6f72c1e

operation.Decl.Name = wrapper

The proposed fix copies the cached operation by value before it is updated. Since no other fields are updated during the marshalling process, there's no need make a deep copy (by reference).

Jira ID

N/A

Notes for your reviewers

@grevych grevych requested a review from a team August 4, 2022 10:26
// A copy is needed because the top level declaration name (primitive type)
// of the operation is re-assigned.
opReplica := *cachedOperation.(*field)
operation = &opReplica

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid we need deeper refactoring.
Current "buggy" implementation modifies operation data (Decl) stored in cache after being retrieved from cache. So supposedly next time it is retrieved from cache it has new pieces.

This implementation modifies data on a copy of cached entry so that change is not visible to the next caller that hits the cache.

And maybe it is fine - the Decl part should be custom to each cycle. In that case we should probably removed it from cache entry and have it as local var in this function.

But if it is integral part of cache entry w'll probably need larger sync scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. The updated field in Declaration is definitely not integral part of the cache entry. Therefore, next retrievals don't need to be aware of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Y'all are right on the money with your analysis as far as I can tell. I'll approve this in the interim though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants