fix/gql: Cache the SiteProductVersion query for up to 10 minutes #6111
base: main
Conversation
@valerybugakov I would love your input on all of this. Two things I'm particularly uncertain about:

The cache de-duplicates concurrent requests, because we tend to send a lot of the same request at once. Requests share a result because of the de-duping. We only abort a request when the last concurrent client aborts it. But this means it is possible to:
Of course, we could preserve the one-aborts-one semantics. Do you think it is important to do that? (A rough sketch of the de-duplication follows below.)

Errors use exponential backoff up to 10 minutes. Do you think it is reasonable to treat all kinds of errors like that? If your internet dropped out for 30 minutes, it seems weak to make you wait an extra 10 minutes to get back online...
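A minimal sketch of the de-duplication idea, with illustrative names (not the PR's actual code): concurrent callers share one in-flight request, and the shared request is aborted only when the last concurrent caller has aborted.

```ts
class DedupedFetch<T> {
    private inflight: Promise<T> | undefined
    private controller: AbortController | undefined
    private waiters = 0

    async get(
        signal: AbortSignal | undefined,
        fetcher: (signal: AbortSignal) => Promise<T>
    ): Promise<T> {
        let inflight = this.inflight
        if (!inflight) {
            // Start one shared request for all concurrent callers.
            this.controller = new AbortController()
            inflight = fetcher(this.controller.signal).finally(() => {
                this.inflight = undefined
                this.controller = undefined
            })
            this.inflight = inflight
        }
        this.waiters++
        let counted = true
        const onAbort = () => {
            // Abort the shared request only once the last concurrent client has aborted.
            if (counted) {
                counted = false
                if (--this.waiters === 0) {
                    this.controller?.abort()
                }
            }
        }
        signal?.addEventListener('abort', onAbort, { once: true })
        try {
            // Every caller shares the same result (or error) from the shared request.
            return await inflight
        } finally {
            signal?.removeEventListener('abort', onAbort)
            if (counted) {
                this.waiters--
            }
        }
    }
}
```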
Note to self: needs updated snapshots
Of course, we could preserve the one-aborts-one semantics. Do you think it is important to do that?
The scenario you mentioned seems unlikely. In the ideal world, we don't use promise-based APIs in user-land but rely on the siteVersion observable, which would handle this case for us and serve as one source of truth. My vote goes for keeping the de-duplication logic as-is for now.
Errors use exponential backoff up to 10 minutes. Do you think it is reasonable to treat all kinds of errors like that? If your internet dropped out for 30 minutes, it seems weak to make you wait an extra 10 minutes to get back online...
With the current state of the product, if I needed an updated config, I'd probably just reload the whole VS Code instance. To be thorough, it seems better to have a complete solution across the extension so that after any long connectivity issue, I can trust that everything in Cody is up to date without needing manual checks.
So, I agree that it seems weak, but handling this edge case here won't make a difference. To address this, we need a centralized solution.
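For concreteness, a minimal sketch of the capped exponential backoff under discussion; the constants and function name are illustrative and not taken from the PR, apart from the 10-minute ceiling.

```ts
// Illustrative constants; only the 10-minute cap comes from the PR description.
const INITIAL_BACKOFF_MSEC = 1_000
const MAX_BACKOFF_MSEC = 10 * 60 * 1_000

// Delay before the next fetch after `consecutiveErrors` failures in a row.
function backoffMsec(consecutiveErrors: number): number {
    if (consecutiveErrors <= 0) {
        return 0
    }
    return Math.min(MAX_BACKOFF_MSEC, INITIAL_BACKOFF_MSEC * 2 ** (consecutiveErrors - 1))
}
```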
this.fetchTimeMsec = now
const thisFetch = fetcher(this.aborts.signal)
Could fetcher throw before returning the promise? Do we want to handle it here too?
The behavior as-is should be perfect for our GraphQL APIs, which lift network errors and so on into Promise resolves, not rejects.
If we end up here, then we are starting a new fetch. If fetcher throws before returning, the current caller will get a rejected Promise. Because we do not update thisFetch, the next caller will start a new one.
Is the idea behind the comment that we should do exponential backoff in that case too?
Is the idea behind the comment that we should do exponential backoff in that case too?
Yes. Would it be helpful to handle sync errors in the same way we handle async errors?
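A rough sketch of what that suggestion could look like (illustrative names, not the PR's code): a synchronous throw from the fetcher is converted into a rejected Promise so it takes the same error and backoff path as an async rejection.

```ts
// Hypothetical helper: normalize a synchronous throw into a rejected Promise.
function callFetcher<T>(
    fetcher: (signal: AbortSignal) => Promise<T>,
    signal: AbortSignal
): Promise<T> {
    try {
        return fetcher(signal)
    } catch (error) {
        return Promise.reject(error)
    }
}
```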
I don't think so, because they're probably not the kind of transient failure we're interested in retrying. But for now I've implemented the suggestion... I think this might be making the tests flaky for me; let's see what CI thinks...
That is in 7ebfde2 if you are interested, but for now I have rolled it back to not do that. It seems slower and flakier; I have not dug into why.
}

async get(
    signal: AbortSignal | undefined,
Are there cases where we want to pass undefined here? It would be strange to read cache.get(undefined, signal => fetch(...)). It may be worth changing the call signature or narrowing the first arg type.
Some callers to the GraphQL method don't care about aborts and pass undefined.
I like the callback in the end position because it can be a multiline expression without looking weird. setTimeout with the callback first makes my teeth itch.
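For illustration, the call shape being discussed might look like this; the names are made up for the example, and only the signal-first, callback-last ordering is the point.

```ts
// Hypothetical types and names for illustration only.
declare const cache: {
    get<T>(signal: AbortSignal | undefined, fetcher: (signal: AbortSignal) => Promise<T>): Promise<T>
}
declare function fetchSiteProductVersion(signal: AbortSignal): Promise<string>

async function example(signal: AbortSignal): Promise<void> {
    // Callback-last keeps a multiline fetch expression readable.
    const version = await cache.get(signal, abortSignal => fetchSiteProductVersion(abortSignal))

    // Some callers do not care about aborts and pass `undefined` for the signal.
    const versionNoAbort = await cache.get(undefined, abortSignal => fetchSiteProductVersion(abortSignal))
    console.log(version, versionNoAbort)
}
```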
This reverts commit 3d9b0196483cf5f73eefa0c2b23d1139c6e687c0.
Force-pushed from 3a6d347 to ebe7f52.
Looks great!
Force-pushed from 7ebfde2 to 2f7de58.
We hammer the SiteProductVersion GraphQL endpoint really hard. This adds a cache in front of it to reduce the heavy traffic to that endpoint. The cache also makes the product faster.
Changes the currentSiteVersion to use this cache directly instead of burying it in an observable, where it may cache transient errors like network errors.
I tried writing an exponential-backoff-and-retry Observable, and it is possible, but it is a bit tortured because the downstream consumer pings the upstream "(re)try" producer to cause the Observable to re-emit. I judged that caching in the GraphQL client and retrying when called was simpler.
One subtlety among many: Aborts cause exponential backoff. This is a feature: it prevents a misbehaving client (which we have suffered recently) from rapidly initiating and then immediately aborting many requests.
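A minimal sketch of that subtlety, assuming a simple state object; none of these names come from the PR. Any rejection of the fetch, including an abort, bumps the failure count and pushes out the next attempt.

```ts
// Illustrative only: an aborted request is treated like any other failure,
// so a client that rapidly starts and aborts requests cannot hammer the endpoint.
interface BackoffState {
    failures: number
    retryAtMsec: number
    lastError: unknown
}

async function fetchOnce<T>(
    state: BackoffState,
    signal: AbortSignal,
    fetcher: (signal: AbortSignal) => Promise<T>
): Promise<T> {
    if (Date.now() < state.retryAtMsec) {
        // Still backing off: re-serve the cached error instead of refetching.
        throw state.lastError
    }
    try {
        const result = await fetcher(signal)
        state.failures = 0
        return result
    } catch (error) {
        // Aborts land here too and contribute to the exponential backoff.
        state.failures++
        state.lastError = error
        state.retryAtMsec = Date.now() + Math.min(10 * 60 * 1_000, 1_000 * 2 ** (state.failures - 1))
        throw error
    }
}
```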
Test plan
New automated test:
Manual test:
- Verify that log entries like `graphql.fetch.SiteProductVersion.cached - 0.473875ms` appear. For example, run the product in the debugger and check the Debug Console for those entries.
- Verify that entries like `graphql.fetch.SiteProductVersion - 261.213334ms` appear.

Changelog