Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RPC requests accounting #426

Merged
merged 4 commits into from
Apr 1, 2024
Merged

Conversation

jianoaix
Copy link
Contributor

@jianoaix jianoaix commented Apr 1, 2024

Why are these changes needed?

The current NumBlobRequests is counting the same RPC requests multiple times (one for each quorum). This is not right for API accounting.

Checks

  • I've made sure the lint is passing in this PR.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@@ -507,13 +532,15 @@ func (s *DispersalServer) GetBlobStatus(ctx context.Context, req *pb.BlobStatusR

requestID := req.GetRequestId()
if len(requestID) == 0 {
s.metrics.HandleInvalidArgRpcRequest("GetBlobStatus")
s.metrics.HandleInvalidArgRequest("GetBlobStatus")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be removed later: quorum tracking doesnt' apply to GetBlobStatus

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we remove it now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be a sanity check for the new metric with new v.s. old comparison

@@ -614,6 +646,7 @@ func (s *DispersalServer) RetrieveBlob(ctx context.Context, req *pb.RetrieveBlob

origin, err := common.GetClientAddress(ctx, s.rateConfig.ClientIPHeader, 2, true)
if err != nil {
s.metrics.HandleInvalidArgRpcRequest("RetrieveBlob")
s.metrics.HandleInvalidArgRequest("RetrieveBlob")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be removed later: quorum tracking doesnt' apply to RetrieveBlob

@jianoaix jianoaix requested review from mooselumph and ian-shim April 1, 2024 21:27
@@ -413,11 +433,13 @@ func (s *DispersalServer) checkRateLimitsAndAddRatesToHeader(ctx context.Context

globalRates, ok := s.rateConfig.QuorumRateInfos[param.QuorumID]
if !ok {
s.metrics.HandleInternalFailureRpcRequest("DisperseBlobAuthenticated")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkRateLimits... is called by both "DisperseBlob" and "DispserseBlobAuthenticated". Maybe we need to pass in the apiMethod argument here as elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, done

return nil, api.NewInternalError(fmt.Sprintf("missing confirmation information: %s", err.Error()))
}

s.metrics.HandleSuccessfulRpcRequest("GetBlobStatus")

s.logger.Debug("isConfirmed", "metadata", metadata, "isConfirmed", isConfirmed)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated, but should we remove this debug call? Seems like metadata could be large.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to just log metadataKey

return nil, api.NewInternalError(fmt.Sprintf("ratelimiter error: %v", err))
}
if !allowed {
s.metrics.HandleRateLimitedRpcRequest("RetrieveBlob")
s.metrics.HandleFailedRequest(codes.ResourceExhausted.String(), "", 0, "RetrieveBlob")
Copy link
Collaborator

@mooselumph mooselumph Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we also remove this quorum specific metrics? (here and below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd do it in followup, making the goal of this PR focused

Copy link
Collaborator

@mooselumph mooselumph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Does this PR address the overcounting issue? Can you describe how it resolves it?

@jianoaix
Copy link
Contributor Author

jianoaix commented Apr 1, 2024

Looks good. Does this PR address the overcounting issue? Can you describe how it resolves it?

It's straightforward and just avoid updating the metric for each quorum

@jianoaix jianoaix merged commit b94604e into Layr-Labs:master Apr 1, 2024
5 checks passed
@mooselumph
Copy link
Collaborator

A little late, but I'm confused about why some of the non-RPC metrics loop over the quorums and others do not:

s.metrics.HandleFailedRequest(codes.InvalidArgument.String(), fmt.Sprint(quorumID), len(request.DisperseRequest.GetData()), "DisperseBlobAuthenticated")

vs

s.metrics.HandleInvalidArgRequest("DisperseBlobAuthenticated")

How are the blob-specific metrics meant to be used here?

@jianoaix
Copy link
Contributor Author

jianoaix commented Apr 1, 2024

A little late, but I'm confused about why some of the non-RPC metrics loop over the quorums and others do not:

s.metrics.HandleFailedRequest(codes.InvalidArgument.String(), fmt.Sprint(quorumID), len(request.DisperseRequest.GetData()), "DisperseBlobAuthenticated")

vs

s.metrics.HandleInvalidArgRequest("DisperseBlobAuthenticated")

How are the blob-specific metrics meant to be used here?

Yea, this is another issue of current metric: it's not consistent. As a result, the number shows on this metric is not well defined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants