VA: Add a method for performing MPIC compliant challenge validation #7794

beautifulentropy · 2024-11-08T22:20:11Z

Add VA.ValidateChallenge, a new MPIC compliant gRPC method that will replace VA.PerformValidation for the validation of ACME challenges. A follow-up will add VA.CheckCAA and an RA feature flag that enables their use.

Part of #7615
Part of #7614
Part of #7616

Ballot Summary for Reviewers

You can read the full ballot contents here. I have pulled together a summary below:

3.2.2.9 Multi-Perspective Issuance Corroboration

... Furthermore, for any pair of DNS resolvers used on a Multi-Perspective Issuance Corroboration attempt, the straight-line distance between the two States, Provinces, or Countries the DNS resolvers reside in MUST be at least 500 km. The location of a DNS resolver is determined by the point where unencapsulated outbound DNS queries are typically first handed off to the network infrastructure providing Internet connectivity to that DNS resolver.

This PR does not attempt to satisfy the aforementioned distance requirement. This will need to be satisfied as part of the datacenter selection process for perspectives.

Table: Quorum Requirements

| # of Distinct Remote Network Perspectives Used | # of Allowed non-Corroborations |
| --- | --- |
| 2-5 | 1 |
| 6+ | 2 |
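
For reviewers, the Quorum Requirements table can be read as a small lookup. This is a minimal sketch, not the PR's code: `allowedNonCorroborations` is a hypothetical name, and returning 0 for fewer than two perspectives is an assumption, since the table does not cover that case.

```go
package main

import "fmt"

// allowedNonCorroborations is a hypothetical helper mirroring the Quorum
// Requirements table: 2-5 remote perspectives allow 1 non-corroboration,
// 6 or more allow 2.
func allowedNonCorroborations(remotePerspectives int) int {
	switch {
	case remotePerspectives >= 6:
		return 2
	case remotePerspectives >= 2:
		return 1
	default:
		// Assumption: the table does not list fewer than two perspectives,
		// so this sketch tolerates no failures in that case.
		return 0
	}
}

func main() {
	for _, n := range []int{2, 3, 5, 6, 8} {
		fmt.Printf("%d remote perspectives: up to %d non-corroborations\n",
			n, allowedNonCorroborations(n))
	}
}
```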

...

Phased Implementation Timeline

Effective March 15, 2025, the CA MUST implement Multi-Perspective Issuance Corroboration using at least two (2) remote Network Perspectives. The CA MAY proceed with certificate issuance if the number of remote Network Perspectives that do not corroborate the determinations made by the Primary Network Perspective ("non-corroborations") is greater than allowed in the Quorum Requirements table.

Effective September 15, 2025, the CA MUST implement Multi-Perspective Issuance Corroboration using at least two (2) remote Network Perspectives. The CA MUST NOT proceed with certificate issuance if the number of non-corroborations is greater than allowed in the Quorum Requirements table.

Effective March 15, 2026, the CA MUST implement Multi-Perspective Issuance Corroboration using at least three (3) remote Network Perspectives. The CA MUST NOT proceed with certificate issuance if the number of non-corroborations is greater than allowed in the Quorum Requirements table and if the remote Network Perspectives that do corroborate the determinations made by the Primary Network Perspective do not fall within the service regions of at least two (2) distinct Regional Internet Registries.

These requirements are satisfied by this PR.

Effective June 15, 2026, the CA MUST implement Multi-Perspective Issuance Corroboration using at least four (4) remote Network Perspectives. The CA MUST NOT proceed with certificate issuance if the number of non-corroborations is greater than allowed in the Quorum Requirements table and if the remote Network Perspectives that do corroborate the determinations made by the Primary Network Perspective do not fall within the service regions of at least two (2) distinct Regional Internet Registries.

Effective December 15, 2026, the CA MUST implement Multi-Perspective Issuance Corroboration using at least five (5) remote Network Perspectives. The CA MUST NOT proceed with certificate issuance if the number of non-corroborations is greater than allowed in the Quorum Requirements table and if the remote Network Perspectives that do corroborate the determinations made by the Primary Network Perspective do not fall within the service regions of at least two (2) distinct Regional Internet Registries.

These requirements are not satisfied by this PR. The following code will need to be updated to reject validation requests when fewer than 4 (later 5) remote VAs are configured.

func (va *ValidationAuthorityImpl) remoteValidateChallenge(ctx context.Context, req *vapb.ValidationRequest) (mpicSummary, *probs.ProblemDetails) {
	remoteVACount := len(va.remoteVAs)
	if remoteVACount < 3 {
		return mpicSummary{}, probs.ServerInternal("Insufficient remote perspectives: need at least 3")
	}

5.4.1 Types of events recorded

Multi-Perspective Issuance Corroboration attempts from each Network Perspective, minimally recording the following information:
- a. an identifier that uniquely identifies the Network Perspective used;
- b. the attempted domain name and/or IP address; and
- c. the result of the attempt (e.g., "domain validation pass/fail", "CAA permission/prohibition").

Multi-Perspective Issuance Corroboration quorum results for each attempted domain name or IP address represented in a Certificate request (i.e., "3/4" which should be interpreted as "Three (3) out of four (4) attempted Network Perspectives corroborated the determinations made by the Primary Network Perspective").

These requirements are satisfied by this PR.
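
To make the "3/4" quorum format concrete, here is a rough sketch of how such a log record could be assembled. `auditSummary`, its field names, its JSON keys, and `newAuditSummary` are all hypothetical stand-ins; the PR's own `mpicSummary` and `prepareSummary` may be shaped differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// auditSummary is a hypothetical stand-in for the PR's mpicSummary.
// The field names and JSON keys are assumptions made for illustration.
type auditSummary struct {
	Passed       []string `json:"passedPerspectives"`
	Failed       []string `json:"failedPerspectives"`
	PassedRIRs   []string `json:"passedRIRs"`
	QuorumResult string   `json:"quorumResult"`
}

// newAuditSummary builds the record, formatting the quorum result as
// "<corroborating>/<attempted>" as in the ballot's "3/4" example.
func newAuditSummary(passed, failed []string, passedRIRs map[string]struct{}, attempted int) auditSummary {
	rirs := make([]string, 0, len(passedRIRs))
	for rir := range passedRIRs {
		rirs = append(rirs, rir)
	}
	// Sort so that log output is stable across runs.
	sort.Strings(rirs)
	return auditSummary{
		Passed:       passed,
		Failed:       failed,
		PassedRIRs:   rirs,
		QuorumResult: fmt.Sprintf("%d/%d", len(passed), attempted),
	}
}

func main() {
	s := newAuditSummary(
		[]string{"dc-a", "dc-b", "dc-c"},
		[]string{"dc-d"},
		map[string]struct{}{"RIPE": {}, "ARIN": {}},
		4,
	)
	out, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```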

jsha

This change does two things:

1. Add logging and threshold checking required by MPIC
2. Reimplement VA.PerformValidation as VA.ValidateChallenge, with the important distinction that VA.ValidateChallenge does not check CAA.

I remember we discussed during standup last week some challenges around multi-perspective CAA rechecking that led to (2), but I think we forgot to write down the details. I did my best to write down what I remember of it: #7808.

Looking at (2) in this PR, I'm concerned about how much near-duplication of code it results in. I'd rather do some refactoring to implement (1) in the existing code, and treat (2) as separate followup change - or possibly take a different approach entirely, like the alternatives mentioned in #7808.

If we do pursue (2) as a followup change, my goal would be to reuse as much code as possible between the two RPC methods, so we have less code, and simpler code, to reason about.

jsha · 2024-11-13T22:20:39Z

va/vampic.go

+func (va *ValidationAuthorityImpl) observeLatency(op, perspective, challType, probType, result string, latency time.Duration) {
+	labels := prometheus.Labels{
+		"operation":      op,
+		"perspective":    perspective,
+		"challenge_type": challType,
+		"problem_type":   probType,
+		"result":         result,
+	}
+	va.metrics.validationLatency.With(labels).Observe(latency.Seconds())


This function doesn't add much vs calling Observe() directly, other than moving from named fields (across multiple lines) to positional fields in the function call. While this allows the call sites to use a single line, it makes it less obvious at the call site that the parameters are correct. The same result could also be achieved by using .WithLabelValues(op, perspective, challType, ...).

The two call sites look like this:

	va.observeLatency(challenge, va.perspective, string(chall.Type), probType, outcome, localLatency)
	if va.isPrimaryVA() {
		// Observe total validation latency (primary+remote).
		va.observeLatency(challenge, all, string(chall.Type), probType, outcome, va.clk.Since(start))
	}

Since several of those are the same, let's use the cool .CurryWith() method to reduce duplication while still keeping the clarity of naming the labels inline:

	hist, err := va.metrics.validationLatency.CurryWith(prometheus.Labels{
		"operation":      challenge,
		"challenge_type": string(chall.Type),
		"problem_type":   probType,
		"result":         outcome,
	})
	if err != nil {
		// CurryWith returns an error (e.g. for an unknown label name); handle it here.
	}
	hist.With(prometheus.Labels{"perspective": va.perspective}).Observe(localLatency.Seconds())
	if va.isPrimaryVA() {
		hist.With(prometheus.Labels{"perspective": all}).Observe(va.clk.Since(start).Seconds())
	}

#7799 adds two more call sites. This helper keeps the label boilerplate out of the way. It is less likely to result in mislabeling or missing a label altogether than .WithLabelValues(). The .CurryWith() suggestion is a nice trick though! I had forgotten that existed.

jsha · 2024-11-14T21:13:07Z

We talked about ways to extract out some of the non-core parts of this change into their own changes so the important stuff is more readily visible. Some possibilities:

In the current code we have:

	type rvaResult struct {
		hostname string
		response *vapb.ValidationResult
		err      error
	}

	results := make(chan *rvaResult, len(va.remoteVAs))

In the new code we have:

	type response struct {
		addr   string
		result *vapb.ValidationResult
		err    error
	}

	responses := make(chan *response, remoteVACount)

That is, renaming rvaResult to response, hostname to addr, and results to responses. Those renamings seem unobjectionable to me, but we can make them in the original code first, so there are fewer diffs when comparing with the new code.

The current code counts good and bad results in integers. This PR instead accumulates them as slices; that's great! We can backport that code into PerformValidation.

The old code has:

				currProb = probs.ServerInternal("Remote PerformValidation RPC failed")

The new code has:

				currProb = probs.ServerInternal("Secondary domain validation RPC failed")

Again, a fine change but one we can make in the existing PerformValidation code.

In va_test.go there is a rename of cancelledVA to canceledVA that touches a lot of lines. I'm all for consistency of spelling, but that should be its own PR. That reduces diffs, and it also allows us to better focus on whether we hit all the spots we intend to hit. For instance there's a variable name on line 398 that needs updating to be consistent.

This PR introduces the new validation_latency metric that is similar to validation_time except that it renames the type label to challenge_type and adds operation and perspective. We can add code to PerformValidation that increments this (with operation being challenge+caa). Also, FWIW, we should put the initialization of this metric next to the initialization for validation_time and explain the difference.

jsha · 2024-11-14T21:14:35Z

va/vampic.go

+	if remoteVACount < 3 {
+		return mpicSummary{}, probs.ServerInternal("Insufficient remote perspectives: need at least 3")
+	}


Since the number of remoteVAs is determined at construction time, it seems better to put this check in NewValidationAuthorityImpl.

Agreed, it should eventually move there. Today that change would make remote VAs a requirement to run Boulder starting in the next release.

go.mod

Bring this code more in line with `VA.remoteDoDCV` in #7794. This should make these two easier to diff in review.

jsha · 2024-11-15T23:31:22Z

Note: not a complete review, I just wanted to get some notes down before I go get the kid.

--- /tmp/old.txt	2024-11-15 14:57:20.746779025 -0800
+++ /tmp/new.txt	2024-11-15 14:58:03.728594914 -0800
@@ -1,12 +1,11 @@
-func (va *ValidationAuthorityImpl) performRemoteValidation(
-	ctx context.Context,
-	req *vapb.PerformValidationRequest,
-) *probs.ProblemDetails {
+func (va *ValidationAuthorityImpl) remoteDoDCV(ctx context.Context, req *vapb.DCVRequest) (mpicSummary, *probs.ProblemDetails) {
+	// Mar 15, 2026: MUST implement using at least 3 perspectives
+	// Jun 15, 2026: MUST implement using at least 4 perspectives
+	// Dec 15, 2026: MUST implement using at least 5 perspectives
 	remoteVACount := len(va.remoteVAs)
-	if remoteVACount == 0 {
-		return nil
+	if remoteVACount < 3 {
+		return mpicSummary{}, probs.ServerInternal("Insufficient remote perspectives: need at least 3")
 	}
-
 	type response struct {
 		addr   string
 		result *vapb.ValidationResult
@@ -15,34 +14,31 @@
 
 	responses := make(chan *response, remoteVACount)
 	for _, i := range rand.Perm(remoteVACount) {
-		go func(rva RemoteVA, out chan<- *response) {
-			res, err := rva.PerformValidation(ctx, req)
-			out <- &response{
-				addr:   rva.Address,
-				result: res,
-				err:    err,
-			}
-		}(va.remoteVAs[i], responses)
+		go func(rva RemoteVA) {
+			res, err := rva.DoDCV(ctx, req)
+			responses <- &response{rva.Address, res, err}
+		}(va.remoteVAs[i])
 	}
 
-	required := remoteVACount - va.maxRemoteFailures
 	var passed []string
 	var failed []string
+	passedRIRs := make(map[string]struct{})
+
 	var firstProb *probs.ProblemDetails
+	for i := 0; i < remoteVACount; i++ {
+		resp := <-responses
 
-	for resp := range responses {
 		var currProb *probs.ProblemDetails
-
 		if resp.err != nil {
 			// Failed to communicate with the remote VA.
 			failed = append(failed, resp.addr)
-
-			if canceled.Is(resp.err) {
-				currProb = probs.ServerInternal("Remote PerformValidation RPC canceled")
+			if errors.Is(resp.err, context.Canceled) {
+				currProb = probs.ServerInternal("Secondary domain validation RPC canceled")
 			} else {
-				va.log.Errf("Remote VA %q.PerformValidation failed: %s", resp.addr, resp.err)
-				currProb = probs.ServerInternal("Remote PerformValidation RPC failed")
+				va.log.Errf("Remote VA %q.ValidateChallenge failed: %s", resp.addr, resp.err)
+				currProb = probs.ServerInternal("Secondary domain validation RPC failed")
 			}
+
 		} else if resp.result.Problems != nil {
 			// The remote VA returned a problem.
 			failed = append(failed, resp.result.Perspective)
@@ -50,37 +46,53 @@
 			var err error
 			currProb, err = bgrpc.PBToProblemDetails(resp.result.Problems)
 			if err != nil {
-				va.log.Errf("Remote VA %q.PerformValidation returned malformed problem: %s", resp.addr, err)
-				currProb = probs.ServerInternal("Remote PerformValidation RPC returned malformed result")
+				va.log.Errf("Remote VA %q.ValidateChallenge returned a malformed problem: %s", resp.addr, err)
+				currProb = probs.ServerInternal("Secondary domain validation RPC returned malformed result")
 			}
+
 		} else {
 			// The remote VA returned a successful result.
 			passed = append(passed, resp.result.Perspective)
+			passedRIRs[resp.result.Rir] = struct{}{}
 		}
 
 		if firstProb == nil && currProb != nil {
 			// A problem was encountered for the first time.
 			firstProb = currProb
 		}
+	}
+
+	// Prepare the summary, this MUST be returned even if the validation failed.
+	summary := prepareSummary(passed, failed, passedRIRs, remoteVACount)
 
-		if len(passed) >= required {
-			// Enough successful responses to reach quorum.
-			return nil
+	maxRemoteFailures := maxValidationFailures(remoteVACount)
+	if len(failed) > maxRemoteFailures {
+		// Too many failures to reach quorum.
+		if firstProb != nil {
+			firstProb.Detail = fmt.Sprintf("During secondary domain validation: %s", firstProb.Detail)
+			return summary, firstProb
 		}
-		if len(failed) > va.maxRemoteFailures {
-			// Too many failed responses to reach quorum.
-			firstProb.Detail = fmt.Sprintf("During secondary validation: %s", firstProb.Detail)
-			return firstProb
+		return summary, probs.ServerInternal("Secondary domain validation failed due to too many failures")
+	}
+
+	if len(passed) < (remoteVACount - maxRemoteFailures) {
+		// Too few successful responses to reach quorum.
+		if firstProb != nil {
+			firstProb.Detail = fmt.Sprintf("During secondary domain validation: %s", firstProb.Detail)
+			return summary, firstProb
 		}
+		return summary, probs.ServerInternal("Secondary domain validation failed due to insufficient successful responses")
+	}
 
-		// If we somehow haven't returned early, we need to break the loop once all
-		// of the VAs have returned a result.
-		if len(passed)+len(failed) >= remoteVACount {
-			break
+	if len(passedRIRs) < 2 {
+		// Too few successful responses from distinct RIRs to reach quorum.
+		if firstProb != nil {
+			firstProb.Detail = fmt.Sprintf("During secondary domain validation: %s", firstProb.Detail)
+			return summary, firstProb
 		}
+		return summary, probs.Unauthorized("Secondary domain validation failed to receive enough corroborations from distinct RIRs")
 	}
 
-	// This condition should not occur - it indicates the passed/failed counts
-	// neither met the required threshold nor the maxRemoteFailures threshold.
-	return probs.ServerInternal("Too few remote PerformValidation RPC results")
+	// Enough successful responses from distinct RIRs to reach quorum.
+	return summary, nil
 }

Now that some of the formatting and naming changes are merged to main, I pulled performRemoteValidation from main and did a diff against remoteDoDCV in this branch. There are still a few formatting and naming diffs that make it hard to zero in on the functionality diffs. Can we eliminate those? For instance:

			out <- &response{
				addr:   rva.Address,
				result: res,
				err:    err,
			}

Became:

			responses <- &response{rva.Address, res, err}

Poking around it looks like the latter is the more common style we use for this pattern, so let's merge it upstream as well.

Also, the "Remote PerformValidation RPC" to "Secondary domain validation" error message changes didn't wind up making it into the backports.

Similarly:

  for resp := range responses {

Became:

  for i := 0; i < remoteVACount; i++ {

I think the former is more appropriate in this case.

if canceled.Is(resp.err) {

Became:

if errors.Is(resp.err, context.Canceled) {

The documentation for canceled.Is() says it checks for both context.Canceled and grpc/codes.Canceled. Thinking about it, I can't think of a reason we would receive grpc/codes.Canceled here. The canceled package was introduced to solve a particular CT related logging problem (#3447), and also predated errors.Is (I think). So the switch here to errors.Is seems correct to me. Let's backport it. I actually wonder now if we need this canceled check at all. For one thing, in the new code we never return early and so never cancel anything. However, even in the old code, we would cancel after leaving the loop so would never expect to get a canceled response. Gonna take a look at the git history for this check.

Other than that the diff here seems to do what I expect. Could you update the PR description to describe how DoDCV behaves differently than PerformValidation? Here's what I've come to understand from factoring out some of the components:

Doesn't do CAA checking
Waits for all backends to return, doesn't try to return early if it gets too many failures, or enough successes.
Enforces that there are 2+ distinct RIRs among the successful responses.

That last one is probably implicit in the ballot summary text but I think it's useful to list it in the executive summary up top.

jsha

Since this is the PR where we're adding enforcement of MPIC-related constraints, there are some useful checks to add:

There are no duplicate perspectives.
If a VA considers itself remote (i.e. no backends), its perspective and RIR are non-empty. I'm sure these can also be expressed in the config validation language, but IMO it doesn't hurt to check them in the constructor as well.
If a VA considers itself remote, its Perspective is not PrimaryPerspective. Note: this will run into some trouble in prod because we are using boulder va, not yet boulder-remoteva.
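
A rough sketch of what those constructor-time checks might look like. `vaConfig`, its fields, and `validate` are hypothetical; the value of `primaryPerspective` is an assumption standing in for `PrimaryPerspective`, and "no configured remote perspectives" is used as a stand-in for "no backends".

```go
package main

import (
	"errors"
	"fmt"
)

// primaryPerspective stands in for PrimaryPerspective; its value here is
// an assumption made for this sketch.
const primaryPerspective = "Primary"

// vaConfig is a hypothetical, simplified view of what a VA constructor sees.
type vaConfig struct {
	perspective        string
	rir                string
	remotePerspectives []string // empty means this VA is itself a remote VA
}

// validate applies the three checks listed above.
func (c vaConfig) validate() error {
	seen := make(map[string]struct{}, len(c.remotePerspectives))
	for _, p := range c.remotePerspectives {
		if _, dup := seen[p]; dup {
			return fmt.Errorf("duplicate remote perspective %q", p)
		}
		seen[p] = struct{}{}
	}
	if len(c.remotePerspectives) == 0 {
		if c.perspective == "" || c.rir == "" {
			return errors.New("remote VA must set both perspective and RIR")
		}
		if c.perspective == primaryPerspective {
			return errors.New("remote VA must not use the primary perspective")
		}
	}
	return nil
}

func main() {
	configs := []vaConfig{
		{perspective: "dc-a", rir: "ARIN"},
		{perspective: "dc-a"},
		{perspective: primaryPerspective, rir: "ARIN"},
		{perspective: primaryPerspective, remotePerspectives: []string{"dc-a", "dc-a"}},
	}
	for _, c := range configs {
		fmt.Printf("%+v -> %v\n", c, c.validate())
	}
}
```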

va/va_test.go

jsha · 2024-11-16T01:31:58Z

va/va_test.go

+	// ua if set to pass, the remote VA will always pass validation. If set to
+	// fail, the remote VA will always fail validation with probs.Unauthorized.
+	// This is set to pass by default.
+	ua string


I think this comment is not quite correct. ua is set to "" by default, by the normal zero value rules.

Looking in setupRVAs, if it receives an empty ua it will default to "user agent 1.0".

Suggested change

-	// ua if set to pass, the remote VA will always pass validation. If set to
-	// fail, the remote VA will always fail validation with probs.Unauthorized.
-	// This is set to pass by default.
-	ua string
+	// ua the user agent to be used by this remote VA. Different test cases use the
+	// user agent string to accept or reject VAs selectively to create conditions for
+	// testing. For instance, `TestDoDCVMPIC` accepts all requests with a user agent
+	// of "pass".
+	ua string

Previously this was a configuration field.

Ports `maxAllowedFailures()` from `determineMaxAllowedFailures()` in #7794.

Test updates:

Remove the `maxRemoteFailures` param from `setup` in all VA tests.

Some tests were depending on setting this param directly to provoke failures. For example, `TestMultiVAEarlyReturn` previously relied on "zero allowed failures". Since the number of allowed failures is now 1 for the number of remote VAs we were testing (2), the VA wasn't returning early with an error; it was succeeding! To fix that, make sure there are two failures. Since two failures from two RVAs wouldn't exercise the right situation, add a third RVA, so we get two failures from three RVAs.

Similarly, TestMultiCAARechecking had several test cases that omitted this field, effectively setting it to zero allowed failures. I updated the "1 RVA failure" test case to expect overall success and added a "2 RVA failures" test case to expect overall failure (we previously expected overall failure from a single RVA failing).

In TestMultiVA I had to change a test for `len(lines) != 1` to `len(lines) == 0`, because with more backends we were now logging more errors, and finding e.g. `len(lines)` to be 2.

beautifulentropy force-pushed the mpic-part-two branch 2 times, most recently from d77a42b to 14a8ac5 Compare November 8, 2024 22:28

beautifulentropy changed the title ~~WIP~~ VA: Add a method for performing MPIC compliant challenge validation Nov 8, 2024

VA: Add a method for performing MPIC compliant challenge validation

29dee31

beautifulentropy force-pushed the mpic-part-two branch from 14a8ac5 to 29dee31 Compare November 8, 2024 22:30

beautifulentropy marked this pull request as ready for review November 8, 2024 22:53

beautifulentropy requested a review from a team as a code owner November 8, 2024 22:53

beautifulentropy requested a review from jsha November 8, 2024 22:53

beautifulentropy commented Nov 8, 2024

beautifulentropy added 3 commits November 8, 2024 18:42

Extract summary prep.

1149ac4

Small fix to summary initialization and prep

f7da26d

Some more tidying.

6722e0e

beautifulentropy mentioned this pull request Nov 13, 2024

VA: Add a method for performing MPIC compliant CAA checks #7799

Draft

jsha requested changes Nov 14, 2024

jsha mentioned this pull request Nov 14, 2024

va: compute maxRemoteFailures based on MPIC #7810

Merged

jsha reviewed Nov 14, 2024

beautifulentropy added 3 commits November 14, 2024 16:48

Addressing comments

19b0acf

Comment grammar and removing a renaming

ca5fdd1

expectedKeyAuthorization

c9836ac

beautifulentropy mentioned this pull request Nov 14, 2024

VA: Cleanup performRemoteValidation #7814

Merged

jsha reviewed Nov 15, 2024

jsha pushed a commit that referenced this pull request Nov 15, 2024

VA: Cleanup performRemoteValidation (#7814)

b2b5645

Bring this code more in line with `VA.remoteDoDCV` in #7794. This should make these two easier to diff in review.

jsha requested changes Nov 16, 2024

jsha reviewed Nov 16, 2024

beautifulentropy added 3 commits November 19, 2024 10:21

Merge remote-tracking branch 'origin/main' into mpic-part-two

559575e

Merge remote-tracking branch 'origin/main' into mpic-part-two

3cda0b5

Use maxAllowedFailures

d10a884

beautifulentropy added 3 commits November 19, 2024 16:57

Bring remoteDoDCV closer into alignment with performRemoteValidation

570513e

Merge remote-tracking branch 'origin/main' into mpic-part-two

55ecd50

More changes to bring DoDCV into better alignment for PerformValidation

6782af5

beautifulentropy force-pushed the mpic-part-two branch from 22acec8 to 6782af5 Compare November 20, 2024 00:07

Merge remote-tracking branch 'origin/main' into mpic-part-two

6ef1fd0

beautifulentropy force-pushed the mpic-part-two branch from dd5c024 to 6ef1fd0 Compare November 20, 2024 19:40

Merge upstream test changes

198060f

beautifulentropy marked this pull request as draft November 25, 2024 19:47
