OnlineDDL: reduce vrepl_stress workload in forks #14302

Merged
@@ -135,13 +135,16 @@ var (
 	writeMetrics WriteMetrics
 )

-const (
-	maxTableRows = 4096
-	workloadDuration = 5 * time.Second
+var (
 	maxConcurrency = 20
 	singleConnectionSleepInterval = 2 * time.Millisecond
 	countIterations = 5
-	migrationWaitTimeout = 60 * time.Second
 )

+const (
+	maxTableRows = 4096
+	workloadDuration = 5 * time.Second
+	migrationWaitTimeout = 60 * time.Second
+)
+
 func resetOpOrder() {
@@ -157,6 +160,18 @@ func nextOpOrder() int64 {
 	return opOrder
 }

+func TestInitialSetup(t *testing.T) {
+	repo, ok := os.LookupEnv("GITHUB_REPOSITORY") // `ok` tells us the env variable exists, hence that we are running in GitHub CI.
Contributor

If you do decide to stick with this, the CI env variable is IMO the right one to use here: https://docs.github.com/en/actions/learn-github-actions/variables#default-environment-variables

We use that in at least one other spot. If that's too general, then a more Actions specific one would be GITHUB_ACTION_REPOSITORY.

Contributor Author

CI is too broad: it does not indicate that this is GitHub CI, and we specifically want to identify both that this is GitHub CI and which repository it is.

GITHUB_ACTION_REPOSITORY is actually not what we are looking for. It will not tell you vitessio/vitess. It will tell you, e.g., actions/checkout, i.e. the name of the repository of the action code (à la import <url> path). What we want to know is the name of the vitess repository this action runs on.

GITHUB_REPOSITORY is therefore the most (and only?) correct variable that identifies where the GitHub CI is running.

Contributor Author

Strike most of the above, given the suggestion of using vCPUs. I'll ignore the GitHub repository name altogether.

Contributor Author

Yep, strike all of the above! The vCPU based approach is 💯

+	t.Logf("==== repo=%v", repo)
+	if ok && repo != "vitessio/vitess" {
+		// `vitessio/vitess` repository enjoys faster runners. Otherwise, GitHub CI has much slower runners
+		// and we have to reduce the workload
+		maxConcurrency = maxConcurrency / 2
+		singleConnectionSleepInterval = singleConnectionSleepInterval * 2
+	}
Contributor

@mattlord mattlord Oct 18, 2023

The details of runners are dynamic for a repo. IMO we should instead base the settings on the runtime. For example, something like this:

	t.Helper()
	vCPUs := runtime.NumCPU()
	maxConcurrency = vCPUs
	sleepModifier := int(math.Max(float64(4-(vCPUs/4)), 1))
	singleConnectionSleepInterval = time.Duration((int(singleConnectionSleepInterval.Milliseconds()) * sleepModifier) * 1000) // ms to us

Contributor Author

By all means. I guess with GitHub CI we still need to take into account noisy neighbors? Otherwise I'm game for any such change.

Contributor

The runners available for Actions in a given repo can be changed at any time in Settings->Actions->Runners. And the specific runner used for a workflow can be changed at any time in the yaml. That's why making assumptions based on repo isn't a great solution IMO, since what we really care about (and what we're assuming here based on the repo) are the resources/specs for the runner. Noisy neighbors is an unrelated issue (although a real one with VMs generally).

+	t.Logf("==== test setup: maxConcurrency=%v, singleConnectionSleepInterval=%v", maxConcurrency, singleConnectionSleepInterval)
+}

 func TestMain(m *testing.M) {
 	defer cluster.PanicHandler(nil)
 	flag.Parse()
@@ -377,6 +392,9 @@ func checkTablesCount(t *testing.T, tablet *cluster.Vttablet, showTableName stri
 	query := fmt.Sprintf(`show tables like '%%%s%%';`, showTableName)
 	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
 	defer cancel()
+	ticker := time.NewTicker(time.Second)
+	defer ticker.Stop()
+
 	rowcount := 0

 	for {
@@ -388,7 +406,7 @@ func checkTablesCount(t *testing.T, tablet *cluster.Vttablet, showTableName stri
 		}

 		select {
-		case <-time.After(time.Second):
+		case <-ticker.C:
 			continue // Keep looping
 		case <-ctx.Done():
 			// Break below to the assertion
@@ -502,6 +520,9 @@ func runSingleConnection(ctx context.Context, t *testing.T) {
 	_, err = conn.ExecuteFetch("set transaction isolation level read committed", 1000, true)
 	require.Nil(t, err)

+	ticker := time.NewTicker(singleConnectionSleepInterval)
+	defer ticker.Stop()
+
 	for {
 		switch rand.Int31n(3) {
 		case 0:
@@ -515,7 +536,7 @@ func runSingleConnection(ctx context.Context, t *testing.T) {
 		case <-ctx.Done():
 			log.Infof("Terminating single connection")
 			return
-		case <-time.After(singleConnectionSleepInterval):
+		case <-ticker.C:
 		}
 		assert.Nil(t, err)
 	}
15 changes: 11 additions & 4 deletions go/test/endtoend/onlineddl/vtgate_util.go
@@ -247,9 +247,13 @@ func WaitForMigrationStatus(t *testing.T, vtParams *mysql.ConnParams, shards []c
 	for _, status := range expectStatuses {
 		statusesMap[string(status)] = true
 	}
-	startTime := time.Now()
+	ctx, cancel := context.WithTimeout(context.Background(), timeout)
+	defer cancel()
+	ticker := time.NewTicker(time.Second)
+	defer ticker.Stop()
+
 	lastKnownStatus := ""
-	for time.Since(startTime) < timeout {
+	for {
 		countMatchedShards := 0
 		r := VtgateExecQuery(t, vtParams, query, "")
 		for _, row := range r.Named().Rows {
@@ -266,9 +270,12 @@ func WaitForMigrationStatus(t *testing.T, vtParams *mysql.ConnParams, shards []c
 		if countMatchedShards == len(shards) {
 			return schema.OnlineDDLStatus(lastKnownStatus)
 		}
-		time.Sleep(1 * time.Second)
+		select {
+		case <-ctx.Done():
+			return schema.OnlineDDLStatus(lastKnownStatus)
+		case <-ticker.C:
+		}
 	}
-	return schema.OnlineDDLStatus(lastKnownStatus)
}

// CheckMigrationArtifacts verifies given migration exists, and checks if it has artifacts