Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: mtr run that runs last N failed tests #178

Merged
merged 9 commits into from
Dec 28, 2023
Merged

Conversation

vuvova
Copy link
Member

@vuvova vuvova commented Sep 26, 2023

This is work-in-progress commit to have fast branch protection builders with a guarantee that a test, once failed, is fixed before further pushes are allowed. It works by running last N failed tests.

Note:

  • it likely uses twisted incorrectly, needs to be fixed
  • commit only provides a factory, it's not actually used for any builders, let's switch one builder over

After it is made to work and proves useful, needs to be extended as:

  • better to use MTR.runQueryWithRetry instead of runQuery
  • also test new/touched tests from the commit and remove removed tests from the list
  • need to have "test everything" builder, even if slow
  • do something about skips (makes little sense to run N failed tests if they're all skipped)
  • typ values made consistent over all builders so that they could be used for filtering
  • it uses only normal protocol, should be changed to take ps/embedded/view/etc as an argument. When branch protection quick builders will run ps/embedded/view/etc
  • the way SELECT works it looks for 50 unique test names within last 1000 failures. I want to make buildbot to autotune the second limit. The smaller it is, the faster will the query run
  • uses N=50 now, this could be increases to catch more failures or decreased to run faster
  • more branch protection builders should use it
  • gcov builder
  • use failures from old buildbot too or may be not, if we're shutting it down

@vuvova vuvova requested a review from vladbogo September 26, 2023 12:07
@vladbogo
Copy link
Collaborator

vladbogo commented Sep 26, 2023

Changed to use a custom step to fetch the steps since @util.renderer
cannot be combined with a generator.

Things to solve:

  • Probably the overall logic can be simplified since the new step can set directly mtr_additional_args
  • Removed typ since the retrieved tests are empty otherwise
  • MTR call fails still fails because if complains about a test not being found
  • --suite=main is added if no failed tests are found
  • add a name for the new step

@vladbogo
Copy link
Collaborator

There still are some problems that I didn't manage to solve today. The builder https://buildbot.mariadb.org/#/builders/201 is modified to run with the new setup. Here is an example of a generated MTR command:

            exec perl mysql-test-run.pl --verbose-restart --force --retry=3 --max-save-core=1 --max-save-datadir=10 --max-test-fail=20 --mem --parallel=$(expr 7 \* 2) main.func_kdf,new sys_vars.innodb_sync_spin_loops_basic,innodb sys_vars.innodb_max_purge_lag_basic,innodb period.create, innodb.row_size_error_log_warnings_3,4k,innodb main.column_compression,innodb maria.maria-purge, main.func_compress, archive.archive, main.column_compression_rpl,innodb,stmt main.column_compression_rpl,innodb,row perfschema.show_aggregate,innodb,pot main.column_compression_rpl,innodb,mix main.mysqlbinlog_stmt_compressed, connect.zip, main.mysqlbinlog_row_compressed, compat/oracle.column_compression,innodb main.sp-row, binlog_encryption.encrypted_master,cbc,innodb,mix wsrep.wsrep_provider_plugin_defaults,innodb galera.galera_gcache_recover_manytrx,innodb galera_3nodes.GCF-354,innodb galera_3nodes.galera_safe_to_bootstrap,innodb binlog_encryption.encrypted_master,cbc,innodb,row sysschema.v_schema_table_lock_waits,innodb events.events_slowlog, perfschema.innodb_table_io,innodb rpl.rpl_xa_prepare_gtid_fail,innodb,row main.column_compression_parts, galera.galera_log_bin_ext_mariabackup,innodb sysschema.v_statements_with_full_table_scans,innodb galera_3nodes.MDEV-29171,innodb galera_3nodes.galera_pc_bootstrap,innodb galera_3nodes_sr.galera_sr_kill_slave_before_apply,innodb main.kill_processlist-6619, events.events_restart, galera.GCF-939,innodb galera_3nodes_sr.GCF-336,innodb spider/bugfix.self_reference_multi, main.ssl_autoverify,auto,unix unit.aes, rpl.rpl_gtid_until_before_after_gtids,innodb,mix main.ssl_autoverify,pem,unix unit.lf, rpl.rpl_ssl1,mix perfschema.transaction_nested_events,innodb,pot rpl.rpl_ssl1,stmt perfschema.show_aggregate,1tpc,innodb perfschema.memory_aggregate_no_a_no_u, spider.partition_mrr,

Currently not sure why this returns:

mysql-test-run: *** ERROR: Could not find 'func_kdf' in 'main' suite

@vuvova
Copy link
Member Author

vuvova commented Sep 27, 2023

  • func_kdf — okay, that's a problem. The query selects failing tests for all branches of the same major branch. e.g. for any %11.3% branch it'll try to run tests that failed in any of the %11.3% branches. But some of those branches can have tests not present in others. func_kdf is not pushed into the main branch yet. I'll see how it can be fixed.

  • Ok, typ values are inconsistent, cannot be used for filtering. Let's have as a TODO to fix them eventually

@vuvova
Copy link
Member Author

vuvova commented Sep 27, 2023

fixed func_kdf issue

@vladbogo
Copy link
Collaborator

vladbogo commented Sep 27, 2023

@vuvova updated so now the builder runs using the latest version. I have observed the following new issues:

  1. For https://buildbot.mariadb.org/#/builders/201/builds/19966
mysql-test-run: *** ERROR: Could not run main.show_explain with 'innodb' combination(s)
  1. For https://buildbot.mariadb.org/#/builders/201/builds/19963
MySQLdb._exceptions.OperationalError: (2006, 'MySQL server has gone away')

I find 2 a bit concerning since I have no idea what might cause that, but maybe runQueryWithRetry as you suggested will solve it. And it happened two times since yesterday.

@@ -25,7 +25,7 @@ def run(self):

if master_branch:
query = """
select concat(test_name,',',test_variant) from (select id, test_name,test_variant from test_failure,test_run where branch like '%%%s%%' and test_run_id=id order by test_run_id desc limit %d) x group by test_name,test_variant order by max(id) desc limit %d
select concat(test_name,',',test_variant) from (select id, test_name,test_variant from test_failure,test_run where branch='%s' and test_run_id=id order by test_run_id desc limit %d) x group by test_name,test_variant order by max(id) desc limit %d
Copy link
Member

@cvicentiu cvicentiu Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better readability, I suggest using Python's format:

query = f"""
            select concat(test_name, ',', test_variant)
            from
              (select id, test_name, test_variant
               from test_failure, test_run
               where
                 branch = '{master_branch}' and test_run_id = id
               order by test_run_id desc
               limit {overlimit}) x
            group by test_name, test_variant
            order by max(id) desc
            limit {limit}
"""
tests = yield self.mtrDbPool.runQuery(query)

Copy link
Member

@cvicentiu cvicentiu Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, consider the following changes to make it easier for new contributors to understand the code:

  1. using JOIN instead of the equality in the WHERE clause for readability (I don't know by looking at that query which table has column test_run_id and which has id, I can guess, but being explicit is only 6 more characters 😄 )
  2. table aliases for test_failure (such as tf) and test_run (such as tr), and specify columns in the select list with:
    tf.id, tf.test_name, tf.test_variant.

@vuvova
Copy link
Member Author

vuvova commented Sep 27, 2023

  1. For https://buildbot.mariadb.org/#/builders/201/builds/19966
mysql-test-run: *** ERROR: Could not run main.show_explain with 'innodb' combination(s)

I don't understand it. main.show_explain can be run with innodb combination just fine, I've even tried it,
in that very commit where it failed in buildbot.

  1. For https://buildbot.mariadb.org/#/builders/201/builds/19963
MySQLdb._exceptions.OperationalError: (2006, 'MySQL server has gone away')

I find 2 a bit concerning since I have no idea what might cause that, but maybe runQueryWithRetry as you suggested will solve it. And it happened two times since yesterday.

Yes, but runQueryWithRetry is defined inside MTR class, which is a separate step. As you moved the query into a step, it cannot really use another step's data or methods. Only duplicate the logic, as far as I can see.

The reason could the one from the runQueryWithRetry comment:

    This is needed to be robust against things like database connection
    idle timeouts.

@vladbogo
Copy link
Collaborator

@vuvova I changed the code to use runQueryWithRetry. Hopefully this will improve the reliability of the builder.

Related to 1, I have no idea. I have manually tested in the same environment and I get the same error. Also, I have seen this repeated for other tests. If you have any suggestions please let me know,

@vladbogo
Copy link
Collaborator

vladbogo commented Oct 3, 2023

Short update: The fetching of data now with runQueryWithRetry seems stable.

I have identified one more failure mysql-test-run: *** ERROR: Could not find 'galera_restart_replica' in 'galera' suite

vladbogo pushed a commit to vladbogo/buildbot-mdb that referenced this pull request Oct 4, 2023
Bump to php-8.2 and new npm versions
@fauust fauust force-pushed the main branch 3 times, most recently from 8b053e1 to 93d9479 Compare October 20, 2023 10:40
@vuvova
Copy link
Member Author

vuvova commented Oct 22, 2023

The last failure (missing galera_restart_replica) happens because the test was failing in buildbot, but it is removed in the push that is being tested.

This will be fixed in the second checkbox item from the PR description

@vuvova
Copy link
Member Author

vuvova commented Oct 22, 2023

okay, there's a problem. buildbot only has a list of "changed files", it cannot tell if they were added deleted or modified. Meaning, we cannot delete removed tests from the list.

@vladbogo
Copy link
Collaborator

@vuvova I guess we can set a property and get the information from git. I am thinking something similar to 2134d03, but I guess a bit more complicated to take all commits into account.

Also FYI I needed to revert the s390x builders since some other changes needed to go live. I'll work on rebasing this PR and get them back to using it, but for now they are not.

@vuvova
Copy link
Member Author

vuvova commented Oct 23, 2023

I've already added a feature to mtr to ignore unknown tests (MariaDB/server@d5773bc), so no need to go to git. Let's wait for it to be merged up, then we'll continue.

Besides, this "spider_changed" seems overcomplicated, you can simply grep for /spider/ in self.build.allFiles() as I've done in 718060a

vuvova and others added 6 commits December 27, 2023 23:13
as that's what the branch protection protects
FetchTestData now extends MTR and uses runQueryWithRetry to improve
database connection reliability.
also, rename "failed_tests" property to "tests_to_run"
@vuvova vuvova force-pushed the main branch 2 times, most recently from ef041e9 to 76f73f6 Compare December 28, 2023 15:34
and don't run galera for getLastNFailedBuildFactory
amd64-debian-10 and amd64-fedora-38 now only run the last N failed tests
@vladbogo vladbogo marked this pull request as ready for review December 28, 2023 18:27
@vladbogo vladbogo merged commit 4ec2dd3 into MariaDB:dev Dec 28, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants