Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using a Redis server that requires SSL connection with self signed certificate exits 0 with 0 tests run #292

Open
schneems opened this issue Nov 11, 2024 · 4 comments

Comments

@schneems
Copy link
Contributor

The issue is described in my debugging notes here heroku/heroku-buildpack-ruby#1505 (comment)

Context

Heroku rolled out a SSL connections by default https://devcenter.heroku.com/changelog-items/2992. However all certificates are self-signed. (as the servers do not "own" the domain compute-1.amazonaws.com). Therefore we recommend people disable peer verification https://devcenter.heroku.com/articles/connecting-heroku-redis#connecting-in-ruby.

When this change happened, my CI was reporting that my runs were successful however I never noticed that they were executing 0 tests:

-----> Running test command `bundle exec rspec-queue --max-requeues=3 --timeout 180 --queue $REDIS_URL --format documentation || { cat log/test_order.log;  $(exit 1); }`...
Finished in 9.72 seconds (files took 20.02 seconds to load)
0 examples, 0 failures
-----> test command `bundle exec rspec-queue --max-requeues=3 --timeout 180 --queue $REDIS_URL --format documentation || { cat log/test_order.log;  $(exit 1); }` completed successfully

While debugging, I tried connecting to a legacy non-ssl url (REDIS_TEMPORARY_URL) and when I made that change everything worked fine.

Expected

When I try to connect to a redis instance and the connection fails for some reason that it will exit non-zero.

Actual

It reports 0 tests were executed and then does not fail

Suggestion

Ideally if the runner cannot connect to redis (for any reason) it would hard fail and ideally give more connection information. I understand that rspec-queue is marked deprecated in the readme, however I'm guessing that this behavior might also persist in other shared connection code-paths.

Beyond improving the connection failure message, if an flag or env var could be added to set the verification mode to none for use with hosted/self-signed certs, that would be helpful as I have no other way of working around this problem at this time.

@casperisfine
Copy link
Contributor

This behavior is actually by design. ci-queue is designed so that all workers are expendable and soft failing except for actual test failures. This include not being able to reach Redis.

As such, the reporter step isn't optional, it's 100% required as it's the only thing that ensure all queued tests were ran.

This is definitely not documented enough, and you are not the first one to get bit by this.

What we could probably do to is to make this opt-in with some sort of --soft-failure-code 0 or something, as this was done at a time where our CI system didn't support registering some error code for soft failure.

if an flag or env var could be added to set the verification mode to none

That's a different feature request.

@casperisfine
Copy link
Contributor

What we could probably do to is to make this opt-in with some sort of --soft-failure-code 0 or something

@ChrisBr any opinion?

@schneems
Copy link
Contributor Author

Thanks for the extra context. In my case such a flag would meet my needs for increasing confidence. I'm running a small cluster, 16 nodes, and I don't expect any of them to fail or be unavailable.

all workers are expendable and soft failing except for actual test failures

CAP strikes again I guess. It makes sense that you would want your distributed job to allow for failure of nodes.

This include not being able to reach Redis.

Short of adding a new feature, lower hanging fruit could be to emit a warning saying when the worker is going into a soft-failure mode, ideally along with why to differentiate failure modes. Possibly to link to some docs explaining why it's not exiting 0 i.e. highlighting that it's designed with expendable test workers in mind. Basically, raise awareness of the issue.

That's a different feature request.

👍 I'm going to poke at it a bit. I'll be at RubyConf this week, so probably won't have a PR until after that. Thanks for taking a look at the issue.

schneems added a commit to schneems/ci-queue that referenced this issue Nov 19, 2024
Hosted Redis servers use self signed certificates (because they do not own the domain they're running on such as `compute-1.amazonaws.com`) therefore a full SSL connection verification will fail. The fix for that behavior is to disable verification so a self signed certificate can be used [docs from a hosted Redis/Key-value store provider](https://devcenter.heroku.com/articles/connecting-heroku-redis#connecting-in-ruby).

This PR makes the default behavior to allow connecting to self signed Redis servers. 

This failing SSL connection behavior was originally reported in Shopify#292, however that issue focuses on raising visibility of such failures.

This is an alternative to Shopify#293 that does not introduce a new configuration flag.
@ChrisBr
Copy link
Contributor

ChrisBr commented Dec 3, 2024

In my case such a flag would meet my needs for increasing confidence. I'm running a small cluster, 16 nodes, and I don't expect any of them to fail or be unavailable.

FYI this is exactly what minitest-distributed does as it does not have a summary step and all workers report. It doesn't have all the bells and whistles as ci-queue (e.g. not bisect) but is a lot simpler and hence integrates easier. Let me know what you think.

https://github.com/Shopify/minitest-distributed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants