-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removing Beanstalk from teuthology #1650
base: main
Are you sure you want to change the base?
Removing Beanstalk from teuthology #1650
Conversation
946dc36
to
0ef5b15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, the retry comments are due to some issues we've had with paddles in the past, particularly recently with db conflicts causing stale sqlalchemy handles that may cause a subsequent request to run into an error.
Working around this with client-side (aka teuthology) retries has mitigated these issues.
First of all, the tests are failing. |
Second, is it possible to support both scheduling mechanisms for a while? |
4f83a98
to
4d2ae67
Compare
It seems like a lot of effort - on the teuthology side one can always use the commit before this merges. In terms of the sepia lab, I figure we'll announce this widely before it merges and when it is being merged. Folks will notice pretty fast when beanstalkd is no longer running. @amathuria is planning to test out the paddles side further in sepia before merging the teuthology piece as well. |
a170942
to
88af7b5
Compare
I don't think it is a lot of effort for the moment to support just another scheduling on teuthology client side. I've added --backend option to handle this.
It is not only about sepia lab we still use downstream beanstalk with two different teuthology setups and it will be tough to switch everything at once. I would vote for keeping the beanstalk support in teuthology for awhile so it can run with older paddles and pulpito. |
Developers can now use start.sh to build the images and set up postgresql, paddles, pulpito and teuthology for development. This PR is also pending for: #1650 ceph/paddles#94 to be merged, as currently we use these branches as images for paddles and pulpito. Signed-off-by: Kamoltat Sirivadhna <[email protected]>
Developers can now use start.sh to build the images and set up postgresql, paddles, pulpito and teuthology for development. This PR is also pending for: #1650 ceph/paddles#94 to be merged, as currently we use these branches as images for paddles and pulpito. Signed-off-by: Kamoltat Sirivadhna <[email protected]>
5c0d202
to
61a399c
Compare
@zmc want to take a look at the teuthology side of this change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work. I left some comments in the diff section, but some of them will probably be made irrelevant by the below.
I think we should do some consolidation for simplicity's sake. Rather than add teuthology/paddles_queue.py
alongside beanstalk.py
, let's create a teuthology/queue
subdir so that we can have modules named teuthology.queue.beanstalk
and teuthology.queue.paddles
. A __init__.py
in the queue
subdir should contain the code that the two implementations have in common, so that we're not duplicating things like JobProcessor
and friends.
We can add a new config setting that controls which queueing mechanism to use (config.queue_backend
?). A __init__.py
in the queue
subdir can look at that setting and load the appropriate backend. Scripts that have a --queue-backend
arg should be able to use that setting's value as their default; then using the actual arg is almost never necessary.
Then, rather than have separate teuthology-beanstalk-queue
and teuthology-paddles-queue
scripts, which will depend on the user to know which to call, we can leave the existing scripts/queue.py
mostly alone.
Last, this PR should probably not change the default backend to paddles just yet, to avoid causing problems for other users.
Developers can now use start.sh to build the images and set up postgresql, paddles, pulpito and teuthology for development. This PR is also pending for: #1650 ceph/paddles#94 to be merged, as currently we use these branches as images for paddles and pulpito. Signed-off-by: Kamoltat Sirivadhna <[email protected]>
604c452
to
fe3d0c9
Compare
74d1650
to
5a3edb1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in the very early stages of testing this with ceph/paddles#94 via ceph-devstack, but I have run into an issue: it looks like teuthology-schedule
isn't noticing that I have backend: paddles
in my .teuthology.yaml
, so I get RuntimeError: Beanstalk queue information not found in None
.
I also see several log entries with POST to http://paddles:8080/queue/stats/ failed with status 400: {"message": "specified queue does not exist"}
- so I think there's an issue with creating queues as well. This comment belongs on the paddles side, but I think it'd make a lot of sense to just create queues transparently like beanstalkd does.
response = self.session.post(uri, data=queue_json, headers=headers) | ||
|
||
if response.status_code == 200: | ||
return response.json() | ||
else: | ||
msg = response.text | ||
self.log.error( | ||
"POST to {uri} failed with status {status}: {msg}".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't the above be using a GET
request?
report_status = False | ||
else: | ||
report_status = True | ||
|
||
name = args['--name'] | ||
if not name or name.isdigit(): | ||
raise ValueError("Please use a more descriptive value for --name") | ||
job_config = build_config(args) | ||
backend = args['--queue-backend'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be reading this from the config object - the CLI flag can take precedence over that value, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
Thanks for trying this out! I'll look into these issues |
The queue does not exist issue popped up because I wasn't reading backend from teuthology.yaml. Once that is fixed we shouldn't see this error anymore. |
33a4ffd
to
a6c7093
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple bugs; looks like a rebase is needed as well
scripts/dispatcher.py
Outdated
--supervisor run dispatcher in job supervisor mode | ||
--bin-path BIN_PATH teuthology bin path | ||
--job-config CONFIG file descriptor of job's config file | ||
--exit-on-empty-queue if the queue is empty, exit | ||
--queue-backend BACKEND which backend will be used for the queue | ||
[default: beanstalk] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the default of beanstalk
here, we get a traceback if we've got backend: paddles
in /etc/teuthology.yaml
and omit the flag:
Traceback (most recent call last):
File "/home/teuthworker/teuthology/virtualenv/bin/teuthology-dispatcher", line 8, in <module>
sys.exit(main())
File "/home/teuthworker/teuthology/scripts/dispatcher.py", line 37, in main
sys.exit(teuthology.dispatcher.main(args))
File "/home/teuthworker/teuthology/teuthology/dispatcher/__init__.py", line 96, in main
connection = beanstalk.connect()
File "/home/teuthworker/teuthology/teuthology/queue/beanstalk.py", line 16, in connect
raise RuntimeError(
RuntimeError: Beanstalk queue information not found in None
Perhaps we should drop the flag and only read from the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah makes sense, I'll make the change
@@ -143,6 +143,7 @@ class TeuthologyConfig(YamlConfig): | |||
'archive_upload_key': None, | |||
'archive_upload_url': None, | |||
'automated_scheduling': False, | |||
'backend': 'beanstalk', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'backend': 'beanstalk', | |
'queue_backend': 'beanstalk', |
try: | ||
report.try_delete_jobs(job_config['name'], job_config['job_id']) | ||
except Exception: | ||
log.exception("Unable to delete job %s", job_config['job_id']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably log the name here as well
teuthology/schedule.py
Outdated
backend = args['--queue-backend'] | ||
backend = config.backend | ||
if args['--queue-backend']: | ||
backend = args['--queue-backend'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar concern as -dispatcher
re: the flag vs. config file
065b7a3
to
4ae5f62
Compare
4ae5f62
to
dfb91e8
Compare
Adding DNM as the Paddles PR will have to be merged first |
570ac00
to
a0b7570
Compare
a0b7570
to
fc8245c
Compare
9150d49
to
56179ba
Compare
56179ba
to
e520395
Compare
The following changes support the removal of Beanstalk from Teuthology. In place of Beanstalk, we will now be using Paddles for queue management in Teuthology. This PR has the corresponding changes for the paddles PR: https://github.com/ceph/paddles/pull/94/files. The changes include: 1. Removing all beanstalk related code 2. Teuthology scheduler and dispatcher using Paddles queue for scheduling and dispatching jobs 3. Adding support for Paddles queue management 4. Additional functionality of being able to change the priority of Teuthology jobs in the queued state in the teuthology-queue command Signed-off-by: Aishwarya Mathuria <[email protected]>
Previously the teuthology worker unit test used beanstalk, the test will be using Paddles now. Signed-off-by: Aishwarya Mathuria <[email protected]>
1. Add retry loop for the paddles calls. 2. Add run name as a parameter for updating priority of jobs in paddles. 3. Modify the pause queue command to run on server side with an optional pause duration parameter. Signed-off-by: Aishwarya Mathuria <[email protected]>
Signed-off-by: Aishwarya Mathuria <[email protected]>
…ith Paddles With the use of the --queue-backend argument the user can specify which backend(paddles/beanstalk) they would like to use for maintaining the teuthology Jobs queue. In order to avoid overlapping Job IDs, when a job is being scheduled in beanstalk it is also written to paddles which returns a unique ID. This is the ID teuthology will treat as the Job ID throughout the run of the job. To differentiate between the 2 queue backends, the teuthology-queue command has been split into teuthology-paddles-queue command and teuthology-beanstalk-queue command. Signed-off-by: Aishwarya Mathuria <[email protected]>
…alk queue Signed-off-by: Aishwarya Mathuria <[email protected]>
e520395
to
d7a4c94
Compare
Makes the same teuthology-queue commands work regardless of the queue backend, Paddles or Beanstalk. Signed-off-by: Aishwarya Mathuria <[email protected]>
d7a4c94
to
be34af2
Compare
The following changes support the removal of Beanstalk from Teuthology.
In place of Beanstalk, we will now be using Paddles for queue management in Teuthology.
This PR has the corresponding changes for the paddles PR: https://github.com/ceph/paddles/pull/94/files.
The changes include:
Signed-off-by: Aishwarya Mathuria [email protected]