stop workers during online backup #856

Merged
merged 5 commits into from
Aug 13, 2024

Member

evgeni commented May 28, 2024

Stopping the workers while the online backup runs ensures more consistent on-disk data.
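The idea (stop the workers, take the backup, then start them again however the backup ends) can be sketched in Python as a context manager. This is a minimal illustration under stated assumptions, not foreman_maintain's actual Ruby implementation: `workers_stopped`, the `stop`/`start` callables and the example unit names are all hypothetical, and in a real setup `stop`/`start` would wrap the service manager.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List

# Example unit names, assumed for illustration; the PR thread does not list them.
WORKER_SERVICES = ["dynflow-sidekiq@worker-1", "pulpcore-worker@1"]


@contextmanager
def workers_stopped(
    services: List[str],
    stop: Callable[[str], None],
    start: Callable[[str], None],
) -> Iterator[None]:
    """Stop the given worker services for the duration of the block.

    Only services that were actually stopped are started again, in reverse
    order, even if stopping a later service or the block itself fails.
    """
    stopped: List[str] = []
    try:
        for service in services:
            stop(service)
            stopped.append(service)
        yield
    finally:
        for service in reversed(stopped):
            start(service)


if __name__ == "__main__":
    log: List[str] = []
    with workers_stopped(
        WORKER_SERVICES,
        stop=lambda s: log.append(f"stop {s}"),
        start=lambda s: log.append(f"start {s}"),
    ):
        log.append("backup")  # the actual backup would run here
    print(log)
```

Restarting in the `finally` clause is the important design choice: a failed backup should not leave the workers down.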

evgeni force-pushed the stop-workers-for-backup branch 3 times, most recently from 68572a8 to b27654d, May 29, 2024 07:07
evgeni force-pushed the stop-workers-for-backup branch 2 times, most recently from b844757 to f2a3be6, May 29, 2024 08:26
evgeni force-pushed the stop-workers-for-backup branch 2 times, most recently from a3cc938 to 91cb8bb, May 29, 2024 16:11
evgeni force-pushed the stop-workers-for-backup branch from 91cb8bb to 6d4e875, June 7, 2024 07:12
Member Author

evgeni commented Jun 7, 2024

# hammer job-invocation create --job-template "Run Command - Script Default" --inputs 'command=sleep 120' --search-query "name = $(hostname -f)" --async
Job invocation 5 created

# foreman-maintain backup online /var/tmp/b
Starting backup: 2024-06-07 07:29:04 +0000
Running preparation steps required to run the next scenarios
================================================================================
Make sure Foreman DB is up: 
/ Checking connection to the Foreman DB                               [OK]      
--------------------------------------------------------------------------------
Make sure Candlepin DB is up: 
- Checking connection to the Candlepin DB                             [OK]      
--------------------------------------------------------------------------------
Make sure Pulpcore DB is up: 
\ Checking connection to the Pulpcore DB                              [OK]      
--------------------------------------------------------------------------------


Running Backup
================================================================================
Check for running tasks:                                              [FAIL]
There are 2 active task(s) in the system.
Please wait for these to complete or cancel them from the Monitor tab.
--------------------------------------------------------------------------------
There are multiple steps to proceed:
1) Fetch tasks status and wait till they finish
2) Investigate the tasks via UI
Select step to continue, [n(next), q(quit)] 1
Fetch tasks status and wait till they finish:                                   
| Try checking status of running task(s)                                        
There are 2 running tasks.
| Try checking status of running task(s)                                        
There are 2 running tasks.
\ Try checking status of running task(s)                                        
There are 2 running tasks.
\ Try checking status of running task(s)                                        
There are 2 running tasks.
\ Try checking status of running task(s)                                        
There are 2 running tasks.
\ Try checking status of running task(s)                                        
There are 2 running tasks.
| Try checking status of running task(s)                                        
There are 2 running tasks.
| Try checking status of running task(s)                                        
There are 2 running tasks.
| Try checking status of running task(s)                                        
There are 2 running tasks.
| Try checking status of running task(s)                                        
There are 2 running tasks.
| Try checking status of running task(s)                                        
There are 2 running tasks.
| Try checking status of running task(s)                                        
There are 1 running tasks.
| Try checking status of running task(s)                                        
There are 1 running tasks.
| Try checking status of running task(s)                              [OK]      
--------------------------------------------------------------------------------
Rerunning the check after fix procedure
Check for running tasks:                                              [OK]
--------------------------------------------------------------------------------
Check for running pulpcore tasks:                                     [OK]
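Choosing step 1 ("Fetch tasks status and wait till they finish") in the session above amounts to a polling loop. The Python sketch below assumes a `count_active` callable that returns the number of active tasks; `wait_for_tasks` and `TasksStillActive` are hypothetical names, the 300-second default is the timeout mentioned in the review thread, and the 10-second interval matches the "Waiting 10 seconds before retry" lines of the pulpcore check. The `wait` parameter models the choice between aborting and waiting that the `--wait-for-tasks` flag makes.

```python
import time
from typing import Callable


class TasksStillActive(Exception):
    """Raised when tasks are still active and we will not (or can no longer) wait."""


def wait_for_tasks(
    count_active: Callable[[], int],
    wait: bool = True,
    timeout: float = 300,
    interval: float = 10,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Return once count_active() reports no active tasks.

    With wait=False, raise as soon as any task is active (abort instead of
    waiting). Otherwise poll every `interval` seconds and raise once
    `timeout` seconds have passed with tasks still active.
    """
    deadline = clock() + timeout
    while True:
        active = count_active()
        if active == 0:
            return
        if not wait:
            raise TasksStillActive(f"{active} active task(s); not waiting")
        if clock() >= deadline:
            raise TasksStillActive(f"timed out with {active} active task(s)")
        sleep(interval)


if __name__ == "__main__":
    # Simulated task counts that shrink to zero, like the session above.
    counts = iter([2, 2, 1, 0])
    wait_for_tasks(lambda: next(counts), interval=0.01)
    print("no active tasks, safe to start the backup")
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.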

evgeni force-pushed the stop-workers-for-backup branch 3 times, most recently from 9595a47 to cd5e6cb, June 7, 2024 08:23
Member Author

evgeni commented Jun 7, 2024

# PULP_SETTINGS=/etc/pulp/settings.py pulpcore-manager shell < lol.py 
/pulp/api/v3/tasks/018ff1cc-1924-7108-9b17-0b7adc330e9b/

# pulp task list --state running
[
  {
    "pulp_href": "/pulp/api/v3/tasks/018ff1cc-1924-7108-9b17-0b7adc330e9b/",
    "pulp_created": "2024-06-07T08:23:55.941268Z",
    "pulp_last_updated": "2024-06-07T08:23:55.941279Z",
    "state": "running",
    "name": "pulpcore.app.tasks.test.sleep",
    "logging_cid": "lol",
    "created_by": "/pulp/api/v3/users/1/",
    "unblocked_at": "2024-06-07T08:23:55.952339Z",
    "started_at": "2024-06-07T08:23:55.972970Z",
    "finished_at": null,
    "error": null,
    "worker": "/pulp/api/v3/workers/018ff195-c89d-70bd-8262-16b5431226bc/",
    "parent_task": null,
    "child_tasks": [],
    "task_group": null,
    "progress_reports": [],
    "created_resources": [],
    "reserved_resources_record": [
      "shared:/pulp/api/v3/domains/018fb72c-f3d2-7e49-9b59-20a4c2f7da44/"
    ]
  }
]

# foreman-maintain backup online /var/tmp/b
Starting backup: 2024-06-07 08:24:01 +0000
Running preparation steps required to run the next scenarios
================================================================================
Make sure Foreman DB is up: 
/ Checking connection to the Foreman DB                               [OK]      
--------------------------------------------------------------------------------
Make sure Candlepin DB is up: 
- Checking connection to the Candlepin DB                             [OK]      
--------------------------------------------------------------------------------
Make sure Pulpcore DB is up: 
\ Checking connection to the Pulpcore DB                              [OK]      
--------------------------------------------------------------------------------


Running Backup
================================================================================
Check for running tasks:                                              [OK]
--------------------------------------------------------------------------------
Check for running pulpcore tasks:                                     [FAIL]
There are 1 active task(s) in the system.
Please wait for these to complete.
--------------------------------------------------------------------------------
Continue with step [Fetch tasks status and wait till they finish]?, [y(yes), n(no), q(quit)] y
Fetch tasks status and wait till they finish:                                   
- waiting for tasks to finish                                                   
There are 1 tasks.
/ Waiting 10 seconds before retry.                                              
There are 1 tasks.
\ Waiting 10 seconds before retry.                                              
There are 1 tasks.
/ Waiting 10 seconds before retry.                                              
There are 1 tasks.
| Waiting 10 seconds before retry.                                              
There are 1 tasks.
- Waiting 10 seconds before retry.                                              
There are 1 tasks.
| Waiting 10 seconds before retry.                                    [OK]      
--------------------------------------------------------------------------------
Rerunning the check after fix procedure
Check for running pulpcore tasks:                                     [OK]

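where lol.py is:

from django_guid import set_guid
from pulpcore.tasking.tasks import dispatch
from pulpcore.app.util import get_url

# Set a correlation ID so the task is easy to spot (it shows up as "logging_cid")
set_guid('lol')
# Dispatch pulpcore's built-in test task, which sleeps for 60 seconds
task = dispatch("pulpcore.app.tasks.test.sleep", args=(60,))
# Print the task's API href
print(get_url(task))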
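Counting active Pulp tasks from the JSON that `pulp task list` prints could look like the Python sketch below. `count_active_tasks` is a hypothetical helper, and treating `waiting`, `running` and `canceling` as "active" is an assumption; the output above only shows a task in the `running` state. With `--state running` the server already filters, so every returned entry counts.

```python
import json
from typing import AbstractSet

# Assumed set of "active" Pulp task states.
ACTIVE_STATES: AbstractSet[str] = frozenset({"waiting", "running", "canceling"})


def count_active_tasks(task_list_json: str, active_states: AbstractSet[str] = ACTIVE_STATES) -> int:
    """Count entries in a `pulp task list` JSON array whose state is active."""
    tasks = json.loads(task_list_json)
    return sum(1 for task in tasks if task.get("state") in active_states)


if __name__ == "__main__":
    # Trimmed copy of the task shown in the `pulp task list --state running` output.
    sample = """[
      {
        "pulp_href": "/pulp/api/v3/tasks/018ff1cc-1924-7108-9b17-0b7adc330e9b/",
        "state": "running",
        "name": "pulpcore.app.tasks.test.sleep"
      }
    ]"""
    print(count_active_tasks(sample))  # 1
```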

evgeni force-pushed the stop-workers-for-backup branch 10 times, most recently from 73b53e6 to 239f42c, June 14, 2024 11:21
@@ -57,6 +57,7 @@ def common_backup_options
option '--features', 'FEATURES',
"#{proxy_name} features to include in the backup. " \
'Valid features are tftp, dns, dhcp, openscap, and all.', :multivalued => true
option '--wait-for-tasks', :flag, 'Wait for running tasks to complete instead of aborting'
Member Author

Right now I've implemented this as a boolean flag, but I'm wondering whether we should instead let users configure the timeout. The current timeout is 300 seconds (5 minutes) for both the Pulp and the Foreman task checks, so the maximum total wait is 10 minutes.

Member

Yes. We could make it either explicit (a CLI option) or "hidden" (an environment variable).

Member Author

I think that is the only question left undecided.
I must admit I find environment variables bad for discoverability, but on the other hand, if we went with a CLI option we'd have to add it to update/upgrade too, to be consistent.
I also have no real feeling for how often people would actually want to override the timeout.

So maybe start with an environment variable, and then see how many complaints we get?
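If the timeout ends up in an environment variable, as suggested above, reading and validating it could look like this Python sketch. The variable name `FOREMAN_MAINTAIN_TASK_TIMEOUT` and the helpers `task_timeout` and `worst_case_wait` are hypothetical; the 300-second default and the "two checks, so at most 10 minutes" arithmetic come from the discussion above.

```python
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT = 300  # seconds, the current per-check value

# Hypothetical variable name; the thread does not settle on one.
TIMEOUT_ENV_VAR = "FOREMAN_MAINTAIN_TASK_TIMEOUT"


def task_timeout(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the per-check timeout in seconds.

    Uses DEFAULT_TIMEOUT when the variable is unset or empty and rejects
    anything that is not a positive integer.
    """
    if env is None:
        env = os.environ
    raw = env.get(TIMEOUT_ENV_VAR, "").strip()
    if not raw:
        return DEFAULT_TIMEOUT
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{TIMEOUT_ENV_VAR} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{TIMEOUT_ENV_VAR} must be positive, got {value}")
    return value


def worst_case_wait(per_check_timeout: int, checks: int = 2) -> int:
    """Upper bound on total waiting when each check (Foreman, Pulp) has its own timeout."""
    return per_check_timeout * checks


if __name__ == "__main__":
    print(task_timeout({}))                        # 300
    print(worst_case_wait(task_timeout({})))       # 600, i.e. 10 minutes
    print(task_timeout({TIMEOUT_ENV_VAR: "900"}))  # 900
```

Passing the environment in as a mapping keeps the parsing testable, and a CLI option could later feed the same validation if the environment variable proves too hard to discover.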

evgeni marked this pull request as ready for review June 14, 2024 11:29
Member Author

evgeni commented Jun 14, 2024

Should the whole "wait for tasks before backup" part be split out into its own commit/PR?

evgeni force-pushed the stop-workers-for-backup branch from 6a9403a to 5059588, June 14, 2024 14:08
evgeni force-pushed the stop-workers-for-backup branch 2 times, most recently from 5766e06 to db07fba, June 17, 2024 09:39
Member Author

evgeni commented Jun 17, 2024

split the big patch into four logical commits, enjoy!

evgeni force-pushed the stop-workers-for-backup branch from db07fba to bacba01, June 19, 2024 13:28
Member

ehelms commented Jul 12, 2024

Needs a rebase!

evgeni force-pushed the stop-workers-for-backup branch 2 times, most recently from d66c6ad to 6ba62e7, July 13, 2024 09:58
Member Author

evgeni commented Jul 13, 2024

got a rebase!

Member Author

evgeni commented Jul 13, 2024

hm, need to adjust tests. later.

evgeni force-pushed the stop-workers-for-backup branch 3 times, most recently from cd13c57 to 2897a6c, July 17, 2024 08:49
evgeni force-pushed the stop-workers-for-backup branch from 2897a6c to b87f9b8, July 18, 2024 06:58
evgeni merged commit fdb2c09 into master Aug 13, 2024
8 checks passed
evgeni deleted the stop-workers-for-backup branch August 13, 2024 18:41