Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AN-226] Enable submitting workflows to Cromwell's GCP Batch backend #3141

Merged
merged 33 commits into from
Dec 9, 2024

Conversation

sam-schu
Copy link
Contributor

@sam-schu sam-schu commented Nov 22, 2024

Ticket: AN-226

  • Updated the WorkflowSubmissionActor to submit workflows to Cromwell's GCP Batch backend, rather than to the PAPI backend, if the UseCromwellGcpBatchBackendSetting workspace setting is enabled.
    • This workspace setting was added in AN-225 (PR here). It is currently exposed only by the workspace settings API, but will later be added to the Terra UI workspace settings modal under a feature flag.
    • Uses a new helm config value added here.
  • Added the gcp_batch_gcs_root key, which takes the same value as the existing jes_gcs_root, to the execution service workflow options. This key/value pair is needed by the GCP Batch backend, and ignored by the PAPI backend.
  • Updated the README and the PR template to both include all 3 places where the rawls-model dependency version should be updated when a new model is published.

PR checklist

  • Include the JIRA issue number in the PR description and title
  • Make sure Swagger is updated if API changes
    • ...and Orchestration's Swagger too!
  • If you changed anything in model/, then you should publish a new official rawls-model and update rawls-model in Orchestration's dependencies.
  • Get two thumbsworth of PR review
  • Verify all tests go green, including CI tests
  • Squash commits and merge to develop (branches are automatically deleted after merging)
  • Inform other teams of any substantial changes via Slack and/or email

@sam-schu sam-schu changed the title [AN-226] Submit to batch backend [AN-226] Submit workflows to Cromwell's GCP Batch backend when workspace setting is enabled Dec 4, 2024
@sam-schu sam-schu changed the title [AN-226] Submit workflows to Cromwell's GCP Batch backend when workspace setting is enabled [AN-226] Enable submitting workflows to Cromwell's GCP Batch backend Dec 4, 2024
@@ -126,6 +126,7 @@ class ExecutionModelSpec extends AnyFlatSpec with Matchers {
"ExecutionServiceWorkflowOptions" should "serialize/deserialize to/from JSON" in {
val test = ExecutionServiceWorkflowOptions(
jes_gcs_root = "jes_gcs_root",
gcp_batch_gcs_root = "gcp_batch_gcs_root",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan of key/value ambiguity, even for tests.

Suggested change
gcp_batch_gcs_root = "gcp_batch_gcs_root",
gcp_batch_gcs_root = "example_gcp_batch_gcs_root",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

@@ -467,6 +480,7 @@ trait WorkflowSubmission extends FutureSupport with LazyLogging with MethodWiths
monitoringImageScript = submissionRec.monitoringImageScript
)
} yield {
logger.error("Test workflow options: " + wfOpts.toString);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove or demote below error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my standalone comment below.

README.md Outdated
Comment on lines 177 to 180
* **Note:** Local Rawls instances are currently unable to submit workflows to CromIAM due to an issue with the
`caas-collection-name` label. If you are able to make a workflow submission to your local back Rawls instance and have
it run on Cromwell, check that you did not use the dev database and accidentally allow dev Rawls to pick up your
submission.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I recommend only checking in the content in the Publish rawls-model section.

I think a larger rewrite of the instructions is in order, which I intend to do in AN-297.

In particular, I don't think it makes sense to write instructions on what doesn't work, people want to know what does work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the note from lines 177-180 — I am happy to defer updates related to local workflow submission to your ticket. Did you also think I should revert my other updates other than those related to the Rawls model? I think my additional VPN reminder and additional context about using the dev database based on what we learned will help future developers not make the same mistakes that I did.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please stick to updating the content in the Publish rawls-model section.

We will have horrible merge conflicts otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will definitely pull you in to review that PR so you can confirm I covered everything you ran into.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks! I have reverted the other changes.

Co-authored-by: Adam Nichols <[email protected]>
Copy link
Contributor

@jgainerdewar jgainerdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome!

Comment on lines 308 to 312
useCromwellGcpBatchBackend: Boolean = currentSettings
.collectFirst { case setting: UseCromwellGcpBatchBackendSetting =>
setting
}
.exists(_.config.enabled)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be clearer if we did it in a single invocation (further tweaking welcome, this is just an idea):

useCromwellGcpBatchBackend = currentSettings.exists { setting => 
    setting match {
        case backendSetting: UseCromwellGcpBatchBackendSetting => backendSetting.config.enabled
        case _ => false
     }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by a single invocation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean going from a collectFirst followed by a exists, to just an exists. I find it easier to read because I don't have to think about the type coming out of the collectFirst. I also like making it obvious that we are handling two different false cases - the settings exists and is set to false, or the setting doesn't exist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably create a reusable utility method to retrieve a strongly-typed workspace setting from the settings list, a la:

  def findSetting[T](allSettings: List[WorkspaceSetting]): Option[T] = {
    allSettings.collectFirst {
      case found:T => found
    }
  }

if we had that, then this code could be a bit clearer, like:

useBatchSetting: Option[UseCromwellGcpBatchBackendSetting] = somwhereTheUtilityLives
        .findSetting[UseCromwellGcpBatchBackendSetting](currentSettings)

or

useCromwellGcpBatchBackend: Boolean = somwhereTheUtilityLives
        .findSetting[UseCromwellGcpBatchBackendSetting](currentSettings).exists(_.config.enabled)

you could use the same utility to calculate separateSubmissionSetting up at line 293, too.

I'd consider this a stretch goal/out of scope/does not block this PR. I don't want to scope-creep!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! @davidangb I agree that refactoring would be great for clarity and reusability in the future, but I have implemented Janet's suggestion for now.

@@ -11,7 +11,10 @@ import org.broadinstitute.dsde.rawls.coordination.{DataSourceAccess, Uncoordinat
import org.broadinstitute.dsde.rawls.dataaccess._
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any non-scalafmt or whitespace changes to this file? If not, can you back out these changes? It's better to not have files modified at all if we don't need to change them functionally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty certain that I made the commit to scalafmt this file because it was giving me an error on the scalafmt PR check. I see that that action claims to check only modified files, but maybe that file was being checked because the merge commit from dev modified it. Is there a way to revert the scalafmt changes without failing the check?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, maybe reverse all of the changes so that the diff between this branch and develop includes no change to this file, and see how it feels? It sounds like that might be the state you started in, but worth going back to that so we can try again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to revert the commit that scalafmt-ed this file, but I decided to check the diff first and I couldn't see my changes anymore. It turns out that Bria must have been having the same problem, and this PR that was merged yesterday also fixed the formatting. Does it matter now whether or not I revert the commit that made the same changes on my branch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, if the diff looks good you're all set!

@@ -1202,6 +1208,90 @@ class WorkflowSubmissionSpec(_system: ActorSystem)
}
}

it should "submit workflows to Cromwell's GCP Batch backend when UseCromwellGcpBatchBackendSetting is true" in withDefaultTestDatabase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests!

regular = "cwb_ws_dev_v7"
exfiltrationControlled = "vpc_sc_v10"
regular = "cwb_ws_dev_v8"
exfiltrationControlled = "vpc_sc_v13"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, should have updated these when I switched to the new pools but I didn't know they existed!

@@ -93,8 +93,8 @@ dataRepo {

resourceBuffer {
projectPool {
regular = "cwb_ws_dev_v7"
exfiltrationControlled = "vpc_sc_v10"
regular = "cwb_ws_dev_v8"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgainerdewar I updated these pools in the local dev config to reflect the new pools you recently created because the old values were giving me trouble when trying to run Rawls locally. Do these new values seem correct? (The exfiltrationControlled value I was particularly unsure of.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, those look correct!

@@ -467,6 +480,7 @@ trait WorkflowSubmission extends FutureSupport with LazyLogging with MethodWiths
monitoringImageScript = submissionRec.monitoringImageScript
)
} yield {
logger.error("Test workflow options: " + wfOpts.toString);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this log before merging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is still here!

@sam-schu sam-schu marked this pull request as ready for review December 4, 2024 21:17
@sam-schu sam-schu requested a review from a team as a code owner December 4, 2024 21:17
@sam-schu sam-schu requested review from calypsomatic and davidangb and removed request for a team December 4, 2024 21:17
Copy link
Contributor

@aednichols aednichols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending resolution of README merge conflicts/duplication with #3146

Copy link
Contributor

@davidangb davidangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not test it myself, but code looks good! Minor comments inline.

Comment on lines +317 to 318
submission.submissionRoot,
submission.submissionRoot,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a good comment on ExecutionServiceWorkflowOptions that explains why we're passing the same value as the first two arguments … perhaps add a code comment here too? I was awfully confused when I first read this code, until I drilled down into ExecutionServiceWorkflowOptions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a clarifying comment, thanks!

Comment on lines 308 to 312
useCromwellGcpBatchBackend: Boolean = currentSettings
.collectFirst { case setting: UseCromwellGcpBatchBackendSetting =>
setting
}
.exists(_.config.enabled)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably create a reusable utility method to retrieve a strongly-typed workspace setting from the settings list, a la:

  def findSetting[T](allSettings: List[WorkspaceSetting]): Option[T] = {
    allSettings.collectFirst {
      case found:T => found
    }
  }

if we had that, then this code could be a bit clearer, like:

useBatchSetting: Option[UseCromwellGcpBatchBackendSetting] = somwhereTheUtilityLives
        .findSetting[UseCromwellGcpBatchBackendSetting](currentSettings)

or

useCromwellGcpBatchBackend: Boolean = somwhereTheUtilityLives
        .findSetting[UseCromwellGcpBatchBackendSetting](currentSettings).exists(_.config.enabled)

you could use the same utility to calculate separateSubmissionSetting up at line 293, too.

I'd consider this a stretch goal/out of scope/does not block this PR. I don't want to scope-creep!

executionServiceWorkflowOptions = ExecutionServiceWorkflowOptions(
submission.submissionRoot,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this meant to be duplicated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! David was confused by the same thing. I added a clarifying comment as he suggested!

@sam-schu sam-schu requested a review from jgainerdewar December 6, 2024 19:51
Copy link
Contributor

@jgainerdewar jgainerdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, happy to approve once that last log line is removed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants