Propagate thread locals to Delta thread pools #2154

Closed
wants to merge 1 commit into from

@fred-db fred-db commented Oct 9, 2023

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

  • The default thread pool executor in Apache Spark does not forward thread locals to threads spawned in a thread pool.
  • This can cause issues if the threads depend on the thread locals.
  • To fix this, we introduce a wrapper class around the thread pool executor that forwards thread locals.
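The idea in the bullets above can be sketched without Spark. This is only an illustration of the capture, install, and restore pattern, not the PR's `SparkThreadLocalForwardingExecutor`: `RequestContext`, `ForwardingExecutor`, and `ForwardingDemo` are hypothetical names, and a plain `java.lang.ThreadLocal` stands in for Spark's thread locals.

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors, Future}

// Hypothetical stand-in for a Spark thread local (e.g. the local properties).
object RequestContext {
  private val current: ThreadLocal[String] = new ThreadLocal[String] {
    override def initialValue(): String = "unset"
  }
  def get: String = current.get()
  def set(value: String): Unit = current.set(value)
}

// Hypothetical wrapper: captures the caller's value at submit time, installs it
// on the pool thread while the task runs, then restores the previous value.
class ForwardingExecutor(underlying: ExecutorService) {
  def submit[T](body: => T): Future[T] = {
    val captured = RequestContext.get // read on the submitting thread
    underlying.submit(new Callable[T] {
      override def call(): T = {
        val previous = RequestContext.get // what the pool thread held before
        RequestContext.set(captured)
        try body
        finally RequestContext.set(previous) // reset so the pooled thread stays clean
      }
    })
  }
}

object ForwardingDemo {
  // Returns what a pool thread observes without and with forwarding.
  def observed(): (String, String) = {
    val pool = Executors.newFixedThreadPool(1)
    try {
      RequestContext.set("job-42")
      val plain = pool.submit(new Callable[String] {
        override def call(): String = RequestContext.get
      }).get()
      val forwarded = new ForwardingExecutor(pool).submit(RequestContext.get).get()
      (plain, forwarded)
    } finally pool.shutdown()
  }
}
```

In this sketch the plain submission sees the pool thread's own default value, while the wrapped submission sees the value set by the caller. Restoring the previous value in `finally` matters because pool threads are reused across tasks.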

How was this patch tested?

  • Unit tests of SparkThreadLocalForwardingExecutor ensure that thread locals are forwarded and reset after the future finishes.

Does this PR introduce any user-facing changes?

No

@larsk-db larsk-db left a comment

Mostly nits; otherwise it looks good, thanks :)

// Capture an immutable threadsafe snapshot of the current local properties
val capturedProperties = sparkContext
  .map(sc => CapturedSparkThreadLocals.toValuesArray(
    org.apache.spark.util.Utils.cloneProperties(sc.getLocalProperties)))

Import Utils instead of using the qualified name here? You can rename it on import (e.g. to SparkUtils) if it clashes.

Comment on lines 89 to 91
capturedProperties.foreach { p =>
  sparkContext.foreach(_.setLocalProperties(CapturedSparkThreadLocals.toProperties(p)))
}

Suggested change
- capturedProperties.foreach { p =>
-   sparkContext.foreach(_.setLocalProperties(CapturedSparkThreadLocals.toProperties(p)))
- }
+ for {
+   p <- capturedProperties
+   sc <- sparkContext
+ } sc.setLocalProperties(CapturedSparkThreadLocals.toProperties(p))

Isn't this much more readable? ;)
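For readers less familiar with the idiom: a `for` comprehension without `yield` over two `Option` values desugars to nested `foreach` calls, so the two forms behave the same. A minimal sketch, assuming both values are `Option`s; `OptionForSketch` and its parameters are hypothetical stand-ins for `capturedProperties` and `sparkContext`:

```scala
object OptionForSketch {
  // Nested foreach: the body runs only if both options are defined.
  def nested(props: Option[String], ctx: Option[StringBuilder]): Unit =
    props.foreach { p => ctx.foreach(_.append(p)) }

  // Equivalent for comprehension: one generator per option, no yield.
  def comprehension(props: Option[String], ctx: Option[StringBuilder]): Unit =
    for {
      p <- props
      sc <- ctx
    } sc.append(p)
}
```

If either option is `None`, neither form runs the body.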

throw t
} finally {
TaskContext.setTaskContext(previousTaskContext)
previousProperties.foreach(p => sparkContext.foreach(_.setLocalProperties(p)))

Same here.

Comment on lines 115 to 117
props.foreach { kvp =>
  resultProps.put(kvp._1, kvp._2)
}

Or, hear me out:

Suggested change
- props.foreach { kvp =>
-   resultProps.put(kvp._1, kvp._2)
- }
+ for ((key, value) <- props) {
+   resultProps.put(key, value)
+ }
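As a self-contained illustration of the destructuring form, here is a rough sketch of copying key/value pairs into a `java.util.Properties`. The `ToPropertiesSketch` object and the `Seq[(String, String)]` input type are assumptions made for the example; the PR's actual `toProperties` signature is not shown in this conversation.

```scala
import java.util.Properties

object ToPropertiesSketch {
  // Hypothetical helper: copies each (key, value) pair into a fresh Properties.
  def toProperties(props: Seq[(String, String)]): Properties = {
    val resultProps = new Properties()
    // The tuple pattern names both fields, avoiding kvp._1 / kvp._2.
    for ((key, value) <- props) {
      resultProps.put(key, value)
    }
    resultProps
  }
}
```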

}

test("That CapturedSparkThreadLocals properly restores the existing spark properties." +
" Changes to local properties inside a task do not affect the original properties") {

indent

fred-db commented Oct 13, 2023

Hi @larsk-db, thank you so much for reviewing! :) I addressed your comments; I hope it looks good now. I also moved DeltaThreadPool under util/threads, since we now have more threading-specific code in the codebase and it makes sense to put it in a separate package.

@larsk-db larsk-db left a comment

LGTM thank you!

@fred-db fred-db changed the title from "Propagate threadlocals to DeltaThreadPool" to "Propagate thread locals to Delta thread pools" Oct 13, 2023
xupefei pushed a commit to xupefei/delta that referenced this pull request Oct 31, 2023
* The default thread pool executor in Apache Spark does not forward thread locals to threads spawned in a thread pool.
* This can cause issues if the threads depend on the thread locals.
* To fix this, we introduce a wrapper class around the thread pool executor that forwards thread locals.

Closes delta-io#2154

GitOrigin-RevId: 9e9423e4b041232457ffaab18f5f96490bb45b88
2 participants