I'm trying to implement the variant normalization function and am calling it on a DataFrame, roughly as sketched below. I prepare the "contigName", "start", "end", "referenceAllele", and "alternateAlleles" fields before the call, and I've checked that none of them contain any NULL values.
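A minimal sketch of the call (`df`, the column casts, and the reference genome path are placeholders; I'm assuming Glow's documented `normalize_variants` transformer):

```python
import glow
from pyspark.sql import functions as F

# Register Glow on an existing SparkSession.
spark = glow.register(spark)

# Prepare the required fields with the types normalize_variants expects.
prepared = df.select(
    F.col("contigName"),
    F.col("start").cast("long").alias("start"),
    F.col("end").cast("long").alias("end"),
    F.col("referenceAllele"),
    F.col("alternateAlleles"),  # array<string>
)

# The reference genome path below is a placeholder for the real one.
normalized = glow.transform(
    "normalize_variants",
    prepared,
    reference_genome_path="/path/to/reference.fa",
)
normalized.count()  # the Spark action that triggers the error
```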
During the Spark action I get this error:
23/10/12 00:33:15 ERROR TaskContextImpl: Error in TaskCompletionListener
java.lang.NullPointerException: null
at io.projectglow.sql.expressions.NormalizeVariantExpr$.$anonfun$doVariantNormalization$1(NormalizeVariantExpr.scala:55) ~[io.projectglow_glow-spark3_2.12-1.2.1.jar:1.2.1]
at io.projectglow.sql.expressions.NormalizeVariantExpr$.$anonfun$doVariantNormalization$1$adapted(NormalizeVariantExpr.scala:54) ~[io.projectglow_glow-spark3_2.12-1.2.1.jar:1.2.1]
at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:132) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1$adapted(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:199) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.invokeTaskCompletionListeners(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:137) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:180) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.scheduler.Task.run(Task.scala:141) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_382]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_382]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
23/10/12 00:33:15 ERROR Executor: Exception in task 3.0 in stage 14.0 (TID 88)
org.apache.spark.util.TaskCompletionListenerException: null
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:254) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.invokeTaskCompletionListeners(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:137) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:180) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.scheduler.Task.run(Task.scala:141) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_382]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_382]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
Suppressed: java.lang.NullPointerException
at io.projectglow.sql.expressions.NormalizeVariantExpr$.$anonfun$doVariantNormalization$1(NormalizeVariantExpr.scala:55) ~[io.projectglow_glow-spark3_2.12-1.2.1.jar:1.2.1]
at io.projectglow.sql.expressions.NormalizeVariantExpr$.$anonfun$doVariantNormalization$1$adapted(NormalizeVariantExpr.scala:54) ~[io.projectglow_glow-spark3_2.12-1.2.1.jar:1.2.1]
at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:132) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1$adapted(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:199) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.invokeTaskCompletionListeners(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:137) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:180) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.scheduler.Task.run(Task.scala:141) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_382]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_382]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
I've tried running just this part of the DataFrame manually from a pyspark session, and there were no errors. But when I run the whole pipeline with all the joins, it fails on exactly this step in multiple containers (see the attached executor stats screenshot).
I'm running this on Spark 3.4.1 with 6G executors and a 3G driver, configured roughly as sketched below.
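For reference, a sketch of how that configuration maps onto the session (builder-based config is an assumption; the real job may pass these as spark-submit flags instead):

```python
from pyspark.sql import SparkSession

# Memory settings described above; app name is a hypothetical placeholder.
spark = (
    SparkSession.builder
    .appName("variant-normalization")
    .config("spark.executor.memory", "6g")
    .config("spark.driver.memory", "3g")
    .getOrCreate()
)
```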
From the stack trace, it looks like the NullPointerException is thrown from inside the task-completion listener that Glow registers in NormalizeVariantExpr.
Can you help me with this, please?
I don't know; the problem may still be there. I've found a workaround: I dump the whole DataFrame to a Parquet file on HDFS and continue the step from that Parquet file instead of dealing with the long query.
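A minimal sketch of that workaround (the HDFS path, DataFrame names, and reference path are placeholders):

```python
# Materialize the joined DataFrame to Parquet on HDFS, then normalize from
# the freshly read copy, so the step runs against a short plan instead of
# the long multi-join query.
joined_df.write.mode("overwrite").parquet("hdfs:///checkpoints/variants.parquet")
checkpointed = spark.read.parquet("hdfs:///checkpoints/variants.parquet")

normalized = glow.transform(
    "normalize_variants",
    checkpointed,
    reference_genome_path="/path/to/reference.fa",  # placeholder path
)
```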
By the way, the new version 2.0.0 doesn't even initialize: it fails on `import glow` with a numpy compatibility error. There is no backward compatibility at all.