
Running quickstart_spark. Error [DATA_SOURCE_NOT_FOUND] #584

Open
pepinho14 opened this issue Oct 4, 2024 · 1 comment

Comments


pepinho14 commented Oct 4, 2024

I am having trouble running the Python example `delta-sharing/examples/python/quickstart_spark.py`. I get this error:

```
Py4JJavaError: An error occurred while calling o61.load.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: deltaSharing. Please find packages at `https://spark.apache.org/third-party-projects.html`.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.dataSourceNotFoundError(QueryExecutionErrors.scala:724)
```

I followed the instructions and launched with `pyspark --packages io.delta:delta-sharing-spark_2.12:3.1.0`, but I still get that error. I get the same error when I also add the Hadoop Azure package to the session config, as another user suggested in a different issue:

```python
spark = (SparkSession
         .builder
         .config('spark.jars.packages', 'org.apache.hadoop:hadoop-azure:3.3.1,io.delta:delta-core_2.12:2.2.0,io.delta:delta-sharing-spark_2.12:0.6.2')
         .config('spark.sql.extensions', 'io.delta.sql.DeltaSparkSessionExtension')
         .config('spark.sql.catalog.spark_catalog', 'org.apache.spark.sql.delta.catalog.DeltaCatalog')
         .getOrCreate()
         )
```

So, it would be great if anyone could help me here. Thanks!
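`spark.jars.packages` takes a comma-separated list of Maven coordinates in `groupId:artifactId:version` form. The `_2.12` suffix on an artifact ID is the Scala binary version the jar was built for, and it has to match the Scala version of the Spark build. The two attempts above also use different versions of the same connector: `delta-sharing-spark_2.12:3.1.0` on the command line, and `0.6.2` alongside `delta-core_2.12:2.2.0` in the session config. Below is a minimal sketch that splits such a string and reports each artifact's version and Scala suffix, which makes a mismatch easy to spot. `parse_packages` and `scala_versions` are hypothetical helper names, not part of PySpark.

```python
import re
from typing import List, NamedTuple, Optional, Set


class Coordinate(NamedTuple):
    group: str
    artifact: str
    version: str
    scala: Optional[str]  # Scala binary suffix such as "2.12", or None if absent


# Matches a trailing Scala binary-version suffix like "_2.12" or "_2.13".
_SCALA_SUFFIX = re.compile(r"_(\d+\.\d+)$")


def parse_packages(packages: str) -> List[Coordinate]:
    """Split a spark.jars.packages value into Maven coordinates (hypothetical helper)."""
    coords = []
    for entry in packages.split(","):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(":")
        if len(parts) != 3:
            raise ValueError(f"expected groupId:artifactId:version, got {entry!r}")
        group, artifact, version = parts
        match = _SCALA_SUFFIX.search(artifact)
        coords.append(Coordinate(group, artifact, version, match.group(1) if match else None))
    return coords


def scala_versions(coords: List[Coordinate]) -> Set[str]:
    """Distinct Scala suffixes; more than one usually signals an incompatible mix."""
    return {c.scala for c in coords if c.scala is not None}


packages = ("org.apache.hadoop:hadoop-azure:3.3.1,"
            "io.delta:delta-core_2.12:2.2.0,"
            "io.delta:delta-sharing-spark_2.12:0.6.2")
for c in parse_packages(packages):
    print(c.artifact, c.version, c.scala or "(no Scala suffix)")
print("Scala suffixes:", scala_versions(parse_packages(packages)))
```

This sketch only checks the strings. To compare the suffixes with your installation, check the Scala version in the banner that `spark-submit --version` prints.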


MHenn1g commented Nov 5, 2024

Hi @pepinho14,

My first guess would be that `delta-sharing-spark_2.12:0.6.2` is outdated.
You could try this instead:

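```python
from pyspark.sql import SparkSession

# spark.jars.packages is only applied when the JVM starts, so run this before any
# other SparkSession exists in the process. The _2.12 suffix must match the
# Scala version of your Spark build.
spark = (SparkSession
         .builder
         .config('spark.jars.packages', 'io.delta:delta-sharing-spark_2.12:3.1.0')
         .config('spark.sql.extensions', 'io.delta.sql.DeltaSparkSessionExtension')
         .config('spark.sql.catalog.spark_catalog', 'org.apache.spark.sql.delta.catalog.DeltaCatalog')
         .getOrCreate()
         )

profile_path = "/path/to/config"
share_file_path = f"{profile_path}/config.share"

# Table URL format: <profile-file-path>#<share-name>.<schema-name>.<table-name>
table_url = f"{share_file_path}#<share-name>.<schema-name>.<table-name>"
df = spark.read.format("deltaSharing").load(table_url)
df.show()  # show() prints the rows itself and returns None
```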
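The `config.share` file above is a Delta Sharing profile file. In the version-1 bearer-token format it is a small JSON document whose documented fields include `shareCredentialsVersion`, `endpoint` and `bearerToken`, plus an optional `expirationTime`. The table URL is the path to that file followed by `#<share-name>.<schema-name>.<table-name>`.

Below is a minimal sketch that checks the profile for those three fields and builds the URL before anything reaches Spark. A wrong path or a malformed profile then fails early with a readable message. This check is separate from the classpath problem behind `DATA_SOURCE_NOT_FOUND`. Some assumptions apply:

* `load_profile` and `table_url` are hypothetical names.
* The endpoint and token in the usage part are placeholders.
* Profiles in other credential formats would need different checks.

```python
import json
import os
import tempfile
from typing import Any, Dict

# Fields of a version-1 (bearer token) Delta Sharing profile that this sketch checks.
REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def load_profile(profile_file: str) -> Dict[str, Any]:
    """Read a profile file and fail early if a required field is missing (hypothetical helper)."""
    with open(profile_file, encoding="utf-8") as f:
        profile = json.load(f)
    missing = [k for k in REQUIRED_FIELDS if k not in profile]
    if missing:
        raise ValueError(f"{profile_file} is missing {', '.join(missing)}")
    return profile


def table_url(profile_file: str, share: str, schema: str, table: str) -> str:
    """Build '<profile-file-path>#<share>.<schema>.<table>' (hypothetical helper)."""
    for part in (share, schema, table):
        # An empty part, or a '.' or '#' inside a name, would make the URL ambiguous to split.
        if not part or "." in part or "#" in part:
            raise ValueError(f"invalid name component: {part!r}")
    return f"{profile_file}#{share}.{schema}.{table}"


# Usage with a throwaway profile; endpoint and token are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.share")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"shareCredentialsVersion": 1,
                   "endpoint": "https://sharing.example.com/delta-sharing/",
                   "bearerToken": "<token>"}, f)
    load_profile(path)
    print(table_url(path, "my_share", "my_schema", "my_table"))
```

The resulting string is what would be passed to `spark.read.format("deltaSharing").load(...)`.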
