Hi,

My log bucket is fairly large, but we have moved anything older than three months to Glacier. When I run the job, it completes in a minute or two and I get the following:
19/09/25 13:26:02 WARN HadoopDataSource: Skipping Partition {} as no new files detected @ s3://<BUCKET>/ or path does not exist
where <BUCKET> is the name of my S3 access log storage bucket.
My logs are saved at the top level of the S3 bucket, i.e. all log files are at s3://<BUCKET>/
What could be happening here? I know there are logs in the bucket that are not partitioned, and the converted DB/tables are empty when I preview them. I have set the classification of the raw data table to CSV, but I am not sure whether that is correct.
Any pointers would be appreciated!
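In case it helps with debugging, one quick check is to list the prefix and see which storage classes the objects are in, since anything already transitioned to Glacier cannot be read by the job until it is restored. Below is a minimal sketch with boto3; the bucket name and prefix are placeholders, and default credentials are assumed:

```python
import boto3
from collections import Counter

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Tally objects under the log prefix by storage class; GLACIER / DEEP_ARCHIVE
# objects must be restored before a Glue/Spark job can read them.
storage_classes = Counter()
for page in paginator.paginate(Bucket="my-access-log-bucket", Prefix=""):
    for obj in page.get("Contents", []):
        storage_classes[obj.get("StorageClass", "STANDARD")] += 1

print(dict(storage_classes))  # e.g. {'STANDARD': ..., 'GLACIER': ...}
```

If most objects show up as GLACIER, the "no new files detected" warning would be consistent with the job simply having nothing readable to process.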
We get a similar issue: when a file is not in S3, an empty DataFrame is still created. Shouldn't this raise an exception?
22/06/30 08:52:18 WARN HadoopDataSource: Skipping Partition {} as no new files detected @ s3://sample-bucket/test/dict_most_common_names_old.csv or path does not exist
Empty DataFrame
Columns: []
Index: []
<class 'pandas.core.frame.DataFrame'>
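For reference, one way to fail fast instead of silently getting an empty frame is to check for the object before reading it. This is a minimal sketch assuming boto3 with default credentials; the helper name and the example path are just illustrative:

```python
import boto3
from botocore.exceptions import ClientError

def require_s3_object(bucket: str, key: str) -> None:
    """Raise instead of silently reading nothing when the object is missing."""
    try:
        boto3.client("s3").head_object(Bucket=bucket, Key=key)
    except ClientError as err:
        # head_object reports 404 when the key does not exist
        # (403 if we lack permission, which also means the read will fail).
        if err.response["Error"]["Code"] in ("404", "403"):
            raise FileNotFoundError(
                f"s3://{bucket}/{key} not found or not readable"
            ) from err
        raise

# Call before creating the frame, e.g.:
# require_s3_object("sample-bucket", "test/dict_most_common_names_old.csv")
```

This keeps the missing-file case as an explicit error in your own code, rather than relying on the downstream job to notice that the DataFrame is empty.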