Support for partition table #42
Hi @alepuccetti,
You might be able to set a custom date pattern and follow the guide for converting date-sharded tables into a single date-partitioned table, even if it's not ideal. To add support for this feature, it looks like we'll need a few more settings (from the docs on table creation).
Translated to a Logstash config, that would probably look something like this:

```ruby
config :cluster_fields, validate: :string, list: true, default: []
config :enable_time_partitioning, validate: :boolean, default: false
config :time_partition_field, validate: :string, default: ""
config :time_partition_require_filter, validate: :boolean, default: false
```

Do you think something like that could fit your needs?
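For reference, those settings would presumably map onto the table-creation options of the Java BigQuery client that the plugin wraps. A minimal sketch of that mapping, with placeholder dataset, table, and column names:

```java
import java.util.Arrays;
import com.google.cloud.bigquery.*;

public class CreatePartitionedTable {
  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // time_partition_field / time_partition_require_filter
    TimePartitioning partitioning = TimePartitioning.newBuilder(TimePartitioning.Type.DAY)
        .setField("timestamp")               // placeholder partition column
        .setRequirePartitionFilter(true)
        .build();

    // cluster_fields
    Clustering clustering = Clustering.newBuilder()
        .setFields(Arrays.asList("user_id")) // placeholder cluster column
        .build();

    Schema schema = Schema.of(
        Field.of("timestamp", StandardSQLTypeName.TIMESTAMP),
        Field.of("user_id", StandardSQLTypeName.STRING));

    StandardTableDefinition definition = StandardTableDefinition.newBuilder()
        .setSchema(schema)
        .setTimePartitioning(partitioning)
        .setClustering(clustering)
        .build();

    bigquery.create(TableInfo.of(TableId.of("my_dataset", "my_table"), definition));
  }
}
```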
Hi @josephlewis42,
Making monthly (or other time-sharded) tables is not feasible for my use case. Writing queries that touch multiple tables is very ugly, and if you want to build a dashboard with Data Studio, I am not even sure that is possible. Converting sharded tables into partitioned ones is not feasible either, because I want to query the data in near real time. I think that being able to leverage partitioned tables would be great for everyone. I am not familiar with the internals of streaming inserts, but I think it is just a table-definition issue, isn't it? I am trying a number of different things now (I started using this plugin today).
In my case, I would rather have one table with all the data and drop rows with a SELECT statement. Currently, I can do this by setting
Update: I successfully inserted into an existing partitioned table, so this could be a quick workaround. However, I noticed that if one record in a batch fails, the whole batch fails. Note that setting the batch size to 1 will make the inserts very slow.
@alepuccetti can you please show how you managed to insert data into an existing partitioned table? I am trying to do the same thing right now, but I'm getting constant NullPointerExceptions in the Logstash logfile (the config I am using worked without any issue before I made the table a partitioned one).
@MPTG94 It has been a long time, so I don't remember for sure whether I had to do anything special, but I don't remember having this issue. Maybe something changed since I did it.
I was just trying to do this myself, since I want the data to be loaded into a partitioned table and to set the partition expiry to, e.g., 60 days. In my case:
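Presumably a `bq mk` invocation along these lines; the dataset and table names are placeholders, and the 60-day expiry is expressed in seconds:

```sh
bq mk --table \
  --time_partitioning_type=DAY \
  --time_partitioning_expiration=5184000 \
  mydataset.mytable \
  ./schema.json
```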
(and then just set `table_prefix` to `mytable` and leave `table_separator` + `date_pattern` empty). I am seeing a lot of errors in the logs, but the data is actually being loaded successfully.
@simon-verzijl thanks for the reply, I ended up doing the same as you did (just creating the partitioned table from the GCP web interface rather than the command line). I am also seeing these weird errors, but data seems to flow to the tables just fine.
I don't really think this is in a state to submit upstream, because I don't know Ruby or Java very well and had to slowly work my way through getting this set up, but it works and allows table partitioning and clustering to be configured. I have it in a fork here: https://github.com/zachaller/logstash-output-google_bigquery. Below is a sketch of an example config that would create an HOUR-partitioned table with a one-month partition expiration (because BigQuery only supports 4,000 partitions per table) and enable clustering on the table as well.
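The option names below are assumptions based on the settings proposed earlier in this thread plus the partitioning type and expiration the fork adds; check the fork's README for the actual names:

```
output {
  google_bigquery {
    project_id    => "my-project"
    dataset       => "my_dataset"
    table_prefix  => "mytable"
    json_key_file => "/path/to/key.json"
    csv_schema    => "timestamp:TIMESTAMP,user_id:STRING,action:STRING"
    # Hypothetical partitioning/clustering options; names may differ in the fork.
    enable_time_partitioning       => true
    time_partitioning_type         => "HOUR"
    time_partition_field           => "timestamp"
    time_partition_expiration_days => 31
    cluster_fields                 => ["user_id", "action"]
  }
}
```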
Should be able to build an image via:
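A guess at the intended commands, assuming the fork ships a Dockerfile at its root; the image tag is arbitrary:

```sh
git clone https://github.com/zachaller/logstash-output-google_bigquery
cd logstash-output-google_bigquery
docker build -t logstash-bigquery .
```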
BigQuery can partition tables by time, which saves cost and improves performance when querying smaller time windows.
Enable creating partitioned tables, either on ingestion time or on a date/timestamp field in the schema.
Maybe there is already a way to do that, but it is not documented.