Add Venice support #78
this.sinkOptions = addKeysAsOption(options, rowType);
}

private Map<String, String> addKeysAsOption(Map<String, String> options, RelDataType rowType) {
I don't love this approach, open to suggestions.
I looked into hints to solve this and did get them working to an extent (will open a separate PR), but that would require users to pass key information in their SQL statements. I have not figured out a way to inject hints at runtime from VeniceDriver.
I guess I'm surprised we need to fully specify the keys in the options. The Kafka connector has similar properties (`key.prefix`, `key.fields`), but you don't need both. Is the Venice connector doing something different here? I'd expect `key.prefix=key_` to be sufficient.
Also, how would the Venice connector behave if we grouped the keys in a `Row(...)` object? Can we just have `key.fields=KEY` and then `KEY ROW(F1 VARCHAR, F2 INT)` etc.?
(Not suggesting we do that, just asking if it's possible.)
> I guess I'm surprised we need to fully specify the keys in the options. The Kafka connector has similar properties (`key.prefix`, `key.fields`), but you don't need both. Is the Venice connector doing something different here? I'd expect `key.prefix=key_` to be sufficient.

Yeah, I confirmed it is an issue with Venice, due to some additional Avro schema validation they do. They pull the keySchema and validate it against `key.fields` (separately from the prefix). The `key.prefix` allows these names to differ, like "id" vs "key_id".
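To make the point above concrete, here is a minimal sketch of how key columns might be copied into sink options. It is not the PR's implementation: it takes a plain list of field names instead of a Calcite `RelDataType`, the `KEY_` prefix constant is illustrative, and the semicolon separator for `key.fields` is an assumption borrowed from the Flink Kafka connector's convention.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the PR's addKeysAsOption; names and separator are assumptions.
final class KeyOptionsSketch {
  // Assumed prefix that marks a column as part of the key.
  static final String KEY_PREFIX = "KEY_";

  // Returns a copy of the options with key.fields and key.prefix added
  // when at least one field name carries the key prefix.
  static Map<String, String> addKeysAsOption(Map<String, String> options, List<String> fieldNames) {
    Map<String, String> result = new LinkedHashMap<>(options);
    List<String> keyFields = new ArrayList<>();
    for (String name : fieldNames) {
      if (name.startsWith(KEY_PREFIX)) {
        keyFields.add(name);
      }
    }
    if (!keyFields.isEmpty()) {
      // Assumption: semicolon-separated list, as in the Flink Kafka connector.
      result.put("key.fields", String.join(";", keyFields));
      result.put("key.prefix", KEY_PREFIX);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> options = new LinkedHashMap<>();
    options.put("connector", "venice");
    System.out.println(addKeysAsOption(options, List.of("KEY_id", "KEY_region", "stringField")));
  }
}
```

Listing every key column explicitly is what lets a validator compare the configured fields against the store's key schema, while the prefix only controls how column names map to schema field names.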
I looked into the ROW syntax and it doesn't seem to be possible in Flink; there is no way to get Flink to destructure that ROW.
Makes sense. I think that's fine. The only slight concern I have is potential deviation from the Kafka -> Kafka use-case. With Kafka -> Kafka, users can do `select *` and retain partitioning, since the output topic will use the input topic's `KEY` field. Users might expect `select *` to work similarly with Kafka -> Venice, except Kafka has one key (`KEY`) and Venice has one or more keys (`KEY_xyz`), so that might not work. I'm thinking we should actually change the way Kafka works and adopt this `KEY_xyz` approach, or maybe have `PipelineRel` explicitly use `KEY` in cases where there is only one key.
What does retaining partitioning mean for the Kafka -> Venice use case? It seems like it is more of a problem on the producer side. Users that expect the same partitioning behavior would have to key their Kafka topic using the same combination of keys as Venice. We did this in Brooklin by constructing the producer key as a simple string, with the key values from the source keys separated by `_`. Of course this isn't the same as identity partitioning, but it does ensure that downstream consumer tasks read the same combination of keys.
Even for the Kafka -> Kafka use case, we aren't the ones consuming; the partitioning behavior comes from Flink, right? To be fair, I haven't looked into it, so I'm not sure how the behavior changes if you define `key.fields` there.
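The composite-key idea described above can be sketched roughly as follows. This is an illustration only, not Brooklin's code: the class and method names are made up, and the `hashCode`-based partition choice is a simplified stand-in for whatever partitioner the producer actually uses.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of building one producer key from several key values.
final class CompositeKeySketch {
  // Joins the key values, in order, into a single string separated by "_".
  static String compositeKey(List<?> keyValues) {
    return keyValues.stream().map(String::valueOf).collect(Collectors.joining("_"));
  }

  // Simplified stand-in for a partitioner: equal keys always map to the same partition.
  static int partitionFor(String key, int numPartitions) {
    return Math.floorMod(key.hashCode(), numPartitions);
  }

  public static void main(String[] args) {
    String key = compositeKey(List.of("us", 42));
    System.out.println(key + " -> partition " + partitionFor(key, 8));
  }
}
```

Records that share the same combination of key values end up on the same partition, which is the property the comment relies on. One caveat of a plain `_` separator is that values which themselves contain `_` can produce the same string for different key combinations.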
6128407 to 4c2ffb9
Makefile
@@ -9,8 +9,8 @@ build:

bounce: build undeploy deploy deploy-samples deploy-config deploy-demo

# Integration tests expect K8s and Kafka to be running
integration-tests: deploy-dev-environment deploy-samples
# Integration tests expect K8s, Kafka, and Venice to be running
🔥 🔥 🔥
@@ -0,0 +1,11 @@
{
"type": "record",
We need to get `create table venice.foo` working :)
Agreed, I will spend some time looking into this when I can. It should be a simple API call, like the one I use to fetch schemas; it just differs from the current paradigm since it isn't managed via K8s.
// Without forced projection this will get optimized to:
// INSERT INTO `my-store` (`KEYFIELD`, `VARCHARFIELD`) SELECT * FROM `KAFKA`.`existing-topic-1`;
// With forced projection this will resolve as:
// INSERT INTO `my-store` (`KEY_id`, `stringField`) SELECT `KEYFIELD` AS `KEY_id`, \
neat!
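As a rough illustration of what a forced projection produces, the sketch below renders an `INSERT` whose `SELECT` aliases each source column to its sink column instead of using `SELECT *`. It is a hypothetical string builder, not the planner's or `ScriptImplementor`'s logic, and the `VARCHARFIELD` to `stringField` mapping is an assumption, since the quoted test comment is cut off after the first alias.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper showing the shape of an INSERT with an explicit aliasing projection.
final class ForcedProjectionSketch {
  // Wraps an identifier in backticks, matching the quoting used in the test comment.
  static String quote(String identifier) {
    return "`" + identifier + "`";
  }

  // sourceToSink maps each source column to the sink column it should be written to.
  // sourceTable is passed in already quoted because it may be schema-qualified.
  static String insertWithProjection(String sinkTable, String sourceTable, Map<String, String> sourceToSink) {
    StringJoiner targets = new StringJoiner(", ");
    StringJoiner projections = new StringJoiner(", ");
    for (Map.Entry<String, String> e : sourceToSink.entrySet()) {
      targets.add(quote(e.getValue()));
      projections.add(quote(e.getKey()) + " AS " + quote(e.getValue()));
    }
    return "INSERT INTO " + quote(sinkTable) + " (" + targets + ") SELECT " + projections
        + " FROM " + sourceTable + ";";
  }

  public static void main(String[] args) {
    Map<String, String> mapping = new LinkedHashMap<>();
    mapping.put("KEYFIELD", "KEY_id");
    mapping.put("VARCHARFIELD", "stringField"); // assumed mapping, not confirmed by the source
    System.out.println(insertWithProjection("my-store", "`KAFKA`.`existing-topic-1`", mapping));
  }
}
```

Keeping the projection explicit is what preserves the renaming from source column names to the sink's key and value field names.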
The `if (schema.startsWith("VENICE")...)` logic needs to be fixed, but I think we can accept the TODO and fix it later.
8c6b927 to 152c9cf
152c9cf to 1973fe0
Adds Venice support to Hoptimator.
`KEY$` to prevent collisions
`KEY` fields through to Sink options (intended to be used by Flink under the `key.fields` connector property)
insert into "VENICE-CLUSTER0"."test-store-1" ("KEY$id", "stringField") SELECT ...
Other changes in this PR:
Implemented the Venice driver/schema classes with separate overridable functions to be able to handle company-internal connection components via a simple override
See included tests for more samples