forked from raystack/firehose
Commit
* feat:
  - bump depot version
  - add maxcompute sink
  - adjust gradle dependencies
* fix: Instrumentation
* chore: add configuration for image building and local testing
* chore: cleanup unused change
* chore: add maxcompute sink documentation
* chore: change version to 0.11.0 and depot version to 0.10.0
* chore: fix maxcompute-sink.md
* fix: wrong class name
* chore: Update maxcompute-sink.md
1 parent a376678, commit 22d0afb
Showing 5 changed files with 85 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
# MaxCompute sink | ||
|
||
### Datatype Protobuf | ||
|
||
The MaxCompute sink has several responsibilities, including: | ||
|
||
1. Creation of MaxCompute table if it does not exist. | ||
2. Updating the MaxCompute table schema based on the latest protobuf schema. | ||
3. Translating protobuf messages into MaxCompute compatible records and inserting them into MaxCompute tables. | ||
|
||
## MaxCompute Table Schema Update | ||
|
||
### Protobuf | ||
|
||
MaxCompute Sink updates the MaxCompute table schema in a separate table update operation. MaxCompute Sink | ||
uses [Stencil](https://github.com/goto/stencil) to parse protobuf messages, generate the schema, and update MaxCompute | ||
tables with the latest schema. | ||
The Stencil client periodically reloads the descriptor cache. The table schema update happens after the descriptor cache | ||
is refreshed. | ||
|
||
#### Supported Protobuf - MaxCompute Table Type Mapping | ||
|
||
| Protobuf Type | MaxCompute Type | | ||
|------------------------------------------|-------------------------------| | ||
| bytes | BINARY | | ||
| string | STRING | | ||
| enum | STRING | | ||
| float | FLOAT | | ||
| double | DOUBLE | | ||
| bool | BOOLEAN | | ||
| int64, uint64, fixed64, sfixed64, sint64 | BIGINT | | ||
| int32, uint32, fixed32, sfixed32, sint32 | INT | | ||
| message | STRUCT | | ||
| .google.protobuf.Timestamp | TIMESTAMP_NTZ | | ||
| .google.protobuf.Struct                  | STRING (JSON serialised)      | | ||
| .google.protobuf.Duration | STRUCT | | ||
| map<k,v> | ARRAY<STRUCT<key:k, value:v>> | | ||
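
The mapping table above can be read as a simple lookup. The following is a minimal sketch of that lookup, not the sink's actual converter: the class and method names (`ProtoToMaxComputeTypeSketch`, `scalarType`, `mapType`) are hypothetical, and it assumes that `k` and `v` in the `map<k,v>` row stand for the mapped MaxCompute types of the key and value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the type mapping table; not the sink's real implementation.
class ProtoToMaxComputeTypeSketch {
    // Scalar mappings copied from the table; keys are protobuf type names.
    private static final Map<String, String> SCALAR_TYPES = new LinkedHashMap<>();

    static {
        SCALAR_TYPES.put("bytes", "BINARY");
        SCALAR_TYPES.put("string", "STRING");
        SCALAR_TYPES.put("enum", "STRING");
        SCALAR_TYPES.put("float", "FLOAT");
        SCALAR_TYPES.put("double", "DOUBLE");
        SCALAR_TYPES.put("bool", "BOOLEAN");
        for (String t : new String[] {"int64", "uint64", "fixed64", "sfixed64", "sint64"}) {
            SCALAR_TYPES.put(t, "BIGINT");
        }
        for (String t : new String[] {"int32", "uint32", "fixed32", "sfixed32", "sint32"}) {
            SCALAR_TYPES.put(t, "INT");
        }
    }

    // Returns the MaxCompute type name for a protobuf scalar type name.
    static String scalarType(String protoType) {
        String mapped = SCALAR_TYPES.get(protoType);
        if (mapped == null) {
            throw new IllegalArgumentException("No scalar mapping for: " + protoType);
        }
        return mapped;
    }

    // Renders the MaxCompute type for a protobuf map<k,v> field with scalar key and value.
    // Assumption: key and value are replaced by their mapped MaxCompute types.
    static String mapType(String keyProtoType, String valueProtoType) {
        return "ARRAY<STRUCT<key:" + scalarType(keyProtoType)
                + ", value:" + scalarType(valueProtoType) + ">>";
    }

    public static void main(String[] args) {
        System.out.println(scalarType("sint64"));
        System.out.println(mapType("string", "int32"));
    }
}
```

Nested `message` fields and the well-known types (`Timestamp`, `Struct`, `Duration`) are left out of the sketch because they need the message descriptor, not just a type name.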
|
||
## Partitioning | ||
|
||
MaxCompute Sink supports creating tables with a partition configuration. Currently, it supports partitioning on primitive fields (STRING, TINYINT, SMALLINT, BIGINT) | ||
and on timestamp fields. The timestamp-based partitioning strategy introduces a pseudo-partition column whose value is the timestamp field truncated to the start of its day. | ||
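
To illustrate the truncation used by timestamp-based partitioning, here is a minimal sketch using `java.time`. It is not the sink's own code: the class and method names are hypothetical, and it assumes the day boundary is taken in UTC, which the text above does not specify.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration of truncating a timestamp to the start of its day.
class TimestampPartitionSketch {
    // Takes the seconds and nanos of a google.protobuf.Timestamp and returns
    // the start of that day. Assumption: days are delimited in UTC.
    static LocalDateTime partitionValue(long epochSeconds, int nanos) {
        return Instant.ofEpochSecond(epochSeconds, nanos)
                .truncatedTo(ChronoUnit.DAYS)   // drop the time-of-day part
                .atOffset(ZoneOffset.UTC)
                .toLocalDateTime();
    }

    public static void main(String[] args) {
        // 1700000000 is 2023-11-14T22:13:20Z
        System.out.println(partitionValue(1_700_000_000L, 0));
    }
}
```

Every record whose timestamp falls on the same UTC day gets the same pseudo-partition value, so those records land in the same partition.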
|
||
## Clustering | ||
|
||
MaxCompute Sink currently does not support clustering. | ||
|
||
## Metadata | ||
|
||
For data quality checks, it is sometimes necessary to add metadata to each record. | ||
If `SINK_MAXCOMPUTE_ADD_METADATA_ENABLED` is true, the metadata is added. | ||
`SINK_MAXCOMPUTE_METADATA_NAMESPACE` sets the namespace under which the metadata columns are added; | ||
if the namespace is empty, the metadata columns are added at the root level. | ||
`SINK_MAXCOMPUTE_METADATA_COLUMNS_TYPES` lists the Kafka metadata columns and their types. | ||
An example of metadata columns that can be added for Kafka records. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,5 +19,6 @@ public enum SinkType { | |
BLOB, | ||
BIGQUERY, | ||
BIGTABLE, | ||
-MONGODB | ||
+MONGODB, | ||
+MAXCOMPUTE | ||
} |