Schema Registry Statistics Tool is a small utility that makes it easy to identify the usage of different schema versions within a Kafka topic.
Using this tool, you can consume from a topic while calculating the percentage of messages produced with each schema version.
Example output:
```
[sr-stats] 2022/12/28 10:02:12 Starting to consume from payments-topic
[sr-stats] 2022/12/28 10:02:12 Consumer up and running!...
[sr-stats] 2022/12/28 10:02:12 Use SIGINT to stop consuming.
[sr-stats] 2022/12/28 10:02:14 terminating: via signal
[sr-stats] 2022/12/28 10:02:14 Total messages consumed: 81
Schema ID 1 => 77%
Schema ID 3 => 23%
```
As you can see, in `payments-topic`, 77% of the messages are produced using schema ID 1, while the remaining messages are produced using schema ID 3.
You can get the schema by ID:
```
curl -s http://<SCHEMA_REGISTRY_ADDR>/schemas/ids/1 | jq .
```
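The Schema Registry returns the schema body under a `schema` key; the schema content below is illustrative only:

```json
{"schema": "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":[...]}"}
```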
For further offset analysis, you can store the results in a JSON file, where each key is a schema ID and each value is the list of offsets consumed with that schema:

```json
{"1":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61],"3":[62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80]}
```
The tool accepts the following flags:

| Name | Description | Required | Type | Default |
|---|---|---|---|---|
| `--bootstrap` | The Kafka bootstrap servers. | V | string | `"localhost:9092"` |
| `--topic` | The topic name to consume from. | V | string | `""` |
| `--version` | The Kafka client version to use. | | string | `"2.1.1"` |
| `--group` | The consumer group name. | | string | `schema-stats` |
| `--user` | The Kafka username for authentication. | | string | `""` |
| `--password` | The Kafka password for authentication. | | string | `""` |
| `--tls` | Use TLS communication. | | bool | `false` |
| `--cert` | The path to the CA certificate, when TLS communication is enabled. | when `--tls` | string | `""` |
| `--store` | Store the results in a file. | | bool | `false` |
| `--chart` | Generate a pie chart from the results. | | bool | `false` |
| `--path` | The path to store the results file, if the `--store` flag is set. | | string | `"/tmp/results.json"` |
| `--oldest` | Consume from the oldest offset. | | bool | `true` |
| `--limit` | Limit the consumer to X messages, if different from 0. | | int | `0` |
| `--verbose` | Raise the consumer log level. | | bool | `false` |
Example:

```
./schema-registry-statistics --bootstrap kafka1:9092 --group stat-consumer --topic payments-topic --store --path ~/results.json
```

This consumes from `payments-topic` on `kafka1` and stores the results. The consumer will run until SIGINT (`Ctrl+C`) is received.
By using the `--chart` flag, you can generate an HTML page with a pie chart visualization.
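For reference, here is a rough sketch of how such an HTML pie chart could be generated with the go-echarts library; this is an illustration under that assumption, not the tool's actual implementation, and the counts, paths, and titles are made up:

```go
package main

import (
	"os"

	"github.com/go-echarts/go-echarts/v2/charts"
	"github.com/go-echarts/go-echarts/v2/opts"
)

func main() {
	// Hypothetical per-schema message counts, e.g. loaded from results.json.
	counts := map[string]int{"1": 62, "3": 19}

	pie := charts.NewPie()
	pie.SetGlobalOptions(charts.WithTitleOpts(opts.Title{Title: "Schema version usage"}))

	// Each slice is one schema ID, sized by its message count.
	var data []opts.PieData
	for id, n := range counts {
		data = append(data, opts.PieData{Name: "Schema ID " + id, Value: n})
	}
	pie.AddSeries("schemas", data)

	// Render the chart as a standalone HTML page.
	f, err := os.Create("/tmp/chart.html")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := pie.Render(f); err != nil {
		panic(err)
	}
}
```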
According to the Schema Registry wire format, each message has only a couple of components:
| Bytes | Area | Description |
|---|---|---|
| 0 | Magic Byte | Confluent serialization format version number; currently always 0. |
| 1-4 | Schema ID | 4-byte schema ID as returned by the Schema Registry. |
| 5.. | Data | Serialized data in the specified schema format (Avro, Protobuf). |
The tool leverages this format: it reads the binary payload of each message in order to extract the schema ID and store it.
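A minimal sketch of that extraction (not the tool's actual code; the function name is illustrative), using only the Go standard library:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// extractSchemaID parses the wire format described above:
// one magic byte (0), followed by a big-endian 4-byte schema ID.
func extractSchemaID(payload []byte) (uint32, error) {
	if len(payload) < 5 {
		return 0, errors.New("payload shorter than wire-format header")
	}
	if payload[0] != 0 {
		return 0, fmt.Errorf("unknown magic byte: %d", payload[0])
	}
	return binary.BigEndian.Uint32(payload[1:5]), nil
}

func main() {
	// A fabricated payload: magic byte 0, schema ID 3, then serialized data.
	msg := []byte{0, 0, 0, 0, 3, 0xde, 0xad, 0xbe, 0xef}
	id, err := extractSchemaID(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("Schema ID:", id) // prints: Schema ID: 3
}
```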
You can use the `docker-compose.yml` file to create a local environment from scratch.
In the `/scripts` directory, there are two versions of the same schema and a simple Python Avro producer.
This project is licensed under the Apache License - see the LICENSE file for details.