Service to retrieve published data to be used to update a search index. It calls the `/publisheddata` endpoint on Zebedee and the metadata endpoint on the dataset API.

This service listens to the `content-updated` Kafka topic for events of type `contentUpdatedEvent` (see the schemas package). You can also read our AsyncAPI specification.

This service takes the `uri` from the consumed event and calls either:

- the `/publisheddata` endpoint on Zebedee, passing the URI as a query parameter, e.g. `http://localhost:8082/publisheddata?uri=businessindustryandtrade`
- the `/datasets/{id}/editions/{edition}/versions/{version}/metadata` endpoint on the dataset API, e.g. `http://localhost:22000/datasets/CPIH01/editions/timeseries/versions/1/metadata`

See the search service architecture docs for more detail.
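As a rough sketch of the first case, the request URL can be derived from the consumed event's URI like this (the helper name `buildPublishedDataURL` is hypothetical and not part of this service's API; the host would come from `ZEBEDEE_URL`):

```go
package main

import (
	"fmt"
	"net/url"
)

// buildPublishedDataURL assembles the Zebedee request URL for a given
// content URI, query-escaping the URI so unusual paths remain valid.
// This is an illustrative helper, not the service's actual code.
func buildPublishedDataURL(zebedeeURL, uri string) string {
	return fmt.Sprintf("%s/publisheddata?uri=%s", zebedeeURL, url.QueryEscape(uri))
}

func main() {
	fmt.Println(buildPublishedDataURL("http://localhost:8082", "businessindustryandtrade"))
}
```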
- Run `make debug`
- Run `make help` to see the full list of make targets

The service runs in the background consuming messages from Kafka. An example event can be created using the helper script, `make produce`.
- golang 1.20.x
- Running instance of Zebedee
- Requires running…
- No further dependencies other than those defined in `go.mod`

To run `make validate-specification` you require Node v20.x and the `@asyncapi/cli` package installed:

`npm install -g @asyncapi/cli`
| Environment variable | Default | Description |
| --- | --- | --- |
| BIND_ADDR | localhost:25800 | The host and port to bind to |
| DATASET_API_URL | http://localhost:22000 | The URL for the dataset API |
| GRACEFUL_SHUTDOWN_TIMEOUT | 5s | The graceful shutdown timeout in seconds (time.Duration format) |
| HEALTHCHECK_INTERVAL | 30s | Time between self-healthchecks (time.Duration format) |
| HEALTHCHECK_CRITICAL_TIMEOUT | 90s | Time to wait until an unhealthy dependent propagates its state to make this app unhealthy (time.Duration format) |
| KAFKA_ADDR | "localhost:9092" | The address of Kafka (accepts list) |
| KAFKA_OFFSET_OLDEST | true | Start processing Kafka messages in order from the oldest in the queue |
| KAFKA_VERSION | 1.0.2 | The version of Kafka |
| KAFKA_NUM_WORKERS | 1 | The maximum number of parallel Kafka consumers |
| KAFKA_SEC_PROTO | unset (only TLS) | If set to TLS, Kafka connections will use TLS |
| KAFKA_SEC_CLIENT_KEY | unset | PEM [2] for the client key (optional, used for client auth) [1] |
| KAFKA_SEC_CLIENT_CERT | unset | PEM [2] for the client certificate (optional, used for client auth) [1] |
| KAFKA_SEC_CA_CERTS | unset | PEM [2] of the CA cert chain if using a private CA for the server cert [1] |
| KAFKA_SEC_SKIP_VERIFY | false | Ignore server certificate issues if set to true [1] |
| KAFKA_CONTENT_UPDATED_GROUP | dp-search-data-extractor | The consumer group this application uses to consume content-updated messages |
| KAFKA_CONTENT_UPDATED_TOPIC | content-updated | The name of the topic to consume messages from |
| KAFKA_PRODUCER_TOPIC | search-data-import | The name of the topic to produce messages to |
| KEYWORDS_LIMITS | -1 | The maximum number of keywords allowed (-1 means no limit) |
| SERVICE_AUTH_TOKEN | unset | The service auth token for dp-search-data-extractor |
| STOP_CONSUMING_ON_UNHEALTHY | true | The application stops consuming Kafka messages if it is in an unhealthy state |
| TOPIC_TAGGING_ENABLED | false | Enable topic tagging using the topic cache |
| TOPIC_CACHE_UPDATE_INTERVAL | 30m | The time interval for updating the topics cache (time.Duration format) |
| TOPIC_API_URL | http://localhost:25300 | The URL for the topic API |
| ZEBEDEE_URL | http://localhost:8082 | The URL for Zebedee |
Notes:
The `/health` endpoint returns the current status of the service. Dependent services are health checked on an interval defined by the `HEALTHCHECK_INTERVAL` environment variable.

On a development machine, a request to the health check endpoint can be made with:

`curl localhost:25800/health`
See CONTRIBUTING for details.
Copyright © 2024, Office for National Statistics (https://www.ons.gov.uk)
Released under the MIT license, see LICENSE for details.