adding a timeout to stream metadata health checks to increase failure visibility (#1251)

in the event of a hanging request, the health check failure is registered
by the load balancer but not by the stream metadata service. the load
balancer terminates the service, but the service itself never detects the
failure or logs an error. this PR adds a timeout to the health check so
that the failure is captured and logged, and we can respond to it better.
mechanical-turk authored Oct 14, 2024
1 parent 169f4f2 commit 99811fb
Showing 2 changed files with 16 additions and 2 deletions.
3 changes: 3 additions & 0 deletions packages/stream-metadata/src/environment.ts
@@ -74,6 +74,9 @@ function makeConfig() {
             tracingEnabled: envMain.TRACING_ENABLED,
             profilingEnabled: envMain.PROFILING_ENABLED,
         },
+        healthCheck: {
+            timeout: 5000, // 5 seconds
+        },
     }
 }
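
The hunk above hardcodes the timeout at 5000 ms. If the value ever needs tuning per deployment, one option is to read it from the environment with a fallback. The sketch below is only an illustration: `parseTimeoutMs` and the variable name `HEALTH_CHECK_TIMEOUT_MS` are hypothetical and do not appear in the commit.

```typescript
// Hypothetical helper, not part of the commit: parse a timeout in
// milliseconds and fall back to a default when the value is unusable.
function parseTimeoutMs(raw: string | undefined, fallbackMs: number): number {
    if (raw === undefined || raw.trim() === '') {
        return fallbackMs
    }
    const parsed = Number(raw)
    // Reject non-numeric, infinite, zero, and negative values.
    if (!Number.isFinite(parsed) || parsed <= 0) {
        return fallbackMs
    }
    return Math.floor(parsed)
}

// In the real config this argument would be something like
// process.env.HEALTH_CHECK_TIMEOUT_MS (an assumed variable name).
const healthCheckSketch = {
    timeout: parseTimeoutMs(undefined, 5000),
}
```

With no override supplied, `healthCheckSketch.timeout` keeps the 5000 ms default that the commit uses.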

15 changes: 13 additions & 2 deletions packages/stream-metadata/src/routes/health.ts
@@ -1,17 +1,28 @@
 import { FastifyReply, FastifyRequest } from 'fastify'

 import { getRiverRegistry } from '../evmRpcClient'
+import { config } from '../environment'

 export async function checkHealth(request: FastifyRequest, reply: FastifyReply) {
     const logger = request.log.child({ name: checkHealth.name })
     // Do a health check on the river registry
     try {
-        await getRiverRegistry().getAllNodes()
+        logger.info('Running riverRegistry health check')
+        await Promise.race([
+            getRiverRegistry().getAllNodes(),
+            new Promise((_, reject) =>
+                setTimeout(
+                    () => reject(new Error('Timed out waiting for the riverRegistry check')),
+                    config.healthCheck.timeout,
+                ),
+            ),
+        ])
+        logger.info('Health check passed')
         // healthy
         return reply.code(200).send({ status: 'ok' })
     } catch (error) {
         // unhealthy
-        logger.error(error, 'Failed to get river registry')
+        logger.error(error, 'Health check failed')
         return reply.code(500).send({ status: 'error' })
     }
 }
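
One detail of the `Promise.race` pattern in this diff: when `getAllNodes()` resolves first, the timer is not cleared, so its callback stays scheduled until `config.healthCheck.timeout` elapses. The late rejection is ignored because the race has already settled. A possible refinement is a small wrapper that clears the timer as soon as the work settles. This is a minimal sketch, and `withTimeout` is a hypothetical name, not something the commit or the repository defines.

```typescript
// Hypothetical helper, not part of the commit: reject with `message` if
// `work` has not settled within `ms`, and clear the timer either way.
function withTimeout<T>(work: Promise<T>, ms: number, message: string): Promise<T> {
    return new Promise<T>((resolve, reject) => {
        const timer = setTimeout(() => reject(new Error(message)), ms)
        work.then(
            (value) => {
                clearTimeout(timer) // the work finished first: drop the pending timer
                resolve(value)
            },
            (error) => {
                clearTimeout(timer) // the work failed first: surface its own error
                reject(error)
            },
        )
    })
}
```

Under that assumption, the call in `checkHealth` could read `await withTimeout(getRiverRegistry().getAllNodes(), config.healthCheck.timeout, 'Timed out waiting for the riverRegistry check')`, with the surrounding `try`/`catch` and logging unchanged.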
