[NODE] Correct reconnect on read-only PSQL replica #12834

shibaeff · 2024-04-16T10:19:03Z

Description
We run an active-passive PostgreSQL setup and fail over between database instances when needed. Today we tested a failover scenario in a controlled manner, and the chainlink-ocr node needed a manual restart.

We would like the Chainlink node to automatically handle its database instance flipping from read-write (while it is still the active master) to a read-only replica (once it is no longer the active master). It should tear down all active SQL sessions, perhaps sleep for 5 seconds or so, and then re-establish them from scratch.

Steps to Reproduce
Put two PostgreSQL instances behind an HAProxy instance. Point the chainlink-ocr node at HAProxy, then fail over PostgreSQL from one instance to the other while the Chainlink node is running.

Basic Information
We're running Chainlink in a k8s environment, using a Docker image based on the publicly available image provided by the Chainlink team.
Logs on the chainlink-ocr side:

d PoR USD version 4 contract 0x6CeA38508B186DE36AAfd0f3B513E708691bc0C4 network mainnet jobID=3703 jobName=CacheGold PoR USD version 4 contract 0x6CeA38508B186DE36AAfd0f3B513E708691bc0C4 network mainnet logger=OCR version=2.10.0@0fe6514
2024-04-16T03:31:35.194Z [ERROR] Error creating SpecError ReportGeneration: DataSource errored job/orm.go:658                   err=ERROR: cannot execute INSERT in a read-only transaction (SQLSTATE 25006) logger=JobORM stacktrace=github.com/smartcontractkit/chainlink/v2/core/services/job.(*orm).TryRecordError
        /chainlink/core/services/job/orm.go:658
github.com/smartcontractkit/chainlink/v2/core/services/ocr.(*Delegate).ServicesForSpec.func1
        /chainlink/core/services/ocr/delegate.go:162
github.com/smartcontractkit/chainlink-common/pkg/logger.(*ocrWrapper).Error
        /go/pkg/mod/github.com/smartcontractkit/[email protected]/pkg/logger/ocr.go:47
github.com/smartcontractkit/libocr/internal/loghelper.loggerWithContextImpl.ErrorIfNotCanceled
        /go/pkg/mod/github.com/smartcontractkit/[email protected]/internal/loghelper/logger_with_context.go:54
github.com/smartcontractkit/libocr/offchainreporting/internal/protocol.(*reportGenerationState).observeValue
        /go/pkg/mod/github.com/smartcontractkit/[email protected]/offchainreporting/internal/protocol/report_generation_follower.go:383
github.com/smartcontractkit/libocr/offchainreporting/internal/protocol.(*reportGenerationState).messageObserveReq
        /go/pkg/mod/github.com/smartcontractkit/[email protected]/offchainreporting/internal/protocol/report_generation_follower.go:108
github.com/smartcontractkit/libocr/offchainreporting/internal/protocol.MessageObserveReq.processReportGeneration
        /go/pkg/mod/github.com/smartcontractkit/[email protected]/offchainreporting/internal/protocol/message.go:123
github.com/smartcontractkit/libocr/offchainreporting/internal/protocol.(*reportGenerationState).run
        /go/pkg/mod/github.com/smartcontractkit/[email protected]/offchainreporting/internal/protocol/report_generation.go:147
github.com/smartcontractkit/libocr/offchainreporting/internal/protocol.RunReportGeneration
        /go/pkg/mod/github.com/smartcontractkit/[email protected]/offchainreporting/internal/protocol/report_generation.go:55
github.com/smartcontractkit/libocr/offchainreporting/internal/protocol.(*pacemakerState).spawnReportGeneration.func1
        /go/pkg/mod/github.com/smartcontractkit/[email protected]/offchainreporting/internal/protocol/pacemaker.go:508
github.com/smartcontractkit/libocr/subprocesses.(*Subprocesses).Go.func1
        /go/pkg/mod/github.com/smartcontractkit/[email protected]/subprocesses/subprocesses.go:29 version=2.10.0@0fe6514
2024-04-16T03:31:35.195Z [ERROR] ReportGeneration: DataSource errored               protocol/report_generation_follower.go:383 configDigest=94027e1122b30b47797c20a59633fbce contractAddress=0x6CeA38508B186DE36AAfd0f3B513E708691bc0C4 epoch=73913 error=Number of faulty inputs 1 to median task > number allowed faults 0: too many errors errorVerbose=too many errors
Number of faulty inputs 1 to median task > number allowed faults 0
github.com/smartcontractkit/chainlink/v2/core/services/pipeline.(*MedianTask).Run
        /chainlink/core/services/pipeline/task.median.go:53
github.com/smartcontractkit/chainlink/v2/core/services/pipeline.(*runner).executeTaskRun
        /chainlink/core/services/pipeline/runner.go:472
github.com/smartcontractkit/chainlink/v2/core/services/pipeline.(*runner).run.func1
        /chainlink/core/services/pipeline/runner.go:342
github.com/smartcontractkit/chainlink/v2/core/recovery.WrapRecoverHandle
        /chainlink/core/recovery/recover.go:40
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1650 externalJobID=d2ed2fcf-1302-487a-ae21-184765c30c1b jobID=3703 jobName=CacheGold PoR USD version 4 contract 0x6CeA38508B186DE36AAfd0f3B513E708691bc0C4 network mainnet leader=0 logger=OCR oid=3 round=4 stacktrace=github.com/smartcontractkit/libocr/offchainreporting/internal/protocol.(*reportGenerationState).observeValue
        /go/pkg/mod/github.com/smartcontractkit/[email protected]/offchainreporting/internal/protocol/report_generation_follower.go:383
github.com/smartcontractkit/libocr/offchainreporting/internal/protocol.(*reportGenerationState).messageObserveReq
        /go/pkg/mod/github.com/smartcontractkit/[email protected]/offchainreporting/internal/protocol/report_generation_follower.go:108
github.com/smartcontractkit/libocr/offchainreporting/internal/protocol.MessageObserveReq.processReportGeneration
        /go/pkg/mod/github.com/smartcontractkit/[email protected]/offchainreporting/internal/protocol/message.go:123

Logs on the psql side:

root@db10:~# tail  /var/log/postgresql/log.log
                        VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
                        RETURNING id;
2024-04-16 02:52:02.749 UTC [3653275] chainlink-ocr@chainlink-ocr ERROR:  cannot execute INSERT in a read-only transaction
2024-04-16 02:52:02.749 UTC [3653275] chainlink-ocr@chainlink-ocr STATEMENT:  INSERT INTO pipeline_runs (pipeline_spec_id, meta, all_errors, fatal_errors, inputs, outputs, created_at, finished_at, state)
                        VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
                        RETURNING id;
2024-04-16 02:52:03.192 UTC [3653275] chainlink-ocr@chainlink-ocr ERROR:  cannot execute INSERT in a read-only transaction
2024-04-16 02:52:03.192 UTC [3653275] chainlink-ocr@chainlink-ocr STATEMENT:  INSERT INTO pipeline_runs (pipeline_spec_id, meta, all_errors, fatal_errors, inputs, outputs, created_at, finished_at, state)
                        VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
                        RETURNING id;
root@db10:~# tail  /var/log/postgresql/log.log
                        VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
                        RETURNING id;
2024-04-16 02:52:04.709 UTC [3653279] chainlink-ocr@chainlink-ocr ERROR:  cannot execute INSERT in a read-only transaction
2024-04-16 02:52:04.709 UTC [3653279] chainlink-ocr@chainlink-ocr STATEMENT:  INSERT INTO pipeline_runs (pipeline_spec_id, meta, all_errors, fatal_errors, inputs, outputs, created_at, finished_at, state)
                        VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
                        RETURNING id;
2024-04-16 02:52:05.361 UTC [3653279] chainlink-ocr@chainlink-ocr ERROR:  cannot execute INSERT in a read-only transaction
2024-04-16 02:52:05.361 UTC [3653279] chainlink-ocr@chainlink-ocr STATEMENT:  INSERT INTO pipeline_runs (pipeline_spec_id, meta, all_errors, fatal_errors, inputs, outputs, created_at, finished_at, state)
                        VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
                        RETURNING id;

Network: Ethereum
Blockchain Client: geth v1.13.14
Go Version: 1.21
Operating System: debian bullseye 11.8
Commit: Chainlink v2.10.0
Hosting Provider: self-hosted k8s + psql behind proxy

Startup Command: [e.g. docker run smartcontract/chainlink local n]
flags for the Chainlink entrypoint binary:

          - '-s'
          - /home/chainlink/secrets.toml
          - local
          - 'n'
          - '-p'
          - /home/chainlink/credentials/.password
          - '-a'
          - /home/chainlink/credentials/.api

rgottleber · 2024-04-18T14:03:40Z

Thanks for sharing this and all of the details. We will take a look.

saram-aman · 2024-05-31T10:44:30Z

@rgottleber could you please assign it to me? thanks

rgottleber added the enhancement label Apr 18, 2024
