error in grpc streaming ingest_measurements #66

Open
ezychlagre opened this issue Nov 22, 2024 · 0 comments · May be fixed by #67
Labels
A:existing-plugin area: an existing plugin P:high priority: high = do this first! T:bug type: Something isn't working

ezychlagre commented Nov 22, 2024

Describe the bug
On an OVH k8s cluster, I deployed one ALUMet client on each of the cluster's 2 nodes, plus 1 ALUMet server. Sometimes, a few seconds after the deployment (using a Helm chart), all ALUMet clients crashed with the following error:
ERROR alumet::pipeline::elements::output] Error when asynchronously writing to relay-client/output/grpc-measurements (will stop running): error in grpc streaming ingest_measurements: status status: Cancelled, message: "h2 protocol error: http2 error", details: [], metadata: MetadataMap { headers: {} }
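
To count how often this error shows up in the attached client logs, a small script can pull the gRPC status code and message out of each matching line. This is only a minimal sketch: it assumes every occurrence has the same shape as the line quoted above, and the names `STATUS_RE` and `extract_grpc_errors` are hypothetical helpers, not part of ALUMet.

```python
import re
from collections import Counter

# Matches the status part of the error line quoted above, e.g.
#   status: Cancelled, message: "h2 protocol error: http2 error"
# Assumption: other occurrences in the logs follow the same format.
STATUS_RE = re.compile(r'status: (?P<code>\w+), message: "(?P<message>[^"]*)"')


def extract_grpc_errors(lines):
    """Yield (code, message) for each line reporting a failed ingest_measurements stream."""
    for line in lines:
        if "error in grpc streaming ingest_measurements" not in line:
            continue
        match = STATUS_RE.search(line)
        if match:
            yield match.group("code"), match.group("message")


# The error line from this report, used as sample input.
sample = (
    "ERROR alumet::pipeline::elements::output] Error when asynchronously writing to "
    "relay-client/output/grpc-measurements (will stop running): error in grpc streaming "
    'ingest_measurements: status status: Cancelled, message: "h2 protocol error: http2 error", '
    "details: [], metadata: MetadataMap { headers: {} }"
)

# Count occurrences per (code, message) pair; unrelated lines are ignored.
counts = Counter(extract_grpc_errors([sample, "INFO some unrelated line"]))
for (code, message), n in counts.items():
    print(f"{n} x {code}: {message}")
```

To run it on a real file, pass an open file handle instead of the sample list, for example `extract_grpc_errors(open(path))` with `path` pointing at one of the attached client logs.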

Environment information
OS: k8s on ovh
Alumet version²: 0.6.1 + energy-estimation-tdp plugin
Enabled plugins:

  • EnergyEstimationTdpPlugin v0.1.0
  • k8s v0.1.0
  • procfs v0.1.0
  • relay-client v0.4.0
  • socket-control v0.2.0

²: please include the commit hash if it's not a release
2611b11

To Reproduce
Steps to reproduce the behavior:

  1. Deploy the ALUMet Helm chart. The error is not reproducible every time! I reproduced it at least 3 times on the OVH k8s cluster.

I ran several deployment tests this afternoon on the OVH k8s cluster, with the following results:

  1. 16h03: error reproduced
  2. 16h18: no error
  3. 16h27: error reproduced
  4. 16h31: no error
  5. 14h41: no error
  6. 16h57: no error
  7. 17h05: no error

Expected behavior
The ALUMet pipeline should keep running. Instead, the pipeline crashed and we got a gRPC error.

Logs / Output
The log files from the server and the client are attached:
alumet-client-ovh-demo-ns-demo-test-2024-11-13.txt
alumet-server-ovh-demo-ns-demo-test-2024-11-13.txt
Below is a screenshot of the InfluxDB dashboard, though it was not taken at the same time as the log files above:
influxdb-crash-AL-client

@ezychlagre ezychlagre added the T:bug type: Something isn't working label Nov 22, 2024
@TheElectronWill TheElectronWill self-assigned this Nov 22, 2024
@TheElectronWill TheElectronWill added A:existing-plugin area: an existing plugin P:high priority: high = do this first! labels Nov 22, 2024
@TheElectronWill TheElectronWill linked a pull request Nov 25, 2024 that will close this issue