Add pod owner attributes for RDMA metrics #4294

Open: lou-lan wants to merge 1 commit into main from update-metrics

Conversation

lou-lan (Collaborator) commented Nov 18, 2024

No description provided.

codecov bot commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 75.63025% with 29 lines in your changes missing coverage. Please review.

Project coverage is 79.98%. Comparing base (b6d671a) to head (3976574).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/podownercache/pod_owner_cache.go | 80.76% | 14 Missing and 6 partials ⚠️ |
| pkg/rdmametrics/metrics.go | 40.00% | 7 Missing and 2 partials ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4294      +/-   ##
==========================================
+ Coverage   79.60%   79.98%   +0.37%     
==========================================
  Files          53       54       +1     
  Lines        6252     6359     +107     
==========================================
+ Hits         4977     5086     +109     
+ Misses       1083     1071      -12     
- Partials      192      202      +10     
| Flag | Coverage Δ |
|---|---|
| unittests | 79.98% <75.63%> (+0.37%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| pkg/rdmametrics/metrics.go | 79.48% <40.00%> (-0.36%) ⬇️ |
| pkg/podownercache/pod_owner_cache.go | 80.76% <80.76%> (ø) |

... and 1 file with indirect coverage changes

ty-dc (Collaborator) commented Nov 18, 2024

Issue #4293 seems to be causing the CI failure, and the Trivy image scan also failed (https://github.com/spidernet-io/spiderpool/actions/runs/11910688357/job/33237767686?pr=4294). Please wait for #4289.

lou-lan force-pushed the update-metrics branch 2 times, most recently from 638d96a to c8de765 (November 19, 2024 09:20)
lou-lan added the release/feature-new (release note for new feature) label and removed the kind/feature label (Nov 19, 2024)
lou-lan force-pushed the update-metrics branch 2 times, most recently from 59b290a to 0be0d4c (November 19, 2024 10:04)
lou-lan mentioned this pull request (Nov 21, 2024)
lou-lan force-pushed the update-metrics branch 3 times, most recently from cb6dc13 to 4eef3b3 (November 21, 2024 10:15)
lou-lan force-pushed the update-metrics branch 3 times, most recently from 93045fe to 838dec1 (November 25, 2024 01:51)
weizhoublue (Collaborator) commented Nov 25, 2024

For the screenshots, please capture the whole window rather than only part of the content.

lou-lan (Collaborator, Author) commented Nov 25, 2024

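> For the screenshots, please capture the whole window rather than only part of the content.

Done:

  1. Cluster dashboard: added a top pods panel.
  2. Workload dashboard: the title now shows only the summary.
  3. Pod dashboard: the node drop-down is optional and supports "All" to select every node.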

lou-lan force-pushed the update-metrics branch 3 times, most recently from 93940d0 to 2d0e1cc (November 25, 2024 12:01)
@@ -0,0 +1,10 @@
# for rdma metrics exporter, read rdma pod owner's info
# for example, the rdma pod owner is a job, the job's owner is a cronjob
Collaborator

Isn't this a bit heavy-handed, reading all resources? The RBAC files are currently auto-generated from the declarations in pkg/k8s/apis/. Shouldn't we declare precisely which resources may be listed and watched?

Collaborator

Yes, this will be challenged by some community security rules.

Also, if this role is needed only for observability, it should be guarded by an `if` condition that enables it. Otherwise, its rules should be merged into the existing role.

Collaborator Author

A separate project would be better, for example rdma-exporter/rdma-exporter: one component, one responsibility, which avoids role abuse.

Collaborator Author

This is because jobs for AI and similar workloads are CRDs that cannot be known in advance. I'll add a switch first, so that the read-all permission is bound only when RDMA observability is enabled. A switch like that does not currently seem possible through the declarations in pkg/k8s/apis/.
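
For readers following this thread, the owner lookup that motivates the broad read permission (a pod owned by a Job, which is in turn owned by a CronJob) can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the code in pkg/podownercache: the `Ref` type, the `OwnerLookup` function type, the `TopOwner` helper, and the in-memory map are all assumptions made for the example. A real agent would resolve owners through the Kubernetes API, and an intermediate owner may be an arbitrary CRD kind, which is why read access to kinds not known in advance comes up.

```go
package main

import "fmt"

// Ref identifies an object by kind and name.
// Namespace and API version are omitted to keep the sketch short.
type Ref struct {
	Kind string
	Name string
}

// OwnerLookup returns the owner of ref, or false if ref has no owner.
// Hypothetical: a real implementation would query the API server or a cache.
type OwnerLookup func(ref Ref) (Ref, bool)

// TopOwner follows owner links upward from start until it reaches an
// object without an owner. maxDepth bounds the walk in case of cycles.
func TopOwner(start Ref, lookup OwnerLookup, maxDepth int) Ref {
	cur := start
	for i := 0; i < maxDepth; i++ {
		next, ok := lookup(cur)
		if !ok {
			return cur
		}
		cur = next
	}
	return cur
}

func main() {
	// Illustrative data only: pod -> Job -> CronJob.
	owners := map[Ref]Ref{
		{Kind: "Pod", Name: "train-1-abcde"}: {Kind: "Job", Name: "train-1"},
		{Kind: "Job", Name: "train-1"}:       {Kind: "CronJob", Name: "train"},
	}
	lookup := func(r Ref) (Ref, bool) {
		o, ok := owners[r]
		return o, ok
	}

	top := TopOwner(Ref{Kind: "Pod", Name: "train-1-abcde"}, lookup, 8)
	fmt.Printf("%s/%s\n", top.Kind, top.Name) // prints "CronJob/train"
}
```

The depth limit is a defensive choice for the sketch. The lookup is passed in as a function so the same walk works whether owners come from a cache or from live API calls.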

cmd/spiderpool-agent/cmd/daemon.go (outdated, resolved)
cmd/spiderpool-agent/cmd/metrics_server.go (resolved)
lou-lan force-pushed the update-metrics branch 4 times, most recently from 8d5832c to aee4d06 (November 27, 2024 12:33)
ty-dc (Collaborator) commented Nov 27, 2024

The CI failure is a known issue and has been fixed in #4280.

Labels
release/feature-new release note for new feature
4 participants