CASMHMS-6299: Fix PCS/TRS resource leaks and scaling issues #38

Merged: 8 commits, Nov 25, 2024
22 changes: 22 additions & 0 deletions changelog/v2.0.md
@@ -5,6 +5,28 @@ All notable changes to this project for v2.0.X will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [2.0.11] - 2024-11-25

### Changed

- Update timeoutSeconds for liveness probe in helm chart from 1 to 3
- Updated hms-trs-app-api vendor code (bug fixes and enhancements)
- Passed PCS's log level through to TRS to match PCS's
- Configured TRS to use connection pools for status requests to BMCs
- Renamed PCS_STATUS_HTTP_TIMEOUT to PCS_STATUS_TIMEOUT as it is not an
  http timeout
- Added PCS_MAX_IDLE_CONNS and PCS_MAX_IDLE_CONNS_PER_HOST env variables
  which allow overriding PCS's connection pool settings in TRS
- Added PCS_BASE_TRS_TASK_TIMEOUT env variable which allows the timeout
  for power transitions and capping to be configured
- The above variables are configurable on PCS's helm chart
- At PCS start time, log all env variables that were set
- Added PodName global to facilitate easier debug of log messages
- Log start and end of large batched requests to BMCs
- Fixed many resource leaks associated with making http requests and using TRS
- Update required version of Go to 1.23 to avoid
  https://github.com/golang/go/issues/59017

## [2.0.10] - 2024-10-25

### Changed
22 changes: 22 additions & 0 deletions changelog/v2.1.md
@@ -5,6 +5,28 @@ All notable changes to this project for v2.1.X will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [2.1.10] - 2024-11-25

### Changed

- Update timeoutSeconds for liveness probe in helm chart from 1 to 3
- Updated hms-trs-app-api vendor code (bug fixes and enhancements)
- Passed PCS's log level through to TRS to match PCS's
- Configured TRS to use connection pools for status requests to BMCs
- Renamed PCS_STATUS_HTTP_TIMEOUT to PCS_STATUS_TIMEOUT as it is not an
  http timeout
- Added PCS_MAX_IDLE_CONNS and PCS_MAX_IDLE_CONNS_PER_HOST env variables
  which allow overriding PCS's connection pool settings in TRS
- Added PCS_BASE_TRS_TASK_TIMEOUT env variable which allows the timeout
  for power transitions and capping to be configured
- The above variables are configurable on PCS's helm chart
- At PCS start time, log all env variables that were set
- Added PodName global to facilitate easier debug of log messages
- Log start and end of large batched requests to BMCs
- Fixed many resource leaks associated with making http requests and using TRS
- Update required version of Go to 1.23 to avoid
  https://github.com/golang/go/issues/59017

## [2.1.9] - 2024-10-25

### Changed
4 changes: 2 additions & 2 deletions charts/v2.0/cray-power-control/Chart.yaml
@@ -1,6 +1,6 @@
apiVersion: v2
name: "cray-power-control"
-version: 2.0.10
+version: 2.0.11
description: "Kubernetes resources for cray-power-control"
home: "https://github.com/Cray-HPE/hms-power-control-charts"
sources:
@@ -15,6 +15,6 @@ dependencies:
maintainers:
- name: Hardware Management
url: https://github.com/orgs/Cray-HPE/teams/hardware-management
-appVersion: 2.5.0
+appVersion: 2.6.0
annotations:
artifacthub.io/license: "MIT"
15 changes: 13 additions & 2 deletions charts/v2.0/cray-power-control/values.yaml
@@ -7,8 +7,8 @@
# tag: "" (default = "latest")
# pullPolicy: "" (default = "IfNotPresent")
global:
-  appVersion: 2.5.0
-  testVersion: 2.5.0
+  appVersion: 2.6.0
+  testVersion: 2.6.0

tests:
image:
@@ -116,12 +116,23 @@ cray-service:
configMapKeyRef:
name: cray-power-control-cacert-info
key: CA_URI
- name: PCS_BASE_TRS_TASK_TIMEOUT
value: "40"
- name: PCS_STATUS_TIMEOUT
value: "30"
- name: PCS_STATUS_HTTP_RETRIES
value: "3"
- name: PCS_MAX_IDLE_CONNS
value: "4000"
- name: PCS_MAX_IDLE_CONNS_PER_HOST
value: "4"
livenessProbe:
httpGet:
port: 28007
path: /v1/liveness
initialDelaySeconds: 15
periodSeconds: 5
timeoutSeconds: 3
readinessProbe:
httpGet:
port: 28007
4 changes: 2 additions & 2 deletions charts/v2.1/cray-power-control/Chart.yaml
@@ -1,6 +1,6 @@
apiVersion: v2
name: "cray-power-control"
-version: 2.1.9
+version: 2.1.10
description: "Kubernetes resources for cray-power-control"
home: "https://github.com/Cray-HPE/hms-power-control-charts"
sources:
@@ -15,6 +15,6 @@ dependencies:
maintainers:
- name: Hardware Management
url: https://github.com/orgs/Cray-HPE/teams/hardware-management
-appVersion: 2.5.0
+appVersion: 2.6.0
annotations:
artifacthub.io/license: "MIT"
15 changes: 13 additions & 2 deletions charts/v2.1/cray-power-control/values.yaml
@@ -7,8 +7,8 @@
# tag: "" (default = "latest")
# pullPolicy: "" (default = "IfNotPresent")
global:
-  appVersion: 2.5.0
-  testVersion: 2.5.0
+  appVersion: 2.6.0
+  testVersion: 2.6.0

tests:
image:
@@ -120,12 +120,23 @@ cray-service:
value: "20000"
- name: EXPIRE_TIME_MINS
value: "1440"
- name: PCS_BASE_TRS_TASK_TIMEOUT
value: "40"
- name: PCS_STATUS_TIMEOUT
value: "30"
- name: PCS_STATUS_HTTP_RETRIES
value: "3"
- name: PCS_MAX_IDLE_CONNS
value: "4000"
- name: PCS_MAX_IDLE_CONNS_PER_HOST
value: "4"
livenessProbe:
httpGet:
port: 28007
path: /v1/liveness
initialDelaySeconds: 15
periodSeconds: 5
timeoutSeconds: 3
readinessProbe:
httpGet:
port: 28007
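
Since the chart now sets several `PCS_` variables, logging them at startup makes it easy to confirm what a pod actually received. The sketch below is one possible shape for that, not the PCS implementation: `filterEnv` is a hypothetical helper, filtering on the `PCS_` prefix is an assumption, and the pod name is taken from `os.Hostname()`, which in Kubernetes defaults to the pod name.

```go
package main

import (
	"log"
	"os"
	"sort"
	"strings"
)

// filterEnv (hypothetical helper) returns the KEY=VALUE entries from environ
// whose key starts with prefix, sorted for stable log output.
func filterEnv(environ []string, prefix string) []string {
	var out []string
	for _, kv := range environ {
		if strings.HasPrefix(kv, prefix) {
			out = append(out, kv)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Assumption: the hostname stands in for the pod name.
	podName, err := os.Hostname()
	if err != nil {
		podName = "unknown"
	}
	for _, kv := range filterEnv(os.Environ(), "PCS_") {
		log.Printf("%s: env %s", podName, kv)
	}
}
```

Restricting the output to a known prefix avoids writing unrelated variables, which may include credentials, to the log.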
2 changes: 2 additions & 0 deletions cray-hms-power-control.compatibility.yaml
@@ -47,6 +47,7 @@ chartVersionToApplicationVersion:
"2.0.8": "2.3.0"
"2.0.9": "2.4.0"
"2.0.10": "2.5.0"
"2.0.11": "2.6.0"
"2.1.0": "2.0.0"
"2.1.1": "2.0.0"
"2.1.2": "2.0.0"
@@ -57,6 +58,7 @@ chartVersionToApplicationVersion:
"2.1.7": "2.4.0"
"2.1.8": "2.4.0"
"2.1.9": "2.5.0"
"2.1.10": "2.6.0"

# Test results for combinations of Chart, Application, and CSM versions.
chartValidationLog: