
ca/auth: update Azure token when invalid #1134

Merged: 3 commits, Nov 11, 2024


@chaporgin (Member) commented Nov 7, 2024

This changes the Cluster Autoscaler version from the tag `cluster-autoscaler-1.27.8` to the branch `cluster-autoscaler-release-1.28`, commit `10a229ac17ea8049248d1c3ce2923b94a4f9085c`. Motivation:

We get an occasional error in Azure:

```
E1106 12:08:11.509971       1 azure_manager.go:177] Failed to regenerate Azure cache: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 401, RawError: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/REDUCTED/resourceGroups/MC_dev-eastus2-aks2_dev-azure-eastus2-aks2_eastus2/providers/Microsoft.Compute/virtualMachineScaleSets?api-version=2022-03-01: StatusCode=401 -- Original Error: adal: Refresh request failed. Status Code = '401'. Response body: {"error":"invalid_client","error_description":"AADSTS700024: Client assertion is not within its valid time range. Current time: 2024-11-06T12:08:11.4851735Z, assertion valid from 2024-11-04T18:55:21.0000000Z, expiry time of assertion 2024-11-04T19:55:21.0000000Z. Review the documentation at https://learn.microsoft.com/entra/identity-platform/certificate-credentials . Trace ID: 1c8e947d-f154-4052-9e8a-8529877f7c00 Correlation ID: b04f2ef9-09f7-4f4e-80f8-15d313c5568f Timestamp: 2024-11-06 12:08:11Z","error_codes":[700024],"timestamp":"2024-11-06 12:08:11Z","trace_id":"1c8e947d-f154-4052-9e8a-8529877f7c00","correlation_id":"b04f2ef9-09f7-4f4e-80f8-15d313c5568f","error_uri":"https://login.microsoftonline.com/error?code=700024"} Endpoint https://login.microsoftonline.com/c8350122-1697-4543-929a-d4a75d1bb552/oauth2/token?api-version=1.0
```

Cluster Autoscaler (CA) appears to have fixed this in recent versions by switching to the `cloud-provider-azure` package, which uses a callback to reread the JWT when needed. That change is already in the `cluster-autoscaler-release-1.28` branch, but not in the `cluster-autoscaler-1.28.6` tag that I used previously in 26d39a6. In that tag, the code reads the JWT from the filesystem only once and does not account for AKS occasionally replacing it.

Are we OK with versioning this as `neondatabase/cluster-autoscaler-neonvm:k8s-1.28-2024-10-07`?

https://github.com/neondatabase/cloud/issues/18284

@chaporgin chaporgin requested review from sharnoff and edude03 November 7, 2024 09:02
@chaporgin chaporgin force-pushed the chaporgin/18284-from-cache-prod branch from 34cfad3 to c03df3f November 7, 2024 09:02
@chaporgin chaporgin changed the title from "[ca/azure-token] impr: update Azure token when invalid" to "ca/auth update Azure token when invalid" Nov 7, 2024
@chaporgin chaporgin changed the title from "ca/auth update Azure token when invalid" to "ca/auth: update Azure token when invalid" Nov 7, 2024
@chaporgin chaporgin marked this pull request as ready for review November 7, 2024 09:29
@edude03 (Contributor) commented Nov 8, 2024

> Are we OK with versioning this as `neondatabase/cluster-autoscaler-neonvm:k8s-1.28-2024-10-07`?

I think that's fine. If I were being super nitpicky, I'd love it if we had our own version as part of the tag, like `CA-0.1-k8s-128-api`, but that's basically bikeshedding.

@chaporgin chaporgin merged commit ff2c34a into main Nov 11, 2024
23 checks passed
@chaporgin chaporgin deleted the chaporgin/18284-from-cache-prod branch November 11, 2024 07:51
@chaporgin (Member, Author) commented:

> I'd love if we had our own version as part of the tag like `CA-0.1-k8s-128-api`

Noted on the version format; I'll apply it in future iterations, if there are any.

sharnoff added a commit that referenced this pull request Nov 15, 2024
This is a follow-up to #1134 that restores the previous formatting:

1. Move imports so that the standard library stays in its own
   newline-separated group.
2. Use 'git diff' between commits; this makes the context markers
   show function names and produces the simpler header.
sharnoff added a commit that referenced this pull request Nov 20, 2024
edude03 pushed a commit that referenced this pull request Nov 22, 2024
edude03 pushed a commit that referenced this pull request Nov 25, 2024