OCPBUGS-3009: Prune stale CCRs before aggregating scan results #221

rhmdnd · 2023-02-16T22:17:53Z

Previously, the compliance operator would leave CCRs around and then
just overwrite them on subsequent scans. The most recent scan data
was accurate, but because existing check results were only overwritten
and never removed, results for rules that were no longer scanned stayed
behind and gave the impression that some changes weren't taking effect.

For example, if you create a tailored profile, run a scan, exclude a
rule, and rerun the scan, it appears the change you just made never took
effect because the result from the rule you ignored still exists.

To avoid this, let's check for any check results at scan time and make
sure we clean them up before we aggregate the new results.
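
The cleanup described above amounts to a set difference between the check results that already exist for a scan and the results produced by the latest run. The following is a minimal sketch under stated assumptions, not the operator's actual implementation: `staleResults` is a hypothetical helper, plain name strings stand in for real ComplianceCheckResult objects, and the result names are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
)

// staleResults returns every name in existing that is absent from latest.
// Hypothetical helper for illustration: real CCRs are Kubernetes objects,
// and the operator would delete them through the API rather than print them.
func staleResults(existing, latest []string) []string {
	seen := make(map[string]struct{}, len(latest))
	for _, name := range latest {
		seen[name] = struct{}{}
	}

	var stale []string
	for _, name := range existing {
		if _, ok := seen[name]; !ok {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale) // deterministic order for logging or deletion
	return stale
}

func main() {
	// Results left over from a previous scan (illustrative names).
	existing := []string{
		"test-rule-a",
		"test-rule-b",
		"test-rule-c",
	}
	// Results produced by the latest scan, after rule-b was excluded.
	latest := []string{
		"test-rule-a",
		"test-rule-c",
	}

	for _, name := range staleResults(existing, latest) {
		fmt.Println("would delete stale result:", name)
	}
}
```

Computing the stale set before aggregation keeps the deletion step independent of how new results are written, so results that are still current are overwritten as before and only the leftovers are pruned.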

openshift-ci-robot · 2023-02-16T22:17:58Z

@rhmdnd: This pull request references Jira Issue OCPBUGS-3009, which is invalid:

expected the bug to target the "4.13.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

rhmdnd · 2023-02-16T22:18:31Z

This still needs some testing but I'm waiting on a cluster.

Vincent056 · 2023-02-16T22:49:23Z

config/manager/kustomization.yaml

@@ -3,7 +3,7 @@ resources:

 images:
 - name: compliance-operator
-  newName: quay.io/compliance-operator/compliance-operator
-  newTag: latest
+  newName: image-registry.openshift-image-registry.svc:5000/openshift/compliance-operator


Could you remove this change from the commit?

Vincent056 · 2023-02-16T22:51:24Z

I think it makes sense to remove CCRs before the aggregating phase, but I wonder if we should do that in phaseDoneHandler, when someone labels a scan to be rescanned. Maybe in here: https://github.com/ComplianceAsCode/compliance-operator/pull/221/files#diff-7aba53b74d27b417a478166b1a983b145fb0e381975e97097b7794752b51edeeR612

xiaojiey · 2023-02-17T07:58:10Z

/hold for test

xiaojiey · 2023-02-23T09:39:42Z

@rhmdnd,
Verified with 4.13.0-0.nightly-2023-02-23-000625 and the code in the PR. It seems it is NOT working as expected. Could you please help to check? Thanks.
Details:

1. Deploy compliance operator with code in the PR
2. Create a tp with a rule disabled in the tp:
$ oc apply -f -<<EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: TailoredProfile
metadata:
  annotations:
    compliance.openshift.io/product-type: Platform
  name: test
  namespace: openshift-compliance
spec:
  description: test
  extends: ocp4-cis
  disableRules:
  - name: ocp4-kubelet-enable-streaming-connections
    rationale: we only want to test this rule
  title: test
EOF
tailoredprofile.compliance.openshift.io/test created
3. Create an ssb with the above tp and check the results:
$ oc compliance bind -N test1 -S default tailoredprofile/test
Creating ScanSettingBinding test1
$ oc get suite -w
NAME    PHASE     RESULT
test1   RUNNING   NOT-AVAILABLE
test1   AGGREGATING   NOT-AVAILABLE
test1   DONE          NON-COMPLIANT
test1   DONE          NON-COMPLIANT
^C
$ oc get ccr | grep kubelet-enable-streaming-connections
$ oc get ccr | grep kubelet-eviction-thresholds-set-hard-imagefs-available
test-kubelet-eviction-thresholds-set-hard-imagefs-available    PASS     medium
4. Configure the tp with one more rule disabled:
$ oc apply -f -<<EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: TailoredProfile
metadata:
  annotations:
    compliance.openshift.io/product-type: Platform
  name: test
  namespace: openshift-compliance
spec:
  description: test
  extends: ocp4-cis
  disableRules:
  - name: ocp4-kubelet-enable-streaming-connections
    rationale: we only want to test this rule
  - name: ocp4-kubelet-eviction-thresholds-set-hard-imagefs-available
    rationale: we only want to test this rule
  title: test
EOF
tailoredprofile.compliance.openshift.io/test configured
5. Rerun the ssb directly and check the result:
$ oc compliance rerun-now scansettingbinding test1
Rerunning scans from 'test1': test
Re-running scan 'openshift-compliance/test'
$ oc get suite -w
NAME    PHASE     RESULT
test1   RUNNING   NOT-AVAILABLE
test1   AGGREGATING   NOT-AVAILABLE
test1   DONE          NON-COMPLIANT
test1   DONE          NON-COMPLIANT
^C
$ oc get ccr | grep kubelet-enable-streaming-connections
$ oc get ccr | grep kubelet-eviction-thresholds-set-hard-imagefs-available
test-kubelet-eviction-thresholds-set-hard-imagefs-available    PASS     medium
6. Create another ssb with the tp:
$ oc compliance bind -N test2 -S default tailoredprofile/test
Creating ScanSettingBinding test2
$ oc get suite -w
NAME    PHASE   RESULT
test1   DONE    NON-COMPLIANT
test2   DONE    NON-COMPLIANT
^C
$ oc get ccr | grep kubelet-enable-streaming-connections
$ oc get ccr | grep kubelet-eviction-thresholds-set-hard-imagefs-available
test-kubelet-eviction-thresholds-set-hard-imagefs-available    PASS     medium

rhmdnd · 2023-03-09T20:50:27Z

I think this PR still needs some work, including an e2e test. I was in the process of adding a test and then discovered other issues in the test framework that were interfering with my test (specifically around resource cleanup).

rhmdnd · 2023-04-13T19:59:29Z

I was able to reproduce with the latest test.

tests/e2e/framework/client.go

rhmdnd · 2023-04-13T22:07:55Z

/retest

rhmdnd · 2023-04-14T03:37:29Z

@xiaojiey I got a clean parallel run with the test. Should be good for another pre-merge validation test.

rhmdnd · 2023-04-14T13:14:26Z

/retest e2e-aws-serial

Retest due to infrastructure timeouts.

openshift-ci · 2023-04-14T13:14:29Z

@rhmdnd: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

/test e2e-aws-parallel
/test e2e-aws-serial
/test go-build
/test images
/test unit
/test verify

Use /test all to run all jobs.

mkumku

lgtm, thank you!

rhmdnd · 2023-04-14T16:43:37Z

/retest e2e-aws-serial

openshift-ci-robot · 2023-11-08T15:18:01Z

@rhmdnd: This pull request references Jira Issue OCPBUGS-3009, which is invalid:

expected the bug to target the "4.15.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

xiaojiey · 2023-11-09T01:52:36Z

/hold

tests/e2e/parallel/main_test.go

Vincent056

/lgtm
Just some inline comments, but I think we can address those in another PR.

Vincent056

/lgtm

xiaojiey · 2023-11-10T08:08:22Z

Pre-merge testing passed with 4.14.0-0.nightly-2023-11-09-092851 and code in #221

1. Create a tp and create an ssb with the tailoredprofile:

$ cat tp2.yaml 
apiVersion: compliance.openshift.io/v1alpha1
kind: TailoredProfile
metadata:
  name: test
  namespace: openshift-compliance
spec:
  description: test
  title: test
  enableRules:
    - name: ocp4-api-server-anonymous-auth
      rationale: platform
    - name: ocp4-api-server-api-priority-gate-enabled
      rationale: platform
    - name: ocp4-api-server-audit-log-maxbackup
      rationale: platform
    - name: ocp4-api-server-audit-log-maxsize
      rationale: platform
    - name: ocp4-api-server-encryption-provider-cipher
      rationale: platform
$ oc apply -f tp2.yaml 
tailoredprofile.compliance.openshift.io/test created
$ oc get tp
NAME   STATE
test   READY
$ oc compliance bind -N test tailoredprofile/test
Creating ScanSettingBinding test
$ oc get suite
NAME   PHASE     RESULT
test   DONE          NON-COMPLIANT
$ oc get ccr
NAME                                         STATUS   SEVERITY
test-api-server-anonymous-auth               PASS     medium
test-api-server-api-priority-gate-enabled    FAIL     medium
test-api-server-audit-log-maxbackup          PASS     low
test-api-server-audit-log-maxsize            PASS     medium
test-api-server-encryption-provider-cipher   FAIL     medium
$ oc get cr
NAME                                         STATE
test-api-server-encryption-provider-cipher   NotApplied
2. Edit the tp to replace the rule ocp4-api-server-encryption-provider-cipher with ocp4-audit-profile-set, and rerun the ssb. Check that a result for the newly added rule is created and that the removed rule has been removed from the ccr and cr.

$ oc get tp test -o yaml
...
spec:
  description: test
  enableRules:
  - name: ocp4-api-server-anonymous-auth
    rationale: platform
  - name: ocp4-api-server-api-priority-gate-enabled
    rationale: platform
  - name: ocp4-api-server-audit-log-maxbackup
    rationale: platform
  - name: ocp4-api-server-audit-log-maxsize
    rationale: platform
  - name: ocp4-audit-profile-set
    rationale: platform
  title: test
status:
  id: xccdf_compliance.openshift.io_profile_test
  outputRef:
    name: test-tp
    namespace: openshift-compliance
  state: READY
$ oc compliance rerun-now scansettingbinding test
Rerunning scans from 'test': test
Re-running scan 'openshift-compliance/test'
$ oc get suite
NAME   PHASE     RESULT
test   DONE          NON-COMPLIANT
$ oc get ccr
NAME                                        STATUS   SEVERITY
test-api-server-anonymous-auth              PASS     medium
test-api-server-api-priority-gate-enabled   FAIL     medium
test-api-server-audit-log-maxbackup         PASS     low
test-api-server-audit-log-maxsize           PASS     medium
test-audit-profile-set                      PASS     medium
$ oc get ccr --no-headers |wc -l
5
$ oc get cr
No resources found in openshift-compliance namespace.
3. Add "extends: ocp4-cis" to the tp, rerun the ssb, and check the result:

$ oc get tp test -o=jsonpath={.spec} | jq -r
{
  "description": "test",
  "enableRules": [
    {
      "name": "ocp4-api-server-anonymous-auth",
      "rationale": "platform"
    },
    {
      "name": "ocp4-api-server-api-priority-gate-enabled",
      "rationale": "platform"
    },
    {
      "name": "ocp4-api-server-audit-log-maxbackup",
      "rationale": "platform"
    },
    {
      "name": "ocp4-api-server-audit-log-maxsize",
      "rationale": "platform"
    },
    {
      "name": "ocp4-audit-profile-set",
      "rationale": "platform"
    }
  ],
  "extends": "ocp4-cis",
  "title": "test"
}
$ oc compliance rerun-now scansettingbinding test
Rerunning scans from 'test': test
Re-running scan 'openshift-compliance/test'
$ oc get suite -w
NAME   PHASE     RESULT
test   DONE          NON-COMPLIANT
$ oc get ccr --no-headers |wc -l
88
$ oc get ccr | head
NAME                                                          STATUS   SEVERITY
test-accounts-restrict-service-account-tokens                 MANUAL   medium
test-accounts-unique-service-account                          MANUAL   medium
test-api-server-admission-control-plugin-alwaysadmit          PASS     medium
test-api-server-admission-control-plugin-alwayspullimages     PASS     high
test-api-server-admission-control-plugin-namespacelifecycle   PASS     medium
test-api-server-admission-control-plugin-noderestriction      PASS     medium
test-api-server-admission-control-plugin-scc                  PASS     medium
test-api-server-admission-control-plugin-service-account      PASS     medium
test-api-server-anonymous-auth                                PASS     medium

xiaojiey · 2023-11-10T08:08:38Z

/label qe-approved

Vincent056 · 2023-11-10T09:00:00Z

I was trying to resolve the merge conflicts through the GitHub UI, and it ended up merging master into the branch, which produced the commit here:

Previously, the compliance operator would leave CCRs around and then just overwrite them on subsequent scans. While the most recent scan data was accurate, because it was overwriting existing check results, it gave the impression that some changes weren't taking effect. For example, if you create a tailored profile, run a scan, exclude a rule, and rerun the scan, it appears the change you just made never took effect because the result from the rule you ignored still exists. To avoid this, let's prune stale results when we aggregate new results.

rhmdnd · 2023-11-10T13:44:12Z

Removed the hold flag since we have QE approval.

I also rebased locally to resolve the merge conflict with master in the same patch.

Thanks for all the reviews, folks.

yuumasato

/lgtm

openshift-ci · 2023-11-10T15:04:48Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhrozek, rhmdnd, Vincent056, yuumasato

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Vincent056,jhrozek,rhmdnd]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Vincent056 · 2023-11-10T17:17:59Z

/jira refresh

openshift-ci-robot · 2023-11-10T17:18:04Z

@Vincent056: This pull request references Jira Issue OCPBUGS-3009, which is invalid:

expected the bug to target the "4.15.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

openshift-ci-robot · 2023-11-14T13:15:03Z

@rhmdnd: Jira Issue OCPBUGS-3009: All pull requests linked via external trackers have merged:

ComplianceAsCode/compliance-operator#221

Jira Issue OCPBUGS-3009 has been moved to the MODIFIED state.

A recent change improved how the aggregator pod handled compliance check results, by allowing it to find all existing results, and prune results that were stale. This makes the state of the Compliance Check Results consistent with the latest run:

ComplianceAsCode#221

To do this though, we needed to give the aggregator pod permissions to list and delete Compliance Check Results. But, in that patch we forgot to update the bundle build to include those new permissions. This means bundle installs are currently broken for all scans because the aggregator pod gets stuck in a crashloop, due to failing permissions.

This commit updates the manifest for the bundles so that bundle installs work again.

openshift-ci-robot added jira/valid-reference jira/invalid-bug labels Feb 16, 2023

openshift-ci bot requested review from jhrozek and xiaojiey February 16, 2023 22:18

openshift-ci bot added the approved label Feb 16, 2023

Vincent056 reviewed Feb 16, 2023

openshift-ci bot added the do-not-merge/hold label Feb 17, 2023

rhmdnd force-pushed the OCPBUGS-3009 branch 2 times, most recently from b238c94 to 75e89f2 Compare April 13, 2023 19:59

rhmdnd force-pushed the OCPBUGS-3009 branch from 75e89f2 to 6f2107d Compare April 13, 2023 21:10

rhmdnd commented Apr 13, 2023

tests/e2e/framework/client.go (outdated)

rhmdnd force-pushed the OCPBUGS-3009 branch from 6f2107d to 3daf4ad Compare April 13, 2023 21:19

rhmdnd force-pushed the OCPBUGS-3009 branch from 3daf4ad to ed41ddb Compare April 14, 2023 01:30

rhmdnd requested review from mkumku and sheriff-rh April 14, 2023 03:37

sheriff-rh added the docs-approved label Apr 14, 2023

mkumku added the px-approved label Apr 14, 2023

mkumku reviewed Apr 14, 2023

rhmdnd removed the qe-approved label Nov 8, 2023

Vincent056 reviewed Nov 9, 2023

tests/e2e/parallel/main_test.go

Vincent056 approved these changes Nov 9, 2023

openshift-ci bot added the lgtm label Nov 9, 2023

rhmdnd force-pushed the OCPBUGS-3009 branch from c86fb07 to a975b20 Compare November 9, 2023 21:52

openshift-ci bot removed the lgtm label Nov 9, 2023

Vincent056 approved these changes Nov 9, 2023

openshift-ci bot added the lgtm label Nov 9, 2023

openshift-ci bot added qe-approved and removed lgtm labels Nov 10, 2023

rhmdnd force-pushed the OCPBUGS-3009 branch from 71264b3 to f043d19 Compare November 10, 2023 13:40

rhmdnd removed the do-not-merge/hold label Nov 10, 2023

yuumasato approved these changes Nov 10, 2023

openshift-ci bot assigned yuumasato Nov 10, 2023

openshift-ci bot added the lgtm label Nov 10, 2023

rhmdnd removed the jira/invalid-bug label Nov 14, 2023

openshift-merge-bot bot merged commit ebcd30a into ComplianceAsCode:master Nov 14, 2023
6 checks passed

rhmdnd mentioned this pull request Nov 28, 2023

CMP-3009: Update manifest to include new aggregator permissions #485

Merged
