This repository has been archived by the owner on May 6, 2021. It is now read-only.

Jenkins idler is retaining stale data on active builds #143

Open
ldimaggi opened this issue Mar 2, 2018 · 19 comments

@ldimaggi

ldimaggi commented Mar 2, 2018

Related to issue: openshiftio/openshift.io#2418

It appears that the issue reported in #2418 is caused by stale data relating to a completed build that is marked as active. The idler sees this build as active:


{
    "Name": "ldimaggi-osiotest2",
    "ID": "61ddf7b3-d141-402e-ae6a-30ee65f1879b",
    "ActiveBuild": {
        "metadata": {
            "name": "march1test-2",
            "namespace": "ldimaggi-osiotest2",
            "annotations": {
                "openshift.io/build.number": "2",
                "openshift.io/jenkins-namespace": "ldimaggi-osiotest2-jenkins"
            },
            "Generation": 0
        },
        "status": {
            "phase": "Running",
            "startTimestamp": "2018-03-01T18:18:49Z",
            "completionTimestamp": "2018-03-01T18:18:51.416135334Z"
        },
        "spec": {
            "replicas": 0,
            "Strategy": {
                "Type": "JenkinsPipeline"
            }
        }
    },
    "DoneBuild": {
        "metadata": {
            "name": "march1test-1",
            "namespace": "ldimaggi-osiotest2",
            "annotations": {
                "openshift.io/build.number": "1",
                "openshift.io/jenkins-namespace": "ldimaggi-osiotest2-jenkins"
            },
            "Generation": 0
        },
        "status": {
            "phase": "Complete",
            "startTimestamp": "2018-03-01T17:44:17Z",
            "completionTimestamp": "2018-03-01T18:09:19Z"
        },
        "spec": {
            "replicas": 0,
            "Strategy": {
                "Type": "JenkinsPipeline"
            }
        }
    },
    "JenkinsStateList": null,
    "JenkinsLastUpdate": "2018-03-02T13:26:41Z"
}

An active build prevents the idler from running, but no builds are actually active:

oc get builds -n ldimaggi-osiotest2
No resources found.
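
One detail in the dump above is worth noting: the ActiveBuild reports phase "Running" yet already carries a completionTimestamp. The following is a minimal Go sketch of a consistency check for that combination. The type names, the `looksStale` heuristic, and the trimmed `dump` constant are assumptions made for illustration; this is not the Idler's actual code, and it assumes timestamps are serialized as plain strings, as in the dump above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// build mirrors only the fields of the dump that the check needs.
type build struct {
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Status struct {
		Phase               string `json:"phase"`
		CompletionTimestamp string `json:"completionTimestamp"`
	} `json:"status"`
}

// user mirrors the top level of the dump (hypothetical type name).
type user struct {
	Name        string `json:"Name"`
	ActiveBuild build  `json:"ActiveBuild"`
}

// looksStale reports whether a build claims to be running although a
// completion timestamp has already been recorded.
// Hypothetical heuristic, not the Idler's logic.
func looksStale(b build) bool {
	return b.Status.Phase == "Running" && b.Status.CompletionTimestamp != ""
}

// dump is a trimmed copy of the state reported in this issue.
const dump = `{
  "Name": "ldimaggi-osiotest2",
  "ActiveBuild": {
    "metadata": {"name": "march1test-2"},
    "status": {
      "phase": "Running",
      "startTimestamp": "2018-03-01T18:18:49Z",
      "completionTimestamp": "2018-03-01T18:18:51.416135334Z"
    }
  }
}`

// parseUser decodes a JSON dump into the trimmed user model.
func parseUser(data string) (user, error) {
	var u user
	err := json.Unmarshal([]byte(data), &u)
	return u, err
}

func main() {
	u, err := parseUser(dump)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s: active build %s stale=%v\n",
		u.Name, u.ActiveBuild.Metadata.Name, looksStale(u.ActiveBuild))
}
```

A check like this would only flag the inconsistency; deciding what to do with a flagged entry is a separate question.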
@hferentschik
Contributor

hferentschik commented Mar 2, 2018

@ldimaggi So here is the thing: what happened with your builds in general? There is not even a completed build. Where are the march1test-1 and march1test-2 builds? They don't even seem to show up as completed builds.

I think you reset the environment, right? Does this also reset pipeline builds? The logic in the Idler expects a Build to go through a specific flow and transitions the internal state accordingly. I am wondering whether resetting the environment basically screws up these state transitions, so that the state we keep in memory gets out of sync.

@vpavlin what do you think?

@ldimaggi @aslakknutsen I am not familiar with how resetting the environment works. What does it do in terms of OpenShift actions? What happens with existing builds? What type of events (if any) would be observable?

@hferentschik
Contributor

I think this relates to issue #120 and #141. We should consider timestamps and we should change how we model the data.

@ldimaggi
Author

ldimaggi commented Mar 2, 2018

It's my understanding that the env reset removes the build configs and deploy configs - not sure what in addition to that. Wouldn't deleting the bc's and dc's remove all running builds?

How can I clean up this situation today? I cannot see anything in OSO via oc.

@hferentschik
Contributor

It's my understanding that the env reset removes the build configs and deploy configs - not sure what in addition to that.

Sure

Wouldn't deleting the bc's and dc's remove all running builds?

I think the problem is really on the Idler side. Not sure whether there is much you can do from your end right now. Restarting the Idler might help. As part of this issue, I am planning to add some sort of reset call which would allow resetting the state for a single namespace. This way, if a namespace gets into an inconsistent state (in terms of the model the Idler builds of it), there is an easy way to reset just this namespace. But all this requires changes to the Idler code first.

hferentschik added a commit to hferentschik/fabric8-jenkins-idler that referenced this issue Mar 2, 2018
hferentschik added a commit to hferentschik/fabric8-jenkins-idler that referenced this issue Mar 2, 2018
@hferentschik hferentschik changed the title Jenkins idler seems to be retaining stale data on active builds - this blocks idler from running Jenkins idler is retaining stale data on active builds Mar 15, 2018
@chmouel chmouel assigned chmouel and unassigned hferentschik Mar 20, 2018
@lordofthejars lordofthejars self-assigned this Jul 19, 2018
@lordofthejars
Contributor

@hferentschik I am starting to look at this issue. After talking with @chmouel, it seems that the approach to fix this is to watch for the delete build event and, when it happens, remove the data from the idler so that no stale data remains.

Do you think this is the right approach to fix this, or would you prefer to have a /reset endpoint that is called externally to delete all data?

@vpavlin
Member

vpavlin commented Jul 20, 2018

@hferentschik I am starting to look at this issue. After talking with @chmouel, it seems that the approach to fix this is to watch for the delete build event and, when it happens, remove the data from the idler so that no stale data remains.

Yeah, that makes sense, looking at the HandleBuild code, I think it would be good to look at what happens when the build is deleted and handle that case in this function.

That should prevent weird behaviour if you are able to recognise the delete event and react appropriately.

I'd still consider a /reset endpoint for the user/namespace to clean it up - it might even be useful for testing. I'd also add the logic to call the endpoint to the "Reset environment" flow to make sure the user starts with a clean slate.

Might be also useful to check what happens with Proxy - imagine there is a webhook buffered and you reset the environment - I am not sure if it will simply stay there and retry forever, or if it disappears (sorry, long time no see with the code:) ).

Hope this helps

@chmouel
Contributor

chmouel commented Jul 20, 2018

yeah +1 on having a /reset may be a good idea anyway for ops and "Reset Env"!

@lordofthejars
Contributor

Ok, I will start with /reset, which seems easy to implement. But then reset removes all data, right? I mean, I do not filter anything.

lordofthejars added a commit to lordofthejars/fabric8-jenkins-idler that referenced this issue Jul 24, 2018
@lordofthejars
Contributor

Now that I have implemented the /reset endpoint, I have started to look at reacting to a delete build event and removing the data so that it does not become stale.

The problem is that I am not really sure if this can be detected using the event. Let me explain why:

Build events are emitted for any change that occurs on the Build object, whose schema is specified at https://docs.openshift.com/online/rest_api/apis-build.openshift.io/v1.Build.html#object-schema. It has a field called status, which is where you would expect to check whether the build has been deleted. Within it there is a field called phase, which you might expect to tell you whether the build is running, canceled or deleted. Sadly, if you check the possible values of this field (https://github.com/openshift/origin/blob/master/pkg/build/apis/build/types.go#L403), there is no deleted phase.

So my naive question is: is it enough to react to the canceled event? If the build is running and someone deletes it, then the build is canceled. If it is deleted when done, then we have already received the complete event, so we should not need to modify anything.

WDYT?

@chmouel @vpavlin
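
A note on where deletion shows up: in the Kubernetes/OpenShift watch API, each notification is an envelope with a `type` (ADDED, MODIFIED, DELETED, ...) and the `object` itself, so a deletion is signalled by the event type rather than by `status.phase`. The Go sketch below shows one way to handle both. The `cache` type and its `handle` method are hypothetical, and it assumes the Idler's watcher has access to the event envelope, which this thread does not confirm.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// watchEvent is the envelope the watch API wraps around each object.
type watchEvent struct {
	Type   string `json:"type"` // ADDED, MODIFIED, DELETED, ...
	Object struct {
		Metadata struct {
			Name      string `json:"name"`
			Namespace string `json:"namespace"`
		} `json:"metadata"`
		Status struct {
			Phase string `json:"phase"`
		} `json:"status"`
	} `json:"object"`
}

// cache maps a namespace to the name of the build considered active.
// Hypothetical simplification of the Idler's per-user model.
type cache map[string]string

// handle updates the cache from one raw watch event.
func (c cache) handle(raw []byte) error {
	var ev watchEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return err
	}
	ns, name := ev.Object.Metadata.Namespace, ev.Object.Metadata.Name
	phase := ev.Object.Status.Phase
	switch {
	case ev.Type == "DELETED":
		// The object is gone, whatever phase it was last seen in.
		if c[ns] == name {
			delete(c, ns)
		}
	case phase == "New" || phase == "Pending" || phase == "Running":
		c[ns] = name
	default:
		// Terminal phases: Complete, Failed, Error, Cancelled.
		if c[ns] == name {
			delete(c, ns)
		}
	}
	return nil
}

func main() {
	c := cache{}
	running := `{"type":"MODIFIED","object":{"metadata":{"name":"march1test-2","namespace":"ldimaggi-osiotest2"},"status":{"phase":"Running"}}}`
	deleted := `{"type":"DELETED","object":{"metadata":{"name":"march1test-2","namespace":"ldimaggi-osiotest2"},"status":{"phase":"Running"}}}`
	for _, raw := range []string{running, deleted} {
		if err := c.handle([]byte(raw)); err != nil {
			fmt.Println("bad event:", err)
			continue
		}
		fmt.Println("active builds tracked:", len(c))
	}
}
```

In this sketch a build deleted while still "Running" is dropped on the DELETED event, so the cache does not depend on a canceled or complete phase ever arriving.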

@ldimaggi
Author

Followup question - is the build automatically deleted when done?

@kishansagathiya
Member

Followup question - is the build automatically deleted when done?

Nope, which is why we are able to see all pipeline runs in OSIO and OpenShift.
But that is fairly obvious, so am I misunderstanding your question?

@kishansagathiya
Member

so there is no deleted phase

If there is a build with phase deleted, it isn't really deleted, is it?

@lordofthejars
Contributor

lordofthejars commented Jul 24, 2018 via email

@kishansagathiya
Member

I was able to reproduce this.

@kishansagathiya
Member

kishansagathiya commented Sep 26, 2018

So, all the build-related data is stored in userIdler, which is kept in memory. It takes some time for a recent change to be reflected in userIdler. So if you call the info API immediately after resetting the environment, it returns the old data that is still in memory. Given some time, this should change to the current state.

This issue is consistent and easily reproducible. After resetting the environment I saw old build data. I ran a new build and this is what I saw after that.

[kishansagathiya@localhost fabric8-jenkins-idler]$ curl http://localhost:8080/api/idler/info/ksagathi-preview
{"error": "Could not find queried namespace"}
[kishansagathiya@localhost fabric8-jenkins-idler]$ curl http://localhost:8080/api/idler/info/ksagathi-preview
{"error": "Could not find queried namespace"}
[kishansagathiya@localhost fabric8-jenkins-idler]$ curl http://localhost:8080/api/idler/info/ksagathi-preview
{"Name":"ksagathi-preview","ID":"7219a11c-f86a-4db1-ab3e-83216ff53009","ActiveBuild":{"metadata":{"annotations":{},"Generation":0},"status":{"phase":"New","startTimestamp":{"Time":"0001-01-01T00:00:00Z"},"completionTimestamp":{"Time":"0001-01-01T00:00:00Z"}},"spec":{"replicas":0,"Strategy":{"Type":""}}},"DoneBuild":{"metadata":{"name":"app-test-10-1","namespace":"ksagathi-preview","annotations":{"openshift.io/build.number":"1","openshift.io/jenkins-namespace":"ksagathi-preview-jenkins"},"Generation":0},"status":{"phase":"Complete","startTimestamp":{"Time":"2018-09-25T10:38:25Z"},"completionTimestamp":{"Time":"2018-09-25T13:35:27Z"}},"spec":{"replicas":0,"Strategy":{"Type":"JenkinsPipeline"}}},"JenkinsLastUpdate":"0001-01-01T00:00:00Z","IdleStatus":{"Timestamp":"0001-01-01T00:00:00Z","Success":false,"Reason":""}}

@kishansagathiya
Member

blocked on openshiftio/openshift.io#4356

@kishansagathiya
Member

Still blocked as prod-preview is down

@kishansagathiya
Member

Not blocked anymore

@kishansagathiya
Member

upstream issue filed for this openshift/origin#21112

@kishansagathiya kishansagathiya removed their assignment Dec 5, 2018
@hrishin hrishin added this to the 159/L milestone Dec 21, 2018
@hrishin hrishin modified the milestones: 159/L, 160/L Jan 7, 2019