RFC: Possibility to store operational issues to a machine #41

Open
majst01 opened this issue Mar 20, 2020 · 7 comments

Comments

@majst01
Contributor

majst01 commented Mar 20, 2020

During normal operations, machines sometimes fail in ways such as:

  • hard disk errors
  • network card with a duplicate MAC address
  • cabling
  • power supply failure
  • etc.

It might be a good idea to track such issues per machine. To do so, we could either:

  • create an issue in our private issue tracker and add an "open issues" field to the machine
  • stay independent of an external issue tracker, but then most of its functionality would have to be re-implemented here, which is not an option
  • more options welcome...

To me, adding a MachineIssue type feels like the right approach.

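type MachineIssue struct {
   MachineID   string    // ID of the affected machine
   Description string    // short free-text summary of the problem
   URL         string    // link to the issue in the external tracker
   CreatedAt   time.Time // when the issue was recorded
   ClosedAt    time.Time // when the issue was resolved
}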

Then we can add the following metalctl command:

metalctl machine issue add <machineID> --description "nvme disk timeout" --issueurl "https://github.com/metal-stack/metal-api/issues/2" 

Conversely, the machine listing would flag machines that have issues:

metalctl machine issues
ID                                    LAST EVENT   WHEN  AGE  HOSTNAME  PROJECT  SIZE  IMAGE  PARTITION  ISSUE              ISSUEURL
00000000-0000-0000-0000-ac1f6b2d34a4  Preparing ↻  4s                                         fra-equ01  nvme disk timeout  https://github.com/metal-stack/metal-api/issues/2
@majst01
Contributor Author

majst01 commented Mar 20, 2020

/cc @Gerrit91 @mwennrich @ulrichSchreiner WDYT ?

@Gerrit91
Contributor

Gerrit91 commented Mar 23, 2020

Sorry, I do not really have a strong opinion about that. The only thing that comes to mind is that we could add this to MachineState, so that we have not only "locked" and "reserved" states but also "maintenance", "defect", or similar. I think someone from operations should say whether this would help them, @mwennrich?
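
The MachineState idea could look roughly like the sketch below. All type names, constant names, and string values here are assumptions made for illustration and are not taken from metal-api; in particular, treating only the empty state as allocatable is a choice of this sketch.

```go
package main

import "fmt"

// MState is a hypothetical stand-in for the state value of a machine.
type MState string

const (
	AvailableState   MState = ""            // no restriction (assumed default)
	LockedState      MState = "LOCKED"      // existing state mentioned in the thread
	ReservedState    MState = "RESERVED"    // existing state mentioned in the thread
	MaintenanceState MState = "MAINTENANCE" // suggested addition
	DefectState      MState = "DEFECT"      // suggested addition
)

// MachineState pairs a state value with a human-readable reason.
type MachineState struct {
	Value       MState
	Description string
}

// Allocatable reports whether a machine in this state may be handed out.
// Assumption of this sketch: every non-empty state blocks allocation.
func (s MachineState) Allocatable() bool {
	return s.Value == AvailableState
}

func main() {
	s := MachineState{Value: DefectState, Description: "nvme disk timeout"}
	fmt.Println(s.Value, s.Description, s.Allocatable())
}
```

Compared with a separate issue list, a single state value is easier to enforce during allocation but holds only one reason at a time.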

@ulrichSchreiner
Contributor

ulrichSchreiner commented Mar 31, 2020

I'm unsure about this feature. At first it sounds good, but who creates these issues? And, more importantly, how do you make sure that such issues are removed from the machine once they are resolved?

Does an issue mark the machine as defective or unusable? If not, then after some time you will have machines with many issues and no way of knowing whether any of them has already been fixed.

@majst01
Contributor Author

majst01 commented Mar 31, 2020

This could potentially be done with an issue webhook in GitLab:
https://docs.gitlab.com/ee/user/project/integrations/webhooks.html#issue-events

@ulrichSchreiner
Contributor

GitLab issues are free text, so it will be hard to connect them to a specific machine ID and make a specific REST call when an issue event happens.
I'm still missing which parts of metal-api should inspect the issues table, and why. Are such machines still allocatable?

@majst01
Contributor Author

majst01 commented Apr 16, 2020

@mwennrich do you have an opinion on this?

@majst01
Contributor Author

majst01 commented May 11, 2020

related: metal-stack/metal-hammer#17
