Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add VM informant #8

Merged
merged 14 commits into from
Jan 11, 2023
Merged

Add VM informant #8

merged 14 commits into from
Jan 11, 2023

Conversation

sharnoff
Copy link
Member

@sharnoff sharnoff commented Jan 7, 2023

Currently, this mostly does nothing. The protocol is designed so that it'll be easy to add to it later, which I'll do shortly.

Remaining TODO items before merging:

  • agent must check with the informant before downscaling and after upscaling (0bb61ba)
  • ARCHITECTURE.md needs updating (3c40d14, 8389afe)

@sharnoff sharnoff marked this pull request as ready for review January 9, 2023 02:42
@sharnoff
Copy link
Member Author

sharnoff commented Jan 9, 2023

I think this is ready to merge. It's now clear to me that there's some heavy refactoring needed in pkg/agent/run.go (and .../serve.go, added by this PR), so that the existing long-running tasks can share state & signals — in addition to the work required by #5 (which I think is mostly separate and could be done in parallel).

While the refactoring in pkg/agent/{run,serve}.go is a prerequisite to further work, I think it's more important to merge this as-is & get CI-built vm-informant binaries (see #6) into the compute images, so that we can iterate on something ­— especially considering that this autoscaler-agent requires the informant to be present, so we already have to wait to deploy it until after the updated storage images are built.

@sharnoff sharnoff mentioned this pull request Jan 9, 2023
42 tasks
build/vm-informant/Dockerfile Show resolved Hide resolved
build/vm-informant/Dockerfile Outdated Show resolved Hide resolved
build/vm-informant/build.sh Show resolved Hide resolved
cmd/vm-informant/main.go Show resolved Hide resolved
cmd/vm-informant/main.go Outdated Show resolved Hide resolved
cmd/vm-informant/main.go Outdated Show resolved Hide resolved
cmd/vm-informant/main.go Outdated Show resolved Hide resolved
cmd/vm-informant/main.go Outdated Show resolved Hide resolved
@sharnoff sharnoff mentioned this pull request Jan 9, 2023
3 tasks
@kelvich
Copy link
Contributor

kelvich commented Jan 9, 2023

Hm, do we actually need a new binary inside? Not saying that we shouldn't, but trying to understand the list of reasons.

Vector now provides metrics that agent can read, and we also can embed some UDF's in postgres to up/down scale things and agent can call them. That leaves us with postgres -> scheduler requests on spiky load. Is there anything else?

@sharnoff
Copy link
Member Author

sharnoff commented Jan 9, 2023

@kelvich

Is there anything else?

Short answer: not yet. The key benefits from having a binary inside, as I see it:

  1. We can handle "memory high" events along the lines of Christian's proposal with cgroups
  2. Calling postgres UDF(s) is through localhost (easier to harden postgres auth rules later)
  3. A binary inside has the flexibility to supply metrics that vector might not
  4. [not sure] Better tracking of disk usage? We want to track this as closely as possible so we can handle "low disk space" in the same way as (1). It's not clear to me whether this is required. Related to (3).

These are all directions for future work — none of this is implemented yet, although (1) is planned and (3) is tenative. (4) may be required.

@kelvich
Copy link
Contributor

kelvich commented Jan 9, 2023

👌

sharnoff added a commit that referenced this pull request Jan 9, 2023
This commit is just future-proofing. '-mod readonly' is the default
behavior, but wouldn't be if we had a vendor directory.

See also:
* https://go.dev/ref/mod#build-commands
* #8 (comment)
sharnoff added a commit that referenced this pull request Jan 10, 2023
Without gcc (and musl-dev), `docker build` fails with:

    #0 1.775 # runtime/cgo
    #0 1.775 cgo: C compiler "gcc" not found: exec: "gcc": executable file not found in $PATH

See also:
* #8 (comment)
Currently, this mostly does nothing. The protocol is designed so that
it'll be easy to add to it later, which we'll do.

There's a couple "todo" items that are necessary before merging that
will be handled in follow-up commits.
There's some more work required there because anyways because of merge
conflicts, but I wanted to get this out of the way beforehand.
Earlier in this PR's development, util.AddHandler actually lived in
cmd/vm-informant, and I forgot to remove the old version.
Per review comment. It's a relatively clean way to return a non-zero
exit code.
@sharnoff sharnoff force-pushed the sharnoff/vm-informant branch from 52d0587 to 7424de6 Compare January 11, 2023 00:10
@sharnoff sharnoff merged commit 65d1528 into main Jan 11, 2023
@sharnoff sharnoff deleted the sharnoff/vm-informant branch January 11, 2023 00:21
sharnoff added a commit that referenced this pull request Jan 11, 2023
sharnoff added a commit that referenced this pull request Jan 11, 2023
sharnoff added a commit that referenced this pull request Jan 16, 2023
Some out-of-date items from #4 (49e835f..59a567a) and #8 (65d1528)
bayandin pushed a commit that referenced this pull request Feb 23, 2023
* cicd cleanup

Co-authored-by: sharnoff <[email protected]>
bayandin pushed a commit that referenced this pull request Mar 22, 2023
* cicd cleanup

Co-authored-by: sharnoff <[email protected]>
bayandin pushed a commit that referenced this pull request Mar 22, 2023
* cicd cleanup

Co-authored-by: sharnoff <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants