
Epic: make autoscaling production-ready #2

Closed · 20 of 42 tasks
sharnoff opened this issue Dec 12, 2022 · 5 comments

Assignees: sharnoff
Labels: c/compute Component: compute · c/infra Component: infrastructure · p/cloud Product: Neon Cloud · t/Epic Issue type: Epic

Comments

sharnoff (Member) commented Dec 12, 2022

This is a collection of tasks from this repo & others that I'm pretty sure will be required.

This does not include required tasks to get VMs & autoscaling working on staging.

If there's something missing, feel free to add it to the appropriate task group.

DoD

All non-optional tasks implemented and questions resolved.

Optional tasks would be good to have implemented, but are not strictly necessary for deploying to production.

Tasks — autoscaling (this repo!)


  • optional: Support changing VM limits without restarting autoscaler-agent
    • probably best to switch autoscaler-agent to a DaemonSet first (done)
  • optional: agent, plugin: Add "dump state" endpoints #76
    • optional: Provide separate tool to analyze this state
  • optional: Scheduler live config updates
  • optional: autoscaler-agent live config updates
  • optional: Migrate away from under-utilized nodes to allow cluster autoscaler to remove them.
  • optional: Add disk as resource type (required for handling local filesystem cache)
    • optional: Do we also need high disk usage (or: low disk usage) signals, like memory?

Tasks — neonvm

Tasks — neon and neondatabase/postgres

  • Yet another port of local file system cache neon#2622
    • Resize cache when decreasing disk size? Should be handled by VM informant
    • Report disk usage with/without cache (so we can resize)
  • optional: Finalize decision on shared_buffers
  • optional: Report when e.g. vacuum is running, so it doesn't influence scaling decisions

Tasks — cloud

  • optional: Update NeonVM resource limits (if compute active) when endpoint limits change

Tasks — console

  • optional: Live view of current resource allocation
    • There are some neat options here: the autoscaler-agent could report the reason for its scaling decisions and we could display that here (e.g. "high load, wanted more CPU", "low load, but high memory usage is preventing downscale"); see the sketch after this list for one possible shape of such a report.
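
A purely hypothetical sketch of what such a report could look like, in Go; the struct and field names below are invented for illustration and are not part of the agent's actual API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScalingDecisionReport is an invented shape for the kind of "reason for
// scaling" message the autoscaler-agent could emit; none of these names
// come from the actual codebase.
type ScalingDecisionReport struct {
	Wanted    string  `json:"wanted"`    // e.g. "more CPU", "no change"
	Reason    string  `json:"reason"`    // e.g. "high load"
	BlockedBy string  `json:"blockedBy"` // e.g. "high memory usage is preventing downscale"
	VCPU      float64 `json:"vCPU"`      // current CPU allocation
	MemSlots  int     `json:"memSlots"`  // current memory allocation, in slots
}

func main() {
	// Example report the console could render as "high load, wanted more CPU".
	report := ScalingDecisionReport{
		Wanted:   "more CPU",
		Reason:   "high load",
		VCPU:     2,
		MemSlots: 8,
	}
	out, _ := json.MarshalIndent(report, "", "  ")
	fmt.Println(string(out))
}
```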

Tasks — infra team

I don't know what's needed here yet. If anyone on the infra team has ideas, it would be helpful to add them here to replace this paragraph. (See also: "What's required for sentry integration?")

Further questions

  • Can node resource limits change at runtime?
  • What happens if memory is scaled down below current usage? (e.g. can this kill postmaster?)
  • How do we handle VM migration failure?
    • What are possible failure causes?
  • How can we minimize downtime during scheduler upgrades?
  • What's required for sentry integration?

Other related tasks and Epics

@sharnoff sharnoff added c/infra Component: infrastructure c/storage/compute p/cloud Product: Neon Cloud t/Epic Issue type: Epic labels Dec 12, 2022
@sharnoff sharnoff self-assigned this Dec 12, 2022
@seymourisdead seymourisdead added the c/console Component: console label Jan 5, 2023
seymourisdead (Member) commented:

console this week: remove the default option for provisioner

sharnoff (Member, Author) commented:

Marked disk scaling as optional to reflect the outcome of our sync earlier this week.

bayandin pushed a commit that referenced this issue Feb 23, 2023
@stepashka stepashka removed the c/console Component: console label Mar 1, 2023
bayandin pushed a commit that referenced this issue Mar 22, 2023
bayandin pushed a commit that referenced this issue Mar 22, 2023
stepashka (Member) commented:

Shall we close this, @vadim2404?

yaoyinnan commented Jul 12, 2023

@stepashka @vadim2404 @sharnoff Can the current autoscaling be used in production? I don't currently see documentation for deploying PostgreSQL on Kubernetes with the Neon storage-layer components (Safekeeper, Pageserver), and the cluster topology cannot be configured in autoscaling. Is there any relevant information? Please share as much as you can. Thanks.

vadim2404 commented:

We already use it in production. Regarding documentation: yes, indeed, we need more coverage of the corresponding deployment. Also, one hidden component sits behind the https://console.neon.tech/ domain, and it actually does the whole orchestration.

fprasx added a commit that referenced this issue Aug 21, 2023
This commit has 2 main benefits:
1) It makes it impossible to access nextTransactionID non-atomically
2) It fixes a small bug where we would have racy (albeit atomic)
   accesses to nextTransactionID. Consider the following interleaving:

   Dispatcher Calls #1 and #2:
        Read nextTransactionID

   Dispatcher Calls #1 and #2:
        Bump nextTransactionID *locally* and then write it back. The
        same value is written back twice.

   Dispatcher Calls #1 and #2:
        Send a message with the newly minted transaction ID, x. Note,
        *two* messages are sent with x! So two responses will come back.

   First response arrives:
        Entry is deleted from the dispatcher's waiters hash map.

   Second response arrives:
        Received a message with ID x, but no record of it, because the
        entry was deleted when the first message arrived.

   The solution is just to use an atomic read-modify-write operation in
   the form of .Add(1)
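
For illustration, a minimal Go sketch of that fix, using a hypothetical stripped-down Dispatcher (names are illustrative, not the actual repo code): the atomic Add(1) performs the read-modify-write in one step, so two concurrent calls can never mint the same transaction ID.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Dispatcher is a hypothetical, stripped-down stand-in for the real type;
// only the transaction-ID counter is shown.
type Dispatcher struct {
	lastTransactionID atomic.Uint32
}

// nextID returns a fresh transaction ID. Add(1) is an atomic
// read-modify-write, so two concurrent callers can never observe the same
// value, unlike a separate load followed by a store, which lets both
// callers read the old value and mint duplicate IDs.
func (d *Dispatcher) nextID() uint32 {
	return d.lastTransactionID.Add(1)
}

func main() {
	d := &Dispatcher{}
	var wg sync.WaitGroup
	var seen sync.Map
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			id := d.nextID()
			// With the racy load-then-store pattern this could fire;
			// with Add(1) it never does.
			if _, dup := seen.LoadOrStore(id, true); dup {
				fmt.Println("duplicate transaction ID:", id)
			}
		}()
	}
	wg.Wait()
	fmt.Println("IDs handed out:", d.lastTransactionID.Load())
}
```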
fprasx added a commit that referenced this issue Aug 21, 2023
* make nextTransactionID an atomic variable (the same change as described in the commit message above)

* protect disp.waiters with mutex

disp.Call can be called from multiple threads (the main disp.run()
thread, and the healthchecker thread), so access needs to be guarded
with a mutex as the underlying map is not thread safe.

* rename nextTransactionID to lastTransactionID
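
Similarly, a minimal sketch of the mutex-guarded waiters map described above, with assumed names rather than the real types: since a plain Go map is not safe for concurrent mutation, every access from the goroutines that can call into the dispatcher goes through the mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// dispatcher is a hypothetical, stripped-down stand-in: it only shows the
// locking pattern, not the real message handling.
type dispatcher struct {
	mu      sync.Mutex
	waiters map[uint32]chan string // transaction ID -> response channel
}

// register records a waiter for the given transaction ID. The mutex is
// required because Go maps are not safe for concurrent mutation, and this
// can run from multiple goroutines (e.g. the main run loop and a health
// checker).
func (d *dispatcher) register(id uint32) chan string {
	ch := make(chan string, 1)
	d.mu.Lock()
	d.waiters[id] = ch
	d.mu.Unlock()
	return ch
}

// resolve delivers a response and removes the waiter entry, again under
// the mutex.
func (d *dispatcher) resolve(id uint32, msg string) {
	d.mu.Lock()
	ch, ok := d.waiters[id]
	delete(d.waiters, id)
	d.mu.Unlock()
	if ok {
		ch <- msg
	}
}

func main() {
	d := &dispatcher{waiters: make(map[uint32]chan string)}
	ch := d.register(1)
	go d.resolve(1, "response for transaction 1")
	fmt.Println(<-ch)
}
```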
Omrigan added a commit that referenced this issue May 24, 2024
Omrigan added a commit that referenced this issue May 24, 2024
The virtio-serial interface can be opened only once.
Consider the following scenario:

1. Process #1 starts writing to the serial device.
2. Process #1 spawns a fork, Process #2. It inherits the open file descriptor.
3. Process #1 dies; Process #2 survives and preserves the file descriptor.
4. Process #1 is restarted but cannot open the serial device again,
   causing it to crash-loop.

To fix this, we create a FIFO special file, which supports multiple
writers, and spawn cat to redirect it into the virtio-serial port.

Signed-off-by: Oleg Vasilev <[email protected]>
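
A rough Go sketch of that approach, with hypothetical paths (the real FIFO and device locations will differ): create a FIFO, open the virtio-serial port once, and run cat to copy whatever is written into the FIFO onto the port. Because any number of processes can open the FIFO for writing, a crashed and restarted writer no longer fights over the single serial descriptor.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// Paths are illustrative only; the real FIFO and virtio-serial port
// locations may differ.
const (
	fifoPath   = "/tmp/log.fifo"
	serialPath = "/dev/virtio-ports/log"
)

func main() {
	// Create the FIFO. Unlike the virtio-serial port, a FIFO can be opened
	// by any number of writers, so a restarted process (or a fork holding
	// an inherited descriptor) can't block others from writing.
	if err := syscall.Mkfifo(fifoPath, 0o600); err != nil && !os.IsExist(err) {
		fmt.Fprintln(os.Stderr, "mkfifo:", err)
		os.Exit(1)
	}

	// Open the virtio-serial port exactly once, then spawn `cat` to copy
	// everything written to the FIFO into it.
	serial, err := os.OpenFile(serialPath, os.O_WRONLY, 0)
	if err != nil {
		fmt.Fprintln(os.Stderr, "open serial:", err)
		os.Exit(1)
	}
	defer serial.Close()

	// Note: a real supervisor would keep the FIFO open for reading (or
	// restart cat) so that EOF from the last writer doesn't stop the copy.
	cat := exec.Command("cat", fifoPath)
	cat.Stdout = serial
	cat.Stderr = os.Stderr
	if err := cat.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "cat:", err)
		os.Exit(1)
	}
}
```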
@stepashka stepashka added the c/compute Component: compute label Jun 21, 2024