
lints
maniwani committed May 1, 2021
1 parent e4a71b6 commit cdd0586
Showing 3 changed files with 88 additions and 35 deletions.
24 changes: 19 additions & 5 deletions implementation_details.md

# Implementation Details

## `Connection` != `Player`

I know I've been using the terms "client" and "player" somewhat interchangeably, but `Connection` and `Player` should be separate tokens. There's no benefit in forcing one player per connection. Having `Player` be its own thing makes it easier to do stuff like online splitscreen, temporarily fill team slots with bots, etc.

## "Clock" Synchronization

Ideally, clients predict ahead by just enough to have their inputs reach the server right before they're needed. People often try to have clients estimate the clock time on the server (with some SNTP handshake) and use that to schedule the next simulation step, but that's overly complex.

What we really care about is: How much time passes between when the server receives my input and when that input is consumed? If the server simply tells clients how long their inputs are waiting in its buffer, the clients can use that information to converge on the correct lead.
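That convergence can be sketched in a few lines, assuming the server reports how long inputs waited in its buffer (all names here are illustrative, not a proposed API):

```rust
/// Nudge the client's prediction lead toward keeping inputs buffered on the
/// server for `target_buffer` ticks. `reported_buffer` is what the server
/// told us; a negative value would mean inputs arrived late.
fn adjust_lead(current_lead: f64, reported_buffer: f64, target_buffer: f64) -> f64 {
    // Move only a fraction of the error each update to avoid oscillation.
    let error = reported_buffer - target_buffer;
    current_lead - 0.1 * error
}
```

Applied repeatedly, this closes the gap geometrically, with no server-clock estimation involved.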

The key idea here is that simplifying the client-server relationship makes the problem easier. You *could* have the server apply inputs whenever they arrive, rolling back if necessary, but that would only complicate things. If the server never accepts late inputs and never changes its pace, no one needs to coordinate.

## Prediction <-> Interpolation

Clients can't directly modify the authoritative state, but they should be able to predict whatever they want locally. One obvious implementation is to literally fork the latest authoritative state. If copying the full state ends up being too expensive, we can probably use a copy-on-write layer.

My current idea to shift components between prediction and interpolation is to default to interpolated (reset upon receiving a server update) and then use specialized change detection `DerefMut` magic to flag as predicted.

```rust
Predicted<T>
PredictAdded<T>
PredictRemoved<T>

Confirmed<T>
ConfirmAdded<T>
ConfirmRemoved<T>

Cancelled<T>
CancelAdded<T>
CancelRemoved<T>
```

Everything is predicted by default, but users can opt out by filtering on `Predicted<T>`. In the more conservative cases, clients would predict the entities driven by their input, the entities they spawn (until confirmed), and any entities mutated as a result of the first two. Systems with filtered queries (e.g. physics, path-planning) should typically run last.

We can also use these filters to generate events that only trigger on authoritative changes and events that trigger on predicted changes to be confirmed or cancelled later. The latter are necessary for handling sounds and particle effects. Those shouldn't be duplicated during rollbacks and should be faded out if mispredicted.

Should UI be allowed to reference predicted state or only verified state?

## Predicting Entity Creation

This requires some special consideration.

The naive solution is to have clients spawn dummy entities. When an update that confirms the result arrives, clients can simply destroy the dummy and spawn the true entity. IMO this is a poor solution because it prevents clients from smoothly blending these entities from predicted time into interpolated time. It won't look right.

A better solution is for the server to assign each networked entity a global ID.

- A more extreme solution would be to somehow bake global IDs directly into the memory allocation. If memory layouts are mirrored, relative pointers become global IDs, which don't need to be explicitly written into packets. This would save 4-8 bytes per entity before compression.

## Smooth Rendering

Rendering should come after `NetworkFixedUpdate`.

Whenever clients receive an update with new remote entities, those entities shouldn't be rendered until that update is interpolated.

We'll also need to distinguish instant motion from integrated motion when interpolating.

Is an exponential decay enough for smooth error correction or are there better algorithms?
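As a baseline answer, a half-life parameterization of exponential decay is frame-rate independent and easy to tune. A sketch, not a committed design (names are illustrative):

```rust
/// Shrink the visual error offset so that half of it remains after each
/// `half_life` seconds, regardless of frame rate.
fn decay_error(offset: f32, half_life: f32, dt: f32) -> f32 {
    offset * 0.5f32.powf(dt / half_life)
}
```

Critically damped springs are a common alternative when plain decay feels too floaty near zero.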

## Lag Compensation

Lag compensation deals with colliders. To avoid weird outcomes, lag compensation needs to run after all motion and physics systems.

Again, people often imagine having the server estimate what interpolated state the client was looking at based on their RTT, but we can resolve this without any guesswork.

Clients can just tell the server what they were looking at by bundling the interpolated tick numbers and the blend value inside the input payloads. With this information, the server can reconstruct *exactly* what each client saw.

```plaintext
<packet header>
tick number (predicted)
tick number (interpolated from)
interpolation blend value
```
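Given those fields, the reconstruction is just a lookup into the history buffer plus the same lerp the client used. A sketch with illustrative names and a flat position history (a real implementation would rewind whole colliders):

```rust
/// Rebuild the exact position a client saw for one collider, given the two
/// interpolated tick numbers and blend value it reported in its input packet.
fn reconstruct_view(
    history: &[(u64, [f32; 3])], // (tick, position) pairs, hypothetical layout
    from_tick: u64,
    to_tick: u64,
    blend: f32,
) -> Option<[f32; 3]> {
    let pos = |t: u64| history.iter().find(|(tick, _)| *tick == t).map(|(_, p)| *p);
    let (a, b) = (pos(from_tick)?, pos(to_tick)?);
    // Same lerp the client's interpolation used.
    Some([
        a[0] + (b[0] - a[0]) * blend,
        a[1] + (b[1] - a[1]) * blend,
        a[2] + (b[2] - a[2]) * blend,
    ])
}
```

Returning `None` when a tick has fallen out of the history buffer corresponds to refusing compensation for clients lagging too far behind.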

So there are two ways to go about the actual compensation:

- Compensate upfront by bringing new projectiles into the present (similar to a rollback).
- Compensate over time ("amortized"), constantly testing projectiles against the history buffer.

For clients with too-high ping, their interpolation will lag far behind their prediction.

When a player is parented to another entity that they have no control over (e.g. the player is a passenger in a vehicle), the non-predicted movement of that parent must be rewound during compensation so that any projectiles fired by the player spawn in the correct location.

## Unconditional Rollbacks

Every article on "rollback netcode" and "client-side prediction and server reconciliation" encourages having clients compare their predicted state to the authoritative state and reconciling *if* they mispredicted. But how do you actually detect a mispredict?

I thought of two methods while I was writing this:

1. Unordered scan looking for first difference.
2. Ordered scan to compute checksum and compare.

The first option has an unpredictable speed. The second option requires a fixed walk of the game state (checksums *are* probably worth having even if only for debugging non-determinism). There may be options I didn't consider, but the point I'm trying to make is that detecting changes among large numbers of entities isn't cheap.
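For illustration, option 2 could be a fixed-order walk feeding a simple hash. FNV-1a is used here purely as a stand-in; a real implementation would walk the actual component storages:

```rust
/// Checksum a deterministic walk of (component id, bytes) pairs with FNV-1a.
/// Client and server compare the result per tick to detect divergence.
fn state_checksum(state: &[(u32, Vec<u8>)]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for (id, bytes) in state {
        for byte in id.to_le_bytes().iter().chain(bytes.iter()) {
            hash ^= *byte as u64;
            hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
        }
    }
    hash
}
```

Note the walk order must be identical on both sides, which is exactly the "fixed walk" cost described above.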

Let's consider a simpler default:

Now, you may think that's wasteful, but I would say "if mispredicted" gives you a false sense of security. Mispredictions can occur at any time, *especially* during long-lasting complex physics interactions. It's much easier to profile and optimize for your worst-case if clients *always* rollback and re-sim. It's also more memory-efficient, since clients never need to store old predicted states.

## Delta-Compressed Snapshots

- The server keeps an incrementally updated copy of the networked state.
- Components are stored with their global ID instead of the local ID.
- The server keeps a ring buffer of "patches" for the last `N` snapshots.
- Pass compressed payloads to protocol layer.
- Protocol and I/O layers do whatever they do and send the packet.

## Interest-Managed Updates

TODO

## Messages

TODO

Messages are best for sending global alerts and any gameplay mechanics you explicitly want modeled as request-reply (or one-way) interactions. They can be unreliable or reliable. You can also postmark messages to be executed on a certain tick like inputs. That can only be best effort, though.
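The postmark semantics might look like this (hypothetical types, not a proposed API):

```rust
/// A message stamped with the tick it should execute on.
struct Postmarked {
    execute_tick: u64,
    payload: Vec<u8>,
}

/// Best effort: run on the requested tick if it hasn't passed, else run now.
fn resolve_tick(msg: &Postmarked, current_tick: u64) -> u64 {
    msg.execute_tick.max(current_tick)
}
```

Unlike inputs, a late-arriving message can't trigger a rollback, which is why the tick can only be honored best-effort.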

The example I'm thinking of is buying items from an in-game vendor. The server doesn't simulate UI, but ideally we can write the message transaction in the same system. A macro might end up being the most ergonomic choice.
48 changes: 34 additions & 14 deletions networked_replication.md

This RFC proposes an implementation of engine features for developing networked multiplayer games.

## Motivation

Networking is unequivocally the most lacking feature in all general-purpose game engines.

While most engines provide low-level connectivity—virtual connections, optionally reliable UDP channels, rooms—almost none of them ([except][1] [Unreal][2]) provide high-level *replication* features like prediction, interest management, or lag compensation, which are necessary for most networked multiplayer games.

This broad absence of first-class replication features stifles creative ambition.

Bevy's ECS opens up the possibility of providing a near-seamless, generalized networking API.

What I hope to explore in this RFC is:

- What game design choices and constraints does networking add?
- How does ECS make networking easier to implement?
- What should developing a networked multiplayer game in Bevy look like?

As a user, you only have to annotate your gameplay-related components and systems.

> Game design should (mostly) drive networking choices. Future documentation could feature a questionnaire to guide users to the correct configuration options for their game. Genre and player count are generally enough to decide.

The core primitive here is the `Replicate` trait. All instances of components and resources that implement this trait will be automatically detected and synchronized over the network. Simply adding a `#[derive(Replicate)]` should be enough in most cases.

```rust
#[derive(Replicate)]
struct Health {
    hp: u32,
}
```

By default, both client and server will run every system you add to `NetworkFixedUpdate`. If you want systems or code snippets to run exclusively on one or the other, you can annotate them with `#[client]` or `#[server]` for the compiler.

```rust
fn ball_movement_system(/* ... */)
```

For more nuanced runtime cases—say, an expensive movement system that should only process the local player entity on clients—you can use the `Predicted<T>` query filter. If you need an explicit request or notification, you can use `Message` variants.

```rust
fn update_player_velocity(
    mut q: Query<(&Player, &mut Rigidbody)>)
{
    // ...
}
```

Bevy can configure an `App` to operate in several different network modes.

| Mode | Playable? | Authoritative? | Open to connections? |
| :--- | :---: | :---: | :---: |
| Client | ✓ | ✗ | ✗ |
| Standalone | ✓ | ✓ | ✗ |
| Listen Server | ✓ | ✓ | ✓ |
| Dedicated Server | ✗ | ✓ | ✓ |
| Relay | ✗ | ✗ | ✓ |

<br>

```rust
// TODO: Example App configuration.
```

## Implementation Strategy

[Link to more in-depth implementation details (more of an idea dump atm).](../main/implementation_details.md)

### Requirements

- `ComponentId` (and maybe the other `*Ids`) should be stable between clients and the server.
- Must have a means to isolate networked and non-networked state.
- `World` should be able to reserve an `Entity` ID range, with separate storage metadata.
- Networked components must only be mutated inside `NetworkFixedUpdate`.
- The ECS scheduler should support nested loops.
- (I'm pretty sure this isn't an actual blocker, but the workaround feels a little hacky.)

### The Replicate Trait

```rust
// TODO
impl Replicate for T {
    // ...
}
```

### Specialized Change Detection

```rust
// TODO
// Predicted<T> (+ Added<T> and Removed<T> variants)
```

### Rollback via Run Criteria

```rust
/*
TODO
The "outer" loop is the number of fixed update steps as determined by the fixed timestep accumulator.
The "inner" loop is the number of steps to re-simulate.
*/
```

### NetworkFixedUpdate

Clients

1. Iterate received server updates.
2. Update simulation and interpolation timescales.
3. Sample inputs and push them to send buffer.
4. Rollback and re-sim *if* a new update was received.
5. Simulate predicted tick.

Server

1. Iterate received client inputs.
2. Sample buffered inputs.
3. Simulate authoritative tick.
4. Duplicate state changes to copy.
5. Push client updates to send buffer.

Everything aside from the simulation steps could be auto-generated.
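The client steps above might reduce to a skeleton like this, with the game state collapsed to an integer just to show the control flow (every name is illustrative):

```rust
/// One `NetworkFixedUpdate` on the client: restore the newest authoritative
/// state if one arrived, re-simulate up to the predicted tick, then step once.
fn client_fixed_update(
    predicted_state: u64,
    new_authoritative: Option<(u64, u64)>, // (auth state, ticks to re-sim)
    step: fn(u64) -> u64,
) -> u64 {
    let state = match new_authoritative {
        // 4. Rollback and re-sim, since a new update was received.
        Some((auth_state, resim_ticks)) => (0..resim_ticks).fold(auth_state, |s, _| step(s)),
        None => predicted_state,
    };
    // 5. Simulate the predicted tick.
    step(state)
}
```

Everything except the `step` calls is mechanical, which is what makes auto-generation plausible.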

### Saving Game State

- At the end of each fixed update, server iterates `Changed<T>` and `Removed<T>` for all replicable components and duplicates them to an isolated copy.
- Could pass this copy to another thread to do the serialization and compression.
- This copy has no `Table<T>`, those would be rebuilt by the client.
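A sketch of that duplication step with the copy flattened into a map (illustrative layout; the real copy would mirror component storages without `Table<T>`):

```rust
use std::collections::HashMap;

/// Apply one tick's worth of `Changed<T>`/`Removed<T>` results to the
/// isolated networked-state copy, keyed by (entity id, component id).
fn apply_changes(
    copy: &mut HashMap<(u64, u32), Vec<u8>>,
    changed: &[(u64, u32, Vec<u8>)],
    removed: &[(u64, u32)],
) {
    for (entity, component, bytes) in changed {
        copy.insert((*entity, *component), bytes.clone());
    }
    for key in removed {
        copy.remove(key);
    }
}
```

Because only changed entries are touched, the copy stays incrementally updated and can be handed to a serialization thread wholesale.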

### Preparing Server Packets

- Snapshots (full state updates) will use delta compression and manual fragmentation.
- Eventual consistency (partial state updates) will use interest management.
- Both will most likely use the same data structure.

### Restoring Game State

- At the beginning of each fixed update, the client decodes the received update and generates the latest authoritative state.
- Client then uses this state to write its local prediction copy that has all the tables and non-replicable components.

## Drawbacks

- Lots of potentially cursed macro magic.
- Direct writes to `World`.
- Seemingly limited to components that implement `Clone` and `Serialize`.

## Rationale and Alternatives

### Why *this* design?

Networking is a widely misunderstood problem domain. The proposed implementation should suffice for most games while minimizing design friction—users need only annotate gameplay-related components and systems, put those systems in `NetworkFixedUpdate`, and configure some settings.

Polluting the API with "networked" variants of structs and systems (aside from `Transform`, `Rigidbody`, etc.) would just make life harder for everybody, both game developers and Bevy maintainers. IMO the ease of macro annotations is worth any increase in compile times when networking features are enabled.

### Why should Bevy provide this?

People who want to make multiplayer games want to focus on designing their game and not worry about how to implement prediction, how to serialize their game, how to keep packets under MTU, etc. Having these come built-in would be a huge selling point.

### Why not wait until Bevy is more mature?

It'll only grow more difficult to add these features as time goes on. Take Unity, for example. Its built-in features are too non-deterministic and its only working solutions for state transfer are paid third-party assets. Thus far, said assets cannot integrate deeply enough to be transparent (at least not without substituting parts of the engine).

### Why does this need to involve `bevy_ecs`?

I strongly doubt that fast, efficient, and transparent replication features can be implemented without directly manipulating a `World` and its component storages. We may need to allocate memory for networked data separately.

## Unresolved Questions

- Can we provide lints for undefined behavior like mutating networked state outside of `NetworkFixedUpdate`?
- Do rollbacks break change detection or events?
- ~~When sending partial state updates, how should we deal with weird stuff like there being references to entities that haven't been spawned or have been destroyed?~~ Already solved by generational indexes.
- How should UI widgets interact with networked state? React to events? Exclusively poll verified data?
- How should we handle correcting mispredicted events and FX?
- Can we replicate animations exactly without explicitly sending animation data?

## Future Possibilities

- With some tools to visualize game state diffs, these replication systems could help detect non-determinism in other parts of the engine.
- Much like how Unreal has Fortnite, Bevy could have an official (or curated) collection of multiplayer samples to dogfood these features.
- Bevy's future editor could automate most of the configuration and annotation.
- Replication addresses all the underlying ECS interop, so it should be settled first. But beyond replication, Bevy need only provide one good default for protocol and I/O for the sake of completeness. I recommend dividing crates at least to the extent shown below to make it easy for developers to swap the low-level transport with [whatever][3] [alternatives][4] [they][5] [want][7].

| `bevy::net::replication` | `bevy::net::protocol` | `bevy::net::io` |
| -- | -- | -- |
| <ul><li>save and restore</li><li>prediction</li><li>serialization</li><li>delta compression</li><li>interest management</li><li>visual error correction</li><li>lag compensation</li><li>statistics (high-level)</li></ul> | <ul><li>(N)ACKs</li><li>reliability</li><li>virtual connections</li><li>channels</li><li>encryption</li><li>statistics (low-level)</li></ul> | <ul><li>send</li><li>recv</li><li>poll</li></ul> |


[1]: https://youtu.be/JOJP0CvpB8w "Unreal Networking Features"
[2]: https://www.unrealengine.com/en-US/tech-blog/replication-graph-overview-and-proper-replication-methods "Unreal Replication Graph Plugin"
[3]: https://github.com/quinn-rs/quinn
[4]: https://partner.steamgames.com/doc/features/multiplayer
[5]: https://developer.microsoft.com/en-us/games/solutions/multiplayer/
[6]: https://dev.epicgames.com/docs/services/en-US/Overview/index.html
[7]: https://docs.aws.amazon.com/gamelift/latest/developerguide/gamelift-intro.html
