Netcode Architectures Part 2: Rollback
In the second part of this series, we’ll be taking a look at what’s commonly referred to as “rollback” netcode architecture. It was popularized by Tony Cannon’s GGPO library and is the approach used by most competitive fighting games today.
In many ways, rollback can be seen as an extension of a classic lockstep architecture. We covered lockstep in the first part of this series and we’ll be referring to it as we go, so I recommend checking that out first if you’re unfamiliar.
Note that many netcode techniques include some element of rolling back state to reconcile incoming data, but in this article we’ll focus on what is commonly meant by a “rollback” architecture before discussing the prerequisites and limitations.
In a rollback architecture, players send their input each frame and then advance their simulation without waiting to receive input from remote players. Advancing the game in this way is called client prediction because remote input is not yet known; the client must make assumptions about what the input for remote players will be and predict what the future state of the game will be under those assumptions.
See below for an overview of the player experience and how it differs from local multiplayer:
Unlike the lockstep architecture, which provides perfect consistency from frame to frame at the cost of input delay, rollback is the opposite—it provides instantaneous response to input at the cost of consistency.
Under the Hood
In the following video, you can see what this process might look like for a 1v1 match where the players are connected directly to each other i.e., peer-to-peer without a dedicated server:
You can see that the simulation advances and renders immediately as soon as local input is sampled and sent. However, because the remote input hasn’t arrived yet, what’s displayed to the player is just a prediction. Once remote input arrives, this new information must be reconciled with that prediction—which leads to the big question: How do you reconcile input arriving now when it was intended to be applied several frames ago? You have to go back in time and fix it! That’s where rollback comes in.
Let’s say new remote input arrives for frame 5 when the client is about to render frame 7. When this happens, the client must perform the following steps:
1. Load/restore the state of the entire game as it was on frame 4, i.e., the last non-predicted frame where all inputs were known
2. Advance the game to frame 5 using the original local input for frame 5 along with the newly received remote input for frame 5
3. Predict frame 6 using the original local input for frame 6 (and predicting remote input)
4. Predict frame 7 using the original local input for frame 7 (and predicting remote input)
5. Render the result
After this process completes, the client will be displaying frame 7 as expected but now with the new remote input correctly incorporated.
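To make the reconciliation steps concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy `step` function, the `RollbackClient` class, and the choice to keep every past state in a dictionary are not taken from GGPO or any other library. Unknown remote input is predicted by reusing the most recent remote input received so far.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical toy game: the state is just (local player x, remote player x)
# and each input is a movement of -1, 0 or +1 per frame.
State = Tuple[int, int]


def step(state: State, local_input: int, remote_input: int) -> State:
    """Deterministic simulation step: each player moves by their input."""
    return (state[0] + local_input, state[1] + remote_input)


@dataclass
class RollbackClient:
    frame: int = 0  # latest simulated (possibly predicted) frame
    states: Dict[int, State] = field(default_factory=lambda: {0: (0, 0)})
    local_inputs: Dict[int, int] = field(default_factory=dict)
    remote_inputs: Dict[int, int] = field(default_factory=dict)

    def remote_input_for(self, frame: int) -> int:
        """Real remote input if known, otherwise the latest earlier one (or 0)."""
        known = [f for f in self.remote_inputs if f <= frame]
        return self.remote_inputs[max(known)] if known else 0

    def _simulate(self, frame: int) -> None:
        # Reading states[frame - 1] is the "restore" step of this sketch.
        previous = self.states[frame - 1]
        self.states[frame] = step(
            previous, self.local_inputs[frame], self.remote_input_for(frame)
        )

    def advance(self, local_input: int) -> State:
        """Sample local input and simulate the next frame without waiting."""
        self.frame += 1
        self.local_inputs[self.frame] = local_input
        self._simulate(self.frame)
        return self.states[self.frame]

    def receive_remote(self, frame: int, remote_input: int) -> State:
        """Record late remote input, then resimulate from that frame to now."""
        self.remote_inputs[frame] = remote_input
        for f in range(frame, self.frame + 1):
            self._simulate(f)  # reuses the original local input for frame f
        return self.states[self.frame]


client = RollbackClient()
for _ in range(7):
    client.advance(1)  # local player holds +1; remote is predicted as 0
for f in range(1, 5):
    client.receive_remote(f, 0)  # frames 1 to 4 are confirmed as idle
result = client.receive_remote(5, 1)  # remote started moving on frame 5
```

In this run the client is at frame 7 when the input for frame 5 arrives, so it resimulates frames 5, 6 and 7 from the saved frame 4 state and ends with the remote player three units further along than it had predicted. A real game would keep only a bounded window of saved states rather than the whole history.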
Predicting Remote Input
There’s another question to answer though: since the game advances before remote input is received, what do we consider the remote player to be pressing during those frames?
Most implementations simply carry the last known input forward, i.e., the remote player is considered to still be pushing whatever the client last heard they were. While simple, this prediction is most often correct because, even if a game advances 60 times per second, players are rarely changing their inputs at that rate.
It’s been suggested a number of times throughout the years that using AI or machine learning to make better predictions could be beneficial but I disagree for one simple reason: incorrectly predicting that a remote player takes an action is a far worse user experience than incorrectly predicting a remote player does nothing. In the former case, players would see all kinds of actions take place that suddenly abort or disappear when the remote input is finally received and reconciled. In the latter case, players won’t see any action until a remote input is received at which point the game just snaps to part-way through whatever action the remote player initiated (as demonstrated in the video above).
Following that same logic, there is one variation on the typical approach that can improve results depending on the game/context. A typical worst-case misprediction involves the player moving at their highest speed in one direction and then suddenly moving in the opposite direction e.g., holding the left thumbstick on the gamepad full-tilt in one direction and then instantly going full-tilt in the opposite direction. This results in a poor user experience because, under latency, clients predict the player continuing to move in the first direction until the remote input is received indicating the direction change. At that point, since the real and predicted player positions have been traveling away from each other during the rollback window, the difference between the last rendered position and the newly calculated position is large and this error manifests visually as teleporting or rubber-banding.
In some cases, this can be mitigated by decaying a player’s input over time based on how old that input is. For example, on the first frame you predict for a remote player you apply 100% of their last known movement input, on the second frame two-thirds, on the third frame one-third, and by the fourth frame you don’t consider the remote player’s movement input at all. The intuition here is that undershooting motion and then catching up when remote input is received tends to look a lot better than overshooting and then rubber-banding backwards. Rocket League, for example, uses this technique and you can hear it described in Jared Cone’s “It IS Rocket Science!” talk.
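The decay schedule in that example can be written as a small function. This is only a sketch of the linear schedule described above; the function name, its parameters, and the linear shape are assumptions for illustration and are not meant to represent how Rocket League implements it.

```python
def decayed_movement(last_known: float, frames_predicted: int,
                     decay_frames: int = 3) -> float:
    """Scale the last known movement input by how long it has been predicted.

    frames_predicted is 1 on the first predicted frame. With the default
    decay_frames of 3 the weights are 1, 2/3, 1/3 and then 0.
    """
    if frames_predicted < 1:
        raise ValueError("frames_predicted starts at 1")
    remaining = max(0, decay_frames - (frames_predicted - 1))
    return last_known * remaining / decay_frames


# Weights applied to a full-tilt input (1.0) over five predicted frames.
weights = [decayed_movement(1.0, n) for n in range(1, 6)]
```

Only analog movement is decayed here. Discrete inputs such as button presses would still be carried forward or dropped as a whole, since there is no meaningful fraction of a button press.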
All of the prerequisites (determinism and fixed tick rate) of the lockstep architecture apply to rollback as well, so be sure to check out the previous article for more information on those and why they are necessary. However, rollback introduces one more:
Game State Serialization
The first step of the rollback process outlined above requires loading the state of the entire game exactly as it was on an earlier frame. This means that developers looking to use a rollback architecture will need to implement some sort of serialization so the state can be saved and loaded on demand. Performance and determinism are both important considerations. In the ideal case, the entire game state is contained contiguously in memory e.g., within one big C struct, and can simply be copied in memory. Some of the earliest integrations of GGPO, for example, were within emulators which often have load/save state functionality built-in.
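As a rough illustration of the "one big struct" idea, the sketch below packs a tiny game state into a fixed binary layout and keeps recent frames in a ring buffer. The `GameState` fields, the `StateHistory` class, and the buffer capacity are all hypothetical; a real game would have far more state, but the shape of the save and load calls is the same. Integer fields are used so that a saved and restored state is bit-for-bit identical.

```python
import struct
from dataclasses import astuple, dataclass
from typing import List, Optional


@dataclass
class GameState:
    # Hypothetical fields for a 1v1 game, all plain integers.
    frame: int
    p1_x: int
    p2_x: int
    p1_health: int
    p2_health: int


# Five little-endian 32-bit signed integers, laid out contiguously.
_LAYOUT = struct.Struct("<5i")


def save_state(state: GameState) -> bytes:
    return _LAYOUT.pack(*astuple(state))


def load_state(blob: bytes) -> GameState:
    return GameState(*_LAYOUT.unpack(blob))


class StateHistory:
    """Ring buffer holding the most recent `capacity` saved frames."""

    def __init__(self, capacity: int) -> None:
        self._slots: List[Optional[bytes]] = [None] * capacity

    def store(self, state: GameState) -> None:
        self._slots[state.frame % len(self._slots)] = save_state(state)

    def restore(self, frame: int) -> GameState:
        blob = self._slots[frame % len(self._slots)]
        if blob is None:
            raise KeyError(f"frame {frame} was never saved")
        state = load_state(blob)
        if state.frame != frame:
            # The slot has since been reused by a newer frame.
            raise KeyError(f"frame {frame} is older than the rollback window")
        return state
```

The capacity of the buffer is effectively the maximum number of frames the game can roll back, so it should be sized from the worst-case latency you intend to support.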
One aspect of this prerequisite that can be particularly challenging is handling the state for audio and visual effects. Depending on the engine and available APIs, saving and restoring the state of particle effects and audio clips may not be easy or even feasible. It may also be prohibitive due to memory and/or performance constraints. One common solution for elements like these that don’t affect gameplay is to only track state indicating that they occurred. That information (or lack thereof) can then be used to trigger them and subsequently to cancel them if they were mispredicted.
For example, your game might track state indicating that a player was punched and use that state to trigger playback of an impact sound. Later, when remote input comes in, it may be that the player actually blocked the punch and thus that state may have been rolled back and now no longer exists. Its absence could then signal the game to stop that impact sound or fade it out. For effects with very short durations, the best user experience is often to just let them play out even if they are mispredicted and rolled back.
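One way to express this, sketched below under assumed names, is to compare the set of effect events implied by the current game state against the set of effects that are already playing. The event tuples and the `reconcile_effects` function are hypothetical; how an effect is actually started, stopped, or faded out depends on your engine.

```python
from typing import List, Set, Tuple

# Hypothetical effect key: (frame the event occurred on, effect name, target).
EffectEvent = Tuple[int, str, str]


def reconcile_effects(
    playing: Set[EffectEvent], current: Set[EffectEvent]
) -> Tuple[List[EffectEvent], List[EffectEvent]]:
    """Return (effects to start, effects to cancel) after a resimulation.

    `playing` holds the events whose effects have already been triggered and
    `current` holds the events present in the newly computed game state.
    """
    to_start = sorted(current - playing)
    to_cancel = sorted(playing - current)
    return to_start, to_cancel


# The punch was predicted to land, but the late remote input was a block.
playing = {(5, "punch_hit", "p2")}
after_rollback = {(5, "punch_blocked", "p2")}
to_start, to_cancel = reconcile_effects(playing, after_rollback)
```

Here the impact sound would be cancelled and the block sound started. Following the advice above, very short effects could simply be left out of the cancel list and allowed to play out.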
Challenges and Limitations
As with the prerequisites, all of the challenges and limitations of the lockstep architecture apply to rollback as well (determinism, join in progress, 3+ players) with the exception of input delay, of course, which rollback is specifically designed to mitigate. Likewise, each of the variants and solutions proposed for those issues in the lockstep article apply to a rollback architecture as well. Here, we’ll focus on the two additional challenges that rollback introduces:
The major drawback of mitigating the input delay from a lockstep architecture by performing prediction and rollback is that it introduces inconsistency. As mentioned above, this inconsistency is the result of mispredicting remote inputs. The longer the time horizon of prediction, the more likely it is that an input was mispredicted, and the further the game has diverged from the ground truth since then. It depends on the game, of course, but doing prediction in excess of 100ms to 150ms or so tends to get unplayable quickly due to the amount of inconsistency introduced. There are a couple of common options to help mitigate this:
Option 1: Design for Rollback
The first option for fighting inconsistency is to specifically design your game with rollback in mind. Under a rollback architecture, the client is typically running several frames ahead of the remote input it’s receiving. Because of this, any action initiated by the input of a remote player will be mispredicted and so the frames between the start of the action and what’s being rendered when the input is received get “skipped over” i.e., are never rendered, due to the rollback reconciliation process described above. You can see this skipping in action in the video at the top of this article.
Incorporating additional frames of animation that anticipate an action provides instant feedback to the player who pushed the button while delaying the critical frames that show the action taking effect. The goal is for remote players to skip those anticipatory frames due to rollback as described above while still seeing those critical frames that help the player understand what’s happened. The exact number of frames needed depends on the frame rate of the animation and how much latency you want to cover, though the 100ms to 150ms mentioned above is a good rule of thumb.
For example, if a character punches when you press a button, you could include several frames of the character pulling their fist back in anticipation of punching before actually extending their arm. While local players will see that whole animation play out, remote players may see the character go straight from their idle pose to the pose with their arm fully extended. By contrast, without those additional frames, remote players might only see the result of the punch e.g., health reduction, without ever having seen the punch itself! In the fighting game genre, where rollback architectures are common, these are called startup frames and you can Google that term for some excellent examples of this design. Startup frames predate the use of rollback architectures, but the introduction of rollback placed new emphasis on their presence and duration in the genre.
Option 2: Input Delay
The second option for fighting inconsistency is to reintroduce input delay. While the whole point of rollback architecture is ostensibly to eliminate input delay, in practice it’s better to settle for reducing it. As mentioned above, the inconsistency introduced with rollback gets worse as you predict further and further time horizons, quickly becoming unplayable. Given that, most implementations of rollback architectures add some amount of input delay in order to reduce inconsistency, support higher latencies, and improve performance.
The simplest and most common option is to apply a fixed input delay. For example, on the same frame that the client samples and sends the input for frame 5 it will render frame 2. This introduces 3 frames of delay between when a player presses a button and sees the result on screen but also means that 3 fewer frames of prediction/rollback are required. In many fighting games, the amount of input delay is user configurable. In some cases, input delay is even applied when playing offline so that the game feels consistent in all scenarios.
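A fixed input delay amounts to scheduling each sampled input for a later frame than the one currently on screen. The sketch below shows that bookkeeping only; the `DelayedInput` class and its method names are assumptions for illustration.

```python
from typing import Dict


class DelayedInput:
    """Schedules each sampled input `delay_frames` after the rendered frame."""

    def __init__(self, delay_frames: int, neutral: int = 0) -> None:
        self.delay_frames = delay_frames
        self.neutral = neutral  # input used for frames with nothing scheduled
        self.scheduled: Dict[int, int] = {}

    def sample(self, rendered_frame: int, raw_input: int) -> int:
        """Record input sampled while `rendered_frame` is on screen.

        Returns the frame the input will apply to, which is also the frame
        number that would be sent to the remote player.
        """
        target_frame = rendered_frame + self.delay_frames
        self.scheduled[target_frame] = raw_input
        return target_frame

    def input_for(self, frame: int) -> int:
        return self.scheduled.get(frame, self.neutral)


delay = DelayedInput(delay_frames=3)
target = delay.sample(rendered_frame=2, raw_input=1)  # applies on frame 5
```

Because the input for frame 5 is sent while frame 2 is being rendered, the remote player has three extra frames to receive it before it would otherwise have to be predicted.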
More complex behavior is also possible and can be quite beneficial. For example, in low latency scenarios where a couple of frames are sufficient to cover 100% of latency, it is often advantageous to simply use two frames of input delay and guarantee perfect consistency. At the other extreme, in high latency scenarios, adding additional prediction/rollback would introduce too much inconsistency to be playable and so it is better to apply additional input delay in that context as well. Between those two ends of the spectrum, there is a sweet spot where prediction can be used effectively to reduce input latency without sacrificing too much consistency due to misprediction.
SnapNet is highly configurable in this regard and is a useful example to illustrate this balance. By default, up to the first 50ms of latency are mitigated by input delay. So, for a game with a 60 Hz tick rate, connections with latencies of 50ms or less will have up to 3 frames of input delay applied, and no rollbacks i.e., perfect consistency. After that, up to the next 100ms are mitigated by prediction, so connections with latencies between 50ms and 150ms will have 3 frames of input delay and up to 6 frames of rollback. Finally, beyond that, any additional latency will be mitigated by applying additional input delay. For rollback-based games, we believe these settings generally provide an ideal user experience for the vast majority of connections while remaining as playable as possible in extreme conditions.
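The arithmetic behind those defaults can be sketched as follows. To be clear, this is not SnapNet's API or its actual algorithm: the `split_latency` function, its parameter names, and the decision to round latency up to whole frames are assumptions made purely to illustrate the three tiers described above.

```python
import math
from typing import Tuple


def split_latency(latency_ms: float, tick_hz: int = 60,
                  delay_budget_ms: int = 50,
                  rollback_budget_ms: int = 100) -> Tuple[int, int]:
    """Return (input delay frames, rollback frames) for a given latency.

    Latency is first covered by input delay up to delay_budget_ms, then by
    rollback up to rollback_budget_ms, and anything left over becomes
    additional input delay.
    """
    total_frames = math.ceil(latency_ms * tick_hz / 1000)
    max_delay = delay_budget_ms * tick_hz // 1000
    max_rollback = rollback_budget_ms * tick_hz // 1000

    delay = min(total_frames, max_delay)
    rollback = min(total_frames - delay, max_rollback)
    extra_delay = total_frames - delay - rollback
    return delay + extra_delay, rollback


# (input delay frames, rollback frames) at a 60 Hz tick rate
low = split_latency(30)    # covered entirely by input delay
mid = split_latency(100)   # 3 frames of delay plus 3 frames of rollback
high = split_latency(250)  # rollback is capped, the rest becomes delay
```

With these assumed defaults, 250ms of latency works out to 15 frames in total, of which 6 are handled by rollback and the remaining 9 by input delay.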
The other challenge introduced by a rollback architecture is performance. In order to support rolling back, games need to be able to advance the simulation many times within a single render frame. For example, if a game runs and renders at 60 frames per second, has 3 frames of input delay, and supports up to 300ms of latency, then it needs to be able to roll back and resimulate up to 15 frames (300ms is 18 frames at 60 Hz, less the 3 frames covered by input delay) in just 16.66ms. That means the CPU budget to advance the game simulation by a single frame is only ~1.1ms, a dramatic reduction compared to the full 16.66ms budget under a lockstep architecture! As a result, games need to be highly optimized and those with more complex simulations need to consider lower tick rates, limiting how many frames of rollback they support (possibly by adding more input delay), or different netcode architectures altogether.
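The budget calculation in that example is easy to reproduce for other configurations. The helper below is a sketch with assumed names; it treats the render rate and tick rate as equal, as in the example, and rounds latency up to whole frames.

```python
import math
from typing import Tuple


def rollback_budget(tick_hz: int, max_latency_ms: float,
                    input_delay_frames: int) -> Tuple[int, float]:
    """Return (worst-case rollback frames, CPU budget per simulated frame in ms)."""
    latency_frames = math.ceil(max_latency_ms * tick_hz / 1000)
    rollback_frames = max(0, latency_frames - input_delay_frames)
    frame_ms = 1000 / tick_hz
    # With no rollback, a single simulation step gets the whole frame.
    per_frame_ms = frame_ms / max(1, rollback_frames)
    return rollback_frames, per_frame_ms


frames, budget_ms = rollback_budget(tick_hz=60, max_latency_ms=300,
                                    input_delay_frames=3)
```

For the configuration in the text this gives 15 frames and a budget of roughly 1.1ms per simulated frame. Halving the tick rate to 30 Hz with the same settings would leave 6 rollback frames and about 5.6ms each, which is why lower tick rates are one of the options for more complex simulations.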
This limited CPU budget can lead to what’s commonly known as the spiral of death. This occurs when the time to roll back and resimulate exceeds the available budget. For example, consider a title with a 60 Hz tick rate that needs to roll back and resimulate 100ms of gameplay in order to render the next frame. If it takes more than 16.66ms in real time to resimulate those 6 ticks, then next frame you’ll probably need to roll back and resimulate at least 7 ticks of gameplay in order to catch up, which likely takes even longer. This feedback loop repeats and performance continues to worsen as the game quickly grinds to a halt because it’s unable to keep pace with the other clients. Note that this is an issue with peer-to-peer lockstep too (or any network model where only inputs are exchanged) but is more frequently an obstacle when using rollback due to the tight performance requirements.
Ultimately, it’s best to consider rollback architecture as an extension of the lockstep model and a tool to mitigate input latency. It’s not without its downsides; namely, it introduces a lot of inconsistency and implementation complexity. It is primarily useful for games that are sensitive to input latency and can fit within the narrow performance constraints. This makes it an ideal choice for fighting games, and it is also well suited to many sports games.
Be sure to check out the additional reading below and, if you’ve enjoyed this brief exploration of time travel in netcode, you’ll love the upcoming split into multiple time streams. Stay tuned for part three of this series where we’ll discuss snapshot interpolation, a model particularly popular among first-person shooters.