Netcode Architectures Part 4: Tribes
In the fourth and final part of this series, we’ll be taking a look at the architecture originally developed for the Tribes franchise of video games. Although we’ll primarily focus on the specific implementation described by Mark Frohnmayer and Tim Gift in their paper, The TRIBES Engine Networking Model, the core principles are similar to those employed by games today such as Fortnite (and therefore many Unreal Engine titles) and Halo. To avoid retreading what the Tribes paper already covers, only the high-level concepts will be summarized before diving into the analysis and comparison with other models.
Tribes uses a client-server model in which the server sends partial updates about the state of the world to clients each frame. This is sometimes referred to as eventual consistency because the client is guaranteed only that an object’s state will eventually match the server’s latest state, provided the server makes no further changes to that object. Intermediate changes may be skipped or applied out of order. In Tribes terminology, these client-side representations of objects are referred to as ghosts and these partial updates are generated and applied by the Ghost Manager.
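Here is a minimal Python sketch of per-property dirty flags that makes the partial-update idea concrete. It is loosely in the spirit of the Ghost Manager. `ServerObject`, `GhostConnection`, and the delivered/dropped notification are hypothetical stand-ins, not the Tribes implementation. The sketch shows one behavior: when a packet is lost, its properties are re-flagged, so the resend carries the latest values and intermediate values can be skipped entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ServerObject:
    """Authoritative object on the server (hypothetical)."""
    props: dict
    dirty: set = field(default_factory=set)  # properties the client may be missing

    def set_prop(self, name, value):
        self.props[name] = value
        self.dirty.add(name)


class GhostConnection:
    """Hypothetical per-client state: which properties each packet carried."""

    def __init__(self, obj):
        self.obj = obj
        self.in_flight = {}  # packet id -> set of property names it carried
        self.next_id = 0

    def write_packet(self):
        # Send the *current* value of every dirty property, then clear the flags.
        names = set(self.obj.dirty)
        self.obj.dirty.clear()
        packet_id = self.next_id
        self.next_id += 1
        self.in_flight[packet_id] = names
        return packet_id, {n: self.obj.props[n] for n in names}

    def on_notify(self, packet_id, delivered):
        names = self.in_flight.pop(packet_id)
        if not delivered:
            # Re-flag what was lost; the next packet carries the latest values,
            # not the stale ones that were dropped.
            self.obj.dirty |= names


server = ServerObject({"x": 0, "health": 100})
conn = GhostConnection(server)
ghost = {}  # the client's partial view of the object

server.set_prop("x", 10)
server.set_prop("health", 90)
pid, payload = conn.write_packet()
conn.on_notify(pid, delivered=False)  # this packet is lost

server.set_prop("x", 20)              # a newer change happens before the resend
pid, payload = conn.write_packet()
ghost.update(payload)
conn.on_notify(pid, delivered=True)
print(sorted(ghost.items()))          # [('health', 90), ('x', 20)]; x=10 was never seen
```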
To keep the game responsive to the client’s controls, each player is assigned a single object, called their control object. Player inputs, called moves, are sent to the server and applied to this object by the Move Manager. This is the only object for which clients are guaranteed to receive the latest state in every update, along with the most recently applied move. After receiving an update, the client reconciles the new authoritative state from the server by reapplying any subsequent moves that are still in flight.
In this model, the client is also responsible for simulating all other objects forward in time between updates received from the server and smoothly correcting any differences when incoming updates are applied.
This video illustrates the remote player experience as compared to local multiplayer:
The experience for the locally controlled object is mostly the same as in snapshot interpolation, but the behavior of all other objects is quite different.
Under the Hood
It can be helpful to look at this architecture in two parts: local player prediction and state transfer. Both are necessary pieces of the puzzle, but work very differently from one another.
Local Player Prediction
In the following video, you can see how local player prediction works in this model:
Note the similarities between this process and that of the snapshot interpolation model. In both, inputs are sent to the server, which applies them to that player’s control object and then replies with its new state. When the client receives and applies this state, it reconciles it with its ongoing prediction by reapplying any inputs that are still in flight. The major difference between snapshot interpolation and the Tribes architecture is how each transfers state for every object other than the one being locally predicted.
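Both models share the same reconciliation step, which can be sketched as follows. This is a simplified one-dimensional illustration. `Move`, `apply_move`, and `PredictedControlObject` are hypothetical names, and the sketch assumes that `apply_move` produces identical results on client and server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    seq: int   # sequence number assigned by the client
    dx: float  # hypothetical input: desired movement this tick


def apply_move(pos: float, move: Move) -> float:
    """Deterministic movement step shared by client and server (hypothetical)."""
    return pos + move.dx


class PredictedControlObject:
    def __init__(self, pos: float = 0.0):
        self.pos = pos
        self.pending: list[Move] = []  # moves sent but not yet acknowledged

    def local_move(self, move: Move) -> None:
        # Predict immediately and remember the move until the server acks it.
        self.pending.append(move)
        self.pos = apply_move(self.pos, move)

    def on_server_state(self, server_pos: float, last_applied_seq: int) -> None:
        # Discard moves the server has already applied...
        self.pending = [m for m in self.pending if m.seq > last_applied_seq]
        # ...then rewind to the authoritative state and replay those still in flight.
        self.pos = server_pos
        for m in self.pending:
            self.pos = apply_move(self.pos, m)


obj = PredictedControlObject()
for seq in range(1, 6):
    obj.local_move(Move(seq, 1.0))  # predicted position is now 5.0
# The server applied moves 1-3, but a collision clamped the result to 2.5.
obj.on_server_state(server_pos=2.5, last_applied_seq=3)
print(obj.pos)  # 4.5 (2.5 plus replayed moves 4 and 5)
```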
Caveat: Local Player Prediction in Unreal Engine
Although it was stated earlier that Unreal Engine’s model is similar to that of Tribes, one noteworthy difference is that Unreal Engine lacks the generalized concept of a control object and Move Manager. Unreal Engine’s pawn actors serve a similar role to Tribes’ control objects, but the equivalent of the Move Manager, which samples and applies input deterministically, is implemented explicitly as part of Unreal’s character movement component via RPCs. While this component can be customized by game developers, it remains limited to capsule-based character actors. A bespoke solution for client-side prediction is required to control other types of actors or shapes, such as vehicles.
State Transfer
In the following video, you can see how objects (excluding the control object) are transferred:
Remote objects may receive updates at any time, independently from one another. While waiting for the next update, clients are responsible for simulating remote objects forward in time plausibly. One tricky aspect of this is that due to jitter and packet loss, each update may arrive early or late relative to the client’s ongoing simulation. See the Jitter and Error Correction section below for more on this.
The requirements for this model are the same as for snapshot interpolation. Namely, it is both state-based and server-based, so the game must be able to efficiently serialize game and object state, and one machine must be designated the authority.
Challenges and Limitations
Partial state transmission introduces a number of unique challenges, as it provides fewer guarantees to gameplay code than any of the other architectures we’ve discussed. At a high level:
- Eventual consistency means gameplay programmers must be proactive and code defensively against unexpected states that arise from the netcode itself.
- Irregular update rates make incoming state difficult to de-jitter.
- Extrapolation between updates leads to reliance on more client-authoritative methods.
- Complex prioritization hinders efforts to guarantee a minimum update rate for objects and therefore a minimum level of fidelity.
Many of these challenges only become problematic under specific network conditions, e.g., when a particular packet is dropped or a sequence of packets is misordered. As the complexity of a game grows, so does the number of permutations of things that can go awry. Practically speaking, checking all of these permutations as part of regular testing isn’t feasible. Fixing each issue as it arises is also difficult. It can be hard to identify the sequence of updates that triggers the issue, and equally hard to reproduce that sequence reliably to verify the fix. As a result, unless extreme care is taken, games using this architecture typically appear functional during development only to break down in unusual ways once they go live.
In the following sections, we’ll dive into each of these challenges in detail.
Intra-Object Consistency
Although objects are guaranteed to be eventually consistent in this model, property changes may be applied out of order in the interim. As a simple example, consider a game where characters can ride in vehicles. One might network this with two properties on the character: a reference to the vehicle the character is riding in, and an enumeration or index indicating which seat the character is occupying.
When a player first enters a car on the driver’s side, the server would assign the vehicle property to the car and set the seat property to the driver’s seat. Because both property changes occur on the same object and on the same frame, the server is guaranteed to transmit them both to the client in a single message. Messages can be dropped due to the unreliable nature of the Internet, but since both changes travel together, either both arrive or neither does. If they are dropped, the server will eventually retransmit them to maintain eventual consistency. No problem so far.
Next, the character changes seats, and the server sets the seat property to the passenger seat. This new value for the seat property will now get sent independently to the client. Once it’s received, if the first update has already arrived, the client’s state will match the server’s. However, if the first update was lost or delayed, the client will have a character that’s sitting in a passenger seat but without a valid vehicle reference.
It’s important to note that this invalid state never occurred on the server or as a result of any gameplay code but only due to the quirks of the network model itself. Using this toy example, it’s trivial to consider defensive workarounds like ignoring changes in seat without a valid vehicle reference, but it’s emblematic of a larger issue. As the number of properties on an object grows, there is a combinatorial explosion with respect to the number of states gameplay engineers must guard against.
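The seat example can be reproduced in a few lines. `CharacterGhost` and its `apply` and `seat_is_usable` methods are hypothetical. Partial updates are modeled as dictionaries that contain only the properties that changed. The final update assumes the server resends the latest values of the lost properties.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CharacterGhost:
    vehicle: Optional[str] = None  # hypothetical: id of the vehicle being ridden
    seat: Optional[str] = None     # hypothetical: "driver" or "passenger"

    def apply(self, update: dict) -> None:
        # Partial update: only the properties present in the message change.
        for name, value in update.items():
            setattr(self, name, value)

    def seat_is_usable(self) -> bool:
        # Defensive guard: a seat means nothing without a vehicle reference.
        return self.vehicle is not None and self.seat is not None


enter = {"vehicle": "car_7", "seat": "driver"}  # server frame 1: both changes, one message
switch = {"seat": "passenger"}                  # server frame 2: only the seat changed

ghost = CharacterGhost()
ghost.apply(switch)            # `enter` was lost; only `switch` arrived
print(ghost)                   # CharacterGhost(vehicle=None, seat='passenger')
print(ghost.seat_is_usable())  # False

ghost.apply({"vehicle": "car_7", "seat": "passenger"})  # resend of the latest values
print(ghost.seat_is_usable())  # True
```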
Inter-Object Consistency
Because only partial state is transmitted, objects may be updated at different times even when they are changed simultaneously on the server. For example, consider a capture the flag game where players store a boolean to indicate whether they are the flag carrier and the following sequence of events occurs:
- Player 1 is holding the flag and drops it, setting its “flag carrier” status to false and spawning a flag object in the world.
- Player 2 picks up the flag, destroying the flag object in the world and setting its “flag carrier” status to true.
As a result of these actions, four changes are emitted:
- Change 1: Player 1’s “flag carrier” status changes from true to false
- Change 2: The flag in the world is spawned
- Change 3: Player 2’s “flag carrier” status changes from false to true
- Change 4: The flag in the world is destroyed
Since none of these changes are guaranteed to be sent in the same packet, any combination of these changes might be dropped in transit. Clients may have some period of time during which they see:
- Both players holding the flag, with or without a third flag in the world (Change 1 was dropped)
- No players holding the flag with no flag in the world (Changes 2 and 3 were dropped)
- Player 1 holding the flag with a second flag in the world (Changes 1, 3, and 4 were dropped)
- Player 2 holding the flag with a second flag in the world (Change 4 was dropped)
Of course, none of the above are valid states that ever existed on the server, nor do any of those states even make sense in the context of the game. In this case, one could easily solve this by getting rid of the per-player booleans and just having the flag object always exist with a single property indicating who is carrying it. However, it’s not unreasonable to think a gameplay programmer might naively implement it this way. To avoid this, all gameplay programmers must understand how the netcode works at a fundamental level and consider it carefully while implementing all mechanics.
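The example is small, so every combination of dropped changes can be enumerated directly. This sketch assumes the client applies whichever of the four changes arrive, in their original order. Reordering is not modeled. The spawn and destroy of the world flag are collapsed into a single `world_flag` boolean. The results are compared with the three states that actually existed on the server. The exercise is then repeated for the single-property redesign. All names are hypothetical.

```python
from itertools import chain, combinations

# Naive design: per-player booleans plus a flag object that is spawned and destroyed.
CHANGES = [
    (1, "p1_carrier", False),
    (2, "world_flag", True),   # flag object spawned in the world
    (3, "p2_carrier", True),
    (4, "world_flag", False),  # flag object destroyed
]
# (p1_carrier, p2_carrier, world_flag) states that actually existed on the server.
SERVER_STATES = {(True, False, False), (False, False, True), (False, True, False)}


def all_subsets(ids):
    return chain.from_iterable(combinations(ids, r) for r in range(len(ids) + 1))


def naive_client_view(dropped):
    state = {"p1_carrier": True, "p2_carrier": False, "world_flag": False}
    for change_id, name, value in CHANGES:  # delivered changes applied in order
        if change_id not in dropped:
            state[name] = value
    return state["p1_carrier"], state["p2_carrier"], state["world_flag"]


naive_views = {naive_client_view(set(d)) for d in all_subsets([1, 2, 3, 4])}
print(sorted(naive_views - SERVER_STATES))
# [(False, False, False), (False, True, True), (True, False, True),
#  (True, True, False), (True, True, True)]

# Redesign: a single flag object that always exists, with one "carrier" property.
CARRIER_CHANGES = [(1, None), (2, "p2")]  # dropped by player 1, picked up by player 2


def redesigned_client_view(dropped):
    carrier = "p1"
    for change_id, value in CARRIER_CHANGES:
        if change_id not in dropped:
            carrier = value
    return carrier


print({redesigned_client_view(set(d)) for d in all_subsets([1, 2])} <= {"p1", None, "p2"})  # True
```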
Ultimately, games are about objects interacting, so it is common for a single action taken in a game to affect multiple properties on multiple objects simultaneously. As described with intra-object consistency above, the difficulty of guarding against these issues grows exponentially with the complexity of the game.
Prioritization
The amount of data transmitted in each update is capped to respect the client’s configured bandwidth limit. As a result, the server must decide which objects to send updates about each frame. Objects have a priority assigned to them so that they can be updated more or less frequently relative to other objects depending on their importance or the fidelity required. However, available throughput is constant so, regardless of priority, an increase in the number of changing objects leads to a decrease in the relative frequency with which objects receive updates. Another consequence of this prioritization scheme is that objects changing frequently with too high a priority may starve out other objects from getting into packets as regularly as they otherwise would.
With each object having its own priority, clients configuring different bandwidth limits, and a wide variety of game scenarios, it’s not trivial to determine whether everything is configured optimally or correctly. Developers typically want to ensure that certain quality of service guarantees will be met in extreme but realistic gameplay scenarios, but this cannot be directly determined or calculated from the values that developers are tuning. For example, do clients still receive updates about all player locations at least 20 times per second if everyone is throwing grenades towards them? One would have to try it to find out and then test regularly to prevent any future regressions. The same flexibility that allows this architecture to scale to large object counts and low bandwidth connections creates an inexact science that places a heavy burden on gameplay engineers, designers, and quality assurance personnel to ensure that the desired fidelity is sustained in all supported configurations and scenarios.
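Simulating the grenade question is one way to get a feel for it. The sketch below uses a priority accumulator. This is a common scheme chosen here for illustration, not a reproduction of how Tribes prioritizes ghosts. The priorities, the 20-byte update size, the 100-byte budget, and the assumption that every object changes every frame are all made up.

```python
from dataclasses import dataclass


@dataclass
class GhostedObject:
    name: str
    priority: float  # hypothetical per-object priority (higher = more important)
    size_bytes: int  # hypothetical size of one serialized update
    accumulated: float = 0.0
    updates_sent: int = 0


def fill_packet(objects, budget_bytes):
    """Pick updates for one packet with a priority accumulator (illustrative scheme)."""
    for obj in objects:
        obj.accumulated += obj.priority  # waiting objects grow more urgent every frame
    used = 0
    for obj in sorted(objects, key=lambda o: o.accumulated, reverse=True):
        if used + obj.size_bytes > budget_bytes:
            continue                     # does not fit; a smaller update still might
        used += obj.size_bytes
        obj.accumulated = 0.0            # reset once the object makes it into a packet
        obj.updates_sent += 1
    return used


def worst_player_update_rate(num_grenades, frames=60, budget_bytes=100):
    """Fraction of frames in which the least-updated player got an update."""
    players = [GhostedObject(f"player{i}", 1.0, 20) for i in range(4)]
    grenades = [GhostedObject(f"grenade{i}", 2.0, 20) for i in range(num_grenades)]
    for _ in range(frames):
        fill_packet(players + grenades, budget_bytes)
    return min(p.updates_sent for p in players) / frames


print(worst_player_update_rate(0))  # 1.0: every player fits in every packet
print(worst_player_update_rate(8))  # lower: grenades now compete for the same bytes
```

In this sketch, no single setting answers the question of a guaranteed minimum rate. The answer only emerges from running a scenario, which is exactly the tuning burden described above.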
Jitter and Error Correction
In this model, property updates are applied as soon as they are received from the server. In between updates, the game is responsible for advancing objects smoothly each frame. This can be considered a form of client-side prediction or extrapolation, but it’s important to distinguish this process from the more robust client-side prediction mechanism provided by the control object and Move Manager.
Example: Synchronizing the position of a rocket
Consider a fast-moving rocket traveling at a constant speed of 2000 cm/s. Say the server sends an update once per second for the next couple of seconds, i.e., after the rocket has moved 2000 cm and again after it has moved 4000 cm. Due to packet jitter, this first update might arrive 20ms late (1.02 seconds since the rocket was launched). While this first update indicates the rocket has moved 2000 cm, the client will have moved the rocket 2040 cm in the interim. For Americans, that’s off by over a foot. At this point, the client must consider these two possibilities:
- The rocket is still moving at 2000 cm/s but the update arrived 20ms later than expected (the truth in this case)
- The rocket was slowed down at some point during the last second and the update arrived on time
In the first case, ideally no correction would be made since the client would recognize that what was predicted matched what was sent by the server and the message just arrived late. In the second case, where the rocket actually slowed down, one would want to correct the rocket’s location and simulate forward with the new, slower speed.
Unfortunately, due to the way partial state is transmitted in this model, it’s possible that the server did send the client a change to the rocket’s velocity and the packet never arrived. The model also lacks a synchronized timestamp to determine the time at which properties changed, so it’s not possible to determine that the update arrived late.
Since the client cannot determine which of these two scenarios took place, it must initiate a correction for the rocket’s overshoot, moving the rocket back by 40 cm. It will then continue on simulating it moving at 2000 cm/s. Note that no matter which of the two scenarios above is true, this is not ideal behavior.
Let’s say the second update then arrives on time (2 seconds in total since the rocket was launched, 980ms after the client received the first update). The client will now have moved the rocket an additional 1960 cm since the first update was received. Because of the correction that was applied, it will have moved 3960 cm in total. This causes yet another correction to bring the rocket to the final 4000 cm.
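A tiny simulation can check the arithmetic above. It assumes, as described, that the client has no timestamps, extrapolates at a constant 2000 cm/s, and snaps to each update on arrival. `simulate_rocket` is a hypothetical helper that returns the correction applied at each update.

```python
SPEED_CM_PER_S = 2000.0


def simulate_rocket(arrivals):
    """arrivals: list of (client_receive_time_s, server_reported_position_cm)."""
    corrections = []
    last_time, last_pos = 0.0, 0.0  # origin of the client's extrapolation
    for recv_time, reported_pos in arrivals:
        predicted = last_pos + SPEED_CM_PER_S * (recv_time - last_time)
        # Without a timestamp the client cannot tell "late" from "slowed down",
        # so it snaps to the reported position and extrapolates from there.
        corrections.append(reported_pos - predicted)
        last_time, last_pos = recv_time, reported_pos
    return corrections


# First update (sent at 1.00 s) arrives 20 ms late; second (sent at 2.00 s) is on time.
print([round(c, 6) for c in simulate_rocket([(1.02, 2000.0), (2.00, 4000.0)])])  # [-40.0, 40.0]
```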
Correcting Errors via Smoothing
All connections have jitter, so errors like the ones described above occur with most updates under this model. In this case, one might see a rocket that should be traveling smoothly in a straight line instead teleport backwards and forwards as it moves through the air. This is typically papered over by some form of smoothing with varying degrees of success depending on the object’s motion and how severe the jitter is. Applying smoothing can make these issues less apparent, but can introduce its own issues. By definition, it will skip over any short discontinuities in motion, so applying too much smoothing to characters in a shooter, for example, may mean that players can step quickly out and back into cover without opponents seeing much movement take place. This creates an unfair advantage for those peeking around corners and quickly returning to cover.
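The following minimal sketch illustrates the trade-off. It assumes a simple exponential filter on the rendered position, which is only one of many smoothing approaches. `smooth_track` and its `tau` time constant are hypothetical. In the sketch, a character steps 100 cm out of cover for 0.1 s. Light smoothing shows almost the full step, while heavy smoothing barely shows it.

```python
import math


def smooth_track(targets, dt, tau):
    """Exponentially smooth the rendered position toward each frame's network position.

    tau is a hypothetical smoothing time constant in seconds; larger means smoother.
    """
    alpha = 1.0 - math.exp(-dt / tau)  # frame-rate-independent blend factor
    rendered, out = targets[0], []
    for target in targets:
        rendered += (target - rendered) * alpha
        out.append(rendered)
    return out


DT = 1 / 60
# The character steps 100 cm out of cover for 6 frames (0.1 s), then returns.
peek = [0.0] * 10 + [100.0] * 6 + [0.0] * 20

light = max(smooth_track(peek, DT, tau=0.02))
heavy = max(smooth_track(peek, DT, tau=0.2))
print(round(light), round(heavy))  # 99 39
```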
Comparison and Interaction with Control Objects
These types of errors do not occur when using the client-side prediction mechanism provided by the Move Manager because the complete state of the object is sent with each update along with the most recent move processed. This allows the client to re-apply any moves that are still in flight and consistently calculate the same end result. This is equivalent to how client-side prediction is performed in both rollback and snapshot models. Note that there is also a difference in time between these two approaches. The control object is being predicted a full round-trip time ahead of the latest state received from the server while other ghosts are only being extrapolated forward from the latest state received onwards. As a result, any relationship or interaction between the control object and other ghosts is challenging because they are offset in time with one another by roughly one round-trip time. In the case of the rocket example presented here, players attempting to dodge the rocket may appear to successfully do so on their screen, only to find out from the server shortly thereafter that they were actually hit.
The previous article on snapshot interpolation introduced the concept of dual time streams, where predicted and interpolated objects were updated consistently on the client but offset from one another with respect to the server time they represent. Backwards reconciliation is made possible because the server can reconstruct exactly what the client was rendering given two timestamps, one from each of the two time streams.
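For comparison, the following sketch shows how the server-side half of backwards reconciliation might look in the snapshot model. It makes three assumptions: positions are one-dimensional, the server interpolates linearly between recorded ticks, and the client reports the (possibly fractional) tick at which it was rendering the target. `PositionHistory` and `validate_hit` are hypothetical names. The predicted-time stream for the shooter is omitted.

```python
from bisect import bisect_right


class PositionHistory:
    """Server-side record of one object's position per tick (hypothetical)."""

    def __init__(self):
        self.ticks, self.positions = [], []

    def record(self, tick, pos):
        self.ticks.append(tick)
        self.positions.append(pos)

    def sample(self, t):
        """Reconstruct the position at a (possibly fractional) tick by interpolating."""
        i = bisect_right(self.ticks, t)
        if i == 0:
            return self.positions[0]
        if i == len(self.ticks):
            return self.positions[-1]
        t0, t1 = self.ticks[i - 1], self.ticks[i]
        p0, p1 = self.positions[i - 1], self.positions[i]
        return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


def validate_hit(target_history, client_interp_tick, shot_x, radius):
    """Server-side check against what the shooter's client was rendering."""
    return abs(target_history.sample(client_interp_tick) - shot_x) <= radius


history = PositionHistory()
for tick in range(11):
    history.record(tick, 10.0 * tick)  # target moves 10 units per tick

# The client reports it was rendering the target at interpolated tick 2.5.
print(validate_hit(history, 2.5, shot_x=25.0, radius=1.0))  # True
# Checking against the server's latest state instead would reject the same shot.
print(validate_hit(history, 10, shot_x=25.0, radius=1.0))   # False
```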
There are three primary obstacles to leveraging this technique with the Tribes transmission model:
Extrapolation: The Tribes architecture relies on clients extrapolating objects forward in time from the most recent update received from the server. In order for a server to reconstruct a client’s view of an object, it would need to perform the same extrapolation the client did for each object, which is often prohibitively costly to compute.
Varying Time Streams: With partial state transmission, objects receive updates on different frames and at varying rates, so each object on the client may have state that’s newer or older than another. Each object is effectively in its own time stream, independent from the rest. This makes it impossible to transmit a single timestamp to the server from which it can accurately reconstruct the client’s view.
Inconsistency: As described above, the lack of intra-object consistency means objects may be only partially updated if packets are dropped or late, leaving the object in a state that was never present on the server. Since a client’s estimation of an object at a particular point in time may not be accurate, the server-authoritative validation will likewise be inconsistent.
If one were to modify the Tribes architecture to provide timestamps alongside property changes, it would be possible to buffer and de-jitter updates for interpolation instead of performing extrapolation. However, the issue of inconsistency remains and a new obstacle is introduced: irregular update rates. Servers send object updates according to priority and as bandwidth allows; the client has no guarantees about how frequently it will receive updates for any given object. Since interpolation requires two states, clients must delay interpolation by at least as long as it takes to receive two updates for the least frequently updated object. As a game grows in scale and update rates decrease, the interpolation delay must increase, and this additional latency can make backwards reconciliation feel unfair and exacerbate peeker’s advantage. Alternatively, one could use a shorter interpolation delay and extrapolate the state of objects for which the client is missing data, but that could cause the server-authoritative validation to behave inconsistently when those objects are involved.
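The timestamp modification described above can be sketched as a per-object interpolation buffer. `InterpolationBuffer` is hypothetical, and times are in milliseconds of server time. When the render time passes the newest update, the buffer falls back to simple linear extrapolation from the last two samples. With a 500 ms update interval, a 600 ms delay allows interpolation, while a 100 ms delay forces extrapolation.

```python
from bisect import bisect_right


class InterpolationBuffer:
    """Client-side buffer of timestamped updates for one object (hypothetical)."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms  # how far behind the current server time we render
        self.samples = []         # (server_time_ms, position), kept sorted

    def add(self, server_time_ms, pos):
        self.samples.append((server_time_ms, pos))
        self.samples.sort()       # updates may arrive out of order

    def render(self, now_ms):
        """Return (position, extrapolated) for render time now_ms - delay_ms.

        Assumes at least one sample has been received.
        """
        t = now_ms - self.delay_ms
        times = [time for time, _ in self.samples]
        i = bisect_right(times, t)
        if i == 0:
            return self.samples[0][1], False  # too early: hold the oldest state
        if i < len(self.samples):
            (t0, p0), (t1, p1) = self.samples[i - 1], self.samples[i]
            return p0 + (p1 - p0) * (t - t0) / (t1 - t0), False
        if len(self.samples) < 2:
            return self.samples[-1][1], True  # nothing to extrapolate from
        # Render time is past the newest update: extrapolate from the last two.
        (t0, p0), (t1, p1) = self.samples[-2], self.samples[-1]
        return p1 + (p1 - p0) * (t - t1) / (t1 - t0), True


def make_buffer(delay_ms):
    buf = InterpolationBuffer(delay_ms)
    # A low-priority object updated every 500 ms, moving 0.1 units per ms.
    for time_ms, pos in [(0, 0.0), (500, 50.0), (1000, 100.0)]:
        buf.add(time_ms, pos)
    return buf


print(make_buffer(600).render(1200))  # (60.0, False): the delay covers the 500 ms gap
print(make_buffer(100).render(1200))  # (110.0, True): render time ran past the newest update
```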
Overwatch is a notable example of a game that uses interpolation and backwards reconciliation on top of a transmission model similar to that of Tribes. The small scale of Overwatch’s simulation likely means that all changed state can fit in a single packet, so partial updates are rare. Its scripting system also transmits complete state deltas from the last acknowledged state. That approach is closer to the snapshot interpolation model than to Tribes’ Ghost Manager, and it provides intra-object consistency for that system. Combined, these choices let Overwatch avoid many of the obstacles presented here. However, the approach does not necessarily scale to the larger simulations with lower object update frequencies for which this architecture is most often used. Given that, if your game has a similarly small set of changing state, the snapshot interpolation model may be a better fit.
Because the Tribes architecture presents so many obstacles to performing accurate backwards reconciliation, some amount of client trust is typically required. Reliance on client-authoritative hit detection is an unfortunate weakness of this architecture for shooters as it is a significant attack vector for cheaters and a major obstacle to competitive integrity. More details can be found in our Performing Lag Compensation in Unreal Engine 5 article.
Predictive processing of local inputs in this architecture is limited to the application of a player’s moves to its control object. As described above, a player’s control object is transmitted differently from other ghosts because its state must be present in every packet along with which move was most recently applied. This is required in order to ensure deterministic processing of moves on both client and server. Note that applying moves to ghosts as normally transmitted would not work correctly for a couple of reasons:
- A ghost may only be partially updated on the client, so its current state may not match what it was on the server when a given move was applied.
- Ghosts may be updated at arbitrary rates, so there would be no upper bound on how many moves might need to be reconciled when a new update comes in. This would burden memory for storing the moves and CPU for applying them, and it would provide no guarantee of how long it takes for the client to hear about a misprediction.
The architecture could certainly be extended to include more than one control object but, if the entire game state needs to be predicted and reconciled as is common for fighting and sports games, then the state of all objects would need to be in every packet. At that point, it effectively becomes the snapshot model discussed in the previous article.
The Tribes architecture allows synchronizing large worlds with a high number of changing objects. In doing so, it requires developers to trade off bandwidth usage against fidelity and gives up a number of critical guarantees relative to other approaches. This places an additional burden on engineers to solve problems within gameplay code that would normally be handled by the netcode itself and to defensively guard against exotic states introduced by the architecture’s transmission model. As is typical of most architectures that power large-scale real-time experiences such as MMOs, significant reliance on client authority is often necessary to achieve desired outcomes, which introduces vulnerabilities that require additional engineering to mitigate.
For games with a large enough scale or in bandwidth-constrained environments, the Tribes architecture is typically one of the only viable models, especially if determinism is not an option or input delay is unacceptable. That makes it a natural choice in those contexts but, for other titles, better results are typically achievable with less effort and fewer production “gotchas” by leveraging a different architecture.
This is the final part of our series on netcode architectures. Be sure to check out the additional reading below if you want to learn more about the Tribes model.
If you’ve missed any of our previous articles in this series you can find them here:
As always, if I got anything wrong, you have questions about the approach we’ve taken with SnapNet, or you just want to chat about netcode in general, please don’t hesitate to reach out!
Additional Reading
- I Shot You First: Networking the Gameplay of Halo: Reach
- The TRIBES Engine Networking Model, Mark Frohnmayer and Tim Gift
- Building a Network for Games, Malachi Middlebrook and Philip Orwig