Industrial networks are judged by how they perform when everything works, but their true quality is revealed when something fails. Architecture and redundancy determine whether faults are absorbed quietly or escalate into outages.
Network architecture is often reduced to diagrams of rings, stars, and meshes, but true architecture defines how traffic flows, where decisions are made, which failures are isolated, and how systems recover.
Two networks with identical topologies can behave entirely differently under stress depending on segmentation, prioritisation, and monitoring. Architecture transforms intent into predictable behaviour. In industrial environments, this means designing not for optimal performance under ideal conditions, but for controlled degradation during inevitable failures - cable cuts, device failures, power interruptions, and human error.
Most industrial networks that suffer "mysterious" failures are not overly complex; they are under-architected. They lack deliberate control and containment mechanisms, allowing faults to propagate and troubleshooting to devolve into guesswork. This section explores the principles that move networks from fragile to resilient.
Flat networks appear simple initially - everything can see everything else, configuration is minimal, and troubleshooting seems straightforward. As systems grow, this simplicity becomes a systemic liability.
Broadcast and multicast traffic increases without natural boundaries, faults propagate without barriers, security perimeters dissolve, and diagnosing issues becomes harder, not easier. Segmentation addresses these issues not through restriction, but through controlled organisation. It creates logical boundaries that contain traffic, limit fault propagation, and establish clear security zones. The goal is to replace complexity with clarity.
Effective segmentation aligns with operational functions - separating safety-critical control from process supervision, field devices from enterprise systems, and real-time traffic from best-effort data. This alignment ensures that when architecture diagrams are translated into operational reality, the network behaves in understandable, predictable ways.
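To make the idea concrete, here is a minimal sketch of segmentation expressed as explicit zone membership and an allow-list of cross-zone flows. The zone names, device names, and rules are illustrative assumptions, not a prescribed model:

```python
# Hypothetical sketch: segmentation as explicit zones plus an allow-list
# of permitted cross-zone flows. Everything not listed is denied.
ZONES = {
    "safety": {"sis_plc", "safety_io"},
    "control": {"plc1", "plc2", "hmi"},
    "supervision": {"scada_server", "historian"},
    "enterprise": {"erp_gateway"},
}

# Only these zone-to-zone flows are permitted (illustrative).
ALLOWED_FLOWS = {
    ("control", "supervision"),
    ("supervision", "enterprise"),
}

def zone_of(device: str) -> str:
    for zone, members in ZONES.items():
        if device in members:
            return zone
    raise KeyError(f"unknown device: {device}")

def flow_permitted(src: str, dst: str) -> bool:
    # Traffic inside a zone is allowed; cross-zone traffic must be listed.
    sz, dz = zone_of(src), zone_of(dst)
    return sz == dz or (sz, dz) in ALLOWED_FLOWS

print(flow_permitted("plc1", "scada_server"))    # control -> supervision: True
print(flow_permitted("erp_gateway", "sis_plc"))  # enterprise -> safety: False
```

The point of the allow-list is that the diagram and the enforced behaviour are the same artefact: a flow that is not written down does not happen.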
Redundancy is often misunderstood as component duplication - adding another cable, switch, or path. True redundancy requires the system to know how to behave when a component fails.
Effective redundancy demands clear primary and secondary paths, predictable failover times, awareness of protocol sensitivity to disruption, and regular testing under realistic conditions. Redundant components without coordinated behaviour can exacerbate failures rather than mitigate them. For example, a redundant link that takes 30 seconds to converge may be useless for a control system that requires sub-second recovery.
| Redundancy Mechanism | Typical Recovery Time | Suitable Applications |
|---|---|---|
| Spanning Tree (STP/RSTP) | 2–50 seconds (variable with protocol version, network size, and tuning) | Non-time-critical IT traffic, best-effort data collection |
| Ethernet Ring Protection (ERPS, ITU-T G.8032) | <50 milliseconds | Process control, SCADA, real-time monitoring |
| Parallel Redundancy Protocol (PRP) | Zero packet loss (active-active) | Protection systems, safety-critical control, high-speed motion |
| Dynamic Routing (OSPF/BGP) | 1–10 seconds | Large campus/wide-area networks, enterprise IT convergence |
Selecting redundancy mechanisms requires matching recovery characteristics to application timing requirements - a mismatch guarantees operational failure during real incidents.
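This matching exercise can be expressed as a simple filter: reject any mechanism whose worst-case recovery exceeds the application's outage budget. The figures below mirror the table above; the function and names are an illustrative sketch:

```python
# Sketch: shortlist redundancy mechanisms whose worst-case recovery time
# fits within an application's outage budget. Figures follow the table
# above; always verify vendor-specific convergence behaviour.
WORST_CASE_RECOVERY_S = {
    "STP/RSTP": 50.0,
    "ERPS (G.8032)": 0.050,
    "PRP": 0.0,          # active-active: no convergence event at all
    "OSPF/BGP": 10.0,
}

def candidates(max_outage_s: float) -> list[str]:
    """Mechanisms whose worst-case recovery fits the budget."""
    return sorted(
        m for m, t in WORST_CASE_RECOVERY_S.items() if t <= max_outage_s
    )

print(candidates(1.0))    # sub-second control loop
print(candidates(30.0))   # best-effort data collection
```

Note that the filter uses worst-case, not typical, recovery time: a mechanism that usually converges quickly but occasionally takes far longer fails exactly when it matters.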
Many industrial applications are sensitive to timing. Protection systems, motion control, and synchronised processes depend on data arriving within defined windows - redundancy mechanisms that introduce variable delay can destabilise these systems.
Architecture must account for latency introduced during failover, packet reordering and duplication, and convergence behaviour under fault conditions. Designing for determinism means accepting that not all redundancy strategies are suitable for all applications. A video surveillance system may tolerate brief interruption; a motor synchronisation system will not.
This requires understanding both the network's recovery characteristics and the application's timing tolerance. The worst-case scenario is a redundancy mechanism that restores connectivity but alters timing in ways that make the application malfunction - technically "up" but operationally broken.
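The duplication and reordering mentioned above are handled in active-active schemes by a duplicate-discard function at the receiver: every frame is sent on both paths with a sequence number, and the first copy to arrive wins. The sketch below illustrates the core idea only; it is a simplification, not the full IEC 62439-3 PRP algorithm (which must also handle sequence-number wraparound and per-sender state):

```python
# Simplified illustration of PRP-style duplicate discard: frames arrive
# from two independent paths, possibly duplicated and out of order, and
# the receiver accepts the first copy of each sequence number.
def discard_duplicates(frames):
    """frames: iterable of (seq, payload) merged from both paths."""
    seen = set()
    for seq, payload in frames:
        if seq in seen:
            continue            # late copy from the other path
        seen.add(seq)
        yield payload

# Copies from LAN A and LAN B interleave on arrival:
arrivals = [(1, "a"), (1, "a"), (2, "b"), (3, "c"), (2, "b"), (3, "c")]
print(list(discard_duplicates(arrivals)))  # ['a', 'b', 'c']
```

Because both copies are always in flight, a single path failure costs zero packets and zero convergence time, which is why this class of mechanism suits protection and high-speed motion applications.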
Resilient networks rarely rely on a single protective mechanism. Instead, resilience emerges from multiple complementary layers that compensate for each other's limitations.
A layered approach might include physical path diversity (separate cable routes), logical segmentation (security zones), redundant communication paths (dual rings), graceful degradation strategies (local fallback modes), and comprehensive monitoring for early detection. This reduces reliance on any single technology behaving perfectly - a critical consideration in long-lived industrial systems where components age and environments change.
Each layer provides resilience at a different level, ensuring that a failure in one area does not cascade into complete system failure.
Redundant systems that are not monitored are trusted blindly. Without visibility, it is impossible to know whether redundancy paths are operational, whether failover will occur as expected, or whether hidden faults already exist.
Architecture defines expected behaviour; diagnostics confirm it. In many cases, redundancy failures are only discovered during real incidents - when recovery matters most. Effective monitoring tracks not just whether redundant components are present, but whether they are functional and behaving within design parameters. This includes measuring failover times, verifying path diversity, and detecting "silent" failures where a backup component has failed without affecting the primary.
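A minimal version of this kind of monitoring is a heartbeat tracker that watches both paths of a redundant pair, so a dead backup is flagged before the primary ever fails. The class, thresholds, and names below are an illustrative sketch:

```python
# Sketch: detect "silent" failures by tracking heartbeats from both
# paths of a redundant pair, not just the one carrying traffic.
# Names and thresholds are illustrative.
class RedundantPathMonitor:
    def __init__(self, stale_after_s: float = 5.0):
        self.stale_after_s = stale_after_s
        self.last_heartbeat = {"primary": None, "backup": None}

    def heartbeat(self, path: str, now: float) -> None:
        self.last_heartbeat[path] = now

    def silent_failures(self, now: float) -> list[str]:
        """Paths that have stopped responding, even if currently unused."""
        return [
            path
            for path, seen in self.last_heartbeat.items()
            if seen is None or now - seen > self.stale_after_s
        ]

mon = RedundantPathMonitor(stale_after_s=5.0)
mon.heartbeat("primary", now=100.0)
mon.heartbeat("backup", now=100.0)
mon.heartbeat("primary", now=110.0)    # backup has gone quiet
print(mon.silent_failures(now=110.0))  # ['backup']
```

The key design point is that the backup is polled with the same rigour as the primary; redundancy that is only exercised during real faults is redundancy that is only tested during real faults.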
The most robust architectural designs include built-in testability - ways to safely validate redundancy mechanisms during maintenance windows without risking production operations.
Industrial networks change slowly but inevitably. Architecture that cannot tolerate change becomes fragile over time as maintenance, upgrades, and expansions introduce risk.
Good architectural design considers how components can be isolated for maintenance without affecting overall operation, how new devices are introduced safely, how temporary connections are controlled, and how documentation remains aligned with reality. Redundancy that complicates maintenance often leads to dangerous bypasses and shortcuts that ultimately undermine resilience.
This requires clear change control processes, but also architectural patterns that accommodate evolution - modular design, expansion points, and backward compatibility where practical. Networks should be understandable, predictable, and explainable even years after initial deployment.
One of the most valuable architectural concepts is the failure domain - defining what can be affected by a single fault, where that fault is stopped, and how recovery is isolated.
Well-designed networks ensure that local failures remain local, critical systems are insulated from non-critical ones, and faults do not cascade across functions. Containment is the difference between an incident and a disaster. This involves strategic placement of segmentation boundaries, careful design of interdependencies, and understanding how failures propagate through both physical and logical layers.
For example, a fault in a non-critical monitoring system should not affect safety-critical control. A power failure in one cabinet should not take down redundant paths. Architecture makes these boundaries explicit and defensible.
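One way to reason about failure domains is to model the network as a graph and compute the "blast radius" of a fault: everything reachable from the fault without crossing a containment boundary. The topology and boundary choice below are illustrative assumptions:

```python
# Sketch: estimate the blast radius of a single fault as the set of
# nodes reachable from it before hitting a containment boundary.
# Topology and boundaries are illustrative.
from collections import deque

LINKS = {
    "plc1": {"switch_a"},
    "plc2": {"switch_a"},
    "switch_a": {"plc1", "plc2", "firewall"},
    "firewall": {"switch_a", "switch_b"},
    "switch_b": {"firewall", "scada"},
    "scada": {"switch_b"},
}

# Boundary nodes stop fault propagation (e.g. a firewall between the
# control and supervision zones).
BOUNDARIES = {"firewall"}

def blast_radius(fault_at: str) -> set[str]:
    reached, queue = {fault_at}, deque([fault_at])
    while queue:
        node = queue.popleft()
        if node in BOUNDARIES:
            continue  # fault is contained here; stop propagating
        for neighbour in LINKS[node]:
            if neighbour not in reached:
                reached.add(neighbour)
                queue.append(neighbour)
    return reached

print(sorted(blast_radius("plc1")))  # propagation stops at the firewall
```

Running this for every node in the topology gives a quick audit of whether the boundaries on the architecture diagram actually limit each fault to its intended domain.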
Throughput Technologies approaches network architecture and redundancy as a systems engineering discipline. We focus on designing for failure, implementing layered resilience, and ensuring that recovery behaviour matches operational timing requirements. The goal is networks that continue operating within defined limits even when components fail.
Architecture transforms intent into predictable behaviour - especially when things go wrong.
Network architecture interacts with every other aspect of industrial networking. These related Knowledge Hub sections provide deeper context:

- How protocol characteristics influence architectural choices - deterministic protocols requiring specific redundancy mechanisms, timing considerations shaping topology design.
- How physical media characteristics shape architecture - fibre enabling certain topologies, wireless influencing redundancy design, hybrid media requiring boundary management.
- How to monitor architectural health - validating redundancy, measuring failover performance, detecting boundary violations, and ensuring design intent matches reality.