Flipping the Source of Truth for Edge Orchestration

When the Network Disappears, So Does Your Control Plane

Every mainstream orchestration system - Kubernetes, Nomad, the commercial VMware stack - shares a common architectural assumption: a centralized database is the source of truth for workload placement and routing decisions. Individual nodes report up, the central brain decides, and commands flow down.

This works beautifully in a well-connected data center. It fails catastrophically at the edge.

The Consensus Problem

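Distributed consensus protocols like Raft and Paxos promise consistency across nodes. They deliver on that promise, within limits. What the whitepapers don't emphasize: these protocols degrade badly over long distances and unreliable links. A write commits only after a majority of voters acknowledge it, so every write pays at least one round-trip to that majority. In Raft, followers that stop hearing the leader's heartbeats within an election timeout start a new election, so a flaky link can trigger repeated elections. And any node cut off from the majority cannot commit writes at all.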

When your nodes span from a forward operating base to a carrier strike group to a CONUS data center, consensus becomes a liability. The latency alone kills responsiveness. Add intermittent connectivity, and you're watching your control plane oscillate between "degraded" and "completely unavailable."
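The following Python sketch makes the quorum arithmetic concrete. It assumes a hypothetical five-voter cluster with three voters in a CONUS data center and one voter each at a forward operating base and aboard a ship. The site names, voter counts, and the `can_commit` helper are illustrative assumptions, not any product's API. The only rule the code encodes is that Raft-style protocols commit a write only when a strict majority of voters acknowledges it.

```python
# Sketch: which side of a partition can still commit writes in a
# majority-quorum protocol such as Raft? Site names and voter counts
# below are hypothetical.


def majority(n_voters: int) -> int:
    """Smallest strict majority of n_voters."""
    return n_voters // 2 + 1


def can_commit(reachable_sites: set[str], voters_per_site: dict[str, int]) -> bool:
    """True if the voters in reachable_sites form a majority of all voters."""
    total = sum(voters_per_site.values())
    reachable = sum(voters_per_site[site] for site in reachable_sites)
    return reachable >= majority(total)


# Hypothetical placement: most voters live in the data center.
placement = {"conus": 3, "fob": 1, "ship": 1}

# Both edge links drop, leaving three isolated islands.
for island in ({"conus"}, {"fob"}, {"ship"}):
    print(sorted(island), can_commit(island, placement))
# → ['conus'] True
# → ['fob'] False
# → ['ship'] False
```

Shifting voters to the edge only moves the problem. With `{"conus": 1, "fob": 2, "ship": 2}`, no single site can commit on its own. Whichever side holds the minority stops accepting writes for as long as the partition lasts.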

The Source of Truth Problem

Centralized architectures create a fundamental tension at the edge: the central database demands both responsiveness *and* consistency across locations that may be thousands of miles apart, connected by contested or degraded links.

The harder problem isn't the consensus algorithm. It's the assumption that consensus is required at all.

Consider what actually needs to be globally consistent in an edge deployment. Workload placement? The node running the workload already knows what it's running. Routing tables? They need to reflect reality, but "reality" is inherently distributed - each node's state is authoritative for that node.

The architectural insight: **flip the source of truth**. Instead of central orchestrators deciding what runs where and propagating that decision downward, let individual nodes be authoritative for their own workloads.

Learning from Network Routing

This isn't a new idea. Network routing protocols solved this problem decades ago.

A protocol like OSPF is a link-state routing protocol: each router is the source of truth for its own links and is responsible for flooding changes to its peers as link-state advertisements (LSAs). Every router builds the same link-state database from those advertisements and computes its own shortest paths from it. There's no central routing database blessing every path. The network converges on a consistent view through efficient propagation, not through consensus elections.
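Below is a minimal Python sketch of the link-state idea, built on simplifying assumptions:

- Each router contributes only the description of its own links.
- The flooded database is modeled as a plain dictionary, and every router holds a copy.
- Each router runs Dijkstra's algorithm on its copy to pick its own next hops.

Flooding, sequence numbers, LSA aging, and areas are all omitted. The topology and link costs are hypothetical.

```python
import heapq


def shortest_paths(lsdb: dict[str, dict[str, int]], source: str):
    """Dijkstra over a link-state database, computed locally by `source`.

    Returns (distance to each router, first hop toward each router).
    """
    dist = {source: 0}
    first_hop: dict[str, str] = {}
    visited: set[str] = set()
    # Heap entries: (distance, router, first hop used to reach it).
    heap = [(0, source, source)]
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in visited:
            continue  # stale entry; a shorter path was already settled
        visited.add(node)
        if node != source:
            first_hop[node] = hop
        for nbr, cost in lsdb.get(node, {}).items():
            nd = d + cost
            if nbr not in visited and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Leaving the source, the neighbor itself is the first hop.
                heapq.heappush(heap, (nd, nbr, nbr if node == source else hop))
    return dist, first_hop


# Each router is authoritative only for its own entry (its own links).
lsdb = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}

# Every router computes its own forwarding decisions from the same database.
print(shortest_paths(lsdb, "A")[1])  # → {'B': 'B', 'C': 'B', 'D': 'B'}
print(shortest_paths(lsdb, "D")[1])  # → {'C': 'C', 'B': 'C', 'A': 'C'}
```

No router asks a central service for its routes. Agreement comes from everyone running the same deterministic computation over the same flooded facts.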

Edge orchestration can work the same way. Nodes own their state. Changes propagate via gossip. Conflicts are resolved through well-understood mechanisms (last-write-wins with causal ordering, CRDTs, or domain-specific resolution). The system converges quickly without requiring any node to wait for acknowledgment from a central authority.
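The ownership model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a specific product's design:

- Each node is the only writer of its own entry.
- A node bumps a per-owner version counter on every change.
- Merging keeps the higher-versioned entry for each owner.

That merge is commutative, associative, and idempotent, so it behaves as a state-based CRDT: replicas converge regardless of the order in which gossip exchanges happen. The node names and workloads are hypothetical.

```python
from dataclasses import dataclass, field

# owner name -> (version, workloads that owner reports running)
View = dict[str, tuple[int, frozenset[str]]]


@dataclass
class Node:
    name: str
    view: View = field(default_factory=dict)

    def set_local(self, workloads: set[str]) -> None:
        """Update this node's own entry; only the owner ever bumps its version."""
        version, _ = self.view.get(self.name, (0, frozenset()))
        self.view[self.name] = (version + 1, frozenset(workloads))

    def merge(self, other: View) -> None:
        """Keep the higher-versioned entry for each owner (idempotent, order-free)."""
        for owner, (version, workloads) in other.items():
            if version > self.view.get(owner, (0, frozenset()))[0]:
                self.view[owner] = (version, workloads)


def gossip(a: Node, b: Node) -> None:
    """One push-pull exchange between two reachable peers."""
    a.merge(b.view)
    b.merge(a.view)


nodes = {name: Node(name) for name in ("fob-1", "fob-2", "ship-1")}
nodes["fob-1"].set_local({"sensor-fusion"})
nodes["fob-2"].set_local({"comms-relay"})
nodes["ship-1"].set_local({"logistics"})

# Partition: the FOB nodes reach each other; the ship is cut off but keeps working.
gossip(nodes["fob-1"], nodes["fob-2"])
nodes["ship-1"].set_local({"logistics", "maps"})

# Link restored: a couple of exchanges and every replica holds the same view.
gossip(nodes["fob-2"], nodes["ship-1"])
gossip(nodes["fob-1"], nodes["ship-1"])
print(all(n.view == nodes["fob-1"].view for n in nodes.values()))  # → True
```

Because no two nodes ever write the same entry, this scheme needs no conflict-resolution rule at all. Last-write-wins or richer CRDTs only come into play for state that several nodes can modify.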

What This Looks Like Operationally

In a decentralized control plane:

  • No leader elections: There's no master node whose failure requires expensive recovery. Every node can operate independently.

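  • No consensus bottlenecks: Updates spread via gossip without waiting on a quorum. The number of gossip rounds needed to reach every node grows only roughly logarithmically with cluster size, so propagation stays fast as the deployment grows, limited mainly by link latency rather than by coordination.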

  • Graceful degradation: When connectivity degrades, nodes continue operating with their local state. When connectivity returns, they reconcile automatically.

  • Local reads, local decisions: Routing decisions and workload queries hit local state - no round-trips to a distant control plane.
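A quick way to check how gossip propagation scales is to simulate it. This minimal Python sketch rests on assumed conditions:

- Every informed node pushes the update to one uniformly random peer per round (fanout 1).
- No messages are lost.
- A "round" stands for one gossip interval.

The function name and parameters are hypothetical. Under these assumptions, the number of rounds needed to reach every node grows roughly with the logarithm of the cluster size.

```python
import math
import random


def rounds_to_spread(n: int, fanout: int = 1, seed: int = 0) -> int:
    """Rounds of push gossip until all n nodes hold an update started at node 0."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    informed = {0}
    rounds = 0
    while len(informed) < n:
        # Every informed node pushes to `fanout` random peers (repeats possible).
        targets = {rng.randrange(n) for _ in range(len(informed) * fanout)}
        informed |= targets
        rounds += 1
    return rounds


for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} nodes: {rounds_to_spread(n):>3} rounds (log2 n = {math.log2(n):.1f})")
```

With fanout 1, the informed set can at most double each round, so the result never drops below log2 n. The round count does keep rising as the cluster grows, only slowly. Each round is still only as fast as the links it crosses.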

The trade-off is obvious: you give up strong consistency for availability and partition tolerance. But at the edge, that's almost always the right trade. A system that's consistent but unavailable is worthless in a contested environment.

The Blast Radius Question

Decentralization doesn't eliminate failure modes - it changes them. In a centralized system, control plane failures affect everything. In a decentralized system, bugs and bad data can propagate *fast*.

The mitigation is regionalization. Instead of a single global state domain, create hierarchical zones. Fine-grained state stays regional; global state contains only what's necessary for cross-region routing. This bounds the blast radius: a bug or bad data in one region's state affects only that region.
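Here is a minimal Python sketch of that hierarchy, using hypothetical region names, node names, and services. Each region keeps its fine-grained node table to itself. It publishes only a coarse summary (the set of services it can serve) to global state, and that summary is all that cross-region routing consults. The summary format is an assumption made for illustration; a real system would choose what crosses the boundary based on its own routing needs.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    # Fine-grained, region-local state: node -> services that node runs.
    nodes: dict[str, set[str]] = field(default_factory=dict)

    def summary(self) -> frozenset[str]:
        """The only thing that leaves the region: which services it can serve."""
        return frozenset(svc for services in self.nodes.values() for svc in services)


def build_global(regions: list[Region]) -> dict[str, frozenset[str]]:
    """Global state holds one summary per region, not per-node detail."""
    return {region.name: region.summary() for region in regions}


def route(global_state: dict[str, frozenset[str]], service: str) -> list[str]:
    """Regions that can currently serve `service`."""
    return sorted(name for name, services in global_state.items() if service in services)


east = Region("east", {"e1": {"maps", "chat"}, "e2": {"chat"}})
west = Region("west", {"w1": {"maps"}, "w2": {"telemetry"}})

# A buggy update wipes east's node table. The damage stays in east:
# west's detailed state is untouched, and global state only loses east's summary.
east.nodes.clear()
global_state = build_global([east, west])
print(route(global_state, "maps"))  # → ['west']
print(route(global_state, "chat"))  # → []
```

The sketch also shows the limit of the approach: a bad summary still reaches global state. The boundary shrinks how much can go wrong globally rather than eliminating global impact.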

Why This Matters for Defense and Disconnected Operations

Traditional enterprise orchestration assumes the happy path: reliable connectivity, stable infrastructure, predictable network topology. Defense and edge environments offer none of these.

In denied, degraded, intermittent, or limited (DDIL) environments, your control plane architecture *is* your availability architecture. A system that requires phoning home to a central authority will fail when that link is contested. A system where each node is self-sufficient and eventually consistent can continue operating through the storm.

This isn't theoretical. It's the difference between a platform that enables mission continuity and one that becomes a liability the moment conditions get difficult.


Brad Sollar

Brad is the CTO and Co-Founder of Mainsail and brings 25+ years of experience innovating and accelerating the deployment of cutting-edge technologies. During his time at Lockheed Martin, Brad built a diverse background in offensive and defensive cyber operations for the Army, and he has driven innovation in emerging technologies at MITRE. At Red Hat, he guided federal clients in adopting Kubernetes and automation solutions. As Field CTO at Presidio, Brad helped organizations implement hybrid cloud strategies using VMware, Cisco, and AWS. Throughout his career, Brad has continually demonstrated his technical prowess in architecting modern solutions to address our most complex cybersecurity challenges.
