Optimization for Edge Workloads
The DoD's most significant collaborative effort is Joint All-Domain Command and Control (JADC2), whose main goal is to expose data between forces and weapon systems so that AI and ML can be used in real time to make strategic decisions. Each branch has developed its own program to oversee development and integration: the Navy has Project Overmatch, the Army has Project Convergence, and the Air Force has the Advanced Battle Management System. Each component specializes in and coordinates efforts spanning guidance, policy, program offices, and cybersecurity. Success with such an ambitious goal requires accounting for many factors; here we look at what is needed to connect and expose data from distributed edge systems.
One thing all edge devices have in common is that they are resource-constrained: there are no infinitely scalable resources like there are in the cloud. Every workload must therefore be appropriately sized so that it does not cause resource contention or affect co-located workloads on the same device. This is a problem for many of today's virtualization platforms, which share hardware between virtual machines and use scheduling orchestrators to divide resources fairly between workloads/VMs.
- Noisy neighbor: It is hard to guarantee any quality of service (QoS) in multitenant environments because workloads and their resource demands are unpredictable. The noisy neighbor problem arises when another workload on the same machine competes for system resources, introducing latency, reduced throughput, and jitter into your workload. Noisy neighbors are not only a public cloud phenomenon; they also occur in traditional virtualization environments.
- Over-scheduling: The second issue is resource contention (over-scheduling), which arises when multiple workloads come under load at the same time. The common theme in these situations is that everyone's workload suffers. Over-scheduling can be detrimental to time-sensitive workloads and a deal-breaker for any real-time system.
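The contention described above is easy to observe for yourself. The following is a minimal timing sketch (a hypothetical harness, not the vendor demo): it times a fixed unit of work on a quiet machine, then again while busy-loop "neighbor" processes compete for every core.

```python
import multiprocessing as mp
import time

def busy_loop(stop_after: float) -> None:
    """A 'noisy neighbor': burn CPU until the deadline passes."""
    deadline = time.monotonic() + stop_after
    while time.monotonic() < deadline:
        pass

def timed_work(iterations: int = 200_000) -> float:
    """A fixed unit of work; returns the wall-clock seconds it took."""
    start = time.monotonic()
    total = 0
    for i in range(iterations):
        total += i * i
    return time.monotonic() - start

if __name__ == "__main__":
    quiet = [timed_work() for _ in range(5)]

    # Spin up neighbors that compete for every core for ~2 seconds.
    neighbors = [mp.Process(target=busy_loop, args=(2.0,))
                 for _ in range(mp.cpu_count())]
    for p in neighbors:
        p.start()
    noisy = [timed_work() for _ in range(5)]
    for p in neighbors:
        p.join()

    # Under contention, the same work typically takes longer and varies more.
    print(f"quiet min={min(quiet):.4f}s  noisy min={min(noisy):.4f}s")
```

On a loaded machine the "noisy" samples are usually slower and noticeably more spread out than the "quiet" ones, which is exactly the jitter and latency the bullets above describe.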
Both issues harm the workloads running at the edge, such as AI/ML and HPC, where jitter, reduced throughput, and added latency can be detrimental. The edge is where AI/ML models are usually deployed and run, with AI decisions sent back to commanders to aid their decision-making. Model results also feed a training loop that continuously improves the model and refines results.
To combat the interference that can arise in multi-tenant systems, vendors turn to software-based controls such as CPU pinning, which keeps a VM's workload from being moved around and sets its QoS level by pinning it to specific CPUs. Scheduling policies can also grant certain VMs higher QoS than others. While there are many additional settings and configurations, most of these controls live at the hypervisor layer and depend on a scheduler to share resources between VMs and enforce QoS policies fairly.
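The pinning idea can be sketched at the OS level with Linux's scheduler-affinity API. This is a process-level analogue of hypervisor vCPU pinning, not a hypervisor API, and it is Linux-only:

```python
import os

def pin_to_cores(pid: int, cores: set[int]) -> set[int]:
    """Pin a process to a fixed set of CPU cores (Linux only).

    Once pinned, the scheduler no longer migrates the process off
    these cores -- the same mechanism hypervisors use for vCPU pinning.
    Returns the affinity set actually in effect afterwards.
    """
    os.sched_setaffinity(pid, cores)
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    # Pin the current process (pid 0 means "self") to core 0 only.
    print(pin_to_cores(0, {0}))
```

Note that pinning constrains *where* a workload runs but does not stop another process from being scheduled onto the same core, which is why pinning alone cannot eliminate noisy neighbors.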
Partitioning hypervisors use a different model to ensure the highest QoS for workloads and to combat noisy neighbors and interference in multitenant systems. They partition system hardware and dedicate hardware resources to each VM. Virtual machines no longer have to compete for resources, worry about noisy neighbors, or go without computing power when the entire system is under load. Traditional virtualization time-shares the physical hardware between virtual machines, using a scheduler to divide processing time on the CPU; partitioning hypervisors instead dedicate hardware to the VM at boot time. Workloads on partitioning hypervisors therefore receive dedicated QoS that traditional virtualization cannot match.
- Partitioning hypervisors use micro-segmentation to partition system hardware, which is dedicated to each virtual machine (VM).
- Partitioning hypervisors can use the latest processor features to isolate the CPU's cores and cache, VPU, memory, storage, and network.
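The boot-time dedication described above can be modeled as carving the system's cores into disjoint sets, one per VM. The sketch below is a simplified toy (the names and demands are made up, and a real partitioning hypervisor also partitions cache, memory, and I/O), but it captures the key invariant: each core belongs to exactly one VM, with no overcommit.

```python
def partition_cores(total_cores: int,
                    vm_demands: dict[str, int]) -> dict[str, list[int]]:
    """Carve a machine's cores into disjoint, dedicated sets, one per VM.

    Unlike a time-sharing scheduler, the assignment is fixed up front:
    each core belongs to exactly one VM for the lifetime of the system,
    and demands may never exceed the physical core count.
    """
    if sum(vm_demands.values()) > total_cores:
        raise ValueError("demands exceed physical cores; no overcommit allowed")
    plan: dict[str, list[int]] = {}
    next_core = 0
    for vm, count in vm_demands.items():
        plan[vm] = list(range(next_core, next_core + count))
        next_core += count
    return plan

if __name__ == "__main__":
    print(partition_cores(8, {"rtos-vm": 2, "ai-vm": 4, "mgmt-vm": 2}))
    # {'rtos-vm': [0, 1], 'ai-vm': [2, 3, 4, 5], 'mgmt-vm': [6, 7]}
```

Because the sets are disjoint and fixed at boot, there is no scheduler decision left to make at runtime, which is where the determinism comes from.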
Pooled vs. Dedicated Resources
In traditional virtualization, resources are pooled, and a scheduler places workloads within these pools of compute. A scheduled workload is typically allocated shares, or time it may spend processing on the system's CPU. Due to resource contention and noisy neighbors, the reality of getting the fully promised utilization of CPU core and cache is significantly diminished under shared, pooled resources. Partitioning hypervisors take a different approach and dedicate CPU cores to workloads, giving each workload 100% utilization of its cores. This is shown in fig. 2: workloads receive full CPU utilization, and in most cases optimal bin-packing can place more workloads on the same hardware than pooled resources can.
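The bin-packing claim can be illustrated with a small first-fit sketch. Because dedicated cores make each workload's demand exact and guaranteed, a packer can fill nodes tightly; the numbers below are invented for illustration.

```python
def first_fit_pack(demands: list[int], cores_per_node: int) -> list[list[int]]:
    """First-fit bin packing: place each workload's dedicated-core demand
    on the first node with enough free cores, opening new nodes as needed."""
    nodes: list[list[int]] = []
    for d in demands:
        for node in nodes:
            if sum(node) + d <= cores_per_node:
                node.append(d)
                break
        else:
            nodes.append([d])  # no node had room; open a new one
    return nodes

if __name__ == "__main__":
    workloads = [4, 3, 2, 2, 1]          # dedicated cores per workload
    print(first_fit_pack(workloads, 8))  # [[4, 3, 1], [2, 2]] -> 2 nodes
```

With pooled resources, by contrast, each workload's effective demand must include headroom for contention, so the same node capacity holds fewer workloads at the promised QoS.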
Mainsail Industries has created a demo that showcases how partitioning hypervisors protect workloads from noisy neighbors, visualizing the increased QoS and determinism they bring. The demo runs a Kubernetes cluster in which half of the cluster uses traditional KVM virtualization and the other half a partitioning hypervisor. The same application runs on both sides, and both are monitored for performance and latency. To simulate a noisy neighbor, we spin up 20 SQL containers on each side of the cluster and watch how the app performs when resource contention occurs. The demo clearly shows the KVM side suffering high latency and a performance impact, while the partitioning hypervisor side is unaffected because its hardware is dedicated to the VM and not shared.
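A common way to quantify what such a monitoring setup shows is to compare tail latency percentiles between the two sides. The sketch below uses made-up sample data (not the demo's actual measurements) to show how contention spikes surface in the 99th percentile:

```python
import statistics

def p99(samples: list[float]) -> float:
    """99th-percentile latency from a list of response times (ms)."""
    return statistics.quantiles(samples, n=100)[98]

if __name__ == "__main__":
    # Hypothetical samples: the contended side shows occasional spikes,
    # while the partitioned side stays flat and deterministic.
    contended   = [5.0] * 90 + [40.0] * 10
    partitioned = [5.0] * 100
    print(f"contended p99={p99(contended)}ms  partitioned p99={p99(partitioned)}ms")
```

Tail percentiles matter more than averages here: a mean latency can look healthy while the p99 reveals exactly the jitter that breaks time-sensitive workloads.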
This technology works at low levels in the stack to provide a higher quality of service for workloads and integrates with higher-level orchestration frameworks like OpenStack and Kubernetes. We are building security and performance from the ground up and will continue developing and providing optimal tuning profiles for edge workloads today and in the future.