Solution Design

Introduction

Welcome to the CNF reference architecture solution design documentation. A working knowledge of Kubernetes and network virtualization concepts (e.g., CNI, SR-IOV, VFs) is recommended before reading this document.

This document details design aspects that tailor the cloud-native environment for networking workloads. This mainly involves two items:

  • Dedicating isolated CPUs to networking workload Pods

  • Enabling the Kubernetes control plane to manage NIC virtual functions (VF)

In the cloud-native paradigm, Pods declare the resources they need, and Kubernetes provides them. This means the Kubernetes cluster has to be configured to understand and provision the types of resources networking workload Pods will need. Additionally, Pods need to know how to request the resources they need (i.e., dedicated CPUs and VFs).

This document also discusses how the local Docker registry is created and used.

Cluster Configuration

Dedicated Isolated CPUs

Networking workloads expect to run on dedicated CPUs that are isolated from the Linux kernel scheduler. Isolation is provided by the isolcpus kernel command line parameter.
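
For example, assuming CPUs 4-11 on each worker are the ones to be isolated (the exact CPU list is an assumption and depends on each node's topology), the kernel command line would include:

isolcpus=4-11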

First, Kubernetes has to be configured to provide Pods dedicated CPUs. For this, the kubelet on each worker has to be configured to use the static CPU policy. This policy manages a pool of CPUs that pods can share. When a Pod is granted a dedicated CPU, that CPU is removed from this pool and placed in that Pod’s cpuset. Therefore, there is only one Pod on that CPU.

Once Pods are able to utilize dedicated CPUs, Kubernetes has to be configured to only use isolated CPUs for Pods. This is achieved by setting the reservedSystemCPUs field of the kubelet configuration file. This field defines which CPUs are reserved for host-level system threads and Kubernetes-related threads. So, if this is set to all non-isolated CPUs, then all host-level system threads and Kubernetes-related threads run on non-isolated CPUs, leaving the isolated CPUs available for Pods.
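
A minimal kubelet configuration sketch, assuming CPUs 0-3 are the non-isolated CPUs reserved for the host and Kubernetes and CPUs 4-11 are the isolated CPUs (both CPU lists are assumptions):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Remove dedicated CPUs from the shared pool and pin Guaranteed Pods to them.
cpuManagerPolicy: static
# Keep host-level system threads and Kubernetes-related threads on the non-isolated CPUs.
reservedSystemCPUs: "0-3"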

Combining these enables Pods to obtain exclusive, dedicated CPUs isolated from the rest of the system.

Enabling Kubernetes to Manage Virtual Functions

Kubernetes enables management of devices via Device Plugins. This enables the Kubernetes control plane to understand what devices are available on which nodes, and assign them to specific Pods.

This solution utilizes the SR-IOV Network Device Plugin to make Virtual Functions (VFs) allocatable to Pods. The SR-IOV Device Plugin runs on every worker node to get information on all available VFs. Based upon the configured selectors, VFs are added under resource names. This solution groups all VFs intended for its use under the arm.com/dpdk name.
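
For illustration, a minimal sketch of the SR-IOV Network Device Plugin configuration that creates the arm.com/dpdk resource. The ConfigMap name, namespace, and selector values are assumptions; the selectors must be adjusted to match the NICs actually present on the worker nodes:

apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [
        {
          "resourcePrefix": "arm.com",
          "resourceName": "dpdk",
          "selectors": {
            "drivers": ["vfio-pci"],
            "pfNames": ["eth1"]
          }
        }
      ]
    }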

Once the VFs are grouped together on each node, the SR-IOV Device Plugin makes this known to the Kubernetes control plane through each node’s kubelet. Once this is done, the Kubernetes control plane understands which VFs are available on which nodes, and can dedicate specific VFs to specific Pods.

Enabling Virtual Functions to be Added to Pods

The SR-IOV Device Plugin by itself is not sufficient to provide Pods access to VFs. The SR-IOV CNI must be used to add VFs as a separate network interface to Pods.

SR-IOV CNI is unable to provide a default Pod network, and thus relies upon a meta CNI such as Multus CNI. Multus CNI enables creation of “multi-homed” pods, or pods with multiple network interfaces. This enables standard CNIs (e.g. Calico, Cilium, Flannel) to provide the Kubernetes-aware networking (Pod to Pod, Pod to service, etc.) while SR-IOV CNI focuses on “just” adding the VF as another network interface to the Pod.

To summarize, the SR-IOV Device Plugin enables Kubernetes to understand and allocate VFs to Pods. The SR-IOV CNI is needed to add the VF as a secondary network interface to a Pod. To enable a multi-homed Pod to use SR-IOV CNI, a meta CNI like Multus is needed.

Multus CNI is configured using NetworkAttachmentDefinitions. The name of a NetworkAttachmentDefinition is used by Pods to request additional network interfaces. A single NetworkAttachmentDefinition invokes a single CNI to provide an additional interface; a Pod can ask Multus to invoke any number of CNIs, each any number of times, to attach any number of interfaces.

This solution needs Multus CNI to add additional interfaces for VFs using SR-IOV CNI. To accomplish this, the k8s.v1.cni.cncf.io/resourceName annotation is added to the NetworkAttachmentDefinition metadata. This is needed so Multus will provide SR-IOV CNI with the necessary device information. This solution names the NetworkAttachmentDefinition sriov-dpdk.
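
A minimal sketch of such a NetworkAttachmentDefinition; the exact contents of spec.config (CNI version, IPAM, and so on) are assumptions and depend on the deployment:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-dpdk
  annotations:
    # Tells Multus which device plugin resource backs this network, so
    # SR-IOV CNI receives the allocated VF's device information.
    k8s.v1.cni.cncf.io/resourceName: arm.com/dpdk
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "sriov",
      "name": "sriov-dpdk"
    }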

How Pods Can Utilize These Resources

Dedicated and Exclusive CPUs

Pods must have the Guaranteed Quality of Service (QoS) class to be allocated exclusive, dedicated CPUs. This means:

  • Every container in the Pod must have a memory limit and memory request

  • For every container in the Pod, the memory limit must equal the memory request

  • The same is true for CPU limits and requests

Virtual Function Allocation and Use

Pods first have to declare they need a VF resource. This is done in the same requests and limits fields as the CPU and memory resources. Since this solution puts every VF into the arm.com/dpdk resource name, requesting one VF is done by putting arm.com/dpdk: 1 into both limits and requests. For example, the following snippet is used in the DPDK sample application deployment:

resources:
  # Limits and requests must be equal for cpu and memory for the container to be pinned to CPUs.
  limits:
    hugepages-2Mi: 1Gi
    cpu: 1
    memory: 2Gi
    arm.com/dpdk: 1
  requests:
    cpu: 1
    memory: 2Gi
    arm.com/dpdk: 1

Inside the Pod, an environment variable is set to inform the application which device(s) it has been allocated. In the case of this solution, the arm.com/dpdk resource name means the environment variable is called PCIDEVICE_ARM_COM_DPDK. See examples/dpdk-l3fwd/dpdk-launch.sh for examples using this variable.
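
As a hypothetical sketch of how the variable might be consumed (the actual usage is in the script referenced above), assuming a single VF was requested:

# PCIDEVICE_ARM_COM_DPDK holds the PCI address(es) of the allocated VF(s),
# comma-separated if more than one VF was requested.
VF_PCI="${PCIDEVICE_ARM_COM_DPDK%%,*}"   # first (here, only) allocated VF
# The address can then be passed to the DPDK EAL allow list, for example:
#   <dpdk-application> -l <core-list> -a "${VF_PCI}" -- <application options>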

In addition to requesting the VF, the Pod must have Multus add the VF via SR-IOV CNI. This solution has configured the SR-IOV CNI to be invoked with the Multus network name sriov-dpdk, so all the Pod must do is add the k8s.v1.cni.cncf.io/networks: sriov-dpdk annotation to its metadata.
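
A minimal sketch of the relevant Pod metadata:

metadata:
  annotations:
    # Ask Multus to invoke the sriov-dpdk NetworkAttachmentDefinition, which
    # adds the allocated VF as a secondary network interface.
    k8s.v1.cni.cncf.io/networks: sriov-dpdk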

Local Docker Registry

The controller node sets up a Docker registry for this solution to use. This registry holds the container images for the AArch64 build of the SR-IOV CNI and the sample application.

To set up the registry and have it used by the other nodes in the cluster, the following steps are followed (a hypothetical sketch of steps 1 and 3 appears after the list):

  1. Create a self-signed certificate for the FQDN of the controller node

  2. Trust the certificate for every node in the cluster

  3. Launch the registry using the self-signed certificate

  4. Nodes interact with the registry using the controller’s FQDN
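
As a sketch of steps 1 and 3, assuming the registry listens on port 5000 and the certificate files are named registry.crt and registry.key (the port, file names, and CONTROLLER_FQDN variable are assumptions, not the exact commands used by this solution):

# Step 1: create a self-signed certificate for the controller's FQDN
openssl req -x509 -newkey rsa:4096 -nodes -sha256 -days 365 \
  -keyout certs/registry.key -out certs/registry.crt \
  -subj "/CN=${CONTROLLER_FQDN}" \
  -addext "subjectAltName=DNS:${CONTROLLER_FQDN}"

# Step 3: launch the registry using the self-signed certificate
docker run -d --restart=always --name registry -p 5000:5000 \
  -v "$(pwd)/certs:/certs" \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/registry.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/registry.key \
  registry:2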

If additional worker nodes are added to the cluster at a later time, they will need to trust the self-signed certificate. Otherwise, the additional nodes cannot pull the SR-IOV CNI Docker image to install it.