User Guide

Introduction

Welcome to the CNF Reference Architecture user guide. This guide provides instructions on how to run a sample containerized networking application in a multi-node Kubernetes cluster comprised of AArch64 machines.

This reference solution targets networking software developers and performance analysis engineers who have in-depth networking knowledge but are not necessarily familiar with the AArch64 architecture.

Familiarity with certain open source projects, e.g., Ansible, Kubernetes, and DPDK, will make it easier to understand this guide and the reference solution.

This guide describes practical use cases that require a more involved test setup. By following the steps of this guide to the end, you will set up a multi-node Kubernetes cluster. One machine will serve as the Kubernetes controller and host a private Docker registry to hold custom container images. The worker nodes will run application Pods, such as DPDK testpmd. The multi-node Kubernetes cluster topology is shown below.

Figure: Multi-node Kubernetes cluster topology (k8s-cluster.png)

The topology diagram above illustrates the major components of the deployment and their relationship.

  • DPDK testpmd application, implements a 5-tuple swap networking function in software and forwards packets back out the same port on which they were received.

  • TG (Traffic Generator), generates and sends packets to the worker node’s NIC card via the Ethernet cable. It can be a hardware TG, e.g., an IXIA chassis, or a software TG running on a regular server, e.g., TRex, DPDK Pktgen, or Scapy.

  • Management Node, can be any bare-metal machine, VM, or container. It is used to download the project source code and to log in to the controller and worker nodes to create the Kubernetes cluster and deploy the application.

Infrastructure Setup

This guide can be run on physical hardware or on AWS EC2 cloud instances.

This guide requires the following setup:

Figure: Required infrastructure setup (user_guide_hw.png)

Physical Hardware Setup

  1. Controller Node can be any machine that has a network connection to the other machines in the Kubernetes cluster. The solution is tested against an AArch64 machine as the controller node.

Hardware Minimum Requirements

The Controller Node has the following hardware requirements:

  • Minimum 1GHz and 2 CPU cores

  • Minimum 8GB RAM

  • Connection to the internet to download and install packages

  • Connection to the worker nodes

Software Minimum Requirements

The following items are expected of the Controller Node’s software environment:

  • Controller Node is running Ubuntu 20.04 (Focal)

  • Admin (root) privileges are required

  • The Fully Qualified Domain Name (FQDN) of the Controller Node can be checked with the python3 -c 'import socket; print(socket.getfqdn())' command, as shown below. See Troubleshooting if the proper FQDN is not shown.
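
For example, running the command on the Controller Node might print the following; the hostname shown here is hypothetical:

    $ python3 -c 'import socket; print(socket.getfqdn())'
    controller.example.com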

  2. Worker Nodes are any number of AArch64 machines. A NIC is plugged into a PCIe slot and connected to a traffic generator with an Ethernet cable.

Hardware Minimum Requirements

The Worker Nodes have the following hardware requirements:

  • AArch64 v8 CPU

  • Minimum 1GHz and 4 CPU cores

  • DPDK compatible NIC

  • Connection to the internet to download and install packages

  • Minimum 8GB of RAM

  • Support for 1GB hugepages

Software Minimum Requirements

  • Worker node is running Ubuntu 20.04 (Focal)

  • Admin (root) privileges are required

  • PCIe address of the NIC port(s) attached to the traffic generator is confirmed with sudo lshw -C network -businfo

  • CPU cores are isolated and 1GB hugepages are reserved via the required Linux kernel command line parameters. See the FAQ for more details; an illustrative example follows this list.
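
The FAQ has the authoritative steps; purely as an illustrative sketch, the worker node's kernel command line might include parameters such as the following, where the CPU list and hugepage count are placeholders:

    isolcpus=4-15 default_hugepagesz=1G hugepagesz=1G hugepages=8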

There can be any number of worker nodes. To use a single-node cluster, refer to the Quickstart Guide.

  3. Management node can be any bare-metal machine, VM, or container. The management node is used to download the repository, access the cluster nodes via ssh, and configure the Kubernetes cluster by executing an Ansible playbook. The Ansible playbook is executed locally on the management node and configures the cluster nodes via ssh.

Software Minimum Requirements

  • Can execute Ansible

  • Can ssh into each cluster node using SSH keys. See the FAQ for more details; a minimal sketch follows this list.

  • Admin (root) or sudo privileges are required
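
The FAQ covers SSH key setup in detail; as a minimal sketch, assuming a hypothetical user1 account on a node reachable as worker-1, key-based access is typically prepared with:

    $ ssh-keygen -t ed25519
    $ ssh-copy-id user1@worker-1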

  4. TG can be any traffic generator capable of generating IP packets.

AWS EC2 Setup

  1. Controller Node can be any EC2 AArch64 machine that has a network connection to the other machines in the Kubernetes cluster.

EC2 Requirements

The Controller Node has the following hardware requirements:

  • c6gn.xlarge or c7gn.xlarge (or larger) instance

  • Connection to the internet to download and install packages

  • Connection to the worker nodes

  • EC2 instance is associated with the required AWS VPC CNI IAM policies, in addition to the AmazonEC2ContainerRegistryReadOnly policy.

Software Minimum Requirements

The following items are expected of the Controller Node’s software environment:

  • Controller Node is running Amazon Linux 2

  • Admin (root) privileges are required

  2. Worker Nodes are any number of AArch64 EC2 instances meeting the following requirements:

EC2 Requirements

  • c6gn.xlarge or c7gn.xlarge (or larger) instance type

  • Secondary ENI attached, with the node.k8s.amazonaws.com/no_manage: true tag applied. The ENI’s PCIe address is known (a sketch of attaching and tagging the ENI with the aws CLI follows this list)

    • It should be 0000:00:06.0

  • Connection to the internet to download and install packages
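
For reference only, attaching the secondary ENI at device index 1 and applying the tag can also be done with the aws CLI; the ENI and instance IDs below are placeholders:

    $ aws ec2 attach-network-interface --device-index 1 \
        --network-interface-id eni-0123456789abcdef0 --instance-id i-0123456789abcdef0
    $ aws ec2 create-tags --resources eni-0123456789abcdef0 \
        --tags Key=node.k8s.amazonaws.com/no_manage,Value=true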

Software Requirements

  • Amazon Linux 2 AMI

  • aws CLI installed, with permission to describe-instance-types

  • CPU cores are isolated via the isolcpus Linux kernel command line parameter. See FAQ for more details

  • At least one 1GB hugepage is available. To easily allocate one, add the relevant Linux kernel command line parameters as described in the FAQ

  • SSH access enabled via SSH keypair

  • Admin (root) or sudo privileges

  • EC2 instance is associated with the required AWS VPC CNI IAM policies, in addition to the AmazonEC2ContainerRegistryReadOnly policy.

  3. Management node can be any bare-metal machine, VM, or container. The management node is used to download the repository, access the DUT via ssh, and configure the Kubernetes cluster by executing an Ansible playbook. The Ansible playbook is executed locally on the management node and configures the DUT via ssh.

Software Minimum Requirements

  • Can execute Ansible

  • Can ssh into the DUT using SSH keys. See FAQ for more details.

  • Admin (root) or sudo privileges are required

  4. TG can be any traffic generator capable of generating IP packets. For EC2 deployment, this is typically another EC2 instance in the same VPC running a software-based traffic generator, such as Pktgen DPDK.

Tested Platforms

This solution is tested on the following platforms.

Physical Hardware

Cluster Nodes

NIC

  • Mellanox ConnectX-5

    • OFED driver: MLNX_OFED_LINUX-5.4-3.1.0.0

    • Firmware version: 16.30.1004 (MT_0000000013).

  • Intel X710

    • Firmware version: 6.01

Note

To use a Mellanox NIC, install the OFED driver and update and configure the NIC firmware by following the guidance in the FAQ.

Management Node

  • Ubuntu 20.04 system

    • Python 3.8

    • Ansible 6.5.0

AWS EC2 Instances

Cluster Nodes

  • c6gn.xlarge instance and c6gn.2xlarge instance

    • Amazon Linux 2

    • Kernel 5.10.184-175.731.amzn2.aarch64

    • Security group settings

      • Security group rules added according to the Kubernetes Ports and Protocols guidelines

      • Additionally, to allow traffic between x86 and Arm EC2 instances, the inbound rules permit all ports and protocols.

    • Secondary ENI attached at device index 1, with node.k8s.amazonaws.com/no_manage set to true

Management Node

  • Ubuntu 20.04 system

    • Python 3.8

    • Ansible 6.5.0

Prerequisite

Management Node

The management node needs dependencies installed, e.g., git, curl, python3.8, pip, Ansible, and repo. Follow the guidelines below on Ubuntu 20.04.

  1. Make sure sudo is available and install git, curl, python3.8, python3-pip, python-is-python3 by executing

    $ sudo apt-get update
    $ sudo apt-get install git curl python3.8 -y
    $ sudo apt-get install python3-pip python-is-python3 -y
    
  2. Install ansible and netaddr by executing

    $ sudo python3 -m pip install ansible==6.5.0 netaddr
    

Note

Install the ansible package, not ansible-core, as this solution makes use of community collections that are not included in the ansible-core Python package.

  3. Configure git with your name and email address

    $ git config --global user.email "you@example.com"
    $ git config --global user.name "Your Name"
    
  4. Follow the instructions provided in git-repo to install the repo tool manually; a minimal sketch follows this list

  5. Follow the FAQ to set up SSH keys on the management node
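
The git-repo documentation is the authoritative reference for installing repo; a manual install is typically along these lines:

    $ mkdir -p ~/bin
    $ curl https://storage.googleapis.com/git-repo-downloads/repo > ~/bin/repo
    $ chmod a+rx ~/bin/repo
    $ export PATH=~/bin:$PATH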

DUT

Complete the steps below by following the suggestions provided.

  1. Follow the FAQ to set up the DUT with isolated CPUs and 1GB hugepages.

  2. Update NIC firmware and drivers by following the guidance in the FAQ. Not applicable to EC2 instances.

  3. Remove any routes used by the dataplane interfaces; note that this may cause loss of connectivity on those interfaces. An illustrative example follows.
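
For example, assuming a hypothetical dataplane interface enp9s0f0 with a route for 192.168.100.0/24, the route could be removed with:

    $ sudo ip route del 192.168.100.0/24 dev enp9s0f0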

Download Source Code

Unless mentioned specifically, all operations in this section are executed on the management node.

Create a new folder that will be the workspace, henceforth referred to as <nw_cra_workspace> in these instructions:

mkdir <nw_cra_workspace>
cd <nw_cra_workspace>
export NW_CRA_RELEASE=refs/tags/NW-CRA-2024.03.29

Note

Sometimes new features and additional bug fixes are made available in the git repositories, but are not tagged yet as part of a release. To pick up these latest changes, remove the -b <release tag> option from the repo init command below. However, please be aware that such untagged changes may not be formally verified and should be considered unstable until they are tagged in an official release.

To clone the repository, run the following commands:

repo init \
    -u https://git.gitlab.arm.com/arm-reference-solutions/arm-reference-solutions-manifest.git \
    -b ${NW_CRA_RELEASE} \
    -m cnf-reference-arch.xml
repo sync

Create Kubernetes Cluster

Unless mentioned specifically, all operations henceforth are executed on the management node.

Create Ansible Inventory File

The Ansible playbooks in this repository are easiest to use with inventory files to keep track of the cluster nodes. For this solution we need one inventory file.

A template inventory.ini is provided at <nw_cra_workspace>/cnf-reference-arch/inventory.ini with the following contents:

[controller]
<fqdn_or_ec2_ip> ansible_user=<remote_user> ansible_private_key_file=<key_location>
; replace above line with DUT FQDN & optionally ansible_user and ansible_private_key_file.
; If an optional variable is not used, delete the entire key=<placeholder>.

[worker]
<fqdn_or_ec2_ip> ansible_user=<remote_user> pcie_addr=<pcie_addr_from_lshw> dpdk_driver=<driver_name> ansible_private_key_file=<key_location>
; replace above line with DUT FQDN, PCIe address, DPDK linux driver & optionally ansible_user and ansible_private_key_file.
; If an optional variable is not used, delete the entire key=<placeholder>.

Filling in the inventory file differs between a physical hardware setup and an AWS EC2 setup.

Physical Hardware Inventory File

Under the [controller] heading, replace <fqdn_or_ec2_ip> with the FQDN of the Controller Node. Under the [worker] heading, replace <fqdn_or_ec2_ip> with the FQDN of a worker node, or an SSH alias for a worker node. If the controller node is also a worker node, the FQDN should be the exact same under both the [controller] and [worker] headings.

<remote_user> specifies the user name to use to login to that node.

Replace <pcie_addr_from_lshw> with the PCIe address of the port on the worker node connected to the traffic generator.

If the worker node uses a Mellanox ConnectX-5 NIC to connect to the traffic generator, replace <driver_name> with mlx5_core. Otherwise, replace it with vfio-pci.

ansible_private_key_file should be set to the identity file used to connect to each node, if it is not the default key used by ssh.

If multiple worker nodes are to be used, each one should be a separate line under the [worker] heading, with ansible_user, pcie_addr, dpdk_driver, and ansible_private_key_file filled in per worker node.

As an example, if the user name used to access the cluster nodes is user1, the controller’s FQDN is dut.arm.com, the sole worker is reachable at worker-1 and is connected to the traffic generator on PCIe address 0000:06:00.1 with a NIC compatible with the vfio-pci driver, then inventory.ini would contain:

[controller]
dut.arm.com ansible_user=user1

[worker]
worker-1 ansible_user=user1 pcie_addr=0000:06:00.1 dpdk_driver=vfio-pci

Note

All PCIe addresses for a single node must work with the same DPDK driver. This solution does not support per-address DPDK drivers without modification.

If worker-1 also had PCIe address 0000:06:00.0 connected to a traffic generator, then inventory.ini would contain:

[controller]
dut.arm.com ansible_user=user1

[worker]
worker-1 ansible_user=user1 pcie_addr="['0000:06:00.1', '0000:06:00.0']" dpdk_driver=vfio-pci

If the same setup also included a worker-2 which is connected to a traffic generator on PCIe address 0000:09:00.0 with a Mellanox NIC, and used different-key.pem as the private key, then inventory.ini would contain:

[controller]
dut.arm.com ansible_user=user1

[worker]
worker-1 ansible_user=user1 pcie_addr="['0000:06:00.1', '0000:06:00.0']" dpdk_driver=vfio-pci
worker-2 ansible_user=user1 pcie_addr=0000:09:00.0 dpdk_driver=mlx5_core ansible_private_key_file=different-key.pem

AWS EC2 Inventory File

Under the [controller] heading, replace <fqdn_or_ec2_ip> with the primary IP address of the Controller Node. Under the [worker] heading, replace <fqdn_or_ec2_ip> with the primary IP address of a worker node.

<remote_user> specifies the user name to use to login to that node.

Replace <pcie_addr_from_lshw> with the PCIe address of the secondary ENI with tag node.k8s.amazonaws.com/no_manage: true. This is typically 0000:00:06.0.

Replace <driver_name> with igb_uio.

ansible_private_key_file should be set to the SSH key pair generated at instance creation.

As an example, if the user name to access the cluster nodes is ec2-user, the controller’s IP is 10.100.100.100, the sole worker’s IP is 10.100.200.200, the private key file is ssh-key-pair.pem, and the secondary ENI PCIe address is 0000:00:06.0, then inventory.ini would contain:

[controller]
10.100.100.100 ansible_user=ec2-user ansible_private_key_file=ssh-key-pair.pem

[worker]
10.100.200.200 ansible_user=ec2-user pcie_addr=0000:00:06.0 dpdk_driver=igb_uio ansible_private_key_file=ssh-key-pair.pem

If the same setup also included another worker node with primary IP 10.100.200.210 with a secondary ENI PCIe address of 0000:00:06.0 and used the same ec2-user and ssh-key-pair.pem, then inventory.ini would contain:

[controller]
10.100.100.100 ansible_user=ec2-user ansible_private_key_file=ssh-key-pair.pem

[worker]
10.100.200.200 ansible_user=ec2-user pcie_addr=0000:00:06.0 dpdk_driver=igb_uio ansible_private_key_file=ssh-key-pair.pem
10.100.200.210 ansible_user=ec2-user pcie_addr=0000:00:06.0 dpdk_driver=igb_uio ansible_private_key_file=ssh-key-pair.pem

If the worker with IP address 10.100.200.200 had another secondary ENI with the tag node.k8s.amazonaws.com/no_manage: true at PCIe address 0000:00:07.0, then inventory.ini would contain:

[controller]
10.100.100.100 ansible_user=ec2-user ansible_private_key_file=ssh-key-pair.pem

[worker]
10.100.200.200 ansible_user=ec2-user pcie_addr="['0000:00:06.0', '0000:00:07.0']" dpdk_driver=igb_uio ansible_private_key_file=ssh-key-pair.pem
10.100.200.210 ansible_user=ec2-user pcie_addr=0000:00:06.0 dpdk_driver=igb_uio ansible_private_key_file=ssh-key-pair.pem

Setup Kubernetes Cluster

Next, set up the Kubernetes cluster by executing the create-cluster.yaml playbook. The playbook takes multiple override parameters that slightly modify its behavior.

Physical Hardware

To execute the playbook without any override parameters, run:

$ ansible-playbook -i inventory.ini -K create-cluster.yaml

EC2 Instances

First, note the AWS region the EC2 instance is deployed in. Next, use this table to obtain the correct AWS Elastic Container Registry (ECR) URL for the AWS region. To execute the playbook without any override parameters, substitute the corresponding values for aws_region and ecr_registry_url and run:

$ ansible-playbook -i inventory.ini create-cluster.yaml -e '{aws_inst: true, deploy_on_vfs: false, aws_region: us-west-2, ecr_registry_url: 602401143452.dkr.ecr.us-west-2.amazonaws.com}'

Playbook Summary

The playbook will operate in a few stages.

Stage 1: Install necessary packages and configuration
  1. Install packages to use apt over HTTPS

  2. Install python3 and pip

  3. Add the Docker apt repository and install Docker CE (Ubuntu only)

  4. Install Docker via amazon-linux-extras (Amazon Linux 2 only)

  5. Add Kubernetes apt repository and install Kubernetes packages (Ubuntu only)

  6. Install Kubernetes binaries and systemd service files (Amazon Linux 2 only)

  7. Install required python packages via pip

  8. Add remote user to the docker group

  9. Disable swap

  10. Clean up any prior K8s clusters

  11. Configure containerd to use systemd cgroups

Stage 2: Create VFs and bind ports to specified driver

The playbook will by default create 2 VFs per PF and note the VF vendor/device ID for each worker node. It will also bind the VFs to the designated Linux driver for DPDK.
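
This stage is roughly equivalent to the following manual steps, shown only to illustrate what the playbook automates; the PF PCIe address is a placeholder and the playbook's actual tasks may differ:

# Create 2 VFs under a PF (the PF address is a placeholder)
echo 2 | sudo tee /sys/bus/pci/devices/0000:06:00.1/sriov_numvfs
# List the resulting VF PCIe addresses
lspci -D | grep -i "virtual function"
# Bind a VF to the designated DPDK driver
sudo dpdk-devbind.py --bind=vfio-pci <vf_pcie_addr>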

Stage 3: Create and trust a self-signed certificate

The playbook will create a self-signed certificate on the controller node, and have each node trust it. This is used by the docker registry to communicate over HTTPS.
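
Conceptually, this is similar to generating a certificate with openssl and adding it to each node's trust store; the sketch below is illustrative only (the FQDN is a placeholder and the playbook's actual steps may differ):

# Generate a self-signed certificate for the registry (illustrative only)
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
    -keyout registry.key -out registry.crt \
    -subj "/CN=dut.arm.com" -addext "subjectAltName=DNS:dut.arm.com"
# Trust the certificate on an Ubuntu cluster node
sudo cp registry.crt /usr/local/share/ca-certificates/registry.crt
sudo update-ca-certificates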

Stage 4: Setup Kubernetes controller node

The playbook will perform the following steps on the controller node:

  1. Start the Kubernetes control plane using kubeadm

  2. Allow the controller node user to use kubectl to interact with the cluster

  3. Copy the command to join worker nodes to the cluster to the management node

  4. Start a private docker registry using the self-signed certificate

  5. Install default CNI - Calico for physical hardware, AWS VPC CNI for EC2 instances. AWS VPC CNI is installed using the region-specific ECR repository.

  6. Generate and apply a configuration for the SR-IOV Device Manager

  7. Install Multus CNI

  8. Apply a Multus configuration

Stage 5: Setup the Kubernetes worker node(s)

The playbook will perform the following steps on the worker nodes:

  1. Get a list of non-isolated CPUs

  2. Join the Kubernetes cluster

  3. Configure the kubelet to use the static CPU policy & dedicate isolated CPUs to Pods (an illustrative configuration fragment follows this list)

  4. Configure an appropriate value for max Pods due to AWS VPC CNI limitations (EC2 instances only)

  5. Build an SR-IOV CNI image for Arm & push to the controller’s private registry (performed by only one worker node; Ubuntu only)

  6. Install SR-IOV CNI (Ubuntu only)

  7. Install the SR-IOV Device Plugin
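
As context for step 3, the static CPU policy corresponds to kubelet configuration fields along the following lines; the values are illustrative only and the playbook generates the actual configuration:

    # Fragment of a KubeletConfiguration (illustrative values only)
    cpuManagerPolicy: static
    reservedSystemCPUs: "0-3"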

Deploy on existing VFs

The default behavior is to provide PF PCIe addresses, and the solution will create VFs for each PF to provide to the application. However, it may not be possible to dedicate an entire PF to this solution. In this case, VFs can be created ahead of time and a subset provided to this solution.

To specify which VFs to consume, put the allowed VF PCIe addresses in the pcie_addr section of the inventory.ini file. The solution automatically detects that the PCIe addresses are VFs and sets up the cluster accordingly.

For example, a worker node may have the following lshw output:

Bus info          Device           Class      Description
=========================================================
pci@0000:09:00.0  enp9s0f0         network    Ethernet Controller X710 for 10GbE SFP+
pci@0000:09:00.1  enp9s0f1         network    Ethernet Controller X710 for 10GbE SFP+
pci@0000:09:00.2  enp9s0f2         network    Ethernet Controller X710 for 10GbE SFP+
pci@0000:09:00.3  enp9s0f3         network    Ethernet Controller X710 for 10GbE SFP+
pci@0000:09:02.0                   network    Ethernet Virtual Function 700 Series
pci@0000:09:02.1                   network    Ethernet Virtual Function 700 Series
pci@0000:09:06.0                   network    Ethernet Virtual Function 700 Series
pci@0000:09:06.1                   network    Ethernet Virtual Function 700 Series

In this example, PCIe addresses 0000:09:00.0 and 0000:09:00.1 are PFs connected to the traffic generator. They each have two VFs created and bound to the vfio-pci driver. The VF PCIe addresses are 0000:09:02.0, 0000:09:02.1, 0000:09:06.0 and 0000:09:06.1. To deploy this solution on solely 0000:09:02.0 and 0000:09:06.0, the inventory.ini would look like:

[controller]
dut.arm.com ansible_user=user1

[worker]
worker-1 ansible_user=user1 pcie_addr="['0000:09:02.0', '0000:09:06.0']" dpdk_driver=vfio-pci

This enables other workloads to freely use the VFs on 0000:09:02.1 and 0000:09:06.1.

The solution does not currently support providing both PF and VF PCIe addresses for pcie_addr. Hybrid setups (where some worker nodes have VFs for pcie_addr and others have PFs) are not tested.

Override Options

This solution allows for modifying its behavior by setting variables. To set certain variables at run-time, follow these docs.

Note

Several overrides are not strings, so the key=value format may not work correctly for every override. Therefore, use the JSON format or a separate JSON/YAML file to set overrides, as shown below.
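
For example, non-string overrides can be passed on the command line in the JSON/YAML flow format; the override values here are only illustrative:

$ ansible-playbook -i inventory.ini -K create-cluster.yaml -e '{deploy_on_vfs: false, num_vfs: 4}'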

AWS EC2 Installation

When CRA is deployed on EC2 instances, this override is needed to tailor the installation to the AWS environment. To set this override, set aws_inst: true.

AWS EC2 Region

AWS Region information is needed to ensure the maxPods override is computed correctly. The default region is us-west-2. To set a different region, set aws_region accordingly. All EC2 instances need to be in the same region.

AWS ECR URL

When CRA is deployed on EC2 instances, AWS VPC CNI is used to provide the default Kubernetes networking. The AWS VPC CNI image to use is region specific. To obtain the correct ECR URL for a given region, use this table. To set the override, provide the correct ECR URL value to ecr_registry_url. The default value is 602401143452.dkr.ecr.us-west-2.amazonaws.com, which corresponds to the us-west-2 region.

Force VF creation

By default, VF creation for a given PCIe address simply tries to create a set number of VFs (2 by default) under it, but this may fail with an error like:

echo: write error: Device or resource busy

This is caused by VFs that already exist on the device. To override this error condition, set force_vf_creation to true, which clears the prior VFs before creating new ones. Only set this option if the existing VFs are not currently in use. The default value of force_vf_creation is false.

Deploy on PFs

The default behavior for PCIe addresses that are PFs is to create VFs on each PF and provide those VFs to the application Pods. If VFs should not be created, and the PFs provided to the application Pods directly, then deploy_on_vfs should be set to false. This override is mandatory for an EC2 instance deployment, as ENIs are unable to create VFs.

Note that setting deploy_on_vfs: false will install and use the host-device CNI. The SR-IOV CNI is still available in lab-based deployments for future use.

Providing VF PCIe addresses for pcie_addr and setting deploy_on_vfs to false is unsupported.

Modify Pod CIDR

Each K8s Pod is assigned its own IP address. It is important the IP block for pods has no overlap with other IPs on the network. To change the Pod CIDR, set pod_cidr to an unoccupied CIDR. This is ignored in EC2 instance deployments due to incompatibility with the AWS VPC CNI.

Supply additional arguments to kubeadm init

Any additional arguments for kubeadm init can be supplied by setting kubeadm_init_extra_args to a string.

Use VFIO without IOMMU

When deploying to a platform without an IOMMU (like a virtual machine), the vfio-pci kernel module needs a parameter set. By setting no_iommu to 1, the playbook will take care of loading the kernel module properly.
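
Setting no_iommu: 1 is roughly equivalent to loading the module with its no-IOMMU option enabled, as sketched below for illustration only; the playbook performs the actual steps:

sudo modprobe vfio enable_unsafe_noiommu_mode=1
sudo modprobe vfio-pci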

Change number of VFs per PF

Set num_vfs to the number of VFs to create for each PF. Ignored if deploy_on_vfs is false. Default is 2 VFs per PF.

Self-signed certificate directory

Set cert_dir to place the self-signed certificates in the specified directory. By default, they will be placed in ~/certs on the controller node.

Timeout for Nodes to be Ready

Set node_wait_timeout to configure how long to wait for all K8s nodes to reach the Ready state. If any node is not ready by the end of the timeout, the playbook will exit with error. The wait occurs after joining worker nodes to the K8s cluster (if not a single-node cluster), but before building/installing the SR-IOV CNI. The default is 600s, or 10 minutes.

Example

For example, the following command sets all possible overrides:

ansible-playbook -i inventory.ini -K create-cluster.yaml -e @vars.yaml

The -e parameter loads variables from the vars.yaml file. In this example, it contains:

deploy_on_vfs: false
pod_cidr: 192.168.54.0/24
kubeadm_init_extra_args: "--apiserver-advertise-address=\"192.168.0.24\" --apiserver-cert-extra-sans=\"192.168.0.24\""
no_iommu: 1
num_vfs: 5 # ignored because VFs won't be created
cert_dir: ~/my-cert-dir
node_wait_timeout: "300s"

If the user is sure that VFs can be created on the desired PF PCIe addresses, a force_vf_creation variable can be added and set to true when deploy_on_vfs is true or unset:

force_vf_creation: true

The set of valid overrides differs slightly for an EC2 based deployment. This example sets all relevant overrides for EC2 instances:

aws_inst: true
aws_region: us-east-1
deploy_on_vfs: false
kubeadm_init_extra_args: "--apiserver-advertise-address=\"192.168.0.24\" --apiserver-cert-extra-sans=\"192.168.0.24\""
cert_dir: ~/my-cert-dir
node_wait_timeout: "300s"

Porting/Integrating to another Arm platform

Although the solution is tested on the platforms listed in the Tested Platforms section, it should work on other Arm platforms. However, such platforms must support at least the Armv8 architecture and be supported by the underlying components.

Sample Applications

High Availability