Quickstart Guide

Introduction

Welcome to the CNF Reference Architecture quickstart guide. This guide provides quick guidance for running a sample containerized networking application in a Kubernetes cluster consisting of a single AArch64 machine.

This reference solution is targeted at networking software developers and performance analysis engineers who have in-depth Kubernetes and networking knowledge but are not necessarily familiar with the AArch64 architecture.

Familiarity with certain open source projects, e.g., Ansible, Kubernetes, and DPDK, will make it easier to gain a deeper understanding of this guide and the reference solution.

By following the steps in this quickstart guide to the end, you will set up a single-node Kubernetes cluster: the Kubernetes controller and the application Pods are deployed on a single AArch64 machine. The DPDK testpmd sample application is deployed in the application Pod. The Pod has one interface for the K8s network and one interface for the VF/PF/ENI connected to the network requiring packet processing. The testpmd application receives a packet on a port, swaps the source and destination MAC addresses, IP addresses, and ports, and forwards the packet out of the same port. The single-node Kubernetes cluster topology is shown below.

[Figure: Single-node Kubernetes cluster topology (_images/single-node-cluster.png)]

The topology diagram above illustrates the major components of the deployment and their relationship.

  • DUT (Device Under Test) is the only AArch64 machine in the single-node Kubernetes cluster. The Kubernetes controller and application Pods run on this machine. It can be a physical machine or an AWS EC2 Graviton 2/3 instance.

  • DPDK testpmd application implements a 5-tuple swap networking function in software and forwards packets out of the same port on which they were received.

  • TG (Traffic Generator) generates and sends packets to the AArch64 machine’s NIC card via the Ethernet cable. It can be a hardware TG, e.g., an IXIA chassis, or a software TG running on a regular server, e.g., TRex, DPDK Pktgen, or Scapy.

  • Management Node can be any bare-metal machine, VM, or container. It is used to download the project source code, log in to the DUT to create the Kubernetes cluster, and deploy the application.

Infrastructure Setup

This guide can be run on physical hardware or on AWS EC2 cloud instances.

This guide requires the following setup:

[Figure: Required hardware setup (_images/hw-setup.png)]

Physical Hardware Setup

  1. DUT is an AArch64 architecture machine and the only node in the Kubernetes cluster. A NIC card is plugged into its PCIe slot and connects to the traffic generator via an Ethernet cable.

Hardware Minimum Requirements

The DUT has the following hardware requirements:

  • AArch64 v8 CPU

  • Minimum 1GHz and 5 CPU cores

  • DPDK compatible NIC

  • Connection to the internet to download and install packages

  • Minimum 8 GB of RAM

  • At least one 1G hugepage is available. To easily allocate one, add the relevant Linux command line parameters as described in the FAQ

Software Minimum Requirements

The following items are expected of the DUT’s software environment.

  • DUT is running Ubuntu 20.04 (Focal)

  • Admin (root) privileges are required to set up the DUT

  • The Fully Qualified Domain Name (FQDN) of the DUT can be checked with the python3 -c 'import socket; print(socket.getfqdn())' command. See Troubleshooting if the proper FQDN is not shown.

  • The PCIe address of the NIC port attached to the traffic generator can be confirmed with sudo lshw -C network -businfo

  • CPU cores are isolated via the isolcpus, nohz_full, rcu_nocbs, cpuidle.off, and cpufreq.off Linux command line parameters. See FAQ for more details.
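As an illustration only (the core ranges below are placeholders and depend on your system; the FAQ remains the authoritative reference), a kernel command line combining the CPU-isolation parameters above with a single 1G hugepage might look like:

    GRUB_CMDLINE_LINUX="isolcpus=4-8 nohz_full=4-8 rcu_nocbs=4-8 cpuidle.off=1 cpufreq.off=1 default_hugepagesz=1G hugepagesz=1G hugepages=1"

On Ubuntu 20.04, this line is typically set in /etc/default/grub, followed by sudo update-grub and a reboot.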

  2. Management node can be any bare-metal machine, VM, or container. The management node is used to download the repository, access the DUT via ssh, and configure the Kubernetes cluster by executing an Ansible playbook. The Ansible playbook is executed locally on the management node and configures the DUT via ssh.

Software Minimum Requirements

  • Can execute Ansible

  • Can ssh into the DUT using SSH keys. See FAQ for more details.

  • Admin (root) or sudo privileges are required

  3. TG can be any traffic generator capable of generating IP packets. TG must be connected to the DUT.

AWS EC2 Setup

  1. DUT is an EC2 instance meeting the following requirements:

EC2 Requirements

  • c6gn.2xlarge or c7gn.2xlarge (or larger) instance

  • Secondary ENI attached, with the node.k8s.amazonaws.com/no_manage: true tag applied. The ENI’s PCIe address is known (a rough aws CLI sketch for creating and tagging such an ENI follows this list)

    • It should be 0000:00:06.0

  • Connection to the internet to download and install packages

  • The EC2 instance is associated with the required AWS VPC CNI IAM policies, in addition to the AmazonEC2ContainerRegistryReadOnly policy.
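As a rough sketch only (the subnet, security group, ENI, and instance IDs below are placeholders; adjust them to your VPC), the secondary ENI can be created, tagged, and attached with the aws CLI:

    $ aws ec2 create-network-interface --subnet-id subnet-XXXX --groups sg-XXXX --description "CNF dataplane ENI"
    $ aws ec2 create-tags --resources eni-XXXX --tags Key=node.k8s.amazonaws.com/no_manage,Value=true
    $ aws ec2 attach-network-interface --network-interface-id eni-XXXX --instance-id i-XXXX --device-index 1

Device index 1 corresponds to the secondary ENI position used in the tested EC2 setup.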

Software Requirements

  • Amazon Linux 2 AMI

  • aws CLI installed, with permission to call describe-instance-types (a quick check is sketched after this list)

  • CPU cores are isolated via the isolcpus Linux command line parameter. See FAQ for more details

  • At least one 1G hugepage is available. To easily allocate one, add the relevant Linux command line parameters as described in the FAQ

  • SSH access enabled via SSH keypair

  • Admin (root) or sudo privileges
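A quick check that the aws CLI is configured and permitted to call describe-instance-types (the instance type below is just an example):

    $ aws ec2 describe-instance-types --instance-types c6gn.2xlarge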

  2. Management node can be any bare-metal machine, VM, or container. The management node is used to download the repository, access the DUT via ssh, and configure the Kubernetes cluster by executing an Ansible playbook. The Ansible playbook is executed locally on the management node and configures the DUT via ssh.

Software Minimum Requirements

  • Can execute Ansible

  • Can ssh into the DUT using SSH keys. See FAQ for more details.

  • Admin (root) or sudo privileges are required

  3. TG can be any traffic generator capable of generating IP packets. For an EC2 deployment, this is typically another EC2 instance in the same VPC running a software-based traffic generator, such as Pktgen DPDK.

Tested Platforms

The steps described in this quickstart guide have been validated on the following platforms.

Physical Hardware

DUT
NIC
  • Mellanox ConnectX-5

    • OFED driver: MLNX_OFED_LINUX-5.4-3.1.0.0

    • Firmware version: 16.30.1004 (MT_0000000013).

  • Intel X710

    • Firmware version: 6.01

Note

To use a Mellanox NIC, install the OFED driver, then update and configure the NIC firmware by following the guidance in the FAQ.

Management Node
  • Ubuntu 20.04 system

    • Python 3.8

    • Ansible 6.5.0

AWS EC2 Instance

DUT
  • c6gn.2xlarge instance

    • Amazon Linux 2

    • Kernel 5.10.184-175.731.amzn2.aarch64

    • Security group settings

      • Security group rules added according to the Kubernetes Ports and Protocols guidelines

      • Additionally, to allow traffic between the x86 and Arm EC2 instances, permit all ports and protocols in the inbound rules.

    • Secondary ENI attached at device index 1, with node.k8s.amazonaws.com/no_manage set to true

Management Node
  • Ubuntu 20.04 system

    • Python 3.8

    • Ansible 6.5.0

Prerequisites

Management Node

The management node needs several dependencies installed, e.g., git, curl, python3.8, pip, Ansible, and repo. Follow the guidelines below on Ubuntu 20.04.

  1. Make sure sudo is available and install git, curl, python3.8, python3-pip, python-is-python3 by executing

    $ sudo apt-get update
    $ sudo apt-get install git curl python3.8 -y
    $ sudo apt-get install python3-pip python-is-python3 -y
    
  2. Install ansible and netaddr by executing

    $ sudo python3 -m pip install ansible==6.5.0 netaddr
    

Note

Install the ansible package and not the ansible-core package, as this solution makes use of community collections that are not included in the ansible-core python package.

  3. Configure git with your name and email address

    $ git config --global user.email "[email protected]"
    $ git config --global user.name "Your Name"
    
  4. Follow the instructions provided in git-repo to install the repo tool manually
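A typical manual installation, per the upstream git-repo instructions (verify against the git-repo documentation in case it has changed), looks like:

    $ mkdir -p ~/.bin
    $ export PATH="${HOME}/.bin:${PATH}"
    $ curl https://storage.googleapis.com/git-repo-downloads/repo > ~/.bin/repo
    $ chmod a+rx ~/.bin/repo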

  5. Follow the FAQ to set up SSH keys on the management node. For EC2 deployment, use the SSH keypair assigned to the instance.
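A minimal sketch of key-based access from the management node to a physical DUT (the key path, user, and host are placeholders; the FAQ remains the authoritative reference):

    $ ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ""
    $ ssh-copy-id -i ~/.ssh/id_ed25519.pub <remote_user>@<dut_fqdn>

For EC2, skip the key generation and use the keypair assigned at instance creation (e.g., with ssh -i).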

DUT

Complete the steps below by following the suggestions provided.

  1. Follow the FAQ to set up the DUT with isolated CPUs and 1G hugepages.

  2. Update NIC firmware and drivers by following the guidance in the FAQ. Not applicable to EC2 instances.

  3. Remove any routes used by the dataplane interfaces. This may cause loss of connectivity on those interfaces.
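A sketch of how removing a dataplane route might look on the DUT (the interface name and route prefix are placeholders):

    $ ip route show dev <dataplane_iface>
    $ sudo ip route del <route_prefix> dev <dataplane_iface>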

Download Source Code

Unless mentioned specifically, all operations henceforth are executed on the management node.

Create a new folder that will be the workspace, henceforth referred to as <nw_cra_workspace> in these instructions:

mkdir <nw_cra_workspace>
cd <nw_cra_workspace>
export NW_CRA_RELEASE=refs/tags/NW-CRA-2024.03.29

Note

Sometimes new features and additional bug fixes are made available in the git repositories, but are not tagged yet as part of a release. To pick up these latest changes, remove the -b <release tag> option from the repo init command below. However, please be aware that such untagged changes may not be formally verified and should be considered unstable until they are tagged in an official release.

To clone the repository, run the following commands:

repo init \
    -u https://git.gitlab.arm.com/arm-reference-solutions/arm-reference-solutions-manifest.git \
    -b ${NW_CRA_RELEASE} \
    -m cnf-reference-arch.xml
repo sync

Create Single Node Cluster

Create Ansible Inventory File

The Ansible playbooks in this repository are easiest to use with inventory files to keep track of the cluster nodes. For this solution we need one inventory file.

A template inventory.ini is provided at <nw_cra_workspace>/cnf-reference-arch/inventory.ini with the following contents:

[controller]
<fqdn_or_ec2_ip> ansible_user=<remote_user> ansible_private_key_file=<key_location>
; replace above line with DUT FQDN & optionally ansible_user and ansible_private_key_file.
; If an optional variable is not used, delete the entire key=<placeholder>.

[worker]
<fqdn_or_ec2_ip> ansible_user=<remote_user> pcie_addr=<pcie_addr_from_lshw> dpdk_driver=<driver_name> ansible_private_key_file=<key_location>
; replace above line with DUT FQDN, PCIe address, DPDK linux driver & optionally ansible_user and ansible_private_key_file.
; If an optional variable is not used, delete the entire key=<placeholder>.

How the inventory file is filled in differs between a physical hardware setup and an AWS EC2 setup.

Physical Hardware Inventory File

Replace <fqdn_or_ec2_ip> with the FQDN of the DUT. The same FQDN should be used for both [controller] and [worker] as this is a single-node setup.

<remote_user> specifies the user name to use to login to the DUT.

Replace <pcie_addr_from_lshw> with the PCIe address of the port on the DUT connected to the traffic generator.
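For reference, the output of sudo lshw -C network -businfo looks something like the following; the device names and descriptions are illustrative:

$ sudo lshw -C network -businfo
Bus info          Device          Class          Description
=============================================================
pci@0000:06:00.0  enP6p1s0f0      network        MT27800 Family [ConnectX-5]
pci@0000:06:00.1  enP6p1s0f1      network        MT27800 Family [ConnectX-5]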

If the DUT uses a Mellanox ConnectX-5 NIC to connect to the traffic generator, replace <driver_name> with mlx5_core. Otherwise, replace it with vfio-pci.

ansible_private_key_file should be set to the identity file used to connect to each instance if other than the default key used by ssh.

As an example, if the user name used to access DUT is user1, the DUT FQDN is dut.arm.com and is connected to the traffic generator on PCIe address 0000:06:00.1 with a NIC compatible with the vfio-pci driver, then inventory.ini would contain:

[controller]
dut.arm.com ansible_user=user1

[worker]
dut.arm.com ansible_user=user1 pcie_addr=0000:06:00.1 dpdk_driver=vfio-pci

AWS EC2 Inventory File

Replace <fqdn_or_ec2_ip> with the primary IP address of the DUT. The same IP address should be used for both [controller] and [worker] as this is a single-node setup.

<remote_user> specifies the user name to use to login to the DUT.

Replace <pcie_addr_from_lshw> with the PCIe address of the secondary ENI with tag node.k8s.amazonaws.com/no_manage: true. This is typically 0000:00:06.0.
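To confirm the address from inside the instance, lspci can be used; note that lspci omits the leading 0000: PCIe domain. The output below is illustrative:

$ lspci | grep -i ethernet
00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
00:06.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)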

Replace <driver_name> with igb_uio.

ansible_private_key_file should be set to the SSH key pair generated at instance creation.

As an example, if the user name to access the DUT is ec2-user, the DUT IP is 10.100.100.100, the private key file is ssh-key-pair.pem, and the secondary ENI PCIe address is 0000:00:06.0, then inventory.ini would contain:

[controller]
10.100.100.100 ansible_user=ec2-user ansible_private_key_file=ssh-key-pair.pem

[worker]
10.100.100.100 ansible_user=ec2-user pcie_addr=0000:00:06.0 dpdk_driver=igb_uio ansible_private_key_file=ssh-key-pair.pem

Execute the Playbook

Physical Hardware

To setup the Kubernetes cluster on physical hardware, run:

$ ansible-playbook -i inventory.ini create-cluster.yaml -K

It will start by asking for the sudo password of the remote user on the DUT (the prompt may say BECOME password instead). If the remote user has passwordless sudo on the DUT, the -K flag can be omitted.
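If passwordless sudo is desired, one common approach (the user name is a placeholder; follow your organization's policy) is to add a sudoers drop-in on the DUT:

$ echo '<remote_user> ALL=(ALL) NOPASSWD: ALL' | sudo tee /etc/sudoers.d/<remote_user>
$ sudo chmod 0440 /etc/sudoers.d/<remote_user>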

See the user guide for the full list of actions this playbook will take.

EC2 Instance

First, note the AWS region the EC2 instance is deployed in. Next, use this table to obtain the correct AWS Elastic Container Registry (ECR) URL for the AWS region. To setup the Kubernetes cluster on an EC2 instance, substitute the corresponding values for aws_region and ecr_registry_url and run:

$ ansible-playbook -i inventory.ini create-cluster.yaml -e '{aws_inst: true, deploy_on_vfs: false, aws_region: us-west-2, ecr_registry_url: 602401143452.dkr.ecr.us-west-2.amazonaws.com}'

See the user guide for the full list of actions this playbook will take.

Validate the Cluster

At this point in time, the setup should look like the Single-node Kubernetes cluster topology at the beginning of this document. The DUT should be in a Kubernetes cluster and running a private docker registry.

To verify, ssh into the DUT and run kubectl get nodes. The output should look like:

$ kubectl get nodes
NAME            STATUS   ROLES           AGE   VERSION
dut.arm.com     Ready    control-plane   24m   v1.25.0

Also run kubectl describe node $(hostname) | grep -A 5 ^Allocatable: to ensure allocatable CPUs and 1G hugepages are correct. The output should look like:

$ kubectl describe node $(hostname) | grep -A 5 ^Allocatable:
Allocatable:
arm.com/dpdk:       2
cpu:                4
ephemeral-storage:  189217404206
hugepages-1Gi:      1Gi

Finally, verify the local docker registry is running with: docker ps -f name=registry. The output should look like:

$ docker ps -f name=registry
CONTAINER ID   IMAGE        COMMAND                  CREATED        STATUS          PORTS                            NAMES
53656144b298   registry:2   "/entrypoint.sh /etc…"   46 hours ago   Up 33 minutes   0.0.0.0:443->443/tcp, 5000/tcp   registry

Run the Sample Application

Run

Now, it is time to apply the dpdk-testpmd.yaml Ansible playbook. To do so, run the following commands on the management node:

$ cd <nw_cra_workspace>/cnf-reference-arch/examples/dpdk-testpmd
$ ansible-playbook -i ../../inventory.ini dpdk-testpmd.yaml

For an EC2 instance, run the following commands:

$ cd <nw_cra_workspace>/cnf-reference-arch/examples/dpdk-testpmd
$ ansible-playbook -i ../../inventory.ini dpdk-testpmd.yaml -e '{aws_inst: true, deploy_on_vfs: false}'

See the dpdk-testpmd user guide for the full list of actions this playbook will take.

Once the playbook finishes, ssh into the DUT and deploy the DPDK testpmd application with the dpdk-deployment.yaml file, which the playbook copied to the DUT home directory:

$ cd $HOME
$ kubectl apply -f dpdk-deployment.yaml

Check the deployment status with:

$ kubectl get deploy
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
dpdk-testpmd   1/1     1            1           2m31s

Test

Monitor the application pod status by running kubectl get pods on the DUT. It may take some time to start up.

kubectl get pods should show something like:

$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
dpdk-testpmd-fbb6cd468-d7x8g   1/1     Running   0          31s

Once the pod is in the “Running” state, view its logs with kubectl logs <podname>.

The logs should contain something similar to:

$ kubectl logs dpdk-testpmd-fbb6cd468-d7x8g
...
+ ./build/app/dpdk-testpmd --lcores 1@9,2@10 -a 0000:07:02.0 -- --forward-mode=5tswap --port-topology=loop --auto-start
...
Set 5tswap packet forwarding mode
Auto-start selected
Configuring Port 0 (socket 0)

Port 0: link state change event

Port 0: link state change event
Port 0: CA:7D:57:CB:B0:5F
Checking link statuses...

These logs show port 0 has MAC address CA:7D:57:CB:B0:5F with PCIe address 0000:07:02.0 on the DUT.

Configure the traffic generator to send packets to the NIC port, using the specified MAC as DMAC. If deploying on AWS EC2 instances, also ensure the destination IP matches the primary IP of the dataplane ENI.

This example uses a destination MAC address of CA:7D:57:CB:B0:5F and a destination IP of 198.18.0.21. Then, dpdk-testpmd will forward those packets out on port 0 after swapping the MAC, IP and port(s). In this example, the packets transmitted by dpdk-testpmd will have the source MAC set to CA:7D:57:CB:B0:5F and the source IP will be 198.18.0.21. The destination MAC and IP will be set to the source MAC and IP of the packets transmitted by the traffic generator.

Stop

The pods can be stopped by deleting the deployment by running kubectl delete deploy dpdk-testpmd on the DUT. Then, clean up the Kubernetes cluster by executing sudo kubeadm reset -f and sudo rm -rf /etc/cni/net.d on the DUT.
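For convenience, the same stop and cleanup sequence as commands run on the DUT:

$ kubectl delete deploy dpdk-testpmd
$ sudo kubeadm reset -f
$ sudo rm -rf /etc/cni/net.d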