..
  # Copyright (c) 2022-2024, Arm Limited.
  #
  # SPDX-License-Identifier: Apache-2.0

Quickstart Guide
****************

Introduction
------------

Welcome to the CNF Reference Architecture quickstart guide. This guide provides quick guidance for running a sample containerized networking application in a Kubernetes cluster comprised of a single AArch64 machine.

This reference solution is targeted at networking software developers and performance analysis engineers who have in-depth Kubernetes and networking knowledge but are not necessarily familiar with the AArch64 architecture. Familiarity with certain open source projects, e.g., `Ansible `_, `Kubernetes `_, `DPDK `_, will make it easier to gain a deeper understanding of this guide and the reference solution.

By following the steps in this quickstart guide to the end, you will set up a single-node Kubernetes cluster. The Kubernetes controller and application Pods are deployed on a single AArch64 machine. The DPDK testpmd sample application is deployed in the Application Pod. The Pod has one interface for the K8s network and one interface for the VF/PF/ENI connected to the network requiring packet processing. The testpmd application receives a packet on a port, swaps the source and destination MAC/IP address/port, and forwards the packet out the same port.

The single-node Kubernetes cluster topology is shown below.

.. _single-node-cluster:

.. figure:: images/single-node-cluster.png
   :align: center

   Single-node Kubernetes cluster topology

The topology diagram above illustrates the major components of the deployment and their relationships.

- DUT (Device Under Test) is the only AArch64 machine in the single-node Kubernetes cluster. The Kubernetes controller and Application Pods run on this machine. It can be a physical machine or an AWS EC2 Graviton 2/3 instance.
- The DPDK testpmd application implements a 5-tuple swap networking function in software and forwards packets out the same port on which they are received.
- TG (Traffic Generator) generates and sends packets to the AArch64 machine's NIC via an Ethernet cable. It can be a hardware TG, e.g., an IXIA chassis, or a software TG running on a regular server, e.g., `TRex `_, `DPDK Pktgen `_, `Scapy `_.
- The Management Node can be any bare-metal machine, VM, or container. It is used to download the project source code and to log in to the DUT to create the Kubernetes cluster and deploy the application.

Infrastructure Setup
--------------------

This guide can be run on physical hardware or on AWS EC2 cloud instances. It requires the following setup:

.. figure:: images/hw-setup.png
   :align: center

   Required hardware setup

Physical Hardware Setup
~~~~~~~~~~~~~~~~~~~~~~~

1. The DUT is an AArch64 architecture machine and the only node in the Kubernetes cluster. A NIC is plugged into its PCIe slot and connected to the traffic generator via an Ethernet cable.

   **Hardware Minimum Requirements**

   The DUT has the following hardware requirements:

   - AArch64 v8 CPU
   - Minimum 1 GHz and 5 CPU cores
   - `DPDK compatible `__ NIC
   - Connection to the internet to download and install packages
   - Minimum 8 GB of RAM
   - At least one 1G hugepage is available. To easily allocate one, add the relevant Linux command line parameters as described in the :ref:`FAQ `

   **Software Minimum Requirements**

   The following items are expected of the DUT's software environment.

   - The DUT is running Ubuntu 20.04 (Focal)
   - Admin (root) privileges are required to set up the DUT
   - The `Fully Qualified Domain Name (FQDN) `__ of the DUT can be checked with the ``python3 -c 'import socket; print(socket.getfqdn())'`` command. See :ref:`Troubleshooting ` if the proper FQDN is not shown.
   - The PCIe address of the NIC port attached to the traffic generator can be confirmed with ``sudo lshw -C network -businfo``
   - CPU cores are isolated via the ``isolcpus, nohz_full, rcu_nocbs, cpuidle.off, cpufreq.off`` Linux command line parameters. See :ref:`FAQ ` for more details.

2. The management node can be any bare-metal machine, VM, or container. The management node is used to download the repository, access the DUT via ``ssh`` and configure the Kubernetes cluster by executing an Ansible playbook. The Ansible playbook is executed locally on the management node and configures the DUT via ``ssh``.

   **Software Minimum Requirements**

   - Can `execute Ansible `_
   - Can ``ssh`` into the DUT using SSH keys. See :ref:`FAQ ` for more details.
   - Admin (root) or ``sudo`` privileges are required

3. The TG can be any traffic generator capable of generating IP packets. The TG must be connected to the DUT.

AWS EC2 Setup
~~~~~~~~~~~~~

1. The DUT is an EC2 instance meeting the following requirements:

   **EC2 Requirements**

   - c6gn.2xlarge or c7gn.2xlarge (or larger) instance
   - Secondary ENI attached, with the ``node.k8s.amazonaws.com/no_manage: true`` tag applied. The ENI's PCIe address is known; it should be ``0000:00:06.0``
   - Connection to the internet to download and install packages
   - The EC2 instance is associated with the required AWS VPC CNI `IAM policies `__, in addition to the `AmazonEC2ContainerRegistryReadOnly policy `__.

   **Software Requirements**

   - Amazon Linux 2 AMI
   - ``aws`` CLI installed, with permission to `describe-instance-types `__
   - CPU cores are isolated via the ``isolcpus`` Linux command line parameter. See :ref:`FAQ ` for more details
   - At least one 1G hugepage is available. To easily allocate one, add the relevant Linux command line parameters as described in the :ref:`FAQ `
   - SSH access enabled via SSH keypair
   - Admin (root) or ``sudo`` privileges

2. The management node can be any bare-metal machine, VM, or container. The management node is used to download the repository, access the DUT via ``ssh`` and configure the Kubernetes cluster by executing an Ansible playbook. The Ansible playbook is executed locally on the management node and configures the DUT via ``ssh``.

   **Software Minimum Requirements**

   - Can `execute Ansible `_
   - Can ``ssh`` into the DUT using SSH keys. See :ref:`FAQ ` for more details.
   - Admin (root) or ``sudo`` privileges are required

3. The TG can be any traffic generator capable of generating IP packets. For an EC2 deployment, this is typically another EC2 instance in the same VPC running a software-based traffic generator, such as `Pktgen DPDK `__.

Tested Platforms
~~~~~~~~~~~~~~~~

The steps described in this quickstart guide have been validated on the following platforms.

Physical Hardware
^^^^^^^^^^^^^^^^^

DUT
===

- Ampere Altra (Neoverse-N1)
- Ubuntu 20.04.3 LTS (Focal Fossa)
- `Kernel 5.17.0-051700-generic `_

NIC
===

- `Mellanox ConnectX-5 `__

  - OFED driver: MLNX_OFED_LINUX-5.4-3.1.0.0
  - Firmware version: 16.30.1004 (MT\_0000000013)

- Intel X710

  - Firmware version: 6.01

.. note::
   To use a Mellanox NIC, install the OFED driver, then update and configure the NIC firmware by following the guidance in the :ref:`FAQ `.

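As a quick sanity check, the NIC driver and firmware versions actually in use on the DUT can be queried with ``ethtool``; a minimal sketch, where the interface name is only an example and the output shown is illustrative::

   $ ethtool -i enP1p1s0f1
   driver: mlx5_core
   firmware-version: 16.30.1004 (MT_0000000013)
   ...
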
Management Node
===============

- Ubuntu 20.04 system
- Python 3.8
- Ansible 6.5.0

AWS EC2 Instance
^^^^^^^^^^^^^^^^

DUT
===

- c6gn.2xlarge instance
- Amazon Linux 2
- Kernel ``5.10.184-175.731.amzn2.aarch64``
- Security group settings

  - Security group settings added as per the `Kubernetes Ports and Protocols guidelines `_
  - Additionally, to allow traffic between x86 and Arm EC2 instances, permit all ports and protocols in the inbound rules.

- Secondary ENI attached at device index 1, with ``node.k8s.amazonaws.com/no_manage`` set to ``true``

Management Node
===============

- Ubuntu 20.04 system
- Python 3.8
- Ansible 6.5.0

Prerequisite
------------

Management Node
~~~~~~~~~~~~~~~

The management node needs several dependencies installed, e.g., ``git, curl, python3.8, pip, Ansible, repo``. Follow the guidelines below on Ubuntu 20.04.

#. Make sure ``sudo`` is available and install ``git, curl, python3.8, python3-pip, python-is-python3`` by executing ::

      $ sudo apt-get update
      $ sudo apt-get install git curl python3.8 -y
      $ sudo apt-get install python3-pip python-is-python3 -y

#. Install ``ansible`` and ``netaddr`` by executing ::

      $ sudo python3 -m pip install ansible==6.5.0 netaddr

   .. note::
      Install the ``ansible`` package and not the ``ansible-core`` package, as this solution makes use of community packages not included in the ``ansible-core`` python package.

3. Configure git with your name and email address ::

      $ git config --global user.email "you@example.com"
      $ git config --global user.name "Your Name"

#. Follow the instructions provided in `git-repo `__ to install the ``repo`` tool manually.

#. Follow the :ref:`FAQ ` to set up SSH keys on the management node. For EC2 deployment, use the SSH keypair assigned to the instance.

DUT
~~~

Complete the steps below by following the suggestions provided.

6. Follow the :ref:`FAQ ` to set up the DUT with isolated CPUs and 1G hugepages.

#. Update NIC firmware and drivers by following the guidance in the :ref:`FAQ `. This is not applicable to EC2 instances.

#. Remove any routes used by the dataplane interfaces. Note that this may cause loss of connectivity on those interfaces.

Download Source Code
--------------------

Unless mentioned specifically, all operations henceforth are executed on the management node.

Create a new folder that will be the workspace, henceforth referred to as ``<workspace>`` in these instructions: ::

   mkdir <workspace>
   cd <workspace>
   export NW_CRA_RELEASE=refs/tags/NW-CRA-2024.03.29

.. note::
   Sometimes new features and additional bug fixes are made available in the git repositories, but are not yet tagged as part of a release. To pick up these latest changes, remove the ``-b`` option and its value from the ``repo init`` command below. However, please be aware that such untagged changes may not be formally verified and should be considered unstable until they are tagged in an official release.

To clone the repository, run the following commands: ::

   repo init \
       -u https://git.gitlab.arm.com/arm-reference-solutions/arm-reference-solutions-manifest.git \
       -b ${NW_CRA_RELEASE} \
       -m cnf-reference-arch.xml
   repo sync

Create Single Node Cluster
--------------------------

Create Ansible Inventory File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Ansible playbooks in this repository are easiest to use with `inventory files `__ that keep track of the cluster nodes. For this solution we need one inventory file. A template ``inventory.ini`` is provided at ``<workspace>/cnf-reference-arch/inventory.ini`` with the following contents:

.. code-block:: none

   [controller]
   <dut> ansible_user=<user> ansible_private_key_file=<key-file>
   ; replace above line with DUT FQDN & optionally ansible_user and ansible_private_key_file.
   ; If an optional variable is not used, delete the entire key=<value>.

   [worker]
   <dut> ansible_user=<user> pcie_addr=<pcie-address> dpdk_driver=<dpdk-driver> ansible_private_key_file=<key-file>
   ; replace above line with DUT FQDN, PCIe address, DPDK linux driver & optionally ansible_user and ansible_private_key_file.
   ; If an optional variable is not used, delete the entire key=<value>.

Filling in the inventory file differs between a physical hardware setup and an AWS EC2 setup.

Physical Hardware Inventory File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Replace ``<dut>`` with the FQDN of the DUT. The same FQDN should be used for both ``[controller]`` and ``[worker]``, as this is a single-node setup. ``<user>`` specifies the user name used to log in to the DUT. Replace ``<pcie-address>`` with the PCIe address of the port on the DUT connected to the traffic generator. If the DUT uses a Mellanox ConnectX-5 NIC to connect to the traffic generator, replace ``<dpdk-driver>`` with ``mlx5_core``; otherwise, replace it with ``vfio-pci``. ``ansible_private_key_file`` should be set to the identity file used to connect to each instance if it differs from the default key used by ``ssh``.

As an example, if the user name used to access the DUT is ``user1``, the DUT FQDN is ``dut.arm.com``, and the DUT is connected to the traffic generator on PCIe address ``0000:06:00.1`` with a NIC compatible with the ``vfio-pci`` driver, then ``inventory.ini`` would contain::

   [controller]
   dut.arm.com ansible_user=user1

   [worker]
   dut.arm.com ansible_user=user1 pcie_addr=0000:06:00.1 dpdk_driver=vfio-pci

AWS EC2 Inventory File
^^^^^^^^^^^^^^^^^^^^^^

Replace ``<dut>`` with the primary IP address of the DUT. The same IP address should be used for both ``[controller]`` and ``[worker]``, as this is a single-node setup. ``<user>`` specifies the user name used to log in to the DUT. Replace ``<pcie-address>`` with the PCIe address of the secondary ENI carrying the ``node.k8s.amazonaws.com/no_manage: true`` tag. This is typically ``0000:00:06.0``. Replace ``<dpdk-driver>`` with ``igb_uio``. ``ansible_private_key_file`` should be set to the SSH key pair generated at instance creation.

As an example, if the user name used to access the DUT is ``ec2-user``, the DUT IP is ``10.100.100.100``, the private key file is ``ssh-key-pair.pem``, and the secondary ENI PCIe address is ``0000:00:06.0``, then ``inventory.ini`` would contain::

   [controller]
   10.100.100.100 ansible_user=ec2-user ansible_private_key_file=ssh-key-pair.pem

   [worker]
   10.100.100.100 ansible_user=ec2-user pcie_addr=0000:00:06.0 dpdk_driver=igb_uio ansible_private_key_file=ssh-key-pair.pem

Execute the Playbook
~~~~~~~~~~~~~~~~~~~~

Physical Hardware
^^^^^^^^^^^^^^^^^

To set up the Kubernetes cluster on physical hardware, run::

   $ ansible-playbook -i inventory.ini create-cluster.yaml -K

It will start by asking for the ``sudo`` password of the remote user on the DUT (the prompt may say ``BECOME`` password instead). If the remote user has passwordless sudo on the DUT, the ``-K`` flag can be omitted.

See the :ref:`user guide ` for the full list of actions this playbook will take.

EC2 Instance
^^^^^^^^^^^^

First, note the AWS region the EC2 instance is deployed in. Next, use `this table `__ to obtain the correct AWS Elastic Container Registry (ECR) URL for the AWS region.

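If the deployment region is not known offhand, it can be queried from the DUT itself through the EC2 instance metadata service; a minimal sketch using IMDSv2 (whether a session token is required depends on the instance's metadata options)::

   $ TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
       -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
   $ curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
       http://169.254.169.254/latest/meta-data/placement/region
   us-west-2
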
To set up the Kubernetes cluster on an EC2 instance, substitute the corresponding values for ``aws_region`` and ``ecr_registry_url`` and run::

   $ ansible-playbook -i inventory.ini create-cluster.yaml -e '{aws_inst: true, deploy_on_vfs: false, aws_region: us-west-2, ecr_registry_url: 602401143452.dkr.ecr.us-west-2.amazonaws.com}'

See the :ref:`user guide ` for the full list of actions this playbook will take.

Validate the Cluster
~~~~~~~~~~~~~~~~~~~~

At this point, the setup should look like the :ref:`single-node-cluster` at the beginning of this document. The DUT should be in a Kubernetes cluster and running a private docker registry.

To verify, ``ssh`` into the DUT and run ``kubectl get nodes``. The output should look like::

   $ kubectl get nodes
   NAME          STATUS   ROLES           AGE   VERSION
   dut.arm.com   Ready    control-plane   24m   v1.25.0

Also run ``kubectl describe node $(hostname) | grep -A 5 ^Allocatable:`` to ensure the allocatable CPUs and 1G hugepages are correct. The output should look like::

   $ kubectl describe node $(hostname) | grep -A 5 ^Allocatable:
   Allocatable:
     arm.com/dpdk:       2
     cpu:                4
     ephemeral-storage:  189217404206
     hugepages-1Gi:      1Gi

Finally, verify the local docker registry is running with ``docker ps -f name=registry``. The output should look like::

   $ docker ps -f name=registry
   CONTAINER ID   IMAGE        COMMAND                  CREATED        STATUS          PORTS                            NAMES
   53656144b298   registry:2   "/entrypoint.sh /etc…"   46 hours ago   Up 33 minutes   0.0.0.0:443->443/tcp, 5000/tcp   registry

Run the Sample Application
--------------------------

Run
~~~

Now, it is time to apply the ``dpdk-testpmd.yaml`` Ansible playbook. To do so, run the following commands on the management node: ::

   $ cd <workspace>/cnf-reference-arch/examples/dpdk-testpmd
   $ ansible-playbook -i ../../inventory.ini dpdk-testpmd.yaml

For an EC2 instance, run the following commands: ::

   $ cd <workspace>/cnf-reference-arch/examples/dpdk-testpmd
   $ ansible-playbook -i ../../inventory.ini dpdk-testpmd.yaml -e '{aws_inst: true, deploy_on_vfs: false}'

See the :doc:`dpdk-testpmd user guide ` for the full list of actions this playbook will take.

Once the playbook finishes, ``ssh`` into the DUT and deploy the DPDK testpmd application with the ``dpdk-deployment.yaml`` file, which has been copied to the DUT home directory::

   $ cd $HOME
   $ kubectl apply -f dpdk-deployment.yaml

Check the deployment status with::

   $ kubectl get deploy
   NAME           READY   UP-TO-DATE   AVAILABLE   AGE
   dpdk-testpmd   1/1     1            1           2m31s

Test
~~~~

Monitor the application pod status by running ``kubectl get pods`` on the DUT. It may take some time to start up. ``kubectl get pods`` should show something like::

   $ kubectl get pods
   NAME                           READY   STATUS    RESTARTS   AGE
   dpdk-testpmd-fbb6cd468-d7x8g   1/1     Running   0          31s

Once the pod is in the "Running" state, view its logs with ``kubectl logs <pod-name>``. The logs should contain something similar to::

   $ kubectl logs dpdk-testpmd-fbb6cd468-d7x8g
   ...
   + ./build/app/dpdk-testpmd --lcores 1@9,2@10 -a 0000:07:02.0 -- --forward-mode=5tswap --port-topology=loop --auto-start
   ...
   Set 5tswap packet forwarding mode
   Auto-start selected
   Configuring Port 0 (socket 0)
   Port 0: link state change event
   Port 0: link state change event
   Port 0: CA:7D:57:CB:B0:5F
   Checking link statuses...

These logs show that port 0 has MAC address ``CA:7D:57:CB:B0:5F`` and PCIe address ``0000:07:02.0`` on the DUT. Configure the traffic generator to send packets to the NIC port, using the specified MAC as the destination MAC (DMAC). If deploying on AWS EC2 instances, also ensure the destination IP matches the primary IP of the dataplane ENI.

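For EC2 deployments, if the primary IP of the dataplane ENI is not known, it can be looked up with the AWS CLI; a sketch, assuming the configured credentials are permitted to call ``describe-network-interfaces`` and using a hypothetical instance ID::

   $ aws ec2 describe-network-interfaces \
       --filters Name=attachment.instance-id,Values=i-0123456789abcdef0 \
                 Name=attachment.device-index,Values=1 \
       --query 'NetworkInterfaces[0].PrivateIpAddress' --output text
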
This example uses a destination MAC address of ``CA:7D:57:CB:B0:5F`` and a destination IP of ``198.18.0.21``. ``dpdk-testpmd`` will then forward those packets out on port 0 after swapping the MAC addresses, IP addresses and port(s). In this example, the packets transmitted by ``dpdk-testpmd`` will have the source MAC set to ``CA:7D:57:CB:B0:5F`` and the source IP set to ``198.18.0.21``. The destination MAC and IP will be set to the source MAC and IP of the packets transmitted by the traffic generator.

Stop
~~~~

The pods can be stopped by deleting the deployment, i.e., by running ``kubectl delete deploy dpdk-testpmd`` on the DUT.

Then, clean up the Kubernetes cluster by executing ``sudo kubeadm reset -f`` and ``sudo rm -rf /etc/cni/net.d`` on the DUT.

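For convenience, the same teardown steps can be run as one sequence on the DUT::

   $ kubectl delete deploy dpdk-testpmd
   $ sudo kubeadm reset -f
   $ sudo rm -rf /etc/cni/net.d
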