Troubleshooting
This page describes common issues and steps to resolve them.
python3 -c 'import socket; print(socket.getfqdn())' does not show the FQDN:
If the above Python command fails to print the correct FQDN, your system may be affected by this Python bug. The solution is to modify /etc/hosts to hardcode the correct FQDN. To do so, follow the steps in the Debian manual to modify /etc/hosts.

Create VFs for <pcie_addr> task fails with echo 2 > /sys/bus/pci/devices/<pcie_addr>/sriov_numvfs ... echo: I/O error:
CRA is most easily deployed with exclusive control over a PF. Ensure no other applications like DPDK or QEMU are using the PF. Additionally, ensure the PF is bound to the default kernel driver, such as i40e or mlx5_core.
Check whether the NIC port is used by other applications like QEMU or DPDK, and stop them.
Check whether the NIC is bound to the vfio-pci driver; this can be done with /usr/local/bin/dpdk-devbind.py -s.
Bind the NIC to its default driver. Taking i40e as an example, bind the NIC port to i40e with /usr/local/bin/dpdk-devbind.py -b i40e <pcie_addr>.
Re-deploy the CRA solution.
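
The driver check in the steps above can also be scripted. The following is a minimal sketch, not part of CRA: it assumes the standard Linux sysfs layout, in which /sys/bus/pci/devices/<pcie_addr>/driver is a symlink to the directory of the currently bound driver. The names bound_driver, needs_rebind, and sysfs_root are hypothetical helpers introduced for illustration only.

```python
import os


def bound_driver(pcie_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI device, or None.

    Hypothetical helper: reads the 'driver' symlink that sysfs exposes
    for each PCI device. None means no driver is bound or the device
    does not exist under sysfs_root.
    """
    link = os.path.join(sysfs_root, pcie_addr, "driver")
    if not os.path.islink(link):
        return None
    # The symlink target ends with the driver name, e.g. ".../drivers/i40e".
    return os.path.basename(os.readlink(link))


def needs_rebind(pcie_addr, default_driver="i40e",
                 sysfs_root="/sys/bus/pci/devices"):
    """True when the device is not bound to its default kernel driver.

    default_driver is an assumption; use the driver that matches the NIC
    (for example i40e or mlx5_core).
    """
    return bound_driver(pcie_addr, sysfs_root) != default_driver
```

For example, if needs_rebind("<pcie_addr>") returns True for an i40e NIC, the port is either unbound or held by another driver such as vfio-pci, and the dpdk-devbind.py -b step applies. The sysfs_root parameter exists only so the helper can be exercised against a test directory.
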
CRA can also be deployed onto specific VFs on a machine. This can be used to share the underlying PFs with other applications like DPDK or QEMU. To use this deployment model, specify the VF PCIe addresses in the pcie_addr for the worker node. See the user guide for more information.

Tasks error out by throwing error Timeout (12s) waiting for privilege escalation prompt:
This error happens when the latency to execute tasks is high. Add the parameter -T 120 at the end of the ansible-playbook command to increase the timeout.

Playbook fails due to PCIe address having active routes:
TASK [bind_pcie_addrs : Fail if any PCIe address has active routes] ******************************************************
fatal: [worker-node]: FAILED! => {"changed": false, "msg": "At least one PCIe address has an active route. Binding this to igb_uio may cause a loss in connectivity."}

This error happens when a dataplane interface (PF/VF/ENI) on a worker node has an active route. In this step, the dataplane interface is being rebound to a different driver, which would remove any routes. Since this may result in a loss of connectivity, the solution will not automatically remove the routes. To use that dataplane interface, remove its routes and re-run the playbook.
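
Before re-running the playbook, it can help to confirm that the interface no longer owns any routes. The sketch below is one way to do that under stated assumptions: it parses the IPv4 routing table that Linux exposes in /proc/net/route, where the first column of each row is the interface name. It does not cover IPv6 routes, and it is not necessarily the same check the playbook performs. The function names interfaces_with_routes and has_active_route are hypothetical.

```python
def interfaces_with_routes(route_table_text):
    """Return the set of interface names that own at least one IPv4 route.

    route_table_text is the content of /proc/net/route: one header row,
    then one whitespace-separated row per route, interface name first.
    """
    names = set()
    for line in route_table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def has_active_route(ifname, route_path="/proc/net/route"):
    """True when ifname appears in the kernel's IPv4 routing table."""
    with open(route_path) as f:
        return ifname in interfaces_with_routes(f.read())
```

Calling has_active_route with the interface name of the dataplane port should return False once its routes have been removed. The interface name for a kernel-bound device is typically listed under /sys/bus/pci/devices/<pcie_addr>/net/.
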