Troubleshooting
This page describes common issues and steps to resolve them.
python3 -c 'import socket; print(socket.getfqdn())' does not show the FQDN:
If the above Python command fails to print the correct FQDN, your system may be affected by this Python bug. The solution is to hardcode the correct FQDN in /etc/hosts. To do so, follow the steps in the Debian manual to modify /etc/hosts.
Create VFs for <pcie_addr> task fails with echo 2 > /sys/bus/pci/devices/<pcie_addr>/sriov_numvfs ... echo: I/O error:
CRA is most easily deployed with exclusive control over a PF. Ensure that no other applications, such as DPDK or QEMU, are using the PF. Additionally, ensure that the PF is bound to its default kernel driver, such as i40e or mlx5_core.
1. Check whether the NIC port is used by other applications such as QEMU or DPDK, and stop them.
2. Check whether the NIC is bound to the vfio-pci driver. This can be done with /usr/local/bin/dpdk-devbind.py -s.
3. Bind the NIC to its default driver. Taking i40e as an example, bind the NIC port to i40e with /usr/local/bin/dpdk-devbind.py -b i40e <pcie_addr>.
4. Re-deploy the CRA solution.
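The driver check in the steps above can also be sketched in Python by reading sysfs directly, which may help when dpdk-devbind.py is not at hand. This is a minimal sketch that assumes the standard Linux sysfs layout, where each PCI device directory contains a driver symlink while a driver is bound. The helper names current_driver and needs_rebind are hypothetical and not part of CRA.

```python
import os

# Assumption: standard Linux sysfs location for PCI devices.
SYSFS_PCI_DEVICES = "/sys/bus/pci/devices"


def current_driver(pcie_addr, sysfs_root=SYSFS_PCI_DEVICES):
    """Return the name of the driver bound to a PCI device, or None if unbound."""
    driver_link = os.path.join(sysfs_root, pcie_addr, "driver")
    if not os.path.islink(driver_link):
        # No driver symlink means no driver is currently bound.
        return None
    # The symlink points at the driver directory, e.g. .../drivers/i40e.
    return os.path.basename(os.path.realpath(driver_link))


def needs_rebind(pcie_addr, kernel_driver="i40e", sysfs_root=SYSFS_PCI_DEVICES):
    """Return True when the device is not bound to the expected kernel driver."""
    return current_driver(pcie_addr, sysfs_root) != kernel_driver
```

For example, needs_rebind("<pcie_addr>", kernel_driver="mlx5_core") would return True for a PF that is currently bound to vfio-pci, which suggests rebinding it as described in step 3 before re-deploying.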
CRA can also be deployed onto specific VFs on a machine. This can be used to share the underlying PFs with other applications such as DPDK or QEMU. To use this deployment model, specify the VF PCIe addresses in pcie_addr for the worker node. See the user guide for more information.
Tasks error out with the error Timeout (12s) waiting for privilege escalation prompt:
This error happens when the latency to execute tasks is high. To increase the timeout, add the parameter -T 120 at the end of the ansible-playbook command.
Playbook fails due to PCIe address having active routes:
TASK [bind_pcie_addrs : Fail if any PCIe address has active routes] ******************************************************
fatal: [worker-node]: FAILED! => {"changed": false, "msg": "At least one PCIe address has an active route. Binding this to igb_uio may cause a loss in connectivity."}
This error happens when a dataplane interface (PF/VF/ENI) on a worker node has an active route. In this step, the dataplane interface is being rebound to a different driver, which would remove any routes. Since this may result in a loss of connectivity, the solution will not automatically remove the routes. To use that dataplane interface, remove its routes and re-run the playbook.
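To find out which interfaces are affected before re-running the playbook, a check along the following lines may help. It is only a sketch and not the playbook's actual logic: it assumes a Linux host, reads the IPv4 routing table in the /proc/net/route format (so IPv6 routes are not covered), and maps a PCIe address to interface names through the net directory in sysfs. The function names are hypothetical.

```python
import os


def interfaces_with_routes(route_table_text):
    """Return the set of interface names listed in a /proc/net/route style dump."""
    names = set()
    # The first line is the column header; each later row starts with the interface name.
    for line in route_table_text.splitlines()[1:]:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def interfaces_for_pcie_addr(pcie_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the network interface names the kernel exposes for a PCI device."""
    net_dir = os.path.join(sysfs_root, pcie_addr, "net")
    # Devices bound to a userspace driver have no net directory.
    return sorted(os.listdir(net_dir)) if os.path.isdir(net_dir) else []


def routed_interfaces(pcie_addr, route_file="/proc/net/route"):
    """Return the interfaces of a PCI device that still have IPv4 routes."""
    with open(route_file) as f:
        routed = interfaces_with_routes(f.read())
    return [i for i in interfaces_for_pcie_addr(pcie_addr) if i in routed]
```

Any interface returned by routed_interfaces("<pcie_addr>") still has at least one IPv4 route. Those routes can be inspected with ip route show dev <iface> and removed before re-running the playbook.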