I recently tried running netconsole, a Linux kernel module that sends kernel logs to a remote UDP server, and was surprised to find it wouldn’t work on my AWS EC2 instance. I received the following error:
modprobe: ERROR: could not insert 'netconsole': Unknown error 524
Netconsole is a kernel module that logs kernel printk messages over UDP, allowing debugging of problems where disk logging fails and serial consoles are impractical. With netconsole, the kernel messages are sent over the network in UDP packets, providing a practical solution in situations where you don’t know how to reproduce the kernel panic, and/or when there are no clues in the logfiles once you reboot your system. For example, Facebook uses it to monitor hundreds of thousands of servers and identify potentially problematic machines.
I’m using a c5.large instance on AWS, running Ubuntu 16.04 (Linux version 4.4.0-1072-aws). When I try to enable netconsole I receive the following error message:
$ sudo modprobe netconsole netconsole="@/ens5,6666@192.168.1.123/"
modprobe: ERROR: could not insert 'netconsole': Unknown error 524
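For reference, the parameter string passed to modprobe follows the format given in the kernel's netconsole documentation: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]. The userspace sketch below splits such a string into its fields so the command above is easier to read. It is not the kernel's parser: the struct and function names are hypothetical, the optional prefix flags and IPv6 addresses are ignored, and the defaults (source port 6665, target port 6666, device eth0) are the ones listed in that documentation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace mirror of one netconsole target. */
struct nc_target {
    int  src_port;     /* defaults to 6665 */
    char src_ip[64];   /* empty: left for the kernel to choose */
    char dev[32];      /* defaults to "eth0" */
    int  dst_port;     /* defaults to 6666 */
    char dst_ip[64];   /* mandatory */
    char dst_mac[32];  /* empty: broadcast */
};

/* Copy the characters in [s, e) into out, truncating to fit. */
static void copy_span(char *out, size_t cap, const char *s, const char *e)
{
    size_t n = (size_t)(e - s);

    if (n >= cap)
        n = cap - 1;
    memcpy(out, s, n);
    out[n] = '\0';
}

/* Returns 0 on success, -1 if a separator or the target IP is missing. */
static int nc_parse(const char *opt, struct nc_target *t)
{
    const char *at1, *slash1, *comma, *at2, *slash2;

    assert(opt != NULL && t != NULL);
    memset(t, 0, sizeof(*t));
    t->src_port = 6665;
    t->dst_port = 6666;
    strcpy(t->dev, "eth0");

    /* Locate the five separators in order: @ / , @ / */
    if (!(at1 = strchr(opt, '@')))      return -1;
    if (!(slash1 = strchr(at1, '/')))   return -1;
    if (!(comma = strchr(slash1, ','))) return -1;
    if (!(at2 = strchr(comma, '@')))    return -1;
    if (!(slash2 = strchr(at2, '/')))   return -1;

    if (at1 > opt)                      /* explicit source port */
        t->src_port = atoi(opt);
    copy_span(t->src_ip, sizeof(t->src_ip), at1 + 1, slash1);
    if (comma > slash1 + 1)             /* explicit device name */
        copy_span(t->dev, sizeof(t->dev), slash1 + 1, comma);
    if (at2 > comma + 1)                /* explicit target port */
        t->dst_port = atoi(comma + 1);
    copy_span(t->dst_ip, sizeof(t->dst_ip), at2 + 1, slash2);
    if (t->dst_ip[0] == '\0')
        return -1;
    copy_span(t->dst_mac, sizeof(t->dst_mac),
              slash2 + 1, slash2 + 1 + strlen(slash2 + 1));
    return 0;
}
```

Fed the string from the command above, this yields device ens5, target 192.168.1.123 on port 6666, the default source port, and an empty (broadcast) MAC address.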
Checking dmesg reveals the not-so-unknown error:
netpoll: netconsole: ens5 doesn't support polling, aborting
Apparently netconsole’s initialization fails because ens5, the network interface, doesn’t support polling. Searching for this error message in the kernel sources brings us to the __netpoll_setup function, under net/core/netpoll.c:
if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
    !ndev->netdev_ops->ndo_poll_controller) {
    np_err(np, "%s doesn't support polling, aborting\n",
           np->dev_name);
    err = -ENOTSUPP;
    goto out;
}
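This also explains the "Unknown error 524" printed by modprobe: ENOTSUPP is defined as 524 in the kernel's include/linux/errno.h, a value that is internal to the kernel and has no entry in userspace errno tables. The following userspace model reproduces just the check, as a sketch. The structs are trimmed-down stand-ins for the kernel's, the bit chosen for IFF_DISABLE_NETPOLL is a placeholder rather than the kernel's value, and model_netpoll_check and dummy_poll are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define ENOTSUPP 524                   /* kernel-internal errno value */
#define IFF_DISABLE_NETPOLL (1u << 0)  /* placeholder bit for this model only */

struct net_device;

/* Stand-in for the kernel's ops table: only the hook we care about. */
struct net_device_ops {
    void (*ndo_poll_controller)(struct net_device *dev);
};

/* Stand-in for the kernel's net_device. */
struct net_device {
    unsigned int priv_flags;
    const struct net_device_ops *netdev_ops;
};

/* Same condition as the snippet above: fail when netpoll is disabled
 * on the device or when the driver did not provide the hook. */
static int model_netpoll_check(const struct net_device *ndev)
{
    assert(ndev != NULL && ndev->netdev_ops != NULL);
    if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
        !ndev->netdev_ops->ndo_poll_controller)
        return -ENOTSUPP;
    return 0;
}

/* A driver hook that does nothing, used to show the passing case. */
static void dummy_poll(struct net_device *dev)
{
    (void)dev;
}
```

A device whose ops table leaves ndo_poll_controller as NULL gets -524 from this check, which is the situation ens5 is in.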
So, what is netpoll and how did we even get there? The netpoll API implemented in the Linux kernel allows the use of network devices with interrupts disabled, by hardware polling. Using the netpoll API, UDP clients can be implemented independently from the regular network stack, and can therefore be used in critical contexts such as system crashes. This is exactly what’s needed for netconsole.
By inspecting netconsole’s code (located under drivers/net/netconsole.c), we can see it uses netpoll to transmit kernel logs over UDP. For each netconsole target (multiple targets may be defined), init_netconsole calls alloc_param_target which in turn calls netpoll_setup to initialize a new struct netpoll object. netpoll_setup calls the inner __netpoll_setup, which is where our error occurred.
Let’s dig into the problematic line. ndev is the net_device object for our network interface (ens5). netdev_ops is a struct net_device_ops containing management hooks for the specific network device. It seems that the network device’s driver didn’t define an ndo_poll_controller hook, which causes netconsole to fail. To find the relevant network driver we’ll use ethtool:
$ ethtool -i ens5
driver: ena
version: 2.0.1K
firmware-version:
expansion-rom-version:
bus-info: 0000:00:05.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
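If ethtool is not installed, the driver name can usually be read from sysfs as well, since /sys/class/net/<iface>/device/driver is a symlink into the driver's directory. The sketch below is an alternative to the ethtool call, not what I actually ran. The function names are hypothetical, and it assumes a Linux system with sysfs mounted at /sys and a device-backed interface (virtual interfaces such as lo have no device link).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return the part of path after the last '/', or path itself. */
static const char *last_component(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}

/* Write the driver name of iface into out. Returns 0 on success,
 * -1 if the interface has no driver symlink or the path is too long. */
static int iface_driver(const char *iface, char *out, size_t cap)
{
    char link[512], target[512];
    ssize_t n;
    int len;

    assert(iface != NULL && out != NULL && cap > 0);
    len = snprintf(link, sizeof(link),
                   "/sys/class/net/%s/device/driver", iface);
    if (len < 0 || (size_t)len >= sizeof(link))
        return -1;

    /* readlink does not NUL-terminate, so leave room and do it here. */
    n = readlink(link, target, sizeof(target) - 1);
    if (n < 0)
        return -1;
    target[n] = '\0';

    snprintf(out, cap, "%s", last_component(target));
    return 0;
}
```

On the instance above, calling iface_driver("ens5", buf, sizeof(buf)) would be expected to fill buf with the same driver name that ethtool reports.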
The driver is ena version 2.0.1K, the Elastic Network Adapter that Amazon uses for (some of) its Linux EC2 instances. We can find its code in the amzn-drivers repository on GitHub, then look at the appropriate version tag and find the culprit under amzn-drivers/kernel/linux/ena/ena_netdev.c:
static const struct net_device_ops ena_netdev_ops = {
    .ndo_open             = ena_open,
    .ndo_stop             = ena_close,
    .ndo_start_xmit       = ena_start_xmit,
    .ndo_select_queue     = ena_select_queue,
#if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,36))
    .ndo_get_stats64      = ena_get_stats64,
#else
    .ndo_get_stats        = ena_get_stats,
#endif
    .ndo_tx_timeout       = ena_tx_timeout,
    .ndo_change_mtu       = ena_change_mtu,
    .ndo_set_mac_address  = NULL,
#ifdef HAVE_SET_RX_MODE
    .ndo_set_rx_mode      = ena_set_rx_mode,
#endif
    .ndo_validate_addr    = eth_validate_addr,
#if ENA_BUSY_POLL_SUPPORT
    .ndo_busy_poll        = ena_busy_poll,
#endif
};
Indeed, ena_netdev_ops doesn’t define an ndo_poll_controller function!
At this stage I was a bit surprised. I couldn’t think of a good reason for ena not to support polling, and a hook like ndo_poll_controller doesn’t go missing by accident. So I looked through the version history and found that it does in fact exist in ena versions 1.X! Its removal is even documented in ena 2.0’s release notes:
Minor Changes
We now know that not defining ndo_poll_controller was intentional, but we still don’t know why. The next clue comes from the Linux kernel version that first incorporated ena’s change, version 4.19. What’s interesting is that many other network drivers also removed their support of ndo_poll_controller in this version, all quoting the same reason:
As diagnosed by Song Liu, ndo_poll_controller() can be very dangerous on loaded hosts, since the cpu calling ndo_poll_controller() might steal all NAPI contexts (for all RX/TX queues of the NIC). This capture can last for an unlimited amount of time, since one cpu is generally not able to drain all the queues under load.
The NAPI mechanism allows device drivers to supply a polling interface for use in times of high traffic, instead of raising an interrupt for every packet. Song Liu discovered that the driver’s ndo_poll_controller function schedules all of the device’s NAPI contexts on a single CPU, the one that called it:
for (i = 0; i < adapter->num_queues; i++)
    napi_schedule(&adapter->ena_napi[i].napi);
This led to the decision to remove ndo_poll_controller from supported network drivers, ena included.
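To see why one CPU "is generally not able to drain all the queues under load", a bit of arithmetic is enough. The toy model below is only an illustration with made-up numbers, not a measurement and not kernel code: every tick each queue receives a fixed number of packets, and the CPUs doing the polling can process a fixed budget per tick between them. The function name and all parameters are hypothetical.

```c
#include <assert.h>

/* Packets still waiting after `ticks` ticks, when `cpus` CPUs with
 * `budget_per_cpu` packets per tick each serve `queues` queues that
 * each receive `arrive_per_queue` packets per tick. */
static long backlog_after(int queues, int arrive_per_queue,
                          int budget_per_cpu, int cpus, int ticks)
{
    long backlog = 0;
    long capacity = (long)budget_per_cpu * cpus;
    int t;

    assert(queues > 0 && cpus > 0 && ticks >= 0);
    for (t = 0; t < ticks; t++) {
        backlog += (long)queues * arrive_per_queue;          /* new arrivals */
        backlog -= backlog < capacity ? backlog : capacity;  /* drained */
    }
    return backlog;
}
```

With 8 queues receiving 10 packets per tick each and a budget of 64 per CPU, a single CPU falls behind by 16 packets every tick, so the backlog grows without bound for as long as the load lasts. Spread over 8 CPUs, the same load leaves no backlog at all.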
But does this mean that Linux kernel 4.19 breaks the use of netpoll and netconsole? Absolutely not, since it also includes another important change: making ndo_poll_controller optional. The commit message for this change provides additional information about the issue:
It seems that all networking drivers that do use NAPI for their TX completions, should not provide a ndo_poll_controller(). NAPI drivers have netpoll support already handled in core networking stack, since netpoll_poll_dev() uses poll_napi(dev) to iterate through registered NAPI contexts for a device.
This patch allows netpoll_poll_dev() to process NAPI contexts even for drivers not providing ndo_poll_controller(), allowing for following patches in NAPI drivers.
As Eric Dumazet replied to Song Liu: “The core infrastructure is just better at being able to drain TX completions without risking stealing the NAPI context forever.”
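The behavior described in that commit message can be modeled in a few lines. This is a userspace sketch based only on the description quoted above, not the kernel source: the structs are simplified stand-ins, and model_netpoll_poll_dev, model_poll_napi and counting_controller are hypothetical names. The point it shows is that the driver hook becomes an optional extra step, while the core loop over the NAPI contexts always runs.

```c
#include <assert.h>
#include <stddef.h>

struct net_device;

struct net_device_ops {
    void (*ndo_poll_controller)(struct net_device *dev); /* may be NULL */
};

/* Stand-in for one NAPI context: we only count how often it was polled. */
struct napi_ctx {
    int polled;
};

struct net_device {
    const struct net_device_ops *netdev_ops;
    struct napi_ctx *napi;   /* registered NAPI contexts */
    size_t napi_count;
    int controller_calls;    /* bookkeeping for this model only */
};

/* Core-stack side: visit every NAPI context registered for the device. */
static void model_poll_napi(struct net_device *dev)
{
    size_t i;

    for (i = 0; i < dev->napi_count; i++)
        dev->napi[i].polled++;
}

/* Call the driver hook only if it exists, then poll NAPI regardless. */
static void model_netpoll_poll_dev(struct net_device *dev)
{
    const struct net_device_ops *ops = dev->netdev_ops;

    assert(ops != NULL);
    if (ops->ndo_poll_controller)
        ops->ndo_poll_controller(dev);
    model_poll_napi(dev);
}

/* Example driver hook that records that it was invoked. */
static void counting_controller(struct net_device *dev)
{
    dev->controller_calls++;
}
```

In this model a driver without the hook still has all of its NAPI contexts polled, which is why removing ndo_poll_controller from NAPI drivers does not break netpoll on kernels that carry this change.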
We now understand exactly what caused the issue: Our Ubuntu 16.04 came with ena 2.0, which doesn’t contain the ndo_poll_controller hook since it became optional, but it also came with kernel 4.4 where ndo_poll_controller is still mandatory in netpoll. We can now offer several solutions to the problem:
#include <linux/netdevice.h>
#include <linux/slab.h>

// No-op hook: it only exists so that the NULL check in __netpoll_setup passes
static void ndo_poll_controller_empty(struct net_device *ndev)
{
    // Empty
}

int patch_ndo_poll_controller(struct net_device *ndev)
{
    // Allocate new struct net_device_ops
    struct net_device_ops *netdev_ops = kmalloc(sizeof(*netdev_ops), GFP_KERNEL);

    if (netdev_ops == NULL)
        return -ENOMEM;

    // Copy the content of the original netdev_ops
    *netdev_ops = *ndev->netdev_ops;
    // Switch ndo_poll_controller to the empty implementation
    netdev_ops->ndo_poll_controller = ndo_poll_controller_empty;
    // Patch the net_device's ops; the copy must stay allocated for as
    // long as the device points at it, and the original pointer is not saved
    ndev->netdev_ops = netdev_ops;
    return 0;
}
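One thing the snippet above does not do is keep the original ops pointer, so the patch cannot be undone and the copy cannot be freed safely. A possible refinement is to remember both pointers and restore the original when done. The sketch below shows that idea as a plain userspace model with simplified stand-in structs, using malloc and free in place of the kernel allocator. The names ops_patch, patch_apply, patch_revert and fake_open are hypothetical, and a real kernel module would also have to think about locking and concurrent users of netdev_ops, which this model ignores.

```c
#include <assert.h>
#include <stdlib.h>

struct net_device;

/* Simplified stand-in for the kernel's ops table. */
struct net_device_ops {
    int  (*ndo_open)(struct net_device *dev);
    void (*ndo_poll_controller)(struct net_device *dev);
};

struct net_device {
    const struct net_device_ops *netdev_ops;
};

/* Remembers what is needed to undo the patch. */
struct ops_patch {
    const struct net_device_ops *orig;  /* table the driver installed */
    struct net_device_ops *copy;        /* our modified duplicate */
};

static void poll_controller_noop(struct net_device *dev)
{
    (void)dev;
}

/* Example driver callback, used to show that other hooks are preserved. */
static int fake_open(struct net_device *dev)
{
    (void)dev;
    return 0;
}

/* Install a copy of the ops table with a no-op poll controller.
 * Returns 0 on success, -1 if allocation fails. */
static int patch_apply(struct net_device *ndev, struct ops_patch *p)
{
    struct net_device_ops *copy;

    assert(ndev != NULL && p != NULL);
    copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return -1;
    *copy = *ndev->netdev_ops;
    copy->ndo_poll_controller = poll_controller_noop;
    p->orig = ndev->netdev_ops;
    p->copy = copy;
    ndev->netdev_ops = copy;
    return 0;
}

/* Restore the driver's own table and release the duplicate. */
static void patch_revert(struct net_device *ndev, struct ops_patch *p)
{
    assert(ndev != NULL && p != NULL);
    ndev->netdev_ops = p->orig;
    free(p->copy);
    p->copy = NULL;
}
```

The driver's own table is never written to, which matters because tables such as ena_netdev_ops are declared const. Only the pointer held by the device is swapped.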