A Case of a Broken Netconsole

How a kernel bug fix made updated network drivers incompatible with older kernel versions, and how we solved it
Tomer Yogev

Dec 11, 2019

I recently tried running netconsole, a Linux kernel module that sends kernel logs to a remote UDP server, and was surprised to find it wouldn’t work on my AWS EC2 instance. I received the following error:

modprobe: ERROR: could not insert 'netconsole': Unknown error 524

Using netconsole

Netconsole is a kernel module that sends kernel printk messages over the network in UDP packets, which allows debugging when disk logging fails and serial consoles are impractical. It is a practical solution when you don’t know how to reproduce a kernel panic, or when the log files hold no clues once you reboot your system. For example, Facebook uses it to monitor hundreds of thousands of servers and identify potentially problematic machines.
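
On the receiving end, any UDP listener will do. As a minimal sketch (not part of the original setup), the following C helpers open a UDP socket and read one datagram at a time. The function names are made up for illustration, and the port is left to the caller; 6666 would be the natural choice because it is netconsole's default target port.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a UDP socket bound to the given port on all local IPv4 addresses.
 * Returns the socket descriptor, or -1 on failure. */
int open_log_listener(uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until one datagram arrives and copy it into buf as a
 * NUL-terminated string. Returns the payload length, or -1 on error. */
ssize_t read_log_message(int fd, char *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL && len > 0);
    n = recv(fd, buf, len - 1, 0);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

A receiver would call open_log_listener(6666) once and then loop on read_log_message, writing each buffer to stdout or a file. In practice an existing tool such as netcat or a syslog daemon can serve the same purpose.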

Why doesn’t netconsole start?

I’m using a c5.large instance on AWS, running Ubuntu 16.04 (Linux version 4.4.0-1072-aws). When I try to enable netconsole I receive the following error message: 

$ sudo modprobe netconsole netconsole="@/ens5,6666@192.168.1.123/"
modprobe: ERROR: could not insert 'netconsole': Unknown error 524

Checking dmesg reveals the not-so-unknown error:

netpoll: netconsole: ens5 doesn't support polling, aborting

Apparently netconsole’s initialization fails because ens5, the network interface, doesn’t support polling. Searching for this error message in the kernel sources brings us to the __netpoll_setup function, under net/core/netpoll.c:

if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
    !ndev->netdev_ops->ndo_poll_controller) {
       np_err(np, "%s doesn't support polling, aborting\n", np->dev_name);
       err = -ENOTSUPP;
       goto out;
}
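
This also explains why modprobe could only report "Unknown error 524": ENOTSUPP is a kernel-internal code (524 in include/linux/errno.h) with no counterpart in the userspace errno list, so the C library has no description for it. A small userspace sketch of that lookup, where KERNEL_ENOTSUPP and describe_error are illustrative names of my own:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Kernel-internal error code from include/linux/errno.h. It is not part of
 * the userspace <errno.h>, so it is redefined here for illustration. */
#define KERNEL_ENOTSUPP 524

/* Describe an errno-style value the way a userspace tool typically would:
 * by asking the C library for its message. */
void describe_error(int err, char *out, size_t len)
{
    assert(out != NULL && len > 0);
    snprintf(out, len, "%s", strerror(err));
}
```

Calling describe_error(KERNEL_ENOTSUPP, ...) on a glibc system yields the same "Unknown error 524" text that modprobe printed; other C libraries may word their fallback message differently.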

Netpoll

So, what is netpoll and how did we even get there? The netpoll API implemented in the Linux kernel allows the use of network devices with interrupts disabled, by hardware polling. Using the netpoll API, UDP clients can be implemented independently from the regular network stack, and can therefore be used in critical contexts such as system crashes. This is exactly what’s needed for netconsole.

By inspecting netconsole’s code (located under drivers/net/netconsole.c), we can see it uses netpoll to transmit kernel logs over UDP. For each netconsole target (multiple targets may be defined), init_netconsole calls alloc_param_target which in turn calls netpoll_setup to initialize a new struct netpoll object. netpoll_setup calls the inner __netpoll_setup, which is where our error occurred.

ndo_poll_controller

Let’s dig into the problematic line. ndev is the net_device object for our network interface (ens5). netdev_ops is a struct net_device_ops containing management hooks for the specific network device. It seems that the network device’s driver didn’t define an ndo_poll_controller hook, which causes netconsole to fail. To find the relevant network driver we’ll use ethtool:

$ ethtool -i ens5
driver: ena
version: 2.0.1K
firmware-version:
expansion-rom-version:
bus-info: 0000:00:05.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

The driver is ena version 2.0.1K, the Elastic Network Adapter that Amazon uses for (some of) its Linux EC2 instances. We can find its code in the amzn-drivers repository on GitHub, then look at the appropriate version tag and find the culprit under amzn-drivers/kernel/linux/ena/ena_netdev.c:

static const struct net_device_ops ena_netdev_ops = {
       .ndo_open = ena_open,
       .ndo_stop = ena_close,
       .ndo_start_xmit = ena_start_xmit,
       .ndo_select_queue = ena_select_queue,
#if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,36))
       .ndo_get_stats64 = ena_get_stats64,
#else
       .ndo_get_stats = ena_get_stats,
#endif
       .ndo_tx_timeout = ena_tx_timeout,
       .ndo_change_mtu = ena_change_mtu,
       .ndo_set_mac_address = NULL,
#ifdef HAVE_SET_RX_MODE
       .ndo_set_rx_mode = ena_set_rx_mode,
#endif
       .ndo_validate_addr = eth_validate_addr,
#if ENA_BUSY_POLL_SUPPORT
       .ndo_busy_poll = ena_busy_poll,
#endif
};

ena_netdev_ops really doesn’t define an ndo_poll_controller function!
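
To see why a missing entry is enough to trip the check, recall that members left out of a C designated initializer are zero-initialized, so an unlisted hook is simply a NULL pointer. The following userspace model illustrates this with toy structs; every toy_* name is hypothetical and heavily reduced compared to the real kernel types.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ENOTSUPP 524  /* same value as the kernel's ENOTSUPP */

struct toy_net_device;

/* Hypothetical, heavily reduced stand-in for struct net_device_ops. */
struct toy_net_device_ops {
    int  (*ndo_open)(struct toy_net_device *dev);
    void (*ndo_poll_controller)(struct toy_net_device *dev);
};

struct toy_net_device {
    const struct toy_net_device_ops *netdev_ops;
};

static int toy_open(struct toy_net_device *dev) { (void)dev; return 0; }
static void toy_poll(struct toy_net_device *dev) { (void)dev; }

/* Like ena 2.0: ndo_poll_controller is not listed, so it is NULL. */
const struct toy_net_device_ops ops_without_poll = {
    .ndo_open = toy_open,
};

/* Like ena 1.x: the hook is provided. */
const struct toy_net_device_ops ops_with_poll = {
    .ndo_open = toy_open,
    .ndo_poll_controller = toy_poll,
};

/* Mirrors the hook test in the kernel 4.4 check quoted earlier:
 * a device without the hook is refused. */
int toy_netpoll_setup(const struct toy_net_device *dev)
{
    assert(dev != NULL && dev->netdev_ops != NULL);
    if (!dev->netdev_ops->ndo_poll_controller)
        return -TOY_ENOTSUPP;
    return 0;
}
```

A device pointing at ops_without_poll is rejected with -524, while one pointing at ops_with_poll passes, which matches the behavior seen with ena 2.0 versus ena 1.X on this kernel.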

How did this happen?

At this stage I was a bit surprised. I couldn’t think of a good reason for ena not to support polling. There must be a better reason for ndo_poll_controller to be missing, so I looked through the version history and found that it does in fact exist in ena versions 1.X! Its removal is even documented in ena 2.0’s release notes:

Minor Changes

  • Remove support for ndo_netpoll_controller.
  • Update host info structure to match the latest ENA spec.
  • Remove redundant parameter in ena_com_admin_init().
  • Fix indentations in ena_defs for better readability.
  • Add section about predictable Network Names to the README.
  • Fix small spelling mistake in RELEASE_NOTES __FGP_COLD => __GFP_COLD.

We now know that not defining ndo_poll_controller was intentional, but we still don’t know why. The next clue comes from the Linux kernel version that first incorporated ena’s change, version 4.19. What’s interesting is that many other network drivers also removed their support of ndo_poll_controller in this version, all quoting the same reason:

As diagnosed by Song Liu, ndo_poll_controller() can be very dangerous on loaded hosts, since the cpu calling ndo_poll_controller() might steal all NAPI contexts (for all RX/TX queues of the NIC). This capture can last for an unlimited amount of time, since one cpu is generally not able to drain all the queues under load.

The NAPI mechanism allows device drivers to supply a polling interface for use in times of high traffic, instead of generating many interrupts. Song Liu discovered that a driver’s ndo_poll_controller function schedules all of the device’s NAPI contexts on the single CPU that called it, as ena’s own implementation did:

for (i = 0; i < adapter->num_queues; i++)
       napi_schedule(&adapter->ena_napi[i].napi);

This led to the decision to remove ndo_poll_controller from supported network drivers, ena included.

But does this mean that Linux kernel 4.19 breaks the use of netpoll and netconsole? Absolutely not, since it also includes another important change: making ndo_poll_controller optional. The commit message for this change provides additional information about the issue:

It seems that all networking drivers that do use NAPI for their TX completions, should not provide a ndo_poll_controller(). NAPI drivers have netpoll support already handled in core networking stack, since netpoll_poll_dev() uses poll_napi(dev) to iterate through registered NAPI contexts for a device.

This patch allows netpoll_poll_dev() to process NAPI contexts even for drivers not providing ndo_poll_controller(), allowing for following patches in NAPI drivers.

As Eric Dumazet replied to Song Liu: “The core infrastructure is just better at being able to drain TX completions without risking stealing the NAPI context forever.”
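
The effect of making the hook optional can be pictured with a small userspace model. This is a simplified sketch of the behavior the commit message describes, not the kernel's actual code; the toy_* types and functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NAPI 8

struct toy_dev;

/* Hypothetical stand-in for one NAPI context (one queue of the NIC). */
struct toy_napi {
    int polled;  /* how many times the core polled this context */
};

struct toy_dev {
    void (*ndo_poll_controller)(struct toy_dev *dev);  /* may be NULL */
    struct toy_napi napi[TOY_MAX_NAPI];
    size_t num_napi;
};

/* Stand-in for poll_napi(): the core walks the registered contexts itself. */
static void toy_poll_napi(struct toy_dev *dev)
{
    for (size_t i = 0; i < dev->num_napi; i++)
        dev->napi[i].polled++;
}

/* Stand-in for netpoll_poll_dev() after the change: the driver hook is
 * called only if it exists, and the NAPI contexts are processed either way. */
void toy_netpoll_poll_dev(struct toy_dev *dev)
{
    assert(dev != NULL && dev->num_napi <= TOY_MAX_NAPI);
    if (dev->ndo_poll_controller)
        dev->ndo_poll_controller(dev);
    toy_poll_napi(dev);
}
```

In this model a device with a NULL hook still has every registered NAPI context polled, which is why a NAPI driver loses nothing by dropping ndo_poll_controller on a kernel that includes this change.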

Our walkthrough 

We now understand exactly what caused the issue: our Ubuntu 16.04 image shipped ena 2.0, which dropped the ndo_poll_controller hook once it became optional, together with kernel 4.4, whose netpoll still requires that hook. We can now offer several solutions to the problem:

  1. Upgrading the Linux Kernel: If it’s possible for you, updating the OS to a newer Ubuntu and a newer kernel (4.19 and up) might be the simplest way to avoid this issue.
  2. Upgrading netconsole: If you don’t want to upgrade the entire kernel, you can alternatively compile a newer netconsole which can support the new ena.
  3. Downgrading ena: Technically, you can also solve the issue by downgrading the ena driver to version 1.X, but I wouldn’t recommend this approach for obvious reasons.
  4. Recompiling ena: The older kernel demands the existence of ndo_poll_controller, but it’s not really necessary (which is exactly the reason why it became optional). So another solution is to recompile ena yourself with an empty ndo_poll_controller function. 
  5. Patching at Runtime: If you can’t afford to reboot your machine and you don’t want to change the preexisting drivers, another solution is to patch the net_device at runtime. Write a kernel module that finds the net_device and replaces its netdev_ops struct with a struct of your own. Your netdev_ops should be a copy of the original struct with an empty ndo_poll_controller added. Here’s a sample snippet:
    #include <linux/netdevice.h>
    #include <linux/slab.h>

    // No-op hook: it exists only to satisfy netpoll's check on older kernels
    static void ndo_poll_controller_empty(struct net_device *ndev) {
    }

    int patch_ndo_poll_controller(struct net_device *ndev) {
           // Allocate a new struct net_device_ops; it must stay allocated
           // for as long as the device points to it
           struct net_device_ops *netdev_ops = kmalloc(sizeof(*netdev_ops), GFP_KERNEL);
           if (netdev_ops == NULL) {
                  return -ENOMEM;
           }

           // Copy the content of the original netdev_ops
           *netdev_ops = *ndev->netdev_ops;

           // Switch ndo_poll_controller to the empty implementation
           netdev_ops->ndo_poll_controller = ndo_poll_controller_empty;

           // Patch the net_device's ops
           ndev->netdev_ops = netdev_ops;

           return 0;
    }
    
