[PATCH 0/5] genirq: threadable IRQ support

* [PATCH 0/5] genirq: threadable IRQ support
@ 2016-06-15 13:42 Paolo Abeni
  2016-06-15 13:42 ` [PATCH 1/5] genirq: implement support for runtime switch to threaded irqs Paolo Abeni
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Paolo Abeni @ 2016-06-15 13:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, David S. Miller, Eric Dumazet, Steven Rostedt,
	Peter Zijlstra (Intel),
	Ingo Molnar, Hannes Frederic Sowa, netdev

This patch series adds a new genirq interface to allows the user space to change
the IRQ mode at runtime, switching to and from the threaded mode.

The configuration is performing on a per irqaction basis, writing into the
newly added procfs entry /proc/irq/<nr>/<irq action name>/threaded. Such entry
is created at IRQ request time, only if CONFIG_IRQ_FORCED_THREADING
is defined.

Upon IRQ creation, the device handling such IRQ may optionally provide, via
the newly added API irq_set_mode_notifier(), an additional callback to be
notified about IRQ mode change.
The device can use such callback to configure its internal state to behave
differently in threaded mode and in normal mode if required.

Additional IRQ flags are added to let the device specifies some default
aspects of the IRQ thread. The device can request a SCHED_NORMAL scheduling
policy and avoid the affinity setting for the IRQ thread. Both of such
options are beneficial for the first threadable IRQ user.

The initial user for this feature is the networking subsystem; some
infrastructure is added to the network core for such goal. A new napi field
storing an IRQ thread reference is used to mark a NAPI instance as threaded
and __napi_schedule is modified to invoke a poll loop directly instead of
raising a softirq when the related NAPI instance is in threaded mode, plus 
a IRQ_mode_set callback is provided to notify the NAPI instance of the IRQ
mode change.

Each network device driver must be migrated explicitly to leverage the new
infrastructure. In this patch series, the Intel ixgbe is updated to invoke
irq_set_mode_notifier(), only when using msix IRQs. 
This avoids other IRQ events to be delayed indefinitely when the rx IRQ is
processed in thread mode. The default behavior after the driver migration is
unchanged.

Running the rx packets processing inside a conventional kthread is beneficial
for different workload since it allows the process scheduler to nicely use
the available resources. With multiqueue NICs, the ksoftirq design does not allow
any running process to use 100% of a single CPU, under relevant network load,
because the softirq poll loop will be scheduled on each CPU.

The above can be experienced in a hypervisor/VMs scenario, when the guest is
under UDP flood. If the hypervisor's NIC has enough rx queues the guest will
compete with ksoftirqd on each CPU. Moreover, since the ksoftirqd CPU
utilization change with the ingress traffic, the scheduler try to migrate the
guest processes towards the CPUs with the highest capacity, further impacting
the guest ability to process rx packets.

Running the hypervisor rx packet processing inside a migrable kthread allows
the process scheduler to let the guest process[es] to fully use a single a
core each, migrating some rx threads as required.

The raw numbers, obtained with the netperf UDP_STREAM test, using a tun
device with a noqueue qdisc in the hypervisor, and using random IP addresses
as source in case of multiple flows, are as follow:

		vanilla		threaded
size/flow	kpps		kpps/delta
1/1		824		843/+2%
1/25		736		906/+23%
1/50		752		906/+20%
1/100		772		906/+17%
1/200		741		976/+31%
64/1		829		840/+1%
64/25		711		932/+31%
64/50		780		894/+14%
64/100		754		946/+25%
64/200		714		945/+32%
256/1		702		510/-27%
256/25		724		894/+23%
256/50		739		889/+20%
256/100		798		873/+9%
256/200		812		907/+11%
1400/1		720		727/+1%
1400/25		826		826/0
1400/50		827		833/0
1400/100	820		820/0
1400/200	796		799/0

The guest runs 2vCPU, so it's not prone to the userspace livelock issue
recently exposed here: http://thread.gmane.org/gmane.linux.kernel/2218719

There are relevant improvement in all cpu bounded scenarios with multiple flows
and significant regression with medium size packet, single flow. The latter
is due to the increased 'burstiness' of packet processing which cause the
single socket in the guest of overflow more easily, if the receiver application
is scheduled on the same cpu processing the incoming packets.

The kthread approach should give a lot of new advantages over the softirq
based approach:

* moving into a more dpdk-alike busy poll packet processing direction:
we can even use busy polling without the need of a connected UDP or TCP
socket and can leverage busy polling for forwarding setups. This could
very well increase latency and packet throughput without hurting other
processes if the networking stack gets more and more preemptive in the
future.

* possibility to acquire mutexes in the networking processing path: e.g.
we would need that to configure hw_breakpoints if we want to add
watchpoints in the memory based on some rules in the kernel

* more and better tooling to adjust the weight of the networking
kthreads, preferring certain networking cards or setting cpus affinity
on packet processing threads. Maybe also using deadline scheduling or
other scheduler features might be worthwhile.

* scheduler statistics can be used to observe network packet processing

Paolo Abeni (5):
  genirq: implement support for runtime switch to threaded irqs
  genirq: add flags for controlling the default threaded irq behavior
  sched/preempt: cond_resched_softirq() must check for softirq
  netdev: implement infrastructure for threadable napi irq
  ixgbe: add support for threadable rx irq

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  14 +-
 include/linux/interrupt.h                     |  21 +++
 include/linux/netdevice.h                     |   4 +
 kernel/irq/internals.h                        |   3 +
 kernel/irq/manage.c                           | 212 ++++++++++++++++++++++++--
 kernel/irq/proc.c                             |  51 +++++++
 kernel/sched/core.c                           |   3 +-
 net/core/dev.c                                |  59 +++++++
 8 files changed, 355 insertions(+), 12 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 18+ messages in thread