* [PATCH 0/5] genirq: threadable IRQ support
@ 2016-06-15 13:42 Paolo Abeni
2016-06-15 13:42 ` [PATCH 1/5] genirq: implement support for runtime switch to threaded irqs Paolo Abeni
` (4 more replies)
0 siblings, 5 replies; 18+ messages in thread
From: Paolo Abeni @ 2016-06-15 13:42 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, David S. Miller, Eric Dumazet, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
This patch series adds a new genirq interface that allows user space to change
the IRQ mode at runtime, switching to and from threaded mode.
The configuration is performed on a per-irqaction basis, writing into the
newly added procfs entry /proc/irq/<nr>/<irq action name>/threaded. Such entry
is created at IRQ request time, only if CONFIG_IRQ_FORCED_THREADING
is defined.
Upon IRQ creation, the device handling such IRQ may optionally provide,
via the newly added API irq_set_mode_notifier(), an additional callback to be
notified about IRQ mode changes.
The device can use such callback to configure its internal state to behave
differently in threaded mode and in normal mode if required.
Additional IRQ flags are added to let the device specify some default
aspects of the IRQ thread. The device can request the SCHED_NORMAL scheduling
policy and avoid the affinity setting for the IRQ thread. Both options
are beneficial for the first threadable IRQ user.
The initial user of this feature is the networking subsystem; some
infrastructure is added to the network core for this purpose. A new napi field
storing an IRQ thread reference is used to mark a NAPI instance as threaded,
and __napi_schedule is modified to invoke the poll loop directly, instead of
raising a softirq, when the related NAPI instance is in threaded mode. An IRQ
mode change callback is also provided to notify the NAPI instance of the IRQ
mode change.
Each network device driver must be migrated explicitly to leverage the new
infrastructure. In this patch series, the Intel ixgbe driver is updated to
invoke irq_set_mode_notifier() only when using MSI-X IRQs.
This avoids other IRQ events being delayed indefinitely while the rx IRQ is
processed in threaded mode. The default behavior after the driver migration is
unchanged.
Running the rx packet processing inside a conventional kthread is beneficial
for several workloads, since it allows the process scheduler to make good use
of the available resources. With multiqueue NICs, the ksoftirqd design does
not allow any running process to use 100% of a single CPU under relevant
network load, because the softirq poll loop will be scheduled on each CPU.
The above can be experienced in a hypervisor/VMs scenario, when the guest is
under UDP flood. If the hypervisor's NIC has enough rx queues, the guest will
compete with ksoftirqd on each CPU. Moreover, since the ksoftirqd CPU
utilization changes with the ingress traffic, the scheduler tries to migrate
the guest processes towards the CPUs with the highest capacity, further
impacting the guest's ability to process rx packets.
Running the hypervisor rx packet processing inside a migratable kthread allows
the process scheduler to let the guest process[es] fully use a single
core each, migrating some rx threads as required.
The raw numbers, obtained with the netperf UDP_STREAM test, using a tun
device with a noqueue qdisc in the hypervisor, and using random IP addresses
as source in case of multiple flows, are as follows:
size/flow    vanilla    threaded
             kpps       kpps/delta
1/1          824        843/+2%
1/25         736        906/+23%
1/50         752        906/+20%
1/100        772        906/+17%
1/200        741        976/+31%
64/1         829        840/+1%
64/25        711        932/+31%
64/50        780        894/+14%
64/100       754        946/+25%
64/200       714        945/+32%
256/1        702        510/-27%
256/25       724        894/+23%
256/50       739        889/+20%
256/100      798        873/+9%
256/200      812        907/+11%
1400/1       720        727/+1%
1400/25      826        826/0
1400/50      827        833/0
1400/100     820        820/0
1400/200     796        799/0
The guest runs 2 vCPUs, so it's not prone to the userspace livelock issue
recently exposed here: http://thread.gmane.org/gmane.linux.kernel/2218719
There are relevant improvements in all CPU-bound scenarios with multiple flows,
and a significant regression with medium size packets, single flow. The latter
is due to the increased 'burstiness' of packet processing, which causes the
single socket in the guest to overflow more easily if the receiver application
is scheduled on the same CPU processing the incoming packets.
The kthread approach should give several new advantages over the softirq
based approach:
* moving into a more dpdk-alike busy poll packet processing direction:
we can even use busy polling without the need of a connected UDP or TCP
socket and can leverage busy polling for forwarding setups. This could
very well improve latency and packet throughput without hurting other
processes if the networking stack gets more and more preemptive in the
future.
* possibility to acquire mutexes in the networking processing path: e.g.
we would need that to configure hw_breakpoints if we want to add
watchpoints in memory based on some rules in the kernel
* more and better tooling to adjust the weight of the networking
kthreads, preferring certain networking cards or setting cpu affinity
on packet processing threads. Maybe also using deadline scheduling or
other scheduler features might be worthwhile.
* scheduler statistics can be used to observe network packet processing
Paolo Abeni (5):
genirq: implement support for runtime switch to threaded irqs
genirq: add flags for controlling the default threaded irq behavior
sched/preempt: cond_resched_softirq() must check for softirq
netdev: implement infrastructure for threadable napi irq
ixgbe: add support for threadable rx irq
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 14 +-
include/linux/interrupt.h | 21 +++
include/linux/netdevice.h | 4 +
kernel/irq/internals.h | 3 +
kernel/irq/manage.c | 212 ++++++++++++++++++++++++--
kernel/irq/proc.c | 51 +++++++
kernel/sched/core.c | 3 +-
net/core/dev.c | 59 +++++++
8 files changed, 355 insertions(+), 12 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/5] genirq: implement support for runtime switch to threaded irqs
2016-06-15 13:42 [PATCH 0/5] genirq: threadable IRQ support Paolo Abeni
@ 2016-06-15 13:42 ` Paolo Abeni
2016-06-15 14:50 ` kbuild test robot
2016-06-15 13:42 ` [PATCH 2/5] genirq: add flags for controlling the default threaded irq behavior Paolo Abeni
` (3 subsequent siblings)
4 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2016-06-15 13:42 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, David S. Miller, Eric Dumazet, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
When the IRQ_FORCED_THREADING compile option is enabled, a new
'threaded' procfs entry is added under the action proc
directory upon irq request. Writing a true value into
that file will cause the underlying action to be reconfigured
in FORCE_THREADED mode.
The reconfiguration is performed by disabling the irq underlying
the current action and then updating the action struct to the
specified mode, i.e. setting the thread field and the
IRQTF_FORCED_THREAD flag.
If an error occurs before notifying the device, the
irq action is left unmodified.
A device that wants to be notified about irq mode changes
can register a notifier with irq_set_mode_notifier(). Such
notifier will be invoked in atomic context just after each
irq reconfiguration.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
include/linux/interrupt.h | 15 ++++
kernel/irq/internals.h | 3 +
kernel/irq/manage.c | 197 ++++++++++++++++++++++++++++++++++++++++++++--
kernel/irq/proc.c | 51 ++++++++++++
4 files changed, 261 insertions(+), 5 deletions(-)
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 9fcabeb..85d3738 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -90,6 +90,7 @@ enum {
};
typedef irqreturn_t (*irq_handler_t)(int, void *);
+typedef void (*mode_notifier_t)(int, void *, struct task_struct *);
/**
* struct irqaction - per interrupt action descriptor
@@ -106,6 +107,8 @@ typedef irqreturn_t (*irq_handler_t)(int, void *);
* @thread_flags: flags related to @thread
* @thread_mask: bitmask for keeping track of @thread activity
* @dir: pointer to the proc/irq/NN/name entry
+ * @mode_notifier: callback to notify the device about irq mode change
+ * (threaded vs normal mode)
*/
struct irqaction {
irq_handler_t handler;
@@ -121,6 +124,7 @@ struct irqaction {
unsigned long thread_mask;
const char *name;
struct proc_dir_entry *dir;
+ mode_notifier_t mode_notifier;
} ____cacheline_internodealigned_in_smp;
extern irqreturn_t no_action(int cpl, void *dev_id);
@@ -212,6 +216,17 @@ extern void irq_wake_thread(unsigned int irq, void *dev_id);
extern void suspend_device_irqs(void);
extern void resume_device_irqs(void);
+#ifdef CONFIG_IRQ_FORCED_THREADING
+extern int irq_set_mode_notifier(unsigned int irq, void *dev_id,
+ mode_notifier_t notifier);
+#else
+static inline int
+irq_set_mode_notifier(unsigned int irq, void *dev_id, mode_notifier_t notifier)
+{
+ return 0;
+}
+#endif
+
/**
* struct irq_affinity_notify - context for notification of IRQ affinity changes
* @irq: Interrupt to which notification applies
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 09be2c9..841c714 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -105,6 +105,9 @@ static inline void unregister_handler_proc(unsigned int irq,
struct irqaction *action) { }
#endif
+extern int irq_reconfigure(unsigned int irq, struct irqaction *act,
+ bool threaded);
+
extern int irq_select_affinity_usr(unsigned int irq, struct cpumask *mask);
extern void irq_set_thread_affinity(struct irq_desc *desc);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index ef0bc02..cce4efd 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -938,8 +938,7 @@ static int irq_thread(void *data)
irqreturn_t (*handler_fn)(struct irq_desc *desc,
struct irqaction *action);
- if (force_irqthreads && test_bit(IRQTF_FORCED_THREAD,
- &action->thread_flags))
+ if (test_bit(IRQTF_FORCED_THREAD, &action->thread_flags))
handler_fn = irq_forced_thread_fn;
else
handler_fn = irq_thread_fn;
@@ -1052,8 +1051,8 @@ static void irq_release_resources(struct irq_desc *desc)
c->irq_release_resources(d);
}
-static int
-setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
+static struct task_struct *
+create_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
{
struct task_struct *t;
struct sched_param param = {
@@ -1070,7 +1069,7 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
}
if (IS_ERR(t))
- return PTR_ERR(t);
+ return t;
sched_setscheduler_nocheck(t, SCHED_FIFO, &param);
@@ -1080,6 +1079,17 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
* references an already freed task_struct.
*/
get_task_struct(t);
+ return t;
+}
+
+static int
+setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
+{
+ struct task_struct *t = create_irq_thread(new, irq, secondary);
+
+ if (IS_ERR(t))
+ return PTR_ERR(t);
+
new->thread = t;
/*
* Tell the thread to set its affinity. This is
@@ -1511,6 +1521,183 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id)
return action;
}
+#ifdef CONFIG_IRQ_FORCED_THREADING
+/*
+ * Internal function to reconfigure an irqaction - change it to
+ * threaded mode if the specified task struct is not NULL and vice versa
+ */
+void __irq_reconfigure_action(struct irq_desc *desc, struct irqaction *action,
+ struct task_struct *t)
+{
+ action->flags &= ~IRQF_ONESHOT;
+ action->thread_mask = 0;
+ if (!t) {
+ if (action->thread_fn) {
+ action->handler = action->thread_fn;
+ action->thread_fn = NULL;
+ }
+ clear_bit(IRQTF_FORCED_THREAD, &action->thread_flags);
+ action->thread = NULL;
+ return;
+ }
+
+ /* Force the irq in threaded mode */
+ if (!action->thread_fn) {
+ action->thread_fn = action->handler;
+ action->handler = irq_default_primary_handler;
+ }
+
+ action->thread = t;
+ set_bit(IRQTF_FORCED_THREAD, &action->thread_flags);
+ set_bit(IRQTF_AFFINITY, &action->thread_flags);
+
+ if (!(desc->irq_data.chip->flags & IRQCHIP_ONESHOT_SAFE)) {
+ /*
+ * We already ensured no other action is registered on
+ * this irq
+ */
+ action->thread_mask = 1;
+ action->flags |= IRQF_ONESHOT;
+ desc->istate |= IRQS_ONESHOT;
+ }
+}
+
+/* Internal function to check if the specified irqaction can be threadable */
+static bool __irq_check_threadable(struct irq_desc *desc, struct irqaction *act)
+{
+ if (irq_settings_is_nested_thread(desc) ||
+ !irq_settings_can_thread(desc))
+ return false;
+
+ /*
+ * Enabling thread mode is going to set IRQF_ONESHOT, unless the irq
+ * chip will help us; in the first case the irq can't be shared: the
+ * only registered action can be the current one
+ */
+ if (desc->irq_data.chip->flags & IRQCHIP_ONESHOT_SAFE)
+ return true;
+ return !desc->action ||
+ (act && desc->action == act && act->next == NULL);
+}
+
+/* Internal function to configure the specified action threaded mode */
+int irq_reconfigure(unsigned int irq, struct irqaction *act, bool threaded)
+{
+ struct task_struct *thread = NULL, *old_thread = NULL;
+ struct irq_desc *desc = irq_to_desc(irq);
+ struct irqaction *action;
+ int retval = -EINVAL;
+ unsigned long flags;
+
+ /*
+ * Preallocate the kthread, so that we can update the action atomically
+ * later
+ */
+ if (threaded) {
+ old_thread = thread = create_irq_thread(act, irq, false);
+ if (IS_ERR(thread))
+ return PTR_ERR(thread);
+ }
+
+ disable_irq(irq);
+
+ chip_bus_lock(desc);
+ raw_spin_lock_irqsave(&desc->lock, flags);
+
+ /* Check for no-op under lock */
+ if (threaded == test_bit(IRQTF_FORCED_THREAD, &act->thread_flags))
+ goto unlock;
+
+ /* Even more pedantic check: look-up for our action */
+ for_each_action_of_desc(desc, action)
+ if (action->dev_id == act->dev_id)
+ break;
+ if (!action || action != act)
+ goto unlock;
+
+ /*
+ * Check again for threadable constraints: the action list/desc
+ * may have changed since the irq_set_threadable call
+ */
+ if (!__irq_check_threadable(desc, action))
+ goto unlock;
+
+ old_thread = action->thread;
+ __irq_reconfigure_action(desc, action, thread);
+
+ if (action->mode_notifier)
+ action->mode_notifier(action->irq, action->dev_id, thread);
+ retval = 0;
+
+unlock:
+ raw_spin_unlock_irqrestore(&desc->lock, flags);
+ chip_bus_sync_unlock(desc);
+
+ if (old_thread) {
+ kthread_stop(old_thread);
+ put_task_struct(old_thread);
+ }
+
+ if (retval)
+ pr_err("can't change configuration for irq %d: %d\n", irq,
+ retval);
+
+ enable_irq(irq);
+ return retval;
+}
+
+/**
+ * irq_set_mode_notifier - register a mode change notifier
+ * @irq: Interrupt line
+ * @dev_id: The cookie used to identify the irq handler and passed back
+ * to the notifier
+ * @mode_notifier: The callback to be registered
+ *
+ * This call registers a callback to notify the device about irq mode
+ * changes (threaded/normal mode). Mode changes are triggered by writing to
+ * the 'threaded' procfs entry.
+ * When running in threaded mode the irq thread task struct will be passed
+ * to the notifier, NULL otherwise. It's up to the device to update its
+ * internal state accordingly
+ */
+int irq_set_mode_notifier(unsigned int irq, void *dev_id,
+ mode_notifier_t notifier)
+{
+ struct irq_desc *desc = irq_to_desc(irq);
+ struct irqaction *action;
+ unsigned long flags;
+ int ret = -EINVAL;
+
+ if (!desc)
+ return ret;
+
+ chip_bus_lock(desc);
+ raw_spin_lock_irqsave(&desc->lock, flags);
+
+ for_each_action_of_desc(desc, action)
+ if (action->dev_id == dev_id)
+ break;
+
+ if (!action || action->mode_notifier)
+ goto out;
+
+ /*
+ * Sync current status, so that the device is fine if the irq has been
+ * reconfigured before the notifier is registered
+ */
+ action->mode_notifier = notifier;
+ if (notifier)
+ notifier(action->irq, action->dev_id, action->thread);
+ ret = 0;
+
+out:
+ raw_spin_unlock_irqrestore(&desc->lock, flags);
+ chip_bus_sync_unlock(desc);
+ return ret;
+}
+EXPORT_SYMBOL(irq_set_mode_notifier);
+#endif
+
/**
* remove_irq - free an interrupt
* @irq: Interrupt line to free
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 4e1b947..01f155b 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -302,6 +302,51 @@ static int name_unique(unsigned int irq, struct irqaction *new_action)
return ret;
}
+#ifdef CONFIG_IRQ_FORCED_THREADING
+static int irqaction_threaded_proc_show(struct seq_file *m, void *v)
+{
+ struct irqaction *action = (struct irqaction *)m->private;
+
+ seq_printf(m, "%d\n",
+ test_bit(IRQTF_FORCED_THREAD, &action->thread_flags));
+ return 0;
+}
+
+static int irqaction_threaded_proc_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, irqaction_threaded_proc_show, PDE_DATA(inode));
+}
+
+static ssize_t irqaction_threaded_proc_write(struct file *file,
+ const char __user *buf,
+ size_t count, loff_t *offs)
+{
+ struct irqaction *action = PDE_DATA(file_inode(file));
+ bool threaded;
+ size_t ret;
+
+ ret = kstrtobool_from_user(buf, count, &threaded);
+ if (ret)
+ return ret;
+
+ if (threaded == test_bit(IRQTF_FORCED_THREAD, &action->thread_flags))
+ goto out;
+
+ irq_reconfigure(action->irq, action, threaded);
+
+out:
+ return count;
+}
+
+static const struct file_operations irqaction_threaded_proc_fops = {
+ .open = irqaction_threaded_proc_open,
+ .read = seq_read,
+ .write = irqaction_threaded_proc_write,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+#endif
+
void register_handler_proc(unsigned int irq, struct irqaction *action)
{
char name [MAX_NAMELEN];
@@ -316,8 +361,14 @@ void register_handler_proc(unsigned int irq, struct irqaction *action)
/* create /proc/irq/1234/handler/ */
action->dir = proc_mkdir(name, desc->dir);
+#ifdef CONFIG_IRQ_FORCED_THREADING
+ if (action->dir)
+ proc_create_data("threaded", 0644, action->dir,
+ &irqaction_threaded_proc_fops, (void *)action);
+#endif
}
+
#undef MAX_NAMELEN
#define MAX_NAMELEN 10
--
1.8.3.1
* [PATCH 2/5] genirq: add flags for controlling the default threaded irq behavior
2016-06-15 13:42 [PATCH 0/5] genirq: threadable IRQ support Paolo Abeni
2016-06-15 13:42 ` [PATCH 1/5] genirq: implement support for runtime switch to threaded irqs Paolo Abeni
@ 2016-06-15 13:42 ` Paolo Abeni
2016-06-15 13:42 ` [PATCH 3/5] sched/preempt: cond_resched_softirq() must check for softirq Paolo Abeni
` (2 subsequent siblings)
4 siblings, 0 replies; 18+ messages in thread
From: Paolo Abeni @ 2016-06-15 13:42 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, David S. Miller, Eric Dumazet, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
A threadable irq can benefit from irq_set_affinity() when running
in non-threaded mode, while preferring to run unbound to any cpu when
in threaded mode. Setting the IRQF_TH_NO_AFFINITY flag at irq
registration allows the irq to achieve both behaviors.
A long running threaded irq can starve the system if scheduled under
SCHED_FIFO. Setting the IRQF_TH_SCHED_NORMAL flag will cause
the irq thread to run by default under the SCHED_NORMAL scheduler.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
include/linux/interrupt.h | 6 ++++++
kernel/irq/manage.c | 17 +++++++++++------
2 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 85d3738..33c3033 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -61,6 +61,10 @@
* interrupt handler after suspending interrupts. For system
* wakeup devices users need to implement wakeup detection in
* their interrupt handlers.
+ * IRQF_TH_SCHED_NORMAL - If the IRQ is threaded, it will use SCHED_NORMAL,
+ * instead of the default SCHED_FIFO scheduler
+ * IRQF_TH_NO_AFFINITY - If the IRQ is threaded, the affinity hint will not be
+ * enforced in the IRQ thread
*/
#define IRQF_SHARED 0x00000080
#define IRQF_PROBE_SHARED 0x00000100
@@ -74,6 +78,8 @@
#define IRQF_NO_THREAD 0x00010000
#define IRQF_EARLY_RESUME 0x00020000
#define IRQF_COND_SUSPEND 0x00040000
+#define IRQF_TH_SCHED_NORMAL 0x00080000
+#define IRQF_TH_NO_AFFINITY 0x00100000
#define IRQF_TIMER (__IRQF_TIMER | IRQF_NO_SUSPEND | IRQF_NO_THREAD)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index cce4efd..d695e12 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1055,9 +1055,7 @@ static struct task_struct *
create_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
{
struct task_struct *t;
- struct sched_param param = {
- .sched_priority = MAX_USER_RT_PRIO/2,
- };
+ struct sched_param param;
if (!secondary) {
t = kthread_create(irq_thread, new, "irq/%d-%s", irq,
@@ -1071,7 +1069,12 @@ create_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
if (IS_ERR(t))
return t;
- sched_setscheduler_nocheck(t, SCHED_FIFO, &param);
+ if (new->flags & IRQF_TH_SCHED_NORMAL) {
+ sched_setscheduler_nocheck(t, SCHED_NORMAL, &param);
+ } else {
+ param.sched_priority = MAX_USER_RT_PRIO/2;
+ sched_setscheduler_nocheck(t, SCHED_FIFO, &param);
+ }
/*
* We keep the reference to the task struct even if
@@ -1100,7 +1103,8 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
* correct as we want the thread to move to the cpu(s)
* on which the requesting code placed the interrupt.
*/
- set_bit(IRQTF_AFFINITY, &new->thread_flags);
+ if (!(new->flags & IRQF_TH_NO_AFFINITY))
+ set_bit(IRQTF_AFFINITY, &new->thread_flags);
return 0;
}
@@ -1549,7 +1553,8 @@ void __irq_reconfigure_action(struct irq_desc *desc, struct irqaction *action,
action->thread = t;
set_bit(IRQTF_FORCED_THREAD, &action->thread_flags);
- set_bit(IRQTF_AFFINITY, &action->thread_flags);
+ if (!(action->flags & IRQF_TH_NO_AFFINITY))
+ set_bit(IRQTF_AFFINITY, &action->thread_flags);
if (!(desc->irq_data.chip->flags & IRQCHIP_ONESHOT_SAFE)) {
/*
--
1.8.3.1
* [PATCH 3/5] sched/preempt: cond_resched_softirq() must check for softirq
2016-06-15 13:42 [PATCH 0/5] genirq: threadable IRQ support Paolo Abeni
2016-06-15 13:42 ` [PATCH 1/5] genirq: implement support for runtime switch to threaded irqs Paolo Abeni
2016-06-15 13:42 ` [PATCH 2/5] genirq: add flags for controlling the default threaded irq behavior Paolo Abeni
@ 2016-06-15 13:42 ` Paolo Abeni
2016-06-15 13:48 ` Peter Zijlstra
2016-06-15 13:42 ` [PATCH 4/5] netdev: implement infrastructure for threadable napi irq Paolo Abeni
2016-06-15 13:42 ` [PATCH 5/5] ixgbe: add support for threadable rx irq Paolo Abeni
4 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2016-06-15 13:42 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, David S. Miller, Eric Dumazet, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
Currently cond_resched_softirq() fails to reschedule if there
are pending softirqs but no other runnable process. This happens
e.g. when receiving an interrupt with local bh disabled.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
kernel/sched/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f2cae4..788625f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4837,7 +4837,8 @@ int __sched __cond_resched_softirq(void)
{
BUG_ON(!in_softirq());
- if (should_resched(SOFTIRQ_DISABLE_OFFSET)) {
+ if (should_resched(SOFTIRQ_DISABLE_OFFSET) ||
+ local_softirq_pending()) {
local_bh_enable();
preempt_schedule_common();
local_bh_disable();
--
1.8.3.1
* [PATCH 4/5] netdev: implement infrastructure for threadable napi irq
2016-06-15 13:42 [PATCH 0/5] genirq: threadable IRQ support Paolo Abeni
` (2 preceding siblings ...)
2016-06-15 13:42 ` [PATCH 3/5] sched/preempt: cond_resched_softirq() must check for softirq Paolo Abeni
@ 2016-06-15 13:42 ` Paolo Abeni
2016-06-15 14:12 ` kbuild test robot
2016-06-15 14:17 ` Eric Dumazet
2016-06-15 13:42 ` [PATCH 5/5] ixgbe: add support for threadable rx irq Paolo Abeni
4 siblings, 2 replies; 18+ messages in thread
From: Paolo Abeni @ 2016-06-15 13:42 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, David S. Miller, Eric Dumazet, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
This commit adds the infrastructure needed for threadable
rx interrupts. A reference to the irq thread is used to
mark the threaded irq mode.
In threaded mode the poll loop is invoked directly from
__napi_schedule().
napi drivers which want to support threadable irqs
must provide an irq mode change handler which sets
napi->thread, and register it after requesting the irq.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
include/linux/netdevice.h | 4 ++++
net/core/dev.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 63 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d101e4d..5da53be 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -322,6 +322,9 @@ struct napi_struct {
struct list_head dev_list;
struct hlist_node napi_hash_node;
unsigned int napi_id;
+#ifdef CONFIG_IRQ_FORCED_THREADING
+ struct task_struct *thread;
+#endif
};
enum {
@@ -330,6 +333,7 @@ enum {
NAPI_STATE_NPSVC, /* Netpoll - don't dequeue from poll_list */
NAPI_STATE_HASHED, /* In NAPI hash (busy polling possible) */
NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */
+ NAPI_STATE_SCHED_THREAD, /* The poll thread is scheduled */
};
enum gro_result {
diff --git a/net/core/dev.c b/net/core/dev.c
index b148357..40ea1e7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -93,6 +93,7 @@
#include <linux/etherdevice.h>
#include <linux/ethtool.h>
#include <linux/notifier.h>
+#include <linux/kthread.h>
#include <linux/skbuff.h>
#include <net/net_namespace.h>
#include <net/sock.h>
@@ -3453,10 +3454,68 @@ int netdev_tstamp_prequeue __read_mostly = 1;
int netdev_budget __read_mostly = 300;
int weight_p __read_mostly = 64; /* old backlog weight */
+#if CONFIG_IRQ_FORCED_THREADING
+static int napi_poll(struct napi_struct *n, struct list_head *repoll);
+
+static void napi_threaded_poll(struct napi_struct *napi)
+{
+ unsigned long time_limit = jiffies + 2;
+ struct list_head dummy_repoll;
+ int budget = netdev_budget;
+ bool again = true;
+
+ if (test_and_set_bit(NAPI_STATE_SCHED_THREAD, &napi->state))
+ return;
+
+ local_irq_enable();
+ INIT_LIST_HEAD(&dummy_repoll);
+
+ while (again) {
+ /* ensure that the poll list is not empty */
+ if (list_empty(&dummy_repoll))
+ list_add(&napi->poll_list, &dummy_repoll);
+
+ budget -= napi_poll(napi, &dummy_repoll);
+
+ if (napi_disable_pending(napi))
+ again = false;
+ else if (!test_bit(NAPI_STATE_SCHED, &napi->state))
+ again = false;
+ else if (kthread_should_stop())
+ again = false;
+
+ if (!again || unlikely(budget <= 0 ||
+ time_after_eq(jiffies, time_limit))) {
+ /* no need to reschedule if we are going to stop */
+ if (again)
+ cond_resched_softirq();
+ time_limit = jiffies + 2;
+ budget = netdev_budget;
+ rcu_bh_qs();
+ __kfree_skb_flush();
+ }
+ }
+
+ clear_bit(NAPI_STATE_SCHED_THREAD, &napi->state);
+ local_irq_disable();
+}
+
+static inline bool napi_is_threaded(struct napi_struct *napi)
+{
+ return current == napi->thread;
+}
+#else
+#define napi_is_threaded(napi) 0
+#endif
+
/* Called with irq disabled */
static inline void ____napi_schedule(struct softnet_data *sd,
struct napi_struct *napi)
{
+ if (napi_is_threaded(napi)) {
+ napi_threaded_poll(napi);
+ return;
+ }
list_add_tail(&napi->poll_list, &sd->poll_list);
__raise_softirq_irqoff(NET_RX_SOFTIRQ);
}
--
1.8.3.1
* [PATCH 5/5] ixgbe: add support for threadable rx irq
2016-06-15 13:42 [PATCH 0/5] genirq: threadable IRQ support Paolo Abeni
` (3 preceding siblings ...)
2016-06-15 13:42 ` [PATCH 4/5] netdev: implement infrastructure for threadable napi irq Paolo Abeni
@ 2016-06-15 13:42 ` Paolo Abeni
4 siblings, 0 replies; 18+ messages in thread
From: Paolo Abeni @ 2016-06-15 13:42 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, David S. Miller, Eric Dumazet, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
Plug in the threadable irq infrastructure to allow run-time
configuration of rx irqs, when MSI-X irqs are used.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 088c47c..d9a591c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2890,6 +2890,14 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
return 0;
}
+static void ixgbe_irq_mode_notifier(int irq, void *data,
+ struct task_struct *irq_thread)
+{
+ struct ixgbe_q_vector *q_vector = (struct ixgbe_q_vector *)data;
+
+ q_vector->napi.thread = irq_thread;
+}
+
/**
* ixgbe_request_msix_irqs - Initialize MSI-X interrupts
* @adapter: board private structure
@@ -2921,8 +2929,12 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
/* skip this unused q_vector */
continue;
}
- err = request_irq(entry->vector, &ixgbe_msix_clean_rings, 0,
+ err = request_irq(entry->vector, &ixgbe_msix_clean_rings,
+ IRQF_TH_NO_AFFINITY | IRQF_TH_SCHED_NORMAL,
q_vector->name, q_vector);
+ if (!err)
+ err = irq_set_mode_notifier(entry->vector, q_vector,
+ ixgbe_irq_mode_notifier);
if (err) {
e_err(probe, "request_irq failed for MSIX interrupt "
"Error: %d\n", err);
--
1.8.3.1
* Re: [PATCH 3/5] sched/preempt: cond_resched_softirq() must check for softirq
2016-06-15 13:42 ` [PATCH 3/5] sched/preempt: cond_resched_softirq() must check for softirq Paolo Abeni
@ 2016-06-15 13:48 ` Peter Zijlstra
2016-06-15 14:00 ` Paolo Abeni
0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-06-15 13:48 UTC (permalink / raw)
To: Paolo Abeni
Cc: linux-kernel, Thomas Gleixner, David S. Miller, Eric Dumazet,
Steven Rostedt, Ingo Molnar, Hannes Frederic Sowa, netdev
On Wed, Jun 15, 2016 at 03:42:04PM +0200, Paolo Abeni wrote:
> Currently cond_resched_softirq() fails to reschedule if there
> are pending softirq but no other running process. This happens
> i.e. when receiving an interrupt with local bh disabled.
>
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
All your patches appear to have this broken SoB chain.
As presented it suggests you wrote the patches, which matches with From,
however it then suggests Hannes collected and send them onwards, not so
much.
Please correct.
* Re: [PATCH 3/5] sched/preempt: cond_resched_softirq() must check for softirq
2016-06-15 13:48 ` Peter Zijlstra
@ 2016-06-15 14:00 ` Paolo Abeni
0 siblings, 0 replies; 18+ messages in thread
From: Paolo Abeni @ 2016-06-15 14:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Thomas Gleixner, David S. Miller, Eric Dumazet,
Steven Rostedt, Ingo Molnar, Hannes Frederic Sowa, netdev
On Wed, 2016-06-15 at 15:48 +0200, Peter Zijlstra wrote:
> On Wed, Jun 15, 2016 at 03:42:04PM +0200, Paolo Abeni wrote:
> > Currently cond_resched_softirq() fails to reschedule if there
> > are pending softirqs but no other running process. This happens
> > e.g. when receiving an interrupt with local bh disabled.
> >
> > Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>
> All your patches appear to have this broken SoB chain.
>
> As presented it suggests you wrote the patches, which matches with From,
> however it then suggests Hannes collected and send them onwards, not so
> much.
>
> Please correct.
My bad, I'll re-submit. The intention was to indicate that this is joint
work done together with Hannes.
Paolo
* Re: [PATCH 4/5] netdev: implement infrastructure for threadable napi irq
2016-06-15 13:42 ` [PATCH 4/5] netdev: implement infrastructure for threadable napi irq Paolo Abeni
@ 2016-06-15 14:12 ` kbuild test robot
2016-06-15 14:17 ` Eric Dumazet
1 sibling, 0 replies; 18+ messages in thread
From: kbuild test robot @ 2016-06-15 14:12 UTC (permalink / raw)
To: Paolo Abeni
Cc: kbuild-all, linux-kernel, Thomas Gleixner, David S. Miller,
Eric Dumazet, Steven Rostedt, Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
[-- Attachment #1: Type: text/plain, Size: 3438 bytes --]
Hi,
[auto build test ERROR on tip/irq/core]
[also build test ERROR on v4.7-rc3 next-20160615]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Paolo-Abeni/genirq-threadable-IRQ-support/20160615-214836
config: cris-etrax-100lx_v2_defconfig (attached as .config)
compiler: cris-linux-gcc (GCC) 4.6.3
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=cris
All error/warnings (new ones prefixed by >>):
>> net/core/dev.c:3457:5: warning: "CONFIG_IRQ_FORCED_THREADING" is not defined [-Wundef]
net/core/dev.c: In function '____napi_schedule':
>> net/core/dev.c:3516:3: error: implicit declaration of function 'napi_threaded_poll' [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
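The two diagnostics above share one root cause: the guard reads `#if CONFIG_IRQ_FORCED_THREADING`, which `-Wundef` flags when the symbol is not defined, and the resulting `#else` branch then hides `napi_threaded_poll()` from `____napi_schedule()`. As a hedged aside (plain userspace C, with a hypothetical `CONFIG_FOO` symbol standing in), this is why bool Kconfig symbols are guarded with `#ifdef`/`#if defined()` rather than `#if`:

```c
#include <assert.h>

/* CONFIG_FOO is deliberately left undefined here, mimicking a bool
 * Kconfig symbol that is =n on the failing cris configuration. */

#ifdef CONFIG_FOO          /* definedness test: safe under -Wundef */
static int feature_enabled(void) { return 1; }
#else
static int feature_enabled(void) { return 0; }
#endif

/* Writing `#if CONFIG_FOO` here instead would trip -Wundef, because
 * the preprocessor silently substitutes 0 for the unknown symbol. */
```

With the symbol undefined, the fallback branch is compiled and `feature_enabled()` returns 0.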
vim +/napi_threaded_poll +3516 net/core/dev.c
3451 EXPORT_SYMBOL(netdev_max_backlog);
3452
3453 int netdev_tstamp_prequeue __read_mostly = 1;
3454 int netdev_budget __read_mostly = 300;
3455 int weight_p __read_mostly = 64; /* old backlog weight */
3456
> 3457 #if CONFIG_IRQ_FORCED_THREADING
3458 static int napi_poll(struct napi_struct *n, struct list_head *repoll);
3459
3460 static void napi_threaded_poll(struct napi_struct *napi)
3461 {
3462 unsigned long time_limit = jiffies + 2;
3463 struct list_head dummy_repoll;
3464 int budget = netdev_budget;
3465 bool again = true;
3466
3467 if (test_and_set_bit(NAPI_STATE_SCHED_THREAD, &napi->state))
3468 return;
3469
3470 local_irq_enable();
3471 INIT_LIST_HEAD(&dummy_repoll);
3472
3473 while (again) {
3474 /* ensure that the poll list is not empty */
3475 if (list_empty(&dummy_repoll))
3476 list_add(&napi->poll_list, &dummy_repoll);
3477
3478 budget -= napi_poll(napi, &dummy_repoll);
3479
3480 if (napi_disable_pending(napi))
3481 again = false;
3482 else if (!test_bit(NAPI_STATE_SCHED, &napi->state))
3483 again = false;
3484 else if (kthread_should_stop())
3485 again = false;
3486
3487 if (!again || unlikely(budget <= 0 ||
3488 time_after_eq(jiffies, time_limit))) {
3489 /* no need to reschedule if we are going to stop */
3490 if (again)
3491 cond_resched_softirq();
3492 time_limit = jiffies + 2;
3493 budget = netdev_budget;
3494 rcu_bh_qs();
3495 __kfree_skb_flush();
3496 }
3497 }
3498
3499 clear_bit(NAPI_STATE_SCHED_THREAD, &napi->state);
3500 local_irq_disable();
3501 }
3502
3503 static inline bool napi_is_threaded(struct napi_struct *napi)
3504 {
3505 return current == napi->thread;
3506 }
3507 #else
3508 #define napi_is_threaded(napi) 0
3509 #endif
3510
3511 /* Called with irq disabled */
3512 static inline void ____napi_schedule(struct softnet_data *sd,
3513 struct napi_struct *napi)
3514 {
3515 if (napi_is_threaded(napi)) {
> 3516 napi_threaded_poll(napi);
3517 return;
3518 }
3519 list_add_tail(&napi->poll_list, &sd->poll_list);
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 8526 bytes --]
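As a side note on the `napi_threaded_poll()` excerpt above: the refill bookkeeping around `budget` and `netdev_budget` can be modeled in plain userspace C. This is only an illustrative sketch under assumptions (a fixed per-pass weight, the jiffies deadline elided, and all `fake_`/`run_` names invented here), not kernel code:

```c
#include <assert.h>

#define NETDEV_BUDGET 300   /* default netdev_budget in the excerpt */
#define POLL_WEIGHT    64   /* assume each poll pass consumes a full weight */

/* Stand-in for napi_poll(): pretend every pass uses the whole weight. */
static int fake_napi_poll(void)
{
    return POLL_WEIGHT;
}

/* Count how often the budget is refilled (i.e. how often the real loop
 * would reach the cond_resched_softirq()/rcu_bh_qs() block) while
 * running `passes` poll passes; the time_limit check is elided. */
static int run_poll_passes(int passes)
{
    int budget = NETDEV_BUDGET;
    int refills = 0;
    int i;

    for (i = 0; i < passes; i++) {
        budget -= fake_napi_poll();
        if (budget <= 0) {
            budget = NETDEV_BUDGET;   /* refill, as the excerpt does */
            refills++;
        }
    }
    return refills;
}
```

With a 300 budget and 64-weight passes, the fifth consecutive pass exhausts the budget and triggers a refill, which is where the real loop yields the CPU.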
* Re: [PATCH 4/5] netdev: implement infrastructure for threadable napi irq
2016-06-15 13:42 ` [PATCH 4/5] netdev: implement infrastructure for threadable napi irq Paolo Abeni
2016-06-15 14:12 ` kbuild test robot
@ 2016-06-15 14:17 ` Eric Dumazet
2016-06-15 14:21 ` Eric Dumazet
2016-06-15 16:42 ` Paolo Abeni
1 sibling, 2 replies; 18+ messages in thread
From: Eric Dumazet @ 2016-06-15 14:17 UTC (permalink / raw)
To: Paolo Abeni
Cc: LKML, Thomas Gleixner, David S. Miller, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
On Wed, Jun 15, 2016 at 6:42 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> This commit adds the infrastructure needed for threadable
> rx interrupts. A reference to the irq thread is used to
> mark the threaded irq mode.
> In threaded mode the poll loop is invoked directly from
> __napi_schedule().
> napi drivers which want to support threadable irqs
> must provide an irq mode change handler which actually sets
> napi->thread, and register it after requesting the irq.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
> include/linux/netdevice.h | 4 ++++
> net/core/dev.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 63 insertions(+)
>
I really appreciate the effort, but as I already said, this is not going to work.
Many NICs have two NAPI contexts per queue: one for TX, one for RX.
Relying on CFS to switch between the two 'threads' you need in the
one-vCPU case will add latencies that your 'pure throughput UDP flood'
test is not able to detect.
I was waiting for a fix from Andy Lutomirski to be merged before sending
my ksoftirqd fix, which will work and won't bring kernel bloat.
* Re: [PATCH 4/5] netdev: implement infrastructure for threadable napi irq
2016-06-15 14:17 ` Eric Dumazet
@ 2016-06-15 14:21 ` Eric Dumazet
2016-06-15 16:42 ` Paolo Abeni
1 sibling, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2016-06-15 14:21 UTC (permalink / raw)
To: Paolo Abeni
Cc: LKML, Thomas Gleixner, David S. Miller, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev, Andy Lutomirski
On Wed, Jun 15, 2016 at 7:17 AM, Eric Dumazet <edumazet@google.com> wrote:
> On Wed, Jun 15, 2016 at 6:42 AM, Paolo Abeni <pabeni@redhat.com> wrote:
>> This commit adds the infrastructure needed for threadable
>> rx interrupts. A reference to the irq thread is used to
>> mark the threaded irq mode.
>> In threaded mode the poll loop is invoked directly from
>> __napi_schedule().
>> napi drivers which want to support threadable irqs
>> must provide an irq mode change handler which actually sets
>> napi->thread, and register it after requesting the irq.
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> ---
>> include/linux/netdevice.h | 4 ++++
>> net/core/dev.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 63 insertions(+)
>>
>
> I really appreciate the effort, but as I already said, this is not going to work.
>
> Many NICs have two NAPI contexts per queue: one for TX, one for RX.
>
> Relying on CFS to switch between the two 'threads' you need in the
> one-vCPU case will add latencies that your 'pure throughput UDP flood'
> test is not able to detect.
>
> I was waiting for a fix from Andy Lutomirski to be merged before sending
> my ksoftirqd fix, which will work and won't bring kernel bloat.
Andy's patch was "x86/traps: Don't force in_interrupt() to return true
in IST handlers".
* Re: [PATCH 1/5] genirq: implement support for runtime switch to threaded irqs
2016-06-15 13:42 ` [PATCH 1/5] genirq: implement support for runtime switch to threaded irqs Paolo Abeni
@ 2016-06-15 14:50 ` kbuild test robot
0 siblings, 0 replies; 18+ messages in thread
From: kbuild test robot @ 2016-06-15 14:50 UTC (permalink / raw)
To: Paolo Abeni
Cc: kbuild-all, linux-kernel, Thomas Gleixner, David S. Miller,
Eric Dumazet, Steven Rostedt, Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
[-- Attachment #1: Type: text/plain, Size: 2510 bytes --]
Hi,
[auto build test WARNING on tip/irq/core]
[also build test WARNING on v4.7-rc3 next-20160615]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Paolo-Abeni/genirq-threadable-IRQ-support/20160615-214836
reproduce: make htmldocs
All warnings (new ones prefixed by >>):
>> kernel/irq/manage.c:1681: warning: No description found for parameter 'notifier'
>> kernel/irq/manage.c:1681: warning: Excess function parameter 'mode_notifier' description in 'irq_set_mode_notifier'
kernel/irq/handle.c:1: warning: no structured comments found
--
lib/crc32.c:148: warning: No description found for parameter 'tab)[256]'
lib/crc32.c:148: warning: Excess function parameter 'tab' description in 'crc32_le_generic'
lib/crc32.c:293: warning: No description found for parameter 'tab)[256]'
lib/crc32.c:293: warning: Excess function parameter 'tab' description in 'crc32_be_generic'
lib/crc32.c:1: warning: no structured comments found
mm/memory.c:2881: warning: No description found for parameter 'old'
>> kernel/irq/manage.c:1681: warning: No description found for parameter 'notifier'
>> kernel/irq/manage.c:1681: warning: Excess function parameter 'mode_notifier' description in 'irq_set_mode_notifier'
vim +/notifier +1681 kernel/irq/manage.c
1665 /**
1666 * irq_set_mode_notifier - register a mode change notifier
1667 * @irq: Interrupt line
1668 * @dev_id: The cookie used to identify the irq handler and passed back
1669 * to the notifier
1670 * @mode_notifier: The callback to be registered
1671 *
1672 * This call registers a callback to notify the device about irq mode
1673 * change (threaded/normal mode). Mode change are triggered writing on
1674 * the 'threaded' procfs entry.
1675 * When running in threaded mode the irq thread task struct will be passed
1676 * to the notifer, or NULL elsewhere. It's up to the device update its
1677 * internal state accordingly
1678 */
1679 int irq_set_mode_notifier(unsigned int irq, void *dev_id,
1680 mode_notifier_t notifier)
> 1681 {
1682 struct irq_desc *desc = irq_to_desc(irq);
1683 struct irqaction *action;
1684 unsigned long flags;
1685 int ret = -EINVAL;
1686
1687 if (!desc)
1688 return ret;
1689
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 6370 bytes --]
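The notifier contract documented in the `irq_set_mode_notifier()` excerpt above (the callback receives the irq thread's task struct in threaded mode and NULL otherwise, plus the `dev_id` cookie) can be mimicked with a small userspace model. Everything here is an assumption for illustration: the `fake_*` names are invented, a plain struct stands in for `task_struct`, and the real `mode_notifier_t` signature is not shown in full in the excerpt:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types in the excerpt above. */
struct fake_task { int unused; };
typedef void (*fake_mode_notifier_t)(void *dev_id, struct fake_task *thread);

struct fake_napi {
    struct fake_task *thread;   /* non-NULL marks threaded mode */
};

/* Driver-side callback: mirror the irq thread pointer into the
 * per-queue state, the way a NAPI driver would set napi->thread. */
static void fake_notifier(void *dev_id, struct fake_task *thread)
{
    struct fake_napi *napi = dev_id;
    napi->thread = thread;
}

/* genirq side of the model: on a mode switch, pass the irq thread in
 * threaded mode and NULL otherwise, per the comment quoted above. */
static void fake_switch_mode(fake_mode_notifier_t cb, void *dev_id,
                             struct fake_task *irq_thread, int threaded)
{
    cb(dev_id, threaded ? irq_thread : NULL);
}
```

After a threaded switch the per-queue state holds the thread pointer; after switching back it is NULL again, which is exactly how the series marks a NAPI instance as threaded.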
* Re: [PATCH 4/5] netdev: implement infrastructure for threadable napi irq
2016-06-15 14:17 ` Eric Dumazet
2016-06-15 14:21 ` Eric Dumazet
@ 2016-06-15 16:42 ` Paolo Abeni
2016-06-15 17:04 ` Eric Dumazet
1 sibling, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2016-06-15 16:42 UTC (permalink / raw)
To: Eric Dumazet
Cc: LKML, Thomas Gleixner, David S. Miller, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
On Wed, 2016-06-15 at 07:17 -0700, Eric Dumazet wrote:
> On Wed, Jun 15, 2016 at 6:42 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> > This commit adds the infrastructure needed for threadable
> > rx interrupts. A reference to the irq thread is used to
> > mark the threaded irq mode.
> > In threaded mode the poll loop is invoked directly from
> > __napi_schedule().
> > napi drivers which want to support threadable irqs
> > must provide an irq mode change handler which actually sets
> > napi->thread, and register it after requesting the irq.
> >
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > ---
> > include/linux/netdevice.h | 4 ++++
> > net/core/dev.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 63 insertions(+)
> >
>
> I really appreciate the effort, but as I already said, this is not going to work.
>
> Many NICs have two NAPI contexts per queue: one for TX, one for RX.
>
> Relying on CFS to switch between the two 'threads' you need in the
> one-vCPU case will add latencies that your 'pure throughput UDP flood'
> test is not able to detect.
We have done TCP_RR tests with similar results: when the throughput is
(guest) CPU-bound and multiple flows are used, there is a measurable
gain.
> I was waiting for a fix from Andy Lutomirski to be merged before sending
> my ksoftirqd fix, which will work and won't bring kernel bloat.
We experimented with that patch in this scenario, but it doesn't give a
measurable gain, since the ksoftirqd threads still prevent the qemu
process from using 100% of any of the hypervisor's cores.
Paolo
* Re: [PATCH 4/5] netdev: implement infrastructure for threadable napi irq
2016-06-15 16:42 ` Paolo Abeni
@ 2016-06-15 17:04 ` Eric Dumazet
2016-06-16 10:39 ` Paolo Abeni
0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2016-06-15 17:04 UTC (permalink / raw)
To: Paolo Abeni
Cc: LKML, Thomas Gleixner, David S. Miller, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
On Wed, Jun 15, 2016 at 9:42 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> On Wed, 2016-06-15 at 07:17 -0700, Eric Dumazet wrote:
>>
>> I really appreciate the effort, but as I already said, this is not going to work.
>>
>> Many NICs have two NAPI contexts per queue: one for TX, one for RX.
>>
>> Relying on CFS to switch between the two 'threads' you need in the
>> one-vCPU case will add latencies that your 'pure throughput UDP flood'
>> test is not able to detect.
>
> We have done TCP_RR tests with similar results: when the throughput is
> (guest) CPU-bound and multiple flows are used, there is a measurable
> gain.
TCP_RR hardly triggers the problem I am mentioning.
You need a combination of different competing workloads, both bulk and RPC-like.
The important factor for RPC is P99 latency.
Look, the simple fact that the mlx4 driver can dequeue 256 skbs per TX
napi poll but only 64 skbs per RX poll is problematic in some workloads,
since it allows a queue to build up on the RX rings.
>
>> I was waiting for a fix from Andy Lutomirski to be merged before sending
>> my ksoftirqd fix, which will work and won't bring kernel bloat.
>
> We experimented with that patch in this scenario, but it doesn't give a
> measurable gain, since the ksoftirqd threads still prevent the qemu
> process from using 100% of any of the hypervisor's cores.
Not sure what you measured, but in my experiment the user thread
could finally get a fair share of the core, instead of 0%.
The improvement was 100000% or so.
How are you making sure your thread uses, say, 1% of the core and leaves
99% to the 'qemu' process, exactly?
How will the typical user enable all this stuff, exactly?
All I am saying is that you are adding complex infrastructure that will
need a lot of tweaks and a considerable maintenance burden,
instead of fixing the existing one _first_.
* Re: [PATCH 4/5] netdev: implement infrastructure for threadable napi irq
2016-06-15 17:04 ` Eric Dumazet
@ 2016-06-16 10:39 ` Paolo Abeni
2016-06-16 11:19 ` Eric Dumazet
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2016-06-16 10:39 UTC (permalink / raw)
To: Eric Dumazet
Cc: LKML, Thomas Gleixner, David S. Miller, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
On Wed, 2016-06-15 at 10:04 -0700, Eric Dumazet wrote:
> On Wed, Jun 15, 2016 at 9:42 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> > On Wed, 2016-06-15 at 07:17 -0700, Eric Dumazet wrote:
>
> >>
> >> I really appreciate the effort, but as I already said, this is not going to work.
> >>
> >> Many NICs have two NAPI contexts per queue: one for TX, one for RX.
> >>
> >> Relying on CFS to switch between the two 'threads' you need in the
> >> one-vCPU case will add latencies that your 'pure throughput UDP flood'
> >> test is not able to detect.
> >
> > We have done TCP_RR tests with similar results: when the throughput is
> > (guest) CPU-bound and multiple flows are used, there is a measurable
> > gain.
>
> TCP_RR hardly triggers the problem I am mentioning.
>
> You need a combination of different competing workloads, both bulk and RPC-like.
>
> The important factor for RPC is P99 latency.
>
> Look, the simple fact that the mlx4 driver can dequeue 256 skbs per TX
> napi poll but only 64 skbs per RX poll is problematic in some workloads,
> since it allows a queue to build up on the RX rings.
>
> >
> >> I was waiting for a fix from Andy Lutomirski to be merged before sending
> >> my ksoftirqd fix, which will work and won't bring kernel bloat.
> >
> > We experimented with that patch in this scenario, but it doesn't give a
> > measurable gain, since the ksoftirqd threads still prevent the qemu
> > process from using 100% of any of the hypervisor's cores.
>
> Not sure what you measured, but in my experiment the user thread
> could finally get a fair share of the core, instead of 0%.
>
> The improvement was 100000% or so.
We used a different setup to explicitly avoid the (guest) userspace
starvation issue. Using a guest with 2 vCPUs (or more) and a single queue
avoids the starvation issue, because the scheduler moves the user space
processes to a vCPU different from the one running the ksoftirqd thread.
In the hypervisor, with a vanilla kernel, the qemu process receives a
fair share of the cpu time, but considerably less than 100%, and its
performance is bounded to a considerably lower throughput than the
theoretical one.
We tested your patch in both the guest and the hypervisor with the
above scenario, and it doesn't change the throughput numbers much. But it
nicely fixes the starvation issue on a single-core host, and we are
definitely in favor of it and waiting for it to be included.
> How are you making sure your thread uses, say, 1% of the core and leaves
> 99% to the 'qemu' process, exactly?
We allow the irq thread to be migrated. The scheduler can move it to a
different (hypervisor) core according to the workload, and qemu can
completely avoid competing with other processes for a cpu.
We are not using the threaded irqs in the guest, only in the
hypervisor.
> How will the typical user enable all this stuff, exactly?
A desktop host or a bare-metal server probably doesn't need/want it. A
hypervisor or a (small) router would probably enable irq threading on
all supported NICs. That could be managed by the tuned daemon or the
like, with an appropriate profile.
Advanced users, including real-time-sensitive ones, can simply use the
procfs entry now.
Kernels without IRQ_FORCED_THREADING are unaffected; kernels with
IRQ_FORCED_THREADING can already change packet reception (and more)
in a significant way with the forcedirq parameter.
Paolo
* Re: [PATCH 4/5] netdev: implement infrastructure for threadable napi irq
2016-06-16 10:39 ` Paolo Abeni
@ 2016-06-16 11:19 ` Eric Dumazet
2016-06-16 12:03 ` Paolo Abeni
0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2016-06-16 11:19 UTC (permalink / raw)
To: Paolo Abeni
Cc: LKML, Thomas Gleixner, David S. Miller, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
On Thu, Jun 16, 2016 at 3:39 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> We used a different setup to explicitly avoid the (guest) userspace
> starvation issue. Using a guest with 2 vCPUs (or more) and a single queue
> avoids the starvation issue, because the scheduler moves the user space
> processes to a vCPU different from the one running the ksoftirqd thread.
>
> In the hypervisor, with a vanilla kernel, the qemu process receives a
> fair share of the cpu time, but considerably less than 100%, and its
> performance is bounded to a considerably lower throughput than the
> theoretical one.
>
Completely different setup than last time. I am kind of lost.
Are you trying to find the optimal way to demonstrate your patch can be useful ?
In a case with 2 vCPUs, the _standard_ kernel will migrate the
user thread to the cpu not used by the IRQ,
once the process scheduler sees two threads competing on one cpu
(ksoftirqd and the user thread) and the other cpu sitting idle.
Trying to shift the IRQ 'thread' is not nice, since the hardware IRQ
will be delivered to the wrong cpu.
Unless user space forces cpu pinning? Then tell the user it should not.
The natural choice is to put both producer and consumer on the same cpu
for cache locality reasons (wake affine),
but in stress mode to allow the consumer to run on another cpu if available.
If the process scheduler fails to migrate the producer, then there is
a bug needing to be fixed.
Trying to migrate the producer, while hardware IRQs generally stick
to one cpu, is counter-intuitive and a source of reorders.
(Think of tunneling processing, re-injecting packets to the stack with
netif_rx())
* Re: [PATCH 4/5] netdev: implement infrastructure for threadable napi irq
2016-06-16 11:19 ` Eric Dumazet
@ 2016-06-16 12:03 ` Paolo Abeni
2016-06-16 16:55 ` Eric Dumazet
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2016-06-16 12:03 UTC (permalink / raw)
To: Eric Dumazet
Cc: LKML, Thomas Gleixner, David S. Miller, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
On Thu, 2016-06-16 at 04:19 -0700, Eric Dumazet wrote:
> On Thu, Jun 16, 2016 at 3:39 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> > We used a different setup to explicitly avoid the (guest) userspace
> > starvation issue. Using a guest with 2 vCPUs (or more) and a single queue
> > avoids the starvation issue, because the scheduler moves the user space
> > processes to a vCPU different from the one running the ksoftirqd thread.
> >
> > In the hypervisor, with a vanilla kernel, the qemu process receives a
> > fair share of the cpu time, but considerably less than 100%, and its
> > performance is bounded to a considerably lower throughput than the
> > theoretical one.
> >
>
> Completely different setup than last time. I am kind of lost.
>
> Are you trying to find the optimal way to demonstrate your patch can be useful ?
>
> In a case with 2 vCPUs, the _standard_ kernel will migrate the
> user thread to the cpu not used by the IRQ,
> once the process scheduler sees two threads competing on one cpu
> (ksoftirqd and the user thread) and the other cpu sitting idle.
>
> Trying to shift the IRQ 'thread' is not nice, since the hardware IRQ
> will be delivered to the wrong cpu.
>
> Unless user space forces cpu pinning? Then tell the user it should not.
>
> The natural choice is to put both producer and consumer on the same cpu
> for cache locality reasons (wake affine),
> but in stress mode to allow the consumer to run on another cpu if available.
>
> If the process scheduler fails to migrate the producer, then there is
> a bug needing to be fixed.
I guess you mean 'consumer' here. The scheduler doesn't fail to migrate
it: the consumer is actually migrated many times, but on each cpu it
finds a competing, running ksoftirqd thread.
The general problem is that under significant network load (not
necessarily UDP flood; similar behavior is observed even with TCP_RR
tests), with enough rx queues available and enough flows running, no
single thread/process can use 100% of any cpu, even if the overall
capacity would allow it.
Paolo
* Re: [PATCH 4/5] netdev: implement infrastructure for threadable napi irq
2016-06-16 12:03 ` Paolo Abeni
@ 2016-06-16 16:55 ` Eric Dumazet
0 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2016-06-16 16:55 UTC (permalink / raw)
To: Paolo Abeni
Cc: LKML, Thomas Gleixner, David S. Miller, Steven Rostedt,
Peter Zijlstra (Intel),
Ingo Molnar, Hannes Frederic Sowa, netdev
>
> I guess you mean 'consumer' here. The scheduler doesn't fail to migrate
> it: the consumer is actually migrated many times, but on each cpu it
> finds a competing, running ksoftirqd thread.
>
> The general problem is that under significant network load (not
> necessarily UDP flood; similar behavior is observed even with TCP_RR
> tests), with enough rx queues available and enough flows running, no
> single thread/process can use 100% of any cpu, even if the overall
> capacity would allow it.
>
Looks like a general process scheduler issue ?
Really, allowing the RX processing to be migrated among cpus is
problematic for TCP, as it will increase reorders.
RFS, for example, has very specific logic to avoid these problems as
much as possible.
/*
* If the desired CPU (where last recvmsg was done) is
* different from current CPU (one in the rx-queue flow
* table entry), switch if one of the following holds:
* - Current CPU is unset (>= nr_cpu_ids).
* - Current CPU is offline.
* - The current CPU's queue tail has advanced beyond the
* last packet that was enqueued using this table entry.
* This guarantees that all previous packets for the flow
* have been dequeued, thus preserving in order delivery.
*/
if (unlikely(tcpu != next_cpu) &&
(tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
((int)(per_cpu(softnet_data, tcpu).input_queue_head -
rflow->last_qtail)) >= 0)) {
tcpu = next_cpu;
rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
}
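Stripped of the per-cpu plumbing, the decision in the quoted RFS snippet reduces to a small pure predicate. The following userspace rendering is only a model of that logic, with simplified stand-in types and an invented `should_switch_cpu` name:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPU_IDS 8   /* arbitrary stand-in for nr_cpu_ids */

/* Model of the quoted check: switch the flow to the desired CPU only
 * when no packet enqueued via this flow entry can still be pending on
 * the current CPU, so in-order delivery is preserved. */
static bool should_switch_cpu(int cur_cpu, int next_cpu, bool cur_online,
                              int queue_head, int last_qtail)
{
    if (cur_cpu == next_cpu)
        return false;               /* already on the desired CPU */
    if (cur_cpu >= NR_CPU_IDS)      /* current CPU unset */
        return true;
    if (!cur_online)                /* current CPU offline */
        return true;
    /* queue tail advanced past the last packet for this flow */
    return (queue_head - last_qtail) >= 0;
}
```

The last condition is the interesting one: the switch is deferred while packets for the flow may still sit behind `last_qtail` on the old CPU, which is exactly the reordering hazard Eric describes.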
end of thread, other threads:[~2016-06-16 16:55 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-15 13:42 [PATCH 0/5] genirq: threadable IRQ support Paolo Abeni
2016-06-15 13:42 ` [PATCH 1/5] genirq: implement support for runtime switch to threaded irqs Paolo Abeni
2016-06-15 14:50 ` kbuild test robot
2016-06-15 13:42 ` [PATCH 2/5] genirq: add flags for controlling the default threaded irq behavior Paolo Abeni
2016-06-15 13:42 ` [PATCH 3/5] sched/preempt: cond_resched_softirq() must check for softirq Paolo Abeni
2016-06-15 13:48 ` Peter Zijlstra
2016-06-15 14:00 ` Paolo Abeni
2016-06-15 13:42 ` [PATCH 4/5] netdev: implement infrastructure for threadable napi irq Paolo Abeni
2016-06-15 14:12 ` kbuild test robot
2016-06-15 14:17 ` Eric Dumazet
2016-06-15 14:21 ` Eric Dumazet
2016-06-15 16:42 ` Paolo Abeni
2016-06-15 17:04 ` Eric Dumazet
2016-06-16 10:39 ` Paolo Abeni
2016-06-16 11:19 ` Eric Dumazet
2016-06-16 12:03 ` Paolo Abeni
2016-06-16 16:55 ` Eric Dumazet
2016-06-15 13:42 ` [PATCH 5/5] ixgbe: add support for threadable rx irq Paolo Abeni
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).