* [PATCH net-next-2.6 0/5] RFS hardware acceleration (v3)
From: Ben Hutchings @ 2011-01-19 20:59 UTC (permalink / raw)
  To: David Miller, Thomas Gleixner, Tom Herbert
  Cc: netdev, linux-net-drivers, linux-kernel

This patch series extends RFS to use hardware RX filters where
available.  Depending on the number of hardware RX queues and their
IRQs' affinity, this should reduce the need for IPIs or at least get
packets delivered to the right NUMA node.

The first patch implements IRQ affinity notifiers, based on the outline
that Thomas Gleixner wrote in response to the previous version of this
patch series.  It has been updated to address Thomas's comments on that
version.

The second patch generalises the CPU affinity reverse-mapping and adds
functions to maintain such a mapping based on the new IRQ affinity
notifiers.  It has been updated to address Eric Dumazet's comments.

The remaining patches add the RFS acceleration hooks and an
implementation in the sfc driver.  I have changed the sfc driver's
strategy for reclaiming filter table entries since the previous
version.  The table can now be scanned at the end of each NAPI polling
interval, paced by the rate at which filters are being added.
However, I haven't yet constructed a good test case that involves
turnover of flows, so I have yet to settle on a final strategy for
this.
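
For reference, the pacing check added in patch 5 is just this (the
threshold and quota of 100 are arbitrary placeholders for now):

	static inline void efx_filter_rfs_expire(struct efx_channel *channel)
	{
		if (channel->rfs_filters_added >= 100)
			channel->rfs_filters_added -=
				__efx_filter_rfs_expire(channel->efx, 100);
	}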

Ben.

Ben Hutchings (5):
  genirq: Add IRQ affinity notifiers
  lib: cpu_rmap: CPU affinity reverse-mapping
  net: RPS: Enable hardware acceleration of RFS
  sfc: Limit filter search depth further for performance hints (i.e.
    RFS)
  sfc: Implement hardware acceleration of RFS

 drivers/net/sfc/efx.c        |   49 ++++++++-
 drivers/net/sfc/efx.h        |   16 +++
 drivers/net/sfc/filter.c     |  107 ++++++++++++++++-
 drivers/net/sfc/net_driver.h |    3 +
 include/linux/cpu_rmap.h     |   73 ++++++++++++
 include/linux/interrupt.h    |   31 +++++
 include/linux/irqdesc.h      |    3 +
 include/linux/netdevice.h    |   33 +++++-
 kernel/irq/manage.c          |   82 +++++++++++++
 lib/Kconfig                  |    4 +
 lib/Makefile                 |    2 +
 lib/cpu_rmap.c               |  269 ++++++++++++++++++++++++++++++++++++++++++
 net/Kconfig                  |    6 +
 net/core/dev.c               |   97 ++++++++++++++-
 14 files changed, 760 insertions(+), 15 deletions(-)
 create mode 100644 include/linux/cpu_rmap.h
 create mode 100644 lib/cpu_rmap.c

-- 
1.7.3.4


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* [PATCH net-next-2.6 1/5] genirq: Add IRQ affinity notifiers
From: Ben Hutchings @ 2011-01-19 21:01 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: netdev, linux-net-drivers, Tom Herbert, David Miller

When initiating I/O on a multiqueue and multi-IRQ device, we may want
to select a queue for which the response will be handled on the same
or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add a
notification mechanism to support this.

This is based closely on work by Thomas Gleixner <tglx@linutronix.de>.
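
A minimal usage sketch (the foo_* names are illustrative and not part
of this patch):

	struct foo_irq_state {
		struct irq_affinity_notify notify;
		/* ... driver state derived from the IRQ's affinity ... */
	};

	static void foo_affinity_notify(struct irq_affinity_notify *notify,
					const cpumask_t *mask)
	{
		struct foo_irq_state *state =
			container_of(notify, struct foo_irq_state, notify);

		/* rebuild whatever steering state depends on *mask */
	}

	static void foo_affinity_release(struct kref *ref)
	{
		kfree(container_of(ref, struct foo_irq_state, notify.kref));
	}

	/* during setup, once the IRQ has been allocated: */
	state->notify.notify = foo_affinity_notify;
	state->notify.release = foo_affinity_release;
	rc = irq_set_affinity_notifier(irq, &state->notify);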

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
Thomas, I hope this answers all your comments.  If you are happy with
this, can it go into net-next-2.6 so the rest of this series can get
into 2.6.39?

Ben.

 include/linux/interrupt.h |   31 +++++++++++++++++
 include/linux/irqdesc.h   |    3 ++
 kernel/irq/manage.c       |   82 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 116 insertions(+), 0 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 55e0d42..9823e37 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -14,6 +14,8 @@
 #include <linux/smp.h>
 #include <linux/percpu.h>
 #include <linux/hrtimer.h>
+#include <linux/kref.h>
+#include <linux/workqueue.h>
 
 #include <asm/atomic.h>
 #include <asm/ptrace.h>
@@ -240,6 +242,35 @@ extern int irq_can_set_affinity(unsigned int irq);
 extern int irq_select_affinity(unsigned int irq);
 
 extern int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m);
+
+/**
+ * struct irq_affinity_notify - context for notification of IRQ affinity changes
+ * @irq:		Interrupt to which notification applies
+ * @kref:		Reference count, for internal use
+ * @work:		Work item, for internal use
+ * @notify:		Function to be called on change.  This will be
+ *			called in process context.
+ * @release:		Function to be called on release.  This will be
+ *			called in process context.  Once registered, the
+ *			structure must only be freed when this function is
+ *			called or later.
+ */
+struct irq_affinity_notify {
+	unsigned int irq;
+	struct kref kref;
+	struct work_struct work;
+	void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
+	void (*release)(struct kref *ref);
+};
+
+extern int
+irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);
+
+static inline void irq_run_affinity_notifiers(void)
+{
+	flush_scheduled_work();
+}
+
 #else /* CONFIG_SMP */
 
 static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m)
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 6a64c6f..b98ffa4 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -8,6 +8,7 @@
  * For now it's included from <linux/irq.h>
  */
 
+struct irq_affinity_notify;
 struct proc_dir_entry;
 struct timer_rand_state;
 /**
@@ -24,6 +25,7 @@ struct timer_rand_state;
  * @last_unhandled:	aging timer for unhandled count
  * @irqs_unhandled:	stats field for spurious unhandled interrupts
  * @lock:		locking for SMP
+ * @affinity_notify:	context for notification of affinity changes
  * @pending_mask:	pending rebalanced interrupts
  * @threads_active:	number of irqaction threads currently running
  * @wait_for_threads:	wait queue for sync_irq to wait for threaded handlers
@@ -70,6 +72,7 @@ struct irq_desc {
 	raw_spinlock_t		lock;
 #ifdef CONFIG_SMP
 	const struct cpumask	*affinity_hint;
+	struct irq_affinity_notify *affinity_notify;
 #ifdef CONFIG_GENERIC_PENDING_IRQ
 	cpumask_var_t		pending_mask;
 #endif
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 0caa59f..e48019e 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -134,6 +134,10 @@ int irq_set_affinity(unsigned int irq, const struct cpumask *cpumask)
 		irq_set_thread_affinity(desc);
 	}
 #endif
+	if (desc->affinity_notify) {
+		kref_get(&desc->affinity_notify->kref);
+		schedule_work(&desc->affinity_notify->work);
+	}
 	desc->status |= IRQ_AFFINITY_SET;
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 	return 0;
@@ -155,6 +159,79 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
 }
 EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
 
+static void irq_affinity_notify(struct work_struct *work)
+{
+	struct irq_affinity_notify *notify =
+		container_of(work, struct irq_affinity_notify, work);
+	struct irq_desc *desc = irq_to_desc(notify->irq);
+	cpumask_var_t cpumask;
+	unsigned long flags;
+
+	if (!desc)
+		goto out;
+
+	if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
+		goto out;
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+#ifdef CONFIG_GENERIC_PENDING_IRQ
+	if (desc->status & IRQ_MOVE_PENDING)
+		cpumask_copy(cpumask, desc->pending_mask);
+	else
+#endif
+		cpumask_copy(cpumask, desc->affinity);
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+	notify->notify(notify, cpumask);
+
+	free_cpumask_var(cpumask);
+out:
+	kref_put(&notify->kref, notify->release);
+}
+
+/**
+ *	irq_set_affinity_notifier - control notification of IRQ affinity changes
+ *	@irq:		Interrupt for which to enable/disable notification
+ *	@notify:	Context for notification, or %NULL to disable
+ *			notification.  Function pointers must be initialised;
+ *			the other fields will be initialised by this function.
+ *
+ *	Must be called in process context.  Notification may only be enabled
+ *	after the IRQ is allocated and must be disabled before the IRQ is
+ *	freed using free_irq().
+ */
+int
+irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+	struct irq_affinity_notify *old_notify;
+	unsigned long flags;
+
+	/* The release function is promised process context */
+	might_sleep();
+
+	if (!desc)
+		return -EINVAL;
+
+	/* Complete initialisation of *notify */
+	if (notify) {
+		notify->irq = irq;
+		kref_init(&notify->kref);
+		INIT_WORK(&notify->work, irq_affinity_notify);
+	}
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+	old_notify = desc->affinity_notify;
+	desc->affinity_notify = notify;
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+	if (old_notify)
+		kref_put(&old_notify->kref, old_notify->release);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(irq_set_affinity_notifier);
+
 #ifndef CONFIG_AUTO_IRQ_AFFINITY
 /*
  * Generic version of the affinity autoselector.
@@ -1004,6 +1081,11 @@ void free_irq(unsigned int irq, void *dev_id)
 	if (!desc)
 		return;
 
+#ifdef CONFIG_SMP
+	if (WARN_ON(desc->affinity_notify))
+		desc->affinity_notify = NULL;
+#endif
+
 	chip_bus_lock(desc);
 	kfree(__free_irq(irq, dev_id));
 	chip_bus_sync_unlock(desc);
-- 
1.7.3.4



-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* [PATCH net-next-2.6 2/5] lib: cpu_rmap: CPU affinity reverse-mapping
From: Ben Hutchings @ 2011-01-19 21:03 UTC (permalink / raw)
  To: David Miller, Thomas Gleixner
  Cc: netdev, linux-net-drivers, Tom Herbert, linux-kernel

When initiating I/O on a multiqueue and multi-IRQ device, we may want
to select a queue for which the response will be handled on the same
or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add
library functions to support a generic reverse-mapping from CPUs to
objects with affinity and the specific case where the objects are
IRQs.
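
A usage sketch for the IRQ case (the queue count and IRQ array are
illustrative):

	struct cpu_rmap *rmap;
	unsigned int i, queue;
	int rc;

	rmap = alloc_irq_cpu_rmap(n_rx_queues);
	if (!rmap)
		return -ENOMEM;
	for (i = 0; i < n_rx_queues; i++) {
		rc = irq_cpu_rmap_add(rmap, rx_irq[i]);
		if (rc) {
			free_irq_cpu_rmap(rmap);
			return rc;
		}
	}

	/* later, on the I/O submission path: pick the queue whose RX
	 * IRQ is handled on or nearest to the current CPU */
	queue = cpu_rmap_lookup_index(rmap, raw_smp_processor_id());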

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 include/linux/cpu_rmap.h |   73 +++++++++++++
 lib/Kconfig              |    4 +
 lib/Makefile             |    2 +
 lib/cpu_rmap.c           |  269 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 348 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/cpu_rmap.h
 create mode 100644 lib/cpu_rmap.c

diff --git a/include/linux/cpu_rmap.h b/include/linux/cpu_rmap.h
new file mode 100644
index 0000000..473771a
--- /dev/null
+++ b/include/linux/cpu_rmap.h
@@ -0,0 +1,73 @@
+/*
+ * cpu_rmap.c: CPU affinity reverse-map support
+ * Copyright 2011 Solarflare Communications Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+
+#include <linux/cpumask.h>
+#include <linux/gfp.h>
+#include <linux/slab.h>
+
+/**
+ * struct cpu_rmap - CPU affinity reverse-map
+ * @size: Number of objects to be reverse-mapped
+ * @used: Number of objects added
+ * @obj: Pointer to array of object pointers
+ * @near: For each CPU, the index and distance to the nearest object,
+ *      based on affinity masks
+ */
+struct cpu_rmap {
+	u16		size, used;
+	void		**obj;
+	struct {
+		u16	index;
+		u16	dist;
+	}		near[0];
+};
+#define CPU_RMAP_DIST_INF 0xffff
+
+extern struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags);
+
+/**
+ * free_cpu_rmap - free CPU affinity reverse-map
+ * @rmap: Reverse-map allocated with alloc_cpu_rmap(), or %NULL
+ */
+static inline void free_cpu_rmap(struct cpu_rmap *rmap)
+{
+	kfree(rmap);
+}
+
+extern int cpu_rmap_add(struct cpu_rmap *rmap, void *obj);
+extern int cpu_rmap_update(struct cpu_rmap *rmap, u16 index,
+			   const struct cpumask *affinity);
+
+static inline u16 cpu_rmap_lookup_index(struct cpu_rmap *rmap, unsigned int cpu)
+{
+	return rmap->near[cpu].index;
+}
+
+static inline void *cpu_rmap_lookup_obj(struct cpu_rmap *rmap, unsigned int cpu)
+{
+	return rmap->obj[rmap->near[cpu].index];
+}
+
+#ifdef CONFIG_GENERIC_HARDIRQS
+
+/**
+ * alloc_irq_cpu_rmap - allocate CPU affinity reverse-map for IRQs
+ * @size: Number of objects to be mapped
+ *
+ * Must be called in process context.
+ */
+static inline struct cpu_rmap *alloc_irq_cpu_rmap(unsigned int size)
+{
+	return alloc_cpu_rmap(size, GFP_KERNEL);
+}
+extern void free_irq_cpu_rmap(struct cpu_rmap *rmap);
+
+extern int irq_cpu_rmap_add(struct cpu_rmap *rmap, int irq);
+
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index 0ee67e0..8334342 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -201,6 +201,10 @@ config DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
        bool "Disable obsolete cpumask functions" if DEBUG_PER_CPU_MAPS
        depends on EXPERIMENTAL && BROKEN
 
+config CPU_RMAP
+	bool
+	depends on SMP
+
 #
 # Netlink attribute parsing support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index cbb774f..b73ba01 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -110,6 +110,8 @@ obj-$(CONFIG_ATOMIC64_SELFTEST) += atomic64_test.o
 
 obj-$(CONFIG_AVERAGE) += average.o
 
+obj-$(CONFIG_CPU_RMAP) += cpu_rmap.o
+
 hostprogs-y	:= gen_crc32table
 clean-files	:= crc32table.h
 
diff --git a/lib/cpu_rmap.c b/lib/cpu_rmap.c
new file mode 100644
index 0000000..987acfa
--- /dev/null
+++ b/lib/cpu_rmap.c
@@ -0,0 +1,269 @@
+/*
+ * cpu_rmap.c: CPU affinity reverse-map support
+ * Copyright 2011 Solarflare Communications Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+
+#include <linux/cpu_rmap.h>
+#ifdef CONFIG_GENERIC_HARDIRQS
+#include <linux/interrupt.h>
+#endif
+#include <linux/module.h>
+
+/*
+ * These functions maintain a mapping from CPUs to some ordered set of
+ * objects with CPU affinities.  This can be seen as a reverse-map of
+ * CPU affinity.  However, we do not assume that the object affinities
+ * cover all CPUs in the system.  For those CPUs not directly covered
+ * by object affinities, we attempt to find a nearest object based on
+ * CPU topology.
+ */
+
+/**
+ * alloc_cpu_rmap - allocate CPU affinity reverse-map
+ * @size: Number of objects to be mapped
+ * @flags: Allocation flags e.g. %GFP_KERNEL
+ */
+struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags)
+{
+	struct cpu_rmap *rmap;
+	unsigned int cpu;
+	size_t obj_offset;
+
+	/* This is a silly number of objects, and we use u16 indices. */
+	if (size > 0xffff)
+		return NULL;
+
+	/* Offset of object pointer array from base structure */
+	obj_offset = ALIGN(offsetof(struct cpu_rmap, near[nr_cpu_ids]),
+			   sizeof(void *));
+
+	rmap = kzalloc(obj_offset + size * sizeof(rmap->obj[0]), flags);
+	if (!rmap)
+		return NULL;
+
+	rmap->obj = (void **)((char *)rmap + obj_offset);
+
+	/* Initially assign CPUs to objects on a rota, since we have
+	 * no idea where the objects are.  Use infinite distance, so
+	 * any object with known distance is preferable.  Include the
+	 * CPUs that are not present/online, since we definitely want
+	 * any newly-hotplugged CPUs to have some object assigned.
+	 */
+	for_each_possible_cpu(cpu) {
+		rmap->near[cpu].index = cpu % size;
+		rmap->near[cpu].dist = CPU_RMAP_DIST_INF;
+	}
+
+	rmap->size = size;
+	return rmap;
+}
+EXPORT_SYMBOL(alloc_cpu_rmap);
+
+/* Reevaluate nearest object for given CPU, comparing with the given
+ * neighbours at the given distance.
+ */
+static bool cpu_rmap_copy_neigh(struct cpu_rmap *rmap, unsigned int cpu,
+				const struct cpumask *mask, u16 dist)
+{
+	int neigh;
+
+	for_each_cpu(neigh, mask) {
+		if (rmap->near[cpu].dist > dist &&
+		    rmap->near[neigh].dist <= dist) {
+			rmap->near[cpu].index = rmap->near[neigh].index;
+			rmap->near[cpu].dist = dist;
+			return true;
+		}
+	}
+	return false;
+}
+
+#ifdef DEBUG
+static void debug_print_rmap(const struct cpu_rmap *rmap, const char *prefix)
+{
+	unsigned index;
+	unsigned int cpu;
+
+	pr_info("cpu_rmap %p, %s:\n", rmap, prefix);
+
+	for_each_possible_cpu(cpu) {
+		index = rmap->near[cpu].index;
+		pr_info("cpu %d -> obj %u (distance %u)\n",
+			cpu, index, rmap->near[cpu].dist);
+	}
+}
+#else
+static inline void
+debug_print_rmap(const struct cpu_rmap *rmap, const char *prefix)
+{
+}
+#endif
+
+/**
+ * cpu_rmap_add - add object to a rmap
+ * @rmap: CPU rmap allocated with alloc_cpu_rmap()
+ * @obj: Object to add to rmap
+ *
+ * Return index of object.
+ */
+int cpu_rmap_add(struct cpu_rmap *rmap, void *obj)
+{
+	u16 index;
+
+	BUG_ON(rmap->used >= rmap->size);
+	index = rmap->used++;
+	rmap->obj[index] = obj;
+	return index;
+}
+EXPORT_SYMBOL(cpu_rmap_add);
+
+/**
+ * cpu_rmap_update - update CPU rmap following a change of object affinity
+ * @rmap: CPU rmap to update
+ * @index: Index of object whose affinity changed
+ * @affinity: New CPU affinity of object
+ */
+int cpu_rmap_update(struct cpu_rmap *rmap, u16 index,
+		    const struct cpumask *affinity)
+{
+	cpumask_var_t update_mask;
+	unsigned int cpu;
+
+	if (unlikely(!zalloc_cpumask_var(&update_mask, GFP_KERNEL)))
+		return -ENOMEM;
+
+	/* Invalidate distance for all CPUs for which this used to be
+	 * the nearest object.  Mark those CPUs for update.
+	 */
+	for_each_online_cpu(cpu) {
+		if (rmap->near[cpu].index == index) {
+			rmap->near[cpu].dist = CPU_RMAP_DIST_INF;
+			cpumask_set_cpu(cpu, update_mask);
+		}
+	}
+
+	debug_print_rmap(rmap, "after invalidating old distances");
+
+	/* Set distance to 0 for all CPUs in the new affinity mask.
+	 * Mark all CPUs within their NUMA nodes for update.
+	 */
+	for_each_cpu(cpu, affinity) {
+		rmap->near[cpu].index = index;
+		rmap->near[cpu].dist = 0;
+		cpumask_or(update_mask, update_mask,
+			   cpumask_of_node(cpu_to_node(cpu)));
+	}
+
+	debug_print_rmap(rmap, "after updating neighbours");
+
+	/* Update distances based on topology */
+	for_each_cpu(cpu, update_mask) {
+		if (cpu_rmap_copy_neigh(rmap, cpu,
+					topology_thread_cpumask(cpu), 1))
+			continue;
+		if (cpu_rmap_copy_neigh(rmap, cpu,
+					topology_core_cpumask(cpu), 2))
+			continue;
+		if (cpu_rmap_copy_neigh(rmap, cpu,
+					cpumask_of_node(cpu_to_node(cpu)), 3))
+			continue;
+		/* We could continue into NUMA node distances, but for now
+		 * we give up.
+		 */
+	}
+
+	debug_print_rmap(rmap, "after copying neighbours");
+
+	free_cpumask_var(update_mask);
+	return 0;
+}
+EXPORT_SYMBOL(cpu_rmap_update);
+
+#ifdef CONFIG_GENERIC_HARDIRQS
+
+/* Glue between IRQ affinity notifiers and CPU rmaps */
+
+struct irq_glue {
+	struct irq_affinity_notify notify;
+	struct cpu_rmap *rmap;
+	u16 index;
+};
+
+/**
+ * free_irq_cpu_rmap - free a CPU affinity reverse-map used for IRQs
+ * @rmap: Reverse-map allocated with alloc_irq_cpu_map(), or %NULL
+ *
+ * Must be called in process context, before freeing the IRQs, and
+ * without holding any locks required by global workqueue items.
+ */
+void free_irq_cpu_rmap(struct cpu_rmap *rmap)
+{
+	struct irq_glue *glue;
+	u16 index;
+
+	if (!rmap)
+		return;
+
+	for (index = 0; index < rmap->used; index++) {
+		glue = rmap->obj[index];
+		irq_set_affinity_notifier(glue->notify.irq, NULL);
+	}
+	irq_run_affinity_notifiers();
+
+	kfree(rmap);
+}
+EXPORT_SYMBOL(free_irq_cpu_rmap);
+
+static void
+irq_cpu_rmap_notify(struct irq_affinity_notify *notify, const cpumask_t *mask)
+{
+	struct irq_glue *glue =
+		container_of(notify, struct irq_glue, notify);
+	int rc;
+
+	rc = cpu_rmap_update(glue->rmap, glue->index, mask);
+	if (rc)
+		pr_warning("irq_cpu_rmap_notify: update failed: %d\n", rc);
+}
+
+static void irq_cpu_rmap_release(struct kref *ref)
+{
+	struct irq_glue *glue =
+		container_of(ref, struct irq_glue, notify.kref);
+	kfree(glue);
+}
+
+/**
+ * irq_cpu_rmap_add - add an IRQ to a CPU affinity reverse-map
+ * @rmap: The reverse-map
+ * @irq: The IRQ number
+ *
+ * This adds an IRQ affinity notifier that will update the reverse-map
+ * automatically.
+ *
+ * Must be called in process context, after the IRQ is allocated but
+ * before it is bound with request_irq().
+ */
+int irq_cpu_rmap_add(struct cpu_rmap *rmap, int irq)
+{
+	struct irq_glue *glue = kzalloc(sizeof(*glue), GFP_KERNEL);
+	int rc;
+
+	if (!glue)
+		return -ENOMEM;
+	glue->notify.notify = irq_cpu_rmap_notify;
+	glue->notify.release = irq_cpu_rmap_release;
+	glue->rmap = rmap;
+	glue->index = cpu_rmap_add(rmap, glue);
+	rc = irq_set_affinity_notifier(irq, &glue->notify);
+	if (rc)
+		kfree(glue);
+	return rc;
+}
+EXPORT_SYMBOL(irq_cpu_rmap_add);
+
+#endif /* CONFIG_GENERIC_HARDIRQS */
-- 
1.7.3.4



-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* [PATCH net-next-2.6 3/5] net: RPS: Enable hardware acceleration of RFS
From: Ben Hutchings @ 2011-01-19 21:03 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers, Tom Herbert

Allow drivers for multiqueue hardware with flow filter tables to
accelerate RFS.  The driver must:

1. Set net_device::rx_cpu_rmap to a cpu_rmap of the RX completion
IRQs (in queue order).  This will provide a mapping from CPUs to the
queues for which completions are handled nearest to them.

2. Implement net_device_ops::ndo_rx_flow_steer.  This operation adds
or replaces a filter steering the given flow to the given RX queue, if
possible.

3. Periodically remove filters for which rps_may_expire_flow() returns
true (see the sketch below).
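
As a sketch of steps 2 and 3 (all foo_* names are illustrative, not a
real driver):

	static int foo_rx_flow_steer(struct net_device *dev,
				     const struct sk_buff *skb,
				     u16 rxq_index, u32 flow_id)
	{
		int filter_id;

		/* insert or replace a hardware filter steering this
		 * flow to rxq_index, and remember flow_id against the
		 * returned filter ID so the filter can be expired */
		filter_id = foo_hw_insert_filter(dev, skb, rxq_index);
		if (filter_id >= 0)
			foo_record_flow_id(dev, filter_id, flow_id);
		return filter_id;
	}

	/* called periodically, e.g. from NAPI poll or a timer */
	static void foo_expire_filters(struct net_device *dev)
	{
		struct foo_filter *f;

		foo_for_each_filter(dev, f) {
			if (rps_may_expire_flow(dev, f->rxq_index,
						f->flow_id, f->filter_id))
				foo_hw_remove_filter(dev, f);
		}
	}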

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 include/linux/netdevice.h |   33 ++++++++++++++-
 net/Kconfig               |    6 +++
 net/core/dev.c            |   97 ++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 127 insertions(+), 9 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d971346..abe8b2d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -551,14 +551,16 @@ struct rps_map {
 #define RPS_MAP_SIZE(_num) (sizeof(struct rps_map) + (_num * sizeof(u16)))
 
 /*
- * The rps_dev_flow structure contains the mapping of a flow to a CPU and the
- * tail pointer for that CPU's input queue at the time of last enqueue.
+ * The rps_dev_flow structure contains the mapping of a flow to a CPU, the
+ * tail pointer for that CPU's input queue at the time of last enqueue, and
+ * a hardware filter index.
  */
 struct rps_dev_flow {
 	u16 cpu;
-	u16 fill;
+	u16 filter;
 	unsigned int last_qtail;
 };
+#define RPS_NO_FILTER 0xffff
 
 /*
  * The rps_dev_flow_table structure contains a table of flow mappings.
@@ -608,6 +610,11 @@ static inline void rps_reset_sock_flow(struct rps_sock_flow_table *table,
 
 extern struct rps_sock_flow_table __rcu *rps_sock_flow_table;
 
+#ifdef CONFIG_RFS_ACCEL
+extern bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index,
+				u32 flow_id, u16 filter_id);
+#endif
+
 /* This structure contains an instance of an RX queue. */
 struct netdev_rx_queue {
 	struct rps_map __rcu		*rps_map;
@@ -753,6 +760,13 @@ struct xps_dev_maps {
  * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
  *			  struct nlattr *port[]);
  * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
+ *
+ *	RFS acceleration.
+ * int (*ndo_rx_flow_steer)(struct net_device *dev, const struct sk_buff *skb,
+ *			    u16 rxq_index, u32 flow_id);
+ *	Set hardware filter for RFS.  rxq_index is the target queue index;
+ *	flow_id is a flow ID to be passed to rps_may_expire_flow() later.
+ *	Return the filter ID on success, or a negative error code.
  */
 #define HAVE_NET_DEVICE_OPS
 struct net_device_ops {
@@ -825,6 +839,12 @@ struct net_device_ops {
 	int			(*ndo_fcoe_get_wwn)(struct net_device *dev,
 						    u64 *wwn, int type);
 #endif
+#ifdef CONFIG_RFS_ACCEL
+	int			(*ndo_rx_flow_steer)(struct net_device *dev,
+						     const struct sk_buff *skb,
+						     u16 rxq_index,
+						     u32 flow_id);
+#endif
 };
 
 /*
@@ -1039,6 +1059,13 @@ struct net_device {
 
 	/* Number of RX queues currently active in device */
 	unsigned int		real_num_rx_queues;
+
+#ifdef CONFIG_RFS_ACCEL
+	/* CPU reverse-mapping for RX completion interrupts, indexed
+	 * by RX queue number.  Assigned by driver.  This must only be
+	 * set if the ndo_rx_flow_steer operation is defined. */
+	struct cpu_rmap		*rx_cpu_rmap;
+#endif
 #endif
 
 	rx_handler_func_t __rcu	*rx_handler;
diff --git a/net/Kconfig b/net/Kconfig
index 7284062..79cabf1 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -221,6 +221,12 @@ config RPS
 	depends on SMP && SYSFS && USE_GENERIC_SMP_HELPERS
 	default y
 
+config RFS_ACCEL
+	boolean
+	depends on RPS && GENERIC_HARDIRQS
+	select CPU_RMAP
+	default y
+
 config XPS
 	boolean
 	depends on SMP && SYSFS && USE_GENERIC_SMP_HELPERS
diff --git a/net/core/dev.c b/net/core/dev.c
index 7741507..3c18d9c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -132,6 +132,7 @@
 #include <trace/events/skb.h>
 #include <linux/pci.h>
 #include <linux/inetdevice.h>
+#include <linux/cpu_rmap.h>
 
 #include "net-sysfs.h"
 
@@ -2532,6 +2533,53 @@ EXPORT_SYMBOL(__skb_get_rxhash);
 struct rps_sock_flow_table __rcu *rps_sock_flow_table __read_mostly;
 EXPORT_SYMBOL(rps_sock_flow_table);
 
+static struct rps_dev_flow *
+set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
+	    struct rps_dev_flow *rflow, u16 next_cpu)
+{
+	u16 tcpu;
+
+	tcpu = rflow->cpu = next_cpu;
+	if (tcpu != RPS_NO_CPU) {
+#ifdef CONFIG_RFS_ACCEL
+		struct netdev_rx_queue *rxqueue;
+		struct rps_dev_flow_table *flow_table;
+		struct rps_dev_flow *old_rflow;
+		u32 flow_id;
+		u16 rxq_index;
+		int rc;
+
+		/* Should we steer this flow to a different hardware queue? */
+		if (!skb_rx_queue_recorded(skb) || !dev->rx_cpu_rmap)
+			goto out;
+		rxq_index = cpu_rmap_lookup_index(dev->rx_cpu_rmap, next_cpu);
+		if (rxq_index == skb_get_rx_queue(skb))
+			goto out;
+
+		rxqueue = dev->_rx + rxq_index;
+		flow_table = rcu_dereference(rxqueue->rps_flow_table);
+		if (!flow_table)
+			goto out;
+		flow_id = skb->rxhash & flow_table->mask;
+		rc = dev->netdev_ops->ndo_rx_flow_steer(dev, skb,
+							rxq_index, flow_id);
+		if (rc < 0)
+			goto out;
+		old_rflow = rflow;
+		rflow = &flow_table->flows[flow_id];
+		rflow->cpu = next_cpu;
+		rflow->filter = rc;
+		if (old_rflow->filter == rflow->filter)
+			old_rflow->filter = RPS_NO_FILTER;
+	out:
+#endif
+		rflow->last_qtail =
+			per_cpu(softnet_data, tcpu).input_queue_head;
+	}
+
+	return rflow;
+}
+
 /*
  * get_rps_cpu is called from netif_receive_skb and returns the target
  * CPU from the RPS map of the receiving queue for a given skb.
@@ -2602,12 +2650,9 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		if (unlikely(tcpu != next_cpu) &&
 		    (tcpu == RPS_NO_CPU || !cpu_online(tcpu) ||
 		     ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
-		      rflow->last_qtail)) >= 0)) {
-			tcpu = rflow->cpu = next_cpu;
-			if (tcpu != RPS_NO_CPU)
-				rflow->last_qtail = per_cpu(softnet_data,
-				    tcpu).input_queue_head;
-		}
+		      rflow->last_qtail)) >= 0))
+			rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
+
 		if (tcpu != RPS_NO_CPU && cpu_online(tcpu)) {
 			*rflowp = rflow;
 			cpu = tcpu;
@@ -2628,6 +2673,46 @@ done:
 	return cpu;
 }
 
+#ifdef CONFIG_RFS_ACCEL
+
+/**
+ * rps_may_expire_flow - check whether an RFS hardware filter may be removed
+ * @dev: Device on which the filter was set
+ * @rxq_index: RX queue index
+ * @flow_id: Flow ID passed to ndo_rx_flow_steer()
+ * @filter_id: Filter ID returned by ndo_rx_flow_steer()
+ *
+ * Drivers that implement ndo_rx_flow_steer() should periodically call
+ * this function for each installed filter and remove the filters for
+ * which it returns %true.
+ */
+bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index,
+			 u32 flow_id, u16 filter_id)
+{
+	struct netdev_rx_queue *rxqueue = dev->_rx + rxq_index;
+	struct rps_dev_flow_table *flow_table;
+	struct rps_dev_flow *rflow;
+	bool expire = true;
+	int cpu;
+
+	rcu_read_lock();
+	flow_table = rcu_dereference(rxqueue->rps_flow_table);
+	if (flow_table && flow_id <= flow_table->mask) {
+		rflow = &flow_table->flows[flow_id];
+		cpu = ACCESS_ONCE(rflow->cpu);
+		if (rflow->filter == filter_id && cpu != RPS_NO_CPU &&
+		    ((int)(per_cpu(softnet_data, cpu).input_queue_head -
+			   rflow->last_qtail) <
+		     (int)(10 * flow_table->mask)))
+			expire = false;
+	}
+	rcu_read_unlock();
+	return expire;
+}
+EXPORT_SYMBOL(rps_may_expire_flow);
+
+#endif /* CONFIG_RFS_ACCEL */
+
 /* Called from hardirq (IPI) context */
 static void rps_trigger_softirq(void *data)
 {
-- 
1.7.3.4



-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* [PATCH net-next-2.6 4/5] sfc: Limit filter search depth further for performance hints (i.e. RFS)
From: Ben Hutchings @ 2011-01-19 21:06 UTC (permalink / raw)
  To: Tom Herbert; +Cc: netdev, linux-net-drivers

---
I consider this still experimental, so it's not signed off.

Ben.

 drivers/net/sfc/filter.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/sfc/filter.c b/drivers/net/sfc/filter.c
index d4722c4..47a1b79 100644
--- a/drivers/net/sfc/filter.c
+++ b/drivers/net/sfc/filter.c
@@ -27,6 +27,10 @@
  */
 #define FILTER_CTL_SRCH_MAX 200
 
+/* Don't try very hard to find space for performance hints, as this is
+ * counter-productive. */
+#define FILTER_CTL_SRCH_HINT_MAX 5
+
 enum efx_filter_table_id {
 	EFX_FILTER_TABLE_RX_IP = 0,
 	EFX_FILTER_TABLE_RX_MAC,
@@ -325,15 +329,16 @@ static int efx_filter_search(struct efx_filter_table *table,
 			     struct efx_filter_spec *spec, u32 key,
 			     bool for_insert, int *depth_required)
 {
-	unsigned hash, incr, filter_idx, depth;
+	unsigned hash, incr, filter_idx, depth, depth_max;
 	struct efx_filter_spec *cmp;
 
 	hash = efx_filter_hash(key);
 	incr = efx_filter_increment(key);
+	depth_max = (spec->priority <= EFX_FILTER_PRI_HINT ?
+		     FILTER_CTL_SRCH_HINT_MAX : FILTER_CTL_SRCH_MAX);
 
 	for (depth = 1, filter_idx = hash & (table->size - 1);
-	     depth <= FILTER_CTL_SRCH_MAX &&
-		     test_bit(filter_idx, table->used_bitmap);
+	     depth <= depth_max && test_bit(filter_idx, table->used_bitmap);
 	     ++depth) {
 		cmp = &table->spec[filter_idx];
 		if (efx_filter_equal(spec, cmp))
@@ -342,7 +347,7 @@ static int efx_filter_search(struct efx_filter_table *table,
 	}
 	if (!for_insert)
 		return -ENOENT;
-	if (depth > FILTER_CTL_SRCH_MAX)
+	if (depth > depth_max)
 		return -EBUSY;
 found:
 	*depth_required = depth;
-- 
1.7.3.4



-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* [PATCH net-next-2.6 5/5] sfc: Implement hardware acceleration of RFS
From: Ben Hutchings @ 2011-01-19 21:06 UTC (permalink / raw)
  To: Tom Herbert; +Cc: netdev, linux-net-drivers

---
I consider this still experimental, so it's not signed off.

Ben.

 drivers/net/sfc/efx.c        |   49 +++++++++++++++++++++-
 drivers/net/sfc/efx.h        |   16 +++++++
 drivers/net/sfc/filter.c     |   94 ++++++++++++++++++++++++++++++++++++++++++
 drivers/net/sfc/net_driver.h |    3 +
 4 files changed, 160 insertions(+), 2 deletions(-)

diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index 002bac7..607316d 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -21,6 +21,7 @@
 #include <linux/ethtool.h>
 #include <linux/topology.h>
 #include <linux/gfp.h>
+#include <linux/cpu_rmap.h>
 #include "net_driver.h"
 #include "efx.h"
 #include "nic.h"
@@ -307,6 +308,8 @@ static int efx_poll(struct napi_struct *napi, int budget)
 			channel->irq_mod_score = 0;
 		}
 
+		efx_filter_rfs_expire(channel);
+
 		/* There is no race here; although napi_disable() will
 		 * only wait for napi_complete(), this isn't a problem
 		 * since efx_channel_processed() will have no effect if
@@ -1175,10 +1178,32 @@ static int efx_wanted_channels(void)
 	return count;
 }
 
+static int
+efx_init_rx_cpu_rmap(struct efx_nic *efx, struct msix_entry *xentries)
+{
+#ifdef CONFIG_RFS_ACCEL
+	int i, rc;
+
+	efx->net_dev->rx_cpu_rmap = alloc_irq_cpu_rmap(efx->n_rx_channels);
+	if (!efx->net_dev->rx_cpu_rmap)
+		return -ENOMEM;
+	for (i = 0; i < efx->n_rx_channels; i++) {
+		rc = irq_cpu_rmap_add(efx->net_dev->rx_cpu_rmap,
+				      xentries[i].vector);
+		if (rc) {
+			free_irq_cpu_rmap(efx->net_dev->rx_cpu_rmap);
+			efx->net_dev->rx_cpu_rmap = NULL;
+			return rc;
+		}
+	}
+#endif
+	return 0;
+}
+
 /* Probe the number and type of interrupts we are able to obtain, and
  * the resulting numbers of channels and RX queues.
  */
-static void efx_probe_interrupts(struct efx_nic *efx)
+static int efx_probe_interrupts(struct efx_nic *efx)
 {
 	int max_channels =
 		min_t(int, efx->type->phys_addr_channels, EFX_MAX_CHANNELS);
@@ -1220,6 +1245,11 @@ static void efx_probe_interrupts(struct efx_nic *efx)
 				efx->n_tx_channels = efx->n_channels;
 				efx->n_rx_channels = efx->n_channels;
 			}
+			rc = efx_init_rx_cpu_rmap(efx, xentries);
+			if (rc) {
+				pci_disable_msix(efx->pci_dev);
+				return rc;
+			}
 			for (i = 0; i < n_channels; i++)
 				efx_get_channel(efx, i)->irq =
 					xentries[i].vector;
@@ -1253,6 +1283,8 @@ static void efx_probe_interrupts(struct efx_nic *efx)
 		efx->n_tx_channels = 1;
 		efx->legacy_irq = efx->pci_dev->irq;
 	}
+
+	return 0;
 }
 
 static void efx_remove_interrupts(struct efx_nic *efx)
@@ -1302,7 +1334,9 @@ static int efx_probe_nic(struct efx_nic *efx)
 
 	/* Determine the number of channels and queues by trying to hook
 	 * in MSI-X interrupts. */
-	efx_probe_interrupts(efx);
+	rc = efx_probe_interrupts(efx);
+	if (rc)
+		goto fail;
 
 	if (efx->n_channels > 1)
 		get_random_bytes(&efx->rx_hash_key, sizeof(efx->rx_hash_key));
@@ -1317,6 +1351,10 @@ static int efx_probe_nic(struct efx_nic *efx)
 	efx_init_irq_moderation(efx, tx_irq_mod_usec, rx_irq_mod_usec, true);
 
 	return 0;
+
+fail:
+	efx->type->remove(efx);
+	return rc;
 }
 
 static void efx_remove_nic(struct efx_nic *efx)
@@ -1849,6 +1887,9 @@ static const struct net_device_ops efx_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller = efx_netpoll,
 #endif
+#ifdef CONFIG_RFS_ACCEL
+	.ndo_rx_flow_steer	= efx_filter_rfs,
+#endif
 };
 
 static void efx_update_name(struct efx_nic *efx)
@@ -2288,6 +2329,10 @@ static void efx_fini_struct(struct efx_nic *efx)
  */
 static void efx_pci_remove_main(struct efx_nic *efx)
 {
+#ifdef CONFIG_RFS_ACCEL
+	free_irq_cpu_rmap(efx->net_dev->rx_cpu_rmap);
+	efx->net_dev->rx_cpu_rmap = NULL;
+#endif
 	efx_nic_fini_interrupt(efx);
 	efx_fini_channels(efx);
 	efx_fini_port(efx);
diff --git a/drivers/net/sfc/efx.h b/drivers/net/sfc/efx.h
index d43a7e5..82c6c53 100644
--- a/drivers/net/sfc/efx.h
+++ b/drivers/net/sfc/efx.h
@@ -74,6 +74,22 @@ extern int efx_filter_remove_filter(struct efx_nic *efx,
 				    struct efx_filter_spec *spec);
 extern void efx_filter_clear_rx(struct efx_nic *efx,
 				enum efx_filter_priority priority);
+#ifdef CONFIG_RFS_ACCEL
+extern int efx_filter_rfs(struct net_device *net_dev, const struct sk_buff *skb,
+			  u16 rxq_index, u32 flow_id);
+extern unsigned __efx_filter_rfs_expire(struct efx_nic *efx, unsigned quota);
+static inline void efx_filter_rfs_expire(struct efx_channel *channel)
+{
+	if (channel->rfs_filters_added >= 100) {
+		channel->rfs_filters_added -=
+			__efx_filter_rfs_expire(channel->efx, 100);
+	}
+}
+#define efx_filter_rfs_enabled() 1
+#else
+static inline void efx_filter_rfs_expire(struct efx_channel *channel) {}
+#define efx_filter_rfs_enabled() 0
+#endif
 
 /* Channels */
 extern void efx_process_channel_now(struct efx_channel *channel);
diff --git a/drivers/net/sfc/filter.c b/drivers/net/sfc/filter.c
index 47a1b79..987cae4 100644
--- a/drivers/net/sfc/filter.c
+++ b/drivers/net/sfc/filter.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/in.h>
+#include <net/ip.h>
 #include "efx.h"
 #include "filter.h"
 #include "io.h"
@@ -51,6 +52,10 @@ struct efx_filter_table {
 struct efx_filter_state {
 	spinlock_t	lock;
 	struct efx_filter_table table[EFX_FILTER_TABLE_COUNT];
+#ifdef CONFIG_RFS_ACCEL
+	u32		*rps_flow_id;
+	unsigned	rps_expire_index;
+#endif
 };
 
 /* The filter hash function is LFSR polynomial x^16 + x^3 + 1 of a 32-bit
@@ -567,6 +572,13 @@ int efx_probe_filters(struct efx_nic *efx)
 	spin_lock_init(&state->lock);
 
 	if (efx_nic_rev(efx) >= EFX_REV_FALCON_B0) {
+#ifdef CONFIG_RFS_ACCEL
+		state->rps_flow_id = kcalloc(FR_BZ_RX_FILTER_TBL0_ROWS,
+					     sizeof(*state->rps_flow_id),
+					     GFP_KERNEL);
+		if (!state->rps_flow_id)
+			goto fail;
+#endif
 		table = &state->table[EFX_FILTER_TABLE_RX_IP];
 		table->id = EFX_FILTER_TABLE_RX_IP;
 		table->offset = FR_BZ_RX_FILTER_TBL0;
@@ -612,5 +624,87 @@ void efx_remove_filters(struct efx_nic *efx)
 		kfree(state->table[table_id].used_bitmap);
 		vfree(state->table[table_id].spec);
 	}
+#ifdef CONFIG_RFS_ACCEL
+	kfree(state->rps_flow_id);
+#endif
 	kfree(state);
 }
+
+#ifdef CONFIG_RFS_ACCEL
+
+int efx_filter_rfs(struct net_device *net_dev, const struct sk_buff *skb,
+		   u16 rxq_index, u32 flow_id)
+{
+	struct efx_nic *efx = netdev_priv(net_dev);
+	struct efx_channel *channel;
+	struct efx_filter_state *state = efx->filter_state;
+	struct efx_filter_spec spec;
+	const struct iphdr *ip;
+	const __be16 *ports;
+	int nhoff;
+	int rc;
+
+	nhoff = skb_network_offset(skb);
+
+	if (skb->protocol != htons(ETH_P_IP))
+		return -EPROTONOSUPPORT;
+
+	/* RFS must validate the IP header length before calling us */
+	EFX_BUG_ON_PARANOID(!pskb_may_pull(skb, nhoff + sizeof(*ip)));
+	ip = (const struct iphdr *)(skb->data + nhoff);
+	if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+		return -EPROTONOSUPPORT;
+	EFX_BUG_ON_PARANOID(!pskb_may_pull(skb, nhoff + 4 * ip->ihl + 4));
+	ports = (const __be16 *)(skb->data + nhoff + 4 * ip->ihl);
+
+	efx_filter_init_rx(&spec, EFX_FILTER_PRI_HINT, 0, rxq_index);
+	rc = efx_filter_set_ipv4_full(&spec, ip->protocol,
+				      ip->daddr, ports[1], ip->saddr, ports[0]);
+	if (rc)
+		return rc;
+
+	rc = efx_filter_insert_filter(efx, &spec, true);
+	if (rc < 0)
+		return rc;
+
+	/* Remember this so we can check whether to expire the filter later */
+	state->rps_flow_id[rc] = flow_id;
+	channel = efx_get_channel(efx, skb_get_rx_queue(skb));
+	++channel->rfs_filters_added;
+
+	return rc;
+}
+
+unsigned __efx_filter_rfs_expire(struct efx_nic *efx, unsigned quota)
+{
+	struct efx_filter_state *state = efx->filter_state;
+	struct efx_filter_table *table = &state->table[EFX_FILTER_TABLE_RX_IP];
+	unsigned mask = table->size - 1;
+	unsigned index;
+	unsigned stop;
+
+	if (!spin_trylock_bh(&state->lock))
+		return 0;
+
+	index = state->rps_expire_index;
+	stop = (index + quota) & mask;
+
+	while (index != stop) {
+		if (test_bit(index, table->used_bitmap) &&
+		    table->spec[index].priority == EFX_FILTER_PRI_HINT &&
+		    rps_may_expire_flow(efx->net_dev,
+					table->spec[index].dmaq_id,
+					state->rps_flow_id[index], index))
+			efx_filter_table_clear_entry(efx, table, index);
+		index = (index + 1) & mask;
+	}
+
+	state->rps_expire_index = stop;
+	if (table->used == 0)
+		efx_filter_table_reset_search_depth(table);
+
+	spin_unlock_bh(&state->lock);
+	return quota;
+}
+
+#endif /* CONFIG_RFS_ACCEL */
diff --git a/drivers/net/sfc/net_driver.h b/drivers/net/sfc/net_driver.h
index 28df866..b92f442 100644
--- a/drivers/net/sfc/net_driver.h
+++ b/drivers/net/sfc/net_driver.h
@@ -358,6 +358,9 @@ struct efx_channel {
 
 	unsigned int irq_count;
 	unsigned int irq_mod_score;
+#ifdef CONFIG_RFS_ACCEL
+	unsigned int rfs_filters_added;
+#endif
 
 	int rx_alloc_level;
 	int rx_alloc_push_pages;
-- 
1.7.3.4


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: [PATCH net-next-2.6 1/5] genirq: Add IRQ affinity notifiers
From: Thomas Gleixner @ 2011-01-19 21:19 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, linux-net-drivers, Tom Herbert, David Miller

On Wed, 19 Jan 2011, Ben Hutchings wrote:

> When initiating I/O on a multiqueue and multi-IRQ device, we may want
> to select a queue for which the response will be handled on the same
> or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add a
> notification mechanism to support this.
> 
> This is based closely on work by Thomas Gleixner <tglx@linutronix.de>.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> Thomas, I hope this answers all your comments.  If you are happy with
> this, can it go into net-next-2.6 so the rest of this series can get
> into 2.6.39?

Looks good, but I prefer to take it via the tip/genirq branch as I
have other changes pending to that code. I'll put this into a separate
branch based on rc1 as a single commit so Dave can pull that branch
into netdev. Dave ?

Thanks,

	tglx


* Re: [PATCH net-next-2.6 1/5] genirq: Add IRQ affinity notifiers
From: David Miller @ 2011-01-19 21:27 UTC (permalink / raw)
  To: tglx; +Cc: bhutchings, netdev, linux-net-drivers, therbert

From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 19 Jan 2011 22:19:04 +0100 (CET)

> On Wed, 19 Jan 2011, Ben Hutchings wrote:
> 
>> When initiating I/O on a multiqueue and multi-IRQ device, we may want
>> to select a queue for which the response will be handled on the same
>> or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add a
>> notification mechanism to support this.
>> 
>> This is based closely on work by Thomas Gleixner <tglx@linutronix.de>.
>> 
>> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
>> ---
>> Thomas, I hope this answers all your comments.  If you are happy with
>> this, can it go into net-next-2.6 so the rest of this series can get
>> into 2.6.39?
> 
> Looks good, but I prefer to take it via the tip/genirq branch as I
> have other changes pending to that code. I'll put this into a separate
> branch based on rc1 as a single commit so Dave can pull that branch
> into netdev. Dave ?

But if you do that what happens down the line when Linus pulls our trees
in?

If he takes your tip/genirq first, I have merge hassles to deal with.

If he takes the -rc1 relative version I end up with, you'll have the
merge hassles.


* Re: [PATCH net-next-2.6 1/5] genirq: Add IRQ affinity notifiers
From: Thomas Gleixner @ 2011-01-19 21:39 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, netdev, linux-net-drivers, therbert

On Wed, 19 Jan 2011, David Miller wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed, 19 Jan 2011 22:19:04 +0100 (CET)
> 
> > On Wed, 19 Jan 2011, Ben Hutchings wrote:
> > 
> >> When initiating I/O on a multiqueue and multi-IRQ device, we may want
> >> to select a queue for which the response will be handled on the same
> >> or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add a
> >> notification mechanism to support this.
> >> 
> >> This is based closely on work by Thomas Gleixner <tglx@linutronix.de>.
> >> 
> >> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> >> ---
> >> Thomas, I hope this answers all your comments.  If you are happy with
> >> this, can it go into net-next-2.6 so the rest of this series can get
> >> into 2.6.39?
> > 
> > Looks good, but I prefer to take it via the tip/genirq branch as I
> > have other changes pending to that code. I'll put this into a separate
> > branch based on rc1 as a single commit so Dave can pull that branch
> > into netdev. Dave ?
> 
> But if you do that what happens down the line when Linus pulls our trees
> in?
> 
> If he takes your tip/genirq first, I have merge hassles to deal with.
> 
> If he takes the -rc1 relative version I end up with, you'll have the
> merge hassles.
 
Nothing will happen and no hassles at all, because your tree and my
tree have the exact same commit with the exact same sha1. You don't
have further changes in your tree which touch genirq stuff and I don't
have anything which touches net.

Thanks,

	tglx


* Re: [PATCH net-next-2.6 1/5] genirq: Add IRQ affinity notifiers
From: David Miller @ 2011-01-19 21:48 UTC (permalink / raw)
  To: tglx; +Cc: bhutchings, netdev, linux-net-drivers, therbert

From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 19 Jan 2011 22:39:04 +0100 (CET)

> On Wed, 19 Jan 2011, David Miller wrote:
>> From: Thomas Gleixner <tglx@linutronix.de>
>> Date: Wed, 19 Jan 2011 22:19:04 +0100 (CET)
>> 
>> > On Wed, 19 Jan 2011, Ben Hutchings wrote:
>> > 
>> >> When initiating I/O on a multiqueue and multi-IRQ device, we may want
>> >> to select a queue for which the response will be handled on the same
>> >> or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add a
>> >> notification mechanism to support this.
>> >> 
>> >> This is based closely on work by Thomas Gleixner <tglx@linutronix.de>.
>> >> 
>> >> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
>> >> ---
>> >> Thomas, I hope this answers all your comments.  If you are happy with
>> >> this, can it go into net-next-2.6 so the rest of this series can get
>> >> into 2.6.39?
>> > 
>> > Looks good, but I prefer to take it via the tip/genirq branch as I
>> > have other changes pending to that code. I'll put this into a separate
>> > branch based on rc1 as a single commit so Dave can pull that branch
>> > into netdev. Dave ?
>> 
>> But if you do that what happens down the line when Linus pulls our trees
>> in?
>> 
>> If he takes your tip/genirq first, I have merge hassles to deal with.
>> 
>> If he takes the -rc1 relative version I end up with, you'll have the
>> merge hassles.
>  
> Nothing will happen and no hassles at all, because your tree and my
> tree have the exact same commit with the exact same sha1. You don't
> have further changes in your tree which touch genirq stuff and I don't
> have anything which touches net.

You said you had stuff before Ben's patches, and that's why you needed
to provide me with an -rc1 relative version of his commits.

If that's not the case, then yes it would work just fine.

Just give me the URL to pull from, thanks!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next-2.6 1/5] genirq: Add IRQ affinity notifiers
  2011-01-19 21:48         ` David Miller
@ 2011-01-19 21:53           ` Thomas Gleixner
  2011-01-22 16:38           ` Thomas Gleixner
  1 sibling, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2011-01-19 21:53 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, netdev, linux-net-drivers, therbert

On Wed, 19 Jan 2011, David Miller wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed, 19 Jan 2011 22:39:04 +0100 (CET)
> > Nothing will happen and no hassles at all, because your tree and my
> > tree have the exact same commit with the exact same sha1. You don't
> > have further changes in your tree which touch genirq stuff and I don't
> > have anything which touches net.
> 
> You said you had stuff before Ben's patches, and that's why you needed
> to provide me with an -rc1 relative version of his commits.

Sorry, I meant other stuff pending (not yet applied) which would
interfere with that patch. And even if I already had pending patches
in git, it would not matter:

    tip/irq/core     some stuff based on whatever
    tip/irq/for-net  single patch based on rc1

tip/irq/core merges tip/irq/for-net _before_ offering it to Linus

You pull tip/irq/for-net and nothing breaks :)
 
> If that's not the case, then yes it would work just fine.
> 
> Just give me the URL to pull from, thanks!

Will do, give me a day or two!

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [tip:irq/numa] genirq: Add IRQ affinity notifiers
  2011-01-19 21:02   ` Ben Hutchings
  (?)
  (?)
@ 2011-01-22 16:37   ` tip-bot for Ben Hutchings
  -1 siblings, 0 replies; 19+ messages in thread
From: tip-bot for Ben Hutchings @ 2011-01-22 16:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, tglx, therbert, bhutchings, davem

Commit-ID:  cd7eab44e9946c28d595abe3e9a43e945bc49141
Gitweb:     http://git.kernel.org/tip/cd7eab44e9946c28d595abe3e9a43e945bc49141
Author:     Ben Hutchings <bhutchings@solarflare.com>
AuthorDate: Wed, 19 Jan 2011 21:01:44 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jan 2011 17:34:25 +0100

genirq: Add IRQ affinity notifiers

When initiating I/O on a multiqueue and multi-IRQ device, we may want
to select a queue for which the response will be handled on the same
or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add a
notification mechanism to support this.

This is based closely on work by Thomas Gleixner <tglx@linutronix.de>.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: linux-net-drivers@solarflare.com
Cc: Tom Herbert <therbert@google.com>
Cc: David Miller <davem@davemloft.net>
LKML-Reference: <1295470904.11126.84.camel@bwh-desktop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/interrupt.h |   33 +++++++++++++++++-
 include/linux/irqdesc.h   |    3 ++
 kernel/irq/manage.c       |   82 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 117 insertions(+), 1 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 55e0d42..63c5ad7 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -14,6 +14,8 @@
 #include <linux/smp.h>
 #include <linux/percpu.h>
 #include <linux/hrtimer.h>
+#include <linux/kref.h>
+#include <linux/workqueue.h>
 
 #include <asm/atomic.h>
 #include <asm/ptrace.h>
@@ -240,6 +242,35 @@ extern int irq_can_set_affinity(unsigned int irq);
 extern int irq_select_affinity(unsigned int irq);
 
 extern int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m);
+
+/**
+ * struct irq_affinity_notify - context for notification of IRQ affinity changes
+ * @irq:		Interrupt to which notification applies
+ * @kref:		Reference count, for internal use
+ * @work:		Work item, for internal use
+ * @notify:		Function to be called on change.  This will be
+ *			called in process context.
+ * @release:		Function to be called on release.  This will be
+ *			called in process context.  Once registered, the
+ *			structure must only be freed when this function is
+ *			called or later.
+ */
+struct irq_affinity_notify {
+	unsigned int irq;
+	struct kref kref;
+	struct work_struct work;
+	void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
+	void (*release)(struct kref *ref);
+};
+
+extern int
+irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);
+
+static inline void irq_run_affinity_notifiers(void)
+{
+	flush_scheduled_work();
+}
+
 #else /* CONFIG_SMP */
 
 static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m)
@@ -255,7 +286,7 @@ static inline int irq_can_set_affinity(unsigned int irq)
 static inline int irq_select_affinity(unsigned int irq)  { return 0; }
 
 static inline int irq_set_affinity_hint(unsigned int irq,
-                                        const struct cpumask *m)
+					const struct cpumask *m)
 {
 	return -EINVAL;
 }
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index c1a95b7..bfef56d 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -8,6 +8,7 @@
  * For now it's included from <linux/irq.h>
  */
 
+struct irq_affinity_notify;
 struct proc_dir_entry;
 struct timer_rand_state;
 /**
@@ -24,6 +25,7 @@ struct timer_rand_state;
  * @last_unhandled:	aging timer for unhandled count
  * @irqs_unhandled:	stats field for spurious unhandled interrupts
  * @lock:		locking for SMP
+ * @affinity_notify:	context for notification of affinity changes
  * @pending_mask:	pending rebalanced interrupts
  * @threads_active:	number of irqaction threads currently running
  * @wait_for_threads:	wait queue for sync_irq to wait for threaded handlers
@@ -70,6 +72,7 @@ struct irq_desc {
 	raw_spinlock_t		lock;
 #ifdef CONFIG_SMP
 	const struct cpumask	*affinity_hint;
+	struct irq_affinity_notify *affinity_notify;
 #ifdef CONFIG_GENERIC_PENDING_IRQ
 	cpumask_var_t		pending_mask;
 #endif
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 0caa59f..0587c5c 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -134,6 +134,10 @@ int irq_set_affinity(unsigned int irq, const struct cpumask *cpumask)
 		irq_set_thread_affinity(desc);
 	}
 #endif
+	if (desc->affinity_notify) {
+		kref_get(&desc->affinity_notify->kref);
+		schedule_work(&desc->affinity_notify->work);
+	}
 	desc->status |= IRQ_AFFINITY_SET;
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 	return 0;
@@ -155,6 +159,79 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
 }
 EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
 
+static void irq_affinity_notify(struct work_struct *work)
+{
+	struct irq_affinity_notify *notify =
+		container_of(work, struct irq_affinity_notify, work);
+	struct irq_desc *desc = irq_to_desc(notify->irq);
+	cpumask_var_t cpumask;
+	unsigned long flags;
+
+	if (!desc)
+		goto out;
+
+	if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
+		goto out;
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+#ifdef CONFIG_GENERIC_PENDING_IRQ
+	if (desc->status & IRQ_MOVE_PENDING)
+		cpumask_copy(cpumask, desc->pending_mask);
+	else
+#endif
+		cpumask_copy(cpumask, desc->affinity);
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+	notify->notify(notify, cpumask);
+
+	free_cpumask_var(cpumask);
+out:
+	kref_put(&notify->kref, notify->release);
+}
+
+/**
+ *	irq_set_affinity_notifier - control notification of IRQ affinity changes
+ *	@irq:		Interrupt for which to enable/disable notification
+ *	@notify:	Context for notification, or %NULL to disable
+ *			notification.  Function pointers must be initialised;
+ *			the other fields will be initialised by this function.
+ *
+ *	Must be called in process context.  Notification may only be enabled
+ *	after the IRQ is allocated and must be disabled before the IRQ is
+ *	freed using free_irq().
+ */
+int
+irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+	struct irq_affinity_notify *old_notify;
+	unsigned long flags;
+
+	/* The release function is promised process context */
+	might_sleep();
+
+	if (!desc)
+		return -EINVAL;
+
+	/* Complete initialisation of *notify */
+	if (notify) {
+		notify->irq = irq;
+		kref_init(&notify->kref);
+		INIT_WORK(&notify->work, irq_affinity_notify);
+	}
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+	old_notify = desc->affinity_notify;
+	desc->affinity_notify = notify;
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+	if (old_notify)
+		kref_put(&old_notify->kref, old_notify->release);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(irq_set_affinity_notifier);
+
 #ifndef CONFIG_AUTO_IRQ_AFFINITY
 /*
  * Generic version of the affinity autoselector.
@@ -1004,6 +1081,11 @@ void free_irq(unsigned int irq, void *dev_id)
 	if (!desc)
 		return;
 
+#ifdef CONFIG_SMP
+	if (WARN_ON(desc->affinity_notify))
+		desc->affinity_notify = NULL;
+#endif
+
 	chip_bus_lock(desc);
 	kfree(__free_irq(irq, dev_id));
 	chip_bus_sync_unlock(desc);
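
For driver authors following along, here is a minimal usage sketch.
The my_dev_* names are hypothetical; struct irq_affinity_notify,
irq_set_affinity_notifier() and irq_run_affinity_notifiers() are as
added by the patch above.

#include <linux/interrupt.h>
#include <linux/kref.h>
#include <linux/slab.h>

struct my_dev_irq_ctx {
	struct irq_affinity_notify notify;	/* must outlive registration */
	/* ... per-IRQ driver state ... */
};

/* Called in process context after the IRQ's affinity has changed */
static void my_dev_affinity_changed(struct irq_affinity_notify *notify,
				    const cpumask_t *mask)
{
	struct my_dev_irq_ctx *ctx =
		container_of(notify, struct my_dev_irq_ctx, notify);
	/* e.g. update a CPU-to-queue reverse map from *mask */
}

/* Called when the last reference is dropped; safe to free now */
static void my_dev_affinity_release(struct kref *ref)
{
	kfree(container_of(ref, struct my_dev_irq_ctx, notify.kref));
}

static int my_dev_enable_irq_notify(unsigned int irq)
{
	struct my_dev_irq_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
	int rc;

	if (!ctx)
		return -ENOMEM;
	/* Only the function pointers need initialising by the caller;
	 * irq, kref and work are filled in by the core. */
	ctx->notify.notify = my_dev_affinity_changed;
	ctx->notify.release = my_dev_affinity_release;
	rc = irq_set_affinity_notifier(irq, &ctx->notify);
	if (rc)
		kfree(ctx);
	return rc;
}

On teardown the notifier must be removed before the IRQ is freed,
matching the WARN_ON added to free_irq() above: call
irq_set_affinity_notifier(irq, NULL), then irq_run_affinity_notifiers()
to flush any notification still queued, after which the release
function drops the last reference.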

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next-2.6 1/5] genirq: Add IRQ affinity notifiers
  2011-01-19 21:48         ` David Miller
  2011-01-19 21:53           ` Thomas Gleixner
@ 2011-01-22 16:38           ` Thomas Gleixner
  2011-02-07 20:44             ` Ben Hutchings
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2011-01-22 16:38 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, netdev, linux-net-drivers, therbert

On Wed, 19 Jan 2011, David Miller wrote:
> You said you had stuff before Ben's patches, and that's why you needed
> to provide me with an -rc1 relative version of his commits.

I hope an rc2 relative version works as well :)
 
> If that's not the case, then yes it would work just fine.
> 
> Just give me the URL to pull from, thanks!

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git irq/numa

@Ben: Your patch was pretty badly whitespace-damaged. Please be more
careful next time.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next-2.6 1/5] genirq: Add IRQ affinity notifiers
  2011-01-19 21:02   ` Ben Hutchings
                     ` (2 preceding siblings ...)
  (?)
@ 2011-01-24 23:01   ` David Miller
  -1 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2011-01-24 23:01 UTC (permalink / raw)
  To: bhutchings; +Cc: tglx, netdev, linux-net-drivers, therbert

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 19 Jan 2011 21:01:44 +0000

> When initiating I/O on a multiqueue and multi-IRQ device, we may want
> to select a queue for which the response will be handled on the same
> or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add a
> notification mechanism to support this.
> 
> This is based closely on work by Thomas Gleixner <tglx@linutronix.de>.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> Thomas, I hope this answers all your comments.  If you are happy with
> this, can it go into net-next-2.6 so the rest of this series can get
> into 2.6.39?

I've pulled from Thomas's tree to get this commit into net-next-2.6,
thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next-2.6 2/5] lib: cpu_rmap: CPU affinity reverse-mapping
  2011-01-19 21:03 ` [PATCH net-next-2.6 2/5] lib: cpu_rmap: CPU affinity reverse-mapping Ben Hutchings
@ 2011-01-24 23:01   ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2011-01-24 23:01 UTC (permalink / raw)
  To: bhutchings; +Cc: tglx, netdev, linux-net-drivers, therbert, linux-kernel

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 19 Jan 2011 21:03:25 +0000

> When initiating I/O on a multiqueue and multi-IRQ device, we may want
> to select a queue for which the response will be handled on the same
> or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add
> library functions to support a generic reverse-mapping from CPUs to
> objects with affinity and the specific case where the objects are
> IRQs.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.
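
As used by the later patches in this series, the IRQ flavour of the
library builds a per-device reverse map from CPUs to RX queues.  A
short sketch follows (the my_dev_* helper is hypothetical;
alloc_irq_cpu_rmap(), irq_cpu_rmap_add() and free_irq_cpu_rmap() are
from the new lib/cpu_rmap.c):

#include <linux/cpu_rmap.h>

/* Build a reverse map with one entry per RX completion IRQ, in
 * queue order. */
static struct cpu_rmap *my_dev_alloc_rx_rmap(const int *rx_irqs,
					     unsigned int n_rx)
{
	struct cpu_rmap *rmap;
	unsigned int i;

	rmap = alloc_irq_cpu_rmap(n_rx);
	if (!rmap)
		return NULL;

	for (i = 0; i < n_rx; i++) {
		/* Registers an affinity notifier for each IRQ, so the
		 * map follows affinity changes automatically. */
		if (irq_cpu_rmap_add(rmap, rx_irqs[i]) < 0) {
			free_irq_cpu_rmap(rmap);
			return NULL;
		}
	}
	return rmap;
}

A lookup such as cpu_rmap_lookup_index(rmap, cpu) then returns the
index of the queue whose completion IRQ is handled on, or nearest to,
the given CPU.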

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next-2.6 3/5] net: RPS: Enable hardware acceleration of RFS
  2011-01-19 21:03 ` [PATCH net-next-2.6 3/5] net: RPS: Enable hardware acceleration of RFS Ben Hutchings
@ 2011-01-24 23:01   ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2011-01-24 23:01 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers, therbert

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 19 Jan 2011 21:03:53 +0000

> Allow drivers for multiqueue hardware with flow filter tables to
> accelerate RFS.  The driver must:
> 
> 1. Set net_device::rx_cpu_rmap to a cpu_rmap of the RX completion
> IRQs (in queue order).  This will provide a mapping from CPUs to the
> queues for which completions are handled nearest to them.
> 
> 2. Implement net_device_ops::ndo_rx_flow_steer.  This operation adds
> or replaces a filter steering the given flow to the given RX queue, if
> possible.
> 
> 3. Periodically remove filters for which rps_may_expire_flow() returns
> true.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.
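
Pulling the three requirements quoted above together, here is a rough
driver-side sketch.  All my_* names and the filter bookkeeping are
hypothetical; ndo_rx_flow_steer() and rps_may_expire_flow() follow the
interface added by this patch, and the sfc changes in patch 5/5 are
the real reference implementation.

#include <linux/list.h>
#include <linux/netdevice.h>

struct my_filter {
	struct list_head link;
	u16 rxq_index;
	u32 flow_id;
	u16 filter_id;	/* value we returned from ndo_rx_flow_steer */
};

struct my_priv {
	struct list_head active_filters;
	/* ... */
};

/* Hypothetical hardware helpers, not shown */
int my_hw_filter_insert(struct my_priv *priv, const struct sk_buff *skb,
			u16 rxq_index, u32 flow_id);
void my_hw_filter_remove(struct my_priv *priv, struct my_filter *f);

/* Requirement 2: steer the flow in skb to RX queue rxq_index; return
 * the new filter's ID (>= 0) or a negative error. */
static int my_rx_flow_steer(struct net_device *dev,
			    const struct sk_buff *skb,
			    u16 rxq_index, u32 flow_id)
{
	/* Parse the flow's headers, program the NIC filter table and
	 * record (rxq_index, flow_id, filter_id) for expiry checks. */
	return my_hw_filter_insert(netdev_priv(dev), skb,
				   rxq_index, flow_id);
}

/* Requirement 3: periodically drop filters the stack no longer needs */
static void my_filter_expire_scan(struct net_device *dev)
{
	struct my_priv *priv = netdev_priv(dev);
	struct my_filter *f, *tmp;

	list_for_each_entry_safe(f, tmp, &priv->active_filters, link) {
		if (rps_may_expire_flow(dev, f->rxq_index,
					f->flow_id, f->filter_id))
			my_hw_filter_remove(priv, f); /* unlinks, frees f */
	}
}

Requirement 1 is then a single assignment at probe time,
dev->rx_cpu_rmap = <the cpu_rmap built as sketched under patch 2/5>,
before the device is brought up.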

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next-2.6 4/5] sfc: Limit filter search depth further for performance hints (i.e. RFS)
  2011-01-19 21:06 ` [PATCH net-next-2.6 4/5] sfc: Limit filter search depth further for performance hints (i.e. RFS) Ben Hutchings
@ 2011-01-24 23:02   ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2011-01-24 23:02 UTC (permalink / raw)
  To: bhutchings; +Cc: therbert, netdev, linux-net-drivers


Please resubmit #4 and #5 when you are confident enough to sign off on
them, then I'll add them to net-next-2.6

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next-2.6 1/5] genirq: Add IRQ affinity notifiers
  2011-01-22 16:38           ` Thomas Gleixner
@ 2011-02-07 20:44             ` Ben Hutchings
  0 siblings, 0 replies; 19+ messages in thread
From: Ben Hutchings @ 2011-02-07 20:44 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: David Miller, netdev, linux-net-drivers, therbert

On Sat, 2011-01-22 at 17:38 +0100, Thomas Gleixner wrote:
> On Wed, 19 Jan 2011, David Miller wrote:
> > You said you had stuff before Ben's patches, and that's why you needed
> > to provide me with an -rc1 relative version of his commits.
> 
> I hope a rc2 relative version works as well :)
>  
> > If that's not the case, then yes it would work just fine.
> > 
> > Just give me the URL to pull from, thanks!
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git irq/numa
> 
> @Ben: Your patch was pretty badly whitespace-damaged. Please be more
> careful next time.

Oh, I see that struct irq_affinity_notify members were indented with
spaces; is that what you mean?  Sorry, I've no idea how that happened.

Anyway, thanks for applying this.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2011-02-07 20:44 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-19 20:59 [PATCH net-next-2.6 0/5] RFS hardware acceleration (v3) Ben Hutchings
2011-01-19 21:01 ` [PATCH net-next-2.6 1/5] genirq: Add IRQ affinity notifiers Ben Hutchings
2011-01-19 21:02   ` Ben Hutchings
2011-01-19 21:19   ` Thomas Gleixner
2011-01-19 21:27     ` David Miller
2011-01-19 21:39       ` Thomas Gleixner
2011-01-19 21:48         ` David Miller
2011-01-19 21:53           ` Thomas Gleixner
2011-01-22 16:38           ` Thomas Gleixner
2011-02-07 20:44             ` Ben Hutchings
2011-01-22 16:37   ` [tip:irq/numa] " tip-bot for Ben Hutchings
2011-01-24 23:01   ` [PATCH net-next-2.6 1/5] " David Miller
2011-01-19 21:03 ` [PATCH net-next-2.6 2/5] lib: cpu_rmap: CPU affinity reverse-mapping Ben Hutchings
2011-01-24 23:01   ` David Miller
2011-01-19 21:03 ` [PATCH net-next-2.6 3/5] net: RPS: Enable hardware acceleration of RFS Ben Hutchings
2011-01-24 23:01   ` David Miller
2011-01-19 21:06 ` [PATCH net-next-2.6 4/5] sfc: Limit filter search depth further for performance hints (i.e. RFS) Ben Hutchings
2011-01-24 23:02   ` David Miller
2011-01-19 21:06 ` [PATCH net-next-2.6 5/5] sfc: Implement hardware acceleration of RFS Ben Hutchings
