linux-kernel.vger.kernel.org archive mirror
* [PREEMPT RT] SLUB and split softirq lock for v3.2-rt
@ 2013-02-13 16:11 Sebastian Andrzej Siewior
  2013-02-13 16:11 ` [PATCH 01/16] softirq: Make serving softirqs a task flag Sebastian Andrzej Siewior
                   ` (17 more replies)
  0 siblings, 18 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:11 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, linux-rt-users, Carsten Emde

3.2-rt is a long-term supported kernel, which lacks two RT features
from 3.6: SLUB support and the split softirq lock implementation.

SLUB performs considerably better than SLAB on RT, and the split
softirq lock implementation is needed especially for realtime
networking applications.

The following patch series backports these features to 3.2-rt.

Steven, could you please apply these patches to a separate
3.2-rt-features branch?

Sebastian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 01/16] softirq: Make serving softirqs a task flag
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
@ 2013-02-13 16:11 ` Sebastian Andrzej Siewior
  2013-02-13 16:11 ` [PATCH 02/16] softirq: Split handling function Sebastian Andrzej Siewior
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

Avoid the percpu softirq_runner pointer magic by using a task flag.
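
For illustration only (not part of the change itself): the old per-cpu
pointer scheme needs preemption disabled around the comparison, while a
flag in task_struct belongs to the task and can be tested from any
context. This is condensed from the hunks below, with the functions
renamed here purely for the side-by-side comparison:

/* old scheme: per-cpu pointer, comparison needs preemption disabled */
static int in_serving_softirq_old(void)
{
	int res;

	preempt_disable();
	res = __get_cpu_var(local_softirq_runner) == current;
	preempt_enable();
	return res;
}

/* new scheme: task flag, owned by the task itself */
static int in_serving_softirq_new(void)
{
	return current->flags & PF_IN_SOFTIRQ;
}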

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy@linutronix: different PF_IN_SOFTIRQ bit]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/sched.h |    1 +
 kernel/softirq.c      |   20 +++-----------------
 2 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e2f9e3b..751e65a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1821,6 +1821,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 /*
  * Per process flags
  */
+#define PF_IN_SOFTIRQ	0x00000001	/* Task is serving softirq */
 #define PF_STARTING	0x00000002	/* being created */
 #define PF_EXITING	0x00000004	/* getting shut down */
 #define PF_EXITPIDONE	0x00000008	/* pi exit done on shut down */
diff --git a/kernel/softirq.c b/kernel/softirq.c
index ca00a68..d4cace1 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -383,7 +383,6 @@ static inline void ksoftirqd_clr_sched_params(void) { }
  * On RT we serialize softirq execution with a cpu local lock
  */
 static DEFINE_LOCAL_IRQ_LOCK(local_softirq_lock);
-static DEFINE_PER_CPU(struct task_struct *, local_softirq_runner);
 
 static void __do_softirq_common(int need_rcu_bh_qs);
 
@@ -438,22 +437,9 @@ void _local_bh_enable(void)
 }
 EXPORT_SYMBOL(_local_bh_enable);
 
-/* For tracing */
-int notrace __in_softirq(void)
-{
-	if (__get_cpu_var(local_softirq_lock).owner == current)
-		return __get_cpu_var(local_softirq_lock).nestcnt;
-	return 0;
-}
-
 int in_serving_softirq(void)
 {
-	int res;
-
-	preempt_disable();
-	res = __get_cpu_var(local_softirq_runner) == current;
-	preempt_enable();
-	return res;
+	return current->flags & PF_IN_SOFTIRQ;
 }
 EXPORT_SYMBOL(in_serving_softirq);
 
@@ -471,7 +457,7 @@ static void __do_softirq_common(int need_rcu_bh_qs)
 	/* Reset the pending bitmask before enabling irqs */
 	set_softirq_pending(0);
 
-	__get_cpu_var(local_softirq_runner) = current;
+	current->flags |= PF_IN_SOFTIRQ;
 
 	lockdep_softirq_enter();
 
@@ -482,7 +468,7 @@ static void __do_softirq_common(int need_rcu_bh_qs)
 		wakeup_softirqd();
 
 	lockdep_softirq_exit();
-	__get_cpu_var(local_softirq_runner) = NULL;
+	current->flags &= ~PF_IN_SOFTIRQ;
 
 	current->softirq_nestcnt--;
 }
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 02/16] softirq: Split handling function
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
  2013-02-13 16:11 ` [PATCH 01/16] softirq: Make serving softirqs a task flag Sebastian Andrzej Siewior
@ 2013-02-13 16:11 ` Sebastian Andrzej Siewior
  2013-02-13 16:11 ` [PATCH 03/16] softirq: Split softirq locks Sebastian Andrzej Siewior
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner

From: Thomas Gleixner <tglx@linutronix.de>

Split out the inner handling function, so RT can reuse it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/softirq.c |   43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index d4cace1..79ba7d3 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -139,31 +139,34 @@ static void wakeup_softirqd(void)
 		wake_up_process(tsk);
 }
 
-static void handle_pending_softirqs(u32 pending, int cpu, int need_rcu_bh_qs)
+static void handle_softirq(unsigned int vec_nr, int cpu, int need_rcu_bh_qs)
 {
-	struct softirq_action *h = softirq_vec;
+	struct softirq_action *h = softirq_vec + vec_nr;
 	unsigned int prev_count = preempt_count();
 
-	local_irq_enable();
-	for ( ; pending; h++, pending >>= 1) {
-		unsigned int vec_nr = h - softirq_vec;
+	kstat_incr_softirqs_this_cpu(vec_nr);
+	trace_softirq_entry(vec_nr);
+	h->action(h);
+	trace_softirq_exit(vec_nr);
 
-		if (!(pending & 1))
-			continue;
+	if (unlikely(prev_count != preempt_count())) {
+		pr_err("softirq %u %s %p preempt count leak: %08x -> %08x\n",
+		       vec_nr, softirq_to_name[vec_nr], h->action,
+		       prev_count, (unsigned int) preempt_count());
+		preempt_count() = prev_count;
+	}
+	if (need_rcu_bh_qs)
+		rcu_bh_qs(cpu);
+}
 
-		kstat_incr_softirqs_this_cpu(vec_nr);
-		trace_softirq_entry(vec_nr);
-		h->action(h);
-		trace_softirq_exit(vec_nr);
-		if (unlikely(prev_count != preempt_count())) {
-			printk(KERN_ERR
- "huh, entered softirq %u %s %p with preempt_count %08x exited with %08x?\n",
-			       vec_nr, softirq_to_name[vec_nr], h->action,
-			       prev_count, (unsigned int) preempt_count());
-			preempt_count() = prev_count;
-		}
-		if (need_rcu_bh_qs)
-			rcu_bh_qs(cpu);
+static void handle_pending_softirqs(u32 pending, int cpu, int need_rcu_bh_qs)
+{
+	unsigned int vec_nr;
+
+	local_irq_enable();
+	for (vec_nr = 0; pending; vec_nr++, pending >>= 1) {
+		if (pending & 1)
+			handle_softirq(vec_nr, cpu, need_rcu_bh_qs);
 	}
 	local_irq_disable();
 }
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 03/16] softirq: Split softirq locks
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
  2013-02-13 16:11 ` [PATCH 01/16] softirq: Make serving softirqs a task flag Sebastian Andrzej Siewior
  2013-02-13 16:11 ` [PATCH 02/16] softirq: Split handling function Sebastian Andrzej Siewior
@ 2013-02-13 16:11 ` Sebastian Andrzej Siewior
  2013-02-13 16:11 ` [PATCH 04/16] rcu: rcutiny: Prevent RCU stall Sebastian Andrzej Siewior
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

The 3.x RT series removed the split softirq implementation in favour
of pushing softirq processing into the context of the thread which
raised it. However, this prevents us from handling the various
softirqs at different priorities. Instead of reintroducing the split
softirq threads, we now split the locks which serialize the softirq
processing.

If a softirq is raised in the context of a thread and the thread is
inside a bh disabled region, then the softirq is noted in a per-thread
field. If the softirq is raised from hard interrupt context, then the
bit is set in the flag field of ksoftirqd and ksoftirqd is invoked.
When a thread leaves a bh disabled region, it tries to execute the
softirqs which have been raised in its own context. It acquires the
per-softirq / per-cpu lock for the softirq and then checks whether the
softirq is still pending in the per-cpu local_softirq_pending()
field. If yes, it runs the softirq. If no, some other task has already
executed it. This allows for zero-config softirq elevation in the
context of user space tasks or interrupt threads.
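
For orientation, a condensed sketch of the raise/run flow introduced
below (taken from the hunks in this patch; irq toggling, lockdep,
vtime accounting and the nestcnt sanity check are stripped):

void __raise_softirq_irqoff(unsigned int nr)
{
	trace_softirq_raise(nr);
	or_softirq_pending(1UL << nr);

	/* In a bh disabled region of a thread: note it on the task ... */
	if (!in_irq() && current->softirq_nestcnt)
		current->softirqs_raised |= (1U << nr);
	/* ... otherwise delegate it to ksoftirqd */
	else if (__this_cpu_read(ksoftirqd))
		__this_cpu_read(ksoftirqd)->softirqs_raised |= (1U << nr);
}

static void do_current_softirqs(int need_rcu_bh_qs)
{
	while (current->softirqs_raised) {
		int i = __ffs(current->softirqs_raised);
		unsigned int pending, mask = (1U << i);

		current->softirqs_raised &= ~mask;

		/* per-softirq, per-cpu lock; contention boosts the owner */
		lock_softirq(i);

		/* run it only if nobody else took care of it meanwhile */
		pending = local_softirq_pending();
		if (pending & mask) {
			set_softirq_pending(pending & ~mask);
			do_single_softirq(i, need_rcu_bh_qs);
		}
		unlock_softirq(i);
	}
}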

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy@linutronix: PF_MALLOC handling, __raise_softirq_irqoff() not
inline, shrink invoke_softirq()]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/interrupt.h |    6 +-
 include/linux/sched.h     |    1 +
 kernel/softirq.c          |  295 ++++++++++++++++++++++++++++-----------------
 3 files changed, 185 insertions(+), 117 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index f70a65b..cafd56c 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -468,11 +468,7 @@ extern void thread_do_softirq(void);
 
 extern void open_softirq(int nr, void (*action)(struct softirq_action *));
 extern void softirq_init(void);
-static inline void __raise_softirq_irqoff(unsigned int nr)
-{
-	trace_softirq_raise(nr);
-	or_softirq_pending(1UL << nr);
-}
+extern void __raise_softirq_irqoff(unsigned int nr);
 
 extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 751e65a..324b277 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1607,6 +1607,7 @@ struct task_struct {
 #ifdef CONFIG_PREEMPT_RT_BASE
 	struct rcu_head put_rcu;
 	int softirq_nestcnt;
+	unsigned int softirqs_raised;
 #endif
 #if defined CONFIG_PREEMPT_RT_FULL && defined CONFIG_HIGHMEM
 	int kmap_idx;
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 79ba7d3..3f7b3fb 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -159,6 +159,7 @@ static void handle_softirq(unsigned int vec_nr, int cpu, int need_rcu_bh_qs)
 		rcu_bh_qs(cpu);
 }
 
+#ifndef CONFIG_PREEMPT_RT_FULL
 static void handle_pending_softirqs(u32 pending, int cpu, int need_rcu_bh_qs)
 {
 	unsigned int vec_nr;
@@ -171,7 +172,6 @@ static void handle_pending_softirqs(u32 pending, int cpu, int need_rcu_bh_qs)
 	local_irq_disable();
 }
 
-#ifndef CONFIG_PREEMPT_RT_FULL
 /*
  * preempt_count and SOFTIRQ_OFFSET usage:
  * - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving
@@ -375,28 +375,113 @@ asmlinkage void do_softirq(void)
 
 #endif
 
+/*
+ * This function must run with irqs disabled!
+ */
+void raise_softirq_irqoff(unsigned int nr)
+{
+	__raise_softirq_irqoff(nr);
+
+	/*
+	 * If we're in an interrupt or softirq, we're done
+	 * (this also catches softirq-disabled code). We will
+	 * actually run the softirq once we return from
+	 * the irq or softirq.
+	 *
+	 * Otherwise we wake up ksoftirqd to make sure we
+	 * schedule the softirq soon.
+	 */
+	if (!in_interrupt())
+		wakeup_softirqd();
+}
+
+void __raise_softirq_irqoff(unsigned int nr)
+{
+	trace_softirq_raise(nr);
+	or_softirq_pending(1UL << nr);
+}
+
 static inline void local_bh_disable_nort(void) { local_bh_disable(); }
 static inline void _local_bh_enable_nort(void) { _local_bh_enable(); }
 static inline void ksoftirqd_set_sched_params(void) { }
 static inline void ksoftirqd_clr_sched_params(void) { }
 
+static inline int ksoftirqd_softirq_pending(void)
+{
+	return local_softirq_pending();
+}
+
 #else /* !PREEMPT_RT_FULL */
 
 /*
- * On RT we serialize softirq execution with a cpu local lock
+ * On RT we serialize softirq execution with a cpu local lock per softirq
  */
-static DEFINE_LOCAL_IRQ_LOCK(local_softirq_lock);
+static DEFINE_PER_CPU(struct local_irq_lock [NR_SOFTIRQS], local_softirq_locks);
 
-static void __do_softirq_common(int need_rcu_bh_qs);
+void __init softirq_early_init(void)
+{
+	int i;
+
+	for (i = 0; i < NR_SOFTIRQS; i++)
+		local_irq_lock_init(local_softirq_locks[i]);
+}
 
-void __do_softirq(void)
+static void lock_softirq(int which)
 {
-	__do_softirq_common(0);
+	__local_lock(&__get_cpu_var(local_softirq_locks[which]));
 }
 
-void __init softirq_early_init(void)
+static void unlock_softirq(int which)
+{
+	__local_unlock(&__get_cpu_var(local_softirq_locks[which]));
+}
+
+static void do_single_softirq(int which, int need_rcu_bh_qs)
+{
+	account_system_vtime(current);
+	current->flags |= PF_IN_SOFTIRQ;
+	lockdep_softirq_enter();
+	local_irq_enable();
+	handle_softirq(which, smp_processor_id(), need_rcu_bh_qs);
+	local_irq_disable();
+	lockdep_softirq_exit();
+	current->flags &= ~PF_IN_SOFTIRQ;
+	account_system_vtime(current);
+}
+
+/*
+ * Called with interrupts disabled. Process softirqs which were raised
+ * in current context (or on behalf of ksoftirqd).
+ */
+static void do_current_softirqs(int need_rcu_bh_qs)
 {
-	local_irq_lock_init(local_softirq_lock);
+	while (current->softirqs_raised) {
+		int i = __ffs(current->softirqs_raised);
+		unsigned int pending, mask = (1U << i);
+
+		current->softirqs_raised &= ~mask;
+		local_irq_enable();
+
+		/*
+		 * If the lock is contended, we boost the owner to
+		 * process the softirq or leave the critical section
+		 * now.
+		 */
+		lock_softirq(i);
+		local_irq_disable();
+		/*
+		 * Check with the local_softirq_pending() bits,
+		 * whether we need to process this still or if someone
+		 * else took care of it.
+		 */
+		pending = local_softirq_pending();
+		if (pending & mask) {
+			set_softirq_pending(pending & ~mask);
+			do_single_softirq(i, need_rcu_bh_qs);
+		}
+		unlock_softirq(i);
+		WARN_ON(current->softirq_nestcnt != 1);
+	}
 }
 
 void local_bh_disable(void)
@@ -411,17 +496,11 @@ void local_bh_enable(void)
 	if (WARN_ON(current->softirq_nestcnt == 0))
 		return;
 
-	if ((current->softirq_nestcnt == 1) &&
-	    local_softirq_pending() &&
-	    local_trylock(local_softirq_lock)) {
+	local_irq_disable();
+	if (current->softirq_nestcnt == 1 && current->softirqs_raised)
+		do_current_softirqs(1);
+	local_irq_enable();
 
-		local_irq_disable();
-		if (local_softirq_pending())
-			__do_softirq();
-		local_irq_enable();
-		local_unlock(local_softirq_lock);
-		WARN_ON(current->softirq_nestcnt != 1);
-	}
 	current->softirq_nestcnt--;
 	migrate_enable();
 }
@@ -446,37 +525,8 @@ int in_serving_softirq(void)
 }
 EXPORT_SYMBOL(in_serving_softirq);
 
-/*
- * Called with bh and local interrupts disabled. For full RT cpu must
- * be pinned.
- */
-static void __do_softirq_common(int need_rcu_bh_qs)
-{
-	u32 pending = local_softirq_pending();
-	int cpu = smp_processor_id();
-
-	current->softirq_nestcnt++;
-
-	/* Reset the pending bitmask before enabling irqs */
-	set_softirq_pending(0);
-
-	current->flags |= PF_IN_SOFTIRQ;
-
-	lockdep_softirq_enter();
-
-	handle_pending_softirqs(pending, cpu, need_rcu_bh_qs);
-
-	pending = local_softirq_pending();
-	if (pending)
-		wakeup_softirqd();
-
-	lockdep_softirq_exit();
-	current->flags &= ~PF_IN_SOFTIRQ;
-
-	current->softirq_nestcnt--;
-}
-
-static int __thread_do_softirq(int cpu)
+/* Called with preemption disabled */
+static int ksoftirqd_do_softirq(int cpu)
 {
 	/*
 	 * Prevent the current cpu from going offline.
@@ -487,45 +537,90 @@ static int __thread_do_softirq(int cpu)
 	 */
 	pin_current_cpu();
 	/*
-	 * If called from ksoftirqd (cpu >= 0) we need to check
-	 * whether we are on the wrong cpu due to cpu offlining. If
-	 * called via thread_do_softirq() no action required.
+	 * We need to check whether we are on the wrong cpu due to cpu
+	 * offlining.
 	 */
-	if (cpu >= 0 && cpu_is_offline(cpu)) {
+	if (cpu_is_offline(cpu)) {
 		unpin_current_cpu();
 		return -1;
 	}
 	preempt_enable();
-	local_lock(local_softirq_lock);
 	local_irq_disable();
-	/*
-	 * We cannot switch stacks on RT as we want to be able to
-	 * schedule!
-	 */
-	if (local_softirq_pending())
-		__do_softirq_common(cpu >= 0);
-	local_unlock(local_softirq_lock);
-	unpin_current_cpu();
-	preempt_disable();
+	current->softirq_nestcnt++;
+	do_current_softirqs(1);
+	current->softirq_nestcnt--;
 	local_irq_enable();
+
+	preempt_disable();
+	unpin_current_cpu();
 	return 0;
 }
 
 /*
- * Called from netif_rx_ni(). Preemption enabled.
+ * Called from netif_rx_ni(). Preemption enabled, but migration
+ * disabled. So the cpu can't go away under us.
  */
 void thread_do_softirq(void)
 {
-	if (!in_serving_softirq()) {
-		preempt_disable();
-		__thread_do_softirq(-1);
-		preempt_enable();
+	if (!in_serving_softirq() && current->softirqs_raised) {
+		current->softirq_nestcnt++;
+		do_current_softirqs(0);
+		current->softirq_nestcnt--;
 	}
 }
 
-static int ksoftirqd_do_softirq(int cpu)
+void __raise_softirq_irqoff(unsigned int nr)
+{
+	trace_softirq_raise(nr);
+	or_softirq_pending(1UL << nr);
+
+	/*
+	 * If we are not in a hard interrupt and inside a bh disabled
+	 * region, we simply raise the flag on current. local_bh_enable()
+	 * will make sure that the softirq is executed. Otherwise we
+	 * delegate it to ksoftirqd.
+	 */
+	if (!in_irq() && current->softirq_nestcnt)
+		current->softirqs_raised |= (1U << nr);
+	else if (__this_cpu_read(ksoftirqd))
+		__this_cpu_read(ksoftirqd)->softirqs_raised |= (1U << nr);
+}
+
+/*
+ * This function must run with irqs disabled!
+ */
+void raise_softirq_irqoff(unsigned int nr)
+{
+	__raise_softirq_irqoff(nr);
+
+	/*
+	 * If we're in an hard interrupt we let irq return code deal
+	 * with the wakeup of ksoftirqd.
+	 */
+	if (in_irq())
+		return;
+
+	/*
+	 * If we are in thread context but outside of a bh disabled
+	 * region, we need to wake ksoftirqd as well.
+	 *
+	 * CHECKME: Some of the places which do that could be wrapped
+	 * into local_bh_disable/enable pairs. Though it's unclear
+	 * whether this is worth the effort. To find those places just
+	 * raise a WARN() if the condition is met.
+	 */
+	if (!current->softirq_nestcnt)
+		wakeup_softirqd();
+}
+
+void do_raise_softirq_irqoff(unsigned int nr)
+{
+	raise_softirq_irqoff(nr);
+}
+
+static inline int ksoftirqd_softirq_pending(void)
 {
-	return __thread_do_softirq(cpu);
+	return current->softirqs_raised;
 }
 
 static inline void local_bh_disable_nort(void) { }
@@ -536,6 +631,10 @@ static inline void ksoftirqd_set_sched_params(void)
 	struct sched_param param = { .sched_priority = 1 };
 
 	sched_setscheduler(current, SCHED_FIFO, &param);
+	/* Take over all pending softirqs when starting */
+	local_irq_disable();
+	current->softirqs_raised = local_softirq_pending();
+	local_irq_enable();
 }
 
 static inline void ksoftirqd_clr_sched_params(void)
@@ -567,39 +666,31 @@ void irq_enter(void)
 	__irq_enter();
 }
 
-#ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
 static inline void invoke_softirq(void)
 {
 #ifndef CONFIG_PREEMPT_RT_FULL
-	if (!force_irqthreads)
+	if (!force_irqthreads) {
+#ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
 		__do_softirq();
-	else {
-		__local_bh_disable((unsigned long)__builtin_return_address(0),
-				SOFTIRQ_OFFSET);
-		wakeup_softirqd();
-		__local_bh_enable(SOFTIRQ_OFFSET);
-	}
-#else
-	wakeup_softirqd();
-#endif
-}
 #else
-static inline void invoke_softirq(void)
-{
-#ifndef CONFIG_PREEMPT_RT_FULL
-	if (!force_irqthreads)
 		do_softirq();
-	else {
+#endif
+	} else {
 		__local_bh_disable((unsigned long)__builtin_return_address(0),
 				SOFTIRQ_OFFSET);
 		wakeup_softirqd();
 		__local_bh_enable(SOFTIRQ_OFFSET);
 	}
-#else
-	wakeup_softirqd();
+#else /* PREEMPT_RT_FULL */
+	unsigned long flags;
+
+	local_irq_save(flags);
+	if (__this_cpu_read(ksoftirqd) &&
+	    __this_cpu_read(ksoftirqd)->softirqs_raised)
+		wakeup_softirqd();
+	local_irq_restore(flags);
 #endif
 }
-#endif
 
 /*
  * Exit an interrupt context. Process softirqs if needed and possible:
@@ -621,26 +712,6 @@ void irq_exit(void)
 	__preempt_enable_no_resched();
 }
 
-/*
- * This function must run with irqs disabled!
- */
-inline void raise_softirq_irqoff(unsigned int nr)
-{
-	__raise_softirq_irqoff(nr);
-
-	/*
-	 * If we're in an interrupt or softirq, we're done
-	 * (this also catches softirq-disabled code). We will
-	 * actually run the softirq once we return from
-	 * the irq or softirq.
-	 *
-	 * Otherwise we wake up ksoftirqd to make sure we
-	 * schedule the softirq soon.
-	 */
-	if (!in_interrupt())
-		wakeup_softirqd();
-}
-
 void raise_softirq(unsigned int nr)
 {
 	unsigned long flags;
@@ -1101,12 +1172,12 @@ static int run_ksoftirqd(void * __bind_cpu)
 
 	while (!kthread_should_stop()) {
 		preempt_disable();
-		if (!local_softirq_pending())
+		if (!ksoftirqd_softirq_pending())
 			schedule_preempt_disabled();
 
 		__set_current_state(TASK_RUNNING);
 
-		while (local_softirq_pending()) {
+		while (ksoftirqd_softirq_pending()) {
 			if (ksoftirqd_do_softirq((long) __bind_cpu))
 				goto wait_to_die;
 			__preempt_enable_no_resched();
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 04/16] rcu: rcutiny: Prevent RCU stall
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2013-02-13 16:11 ` [PATCH 03/16] softirq: Split softirq locks Sebastian Andrzej Siewior
@ 2013-02-13 16:11 ` Sebastian Andrzej Siewior
  2013-02-16 20:59   ` Paul E. McKenney
  2013-02-13 16:12 ` [PATCH 05/16] softirq: Adapt NOHZ softirq pending check to new RT scheme Sebastian Andrzej Siewior
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

rcu_read_unlock_special() checks in_serving_softirq() and leaves early
when true. On RT this is obviously wrong, as softirq processing context
can be preempted and therefore such a task can be on the gp_tasks
list. Leaving early here will leave the task on the list and therefore
block RCU processing forever.

This cannot happen on mainline because softirq processing context
cannot be preempted there.

In fact this check looks quite questionable in general. Neither irq
context nor softirq processing context can ever be preempted in
mainline, so the special unlock case should never be invoked in such a
context. The only explanation might be an rcu_read_unlock() being
interrupted and therefore leaving the rcu nest count at 0 before the
special unlock bit has been cleared. That looks fragile. At least it's
missing a big fat comment. Paul ????

See mainline commits ec433f0c5 and 8762705a for further enlightenment.

Reported-by: Kristian Lehmann <krleit00@hs-esslingen.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy@linutronix: different in-irq check]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/rcutiny_plugin.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcutiny_plugin.h b/kernel/rcutiny_plugin.h
index 2b0484a..bac1906 100644
--- a/kernel/rcutiny_plugin.h
+++ b/kernel/rcutiny_plugin.h
@@ -552,7 +552,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		rcu_preempt_cpu_qs();
 
 	/* Hardware IRQ handlers cannot block. */
-	if (in_irq()) {
+	if (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_OFFSET)) {
 		local_irq_restore(flags);
 		return;
 	}
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 05/16] softirq: Adapt NOHZ softirq pending check to new RT scheme
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2013-02-13 16:11 ` [PATCH 04/16] rcu: rcutiny: Prevent RCU stall Sebastian Andrzej Siewior
@ 2013-02-13 16:12 ` Sebastian Andrzej Siewior
  2013-02-13 16:12 ` [PATCH 06/16] softirq: Add more debugging Sebastian Andrzej Siewior
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

We can't rely on ksoftirqd anymore. We need to check the tasks which
run a particular softirq, and if such a task is pi-blocked, ignore the
other pending bits of that task as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/softirq.c |   68 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 52 insertions(+), 16 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 3f7b3fb..66999ad 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -65,45 +65,75 @@ char *softirq_to_name[NR_SOFTIRQS] = {
 
 #ifdef CONFIG_NO_HZ
 # ifdef CONFIG_PREEMPT_RT_FULL
+
+struct softirq_runner {
+	struct task_struct *runner[NR_SOFTIRQS];
+};
+
+static DEFINE_PER_CPU(struct softirq_runner, softirq_runners);
+
+static inline void softirq_set_runner(unsigned int sirq)
+{
+	struct softirq_runner *sr = &__get_cpu_var(softirq_runners);
+
+	sr->runner[sirq] = current;
+}
+
+static inline void softirq_clr_runner(unsigned int sirq)
+{
+	struct softirq_runner *sr = &__get_cpu_var(softirq_runners);
+
+	sr->runner[sirq] = NULL;
+}
+
 /*
- * On preempt-rt a softirq might be blocked on a lock. There might be
- * no other runnable task on this CPU because the lock owner runs on
- * some other CPU. So we have to go into idle with the pending bit
- * set. Therefor we need to check this otherwise we warn about false
- * positives which confuses users and defeats the whole purpose of
- * this test.
+ * On preempt-rt a softirq running context might be blocked on a
+ * lock. There might be no other runnable task on this CPU because the
+ * lock owner runs on some other CPU. So we have to go into idle with
+ * the pending bit set. Therefor we need to check this otherwise we
+ * warn about false positives which confuses users and defeats the
+ * whole purpose of this test.
  *
  * This code is called with interrupts disabled.
  */
 void softirq_check_pending_idle(void)
 {
 	static int rate_limit;
-	u32 warnpending = 0, pending = local_softirq_pending();
+	struct softirq_runner *sr = &__get_cpu_var(softirq_runners);
+	u32 warnpending, pending = local_softirq_pending();
 
 	if (rate_limit >= 10)
 		return;
 
-	if (pending) {
+	warnpending = pending;
+
+	while (pending) {
 		struct task_struct *tsk;
+		int i = __ffs(pending);
 
-		tsk = __get_cpu_var(ksoftirqd);
+		pending &= ~(1 << i);
+
+		tsk = sr->runner[i];
 		/*
 		 * The wakeup code in rtmutex.c wakes up the task
 		 * _before_ it sets pi_blocked_on to NULL under
 		 * tsk->pi_lock. So we need to check for both: state
 		 * and pi_blocked_on.
 		 */
-		raw_spin_lock(&tsk->pi_lock);
-
-		if (!tsk->pi_blocked_on && !(tsk->state == TASK_RUNNING))
-			warnpending = 1;
-
-		raw_spin_unlock(&tsk->pi_lock);
+		if (tsk) {
+			raw_spin_lock(&tsk->pi_lock);
+			if (tsk->pi_blocked_on || tsk->state == TASK_RUNNING) {
+				/* Clear all bits pending in that task */
+				warnpending &= ~(tsk->softirqs_raised);
+				warnpending &= ~(1 << i);
+			}
+			raw_spin_unlock(&tsk->pi_lock);
+		}
 	}
 
 	if (warnpending) {
 		printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n",
-		       pending);
+		       warnpending);
 		rate_limit++;
 	}
 }
@@ -122,6 +152,10 @@ void softirq_check_pending_idle(void)
 	}
 }
 # endif
+
+#else /* !NO_HZ */
+static inline void softirq_set_runner(unsigned int sirq) { }
+static inline void softirq_clr_runner(unsigned int sirq) { }
 #endif
 
 /*
@@ -469,6 +503,7 @@ static void do_current_softirqs(int need_rcu_bh_qs)
 		 */
 		lock_softirq(i);
 		local_irq_disable();
+		softirq_set_runner(i);
 		/*
 		 * Check with the local_softirq_pending() bits,
 		 * whether we need to process this still or if someone
@@ -479,6 +514,7 @@ static void do_current_softirqs(int need_rcu_bh_qs)
 			set_softirq_pending(pending & ~mask);
 			do_single_softirq(i, need_rcu_bh_qs);
 		}
+		softirq_clr_runner(i);
 		unlock_softirq(i);
 		WARN_ON(current->softirq_nestcnt != 1);
 	}
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 06/16] softirq: Add more debugging
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2013-02-13 16:12 ` [PATCH 05/16] softirq: Adapt NOHZ softirq pending check to new RT scheme Sebastian Andrzej Siewior
@ 2013-02-13 16:12 ` Sebastian Andrzej Siewior
  2013-02-13 16:12 ` [PATCH 07/16] softirq: Fix nohz pending issue for real Sebastian Andrzej Siewior
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

We really want to find code which calls __raise_softirq_irqoff() and
runs neither in hardirq context nor in a local_bh disabled
region. This is wrong even on mainline, as such code relies on random
events to take care of its newly raised softirq.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/softirq.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 66999ad..385fcea 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -605,7 +605,7 @@ void thread_do_softirq(void)
 	}
 }
 
-void __raise_softirq_irqoff(unsigned int nr)
+static void do_raise_softirq_irqoff(unsigned int nr)
 {
 	trace_softirq_raise(nr);
 	or_softirq_pending(1UL << nr);
@@ -622,12 +622,19 @@ void __raise_softirq_irqoff(unsigned int nr)
 		__this_cpu_read(ksoftirqd)->softirqs_raised |= (1U << nr);
 }
 
+void __raise_softirq_irqoff(unsigned int nr)
+{
+	do_raise_softirq_irqoff(nr);
+	if (!in_irq() && !current->softirq_nestcnt)
+		wakeup_softirqd();
+}
+
 /*
  * This function must run with irqs disabled!
  */
 void raise_softirq_irqoff(unsigned int nr)
 {
-	__raise_softirq_irqoff(nr);
+	do_raise_softirq_irqoff(nr);
 
 	/*
 	 * If we're in an hard interrupt we let irq return code deal
@@ -649,11 +656,6 @@ void raise_softirq_irqoff(unsigned int nr)
 		wakeup_softirqd();
 }
 
-void do_raise_softirq_irqoff(unsigned int nr)
-{
-	raise_softirq_irqoff(nr);
-}
-
 static inline int ksoftirqd_softirq_pending(void)
 {
 	return current->softirqs_raised;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 07/16] softirq: Fix nohz pending issue for real
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (5 preceding siblings ...)
  2013-02-13 16:12 ` [PATCH 06/16] softirq: Add more debugging Sebastian Andrzej Siewior
@ 2013-02-13 16:12 ` Sebastian Andrzej Siewior
  2013-02-13 16:12 ` [PATCH 08/16] net: Use local_bh_disable in netif_rx_ni() Sebastian Andrzej Siewior
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

We really need to iterate through all softirqs to find a potentially
blocked runner:

T1 runs softirq X (which cleared the pending bit for X)

An interrupt raises softirq Y

T1 gets blocked on a lock and the lock owner is not runnable

T1 schedules out

The CPU goes idle and complains about the pending softirq Y.

Iterating over all softirqs lets us find the runner for X and
eliminate Y from the to-warn-about list as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/softirq.c |   13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 385fcea..1a0145a 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -100,20 +100,15 @@ void softirq_check_pending_idle(void)
 {
 	static int rate_limit;
 	struct softirq_runner *sr = &__get_cpu_var(softirq_runners);
-	u32 warnpending, pending = local_softirq_pending();
+	u32 warnpending = local_softirq_pending();
+	int i;
 
 	if (rate_limit >= 10)
 		return;
 
-	warnpending = pending;
-
-	while (pending) {
-		struct task_struct *tsk;
-		int i = __ffs(pending);
-
-		pending &= ~(1 << i);
+	for (i = 0; i < NR_SOFTIRQS; i++) {
+		struct task_struct *tsk = sr->runner[i];
 
-		tsk = sr->runner[i];
 		/*
 		 * The wakeup code in rtmutex.c wakes up the task
 		 * _before_ it sets pi_blocked_on to NULL under
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 08/16] net: Use local_bh_disable in netif_rx_ni()
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (6 preceding siblings ...)
  2013-02-13 16:12 ` [PATCH 07/16] softirq: Fix nohz pending issue for real Sebastian Andrzej Siewior
@ 2013-02-13 16:12 ` Sebastian Andrzej Siewior
  2013-02-13 16:12 ` [PATCH 09/16] FIX [1/2] slub: Do not dereference NULL pointer in node_match Sebastian Andrzej Siewior
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

This code triggers the new WARN in __raise_softirq_irqoff(), even
though it actually looks at the softirq pending bits and calls into
the softirq code itself; that does not fit well with the
context-related softirq model of RT. It's correct on mainline, but
going through local_bh_disable/enable here is not going to hurt badly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/dev.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 7e8f459..9a0ca4d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3033,11 +3033,9 @@ int netif_rx_ni(struct sk_buff *skb)
 {
 	int err;
 
-	migrate_disable();
+	local_bh_disable();
 	err = netif_rx(skb);
-	if (local_softirq_pending())
-		thread_do_softirq();
-	migrate_enable();
+	local_bh_enable();
 
 	return err;
 }
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 09/16] FIX [1/2] slub: Do not dereference NULL pointer in node_match
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (7 preceding siblings ...)
  2013-02-13 16:12 ` [PATCH 08/16] net: Use local_bh_disable in netif_rx_ni() Sebastian Andrzej Siewior
@ 2013-02-13 16:12 ` Sebastian Andrzej Siewior
  2013-02-13 16:12 ` [PATCH 10/16] FIX [2/2] slub: Tid must be retrieved from the percpu area of the current processor Sebastian Andrzej Siewior
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Christoph Lameter,
	Pekka Enberg, Thomas Gleixner, Sebastian Andrzej Siewior

From: Christoph Lameter <cl@linux.com>

The variables accessed in slab_alloc are volatile and therefore
the page pointer passed to node_match can be NULL. The processing
of data in slab_alloc is tentative until either the cmpxchg
succeeds or the __slab_alloc slowpath is invoked. Both are
able to perform the same allocation from the freelist.

Check for the NULL pointer in node_match.

A false positive will lead to a retry of the loop in __slab_alloc.

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy@linutronix: replace page with c->page]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/slub.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 5710788..08eb4c1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2038,7 +2038,7 @@ static void flush_all(struct kmem_cache *s)
 static inline int node_match(struct kmem_cache_cpu *c, int node)
 {
 #ifdef CONFIG_NUMA
-	if (node != NUMA_NO_NODE && c->node != node)
+	if (!c->page || (node != NUMA_NO_NODE && c->node != node))
 		return 0;
 #endif
 	return 1;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 10/16] FIX [2/2] slub: Tid must be retrieved from the percpu area of the current processor
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (8 preceding siblings ...)
  2013-02-13 16:12 ` [PATCH 09/16] FIX [1/2] slub: Do not dereference NULL pointer in node_match Sebastian Andrzej Siewior
@ 2013-02-13 16:12 ` Sebastian Andrzej Siewior
  2013-02-13 16:12 ` [PATCH 11/16] slub: Use correct cpu_slab on dead cpu Sebastian Andrzej Siewior
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Christoph Lameter,
	Pekka Enberg, Thomas Gleixner, Sebastian Andrzej Siewior

From: Christoph Lameter <cl@linux.com>

As Steven Rostedt has pointed out: rescheduling could occur on a different
processor after the determination of the per-cpu pointer and before the tid
is retrieved. This could result in allocation from the wrong node in
slab_alloc.

The effect is much more severe in slab_free(), where we could free to the
freelist of the wrong page.

The window for something like that occurring is pretty small, but it is
possible.
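
For reference, this is the pattern the hunks below apply in both
slab_alloc() and slab_free(), shown here in isolation:

	/*
	 * Keep the retrieval of the per-cpu pointer and the read of the
	 * tid on the same processor: a reschedule in between could hand
	 * us a tid that does not belong to the cpu area we are about to
	 * operate on.
	 */
	preempt_disable();
	c = __this_cpu_ptr(s->cpu_slab);
	tid = c->tid;
	preempt_enable();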

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/slub.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 08eb4c1..78d2756 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2286,13 +2286,18 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
 		return NULL;
 
 redo:
-
 	/*
 	 * Must read kmem_cache cpu data via this cpu ptr. Preemption is
 	 * enabled. We may switch back and forth between cpus while
 	 * reading from one cpu area. That does not matter as long
 	 * as we end up on the original cpu again when doing the cmpxchg.
+	 *
+	 * Preemption is disabled for the retrieval of the tid because that
+	 * must occur from the current processor. We cannot allow rescheduling
+	 * on a different processor between the determination of the pointer
+	 * and the retrieval of the tid.
 	 */
+	preempt_disable();
 	c = __this_cpu_ptr(s->cpu_slab);
 
 	/*
@@ -2302,7 +2307,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
 	 * linked list in between.
 	 */
 	tid = c->tid;
-	barrier();
+	preempt_enable();
 
 	object = c->freelist;
 	if (unlikely(!object || !node_match(c, node)))
@@ -2544,10 +2549,11 @@ static __always_inline void slab_free(struct kmem_cache *s,
 	 * data is retrieved via this pointer. If we are on the same cpu
 	 * during the cmpxchg then the free will succedd.
 	 */
+	preempt_disable();
 	c = __this_cpu_ptr(s->cpu_slab);
 
 	tid = c->tid;
-	barrier();
+	preempt_enable();
 
 	if (likely(page == c->page)) {
 		set_freepointer(s, object, c->freelist);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 11/16] slub: Use correct cpu_slab on dead cpu
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (9 preceding siblings ...)
  2013-02-13 16:12 ` [PATCH 10/16] FIX [2/2] slub: Tid must be retrieved from the percpu area of the current processor Sebastian Andrzej Siewior
@ 2013-02-13 16:12 ` Sebastian Andrzej Siewior
  2013-02-13 16:12 ` [PATCH 12/16] smp: introduce a generic on_each_cpu_mask() function Sebastian Andrzej Siewior
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Christoph Lameter,
	Sebastian Andrzej Siewior

From: Christoph Lameter <cl@linux.com>

Pass a kmem_cache_cpu pointer into unfreeze_partials() so that a
kmem_cache_cpu structure other than the local one can be specified.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christoph Lameter <cl@linux.com>
[bigeasy@linutronix: remove unused n2]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/slub.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 78d2756..d9ba33d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1870,11 +1870,15 @@ static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 	}
 }
 
-/* Unfreeze all the cpu partial slabs */
-static void unfreeze_partials(struct kmem_cache *s)
+/*
+ * Unfreeze all the cpu partial slabs.
+ *
+ * This function must be called with interrupt disabled.
+ */
+static void unfreeze_partials(struct kmem_cache *s,
+		struct kmem_cache_cpu *c)
 {
 	struct kmem_cache_node *n = NULL;
-	struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
 	struct page *page, *discard_page = NULL;
 
 	while ((page = c->partial)) {
@@ -1977,7 +1981,7 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 				 * set to the per node partial list.
 				 */
 				local_irq_save(flags);
-				unfreeze_partials(s);
+				unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
 				local_irq_restore(flags);
 				pobjects = 0;
 				pages = 0;
@@ -2015,7 +2019,7 @@ static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
 		if (c->page)
 			flush_slab(s, c);
 
-		unfreeze_partials(s);
+		unfreeze_partials(s, c);
 	}
 }
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 12/16] smp: introduce a generic on_each_cpu_mask() function
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (10 preceding siblings ...)
  2013-02-13 16:12 ` [PATCH 11/16] slub: Use correct cpu_slab on dead cpu Sebastian Andrzej Siewior
@ 2013-02-13 16:12 ` Sebastian Andrzej Siewior
  2013-02-13 16:12 ` [PATCH 13/16] smp: add func to IPI cpus based on parameter func Sebastian Andrzej Siewior
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Gilad Ben-Yossef,
	Frederic Weisbecker, Russell King, Pekka Enberg, Matt Mackall,
	Rik van Riel, Andi Kleen, Sasha Levin, Mel Gorman,
	Alexander Viro, Avi Kivity, Kosaki Motohiro, Milton Miller,
	Andrew Morton, Linus Torvalds, Sebastian Andrzej Siewior

From: Gilad Ben-Yossef <gilad@benyossef.com>

We have lots of infrastructure in place to partition multi-core systems
such that we have a group of CPUs that are dedicated to specific tasks:
cgroups, scheduler and interrupt affinity, and the isolcpus= boot parameter.
Still, kernel code will at times interrupt all CPUs in the system via IPIs
for various needs.  These IPIs are useful and cannot be avoided
altogether, but in certain cases it is possible to interrupt only specific
CPUs that have useful work to do and not the entire system.

This patch set, inspired by discussions with Peter Zijlstra and Frederic
Weisbecker when testing the nohz task patch set, is a first stab at trying
to explore doing this by locating the places where such global IPI calls
are being made and turning the global IPI into an IPI for a specific group
of CPUs.  The purpose of the patch set is to get feedback if this is the
right way to go for dealing with this issue and indeed, if the issue is
even worth dealing with at all.  Based on the feedback from this patch set
I plan to offer further patches that address similar issue in other code
paths.

This patch creates an on_each_cpu_mask() and on_each_cpu_cond()
infrastructure API (the former derived from existing arch specific
versions in Tile and Arm) and uses them to turn several global IPI
invocation to per CPU group invocations.

Core kernel:

on_each_cpu_mask() calls a function on processors specified by cpumask,
which may or may not include the local processor.

You must not call this function with disabled interrupts or from a
hardware interrupt handler or from a bottom half handler.
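
As a usage sketch (not part of this patch; the mask, callback and
function names below are made up for illustration), a caller that used
to IPI every CPU via on_each_cpu() can now restrict the IPI to the
CPUs that actually have work to do:

#include <linux/cpumask.h>
#include <linux/smp.h>

/* hypothetical mask of CPUs that currently hold per-cpu work */
static struct cpumask cpus_with_work;

static void drain_local_work(void *info)
{
	/* runs on every CPU in the mask, with interrupts disabled */
}

void drain_all_work(void)
{
	/*
	 * Must not be called with interrupts disabled, from a hardirq
	 * handler or from a bottom half handler.
	 */
	on_each_cpu_mask(&cpus_with_work, drain_local_work, NULL, true);
}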

arch/arm:

Note that the generic version is a little different from the Arm one:

1. It has the mask as first parameter
2. It calls the function on the calling CPU with interrupts disabled,
   but this should be OK since the function is called on the other CPUs
   with interrupts disabled anyway.

arch/tile:

The API is the same as the tile private one, but the generic version
also calls the function with interrupts disabled in the UP case.

This is OK since the function is called on the other CPUs
with interrupts disabled.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Sasha Levin <levinsasha928@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Avi Kivity <avi@redhat.com>
Acked-by: Michal Nazarewicz <mina86@mina86.org>
Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com>
Cc: Milton Miller <miltonm@bga.com>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/arm/kernel/smp_tlb.c   |   20 +++++---------------
 arch/tile/include/asm/smp.h |    7 -------
 arch/tile/kernel/smp.c      |   19 -------------------
 include/linux/smp.h         |   22 ++++++++++++++++++++++
 kernel/smp.c                |   29 +++++++++++++++++++++++++++++
 5 files changed, 56 insertions(+), 41 deletions(-)

diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index 7dcb352..02c5d2c 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -13,18 +13,6 @@
 #include <asm/smp_plat.h>
 #include <asm/tlbflush.h>
 
-static void on_each_cpu_mask(void (*func)(void *), void *info, int wait,
-	const struct cpumask *mask)
-{
-	preempt_disable();
-
-	smp_call_function_many(mask, func, info, wait);
-	if (cpumask_test_cpu(smp_processor_id(), mask))
-		func(info);
-
-	preempt_enable();
-}
-
 /**********************************************************************/
 
 /*
@@ -87,7 +75,7 @@ void flush_tlb_all(void)
 void flush_tlb_mm(struct mm_struct *mm)
 {
 	if (tlb_ops_need_broadcast())
-		on_each_cpu_mask(ipi_flush_tlb_mm, mm, 1, mm_cpumask(mm));
+		on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
 	else
 		local_flush_tlb_mm(mm);
 }
@@ -98,7 +86,8 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 		struct tlb_args ta;
 		ta.ta_vma = vma;
 		ta.ta_start = uaddr;
-		on_each_cpu_mask(ipi_flush_tlb_page, &ta, 1, mm_cpumask(vma->vm_mm));
+		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
+					&ta, 1);
 	} else
 		local_flush_tlb_page(vma, uaddr);
 }
@@ -121,7 +110,8 @@ void flush_tlb_range(struct vm_area_struct *vma,
 		ta.ta_vma = vma;
 		ta.ta_start = start;
 		ta.ta_end = end;
-		on_each_cpu_mask(ipi_flush_tlb_range, &ta, 1, mm_cpumask(vma->vm_mm));
+		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range,
+					&ta, 1);
 	} else
 		local_flush_tlb_range(vma, start, end);
 }
diff --git a/arch/tile/include/asm/smp.h b/arch/tile/include/asm/smp.h
index 532124a..1aa759a 100644
--- a/arch/tile/include/asm/smp.h
+++ b/arch/tile/include/asm/smp.h
@@ -43,10 +43,6 @@ void evaluate_message(int tag);
 /* Boot a secondary cpu */
 void online_secondary(void);
 
-/* Call a function on a specified set of CPUs (may include this one). */
-extern void on_each_cpu_mask(const struct cpumask *mask,
-			     void (*func)(void *), void *info, bool wait);
-
 /* Topology of the supervisor tile grid, and coordinates of boot processor */
 extern HV_Topology smp_topology;
 
@@ -91,9 +87,6 @@ void print_disabled_cpus(void);
 
 #else /* !CONFIG_SMP */
 
-#define on_each_cpu_mask(mask, func, info, wait)		\
-  do { if (cpumask_test_cpu(0, (mask))) func(info); } while (0)
-
 #define smp_master_cpu		0
 #define smp_height		1
 #define smp_width		1
diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index c52224d..a44e103 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -87,25 +87,6 @@ void send_IPI_allbutself(int tag)
 	send_IPI_many(&mask, tag);
 }
 
-
-/*
- * Provide smp_call_function_mask, but also run function locally
- * if specified in the mask.
- */
-void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
-		      void *info, bool wait)
-{
-	int cpu = get_cpu();
-	smp_call_function_many(mask, func, info, wait);
-	if (cpumask_test_cpu(cpu, mask)) {
-		local_irq_disable();
-		func(info);
-		local_irq_enable();
-	}
-	put_cpu();
-}
-
-
 /*
  * Functions related to starting/stopping cpus.
  */
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 78fd0a2..742a45a 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -101,6 +101,13 @@ static inline void call_function_init(void) { }
 int on_each_cpu(smp_call_func_t func, void *info, int wait);
 
 /*
+ * Call a function on processors specified by mask, which might include
+ * the local one.
+ */
+void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
+		void *info, bool wait);
+
+/*
  * Mark the boot cpu "online" so that it can call console drivers in
  * printk() and can access its per-cpu storage.
  */
@@ -131,6 +138,21 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
 		local_irq_enable();		\
 		0;				\
 	})
+/*
+ * Note we still need to test the mask even for UP
+ * because we actually can get an empty mask from
+ * code that on SMP might call us without the local
+ * CPU in the mask.
+ */
+#define on_each_cpu_mask(mask, func, info, wait) \
+	do {						\
+		if (cpumask_test_cpu(0, (mask))) {	\
+			local_irq_disable();		\
+			(func)(info);			\
+			local_irq_enable();		\
+		}					\
+	} while (0)
+
 static inline void smp_send_reschedule(int cpu) { }
 #define num_booting_cpus()			1
 #define smp_prepare_boot_cpu()			do {} while (0)
diff --git a/kernel/smp.c b/kernel/smp.c
index db197d6..a081e6c 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -701,3 +701,32 @@ int on_each_cpu(void (*func) (void *info), void *info, int wait)
 	return ret;
 }
 EXPORT_SYMBOL(on_each_cpu);
+
+/**
+ * on_each_cpu_mask(): Run a function on processors specified by
+ * cpumask, which may include the local processor.
+ * @mask: The set of cpus to run on (only runs on online subset).
+ * @func: The function to run. This must be fast and non-blocking.
+ * @info: An arbitrary pointer to pass to the function.
+ * @wait: If true, wait (atomically) until function has completed
+ *        on other CPUs.
+ *
+ * If @wait is true, then returns once @func has returned.
+ *
+ * You must not call this function with disabled interrupts or
+ * from a hardware interrupt handler or from a bottom half handler.
+ */
+void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
+			void *info, bool wait)
+{
+	int cpu = get_cpu();
+
+	smp_call_function_many(mask, func, info, wait);
+	if (cpumask_test_cpu(cpu, mask)) {
+		local_irq_disable();
+		func(info);
+		local_irq_enable();
+	}
+	put_cpu();
+}
+EXPORT_SYMBOL(on_each_cpu_mask);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 13/16] smp: add func to IPI cpus based on parameter func
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (11 preceding siblings ...)
  2013-02-13 16:12 ` [PATCH 12/16] smp: introduce a generic on_each_cpu_mask() function Sebastian Andrzej Siewior
@ 2013-02-13 16:12 ` Sebastian Andrzej Siewior
  2013-02-13 16:12 ` [PATCH 14/16] slub: only IPI CPUs that have per cpu obj to flush Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Gilad Ben-Yossef,
	Chris Metcalf, Christoph Lameter, Frederic Weisbecker,
	Russell King, Pekka Enberg, Matt Mackall, Sasha Levin,
	Rik van Riel, Andi Kleen, Alexander Viro, Avi Kivity,
	Kosaki Motohiro, Milton Miller, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior

From: Gilad Ben-Yossef <gilad@benyossef.com>

Add the on_each_cpu_cond() function that wraps on_each_cpu_mask() and
calculates the cpumask of cpus to IPI by calling a function supplied as a
parameter in order to determine whether to IPI each specific cpu.

The function works around allocation failure of the cpumask variable
with CONFIG_CPUMASK_OFFSTACK=y by iterating over the cpus and sending
the IPIs one at a time via smp_call_function_single().

The function is useful since it separates the specific code that
decides in each case whether to IPI a specific cpu for a specific
request from the common boilerplate code of creating the mask, handling
allocation failures, etc.
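
As a usage sketch (not part of this patch; struct my_cache and its
members are hypothetical), in the spirit of the SLUB patch later in
this series the new API lets a cache IPI only the CPUs that actually
hold per-cpu objects:

#include <linux/gfp.h>
#include <linux/percpu.h>
#include <linux/smp.h>

struct my_cache_cpu {
	unsigned int nr_cached;
};

struct my_cache {
	struct my_cache_cpu __percpu *cpu_data;
};

/* predicate: called with preemption disabled, once per online cpu */
static bool cpu_has_cached_objects(int cpu, void *info)
{
	struct my_cache *cache = info;

	return per_cpu_ptr(cache->cpu_data, cpu)->nr_cached != 0;
}

static void drain_cached_objects(void *info)
{
	/* runs only on the CPUs for which the predicate returned true */
}

void drain_all(struct my_cache *cache)
{
	/* GFP_ATOMIC: do not sleep while allocating the internal cpumask */
	on_each_cpu_cond(cpu_has_cached_objects, drain_cached_objects,
			 cache, true, GFP_ATOMIC);
}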

[akpm@linux-foundation.org: s/gfpflags/gfp_flags/]
[akpm@linux-foundation.org: avoid double-evaluation of `info' (per Michal), parenthesise evaluation of `cond_func']
[akpm@linux-foundation.org: s/CPU/CPUs, use all 80 cols in comment]
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Avi Kivity <avi@redhat.com>
Acked-by: Michal Nazarewicz <mina86@mina86.org>
Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com>
Cc: Milton Miller <miltonm@bga.com>
Reviewed-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/smp.h |   24 ++++++++++++++++++++
 kernel/smp.c        |   61 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 742a45a..3001ba5 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -108,6 +108,15 @@ void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
 		void *info, bool wait);
 
 /*
+ * Call a function on each processor for which the supplied function
+ * cond_func returns a positive value. This may include the local
+ * processor.
+ */
+void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
+		smp_call_func_t func, void *info, bool wait,
+		gfp_t gfp_flags);
+
+/*
  * Mark the boot cpu "online" so that it can call console drivers in
  * printk() and can access its per-cpu storage.
  */
@@ -152,6 +161,21 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
 			local_irq_enable();		\
 		}					\
 	} while (0)
+/*
+ * Preemption is disabled here to make sure the cond_func is called under the
+ * same conditions in UP and SMP.
+ */
+#define on_each_cpu_cond(cond_func, func, info, wait, gfp_flags)\
+	do {							\
+		void *__info = (info);				\
+		preempt_disable();				\
+		if ((cond_func)(0, __info)) {			\
+			local_irq_disable();			\
+			(func)(__info);				\
+			local_irq_enable();			\
+		}						\
+		preempt_enable();				\
+	} while (0)
 
 static inline void smp_send_reschedule(int cpu) { }
 #define num_booting_cpus()			1
diff --git a/kernel/smp.c b/kernel/smp.c
index a081e6c..2f8b10e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -730,3 +730,64 @@ void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
 	put_cpu();
 }
 EXPORT_SYMBOL(on_each_cpu_mask);
+
+/*
+ * on_each_cpu_cond(): Call a function on each processor for which
+ * the supplied function cond_func returns true, optionally waiting
+ * for all the required CPUs to finish. This may include the local
+ * processor.
+ * @cond_func:	A callback function that is passed a cpu id and
+ *		the info parameter. The function is called
+ *		with preemption disabled. The function should
+ *		return a boolean value indicating whether to IPI
+ *		the specified CPU.
+ * @func:	The function to run on all applicable CPUs.
+ *		This must be fast and non-blocking.
+ * @info:	An arbitrary pointer to pass to both functions.
+ * @wait:	If true, wait (atomically) until function has
+ *		completed on other CPUs.
+ * @gfp_flags:	GFP flags to use when allocating the cpumask
+ *		used internally by the function.
+ *
+ * The function might sleep if the GFP flags indicate that a non-atomic
+ * allocation is allowed.
+ *
+ * Preemption is disabled to protect against CPUs going offline but not online.
+ * CPUs going online during the call will not be seen or sent an IPI.
+ *
+ * You must not call this function with disabled interrupts or
+ * from a hardware interrupt handler or from a bottom half handler.
+ */
+void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
+			smp_call_func_t func, void *info, bool wait,
+			gfp_t gfp_flags)
+{
+	cpumask_var_t cpus;
+	int cpu, ret;
+
+	might_sleep_if(gfp_flags & __GFP_WAIT);
+
+	if (likely(zalloc_cpumask_var(&cpus, (gfp_flags|__GFP_NOWARN)))) {
+		preempt_disable();
+		for_each_online_cpu(cpu)
+			if (cond_func(cpu, info))
+				cpumask_set_cpu(cpu, cpus);
+		on_each_cpu_mask(cpus, func, info, wait);
+		preempt_enable();
+		free_cpumask_var(cpus);
+	} else {
+		/*
+		 * No free cpumask, bother. No matter, we'll
+		 * just have to IPI them one by one.
+		 */
+		preempt_disable();
+		for_each_online_cpu(cpu)
+			if (cond_func(cpu, info)) {
+				ret = smp_call_function_single(cpu, func,
+								info, wait);
+				WARN_ON_ONCE(!ret);
+			}
+		preempt_enable();
+	}
+}
+EXPORT_SYMBOL(on_each_cpu_cond);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 14/16] slub: only IPI CPUs that have per cpu obj to flush
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (12 preceding siblings ...)
  2013-02-13 16:12 ` [PATCH 13/16] smp: add func to IPI cpus based on parameter func Sebastian Andrzej Siewior
@ 2013-02-13 16:12 ` Sebastian Andrzej Siewior
  2013-02-13 16:12 ` [PATCH 15/16] mm: Enable SLUB for RT Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Gilad Ben-Yossef,
	Chris Metcalf, Frederic Weisbecker, Russell King, Pekka Enberg,
	Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen, Mel Gorman,
	Alexander Viro, Avi Kivity, Michal Nazarewicz, Kosaki Motohiro,
	Milton Miller, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior

From: Gilad Ben-Yossef <gilad@benyossef.com>

flush_all() is called for each kmem_cache_destroy().  So every cache being
destroyed dynamically ends up sending an IPI to each CPU in the system,
regardless of whether the cache has ever been used there.

For example, if you close the InfiniBand ipath driver char device file,
the close file op calls kmem_cache_destroy().  So running some InfiniBand
config tool on a single CPU dedicated to system tasks might interrupt the
other 127 CPUs that are dedicated to some CPU-intensive or latency-sensitive
task.

I suspect there is a good chance that every line in the output of "git
grep kmem_cache_destroy linux/ | grep '\->'" has a similar scenario.

This patch rectifies the issue by sending the IPI that flushes per-cpu
objects back to the free lists only to those CPUs that appear to have
such objects.

The check of which CPUs to IPI is racy, but that does not matter: asking a
CPU without per-cpu objects to flush does no damage, and as far as I can
tell flush_all() by itself is racy against allocations on remote CPUs
anyway. So if you required flush_all() to be deterministic, you would have
to arrange for locking regardless.

Without this patch the following artificial test case:

$ cd /sys/kernel/slab
$ for DIR in *; do cat $DIR/alloc_calls > /dev/null; done

produces 166 IPIs on a cpuset-isolated CPU. With the patch it produces none.

The memory allocation failure code path for the CPUMASK_OFFSTACK=y
config was tested using the fault injection framework.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Avi Kivity <avi@redhat.com>
Cc: Michal Nazarewicz <mina86@mina86.org>
Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com>
Cc: Milton Miller <miltonm@bga.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/slub.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index d9ba33d..2a250f4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2030,9 +2030,17 @@ static void flush_cpu_slab(void *d)
 	__flush_cpu_slab(s, smp_processor_id());
 }
 
+static bool has_cpu_slab(int cpu, void *info)
+{
+	struct kmem_cache *s = info;
+	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
+
+	return !!(c->page);
+}
+
 static void flush_all(struct kmem_cache *s)
 {
-	on_each_cpu(flush_cpu_slab, s, 1);
+	on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1, GFP_ATOMIC);
 }
 
 /*
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 15/16] mm: Enable SLUB for RT
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (13 preceding siblings ...)
  2013-02-13 16:12 ` [PATCH 14/16] slub: only IPI CPUs that have per cpu obj to flush Sebastian Andrzej Siewior
@ 2013-02-13 16:12 ` Sebastian Andrzej Siewior
  2013-02-13 16:12 ` [PATCH 16/16] slub: Enable irqs for __GFP_WAIT Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

Make SLUB RT aware and remove the restriction in Kconfig.
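
The less obvious part of the conversion is the per-CPU slub_free_list:
free_slab() can be reached with interrupts disabled, and on RT freeing
pages in that state is a problem (presumably because the page allocator
may take sleeping locks there), so such pages are queued and freed later
from a context where interrupts are enabled again. Condensed into a
sketch (defer_free()/drain_free_list() are made-up wrapper names; the
data structure and the locking come from the diff below, and the snippet
assumes it lives in mm/slub.c next to free_delayed()):

/* Per-CPU list of slab pages whose freeing had to be deferred. */
struct slub_free_list {
	raw_spinlock_t		lock;
	struct list_head	list;
};
static DEFINE_PER_CPU(struct slub_free_list, slub_free_list);

/* Called with interrupts disabled: queue the page instead of freeing it. */
static void defer_free(struct page *page)
{
	struct slub_free_list *f = &__get_cpu_var(slub_free_list);

	raw_spin_lock(&f->lock);
	list_add(&page->lru, &f->list);
	raw_spin_unlock(&f->lock);
}

/* Called on the way out of an irqs-off section, as __slab_alloc() does
 * below: splice the list while still pinned to the CPU, re-enable
 * interrupts and only then free the pages for real via free_delayed().
 */
static void drain_free_list(struct kmem_cache *s, unsigned long flags)
{
	struct slub_free_list *f = &__get_cpu_var(slub_free_list);
	LIST_HEAD(tofree);

	raw_spin_lock(&f->lock);
	list_splice_init(&f->list, &tofree);
	raw_spin_unlock(&f->lock);
	local_irq_restore(flags);
	free_delayed(s, &tofree);
}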

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy@linutronix: fix a few conflicts]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/slub_def.h |    2 +-
 init/Kconfig             |    1 -
 mm/slub.c                |  115 +++++++++++++++++++++++++++++++++++-----------
 3 files changed, 90 insertions(+), 28 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index a32bcfd..0c674f6 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -52,7 +52,7 @@ struct kmem_cache_cpu {
 };
 
 struct kmem_cache_node {
-	spinlock_t list_lock;	/* Protect partial list and nr_partial */
+	raw_spinlock_t list_lock;	/* Protect partial list and nr_partial */
 	unsigned long nr_partial;
 	struct list_head partial;
 #ifdef CONFIG_SLUB_DEBUG
diff --git a/init/Kconfig b/init/Kconfig
index aa6545f..cfb1668 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1240,7 +1240,6 @@ config SLAB
 
 config SLUB
 	bool "SLUB (Unqueued Allocator)"
-	depends on !PREEMPT_RT_FULL
 	help
 	   SLUB is a slab allocator that minimizes cache line usage
 	   instead of managing queues of cached objects (SLAB approach).
diff --git a/mm/slub.c b/mm/slub.c
index 2a250f4..8475580 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1258,6 +1258,12 @@ static inline void slab_free_hook(struct kmem_cache *s, void *x) {}
 
 #endif /* CONFIG_SLUB_DEBUG */
 
+struct slub_free_list {
+	raw_spinlock_t		lock;
+	struct list_head	list;
+};
+static DEFINE_PER_CPU(struct slub_free_list, slub_free_list);
+
 /*
  * Slab allocation and freeing
  */
@@ -1282,7 +1288,11 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 
 	flags &= gfp_allowed_mask;
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+	if (system_state == SYSTEM_RUNNING)
+#else
 	if (flags & __GFP_WAIT)
+#endif
 		local_irq_enable();
 
 	flags |= s->allocflags;
@@ -1306,7 +1316,11 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 			stat(s, ORDER_FALLBACK);
 	}
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+	if (system_state == SYSTEM_RUNNING)
+#else
 	if (flags & __GFP_WAIT)
+#endif
 		local_irq_disable();
 
 	if (!page)
@@ -1412,6 +1426,16 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 	__free_pages(page, order);
 }
 
+static void free_delayed(struct kmem_cache *s, struct list_head *h)
+{
+	while(!list_empty(h)) {
+		struct page *page = list_first_entry(h, struct page, lru);
+
+		list_del(&page->lru);
+		__free_slab(s, page);
+	}
+}
+
 #define need_reserve_slab_rcu						\
 	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
 
@@ -1446,6 +1470,12 @@ static void free_slab(struct kmem_cache *s, struct page *page)
 		}
 
 		call_rcu(head, rcu_free_slab);
+	} else if (irqs_disabled()) {
+		struct slub_free_list *f = &__get_cpu_var(slub_free_list);
+
+		raw_spin_lock(&f->lock);
+		list_add(&page->lru, &f->list);
+		raw_spin_unlock(&f->lock);
 	} else
 		__free_slab(s, page);
 }
@@ -1545,7 +1575,7 @@ static void *get_partial_node(struct kmem_cache *s,
 	if (!n || !n->nr_partial)
 		return NULL;
 
-	spin_lock(&n->list_lock);
+	raw_spin_lock(&n->list_lock);
 	list_for_each_entry_safe(page, page2, &n->partial, lru) {
 		void *t = acquire_slab(s, n, page, object == NULL);
 		int available;
@@ -1566,7 +1596,7 @@ static void *get_partial_node(struct kmem_cache *s,
 			break;
 
 	}
-	spin_unlock(&n->list_lock);
+	raw_spin_unlock(&n->list_lock);
 	return object;
 }
 
@@ -1815,7 +1845,7 @@ static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 			 * that acquire_slab() will see a slab page that
 			 * is frozen
 			 */
-			spin_lock(&n->list_lock);
+			raw_spin_lock(&n->list_lock);
 		}
 	} else {
 		m = M_FULL;
@@ -1826,7 +1856,7 @@ static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 			 * slabs from diagnostic functions will not see
 			 * any frozen slabs.
 			 */
-			spin_lock(&n->list_lock);
+			raw_spin_lock(&n->list_lock);
 		}
 	}
 
@@ -1861,7 +1891,7 @@ static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 		goto redo;
 
 	if (lock)
-		spin_unlock(&n->list_lock);
+		raw_spin_unlock(&n->list_lock);
 
 	if (m == M_FREE) {
 		stat(s, DEACTIVATE_EMPTY);
@@ -1910,10 +1940,10 @@ static void unfreeze_partials(struct kmem_cache *s,
 				m = M_PARTIAL;
 				if (n != n2) {
 					if (n)
-						spin_unlock(&n->list_lock);
+						raw_spin_unlock(&n->list_lock);
 
 					n = n2;
-					spin_lock(&n->list_lock);
+					raw_spin_lock(&n->list_lock);
 				}
 			}
 
@@ -1939,7 +1969,7 @@ static void unfreeze_partials(struct kmem_cache *s,
 	}
 
 	if (n)
-		spin_unlock(&n->list_lock);
+		raw_spin_unlock(&n->list_lock);
 
 	while (discard_page) {
 		page = discard_page;
@@ -1960,7 +1990,7 @@ static void unfreeze_partials(struct kmem_cache *s,
  * If we did not find a slot then simply move all the partials to the
  * per node partial list.
  */
-int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
+static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 {
 	struct page *oldpage;
 	int pages;
@@ -1975,6 +2005,8 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 			pobjects = oldpage->pobjects;
 			pages = oldpage->pages;
 			if (drain && pobjects > s->cpu_partial) {
+				LIST_HEAD(tofree);
+				struct slub_free_list *f;
 				unsigned long flags;
 				/*
 				 * partial array is full. Move the existing
@@ -1982,7 +2014,12 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 				 */
 				local_irq_save(flags);
 				unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
+				f = &__get_cpu_var(slub_free_list);
+				raw_spin_lock(&f->lock);
+				list_splice_init(&f->list, &tofree);
+				raw_spin_unlock(&f->lock);
 				local_irq_restore(flags);
+				free_delayed(s, &tofree);
 				pobjects = 0;
 				pages = 0;
 			}
@@ -2040,7 +2077,22 @@ static bool has_cpu_slab(int cpu, void *info)
 
 static void flush_all(struct kmem_cache *s)
 {
+	LIST_HEAD(tofree);
+	int cpu;
+
 	on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1, GFP_ATOMIC);
+	for_each_online_cpu(cpu) {
+		struct slub_free_list *f;
+
+		if (!has_cpu_slab(cpu, s))
+			continue;
+
+		f = &per_cpu(slub_free_list, cpu);
+		raw_spin_lock_irq(&f->lock);
+		list_splice_init(&f->list, &tofree);
+		raw_spin_unlock_irq(&f->lock);
+		free_delayed(s, &tofree);
+	}
 }
 
 /*
@@ -2068,10 +2120,10 @@ static unsigned long count_partial(struct kmem_cache_node *n,
 	unsigned long x = 0;
 	struct page *page;
 
-	spin_lock_irqsave(&n->list_lock, flags);
+	raw_spin_lock_irqsave(&n->list_lock, flags);
 	list_for_each_entry(page, &n->partial, lru)
 		x += get_count(page);
-	spin_unlock_irqrestore(&n->list_lock, flags);
+	raw_spin_unlock_irqrestore(&n->list_lock, flags);
 	return x;
 }
 
@@ -2167,6 +2219,8 @@ static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
 static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			  unsigned long addr, struct kmem_cache_cpu *c)
 {
+	struct slub_free_list *f;
+	LIST_HEAD(tofree);
 	void **object;
 	unsigned long flags;
 	struct page new;
@@ -2233,7 +2287,13 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 load_freelist:
 	c->freelist = get_freepointer(s, object);
 	c->tid = next_tid(c->tid);
+out:
+	f = &__get_cpu_var(slub_free_list);
+	raw_spin_lock(&f->lock);
+	list_splice_init(&f->list, &tofree);
+	raw_spin_unlock(&f->lock);
 	local_irq_restore(flags);
+	free_delayed(s, &tofree);
 	return object;
 
 new_slab:
@@ -2258,8 +2318,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
 				slab_out_of_memory(s, gfpflags, node);
 
-			local_irq_restore(flags);
-			return NULL;
+			goto out;
 		}
 	}
 
@@ -2273,8 +2332,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	c->freelist = get_freepointer(s, object);
 	deactivate_slab(s, c);
 	c->node = NUMA_NO_NODE;
-	local_irq_restore(flags);
-	return object;
+	goto out;
 }
 
 /*
@@ -2466,7 +2524,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 				 * Otherwise the list_lock will synchronize with
 				 * other processors updating the list of slabs.
 				 */
-				spin_lock_irqsave(&n->list_lock, flags);
+				raw_spin_lock_irqsave(&n->list_lock, flags);
 
 			}
 		}
@@ -2515,7 +2573,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 			stat(s, FREE_ADD_PARTIAL);
 		}
 	}
-	spin_unlock_irqrestore(&n->list_lock, flags);
+	raw_spin_unlock_irqrestore(&n->list_lock, flags);
 	return;
 
 slab_empty:
@@ -2529,7 +2587,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 		/* Slab must be on the full list */
 		remove_full(s, page);
 
-	spin_unlock_irqrestore(&n->list_lock, flags);
+	raw_spin_unlock_irqrestore(&n->list_lock, flags);
 	stat(s, FREE_SLAB);
 	discard_slab(s, page);
 }
@@ -2759,7 +2817,7 @@ static void
 init_kmem_cache_node(struct kmem_cache_node *n, struct kmem_cache *s)
 {
 	n->nr_partial = 0;
-	spin_lock_init(&n->list_lock);
+	raw_spin_lock_init(&n->list_lock);
 	INIT_LIST_HEAD(&n->partial);
 #ifdef CONFIG_SLUB_DEBUG
 	atomic_long_set(&n->nr_slabs, 0);
@@ -3499,7 +3557,7 @@ int kmem_cache_shrink(struct kmem_cache *s)
 		for (i = 0; i < objects; i++)
 			INIT_LIST_HEAD(slabs_by_inuse + i);
 
-		spin_lock_irqsave(&n->list_lock, flags);
+		raw_spin_lock_irqsave(&n->list_lock, flags);
 
 		/*
 		 * Build lists indexed by the items in use in each slab.
@@ -3520,7 +3578,7 @@ int kmem_cache_shrink(struct kmem_cache *s)
 		for (i = objects - 1; i > 0; i--)
 			list_splice(slabs_by_inuse + i, n->partial.prev);
 
-		spin_unlock_irqrestore(&n->list_lock, flags);
+		raw_spin_unlock_irqrestore(&n->list_lock, flags);
 
 		/* Release empty slabs */
 		list_for_each_entry_safe(page, t, slabs_by_inuse, lru)
@@ -3686,10 +3744,15 @@ void __init kmem_cache_init(void)
 	int i;
 	int caches = 0;
 	struct kmem_cache *temp_kmem_cache;
-	int order;
+	int order, cpu;
 	struct kmem_cache *temp_kmem_cache_node;
 	unsigned long kmalloc_size;
 
+	for_each_possible_cpu(cpu) {
+		raw_spin_lock_init(&per_cpu(slub_free_list, cpu).lock);
+		INIT_LIST_HEAD(&per_cpu(slub_free_list, cpu).list);
+	}
+
 	kmem_size = offsetof(struct kmem_cache, node) +
 				nr_node_ids * sizeof(struct kmem_cache_node *);
 
@@ -4110,7 +4173,7 @@ static int validate_slab_node(struct kmem_cache *s,
 	struct page *page;
 	unsigned long flags;
 
-	spin_lock_irqsave(&n->list_lock, flags);
+	raw_spin_lock_irqsave(&n->list_lock, flags);
 
 	list_for_each_entry(page, &n->partial, lru) {
 		validate_slab_slab(s, page, map);
@@ -4133,7 +4196,7 @@ static int validate_slab_node(struct kmem_cache *s,
 			atomic_long_read(&n->nr_slabs));
 
 out:
-	spin_unlock_irqrestore(&n->list_lock, flags);
+	raw_spin_unlock_irqrestore(&n->list_lock, flags);
 	return count;
 }
 
@@ -4323,12 +4386,12 @@ static int list_locations(struct kmem_cache *s, char *buf,
 		if (!atomic_long_read(&n->nr_slabs))
 			continue;
 
-		spin_lock_irqsave(&n->list_lock, flags);
+		raw_spin_lock_irqsave(&n->list_lock, flags);
 		list_for_each_entry(page, &n->partial, lru)
 			process_slab(&t, s, page, alloc, map);
 		list_for_each_entry(page, &n->full, lru)
 			process_slab(&t, s, page, alloc, map);
-		spin_unlock_irqrestore(&n->list_lock, flags);
+		raw_spin_unlock_irqrestore(&n->list_lock, flags);
 	}
 
 	for (i = 0; i < t.count; i++) {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 16/16] slub: Enable irqs for __GFP_WAIT
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (14 preceding siblings ...)
  2013-02-13 16:12 ` [PATCH 15/16] mm: Enable SLUB for RT Sebastian Andrzej Siewior
@ 2013-02-13 16:12 ` Sebastian Andrzej Siewior
  2013-02-13 17:24 ` [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Steven Rostedt
  2013-04-24  2:36 ` Steven Rostedt
  17 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-02-13 16:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

SYSTEM_RUNNING might be too late for enabling interrupts: allocations
with __GFP_WAIT can happen before the system reaches that state. So use
the __GFP_WAIT flag as an indicator as well.
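
Condensed, the resulting logic in allocate_slab() (assembled from the
two hunks below) is:

	bool enableirqs = (flags & __GFP_WAIT) != 0;

#ifdef CONFIG_PREEMPT_RT_FULL
	/* On RT, additionally enable interrupts once the system is up. */
	enableirqs |= system_state == SYSTEM_RUNNING;
#endif
	if (enableirqs)
		local_irq_enable();

	/* ... allocate the slab pages ... */

	if (enableirqs)
		local_irq_disable();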

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy@linutronix: fix !page conflict]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/slub.c |   13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8475580..4c62b7f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1285,14 +1285,15 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	struct page *page;
 	struct kmem_cache_order_objects oo = s->oo;
 	gfp_t alloc_gfp;
+	bool enableirqs;
 
 	flags &= gfp_allowed_mask;
 
+	enableirqs = (flags & __GFP_WAIT) != 0;
 #ifdef CONFIG_PREEMPT_RT_FULL
-	if (system_state == SYSTEM_RUNNING)
-#else
-	if (flags & __GFP_WAIT)
+	enableirqs |= system_state == SYSTEM_RUNNING;
 #endif
+	if (enableirqs)
 		local_irq_enable();
 
 	flags |= s->allocflags;
@@ -1316,11 +1317,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 			stat(s, ORDER_FALLBACK);
 	}
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-	if (system_state == SYSTEM_RUNNING)
-#else
-	if (flags & __GFP_WAIT)
-#endif
+	if (enableirqs)
 		local_irq_disable();
 
 	if (!page)
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PREEMPT RT] SLUB and split softirq lock for v3.2-rt
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (15 preceding siblings ...)
  2013-02-13 16:12 ` [PATCH 16/16] slub: Enable irqs for __GFP_WAIT Sebastian Andrzej Siewior
@ 2013-02-13 17:24 ` Steven Rostedt
  2013-02-13 17:41   ` Thomas Gleixner
  2013-02-19  1:54   ` Li Zefan
  2013-04-24  2:36 ` Steven Rostedt
  17 siblings, 2 replies; 28+ messages in thread
From: Steven Rostedt @ 2013-02-13 17:24 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, linux-rt-users, Carsten Emde

On Wed, 2013-02-13 at 17:11 +0100, Sebastian Andrzej Siewior wrote:
> 3.2-rt is a long term supported kernel, which lacks two RT features
> from 3.6: SLUB support and the split softirq lock implementation.
> 
> SLUB has a way better performance than SLAB on RT and the split
> softirq lock implementation is necessary especially for realtime
> networking applications.
> 
> The following patch series backports these features to 3.2-rt.
> 
> Steven, could you please apply these patches to a separate
> 3.2-rt-features branch?

Is it possible to separate these two features into two different patch
sets. I would like to apply the softirq changes first, and then apply
the slub changes.

Although it looks as though the first half is softirq only 1-8, and the
second half is slub related 9-16. If that's the case, I'll just pull in
1-8 first, play with them, and then do 9-16.

Thanks,

-- Steve



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PREEMPT RT] SLUB and split softirq lock for v3.2-rt
  2013-02-13 17:24 ` [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Steven Rostedt
@ 2013-02-13 17:41   ` Thomas Gleixner
  2013-02-19  1:54   ` Li Zefan
  1 sibling, 0 replies; 28+ messages in thread
From: Thomas Gleixner @ 2013-02-13 17:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sebastian Andrzej Siewior, linux-kernel, linux-rt-users, Carsten Emde

On Wed, 13 Feb 2013, Steven Rostedt wrote:

> On Wed, 2013-02-13 at 17:11 +0100, Sebastian Andrzej Siewior wrote:
> > 3.2-rt is a long term supported kernel, which lacks two RT features
> > from 3.6: SLUB support and the split softirq lock implementation.
> > 
> > SLUB has a way better performance than SLAB on RT and the split
> > softirq lock implementation is necessary especially for realtime
> > networking applications.
> > 
> > The following patch series backports these features to 3.2-rt.
> > 
> > Steven, could you please apply these patches to a separate
> > 3.2-rt-features branch?
> 
> Is it possible to separate these two features into two different patch
> sets. I would like to apply the softirq changes first, and then apply
> the slub changes.
> 
> Although it looks as though the first half is softirq only 1-8, and the
> second half is slub related 9-16. If that's the case, I'll just pull in
> 1-8 first, play with them, and then do 9-16.

That's correct.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 04/16] rcu: rcutiny: Prevent RCU stall
  2013-02-13 16:11 ` [PATCH 04/16] rcu: rcutiny: Prevent RCU stall Sebastian Andrzej Siewior
@ 2013-02-16 20:59   ` Paul E. McKenney
  2013-02-18 15:02     ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2013-02-16 20:59 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, linux-kernel, linux-rt-users, Carsten Emde,
	Thomas Gleixner

On Wed, Feb 13, 2013 at 05:11:59PM +0100, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> rcu_read_unlock_special() checks in_serving_softirq() and leaves early
> when true. On RT this is obviously wrong, as softirq processing context
> can be preempted and therefore such a task can be on the gp_tasks
> list. Leaving early here leaves the task on the list and therefore
> blocks RCU processing forever.
> 
> This cannot happen on mainline because softirq processing context
> cannot be preempted there.
> 
> In fact this check looks quite questionable in general. Neither irq
> context nor softirq processing context can ever be preempted in
> mainline, so the special unlock case should never be invoked in such a
> context. Now the only explanation might be an rcu_read_unlock() being
> interrupted and therefore leaving the rcu nest count at 0 before the
> special unlock bit has been cleared. That looks fragile. At least it's
> missing a big fat comment. Paul ????
> 
> See mainline commits ec433f0c5 and 8762705a for further enlightenment.
> 
> Reported-by: Kristian Lehmann <krleit00@hs-esslingen.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> [bigeasy@linutronix: different in-irq check]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  kernel/rcutiny_plugin.h |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcutiny_plugin.h b/kernel/rcutiny_plugin.h
> index 2b0484a..bac1906 100644
> --- a/kernel/rcutiny_plugin.h
> +++ b/kernel/rcutiny_plugin.h
> @@ -552,7 +552,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  		rcu_preempt_cpu_qs();
> 
>  	/* Hardware IRQ handlers cannot block. */
> -	if (in_irq()) {
> +	if (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_OFFSET)) {

For whatever it is worth, in mainline this is:

	if (in_irq() || in_serving_softirq()) {

The definition of in_serving_softirq() is a bit different:

#define in_serving_softirq()    (softirq_count() & SOFTIRQ_OFFSET)

This might be due to differences between mainline and -rt, but thought
it worth calling attention to.

							Thanx, Paul

>  		local_irq_restore(flags);
>  		return;
>  	}
> -- 
> 1.7.10.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 04/16] rcu: rcutiny: Prevent RCU stall
  2013-02-16 20:59   ` Paul E. McKenney
@ 2013-02-18 15:02     ` Steven Rostedt
  0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2013-02-18 15:02 UTC (permalink / raw)
  To: paulmck
  Cc: Sebastian Andrzej Siewior, linux-kernel, linux-rt-users,
	Carsten Emde, Thomas Gleixner

On Sat, 2013-02-16 at 12:59 -0800, Paul E. McKenney wrote:
>  
> > diff --git a/kernel/rcutiny_plugin.h b/kernel/rcutiny_plugin.h
> > index 2b0484a..bac1906 100644
> > --- a/kernel/rcutiny_plugin.h
> > +++ b/kernel/rcutiny_plugin.h
> > @@ -552,7 +552,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> >  		rcu_preempt_cpu_qs();
> > 
> >  	/* Hardware IRQ handlers cannot block. */
> > -	if (in_irq()) {
> > +	if (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_OFFSET)) {
> 
> For whatever it is worth, in mainline this is:
> 
> 	if (in_irq() || in_serving_softirq()) {
> 
> The definition of in_serving_softirq() is a bit different:
> 
> #define in_serving_softirq()    (softirq_count() & SOFTIRQ_OFFSET)
> 
> This might be due to differences between mainline and -rt, but thought
> it worth calling attention to.
> 

Right, and we should avoid open coding HARDIRQ_MASK and SOFTIRQ_OFFSET. 
And note, -rt is slowly creeping into mainline. It's better to have
things like this nicely tidied up.
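
For illustration only, a tidied-up form might wrap the check in a helper
instead of open-coding the masks at the call site; the helper name below
is made up and nothing like it exists in the tree:

/* Hypothetical helper: same check as the backported patch, just named. */
static inline bool in_hardirq_or_softirq_ctx(void)
{
	return preempt_count() & (HARDIRQ_MASK | SOFTIRQ_OFFSET);
}

	/* in rcu_read_unlock_special(): */
	if (in_hardirq_or_softirq_ctx()) {
		local_irq_restore(flags);
		return;
	}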

-- Steve



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PREEMPT RT] SLUB and split softirq lock for v3.2-rt
  2013-02-13 17:24 ` [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Steven Rostedt
  2013-02-13 17:41   ` Thomas Gleixner
@ 2013-02-19  1:54   ` Li Zefan
  2013-02-19  1:56     ` Li Zefan
  1 sibling, 1 reply; 28+ messages in thread
From: Li Zefan @ 2013-02-19  1:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sebastian Andrzej Siewior, linux-kernel, linux-rt-users, Carsten Emde

On 2013/2/14 1:24, Steven Rostedt wrote:
> On Wed, 2013-02-13 at 17:11 +0100, Sebastian Andrzej Siewior wrote:
>> 3.2-rt is a long term supported kernel, which lacks two RT features
>> from 3.6: SLUB support and the split softirq lock implementation.
>>
>> SLUB has a way better performance than SLAB on RT and the split
>> softirq lock implementation is necessary especially for realtime
>> networking applications.
>>
>> The following patch series backports these features to 3.2-rt.
>>
>> Steven, could you please apply these patches to a separate
>> 3.2-rt-features branch?
> 
> Is it possible to separate these two features into two different patch
> sets. I would like to apply the softirq changes first, and then apply
> the slub changes.
> 
> Although it looks as though the first half is softirq only 1-8, and the
> second half is slub related 9-16. If that's the case, I'll just pull in
> 1-8 first, play with them, and then do 9-16.
> 

We've also ported these two features to 3.4-rt. Do you want us to post
them?


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PREEMPT RT] SLUB and split softirq lock for v3.2-rt
  2013-02-19  1:54   ` Li Zefan
@ 2013-02-19  1:56     ` Li Zefan
  2013-02-19  4:06       ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Li Zefan @ 2013-02-19  1:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sebastian Andrzej Siewior, linux-kernel, linux-rt-users, Carsten Emde

On 2013/2/19 9:54, Li Zefan wrote:
> On 2013/2/14 1:24, Steven Rostedt wrote:
>> On Wed, 2013-02-13 at 17:11 +0100, Sebastian Andrzej Siewior wrote:
>>> 3.2-rt is a long term supported kernel, which lacks two RT features
>>> from 3.6: SLUB support and the split softirq lock implementation.
>>>
>>> SLUB has a way better performance than SLAB on RT and the split
>>> softirq lock implementation is necessary especially for realtime
>>> networking applications.
>>>
>>> The following patch series backports these features to 3.2-rt.
>>>
>>> Steven, could you please apply these patches to a separate
>>> 3.2-rt-features branch?
>>
>> Is it possible to separate these two features into two different patch
>> sets. I would like to apply the softirq changes first, and then apply
>> the slub changes.
>>
>> Although it looks as though the first half is softirq only 1-8, and the
>> second half is slub related 9-16. If that's the case, I'll just pull in
>> 1-8 first, play with them, and then do 9-16.
>>
> 
> We've also ported these two features to 3.4-rt. Do you want us to post
> them?
> 

Oh ignore me. Just saw the patchset for 3.4-rt.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PREEMPT RT] SLUB and split softirq lock for v3.2-rt
  2013-02-19  1:56     ` Li Zefan
@ 2013-02-19  4:06       ` Steven Rostedt
  2013-02-19  6:17         ` Mike Galbraith
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2013-02-19  4:06 UTC (permalink / raw)
  To: Li Zefan
  Cc: Sebastian Andrzej Siewior, linux-kernel, linux-rt-users, Carsten Emde

On Tue, 2013-02-19 at 09:56 +0800, Li Zefan wrote:

> Oh ignore me. Just saw the patchset for 3.4-rt.

Note, I already have part of the 3.4-feature (softirq backport) tested
and ready. What I'm waiting on is trying to figure out the best way to
process it.

I already know I'll have a v3.4-features branch, but I'm still deciding
about patches and tarballs: should they be posted as well, or just live
in git?

If I choose to post them, I want the scripts done before I do so, but
right now I'll be leaving for ELC and won't get to it till I get back.

-- Steve



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PREEMPT RT] SLUB and split softirq lock for v3.2-rt
  2013-02-19  4:06       ` Steven Rostedt
@ 2013-02-19  6:17         ` Mike Galbraith
  0 siblings, 0 replies; 28+ messages in thread
From: Mike Galbraith @ 2013-02-19  6:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Li Zefan, Sebastian Andrzej Siewior, linux-kernel,
	linux-rt-users, Carsten Emde

On Mon, 2013-02-18 at 23:06 -0500, Steven Rostedt wrote: 
> On Tue, 2013-02-19 at 09:56 +0800, Li Zefan wrote:
> 
> > Oh ignore me. Just saw the patchset for 3.4-rt.
> 
> Note, I already have part of the 3.4-feature (softirq backport) tested
> and ready. What I'm waiting on is trying to figure out the best way to
> process it.
> 
> I already know I'll have a v3.4-features branch, but I'm thinking about
> patches and tarballs, if it should be posted as well, or just be in git?

If I had a vote, it'd be for git only.

Extracting patches or using git directly is trivial.  The less time
maintainers spend on trivial crud, the more time they have for hard
stuff.  Git only is a win for maintainers _and_ us greedy users :)

-Mike


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PREEMPT RT] SLUB and split softirq lock for v3.2-rt
  2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
                   ` (16 preceding siblings ...)
  2013-02-13 17:24 ` [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Steven Rostedt
@ 2013-04-24  2:36 ` Steven Rostedt
  2013-04-24  8:11   ` Sebastian Andrzej Siewior
  17 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2013-04-24  2:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, linux-rt-users, Carsten Emde

On Wed, 2013-02-13 at 17:11 +0100, Sebastian Andrzej Siewior wrote:
> 3.2-rt is a long term supported kernel, which lacks two RT features
> from 3.6: SLUB support and the split softirq lock implementation.
> 
> SLUB has a way better performance than SLAB on RT and the split
> softirq lock implementation is necessary especially for realtime
> networking applications.
> 
> The following patch series backports these features to 3.2-rt.
> 
> Steven, could you please apply these patches to a separate
> 3.2-rt-features branch?
> 

Hi Sebastian,

Which version of 3.2-rt was this applied to? It does not apply here; in
particular, patch 7/16 does not apply at all. I looked at the history of
3.2-rt and I can't find where it would apply.

Thanks,

-- Steve



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PREEMPT RT] SLUB and split softirq lock for v3.2-rt
  2013-04-24  2:36 ` Steven Rostedt
@ 2013-04-24  8:11   ` Sebastian Andrzej Siewior
  2013-04-24 15:45     ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-04-24  8:11 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, linux-rt-users, Carsten Emde

On 04/24/2013 04:36 AM, Steven Rostedt wrote:
> Hi Sebastian,

Hi Steven,

> Which version of 3.2-rt was this applied to? It does not apply here; in
> particular, patch 7/16 does not apply at all. I looked at the history of
> 3.2-rt and I can't find where it would apply.

I applied them on top of 2438ee33 ("Linux 3.2.37-rt55 REBASE"). I just
grabbed the patches from the mailing list and re-applied them on top of
this tree and it still works.

After doing the same thing on v3.2.43-rt63-rebase I got:

Applying: softirq: Make serving softirqs a task flag
Applying: softirq: Split handling function
Applying: softirq: Split softirq locks
Using index info to reconstruct a base tree...
Falling back to patching base and 3-way merge...
Auto-merging include/linux/sched.h
Applying: rcu: rcutiny: Prevent RCU stall
Applying: softirq: Adapt NOHZ softirq pending check to new RT scheme
Applying: softirq: Add more debugging
Applying: softirq: Fix nohz pending issue for real
Applying: net: Use local_bh_disable in netif_rx_ni()
Applying: FIX [1/2] slub: Do not dereference NULL pointer in node_match
Applying: FIX [2/2] slub: Tid must be retrieved from the percpu area of
the current processor
Applying: slub: Use correct cpu_slab on dead cpu
Applying: smp: introduce a generic on_each_cpu_mask() function
Applying: smp: add func to IPI cpus based on parameter func
Applying: slub: only IPI CPUs that have per cpu obj to flush
Applying: mm: Enable SLUB for RT
Applying: slub: Enable irqs for __GFP_WAIT

git did not complain about patch 7 ("softirq: Fix nohz pending issue for
real") but about 3 ("softirq: Split softirq locks"). The diff between
the two queues (except for different sha1 and index lines and so on) is:

--- old/0003-softirq-Split-softirq-locks.patch
+++ new/0003-softirq-Split-softirq-locks.patch
@@ -60,8 +60,8 @@
        int softirq_nestcnt;
 +      unsigned int softirqs_raised;
  #endif
- #if defined CONFIG_PREEMPT_RT_FULL && defined CONFIG_HIGHMEM
-       int kmap_idx;
+ #ifdef CONFIG_PREEMPT_RT_FULL
+ # if defined CONFIG_HIGHMEM || defined CONFIG_X86_32


and this looks good.

> Thanks,
> 
> -- Steve

Sebastian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PREEMPT RT] SLUB and split softirq lock for v3.2-rt
  2013-04-24  8:11   ` Sebastian Andrzej Siewior
@ 2013-04-24 15:45     ` Steven Rostedt
  0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2013-04-24 15:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, linux-rt-users, Carsten Emde

On Wed, 2013-04-24 at 10:11 +0200, Sebastian Andrzej Siewior wrote:
> On 04/24/2013 04:36 AM, Steven Rostedt wrote:
> > Hi Sebastian,
> 
> Hi Steven,
> 
> > Which version of 3.2-rt was this applied to. It does not apply, where
> > patch 7/16 totally does not apply. I looked at the history of 3.2-rt and
> > I can't find where it would apply.
> 
> I applied them on top of 2438ee33 ("Linux 3.2.37-rt55 REBASE"). I just
> grabbed the patches from the mailing list and re-applied them on top of
> this tree and is still works.
> 

Never mind, the problem was at my end :-(  I saved the patches as an mbox,
but Evolution decided to move patch 5 to after patch 7, which broke
everything :-p

-- Steve



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2013-04-24 15:45 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-13 16:11 [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Sebastian Andrzej Siewior
2013-02-13 16:11 ` [PATCH 01/16] softirq: Make serving softirqs a task flag Sebastian Andrzej Siewior
2013-02-13 16:11 ` [PATCH 02/16] softirq: Split handling function Sebastian Andrzej Siewior
2013-02-13 16:11 ` [PATCH 03/16] softirq: Split softirq locks Sebastian Andrzej Siewior
2013-02-13 16:11 ` [PATCH 04/16] rcu: rcutiny: Prevent RCU stall Sebastian Andrzej Siewior
2013-02-16 20:59   ` Paul E. McKenney
2013-02-18 15:02     ` Steven Rostedt
2013-02-13 16:12 ` [PATCH 05/16] softirq: Adapt NOHZ softirq pending check to new RT scheme Sebastian Andrzej Siewior
2013-02-13 16:12 ` [PATCH 06/16] softirq: Add more debugging Sebastian Andrzej Siewior
2013-02-13 16:12 ` [PATCH 07/16] softirq: Fix nohz pending issue for real Sebastian Andrzej Siewior
2013-02-13 16:12 ` [PATCH 08/16] net: Use local_bh_disable in netif_rx_ni() Sebastian Andrzej Siewior
2013-02-13 16:12 ` [PATCH 09/16] FIX [1/2] slub: Do not dereference NULL pointer in node_match Sebastian Andrzej Siewior
2013-02-13 16:12 ` [PATCH 10/16] FIX [2/2] slub: Tid must be retrieved from the percpu area of the current processor Sebastian Andrzej Siewior
2013-02-13 16:12 ` [PATCH 11/16] slub: Use correct cpu_slab on dead cpu Sebastian Andrzej Siewior
2013-02-13 16:12 ` [PATCH 12/16] smp: introduce a generic on_each_cpu_mask() function Sebastian Andrzej Siewior
2013-02-13 16:12 ` [PATCH 13/16] smp: add func to IPI cpus based on parameter func Sebastian Andrzej Siewior
2013-02-13 16:12 ` [PATCH 14/16] slub: only IPI CPUs that have per cpu obj to flush Sebastian Andrzej Siewior
2013-02-13 16:12 ` [PATCH 15/16] mm: Enable SLUB for RT Sebastian Andrzej Siewior
2013-02-13 16:12 ` [PATCH 16/16] slub: Enable irqs for __GFP_WAIT Sebastian Andrzej Siewior
2013-02-13 17:24 ` [PREEMPT RT] SLUB and split softirq lock for v3.2-rt Steven Rostedt
2013-02-13 17:41   ` Thomas Gleixner
2013-02-19  1:54   ` Li Zefan
2013-02-19  1:56     ` Li Zefan
2013-02-19  4:06       ` Steven Rostedt
2013-02-19  6:17         ` Mike Galbraith
2013-04-24  2:36 ` Steven Rostedt
2013-04-24  8:11   ` Sebastian Andrzej Siewior
2013-04-24 15:45     ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).