* [PREEMPT RT] SLUB and split softirq lock for v3.4-rt
@ 2013-02-13 16:13 Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2013-02-13 16:13 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, linux-rt-users, Carsten Emde

3.4-rt lacks two RT features that are available in 3.6-rt: SLUB
support and the split softirq lock implementation.

SLUB performs considerably better than SLAB on RT, and the split
softirq lock implementation is needed especially for realtime
networking applications.

The following patch series backports these features to 3.4-rt.

Steven, could you please apply these patches to a separate
3.4-rt-features branch?

Sebastian


* [PATCH 01/12] softirq: Make serving softirqs a task flag
@ 2013-02-13 16:13 ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2013-02-13 16:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

Avoid the percpu softirq_runner pointer magic by using a task flag.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/sched.h |    1 +
 kernel/softirq.c      |   20 +++-----------------
 2 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b0448fa..8313893 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1876,6 +1876,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define PF_MEMALLOC	0x00000800	/* Allocating memory */
 #define PF_NPROC_EXCEEDED 0x00001000	/* set_user noticed that RLIMIT_NPROC was exceeded */
 #define PF_USED_MATH	0x00002000	/* if unset the fpu must be initialized before use */
+#define PF_IN_SOFTIRQ	0x00004000	/* Task is serving softirq */
 #define PF_NOFREEZE	0x00008000	/* this thread should not be frozen */
 #define PF_FROZEN	0x00010000	/* frozen for system suspend */
 #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 34fe1db..2980048 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -383,7 +383,6 @@ static inline void ksoftirqd_clr_sched_params(void) { }
  * On RT we serialize softirq execution with a cpu local lock
  */
 static DEFINE_LOCAL_IRQ_LOCK(local_softirq_lock);
-static DEFINE_PER_CPU(struct task_struct *, local_softirq_runner);
 
 static void __do_softirq_common(int need_rcu_bh_qs);
 
@@ -438,22 +437,9 @@ void _local_bh_enable(void)
 }
 EXPORT_SYMBOL(_local_bh_enable);
 
-/* For tracing */
-int notrace __in_softirq(void)
-{
-	if (__get_cpu_var(local_softirq_lock).owner == current)
-		return __get_cpu_var(local_softirq_lock).nestcnt;
-	return 0;
-}
-
 int in_serving_softirq(void)
 {
-	int res;
-
-	preempt_disable();
-	res = __get_cpu_var(local_softirq_runner) == current;
-	preempt_enable();
-	return res;
+	return current->flags & PF_IN_SOFTIRQ;
 }
 EXPORT_SYMBOL(in_serving_softirq);
 
@@ -471,7 +457,7 @@ static void __do_softirq_common(int need_rcu_bh_qs)
 	/* Reset the pending bitmask before enabling irqs */
 	set_softirq_pending(0);
 
-	__get_cpu_var(local_softirq_runner) = current;
+	current->flags |= PF_IN_SOFTIRQ;
 
 	lockdep_softirq_enter();
 
@@ -482,7 +468,7 @@ static void __do_softirq_common(int need_rcu_bh_qs)
 		wakeup_softirqd();
 
 	lockdep_softirq_exit();
-	__get_cpu_var(local_softirq_runner) = NULL;
+	current->flags &= ~PF_IN_SOFTIRQ;
 
 	current->softirq_nestcnt--;
 }
-- 
1.7.10.4



* [PATCH 02/12] softirq: Split handling function
@ 2013-02-13 16:13 ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2013-02-13 16:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

Split out the inner handling function, so RT can reuse it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/softirq.c |   43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 2980048..ad99de2 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -139,31 +139,34 @@ static void wakeup_softirqd(void)
 		wake_up_process(tsk);
 }
 
-static void handle_pending_softirqs(u32 pending, int cpu, int need_rcu_bh_qs)
+static void handle_softirq(unsigned int vec_nr, int cpu, int need_rcu_bh_qs)
 {
-	struct softirq_action *h = softirq_vec;
+	struct softirq_action *h = softirq_vec + vec_nr;
 	unsigned int prev_count = preempt_count();
 
-	local_irq_enable();
-	for ( ; pending; h++, pending >>= 1) {
-		unsigned int vec_nr = h - softirq_vec;
+	kstat_incr_softirqs_this_cpu(vec_nr);
+	trace_softirq_entry(vec_nr);
+	h->action(h);
+	trace_softirq_exit(vec_nr);
 
-		if (!(pending & 1))
-			continue;
+	if (unlikely(prev_count != preempt_count())) {
+		pr_err("softirq %u %s %p preempt count leak: %08x -> %08x\n",
+		       vec_nr, softirq_to_name[vec_nr], h->action,
+		       prev_count, (unsigned int) preempt_count());
+		preempt_count() = prev_count;
+	}
+	if (need_rcu_bh_qs)
+		rcu_bh_qs(cpu);
+}
 
-		kstat_incr_softirqs_this_cpu(vec_nr);
-		trace_softirq_entry(vec_nr);
-		h->action(h);
-		trace_softirq_exit(vec_nr);
-		if (unlikely(prev_count != preempt_count())) {
-			printk(KERN_ERR
- "huh, entered softirq %u %s %p with preempt_count %08x exited with %08x?\n",
-			       vec_nr, softirq_to_name[vec_nr], h->action,
-			       prev_count, (unsigned int) preempt_count());
-			preempt_count() = prev_count;
-		}
-		if (need_rcu_bh_qs)
-			rcu_bh_qs(cpu);
+static void handle_pending_softirqs(u32 pending, int cpu, int need_rcu_bh_qs)
+{
+	unsigned int vec_nr;
+
+	local_irq_enable();
+	for (vec_nr = 0; pending; vec_nr++, pending >>= 1) {
+		if (pending & 1)
+			handle_softirq(vec_nr, cpu, need_rcu_bh_qs);
 	}
 	local_irq_disable();
 }
-- 
1.7.10.4



* [PATCH 03/12] softirq: Split softirq locks
@ 2013-02-13 16:13 ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2013-02-13 16:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

The 3.x RT series removed the split softirq implementation in favour
of pushing softirq processing into the context of the thread which
raised it. However, this prevents us from handling the various
softirqs at different priorities. Instead of reintroducing the split
softirq threads, we now split the locks which serialize the softirq
processing.

If a softirq is raised in the context of a thread and that thread is
in a bh disabled region, the softirq is noted in a per thread field.
If the softirq is raised from hard interrupt context, the bit is set
in the flag field of ksoftirqd and ksoftirqd is invoked. When a
thread leaves a bh disabled region, it tries to execute the softirqs
which have been raised in its own context. It acquires the per
softirq / per cpu lock for the softirq and then checks whether the
softirq is still pending in the per cpu local_softirq_pending()
field. If yes, it runs the softirq. If no, some other task already
executed it. This allows for zero config softirq elevation in the
context of user space tasks or interrupt threads.
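
A condensed sketch of that handling loop (lock_softirq(),
do_single_softirq() and the softirqs_raised field are introduced by
the diff below; the irq enable/disable dance around the lock is left
out here):

  /* sketch of do_current_softirqs(), entered with interrupts disabled */
  while (current->softirqs_raised) {
          int i = __ffs(current->softirqs_raised);
          unsigned int pending, mask = 1U << i;

          current->softirqs_raised &= ~mask;
          lock_softirq(i);                /* per softirq / per cpu lock */
          pending = local_softirq_pending();
          if (pending & mask) {           /* still pending on this cpu? */
                  set_softirq_pending(pending & ~mask);
                  do_single_softirq(i, need_rcu_bh_qs);
          }                               /* else some other task ran it */
          unlock_softirq(i);
  }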

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy@linutronix: remove PF_MEMALLOC logic]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/sched.h |    1 +
 kernel/softirq.c      |  277 +++++++++++++++++++++++++++++++------------------
 2 files changed, 179 insertions(+), 99 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8313893..d907dd7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1650,6 +1650,7 @@ struct task_struct {
 #ifdef CONFIG_PREEMPT_RT_BASE
 	struct rcu_head put_rcu;
 	int softirq_nestcnt;
+	unsigned int softirqs_raised;
 #endif
 #if defined CONFIG_PREEMPT_RT_FULL && defined CONFIG_HIGHMEM
 	int kmap_idx;
diff --git a/kernel/softirq.c b/kernel/softirq.c
index ad99de2..8e5e1c7 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -159,6 +159,7 @@ static void handle_softirq(unsigned int vec_nr, int cpu, int need_rcu_bh_qs)
 		rcu_bh_qs(cpu);
 }
 
+#ifndef CONFIG_PREEMPT_RT_FULL
 static void handle_pending_softirqs(u32 pending, int cpu, int need_rcu_bh_qs)
 {
 	unsigned int vec_nr;
@@ -171,7 +172,6 @@ static void handle_pending_softirqs(u32 pending, int cpu, int need_rcu_bh_qs)
 	local_irq_disable();
 }
 
-#ifndef CONFIG_PREEMPT_RT_FULL
 /*
  * preempt_count and SOFTIRQ_OFFSET usage:
  * - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving
@@ -375,28 +375,113 @@ asmlinkage void do_softirq(void)
 
 #endif
 
+/*
+ * This function must run with irqs disabled!
+ */
+void raise_softirq_irqoff(unsigned int nr)
+{
+	__raise_softirq_irqoff(nr);
+
+	/*
+	 * If we're in an interrupt or softirq, we're done
+	 * (this also catches softirq-disabled code). We will
+	 * actually run the softirq once we return from
+	 * the irq or softirq.
+	 *
+	 * Otherwise we wake up ksoftirqd to make sure we
+	 * schedule the softirq soon.
+	 */
+	if (!in_interrupt())
+		wakeup_softirqd();
+}
+
+void __raise_softirq_irqoff(unsigned int nr)
+{
+	trace_softirq_raise(nr);
+	or_softirq_pending(1UL << nr);
+}
+
 static inline void local_bh_disable_nort(void) { local_bh_disable(); }
 static inline void _local_bh_enable_nort(void) { _local_bh_enable(); }
 static inline void ksoftirqd_set_sched_params(void) { }
 static inline void ksoftirqd_clr_sched_params(void) { }
 
+static inline int ksoftirqd_softirq_pending(void)
+{
+	return local_softirq_pending();
+}
+
 #else /* !PREEMPT_RT_FULL */
 
 /*
- * On RT we serialize softirq execution with a cpu local lock
+ * On RT we serialize softirq execution with a cpu local lock per softirq
  */
-static DEFINE_LOCAL_IRQ_LOCK(local_softirq_lock);
+static DEFINE_PER_CPU(struct local_irq_lock [NR_SOFTIRQS], local_softirq_locks);
 
-static void __do_softirq_common(int need_rcu_bh_qs);
+void __init softirq_early_init(void)
+{
+	int i;
 
-void __do_softirq(void)
+	for (i = 0; i < NR_SOFTIRQS; i++)
+		local_irq_lock_init(local_softirq_locks[i]);
+}
+
+static void lock_softirq(int which)
 {
-	__do_softirq_common(0);
+	__local_lock(&__get_cpu_var(local_softirq_locks[which]));
 }
 
-void __init softirq_early_init(void)
+static void unlock_softirq(int which)
+{
+	__local_unlock(&__get_cpu_var(local_softirq_locks[which]));
+}
+
+static void do_single_softirq(int which, int need_rcu_bh_qs)
+{
+	account_system_vtime(current);
+	current->flags |= PF_IN_SOFTIRQ;
+	lockdep_softirq_enter();
+	local_irq_enable();
+	handle_softirq(which, smp_processor_id(), need_rcu_bh_qs);
+	local_irq_disable();
+	lockdep_softirq_exit();
+	current->flags &= ~PF_IN_SOFTIRQ;
+	account_system_vtime(current);
+}
+
+/*
+ * Called with interrupts disabled. Process softirqs which were raised
+ * in current context (or on behalf of ksoftirqd).
+ */
+static void do_current_softirqs(int need_rcu_bh_qs)
 {
-	local_irq_lock_init(local_softirq_lock);
+	while (current->softirqs_raised) {
+		int i = __ffs(current->softirqs_raised);
+		unsigned int pending, mask = (1U << i);
+
+		current->softirqs_raised &= ~mask;
+		local_irq_enable();
+
+		/*
+		 * If the lock is contended, we boost the owner to
+		 * process the softirq or leave the critical section
+		 * now.
+		 */
+		lock_softirq(i);
+		local_irq_disable();
+		/*
+		 * Check with the local_softirq_pending() bits,
+		 * whether we need to process this still or if someone
+		 * else took care of it.
+		 */
+		pending = local_softirq_pending();
+		if (pending & mask) {
+			set_softirq_pending(pending & ~mask);
+			do_single_softirq(i, need_rcu_bh_qs);
+		}
+		unlock_softirq(i);
+		WARN_ON(current->softirq_nestcnt != 1);
+	}
 }
 
 void local_bh_disable(void)
@@ -411,17 +496,11 @@ void local_bh_enable(void)
 	if (WARN_ON(current->softirq_nestcnt == 0))
 		return;
 
-	if ((current->softirq_nestcnt == 1) &&
-	    local_softirq_pending() &&
-	    local_trylock(local_softirq_lock)) {
+	local_irq_disable();
+	if (current->softirq_nestcnt == 1 && current->softirqs_raised)
+		do_current_softirqs(1);
+	local_irq_enable();
 
-		local_irq_disable();
-		if (local_softirq_pending())
-			__do_softirq();
-		local_irq_enable();
-		local_unlock(local_softirq_lock);
-		WARN_ON(current->softirq_nestcnt != 1);
-	}
 	current->softirq_nestcnt--;
 	migrate_enable();
 }
@@ -446,37 +525,8 @@ int in_serving_softirq(void)
 }
 EXPORT_SYMBOL(in_serving_softirq);
 
-/*
- * Called with bh and local interrupts disabled. For full RT cpu must
- * be pinned.
- */
-static void __do_softirq_common(int need_rcu_bh_qs)
-{
-	u32 pending = local_softirq_pending();
-	int cpu = smp_processor_id();
-
-	current->softirq_nestcnt++;
-
-	/* Reset the pending bitmask before enabling irqs */
-	set_softirq_pending(0);
-
-	current->flags |= PF_IN_SOFTIRQ;
-
-	lockdep_softirq_enter();
-
-	handle_pending_softirqs(pending, cpu, need_rcu_bh_qs);
-
-	pending = local_softirq_pending();
-	if (pending)
-		wakeup_softirqd();
-
-	lockdep_softirq_exit();
-	current->flags &= ~PF_IN_SOFTIRQ;
-
-	current->softirq_nestcnt--;
-}
-
-static int __thread_do_softirq(int cpu)
+/* Called with preemption disabled */
+static int ksoftirqd_do_softirq(int cpu)
 {
 	/*
 	 * Prevent the current cpu from going offline.
@@ -487,45 +537,90 @@ static int __thread_do_softirq(int cpu)
 	 */
 	pin_current_cpu();
 	/*
-	 * If called from ksoftirqd (cpu >= 0) we need to check
-	 * whether we are on the wrong cpu due to cpu offlining. If
-	 * called via thread_do_softirq() no action required.
+	 * We need to check whether we are on the wrong cpu due to cpu
+	 * offlining.
 	 */
-	if (cpu >= 0 && cpu_is_offline(cpu)) {
+	if (cpu_is_offline(cpu)) {
 		unpin_current_cpu();
 		return -1;
 	}
 	preempt_enable();
-	local_lock(local_softirq_lock);
 	local_irq_disable();
-	/*
-	 * We cannot switch stacks on RT as we want to be able to
-	 * schedule!
-	 */
-	if (local_softirq_pending())
-		__do_softirq_common(cpu >= 0);
-	local_unlock(local_softirq_lock);
-	unpin_current_cpu();
-	preempt_disable();
+	current->softirq_nestcnt++;
+	do_current_softirqs(1);
+	current->softirq_nestcnt--;
 	local_irq_enable();
+
+	preempt_disable();
+	unpin_current_cpu();
 	return 0;
 }
 
 /*
- * Called from netif_rx_ni(). Preemption enabled.
+ * Called from netif_rx_ni(). Preemption enabled, but migration
+ * disabled. So the cpu can't go away under us.
  */
 void thread_do_softirq(void)
 {
-	if (!in_serving_softirq()) {
-		preempt_disable();
-		__thread_do_softirq(-1);
-		preempt_enable();
+	if (!in_serving_softirq() && current->softirqs_raised) {
+		current->softirq_nestcnt++;
+		do_current_softirqs(0);
+		current->softirq_nestcnt--;
 	}
 }
 
-static int ksoftirqd_do_softirq(int cpu)
+void __raise_softirq_irqoff(unsigned int nr)
+{
+	trace_softirq_raise(nr);
+	or_softirq_pending(1UL << nr);
+
+	/*
+	 * If we are not in a hard interrupt and inside a bh disabled
+	 * region, we simply raise the flag on current. local_bh_enable()
+	 * will make sure that the softirq is executed. Otherwise we
+	 * delegate it to ksoftirqd.
+	 */
+	if (!in_irq() && current->softirq_nestcnt)
+		current->softirqs_raised |= (1U << nr);
+	else if (__this_cpu_read(ksoftirqd))
+		__this_cpu_read(ksoftirqd)->softirqs_raised |= (1U << nr);
+}
+
+/*
+ * This function must run with irqs disabled!
+ */
+void raise_softirq_irqoff(unsigned int nr)
+{
+	__raise_softirq_irqoff(nr);
+
+	/*
+	 * If we're in an hard interrupt we let irq return code deal
+	 * with the wakeup of ksoftirqd.
+	 */
+	if (in_irq())
+		return;
+
+	/*
+	 * If we are in thread context but outside of a bh disabled
+	 * region, we need to wake ksoftirqd as well.
+	 *
+	 * CHECKME: Some of the places which do that could be wrapped
+	 * into local_bh_disable/enable pairs. Though it's unclear
+	 * whether this is worth the effort. To find those places just
+	 * raise a WARN() if the condition is met.
+	 */
+	if (!current->softirq_nestcnt)
+		wakeup_softirqd();
+}
+
+void do_raise_softirq_irqoff(unsigned int nr)
 {
-	return __thread_do_softirq(cpu);
+	raise_softirq_irqoff(nr);
+}
+
+static inline int ksoftirqd_softirq_pending(void)
+{
+	return current->softirqs_raised;
 }
 
 static inline void local_bh_disable_nort(void) { }
@@ -536,6 +631,10 @@ static inline void ksoftirqd_set_sched_params(void)
 	struct sched_param param = { .sched_priority = 1 };
 
 	sched_setscheduler(current, SCHED_FIFO, &param);
+	/* Take over all pending softirqs when starting */
+	local_irq_disable();
+	current->softirqs_raised = local_softirq_pending();
+	local_irq_enable();
 }
 
 static inline void ksoftirqd_clr_sched_params(void)
@@ -582,8 +681,14 @@ static inline void invoke_softirq(void)
 		wakeup_softirqd();
 		__local_bh_enable(SOFTIRQ_OFFSET);
 	}
-#else
+#else /* PREEMPT_RT_FULL */
+	unsigned long flags;
+
+	local_irq_save(flags);
+	if (__this_cpu_read(ksoftirqd) &&
+	    __this_cpu_read(ksoftirqd)->softirqs_raised)
 		wakeup_softirqd();
+	local_irq_restore(flags);
 #endif
 }
 
@@ -607,26 +712,6 @@ void irq_exit(void)
 	sched_preempt_enable_no_resched();
 }
 
-/*
- * This function must run with irqs disabled!
- */
-inline void raise_softirq_irqoff(unsigned int nr)
-{
-	__raise_softirq_irqoff(nr);
-
-	/*
-	 * If we're in an interrupt or softirq, we're done
-	 * (this also catches softirq-disabled code). We will
-	 * actually run the softirq once we return from
-	 * the irq or softirq.
-	 *
-	 * Otherwise we wake up ksoftirqd to make sure we
-	 * schedule the softirq soon.
-	 */
-	if (!in_interrupt())
-		wakeup_softirqd();
-}
-
 void raise_softirq(unsigned int nr)
 {
 	unsigned long flags;
@@ -636,12 +721,6 @@ void raise_softirq(unsigned int nr)
 	local_irq_restore(flags);
 }
 
-void __raise_softirq_irqoff(unsigned int nr)
-{
-	trace_softirq_raise(nr);
-	or_softirq_pending(1UL << nr);
-}
-
 void open_softirq(int nr, void (*action)(struct softirq_action *))
 {
 	softirq_vec[nr].action = action;
@@ -1093,12 +1172,12 @@ static int run_ksoftirqd(void * __bind_cpu)
 
 	while (!kthread_should_stop()) {
 		preempt_disable();
-		if (!local_softirq_pending())
+		if (!ksoftirqd_softirq_pending())
 			schedule_preempt_disabled();
 
 		__set_current_state(TASK_RUNNING);
 
-		while (local_softirq_pending()) {
+		while (ksoftirqd_softirq_pending()) {
 			if (ksoftirqd_do_softirq((long) __bind_cpu))
 				goto wait_to_die;
 			sched_preempt_enable_no_resched();
-- 
1.7.10.4



* [PATCH 04/12] softirq: Adapt NOHZ softirq pending check to new RT scheme
@ 2013-02-13 16:13 ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2013-02-13 16:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

We can no longer rely on ksoftirqd alone. We need to check the tasks
which run a particular softirq, and if such a task is pi blocked,
ignore the other pending bits of that task as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/softirq.c |   68 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 52 insertions(+), 16 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 8e5e1c7..72abc38 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -65,45 +65,75 @@ char *softirq_to_name[NR_SOFTIRQS] = {
 
 #ifdef CONFIG_NO_HZ
 # ifdef CONFIG_PREEMPT_RT_FULL
+
+struct softirq_runner {
+	struct task_struct *runner[NR_SOFTIRQS];
+};
+
+static DEFINE_PER_CPU(struct softirq_runner, softirq_runners);
+
+static inline void softirq_set_runner(unsigned int sirq)
+{
+	struct softirq_runner *sr = &__get_cpu_var(softirq_runners);
+
+	sr->runner[sirq] = current;
+}
+
+static inline void softirq_clr_runner(unsigned int sirq)
+{
+	struct softirq_runner *sr = &__get_cpu_var(softirq_runners);
+
+	sr->runner[sirq] = NULL;
+}
+
 /*
- * On preempt-rt a softirq might be blocked on a lock. There might be
- * no other runnable task on this CPU because the lock owner runs on
- * some other CPU. So we have to go into idle with the pending bit
- * set. Therefor we need to check this otherwise we warn about false
- * positives which confuses users and defeats the whole purpose of
- * this test.
+ * On preempt-rt a softirq running context might be blocked on a
+ * lock. There might be no other runnable task on this CPU because the
+ * lock owner runs on some other CPU. So we have to go into idle with
+ * the pending bit set. Therefor we need to check this otherwise we
+ * warn about false positives which confuses users and defeats the
+ * whole purpose of this test.
  *
  * This code is called with interrupts disabled.
  */
 void softirq_check_pending_idle(void)
 {
 	static int rate_limit;
-	u32 warnpending = 0, pending = local_softirq_pending();
+	struct softirq_runner *sr = &__get_cpu_var(softirq_runners);
+	u32 warnpending, pending = local_softirq_pending();
 
 	if (rate_limit >= 10)
 		return;
 
-	if (pending) {
+	warnpending = pending;
+
+	while (pending) {
 		struct task_struct *tsk;
+		int i = __ffs(pending);
 
-		tsk = __get_cpu_var(ksoftirqd);
+		pending &= ~(1 << i);
+
+		tsk = sr->runner[i];
 		/*
 		 * The wakeup code in rtmutex.c wakes up the task
 		 * _before_ it sets pi_blocked_on to NULL under
 		 * tsk->pi_lock. So we need to check for both: state
 		 * and pi_blocked_on.
 		 */
-		raw_spin_lock(&tsk->pi_lock);
-
-		if (!tsk->pi_blocked_on && !(tsk->state == TASK_RUNNING))
-			warnpending = 1;
-
-		raw_spin_unlock(&tsk->pi_lock);
+		if (tsk) {
+			raw_spin_lock(&tsk->pi_lock);
+			if (tsk->pi_blocked_on || tsk->state == TASK_RUNNING) {
+				/* Clear all bits pending in that task */
+				warnpending &= ~(tsk->softirqs_raised);
+				warnpending &= ~(1 << i);
+			}
+			raw_spin_unlock(&tsk->pi_lock);
+		}
 	}
 
 	if (warnpending) {
 		printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n",
-		       pending);
+		       warnpending);
 		rate_limit++;
 	}
 }
@@ -122,6 +152,10 @@ void softirq_check_pending_idle(void)
 	}
 }
 # endif
+
+#else /* !NO_HZ */
+static inline void softirq_set_runner(unsigned int sirq) { }
+static inline void softirq_clr_runner(unsigned int sirq) { }
 #endif
 
 /*
@@ -469,6 +503,7 @@ static void do_current_softirqs(int need_rcu_bh_qs)
 		 */
 		lock_softirq(i);
 		local_irq_disable();
+		softirq_set_runner(i);
 		/*
 		 * Check with the local_softirq_pending() bits,
 		 * whether we need to process this still or if someone
@@ -479,6 +514,7 @@ static void do_current_softirqs(int need_rcu_bh_qs)
 			set_softirq_pending(pending & ~mask);
 			do_single_softirq(i, need_rcu_bh_qs);
 		}
+		softirq_clr_runner(i);
 		unlock_softirq(i);
 		WARN_ON(current->softirq_nestcnt != 1);
 	}
-- 
1.7.10.4



* [PATCH 05/12] softirq: Add more debugging
@ 2013-02-13 16:13 ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2013-02-13 16:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

We really want to find code which calls __raise_softirq_irqoff() and
runs neither in hardirq context nor in a local_bh disabled
region. Such code is wrong even on mainline, as it relies on random
events to take care of its newly raised softirq.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/softirq.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 72abc38..499329d 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -605,7 +605,7 @@ void thread_do_softirq(void)
 	}
 }
 
-void __raise_softirq_irqoff(unsigned int nr)
+static void do_raise_softirq_irqoff(unsigned int nr)
 {
 	trace_softirq_raise(nr);
 	or_softirq_pending(1UL << nr);
@@ -622,12 +622,19 @@ void __raise_softirq_irqoff(unsigned int nr)
 		__this_cpu_read(ksoftirqd)->softirqs_raised |= (1U << nr);
 }
 
+void __raise_softirq_irqoff(unsigned int nr)
+{
+	do_raise_softirq_irqoff(nr);
+	if (!in_irq() && !current->softirq_nestcnt)
+		wakeup_softirqd();
+}
+
 /*
  * This function must run with irqs disabled!
  */
 void raise_softirq_irqoff(unsigned int nr)
 {
-	__raise_softirq_irqoff(nr);
+	do_raise_softirq_irqoff(nr);
 
 	/*
 	 * If we're in an hard interrupt we let irq return code deal
@@ -649,11 +656,6 @@ void raise_softirq_irqoff(unsigned int nr)
 		wakeup_softirqd();
 }
 
-void do_raise_softirq_irqoff(unsigned int nr)
-{
-	raise_softirq_irqoff(nr);
-}
-
 static inline int ksoftirqd_softirq_pending(void)
 {
 	return current->softirqs_raised;
-- 
1.7.10.4



* [PATCH 06/12] softirq: Fix nohz pending issue for real
@ 2013-02-13 16:13 ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2013-02-13 16:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

We really need to iterate through all softirqs to find a potentially
blocked runner.

T1 runs softirq X (which clears the pending bit for X)

Interrupt raises softirq Y

T1 gets blocked on a lock and lock owner is not runnable

T1 schedules out

CPU goes idle and complains about pending softirq Y.

Now iterating over all softirqs lets us find the runner for X and
eliminate Y from the list of softirqs to warn about as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/softirq.c |   13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 499329d..3f67aff 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -100,20 +100,15 @@ void softirq_check_pending_idle(void)
 {
 	static int rate_limit;
 	struct softirq_runner *sr = &__get_cpu_var(softirq_runners);
-	u32 warnpending, pending = local_softirq_pending();
+	u32 warnpending = local_softirq_pending();
+	int i;
 
 	if (rate_limit >= 10)
 		return;
 
-	warnpending = pending;
-
-	while (pending) {
-		struct task_struct *tsk;
-		int i = __ffs(pending);
-
-		pending &= ~(1 << i);
+	for (i = 0; i < NR_SOFTIRQS; i++) {
+		struct task_struct *tsk = sr->runner[i];
 
-		tsk = sr->runner[i];
 		/*
 		 * The wakeup code in rtmutex.c wakes up the task
 		 * _before_ it sets pi_blocked_on to NULL under
-- 
1.7.10.4



* [PATCH 07/12] net: Use local_bh_disable in netif_rx_ni()
@ 2013-02-13 16:13 ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2013-02-13 16:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

This code triggers the new WARN in __raise_softirq_irqoff(). It does
look at the softirq pending bit and calls into the softirq code
itself, but that does not fit well with the context related softirq
model of RT. The existing code is correct on mainline, but going
through local_bh_disable/enable here is not going to hurt badly
there either.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/dev.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 5a40075..c0a6400 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2965,11 +2965,9 @@ int netif_rx_ni(struct sk_buff *skb)
 {
 	int err;
 
-	migrate_disable();
+	local_bh_disable();
 	err = netif_rx(skb);
-	if (local_softirq_pending())
-		thread_do_softirq();
-	migrate_enable();
+	local_bh_enable();
 
 	return err;
 }
-- 
1.7.10.4



* [PATCH 08/12] FIX [1/2] slub: Do not dereference NULL pointer in node_match
@ 2013-02-13 16:13 ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2013-02-13 16:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Christoph Lameter,
	Pekka Enberg, Thomas Gleixner, Sebastian Andrzej Siewior

From: Christoph Lameter <cl@linux.com>

The variables accessed in slab_alloc are volatile and therefore
the page pointer passed to node_match can be NULL. The processing
of data in slab_alloc is tentative until either the cmpxchg
succeeds or the __slab_alloc slowpath is invoked. Both are
able to perform the same allocation from the freelist.

Check for the NULL pointer in node_match.

A false positive will lead to a retry of the loop in __slab_alloc.

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy@linutronix: replace page with c->page]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/slub.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 71de9b5..fbf6810 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2058,7 +2058,7 @@ static void flush_all(struct kmem_cache *s)
 static inline int node_match(struct kmem_cache_cpu *c, int node)
 {
 #ifdef CONFIG_NUMA
-	if (node != NUMA_NO_NODE && c->node != node)
+	if (!c->page || (node != NUMA_NO_NODE && c->node != node))
 		return 0;
 #endif
 	return 1;
-- 
1.7.10.4



* [PATCH 09/12] FIX [2/2] slub: Tid must be retrieved from the percpu area of the current processor
@ 2013-02-13 16:13 ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2013-02-13 16:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Christoph Lameter,
	Pekka Enberg, Thomas Gleixner, Sebastian Andrzej Siewior

From: Christoph Lameter <cl@linux.com>

As Steven Rostedt has pointed out: rescheduling could occur on a
different processor after the determination of the per cpu pointer
and before the tid is retrieved. This could result in allocation from
the wrong node in slab_alloc.

The effect is much more severe in slab_free(), where we could free to
the freelist of the wrong page.

The window for something like that occurring is pretty small, but it
is possible.
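
For illustration, the window sits between these two reads in the fast
paths (a sketch of the code before this patch; the fix below simply
brackets the reads with preempt_disable()/preempt_enable()):

  c = __this_cpu_ptr(s->cpu_slab);        /* per cpu area of, say, CPU 0 */
  /*
   * Rescheduling onto another processor can happen right here, so the
   * tid below may still come from CPU 0's area although the task is
   * already running on CPU 1.
   */
  tid = c->tid;
  barrier();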

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/slub.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index fbf6810..634aabc 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2313,13 +2313,18 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
 		return NULL;
 
 redo:
-
 	/*
 	 * Must read kmem_cache cpu data via this cpu ptr. Preemption is
 	 * enabled. We may switch back and forth between cpus while
 	 * reading from one cpu area. That does not matter as long
 	 * as we end up on the original cpu again when doing the cmpxchg.
+	 *
+	 * Preemption is disabled for the retrieval of the tid because that
+	 * must occur from the current processor. We cannot allow rescheduling
+	 * on a different processor between the determination of the pointer
+	 * and the retrieval of the tid.
 	 */
+	preempt_disable();
 	c = __this_cpu_ptr(s->cpu_slab);
 
 	/*
@@ -2329,7 +2334,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
 	 * linked list in between.
 	 */
 	tid = c->tid;
-	barrier();
+	preempt_enable();
 
 	object = c->freelist;
 	if (unlikely(!object || !node_match(c, node)))
@@ -2575,10 +2580,11 @@ static __always_inline void slab_free(struct kmem_cache *s,
 	 * data is retrieved via this pointer. If we are on the same cpu
 	 * during the cmpxchg then the free will succedd.
 	 */
+	preempt_disable();
 	c = __this_cpu_ptr(s->cpu_slab);
 
 	tid = c->tid;
-	barrier();
+	preempt_enable();
 
 	if (likely(page == c->page)) {
 		set_freepointer(s, object, c->freelist);
-- 
1.7.10.4



* [PATCH 10/12] slub: Use correct cpu_slab on dead cpu
@ 2013-02-13 16:13 ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2013-02-13 16:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Christoph Lameter,
	Sebastian Andrzej Siewior

From: Christoph Lameter <cl@linux.com>

Pass a kmem_cache_cpu pointer into unfreeze_partials() so that a
kmem_cache_cpu structure other than the local one can be specified.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christoph Lameter <cl@linux.com>
[bigeasy@linutronix: remove unused n2]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/slub.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 634aabc..f788e7e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1879,11 +1879,15 @@ static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 	}
 }
 
-/* Unfreeze all the cpu partial slabs */
-static void unfreeze_partials(struct kmem_cache *s)
+/*
+ * Unfreeze all the cpu partial slabs.
+ *
+ * This function must be called with interrupt disabled.
+ */
+static void unfreeze_partials(struct kmem_cache *s,
+		struct kmem_cache_cpu *c)
 {
 	struct kmem_cache_node *n = NULL;
-	struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
 	struct page *page, *discard_page = NULL;
 
 	while ((page = c->partial)) {
@@ -1989,7 +1993,7 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 				 * set to the per node partial list.
 				 */
 				local_irq_save(flags);
-				unfreeze_partials(s);
+				unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
 				local_irq_restore(flags);
 				pobjects = 0;
 				pages = 0;
@@ -2027,7 +2031,7 @@ static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
 		if (c->page)
 			flush_slab(s, c);
 
-		unfreeze_partials(s);
+		unfreeze_partials(s, c);
 	}
 }
 
-- 
1.7.10.4



* [PATCH 11/12] mm: Enable SLUB for RT
@ 2013-02-13 16:13 ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2013-02-13 16:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

Make SLUB RT aware and remove the restriction in Kconfig.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy@linutronix: fix a few conflicts]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/slub_def.h |    2 +-
 init/Kconfig             |    1 -
 mm/slub.c                |  115 +++++++++++++++++++++++++++++++++++-----------
 3 files changed, 90 insertions(+), 28 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index c2f8c8b..f0a69f5 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -55,7 +55,7 @@ struct kmem_cache_cpu {
 };
 
 struct kmem_cache_node {
-	spinlock_t list_lock;	/* Protect partial list and nr_partial */
+	raw_spinlock_t list_lock;	/* Protect partial list and nr_partial */
 	unsigned long nr_partial;
 	struct list_head partial;
 #ifdef CONFIG_SLUB_DEBUG
diff --git a/init/Kconfig b/init/Kconfig
index 87afda5..5390b4b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1266,7 +1266,6 @@ config SLAB
 
 config SLUB
 	bool "SLUB (Unqueued Allocator)"
-	depends on !PREEMPT_RT_FULL
 	help
 	   SLUB is a slab allocator that minimizes cache line usage
 	   instead of managing queues of cached objects (SLAB approach).
diff --git a/mm/slub.c b/mm/slub.c
index f788e7e..624deaa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1266,6 +1266,12 @@ static inline void slab_free_hook(struct kmem_cache *s, void *x) {}
 
 #endif /* CONFIG_SLUB_DEBUG */
 
+struct slub_free_list {
+	raw_spinlock_t		lock;
+	struct list_head	list;
+};
+static DEFINE_PER_CPU(struct slub_free_list, slub_free_list);
+
 /*
  * Slab allocation and freeing
  */
@@ -1290,7 +1296,11 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 
 	flags &= gfp_allowed_mask;
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+	if (system_state == SYSTEM_RUNNING)
+#else
 	if (flags & __GFP_WAIT)
+#endif
 		local_irq_enable();
 
 	flags |= s->allocflags;
@@ -1314,7 +1324,11 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 			stat(s, ORDER_FALLBACK);
 	}
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+	if (system_state == SYSTEM_RUNNING)
+#else
 	if (flags & __GFP_WAIT)
+#endif
 		local_irq_disable();
 
 	if (!page)
@@ -1420,6 +1434,16 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 	__free_pages(page, order);
 }
 
+static void free_delayed(struct kmem_cache *s, struct list_head *h)
+{
+	while(!list_empty(h)) {
+		struct page *page = list_first_entry(h, struct page, lru);
+
+		list_del(&page->lru);
+		__free_slab(s, page);
+	}
+}
+
 #define need_reserve_slab_rcu						\
 	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
 
@@ -1454,6 +1478,12 @@ static void free_slab(struct kmem_cache *s, struct page *page)
 		}
 
 		call_rcu(head, rcu_free_slab);
+	} else if (irqs_disabled()) {
+		struct slub_free_list *f = &__get_cpu_var(slub_free_list);
+
+		raw_spin_lock(&f->lock);
+		list_add(&page->lru, &f->list);
+		raw_spin_unlock(&f->lock);
 	} else
 		__free_slab(s, page);
 }
@@ -1553,7 +1583,7 @@ static void *get_partial_node(struct kmem_cache *s,
 	if (!n || !n->nr_partial)
 		return NULL;
 
-	spin_lock(&n->list_lock);
+	raw_spin_lock(&n->list_lock);
 	list_for_each_entry_safe(page, page2, &n->partial, lru) {
 		void *t = acquire_slab(s, n, page, object == NULL);
 		int available;
@@ -1575,7 +1605,7 @@ static void *get_partial_node(struct kmem_cache *s,
 			break;
 
 	}
-	spin_unlock(&n->list_lock);
+	raw_spin_unlock(&n->list_lock);
 	return object;
 }
 
@@ -1824,7 +1854,7 @@ static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 			 * that acquire_slab() will see a slab page that
 			 * is frozen
 			 */
-			spin_lock(&n->list_lock);
+			raw_spin_lock(&n->list_lock);
 		}
 	} else {
 		m = M_FULL;
@@ -1835,7 +1865,7 @@ static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 			 * slabs from diagnostic functions will not see
 			 * any frozen slabs.
 			 */
-			spin_lock(&n->list_lock);
+			raw_spin_lock(&n->list_lock);
 		}
 	}
 
@@ -1870,7 +1900,7 @@ static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 		goto redo;
 
 	if (lock)
-		spin_unlock(&n->list_lock);
+		raw_spin_unlock(&n->list_lock);
 
 	if (m == M_FREE) {
 		stat(s, DEACTIVATE_EMPTY);
@@ -1919,10 +1949,10 @@ static void unfreeze_partials(struct kmem_cache *s,
 				m = M_PARTIAL;
 				if (n != n2) {
 					if (n)
-						spin_unlock(&n->list_lock);
+						raw_spin_unlock(&n->list_lock);
 
 					n = n2;
-					spin_lock(&n->list_lock);
+					raw_spin_lock(&n->list_lock);
 				}
 			}
 
@@ -1951,7 +1981,7 @@ static void unfreeze_partials(struct kmem_cache *s,
 	}
 
 	if (n)
-		spin_unlock(&n->list_lock);
+		raw_spin_unlock(&n->list_lock);
 
 	while (discard_page) {
 		page = discard_page;
@@ -1972,7 +2002,7 @@ static void unfreeze_partials(struct kmem_cache *s,
  * If we did not find a slot then simply move all the partials to the
  * per node partial list.
  */
-int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
+static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 {
 	struct page *oldpage;
 	int pages;
@@ -1987,6 +2017,8 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 			pobjects = oldpage->pobjects;
 			pages = oldpage->pages;
 			if (drain && pobjects > s->cpu_partial) {
+				LIST_HEAD(tofree);
+				struct slub_free_list *f;
 				unsigned long flags;
 				/*
 				 * partial array is full. Move the existing
@@ -1994,7 +2026,12 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 				 */
 				local_irq_save(flags);
 				unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
+				f = &__get_cpu_var(slub_free_list);
+				raw_spin_lock(&f->lock);
+				list_splice_init(&f->list, &tofree);
+				raw_spin_unlock(&f->lock);
 				local_irq_restore(flags);
+				free_delayed(s, &tofree);
 				pobjects = 0;
 				pages = 0;
 				stat(s, CPU_PARTIAL_DRAIN);
@@ -2052,7 +2089,22 @@ static bool has_cpu_slab(int cpu, void *info)
 
 static void flush_all(struct kmem_cache *s)
 {
+	LIST_HEAD(tofree);
+	int cpu;
+
 	on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1, GFP_ATOMIC);
+	for_each_online_cpu(cpu) {
+		struct slub_free_list *f;
+
+		if (!has_cpu_slab(cpu, s))
+			continue;
+
+		f = &per_cpu(slub_free_list, cpu);
+		raw_spin_lock_irq(&f->lock);
+		list_splice_init(&f->list, &tofree);
+		raw_spin_unlock_irq(&f->lock);
+		free_delayed(s, &tofree);
+	}
 }
 
 /*
@@ -2080,10 +2132,10 @@ static unsigned long count_partial(struct kmem_cache_node *n,
 	unsigned long x = 0;
 	struct page *page;
 
-	spin_lock_irqsave(&n->list_lock, flags);
+	raw_spin_lock_irqsave(&n->list_lock, flags);
 	list_for_each_entry(page, &n->partial, lru)
 		x += get_count(page);
-	spin_unlock_irqrestore(&n->list_lock, flags);
+	raw_spin_unlock_irqrestore(&n->list_lock, flags);
 	return x;
 }
 
@@ -2210,6 +2262,8 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
 static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			  unsigned long addr, struct kmem_cache_cpu *c)
 {
+	struct slub_free_list *f;
+	LIST_HEAD(tofree);
 	void **object;
 	unsigned long flags;
 
@@ -2252,7 +2306,13 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 load_freelist:
 	c->freelist = get_freepointer(s, object);
 	c->tid = next_tid(c->tid);
+out:
+	f = &__get_cpu_var(slub_free_list);
+	raw_spin_lock(&f->lock);
+	list_splice_init(&f->list, &tofree);
+	raw_spin_unlock(&f->lock);
 	local_irq_restore(flags);
+	free_delayed(s, &tofree);
 	return object;
 
 new_slab:
@@ -2277,8 +2337,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
 				slab_out_of_memory(s, gfpflags, node);
 
-			local_irq_restore(flags);
-			return NULL;
+			goto out;
 		}
 	}
 
@@ -2292,8 +2351,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	c->freelist = get_freepointer(s, object);
 	deactivate_slab(s, c);
 	c->node = NUMA_NO_NODE;
-	local_irq_restore(flags);
-	return object;
+	goto out;
 }
 
 /*
@@ -2488,7 +2546,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 				 * Otherwise the list_lock will synchronize with
 				 * other processors updating the list of slabs.
 				 */
-				spin_lock_irqsave(&n->list_lock, flags);
+				raw_spin_lock_irqsave(&n->list_lock, flags);
 
 			}
 		}
@@ -2538,7 +2596,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 			stat(s, FREE_ADD_PARTIAL);
 		}
 	}
-	spin_unlock_irqrestore(&n->list_lock, flags);
+	raw_spin_unlock_irqrestore(&n->list_lock, flags);
 	return;
 
 slab_empty:
@@ -2552,7 +2610,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 		/* Slab must be on the full list */
 		remove_full(s, page);
 
-	spin_unlock_irqrestore(&n->list_lock, flags);
+	raw_spin_unlock_irqrestore(&n->list_lock, flags);
 	stat(s, FREE_SLAB);
 	discard_slab(s, page);
 }
@@ -2782,7 +2840,7 @@ static void
 init_kmem_cache_node(struct kmem_cache_node *n, struct kmem_cache *s)
 {
 	n->nr_partial = 0;
-	spin_lock_init(&n->list_lock);
+	raw_spin_lock_init(&n->list_lock);
 	INIT_LIST_HEAD(&n->partial);
 #ifdef CONFIG_SLUB_DEBUG
 	atomic_long_set(&n->nr_slabs, 0);
@@ -3525,7 +3583,7 @@ int kmem_cache_shrink(struct kmem_cache *s)
 		for (i = 0; i < objects; i++)
 			INIT_LIST_HEAD(slabs_by_inuse + i);
 
-		spin_lock_irqsave(&n->list_lock, flags);
+		raw_spin_lock_irqsave(&n->list_lock, flags);
 
 		/*
 		 * Build lists indexed by the items in use in each slab.
@@ -3546,7 +3604,7 @@ int kmem_cache_shrink(struct kmem_cache *s)
 		for (i = objects - 1; i > 0; i--)
 			list_splice(slabs_by_inuse + i, n->partial.prev);
 
-		spin_unlock_irqrestore(&n->list_lock, flags);
+		raw_spin_unlock_irqrestore(&n->list_lock, flags);
 
 		/* Release empty slabs */
 		list_for_each_entry_safe(page, t, slabs_by_inuse, lru)
@@ -3712,10 +3770,15 @@ void __init kmem_cache_init(void)
 	int i;
 	int caches = 0;
 	struct kmem_cache *temp_kmem_cache;
-	int order;
+	int order, cpu;
 	struct kmem_cache *temp_kmem_cache_node;
 	unsigned long kmalloc_size;
 
+	for_each_possible_cpu(cpu) {
+		raw_spin_lock_init(&per_cpu(slub_free_list, cpu).lock);
+		INIT_LIST_HEAD(&per_cpu(slub_free_list, cpu).list);
+	}
+
 	if (debug_guardpage_minorder())
 		slub_max_order = 0;
 
@@ -4139,7 +4202,7 @@ static int validate_slab_node(struct kmem_cache *s,
 	struct page *page;
 	unsigned long flags;
 
-	spin_lock_irqsave(&n->list_lock, flags);
+	raw_spin_lock_irqsave(&n->list_lock, flags);
 
 	list_for_each_entry(page, &n->partial, lru) {
 		validate_slab_slab(s, page, map);
@@ -4162,7 +4225,7 @@ static int validate_slab_node(struct kmem_cache *s,
 			atomic_long_read(&n->nr_slabs));
 
 out:
-	spin_unlock_irqrestore(&n->list_lock, flags);
+	raw_spin_unlock_irqrestore(&n->list_lock, flags);
 	return count;
 }
 
@@ -4352,12 +4415,12 @@ static int list_locations(struct kmem_cache *s, char *buf,
 		if (!atomic_long_read(&n->nr_slabs))
 			continue;
 
-		spin_lock_irqsave(&n->list_lock, flags);
+		raw_spin_lock_irqsave(&n->list_lock, flags);
 		list_for_each_entry(page, &n->partial, lru)
 			process_slab(&t, s, page, alloc, map);
 		list_for_each_entry(page, &n->full, lru)
 			process_slab(&t, s, page, alloc, map);
-		spin_unlock_irqrestore(&n->list_lock, flags);
+		raw_spin_unlock_irqrestore(&n->list_lock, flags);
 	}
 
 	for (i = 0; i < t.count; i++) {
-- 
1.7.10.4



* [PATCH 12/12] slub: Enable irqs for __GFP_WAIT
@ 2013-02-13 16:13 ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2013-02-13 16:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Carsten Emde, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

SYSTEM_RUNNING might be too late for enabling interrupts. Allocations
with __GFP_WAIT can happen before that, so use __GFP_WAIT as an
indicator as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy@linutronix: fix !page conflict]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/slub.c |   13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 624deaa..bdb7f3a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1293,14 +1293,15 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	struct page *page;
 	struct kmem_cache_order_objects oo = s->oo;
 	gfp_t alloc_gfp;
+	bool enableirqs;
 
 	flags &= gfp_allowed_mask;
 
+	enableirqs = (flags & __GFP_WAIT) != 0;
 #ifdef CONFIG_PREEMPT_RT_FULL
-	if (system_state == SYSTEM_RUNNING)
-#else
-	if (flags & __GFP_WAIT)
+	enableirqs |= system_state == SYSTEM_RUNNING;
 #endif
+	if (enableirqs)
 		local_irq_enable();
 
 	flags |= s->allocflags;
@@ -1324,11 +1325,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 			stat(s, ORDER_FALLBACK);
 	}
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-	if (system_state == SYSTEM_RUNNING)
-#else
-	if (flags & __GFP_WAIT)
-#endif
+	if (enableirqs)
 		local_irq_disable();
 
 	if (!page)
-- 
1.7.10.4


