* [PATCH 00/21] sched: Reduce runqueue lock contention -v6
@ 2011-04-05 15:23 Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 01/21] sched: Provide scheduler_ipi() callback in response to smp_send_reschedule() Peter Zijlstra
                   ` (23 more replies)
  0 siblings, 24 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

This patch series aims to optimize remote wakeups by moving most of the
work of the wakeup to the remote cpu, avoiding bouncing runqueue data
structures where possible.

As measured by sembench (which basically creates a wakeup storm) on my
dual-socket westmere:

$ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
$ echo 4096 32000 64 128 > /proc/sys/kernel/sem
$ ./sembench -t 2048 -w 1900 -o 0

unpatched: run time 30 seconds 647278 worker burns per second
patched:   run time 30 seconds 816715 worker burns per second

I've queued this series for .40.


^ permalink raw reply	[flat|nested] 152+ messages in thread

* [PATCH 01/21] sched: Provide scheduler_ipi() callback in response to smp_send_reschedule()
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-13 21:15   ` Tony Luck
  2011-04-14  8:31   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 02/21] sched: Always provide p->on_cpu Peter Zijlstra
                   ` (22 subsequent siblings)
  23 siblings, 2 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra, Russell King, Martin Schwidefsky,
	Chris Metcalf, Jesper Nilsson, Benjamin Herrenschmidt,
	Ralf Baechle

[-- Attachment #1: peter_zijlstra-sched-provide_scheduler_ipi_callback_in_response_to.patch --]
[-- Type: text/plain, Size: 17735 bytes --]

For future rework of try_to_wake_up() we'd like to push part of that
work onto the CPU the task is actually going to run on. In order to do
so we need a generic callback from the existing scheduler IPI.

This patch introduces such a generic callback, scheduler_ipi(), and
implements it as a NOP.

BenH notes: PowerPC might use this IPI on offline CPUs under rare
conditions!!

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/alpha/kernel/smp.c             |    3 +--
 arch/arm/kernel/smp.c               |    5 +----
 arch/blackfin/mach-common/smp.c     |    3 +++
 arch/cris/arch-v32/kernel/smp.c     |   13 ++++++++-----
 arch/ia64/kernel/irq_ia64.c         |    2 ++
 arch/ia64/xen/irq_xen.c             |   10 +++++++++-
 arch/m32r/kernel/smp.c              |    4 +---
 arch/mips/cavium-octeon/smp.c       |    2 ++
 arch/mips/kernel/smtc.c             |    2 +-
 arch/mips/mti-malta/malta-int.c     |    2 ++
 arch/mips/pmc-sierra/yosemite/smp.c |    4 ++++
 arch/mips/sgi-ip27/ip27-irq.c       |    2 ++
 arch/mips/sibyte/bcm1480/smp.c      |    7 +++----
 arch/mips/sibyte/sb1250/smp.c       |    7 +++----
 arch/mn10300/kernel/smp.c           |    5 +----
 arch/parisc/kernel/smp.c            |    5 +----
 arch/powerpc/kernel/smp.c           |    4 ++--
 arch/s390/kernel/smp.c              |    6 +++---
 arch/sh/kernel/smp.c                |    2 ++
 arch/sparc/kernel/smp_32.c          |    4 +++-
 arch/sparc/kernel/smp_64.c          |    1 +
 arch/tile/kernel/smp.c              |    6 +-----
 arch/um/kernel/smp.c                |    2 +-
 arch/x86/kernel/smp.c               |    5 ++---
 arch/x86/xen/smp.c                  |    5 ++---
 include/linux/sched.h               |    1 +
 26 files changed, 62 insertions(+), 50 deletions(-)

Index: linux-2.6/arch/alpha/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/alpha/kernel/smp.c
+++ linux-2.6/arch/alpha/kernel/smp.c
@@ -585,8 +585,7 @@ handle_ipi(struct pt_regs *regs)
 
 		switch (which) {
 		case IPI_RESCHEDULE:
-			/* Reschedule callback.  Everything to be done
-			   is done by the interrupt return path.  */
+			scheduler_ipi();
 			break;
 
 		case IPI_CALL_FUNC:
Index: linux-2.6/arch/arm/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/smp.c
+++ linux-2.6/arch/arm/kernel/smp.c
@@ -560,10 +560,7 @@ asmlinkage void __exception_irq_entry do
 		break;
 
 	case IPI_RESCHEDULE:
-		/*
-		 * nothing more to do - eveything is
-		 * done on the interrupt return path
-		 */
+		scheduler_ipi();
 		break;
 
 	case IPI_CALL_FUNC:
Index: linux-2.6/arch/blackfin/mach-common/smp.c
===================================================================
--- linux-2.6.orig/arch/blackfin/mach-common/smp.c
+++ linux-2.6/arch/blackfin/mach-common/smp.c
@@ -164,6 +164,9 @@ static irqreturn_t ipi_handler_int1(int 
 	while (msg_queue->count) {
 		msg = &msg_queue->ipi_message[msg_queue->head];
 		switch (msg->type) {
+		case BFIN_IPI_RESCHEDULE:
+			scheduler_ipi();
+			break;
 		case BFIN_IPI_CALL_FUNC:
 			spin_unlock_irqrestore(&msg_queue->lock, flags);
 			ipi_call_function(cpu, msg);
Index: linux-2.6/arch/cris/arch-v32/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/cris/arch-v32/kernel/smp.c
+++ linux-2.6/arch/cris/arch-v32/kernel/smp.c
@@ -342,15 +342,18 @@ irqreturn_t crisv32_ipi_interrupt(int ir
 
 	ipi = REG_RD(intr_vect, irq_regs[smp_processor_id()], rw_ipi);
 
+	if (ipi.vector & IPI_SCHEDULE) {
+		scheduler_ipi();
+	}
 	if (ipi.vector & IPI_CALL) {
-	         func(info);
+		func(info);
 	}
 	if (ipi.vector & IPI_FLUSH_TLB) {
-		     if (flush_mm == FLUSH_ALL)
-			 __flush_tlb_all();
-		     else if (flush_vma == FLUSH_ALL)
+		if (flush_mm == FLUSH_ALL)
+			__flush_tlb_all();
+		else if (flush_vma == FLUSH_ALL)
 			__flush_tlb_mm(flush_mm);
-		     else
+		else
 			__flush_tlb_page(flush_vma, flush_addr);
 	}
 
Index: linux-2.6/arch/ia64/kernel/irq_ia64.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/irq_ia64.c
+++ linux-2.6/arch/ia64/kernel/irq_ia64.c
@@ -31,6 +31,7 @@
 #include <linux/irq.h>
 #include <linux/ratelimit.h>
 #include <linux/acpi.h>
+#include <linux/sched.h>
 
 #include <asm/delay.h>
 #include <asm/intrinsics.h>
@@ -496,6 +497,7 @@ ia64_handle_irq (ia64_vector vector, str
 			smp_local_flush_tlb();
 			kstat_incr_irqs_this_cpu(irq, desc);
 		} else if (unlikely(IS_RESCHEDULE(vector))) {
+			scheduler_ipi();
 			kstat_incr_irqs_this_cpu(irq, desc);
 		} else {
 			ia64_setreg(_IA64_REG_CR_TPR, vector);
Index: linux-2.6/arch/ia64/xen/irq_xen.c
===================================================================
--- linux-2.6.orig/arch/ia64/xen/irq_xen.c
+++ linux-2.6/arch/ia64/xen/irq_xen.c
@@ -92,6 +92,8 @@ static unsigned short saved_irq_cnt;
 static int xen_slab_ready;
 
 #ifdef CONFIG_SMP
+#include <linux/sched.h>
+
 /* Dummy stub. Though we may check XEN_RESCHEDULE_VECTOR before __do_IRQ,
  * it ends up to issue several memory accesses upon percpu data and
  * thus adds unnecessary traffic to other paths.
@@ -99,7 +101,13 @@ static int xen_slab_ready;
 static irqreturn_t
 xen_dummy_handler(int irq, void *dev_id)
 {
+	return IRQ_HANDLED;
+}
 
+static irqreturn_t
+xen_resched_handler(int irq, void *dev_id)
+{
+	scheduler_ipi();
 	return IRQ_HANDLED;
 }
 
@@ -110,7 +118,7 @@ static struct irqaction xen_ipi_irqactio
 };
 
 static struct irqaction xen_resched_irqaction = {
-	.handler =	xen_dummy_handler,
+	.handler =	xen_resched_handler,
 	.flags =	IRQF_DISABLED,
 	.name =		"resched"
 };
Index: linux-2.6/arch/m32r/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/m32r/kernel/smp.c
+++ linux-2.6/arch/m32r/kernel/smp.c
@@ -122,8 +122,6 @@ void smp_send_reschedule(int cpu_id)
  *
  * Description:  This routine executes on CPU which received
  *               'RESCHEDULE_IPI'.
- *               Rescheduling is processed at the exit of interrupt
- *               operation.
  *
  * Born on Date: 2002.02.05
  *
@@ -138,7 +136,7 @@ void smp_send_reschedule(int cpu_id)
  *==========================================================================*/
 void smp_reschedule_interrupt(void)
 {
-	/* nothing to do */
+	scheduler_ipi();
 }
 
 /*==========================================================================*
Index: linux-2.6/arch/mips/kernel/smtc.c
===================================================================
--- linux-2.6.orig/arch/mips/kernel/smtc.c
+++ linux-2.6/arch/mips/kernel/smtc.c
@@ -929,7 +929,7 @@ static void post_direct_ipi(int cpu, str
 
 static void ipi_resched_interrupt(void)
 {
-	/* Return from interrupt should be enough to cause scheduler check */
+	scheduler_ipi();
 }
 
 static void ipi_call_interrupt(void)
Index: linux-2.6/arch/mips/sibyte/bcm1480/smp.c
===================================================================
--- linux-2.6.orig/arch/mips/sibyte/bcm1480/smp.c
+++ linux-2.6/arch/mips/sibyte/bcm1480/smp.c
@@ -20,6 +20,7 @@
 #include <linux/delay.h>
 #include <linux/smp.h>
 #include <linux/kernel_stat.h>
+#include <linux/sched.h>
 
 #include <asm/mmu_context.h>
 #include <asm/io.h>
@@ -189,10 +190,8 @@ void bcm1480_mailbox_interrupt(void)
 	/* Clear the mailbox to clear the interrupt */
 	__raw_writeq(((u64)action)<<48, mailbox_0_clear_regs[cpu]);
 
-	/*
-	 * Nothing to do for SMP_RESCHEDULE_YOURSELF; returning from the
-	 * interrupt will do the reschedule for us
-	 */
+	if (action & SMP_RESCHEDULE_YOURSELF)
+		scheduler_ipi();
 
 	if (action & SMP_CALL_FUNCTION)
 		smp_call_function_interrupt();
Index: linux-2.6/arch/mips/sibyte/sb1250/smp.c
===================================================================
--- linux-2.6.orig/arch/mips/sibyte/sb1250/smp.c
+++ linux-2.6/arch/mips/sibyte/sb1250/smp.c
@@ -21,6 +21,7 @@
 #include <linux/interrupt.h>
 #include <linux/smp.h>
 #include <linux/kernel_stat.h>
+#include <linux/sched.h>
 
 #include <asm/mmu_context.h>
 #include <asm/io.h>
@@ -177,10 +178,8 @@ void sb1250_mailbox_interrupt(void)
 	/* Clear the mailbox to clear the interrupt */
 	____raw_writeq(((u64)action) << 48, mailbox_clear_regs[cpu]);
 
-	/*
-	 * Nothing to do for SMP_RESCHEDULE_YOURSELF; returning from the
-	 * interrupt will do the reschedule for us
-	 */
+	if (action & SMP_RESCHEDULE_YOURSELF)
+		scheduler_ipi();
 
 	if (action & SMP_CALL_FUNCTION)
 		smp_call_function_interrupt();
Index: linux-2.6/arch/mn10300/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/mn10300/kernel/smp.c
+++ linux-2.6/arch/mn10300/kernel/smp.c
@@ -494,14 +494,11 @@ void smp_send_stop(void)
  * @irq: The interrupt number.
  * @dev_id: The device ID.
  *
- * We need do nothing here, since the scheduling will be effected on our way
- * back through entry.S.
- *
  * Returns IRQ_HANDLED to indicate we handled the interrupt successfully.
  */
 static irqreturn_t smp_reschedule_interrupt(int irq, void *dev_id)
 {
-	/* do nothing */
+	scheduler_ipi();
 	return IRQ_HANDLED;
 }
 
Index: linux-2.6/arch/parisc/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/parisc/kernel/smp.c
+++ linux-2.6/arch/parisc/kernel/smp.c
@@ -155,10 +155,7 @@ ipi_interrupt(int irq, void *dev_id) 
 				
 			case IPI_RESCHEDULE:
 				smp_debug(100, KERN_DEBUG "CPU%d IPI_RESCHEDULE\n", this_cpu);
-				/*
-				 * Reschedule callback.  Everything to be
-				 * done is done by the interrupt return path.
-				 */
+				scheduler_ipi();
 				break;
 
 			case IPI_CALL_FUNC:
Index: linux-2.6/arch/powerpc/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/smp.c
+++ linux-2.6/arch/powerpc/kernel/smp.c
@@ -116,7 +116,7 @@ void smp_message_recv(int msg)
 		generic_smp_call_function_interrupt();
 		break;
 	case PPC_MSG_RESCHEDULE:
-		/* we notice need_resched on exit */
+		scheduler_ipi();
 		break;
 	case PPC_MSG_CALL_FUNC_SINGLE:
 		generic_smp_call_function_single_interrupt();
@@ -146,7 +146,7 @@ static irqreturn_t call_function_action(
 
 static irqreturn_t reschedule_action(int irq, void *data)
 {
-	/* we just need the return path side effect of checking need_resched */
+	scheduler_ipi();
 	return IRQ_HANDLED;
 }
 
Index: linux-2.6/arch/s390/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/s390/kernel/smp.c
+++ linux-2.6/arch/s390/kernel/smp.c
@@ -165,12 +165,12 @@ static void do_ext_call_interrupt(unsign
 	kstat_cpu(smp_processor_id()).irqs[EXTINT_IPI]++;
 	/*
 	 * handle bit signal external calls
-	 *
-	 * For the ec_schedule signal we have to do nothing. All the work
-	 * is done automatically when we return from the interrupt.
 	 */
 	bits = xchg(&S390_lowcore.ext_call_fast, 0);
 
+	if (test_bit(ec_schedule, &bits))
+		scheduler_ipi();
+
 	if (test_bit(ec_call_function, &bits))
 		generic_smp_call_function_interrupt();
 
Index: linux-2.6/arch/sh/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/sh/kernel/smp.c
+++ linux-2.6/arch/sh/kernel/smp.c
@@ -20,6 +20,7 @@
 #include <linux/module.h>
 #include <linux/cpu.h>
 #include <linux/interrupt.h>
+#include <linux/sched.h>
 #include <asm/atomic.h>
 #include <asm/processor.h>
 #include <asm/system.h>
@@ -323,6 +324,7 @@ void smp_message_recv(unsigned int msg)
 		generic_smp_call_function_interrupt();
 		break;
 	case SMP_MSG_RESCHEDULE:
+		scheduler_ipi();
 		break;
 	case SMP_MSG_FUNCTION_SINGLE:
 		generic_smp_call_function_single_interrupt();
Index: linux-2.6/arch/sparc/kernel/smp_32.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/smp_32.c
+++ linux-2.6/arch/sparc/kernel/smp_32.c
@@ -125,7 +125,9 @@ struct linux_prom_registers smp_penguin_
 
 void smp_send_reschedule(int cpu)
 {
-	/* See sparc64 */
+	/*
+	 * XXX missing reschedule IPI, see scheduler_ipi()
+	 */
 }
 
 void smp_send_stop(void)
Index: linux-2.6/arch/sparc/kernel/smp_64.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/smp_64.c
+++ linux-2.6/arch/sparc/kernel/smp_64.c
@@ -1368,6 +1368,7 @@ void smp_send_reschedule(int cpu)
 void __irq_entry smp_receive_signal_client(int irq, struct pt_regs *regs)
 {
 	clear_softint(1 << irq);
+	scheduler_ipi();
 }
 
 /* This is a nop because we capture all other cpus
Index: linux-2.6/arch/tile/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/tile/kernel/smp.c
+++ linux-2.6/arch/tile/kernel/smp.c
@@ -189,12 +189,8 @@ void flush_icache_range(unsigned long st
 /* Called when smp_send_reschedule() triggers IRQ_RESCHEDULE. */
 static irqreturn_t handle_reschedule_ipi(int irq, void *token)
 {
-	/*
-	 * Nothing to do here; when we return from interrupt, the
-	 * rescheduling will occur there. But do bump the interrupt
-	 * profiler count in the meantime.
-	 */
 	__get_cpu_var(irq_stat).irq_resched_count++;
+	scheduler_ipi();
 
 	return IRQ_HANDLED;
 }
Index: linux-2.6/arch/um/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/um/kernel/smp.c
+++ linux-2.6/arch/um/kernel/smp.c
@@ -173,7 +173,7 @@ void IPI_handler(int cpu)
 			break;
 
 		case 'R':
-			set_tsk_need_resched(current);
+			scheduler_ipi();
 			break;
 
 		case 'S':
Index: linux-2.6/arch/x86/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/smp.c
+++ linux-2.6/arch/x86/kernel/smp.c
@@ -194,14 +194,13 @@ static void native_stop_other_cpus(int w
 }
 
 /*
- * Reschedule call back. Nothing to do,
- * all the work is done automatically when
- * we return from the interrupt.
+ * Reschedule call back.
  */
 void smp_reschedule_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
 	inc_irq_stat(irq_resched_count);
+	scheduler_ipi();
 	/*
 	 * KVM uses this interrupt to force a cpu out of guest mode
 	 */
Index: linux-2.6/arch/x86/xen/smp.c
===================================================================
--- linux-2.6.orig/arch/x86/xen/smp.c
+++ linux-2.6/arch/x86/xen/smp.c
@@ -46,13 +46,12 @@ static irqreturn_t xen_call_function_int
 static irqreturn_t xen_call_function_single_interrupt(int irq, void *dev_id);
 
 /*
- * Reschedule call back. Nothing to do,
- * all the work is done automatically when
- * we return from the interrupt.
+ * Reschedule call back.
  */
 static irqreturn_t xen_reschedule_interrupt(int irq, void *dev_id)
 {
 	inc_irq_stat(irq_resched_count);
+	scheduler_ipi();
 
 	return IRQ_HANDLED;
 }
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -2189,6 +2189,7 @@ extern void set_task_comm(struct task_st
 extern char *get_task_comm(char *to, struct task_struct *tsk);
 
 #ifdef CONFIG_SMP
+static inline void scheduler_ipi(void) { }
 extern unsigned long wait_task_inactive(struct task_struct *, long match_state);
 #else
 static inline unsigned long wait_task_inactive(struct task_struct *p,
Index: linux-2.6/arch/mips/cavium-octeon/smp.c
===================================================================
--- linux-2.6.orig/arch/mips/cavium-octeon/smp.c
+++ linux-2.6/arch/mips/cavium-octeon/smp.c
@@ -44,6 +44,8 @@ static irqreturn_t mailbox_interrupt(int
 
 	if (action & SMP_CALL_FUNCTION)
 		smp_call_function_interrupt();
+	if (action & SMP_RESCHEDULE_YOURSELF)
+		scheduler_ipi();
 
 	/* Check if we've been told to flush the icache */
 	if (action & SMP_ICACHE_FLUSH)
Index: linux-2.6/arch/mips/mti-malta/malta-int.c
===================================================================
--- linux-2.6.orig/arch/mips/mti-malta/malta-int.c
+++ linux-2.6/arch/mips/mti-malta/malta-int.c
@@ -309,6 +309,8 @@ static void ipi_call_dispatch(void)
 
 static irqreturn_t ipi_resched_interrupt(int irq, void *dev_id)
 {
+	scheduler_ipi();
+
 	return IRQ_HANDLED;
 }
 
Index: linux-2.6/arch/mips/pmc-sierra/yosemite/smp.c
===================================================================
--- linux-2.6.orig/arch/mips/pmc-sierra/yosemite/smp.c
+++ linux-2.6/arch/mips/pmc-sierra/yosemite/smp.c
@@ -55,6 +55,8 @@ void titan_mailbox_irq(void)
 
 		if (status & 0x2)
 			smp_call_function_interrupt();
+		if (status & 0x4)
+			scheduler_ipi();
 		break;
 
 	case 1:
@@ -63,6 +65,8 @@ void titan_mailbox_irq(void)
 
 		if (status & 0x2)
 			smp_call_function_interrupt();
+		if (status & 0x4)
+			scheduler_ipi();
 		break;
 	}
 }
Index: linux-2.6/arch/mips/sgi-ip27/ip27-irq.c
===================================================================
--- linux-2.6.orig/arch/mips/sgi-ip27/ip27-irq.c
+++ linux-2.6/arch/mips/sgi-ip27/ip27-irq.c
@@ -147,8 +147,10 @@ static void ip27_do_irq_mask0(void)
 #ifdef CONFIG_SMP
 	if (pend0 & (1UL << CPU_RESCHED_A_IRQ)) {
 		LOCAL_HUB_CLR_INTR(CPU_RESCHED_A_IRQ);
+		scheduler_ipi();
 	} else if (pend0 & (1UL << CPU_RESCHED_B_IRQ)) {
 		LOCAL_HUB_CLR_INTR(CPU_RESCHED_B_IRQ);
+		scheduler_ipi();
 	} else if (pend0 & (1UL << CPU_CALL_A_IRQ)) {
 		LOCAL_HUB_CLR_INTR(CPU_CALL_A_IRQ);
 		smp_call_function_interrupt();




* [PATCH 02/21] sched: Always provide p->on_cpu
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 01/21] sched: Provide scheduler_ipi() callback in response to smp_send_reschedule() Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:31   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 03/21] mutex: Use p->on_cpu for the adaptive spin Peter Zijlstra
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-on_cpu.patch --]
[-- Type: text/plain, Size: 3751 bytes --]

Always provide p->on_cpu so that we can determine if it's on a cpu
without having to lock the rq.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 include/linux/sched.h |    4 +---
 kernel/sched.c        |   46 +++++++++++++++++++++++++++++-----------------
 2 files changed, 30 insertions(+), 20 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -845,18 +845,39 @@ static inline int task_current(struct rq
 	return rq->curr == p;
 }
 
-#ifndef __ARCH_WANT_UNLOCKED_CTXSW
 static inline int task_running(struct rq *rq, struct task_struct *p)
 {
+#ifdef CONFIG_SMP
+	return p->on_cpu;
+#else
 	return task_current(rq, p);
+#endif
 }
 
+#ifndef __ARCH_WANT_UNLOCKED_CTXSW
 static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
 {
+#ifdef CONFIG_SMP
+	/*
+	 * We can optimise this out completely for !SMP, because the
+	 * SMP rebalancing from interrupt is the only thing that cares
+	 * here.
+	 */
+	next->on_cpu = 1;
+#endif
 }
 
 static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 {
+#ifdef CONFIG_SMP
+	/*
+	 * After ->on_cpu is cleared, the task can be moved to a different CPU.
+	 * We must ensure this doesn't happen until the switch is completely
+	 * finished.
+	 */
+	smp_wmb();
+	prev->on_cpu = 0;
+#endif
 #ifdef CONFIG_DEBUG_SPINLOCK
 	/* this is a valid case when another task releases the spinlock */
 	rq->lock.owner = current;
@@ -872,15 +893,6 @@ static inline void finish_lock_switch(st
 }
 
 #else /* __ARCH_WANT_UNLOCKED_CTXSW */
-static inline int task_running(struct rq *rq, struct task_struct *p)
-{
-#ifdef CONFIG_SMP
-	return p->oncpu;
-#else
-	return task_current(rq, p);
-#endif
-}
-
 static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
 {
 #ifdef CONFIG_SMP
@@ -889,7 +901,7 @@ static inline void prepare_lock_switch(s
 	 * SMP rebalancing from interrupt is the only thing that cares
 	 * here.
 	 */
-	next->oncpu = 1;
+	next->on_cpu = 1;
 #endif
 #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
 	raw_spin_unlock_irq(&rq->lock);
@@ -902,12 +914,12 @@ static inline void finish_lock_switch(st
 {
 #ifdef CONFIG_SMP
 	/*
-	 * After ->oncpu is cleared, the task can be moved to a different CPU.
+	 * After ->on_cpu is cleared, the task can be moved to a different CPU.
 	 * We must ensure this doesn't happen until the switch is completely
 	 * finished.
 	 */
 	smp_wmb();
-	prev->oncpu = 0;
+	prev->on_cpu = 0;
 #endif
 #ifndef __ARCH_WANT_INTERRUPTS_ON_CTXSW
 	local_irq_enable();
@@ -2645,8 +2657,8 @@ void sched_fork(struct task_struct *p, i
 	if (likely(sched_info_on()))
 		memset(&p->sched_info, 0, sizeof(p->sched_info));
 #endif
-#if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
-	p->oncpu = 0;
+#if defined(CONFIG_SMP)
+	p->on_cpu = 0;
 #endif
 #ifdef CONFIG_PREEMPT
 	/* Want to start with kernel preemption disabled. */
@@ -5557,8 +5569,8 @@ void __cpuinit init_idle(struct task_str
 	rcu_read_unlock();
 
 	rq->curr = rq->idle = idle;
-#if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
-	idle->oncpu = 1;
+#if defined(CONFIG_SMP)
+	idle->on_cpu = 1;
 #endif
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1198,9 +1198,7 @@ struct task_struct {
 	int lock_depth;		/* BKL lock depth */
 
 #ifdef CONFIG_SMP
-#ifdef __ARCH_WANT_UNLOCKED_CTXSW
-	int oncpu;
-#endif
+	int on_cpu;
 #endif
 
 	int prio, static_prio, normal_prio;




* [PATCH 03/21] mutex: Use p->on_cpu for the adaptive spin
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 01/21] sched: Provide scheduler_ipi() callback in response to smp_send_reschedule() Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 02/21] sched: Always provide p->on_cpu Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:32   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 04/21] sched: Change the ttwu success details Peter Zijlstra
                   ` (20 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-on_cpu-use.patch --]
[-- Type: text/plain, Size: 5922 bytes --]

Since we now have p->on_cpu unconditionally available, use it to
re-implement mutex_spin_on_owner.

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 include/linux/mutex.h |    2 -
 include/linux/sched.h |    2 -
 kernel/mutex-debug.c  |    2 -
 kernel/mutex-debug.h  |    2 -
 kernel/mutex.c        |    2 -
 kernel/mutex.h        |    2 -
 kernel/sched.c        |   83 +++++++++++++++++++-------------------------------
 7 files changed, 39 insertions(+), 56 deletions(-)

Index: linux-2.6/include/linux/mutex.h
===================================================================
--- linux-2.6.orig/include/linux/mutex.h
+++ linux-2.6/include/linux/mutex.h
@@ -51,7 +51,7 @@ struct mutex {
 	spinlock_t		wait_lock;
 	struct list_head	wait_list;
 #if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_SMP)
-	struct thread_info	*owner;
+	struct task_struct	*owner;
 #endif
 #ifdef CONFIG_DEBUG_MUTEXES
 	const char 		*name;
Index: linux-2.6/kernel/mutex-debug.c
===================================================================
--- linux-2.6.orig/kernel/mutex-debug.c
+++ linux-2.6/kernel/mutex-debug.c
@@ -75,7 +75,7 @@ void debug_mutex_unlock(struct mutex *lo
 		return;
 
 	DEBUG_LOCKS_WARN_ON(lock->magic != lock);
-	DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
+	DEBUG_LOCKS_WARN_ON(lock->owner != current);
 	DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
 	mutex_clear_owner(lock);
 }
Index: linux-2.6/kernel/mutex-debug.h
===================================================================
--- linux-2.6.orig/kernel/mutex-debug.h
+++ linux-2.6/kernel/mutex-debug.h
@@ -29,7 +29,7 @@ extern void debug_mutex_init(struct mute
 
 static inline void mutex_set_owner(struct mutex *lock)
 {
-	lock->owner = current_thread_info();
+	lock->owner = current;
 }
 
 static inline void mutex_clear_owner(struct mutex *lock)
Index: linux-2.6/kernel/mutex.c
===================================================================
--- linux-2.6.orig/kernel/mutex.c
+++ linux-2.6/kernel/mutex.c
@@ -160,7 +160,7 @@ __mutex_lock_common(struct mutex *lock, 
 	 */
 
 	for (;;) {
-		struct thread_info *owner;
+		struct task_struct *owner;
 
 		/*
 		 * If we own the BKL, then don't spin. The owner of
Index: linux-2.6/kernel/mutex.h
===================================================================
--- linux-2.6.orig/kernel/mutex.h
+++ linux-2.6/kernel/mutex.h
@@ -19,7 +19,7 @@
 #ifdef CONFIG_SMP
 static inline void mutex_set_owner(struct mutex *lock)
 {
-	lock->owner = current_thread_info();
+	lock->owner = current;
 }
 
 static inline void mutex_clear_owner(struct mutex *lock)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4034,70 +4034,53 @@ asmlinkage void __sched schedule(void)
 EXPORT_SYMBOL(schedule);
 
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
-/*
- * Look out! "owner" is an entirely speculative pointer
- * access and not reliable.
- */
-int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)
-{
-	unsigned int cpu;
-	struct rq *rq;
 
-	if (!sched_feat(OWNER_SPIN))
-		return 0;
+static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
+{
+	bool ret = false;
 
-#ifdef CONFIG_DEBUG_PAGEALLOC
-	/*
-	 * Need to access the cpu field knowing that
-	 * DEBUG_PAGEALLOC could have unmapped it if
-	 * the mutex owner just released it and exited.
-	 */
-	if (probe_kernel_address(&owner->cpu, cpu))
-		return 0;
-#else
-	cpu = owner->cpu;
-#endif
+	rcu_read_lock();
+	if (lock->owner != owner)
+		goto fail;
 
 	/*
-	 * Even if the access succeeded (likely case),
-	 * the cpu field may no longer be valid.
+	 * Ensure we emit the owner->on_cpu, dereference _after_ checking
+	 * lock->owner still matches owner, if that fails, owner might
+	 * point to free()d memory, if it still matches, the rcu_read_lock()
+	 * ensures the memory stays valid.
 	 */
-	if (cpu >= nr_cpumask_bits)
-		return 0;
+	barrier();
 
-	/*
-	 * We need to validate that we can do a
-	 * get_cpu() and that we have the percpu area.
-	 */
-	if (!cpu_online(cpu))
-		return 0;
+	ret = owner->on_cpu;
+fail:
+	rcu_read_unlock();
 
-	rq = cpu_rq(cpu);
+	return ret;
+}
 
-	for (;;) {
-		/*
-		 * Owner changed, break to re-assess state.
-		 */
-		if (lock->owner != owner) {
-			/*
-			 * If the lock has switched to a different owner,
-			 * we likely have heavy contention. Return 0 to quit
-			 * optimistic spinning and not contend further:
-			 */
-			if (lock->owner)
-				return 0;
-			break;
-		}
+/*
+ * Look out! "owner" is an entirely speculative pointer
+ * access and not reliable.
+ */
+int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
+{
+	if (!sched_feat(OWNER_SPIN))
+		return 0;
 
-		/*
-		 * Is that owner really running on that cpu?
-		 */
-		if (task_thread_info(rq->curr) != owner || need_resched())
+	while (owner_running(lock, owner)) {
+		if (need_resched())
 			return 0;
 
 		arch_mutex_cpu_relax();
 	}
 
+	/*
+	 * If the owner changed to another task there is likely
+	 * heavy contention, stop spinning.
+	 */
+	if (lock->owner)
+		return 0;
+
 	return 1;
 }
 #endif
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -360,7 +360,7 @@ extern signed long schedule_timeout_inte
 extern signed long schedule_timeout_killable(signed long timeout);
 extern signed long schedule_timeout_uninterruptible(signed long timeout);
 asmlinkage void schedule(void);
-extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
+extern int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner);
 
 struct nsproxy;
 struct user_namespace;




* [PATCH 04/21] sched: Change the ttwu success details
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (2 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 03/21] mutex: Use p->on_cpu for the adaptive spin Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-13  9:23   ` Peter Zijlstra
  2011-04-14  8:32   ` [tip:sched/locking] sched: Change the ttwu() " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 05/21] sched: Clean up ttwu stats Peter Zijlstra
                   ` (19 subsequent siblings)
  23 siblings, 2 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-change-ttwu-return.patch --]
[-- Type: text/plain, Size: 2484 bytes --]

try_to_wake_up() used to return success only when it had to place a
task on a runqueue; change that to report success every time we change
p->state to TASK_RUNNING, because that is the real measure of a wakeup.

As a result, success is always true at the tracepoints.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 kernel/sched.c |   18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2383,10 +2383,10 @@ static inline void ttwu_activate(struct 
 	activate_task(rq, p, en_flags);
 }
 
-static inline void ttwu_post_activation(struct task_struct *p, struct rq *rq,
-					int wake_flags, bool success)
+static void
+ttwu_post_activation(struct task_struct *p, struct rq *rq, int wake_flags)
 {
-	trace_sched_wakeup(p, success);
+	trace_sched_wakeup(p, true);
 	check_preempt_curr(rq, p, wake_flags);
 
 	p->state = TASK_RUNNING;
@@ -2406,7 +2406,7 @@ static inline void ttwu_post_activation(
 	}
 #endif
 	/* if a worker is waking up, notify workqueue */
-	if ((p->flags & PF_WQ_WORKER) && success)
+	if (p->flags & PF_WQ_WORKER)
 		wq_worker_waking_up(p, cpu_of(rq));
 }
 
@@ -2505,9 +2505,9 @@ static int try_to_wake_up(struct task_st
 #endif /* CONFIG_SMP */
 	ttwu_activate(p, rq, wake_flags & WF_SYNC, orig_cpu != cpu,
 		      cpu == this_cpu, en_flags);
-	success = 1;
 out_running:
-	ttwu_post_activation(p, rq, wake_flags, success);
+	ttwu_post_activation(p, rq, wake_flags);
+	success = 1;
 out:
 	task_rq_unlock(rq, &flags);
 	put_cpu();
@@ -2526,7 +2526,6 @@ static int try_to_wake_up(struct task_st
 static void try_to_wake_up_local(struct task_struct *p)
 {
 	struct rq *rq = task_rq(p);
-	bool success = false;
 
 	BUG_ON(rq != this_rq());
 	BUG_ON(p == current);
@@ -2541,9 +2540,8 @@ static void try_to_wake_up_local(struct 
 			schedstat_inc(rq, ttwu_local);
 		}
 		ttwu_activate(p, rq, false, false, true, ENQUEUE_WAKEUP);
-		success = true;
 	}
-	ttwu_post_activation(p, rq, 0, success);
+	ttwu_post_activation(p, rq, 0);
 }
 
 /**
@@ -2705,7 +2703,7 @@ void wake_up_new_task(struct task_struct
 
 	rq = task_rq_lock(p, &flags);
 	activate_task(rq, p, 0);
-	trace_sched_wakeup_new(p, 1);
+	trace_sched_wakeup_new(p, true);
 	check_preempt_curr(rq, p, WF_FORK);
 #ifdef CONFIG_SMP
 	if (p->sched_class->task_woken)




* [PATCH 05/21] sched: Clean up ttwu stats
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (3 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 04/21] sched: Change the ttwu success details Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:33   ` [tip:sched/locking] sched: Clean up ttwu() stats tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 06/21] sched: Provide p->on_rq Peter Zijlstra
                   ` (18 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-ttwu-stat.patch --]
[-- Type: text/plain, Size: 3338 bytes --]

Collect all ttwu statistics code into a single function and ensure it
is always called for an actual wakeup (changing p->state to
TASK_RUNNING).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 kernel/sched.c |   69 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 34 insertions(+), 35 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2408,21 +2408,38 @@ static void update_avg(u64 *avg, u64 sam
 }
 #endif
 
-static inline void ttwu_activate(struct task_struct *p, struct rq *rq,
-				 bool is_sync, bool is_migrate, bool is_local,
-				 unsigned long en_flags)
+static void
+ttwu_stat(struct rq *rq, struct task_struct *p, int cpu, int wake_flags)
 {
+#ifdef CONFIG_SCHEDSTATS
+	int this_cpu = smp_processor_id();
+
+	schedstat_inc(rq, ttwu_count);
 	schedstat_inc(p, se.statistics.nr_wakeups);
-	if (is_sync)
+
+	if (wake_flags & WF_SYNC)
 		schedstat_inc(p, se.statistics.nr_wakeups_sync);
-	if (is_migrate)
+
+	if (cpu != task_cpu(p))
 		schedstat_inc(p, se.statistics.nr_wakeups_migrate);
-	if (is_local)
+
+#ifdef CONFIG_SMP
+	if (cpu == this_cpu) {
+		schedstat_inc(rq, ttwu_local);
 		schedstat_inc(p, se.statistics.nr_wakeups_local);
-	else
-		schedstat_inc(p, se.statistics.nr_wakeups_remote);
+	} else {
+		struct sched_domain *sd;
 
-	activate_task(rq, p, en_flags);
+		schedstat_inc(p, se.statistics.nr_wakeups_remote);
+		for_each_domain(this_cpu, sd) {
+			if (cpumask_test_cpu(cpu, sched_domain_span(sd))) {
+				schedstat_inc(sd, ttwu_wake_remote);
+				break;
+			}
+		}
+	}
+#endif /* CONFIG_SMP */
+#endif /* CONFIG_SCHEDSTATS */
 }
 
 static void
@@ -2482,12 +2499,12 @@ static int try_to_wake_up(struct task_st
 	if (!(p->state & state))
 		goto out;
 
+	cpu = task_cpu(p);
+
 	if (p->se.on_rq)
 		goto out_running;
 
-	cpu = task_cpu(p);
 	orig_cpu = cpu;
-
 #ifdef CONFIG_SMP
 	if (unlikely(task_running(rq, p)))
 		goto out_activate;
@@ -2528,27 +2545,12 @@ static int try_to_wake_up(struct task_st
 	WARN_ON(task_cpu(p) != cpu);
 	WARN_ON(p->state != TASK_WAKING);
 
-#ifdef CONFIG_SCHEDSTATS
-	schedstat_inc(rq, ttwu_count);
-	if (cpu == this_cpu)
-		schedstat_inc(rq, ttwu_local);
-	else {
-		struct sched_domain *sd;
-		for_each_domain(this_cpu, sd) {
-			if (cpumask_test_cpu(cpu, sched_domain_span(sd))) {
-				schedstat_inc(sd, ttwu_wake_remote);
-				break;
-			}
-		}
-	}
-#endif /* CONFIG_SCHEDSTATS */
-
 out_activate:
 #endif /* CONFIG_SMP */
-	ttwu_activate(p, rq, wake_flags & WF_SYNC, orig_cpu != cpu,
-		      cpu == this_cpu, en_flags);
+	activate_task(rq, p, en_flags);
 out_running:
 	ttwu_post_activation(p, rq, wake_flags);
+	ttwu_stat(rq, p, cpu, wake_flags);
 	success = 1;
 out:
 	task_rq_unlock(rq, &flags);
@@ -2576,14 +2578,11 @@ static void try_to_wake_up_local(struct 
 	if (!(p->state & TASK_NORMAL))
 		return;
 
-	if (!p->se.on_rq) {
-		if (likely(!task_running(rq, p))) {
-			schedstat_inc(rq, ttwu_count);
-			schedstat_inc(rq, ttwu_local);
-		}
-		ttwu_activate(p, rq, false, false, true, ENQUEUE_WAKEUP);
-	}
+	if (!p->se.on_rq)
+		activate_task(rq, p, ENQUEUE_WAKEUP);
+
 	ttwu_post_activation(p, rq, 0);
+	ttwu_stat(rq, p, smp_processor_id(), 0);
 }
 
 /**




* [PATCH 06/21] sched: Provide p->on_rq
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (4 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 05/21] sched: Clean up ttwu stats Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:33   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 07/21] sched: Serialize p->cpus_allowed and ttwu() using p->pi_lock Peter Zijlstra
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-onrq.patch --]
[-- Type: text/plain, Size: 9619 bytes --]

Provide a generic p->on_rq because the p->se.on_rq semantics are
unfavourable for lockless wakeups but needed for sched_fair.

In particular, p->on_rq is only cleared when we actually dequeue the
task in schedule() and not on any random dequeue as done by things
like __migrate_task() and __sched_setscheduler().

This also allows us to remove p->se usage from !sched_fair code.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 include/linux/sched.h   |    1 +
 kernel/sched.c          |   37 +++++++++++++++++++------------------
 kernel/sched_debug.c    |    2 +-
 kernel/sched_rt.c       |   16 ++++++++--------
 kernel/sched_stoptask.c |    2 +-
 5 files changed, 30 insertions(+), 28 deletions(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1208,6 +1208,7 @@ struct task_struct {
 #ifdef CONFIG_SMP
 	int on_cpu;
 #endif
+	int on_rq;
 
 	int prio, static_prio, normal_prio;
 	unsigned int rt_priority;
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1788,7 +1788,6 @@ static void enqueue_task(struct rq *rq, 
 	update_rq_clock(rq);
 	sched_info_queued(p);
 	p->sched_class->enqueue_task(rq, p, flags);
-	p->se.on_rq = 1;
 }
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
@@ -1796,7 +1795,6 @@ static void dequeue_task(struct rq *rq, 
 	update_rq_clock(rq);
 	sched_info_dequeued(p);
 	p->sched_class->dequeue_task(rq, p, flags);
-	p->se.on_rq = 0;
 }
 
 /*
@@ -2131,7 +2129,7 @@ static void check_preempt_curr(struct rq
 	 * A queue event has occurred, and we're going to schedule.  In
 	 * this case, we can save a useless back to back clock update.
 	 */
-	if (rq->curr->se.on_rq && test_tsk_need_resched(rq->curr))
+	if (rq->curr->on_rq && test_tsk_need_resched(rq->curr))
 		rq->skip_clock_update = 1;
 }
 
@@ -2206,7 +2204,7 @@ static bool migrate_task(struct task_str
 	 * If the task is not on a runqueue (and not running), then
 	 * the next wake-up will properly place the task.
 	 */
-	return p->se.on_rq || task_running(rq, p);
+	return p->on_rq || task_running(rq, p);
 }
 
 /*
@@ -2266,7 +2264,7 @@ unsigned long wait_task_inactive(struct 
 		rq = task_rq_lock(p, &flags);
 		trace_sched_wait_task(p);
 		running = task_running(rq, p);
-		on_rq = p->se.on_rq;
+		on_rq = p->on_rq;
 		ncsw = 0;
 		if (!match_state || p->state == match_state)
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
@@ -2502,7 +2500,7 @@ static int try_to_wake_up(struct task_st
 
 	cpu = task_cpu(p);
 
-	if (p->se.on_rq)
+	if (p->on_rq)
 		goto out_running;
 
 	orig_cpu = cpu;
@@ -2549,6 +2547,7 @@ static int try_to_wake_up(struct task_st
 out_activate:
 #endif /* CONFIG_SMP */
 	activate_task(rq, p, en_flags);
+	p->on_rq = 1;
 out_running:
 	ttwu_post_activation(p, rq, wake_flags);
 	ttwu_stat(rq, p, cpu, wake_flags);
@@ -2579,7 +2578,7 @@ static void try_to_wake_up_local(struct 
 	if (!(p->state & TASK_NORMAL))
 		return;
 
-	if (!p->se.on_rq)
+	if (!p->on_rq)
 		activate_task(rq, p, ENQUEUE_WAKEUP);
 
 	ttwu_post_activation(p, rq, 0);
@@ -2616,19 +2615,21 @@ int wake_up_state(struct task_struct *p,
  */
 static void __sched_fork(struct task_struct *p)
 {
+	p->on_rq			= 0;
+
+	p->se.on_rq			= 0;
 	p->se.exec_start		= 0;
 	p->se.sum_exec_runtime		= 0;
 	p->se.prev_sum_exec_runtime	= 0;
 	p->se.nr_migrations		= 0;
 	p->se.vruntime			= 0;
+	INIT_LIST_HEAD(&p->se.group_node);
 
 #ifdef CONFIG_SCHEDSTATS
 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
 #endif
 
 	INIT_LIST_HEAD(&p->rt.run_list);
-	p->se.on_rq = 0;
-	INIT_LIST_HEAD(&p->se.group_node);
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 	INIT_HLIST_HEAD(&p->preempt_notifiers);
@@ -2746,6 +2747,7 @@ void wake_up_new_task(struct task_struct
 
 	rq = task_rq_lock(p, &flags);
 	activate_task(rq, p, 0);
+	p->on_rq = 1;
 	trace_sched_wakeup_new(p, true);
 	check_preempt_curr(rq, p, WF_FORK);
 #ifdef CONFIG_SMP
@@ -4047,7 +4049,7 @@ static inline void schedule_debug(struct
 
 static void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
-	if (prev->se.on_rq)
+	if (prev->on_rq)
 		update_rq_clock(rq);
 	prev->sched_class->put_prev_task(rq, prev);
 }
@@ -4126,6 +4128,7 @@ asmlinkage void __sched schedule(void)
 					try_to_wake_up_local(to_wakeup);
 			}
 			deactivate_task(rq, prev, DEQUEUE_SLEEP);
+			prev->on_rq = 0;
 		}
 		switch_count = &prev->nvcsw;
 	}
@@ -4687,7 +4690,7 @@ void rt_mutex_setprio(struct task_struct
 	trace_sched_pi_setprio(p, prio);
 	oldprio = p->prio;
 	prev_class = p->sched_class;
-	on_rq = p->se.on_rq;
+	on_rq = p->on_rq;
 	running = task_current(rq, p);
 	if (on_rq)
 		dequeue_task(rq, p, 0);
@@ -4735,7 +4738,7 @@ void set_user_nice(struct task_struct *p
 		p->static_prio = NICE_TO_PRIO(nice);
 		goto out_unlock;
 	}
-	on_rq = p->se.on_rq;
+	on_rq = p->on_rq;
 	if (on_rq)
 		dequeue_task(rq, p, 0);
 
@@ -4869,8 +4872,6 @@ static struct task_struct *find_process_
 static void
 __setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
 {
-	BUG_ON(p->se.on_rq);
-
 	p->policy = policy;
 	p->rt_priority = prio;
 	p->normal_prio = normal_prio(p);
@@ -5022,7 +5023,7 @@ static int __sched_setscheduler(struct t
 		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 		goto recheck;
 	}
-	on_rq = p->se.on_rq;
+	on_rq = p->on_rq;
 	running = task_current(rq, p);
 	if (on_rq)
 		deactivate_task(rq, p, 0);
@@ -5939,7 +5940,7 @@ static int __migrate_task(struct task_st
 	 * If we're not on a rq, the next wake-up will ensure we're
 	 * placed properly.
 	 */
-	if (p->se.on_rq) {
+	if (p->on_rq) {
 		deactivate_task(rq_src, p, 0);
 		set_task_cpu(p, dest_cpu);
 		activate_task(rq_dest, p, 0);
@@ -7930,7 +7931,7 @@ static void normalize_task(struct rq *rq
 	int old_prio = p->prio;
 	int on_rq;
 
-	on_rq = p->se.on_rq;
+	on_rq = p->on_rq;
 	if (on_rq)
 		deactivate_task(rq, p, 0);
 	__setscheduler(rq, p, SCHED_NORMAL, 0);
@@ -8276,7 +8277,7 @@ void sched_move_task(struct task_struct 
 	rq = task_rq_lock(tsk, &flags);
 
 	running = task_current(rq, tsk);
-	on_rq = tsk->se.on_rq;
+	on_rq = tsk->on_rq;
 
 	if (on_rq)
 		dequeue_task(rq, tsk, 0);
Index: linux-2.6/kernel/sched_debug.c
===================================================================
--- linux-2.6.orig/kernel/sched_debug.c
+++ linux-2.6/kernel/sched_debug.c
@@ -152,7 +152,7 @@ static void print_rq(struct seq_file *m,
 	read_lock_irqsave(&tasklist_lock, flags);
 
 	do_each_thread(g, p) {
-		if (!p->se.on_rq || task_cpu(p) != rq_cpu)
+		if (!p->on_rq || task_cpu(p) != rq_cpu)
 			continue;
 
 		print_task(m, rq, p);
Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -1136,7 +1136,7 @@ static void put_prev_task_rt(struct rq *
 	 * The previous task needs to be made eligible for pushing
 	 * if it is still active
 	 */
-	if (p->se.on_rq && p->rt.nr_cpus_allowed > 1)
+	if (on_rt_rq(&p->rt) && p->rt.nr_cpus_allowed > 1)
 		enqueue_pushable_task(rq, p);
 }
 
@@ -1287,7 +1287,7 @@ static struct rq *find_lock_lowest_rq(st
 				     !cpumask_test_cpu(lowest_rq->cpu,
 						       &task->cpus_allowed) ||
 				     task_running(rq, task) ||
-				     !task->se.on_rq)) {
+				     !task->on_rq)) {
 
 				raw_spin_unlock(&lowest_rq->lock);
 				lowest_rq = NULL;
@@ -1321,7 +1321,7 @@ static struct task_struct *pick_next_pus
 	BUG_ON(task_current(rq, p));
 	BUG_ON(p->rt.nr_cpus_allowed <= 1);
 
-	BUG_ON(!p->se.on_rq);
+	BUG_ON(!p->on_rq);
 	BUG_ON(!rt_task(p));
 
 	return p;
@@ -1467,7 +1467,7 @@ static int pull_rt_task(struct rq *this_
 		 */
 		if (p && (p->prio < this_rq->rt.highest_prio.curr)) {
 			WARN_ON(p == src_rq->curr);
-			WARN_ON(!p->se.on_rq);
+			WARN_ON(!p->on_rq);
 
 			/*
 			 * There's a chance that p is higher in priority
@@ -1538,7 +1538,7 @@ static void set_cpus_allowed_rt(struct t
 	 * Update the migration status of the RQ if we have an RT task
 	 * which is running AND changing its weight value.
 	 */
-	if (p->se.on_rq && (weight != p->rt.nr_cpus_allowed)) {
+	if (p->on_rq && (weight != p->rt.nr_cpus_allowed)) {
 		struct rq *rq = task_rq(p);
 
 		if (!task_current(rq, p)) {
@@ -1608,7 +1608,7 @@ static void switched_from_rt(struct rq *
 	 * we may need to handle the pulling of RT tasks
 	 * now.
 	 */
-	if (p->se.on_rq && !rq->rt.rt_nr_running)
+	if (p->on_rq && !rq->rt.rt_nr_running)
 		pull_rt_task(rq);
 }
 
@@ -1638,7 +1638,7 @@ static void switched_to_rt(struct rq *rq
 	 * If that current running task is also an RT task
 	 * then see if we can move to another run queue.
 	 */
-	if (p->se.on_rq && rq->curr != p) {
+	if (p->on_rq && rq->curr != p) {
 #ifdef CONFIG_SMP
 		if (rq->rt.overloaded && push_rt_task(rq) &&
 		    /* Don't resched if we changed runqueues */
@@ -1657,7 +1657,7 @@ static void switched_to_rt(struct rq *rq
 static void
 prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
 {
-	if (!p->se.on_rq)
+	if (!p->on_rq)
 		return;
 
 	if (rq->curr == p) {
Index: linux-2.6/kernel/sched_stoptask.c
===================================================================
--- linux-2.6.orig/kernel/sched_stoptask.c
+++ linux-2.6/kernel/sched_stoptask.c
@@ -26,7 +26,7 @@ static struct task_struct *pick_next_tas
 {
 	struct task_struct *stop = rq->stop;
 
-	if (stop && stop->se.on_rq)
+	if (stop && stop->on_rq)
 		return stop;
 
 	return NULL;




* [PATCH 07/21] sched: Serialize p->cpus_allowed and ttwu() using p->pi_lock
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (5 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 06/21] sched: Provide p->on_rq Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:34   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 08/21] sched: Drop the rq argument to sched_class::select_task_rq() Peter Zijlstra
                   ` (16 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-pi_lock-wakeup.patch --]
[-- Type: text/plain, Size: 3668 bytes --]

Currently p->pi_lock already serializes p->sched_class. Also put
p->cpus_allowed and try_to_wake_up() under it; this prepares the way
to do the first part of ttwu() without holding rq->lock.

By having p->sched_class and p->cpus_allowed serialized by p->pi_lock,
we prepare the way to call select_task_rq() without holding rq->lock.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 kernel/sched.c |   37 ++++++++++++++++---------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2301,7 +2301,7 @@ void task_oncpu_function_call(struct tas
 
 #ifdef CONFIG_SMP
 /*
- * ->cpus_allowed is protected by either TASK_WAKING or rq->lock held.
+ * ->cpus_allowed is protected by both rq->lock and p->pi_lock
  */
 static int select_fallback_rq(int cpu, struct task_struct *p)
 {
@@ -2334,7 +2334,7 @@ static int select_fallback_rq(int cpu, s
 }
 
 /*
- * The caller (fork, wakeup) owns TASK_WAKING, ->cpus_allowed is stable.
+ * The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable.
  */
 static inline
 int select_task_rq(struct rq *rq, struct task_struct *p, int sd_flags, int wake_flags)
@@ -2450,7 +2450,8 @@ static int try_to_wake_up(struct task_st
 	this_cpu = get_cpu();
 
 	smp_wmb();
-	rq = task_rq_lock(p, &flags);
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	rq = __task_rq_lock(p);
 	if (!(p->state & state))
 		goto out;
 
@@ -2508,7 +2509,8 @@ static int try_to_wake_up(struct task_st
 	ttwu_stat(rq, p, cpu, wake_flags);
 	success = 1;
 out:
-	task_rq_unlock(rq, &flags);
+	__task_rq_unlock(rq);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 	put_cpu();
 
 	return success;
@@ -4543,6 +4545,8 @@ void rt_mutex_setprio(struct task_struct
 
 	BUG_ON(prio < 0 || prio > MAX_PRIO);
 
+	lockdep_assert_held(&p->pi_lock);
+
 	rq = task_rq_lock(p, &flags);
 
 	trace_sched_pi_setprio(p, prio);
@@ -5150,7 +5154,6 @@ long sched_getaffinity(pid_t pid, struct
 {
 	struct task_struct *p;
 	unsigned long flags;
-	struct rq *rq;
 	int retval;
 
 	get_online_cpus();
@@ -5165,9 +5168,9 @@ long sched_getaffinity(pid_t pid, struct
 	if (retval)
 		goto out_unlock;
 
-	rq = task_rq_lock(p, &flags);
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	cpumask_and(mask, &p->cpus_allowed, cpu_online_mask);
-	task_rq_unlock(rq, &flags);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 
 out_unlock:
 	rcu_read_unlock();
@@ -5652,18 +5655,8 @@ int set_cpus_allowed_ptr(struct task_str
 	unsigned int dest_cpu;
 	int ret = 0;
 
-	/*
-	 * Serialize against TASK_WAKING so that ttwu() and wunt() can
-	 * drop the rq->lock and still rely on ->cpus_allowed.
-	 */
-again:
-	while (task_is_waking(p))
-		cpu_relax();
-	rq = task_rq_lock(p, &flags);
-	if (task_is_waking(p)) {
-		task_rq_unlock(rq, &flags);
-		goto again;
-	}
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	rq = __task_rq_lock(p);
 
 	if (!cpumask_intersects(new_mask, cpu_active_mask)) {
 		ret = -EINVAL;
@@ -5691,13 +5684,15 @@ int set_cpus_allowed_ptr(struct task_str
 	if (migrate_task(p, rq)) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
-		task_rq_unlock(rq, &flags);
+		__task_rq_unlock(rq);
+		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
 		tlb_migrate_finish(p->mm);
 		return 0;
 	}
 out:
-	task_rq_unlock(rq, &flags);
+	__task_rq_unlock(rq);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 
 	return ret;
 }




* [PATCH 08/21] sched: Drop the rq argument to sched_class::select_task_rq()
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (6 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 07/21] sched: Serialize p->cpus_allowed and ttwu() using p->pi_lock Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:34   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 09/21] sched: Remove rq argument to sched_class::task_waking() Peter Zijlstra
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-select_task_rq.patch --]
[-- Type: text/plain, Size: 7480 bytes --]

In preparation for calling select_task_rq() without rq->lock held,
drop the dependency on the rq argument.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 include/linux/sched.h   |    3 +--
 kernel/sched.c          |   20 +++++++++++---------
 kernel/sched_fair.c     |    2 +-
 kernel/sched_idletask.c |    2 +-
 kernel/sched_rt.c       |   38 ++++++++++++++++++++++++++------------
 kernel/sched_stoptask.c |    3 +--
 6 files changed, 41 insertions(+), 27 deletions(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1063,8 +1063,7 @@ struct sched_class {
 	void (*put_prev_task) (struct rq *rq, struct task_struct *p);
 
 #ifdef CONFIG_SMP
-	int  (*select_task_rq)(struct rq *rq, struct task_struct *p,
-			       int sd_flag, int flags);
+	int  (*select_task_rq)(struct task_struct *p, int sd_flag, int flags);
 
 	void (*pre_schedule) (struct rq *this_rq, struct task_struct *task);
 	void (*post_schedule) (struct rq *this_rq);
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2138,13 +2138,15 @@ static int migration_cpu_stop(void *data
  * The task's runqueue lock must be held.
  * Returns true if you have to wait for migration thread.
  */
-static bool migrate_task(struct task_struct *p, struct rq *rq)
+static bool need_migrate_task(struct task_struct *p)
 {
 	/*
 	 * If the task is not on a runqueue (and not running), then
 	 * the next wake-up will properly place the task.
 	 */
-	return p->on_rq || task_running(rq, p);
+	bool running = p->on_rq || p->on_cpu;
+	smp_rmb(); /* finish_lock_switch() */
+	return running;
 }
 
 /*
@@ -2337,9 +2339,9 @@ static int select_fallback_rq(int cpu, s
  * The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable.
  */
 static inline
-int select_task_rq(struct rq *rq, struct task_struct *p, int sd_flags, int wake_flags)
+int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
 {
-	int cpu = p->sched_class->select_task_rq(rq, p, sd_flags, wake_flags);
+	int cpu = p->sched_class->select_task_rq(p, sd_flags, wake_flags);
 
 	/*
 	 * In order not to call set_task_cpu() on a blocking task we need
@@ -2484,7 +2486,7 @@ static int try_to_wake_up(struct task_st
 		en_flags |= ENQUEUE_WAKING;
 	}
 
-	cpu = select_task_rq(rq, p, SD_BALANCE_WAKE, wake_flags);
+	cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
 	if (cpu != orig_cpu)
 		set_task_cpu(p, cpu);
 	__task_rq_unlock(rq);
@@ -2694,7 +2696,7 @@ void wake_up_new_task(struct task_struct
 	 * We set TASK_WAKING so that select_task_rq() can drop rq->lock
 	 * without people poking at ->cpus_allowed.
 	 */
-	cpu = select_task_rq(rq, p, SD_BALANCE_FORK, 0);
+	cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
 	set_task_cpu(p, cpu);
 
 	p->state = TASK_RUNNING;
@@ -3420,7 +3422,7 @@ void sched_exec(void)
 	int dest_cpu;
 
 	rq = task_rq_lock(p, &flags);
-	dest_cpu = p->sched_class->select_task_rq(rq, p, SD_BALANCE_EXEC, 0);
+	dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0);
 	if (dest_cpu == smp_processor_id())
 		goto unlock;
 
@@ -3428,7 +3430,7 @@ void sched_exec(void)
 	 * select_task_rq() can race against ->cpus_allowed
 	 */
 	if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed) &&
-	    likely(cpu_active(dest_cpu)) && migrate_task(p, rq)) {
+	    likely(cpu_active(dest_cpu)) && need_migrate_task(p)) {
 		struct migration_arg arg = { p, dest_cpu };
 
 		task_rq_unlock(rq, &flags);
@@ -5681,7 +5683,7 @@ int set_cpus_allowed_ptr(struct task_str
 		goto out;
 
 	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
-	if (migrate_task(p, rq)) {
+	if (need_migrate_task(p)) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
 		__task_rq_unlock(rq);
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1623,7 +1623,7 @@ static int select_idle_sibling(struct ta
  * preempt must be disabled.
  */
 static int
-select_task_rq_fair(struct rq *rq, struct task_struct *p, int sd_flag, int wake_flags)
+select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 {
 	struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL;
 	int cpu = smp_processor_id();
Index: linux-2.6/kernel/sched_idletask.c
===================================================================
--- linux-2.6.orig/kernel/sched_idletask.c
+++ linux-2.6/kernel/sched_idletask.c
@@ -7,7 +7,7 @@
 
 #ifdef CONFIG_SMP
 static int
-select_task_rq_idle(struct rq *rq, struct task_struct *p, int sd_flag, int flags)
+select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
 {
 	return task_cpu(p); /* IDLE tasks as never migrated */
 }
Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -973,13 +973,23 @@ static void yield_task_rt(struct rq *rq)
 static int find_lowest_rq(struct task_struct *task);
 
 static int
-select_task_rq_rt(struct rq *rq, struct task_struct *p, int sd_flag, int flags)
+select_task_rq_rt(struct task_struct *p, int sd_flag, int flags)
 {
+	struct task_struct *curr;
+	struct rq *rq;
+	int cpu;
+
 	if (sd_flag != SD_BALANCE_WAKE)
 		return smp_processor_id();
 
+	cpu = task_cpu(p);
+	rq = cpu_rq(cpu);
+
+	rcu_read_lock();
+	curr = ACCESS_ONCE(rq->curr); /* unlocked access */
+
 	/*
-	 * If the current task is an RT task, then
+	 * If the current task on @p's runqueue is an RT task, then
 	 * try to see if we can wake this RT task up on another
 	 * runqueue. Otherwise simply start this RT task
 	 * on its current runqueue.
@@ -993,21 +1003,25 @@ select_task_rq_rt(struct rq *rq, struct 
 	 * lock?
 	 *
 	 * For equal prio tasks, we just let the scheduler sort it out.
+	 *
+	 * Otherwise, just let it ride on the affined RQ and the
+	 * post-schedule router will push the preempted task away
+	 *
+	 * This test is optimistic, if we get it wrong the load-balancer
+	 * will have to sort it out.
 	 */
-	if (unlikely(rt_task(rq->curr)) &&
-	    (rq->curr->rt.nr_cpus_allowed < 2 ||
-	     rq->curr->prio < p->prio) &&
+	if (curr && unlikely(rt_task(curr)) &&
+	    (curr->rt.nr_cpus_allowed < 2 ||
+	     curr->prio < p->prio) &&
 	    (p->rt.nr_cpus_allowed > 1)) {
-		int cpu = find_lowest_rq(p);
+		int target = find_lowest_rq(p);
 
-		return (cpu == -1) ? task_cpu(p) : cpu;
+		if (target != -1)
+			cpu = target;
 	}
+	rcu_read_unlock();
 
-	/*
-	 * Otherwise, just let it ride on the affined RQ and the
-	 * post-schedule router will push the preempted task away
-	 */
-	return task_cpu(p);
+	return cpu;
 }
 
 static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
Index: linux-2.6/kernel/sched_stoptask.c
===================================================================
--- linux-2.6.orig/kernel/sched_stoptask.c
+++ linux-2.6/kernel/sched_stoptask.c
@@ -9,8 +9,7 @@
 
 #ifdef CONFIG_SMP
 static int
-select_task_rq_stop(struct rq *rq, struct task_struct *p,
-		    int sd_flag, int flags)
+select_task_rq_stop(struct task_struct *p, int sd_flag, int flags)
 {
 	return task_cpu(p); /* stop tasks as never migrate */
 }




* [PATCH 09/21] sched: Remove rq argument to sched_class::task_waking()
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (7 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 08/21] sched: Drop the rq argument to sched_class::select_task_rq() Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:35   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 10/21] sched: Deal with non-atomic min_vruntime reads on 32bits Peter Zijlstra
                   ` (14 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-task_waking.patch --]
[-- Type: text/plain, Size: 2283 bytes --]

In preparation for calling this without rq->lock held, remove the
dependency on the rq argument.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 include/linux/sched.h |   10 +++++++---
 kernel/sched.c        |    2 +-
 kernel/sched_fair.c   |    4 +++-
 3 files changed, 11 insertions(+), 5 deletions(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1045,8 +1045,12 @@ struct sched_domain;
 #define WF_FORK		0x02		/* child wakeup after fork */
 
 #define ENQUEUE_WAKEUP		1
-#define ENQUEUE_WAKING		2
-#define ENQUEUE_HEAD		4
+#define ENQUEUE_HEAD		2
+#ifdef CONFIG_SMP
+#define ENQUEUE_WAKING		4	/* sched_class::task_waking was called */
+#else
+#define ENQUEUE_WAKING		0
+#endif
 
 #define DEQUEUE_SLEEP		1
 
@@ -1067,7 +1071,7 @@ struct sched_class {
 
 	void (*pre_schedule) (struct rq *this_rq, struct task_struct *task);
 	void (*post_schedule) (struct rq *this_rq);
-	void (*task_waking) (struct rq *this_rq, struct task_struct *task);
+	void (*task_waking) (struct task_struct *task);
 	void (*task_woken) (struct rq *this_rq, struct task_struct *task);
 
 	void (*set_cpus_allowed)(struct task_struct *p,
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2481,7 +2481,7 @@ static int try_to_wake_up(struct task_st
 	p->state = TASK_WAKING;
 
 	if (p->sched_class->task_waking) {
-		p->sched_class->task_waking(rq, p);
+		p->sched_class->task_waking(p);
 		en_flags |= ENQUEUE_WAKING;
 	}
 
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1338,11 +1338,13 @@ static void yield_task_fair(struct rq *r
 
 #ifdef CONFIG_SMP
 
-static void task_waking_fair(struct rq *rq, struct task_struct *p)
+static void task_waking_fair(struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
+	lockdep_assert_held(&task_rq(p)->lock);
+
 	se->vruntime -= cfs_rq->min_vruntime;
 }
 




* [PATCH 10/21] sched: Deal with non-atomic min_vruntime reads on 32bits
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (8 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 09/21] sched: Remove rq argument to sched_class::task_waking() Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:35   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 11/21] sched: Delay task_contributes_to_load() Peter Zijlstra
                   ` (13 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-fair-fix-wakeup.patch --]
[-- Type: text/plain, Size: 1716 bytes --]

In order to avoid reading partially updated min_vruntime values on
32bit, implement a seqcount-like solution.
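Outside the kernel, the same idiom can be shown in miniature. Below is a
user-space sketch, not the kernel code: the struct and function names are
invented, and GCC's `__sync_synchronize()` stands in for `smp_wmb()`/
`smp_rmb()`. A writer publishes the 64-bit value and then a copy with a
write barrier between the two stores; a reader retries until both halves
agree, mirroring `min_vruntime`/`min_vruntime_copy` above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seqcount-like split of a 64-bit value so a 32-bit reader never sees
 * a torn update: writers store val, barrier, then copy; readers retry
 * until copy == val.
 */
struct seq_u64 {
	volatile uint64_t val;
	volatile uint64_t copy;
};

static void seq_write(struct seq_u64 *s, uint64_t v)
{
	s->val = v;
	__sync_synchronize();	/* order val before copy, like smp_wmb() */
	s->copy = v;
}

static uint64_t seq_read(const struct seq_u64 *s)
{
	uint64_t v, c;

	do {
		c = s->copy;
		__sync_synchronize();	/* order copy before val, like smp_rmb() */
		v = s->val;
	} while (v != c);	/* mismatch means the writer was mid-update */

	return v;
}
```

On 64-bit the kernel compiles the copy away entirely (the `#ifndef
CONFIG_64BIT` blocks); the sketch keeps a single unconditional path for
brevity.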

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 kernel/sched.c      |    3 +++
 kernel/sched_fair.c |   19 +++++++++++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -313,6 +313,9 @@ struct cfs_rq {
 
 	u64 exec_clock;
 	u64 min_vruntime;
+#ifndef CONFIG_64BIT
+	u64 min_vruntime_copy;
+#endif
 
 	struct rb_root tasks_timeline;
 	struct rb_node *rb_leftmost;
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -365,6 +365,10 @@ static void update_min_vruntime(struct c
 	}
 
 	cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
+#ifndef CONFIG_64BIT
+	smp_wmb();
+	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
+#endif
 }
 
 /*
@@ -1342,10 +1346,21 @@ static void task_waking_fair(struct task
 {
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	u64 min_vruntime;
 
-	lockdep_assert_held(&task_rq(p)->lock);
+#ifndef CONFIG_64BIT
+	u64 min_vruntime_copy;
 
-	se->vruntime -= cfs_rq->min_vruntime;
+	do {
+		min_vruntime_copy = cfs_rq->min_vruntime_copy;
+		smp_rmb();
+		min_vruntime = cfs_rq->min_vruntime;
+	} while (min_vruntime != min_vruntime_copy);
+#else
+	min_vruntime = cfs_rq->min_vruntime;
+#endif
+
+	se->vruntime -= min_vruntime;
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED




* [PATCH 11/21] sched: Delay task_contributes_to_load()
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (9 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 10/21] sched: Deal with non-atomic min_vruntime reads on 32bits Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:35   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 12/21] sched: Also serialize ttwu_local() with p->pi_lock Peter Zijlstra
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-ttwu-contribute-to-load.patch --]
[-- Type: text/plain, Size: 1824 bytes --]

In preparation for having to call task_contributes_to_load() without
holding rq->lock, we need to store the result until we do hold it and
can update the rq accounting accordingly.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 include/linux/sched.h |    1 +
 kernel/sched.c        |   16 ++++------------
 2 files changed, 5 insertions(+), 12 deletions(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1264,6 +1264,7 @@ struct task_struct {
 
 	/* Revert to default priority/policy when forking */
 	unsigned sched_reset_on_fork:1;
+	unsigned sched_contributes_to_load:1;
 
 	pid_t pid;
 	pid_t tgid;
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2467,18 +2467,7 @@ static int try_to_wake_up(struct task_st
 	if (unlikely(task_running(rq, p)))
 		goto out_activate;
 
-	/*
-	 * In order to handle concurrent wakeups and release the rq->lock
-	 * we put the task in TASK_WAKING state.
-	 *
-	 * First fix up the nr_uninterruptible count:
-	 */
-	if (task_contributes_to_load(p)) {
-		if (likely(cpu_online(orig_cpu)))
-			rq->nr_uninterruptible--;
-		else
-			this_rq()->nr_uninterruptible--;
-	}
+	p->sched_contributes_to_load = !!task_contributes_to_load(p);
 	p->state = TASK_WAKING;
 
 	if (p->sched_class->task_waking) {
@@ -2503,6 +2492,9 @@ static int try_to_wake_up(struct task_st
 	WARN_ON(task_cpu(p) != cpu);
 	WARN_ON(p->state != TASK_WAKING);
 
+	if (p->sched_contributes_to_load)
+		rq->nr_uninterruptible--;
+
 out_activate:
 #endif /* CONFIG_SMP */
 	activate_task(rq, p, en_flags);




* [PATCH 12/21] sched: Also serialize ttwu_local() with p->pi_lock
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (10 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 11/21] sched: Delay task_contributes_to_load() Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:36   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 13/21] sched: Add p->pi_lock to task_rq_lock() Peter Zijlstra
                   ` (11 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-ttwu_local.patch --]
[-- Type: text/plain, Size: 2594 bytes --]

Since we now serialize ttwu() using p->pi_lock, we also need to
serialize ttwu_local() the same way; otherwise, once we drop rq->lock
in ttwu(), it can race with ttwu_local().

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c |   28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2560,9 +2560,9 @@ static int try_to_wake_up(struct task_st
  * try_to_wake_up_local - try to wake up a local task with rq lock held
  * @p: the thread to be awakened
  *
- * Put @p on the run-queue if it's not already there.  The caller must
+ * Put @p on the run-queue if it's not already there. The caller must
  * ensure that this_rq() is locked, @p is bound to this_rq() and not
- * the current task.  this_rq() stays locked over invocation.
+ * the current task.
  */
 static void try_to_wake_up_local(struct task_struct *p)
 {
@@ -2570,16 +2570,21 @@ static void try_to_wake_up_local(struct 
 
 	BUG_ON(rq != this_rq());
 	BUG_ON(p == current);
-	lockdep_assert_held(&rq->lock);
+
+	raw_spin_unlock(&rq->lock);
+	raw_spin_lock(&p->pi_lock);
+	raw_spin_lock(&rq->lock);
 
 	if (!(p->state & TASK_NORMAL))
-		return;
+		goto out;
 
 	if (!p->on_rq)
 		activate_task(rq, p, ENQUEUE_WAKEUP);
 
 	ttwu_post_activation(p, rq, 0);
 	ttwu_stat(rq, p, smp_processor_id(), 0);
+out:
+	raw_spin_unlock(&p->pi_lock);
 }
 
 /**
@@ -4084,6 +4089,7 @@ pick_next_task(struct rq *rq)
  */
 asmlinkage void __sched schedule(void)
 {
+	struct task_struct *to_wakeup = NULL;
 	struct task_struct *prev, *next;
 	unsigned long *switch_count;
 	struct rq *rq;
@@ -4114,13 +4120,8 @@ asmlinkage void __sched schedule(void)
 			 * task to maintain concurrency.  If so, wake
 			 * up the task.
 			 */
-			if (prev->flags & PF_WQ_WORKER) {
-				struct task_struct *to_wakeup;
-
+			if (prev->flags & PF_WQ_WORKER)
 				to_wakeup = wq_worker_sleeping(prev, cpu);
-				if (to_wakeup)
-					try_to_wake_up_local(to_wakeup);
-			}
 			deactivate_task(rq, prev, DEQUEUE_SLEEP);
 			prev->on_rq = 0;
 		}
@@ -4137,8 +4138,13 @@ asmlinkage void __sched schedule(void)
 		raw_spin_lock(&rq->lock);
 	}
 
+	/*
+	 * All three: try_to_wake_up_local(), pre_schedule() and idle_balance()
+	 * can drop rq->lock.
+	 */
+	if (to_wakeup)
+		try_to_wake_up_local(to_wakeup);
 	pre_schedule(rq, prev);
-
 	if (unlikely(!rq->nr_running))
 		idle_balance(cpu, rq);
 




* [PATCH 13/21] sched: Add p->pi_lock to task_rq_lock()
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (11 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 12/21] sched: Also serialize ttwu_local() with p->pi_lock Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:36   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 14/21] sched: Drop rq->lock from first part of wake_up_new_task() Peter Zijlstra
                   ` (10 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-ttwu-task_rq_lock.patch --]
[-- Type: text/plain, Size: 10564 bytes --]

In order to be able to call set_task_cpu() while holding either
p->pi_lock or task_rq(p)->lock, we need to hold both locks in order to
stabilize task_rq().

This makes task_rq_lock() acquire both locks, and have
__task_rq_lock() validate that p->pi_lock is held. This increases the
locking overhead for most scheduler syscalls but allows reduction of
rq->lock contention for some scheduler hot paths (ttwu).
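The lock-then-revalidate shape of the new task_rq_lock() can be sketched
in user space with pthread mutexes. This is an illustrative miniature
with invented structures, not the kernel implementation (which also
disables interrupts and handles the irqsave flags): because the task->rq
mapping can change until the rq lock is held, we take pi_lock, lock the
rq we believe the task is on, and retry if it migrated in between.

```c
#include <assert.h>
#include <pthread.h>

struct rq {
	pthread_mutex_t lock;
};

struct task {
	pthread_mutex_t pi_lock;
	struct rq *rq;		/* stable only with pi_lock + rq->lock held */
};

/* Lock p->pi_lock and the rq @p resides on; retry if @p migrated. */
static struct rq *mini_task_rq_lock(struct task *p)
{
	struct rq *rq;

	pthread_mutex_lock(&p->pi_lock);
	for (;;) {
		rq = p->rq;			/* snapshot, may be stale */
		pthread_mutex_lock(&rq->lock);
		if (rq == p->rq)		/* still the task's rq */
			return rq;
		pthread_mutex_unlock(&rq->lock);	/* migrated: retry */
	}
}

static void mini_task_rq_unlock(struct task *p, struct rq *rq)
{
	pthread_mutex_unlock(&rq->lock);
	pthread_mutex_unlock(&p->pi_lock);
}
```

Holding pi_lock across the loop is what lets the revalidation terminate:
a migrator must take both locks to move the task, so once both are held
the mapping cannot change under us.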

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 kernel/sched.c |  103 ++++++++++++++++++++++++++-------------------------------
 1 file changed, 47 insertions(+), 56 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -600,7 +600,7 @@ static inline int cpu_of(struct rq *rq)
  * Return the group to which this tasks belongs.
  *
  * We use task_subsys_state_check() and extend the RCU verification
- * with lockdep_is_held(&task_rq(p)->lock) because cpu_cgroup_attach()
+ * with lockdep_is_held(&p->pi_lock) because cpu_cgroup_attach()
  * holds that lock for each task it moves into the cgroup. Therefore
  * by holding that lock, we pin the task to the current cgroup.
  */
@@ -610,7 +610,7 @@ static inline struct task_group *task_gr
 	struct cgroup_subsys_state *css;
 
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
-			lockdep_is_held(&task_rq(p)->lock));
+			lockdep_is_held(&p->pi_lock));
 	tg = container_of(css, struct task_group, css);
 
 	return autogroup_task_group(p, tg);
@@ -926,23 +926,15 @@ static inline void finish_lock_switch(st
 #endif /* __ARCH_WANT_UNLOCKED_CTXSW */
 
 /*
- * Check whether the task is waking, we use this to synchronize ->cpus_allowed
- * against ttwu().
- */
-static inline int task_is_waking(struct task_struct *p)
-{
-	return unlikely(p->state == TASK_WAKING);
-}
-
-/*
- * __task_rq_lock - lock the runqueue a given task resides on.
- * Must be called interrupts disabled.
+ * __task_rq_lock - lock the rq @p resides on.
  */
 static inline struct rq *__task_rq_lock(struct task_struct *p)
 	__acquires(rq->lock)
 {
 	struct rq *rq;
 
+	lockdep_assert_held(&p->pi_lock);
+
 	for (;;) {
 		rq = task_rq(p);
 		raw_spin_lock(&rq->lock);
@@ -953,22 +945,22 @@ static inline struct rq *__task_rq_lock(
 }
 
 /*
- * task_rq_lock - lock the runqueue a given task resides on and disable
- * interrupts. Note the ordering: we can safely lookup the task_rq without
- * explicitly disabling preemption.
+ * task_rq_lock - lock p->pi_lock and lock the rq @p resides on.
  */
 static struct rq *task_rq_lock(struct task_struct *p, unsigned long *flags)
+	__acquires(p->pi_lock)
 	__acquires(rq->lock)
 {
 	struct rq *rq;
 
 	for (;;) {
-		local_irq_save(*flags);
+		raw_spin_lock_irqsave(&p->pi_lock, *flags);
 		rq = task_rq(p);
 		raw_spin_lock(&rq->lock);
 		if (likely(rq == task_rq(p)))
 			return rq;
-		raw_spin_unlock_irqrestore(&rq->lock, *flags);
+		raw_spin_unlock(&rq->lock);
+		raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
 	}
 }
 
@@ -978,10 +970,13 @@ static void __task_rq_unlock(struct rq *
 	raw_spin_unlock(&rq->lock);
 }
 
-static inline void task_rq_unlock(struct rq *rq, unsigned long *flags)
+static inline void
+task_rq_unlock(struct rq *rq, struct task_struct *p, unsigned long *flags)
 	__releases(rq->lock)
+	__releases(p->pi_lock)
 {
-	raw_spin_unlock_irqrestore(&rq->lock, *flags);
+	raw_spin_unlock(&rq->lock);
+	raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
 }
 
 /*
@@ -2178,6 +2173,11 @@ void set_task_cpu(struct task_struct *p,
 	 */
 	WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
 			!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
+
+#ifdef CONFIG_LOCKDEP
+	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
+				      lockdep_is_held(&task_rq(p)->lock)));
+#endif
 #endif
 
 	trace_sched_migrate_task(p, new_cpu);
@@ -2273,7 +2273,7 @@ unsigned long wait_task_inactive(struct 
 		ncsw = 0;
 		if (!match_state || p->state == match_state)
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
-		task_rq_unlock(rq, &flags);
+		task_rq_unlock(rq, p, &flags);
 
 		/*
 		 * If it changed from the expected state, bail out now.
@@ -2639,6 +2639,7 @@ static void __sched_fork(struct task_str
  */
 void sched_fork(struct task_struct *p, int clone_flags)
 {
+	unsigned long flags;
 	int cpu = get_cpu();
 
 	__sched_fork(p);
@@ -2689,9 +2690,9 @@ void sched_fork(struct task_struct *p, i
 	 *
 	 * Silence PROVE_RCU.
 	 */
-	rcu_read_lock();
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	set_task_cpu(p, cpu);
-	rcu_read_unlock();
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 
 #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
 	if (likely(sched_info_on()))
@@ -2740,7 +2741,7 @@ void wake_up_new_task(struct task_struct
 	set_task_cpu(p, cpu);
 
 	p->state = TASK_RUNNING;
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 #endif
 
 	rq = task_rq_lock(p, &flags);
@@ -2751,7 +2752,7 @@ void wake_up_new_task(struct task_struct
 	if (p->sched_class->task_woken)
 		p->sched_class->task_woken(rq, p);
 #endif
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 	put_cpu();
 }
 
@@ -3476,12 +3477,12 @@ void sched_exec(void)
 	    likely(cpu_active(dest_cpu)) && need_migrate_task(p)) {
 		struct migration_arg arg = { p, dest_cpu };
 
-		task_rq_unlock(rq, &flags);
+		task_rq_unlock(rq, p, &flags);
 		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
 		return;
 	}
 unlock:
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 }
 
 #endif
@@ -3518,7 +3519,7 @@ unsigned long long task_delta_exec(struc
 
 	rq = task_rq_lock(p, &flags);
 	ns = do_task_delta_exec(p, rq);
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 
 	return ns;
 }
@@ -3536,7 +3537,7 @@ unsigned long long task_sched_runtime(st
 
 	rq = task_rq_lock(p, &flags);
 	ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 
 	return ns;
 }
@@ -3560,7 +3561,7 @@ unsigned long long thread_group_sched_ru
 	rq = task_rq_lock(p, &flags);
 	thread_group_cputime(p, &totals);
 	ns = totals.sum_exec_runtime + do_task_delta_exec(p, rq);
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 
 	return ns;
 }
@@ -4675,16 +4676,13 @@ EXPORT_SYMBOL(sleep_on_timeout);
  */
 void rt_mutex_setprio(struct task_struct *p, int prio)
 {
-	unsigned long flags;
 	int oldprio, on_rq, running;
 	struct rq *rq;
 	const struct sched_class *prev_class;
 
 	BUG_ON(prio < 0 || prio > MAX_PRIO);
 
-	lockdep_assert_held(&p->pi_lock);
-
-	rq = task_rq_lock(p, &flags);
+	rq = __task_rq_lock(p);
 
 	trace_sched_pi_setprio(p, prio);
 	oldprio = p->prio;
@@ -4709,7 +4707,7 @@ void rt_mutex_setprio(struct task_struct
 		enqueue_task(rq, p, oldprio < prio ? ENQUEUE_HEAD : 0);
 
 	check_class_changed(rq, p, prev_class, oldprio);
-	task_rq_unlock(rq, &flags);
+	__task_rq_unlock(rq);
 }
 
 #endif
@@ -4757,7 +4755,7 @@ void set_user_nice(struct task_struct *p
 			resched_task(rq->curr);
 	}
 out_unlock:
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 }
 EXPORT_SYMBOL(set_user_nice);
 
@@ -4979,20 +4977,17 @@ static int __sched_setscheduler(struct t
 	/*
 	 * make sure no PI-waiters arrive (or leave) while we are
 	 * changing the priority of the task:
-	 */
-	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	/*
+	 *
 	 * To be able to change p->policy safely, the apropriate
 	 * runqueue lock must be held.
 	 */
-	rq = __task_rq_lock(p);
+	rq = task_rq_lock(p, &flags);
 
 	/*
 	 * Changing the policy of the stop threads its a very bad idea
 	 */
 	if (p == rq->stop) {
-		__task_rq_unlock(rq);
-		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		task_rq_unlock(rq, p, &flags);
 		return -EINVAL;
 	}
 
@@ -5005,8 +5000,7 @@ static int __sched_setscheduler(struct t
 		if (rt_bandwidth_enabled() && rt_policy(policy) &&
 				task_group(p)->rt_bandwidth.rt_runtime == 0 &&
 				!task_group_is_autogroup(task_group(p))) {
-			__task_rq_unlock(rq);
-			raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+			task_rq_unlock(rq, p, &flags);
 			return -EPERM;
 		}
 	}
@@ -5015,8 +5009,7 @@ static int __sched_setscheduler(struct t
 	/* recheck policy now with rq lock held */
 	if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
 		policy = oldpolicy = -1;
-		__task_rq_unlock(rq);
-		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		task_rq_unlock(rq, p, &flags);
 		goto recheck;
 	}
 	on_rq = p->on_rq;
@@ -5038,8 +5031,7 @@ static int __sched_setscheduler(struct t
 		activate_task(rq, p, 0);
 
 	check_class_changed(rq, p, prev_class, oldprio);
-	__task_rq_unlock(rq);
-	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	task_rq_unlock(rq, p, &flags);
 
 	rt_mutex_adjust_pi(p);
 
@@ -5620,7 +5612,7 @@ SYSCALL_DEFINE2(sched_rr_get_interval, p
 
 	rq = task_rq_lock(p, &flags);
 	time_slice = p->sched_class->get_rr_interval(rq, p);
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 
 	rcu_read_unlock();
 	jiffies_to_timespec(time_slice, &t);
@@ -5843,8 +5835,7 @@ int set_cpus_allowed_ptr(struct task_str
 	unsigned int dest_cpu;
 	int ret = 0;
 
-	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	rq = __task_rq_lock(p);
+	rq = task_rq_lock(p, &flags);
 
 	if (!cpumask_intersects(new_mask, cpu_active_mask)) {
 		ret = -EINVAL;
@@ -5872,15 +5863,13 @@ int set_cpus_allowed_ptr(struct task_str
 	if (need_migrate_task(p)) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
-		__task_rq_unlock(rq);
-		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		task_rq_unlock(rq, p, &flags);
 		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
 		tlb_migrate_finish(p->mm);
 		return 0;
 	}
 out:
-	__task_rq_unlock(rq);
-	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	task_rq_unlock(rq, p, &flags);
 
 	return ret;
 }
@@ -5908,6 +5897,7 @@ static int __migrate_task(struct task_st
 	rq_src = cpu_rq(src_cpu);
 	rq_dest = cpu_rq(dest_cpu);
 
+	raw_spin_lock(&p->pi_lock);
 	double_rq_lock(rq_src, rq_dest);
 	/* Already moved. */
 	if (task_cpu(p) != src_cpu)
@@ -5930,6 +5920,7 @@ static int __migrate_task(struct task_st
 	ret = 1;
 fail:
 	double_rq_unlock(rq_src, rq_dest);
+	raw_spin_unlock(&p->pi_lock);
 	return ret;
 }
 
@@ -8656,7 +8647,7 @@ void sched_move_task(struct task_struct 
 	if (on_rq)
 		enqueue_task(rq, tsk, 0);
 
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, tsk, &flags);
 }
 #endif /* CONFIG_CGROUP_SCHED */
 




* [PATCH 14/21] sched: Drop rq->lock from first part of wake_up_new_task()
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (12 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 13/21] sched: Add p->pi_lock to task_rq_lock() Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:37   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 15/21] sched: Drop rq->lock from sched_exec() Peter Zijlstra
                   ` (9 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-wunt.patch --]
[-- Type: text/plain, Size: 1617 bytes --]

Since p->pi_lock now protects everything needed to call
select_task_rq(), avoid the double remote rq->lock acquisition and rely
on p->pi_lock instead.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 kernel/sched.c |   17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2729,28 +2729,18 @@ void wake_up_new_task(struct task_struct
 {
 	unsigned long flags;
 	struct rq *rq;
-	int cpu __maybe_unused = get_cpu();
 
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
 #ifdef CONFIG_SMP
-	rq = task_rq_lock(p, &flags);
-	p->state = TASK_WAKING;
-
 	/*
 	 * Fork balancing, do it here and not earlier because:
 	 *  - cpus_allowed can change in the fork path
 	 *  - any previously selected cpu might disappear through hotplug
-	 *
-	 * We set TASK_WAKING so that select_task_rq() can drop rq->lock
-	 * without people poking at ->cpus_allowed.
 	 */
-	cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
-	set_task_cpu(p, cpu);
-
-	p->state = TASK_RUNNING;
-	task_rq_unlock(rq, p, &flags);
+	set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0));
 #endif
 
-	rq = task_rq_lock(p, &flags);
+	rq = __task_rq_lock(p);
 	activate_task(rq, p, 0);
 	p->on_rq = 1;
 	trace_sched_wakeup_new(p, true);
@@ -2760,7 +2750,6 @@ void wake_up_new_task(struct task_struct
 		p->sched_class->task_woken(rq, p);
 #endif
 	task_rq_unlock(rq, p, &flags);
-	put_cpu();
 }
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS




* [PATCH 15/21] sched: Drop rq->lock from sched_exec()
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (13 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 14/21] sched: Drop rq->lock from first part of wake_up_new_task() Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:37   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 16/21] sched: Remove rq->lock from the first half of ttwu() Peter Zijlstra
                   ` (8 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-exec.patch --]
[-- Type: text/plain, Size: 1577 bytes --]

Since we can now call select_task_rq() and set_task_cpu() with only
p->pi_lock held, and sched_exec() load-balancing has always been
optimistic, drop all rq->lock usage.

Oleg also noted that need_migrate_task() will always be true for
current, so don't bother calling that at all.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 kernel/sched.c |   15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -3454,27 +3454,22 @@ void sched_exec(void)
 {
 	struct task_struct *p = current;
 	unsigned long flags;
-	struct rq *rq;
 	int dest_cpu;
 
-	rq = task_rq_lock(p, &flags);
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0);
 	if (dest_cpu == smp_processor_id())
 		goto unlock;
 
-	/*
-	 * select_task_rq() can race against ->cpus_allowed
-	 */
-	if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed) &&
-	    likely(cpu_active(dest_cpu)) && need_migrate_task(p)) {
+	if (likely(cpu_active(dest_cpu))) {
 		struct migration_arg arg = { p, dest_cpu };
 
-		task_rq_unlock(rq, p, &flags);
-		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
 		return;
 	}
 unlock:
-	task_rq_unlock(rq, p, &flags);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 }
 
 #endif




* [PATCH 16/21] sched: Remove rq->lock from the first half of ttwu()
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (14 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 15/21] sched: Drop rq->lock from sched_exec() Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:38   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 17/21] sched: Remove rq argument from ttwu_stat() Peter Zijlstra
                   ` (7 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-ttwu-optimize.patch --]
[-- Type: text/plain, Size: 3865 bytes --]

Currently ttwu() does two rq->lock acquisitions, once on the task's
old rq, holding it over the p->state fiddling and load-balance pass.
Then it drops the old rq->lock to acquire the new rq->lock.

By having serialized ttwu(), p->sched_class, p->cpus_allowed with
p->pi_lock, we can now drop the whole first rq->lock acquisition.

The p->pi_lock serializing concurrent ttwu() calls protects p->state,
which we will set to TASK_WAKING to bridge possible p->pi_lock to
rq->lock gaps and serialize set_task_cpu() calls against
task_rq_lock().

The p->pi_lock serialization of p->sched_class allows us to call
scheduling class methods without holding the rq->lock, and the
serialization of p->cpus_allowed allows us to do the load-balancing
bits without races.
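The ->on_cpu handoff that replaces the old task_running() check depends
on release/acquire ordering. A hedged C11 sketch of that pairing, with
invented names and none of the kernel's surrounding machinery: the
switching-out path publishes the task's state before clearing on_cpu,
and the waker spins until on_cpu is clear before reading state, matching
the smp_wmb() in finish_lock_switch() and the smp_rmb() in the patch.

```c
#include <assert.h>
#include <stdatomic.h>

struct mini_task {
	atomic_int on_cpu;	/* 1 while still running on its old CPU */
	int state;
};

/* Switching-out side: make state visible before on_cpu reads as clear. */
static void mini_finish_switch(struct mini_task *p, int new_state)
{
	p->state = new_state;
	atomic_store_explicit(&p->on_cpu, 0, memory_order_release);
}

/* Waker side: spin until the task is off-CPU, then read its state. */
static int mini_wait_off_cpu(struct mini_task *p)
{
	while (atomic_load_explicit(&p->on_cpu, memory_order_acquire))
		;	/* cpu_relax() in the kernel */
	return p->state;	/* ordered after observing on_cpu == 0 */
}
```

The acquire load pairs with the release store, so once the waker sees
on_cpu == 0 it is guaranteed to see the state written before the switch
completed; this is why ttwu() no longer needs the old rq->lock here.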

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c |   65 ++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 37 insertions(+), 28 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2485,70 +2485,79 @@ ttwu_post_activation(struct task_struct 
  * Returns %true if @p was woken up, %false if it was already running
  * or @state didn't match @p's state.
  */
-static int try_to_wake_up(struct task_struct *p, unsigned int state,
-			  int wake_flags)
+static int
+try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 {
-	int cpu, orig_cpu, this_cpu, success = 0;
+	int cpu, this_cpu, success = 0;
 	unsigned long flags;
-	unsigned long en_flags = ENQUEUE_WAKEUP;
 	struct rq *rq;
 
 	this_cpu = get_cpu();
 
 	smp_wmb();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	rq = __task_rq_lock(p);
 	if (!(p->state & state))
 		goto out;
 
 	cpu = task_cpu(p);
 
-	if (p->on_rq)
-		goto out_running;
+	if (p->on_rq) {
+		rq = __task_rq_lock(p);
+		if (p->on_rq)
+			goto out_running;
+		__task_rq_unlock(rq);
+	}
 
-	orig_cpu = cpu;
 #ifdef CONFIG_SMP
-	if (unlikely(task_running(rq, p)))
-		goto out_activate;
+	while (p->on_cpu) {
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+		/*
+		 * If called from interrupt context we could have landed in the
+		 * middle of schedule(), in this case we should take care not
+		 * to spin on ->on_cpu if p is current, since that would
+		 * deadlock.
+		 */
+		if (p == current)
+			goto out_activate;
+#endif
+		cpu_relax();
+	}
+	/*
+	 * Pairs with the smp_wmb() in finish_lock_switch().
+	 */
+	smp_rmb();
 
 	p->sched_contributes_to_load = !!task_contributes_to_load(p);
 	p->state = TASK_WAKING;
 
-	if (p->sched_class->task_waking) {
+	if (p->sched_class->task_waking)
 		p->sched_class->task_waking(p);
-		en_flags |= ENQUEUE_WAKING;
-	}
 
 	cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
-	if (cpu != orig_cpu)
-		set_task_cpu(p, cpu);
-	__task_rq_unlock(rq);
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+out_activate:
+#endif
+#endif /* CONFIG_SMP */
 
 	rq = cpu_rq(cpu);
 	raw_spin_lock(&rq->lock);
 
-	/*
-	 * We migrated the task without holding either rq->lock, however
-	 * since the task is not on the task list itself, nobody else
-	 * will try and migrate the task, hence the rq should match the
-	 * cpu we just moved it to.
-	 */
-	WARN_ON(task_cpu(p) != cpu);
-	WARN_ON(p->state != TASK_WAKING);
+#ifdef CONFIG_SMP
+	if (cpu != task_cpu(p))
+		set_task_cpu(p, cpu);
 
 	if (p->sched_contributes_to_load)
 		rq->nr_uninterruptible--;
+#endif
 
-out_activate:
-#endif /* CONFIG_SMP */
-	activate_task(rq, p, en_flags);
+	activate_task(rq, p, ENQUEUE_WAKEUP | ENQUEUE_WAKING);
 	p->on_rq = 1;
 out_running:
 	ttwu_post_activation(p, rq, wake_flags);
 	ttwu_stat(rq, p, cpu, wake_flags);
 	success = 1;
-out:
 	__task_rq_unlock(rq);
+out:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 	put_cpu();
 



^ permalink raw reply	[flat|nested] 152+ messages in thread

* [PATCH 17/21] sched: Remove rq argument from ttwu_stat()
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (15 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 16/21] sched: Remove rq->lock from the first half of ttwu() Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:38   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 18/21] sched: Rename ttwu_post_activation Peter Zijlstra
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-ttwu_stat-rq.patch --]
[-- Type: text/plain, Size: 1547 bytes --]

In order to call ttwu_stat() without holding rq->lock we must remove
its rq argument. Since we still need to update rq stats, account to the
local rq instead of the task's rq; this is safe since we have IRQs
disabled.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 kernel/sched.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2413,10 +2413,11 @@ static void update_avg(u64 *avg, u64 sam
 #endif
 
 static void
-ttwu_stat(struct rq *rq, struct task_struct *p, int cpu, int wake_flags)
+ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 {
 #ifdef CONFIG_SCHEDSTATS
 	int this_cpu = smp_processor_id();
+	struct rq *rq = this_rq();
 
 	schedstat_inc(rq, ttwu_count);
 	schedstat_inc(p, se.statistics.nr_wakeups);
@@ -2555,9 +2556,10 @@ try_to_wake_up(struct task_struct *p, un
 	p->on_rq = 1;
 out_running:
 	ttwu_post_activation(p, rq, wake_flags);
-	ttwu_stat(rq, p, cpu, wake_flags);
 	success = 1;
 	__task_rq_unlock(rq);
+
+	ttwu_stat(p, cpu, wake_flags);
 out:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 	put_cpu();
@@ -2591,7 +2593,7 @@ static void try_to_wake_up_local(struct 
 		activate_task(rq, p, ENQUEUE_WAKEUP);
 
 	ttwu_post_activation(p, rq, 0);
-	ttwu_stat(rq, p, smp_processor_id(), 0);
+	ttwu_stat(p, smp_processor_id(), 0);
 out:
 	raw_spin_unlock(&p->pi_lock);
 }



^ permalink raw reply	[flat|nested] 152+ messages in thread

* [PATCH 18/21] sched: Rename ttwu_post_activation
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (16 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 17/21] sched: Remove rq argument from ttwu_stat() Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:39   ` [tip:sched/locking] sched: Rename ttwu_post_activation() to ttwu_do_wakeup() tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 19/21] sched: Restructure ttwu some more Peter Zijlstra
                   ` (5 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-ttwu_do_wakeup.patch --]
[-- Type: text/plain, Size: 1434 bytes --]

ttwu_post_activation() does the core wakeup: it sets TASK_RUNNING
and performs wakeup-preemption, so give it a more descriptive name.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 kernel/sched.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2447,8 +2447,11 @@ ttwu_stat(struct task_struct *p, int cpu
 #endif /* CONFIG_SCHEDSTATS */
 }
 
+/*
+ * Mark the task runnable and perform wakeup-preemption.
+ */
 static void
-ttwu_post_activation(struct task_struct *p, struct rq *rq, int wake_flags)
+ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
 {
 	trace_sched_wakeup(p, true);
 	check_preempt_curr(rq, p, wake_flags);
@@ -2555,7 +2558,7 @@ try_to_wake_up(struct task_struct *p, un
 	activate_task(rq, p, ENQUEUE_WAKEUP | ENQUEUE_WAKING);
 	p->on_rq = 1;
 out_running:
-	ttwu_post_activation(p, rq, wake_flags);
+	ttwu_do_wakeup(rq, p, wake_flags);
 	success = 1;
 	__task_rq_unlock(rq);
 
@@ -2592,7 +2595,7 @@ static void try_to_wake_up_local(struct 
 	if (!p->on_rq)
 		activate_task(rq, p, ENQUEUE_WAKEUP);
 
-	ttwu_post_activation(p, rq, 0);
+	ttwu_do_wakeup(rq, p, 0);
 	ttwu_stat(p, smp_processor_id(), 0);
 out:
 	raw_spin_unlock(&p->pi_lock);



^ permalink raw reply	[flat|nested] 152+ messages in thread

* [PATCH 19/21] sched: Restructure ttwu some more
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (17 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 18/21] sched: Rename ttwu_post_activation Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:39   ` [tip:sched/locking] sched: Restructure ttwu() " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 20/21] sched: Move the second half of ttwu() to the remote cpu Peter Zijlstra
                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-ttwu-foo.patch --]
[-- Type: text/plain, Size: 3612 bytes --]

Apply the last few changes to ttwu() that allow for adding remote queues.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c |   93 ++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 59 insertions(+), 34 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2474,6 +2474,49 @@ ttwu_do_wakeup(struct rq *rq, struct tas
 		wq_worker_waking_up(p, cpu_of(rq));
 }
 
+static void
+ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags)
+{
+#ifdef CONFIG_SMP
+	if (p->sched_contributes_to_load)
+		rq->nr_uninterruptible--;
+#endif
+
+	activate_task(rq, p, ENQUEUE_WAKEUP | ENQUEUE_WAKING);
+	p->on_rq = 1;
+	ttwu_do_wakeup(rq, p, wake_flags);
+}
+
+/*
+ * Called in case the task @p isn't fully descheduled from its runqueue;
+ * in this case we must do a remote wakeup. It's a 'light' wakeup though,
+ * since all we need to do is flip p->state to TASK_RUNNING, as the
+ * task is still ->on_rq.
+ */
+static int ttwu_remote(struct task_struct *p, int wake_flags)
+{
+	struct rq *rq;
+	int ret = 0;
+
+	rq = __task_rq_lock(p);
+	if (p->on_rq) {
+		ttwu_do_wakeup(rq, p, wake_flags);
+		ret = 1;
+	}
+	__task_rq_unlock(rq);
+
+	return ret;
+}
+
+static void ttwu_queue(struct task_struct *p, int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	raw_spin_lock(&rq->lock);
+	ttwu_do_activate(rq, p, 0);
+	raw_spin_unlock(&rq->lock);
+}
+
 /**
  * try_to_wake_up - wake up a thread
  * @p: the thread to be awakened
@@ -2492,27 +2535,25 @@ ttwu_do_wakeup(struct rq *rq, struct tas
 static int
 try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 {
-	int cpu, this_cpu, success = 0;
 	unsigned long flags;
-	struct rq *rq;
-
-	this_cpu = get_cpu();
+	int cpu, success = 0;
 
 	smp_wmb();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	if (!(p->state & state))
 		goto out;
 
+	success = 1; /* we're going to change ->state */
 	cpu = task_cpu(p);
 
-	if (p->on_rq) {
-		rq = __task_rq_lock(p);
-		if (p->on_rq)
-			goto out_running;
-		__task_rq_unlock(rq);
-	}
+	if (p->on_rq && ttwu_remote(p, wake_flags))
+		goto stat;
 
 #ifdef CONFIG_SMP
+	/*
+	 * If the owning (remote) cpu is still in the middle of schedule() with
+	 * this task as prev, wait until it's done referencing the task.
+	 */
 	while (p->on_cpu) {
 #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
 		/*
@@ -2521,8 +2562,10 @@ try_to_wake_up(struct task_struct *p, un
 		 * to spin on ->on_cpu if p is current, since that would
 		 * deadlock.
 		 */
-		if (p == current)
-			goto out_activate;
+		if (p == current) {
+			ttwu_queue(p, cpu);
+			goto stat;
+		}
 #endif
 		cpu_relax();
 	}
@@ -2538,33 +2581,15 @@ try_to_wake_up(struct task_struct *p, un
 		p->sched_class->task_waking(p);
 
 	cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
-#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
-out_activate:
-#endif
-#endif /* CONFIG_SMP */
-
-	rq = cpu_rq(cpu);
-	raw_spin_lock(&rq->lock);
-
-#ifdef CONFIG_SMP
-	if (cpu != task_cpu(p))
+	if (task_cpu(p) != cpu)
 		set_task_cpu(p, cpu);
+#endif /* CONFIG_SMP */
 
-	if (p->sched_contributes_to_load)
-		rq->nr_uninterruptible--;
-#endif
-
-	activate_task(rq, p, ENQUEUE_WAKEUP | ENQUEUE_WAKING);
-	p->on_rq = 1;
-out_running:
-	ttwu_do_wakeup(rq, p, wake_flags);
-	success = 1;
-	__task_rq_unlock(rq);
-
+	ttwu_queue(p, cpu);
+stat:
 	ttwu_stat(p, cpu, wake_flags);
 out:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-	put_cpu();
 
 	return success;
 }



^ permalink raw reply	[flat|nested] 152+ messages in thread

* [PATCH 20/21] sched: Move the second half of ttwu() to the remote cpu
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (18 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 19/21] sched: Restructure ttwu some more Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:39   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:23 ` [PATCH 21/21] sched: Remove need_migrate_task() Peter Zijlstra
                   ` (3 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-ttwu-queue-remote.patch --]
[-- Type: text/plain, Size: 4484 bytes --]

Now that we've removed the rq->lock requirement from the first part of
ttwu() and can compute placement without holding any rq->lock, ensure
we execute the second half of ttwu() on the actual cpu we want the
task to run on.

This avoids having to take the remote rq->lock and do the task enqueue
remotely, saving lots on cacheline transfers.

As measured using: http://oss.oracle.com/~mason/sembench.c

$ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
$ echo 4096 32000 64 128 > /proc/sys/kernel/sem
$ ./sembench -t 2048 -w 1900 -o 0

unpatched: run time 30 seconds 647278 worker burns per second
patched:   run time 30 seconds 816715 worker burns per second

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/sched.h   |    3 +-
 init/Kconfig            |    5 ++++
 kernel/sched.c          |   56 ++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched_features.h |    6 +++++
 4 files changed, 69 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1203,6 +1203,7 @@ struct task_struct {
 	int lock_depth;		/* BKL lock depth */
 
 #ifdef CONFIG_SMP
+	struct task_struct *wake_entry;
 	int on_cpu;
 #endif
 	int on_rq;
@@ -2192,7 +2193,7 @@ extern void set_task_comm(struct task_st
 extern char *get_task_comm(char *to, struct task_struct *tsk);
 
 #ifdef CONFIG_SMP
-static inline void scheduler_ipi(void) { }
+void scheduler_ipi(void);
 extern unsigned long wait_task_inactive(struct task_struct *, long match_state);
 #else
 static inline unsigned long wait_task_inactive(struct task_struct *p,
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -556,6 +556,10 @@ struct rq {
 	unsigned int ttwu_count;
 	unsigned int ttwu_local;
 #endif
+
+#ifdef CONFIG_SMP
+	struct task_struct *wake_list;
+#endif
 };
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
@@ -2508,10 +2512,61 @@ static int ttwu_remote(struct task_struc
 	return ret;
 }
 
+#ifdef CONFIG_SMP
+static void sched_ttwu_pending(void)
+{
+	struct rq *rq = this_rq();
+	struct task_struct *list = xchg(&rq->wake_list, NULL);
+
+	if (!list)
+		return;
+
+	raw_spin_lock(&rq->lock);
+
+	while (list) {
+		struct task_struct *p = list;
+		list = list->wake_entry;
+		ttwu_do_activate(rq, p, 0);
+	}
+
+	raw_spin_unlock(&rq->lock);
+}
+
+void scheduler_ipi(void)
+{
+	sched_ttwu_pending();
+}
+
+static void ttwu_queue_remote(struct task_struct *p, int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	struct task_struct *next = rq->wake_list;
+
+	for (;;) {
+		struct task_struct *old = next;
+
+		p->wake_entry = next;
+		next = cmpxchg(&rq->wake_list, old, p);
+		if (next == old)
+			break;
+	}
+
+	if (!next)
+		smp_send_reschedule(cpu);
+}
+#endif
+
 static void ttwu_queue(struct task_struct *p, int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 
+#if defined(CONFIG_SMP) && defined(CONFIG_SCHED_TTWU_QUEUE)
+	if (sched_feat(TTWU_QUEUE) && cpu != smp_processor_id()) {
+		ttwu_queue_remote(p, cpu);
+		return;
+	}
+#endif
+
 	raw_spin_lock(&rq->lock);
 	ttwu_do_activate(rq, p, 0);
 	raw_spin_unlock(&rq->lock);
@@ -6321,6 +6376,7 @@ migration_call(struct notifier_block *nf
 
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_DYING:
+		sched_ttwu_pending();
 		/* Update our root-domain */
 		raw_spin_lock_irqsave(&rq->lock, flags);
 		if (rq->rd) {
Index: linux-2.6/kernel/sched_features.h
===================================================================
--- linux-2.6.orig/kernel/sched_features.h
+++ linux-2.6/kernel/sched_features.h
@@ -64,3 +64,9 @@ SCHED_FEAT(OWNER_SPIN, 1)
  * Decrement CPU power based on irq activity
  */
 SCHED_FEAT(NONIRQ_POWER, 1)
+
+/*
+ * Queue remote wakeups on the target CPU and process them
+ * using the scheduler IPI. Reduces rq->lock contention/bounces.
+ */
+SCHED_FEAT(TTWU_QUEUE, 1)
Index: linux-2.6/init/Kconfig
===================================================================
--- linux-2.6.orig/init/Kconfig
+++ linux-2.6/init/Kconfig
@@ -827,6 +827,11 @@ config SCHED_AUTOGROUP
 	  desktop applications.  Task group autogeneration is currently based
 	  upon task session.
 
+config SCHED_TTWU_QUEUE
+	bool
+	depends on !SPARC32
+	default y
+
 config MM_OWNER
 	bool
 



^ permalink raw reply	[flat|nested] 152+ messages in thread

* [PATCH 21/21] sched: Remove need_migrate_task()
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (19 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 20/21] sched: Move the second half of ttwu() to the remote cpu Peter Zijlstra
@ 2011-04-05 15:23 ` Peter Zijlstra
  2011-04-14  8:40   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  2011-04-05 15:59 ` [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (2 subsequent siblings)
  23 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:23 UTC (permalink / raw)
  To: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-opt-need_migrate_task.patch --]
[-- Type: text/plain, Size: 1589 bytes --]

Oleg noticed that need_migrate_task() doesn't need the ->on_cpu check
now that ttwu() doesn't do remote enqueues for !->on_rq && ->on_cpu,
so remove the helper and replace the single instance with a direct
->on_rq test.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
---
 kernel/sched.c |   17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2139,21 +2139,6 @@ struct migration_arg {
 static int migration_cpu_stop(void *data);
 
 /*
- * The task's runqueue lock must be held.
- * Returns true if you have to wait for migration thread.
- */
-static bool need_migrate_task(struct task_struct *p)
-{
-	/*
-	 * If the task is not on a runqueue (and not running), then
-	 * the next wake-up will properly place the task.
-	 */
-	bool running = p->on_rq || p->on_cpu;
-	smp_rmb(); /* finish_lock_switch() */
-	return running;
-}
-
-/*
  * wait_task_inactive - wait for a thread to unschedule.
  *
  * If @match_state is nonzero, it's the @p->state value just checked and
@@ -5734,7 +5719,7 @@ int set_cpus_allowed_ptr(struct task_str
 		goto out;
 
 	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
-	if (need_migrate_task(p)) {
+	if (p->on_rq) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
 		task_rq_unlock(rq, p, &flags);



^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [PATCH 00/21] sched: Reduce runqueue lock contention -v6
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (20 preceding siblings ...)
  2011-04-05 15:23 ` [PATCH 21/21] sched: Remove need_migrate_task() Peter Zijlstra
@ 2011-04-05 15:59 ` Peter Zijlstra
  2011-04-06 11:00 ` Peter Zijlstra
  2011-04-27 16:54 ` Dave Kleikamp
  23 siblings, 0 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-05 15:59 UTC (permalink / raw)
  To: Chris Mason
  Cc: Frank Rowand, Ingo Molnar, Thomas Gleixner, Mike Galbraith,
	Oleg Nesterov, Paul Turner, Jens Axboe, Yong Zhang, linux-kernel

On Tue, 2011-04-05 at 17:23 +0200, Peter Zijlstra wrote:
> 
> unpatched: run time 30 seconds 647278 worker burns per second
> patched:   run time 30 seconds 816715 worker burns per second 

Obviously bigger is better :-); the above means 26% more wakeups
processed in the 30 seconds.

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [PATCH 00/21] sched: Reduce runqueue lock contention -v6
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (21 preceding siblings ...)
  2011-04-05 15:59 ` [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
@ 2011-04-06 11:00 ` Peter Zijlstra
  2011-04-27 16:54 ` Dave Kleikamp
  23 siblings, 0 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-06 11:00 UTC (permalink / raw)
  To: Chris Mason
  Cc: Frank Rowand, Ingo Molnar, Thomas Gleixner, Mike Galbraith,
	Oleg Nesterov, Paul Turner, Jens Axboe, Yong Zhang, linux-kernel

On Tue, 2011-04-05 at 17:23 +0200, Peter Zijlstra wrote:
> This patch series aims to optimize remote wakeups by moving most of the
> work of the wakeup to the remote cpu and avoid bouncing runqueue data
> structures where possible.
> 
> As measured by sembench (which basically creates a wakeup storm) on my
> dual-socket westmere:
> 
> $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
> $ echo 4096 32000 64 128 > /proc/sys/kernel/sem
> $ ./sembench -t 2048 -w 1900 -o 0
> 
> unpatched: run time 30 seconds 647278 worker burns per second
> patched:   run time 30 seconds 816715 worker burns per second
> 
> I've queued this series for .40.

Full diffstat per request

---
 arch/alpha/kernel/smp.c             |    3 +-
 arch/arm/kernel/smp.c               |    5 +-
 arch/blackfin/mach-common/smp.c     |    3 +
 arch/cris/arch-v32/kernel/smp.c     |   13 +-
 arch/ia64/kernel/irq_ia64.c         |    2 +
 arch/ia64/xen/irq_xen.c             |   10 +-
 arch/m32r/kernel/smp.c              |    4 +-
 arch/mips/cavium-octeon/smp.c       |    2 +
 arch/mips/kernel/smtc.c             |    2 +-
 arch/mips/mti-malta/malta-int.c     |    2 +
 arch/mips/pmc-sierra/yosemite/smp.c |    4 +
 arch/mips/sgi-ip27/ip27-irq.c       |    2 +
 arch/mips/sibyte/bcm1480/smp.c      |    7 +-
 arch/mips/sibyte/sb1250/smp.c       |    7 +-
 arch/mn10300/kernel/smp.c           |    5 +-
 arch/parisc/kernel/smp.c            |    5 +-
 arch/powerpc/kernel/smp.c           |    4 +-
 arch/s390/kernel/smp.c              |    6 +-
 arch/sh/kernel/smp.c                |    2 +
 arch/sparc/kernel/smp_32.c          |    4 +-
 arch/sparc/kernel/smp_64.c          |    1 +
 arch/tile/kernel/smp.c              |    6 +-
 arch/um/kernel/smp.c                |    2 +-
 arch/x86/kernel/smp.c               |    5 +-
 arch/x86/xen/smp.c                  |    5 +-
 include/linux/mutex.h               |    2 +-
 include/linux/sched.h               |   23 +-
 init/Kconfig                        |    5 +
 kernel/mutex-debug.c                |    2 +-
 kernel/mutex-debug.h                |    2 +-
 kernel/mutex.c                      |    2 +-
 kernel/mutex.h                      |    2 +-
 kernel/sched.c                      |  622 +++++++++++++++++++----------------
 kernel/sched_debug.c                |    2 +-
 kernel/sched_fair.c                 |   23 ++-
 kernel/sched_features.h             |    6 +
 kernel/sched_idletask.c             |    2 +-
 kernel/sched_rt.c                   |   54 ++--
 kernel/sched_stoptask.c             |    5 +-
 39 files changed, 483 insertions(+), 380 deletions(-)


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [PATCH 04/21] sched: Change the ttwu success details
  2011-04-05 15:23 ` [PATCH 04/21] sched: Change the ttwu success details Peter Zijlstra
@ 2011-04-13  9:23   ` Peter Zijlstra
  2011-04-13 10:48     ` Peter Zijlstra
  2011-04-14  8:32   ` [tip:sched/locking] sched: Change the ttwu() " tip-bot for Peter Zijlstra
  1 sibling, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-13  9:23 UTC (permalink / raw)
  To: Chris Mason
  Cc: Frank Rowand, Ingo Molnar, Thomas Gleixner, Mike Galbraith,
	Oleg Nesterov, Paul Turner, Jens Axboe, Yong Zhang, linux-kernel

On Tue, 2011-04-05 at 17:23 +0200, Peter Zijlstra wrote:
> plain text document attachment (sched-change-ttwu-return.patch)
> try_to_wake_up() would only return a success when it would have to
> place a task on a rq, change that to every time we change p->state to
> TASK_RUNNING, because that's the real measure of wakeups.

So Ingo is reporting lockups with this patch on UP, and I can't seem to
reproduce them with his .config, nor does it seem to make any sense.

The biggest change here is the movement of success = 1 in
try_to_wake_up(). The change to ttwu_post_activation() only affects a
tracepoint (if that changes behaviour something is seriously screwy) and
workqueue wakeups, and I doubt extra wakeups will cause lockups.

Therefore, the changes to try_to_wake_up_local() and wake_up_new_task()
are also not interesting. Leaving us with the one change in
try_to_wake_up().

> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
> ---
>  kernel/sched.c |   18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
> 
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c

> @@ -2505,9 +2505,9 @@ static int try_to_wake_up(struct task_st
>  #endif /* CONFIG_SMP */
>  	ttwu_activate(p, rq, wake_flags & WF_SYNC, orig_cpu != cpu,
>  		      cpu == this_cpu, en_flags);
> -	success = 1;
>  out_running:
> -	ttwu_post_activation(p, rq, wake_flags, success);
> +	ttwu_post_activation(p, rq, wake_flags);
> +	success = 1;
>  out:
>  	task_rq_unlock(rq, &flags);
>  	put_cpu();

There we move success=1 so that out_running also returns success.

out_running is the case where the task has been marked
TASK_(UN)INTERRUPTIBLE but the schedule() function hasn't managed to
call deactivate_task() yet.

[ On UP that means ttwu took rq->lock before schedule() takes rq->lock ]

In that case we used to simply flip ->state back to TASK_RUNNING and not
return having woken up a task.

Now I think that is silly, because we most certainly did a wake-up, and
if we'd have a slightly different interleave with ttwu()/schedule() we
would have had to do a full wakeup.

So not treating that as a wakeup makes the ttwu() semantics dependent on
timing, something which IMO doesn't make any kind of sense.

Also, the only effect of making that return a success is that
things like __wake_up_common() would see an extra wakeup and thus wake
up one less task. But again, it actually was a real wakeup; the task
would have gone to sleep otherwise.

So I'm mighty puzzled as to how this causes grief.. and vexed for not
being able to reproduce.

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [PATCH 04/21] sched: Change the ttwu success details
  2011-04-13  9:23   ` Peter Zijlstra
@ 2011-04-13 10:48     ` Peter Zijlstra
  2011-04-13 11:06       ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-13 10:48 UTC (permalink / raw)
  To: Chris Mason
  Cc: Frank Rowand, Ingo Molnar, Thomas Gleixner, Mike Galbraith,
	Oleg Nesterov, Paul Turner, Jens Axboe, Yong Zhang, linux-kernel,
	Tejun Heo

On Wed, 2011-04-13 at 11:23 +0200, Peter Zijlstra wrote:
> and workqueue wakeups, and I doubt extra wakeups will cause lockups. 

Damn assumptions ;-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2447,7 +2447,7 @@ static inline void ttwu_post_activation(
 	}
 #endif
 	/* if a worker is waking up, notify workqueue */
-	if ((p->flags & PF_WQ_WORKER) && success)
+	if (p->flags & PF_WQ_WORKER)
 		wq_worker_waking_up(p, cpu_of(rq));
 }
 

Appears to be sufficient to cause the lockup, so somehow the whole
workqueue stuff relies on the fact that waking a TASK_(UN)INTERRUPTIBLE
task that hasn't been dequeued yet isn't a wakeup.

Tejun any quick clues as to why and how to cure this?

/me goes read that stuff


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [PATCH 04/21] sched: Change the ttwu success details
  2011-04-13 10:48     ` Peter Zijlstra
@ 2011-04-13 11:06       ` Peter Zijlstra
  2011-04-13 18:39         ` Tejun Heo
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-13 11:06 UTC (permalink / raw)
  To: Chris Mason
  Cc: Frank Rowand, Ingo Molnar, Thomas Gleixner, Mike Galbraith,
	Oleg Nesterov, Paul Turner, Jens Axboe, Yong Zhang, linux-kernel,
	Tejun Heo

On Wed, 2011-04-13 at 12:48 +0200, Peter Zijlstra wrote:
> On Wed, 2011-04-13 at 11:23 +0200, Peter Zijlstra wrote:
> > and workqueue wakeups, and I doubt extra wakeups will cause lockups. 
> 
> Damn assumptions ;-)
> 
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -2447,7 +2447,7 @@ static inline void ttwu_post_activation(
>  	}
>  #endif
>  	/* if a worker is waking up, notify workqueue */
> -	if ((p->flags & PF_WQ_WORKER) && success)
> +	if (p->flags & PF_WQ_WORKER)
>  		wq_worker_waking_up(p, cpu_of(rq));
>  }
>  
> 
> Appears to be sufficient to cause the lockup, so somehow the whole
> workqueue stuff relies on the fact that waking a TASK_(UN)INTERRUPTIBLE
> task that hasn't been dequeued yet isn't a wakeup.
> 
> Tejun any quick clues as to why and how to cure this?
> 
> /me goes read that stuff

OK, so wq_worker_waking_up() does an atomic_inc() that wants to be
balanced against the atomic_dec() in wq_worker_sleeping(), which is only
called when we dequeue things.



^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [PATCH 04/21] sched: Change the ttwu success details
  2011-04-13 11:06       ` Peter Zijlstra
@ 2011-04-13 18:39         ` Tejun Heo
  2011-04-13 19:11           ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Tejun Heo @ 2011-04-13 18:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang, linux-kernel

Hello, Peter.

On Wed, Apr 13, 2011 at 01:06:39PM +0200, Peter Zijlstra wrote:
> On Wed, 2011-04-13 at 12:48 +0200, Peter Zijlstra wrote:
> > Appears to be sufficient to cause the lockup, so somehow the whole
> > workqueue stuff relies on the fact that waking a TASK_(UN)INTERRUPTIBLE
> > task that hasn't been dequeued yet isn't a wakeup.
> > 
> > Tejun any quick clues as to why and how to cure this?
> > 
> > /me goes read that stuff
> 
> OK, so wq_worker_waking_up() does an atomic_inc() that wants to be
> balanced against the atomic_dec() in wq_worker_sleeping(), which is only
> called when we dequeue things.

Yeap, the root cause of the problem is that the change makes
wq_worker_sleeping() and wq_worker_waking_up() asymmetric; thus
the nr_running counter goes out of sync, which hides active worker
depletion from the workqueue code, leading to a stall.

One way to deal with it would be adding an extra worker flag to track
sleep state from workqueue side so that it can filter out spurious
wakeups; however, I think it would be far better to resolve this from
scheduler side.  If the callback name is misleading, rename it to
wq_worker_sched_activated() or something and call it only when the
task gets activated.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [PATCH 04/21] sched: Change the ttwu success details
  2011-04-13 18:39         ` Tejun Heo
@ 2011-04-13 19:11           ` Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-13 19:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang, linux-kernel

On Thu, 2011-04-14 at 03:39 +0900, Tejun Heo wrote:
> One way to deal with it would be adding an extra worker flag to track
> sleep state from workqueue side so that it can filter out spurious
> wakeups; however, I think it would be far better to resolve this from
> scheduler side.  If the callback name is misleading, rename it to
> wq_worker_sched_activated() or something and call it only when the
> task gets activated. 

Right, what I did was move the thing near activate_task() instead of
relying on passing that information along.


* Re: [PATCH 01/21] sched: Provide scheduler_ipi() callback in response to smp_send_reschedule()
  2011-04-05 15:23 ` [PATCH 01/21] sched: Provide scheduler_ipi() callback in response to smp_send_reschedule() Peter Zijlstra
@ 2011-04-13 21:15   ` Tony Luck
  2011-04-13 21:38     ` Peter Zijlstra
  2011-04-14  8:31   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
  1 sibling, 1 reply; 152+ messages in thread
From: Tony Luck @ 2011-04-13 21:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang, linux-kernel, Russell King, Martin Schwidefsky,
	Chris Metcalf, Jesper Nilsson, Benjamin Herrenschmidt,
	Ralf Baechle

On Tue, Apr 5, 2011 at 8:23 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> --- linux-2.6.orig/arch/ia64/kernel/irq_ia64.c
> +++ linux-2.6/arch/ia64/kernel/irq_ia64.c
> @@ -31,6 +31,7 @@
>  #include <linux/irq.h>
>  #include <linux/ratelimit.h>
>  #include <linux/acpi.h>
> +#include <linux/sched.h>
>
>  #include <asm/delay.h>
>  #include <asm/intrinsics.h>
> @@ -496,6 +497,7 @@ ia64_handle_irq (ia64_vector vector, str
>                        smp_local_flush_tlb();
>                        kstat_incr_irqs_this_cpu(irq, desc);
>                } else if (unlikely(IS_RESCHEDULE(vector))) {
> +                       scheduler_ipi();
>                        kstat_incr_irqs_this_cpu(irq, desc);
>                } else {
>                        ia64_setreg(_IA64_REG_CR_TPR, vector);

This bit breaks ia64 CONFIG_SMP=n builds in next-20110413 with:

arch/ia64/kernel/irq_ia64.c: In function ‘ia64_handle_irq’:
arch/ia64/kernel/irq_ia64.c:500: error: implicit declaration of
function ‘scheduler_ipi’

-Tony


* Re: [PATCH 01/21] sched: Provide scheduler_ipi() callback in response to smp_send_reschedule()
  2011-04-13 21:15   ` Tony Luck
@ 2011-04-13 21:38     ` Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-04-13 21:38 UTC (permalink / raw)
  To: Tony Luck
  Cc: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, Oleg Nesterov, Paul Turner, Jens Axboe,
	Yong Zhang, linux-kernel, Russell King, Martin Schwidefsky,
	Chris Metcalf, Jesper Nilsson, Benjamin Herrenschmidt,
	Ralf Baechle

On Wed, 2011-04-13 at 14:15 -0700, Tony Luck wrote:
> On Tue, Apr 5, 2011 at 8:23 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > --- linux-2.6.orig/arch/ia64/kernel/irq_ia64.c
> > +++ linux-2.6/arch/ia64/kernel/irq_ia64.c
> > @@ -31,6 +31,7 @@
> >  #include <linux/irq.h>
> >  #include <linux/ratelimit.h>
> >  #include <linux/acpi.h>
> > +#include <linux/sched.h>
> >
> >  #include <asm/delay.h>
> >  #include <asm/intrinsics.h>
> > @@ -496,6 +497,7 @@ ia64_handle_irq (ia64_vector vector, str
> >                        smp_local_flush_tlb();
> >                        kstat_incr_irqs_this_cpu(irq, desc);
> >                } else if (unlikely(IS_RESCHEDULE(vector))) {
> > +                       scheduler_ipi();
> >                        kstat_incr_irqs_this_cpu(irq, desc);
> >                } else {
> >                        ia64_setreg(_IA64_REG_CR_TPR, vector);
> 
> This bit breaks ia64 CONFIG_SMP=n builds in next-20110413 with:
> 
> arch/ia64/kernel/irq_ia64.c: In function ‘ia64_handle_irq’:
> arch/ia64/kernel/irq_ia64.c:500: error: implicit declaration of
> function ‘scheduler_ipi’

Ah, I didn't think arch code would have the reschedule interrupt on UP.
I'll provide an empty stub. Thanks!


* [tip:sched/locking] sched: Provide scheduler_ipi() callback in response to smp_send_reschedule()
  2011-04-05 15:23 ` [PATCH 01/21] sched: Provide scheduler_ipi() callback in response to smp_send_reschedule() Peter Zijlstra
  2011-04-13 21:15   ` Tony Luck
@ 2011-04-14  8:31   ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, jesper.nilsson, torvalds, a.p.zijlstra, efault,
	schwidefsky, rmk+kernel, cmetcalf, npiggin, akpm, ralf,
	frank.rowand, tglx, hpa, linux-kernel, benh, mingo

Commit-ID:  184748cc50b2dceb8287f9fb657eda48ff8fcfe7
Gitweb:     http://git.kernel.org/tip/184748cc50b2dceb8287f9fb657eda48ff8fcfe7
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:39 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:32 +0200

sched: Provide scheduler_ipi() callback in response to smp_send_reschedule()

For future rework of try_to_wake_up() we'd like to push part of that
function onto the CPU the task is actually going to run on.

In order to do so we need a generic callback from the existing scheduler IPI.

This patch introduces such a generic callback: scheduler_ipi() and
implements it as a NOP.

BenH notes: PowerPC might use this IPI on offline CPUs under rare conditions!

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152728.744338123@chello.nl
---
 arch/alpha/kernel/smp.c             |    3 +--
 arch/arm/kernel/smp.c               |    5 +----
 arch/blackfin/mach-common/smp.c     |    3 +++
 arch/cris/arch-v32/kernel/smp.c     |   13 ++++++++-----
 arch/ia64/kernel/irq_ia64.c         |    2 ++
 arch/ia64/xen/irq_xen.c             |   10 +++++++++-
 arch/m32r/kernel/smp.c              |    4 +---
 arch/mips/cavium-octeon/smp.c       |    2 ++
 arch/mips/kernel/smtc.c             |    2 +-
 arch/mips/mti-malta/malta-int.c     |    2 ++
 arch/mips/pmc-sierra/yosemite/smp.c |    4 ++++
 arch/mips/sgi-ip27/ip27-irq.c       |    2 ++
 arch/mips/sibyte/bcm1480/smp.c      |    7 +++----
 arch/mips/sibyte/sb1250/smp.c       |    7 +++----
 arch/mn10300/kernel/smp.c           |    5 +----
 arch/parisc/kernel/smp.c            |    5 +----
 arch/powerpc/kernel/smp.c           |    4 ++--
 arch/s390/kernel/smp.c              |    6 +++---
 arch/sh/kernel/smp.c                |    2 ++
 arch/sparc/kernel/smp_32.c          |    4 +++-
 arch/sparc/kernel/smp_64.c          |    1 +
 arch/tile/kernel/smp.c              |    6 +-----
 arch/um/kernel/smp.c                |    2 +-
 arch/x86/kernel/smp.c               |    5 ++---
 arch/x86/xen/smp.c                  |    5 ++---
 include/linux/sched.h               |    2 ++
 26 files changed, 63 insertions(+), 50 deletions(-)

diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index 42aa078..5a621c6 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -585,8 +585,7 @@ handle_ipi(struct pt_regs *regs)
 
 		switch (which) {
 		case IPI_RESCHEDULE:
-			/* Reschedule callback.  Everything to be done
-			   is done by the interrupt return path.  */
+			scheduler_ipi();
 			break;
 
 		case IPI_CALL_FUNC:
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 8fe05ad..7a561eb 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -560,10 +560,7 @@ asmlinkage void __exception_irq_entry do_IPI(int ipinr, struct pt_regs *regs)
 		break;
 
 	case IPI_RESCHEDULE:
-		/*
-		 * nothing more to do - eveything is
-		 * done on the interrupt return path
-		 */
+		scheduler_ipi();
 		break;
 
 	case IPI_CALL_FUNC:
diff --git a/arch/blackfin/mach-common/smp.c b/arch/blackfin/mach-common/smp.c
index 6e17a26..326bb86 100644
--- a/arch/blackfin/mach-common/smp.c
+++ b/arch/blackfin/mach-common/smp.c
@@ -164,6 +164,9 @@ static irqreturn_t ipi_handler_int1(int irq, void *dev_instance)
 	while (msg_queue->count) {
 		msg = &msg_queue->ipi_message[msg_queue->head];
 		switch (msg->type) {
+		case BFIN_IPI_RESCHEDULE:
+			scheduler_ipi();
+			break;
 		case BFIN_IPI_CALL_FUNC:
 			spin_unlock_irqrestore(&msg_queue->lock, flags);
 			ipi_call_function(cpu, msg);
diff --git a/arch/cris/arch-v32/kernel/smp.c b/arch/cris/arch-v32/kernel/smp.c
index 4c9e3e1..66cc756 100644
--- a/arch/cris/arch-v32/kernel/smp.c
+++ b/arch/cris/arch-v32/kernel/smp.c
@@ -342,15 +342,18 @@ irqreturn_t crisv32_ipi_interrupt(int irq, void *dev_id)
 
 	ipi = REG_RD(intr_vect, irq_regs[smp_processor_id()], rw_ipi);
 
+	if (ipi.vector & IPI_SCHEDULE) {
+		scheduler_ipi();
+	}
 	if (ipi.vector & IPI_CALL) {
-	         func(info);
+		func(info);
 	}
 	if (ipi.vector & IPI_FLUSH_TLB) {
-		     if (flush_mm == FLUSH_ALL)
-			 __flush_tlb_all();
-		     else if (flush_vma == FLUSH_ALL)
+		if (flush_mm == FLUSH_ALL)
+			__flush_tlb_all();
+		else if (flush_vma == FLUSH_ALL)
 			__flush_tlb_mm(flush_mm);
-		     else
+		else
 			__flush_tlb_page(flush_vma, flush_addr);
 	}
 
diff --git a/arch/ia64/kernel/irq_ia64.c b/arch/ia64/kernel/irq_ia64.c
index 5b70474..782c3a35 100644
--- a/arch/ia64/kernel/irq_ia64.c
+++ b/arch/ia64/kernel/irq_ia64.c
@@ -31,6 +31,7 @@
 #include <linux/irq.h>
 #include <linux/ratelimit.h>
 #include <linux/acpi.h>
+#include <linux/sched.h>
 
 #include <asm/delay.h>
 #include <asm/intrinsics.h>
@@ -496,6 +497,7 @@ ia64_handle_irq (ia64_vector vector, struct pt_regs *regs)
 			smp_local_flush_tlb();
 			kstat_incr_irqs_this_cpu(irq, desc);
 		} else if (unlikely(IS_RESCHEDULE(vector))) {
+			scheduler_ipi();
 			kstat_incr_irqs_this_cpu(irq, desc);
 		} else {
 			ia64_setreg(_IA64_REG_CR_TPR, vector);
diff --git a/arch/ia64/xen/irq_xen.c b/arch/ia64/xen/irq_xen.c
index 108bb85..b279e14 100644
--- a/arch/ia64/xen/irq_xen.c
+++ b/arch/ia64/xen/irq_xen.c
@@ -92,6 +92,8 @@ static unsigned short saved_irq_cnt;
 static int xen_slab_ready;
 
 #ifdef CONFIG_SMP
+#include <linux/sched.h>
+
 /* Dummy stub. Though we may check XEN_RESCHEDULE_VECTOR before __do_IRQ,
  * it ends up to issue several memory accesses upon percpu data and
  * thus adds unnecessary traffic to other paths.
@@ -99,7 +101,13 @@ static int xen_slab_ready;
 static irqreturn_t
 xen_dummy_handler(int irq, void *dev_id)
 {
+	return IRQ_HANDLED;
+}
 
+static irqreturn_t
+xen_resched_handler(int irq, void *dev_id)
+{
+	scheduler_ipi();
 	return IRQ_HANDLED;
 }
 
@@ -110,7 +118,7 @@ static struct irqaction xen_ipi_irqaction = {
 };
 
 static struct irqaction xen_resched_irqaction = {
-	.handler =	xen_dummy_handler,
+	.handler =	xen_resched_handler,
 	.flags =	IRQF_DISABLED,
 	.name =		"resched"
 };
diff --git a/arch/m32r/kernel/smp.c b/arch/m32r/kernel/smp.c
index 31cef20..fc10b39 100644
--- a/arch/m32r/kernel/smp.c
+++ b/arch/m32r/kernel/smp.c
@@ -122,8 +122,6 @@ void smp_send_reschedule(int cpu_id)
  *
  * Description:  This routine executes on CPU which received
  *               'RESCHEDULE_IPI'.
- *               Rescheduling is processed at the exit of interrupt
- *               operation.
  *
  * Born on Date: 2002.02.05
  *
@@ -138,7 +136,7 @@ void smp_send_reschedule(int cpu_id)
  *==========================================================================*/
 void smp_reschedule_interrupt(void)
 {
-	/* nothing to do */
+	scheduler_ipi();
 }
 
 /*==========================================================================*
diff --git a/arch/mips/cavium-octeon/smp.c b/arch/mips/cavium-octeon/smp.c
index ba78b21..76923ee 100644
--- a/arch/mips/cavium-octeon/smp.c
+++ b/arch/mips/cavium-octeon/smp.c
@@ -44,6 +44,8 @@ static irqreturn_t mailbox_interrupt(int irq, void *dev_id)
 
 	if (action & SMP_CALL_FUNCTION)
 		smp_call_function_interrupt();
+	if (action & SMP_RESCHEDULE_YOURSELF)
+		scheduler_ipi();
 
 	/* Check if we've been told to flush the icache */
 	if (action & SMP_ICACHE_FLUSH)
diff --git a/arch/mips/kernel/smtc.c b/arch/mips/kernel/smtc.c
index 5a88cc4..cedac46 100644
--- a/arch/mips/kernel/smtc.c
+++ b/arch/mips/kernel/smtc.c
@@ -929,7 +929,7 @@ static void post_direct_ipi(int cpu, struct smtc_ipi *pipi)
 
 static void ipi_resched_interrupt(void)
 {
-	/* Return from interrupt should be enough to cause scheduler check */
+	scheduler_ipi();
 }
 
 static void ipi_call_interrupt(void)
diff --git a/arch/mips/mti-malta/malta-int.c b/arch/mips/mti-malta/malta-int.c
index 9027061..7d93e6f 100644
--- a/arch/mips/mti-malta/malta-int.c
+++ b/arch/mips/mti-malta/malta-int.c
@@ -309,6 +309,8 @@ static void ipi_call_dispatch(void)
 
 static irqreturn_t ipi_resched_interrupt(int irq, void *dev_id)
 {
+	scheduler_ipi();
+
 	return IRQ_HANDLED;
 }
 
diff --git a/arch/mips/pmc-sierra/yosemite/smp.c b/arch/mips/pmc-sierra/yosemite/smp.c
index efc9e88..2608752 100644
--- a/arch/mips/pmc-sierra/yosemite/smp.c
+++ b/arch/mips/pmc-sierra/yosemite/smp.c
@@ -55,6 +55,8 @@ void titan_mailbox_irq(void)
 
 		if (status & 0x2)
 			smp_call_function_interrupt();
+		if (status & 0x4)
+			scheduler_ipi();
 		break;
 
 	case 1:
@@ -63,6 +65,8 @@ void titan_mailbox_irq(void)
 
 		if (status & 0x2)
 			smp_call_function_interrupt();
+		if (status & 0x4)
+			scheduler_ipi();
 		break;
 	}
 }
diff --git a/arch/mips/sgi-ip27/ip27-irq.c b/arch/mips/sgi-ip27/ip27-irq.c
index 0a04603..b18b04e 100644
--- a/arch/mips/sgi-ip27/ip27-irq.c
+++ b/arch/mips/sgi-ip27/ip27-irq.c
@@ -147,8 +147,10 @@ static void ip27_do_irq_mask0(void)
 #ifdef CONFIG_SMP
 	if (pend0 & (1UL << CPU_RESCHED_A_IRQ)) {
 		LOCAL_HUB_CLR_INTR(CPU_RESCHED_A_IRQ);
+		scheduler_ipi();
 	} else if (pend0 & (1UL << CPU_RESCHED_B_IRQ)) {
 		LOCAL_HUB_CLR_INTR(CPU_RESCHED_B_IRQ);
+		scheduler_ipi();
 	} else if (pend0 & (1UL << CPU_CALL_A_IRQ)) {
 		LOCAL_HUB_CLR_INTR(CPU_CALL_A_IRQ);
 		smp_call_function_interrupt();
diff --git a/arch/mips/sibyte/bcm1480/smp.c b/arch/mips/sibyte/bcm1480/smp.c
index 47b347c..d667875 100644
--- a/arch/mips/sibyte/bcm1480/smp.c
+++ b/arch/mips/sibyte/bcm1480/smp.c
@@ -20,6 +20,7 @@
 #include <linux/delay.h>
 #include <linux/smp.h>
 #include <linux/kernel_stat.h>
+#include <linux/sched.h>
 
 #include <asm/mmu_context.h>
 #include <asm/io.h>
@@ -189,10 +190,8 @@ void bcm1480_mailbox_interrupt(void)
 	/* Clear the mailbox to clear the interrupt */
 	__raw_writeq(((u64)action)<<48, mailbox_0_clear_regs[cpu]);
 
-	/*
-	 * Nothing to do for SMP_RESCHEDULE_YOURSELF; returning from the
-	 * interrupt will do the reschedule for us
-	 */
+	if (action & SMP_RESCHEDULE_YOURSELF)
+		scheduler_ipi();
 
 	if (action & SMP_CALL_FUNCTION)
 		smp_call_function_interrupt();
diff --git a/arch/mips/sibyte/sb1250/smp.c b/arch/mips/sibyte/sb1250/smp.c
index c00a5cb..38e7f6b 100644
--- a/arch/mips/sibyte/sb1250/smp.c
+++ b/arch/mips/sibyte/sb1250/smp.c
@@ -21,6 +21,7 @@
 #include <linux/interrupt.h>
 #include <linux/smp.h>
 #include <linux/kernel_stat.h>
+#include <linux/sched.h>
 
 #include <asm/mmu_context.h>
 #include <asm/io.h>
@@ -177,10 +178,8 @@ void sb1250_mailbox_interrupt(void)
 	/* Clear the mailbox to clear the interrupt */
 	____raw_writeq(((u64)action) << 48, mailbox_clear_regs[cpu]);
 
-	/*
-	 * Nothing to do for SMP_RESCHEDULE_YOURSELF; returning from the
-	 * interrupt will do the reschedule for us
-	 */
+	if (action & SMP_RESCHEDULE_YOURSELF)
+		scheduler_ipi();
 
 	if (action & SMP_CALL_FUNCTION)
 		smp_call_function_interrupt();
diff --git a/arch/mn10300/kernel/smp.c b/arch/mn10300/kernel/smp.c
index 226c826..83fb279 100644
--- a/arch/mn10300/kernel/smp.c
+++ b/arch/mn10300/kernel/smp.c
@@ -494,14 +494,11 @@ void smp_send_stop(void)
  * @irq: The interrupt number.
  * @dev_id: The device ID.
  *
- * We need do nothing here, since the scheduling will be effected on our way
- * back through entry.S.
- *
  * Returns IRQ_HANDLED to indicate we handled the interrupt successfully.
  */
 static irqreturn_t smp_reschedule_interrupt(int irq, void *dev_id)
 {
-	/* do nothing */
+	scheduler_ipi();
 	return IRQ_HANDLED;
 }
 
diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c
index 69d63d3..828305f 100644
--- a/arch/parisc/kernel/smp.c
+++ b/arch/parisc/kernel/smp.c
@@ -155,10 +155,7 @@ ipi_interrupt(int irq, void *dev_id)
 				
 			case IPI_RESCHEDULE:
 				smp_debug(100, KERN_DEBUG "CPU%d IPI_RESCHEDULE\n", this_cpu);
-				/*
-				 * Reschedule callback.  Everything to be
-				 * done is done by the interrupt return path.
-				 */
+				scheduler_ipi();
 				break;
 
 			case IPI_CALL_FUNC:
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index cbdbb14..9f9c204 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -116,7 +116,7 @@ void smp_message_recv(int msg)
 		generic_smp_call_function_interrupt();
 		break;
 	case PPC_MSG_RESCHEDULE:
-		/* we notice need_resched on exit */
+		scheduler_ipi();
 		break;
 	case PPC_MSG_CALL_FUNC_SINGLE:
 		generic_smp_call_function_single_interrupt();
@@ -146,7 +146,7 @@ static irqreturn_t call_function_action(int irq, void *data)
 
 static irqreturn_t reschedule_action(int irq, void *data)
 {
-	/* we just need the return path side effect of checking need_resched */
+	scheduler_ipi();
 	return IRQ_HANDLED;
 }
 
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 63a97db..63c7d9f 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -165,12 +165,12 @@ static void do_ext_call_interrupt(unsigned int ext_int_code,
 	kstat_cpu(smp_processor_id()).irqs[EXTINT_IPI]++;
 	/*
 	 * handle bit signal external calls
-	 *
-	 * For the ec_schedule signal we have to do nothing. All the work
-	 * is done automatically when we return from the interrupt.
 	 */
 	bits = xchg(&S390_lowcore.ext_call_fast, 0);
 
+	if (test_bit(ec_schedule, &bits))
+		scheduler_ipi();
+
 	if (test_bit(ec_call_function, &bits))
 		generic_smp_call_function_interrupt();
 
diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 509b36b..6207561 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -20,6 +20,7 @@
 #include <linux/module.h>
 #include <linux/cpu.h>
 #include <linux/interrupt.h>
+#include <linux/sched.h>
 #include <asm/atomic.h>
 #include <asm/processor.h>
 #include <asm/system.h>
@@ -323,6 +324,7 @@ void smp_message_recv(unsigned int msg)
 		generic_smp_call_function_interrupt();
 		break;
 	case SMP_MSG_RESCHEDULE:
+		scheduler_ipi();
 		break;
 	case SMP_MSG_FUNCTION_SINGLE:
 		generic_smp_call_function_single_interrupt();
diff --git a/arch/sparc/kernel/smp_32.c b/arch/sparc/kernel/smp_32.c
index 91c10fb..f95690c 100644
--- a/arch/sparc/kernel/smp_32.c
+++ b/arch/sparc/kernel/smp_32.c
@@ -125,7 +125,9 @@ struct linux_prom_registers smp_penguin_ctable __cpuinitdata = { 0 };
 
 void smp_send_reschedule(int cpu)
 {
-	/* See sparc64 */
+	/*
+	 * XXX missing reschedule IPI, see scheduler_ipi()
+	 */
 }
 
 void smp_send_stop(void)
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 3e94a8c..9478da7 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1368,6 +1368,7 @@ void smp_send_reschedule(int cpu)
 void __irq_entry smp_receive_signal_client(int irq, struct pt_regs *regs)
 {
 	clear_softint(1 << irq);
+	scheduler_ipi();
 }
 
 /* This is a nop because we capture all other cpus
diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index a429310..c52224d 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -189,12 +189,8 @@ void flush_icache_range(unsigned long start, unsigned long end)
 /* Called when smp_send_reschedule() triggers IRQ_RESCHEDULE. */
 static irqreturn_t handle_reschedule_ipi(int irq, void *token)
 {
-	/*
-	 * Nothing to do here; when we return from interrupt, the
-	 * rescheduling will occur there. But do bump the interrupt
-	 * profiler count in the meantime.
-	 */
 	__get_cpu_var(irq_stat).irq_resched_count++;
+	scheduler_ipi();
 
 	return IRQ_HANDLED;
 }
diff --git a/arch/um/kernel/smp.c b/arch/um/kernel/smp.c
index 106bf27..eefb107 100644
--- a/arch/um/kernel/smp.c
+++ b/arch/um/kernel/smp.c
@@ -173,7 +173,7 @@ void IPI_handler(int cpu)
 			break;
 
 		case 'R':
-			set_tsk_need_resched(current);
+			scheduler_ipi();
 			break;
 
 		case 'S':
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 513deac..013e7eb 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -194,14 +194,13 @@ static void native_stop_other_cpus(int wait)
 }
 
 /*
- * Reschedule call back. Nothing to do,
- * all the work is done automatically when
- * we return from the interrupt.
+ * Reschedule call back.
  */
 void smp_reschedule_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
 	inc_irq_stat(irq_resched_count);
+	scheduler_ipi();
 	/*
 	 * KVM uses this interrupt to force a cpu out of guest mode
 	 */
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 3061244..762b46a 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -46,13 +46,12 @@ static irqreturn_t xen_call_function_interrupt(int irq, void *dev_id);
 static irqreturn_t xen_call_function_single_interrupt(int irq, void *dev_id);
 
 /*
- * Reschedule call back. Nothing to do,
- * all the work is done automatically when
- * we return from the interrupt.
+ * Reschedule call back.
  */
 static irqreturn_t xen_reschedule_interrupt(int irq, void *dev_id)
 {
 	inc_irq_stat(irq_resched_count);
+	scheduler_ipi();
 
 	return IRQ_HANDLED;
 }
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4ec2c02..758e27a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2189,8 +2189,10 @@ extern void set_task_comm(struct task_struct *tsk, char *from);
 extern char *get_task_comm(char *to, struct task_struct *tsk);
 
 #ifdef CONFIG_SMP
+static inline void scheduler_ipi(void) { }
 extern unsigned long wait_task_inactive(struct task_struct *, long match_state);
 #else
+static inline void scheduler_ipi(void) { }
 static inline unsigned long wait_task_inactive(struct task_struct *p,
 					       long match_state)
 {


* [tip:sched/locking] sched: Always provide p->on_cpu
  2011-04-05 15:23 ` [PATCH 02/21] sched: Always provide p->on_cpu Peter Zijlstra
@ 2011-04-14  8:31   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  3ca7a440da394808571dad32d33d3bc0389982e6
Gitweb:     http://git.kernel.org/tip/3ca7a440da394808571dad32d33d3bc0389982e6
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:40 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:32 +0200

sched: Always provide p->on_cpu

Always provide p->on_cpu so that we can determine if it's on a cpu
without having to lock the rq.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152728.785452014@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/sched.h |    4 +---
 kernel/sched.c        |   46 +++++++++++++++++++++++++++++-----------------
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 758e27a..3435837 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1200,9 +1200,7 @@ struct task_struct {
 	int lock_depth;		/* BKL lock depth */
 
 #ifdef CONFIG_SMP
-#ifdef __ARCH_WANT_UNLOCKED_CTXSW
-	int oncpu;
-#endif
+	int on_cpu;
 #endif
 
 	int prio, static_prio, normal_prio;
diff --git a/kernel/sched.c b/kernel/sched.c
index a187c3f..cd2593e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -838,18 +838,39 @@ static inline int task_current(struct rq *rq, struct task_struct *p)
 	return rq->curr == p;
 }
 
-#ifndef __ARCH_WANT_UNLOCKED_CTXSW
 static inline int task_running(struct rq *rq, struct task_struct *p)
 {
+#ifdef CONFIG_SMP
+	return p->on_cpu;
+#else
 	return task_current(rq, p);
+#endif
 }
 
+#ifndef __ARCH_WANT_UNLOCKED_CTXSW
 static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
 {
+#ifdef CONFIG_SMP
+	/*
+	 * We can optimise this out completely for !SMP, because the
+	 * SMP rebalancing from interrupt is the only thing that cares
+	 * here.
+	 */
+	next->on_cpu = 1;
+#endif
 }
 
 static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 {
+#ifdef CONFIG_SMP
+	/*
+	 * After ->on_cpu is cleared, the task can be moved to a different CPU.
+	 * We must ensure this doesn't happen until the switch is completely
+	 * finished.
+	 */
+	smp_wmb();
+	prev->on_cpu = 0;
+#endif
 #ifdef CONFIG_DEBUG_SPINLOCK
 	/* this is a valid case when another task releases the spinlock */
 	rq->lock.owner = current;
@@ -865,15 +886,6 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 }
 
 #else /* __ARCH_WANT_UNLOCKED_CTXSW */
-static inline int task_running(struct rq *rq, struct task_struct *p)
-{
-#ifdef CONFIG_SMP
-	return p->oncpu;
-#else
-	return task_current(rq, p);
-#endif
-}
-
 static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
 {
 #ifdef CONFIG_SMP
@@ -882,7 +894,7 @@ static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
 	 * SMP rebalancing from interrupt is the only thing that cares
 	 * here.
 	 */
-	next->oncpu = 1;
+	next->on_cpu = 1;
 #endif
 #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
 	raw_spin_unlock_irq(&rq->lock);
@@ -895,12 +907,12 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 {
 #ifdef CONFIG_SMP
 	/*
-	 * After ->oncpu is cleared, the task can be moved to a different CPU.
+	 * After ->on_cpu is cleared, the task can be moved to a different CPU.
 	 * We must ensure this doesn't happen until the switch is completely
 	 * finished.
 	 */
 	smp_wmb();
-	prev->oncpu = 0;
+	prev->on_cpu = 0;
 #endif
 #ifndef __ARCH_WANT_INTERRUPTS_ON_CTXSW
 	local_irq_enable();
@@ -2686,8 +2698,8 @@ void sched_fork(struct task_struct *p, int clone_flags)
 	if (likely(sched_info_on()))
 		memset(&p->sched_info, 0, sizeof(p->sched_info));
 #endif
-#if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
-	p->oncpu = 0;
+#if defined(CONFIG_SMP)
+	p->on_cpu = 0;
 #endif
 #ifdef CONFIG_PREEMPT
 	/* Want to start with kernel preemption disabled. */
@@ -5776,8 +5788,8 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
 	rcu_read_unlock();
 
 	rq->curr = rq->idle = idle;
-#if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
-	idle->oncpu = 1;
+#if defined(CONFIG_SMP)
+	idle->on_cpu = 1;
 #endif
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 


* [tip:sched/locking] mutex: Use p->on_cpu for the adaptive spin
  2011-04-05 15:23 ` [PATCH 03/21] mutex: Use p->on_cpu for the adaptive spin Peter Zijlstra
@ 2011-04-14  8:32   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  c6eb3dda25892f1f974f5420f63e6721aab02f6f
Gitweb:     http://git.kernel.org/tip/c6eb3dda25892f1f974f5420f63e6721aab02f6f
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:41 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:33 +0200

mutex: Use p->on_cpu for the adaptive spin

Since we now have p->on_cpu unconditionally available, use it to
re-implement mutex_spin_on_owner.

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152728.826338173@chello.nl
---
 include/linux/mutex.h |    2 +-
 include/linux/sched.h |    2 +-
 kernel/mutex-debug.c  |    2 +-
 kernel/mutex-debug.h  |    2 +-
 kernel/mutex.c        |    2 +-
 kernel/mutex.h        |    2 +-
 kernel/sched.c        |   83 +++++++++++++++++++-----------------------------
 7 files changed, 39 insertions(+), 56 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 94b48bd..c75471d 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -51,7 +51,7 @@ struct mutex {
 	spinlock_t		wait_lock;
 	struct list_head	wait_list;
 #if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_SMP)
-	struct thread_info	*owner;
+	struct task_struct	*owner;
 #endif
 #ifdef CONFIG_DEBUG_MUTEXES
 	const char 		*name;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3435837..1738504 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -360,7 +360,7 @@ extern signed long schedule_timeout_interruptible(signed long timeout);
 extern signed long schedule_timeout_killable(signed long timeout);
 extern signed long schedule_timeout_uninterruptible(signed long timeout);
 asmlinkage void schedule(void);
-extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
+extern int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner);
 
 struct nsproxy;
 struct user_namespace;
diff --git a/kernel/mutex-debug.c b/kernel/mutex-debug.c
index ec815a9..73da83a 100644
--- a/kernel/mutex-debug.c
+++ b/kernel/mutex-debug.c
@@ -75,7 +75,7 @@ void debug_mutex_unlock(struct mutex *lock)
 		return;
 
 	DEBUG_LOCKS_WARN_ON(lock->magic != lock);
-	DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
+	DEBUG_LOCKS_WARN_ON(lock->owner != current);
 	DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
 	mutex_clear_owner(lock);
 }
diff --git a/kernel/mutex-debug.h b/kernel/mutex-debug.h
index 57d527a..0799fd3 100644
--- a/kernel/mutex-debug.h
+++ b/kernel/mutex-debug.h
@@ -29,7 +29,7 @@ extern void debug_mutex_init(struct mutex *lock, const char *name,
 
 static inline void mutex_set_owner(struct mutex *lock)
 {
-	lock->owner = current_thread_info();
+	lock->owner = current;
 }
 
 static inline void mutex_clear_owner(struct mutex *lock)
diff --git a/kernel/mutex.c b/kernel/mutex.c
index c4195fa..fe4706c 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -160,7 +160,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	 */
 
 	for (;;) {
-		struct thread_info *owner;
+		struct task_struct *owner;
 
 		/*
 		 * If we own the BKL, then don't spin. The owner of
diff --git a/kernel/mutex.h b/kernel/mutex.h
index 67578ca..4115fbf 100644
--- a/kernel/mutex.h
+++ b/kernel/mutex.h
@@ -19,7 +19,7 @@
 #ifdef CONFIG_SMP
 static inline void mutex_set_owner(struct mutex *lock)
 {
-	lock->owner = current_thread_info();
+	lock->owner = current;
 }
 
 static inline void mutex_clear_owner(struct mutex *lock)
diff --git a/kernel/sched.c b/kernel/sched.c
index cd2593e..55cc503 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4173,70 +4173,53 @@ need_resched:
 EXPORT_SYMBOL(schedule);
 
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
-/*
- * Look out! "owner" is an entirely speculative pointer
- * access and not reliable.
- */
-int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)
-{
-	unsigned int cpu;
-	struct rq *rq;
 
-	if (!sched_feat(OWNER_SPIN))
-		return 0;
+static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
+{
+	bool ret = false;
 
-#ifdef CONFIG_DEBUG_PAGEALLOC
-	/*
-	 * Need to access the cpu field knowing that
-	 * DEBUG_PAGEALLOC could have unmapped it if
-	 * the mutex owner just released it and exited.
-	 */
-	if (probe_kernel_address(&owner->cpu, cpu))
-		return 0;
-#else
-	cpu = owner->cpu;
-#endif
+	rcu_read_lock();
+	if (lock->owner != owner)
+		goto fail;
 
 	/*
-	 * Even if the access succeeded (likely case),
-	 * the cpu field may no longer be valid.
+	 * Ensure we emit the owner->on_cpu, dereference _after_ checking
+	 * lock->owner still matches owner, if that fails, owner might
+	 * point to free()d memory, if it still matches, the rcu_read_lock()
+	 * ensures the memory stays valid.
 	 */
-	if (cpu >= nr_cpumask_bits)
-		return 0;
+	barrier();
 
-	/*
-	 * We need to validate that we can do a
-	 * get_cpu() and that we have the percpu area.
-	 */
-	if (!cpu_online(cpu))
-		return 0;
+	ret = owner->on_cpu;
+fail:
+	rcu_read_unlock();
 
-	rq = cpu_rq(cpu);
+	return ret;
+}
 
-	for (;;) {
-		/*
-		 * Owner changed, break to re-assess state.
-		 */
-		if (lock->owner != owner) {
-			/*
-			 * If the lock has switched to a different owner,
-			 * we likely have heavy contention. Return 0 to quit
-			 * optimistic spinning and not contend further:
-			 */
-			if (lock->owner)
-				return 0;
-			break;
-		}
+/*
+ * Look out! "owner" is an entirely speculative pointer
+ * access and not reliable.
+ */
+int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
+{
+	if (!sched_feat(OWNER_SPIN))
+		return 0;
 
-		/*
-		 * Is that owner really running on that cpu?
-		 */
-		if (task_thread_info(rq->curr) != owner || need_resched())
+	while (owner_running(lock, owner)) {
+		if (need_resched())
 			return 0;
 
 		arch_mutex_cpu_relax();
 	}
 
+	/*
+	 * If the owner changed to another task there is likely
+	 * heavy contention, stop spinning.
+	 */
+	if (lock->owner)
+		return 0;
+
 	return 1;
 }
 #endif

^ permalink raw reply	[flat|nested] 152+ messages in thread

* [tip:sched/locking] sched: Change the ttwu() success details
  2011-04-05 15:23 ` [PATCH 04/21] sched: Change the ttwu success details Peter Zijlstra
  2011-04-13  9:23   ` Peter Zijlstra
@ 2011-04-14  8:32   ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  893633817f5b58f5227365d74344e0170a718213
Gitweb:     http://git.kernel.org/tip/893633817f5b58f5227365d74344e0170a718213
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:42 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:34 +0200

sched: Change the ttwu() success details

try_to_wake_up() used to return success only when it had to place a
task on a runqueue; change that to return success every time we change
p->state to TASK_RUNNING, because that is the real measure of a wakeup.

As a result, success is always true at the tracepoints.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152728.866866929@chello.nl
---
 kernel/sched.c |   16 +++++++---------
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 81ab58e..3919aa4 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2427,10 +2427,10 @@ static inline void ttwu_activate(struct task_struct *p, struct rq *rq,
 		wq_worker_waking_up(p, cpu_of(rq));
 }
 
-static inline void ttwu_post_activation(struct task_struct *p, struct rq *rq,
-					int wake_flags, bool success)
+static void
+ttwu_post_activation(struct task_struct *p, struct rq *rq, int wake_flags)
 {
-	trace_sched_wakeup(p, success);
+	trace_sched_wakeup(p, true);
 	check_preempt_curr(rq, p, wake_flags);
 
 	p->state = TASK_RUNNING;
@@ -2546,9 +2546,9 @@ out_activate:
 #endif /* CONFIG_SMP */
 	ttwu_activate(p, rq, wake_flags & WF_SYNC, orig_cpu != cpu,
 		      cpu == this_cpu, en_flags);
-	success = 1;
 out_running:
-	ttwu_post_activation(p, rq, wake_flags, success);
+	ttwu_post_activation(p, rq, wake_flags);
+	success = 1;
 out:
 	task_rq_unlock(rq, &flags);
 	put_cpu();
@@ -2567,7 +2567,6 @@ out:
 static void try_to_wake_up_local(struct task_struct *p)
 {
 	struct rq *rq = task_rq(p);
-	bool success = false;
 
 	BUG_ON(rq != this_rq());
 	BUG_ON(p == current);
@@ -2582,9 +2581,8 @@ static void try_to_wake_up_local(struct task_struct *p)
 			schedstat_inc(rq, ttwu_local);
 		}
 		ttwu_activate(p, rq, false, false, true, ENQUEUE_WAKEUP);
-		success = true;
 	}
-	ttwu_post_activation(p, rq, 0, success);
+	ttwu_post_activation(p, rq, 0);
 }
 
 /**
@@ -2747,7 +2745,7 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
 
 	rq = task_rq_lock(p, &flags);
 	activate_task(rq, p, 0);
-	trace_sched_wakeup_new(p, 1);
+	trace_sched_wakeup_new(p, true);
 	check_preempt_curr(rq, p, WF_FORK);
 #ifdef CONFIG_SMP
 	if (p->sched_class->task_woken)


* [tip:sched/locking] sched: Clean up ttwu() stats
  2011-04-05 15:23 ` [PATCH 05/21] sched: Clean up ttwu stats Peter Zijlstra
@ 2011-04-14  8:33   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  d7c01d27ab767a30d672d1fd657aa8336ebdcbca
Gitweb:     http://git.kernel.org/tip/d7c01d27ab767a30d672d1fd657aa8336ebdcbca
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:43 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:34 +0200

sched: Clean up ttwu() stats

Collect all ttwu() stat code into a single function and ensure it's
always called for an actual wakeup (changing p->state to
TASK_RUNNING).

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152728.908177058@chello.nl
---
 kernel/sched.c |   75 +++++++++++++++++++++++++++++--------------------------
 1 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 3919aa4..4481638 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2406,20 +2406,43 @@ static void update_avg(u64 *avg, u64 sample)
 }
 #endif
 
-static inline void ttwu_activate(struct task_struct *p, struct rq *rq,
-				 bool is_sync, bool is_migrate, bool is_local,
-				 unsigned long en_flags)
+static void
+ttwu_stat(struct rq *rq, struct task_struct *p, int cpu, int wake_flags)
 {
+#ifdef CONFIG_SCHEDSTATS
+#ifdef CONFIG_SMP
+	int this_cpu = smp_processor_id();
+
+	if (cpu == this_cpu) {
+		schedstat_inc(rq, ttwu_local);
+		schedstat_inc(p, se.statistics.nr_wakeups_local);
+	} else {
+		struct sched_domain *sd;
+
+		schedstat_inc(p, se.statistics.nr_wakeups_remote);
+		for_each_domain(this_cpu, sd) {
+			if (cpumask_test_cpu(cpu, sched_domain_span(sd))) {
+				schedstat_inc(sd, ttwu_wake_remote);
+				break;
+			}
+		}
+	}
+#endif /* CONFIG_SMP */
+
+	schedstat_inc(rq, ttwu_count);
 	schedstat_inc(p, se.statistics.nr_wakeups);
-	if (is_sync)
+
+	if (wake_flags & WF_SYNC)
 		schedstat_inc(p, se.statistics.nr_wakeups_sync);
-	if (is_migrate)
+
+	if (cpu != task_cpu(p))
 		schedstat_inc(p, se.statistics.nr_wakeups_migrate);
-	if (is_local)
-		schedstat_inc(p, se.statistics.nr_wakeups_local);
-	else
-		schedstat_inc(p, se.statistics.nr_wakeups_remote);
 
+#endif /* CONFIG_SCHEDSTATS */
+}
+
+static void ttwu_activate(struct rq *rq, struct task_struct *p, int en_flags)
+{
 	activate_task(rq, p, en_flags);
 
 	/* if a worker is waking up, notify workqueue */
@@ -2481,12 +2504,12 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 	if (!(p->state & state))
 		goto out;
 
+	cpu = task_cpu(p);
+
 	if (p->se.on_rq)
 		goto out_running;
 
-	cpu = task_cpu(p);
 	orig_cpu = cpu;
-
 #ifdef CONFIG_SMP
 	if (unlikely(task_running(rq, p)))
 		goto out_activate;
@@ -2527,27 +2550,12 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 	WARN_ON(task_cpu(p) != cpu);
 	WARN_ON(p->state != TASK_WAKING);
 
-#ifdef CONFIG_SCHEDSTATS
-	schedstat_inc(rq, ttwu_count);
-	if (cpu == this_cpu)
-		schedstat_inc(rq, ttwu_local);
-	else {
-		struct sched_domain *sd;
-		for_each_domain(this_cpu, sd) {
-			if (cpumask_test_cpu(cpu, sched_domain_span(sd))) {
-				schedstat_inc(sd, ttwu_wake_remote);
-				break;
-			}
-		}
-	}
-#endif /* CONFIG_SCHEDSTATS */
-
 out_activate:
 #endif /* CONFIG_SMP */
-	ttwu_activate(p, rq, wake_flags & WF_SYNC, orig_cpu != cpu,
-		      cpu == this_cpu, en_flags);
+	ttwu_activate(rq, p, en_flags);
 out_running:
 	ttwu_post_activation(p, rq, wake_flags);
+	ttwu_stat(rq, p, cpu, wake_flags);
 	success = 1;
 out:
 	task_rq_unlock(rq, &flags);
@@ -2575,14 +2583,11 @@ static void try_to_wake_up_local(struct task_struct *p)
 	if (!(p->state & TASK_NORMAL))
 		return;
 
-	if (!p->se.on_rq) {
-		if (likely(!task_running(rq, p))) {
-			schedstat_inc(rq, ttwu_count);
-			schedstat_inc(rq, ttwu_local);
-		}
-		ttwu_activate(p, rq, false, false, true, ENQUEUE_WAKEUP);
-	}
+	if (!p->se.on_rq)
+		ttwu_activate(rq, p, ENQUEUE_WAKEUP);
+
 	ttwu_post_activation(p, rq, 0);
+	ttwu_stat(rq, p, smp_processor_id(), 0);
 }
 
 /**


* [tip:sched/locking] sched: Provide p->on_rq
  2011-04-05 15:23 ` [PATCH 06/21] sched: Provide p->on_rq Peter Zijlstra
@ 2011-04-14  8:33   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  fd2f4419b4cbe8fe90796df9617c355762afd6a4
Gitweb:     http://git.kernel.org/tip/fd2f4419b4cbe8fe90796df9617c355762afd6a4
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:44 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:35 +0200

sched: Provide p->on_rq

Provide a generic p->on_rq because the p->se.on_rq semantics are
unfavourable for lockless wakeups but needed for sched_fair.

In particular, p->on_rq is only cleared when we actually dequeue the
task in schedule() and not on any random dequeue as done by things
like __migrate_task() and __sched_setscheduler().

This also allows us to remove p->se usage from !sched_fair code.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152728.949545047@chello.nl
---
 include/linux/sched.h   |    1 +
 kernel/sched.c          |   38 ++++++++++++++++++++------------------
 kernel/sched_debug.c    |    2 +-
 kernel/sched_rt.c       |   16 ++++++++--------
 kernel/sched_stoptask.c |    2 +-
 5 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1738504..b33a700 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1202,6 +1202,7 @@ struct task_struct {
 #ifdef CONFIG_SMP
 	int on_cpu;
 #endif
+	int on_rq;
 
 	int prio, static_prio, normal_prio;
 	unsigned int rt_priority;
diff --git a/kernel/sched.c b/kernel/sched.c
index 4481638..dece28e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1785,7 +1785,6 @@ static void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 	update_rq_clock(rq);
 	sched_info_queued(p);
 	p->sched_class->enqueue_task(rq, p, flags);
-	p->se.on_rq = 1;
 }
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
@@ -1793,7 +1792,6 @@ static void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
 	update_rq_clock(rq);
 	sched_info_dequeued(p);
 	p->sched_class->dequeue_task(rq, p, flags);
-	p->se.on_rq = 0;
 }
 
 /*
@@ -2128,7 +2126,7 @@ static void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 	 * A queue event has occurred, and we're going to schedule.  In
 	 * this case, we can save a useless back to back clock update.
 	 */
-	if (rq->curr->se.on_rq && test_tsk_need_resched(rq->curr))
+	if (rq->curr->on_rq && test_tsk_need_resched(rq->curr))
 		rq->skip_clock_update = 1;
 }
 
@@ -2203,7 +2201,7 @@ static bool migrate_task(struct task_struct *p, struct rq *rq)
 	 * If the task is not on a runqueue (and not running), then
 	 * the next wake-up will properly place the task.
 	 */
-	return p->se.on_rq || task_running(rq, p);
+	return p->on_rq || task_running(rq, p);
 }
 
 /*
@@ -2263,7 +2261,7 @@ unsigned long wait_task_inactive(struct task_struct *p, long match_state)
 		rq = task_rq_lock(p, &flags);
 		trace_sched_wait_task(p);
 		running = task_running(rq, p);
-		on_rq = p->se.on_rq;
+		on_rq = p->on_rq;
 		ncsw = 0;
 		if (!match_state || p->state == match_state)
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
@@ -2444,6 +2442,7 @@ ttwu_stat(struct rq *rq, struct task_struct *p, int cpu, int wake_flags)
 static void ttwu_activate(struct rq *rq, struct task_struct *p, int en_flags)
 {
 	activate_task(rq, p, en_flags);
+	p->on_rq = 1;
 
 	/* if a worker is waking up, notify workqueue */
 	if (p->flags & PF_WQ_WORKER)
@@ -2506,7 +2505,7 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 
 	cpu = task_cpu(p);
 
-	if (p->se.on_rq)
+	if (p->on_rq)
 		goto out_running;
 
 	orig_cpu = cpu;
@@ -2583,7 +2582,7 @@ static void try_to_wake_up_local(struct task_struct *p)
 	if (!(p->state & TASK_NORMAL))
 		return;
 
-	if (!p->se.on_rq)
+	if (!p->on_rq)
 		ttwu_activate(rq, p, ENQUEUE_WAKEUP);
 
 	ttwu_post_activation(p, rq, 0);
@@ -2620,19 +2619,21 @@ int wake_up_state(struct task_struct *p, unsigned int state)
  */
 static void __sched_fork(struct task_struct *p)
 {
+	p->on_rq			= 0;
+
+	p->se.on_rq			= 0;
 	p->se.exec_start		= 0;
 	p->se.sum_exec_runtime		= 0;
 	p->se.prev_sum_exec_runtime	= 0;
 	p->se.nr_migrations		= 0;
 	p->se.vruntime			= 0;
+	INIT_LIST_HEAD(&p->se.group_node);
 
 #ifdef CONFIG_SCHEDSTATS
 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
 #endif
 
 	INIT_LIST_HEAD(&p->rt.run_list);
-	p->se.on_rq = 0;
-	INIT_LIST_HEAD(&p->se.group_node);
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 	INIT_HLIST_HEAD(&p->preempt_notifiers);
@@ -2750,6 +2751,7 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
 
 	rq = task_rq_lock(p, &flags);
 	activate_task(rq, p, 0);
+	p->on_rq = 1;
 	trace_sched_wakeup_new(p, true);
 	check_preempt_curr(rq, p, WF_FORK);
 #ifdef CONFIG_SMP
@@ -4051,7 +4053,7 @@ static inline void schedule_debug(struct task_struct *prev)
 
 static void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
-	if (prev->se.on_rq)
+	if (prev->on_rq)
 		update_rq_clock(rq);
 	prev->sched_class->put_prev_task(rq, prev);
 }
@@ -4126,7 +4128,9 @@ need_resched:
 				if (to_wakeup)
 					try_to_wake_up_local(to_wakeup);
 			}
+
 			deactivate_task(rq, prev, DEQUEUE_SLEEP);
+			prev->on_rq = 0;
 
 			/*
 			 * If we are going to sleep and we have plugged IO queued, make
@@ -4695,7 +4699,7 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
 	trace_sched_pi_setprio(p, prio);
 	oldprio = p->prio;
 	prev_class = p->sched_class;
-	on_rq = p->se.on_rq;
+	on_rq = p->on_rq;
 	running = task_current(rq, p);
 	if (on_rq)
 		dequeue_task(rq, p, 0);
@@ -4743,7 +4747,7 @@ void set_user_nice(struct task_struct *p, long nice)
 		p->static_prio = NICE_TO_PRIO(nice);
 		goto out_unlock;
 	}
-	on_rq = p->se.on_rq;
+	on_rq = p->on_rq;
 	if (on_rq)
 		dequeue_task(rq, p, 0);
 
@@ -4877,8 +4881,6 @@ static struct task_struct *find_process_by_pid(pid_t pid)
 static void
 __setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
 {
-	BUG_ON(p->se.on_rq);
-
 	p->policy = policy;
 	p->rt_priority = prio;
 	p->normal_prio = normal_prio(p);
@@ -5044,7 +5046,7 @@ recheck:
 		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 		goto recheck;
 	}
-	on_rq = p->se.on_rq;
+	on_rq = p->on_rq;
 	running = task_current(rq, p);
 	if (on_rq)
 		deactivate_task(rq, p, 0);
@@ -5965,7 +5967,7 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
 	 * If we're not on a rq, the next wake-up will ensure we're
 	 * placed properly.
 	 */
-	if (p->se.on_rq) {
+	if (p->on_rq) {
 		deactivate_task(rq_src, p, 0);
 		set_task_cpu(p, dest_cpu);
 		activate_task(rq_dest, p, 0);
@@ -8339,7 +8341,7 @@ static void normalize_task(struct rq *rq, struct task_struct *p)
 	int old_prio = p->prio;
 	int on_rq;
 
-	on_rq = p->se.on_rq;
+	on_rq = p->on_rq;
 	if (on_rq)
 		deactivate_task(rq, p, 0);
 	__setscheduler(rq, p, SCHED_NORMAL, 0);
@@ -8682,7 +8684,7 @@ void sched_move_task(struct task_struct *tsk)
 	rq = task_rq_lock(tsk, &flags);
 
 	running = task_current(rq, tsk);
-	on_rq = tsk->se.on_rq;
+	on_rq = tsk->on_rq;
 
 	if (on_rq)
 		dequeue_task(rq, tsk, 0);
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index 7bacd83..3669bec6 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -152,7 +152,7 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
 	read_lock_irqsave(&tasklist_lock, flags);
 
 	do_each_thread(g, p) {
-		if (!p->se.on_rq || task_cpu(p) != rq_cpu)
+		if (!p->on_rq || task_cpu(p) != rq_cpu)
 			continue;
 
 		print_task(m, rq, p);
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index e7cebdc..9ca4f5f 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1136,7 +1136,7 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 	 * The previous task needs to be made eligible for pushing
 	 * if it is still active
 	 */
-	if (p->se.on_rq && p->rt.nr_cpus_allowed > 1)
+	if (on_rt_rq(&p->rt) && p->rt.nr_cpus_allowed > 1)
 		enqueue_pushable_task(rq, p);
 }
 
@@ -1287,7 +1287,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 				     !cpumask_test_cpu(lowest_rq->cpu,
 						       &task->cpus_allowed) ||
 				     task_running(rq, task) ||
-				     !task->se.on_rq)) {
+				     !task->on_rq)) {
 
 				raw_spin_unlock(&lowest_rq->lock);
 				lowest_rq = NULL;
@@ -1321,7 +1321,7 @@ static struct task_struct *pick_next_pushable_task(struct rq *rq)
 	BUG_ON(task_current(rq, p));
 	BUG_ON(p->rt.nr_cpus_allowed <= 1);
 
-	BUG_ON(!p->se.on_rq);
+	BUG_ON(!p->on_rq);
 	BUG_ON(!rt_task(p));
 
 	return p;
@@ -1467,7 +1467,7 @@ static int pull_rt_task(struct rq *this_rq)
 		 */
 		if (p && (p->prio < this_rq->rt.highest_prio.curr)) {
 			WARN_ON(p == src_rq->curr);
-			WARN_ON(!p->se.on_rq);
+			WARN_ON(!p->on_rq);
 
 			/*
 			 * There's a chance that p is higher in priority
@@ -1538,7 +1538,7 @@ static void set_cpus_allowed_rt(struct task_struct *p,
 	 * Update the migration status of the RQ if we have an RT task
 	 * which is running AND changing its weight value.
 	 */
-	if (p->se.on_rq && (weight != p->rt.nr_cpus_allowed)) {
+	if (p->on_rq && (weight != p->rt.nr_cpus_allowed)) {
 		struct rq *rq = task_rq(p);
 
 		if (!task_current(rq, p)) {
@@ -1608,7 +1608,7 @@ static void switched_from_rt(struct rq *rq, struct task_struct *p)
 	 * we may need to handle the pulling of RT tasks
 	 * now.
 	 */
-	if (p->se.on_rq && !rq->rt.rt_nr_running)
+	if (p->on_rq && !rq->rt.rt_nr_running)
 		pull_rt_task(rq);
 }
 
@@ -1638,7 +1638,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 	 * If that current running task is also an RT task
 	 * then see if we can move to another run queue.
 	 */
-	if (p->se.on_rq && rq->curr != p) {
+	if (p->on_rq && rq->curr != p) {
 #ifdef CONFIG_SMP
 		if (rq->rt.overloaded && push_rt_task(rq) &&
 		    /* Don't resched if we changed runqueues */
@@ -1657,7 +1657,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 static void
 prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
 {
-	if (!p->se.on_rq)
+	if (!p->on_rq)
 		return;
 
 	if (rq->curr == p) {
diff --git a/kernel/sched_stoptask.c b/kernel/sched_stoptask.c
index 1ba2bd4..f607de4 100644
--- a/kernel/sched_stoptask.c
+++ b/kernel/sched_stoptask.c
@@ -26,7 +26,7 @@ static struct task_struct *pick_next_task_stop(struct rq *rq)
 {
 	struct task_struct *stop = rq->stop;
 
-	if (stop && stop->se.on_rq)
+	if (stop && stop->on_rq)
 		return stop;
 
 	return NULL;


* [tip:sched/locking] sched: Serialize p->cpus_allowed and ttwu() using p->pi_lock
  2011-04-05 15:23 ` [PATCH 07/21] sched: Serialize p->cpus_allowed and ttwu() using p->pi_lock Peter Zijlstra
@ 2011-04-14  8:34   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  013fdb8086acaae5f8eb96f9ad48fcd98882ac46
Gitweb:     http://git.kernel.org/tip/013fdb8086acaae5f8eb96f9ad48fcd98882ac46
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:45 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:35 +0200

sched: Serialize p->cpus_allowed and ttwu() using p->pi_lock

Currently p->pi_lock already serializes p->sched_class; also put
p->cpus_allowed and try_to_wake_up() under it. This prepares the way
to do the first part of ttwu() without holding rq->lock.

By having p->sched_class and p->cpus_allowed serialized by p->pi_lock,
we prepare the way to call select_task_rq() without holding rq->lock.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152728.990364093@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   37 ++++++++++++++++---------------------
 1 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index dece28e..d398f2f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2340,7 +2340,7 @@ EXPORT_SYMBOL_GPL(kick_process);
 
 #ifdef CONFIG_SMP
 /*
- * ->cpus_allowed is protected by either TASK_WAKING or rq->lock held.
+ * ->cpus_allowed is protected by both rq->lock and p->pi_lock
  */
 static int select_fallback_rq(int cpu, struct task_struct *p)
 {
@@ -2373,7 +2373,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 }
 
 /*
- * The caller (fork, wakeup) owns TASK_WAKING, ->cpus_allowed is stable.
+ * The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable.
  */
 static inline
 int select_task_rq(struct rq *rq, struct task_struct *p, int sd_flags, int wake_flags)
@@ -2499,7 +2499,8 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 	this_cpu = get_cpu();
 
 	smp_wmb();
-	rq = task_rq_lock(p, &flags);
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	rq = __task_rq_lock(p);
 	if (!(p->state & state))
 		goto out;
 
@@ -2557,7 +2558,8 @@ out_running:
 	ttwu_stat(rq, p, cpu, wake_flags);
 	success = 1;
 out:
-	task_rq_unlock(rq, &flags);
+	__task_rq_unlock(rq);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 	put_cpu();
 
 	return success;
@@ -4694,6 +4696,8 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
 
 	BUG_ON(prio < 0 || prio > MAX_PRIO);
 
+	lockdep_assert_held(&p->pi_lock);
+
 	rq = task_rq_lock(p, &flags);
 
 	trace_sched_pi_setprio(p, prio);
@@ -5317,7 +5321,6 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
 {
 	struct task_struct *p;
 	unsigned long flags;
-	struct rq *rq;
 	int retval;
 
 	get_online_cpus();
@@ -5332,9 +5335,9 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
 	if (retval)
 		goto out_unlock;
 
-	rq = task_rq_lock(p, &flags);
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	cpumask_and(mask, &p->cpus_allowed, cpu_online_mask);
-	task_rq_unlock(rq, &flags);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 
 out_unlock:
 	rcu_read_unlock();
@@ -5882,18 +5885,8 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 	unsigned int dest_cpu;
 	int ret = 0;
 
-	/*
-	 * Serialize against TASK_WAKING so that ttwu() and wunt() can
-	 * drop the rq->lock and still rely on ->cpus_allowed.
-	 */
-again:
-	while (task_is_waking(p))
-		cpu_relax();
-	rq = task_rq_lock(p, &flags);
-	if (task_is_waking(p)) {
-		task_rq_unlock(rq, &flags);
-		goto again;
-	}
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	rq = __task_rq_lock(p);
 
 	if (!cpumask_intersects(new_mask, cpu_active_mask)) {
 		ret = -EINVAL;
@@ -5921,13 +5914,15 @@ again:
 	if (migrate_task(p, rq)) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
-		task_rq_unlock(rq, &flags);
+		__task_rq_unlock(rq);
+		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
 		tlb_migrate_finish(p->mm);
 		return 0;
 	}
 out:
-	task_rq_unlock(rq, &flags);
+	__task_rq_unlock(rq);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 
 	return ret;
 }


* [tip:sched/locking] sched: Drop the rq argument to sched_class::select_task_rq()
  2011-04-05 15:23 ` [PATCH 08/21] sched: Drop the rq argument to sched_class::select_task_rq() Peter Zijlstra
@ 2011-04-14  8:34   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  7608dec2ce2004c234339bef8c8074e5e601d0e9
Gitweb:     http://git.kernel.org/tip/7608dec2ce2004c234339bef8c8074e5e601d0e9
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:46 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:36 +0200

sched: Drop the rq argument to sched_class::select_task_rq()

In preparation for calling select_task_rq() without rq->lock held,
drop the dependency on the rq argument.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152729.031077745@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/sched.h   |    3 +--
 kernel/sched.c          |   20 +++++++++++---------
 kernel/sched_fair.c     |    2 +-
 kernel/sched_idletask.c |    2 +-
 kernel/sched_rt.c       |   38 ++++++++++++++++++++++++++------------
 kernel/sched_stoptask.c |    3 +--
 6 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b33a700..ff4e2f9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1067,8 +1067,7 @@ struct sched_class {
 	void (*put_prev_task) (struct rq *rq, struct task_struct *p);
 
 #ifdef CONFIG_SMP
-	int  (*select_task_rq)(struct rq *rq, struct task_struct *p,
-			       int sd_flag, int flags);
+	int  (*select_task_rq)(struct task_struct *p, int sd_flag, int flags);
 
 	void (*pre_schedule) (struct rq *this_rq, struct task_struct *task);
 	void (*post_schedule) (struct rq *this_rq);
diff --git a/kernel/sched.c b/kernel/sched.c
index d398f2f..d4b815d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2195,13 +2195,15 @@ static int migration_cpu_stop(void *data);
  * The task's runqueue lock must be held.
  * Returns true if you have to wait for migration thread.
  */
-static bool migrate_task(struct task_struct *p, struct rq *rq)
+static bool need_migrate_task(struct task_struct *p)
 {
 	/*
 	 * If the task is not on a runqueue (and not running), then
 	 * the next wake-up will properly place the task.
 	 */
-	return p->on_rq || task_running(rq, p);
+	bool running = p->on_rq || p->on_cpu;
+	smp_rmb(); /* finish_lock_switch() */
+	return running;
 }
 
 /*
@@ -2376,9 +2378,9 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
  * The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable.
  */
 static inline
-int select_task_rq(struct rq *rq, struct task_struct *p, int sd_flags, int wake_flags)
+int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
 {
-	int cpu = p->sched_class->select_task_rq(rq, p, sd_flags, wake_flags);
+	int cpu = p->sched_class->select_task_rq(p, sd_flags, wake_flags);
 
 	/*
 	 * In order not to call set_task_cpu() on a blocking task we need
@@ -2533,7 +2535,7 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 		en_flags |= ENQUEUE_WAKING;
 	}
 
-	cpu = select_task_rq(rq, p, SD_BALANCE_WAKE, wake_flags);
+	cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
 	if (cpu != orig_cpu)
 		set_task_cpu(p, cpu);
 	__task_rq_unlock(rq);
@@ -2744,7 +2746,7 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
 	 * We set TASK_WAKING so that select_task_rq() can drop rq->lock
 	 * without people poking at ->cpus_allowed.
 	 */
-	cpu = select_task_rq(rq, p, SD_BALANCE_FORK, 0);
+	cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
 	set_task_cpu(p, cpu);
 
 	p->state = TASK_RUNNING;
@@ -3474,7 +3476,7 @@ void sched_exec(void)
 	int dest_cpu;
 
 	rq = task_rq_lock(p, &flags);
-	dest_cpu = p->sched_class->select_task_rq(rq, p, SD_BALANCE_EXEC, 0);
+	dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0);
 	if (dest_cpu == smp_processor_id())
 		goto unlock;
 
@@ -3482,7 +3484,7 @@ void sched_exec(void)
 	 * select_task_rq() can race against ->cpus_allowed
 	 */
 	if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed) &&
-	    likely(cpu_active(dest_cpu)) && migrate_task(p, rq)) {
+	    likely(cpu_active(dest_cpu)) && need_migrate_task(p)) {
 		struct migration_arg arg = { p, dest_cpu };
 
 		task_rq_unlock(rq, &flags);
@@ -5911,7 +5913,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 		goto out;
 
 	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
-	if (migrate_task(p, rq)) {
+	if (need_migrate_task(p)) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
 		__task_rq_unlock(rq);
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 4ee50f0..96b2c95 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1657,7 +1657,7 @@ static int select_idle_sibling(struct task_struct *p, int target)
  * preempt must be disabled.
  */
 static int
-select_task_rq_fair(struct rq *rq, struct task_struct *p, int sd_flag, int wake_flags)
+select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 {
 	struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL;
 	int cpu = smp_processor_id();
diff --git a/kernel/sched_idletask.c b/kernel/sched_idletask.c
index a776a63..0a51882 100644
--- a/kernel/sched_idletask.c
+++ b/kernel/sched_idletask.c
@@ -7,7 +7,7 @@
 
 #ifdef CONFIG_SMP
 static int
-select_task_rq_idle(struct rq *rq, struct task_struct *p, int sd_flag, int flags)
+select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
 {
 	return task_cpu(p); /* IDLE tasks as never migrated */
 }
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 9ca4f5f..19ecb31 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -977,13 +977,23 @@ static void yield_task_rt(struct rq *rq)
 static int find_lowest_rq(struct task_struct *task);
 
 static int
-select_task_rq_rt(struct rq *rq, struct task_struct *p, int sd_flag, int flags)
+select_task_rq_rt(struct task_struct *p, int sd_flag, int flags)
 {
+	struct task_struct *curr;
+	struct rq *rq;
+	int cpu;
+
 	if (sd_flag != SD_BALANCE_WAKE)
 		return smp_processor_id();
 
+	cpu = task_cpu(p);
+	rq = cpu_rq(cpu);
+
+	rcu_read_lock();
+	curr = ACCESS_ONCE(rq->curr); /* unlocked access */
+
 	/*
-	 * If the current task is an RT task, then
+	 * If the current task on @p's runqueue is an RT task, then
 	 * try to see if we can wake this RT task up on another
 	 * runqueue. Otherwise simply start this RT task
 	 * on its current runqueue.
@@ -997,21 +1007,25 @@ select_task_rq_rt(struct rq *rq, struct task_struct *p, int sd_flag, int flags)
 	 * lock?
 	 *
 	 * For equal prio tasks, we just let the scheduler sort it out.
+	 *
+	 * Otherwise, just let it ride on the affined RQ and the
+	 * post-schedule router will push the preempted task away
+	 *
+	 * This test is optimistic, if we get it wrong the load-balancer
+	 * will have to sort it out.
 	 */
-	if (unlikely(rt_task(rq->curr)) &&
-	    (rq->curr->rt.nr_cpus_allowed < 2 ||
-	     rq->curr->prio < p->prio) &&
+	if (curr && unlikely(rt_task(curr)) &&
+	    (curr->rt.nr_cpus_allowed < 2 ||
+	     curr->prio < p->prio) &&
 	    (p->rt.nr_cpus_allowed > 1)) {
-		int cpu = find_lowest_rq(p);
+		int target = find_lowest_rq(p);
 
-		return (cpu == -1) ? task_cpu(p) : cpu;
+		if (target != -1)
+			cpu = target;
 	}
+	rcu_read_unlock();
 
-	/*
-	 * Otherwise, just let it ride on the affined RQ and the
-	 * post-schedule router will push the preempted task away
-	 */
-	return task_cpu(p);
+	return cpu;
 }
 
 static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
diff --git a/kernel/sched_stoptask.c b/kernel/sched_stoptask.c
index f607de4..6f43763 100644
--- a/kernel/sched_stoptask.c
+++ b/kernel/sched_stoptask.c
@@ -9,8 +9,7 @@
 
 #ifdef CONFIG_SMP
 static int
-select_task_rq_stop(struct rq *rq, struct task_struct *p,
-		    int sd_flag, int flags)
+select_task_rq_stop(struct task_struct *p, int sd_flag, int flags)
 {
 	return task_cpu(p); /* stop tasks as never migrate */
 }

^ permalink raw reply	[flat|nested] 152+ messages in thread

* [tip:sched/locking] sched: Remove rq argument to sched_class::task_waking()
  2011-04-05 15:23 ` [PATCH 09/21] sched: Remove rq argument to sched_class::task_waking() Peter Zijlstra
@ 2011-04-14  8:35   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  74f8e4b2335de45485b8d5b31a504747f13c8070
Gitweb:     http://git.kernel.org/tip/74f8e4b2335de45485b8d5b31a504747f13c8070
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:47 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:36 +0200

sched: Remove rq argument to sched_class::task_waking()

In preparation for calling this without rq->lock held, remove the
dependency on the rq argument.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152729.071474242@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/sched.h |   10 +++++++---
 kernel/sched.c        |    2 +-
 kernel/sched_fair.c   |    4 +++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ff4e2f9..7f5732f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1048,8 +1048,12 @@ struct sched_domain;
 #define WF_FORK		0x02		/* child wakeup after fork */
 
 #define ENQUEUE_WAKEUP		1
-#define ENQUEUE_WAKING		2
-#define ENQUEUE_HEAD		4
+#define ENQUEUE_HEAD		2
+#ifdef CONFIG_SMP
+#define ENQUEUE_WAKING		4	/* sched_class::task_waking was called */
+#else
+#define ENQUEUE_WAKING		0
+#endif
 
 #define DEQUEUE_SLEEP		1
 
@@ -1071,7 +1075,7 @@ struct sched_class {
 
 	void (*pre_schedule) (struct rq *this_rq, struct task_struct *task);
 	void (*post_schedule) (struct rq *this_rq);
-	void (*task_waking) (struct rq *this_rq, struct task_struct *task);
+	void (*task_waking) (struct task_struct *task);
 	void (*task_woken) (struct rq *this_rq, struct task_struct *task);
 
 	void (*set_cpus_allowed)(struct task_struct *p,
diff --git a/kernel/sched.c b/kernel/sched.c
index d4b815d..46f42ca 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2531,7 +2531,7 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 	p->state = TASK_WAKING;
 
 	if (p->sched_class->task_waking) {
-		p->sched_class->task_waking(rq, p);
+		p->sched_class->task_waking(p);
 		en_flags |= ENQUEUE_WAKING;
 	}
 
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 96b2c95..ad4c414f 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1372,11 +1372,13 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 
 #ifdef CONFIG_SMP
 
-static void task_waking_fair(struct rq *rq, struct task_struct *p)
+static void task_waking_fair(struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
+	lockdep_assert_held(&task_rq(p)->lock);
+
 	se->vruntime -= cfs_rq->min_vruntime;
 }
 

^ permalink raw reply	[flat|nested] 152+ messages in thread

* [tip:sched/locking] sched: Deal with non-atomic min_vruntime reads on 32bits
  2011-04-05 15:23 ` [PATCH 10/21] sched: Deal with non-atomic min_vruntime reads on 32bits Peter Zijlstra
@ 2011-04-14  8:35   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  3fe1698b7fe05aeb063564e71e40d09f28d8e80c
Gitweb:     http://git.kernel.org/tip/3fe1698b7fe05aeb063564e71e40d09f28d8e80c
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:48 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:37 +0200

sched: Deal with non-atomic min_vruntime reads on 32bits

In order to avoid reading partially updated min_vruntime values on
32-bit, implement a seqcount-like solution.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152729.111378493@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c      |    3 +++
 kernel/sched_fair.c |   19 +++++++++++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 46f42ca..7a5eb26 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -312,6 +312,9 @@ struct cfs_rq {
 
 	u64 exec_clock;
 	u64 min_vruntime;
+#ifndef CONFIG_64BIT
+	u64 min_vruntime_copy;
+#endif
 
 	struct rb_root tasks_timeline;
 	struct rb_node *rb_leftmost;
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index ad4c414f..054cebb 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -358,6 +358,10 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 	}
 
 	cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
+#ifndef CONFIG_64BIT
+	smp_wmb();
+	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
+#endif
 }
 
 /*
@@ -1376,10 +1380,21 @@ static void task_waking_fair(struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	u64 min_vruntime;
 
-	lockdep_assert_held(&task_rq(p)->lock);
+#ifndef CONFIG_64BIT
+	u64 min_vruntime_copy;
 
-	se->vruntime -= cfs_rq->min_vruntime;
+	do {
+		min_vruntime_copy = cfs_rq->min_vruntime_copy;
+		smp_rmb();
+		min_vruntime = cfs_rq->min_vruntime;
+	} while (min_vruntime != min_vruntime_copy);
+#else
+	min_vruntime = cfs_rq->min_vruntime;
+#endif
+
+	se->vruntime -= min_vruntime;
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED

^ permalink raw reply	[flat|nested] 152+ messages in thread

* [tip:sched/locking] sched: Delay task_contributes_to_load()
  2011-04-05 15:23 ` [PATCH 11/21] sched: Delay task_contributes_to_load() Peter Zijlstra
@ 2011-04-14  8:35   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  a8e4f2eaecc9bfa4954adf79a04f4f22fddd829c
Gitweb:     http://git.kernel.org/tip/a8e4f2eaecc9bfa4954adf79a04f4f22fddd829c
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:49 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:37 +0200

sched: Delay task_contributes_to_load()

In preparation for having to call task_contributes_to_load() without
holding rq->lock, we need to store the result until we do hold it and
can update the rq accounting accordingly.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.151523907@chello.nl
---
 include/linux/sched.h |    1 +
 kernel/sched.c        |   16 ++++------------
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7f5732f..25c5031 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1273,6 +1273,7 @@ struct task_struct {
 
 	/* Revert to default priority/policy when forking */
 	unsigned sched_reset_on_fork:1;
+	unsigned sched_contributes_to_load:1;
 
 	pid_t pid;
 	pid_t tgid;
diff --git a/kernel/sched.c b/kernel/sched.c
index 7a5eb26..fd32b78 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2519,18 +2519,7 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 	if (unlikely(task_running(rq, p)))
 		goto out_activate;
 
-	/*
-	 * In order to handle concurrent wakeups and release the rq->lock
-	 * we put the task in TASK_WAKING state.
-	 *
-	 * First fix up the nr_uninterruptible count:
-	 */
-	if (task_contributes_to_load(p)) {
-		if (likely(cpu_online(orig_cpu)))
-			rq->nr_uninterruptible--;
-		else
-			this_rq()->nr_uninterruptible--;
-	}
+	p->sched_contributes_to_load = !!task_contributes_to_load(p);
 	p->state = TASK_WAKING;
 
 	if (p->sched_class->task_waking) {
@@ -2555,6 +2544,9 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 	WARN_ON(task_cpu(p) != cpu);
 	WARN_ON(p->state != TASK_WAKING);
 
+	if (p->sched_contributes_to_load)
+		rq->nr_uninterruptible--;
+
 out_activate:
 #endif /* CONFIG_SMP */
 	ttwu_activate(rq, p, en_flags);

^ permalink raw reply	[flat|nested] 152+ messages in thread

* [tip:sched/locking] sched: Also serialize ttwu_local() with p->pi_lock
  2011-04-05 15:23 ` [PATCH 12/21] sched: Also serialize ttwu_local() with p->pi_lock Peter Zijlstra
@ 2011-04-14  8:36   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  2acca55ed98ad9b9aa25e7e587ebe306c0313dc7
Gitweb:     http://git.kernel.org/tip/2acca55ed98ad9b9aa25e7e587ebe306c0313dc7
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:50 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:37 +0200

sched: Also serialize ttwu_local() with p->pi_lock

Since we now serialize ttwu() using p->pi_lock, we also need to
serialize ttwu_local() using it; otherwise, once we drop rq->lock
in ttwu(), it can race with ttwu_local().

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.192366907@chello.nl
---
 kernel/sched.c |   31 +++++++++++++++++++------------
 1 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index fd32b78..6b269b7 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2566,9 +2566,9 @@ out:
  * try_to_wake_up_local - try to wake up a local task with rq lock held
  * @p: the thread to be awakened
  *
- * Put @p on the run-queue if it's not already there.  The caller must
+ * Put @p on the run-queue if it's not already there. The caller must
  * ensure that this_rq() is locked, @p is bound to this_rq() and not
- * the current task.  this_rq() stays locked over invocation.
+ * the current task.
  */
 static void try_to_wake_up_local(struct task_struct *p)
 {
@@ -2578,14 +2578,22 @@ static void try_to_wake_up_local(struct task_struct *p)
 	BUG_ON(p == current);
 	lockdep_assert_held(&rq->lock);
 
+	if (!raw_spin_trylock(&p->pi_lock)) {
+		raw_spin_unlock(&rq->lock);
+		raw_spin_lock(&p->pi_lock);
+		raw_spin_lock(&rq->lock);
+	}
+
 	if (!(p->state & TASK_NORMAL))
-		return;
+		goto out;
 
 	if (!p->on_rq)
 		ttwu_activate(rq, p, ENQUEUE_WAKEUP);
 
 	ttwu_post_activation(p, rq, 0);
 	ttwu_stat(rq, p, smp_processor_id(), 0);
+out:
+	raw_spin_unlock(&p->pi_lock);
 }
 
 /**
@@ -4114,11 +4122,13 @@ need_resched:
 		if (unlikely(signal_pending_state(prev->state, prev))) {
 			prev->state = TASK_RUNNING;
 		} else {
+			deactivate_task(rq, prev, DEQUEUE_SLEEP);
+			prev->on_rq = 0;
+
 			/*
-			 * If a worker is going to sleep, notify and
-			 * ask workqueue whether it wants to wake up a
-			 * task to maintain concurrency.  If so, wake
-			 * up the task.
+			 * If a worker went to sleep, notify and ask workqueue
+			 * whether it wants to wake up a task to maintain
+			 * concurrency.
 			 */
 			if (prev->flags & PF_WQ_WORKER) {
 				struct task_struct *to_wakeup;
@@ -4128,12 +4138,9 @@ need_resched:
 					try_to_wake_up_local(to_wakeup);
 			}
 
-			deactivate_task(rq, prev, DEQUEUE_SLEEP);
-			prev->on_rq = 0;
-
 			/*
-			 * If we are going to sleep and we have plugged IO queued, make
-			 * sure to submit it to avoid deadlocks.
+			 * If we are going to sleep and we have plugged IO
+			 * queued, make sure to submit it to avoid deadlocks.
 			 */
 			if (blk_needs_flush_plug(prev)) {
 				raw_spin_unlock(&rq->lock);

^ permalink raw reply	[flat|nested] 152+ messages in thread

* [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-04-05 15:23 ` [PATCH 13/21] sched: Add p->pi_lock to task_rq_lock() Peter Zijlstra
@ 2011-04-14  8:36   ` tip-bot for Peter Zijlstra
  2011-06-01 13:58     ` Arne Jansen
  0 siblings, 1 reply; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  0122ec5b02f766c355b3168df53a6c038a24fa0d
Gitweb:     http://git.kernel.org/tip/0122ec5b02f766c355b3168df53a6c038a24fa0d
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:51 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:38 +0200

sched: Add p->pi_lock to task_rq_lock()

In order to be able to call set_task_cpu() while holding either
p->pi_lock or task_rq(p)->lock, we need to hold both locks in order to
stabilize task_rq().

This makes task_rq_lock() acquire both locks, and have
__task_rq_lock() validate that p->pi_lock is held. This increases the
locking overhead for most scheduler syscalls but allows reduction of
rq->lock contention for some scheduler hot paths (ttwu).

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152729.232781355@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |  103 +++++++++++++++++++++++++------------------------------
 1 files changed, 47 insertions(+), 56 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 6b269b7..f155127 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -599,7 +599,7 @@ static inline int cpu_of(struct rq *rq)
  * Return the group to which this tasks belongs.
  *
  * We use task_subsys_state_check() and extend the RCU verification
- * with lockdep_is_held(&task_rq(p)->lock) because cpu_cgroup_attach()
+ * with lockdep_is_held(&p->pi_lock) because cpu_cgroup_attach()
  * holds that lock for each task it moves into the cgroup. Therefore
  * by holding that lock, we pin the task to the current cgroup.
  */
@@ -609,7 +609,7 @@ static inline struct task_group *task_group(struct task_struct *p)
 	struct cgroup_subsys_state *css;
 
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
-			lockdep_is_held(&task_rq(p)->lock));
+			lockdep_is_held(&p->pi_lock));
 	tg = container_of(css, struct task_group, css);
 
 	return autogroup_task_group(p, tg);
@@ -924,23 +924,15 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 #endif /* __ARCH_WANT_UNLOCKED_CTXSW */
 
 /*
- * Check whether the task is waking, we use this to synchronize ->cpus_allowed
- * against ttwu().
- */
-static inline int task_is_waking(struct task_struct *p)
-{
-	return unlikely(p->state == TASK_WAKING);
-}
-
-/*
- * __task_rq_lock - lock the runqueue a given task resides on.
- * Must be called interrupts disabled.
+ * __task_rq_lock - lock the rq @p resides on.
  */
 static inline struct rq *__task_rq_lock(struct task_struct *p)
 	__acquires(rq->lock)
 {
 	struct rq *rq;
 
+	lockdep_assert_held(&p->pi_lock);
+
 	for (;;) {
 		rq = task_rq(p);
 		raw_spin_lock(&rq->lock);
@@ -951,22 +943,22 @@ static inline struct rq *__task_rq_lock(struct task_struct *p)
 }
 
 /*
- * task_rq_lock - lock the runqueue a given task resides on and disable
- * interrupts. Note the ordering: we can safely lookup the task_rq without
- * explicitly disabling preemption.
+ * task_rq_lock - lock p->pi_lock and lock the rq @p resides on.
  */
 static struct rq *task_rq_lock(struct task_struct *p, unsigned long *flags)
+	__acquires(p->pi_lock)
 	__acquires(rq->lock)
 {
 	struct rq *rq;
 
 	for (;;) {
-		local_irq_save(*flags);
+		raw_spin_lock_irqsave(&p->pi_lock, *flags);
 		rq = task_rq(p);
 		raw_spin_lock(&rq->lock);
 		if (likely(rq == task_rq(p)))
 			return rq;
-		raw_spin_unlock_irqrestore(&rq->lock, *flags);
+		raw_spin_unlock(&rq->lock);
+		raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
 	}
 }
 
@@ -976,10 +968,13 @@ static void __task_rq_unlock(struct rq *rq)
 	raw_spin_unlock(&rq->lock);
 }
 
-static inline void task_rq_unlock(struct rq *rq, unsigned long *flags)
+static inline void
+task_rq_unlock(struct rq *rq, struct task_struct *p, unsigned long *flags)
 	__releases(rq->lock)
+	__releases(p->pi_lock)
 {
-	raw_spin_unlock_irqrestore(&rq->lock, *flags);
+	raw_spin_unlock(&rq->lock);
+	raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
 }
 
 /*
@@ -2175,6 +2170,11 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	 */
 	WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
 			!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
+
+#ifdef CONFIG_LOCKDEP
+	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
+				      lockdep_is_held(&task_rq(p)->lock)));
+#endif
 #endif
 
 	trace_sched_migrate_task(p, new_cpu);
@@ -2270,7 +2270,7 @@ unsigned long wait_task_inactive(struct task_struct *p, long match_state)
 		ncsw = 0;
 		if (!match_state || p->state == match_state)
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
-		task_rq_unlock(rq, &flags);
+		task_rq_unlock(rq, p, &flags);
 
 		/*
 		 * If it changed from the expected state, bail out now.
@@ -2652,6 +2652,7 @@ static void __sched_fork(struct task_struct *p)
  */
 void sched_fork(struct task_struct *p, int clone_flags)
 {
+	unsigned long flags;
 	int cpu = get_cpu();
 
 	__sched_fork(p);
@@ -2702,9 +2703,9 @@ void sched_fork(struct task_struct *p, int clone_flags)
 	 *
 	 * Silence PROVE_RCU.
 	 */
-	rcu_read_lock();
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	set_task_cpu(p, cpu);
-	rcu_read_unlock();
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 
 #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
 	if (likely(sched_info_on()))
@@ -2753,7 +2754,7 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
 	set_task_cpu(p, cpu);
 
 	p->state = TASK_RUNNING;
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 #endif
 
 	rq = task_rq_lock(p, &flags);
@@ -2765,7 +2766,7 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
 	if (p->sched_class->task_woken)
 		p->sched_class->task_woken(rq, p);
 #endif
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 	put_cpu();
 }
 
@@ -3490,12 +3491,12 @@ void sched_exec(void)
 	    likely(cpu_active(dest_cpu)) && need_migrate_task(p)) {
 		struct migration_arg arg = { p, dest_cpu };
 
-		task_rq_unlock(rq, &flags);
+		task_rq_unlock(rq, p, &flags);
 		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
 		return;
 	}
 unlock:
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 }
 
 #endif
@@ -3532,7 +3533,7 @@ unsigned long long task_delta_exec(struct task_struct *p)
 
 	rq = task_rq_lock(p, &flags);
 	ns = do_task_delta_exec(p, rq);
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 
 	return ns;
 }
@@ -3550,7 +3551,7 @@ unsigned long long task_sched_runtime(struct task_struct *p)
 
 	rq = task_rq_lock(p, &flags);
 	ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 
 	return ns;
 }
@@ -3574,7 +3575,7 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p)
 	rq = task_rq_lock(p, &flags);
 	thread_group_cputime(p, &totals);
 	ns = totals.sum_exec_runtime + do_task_delta_exec(p, rq);
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 
 	return ns;
 }
@@ -4693,16 +4694,13 @@ EXPORT_SYMBOL(sleep_on_timeout);
  */
 void rt_mutex_setprio(struct task_struct *p, int prio)
 {
-	unsigned long flags;
 	int oldprio, on_rq, running;
 	struct rq *rq;
 	const struct sched_class *prev_class;
 
 	BUG_ON(prio < 0 || prio > MAX_PRIO);
 
-	lockdep_assert_held(&p->pi_lock);
-
-	rq = task_rq_lock(p, &flags);
+	rq = __task_rq_lock(p);
 
 	trace_sched_pi_setprio(p, prio);
 	oldprio = p->prio;
@@ -4727,7 +4725,7 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
 		enqueue_task(rq, p, oldprio < prio ? ENQUEUE_HEAD : 0);
 
 	check_class_changed(rq, p, prev_class, oldprio);
-	task_rq_unlock(rq, &flags);
+	__task_rq_unlock(rq);
 }
 
 #endif
@@ -4775,7 +4773,7 @@ void set_user_nice(struct task_struct *p, long nice)
 			resched_task(rq->curr);
 	}
 out_unlock:
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 }
 EXPORT_SYMBOL(set_user_nice);
 
@@ -5003,20 +5001,17 @@ recheck:
 	/*
 	 * make sure no PI-waiters arrive (or leave) while we are
 	 * changing the priority of the task:
-	 */
-	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	/*
+	 *
 	 * To be able to change p->policy safely, the appropriate
 	 * runqueue lock must be held.
 	 */
-	rq = __task_rq_lock(p);
+	rq = task_rq_lock(p, &flags);
 
 	/*
 	 * Changing the policy of the stop threads its a very bad idea
 	 */
 	if (p == rq->stop) {
-		__task_rq_unlock(rq);
-		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		task_rq_unlock(rq, p, &flags);
 		return -EINVAL;
 	}
 
@@ -5040,8 +5035,7 @@ recheck:
 		if (rt_bandwidth_enabled() && rt_policy(policy) &&
 				task_group(p)->rt_bandwidth.rt_runtime == 0 &&
 				!task_group_is_autogroup(task_group(p))) {
-			__task_rq_unlock(rq);
-			raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+			task_rq_unlock(rq, p, &flags);
 			return -EPERM;
 		}
 	}
@@ -5050,8 +5044,7 @@ recheck:
 	/* recheck policy now with rq lock held */
 	if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
 		policy = oldpolicy = -1;
-		__task_rq_unlock(rq);
-		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		task_rq_unlock(rq, p, &flags);
 		goto recheck;
 	}
 	on_rq = p->on_rq;
@@ -5073,8 +5066,7 @@ recheck:
 		activate_task(rq, p, 0);
 
 	check_class_changed(rq, p, prev_class, oldprio);
-	__task_rq_unlock(rq);
-	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	task_rq_unlock(rq, p, &flags);
 
 	rt_mutex_adjust_pi(p);
 
@@ -5666,7 +5658,7 @@ SYSCALL_DEFINE2(sched_rr_get_interval, pid_t, pid,
 
 	rq = task_rq_lock(p, &flags);
 	time_slice = p->sched_class->get_rr_interval(rq, p);
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, p, &flags);
 
 	rcu_read_unlock();
 	jiffies_to_timespec(time_slice, &t);
@@ -5889,8 +5881,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 	unsigned int dest_cpu;
 	int ret = 0;
 
-	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	rq = __task_rq_lock(p);
+	rq = task_rq_lock(p, &flags);
 
 	if (!cpumask_intersects(new_mask, cpu_active_mask)) {
 		ret = -EINVAL;
@@ -5918,15 +5909,13 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 	if (need_migrate_task(p)) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
-		__task_rq_unlock(rq);
-		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		task_rq_unlock(rq, p, &flags);
 		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
 		tlb_migrate_finish(p->mm);
 		return 0;
 	}
 out:
-	__task_rq_unlock(rq);
-	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	task_rq_unlock(rq, p, &flags);
 
 	return ret;
 }
@@ -5954,6 +5943,7 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
 	rq_src = cpu_rq(src_cpu);
 	rq_dest = cpu_rq(dest_cpu);
 
+	raw_spin_lock(&p->pi_lock);
 	double_rq_lock(rq_src, rq_dest);
 	/* Already moved. */
 	if (task_cpu(p) != src_cpu)
@@ -5976,6 +5966,7 @@ done:
 	ret = 1;
 fail:
 	double_rq_unlock(rq_src, rq_dest);
+	raw_spin_unlock(&p->pi_lock);
 	return ret;
 }
 
@@ -8702,7 +8693,7 @@ void sched_move_task(struct task_struct *tsk)
 	if (on_rq)
 		enqueue_task(rq, tsk, 0);
 
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, tsk, &flags);
 }
 #endif /* CONFIG_CGROUP_SCHED */
 

^ permalink raw reply	[flat|nested] 152+ messages in thread

* [tip:sched/locking] sched: Drop rq->lock from first part of wake_up_new_task()
  2011-04-05 15:23 ` [PATCH 14/21] sched: Drop rq->lock from first part of wake_up_new_task() Peter Zijlstra
@ 2011-04-14  8:37   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  ab2515c4b98f7bc4fa11cad9fa0f811d63a72a26
Gitweb:     http://git.kernel.org/tip/ab2515c4b98f7bc4fa11cad9fa0f811d63a72a26
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:52 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:38 +0200

sched: Drop rq->lock from first part of wake_up_new_task()

Since p->pi_lock now protects everything needed to call
select_task_rq(), avoid the double remote rq->lock acquisition and rely
on p->pi_lock.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152729.273362517@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   17 +++--------------
 1 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index f155127..7c5494d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2736,28 +2736,18 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
 {
 	unsigned long flags;
 	struct rq *rq;
-	int cpu __maybe_unused = get_cpu();
 
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
 #ifdef CONFIG_SMP
-	rq = task_rq_lock(p, &flags);
-	p->state = TASK_WAKING;
-
 	/*
 	 * Fork balancing, do it here and not earlier because:
 	 *  - cpus_allowed can change in the fork path
 	 *  - any previously selected cpu might disappear through hotplug
-	 *
-	 * We set TASK_WAKING so that select_task_rq() can drop rq->lock
-	 * without people poking at ->cpus_allowed.
 	 */
-	cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
-	set_task_cpu(p, cpu);
-
-	p->state = TASK_RUNNING;
-	task_rq_unlock(rq, p, &flags);
+	set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0));
 #endif
 
-	rq = task_rq_lock(p, &flags);
+	rq = __task_rq_lock(p);
 	activate_task(rq, p, 0);
 	p->on_rq = 1;
 	trace_sched_wakeup_new(p, true);
@@ -2767,7 +2757,6 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
 		p->sched_class->task_woken(rq, p);
 #endif
 	task_rq_unlock(rq, p, &flags);
-	put_cpu();
 }
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS

^ permalink raw reply	[flat|nested] 152+ messages in thread

* [tip:sched/locking] sched: Drop rq->lock from sched_exec()
  2011-04-05 15:23 ` [PATCH 15/21] sched: Drop rq->lock from sched_exec() Peter Zijlstra
@ 2011-04-14  8:37   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  8f42ced974df7d5af2de4cf5ea21fe978c7e4478
Gitweb:     http://git.kernel.org/tip/8f42ced974df7d5af2de4cf5ea21fe978c7e4478
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:53 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:39 +0200

sched: Drop rq->lock from sched_exec()

Since we can now call select_task_rq() and set_task_cpu() with only
p->pi_lock held, and sched_exec() load-balancing has always been
optimistic, drop all rq->lock usage.

Oleg also noted that need_migrate_task() will always be true for
current, so don't bother calling that at all.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152729.314204889@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   15 +++++----------
 1 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 7c5494d..1be1a09 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3465,27 +3465,22 @@ void sched_exec(void)
 {
 	struct task_struct *p = current;
 	unsigned long flags;
-	struct rq *rq;
 	int dest_cpu;
 
-	rq = task_rq_lock(p, &flags);
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0);
 	if (dest_cpu == smp_processor_id())
 		goto unlock;
 
-	/*
-	 * select_task_rq() can race against ->cpus_allowed
-	 */
-	if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed) &&
-	    likely(cpu_active(dest_cpu)) && need_migrate_task(p)) {
+	if (likely(cpu_active(dest_cpu))) {
 		struct migration_arg arg = { p, dest_cpu };
 
-		task_rq_unlock(rq, p, &flags);
-		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
 		return;
 	}
 unlock:
-	task_rq_unlock(rq, p, &flags);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 }
 
 #endif


* [tip:sched/locking] sched: Remove rq->lock from the first half of ttwu()
  2011-04-05 15:23 ` [PATCH 16/21] sched: Remove rq->lock from the first half of ttwu() Peter Zijlstra
@ 2011-04-14  8:38   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  e4a52bcb9a18142d79e231b6733cabdbf2e67c1f
Gitweb:     http://git.kernel.org/tip/e4a52bcb9a18142d79e231b6733cabdbf2e67c1f
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:54 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:39 +0200

sched: Remove rq->lock from the first half of ttwu()

Currently ttwu() does two rq->lock acquisitions, once on the task's
old rq, holding it over the p->state fiddling and load-balance pass.
Then it drops the old rq->lock to acquire the new rq->lock.

By serializing ttwu(), p->sched_class and p->cpus_allowed with
p->pi_lock, we can now drop the whole first rq->lock acquisition.

The p->pi_lock serializing concurrent ttwu() calls protects p->state,
which we will set to TASK_WAKING to bridge possible p->pi_lock to
rq->lock gaps and serialize set_task_cpu() calls against
task_rq_lock().

The p->pi_lock serialization of p->sched_class allows us to call
scheduling class methods without holding the rq->lock, and the
serialization of p->cpus_allowed allows us to do the load-balancing
bits without races.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.354401150@chello.nl
---
 kernel/sched.c |   65 +++++++++++++++++++++++++++++++------------------------
 1 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 1be1a09..871dd9e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2493,69 +2493,78 @@ ttwu_post_activation(struct task_struct *p, struct rq *rq, int wake_flags)
  * Returns %true if @p was woken up, %false if it was already running
  * or @state didn't match @p's state.
  */
-static int try_to_wake_up(struct task_struct *p, unsigned int state,
-			  int wake_flags)
+static int
+try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 {
-	int cpu, orig_cpu, this_cpu, success = 0;
+	int cpu, this_cpu, success = 0;
 	unsigned long flags;
-	unsigned long en_flags = ENQUEUE_WAKEUP;
 	struct rq *rq;
 
 	this_cpu = get_cpu();
 
 	smp_wmb();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	rq = __task_rq_lock(p);
 	if (!(p->state & state))
 		goto out;
 
 	cpu = task_cpu(p);
 
-	if (p->on_rq)
-		goto out_running;
+	if (p->on_rq) {
+		rq = __task_rq_lock(p);
+		if (p->on_rq)
+			goto out_running;
+		__task_rq_unlock(rq);
+	}
 
-	orig_cpu = cpu;
 #ifdef CONFIG_SMP
-	if (unlikely(task_running(rq, p)))
-		goto out_activate;
+	while (p->on_cpu) {
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+		/*
+		 * If called from interrupt context we could have landed in the
+		 * middle of schedule(), in this case we should take care not
+		 * to spin on ->on_cpu if p is current, since that would
+		 * deadlock.
+		 */
+		if (p == current)
+			goto out_activate;
+#endif
+		cpu_relax();
+	}
+	/*
+	 * Pairs with the smp_wmb() in finish_lock_switch().
+	 */
+	smp_rmb();
 
 	p->sched_contributes_to_load = !!task_contributes_to_load(p);
 	p->state = TASK_WAKING;
 
-	if (p->sched_class->task_waking) {
+	if (p->sched_class->task_waking)
 		p->sched_class->task_waking(p);
-		en_flags |= ENQUEUE_WAKING;
-	}
 
 	cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
-	if (cpu != orig_cpu)
-		set_task_cpu(p, cpu);
-	__task_rq_unlock(rq);
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+out_activate:
+#endif
+#endif /* CONFIG_SMP */
 
 	rq = cpu_rq(cpu);
 	raw_spin_lock(&rq->lock);
 
-	/*
-	 * We migrated the task without holding either rq->lock, however
-	 * since the task is not on the task list itself, nobody else
-	 * will try and migrate the task, hence the rq should match the
-	 * cpu we just moved it to.
-	 */
-	WARN_ON(task_cpu(p) != cpu);
-	WARN_ON(p->state != TASK_WAKING);
+#ifdef CONFIG_SMP
+	if (cpu != task_cpu(p))
+		set_task_cpu(p, cpu);
 
 	if (p->sched_contributes_to_load)
 		rq->nr_uninterruptible--;
+#endif
 
-out_activate:
-#endif /* CONFIG_SMP */
-	ttwu_activate(rq, p, en_flags);
+	ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_WAKING);
 out_running:
 	ttwu_post_activation(p, rq, wake_flags);
 	ttwu_stat(rq, p, cpu, wake_flags);
 	success = 1;
-out:
 	__task_rq_unlock(rq);
+out:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 	put_cpu();
 


* [tip:sched/locking] sched: Remove rq argument from ttwu_stat()
  2011-04-05 15:23 ` [PATCH 17/21] sched: Remove rq argument from ttwu_stat() Peter Zijlstra
@ 2011-04-14  8:38   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  b84cb5df1f9ad6da3f214c638d5fb08d0c99de1f
Gitweb:     http://git.kernel.org/tip/b84cb5df1f9ad6da3f214c638d5fb08d0c99de1f
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:55 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:40 +0200

sched: Remove rq argument from ttwu_stat()

In order to call ttwu_stat() without holding rq->lock we must remove
its rq argument. Since we still need to update rq stats, account to
the local rq instead of the task's rq; this is safe because we have
IRQs disabled.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.394638826@chello.nl
---
 kernel/sched.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 871dd9e..5ec2e8b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2410,9 +2410,11 @@ static void update_avg(u64 *avg, u64 sample)
 #endif
 
 static void
-ttwu_stat(struct rq *rq, struct task_struct *p, int cpu, int wake_flags)
+ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 {
 #ifdef CONFIG_SCHEDSTATS
+	struct rq *rq = this_rq();
+
 #ifdef CONFIG_SMP
 	int this_cpu = smp_processor_id();
 
@@ -2561,9 +2563,10 @@ out_activate:
 	ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_WAKING);
 out_running:
 	ttwu_post_activation(p, rq, wake_flags);
-	ttwu_stat(rq, p, cpu, wake_flags);
 	success = 1;
 	__task_rq_unlock(rq);
+
+	ttwu_stat(p, cpu, wake_flags);
 out:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 	put_cpu();
@@ -2600,7 +2603,7 @@ static void try_to_wake_up_local(struct task_struct *p)
 		ttwu_activate(rq, p, ENQUEUE_WAKEUP);
 
 	ttwu_post_activation(p, rq, 0);
-	ttwu_stat(rq, p, smp_processor_id(), 0);
+	ttwu_stat(p, smp_processor_id(), 0);
 out:
 	raw_spin_unlock(&p->pi_lock);
 }


* [tip:sched/locking] sched: Rename ttwu_post_activation() to ttwu_do_wakeup()
  2011-04-05 15:23 ` [PATCH 18/21] sched: Rename ttwu_post_activation Peter Zijlstra
@ 2011-04-14  8:39   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  23f41eeb42ce7f6f1210904e49e84718f02cb61c
Gitweb:     http://git.kernel.org/tip/23f41eeb42ce7f6f1210904e49e84718f02cb61c
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:56 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:40 +0200

sched: Rename ttwu_post_activation() to ttwu_do_wakeup()

The ttwu_post_activation() code does the core wakeup: it sets TASK_RUNNING
and performs wakeup-preemption, so give it a more descriptive name.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.434609705@chello.nl
---
 kernel/sched.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 5ec2e8b..e309dba 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2456,8 +2456,11 @@ static void ttwu_activate(struct rq *rq, struct task_struct *p, int en_flags)
 		wq_worker_waking_up(p, cpu_of(rq));
 }
 
+/*
+ * Mark the task runnable and perform wakeup-preemption.
+ */
 static void
-ttwu_post_activation(struct task_struct *p, struct rq *rq, int wake_flags)
+ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
 {
 	trace_sched_wakeup(p, true);
 	check_preempt_curr(rq, p, wake_flags);
@@ -2562,7 +2565,7 @@ out_activate:
 
 	ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_WAKING);
 out_running:
-	ttwu_post_activation(p, rq, wake_flags);
+	ttwu_do_wakeup(rq, p, wake_flags);
 	success = 1;
 	__task_rq_unlock(rq);
 
@@ -2602,7 +2605,7 @@ static void try_to_wake_up_local(struct task_struct *p)
 	if (!p->on_rq)
 		ttwu_activate(rq, p, ENQUEUE_WAKEUP);
 
-	ttwu_post_activation(p, rq, 0);
+	ttwu_do_wakeup(rq, p, 0);
 	ttwu_stat(p, smp_processor_id(), 0);
 out:
 	raw_spin_unlock(&p->pi_lock);


* [tip:sched/locking] sched: Restructure ttwu() some more
  2011-04-05 15:23 ` [PATCH 19/21] sched: Restructure ttwu some more Peter Zijlstra
@ 2011-04-14  8:39   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  c05fbafba1c5482bee399b360288fa405415e126
Gitweb:     http://git.kernel.org/tip/c05fbafba1c5482bee399b360288fa405415e126
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:57 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:40 +0200

sched: Restructure ttwu() some more

Factor out helper functions to make the inner workings of try_to_wake_up()
more obvious; this also allows for adding remote queues.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.475848012@chello.nl
---
 kernel/sched.c |   91 +++++++++++++++++++++++++++++++++++--------------------
 1 files changed, 58 insertions(+), 33 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index e309dba..7d8b85f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2483,6 +2483,48 @@ ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
 #endif
 }
 
+static void
+ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags)
+{
+#ifdef CONFIG_SMP
+	if (p->sched_contributes_to_load)
+		rq->nr_uninterruptible--;
+#endif
+
+	ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_WAKING);
+	ttwu_do_wakeup(rq, p, wake_flags);
+}
+
+/*
+ * Called in case the task @p isn't fully descheduled from its runqueue,
+ * in this case we must do a remote wakeup. Its a 'light' wakeup though,
+ * since all we need to do is flip p->state to TASK_RUNNING, since
+ * the task is still ->on_rq.
+ */
+static int ttwu_remote(struct task_struct *p, int wake_flags)
+{
+	struct rq *rq;
+	int ret = 0;
+
+	rq = __task_rq_lock(p);
+	if (p->on_rq) {
+		ttwu_do_wakeup(rq, p, wake_flags);
+		ret = 1;
+	}
+	__task_rq_unlock(rq);
+
+	return ret;
+}
+
+static void ttwu_queue(struct task_struct *p, int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	raw_spin_lock(&rq->lock);
+	ttwu_do_activate(rq, p, 0);
+	raw_spin_unlock(&rq->lock);
+}
+
 /**
  * try_to_wake_up - wake up a thread
  * @p: the thread to be awakened
@@ -2501,27 +2543,25 @@ ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
 static int
 try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 {
-	int cpu, this_cpu, success = 0;
 	unsigned long flags;
-	struct rq *rq;
-
-	this_cpu = get_cpu();
+	int cpu, success = 0;
 
 	smp_wmb();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	if (!(p->state & state))
 		goto out;
 
+	success = 1; /* we're going to change ->state */
 	cpu = task_cpu(p);
 
-	if (p->on_rq) {
-		rq = __task_rq_lock(p);
-		if (p->on_rq)
-			goto out_running;
-		__task_rq_unlock(rq);
-	}
+	if (p->on_rq && ttwu_remote(p, wake_flags))
+		goto stat;
 
 #ifdef CONFIG_SMP
+	/*
+	 * If the owning (remote) cpu is still in the middle of schedule() with
+	 * this task as prev, wait until its done referencing the task.
+	 */
 	while (p->on_cpu) {
 #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
 		/*
@@ -2530,8 +2570,10 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		 * to spin on ->on_cpu if p is current, since that would
 		 * deadlock.
 		 */
-		if (p == current)
-			goto out_activate;
+		if (p == current) {
+			ttwu_queue(p, cpu);
+			goto stat;
+		}
 #endif
 		cpu_relax();
 	}
@@ -2547,32 +2589,15 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		p->sched_class->task_waking(p);
 
 	cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
-#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
-out_activate:
-#endif
-#endif /* CONFIG_SMP */
-
-	rq = cpu_rq(cpu);
-	raw_spin_lock(&rq->lock);
-
-#ifdef CONFIG_SMP
-	if (cpu != task_cpu(p))
+	if (task_cpu(p) != cpu)
 		set_task_cpu(p, cpu);
+#endif /* CONFIG_SMP */
 
-	if (p->sched_contributes_to_load)
-		rq->nr_uninterruptible--;
-#endif
-
-	ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_WAKING);
-out_running:
-	ttwu_do_wakeup(rq, p, wake_flags);
-	success = 1;
-	__task_rq_unlock(rq);
-
+	ttwu_queue(p, cpu);
+stat:
 	ttwu_stat(p, cpu, wake_flags);
 out:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-	put_cpu();
 
 	return success;
 }


* [tip:sched/locking] sched: Move the second half of ttwu() to the remote cpu
  2011-04-05 15:23 ` [PATCH 20/21] sched: Move the second half of ttwu() to the remote cpu Peter Zijlstra
@ 2011-04-14  8:39   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, frank.rowand, mingo

Commit-ID:  317f394160e9beb97d19a84c39b7e5eb3d7815a8
Gitweb:     http://git.kernel.org/tip/317f394160e9beb97d19a84c39b7e5eb3d7815a8
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:58 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:41 +0200

sched: Move the second half of ttwu() to the remote cpu

Now that we've removed the rq->lock requirement from the first part of
ttwu() and can compute placement without holding any rq->lock, ensure
we execute the second half of ttwu() on the actual cpu we want the
task to run on.

This avoids having to take rq->lock and doing the task enqueue
remotely, saving lots on cacheline transfers.

As measured using: http://oss.oracle.com/~mason/sembench.c

  $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
  $ echo 4096 32000 64 128 > /proc/sys/kernel/sem
  $ ./sembench -t 2048 -w 1900 -o 0

  unpatched: run time 30 seconds 647278 worker burns per second
  patched:   run time 30 seconds 816715 worker burns per second

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.515897185@chello.nl
---
 include/linux/sched.h   |    3 +-
 init/Kconfig            |    5 ++++
 kernel/sched.c          |   56 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched_features.h |    6 +++++
 4 files changed, 69 insertions(+), 1 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 25c5031..e09dafa 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1203,6 +1203,7 @@ struct task_struct {
 	int lock_depth;		/* BKL lock depth */
 
 #ifdef CONFIG_SMP
+	struct task_struct *wake_entry;
 	int on_cpu;
 #endif
 	int on_rq;
@@ -2192,7 +2193,7 @@ extern void set_task_comm(struct task_struct *tsk, char *from);
 extern char *get_task_comm(char *to, struct task_struct *tsk);
 
 #ifdef CONFIG_SMP
-static inline void scheduler_ipi(void) { }
+void scheduler_ipi(void);
 extern unsigned long wait_task_inactive(struct task_struct *, long match_state);
 #else
 static inline void scheduler_ipi(void) { }
diff --git a/init/Kconfig b/init/Kconfig
index 56240e7..32745bf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -827,6 +827,11 @@ config SCHED_AUTOGROUP
 	  desktop applications.  Task group autogeneration is currently based
 	  upon task session.
 
+config SCHED_TTWU_QUEUE
+	bool
+	depends on !SPARC32
+	default y
+
 config MM_OWNER
 	bool
 
diff --git a/kernel/sched.c b/kernel/sched.c
index 7d8b85f..9e3ede1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -556,6 +556,10 @@ struct rq {
 	unsigned int ttwu_count;
 	unsigned int ttwu_local;
 #endif
+
+#ifdef CONFIG_SMP
+	struct task_struct *wake_list;
+#endif
 };
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
@@ -2516,10 +2520,61 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 	return ret;
 }
 
+#ifdef CONFIG_SMP
+static void sched_ttwu_pending(void)
+{
+	struct rq *rq = this_rq();
+	struct task_struct *list = xchg(&rq->wake_list, NULL);
+
+	if (!list)
+		return;
+
+	raw_spin_lock(&rq->lock);
+
+	while (list) {
+		struct task_struct *p = list;
+		list = list->wake_entry;
+		ttwu_do_activate(rq, p, 0);
+	}
+
+	raw_spin_unlock(&rq->lock);
+}
+
+void scheduler_ipi(void)
+{
+	sched_ttwu_pending();
+}
+
+static void ttwu_queue_remote(struct task_struct *p, int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	struct task_struct *next = rq->wake_list;
+
+	for (;;) {
+		struct task_struct *old = next;
+
+		p->wake_entry = next;
+		next = cmpxchg(&rq->wake_list, old, p);
+		if (next == old)
+			break;
+	}
+
+	if (!next)
+		smp_send_reschedule(cpu);
+}
+#endif
+
 static void ttwu_queue(struct task_struct *p, int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 
+#if defined(CONFIG_SMP) && defined(CONFIG_SCHED_TTWU_QUEUE)
+	if (sched_feat(TTWU_QUEUE) && cpu != smp_processor_id()) {
+		ttwu_queue_remote(p, cpu);
+		return;
+	}
+#endif
+
 	raw_spin_lock(&rq->lock);
 	ttwu_do_activate(rq, p, 0);
 	raw_spin_unlock(&rq->lock);
@@ -6331,6 +6386,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_DYING:
+		sched_ttwu_pending();
 		/* Update our root-domain */
 		raw_spin_lock_irqsave(&rq->lock, flags);
 		if (rq->rd) {
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 68e69ac..be40f73 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -64,3 +64,9 @@ SCHED_FEAT(OWNER_SPIN, 1)
  * Decrement CPU power based on irq activity
  */
 SCHED_FEAT(NONIRQ_POWER, 1)
+
+/*
+ * Queue remote wakeups on the target CPU and process them
+ * using the scheduler IPI. Reduces rq->lock contention/bounces.
+ */
+SCHED_FEAT(TTWU_QUEUE, 1)


* [tip:sched/locking] sched: Remove need_migrate_task()
  2011-04-05 15:23 ` [PATCH 21/21] sched: Remove need_migrate_task() Peter Zijlstra
@ 2011-04-14  8:40   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-04-14  8:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, efault,
	npiggin, akpm, tglx, oleg, frank.rowand, mingo

Commit-ID:  bd8e7dded88a3e1c085c333f19ff31387616f71a
Gitweb:     http://git.kernel.org/tip/bd8e7dded88a3e1c085c333f19ff31387616f71a
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Tue, 5 Apr 2011 17:23:59 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 14 Apr 2011 08:52:41 +0200

sched: Remove need_migrate_task()

Oleg noticed that need_migrate_task() doesn't need the ->on_cpu check
now that ttwu() doesn't do remote enqueues for !->on_rq && ->on_cpu,
so remove the helper and replace the single instance with a direct
->on_rq test.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152729.556674812@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   17 +----------------
 1 files changed, 1 insertions(+), 16 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 9e3ede1..cd597c7 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2199,21 +2199,6 @@ struct migration_arg {
 static int migration_cpu_stop(void *data);
 
 /*
- * The task's runqueue lock must be held.
- * Returns true if you have to wait for migration thread.
- */
-static bool need_migrate_task(struct task_struct *p)
-{
-	/*
-	 * If the task is not on a runqueue (and not running), then
-	 * the next wake-up will properly place the task.
-	 */
-	bool running = p->on_rq || p->on_cpu;
-	smp_rmb(); /* finish_lock_switch() */
-	return running;
-}
-
-/*
  * wait_task_inactive - wait for a thread to unschedule.
  *
  * If @match_state is nonzero, it's the @p->state value just checked and
@@ -5985,7 +5970,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 		goto out;
 
 	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
-	if (need_migrate_task(p)) {
+	if (p->on_rq) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
 		task_rq_unlock(rq, p, &flags);


* Re: [PATCH 00/21] sched: Reduce runqueue lock contention -v6
  2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
                   ` (22 preceding siblings ...)
  2011-04-06 11:00 ` Peter Zijlstra
@ 2011-04-27 16:54 ` Dave Kleikamp
  23 siblings, 0 replies; 152+ messages in thread
From: Dave Kleikamp @ 2011-04-27 16:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Chris Mason, Frank Rowand, Ingo Molnar, Thomas Gleixner,
	Mike Galbraith, linux-kernel

On 04/05/2011 10:23 AM, Peter Zijlstra wrote:
> This patch series aims to optimize remote wakeups by moving most of the
> work of the wakeup to the remote cpu and avoid bouncing runqueue data
> structures where possible.
>
> As measured by sembench (which basically creates a wakeup storm) on my
> dual-socket westmere:
>
> $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
> $ echo 4096 32000 64 128 > /proc/sys/kernel/sem
> $ ./sembench -t 2048 -w 1900 -o 0
>
> unpatched: run time 30 seconds 647278 worker burns per second
> patched:   run time 30 seconds 816715 worker burns per second
>
> I've queued this series for .40.

Here are the results of running sembench on a 128 cpu box. In all of the
below cases, I had to use the kernel parameter idle=mwait to eliminate
spinlock contention in clockevents_notify() in the idle loop. I'll try
to track down what can be done about that later.

I took Peter's patches from the tip/sched/locking tree. I got similar
results directly from that branch, but separated them out to try to
isolate some irregular behavior that mostly went away when I added
idle=mwait. Since that branch was on top of 2.6.39-rc3, I used that
as a base.

The other patchset in play is Chris Mason's semtimedop optimization
patches. By themselves, I didn't see an improvement with Chris' patches,
but in conjunction with Peter's, they gave the best results. When
combining the patches, I removed Chris' batched wakeup patch, since it
conflicted with Peter's patchset and really isn't needed any more.

(It's been a while since Chris posted these. They are in the 
"unbreakable" git tree,
http://oss.oracle.com/git/?p=linux-2.6-unbreakable.git;a=summary ,
and ported easily to mainline. I can repost them.)

I used Chris's latest sembench, http://oss.oracle.com/~mason/sembench.c
and the command "./sembench -t 2048 -w 1900 -o 0".  I got similar
burns-per-second numbers when cranking up the parameters to
"./sembench -t 16384 -w 15000 -o 0".


2.6.38:

2048 threads, waking 1900 at a time
using ipc sem operations
main thread burns: 6549
worker burn count total 12443100 min 6068 max 6105 avg 6075
run time 30 seconds 414770 worker burns per second

2.6.39-rc3:

worker burn count total 11876900 min 5791 max 5805 avg 5799
run time 30 seconds 395896 worker burns per second

2.6.39-rc3 + mason's semtimedop patches:

worker burn count total 9988300 min 4868 max 4896 avg 4877
run time 30 seconds 332943 worker burns per second

2.6.39-rc3 + mason's patches (no batch wakeup patch):

worker burn count total 9743200 min 4750 max 4786 avg 4757
run time 30 seconds 324773 worker burns per second

2.6.39-rc3 + peterz's patches:

worker burn count total 14430500 min 7038 max 7060 avg 7046
run time 30 seconds 481016 worker burns per second

2.6.39-rc3 + mason's patches + peterz's patches:

worker burn count total 15072700 min 7348 max 7381 avg 7359
run time 30 seconds 502423 worker burns per second


* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-04-14  8:36   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
@ 2011-06-01 13:58     ` Arne Jansen
  2011-06-01 16:35       ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-01 13:58 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, a.p.zijlstra, torvalds, efault,
	npiggin, akpm, frank.rowand, tglx, mingo
  Cc: linux-tip-commits

[-- Attachment #1: Type: text/plain, Size: 12979 bytes --]

Hi,

git bisect blames this commit for a problem I have with v3.0-rc1:
If I printk large amounts of data, the machine locks up.
As the commit does not revert cleanly on top of 3.0, I haven't been
able to double check.
The test I use is simple, just add something like

for (i=0; i < 10000; ++i) printk("test %d\n", i);

and trigger it, in most cases I can see the first 10 printks before
I have to power cycle the machine (sysrq-b does not work anymore).
Attached my .config.

-Arne



On 14.04.2011 10:36, tip-bot for Peter Zijlstra wrote:
> Commit-ID:  0122ec5b02f766c355b3168df53a6c038a24fa0d
> Gitweb:     http://git.kernel.org/tip/0122ec5b02f766c355b3168df53a6c038a24fa0d
> Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
> AuthorDate: Tue, 5 Apr 2011 17:23:51 +0200
> Committer:  Ingo Molnar <mingo@elte.hu>
> CommitDate: Thu, 14 Apr 2011 08:52:38 +0200
> 
> sched: Add p->pi_lock to task_rq_lock()
> 
> In order to be able to call set_task_cpu() while either holding
> p->pi_lock or task_rq(p)->lock we need to hold both locks in order to
> stabilize task_rq().
> 
> This makes task_rq_lock() acquire both locks, and have
> __task_rq_lock() validate that p->pi_lock is held. This increases the
> locking overhead for most scheduler syscalls but allows reduction of
> rq->lock contention for some scheduler hot paths (ttwu).
> 
> Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Link: http://lkml.kernel.org/r/20110405152729.232781355@chello.nl
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  kernel/sched.c |  103 +++++++++++++++++++++++++------------------------------
>  1 files changed, 47 insertions(+), 56 deletions(-)
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 6b269b7..f155127 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -599,7 +599,7 @@ static inline int cpu_of(struct rq *rq)
>   * Return the group to which this tasks belongs.
>   *
>   * We use task_subsys_state_check() and extend the RCU verification
> - * with lockdep_is_held(&task_rq(p)->lock) because cpu_cgroup_attach()
> + * with lockdep_is_held(&p->pi_lock) because cpu_cgroup_attach()
>   * holds that lock for each task it moves into the cgroup. Therefore
>   * by holding that lock, we pin the task to the current cgroup.
>   */
> @@ -609,7 +609,7 @@ static inline struct task_group *task_group(struct task_struct *p)
>  	struct cgroup_subsys_state *css;
>  
>  	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
> -			lockdep_is_held(&task_rq(p)->lock));
> +			lockdep_is_held(&p->pi_lock));
>  	tg = container_of(css, struct task_group, css);
>  
>  	return autogroup_task_group(p, tg);
> @@ -924,23 +924,15 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
>  #endif /* __ARCH_WANT_UNLOCKED_CTXSW */
>  
>  /*
> - * Check whether the task is waking, we use this to synchronize ->cpus_allowed
> - * against ttwu().
> - */
> -static inline int task_is_waking(struct task_struct *p)
> -{
> -	return unlikely(p->state == TASK_WAKING);
> -}
> -
> -/*
> - * __task_rq_lock - lock the runqueue a given task resides on.
> - * Must be called interrupts disabled.
> + * __task_rq_lock - lock the rq @p resides on.
>   */
>  static inline struct rq *__task_rq_lock(struct task_struct *p)
>  	__acquires(rq->lock)
>  {
>  	struct rq *rq;
>  
> +	lockdep_assert_held(&p->pi_lock);
> +
>  	for (;;) {
>  		rq = task_rq(p);
>  		raw_spin_lock(&rq->lock);
> @@ -951,22 +943,22 @@ static inline struct rq *__task_rq_lock(struct task_struct *p)
>  }
>  
>  /*
> - * task_rq_lock - lock the runqueue a given task resides on and disable
> - * interrupts. Note the ordering: we can safely lookup the task_rq without
> - * explicitly disabling preemption.
> + * task_rq_lock - lock p->pi_lock and lock the rq @p resides on.
>   */
>  static struct rq *task_rq_lock(struct task_struct *p, unsigned long *flags)
> +	__acquires(p->pi_lock)
>  	__acquires(rq->lock)
>  {
>  	struct rq *rq;
>  
>  	for (;;) {
> -		local_irq_save(*flags);
> +		raw_spin_lock_irqsave(&p->pi_lock, *flags);
>  		rq = task_rq(p);
>  		raw_spin_lock(&rq->lock);
>  		if (likely(rq == task_rq(p)))
>  			return rq;
> -		raw_spin_unlock_irqrestore(&rq->lock, *flags);
> +		raw_spin_unlock(&rq->lock);
> +		raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
>  	}
>  }
>  
> @@ -976,10 +968,13 @@ static void __task_rq_unlock(struct rq *rq)
>  	raw_spin_unlock(&rq->lock);
>  }
>  
> -static inline void task_rq_unlock(struct rq *rq, unsigned long *flags)
> +static inline void
> +task_rq_unlock(struct rq *rq, struct task_struct *p, unsigned long *flags)
>  	__releases(rq->lock)
> +	__releases(p->pi_lock)
>  {
> -	raw_spin_unlock_irqrestore(&rq->lock, *flags);
> +	raw_spin_unlock(&rq->lock);
> +	raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
>  }
>  
>  /*
> @@ -2175,6 +2170,11 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>  	 */
>  	WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
>  			!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
> +
> +#ifdef CONFIG_LOCKDEP
> +	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
> +				      lockdep_is_held(&task_rq(p)->lock)));
> +#endif
>  #endif
>  
>  	trace_sched_migrate_task(p, new_cpu);
> @@ -2270,7 +2270,7 @@ unsigned long wait_task_inactive(struct task_struct *p, long match_state)
>  		ncsw = 0;
>  		if (!match_state || p->state == match_state)
>  			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
> -		task_rq_unlock(rq, &flags);
> +		task_rq_unlock(rq, p, &flags);
>  
>  		/*
>  		 * If it changed from the expected state, bail out now.
> @@ -2652,6 +2652,7 @@ static void __sched_fork(struct task_struct *p)
>   */
>  void sched_fork(struct task_struct *p, int clone_flags)
>  {
> +	unsigned long flags;
>  	int cpu = get_cpu();
>  
>  	__sched_fork(p);
> @@ -2702,9 +2703,9 @@ void sched_fork(struct task_struct *p, int clone_flags)
>  	 *
>  	 * Silence PROVE_RCU.
>  	 */
> -	rcu_read_lock();
> +	raw_spin_lock_irqsave(&p->pi_lock, flags);
>  	set_task_cpu(p, cpu);
> -	rcu_read_unlock();
> +	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
>  
>  #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
>  	if (likely(sched_info_on()))
> @@ -2753,7 +2754,7 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
>  	set_task_cpu(p, cpu);
>  
>  	p->state = TASK_RUNNING;
> -	task_rq_unlock(rq, &flags);
> +	task_rq_unlock(rq, p, &flags);
>  #endif
>  
>  	rq = task_rq_lock(p, &flags);
> @@ -2765,7 +2766,7 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
>  	if (p->sched_class->task_woken)
>  		p->sched_class->task_woken(rq, p);
>  #endif
> -	task_rq_unlock(rq, &flags);
> +	task_rq_unlock(rq, p, &flags);
>  	put_cpu();
>  }
>  
> @@ -3490,12 +3491,12 @@ void sched_exec(void)
>  	    likely(cpu_active(dest_cpu)) && need_migrate_task(p)) {
>  		struct migration_arg arg = { p, dest_cpu };
>  
> -		task_rq_unlock(rq, &flags);
> +		task_rq_unlock(rq, p, &flags);
>  		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
>  		return;
>  	}
>  unlock:
> -	task_rq_unlock(rq, &flags);
> +	task_rq_unlock(rq, p, &flags);
>  }
>  
>  #endif
> @@ -3532,7 +3533,7 @@ unsigned long long task_delta_exec(struct task_struct *p)
>  
>  	rq = task_rq_lock(p, &flags);
>  	ns = do_task_delta_exec(p, rq);
> -	task_rq_unlock(rq, &flags);
> +	task_rq_unlock(rq, p, &flags);
>  
>  	return ns;
>  }
> @@ -3550,7 +3551,7 @@ unsigned long long task_sched_runtime(struct task_struct *p)
>  
>  	rq = task_rq_lock(p, &flags);
>  	ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
> -	task_rq_unlock(rq, &flags);
> +	task_rq_unlock(rq, p, &flags);
>  
>  	return ns;
>  }
> @@ -3574,7 +3575,7 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p)
>  	rq = task_rq_lock(p, &flags);
>  	thread_group_cputime(p, &totals);
>  	ns = totals.sum_exec_runtime + do_task_delta_exec(p, rq);
> -	task_rq_unlock(rq, &flags);
> +	task_rq_unlock(rq, p, &flags);
>  
>  	return ns;
>  }
> @@ -4693,16 +4694,13 @@ EXPORT_SYMBOL(sleep_on_timeout);
>   */
>  void rt_mutex_setprio(struct task_struct *p, int prio)
>  {
> -	unsigned long flags;
>  	int oldprio, on_rq, running;
>  	struct rq *rq;
>  	const struct sched_class *prev_class;
>  
>  	BUG_ON(prio < 0 || prio > MAX_PRIO);
>  
> -	lockdep_assert_held(&p->pi_lock);
> -
> -	rq = task_rq_lock(p, &flags);
> +	rq = __task_rq_lock(p);
>  
>  	trace_sched_pi_setprio(p, prio);
>  	oldprio = p->prio;
> @@ -4727,7 +4725,7 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
>  		enqueue_task(rq, p, oldprio < prio ? ENQUEUE_HEAD : 0);
>  
>  	check_class_changed(rq, p, prev_class, oldprio);
> -	task_rq_unlock(rq, &flags);
> +	__task_rq_unlock(rq);
>  }
>  
>  #endif
> @@ -4775,7 +4773,7 @@ void set_user_nice(struct task_struct *p, long nice)
>  			resched_task(rq->curr);
>  	}
>  out_unlock:
> -	task_rq_unlock(rq, &flags);
> +	task_rq_unlock(rq, p, &flags);
>  }
>  EXPORT_SYMBOL(set_user_nice);
>  
> @@ -5003,20 +5001,17 @@ recheck:
>  	/*
>  	 * make sure no PI-waiters arrive (or leave) while we are
>  	 * changing the priority of the task:
> -	 */
> -	raw_spin_lock_irqsave(&p->pi_lock, flags);
> -	/*
> +	 *
>  	 * To be able to change p->policy safely, the appropriate
>  	 * runqueue lock must be held.
>  	 */
> -	rq = __task_rq_lock(p);
> +	rq = task_rq_lock(p, &flags);
>  
>  	/*
>  	 * Changing the policy of the stop threads its a very bad idea
>  	 */
>  	if (p == rq->stop) {
> -		__task_rq_unlock(rq);
> -		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +		task_rq_unlock(rq, p, &flags);
>  		return -EINVAL;
>  	}
>  
> @@ -5040,8 +5035,7 @@ recheck:
>  		if (rt_bandwidth_enabled() && rt_policy(policy) &&
>  				task_group(p)->rt_bandwidth.rt_runtime == 0 &&
>  				!task_group_is_autogroup(task_group(p))) {
> -			__task_rq_unlock(rq);
> -			raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +			task_rq_unlock(rq, p, &flags);
>  			return -EPERM;
>  		}
>  	}
> @@ -5050,8 +5044,7 @@ recheck:
>  	/* recheck policy now with rq lock held */
>  	if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
>  		policy = oldpolicy = -1;
> -		__task_rq_unlock(rq);
> -		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +		task_rq_unlock(rq, p, &flags);
>  		goto recheck;
>  	}
>  	on_rq = p->on_rq;
> @@ -5073,8 +5066,7 @@ recheck:
>  		activate_task(rq, p, 0);
>  
>  	check_class_changed(rq, p, prev_class, oldprio);
> -	__task_rq_unlock(rq);
> -	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +	task_rq_unlock(rq, p, &flags);
>  
>  	rt_mutex_adjust_pi(p);
>  
> @@ -5666,7 +5658,7 @@ SYSCALL_DEFINE2(sched_rr_get_interval, pid_t, pid,
>  
>  	rq = task_rq_lock(p, &flags);
>  	time_slice = p->sched_class->get_rr_interval(rq, p);
> -	task_rq_unlock(rq, &flags);
> +	task_rq_unlock(rq, p, &flags);
>  
>  	rcu_read_unlock();
>  	jiffies_to_timespec(time_slice, &t);
> @@ -5889,8 +5881,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>  	unsigned int dest_cpu;
>  	int ret = 0;
>  
> -	raw_spin_lock_irqsave(&p->pi_lock, flags);
> -	rq = __task_rq_lock(p);
> +	rq = task_rq_lock(p, &flags);
>  
>  	if (!cpumask_intersects(new_mask, cpu_active_mask)) {
>  		ret = -EINVAL;
> @@ -5918,15 +5909,13 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>  	if (need_migrate_task(p)) {
>  		struct migration_arg arg = { p, dest_cpu };
>  		/* Need help from migration thread: drop lock and wait. */
> -		__task_rq_unlock(rq);
> -		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +		task_rq_unlock(rq, p, &flags);
>  		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
>  		tlb_migrate_finish(p->mm);
>  		return 0;
>  	}
>  out:
> -	__task_rq_unlock(rq);
> -	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +	task_rq_unlock(rq, p, &flags);
>  
>  	return ret;
>  }
> @@ -5954,6 +5943,7 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
>  	rq_src = cpu_rq(src_cpu);
>  	rq_dest = cpu_rq(dest_cpu);
>  
> +	raw_spin_lock(&p->pi_lock);
>  	double_rq_lock(rq_src, rq_dest);
>  	/* Already moved. */
>  	if (task_cpu(p) != src_cpu)
> @@ -5976,6 +5966,7 @@ done:
>  	ret = 1;
>  fail:
>  	double_rq_unlock(rq_src, rq_dest);
> +	raw_spin_unlock(&p->pi_lock);
>  	return ret;
>  }
>  
> @@ -8702,7 +8693,7 @@ void sched_move_task(struct task_struct *tsk)
>  	if (on_rq)
>  		enqueue_task(rq, tsk, 0);
>  
> -	task_rq_unlock(rq, &flags);
> +	task_rq_unlock(rq, tsk, &flags);
>  }
>  #endif /* CONFIG_CGROUP_SCHED */
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


[-- Attachment #2: config --]
[-- Type: text/plain, Size: 79609 bytes --]

#
# Automatically generated make config: don't edit
# Linux/x86_64 2.6.39-rc3 Kernel Configuration
# Wed Jun  1 15:07:33 2011
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
# CONFIG_KTIME_SCALAR is not set
CONFIG_ARCH_CPU_PROBE_RELEASE=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_HAVE_IRQ_WORK=y
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_FHANDLE is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y
CONFIG_HAVE_GENERIC_HARDIRQS=y

#
# IRQ subsystem
#
CONFIG_GENERIC_HARDIRQS=y
CONFIG_HAVE_SPARSE_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=64
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CGROUP_NS is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
# CONFIG_CGROUP_MEM_RES_CTLR is not set
# CONFIG_CGROUP_PERF is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_RT_GROUP_SCHED is not set
CONFIG_BLK_CGROUP=y
CONFIG_DEBUG_BLK_CGROUP=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EXPERT is not set
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
CONFIG_PERF_COUNTERS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# CONFIG_OPROFILE is not set
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
# CONFIG_JUMP_LABEL is not set
CONFIG_OPTPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_KRETPROBES=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_BSG=y
# CONFIG_BLK_DEV_INTEGRITY is not set
# CONFIG_BLK_DEV_THROTTLING is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_CFQ_GROUP_IOSCHED is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_INLINE_SPIN_TRYLOCK is not set
# CONFIG_INLINE_SPIN_TRYLOCK_BH is not set
# CONFIG_INLINE_SPIN_LOCK is not set
# CONFIG_INLINE_SPIN_LOCK_BH is not set
# CONFIG_INLINE_SPIN_LOCK_IRQ is not set
# CONFIG_INLINE_SPIN_LOCK_IRQSAVE is not set
# CONFIG_INLINE_SPIN_UNLOCK is not set
# CONFIG_INLINE_SPIN_UNLOCK_BH is not set
# CONFIG_INLINE_SPIN_UNLOCK_IRQ is not set
# CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE is not set
# CONFIG_INLINE_READ_TRYLOCK is not set
# CONFIG_INLINE_READ_LOCK is not set
# CONFIG_INLINE_READ_LOCK_BH is not set
# CONFIG_INLINE_READ_LOCK_IRQ is not set
# CONFIG_INLINE_READ_LOCK_IRQSAVE is not set
# CONFIG_INLINE_READ_UNLOCK is not set
# CONFIG_INLINE_READ_UNLOCK_BH is not set
# CONFIG_INLINE_READ_UNLOCK_IRQ is not set
# CONFIG_INLINE_READ_UNLOCK_IRQRESTORE is not set
# CONFIG_INLINE_WRITE_TRYLOCK is not set
# CONFIG_INLINE_WRITE_LOCK is not set
# CONFIG_INLINE_WRITE_LOCK_BH is not set
# CONFIG_INLINE_WRITE_LOCK_IRQ is not set
# CONFIG_INLINE_WRITE_LOCK_IRQSAVE is not set
# CONFIG_INLINE_WRITE_UNLOCK is not set
# CONFIG_INLINE_WRITE_UNLOCK_BH is not set
# CONFIG_INLINE_WRITE_UNLOCK_IRQ is not set
# CONFIG_INLINE_WRITE_UNLOCK_IRQRESTORE is not set
# CONFIG_MUTEX_SPIN_ON_OWNER is not set
CONFIG_FREEZER=y

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_MPPARSE=y
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_VSMP is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
CONFIG_NO_BOOTMEM=y
# CONFIG_MEMTEST is not set
CONFIG_MK8=y
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_INTERNODE_CACHE_SHIFT=7
CONFIG_X86_CMPXCHG=y
CONFIG_CMPXCHG_LOCAL=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
CONFIG_CALGARY_IOMMU=y
CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y
CONFIG_AMD_IOMMU=y
CONFIG_AMD_IOMMU_STATS=y
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
CONFIG_IOMMU_API=y
# CONFIG_MAXSMP is not set
CONFIG_NR_CPUS=64
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
# CONFIG_X86_MCE_INJECT is not set
CONFIG_X86_THERMAL_VECTOR=y
# CONFIG_I8K is not set
CONFIG_MICROCODE=y
CONFIG_MICROCODE_INTEL=y
CONFIG_MICROCODE_AMD=y
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_DIRECT_GBPAGES=y
CONFIG_NUMA=y
CONFIG_AMD_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NODES_SPAN_OTHER_NODES=y
# CONFIG_NUMA_EMU is not set
CONFIG_NODES_SHIFT=6
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK=y
# CONFIG_MEMORY_HOTPLUG is not set
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=999999
# CONFIG_COMPACTION is not set
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
# CONFIG_KSM is not set
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
# CONFIG_MEMORY_FAILURE is not set
# CONFIG_TRANSPARENT_HUGEPAGE is not set
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y
CONFIG_X86_RESERVE_LOW=64
CONFIG_MTRR=y
# CONFIG_MTRR_SANITIZER is not set
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_EFI=y
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
# CONFIG_KEXEC_JUMP is not set
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_PHYSICAL_ALIGN=0x1000000
CONFIG_HOTPLUG_CPU=y
# CONFIG_COMPAT_VDSO is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y
CONFIG_USE_PERCPU_NUMA_NODE_ID=y

#
# Power management and ACPI options
#
CONFIG_ARCH_HIBERNATION_HEADER=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
# CONFIG_PM_RUNTIME is not set
CONFIG_PM=y
CONFIG_PM_DEBUG=y
# CONFIG_PM_VERBOSE is not set
# CONFIG_PM_ADVANCED_DEBUG is not set
# CONFIG_PM_TEST_SUSPEND is not set
CONFIG_CAN_PM_TRACE=y
CONFIG_PM_TRACE=y
CONFIG_PM_TRACE_RTC=y
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_PROCFS=y
CONFIG_ACPI_PROCFS_POWER=y
# CONFIG_ACPI_POWER_METER is not set
# CONFIG_ACPI_EC_DEBUGFS is not set
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
# CONFIG_ACPI_PROCESSOR_AGGREGATOR is not set
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
# CONFIG_ACPI_SBS is not set
# CONFIG_ACPI_HED is not set
# CONFIG_ACPI_APEI is not set
# CONFIG_SFI is not set

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
# CONFIG_CPU_FREQ_DEBUG is not set
# CONFIG_CPU_FREQ_STAT is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
# CONFIG_CPU_FREQ_GOV_CONSERVATIVE is not set

#
# CPUFreq processor drivers
#
# CONFIG_X86_PCC_CPUFREQ is not set
CONFIG_X86_ACPI_CPUFREQ=y
# CONFIG_X86_POWERNOW_K8 is not set
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
# CONFIG_X86_P4_CLOCKMOD is not set

#
# shared options
#
# CONFIG_X86_SPEEDSTEP_LIB is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_INTEL_IDLE is not set

#
# Memory power savings
#
# CONFIG_I7300_IDLE is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
# CONFIG_PCI_CNB20LE_QUIRK is not set
CONFIG_DMAR=y
# CONFIG_DMAR_DEFAULT_ON is not set
CONFIG_DMAR_FLOPPY_WA=y
# CONFIG_INTR_REMAP is not set
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_PCIEAER=y
CONFIG_PCIE_ECRC=y
CONFIG_PCIEAER_INJECT=y
CONFIG_PCIEASPM=y
# CONFIG_PCIEASPM_DEBUG is not set
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_STUB is not set
CONFIG_HT_IRQ=y
CONFIG_PCI_IOV=y
CONFIG_PCI_IOAPIC=y
CONFIG_ISA_DMA_API=y
CONFIG_AMD_NB=y
CONFIG_PCCARD=y
CONFIG_PCMCIA=y
CONFIG_PCMCIA_LOAD_CIS=y
CONFIG_CARDBUS=y

#
# PC-card bridges
#
CONFIG_YENTA=y
CONFIG_YENTA_O2=y
CONFIG_YENTA_RICOH=y
CONFIG_YENTA_TI=y
CONFIG_YENTA_ENE_TUNE=y
CONFIG_YENTA_TOSHIBA=y
# CONFIG_PD6729 is not set
# CONFIG_I82092 is not set
CONFIG_PCCARD_NONSTATIC=y
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_FAKE is not set
# CONFIG_HOTPLUG_PCI_ACPI is not set
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set
# CONFIG_RAPIDIO is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
# CONFIG_HAVE_AOUT is not set
CONFIG_BINFMT_MISC=y
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_KEYS_COMPAT=y
CONFIG_HAVE_TEXT_POKE_SMP=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_USER=y
CONFIG_XFRM_SUB_POLICY=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
CONFIG_XFRM_IPCOMP=y
CONFIG_NET_KEY=y
CONFIG_NET_KEY_MIGRATE=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
# CONFIG_IP_FIB_TRIE_STATS is not set
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=y
# CONFIG_NET_IPGRE_DEMUX is not set
CONFIG_IP_MROUTE=y
# CONFIG_IP_MROUTE_MULTIPLE_TABLES is not set
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_ARPD=y
CONFIG_SYN_COOKIES=y
CONFIG_INET_AH=y
CONFIG_INET_ESP=y
CONFIG_INET_IPCOMP=y
CONFIG_INET_XFRM_TUNNEL=y
CONFIG_INET_TUNNEL=y
CONFIG_INET_XFRM_MODE_TRANSPORT=y
CONFIG_INET_XFRM_MODE_TUNNEL=y
CONFIG_INET_XFRM_MODE_BEET=y
CONFIG_INET_LRO=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_TCP_CONG_ADVANCED=y
# CONFIG_TCP_CONG_BIC is not set
CONFIG_TCP_CONG_CUBIC=y
# CONFIG_TCP_CONG_WESTWOOD is not set
# CONFIG_TCP_CONG_HTCP is not set
# CONFIG_TCP_CONG_HSTCP is not set
# CONFIG_TCP_CONG_HYBLA is not set
# CONFIG_TCP_CONG_VEGAS is not set
# CONFIG_TCP_CONG_SCALABLE is not set
# CONFIG_TCP_CONG_LP is not set
# CONFIG_TCP_CONG_VENO is not set
# CONFIG_TCP_CONG_YEAH is not set
# CONFIG_TCP_CONG_ILLINOIS is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6=y
CONFIG_IPV6_PRIVACY=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
CONFIG_INET6_AH=y
CONFIG_INET6_ESP=y
CONFIG_INET6_IPCOMP=y
CONFIG_IPV6_MIP6=y
CONFIG_INET6_XFRM_TUNNEL=y
CONFIG_INET6_TUNNEL=y
CONFIG_INET6_XFRM_MODE_TRANSPORT=y
CONFIG_INET6_XFRM_MODE_TUNNEL=y
CONFIG_INET6_XFRM_MODE_BEET=y
CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=y
CONFIG_IPV6_SIT=y
# CONFIG_IPV6_SIT_6RD is not set
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=y
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
CONFIG_IPV6_MROUTE=y
CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=y
CONFIG_IPV6_PIMSM_V2=y
CONFIG_NETLABEL=y
CONFIG_NETWORK_SECMARK=y
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=y

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_NETLINK=y
CONFIG_NETFILTER_NETLINK_QUEUE=y
CONFIG_NETFILTER_NETLINK_LOG=y
CONFIG_NF_CONNTRACK=y
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_SECMARK=y
# CONFIG_NF_CONNTRACK_ZONES is not set
CONFIG_NF_CONNTRACK_EVENTS=y
# CONFIG_NF_CONNTRACK_TIMESTAMP is not set
# CONFIG_NF_CT_PROTO_DCCP is not set
CONFIG_NF_CT_PROTO_GRE=y
CONFIG_NF_CT_PROTO_SCTP=y
CONFIG_NF_CT_PROTO_UDPLITE=y
# CONFIG_NF_CONNTRACK_AMANDA is not set
CONFIG_NF_CONNTRACK_FTP=y
CONFIG_NF_CONNTRACK_H323=y
CONFIG_NF_CONNTRACK_IRC=y
CONFIG_NF_CONNTRACK_BROADCAST=y
CONFIG_NF_CONNTRACK_NETBIOS_NS=y
# CONFIG_NF_CONNTRACK_SNMP is not set
CONFIG_NF_CONNTRACK_PPTP=y
CONFIG_NF_CONNTRACK_SANE=y
CONFIG_NF_CONNTRACK_SIP=y
CONFIG_NF_CONNTRACK_TFTP=y
CONFIG_NF_CT_NETLINK=y
CONFIG_NETFILTER_TPROXY=y
CONFIG_NETFILTER_XTABLES=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=y
CONFIG_NETFILTER_XT_CONNMARK=y

#
# Xtables targets
#
# CONFIG_NETFILTER_XT_TARGET_AUDIT is not set
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=y
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
CONFIG_NETFILTER_XT_TARGET_CT=y
CONFIG_NETFILTER_XT_TARGET_DSCP=y
CONFIG_NETFILTER_XT_TARGET_HL=y
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
CONFIG_NETFILTER_XT_TARGET_LED=y
CONFIG_NETFILTER_XT_TARGET_MARK=y
CONFIG_NETFILTER_XT_TARGET_NFLOG=y
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
CONFIG_NETFILTER_XT_TARGET_NOTRACK=y
CONFIG_NETFILTER_XT_TARGET_RATEEST=y
CONFIG_NETFILTER_XT_TARGET_TEE=y
CONFIG_NETFILTER_XT_TARGET_TPROXY=y
CONFIG_NETFILTER_XT_TARGET_TRACE=y
CONFIG_NETFILTER_XT_TARGET_SECMARK=y
CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=y

#
# Xtables matches
#
# CONFIG_NETFILTER_XT_MATCH_ADDRTYPE is not set
CONFIG_NETFILTER_XT_MATCH_CLUSTER=y
CONFIG_NETFILTER_XT_MATCH_COMMENT=y
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=y
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
CONFIG_NETFILTER_XT_MATCH_CPU=y
CONFIG_NETFILTER_XT_MATCH_DCCP=y
# CONFIG_NETFILTER_XT_MATCH_DEVGROUP is not set
CONFIG_NETFILTER_XT_MATCH_DSCP=y
CONFIG_NETFILTER_XT_MATCH_ESP=y
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
CONFIG_NETFILTER_XT_MATCH_HELPER=y
CONFIG_NETFILTER_XT_MATCH_HL=y
CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
CONFIG_NETFILTER_XT_MATCH_LENGTH=y
CONFIG_NETFILTER_XT_MATCH_LIMIT=y
CONFIG_NETFILTER_XT_MATCH_MAC=y
CONFIG_NETFILTER_XT_MATCH_MARK=y
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=y
CONFIG_NETFILTER_XT_MATCH_OSF=y
CONFIG_NETFILTER_XT_MATCH_OWNER=y
CONFIG_NETFILTER_XT_MATCH_POLICY=y
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=y
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
CONFIG_NETFILTER_XT_MATCH_QUOTA=y
CONFIG_NETFILTER_XT_MATCH_RATEEST=y
CONFIG_NETFILTER_XT_MATCH_REALM=y
CONFIG_NETFILTER_XT_MATCH_RECENT=y
CONFIG_NETFILTER_XT_MATCH_SCTP=y
CONFIG_NETFILTER_XT_MATCH_SOCKET=y
CONFIG_NETFILTER_XT_MATCH_STATE=y
CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
CONFIG_NETFILTER_XT_MATCH_STRING=y
CONFIG_NETFILTER_XT_MATCH_TCPMSS=y
CONFIG_NETFILTER_XT_MATCH_TIME=y
CONFIG_NETFILTER_XT_MATCH_U32=y
# CONFIG_IP_SET is not set
# CONFIG_IP_VS is not set

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=y
CONFIG_NF_CONNTRACK_IPV4=y
CONFIG_NF_CONNTRACK_PROC_COMPAT=y
CONFIG_IP_NF_QUEUE=y
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_AH=y
CONFIG_IP_NF_MATCH_ECN=y
CONFIG_IP_NF_MATCH_TTL=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_TARGET_REJECT=y
CONFIG_IP_NF_TARGET_LOG=y
CONFIG_IP_NF_TARGET_ULOG=y
CONFIG_NF_NAT=y
CONFIG_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=y
CONFIG_IP_NF_TARGET_NETMAP=y
CONFIG_IP_NF_TARGET_REDIRECT=y
CONFIG_NF_NAT_PROTO_GRE=y
CONFIG_NF_NAT_PROTO_UDPLITE=y
CONFIG_NF_NAT_PROTO_SCTP=y
CONFIG_NF_NAT_FTP=y
CONFIG_NF_NAT_IRC=y
CONFIG_NF_NAT_TFTP=y
# CONFIG_NF_NAT_AMANDA is not set
CONFIG_NF_NAT_PPTP=y
CONFIG_NF_NAT_H323=y
CONFIG_NF_NAT_SIP=y
CONFIG_IP_NF_MANGLE=y
CONFIG_IP_NF_TARGET_CLUSTERIP=y
CONFIG_IP_NF_TARGET_ECN=y
CONFIG_IP_NF_TARGET_TTL=y
CONFIG_IP_NF_RAW=y
CONFIG_IP_NF_SECURITY=y
CONFIG_IP_NF_ARPTABLES=y
CONFIG_IP_NF_ARPFILTER=y
CONFIG_IP_NF_ARP_MANGLE=y

#
# IPv6: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV6=y
CONFIG_NF_CONNTRACK_IPV6=y
CONFIG_IP6_NF_QUEUE=y
CONFIG_IP6_NF_IPTABLES=y
CONFIG_IP6_NF_MATCH_AH=y
CONFIG_IP6_NF_MATCH_EUI64=y
CONFIG_IP6_NF_MATCH_FRAG=y
CONFIG_IP6_NF_MATCH_OPTS=y
CONFIG_IP6_NF_MATCH_HL=y
CONFIG_IP6_NF_MATCH_IPV6HEADER=y
CONFIG_IP6_NF_MATCH_MH=y
CONFIG_IP6_NF_MATCH_RT=y
CONFIG_IP6_NF_TARGET_HL=y
CONFIG_IP6_NF_TARGET_LOG=y
CONFIG_IP6_NF_FILTER=y
CONFIG_IP6_NF_TARGET_REJECT=y
CONFIG_IP6_NF_MANGLE=y
CONFIG_IP6_NF_RAW=y
CONFIG_IP6_NF_SECURITY=y
CONFIG_BRIDGE_NF_EBTABLES=y
CONFIG_BRIDGE_EBT_BROUTE=y
CONFIG_BRIDGE_EBT_T_FILTER=y
CONFIG_BRIDGE_EBT_T_NAT=y
CONFIG_BRIDGE_EBT_802_3=y
CONFIG_BRIDGE_EBT_AMONG=y
CONFIG_BRIDGE_EBT_ARP=y
CONFIG_BRIDGE_EBT_IP=y
CONFIG_BRIDGE_EBT_IP6=y
CONFIG_BRIDGE_EBT_LIMIT=y
CONFIG_BRIDGE_EBT_MARK=y
CONFIG_BRIDGE_EBT_PKTTYPE=y
CONFIG_BRIDGE_EBT_STP=y
CONFIG_BRIDGE_EBT_VLAN=y
CONFIG_BRIDGE_EBT_ARPREPLY=y
CONFIG_BRIDGE_EBT_DNAT=y
CONFIG_BRIDGE_EBT_MARK_T=y
CONFIG_BRIDGE_EBT_REDIRECT=y
CONFIG_BRIDGE_EBT_SNAT=y
CONFIG_BRIDGE_EBT_LOG=y
CONFIG_BRIDGE_EBT_ULOG=y
CONFIG_BRIDGE_EBT_NFLOG=y
# CONFIG_IP_DCCP is not set
CONFIG_IP_SCTP=y
# CONFIG_NET_SCTPPROBE is not set
# CONFIG_SCTP_DBG_MSG is not set
# CONFIG_SCTP_DBG_OBJCNT is not set
# CONFIG_SCTP_HMAC_NONE is not set
# CONFIG_SCTP_HMAC_SHA1 is not set
CONFIG_SCTP_HMAC_MD5=y
# CONFIG_RDS is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_L2TP is not set
CONFIG_STP=y
CONFIG_GARP=y
CONFIG_BRIDGE=y
CONFIG_BRIDGE_IGMP_SNOOPING=y
# CONFIG_NET_DSA is not set
CONFIG_VLAN_8021Q=y
CONFIG_VLAN_8021Q_GVRP=y
# CONFIG_DECNET is not set
CONFIG_LLC=y
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_PHONET is not set
# CONFIG_IEEE802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
# CONFIG_NET_SCH_CBQ is not set
# CONFIG_NET_SCH_HTB is not set
# CONFIG_NET_SCH_HFSC is not set
# CONFIG_NET_SCH_PRIO is not set
# CONFIG_NET_SCH_MULTIQ is not set
# CONFIG_NET_SCH_RED is not set
# CONFIG_NET_SCH_SFB is not set
# CONFIG_NET_SCH_SFQ is not set
# CONFIG_NET_SCH_TEQL is not set
# CONFIG_NET_SCH_TBF is not set
# CONFIG_NET_SCH_GRED is not set
# CONFIG_NET_SCH_DSMARK is not set
# CONFIG_NET_SCH_NETEM is not set
# CONFIG_NET_SCH_DRR is not set
# CONFIG_NET_SCH_MQPRIO is not set
# CONFIG_NET_SCH_CHOKE is not set
# CONFIG_NET_SCH_INGRESS is not set

#
# Classification
#
CONFIG_NET_CLS=y
# CONFIG_NET_CLS_BASIC is not set
# CONFIG_NET_CLS_TCINDEX is not set
# CONFIG_NET_CLS_ROUTE4 is not set
# CONFIG_NET_CLS_FW is not set
# CONFIG_NET_CLS_U32 is not set
# CONFIG_NET_CLS_RSVP is not set
# CONFIG_NET_CLS_RSVP6 is not set
# CONFIG_NET_CLS_FLOW is not set
# CONFIG_NET_CLS_CGROUP is not set
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
# CONFIG_NET_EMATCH_CMP is not set
# CONFIG_NET_EMATCH_NBYTE is not set
# CONFIG_NET_EMATCH_U32 is not set
# CONFIG_NET_EMATCH_META is not set
# CONFIG_NET_EMATCH_TEXT is not set
CONFIG_NET_CLS_ACT=y
# CONFIG_NET_ACT_POLICE is not set
# CONFIG_NET_ACT_GACT is not set
# CONFIG_NET_ACT_MIRRED is not set
# CONFIG_NET_ACT_IPT is not set
# CONFIG_NET_ACT_NAT is not set
# CONFIG_NET_ACT_PEDIT is not set
# CONFIG_NET_ACT_SIMP is not set
# CONFIG_NET_ACT_SKBEDIT is not set
# CONFIG_NET_ACT_CSUM is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=y
# CONFIG_BATMAN_ADV is not set
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_XPS=y

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_TCPPROBE is not set
# CONFIG_NET_DROP_MONITOR is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y
CONFIG_WIRELESS=y
# CONFIG_CFG80211 is not set
# CONFIG_LIB80211 is not set

#
# CFG80211 needs to be enabled for MAC80211
#
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
# CONFIG_CEPH_LIB is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
# CONFIG_DEVTMPFS is not set
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
CONFIG_DEBUG_DEVRES=y
# CONFIG_SYS_HYPERVISOR is not set
CONFIG_ARCH_NO_SYSDEV_OPS=y
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
# CONFIG_MTD is not set
# CONFIG_PARPORT is not set
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_DRBD is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
# CONFIG_BLK_DEV_XIP is not set
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set
# CONFIG_VIRTIO_BLK is not set
# CONFIG_BLK_DEV_HD is not set
# CONFIG_BLK_DEV_RBD is not set
# CONFIG_SENSORS_LIS3LV02D is not set
CONFIG_MISC_DEVICES=y
# CONFIG_AD525X_DPOT is not set
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ICS932S401 is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_CS5535_MFGPT is not set
# CONFIG_HP_ILO is not set
# CONFIG_APDS9802ALS is not set
# CONFIG_ISL29003 is not set
# CONFIG_ISL29020 is not set
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_SENSORS_BH1780 is not set
# CONFIG_SENSORS_BH1770 is not set
# CONFIG_SENSORS_APDS990X is not set
# CONFIG_HMC6352 is not set
# CONFIG_DS1682 is not set
# CONFIG_VMWARE_BALLOON is not set
# CONFIG_BMP085 is not set
# CONFIG_PCH_PHUB is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_LEGACY is not set
# CONFIG_EEPROM_MAX6875 is not set
# CONFIG_EEPROM_93CX6 is not set
# CONFIG_CB710_CORE is not set

#
# Texas Instruments shared transport line discipline
#
# CONFIG_SENSORS_LIS3_I2C is not set
CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
# CONFIG_SCSI_TGT is not set
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=y
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=y
# CONFIG_CHR_DEV_SCH is not set
# CONFIG_SCSI_MULTI_LUN is not set
CONFIG_SCSI_CONSTANTS=y
# CONFIG_SCSI_LOGGING is not set
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=m
# CONFIG_SCSI_ISCSI_ATTRS is not set
CONFIG_SCSI_SAS_ATTRS=m
# CONFIG_SCSI_SAS_LIBSAS is not set
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_SCSI_BNX2X_FCOE is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
CONFIG_SCSI_MPT2SAS=m
CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT2SAS_LOGGING=y
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_VMWARE_PVSCSI is not set
# CONFIG_LIBFC is not set
# CONFIG_LIBFCOE is not set
# CONFIG_FCOE is not set
# CONFIG_FCOE_FNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_SCSI_BFA_FC is not set
# CONFIG_SCSI_LOWLEVEL_PCMCIA is not set
# CONFIG_SCSI_DH is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
CONFIG_ATA=y
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_ACPI=y
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
CONFIG_SATA_AHCI=y
# CONFIG_SATA_AHCI_PLATFORM is not set
CONFIG_SATA_INIC162X=y
# CONFIG_SATA_ACARD_AHCI is not set
CONFIG_SATA_SIL24=y
CONFIG_ATA_SFF=y

#
# SFF controllers with custom DMA interface
#
CONFIG_PDC_ADMA=y
CONFIG_SATA_QSTOR=y
CONFIG_SATA_SX4=y
CONFIG_ATA_BMDMA=y

#
# SATA SFF controllers with BMDMA
#
CONFIG_ATA_PIIX=y
CONFIG_SATA_MV=y
CONFIG_SATA_NV=y
CONFIG_SATA_PROMISE=y
CONFIG_SATA_SIL=y
CONFIG_SATA_SIS=y
CONFIG_SATA_SVW=y
CONFIG_SATA_ULI=y
CONFIG_SATA_VIA=y
CONFIG_SATA_VITESSE=y

#
# PATA SFF controllers with BMDMA
#
# CONFIG_PATA_ALI is not set
CONFIG_PATA_AMD=y
# CONFIG_PATA_ARASAN_CF is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_ATP867X is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CS5520 is not set
# CONFIG_PATA_CS5530 is not set
# CONFIG_PATA_CS5536 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87415 is not set
CONFIG_PATA_OLDPIIX=y
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RDC is not set
# CONFIG_PATA_SC1200 is not set
CONFIG_PATA_SCH=y
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_SIL680 is not set
CONFIG_PATA_SIS=y
# CONFIG_PATA_TOSHIBA is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set

#
# PIO-only SFF controllers
#
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_PCMCIA is not set
# CONFIG_PATA_RZ1000 is not set

#
# Generic fallback / legacy drivers
#
# CONFIG_PATA_ACPI is not set
# CONFIG_ATA_GENERIC is not set
# CONFIG_PATA_LEGACY is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
# CONFIG_MD_LINEAR is not set
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
# CONFIG_MULTICORE_RAID456 is not set
# CONFIG_MD_MULTIPATH is not set
# CONFIG_MD_FAULTY is not set
CONFIG_BLK_DEV_DM=y
# CONFIG_DM_DEBUG is not set
# CONFIG_DM_CRYPT is not set
# CONFIG_DM_SNAPSHOT is not set
CONFIG_DM_MIRROR=y
# CONFIG_DM_RAID is not set
# CONFIG_DM_LOG_USERSPACE is not set
CONFIG_DM_ZERO=y
# CONFIG_DM_MULTIPATH is not set
# CONFIG_DM_DELAY is not set
# CONFIG_DM_UEVENT is not set
# CONFIG_DM_FLAKEY is not set
# CONFIG_TARGET_CORE is not set
CONFIG_FUSION=y
CONFIG_FUSION_SPI=m
CONFIG_FUSION_FC=m
CONFIG_FUSION_SAS=m
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_CTL=m
CONFIG_FUSION_LOGGING=y

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_FIREWIRE_NOSY is not set
# CONFIG_I2O is not set
CONFIG_MACINTOSH_DRIVERS=y
CONFIG_MAC_EMUMOUSEBTN=y
CONFIG_NETDEVICES=y
# CONFIG_IFB is not set
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
CONFIG_MACVLAN=y
# CONFIG_MACVTAP is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=y
# CONFIG_VETH is not set
# CONFIG_NET_SB1000 is not set
# CONFIG_ARCNET is not set
# CONFIG_MII is not set
CONFIG_PHYLIB=y

#
# MII PHY device drivers
#
# CONFIG_MARVELL_PHY is not set
# CONFIG_DAVICOM_PHY is not set
# CONFIG_QSEMI_PHY is not set
# CONFIG_LXT_PHY is not set
# CONFIG_CICADA_PHY is not set
# CONFIG_VITESSE_PHY is not set
# CONFIG_SMSC_PHY is not set
# CONFIG_BROADCOM_PHY is not set
# CONFIG_BCM63XX_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_REALTEK_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_MICREL_PHY is not set
# CONFIG_FIXED_PHY is not set
# CONFIG_MDIO_BITBANG is not set
# CONFIG_NET_ETHERNET is not set
CONFIG_NETDEV_1000=y
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
CONFIG_E1000=y
CONFIG_E1000E=y
# CONFIG_IP1000 is not set
# CONFIG_IGB is not set
# CONFIG_IGBVF is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_R8169 is not set
# CONFIG_SIS190 is not set
# CONFIG_SKGE is not set
CONFIG_SKY2=y
# CONFIG_SKY2_DEBUG is not set
# CONFIG_VIA_VELOCITY is not set
CONFIG_TIGON3=y
# CONFIG_BNX2 is not set
# CONFIG_CNIC is not set
# CONFIG_QLA3XXX is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_JME is not set
# CONFIG_STMMAC_ETH is not set
# CONFIG_PCH_GBE is not set
# CONFIG_NETDEV_10000 is not set
# CONFIG_TR is not set
CONFIG_WLAN=y
# CONFIG_PCMCIA_RAYCS is not set
# CONFIG_AIRO is not set
# CONFIG_ATMEL is not set
# CONFIG_AIRO_CS is not set
# CONFIG_PCMCIA_WL3501 is not set
# CONFIG_PRISM54 is not set
# CONFIG_USB_ZD1201 is not set
# CONFIG_HOSTAP is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set
# CONFIG_USB_IPHETH is not set
# CONFIG_NET_PCMCIA is not set
# CONFIG_WAN is not set

#
# CAIF transport drivers
#
CONFIG_FDDI=y
# CONFIG_DEFXX is not set
# CONFIG_SKFP is not set
# CONFIG_HIPPI is not set
CONFIG_PPP=m
CONFIG_PPP_MULTILINK=y
CONFIG_PPP_FILTER=y
CONFIG_PPP_ASYNC=m
CONFIG_PPP_SYNC_TTY=m
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_MPPE=m
CONFIG_PPPOE=m
# CONFIG_SLIP is not set
CONFIG_SLHC=m
# CONFIG_NET_FC is not set
CONFIG_NETCONSOLE=y
# CONFIG_NETCONSOLE_DYNAMIC is not set
CONFIG_NETPOLL=y
# CONFIG_NETPOLL_TRAP is not set
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_VIRTIO_NET is not set
# CONFIG_VMXNET3 is not set
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_POLLDEV=y
CONFIG_INPUT_SPARSEKMAP=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5588 is not set
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_LM8323 is not set
# CONFIG_KEYBOARD_MAX7359 is not set
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_OPENCORES is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
# CONFIG_MOUSE_PS2_SENTELIC is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_MOUSE_SYNAPTICS_I2C is not set
CONFIG_INPUT_JOYSTICK=y
# CONFIG_JOYSTICK_ANALOG is not set
# CONFIG_JOYSTICK_A3D is not set
# CONFIG_JOYSTICK_ADI is not set
# CONFIG_JOYSTICK_COBRA is not set
# CONFIG_JOYSTICK_GF2K is not set
# CONFIG_JOYSTICK_GRIP is not set
# CONFIG_JOYSTICK_GRIP_MP is not set
# CONFIG_JOYSTICK_GUILLEMOT is not set
# CONFIG_JOYSTICK_INTERACT is not set
# CONFIG_JOYSTICK_SIDEWINDER is not set
# CONFIG_JOYSTICK_TMDC is not set
# CONFIG_JOYSTICK_IFORCE is not set
# CONFIG_JOYSTICK_WARRIOR is not set
# CONFIG_JOYSTICK_MAGELLAN is not set
# CONFIG_JOYSTICK_SPACEORB is not set
# CONFIG_JOYSTICK_SPACEBALL is not set
# CONFIG_JOYSTICK_STINGER is not set
# CONFIG_JOYSTICK_TWIDJOY is not set
# CONFIG_JOYSTICK_ZHENHUA is not set
# CONFIG_JOYSTICK_AS5011 is not set
# CONFIG_JOYSTICK_JOYDUMP is not set
# CONFIG_JOYSTICK_XPAD is not set
CONFIG_INPUT_TABLET=y
# CONFIG_TABLET_USB_ACECAD is not set
# CONFIG_TABLET_USB_AIPTEK is not set
# CONFIG_TABLET_USB_GTCO is not set
# CONFIG_TABLET_USB_HANWANG is not set
# CONFIG_TABLET_USB_KBTAB is not set
# CONFIG_TABLET_USB_WACOM is not set
CONFIG_INPUT_TOUCHSCREEN=y
# CONFIG_TOUCHSCREEN_AD7879 is not set
# CONFIG_TOUCHSCREEN_ATMEL_MXT is not set
# CONFIG_TOUCHSCREEN_BU21013 is not set
# CONFIG_TOUCHSCREEN_DYNAPRO is not set
# CONFIG_TOUCHSCREEN_HAMPSHIRE is not set
# CONFIG_TOUCHSCREEN_EETI is not set
# CONFIG_TOUCHSCREEN_FUJITSU is not set
# CONFIG_TOUCHSCREEN_GUNZE is not set
# CONFIG_TOUCHSCREEN_ELO is not set
# CONFIG_TOUCHSCREEN_WACOM_W8001 is not set
# CONFIG_TOUCHSCREEN_MCS5000 is not set
# CONFIG_TOUCHSCREEN_MTOUCH is not set
# CONFIG_TOUCHSCREEN_INEXIO is not set
# CONFIG_TOUCHSCREEN_MK712 is not set
# CONFIG_TOUCHSCREEN_PENMOUNT is not set
# CONFIG_TOUCHSCREEN_TOUCHRIGHT is not set
# CONFIG_TOUCHSCREEN_TOUCHWIN is not set
# CONFIG_TOUCHSCREEN_USB_COMPOSITE is not set
# CONFIG_TOUCHSCREEN_TOUCHIT213 is not set
# CONFIG_TOUCHSCREEN_TSC2007 is not set
# CONFIG_TOUCHSCREEN_ST1232 is not set
# CONFIG_TOUCHSCREEN_TPS6507X is not set
CONFIG_INPUT_MISC=y
# CONFIG_INPUT_AD714X is not set
# CONFIG_INPUT_PCSPKR is not set
# CONFIG_INPUT_APANEL is not set
# CONFIG_INPUT_ATLAS_BTNS is not set
# CONFIG_INPUT_ATI_REMOTE is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
# CONFIG_INPUT_CM109 is not set
# CONFIG_INPUT_UINPUT is not set
# CONFIG_INPUT_PCF8574 is not set
# CONFIG_INPUT_ADXL34X is not set
# CONFIG_INPUT_CMA3000 is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_SERIO_ALTERA_PS2 is not set
# CONFIG_SERIO_PS2MULT is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
# CONFIG_LEGACY_PTYS is not set
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_ROCKETPORT is not set
# CONFIG_CYCLADES is not set
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
# CONFIG_SYNCLINK is not set
# CONFIG_SYNCLINKMP is not set
# CONFIG_SYNCLINK_GT is not set
# CONFIG_NOZOMI is not set
# CONFIG_ISI is not set
# CONFIG_N_HDLC is not set
# CONFIG_N_GSM is not set
CONFIG_DEVKMEM=y
# CONFIG_STALDRV is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
# CONFIG_SERIAL_8250_CS is not set
CONFIG_SERIAL_8250_NR_UARTS=32
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
CONFIG_SERIAL_8250_RSA=y

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_MFD_HSU is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_CONSOLE_POLL=y
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_TIMBERDALE is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_PCH_UART is not set
# CONFIG_VIRTIO_CONSOLE is not set
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
# CONFIG_HW_RANDOM_INTEL is not set
# CONFIG_HW_RANDOM_AMD is not set
CONFIG_HW_RANDOM_VIA=y
# CONFIG_HW_RANDOM_VIRTIO is not set
CONFIG_NVRAM=y
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
# CONFIG_CARDMAN_4000 is not set
# CONFIG_CARDMAN_4040 is not set
# CONFIG_IPWIRELESS is not set
# CONFIG_MWAVE is not set
# CONFIG_RAW_DRIVER is not set
CONFIG_HPET=y
# CONFIG_HPET_MMAP is not set
# CONFIG_HANGCHECK_TIMER is not set
# CONFIG_TCG_TPM is not set
# CONFIG_TELCLOCK is not set
CONFIG_DEVPORT=y
# CONFIG_RAMOOPS is not set
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
# CONFIG_I2C_CHARDEV is not set
# CONFIG_I2C_MUX is not set
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=y

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
CONFIG_I2C_I801=y
# CONFIG_I2C_ISCH is not set
# CONFIG_I2C_PIIX4 is not set
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set

#
# ACPI drivers
#
# CONFIG_I2C_SCMI is not set

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_INTEL_MID is not set
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_PXA_PCI is not set
# CONFIG_I2C_SIMTEC is not set
# CONFIG_I2C_XILINX is not set
# CONFIG_I2C_EG20T is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_DIOLAN_U2C is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_STUB is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_SPI is not set

#
# PPS support
#
# CONFIG_PPS is not set

#
# PPS generators support
#
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_TEST_POWER is not set
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_BQ20Z75 is not set
# CONFIG_BATTERY_BQ27x00 is not set
# CONFIG_BATTERY_MAX17040 is not set
# CONFIG_BATTERY_MAX17042 is not set
CONFIG_HWMON=y
# CONFIG_HWMON_VID is not set
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Native drivers
#
# CONFIG_SENSORS_ABITUGURU is not set
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7414 is not set
# CONFIG_SENSORS_AD7418 is not set
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
# CONFIG_SENSORS_ADM1029 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ADM9240 is not set
# CONFIG_SENSORS_ADT7411 is not set
# CONFIG_SENSORS_ADT7462 is not set
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7475 is not set
# CONFIG_SENSORS_ASC7621 is not set
# CONFIG_SENSORS_K8TEMP is not set
# CONFIG_SENSORS_K10TEMP is not set
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_ATXP1 is not set
# CONFIG_SENSORS_DS620 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_I5K_AMB is not set
# CONFIG_SENSORS_F71805F is not set
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
# CONFIG_SENSORS_FSCHMD is not set
# CONFIG_SENSORS_G760A is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_GL520SM is not set
# CONFIG_SENSORS_CORETEMP is not set
# CONFIG_SENSORS_PKGTEMP is not set
# CONFIG_SENSORS_IT87 is not set
# CONFIG_SENSORS_JC42 is not set
# CONFIG_SENSORS_LINEAGE is not set
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM73 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_LM92 is not set
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LTC4151 is not set
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4245 is not set
# CONFIG_SENSORS_LTC4261 is not set
# CONFIG_SENSORS_LM95241 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_MAX6639 is not set
# CONFIG_SENSORS_MAX6650 is not set
# CONFIG_SENSORS_PC87360 is not set
# CONFIG_SENSORS_PC87427 is not set
# CONFIG_SENSORS_PCF8591 is not set
# CONFIG_PMBUS is not set
# CONFIG_SENSORS_SHT21 is not set
# CONFIG_SENSORS_SIS5595 is not set
# CONFIG_SENSORS_SMM665 is not set
# CONFIG_SENSORS_DME1737 is not set
# CONFIG_SENSORS_EMC1403 is not set
# CONFIG_SENSORS_EMC2103 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
# CONFIG_SENSORS_SMSC47M192 is not set
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_SCH5627 is not set
# CONFIG_SENSORS_ADS1015 is not set
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_AMC6821 is not set
# CONFIG_SENSORS_THMC50 is not set
# CONFIG_SENSORS_TMP102 is not set
# CONFIG_SENSORS_TMP401 is not set
# CONFIG_SENSORS_TMP421 is not set
# CONFIG_SENSORS_VIA_CPUTEMP is not set
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_VT1211 is not set
# CONFIG_SENSORS_VT8231 is not set
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
# CONFIG_SENSORS_W83792D is not set
# CONFIG_SENSORS_W83793 is not set
# CONFIG_SENSORS_W83795 is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83L786NG is not set
# CONFIG_SENSORS_W83627HF is not set
# CONFIG_SENSORS_W83627EHF is not set
# CONFIG_SENSORS_APPLESMC is not set

#
# ACPI drivers
#
# CONFIG_SENSORS_ATK0110 is not set
CONFIG_THERMAL=y
# CONFIG_THERMAL_HWMON is not set
CONFIG_WATCHDOG=y
# CONFIG_WATCHDOG_NOWAYOUT is not set

#
# Watchdog Device Drivers
#
# CONFIG_SOFT_WATCHDOG is not set
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
# CONFIG_ALIM1535_WDT is not set
# CONFIG_ALIM7101_WDT is not set
# CONFIG_F71808E_WDT is not set
# CONFIG_SP5100_TCO is not set
# CONFIG_SC520_WDT is not set
# CONFIG_SBC_FITPC2_WATCHDOG is not set
# CONFIG_EUROTECH_WDT is not set
# CONFIG_IB700_WDT is not set
# CONFIG_IBMASR is not set
# CONFIG_WAFER_WDT is not set
# CONFIG_I6300ESB_WDT is not set
# CONFIG_ITCO_WDT is not set
# CONFIG_IT8712F_WDT is not set
# CONFIG_IT87_WDT is not set
# CONFIG_HP_WATCHDOG is not set
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
# CONFIG_NV_TCO is not set
# CONFIG_60XX_WDT is not set
# CONFIG_SBC8360_WDT is not set
# CONFIG_CPU5_WDT is not set
# CONFIG_SMSC_SCH311X_WDT is not set
# CONFIG_SMSC37B787_WDT is not set
# CONFIG_W83627HF_WDT is not set
# CONFIG_W83697HF_WDT is not set
# CONFIG_W83697UG_WDT is not set
# CONFIG_W83877F_WDT is not set
# CONFIG_W83977F_WDT is not set
# CONFIG_MACHZ_WDT is not set
# CONFIG_SBC_EPX_C3_WATCHDOG is not set

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set

#
# USB-based Watchdog Cards
#
# CONFIG_USBPCWATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
# CONFIG_SSB is not set
CONFIG_MFD_SUPPORT=y
# CONFIG_MFD_CORE is not set
# CONFIG_MFD_88PM860X is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_TPS6105X is not set
# CONFIG_TPS6507X is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_MFD_STMPE is not set
# CONFIG_MFD_TC3589X is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_PMIC_DA903X is not set
# CONFIG_PMIC_ADP5520 is not set
# CONFIG_MFD_MAX8925 is not set
# CONFIG_MFD_MAX8997 is not set
# CONFIG_MFD_MAX8998 is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM831X_I2C is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_WM8994 is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_CS5535 is not set
# CONFIG_LPC_SCH is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_VX855 is not set
# CONFIG_MFD_WL1273_CORE is not set
# CONFIG_REGULATOR is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=y
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_VIA is not set
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
# CONFIG_VGA_SWITCHEROO is not set
CONFIG_DRM=y
CONFIG_DRM_KMS_HELPER=y
# CONFIG_DRM_TDFX is not set
# CONFIG_DRM_R128 is not set
# CONFIG_DRM_RADEON is not set
# CONFIG_DRM_I810 is not set
CONFIG_DRM_I915=y
CONFIG_DRM_I915_KMS=y
# CONFIG_DRM_MGA is not set
# CONFIG_DRM_SIS is not set
# CONFIG_DRM_VIA is not set
# CONFIG_DRM_SAVAGE is not set
# CONFIG_STUB_POULSBO is not set
# CONFIG_VGASTATE is not set
CONFIG_VIDEO_OUTPUT_CONTROL=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
# CONFIG_FB_DDC is not set
# CONFIG_FB_BOOT_VESA_SUPPORT is not set
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
# CONFIG_FB_SYS_FILLRECT is not set
# CONFIG_FB_SYS_COPYAREA is not set
# CONFIG_FB_SYS_IMAGEBLIT is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_SYS_FOPS is not set
# CONFIG_FB_WMT_GE_ROPS is not set
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
# CONFIG_FB_BACKLIGHT is not set
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_UVESA is not set
# CONFIG_FB_VESA is not set
CONFIG_FB_EFI=y
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_LE80578 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
# CONFIG_FB_UDL is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_BROADSHEET is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
# CONFIG_LCD_CLASS_DEVICE is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_BACKLIGHT_GENERIC=y
# CONFIG_BACKLIGHT_PROGEAR is not set
# CONFIG_BACKLIGHT_APPLE is not set
# CONFIG_BACKLIGHT_SAHARA is not set
# CONFIG_BACKLIGHT_ADP8860 is not set

#
# Display device support
#
# CONFIG_DISPLAY_SUPPORT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
CONFIG_SOUND=y
CONFIG_SOUND_OSS_CORE=y
CONFIG_SOUND_OSS_CORE_PRECLAIM=y
CONFIG_SND=y
CONFIG_SND_TIMER=y
CONFIG_SND_PCM=y
CONFIG_SND_HWDEP=y
CONFIG_SND_SEQUENCER=y
CONFIG_SND_SEQ_DUMMY=y
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=y
CONFIG_SND_PCM_OSS=y
CONFIG_SND_PCM_OSS_PLUGINS=y
CONFIG_SND_SEQUENCER_OSS=y
CONFIG_SND_HRTIMER=y
CONFIG_SND_SEQ_HRTIMER_DEFAULT=y
CONFIG_SND_DYNAMIC_MINORS=y
CONFIG_SND_SUPPORT_OLD_API=y
CONFIG_SND_VERBOSE_PROCFS=y
# CONFIG_SND_VERBOSE_PRINTK is not set
# CONFIG_SND_DEBUG is not set
CONFIG_SND_VMASTER=y
CONFIG_SND_DMA_SGBUF=y
# CONFIG_SND_RAWMIDI_SEQ is not set
# CONFIG_SND_OPL3_LIB_SEQ is not set
# CONFIG_SND_OPL4_LIB_SEQ is not set
# CONFIG_SND_SBAWE_SEQ is not set
# CONFIG_SND_EMU10K1_SEQ is not set
CONFIG_SND_DRIVERS=y
# CONFIG_SND_PCSP is not set
# CONFIG_SND_DUMMY is not set
# CONFIG_SND_ALOOP is not set
# CONFIG_SND_VIRMIDI is not set
# CONFIG_SND_MTPAV is not set
# CONFIG_SND_SERIAL_U16550 is not set
# CONFIG_SND_MPU401 is not set
CONFIG_SND_PCI=y
# CONFIG_SND_AD1889 is not set
# CONFIG_SND_ALS300 is not set
# CONFIG_SND_ALS4000 is not set
# CONFIG_SND_ALI5451 is not set
# CONFIG_SND_ASIHPI is not set
# CONFIG_SND_ATIIXP is not set
# CONFIG_SND_ATIIXP_MODEM is not set
# CONFIG_SND_AU8810 is not set
# CONFIG_SND_AU8820 is not set
# CONFIG_SND_AU8830 is not set
# CONFIG_SND_AW2 is not set
# CONFIG_SND_AZT3328 is not set
# CONFIG_SND_BT87X is not set
# CONFIG_SND_CA0106 is not set
# CONFIG_SND_CMIPCI is not set
# CONFIG_SND_OXYGEN is not set
# CONFIG_SND_CS4281 is not set
# CONFIG_SND_CS46XX is not set
# CONFIG_SND_CS5530 is not set
# CONFIG_SND_CS5535AUDIO is not set
# CONFIG_SND_CTXFI is not set
# CONFIG_SND_DARLA20 is not set
# CONFIG_SND_GINA20 is not set
# CONFIG_SND_LAYLA20 is not set
# CONFIG_SND_DARLA24 is not set
# CONFIG_SND_GINA24 is not set
# CONFIG_SND_LAYLA24 is not set
# CONFIG_SND_MONA is not set
# CONFIG_SND_MIA is not set
# CONFIG_SND_ECHO3G is not set
# CONFIG_SND_INDIGO is not set
# CONFIG_SND_INDIGOIO is not set
# CONFIG_SND_INDIGODJ is not set
# CONFIG_SND_INDIGOIOX is not set
# CONFIG_SND_INDIGODJX is not set
# CONFIG_SND_EMU10K1 is not set
# CONFIG_SND_EMU10K1X is not set
# CONFIG_SND_ENS1370 is not set
# CONFIG_SND_ENS1371 is not set
# CONFIG_SND_ES1938 is not set
# CONFIG_SND_ES1968 is not set
# CONFIG_SND_FM801 is not set
CONFIG_SND_HDA_INTEL=y
CONFIG_SND_HDA_HWDEP=y
# CONFIG_SND_HDA_RECONFIG is not set
# CONFIG_SND_HDA_INPUT_BEEP is not set
# CONFIG_SND_HDA_INPUT_JACK is not set
# CONFIG_SND_HDA_PATCH_LOADER is not set
CONFIG_SND_HDA_CODEC_REALTEK=y
CONFIG_SND_HDA_CODEC_ANALOG=y
CONFIG_SND_HDA_CODEC_SIGMATEL=y
CONFIG_SND_HDA_CODEC_VIA=y
CONFIG_SND_HDA_CODEC_HDMI=y
CONFIG_SND_HDA_CODEC_CIRRUS=y
CONFIG_SND_HDA_CODEC_CONEXANT=y
CONFIG_SND_HDA_CODEC_CA0110=y
CONFIG_SND_HDA_CODEC_CMEDIA=y
CONFIG_SND_HDA_CODEC_SI3054=y
CONFIG_SND_HDA_GENERIC=y
# CONFIG_SND_HDA_POWER_SAVE is not set
# CONFIG_SND_HDSP is not set
# CONFIG_SND_HDSPM is not set
# CONFIG_SND_ICE1712 is not set
# CONFIG_SND_ICE1724 is not set
# CONFIG_SND_INTEL8X0 is not set
# CONFIG_SND_INTEL8X0M is not set
# CONFIG_SND_KORG1212 is not set
# CONFIG_SND_LX6464ES is not set
# CONFIG_SND_MAESTRO3 is not set
# CONFIG_SND_MIXART is not set
# CONFIG_SND_NM256 is not set
# CONFIG_SND_PCXHR is not set
# CONFIG_SND_RIPTIDE is not set
# CONFIG_SND_RME32 is not set
# CONFIG_SND_RME96 is not set
# CONFIG_SND_RME9652 is not set
# CONFIG_SND_SONICVIBES is not set
# CONFIG_SND_TRIDENT is not set
# CONFIG_SND_VIA82XX is not set
# CONFIG_SND_VIA82XX_MODEM is not set
# CONFIG_SND_VIRTUOSO is not set
# CONFIG_SND_VX222 is not set
# CONFIG_SND_YMFPCI is not set
CONFIG_SND_USB=y
# CONFIG_SND_USB_AUDIO is not set
# CONFIG_SND_USB_UA101 is not set
# CONFIG_SND_USB_USX2Y is not set
# CONFIG_SND_USB_CAIAQ is not set
# CONFIG_SND_USB_US122L is not set
# CONFIG_SND_USB_6FIRE is not set
CONFIG_SND_PCMCIA=y
# CONFIG_SND_VXPOCKET is not set
# CONFIG_SND_PDAUDIOCF is not set
# CONFIG_SND_SOC is not set
# CONFIG_SOUND_PRIME is not set
CONFIG_HID_SUPPORT=y
CONFIG_HID=y
CONFIG_HIDRAW=y

#
# USB Input Devices
#
CONFIG_USB_HID=y
CONFIG_HID_PID=y
CONFIG_USB_HIDDEV=y

#
# Special HID drivers
#
# CONFIG_HID_3M_PCT is not set
CONFIG_HID_A4TECH=y
# CONFIG_HID_ACRUX is not set
CONFIG_HID_APPLE=y
CONFIG_HID_BELKIN=y
# CONFIG_HID_CANDO is not set
CONFIG_HID_CHERRY=y
CONFIG_HID_CHICONY=y
# CONFIG_HID_PRODIKEYS is not set
CONFIG_HID_CYPRESS=y
CONFIG_HID_DRAGONRISE=y
# CONFIG_DRAGONRISE_FF is not set
# CONFIG_HID_EMS_FF is not set
CONFIG_HID_EZKEY=y
# CONFIG_HID_KEYTOUCH is not set
CONFIG_HID_KYE=y
# CONFIG_HID_UCLOGIC is not set
# CONFIG_HID_WALTOP is not set
CONFIG_HID_GYRATION=y
CONFIG_HID_TWINHAN=y
CONFIG_HID_KENSINGTON=y
# CONFIG_HID_LCPOWER is not set
CONFIG_HID_LOGITECH=y
CONFIG_LOGITECH_FF=y
# CONFIG_LOGIRUMBLEPAD2_FF is not set
# CONFIG_LOGIG940_FF is not set
# CONFIG_LOGIWII_FF is not set
CONFIG_HID_MICROSOFT=y
# CONFIG_HID_MOSART is not set
CONFIG_HID_MONTEREY=y
# CONFIG_HID_MULTITOUCH is not set
CONFIG_HID_NTRIG=y
# CONFIG_HID_ORTEK is not set
CONFIG_HID_PANTHERLORD=y
CONFIG_PANTHERLORD_FF=y
CONFIG_HID_PETALYNX=y
# CONFIG_HID_PICOLCD is not set
# CONFIG_HID_QUANTA is not set
# CONFIG_HID_ROCCAT is not set
# CONFIG_HID_ROCCAT_ARVO is not set
# CONFIG_HID_ROCCAT_KONE is not set
# CONFIG_HID_ROCCAT_KONEPLUS is not set
# CONFIG_HID_ROCCAT_KOVAPLUS is not set
# CONFIG_HID_ROCCAT_PYRA is not set
CONFIG_HID_SAMSUNG=y
CONFIG_HID_SONY=y
# CONFIG_HID_STANTUM is not set
CONFIG_HID_SUNPLUS=y
CONFIG_HID_GREENASIA=y
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_SMARTJOYPLUS=y
# CONFIG_SMARTJOYPLUS_FF is not set
CONFIG_HID_TOPSEED=y
CONFIG_HID_THRUSTMASTER=y
CONFIG_THRUSTMASTER_FF=y
CONFIG_HID_ZEROPLUS=y
CONFIG_ZEROPLUS_FF=y
# CONFIG_HID_ZYDACRON is not set
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
CONFIG_USB_DEBUG=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
# CONFIG_USB_DEVICE_CLASS is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
CONFIG_USB_MON=y
# CONFIG_USB_WUSB is not set
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
# CONFIG_USB_XHCI_HCD is not set
CONFIG_USB_EHCI_HCD=y
# CONFIG_USB_EHCI_ROOT_HUB_TT is not set
# CONFIG_USB_EHCI_TT_NEWSCHED is not set
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_ISP1760_HCD is not set
# CONFIG_USB_ISP1362_HCD is not set
CONFIG_USB_OHCI_HCD=y
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_WHCI_HCD is not set
# CONFIG_USB_HWA_HCD is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
CONFIG_USB_PRINTER=y
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=y
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_REALTEK is not set
# CONFIG_USB_STORAGE_DATAFAB is not set
# CONFIG_USB_STORAGE_FREECOM is not set
# CONFIG_USB_STORAGE_ISD200 is not set
# CONFIG_USB_STORAGE_USBAT is not set
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_SDDR55 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set
# CONFIG_USB_STORAGE_ALAUDA is not set
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_STORAGE_ENE_UB6250 is not set
# CONFIG_USB_UAS is not set
CONFIG_USB_LIBUSUAL=y

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB port drivers
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_YUREX is not set
# CONFIG_USB_GADGET is not set

#
# OTG and related infrastructure
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
# CONFIG_LEDS_LM3530 is not set
# CONFIG_LEDS_ALIX2 is not set
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_LP3944 is not set
# CONFIG_LEDS_LP5521 is not set
# CONFIG_LEDS_LP5523 is not set
# CONFIG_LEDS_CLEVO_MAIL is not set
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_BD2802 is not set
# CONFIG_LEDS_INTEL_SS4200 is not set
CONFIG_LEDS_TRIGGERS=y

#
# LED Triggers
#
# CONFIG_LEDS_TRIGGER_TIMER is not set
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set

#
# iptables trigger is under Netfilter config (LED target)
#
# CONFIG_NFC_DEVICES is not set
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
CONFIG_EDAC=y

#
# Reporting subsystems
#
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_DECODE_MCE=y
# CONFIG_EDAC_MCE_INJ is not set
# CONFIG_EDAC_MM_EDAC is not set
CONFIG_RTC_LIB=y
CONFIG_RTC_CLASS=y
# CONFIG_RTC_HCTOSYS is not set
# CONFIG_RTC_DEBUG is not set

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_DS1307 is not set
# CONFIG_RTC_DRV_DS1374 is not set
# CONFIG_RTC_DRV_DS1672 is not set
# CONFIG_RTC_DRV_DS3232 is not set
# CONFIG_RTC_DRV_MAX6900 is not set
# CONFIG_RTC_DRV_RS5C372 is not set
# CONFIG_RTC_DRV_ISL1208 is not set
# CONFIG_RTC_DRV_ISL12022 is not set
# CONFIG_RTC_DRV_X1205 is not set
# CONFIG_RTC_DRV_PCF8563 is not set
# CONFIG_RTC_DRV_PCF8583 is not set
# CONFIG_RTC_DRV_M41T80 is not set
# CONFIG_RTC_DRV_BQ32K is not set
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set
# CONFIG_RTC_DRV_RX8581 is not set
# CONFIG_RTC_DRV_RX8025 is not set

#
# SPI RTC drivers
#

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=y
# CONFIG_RTC_DRV_DS1286 is not set
# CONFIG_RTC_DRV_DS1511 is not set
# CONFIG_RTC_DRV_DS1553 is not set
# CONFIG_RTC_DRV_DS1742 is not set
# CONFIG_RTC_DRV_STK17TA8 is not set
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T35 is not set
# CONFIG_RTC_DRV_M48T59 is not set
# CONFIG_RTC_DRV_MSM6242 is not set
# CONFIG_RTC_DRV_BQ4802 is not set
# CONFIG_RTC_DRV_RP5C01 is not set
# CONFIG_RTC_DRV_V3020 is not set

#
# on-CPU RTC drivers
#
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
# CONFIG_INTEL_MID_DMAC is not set
# CONFIG_INTEL_IOATDMA is not set
# CONFIG_TIMB_DMA is not set
# CONFIG_PCH_DMA is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_HP_ACCEL is not set
# CONFIG_PANASONIC_LAPTOP is not set
# CONFIG_THINKPAD_ACPI is not set
# CONFIG_SENSORS_HDAPS is not set
# CONFIG_INTEL_MENLOW is not set
CONFIG_EEEPC_LAPTOP=y
# CONFIG_ACPI_WMI is not set
# CONFIG_ACPI_ASUS is not set
# CONFIG_TOPSTAR_LAPTOP is not set
# CONFIG_ACPI_TOSHIBA is not set
# CONFIG_TOSHIBA_BT_RFKILL is not set
# CONFIG_ACPI_CMPC is not set
# CONFIG_INTEL_IPS is not set
# CONFIG_IBM_RTL is not set
# CONFIG_XO15_EBOOK is not set

#
# Firmware Drivers
#
CONFIG_EDD=y
# CONFIG_EDD_OFF is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_EFI_VARS=y
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_DMIID=y
# CONFIG_DMI_SYSFS is not set
# CONFIG_ISCSI_IBFT_FIND is not set
# CONFIG_SIGMA is not set

#
# File systems
#
# CONFIG_EXT2_FS is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
# CONFIG_EXT4_FS is not set
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_XFS_FS=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
# CONFIG_XFS_DEBUG is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_NILFS2_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_FANOTIFY is not set
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
CONFIG_QUOTACTL_COMPAT=y
CONFIG_AUTOFS4_FS=y
CONFIG_FUSE_FS=y
CONFIG_CUSE=y
CONFIG_GENERIC_ACL=y

#
# Caches
#
CONFIG_FSCACHE=y
CONFIG_FSCACHE_STATS=y
CONFIG_FSCACHE_HISTOGRAM=y
# CONFIG_FSCACHE_DEBUG is not set
# CONFIG_FSCACHE_OBJECT_LIST is not set
CONFIG_CACHEFILES=y
# CONFIG_CACHEFILES_DEBUG is not set
CONFIG_CACHEFILES_HISTOGRAM=y

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=y
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_LOGFS is not set
# CONFIG_CRAMFS is not set
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_PSTORE is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
# CONFIG_NFS_V4_1 is not set
CONFIG_NFS_FSCACHE=y
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
# CONFIG_NFS_USE_NEW_IDMAPPER is not set
CONFIG_NFSD=y
CONFIG_NFSD_DEPRECATED=y
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=y
CONFIG_RPCSEC_GSS_KRB5=y
# CONFIG_CEPH_FS is not set
CONFIG_CIFS=y
CONFIG_CIFS_STATS=y
CONFIG_CIFS_STATS2=y
# CONFIG_CIFS_WEAK_PW_HASH is not set
CONFIG_CIFS_UPCALL=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
# CONFIG_CIFS_DEBUG2 is not set
CONFIG_CIFS_DFS_UPCALL=y
# CONFIG_CIFS_FSCACHE is not set
# CONFIG_CIFS_ACL is not set
# CONFIG_CIFS_EXPERIMENTAL is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
CONFIG_OSF_PARTITION=y
CONFIG_AMIGA_PARTITION=y
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
CONFIG_KARMA_PARTITION=y
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_UTF8=y
# CONFIG_DLM is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_PRINTK_TIME=y
CONFIG_DEFAULT_MESSAGE_LOGLEVEL=4
# CONFIG_ENABLE_WARN_DEPRECATED is not set
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
# CONFIG_DEBUG_SECTION_MISMATCH is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
CONFIG_LOCKUP_DETECTOR=y
CONFIG_HARDLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=0
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
# CONFIG_DETECT_HUNG_TASK is not set
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
CONFIG_TIMER_STATS=y
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
# CONFIG_PROVE_RCU is not set
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
# CONFIG_DEBUG_LOCKDEP is not set
CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VIRTUAL is not set
# CONFIG_DEBUG_WRITECOUNT is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_LIST is not set
# CONFIG_TEST_LIST_SORT is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
# CONFIG_DEBUG_CREDENTIALS is not set
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_RCU_TORTURE_TEST is not set
CONFIG_RCU_CPU_STALL_DETECTOR=y
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE=y
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# CONFIG_LKDTM is not set
# CONFIG_CPU_NOTIFIER_ERROR_INJECT is not set
CONFIG_FAULT_INJECTION=y
# CONFIG_FAILSLAB is not set
# CONFIG_FAIL_PAGE_ALLOC is not set
CONFIG_FAIL_MAKE_REQUEST=y
# CONFIG_FAIL_IO_TIMEOUT is not set
CONFIG_FAULT_INJECTION_DEBUG_FS=y
CONFIG_LATENCYTOP=y
CONFIG_SYSCTL_SYSCALL_CHECK=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_EVENT_POWER_TRACING_DEPRECATED=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_FTRACE_SYSCALLS is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
# CONFIG_STACK_TRACER is not set
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENT=y
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_MMIOTRACE is not set
# CONFIG_RING_BUFFER_BENCHMARK is not set
CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_ATOMIC64_SELFTEST is not set
# CONFIG_ASYNC_RAID6_TEST is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
# CONFIG_KGDB_TESTS is not set
# CONFIG_KGDB_LOW_LEVEL_TRAP is not set
CONFIG_KGDB_KDB=y
# CONFIG_KDB_KEYBOARD is not set
CONFIG_HAVE_ARCH_KMEMCHECK=y
# CONFIG_KMEMCHECK is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_STRICT_DEVMEM is not set
# CONFIG_X86_VERBOSE_BOOTUP is not set
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_STACK_USAGE=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
# CONFIG_X86_PTDUMP is not set
CONFIG_DEBUG_RODATA=y
# CONFIG_DEBUG_RODATA_TEST is not set
# CONFIG_DEBUG_SET_MODULE_RONX is not set
CONFIG_DEBUG_NX_TEST=m
# CONFIG_IOMMU_DEBUG is not set
# CONFIG_IOMMU_STRESS is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
# CONFIG_X86_DECODER_SELFTEST is not set
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
CONFIG_DEBUG_BOOT_PARAMS=y
# CONFIG_CPA_DEBUG is not set
CONFIG_OPTIMIZE_INLINING=y
# CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is not set

#
# Security options
#
CONFIG_KEYS=y
CONFIG_KEYS_DEBUG_PROC_KEYS=y
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
# CONFIG_SECURITYFS is not set
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_NETWORK_XFRM=y
# CONFIG_SECURITY_PATH is not set
# CONFIG_INTEL_TXT is not set
# CONFIG_SECURITY_SELINUX is not set
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
# CONFIG_SECURITY_APPARMOR is not set
# CONFIG_IMA is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_DEFAULT_SECURITY=""
CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP=y
CONFIG_CRYPTO_PCOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
# CONFIG_CRYPTO_GF128MUL is not set
# CONFIG_CRYPTO_NULL is not set
# CONFIG_CRYPTO_PCRYPT is not set
CONFIG_CRYPTO_WORKQUEUE=y
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_AUTHENC=y
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
CONFIG_CRYPTO_ECB=y
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_XTS is not set
CONFIG_CRYPTO_FPU=y

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_XCBC is not set
# CONFIG_CRYPTO_VMAC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
# CONFIG_CRYPTO_CRC32C_INTEL is not set
# CONFIG_CRYPTO_GHASH is not set
CONFIG_CRYPTO_MD4=y
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=y
CONFIG_CRYPTO_RMD128=y
CONFIG_CRYPTO_RMD160=y
CONFIG_CRYPTO_RMD256=y
CONFIG_CRYPTO_RMD320=y
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_TGR192 is not set
# CONFIG_CRYPTO_WP512 is not set
# CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL is not set

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_X86_64=y
CONFIG_CRYPTO_AES_NI_INTEL=y
# CONFIG_CRYPTO_ANUBIS is not set
CONFIG_CRYPTO_ARC4=y
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
CONFIG_CRYPTO_DES=y
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_X86_64 is not set
# CONFIG_CRYPTO_SEED is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_TWOFISH is not set
# CONFIG_CRYPTO_TWOFISH_X86_64 is not set

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_ZLIB=y
CONFIG_CRYPTO_LZO=y

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=y
# CONFIG_CRYPTO_USER_API_HASH is not set
# CONFIG_CRYPTO_USER_API_SKCIPHER is not set
CONFIG_CRYPTO_HW=y
CONFIG_CRYPTO_DEV_PADLOCK=y
CONFIG_CRYPTO_DEV_PADLOCK_AES=y
CONFIG_CRYPTO_DEV_PADLOCK_SHA=y
CONFIG_CRYPTO_DEV_HIFN_795X=y
CONFIG_CRYPTO_DEV_HIFN_795X_RNG=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=y
# CONFIG_KVM_INTEL is not set
CONFIG_KVM_AMD=y
# CONFIG_KVM_MMU_AUDIT is not set
CONFIG_VHOST_NET=y
CONFIG_VIRTIO=y
CONFIG_VIRTIO_RING=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_BALLOON=y
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=m
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_FIND_LAST_BIT=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
CONFIG_CRC7=y
CONFIG_LIBCRC32C=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=y
CONFIG_TEXTSEARCH_BM=y
CONFIG_TEXTSEARCH_FSM=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_CPU_RMAP=y
CONFIG_NLATTR=y
# CONFIG_AVERAGE is not set

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-01 13:58     ` Arne Jansen
@ 2011-06-01 16:35       ` Peter Zijlstra
  2011-06-01 17:20         ` Arne Jansen
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-01 16:35 UTC (permalink / raw)
  To: Arne Jansen
  Cc: mingo, hpa, linux-kernel, torvalds, efault, npiggin, akpm,
	frank.rowand, tglx, mingo, linux-tip-commits

On Wed, 2011-06-01 at 15:58 +0200, Arne Jansen wrote:
> git bisect blames this commit for a problem I have with v3.0-rc1:
> If I printk large amounts of data, the machine locks up.
> As the commit does not revert cleanly on top of 3.0, I haven't been
> able to double check.
> The test I use is simple, just add something like
> 
> for (i=0; i < 10000; ++i) printk("test %d\n", i);
> 
> and trigger it, in most cases I can see the first 10 printks before
> I have to power cycle the machine (sysrq-b does not work anymore).
> Attached my .config. 

I've made me a module that does the above, I've also changed my .config
to match yours (smp=y, sched-cgroup=y, autogroup=n, preempt=n, no_hz=y),
but sadly I cannot reproduce, I get all 10k prints on my serial line.

Even without serial line it works (somehow booting without visible
console is scary as hell :)

Which makes me ask, how are you observing your console?

Because those 10k lines aren't even near the amount of crap a regular
boot spews out on this box, although I guess the tight loop might
generate it slightly faster than a regular boot does.
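[Editor's note: the test module mentioned above is not shown in the thread. A minimal sketch of a module that performs Arne's loop at load time might look as follows; the file name and symbols are assumptions, not the actual module Peter built.]

```c
/*
 * printk_storm.c -- hypothetical sketch of the reproducer module.
 * Loading it emits 10000 printk lines in a tight loop, mimicking the
 * in-line test from Arne's report.
 */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

static int __init printk_storm_init(void)
{
	int i;

	/* Same loop as the reported reproducer, run once at insmod time. */
	for (i = 0; i < 10000; ++i)
		printk(KERN_INFO "test %d\n", i);

	return 0;
}

static void __exit printk_storm_exit(void)
{
}

module_init(printk_storm_init);
module_exit(printk_storm_exit);
MODULE_LICENSE("GPL");
```

Built against the running kernel's headers and loaded with insmod, this exercises the same printk path as the snippet quoted above.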



* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-01 16:35       ` Peter Zijlstra
@ 2011-06-01 17:20         ` Arne Jansen
  2011-06-01 18:09           ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-01 17:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, linux-kernel, torvalds, efault, npiggin, akpm,
	frank.rowand, tglx, mingo, linux-tip-commits

On 01.06.2011 18:35, Peter Zijlstra wrote:
> On Wed, 2011-06-01 at 15:58 +0200, Arne Jansen wrote:
>> git bisect blames this commit for a problem I have with v3.0-rc1:
>> If I printk large amounts of data, the machine locks up.
>> As the commit does not revert cleanly on top of 3.0, I haven't been
>> able to double check.
>> The test I use is simple, just add something like
>>
>> for (i=0; i<  10000; ++i) printk("test %d\n", i);
>>
>> and trigger it, in most cases I can see the first 10 printks before
>> I have to power cycle the machine (sysrq-b does not work anymore).
>> Attached my .config.
>
> I've made me a module that does the above, I've also changed my .config
> to match yours (smp=y, sched-cgroup=y, autogroup=n, preempt=n, no_hz=y),
> but sadly I cannot reproduce, I get all 10k prints on my serial line.
>
> Even without serial line it works (somehow booting without visible
> console is scary as hell :)
>
> Which makes me ask, how are you observing your console?
>

They don't go out to the serial line, I only observe them with a
tail -f on messages. Default log level doesn't go to the console here.

> Because those 10k lines aren't even near the amount of crap a regular
> boot spews out on this box, although I guess the tight loop might
> generate it slightly faster than a regular boot does.
>



* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-01 17:20         ` Arne Jansen
@ 2011-06-01 18:09           ` Peter Zijlstra
  2011-06-01 18:44             ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-01 18:09 UTC (permalink / raw)
  To: Arne Jansen
  Cc: mingo, hpa, linux-kernel, torvalds, efault, npiggin, akpm,
	frank.rowand, tglx, mingo, linux-tip-commits

On Wed, 2011-06-01 at 19:20 +0200, Arne Jansen wrote:
> On 01.06.2011 18:35, Peter Zijlstra wrote:
> > On Wed, 2011-06-01 at 15:58 +0200, Arne Jansen wrote:
> >> git bisect blames this commit for a problem I have with v3.0-rc1:
> >> If I printk large amounts of data, the machine locks up.
> >> As the commit does not revert cleanly on top of 3.0, I haven't been
> >> able to double check.
> >> The test I use is simple, just add something like
> >>
> >> for (i=0; i<  10000; ++i) printk("test %d\n", i);
> >>
> >> and trigger it, in most cases I can see the first 10 printks before
> >> I have to power cycle the machine (sysrq-b does not work anymore).
> >> Attached my .config.
> >
> > I've made me a module that does the above, I've also changed my .config
> > to match yours (smp=y, sched-cgroup=y, autogroup=n, preempt=n, no_hz=y),
> > but sadly I cannot reproduce, I get all 10k prints on my serial line.
> >
> > Even without serial line it works (somehow booting without visible
> > console is scary as hell :)
> >
> > Which makes me ask, how are you observing your console?
> >
> 
> They don't go out to the serial line, I only observe them with a
> tail -f on messages. Default log level doesn't go to the console here.

Right ok, so I used your exact .config, added a few drivers needed for
my hardware and indeed, it doesn't even finish booting and gets stuck
someplace.

Sadly it looks like even the NMI watchdog is dead... /me goes to try and
make sense of this.



* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-01 18:09           ` Peter Zijlstra
@ 2011-06-01 18:44             ` Peter Zijlstra
  2011-06-01 19:30               ` Arne Jansen
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-01 18:44 UTC (permalink / raw)
  To: Arne Jansen
  Cc: mingo, hpa, linux-kernel, torvalds, efault, npiggin, akpm,
	frank.rowand, tglx, mingo, linux-tip-commits

On Wed, 2011-06-01 at 20:09 +0200, Peter Zijlstra wrote:
> On Wed, 2011-06-01 at 19:20 +0200, Arne Jansen wrote:
> > On 01.06.2011 18:35, Peter Zijlstra wrote:
> > > On Wed, 2011-06-01 at 15:58 +0200, Arne Jansen wrote:
> > >> git bisect blames this commit for a problem I have with v3.0-rc1:
> > >> If I printk large amounts of data, the machine locks up.
> > >> As the commit does not revert cleanly on top of 3.0, I haven't been
> > >> able to double check.
> > >> The test I use is simple, just add something like
> > >>
> > >> for (i=0; i < 10000; ++i) printk("test %d\n", i);
> > >>
> > >> and trigger it, in most cases I can see the first 10 printks before
> > >> I have to power cycle the machine (sysrq-b does not work anymore).
> > >> Attached my .config.
> > >
> > > I've made me a module that does the above, I've also changed my .config
> > > to match yours (smp=y, sched-cgroup=y, autogroup=n, preempt=n, no_hz=y),
> > > but sadly I cannot reproduce, I get all 10k prints on my serial line.
> > >
> > > Even without serial line it works (somehow booting without visible
> > > console is scary as hell :)
> > >
> > > Which makes me ask, how are you observing your console?
> > >
> > 
> > They don't go out to the serial line, I only observe them with a
> > tail -f on messages. Default log level doesn't go the console here.
> 
> Right ok, so I used your exact .config, added a few drivers needed for
> my hardware and indeed, it doesn't even finish booting and gets stuck
> someplace.
> 
> Sadly it looks like even the NMI watchdog is dead,.. /me goes try and
> make sense of this.

Sadly both 0122ec5b02f766c355b3168df53a6c038a24fa0d^1 and
0122ec5b02f766c355b3168df53a6c038a24fa0d itself boot just fine and run
the test module without problems.

I will have to re-bisect this.



* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-01 18:44             ` Peter Zijlstra
@ 2011-06-01 19:30               ` Arne Jansen
  2011-06-01 21:09                 ` Linus Torvalds
  0 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-01 19:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, linux-kernel, torvalds, efault, npiggin, akpm,
	frank.rowand, tglx, mingo, linux-tip-commits

On 01.06.2011 20:44, Peter Zijlstra wrote:
> On Wed, 2011-06-01 at 20:09 +0200, Peter Zijlstra wrote:
>> On Wed, 2011-06-01 at 19:20 +0200, Arne Jansen wrote:
>>> On 01.06.2011 18:35, Peter Zijlstra wrote:
>>>> On Wed, 2011-06-01 at 15:58 +0200, Arne Jansen wrote:
>>>>> git bisect blames this commit for a problem I have with v3.0-rc1:
>>>>> If I printk large amounts of data, the machine locks up.
>>>>> As the commit does not revert cleanly on top of 3.0, I haven't been
>>>>> able to double check.
>>>>> The test I use is simple, just add something like
>>>>>
>>>>> for (i=0; i < 10000; ++i) printk("test %d\n", i);
>>>>>
>>>>> and trigger it, in most cases I can see the first 10 printks before
>>>>> I have to power cycle the machine (sysrq-b does not work anymore).
>>>>> Attached my .config.
>>>>
>>>> I've made me a module that does the above, I've also changed my .config
>>>> to match yours (smp=y, sched-cgroup=y, autogroup=n, preempt=n, no_hz=y),
>>>> but sadly I cannot reproduce, I get all 10k prints on my serial line.
>>>>
>>>> Even without serial line it works (somehow booting without visible
>>>> console is scary as hell :)
>>>>
>>>> Which makes me ask, how are you observing your console?
>>>>
>>>
>>> They don't go out to the serial line, I only observe them with a
>>> tail -f on messages. Default log level doesn't go the console here.
>>
>> Right ok, so I used your exact .config, added a few drivers needed for
>> my hardware and indeed, it doesn't even finish booting and gets stuck
>> someplace.
>>
>> Sadly it looks like even the NMI watchdog is dead,.. /me goes try and
>> make sense of this.
>
> Sadly both 0122ec5b02f766c355b3168df53a6c038a24fa0d^1 and
> 0122ec5b02f766c355b3168df53a6c038a24fa0d itself boot just fine and run
> the test module without problems.

I can only partially confirm this:

2acca55ed98ad9b9aa25e7e587ebe306c0313dc7 runs fine
0122ec5b02f766c355b3168df53a6c038a24fa0d freezes after line 189
ab2515c4b98f7bc4fa11cad9fa0f811d63a72a26 freezes after line 39

>
> I will have to re-bisect this.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-01 19:30               ` Arne Jansen
@ 2011-06-01 21:09                 ` Linus Torvalds
  2011-06-03  9:15                   ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Linus Torvalds @ 2011-06-01 21:09 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, mingo, hpa, linux-kernel, efault, npiggin, akpm,
	frank.rowand, tglx, mingo, linux-tip-commits

Boot-time hang - maybe due to the mis-merge that re-introduced the
infinite media change signals for ide-cd?

I just pushed out a fix, it may not have mirrored out yet.

I dunno. Worth checking out before spending a lot of time bisecting.

                 Linus


* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-01 21:09                 ` Linus Torvalds
@ 2011-06-03  9:15                   ` Peter Zijlstra
  2011-06-03 10:02                     ` Arne Jansen
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-03  9:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arne Jansen, mingo, hpa, linux-kernel, efault, npiggin, akpm,
	frank.rowand, tglx, mingo, linux-tip-commits

On Thu, 2011-06-02 at 06:09 +0900, Linus Torvalds wrote:
> Boot-time hang - maybe due to the mis-merge that re-introduced the
> infinite media change signals for ide-cd?
> 
> I just pushed out a fix, it may not have mirrored out yet.
> 
> I dunno. Worth checking out before spending a lot of time bisecting.

Right, so that wasn't it. I haven't done a full bisect yet because I
noticed it died on a usb suspend line every single time and that machine
only had a single usb device, a memory stick, in it. So I simply pulled
the stick and voila it booted. So something is screwy with usb suspend
or something.

This of course means that I'm now completely unable to reproduce the
issue at hand :/

Maybe if I try another box..

Anyway, Arne, how long did you wait before power cycling the box? The
NMI watchdog should trigger in about a minute or so if it will trigger
at all (it's enabled in your config).
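
[Editor's note: a quick way to confirm from userspace that the watchdog is
armed at all is the kernel.nmi_watchdog sysctl — a sketch, assuming a kernel
of this vintage with CONFIG_LOCKUP_DETECTOR; the fallback marker string is
made up for illustration.]

```shell
# Prints 1 when the NMI watchdog is enabled, 0 when it is off, or a
# fallback marker on systems that lack the sysctl entirely.
sysctl -n kernel.nmi_watchdog 2>/dev/null || echo "nmi_watchdog sysctl unavailable"
```

(`cat /proc/sys/kernel/nmi_watchdog` is the equivalent procfs spelling.)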




* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-03  9:15                   ` Peter Zijlstra
@ 2011-06-03 10:02                     ` Arne Jansen
  2011-06-03 10:30                       ` Peter Zijlstra
  2011-06-03 12:44                       ` [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock() Linus Torvalds
  0 siblings, 2 replies; 152+ messages in thread
From: Arne Jansen @ 2011-06-03 10:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, mingo, hpa, linux-kernel, efault, npiggin, akpm,
	frank.rowand, tglx, mingo, linux-tip-commits

On 03.06.2011 11:15, Peter Zijlstra wrote:
> On Thu, 2011-06-02 at 06:09 +0900, Linus Torvalds wrote:
>> Boot-time hang - maybe due to the mis-merge that re-introduced the
>> infinite media change signals for ide-cd?
>>
>> I just pushed out a fix, it may not have mirrored out yet.
>>
>> I dunno. Worth checking out before spending a lot of time bisecting.
> 
> Right, so that wasn't it. I haven't done a full bisect yet because I
> noticed it died on a usb suspend line every single time and that machine
> only had a single usb device, a memory stick, in it. So I simply pulled
> the stick and voila it booted. So something is screwy with usb suspend
> or something.
> 
> This of course means that I'm now completely unable to reproduce the
> issue at hand :/
> 
> Maybe if I try another box..
> 
> Anyway, Arne, how long did you wait before power cycling the box? The
> NMI watchdog should trigger in about a minute or so if it will trigger
> at all (its enabled in your config).

No, it doesn't trigger, but the hang is not as complete as I first
thought. A running iostat via ssh continues to give output for a while,
the serial console still reacts to return and prompts for login. But
after a while more and more of the system locks up. The console locks as soon as I
sysrq-t.
Maybe it also has something to do with the place where I added the
printks (btrfs_scan_one_device). Also the 10k-print gets triggered
several times (though I only see 10 lines of output). Maybe you can
send me your test module and I'll try that, so we're testing under
the same conditions.
What also might help: the machine I'm testing with is a quad-core
X3450 with 8GB RAM.
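
[Editor's note: the sysrq dumps discussed in this thread can also be fired
without a usable console via /proc/sysrq-trigger, which sometimes outlives
the tty path — a sketch, assuming root and CONFIG_MAGIC_SYSRQ; the marker
strings are made up for illustration.]

```shell
# Emulate sysrq-t (dump task states to the kernel log) from userspace.
# The write itself produces no stdout, so print a marker either way.
if echo t > /proc/sysrq-trigger 2>/dev/null; then
    echo "sysrq-t fired, see dmesg"
else
    echo "sysrq-trigger not writable"
fi
```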


* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-03 10:02                     ` Arne Jansen
@ 2011-06-03 10:30                       ` Peter Zijlstra
  2011-06-03 11:52                         ` Arne Jansen
  2011-06-05  8:17                         ` Ingo Molnar
  2011-06-03 12:44                       ` [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock() Linus Torvalds
  1 sibling, 2 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-03 10:30 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Linus Torvalds, mingo, hpa, linux-kernel, efault, npiggin, akpm,
	frank.rowand, tglx, mingo, linux-tip-commits

On Fri, 2011-06-03 at 12:02 +0200, Arne Jansen wrote:
> On 03.06.2011 11:15, Peter Zijlstra wrote:

> > Anyway, Arne, how long did you wait before power cycling the box? The
> > NMI watchdog should trigger in about a minute or so if it will trigger
> > at all (its enabled in your config).
> 
> No, it doesn't trigger,

Bummer.

>  but the hang is not as complete as I first
> thought. A running iostat via ssh continues to give output for a while,
> the serial console still reacts to return and prompts for login. But
> after a while more and more locks up. The console locks as soon as I
> sysrq-t.

OK, that seems to suggest one CPU is stuck, and once you try something
that touches the CPU everything grinds to a halt. Does something like
sysrq-l work? That would send NMIs to the other CPUs.

Anyway, good to know using serial doesn't make it go away, that means
it's not too timing-sensitive.

> Maybe it has also something to do with the place where I added the
> printks (btrfs_scan_one_device). 

printk() should work pretty much anywhere these days, and filesystem
code in particular shouldn't be run from any weird and wonderful
contexts afaik.

> Also the 10k-print gets triggered
> several times (though I only see 10 lines of output). Maybe you can
> send me your test-module and I'll try that, so we have more equal
> conditions.

Sure, see below.

> What also might help: the maschine I'm testing with is a quad-core
> X3450 with 8GB RAM.

/me & wikipedia, that's a nehalem box, ok I'm testing on a westmere
(don't have a nehalem).

---
 kernel/Makefile |    1 +
 kernel/test.c   |   23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/kernel/Makefile b/kernel/Makefile
index 2d64cfc..65eff6c 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -80,6 +80,7 @@ obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
 obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
+obj-m += test.o
 obj-$(CONFIG_TREE_RCU) += rcutree.o
 obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o
 obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o
diff --git a/kernel/test.c b/kernel/test.c
index e69de29..8005395 100644
--- a/kernel/test.c
+++ b/kernel/test.c
@@ -0,0 +1,23 @@
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+MODULE_LICENSE("GPL");
+
+static void
+test_cleanup(void)
+{
+}
+
+static int __init
+test_init(void)
+{
+	int i;
+
+	for (i = 0; i < 10000; i++)
+		printk("test %d\n", i);
+
+	return 0;
+}
+
+module_init(test_init);
+module_exit(test_cleanup);



* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-03 10:30                       ` Peter Zijlstra
@ 2011-06-03 11:52                         ` Arne Jansen
  2011-06-05  8:17                         ` Ingo Molnar
  1 sibling, 0 replies; 152+ messages in thread
From: Arne Jansen @ 2011-06-03 11:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, mingo, hpa, linux-kernel, efault, npiggin, akpm,
	frank.rowand, tglx, mingo, linux-tip-commits

On 03.06.2011 12:30, Peter Zijlstra wrote:
> On Fri, 2011-06-03 at 12:02 +0200, Arne Jansen wrote:
>> On 03.06.2011 11:15, Peter Zijlstra wrote:
> 
> Bummer.
> 
>>  but the hang is not as complete as I first
>> thought. A running iostat via ssh continues to give output for a while,
>> the serial console still reacts to return and prompts for login. But
>> after a while more and more locks up. The console locks as soon as I
>> sysrq-t.
> 
> OK, that seems to suggest one CPU is stuck, and once you try something
> that touches the CPU everything grinds to a halt. Does something like
> sysrq-l work? That would send NMIs to the other CPUs.
> 
> Anyway, good to know using serial doesn't make it go away, that means
> its not too timing sensitive.
> 
> 
>> Also the 10k-print gets triggered
>> several times (though I only see 10 lines of output). Maybe you can
>> send me your test-module and I'll try that, so we have more equal
>> conditions.
> 
> Sure, see below.
> 

Your module also triggers it. On the first test it hung on the first
try, on the second test only on the third try. When it hangs, sysrq-l
doesn't give any output. I double checked without a hang, and then it
does dump something.


>> What also might help: the maschine I'm testing with is a quad-core
>> X3450 with 8GB RAM.
> 
> /me & wikipedia, that's a nehalem box, ok I'm testing on a westmere
> (don't have a nehalem).





* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-03 10:02                     ` Arne Jansen
  2011-06-03 10:30                       ` Peter Zijlstra
@ 2011-06-03 12:44                       ` Linus Torvalds
  2011-06-03 13:05                         ` Arne Jansen
  1 sibling, 1 reply; 152+ messages in thread
From: Linus Torvalds @ 2011-06-03 12:44 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, mingo, hpa, linux-kernel, efault, npiggin, akpm,
	frank.rowand, tglx, mingo, linux-tip-commits

On Fri, Jun 3, 2011 at 7:02 PM, Arne Jansen <lists@die-jansens.de> wrote:
>
> No, it doesn't trigger, but the hang is not as complete as I first
> thought. A running iostat via ssh continues to give output for a while,
> the serial console still reacts to return and prompts for login. But
> after a while more and more locks up. The console locks as soon as I
> sysrq-t.

Is it the tty rescheduling bug?

That would explain the printk's mattering.

Remove the schedule_work() call from flush_to_ldisc() in
drivers/tty/tty_buffer.c and see if the problem goes away. See the
other discussion thread on lkml ("tty breakage in X (Was: tty vs
workqueue oddities)")

Hmm?

                   Linus


* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-03 12:44                       ` [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock() Linus Torvalds
@ 2011-06-03 13:05                         ` Arne Jansen
  2011-06-04 21:29                           ` Linus Torvalds
  0 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-03 13:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, mingo, hpa, linux-kernel, efault, npiggin, akpm,
	frank.rowand, tglx, mingo, linux-tip-commits

On 03.06.2011 14:44, Linus Torvalds wrote:
> On Fri, Jun 3, 2011 at 7:02 PM, Arne Jansen <lists@die-jansens.de> wrote:
>>
>> No, it doesn't trigger, but the hang is not as complete as I first
>> thought. A running iostat via ssh continues to give output for a while,
>> the serial console still reacts to return and prompts for login. But
>> after a while more and more locks up. The console locks as soon as I
>> sysrq-t.
> 
> Is it the tty rescheduling bug?
> 
> That would explain the printk's mattering.
> 
> Remove the schedule_work() call from flush_to_ldisc() in
> drivers/tty/tty_buffer.c and see if the problem goes away. See the
> other discussion thread on lkml ("tty breakage in X (Was: tty vs
> workqueue oddities)")
> 
> Hmm?

No change. Also git bisect quite clearly points to
0122ec5b02f766c and ab2515c4b98f7bc4, both of which are older than
b1c43f82c5aa2654 mentioned in the other thread.

-Arne


>                    Linus



* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-03 13:05                         ` Arne Jansen
@ 2011-06-04 21:29                           ` Linus Torvalds
  2011-06-04 22:08                             ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Linus Torvalds @ 2011-06-04 21:29 UTC (permalink / raw)
  To: Arne Jansen, Ingo Molnar
  Cc: Peter Zijlstra, hpa, linux-kernel, efault, npiggin, akpm,
	frank.rowand, linux-tip-commits, Thomas Gleixner

On Fri, Jun 3, 2011 at 10:05 PM, Arne Jansen <lists@die-jansens.de> wrote:
>
> No change. Also git bisect quite clearly points to
> 0122ec5b02f766c and ab2515c4b98f7bc4, both are older than
> b1c43f82c5aa2654 mentioned in the other thread.

Ok, I haven't heard anything further on this. Ingo? Peter?

We're getting to the point where we just need to revert the thing,
since I'm not getting the feeling that there are any fixes
forthcoming, and I'd like -rc2 to not have this kind of bisected bug.

Ingo? Those two commits no longer revert cleanly, presumably due to
other changes in the area (but I didn't check). Can you do a patch to
do the reverts, and then you can try to re-do the thing later once you
figure out what's wrong.

                       Linus


* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-04 21:29                           ` Linus Torvalds
@ 2011-06-04 22:08                             ` Peter Zijlstra
  2011-06-04 22:50                               ` Linus Torvalds
  2011-06-05  6:01                               ` Arne Jansen
  0 siblings, 2 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-04 22:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arne Jansen, Ingo Molnar, hpa, linux-kernel, efault, npiggin,
	akpm, frank.rowand, linux-tip-commits, Thomas Gleixner

On Sun, 2011-06-05 at 06:29 +0900, Linus Torvalds wrote:
> On Fri, Jun 3, 2011 at 10:05 PM, Arne Jansen <lists@die-jansens.de> wrote:
> >
> > No change. Also git bisect quite clearly points to
> > 0122ec5b02f766c and ab2515c4b98f7bc4, both are older than
> > b1c43f82c5aa2654 mentioned in the other thread.
> 
> Ok, I haven't heard anything further on this. Ingo? Peter?

I'm a bit stumped, and not able to reproduce at all :/

> We're getting to the point where we just need to revert the thing,
> since I'm not getting the feeling that there are any fixes
> forthcoming, and I'd like -rc2 to not have this kind of bisected bug.

Agreed.

> Ingo? Those two commits no longer revert cleanly, presumably due to
> other changes in the area (but I didn't check). Can you do a patch to
> do the reverts, and then you can try to re-do the thing later once you
> figure out what's wrong.

Yeah, that wants a whole lot of reverting, from the offending commit up
to and including 317f394160e9beb97d19a84c39b7e5eb3d7815a8.



* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-04 22:08                             ` Peter Zijlstra
@ 2011-06-04 22:50                               ` Linus Torvalds
  2011-06-05  6:01                               ` Arne Jansen
  1 sibling, 0 replies; 152+ messages in thread
From: Linus Torvalds @ 2011-06-04 22:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Ingo Molnar, hpa, linux-kernel, efault, npiggin,
	akpm, frank.rowand, linux-tip-commits, Thomas Gleixner

On Sun, Jun 5, 2011 at 7:08 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Yeah, that wants a whole lot of reverting, from the offending commit up
> to and including 317f394160e9beb97d19a84c39b7e5eb3d7815a8.

Mind sending one single tested patch? I still get conflicts, even just
trying to revert the last of those (ie 317f394160e9) due to all the
other scheduler changes..

                   Linus


* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-04 22:08                             ` Peter Zijlstra
  2011-06-04 22:50                               ` Linus Torvalds
@ 2011-06-05  6:01                               ` Arne Jansen
  2011-06-05  7:57                                 ` Mike Galbraith
  1 sibling, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-05  6:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, hpa, linux-kernel, efault, npiggin,
	akpm, frank.rowand, linux-tip-commits, Thomas Gleixner

On 05.06.2011 00:08, Peter Zijlstra wrote:
> On Sun, 2011-06-05 at 06:29 +0900, Linus Torvalds wrote:
>> On Fri, Jun 3, 2011 at 10:05 PM, Arne Jansen <lists@die-jansens.de> wrote:
>>>
>>> No change. Also git bisect quite clearly points to
>>> 0122ec5b02f766c and ab2515c4b98f7bc4, both are older than
>>> b1c43f82c5aa2654 mentioned in the other thread.
>>
>> Ok, I haven't heard anything further on this. Ingo? Peter?
>
> I'm a bit stumped, and not being able to reproduce at all :/

I'm willing to take any number of round trips to get to the true
nature of the bug. From the description I have a feeling that the
offending patch might just shift the timing slightly, so even if
the problem is gone for me, it might just be buried deeper.
I can also try to reproduce it on a second machine and give you
access to it, though this might take a few days.




* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-05  6:01                               ` Arne Jansen
@ 2011-06-05  7:57                                 ` Mike Galbraith
  0 siblings, 0 replies; 152+ messages in thread
From: Mike Galbraith @ 2011-06-05  7:57 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, hpa, linux-kernel,
	npiggin, akpm, frank.rowand, linux-tip-commits, Thomas Gleixner

On Sun, 2011-06-05 at 08:01 +0200, Arne Jansen wrote:
> On 05.06.2011 00:08, Peter Zijlstra wrote:
> > On Sun, 2011-06-05 at 06:29 +0900, Linus Torvalds wrote:
> >> On Fri, Jun 3, 2011 at 10:05 PM, Arne Jansen <lists@die-jansens.de> wrote:
> >>>
> >>> No change. Also git bisect quite clearly points to
> >>> 0122ec5b02f766c and ab2515c4b98f7bc4, both are older than
> >>> b1c43f82c5aa2654 mentioned in the other thread.
> >>
> >> Ok, I haven't heard anything further on this. Ingo? Peter?
> >
> > I'm a bit stumped, and not being able to reproduce at all :/
> 
> I'm willing to take any number of round trips to get to the true
> nature of the bug. From the description I have a feeling that the
> offending patch might just shift the timing slightly, so even if
> the problem is gone for me, it might just be buried deeper.
> I can also try to reproduce it on a second machine and give you
> access to it, though this might take a few days.

My x3550 M3 with your config plus a couple drivers I need is exhibiting
hard lockups with no watchdog triggered.  If I use both serial and tty
consoles, I get all kinds of gripes from soft lockup and rcu when
loading Peter's test module, but that's the tty problem.

With serial console and earlyprintk=serial alone, the box may or may not
boot all the way without hard locking silently.  Ditto on restart after
loading/unloading Peter's test module.  I always see all 10k messages
though, IFF I have no tty console active.

	-Mike



* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-03 10:30                       ` Peter Zijlstra
  2011-06-03 11:52                         ` Arne Jansen
@ 2011-06-05  8:17                         ` Ingo Molnar
  2011-06-05  8:53                           ` Arne Jansen
  2011-06-05  9:43                           ` Arne Jansen
  1 sibling, 2 replies; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05  8:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2011-06-03 at 12:02 +0200, Arne Jansen wrote:
> > On 03.06.2011 11:15, Peter Zijlstra wrote:
> 
> > > Anyway, Arne, how long did you wait before power cycling the box? The
> > > NMI watchdog should trigger in about a minute or so if it will trigger
> > > at all (its enabled in your config).
> > 
> > No, it doesn't trigger,
> 
> Bummer.

Is there no output even when the console is configured to do an 
earlyprintk? That will allow the NMI watchdog to punch through even a 
printk or scheduler lockup.

Arne, you can turn this on via one of these:

  earlyprintk=vga,keep
  earlyprintk=serial,ttyS0,115200,keep

(the ',keep' portion is important to have it active even after the 
regular console has been switched on.)

Could you also please check with the (untested) patch below applied? 
This will turn off *all* printk done by the NMI watchdog and switches 
it to do pure early_printk() - which does not use any locking so it 
should never lock up.

[ If you keep seeing 'NMI watchdog tick' messages periodically 
  occurring after the lockup then i'll send a more complete patch that 
  shuts off the regular printk path and makes sure that all output is 
  early_printk() based only. ]

earlyprintk=,keep with such a patch has let me down only on the 
rarest of occasions.

( Arne, please also double check on a working bootup that the NMI 
  watchdog is actually ticking, by checking that the NMI counts in 
  /proc/interrupts go up slowly but surely on all CPUs. )
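
[Editor's note: that last check can be scripted along these lines — a
sketch; the exact /proc/interrupts column layout varies by architecture,
and the fallback message is made up for illustration.]

```shell
# Snapshot the per-CPU NMI counters; run it again after ~10 seconds --
# with a live NMI watchdog every CPU's count should slowly increase.
grep 'NMI:' /proc/interrupts || echo "no NMI counters found"
```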

Thanks,

	Ingo

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 3d0c56a..7c7e33f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -234,15 +234,12 @@ static void watchdog_overflow_callback(struct perf_event *event, int nmi,
 		if (__this_cpu_read(hard_watchdog_warn) == true)
 			return;
 
-		if (hardlockup_panic)
-			panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
-		else
-			WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu);
-
 		__this_cpu_write(hard_watchdog_warn, true);
 		return;
 	}
 
+	early_printk("NMI watchdog tick %ld\n", jiffies);
+
 	__this_cpu_write(hard_watchdog_warn, false);
 	return;
 }


* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-05  8:17                         ` Ingo Molnar
@ 2011-06-05  8:53                           ` Arne Jansen
  2011-06-05  9:41                             ` Ingo Molnar
  2011-06-05  9:43                           ` Arne Jansen
  1 sibling, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-05  8:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 10:17, Ingo Molnar wrote:
>
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Fri, 2011-06-03 at 12:02 +0200, Arne Jansen wrote:
>>> On 03.06.2011 11:15, Peter Zijlstra wrote:
>>
>>>> Anyway, Arne, how long did you wait before power cycling the box? The
>>>> NMI watchdog should trigger in about a minute or so if it will trigger
>>>> at all (its enabled in your config).
>>>
>>> No, it doesn't trigger,
>>
>> Bummer.
>
> Is there no output even when the console is configured to do an
> earlyprintk? That will allow the NMI watchdog to punch through even a
> printk or scheduler lockup.
>

Just to be clear, I have no boot problems whatsoever. And I have no
problems with the serial console. It's just the regular printk locking
up when e.g. I load the test module.

> Arne, you can turn this on via one of these:
>
>    earlyprintk=vga,keep

I don't have access to vga as it is a remote machine.

>    earlyprintk=serial,ttyS0,115200,keep

I'll try that.

>
> (the ',keep' portion is important to have it active even after the
> regular console has been switched on.)
>
> Could you also please check with the (untested) patch below applied?
> This will turn off *all* printk done by the NMI watchdog and switches
> it to do pure early_printk() - which does not use any locking so it
> should never lock up.
>
> [ If you keep seeing 'NMI watchdog tick' messages periodically
>    occuring after the lockup then i'll send a more complete patch that
>    shuts off the regular printk path and makes sure that all output is
>    early_printk() based only. ]
>
> earlyprintk=,keep with such a patch has let me down only on the
> rarest of occasions.
>
> ( Arne, please also double check on a working bootup that the NMI
>    watchdog is actually ticking, by checking the NMI counts in
>    /proc/interrupts go up slowly but surely on all CPUs. )
>
> Thanks,
>
> 	Ingo
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 3d0c56a..7c7e33f 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -234,15 +234,12 @@ static void watchdog_overflow_callback(struct perf_event *event, int nmi,
>   		if (__this_cpu_read(hard_watchdog_warn) == true)
>   			return;
>
> -		if (hardlockup_panic)
> -			panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> -		else
> -			WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> -
>   		__this_cpu_write(hard_watchdog_warn, true);
>   		return;
>   	}
>
> +	early_printk("NMI watchdog tick %ld\n", jiffies);
> +
>   	__this_cpu_write(hard_watchdog_warn, false);
>   	return;
>   }


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-05  8:53                           ` Arne Jansen
@ 2011-06-05  9:41                             ` Ingo Molnar
  2011-06-05  9:45                               ` Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05  9:41 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Arne Jansen <lists@die-jansens.de> wrote:

> On 05.06.2011 10:17, Ingo Molnar wrote:
> >
> >* Peter Zijlstra<peterz@infradead.org>  wrote:
> >
> >>On Fri, 2011-06-03 at 12:02 +0200, Arne Jansen wrote:
> >>>On 03.06.2011 11:15, Peter Zijlstra wrote:
> >>
> >>>>Anyway, Arne, how long did you wait before power cycling the box? The
> >>>>NMI watchdog should trigger in about a minute or so if it will trigger
> >>>>at all (it's enabled in your config).
> >>>
> >>>No, it doesn't trigger,
> >>
> >>Bummer.
> >
> >Is there no output even when the console is configured to do an
> >earlyprintk? That will allow the NMI watchdog to punch through even a
> >printk or scheduler lockup.
> >
> 
> Just to be clear, I have no boot problems whatsoever. And I have no
> problems with the serial console. It's just the regular printk locking
> up when e.g. I load the test module.

Yes.

> > Arne, you can turn this on via one of these:
> >
> >   earlyprintk=vga,keep
> 
> I don't have access to vga as it is a remote machine.
> 
> >   earlyprintk=serial,ttyS0,115200,keep
> 
> I'll try that.

Please don't forget:

> > Could you also please check with the (untested) patch below applied?
> > This will turn off *all* printk done by the NMI watchdog and switches
> > it to do pure early_printk() - which does not use any locking so it

without that patch, if the lockup is somewhere within printk, the NMI 
watchdog's own output will lock up as well.

Also please first check that the NMI watchdog is ticking. (the patch 
will ensure that, there will be periodic prints to the serial log)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-05  8:17                         ` Ingo Molnar
  2011-06-05  8:53                           ` Arne Jansen
@ 2011-06-05  9:43                           ` Arne Jansen
  2011-06-05  9:55                             ` Ingo Molnar
  1 sibling, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-05  9:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 10:17, Ingo Molnar wrote:
>
> * Peter Zijlstra<peterz@infradead.org>  wrote:
>
>> On Fri, 2011-06-03 at 12:02 +0200, Arne Jansen wrote:
>>> On 03.06.2011 11:15, Peter Zijlstra wrote:
>>
>>>> Anyway, Arne, how long did you wait before power cycling the box? The
>>>> NMI watchdog should trigger in about a minute or so if it will trigger
>>>> at all (it's enabled in your config).
>>>
>>> No, it doesn't trigger,
>>
>> Bummer.
>
> Is there no output even when the console is configured to do an
> earlyprintk? That will allow the NMI watchdog to punch through even a
> printk or scheduler lockup.
>
> Arne, you can turn this on via one of these:
>
>    earlyprintk=vga,keep
>    earlyprintk=serial,ttyS0,115200,keep

My grub conf looks like this now:
kernel /boot/vmlinuz-2.6.39-rc3+ root=LABEL=label panic=15 console=ttyS0,9600 earlyprintk=serial,ttyS0,9600,keep quiet

>
> (the ',keep' portion is important to have it active even after the
> regular console has been switched on.)
>
> Could you also please check with the (untested) patch below applied?
> This will turn off *all* printk done by the NMI watchdog and switches
> it to do pure early_printk() - which does not use any locking so it
> should never lock up.
>
> [ If you keep seeing 'NMI watchdog tick' messages periodically
>    occurring after the lockup then i'll send a more complete patch that
>    shuts off the regular printk path and makes sure that all output is
>    early_printk() based only. ]
>
> earlyprintk=,keep with such a patch has let me down only on the
> rarest of occasions.
>
> ( Arne, please also double check on a working bootup that the NMI
>    watchdog is actually ticking, by checking the NMI counts in
>    /proc/interrupts go up slowly but surely on all CPUs. )

It does, but _very_ slowly. Some CPUs do not count up for tens of
minutes if the machine is idle. If I generate some load like 'make
tags', the counters go up quite quickly.
After 4 minutes and one 'make cscope' it looks like this:
NMI:          8         13         43          5          2          3         22          1   Non-maskable interrupts

But I never see a single tick on console or in dmesg, even when I
replace the early_printk with a printk.

Btw, I get one warning on boot, but it looks irrelevant to me:
[   36.064321] ------------[ cut here ]------------
[   36.064328] WARNING: at kernel/printk.c:293 do_syslog+0xbf/0x550()
[   36.064330] Hardware name: X8SIL
[   36.064331] Attempt to access syslog with CAP_SYS_ADMIN but no CAP_SYSLOG (deprecated).
[   36.064333] Modules linked in: mpt2sas scsi_transport_sas raid_class
[   36.064338] Pid: 21625, comm: syslog-ng Not tainted 2.6.39-rc3+ #8
[   36.064340] Call Trace:
[   36.064344]  [<ffffffff81091f7a>] warn_slowpath_common+0x7a/0xb0
[   36.064347]  [<ffffffff81092051>] warn_slowpath_fmt+0x41/0x50
[   36.064351]  [<ffffffff8109d8a5>] ? ns_capable+0x25/0x60
[   36.064354]  [<ffffffff8109365f>] do_syslog+0xbf/0x550
[   36.064358]  [<ffffffff810c9575>] ? lock_release_holdtime+0x35/0x170
[   36.064362]  [<ffffffff811e17a7>] kmsg_open+0x17/0x20
[   36.064366]  [<ffffffff811d5f46>] proc_reg_open+0xa6/0x180
[   36.064368]  [<ffffffff811e1790>] ? kmsg_release+0x20/0x20
[   36.064371]  [<ffffffff811e1770>] ? read_vmcore+0x1d0/0x1d0
[   36.064374]  [<ffffffff811d5ea0>] ? proc_fill_super+0xb0/0xb0
[   36.064378]  [<ffffffff811790bb>] __dentry_open+0x15b/0x330
[   36.064382]  [<ffffffff8185d6e6>] ? _raw_spin_unlock+0x26/0x30
[   36.064385]  [<ffffffff81179379>] nameidata_to_filp+0x69/0x80
[   36.064388]  [<ffffffff81187a3a>] do_last+0x1da/0x840
[   36.064391]  [<ffffffff81188fdb>] path_openat+0xcb/0x3f0
[   36.064394]  [<ffffffff810ba5c5>] ? sched_clock_cpu+0xc5/0x100
[   36.064397]  [<ffffffff8118944a>] do_filp_open+0x7a/0xa0
[   36.064400]  [<ffffffff8185d6e6>] ? _raw_spin_unlock+0x26/0x30
[   36.064402]  [<ffffffff81196c12>] ? alloc_fd+0xf2/0x140
[   36.064405]  [<ffffffff8117a3d2>] do_sys_open+0x102/0x1e0
[   36.064408]  [<ffffffff8117a4db>] sys_open+0x1b/0x20
[   36.064412]  [<ffffffff81864dbb>] system_call_fastpath+0x16/0x1b
[   36.064414] ---[ end trace df959c735174f5f7 ]---


-Arne

>
> Thanks,
>
> 	Ingo
>

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-05  9:41                             ` Ingo Molnar
@ 2011-06-05  9:45                               ` Ingo Molnar
  0 siblings, 0 replies; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05  9:45 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Arne Jansen <lists@die-jansens.de> wrote:
> 
> > On 05.06.2011 10:17, Ingo Molnar wrote:
> > >
> > >* Peter Zijlstra<peterz@infradead.org>  wrote:
> > >
> > >>On Fri, 2011-06-03 at 12:02 +0200, Arne Jansen wrote:
> > >>>On 03.06.2011 11:15, Peter Zijlstra wrote:
> > >>
> > >>>>Anyway, Arne, how long did you wait before power cycling the box? The
> > >>>>NMI watchdog should trigger in about a minute or so if it will trigger
> > >>>>at all (it's enabled in your config).
> > >>>
> > >>>No, it doesn't trigger,
> > >>
> > >>Bummer.
> > >
> > >Is there no output even when the console is configured to do an
> > >earlyprintk? That will allow the NMI watchdog to punch through even a
> > >printk or scheduler lockup.
> > >
> > 
> > Just to be clear, I have no boot problems whatsoever. And I have no
> > problems with the serial console. It's just the regular printk locking
> > up when e.g. I load the test module.
> 
> Yes.
> 
> > > Arne, you can turn this on via one of these:
> > >
> > >   earlyprintk=vga,keep
> > 
> > I don't have access to vga as it is a remote machine.
> > 
> > >   earlyprintk=serial,ttyS0,115200,keep
> > 
> > I'll try that.
> 
> Please don't forget:
> 
> > > Could you also please check with the (untested) patch below applied?
> > > This will turn off *all* printk done by the NMI watchdog and switches
> > > it to do pure early_printk() - which does not use any locking so it
> 
> if you get a lockup somewhere within printk then the NMI watchdog 
> will lock up.

Please use the updated patch below - the first one wasn't informative 
enough and it would stop 'ticking' after a hard lockup - not good :-)

With the patch below applied you should get periodic printouts from 
the NMI watchdog both before and after the hard lockup.

If the NMI watchdog does not stop ticking after the lockup i'll send 
a more complete patch that allows the printout of a backtrace on 
every CPU, after the lockup.

Thanks,

	Ingo
--
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 3d0c56a..d335bc7 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -216,6 +216,8 @@ static void watchdog_overflow_callback(struct perf_event *event, int nmi,
 	/* Ensure the watchdog never gets throttled */
 	event->hw.interrupts = 0;
 
+	early_printk("CPU #%d NMI watchdog tick %ld\n", smp_processor_id(), jiffies);
+
 	if (__this_cpu_read(watchdog_nmi_touch) == true) {
 		__this_cpu_write(watchdog_nmi_touch, false);
 		return;
@@ -234,11 +236,6 @@ static void watchdog_overflow_callback(struct perf_event *event, int nmi,
 		if (__this_cpu_read(hard_watchdog_warn) == true)
 			return;
 
-		if (hardlockup_panic)
-			panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
-		else
-			WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu);
-
 		__this_cpu_write(hard_watchdog_warn, true);
 		return;
 	}

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-05  9:43                           ` Arne Jansen
@ 2011-06-05  9:55                             ` Ingo Molnar
  2011-06-05 10:22                               ` Arne Jansen
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05  9:55 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Arne Jansen <lists@die-jansens.de> wrote:

> >( Arne, please also double check on a working bootup that the NMI
> >   watchdog is actually ticking, by checking the NMI counts in
> >   /proc/interrupts go up slowly but surely on all CPUs. )
> 
> It does, but _very_ slowly. Some CPUs do not count up for tens of
> minutes if the machine is idle. If I generate some load like 'make
> tags', the counters go up quite quickly.
> After 4 minutes and one 'make cscope' it looks like this:
> NMI:          8         13         43          5          2          3         22          1   Non-maskable interrupts
> 
> But I never see a single tick on console or in dmesg, even when I
> replace the early_printk with a printk.

hm, that might be because the NMI watchdog ticks on (unhalted) CPU 
cycles, and those stop while a CPU sits idle in the halt state.

That's not a problem (the kernel cannot lock up while there are no 
cycles ticking) but nevertheless could you work this around please
by starting 8 infinite shell loops:

   for ((i=0; i<8; i++)); do while : ; do : ; done & done

?

This will saturate all cores and makes sure the NMI watchdog is 
ticking everywhere.

Hopefully this won't make the bug go away :-)

This will remove one factor of uncertainty (of whether the NMI watchdog 
is working or not), so it simplifies debugging.

> [   36.064321] ------------[ cut here ]------------
> [   36.064328] WARNING: at kernel/printk.c:293 do_syslog+0xbf/0x550()
> [   36.064330] Hardware name: X8SIL
> [   36.064331] Attempt to access syslog with CAP_SYS_ADMIN but no CAP_SYSLOG (deprecated).

Yeah, unrelated, and a rather annoying-looking warning, at that. The 
warning is borderline correct (it's messy to drop CAP_SYSLOG but keep 
CAP_SYS_ADMIN) but still, if we warned every time userspace relies on 
something the kernel provided in the past, in a somewhat messy way, 
we'd never complete bootup i guess ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-05  9:55                             ` Ingo Molnar
@ 2011-06-05 10:22                               ` Arne Jansen
  2011-06-05 11:01                                 ` Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-05 10:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 11:55, Ingo Molnar wrote:
>
> * Arne Jansen<lists@die-jansens.de>  wrote:
>
>>> ( Arne, please also double check on a working bootup that the NMI
>>>    watchdog is actually ticking, by checking the NMI counts in
>>>    /proc/interrupts go up slowly but surely on all CPUs. )
>>
>> It does, but _very_ slowly. Some CPUs do not count up for tens of
>> minutes if the machine is idle. If I generate some load like 'make
>> tags', the counters go up quite quickly.
>> After 4 minutes and one 'make cscope' it looks like this:
>> NMI:          8         13         43          5          2          3         22          1   Non-maskable interrupts
>>
>> But I never see a single tick on console or in dmesg, even when I
>> replace the early_printk with a printk.
>
> hm, that might be because the NMI watchdog ticks on (unhalted) CPU
> cycles, and those stop while a CPU sits idle in the halt state.
>
> That's not a problem (the kernel cannot lock up while there are no
> cycles ticking) but nevertheless could you work this around please
> by starting 8 infinite shell loops:
>
>     for ((i=0; i<8; i++)); do while : ; do : ; done&  done
>
> ?
>
> This will saturate all cores and makes sure the NMI watchdog is
> ticking everywhere.
>
> Hopefully this won't make the bug go away :-)
>

OK, now we get going. I get the ticks, the bug is still there, and
all CPUs still tick after the lockup. I also added an early_printk
inside the lockup-if, and it reports hard lockups. At first for only
one or two CPUs, and after some time all CPUs are locked up.

-Arne

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
  2011-06-05 10:22                               ` Arne Jansen
@ 2011-06-05 11:01                                 ` Ingo Molnar
  2011-06-05 11:19                                   ` [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05 11:01 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Arne Jansen <lists@die-jansens.de> wrote:

> On 05.06.2011 11:55, Ingo Molnar wrote:
> >
> >* Arne Jansen<lists@die-jansens.de>  wrote:
> >
> >>>( Arne, please also double check on a working bootup that the NMI
> >>>   watchdog is actually ticking, by checking the NMI counts in
> >>>   /proc/interrupts go up slowly but surely on all CPUs. )
> >>
> >>It does, but _very_ slowly. Some CPUs do not count up for tens of
> >>minutes if the machine is idle. If I generate some load like 'make
> >>tags', the counters go up quite quickly.
> >>After 4 minutes and one 'make cscope' it looks like this:
> >>NMI:          8         13         43          5          2          3         22          1   Non-maskable interrupts
> >>
> >>But I never see a single tick on console or in dmesg, even when I
> >>replace the early_printk with a printk.
> >
> >hm, that might be because the NMI watchdog ticks on (unhalted) CPU
> >cycles, and those stop while a CPU sits idle in the halt state.
> >
> >That's not a problem (the kernel cannot lock up while there are no
> >cycles ticking) but nevertheless could you work this around please
> >by starting 8 infinite shell loops:
> >
> >    for ((i=0; i<8; i++)); do while : ; do : ; done&  done
> >
> >?
> >
> >This will saturate all cores and makes sure the NMI watchdog is
> >ticking everywhere.
> >
> >Hopefully this won't make the bug go away :-)
> >
> 
> OK, now we get going. I get the ticks, the bug is still there, and
> all CPUs still tick after the lockup. I also added an early_printk
> inside the lockup-if, and it reports hard lockups. At first for only
> one or two CPUs, and after some time all CPUs are locked up.

Very good!

If you add a dump_stack() do you get a stacktrace, or do the NMI 
watchdog ticks stop?

If the ticks stop this suggests a lockup within the printk code. If 
you get a stack dump then we'll have good debug data.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 11:01                                 ` Ingo Molnar
@ 2011-06-05 11:19                                   ` Ingo Molnar
  2011-06-05 11:36                                     ` Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05 11:19 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Ingo Molnar <mingo@elte.hu> wrote:

> If the ticks stop this suggests a lockup within the printk code.
> [...]

In which case the printk-killswitch patch below (to be applied 
*instead* of the previous debugging patch i sent) should provide the 
desired NMI watchdog output on the serial console.

Warning: it's entirely untested.

Thanks,

	Ingo

 arch/x86/kernel/early_printk.c |    2 +-
 include/linux/printk.h         |    4 ++++
 kernel/printk.c                |   18 ++++++++++++++++++
 kernel/watchdog.c              |    7 +++++++
 4 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/early_printk.c b/arch/x86/kernel/early_printk.c
index cd28a35..d75fd66 100644
--- a/arch/x86/kernel/early_printk.c
+++ b/arch/x86/kernel/early_printk.c
@@ -171,7 +171,7 @@ static struct console early_serial_console = {
 
 /* Direct interface for emergencies */
 static struct console *early_console = &early_vga_console;
-static int __initdata early_console_initialized;
+int early_console_initialized;
 
 asmlinkage void early_printk(const char *fmt, ...)
 {
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 0101d55..7393291 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -88,6 +88,8 @@ int no_printk(const char *fmt, ...)
 	return 0;
 }
 
+extern int early_console_initialized;
+
 extern asmlinkage __attribute__ ((format (printf, 1, 2)))
 void early_printk(const char *fmt, ...);
 
@@ -114,6 +116,8 @@ extern int printk_delay_msec;
 extern int dmesg_restrict;
 extern int kptr_restrict;
 
+extern void printk_kill(void);
+
 void log_buf_kexec_setup(void);
 void __init setup_log_buf(int early);
 #else
diff --git a/kernel/printk.c b/kernel/printk.c
index 3518539..f6193e1 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -519,6 +519,19 @@ static void __call_console_drivers(unsigned start, unsigned end)
 	}
 }
 
+/*
+ * This is independent of any log levels - a global
+ * kill switch that turns off all of printk.
+ *
+ * Used by the NMI watchdog if early-printk is enabled.
+ */
+static int __read_mostly printk_killswitch;
+
+void printk_kill(void)
+{
+	printk_killswitch = 1;
+}
+
 static int __read_mostly ignore_loglevel;
 
 static int __init ignore_loglevel_setup(char *str)
@@ -833,6 +846,10 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	size_t plen;
 	char special;
 
+	/* Return early if a debugging subsystem has killed printk output: */
+	if (unlikely(printk_killswitch))
+		return 1;
+
 	boot_delay_msec();
 	printk_delay();
 
@@ -1533,6 +1550,7 @@ void register_console(struct console *newcon)
 		for_each_console(bcon)
 			if (bcon->flags & CON_BOOT)
 				unregister_console(bcon);
+		early_console_initialized = 0;
 	} else {
 		printk(KERN_INFO "%sconsole [%s%d] enabled\n",
 			(newcon->flags & CON_BOOT) ? "boot" : "" ,
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 3d0c56a..6e9b109 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -234,6 +234,13 @@ static void watchdog_overflow_callback(struct perf_event *event, int nmi,
 		if (__this_cpu_read(hard_watchdog_warn) == true)
 			return;
 
+		/*
+		 * If early-printk is enabled then make sure we do not
+		 * lock up in printk() and kill console logging:
+		 */
+		if (early_console_initialized)
+			printk_kill();
+
 		if (hardlockup_panic)
 			panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
 		else

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 11:19                                   ` [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages Ingo Molnar
@ 2011-06-05 11:36                                     ` Ingo Molnar
  2011-06-05 11:57                                       ` Arne Jansen
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05 11:36 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Ingo Molnar <mingo@elte.hu> wrote:

> > If the ticks stop this suggests a lockup within the printk code. 
> > [...]
> 
> In which case the printk-killswitch patch below (to be applied 
> *instead* of the previous debugging patch i sent) should provide 
> the desired NMI watchdog output on the serial console.
> 
> Warning: it's entirely untested.

Note, since this is an SMP box, if the lockup messages show up with 
this patch but are mixed up with each other then adding a spinlock 
around the WARN() would probably help keeping the output serialized.

A simple:

 static DEFINE_SPINLOCK(watchdog_output_lock);

 ...
	spin_lock(&watchdog_output_lock);
 ...
	[ the WARN_ON() logic. ]
 ...
	spin_unlock(&watchdog_output_lock);
 ...

would suffice.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 11:36                                     ` Ingo Molnar
@ 2011-06-05 11:57                                       ` Arne Jansen
  2011-06-05 13:39                                         ` Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-05 11:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 13:36, Ingo Molnar wrote:
>
> * Ingo Molnar<mingo@elte.hu>  wrote:
>
>>> If the ticks stop this suggests a lockup within the printk code.
>>> [...]
>>
>> In which case the printk-killswitch patch below (to be applied
>> *instead* of the previous debugging patch i sent) should provide
>> the desired NMI watchdog output on the serial console.
>>
>> Warning: it's entirely untested.

How is the output supposed to come through? Shouldn't printk revert
to early_printk instead of just returning?


>
> Note, since this is an SMP box, if the lockup messages show up with
> this patch but are mixed up with each other then adding a spinlock
> around the WARN() would probably help keeping the output serialized.
>
> A simple:
>
>   static DEFINE_SPINLOCK(watchdog_output_lock);
>
>   ...
> 	spin_lock(&watchdog_output_lock);
>   ...
> 	[ the WARN_ON() logic. ]
>   ...
> 	spin_unlock(&watchdog_output_lock);
>   ...
>
> would suffice.
>
> Thanks,
>
> 	Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 11:57                                       ` Arne Jansen
@ 2011-06-05 13:39                                         ` Ingo Molnar
  2011-06-05 13:54                                           ` Arne Jansen
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05 13:39 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Arne Jansen <lists@die-jansens.de> wrote:

> >> Warning: it's entirely untested.
> 
> How is the output supposed to come through? shouldn't printk revert 
> to early_printk instead of just returning?

oh, right you are.

Does the patch below work? It does early-printk within printk().

Thanks,

	Ingo

diff --git a/arch/x86/kernel/early_printk.c b/arch/x86/kernel/early_printk.c
index cd28a35..211d8c2 100644
--- a/arch/x86/kernel/early_printk.c
+++ b/arch/x86/kernel/early_printk.c
@@ -170,8 +170,8 @@ static struct console early_serial_console = {
 };
 
 /* Direct interface for emergencies */
-static struct console *early_console = &early_vga_console;
-static int __initdata early_console_initialized;
+struct console *early_console = &early_vga_console;
+int early_console_initialized;
 
 asmlinkage void early_printk(const char *fmt, ...)
 {
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 0101d55..414dc34 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -88,6 +88,9 @@ int no_printk(const char *fmt, ...)
 	return 0;
 }
 
+extern struct console *early_console;
+extern int early_console_initialized;
+
 extern asmlinkage __attribute__ ((format (printf, 1, 2)))
 void early_printk(const char *fmt, ...);
 
@@ -114,6 +117,8 @@ extern int printk_delay_msec;
 extern int dmesg_restrict;
 extern int kptr_restrict;
 
+extern void printk_kill(void);
+
 void log_buf_kexec_setup(void);
 void __init setup_log_buf(int early);
 #else
diff --git a/kernel/printk.c b/kernel/printk.c
index 3518539..50684e3 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -519,6 +519,19 @@ static void __call_console_drivers(unsigned start, unsigned end)
 	}
 }
 
+/*
+ * This is independent of any log levels - a global
+ * kill switch that turns off all of printk.
+ *
+ * Used by the NMI watchdog if early-printk is enabled.
+ */
+static int __read_mostly printk_killswitch;
+
+void printk_kill(void)
+{
+	printk_killswitch = 1;
+}
+
 static int __read_mostly ignore_loglevel;
 
 static int __init ignore_loglevel_setup(char *str)
@@ -833,6 +846,16 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	size_t plen;
 	char special;
 
+	/* Return early if a debugging subsystem has killed printk output: */
+	if (unlikely(printk_killswitch)) {
+		char buf[512];
+
+		printed_len = vscnprintf(buf, sizeof(buf), fmt, args);
+		early_console->write(early_console, buf, printed_len);
+
+		return printed_len;
+	}
+
 	boot_delay_msec();
 	printk_delay();
 
@@ -1533,6 +1556,7 @@ void register_console(struct console *newcon)
 		for_each_console(bcon)
 			if (bcon->flags & CON_BOOT)
 				unregister_console(bcon);
+		early_console_initialized = 0;
 	} else {
 		printk(KERN_INFO "%sconsole [%s%d] enabled\n",
 			(newcon->flags & CON_BOOT) ? "boot" : "" ,
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 3d0c56a..6e9b109 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -234,6 +234,13 @@ static void watchdog_overflow_callback(struct perf_event *event, int nmi,
 		if (__this_cpu_read(hard_watchdog_warn) == true)
 			return;
 
+		/*
+		 * If early-printk is enabled then make sure we do not
+		 * lock up in printk() and kill console logging:
+		 */
+		if (early_console_initialized)
+			printk_kill();
+
 		if (hardlockup_panic)
 			panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
 		else

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 13:39                                         ` Ingo Molnar
@ 2011-06-05 13:54                                           ` Arne Jansen
  2011-06-05 14:06                                             ` Ingo Molnar
  2011-06-05 14:10                                             ` Ingo Molnar
  0 siblings, 2 replies; 152+ messages in thread
From: Arne Jansen @ 2011-06-05 13:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 15:39, Ingo Molnar wrote:
>
> * Arne Jansen<lists@die-jansens.de>  wrote:
>
>>>> Warning: it's entirely untested.
>>
>> How is the output supposed to come through? Shouldn't printk revert
>> to early_printk instead of just returning?
>
> oh, right you are.
>
> Does the patch below work? It does early-printk within printk().

Too late, I already built my own early_vprintk ;)

Here we go:

http://eischnee.de/lockup.txt

Now it's your turn :)

-Arne

>
> Thanks,
>
> 	Ingo
>
> diff --git a/arch/x86/kernel/early_printk.c b/arch/x86/kernel/early_printk.c
> index cd28a35..211d8c2 100644
> --- a/arch/x86/kernel/early_printk.c
> +++ b/arch/x86/kernel/early_printk.c
> @@ -170,8 +170,8 @@ static struct console early_serial_console = {
>   };
>
>   /* Direct interface for emergencies */
> -static struct console *early_console =&early_vga_console;
> -static int __initdata early_console_initialized;
> +struct console *early_console =&early_vga_console;
> +int early_console_initialized;
>
>   asmlinkage void early_printk(const char *fmt, ...)
>   {
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 0101d55..414dc34 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -88,6 +88,9 @@ int no_printk(const char *fmt, ...)
>   	return 0;
>   }
>
> +extern struct console *early_console;
> +extern int early_console_initialized;
> +
>   extern asmlinkage __attribute__ ((format (printf, 1, 2)))
>   void early_printk(const char *fmt, ...);
>
> @@ -114,6 +117,8 @@ extern int printk_delay_msec;
>   extern int dmesg_restrict;
>   extern int kptr_restrict;
>
> +extern void printk_kill(void);
> +
>   void log_buf_kexec_setup(void);
>   void __init setup_log_buf(int early);
>   #else
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 3518539..50684e3 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -519,6 +519,19 @@ static void __call_console_drivers(unsigned start, unsigned end)
>   	}
>   }
>
> +/*
> + * This is independent of any log levels - a global
> + * kill switch that turns off all of printk.
> + *
> + * Used by the NMI watchdog if early-printk is enabled.
> + */
> +static int __read_mostly printk_killswitch;
> +
> +void printk_kill(void)
> +{
> +	printk_killswitch = 1;
> +}
> +
>   static int __read_mostly ignore_loglevel;
>
>   static int __init ignore_loglevel_setup(char *str)
> @@ -833,6 +846,16 @@ asmlinkage int vprintk(const char *fmt, va_list args)
>   	size_t plen;
>   	char special;
>
> +	/* Return early if a debugging subsystem has killed printk output: */
> +	if (unlikely(printk_killswitch)) {
> +		char buf[512];
> +
> +		printed_len = vscnprintf(buf, sizeof(buf), fmt, args);
> +		early_console->write(early_console, buf, printed_len);
> +
> +		return printed_len;
> +	}
> +
>   	boot_delay_msec();
>   	printk_delay();
>
> @@ -1533,6 +1556,7 @@ void register_console(struct console *newcon)
>   		for_each_console(bcon)
> 			if (bcon->flags & CON_BOOT)
>   				unregister_console(bcon);
> +		early_console_initialized = 0;
>   	} else {
>   		printk(KERN_INFO "%sconsole [%s%d] enabled\n",
> 			(newcon->flags & CON_BOOT) ? "boot" : "" ,
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 3d0c56a..6e9b109 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -234,6 +234,13 @@ static void watchdog_overflow_callback(struct perf_event *event, int nmi,
>   		if (__this_cpu_read(hard_watchdog_warn) == true)
>   			return;
>
> +		/*
> +		 * If early-printk is enabled then make sure we do not
> +		 * lock up in printk() and kill console logging:
> +		 */
> +		if (early_console_initialized)
> +			printk_kill();
> +
>   		if (hardlockup_panic)
>   			panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
>   		else
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 13:54                                           ` Arne Jansen
@ 2011-06-05 14:06                                             ` Ingo Molnar
  2011-06-05 14:45                                               ` Arne Jansen
  2011-06-05 14:10                                             ` Ingo Molnar
  1 sibling, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05 14:06 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Arne Jansen <lists@die-jansens.de> wrote:

> On 05.06.2011 15:39, Ingo Molnar wrote:
> >
> >* Arne Jansen <lists@die-jansens.de> wrote:
> >
> >>>>Warning: it's entirely untested.
> >>
> >>How is the output supposed to come through? shouldn't printk revert
> >>to early_printk instead of just returning?
> >
> >oh, right you are.
> >
> >Does the patch below work? It does early-printk within printk().
> 
> Too late, I already built my own early_vprintk ;)

heh :-)

Mind posting the patch? Your tested patch is infinitely more valuable 
than my not-even-build-tested patch ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 13:54                                           ` Arne Jansen
  2011-06-05 14:06                                             ` Ingo Molnar
@ 2011-06-05 14:10                                             ` Ingo Molnar
  2011-06-05 14:31                                               ` Arne Jansen
  1 sibling, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05 14:10 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Arne Jansen <lists@die-jansens.de> wrote:

> Here we go:
> 
> http://eischnee.de/lockup.txt
> 
> Now it's your turn :)

So the lockup is in:

 [<ffffffff813af2d9>] do_raw_spin_lock+0x129/0x170
 [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
 [<ffffffff8185ce71>] _raw_spin_lock+0x51/0x70
 [<ffffffff81092df6>] ? vprintk+0x76/0x4a0
 [<ffffffff81092df6>] vprintk+0x76/0x4a0
 [<ffffffff810c5f8d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
 [<ffffffff81859e19>] printk+0x63/0x65
 [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
 [<ffffffff81091f98>] warn_slowpath_common+0x38/0xb0
 [<ffffffff81092025>] warn_slowpath_null+0x15/0x20
 [<ffffffff8108a4bd>] try_to_wake_up+0x29d/0x350
 [<ffffffff8108a5a0>] wake_up_process+0x10/0x20
 [<ffffffff8185c071>] __up+0x41/0x50
 [<ffffffff810b937c>] up+0x3c/0x50
 [<ffffffff81092a36>] console_unlock+0x1a6/0x200
 [<ffffffff81092f86>] vprintk+0x206/0x4a0
 [<ffffffff810c5f8d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff810ba6db>] ? local_clock+0x4b/0x60
 [<ffffffffa0012000>] ? 0xffffffffa0011fff
 [<ffffffff81859e19>] printk+0x63/0x65
 [<ffffffffa001201d>] test_init+0x1d/0x2b [test]
 [<ffffffff810001ce>] do_one_initcall+0x3e/0x170

Somehow we end up generating a WARN_ON() within a printk()'s 
try_to_wake_up() and predictably we lock up on that ...
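The recursion in the trace above can be sketched in userspace (illustration only, not kernel code, all names hypothetical): vprintk() takes a non-reentrant lock, its console path ends up in try_to_wake_up(), the WARN_ON() there calls printk() again, and the inner call spins on the lock the outer call still holds. A flag stands in for the spinlock so the re-entry is detected instead of hanging:

```c
#include <stdio.h>

static int logbuf_locked;  /* stands in for the logbuf spinlock */
static int lockups;        /* counts would-be deadlocks */

static int fake_printk(const char *msg);

/* stands in for console_unlock() -> up() -> try_to_wake_up(),
 * where the WARN_ON() recursed into printk() */
static void fake_console_unlock(void)
{
	fake_printk("WARNING: at kernel/sched.c try_to_wake_up\n");
}

static int fake_printk(const char *msg)
{
	if (logbuf_locked) {
		lockups++;  /* a real spinlock would spin here forever */
		return -1;
	}
	logbuf_locked = 1;
	fputs(msg, stdout);
	fake_console_unlock();
	logbuf_locked = 0;
	return 0;
}
```

The outer call succeeds, but the inner, recursive one hits the already-held lock - which is exactly the point where the real machine wedges.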

Peter?

Arne, mind helping a bit with:

 [<ffffffff81091f98>] warn_slowpath_common+0x38/0xb0
 [<ffffffff81092025>] warn_slowpath_null+0x15/0x20
 [<ffffffff8108a4bd>] try_to_wake_up+0x29d/0x350

which WARN_ON() does that correspond to in try_to_wake_up()?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 14:10                                             ` Ingo Molnar
@ 2011-06-05 14:31                                               ` Arne Jansen
  2011-06-05 15:13                                                 ` Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-05 14:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 16:10, Ingo Molnar wrote:
>
> * Arne Jansen <lists@die-jansens.de> wrote:
>
>> Here we go:
>>
>> http://eischnee.de/lockup.txt
>>
>> Now it's your turn :)
>
> So the lockup is in:
>
>   [<ffffffff813af2d9>] do_raw_spin_lock+0x129/0x170
>   [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
>   [<ffffffff8185ce71>] _raw_spin_lock+0x51/0x70
>   [<ffffffff81092df6>] ? vprintk+0x76/0x4a0
>   [<ffffffff81092df6>] vprintk+0x76/0x4a0
>   [<ffffffff810c5f8d>] ? trace_hardirqs_off+0xd/0x10
>   [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
>   [<ffffffff81859e19>] printk+0x63/0x65
>   [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
>   [<ffffffff81091f98>] warn_slowpath_common+0x38/0xb0
>   [<ffffffff81092025>] warn_slowpath_null+0x15/0x20
>   [<ffffffff8108a4bd>] try_to_wake_up+0x29d/0x350
>   [<ffffffff8108a5a0>] wake_up_process+0x10/0x20
>   [<ffffffff8185c071>] __up+0x41/0x50
>   [<ffffffff810b937c>] up+0x3c/0x50
>   [<ffffffff81092a36>] console_unlock+0x1a6/0x200
>   [<ffffffff81092f86>] vprintk+0x206/0x4a0
>   [<ffffffff810c5f8d>] ? trace_hardirqs_off+0xd/0x10
>   [<ffffffff810ba6db>] ? local_clock+0x4b/0x60
>   [<ffffffffa0012000>] ? 0xffffffffa0011fff
>   [<ffffffff81859e19>] printk+0x63/0x65
>   [<ffffffffa001201d>] test_init+0x1d/0x2b [test]
>   [<ffffffff810001ce>] do_one_initcall+0x3e/0x170
>
> Somehow we end up generating a WARN_ON() within a printk()'s
> try_to_wake_up() and predictably we lock up on that ...
>
> Peter?
>
> Arne, mind helping a bit with:
>
>   [<ffffffff81091f98>] warn_slowpath_common+0x38/0xb0
>   [<ffffffff81092025>] warn_slowpath_null+0x15/0x20
>   [<ffffffff8108a4bd>] try_to_wake_up+0x29d/0x350
>
> which WARN_ON() does that correspond to in try_to_wake_up()?

(gdb) info line *0xffffffff8108a4bd
Line 934 of "kernel/sched.c" starts at address 0xffffffff8108a498 
<try_to_wake_up+632> and ends at 0xffffffff8108a4c8 <try_to_wake_up+680>.

sched.c:934: in function __task_rq_lock
         lockdep_assert_held(&p->pi_lock);

I'm currently testing on commit 0122ec5b02f766c355b3168d.
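The assertion above can be mimicked in userspace (a sketch only, with hypothetical names; lockdep's real bookkeeping is far richer): lockdep_assert_held() warns when the current context does not hold the given lock, here modeled by a per-lock "held" flag and a warning counter in place of WARN_ON():

```c
#include <stdio.h>

struct fake_lock { int held; };

static int fake_warnings;

static void fake_lockdep_assert_held(struct fake_lock *l)
{
	if (!l->held) {
		fake_warnings++;  /* kernel: WARN_ON() -> printk() */
		fputs("WARNING: lock not held\n", stderr);
	}
}

/* models __task_rq_lock(), which asserts that the caller holds
 * p->pi_lock before taking the runqueue lock */
static void fake_task_rq_lock(struct fake_lock *pi_lock)
{
	fake_lockdep_assert_held(pi_lock);
}
```

Calling fake_task_rq_lock() with the flag clear fires the warning once; with the flag set it stays silent, matching what the kernel check is supposed to verify.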

-Arne

>
> Thanks,
>
> 	Ingo


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 14:06                                             ` Ingo Molnar
@ 2011-06-05 14:45                                               ` Arne Jansen
  0 siblings, 0 replies; 152+ messages in thread
From: Arne Jansen @ 2011-06-05 14:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

[-- Attachment #1: Type: text/plain, Size: 710 bytes --]

On 05.06.2011 16:06, Ingo Molnar wrote:
>
> * Arne Jansen <lists@die-jansens.de> wrote:
>
>> On 05.06.2011 15:39, Ingo Molnar wrote:
>>>
>>> * Arne Jansen <lists@die-jansens.de> wrote:
>>>
>>>>>> Warning: it's entirely untested.
>>>>
>>>> How is the output supposed to come through? shouldn't printk revert
>>>> to early_printk instead of just returning?
>>>
>>> oh, right you are.
>>>
>>> Does the patch below work? It does early-printk within printk().
>>
>> Too late, I already built my own early_vprintk ;)
>
> heh :-)
>
> Mind posting the patch? Your tested patch is infinitely more valuable
> than my not-even-build-tested patch ;-)

It's basically just your patch. I attached it nevertheless...

-Arne


[-- Attachment #2: printk_kill.patch --]
[-- Type: text/x-patch, Size: 3938 bytes --]

diff --git a/arch/x86/kernel/early_printk.c b/arch/x86/kernel/early_printk.c
index cd28a35..0623126 100644
--- a/arch/x86/kernel/early_printk.c
+++ b/arch/x86/kernel/early_printk.c
@@ -171,7 +171,7 @@ static struct console early_serial_console = {
 
 /* Direct interface for emergencies */
 static struct console *early_console = &early_vga_console;
-static int __initdata early_console_initialized;
+int early_console_initialized;
 
 asmlinkage void early_printk(const char *fmt, ...)
 {
@@ -185,6 +185,15 @@ asmlinkage void early_printk(const char *fmt, ...)
 	va_end(ap);
 }
 
+asmlinkage void early_vprintk(const char *fmt, va_list ap)
+{
+	char buf[512];
+	int n;
+
+	n = vscnprintf(buf, sizeof(buf), fmt, ap);
+	early_console->write(early_console, buf, n);
+}
+
 static inline void early_console_register(struct console *con, int keep_early)
 {
 	if (early_console->index != -1) {
diff --git a/include/linux/printk.h b/include/linux/printk.h
index ee048e7..6bb6963 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -86,8 +86,11 @@ int no_printk(const char *fmt, ...)
 	return 0;
 }
 
+extern int early_console_initialized;
+
 extern asmlinkage __attribute__ ((format (printf, 1, 2)))
 void early_printk(const char *fmt, ...);
+void early_vprintk(const char *fmt, va_list args);
 
 extern int printk_needs_cpu(int cpu);
 extern void printk_tick(void);
@@ -112,6 +115,8 @@ extern int printk_delay_msec;
 extern int dmesg_restrict;
 extern int kptr_restrict;
 
+extern void printk_kill(void);
+
 void log_buf_kexec_setup(void);
 #else
 static inline __attribute__ ((format (printf, 1, 0)))
diff --git a/kernel/printk.c b/kernel/printk.c
index da8ca81..38f880f 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -490,6 +490,19 @@ static void __call_console_drivers(unsigned start, unsigned end)
 	}
 }
 
+/*
+ * This is independent of any log levels - a global
+ * kill switch that turns off all of printk.
+ *
+ * Used by the NMI watchdog if early-printk is enabled.
+ */
+static int __read_mostly printk_killswitch;
+
+void printk_kill(void)
+{
+	printk_killswitch = 1;
+}
+
 static int __read_mostly ignore_loglevel;
 
 static int __init ignore_loglevel_setup(char *str)
@@ -804,6 +817,15 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	size_t plen;
 	char special;
 
+	/*
+	 * Fall back to early_printk if a debugging subsystem has
+	 * killed printk output
+	 */
+	if (unlikely(printk_killswitch)) {
+		early_vprintk(fmt, args);
+		return 1;
+	}
+
 	boot_delay_msec();
 	printk_delay();
 
@@ -1504,6 +1526,7 @@ void register_console(struct console *newcon)
 		for_each_console(bcon)
 			if (bcon->flags & CON_BOOT)
 				unregister_console(bcon);
+		early_console_initialized = 0;
 	} else {
 		printk(KERN_INFO "%sconsole [%s%d] enabled\n",
 			(newcon->flags & CON_BOOT) ? "boot" : "" ,
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 140dce7..18fca3d 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -197,6 +197,8 @@ static struct perf_event_attr wd_hw_attr = {
 	.disabled	= 1,
 };
 
+static DEFINE_SPINLOCK(watchdog_output_lock);
+
 /* Callback function for perf event subsystem */
 static void watchdog_overflow_callback(struct perf_event *event, int nmi,
 		 struct perf_sample_data *data,
@@ -223,10 +225,20 @@ static void watchdog_overflow_callback(struct perf_event *event, int nmi,
 		if (__this_cpu_read(hard_watchdog_warn) == true)
 			return;
 
-		if (hardlockup_panic)
+		/*
+		 * If early-printk is enabled then make sure we do not
+		 * lock up in printk() and kill console logging:
+		 */
+		if (early_console_initialized)
+			printk_kill();
+
+		if (hardlockup_panic) {
 			panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
-		else
+		} else {
+			spin_lock(&watchdog_output_lock);
 			WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu);
+			spin_unlock(&watchdog_output_lock);
+		}
 
 		__this_cpu_write(hard_watchdog_warn, true);
 		return;

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 14:31                                               ` Arne Jansen
@ 2011-06-05 15:13                                                 ` Ingo Molnar
  2011-06-05 15:26                                                   ` Ingo Molnar
                                                                     ` (2 more replies)
  0 siblings, 3 replies; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05 15:13 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Arne Jansen <lists@die-jansens.de> wrote:

> sched.c:934: in function __task_rq_lock
>         lockdep_assert_held(&p->pi_lock);

Oh. Could you remove that line with the patch below - does it result 
in a working system?

Now, this patch alone just removes a debugging check - but i'm not 
sure the debugging check is correct - we take the pi_lock in a raw 
way - which means it's not lockdep covered.

So how can lockdep_assert_held() be called on it?

Thanks,

	Ingo

diff --git a/kernel/sched.c b/kernel/sched.c
index fd18f39..a32316b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -938,8 +938,6 @@ static inline struct rq *__task_rq_lock(struct task_struct *p)
 {
 	struct rq *rq;
 
-	lockdep_assert_held(&p->pi_lock);
-
 	for (;;) {
 		rq = task_rq(p);
 		raw_spin_lock(&rq->lock);

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 15:13                                                 ` Ingo Molnar
@ 2011-06-05 15:26                                                   ` Ingo Molnar
  2011-06-05 15:32                                                     ` Ingo Molnar
  2011-06-06  7:34                                                     ` Arne Jansen
  2011-06-05 15:34                                                   ` Arne Jansen
  2011-06-06  8:38                                                   ` Peter Zijlstra
  2 siblings, 2 replies; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05 15:26 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Arne Jansen <lists@die-jansens.de> wrote:
> 
> > sched.c:934: in function __task_rq_lock
> >         lockdep_assert_held(&p->pi_lock);
> 
> Oh. Could you remove that line with the patch below - does it result 
> in a working system?
> 
> Now, this patch alone just removes a debugging check - but i'm not 
> sure the debugging check is correct - we take the pi_lock in a raw 
> way - which means it's not lockdep covered.
> 
> So how can lockdep_assert_held() be called on it?

Ok, i'm wrong there - it's lockdep covered.

I also reviewed all the __task_rq_lock() call sites and each of them 
has the pi_lock acquired. So unless both Peter and me are blind, the 
other option would be some sort of memory corruption corrupting the 
runqueue.

But ... that looks so unlikely here, it's clearly heavy printk() and 
console_sem twiddling that triggers the bug, not any other scheduler 
activity.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 15:26                                                   ` Ingo Molnar
@ 2011-06-05 15:32                                                     ` Ingo Molnar
  2011-06-05 16:07                                                       ` Arne Jansen
  2011-06-06  7:34                                                     ` Arne Jansen
  1 sibling, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05 15:32 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


one more thing, could you please add this call:

	debug_show_all_locks();

to after the WARN(), in watchdog.c?

Please surround the whole printout portion by the 
spin_lock()/unlock() protection code i suggested, full-lock-state 
printouts are slow and other CPUs might start printing their NMI 
ticks ...

With the all-locks-printed output we can double check what locks are 
held.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 15:13                                                 ` Ingo Molnar
  2011-06-05 15:26                                                   ` Ingo Molnar
@ 2011-06-05 15:34                                                   ` Arne Jansen
  2011-06-06  8:38                                                   ` Peter Zijlstra
  2 siblings, 0 replies; 152+ messages in thread
From: Arne Jansen @ 2011-06-05 15:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 17:13, Ingo Molnar wrote:
>
> * Arne Jansen <lists@die-jansens.de> wrote:
>
>> sched.c:934: in function __task_rq_lock
>>          lockdep_assert_held(&p->pi_lock);
>
> Oh. Could you remove that line with the patch below - does it result
> in a working system?

yes.

>
> Now, this patch alone just removes a debugging check - but i'm not
> sure the debugging check is correct - we take the pi_lock in a raw
> way - which means it's not lockdep covered.
>
> So how can lockdep_assert_held() be called on it?
>
> Thanks,
>
> 	Ingo
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index fd18f39..a32316b 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -938,8 +938,6 @@ static inline struct rq *__task_rq_lock(struct task_struct *p)
>   {
>   	struct rq *rq;
>
> -	lockdep_assert_held(&p->pi_lock);
> -
>   	for (;;) {
>   		rq = task_rq(p);
>   		raw_spin_lock(&rq->lock);


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 15:32                                                     ` Ingo Molnar
@ 2011-06-05 16:07                                                       ` Arne Jansen
  2011-06-05 16:35                                                         ` Arne Jansen
  0 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-05 16:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 17:32, Ingo Molnar wrote:
>
> one more thing, could you please add this call:
>
> 	debug_show_all_locks();
>
> to after the WARN(), in watchdog.c?
>
> Please surround the whole printout portion by the
> spin_lock()/unlock() protection code i suggested, full-lock-state
> printouts are slow and other CPUs might start printing their NMI
> ticks ...
>
> With the all-locks-printed output we can double check what locks are
> held.

Hm, on first try:

INFO: lockdep is turned off.

Recompiling...

>
> Thanks,
>
> 	Ingo


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 16:07                                                       ` Arne Jansen
@ 2011-06-05 16:35                                                         ` Arne Jansen
  2011-06-05 16:50                                                           ` Arne Jansen
  0 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-05 16:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 18:07, Arne Jansen wrote:
> On 05.06.2011 17:32, Ingo Molnar wrote:
>>
>> one more thing, could you please add this call:
>>
>> debug_show_all_locks();
>>
>> to after the WARN(), in watchdog.c?
>>
>> Please surround the whole printout portion by the
>> spin_lock()/unlock() protection code i suggested, full-lock-state
>> printouts are slow and other CPUs might start printing their NMI
>> ticks ...
>>
>> With the all-locks-printed output we can double check what locks are
>> held.
>
> Hm, on first try:
>
> INFO: lockdep is turned off.
>
> Recompiling...
>

same after a full recompile.

# grep LOCKDEP .config

CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
# CONFIG_DEBUG_LOCKDEP is not set


-Arne

>>
>> Thanks,
>>
>> Ingo
>


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 16:35                                                         ` Arne Jansen
@ 2011-06-05 16:50                                                           ` Arne Jansen
  2011-06-05 17:20                                                             ` Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-05 16:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 18:35, Arne Jansen wrote:
> On 05.06.2011 18:07, Arne Jansen wrote:
>> On 05.06.2011 17:32, Ingo Molnar wrote:
>>>
>>> one more thing, could you please add this call:
>>>
>>> debug_show_all_locks();
>>>
>>> to after the WARN(), in watchdog.c?
>>>
>>> Please surround the whole printout portion by the
>>> spin_lock()/unlock() protection code i suggested, full-lock-state
>>> printouts are slow and other CPUs might start printing their NMI
>>> ticks ...
>>>
>>> With the all-locks-printed output we can double check what locks are
>>> held.
>>

btw, the output posted earlier also contains some
BUG: spinlock lockup.


>> Hm, on first try:
>>
>> INFO: lockdep is turned off.
>>
>> Recompiling...
>>
>
> same after a full recompile.
>
> # grep LOCKDEP .config
>
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_LOCKDEP=y
> # CONFIG_DEBUG_LOCKDEP is not set
>
>
> -Arne
>
>>>
>>> Thanks,
>>>
>>> Ingo
>>
>


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 16:50                                                           ` Arne Jansen
@ 2011-06-05 17:20                                                             ` Ingo Molnar
  2011-06-05 17:42                                                               ` Arne Jansen
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05 17:20 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Arne Jansen <lists@die-jansens.de> wrote:

> >>>With the all-locks-printed output we can double check what locks are
> >>>held.
> 
> btw, the output posted earlier also contains some BUG: spinlock 
> lockup.

hm, it's hard to interpret that without the spin_lock()/unlock() 
logic keeping the dumps apart.

Was lockdep enabled as you started the test?

but ... if the lock is reasonably sorted then it's this one:

<0>BUG: spinlock lockup on CPU#3, modprobe/22211, ffffffff81e1c0c0
Pid: 22211, comm: modprobe Tainted: G        W   2.6.39-rc3+ #19
Call Trace:
 [<ffffffff813af306>] do_raw_spin_lock+0x156/0x170
 [<ffffffff8185ce71>] _raw_spin_lock+0x51/0x70
 [<ffffffff81092df6>] ? vprintk+0x76/0x4a0
 [<ffffffff81092df6>] vprintk+0x76/0x4a0
 [<ffffffff810c5f8d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff81859e19>] printk+0x63/0x65
 [<ffffffff813af301>] do_raw_spin_lock+0x151/0x170
 [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
 [<ffffffff8185ce71>] _raw_spin_lock+0x51/0x70
 [<ffffffff81092df6>] ? vprintk+0x76/0x4a0
 [<ffffffff81092df6>] vprintk+0x76/0x4a0
 [<ffffffff8108758b>] ? cpuacct_charge+0x9b/0xb0
 [<ffffffff8108750f>] ? cpuacct_charge+0x1f/0xb0
 [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
 [<ffffffff81859e19>] printk+0x63/0x65
 [<ffffffff813af090>] spin_bug+0x70/0xf0
 [<ffffffff813af2d9>] do_raw_spin_lock+0x129/0x170
 [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
 [<ffffffff8185ce71>] _raw_spin_lock+0x51/0x70
 [<ffffffff81092df6>] ? vprintk+0x76/0x4a0

and it occured before the lockup in the scheduler.

Which could be due to a race between disabling lockdep on one CPU and 
the scheduler doing the lock-held check on another CPU.

Do you get any messages after the assert is removed, during the test?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 17:20                                                             ` Ingo Molnar
@ 2011-06-05 17:42                                                               ` Arne Jansen
  2011-06-05 18:59                                                                 ` Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-05 17:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 19:20, Ingo Molnar wrote:
>
> * Arne Jansen <lists@die-jansens.de> wrote:
>
>>>>> With the all-locks-printed output we can double check what locks are
>>>>> held.
>>
>> btw, the output posted earlier also contains some BUG: spinlock
>> lockup.
>
> hm, it's hard to interpret that without the spin_lock()/unlock()
> logic keeping the dumps apart.

The locking was in place from the beginning. As the output is still
scrambled, there are other sources for BUG/WARN outside the watchdog
that trigger in parallel. Maybe we should protect the whole BUG/WARN
mechanism with a lock and send it to early_printk from the beginning,
so we don't have to wait for the watchdog to kill printk off and the
first BUG can come through.
Or just let WARN/BUG kill off printk instead of the watchdog (though
I have to get rid of that syslog-WARN on startup).

>
> Was lockdep enabled as you started the test?

At least it was in the config, but haven't double checked. ATM, it is.

>
> but ... if the lock is reasonably sorted then it's this one:
>
> <0>BUG: spinlock lockup on CPU#3, modprobe/22211, ffffffff81e1c0c0
> Pid: 22211, comm: modprobe Tainted: G        W   2.6.39-rc3+ #19
> Call Trace:
>   [<ffffffff813af306>] do_raw_spin_lock+0x156/0x170
>   [<ffffffff8185ce71>] _raw_spin_lock+0x51/0x70
>   [<ffffffff81092df6>] ? vprintk+0x76/0x4a0
>   [<ffffffff81092df6>] vprintk+0x76/0x4a0
>   [<ffffffff810c5f8d>] ? trace_hardirqs_off+0xd/0x10
>   [<ffffffff81859e19>] printk+0x63/0x65
>   [<ffffffff813af301>] do_raw_spin_lock+0x151/0x170
>   [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
>   [<ffffffff8185ce71>] _raw_spin_lock+0x51/0x70
>   [<ffffffff81092df6>] ? vprintk+0x76/0x4a0
>   [<ffffffff81092df6>] vprintk+0x76/0x4a0
>   [<ffffffff8108758b>] ? cpuacct_charge+0x9b/0xb0
>   [<ffffffff8108750f>] ? cpuacct_charge+0x1f/0xb0
>   [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
>   [<ffffffff81859e19>] printk+0x63/0x65
>   [<ffffffff813af090>] spin_bug+0x70/0xf0
>   [<ffffffff813af2d9>] do_raw_spin_lock+0x129/0x170
>   [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
>   [<ffffffff8185ce71>] _raw_spin_lock+0x51/0x70
>   [<ffffffff81092df6>] ? vprintk+0x76/0x4a0
>
> and it occured before the lockup in the scheduler.
>
> Which could be due to a race between disabling lockdep on one CPU and
> the scheduler doing the lock-held check on another CPU.
>
> Do you get any messages after the assert is removed, during the test?

No.

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 17:42                                                               ` Arne Jansen
@ 2011-06-05 18:59                                                                 ` Ingo Molnar
  2011-06-05 19:30                                                                   ` Arne Jansen
  2011-06-06 13:10                                                                   ` Ingo Molnar
  0 siblings, 2 replies; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05 18:59 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Arne Jansen <lists@die-jansens.de> wrote:

> > hm, it's hard to interpret that without the spin_lock()/unlock() 
> > logic keeping the dumps apart.
> 
> The locking was in place from the beginning. [...]

Ok, i was surprised it looked relatively ordered :-)

> [...] As the output is still scrambled, there are other sources for 
> BUG/WARN outside the watchdog that trigger in parallel. Maybe we 
> should protect the whole BUG/WARN mechanism with a lock and send it 
> to early_printk from the beginning, so we don't have to wait for 
> the watchdog to kill printk off and the first BUG can come through. 
> Or just let WARN/BUG kill off printk instead of the watchdog 
> (though I have to get rid of that syslog-WARN on startup).

I had yet another look at your lockup.txt and i think the main cause 
is the WARN_ON() caused by the not-held pi_lock. The lockup there 
causes other CPUs to wedge in printk, which triggers spinlock-lockup 
messages there.

So i think the primary trigger is the pi_lock WARN_ON() (as your 
bisection has confirmed that too), everything else comes from this.

Unfortunately i don't think we can really 'fix' the problem by 
removing the assert. By all means the assert is correct: pi_lock 
should be held there. If we are not holding it then we likely won't 
crash in an easily visible way - it's a lot easier to trigger asserts 
than to trigger obscure side-effects of locking bugs.

It is also a mystery why only printk() triggers this bug. The wakeup 
done there is not particularly special, so by all means we should 
have seen similar lockups elsewhere as well - not just with 
printk()s. Yet we are not seeing them.

So some essential piece of the puzzle is still missing.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 18:59                                                                 ` Ingo Molnar
@ 2011-06-05 19:30                                                                   ` Arne Jansen
  2011-06-05 19:44                                                                     ` Ingo Molnar
  2011-06-06 13:10                                                                   ` Ingo Molnar
  1 sibling, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-05 19:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 20:59, Ingo Molnar wrote:
>
> * Arne Jansen<lists@die-jansens.de>  wrote:
>
>>> hm, it's hard to interpret that without the spin_lock()/unlock()
>>> logic keeping the dumps apart.
>>
>> The locking was in place from the beginning. [...]
>
> Ok, i was surprised it looked relatively ordered :-)
>
>> [...] As the output is still scrambled, there are other sources for
>> BUG/WARN outside the watchdog that trigger in parallel. Maybe we
>> should protect the whole BUG/WARN mechanism with a lock and send it
>> to early_printk from the beginning, so we don't have to wait for
>> the watchdog to kill printk off and the first BUG can come through.
>> Or just let WARN/BUG kill off printk instead of the watchdog
>> (though I have to get rid of that syslog-WARN on startup).
>
> I had yet another look at your lockup.txt and i think the main cause
> is the WARN_ON() caused by the not-held pi_lock. The lockup there
> causes other CPUs to wedge in printk, which triggers spinlock-lockup
> messages there.
>
> So i think the primary trigger is the pi_lock WARN_ON() (as your
> bisection has confirmed that too), everything else comes from this.
>
> Unfortunately i don't think we can really 'fix' the problem by
> removing the assert. By all means the assert is correct: pi_lock
> should be held there. If we are not holding it then we likely won't
> crash in an easily visible way - it's a lot easier to trigger asserts
> than to trigger obscure side-effects of locking bugs.
>
> It is also a mystery why only printk() triggers this bug. The wakeup
> done there is not particularly special, so by all means we should
> have seen similar lockups elsewhere as well - not just with
> printk()s. Yet we are not seeing them.

From the timing I see I'd guess it has something to do with the
scheduler kicking in during printk. I'm neither familiar with the
printk code nor with the scheduler.
If you have any ideas what I should test or add please let me know.

-Arne

>
> So some essential piece of the puzzle is still missing.
>
> Thanks,
>
> 	Ingo


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 19:30                                                                   ` Arne Jansen
@ 2011-06-05 19:44                                                                     ` Ingo Molnar
  2011-06-05 20:15                                                                       ` Arne Jansen
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-05 19:44 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Arne Jansen <lists@die-jansens.de> wrote:

> From the timing I see I'd guess it has something to do with the 
> scheduler kicking in during printk. I'm neither familiar with the 
> printk code nor with the scheduler.

Yeah, that's the well-known wake-up of klogd:

void console_unlock(void)
{
...
        up(&console_sem);

actually ... that's not the klogd wake-up at all (!). I so suck today 
at bug analysis :-)

It's the console lock()/unlock() sequence, and guess what does it:

 drivers/tty/tty_io.c:   console_lock();
 drivers/tty/vt/selection.c:     console_lock();

and the vt.c code in a dozen places.

So maybe it's some sort of tty related memory corruption that was 
made *visible* via the extra assert that the scheduler is doing? The 
pi_list is embedded in task struct.

This would explain why only printk() triggers it and other wakeup 
patterns not.

Now, i don't really like this theory either. Why is there no other 
type of corruption? And exactly why did only the task_struct::pi_lock 
field get corrupted while nearby fields not? Also, none of the fields 
near pi_lock are even remotely tty related.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 19:44                                                                     ` Ingo Molnar
@ 2011-06-05 20:15                                                                       ` Arne Jansen
  2011-06-06  6:56                                                                         ` Arne Jansen
  2011-06-06  9:01                                                                         ` Peter Zijlstra
  0 siblings, 2 replies; 152+ messages in thread
From: Arne Jansen @ 2011-06-05 20:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 21:44, Ingo Molnar wrote:
>
> * Arne Jansen<lists@die-jansens.de>  wrote:
>
>>  From the timing I see I'd guess it has something to do with the
>> scheduler kicking in during printk. I'm neither familiar with the
>> printk code nor with the scheduler.
>
> Yeah, that's the well-known wake-up of klogd:
>
> void console_unlock(void)
> {
> ...
>          up(&console_sem);
>
> actually ... that's not the klogd wake-up at all (!). I so suck today
> at bug analysis :-)
>
> It's the console lock()/unlock() sequence, and guess what does it:
>
>   drivers/tty/tty_io.c:   console_lock();
>   drivers/tty/vt/selection.c:     console_lock();
>
> and the vt.c code in a dozen places.
>
> So maybe it's some sort of tty related memory corruption that was
> made *visible* via the extra assert that the scheduler is doing? The
> pi_list is embedded in task struct.
>
> This would explain why only printk() triggers it and other wakeup
> patterns not.
>
> Now, i don't really like this theory either. Why is there no other
> type of corruption? And exactly why did only the task_struct::pi_lock
> field get corrupted while nearby fields not? Also, none of the fields
> near pi_lock are even remotely tty related.

Can lockdep just get confused by the lockdep_off/on calls in printk
while scheduling is allowed? There aren't many users of lockdep_off().

I'll try again tomorrow to get a dump of all logs from the
watchdog, but enough for today...


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 20:15                                                                       ` Arne Jansen
@ 2011-06-06  6:56                                                                         ` Arne Jansen
  2011-06-06  9:01                                                                         ` Peter Zijlstra
  1 sibling, 0 replies; 152+ messages in thread
From: Arne Jansen @ 2011-06-06  6:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 22:15, Arne Jansen wrote:
> On 05.06.2011 21:44, Ingo Molnar wrote:
>>
>> * Arne Jansen<lists@die-jansens.de>  wrote:
>>
>>>  From the timing I see I'd guess it has something to do with the
>>> scheduler kicking in during printk. I'm neither familiar with the
>>> printk code nor with the scheduler.
>>
>> Yeah, that's the well-known wake-up of klogd:
>>
>> void console_unlock(void)
>> {
>> ...
>>          up(&console_sem);
>>
>> actually ... that's not the klogd wake-up at all (!). I so suck today
>> at bug analysis :-)
>>
>> It's the console lock()/unlock() sequence, and guess what does it:
>>
>>   drivers/tty/tty_io.c:   console_lock();
>>   drivers/tty/vt/selection.c:     console_lock();
>>
>> and the vt.c code in a dozen places.
>>
>> So maybe it's some sort of tty related memory corruption that was
>> made *visible* via the extra assert that the scheduler is doing? The
>> pi_list is embedded in task struct.
>>
>> This would explain why only printk() triggers it and other wakeup
>> patterns not.
>>
>> Now, i don't really like this theory either. Why is there no other
>> type of corruption? And exactly why did only the task_struct::pi_lock
>> field get corrupted while nearby fields not? Also, none of the fields
>> near pi_lock are even remotely tty related.
> 
> Can lockdep just get confused by the lockdep_off/on calls in printk
> while scheduling is allowed? There aren't many users of lockdep_off().
> 
> I'll try again tomorrow to get a dump of all logs from the
> watchdog, but enough for today...

I just let it dump the locks in debug_show_all_locks, even though
for some reason debug_locks is false. Don't know if the result is
helpful in any way, as it might well be inaccurate.

INFO: lockdep is turned off.

Showing all locks held in the system:
2 locks held by syslog-ng/21624:
 #0:  (&tty->atomic_write_lock){+.+.+.}, at: [<ffffffff8142ade3>]
tty_write_lock+0x23/0x60
 #1:  (&tty->output_lock){+.+...}, at: [<ffffffff8142ee7a>]
n_tty_write+0x14a/0x490
1 lock held by agetty/22174:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by agetty/22175:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by agetty/22176:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by agetty/22177:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by agetty/22178:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by agetty/22179:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by agetty/22180:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by tail/22197:
 #0:  (&rq->lock){-.-.-.}, at: [<ffffffff8185ae42>] schedule+0xe2/0x940

Some more facts that might help figure out what happens:
 - I nearly always either see all 10000 messages or only 10.
   Never 9, never 11. I saw 40 once, and once 190.
 - If I printk only 1000 lines instead of 10000, nothing bad happens
 - If /var/log/syslog is not filled with binary garbage, I also just
   see the 10 lines.

-Arne

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 15:26                                                   ` Ingo Molnar
  2011-06-05 15:32                                                     ` Ingo Molnar
@ 2011-06-06  7:34                                                     ` Arne Jansen
  1 sibling, 0 replies; 152+ messages in thread
From: Arne Jansen @ 2011-06-06  7:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 05.06.2011 17:26, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
>>
>> * Arne Jansen <lists@die-jansens.de> wrote:
>>
>>> sched.c:934: in function __task_rq_lock
>>>         lockdep_assert_held(&p->pi_lock);
>>
>> Oh. Could you remove that line with the patch below - does it result 
>> in a working system?
>>
>> Now, this patch alone just removes a debugging check - but i'm not 
>> sure the debugging check is correct - we take the pi_lock in a raw 
>> way - which means it's not lockdep covered.
>>
>> So how can lockdep_assert_held() be called on it?
> 
> Ok, i'm wrong there - it's lockdep covered.
> 
> I also reviewed all the __task_rq_lock() call sites and each of them 
> has the pi_lock acquired. So unless both Peter and me are blind, the 
> other option would be some sort of memory corruption corrupting the 
> runqueue.

Another small idea: can we install the assert into a pre-0122ec5b02f766c
kernel to see if it's an older problem that just got uncovered by the assert?

-Arne

> 
> But ... that looks so unlikely here, it's clearly heavy printk() and 
> console_sem twiddling that triggers the bug, not any other scheduler 
> activity.
> 
> Thanks,
> 
> 	Ingo


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 15:13                                                 ` Ingo Molnar
  2011-06-05 15:26                                                   ` Ingo Molnar
  2011-06-05 15:34                                                   ` Arne Jansen
@ 2011-06-06  8:38                                                   ` Peter Zijlstra
  2011-06-06 14:58                                                     ` Ingo Molnar
  2 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06  8:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Sun, 2011-06-05 at 17:13 +0200, Ingo Molnar wrote:

> Now, this patch alone just removes a debugging check - but i'm not 
> sure the debugging check is correct - we take the pi_lock in a raw 
> way - which means it's not lockdep covered.

Ever since tglx did s/raw_/arch_/g raw_ is covered by lockdep.

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 20:15                                                                       ` Arne Jansen
  2011-06-06  6:56                                                                         ` Arne Jansen
@ 2011-06-06  9:01                                                                         ` Peter Zijlstra
  2011-06-06  9:18                                                                           ` Arne Jansen
                                                                                             ` (3 more replies)
  1 sibling, 4 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06  9:01 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Ingo Molnar, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Sun, 2011-06-05 at 22:15 +0200, Arne Jansen wrote:
> 
> Can lockdep just get confused by the lockdep_off/on calls in printk
> while scheduling is allowed? There aren't many users of lockdep_off().

Yes!, in that case lock_is_held() returns false, triggering the warning.
I guess there's an argument to be made in favour of the below..

---
 kernel/lockdep.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 53a6895..e4129cf 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -3242,7 +3242,7 @@ int lock_is_held(struct lockdep_map *lock)
 	int ret = 0;
 
 	if (unlikely(current->lockdep_recursion))
-		return ret;
+		return 1; /* avoid false negative lockdep_assert_held */
 
 	raw_local_irq_save(flags);
 	check_flags(flags);
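As a minimal user-space model of the false negative this patch addresses (all names here are illustrative, not the kernel's): while a lockdep_off() section is active the per-task recursion counter is non-zero, so lock_is_held() cannot consult its bookkeeping; answering 0 there makes lockdep_assert_held() fire even though the lock really is held, so the query conservatively answers "held" instead.

```c
/*
 * Toy model of the recursion guard in lock_is_held(); the names are
 * illustrative only: "recursion" stands in for
 * current->lockdep_recursion, "bookkeeping_held" for lockdep's
 * per-task held-lock records.
 */
static int recursion;          /* non-zero inside lockdep_off() sections */
static int bookkeeping_held;   /* what the bookkeeping actually recorded */

static int toy_lock_is_held(void)
{
        if (recursion)
                return 1;      /* can't consult bookkeeping: assume held */
        return bookkeeping_held;
}
```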


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06  9:01                                                                         ` Peter Zijlstra
@ 2011-06-06  9:18                                                                           ` Arne Jansen
  2011-06-06  9:24                                                                             ` Peter Zijlstra
  2011-06-06 10:00                                                                           ` Arne Jansen
                                                                                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-06  9:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 06.06.2011 11:01, Peter Zijlstra wrote:
> On Sun, 2011-06-05 at 22:15 +0200, Arne Jansen wrote:
>>
>> Can lockdep just get confused by the lockdep_off/on calls in printk
>> while scheduling is allowed? There aren't many users of lockdep_off().
> 
> Yes!, in that case lock_is_held() returns false, triggering the warning.
> I guess there's an argument to be made in favour of the below..


Two questions... is there any protection between the lockdep_recursion
check and setting it to one? I guess in our case there is, because it's the
scheduler that calls it, but in general?
And why is lockdep needed to check if a lock is held? Isn't it reflected
in the lock structure itself?

-Arne

> 
> ---
>  kernel/lockdep.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 53a6895..e4129cf 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -3242,7 +3242,7 @@ int lock_is_held(struct lockdep_map *lock)
>  	int ret = 0;
>  
>  	if (unlikely(current->lockdep_recursion))
> -		return ret;
> +		return 1; /* avoid false negative lockdep_assert_held */
>  
>  	raw_local_irq_save(flags);
>  	check_flags(flags);
> 


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06  9:18                                                                           ` Arne Jansen
@ 2011-06-06  9:24                                                                             ` Peter Zijlstra
  2011-06-06  9:52                                                                               ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06  9:24 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Ingo Molnar, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 11:18 +0200, Arne Jansen wrote:
> On 06.06.2011 11:01, Peter Zijlstra wrote:
> > On Sun, 2011-06-05 at 22:15 +0200, Arne Jansen wrote:
> >>
> >> Can lockdep just get confused by the lockdep_off/on calls in printk
> >> while scheduling is allowed? There aren't many users of lockdep_off().
> > 
> > Yes!, in that case lock_is_held() returns false, triggering the warning.
> > I guess there's an argument to be made in favour of the below..
> 
> 
> Two questions... is there any protection between the lockdep_recursion
> check and setting it to one? I guess in our case there is, because it's the
> scheduler that calls it, but in general?

Yeah, it's always current->lockdep_recursion, so there is no
concurrency :-)

> And why is lockdep needed to check if a lock is held? Isn't it reflected
> in the lock structure itself?

Ah, so the difference is between _who_ owns the lock. Things like
spin_is_locked() check if the lock is taken but cannot tell you who owns
it, but lock_is_held() checks if the current context owns the lock.
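The distinction can be modeled in a few lines of plain C (a sketch with made-up names, not the kernel's data structures): the "locked" bit is what a spin_is_locked()-style check can see, while the owner bookkeeping is the extra per-context information a lock_is_held()-style check needs.

```c
#include <stdbool.h>

struct toy_lock {
        bool locked;    /* visible to anyone: is the lock taken at all? */
        int  owner;     /* lockdep-style info: which context took it */
};

#define NO_OWNER (-1)

static void toy_lock_acquire(struct toy_lock *l, int ctx)
{
        l->locked = true;
        l->owner  = ctx;
}

static void toy_lock_release(struct toy_lock *l)
{
        l->locked = false;
        l->owner  = NO_OWNER;
}

/* spin_is_locked() analogue: true no matter who owns the lock */
static bool toy_is_locked(const struct toy_lock *l)
{
        return l->locked;
}

/* lock_is_held() analogue: true only if *this* context owns it */
static bool toy_is_held(const struct toy_lock *l, int ctx)
{
        return l->locked && l->owner == ctx;
}
```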





^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06  9:24                                                                             ` Peter Zijlstra
@ 2011-06-06  9:52                                                                               ` Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06  9:52 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Ingo Molnar, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 11:24 +0200, Peter Zijlstra wrote:
> On Mon, 2011-06-06 at 11:18 +0200, Arne Jansen wrote:
> > On 06.06.2011 11:01, Peter Zijlstra wrote:
> > > On Sun, 2011-06-05 at 22:15 +0200, Arne Jansen wrote:
> > >>
> > >> Can lockdep just get confused by the lockdep_off/on calls in printk
> > >> while scheduling is allowed? There aren't many users of lockdep_off().
> > > 
> > > Yes!, in that case lock_is_held() returns false, triggering the warning.
> > > I guess there's an argument to be made in favour of the below..
> > 
> > 
> > Two questions... is there any protection between the lockdep_recursion
> > check and setting it to one? I guess in our case there is, because it's the
> > scheduler that calls it, but in general?
> 
> Yeah, it's always current->lockdep_recursion, so there is no
> concurrency :-)
> 
> > And why is lockdep needed to check if a lock is held? Isn't it reflected
> > in the lock structure itself?
> 
> Ah, so the difference is between _who_ owns the lock. Things like
> spin_is_locked() check if the lock is taken but cannot tell you who owns
> it, but lock_is_held() checks if the current context owns the lock.

Also, lockdep_assert_held() doesn't generate any code when lockdep is
not configured.
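That compile-out property can be sketched with a conditional macro (CONFIG_TOY_LOCKDEP and toy_assert_held() are hypothetical names for illustration, not the real kernel macros): when the config option is off, the assert expands to an empty statement and generates no code at all.

```c
static int toy_warn_count;      /* counts fired assertions in debug builds */

#ifdef CONFIG_TOY_LOCKDEP
#define toy_assert_held(cond)                   \
        do {                                    \
                if (!(cond))                    \
                        toy_warn_count++;       \
        } while (0)
#else
/* debug facility disabled: the macro expands to nothing */
#define toy_assert_held(cond) do { } while (0)
#endif
```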

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06  9:01                                                                         ` Peter Zijlstra
  2011-06-06  9:18                                                                           ` Arne Jansen
@ 2011-06-06 10:00                                                                           ` Arne Jansen
  2011-06-06 10:26                                                                             ` Peter Zijlstra
  2011-06-06 15:04                                                                           ` Ingo Molnar
  2011-06-07  5:20                                                                           ` Mike Galbraith
  3 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-06 10:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 06.06.2011 11:01, Peter Zijlstra wrote:
> On Sun, 2011-06-05 at 22:15 +0200, Arne Jansen wrote:
>>
>> Can lockdep just get confused by the lockdep_off/on calls in printk
>> while scheduling is allowed? There aren't many users of lockdep_off().
> 
> Yes!, in that case lock_is_held() returns false, triggering the warning.
> I guess there's an argument to be made in favour of the below..

As expected this apparently fixes the problem. But are we confident
enough this is the true source? If it's really that simple, printk
calling into the scheduler, why am I the only one seeing this?

-Arne

> 
> ---
>  kernel/lockdep.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 53a6895..e4129cf 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -3242,7 +3242,7 @@ int lock_is_held(struct lockdep_map *lock)
>  	int ret = 0;
>  
>  	if (unlikely(current->lockdep_recursion))
> -		return ret;
> +		return 1; /* avoid false negative lockdep_assert_held */
>  
>  	raw_local_irq_save(flags);
>  	check_flags(flags);
> 


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 10:00                                                                           ` Arne Jansen
@ 2011-06-06 10:26                                                                             ` Peter Zijlstra
  2011-06-06 13:25                                                                               ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 10:26 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Ingo Molnar, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 12:00 +0200, Arne Jansen wrote:
> As expected this apparently fixes the problem. But are we confident
> enough this is the true source? If it's really that simple, printk
> calling into the scheduler, why am I the only one seeing this?

Right, so apparently you have contention on console_sem and the up()
actually does a wakeup. I'm still trying to figure out how to do that.



^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-05 18:59                                                                 ` Ingo Molnar
  2011-06-05 19:30                                                                   ` Arne Jansen
@ 2011-06-06 13:10                                                                   ` Ingo Molnar
  2011-06-06 13:12                                                                     ` Peter Zijlstra
  1 sibling, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-06 13:10 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Ingo Molnar <mingo@elte.hu> wrote:

> So some essential piece of the puzzle is still missing.

Oh. I think the essential piece of the puzzle might be this code in 
printk():

asmlinkage int vprintk(const char *fmt, va_list args)
{
...
        lockdep_off();
        if (console_trylock_for_printk(this_cpu))
                console_unlock();

        lockdep_on();
...

So while i right now do not see how this (ancient) piece of logic 
causes trouble, could you try the patch below, does it make the 
WARN()+lockup go away?

Thanks,

	Ingo

diff --git a/kernel/printk.c b/kernel/printk.c
index 3518539..1b9d2be 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -859,7 +859,6 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 		zap_locks();
 	}
 
-	lockdep_off();
 	spin_lock(&logbuf_lock);
 	printk_cpu = this_cpu;
 
@@ -947,7 +946,7 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	 * Try to acquire and then immediately release the
 	 * console semaphore. The release will do all the
 	 * actual magic (print out buffers, wake up klogd,
-	 * etc). 
+	 * etc).
 	 *
 	 * The console_trylock_for_printk() function
 	 * will release 'logbuf_lock' regardless of whether it
@@ -956,7 +955,6 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	if (console_trylock_for_printk(this_cpu))
 		console_unlock();
 
-	lockdep_on();
 out_restore_irqs:
 	raw_local_irq_restore(flags);
 

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 13:10                                                                   ` Ingo Molnar
@ 2011-06-06 13:12                                                                     ` Peter Zijlstra
  2011-06-06 13:21                                                                       ` Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 13:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 15:10 +0200, Ingo Molnar wrote:


> diff --git a/kernel/printk.c b/kernel/printk.c
> index 3518539..1b9d2be 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -859,7 +859,6 @@ asmlinkage int vprintk(const char *fmt, va_list args)
>  		zap_locks();
>  	}
>  
> -	lockdep_off();

At the very least you should also do: s/raw_local_irq_/local_irq/ on
this function.

>  	spin_lock(&logbuf_lock);
>  	printk_cpu = this_cpu;
>  
> @@ -947,7 +946,7 @@ asmlinkage int vprintk(const char *fmt, va_list args)
>  	 * Try to acquire and then immediately release the
>  	 * console semaphore. The release will do all the
>  	 * actual magic (print out buffers, wake up klogd,
> -	 * etc). 
> +	 * etc).
>  	 *
>  	 * The console_trylock_for_printk() function
>  	 * will release 'logbuf_lock' regardless of whether it
> @@ -956,7 +955,6 @@ asmlinkage int vprintk(const char *fmt, va_list args)
>  	if (console_trylock_for_printk(this_cpu))
>  		console_unlock();
>  
> -	lockdep_on();
>  out_restore_irqs:
>  	raw_local_irq_restore(flags);
>  


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 13:12                                                                     ` Peter Zijlstra
@ 2011-06-06 13:21                                                                       ` Ingo Molnar
  2011-06-06 13:31                                                                         ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-06 13:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, 2011-06-06 at 15:10 +0200, Ingo Molnar wrote:
> 
> 
> > diff --git a/kernel/printk.c b/kernel/printk.c
> > index 3518539..1b9d2be 100644
> > --- a/kernel/printk.c
> > +++ b/kernel/printk.c
> > @@ -859,7 +859,6 @@ asmlinkage int vprintk(const char *fmt, va_list args)
> >  		zap_locks();
> >  	}
> >  
> > -	lockdep_off();
> 
> At the very least you should also do: s/raw_local_irq_/local_irq/ on
> this function.

Right, i've also removed the preempt_disable()/enable() pair - that 
looks superfluous.

Updated patch below - still untested.

Thanks,

	Ingo

---
 kernel/printk.c |   10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

Index: tip/kernel/printk.c
===================================================================
--- tip.orig/kernel/printk.c
+++ tip/kernel/printk.c
@@ -836,9 +836,8 @@ asmlinkage int vprintk(const char *fmt, 
 	boot_delay_msec();
 	printk_delay();
 
-	preempt_disable();
 	/* This stops the holder of console_sem just where we want him */
-	raw_local_irq_save(flags);
+	local_irq_save(flags);
 	this_cpu = smp_processor_id();
 
 	/*
@@ -859,7 +858,6 @@ asmlinkage int vprintk(const char *fmt, 
 		zap_locks();
 	}
 
-	lockdep_off();
 	spin_lock(&logbuf_lock);
 	printk_cpu = this_cpu;
 
@@ -947,7 +945,7 @@ asmlinkage int vprintk(const char *fmt, 
 	 * Try to acquire and then immediately release the
 	 * console semaphore. The release will do all the
 	 * actual magic (print out buffers, wake up klogd,
-	 * etc). 
+	 * etc).
 	 *
 	 * The console_trylock_for_printk() function
 	 * will release 'logbuf_lock' regardless of whether it
@@ -956,11 +954,9 @@ asmlinkage int vprintk(const char *fmt, 
 	if (console_trylock_for_printk(this_cpu))
 		console_unlock();
 
-	lockdep_on();
 out_restore_irqs:
-	raw_local_irq_restore(flags);
+	local_irq_restore(flags);
 
-	preempt_enable();
 	return printed_len;
 }
 EXPORT_SYMBOL(printk);

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 10:26                                                                             ` Peter Zijlstra
@ 2011-06-06 13:25                                                                               ` Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 13:25 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Ingo Molnar, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 12:26 +0200, Peter Zijlstra wrote:
> On Mon, 2011-06-06 at 12:00 +0200, Arne Jansen wrote:
> > As expected this apparently fixes the problem. But are we confident
> > enough this is the true source? If it's really that simple, printk
> > calling into the scheduler, why am I the only one seeing this?
> 
> Right, so apparently you have contention on console_sem and the up()
> actually does a wakeup. I'm still trying to figure out how to do that.

On a related note, I'm not quite sure what we need that
lockdep_{off,on}() for; I just built and booted a kernel without them
and so far life is good.

I tried lockdep splats and in-scheduler printk()s (although the latter
will still mess up the box if printk()'s up(console_sem) triggers a
wakeup for obvious reasons).

---
Subject: lockdep, printk: Remove lockdep_off from printk()
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Jun 06 13:37:22 CEST 2011

Remove the lockdep_{off,on}() usage from printk() as it appears
superfluous; a kernel with this patch applied can printk lockdep output
just fine.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
Index: linux-2.6/kernel/printk.c
===================================================================
--- linux-2.6.orig/kernel/printk.c
+++ linux-2.6/kernel/printk.c
@@ -838,7 +838,7 @@ asmlinkage int vprintk(const char *fmt,
 
 	preempt_disable();
 	/* This stops the holder of console_sem just where we want him */
-	raw_local_irq_save(flags);
+	local_irq_save(flags);
 	this_cpu = smp_processor_id();
 
 	/*
@@ -859,7 +859,6 @@ asmlinkage int vprintk(const char *fmt,
 		zap_locks();
 	}
 
-	lockdep_off();
 	spin_lock(&logbuf_lock);
 	printk_cpu = this_cpu;
 
@@ -956,9 +955,8 @@ asmlinkage int vprintk(const char *fmt,
 	if (console_trylock_for_printk(this_cpu))
 		console_unlock();
 
-	lockdep_on();
 out_restore_irqs:
-	raw_local_irq_restore(flags);
+	local_irq_restore(flags);
 
 	preempt_enable();
 	return printed_len;


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 13:21                                                                       ` Ingo Molnar
@ 2011-06-06 13:31                                                                         ` Peter Zijlstra
  0 siblings, 0 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 13:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 15:21 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, 2011-06-06 at 15:10 +0200, Ingo Molnar wrote:
> > 
> > 
> > > diff --git a/kernel/printk.c b/kernel/printk.c
> > > index 3518539..1b9d2be 100644
> > > --- a/kernel/printk.c
> > > +++ b/kernel/printk.c
> > > @@ -859,7 +859,6 @@ asmlinkage int vprintk(const char *fmt, va_list args)
> > >  		zap_locks();
> > >  	}
> > >  
> > > -	lockdep_off();
> > 
> > At the very least you should also do: s/raw_local_irq_/local_irq/ on
> > this function.
> 
> Right, i've also removed the preempt_disable()/enable() pair - that 
> looks superfluous.

Aside from the preempt thing, such a patch was just tested: I had a
module trigger a lockdep warning, and stuck a printk() in the middle of
ttwu() (conditional so I could actually boot).

So go ahead, and merge this.

We still need the patch to lock_is_held() though, since there's a few
other lockdep_off() sites in the kernel, and at least the NTFS one needs
to be able to schedule.



^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06  8:38                                                   ` Peter Zijlstra
@ 2011-06-06 14:58                                                     ` Ingo Molnar
  2011-06-06 15:09                                                       ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-06 14:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Sun, 2011-06-05 at 17:13 +0200, Ingo Molnar wrote:
> 
> > Now, this patch alone just removes a debugging check - but i'm not 
> > sure the debugging check is correct - we take the pi_lock in a raw 
> > way - which means it's not lockdep covered.
> 
> Ever since tglx did s/raw_/arch_/g raw_ is covered by lockdep.

It's not lockdep covered due to the lockdep_off(), or am i missing 
something?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06  9:01                                                                         ` Peter Zijlstra
  2011-06-06  9:18                                                                           ` Arne Jansen
  2011-06-06 10:00                                                                           ` Arne Jansen
@ 2011-06-06 15:04                                                                           ` Ingo Molnar
  2011-06-06 15:08                                                                             ` Ingo Molnar
  2011-06-07  5:20                                                                           ` Mike Galbraith
  3 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-06 15:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Sun, 2011-06-05 at 22:15 +0200, Arne Jansen wrote:
> > 
> > Can lockdep just get confused by the lockdep_off/on calls in printk
> > while scheduling is allowed? There aren't many users of lockdep_off().
> 
> Yes!, in that case lock_is_held() returns false, triggering the warning.
> I guess there's an argument to be made in favour of the below..
> 
> ---
>  kernel/lockdep.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 53a6895..e4129cf 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -3242,7 +3242,7 @@ int lock_is_held(struct lockdep_map *lock)
>  	int ret = 0;
>  
>  	if (unlikely(current->lockdep_recursion))
> -		return ret;
> +		return 1; /* avoid false negative lockdep_assert_held */
>  
>  	raw_local_irq_save(flags);
>  	check_flags(flags);

Oh, this explains the full bug i think.

lockdep_off() causes us to not track pi_lock, and thus the assert 
inside printk() called try_to_wake_up() triggers incorrectly.

The reason why Arne triggered it is probably because console_lock 
*wakeups* from printk are very, very rare: almost nothing actually 
locks the console. His remote system probably has some VT-intense 
application (screen?) that hits console_lock more intensely.

Arne, do you use some vt-intense application there?

The real fix might be to remove the lockdep_off()/on() call from 
printk(), that looks actively evil ... we had to hack through several 
layers of side-effects before we found the real bug - so it's not 
like the off()/on() made things more robust!

So i think what we want to apply is the lockdep_off()/on() removal, 
once Arne has it tested.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 15:04                                                                           ` Ingo Molnar
@ 2011-06-06 15:08                                                                             ` Ingo Molnar
  2011-06-06 17:44                                                                               ` Mike Galbraith
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-06 15:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Ingo Molnar <mingo@elte.hu> wrote:

> The real fix might be to remove the lockdep_off()/on() call from 
> printk(), that looks actively evil ... we had to hack through 
> several layers of side-effects before we found the real bug - so 
> it's not like the off()/on() made things more robust!

The other obvious fix would be to *remove* the blasted wakeup from 
printk(). It's a serious debugging robustness violation and it's not 
like the wakeup is super important latency-wise.

We *already* have a timer tick driven klogd wakeup poll routine. So i 
doubt we'd have many problems from not doing wakeups from printk(). 
Opinions?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 14:58                                                     ` Ingo Molnar
@ 2011-06-06 15:09                                                       ` Peter Zijlstra
  2011-06-06 15:47                                                         ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 15:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 16:58 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Sun, 2011-06-05 at 17:13 +0200, Ingo Molnar wrote:
> > 
> > > Now, this patch alone just removes a debugging check - but i'm not 
> > > sure the debugging check is correct - we take the pi_lock in a raw 
> > > way - which means it's not lockdep covered.
> > 
> > Ever since tglx did s/raw_/arch_/g raw_ is covered by lockdep.
> 
> It's not lockdep covered due to the lockdep_off(), or am i missing 
> something?

Your initial statement was about the raw_ part; raw_ locks are tracked
by lockdep ever since tglx renamed them to arch_ and introduced new raw_
primitives.

But yeah, the lockdep_off() stuff also disables all tracking, on top of
that it also makes lock_is_held() return an unconditional false (even if
the lock was acquired before lockdep_off and thus registered).

My patch that fixes lock_is_held() should avoid false
lockdep_assert_held() explosions, and thus this printk()-under-rq->lock
problem.

Removing lockdep_off() usage from printk() would also be nice, but Mike
triggered logbuf_lock <-> rq->lock inversion with that due to the
up(&console_sem) wakeup muck.

Ideally we'd pull the up() out from under logbuf_lock, am looking at
that.

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 15:09                                                       ` Peter Zijlstra
@ 2011-06-06 15:47                                                         ` Peter Zijlstra
  2011-06-06 15:52                                                           ` Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 15:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 17:09 +0200, Peter Zijlstra wrote:
> On Mon, 2011-06-06 at 16:58 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Sun, 2011-06-05 at 17:13 +0200, Ingo Molnar wrote:
> > > 
> > > > Now, this patch alone just removes a debugging check - but i'm not 
> > > > sure the debugging check is correct - we take the pi_lock in a raw 
> > > > way - which means it's not lockdep covered.
> > > 
> > > Ever since tglx did s/raw_/arch_/g raw_ is covered by lockdep.
> > 
> > It's not lockdep covered due to the lockdep_off(), or am i missing 
> > something?
> 
> Your initial statement was about the raw_ part; raw_ locks are tracked
> by lockdep ever since tglx renamed them to arch_ and introduced new raw_
> primitives.
> 
> But yeah, the lockdep_off() stuff also disables all tracking, on top of
> that it also makes lock_is_held() return an unconditional false (even if
> the lock was acquired before lockdep_off and thus registered).
> 
> My patch that fixes lock_is_held() should avoid false
> lockdep_assert_held() explosions, and thus this printk()-under-rq->lock
> problem.
> 
> Removing lockdep_off() usage from printk() would also be nice, but Mike
> triggered logbuf_lock <-> rq->lock inversion with that due to the
> up(&console_sem) wakeup muck.
> 
> Ideally we'd pull the up() out from under logbuf_lock, am looking at
> that.

something like so... but then there's a comment about console_sem and
logbuf_lock interlocking in interesting ways, but it fails to mention how
and why. Still, I think it should maybe work..

Needs more staring at, preferably by someone who actually understands
that horrid mess :/ Also, this all still doesn't make printk() work
reliably while holding rq->lock.

---
Index: linux-2.6/kernel/printk.c
===================================================================
--- linux-2.6.orig/kernel/printk.c
+++ linux-2.6/kernel/printk.c
@@ -686,6 +686,7 @@ static void zap_locks(void)
 
 	oops_timestamp = jiffies;
 
+	debug_locks_off();
 	/* If a crash is occurring, make sure we can't deadlock */
 	spin_lock_init(&logbuf_lock);
 	/* And make sure that we print immediately */
@@ -782,7 +783,7 @@ static inline int can_use_console(unsign
 static int console_trylock_for_printk(unsigned int cpu)
 	__releases(&logbuf_lock)
 {
-	int retval = 0;
+	int retval = 0, wake = 0;
 
 	if (console_trylock()) {
 		retval = 1;
@@ -795,12 +796,14 @@ static int console_trylock_for_printk(un
 		 */
 		if (!can_use_console(cpu)) {
 			console_locked = 0;
-			up(&console_sem);
+			wake = 1;
 			retval = 0;
 		}
 	}
 	printk_cpu = UINT_MAX;
 	spin_unlock(&logbuf_lock);
+	if (wake)
+		up(&console_sem);
 	return retval;
 }
 static const char recursion_bug_msg [] =
@@ -836,9 +839,8 @@ asmlinkage int vprintk(const char *fmt,
 	boot_delay_msec();
 	printk_delay();
 
-	preempt_disable();
 	/* This stops the holder of console_sem just where we want him */
-	raw_local_irq_save(flags);
+	local_irq_save(flags);
 	this_cpu = smp_processor_id();
 
 	/*
@@ -859,7 +861,6 @@ asmlinkage int vprintk(const char *fmt,
 		zap_locks();
 	}
 
-	lockdep_off();
 	spin_lock(&logbuf_lock);
 	printk_cpu = this_cpu;
 
@@ -956,11 +957,9 @@ asmlinkage int vprintk(const char *fmt,
 	if (console_trylock_for_printk(this_cpu))
 		console_unlock();
 
-	lockdep_on();
 out_restore_irqs:
-	raw_local_irq_restore(flags);
+	local_irq_restore(flags);
 
-	preempt_enable();
 	return printed_len;
 }
 EXPORT_SYMBOL(printk);
@@ -1271,8 +1270,8 @@ void console_unlock(void)
 	if (unlikely(exclusive_console))
 		exclusive_console = NULL;
 
-	up(&console_sem);
 	spin_unlock_irqrestore(&logbuf_lock, flags);
+	up(&console_sem);
 	if (wake_klogd)
 		wake_up_klogd();
 }


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 15:47                                                         ` Peter Zijlstra
@ 2011-06-06 15:52                                                           ` Ingo Molnar
  2011-06-06 16:00                                                             ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-06 15:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> Needs more staring at, preferably by someone who actually 
> understands that horrid mess :/ Also, this all still doesn't make 
> printk() work reliably while holding rq->lock.

So, what about my suggestion to just *remove* the wakeup from there 
and use the deferred wakeup mechanism that klogd uses.

That would make printk() *visibly* more robust in practice.

[ It would also open up the way to possibly make printk() NMI entry 
  safe - currently we lock up if we printk in an NMI or #MC context 
  that happens to nest inside a printk(). ]

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 15:52                                                           ` Ingo Molnar
@ 2011-06-06 16:00                                                             ` Peter Zijlstra
  2011-06-06 16:08                                                               ` Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 16:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 17:52 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Needs more staring at, preferably by someone who actually 
> > understands that horrid mess :/ Also, this all still doesn't make 
> > printk() work reliably while holding rq->lock.
> 
> So, what about my suggestion to just *remove* the wakeup from there 
> and use the deferred wakeup mechanism that klogd uses.
> 
> That would make printk() *visibly* more robust in practice.

That's currently done from the jiffy tick, do you want to effectively
delay releasing the console_sem for the better part of a jiffy?

> [ It would also open up the way to possibly make printk() NMI entry 
>   safe - currently we lock up if we printk in an NMI or #MC context 
>   that happens to nest inside a printk(). ]

Well, for that to happen you also need to deal with logbuf_lock nesting.
Personally I think using printk() from NMI context is quite beyond sane.

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 16:00                                                             ` Peter Zijlstra
@ 2011-06-06 16:08                                                               ` Ingo Molnar
  2011-06-06 16:12                                                                 ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-06 16:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, 2011-06-06 at 17:52 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > Needs more staring at, preferably by someone who actually 
> > > understands that horrid mess :/ Also, this all still doesn't make 
> > > printk() work reliably while holding rq->lock.
> > 
> > So, what about my suggestion to just *remove* the wakeup from there 
> > and use the deferred wakeup mechanism that klogd uses.
> > 
> > That would make printk() *visibly* more robust in practice.
> 
> That's currently done from the jiffy tick, do you want to effectively
> delay releasing the console_sem for the better part of a jiffy?

Yes, and we already do it in some other circumstances. Can you see 
any problem with that? klogd is an utter slowpath anyway.

> > [ It would also open up the way to possibly make printk() NMI entry 
> >   safe - currently we lock up if we printk in an NMI or #MC context 
> >   that happens to nest inside a printk(). ]
> 
> Well, for that to happen you also need to deal with logbuf_lock 
> nesting. [...]

That we could do as a robustness patch: detect when the current CPU 
already holds it and do not lock up on that. This would also allow 
printk() to work within a crashing printk(). (assuming the second 
printk() does not crash - in which case it's game over anyway)

> Personally I think using printk() from NMI context is quite beyond 
> sane.

Yeah, quite so, but it *can* happen so if we can make it work as a 
free side-effect of a printk()-robustness increasing patch, why not?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 16:08                                                               ` Ingo Molnar
@ 2011-06-06 16:12                                                                 ` Peter Zijlstra
  2011-06-06 16:17                                                                   ` Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 16:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 18:08 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, 2011-06-06 at 17:52 +0200, Ingo Molnar wrote:
> > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > 
> > > > Needs more staring at, preferably by someone who actually 
> > > > understands that horrid mess :/ Also, this all still doesn't make 
> > > > printk() work reliably while holding rq->lock.
> > > 
> > > So, what about my suggestion to just *remove* the wakeup from there 
> > > and use the deferred wakeup mechanism that klogd uses.
> > > 
> > > That would make printk() *visibly* more robust in practice.
> > 
> > That's currently done from the jiffy tick, do you want to effectively
> > delay releasing the console_sem for the better part of a jiffy?
> 
> Yes, and we already do it in some other circumstances. 

We do?

> Can you see 
> any problem with that? klogd is an utter slowpath anyway.

but console_sem isn't klogd. We delay klogd and that's perfectly fine,
but afaict we don't delay console_sem.

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 16:12                                                                 ` Peter Zijlstra
@ 2011-06-06 16:17                                                                   ` Ingo Molnar
  2011-06-06 16:38                                                                     ` Arne Jansen
  2011-06-06 16:44                                                                     ` Peter Zijlstra
  0 siblings, 2 replies; 152+ messages in thread
From: Ingo Molnar @ 2011-06-06 16:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, 2011-06-06 at 18:08 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Mon, 2011-06-06 at 17:52 +0200, Ingo Molnar wrote:
> > > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > > 
> > > > > Needs more staring at, preferably by someone who actually 
> > > > > understands that horrid mess :/ Also, this all still doesn't make 
> > > > > printk() work reliably while holding rq->lock.
> > > > 
> > > > So, what about my suggestion to just *remove* the wakeup from there 
> > > > and use the deferred wakeup mechanism that klogd uses.
> > > > 
> > > > That would make printk() *visibly* more robust in practice.
> > > 
> > > That's currently done from the jiffy tick, do you want to effectively
> > > delay releasing the console_sem for the better part of a jiffy?
> > 
> > Yes, and we already do it in some other circumstances. 
> 
> We do?

Yes, see the whole printk_pending logic, it delays:

                wake_up_interruptible(&log_wait);

to the next jiffies tick.

> > Can you see 
> > any problem with that? klogd is an utter slowpath anyway.
> 
> but console_sem isn't klogd. We delay klogd and that's perfectly 
> fine, but afaict we don't delay console_sem.

But console_sem is really a similar special case as klogd. See, it's 
about a *printk*. That's rare by definition.

If someone on the console sees it he'll be startled by at least 10 
msecs ;-) So delaying the wakeup to the next jiffy really fits into 
the same approach as we already do with &log_wait, hm?

This would solve a real nightmare that has plagued us ever since 
printk() has done wakeups directly - i.e. like forever.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 16:17                                                                   ` Ingo Molnar
@ 2011-06-06 16:38                                                                     ` Arne Jansen
  2011-06-06 16:45                                                                       ` Arne Jansen
  2011-06-06 16:44                                                                     ` Peter Zijlstra
  1 sibling, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-06 16:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 06.06.2011 18:17, Ingo Molnar wrote:
>
> * Peter Zijlstra<peterz@infradead.org>  wrote:
>
>> On Mon, 2011-06-06 at 18:08 +0200, Ingo Molnar wrote:
>>> * Peter Zijlstra<peterz@infradead.org>  wrote:
>>>
>>>> On Mon, 2011-06-06 at 17:52 +0200, Ingo Molnar wrote:
>>>>> * Peter Zijlstra<peterz@infradead.org>  wrote:
>>>>>
>>>>>> Needs more staring at, preferably by someone who actually
>>>>>> understands that horrid mess :/ Also, this all still doesn't make
>>>>>> printk() work reliably while holding rq->lock.
>>>>>
>>>>> So, what about my suggestion to just *remove* the wakeup from there
>>>>> and use the deferred wakeup mechanism that klogd uses.
>>>>>
>>>>> That would make printk() *visibly* more robust in practice.
>>>>
>>>> That's currently done from the jiffy tick, do you want to effectively
>>>> delay releasing the console_sem for the better part of a jiffy?
>>>
>>> Yes, and we already do it in some other circumstances.
>>
>> We do?
>
> Yes, see the whole printk_pending logic, it delays:
>
>                  wake_up_interruptible(&log_wait);
>
> to the next jiffies tick.
>
>>> Can you see
>>> any problem with that? klogd is an utter slowpath anyway.
>>
>> but console_sem isn't klogd. We delay klogd and that's perfectly
>> fine, but afaict we don't delay console_sem.
>
> But console_sem is really a similar special case as klogd. See, it's
> about a *printk*. That's rare by definition.
>
> If someone on the console sees it he'll be startled by at least 10
> msecs ;-) So delaying the wakeup to the next jiffy really fits into
> the same approach as we already do with &log_wait, hm?

As long as it doesn't scramble the order of the messages, the delay
imho doesn't matter even in very printk-heavy debugging sessions.

>
> This would solve a real nightmare that has plagued us ever since
> printk() has done wakeups directly - i.e. like forever.
>
> Thanks,
>
> 	Ingo


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 16:17                                                                   ` Ingo Molnar
  2011-06-06 16:38                                                                     ` Arne Jansen
@ 2011-06-06 16:44                                                                     ` Peter Zijlstra
  2011-06-06 16:50                                                                       ` Peter Zijlstra
                                                                                         ` (2 more replies)
  1 sibling, 3 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 16:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 18:17 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, 2011-06-06 at 18:08 +0200, Ingo Molnar wrote:
> > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > 
> > > > On Mon, 2011-06-06 at 17:52 +0200, Ingo Molnar wrote:
> > > > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > > > 
> > > > > > Needs more staring at, preferably by someone who actually 
> > > > > > understands that horrid mess :/ Also, this all still doesn't make 
> > > > > > printk() work reliably while holding rq->lock.
> > > > > 
> > > > > So, what about my suggestion to just *remove* the wakeup from there 
> > > > > and use the deferred wakeup mechanism that klogd uses.
> > > > > 
> > > > > That would make printk() *visibly* more robust in practice.
> > > > 
> > > > That's currently done from the jiffy tick, do you want to effectively
> > > > delay releasing the console_sem for the better part of a jiffy?
> > > 
> > > Yes, and we already do it in some other circumstances. 
> > 
> > We do?
> 
> Yes, see the whole printk_pending logic, it delays:
> 
>                 wake_up_interruptible(&log_wait);
> 
> to the next jiffies tick.

Again, that's the klogd wakeup, not console_sem ("..delay releasing
console_sem.." "..already done.." isn't true).

> > > Can you see 
> > > any problem with that? klogd is an utter slowpath anyway.
> > 
> > but console_sem isn't klogd. We delay klogd and that's perfectly 
> > fine, but afaict we don't delay console_sem.
> 
> But console_sem is really a similar special case as klogd. See, it's 
> about a *printk*. That's rare by definition.

But it's not rare, it's _the_ lock that serializes the whole console
layer. Pretty much everything a console does goes through that lock.

Delaying this by 10ms (CONFIG_HZ=100) per printk could really delay
the whole boot process.
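
To put a rough number on that concern: with CONFIG_HZ=100 a jiffy is 10 ms,
so if every console_sem release during boot has to wait for the next tick,
the delays add up fast. A back-of-the-envelope sketch; the number of
acquire/release cycles is an illustrative assumption, not a measurement:

```python
# Worst-case extra boot latency if every console_sem release is deferred
# to the next timer tick. HZ matches CONFIG_HZ=100; the cycle count is
# an assumed illustrative figure.
HZ = 100
jiffy_ms = 1000 / HZ           # one tick = 10 ms
boot_console_cycles = 1000     # assumed console_sem acquire/release pairs

worst_case_s = boot_console_cycles * jiffy_ms / 1000
print(f"up to {worst_case_s:.0f} s of added serialized delay")
```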

> If someone on the console sees it he'll be startled by at least 10 
> msecs ;-) So delaying the wakeup to the next jiffy really fits into 
> the same approach as we already do with &log_wait, hm?

Not convinced yet, I mean, don't get me wrong, I'd love to rid us of the
thing, but I'm not sure delaying the release of a resource like this is
the right approach.

Ahh, what we could do is something like the below and delay both the
acquire and release of the console_sem.

---
 kernel/printk.c |   86 +++++++++++++++++++++++++-----------------------------
 1 files changed, 40 insertions(+), 46 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 3518539..d3bdf5a 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -686,6 +686,7 @@ static void zap_locks(void)
 
 	oops_timestamp = jiffies;
 
+	debug_locks_off();
 	/* If a crash is occurring, make sure we can't deadlock */
 	spin_lock_init(&logbuf_lock);
 	/* And make sure that we print immediately */
@@ -774,16 +775,13 @@ static inline int can_use_console(unsigned int cpu)
  * messages from a 'printk'. Return true (and with the
  * console_lock held, and 'console_locked' set) if it
  * is successful, false otherwise.
- *
- * This gets called with the 'logbuf_lock' spinlock held and
- * interrupts disabled. It should return with 'lockbuf_lock'
- * released but interrupts still disabled.
  */
 static int console_trylock_for_printk(unsigned int cpu)
 	__releases(&logbuf_lock)
 {
 	int retval = 0;
 
+	spin_lock(&logbuf_lock);
 	if (console_trylock()) {
 		retval = 1;
 
@@ -803,12 +801,27 @@ static int console_trylock_for_printk(unsigned int cpu)
 	spin_unlock(&logbuf_lock);
 	return retval;
 }
+
 static const char recursion_bug_msg [] =
 		KERN_CRIT "BUG: recent printk recursion!\n";
 static int recursion_bug;
 static int new_text_line = 1;
 static char printk_buf[1024];
 
+static DEFINE_PER_CPU(int, printk_pending);
+
+int printk_needs_cpu(int cpu)
+{
+	if (cpu_is_offline(cpu))
+		printk_tick();
+	return __this_cpu_read(printk_pending);
+}
+
+void printk_set_pending(void)
+{
+	this_cpu_write(printk_pending, 1);
+}
+
 int printk_delay_msec __read_mostly;
 
 static inline void printk_delay(void)
@@ -836,9 +849,8 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	boot_delay_msec();
 	printk_delay();
 
-	preempt_disable();
 	/* This stops the holder of console_sem just where we want him */
-	raw_local_irq_save(flags);
+	local_irq_save(flags);
 	this_cpu = smp_processor_id();
 
 	/*
@@ -859,7 +871,6 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 		zap_locks();
 	}
 
-	lockdep_off();
 	spin_lock(&logbuf_lock);
 	printk_cpu = this_cpu;
 
@@ -942,25 +953,13 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 		if (*p == '\n')
 			new_text_line = 1;
 	}
+	spin_unlock(&logbuf_lock);
 
-	/*
-	 * Try to acquire and then immediately release the
-	 * console semaphore. The release will do all the
-	 * actual magic (print out buffers, wake up klogd,
-	 * etc). 
-	 *
-	 * The console_trylock_for_printk() function
-	 * will release 'logbuf_lock' regardless of whether it
-	 * actually gets the semaphore or not.
-	 */
-	if (console_trylock_for_printk(this_cpu))
-		console_unlock();
+	printk_set_pending();
 
-	lockdep_on();
 out_restore_irqs:
-	raw_local_irq_restore(flags);
+	local_irq_restore(flags);
 
-	preempt_enable();
 	return printed_len;
 }
 EXPORT_SYMBOL(printk);
@@ -1201,29 +1200,6 @@ int is_console_locked(void)
 	return console_locked;
 }
 
-static DEFINE_PER_CPU(int, printk_pending);
-
-void printk_tick(void)
-{
-	if (__this_cpu_read(printk_pending)) {
-		__this_cpu_write(printk_pending, 0);
-		wake_up_interruptible(&log_wait);
-	}
-}
-
-int printk_needs_cpu(int cpu)
-{
-	if (cpu_is_offline(cpu))
-		printk_tick();
-	return __this_cpu_read(printk_pending);
-}
-
-void wake_up_klogd(void)
-{
-	if (waitqueue_active(&log_wait))
-		this_cpu_write(printk_pending, 1);
-}
-
 /**
  * console_unlock - unlock the console system
  *
@@ -1273,11 +1249,29 @@ void console_unlock(void)
 
 	up(&console_sem);
 	spin_unlock_irqrestore(&logbuf_lock, flags);
+
 	if (wake_klogd)
-		wake_up_klogd();
+		wake_up_interruptible(&log_wait);
 }
 EXPORT_SYMBOL(console_unlock);
 
+void printk_tick(void)
+{
+	if (!__this_cpu_read(printk_pending))
+		return;
+
+	/*
+	 * Try to acquire and then immediately release the
+	 * console semaphore. The release will do all the
+	 * actual magic (print out buffers, wake up klogd,
+	 * etc). 
+	 */
+	if (console_trylock_for_printk(smp_processor_id())) {
+		console_unlock();
+		__this_cpu_write(printk_pending, 0);
+	}
+}
+
 /**
  * console_conditional_schedule - yield the CPU if required
  *


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 16:38                                                                     ` Arne Jansen
@ 2011-06-06 16:45                                                                       ` Arne Jansen
  2011-06-06 16:53                                                                         ` Peter Zijlstra
  2011-06-06 17:07                                                                         ` Ingo Molnar
  0 siblings, 2 replies; 152+ messages in thread
From: Arne Jansen @ 2011-06-06 16:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 06.06.2011 18:38, Arne Jansen wrote:
> On 06.06.2011 18:17, Ingo Molnar wrote:
>>
>> * Peter Zijlstra<peterz@infradead.org> wrote:
>>
>>> On Mon, 2011-06-06 at 18:08 +0200, Ingo Molnar wrote:
>>>> * Peter Zijlstra<peterz@infradead.org> wrote:
>>>>
>>>>> On Mon, 2011-06-06 at 17:52 +0200, Ingo Molnar wrote:
>>>>>> * Peter Zijlstra<peterz@infradead.org> wrote:
>>>>>>
>>>>>>> Needs more staring at, preferably by someone who actually
>>>>>>> understands that horrid mess :/ Also, this all still doesn't make
>>>>>>> printk() work reliably while holding rq->lock.
>>>>>>
>>>>>> So, what about my suggestion to just *remove* the wakeup from there
>>>>>> and use the deferred wakeup mechanism that klogd uses.
>>>>>>
>>>>>> That would make printk() *visibly* more robust in practice.
>>>>>
>>>>> That's currently done from the jiffy tick, do you want to effectively
>>>>> delay releasing the console_sem for the better part of a jiffy?
>>>>
>>>> Yes, and we already do it in some other circumstances.
>>>
>>> We do?
>>
>> Yes, see the whole printk_pending logic, it delays:
>>
>> wake_up_interruptible(&log_wait);
>>
>> to the next jiffies tick.
>>
>>>> Can you see
>>>> any problem with that? klogd is an utter slowpath anyway.
>>>
>>> but console_sem isn't klogd. We delay klogd and that's perfectly
>>> fine, but afaict we don't delay console_sem.
>>
>> But console_sem is really a similar special case as klogd. See, it's
>> about a *printk*. That's rare by definition.
>>
>> If someone on the console sees it he'll be startled by at least 10
>> msecs ;-) So delaying the wakeup to the next jiffy really fits into
>> the same approach as we already do with&log_wait, hm?
>
> As long as it doesn't scramble the order of the messages, the delay
> imho doesn't matter even in very printk-heavy debugging sessions.

And, as important, doesn't reduce the throughput of printk. Having only
100 wakeups/s sounds like the throughput is limited to 100xsizeof(ring 
buffer).
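
That bound can be made concrete: if the ring buffer is drained at most once
per tick, sustained printk throughput is capped at HZ times the buffer size.
A quick sketch, where the 128 KiB buffer (CONFIG_LOG_BUF_SHIFT=17) is an
assumed example, not a measured configuration:

```python
# Upper bound on printk throughput if the log buffer is flushed at most
# once per timer tick. The buffer size is an assumed example value.
HZ = 100                  # 100 wakeups/s with CONFIG_HZ=100
log_buf_len = 1 << 17     # assumed 128 KiB ring buffer
max_bytes_per_s = HZ * log_buf_len
print(max_bytes_per_s / (1 << 20), "MiB/s ceiling")  # -> 12.5 MiB/s ceiling
```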

>
>>
>> This would solve a real nightmare that has plagued us ever since
>> printk() has done wakeups directly - i.e. like forever.
>>
>> Thanks,
>>
>> Ingo


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 16:44                                                                     ` Peter Zijlstra
@ 2011-06-06 16:50                                                                       ` Peter Zijlstra
  2011-06-06 17:13                                                                         ` Ingo Molnar
  2011-06-06 17:04                                                                       ` Peter Zijlstra
  2011-06-06 17:11                                                                       ` Ingo Molnar
  2 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 16:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 18:44 +0200, Peter Zijlstra wrote:
> +void printk_tick(void)
> +{
> +       if (!__this_cpu_read(printk_pending))
> +               return;
> +
> +       /*
> +        * Try to acquire and then immediately release the
> +        * console semaphore. The release will do all the
> +        * actual magic (print out buffers, wake up klogd,
> +        * etc). 
> +        */
> +       if (console_trylock_for_printk(smp_processor_id())) {
> +               console_unlock();
> +               __this_cpu_write(printk_pending, 0);
> +       }
> +} 

Aside from not compiling (someone stuck a ref to wake_up_klogd somewhere
in lib/), this does delay the whole of printk() output by up to a jiffy;
if the machine dies funny you could be missing large parts of the
output :/



^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 16:45                                                                       ` Arne Jansen
@ 2011-06-06 16:53                                                                         ` Peter Zijlstra
  2011-06-06 17:07                                                                         ` Ingo Molnar
  1 sibling, 0 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 16:53 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Ingo Molnar, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 18:45 +0200, Arne Jansen wrote:
> > As long as it doesn't scramble the order of the messages, the delay
> > imho doesn't matter even in very printk-heavy debugging sessions.
> 
> And, as important, doesn't reduce the throughput of printk. Having only
> 100 wakeups/s sounds like the throughput is limited to 100xsizeof(ring 
> buffer). 

Right, that's another problem.. not really sure delaying all this is
going to work :/



^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 16:44                                                                     ` Peter Zijlstra
  2011-06-06 16:50                                                                       ` Peter Zijlstra
@ 2011-06-06 17:04                                                                       ` Peter Zijlstra
  2011-06-06 17:11                                                                       ` Ingo Molnar
  2 siblings, 0 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 17:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 18:44 +0200, Peter Zijlstra wrote:
>  
> @@ -942,25 +953,13 @@ asmlinkage int vprintk(const char *fmt, va_list args)
>                 if (*p == '\n')
>                         new_text_line = 1;
>         }
> +       spin_unlock(&logbuf_lock);
>  
> -       /*
> -        * Try to acquire and then immediately release the
> -        * console semaphore. The release will do all the
> -        * actual magic (print out buffers, wake up klogd,
> -        * etc). 
> -        *
> -        * The console_trylock_for_printk() function
> -        * will release 'logbuf_lock' regardless of whether it
> -        * actually gets the semaphore or not.
> -        */
> -       if (console_trylock_for_printk(this_cpu))
> -               console_unlock(); 

FWIW the existing printk recursion logic is broken, console_unlock()
clears printk_cpu but console_trylock_for_printk() can release
logbuf_lock and fail the trylock of console_sem, in which case a
subsequent printk() is perfectly valid and non-recursing.
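
A toy model of the interleaving described above (hypothetical names, heavily
simplified from the real vprintk()/console_unlock() flow): printk_cpu is
cleared only by console_unlock(), so a failed console_sem trylock leaves it
set, and the next perfectly valid printk on that CPU is flagged as recursion:

```python
# Minimal model: printk_cpu is only reset when the trylock succeeds
# (i.e. when console_unlock() runs), mirroring the broken logic.
UNSET = None
printk_cpu = UNSET          # last CPU seen inside vprintk()
console_sem_free = True     # whether console_trylock() would succeed

def vprintk(cpu):
    """Return True if the (simplified) recursion check fires."""
    global printk_cpu
    recursed = (printk_cpu == cpu)   # the recursion detection
    printk_cpu = cpu
    if console_sem_free:             # console_trylock_for_printk()
        printk_cpu = UNSET           # console_unlock() clears it
    return recursed

console_sem_free = False    # someone else holds console_sem
assert vprintk(0) is False  # first printk: trylock fails, no recursion
assert vprintk(0) is True   # valid follow-up printk misreported as recursion
```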



^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 16:45                                                                       ` Arne Jansen
  2011-06-06 16:53                                                                         ` Peter Zijlstra
@ 2011-06-06 17:07                                                                         ` Ingo Molnar
  2011-06-06 17:11                                                                           ` Peter Zijlstra
  1 sibling, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-06 17:07 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Arne Jansen <lists@die-jansens.de> wrote:

> > As long as it doesn't scramble the order of the messages, the 
> > delay imho doesn't matter even in very printk-heavy debugging 
> > sessions.
> 
> And, as important, doesn't reduce the throughput of printk. Having 
> only 100 wakeups/s sounds like the throughput is limited to 
> 100xsizeof(ring buffer).

Nah.

I for example *always* kill klogd during such printk based debugging 
sessions, because it's *already* very easy to overflow its buffering 
abilities. Also, klogd often interferes with debugging.

So i make the log buffer big enough to contain enough debugging info.

So it's a non-issue IMHO. Linus, what do you think?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 17:07                                                                         ` Ingo Molnar
@ 2011-06-06 17:11                                                                           ` Peter Zijlstra
  2011-06-08 15:50                                                                             ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 17:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 19:07 +0200, Ingo Molnar wrote:
> * Arne Jansen <lists@die-jansens.de> wrote:
> 
> > > As long as it doesn't scramble the order of the messages, the 
> > > delay imho doesn't matter even in very printk-heavy debugging 
> > > sessions.
> > 
> > And, as important, doesn't reduce the throughput of printk. Having 
> > only 100 wakeups/s sounds like the throughput is limited to 
> > 100xsizeof(ring buffer).
> 
> Nah.
> 
> I for example *always* kill klogd during such printk based debugging 
> sessions, because it's *already* very easy to overflow its buffering 
> abilities. Also, klogd often interferes with debugging.

Also, klogd is completely irrelevant, klogd doesn't do anything useful.
Writing things to the actual console otoh is very useful (you get to see
them on the screen/serial line).

Delaying the console_sem release will delay anything touching the
console_sem, including userland stuffs.

Delaying the console_sem acquire+release will delay showing important
printk() lines on your serial.

Both suck.

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 16:44                                                                     ` Peter Zijlstra
  2011-06-06 16:50                                                                       ` Peter Zijlstra
  2011-06-06 17:04                                                                       ` Peter Zijlstra
@ 2011-06-06 17:11                                                                       ` Ingo Molnar
  2011-06-06 17:57                                                                         ` Arne Jansen
  2 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-06 17:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> > > but console_sem isn't klogd. We delay klogd and that's 
> > > perfectly fine, but afaict we don't delay console_sem.
> > 
> > But console_sem is really a similar special case as klogd. See, 
> > it's about a *printk*. That's rare by definition.
> 
> But its not rare, its _the_ lock that serialized the whole console 
> layer. Pretty much everything a console does goes through that 
> lock.

Please. Think.

If console_sem was so frequently held then why on earth were you 
*unable* to trigger the lockup with an artificial printk() storm and 
why on earth has almost no-one else but Arne triggered it? :-)

This bug is the very proof that console_sem is seldom contended!

> Ahh, what we could do is something like the below and delay both 
> the acquire and release of the console_sem.

Yeah!

> +void printk_tick(void)
> +{
> +	if (!__this_cpu_read(printk_pending))
> +		return;
> +
> +	/*
> +	 * Try to acquire and then immediately release the
> +	 * console semaphore. The release will do all the
> +	 * actual magic (print out buffers, wake up klogd,
> +	 * etc). 
> +	 */
> +	if (console_trylock_for_printk(smp_processor_id())) {
> +		console_unlock();
> +		__this_cpu_write(printk_pending, 0);
> +	}
> +}

Arne, does this fix the hang you are seeing?

Now, we probably don't want to do this in 3.0, just to give time for
interactions to be found and complaints to be worded. So we could do the
minimal fix first and queue up the bigger change for 3.1.

Hm?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 16:50                                                                       ` Peter Zijlstra
@ 2011-06-06 17:13                                                                         ` Ingo Molnar
  0 siblings, 0 replies; 152+ messages in thread
From: Ingo Molnar @ 2011-06-06 17:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> Aside from not compiling (someone stuck a ref to wake_up_klogd 
> somewhere in lib/) this does delay the whole of printk() output by 
> up to a jiffy, if the machine dies funny you could be missing large 
> parts of the output :/

oh, that's bad and i missed that aspect.

We want the console *output* immediately. That's non-negotiable!

What we want to delay are the various wakeups of secondary recipients 
of printks ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 15:08                                                                             ` Ingo Molnar
@ 2011-06-06 17:44                                                                               ` Mike Galbraith
  0 siblings, 0 replies; 152+ messages in thread
From: Mike Galbraith @ 2011-06-06 17:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Arne Jansen, Linus Torvalds, mingo, hpa,
	linux-kernel, npiggin, akpm, frank.rowand, tglx,
	linux-tip-commits

On Mon, 2011-06-06 at 17:08 +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > The real fix might be to remove the lockdep_off()/on() call from 
> > printk(), that looks actively evil ... we had to hack through 
> > several layers of side-effects before we found the real bug - so 
> > it's not like the off()/on() made things more robust!
> 
> The other obvious fix would be to *remove* the blasted wakeup from 
> printk(). It's a serious debugging robustness violation and it's not 
> like the wakeup is super important latency-wise.
> 
> We *already* have a timer tick driven klogd wakeup poll routine. So i 
> doubt we'd have many problems from not doing wakeups from printk(). 
> Opinions?

Seconded!  I routinely whack that damn thing.

	-Mike


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 17:11                                                                       ` Ingo Molnar
@ 2011-06-06 17:57                                                                         ` Arne Jansen
  2011-06-06 18:07                                                                           ` Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Arne Jansen @ 2011-06-06 17:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 06.06.2011 19:11, Ingo Molnar wrote:
>
> * Peter Zijlstra<peterz@infradead.org>  wrote:

>
>> +void printk_tick(void)
>> +{
>> +	if (!__this_cpu_read(printk_pending))
>> +		return;
>> +
>> +	/*
>> +	 * Try to acquire and then immediately release the
>> +	 * console semaphore. The release will do all the
>> +	 * actual magic (print out buffers, wake up klogd,
>> +	 * etc).
>> +	 */
>> +	if (console_trylock_for_printk(smp_processor_id())) {
>> +		console_unlock();
>> +		__this_cpu_write(printk_pending, 0);
>> +	}
>> +}
>
> Arne does this fix the hang you are seeing?

What do you want me to test? just replace printk_tick with the
above version? If I do that, the machine doesn't even boot up
any more.


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 17:57                                                                         ` Arne Jansen
@ 2011-06-06 18:07                                                                           ` Ingo Molnar
  2011-06-06 18:14                                                                             ` Arne Jansen
  2011-06-06 18:19                                                                             ` Peter Zijlstra
  0 siblings, 2 replies; 152+ messages in thread
From: Ingo Molnar @ 2011-06-06 18:07 UTC (permalink / raw)
  To: Arne Jansen
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Arne Jansen <lists@die-jansens.de> wrote:

> On 06.06.2011 19:11, Ingo Molnar wrote:
> >
> >* Peter Zijlstra<peterz@infradead.org>  wrote:
> 
> >
> >>+void printk_tick(void)
> >>+{
> >>+	if (!__this_cpu_read(printk_pending))
> >>+		return;
> >>+
> >>+	/*
> >>+	 * Try to acquire and then immediately release the
> >>+	 * console semaphore. The release will do all the
> >>+	 * actual magic (print out buffers, wake up klogd,
> >>+	 * etc).
> >>+	 */
> >>+	if (console_trylock_for_printk(smp_processor_id())) {
> >>+		console_unlock();
> >>+		__this_cpu_write(printk_pending, 0);
> >>+	}
> >>+}
> >
> >Arne does this fix the hang you are seeing?
> 
> What do you want me to test? just replace printk_tick with the 
> above version? If I do that, the machine doesn't even boot up any 
> more.

Yeah.

So i think we want two patches:

 - The first one that minimally removes the lockdep_off()/on() dance 
   and fixes the regression: the patch that i sent earlier today.
   I *think* that should fix the crash.

   3.0 material.

 - The second one that moves console_sem wakeups to the jiffies tick. 
   It does not push the acquiring and the console->write() calls to
   jiffies context, only delays the wakeup.

   3.1 material.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 18:07                                                                           ` Ingo Molnar
@ 2011-06-06 18:14                                                                             ` Arne Jansen
  2011-06-06 18:19                                                                             ` Peter Zijlstra
  1 sibling, 0 replies; 152+ messages in thread
From: Arne Jansen @ 2011-06-06 18:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On 06.06.2011 20:07, Ingo Molnar wrote:
>
> * Arne Jansen<lists@die-jansens.de>  wrote:
>
>> On 06.06.2011 19:11, Ingo Molnar wrote:
>>>
>>> * Peter Zijlstra<peterz@infradead.org>   wrote:
>>
>>>
>>>> +void printk_tick(void)
>>>> +{
>>>> +	if (!__this_cpu_read(printk_pending))
>>>> +		return;
>>>> +
>>>> +	/*
>>>> +	 * Try to acquire and then immediately release the
>>>> +	 * console semaphore. The release will do all the
>>>> +	 * actual magic (print out buffers, wake up klogd,
>>>> +	 * etc).
>>>> +	 */
>>>> +	if (console_trylock_for_printk(smp_processor_id())) {
>>>> +		console_unlock();
>>>> +		__this_cpu_write(printk_pending, 0);
>>>> +	}
>>>> +}
>>>
>>> Arne does this fix the hang you are seeing?
>>
>> What do you want me to test? just replace printk_tick with the
>> above version? If I do that, the machine doesn't even boot up any
>> more.
>
> Yeah.
>
> So i think we want two patches:
>
>   - The first one that minimally removes the lockdep_off()/on() dance
>     and fixes the regression: the patch that i sent earlier today.
>     I *think* that should fix the crash.

Isn't the regression just the false lockdep_assert_held(&p->pi_lock)?
The patch Peter sent earlier seems like the minimal changeset to fix
that, plus it fixes a bug that might pop up somewhere else, too.

>
>     3.0 material.
>
>   - The second one that moves console_sem wakeups to the jiffies tick.
>     It does not push the acquiring and the console->write() calls to
>     jiffies context, only delays the wakeup.
>
>     3.1 material.
>
> Thanks,
>
> 	Ingo


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 18:07                                                                           ` Ingo Molnar
  2011-06-06 18:14                                                                             ` Arne Jansen
@ 2011-06-06 18:19                                                                             ` Peter Zijlstra
  2011-06-06 22:08                                                                               ` Ingo Molnar
  1 sibling, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-06 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 20:07 +0200, Ingo Molnar wrote:

>  - The first one that minimally removes the lockdep_off()/on() dance 
>    and fixes the regression: the patch that i sent earlier today.
>    I *think* that should fix the crash.
> 
>    3.0 material.

I think the lock_is_held() fix is the urgent one; all the others are
make-printk()-suck-less patches, i.e. not so very important.

Without that lock_is_held() thing, all lockdep_off() sites, including
ntfs, will still be able to trigger this problem.
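
A minimal model of that failure mode (hypothetical names; the real code is
in kernel/lockdep.c): lockdep_off() bumps current->lockdep_recursion, the
unpatched lock_is_held() then returns 0 unconditionally, and
lockdep_assert_held() misfires even though the lock really is held:

```python
# Model of the false-negative: under lockdep_recursion the unpatched
# lock_is_held() says "not held" regardless of reality.
lockdep_recursion = 0
held_locks = set()

def lockdep_off():
    global lockdep_recursion
    lockdep_recursion += 1

def lock_is_held(lock, patched=False):
    if lockdep_recursion:
        # unpatched: report "not held"; patched: report "held"
        return 1 if patched else 0
    return int(lock in held_locks)

def lockdep_assert_held(lock, patched=False):
    return bool(lock_is_held(lock, patched))  # False => spurious WARN

held_locks.add("pi_lock")   # we really do hold the lock
lockdep_off()               # e.g. inside printk()
assert lockdep_assert_held("pi_lock") is False           # false warning
assert lockdep_assert_held("pi_lock", patched=True) is True
```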


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 18:19                                                                             ` Peter Zijlstra
@ 2011-06-06 22:08                                                                               ` Ingo Molnar
  0 siblings, 0 replies; 152+ messages in thread
From: Ingo Molnar @ 2011-06-06 22:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, 2011-06-06 at 20:07 +0200, Ingo Molnar wrote:
> 
> >  - The first one that minimally removes the lockdep_off()/on() dance 
> >    and fixes the regression: the patch that i sent earlier today.
> >    I *think* that should fix the crash.
> > 
> >    3.0 material.
> 
> I think the lock_is_held() fix is the urgent one, all the others are
> make printk() suck less patches, ie. not so very important.
> 
> Without that lock_is_held() thing all lockdep_off() sites, inc ntfs will
> still be able to trigger this problem.

ok. Could you please send a changeloggified patch with a 
Reported-and-tested-by from Arne in a new thread, etc?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06  9:01                                                                         ` Peter Zijlstra
                                                                                             ` (2 preceding siblings ...)
  2011-06-06 15:04                                                                           ` Ingo Molnar
@ 2011-06-07  5:20                                                                           ` Mike Galbraith
  3 siblings, 0 replies; 152+ messages in thread
From: Mike Galbraith @ 2011-06-07  5:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Ingo Molnar, Linus Torvalds, mingo, hpa,
	linux-kernel, npiggin, akpm, frank.rowand, tglx,
	linux-tip-commits

On Mon, 2011-06-06 at 11:01 +0200, Peter Zijlstra wrote:
> On Sun, 2011-06-05 at 22:15 +0200, Arne Jansen wrote:
> > 
> > Can lockdep just get confused by the lockdep_off/on calls in printk
> > while scheduling is allowed? There aren't many users of lockdep_off().
> 
> Yes!, in that case lock_is_held() returns false, triggering the warning.
> I guess there's an argument to be made in favour of the below..

I've been testing/rebooting for a couple hours now, x3550 M3 lockup woes
using Arne's config are history.

All-better-by: (nah, heel sense-o-humor, _down_ i say;)
  
> ---
>  kernel/lockdep.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 53a6895..e4129cf 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -3242,7 +3242,7 @@ int lock_is_held(struct lockdep_map *lock)
>  	int ret = 0;
>  
>  	if (unlikely(current->lockdep_recursion))
> -		return ret;
> +		return 1; /* avoid false negative lockdep_assert_held */
>  
>  	raw_local_irq_save(flags);
>  	check_flags(flags);
> 



^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-06 17:11                                                                           ` Peter Zijlstra
@ 2011-06-08 15:50                                                                             ` Peter Zijlstra
  2011-06-08 19:17                                                                               ` Ingo Molnar
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-08 15:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Mon, 2011-06-06 at 19:11 +0200, Peter Zijlstra wrote:
> Delaying the console_sem release will delay anything touching the
> console_sem, including userland stuffs.
> 
> Delaying the console_sem acquire+release will delay showing important
> printk() lines on your serial.
> 
> Both suck. 

I came up with the below hackery, seems to actually boot and such on a
lockdep enabled kernel (although Ingo did report lockups with a partial
version of the patch, still need to look at that).

The idea is to use the console_sem.lock instead of the semaphore itself;
we flush the console when console_sem.count > 0, which means it's
uncontended. It's more or less equivalent to down_trylock() + up(),
except it never releases the sem internal lock, and optimizes the count
fiddling away.

It doesn't require a wakeup because any real semaphore contention will
still be spinning on the spinlock instead of enqueueing itself on the
waitlist.

It's rather ugly, exposes semaphore internals in places it shouldn't,
although we could of course expose some primitives for this, but then
people might think it'd be okay to use them etc..

/me puts on the asbestos underwear

comments?

---
 kernel/printk.c |  113 +++++++++++++++++++-----------------------------------
 1 files changed, 40 insertions(+), 73 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 3518539..127b003 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -686,6 +686,7 @@ static void zap_locks(void)
 
 	oops_timestamp = jiffies;
 
+	debug_locks_off();
 	/* If a crash is occurring, make sure we can't deadlock */
 	spin_lock_init(&logbuf_lock);
 	/* And make sure that we print immediately */
@@ -769,40 +770,6 @@ static inline int can_use_console(unsigned int cpu)
 	return cpu_online(cpu) || have_callable_console();
 }
 
-/*
- * Try to get console ownership to actually show the kernel
- * messages from a 'printk'. Return true (and with the
- * console_lock held, and 'console_locked' set) if it
- * is successful, false otherwise.
- *
- * This gets called with the 'logbuf_lock' spinlock held and
- * interrupts disabled. It should return with 'lockbuf_lock'
- * released but interrupts still disabled.
- */
-static int console_trylock_for_printk(unsigned int cpu)
-	__releases(&logbuf_lock)
-{
-	int retval = 0;
-
-	if (console_trylock()) {
-		retval = 1;
-
-		/*
-		 * If we can't use the console, we need to release
-		 * the console semaphore by hand to avoid flushing
-		 * the buffer. We need to hold the console semaphore
-		 * in order to do this test safely.
-		 */
-		if (!can_use_console(cpu)) {
-			console_locked = 0;
-			up(&console_sem);
-			retval = 0;
-		}
-	}
-	printk_cpu = UINT_MAX;
-	spin_unlock(&logbuf_lock);
-	return retval;
-}
 static const char recursion_bug_msg [] =
 		KERN_CRIT "BUG: recent printk recursion!\n";
 static int recursion_bug;
@@ -823,6 +790,8 @@ static inline void printk_delay(void)
 	}
 }
 
+static void __console_flush(void);
+
 asmlinkage int vprintk(const char *fmt, va_list args)
 {
 	int printed_len = 0;
@@ -836,9 +805,8 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	boot_delay_msec();
 	printk_delay();
 
-	preempt_disable();
 	/* This stops the holder of console_sem just where we want him */
-	raw_local_irq_save(flags);
+	local_irq_save(flags);
 	this_cpu = smp_processor_id();
 
 	/*
@@ -859,7 +827,6 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 		zap_locks();
 	}
 
-	lockdep_off();
 	spin_lock(&logbuf_lock);
 	printk_cpu = this_cpu;
 
@@ -942,25 +909,18 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 		if (*p == '\n')
 			new_text_line = 1;
 	}
+	printk_cpu = UINT_MAX;
+	spin_unlock(&logbuf_lock);
 
-	/*
-	 * Try to acquire and then immediately release the
-	 * console semaphore. The release will do all the
-	 * actual magic (print out buffers, wake up klogd,
-	 * etc). 
-	 *
-	 * The console_trylock_for_printk() function
-	 * will release 'logbuf_lock' regardless of whether it
-	 * actually gets the semaphore or not.
-	 */
-	if (console_trylock_for_printk(this_cpu))
-		console_unlock();
+	spin_lock(&console_sem.lock);
+	if (console_sem.count > 0 && can_use_console(smp_processor_id()))
+		__console_flush();
+
+	spin_unlock(&console_sem.lock);
 
-	lockdep_on();
 out_restore_irqs:
-	raw_local_irq_restore(flags);
+	local_irq_restore(flags);
 
-	preempt_enable();
 	return printed_len;
 }
 EXPORT_SYMBOL(printk);
@@ -1224,31 +1184,12 @@ void wake_up_klogd(void)
 		this_cpu_write(printk_pending, 1);
 }
 
-/**
- * console_unlock - unlock the console system
- *
- * Releases the console_lock which the caller holds on the console system
- * and the console driver list.
- *
- * While the console_lock was held, console output may have been buffered
- * by printk().  If this is the case, console_unlock(); emits
- * the output prior to releasing the lock.
- *
- * If there is output waiting for klogd, we wake it up.
- *
- * console_unlock(); may be called from any context.
- */
-void console_unlock(void)
+static void __console_flush(void)
 {
 	unsigned long flags;
 	unsigned _con_start, _log_end;
 	unsigned wake_klogd = 0;
 
-	if (console_suspended) {
-		up(&console_sem);
-		return;
-	}
-
 	console_may_schedule = 0;
 
 	for ( ; ; ) {
@@ -1271,11 +1212,37 @@ void console_unlock(void)
 	if (unlikely(exclusive_console))
 		exclusive_console = NULL;
 
-	up(&console_sem);
 	spin_unlock_irqrestore(&logbuf_lock, flags);
+
 	if (wake_klogd)
 		wake_up_klogd();
 }
+
+/**
+ * console_unlock - unlock the console system
+ *
+ * Releases the console_lock which the caller holds on the console system
+ * and the console driver list.
+ *
+ * While the console_lock was held, console output may have been buffered
+ * by printk().  If this is the case, console_unlock(); emits
+ * the output prior to releasing the lock.
+ *
+ * If there is output waiting for klogd, we wake it up.
+ *
+ * console_unlock(); may be called from any context.
+ */
+void console_unlock(void)
+{
+	if (console_suspended) {
+		up(&console_sem);
+		return;
+	}
+
+	__console_flush();
+
+	up(&console_sem);
+}
 EXPORT_SYMBOL(console_unlock);
 
 /**


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-08 15:50                                                                             ` Peter Zijlstra
@ 2011-06-08 19:17                                                                               ` Ingo Molnar
  2011-06-08 19:27                                                                                 ` Linus Torvalds
  2011-06-08 19:45                                                                                 ` Peter Zijlstra
  0 siblings, 2 replies; 152+ messages in thread
From: Ingo Molnar @ 2011-06-08 19:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> I came up with the below hackery, seems to actually boot and such 
> on a lockdep enabled kernel (although Ingo did report lockups with 
> a partial version of the patch, still need to look at that).
> 
> The idea is to use the console_sem.lock instead of the semaphore 
> itself; we flush the console when console_sem.count > 0, which 
> means it's uncontended. It's more or less equivalent to 
> down_trylock() + up(), except it never releases the sem internal 
> lock, and optimizes the count fiddling away.
> 
> It doesn't require a wakeup because any real semaphore contention 
> will still be spinning on the spinlock instead of enqueueing itself 
> on the waitlist.
> 
> It's rather ugly, exposes semaphore internals in places it 
> shouldn't, although we could of course expose some primitives for 
> this, but then people might think it'd be okay to use them etc..
> 
> /me puts on the asbestos underwear

Hm, the no-wakeup aspect seems rather useful.

Could we perhaps remove console_sem and replace it with a mutex and 
do something like this with a mutex and its ->wait_lock?

We'd have two happy side effects:

 - we'd thus remove one of the last core kernel semaphore users
 - we'd gain lockdep coverage for console locking as a bonus ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-08 19:17                                                                               ` Ingo Molnar
@ 2011-06-08 19:27                                                                                 ` Linus Torvalds
  2011-06-08 20:32                                                                                   ` Peter Zijlstra
  2011-06-08 19:45                                                                                 ` Peter Zijlstra
  1 sibling, 1 reply; 152+ messages in thread
From: Linus Torvalds @ 2011-06-08 19:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Arne Jansen, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Wed, Jun 8, 2011 at 12:17 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> Hm, the no-wakeup aspect seems rather useful.

I like the patch, but I would like to get it much better abstracted
out. Make some kind of

  void atomic_down();
  int atomic_down_trylock();
  void atomic_up();

interfaces that basically get the semaphore in an "atomic" mode that
leaves the semaphore spinlock locked in the locked region.

So they would basically be spinlocks that can then be mixed with
normal sleeping semaphore usage.

> Could we perhaps remove console_sem and replace it with a mutex and
> do something like this with a mutex and its ->wait_lock?

That would be horrible.

The reason it works well for semaphores is that the semaphores have no
architecture-specific fast-path any more, and everything is done
within the spinlock.

With a mutex? Not good. We have several different mutex
implementations, along with fastpaths that never touch the spinlock at
all etc. And that is very much on purpose, because the spinlock
approach is noticeably slower and needs more atomic accesses. In
contrast, the semaphores are "legacy interfaces" and aren't considered
high-performance locking any more.

                      Linus

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-08 19:17                                                                               ` Ingo Molnar
  2011-06-08 19:27                                                                                 ` Linus Torvalds
@ 2011-06-08 19:45                                                                                 ` Peter Zijlstra
  2011-06-08 20:52                                                                                   ` Ingo Molnar
  1 sibling, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-08 19:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Wed, 2011-06-08 at 21:17 +0200, Ingo Molnar wrote:
> Hm, the no-wakeup aspect seems rather useful.
> 
> Could we perhaps remove console_sem and replace it with a mutex and 
> do something like this with a mutex and its ->wait_lock?
> 
> We'd have two happy side effects:
> 
>  - we'd thus remove one of the last core kernel semaphore users
>  - we'd gain lockdep coverage for console locking as a bonus ... 

The mutex thing is more complex due to the mutex fast path, the
advantage of the semaphore is its simple implementation that always
takes the internal lock.

I guess I can make it happen, but its a tad more tricky.

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-08 19:27                                                                                 ` Linus Torvalds
@ 2011-06-08 20:32                                                                                   ` Peter Zijlstra
  2011-06-08 20:53                                                                                     ` Linus Torvalds
  2011-06-08 20:54                                                                                     ` Thomas Gleixner
  0 siblings, 2 replies; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-08 20:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Arne Jansen, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Wed, 2011-06-08 at 12:27 -0700, Linus Torvalds wrote:
> Make some kind of
> 
>   void atomic_down();
>   int atomic_down_trylock();
>   void atomic_up(); 

atomic_down() is a tad iffy: it would have to wait for an actual
semaphore owner, which might sleep etc.. So I skipped it.

The other two are implemented here, and assume IRQs are disabled, we
could add _irq and _irqsave versions of both, but since there are no
users I avoided the effort.

---
 include/linux/semaphore.h |    3 +++
 kernel/semaphore.c        |   36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 38 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/semaphore.h
===================================================================
--- linux-2.6.orig/include/linux/semaphore.h
+++ linux-2.6/include/linux/semaphore.h
@@ -43,4 +43,7 @@ extern int __must_check down_trylock(str
 extern int __must_check down_timeout(struct semaphore *sem, long jiffies);
 extern void up(struct semaphore *sem);
 
+extern int atomic_down_trylock(struct semaphore *sem);
+extern void atomic_up(struct semaphore *sem);
+
 #endif /* __LINUX_SEMAPHORE_H */
Index: linux-2.6/kernel/semaphore.c
===================================================================
--- linux-2.6.orig/kernel/semaphore.c
+++ linux-2.6/kernel/semaphore.c
@@ -118,7 +118,7 @@ EXPORT_SYMBOL(down_killable);
  * down_trylock - try to acquire the semaphore, without waiting
  * @sem: the semaphore to be acquired
  *
- * Try to acquire the semaphore atomically.  Returns 0 if the mutex has
+ * Try to acquire the semaphore atomically.  Returns 0 if the semaphore has
  * been acquired successfully or 1 if it it cannot be acquired.
  *
  * NOTE: This return value is inverted from both spin_trylock and
@@ -143,6 +143,29 @@ int down_trylock(struct semaphore *sem)
 EXPORT_SYMBOL(down_trylock);
 
 /**
+ * atomic_down_trylock - try to acquire the semaphore internal lock
+ * @sem: the semaphore to be acquired
+ *
+ * Try to acquire the semaphore internal lock, blocking all other semaphore
+ * operations. Returns 0 if the trylock has been acquired successfully or
+ * 1 if it cannot be acquired.
+ *
+ * NOTE: This return value is inverted from both spin_trylock and
+ * mutex_trylock!  Be careful about this when converting code.
+ *
+ * NOTE: assumes IRQs are disabled.
+ */
+int atomic_down_trylock(struct semaphore *sem)
+{
+	spin_lock(&sem->lock);
+	if (sem->count > 0)
+		return 0;
+
+	spin_unlock(&sem->lock);
+	return 1;
+}
+
+/**
  * down_timeout - acquire the semaphore within a specified time
  * @sem: the semaphore to be acquired
  * @jiffies: how long to wait before failing
@@ -188,6 +211,17 @@ void up(struct semaphore *sem)
 }
 EXPORT_SYMBOL(up);
 
+/**
+ * atomic_up - release the semaphore internal lock
+ * @sem: the semaphore to release the internal lock of
+ *
+ * Release the semaphore internal lock.
+ */
+void atomic_up(struct semaphore *sem)
+{
+	spin_unlock(&sem->lock);
+}
+
 /* Functions for the contended case */
 
 struct semaphore_waiter {



^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-08 19:45                                                                                 ` Peter Zijlstra
@ 2011-06-08 20:52                                                                                   ` Ingo Molnar
  2011-06-08 21:49                                                                                     ` Peter Zijlstra
  0 siblings, 1 reply; 152+ messages in thread
From: Ingo Molnar @ 2011-06-08 20:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2011-06-08 at 21:17 +0200, Ingo Molnar wrote:
> > Hm, the no-wakeup aspect seems rather useful.
> > 
> > Could we perhaps remove console_sem and replace it with a mutex and 
> > do something like this with a mutex and its ->wait_lock?
> > 
> > We'd have two happy side effects:
> > 
> >  - we'd thus remove one of the last core kernel semaphore users
> >  - we'd gain lockdep coverage for console locking as a bonus ... 
> 
> The mutex thing is more complex due to the mutex fast path, the 
> advantage of the semaphore is its simple implementation that always 
> takes the internal lock.
> 
> I guess I can make it happen, but its a tad more tricky.

Hm, i thought it would be possible to only express it via the 
slowpath: if mutex_trylock() succeeds then *all* execution goes into 
the slowpath so we don't have to take all the fastpaths into account.

If that's not possible then i think you and Linus are right that it's 
not worth creating all the per-arch fastpath special cases for 
something like this.

The non-removal of the console_sem is sad though. Sniff.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-08 20:32                                                                                   ` Peter Zijlstra
@ 2011-06-08 20:53                                                                                     ` Linus Torvalds
  2011-06-08 20:54                                                                                     ` Thomas Gleixner
  1 sibling, 0 replies; 152+ messages in thread
From: Linus Torvalds @ 2011-06-08 20:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arne Jansen, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Wed, Jun 8, 2011 at 1:32 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> atomic_down() is a tad iffy: it would have to wait for an actual
> semaphore owner, which might sleep etc.. So I skipped it.

I think sleeping would be fine: the "atomic" part is about the code
the semaphore protects, not about the down() itself.

But the way you made the semantics be (caller has to disable
interrupts) for the other helpers, that doesn't really work.

                               Linus

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-08 20:32                                                                                   ` Peter Zijlstra
  2011-06-08 20:53                                                                                     ` Linus Torvalds
@ 2011-06-08 20:54                                                                                     ` Thomas Gleixner
  1 sibling, 0 replies; 152+ messages in thread
From: Thomas Gleixner @ 2011-06-08 20:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, Arne Jansen, mingo, hpa,
	linux-kernel, efault, npiggin, akpm, frank.rowand,
	linux-tip-commits

On Wed, 8 Jun 2011, Peter Zijlstra wrote:

> On Wed, 2011-06-08 at 12:27 -0700, Linus Torvalds wrote:
> > Make some kind of
> > 
> >   void atomic_down();
> >   int atomic_down_trylock();
> >   void atomic_up(); 
> 
> atomic_down() is a tad iffy: it would have to wait for an actual
> semaphore owner, which might sleep etc.. So I skipped it.
> 
> The other two are implemented here, and assume IRQs are disabled, we
> could add _irq and _irqsave versions of both, but since there are no
> users I avoided the effort.
> 
> ---
>  include/linux/semaphore.h |    3 +++
>  kernel/semaphore.c        |   36 +++++++++++++++++++++++++++++++++++-

Can we please confine this to kernel/printk.c ?

I can see the creative abuse of this already.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-08 20:52                                                                                   ` Ingo Molnar
@ 2011-06-08 21:49                                                                                     ` Peter Zijlstra
  2011-06-08 21:57                                                                                       ` Thomas Gleixner
  0 siblings, 1 reply; 152+ messages in thread
From: Peter Zijlstra @ 2011-06-08 21:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arne Jansen, Linus Torvalds, mingo, hpa, linux-kernel, efault,
	npiggin, akpm, frank.rowand, tglx, linux-tip-commits

On Wed, 2011-06-08 at 22:52 +0200, Ingo Molnar wrote:
> Hm, i thought it would be possible to only express it via the 
> slowpath: if mutex_trylock() succeeds then *all* execution goes into 
> the slowpath so we don't have to take all the fastpaths into account.

Right, but you first have to take wait_lock, then do the trylock, but
that's complicated for asm/mutex-null.h because trylock will then also
try to obtain the wait_lock.

You can do it by creating ___mutex_trylock_slowpath() which contains the
meat of __mutex_trylock_slowpath() and then implement
atomic_mutex_trylock{_irq,_irqsave,} using that, not releasing wait_lock
on success.

Shouldn't be too bad, but it ain't too pretty either.
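Shape-wise, the layering described above might look like this (pseudocode
sketch only: ___mutex_trylock_slowpath() is the hypothetical helper named
above, assumed to be called with ->wait_lock already held; the real mutex
fastpath is per-arch and never touches ->wait_lock):

```
/* pseudocode, not a real kernel API */
int atomic_mutex_trylock(struct mutex *lock)
{
	spin_lock(&lock->wait_lock);
	if (___mutex_trylock_slowpath(lock))	/* meat of __mutex_trylock_slowpath() */
		return 1;		/* success: ->wait_lock stays held */
	spin_unlock(&lock->wait_lock);
	return 0;
}

void atomic_mutex_unlock(struct mutex *lock)
{
	spin_unlock(&lock->wait_lock);	/* just drop the internal lock */
}
```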

Furthermore, like I said in my initial patch, I share Thomas' worry
about 'creative' usage of these primitives.


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages
  2011-06-08 21:49                                                                                     ` Peter Zijlstra
@ 2011-06-08 21:57                                                                                       ` Thomas Gleixner
  0 siblings, 0 replies; 152+ messages in thread
From: Thomas Gleixner @ 2011-06-08 21:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arne Jansen, Linus Torvalds, mingo, hpa,
	linux-kernel, efault, npiggin, akpm, frank.rowand,
	linux-tip-commits

On Wed, 8 Jun 2011, Peter Zijlstra wrote:

> On Wed, 2011-06-08 at 22:52 +0200, Ingo Molnar wrote:
> > Hm, i thought it would be possible to only express it via the 
> > slowpath: if mutex_trylock() succeeds then *all* execution goes into 
> > the slowpath so we don't have to take all the fastpaths into account.
> 
> Right, but you first have to take wait_lock, then do the trylock, but
> that's complicated for asm/mutex-null.h because trylock will then also
> try to obtain the wait_lock.
> 
> You can do it by creating ___mutex_trylock_slowpath() which contains the
> meat of __mutex_trylock_slowpath() and then implement
> atomic_mutex_trylock{_irq,_irqsave,} using that, not releasing wait_lock
> on success.
> 
> Shouldn't be too bad, but it ain't too pretty either.
> 
> Furthermore, like I said in my initial patch, I share Thomas' worry
> about 'creative' usage of these primitives.

We are way better off with the semaphore abuse confined to printk.c.

A mutex would give us lockdep coverage, but due to the strict owner
semantics - which we have already proven in -rt by converting it to a
mutex - we can annotate console_sem lockdep-wise and still keep the
nifty semaphore abuse.

Further, I don't have any worries about -rt either, as an RT task using
printk is doomed anyway and we should not encourage that by making it
somehow more deterministic.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 152+ messages in thread

end of thread, other threads:[~2011-06-08 21:57 UTC | newest]

Thread overview: 152+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-05 15:23 [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
2011-04-05 15:23 ` [PATCH 01/21] sched: Provide scheduler_ipi() callback in response to smp_send_reschedule() Peter Zijlstra
2011-04-13 21:15   ` Tony Luck
2011-04-13 21:38     ` Peter Zijlstra
2011-04-14  8:31   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 02/21] sched: Always provide p->on_cpu Peter Zijlstra
2011-04-14  8:31   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 03/21] mutex: Use p->on_cpu for the adaptive spin Peter Zijlstra
2011-04-14  8:32   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 04/21] sched: Change the ttwu success details Peter Zijlstra
2011-04-13  9:23   ` Peter Zijlstra
2011-04-13 10:48     ` Peter Zijlstra
2011-04-13 11:06       ` Peter Zijlstra
2011-04-13 18:39         ` Tejun Heo
2011-04-13 19:11           ` Peter Zijlstra
2011-04-14  8:32   ` [tip:sched/locking] sched: Change the ttwu() " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 05/21] sched: Clean up ttwu stats Peter Zijlstra
2011-04-14  8:33   ` [tip:sched/locking] sched: Clean up ttwu() stats tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 06/21] sched: Provide p->on_rq Peter Zijlstra
2011-04-14  8:33   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 07/21] sched: Serialize p->cpus_allowed and ttwu() using p->pi_lock Peter Zijlstra
2011-04-14  8:34   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 08/21] sched: Drop the rq argument to sched_class::select_task_rq() Peter Zijlstra
2011-04-14  8:34   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 09/21] sched: Remove rq argument to sched_class::task_waking() Peter Zijlstra
2011-04-14  8:35   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 10/21] sched: Deal with non-atomic min_vruntime reads on 32bits Peter Zijlstra
2011-04-14  8:35   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 11/21] sched: Delay task_contributes_to_load() Peter Zijlstra
2011-04-14  8:35   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 12/21] sched: Also serialize ttwu_local() with p->pi_lock Peter Zijlstra
2011-04-14  8:36   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 13/21] sched: Add p->pi_lock to task_rq_lock() Peter Zijlstra
2011-04-14  8:36   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-06-01 13:58     ` Arne Jansen
2011-06-01 16:35       ` Peter Zijlstra
2011-06-01 17:20         ` Arne Jansen
2011-06-01 18:09           ` Peter Zijlstra
2011-06-01 18:44             ` Peter Zijlstra
2011-06-01 19:30               ` Arne Jansen
2011-06-01 21:09                 ` Linus Torvalds
2011-06-03  9:15                   ` Peter Zijlstra
2011-06-03 10:02                     ` Arne Jansen
2011-06-03 10:30                       ` Peter Zijlstra
2011-06-03 11:52                         ` Arne Jansen
2011-06-05  8:17                         ` Ingo Molnar
2011-06-05  8:53                           ` Arne Jansen
2011-06-05  9:41                             ` Ingo Molnar
2011-06-05  9:45                               ` Ingo Molnar
2011-06-05  9:43                           ` Arne Jansen
2011-06-05  9:55                             ` Ingo Molnar
2011-06-05 10:22                               ` Arne Jansen
2011-06-05 11:01                                 ` Ingo Molnar
2011-06-05 11:19                                   ` [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages Ingo Molnar
2011-06-05 11:36                                     ` Ingo Molnar
2011-06-05 11:57                                       ` Arne Jansen
2011-06-05 13:39                                         ` Ingo Molnar
2011-06-05 13:54                                           ` Arne Jansen
2011-06-05 14:06                                             ` Ingo Molnar
2011-06-05 14:45                                               ` Arne Jansen
2011-06-05 14:10                                             ` Ingo Molnar
2011-06-05 14:31                                               ` Arne Jansen
2011-06-05 15:13                                                 ` Ingo Molnar
2011-06-05 15:26                                                   ` Ingo Molnar
2011-06-05 15:32                                                     ` Ingo Molnar
2011-06-05 16:07                                                       ` Arne Jansen
2011-06-05 16:35                                                         ` Arne Jansen
2011-06-05 16:50                                                           ` Arne Jansen
2011-06-05 17:20                                                             ` Ingo Molnar
2011-06-05 17:42                                                               ` Arne Jansen
2011-06-05 18:59                                                                 ` Ingo Molnar
2011-06-05 19:30                                                                   ` Arne Jansen
2011-06-05 19:44                                                                     ` Ingo Molnar
2011-06-05 20:15                                                                       ` Arne Jansen
2011-06-06  6:56                                                                         ` Arne Jansen
2011-06-06  9:01                                                                         ` Peter Zijlstra
2011-06-06  9:18                                                                           ` Arne Jansen
2011-06-06  9:24                                                                             ` Peter Zijlstra
2011-06-06  9:52                                                                               ` Peter Zijlstra
2011-06-06 10:00                                                                           ` Arne Jansen
2011-06-06 10:26                                                                             ` Peter Zijlstra
2011-06-06 13:25                                                                               ` Peter Zijlstra
2011-06-06 15:04                                                                           ` Ingo Molnar
2011-06-06 15:08                                                                             ` Ingo Molnar
2011-06-06 17:44                                                                               ` Mike Galbraith
2011-06-07  5:20                                                                           ` Mike Galbraith
2011-06-06 13:10                                                                   ` Ingo Molnar
2011-06-06 13:12                                                                     ` Peter Zijlstra
2011-06-06 13:21                                                                       ` Ingo Molnar
2011-06-06 13:31                                                                         ` Peter Zijlstra
2011-06-06  7:34                                                     ` Arne Jansen
2011-06-05 15:34                                                   ` Arne Jansen
2011-06-06  8:38                                                   ` Peter Zijlstra
2011-06-06 14:58                                                     ` Ingo Molnar
2011-06-06 15:09                                                       ` Peter Zijlstra
2011-06-06 15:47                                                         ` Peter Zijlstra
2011-06-06 15:52                                                           ` Ingo Molnar
2011-06-06 16:00                                                             ` Peter Zijlstra
2011-06-06 16:08                                                               ` Ingo Molnar
2011-06-06 16:12                                                                 ` Peter Zijlstra
2011-06-06 16:17                                                                   ` Ingo Molnar
2011-06-06 16:38                                                                     ` Arne Jansen
2011-06-06 16:45                                                                       ` Arne Jansen
2011-06-06 16:53                                                                         ` Peter Zijlstra
2011-06-06 17:07                                                                         ` Ingo Molnar
2011-06-06 17:11                                                                           ` Peter Zijlstra
2011-06-08 15:50                                                                             ` Peter Zijlstra
2011-06-08 19:17                                                                               ` Ingo Molnar
2011-06-08 19:27                                                                                 ` Linus Torvalds
2011-06-08 20:32                                                                                   ` Peter Zijlstra
2011-06-08 20:53                                                                                     ` Linus Torvalds
2011-06-08 20:54                                                                                     ` Thomas Gleixner
2011-06-08 19:45                                                                                 ` Peter Zijlstra
2011-06-08 20:52                                                                                   ` Ingo Molnar
2011-06-08 21:49                                                                                     ` Peter Zijlstra
2011-06-08 21:57                                                                                       ` Thomas Gleixner
2011-06-06 16:44                                                                     ` Peter Zijlstra
2011-06-06 16:50                                                                       ` Peter Zijlstra
2011-06-06 17:13                                                                         ` Ingo Molnar
2011-06-06 17:04                                                                       ` Peter Zijlstra
2011-06-06 17:11                                                                       ` Ingo Molnar
2011-06-06 17:57                                                                         ` Arne Jansen
2011-06-06 18:07                                                                           ` Ingo Molnar
2011-06-06 18:14                                                                             ` Arne Jansen
2011-06-06 18:19                                                                             ` Peter Zijlstra
2011-06-06 22:08                                                                               ` Ingo Molnar
2011-06-03 12:44                       ` [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock() Linus Torvalds
2011-06-03 13:05                         ` Arne Jansen
2011-06-04 21:29                           ` Linus Torvalds
2011-06-04 22:08                             ` Peter Zijlstra
2011-06-04 22:50                               ` Linus Torvalds
2011-06-05  6:01                               ` Arne Jansen
2011-06-05  7:57                                 ` Mike Galbraith
2011-04-05 15:23 ` [PATCH 14/21] sched: Drop rq->lock from first part of wake_up_new_task() Peter Zijlstra
2011-04-14  8:37   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 15/21] sched: Drop rq->lock from sched_exec() Peter Zijlstra
2011-04-14  8:37   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 16/21] sched: Remove rq->lock from the first half of ttwu() Peter Zijlstra
2011-04-14  8:38   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 17/21] sched: Remove rq argument from ttwu_stat() Peter Zijlstra
2011-04-14  8:38   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 18/21] sched: Rename ttwu_post_activation Peter Zijlstra
2011-04-14  8:39   ` [tip:sched/locking] sched: Rename ttwu_post_activation() to ttwu_do_wakeup() tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 19/21] sched: Restructure ttwu some more Peter Zijlstra
2011-04-14  8:39   ` [tip:sched/locking] sched: Restructure ttwu() " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 20/21] sched: Move the second half of ttwu() to the remote cpu Peter Zijlstra
2011-04-14  8:39   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:23 ` [PATCH 21/21] sched: Remove need_migrate_task() Peter Zijlstra
2011-04-14  8:40   ` [tip:sched/locking] " tip-bot for Peter Zijlstra
2011-04-05 15:59 ` [PATCH 00/21] sched: Reduce runqueue lock contention -v6 Peter Zijlstra
2011-04-06 11:00 ` Peter Zijlstra
2011-04-27 16:54 ` Dave Kleikamp