linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/5 v3] rcu: Fix some rcu uses in extended quiescent state
@ 2011-09-17 14:39 Frederic Weisbecker
  2011-09-17 14:39 ` [PATCH 1/5] nohz: Seperate out irq exit and idle loop dyntick logic Frederic Weisbecker
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2011-09-17 14:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Andy Henroid, Mike Frysinger, Guan Xuetao,
	David Miller, Chris Metcalf, Hans-Christian Egtvedt,
	Ralf Baechle, Russell King, Paul Mackerras, Heiko Carstens,
	Paul Mundt

This is an updated set that tries to fix some illegal uses
of RCU made in extended quiescent state, i.e. between
calls to rcu_enter_nohz() and rcu_exit_nohz().

Changes:

- Restructured the changes in the nohz core code. We need to handle
the case when the tick is stopped not from tick_nohz_idle_enter() but
from the irq exit path, when the irq happens in the idle loop. This
should fix the RCU stall Paul detected after applying my patches.
The first patch somewhat exceeds the simple scope of RCU fixes,
but I believe it makes the code slightly simpler and better optimized
in the end.

- Renamed tick_nohz_enter_idle() to tick_nohz_idle_enter() (same for
exit), to stay coherent with the usual naming (tip from Peterz on
the nohz cpuset patchset, which needs similar changes).

- Removed the rcu_ext_qs parameter from tick_nohz_idle_exit(). We only
need it in tick_nohz_idle_enter() because, as long as we call
rcu_enter_nohz() explicitly, we really want to pair it with an explicit
call to rcu_exit_nohz() (and vice versa), to avoid confusion.
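The pairing rule above can be sketched with a tiny stand-alone mock
(the names mirror the kernel API, but the bodies are illustrative
stand-ins, not the kernel implementation):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for rcu_enter_nohz()/rcu_exit_nohz():
 * an explicit enter must always be paired with an explicit exit. */
static bool in_ext_qs;

static void rcu_enter_nohz(void)
{
	assert(!in_ext_qs);	/* must not already be in extended qs */
	in_ext_qs = true;
}

static void rcu_exit_nohz(void)
{
	assert(in_ext_qs);	/* must pair with a previous enter */
	in_ext_qs = false;
}

/* An arch idle iteration that handles RCU by hand: the explicit
 * enter is matched by an explicit exit, never by an implicit one. */
static bool arch_idle_iteration(void)
{
	/* ... last use of RCU goes here ... */
	rcu_enter_nohz();	/* CPU about to sleep, RCU unusable */
	/* CPU sleeps, then wakes up on an interrupt */
	rcu_exit_nohz();	/* RCU usable again from here on */
	return !in_ext_qs;
}
```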

The powerpc fixes are still not included. I would like to first
get this set merged into Paul's tree before focusing on the
reported powerpc issues.

Frederic Weisbecker (5):
  nohz: Seperate out irq exit and idle loop dyntick logic
  nohz: Allow rcu extended state handling seperately from tick stop
  x86: Enter rcu extended qs after idle notifier call
  x86: Call idle notifier after irq_enter()
  rcu: Fix early call to rcu_irq_exit()

 arch/arm/kernel/process.c                |    4 +-
 arch/avr32/kernel/process.c              |    4 +-
 arch/blackfin/kernel/process.c           |    4 +-
 arch/microblaze/kernel/process.c         |    4 +-
 arch/mips/kernel/process.c               |    4 +-
 arch/powerpc/kernel/idle.c               |    4 +-
 arch/powerpc/platforms/iseries/setup.c   |    8 +-
 arch/s390/kernel/process.c               |    4 +-
 arch/sh/kernel/idle.c                    |    4 +-
 arch/sparc/kernel/process_64.c           |    4 +-
 arch/tile/kernel/process.c               |    4 +-
 arch/um/kernel/process.c                 |    4 +-
 arch/unicore32/kernel/process.c          |    4 +-
 arch/x86/kernel/apic/apic.c              |    6 +-
 arch/x86/kernel/apic/io_apic.c           |    2 +-
 arch/x86/kernel/cpu/mcheck/mce.c         |    2 +-
 arch/x86/kernel/cpu/mcheck/therm_throt.c |    2 +-
 arch/x86/kernel/cpu/mcheck/threshold.c   |    2 +-
 arch/x86/kernel/irq.c                    |    6 +-
 arch/x86/kernel/process_32.c             |    4 +-
 arch/x86/kernel/process_64.c             |    9 ++-
 include/linux/tick.h                     |   12 ++-
 kernel/softirq.c                         |    4 +-
 kernel/time/tick-sched.c                 |  110 ++++++++++++++++++++----------
 24 files changed, 131 insertions(+), 84 deletions(-)

-- 
1.7.5.4



* [PATCH 1/5] nohz: Seperate out irq exit and idle loop dyntick logic
  2011-09-17 14:39 [PATCH 0/5 v3] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
@ 2011-09-17 14:39 ` Frederic Weisbecker
  2011-09-17 14:39 ` [PATCH 2/5] nohz: Allow rcu extended state handling seperately from tick stop Frederic Weisbecker
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2011-09-17 14:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Mike Frysinger, Guan Xuetao,
	David Miller, Chris Metcalf, Hans-Christian Egtvedt,
	Ralf Baechle, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	H. Peter Anvin, Russell King, Paul Mackerras, Heiko Carstens,
	Paul Mundt

The tick_nohz_stop_sched_tick() function, which tries to delay
the next timer tick as long as possible, can be called from two
places:

- From the idle loop, to start the dyntick idle mode.
- From the irq exit path, if the irq interrupted the dyntick
idle mode, so that we reprogram the next tick event in
case the irq changed some internal state that requires this
action.

There are only a few minor differences between the two cases,
handled inside that function and driven by the per-cpu ts->inidle
variable and the inidle parameter. Together these guarantee
that we only update the dyntick mode on irq exit if we actually
interrupted the dyntick idle mode.

Split this function into:

- tick_nohz_idle_enter(), which sets ts->inidle to 1 and enters
dynticks idle mode unconditionally if it can.

- tick_nohz_irq_exit(), which only updates the dynticks idle mode
when ts->inidle is set (i.e. if tick_nohz_idle_enter() has been called).

To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
to tick_nohz_idle_exit().

The point of all this is to micro-optimize the irq exit path (no need
for local_irq_save() there) and to prepare for the split between the
dyntick and RCU extended quiescent state logic. We'll need this split
to further fix illegal uses of RCU in extended quiescent states.
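As a rough model of the split described above (a stand-alone sketch,
not the kernel code), the ts->inidle flag is what gates the irq exit
path:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the new entry points: idle_enter stops the tick
 * unconditionally, irq_exit only re-evaluates it when the interrupt
 * landed in the idle loop (ts->inidle set). */
struct tick_sched {
	bool inidle;
	int stop_calls;	/* counts tick (re)programming, for illustration */
};

static void stop_sched_tick(struct tick_sched *ts)
{
	ts->stop_calls++;	/* stand-in for the real tick-stop logic */
}

static void tick_nohz_idle_enter(struct tick_sched *ts)
{
	ts->inidle = true;	/* mark that we entered the idle loop */
	stop_sched_tick(ts);
}

static void tick_nohz_irq_exit(struct tick_sched *ts)
{
	if (!ts->inidle)
		return;		/* irq did not interrupt the idle loop */
	stop_sched_tick(ts);	/* reprogram the next tick event */
}

static void tick_nohz_idle_exit(struct tick_sched *ts)
{
	ts->inidle = false;	/* tick restarted, leave idle mode */
}
```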

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
---
 arch/arm/kernel/process.c              |    4 +-
 arch/avr32/kernel/process.c            |    4 +-
 arch/blackfin/kernel/process.c         |    4 +-
 arch/microblaze/kernel/process.c       |    4 +-
 arch/mips/kernel/process.c             |    4 +-
 arch/powerpc/kernel/idle.c             |    4 +-
 arch/powerpc/platforms/iseries/setup.c |    8 ++--
 arch/s390/kernel/process.c             |    4 +-
 arch/sh/kernel/idle.c                  |    4 +-
 arch/sparc/kernel/process_64.c         |    4 +-
 arch/tile/kernel/process.c             |    4 +-
 arch/um/kernel/process.c               |    4 +-
 arch/unicore32/kernel/process.c        |    4 +-
 arch/x86/kernel/process_32.c           |    4 +-
 arch/x86/kernel/process_64.c           |    4 +-
 include/linux/tick.h                   |    9 ++--
 kernel/softirq.c                       |    2 +-
 kernel/time/tick-sched.c               |   85 +++++++++++++++++++-------------
 18 files changed, 89 insertions(+), 71 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 5e1e541..f570e8f 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -182,7 +182,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		leds_event(led_idle_start);
 		while (!need_resched()) {
 #ifdef CONFIG_HOTPLUG_CPU
@@ -208,7 +208,7 @@ void cpu_idle(void)
 			}
 		}
 		leds_event(led_idle_end);
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index ef5a2a0..6ee7952 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -34,10 +34,10 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched())
 			cpu_idle_sleep();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 6a660fa..790b12e 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -88,10 +88,10 @@ void cpu_idle(void)
 #endif
 		if (!idle)
 			idle = default_idle;
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched())
 			idle();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 968648a..80c749b 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -103,10 +103,10 @@ void cpu_idle(void)
 		if (!idle)
 			idle = default_idle;
 
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched())
 			idle();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 
 		preempt_enable_no_resched();
 		schedule();
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index d2112d3..d72a0e9 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -56,7 +56,7 @@ void __noreturn cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched() && cpu_online(cpu)) {
 #ifdef CONFIG_MIPS_MT_SMTC
 			extern void smtc_idle_loop_hook(void);
@@ -77,7 +77,7 @@ void __noreturn cpu_idle(void)
 		     system_state == SYSTEM_BOOTING))
 			play_dead();
 #endif
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 39a2baa..878572f 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -56,7 +56,7 @@ void cpu_idle(void)
 
 	set_thread_flag(TIF_POLLING_NRFLAG);
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched() && !cpu_should_die()) {
 			ppc64_runlatch_off();
 
@@ -93,7 +93,7 @@ void cpu_idle(void)
 
 		HMT_medium();
 		ppc64_runlatch_on();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		if (cpu_should_die())
 			cpu_die();
diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
index c25a081..e2f5fad 100644
--- a/arch/powerpc/platforms/iseries/setup.c
+++ b/arch/powerpc/platforms/iseries/setup.c
@@ -562,7 +562,7 @@ static void yield_shared_processor(void)
 static void iseries_shared_idle(void)
 {
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched() && !hvlpevent_is_pending()) {
 			local_irq_disable();
 			ppc64_runlatch_off();
@@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 
 		if (hvlpevent_is_pending())
 			process_iSeries_events();
@@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		if (!need_resched()) {
 			while (!need_resched()) {
 				ppc64_runlatch_off();
@@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 541a750..db3e930 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -90,10 +90,10 @@ static void default_idle(void)
 void cpu_idle(void)
 {
 	for (;;) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched())
 			default_idle();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 425d604..a6b9a96 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -88,7 +88,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 
 		while (!need_resched()) {
 			check_pgt_cache();
@@ -109,7 +109,7 @@ void cpu_idle(void)
 			start_critical_timings();
 		}
 
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index c158a95..1235f63 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -95,12 +95,12 @@ void cpu_idle(void)
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while(1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 
 		while (!need_resched() && !cpu_is_offline(cpu))
 			sparc64_yield(cpu);
 
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 
 		preempt_enable_no_resched();
 
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 9c45d8b..920e674 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -85,7 +85,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched()) {
 			if (cpu_is_offline(cpu))
 				BUG();  /* no HOTPLUG_CPU */
@@ -105,7 +105,7 @@ void cpu_idle(void)
 				local_irq_enable();
 			current_thread_info()->status |= TS_POLLING;
 		}
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index fab4371..046abea 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -245,10 +245,10 @@ void default_idle(void)
 		if (need_resched())
 			schedule();
 
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		nsecs = disable_timer();
 		idle_sleep(nsecs);
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 	}
 }
 
diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index ba401df..9999b9a 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -55,7 +55,7 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched()) {
 			local_irq_disable();
 			stop_critical_timings();
@@ -63,7 +63,7 @@ void cpu_idle(void)
 			local_irq_enable();
 			start_critical_timings();
 		}
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index a3d0dc5..1ab4c58 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -97,7 +97,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched()) {
 
 			check_pgt_cache();
@@ -112,7 +112,7 @@ void cpu_idle(void)
 			pm_idle();
 			start_critical_timings();
 		}
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ca6f7ab..d7a6418 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -120,7 +120,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched()) {
 
 			rmb();
@@ -145,7 +145,7 @@ void cpu_idle(void)
 			__exit_idle();
 		}
 
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/include/linux/tick.h b/include/linux/tick.h
index b232ccc..3f094d4 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -121,14 +121,15 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
 #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
 
 # ifdef CONFIG_NO_HZ
-extern void tick_nohz_stop_sched_tick(int inidle);
-extern void tick_nohz_restart_sched_tick(void);
+extern void tick_nohz_idle_enter(void);
+extern void tick_nohz_idle_exit(void);
+extern void tick_nohz_irq_exit(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 # else
-static inline void tick_nohz_stop_sched_tick(int inidle) { }
-static inline void tick_nohz_restart_sched_tick(void) { }
+static inline void tick_nohz_idle_enter(void) { }
+static inline void tick_nohz_idle_exit(void) { }
 static inline ktime_t tick_nohz_get_sleep_length(void)
 {
 	ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
diff --git a/kernel/softirq.c b/kernel/softirq.c
index fca82c3..d2be0e0 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -351,7 +351,7 @@ void irq_exit(void)
 #ifdef CONFIG_NO_HZ
 	/* Make sure that timer wheel updates are propagated */
 	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
-		tick_nohz_stop_sched_tick(0);
+		tick_nohz_irq_exit();
 #endif
 	preempt_enable_no_resched();
 }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index eb98e55..c783d95 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -246,42 +246,18 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 }
 EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
 
-/**
- * tick_nohz_stop_sched_tick - stop the idle tick from the idle task
- *
- * When the next event is more than a tick into the future, stop the idle tick
- * Called either from the idle loop or from irq_exit() when an idle period was
- * just interrupted by an interrupt which did not cause a reschedule.
- */
-void tick_nohz_stop_sched_tick(int inidle)
+
+static void tick_nohz_stop_sched_tick(struct tick_sched *ts)
 {
-	unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
-	struct tick_sched *ts;
+	unsigned long seq, last_jiffies, next_jiffies, delta_jiffies;
 	ktime_t last_update, expires, now;
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
 	u64 time_delta;
 	int cpu;
 
-	local_irq_save(flags);
-
 	cpu = smp_processor_id();
 	ts = &per_cpu(tick_cpu_sched, cpu);
 
-	/*
-	 * Call to tick_nohz_start_idle stops the last_update_time from being
-	 * updated. Thus, it must not be called in the event we are called from
-	 * irq_exit() with the prior state different than idle.
-	 */
-	if (!inidle && !ts->inidle)
-		goto end;
-
-	/*
-	 * Set ts->inidle unconditionally. Even if the system did not
-	 * switch to NOHZ mode the cpu frequency governers rely on the
-	 * update of the idle time accounting in tick_nohz_start_idle().
-	 */
-	ts->inidle = 1;
-
 	now = tick_nohz_start_idle(cpu, ts);
 
 	/*
@@ -297,10 +273,10 @@ void tick_nohz_stop_sched_tick(int inidle)
 	}
 
 	if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
-		goto end;
+		return;
 
 	if (need_resched())
-		goto end;
+		return;
 
 	if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
 		static int ratelimit;
@@ -310,7 +286,7 @@ void tick_nohz_stop_sched_tick(int inidle)
 			       (unsigned int) local_softirq_pending());
 			ratelimit++;
 		}
-		goto end;
+		return;
 	}
 
 	ts->idle_calls++;
@@ -443,8 +419,49 @@ out:
 	ts->next_jiffies = next_jiffies;
 	ts->last_jiffies = last_jiffies;
 	ts->sleep_length = ktime_sub(dev->next_event, now);
-end:
-	local_irq_restore(flags);
+}
+
+/**
+ * tick_nohz_idle_enter - stop the idle tick from the idle task
+ *
+ * When the next event is more than a tick into the future, stop the idle tick
+ * Called when we start the idle loop.
+ */
+void tick_nohz_idle_enter(void)
+{
+	struct tick_sched *ts;
+
+	WARN_ON_ONCE(irqs_disabled());
+
+	local_irq_disable();
+
+	ts = &__get_cpu_var(tick_cpu_sched);
+	/*
+	 * set ts->inidle unconditionally. even if the system did not
+	 * switch to nohz mode the cpu frequency governers rely on the
+	 * update of the idle time accounting in tick_nohz_start_idle().
+	 */
+	ts->inidle = 1;
+	tick_nohz_stop_sched_tick(ts);
+
+	local_irq_enable();
+}
+
+/**
+ * tick_nohz_irq_exit - update next tick event from interrupt exit
+ *
+ * When an interrupt fires while we are idle, it may add,
+ * modify or delete a timer, enqueue an RCU callback, etc...
+ * So we need to re-calculate and reprogram the next tick event.
+ */
+void tick_nohz_irq_exit(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	if (!ts->inidle)
+		return;
+
+	tick_nohz_stop_sched_tick(ts);
 }
 
 /**
@@ -486,11 +503,11 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 }
 
 /**
- * tick_nohz_restart_sched_tick - restart the idle tick from the idle task
+ * tick_nohz_idle_exit - restart the idle tick from the idle task
  *
  * Restart the idle tick when the CPU is woken up from idle
  */
-void tick_nohz_restart_sched_tick(void)
+void tick_nohz_idle_exit(void)
 {
 	int cpu = smp_processor_id();
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-- 
1.7.5.4



* [PATCH 2/5] nohz: Allow rcu extended state handling seperately from tick stop
  2011-09-17 14:39 [PATCH 0/5 v3] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
  2011-09-17 14:39 ` [PATCH 1/5] nohz: Seperate out irq exit and idle loop dyntick logic Frederic Weisbecker
@ 2011-09-17 14:39 ` Frederic Weisbecker
  2011-09-17 14:40 ` [PATCH 3/5] x86: Enter rcu extended qs after idle notifier call Frederic Weisbecker
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2011-09-17 14:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Mike Frysinger, Guan Xuetao,
	David Miller, Chris Metcalf, Hans-Christian Egtvedt,
	Ralf Baechle, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	H. Peter Anvin, Russell King, Paul Mackerras, Heiko Carstens,
	Paul Mundt

It is assumed that RCU won't be used once we switch to tickless
mode and until we restart the tick. However this is not always
true, as on x86-64, where we dereference the idle notifiers after
the tick is stopped.

To prepare for fixing this, add a parameter to tick_nohz_idle_enter()
named "rcu_ext_qs" that tells whether we want to enter the RCU extended
quiescent state at the same time we stop the tick.

If no use of RCU is made in the idle loop between the
tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, the parameter
must be set to true and the arch doesn't need to call rcu_enter_nohz()
and rcu_exit_nohz() explicitly.

Otherwise the parameter must be set to false and the arch is
responsible for calling:

- rcu_enter_nohz() after its last use of RCU, before the CPU is put
to sleep.
- rcu_exit_nohz() before its first use of RCU, after the CPU is woken
up.
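A stand-alone sketch of the parameter's semantics (kernel-like names,
but simplified flags rather than the real implementation):

```c
#include <assert.h>
#include <stdbool.h>

static bool in_rcu_ext_qs;

static void rcu_enter_nohz(void)
{
	in_rcu_ext_qs = true;	/* stand-in for the real RCU transition */
}

struct tick_sched {
	bool rcu_ext_qs;
	bool tick_stopped;
};

/* When rcu_ext_qs is true, stopping the tick also enters the RCU
 * extended quiescent state; when false, the arch must call
 * rcu_enter_nohz() itself, after its last use of RCU. */
static void tick_nohz_idle_enter(struct tick_sched *ts, bool rcu_ext_qs)
{
	ts->rcu_ext_qs = rcu_ext_qs;
	ts->tick_stopped = true;
	if (ts->rcu_ext_qs)
		rcu_enter_nohz();
}
```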

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
---
 arch/arm/kernel/process.c              |    2 +-
 arch/avr32/kernel/process.c            |    2 +-
 arch/blackfin/kernel/process.c         |    2 +-
 arch/microblaze/kernel/process.c       |    2 +-
 arch/mips/kernel/process.c             |    2 +-
 arch/powerpc/kernel/idle.c             |    2 +-
 arch/powerpc/platforms/iseries/setup.c |    4 ++--
 arch/s390/kernel/process.c             |    2 +-
 arch/sh/kernel/idle.c                  |    2 +-
 arch/sparc/kernel/process_64.c         |    2 +-
 arch/tile/kernel/process.c             |    2 +-
 arch/um/kernel/process.c               |    2 +-
 arch/unicore32/kernel/process.c        |    2 +-
 arch/x86/kernel/process_32.c           |    2 +-
 arch/x86/kernel/process_64.c           |    2 +-
 include/linux/tick.h                   |    7 +++++--
 kernel/time/tick-sched.c               |   29 +++++++++++++++++++++++++----
 17 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index f570e8f..51b0e39 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -182,7 +182,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		leds_event(led_idle_start);
 		while (!need_resched()) {
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index 6ee7952..5041c84 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -34,7 +34,7 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		while (!need_resched())
 			cpu_idle_sleep();
 		tick_nohz_idle_exit();
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 790b12e..f22a0da 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -88,7 +88,7 @@ void cpu_idle(void)
 #endif
 		if (!idle)
 			idle = default_idle;
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		while (!need_resched())
 			idle();
 		tick_nohz_idle_exit();
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 80c749b..0f5290f 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -103,7 +103,7 @@ void cpu_idle(void)
 		if (!idle)
 			idle = default_idle;
 
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		while (!need_resched())
 			idle();
 		tick_nohz_idle_exit();
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index d72a0e9..20be814 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -56,7 +56,7 @@ void __noreturn cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		while (!need_resched() && cpu_online(cpu)) {
 #ifdef CONFIG_MIPS_MT_SMTC
 			extern void smtc_idle_loop_hook(void);
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 878572f..a0e31a7 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -56,7 +56,7 @@ void cpu_idle(void)
 
 	set_thread_flag(TIF_POLLING_NRFLAG);
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		while (!need_resched() && !cpu_should_die()) {
 			ppc64_runlatch_off();
 
diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
index e2f5fad..f239427 100644
--- a/arch/powerpc/platforms/iseries/setup.c
+++ b/arch/powerpc/platforms/iseries/setup.c
@@ -562,7 +562,7 @@ static void yield_shared_processor(void)
 static void iseries_shared_idle(void)
 {
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		while (!need_resched() && !hvlpevent_is_pending()) {
 			local_irq_disable();
 			ppc64_runlatch_off();
@@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		if (!need_resched()) {
 			while (!need_resched()) {
 				ppc64_runlatch_off();
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index db3e930..3dbaf59 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -90,7 +90,7 @@ static void default_idle(void)
 void cpu_idle(void)
 {
 	for (;;) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		while (!need_resched())
 			default_idle();
 		tick_nohz_idle_exit();
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index a6b9a96..bb0a627 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -88,7 +88,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 
 		while (!need_resched()) {
 			check_pgt_cache();
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index 1235f63..3c5d363 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -95,7 +95,7 @@ void cpu_idle(void)
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while(1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 
 		while (!need_resched() && !cpu_is_offline(cpu))
 			sparc64_yield(cpu);
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 920e674..727dc85 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -85,7 +85,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		while (!need_resched()) {
 			if (cpu_is_offline(cpu))
 				BUG();  /* no HOTPLUG_CPU */
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 046abea..5693d6d 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -245,7 +245,7 @@ void default_idle(void)
 		if (need_resched())
 			schedule();
 
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		nsecs = disable_timer();
 		idle_sleep(nsecs);
 		tick_nohz_idle_exit();
diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index 9999b9a..afa50d9 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -55,7 +55,7 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		while (!need_resched()) {
 			local_irq_disable();
 			stop_critical_timings();
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 1ab4c58..8c2faa9 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -97,7 +97,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		while (!need_resched()) {
 
 			check_pgt_cache();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index d7a6418..19ca231 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -120,7 +120,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter(true);
 		while (!need_resched()) {
 
 			rmb();
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 3f094d4..375e7d8 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -45,6 +45,8 @@ enum tick_nohz_mode {
  * @iowait_sleeptime:	Sum of the time slept in idle with sched tick stopped, with IO outstanding
  * @sleep_length:	Duration of the current idle sleep
  * @do_timer_lst:	CPU was the last one doing do_timer before going idle
+ * @rcu_ext_qs:		Set if we want to enter RCU extended quiescent state
+ *			when the tick gets stopped.
  */
 struct tick_sched {
 	struct hrtimer			sched_timer;
@@ -67,6 +69,7 @@ struct tick_sched {
 	unsigned long			next_jiffies;
 	ktime_t				idle_expires;
 	int				do_timer_last;
+	int				rcu_ext_qs;
 };
 
 extern void __init tick_init(void);
@@ -121,14 +124,14 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
 #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
 
 # ifdef CONFIG_NO_HZ
-extern void tick_nohz_idle_enter(void);
+extern void tick_nohz_idle_enter(bool rcu_ext_qs);
 extern void tick_nohz_idle_exit(void);
 extern void tick_nohz_irq_exit(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 # else
-static inline void tick_nohz_idle_enter(void) { }
+static inline void tick_nohz_idle_enter(bool rcu_ext_qs) { }
 static inline void tick_nohz_idle_exit(void) { }
 static inline ktime_t tick_nohz_get_sleep_length(void)
 {
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c783d95..88c57a9 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -381,7 +381,8 @@ static void tick_nohz_stop_sched_tick(struct tick_sched *ts)
 			ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
 			ts->tick_stopped = 1;
 			ts->idle_jiffies = last_jiffies;
-			rcu_enter_nohz();
+			if (ts->rcu_ext_qs)
+				rcu_enter_nohz();
 		}
 
 		ts->idle_sleeps++;
@@ -423,11 +424,24 @@ out:
 
 /**
  * tick_nohz_idle_enter - stop the idle tick from the idle task
+ * @rcu_ext_qs: enter into rcu extended quiescent state
  *
- * When the next event is more than a tick into the future, stop the idle tick
+ * When the next event is more than a tick into the future, stop the idle tick.
  * Called when we start the idle loop.
+ *
+ * If no use of RCU is made in the idle loop between
+ * tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, rcu_ext_qs
+ * must be set to true and the arch doesn't need to call rcu_enter_nohz()
+ * and rcu_exit_nohz() explicitly.
+ *
+ * Otherwise the parameter must be set to false and the arch is
+ * responsible for calling:
+ *
+ * - rcu_enter_nohz() after its last use of RCU before the CPU is put
+ *   to sleep.
+ * - rcu_exit_nohz() before the first use of RCU after the CPU is woken up.
  */
-void tick_nohz_idle_enter(void)
+void tick_nohz_idle_enter(bool rcu_ext_qs)
 {
 	struct tick_sched *ts;
 
@@ -442,6 +456,10 @@ void tick_nohz_idle_enter(void)
 	 * update of the idle time accounting in tick_nohz_start_idle().
 	 */
 	ts->inidle = 1;
+
+	if (rcu_ext_qs)
+		ts->rcu_ext_qs = 1;
+
 	tick_nohz_stop_sched_tick(ts);
 
 	local_irq_enable();
@@ -531,7 +549,10 @@ void tick_nohz_idle_exit(void)
 
 	ts->inidle = 0;
 
-	rcu_exit_nohz();
+	if (ts->rcu_ext_qs) {
+		rcu_exit_nohz();
+		ts->rcu_ext_qs = 0;
+	}
 
 	/* Update jiffies first */
 	select_nohz_load_balancer(0);
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/5] x86: Enter rcu extended qs after idle notifier call
  2011-09-17 14:39 [PATCH 0/5 v3] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
  2011-09-17 14:39 ` [PATCH 1/5] nohz: Seperate out irq exit and idle loop dyntick logic Frederic Weisbecker
  2011-09-17 14:39 ` [PATCH 2/5] nohz: Allow rcu extended state handling seperately from tick stop Frederic Weisbecker
@ 2011-09-17 14:40 ` Frederic Weisbecker
  2011-09-17 14:40 ` [PATCH 4/5] x86: Call idle notifier after irq_enter() Frederic Weisbecker
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2011-09-17 14:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, H. Peter Anvin

The idle notifier, called by enter_idle(), enters an rcu read-side
critical section, but by that time we have already switched into
rcu dynticks idle mode, where using rcu_read_lock() is illegal.

This results in rcu reporting its bad mood:

[    1.275635] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[    1.275635] Hardware name: AMD690VM-FMH
[    1.275635] Modules linked in:
[    1.275635] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #252
[    1.275635] Call Trace:
[    1.275635]  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[    1.275635]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[    1.275635]  [<ffffffff817d6f22>] __atomic_notifier_call_chain+0xd2/0x110
[    1.275635]  [<ffffffff817d6f71>] atomic_notifier_call_chain+0x11/0x20
[    1.275635]  [<ffffffff810018a0>] enter_idle+0x20/0x30
[    1.275635]  [<ffffffff81001995>] cpu_idle+0xa5/0x110
[    1.275635]  [<ffffffff817a7465>] rest_init+0xe5/0x140
[    1.275635]  [<ffffffff817a73c8>] ? rest_init+0x48/0x140
[    1.275635]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[    1.275635]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[    1.275635]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
[    1.275635] ---[ end trace a22d306b065d4a66 ]---

Fix this by entering rcu extended quiescent state later, just before
the CPU goes to sleep.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/kernel/process_64.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 19ca231..dee2e6c 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -120,7 +120,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter(true);
+		tick_nohz_idle_enter(false);
 		while (!need_resched()) {
 
 			rmb();
@@ -136,7 +136,12 @@ void cpu_idle(void)
 			enter_idle();
 			/* Don't trace irqs off for idle */
 			stop_critical_timings();
+
+			/* enter_idle() needs rcu for notifiers */
+			rcu_enter_nohz();
 			pm_idle();
+			rcu_exit_nohz();
+
 			start_critical_timings();
 
 			/* In many cases the interrupt that ended idle
-- 
1.7.5.4



* [PATCH 4/5] x86: Call idle notifier after irq_enter()
  2011-09-17 14:39 [PATCH 0/5 v3] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2011-09-17 14:40 ` [PATCH 3/5] x86: Enter rcu extended qs after idle notifier call Frederic Weisbecker
@ 2011-09-17 14:40 ` Frederic Weisbecker
  2011-09-17 14:40 ` [PATCH 5/5] rcu: Fix early call to rcu_irq_exit() Frederic Weisbecker
  2011-09-17 19:05 ` [PATCH 0/5 v3] rcu: Fix some rcu uses in extended quiescent state Paul E. McKenney
  5 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2011-09-17 14:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Henroid

Interrupts notify the idle exit state before calling irq_enter().
But the notifier code calls rcu_read_lock(), which is not allowed
while rcu is in an extended quiescent state. We need to wait until
rcu_irq_enter() has been called before running the notifier,
otherwise this results in a grumpy RCU:

[    0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[    0.099991] Hardware name: AMD690VM-FMH
[    0.099991] Modules linked in:
[    0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255
[    0.099991] Call Trace:
[    0.099991]  <IRQ>  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[    0.099991]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[    0.099991]  [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110
[    0.099991]  [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20
[    0.099991]  [<ffffffff81001873>] exit_idle+0x43/0x50
[    0.099991]  [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0
[    0.099991]  [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20
[    0.099991]  <EOI>  [<ffffffff8100ae67>] ? default_idle+0xa7/0x350
[    0.099991]  [<ffffffff8100ae65>] ? default_idle+0xa5/0x350
[    0.099991]  [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110
[    0.099991]  [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160
[    0.099991]  [<ffffffff810019a0>] cpu_idle+0xb0/0x110
[    0.099991]  [<ffffffff817a7505>] rest_init+0xe5/0x140
[    0.099991]  [<ffffffff817a7468>] ? rest_init+0x48/0x140
[    0.099991]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[    0.099991]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[    0.099991]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Henroid <andrew.d.henroid@intel.com>
---
 arch/x86/kernel/apic/apic.c              |    6 +++---
 arch/x86/kernel/apic/io_apic.c           |    2 +-
 arch/x86/kernel/cpu/mcheck/mce.c         |    2 +-
 arch/x86/kernel/cpu/mcheck/therm_throt.c |    2 +-
 arch/x86/kernel/cpu/mcheck/threshold.c   |    2 +-
 arch/x86/kernel/irq.c                    |    6 +++---
 6 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b9338b8..82af921 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -856,8 +856,8 @@ void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
 	 * Besides, if we don't timer interrupts ignore the global
 	 * interrupt lock, which is the WrongThing (tm) to do.
 	 */
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	local_apic_timer_interrupt();
 	irq_exit();
 
@@ -1790,8 +1790,8 @@ void smp_spurious_interrupt(struct pt_regs *regs)
 {
 	u32 v;
 
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	/*
 	 * Check if this really is a spurious interrupt and ACK it
 	 * if it is a vectored one.  Just in case...
@@ -1827,8 +1827,8 @@ void smp_error_interrupt(struct pt_regs *regs)
 		"Illegal register address",	/* APIC Error Bit 7 */
 	};
 
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	/* First tickle the hardware, only then report what went on. -- REW */
 	v0 = apic_read(APIC_ESR);
 	apic_write(APIC_ESR, 0);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index e529339..e1b5eec 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2275,8 +2275,8 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
 	unsigned vector, me;
 
 	ack_APIC_irq();
-	exit_idle();
 	irq_enter();
+	exit_idle();
 
 	me = smp_processor_id();
 	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index ff1ae9b..7ba9757 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -470,8 +470,8 @@ static inline void mce_get_rip(struct mce *m, struct pt_regs *regs)
 asmlinkage void smp_mce_self_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	mce_notify_irq();
 	mce_schedule_work();
 	irq_exit();
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 27c6251..f6bbc64 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -396,8 +396,8 @@ static void (*smp_thermal_vector)(void) = unexpected_thermal_interrupt;
 
 asmlinkage void smp_thermal_interrupt(struct pt_regs *regs)
 {
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	inc_irq_stat(irq_thermal_count);
 	smp_thermal_vector();
 	irq_exit();
diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c
index d746df2..aa578ca 100644
--- a/arch/x86/kernel/cpu/mcheck/threshold.c
+++ b/arch/x86/kernel/cpu/mcheck/threshold.c
@@ -19,8 +19,8 @@ void (*mce_threshold_vector)(void) = default_threshold_interrupt;
 
 asmlinkage void smp_threshold_interrupt(void)
 {
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	inc_irq_stat(irq_threshold_count);
 	mce_threshold_vector();
 	irq_exit();
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 6c0802e..73cf928 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -180,8 +180,8 @@ unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
 	unsigned vector = ~regs->orig_ax;
 	unsigned irq;
 
-	exit_idle();
 	irq_enter();
+	exit_idle();
 
 	irq = __this_cpu_read(vector_irq[vector]);
 
@@ -208,10 +208,10 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
 
 	ack_APIC_irq();
 
-	exit_idle();
-
 	irq_enter();
 
+	exit_idle();
+
 	inc_irq_stat(x86_platform_ipis);
 
 	if (x86_platform_ipi_callback)
-- 
1.7.5.4



* [PATCH 5/5] rcu: Fix early call to rcu_irq_exit()
  2011-09-17 14:39 [PATCH 0/5 v3] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2011-09-17 14:40 ` [PATCH 4/5] x86: Call idle notifier after irq_enter() Frederic Weisbecker
@ 2011-09-17 14:40 ` Frederic Weisbecker
  2011-09-17 19:05 ` [PATCH 0/5 v3] rcu: Fix some rcu uses in extended quiescent state Paul E. McKenney
  5 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2011-09-17 14:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Peter Zijlstra

On the irq exit path, tick_nohz_irq_exit() may raise a softirq,
which leads to the wakeup path and then to select_task_rq_fair(),
which uses rcu to iterate over the sched domains.

This is an illegal use of RCU because we may already be in dyntick
idle mode:

[  132.978883] ===============================
[  132.978883] [ INFO: suspicious RCU usage. ]
[  132.978883] -------------------------------
[  132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
[  132.978883]
[  132.978883] other info that might help us debug this:
[  132.978883]
[  132.978883]
[  132.978883] rcu_scheduler_active = 1, debug_locks = 0
[  132.978883] RCU used illegally from extended quiescent state!
[  132.978883] 2 locks held by swapper/0:
[  132.978883]  #0:  (&p->pi_lock){-.-.-.}, at: [<ffffffff8105a729>] try_to_wake_up+0x39/0x2f0
[  132.978883]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8105556a>] select_task_rq_fair+0x6a/0xec0
[  132.978883]
[  132.978883] stack backtrace:
[  132.978883] Pid: 0, comm: swapper Tainted: G        W   3.0.0+ #178
[  132.978883] Call Trace:
[  132.978883]  <IRQ>  [<ffffffff810a01f6>] lockdep_rcu_suspicious+0xe6/0x100
[  132.978883]  [<ffffffff81055c49>] select_task_rq_fair+0x749/0xec0
[  132.978883]  [<ffffffff8105556a>] ? select_task_rq_fair+0x6a/0xec0
[  132.978883]  [<ffffffff812fe494>] ? do_raw_spin_lock+0x54/0x150
[  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[  132.978883]  [<ffffffff8105a7c3>] try_to_wake_up+0xd3/0x2f0
[  132.978883]  [<ffffffff81094f98>] ? ktime_get+0x68/0xf0
[  132.978883]  [<ffffffff8105aa35>] wake_up_process+0x15/0x20
[  132.978883]  [<ffffffff81069dd5>] raise_softirq_irqoff+0x65/0x110
[  132.978883]  [<ffffffff8108eb65>] __hrtimer_start_range_ns+0x415/0x5a0
[  132.978883]  [<ffffffff812fe3ee>] ? do_raw_spin_unlock+0x5e/0xb0
[  132.978883]  [<ffffffff8108ed08>] hrtimer_start+0x18/0x20
[  132.978883]  [<ffffffff8109c9c3>] tick_nohz_stop_sched_tick+0x393/0x450
[  132.978883]  [<ffffffff810694f2>] irq_exit+0xd2/0x100
[  132.978883]  [<ffffffff81829e96>] do_IRQ+0x66/0xe0
[  132.978883]  [<ffffffff81820d53>] common_interrupt+0x13/0x13
[  132.978883]  <EOI>  [<ffffffff8103434b>] ? native_safe_halt+0xb/0x10
[  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[  132.978883]  [<ffffffff810144ea>] default_idle+0xba/0x370
[  132.978883]  [<ffffffff810147fe>] amd_e400_idle+0x5e/0x130
[  132.978883]  [<ffffffff8100a9f6>] cpu_idle+0xb6/0x120
[  132.978883]  [<ffffffff817f217f>] rest_init+0xef/0x150
[  132.978883]  [<ffffffff817f20e2>] ? rest_init+0x52/0x150
[  132.978883]  [<ffffffff81ed9cf3>] start_kernel+0x3da/0x3e5
[  132.978883]  [<ffffffff81ed9346>] x86_64_start_reservations+0x131/0x135
[  132.978883]  [<ffffffff81ed944d>] x86_64_start_kernel+0x103/0x112

Fix this by calling rcu_irq_exit() after tick_nohz_irq_exit().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/softirq.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index d2be0e0..328aabb 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -347,12 +347,12 @@ void irq_exit(void)
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
 
-	rcu_irq_exit();
 #ifdef CONFIG_NO_HZ
 	/* Make sure that timer wheel updates are propagated */
 	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
 		tick_nohz_irq_exit();
 #endif
+	rcu_irq_exit();
 	preempt_enable_no_resched();
 }
 
-- 
1.7.5.4



* Re: [PATCH 0/5 v3] rcu: Fix some rcu uses in extended quiescent state
  2011-09-17 14:39 [PATCH 0/5 v3] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2011-09-17 14:40 ` [PATCH 5/5] rcu: Fix early call to rcu_irq_exit() Frederic Weisbecker
@ 2011-09-17 19:05 ` Paul E. McKenney
  5 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2011-09-17 19:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin,
	Andy Henroid, Mike Frysinger, Guan Xuetao, David Miller,
	Chris Metcalf, Hans-Christian Egtvedt, Ralf Baechle,
	Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt

On Sat, Sep 17, 2011 at 04:39:57PM +0200, Frederic Weisbecker wrote:
> This is an updated set that tries to fix some illegal uses
> of RCU that are made in extended quiescent state, ie: between
> calls of rcu_enter_nohz() and rcu_exit_nohz().

Thank you, Frederic!

I have queued these, and have started testing them.

							Thanx, Paul

> Changes:
> 
> - Restructuring of the changes in nohz core code. We need to handle
> the case when the tick is not stopped from tick_nohz_idle_enter() but
> from the irq exit path when the irq happens in idle loop. This should
> fix the RCU stall Paul detected after applying my patches.
> The first patch sort of overlaps the simple scope of fixes for RCU
> but I believe it makes the code slightly more simple and optimized
> in the end.
> 
> - Rename tick_nohz_enter_idle() into tick_nohz_idle_enter() (same for
> exit). That to stay coherent with usual naming (tips from Peterz on
> the nohz cpuset patchset that needs similar changes).
> 
> - Remove the rcu_ext_qs parameter from tick_nohz_idle_exit(). We only
> need it in tick_nohz_idle_enter() because as long as we call
> rcu_enter_nohz() explicitly, we really want to pair it with an explicit
> call to rcu_exit_nohz() (and vice versa), to avoid confusion.
> 
> There are still the powerpc fixes that are not included. But I would
> like to first get this set merged in Paul's tree before focusing on
> the powerpc reported issues.
> 
> Frederic Weisbecker (5):
>   nohz: Seperate out irq exit and idle loop dyntick logic
>   nohz: Allow rcu extended state handling seperately from tick stop
>   x86: Enter rcu extended qs after idle notifier call
>   x86: Call idle notifier after irq_enter()
>   rcu: Fix early call to rcu_irq_exit()
> 
>  arch/arm/kernel/process.c                |    4 +-
>  arch/avr32/kernel/process.c              |    4 +-
>  arch/blackfin/kernel/process.c           |    4 +-
>  arch/microblaze/kernel/process.c         |    4 +-
>  arch/mips/kernel/process.c               |    4 +-
>  arch/powerpc/kernel/idle.c               |    4 +-
>  arch/powerpc/platforms/iseries/setup.c   |    8 +-
>  arch/s390/kernel/process.c               |    4 +-
>  arch/sh/kernel/idle.c                    |    4 +-
>  arch/sparc/kernel/process_64.c           |    4 +-
>  arch/tile/kernel/process.c               |    4 +-
>  arch/um/kernel/process.c                 |    4 +-
>  arch/unicore32/kernel/process.c          |    4 +-
>  arch/x86/kernel/apic/apic.c              |    6 +-
>  arch/x86/kernel/apic/io_apic.c           |    2 +-
>  arch/x86/kernel/cpu/mcheck/mce.c         |    2 +-
>  arch/x86/kernel/cpu/mcheck/therm_throt.c |    2 +-
>  arch/x86/kernel/cpu/mcheck/threshold.c   |    2 +-
>  arch/x86/kernel/irq.c                    |    6 +-
>  arch/x86/kernel/process_32.c             |    4 +-
>  arch/x86/kernel/process_64.c             |    9 ++-
>  include/linux/tick.h                     |   12 ++-
>  kernel/softirq.c                         |    4 +-
>  kernel/time/tick-sched.c                 |  110 ++++++++++++++++++++----------
>  24 files changed, 131 insertions(+), 84 deletions(-)
> 
> -- 
> 1.7.5.4
> 


end of thread, other threads:[~2011-09-17 19:05 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-17 14:39 [PATCH 0/5 v3] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
2011-09-17 14:39 ` [PATCH 1/5] nohz: Seperate out irq exit and idle loop dyntick logic Frederic Weisbecker
2011-09-17 14:39 ` [PATCH 2/5] nohz: Allow rcu extended state handling seperately from tick stop Frederic Weisbecker
2011-09-17 14:40 ` [PATCH 3/5] x86: Enter rcu extended qs after idle notifier call Frederic Weisbecker
2011-09-17 14:40 ` [PATCH 4/5] x86: Call idle notifier after irq_enter() Frederic Weisbecker
2011-09-17 14:40 ` [PATCH 5/5] rcu: Fix early call to rcu_irq_exit() Frederic Weisbecker
2011-09-17 19:05 ` [PATCH 0/5 v3] rcu: Fix some rcu uses in extended quiescent state Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).