All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] rcu: Fix some rcu uses in extended quiescent state
@ 2011-07-11 20:34 Frederic Weisbecker
  2011-07-11 20:34 ` [PATCH 1/3] nohz: Split extended quiescent state handling from nohz switch Frederic Weisbecker
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2011-07-11 20:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Thomas Gleixner, H. Peter Anvin,
	David Miller, Chris Metcalf, Guan Xuetao, Hans-Christian Egtvedt,
	Mike Frysinger, Ralf Baechle, Peter Zijlstra, Russell King,
	Paul Mackerras, Heiko Carstens, Paul Mundt

Hi,

First patch lays the ground to fix rcu uses in dyntick idle mode
by splitting rcu extended qs state logic from tick nohz one.

The rest fixes the misuses that RCU has detected for me in x86.

I'll try to fix those Paul has detected in PowerPc.

Frederic Weisbecker (3):
  nohz: Split extended quiescent state handling from nohz switch
  x86: Enter rcu extended qs after idle notifier call
  x86: Call idle notifier after irq_enter()

 arch/arm/kernel/process.c                |    4 +-
 arch/avr32/kernel/process.c              |    4 +-
 arch/blackfin/kernel/process.c           |    4 +-
 arch/microblaze/kernel/process.c         |    4 +-
 arch/mips/kernel/process.c               |    4 +-
 arch/powerpc/kernel/idle.c               |    4 +-
 arch/powerpc/platforms/iseries/setup.c   |    8 +++---
 arch/s390/kernel/process.c               |    4 +-
 arch/sh/kernel/idle.c                    |    2 +-
 arch/sparc/kernel/process_64.c           |    4 +-
 arch/tile/kernel/process.c               |    4 +-
 arch/um/kernel/process.c                 |    4 +-
 arch/unicore32/kernel/process.c          |    4 +-
 arch/x86/kernel/apic/apic.c              |    6 ++--
 arch/x86/kernel/apic/io_apic.c           |    2 +-
 arch/x86/kernel/cpu/mcheck/mce.c         |    2 +-
 arch/x86/kernel/cpu/mcheck/therm_throt.c |    2 +-
 arch/x86/kernel/cpu/mcheck/threshold.c   |    2 +-
 arch/x86/kernel/irq.c                    |    6 ++--
 arch/x86/kernel/process_32.c             |    4 +-
 arch/x86/kernel/process_64.c             |    5 ++++
 include/linux/tick.h                     |   10 +++++--
 kernel/time/tick-sched.c                 |   36 ++++++++++++++++++++++++++---
 23 files changed, 83 insertions(+), 46 deletions(-)

-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/3] nohz: Split extended quiescent state handling from nohz switch
  2011-07-11 20:34 [PATCH 0/3] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
@ 2011-07-11 20:34 ` Frederic Weisbecker
  2011-07-11 20:34 ` [PATCH 2/3] x86: Enter rcu extended qs after idle notifier call Frederic Weisbecker
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2011-07-11 20:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, H. Peter Anvin, Russell King, Paul Mackerras,
	Heiko Carstens, Paul Mundt

It is assumed that rcu won't be used once we switch to tickless
mode and until we restart the tick. However this is not always
true, as in x86-64 where we dereference the idle notifiers after
the tick is stopped.

To prepare for fixing this, split the tickless mode switching and
RCU extended quiescent state logics.
Make tick_nohz_stop/restart_sched_tick() RCU agnostic but provide
a new pair of APIs tick_nohz_enter/exit_idle() that keep the
old behaviour by handling both the nohz mode and RCU extended
quiescent states, then convert every archs to use these.

Archs that want to switch to extended QS to some custom points
can do it later by using tick_nohz_stop_sched_tick() and
rcu_enter_nohz() seperately.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: David Miller <davem@davemloft.net>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
---
 arch/arm/kernel/process.c              |    4 +-
 arch/avr32/kernel/process.c            |    4 +-
 arch/blackfin/kernel/process.c         |    4 +-
 arch/microblaze/kernel/process.c       |    4 +-
 arch/mips/kernel/process.c             |    4 +-
 arch/powerpc/kernel/idle.c             |    4 +-
 arch/powerpc/platforms/iseries/setup.c |    8 +++---
 arch/s390/kernel/process.c             |    4 +-
 arch/sh/kernel/idle.c                  |    2 +-
 arch/sparc/kernel/process_64.c         |    4 +-
 arch/tile/kernel/process.c             |    4 +-
 arch/um/kernel/process.c               |    4 +-
 arch/unicore32/kernel/process.c        |    4 +-
 arch/x86/kernel/process_32.c           |    4 +-
 arch/x86/kernel/process_64.c           |    4 +-
 include/linux/tick.h                   |   10 ++++++--
 kernel/time/tick-sched.c               |   36 ++++++++++++++++++++++++++++---
 17 files changed, 70 insertions(+), 38 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 5e1e541..27b68b0 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -182,7 +182,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		leds_event(led_idle_start);
 		while (!need_resched()) {
 #ifdef CONFIG_HOTPLUG_CPU
@@ -208,7 +208,7 @@ void cpu_idle(void)
 			}
 		}
 		leds_event(led_idle_end);
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index ef5a2a0..e683a34 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -34,10 +34,10 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		while (!need_resched())
 			cpu_idle_sleep();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 6a660fa..8082a8f 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -88,10 +88,10 @@ void cpu_idle(void)
 #endif
 		if (!idle)
 			idle = default_idle;
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		while (!need_resched())
 			idle();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 968648a..1b295b2 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -103,10 +103,10 @@ void cpu_idle(void)
 		if (!idle)
 			idle = default_idle;
 
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		while (!need_resched())
 			idle();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 
 		preempt_enable_no_resched();
 		schedule();
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index d2112d3..cdbfa52 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -56,7 +56,7 @@ void __noreturn cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		while (!need_resched() && cpu_online(cpu)) {
 #ifdef CONFIG_MIPS_MT_SMTC
 			extern void smtc_idle_loop_hook(void);
@@ -77,7 +77,7 @@ void __noreturn cpu_idle(void)
 		     system_state == SYSTEM_BOOTING))
 			play_dead();
 #endif
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 39a2baa..1108260 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -56,7 +56,7 @@ void cpu_idle(void)
 
 	set_thread_flag(TIF_POLLING_NRFLAG);
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		while (!need_resched() && !cpu_should_die()) {
 			ppc64_runlatch_off();
 
@@ -93,7 +93,7 @@ void cpu_idle(void)
 
 		HMT_medium();
 		ppc64_runlatch_on();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 		preempt_enable_no_resched();
 		if (cpu_should_die())
 			cpu_die();
diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
index c25a081..d40dcd9 100644
--- a/arch/powerpc/platforms/iseries/setup.c
+++ b/arch/powerpc/platforms/iseries/setup.c
@@ -562,7 +562,7 @@ static void yield_shared_processor(void)
 static void iseries_shared_idle(void)
 {
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		while (!need_resched() && !hvlpevent_is_pending()) {
 			local_irq_disable();
 			ppc64_runlatch_off();
@@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 
 		if (hvlpevent_is_pending())
 			process_iSeries_events();
@@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		if (!need_resched()) {
 			while (!need_resched()) {
 				ppc64_runlatch_off();
@@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 541a750..560cd94 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -90,10 +90,10 @@ static void default_idle(void)
 void cpu_idle(void)
 {
 	for (;;) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		while (!need_resched())
 			default_idle();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 425d604..3957972 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -88,7 +88,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 
 		while (!need_resched()) {
 			check_pgt_cache();
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index c158a95..5c36632 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -95,12 +95,12 @@ void cpu_idle(void)
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while(1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 
 		while (!need_resched() && !cpu_is_offline(cpu))
 			sparc64_yield(cpu);
 
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 
 		preempt_enable_no_resched();
 
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 9c45d8b..cc1bd4f 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -85,7 +85,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		while (!need_resched()) {
 			if (cpu_is_offline(cpu))
 				BUG();  /* no HOTPLUG_CPU */
@@ -105,7 +105,7 @@ void cpu_idle(void)
 				local_irq_enable();
 			current_thread_info()->status |= TS_POLLING;
 		}
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index fab4371..f1b3864 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -245,10 +245,10 @@ void default_idle(void)
 		if (need_resched())
 			schedule();
 
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		nsecs = disable_timer();
 		idle_sleep(nsecs);
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 	}
 }
 
diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index ba401df..e2df91a 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -55,7 +55,7 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		while (!need_resched()) {
 			local_irq_disable();
 			stop_critical_timings();
@@ -63,7 +63,7 @@ void cpu_idle(void)
 			local_irq_enable();
 			start_critical_timings();
 		}
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index a3d0dc5..1d7e26c 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -97,7 +97,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		while (!need_resched()) {
 
 			check_pgt_cache();
@@ -112,7 +112,7 @@ void cpu_idle(void)
 			pm_idle();
 			start_critical_timings();
 		}
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ca6f7ab..4f3cb3e 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -120,7 +120,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_enter_idle();
 		while (!need_resched()) {
 
 			rmb();
@@ -145,7 +145,7 @@ void cpu_idle(void)
 			__exit_idle();
 		}
 
-		tick_nohz_restart_sched_tick();
+		tick_nohz_exit_idle();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/include/linux/tick.h b/include/linux/tick.h
index b232ccc..ff31a71 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -121,14 +121,18 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
 #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
 
 # ifdef CONFIG_NO_HZ
-extern void tick_nohz_stop_sched_tick(int inidle);
+extern bool tick_nohz_stop_sched_tick(int inidle);
 extern void tick_nohz_restart_sched_tick(void);
+extern void tick_nohz_enter_idle(void);
+extern void tick_nohz_exit_idle(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 # else
-static inline void tick_nohz_stop_sched_tick(int inidle) { }
-static inline void tick_nohz_restart_sched_tick(void) { }
+static inline bool tick_nohz_stop_sched_tick(int inidle) { return false; }
+static inline void tick_nohz_restart_sched_tick(void) { return false; }
+static inline void tick_nohz_enter_idle(void) { }
+static inline void tick_nohz_exit_idle(void) { }
 static inline ktime_t tick_nohz_get_sleep_length(void)
 {
 	ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d5097c4..8e81e9f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -254,12 +254,13 @@ EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
  * Called either from the idle loop or from irq_exit() when an idle period was
  * just interrupted by an interrupt which did not cause a reschedule.
  */
-void tick_nohz_stop_sched_tick(int inidle)
+bool tick_nohz_stop_sched_tick(int inidle)
 {
 	unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
 	struct tick_sched *ts;
 	ktime_t last_update, expires, now;
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
+	int stopped = false;
 	u64 time_delta;
 	int cpu;
 
@@ -409,7 +410,7 @@ void tick_nohz_stop_sched_tick(int inidle)
 			ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
 			ts->tick_stopped = 1;
 			ts->idle_jiffies = last_jiffies;
-			rcu_enter_nohz();
+			stopped = true;
 		}
 
 		ts->idle_sleeps++;
@@ -450,6 +451,22 @@ out:
 	ts->sleep_length = ktime_sub(dev->next_event, now);
 end:
 	local_irq_restore(flags);
+
+	return stopped;
+}
+
+
+/**
+ * tick_nohz_enter_idle - stop the tick and enter extended quiescent state
+ *
+ * Most arch may want to enter RCU extended state right after they switched
+ * to nohz mode. Beware though, no read side use of RCU can be done until we
+ * call tick_nohz_exit_idle().
+ */
+void tick_nohz_enter_idle(void)
+{
+	if (tick_nohz_stop_sched_tick(1))
+		rcu_enter_nohz();
 }
 
 /**
@@ -519,8 +536,6 @@ void tick_nohz_restart_sched_tick(void)
 
 	ts->inidle = 0;
 
-	rcu_exit_nohz();
-
 	/* Update jiffies first */
 	select_nohz_load_balancer(0);
 	tick_do_update_jiffies64(now);
@@ -552,6 +567,19 @@ void tick_nohz_restart_sched_tick(void)
 	local_irq_enable();
 }
 
+/**
+ * tick_nohz_exit_idle - restart the tick and exit extended quiescent state
+ */
+void tick_nohz_exit_idle(void)
+{
+	struct tick_sched *ts = &__raw_get_cpu_var(tick_cpu_sched);
+
+	if (ts->tick_stopped)
+		rcu_exit_nohz();
+
+	tick_nohz_restart_sched_tick();
+}
+
 static int tick_nohz_reprogram(struct tick_sched *ts, ktime_t now)
 {
 	hrtimer_forward(&ts->sched_timer, now, tick_period);
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] x86: Enter rcu extended qs after idle notifier call
  2011-07-11 20:34 [PATCH 0/3] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
  2011-07-11 20:34 ` [PATCH 1/3] nohz: Split extended quiescent state handling from nohz switch Frederic Weisbecker
@ 2011-07-11 20:34 ` Frederic Weisbecker
  2011-07-11 20:34 ` [PATCH 3/3] x86: Call idle notifier after irq_enter() Frederic Weisbecker
  2011-07-15 23:32 ` [PATCH 0/3] rcu: Fix some rcu uses in extended quiescent state Paul E. McKenney
  3 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2011-07-11 20:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, H. Peter Anvin

The idle notifier, called by enter_idle(), enters into rcu read
side critical section but at that time we already switched into
rcu dynticks idle mode. And it's illegal to use rcu_read_lock()
in that state.

This results in rcu reporting its bad mood:

[    1.275635] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[    1.275635] Hardware name: AMD690VM-FMH
[    1.275635] Modules linked in:
[    1.275635] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #252
[    1.275635] Call Trace:
[    1.275635]  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[    1.275635]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[    1.275635]  [<ffffffff817d6f22>] __atomic_notifier_call_chain+0xd2/0x110
[    1.275635]  [<ffffffff817d6f71>] atomic_notifier_call_chain+0x11/0x20
[    1.275635]  [<ffffffff810018a0>] enter_idle+0x20/0x30
[    1.275635]  [<ffffffff81001995>] cpu_idle+0xa5/0x110
[    1.275635]  [<ffffffff817a7465>] rest_init+0xe5/0x140
[    1.275635]  [<ffffffff817a73c8>] ? rest_init+0x48/0x140
[    1.275635]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[    1.275635]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[    1.275635]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
[    1.275635] ---[ end trace a22d306b065d4a66 ]---

Fix this by entering rcu extended quiescent state later, just before
the CPU goes to sleep.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/kernel/process_64.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4f3cb3e..df5f83d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -120,7 +120,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_enter_idle();
+		tick_nohz_stop_sched_tick(1);
 		while (!need_resched()) {
 
 			rmb();
@@ -136,7 +136,12 @@ void cpu_idle(void)
 			enter_idle();
 			/* Don't trace irqs off for idle */
 			stop_critical_timings();
+
+			/* enter_idle() needs rcu for notifiers */
+			rcu_enter_nohz();
 			pm_idle();
+			rcu_exit_nohz();
+
 			start_critical_timings();
 
 			/* In many cases the interrupt that ended idle
@@ -145,7 +150,7 @@ void cpu_idle(void)
 			__exit_idle();
 		}
 
-		tick_nohz_exit_idle();
+		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] x86: Call idle notifier after irq_enter()
  2011-07-11 20:34 [PATCH 0/3] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
  2011-07-11 20:34 ` [PATCH 1/3] nohz: Split extended quiescent state handling from nohz switch Frederic Weisbecker
  2011-07-11 20:34 ` [PATCH 2/3] x86: Enter rcu extended qs after idle notifier call Frederic Weisbecker
@ 2011-07-11 20:34 ` Frederic Weisbecker
  2011-07-15 23:32 ` [PATCH 0/3] rcu: Fix some rcu uses in extended quiescent state Paul E. McKenney
  3 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2011-07-11 20:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, H. Peter Anvin

Interrupts notify the idle exit state before calling irq_enter().
But the notifier code calls rcu_read_lock() and this is not
allowed while rcu is in an extended quiescent state. We need
to wait for rcu_irq_enter() to be called before doing so
otherwise this results in a grumpy RCU:

[    0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[    0.099991] Hardware name: AMD690VM-FMH
[    0.099991] Modules linked in:
[    0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255
[    0.099991] Call Trace:
[    0.099991]  <IRQ>  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[    0.099991]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[    0.099991]  [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110
[    0.099991]  [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20
[    0.099991]  [<ffffffff81001873>] exit_idle+0x43/0x50
[    0.099991]  [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0
[    0.099991]  [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20
[    0.099991]  <EOI>  [<ffffffff8100ae67>] ? default_idle+0xa7/0x350
[    0.099991]  [<ffffffff8100ae65>] ? default_idle+0xa5/0x350
[    0.099991]  [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110
[    0.099991]  [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160
[    0.099991]  [<ffffffff810019a0>] cpu_idle+0xb0/0x110
[    0.099991]  [<ffffffff817a7505>] rest_init+0xe5/0x140
[    0.099991]  [<ffffffff817a7468>] ? rest_init+0x48/0x140
[    0.099991]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[    0.099991]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[    0.099991]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/kernel/apic/apic.c              |    6 +++---
 arch/x86/kernel/apic/io_apic.c           |    2 +-
 arch/x86/kernel/cpu/mcheck/mce.c         |    2 +-
 arch/x86/kernel/cpu/mcheck/therm_throt.c |    2 +-
 arch/x86/kernel/cpu/mcheck/threshold.c   |    2 +-
 arch/x86/kernel/irq.c                    |    6 +++---
 6 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b9338b8..82af921 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -856,8 +856,8 @@ void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
 	 * Besides, if we don't timer interrupts ignore the global
 	 * interrupt lock, which is the WrongThing (tm) to do.
 	 */
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	local_apic_timer_interrupt();
 	irq_exit();
 
@@ -1790,8 +1790,8 @@ void smp_spurious_interrupt(struct pt_regs *regs)
 {
 	u32 v;
 
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	/*
 	 * Check if this really is a spurious interrupt and ACK it
 	 * if it is a vectored one.  Just in case...
@@ -1827,8 +1827,8 @@ void smp_error_interrupt(struct pt_regs *regs)
 		"Illegal register address",	/* APIC Error Bit 7 */
 	};
 
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	/* First tickle the hardware, only then report what went on. -- REW */
 	v0 = apic_read(APIC_ESR);
 	apic_write(APIC_ESR, 0);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index e529339..e1b5eec 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2275,8 +2275,8 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
 	unsigned vector, me;
 
 	ack_APIC_irq();
-	exit_idle();
 	irq_enter();
+	exit_idle();
 
 	me = smp_processor_id();
 	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index ff1ae9b..7ba9757 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -470,8 +470,8 @@ static inline void mce_get_rip(struct mce *m, struct pt_regs *regs)
 asmlinkage void smp_mce_self_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	mce_notify_irq();
 	mce_schedule_work();
 	irq_exit();
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 27c6251..f6bbc64 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -396,8 +396,8 @@ static void (*smp_thermal_vector)(void) = unexpected_thermal_interrupt;
 
 asmlinkage void smp_thermal_interrupt(struct pt_regs *regs)
 {
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	inc_irq_stat(irq_thermal_count);
 	smp_thermal_vector();
 	irq_exit();
diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c
index d746df2..aa578ca 100644
--- a/arch/x86/kernel/cpu/mcheck/threshold.c
+++ b/arch/x86/kernel/cpu/mcheck/threshold.c
@@ -19,8 +19,8 @@ void (*mce_threshold_vector)(void) = default_threshold_interrupt;
 
 asmlinkage void smp_threshold_interrupt(void)
 {
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	inc_irq_stat(irq_threshold_count);
 	mce_threshold_vector();
 	irq_exit();
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 6c0802e..73cf928 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -180,8 +180,8 @@ unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
 	unsigned vector = ~regs->orig_ax;
 	unsigned irq;
 
-	exit_idle();
 	irq_enter();
+	exit_idle();
 
 	irq = __this_cpu_read(vector_irq[vector]);
 
@@ -208,10 +208,10 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
 
 	ack_APIC_irq();
 
-	exit_idle();
-
 	irq_enter();
 
+	exit_idle();
+
 	inc_irq_stat(x86_platform_ipis);
 
 	if (x86_platform_ipi_callback)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] rcu: Fix some rcu uses in extended quiescent state
  2011-07-11 20:34 [PATCH 0/3] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2011-07-11 20:34 ` [PATCH 3/3] x86: Call idle notifier after irq_enter() Frederic Weisbecker
@ 2011-07-15 23:32 ` Paul E. McKenney
  2011-07-15 23:41   ` Frederic Weisbecker
  3 siblings, 1 reply; 7+ messages in thread
From: Paul E. McKenney @ 2011-07-15 23:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, H. Peter Anvin, David Miller,
	Chris Metcalf, Guan Xuetao, Hans-Christian Egtvedt,
	Mike Frysinger, Ralf Baechle, Peter Zijlstra, Russell King,
	Paul Mackerras, Heiko Carstens, Paul Mundt

On Mon, Jul 11, 2011 at 10:34:29PM +0200, Frederic Weisbecker wrote:
> Hi,
> 
> First patch lays the ground to fix rcu uses in dyntick idle mode
> by splitting rcu extended qs state logic from tick nohz one.
> 
> The rest fixes the misuses that RCU has detected for me in x86.

These look good to me -- I have queued them and will try testing
them out.

> I'll try to fix those Paul has detected in PowerPc.

Looking forward to that one as well.  ;-)

It would be really cool if this approach could serve as RCU's idle-CPU
detection regardless of whether or not CONFIG_NO_HZ was set.  For one
thing, this would simplify the RCU code that treats idle tasks as extended
quiescent states.  Except during boot time.  :-/

							Thanx, Paul

> Frederic Weisbecker (3):
>   nohz: Split extended quiescent state handling from nohz switch
>   x86: Enter rcu extended qs after idle notifier call
>   x86: Call idle notifier after irq_enter()
> 
>  arch/arm/kernel/process.c                |    4 +-
>  arch/avr32/kernel/process.c              |    4 +-
>  arch/blackfin/kernel/process.c           |    4 +-
>  arch/microblaze/kernel/process.c         |    4 +-
>  arch/mips/kernel/process.c               |    4 +-
>  arch/powerpc/kernel/idle.c               |    4 +-
>  arch/powerpc/platforms/iseries/setup.c   |    8 +++---
>  arch/s390/kernel/process.c               |    4 +-
>  arch/sh/kernel/idle.c                    |    2 +-
>  arch/sparc/kernel/process_64.c           |    4 +-
>  arch/tile/kernel/process.c               |    4 +-
>  arch/um/kernel/process.c                 |    4 +-
>  arch/unicore32/kernel/process.c          |    4 +-
>  arch/x86/kernel/apic/apic.c              |    6 ++--
>  arch/x86/kernel/apic/io_apic.c           |    2 +-
>  arch/x86/kernel/cpu/mcheck/mce.c         |    2 +-
>  arch/x86/kernel/cpu/mcheck/therm_throt.c |    2 +-
>  arch/x86/kernel/cpu/mcheck/threshold.c   |    2 +-
>  arch/x86/kernel/irq.c                    |    6 ++--
>  arch/x86/kernel/process_32.c             |    4 +-
>  arch/x86/kernel/process_64.c             |    5 ++++
>  include/linux/tick.h                     |   10 +++++--
>  kernel/time/tick-sched.c                 |   36 ++++++++++++++++++++++++++---
>  23 files changed, 83 insertions(+), 46 deletions(-)
> 
> -- 
> 1.7.5.4
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] rcu: Fix some rcu uses in extended quiescent state
  2011-07-15 23:32 ` [PATCH 0/3] rcu: Fix some rcu uses in extended quiescent state Paul E. McKenney
@ 2011-07-15 23:41   ` Frederic Weisbecker
  2011-07-28 20:56     ` Paul E. McKenney
  0 siblings, 1 reply; 7+ messages in thread
From: Frederic Weisbecker @ 2011-07-15 23:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Thomas Gleixner, H. Peter Anvin, David Miller,
	Chris Metcalf, Guan Xuetao, Hans-Christian Egtvedt,
	Mike Frysinger, Ralf Baechle, Peter Zijlstra, Russell King,
	Paul Mackerras, Heiko Carstens, Paul Mundt

On Fri, Jul 15, 2011 at 04:32:03PM -0700, Paul E. McKenney wrote:
> On Mon, Jul 11, 2011 at 10:34:29PM +0200, Frederic Weisbecker wrote:
> > Hi,
> > 
> > First patch lays the ground to fix rcu uses in dyntick idle mode
> > by splitting rcu extended qs state logic from tick nohz one.
> > 
> > The rest fixes the misuses that RCU has detected for me in x86.
> 
> These look good to me -- I have queued them and will try testing
> them out.
> 
> > I'll try to fix those Paul has detected in PowerPc.
> 
> Looking forward to that one as well.  ;-)
> 
> It would be really cool if this approach could serve as RCU's idle-CPU
> detection regardless of whether or not CONFIG_NO_HZ was set.  For one
> thing, this would simplify the RCU code that treats idle tasks as extended
> quiescent states.  Except during boot time.  :-/

For that we could just build rcu_enter_nohz() things on CONFIG_NO_HZ || CONFIG_PROVE_RCU.

> 
> 							Thanx, Paul
> 
> > Frederic Weisbecker (3):
> >   nohz: Split extended quiescent state handling from nohz switch
> >   x86: Enter rcu extended qs after idle notifier call
> >   x86: Call idle notifier after irq_enter()
> > 
> >  arch/arm/kernel/process.c                |    4 +-
> >  arch/avr32/kernel/process.c              |    4 +-
> >  arch/blackfin/kernel/process.c           |    4 +-
> >  arch/microblaze/kernel/process.c         |    4 +-
> >  arch/mips/kernel/process.c               |    4 +-
> >  arch/powerpc/kernel/idle.c               |    4 +-
> >  arch/powerpc/platforms/iseries/setup.c   |    8 +++---
> >  arch/s390/kernel/process.c               |    4 +-
> >  arch/sh/kernel/idle.c                    |    2 +-
> >  arch/sparc/kernel/process_64.c           |    4 +-
> >  arch/tile/kernel/process.c               |    4 +-
> >  arch/um/kernel/process.c                 |    4 +-
> >  arch/unicore32/kernel/process.c          |    4 +-
> >  arch/x86/kernel/apic/apic.c              |    6 ++--
> >  arch/x86/kernel/apic/io_apic.c           |    2 +-
> >  arch/x86/kernel/cpu/mcheck/mce.c         |    2 +-
> >  arch/x86/kernel/cpu/mcheck/therm_throt.c |    2 +-
> >  arch/x86/kernel/cpu/mcheck/threshold.c   |    2 +-
> >  arch/x86/kernel/irq.c                    |    6 ++--
> >  arch/x86/kernel/process_32.c             |    4 +-
> >  arch/x86/kernel/process_64.c             |    5 ++++
> >  include/linux/tick.h                     |   10 +++++--
> >  kernel/time/tick-sched.c                 |   36 ++++++++++++++++++++++++++---
> >  23 files changed, 83 insertions(+), 46 deletions(-)
> > 
> > -- 
> > 1.7.5.4
> > 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] rcu: Fix some rcu uses in extended quiescent state
  2011-07-15 23:41   ` Frederic Weisbecker
@ 2011-07-28 20:56     ` Paul E. McKenney
  0 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2011-07-28 20:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, H. Peter Anvin, David Miller,
	Chris Metcalf, Guan Xuetao, Hans-Christian Egtvedt,
	Mike Frysinger, Ralf Baechle, Peter Zijlstra, Russell King,
	Paul Mackerras, Heiko Carstens, Paul Mundt

On Sat, Jul 16, 2011 at 01:41:56AM +0200, Frederic Weisbecker wrote:
> On Fri, Jul 15, 2011 at 04:32:03PM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 11, 2011 at 10:34:29PM +0200, Frederic Weisbecker wrote:
> > > Hi,
> > > 
> > > First patch lays the ground to fix rcu uses in dyntick idle mode
> > > by splitting rcu extended qs state logic from tick nohz one.
> > > 
> > > The rest fixes the misuses that RCU has detected for me in x86.
> > 
> > These look good to me -- I have queued them and will try testing
> > them out.
> > 
> > > I'll try to fix those Paul has detected in PowerPc.
> > 
> > Looking forward to that one as well.  ;-)
> > 
> > It would be really cool if this approach could serve as RCU's idle-CPU
> > detection regardless of whether or not CONFIG_NO_HZ was set.  For one
> > thing, this would simplify the RCU code that treats idle tasks as extended
> > quiescent states.  Except during boot time.  :-/
> 
> For that we could just build rcu_enter_nohz() things on CONFIG_NO_HZ || CONFIG_PROVE_RCU.

Actually, it would be really cool if RCU could just rely on those
callbacks to detect idle regardless of the configuration.  Might not
be practical, but it would be nice.  ;-)

							Thanx, Paul

> > > Frederic Weisbecker (3):
> > >   nohz: Split extended quiescent state handling from nohz switch
> > >   x86: Enter rcu extended qs after idle notifier call
> > >   x86: Call idle notifier after irq_enter()
> > > 
> > >  arch/arm/kernel/process.c                |    4 +-
> > >  arch/avr32/kernel/process.c              |    4 +-
> > >  arch/blackfin/kernel/process.c           |    4 +-
> > >  arch/microblaze/kernel/process.c         |    4 +-
> > >  arch/mips/kernel/process.c               |    4 +-
> > >  arch/powerpc/kernel/idle.c               |    4 +-
> > >  arch/powerpc/platforms/iseries/setup.c   |    8 +++---
> > >  arch/s390/kernel/process.c               |    4 +-
> > >  arch/sh/kernel/idle.c                    |    2 +-
> > >  arch/sparc/kernel/process_64.c           |    4 +-
> > >  arch/tile/kernel/process.c               |    4 +-
> > >  arch/um/kernel/process.c                 |    4 +-
> > >  arch/unicore32/kernel/process.c          |    4 +-
> > >  arch/x86/kernel/apic/apic.c              |    6 ++--
> > >  arch/x86/kernel/apic/io_apic.c           |    2 +-
> > >  arch/x86/kernel/cpu/mcheck/mce.c         |    2 +-
> > >  arch/x86/kernel/cpu/mcheck/therm_throt.c |    2 +-
> > >  arch/x86/kernel/cpu/mcheck/threshold.c   |    2 +-
> > >  arch/x86/kernel/irq.c                    |    6 ++--
> > >  arch/x86/kernel/process_32.c             |    4 +-
> > >  arch/x86/kernel/process_64.c             |    5 ++++
> > >  include/linux/tick.h                     |   10 +++++--
> > >  kernel/time/tick-sched.c                 |   36 ++++++++++++++++++++++++++---
> > >  23 files changed, 83 insertions(+), 46 deletions(-)
> > > 
> > > -- 
> > > 1.7.5.4
> > > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-07-28 20:56 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-11 20:34 [PATCH 0/3] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
2011-07-11 20:34 ` [PATCH 1/3] nohz: Split extended quiescent state handling from nohz switch Frederic Weisbecker
2011-07-11 20:34 ` [PATCH 2/3] x86: Enter rcu extended qs after idle notifier call Frederic Weisbecker
2011-07-11 20:34 ` [PATCH 3/3] x86: Call idle notifier after irq_enter() Frederic Weisbecker
2011-07-15 23:32 ` [PATCH 0/3] rcu: Fix some rcu uses in extended quiescent state Paul E. McKenney
2011-07-15 23:41   ` Frederic Weisbecker
2011-07-28 20:56     ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.