linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Fw: [RFC] Strange code in cpu_idle()
@ 2004-12-05  0:45 Paul E. McKenney
  2004-12-06  0:16 ` Stephen Rothwell
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2004-12-05  0:45 UTC (permalink / raw)
  To: sfr; +Cc: linux-kernel

Hello, Steve,

OK, I believe I found the other end of this:

static void __exit apm_exit(void)
{
	int	error;

	if (set_pm_idle) {
		pm_idle = original_pm_idle;
		/*
		 * We are about to unload the current idle thread pm callback
		 * (pm_idle), Wait for all processors to update cached/local
		 * copies of pm_idle before proceeding.
		 */
		synchronize_kernel();
	}

Unfortunately, the idle loop is a quiescent state, so it is
possible for synchronize_kernel() to return before the idle threads
have returned.  So I don't believe RCU is useful here.  One other
approach would be to keep a cpu mask, in which apm_exit() sets all
bits, and pm_idle() clears its CPU's bit only if it is set.
Then apm_exit() could wait for all CPU's bits to clear.

There is probably a better way to do this, but that is what comes
to mind immediately.

Thoughts?

					Thanx, Paul

----- Forwarded message from "Paul E. McKenney" <paulmck@us.ibm.com> -----

Date: Sat, 4 Dec 2004 15:11:49 -0800
From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: dipankar@in.ibm.com, rusty@au1.ibm.com, ak@suse.de, gareth@valinux.com,
	davidm@hpl.hp.com
Cc: linux-kernel@vger.kernel.org
Subject: [RFC] Strange code in cpu_idle()
Reply-To: paulmck@us.ibm.com

Hello!

Strange code in i386, ia64, and x86-64 cpu_idle():

	void cpu_idle (void)
	{
		/* endless idle loop with no priority at all */
		while (1) {
			while (!need_resched()) {
				void (*idle)(void);
				/*
				 * Mark this as an RCU critical section so that
				 * synchronize_kernel() in the unload path waits
				 * for our completion.
				 */
				rcu_read_lock();
				idle = pm_idle;
				if (!idle)
					idle = default_idle;
				idle();
				rcu_read_unlock();
			}
			schedule();
		}
	}

Unless idle_cpu() is busted, it seems like the above is, given the code in
rcu_check_callbacks():

	void rcu_check_callbacks(int cpu, int user)
	{
		if (user || 
		    (idle_cpu(cpu) && !in_softirq() && 
					hardirq_count() <= (1 << HARDIRQ_SHIFT))) {
			rcu_qsctr_inc(cpu);
			rcu_bh_qsctr_inc(cpu);
		} else if (!in_softirq())
			rcu_bh_qsctr_inc(cpu);
		tasklet_schedule(&per_cpu(rcu_tasklet, cpu));
	}

And idle_cpu() is pretty straightforward:

	int idle_cpu(int cpu)
	{
		return cpu_curr(cpu) == cpu_rq(cpu)->idle;
	}

So I would say that the rcu_read_lock() in cpu_idle() is having no
effect, because any timer interrupt from cpu_idle() will mark a
quiescent state notwithstanding.  What am I missing here?

If I am not missing anything, then the attached patch would be in
order here, though there might be some additional work required.
(Though I thought that the try_stop_module() stuff took care of
all of this these days...)

Note that we really, really do want the idle loop to be an extended
quiescent state, otherwise one gets indefinite grace periods and
runs out of memory...

						Thanx, Paul

diff -urpN -X ../dontdiff linux-2.5/arch/i386/kernel/process.c linux-2.5-idle_rcu/arch/i386/kernel/process.c
--- linux-2.5/arch/i386/kernel/process.c	Mon Nov 29 10:47:14 2004
+++ linux-2.5-idle_rcu/arch/i386/kernel/process.c	Sat Dec  4 14:53:37 2004
@@ -144,14 +144,12 @@ void cpu_idle (void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		/*
+		 * Note that it is illegal to use RCU read-side
+		 * critical sections within the idle loop.
+		 */
 		while (!need_resched()) {
 			void (*idle)(void);
-			/*
-			 * Mark this as an RCU critical section so that
-			 * synchronize_kernel() in the unload path waits
-			 * for our completion.
-			 */
-			rcu_read_lock();
 			idle = pm_idle;
 
 			if (!idle)
@@ -159,7 +157,6 @@ void cpu_idle (void)
 
 			irq_stat[smp_processor_id()].idle_timestamp = jiffies;
 			idle();
-			rcu_read_unlock();
 		}
 		schedule();
 	}
diff -urpN -X ../dontdiff linux-2.5/arch/ia64/kernel/process.c linux-2.5-idle_rcu/arch/ia64/kernel/process.c
--- linux-2.5/arch/ia64/kernel/process.c	Mon Nov 29 10:47:18 2004
+++ linux-2.5-idle_rcu/arch/ia64/kernel/process.c	Sat Dec  4 14:54:30 2004
@@ -230,6 +230,10 @@ cpu_idle (void *unused)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
+		/*
+		 * Note that it is illegal to use RCU read-side
+		 * critical sections within the idle loop.
+		 */
 #ifdef CONFIG_SMP
 		if (!need_resched())
 			min_xtp();
@@ -239,17 +243,10 @@ cpu_idle (void *unused)
 
 			if (mark_idle)
 				(*mark_idle)(1);
-			/*
-			 * Mark this as an RCU critical section so that
-			 * synchronize_kernel() in the unload path waits
-			 * for our completion.
-			 */
-			rcu_read_lock();
 			idle = pm_idle;
 			if (!idle)
 				idle = default_idle;
 			(*idle)();
-			rcu_read_unlock();
 		}
 
 		if (mark_idle)
diff -urpN -X ../dontdiff linux-2.5/arch/x86_64/kernel/process.c linux-2.5-idle_rcu/arch/x86_64/kernel/process.c
--- linux-2.5/arch/x86_64/kernel/process.c	Mon Nov 29 10:48:05 2004
+++ linux-2.5-idle_rcu/arch/x86_64/kernel/process.c	Sat Dec  4 14:55:13 2004
@@ -133,19 +133,16 @@ void cpu_idle (void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		/*
+		 * Note that it is illegal to use RCU read-side
+		 * critical sections within the idle loop.
+		 */
 		while (!need_resched()) {
 			void (*idle)(void);
-			/*
-			 * Mark this as an RCU critical section so that
-			 * synchronize_kernel() in the unload path waits
-			 * for our completion.
-			 */
-			rcu_read_lock();
 			idle = pm_idle;
 			if (!idle)
 				idle = default_idle;
 			idle();
-			rcu_read_unlock();
 		}
 		schedule();
 	}

----- End forwarded message -----

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Fw: [RFC] Strange code in cpu_idle()
  2004-12-05  0:45 Fw: [RFC] Strange code in cpu_idle() Paul E. McKenney
@ 2004-12-06  0:16 ` Stephen Rothwell
  2004-12-06  6:46   ` Stephen Rothwell
  2004-12-06  7:20   ` Andrew Morton
  0 siblings, 2 replies; 32+ messages in thread
From: Stephen Rothwell @ 2004-12-06  0:16 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1222 bytes --]

Hi Paul,

On Sat, 4 Dec 2004 16:45:57 -0800 "Paul E. McKenney" <paulmck@us.ibm.com> wrote:
>
> OK, I believe I found the other end of this:
> 
> static void __exit apm_exit(void)
> {
> 	int	error;
> 
> 	if (set_pm_idle) {
> 		pm_idle = original_pm_idle;
> 		/*
> 		 * We are about to unload the current idle thread pm callback
> 		 * (pm_idle), Wait for all processors to update cached/local
> 		 * copies of pm_idle before proceeding.
> 		 */
> 		synchronize_kernel();
> 	}
> 
> Unfortunately, the idle loop is a quiescent state, so it is
> possible for synchronize_kernel() to return before the idle threads
> have returned.  So I don't believe RCU is useful here.  One other
> approach would be to keep a cpu mask, in which apm_exit() sets all
> bits, and pm_idle() clears its CPU's bit only if it is set.
> Then apm_exit() could wait for all CPU's bits to clear.
> 
> There is probably a better way to do this, but that is what comes
> to mind immediately.
> 
> Thoughts?

None, sorry :-)

I don't even remember that piece of code going in (I certainly didn't
write it), though I can see what it was trying to do.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Fw: [RFC] Strange code in cpu_idle()
  2004-12-06  0:16 ` Stephen Rothwell
@ 2004-12-06  6:46   ` Stephen Rothwell
  2004-12-06 10:00     ` Zwane Mwaikambo
  2004-12-06  7:20   ` Andrew Morton
  1 sibling, 1 reply; 32+ messages in thread
From: Stephen Rothwell @ 2004-12-06  6:46 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 267 bytes --]

On Mon, 6 Dec 2004 11:16:34 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> > Thoughts?
> 
> None, sorry :-)

Actually, Rusty suggests stop_machine() ...

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Fw: [RFC] Strange code in cpu_idle()
  2004-12-06  0:16 ` Stephen Rothwell
  2004-12-06  6:46   ` Stephen Rothwell
@ 2004-12-06  7:20   ` Andrew Morton
  2004-12-06  9:38     ` Zwane Mwaikambo
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2004-12-06  7:20 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: paulmck, linux-kernel, Zwane Mwaikambo

Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> > OK, I believe I found the other end of this:
>  > 
>  > static void __exit apm_exit(void)
>  > {
>  > 	int	error;
>  > 
>  > 	if (set_pm_idle) {
>  > 		pm_idle = original_pm_idle;
>  > 		/*
>  > 		 * We are about to unload the current idle thread pm callback
>  > 		 * (pm_idle), Wait for all processors to update cached/local
>  > 		 * copies of pm_idle before proceeding.
>  > 		 */
>  > 		synchronize_kernel();
>  > 	}
>  > 
>  > Unfortunately, the idle loop is a quiescent state, so it is
>  > possible for synchronize_kernel() to return before the idle threads
>  > have returned.  So I don't believe RCU is useful here.  One other
>  > approach would be to keep a cpu mask, in which apm_exit() sets all
>  > bits, and pm_idle() clears its CPU's bit only if it is set.
>  > Then apm_exit() could wait for all CPU's bits to clear.
>  > 
>  > There is probably a better way to do this, but that is what comes
>  > to mind immediately.
>  > 
>  > Thoughts?
> 
>  None, sorry :-)
> 
>  I don't even remember that piece of code going in (I certainly didn't
>  write it), though I can see what it was trying to do.

It came in via the below patch.  I guess we need to find a different way to
fix this problem.

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/09/17 12:09:12-07:00 zwane@linuxpower.ca 
#   [PATCH] Close race with preempt and modular pm_idle callbacks
#   
#   The following patch from Shaohua Li fixes a race with preempt enabled when
#   a module containing a pm_idle callback is unloaded.  Cached values in local
#   variables need to be protected as RCU critical sections so that the
#   synchronize_kernel() call in the unload path waits for all processors.
#   There original bugzilla entry can be found at
#   
#   Shaohua, i had to make a small change (variable declaration after code in
#   code block) so that it compiles with geriatric compilers such as the ones
#   Andrew is attached to ;)
#   
#   http://bugzilla.kernel.org/show_bug.cgi?id=1716
#   
#   Signed-off-by: Li Shaohua <shaohua.li@intel.com>
#   Signed-off-by: Zwane Mwaikambo <zwane@linuxpower.ca>
#   Signed-off-by: Andrew Morton <akpm@osdl.org>
#   Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# 
# drivers/acpi/processor.c
#   2004/09/17 00:07:02-07:00 zwane@linuxpower.ca +5 -0
#   Close race with preempt and modular pm_idle callbacks
# 
# arch/x86_64/kernel/process.c
#   2004/09/17 00:07:02-07:00 zwane@linuxpower.ca +13 -4
#   Close race with preempt and modular pm_idle callbacks
# 
# arch/ia64/kernel/process.c
#   2004/09/17 00:07:02-07:00 zwane@linuxpower.ca +12 -4
#   Close race with preempt and modular pm_idle callbacks
# 
# arch/i386/kernel/process.c
#   2004/09/17 00:07:02-07:00 zwane@linuxpower.ca +9 -1
#   Close race with preempt and modular pm_idle callbacks
# 
# arch/i386/kernel/apm.c
#   2004/09/17 00:07:02-07:00 zwane@linuxpower.ca +8 -1
#   Close race with preempt and modular pm_idle callbacks
# 
diff -Nru a/arch/i386/kernel/apm.c b/arch/i386/kernel/apm.c
--- a/arch/i386/kernel/apm.c	2004-12-05 23:18:08 -08:00
+++ b/arch/i386/kernel/apm.c	2004-12-05 23:18:08 -08:00
@@ -2362,8 +2362,15 @@
 {
 	int	error;
 
-	if (set_pm_idle)
+	if (set_pm_idle) {
 		pm_idle = original_pm_idle;
+		/*
+		 * We are about to unload the current idle thread pm callback
+		 * (pm_idle), Wait for all processors to update cached/local
+		 * copies of pm_idle before proceeding.
+		 */
+		synchronize_kernel();
+	}
 	if (((apm_info.bios.flags & APM_BIOS_DISENGAGED) == 0)
 	    && (apm_info.connection_version > 0x0100)) {
 		error = apm_engage_power_management(APM_DEVICE_ALL, 0);
diff -Nru a/arch/i386/kernel/process.c b/arch/i386/kernel/process.c
--- a/arch/i386/kernel/process.c	2004-12-05 23:18:08 -08:00
+++ b/arch/i386/kernel/process.c	2004-12-05 23:18:08 -08:00
@@ -142,13 +142,21 @@
 	/* endless idle loop with no priority at all */
 	while (1) {
 		while (!need_resched()) {
-			void (*idle)(void) = pm_idle;
+			void (*idle)(void);
+			/*
+			 * Mark this as an RCU critical section so that
+			 * synchronize_kernel() in the unload path waits
+			 * for our completion.
+			 */
+			rcu_read_lock();
+			idle = pm_idle;
 
 			if (!idle)
 				idle = default_idle;
 
 			irq_stat[smp_processor_id()].idle_timestamp = jiffies;
 			idle();
+			rcu_read_unlock();
 		}
 		schedule();
 	}
diff -Nru a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
--- a/arch/ia64/kernel/process.c	2004-12-05 23:18:08 -08:00
+++ b/arch/ia64/kernel/process.c	2004-12-05 23:18:08 -08:00
@@ -228,18 +228,26 @@
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		void (*idle)(void) = pm_idle;
-		if (!idle)
-			idle = default_idle;
-
 #ifdef CONFIG_SMP
 		if (!need_resched())
 			min_xtp();
 #endif
 		while (!need_resched()) {
+			void (*idle)(void);
+
 			if (mark_idle)
 				(*mark_idle)(1);
+			/*
+			 * Mark this as an RCU critical section so that
+			 * synchronize_kernel() in the unload path waits
+			 * for our completion.
+			 */
+			rcu_read_lock();
+			idle = pm_idle;
+			if (!idle)
+				idle = default_idle;
 			(*idle)();
+			rcu_read_unlock();
 		}
 
 		if (mark_idle)
diff -Nru a/arch/x86_64/kernel/process.c b/arch/x86_64/kernel/process.c
--- a/arch/x86_64/kernel/process.c	2004-12-05 23:18:08 -08:00
+++ b/arch/x86_64/kernel/process.c	2004-12-05 23:18:08 -08:00
@@ -130,11 +130,20 @@
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		void (*idle)(void) = pm_idle;
-		if (!idle)
-			idle = default_idle;
-		while (!need_resched())
+		while (!need_resched()) {
+			void (*idle)(void);
+			/*
+			 * Mark this as an RCU critical section so that
+			 * synchronize_kernel() in the unload path waits
+			 * for our completion.
+			 */
+			rcu_read_lock();
+			idle = pm_idle;
+			if (!idle)
+				idle = default_idle;
 			idle();
+			rcu_read_unlock();
+		}
 		schedule();
 	}
 }
diff -Nru a/drivers/acpi/processor.c b/drivers/acpi/processor.c
--- a/drivers/acpi/processor.c	2004-12-05 23:18:08 -08:00
+++ b/drivers/acpi/processor.c	2004-12-05 23:18:08 -08:00
@@ -2419,6 +2419,11 @@
 	/* Unregister the idle handler when processor #0 is removed. */
 	if (pr->id == 0) {
 		pm_idle = pm_idle_save;
+		/*
+		 * We are about to unload the current idle thread pm callback
+		 * (pm_idle), Wait for all processors to update cached/local
+		 * copies of pm_idle before proceeding.
+		 */
 		synchronize_kernel();
 	}
 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Fw: [RFC] Strange code in cpu_idle()
  2004-12-06  7:20   ` Andrew Morton
@ 2004-12-06  9:38     ` Zwane Mwaikambo
  2004-12-06 16:04       ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-06  9:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Stephen Rothwell, paulmck, Linux Kernel, Li Shaohua

Hi,
	The original intent to go with synchronize_kernel and RCU 
protection was for simplicity's sake, as the alternative implementations 
at the time looked like major overkill. Now in defense of this method, 
when entering the idle thread and placing the processor in a holding state 
(hlt) and an RCU grace period is begun, the processor in the holding state 
will be unaware of the new RCU grace period until it exits the idle loop 
callback (pm_idle) anyway, so the rcu_read will block the other processors 
from making RCU grace period completion as much as the processor holding 
state. This is true of all current pm_idle callbacks on i386, x86_64 and 
ia64 with the exception of APM (but i'll conveniently ignore that for now 
;). When we do take an interrupt to exit the processor holding state and 
run through rcu_check_callbacks we will notice that we are in a hard 
interrupt and will defer marking of the processsor as quiescent. By that 
point we will have exited the idle thread callback therefore making it 
safe to use synchronize_kernel to protect removal of the callback.

Thanks,
	Zwane "who usually doesn't condone interface abuse" Mwaikambo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Fw: [RFC] Strange code in cpu_idle()
  2004-12-06  6:46   ` Stephen Rothwell
@ 2004-12-06 10:00     ` Zwane Mwaikambo
  0 siblings, 0 replies; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-06 10:00 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: paulmck, linux-kernel

On Mon, 6 Dec 2004, Stephen Rothwell wrote:

> On Mon, 6 Dec 2004 11:16:34 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > > Thoughts?
> > 
> > None, sorry :-)
> 
> Actually, Rusty suggests stop_machine() ...

Hmm i believe there is a window with CONFIG_PREEMPT whereupon we read the 
pm_idle pointer, get scheduled out to run the stopmachine thread and 
return with the stale pointer.

Thanks,
	Zwane


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Fw: [RFC] Strange code in cpu_idle()
  2004-12-06  9:38     ` Zwane Mwaikambo
@ 2004-12-06 16:04       ` Paul E. McKenney
  2004-12-06 16:47         ` Zwane Mwaikambo
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2004-12-06 16:04 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Andrew Morton, Stephen Rothwell, Linux Kernel, Li Shaohua

On Mon, Dec 06, 2004 at 02:38:33AM -0700, Zwane Mwaikambo wrote:
> Hi,
> 	The original intent to go with synchronize_kernel and RCU 
> protection was for simplicity's sake, as the alternative implementations 
> at the time looked like major overkill. Now in defense of this method, 
> when entering the idle thread and placing the processor in a holding state 
> (hlt) and an RCU grace period is begun, the processor in the holding state 
> will be unaware of the new RCU grace period until it exits the idle loop 
> callback (pm_idle) anyway, so the rcu_read will block the other processors 
> from making RCU grace period completion as much as the processor holding 
> state. This is true of all current pm_idle callbacks on i386, x86_64 and 
> ia64 with the exception of APM (but i'll conveniently ignore that for now 
> ;). When we do take an interrupt to exit the processor holding state and 
> run through rcu_check_callbacks we will notice that we are in a hard 
> interrupt and will defer marking of the processsor as quiescent. By that 
> point we will have exited the idle thread callback therefore making it 
> safe to use synchronize_kernel to protect removal of the callback.

I am not going to claim to thoroughly understand the power-management
code, but do have an additional question.

What happens if the processor became aware of a new grace period just
before entering pm_idle?  I could imagine this code simply refusing
to power down the processor if there was a pending grace period, but
I don't see any sign of this.  I could also imagine somehow deferring
interrupts until pm_idle exits.  I don't see anything that looks like
it does this, but don't claim to be any sort of power-management
expert.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Fw: [RFC] Strange code in cpu_idle()
  2004-12-06 16:04       ` Paul E. McKenney
@ 2004-12-06 16:47         ` Zwane Mwaikambo
  2004-12-06 19:22           ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-06 16:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Andrew Morton, Stephen Rothwell, Linux Kernel, Dipankar Sarma,
	Li Shaohua

On Mon, 6 Dec 2004, Paul E. McKenney wrote:

> I am not going to claim to thoroughly understand the power-management
> code, but do have an additional question.
> 
> What happens if the processor became aware of a new grace period just
> before entering pm_idle?  I could imagine this code simply refusing
> to power down the processor if there was a pending grace period, but
> I don't see any sign of this.  I could also imagine somehow deferring
> interrupts until pm_idle exits.  I don't see anything that looks like
> it does this, but don't claim to be any sort of power-management
> expert.

Are you referring to the synchronize_kernel side? That's basically unload 
module path so it's ok if it sits there for a bit, but it should only last 
for as long as the next interrupt, which would be a pretty short perid 
considering HZ=1000. But the usage still has a race and hence invalid as 
pointed out by Dipankar

Thanks,
	Zwane


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Fw: [RFC] Strange code in cpu_idle()
  2004-12-06 16:47         ` Zwane Mwaikambo
@ 2004-12-06 19:22           ` Paul E. McKenney
  2004-12-11 15:07             ` Zwane Mwaikambo
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2004-12-06 19:22 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Andrew Morton, Stephen Rothwell, Linux Kernel, Dipankar Sarma,
	Li Shaohua

On Mon, Dec 06, 2004 at 09:47:04AM -0700, Zwane Mwaikambo wrote:
> On Mon, 6 Dec 2004, Paul E. McKenney wrote:
> 
> > I am not going to claim to thoroughly understand the power-management
> > code, but do have an additional question.
> > 
> > What happens if the processor became aware of a new grace period just
> > before entering pm_idle?  I could imagine this code simply refusing
> > to power down the processor if there was a pending grace period, but
> > I don't see any sign of this.  I could also imagine somehow deferring
> > interrupts until pm_idle exits.  I don't see anything that looks like
> > it does this, but don't claim to be any sort of power-management
> > expert.
> 
> Are you referring to the synchronize_kernel side? That's basically unload 
> module path so it's ok if it sits there for a bit, but it should only last 
> for as long as the next interrupt, which would be a pretty short perid 
> considering HZ=1000. But the usage still has a race and hence invalid as 
> pointed out by Dipankar

My bad -- I hadn't read through the entire thread before responding,
so was speculating on how it might manage to be correct.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Fw: [RFC] Strange code in cpu_idle()
  2004-12-06 19:22           ` Paul E. McKenney
@ 2004-12-11 15:07             ` Zwane Mwaikambo
  2004-12-12  4:54               ` [PATCH] Remove RCU abuse " Zwane Mwaikambo
  0 siblings, 1 reply; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-11 15:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Andrew Morton, Stephen Rothwell, Linux Kernel, Dipankar Sarma,
	Li Shaohua, Len Brown

On Mon, 6 Dec 2004, Paul E. McKenney wrote:

> My bad -- I hadn't read through the entire thread before responding,
> so was speculating on how it might manage to be correct.

Ok, i'll come up with a method which doesn't use RCU. Thank you, for 
pointing out the faults.

	Zwane

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-11 15:07             ` Zwane Mwaikambo
@ 2004-12-12  4:54               ` Zwane Mwaikambo
  2004-12-12  5:06                 ` Zwane Mwaikambo
  0 siblings, 1 reply; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-12  4:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Andrew Morton, Stephen Rothwell, Linux Kernel, Dipankar Sarma,
	Li Shaohua, Len Brown

Introduce cpu_idle_wait() on architectures requiring modification of 
pm_idle from modules, this will ensure that all processors have updated 
their cached values of pm_idle upon exit. This patch is to address the bug 
report at http://bugme.osdl.org/show_bug.cgi?id=1716 and replaces the 
current code fix which is in violation of normal RCU usage as pointed out 
by Stephen, Dipankar and Paul.

Tested on EM64T/SMP/x86_64 with modprobe/rmmod loops of ACPI 
processor/thermal modules.

 arch/i386/kernel/apm.c       |    2 +-
 arch/i386/kernel/process.c   |   27 ++++++++++++++++++++-------
 arch/ia64/kernel/process.c   |   26 +++++++++++++++++++-------
 arch/x86_64/kernel/process.c |   27 ++++++++++++++++++++-------
 drivers/acpi/processor.c     |    2 +-
 include/asm-i386/system.h    |    1 +
 include/asm-ia64/system.h    |    1 +
 include/asm-x86_64/system.h  |    2 ++
 8 files changed, 65 insertions(+), 23 deletions(-)

Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>

Index: linux-2.6.10-rc2/arch/i386/kernel/apm.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/arch/i386/kernel/apm.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 apm.c
--- linux-2.6.10-rc2/arch/i386/kernel/apm.c	25 Nov 2004 19:45:32 -0000	1.1.1.1
+++ linux-2.6.10-rc2/arch/i386/kernel/apm.c	12 Dec 2004 04:28:11 -0000
@@ -2369,7 +2369,7 @@ static void __exit apm_exit(void)
 		 * (pm_idle), Wait for all processors to update cached/local
 		 * copies of pm_idle before proceeding.
 		 */
-		synchronize_kernel();
+		cpu_idle_wait();
 	}
 	if (((apm_info.bios.flags & APM_BIOS_DISENGAGED) == 0)
 	    && (apm_info.connection_version > 0x0100)) {
Index: linux-2.6.10-rc2/arch/i386/kernel/process.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/arch/i386/kernel/process.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 process.c
--- linux-2.6.10-rc2/arch/i386/kernel/process.c	25 Nov 2004 19:45:32 -0000	1.1.1.1
+++ linux-2.6.10-rc2/arch/i386/kernel/process.c	12 Dec 2004 04:29:51 -0000
@@ -72,6 +72,7 @@ unsigned long thread_saved_pc(struct tas
  * Powermanagement idle function, if any..
  */
 void (*pm_idle)(void);
+static cpumask_t cpu_idle_map;
 
 void disable_hlt(void)
 {
@@ -142,16 +143,16 @@ static void poll_idle (void)
  */
 void cpu_idle (void)
 {
+	int cpu = smp_processor_id();
+
 	/* endless idle loop with no priority at all */
 	while (1) {
 		while (!need_resched()) {
 			void (*idle)(void);
-			/*
-			 * Mark this as an RCU critical section so that
-			 * synchronize_kernel() in the unload path waits
-			 * for our completion.
-			 */
-			rcu_read_lock();
+
+			if (cpu_isset(cpu, cpu_idle_map))
+				cpu_clear(cpu, cpu_idle_map);
+			rmb();
 			idle = pm_idle;
 
 			if (!idle)
@@ -159,12 +160,24 @@ void cpu_idle (void)
 
 			irq_stat[smp_processor_id()].idle_timestamp = jiffies;
 			idle();
-			rcu_read_unlock();
 		}
 		schedule();
 	}
 }
 
+void cpu_idle_wait(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		cpu_set(cpu, cpu_idle_map);
+
+	wmb();	
+	while (cpus_equal(cpu_idle_map, cpu_online_map))
+		schedule_timeout(HZ);
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
 /*
  * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
  * which can obviate IPI to trigger checking of need_resched.
Index: linux-2.6.10-rc2/arch/ia64/kernel/process.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/arch/ia64/kernel/process.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 process.c
--- linux-2.6.10-rc2/arch/ia64/kernel/process.c	25 Nov 2004 19:45:32 -0000	1.1.1.1
+++ linux-2.6.10-rc2/arch/ia64/kernel/process.c	12 Dec 2004 04:28:11 -0000
@@ -46,6 +46,7 @@
 #include "sigframe.h"
 
 void (*ia64_mark_idle)(int);
+static cpumask_t cpu_idle_map;
 
 unsigned long boot_option_idle_override = 0;
 EXPORT_SYMBOL(boot_option_idle_override);
@@ -223,10 +224,24 @@ static inline void play_dead(void)
 }
 #endif /* CONFIG_HOTPLUG_CPU */
 
+void cpu_idle_wait(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		cpu_set(cpu, cpu_idle_map);
+
+	wmb();	
+	while (cpus_equal(cpu_idle_map, cpu_online_map))
+		schedule_timeout(HZ);
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
 void __attribute__((noreturn))
 cpu_idle (void *unused)
 {
 	void (*mark_idle)(int) = ia64_mark_idle;
+	int cpu = smp_processor_id();
 
 	/* endless idle loop with no priority at all */
 	while (1) {
@@ -239,17 +254,14 @@ cpu_idle (void *unused)
 
 			if (mark_idle)
 				(*mark_idle)(1);
-			/*
-			 * Mark this as an RCU critical section so that
-			 * synchronize_kernel() in the unload path waits
-			 * for our completion.
-			 */
-			rcu_read_lock();
+
+			if (cpu_isset(cpu, cpu_idle_map))
+				cpu_clear(cpu, cpu_idle_map);
+			rmb();
 			idle = pm_idle;
 			if (!idle)
 				idle = default_idle;
 			(*idle)();
-			rcu_read_unlock();
 		}
 
 		if (mark_idle)
Index: linux-2.6.10-rc2/arch/x86_64/kernel/process.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/arch/x86_64/kernel/process.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 process.c
--- linux-2.6.10-rc2/arch/x86_64/kernel/process.c	25 Nov 2004 19:45:32 -0000	1.1.1.1
+++ linux-2.6.10-rc2/arch/x86_64/kernel/process.c	12 Dec 2004 04:28:11 -0000
@@ -61,6 +61,7 @@ EXPORT_SYMBOL(boot_option_idle_override)
  * Powermanagement idle function, if any..
  */
 void (*pm_idle)(void);
+static cpumask_t cpu_idle_map;
 
 void disable_hlt(void)
 {
@@ -123,6 +124,19 @@ static void poll_idle (void)
 	}
 }
 
+void cpu_idle_wait(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		cpu_set(cpu, cpu_idle_map);
+	
+	wmb();	
+	while (cpus_equal(cpu_idle_map, cpu_online_map))
+		schedule_timeout(HZ);
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
 /*
  * The idle thread. There's no useful work to be
  * done, so just try to conserve power and have a
@@ -131,21 +145,20 @@ static void poll_idle (void)
  */
 void cpu_idle (void)
 {
+	int cpu = smp_processor_id();
+
 	/* endless idle loop with no priority at all */
 	while (1) {
 		while (!need_resched()) {
 			void (*idle)(void);
-			/*
-			 * Mark this as an RCU critical section so that
-			 * synchronize_kernel() in the unload path waits
-			 * for our completion.
-			 */
-			rcu_read_lock();
+
+			if (cpu_isset(cpu, cpu_idle_map))
+				cpu_clear(cpu, cpu_idle_map);
+			rmb();
 			idle = pm_idle;
 			if (!idle)
 				idle = default_idle;
 			idle();
-			rcu_read_unlock();
 		}
 		schedule();
 	}
Index: linux-2.6.10-rc2/drivers/acpi/processor.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/drivers/acpi/processor.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 processor.c
--- linux-2.6.10-rc2/drivers/acpi/processor.c	25 Nov 2004 19:45:34 -0000	1.1.1.1
+++ linux-2.6.10-rc2/drivers/acpi/processor.c	12 Dec 2004 04:28:11 -0000
@@ -2535,7 +2535,7 @@ acpi_processor_remove (
 		 * (pm_idle), Wait for all processors to update cached/local
 		 * copies of pm_idle before proceeding.
 		 */
-		synchronize_kernel();
+		cpu_idle_wait();
 	}
 
 	status = acpi_remove_notify_handler(pr->handle, ACPI_DEVICE_NOTIFY, 
Index: linux-2.6.10-rc2/include/asm-i386/system.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/include/asm-i386/system.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 system.h
--- linux-2.6.10-rc2/include/asm-i386/system.h	25 Nov 2004 19:45:33 -0000	1.1.1.1
+++ linux-2.6.10-rc2/include/asm-i386/system.h	12 Dec 2004 04:28:11 -0000
@@ -466,5 +466,6 @@ void disable_hlt(void);
 void enable_hlt(void);
 
 extern int es7000_plat;
+void cpu_idle_wait(void);
 
 #endif
Index: linux-2.6.10-rc2/include/asm-ia64/system.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/include/asm-ia64/system.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 system.h
--- linux-2.6.10-rc2/include/asm-ia64/system.h	25 Nov 2004 19:45:33 -0000	1.1.1.1
+++ linux-2.6.10-rc2/include/asm-ia64/system.h	12 Dec 2004 04:28:11 -0000
@@ -284,6 +284,7 @@ do {						\
 
 #define ia64_platform_is(x) (strcmp(x, platform_name) == 0)
 
+void cpu_idle_wait(void);
 #endif /* __KERNEL__ */
 
 #endif /* __ASSEMBLY__ */
Index: linux-2.6.10-rc2/include/asm-x86_64/system.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/include/asm-x86_64/system.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 system.h
--- linux-2.6.10-rc2/include/asm-x86_64/system.h	25 Nov 2004 19:45:33 -0000	1.1.1.1
+++ linux-2.6.10-rc2/include/asm-x86_64/system.h	12 Dec 2004 04:28:11 -0000
@@ -326,6 +326,8 @@ static inline unsigned long __cmpxchg(vo
 /* For spinlocks etc */
 #define local_irq_save(x) 	do { warn_if_not_ulong(x); __asm__ __volatile__("# local_irq_save \n\t pushfq ; popq %0 ; cli":"=g" (x): /* no input */ :"memory"); } while (0)
 
+void cpu_idle_wait(void);
+
 /*
  * disable hlt during certain critical i/o operations
  */

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-12  4:54               ` [PATCH] Remove RCU abuse " Zwane Mwaikambo
@ 2004-12-12  5:06                 ` Zwane Mwaikambo
  2004-12-12  5:49                   ` Zwane Mwaikambo
  0 siblings, 1 reply; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-12  5:06 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Andrew Morton, Stephen Rothwell, Linux Kernel, Dipankar Sarma,
	Li Shaohua, Len Brown

On Sat, 11 Dec 2004, Zwane Mwaikambo wrote:

> Introduce cpu_idle_wait() on architectures requiring modification of 
> pm_idle from modules, this will ensure that all processors have updated 
> their cached values of pm_idle upon exit. This patch is to address the bug 
> report at http://bugme.osdl.org/show_bug.cgi?id=1716 and replaces the 
> current code fix which is in violation of normal RCU usage as pointed out 
> by Stephen, Dipankar and Paul.

Aargh, i shouldn't be programming whilst sick :( i'll repost a non broken 
version shortly.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-12  5:06                 ` Zwane Mwaikambo
@ 2004-12-12  5:49                   ` Zwane Mwaikambo
  2004-12-13  6:13                     ` Andrew Morton
  2004-12-19  2:40                     ` Nish Aravamudan
  0 siblings, 2 replies; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-12  5:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Andrew Morton, Stephen Rothwell, Linux Kernel, Dipankar Sarma,
	Li Shaohua, Len Brown

On Sat, 11 Dec 2004, Zwane Mwaikambo wrote:

> > Introduce cpu_idle_wait() on architectures requiring modification of 
> > pm_idle from modules, this will ensure that all processors have updated 
> > their cached values of pm_idle upon exit. This patch is to address the bug 
> > report at http://bugme.osdl.org/show_bug.cgi?id=1716 and replaces the 
> > current code fix which is in violation of normal RCU usage as pointed out 
> > by Stephen, Dipankar and Paul.

 arch/i386/kernel/apm.c       |    2 +-
 arch/i386/kernel/process.c   |   30 +++++++++++++++++++++++-------
 arch/ia64/kernel/process.c   |   30 +++++++++++++++++++++++-------
 arch/x86_64/kernel/process.c |   31 ++++++++++++++++++++++++-------
 drivers/acpi/processor.c     |    2 +-
 include/asm-i386/system.h    |    1 +
 include/asm-ia64/system.h    |    1 +
 include/asm-x86_64/system.h  |    2 ++
 8 files changed, 76 insertions(+), 23 deletions(-)

Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>

Index: linux-2.6.10-rc2/arch/i386/kernel/apm.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/arch/i386/kernel/apm.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 apm.c
--- linux-2.6.10-rc2/arch/i386/kernel/apm.c	25 Nov 2004 19:45:32 -0000	1.1.1.1
+++ linux-2.6.10-rc2/arch/i386/kernel/apm.c	12 Dec 2004 04:28:11 -0000
@@ -2369,7 +2369,7 @@ static void __exit apm_exit(void)
 		 * (pm_idle), Wait for all processors to update cached/local
 		 * copies of pm_idle before proceeding.
 		 */
-		synchronize_kernel();
+		cpu_idle_wait();
 	}
 	if (((apm_info.bios.flags & APM_BIOS_DISENGAGED) == 0)
 	    && (apm_info.connection_version > 0x0100)) {
Index: linux-2.6.10-rc2/arch/i386/kernel/process.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/arch/i386/kernel/process.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 process.c
--- linux-2.6.10-rc2/arch/i386/kernel/process.c	25 Nov 2004 19:45:32 -0000	1.1.1.1
+++ linux-2.6.10-rc2/arch/i386/kernel/process.c	12 Dec 2004 05:11:27 -0000
@@ -72,6 +72,7 @@ unsigned long thread_saved_pc(struct tas
  * Powermanagement idle function, if any..
  */
 void (*pm_idle)(void);
+static cpumask_t cpu_idle_map;
 
 void disable_hlt(void)
 {
@@ -142,16 +143,16 @@ static void poll_idle (void)
  */
 void cpu_idle (void)
 {
+	int cpu = smp_processor_id();
+
 	/* endless idle loop with no priority at all */
 	while (1) {
 		while (!need_resched()) {
 			void (*idle)(void);
-			/*
-			 * Mark this as an RCU critical section so that
-			 * synchronize_kernel() in the unload path waits
-			 * for our completion.
-			 */
-			rcu_read_lock();
+
+			if (cpu_isset(cpu, cpu_idle_map))
+				cpu_clear(cpu, cpu_idle_map);
+			rmb();
 			idle = pm_idle;
 
 			if (!idle)
@@ -159,12 +160,27 @@ void cpu_idle (void)
 
 			irq_stat[smp_processor_id()].idle_timestamp = jiffies;
 			idle();
-			rcu_read_unlock();
 		}
 		schedule();
 	}
 }
 
+void cpu_idle_wait(void)
+{
+	int cpu;
+	cpumask_t map;
+
+	for_each_online_cpu(cpu)
+		cpu_set(cpu, cpu_idle_map);
+
+	wmb();
+	do {
+		schedule_timeout(HZ);
+		cpus_and(map, cpu_idle_map, cpu_online_map);
+	} while (!cpus_empty(map));
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
 /*
  * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
  * which can obviate IPI to trigger checking of need_resched.
Index: linux-2.6.10-rc2/arch/ia64/kernel/process.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/arch/ia64/kernel/process.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 process.c
--- linux-2.6.10-rc2/arch/ia64/kernel/process.c	25 Nov 2004 19:45:32 -0000	1.1.1.1
+++ linux-2.6.10-rc2/arch/ia64/kernel/process.c	12 Dec 2004 05:11:56 -0000
@@ -46,6 +46,7 @@
 #include "sigframe.h"
 
 void (*ia64_mark_idle)(int);
+static cpumask_t cpu_idle_map;
 
 unsigned long boot_option_idle_override = 0;
 EXPORT_SYMBOL(boot_option_idle_override);
@@ -223,10 +224,28 @@ static inline void play_dead(void)
 }
 #endif /* CONFIG_HOTPLUG_CPU */
 
+
+void cpu_idle_wait(void)
+{
+        int cpu;
+        cpumask_t map;
+
+        for_each_online_cpu(cpu)
+                cpu_set(cpu, cpu_idle_map);
+
+        wmb();
+        do {
+                schedule_timeout(HZ);
+                cpus_and(map, cpu_idle_map, cpu_online_map);
+        } while (!cpus_empty(map));
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
 void __attribute__((noreturn))
 cpu_idle (void *unused)
 {
 	void (*mark_idle)(int) = ia64_mark_idle;
+	int cpu = smp_processor_id();
 
 	/* endless idle loop with no priority at all */
 	while (1) {
@@ -239,17 +258,14 @@ cpu_idle (void *unused)
 
 			if (mark_idle)
 				(*mark_idle)(1);
-			/*
-			 * Mark this as an RCU critical section so that
-			 * synchronize_kernel() in the unload path waits
-			 * for our completion.
-			 */
-			rcu_read_lock();
+
+			if (cpu_isset(cpu, cpu_idle_map))
+				cpu_clear(cpu, cpu_idle_map);
+			rmb();
 			idle = pm_idle;
 			if (!idle)
 				idle = default_idle;
 			(*idle)();
-			rcu_read_unlock();
 		}
 
 		if (mark_idle)
Index: linux-2.6.10-rc2/arch/x86_64/kernel/process.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/arch/x86_64/kernel/process.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 process.c
--- linux-2.6.10-rc2/arch/x86_64/kernel/process.c	25 Nov 2004 19:45:32 -0000	1.1.1.1
+++ linux-2.6.10-rc2/arch/x86_64/kernel/process.c	12 Dec 2004 05:26:48 -0000
@@ -61,6 +61,7 @@ EXPORT_SYMBOL(boot_option_idle_override)
  * Powermanagement idle function, if any..
  */
 void (*pm_idle)(void);
+static cpumask_t cpu_idle_map;
 
 void disable_hlt(void)
 {
@@ -123,6 +124,23 @@ static void poll_idle (void)
 	}
 }
 
+
+void cpu_idle_wait(void)
+{
+        int cpu;
+        cpumask_t map;
+
+        for_each_online_cpu(cpu)
+                cpu_set(cpu, cpu_idle_map);
+
+        wmb();
+        do {
+                schedule_timeout(HZ);
+                cpus_and(map, cpu_idle_map, cpu_online_map);
+        } while (!cpus_empty(map));
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
 /*
  * The idle thread. There's no useful work to be
  * done, so just try to conserve power and have a
@@ -131,21 +149,20 @@ static void poll_idle (void)
  */
 void cpu_idle (void)
 {
+	int cpu = smp_processor_id();
+
 	/* endless idle loop with no priority at all */
 	while (1) {
 		while (!need_resched()) {
 			void (*idle)(void);
-			/*
-			 * Mark this as an RCU critical section so that
-			 * synchronize_kernel() in the unload path waits
-			 * for our completion.
-			 */
-			rcu_read_lock();
+
+			if (cpu_isset(cpu, cpu_idle_map))
+				cpu_clear(cpu, cpu_idle_map);
+			rmb();
 			idle = pm_idle;
 			if (!idle)
 				idle = default_idle;
 			idle();
-			rcu_read_unlock();
 		}
 		schedule();
 	}
Index: linux-2.6.10-rc2/drivers/acpi/processor.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/drivers/acpi/processor.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 processor.c
--- linux-2.6.10-rc2/drivers/acpi/processor.c	25 Nov 2004 19:45:34 -0000	1.1.1.1
+++ linux-2.6.10-rc2/drivers/acpi/processor.c	12 Dec 2004 04:28:11 -0000
@@ -2535,7 +2535,7 @@ acpi_processor_remove (
 		 * (pm_idle), Wait for all processors to update cached/local
 		 * copies of pm_idle before proceeding.
 		 */
-		synchronize_kernel();
+		cpu_idle_wait();
 	}
 
 	status = acpi_remove_notify_handler(pr->handle, ACPI_DEVICE_NOTIFY, 
Index: linux-2.6.10-rc2/include/asm-i386/system.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/include/asm-i386/system.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 system.h
--- linux-2.6.10-rc2/include/asm-i386/system.h	25 Nov 2004 19:45:33 -0000	1.1.1.1
+++ linux-2.6.10-rc2/include/asm-i386/system.h	12 Dec 2004 04:28:11 -0000
@@ -466,5 +466,6 @@ void disable_hlt(void);
 void enable_hlt(void);
 
 extern int es7000_plat;
+void cpu_idle_wait(void);
 
 #endif
Index: linux-2.6.10-rc2/include/asm-ia64/system.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/include/asm-ia64/system.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 system.h
--- linux-2.6.10-rc2/include/asm-ia64/system.h	25 Nov 2004 19:45:33 -0000	1.1.1.1
+++ linux-2.6.10-rc2/include/asm-ia64/system.h	12 Dec 2004 04:28:11 -0000
@@ -284,6 +284,7 @@ do {						\
 
 #define ia64_platform_is(x) (strcmp(x, platform_name) == 0)
 
+void cpu_idle_wait(void);
 #endif /* __KERNEL__ */
 
 #endif /* __ASSEMBLY__ */
Index: linux-2.6.10-rc2/include/asm-x86_64/system.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-rc2/include/asm-x86_64/system.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 system.h
--- linux-2.6.10-rc2/include/asm-x86_64/system.h	25 Nov 2004 19:45:33 -0000	1.1.1.1
+++ linux-2.6.10-rc2/include/asm-x86_64/system.h	12 Dec 2004 04:28:11 -0000
@@ -326,6 +326,8 @@ static inline unsigned long __cmpxchg(vo
 /* For spinlocks etc */
 #define local_irq_save(x) 	do { warn_if_not_ulong(x); __asm__ __volatile__("# local_irq_save \n\t pushfq ; popq %0 ; cli":"=g" (x): /* no input */ :"memory"); } while (0)
 
+void cpu_idle_wait(void);
+
 /*
  * disable hlt during certain critical i/o operations
  */

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-12  5:49                   ` Zwane Mwaikambo
@ 2004-12-13  6:13                     ` Andrew Morton
  2004-12-13  6:22                       ` Zwane Mwaikambo
  2004-12-19  2:40                     ` Nish Aravamudan
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2004-12-13  6:13 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: paulmck, sfr, linux-kernel, dipankar, shaohua.li, len.brown

Zwane Mwaikambo <zwane@arm.linux.org.uk> wrote:
>
> On Sat, 11 Dec 2004, Zwane Mwaikambo wrote:
> 
> > > Introduce cpu_idle_wait() on architectures requiring modification of 
> > > pm_idle from modules, this will ensure that all processors have updated 
> > > their cached values of pm_idle upon exit. This patch is to address the bug 
> > > report at http://bugme.osdl.org/show_bug.cgi?id=1716 and replaces the 
> > > current code fix which is in violation of normal RCU usage as pointed out 
> > > by Stephen, Dipankar and Paul.
> 
> ...
> 
>  void cpu_idle (void)
>  {
> +	int cpu = smp_processor_id();
> +
>  	/* endless idle loop with no priority at all */

This gives me scadzillions of "using smp_procesor_id() in preemptible"
warnings.

I'll shut that up with _smp_processor_id() but one does wonder what happens
if we get preempted and `cpu' refers to some other CPU?


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-13  6:13                     ` Andrew Morton
@ 2004-12-13  6:22                       ` Zwane Mwaikambo
  2004-12-13  6:32                         ` Andrew Morton
  2004-12-13  6:41                         ` Andrew Morton
  0 siblings, 2 replies; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-13  6:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: paulmck, sfr, linux-kernel, dipankar, shaohua.li, len.brown

On Sun, 12 Dec 2004, Andrew Morton wrote:

> >  void cpu_idle (void)
> >  {
> > +	int cpu = smp_processor_id();
> > +
> >  	/* endless idle loop with no priority at all */
> 
> This gives me scadzillions of "using smp_procesor_id() in preemptible"
> warnings.
> 
> I'll shut that up with _smp_processor_id() but one does wonder what happens
> if we get preempted and `cpu' refers to some other CPU?

The idle thread is special in the sense that it can't get migrated so the 
cached values of smp_processor_id are fine.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-13  6:22                       ` Zwane Mwaikambo
@ 2004-12-13  6:32                         ` Andrew Morton
  2004-12-13  7:09                           ` Zwane Mwaikambo
  2004-12-13  6:41                         ` Andrew Morton
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2004-12-13  6:32 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: paulmck, sfr, linux-kernel, dipankar, shaohua.li, len.brown

Zwane Mwaikambo <zwane@arm.linux.org.uk> wrote:
>
> The idle thread is special in the sense that it can't get migrated so the 
>  cached values of smp_processor_id are fine.

duh, knew that.

We can use the cached value throughout, no?

--- 25/arch/i386/kernel/process.c~remove-rcu-abuse-in-cpu_idle-warning-fix	2004-12-12 22:30:10.200626944 -0800
+++ 25-akpm/arch/i386/kernel/process.c	2004-12-12 22:31:22.417648288 -0800
@@ -146,7 +146,7 @@ static void poll_idle (void)
  */
 void cpu_idle (void)
 {
-	int cpu = smp_processor_id();
+	int cpu = _smp_processor_id();
 
 	/* endless idle loop with no priority at all */
 	while (1) {
@@ -161,7 +161,7 @@ void cpu_idle (void)
 			if (!idle)
 				idle = default_idle;
 
-			irq_stat[smp_processor_id()].idle_timestamp = jiffies;
+			irq_stat[cpu].idle_timestamp = jiffies;
 			idle();
 		}
 		schedule();
_


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-13  6:22                       ` Zwane Mwaikambo
  2004-12-13  6:32                         ` Andrew Morton
@ 2004-12-13  6:41                         ` Andrew Morton
  2004-12-13  7:13                           ` Zwane Mwaikambo
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2004-12-13  6:41 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: paulmck, sfr, linux-kernel, dipankar, shaohua.li, len.brown

Zwane Mwaikambo <zwane@arm.linux.org.uk> wrote:
>
> > This gives me scadzillions of "using smp_procesor_id() in preemptible"
>  > warnings.

As does the current_cpu_data evaluation in default_idle(), now we've gone
and dropped the rcu_read_lock() from around it.  That debugging patch is a
pain.

I'll switch it to boot_cpu_data.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-13  6:32                         ` Andrew Morton
@ 2004-12-13  7:09                           ` Zwane Mwaikambo
  0 siblings, 0 replies; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-13  7:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: paulmck, sfr, linux-kernel, dipankar, shaohua.li, len.brown

On Sun, 12 Dec 2004, Andrew Morton wrote:

> Zwane Mwaikambo <zwane@arm.linux.org.uk> wrote:
> >
> > The idle thread is special in the sense that it can't get migrated so the 
> >  cached values of smp_processor_id are fine.
> 
> duh, knew that.
> 
> We can use the cached value throughout, no?

Oh yes, silly me i forgot about those.

Thanks!


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-13  6:41                         ` Andrew Morton
@ 2004-12-13  7:13                           ` Zwane Mwaikambo
  0 siblings, 0 replies; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-13  7:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: paulmck, sfr, linux-kernel, dipankar, shaohua.li, len.brown

On Sun, 12 Dec 2004, Andrew Morton wrote:

> Zwane Mwaikambo <zwane@arm.linux.org.uk> wrote:
> >
> > > This gives me scadzillions of "using smp_procesor_id() in preemptible"
> >  > warnings.
> 
> As does the current_cpu_data evaluation in default_idle(), now we've gone
> and dropped the rcu_read_lock() from around it.  That debugging patch is a
> pain.
> 
> I'll switch it to boot_cpu_data.

I really should have turned on all the debugging knobs, although i might 
have hesitated before scattering _smp_processor_id in places.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-12  5:49                   ` Zwane Mwaikambo
  2004-12-13  6:13                     ` Andrew Morton
@ 2004-12-19  2:40                     ` Nish Aravamudan
  2004-12-20  0:59                       ` Zwane Mwaikambo
  1 sibling, 1 reply; 32+ messages in thread
From: Nish Aravamudan @ 2004-12-19  2:40 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Paul E. McKenney, Andrew Morton, Stephen Rothwell, Linux Kernel,
	Dipankar Sarma, Li Shaohua, Len Brown

On Sat, 11 Dec 2004 22:49:28 -0700 (MST), Zwane Mwaikambo
<zwane@arm.linux.org.uk> wrote:
> On Sat, 11 Dec 2004, Zwane Mwaikambo wrote:
> 
> > > Introduce cpu_idle_wait() on architectures requiring modification of
> > > pm_idle from modules, this will ensure that all processors have updated
> > > their cached values of pm_idle upon exit. This patch is to address the bug
> > > report at http://bugme.osdl.org/show_bug.cgi?id=1716 and replaces the
> > > current code fix which is in violation of normal RCU usage as pointed out
> > > by Stephen, Dipankar and Paul.

<snip>

> +       wmb();
> +       do {
> +               schedule_timeout(HZ);
> +               cpus_and(map, cpu_idle_map, cpu_online_map);
> +       } while (!cpus_empty(map));

<snip>

All of these schedule_timeout() calls are broken. They do not set the
state before hand and therefore will return early. Since you're not
checking for signals and there are no waitqueue events around the
code, I'm assuming you can just use ssleep(1) instead of the current
schedule_timeout() calls.

Thanks,
Nish

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-19  2:40                     ` Nish Aravamudan
@ 2004-12-20  0:59                       ` Zwane Mwaikambo
  2004-12-20  1:15                         ` Nick Piggin
  0 siblings, 1 reply; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-20  0:59 UTC (permalink / raw)
  To: Nish Aravamudan
  Cc: Paul E. McKenney, Andrew Morton, Stephen Rothwell, Linux Kernel,
	Dipankar Sarma, Li Shaohua, Len Brown

On Sat, 18 Dec 2004, Nish Aravamudan wrote:

> All of these schedule_timeout() calls are broken. They do not set the
> state before hand and therefore will return early. Since you're not
> checking for signals and there are no waitqueue events around the
> code, I'm assuming you can just use ssleep(1) instead of the current
> schedule_timeout() calls.

Returning early is fine (and will happen if the other processors are 
busy), we're spinning on a condition, but yes ssleep() could be used 
instead.

Thanks,
	Zwane


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-20  0:59                       ` Zwane Mwaikambo
@ 2004-12-20  1:15                         ` Nick Piggin
  2004-12-20  1:44                           ` Zwane Mwaikambo
  0 siblings, 1 reply; 32+ messages in thread
From: Nick Piggin @ 2004-12-20  1:15 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Nish Aravamudan, Paul E. McKenney, Andrew Morton,
	Stephen Rothwell, Linux Kernel, Dipankar Sarma, Li Shaohua,
	Len Brown

On Sun, 2004-12-19 at 17:59 -0700, Zwane Mwaikambo wrote:
> On Sat, 18 Dec 2004, Nish Aravamudan wrote:
> 
> > All of these schedule_timeout() calls are broken. They do not set the
> > state before hand and therefore will return early. Since you're not
> > checking for signals and there are no waitqueue events around the
> > code, I'm assuming you can just use ssleep(1) instead of the current
> > schedule_timeout() calls.
> 
> Returning early is fine (and will happen if the other processors are 
> busy), we're spinning on a condition, but yes ssleep() could be used 
> instead.
> 

Hi Zwane,

This thread can possibly be stalled forever if there is a CPU hog
running, right?

In which case, you will want to use ssleep rather than a busy loop.

Another alternative may be to use more complex logic to detect that a
CPU is not in the idle loop at all. In that case, a simple cpu_relax
type spin loop should be OK, because the synchronisation would be
achieved very quickly.

Nick



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-20  1:15                         ` Nick Piggin
@ 2004-12-20  1:44                           ` Zwane Mwaikambo
  2004-12-20  1:56                             ` Nick Piggin
  0 siblings, 1 reply; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-20  1:44 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Nish Aravamudan, Paul E. McKenney, Andrew Morton,
	Stephen Rothwell, Linux Kernel, Dipankar Sarma, Li Shaohua,
	Len Brown

On Mon, 20 Dec 2004, Nick Piggin wrote:

> This thread can possibly be stalled forever if there is a CPU hog
> running, right?

Yep.

> In which case, you will want to use ssleep rather than a busy loop.

Well ssleep essentially does the same thing as the schedule_timeout.

> Another alternative may be to use more complex logic to detect that a
> CPU is not in the idle loop at all. In that case, a simple cpu_relax
> type spin loop should be OK, because the synchronisation would be
> achieved very quickly.

I considered checking whether the cpu is in the idle thread or not but 
wouldn't that require locking runqueues? Something like;

pm_idle = new_value;
wmb();
busy_map = cpu_online_map;
for_each_online_cpu(cpu) {
	runqueue_t *rq = cpu_rq(cpu);
	spin_lock_irq(&rq->lock);
	if (rq->curr != rq->idle)
		cpu_clear(cpu, busy_map);
	spin_unlock_irq(&rq->lock);
}

cpu_idle_map = busy_map;
wmb();

while (!cpus_empty(cpu_idle_map)) {
	cpus_and(cpu_idle_map, cpu_idle_map, cpu_online_map);
	ssleep(1);
}

Hmm then again, i think we could get away with doing an unlocked compare 
on the rq->curr and rq->idle since we've written back pm_idle and reading 
a stale rq->curr which isn't equal to rq->idle means that the remote 
processor should also have the new pm_idle. I'm still not convinced that 
it deserves this much complexity, this is a rarely carried out operation 
and usually at boottime or shutdown.

Thanks,
	Zwane

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-20  1:44                           ` Zwane Mwaikambo
@ 2004-12-20  1:56                             ` Nick Piggin
  2004-12-20  2:10                               ` Zwane Mwaikambo
  2004-12-20  2:26                               ` Nish Aravamudan
  0 siblings, 2 replies; 32+ messages in thread
From: Nick Piggin @ 2004-12-20  1:56 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Nish Aravamudan, Paul E. McKenney, Andrew Morton,
	Stephen Rothwell, Linux Kernel, Dipankar Sarma, Li Shaohua,
	Len Brown

On Sun, 2004-12-19 at 18:44 -0700, Zwane Mwaikambo wrote:
> On Mon, 20 Dec 2004, Nick Piggin wrote:
> 
> > This thread can possibly be stalled forever if there is a CPU hog
> > running, right?
> 
> Yep.
> 
> > In which case, you will want to use ssleep rather than a busy loop.
> 
> Well ssleep essentially does the same thing as the schedule_timeout.
> 

Yes - so long as you set ->state when using schedule_timeout ;)

> > Another alternative may be to use more complex logic to detect that a
> > CPU is not in the idle loop at all. In that case, a simple cpu_relax
> > type spin loop should be OK, because the synchronisation would be
> > achieved very quickly.
> 
> I considered checking whether the cpu is in the idle thread or not but 
> wouldn't that require locking runqueues? Something like;
> 
> pm_idle = new_value;
> wmb();
> busy_map = cpu_online_map;
> for_each_online_cpu(cpu) {
> 	runqueue_t *rq = cpu_rq(cpu);
> 	spin_lock_irq(&rq->lock);
> 	if (rq->curr != rq->idle)
> 		cpu_clear(cpu, busy_map);
> 	spin_unlock_irq(&rq->lock);
> }
> 
> cpu_idle_map = busy_map;
> wmb();
> 
> while (!cpus_empty(cpu_idle_map)) {
> 	cpus_and(cpu_idle_map, cpu_idle_map, cpu_online_map);
> 	ssleep(1);
> }
> 
> Hmm then again, i think we could get away with doing an unlocked compare 
> on the rq->curr and rq->idle since we've written back pm_idle and reading 
> a stale rq->curr which isn't equal to rq->idle means that the remote 
> processor should also have the new pm_idle. I'm still not convinced that 
> it deserves this much complexity, this is a rarely carried out operation 
> and usually at boottime or shutdown.
> 

Hmm, yeah it is fairly complex, now that you've expanded on my
handwaving!

I think you are right about not requiring locking, so long as you
have the appropriate memory barriers in place... but it's probably
not worth the effort, as you say.

Nick



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-20  1:56                             ` Nick Piggin
@ 2004-12-20  2:10                               ` Zwane Mwaikambo
  2004-12-20  2:30                                 ` Nish Aravamudan
  2004-12-20 18:27                                 ` Nishanth Aravamudan
  2004-12-20  2:26                               ` Nish Aravamudan
  1 sibling, 2 replies; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-20  2:10 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Nish Aravamudan, Paul E. McKenney, Andrew Morton,
	Stephen Rothwell, Linux Kernel, Dipankar Sarma, Li Shaohua,
	Len Brown

On Mon, 20 Dec 2004, Nick Piggin wrote:

> On Sun, 2004-12-19 at 18:44 -0700, Zwane Mwaikambo wrote:
> > On Mon, 20 Dec 2004, Nick Piggin wrote:
> > 
> > > This thread can possibly be stalled forever if there is a CPU hog
> > > running, right?
> > 
> > Yep.
> > 
> > > In which case, you will want to use ssleep rather than a busy loop.
> > 
> > Well ssleep essentially does the same thing as the schedule_timeout.
> > 
> 
> Yes - so long as you set ->state when using schedule_timeout ;)

Nish could you please submit something to switch it to ssleep?

Thanks,
	Zwane

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-20  1:56                             ` Nick Piggin
  2004-12-20  2:10                               ` Zwane Mwaikambo
@ 2004-12-20  2:26                               ` Nish Aravamudan
  1 sibling, 0 replies; 32+ messages in thread
From: Nish Aravamudan @ 2004-12-20  2:26 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Zwane Mwaikambo, Paul E. McKenney, Andrew Morton,
	Stephen Rothwell, Linux Kernel, Dipankar Sarma, Li Shaohua,
	Len Brown

On Mon, 20 Dec 2004 12:56:24 +1100, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> On Sun, 2004-12-19 at 18:44 -0700, Zwane Mwaikambo wrote:
> > On Mon, 20 Dec 2004, Nick Piggin wrote:
> >
> > > This thread can possibly be stalled forever if there is a CPU hog
> > > running, right?
> >
> > Yep.
> >
> > > In which case, you will want to use ssleep rather than a busy loop.
> >
> > Well ssleep essentially does the same thing as the schedule_timeout.
> >
> 
> Yes - so long as you set ->state when using schedule_timeout ;)

You do not need to set the state before calling ssleep(). It is set to
TASK_UNINTERRUPTIBLE within the loop of ssleep().

-Nish

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-20  2:10                               ` Zwane Mwaikambo
@ 2004-12-20  2:30                                 ` Nish Aravamudan
  2004-12-20 18:27                                 ` Nishanth Aravamudan
  1 sibling, 0 replies; 32+ messages in thread
From: Nish Aravamudan @ 2004-12-20  2:30 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Nick Piggin, Paul E. McKenney, Andrew Morton, Stephen Rothwell,
	Linux Kernel, Dipankar Sarma, Li Shaohua, Len Brown

On Sun, 19 Dec 2004 19:10:31 -0700 (MST), Zwane Mwaikambo
<zwane@arm.linux.org.uk> wrote:
> On Mon, 20 Dec 2004, Nick Piggin wrote:
> 
> > On Sun, 2004-12-19 at 18:44 -0700, Zwane Mwaikambo wrote:
> > > On Mon, 20 Dec 2004, Nick Piggin wrote:
> > >
> > > > This thread can possibly be stalled forever if there is a CPU hog
> > > > running, right?
> > >
> > > Yep.
> > >
> > > > In which case, you will want to use ssleep rather than a busy loop.
> > >
> > > Well ssleep essentially does the same thing as the schedule_timeout.
> > >
> >
> > Yes - so long as you set ->state when using schedule_timeout ;)
> 
> Nish could you please submit something to switch it to ssleep?

Yup, I'll send it out tomorrow.

-Nish

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-20  2:10                               ` Zwane Mwaikambo
  2004-12-20  2:30                                 ` Nish Aravamudan
@ 2004-12-20 18:27                                 ` Nishanth Aravamudan
  2004-12-20 22:57                                   ` Zwane Mwaikambo
  1 sibling, 1 reply; 32+ messages in thread
From: Nishanth Aravamudan @ 2004-12-20 18:27 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Nick Piggin, Nish Aravamudan, Paul E. McKenney, Andrew Morton,
	Stephen Rothwell, Linux Kernel, Dipankar Sarma, Li Shaohua,
	Len Brown

On Sun, Dec 19, 2004 at 07:10:31PM -0700, Zwane Mwaikambo wrote:
> On Mon, 20 Dec 2004, Nick Piggin wrote:
> 
> > On Sun, 2004-12-19 at 18:44 -0700, Zwane Mwaikambo wrote:
> > > On Mon, 20 Dec 2004, Nick Piggin wrote:
> > > 
> > > > This thread can possibly be stalled forever if there is a CPU hog
> > > > running, right?
> > > 
> > > Yep.
> > > 
> > > > In which case, you will want to use ssleep rather than a busy loop.
> > > 
> > > Well ssleep essentially does the same thing as the schedule_timeout.
> > > 
> > 
> > Yes - so long as you set ->state when using schedule_timeout ;)
> 
> Nish could you please submit something to switch it to ssleep?

I believe the only files/patches that needed to be changed were the process.c
changes. Here they are re-worked to use ssleep(1) instead of
schedule_timeout(HZ).

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>


--- 2.6.10-rc3-v/arch/i386/kernel/process.c	2004-12-08 13:38:42.000000000 -0800
+++ 2.6.10-rc3/arch/i386/kernel/process.c	2004-12-20 10:15:54.000000000 -0800
@@ -72,6 +72,7 @@ unsigned long thread_saved_pc(struct tas
  * Powermanagement idle function, if any..
  */
 void (*pm_idle)(void);
+static cpumask_t cpu_idle_map;
 
 void disable_hlt(void)
 {
@@ -142,16 +143,16 @@ static void poll_idle (void)
  */
 void cpu_idle (void)
 {
+	int cpu = smp_processor_id();
+
 	/* endless idle loop with no priority at all */
 	while (1) {
 		while (!need_resched()) {
 			void (*idle)(void);
-			/*
-			 * Mark this as an RCU critical section so that
-			 * synchronize_kernel() in the unload path waits
-			 * for our completion.
-			 */
-			rcu_read_lock();
+
+			if (cpu_isset(cpu, cpu_idle_map))
+				cpu_clear(cpu, cpu_idle_map);
+			rmb();
 			idle = pm_idle;
 
 			if (!idle)
@@ -159,12 +160,24 @@ void cpu_idle (void)
 
 			irq_stat[smp_processor_id()].idle_timestamp = jiffies;
 			idle();
-			rcu_read_unlock();
 		}
 		schedule();
 	}
 }
 
+void cpu_idle_wait(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		cpu_set(cpu, cpu_idle_map);
+
+	wmb();	
+	while (cpus_equal(cpu_idle_map, cpu_online_map))
+		ssleep(1);
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
 /*
  * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
  * which can obviate IPI to trigger checking of need_resched.
--- 2.6.10-rc3-v/arch/ia64/kernel/process.c	2004-12-08 13:38:43.000000000 -0800
+++ 2.6.10-rc3/arch/ia64/kernel/process.c	2004-12-20 10:16:18.000000000 -0800
@@ -46,6 +46,7 @@
 #include "sigframe.h"
 
 void (*ia64_mark_idle)(int);
+static cpumask_t cpu_idle_map;
 
 unsigned long boot_option_idle_override = 0;
 EXPORT_SYMBOL(boot_option_idle_override);
@@ -225,10 +226,24 @@ static inline void play_dead(void)
 }
 #endif /* CONFIG_HOTPLUG_CPU */
 
+void cpu_idle_wait(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		cpu_set(cpu, cpu_idle_map);
+
+	wmb();	
+	while (cpus_equal(cpu_idle_map, cpu_online_map))
+		ssleep(1);
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
 void __attribute__((noreturn))
 cpu_idle (void *unused)
 {
 	void (*mark_idle)(int) = ia64_mark_idle;
+	int cpu = smp_processor_id();
 
 	/* endless idle loop with no priority at all */
 	while (1) {
@@ -241,17 +256,14 @@ cpu_idle (void *unused)
 
 			if (mark_idle)
 				(*mark_idle)(1);
-			/*
-			 * Mark this as an RCU critical section so that
-			 * synchronize_kernel() in the unload path waits
-			 * for our completion.
-			 */
-			rcu_read_lock();
+
+			if (cpu_isset(cpu, cpu_idle_map))
+				cpu_clear(cpu, cpu_idle_map);
+			rmb();
 			idle = pm_idle;
 			if (!idle)
 				idle = default_idle;
 			(*idle)();
-			rcu_read_unlock();
 		}
 
 		if (mark_idle)
--- 2.6.10-rc3-v/arch/x86_64/kernel/process.c	2004-12-08 13:38:45.000000000 -0800
+++ 2.6.10-rc3/arch/x86_64/kernel/process.c	2004-12-20 10:16:59.000000000 -0800
@@ -61,6 +61,7 @@ EXPORT_SYMBOL(boot_option_idle_override)
  * Powermanagement idle function, if any..
  */
 void (*pm_idle)(void);
+static cpumask_t cpu_idle_map;
 
 void disable_hlt(void)
 {
@@ -123,6 +124,19 @@ static void poll_idle (void)
 	}
 }
 
+void cpu_idle_wait(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		cpu_set(cpu, cpu_idle_map);
+	
+	wmb();	
+	while (cpus_equal(cpu_idle_map, cpu_online_map))
+		ssleep(1);
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
 /*
  * The idle thread. There's no useful work to be
  * done, so just try to conserve power and have a
@@ -131,21 +145,20 @@ static void poll_idle (void)
  */
 void cpu_idle (void)
 {
+	int cpu = smp_processor_id();
+
 	/* endless idle loop with no priority at all */
 	while (1) {
 		while (!need_resched()) {
 			void (*idle)(void);
-			/*
-			 * Mark this as an RCU critical section so that
-			 * synchronize_kernel() in the unload path waits
-			 * for our completion.
-			 */
-			rcu_read_lock();
+
+			if (cpu_isset(cpu, cpu_idle_map))
+				cpu_clear(cpu, cpu_idle_map);
+			rmb();
 			idle = pm_idle;
 			if (!idle)
 				idle = default_idle;
 			idle();
-			rcu_read_unlock();
 		}
 		schedule();
 	}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-20 18:27                                 ` Nishanth Aravamudan
@ 2004-12-20 22:57                                   ` Zwane Mwaikambo
  2004-12-20 23:15                                     ` Andrew Morton
  2004-12-20 23:26                                     ` Nishanth Aravamudan
  0 siblings, 2 replies; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-20 22:57 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Nick Piggin, Nish Aravamudan, Paul E. McKenney, Andrew Morton,
	Stephen Rothwell, Linux Kernel, Dipankar Sarma, Li Shaohua,
	Len Brown

On Mon, 20 Dec 2004, Nishanth Aravamudan wrote:

> I believe the only files/patches that needed to be changed were the process.c
> changes. Here they are re-worked to use ssleep(1) instead of

This makes it hard for the person integrating the patches to graft them 
together. How about just rediffing the whole lot so that my original patch 
gets replaced? Ideally it should be able to resolve the issue in 2.6-bk 
standalone.

Thanks,
	Zwane


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-20 22:57                                   ` Zwane Mwaikambo
@ 2004-12-20 23:15                                     ` Andrew Morton
  2004-12-20 23:16                                       ` Zwane Mwaikambo
  2004-12-20 23:26                                     ` Nishanth Aravamudan
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2004-12-20 23:15 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: nacc, nickpiggin, nish.aravamudan, paulmck, sfr, linux-kernel,
	dipankar, shaohua.li, len.brown

Zwane Mwaikambo <zwane@arm.linux.org.uk> wrote:
>
> > I believe the only files/patches that needed to be changed were the process.c
> > changes. Here they are re-worked to use ssleep(1) instead of
> 
> This makes it hard for the person integrating the patches to graft them 
> together.

I gave up and just edited the original diff, with
s/schedule_timeout(HZ)/ssleep(1)/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-20 23:15                                     ` Andrew Morton
@ 2004-12-20 23:16                                       ` Zwane Mwaikambo
  0 siblings, 0 replies; 32+ messages in thread
From: Zwane Mwaikambo @ 2004-12-20 23:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: nacc, nickpiggin, nish.aravamudan, paulmck, sfr, linux-kernel,
	dipankar, shaohua.li, len.brown

On Mon, 20 Dec 2004, Andrew Morton wrote:

> Zwane Mwaikambo <zwane@arm.linux.org.uk> wrote:
> >
> > > I believe the only files/patches that needed to be changed were the process.c
> > > changes. Here they are re-worked to use ssleep(1) instead of
> > 
> > This makes it hard for the person integrating the patches to graft them 
> > together.
> 
> I gave up and just edited the original diff, with
> s/schedule_timeout(HZ)/ssleep(1)/

Thanks Andrew.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Remove RCU abuse in cpu_idle()
  2004-12-20 22:57                                   ` Zwane Mwaikambo
  2004-12-20 23:15                                     ` Andrew Morton
@ 2004-12-20 23:26                                     ` Nishanth Aravamudan
  1 sibling, 0 replies; 32+ messages in thread
From: Nishanth Aravamudan @ 2004-12-20 23:26 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Nick Piggin, Nish Aravamudan, Paul E. McKenney, Andrew Morton,
	Stephen Rothwell, Linux Kernel, Dipankar Sarma, Li Shaohua,
	Len Brown

On Mon, Dec 20, 2004 at 03:57:59PM -0700, Zwane Mwaikambo wrote:
> On Mon, 20 Dec 2004, Nishanth Aravamudan wrote:
> 
> > I believe the only files/patches that needed to be changed were the process.c
> > changes. Here they are re-worked to use ssleep(1) instead of
> 
> This makes it hard for the person integrating the patches to graft them 
> together. How about just rediffing the whole lot so that my original patch 
> gets replaced? Ideally it should be able to resolve the issue in 2.6-bk 
> standalone.

Sorry about that, I will do that next time, thanks for the info.

-Nish

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2004-12-20 23:33 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-12-05  0:45 Fw: [RFC] Strange code in cpu_idle() Paul E. McKenney
2004-12-06  0:16 ` Stephen Rothwell
2004-12-06  6:46   ` Stephen Rothwell
2004-12-06 10:00     ` Zwane Mwaikambo
2004-12-06  7:20   ` Andrew Morton
2004-12-06  9:38     ` Zwane Mwaikambo
2004-12-06 16:04       ` Paul E. McKenney
2004-12-06 16:47         ` Zwane Mwaikambo
2004-12-06 19:22           ` Paul E. McKenney
2004-12-11 15:07             ` Zwane Mwaikambo
2004-12-12  4:54               ` [PATCH] Remove RCU abuse " Zwane Mwaikambo
2004-12-12  5:06                 ` Zwane Mwaikambo
2004-12-12  5:49                   ` Zwane Mwaikambo
2004-12-13  6:13                     ` Andrew Morton
2004-12-13  6:22                       ` Zwane Mwaikambo
2004-12-13  6:32                         ` Andrew Morton
2004-12-13  7:09                           ` Zwane Mwaikambo
2004-12-13  6:41                         ` Andrew Morton
2004-12-13  7:13                           ` Zwane Mwaikambo
2004-12-19  2:40                     ` Nish Aravamudan
2004-12-20  0:59                       ` Zwane Mwaikambo
2004-12-20  1:15                         ` Nick Piggin
2004-12-20  1:44                           ` Zwane Mwaikambo
2004-12-20  1:56                             ` Nick Piggin
2004-12-20  2:10                               ` Zwane Mwaikambo
2004-12-20  2:30                                 ` Nish Aravamudan
2004-12-20 18:27                                 ` Nishanth Aravamudan
2004-12-20 22:57                                   ` Zwane Mwaikambo
2004-12-20 23:15                                     ` Andrew Morton
2004-12-20 23:16                                       ` Zwane Mwaikambo
2004-12-20 23:26                                     ` Nishanth Aravamudan
2004-12-20  2:26                               ` Nish Aravamudan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).