linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2 v2] rcu: Fix series of spurious RCU softirqs
@ 2010-12-10 21:11 Frederic Weisbecker
  2010-12-10 21:11 ` [PATCH 1/2] rcu: Stop chasing QS if another CPU did it for us Frederic Weisbecker
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2010-12-10 21:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Paul E. McKenney, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt

Hi,

Following Lai's idea.

An example of such a series of spurious softirqs:

http://tglx.de/~fweisbec/trace_rcu_softirq.txt

In that example, the RCU softirq is raised at every tick
for 20 seconds (perhaps longer, as the trace snapshot only
covers those 20 seconds). It happens randomly.

Also, the series survived several hours of rcutorture (with RCU CPU
stall detection enabled).

Thanks.

Frederic Weisbecker (2):
  rcu: Stop chasing QS if another CPU did it for us
  rcu: Keep gpnum and completed fields synchronized

 kernel/rcutree.c |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

-- 
1.7.3.2



* [PATCH 1/2] rcu: Stop chasing QS if another CPU did it for us
  2010-12-10 21:11 [PATCH 0/2 v2] rcu: Fix series of spurious RCU softirqs Frederic Weisbecker
@ 2010-12-10 21:11 ` Frederic Weisbecker
  2010-12-10 22:58   ` Paul E. McKenney
  2010-12-10 21:14 ` [PATCH 0/2 v2] rcu: Fix series of spurious RCU softirqs Frederic Weisbecker
       [not found] ` <1292015471-19227-3-git-send-email-fweisbec@gmail.com>
  2 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2010-12-10 21:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Paul E. McKenney, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt

When a CPU is idle and other CPUs handled its extended
quiescent state to complete grace periods on its behalf,
it catches up with the completed grace-period numbers
when it wakes up.

But at this point there may be no more grace periods to
complete; nevertheless the woken CPU keeps its stale
qs_pending value and continues to chase quiescent states
even though that is no longer needed.

This results in clusters of spurious softirqs until a new
real grace period is started, because if we keep chasing
quiescent states after every grace period has completed,
rcu_report_qs_rdp() is puzzled and that state runs into
an infinite loop.

As suggested by Lai Jiangshan, just reset qs_pending if
someone completed every grace period on our behalf.

Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/rcutree.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index ccdc04c..8c4ed60 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -681,6 +681,14 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
 
 		/* Remember that we saw this grace-period completion. */
 		rdp->completed = rnp->completed;
+
+		/*
+		 * If another CPU handled our extended quiescent states and
+		 * we have no more grace periods to complete, then stop
+		 * chasing quiescent states.
+		 */
+		if (rdp->completed == rnp->gpnum)
+			rdp->qs_pending = 0;
 	}
 }
 
-- 
1.7.3.2



* Re: [PATCH 0/2 v2] rcu: Fix series of spurious RCU softirqs
  2010-12-10 21:11 [PATCH 0/2 v2] rcu: Fix series of spurious RCU softirqs Frederic Weisbecker
  2010-12-10 21:11 ` [PATCH 1/2] rcu: Stop chasing QS if another CPU did it for us Frederic Weisbecker
@ 2010-12-10 21:14 ` Frederic Weisbecker
       [not found] ` <1292015471-19227-3-git-send-email-fweisbec@gmail.com>
  2 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2010-12-10 21:14 UTC (permalink / raw)
  To: Paul E. McKenney, Lai Jiangshan
  Cc: LKML, Thomas Gleixner, Peter Zijlstra, Steven Rostedt

(Adding Lai to Cc; I forgot that my script doesn't handle Suggested-by: tags :)

On Fri, Dec 10, 2010 at 10:11:09PM +0100, Frederic Weisbecker wrote:
> Hi,
> 
> Following Lai's idea.
> 
> An example of such a series of spurious softirqs:
> 
> http://tglx.de/~fweisbec/trace_rcu_softirq.txt
> 
> In that example, the RCU softirq is raised at every tick
> for 20 seconds (perhaps longer, as the trace snapshot only
> covers those 20 seconds). It happens randomly.
> 
> Also, the series survived several hours of rcutorture (with RCU CPU
> stall detection enabled).
> 
> Thanks.
> 
> Frederic Weisbecker (2):
>   rcu: Stop chasing QS if another CPU did it for us
>   rcu: Keep gpnum and completed fields synchronized
> 
>  kernel/rcutree.c |   17 +++++++++++++++++
>  1 files changed, 17 insertions(+), 0 deletions(-)
> 
> -- 
> 1.7.3.2
> 


* Re: [PATCH 1/2] rcu: Stop chasing QS if another CPU did it for us
  2010-12-10 21:11 ` [PATCH 1/2] rcu: Stop chasing QS if another CPU did it for us Frederic Weisbecker
@ 2010-12-10 22:58   ` Paul E. McKenney
  2010-12-10 23:33     ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2010-12-10 22:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, laijs

On Fri, Dec 10, 2010 at 10:11:10PM +0100, Frederic Weisbecker wrote:
> When a CPU is idle and other CPUs handled its extended
> quiescent state to complete grace periods on its behalf,
> it catches up with the completed grace-period numbers
> when it wakes up.
> 
> But at this point there may be no more grace periods to
> complete; nevertheless the woken CPU keeps its stale
> qs_pending value and continues to chase quiescent states
> even though that is no longer needed.
> 
> This results in clusters of spurious softirqs until a new
> real grace period is started, because if we keep chasing
> quiescent states after every grace period has completed,
> rcu_report_qs_rdp() is puzzled and that state runs into
> an infinite loop.
> 
> As suggested by Lai Jiangshan, just reset qs_pending if
> someone completed every grace period on our behalf.

Nice!!!

I have queued this patch, and followed it up with a patch that changes
the condition to "rnp->qsmask & rdp->grpmask", which indicates that RCU
needs a quiescent state from the CPU, and is valid regardless of how
messed up the CPU is about which grace period is which.

I am making a similar change to the check in __note_new_gpnum().

Seem reasonable?

							Thanx, Paul

> Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/rcutree.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index ccdc04c..8c4ed60 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -681,6 +681,14 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
> 
>  		/* Remember that we saw this grace-period completion. */
>  		rdp->completed = rnp->completed;
> +
> +		/*
> +		 * If another CPU handled our extended quiescent states and
> +		 * we have no more grace periods to complete, then stop
> +		 * chasing quiescent states.
> +		 */
> +		if (rdp->completed == rnp->gpnum)
> +			rdp->qs_pending = 0;
>  	}
>  }
> 
> -- 
> 1.7.3.2
> 


* Re: [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized
       [not found] ` <1292015471-19227-3-git-send-email-fweisbec@gmail.com>
@ 2010-12-10 23:02   ` Paul E. McKenney
  2010-12-10 23:39     ` Paul E. McKenney
  2010-12-11  0:00     ` Frederic Weisbecker
  0 siblings, 2 replies; 16+ messages in thread
From: Paul E. McKenney @ 2010-12-10 23:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, laijs

On Fri, Dec 10, 2010 at 10:11:11PM +0100, Frederic Weisbecker wrote:
> When a CPU that was in an extended quiescent state wakes
> up and catches up with grace periods that remote CPUs
> completed on its behalf, we update the completed field
> but not gpnum, which keeps the stale value of an earlier
> grace-period ID.
> 
> Later, note_new_gpnum() will interpret the gap between the
> local CPU's and the node's grace-period IDs as a new grace
> period to handle, and will start hunting for a quiescent state.
> 
> But if every grace period has already been completed, this
> interpretation is wrong, and we get stuck in clusters
> of spurious softirqs because rcu_report_qs_rdp() makes
> this broken state run into an infinite loop.
> 
> The solution, as suggested by Lai Jiangshan, is to ensure that
> the gpnum and completed fields are kept synchronized when we catch
> up with grace periods that other CPUs completed on our behalf.
> This way we won't start noting spurious new grace periods.

Also good, queued!

One issue -- this approach is vulnerable to overflow.  I therefore
followed up with a patch that changes the condition to

	if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
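
(For reference, ULONG_CMP_LT() is the wraparound-safe "less than" from
include/linux/rcupdate.h; a minimal sketch of the idea:

	#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))

Unsigned subtraction wraps modulo ULONG_MAX + 1, so the test reads "a is
behind b by less than half the counter range", which stays correct across
a counter wrap as long as no CPU falls more than halfway behind.)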

And I clearly need to make RCU defend itself against the scenario where
a CPU stays in dyntick-idle mode long enough for the grace-period number
to wrap halfway around its range of possible values.  Not a problem at
the moment, and never will be for 64-bit systems, but...

I will fix that up.

							Thanx, Paul

> Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/rcutree.c |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 8c4ed60..2e16da3 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -683,6 +683,15 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
>  		rdp->completed = rnp->completed;
> 
>  		/*
> +		 * If we were in an extended quiescent state, we may have
> +		 * missed some grace periods that other CPUs took care of on
> +		 * our behalf. Catch up with this state to avoid noting
> +		 * spurious new grace periods.
> +		 */
> +		if (rdp->completed > rdp->gpnum)
> +			rdp->gpnum = rdp->completed;
> +
> +		/*
>  		 * If another CPU handled our extended quiescent states and
>  		 * we have no more grace periods to complete, then stop
>  		 * chasing quiescent states.
> -- 
> 1.7.3.2
> 


* Re: [PATCH 1/2] rcu: Stop chasing QS if another CPU did it for us
  2010-12-10 22:58   ` Paul E. McKenney
@ 2010-12-10 23:33     ` Frederic Weisbecker
  0 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2010-12-10 23:33 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, laijs

On Fri, Dec 10, 2010 at 02:58:55PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 10, 2010 at 10:11:10PM +0100, Frederic Weisbecker wrote:
> > When a CPU is idle and other CPUs handled its extended
> > quiescent state to complete grace periods on its behalf,
> > it catches up with the completed grace-period numbers
> > when it wakes up.
> > 
> > But at this point there may be no more grace periods to
> > complete; nevertheless the woken CPU keeps its stale
> > qs_pending value and continues to chase quiescent states
> > even though that is no longer needed.
> > 
> > This results in clusters of spurious softirqs until a new
> > real grace period is started, because if we keep chasing
> > quiescent states after every grace period has completed,
> > rcu_report_qs_rdp() is puzzled and that state runs into
> > an infinite loop.
> > 
> > As suggested by Lai Jiangshan, just reset qs_pending if
> > someone completed every grace period on our behalf.
> 
> Nice!!!
> 
> I have queued this patch, and followed it up with a patch that changes
> the condition to "rnp->qsmask & rdp->grpmask", which indicates that RCU
> needs a quiescent state from the CPU, and is valid regardless of how
> messed up the CPU is about which grace period is which.
> 
> I am making a similar change to the check in __note_new_gpnum().
> 
> Seem reasonable?

Looks good, yeah.

Thanks!

> 
> 							Thanx, Paul
> 
> > Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > ---
> >  kernel/rcutree.c |    8 ++++++++
> >  1 files changed, 8 insertions(+), 0 deletions(-)
> > 
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index ccdc04c..8c4ed60 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -681,6 +681,14 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
> > 
> >  		/* Remember that we saw this grace-period completion. */
> >  		rdp->completed = rnp->completed;
> > +
> > +		/*
> > +		 * If another CPU handled our extended quiescent states and
> > +		 * we have no more grace periods to complete, then stop
> > +		 * chasing quiescent states.
> > +		 */
> > +		if (rdp->completed == rnp->gpnum)
> > +			rdp->qs_pending = 0;
> >  	}
> >  }
> > 
> > -- 
> > 1.7.3.2
> > 


* Re: [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized
  2010-12-10 23:02   ` [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized Paul E. McKenney
@ 2010-12-10 23:39     ` Paul E. McKenney
  2010-12-10 23:47       ` Frederic Weisbecker
  2010-12-11  0:00     ` Frederic Weisbecker
  1 sibling, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2010-12-10 23:39 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, laijs

On Fri, Dec 10, 2010 at 03:02:00PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 10, 2010 at 10:11:11PM +0100, Frederic Weisbecker wrote:
> > When a CPU that was in an extended quiescent state wakes
> > up and catches up with grace periods that remote CPUs
> > completed on its behalf, we update the completed field
> > but not gpnum, which keeps the stale value of an earlier
> > grace-period ID.
> > 
> > Later, note_new_gpnum() will interpret the gap between the
> > local CPU's and the node's grace-period IDs as a new grace
> > period to handle, and will start hunting for a quiescent state.
> > 
> > But if every grace period has already been completed, this
> > interpretation is wrong, and we get stuck in clusters
> > of spurious softirqs because rcu_report_qs_rdp() makes
> > this broken state run into an infinite loop.
> > 
> > The solution, as suggested by Lai Jiangshan, is to ensure that
> > the gpnum and completed fields are kept synchronized when we catch
> > up with grace periods that other CPUs completed on our behalf.
> > This way we won't start noting spurious new grace periods.
> 
> Also good, queued!
> 
> One issue -- this approach is vulnerable to overflow.  I therefore
> followed up with a patch that changes the condition to
> 
> 	if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))

And here is the follow-up patch, FWIW.

							Thanx, Paul

------------------------------------------------------------------------

commit d864b245030645e3465b3bd7e253b7ccf76e9d35
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Fri Dec 10 15:02:47 2010 -0800

    rcu: fine-tune grace-period begin/end checks
    
    Use the CPU's bit in rnp->qsmask to determine whether or not the CPU
    should try to report a quiescent state.  Handle overflow in the check
    for rdp->gpnum having fallen behind.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index f8e4ee7..6103017 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -618,20 +618,16 @@ static void __note_new_gpnum(struct rcu_state *rsp, struct rcu_node *rnp, struct
 {
 	if (rdp->gpnum != rnp->gpnum) {
 		/*
-		 * Because RCU checks for the prior grace period ending
-		 * before checking for a new grace period starting, it
-		 * is possible for rdp->gpnum to be set to the old grace
-		 * period and rdp->completed to be set to the new grace
-		 * period.  So don't bother checking for a quiescent state
-		 * for the rnp->gpnum grace period unless it really is
-		 * waiting for this CPU.
+		 * If the current grace period is waiting for this CPU,
+		 * set up to detect a quiescent state, otherwise don't
+		 * go looking for one.
 		 */
-		if (rdp->completed != rnp->gpnum) {
+		rdp->gpnum = rnp->gpnum;
+		if (rnp->qsmask & rdp->grpmask) {
 			rdp->qs_pending = 1;
 			rdp->passed_quiesc = 0;
-		}
-
-		rdp->gpnum = rnp->gpnum;
+		} else
+			rdp->qs_pending = 0;
 	}
 }
 
@@ -693,19 +689,20 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
 
 		/*
 		 * If we were in an extended quiescent state, we may have
-		 * missed some grace periods that other CPUs took care of on
+		 * missed some grace periods that other CPUs handled on
 		 * our behalf. Catch up with this state to avoid noting
-		 * spurious new grace periods.
+		 * spurious new grace periods.  If another grace period
+		 * has started, then rnp->gpnum will have advanced, so
+		 * we will detect this later on.
 		 */
-		if (rdp->completed > rdp->gpnum)
+		if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
 			rdp->gpnum = rdp->completed;
 
 		/*
-		 * If another CPU handled our extended quiescent states and
-		 * we have no more grace periods to complete, then stop
-		 * chasing quiescent states.
+		 * If RCU does not need a quiescent state from this CPU,
+		 * then make sure that this CPU doesn't go looking for one.
 		 */
-		if (rdp->completed == rnp->gpnum)
+		if ((rnp->qsmask & rdp->grpmask) == 0)
 			rdp->qs_pending = 0;
 	}
 }


* Re: [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized
  2010-12-10 23:39     ` Paul E. McKenney
@ 2010-12-10 23:47       ` Frederic Weisbecker
  2010-12-11  0:04         ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2010-12-10 23:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, laijs

On Fri, Dec 10, 2010 at 03:39:20PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 10, 2010 at 03:02:00PM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 10, 2010 at 10:11:11PM +0100, Frederic Weisbecker wrote:
> > > When a CPU that was in an extended quiescent state wakes
> > > up and catches up with grace periods that remote CPUs
> > > completed on its behalf, we update the completed field
> > > but not gpnum, which keeps the stale value of an earlier
> > > grace-period ID.
> > > 
> > > Later, note_new_gpnum() will interpret the gap between the
> > > local CPU's and the node's grace-period IDs as a new grace
> > > period to handle, and will start hunting for a quiescent state.
> > > 
> > > But if every grace period has already been completed, this
> > > interpretation is wrong, and we get stuck in clusters
> > > of spurious softirqs because rcu_report_qs_rdp() makes
> > > this broken state run into an infinite loop.
> > > 
> > > The solution, as suggested by Lai Jiangshan, is to ensure that
> > > the gpnum and completed fields are kept synchronized when we catch
> > > up with grace periods that other CPUs completed on our behalf.
> > > This way we won't start noting spurious new grace periods.
> > 
> > Also good, queued!
> > 
> > One issue -- this approach is vulnerable to overflow.  I therefore
> > followed up with a patch that changes the condition to
> > 
> > 	if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
> 
> And here is the follow-up patch, FWIW.
> 
> 							Thanx, Paul

Hmm, it doesn't apply on top of my two patches. It seems you have
kept my two previous patches, so your patch fails to apply because
it lacks them as a base.

Did you intend to keep them? I hope they are useless by now; otherwise
it means there are other cases I forgot.

Thanks.

> 
> ------------------------------------------------------------------------
> 
> commit d864b245030645e3465b3bd7e253b7ccf76e9d35
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Fri Dec 10 15:02:47 2010 -0800
> 
>     rcu: fine-tune grace-period begin/end checks
>     
>     Use the CPU's bit in rnp->qsmask to determine whether or not the CPU
>     should try to report a quiescent state.  Handle overflow in the check
>     for rdp->gpnum having fallen behind.
>     
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index f8e4ee7..6103017 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -618,20 +618,16 @@ static void __note_new_gpnum(struct rcu_state *rsp, struct rcu_node *rnp, struct
>  {
>  	if (rdp->gpnum != rnp->gpnum) {
>  		/*
> -		 * Because RCU checks for the prior grace period ending
> -		 * before checking for a new grace period starting, it
> -		 * is possible for rdp->gpnum to be set to the old grace
> -		 * period and rdp->completed to be set to the new grace
> -		 * period.  So don't bother checking for a quiescent state
> -		 * for the rnp->gpnum grace period unless it really is
> -		 * waiting for this CPU.
> +		 * If the current grace period is waiting for this CPU,
> +		 * set up to detect a quiescent state, otherwise don't
> +		 * go looking for one.
>  		 */
> -		if (rdp->completed != rnp->gpnum) {
> +		rdp->gpnum = rnp->gpnum;
> +		if (rnp->qsmask & rdp->grpmask) {
>  			rdp->qs_pending = 1;
>  			rdp->passed_quiesc = 0;
> -		}
> -
> -		rdp->gpnum = rnp->gpnum;
> +		} else
> +			rdp->qs_pending = 0;
>  	}
>  }
>  
> @@ -693,19 +689,20 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
>  
>  		/*
>  		 * If we were in an extended quiescent state, we may have
> -		 * missed some grace periods that other CPUs took care of on
> +		 * missed some grace periods that other CPUs handled on
>  		 * our behalf. Catch up with this state to avoid noting
> -		 * spurious new grace periods.
> +		 * spurious new grace periods.  If another grace period
> +		 * has started, then rnp->gpnum will have advanced, so
> +		 * we will detect this later on.
>  		 */
> -		if (rdp->completed > rdp->gpnum)
> +		if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
>  			rdp->gpnum = rdp->completed;
>  
>  		/*
> -		 * If another CPU handled our extended quiescent states and
> -		 * we have no more grace periods to complete, then stop
> -		 * chasing quiescent states.
> +		 * If RCU does not need a quiescent state from this CPU,
> +		 * then make sure that this CPU doesn't go looking for one.
>  		 */
> -		if (rdp->completed == rnp->gpnum)
> +		if (rnp->qsmask & rdp->grpmask)
>  			rdp->qs_pending = 0;
>  	}
>  }


* Re: [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized
  2010-12-10 23:02   ` [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized Paul E. McKenney
  2010-12-10 23:39     ` Paul E. McKenney
@ 2010-12-11  0:00     ` Frederic Weisbecker
  2010-12-11  0:48       ` Paul E. McKenney
  1 sibling, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2010-12-11  0:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, laijs

On Fri, Dec 10, 2010 at 03:02:00PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 10, 2010 at 10:11:11PM +0100, Frederic Weisbecker wrote:
> > When a CPU that was in an extended quiescent state wakes
> > up and catches up with grace periods that remote CPUs
> > completed on its behalf, we update the completed field
> > but not gpnum, which keeps the stale value of an earlier
> > grace-period ID.
> > 
> > Later, note_new_gpnum() will interpret the gap between the
> > local CPU's and the node's grace-period IDs as a new grace
> > period to handle, and will start hunting for a quiescent state.
> > 
> > But if every grace period has already been completed, this
> > interpretation is wrong, and we get stuck in clusters
> > of spurious softirqs because rcu_report_qs_rdp() makes
> > this broken state run into an infinite loop.
> > 
> > The solution, as suggested by Lai Jiangshan, is to ensure that
> > the gpnum and completed fields are kept synchronized when we catch
> > up with grace periods that other CPUs completed on our behalf.
> > This way we won't start noting spurious new grace periods.
> 
> Also good, queued!
> 
> One issue -- this approach is vulnerable to overflow.  I therefore
> followed up with a patch that changes the condition to
> 
> 	if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
> 
> And I clearly need to make RCU defend itself against the scenario where
> a CPU stays in dyntick-idle mode long enough for the grace-period number
> to wrap halfway around its range of possible values.  Not a problem at
> the moment, and never will be for 64-bit systems, but...
> 
> I will fix that up.

Oh, you're right of course. I did not think about possible overflows.

Now looking at the ULONG_CMP_LT() definition, if the counter wraps more
than halfway we are screwed anyway. I suspect it won't ever happen, but
it could. Perhaps we need some safeguard code in note_new_gpnum() to fix
up that corner case.


> 
> 							Thanx, Paul
> 
> > Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > ---
> >  kernel/rcutree.c |    9 +++++++++
> >  1 files changed, 9 insertions(+), 0 deletions(-)
> > 
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index 8c4ed60..2e16da3 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -683,6 +683,15 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
> >  		rdp->completed = rnp->completed;
> > 
> >  		/*
> > +		 * If we were in an extended quiescent state, we may have
> > +		 * missed some grace periods that other CPUs took care of on
> > +		 * our behalf. Catch up with this state to avoid noting
> > +		 * spurious new grace periods.
> > +		 */
> > +		if (rdp->completed > rdp->gpnum)
> > +			rdp->gpnum = rdp->completed;
> > +
> > +		/*
> >  		 * If another CPU handled our extended quiescent states and
> >  		 * we have no more grace periods to complete, then stop
> >  		 * chasing quiescent states.
> > -- 
> > 1.7.3.2
> > 


* Re: [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized
  2010-12-10 23:47       ` Frederic Weisbecker
@ 2010-12-11  0:04         ` Paul E. McKenney
  2010-12-11  0:15           ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2010-12-11  0:04 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, laijs

On Sat, Dec 11, 2010 at 12:47:11AM +0100, Frederic Weisbecker wrote:
> On Fri, Dec 10, 2010 at 03:39:20PM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 10, 2010 at 03:02:00PM -0800, Paul E. McKenney wrote:
> > > On Fri, Dec 10, 2010 at 10:11:11PM +0100, Frederic Weisbecker wrote:
> > > > When a CPU that was in an extended quiescent state wakes
> > > > up and catches up with grace periods that remote CPUs
> > > > completed on its behalf, we update the completed field
> > > > but not gpnum, which keeps the stale value of an earlier
> > > > grace-period ID.
> > > > 
> > > > Later, note_new_gpnum() will interpret the gap between the
> > > > local CPU's and the node's grace-period IDs as a new grace
> > > > period to handle, and will start hunting for a quiescent state.
> > > > 
> > > > But if every grace period has already been completed, this
> > > > interpretation is wrong, and we get stuck in clusters
> > > > of spurious softirqs because rcu_report_qs_rdp() makes
> > > > this broken state run into an infinite loop.
> > > > 
> > > > The solution, as suggested by Lai Jiangshan, is to ensure that
> > > > the gpnum and completed fields are kept synchronized when we catch
> > > > up with grace periods that other CPUs completed on our behalf.
> > > > This way we won't start noting spurious new grace periods.
> > > 
> > > Also good, queued!
> > > 
> > > One issue -- this approach is vulnerable to overflow.  I therefore
> > > followed up with a patch that changes the condition to
> > > 
> > > 	if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
> > 
> > And here is the follow-up patch, FWIW.
> > 
> > 							Thanx, Paul
> 
> Hmm, it doesn't apply on top of my two patches. It seems you have
> kept my two previous patches, so your patch fails to apply because
> it lacks them as a base.
> 
> Did you intend to keep them? I hope they are useless by now; otherwise
> it means there are other cases I forgot.

One is indeed useless, while the other is useful in combinations of
dyntick-idle and force_quiescent_state().  I rebased your earlier two
out and reworked mine, please see below.  Work better?

							Thanx, Paul

------------------------------------------------------------------------

commit c808bedd1b1d7c720546a6682fca44c66703af4e
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Fri Dec 10 15:02:47 2010 -0800

    rcu: fine-tune grace-period begin/end checks
    
    Use the CPU's bit in rnp->qsmask to determine whether or not the CPU
    should try to report a quiescent state.  Handle overflow in the check
    for rdp->gpnum having fallen behind.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 368be76..530cdcd 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -616,9 +616,17 @@ static void __init check_cpu_stall_init(void)
 static void __note_new_gpnum(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp)
 {
 	if (rdp->gpnum != rnp->gpnum) {
-		rdp->qs_pending = 1;
-		rdp->passed_quiesc = 0;
+		/*
+		 * If the current grace period is waiting for this CPU,
+		 * set up to detect a quiescent state, otherwise don't
+		 * go looking for one.
+		 */
 		rdp->gpnum = rnp->gpnum;
+		if (rnp->qsmask & rdp->grpmask) {
+			rdp->qs_pending = 1;
+			rdp->passed_quiesc = 0;
+		} else
+			rdp->qs_pending = 0;
 	}
 }
 
@@ -680,19 +688,20 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
 
 		/*
 		 * If we were in an extended quiescent state, we may have
-		 * missed some grace periods that other CPUs took care of on
+		 * missed some grace periods that other CPUs handled on
 		 * our behalf. Catch up with this state to avoid noting
-		 * spurious new grace periods.
+		 * spurious new grace periods.  If another grace period
+		 * has started, then rnp->gpnum will have advanced, so
+		 * we will detect this later on.
 		 */
-		if (rdp->completed > rdp->gpnum)
+		if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
 			rdp->gpnum = rdp->completed;
 
 		/*
-		 * If another CPU handled our extended quiescent states and
-		 * we have no more grace periods to complete, then stop
-		 * chasing quiescent states.
+		 * If RCU does not need a quiescent state from this CPU,
+		 * then make sure that this CPU doesn't go looking for one.
 		 */
-		if (rdp->completed == rnp->gpnum)
+		if ((rnp->qsmask & rdp->grpmask) == 0)
 			rdp->qs_pending = 0;
 	}
 }


* Re: [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized
  2010-12-11  0:04         ` Paul E. McKenney
@ 2010-12-11  0:15           ` Frederic Weisbecker
  2010-12-11  0:58             ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2010-12-11  0:15 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, laijs

On Fri, Dec 10, 2010 at 04:04:51PM -0800, Paul E. McKenney wrote:
> On Sat, Dec 11, 2010 at 12:47:11AM +0100, Frederic Weisbecker wrote:
> > On Fri, Dec 10, 2010 at 03:39:20PM -0800, Paul E. McKenney wrote:
> > > On Fri, Dec 10, 2010 at 03:02:00PM -0800, Paul E. McKenney wrote:
> > > > On Fri, Dec 10, 2010 at 10:11:11PM +0100, Frederic Weisbecker wrote:
> > > > > When a CPU that was in an extended quiescent state wakes
> > > > > up and catches up with grace periods that remote CPUs
> > > > > completed on its behalf, we update the completed field
> > > > > but not gpnum, which keeps the stale value of an earlier
> > > > > grace-period ID.
> > > > > 
> > > > > Later, note_new_gpnum() will interpret the gap between the
> > > > > local CPU's and the node's grace-period IDs as a new grace
> > > > > period to handle, and will start hunting for a quiescent state.
> > > > > 
> > > > > But if every grace period has already been completed, this
> > > > > interpretation is wrong, and we get stuck in clusters
> > > > > of spurious softirqs because rcu_report_qs_rdp() makes
> > > > > this broken state run into an infinite loop.
> > > > > 
> > > > > The solution, as suggested by Lai Jiangshan, is to ensure that
> > > > > the gpnum and completed fields are kept synchronized when we catch
> > > > > up with grace periods that other CPUs completed on our behalf.
> > > > > This way we won't start noting spurious new grace periods.
> > > > 
> > > > Also good, queued!
> > > > 
> > > > One issue -- this approach is vulnerable to overflow.  I therefore
> > > > followed up with a patch that changes the condition to
> > > > 
> > > > 	if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
> > > 
> > > And here is the follow-up patch, FWIW.
> > > 
> > > 							Thanx, Paul
> > 
> > Hmm, it doesn't apply on top of my two patches. It seems you have
> > kept my two previous patches, so your patch fails to apply because
> > it lacks them as a base.
> > 
> > Did you intend to keep them? I hope they are useless by now; otherwise
> > it means there are other cases I forgot.
> 
> One is indeed useless, while the other is useful in combinations of
> dyntick-idle and force_quiescent_state().

I don't see how.

Before we call __note_new_gpnum(), we always have the opportunity
to resync gpnum and completed, since __rcu_process_gp_end() is
called first.
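
For the record, the ordering I mean, simplified from the 2.6.37-era
kernel/rcutree.c (a rough sketch, unrelated details elided):

	static void
	__rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp)
	{
		...
		/* First catch up with grace periods other CPUs completed. */
		rcu_process_gp_end(rsp, rdp);	/* -> __rcu_process_gp_end() */

		/* Only then note new GPs and chase quiescent states. */
		rcu_check_quiescent_state(rsp, rdp); /* -> note_new_gpnum() */
		...
	}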

Am I missing something?

Thanks.

>  I rebased your earlier two
> out and reworked mine, please see below.  Work better?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit c808bedd1b1d7c720546a6682fca44c66703af4e
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Fri Dec 10 15:02:47 2010 -0800
> 
>     rcu: fine-tune grace-period begin/end checks
>     
>     Use the CPU's bit in rnp->qsmask to determine whether or not the CPU
>     should try to report a quiescent state.  Handle overflow in the check
>     for rdp->gpnum having fallen behind.
>     
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 368be76..530cdcd 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -616,9 +616,17 @@ static void __init check_cpu_stall_init(void)
>  static void __note_new_gpnum(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp)
>  {
>  	if (rdp->gpnum != rnp->gpnum) {
> -		rdp->qs_pending = 1;
> -		rdp->passed_quiesc = 0;
> +		/*
> +		 * If the current grace period is waiting for this CPU,
> +		 * set up to detect a quiescent state, otherwise don't
> +		 * go looking for one.
> +		 */
>  		rdp->gpnum = rnp->gpnum;
> +		if (rnp->qsmask & rdp->grpmask) {
> +			rdp->qs_pending = 1;
> +			rdp->passed_quiesc = 0;
> +		} else
> +			rdp->qs_pending = 0;
>  	}
>  }
>  
> @@ -680,19 +688,20 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
>  
>  		/*
>  		 * If we were in an extended quiescent state, we may have
> -		 * missed some grace periods that other CPUs took care of on
> +		 * missed some grace periods that other CPUs handled on
>  		 * our behalf. Catch up with this state to avoid noting
> -		 * spurious new grace periods.
> +		 * spurious new grace periods.  If another grace period
> +		 * has started, then rnp->gpnum will have advanced, so
> +		 * we will detect this later on.
>  		 */
> -		if (rdp->completed > rdp->gpnum)
> +		if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
>  			rdp->gpnum = rdp->completed;
>  
>  		/*
> -		 * If another CPU handled our extended quiescent states and
> -		 * we have no more grace periods to complete, then stop
> -		 * chasing quiescent states.
> +		 * If RCU does not need a quiescent state from this CPU,
> +		 * then make sure that this CPU doesn't go looking for one.
>  		 */
> -		if (rdp->completed == rnp->gpnum)
> +		if ((rnp->qsmask & rdp->grpmask) == 0)
>  			rdp->qs_pending = 0;
>  	}
>  }


* Re: [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized
  2010-12-11  0:00     ` Frederic Weisbecker
@ 2010-12-11  0:48       ` Paul E. McKenney
  2010-12-11  0:51         ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2010-12-11  0:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, laijs

On Sat, Dec 11, 2010 at 01:00:39AM +0100, Frederic Weisbecker wrote:
> On Fri, Dec 10, 2010 at 03:02:00PM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 10, 2010 at 10:11:11PM +0100, Frederic Weisbecker wrote:
> > > When a CPU that was in an extended quiescent state wakes
> > > up and catches up with grace periods that remote CPUs
> > > completed on its behalf, we update the completed field
> > > but not gpnum, which keeps the stale value of an earlier
> > > grace-period ID.
> > > 
> > > Later, note_new_gpnum() will interpret the gap between the
> > > local CPU's and the node's grace-period IDs as a new grace
> > > period to handle, and will start hunting for a quiescent state.
> > > 
> > > But if every grace period has already been completed, this
> > > interpretation is wrong, and we get stuck in clusters
> > > of spurious softirqs because rcu_report_qs_rdp() makes
> > > this broken state run into an infinite loop.
> > > 
> > > The solution, as suggested by Lai Jiangshan, is to ensure that
> > > the gpnum and completed fields are kept synchronized when we catch
> > > up with grace periods that other CPUs completed on our behalf.
> > > This way we won't start noting spurious new grace periods.
> > 
> > Also good, queued!
> > 
> > One issue -- this approach is vulnerable to overflow.  I therefore
> > followed up with a patch that changes the condition to
> > 
> > 	if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
> > 
> > And I clearly need to make RCU defend itself against the scenario where
> > a CPU stays in dyntick-idle mode long enough for the grace-period number
> > to wrap halfway around its range of possible values.  Not a problem at
> > the moment, and never will be for 64-bit systems, but...
> > 
> > I will fix that up.
> 
> Oh, you're right of course. I did not think about possible overflows.
> 
> Now looking at the ULONG_CMP_LT() definition, if the counter wraps more
> than halfway we are screwed anyway. I suspect it won't ever happen, but
> it could. Perhaps we need some safeguard code in note_new_gpnum() to fix
> up that corner case.

We still have to guard against a full wrap, though there are lots of other
things that break if you stay in dyntick-idle mode that long.  My plan is
to have a counter in the rcu_state structure that cycles through the CPUs.
Check the CPU it points to at the start of each grace period.  If a given
CPU is more than one-quarter of the way behind, force it to wake up
and catch up.  This gets easier to do given my in-progress changes to
convert from softirq to kthread, so I will combine it with those changes.

So I will cover this.
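
Roughly, the check might look something like this -- just a sketch, with
the cursor field and helper name invented for illustration:

	/*
	 * At grace-period start, look at the CPU that the (hypothetical)
	 * rsp->gp_check_cpu cursor points to.  If its view of the
	 * grace-period number has fallen more than a quarter of the
	 * counter range behind, kick it so that it catches up before
	 * ULONG_CMP_LT() comparisons become ambiguous.
	 */
	if (ULONG_CMP_LT(rdp->gpnum + ULONG_MAX / 4, rsp->gpnum))
		force_cpu_catchup(rdp);		/* hypothetical helper */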

							Thanx, Paul

> > > Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Cc: Ingo Molnar <mingo@elte.hu>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > ---
> > >  kernel/rcutree.c |    9 +++++++++
> > >  1 files changed, 9 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > > index 8c4ed60..2e16da3 100644
> > > --- a/kernel/rcutree.c
> > > +++ b/kernel/rcutree.c
> > > @@ -683,6 +683,15 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
> > >  		rdp->completed = rnp->completed;
> > > 
> > >  		/*
> > > +		 * If we were in an extended quiescent state, we may have
> > > +		 * missed some grace periods that other CPUs took care of on
> > > +		 * our behalf. Catch up with this state to avoid noting
> > > +		 * spurious new grace periods.
> > > +		 */
> > > +		if (rdp->completed > rdp->gpnum)
> > > +			rdp->gpnum = rdp->completed;
> > > +
> > > +		/*
> > >  		 * If another CPU handled our extended quiescent states and
> > >  		 * we have no more grace periods to complete, then stop
> > >  		 * chasing quiescent states.
> > > -- 
> > > 1.7.3.2
> > > 


* Re: [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized
  2010-12-11  0:48       ` Paul E. McKenney
@ 2010-12-11  0:51         ` Frederic Weisbecker
  0 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2010-12-11  0:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, laijs

On Fri, Dec 10, 2010 at 04:48:43PM -0800, Paul E. McKenney wrote:
> On Sat, Dec 11, 2010 at 01:00:39AM +0100, Frederic Weisbecker wrote:
> > On Fri, Dec 10, 2010 at 03:02:00PM -0800, Paul E. McKenney wrote:
> > > On Fri, Dec 10, 2010 at 10:11:11PM +0100, Frederic Weisbecker wrote:
> > > > When a CPU that was in an extended quiescent state wakes
> > > > up and catches up with grace periods that remote CPUs
> > > > completed on its behalf, we update the completed field
> > > > but not gpnum, which keeps the stale value of an earlier
> > > > grace-period ID.
> > > > 
> > > > Later, note_new_gpnum() will interpret the gap between the
> > > > local CPU's and the node's grace-period IDs as a new grace
> > > > period to handle, and will start hunting for a quiescent state.
> > > > 
> > > > But if every grace period has already been completed, this
> > > > interpretation is wrong, and we get stuck in clusters
> > > > of spurious softirqs because rcu_report_qs_rdp() makes
> > > > this broken state run into an infinite loop.
> > > > 
> > > > The solution, as suggested by Lai Jiangshan, is to ensure that
> > > > the gpnum and completed fields are kept synchronized when we catch
> > > > up with grace periods that other CPUs completed on our behalf.
> > > > This way we won't start noting spurious new grace periods.
> > > 
> > > Also good, queued!
> > > 
> > > One issue -- this approach is vulnerable to overflow.  I therefore
> > > followed up with a patch that changes the condition to
> > > 
> > > 	if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
> > > 
> > > And I clearly need to make RCU defend itself against the scenario where
> > > a CPU stays in dyntick-idle mode long enough for the grace-period number
> > > to wrap halfway around its range of possible values.  Not a problem at
> > > the moment, and never will be for 64-bit systems, but...
> > > 
> > > I will fix that up.
> > 
> > Oh, you're right of course. I did not think about possible overflows.
> > 
> > Now looking at the ULONG_CMP_LT() definition, if the counter wraps more
> > than halfway we are screwed anyway. I suspect it won't ever happen, but
> > it could. Perhaps we need some safeguard code in note_new_gpnum() to fix
> > up that corner case.
> 
> We still have to guard against a full wrap, though there are lots of other
> things that break if you stay in dyntick-idle mode that long.  My plan is
> to have a counter in the rcu_state structure that cycles through the CPUs.
> Check the CPU it points to at the start of each grace period.  If a given
> CPU is more than one-quarter of the way behind, force it to wake up
> and catch up.  This gets easier to do given my in-progress changes to
> convert from softirq to kthread, so I will combine it with those changes.
> 
> So I will cover this.

Fine!

Thanks.

> 
> 							Thanx, Paul
> 
> > > > Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> > > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > Cc: Ingo Molnar <mingo@elte.hu>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > ---
> > > >  kernel/rcutree.c |    9 +++++++++
> > > >  1 files changed, 9 insertions(+), 0 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > > > index 8c4ed60..2e16da3 100644
> > > > --- a/kernel/rcutree.c
> > > > +++ b/kernel/rcutree.c
> > > > @@ -683,6 +683,15 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
> > > >  		rdp->completed = rnp->completed;
> > > > 
> > > >  		/*
> > > > +		 * If we were in an extended quiescent state, we may have
> > > > +		 * missed some grace periods that other CPUs took care of on
> > > > +		 * our behalf. Catch up with this state to avoid noting
> > > > +		 * spurious new grace periods.
> > > > +		 */
> > > > +		if (rdp->completed > rdp->gpnum)
> > > > +			rdp->gpnum = rdp->completed;
> > > > +
> > > > +		/*
> > > >  		 * If another CPU handled our extended quiescent states and
> > > >  		 * we have no more grace periods to complete, then stop
> > > >  		 * chasing quiescent states.
> > > > -- 
> > > > 1.7.3.2
> > > > 


* Re: [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized
  2010-12-11  0:15           ` Frederic Weisbecker
@ 2010-12-11  0:58             ` Paul E. McKenney
  2010-12-11  1:21               ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2010-12-11  0:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, laijs

On Sat, Dec 11, 2010 at 01:15:17AM +0100, Frederic Weisbecker wrote:
> On Fri, Dec 10, 2010 at 04:04:51PM -0800, Paul E. McKenney wrote:
> > On Sat, Dec 11, 2010 at 12:47:11AM +0100, Frederic Weisbecker wrote:
> > > On Fri, Dec 10, 2010 at 03:39:20PM -0800, Paul E. McKenney wrote:
> > > > On Fri, Dec 10, 2010 at 03:02:00PM -0800, Paul E. McKenney wrote:
> > > > > On Fri, Dec 10, 2010 at 10:11:11PM +0100, Frederic Weisbecker wrote:
> > > > > > When a CPU that was in an extended quiescent state wakes
> > > > > > up and catches up with grace periods that remote CPUs
> > > > > > completed on its behalf, we update the completed field
> > > > > > but not gpnum, which keeps the stale value of an earlier
> > > > > > grace-period ID.
> > > > > > 
> > > > > > Later, note_new_gpnum() will interpret the gap between the
> > > > > > local CPU's and the node's grace-period IDs as a new grace
> > > > > > period to handle, and will start hunting for a quiescent state.
> > > > > > 
> > > > > > But if every grace period has already been completed, this
> > > > > > interpretation is wrong, and we get stuck in clusters
> > > > > > of spurious softirqs because rcu_report_qs_rdp() makes
> > > > > > this broken state run into an infinite loop.
> > > > > > 
> > > > > > The solution, as suggested by Lai Jiangshan, is to ensure that
> > > > > > the gpnum and completed fields are kept synchronized when we catch
> > > > > > up with grace periods that other CPUs completed on our behalf.
> > > > > > This way we won't start noting spurious new grace periods.
> > > > > 
> > > > > Also good, queued!
> > > > > 
> > > > > One issue -- this approach is vulnerable to overflow.  I therefore
> > > > > followed up with a patch that changes the condition to
> > > > > 
> > > > > 	if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
> > > > 
> > > > And here is the follow-up patch, FWIW.
> > > > 
> > > > 							Thanx, Paul
> > > 
> > > Hmm, it doesn't apply on top of my two patches. It seems you have
> > > kept my two previous patches, so your patch fails to apply because
> > > it lacks them as a base.
> > > 
> > > Did you intend to keep them? I hope they are useless by now; otherwise
> > > it means there are other cases I forgot.
> > 
> > One is indeed useless, while the other is useful in combinations of
> > dyntick-idle and force_quiescent_state().
> 
> I don't see how.
> 
> Before we call __note_new_gpnum(), we always have the opportunity
> to resync gpnum and completed, since __rcu_process_gp_end() is
> called first.
> 
> Am I missing something?

If the CPU is already aware of the end of the previous grace period,
then __rcu_process_gp_end() will return without doing anything.  But if
force_quiescent_state() already took care of this CPU, there is no point
in its looking for another quiescent state.  This can happen as follows:

o	CPU 0 notes the end of the previous grace period and then
	enters dyntick-idle mode.

o	CPU 2 enters a very long RCU read-side critical section.

o	CPU 1 starts a new grace period.

o	CPU 0 does not check in because it is in dyntick-idle mode.

o	CPU 1 eventually calls force_quiescent_state() a few times,
	and sees that CPU 0 is in dyntick-idle mode, so tells RCU
	that CPU 0 is in an extended quiescent state.  But the
	grace period cannot end because CPU 2 is still in its
	RCU read-side critical section.

o	CPU 0 comes out of dyntick-idle mode, and sees the new
	grace period.  The old code would nevertheless look for
	a quiescent state, and the new code would avoid doing so.

Unless I am missing something, of course...

							Thanx, Paul

> Thanks.
> 
> >  I rebased your earlier two
> > out and reworked mine, please see below.  Work better?
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit c808bedd1b1d7c720546a6682fca44c66703af4e
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Fri Dec 10 15:02:47 2010 -0800
> > 
> >     rcu: fine-tune grace-period begin/end checks
> >     
> >     Use the CPU's bit in rnp->qsmask to determine whether or not the CPU
> >     should try to report a quiescent state.  Handle overflow in the check
> >     for rdp->gpnum having fallen behind.
> >     
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index 368be76..530cdcd 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -616,9 +616,17 @@ static void __init check_cpu_stall_init(void)
> >  static void __note_new_gpnum(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp)
> >  {
> >  	if (rdp->gpnum != rnp->gpnum) {
> > -		rdp->qs_pending = 1;
> > -		rdp->passed_quiesc = 0;
> > +		/*
> > +		 * If the current grace period is waiting for this CPU,
> > +		 * set up to detect a quiescent state, otherwise don't
> > +		 * go looking for one.
> > +		 */
> >  		rdp->gpnum = rnp->gpnum;
> > +		if (rnp->qsmask & rdp->grpmask) {
> > +			rdp->qs_pending = 1;
> > +			rdp->passed_quiesc = 0;
> > +		} else
> > +			rdp->qs_pending = 0;
> >  	}
> >  }
> >  
> > @@ -680,19 +688,20 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
> >  
> >  		/*
> >  		 * If we were in an extended quiescent state, we may have
> > -		 * missed some grace periods that other CPUs took care of on
> > +		 * missed some grace periods that other CPUs handled on
> >  		 * our behalf. Catch up with this state to avoid noting
> > -		 * spurious new grace periods.
> > +		 * spurious new grace periods.  If another grace period
> > +		 * has started, then rnp->gpnum will have advanced, so
> > +		 * we will detect this later on.
> >  		 */
> > -		if (rdp->completed > rdp->gpnum)
> > +		if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
> >  			rdp->gpnum = rdp->completed;
> >  
> >  		/*
> > -		 * If another CPU handled our extended quiescent states and
> > -		 * we have no more grace periods to complete, then stop
> > -		 * chasing quiescent states.
> > +		 * If RCU does not need a quiescent state from this CPU,
> > +		 * then make sure that this CPU doesn't go looking for one.
> >  		 */
> > -		if (rdp->completed == rnp->gpnum)
> > +		if ((rnp->qsmask & rdp->grpmask) == 0)
> >  			rdp->qs_pending = 0;
> >  	}
> >  }


* Re: [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized
  2010-12-11  0:58             ` Paul E. McKenney
@ 2010-12-11  1:21               ` Frederic Weisbecker
  2010-12-11  6:36                 ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2010-12-11  1:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, laijs

On Fri, Dec 10, 2010 at 04:58:27PM -0800, Paul E. McKenney wrote:
> On Sat, Dec 11, 2010 at 01:15:17AM +0100, Frederic Weisbecker wrote:
> > On Fri, Dec 10, 2010 at 04:04:51PM -0800, Paul E. McKenney wrote:
> > > On Sat, Dec 11, 2010 at 12:47:11AM +0100, Frederic Weisbecker wrote:
> > > > On Fri, Dec 10, 2010 at 03:39:20PM -0800, Paul E. McKenney wrote:
> > > > > On Fri, Dec 10, 2010 at 03:02:00PM -0800, Paul E. McKenney wrote:
> > > > > > On Fri, Dec 10, 2010 at 10:11:11PM +0100, Frederic Weisbecker wrote:
> > > > > > > When a CPU that was in an extended quiescent state wakes
> > > > > > > up and catches up with grace periods that remote CPUs
> > > > > > > completed on its behalf, we update the completed field
> > > > > > > but not gpnum, which keeps the stale value of an earlier
> > > > > > > grace-period ID.
> > > > > > > 
> > > > > > > Later, note_new_gpnum() will interpret the gap between the
> > > > > > > local CPU's and the node's grace-period IDs as a new grace
> > > > > > > period to handle, and will start hunting for a quiescent state.
> > > > > > > 
> > > > > > > But if every grace period has already been completed, this
> > > > > > > interpretation is wrong, and we get stuck in clusters
> > > > > > > of spurious softirqs because rcu_report_qs_rdp() makes
> > > > > > > this broken state run into an infinite loop.
> > > > > > > 
> > > > > > > The solution, as suggested by Lai Jiangshan, is to ensure that
> > > > > > > the gpnum and completed fields are kept synchronized when we catch
> > > > > > > up with grace periods that other CPUs completed on our behalf.
> > > > > > > This way we won't start noting spurious new grace periods.
> > > > > > 
> > > > > > Also good, queued!
> > > > > > 
> > > > > > One issue -- this approach is vulnerable to overflow.  I therefore
> > > > > > followed up with a patch that changes the condition to
> > > > > > 
> > > > > > 	if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
> > > > > 
> > > > > And here is the follow-up patch, FWIW.
> > > > > 
> > > > > 							Thanx, Paul
> > > > 
> > > > Hmm, it doesn't apply on top of my two patches. It seems you have
> > > > kept my two previous patches in your tree, which makes it fail here
> > > > because my tree lacks them as a base.
> > > > 
> > > > Did you intend to keep them? I hope they are quite useless now, otherwise
> > > > it means there are other cases I forgot.
> > > 
> > > One is indeed useless, while the other is useful in combinations of
> > > dyntick-idle and force_quiescent_state().
> > 
> > I don't see how.
> > 
> > Before we call __note_new_gpnum(), we always have the opportunity
> > to resync gpnum and completed, as __rcu_process_gp_end() is called
> > first.
> > 
> > Am I missing something?
> 
> If the CPU is already aware of the end of the previous grace period,
> then __rcu_process_gp_end() will return without doing anything.  But if
> force_quiescent_state() already took care of this CPU, there is no point
> in its looking for another quiescent state.  This can happen as follows:
> 
> o	CPU 0 notes the end of the previous grace period and then
> 	enters dyntick-idle mode.
> 
> o	CPU 2 enters a very long RCU read-side critical section.
> 
> o	CPU 1 starts a new grace period.
> 
> o	CPU 0 does not check in because it is in dyntick-idle mode.
> 
> o	CPU 1 eventually calls force_quiescent_state() a few times,
> 	and sees that CPU 0 is in dyntick-idle mode, so tells RCU
> 	that CPU 0 is in an extended quiescent state.  But the
> 	grace period cannot end because CPU 2 is still in its
> 	RCU read-side critical section.
> 
> o	CPU 0 comes out of dyntick-idle mode, and sees the new
> 	grace period.  The old code would nevertheless look for
> 	a quiescent state, and the new code would avoid doing so.
> 
> Unless I am missing something, of course...
> 
> 							Thanx, Paul

Aah, so in your scenario, CPUs 0, 1 and 2 are on the same node (rnp),
and we have not updated rnp->completed because we are still waiting
for CPU 2.

Then __rcu_process_gp_end() won't advance gpnum either,
because rnp->completed is still equal to rdp->completed.

And later on we call note_new_gpnum(), which wrongly thinks
it has a new gp to handle.

Hence the need to check the qsmask there.

This all makes sense!

Thanks!
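
To restate it for myself, here is the __note_new_gpnum() side of your
patch condensed into a standalone sketch. The structs are reduced
stand-ins holding only the fields this discussion touches, not the
real kernel types:

/* Reduced stand-ins for rcu_node/rcu_data. */
struct rcu_node { unsigned long gpnum, qsmask; };
struct rcu_data {
	unsigned long gpnum, grpmask;
	int qs_pending, passed_quiesc;
};

static void note_new_gpnum_sketch(struct rcu_node *rnp,
				  struct rcu_data *rdp)
{
	if (rdp->gpnum == rnp->gpnum)
		return;			/* no new GP to note */
	rdp->gpnum = rnp->gpnum;
	if (rnp->qsmask & rdp->grpmask) {
		/* The new GP is still waiting on us: chase a QS. */
		rdp->qs_pending = 1;
		rdp->passed_quiesc = 0;
	} else {
		/*
		 * force_quiescent_state() already reported our
		 * extended QS, so don't go looking for another one.
		 */
		rdp->qs_pending = 0;
	}
}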

> 
> > Thanks.
> > 
> > >  I rebased your earlier two
> > > out and reworked mine, please see below.  Work better?
> > > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > commit c808bedd1b1d7c720546a6682fca44c66703af4e
> > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Date:   Fri Dec 10 15:02:47 2010 -0800
> > > 
> > >     rcu: fine-tune grace-period begin/end checks
> > >     
> > >     Use the CPU's bit in rnp->qsmask to determine whether or not the CPU
> > >     should try to report a quiescent state.  Handle overflow in the check
> > >     for rdp->gpnum having fallen behind.
> > >     
> > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > 
> > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > > index 368be76..530cdcd 100644
> > > --- a/kernel/rcutree.c
> > > +++ b/kernel/rcutree.c
> > > @@ -616,9 +616,17 @@ static void __init check_cpu_stall_init(void)
> > >  static void __note_new_gpnum(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp)
> > >  {
> > >  	if (rdp->gpnum != rnp->gpnum) {
> > > -		rdp->qs_pending = 1;
> > > -		rdp->passed_quiesc = 0;
> > > +		/*
> > > +		 * If the current grace period is waiting for this CPU,
> > > +		 * set up to detect a quiescent state, otherwise don't
> > > +		 * go looking for one.
> > > +		 */
> > >  		rdp->gpnum = rnp->gpnum;
> > > +		if (rnp->qsmask & rdp->grpmask) {
> > > +			rdp->qs_pending = 1;
> > > +			rdp->passed_quiesc = 0;
> > > +		} else
> > > +			rdp->qs_pending = 0;
> > >  	}
> > >  }
> > >  
> > > @@ -680,19 +688,20 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
> > >  
> > >  		/*
> > >  		 * If we were in an extended quiescent state, we may have
> > > -		 * missed some grace periods that others CPUs took care on
> > > +		 * missed some grace periods that other CPUs handled on
> > >  		 * our behalf. Catch up with this state to avoid noting
> > > -		 * spurious new grace periods.
> > > +		 * spurious new grace periods.  If another grace period
> > > +		 * has started, then rnp->gpnum will have advanced, so
> > > +		 * we will detect this later on.
> > >  		 */
> > > -		if (rdp->completed > rdp->gpnum)
> > > +		if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
> > >  			rdp->gpnum = rdp->completed;
> > >  
> > >  		/*
> > > -		 * If another CPU handled our extended quiescent states and
> > > -		 * we have no more grace period to complete yet, then stop
> > > -		 * chasing quiescent states.
> > > +		 * If RCU does not need a quiescent state from this CPU,
> > > +		 * then make sure that this CPU doesn't go looking for one.
> > >  		 */
> > > -		if (rdp->completed == rnp->gpnum)
> > > +		if ((rnp->qsmask & rdp->grpmask) == 0)
> > >  			rdp->qs_pending = 0;
> > >  	}
> > >  }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized
  2010-12-11  1:21               ` Frederic Weisbecker
@ 2010-12-11  6:36                 ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2010-12-11  6:36 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, laijs

On Sat, Dec 11, 2010 at 02:21:04AM +0100, Frederic Weisbecker wrote:
> On Fri, Dec 10, 2010 at 04:58:27PM -0800, Paul E. McKenney wrote:
> > On Sat, Dec 11, 2010 at 01:15:17AM +0100, Frederic Weisbecker wrote:
> > > On Fri, Dec 10, 2010 at 04:04:51PM -0800, Paul E. McKenney wrote:
> > > > On Sat, Dec 11, 2010 at 12:47:11AM +0100, Frederic Weisbecker wrote:
> > > > > On Fri, Dec 10, 2010 at 03:39:20PM -0800, Paul E. McKenney wrote:
> > > > > > On Fri, Dec 10, 2010 at 03:02:00PM -0800, Paul E. McKenney wrote:
> > > > > > > On Fri, Dec 10, 2010 at 10:11:11PM +0100, Frederic Weisbecker wrote:
> > > > > > > > When a CPU that was in an extended quiescent state wakes
> > > > > > > > up and catches up with grace periods that remote CPUs
> > > > > > > > completed on its behalf, we update the completed field
> > > > > > > > but not gpnum, which keeps the stale value of an earlier
> > > > > > > > grace period ID.
> > > > > > > > 
> > > > > > > > Later, note_new_gpnum() will interpret the gap between
> > > > > > > > the local CPU's and the node's grace period IDs as a new
> > > > > > > > grace period to handle and will then start to hunt for a
> > > > > > > > quiescent state.
> > > > > > > > 
> > > > > > > > But if every grace period has already been completed, this
> > > > > > > > interpretation becomes broken. And we'll be stuck in clusters
> > > > > > > > of spurious softirqs because rcu_report_qs_rdp() will make
> > > > > > > > this broken state run into an infinite loop.
> > > > > > > > 
> > > > > > > > The solution, as suggested by Lai Jiangshan, is to ensure that
> > > > > > > > the gpnum and completed fields are well synchronized when we
> > > > > > > > catch up with grace periods completed on our behalf by other
> > > > > > > > CPUs. This way we won't start noting spurious new grace periods.
> > > > > > > 
> > > > > > > Also good, queued!
> > > > > > > 
> > > > > > > One issue -- this approach is vulnerable to overflow.  I therefore
> > > > > > > followed up with a patch that changes the condition to
> > > > > > > 
> > > > > > > 	if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
> > > > > > 
> > > > > > And here is the follow-up patch, FWIW.
> > > > > > 
> > > > > > 							Thanx, Paul
> > > > > 
> > > > > Hmm, it doesn't apply on top of my two patches. It seems you have
> > > > > kept my two previous patches in your tree, which makes it fail here
> > > > > because my tree lacks them as a base.
> > > > > 
> > > > > Did you intend to keep them? I hope they are quite useless now, otherwise
> > > > > it means there are other cases I forgot.
> > > > 
> > > > One is indeed useless, while the other is useful in combinations of
> > > > dyntick-idle and force_quiescent_state().
> > > 
> > > I don't see how.
> > > 
> > > Before we call __note_new_gpnum(), we always have the opportunity
> > > to resync gpnum and completed, as __rcu_process_gp_end() is called
> > > first.
> > > 
> > > Am I missing something?
> > 
> > If the CPU is already aware of the end of the previous grace period,
> > then __rcu_process_gp_end() will return without doing anything.  But if
> > force_quiescent_state() already took care of this CPU, there is no point
> > in its looking for another quiescent state.  This can happen as follows:
> > 
> > o	CPU 0 notes the end of the previous grace period and then
> > 	enters dyntick-idle mode.
> > 
> > o	CPU 2 enters a very long RCU read-side critical section.
> > 
> > o	CPU 1 starts a new grace period.
> > 
> > o	CPU 0 does not check in because it is in dyntick-idle mode.
> > 
> > o	CPU 1 eventually calls force_quiescent_state() a few times,
> > 	and sees that CPU 0 is in dyntick-idle mode, so tells RCU
> > 	that CPU 0 is in an extended quiescent state.  But the
> > 	grace period cannot end because CPU 2 is still in its
> > 	RCU read-side critical section.
> > 
> > o	CPU 0 comes out of dyntick-idle mode, and sees the new
> > 	grace period.  The old code would nevertheless look for
> > 	a quiescent state, and the new code would avoid doing so.
> > 
> > Unless I am missing something, of course...
> > 
> > 							Thanx, Paul
> 
> Aah, so in your scenario, CPUs 0, 1 and 2 are on the same node (rnp),
> and we have not updated rnp->completed because we are still waiting
> for CPU 2.
> 
> Then __rcu_process_gp_end() won't advance gpnum either,
> because rnp->completed is still equal to rdp->completed.
> 
> And later on we call note_new_gpnum(), which wrongly thinks
> it has a new gp to handle.
> 
> Hence the need to check the qsmask there.

CPUs 0, 1, and 2 are not necessarily on the same node, but other than
that, you have it exactly.  The trick is that force_quiescent_state()
takes global action, so CPU 1 does not need to be on the same node as CPU 0.
Furthermore, an RCU read-side critical section anywhere in the system
will prevent any subsequent grace period from completing, so CPU 2
can be on yet another node.
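
If it helps to visualize, here is a toy model of the forcing pass.
Everything in it is made up for the example (two leaves of two CPUs,
a stand-in idle check); it is not the kernel's data layout:

#include <stdio.h>

#define CPUS_PER_LEAF 2
struct leaf { unsigned long qsmask; };	/* toy leaf rcu_node */

static int cpu_is_dyntick_idle(int cpu)
{
	return cpu == 0;	/* pretend CPU 0 is idle, as above */
}

int main(void)
{
	struct leaf leaves[2] = { { 0x3 }, { 0x3 } };
	int cpu;

	/*
	 * CPU 1 drives the forcing pass, but the pass scans every
	 * leaf, so it can report CPU 0's extended QS even if they
	 * sit under different nodes.  Leaf 1 keeps a nonzero mask
	 * because CPU 2 (bit 0 of leaf 1) never checks in, so the
	 * grace period still cannot end.
	 */
	for (cpu = 0; cpu < 2 * CPUS_PER_LEAF; cpu++)
		if (cpu_is_dyntick_idle(cpu))
			leaves[cpu / CPUS_PER_LEAF].qsmask &=
				~(1UL << (cpu % CPUS_PER_LEAF));

	printf("leaf 0 qsmask: %#lx\n", leaves[0].qsmask); /* 0x2 */
	printf("leaf 1 qsmask: %#lx\n", leaves[1].qsmask); /* 0x3 */
	return 0;
}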

> This all makes sense!

Thank you for the confirmation, I will try testing it out more thoroughly.

							Thanx, Paul

> Thanks!
> 
> > 
> > > Thanks.
> > > 
> > > >  I rebased your earlier two
> > > > out and reworked mine, please see below.  Work better?
> > > > 
> > > > 							Thanx, Paul
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > commit c808bedd1b1d7c720546a6682fca44c66703af4e
> > > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > Date:   Fri Dec 10 15:02:47 2010 -0800
> > > > 
> > > >     rcu: fine-tune grace-period begin/end checks
> > > >     
> > > >     Use the CPU's bit in rnp->qsmask to determine whether or not the CPU
> > > >     should try to report a quiescent state.  Handle overflow in the check
> > > >     for rdp->gpnum having fallen behind.
> > > >     
> > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > > > index 368be76..530cdcd 100644
> > > > --- a/kernel/rcutree.c
> > > > +++ b/kernel/rcutree.c
> > > > @@ -616,9 +616,17 @@ static void __init check_cpu_stall_init(void)
> > > >  static void __note_new_gpnum(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp)
> > > >  {
> > > >  	if (rdp->gpnum != rnp->gpnum) {
> > > > -		rdp->qs_pending = 1;
> > > > -		rdp->passed_quiesc = 0;
> > > > +		/*
> > > > +		 * If the current grace period is waiting for this CPU,
> > > > +		 * set up to detect a quiescent state, otherwise don't
> > > > +		 * go looking for one.
> > > > +		 */
> > > >  		rdp->gpnum = rnp->gpnum;
> > > > +		if (rnp->qsmask & rdp->grpmask) {
> > > > +			rdp->qs_pending = 1;
> > > > +			rdp->passed_quiesc = 0;
> > > > +		} else
> > > > +			rdp->qs_pending = 0;
> > > >  	}
> > > >  }
> > > >  
> > > > @@ -680,19 +688,20 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
> > > >  
> > > >  		/*
> > > >  		 * If we were in an extended quiescent state, we may have
> > > > -		 * missed some grace periods that others CPUs took care on
> > > > +		 * missed some grace periods that other CPUs handled on
> > > >  		 * our behalf. Catch up with this state to avoid noting
> > > > -		 * spurious new grace periods.
> > > > +		 * spurious new grace periods.  If another grace period
> > > > +		 * has started, then rnp->gpnum will have advanced, so
> > > > +		 * we will detect this later on.
> > > >  		 */
> > > > -		if (rdp->completed > rdp->gpnum)
> > > > +		if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
> > > >  			rdp->gpnum = rdp->completed;
> > > >  
> > > >  		/*
> > > > -		 * If another CPU handled our extended quiescent states and
> > > > -		 * we have no more grace period to complete yet, then stop
> > > > -		 * chasing quiescent states.
> > > > +		 * If RCU does not need a quiescent state from this CPU,
> > > > +		 * then make sure that this CPU doesn't go looking for one.
> > > >  		 */
> > > > -		if (rdp->completed == rnp->gpnum)
> > > > +		if ((rnp->qsmask & rdp->grpmask) == 0)
> > > >  			rdp->qs_pending = 0;
> > > >  	}
> > > >  }

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2010-12-11  6:36 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-10 21:11 [PATCH 0/2 v2] rcu: Fix series of spurious RCU softirqs Frederic Weisbecker
2010-12-10 21:11 ` [PATCH 1/2] rcu: Stop chasing QS if another CPU did it for us Frederic Weisbecker
2010-12-10 22:58   ` Paul E. McKenney
2010-12-10 23:33     ` Frederic Weisbecker
2010-12-10 21:14 ` [PATCH 0/2 v2] rcu: Fix series of spurious RCU softirqs Frederic Weisbecker
     [not found] ` <1292015471-19227-3-git-send-email-fweisbec@gmail.com>
2010-12-10 23:02   ` [PATCH 2/2] rcu: Keep gpnum and completed fields synchronized Paul E. McKenney
2010-12-10 23:39     ` Paul E. McKenney
2010-12-10 23:47       ` Frederic Weisbecker
2010-12-11  0:04         ` Paul E. McKenney
2010-12-11  0:15           ` Frederic Weisbecker
2010-12-11  0:58             ` Paul E. McKenney
2010-12-11  1:21               ` Frederic Weisbecker
2010-12-11  6:36                 ` Paul E. McKenney
2010-12-11  0:00     ` Frederic Weisbecker
2010-12-11  0:48       ` Paul E. McKenney
2010-12-11  0:51         ` Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).