* CPU trying to start a GP when no CBs were assigned new GP numbers
@ 2020-06-17  0:24 Joel Fernandes
  2020-06-17  4:06 ` Paul E. McKenney
  0 siblings, 1 reply; 5+ messages in thread
From: Joel Fernandes @ 2020-06-17  0:24 UTC (permalink / raw)
  To: rcu, Paul E. McKenney

Hi,
I am seeing something a bit strange with RCU where it is trying to
start a GP twice from a CPU even though no new CB was queued on that
CPU. It is quite possible that I'm missing something. Anyway, I wrote
a patch to add some tracing when CBs are queued into the segcb. I am
planning to post this trace patch later.

The trace in the link below shows CPU2 queuing around 5 CBs, which
then get accelerated at 5.192123. The GP thread running on CPU3
starts a new GP. Now the CPU2 softirq runs again (roughly 1ms after
the previous acceleration), probably because the GP thread is
expecting a QS report from CPU2. When CPU2's softirq runs, though,
it does another acceleration, which triggers a second new GP start.
This seems a bit unnecessary AFAICS, because the need for GP *832
was already recorded, which is all CPU2 should really care about,
right?

Here is the trace: https://pastebin.com/raw/AYGzu1g4

Thanks,

  - Joel


* Re: CPU trying to start a GP when no CBs were assigned new GP numbers
  2020-06-17  0:24 CPU trying to start a GP when no CBs were assigned new GP numbers Joel Fernandes
@ 2020-06-17  4:06 ` Paul E. McKenney
  2020-06-17  6:39   ` Joel Fernandes
  0 siblings, 1 reply; 5+ messages in thread
From: Paul E. McKenney @ 2020-06-17  4:06 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: rcu

On Tue, Jun 16, 2020 at 08:24:37PM -0400, Joel Fernandes wrote:
> Hi,
> I am seeing something a bit strange with RCU where it is trying to
> start a GP twice from a CPU even though no new CB was queued on that
> CPU. It is quite possible that I'm missing something. Anyway, I wrote
> a patch to add some tracing when CBs are queued into the segcb. I am
> planning to post this trace patch later.
> 
> The trace in the link below shows CPU2 queuing around 5 CBs, which
> then get accelerated at 5.192123. The GP thread running on CPU3
> starts a new GP. Now the CPU2 softirq runs again (roughly 1ms after
> the previous acceleration), probably because the GP thread is
> expecting a QS report from CPU2. When CPU2's softirq runs, though,
> it does another acceleration, which triggers a second new GP start.
> This seems a bit unnecessary AFAICS, because the need for GP *832
> was already recorded, which is all CPU2 should really care about,
> right?
> 
> Here is the trace: https://pastebin.com/raw/AYGzu1g4

Assuming that the WAIT= and NEXT_READY= numbers are grace-period numbers,
this trace is expected behavior for two sets of callbacks, one that
arrived at CPU 2 by time 5.192121 and another that arrived between then
and time 5.193131.

So I have to ask...  What tells you that no callbacks arrived at CPU 2
during this interval?

On the other hand, if CPU 2 is offloaded, what you might be seeing is
the delayed drain of callbacks from the bypass.

							Thanx, Paul


* Re: CPU trying to start a GP when no CBs were assigned new GP numbers
  2020-06-17  4:06 ` Paul E. McKenney
@ 2020-06-17  6:39   ` Joel Fernandes
  2020-06-17 14:01     ` Paul E. McKenney
  0 siblings, 1 reply; 5+ messages in thread
From: Joel Fernandes @ 2020-06-17  6:39 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: rcu

On Tue, Jun 16, 2020 at 09:06:14PM -0700, Paul E. McKenney wrote:
> On Tue, Jun 16, 2020 at 08:24:37PM -0400, Joel Fernandes wrote:
> > Hi,
> > I am seeing something a bit strange with RCU where it is trying to
> > start a GP twice from a CPU even though no new CB was queued on that
> > CPU. It is quite possible that I'm missing something. Anyway, I wrote
> > a patch to add some tracing when CBs are queued into the segcb. I am
> > planning to post this trace patch later.
> > 
> > The trace in the link below shows CPU2 queuing around 5 CBs, which
> > then get accelerated at 5.192123. The GP thread running on CPU3
> > starts a new GP. Now the CPU2 softirq runs again (roughly 1ms after
> > the previous acceleration), probably because the GP thread is
> > expecting a QS report from CPU2. When CPU2's softirq runs, though,
> > it does another acceleration, which triggers a second new GP start.
> > This seems a bit unnecessary AFAICS, because the need for GP *832
> > was already recorded, which is all CPU2 should really care about,
> > right?
> > 
> > Here is the trace: https://pastebin.com/raw/AYGzu1g4
> 
> Assuming that the WAIT= and NEXT_READY= numbers are grace-period numbers,

In the trace there are two numbers each for WAIT and NEXT_READY: the
number of callbacks and the gp_seq number. Sorry, I should have
clarified that.

> this trace is expected behavior for two sets of callbacks, one that
> arrived at CPU 2 by time 5.192121 and another that arrived between then
> and time 5.193131.

There is just 1 set of callbacks.
> 
> So I have to ask...  What tells you that no callbacks arrived at CPU 2
> during this interval?

Because no rcu_callback tracepoint fired in the interim. The number of
callbacks that I'm tracing also confirms this.

> 
> On the other hand, if CPU 2 is offloaded, what you might be seeing is
> the delayed drain of callbacks from the bypass.

Sorry, I should have clarified that it was not offloaded.

I dug deeper and noticed that during acceleration, it is possible that
the gp_seq numbers of empty segments are updated. In this case,
rcu_segcblist_accelerate() still returns true, resulting in a new future
GP being started. The patch below cures it, but I'm not sure whether it
introduces other issues. In light testing, it appears to work. WDYT?
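
For reference, here is roughly the layout of the segmented callback
list involved (paraphrased and simplified from
include/linux/rcu_segcblist.h, so treat it as a sketch rather than the
exact definition):

/* Segment indices; each one names the tail of its segment. */
#define RCU_DONE_TAIL       0   /* CBs whose grace period has ended. */
#define RCU_WAIT_TAIL       1   /* CBs waiting for the current GP. */
#define RCU_NEXT_READY_TAIL 2   /* CBs waiting for the next GP. */
#define RCU_NEXT_TAIL       3   /* CBs not yet assigned to a GP. */
#define RCU_CBLIST_NSEGS    4

struct rcu_segcblist {
	struct rcu_head *head;                      /* One list of CBs... */
	struct rcu_head **tails[RCU_CBLIST_NSEGS];  /* ...split by tail ptrs. */
	unsigned long gp_seq[RCU_CBLIST_NSEGS];     /* GP number per segment. */
	long len;                                   /* Total CB count. */
};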

---8<-----------------------

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 5f4fd3b8777ca..ebdba1d95f629 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -446,7 +478,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
  */
 bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 {
-	int i;
+	int i, oldest_seg;
 
 	WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp));
 	if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
@@ -465,6 +497,9 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 		    ULONG_CMP_LT(rsclp->gp_seq[i], seq))
 			break;
 
+	/* The oldest segment after which everything later is merged. */
+	oldest_seg = i;
+
 	/*
 	 * If all the segments contain callbacks that correspond to
 	 * earlier grace-period sequence numbers than "seq", leave.
@@ -488,10 +523,19 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 	 * where there were no pending callbacks in the rcu_segcblist
 	 * structure other than in the RCU_NEXT_TAIL segment.
 	 */
 	for (; i < RCU_NEXT_TAIL; i++) {
 		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_NEXT_TAIL]);
 		rsclp->gp_seq[i] = seq;
 	}
+
+	/*
+	 * If all segments after oldest_seg were empty, then new GP numbers
+	 * were assigned to empty segments. In this case, no need to start
+	 * those future GPs.
+	 */
+	if (rcu_segcblist_restempty(rsclp, oldest_seg))
+		return false;
+
 	return true;
 }
 



* Re: CPU trying to start a GP when no CBs were assigned new GP numbers
  2020-06-17  6:39   ` Joel Fernandes
@ 2020-06-17 14:01     ` Paul E. McKenney
  2020-06-17 14:44       ` Joel Fernandes
  0 siblings, 1 reply; 5+ messages in thread
From: Paul E. McKenney @ 2020-06-17 14:01 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: rcu


On Wed, Jun 17, 2020 at 02:39:54AM -0400, Joel Fernandes wrote:
> On Tue, Jun 16, 2020 at 09:06:14PM -0700, Paul E. McKenney wrote:
> > On Tue, Jun 16, 2020 at 08:24:37PM -0400, Joel Fernandes wrote:
> > > Hi,
> > > I am seeing something a bit strange with RCU where it is trying to
> > > start a GP twice from a CPU even though no new CB was queued on that
> > > CPU. It is quite possible that I'm missing something. Anyway, I wrote
> > > a patch to add some tracing when CBs are queued into the segcb. I am
> > > planning to post this trace patch later.
> > > 
> > > The trace in the link below shows CPU2 queuing around 5 CBs, which
> > > then get accelerated at 5.192123. The GP thread running on CPU3
> > > starts a new GP. Now the CPU2 softirq runs again (roughly 1ms after
> > > the previous acceleration), probably because the GP thread is
> > > expecting a QS report from CPU2. When CPU2's softirq runs, though,
> > > it does another acceleration, which triggers a second new GP start.
> > > This seems a bit unnecessary AFAICS, because the need for GP *832
> > > was already recorded, which is all CPU2 should really care about,
> > > right?
> > > 
> > > Here is the trace: https://pastebin.com/raw/AYGzu1g4
> > 
> > Assuming that the WAIT= and NEXT_READY= numbers are grace-period numbers,
> 
> In the trace there are two numbers each for WAIT and NEXT_READY: the
> number of callbacks and the gp_seq number. Sorry, I should have
> clarified that.

Ah, I see.  What are you doing to compute the number of callbacks?

> > this trace is expected behavior for two sets of callbacks, one that
> > arrived at CPU 2 by time 5.192121 and another that arrived between then
> > and time 5.193131.
> 
> There is just 1 set of callbacks.
> > 
> > So I have to ask...  What tells you that no callbacks arrived at CPU 2
> > during this interval?
> 
> Because no rcu_callback tracepoint fired in the interim. The number of
> callbacks that I'm tracing also confirms this.
> 
> > 
> > On the other hand, if CPU 2 is offloaded, what you might be seeing is
> > the delayed drain of callbacks from the bypass.
> 
> Sorry, I should have clarified that it was not offloaded.
> 
> I dug deeper and noticed that during acceleration, it is possible that
> the gp_seq numbers of empty segments are updated. In this case,
> rcu_segcblist_accelerate() still returns true, resulting in a new future
> GP being started. The patch below cures it, but I'm not sure whether it
> introduces other issues. In light testing, it appears to work. WDYT?

You are quite correct that empty segments should not be assigned
grace-period numbers.  Or at least that any such grace-period numbers
should not be taken very seriously.

> ---8<-----------------------
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 5f4fd3b8777ca..ebdba1d95f629 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -446,7 +478,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
>   */
>  bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
>  {
> -	int i;
> +	int i, oldest_seg;
>  
>  	WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp));
>  	if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
> @@ -465,6 +497,9 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
>  		    ULONG_CMP_LT(rsclp->gp_seq[i], seq))
>  			break;
>  
> +	/* The oldest segment after which everything later is merged. */
> +	oldest_seg = i;
> +
>  	/*
>  	 * If all the segments contain callbacks that correspond to
>  	 * earlier grace-period sequence numbers than "seq", leave.
> @@ -488,10 +523,19 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
>  	 * where there were no pending callbacks in the rcu_segcblist
>  	 * structure other than in the RCU_NEXT_TAIL segment.
>  	 */
>  	for (; i < RCU_NEXT_TAIL; i++) {
>  		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_NEXT_TAIL]);
>  		rsclp->gp_seq[i] = seq;
>  	}
> +
> +	/*
> +	 * If all segments after oldest_seg were empty, then new GP numbers
> +	 * were assigned to empty segments. In this case, no need to start
> +	 * those future GPs.
> +	 */
> +	if (rcu_segcblist_restempty(rsclp, oldest_seg))
> +		return false;
> +
>  	return true;
>  }

Looks like this needs a focused test program.  ;-)

Please see attached for an outdated test userspace program.  It probably
needs help to check for this additional case.  And probably also other
changes to account for three years of change.

This test program works by copying files from the designated Linux
source tree, then building them into a userspace torture test.

							Thanx, Paul


[-- Attachment #2: rcusegcblisttest.2020.06.17a.tgz --]
[-- Type: application/x-gtar-compressed, Size: 10401 bytes --]


* Re: CPU trying to start a GP when no CBs were assigned new GP numbers
  2020-06-17 14:01     ` Paul E. McKenney
@ 2020-06-17 14:44       ` Joel Fernandes
  0 siblings, 0 replies; 5+ messages in thread
From: Joel Fernandes @ 2020-06-17 14:44 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: rcu

On Wed, Jun 17, 2020 at 07:01:00AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 17, 2020 at 02:39:54AM -0400, Joel Fernandes wrote:
> > On Tue, Jun 16, 2020 at 09:06:14PM -0700, Paul E. McKenney wrote:
> > > On Tue, Jun 16, 2020 at 08:24:37PM -0400, Joel Fernandes wrote:
> > > > Hi,
> > > > I am seeing something a bit strange with RCU where it is trying to
> > > > start a GP twice from a CPU even though no new CB was queued on that
> > > > CPU. It is quite possible that I'm missing something. Anyway, I wrote
> > > > a patch to add some tracing when CBs are queued into the segcb. I am
> > > > planning to post this trace patch later.
> > > > 
> > > > The trace in the link below shows CPU2 queuing around 5 CBs, which
> > > > then get accelerated at 5.192123. The GP thread running on CPU3
> > > > starts a new GP. Now the CPU2 softirq runs again (roughly 1ms after
> > > > the previous acceleration), probably because the GP thread is
> > > > expecting a QS report from CPU2. When CPU2's softirq runs, though,
> > > > it does another acceleration, which triggers a second new GP start.
> > > > This seems a bit unnecessary AFAICS, because the need for GP *832
> > > > was already recorded, which is all CPU2 should really care about,
> > > > right?
> > > > 
> > > > Here is the trace: https://pastebin.com/raw/AYGzu1g4
> > > 
> > > Assuming that the WAIT= and NEXT_READY= numbers are grace-period numbers,
> > 
> > In the trace there are two numbers each for WAIT and NEXT_READY: the
> > number of callbacks and the gp_seq number. Sorry, I should have
> > clarified that.
> 
> Ah, I see.  What are you doing to compute the number of callbacks?

Here is my segcb trace patch, I added a rcu_segcblist_countseq() function:
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?id=794cba82fe8b5c72df46d4f2548957db36317d39
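
In case it is useful, the helper looks roughly like this (reconstructed
from memory rather than copied from the commit, so the real code may
differ in detail):

/*
 * Count the CBs in each segment by walking ->head up to each tail
 * pointer, and snapshot each segment's gp_seq. Sketch only; see the
 * linked commit for the real helper.
 */
static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
				   int cbcount[RCU_CBLIST_NSEGS],
				   unsigned long gpseq[RCU_CBLIST_NSEGS])
{
	struct rcu_head **cur = &rsclp->head;
	int i;

	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
		cbcount[i] = 0;
		while (cur != rsclp->tails[i]) {
			cbcount[i]++;
			cur = &(*cur)->next;
		}
		gpseq[i] = rsclp->gp_seq[i];
	}
}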

> > > On the other hand, if CPU 2 is offloaded, what you might be seeing is
> > > the delayed drain of callbacks from the bypass.
> > 
> > Sorry, I should have clarified that it was not offloaded.
> > 
> > I dug deeper and noticed that during acceleration, it is possible that
> > the gp_seq numbers of empty segments are updated. In this case,
> > rcu_segcblist_accelerate() still returns true, resulting in a new future
> > GP being started. The patch below cures it, but I'm not sure whether it
> > introduces other issues. In light testing, it appears to work. WDYT?
> 
> You are quite correct that empty segments should not be assigned
> grace-period numbers.  Or at least that any such grace-period numbers
> should not be taken very seriously.

Thanks. This is what the diff I shared below does: basically, return
false from _accelerate() if all the segments that were assigned new
gp_seq numbers are empty.
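
To make the failure mode concrete, here is the kind of state the trace
showed (the numbers are illustrative, matching the shape of the trace
rather than its exact values):

/*
 * Before the second softirq's rcu_segcblist_accelerate(rsclp, 836):
 *
 *   DONE (empty) | WAIT (5 CBs, gp_seq 832) | NEXT_READY (empty) | NEXT (empty)
 *
 * The backward scan stops at WAIT, the oldest non-empty segment with
 * gp_seq < 836. The merge loop then stamps gp_seq 836 into the empty
 * NEXT_READY segment and returns true, so the caller requests GP 836
 * even though no callback needs it. With the diff, oldest_seg ends up
 * at RCU_WAIT_TAIL; everything past WAIT is empty, so the final
 * rcu_segcblist_restempty() check makes _accelerate() return false.
 */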

> > ---8<-----------------------
> > 
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index 5f4fd3b8777ca..ebdba1d95f629 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -446,7 +478,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
> >   */
> >  bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
> >  {
> > -	int i;
> > +	int i, oldest_seg;
> >  
> >  	WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp));
> >  	if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
> > @@ -465,6 +497,9 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
> >  		    ULONG_CMP_LT(rsclp->gp_seq[i], seq))
> >  			break;
> >  
> > +	/* The oldest segment after which everything later is merged. */
> > +	oldest_seg = i;
> > +
> >  	/*
> >  	 * If all the segments contain callbacks that correspond to
> >  	 * earlier grace-period sequence numbers than "seq", leave.
> > @@ -488,10 +523,19 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
> >  	 * where there were no pending callbacks in the rcu_segcblist
> >  	 * structure other than in the RCU_NEXT_TAIL segment.
> >  	 */
> >  	for (; i < RCU_NEXT_TAIL; i++) {
> >  		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_NEXT_TAIL]);
> >  		rsclp->gp_seq[i] = seq;
> >  	}
> > +
> > +	/*
> > +	 * If all segments after oldest_seg were empty, then new GP numbers
> > +	 * were assigned to empty segments. In this case, no need to start
> > +	 * those future GPs.
> > +	 */
> > +	if (rcu_segcblist_restempty(rsclp, oldest_seg))
> > +		return false;
> > +
> >  	return true;
> >  }
> 
> Looks like this needs a focused test program.  ;-)
> 
> Please see attached for an outdated test userspace program.  It probably
> needs help to check for this additional case.  And probably also other
> changes to account for three years of change.
> 
> This test program works by copying files from the designated Linux
> source tree, then building them into a userspace torture test.

Thanks, I will give it a try.

I also added a warning to the trace patch I linked above, at the end of
_accelerate(), to make sure the NEXT segment is always merged:

+	/*
+	 * Make sure the NEXT list is always empty after an acceleration.
+	 */
+	WARN_ON_ONCE(!rcu_segcblist_restempty(rsclp, RCU_NEXT_READY_TAIL));


thanks,

 - Joel


