rcu.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: Frederic Weisbecker <frederic@kernel.org>,
	rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
	mingo@kernel.org, jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
	fweisbec@gmail.com, oleg@redhat.com
Subject: Re: [PATCH RFC tip/core/rcu 14/14] rcu/nohz: Make multi_cpu_stop() enable tick on all online CPUs
Date: Wed, 14 Aug 2019 15:05:16 -0700	[thread overview]
Message-ID: <20190814220516.GY28441@linux.ibm.com> (raw)
In-Reply-To: <20190814175546.GB68498@google.com>

On Wed, Aug 14, 2019 at 01:55:46PM -0400, Joel Fernandes wrote:
> On Tue, Aug 13, 2019 at 07:48:09AM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 13, 2019 at 02:30:19PM +0200, Frederic Weisbecker wrote:
> > > On Mon, Aug 12, 2019 at 04:23:16PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Aug 12, 2019 at 11:02:33PM +0200, Frederic Weisbecker wrote:
> > > > > On Fri, Aug 02, 2019 at 08:15:01AM -0700, Paul E. McKenney wrote:
> > > > > Looks like it's not the right fix but, should you ever need to set an
> > > > > all-CPUs (system wide) tick dependency in the future, you can use tick_set_dep().
> > > > 
> > > > Indeed, I have dropped this patch, but I now do something similar in
> > > > RCU's CPU-hotplug notifiers.  Which does have an effect, especially on
> > > > the system that isn't subject to the insane-latency cpu_relax().
> > > > 
> > > > Plus I am having to put a similar workaround into RCU's quiescent-state
> > > > forcing logic.
> > > > 
> > > > But how should this really be done?
> > > > 
> > > > Isn't there some sort of monitoring of nohz_full CPUs for accounting
> > > > purposes?  If so, would it make sense for this monitoring to check for
> > > > long-duration kernel execution and enable the tick in this case?  The
> > > > RCU dyntick machinery can be used to remotely detect the long-duration
> > > > kernel execution using something like the following:
> > > > 
> > > > 	int nohz_in_kernel_snap = rcu_dynticks_snap_cpu(cpu);
> > > > 
> > > > 	...
> > > > 
> > > > 	if (rcu_dynticks_in_eqs_cpu(cpu, nohz_in_kernel_snap)
> > > > 		nohz_in_kernel_snap = rcu_dynticks_snap_cpu(cpu);
> > > > 	else
> > > > 		/* Turn on the tick! */
> > > > 
> > > > I would supply rcu_dynticks_snap_cpu() and rcu_dynticks_in_eqs_cpu(),
> > > > which would be simple wrappers around RCU's private rcu_dynticks_snap()
> > > > and rcu_dynticks_in_eqs() functions.
> > > > 
> > > > Would this make sense as a general solution, or am I missing a corner
> > > > case or three?
> > > 
> > > Oh I see. Until now we considered than running into the kernel (between user/guest/idle)
> > > is supposed to be short but there can be specific places where it doesn't apply.
> > > 
> > > I'm wondering if, more than just providing wrappers, this shouldn't be entirely
> > > driven by RCU using the tick_set_dep_cpu()/tick_clear_dep_cpu() at appropriate timings.
> > > 
> > > I don't want to sound like I'm trying to put all the work on you :p  It's just that
> > > the tick shouldn't know much about RCU, it's rather RCU that is a client for the tick and
> > > is probably better suited to determine when a CPU becomes annoying with its extended grace
> > > period.
> > > 
> > > Arming a CPU timer could also be an alternative to tick_set_dep_cpu() for that.
> > > 
> > > What do you think?
> > 
> > Left to itself, RCU would take action only when a given nohz_full
> > in-kernel CPU was delaying a grace period, which is what the (lightly
> > tested) patch below is supposed to help with.  If that is all that is
> > needed, well and good!
> > 
> > But should we need long-running in-kernel nohz_full CPUs to turn on
> > their ticks when they are not blocking an RCU grace period, for example,
> > when RCU is idle, more will be needed.  To that point, isn't there some
> > sort of monitoring that checks up on nohz_full CPUs ever second or so?
> 
> Wouldn't such monitoring need to be more often than a second, given that
> rcu_urgent_qs and rcu_need_heavy_qs are configured typically to be sooner
> (200-300 jiffies on my system).

Either it would have to be more often than once per second, or RCU would
need to retain its more frequent checks.  But note that RCU isn't going
to check unless there is a grace period in progress.

> > If so, perhaps that monitoring could periodically invoke an RCU function
> > that I provide for deciding when to turn the tick on.  We would also need
> > to work out how to turn the tick off in a timely fashion once the CPU got
> > out of kernel mode, perhaps in rcu_user_enter() or rcu_nmi_exit_common().
> > 
> > If this would be called only every second or so, the separate grace-period
> > checking is still needed for its shorter timespan, though.
> > 
> > Thoughts?
> 
> Do you want me to test the below patch to see if it fixes the issue with my
> other test case (where I had a nohz full CPU holding up a grace period).

Please!

> 2 comments below:
> 
> > ------------------------------------------------------------------------
> > 
> > commit 1cb89508804f6f2fdb79a1be032b1932d52318c4
> > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > Date:   Mon Aug 12 16:14:00 2019 -0700
> > 
> >     rcu: Force tick on for nohz_full CPUs not reaching quiescent states
> >     
> >     CPUs running for long time periods in the kernel in nohz_full mode
> >     might leave the scheduling-clock interrupt disabled for then full
> >     duration of their in-kernel execution.  This can (among other things)
> >     delay grace periods.  This commit therefore forces the tick back on
> >     for any nohz_full CPU that is failing to pass through a quiescent state
> >     upon return from interrupt, which the resched_cpu() will induce.
> >     
> >     Reported-by: Joel Fernandes <joel@joelfernandes.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 8c494a692728..8b8f5bffdc5a 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -651,6 +651,12 @@ static __always_inline void rcu_nmi_exit_common(bool irq)
> >  	 */
> >  	if (rdp->dynticks_nmi_nesting != 1) {
> >  		trace_rcu_dyntick(TPS("--="), rdp->dynticks_nmi_nesting, rdp->dynticks_nmi_nesting - 2, rdp->dynticks);
> > +		if (tick_nohz_full_cpu(rdp->cpu) &&
> > +		    rdp->dynticks_nmi_nesting == 2 &&
> > +		    rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
> > +			rdp->rcu_forced_tick = true;
> > +			tick_dep_set_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
> > +		}
> >  		WRITE_ONCE(rdp->dynticks_nmi_nesting, /* No store tearing. */
> >  			   rdp->dynticks_nmi_nesting - 2);
> >  		return;
> > @@ -886,6 +892,16 @@ void rcu_irq_enter_irqson(void)
> >  	local_irq_restore(flags);
> >  }
> >  
> > +/*
> > + * If the scheduler-clock interrupt was enabled on a nohz_full CPU
> > + * in order to get to a quiescent state, disable it.
> > + */
> > +void rcu_disable_tick_upon_qs(struct rcu_data *rdp)
> > +{
> > +	if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick)
> > +		tick_dep_clear_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
> > +}
> > +
> >  /**
> >   * rcu_is_watching - see if RCU thinks that the current CPU is not idle
> >   *
> > @@ -1980,6 +1996,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
> >  		if (!offloaded)
> >  			needwake = rcu_accelerate_cbs(rnp, rdp);
> >  
> > +		rcu_disable_tick_upon_qs(rdp);
> >  		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> >  		/* ^^^ Released rnp->lock */
> >  		if (needwake)
> > @@ -2269,6 +2286,7 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
> >  	int cpu;
> >  	unsigned long flags;
> >  	unsigned long mask;
> > +	struct rcu_data *rdp;
> >  	struct rcu_node *rnp;
> >  
> >  	rcu_for_each_leaf_node(rnp) {
> > @@ -2293,8 +2311,11 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
> >  		for_each_leaf_node_possible_cpu(rnp, cpu) {
> >  			unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
> >  			if ((rnp->qsmask & bit) != 0) {
> > -				if (f(per_cpu_ptr(&rcu_data, cpu)))
> > +				rdp = per_cpu_ptr(&rcu_data, cpu);
> > +				if (f(rdp)) {
> >  					mask |= bit;
> > +					rcu_disable_tick_upon_qs(rdp);
> > +				}
> 
> I am guessing this was the earlier thing you corrected, cool!!

Like I said, stupid error on my part.  The usual kind.  ;-)

> >  			}
> >  		}
> >  		if (mask != 0) {
> > @@ -2322,7 +2343,7 @@ void rcu_force_quiescent_state(void)
> >  	rnp = __this_cpu_read(rcu_data.mynode);
> >  	for (; rnp != NULL; rnp = rnp->parent) {
> >  		ret = (READ_ONCE(rcu_state.gp_flags) & RCU_GP_FLAG_FQS) ||
> > -		      !raw_spin_trylock(&rnp->fqslock);
> > +		       !raw_spin_trylock(&rnp->fqslock);
> >  		if (rnp_old != NULL)
> >  			raw_spin_unlock(&rnp_old->fqslock);
> >  		if (ret)
> > @@ -2855,7 +2876,7 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
> >  {
> >  	if (atomic_dec_and_test(&rcu_state.barrier_cpu_count)) {
> >  		rcu_barrier_trace(TPS("LastCB"), -1,
> > -				   rcu_state.barrier_sequence);
> > +				  rcu_state.barrier_sequence);
> >  		complete(&rcu_state.barrier_completion);
> >  	} else {
> >  		rcu_barrier_trace(TPS("CB"), -1, rcu_state.barrier_sequence);
> > @@ -2879,7 +2900,7 @@ static void rcu_barrier_func(void *unused)
> >  	} else {
> >  		debug_rcu_head_unqueue(&rdp->barrier_head);
> >  		rcu_barrier_trace(TPS("IRQNQ"), -1,
> > -				   rcu_state.barrier_sequence);
> > +				  rcu_state.barrier_sequence);
> >  	}
> >  	rcu_nocb_unlock(rdp);
> >  }
> > @@ -2906,7 +2927,7 @@ void rcu_barrier(void)
> >  	/* Did someone else do our work for us? */
> >  	if (rcu_seq_done(&rcu_state.barrier_sequence, s)) {
> >  		rcu_barrier_trace(TPS("EarlyExit"), -1,
> > -				   rcu_state.barrier_sequence);
> > +				  rcu_state.barrier_sequence);
> >  		smp_mb(); /* caller's subsequent code after above check. */
> >  		mutex_unlock(&rcu_state.barrier_mutex);
> >  		return;
> > @@ -2938,11 +2959,11 @@ void rcu_barrier(void)
> >  			continue;
> >  		if (rcu_segcblist_n_cbs(&rdp->cblist)) {
> >  			rcu_barrier_trace(TPS("OnlineQ"), cpu,
> > -					   rcu_state.barrier_sequence);
> > +					  rcu_state.barrier_sequence);
> >  			smp_call_function_single(cpu, rcu_barrier_func, NULL, 1);
> >  		} else {
> >  			rcu_barrier_trace(TPS("OnlineNQ"), cpu,
> > -					   rcu_state.barrier_sequence);
> > +					  rcu_state.barrier_sequence);
> >  		}
> >  	}
> >  	put_online_cpus();
> > @@ -3168,6 +3189,7 @@ void rcu_cpu_starting(unsigned int cpu)
> >  	rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> >  	rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> >  	if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> > +		rcu_disable_tick_upon_qs(rdp);
> >  		/* Report QS -after- changing ->qsmaskinitnext! */
> >  		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> 
> Just curious about the existing code. If a CPU is just starting up (after
> bringing it online), how can RCU be waiting on it? I thought RCU would not be
> watching offline CPUs.

Well, neither grace periods nor CPU-hotplug operations are atomic,
and each can take significant time to complete.

So suppose we have a large system with multiple leaf rcu_node structures
(not that 17 CPUs is all that many these days, but please bear with me).
Suppose just after a new grace period initializes a given leaf rcu_node
structure, one of its CPUs goes offline (yes, that CPU would have to
have waited on a grace period, but that might have been the previous
grace period).  But before the FQS scan notices that RCU is waiting on
an offline CPU, the CPU comes back online.

That situation is exactly what the above code is intended to handle.

Without that code, RCU can give false-positive splats at various points
in its processing.  ("Wait!  How can a task be blocked waiting on a
grace period that hasn't even started yet???")

							Thanx, Paul

  reply	other threads:[~2019-08-14 22:06 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-02 15:14 [PATCH tip/core/rcu 0/14] No-CBs bypass addition for v5.4 Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 01/14] rcu/nocb: Atomic ->len field in rcu_segcblist structure Paul E. McKenney
2019-08-04 14:50   ` Peter Zijlstra
2019-08-04 14:52     ` Peter Zijlstra
2019-08-04 18:45       ` Paul E. McKenney
2019-08-04 18:42     ` Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 02/14] rcu/nocb: Add bypass callback queueing Paul E. McKenney
2019-08-07  0:03   ` Joel Fernandes
2019-08-07  0:16     ` Joel Fernandes
2019-08-07  0:35     ` Paul E. McKenney
2019-08-07  0:40       ` Steven Rostedt
2019-08-07  1:17         ` Paul E. McKenney
2019-08-07  1:24           ` Steven Rostedt
2019-08-07  3:47             ` Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 03/14] rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 04/14] rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 05/14] rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake() Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 06/14] rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks() Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 07/14] rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 08/14] rcu/nocb: Reduce __call_rcu_nocb_wake() " Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 09/14] rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 10/14] rcu: Allow rcu_do_batch() to dynamically adjust batch sizes Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 11/14] EXP nohz: Add TICK_DEP_BIT_RCU Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 12/14] rcu/nohz: Force on tick when invoking lots of callbacks Paul E. McKenney
2019-08-02 15:15 ` [PATCH RFC tip/core/rcu 13/14] rcutorture: Force on tick for readers and callback flooders Paul E. McKenney
2019-08-02 15:15 ` [PATCH RFC tip/core/rcu 14/14] rcu/nohz: Make multi_cpu_stop() enable tick on all online CPUs Paul E. McKenney
2019-08-04 14:43   ` Peter Zijlstra
2019-08-04 14:48     ` Peter Zijlstra
2019-08-04 18:41       ` Paul E. McKenney
2019-08-04 20:24         ` Paul E. McKenney
2019-08-05  4:19           ` Paul E. McKenney
2019-08-05  8:07             ` Peter Zijlstra
2019-08-05 14:47               ` Paul E. McKenney
2019-08-05  8:05         ` Peter Zijlstra
2019-08-05 14:54           ` Paul E. McKenney
2019-08-05 15:50             ` Peter Zijlstra
2019-08-05 17:48               ` Paul E. McKenney
2019-08-06 18:08                 ` Paul E. McKenney
2019-08-07 21:41                   ` Paul E. McKenney
2019-08-08 20:35                     ` Paul E. McKenney
2019-08-08 21:30                       ` Paul E. McKenney
2019-08-09 16:51                         ` Paul E. McKenney
2019-08-09 18:07                           ` Joel Fernandes
2019-08-09 18:39                             ` Paul E. McKenney
2019-08-12 21:02   ` Frederic Weisbecker
2019-08-12 23:23     ` Paul E. McKenney
2019-08-13  1:33       ` Joel Fernandes
2019-08-13 12:30       ` Frederic Weisbecker
2019-08-13 14:48         ` Paul E. McKenney
2019-08-14 17:55           ` Joel Fernandes
2019-08-14 22:05             ` Paul E. McKenney [this message]
2019-08-15 15:07               ` Joel Fernandes
2019-08-15 17:23                 ` Paul E. McKenney
2019-08-15 18:15                   ` Joel Fernandes
2019-08-15 18:39                     ` Paul E. McKenney
2019-08-15 19:42                       ` Joel Fernandes
2019-08-13 21:06       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190814220516.GY28441@linux.ibm.com \
    --to=paulmck@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=frederic@kernel.org \
    --cc=fweisbec@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).