From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Frederic Weisbecker <frederic@kernel.org>,
rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
mingo@kernel.org, jiangshanlai@gmail.com, dipankar@in.ibm.com,
akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
fweisbec@gmail.com, oleg@redhat.com
Subject: Re: [PATCH RFC tip/core/rcu 14/14] rcu/nohz: Make multi_cpu_stop() enable tick on all online CPUs
Date: Wed, 14 Aug 2019 13:55:46 -0400
Message-ID: <20190814175546.GB68498@google.com>
In-Reply-To: <20190813144809.GB28441@linux.ibm.com>
On Tue, Aug 13, 2019 at 07:48:09AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 13, 2019 at 02:30:19PM +0200, Frederic Weisbecker wrote:
> > On Mon, Aug 12, 2019 at 04:23:16PM -0700, Paul E. McKenney wrote:
> > > On Mon, Aug 12, 2019 at 11:02:33PM +0200, Frederic Weisbecker wrote:
> > > > On Fri, Aug 02, 2019 at 08:15:01AM -0700, Paul E. McKenney wrote:
> > > > Looks like it's not the right fix but, should you ever need to set an
> > > > all-CPUs (system wide) tick dependency in the future, you can use tick_set_dep().
> > >
> > > Indeed, I have dropped this patch, but I now do something similar in
> > > RCU's CPU-hotplug notifiers. Which does have an effect, especially on
> > > the system that isn't subject to the insane-latency cpu_relax().
> > >
> > > Plus I am having to put a similar workaround into RCU's quiescent-state
> > > forcing logic.
> > >
> > > But how should this really be done?
> > >
> > > Isn't there some sort of monitoring of nohz_full CPUs for accounting
> > > purposes? If so, would it make sense for this monitoring to check for
> > > long-duration kernel execution and enable the tick in this case? The
> > > RCU dyntick machinery can be used to remotely detect the long-duration
> > > kernel execution using something like the following:
> > >
> > > int nohz_in_kernel_snap = rcu_dynticks_snap_cpu(cpu);
> > >
> > > ...
> > >
> > > 	if (rcu_dynticks_in_eqs_cpu(cpu, nohz_in_kernel_snap))
> > > nohz_in_kernel_snap = rcu_dynticks_snap_cpu(cpu);
> > > else
> > > /* Turn on the tick! */
> > >
> > > I would supply rcu_dynticks_snap_cpu() and rcu_dynticks_in_eqs_cpu(),
> > > which would be simple wrappers around RCU's private rcu_dynticks_snap()
> > > and rcu_dynticks_in_eqs() functions.
> > >
> > > Would this make sense as a general solution, or am I missing a corner
> > > case or three?
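
For what it's worth, the polling idea above can be modeled in a few lines of
plain C. This is a userspace toy with hypothetical names, not the kernel's
actual rcu_dynticks encoding; it just assumes a per-CPU counter that is even
in an extended quiescent state (idle/userspace) and odd while in the kernel:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state; "dynticks" parity tracks EQS vs. in-kernel. */
struct toy_cpu {
	unsigned int dynticks;	/* even: in EQS, odd: in kernel */
};

static unsigned int toy_dynticks_snap(struct toy_cpu *cpu)
{
	return cpu->dynticks;
}

/* True if the CPU is in an EQS now or passed through one since snap. */
static bool toy_in_eqs_since(struct toy_cpu *cpu, unsigned int snap)
{
	unsigned int cur = cpu->dynticks;

	return !(cur & 1) || cur != snap;
}

/* Kernel entry/exit each increment the counter, flipping its parity. */
static void toy_enter_kernel(struct toy_cpu *cpu) { cpu->dynticks++; }
static void toy_exit_kernel(struct toy_cpu *cpu) { cpu->dynticks++; }

/*
 * One monitoring pass: refresh the snapshot if the CPU has been seen
 * in an EQS since the last pass, otherwise report that the tick
 * should be forced back on.
 */
static bool toy_monitor(struct toy_cpu *cpu, unsigned int *snap)
{
	if (toy_in_eqs_since(cpu, *snap)) {
		*snap = toy_dynticks_snap(cpu);
		return false;	/* CPU passed through an EQS; leave it be */
	}
	return true;		/* long in-kernel stretch: turn on the tick */
}
```

The second consecutive toy_monitor() call while in the kernel is the
interesting case: the counter is odd and unchanged since the last snapshot,
meaning the CPU spent the whole interval in the kernel.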
> >
> > Oh I see. Until now we considered that running in the kernel (between user/guest/idle)
> > is supposed to be short, but there can be specific places where that doesn't apply.
> >
> > I'm wondering if, more than just providing wrappers, this shouldn't be entirely
> > driven by RCU using the tick_set_dep_cpu()/tick_clear_dep_cpu() at appropriate timings.
> >
> > I don't want to sound like I'm trying to put all the work on you :p It's just that
> > the tick shouldn't know much about RCU, it's rather RCU that is a client for the tick and
> > is probably better suited to determine when a CPU becomes annoying with its extended grace
> > period.
> >
> > Arming a CPU timer could also be an alternative to tick_set_dep_cpu() for that.
> >
> > What do you think?
>
> Left to itself, RCU would take action only when a given nohz_full
> in-kernel CPU was delaying a grace period, which is what the (lightly
> tested) patch below is supposed to help with. If that is all that is
> needed, well and good!
>
> But should we need long-running in-kernel nohz_full CPUs to turn on
> their ticks when they are not blocking an RCU grace period, for example,
> when RCU is idle, more will be needed. To that point, isn't there some
> sort of monitoring that checks up on nohz_full CPUs every second or so?

Wouldn't such monitoring need to run more often than once a second, given that
rcu_urgent_qs and rcu_need_heavy_qs are typically configured to fire sooner
(200-300 jiffies on my system)?
> If so, perhaps that monitoring could periodically invoke an RCU function
> that I provide for deciding when to turn the tick on. We would also need
> to work out how to turn the tick off in a timely fashion once the CPU got
> out of kernel mode, perhaps in rcu_user_enter() or rcu_nmi_exit_common().
>
> If this would be called only every second or so, the separate grace-period
> checking is still needed for its shorter timespan, though.
>
> Thoughts?

Do you want me to test the patch below to see whether it fixes the issue with
my other test case (where I had a nohz_full CPU holding up a grace period)?

Two comments below:
> ------------------------------------------------------------------------
>
> commit 1cb89508804f6f2fdb79a1be032b1932d52318c4
> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> Date: Mon Aug 12 16:14:00 2019 -0700
>
> rcu: Force tick on for nohz_full CPUs not reaching quiescent states
>
> CPUs running for long time periods in the kernel in nohz_full mode
> might leave the scheduling-clock interrupt disabled for the full
> duration of their in-kernel execution. This can (among other things)
> delay grace periods. This commit therefore forces the tick back on
> for any nohz_full CPU that is failing to pass through a quiescent state
> upon return from interrupt, which the resched_cpu() will induce.
>
> Reported-by: Joel Fernandes <joel@joelfernandes.org>
> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8c494a692728..8b8f5bffdc5a 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -651,6 +651,12 @@ static __always_inline void rcu_nmi_exit_common(bool irq)
> */
> if (rdp->dynticks_nmi_nesting != 1) {
> trace_rcu_dyntick(TPS("--="), rdp->dynticks_nmi_nesting, rdp->dynticks_nmi_nesting - 2, rdp->dynticks);
> + if (tick_nohz_full_cpu(rdp->cpu) &&
> + rdp->dynticks_nmi_nesting == 2 &&
> + rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
> + rdp->rcu_forced_tick = true;
> + tick_dep_set_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
> + }
> WRITE_ONCE(rdp->dynticks_nmi_nesting, /* No store tearing. */
> rdp->dynticks_nmi_nesting - 2);
> return;
> @@ -886,6 +892,16 @@ void rcu_irq_enter_irqson(void)
> local_irq_restore(flags);
> }
>
> +/*
> + * If the scheduler-clock interrupt was enabled on a nohz_full CPU
> + * in order to get to a quiescent state, disable it.
> + */
> +void rcu_disable_tick_upon_qs(struct rcu_data *rdp)
> +{
> + if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick)
> + tick_dep_clear_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
> +}
> +
> /**
> * rcu_is_watching - see if RCU thinks that the current CPU is not idle
> *
> @@ -1980,6 +1996,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
> if (!offloaded)
> needwake = rcu_accelerate_cbs(rnp, rdp);
>
> + rcu_disable_tick_upon_qs(rdp);
> rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> /* ^^^ Released rnp->lock */
> if (needwake)
> @@ -2269,6 +2286,7 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
> int cpu;
> unsigned long flags;
> unsigned long mask;
> + struct rcu_data *rdp;
> struct rcu_node *rnp;
>
> rcu_for_each_leaf_node(rnp) {
> @@ -2293,8 +2311,11 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
> for_each_leaf_node_possible_cpu(rnp, cpu) {
> unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
> if ((rnp->qsmask & bit) != 0) {
> - if (f(per_cpu_ptr(&rcu_data, cpu)))
> + rdp = per_cpu_ptr(&rcu_data, cpu);
> + if (f(rdp)) {
> mask |= bit;
> + rcu_disable_tick_upon_qs(rdp);
> + }
I am guessing this was the earlier thing you corrected, cool!!
> }
> }
> if (mask != 0) {
> @@ -2322,7 +2343,7 @@ void rcu_force_quiescent_state(void)
> rnp = __this_cpu_read(rcu_data.mynode);
> for (; rnp != NULL; rnp = rnp->parent) {
> ret = (READ_ONCE(rcu_state.gp_flags) & RCU_GP_FLAG_FQS) ||
> - !raw_spin_trylock(&rnp->fqslock);
> + !raw_spin_trylock(&rnp->fqslock);
> if (rnp_old != NULL)
> raw_spin_unlock(&rnp_old->fqslock);
> if (ret)
> @@ -2855,7 +2876,7 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
> {
> if (atomic_dec_and_test(&rcu_state.barrier_cpu_count)) {
> rcu_barrier_trace(TPS("LastCB"), -1,
> - rcu_state.barrier_sequence);
> + rcu_state.barrier_sequence);
> complete(&rcu_state.barrier_completion);
> } else {
> rcu_barrier_trace(TPS("CB"), -1, rcu_state.barrier_sequence);
> @@ -2879,7 +2900,7 @@ static void rcu_barrier_func(void *unused)
> } else {
> debug_rcu_head_unqueue(&rdp->barrier_head);
> rcu_barrier_trace(TPS("IRQNQ"), -1,
> - rcu_state.barrier_sequence);
> + rcu_state.barrier_sequence);
> }
> rcu_nocb_unlock(rdp);
> }
> @@ -2906,7 +2927,7 @@ void rcu_barrier(void)
> /* Did someone else do our work for us? */
> if (rcu_seq_done(&rcu_state.barrier_sequence, s)) {
> rcu_barrier_trace(TPS("EarlyExit"), -1,
> - rcu_state.barrier_sequence);
> + rcu_state.barrier_sequence);
> smp_mb(); /* caller's subsequent code after above check. */
> mutex_unlock(&rcu_state.barrier_mutex);
> return;
> @@ -2938,11 +2959,11 @@ void rcu_barrier(void)
> continue;
> if (rcu_segcblist_n_cbs(&rdp->cblist)) {
> rcu_barrier_trace(TPS("OnlineQ"), cpu,
> - rcu_state.barrier_sequence);
> + rcu_state.barrier_sequence);
> smp_call_function_single(cpu, rcu_barrier_func, NULL, 1);
> } else {
> rcu_barrier_trace(TPS("OnlineNQ"), cpu,
> - rcu_state.barrier_sequence);
> + rcu_state.barrier_sequence);
> }
> }
> put_online_cpus();
> @@ -3168,6 +3189,7 @@ void rcu_cpu_starting(unsigned int cpu)
> rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> + rcu_disable_tick_upon_qs(rdp);
> /* Report QS -after- changing ->qsmaskinitnext! */
> rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
Just curious about the existing code. If a CPU is just starting up (after
bringing it online), how can RCU be waiting on it? I thought RCU would not be
watching offline CPUs.
thanks,
- Joel
[snip]