From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: paulmck <paulmck@kernel.org>
Cc: rcu <rcu@vger.kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
kernel-team <kernel-team@fb.com>, Ingo Molnar <mingo@kernel.org>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Josh Triplett <josh@joshtriplett.org>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
rostedt <rostedt@goodmis.org>,
David Howells <dhowells@redhat.com>,
Eric Dumazet <edumazet@google.com>, fweisbec <fweisbec@gmail.com>,
Oleg Nesterov <oleg@redhat.com>,
"Joel Fernandes, Google" <joel@joelfernandes.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v2 rcu 04/18] rcu: Weaken ->dynticks accesses and updates
Date: Thu, 29 Jul 2021 10:41:18 -0400 (EDT) [thread overview]
Message-ID: <1929727713.10248.1627569678176.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20210728202802.GL4397@paulmck-ThinkPad-P17-Gen-1>
----- On Jul 28, 2021, at 4:28 PM, paulmck paulmck@kernel.org wrote:
> On Wed, Jul 28, 2021 at 04:03:02PM -0400, Mathieu Desnoyers wrote:
>> ----- On Jul 28, 2021, at 3:45 PM, paulmck paulmck@kernel.org wrote:
>> [...]
>> >
>> > And how about like this?
>> >
>> > Thanx, Paul
>> >
>> > ------------------------------------------------------------------------
>> >
>> > commit cb8914dcc6443cca15ce48d937a93c0dfdb114d3
>> > Author: Paul E. McKenney <paulmck@kernel.org>
>> > Date: Wed Jul 28 12:38:42 2021 -0700
>> >
>> > rcu: Move rcu_dynticks_eqs_online() to rcu_cpu_starting()
>> >
>> > The purpose of rcu_dynticks_eqs_online() is to adjust the ->dynticks
>> > counter of an incoming CPU if required. It is currently is invoked
>>
>> "is currently is" -> "is currently"
>
> Good catch, fixed!
>
>> > from rcutree_prepare_cpu(), which runs before the incoming CPU is
>> > running, and thus on some other CPU. This makes the per-CPU accesses in
>> > rcu_dynticks_eqs_online() iffy at best, and it all "works" only because
>> > the running CPU cannot possibly be in dyntick-idle mode, which means
>> > that rcu_dynticks_eqs_online() never has any effect. One could argue
>> > that this means that rcu_dynticks_eqs_online() is unnecessary, however,
>> > removing it makes the CPU-online process vulnerable to slight changes
>> > in the CPU-offline process.
>>
>> Why favor moving this from the prepare_cpu to the cpu_starting hotplug step,
>> rather than using the target cpu's rdp from rcutree_prepare_cpu ? Maybe there
>> was a good reason for having this very early in the prepare_cpu step ?
>
> Some years back, there was a good reason. This reason was that
> rcutree_prepare_cpu() marked the CPU as being online from an RCU
> viewpoint. But now rcu_cpu_starting() is the one that marks the CPU as
> being online, so the ->dynticks check can be deferred to this function.
>
>> Also, the commit message refers to this bug as having no effect because the
>> running CPU cannot possibly be in dyntick-idle mode. I understand that calling
>> this function was indeed effect-less, but then why is it OK for the CPU coming
>> online to skip this call in the first place ? This commit message hints at
>> "slight changes in the CPU-offline process" which could break it, but therer is
>> no explanation of what makes this not an actual bug fix.
>
> Because rcutorture would not have suffered in silence had this
> situation ever arisen.
Testing can usually prove the presence of a bug, but it's rather tricky to prove
the absence of bug.
>
> I have updated the commit log to answer these questions as shown
> below. Thoughts?
I'm still concerned about one scenario wrt moving rcu_dynticks_eqs_online()
from rcutree_prepare_cpu to rcu_cpu_starting. What happens if an interrupt
handler, or a NMI handler, nests early over the CPU-online startup code ?
AFAIU, this interrupt handler could contain RCU read-side critical sections,
but if the eqs state does not show the CPU as "online", I wonder whether it
will work as expected.
Thanks,
Mathieu
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 516c8c4cc6fce62539f7e0182739812db4591c1d
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date: Wed Jul 28 12:38:42 2021 -0700
>
> rcu: Move rcu_dynticks_eqs_online() to rcu_cpu_starting()
>
> The purpose of rcu_dynticks_eqs_online() is to adjust the ->dynticks
> counter of an incoming CPU when required. It is currently invoked
> from rcutree_prepare_cpu(), which runs before the incoming CPU is
> running, and thus on some other CPU. This makes the per-CPU accesses in
> rcu_dynticks_eqs_online() iffy at best, and it all "works" only because
> the running CPU cannot possibly be in dyntick-idle mode, which means
> that rcu_dynticks_eqs_online() never has any effect.
>
> It is currently OK for rcu_dynticks_eqs_online() to have no effect, but
> only because the CPU-offline process just happens to leave ->dynticks in
> the correct state. After all, if ->dynticks were in the wrong state on a
> just-onlined CPU, rcutorture would complain bitterly the next time that
> CPU went idle, at least in kernels built with CONFIG_RCU_EQS_DEBUG=y,
> for example, those built by rcutorture scenario TREE04. One could
> argue that this means that rcu_dynticks_eqs_online() is unnecessary,
> however, removing it would make the CPU-online process vulnerable to
> slight changes in the CPU-offline process.
>
> One could also ask why it is safe to move the rcu_dynticks_eqs_online()
> call so late in the CPU-online process. Indeed, there was a time when it
> would not have been safe, which does much to explain its current location.
> However, the marking of a CPU as online from an RCU perspective has long
> since moved from rcutree_prepare_cpu() to rcu_cpu_starting(), and all
> that is required is that ->dynticks be set correctly by the time that
> the CPU is marked as online from an RCU perspective. After all, the RCU
> grace-period kthread does not check to see if offline CPUs are also idle.
> (In case you were curious, this is one reason why there is quiescent-state
> reporting as part of the offlining process.)
>
> This commit therefore moves the call to rcu_dynticks_eqs_online() from
> rcutree_prepare_cpu() to rcu_cpu_starting(), this latter being guaranteed
> to be running on the incoming CPU. The call to this function must of
> course be placed before this rcu_cpu_starting() announces this CPU's
> presence to RCU.
>
> Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0172a5fd6d8de..aa00babdaf544 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4129,7 +4129,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
> rdp->n_force_qs_snap = READ_ONCE(rcu_state.n_force_qs);
> rdp->blimit = blimit;
> rdp->dynticks_nesting = 1; /* CPU not up, no tearing. */
> - rcu_dynticks_eqs_online();
> raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
>
> /*
> @@ -4249,6 +4248,7 @@ void rcu_cpu_starting(unsigned int cpu)
> mask = rdp->grpmask;
> WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
> WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
> + rcu_dynticks_eqs_online();
> smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
> raw_spin_lock_irqsave_rcu_node(rnp, flags);
> WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
next prev parent reply other threads:[~2021-07-29 14:41 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-21 20:20 [PATCH rcu 0/18] Miscellaneous fixes for v5.15 Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 01/18] rcu: Fix to include first blocked task in stall warning Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 02/18] rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock Paul E. McKenney
2021-08-03 14:24 ` Qais Yousef
2021-08-03 15:52 ` Paul E. McKenney
2021-08-03 16:12 ` Qais Yousef
2021-08-03 16:28 ` Paul E. McKenney
2021-08-03 16:33 ` Qais Yousef
2021-08-04 13:50 ` Qais Yousef
2021-08-04 22:33 ` Paul E. McKenney
2021-08-06 9:56 ` Qais Yousef
2021-08-06 9:57 ` Qais Yousef
2021-08-06 11:43 ` Paul E. McKenney
2021-08-06 12:33 ` Qais Yousef
2021-07-21 20:21 ` [PATCH rcu 03/18] rcu: Remove special bit at the bottom of the ->dynticks counter Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 04/18] rcu: Weaken ->dynticks accesses and updates Paul E. McKenney
2021-07-21 20:41 ` Linus Torvalds
2021-07-21 21:25 ` Paul E. McKenney
2021-07-28 17:37 ` [PATCH v2 " Paul E. McKenney
2021-07-28 17:58 ` Linus Torvalds
2021-07-28 18:12 ` Mathieu Desnoyers
2021-07-28 18:32 ` Linus Torvalds
2021-07-28 18:39 ` Mathieu Desnoyers
2021-07-28 18:46 ` Paul E. McKenney
2021-07-28 18:46 ` Paul E. McKenney
2021-07-28 18:57 ` Linus Torvalds
2021-07-28 18:23 ` Mathieu Desnoyers
2021-07-28 18:58 ` Paul E. McKenney
2021-07-28 19:45 ` Paul E. McKenney
2021-07-28 20:03 ` Mathieu Desnoyers
2021-07-28 20:28 ` Paul E. McKenney
2021-07-29 14:41 ` Mathieu Desnoyers [this message]
2021-07-29 15:57 ` Paul E. McKenney
2021-07-29 17:41 ` Mathieu Desnoyers
2021-07-29 18:05 ` Paul E. McKenney
2021-07-29 18:42 ` Mathieu Desnoyers
2021-07-28 20:37 ` Josh Triplett
2021-07-28 20:47 ` Paul E. McKenney
2021-07-28 22:23 ` Frederic Weisbecker
2021-07-29 1:07 ` Paul E. McKenney
2021-07-29 7:58 ` [PATCH " Boqun Feng
2021-07-29 10:53 ` Frederic Weisbecker
2021-07-30 5:56 ` Boqun Feng
2021-07-30 17:18 ` Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 05/18] rcu: Mark accesses to ->rcu_read_lock_nesting Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 06/18] rculist: Unify documentation about missing list_empty_rcu() Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 07/18] rcu/tree: Handle VM stoppage in stall detection Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 08/18] rcu: Do not disable GP stall detection in rcu_cpu_stall_reset() Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 09/18] rcu: Start timing stall repetitions after warning complete Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 10/18] srcutiny: Mark read-side data races Paul E. McKenney
2021-07-29 8:23 ` Boqun Feng
2021-07-29 13:36 ` Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 11/18] rcu: Mark lockless ->qsmask read in rcu_check_boost_fail() Paul E. McKenney
2021-07-29 8:54 ` Boqun Feng
2021-07-29 14:03 ` Paul E. McKenney
2021-07-30 2:28 ` Boqun Feng
2021-07-30 3:26 ` Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 12/18] rcu: Make rcu_gp_init() and rcu_gp_fqs_loop noinline to conserve stack Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 13/18] rcu: Remove trailing spaces and tabs Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 14/18] rcu: Mark accesses in tree_stall.h Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 15/18] rcu: Remove useless "ret" update in rcu_gp_fqs_loop() Paul E. McKenney
2021-08-03 16:48 ` Joe Perches
2021-08-03 17:10 ` Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 16/18] rcu: Use per_cpu_ptr to get the pointer of per_cpu variable Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 17/18] rcu: Explain why rcu_all_qs() is a stub in preemptible TREE RCU Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 18/18] rcu: Print human-readable message for schedule() in RCU reader Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1929727713.10248.1627569678176.JavaMail.zimbra@efficios.com \
--to=mathieu.desnoyers@efficios.com \
--cc=akpm@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=edumazet@google.com \
--cc=fweisbec@gmail.com \
--cc=jiangshanlai@gmail.com \
--cc=joel@joelfernandes.org \
--cc=josh@joshtriplett.org \
--cc=kernel-team@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=rcu@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).