From: Joel Fernandes <joel@joelfernandes.org>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: "Zhang, Qiang1" <qiang1.zhang@intel.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
quic_neeraju@quicinc.com, rcu@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] rcu: Remove impossible wakeup rcu GP kthread action from rcu_report_qs_rdp()
Date: Tue, 24 Jan 2023 16:58:26 +0000 [thread overview]
Message-ID: <Y9AOMhy4GCyaqxdG@google.com> (raw)
In-Reply-To: <Y873895orjOvPNg9@lothringen>
Hi Frederic,
On Mon, Jan 23, 2023 at 10:11:15PM +0100, Frederic Weisbecker wrote:
> On Mon, Jan 23, 2023 at 03:54:07PM -0500, Joel Fernandes wrote:
> > > On Jan 23, 2023, at 11:27 AM, Frederic Weisbecker <frederic@kernel.org> wrote:
> > >
> > > On Mon, Jan 23, 2023 at 10:22:19AM -0500, Joel Fernandes wrote:
> > >>> What am I missing?
> > >>
> > >> That the acceleration is also done by __note_gp_changes() once the
> > >> grace period ends anyway, so if any acceleration was missed as you
> > >> say, it will be done anyway.
> > >>
> > >> Also it is done by scheduler tick raising softirq:
> > >>
> > >> rcu_pending() does this:
> > >> /* Has RCU gone idle with this CPU needing another grace period? */
> > >> if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
> > >> !rcu_rdp_is_offloaded(rdp) &&
> > >> !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
> > >> return 1;
> > >>
> > >> and rcu_core():
> > >> /* No grace period and unregistered callbacks? */
> > >> if (!rcu_gp_in_progress() &&
> > >> rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
> > >> rcu_nocb_lock_irqsave(rdp, flags);
> > >> if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
> > >> rcu_accelerate_cbs_unlocked(rnp, rdp);
> > >> rcu_nocb_unlock_irqrestore(rdp, flags);
> > >> }
> > >>
> > >> So, I am not sure if you need needacc at all. Those CBs that have not
> > >> been assigned grace period numbers will be taken care off :)
> > >
> > > But that's only when there is no grace period pending, so it can't happen while
> > > we report a QS.
> > >
> > > OTOH without the needacc, those callbacks waiting to be accelerated would be
> > > eventually processed but only on the next tick following the end of a grace
> > > period...if none has started since then. So if someone else starts a new GP
> > > before the current CPU, we must wait another GP, etc...
> > >
> > > That's potentially dangerous.
> >
> > Waiting for just one more GP cannot be dangerous IMO. Anyway there is no
> > guarantee that callback will run immediately at end of GP, there may be one or
> > more GPs before callback can run, if I remember correctly. That is by
> > design.. but please correct me if my understanding is different from yours.
>
> It's not bound to just one GP. If you have N CPUs flooding callbacks for a
> long while, your CPU has 1/N chances to be the one starting the next GP on
> each turn. Relying on the acceleration to be performed only when no GP is
> running may theoretically starve your callbacks forever.
I tend to agree with you, however I was mostly referring to the "needac". The
theoretical case is even more theoretical because you have to be half way
through the transition _and_ flooding at the same time such that there is not
a chance for RCU to be idle for a long period of time (which we don't really
see in our systems per-se). But I agree, even if theoretical, maybe better to
handle it.
> > > And unfortunately we can't do the acceleration from __note_gp_changes()
> > > due to lock ordering restrictions: nocb_lock -> rnp_lock
> > >
Yeah I was more going for the fact that if the offload to deoffload
transition completed before note_gp_changes() was called, which obviously we
cannot assume...
But yeah for the 'limbo between offload to not offload state', eventually RCU
goes idle, the scheduling clock interrupt (via rcu_pending()) will raise
softirq or note_gp_changes() will accelerate. Which does not help the
theoretical flooding case above, but still...
> >
> > Ah. This part I am not sure. Appreciate if point me to any old archive
> > links or documentation detailing that, if possible…
>
> It's not documented but the code in nocb_gp_wait() or nocb_cb_wait() has
> that locking order for example.
>
> Excerpt:
>
> rcu_nocb_lock_irqsave(rdp, flags); if (rcu_segcblist_nextgp(cblist,
> &cur_gp_seq) && rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
> raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
> needwake_gp = rcu_advance_cbs(rdp->mynode, rdp);
> raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */ }
>
> Thanks.
Ah I know what you mean now, in other words there is a chance of an ABBA
deadlock if we do not acquire the locks in the correct order. Thank you for
clarifying. Indeed rcutree_migrate_callbacks() also confirms it.
I guess to Frederic's point, another case other than the CB flooding, that can
starve CB accelerations is if the system is in the 'limbo between offload and
de-offload' state for a long period of time. In such a situation also, neither
note_gp_changes(), nor the scheduling-clock interrupt via softirq will help
accelerate CBs.
Whether such a limbo state is possible indefinitely, I am not sure...
Thanks a lot for the discussions and it is always a learning experience
talking about these...!
thanks,
- Joel
next prev parent reply other threads:[~2023-01-24 16:58 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-18 7:30 [PATCH v3] rcu: Remove impossible wakeup rcu GP kthread action from rcu_report_qs_rdp() Zqiang
2023-01-18 10:18 ` Frederic Weisbecker
2023-01-18 18:07 ` Paul E. McKenney
2023-01-18 23:30 ` Zhang, Qiang1
2023-01-20 3:14 ` Joel Fernandes
2023-01-20 3:17 ` Joel Fernandes
2023-01-20 4:09 ` Zhang, Qiang1
2023-01-20 4:40 ` Joel Fernandes
2023-01-20 8:19 ` Zhang, Qiang1
2023-01-20 13:27 ` Joel Fernandes
2023-01-20 20:33 ` Paul E. McKenney
2023-01-20 22:35 ` Frederic Weisbecker
2023-01-20 23:20 ` Paul E. McKenney
2023-01-20 23:04 ` Frederic Weisbecker
2023-01-23 15:22 ` Joel Fernandes
2023-01-23 16:27 ` Frederic Weisbecker
2023-01-23 20:54 ` Joel Fernandes
2023-01-23 21:11 ` Frederic Weisbecker
2023-01-24 16:58 ` Joel Fernandes [this message]
2023-01-21 0:38 ` Zhang, Qiang1
2023-01-24 17:10 ` Joel Fernandes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y9AOMhy4GCyaqxdG@google.com \
--to=joel@joelfernandes.org \
--cc=frederic@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=paulmck@kernel.org \
--cc=qiang1.zhang@intel.com \
--cc=quic_neeraju@quicinc.com \
--cc=rcu@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).