From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>,
Suren Baghdasaryan <surenb@google.com>,
kernel-team <kernel-team@android.com>, Tejun Heo <tj@kernel.org>,
Tim Murray <timmurray@google.com>, Wei Wang <wvw@google.com>,
Kyle Lin <kylelin@google.com>, Chunwei Lu <chunweilu@google.com>,
Lulu Wang <luluw@google.com>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <quic_neeraju@quicinc.com>,
Josh Triplett <josh@joshtriplett.org>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>, rcu <rcu@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] EXP rcu: Move expedited grace period (GP) work to RT kthread_worker
Date: Fri, 8 Apr 2022 10:41:26 -0400 [thread overview]
Message-ID: <CAEXW_YSrGKXh5DiJyrNvmbssSXbWBkA-XUjGRdS8HtGvW1r6hw@mail.gmail.com> (raw)
In-Reply-To: <20220408143444.GC4285@paulmck-ThinkPad-P17-Gen-1>
On Fri, Apr 8, 2022 at 10:34 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Apr 08, 2022 at 06:42:42AM -0400, Joel Fernandes wrote:
> > On Fri, Apr 8, 2022 at 12:57 AM Kalesh Singh <kaleshsingh@google.com> wrote:
> > >
> > [...]
> > > @@ -334,15 +334,13 @@ static bool exp_funnel_lock(unsigned long s)
> > > * Select the CPUs within the specified rcu_node that the upcoming
> > > * expedited grace period needs to wait for.
> > > */
> > > -static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
> > > +static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp)
> > > {
> > > int cpu;
> > > unsigned long flags;
> > > unsigned long mask_ofl_test;
> > > unsigned long mask_ofl_ipi;
> > > int ret;
> > > - struct rcu_exp_work *rewp =
> > > - container_of(wp, struct rcu_exp_work, rew_work);
> > > struct rcu_node *rnp = container_of(rewp, struct rcu_node, rew);
> > >
> > > raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > @@ -417,13 +415,119 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
> > > rcu_report_exp_cpu_mult(rnp, mask_ofl_test, false);
> > > }
> > >
> > > +static void rcu_exp_sel_wait_wake(unsigned long s);
> > > +
> > > +#ifdef CONFIG_RCU_EXP_KTHREAD
> >
> > Just my 2c:
> >
> > Honestly, I am not sure the benefits of duplicating the code to use
> > normal workqueues outweigh the drawbacks (namely code complexity and
> > code duplication, which can in turn cause more bugs and maintenance
> > headaches down the line). The code is harder to read, and adding more
> > 30-character function names does not help.
> >
> > For something as important as expedited GPs, I can't imagine a
> > scenario where an RT kthread worker would cause "issues". If it does
> > cause issues, that's what the -rc cycles and the stable releases are
> > for. I prefer to trust the process than take a one-foot-in-the-door
> > approach.
> >
> > So please, can we just keep it simple?
>
> Yes and no.
>
> This is a bug fix, but only for those systems that are expecting real-time
> response from synchronize_rcu_expedited(). As far as I know, this is only
> Android. The rest of the systems are just fine with the current behavior.
As far as you know, but are you sure?
> In addition, this bug fix introduces significant risks, especially in
> terms of performance for throughput-oriented workloads.
Could you explain what the risk is? That's the part I did not follow.
How can giving synchronize_rcu_expedited() work RT priority introduce
throughput issues?
> So yes, let's do this bug fix (with appropriate adjustment), but let's
> also avoid exposing the non-Android workloads to risks from the inevitable
> unintended consequences. ;-)
I would argue the risk is also added code complexity and more bugs,
without a clear rationale for why it is being done. There's always risk
with any change, but that's what the -rc cycles and stable kernels help
catch. I think we should not add more code complexity for a theoretical
concern.

There's also another possible risk: a hidden problem here that the
non-Android folks probably haven't noticed or been able to debug. I
would rather just do the right thing.
Just my 2c,
- Joel
>
> Thanx, Paul
>
> > Thanks,
> >
> > - Joel
> >
> >
> > > +static void sync_rcu_exp_select_node_cpus(struct kthread_work *wp)
> > > +{
> > > + struct rcu_exp_work *rewp =
> > > + container_of(wp, struct rcu_exp_work, rew_work);
> > > +
> > > + __sync_rcu_exp_select_node_cpus(rewp);
> > > +}
> > > +
> > > +static inline bool rcu_gp_par_worker_started(void)
> > > +{
> > > + return !!READ_ONCE(rcu_exp_par_gp_kworker);
> > > +}
> > > +
> > > +static inline void sync_rcu_exp_select_cpus_queue_work(struct rcu_node *rnp)
> > > +{
> > > + kthread_init_work(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
> > > + /*
> > > + * Use rcu_exp_par_gp_kworker, because flushing a work item from
> > > + * another work item on the same kthread worker can result in
> > > + * deadlock.
> > > + */
> > > + kthread_queue_work(rcu_exp_par_gp_kworker, &rnp->rew.rew_work);
> > > +}
> > > +
> > > +static inline void sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp)
> > > +{
> > > + kthread_flush_work(&rnp->rew.rew_work);
> > > +}
> > > +
> > > +/*
> > > + * Work-queue handler to drive an expedited grace period forward.
> > > + */
> > > +static void wait_rcu_exp_gp(struct kthread_work *wp)
> > > +{
> > > + struct rcu_exp_work *rewp;
> > > +
> > > + rewp = container_of(wp, struct rcu_exp_work, rew_work);
> > > + rcu_exp_sel_wait_wake(rewp->rew_s);
> > > +}
> > > +
> > > +static inline void synchronize_rcu_expedited_queue_work(struct rcu_exp_work *rew)
> > > +{
> > > + kthread_init_work(&rew->rew_work, wait_rcu_exp_gp);
> > > + kthread_queue_work(rcu_exp_gp_kworker, &rew->rew_work);
> > > +}
> > > +
> > > +static inline void synchronize_rcu_expedited_destroy_work(struct rcu_exp_work *rew)
> > > +{
> > > +}
> > > +#else /* !CONFIG_RCU_EXP_KTHREAD */
> > > +static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
> > > +{
> > > + struct rcu_exp_work *rewp =
> > > + container_of(wp, struct rcu_exp_work, rew_work);
> > > +
> > > + __sync_rcu_exp_select_node_cpus(rewp);
> > > +}
> > > +
> > > +static inline bool rcu_gp_par_worker_started(void)
> > > +{
> > > + return !!READ_ONCE(rcu_par_gp_wq);
> > > +}
> > > +
> > > +static inline void sync_rcu_exp_select_cpus_queue_work(struct rcu_node *rnp)
> > > +{
> > > + int cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, -1);
> > > +
> > > + INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
> > > + /* If all offline, queue the work on an unbound CPU. */
> > > + if (unlikely(cpu > rnp->grphi - rnp->grplo))
> > > + cpu = WORK_CPU_UNBOUND;
> > > + else
> > > + cpu += rnp->grplo;
> > > + queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
> > > +}
> > > +
> > > +static inline void sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp)
> > > +{
> > > + flush_work(&rnp->rew.rew_work);
> > > +}
> > > +
> > > +/*
> > > + * Work-queue handler to drive an expedited grace period forward.
> > > + */
> > > +static void wait_rcu_exp_gp(struct work_struct *wp)
> > > +{
> > > + struct rcu_exp_work *rewp;
> > > +
> > > + rewp = container_of(wp, struct rcu_exp_work, rew_work);
> > > + rcu_exp_sel_wait_wake(rewp->rew_s);
> > > +}
> > > +
> > > +static inline void synchronize_rcu_expedited_queue_work(struct rcu_exp_work *rew)
> > > +{
> > > + INIT_WORK_ONSTACK(&rew->rew_work, wait_rcu_exp_gp);
> > > + queue_work(rcu_gp_wq, &rew->rew_work);
> > > +}
> > > +
> > > +static inline void synchronize_rcu_expedited_destroy_work(struct rcu_exp_work *rew)
> > > +{
> > > + destroy_work_on_stack(&rew->rew_work);
> > > +}
> > > +#endif /* CONFIG_RCU_EXP_KTHREAD */
> > > +
> > > /*
> > > * Select the nodes that the upcoming expedited grace period needs
> > > * to wait for.
> > > */
> > > static void sync_rcu_exp_select_cpus(void)
> > > {
> > > - int cpu;
> > > struct rcu_node *rnp;
> > >
> > > trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("reset"));
> > > @@ -435,28 +539,21 @@ static void sync_rcu_exp_select_cpus(void)
> > > rnp->exp_need_flush = false;
> > > if (!READ_ONCE(rnp->expmask))
> > > continue; /* Avoid early boot non-existent wq. */
> > > - if (!READ_ONCE(rcu_par_gp_wq) ||
> > > + if (!rcu_gp_par_worker_started() ||
> > > rcu_scheduler_active != RCU_SCHEDULER_RUNNING ||
> > > rcu_is_last_leaf_node(rnp)) {
> > > - /* No workqueues yet or last leaf, do direct call. */
> > > + /* No worker started yet or last leaf, do direct call. */
> > > sync_rcu_exp_select_node_cpus(&rnp->rew.rew_work);
> > > continue;
> > > }
> > > - INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
> > > - cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, -1);
> > > - /* If all offline, queue the work on an unbound CPU. */
> > > - if (unlikely(cpu > rnp->grphi - rnp->grplo))
> > > - cpu = WORK_CPU_UNBOUND;
> > > - else
> > > - cpu += rnp->grplo;
> > > - queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
> > > + sync_rcu_exp_select_cpus_queue_work(rnp);
> > > rnp->exp_need_flush = true;
> > > }
> > >
> > > - /* Wait for workqueue jobs (if any) to complete. */
> > > + /* Wait for jobs (if any) to complete. */
> > > rcu_for_each_leaf_node(rnp)
> > > if (rnp->exp_need_flush)
> > > - flush_work(&rnp->rew.rew_work);
> > > + sync_rcu_exp_select_cpus_flush_work(rnp);
> > > }
> > >
> > > /*
> > > @@ -622,17 +719,6 @@ static void rcu_exp_sel_wait_wake(unsigned long s)
> > > rcu_exp_wait_wake(s);
> > > }
> > >
> > > -/*
> > > - * Work-queue handler to drive an expedited grace period forward.
> > > - */
> > > -static void wait_rcu_exp_gp(struct work_struct *wp)
> > > -{
> > > - struct rcu_exp_work *rewp;
> > > -
> > > - rewp = container_of(wp, struct rcu_exp_work, rew_work);
> > > - rcu_exp_sel_wait_wake(rewp->rew_s);
> > > -}
> > > -
> > > #ifdef CONFIG_PREEMPT_RCU
> > >
> > > /*
> > > @@ -848,20 +934,19 @@ void synchronize_rcu_expedited(void)
> > > } else {
> > > /* Marshall arguments & schedule the expedited grace period. */
> > > rew.rew_s = s;
> > > - INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
> > > - queue_work(rcu_gp_wq, &rew.rew_work);
> > > + synchronize_rcu_expedited_queue_work(&rew);
> > > }
> > >
> > > /* Wait for expedited grace period to complete. */
> > > rnp = rcu_get_root();
> > > wait_event(rnp->exp_wq[rcu_seq_ctr(s) & 0x3],
> > > sync_exp_work_done(s));
> > > - smp_mb(); /* Workqueue actions happen before return. */
> > > + smp_mb(); /* Work actions happen before return. */
> > >
> > > /* Let the next expedited grace period start. */
> > > mutex_unlock(&rcu_state.exp_mutex);
> > >
> > > if (likely(!boottime))
> > > - destroy_work_on_stack(&rew.rew_work);
> > > + synchronize_rcu_expedited_destroy_work(&rew);
> > > }
> > > EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
> > >
> > > base-commit: 42e7a03d3badebd4e70aea5362d6914dfc7c220b
> > > --
> > > 2.35.1.1178.g4f1659d476-goog
> > >
Thread overview: 19+ messages
2022-04-08 4:57 [PATCH v2] EXP rcu: Move expedited grace period (GP) work to RT kthread_worker Kalesh Singh
2022-04-08 10:21 ` Joel Fernandes
2022-04-08 15:25 ` Paul E. McKenney
2022-04-08 15:39 ` Steven Rostedt
2022-04-08 15:58 ` Paul E. McKenney
2022-04-08 21:27 ` Steven Rostedt
2022-04-08 10:42 ` Joel Fernandes
2022-04-08 14:34 ` Paul E. McKenney
2022-04-08 14:41 ` Joel Fernandes [this message]
2022-04-08 15:34 ` Paul E. McKenney
2022-04-08 17:14 ` Joel Fernandes
2022-04-08 17:39 ` Paul E. McKenney
2022-04-08 17:53 ` Kalesh Singh
2022-04-08 21:42 ` Steven Rostedt
2022-04-08 22:06 ` Kalesh Singh
[not found] ` <20220409071740.6024-1-hdanton@sina.com>
2022-04-09 15:56 ` Paul E. McKenney
[not found] ` <20220413113711.1263-1-hdanton@sina.com>
2022-04-13 14:07 ` Paul E. McKenney
2022-04-13 17:21 ` Joel Fernandes
2022-04-13 18:07 ` Paul E. McKenney