From: Uladzislau Rezki <urezki@gmail.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>,
Uladzislau Rezki <urezki@gmail.com>,
"Zhang, Qiang" <Qiang.Zhang@windriver.com>,
Josh Triplett <josh@joshtriplett.org>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>, rcu <rcu@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: Reply: [PATCH] rcu: shrink each possible cpu krcp
Date: Mon, 31 Aug 2020 11:30:32 +0200
Message-ID: <20200831093032.GA19139@pc636>
In-Reply-To: <20200821153328.GH2855@paulmck-ThinkPad-P72>
On Fri, Aug 21, 2020 at 08:33:28AM -0700, Paul E. McKenney wrote:
> On Thu, Aug 20, 2020 at 06:39:57PM -0400, Joel Fernandes wrote:
> > On Wed, Aug 19, 2020 at 05:58:08PM +0200, Uladzislau Rezki wrote:
> > > On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote:
> > > > On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> > > > > On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote:
> > > > > >
> > > > > >
> > > > > > ________________________________________
> > > > > > > From: linux-kernel-owner@vger.kernel.org <linux-kernel-owner@vger.kernel.org> on behalf of Joel Fernandes <joel@joelfernandes.org>
> > > > > > > Sent: August 19, 2020 8:04
> > > > > > > To: Paul E. McKenney
> > > > > > > Cc: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> > > > > > > Subject: Re: [PATCH] rcu: shrink each possible cpu krcp
> > > > > >
> > > > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > >
> > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > > > > > {
> > > > > > > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > > > > > > > + struct kfree_rcu_cpu *krcp;
> > > > > > > >
> > > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > > > > > return 0;
> > > > > > > >
> > > > > > > > > + /* Drain the krcp of this CPU. IRQs should be disabled? */
> > > > > > > > > + krcp = this_cpu_ptr(&krc);
> > > > > > > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > > > > > > +
> > > > > > > >
> > > > > > > > > A CPU can be offlined and its krcp will be stuck until a shrinker is invoked.
> > > > > > > > > Maybe never.
> > > > > > >
> > > > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> > > > > > > hard time getting too worried about it. ;-)
> > > > > >
> > > > > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > > > > > > >on the cache reaper whose job is to flush the per-cpu caches. So I
> > > > > > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > > > > > >
> > > > > > >thanks,
> > > > > > >
> > > > > > >- Joel
> > > > > >
> > > > > > > When a CPU goes offline, slub or slab flushes only the free objects in
> > > > > > > the offline CPU's cache, moving them to the node list or returning them
> > > > > > > to the buddy system. Objects that are still in use stay in the offline
> > > > > > > CPU's cache.
> > > > > > >
> > > > > > > If we want to clean up the per-CPU "krcp" objects when a CPU goes offline,
> > > > > > > we should free the "krcp" cache objects in "rcutree_offline_cpu". This
> > > > > > > function is called before the other RCU CPU-offline functions, and it
> > > > > > > runs in the "cpuhp/%u" per-CPU thread.
> > > > > >
> > > > >
> > > > > Could you please wrap text properly when you post to mailing list, thanks. I
> > > > > fixed it for you above.
> > > > >
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 8ce77d9ac716..1812d4a1ac1b 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > > > > unsigned long flags;
> > > > > > struct rcu_data *rdp;
> > > > > > struct rcu_node *rnp;
> > > > > > + struct kfree_rcu_cpu *krcp;
> > > > > >
> > > > > > rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > > > rnp = rdp->mynode;
> > > > > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > > > >
> > > > > > // nohz_full CPUs need the tick for stop-machine to work quickly
> > > > > > tick_dep_set(TICK_DEP_BIT_RCU);
> > > > > > +
> > > > > > + krcp = per_cpu_ptr(&krc, cpu);
> > > > > > + raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > > > > + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > > > > > return 0;
> > > > >
> > > > > I realized the above is not good enough for what this is trying to do. Unlike
> > > > > the slab, the new kfree_rcu objects cannot always be drained / submitted to
> > > > > RCU because the previous batch may still be waiting for a grace period. So
> > > > > the above code could very well return with the yet-to-be-submitted kfree_rcu
> > > > > objects still in the cache.
> > > > >
> > > > > One option is to spin-wait here for monitor_todo to be false and keep calling
> > > > > kfree_rcu_drain_unlock() till then.
> > > > >
> > > > > But then that's not good enough either, because if new objects are queued
> > > > > when interrupts are enabled in the CPU offline path, then the cache will get
> > > > > new objects after the previous set was drained. Further, spin waiting may
> > > > > introduce deadlocks.
> > > > >
> > > > > Another option is to switch the kfree_rcu() path to non-batching (so new
> > > > > objects cannot be cached in the offline path and are submitted directly to
> > > > > RCU), wait for a GP and then submit the work. But then not sure if 1-argument
> > > > > kfree_rcu() will like that.
> > > >
> > > > Or spawn a workqueue that does something like this:
> > > >
> > > > 1. Get any pending kvfree_rcu() requests sent off to RCU.
> > > >
> > > > 2. Do an rcu_barrier().
> > > >
> > > > 3. Do the cleanup actions.
> > > >
> > > > > Probably Qiang's original fix using for_each_possible_cpu() is good enough for
> > > > > the shrinker case, and then we can tackle the hotplug one.
> > > >
> > > > It might take some experimentation to find the best solution.
> > > >
> > >
> > > <snip>
> > > static void do_idle(void)
> > > {
> > > ...
> > > while (!need_resched()) {
> > > rmb();
> > >
> > > local_irq_disable();
> > >
> > > if (cpu_is_offline(cpu)) {
> > > tick_nohz_idle_stop_tick();
> > > cpuhp_report_idle_dead();
> > > -> cpuhp_report_idle_dead(void)
> > > -> rcu_report_dead(smp_processor_id());
> > > arch_cpu_idle_dead();
> > > }
> > > ...
> > > <snip>
> > >
> > > We have the rcu_report_dead() callback. When it gets called, IRQs are off
> > > and the CPU in question is offline.
> > >
> > > krcp = per_cpu_ptr(&krc, cpu);
> > > raw_spin_lock_irqsave(&krcp->lock, flags);
> > > krcp->monitor_todo = true;
> > > schedule_delayed_work(&krcp->monitor_work, 0);
> > > raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > >
> > > If there is a batch that is in progress, the job will rearm itself.
> > > But I agree, it requires more experiments.
> >
> > I chatted with Ulad and we believe the timer and/or (delayed) workqueue will
> > get migrated during the CPU offline path, so it is not an issue.
> >
> > In this case, Qiang's initial patch suffices to fix the shrinker issue.
>
> As in the patch that is currently in -rcu, correct?
>
The "[PATCH] rcu: shrink each possible cpu krcp" will fix the
shrinker "issue" when a CPU goes offline, so that was a valid concern.
As for the "main track", our drain work, or the timer that triggers it,
will be migrated anyway. Just to repeat what Joel wrote earlier,
we do not have any issues with this part.
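
To make the shrinker part concrete, below is a toy user-space model, not
kernel code. It only sketches why a count that walks online CPUs can miss
objects left in the cache of an offlined CPU, while a walk over all
possible CPUs sees them. The names toy_krc, count_online(),
count_possible() and the NR_CPUS value are made up for illustration and do
not exist in kernel/rcu/tree.c.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4 /* hypothetical number of possible CPUs */

/* Toy stand-in for a per-CPU krcp: only the state this example needs. */
struct toy_krc {
    bool online;   /* is this CPU currently online? */
    size_t cached; /* objects queued but not yet handed over to RCU */
};

/* What a shrinker count sees if it walks only the online CPUs. */
static size_t count_online(const struct toy_krc *krc, size_t n)
{
    size_t total = 0;

    for (size_t cpu = 0; cpu < n; cpu++)
        if (krc[cpu].online)
            total += krc[cpu].cached;
    return total;
}

/* What a shrinker count sees if it walks every possible CPU. */
static size_t count_possible(const struct toy_krc *krc, size_t n)
{
    size_t total = 0;

    for (size_t cpu = 0; cpu < n; cpu++)
        total += krc[cpu].cached;
    return total;
}
```

With CPU 1 offline and still holding cached objects, count_online() skips
those objects whereas count_possible() includes them, which is the case the
patch is meant to cover.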
--
Vlad Rezki