From: Joel Fernandes <joel@joelfernandes.org>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
qiang.zhang@windriver.com, Josh Triplett <josh@joshtriplett.org>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>, rcu <rcu@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] rcu: shrink each possible cpu krcp
Date: Tue, 18 Aug 2020 19:25:54 -0400
Message-ID: <20200818232554.GA2850477@google.com>
In-Reply-To: <20200818215511.GA2538@pc636>

On Tue, Aug 18, 2020 at 11:55:11PM +0200, Uladzislau Rezki wrote:
> > On Tue, Aug 18, 2020 at 03:00:35PM -0400, Joel Fernandes wrote:
> > > On Tue, Aug 18, 2020 at 1:18 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> > > > > On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > > > >
> > > > > > > From: Zqiang <qiang.zhang@windriver.com>
> > > > > > >
> > > > > > > Due to cpu hotplug. some cpu may be offline after call "kfree_call_rcu"
> > > > > > > func, if the shrinker is triggered at this time, we should drain each
> > > > > > > possible cpu "krcp".
> > > > > > >
> > > > > > > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> > > > > > > ---
> > > > > > > kernel/rcu/tree.c | 6 +++---
> > > > > > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > > unsigned long count = 0;
> > > > > > >
> > > > > > > /* Snapshot count of all CPUs */
> > > > > > > - for_each_online_cpu(cpu) {
> > > > > > > + for_each_possible_cpu(cpu) {
> > > > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > > count += READ_ONCE(krcp->count);
> > > > > > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > > int cpu, freed = 0;
> > > > > > > unsigned long flags;
> > > > > > >
> > > > > > > - for_each_online_cpu(cpu) {
> > > > > > > + for_each_possible_cpu(cpu) {
> > > > > > > int count;
> > > > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > > > > > int cpu;
> > > > > > > unsigned long flags;
> > > > > > >
> > > > > > > - for_each_online_cpu(cpu) {
> > > > > > > + for_each_possible_cpu(cpu) {
> > > > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > > raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > > > >
> > > > > > I agree that it can happen.
> > > > > >
> > > > > > Joel, what is your view?
> > > > >
> > > > > Yes I also think it is possible. The patch LGTM. Another fix could be
> > > > > to drain the caches in the CPU offline path and save the memory. But
> > > > > then it will take hit during __get_free_page(). If CPU
> > > > > offlining/online is not frequent, then it will save the lost memory.
> > > > >
> > > > > I wonder how other per-cpu caches in the kernel work in such scenarios.
> > > > >
> > > > > Thoughts?
> > > >
> > > > Do I count this as an ack or a review? If not, what precisely would
> > > > you like the submitter to do differently?
> > >
> > > Hi Paul,
> > > The patch is correct and is definitely an improvement. I was thinking
> > > about whether we should always do what the patch is doing when
> > > offlining CPUs to save memory but now I feel that may not be that much
> > > of a win to justify more complexity.
> > >
> > > You can take it with my ack:
> > >
> > > Acked-by: Joel Fernandes <joel@joelfernandes.org>
> >
> > Thank you all! I wordsmithed a bit as shown below, so please let
> > me know if I messed anything up.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit fe5d89cc025b3efe682cac122bc4d39f4722821e
> > Author: Zqiang <qiang.zhang@windriver.com>
> > Date: Fri Aug 14 14:45:57 2020 +0800
> >
> > rcu: Shrink each possible cpu krcp
> >
> > CPUs can go offline shortly after kfree_call_rcu() has been invoked,
> > which can leave memory stranded until those CPUs come back online.
> > This commit therefore drains the kcrp of each CPU, not just the
> > ones that happen to be online.
> >
> > Acked-by: Joel Fernandes <joel@joelfernandes.org>
> > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 02ca8e5..d9f90f6 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3500,7 +3500,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > unsigned long count = 0;
> >
> > /* Snapshot count of all CPUs */
> > - for_each_online_cpu(cpu) {
> > + for_each_possible_cpu(cpu) {
> > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> > count += READ_ONCE(krcp->count);
> > @@ -3515,7 +3515,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > int cpu, freed = 0;
> > unsigned long flags;
> >
> > - for_each_online_cpu(cpu) {
> > + for_each_possible_cpu(cpu) {
> > int count;
> > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> > @@ -3548,7 +3548,7 @@ void __init kfree_rcu_scheduler_running(void)
> > int cpu;
> > unsigned long flags;
> >
> > - for_each_online_cpu(cpu) {
> > + for_each_possible_cpu(cpu) {
> > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> > raw_spin_lock_irqsave(&krcp->lock, flags);
> >
>
> Should we just clean a krc of a CPU when it goes offline?
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b8ccd7b5af82..6decb9ad2421 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> {
> struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> + struct kfree_rcu_cpu *krcp;
>
> if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> return 0;
>
> + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> + krcp = this_cpu_ptr(&krc)
> + schedule_delayed_work(&krcp->monitor_work, 0);
> +
>
> A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> Maybe be never.
>
Yes, that is a bug, as we discussed on IRC. Thanks for following up as well.
We need to acquire krcp->lock too, so that this does not race with the
kfree_rcu_work. And if no monitor is scheduled, there is nothing to do. So,
same as what the shrinker does:

	raw_spin_lock_irqsave(&krcp->lock, flags);
	if (krcp->monitor_todo)
		kfree_rcu_drain_unlock(krcp, flags);
	else
		raw_spin_unlock_irqrestore(&krcp->lock, flags);
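
To make the locking pattern above concrete, here is a minimal userspace
sketch of it. This is not kernel code: struct krc_model and the
krc_model_*() helpers are hypothetical stand-ins for struct kfree_rcu_cpu
and kfree_rcu_drain_unlock(), and a pthread mutex stands in for the raw
spinlock. The model drain simply empties the queue, which the real
function may not be able to do immediately.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct kfree_rcu_cpu. */
struct krc_model {
	pthread_mutex_t lock;	/* stands in for krcp->lock */
	bool monitor_todo;	/* true while a monitor run is pending */
	int count;		/* objects queued on this "CPU" */
};

/*
 * Stand-in for kfree_rcu_drain_unlock(): entered with the lock held,
 * returns with it released. Returns the number of objects drained.
 */
static int krc_model_drain_unlock(struct krc_model *k)
{
	int drained = k->count;

	k->count = 0;
	k->monitor_todo = false;
	pthread_mutex_unlock(&k->lock);
	return drained;
}

/*
 * Offline-path sketch: take the lock, drain only if a monitor is
 * pending, otherwise just drop the lock because there is nothing to do.
 */
static int krc_model_offline(struct krc_model *k)
{
	pthread_mutex_lock(&k->lock);
	if (k->monitor_todo)
		return krc_model_drain_unlock(k);
	pthread_mutex_unlock(&k->lock);
	return 0;
}
```

Calling krc_model_offline() a second time on the same object finds
monitor_todo clear and returns 0 without touching the queue, which is the
"nothing to do" case.
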
thanks!
- Joel