From: Feng Tang <feng.tang@intel.com>
To: Dennis Zhou <dennis@kernel.org>
Cc: Qian Cai <cai@lca.pw>, Andrew Morton <akpm@linux-foundation.org>,
Tejun Heo <tj@kernel.org>, Christoph Lameter <cl@linux.com>,
kernel test robot <rong.a.chen@intel.com>,
Michal Hocko <mhocko@suse.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Matthew Wilcox <willy@infradead.org>,
Mel Gorman <mgorman@suse.de>, Kees Cook <keescook@chromium.org>,
Luis Chamberlain <mcgrof@kernel.org>,
Iurii Zaikin <yzaikin@google.com>,
andi.kleen@intel.com, tim.c.chen@intel.com,
dave.hansen@intel.com, ying.huang@intel.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, lkp@lists.01.org
Subject: Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
Date: Tue, 7 Jul 2020 11:24:39 +0800 [thread overview]
Message-ID: <20200707032439.GA21741@shbuild999.sh.intel.com> (raw)
In-Reply-To: <20200707010651.GA2384124@google.com>
On Tue, Jul 07, 2020 at 01:06:51AM +0000, Dennis Zhou wrote:
> On Mon, Jul 06, 2020 at 09:24:43PM +0800, Feng Tang wrote:
> > Hi All,
> >
> > Please help to review this fix patch, thanks!
> >
> > It is against today's linux-mm tree. For easy review, I put the fix
> > into one patch; I can split it into two parts (percpu-counter and
> > mm/util.c) if that's preferred.
> >
> > From 593f9dc139181a7c3bb1705aacd1f625f400e458 Mon Sep 17 00:00:00 2001
> > From: Feng Tang <feng.tang@intel.com>
> > Date: Mon, 6 Jul 2020 14:48:29 +0800
> > Subject: [PATCH] mm/util.c: sync vm_committed_as when changing memory policy
> > to OVERCOMMIT_NEVER
> >
> > With the patch to improve the scalability of vm_committed_as [1], 0day reported
> > that the ltp overcommit_memory test case could fail (failure rate is about 5/50) [2].
> > The root cause: when the system runs with a loose memory overcommit policy
> > (OVERCOMMIT_GUESS/ALWAYS), the deviation of vm_committed_as can be big. Once the
> > policy is changed at runtime to OVERCOMMIT_NEVER, vm_committed_as's batch is
> > decreased to 1/64 of the original one, but the accumulated deviation is not
> > compensated accordingly, so the following __vm_enough_memory() overcommit check
> > can be wrong, which breaks the ltp overcommit_memory case.
> >
> > Fix it by forcing a sync of the percpu counter vm_committed_as when the
> > overcommit policy is changed to OVERCOMMIT_NEVER (sysctl -w vm.overcommit_memory=2).
> > The sync itself is not a fast operation, but that is tolerable given that users
> > are not expected to change the policy to OVERCOMMIT_NEVER frequently.
> >
> > [1] https://lore.kernel.org/lkml/1592725000-73486-1-git-send-email-feng.tang@intel.com/
> > [2] https://marc.info/?l=linux-mm&m=159367156428286 (can't find a link in lore.kernel.org)
> >
> > Reported-by: kernel test robot <rong.a.chen@intel.com>
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > ---
> > include/linux/percpu_counter.h | 4 ++++
> > lib/percpu_counter.c | 14 ++++++++++++++
> > mm/util.c | 11 ++++++++++-
> > 3 files changed, 28 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> > index 0a4f54d..01861ee 100644
> > --- a/include/linux/percpu_counter.h
> > +++ b/include/linux/percpu_counter.h
> > @@ -44,6 +44,7 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
> > s32 batch);
> > s64 __percpu_counter_sum(struct percpu_counter *fbc);
> > int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
> > +void percpu_counter_sync(struct percpu_counter *fbc);
> >
> > static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
> > {
> > @@ -172,6 +173,9 @@ static inline bool percpu_counter_initialized(struct percpu_counter *fbc)
> > return true;
> > }
> >
> > +static inline void percpu_counter_sync(struct percpu_counter *fbc)
> > +{
> > +}
> > #endif /* CONFIG_SMP */
> >
> > static inline void percpu_counter_inc(struct percpu_counter *fbc)
> > diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> > index a66595b..02d87fc 100644
> > --- a/lib/percpu_counter.c
> > +++ b/lib/percpu_counter.c
> > @@ -98,6 +98,20 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
> > }
> > EXPORT_SYMBOL(percpu_counter_add_batch);
> >
> > +void percpu_counter_sync(struct percpu_counter *fbc)
> > +{
> > + unsigned long flags;
> > + s64 count;
> > +
> > + raw_spin_lock_irqsave(&fbc->lock, flags);
> > + count = __this_cpu_read(*fbc->counters);
> > + fbc->count += count;
> > + __this_cpu_sub(*fbc->counters, count);
> > + raw_spin_unlock_irqrestore(&fbc->lock, flags);
> > +}
> > +EXPORT_SYMBOL(percpu_counter_sync);
> > +
> > +
> > /*
> > * Add up all the per-cpu counts, return the result. This is a more accurate
> > * but much slower version of percpu_counter_read_positive()
> > diff --git a/mm/util.c b/mm/util.c
> > index 52ed9c1..5fb62c0 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -746,14 +746,23 @@ int overcommit_ratio_handler(struct ctl_table *table, int write, void *buffer,
> > return ret;
> > }
> >
> > +static void sync_overcommit_as(struct work_struct *dummy)
> > +{
> > + percpu_counter_sync(&vm_committed_as);
> > +}
> > +
>
> This seems like a rather niche use case as it's currently coupled with a
> schedule_on_each_cpu(). I can't imagine a use case where you'd want to
> do this without being called by schedule_on_each_cpu().
Yes!
>
> Would it be better to modify or introduce something akin to
> percpu_counter_sum() which sums and folds in the counter state? I'd be
> curious to see what the cost of always folding would be as this is
> already considered the cold path and would help with the next batch too.
Initially, I also thought about doing the sync just like percpu_counter_sum():

	raw_spin_lock_irqsave
	for_each_online_cpu(cpu) {
		do-the-sync
	}
	raw_spin_unlock_irqrestore
One problem is that per_cpu_ptr(fbc->counters, cpu) could still be
updated on other CPUs, as the fast-path update is not protected by
fbc->lock.
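For illustration, here is a rough sketch of that sum-and-fold variant
(the function name and placement are made up, it is not part of the
patch), with the race against the lockless fast path marked in a
comment:

	/*
	 * Illustrative only: a percpu_counter_sum()-style "sum and fold"
	 * done under fbc->lock, as if it lived in lib/percpu_counter.c.
	 */
	static s64 percpu_counter_sum_and_fold(struct percpu_counter *fbc)
	{
		unsigned long flags;
		s64 ret;
		int cpu;

		raw_spin_lock_irqsave(&fbc->lock, flags);
		ret = fbc->count;
		for_each_online_cpu(cpu) {
			s32 *pcount = per_cpu_ptr(fbc->counters, cpu);

			ret += *pcount;
			/*
			 * Racy: percpu_counter_add_batch() updates *pcount
			 * with this_cpu_add() without taking fbc->lock, so a
			 * concurrent remote update between the read above and
			 * the clear below can be lost.
			 */
			*pcount = 0;
		}
		fbc->count = ret;
		raw_spin_unlock_irqrestore(&fbc->lock, flags);

		return ret;
	}

That's why the patch's percpu_counter_sync() only folds the local CPU's
count and relies on schedule_on_each_cpu() to run it on every CPU.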
As for the cost, it is about 800 nanoseconds on a 2C/4T platform and
2~3 microseconds on a 2S/36C/72T Skylake server in the normal case. In
the worst case, where vm_committed_as's spinlock is under severe
contention, it costs 30~40 microseconds on the 2S/36C/72T Skylake
server.
Thanks,
Feng
> > int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
> > size_t *lenp, loff_t *ppos)
> > {
> > int ret;
> >
> > ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> > - if (ret == 0 && write)
> > + if (ret == 0 && write) {
> > + if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
> > + schedule_on_each_cpu(sync_overcommit_as);
> > +
> > mm_compute_batch();
> > + }
> >
> > return ret;
> > }
> > --
> > 2.7.4
Thread overview: 28+ messages
2020-06-21 7:36 [PATCH v5 0/3] make vm_committed_as_batch aware of vm overcommit policy Feng Tang
2020-06-21 7:36 ` [PATCH v5 1/3] proc/meminfo: avoid open coded reading of vm_committed_as Feng Tang
2020-06-21 7:36 ` [PATCH v5 2/3] mm/util.c: make vm_memory_committed() more accurate Feng Tang
2020-06-21 7:36 ` [PATCH v5 3/3] mm: adjust vm_committed_as_batch according to vm overcommit policy Feng Tang
2020-06-22 13:25 ` [mm] 4e2c82a409: will-it-scale.per_process_ops 1894.6% improvement kernel test robot
2020-07-02 6:32 ` [mm] 4e2c82a409: ltp.overcommit_memory01.fail kernel test robot
2020-07-02 7:12 ` Feng Tang
2020-07-05 3:20 ` Qian Cai
2020-07-05 4:44 ` Feng Tang
2020-07-05 12:15 ` Qian Cai
2020-07-05 12:58 ` Feng Tang
[not found] ` <20200705155232.GA608@lca.pw>
2020-07-06 1:43 ` Feng Tang
[not found] ` <20200706023614.GA1231@lca.pw>
2020-07-06 13:24 ` Feng Tang
2020-07-06 13:34 ` Andi Kleen
2020-07-06 23:42 ` Andrew Morton
2020-07-07 2:38 ` Feng Tang
2020-07-07 4:00 ` Huang, Ying
2020-07-07 5:41 ` Feng Tang
2020-07-09 4:55 ` Feng Tang
2020-07-09 13:40 ` Qian Cai
2020-07-09 14:15 ` Feng Tang
2020-07-10 1:38 ` Feng Tang
2020-07-07 1:06 ` Dennis Zhou
2020-07-07 3:24 ` Feng Tang [this message]
2020-07-07 10:28 ` Michal Hocko
2020-06-24 9:45 ` [PATCH v5 0/3] make vm_committed_as_batch aware of vm overcommit policy Michal Hocko
[not found] <AF8CFC10-7655-4664-974D-3632793B0710@lca.pw>
2020-07-07 12:06 ` [mm] 4e2c82a409: ltp.overcommit_memory01.fail Michal Hocko
[not found] ` <20200707130436.GA992@lca.pw>
2020-07-07 13:56 ` Michal Hocko