linux-kernel.vger.kernel.org archive mirror
From: Feng Tang <feng.tang@intel.com>
To: Qian Cai <cai@lca.pw>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Matthew Wilcox <willy@infradead.org>,
	Mel Gorman <mgorman@suse.de>, Kees Cook <keescook@chromium.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Iurii Zaikin <yzaikin@google.com>,
	andi.kleen@intel.com, tim.c.chen@intel.com,
	dave.hansen@intel.com, ying.huang@intel.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/3] make vm_committed_as_batch aware of vm overcommit policy
Date: Thu, 28 May 2020 23:10:20 +0800
Message-ID: <20200528151020.GF93879@shbuild999.sh.intel.com>
In-Reply-To: <20200528141802.GB1810@lca.pw>

On Thu, May 28, 2020 at 10:18:02AM -0400, Qian Cai wrote:
> > > I have reproduced this on both AMD and Intel. The test just
> > > allocates memory and swaps.
> > > 
> > > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/oom/oom01.c
> > > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/tunable/overcommit_memory.c
> > > 
> > > It might be better to run the whole LTP mm test suite, which has quite
> > > a few memory pressurers, if none of the above triggers it for you.
> > > 
> > > /opt/ltp/runltp -f mm
> > 
> > Thanks for sharing. I tried to reproduce this on 2 server platforms
> > but couldn't, and they are still under testing.
> > 
> > Meanwhile, could you help to try the below patch, which is based on
> > Andi's suggestion and has some debug info. The warning is a little
> > strange, as the condition is
> > 
> > 	(percpu_counter_read(&vm_committed_as) <
> >                        -(s64)vm_committed_as_batch * num_online_cpus())
> > 
> > while for your platform (48 CPU + 128 GB RAM), the
> > '-(s64)vm_committed_as_batch * num_online_cpus()'
> > is an s64 value: '-32G', which makes the condition hard to satisfy,
> > and when it is true, it could be triggered by some magic in the s32/s64
> > operations around the percpu-counter.
> 
> The patch below does not fix anything.

Thanks for the info.

> 
> [ 3160.275230][ T7955] LTP: starting oom01
> [ 3160.306977][T106365] KK as:-59683  as_sum:6896 

This does show that percpu_counter_read() is not accurate
compared to percpu_counter_sum(). If we used
percpu_counter_sum(), the warning would not trigger, as
6896 >> -32768.

> check:-32768 batch:256

-32768 means the RAM of the test platform is either 512MB
or 32GB (more likely), depending on the overcommit policy, and
it has 128 CPUs.
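
To make the arithmetic concrete, here is a minimal sketch of the
quoted check using the numbers from the log. The helper names
underflow_threshold() and would_warn() are made up for illustration
and are not kernel functions; only the comparison itself comes from
the condition quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Threshold from the quoted condition:
 *   -(s64)vm_committed_as_batch * num_online_cpus()
 * The cast comes first so the multiplication is done in 64 bits.
 */
static int64_t underflow_threshold(int32_t batch, int ncpus)
{
	return -(int64_t)batch * ncpus;
}

/*
 * 'counter' is whichever reading is compared against the threshold:
 * the approximate read value ("as") or the exact sum ("as_sum").
 */
static bool would_warn(int64_t counter, int32_t batch, int ncpus)
{
	return counter < underflow_threshold(batch, ncpus);
}
```

With batch 256 and 128 CPUs the threshold is -32768, matching the
"check" value in the log. would_warn(-59683, 256, 128) is true, while
would_warn(6896, 256, 128) is false.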

So my guess at the root cause is that the overcommit policy is
changing during the memory stress test: vm_committed_as.count
can deviate more when the overcommit policy is
!OVERCOMMIT_NEVER, and when the policy is changed to
OVERCOMMIT_NEVER, the check value is hugely reduced, which
triggers the warning.
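
A toy model may help show how this could happen. This is only a
sketch under stated assumptions, not the kernel's percpu_counter:
the struct, the toy_* names and the 4-CPU size are all invented for
illustration, and there is no locking or real per-CPU storage.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NCPUS 4	/* illustrative CPU count, not from the thread */

/* Toy model: a shared count plus one unflushed delta per CPU. */
struct toy_counter {
	int64_t count;
	int32_t pcpu[TOY_NCPUS];
};

/* Add on one CPU; fold the local delta into count once it reaches batch. */
static void toy_add(struct toy_counter *c, int cpu, int32_t amount,
		    int32_t batch)
{
	int64_t delta = (int64_t)c->pcpu[cpu] + amount;

	if (delta >= batch || delta <= -(int64_t)batch) {
		c->count += delta;
		c->pcpu[cpu] = 0;
	} else {
		c->pcpu[cpu] = (int32_t)delta;
	}
}

/* Cheap, approximate read: ignores the unflushed per-CPU deltas. */
static int64_t toy_read(const struct toy_counter *c)
{
	return c->count;
}

/* Exact read: shared count plus every per-CPU delta. */
static int64_t toy_sum(const struct toy_counter *c)
{
	int64_t sum = c->count;

	for (int i = 0; i < TOY_NCPUS; i++)
		sum += c->pcpu[i];
	return sum;
}

/* Same shape as the warning condition, applied to the toy model. */
static bool toy_underflow(int64_t value, int32_t batch)
{
	return value < -(int64_t)batch * TOY_NCPUS;
}
```

For example, with a large batch of 1000, let CPUs 0-2 each add +900
(kept local) and CPU 3 add -1000 twice (flushed each time). Then
toy_read() gives -2000 while toy_sum() gives 700. Against the
large-batch threshold of -4000 nothing is flagged, but if the batch
is then shrunk to 10 the threshold becomes -40 and the stale read
value of -2000 is flagged, even though the exact sum is positive.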

If that's true, then there could be 2 solutions: one is to
skip the WARN_ONCE, as it has no practical value since the real
check is the code that follows it; the other is to rectify the
percpu counter when the policy changes to OVERCOMMIT_NEVER.
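
The second option could look roughly like the sketch below. The
helper toy_counter_sync() is hypothetical and works on plain arrays;
a real version would have to deal with locking and with touching
other CPUs' data, which is ignored here.

```c
#include <stdint.h>

/*
 * Hypothetical helper: fold every unflushed per-CPU delta into the
 * shared count, so that a cheap read of the count matches the exact
 * sum again. Intended to run once when the policy is switched.
 */
static void toy_counter_sync(int64_t *count, int32_t *pcpu, int ncpus)
{
	for (int i = 0; i < ncpus; i++) {
		*count += pcpu[i];
		pcpu[i] = 0;
	}
}
```

If the shared count is -2000 and the per-CPU deltas are 900, 900,
900 and 0, the count becomes 700 after the sync and all deltas are
zero, so a check against the smaller threshold no longer sees a
stale negative value.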

Thanks,
Feng

> [ 3160.307161][T106365] ------------[ cut here ]------------
> [ 3160.307184][T106365] memory commitment underflow
> [ 3160.307216][T106365] WARNING: CPU: 103 PID: 106365 at mm/util.c:858 __vm_enough_memory+0x204/0x250
> [ 3160.307275][T106365] Modules linked in: brd ext4 crc16 mbcache jbd2 loop kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci libahci mdio tg3 libata libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod
> [ 3160.307341][T106365] CPU: 103 PID: 106365 Comm: oom01 Not tainted 5.7.0-rc7-next-20200528+ #3
> [ 3160.307368][T106365] NIP:  c0000000003ee654 LR: c0000000003ee650 CTR: c000000000745b40
> [ 3160.307382][T106365] REGS: c0002017f940f730 TRAP: 0700   Not tainted  (5.7.0-rc7-next-20200528+)
> [ 3160.307409][T106365] MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28222482  XER: 00000000
> [ 3160.307442][T106365] CFAR: c00000000010e428 IRQMASK: 0
> [ 3160.307442][T106365] GPR00: c0000000003ee650 c0002017f940f9c0 c00000000130a300 000000000000001b
> [ 3160.307442][T106365] GPR04: c0000000017b8d68 0000000000000006 0000000078154885 ffffffff6ca32090
> [ 3160.307442][T106365] GPR08: 0000201cc61a0000 0000000000000000 0000000000000000 0000000000000003
> [ 3160.307442][T106365] GPR12: 0000000000002000 c000201fff675100 0000000000000000 0000000000000000
> [ 3160.307442][T106365] GPR16: 0000000000000000 0000000000000000 c000200350fe3b60 fffffffffff7dfff
> [ 3160.307442][T106365] GPR20: c000201a36824928 c000201a3682c128 c000200ee33ad3a0 c000200ee33ad3a8
> [ 3160.307442][T106365] GPR24: c000200ee33ad390 ffffffffffff16dd 0000000000000000 0000000000000001
> [ 3160.307442][T106365] GPR28: c000201a36824880 0000000000000001 c000000003fa2a80 c0000000011fb0a8
> [ 3160.307723][T106365] NIP [c0000000003ee654] __vm_enough_memory+0x204/0x250
> [ 3160.307759][T106365] LR [c0000000003ee650] __vm_enough_memory+0x200/0x250
> [ 3160.308310][T106365] ---[ end trace e2152aa44c190593 ]---
> [ 3160.308478][T106365] KK as:-59683  as_sum:6897  check:-32768 batch:256
> [ 3160.308614][T106365] KK as:-59683  as_sum:6898  check:-32768 batch:256
> [ 3160.308714][T106365] KK as:-59683  as_sum:6901  check:-32768 batch:256
> [ 3160.308806][T106365] KK as:-59683  as_sum:6902  check:-32768 batch:256
> [ 3160.308900][T106365] KK as:-59683  as_sum:6903  check:-32768 batch:256
> [ 3160.308979][T106365] KK as:-59683  as_sum:6904  check:-32768 batch:256
> [ 3160.309064][T106365] KK as:-59683  as_sum:6905  check:-32768 batch:256
> [ 3160.309160][T106365] KK as:-59683  as_sum:6906  check:-32768 batch:256
> [ 3160.309275][T106365] KK as:-59683  as_sum:6907  check:-32768 batch:256
> [ 3160.309356][T106365] KK as:-59683  as_sum:6908  check:-32768 batch:256
> [ 3160.309437][T106365] KK as:-59683  as_sum:6909  check:-32768 batch:256
> [ 3160.310939][T106366] KK as:-59683  as_sum:6912  check:-32768 batch:256
> [ 3160.311134][T106366] KK as:-59270  as_sum:7040  check:-32768 batch:256
> [ 3160.311240][T106367] KK as:-59270  as_sum:7168  check:-32768 batch:256
> 
