linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	ying.huang@intel.com, s.priebe@profihost.ag,
	mgorman@techsingularity.net,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	alex.williamson@redhat.com, lkp@01.org, kirill@shutemov.name,
	Andrew Morton <akpm@linux-foundation.org>,
	zi.yan@cs.rutgers.edu, Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression
Date: Mon, 3 Dec 2018 12:26:28 -0800 (PST)	[thread overview]
Message-ID: <alpine.DEB.2.21.1812031210490.192288@chino.kir.corp.google.com> (raw)
In-Reply-To: <20181203192344.GA2986@redhat.com>

On Mon, 3 Dec 2018, Andrea Arcangeli wrote:

> It's trivial to reproduce the badness by running a memhog process that
> allocates more than the RAM of 1 NUMA node, under defrag=always
> setting (or by changing memhog to use MADV_HUGEPAGE) and it'll create
> swap storms despite 75% of the RAM is completely free in a 4 node NUMA
> (or 50% of RAM free in a 2 node NUMA) etc..
> 
> How can it be ok to push the system into gigabytes of swap by default
> without any special capability despite 50% - 75% or more of the RAM is
> free? That's the downside of the __GFP_THISNODE optimizaton.
> 

The swap storm is the issue that is being addressed.  If your remote 
memory is as low as local memory, the patch to clear __GFP_THISNODE has 
done nothing to fix it: you still get swap storms and memory compaction 
can still fail if the per-zone freeing scanner cannot utilize the 
reclaimed memory.  Recall that this patch to clear __GFP_THISNODE was 
measured by me to have a 40% increase in allocation latency for fragmented 
remote memory on Haswell.  It makes the problem much, much worse.

> __GFP_THISNODE helps increasing NUMA locality if your app can fit in a
> single node which is the common David's workload. But if his workload
> would more often than not fit in a single node, he would also run into
> an unacceptable slowdown because of the __GFP_THISNODE.
> 

Which is why I have suggested that we do not do direct reclaim, as the 
page allocator implementation expects all thp page fault allocations to 
have __GFP_NORETRY set, because no amount of reclaim can be shown to be 
useful to the memory compaction freeing scanner if it is iterated over by 
the migration scanner.

> I think there's lots of room for improvement for the future, but in my
> view that __GFP_THISNODE as it was implemented was an incomplete hack,
> that opened the door for bad VM corner cases that should not happen.
> 

__GFP_THISNODE is intended specifically because of the remote access 
latency increase that is encountered if you fault remote hugepages over 
local pages of the native page size.

  reply	other threads:[~2018-12-03 20:26 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-27  6:25 [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression kernel test robot
2018-11-27 17:08 ` Linus Torvalds
2018-11-27 18:17   ` Michal Hocko
2018-11-27 18:21     ` Michal Hocko
2018-11-27 19:05   ` Vlastimil Babka
2018-11-27 19:16     ` Vlastimil Babka
2018-11-27 20:57   ` Andrea Arcangeli
2018-11-27 22:50     ` Linus Torvalds
2018-11-28  6:30       ` Michal Hocko
2018-11-28  3:20     ` Huang, Ying
2018-11-28 16:48       ` Linus Torvalds
2018-11-28 18:39         ` Andrea Arcangeli
2018-11-28 23:10         ` David Rientjes
2018-12-03 18:01         ` Linus Torvalds
2018-12-03 18:14           ` Michal Hocko
2018-12-03 18:19             ` Linus Torvalds
2018-12-03 18:30               ` Michal Hocko
2018-12-03 18:45                 ` Linus Torvalds
2018-12-03 18:59                   ` Michal Hocko
2018-12-03 19:23                     ` Andrea Arcangeli
2018-12-03 20:26                       ` David Rientjes [this message]
2018-12-03 19:28                     ` Linus Torvalds
2018-12-03 20:12                       ` Andrea Arcangeli
2018-12-03 20:36                         ` David Rientjes
2018-12-03 22:04                         ` Linus Torvalds
2018-12-03 22:27                           ` Linus Torvalds
2018-12-03 22:57                             ` David Rientjes
2018-12-04  9:22                             ` Vlastimil Babka
2018-12-04 10:45                               ` Mel Gorman
2018-12-05  0:47                                 ` David Rientjes
2018-12-05  9:08                                   ` Michal Hocko
2018-12-05 10:43                                     ` Mel Gorman
2018-12-05 11:43                                       ` Michal Hocko
2018-12-05 10:06                                 ` Mel Gorman
2018-12-05 20:40                                 ` Andrea Arcangeli
2018-12-05 21:59                                   ` David Rientjes
2018-12-06  0:00                                     ` Andrea Arcangeli
2018-12-05 22:03                                   ` Linus Torvalds
2018-12-05 22:12                                     ` David Rientjes
2018-12-05 23:36                                     ` Andrea Arcangeli
2018-12-05 23:51                                       ` Linus Torvalds
2018-12-06  0:58                                         ` Linus Torvalds
2018-12-06  9:14                                           ` MADV_HUGEPAGE vs. NUMA semantic (was: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression) Michal Hocko
2018-12-06 23:49                                             ` David Rientjes
2018-12-07  7:34                                               ` Michal Hocko
2018-12-07  4:31                                             ` Linus Torvalds
2018-12-07  7:49                                               ` Michal Hocko
2018-12-07  9:06                                                 ` Vlastimil Babka
2018-12-07 23:15                                                   ` David Rientjes
2018-12-06 23:43                                           ` [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression David Rientjes
2018-12-07  4:01                                             ` Linus Torvalds
2018-12-10  0:29                                               ` David Rientjes
2018-12-10  4:49                                                 ` Andrea Arcangeli
2018-12-12  0:37                                                   ` David Rientjes
2018-12-12  9:50                                                     ` Michal Hocko
2018-12-12 17:00                                                       ` Andrea Arcangeli
2018-12-14 11:32                                                         ` Michal Hocko
2018-12-12 10:14                                                     ` Vlastimil Babka
2018-12-14 21:04                                                       ` David Rientjes
2018-12-14 21:33                                                         ` Vlastimil Babka
2018-12-21 22:18                                                           ` David Rientjes
2018-12-22 12:08                                                             ` Mel Gorman
2018-12-14 23:11                                                         ` Mel Gorman
2018-12-21 22:15                                                           ` David Rientjes
2018-12-12 10:44                                                   ` Andrea Arcangeli
2019-04-15 11:48                                             ` Michal Hocko
2018-12-06  0:18                                       ` David Rientjes
2018-12-06  0:54                                         ` Andrea Arcangeli
2018-12-06  9:23                                           ` Vlastimil Babka
2018-12-03 20:39                     ` David Rientjes
2018-12-03 21:25                       ` Michal Hocko
2018-12-03 21:53                         ` David Rientjes
2018-12-04  8:48                           ` Michal Hocko
2018-12-05  0:07                             ` David Rientjes
2018-12-05 10:18                               ` Michal Hocko
2018-12-05 19:16                                 ` David Rientjes
2018-11-27  7:23 kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1812031210490.192288@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@01.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=s.priebe@profihost.ag \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=ying.huang@intel.com \
    --cc=zi.yan@cs.rutgers.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).