From: David Rientjes <rientjes@google.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
ying.huang@intel.com, s.priebe@profihost.ag,
mgorman@techsingularity.net,
Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
alex.williamson@redhat.com, lkp@01.org, kirill@shutemov.name,
Andrew Morton <akpm@linux-foundation.org>,
zi.yan@cs.rutgers.edu, Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression
Date: Mon, 3 Dec 2018 12:26:28 -0800 (PST)
Message-ID: <alpine.DEB.2.21.1812031210490.192288@chino.kir.corp.google.com>
In-Reply-To: <20181203192344.GA2986@redhat.com>
On Mon, 3 Dec 2018, Andrea Arcangeli wrote:
> It's trivial to reproduce the badness by running a memhog process that
> allocates more than the RAM of 1 NUMA node, under defrag=always
> setting (or by changing memhog to use MADV_HUGEPAGE) and it'll create
> swap storms despite 75% of the RAM being completely free in a 4 node
> NUMA system (or 50% of the RAM being free in a 2 node NUMA system).
>
> How can it be ok to push the system into gigabytes of swap by default
> without any special capability when 50% - 75% or more of the RAM is
> free? That's the downside of the __GFP_THISNODE optimization.
>
The swap storm is the issue that is being addressed. If your remote
memory is as low as local memory, the patch to clear __GFP_THISNODE has
done nothing to fix it: you still get swap storms, and memory compaction
can still fail if the per-zone freeing scanner cannot utilize the
reclaimed memory. Recall that I measured this patch to clear
__GFP_THISNODE as increasing allocation latency by 40% for fragmented
remote memory on Haswell. It makes the problem much, much worse.
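
For readers who want to try the reproducer described in the quoted text
above, here is a minimal sketch in C. It is not the actual memhog source;
the function names are hypothetical and the node-size arithmetic assumes
equally sized NUMA nodes. It maps anonymous memory, hints the range with
MADV_HUGEPAGE, and touches every base page so that each one is faulted in.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Fraction of total RAM that is still free once a single node is full,
 * ASSUMING all nodes are the same size: 1 - 1/N.
 * Hypothetical helper, only here to spell out the 75% / 50% figures.
 */
double free_fraction_when_one_node_full(int nr_nodes)
{
    return 1.0 - 1.0 / (double)nr_nodes;
}

/*
 * Hypothetical memhog-like helper: map `bytes` of anonymous memory,
 * ask for transparent hugepages on the range, and write one byte to
 * every base page so that each page is faulted in.
 *
 * Returns the number of base pages touched, or -1 if mmap() fails.
 * The mapping is released before returning; a real reproducer would
 * keep it alive (and size it above one node's RAM) to build pressure.
 */
long hog_with_thp_hint(size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    long touched = 0;
    size_t off;

    if (p == MAP_FAILED)
        return -1;

#ifdef MADV_HUGEPAGE
    /* Only a hint: it fails harmlessly if THP is not available. */
    (void)madvise(p, bytes, MADV_HUGEPAGE);
#endif

    for (off = 0; off < bytes; off += (size_t)page) {
        p[off] = 1;
        touched++;
    }

    munmap(p, bytes);
    return touched;
}
```

To approximate the scenario under discussion, the size passed to
hog_with_thp_hint() would have to exceed the RAM of one node and the
mapping would have to stay alive; the small sizes suitable for a quick
test only exercise the fault path.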
> __GFP_THISNODE helps increase NUMA locality if your app can fit in a
> single node, which is David's common workload. But if his workload
> would more often than not fit in a single node, he would also run into
> an unacceptable slowdown because of the __GFP_THISNODE.
>
Which is why I have suggested that we do not do direct reclaim, as the
page allocator implementation expects all thp page fault allocations to
have __GFP_NORETRY set. No amount of reclaim can be shown to be useful
to the memory compaction freeing scanner if the reclaimed memory is
iterated over by the migration scanner.
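
To make the two positions concrete, here is a userspace sketch in C of
the flag handling being argued about. It is not kernel code: the SK_*
bit values and both function names are made up for illustration and do
not match the kernel's real gfp definitions. The first function models
the suggestion above (keep __GFP_THISNODE and __GFP_NORETRY, and make
direct reclaim optional); the second models the patch that clears
__GFP_THISNODE.

```c
#include <stdbool.h>

/* Illustrative bit values only, NOT the kernel's real gfp bits. */
#define SK_GFP_DIRECT_RECLAIM 0x1u
#define SK_GFP_NORETRY        0x2u
#define SK_GFP_THISNODE       0x4u

/*
 * Hypothetical model of the suggested policy for a thp page fault:
 * stay on the local node and do not retry; direct reclaim is only
 * added when the caller explicitly allows it.
 */
unsigned int sketch_thp_fault_gfp(bool allow_direct_reclaim)
{
    unsigned int gfp = SK_GFP_THISNODE | SK_GFP_NORETRY;

    if (allow_direct_reclaim)
        gfp |= SK_GFP_DIRECT_RECLAIM;
    return gfp;
}

/*
 * Hypothetical model of the patch under discussion: drop the
 * local-node restriction and leave every other bit untouched.
 */
unsigned int sketch_clear_thisnode(unsigned int gfp)
{
    return gfp & ~SK_GFP_THISNODE;
}
```

In this model, sketch_thp_fault_gfp(false) yields a mask with the
local-node and no-retry bits set and the direct-reclaim bit clear, which
is the combination argued for here.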
> I think there's lots of room for improvement for the future, but in my
> view that __GFP_THISNODE as it was implemented was an incomplete hack,
> that opened the door for bad VM corner cases that should not happen.
>
__GFP_THISNODE is used intentionally, specifically because of the
increase in access latency that is encountered if you fault remote
hugepages instead of local pages of the native page size.