linux-mm.kvack.org archive mirror
From: Andrea Arcangeli <aarcange@redhat.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Alex Williamson <alex.williamson@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Subject: Re: [PATCH] mm, thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings
Date: Tue, 4 Sep 2018 23:44:03 -0400
Message-ID: <20180905034403.GN4762@redhat.com>
In-Reply-To: <20180830164057.GK2656@dhcp22.suse.cz>

On Thu, Aug 30, 2018 at 06:40:57PM +0200, Michal Hocko wrote:
> On Thu 30-08-18 10:02:23, Zi Yan wrote:
> > On 30 Aug 2018, at 9:45, Michal Hocko wrote:
> > 
> > > On Thu 30-08-18 09:22:21, Zi Yan wrote:
> > >> On 30 Aug 2018, at 3:00, Michal Hocko wrote:
> > >>
> > >>> On Wed 29-08-18 18:54:23, Zi Yan wrote:
> > >>> [...]
> > >>>> I tested it against Linus's tree with "memhog -r3 130g" in a two-socket machine with 128GB memory on
> > >>>> each node and got the results below. I expect this test should fill one node, then fall back to the other.
> > >>>>
> > >>>> 1. madvise(MADV_HUGEPAGE) + defrag = {always, madvise, defer+madvise}:
> > >>>> no swap, THPs are allocated in the fallback node.

no swap

> > >>>> 2. madvise(MADV_HUGEPAGE) + defrag = defer: pages got swapped to the
> > >>>> disk instead of being allocated in the fallback node.

swap

> > >>>> 3. no madvise, THP is on by default + defrag = {always, defer,
> > >>>> defer+madvise}: pages got swapped to the disk instead of being
> > >>>> allocated in the fallback node.

swap

> > >>>> 4. no madvise, THP is on by default + defrag = madvise: no swap, base
> > >>>> pages are allocated in the fallback node.

no swap

> > >>>> The result 2 and 3 seems unexpected, since pages should be allocated in the fallback node.

I agree it's not great for 2 and 3.

I don't see how the above can be considered a 100% "pass" of the test;
at best it's a 50% pass.
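
For reference, the four outcomes quoted above can be tallied with a
short Python sketch. The table only transcribes the reported results;
the names REPORTED_CASES, pass_rate and swapping_cases are made up for
illustration and are not part of any tool.

```python
# Outcomes transcribed from the quoted memhog test on a two-socket machine.
# Each entry: (case number, MADV_HUGEPAGE used, defrag settings, swapped?)
REPORTED_CASES = [
    (1, True, ("always", "madvise", "defer+madvise"), False),
    (2, True, ("defer",), True),
    (3, False, ("always", "defer", "defer+madvise"), True),
    (4, False, ("madvise",), False),
]


def pass_rate(cases):
    """Fraction of cases that used the fallback node without swapping."""
    passed = [case for case in cases if not case[3]]
    return len(passed) / len(cases)


def swapping_cases(cases):
    """Case numbers where pages were swapped out instead of falling back."""
    return [case[0] for case in cases if case[3]]


if __name__ == "__main__":
    print("pass rate:", pass_rate(REPORTED_CASES))
    print("cases that swapped:", swapping_cases(REPORTED_CASES))
```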

Let me clarify the setup to be sure:

1) There was no hard bind at all

2) Let's also ignore NUMA balancing, which is anything but restrictive
   at the start and is meant to converge over time if current
   conditions don't allow immediate convergence. For simplicity let's
   assume NUMA balancing is off.

So what the test exercised is the plain normal allocation of RAM with
THP main knob enabled to "always" on a NUMA system.

Whether or not madvise was used, 2 cases out of 4 decided to swap out
instead of allocating totally free THP or PAGE_SIZE pages.

By contrast, there would have been absolutely zero swapouts in the
exact same test if the main THP knob had been disabled with:

     echo never >/sys/kernel/mm/transparent_hugepage/enabled
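
As an aside, the knob files report the active value in brackets (for
example "always madvise [never]"), so the current setting can be
checked before and after a test run. The following is a minimal Python
sketch; the helper names active_setting and read_knob are hypothetical,
and read_knob returns None when the file is not readable.

```python
import re


def active_setting(knob_text):
    """Return the bracketed (currently selected) value, or None if absent."""
    match = re.search(r"\[([^\]]+)\]", knob_text)
    return match.group(1) if match else None


def read_knob(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Read a THP sysfs knob and return its active value, or None on error."""
    try:
        with open(path) as knob:
            return active_setting(knob.read())
    except OSError:
        # The file is missing or unreadable (no THP support, or not Linux).
        return None


if __name__ == "__main__":
    print("enabled:", read_knob())
    print("defrag:", read_knob("/sys/kernel/mm/transparent_hugepage/defrag"))
```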

There is no way that enabling THP (no matter what other defrag
settings were and no matter if MADV_HUGEPAGE was used or not) should
cause heavy swap storms during page faults allocating memory, when
disabling THP doesn't swap even a single 4k page. That can't possibly
be right.

This is because there is no way the overhead of swapping can be
compensated by the THP improvement.

And by swapping I really mean "reclaim"; testing with the swapout
testcase is just simpler and doesn't require an iommu pinning all
memory. So setting may_swap and may_unmap to zero won't move the
needle: my test showed massive CPU consumption from trying so hard
to generate THP from the local node, yet nothing got swapped out
because of the iommu pins.

That kind of swapping may only pay off in the very long term, which
is what khugepaged is for. khugepaged already takes care of the long
term, so we could later discuss whether khugepaged should swap out in
such a condition, but I don't think there's much to argue about the
page fault.

> Thanks for your and Stefan's testing. I will wait for some more
> feedback. I will be offline next few days and if there are no major
> objections I will repost with both tested-bys early next week.

I'm not so positive about 2 of the above tests if I understood the
test correctly.

Those results are totally fine if you used a non-default memory
policy, but with MPOL_DEFAULT, and in turn no hard bind of the memory,
I'm afraid it'll be even harder to reproduce when things go wrong
again in those two cases.

Thanks,
Andrea

