linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Alex Williamson <alex.williamson@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH 2/2] mm: thp: fix transparent_hugepage/defrag = madvise || always
Date: Wed, 22 Aug 2018 11:02:14 +0200	[thread overview]
Message-ID: <20180822090214.GF29735@dhcp22.suse.cz> (raw)
In-Reply-To: <20180821214049.GG13047@redhat.com>

On Tue 21-08-18 17:40:49, Andrea Arcangeli wrote:
> On Tue, Aug 21, 2018 at 01:50:57PM +0200, Michal Hocko wrote:
[...]
> > I really detest a new gfp flag for one time semantic that is muddy as
> > hell.
> 
> Well there's no way to fix this other than to prevent reclaim to run,
> if you still want to give a chance to page faults to obtain THP under
> MADV_HUGEPAGE in the page fault without waiting minutes or hours for
> khugpaged to catch up with it.

I do not get that part. Why should caller even care about reclaim vs.
compaction. How can you even make an educated guess what makes more
sense? This should be fully controlled by the allocator path. The caller
should only care about how hard to try. It's been some time since I've
looked but we used to have a gfp flags to tell that for THP allocations
as well.

> > This is simply incomprehensible. How can anybody who is not deeply
> > familiar with the allocator/reclaim internals know when to use it.
> 
> Nobody should use this in drivers, it's a __GFP flag.

Like other __GFP flags (e.g. __GFP_NOWARN, __GFP_COMP, __GFP_ZERO and
many others)?

> Note:
> 
> 	if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) {
> 
> #define alloc_hugepage_vma(gfp_mask, vma, addr, order)	\
> 	alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
> 
> Only THP is ever affected by the BUG so nothing else will ever need to
> call __GFP_COMPACT_ONLY. It is a VM internal flag, I wish there was a
> way to make the build fail if a driver would use it but there isn't
> right now.

My experience tells me that there is nothing like an internal gfp flag
and abusing them is quite common and really hard to get rid of. Having a
THP specific gfp flag has also turned out to be a bad idea (e.g. GFP_THISNODE).

> > If this is really a regression then we should start by pinpointing the
> 
> You can check yourself, create a 2 node vnuma guest or pick any host
> with more than one node. Set defrag=always and run "memhog -r11111111
> 18g" if host has 16g per node. Add some swap and notice the swap storm
> while all ram is left free in the other node.

I am not disputing the bug itself. How hard should defrag=allways really
try is good question and I would say different people would have
different ideas but a swapping storm sounds like genuinely unwanted
behavior. I would expect that to be handled in the reclaim/compaction.
GFP_TRANSHUGE doesn't have ___GFP_RETRY_MAYFAIL so it shouldn't really
try too hard to reclaim.

> > real culprit and go from there. If this is really 5265047ac301 then just
> 
> In my view there's no single culprit, but it was easy to identify the
> last drop that made the MADV_HUGEPAGE glass overflow, and it's that
> commit that adds __GFP_THISNODE. The combination of the previous code
> that prioritized NUMA over THP and then the MADV_HUGEPAGE logic that
> still uses compaction (and in turn reclaim if compaction fails with
> COMPACT_SKIPPED because there's no 4k page in the local node) just
> falls apart with __GFP_THISNODE set as well on top of it and it
> doesn't do the expected thing either without it (i.e. THP gets
> priority over NUMA locality without such flag).
> 
> __GFP_THISNODE and the logic there, only works ok when
> __GFP_DIRECT_RECLAIM is not set, i.e. MADV_HUGEPAGE not set.

I still have to digest the __GFP_THISNODE thing but I _think_ that the
alloc_pages_vma code is just trying to be overly clever and
__GFP_THISNODE is not a good fit for it. 

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2018-08-22  9:02 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-20  3:22 [PATCH 0/2] fix for "pathological THP behavior" Andrea Arcangeli
2018-08-20  3:22 ` [PATCH 1/2] mm: thp: consolidate policy_nodemask call Andrea Arcangeli
2018-08-20  3:22 ` [PATCH 2/2] mm: thp: fix transparent_hugepage/defrag = madvise || always Andrea Arcangeli
2018-08-20  3:26   ` [PATCH 0/1] fix for "pathological THP behavior" v2 Andrea Arcangeli
2018-08-20  3:26     ` [PATCH 1/1] mm: thp: fix transparent_hugepage/defrag = madvise || always Andrea Arcangeli
2018-08-20 12:35   ` [PATCH 2/2] " Zi Yan
2018-08-20 15:32     ` Andrea Arcangeli
2018-08-21 11:50   ` Michal Hocko
2018-08-21 21:40     ` Andrea Arcangeli
2018-08-22  9:02       ` Michal Hocko [this message]
2018-08-22 11:07         ` Michal Hocko
2018-08-22 14:24           ` Andrea Arcangeli
2018-08-22 14:45             ` Michal Hocko
2018-08-22 15:24               ` Andrea Arcangeli
2018-08-23 10:50                 ` Michal Hocko
2018-08-22 15:52         ` Andrea Arcangeli
2018-08-23 10:52           ` Michal Hocko
2018-08-28  7:53             ` Michal Hocko
2018-08-28  8:18               ` Michal Hocko
2018-08-28  8:54                 ` Stefan Priebe - Profihost AG
2018-08-29 11:11                   ` Stefan Priebe - Profihost AG
     [not found]                 ` <D5F4A33C-0A37-495C-9468-D6866A862097@cs.rutgers.edu>
2018-08-29 14:28                   ` Michal Hocko
2018-08-29 14:35                     ` Michal Hocko
2018-08-29 15:22                       ` Zi Yan
2018-08-29 15:47                         ` Michal Hocko
2018-08-29 16:06                           ` Zi Yan
2018-08-29 16:25                             ` Michal Hocko
2018-08-29 19:24                               ` [PATCH] mm, thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings Michal Hocko
2018-08-29 22:54                                 ` Zi Yan
2018-08-30  7:00                                   ` Michal Hocko
2018-08-30 13:22                                     ` Zi Yan
2018-08-30 13:45                                       ` Michal Hocko
2018-08-30 14:02                                         ` Zi Yan
2018-08-30 16:19                                           ` Stefan Priebe - Profihost AG
2018-08-30 16:40                                           ` Michal Hocko
2018-09-05  3:44                                             ` Andrea Arcangeli
2018-09-05  7:08                                               ` Michal Hocko
2018-09-06 11:10                                                 ` Vlastimil Babka
2018-09-06 11:16                                                   ` Vlastimil Babka
2018-09-06 11:25                                                     ` Michal Hocko
2018-09-06 12:35                                                       ` Zi Yan
2018-09-06 10:59                                   ` Vlastimil Babka
2018-09-06 11:17                                     ` Zi Yan
2018-08-30  6:47                                 ` Michal Hocko
2018-09-06 11:18                                   ` Vlastimil Babka
2018-09-06 11:27                                     ` Michal Hocko
2018-09-12 17:29                                 ` Mel Gorman
2018-09-17  6:11                                   ` Michal Hocko
2018-09-17  7:04                                     ` Stefan Priebe - Profihost AG
2018-09-17  9:32                                       ` Stefan Priebe - Profihost AG
2018-09-17 11:27                                       ` Michal Hocko
2018-08-20 11:58 ` [PATCH 0/2] fix for "pathological THP behavior" Kirill A. Shutemov
2018-08-20 15:19   ` Andrea Arcangeli
2018-08-21 15:30     ` Vlastimil Babka
2018-08-21 17:26       ` David Rientjes
2018-08-21 22:18         ` Andrea Arcangeli
2018-08-21 22:05       ` Andrea Arcangeli
2018-08-22  9:24       ` Michal Hocko
2018-08-22 15:56         ` Andrea Arcangeli
2018-08-20 19:06   ` Yang Shi
2018-08-20 23:24     ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180822090214.GF29735@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).