archive mirror
 help / color / mirror / Atom feed
From: Rik van Riel <>
To: Hugh Dickins <>
Cc: Yu Xu <>,
	Andrew Morton <>,
	 Mel Gorman <>,
	Andrea Arcangeli <>,
	Matthew Wilcox <>,
	 Michal Hocko <>, Vlastimil Babka <>,
	"Kirill A. Shutemov" <>,,,
Subject: Re: [PATCH v2] mm,thp,shmem: limit shmem THP alloc gfp_mask
Date: Thu, 22 Oct 2020 23:40:53 -0400	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <alpine.LSU.2.11.2010221909060.1001@eggly.anvils>

[-- Attachment #1: Type: text/plain, Size: 3973 bytes --]

On Thu, 2020-10-22 at 19:54 -0700, Hugh Dickins wrote:
> On Thu, 22 Oct 2020, Rik van Riel wrote:
> > The allocation flags of anonymous transparent huge pages can be
> controlled
> > through the files in /sys/kernel/mm/transparent_hugepage/defrag,
> which can
> > help the system from getting bogged down in the page reclaim and
> compaction
> > code when many THPs are getting allocated simultaneously.
> > 
> > However, the gfp_mask for shmem THP allocations were not limited by
> those
> > configuration settings, and some workloads ended up with all CPUs
> stuck
> > on the LRU lock in the page reclaim code, trying to allocate dozens
> of
> > THPs simultaneously.
> > 
> > This patch applies the same configurated limitation of THPs to
> shmem
> > hugepage allocations, to prevent that from happening.
> > 
> > This way a THP defrag setting of "never" or "defer+madvise" will
> result
> > in quick allocation failures without direct reclaim when no 2MB
> free
> > pages are available.
> > 
> > Signed-off-by: Rik van Riel <>
> NAK in its present untested form: see below.

Oops. That issue is easy to fix, but indeed lets figure
out what the desired behavior is.

> I'm open to change here, particularly to Yu Xu's point (in other
> mail)
> about direct reclaim - we avoid that here in Google too: though it's
> not so much to avoid the direct reclaim, as to avoid the latencies of
> direct compaction, which __GFP_DIRECT_RECLAIM allows as a side-
> effect.
> > @@ -1887,7 +1888,8 @@ static int shmem_getpage_gfp(struct inode
> *inode, pgoff_t index,
> >       }
> >  
> >  alloc_huge:
> > -     page = shmem_alloc_and_acct_page(gfp, inode, index, true);
> > +     huge_gfp = alloc_hugepage_direct_gfpmask(vma);
> Still looks nice: but what about the crash when vma is NULL?

That's a one line fix, but I suppose we should get the
discussion on what the code behavior should be out of
the way first :)

> Michal is right to remember pushback before, because tmpfs is a
> filesystem, and "huge=" is a mount option: in using a huge=always
> filesystem, the user has already declared a preference for huge
> pages.
> Whereas the original anon THP had to deduce that preference from sys
> tunables and vma madvice.


> But it's likely that they have accumulated some defrag wisdom, which
> tmpfs can take on board - but please accept that in using a huge
> mount,
> the preference for huge has already been expressed, so I don't expect
> anon THP alloc_hugepage_direct_gfpmask() choices will map one to one.

In my mind, the huge= mount options for tmpfs corresponded
to the "enabled" anon THP options, denoting a desired end
state, not necessarily how much we will stall allocations
to get there immediately.

The underlying allocation behavior has been changed repeatedly,
with changes to the direct reclaim code and the compaction
deferral code.

The shmem THP gfp_mask never tried really hard anyway,
with __GFP_NORETRY being the default, which matches what
is used for non-VM_HUGEPAGE anon VMAs.

Likewise, the direct reclaim done from the opportunistic
THP allocations done by the shmem code limited itself to
reclaiming 32 4kB pages per THP allocation.

In other words, mounting
with huge=always has never behaved
the same as the more aggressive allocations done for

This patch would leave shmem THP allocations for non-MADV_HUGEPAGE
mapped files opportunistic like today, and make shmem THP
allocations for files mapped with MADV_HUGEPAGE more aggressive
than today.

However, I would like to know what people think the shmem
huge= mount options should do, and how things should behave
when memory gets low, before pushing in a patch just because
it makes the system run smoother "without changing current
behavior too much".

What do people want tmpfs THP allocations to do?

All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2020-10-23  3:41 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-22 16:45 [PATCH v2] mm,thp,shmem: limit shmem THP alloc gfp_mask Rik van Riel
2020-10-22 16:52 ` Vlastimil Babka
2020-10-23  2:54 ` Hugh Dickins
2020-10-23  3:40   ` Rik van Riel [this message]
2020-10-23  8:49     ` Michal Hocko
2020-10-23 12:55     ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).