linux-mm.kvack.org archive mirror
From: Vlastimil Babka <vbabka@suse.cz>
To: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>,
	xuyu@linux.alibaba.com, mgorman@suse.de, aarcange@redhat.com,
	willy@infradead.org, linux-kernel@vger.kernel.org,
	kernel-team@fb.com, linux-mm@kvack.org, mhocko@suse.com
Subject: Re: [PATCH v6 0/3] mm,thp,shm: limit shmem THP alloc gfp_mask
Date: Mon, 14 Dec 2020 23:52:54 +0100
Message-ID: <e3f67a5f-9835-2752-3d35-fb5f6d701cf1@suse.cz>
In-Reply-To: <alpine.LSU.2.11.2012141226350.1925@eggly.anvils>

On 12/14/20 10:16 PM, Hugh Dickins wrote:
> On Tue, 24 Nov 2020, Rik van Riel wrote:
> 
>> The allocation flags of anonymous transparent huge pages can be controlled
>> through the file /sys/kernel/mm/transparent_hugepage/defrag, which can
>> help keep the system from getting bogged down in the page reclaim and
>> compaction code when many THPs are getting allocated simultaneously.
>> 
>> However, the gfp_mask for shmem THP allocations was not limited by those
>> configuration settings, and some workloads ended up with all CPUs stuck
>> on the LRU lock in the page reclaim code, trying to allocate dozens of
>> THPs simultaneously.
>> 
>> This patch applies the same configured limitation of THPs to shmem
>> hugepage allocations, to prevent that from happening.
>> 
>> This way a THP defrag setting of "never" or "defer+madvise" will result
>> in quick allocation failures without direct reclaim when no 2MB free
>> pages are available.
>> 
>> With this patch applied, THP allocations for tmpfs will be a little
>> more aggressive than today for files mmapped with MADV_HUGEPAGE,
>> and a little less aggressive for files that are not mmapped or
>> mapped without that flag.
>> 
>> v6: make khugepaged actually obey tmpfs mount flags
>> v5: reduce gfp mask further if needed, to accommodate i915 (Matthew Wilcox)
>> v4: rename alloc_hugepage_direct_gfpmask to vma_thp_gfp_mask (Matthew Wilcox)
>> v3: fix NULL vma issue spotted by Hugh Dickins & tested
>> v2: move gfp calculation to shmem_getpage_gfp as suggested by Yu Xu
> 
> Andrew, please don't rush
> 
> mmthpshmem-limit-shmem-thp-alloc-gfp_mask.patch
> mmthpshm-limit-gfp-mask-to-no-more-than-specified.patch
> mmthpshmem-make-khugepaged-obey-tmpfs-mount-flags.patch
> 
> to Linus in your first wave of mmotm->5.11 sendings.
> Or, alternatively, go ahead and send them to Linus, but
> be aware that I'm fairly likely to want adjustments later.
> 
> Sorry for limping along so far behind, but I still have more
> re-reading of the threads to do, and I'm still investigating
> why tmpfs huge=always becomes so ineffective in my testing with
> these changes, even if I ramp up from default defrag=madvise to
> defrag=always:
>                     5.10   mmotm
> thp_file_alloc   4641788  216027
> thp_file_fallback 275339 8895647

So AFAICS before the patch shmem allocated hugepages basically with:
mapping_gfp_mask(inode->i_mapping) | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN
where mapping_gfp_mask() should be the default GFP_HIGHUSER_MOVABLE unless I
missed some shmem-specific override of the mask.

So the important flags mean all zones available, both __GFP_DIRECT_RECLAIM and
__GFP_KSWAPD_RECLAIM, but also __GFP_NORETRY which makes it less aggressive.

Now, with defrag=madvise and without madvised vma, there's just
GFP_TRANSHUGE_LIGHT, which means no __GFP_DIRECT_RECLAIM (and no
__GFP_KSWAPD_RECLAIM). Thus no reclaim and compaction at all. Indeed "little
less aggressive" is an understatement.

On the other hand, with defrag=always and again without madvised vma there
should be GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM | __GFP_NORETRY, so
compared to "before the patch" this is only missing __GFP_KSWAPD_RECLAIM. I
would be surprised if this meant so much difference in your testing as you show
above - I think you would have to be allocating those THPs just at a rate where
kswapd+kcompactd can keep up and nothing else "steals" the pages that background
reclaim+compaction creates.
In that (subjectively unlikely) case, I think significant improvement should be
visible with defrag=defer over defrag=madvise.
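
To make the comparison above concrete, here is a minimal userspace sketch of
the masks being discussed. It is an illustration under stated assumptions, not
kernel code: the bit values are made up, GFP_BASE_MOVABLE is a hypothetical
stand-in for the non-reclaim bits of GFP_HIGHUSER_MOVABLE, GFP_TRANSHUGE_LIGHT
is simplified, and the function names shmem_gfp_before() and thp_gfp_after()
as well as the enum are invented for this example. The per-setting logic
follows the reasoning in this mail about how the defrag setting and the
madvised state of the vma select the flags.

```c
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real gfp bits differ. */
typedef unsigned int gfp_t;
#define __GFP_DIRECT_RECLAIM 0x01u
#define __GFP_KSWAPD_RECLAIM 0x02u
#define __GFP_NORETRY        0x04u
#define __GFP_NOWARN         0x08u
#define __GFP_COMP           0x10u
#define __GFP_RECLAIM        (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

/* Hypothetical stand-in for the non-reclaim part of GFP_HIGHUSER_MOVABLE. */
#define GFP_BASE_MOVABLE     0x20u
#define GFP_HIGHUSER_MOVABLE (GFP_BASE_MOVABLE | __GFP_RECLAIM)

/* Simplified: no reclaim bits at all in the "light" variant. */
#define GFP_TRANSHUGE_LIGHT \
	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | __GFP_NOWARN) & ~__GFP_RECLAIM)
#define GFP_TRANSHUGE        (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)

/* Hypothetical enum mirroring the values accepted by the defrag sysfs file. */
enum defrag_mode {
	DEFRAG_ALWAYS,
	DEFRAG_DEFER,
	DEFRAG_DEFER_MADVISE,
	DEFRAG_MADVISE,
	DEFRAG_NEVER,
};

/* Mask shmem used for huge pages before the series, per the analysis above. */
static gfp_t shmem_gfp_before(void)
{
	return GFP_HIGHUSER_MOVABLE | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN;
}

/* Mask selected from the defrag setting once the series is applied. */
static gfp_t thp_gfp_after(enum defrag_mode mode, bool madvised)
{
	switch (mode) {
	case DEFRAG_ALWAYS:
		/* Direct reclaim always; give up early unless madvised. */
		return GFP_TRANSHUGE | (madvised ? 0 : __GFP_NORETRY);
	case DEFRAG_DEFER:
		/* Only wake background reclaim/compaction. */
		return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
	case DEFRAG_DEFER_MADVISE:
		return GFP_TRANSHUGE_LIGHT |
		       (madvised ? __GFP_DIRECT_RECLAIM : __GFP_KSWAPD_RECLAIM);
	case DEFRAG_MADVISE:
		return GFP_TRANSHUGE_LIGHT |
		       (madvised ? __GFP_DIRECT_RECLAIM : 0);
	case DEFRAG_NEVER:
	default:
		return GFP_TRANSHUGE_LIGHT;
	}
}
```

In this model, shmem_gfp_before() ^ thp_gfp_after(DEFRAG_ALWAYS, false) leaves
only __GFP_KSWAPD_RECLAIM set, while thp_gfp_after(DEFRAG_MADVISE, false) has
no reclaim bit at all, which is the asymmetry described above.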

> I've been looking into it off and on for weeks (gfp_mask wrangling is
> not my favourite task! so tend to find higher priorities to divert me);
> hoped to arrive at a conclusion before merge window, but still have
> nothing constructive to say yet, hence my silence so far.
> 
> Above's "a little less aggressive" appears understatement at present.
> I respect what Rik is trying to achieve here, and I may end up
> concluding that there's nothing better to be done than what he has.
> My kind of hugepage-thrashing-in-low-memory may be so remote from
> normal usage, and could be skirting the latency horrors we all want
> to avoid: but I haven't reached that conclusion yet - the disparity
> in effectiveness still deserves more investigation.
> 
> (There's also a specific issue with the gfp_mask limiting: I have
> not yet reviewed the allowing and denying in detail, but it looks
> like it does not respect the caller's GFP_ZONEMASK - the gfp in
> shmem_getpage_gfp() and shmem_read_mapping_page_gfp() is there to
> satisfy the gma500, which wanted to use shmem but could only manage
> DMA32.  I doubt it wants THPs, but shmem_enabled=force forces them.)
> 
> Thanks,
> Hugh
> 
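
On the GFP_ZONEMASK point quoted above, the following is a rough userspace
sketch of how a limiting helper could keep the caller's zone bits while still
narrowing the huge page mask. Everything here is an assumption made for
illustration: the bit values are invented, the helper name limit_gfp_sketch()
is hypothetical, and the choice of "allow" and "deny" sets is one plausible
reading of "limit gfp mask to no more than specified", not a statement of
what the patches in this thread actually do.

```c
/* Illustrative bit values only; the kernel's real gfp bits differ. */
typedef unsigned int gfp_t;
#define __GFP_DMA            0x001u
#define __GFP_HIGHMEM        0x002u
#define __GFP_DMA32          0x004u
#define __GFP_MOVABLE        0x008u
#define GFP_ZONEMASK \
	(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32 | __GFP_MOVABLE)
#define __GFP_IO             0x010u
#define __GFP_FS             0x020u
#define __GFP_DIRECT_RECLAIM 0x040u
#define __GFP_KSWAPD_RECLAIM 0x080u
#define __GFP_RECLAIM        (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define __GFP_NOWARN         0x100u
#define __GFP_NORETRY        0x200u
#define __GFP_COMP           0x400u

/*
 * Hypothetical helper: narrow huge_gfp (derived from the THP defrag setting)
 * by limit_gfp (what the caller of shmem asked for).
 */
static gfp_t limit_gfp_sketch(gfp_t huge_gfp, gfp_t limit_gfp)
{
	/* Flags that must be present in BOTH masks to survive. */
	const gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
	/* Flags that restrict behaviour: keep them if EITHER mask has them. */
	const gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;

	/* Start from huge_gfp without its permissive bits and zone bits. */
	gfp_t result = huge_gfp & ~(allowflags | GFP_ZONEMASK);

	/* Allocate only from the zones the caller originally requested. */
	result |= limit_gfp & GFP_ZONEMASK;

	/* Union of the restrictive flags, intersection of the permissive ones. */
	result |= limit_gfp & denyflags;
	result |= (huge_gfp & limit_gfp) & allowflags;

	return result;
}
```

With a DMA32-only caller, for example limit_gfp = __GFP_DMA32 | __GFP_IO |
__GFP_FS | __GFP_RECLAIM, and a huge_gfp that names __GFP_HIGHMEM |
__GFP_MOVABLE, the result in this model carries __GFP_DMA32 as its only zone
bit, so a forced huge page would not come from a zone the device cannot reach.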



Thread overview: 24+ messages
2020-11-24 19:49 [PATCH v6 0/3] mm,thp,shm: limit shmem THP alloc gfp_mask Rik van Riel
2020-11-24 19:49 ` [PATCH 1/3] mm,thp,shmem: " Rik van Riel
2020-11-26 16:56   ` Vlastimil Babka
2020-11-27  8:15   ` Michal Hocko
2020-11-24 19:49 ` [PATCH 2/3] mm,thp,shm: limit gfp mask to no more than specified Rik van Riel
2020-11-26 13:40   ` Michal Hocko
2020-11-26 18:04     ` Rik van Riel
2020-11-27  7:52       ` Michal Hocko
2020-11-27 19:03         ` Rik van Riel
2020-11-30 10:00           ` Michal Hocko
2020-11-30 14:40             ` Rik van Riel
2020-11-24 19:49 ` [PATCH 3/3] mm,thp,shmem: make khugepaged obey tmpfs mount flags Rik van Riel
2020-11-26 17:18   ` Vlastimil Babka
2020-11-26 18:14     ` Rik van Riel
2020-11-26 19:42       ` Vlastimil Babka
2020-11-26 20:14         ` Rik van Riel
2020-12-14 21:16 ` [PATCH v6 0/3] mm,thp,shm: limit shmem THP alloc gfp_mask Hugh Dickins
2020-12-14 22:20   ` Andrew Morton
2020-12-14 22:52   ` Vlastimil Babka [this message]
2021-02-24  8:41     ` Hugh Dickins
2021-02-24 14:46       ` Rik van Riel
2021-02-24 16:55         ` Hugh Dickins
2021-02-24 17:10           ` [PATCH 4/3] mm,shmem,thp: limit shmem THP allocations to requested zones Rik van Riel
2021-02-26 12:34             ` Vlastimil Babka
