From: Michal Hocko <mhocko@kernel.org>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mel Gorman <mgorman@suse.de>, Vlastimil Babka <vbabka@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [Question] Should direct reclaim time be bounded?
Date: Tue, 23 Apr 2019 09:19:53 +0200	[thread overview]
Message-ID: <20190423071953.GC25106@dhcp22.suse.cz> (raw)
In-Reply-To: <d38a095e-dc39-7e82-bb76-2c9247929f07@oracle.com>

On Mon 22-04-19 21:07:28, Mike Kravetz wrote:
[...]
> However, consider the case of a 2 node system where:
> node 0 has 2GB memory
> node 1 has 4GB memory
> 
> Now, if one wants to allocate 4GB of huge pages, they may be tempted to simply
> "echo 2048 > nr_hugepages".  At first this will go well, until node 0 is out
> of memory.  When this happens, alloc_pool_huge_page() will continue to be
> called.  Because of that for_each_node_mask_to_alloc() macro, it will likely
> attempt to first allocate a page from node 0.  It will call direct reclaim and
> compaction until it fails.  Then, it will successfully allocate from node 1.
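
For reference, the loop being described looks roughly like this (paraphrased
from mm/hugetlb.c of that era; helper names and signatures differ across
kernel versions):

static int alloc_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
{
        struct page *page = NULL;
        int nr_nodes, node;
        gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;

        /* round-robin over the allowed nodes, starting at the next-to-alloc node */
        for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
                /*
                 * __GFP_THISNODE pins each attempt to a single node, so a
                 * depleted node goes through direct reclaim/compaction and
                 * has to fail before the loop moves on to the next node.
                 */
                page = alloc_fresh_huge_page(h, gfp_mask, node, nodes_allowed);
                if (page)
                        break;
        }

        return page ? 1 : 0;
}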

Yeah, the even distribution is quite a strong statement. We just try to
distribute somehow, and that is not likely to work all that well on systems
with nodes of different sizes. I know it sucks, but I've been recommending
the per-node /sys/devices/system/node/node$N/hugepages/hugepages-2048kB/nr_hugepages
interface instead, because it allows one to define the actual policy much
better. I guess we want to be more specific about this in the documentation
at least.
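
As a concrete user-space illustration of that per-node interface, a minimal
sketch; the helper name and the chosen page counts are made up, and it
assumes 2MB huge pages and that both nodes exist:

#include <stdio.h>

/* Write the per-node huge page count via sysfs; hypothetical helper, needs root. */
static int set_node_hugepages(int node, int nr)
{
        char path[160];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/node/node%d/hugepages/hugepages-2048kB/nr_hugepages",
                 node);
        f = fopen(path, "w");
        if (!f)
                return -1;      /* node or huge page size not available */
        fprintf(f, "%d\n", nr);
        return fclose(f);       /* 0 on success */
}

int main(void)
{
        /* e.g. 512 pages (1GB) on the 2GB node, 1536 pages (3GB) on the 4GB node */
        if (set_node_hugepages(0, 512) || set_node_hugepages(1, 1536))
                return 1;
        return 0;
}

Each per-node write is then satisfied only from that node, instead of
spreading a single global count over all nodes.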

> In our distro kernel, I am thinking about making allocations try "less hard"
> on nodes where we start to see failures.  less hard == NORETRY/NORECLAIM.
> I was going to try something like this on an upstream kernel when I noticed
> that it seems like direct reclaim may never end/exit.  It 'may' exit, but I
> instrumented __alloc_pages_slowpath() and saw it take well over an hour
> before I 'tricked' it into exiting.
> 
> [ 5916.248341] hpage_slow_alloc: jiffies 5295742  tries 2   node 0 success
> [ 5916.249271]                   reclaim 5295741  compact 1
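
A purely illustrative sketch of the "less hard" idea above, in gfp terms;
the per-node noretry mask and the helper name are hypothetical, not existing
hugetlb code:

/*
 * Hypothetical: nodes that already failed a pool allocation get
 * __GFP_NORETRY (at most one light reclaim/compaction pass) instead of
 * __GFP_RETRY_MAYFAIL.  The 'noretry_nodes' bookkeeping is made up.
 */
static struct page *alloc_pool_page_on_node(struct hstate *h, int nid,
                                            nodemask_t *nmask,
                                            const nodemask_t *noretry_nodes)
{
        gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE |
                         __GFP_COMP | __GFP_NOWARN;

        if (node_isset(nid, *noretry_nodes))
                gfp_mask |= __GFP_NORETRY;
        else
                gfp_mask |= __GFP_RETRY_MAYFAIL;

        return __alloc_pages_nodemask(gfp_mask, huge_page_order(h), nid, nmask);
}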

That stall is unexpected though. What does "tries" mean? The number of
reclaim attempts? If so, could you enable tracing to see what takes so long
in the reclaim path?

> This is where it stalled after "echo 4096 > nr_hugepages" on a little VM
> with 8GB total memory.
> 
> I have not started looking at the direct reclaim code to see exactly where
> we may be stuck, or trying really hard.  My question is, "Is this expected
> or should direct reclaim be somewhat bounded?"  With __alloc_pages_slowpath
> getting 'stuck' in direct reclaim, the documented behavior for huge page
> allocation is not going to happen.

Well, our "how hard to try for hugetlb pages" is quite arbitrary. We
used to rety as long as at least order worth of pages have been
reclaimed but that didn't make any sense since the lumpy reclaim was
gone. So the semantic has change to reclaim&compact as long as there is
some progress. From what I understad above it seems that you are not
thrashing and calling reclaim again and again but rather one reclaim
round takes ages.

That being said, I do not think __GFP_RETRY_MAYFAIL is wrong here. It looks
like something is going wrong in the reclaim itself.
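
Concretely, this is roughly the retry policy __GFP_RETRY_MAYFAIL buys in
__alloc_pages_slowpath() (paraphrased, argument lists trimmed; details
differ between kernel versions):

        /* __GFP_NORETRY callers give up after the first reclaim/compact pass */
        if (gfp_mask & __GFP_NORETRY)
                goto nopage;

        /* costly orders only keep retrying when __GFP_RETRY_MAYFAIL is set... */
        if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
                goto nopage;

        /* ...and then only while reclaim/compaction report some progress */
        if (should_reclaim_retry(...))
                goto retry;

        if (did_some_progress > 0 && should_compact_retry(...))
                goto retry;

So the retry loop itself is bounded by progress checks; a single reclaim
round taking an hour points at the reclaim path rather than at this loop.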

-- 
Michal Hocko
SUSE Labs


Thread overview: 20+ messages
2019-04-23  4:07 [Question] Should direct reclaim time be bounded? Mike Kravetz
2019-04-23  7:19 ` Michal Hocko [this message]
2019-04-23 16:39   ` Mike Kravetz
2019-04-24 14:35     ` Vlastimil Babka
2019-06-28 18:20       ` Mike Kravetz
2019-07-01  8:59         ` Mel Gorman
2019-07-02  3:15           ` Mike Kravetz
2019-07-03  9:43             ` Mel Gorman
2019-07-03 23:54               ` Mike Kravetz
2019-07-04 11:09                 ` Michal Hocko
2019-07-04 15:11                   ` Mike Kravetz
2019-07-08  5:19             ` Hillf Danton
2019-07-10 18:42             ` Mike Kravetz
2019-07-10 19:44               ` Michal Hocko
2019-07-10 23:36                 ` Mike Kravetz
2019-07-11  7:12                   ` Michal Hocko
2019-07-12  9:49                     ` Mel Gorman
2019-07-11 15:44 Hillf Danton
2019-07-12  5:47 Hillf Danton
2019-07-13  1:11 ` Mike Kravetz
