linux-kernel.vger.kernel.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: linux-mm@kvack.org
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: hugetlb page migration vs. overcommit
Date: Tue, 28 Nov 2017 11:19:07 +0100
Message-ID: <20171128101907.jtjthykeuefxu7gl@dhcp22.suse.cz>
In-Reply-To: <20171122152832.iayefrlxbugphorp@dhcp22.suse.cz>

On Wed 22-11-17 16:28:32, Michal Hocko wrote:
> Hi,
> is there any reason why we enforce the overcommit limit during hugetlb
> pages migration? It's in alloc_huge_page_node->__alloc_buddy_huge_page
> path. I am wondering whether this is really an intentional behavior.
> The page migration allocates a page just temporarily so we should be
> able to go over the overcommit limit for the migration duration. The
> reason I am asking is that hugetlb pages tend to be utilized usually
> (otherwise the memory would be just wasted and pool shrunk) but then
> the migration simply fails which breaks memory hotplug and other
> migration dependent functionality which is quite suboptimal. You can
> workaround that by increasing the overcommit limit.
> 
> Why don't we simply migrate as long as we are able to allocate the
> target hugetlb page? I have a half baked patch to remove this
> restriction, would there be an opposition to do something like that?

So I finally got to think about this some more and looked more
thoroughly at how we actually account things. And it is, as both of you
expected, quite subtle and not easy to get around. Per-NUMA-node pools
make things quite complicated. Why? Migration can really increase the
overall pool size. Say we are migrating from Node1 to Node2. Node2
doesn't have any pre-allocated pages, but assume that the overcommit
limit allows us to move on. All good. Except that the original page will
return to the pool, because free_huge_page will see Node1 without any
surplus pages and therefore move the page back to the pool. Node2 will
release the surplus page only after it is freed, which can take an
unbounded amount of time.
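
To make the accounting concrete, here is a minimal user-space sketch of
the scenario above. It is a toy model, not kernel code: struct
toy_hstate, toy_alloc_surplus() and toy_free() are made-up names, nodes
are numbered 0 and 1 instead of Node1 and Node2, and only the surplus
check in the free path is modelled on what free_huge_page does.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 2

/* Toy stand-in for the hugetlb pool counters (hypothetical, simplified). */
struct toy_hstate {
	long nr_huge_pages;			/* all huge pages, free or in use */
	long surplus_huge_pages;		/* pages beyond the persistent pool */
	long nr_overcommit;			/* global limit on surplus pages */
	long nr_huge_pages_node[NR_NODES];
	long surplus_huge_pages_node[NR_NODES];
	long free_huge_pages_node[NR_NODES];
};

/* Allocate a fresh surplus page on node nid, e.g. as a migration target. */
bool toy_alloc_surplus(struct toy_hstate *h, int nid)
{
	assert(nid >= 0 && nid < NR_NODES);
	if (h->surplus_huge_pages >= h->nr_overcommit)
		return false;			/* overcommit limit enforced */
	h->nr_huge_pages++;
	h->nr_huge_pages_node[nid]++;
	h->surplus_huge_pages++;
	h->surplus_huge_pages_node[nid]++;
	return true;
}

/* Free an in-use page on node nid; the decision looks only at that node. */
void toy_free(struct toy_hstate *h, int nid)
{
	assert(nid >= 0 && nid < NR_NODES);
	if (h->surplus_huge_pages_node[nid] > 0) {
		/* This node has surplus: hand the page back to the allocator. */
		h->surplus_huge_pages--;
		h->surplus_huge_pages_node[nid]--;
		h->nr_huge_pages--;
		h->nr_huge_pages_node[nid]--;
	} else {
		/* No surplus on this node: the page goes back to the pool. */
		h->free_huge_pages_node[nid]++;
	}
}
```

Starting with one pool page on node 0 and an overcommit limit of 1,
toy_alloc_surplus(&h, 1) followed by toy_free(&h, 0) leaves
nr_huge_pages at 2: one free page on node 0 plus the surplus page on
node 1. The count drops back to 1 only once toy_free(&h, 1) is called.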

While we are still effectively under the overcommit limit, the semantics
are kind of strange and I am not sure the behavior is really intended.
I see why a per-node surplus counter is used here: we simply want to
maintain per-node counts after a regular page free. So I was thinking
of adding a temporary/migrate state to the huge page for migration pages
(it starts on the new page and is transferred to the old page on
success) and freeing such a page to the allocator regardless of the
surplus counters.
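
Roughly, the idea could look like the following user-space sketch. All
names here (struct toy_pool, struct toy_page, the temporary flag and the
toy_* helpers) are hypothetical, and how the temporary page is accounted
while the migration is in flight is an assumption of this sketch, not
something taken from an actual patch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 2

/* Toy per-node counters (hypothetical, simplified). */
struct toy_pool {
	long nr_huge_pages_node[NR_NODES];
	long surplus_huge_pages_node[NR_NODES];
	long free_huge_pages_node[NR_NODES];
};

struct toy_page {
	int nid;
	bool temporary;		/* hypothetical temporary/migrate state */
};

/* Allocate a migration target: marked temporary, never counted as surplus. */
struct toy_page toy_alloc_migrate_target(struct toy_pool *p, int nid)
{
	assert(nid >= 0 && nid < NR_NODES);
	p->nr_huge_pages_node[nid]++;
	return (struct toy_page){ .nid = nid, .temporary = true };
}

/* On successful migration the state moves from the new page to the old one. */
void toy_migrate_success(struct toy_page *old_page, struct toy_page *new_page)
{
	old_page->temporary = new_page->temporary;
	new_page->temporary = false;
}

/* Free a page; temporary pages bypass the surplus counters entirely. */
void toy_free(struct toy_pool *p, struct toy_page *page)
{
	int nid = page->nid;

	if (page->temporary) {
		/* Straight back to the allocator, whatever the counters say. */
		p->nr_huge_pages_node[nid]--;
	} else if (p->surplus_huge_pages_node[nid] > 0) {
		p->surplus_huge_pages_node[nid]--;
		p->nr_huge_pages_node[nid]--;
	} else {
		p->free_huge_pages_node[nid]++;
	}
}
```

With one pool page on node 0 migrated to node 1, the old page is freed
to the allocator instead of the pool, so the total stays at one page and
only its node changes.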

This would mean that page migration might change the per-node pool
sizes, but I guess that should be acceptable. What do you guys think?
I can send a draft patch if that helps you to understand the idea.
-- 
Michal Hocko
SUSE Labs


Thread overview: 23+ messages
2017-11-22 15:28 hugetlb page migration vs. overcommit Michal Hocko
2017-11-22 19:11 ` Mike Kravetz
2017-11-23  9:21   ` Michal Hocko
2017-11-27  6:27   ` Naoya Horiguchi
2017-11-28 10:19 ` Michal Hocko [this message]
2017-11-28 14:12   ` Michal Hocko
2017-11-28 14:12     ` [PATCH RFC 1/2] mm, hugetlb: unify core page allocation accounting and initialization Michal Hocko
2017-11-28 21:34       ` Mike Kravetz
2017-11-29  6:57         ` Michal Hocko
2017-11-29 19:09           ` Mike Kravetz
2017-11-28 14:12     ` [PATCH RFC 2/2] mm, hugetlb: do not rely on overcommit limit during migration Michal Hocko
2017-11-29  1:39       ` Mike Kravetz
2017-11-29  7:17         ` Michal Hocko
2017-11-29  9:22       ` Michal Hocko
2017-11-29  9:40         ` Michal Hocko
2017-11-29 11:23         ` Michal Hocko
2017-11-29 19:52         ` Mike Kravetz
2017-11-30  7:57           ` Michal Hocko
2017-11-30 19:35             ` Mike Kravetz
2017-11-30 19:57               ` Michal Hocko
2017-11-30 20:06                 ` Michal Hocko
2017-11-29  9:51       ` Michal Hocko
2017-11-29 11:33       ` [PATCH RFC v2 " Michal Hocko
