From: David Rientjes <rientjes@google.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Eiichi Tsukata <eiichi.tsukata@nutanix.com>,
corbet@lwn.net, mike.kravetz@oracle.com, mcgrof@kernel.org,
keescook@chromium.org, yzaikin@google.com,
akpm@linux-foundation.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org, felipe.franciosi@nutanix.com
Subject: Re: [RFC PATCH] mm, oom: introduce vm.sacrifice_hugepage_on_oom
Date: Tue, 16 Feb 2021 13:53:12 -0800 (PST)
Message-ID: <b821f4de-3260-f6e2-469f-65ccfa699bb7@google.com>
In-Reply-To: <YCt+cVvWPbWvt2rG@dhcp22.suse.cz>
On Tue, 16 Feb 2021, Michal Hocko wrote:
> > Hugepages can be preallocated to avoid unpredictable allocation latency.
> > If we run into a 4k page shortage, the kernel can trigger OOM even though
> > there are free hugepages. When OOM is triggered by the user address page
> > fault handler, we can use an oom notifier to free hugepages in user space,
> > but if it is triggered by a kernel memory allocation, there is no way
> > to synchronously handle it in user space.
>
> Can you expand some more on what kind of problem you see?
> Hugetlb pages are, by definition, a preallocated, unreclaimable and
> admin controlled pool of pages.
Small nit: true of non-surplus hugetlb pages.
> Under those conditions it is expected
> and required that the sizing be done very carefully. Why is that a
> problem in your particular setup/scenario?
>
> If the sizing is really done properly and then a random process can
> trigger OOM then this can lead to malfunctioning of those workloads
> which do depend on hugetlb pool, right? So isn't this a kinda DoS
> scenario?
>
> > This patch introduces a new sysctl, vm.sacrifice_hugepage_on_oom. If
> > enabled, it first tries to free a hugepage, if one is available, before
> > invoking the oom-killer. The default is disabled, so as not to change
> > the current behavior.
>
> Why is this interface not hugepage size aware? Releasing a 1GB huge page
> is quite different from releasing a 2MB one. Or is it expected to release
> the smallest one? To the implementation...
>
> [...]
> > +static int sacrifice_hugepage(void)
> > +{
> > +	int ret;
> > +
> > +	spin_lock(&hugetlb_lock);
> > +	ret = free_pool_huge_page(&default_hstate, &node_states[N_MEMORY], 0);
>
> ... no it is going to release the default huge page. This will be 2MB in
> most cases but this is not given.
>
> Unless I am mistaken, this will also free up reserved hugetlb pages. That
> would mean a page fault could SIGBUS, which is very likely not something
> we want, right? You also want to use the oom nodemask rather than a full
> one.
>
> Overall, I am not really happy about this feature even when above is
> fixed, but let's hear more the actual problem first.
Shouldn't this behavior be possible as an oomd plugin instead, perhaps
triggered by psi? I'm not sure if oomd is intended only to kill something
(oomkilld? lol) or if it can be made to do sysadmin-level behavior, such
as shrinking the hugetlb pool, to solve the oom condition.
If so, it seems like we want to do this at the absolute last minute. In
other words, reclaim has failed to free memory by other means, so we would
like to shrink the hugetlb pool. (That's why it's implemented as a
predecessor to oom rather than as part of reclaim in general.)
Do we have the ability to suppress the oom killer until oomd has a chance
to react in this scenario?
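
As a rough illustration of that userspace path (this is not oomd's actual
plugin API; the psi threshold and the shrink-by-one policy are arbitrary
placeholders, and it needs root):

	#include <fcntl.h>
	#include <poll.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		/* Wake up when full memory stalls exceed 100ms in a 1s window. */
		const char trig[] = "full 100000 1000000";
		struct pollfd pfd = { .events = POLLPRI };
		char buf[32];
		int fd, n;

		pfd.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
		if (pfd.fd < 0 || write(pfd.fd, trig, strlen(trig) + 1) < 0)
			return 1;

		while (poll(&pfd, 1, -1) > 0) {
			if (!(pfd.revents & POLLPRI))
				continue;
			/* Shrink the hugetlb pool by one default-sized page. */
			fd = open("/proc/sys/vm/nr_hugepages", O_RDWR);
			if (fd < 0)
				continue;
			n = read(fd, buf, sizeof(buf) - 1);
			if (n > 0) {
				buf[n] = '\0';
				n = atoi(buf);
				if (n > 0) {
					n = snprintf(buf, sizeof(buf), "%d", n - 1);
					lseek(fd, 0, SEEK_SET);
					write(fd, buf, n);
				}
			}
			close(fd);
		}
		return 0;
	}

But nothing above guarantees the reaction wins the race with
out_of_memory(), which is why the suppression question matters.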