From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
"xishi.qiuxishi@alibaba-inc.com" <xishi.qiuxishi@alibaba-inc.com>,
"zy.zhengyi@alibaba-inc.com" <zy.zhengyi@alibaba-inc.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 1/2] mm: fix race on soft-offlining free huge pages
Date: Wed, 18 Jul 2018 01:41:06 +0000
Message-ID: <20180718014106.GC12184@hori1.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20180718005528.GA12184@hori1.linux.bs1.fc.nec.co.jp>

On Wed, Jul 18, 2018 at 12:55:29AM +0000, Horiguchi Naoya(堀口 直也) wrote:
> On Tue, Jul 17, 2018 at 04:27:43PM +0200, Michal Hocko wrote:
> > On Tue 17-07-18 14:32:31, Naoya Horiguchi wrote:
> > > There's a race condition between soft offline and hugetlb_fault which
> > > causes unexpected process killing and/or hugetlb allocation failure.
> > >
> > > The process killing is caused by the following flow:
> > >
> > >   CPU 0                   CPU 1                   CPU 2
> > >
> > >   soft offline
> > >     get_any_page
> > >     // find the hugetlb is free
> > >                           mmap a hugetlb file
> > >                           page fault
> > >                             ...
> > >                               hugetlb_fault
> > >                                 hugetlb_no_page
> > >                                   alloc_huge_page
> > >                                   // succeed
> > >       soft_offline_free_page
> > >       // set hwpoison flag
> > >                                                   mmap the hugetlb file
> > >                                                   page fault
> > >                                                     ...
> > >                                                       hugetlb_fault
> > >                                                         hugetlb_no_page
> > >                                                           find_lock_page
> > >                                                             return VM_FAULT_HWPOISON
> > >                                                     mm_fault_error
> > >                                                       do_sigbus
> > >                                                       // kill the process
> > >
> > >
> > > The hugetlb allocation failure comes from the following flow:
> > >
> > >   CPU 0                           CPU 1
> > >
> > >                                   mmap a hugetlb file
> > >                                   // reserve all free page but don't fault-in
> > >   soft offline
> > >     get_any_page
> > >     // find the hugetlb is free
> > >       soft_offline_free_page
> > >       // set hwpoison flag
> > >         dissolve_free_huge_page
> > >         // fail because all free hugepages are reserved
> > >                                   page fault
> > >                                     ...
> > >                                       hugetlb_fault
> > >                                         hugetlb_no_page
> > >                                           alloc_huge_page
> > >                                             ...
> > >                                               dequeue_huge_page_node_exact
> > >                                               // ignore hwpoisoned hugepage
> > >                                               // and finally fail due to no-mem
> > >
> > > The root cause of this is that the current soft-offline code is written
> > > on the assumption that the PageHWPoison flag should be set first to
> > > avoid accessing the corrupted data. This makes sense for memory_failure()
> > > or hard offline, but not for soft offline, because soft offline is
> > > about corrected (not uncorrected) errors and is safe from data loss.
> > > This patch changes the soft offline semantics so that it sets the
> > > PageHWPoison flag only after containment of the error page completes
> > > successfully.
> >
> > Could you please expand on the workflow here? The code is really
> > hard to grasp. I must be missing something because the thing shouldn't
> > be really complicated. Either the page is in the free pool and you just
> > remove it from the allocator (with hugetlb asking for a new hugetlb page
> > to guarantee reserves) or it is used and you just migrate the content to
> > a new page (again with the hugetlb reserves consideration). Why should
> > the PageHWPoison flag ordering have any relevance?
>
> (Considering soft offlining of a free hugepage:)
> before this patch, PageHWPoison is set first, which races with the
> hugetlb fault code because the flag is not set under hugetlb_lock.
>
> Originally this was written in a similar manner to hard-offline, where
> the race is accepted and the PageHWPoison flag is set as soon as possible.
> But that turned out to be neither necessary nor correct, because soft
> offline is supposed to be less aggressive and failure is OK.
>
> So this patch suggests making soft-offline less aggressive
> by moving SetPageHWPoison under the lock.

My apologies, this part of the reasoning was incorrect. What patch 1/2
actually does is transform the issue into a race similar to the one for
normal pages, which is solved by patch 2/2. After patch 1/2, soft offline
never sets PageHWPoison on a hugepage.

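To illustrate the ordering argument from the patch description (set the
flag only after containment succeeds), here is a minimal userspace sketch
of the allocation-failure flow. It is a toy model under stated assumptions,
not kernel code: every toy_* name is hypothetical, the pool holds a single
free hugepage that is fully reserved, and toy_dissolve() only imitates the
"fail because all free hugepages are reserved" step.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one free hugepage, tracked by two booleans. */
struct toy_hugepage {
	bool hwpoison;	/* stands in for PageHWPoison */
	bool in_pool;	/* still on the free list */
};

struct toy_pool {
	struct toy_hugepage page;
	int free_pages;
	int reserved_pages;
};

/* Imitates dissolving a free hugepage: refused when all free pages are reserved. */
static bool toy_dissolve(struct toy_pool *pool)
{
	if (pool->free_pages - pool->reserved_pages <= 0)
		return false;
	pool->page.in_pool = false;
	pool->free_pages--;
	return true;
}

/* Old ordering: set the flag first, then try to contain the page. */
static bool toy_soft_offline_flag_first(struct toy_pool *pool)
{
	pool->page.hwpoison = true;
	return toy_dissolve(pool);
}

/* New ordering: set the flag only after containment succeeded. */
static bool toy_soft_offline_contain_first(struct toy_pool *pool)
{
	if (!toy_dissolve(pool))
		return false;
	pool->page.hwpoison = true;
	return true;
}

/* Imitates the fault-path dequeue, which skips a poisoned page. */
static bool toy_fault_alloc(struct toy_pool *pool)
{
	if (!pool->page.in_pool || pool->page.hwpoison)
		return false;
	pool->page.in_pool = false;
	pool->free_pages--;
	pool->reserved_pages--;
	return true;
}

/* Runs both orderings against an identical, fully reserved pool. */
static void toy_demo(void)
{
	struct toy_pool old_order = { { false, true }, 1, 1 };
	struct toy_pool new_order = { { false, true }, 1, 1 };

	/* Soft offline fails in both cases: the only free page is reserved. */
	assert(!toy_soft_offline_flag_first(&old_order));
	assert(!toy_soft_offline_contain_first(&new_order));

	/* Only the old ordering leaves a poisoned page behind in the pool. */
	assert(!toy_fault_alloc(&old_order));
	assert(toy_fault_alloc(&new_order));
}
```

In this model the failed soft offline is harmless with the new ordering:
the reserved page is still usable by the later fault, which is the
"failure is OK" behaviour wanted for soft offline.
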
Thanks,
Naoya Horiguchi
>
> >
> > Do I get it right that the only difference between the hard and soft
> > offlining is that hugetlb reserves might break for the former while not
> > for the latter
>
> Correct.
>
> > and that the failed migration kills all owners for the
> > former while not for the latter?
>
> Hard-offline doesn't cause any page migration because the data is already
> lost, but yes, it can kill the owners.
> Soft-offline never kills processes even if it fails (due to migration
> failure or for some other reason).
>
> I listed below some common points and differences between hard-offline
> and soft-offline.
>
> common points
> - both contain the error page with the PageHWPoison flag,
> - errors are injected via similar interfaces.
>
> differences
> - the data on the page is considered lost in hard offline, but not
> in soft offline,
> - hard offline likely kills the affected processes, but soft offline
> never kills processes,
> - soft offline causes page migration, but hard offline does not,
> - hard offline prioritizes preventing consumption of broken data, at
> the cost of accepting some races, while soft offline prioritizes not
> impacting userspace, at the cost of accepting failure.
>
> It looks to me like there are more differences than common points.
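
The differences listed above can also be written down as a small policy
table. This is only a restatement of the list as data, in userspace C;
the enum, the struct and toy_offline_policy() are hypothetical names and
do not correspond to anything in mm/memory-failure.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names: the two kinds of offlining discussed in this thread. */
enum toy_offline_kind { TOY_HARD_OFFLINE, TOY_SOFT_OFFLINE };

/* One field per bullet in the "differences" list. */
struct toy_offline_policy {
	bool data_considered_lost;	/* is the page content treated as lost? */
	bool may_kill_processes;	/* can affected processes be killed? */
	bool migrates_page;		/* is the content migrated to a new page? */
	bool accepts_race;		/* tolerates races to block bad data early */
	bool accepts_failure;		/* tolerates failure to spare userspace */
};

static struct toy_offline_policy toy_offline_policy(enum toy_offline_kind kind)
{
	struct toy_offline_policy p = { false, false, false, false, false };

	if (kind == TOY_HARD_OFFLINE) {
		p.data_considered_lost = true;
		p.may_kill_processes = true;
		p.accepts_race = true;
	} else {
		p.migrates_page = true;
		p.accepts_failure = true;
	}
	return p;
}

/* Checks that the table matches the list: the two kinds differ on every row. */
static void toy_policy_selfcheck(void)
{
	struct toy_offline_policy hard = toy_offline_policy(TOY_HARD_OFFLINE);
	struct toy_offline_policy soft = toy_offline_policy(TOY_SOFT_OFFLINE);

	assert(hard.data_considered_lost != soft.data_considered_lost);
	assert(hard.may_kill_processes != soft.may_kill_processes);
	assert(hard.migrates_page != soft.migrates_page);
	assert(hard.accepts_race != soft.accepts_race);
	assert(hard.accepts_failure != soft.accepts_failure);
}
```
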