From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Mike Kravetz <mike.kravetz@oracle.com>,
Yang Shi <shy828301@gmail.com>,
Dan Carpenter <dan.carpenter@oracle.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH v7] mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
Date: Fri, 8 Apr 2022 01:56:12 +0000
Message-ID: <20220408015610.GA3061012@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <4b5ad6c3-99a0-b04f-21ad-8ade46984c76@huawei.com>
On Thu, Apr 07, 2022 at 09:38:26PM +0800, Miaohe Lin wrote:
> On 2022/4/7 19:29, Naoya Horiguchi wrote:
...
> > +int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> > +{
> > + struct page *page = pfn_to_page(pfn);
> > + struct page *head = compound_head(page);
> > + int ret = 2; /* fallback to normal page handling */
> > + bool count_increased = false;
> > +
> > + if (!PageHeadHuge(head))
> > + goto out;
> > +
> > + if (flags & MF_COUNT_INCREASED) {
> > + ret = 1;
> > + count_increased = true;
> > + } else if (HPageFreed(head) || HPageMigratable(head)) {
> > + ret = get_page_unless_zero(head);
> > + if (ret)
> > + count_increased = true;
> > + } else {
> > + ret = -EBUSY;
> > + goto out;
> > + }
> > +
> > + if (hwpoison_filter(page)) {
> > + ret = -EOPNOTSUPP;
> > + goto out;
> > + }
>
> Now hwpoison_filter is called without lock_page + unlock_page. Is this OK, or is
> the lock_page + unlock_page pair actually required?
Hmm, we had better call hwpoison_filter under the page lock for hugepages.
I'll move this too, thank you.
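As a rough userspace illustration of that ordering (not the patch itself), the sketch below evaluates a filter with a per-page lock held and clears the poison flag again if the page is filtered out. `struct fake_page`, `fake_lock_page()`, `fake_hwpoison_filter()` and `filter_under_page_lock()` are hypothetical stand-ins for struct page, lock_page()/unlock_page() and hwpoison_filter(), and a C11 `atomic_flag` spinlock is assumed as a substitute for the sleeping page lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the kernel's struct page or APIs. */
#define FAKE_PG_HWPOISON 0x1UL

struct fake_page {
	atomic_flag lock;          /* models lock_page()/unlock_page() */
	atomic_ulong flags;        /* models page->flags */
	unsigned long filter_tag;  /* attribute inspected by the filter */
};

static void fake_lock_page(struct fake_page *p)
{
	/* Spin until the flag was previously clear, i.e. we now own the lock. */
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;
}

static void fake_unlock_page(struct fake_page *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Hypothetical filter: true means "skip this page". */
static bool fake_hwpoison_filter(const struct fake_page *p, unsigned long want)
{
	return p->filter_tag != want;
}

/*
 * Evaluate the filter with the page lock held. In this model, anything
 * that changes filter_tag is assumed to take the same lock, so the value
 * the filter reads is stable. If the page is filtered out, undo the
 * poison flag set earlier and report -EOPNOTSUPP.
 */
static int filter_under_page_lock(struct fake_page *p, unsigned long want)
{
	int ret = 0;

	fake_lock_page(p);
	if (fake_hwpoison_filter(p, want)) {
		atomic_fetch_and(&p->flags, ~FAKE_PG_HWPOISON);
		ret = -EOPNOTSUPP;
	}
	fake_unlock_page(p);
	return ret;
}
```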
> > +
> > + if (TestSetPageHWPoison(head)) {
> > + ret = -EHWPOISON;
> > + goto out;
> > + }
>
> Without this patch, the page refcount is not decremented when MF_COUNT_INCREASED is set
> in flags and PageHWPoison is already set. So I think this patch also fixes that issue. Thanks!
Good point, I hadn't even noticed that. And the issue still seems to exist
for normal pages. Hitting the "already hwpoisoned" case from
madvise_inject_error() may be rare, but it could happen when the first call
failed to contain the error (so the page is still accessible from the calling process).
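The leak comes down to an error path that returns without dropping a reference the caller handed over via MF_COUNT_INCREASED. A minimal userspace model of the single-exit pattern in __get_huge_page_for_hwpoison() above is sketched below; `struct fake_page`, `FAKE_MF_COUNT_INCREASED`, `fake_get_page_unless_zero()` and `fake_pin_and_poison()` are hypothetical, not kernel APIs. Whether the normal-page path would take exactly this shape is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* Linux value; fallback for libcs that lack it */
#endif

/* Hypothetical flag and page model; not the kernel definitions. */
#define FAKE_MF_COUNT_INCREASED 0x1

struct fake_page {
	int refcount;
	bool hwpoison;
};

/* Models get_page_unless_zero(): pin only if the page is in use. */
static bool fake_get_page_unless_zero(struct fake_page *p)
{
	if (p->refcount == 0)
		return false;
	p->refcount++;
	return true;
}

static void fake_put_page(struct fake_page *p)
{
	p->refcount--;
}

/*
 * Returns 1 if a reference is now held, 0 for a free (unreferenced) page,
 * or -EHWPOISON with no extra reference held. The error path leaves
 * through "out", which drops the reference whether it was taken here or
 * handed over by the caller.
 */
static int fake_pin_and_poison(struct fake_page *p, int flags)
{
	bool count_increased;
	int ret;

	if (flags & FAKE_MF_COUNT_INCREASED)
		count_increased = true;  /* caller already holds a reference */
	else
		count_increased = fake_get_page_unless_zero(p); /* false if free */

	if (p->hwpoison) {
		ret = -EHWPOISON;        /* the path that used to leak */
		goto out;
	}
	p->hwpoison = true;
	return count_increased ? 1 : 0;
out:
	if (count_increased)
		fake_put_page(p);
	return ret;
}
```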
>
> > +
> > + return ret;
> > +out:
> > + if (count_increased)
> > + put_page(head);
> > + return ret;
> > +}
> > +
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +/*
> > + * Taking refcount of hugetlb pages needs extra care about race conditions
> > + * with basic operations like hugepage allocation/free/demotion.
> > + * So all necessary prechecks for hwpoison (like pinning, testing/setting
> > + * PageHWPoison, and hwpoison_filter) are done in single hugetlb_lock range.
> > + */
> > +static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb)
> > {
> > - struct page *p = pfn_to_page(pfn);
> > - struct page *head = compound_head(p);
> > int res;
> > + struct page *p = pfn_to_page(pfn);
> > + struct page *head;
> > unsigned long page_flags;
> > + bool retry = true;
> >
> > - if (TestSetPageHWPoison(head)) {
> > - pr_err("Memory failure: %#lx: already hardware poisoned\n",
> > - pfn);
> > - res = -EHWPOISON;
> > - if (flags & MF_ACTION_REQUIRED)
> > + *hugetlb = 1;
> > +retry:
> > + res = get_huge_page_for_hwpoison(pfn, flags);
> > + if (res == 2) { /* fallback to normal page handling */
> > + *hugetlb = 0;
> > + return 0;
> > + } else if (res == -EOPNOTSUPP) {
> > + return res;
> > + } else if (res == -EHWPOISON) {
> > + pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> > + if (flags & MF_ACTION_REQUIRED) {
> > + head = compound_head(p);
> > res = kill_accessing_process(current, page_to_pfn(head), flags);
> > + }
> > + return res;
> > + } else if (res == -EBUSY) {
> > + if (retry) {
> > + retry = false;
> > + goto retry;
> > + }
> > + action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> > return res;
> > }
> >
> > num_poisoned_pages_inc();
> >
> > - if (!(flags & MF_COUNT_INCREASED)) {
> > - res = get_hwpoison_page(p, flags);
> > - if (!res) {
> > - lock_page(head);
> > - if (hwpoison_filter(p)) {
> > - if (TestClearPageHWPoison(head))
> > - num_poisoned_pages_dec();
> > - unlock_page(head);
> > - return -EOPNOTSUPP;
> > - }
> > - unlock_page(head);
> > - res = MF_FAILED;
> > - if (__page_handle_poison(p)) {
> > - page_ref_inc(p);
> > - res = MF_RECOVERED;
> > - }
> > - action_result(pfn, MF_MSG_FREE_HUGE, res);
> > - return res == MF_RECOVERED ? 0 : -EBUSY;
> > - } else if (res < 0) {
> > - action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> > - return -EBUSY;
> > + /*
> > + * Handling free hugepage. The possible race with hugepage allocation
> > + * or demotion can be prevented by PageHWPoison flag.
> > + */
> > + if (res == 0) {
> > + res = MF_FAILED;
> > + if (__page_handle_poison(p)) {
> > + page_ref_inc(p);
> > + res = MF_RECOVERED;
> > }
> > + action_result(pfn, MF_MSG_FREE_HUGE, res);
> > + return res == MF_RECOVERED ? 0 : -EBUSY;
> > }
> >
> > + head = compound_head(p);
> > lock_page(head);
> >
> > /*
>
> IMHO, the code below could be removed now, since we take the refcount under the hugetlb_lock:
>
> /*
> * The page could have changed compound pages due to race window.
> * If this happens just bail out.
> */
> if (!PageHuge(p) || compound_head(p) != head) {
> action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
> res = -EBUSY;
> goto out;
> }
>
> But that might belong in a separate patch.
I'll do this.
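Why the recheck becomes redundant can be modelled roughly as follows, assuming that both the pin and any demotion of the page are serialized by hugetlb_lock, and that demotion only picks free, unpoisoned pages. Everything here (`fake_hugetlb_lock`, `struct fake_hpage`, `fake_pin_under_lock()`, `fake_try_demote()`) is a hypothetical userspace toy, and `order` is a simplified proxy for "the compound page changed".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One global lock standing in for hugetlb_lock (hypothetical model). */
static atomic_flag fake_hugetlb_lock = ATOMIC_FLAG_INIT;

static void fake_hl_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&fake_hugetlb_lock,
						 memory_order_acquire))
		;
}

static void fake_hl_unlock(void)
{
	atomic_flag_clear_explicit(&fake_hugetlb_lock, memory_order_release);
}

struct fake_hpage {
	bool head_huge;  /* models PageHeadHuge() */
	bool hwpoison;   /* models PageHWPoison() */
	int refcount;
	int order;       /* demotion lowers this (simplified) */
};

/* Check and pin in one critical section. Returns true if pinned. */
static bool fake_pin_under_lock(struct fake_hpage *h)
{
	bool pinned = false;

	fake_hl_lock();
	if (h->head_huge && h->refcount > 0) {
		h->refcount++;
		pinned = true;
	}
	fake_hl_unlock();
	return pinned;
}

/* Demotion runs under the same lock and skips pinned or poisoned pages. */
static bool fake_try_demote(struct fake_hpage *h)
{
	bool demoted = false;

	fake_hl_lock();
	if (h->refcount == 0 && !h->hwpoison && h->order > 0) {
		h->order--;
		demoted = true;
	}
	fake_hl_unlock();
	return demoted;
}
```

In this model, once fake_pin_under_lock() succeeds, fake_try_demote() can no longer change the page, so a later "did the compound head change?" check has nothing left to catch.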
Thank you for the review and suggestions,
- Naoya Horiguchi
Thread overview: 6+ messages
2022-04-07 11:29 [PATCH v7] mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb() Naoya Horiguchi
2022-04-07 13:38 ` Miaohe Lin
2022-04-08 1:56 ` HORIGUCHI NAOYA(堀口 直也) [this message]
2022-04-08 3:31 ` Miaohe Lin
2022-04-08 5:07 ` HORIGUCHI NAOYA(堀口 直也)
2022-04-08 6:28 ` Miaohe Lin