All of lore.kernel.org
 help / color / mirror / Atom feed
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Yang Shi <shy828301@gmail.com>,
	Dan Carpenter <dan.carpenter@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH v7] mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
Date: Fri, 8 Apr 2022 01:56:12 +0000	[thread overview]
Message-ID: <20220408015610.GA3061012@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <4b5ad6c3-99a0-b04f-21ad-8ade46984c76@huawei.com>

On Thu, Apr 07, 2022 at 09:38:26PM +0800, Miaohe Lin wrote:
> On 2022/4/7 19:29, Naoya Horiguchi wrote:
...
> > +int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> > +{
> > +	struct page *page = pfn_to_page(pfn);
> > +	struct page *head = compound_head(page);
> > +	int ret = 2;	/* fallback to normal page handling */
> > +	bool count_increased = false;
> > +
> > +	if (!PageHeadHuge(head))
> > +		goto out;
> > +
> > +	if (flags & MF_COUNT_INCREASED) {
> > +		ret = 1;
> > +		count_increased = true;
> > +	} else if (HPageFreed(head) || HPageMigratable(head)) {
> > +		ret = get_page_unless_zero(head);
> > +		if (ret)
> > +			count_increased = true;
> > +	} else {
> > +		ret = -EBUSY;
> > +		goto out;
> > +	}
> > +
> > +	if (hwpoison_filter(page)) {
> > +		ret = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> 
> Now hwpoison_filter is done without lock_page + unlock_page. Is this ok or
> lock_page + unlock_page pair is indeed required?

Hmm, we had better call hwpoison_filter in page lock for hugepages.
I'll move this too, thank you.

> > +
> > +	if (TestSetPageHWPoison(head)) {
> > +		ret = -EHWPOISON;
> > +		goto out;
> > +	}
> 
> Without this patch, page refcnt is not decremented if MF_COUNT_INCREASED is set in flags
> when PageHWPoison is already set. So I think this patch also fixes that issue. Thanks!

Good point, I even didn't notice that. And the issue still seems to exist
for normal page's cases.  Maybe encountering "already hwpoisoned" case from
madvise_inject_error() is rare but could happen when the first call failed
to contain the error (which is still accessible from the calling process).

> 
> > +
> > +	return ret;
> > +out:
> > +	if (count_increased)
> > +		put_page(head);
> > +	return ret;
> > +}
> > +
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +/*
> > + * Taking refcount of hugetlb pages needs extra care about race conditions
> > + * with basic operations like hugepage allocation/free/demotion.
> > + * So all necessary prechecks for hwpoison (like pinning, testing/setting
> > + * PageHWPoison, and hwpoison_filter) are done in single hugetlb_lock range.
> > + */
> > +static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb)
> >  {
> > -	struct page *p = pfn_to_page(pfn);
> > -	struct page *head = compound_head(p);
> >  	int res;
> > +	struct page *p = pfn_to_page(pfn);
> > +	struct page *head;
> >  	unsigned long page_flags;
> > +	bool retry = true;
> >  
> > -	if (TestSetPageHWPoison(head)) {
> > -		pr_err("Memory failure: %#lx: already hardware poisoned\n",
> > -		       pfn);
> > -		res = -EHWPOISON;
> > -		if (flags & MF_ACTION_REQUIRED)
> > +	*hugetlb = 1;
> > +retry:
> > +	res = get_huge_page_for_hwpoison(pfn, flags);
> > +	if (res == 2) { /* fallback to normal page handling */
> > +		*hugetlb = 0;
> > +		return 0;
> > +	} else if (res == -EOPNOTSUPP) {
> > +		return res;
> > +	} else if (res == -EHWPOISON) {
> > +		pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> > +		if (flags & MF_ACTION_REQUIRED) {
> > +			head = compound_head(p);
> >  			res = kill_accessing_process(current, page_to_pfn(head), flags);
> > +		}
> > +		return res;
> > +	} else if (res == -EBUSY) {
> > +		if (retry) {
> > +			retry = false;
> > +			goto retry;
> > +		}
> > +		action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> >  		return res;
> >  	}
> >  
> >  	num_poisoned_pages_inc();
> >  
> > -	if (!(flags & MF_COUNT_INCREASED)) {
> > -		res = get_hwpoison_page(p, flags);
> > -		if (!res) {
> > -			lock_page(head);
> > -			if (hwpoison_filter(p)) {
> > -				if (TestClearPageHWPoison(head))
> > -					num_poisoned_pages_dec();
> > -				unlock_page(head);
> > -				return -EOPNOTSUPP;
> > -			}
> > -			unlock_page(head);
> > -			res = MF_FAILED;
> > -			if (__page_handle_poison(p)) {
> > -				page_ref_inc(p);
> > -				res = MF_RECOVERED;
> > -			}
> > -			action_result(pfn, MF_MSG_FREE_HUGE, res);
> > -			return res == MF_RECOVERED ? 0 : -EBUSY;
> > -		} else if (res < 0) {
> > -			action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> > -			return -EBUSY;
> > +	/*
> > +	 * Handling free hugepage.  The possible race with hugepage allocation
> > +	 * or demotion can be prevented by PageHWPoison flag.
> > +	 */
> > +	if (res == 0) {
> > +		res = MF_FAILED;
> > +		if (__page_handle_poison(p)) {
> > +			page_ref_inc(p);
> > +			res = MF_RECOVERED;
> >  		}
> > +		action_result(pfn, MF_MSG_FREE_HUGE, res);
> > +		return res == MF_RECOVERED ? 0 : -EBUSY;
> >  	}
> >  
> > +	head = compound_head(p);
> >  	lock_page(head);
> >  
> >  	/*
> 
> IMHO, the below code could be removed now as we fetch the refcnt under the hugetlb_lock:
> 
> 	/*
> 	 * The page could have changed compound pages due to race window.
> 	 * If this happens just bail out.
> 	 */
> 	if (!PageHuge(p) || compound_head(p) != head) {
> 		action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
> 		res = -EBUSY;
> 		goto out;
> 	}
> 
> But this might be another patch.

I'll do this.

Thank you for the review and suggestions,

- Naoya Horiguchi

  reply	other threads:[~2022-04-08  1:56 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-07 11:29 [PATCH v7] mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb() Naoya Horiguchi
2022-04-07 13:38 ` Miaohe Lin
2022-04-08  1:56   ` HORIGUCHI NAOYA(堀口 直也) [this message]
2022-04-08  3:31     ` Miaohe Lin
2022-04-08  5:07       ` HORIGUCHI NAOYA(堀口 直也)
2022-04-08  6:28         ` Miaohe Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220408015610.GA3061012@hori.linux.bs1.fc.nec.co.jp \
    --to=naoya.horiguchi@nec.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.carpenter@oracle.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=shy828301@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.