Re: [PATCH v7] mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()

From: "HORIGUCHI NAOYA(堀口　直也)" <naoya.horiguchi@nec.com>
To: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Yang Shi <shy828301@gmail.com>,
	Dan Carpenter <dan.carpenter@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH v7] mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
Date: Fri, 8 Apr 2022 01:56:12 +0000
Message-ID: <20220408015610.GA3061012@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <4b5ad6c3-99a0-b04f-21ad-8ade46984c76@huawei.com>

On Thu, Apr 07, 2022 at 09:38:26PM +0800, Miaohe Lin wrote:
> On 2022/4/7 19:29, Naoya Horiguchi wrote:
...
> > +int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> > +{
> > +	struct page *page = pfn_to_page(pfn);
> > +	struct page *head = compound_head(page);
> > +	int ret = 2;	/* fallback to normal page handling */
> > +	bool count_increased = false;
> > +
> > +	if (!PageHeadHuge(head))
> > +		goto out;
> > +
> > +	if (flags & MF_COUNT_INCREASED) {
> > +		ret = 1;
> > +		count_increased = true;
> > +	} else if (HPageFreed(head) || HPageMigratable(head)) {
> > +		ret = get_page_unless_zero(head);
> > +		if (ret)
> > +			count_increased = true;
> > +	} else {
> > +		ret = -EBUSY;
> > +		goto out;
> > +	}
> > +
> > +	if (hwpoison_filter(page)) {
> > +		ret = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> 
> Now hwpoison_filter() is called without the lock_page() + unlock_page() pair.
> Is this OK, or is the lock_page() + unlock_page() pair indeed required?

Hmm, we had better call hwpoison_filter() under the page lock for hugepages.
I'll move this too, thank you.
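
As a rough illustration of what moving the filter under the page lock
implies, here is a minimal userspace model (not kernel code, and not
necessarily the shape the next version will take). It assumes the filter
runs after the HWPoison flag has been set, so a filtered page has to undo
the flag and the poisoned-page counter, as the removed lines with
TestClearPageHWPoison() and num_poisoned_pages_dec() did. All fake_* names
are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; these names are hypothetical, not kernel APIs. */
struct fake_hpage {
	bool locked;	/* stands in for the page lock */
	bool hwpoison;	/* stands in for the HWPoison flag */
	bool filtered;	/* stands in for whatever hwpoison_filter() decides */
};

#define FAKE_EOPNOTSUPP 95

static int fake_num_poisoned;	/* models the poisoned-page counter */

static bool fake_filter(const struct fake_hpage *pg)
{
	assert(pg->locked);	/* the invariant under discussion */
	return pg->filtered;
}

/* Models the steps that follow once the flag has already been set. */
static int fake_filter_under_lock(struct fake_hpage *pg)
{
	int ret = 0;

	pg->locked = true;		/* lock_page() stand-in */
	if (fake_filter(pg)) {
		/* Undo the earlier poisoning before bailing out. */
		if (pg->hwpoison) {
			pg->hwpoison = false;
			fake_num_poisoned--;
		}
		ret = -FAKE_EOPNOTSUPP;
	}
	pg->locked = false;		/* unlock_page() stand-in */
	return ret;
}
```

The assert in fake_filter() is only there to make the locking requirement
explicit in the model.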

> > +
> > +	if (TestSetPageHWPoison(head)) {
> > +		ret = -EHWPOISON;
> > +		goto out;
> > +	}
> 
> Without this patch, page refcnt is not decremented if MF_COUNT_INCREASED is set in flags
> when PageHWPoison is already set. So I think this patch also fixes that issue. Thanks!

Good point, I hadn't even noticed that. The issue still seems to exist
in the normal page case.  Hitting the "already hwpoisoned" case from
madvise_inject_error() may be rare, but it could happen when the first call
failed to contain the error (so the page is still accessible from the
calling process).
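
To make the refcount imbalance concrete, here is a minimal userspace model
(not kernel code). It assumes the injector pins the page and passes a
"count increased" flag, and it compares a handler that returns early on an
already-poisoned page with one that drops the caller's reference first. All
fake_* names and the FAKE_* constants are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; these names are hypothetical, not kernel APIs. */
struct fake_page {
	int refcount;
	bool hwpoison;
};

#define FAKE_MF_COUNT_INCREASED 0x1
#define FAKE_EHWPOISON 133

static void fake_put_page(struct fake_page *pg)
{
	assert(pg->refcount > 0);
	pg->refcount--;
}

/* Sets the flag and returns true if it was already set (test-and-set). */
static bool fake_test_set_hwpoison(struct fake_page *pg)
{
	bool was_set = pg->hwpoison;

	pg->hwpoison = true;
	return was_set;
}

/* Models a handler that returns early without dropping the caller's ref. */
static int fake_mf_leaky(struct fake_page *pg, int flags)
{
	(void)flags;
	if (fake_test_set_hwpoison(pg))
		return -FAKE_EHWPOISON;
	return 0;
}

/* Models a handler that drops the caller's ref on the early return. */
static int fake_mf_balanced(struct fake_page *pg, int flags)
{
	if (fake_test_set_hwpoison(pg)) {
		if (flags & FAKE_MF_COUNT_INCREASED)
			fake_put_page(pg);
		return -FAKE_EHWPOISON;
	}
	return 0;
}

/* Calls the handler on an already-poisoned page; returns leaked refs. */
static int fake_leaked_refs(int (*mf)(struct fake_page *, int))
{
	struct fake_page pg = { .refcount = 1, .hwpoison = true };
	int before = pg.refcount;

	pg.refcount++;	/* the injector pins the page before the call */
	mf(&pg, FAKE_MF_COUNT_INCREASED);
	return pg.refcount - before;
}
```

In this model, fake_leaked_refs(fake_mf_leaky) leaves one extra reference
behind, while fake_leaked_refs(fake_mf_balanced) leaves none.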

> 
> > +
> > +	return ret;
> > +out:
> > +	if (count_increased)
> > +		put_page(head);
> > +	return ret;
> > +}
> > +
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +/*
> > + * Taking refcount of hugetlb pages needs extra care about race conditions
> > + * with basic operations like hugepage allocation/free/demotion.
> > + * So all necessary prechecks for hwpoison (like pinning, testing/setting
> > + * PageHWPoison, and hwpoison_filter) are done in single hugetlb_lock range.
> > + */
> > +static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb)
> >  {
> > -	struct page *p = pfn_to_page(pfn);
> > -	struct page *head = compound_head(p);
> >  	int res;
> > +	struct page *p = pfn_to_page(pfn);
> > +	struct page *head;
> >  	unsigned long page_flags;
> > +	bool retry = true;
> >  
> > -	if (TestSetPageHWPoison(head)) {
> > -		pr_err("Memory failure: %#lx: already hardware poisoned\n",
> > -		       pfn);
> > -		res = -EHWPOISON;
> > -		if (flags & MF_ACTION_REQUIRED)
> > +	*hugetlb = 1;
> > +retry:
> > +	res = get_huge_page_for_hwpoison(pfn, flags);
> > +	if (res == 2) { /* fallback to normal page handling */
> > +		*hugetlb = 0;
> > +		return 0;
> > +	} else if (res == -EOPNOTSUPP) {
> > +		return res;
> > +	} else if (res == -EHWPOISON) {
> > +		pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> > +		if (flags & MF_ACTION_REQUIRED) {
> > +			head = compound_head(p);
> >  			res = kill_accessing_process(current, page_to_pfn(head), flags);
> > +		}
> > +		return res;
> > +	} else if (res == -EBUSY) {
> > +		if (retry) {
> > +			retry = false;
> > +			goto retry;
> > +		}
> > +		action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> >  		return res;
> >  	}
> >  
> >  	num_poisoned_pages_inc();
> >  
> > -	if (!(flags & MF_COUNT_INCREASED)) {
> > -		res = get_hwpoison_page(p, flags);
> > -		if (!res) {
> > -			lock_page(head);
> > -			if (hwpoison_filter(p)) {
> > -				if (TestClearPageHWPoison(head))
> > -					num_poisoned_pages_dec();
> > -				unlock_page(head);
> > -				return -EOPNOTSUPP;
> > -			}
> > -			unlock_page(head);
> > -			res = MF_FAILED;
> > -			if (__page_handle_poison(p)) {
> > -				page_ref_inc(p);
> > -				res = MF_RECOVERED;
> > -			}
> > -			action_result(pfn, MF_MSG_FREE_HUGE, res);
> > -			return res == MF_RECOVERED ? 0 : -EBUSY;
> > -		} else if (res < 0) {
> > -			action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> > -			return -EBUSY;
> > +	/*
> > +	 * Handling free hugepage.  The possible race with hugepage allocation
> > +	 * or demotion can be prevented by PageHWPoison flag.
> > +	 */
> > +	if (res == 0) {
> > +		res = MF_FAILED;
> > +		if (__page_handle_poison(p)) {
> > +			page_ref_inc(p);
> > +			res = MF_RECOVERED;
> >  		}
> > +		action_result(pfn, MF_MSG_FREE_HUGE, res);
> > +		return res == MF_RECOVERED ? 0 : -EBUSY;
> >  	}
> >  
> > +	head = compound_head(p);
> >  	lock_page(head);
> >  
> >  	/*
> 
> IMHO, the below code could be removed now as we fetch the refcnt under the hugetlb_lock:
> 
> 	/*
> 	 * The page could have changed compound pages due to race window.
> 	 * If this happens just bail out.
> 	 */
> 	if (!PageHuge(p) || compound_head(p) != head) {
> 		action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
> 		res = -EBUSY;
> 		goto out;
> 	}
> 
> But this might be another patch.

I'll do this.

Thank you for the review and suggestions,

- Naoya Horiguchi