From: Mike Kravetz <mike.kravetz@oracle.com>
To: "Miaohe Lin" <linmiaohe@huawei.com>,
	"HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Yang Shi <shy828301@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH v2] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
Date: Tue, 15 Mar 2022 17:33:43 -0700
Message-ID: <68c2b93d-b6a5-5b17-cfb1-722b2d4412b5@oracle.com>
In-Reply-To: <1770029b-fd59-4eb1-c891-5a2ba4beef9c@huawei.com>

On 3/15/22 07:00, Miaohe Lin wrote:
> On 2022/3/15 13:49, HORIGUCHI NAOYA(堀口 直也) wrote:
>> On Mon, Mar 14, 2022 at 03:10:25PM +0800, Miaohe Lin wrote:
>>> On 2022/3/14 10:13, Naoya Horiguchi wrote:
>>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>>
>>>> There is a race condition between memory_failure_hugetlb() and hugetlb
>>>> free/demotion, which causes the PageHWPoison flag to be set on the wrong
>>>> page (one that was a hugetlb page when memory_failure() was called, but
>>>> had been removed or demoted by the time memory_failure_hugetlb() ran).
>>>> This results in killing the wrong processes.  So set PageHWPoison flag with holding page lock,
>>>
>>> It seems that holding the page lock cannot resolve this race, since hugetlb
>>> page demotion is not required to hold the page lock. Could you please explain
>>> this a bit more?
>>
>> Sorry, the last line in the paragraph needs to change. What prevents the
>> current race is hugetlb_lock, not the page lock.  The page lock is there to
>> prevent the race with hugepage allocation (not directly related to the
>> current issue, but still necessary).
> 
> Many thanks for clarifying this.
> 
>>
>>>
>>> BTW: Are some words missing here, or should it be 'page lock.' instead of 'page lock,'?
>>
>> I should use a period here, I'll fix it.
>>
>> [...]
>>
>>>> @@ -1503,24 +1502,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>>>  	int res;
>>>>  	unsigned long page_flags;
>>>>  
>>>> -	if (TestSetPageHWPoison(head)) {
>>>> -		pr_err("Memory failure: %#lx: already hardware poisoned\n",
>>>> -		       pfn);
>>>> -		res = -EHWPOISON;
>>>> -		if (flags & MF_ACTION_REQUIRED)
>>>> -			res = kill_accessing_process(current, page_to_pfn(head), flags);
>>>> -		return res;
>>>> -	}
>>>> -
>>>> -	num_poisoned_pages_inc();
>>>> -
>>>>  	if (!(flags & MF_COUNT_INCREASED)) {
>>>>  		res = get_hwpoison_page(p, flags);
>>>>  		if (!res) {
>>>
>>> In this (res == 0) case, the hugetlb page could be dissolved via
>>> __page_handle_poison(). But since PageHWPoison is not set yet, the flag
>>> cannot be moved to the correct page. Consider the code below in
>>> dissolve_free_huge_page():
>>> 	/*
>>> 	 * Move PageHWPoison flag from head page to the raw
>>> 	 * error page, which makes any subpages rather than
>>> 	 * the error page reusable.
>>> 	 */
>>> 	if (PageHWPoison(head) && page != head) {
>>> 		SetPageHWPoison(page);
>>> 		ClearPageHWPoison(head);
>>> 	}
>>>
>>> SetPageHWPoison won't be called for the error page. Or am I missing something?
>>
>> No, you're right.  We need to call page_handle_poison() instead of
>> __page_handle_poison().
>>
>> @@ -1512,7 +1512,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>  			}
>>  			unlock_page(head);
>>  			res = MF_FAILED;
>> -			if (__page_handle_poison(p)) {
>> +			if (page_handle_poison(p, true, false)) {
>>  				page_ref_inc(p);
>>  				res = MF_RECOVERED;
>>  			}
>>
> 
> This one looks good to me.

I must be missing something.  It seems page_handle_poison() calls
__page_handle_poison() and thus dissolve_free_huge_page() before
SetPageHWPoison().

I could easily be missing some patches, but that is the order of calls
in the code I am looking at.
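
To make the ordering question concrete, here is a small userspace toy model.
It is not kernel code: struct toy_page, struct toy_hugepage, toy_dissolve()
and the order_*() helpers are hypothetical names invented for illustration,
and the model ignores locking, refcounts and the buddy allocator entirely.
It only mimics the flag-moving step quoted above from
dissolve_free_huge_page(), and it assumes that the SetPageHWPoison() call in
page_handle_poison() is applied to the page that was passed in (the raw
error page).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SUBPAGES 4

/* Toy stand-in for struct page: only the one flag under discussion. */
struct toy_page {
	bool hwpoison;
};

/* Toy huge page: pages[0] plays the role of the head page. */
struct toy_hugepage {
	struct toy_page pages[NR_SUBPAGES];
};

/*
 * Models only the flag-moving step of dissolving a free huge page:
 * if the head is marked, move the mark to the raw error page.
 */
static void toy_dissolve(struct toy_hugepage *hp, int raw)
{
	struct toy_page *head = &hp->pages[0];
	struct toy_page *page = &hp->pages[raw];

	if (head->hwpoison && page != head) {
		page->hwpoison = true;
		head->hwpoison = false;
	}
}

/* Index of the single marked subpage, -1 if none, -2 if more than one. */
static int toy_poisoned_index(const struct toy_hugepage *hp)
{
	int found = -1;

	for (int i = 0; i < NR_SUBPAGES; i++) {
		if (!hp->pages[i].hwpoison)
			continue;
		if (found >= 0)
			return -2;
		found = i;
	}
	return found;
}

/* Mark the head first, then dissolve (the order before the patch). */
static int order_flag_then_dissolve(int raw)
{
	struct toy_hugepage hp = {0};

	assert(raw > 0 && raw < NR_SUBPAGES);
	hp.pages[0].hwpoison = true;
	toy_dissolve(&hp, raw);
	return toy_poisoned_index(&hp);
}

/* Dissolve with no flag set and never set it afterwards. */
static int order_dissolve_only(int raw)
{
	struct toy_hugepage hp = {0};

	assert(raw > 0 && raw < NR_SUBPAGES);
	toy_dissolve(&hp, raw);
	return toy_poisoned_index(&hp);
}

/* Dissolve first, then mark the raw error page directly. */
static int order_dissolve_then_flag(int raw)
{
	struct toy_hugepage hp = {0};

	assert(raw > 0 && raw < NR_SUBPAGES);
	toy_dissolve(&hp, raw);
	hp.pages[raw].hwpoison = true;
	return toy_poisoned_index(&hp);
}
```

In this model, marking the head before dissolving and marking the raw page
after dissolving both leave exactly the raw error page flagged, while
dissolving without any later marking leaves no page flagged.  Whether the
real code behaves like the third helper depends on which page the later
SetPageHWPoison() is applied to, which is the assumption stated above.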
-- 
Mike Kravetz

Thread overview: 7+ messages
2022-03-14  2:13 [PATCH v2] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb() Naoya Horiguchi
2022-03-14  7:10 ` Miaohe Lin
2022-03-14 18:41   ` Mike Kravetz
2022-03-15  5:49   ` HORIGUCHI NAOYA(堀口 直也)
2022-03-15 14:00     ` Miaohe Lin
2022-03-16  0:33       ` Mike Kravetz [this message]
2022-03-16  1:00         ` HORIGUCHI NAOYA(堀口 直也)
