linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: zhenwei pi <pizhenwei@bytedance.com>
To: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: akpm@linux-foundation.org, naoya.horiguchi@nec.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Wu Fengguang <fengguang.wu@intel.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	"virtualization@lists.linux-foundation.org" 
	<virtualization@lists.linux-foundation.org>
Subject: Re: Re: [PATCH 3/4] mm/memofy-failure.c: optimize hwpoison_filter
Date: Fri, 6 May 2022 21:38:29 +0800	[thread overview]
Message-ID: <3c0e25fb-695d-4a29-6de4-c892f89cea7a@bytedance.com> (raw)
In-Reply-To: <20220506085920.GC1356094@u2004>



On 5/6/22 16:59, Naoya Horiguchi wrote:
> On Fri, Apr 29, 2022 at 10:22:05PM +0800, zhenwei pi wrote:
>> In the memory failure procedure, hwpoison_filter has higher priority,
>> if memory_filter() filters the error event, there is no need to do
>> the further work.
> 
> Could you clarify what problem you are trying to solve (what does
> "optimize" mean in this context or what is the benefit)?
> 

OK. The background of this work:
As well known, the memory failure mechanism handles memory corrupted 
event, and try to send SIGBUS to the user process which uses this 
corrupted page.

For the virtualization case, QEMU catches SIGBUS and tries to inject MCE 
into the guest, and the guest handles memory failure again. Thus the 
guest gets the minimal effect from hardware memory corruption.

The further step I'm working on:
1, try to modify code to decrease poisoned pages in a single place 
(mm/memofy-failure.c: simplify num_poisoned_pages_dec in this series).

2, try to use page_handle_poison() to handle SetPageHWPoison() and 
num_poisoned_pages_inc() together. It would be best to call 
num_poisoned_pages_inc() in a single place too. I'm not sure if this is 
possible or not, please correct me if I misunderstand.

3, introduce memory failure notifier list in memory-failure.c: notify 
the corrupted PFN to someone who registers this list.
If I can complete [1] and [2] part, [3] will be quite easy(just call 
notifier list after increasing poisoned page).

4, introduce memory recover VQ for memory balloon device, and registers 
memory failure notifier list. During the guest kernel handles memory 
failure, balloon device gets notified by memory failure notifier list, 
and tells the host to recover the corrupted PFN(GPA) by the new VQ.

5, host side remaps the corrupted page(HVA), and tells the guest side to 
unpoison the PFN(GPA). Then the guest fixes the corrupted page(GPA) 
dynamically.

Because [4] and [5] are related to balloon device, also CC Michael, 
David and Jason.

> Now hwpoison_filter() can be called both with *and* without taking page refcount.
> It's mainly called *with* taking page refcount in order to make sure that the
> actual handling process is executed only for pages that meet a given condition.
> IOW, it's important to prevent pages which do not meet the condition from going
> ahead to further steps (false-positive is not permitted).  So this type of
> callsite should not be omittable.
> 
> As for the other case, hwpoison_filter() is also called in hwpoison_inject()
> *without* taking page refcount.  This actually has a different nuance and
> intended to speculatively filter the injection events before setting
> PageHWPoison flag to reduce the noise due to setting PG_hwpoison temporary.
> The point is that it's not intended here to filter precisely and this callsite
> is omittable.
> 
> So in my understanding, we need keep hwpoison_filter() after taking page
> refcount as we do now.  Maybe optionally and additionally calling
> hwpoison_filter() at the beginning of memory_failure() might be possible,
> but I'm not sure yet how helpful...
> 
> Thanks,
> Naoya Horiguchi
> 
>>
>> Cc: Wu Fengguang <fengguang.wu@intel.com>
>> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
>> ---
>>   mm/memory-failure.c | 14 +++++---------
>>   1 file changed, 5 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index ece05858568f..a6a27c8b800f 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1800,6 +1800,11 @@ int memory_failure(unsigned long pfn, int flags)
>>   		goto unlock_mutex;
>>   	}
>>   
>> +	if (hwpoison_filter(p)) {
>> +		res = -EOPNOTSUPP;
>> +		goto unlock_mutex;
>> +	}
>> +
>>   try_again:
>>   	res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
>>   	if (hugetlb)
>> @@ -1937,15 +1942,6 @@ int memory_failure(unsigned long pfn, int flags)
>>   	 */
>>   	page_flags = p->flags;
>>   
>> -	if (hwpoison_filter(p)) {
>> -		if (TestClearPageHWPoison(p))
>> -			num_poisoned_pages_dec();
>> -		unlock_page(p);
>> -		put_page(p);
>> -		res = -EOPNOTSUPP;
>> -		goto unlock_mutex;
>> -	}
>> -
>>   	/*
>>   	 * __munlock_pagevec may clear a writeback page's LRU flag without
>>   	 * page_lock. We need wait writeback completion for this page or it
>> -- 
>> 2.20.1
>>

-- 
zhenwei pi

  reply	other threads:[~2022-05-06 13:42 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-29 14:22 [PATCH 0/4] memory-failure: fix hwpoison_filter zhenwei pi
2022-04-29 14:22 ` [PATCH 1/4] mm/memory-failure.c: move clear_hwpoisoned_pages zhenwei pi
2022-05-06  8:55   ` Naoya Horiguchi
2022-04-29 14:22 ` [PATCH 2/4] mm/memofy-failure.c:: simplify num_poisoned_pages_dec zhenwei pi
2022-05-06  8:55   ` Naoya Horiguchi
2022-04-29 14:22 ` [PATCH 3/4] mm/memofy-failure.c: optimize hwpoison_filter zhenwei pi
2022-05-06  8:59   ` Naoya Horiguchi
2022-05-06 13:38     ` zhenwei pi [this message]
2022-05-06 16:28       ` David Hildenbrand
2022-05-07  0:28         ` zhenwei pi
2022-05-07  8:20           ` Naoya Horiguchi
2022-05-07  9:19             ` zhenwei pi
2022-04-29 14:22 ` [PATCH 4/4] mm/memofy-failure.c: add hwpoison_filter for soft offline zhenwei pi
2022-05-06  8:59   ` Naoya Horiguchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3c0e25fb-695d-4a29-6de4-c892f89cea7a@bytedance.com \
    --to=pizhenwei@bytedance.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=fengguang.wu@intel.com \
    --cc=jasowang@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mst@redhat.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=naoya.horiguchi@nec.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).