From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: zhenwei pi <pizhenwei@bytedance.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"mst@redhat.com" <mst@redhat.com>,
	"david@redhat.com" <david@redhat.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jasowang@redhat.com" <jasowang@redhat.com>,
	"virtualization@lists.linux-foundation.org" 
	<virtualization@lists.linux-foundation.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"peterx@redhat.com" <peterx@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [PATCH 2/3] mm/memory-failure.c: support reset PTE during unpoison
Date: Mon, 30 May 2022 05:02:34 +0000
Message-ID: <20220530050234.GA1036127@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20220520070648.1794132-3-pizhenwei@bytedance.com>

On Fri, May 20, 2022 at 03:06:47PM +0800, zhenwei pi wrote:
> Originally, unpoison_memory() is only used by hwpoison-inject, and
> unpoisons a page which is poisoned by hwpoison-inject too. The kernel PTE
> entry has no change during software poison/unpoison.
> 
> On a virtualization platform, the hypervisor can fix a hardware-corrupted
> page, typically by remapping the faulty HVA (host virtual address). So add
> a new parameter 'const char *reason' to record which caller requested the
> unpoison.
> 
> Once the corrupted page is fixed, the guest kernel needs to put the page
> back into the buddy allocator. Reusing the page then hits the following
> issue (Intel Platinum 8260):
>  BUG: unable to handle page fault for address: ffff888061646000
>  #PF: supervisor write access in kernel mode
>  #PF: error_code(0x0002) - not-present page
>  PGD 2c01067 P4D 2c01067 PUD 61aaa063 PMD 10089b063 PTE 800fffff9e9b9062
>  Oops: 0002 [#1] PREEMPT SMP NOPTI
>  CPU: 2 PID: 31106 Comm: stress Kdump: loaded Tainted: G   M       OE     5.18.0-rc6.bm.1-amd64 #6
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
>  RIP: 0010:clear_page_erms+0x7/0x10
> 
> The kernel PTE entry of the fixed page is still not corrected, so the kernel
> hits a page fault during prep_new_page. So add 'bool reset_kpte' to get a
> chance to fix the PTE entry if the page is fixed by the hypervisor.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  include/linux/mm.h   |  2 +-
>  mm/hwpoison-inject.c |  2 +-
>  mm/memory-failure.c  | 26 +++++++++++++++++++-------
>  3 files changed, 21 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 665873c2788c..7ba210e86401 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3191,7 +3191,7 @@ enum mf_flags {
>  extern int memory_failure(unsigned long pfn, int flags);
>  extern void memory_failure_queue(unsigned long pfn, int flags);
>  extern void memory_failure_queue_kick(int cpu);
> -extern int unpoison_memory(unsigned long pfn);
> +extern int unpoison_memory(unsigned long pfn, bool reset_kpte, const char *reason);
>  extern int sysctl_memory_failure_early_kill;
>  extern int sysctl_memory_failure_recovery;
>  extern void shake_page(struct page *p);
> diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
> index 5c0cddd81505..0dd17ba98ade 100644
> --- a/mm/hwpoison-inject.c
> +++ b/mm/hwpoison-inject.c
> @@ -57,7 +57,7 @@ static int hwpoison_unpoison(void *data, u64 val)
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> -	return unpoison_memory(val);
> +	return unpoison_memory(val, false, "hwpoison-inject");
>  }
>  
>  DEFINE_DEBUGFS_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n");
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 95c218bb0a37..a46de3be1dd7 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2132,21 +2132,26 @@ core_initcall(memory_failure_init);
>  /**
>   * unpoison_memory - Unpoison a previously poisoned page
>   * @pfn: Page number of the to be unpoisoned page
> + * @reset_kpte: Reset the PTE entry for kmap
> + * @reason: The callers tells why unpoisoning the page
>   *
> - * Software-unpoison a page that has been poisoned by
> - * memory_failure() earlier.
> + * Unpoison a page that has been poisoned by memory_failure() earlier.
>   *
> - * This is only done on the software-level, so it only works
> - * for linux injected failures, not real hardware failures
> + * For linux injected failures, there is no need to reset PTE entry.
> + * It's possible to fix hardware memory failure on a virtualization platform,
> + * once hypervisor fixes the failure, guest needs put page back to buddy and
> + * reset the PTE entry in kernel.
>   *
>   * Returns 0 for success, otherwise -errno.
>   */
> -int unpoison_memory(unsigned long pfn)
> +int unpoison_memory(unsigned long pfn, bool reset_kpte, const char *reason)
>  {
>  	struct page *page;
>  	struct page *p;
>  	int ret = -EBUSY;
>  	int freeit = 0;
> +	pte_t *kpte;
> +	unsigned long addr;

These variables are used only in the "if (reset_kpte)" block, so you can
move their declarations into that block.

>  	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
>  					DEFAULT_RATELIMIT_BURST);
>  
> @@ -2208,8 +2213,15 @@ int unpoison_memory(unsigned long pfn)
>  	mutex_unlock(&mf_mutex);
>  	if (!ret || freeit) {
>  		num_poisoned_pages_dec();
> -		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> -				 page_to_pfn(p), &unpoison_rs);
> +		pr_info("Unpoison: Unpoisoned page %#lx by %s\n",
> +				 page_to_pfn(p), reason);

Do you really need to drop the rate limiting here?  In the original use of
unpoison, avoiding a flood of "Unpoison: Software-unpoisoned page" messages
is helpful.

Also, unpoison seems to be called from virtio-balloon multiple times when
the backend uses 2MB hugepages.  If that's right, printing 512 lines of
"Unpoison: Unpoisoned page 0xXXX by virtio-balloon" messages is probably not
very helpful.
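
To make the concern concrete, here is a minimal userspace sketch (not kernel
code, and untested against the patch) of an interval/burst limiter in the
spirit of the ratelimit state that unpoison_memory() already defines.  The
names rl_state, rl_allow() and unpoison_log_lines() are hypothetical, and the
burst of 10 used in the note below is an assumption chosen to match what I
believe DEFAULT_RATELIMIT_BURST is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: these names are hypothetical and merely mirror the
 * interval/burst idea of the kernel's ratelimit state. */
struct rl_state {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed in the current window */
};

/* Return true if the caller may print a message at time 'now'. */
static bool rl_allow(struct rl_state *rs, long now)
{
	assert(rs != NULL);
	assert(rs->interval > 0 && rs->burst > 0);

	if (now - rs->begin >= rs->interval) {
		/* A new window starts: reset the counters. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Model unpoisoning the 'nr' base pages of one hugepage at the same instant.
 * Returns the number of log lines actually printed. */
static int unpoison_log_lines(struct rl_state *rs, unsigned long first_pfn,
			      int nr, long now, const char *reason)
{
	int lines = 0;

	for (int i = 0; i < nr; i++) {
		if (!rl_allow(rs, now))
			continue;
		printf("Unpoison: Unpoisoned page %#lx by %s\n",
		       first_pfn + i, reason);
		lines++;
	}
	return lines;
}
```

With interval = 5 and burst = 10, calling unpoison_log_lines() for the 512
base pages of one 2MB hugepage prints 10 lines and suppresses the other 502,
whereas an unconditional pr_info() would print all 512.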

Thanks,
Naoya Horiguchi

> +		if (reset_kpte) {
> +			preempt_disable();
> +			addr = (unsigned long)page_to_virt(p);
> +			kpte = virt_to_kpte(addr);
> +			set_pte_at(&init_mm, addr, kpte, pfn_pte(pfn, PAGE_KERNEL));
> +			preempt_enable();
> +		}
>  	}
>  	return ret;
>  }
> -- 
> 2.20.1
