All of lore.kernel.org
 help / color / mirror / Atom feed
From: Miaohe Lin <linmiaohe@huawei.com>
To: Tony Luck <tony.luck@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Shuai Xue <xueshuai@linux.alibaba.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<linuxppc-dev@lists.ozlabs.org>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v3 2/2] mm, hwpoison: When copy-on-write hits poison, take page offline
Date: Fri, 28 Oct 2022 10:28:44 +0800	[thread overview]
Message-ID: <c5ade8ea-6349-7ade-f245-273de69888a4@huawei.com> (raw)
In-Reply-To: <20221021200120.175753-3-tony.luck@intel.com>

On 2022/10/22 4:01, Tony Luck wrote:
> Cannot call memory_failure() directly from the fault handler because
> mmap_lock (and others) are held.

Could you please explain which lock makes it unfeasible to call memory_failure() directly and
why? I'm somewhat confused. But I agree using memory_failure_queue() should be a good idea.

> 
> It is important, but not urgent, to mark the source page as h/w poisoned
> and unmap it from other tasks.
> 
> Use memory_failure_queue() to request a call to memory_failure() for the
> page with the error.
> 
> Also provide a stub version for CONFIG_MEMORY_FAILURE=n
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/mm.h | 5 ++++-
>  mm/memory.c        | 4 +++-
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8bbcccbc5565..03ced659eb58 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3268,7 +3268,6 @@ enum mf_flags {
>  int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>  		      unsigned long count, int mf_flags);
>  extern int memory_failure(unsigned long pfn, int flags);
> -extern void memory_failure_queue(unsigned long pfn, int flags);
>  extern void memory_failure_queue_kick(int cpu);
>  extern int unpoison_memory(unsigned long pfn);
>  extern int sysctl_memory_failure_early_kill;
> @@ -3277,8 +3276,12 @@ extern void shake_page(struct page *p);
>  extern atomic_long_t num_poisoned_pages __read_mostly;
>  extern int soft_offline_page(unsigned long pfn, int flags);
>  #ifdef CONFIG_MEMORY_FAILURE
> +extern void memory_failure_queue(unsigned long pfn, int flags);
>  extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
>  #else
> +static inline void memory_failure_queue(unsigned long pfn, int flags)
> +{
> +}
>  static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>  {
>  	return 0;
> diff --git a/mm/memory.c b/mm/memory.c
> index b6056eef2f72..eae242351726 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2866,8 +2866,10 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
>  	unsigned long addr = vmf->address;
>  
>  	if (likely(src)) {
> -		if (copy_mc_user_highpage(dst, src, addr, vma))
> +		if (copy_mc_user_highpage(dst, src, addr, vma)) {
> +			memory_failure_queue(page_to_pfn(src), 0);

It seems MF_ACTION_REQUIRED is not needed for memory_failure_queue() here. Thanks for your patch.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks,
Miaohe Lin


WARNING: multiple messages have this Message-ID (diff)
From: Miaohe Lin <linmiaohe@huawei.com>
To: Tony Luck <tony.luck@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Nicholas Piggin <npiggin@gmail.com>,
	Dan Williams <dan.j.williams@intel.com>,
	linuxppc-dev@lists.ozlabs.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Shuai Xue <xueshuai@linux.alibaba.com>
Subject: Re: [PATCH v3 2/2] mm, hwpoison: When copy-on-write hits poison, take page offline
Date: Fri, 28 Oct 2022 10:28:44 +0800	[thread overview]
Message-ID: <c5ade8ea-6349-7ade-f245-273de69888a4@huawei.com> (raw)
In-Reply-To: <20221021200120.175753-3-tony.luck@intel.com>

On 2022/10/22 4:01, Tony Luck wrote:
> Cannot call memory_failure() directly from the fault handler because
> mmap_lock (and others) are held.

Could you please explain which lock makes it unfeasible to call memory_failure() directly and
why? I'm somewhat confused. But I agree using memory_failure_queue() should be a good idea.

> 
> It is important, but not urgent, to mark the source page as h/w poisoned
> and unmap it from other tasks.
> 
> Use memory_failure_queue() to request a call to memory_failure() for the
> page with the error.
> 
> Also provide a stub version for CONFIG_MEMORY_FAILURE=n
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/mm.h | 5 ++++-
>  mm/memory.c        | 4 +++-
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8bbcccbc5565..03ced659eb58 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3268,7 +3268,6 @@ enum mf_flags {
>  int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>  		      unsigned long count, int mf_flags);
>  extern int memory_failure(unsigned long pfn, int flags);
> -extern void memory_failure_queue(unsigned long pfn, int flags);
>  extern void memory_failure_queue_kick(int cpu);
>  extern int unpoison_memory(unsigned long pfn);
>  extern int sysctl_memory_failure_early_kill;
> @@ -3277,8 +3276,12 @@ extern void shake_page(struct page *p);
>  extern atomic_long_t num_poisoned_pages __read_mostly;
>  extern int soft_offline_page(unsigned long pfn, int flags);
>  #ifdef CONFIG_MEMORY_FAILURE
> +extern void memory_failure_queue(unsigned long pfn, int flags);
>  extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
>  #else
> +static inline void memory_failure_queue(unsigned long pfn, int flags)
> +{
> +}
>  static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>  {
>  	return 0;
> diff --git a/mm/memory.c b/mm/memory.c
> index b6056eef2f72..eae242351726 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2866,8 +2866,10 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
>  	unsigned long addr = vmf->address;
>  
>  	if (likely(src)) {
> -		if (copy_mc_user_highpage(dst, src, addr, vma))
> +		if (copy_mc_user_highpage(dst, src, addr, vma)) {
> +			memory_failure_queue(page_to_pfn(src), 0);

It seems MF_ACTION_REQUIRED is not needed for memory_failure_queue() here. Thanks for your patch.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks,
Miaohe Lin


  reply	other threads:[~2022-10-28  2:28 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-17 23:42 [RFC PATCH] mm, hwpoison: Recover from copy-on-write machine checks Tony Luck
2022-10-18  8:43 ` HORIGUCHI NAOYA(堀口 直也)
2022-10-18 17:52   ` Luck, Tony
2022-10-19 17:08     ` [PATCH v2] mm, hwpoison: Try to recover from copy-on write faults Tony Luck
2022-10-19 17:08       ` Tony Luck
2022-10-19 17:45       ` Dan Williams
2022-10-19 17:45         ` Dan Williams
2022-10-19 20:30         ` Luck, Tony
2022-10-19 20:30           ` Luck, Tony
2022-10-20  1:57       ` Shuai Xue
2022-10-20  1:57         ` Shuai Xue
2022-10-20 20:05         ` Tony Luck
2022-10-20 20:05           ` Tony Luck
2022-10-21  1:38           ` Miaohe Lin
2022-10-21  1:38             ` Miaohe Lin
2022-10-21  3:57             ` Luck, Tony
2022-10-21  3:57               ` Luck, Tony
2022-10-21  1:52           ` Shuai Xue
2022-10-21  1:52             ` Shuai Xue
2022-10-21  4:08             ` Tony Luck
2022-10-21  4:08               ` Tony Luck
2022-10-21  4:11               ` David Laight
2022-10-21  4:11                 ` David Laight
2022-10-21  4:41                 ` Luck, Tony
2022-10-21  4:41                   ` Luck, Tony
2022-10-21  9:29                   ` Shuai Xue
2022-10-21  9:29                     ` Shuai Xue
2022-10-21 16:30                     ` Luck, Tony
2022-10-21 16:30                       ` Luck, Tony
2022-10-23 15:04                       ` Shuai Xue
2022-10-23 15:04                         ` Shuai Xue
2022-10-21  6:57               ` Shuai Xue
2022-10-21  6:57                 ` Shuai Xue
2022-10-21 20:01       ` [PATCH v3 0/2] Copy-on-write poison recovery Tony Luck
2022-10-21 20:01         ` Tony Luck
2022-10-21 20:01         ` [PATCH v3 1/2] mm, hwpoison: Try to recover from copy-on write faults Tony Luck
2022-10-21 20:01           ` Tony Luck
2022-10-25  5:46           ` HORIGUCHI NAOYA(堀口 直也)
2022-10-25  5:46             ` HORIGUCHI NAOYA(堀口 直也)
2022-10-28  2:11           ` Miaohe Lin
2022-10-28  2:11             ` Miaohe Lin
2022-10-28 16:09             ` Luck, Tony
2022-10-28 16:09               ` Luck, Tony
2022-11-02 14:27               ` Alexander Potapenko
2022-11-02 14:27                 ` Alexander Potapenko
2022-11-02 14:30                 ` Alexander Potapenko
2022-11-02 14:30                   ` Alexander Potapenko
2022-10-21 20:01         ` [PATCH v3 2/2] mm, hwpoison: When copy-on-write hits poison, take page offline Tony Luck
2022-10-21 20:01           ` Tony Luck
2022-10-28  2:28           ` Miaohe Lin [this message]
2022-10-28  2:28             ` Miaohe Lin
2022-10-28 16:13             ` Luck, Tony
2022-10-28 16:13               ` Luck, Tony
2022-10-29  1:55               ` Miaohe Lin
2022-10-29  1:55                 ` Miaohe Lin
2022-10-23 15:52         ` [PATCH v3 0/2] Copy-on-write poison recovery Shuai Xue
2022-10-23 15:52           ` Shuai Xue
2022-10-26  5:19           ` Shuai Xue
2022-10-26  5:19             ` Shuai Xue
2022-10-31 20:10         ` [PATCH v4 " Tony Luck
2022-10-31 20:10           ` Tony Luck
2022-10-31 20:10           ` [PATCH v4 1/2] mm, hwpoison: Try to recover from copy-on write faults Tony Luck
2022-10-31 20:10             ` Tony Luck
2022-10-31 20:10           ` [PATCH v4 2/2] mm, hwpoison: When copy-on-write hits poison, take page offline Tony Luck
2022-10-31 20:10             ` Tony Luck
2023-05-18 21:49             ` Jane Chu
2023-05-18 22:10               ` Luck, Tony
2023-05-19  7:28               ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c5ade8ea-6349-7ade-f245-273de69888a4@huawei.com \
    --to=linmiaohe@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=christophe.leroy@csgroup.eu \
    --cc=dan.j.williams@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=naoya.horiguchi@nec.com \
    --cc=npiggin@gmail.com \
    --cc=tony.luck@intel.com \
    --cc=willy@infradead.org \
    --cc=xueshuai@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.