linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Aili Yao <yaoaili@kingsoft.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Oscar Salvador" <osalvador@suse.de>,
	"david@redhat.com" <david@redhat.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"bp@alien8.de" <bp@alien8.de>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"yangfeng1@kingsoft.com" <yangfeng1@kingsoft.com>,
	yaoaili@kingsoft.com
Subject: Re: [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races
Date: Tue, 9 Mar 2021 10:04:21 +0800	[thread overview]
Message-ID: <20210309100421.3d09b6b1@alex-virtual-machine> (raw)
In-Reply-To: <20210308225504.GA233893@agluck-desk2.amr.corp.intel.com>

On Mon, 8 Mar 2021 14:55:04 -0800
"Luck, Tony" <tony.luck@intel.com> wrote:

> There can be races when multiple CPUs consume poison from the same
> page. The first into memory_failure() atomically sets the HWPoison
> page flag and begins hunting for tasks that map this page. Eventually
> it invalidates those mappings and may send a SIGBUS to the affected
> tasks.
> 
> But while all that work is going on, other CPUs see a "success"
> return code from memory_failure() and so they believe the error
> has been handled and continue executing.
> 
> Fix by wrapping most of the internal parts of memory_failure() in
> a mutex.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  mm/memory-failure.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 24210c9bd843..c1509f4b565e 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1381,6 +1381,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  	return rc;
>  }
>  
> +static DEFINE_MUTEX(mf_mutex);
> +
>  /**
>   * memory_failure - Handle memory failure of a page.
>   * @pfn: Page Number of the corrupted page
> @@ -1424,12 +1426,18 @@ int memory_failure(unsigned long pfn, int flags)
>  		return -ENXIO;
>  	}
>  
> +	mutex_lock(&mf_mutex);
> +
>  try_again:
> -	if (PageHuge(p))
> -		return memory_failure_hugetlb(pfn, flags);
> +	if (PageHuge(p)) {
> +		res = memory_failure_hugetlb(pfn, flags);
> +		goto out2;
> +	}
> +
>  	if (TestSetPageHWPoison(p)) {
>  		pr_err("Memory failure: %#lx: already hardware poisoned\n",
>  			pfn);
> +		mutex_unlock(&mf_mutex);
>  		return 0;
>  	}
>  
> @@ -1463,9 +1471,11 @@ int memory_failure(unsigned long pfn, int flags)
>  				res = MF_FAILED;
>  			}
>  			action_result(pfn, MF_MSG_BUDDY, res);
> +			mutex_unlock(&mf_mutex);
>  			return res == MF_RECOVERED ? 0 : -EBUSY;
>  		} else {
>  			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
> +			mutex_unlock(&mf_mutex);
>  			return -EBUSY;
>  		}
>  	}
> @@ -1473,6 +1483,7 @@ int memory_failure(unsigned long pfn, int flags)
>  	if (PageTransHuge(hpage)) {
>  		if (try_to_split_thp_page(p, "Memory Failure") < 0) {
>  			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> +			mutex_unlock(&mf_mutex);
>  			return -EBUSY;
>  		}
>  		VM_BUG_ON_PAGE(!page_count(p), p);
> @@ -1517,6 +1528,7 @@ int memory_failure(unsigned long pfn, int flags)
>  		num_poisoned_pages_dec();
>  		unlock_page(p);
>  		put_page(p);
> +		mutex_unlock(&mf_mutex);
>  		return 0;
>  	}
>  	if (hwpoison_filter(p)) {
> @@ -1524,6 +1536,7 @@ int memory_failure(unsigned long pfn, int flags)
>  			num_poisoned_pages_dec();
>  		unlock_page(p);
>  		put_page(p);
> +		mutex_unlock(&mf_mutex);
>  		return 0;
>  	}
>  
> @@ -1559,6 +1572,8 @@ int memory_failure(unsigned long pfn, int flags)
>  	res = identify_page_state(pfn, p, page_flags);
>  out:
>  	unlock_page(p);
> +out2:
> +	mutex_unlock(&mf_mutex);
>  	return res;
>  }
>  EXPORT_SYMBOL_GPL(memory_failure);

If others are OK with this method, then I am OK too.
But I have two concerns, May you take into account:

1. The memory_failure with 0 return code for race condition, then the kill_me_maybe() goes into branch:
	if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
	    !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
		set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
		sync_core();
		return;
	}

while we place set_mce_nospec() here is for a reason, please see commit fd0e786d9d09024f67b.

2. When memory_failure return 0 and maybe return to user process, and it may re-execute the instruction triggering previous fault, this behavior
assume an implicit dependence that the related pte has been correctly set. or if not correctlily set, it will lead to infinite loop again.

-- 
Thanks!
Aili Yao

  parent reply	other threads:[~2021-03-09  2:05 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20210224151619.67c29731@alex-virtual-machine>
     [not found] ` <20210224103105.GA16368@linux>
     [not found]   ` <20210225114329.4e1a41c6@alex-virtual-machine>
     [not found]     ` <20210225112818.GA10141@hori.linux.bs1.fc.nec.co.jp>
     [not found]       ` <20210225113930.GA7227@localhost.localdomain>
     [not found]         ` <20210225123806.GA15006@hori.linux.bs1.fc.nec.co.jp>
     [not found]           ` <20210225181542.GA178925@agluck-desk2.amr.corp.intel.com>
     [not found]             ` <20210226021907.GA27861@hori.linux.bs1.fc.nec.co.jp>
2021-02-26  2:59               ` [PATCH] mm,hwpoison: return -EBUSY when page already poisoned Aili Yao
2021-03-03  3:39                 ` Luck, Tony
2021-03-03  3:57                   ` Aili Yao
2021-03-03  8:39                     ` Aili Yao
2021-03-03 15:41                       ` Luck, Tony
2021-03-04  2:16                         ` Aili Yao
2021-03-04  4:19                           ` Aili Yao
2021-03-04  6:45                             ` Aili Yao
2021-03-04 23:57                               ` Luck, Tony
2021-03-05  1:30                                 ` Aili Yao
2021-03-05  1:36                                   ` Aili Yao
2021-03-05 22:11                                     ` Luck, Tony
2021-03-08  6:45                                       ` HORIGUCHI NAOYA(堀口 直也)
2021-03-08 18:54                                         ` Luck, Tony
2021-03-08 22:38                                           ` HORIGUCHI NAOYA(堀口 直也)
2021-03-08 22:55                                             ` [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races Luck, Tony
2021-03-08 23:42                                               ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09  2:04                                               ` Aili Yao [this message]
2021-03-09  6:04                                                 ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09  6:35                                                   ` [PATCH v2] mm,hwpoison: return -EBUSY when page already poisoned Aili Yao
2021-03-09  8:28                                                     ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09 20:01                                                       ` Luck, Tony
2021-03-10  8:05                                                         ` HORIGUCHI NAOYA(堀口 直也)
2021-03-13  1:55                                                         ` Jue Wang
2021-03-10  8:01                                                       ` Aili Yao
2021-03-09  6:38                                                   ` [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races Aili Yao
2021-03-05 15:55                                   ` [PATCH] mm,hwpoison: return -EBUSY when page already poisoned Luck, Tony
2021-03-10  6:10                                     ` Aili Yao
2021-03-11  8:55                                       ` HORIGUCHI NAOYA(堀口 直也)
2021-03-11 11:23                                         ` Aili Yao
2021-03-11 17:05                                         ` Luck, Tony
2021-03-12  5:55                                           ` Aili Yao
2021-03-12 16:29                                             ` Luck, Tony
2021-03-12 23:48                                               ` Luck, Tony
2021-03-16  6:42                                                 ` HORIGUCHI NAOYA(堀口 直也)
2021-03-16  7:54                                                   ` Aili Yao
2021-03-17  0:29                                                 ` Luck, Tony
2021-03-17  9:07                                                   ` Aili Yao
2021-03-17  7:48                                         ` Aili Yao
2021-03-17  8:23                                           ` Aili Yao
     [not found]         ` <20210226105250.3a15e35c@alex-virtual-machine>
2021-02-26 17:58           ` Luck, Tony
2021-03-02  4:32             ` Aili Yao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210309100421.3d09b6b1@alex-virtual-machine \
    --to=yaoaili@kingsoft.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=yangfeng1@kingsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).