All of lore.kernel.org
 help / color / mirror / Atom feed
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Aili Yao <yaoaili@kingsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"david@redhat.com" <david@redhat.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"bp@alien8.de" <bp@alien8.de>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"inux-edac@vger.kernel.org" <inux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"yangfeng1@kingsoft.com" <yangfeng1@kingsoft.com>
Subject: Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned
Date: Thu, 25 Feb 2021 11:28:18 +0000	[thread overview]
Message-ID: <20210225112818.GA10141@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <20210225114329.4e1a41c6@alex-virtual-machine>

On Thu, Feb 25, 2021 at 11:43:29AM +0800, Aili Yao wrote:
> On Wed, 24 Feb 2021 11:31:55 +0100 Oscar Salvador <osalvador@suse.de> wrote:
...
>  
> > > 
> > > 3.The kill_me_maybe will check the return:
> > > 
> > > 1244 static void kill_me_maybe(struct callback_head *cb)
> > > 1245 {
> > > 
> > > 1254         if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
> > > 1255             !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
> > > 1256                 set_mce_nospec(p->mce_addr >> PAGE_SHIFT,  
> > 
> > So, IIUC, in case of a LMCE nested call, the second MCE will reach here.
> > set_mce_nospec() will either mark the underlying page as not mapped/cached.
> >
> This set_mce_nospec() is not proper when the recovery job is on the fly. In my test
> this function failed.

Hi Aili,

I agree that this set_mce_nospec() is not expected to be called for
"already hwpoisoned" page because in the reported case the error
page is already contained and no need to resort changing cache mode.

...

> > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > index e9481632fcd1..06f006174b8c 100644
> > > --- a/mm/memory-failure.c
> > > +++ b/mm/memory-failure.c
> > > @@ -1224,7 +1224,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> > >  	if (TestSetPageHWPoison(head)) {
> > >  		pr_err("Memory failure: %#lx: already hardware poisoned\n",
> > >  		       pfn);
> > > -		return 0;
> > > +		return -EBUSY;  
> > 
> > As David said, madvise_inject_error() will start returning -EBUSY now in case
> > we madvise(MADV_HWPOISON) on an already hwpoisoned page.
> > 
> > AFAICS, memory_failure() can return 0, -Eerrors, and MF_XXX.
> > Would it make sense to unify that? That way we could declare error codes that
> > make somse sense (like MF_ALREADY_HWPOISONED).

It seems to me that memory_failure() does not return MF_XXX.  But yes,
returning some positive value for the reported case could be a solution.

> > 
> 
> @David:
> 
> I checked the code again, and find a few places will care the exact return value, like:
> 
> 1: drivers/base/memory.c:483:      ret = memory_failure(pfn, 0);
> This is for hard page offline, I see the code in mcelog:
> static void offline_action(struct mempage *mp, u64 addr)
> {
> 	if (offline <= OFFLINE_ACCOUNT)
> 		return;
> 	Lprintf("Offlining page %llx\n", addr);
> 	if (memory_offline(addr) < 0) {
> 		Lprintf("Offlining page %llx failed: %s\n", addr, strerror(errno));
> 		mp->offlined = PAGE_OFFLINE_FAILED;
> 	} else
> 		mp->offlined = PAGE_OFFLINE;
> }
> I think return an negative value will be more proper? As the related killing function may not be performed, and we can't say
> it's a success operation?
> 
> 2:mm/hwpoison-inject.c:51:        return memory_failure(pfn, 0);
> mm/madvise.c:910:               ret = memory_failure(pfn, MF_COUNT_INCREASED);
> 
> These two cases are mainly for error injections, I checked the test codes, mostly it only care if the value is 0 or < 0;
> I do the related test, normally it work well, but for stress test, sometimes in some case, I do meet some fail cases along with the -EBUSY return.
> I will dig more.
> 
> Other place will only care if the return value is 0. or just ignore it.
> 
> Hi naoya, what's your opnion for this possible issue, I need your inputs!

We could use some negative value (error code) to report the reported case,
then as you mentioned above, some callers need change to handle the
new case, and the same is true if you use some positive value.
My preference is -EHWPOISON, but other options are fine if justified well.

Thanks,
Naoya Horiguchi

  reply	other threads:[~2021-02-25 11:32 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-24  7:16 [PATCH] mm,hwpoison: return -EBUSY when page already poisoned Aili Yao
2021-02-24 10:10 ` David Hildenbrand
2021-02-24 10:31 ` Oscar Salvador
2021-02-25  3:43   ` Aili Yao
2021-02-25 11:28     ` HORIGUCHI NAOYA(堀口 直也) [this message]
2021-02-25 11:39       ` Oscar Salvador
2021-02-25 12:38         ` HORIGUCHI NAOYA(堀口 直也)
2021-02-25 18:15           ` Luck, Tony
2021-02-26  2:19             ` HORIGUCHI NAOYA(堀口 直也)
2021-02-26  2:59               ` Aili Yao
2021-03-03  3:39                 ` Luck, Tony
2021-03-03  3:57                   ` Aili Yao
2021-03-03  8:39                     ` Aili Yao
2021-03-03 15:41                       ` Luck, Tony
2021-03-04  2:16                         ` Aili Yao
2021-03-04  4:19                           ` Aili Yao
2021-03-04  6:45                             ` Aili Yao
2021-03-04 23:57                               ` Luck, Tony
2021-03-05  1:30                                 ` Aili Yao
2021-03-05  1:36                                   ` Aili Yao
2021-03-05 22:11                                     ` Luck, Tony
2021-03-08  6:45                                       ` HORIGUCHI NAOYA(堀口 直也)
2021-03-08 18:54                                         ` Luck, Tony
2021-03-08 22:38                                           ` HORIGUCHI NAOYA(堀口 直也)
2021-03-08 22:55                                             ` [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races Luck, Tony
2021-03-08 23:42                                               ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09  2:04                                               ` Aili Yao
2021-03-09  6:04                                                 ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09  6:35                                                   ` [PATCH v2] mm,hwpoison: return -EBUSY when page already poisoned Aili Yao
2021-03-09  8:28                                                     ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09 20:01                                                       ` Luck, Tony
2021-03-10  8:05                                                         ` HORIGUCHI NAOYA(堀口 直也)
2021-03-13  1:55                                                         ` Jue Wang
2021-03-13  1:55                                                           ` Jue Wang
2021-03-10  8:01                                                       ` Aili Yao
2021-03-31 11:25                                                     ` [PATCH v3] mm,hwpoison: return -EHWPOISON " Aili Yao
2021-04-01 15:33                                                       ` Luck, Tony
2021-04-02  1:18                                                         ` Aili Yao
2021-04-02 15:11                                                           ` Luck, Tony
2021-04-05 13:50                                                             ` HORIGUCHI NAOYA(堀口 直也)
2021-04-06  1:04                                                               ` Aili Yao
2021-03-09  6:38                                                   ` [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races Aili Yao
2021-03-05 15:55                                   ` [PATCH] mm,hwpoison: return -EBUSY when page already poisoned Luck, Tony
2021-03-10  6:10                                     ` Aili Yao
2021-03-11  8:55                                       ` HORIGUCHI NAOYA(堀口 直也)
2021-03-11 11:23                                         ` Aili Yao
2021-03-11 17:05                                         ` Luck, Tony
2021-03-12  5:55                                           ` Aili Yao
2021-03-12 16:29                                             ` Luck, Tony
2021-03-12 23:48                                               ` Luck, Tony
2021-03-16  6:42                                                 ` HORIGUCHI NAOYA(堀口 直也)
2021-03-16  7:54                                                   ` Aili Yao
2021-03-17  0:29                                                 ` Luck, Tony
2021-03-17  9:07                                                   ` Aili Yao
2021-03-17  7:48                                         ` Aili Yao
2021-03-17  8:23                                           ` Aili Yao
2021-02-26  3:26               ` Tony Luck
2021-02-26  3:26                 ` Tony Luck
2021-02-26  2:52         ` Aili Yao
2021-02-26 17:58           ` Luck, Tony
2021-03-02  4:32             ` Aili Yao
2021-03-31 10:56         ` Aili Yao
2021-03-31 10:58           ` David Hildenbrand
2021-03-01 23:21 Jue Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210225112818.GA10141@hori.linux.bs1.fc.nec.co.jp \
    --to=naoya.horiguchi@nec.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=inux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=osalvador@suse.de \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=yangfeng1@kingsoft.com \
    --cc=yaoaili@kingsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.