linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: linux-mm@kvack.org, Tony Luck <tony.luck@intel.com>,
	Aili Yao <yaoaili@kingsoft.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Andy Lutomirski <luto@kernel.org>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Jue Wang <juew@google.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 2/3] mm,hwpoison: return -EHWPOISON when page already
Date: Thu, 22 Apr 2021 19:02:04 +0200	[thread overview]
Message-ID: <20210422170204.GD7021@zn.tnic> (raw)
In-Reply-To: <20210421005728.1994268-3-nao.horiguchi@gmail.com>

On Wed, Apr 21, 2021 at 09:57:27AM +0900, Naoya Horiguchi wrote:
> From: Aili Yao <yaoaili@kingsoft.com>

> Subject: Re: [PATCH v3 2/3] mm,hwpoison: return -EHWPOISON when page already

		... Return -EHWPOISON to denote that the page has already been poisoned"

> When the page is already poisoned, another memory_failure() call in the
> same page now returns 0, meaning OK. For nested memory mce handling, this
> behavior may lead to one mce looping,

s/mce/MCE/g

> Example:

For example:

> 1. When LCME is enabled, and there are two processes A && B running on
> different core X && Y separately, which will access one same page, then

				  which access the same page...

s/&&/and/g

> the page corrupted when process A access it, a MCE will be rasied to
> core X and the error process is just underway.

... and you lost me here. I don't understand what that is trying to say.
Is that trying to say that when process A encounters the error, the MCE
will be raised on CPU X?

> 2. Then B access the page and trigger another MCE to core Y, it will also
> do error process, it will see TestSetPageHWPoison be true, and 0 is
> returned.

That sentence needs massaging.

> 3. The kill_me_maybe will check the return:
> 
>     1244 static void kill_me_maybe(struct callback_head *cb)
>     1245 {
>     ...
>     1254         if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
>     1255             !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
>     1256                 set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
>     1257                 sync_core();
>     1258                 return;
>     1259         }
>     ...
>     1267 }

No need for the line numbers.

> 4. The error process for B will end, and may nothing happened if
> kill-early is not set, The process B will re-excute instruction and get
> into mce again and then loop happens. And also the set_mce_nospec()
> here is not proper, may refer to commit fd0e786d9d09 ("x86/mm,
> mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages").

That needs massaging too.

> For other cases which care the return value of memory_failure() should
> check why they want to process a memory error which have already been
> processed. This behavior seems reasonable.

This whole commit message needs sanitizing.

Also, looking at the next patch, you can merge this one into the next
because the next one is acting on -EHWPOISON so it all belongs together
in a single patch.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

  reply	other threads:[~2021-04-22 17:02 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-21  0:57 [PATCH v3 0/3] mm,hwpoison: fix sending SIGBUS for Action Required MCE Naoya Horiguchi
2021-04-21  0:57 ` [PATCH v3 1/3] mm/memory-failure: Use a mutex to avoid memory_failure() races Naoya Horiguchi
2021-04-22 17:01   ` Borislav Petkov
2021-04-21  0:57 ` [PATCH v3 2/3] mm,hwpoison: return -EHWPOISON when page already Naoya Horiguchi
2021-04-22 17:02   ` Borislav Petkov [this message]
2021-04-23  2:17     ` HORIGUCHI NAOYA(堀口 直也)
2021-04-21  0:57 ` [PATCH v3 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address Naoya Horiguchi
2021-04-22 17:02   ` Borislav Petkov
2021-04-23  2:18     ` HORIGUCHI NAOYA(堀口 直也)
2021-04-23 11:57       ` Borislav Petkov
2021-04-26  8:23         ` HORIGUCHI NAOYA(堀口 直也)
2021-05-07  2:10 ` [PATCH v3 0/3] mm,hwpoison: fix sending SIGBUS for Action Required MCE Luck, Tony
2021-05-07  3:37   ` Luck, Tony
2021-05-07  5:24     ` HORIGUCHI NAOYA(堀口 直也)
2021-05-07 11:14       ` Aili Yao
2021-05-07 18:02         ` Luck, Tony
2021-05-08  2:38           ` Aili Yao
2021-05-10  3:28             ` Luck, Tony

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210422170204.GD7021@zn.tnic \
    --to=bp@alien8.de \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=juew@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=nao.horiguchi@gmail.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=tony.luck@intel.com \
    --cc=yaoaili@kingsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).