linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic
@ 2023-01-27  1:50 Tony Luck
  2023-01-30  2:32 ` HORIGUCHI NAOYA(堀口 直也)
  2023-07-19 21:16 ` [PATCH v2] " Tony Luck
  0 siblings, 2 replies; 6+ messages in thread
From: Tony Luck @ 2023-01-27  1:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Naoya Horiguchi, x86, linux-edac, linux-kernel, patches,
	Zhiquan Li, Youquan Song, Tony Luck

From: Zhiquan Li <zhiquan1.li@intel.com>

Kdump can exclude the HWPosion page to avoid touch the error page
again, the prerequisite is the PG_hwpoison page flag is set.
However, for some MCE fatal error cases, there are no opportunity
to queue a task for calling memory_failure(), as a result,
the capture kernel touches the error page again and panics.

Add function mce_set_page_hwpoison_now() which mark a page as
HWPoison before kernel panic() for MCE error, so that the dump
program can check and skip the error page and prevent the capture
kernel panic.

[Tony: Changed TestSetPageHWPoison() to SetPageHWPoison()]

Co-developedd-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c8ec5c71712..0630999c6311 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -162,6 +162,24 @@ void mce_unregister_decode_chain(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(mce_unregister_decode_chain);
 
+/*
+ * Kdump can exclude the HWPosion page to avoid touch the error page again,
+ * the prerequisite is the PG_hwpoison page flag is set. However, for some
+ * MCE fatal error cases, there are no opportunity to queue a task
+ * for calling memory_failure(), as a result, the capture kernel panic.
+ * This function mark the page as HWPoison before kernel panic() for MCE.
+ */
+static void mce_set_page_hwpoison_now(unsigned long pfn)
+{
+	struct page *p;
+
+	/* TODO: need to handle other sort of page, like SGX, PMEM and
+	 * HugeTLB pages*/
+	p = pfn_to_online_page(pfn);
+	if (p)
+		SetPageHWPoison(p);
+}
+
 static void __print_mce(struct mce *m)
 {
 	pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n",
@@ -292,6 +310,8 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	if (!fake_panic) {
 		if (panic_timeout == 0)
 			panic_timeout = mca_cfg.panic_timeout;
+		if (final && (final->status & MCI_STATUS_ADDRV))
+			 mce_set_page_hwpoison_now(final->addr >> PAGE_SHIFT);
 		panic(msg);
 	} else
 		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic
  2023-01-27  1:50 [PATCH] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic Tony Luck
@ 2023-01-30  2:32 ` HORIGUCHI NAOYA(堀口 直也)
  2023-01-30 19:21   ` Luck, Tony
  2023-07-19 21:16 ` [PATCH v2] " Tony Luck
  1 sibling, 1 reply; 6+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2023-01-30  2:32 UTC (permalink / raw)
  To: Tony Luck
  Cc: Borislav Petkov, x86, linux-edac, linux-kernel, patches,
	Zhiquan Li, Youquan Song

On Thu, Jan 26, 2023 at 05:50:30PM -0800, Tony Luck wrote:
> From: Zhiquan Li <zhiquan1.li@intel.com>
> 
> Kdump can exclude the HWPosion page to avoid touch the error page
> again, the prerequisite is the PG_hwpoison page flag is set.
> However, for some MCE fatal error cases, there are no opportunity
> to queue a task for calling memory_failure(), as a result,
> the capture kernel touches the error page again and panics.
> 
> Add function mce_set_page_hwpoison_now() which mark a page as
> HWPoison before kernel panic() for MCE error, so that the dump
> program can check and skip the error page and prevent the capture
> kernel panic.
> 
> [Tony: Changed TestSetPageHWPoison() to SetPageHWPoison()]
> 
> Co-developedd-by: Youquan Song <youquan.song@intel.com>
> Signed-off-by: Youquan Song <youquan.song@intel.com>
> Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>

Hi,
Thank you for the patch.

> ---
>  arch/x86/kernel/cpu/mce/core.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 2c8ec5c71712..0630999c6311 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -162,6 +162,24 @@ void mce_unregister_decode_chain(struct notifier_block *nb)
>  }
>  EXPORT_SYMBOL_GPL(mce_unregister_decode_chain);
>  
> +/*
> + * Kdump can exclude the HWPosion page to avoid touch the error page again,
> + * the prerequisite is the PG_hwpoison page flag is set. However, for some
> + * MCE fatal error cases, there are no opportunity to queue a task
> + * for calling memory_failure(), as a result, the capture kernel panic.

s/panic/panics/ ?

> + * This function mark the page as HWPoison before kernel panic() for MCE.

s/mark/marks/

> + */
> +static void mce_set_page_hwpoison_now(unsigned long pfn)
> +{
> +	struct page *p;
> +
> +	/* TODO: need to handle other sort of page, like SGX, PMEM and
> +	 * HugeTLB pages*/

Although I'm not sure that SGX memory or PMEM pages are expected to be
included in kdump, but simply setting PageHWPoison does not work for them?
(Maybe that depends on how kdump handles these types of memory.)

As for HugeTLB, kdump utility should parse the struct page and be aware of
HugeTLB pages, so maybe setting PageHWPoison on the head page could work.

> +	p = pfn_to_online_page(pfn);
> +	if (p)
> +		SetPageHWPoison(p);
> +}
> +
>  static void __print_mce(struct mce *m)
>  {
>  	pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n",
> @@ -292,6 +310,8 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
>  	if (!fake_panic) {
>  		if (panic_timeout == 0)
>  			panic_timeout = mca_cfg.panic_timeout;
> +		if (final && (final->status & MCI_STATUS_ADDRV))
> +			 mce_set_page_hwpoison_now(final->addr >> PAGE_SHIFT);
>  		panic(msg);

I think that setting PageHWPoison outside hwpoison subsystem is OK here,
because this is called just before calling panic() so it's expected to not
conflict with other hwpoison-related code.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic
  2023-01-30  2:32 ` HORIGUCHI NAOYA(堀口 直也)
@ 2023-01-30 19:21   ` Luck, Tony
  2023-01-31  4:50     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 6+ messages in thread
From: Luck, Tony @ 2023-01-30 19:21 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: Borislav Petkov, x86, linux-edac, linux-kernel, patches, Li,
	Zhiquan1, Song, Youquan

> Although I'm not sure that SGX memory or PMEM pages are expected to be
> included in kdump, but simply setting PageHWPoison does not work for them?
> (Maybe that depends on how kdump handles these types of memory.)

SGX/TDX pages can't be dumped. They are encrypted with no way for kdump to
get the key.

PMEM seems pointless (but I don't know what kdump does here).

> As for HugeTLB, kdump utility should parse the struct page and be aware of
> HugeTLB pages, so maybe setting PageHWPoison on the head page could work.

Or maybe kdump can take not of the PageHWPoison flag on the sub-page of the
huge page? It depends on whether there is any benefit to the dump to include the
not-poisoned parts of a huge page.


> I think that setting PageHWPoison outside hwpoison subsystem is OK here,
> because this is called just before calling panic() so it's expected to not
> conflict with other hwpoison-related code.

Thanks for the review.

-Tony

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic
  2023-01-30 19:21   ` Luck, Tony
@ 2023-01-31  4:50     ` HORIGUCHI NAOYA(堀口 直也)
  2023-02-01  2:47       ` Zhiquan Li
  0 siblings, 1 reply; 6+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2023-01-31  4:50 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, x86, linux-edac, linux-kernel, patches, Li,
	Zhiquan1, Song, Youquan

On Mon, Jan 30, 2023 at 07:21:15PM +0000, Luck, Tony wrote:
> > Although I'm not sure that SGX memory or PMEM pages are expected to be
> > included in kdump, but simply setting PageHWPoison does not work for them?
> > (Maybe that depends on how kdump handles these types of memory.)
> 
> SGX/TDX pages can't be dumped. They are encrypted with no way for kdump to
> get the key.
> 
> PMEM seems pointless (but I don't know what kdump does here).
> 
> > As for HugeTLB, kdump utility should parse the struct page and be aware of
> > HugeTLB pages, so maybe setting PageHWPoison on the head page could work.
> 
> Or maybe kdump can take not of the PageHWPoison flag on the sub-page of the
> huge page? It depends on whether there is any benefit to the dump to include the
> not-poisoned parts of a huge page.

I think that many kdump users filter out HugeTLB pages (setting dump_level
to filter "User pages") to reduce the size of kdump.  User pages are not
much helpful to investigate kernel problems, so filtering all sub-pages in
hwpoisoned hugepage seems to me not so harmful.

I don't say that saving healthy subpages has no benefit, but I don't know
much about usecases where user pages in kdump file help.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic
  2023-01-31  4:50     ` HORIGUCHI NAOYA(堀口 直也)
@ 2023-02-01  2:47       ` Zhiquan Li
  0 siblings, 0 replies; 6+ messages in thread
From: Zhiquan Li @ 2023-02-01  2:47 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也), Luck, Tony
  Cc: Borislav Petkov, x86, linux-edac, linux-kernel, patches, Song, Youquan


On 2023/1/31 12:50, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Jan 30, 2023 at 07:21:15PM +0000, Luck, Tony wrote:
>>> Although I'm not sure that SGX memory or PMEM pages are expected to be
>>> included in kdump, but simply setting PageHWPoison does not work for them?
>>> (Maybe that depends on how kdump handles these types of memory.)
>>
>> SGX/TDX pages can't be dumped. They are encrypted with no way for kdump to
>> get the key.
>>
>> PMEM seems pointless (but I don't know what kdump does here).
>>
>>> As for HugeTLB, kdump utility should parse the struct page and be aware of
>>> HugeTLB pages, so maybe setting PageHWPoison on the head page could work.
>>
>> Or maybe kdump can take not of the PageHWPoison flag on the sub-page of the
>> huge page? It depends on whether there is any benefit to the dump to include the
>> not-poisoned parts of a huge page.
> 
> I think that many kdump users filter out HugeTLB pages (setting dump_level
> to filter "User pages") to reduce the size of kdump.  User pages are not
> much helpful to investigate kernel problems, so filtering all sub-pages in
> hwpoisoned hugepage seems to me not so harmful.
> 
> I don't say that saving healthy subpages has no benefit, but I don't know
> much about usecases where user pages in kdump file help.
> 

Many thanks, Naoya and Tony.

The "TODO" comment was initially added by me. I referenced
memory_failure() that gets rid of the three cases specifically, so I'd
like the cases are fully discussed here to make sure it is a strong fix.

- SGX, as Tony said those pages can't be dumped.
- PMEM, although the memory region is figured out by kexec, kdump
utility doesn't have corresponding implementation for it.  So we can
ignore it.
- HugeTLB
  As we known there are complicated processes for HugeTLB pages at
memory_failure(). However, it doesn't make sense for a routine going to
call panic().  Kernel marks the page hwpoisoned would be enough.

The information is sufficient for kdump to make decision on how to deal
with a hwpoisoned huge pages, even though setting the PageHWPoison flag
on the sub-page of a huge page.  It's kdump's responsibility for the
not-poisoned parts of a huge page.  Currently the implementation is to
exclude the whole huge page simply:

-
https://github.com/makedumpfile/makedumpfile/commit/e8b4f93b3260defe86f5e13ca7536c07f2e32914
- https://bugzilla.redhat.com/show_bug.cgi?id=1181649

Best Regards,
Zhiquan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic
  2023-01-27  1:50 [PATCH] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic Tony Luck
  2023-01-30  2:32 ` HORIGUCHI NAOYA(堀口 直也)
@ 2023-07-19 21:16 ` Tony Luck
  1 sibling, 0 replies; 6+ messages in thread
From: Tony Luck @ 2023-07-19 21:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: HORIGUCHI NAOYA, Li, Zhiquan1, Song, Youquan, x86, linux-edac,
	linux-kernel, patches, Tony Luck

From: Zhiquan Li <zhiquan1.li@intel.com>

Kdump can exclude the HWPosion page to avoid touch the error page
again, the prerequisite is the PG_hwpoison page flag is set.
However, for some MCE fatal error cases, there are no opportunity
to queue a task for calling memory_failure(), as a result,
the capture kernel touches the error page again and panics.

Add function mce_set_page_hwpoison_now() which mark a page as
HWPoison before kernel panic() for MCE error, so that the dump
program can check and skip the error page and prevent the capture
kernel panic.

[Tony: Changed TestSetPageHWPoison() to SetPageHWPoison()]

Co-developed-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

v2: Replaced "TODO" comment in code with comments based on mailing
list discussion on the lack of value in covering other page types

 arch/x86/kernel/cpu/mce/core.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 89e2aab5d34d..766f64fade51 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -156,6 +156,30 @@ void mce_unregister_decode_chain(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(mce_unregister_decode_chain);
 
+/*
+ * Kdump can exclude the HWPosion page to avoid touch the error page again,
+ * the prerequisite is the PG_hwpoison page flag is set. However, for some
+ * MCE fatal error cases, there are no opportunity to queue a task
+ * for calling memory_failure(), as a result, the capture kernel panic.
+ * This function mark the page as HWPoison before kernel panic() for MCE.
+ *
+ * This covers normal 4KByte pages. There is little/no value in covering
+ * other page types. E.g.
+ * SGX: These cannot be dumped.
+ * PMEM: Pointless to dump these. Persistent memory contents remain
+ * available across reboots.
+ * HugeTLB: These are user pages. Generally filtered out of the kdump
+ * to keep size small. Not helpful to debug kernel issues.
+ */
+static void mce_set_page_hwpoison_now(unsigned long pfn)
+{
+	struct page *p;
+
+	p = pfn_to_online_page(pfn);
+	if (p)
+		SetPageHWPoison(p);
+}
+
 static void __print_mce(struct mce *m)
 {
 	pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n",
@@ -286,6 +310,8 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	if (!fake_panic) {
 		if (panic_timeout == 0)
 			panic_timeout = mca_cfg.panic_timeout;
+		if (final && (final->status & MCI_STATUS_ADDRV))
+			mce_set_page_hwpoison_now(final->addr >> PAGE_SHIFT);
 		panic(msg);
 	} else
 		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-07-19 21:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-27  1:50 [PATCH] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic Tony Luck
2023-01-30  2:32 ` HORIGUCHI NAOYA(堀口 直也)
2023-01-30 19:21   ` Luck, Tony
2023-01-31  4:50     ` HORIGUCHI NAOYA(堀口 直也)
2023-02-01  2:47       ` Zhiquan Li
2023-07-19 21:16 ` [PATCH v2] " Tony Luck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).