linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/mce/dev-mcelog: Call mce_register_decode_chain() much earlier
@ 2021-08-19 22:44 Tony Luck
  2021-08-20 12:28 ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Tony Luck @ 2021-08-19 22:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, linux-edac, linux-kernel, Tony Luck, Sumanth Kamatala

When a fatal machine check results in a system reset, Linux does
not clear the error(s) from machine check bank(s).

Hardware preserves the machine check banks across a warm reset.

During initialization of the kernel after the reboot, Linux reads,
logs, and clears all machine check banks.

But there is a problem. In:
commit 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
the call to mce_register_decode_chain() moved later in the boot sequence.
This means that /dev/mcelog doesn't see those early error logs.

This was partially fixed by:
commit cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")

which made sure that the logs were not lost completely by printing
to the console. But parsing console logs is error prone. Users
of /dev/mcelog should expect to find any early errors logged to
standard places.

Split the initialization code in dev-mcelog.c into:
1) an "early" part that registers for mce notifications. Call this
directly from mcheck_init() because early_initcall() is still too late.
This allocation is too early for kzalloc() so use memblock_alloc().
2) "late" part that registers the /dev/mcelog character device.

Fixes: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Reported-by: Sumanth Kamatala <skamatala@juniper.net>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/mce.h           |  6 ++++++
 arch/x86/kernel/cpu/mce/core.c       |  1 +
 arch/x86/kernel/cpu/mce/dev-mcelog.c | 30 ++++++++++++++++++++++------
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 0607ec4f5091..54841da55357 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -369,4 +369,10 @@ umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)	{ return
 #endif
 
 static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)	{ return mce_amd_feature_init(c); }
+
+#ifdef CONFIG_X86_MCELOG_LEGACY
+void __init dev_mcelog_init_decode_chain(void);
+#else
+static inline void dev_mcelog_init_decode_chain(void) { }
+#endif
 #endif /* _ASM_X86_MCE_H */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aadc085..76c04d472e32 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2202,6 +2202,7 @@ int __init mcheck_init(void)
 	mce_register_decode_chain(&early_nb);
 	mce_register_decode_chain(&mce_uc_nb);
 	mce_register_decode_chain(&mce_default_nb);
+	dev_mcelog_init_decode_chain();
 	mcheck_vendor_init_severity();
 
 	INIT_WORK(&mce_work, mce_gen_pool_process);
diff --git a/arch/x86/kernel/cpu/mce/dev-mcelog.c b/arch/x86/kernel/cpu/mce/dev-mcelog.c
index 100fbeebdc72..31b812c17ccc 100644
--- a/arch/x86/kernel/cpu/mce/dev-mcelog.c
+++ b/arch/x86/kernel/cpu/mce/dev-mcelog.c
@@ -13,6 +13,7 @@
 #include <linux/slab.h>
 #include <linux/kmod.h>
 #include <linux/poll.h>
+#include <linux/memblock.h>
 
 #include "internal.h"
 
@@ -341,20 +342,36 @@ static struct miscdevice mce_chrdev_device = {
 	&mce_chrdev_ops,
 };
 
-static __init int dev_mcelog_init_device(void)
+/*
+ * Register early to pick up logs when init code
+ * scans machine check banks for errors from previous
+ * kernel instance.
+ */
+__init void dev_mcelog_init_decode_chain(void)
 {
-	int mce_log_len;
-	int err;
+	int mce_log_entries, mce_log_len;
 
-	mce_log_len = max(MCE_LOG_MIN_LEN, num_online_cpus());
-	mcelog = kzalloc(struct_size(mcelog, entry, mce_log_len), GFP_KERNEL);
+	mce_log_entries = max(MCE_LOG_MIN_LEN, num_online_cpus());
+	mce_log_len = struct_size(mcelog, entry, mce_log_entries);
+	mcelog = memblock_alloc(mce_log_len, SMP_CACHE_BYTES);
 	if (!mcelog)
-		return -ENOMEM;
+		return;
 
+	memset(mcelog, 0, mce_log_len);
 	memcpy(mcelog->signature, MCE_LOG_SIGNATURE, sizeof(mcelog->signature));
 	mcelog->len = mce_log_len;
 	mcelog->recordlen = sizeof(struct mce);
 
+	mce_register_decode_chain(&dev_mcelog_nb);
+}
+
+static __init int dev_mcelog_init_device(void)
+{
+	int err;
+
+	if (!mcelog)
+		return -ENOMEM;
+
 	/* register character device /dev/mcelog */
 	err = misc_register(&mce_chrdev_device);
 	if (err) {
@@ -365,6 +382,7 @@ static __init int dev_mcelog_init_device(void)
 			pr_err("Unable to init device /dev/mcelog (rc: %d)\n", err);
 
 		kfree(mcelog);
+		mce_register_decode_chain(&dev_mcelog_nb);
 		return err;
 	}
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86/mce/dev-mcelog: Call mce_register_decode_chain() much earlier
  2021-08-19 22:44 [PATCH] x86/mce/dev-mcelog: Call mce_register_decode_chain() much earlier Tony Luck
@ 2021-08-20 12:28 ` Borislav Petkov
  2021-08-20 14:43   ` Luck, Tony
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2021-08-20 12:28 UTC (permalink / raw)
  To: Tony Luck; +Cc: x86, linux-edac, linux-kernel, Sumanth Kamatala

On Thu, Aug 19, 2021 at 03:44:52PM -0700, Tony Luck wrote:
> which made sure that the logs were not lost completely by printing
> to the console. But parsing console logs is error prone. Users
> of /dev/mcelog should expect to find any early errors logged to
> standard places.

Yes, and for that matter, *all* consumers which register on the decoding
chain should get a chance to look at those records...

> Split the initialization code in dev-mcelog.c into:
> 1) an "early" part that registers for mce notifications. Call this
> directly from mcheck_init() because early_initcall() is still too late.
> This allocation is too early for kzalloc() so use memblock_alloc().
> 2) "late" part that registers the /dev/mcelog character device.

... but this looks like a hack to me: why aren't we adding those early
records to the gen_pool and kick the work to consume them *only* *after*
all consumers have been registered properly and everything is up and
running?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86/mce/dev-mcelog: Call mce_register_decode_chain() much earlier
  2021-08-20 12:28 ` Borislav Petkov
@ 2021-08-20 14:43   ` Luck, Tony
  2021-08-20 15:48     ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Luck, Tony @ 2021-08-20 14:43 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, linux-edac, linux-kernel, Sumanth Kamatala

On Fri, Aug 20, 2021 at 02:28:45PM +0200, Borislav Petkov wrote:
> On Thu, Aug 19, 2021 at 03:44:52PM -0700, Tony Luck wrote:
> > which made sure that the logs were not lost completely by printing
> > to the console. But parsing console logs is error prone. Users
> > of /dev/mcelog should expect to find any early errors logged to
> > standard places.
> 
> Yes, and for that matter, *all* consumers which register on the decoding
> chain should get a chance to look at those records...
> 
> > Split the initialization code in dev-mcelog.c into:
> > 1) an "early" part that registers for mce notifications. Call this
> > directly from mcheck_init() because early_initcall() is still too late.
> > This allocation is too early for kzalloc() so use memblock_alloc().
> > 2) "late" part that registers the /dev/mcelog character device.
> 
> ... but this looks like a hack to me: why aren't we adding those early
> records to the gen_pool and kick the work to consume them *only* *after*
> all consumers have been registered properly and everything is up and
> running?

How can the kernel tell that all consumers have registered? Is there
some new kernel crystal ball functionality that can predict that an
EDAC driver module is going to be loaded at some point in the future
when user space is up and running :-)

I think the best we could do would be to set a timer for some point
far enough out (one minute?, two minutes?) to give a chance for
modules to load. But this seems even more hacky ... I have no idea
how much time is enough? In this particular case we know that the
system crashed before ... maybe the file systems are going to need
a fsck(8) before modules are loaded?

-Tony

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86/mce/dev-mcelog: Call mce_register_decode_chain() much earlier
  2021-08-20 14:43   ` Luck, Tony
@ 2021-08-20 15:48     ` Borislav Petkov
  2021-08-23 18:45       ` Luck, Tony
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2021-08-20 15:48 UTC (permalink / raw)
  To: Luck, Tony; +Cc: x86, linux-edac, linux-kernel, Sumanth Kamatala

On Fri, Aug 20, 2021 at 07:43:14AM -0700, Luck, Tony wrote:
> How can the kernel tell that all consumers have registered? Is there
> some new kernel crystal ball functionality that can predict that an
> EDAC driver module is going to be loaded at some point in the future
> when user space is up and running :-)

The crystal ball is called mcheck_late_init(). There's even:

        /*
         * Flush out everything that has been logged during early boot, now that
         * everything has been initialized (workqueues, decoders, ...).
         */
        mce_schedule_work();

in there. That thing is late_initcall() and by that time mcelog should
have been registered. And I wonder why isn't that working as expected...

> I think the best we could do would be to set a timer for some point
> far enough out (one minute?, two minutes?) to give a chance for
> modules to load.

Forget modules - only the built-in stuff. We cannot be waiting
indefinitely until someone loads mcelog for decoding.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86/mce/dev-mcelog: Call mce_register_decode_chain() much earlier
  2021-08-20 15:48     ` Borislav Petkov
@ 2021-08-23 18:45       ` Luck, Tony
  2021-08-23 20:41         ` [PATCH v2] x86/mce: Defer processing early errors until mcheck_late_init() Luck, Tony
  0 siblings, 1 reply; 10+ messages in thread
From: Luck, Tony @ 2021-08-23 18:45 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, linux-edac, linux-kernel, Sumanth Kamatala

On Fri, Aug 20, 2021 at 05:48:21PM +0200, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 07:43:14AM -0700, Luck, Tony wrote:
> > How can the kernel tell that all consumers have registered? Is there
> > some new kernel crystal ball functionality that can predict that an
> > EDAC driver module is going to be loaded at some point in the future
> > when user space is up and running :-)
> 
> The crystal ball is called mcheck_late_init(). There's even:
> 
>         /*
>          * Flush out everything that has been logged during early boot, now that
>          * everything has been initialized (workqueues, decoders, ...).
>          */
>         mce_schedule_work();
> 
> in there. That thing is late_initcall() and by that time mcelog should
> have been registered. And I wonder why isn't that working as expected...
> 
> > I think the best we could do would be to set a timer for some point
> > far enough out (one minute?, two minutes?) to give a chance for
> > modules to load.
> 
> Forget modules - only the built-in stuff. We cannot be waiting
> indefinitely until someone loads mcelog for decoding.

I added some traces:

$ dmesg | grep mce:
[    0.033648] mce: mce_register_decode_chain fn=mce_early_notifier+0x0/0x50 pri=6
[    0.033655] mce: mce_register_decode_chain fn=uc_decode_notifier+0x0/0xd0 pri=5
[    0.033659] mce: mce_register_decode_chain fn=mce_default_notifier+0x0/0x30 pri=0
[    4.392631] mce: [Hardware Error]: Machine check events logged
[    4.393356] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 0: a000000000004321
[    4.395352] mce: [Hardware Error]: TSC 0
[    4.396352] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1629743651 SOCKET 0 APIC 0 microcode b000019
[   15.172861] mce: mce_register_decode_chain fn=dev_mce_log+0x0/0x110 pri=1
[   15.192101] mce: mcheck_late_init: calling mce_schedule_work()
[   31.618245] mce: mce_register_decode_chain fn=sbridge_mce_check_error+0x0/0x92 [sb_edac] pri=2

So you are right that mcheck_late_init() is called after dev_mce_log()
is registered.  But it seems someone kicks the queue long before that
happens.  Probably this:

void mce_log(struct mce *m)
{
        if (!mce_gen_pool_add(m))
                irq_work_queue(&mce_irq_work);
}

Could we add a flag:

static bool mce_ready_to_rock; // better name needed :-)

Which gets set in mcheck_late_init(). Then mce_log() becomes:

void mce_log(struct mce *m)
{
        if (!mce_gen_pool_add(m) && mce_ready_to_rock)
                irq_work_queue(&mce_irq_work);
}

-Tony

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] x86/mce: Defer processing early errors until mcheck_late_init()
  2021-08-23 18:45       ` Luck, Tony
@ 2021-08-23 20:41         ` Luck, Tony
  2021-08-23 20:51           ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Luck, Tony @ 2021-08-23 20:41 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, linux-edac, linux-kernel, Sumanth Kamatala

When a fatal machine check results in a system reset, Linux does
not clear the error(s) from machine check bank(s).

Hardware preserves the machine check banks across a warm reset.

During initialization of the kernel after the reboot, Linux reads,
logs, and clears all machine check banks.

But there is a problem. In:
commit 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
the call to mce_register_decode_chain() moved later in the boot sequence.
This means that /dev/mcelog doesn't see those early error logs.

This was partially fixed by:
commit cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")

which made sure that the logs were not lost completely by printing
to the console. But parsing console logs is error prone. Users
of /dev/mcelog should expect to find any early errors logged to
standard places.

Delay processing logs until after all built-in code has had a chance
to register on the mce notifier chain (modules are still out of luck,
there's not way to know how long to wait for those to load).

Fixes: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Reported-by: Sumanth Kamatala <skamatala@juniper.net>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aadc085..593af202f586 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -129,6 +129,8 @@ static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
  */
 BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
 
+static bool mce_init_complete;
+
 /* Do initial initialization of a struct mce */
 noinstr void mce_setup(struct mce *m)
 {
@@ -155,7 +157,7 @@ EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 
 void mce_log(struct mce *m)
 {
-	if (!mce_gen_pool_add(m))
+	if (!mce_gen_pool_add(m) && mce_init_complete)
 		irq_work_queue(&mce_irq_work);
 }
 EXPORT_SYMBOL_GPL(mce_log);
@@ -2771,6 +2773,8 @@ static int __init mcheck_late_init(void)
 
 	mcheck_debugfs_init();
 
+	mce_init_complete = true;
+
 	/*
 	 * Flush out everything that has been logged during early boot, now that
 	 * everything has been initialized (workqueues, decoders, ...).
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] x86/mce: Defer processing early errors until mcheck_late_init()
  2021-08-23 20:41         ` [PATCH v2] x86/mce: Defer processing early errors until mcheck_late_init() Luck, Tony
@ 2021-08-23 20:51           ` Borislav Petkov
  2021-08-23 21:41             ` Luck, Tony
  2021-08-24  0:31             ` [PATCH v3] x86/mce: Defer processing of early errors Luck, Tony
  0 siblings, 2 replies; 10+ messages in thread
From: Borislav Petkov @ 2021-08-23 20:51 UTC (permalink / raw)
  To: Luck, Tony; +Cc: x86, linux-edac, linux-kernel, Sumanth Kamatala

On Mon, Aug 23, 2021 at 01:41:22PM -0700, Luck, Tony wrote:
>  arch/x86/kernel/cpu/mce/core.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)

I actually had a different idea in mind, considering the fact that some
machinery to only log the early MCEs is already there. And this fits
more naturally in the flow and doesn't need a bool switch.

Hmmm?

---
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 0607ec4f5091..9b13cca74f65 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -265,6 +265,7 @@ enum mcp_flags {
 	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
 	MCP_UC		= BIT(1),	/* log uncorrected errors */
 	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
+	MCP_LOG_ONLY	= BIT(3),	/* log only */
 };
 bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
 
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aadc085..bb691503c2e4 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -817,7 +817,10 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
 			goto clear_it;
 
-		mce_log(&m);
+		if (flags & MCP_LOG_ONLY)
+			mce_gen_pool_add(&m);
+		else
+			mce_log(&m);
 
 clear_it:
 		/*
@@ -1639,10 +1642,12 @@ static void __mcheck_cpu_init_generic(void)
 		m_fl = MCP_DONTLOG;
 
 	/*
-	 * Log the machine checks left over from the previous reset.
+	 * Log the machine checks left over from the previous reset. Log them
+	 * only, do not start processing them. That will happen in mcheck_late_init()
+	 * when all consumers have been registered on the notifier chain.
 	 */
 	bitmap_fill(all_banks, MAX_NR_BANKS);
-	machine_check_poll(MCP_UC | m_fl, &all_banks);
+	machine_check_poll(MCP_UC | MCP_LOG_ONLY | m_fl, &all_banks);
 
 	cr4_set_bits(X86_CR4_MCE);
 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* RE: [PATCH v2] x86/mce: Defer processing early errors until mcheck_late_init()
  2021-08-23 20:51           ` Borislav Petkov
@ 2021-08-23 21:41             ` Luck, Tony
  2021-08-24  0:31             ` [PATCH v3] x86/mce: Defer processing of early errors Luck, Tony
  1 sibling, 0 replies; 10+ messages in thread
From: Luck, Tony @ 2021-08-23 21:41 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, linux-edac, linux-kernel, Sumanth Kamatala

+	MCP_LOG_ONLY	= BIT(3),	/* log only */

That looks nice ... except for the name & comment here. They don't
really capture what this flag bit does.

Maybe

	MCP_QUEUE_LOG	= BIT(3),	/* Just queue to genpool */

-Tony

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3] x86/mce: Defer processing of early errors
  2021-08-23 20:51           ` Borislav Petkov
  2021-08-23 21:41             ` Luck, Tony
@ 2021-08-24  0:31             ` Luck, Tony
  2021-08-24  8:44               ` [tip: ras/core] " tip-bot2 for Borislav Petkov
  1 sibling, 1 reply; 10+ messages in thread
From: Luck, Tony @ 2021-08-24  0:31 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, linux-edac, linux-kernel, Sumanth Kamatala

From: Borislav Petkov <bp@alien8.de>

When a fatal machine check results in a system reset, Linux does
not clear the error(s) from machine check bank(s).

Hardware preserves the machine check banks across a warm reset.

During initialization of the kernel after the reboot, Linux reads,
logs, and clears all machine check banks.

But there is a problem. In:
commit 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
the call to mce_register_decode_chain() moved later in the boot sequence.
This means that /dev/mcelog doesn't see those early error logs.

This was partially fixed by:
commit cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")

which made sure that the logs were not lost completely by printing
to the console. But parsing console logs is error prone. Users
of /dev/mcelog should expect to find any early errors logged to
standard places.

Add a new flag MCP_QUEUE_LOG to machine_check_poll() to be used
in early machine check initialization to indicate that any errors
found should just be queued to genpool. When mcheck_late_init() is
called it will call mce_schedule_work() to actually log any errors
queued in the genpool

Fixes: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Reported-by: Sumanth Kamatala <skamatala@juniper.net>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[Boris: Your code (with one name change) my commit log]
---
 arch/x86/include/asm/mce.h     |  1 +
 arch/x86/kernel/cpu/mce/core.c | 11 ++++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 0607ec4f5091..da9321548f6f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -265,6 +265,7 @@ enum mcp_flags {
 	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
 	MCP_UC		= BIT(1),	/* log uncorrected errors */
 	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
+	MCP_QUEUE_LOG	= BIT(3),	/* only queue to genpool */
 };
 bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
 
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aadc085..8cb7816d03b4 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -817,7 +817,10 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
 			goto clear_it;
 
-		mce_log(&m);
+		if (flags & MCP_QUEUE_LOG)
+			mce_gen_pool_add(&m);
+		else
+			mce_log(&m);
 
 clear_it:
 		/*
@@ -1639,10 +1642,12 @@ static void __mcheck_cpu_init_generic(void)
 		m_fl = MCP_DONTLOG;
 
 	/*
-	 * Log the machine checks left over from the previous reset.
+	 * Log the machine checks left over from the previous reset. Log them
+	 * only, do not start processing them. That will happen in mcheck_late_init()
+	 * when all consumers have been registered on the notifier chain.
 	 */
 	bitmap_fill(all_banks, MAX_NR_BANKS);
-	machine_check_poll(MCP_UC | m_fl, &all_banks);
+	machine_check_poll(MCP_UC | MCP_QUEUE_LOG | m_fl, &all_banks);
 
 	cr4_set_bits(X86_CR4_MCE);
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip: ras/core] x86/mce: Defer processing of early errors
  2021-08-24  0:31             ` [PATCH v3] x86/mce: Defer processing of early errors Luck, Tony
@ 2021-08-24  8:44               ` tip-bot2 for Borislav Petkov
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot2 for Borislav Petkov @ 2021-08-24  8:44 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sumanth Kamatala, Borislav Petkov, Tony Luck, x86, linux-kernel

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     3bff147b187d5dfccfca1ee231b0761a89f1eff5
Gitweb:        https://git.kernel.org/tip/3bff147b187d5dfccfca1ee231b0761a89f1eff5
Author:        Borislav Petkov <bp@alien8.de>
AuthorDate:    Mon, 23 Aug 2021 17:31:29 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 24 Aug 2021 10:40:58 +02:00

x86/mce: Defer processing of early errors

When a fatal machine check results in a system reset, Linux does not
clear the error(s) from machine check bank(s) - hardware preserves the
machine check banks across a warm reset.

During initialization of the kernel after the reboot, Linux reads, logs,
and clears all machine check banks.

But there is a problem. In:

  5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")

the call to mce_register_decode_chain() moved later in the boot
sequence. This means that /dev/mcelog doesn't see those early error
logs.

This was partially fixed by:

  cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")

which made sure that the logs were not lost completely by printing
to the console. But parsing console logs is error prone. Users of
/dev/mcelog should expect to find any early errors logged to standard
places.

Add a new flag MCP_QUEUE_LOG to machine_check_poll() to be used in early
machine check initialization to indicate that any errors found should
just be queued to genpool. When mcheck_late_init() is called it will
call mce_schedule_work() to actually log and flush any errors queued in
the genpool.

 [ Based on an original patch, commit message by and completely
   productized by Tony Luck. ]

Fixes: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Reported-by: Sumanth Kamatala <skamatala@juniper.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210824003129.GA1642753@agluck-desk2.amr.corp.intel.com
---
 arch/x86/include/asm/mce.h     |  1 +
 arch/x86/kernel/cpu/mce/core.c | 11 ++++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 0607ec4..da93215 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -265,6 +265,7 @@ enum mcp_flags {
 	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
 	MCP_UC		= BIT(1),	/* log uncorrected errors */
 	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
+	MCP_QUEUE_LOG	= BIT(3),	/* only queue to genpool */
 };
 bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
 
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aa..8cb7816 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -817,7 +817,10 @@ log_it:
 		if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
 			goto clear_it;
 
-		mce_log(&m);
+		if (flags & MCP_QUEUE_LOG)
+			mce_gen_pool_add(&m);
+		else
+			mce_log(&m);
 
 clear_it:
 		/*
@@ -1639,10 +1642,12 @@ static void __mcheck_cpu_init_generic(void)
 		m_fl = MCP_DONTLOG;
 
 	/*
-	 * Log the machine checks left over from the previous reset.
+	 * Log the machine checks left over from the previous reset. Log them
+	 * only, do not start processing them. That will happen in mcheck_late_init()
+	 * when all consumers have been registered on the notifier chain.
 	 */
 	bitmap_fill(all_banks, MAX_NR_BANKS);
-	machine_check_poll(MCP_UC | m_fl, &all_banks);
+	machine_check_poll(MCP_UC | MCP_QUEUE_LOG | m_fl, &all_banks);
 
 	cr4_set_bits(X86_CR4_MCE);
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-08-24  8:45 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-19 22:44 [PATCH] x86/mce/dev-mcelog: Call mce_register_decode_chain() much earlier Tony Luck
2021-08-20 12:28 ` Borislav Petkov
2021-08-20 14:43   ` Luck, Tony
2021-08-20 15:48     ` Borislav Petkov
2021-08-23 18:45       ` Luck, Tony
2021-08-23 20:41         ` [PATCH v2] x86/mce: Defer processing early errors until mcheck_late_init() Luck, Tony
2021-08-23 20:51           ` Borislav Petkov
2021-08-23 21:41             ` Luck, Tony
2021-08-24  0:31             ` [PATCH v3] x86/mce: Defer processing of early errors Luck, Tony
2021-08-24  8:44               ` [tip: ras/core] " tip-bot2 for Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).