linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/mce: Implement recovery for errors in TDX/SEAM non-root mode
@ 2024-04-08 18:09 Tony Luck
  2024-04-18 18:16 ` Yazen Ghannam
  0 siblings, 1 reply; 3+ messages in thread
From: Tony Luck @ 2024-04-08 18:09 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, linux-kernel, linux-edac, patches, Tony Luck

Machine check SMIs (MSMI) signaled during SEAM operation (typically
inside TDX guests), on a system with Intel eMCA enabled, might eventually
be reported to the kernel #MC handler with the saved RIP on the stack
pointing to the instruction in kernel code after the SEAMCALL instruction
that entered the SEAM operation. Linux currently says that is a fatal
error and shuts down.

There is a new bit in IA32_MCG_STATUS that, when set to 1, indicates
that the machine check didn't originally occur at that saved RIP, but
during SEAM non-root operation.

Add new entries to the severity table to detect this for both data load
and instruction fetch that set the severity to "AR" (action required).

Increase the width of the mcgmask/mcgres fields in "struct severity"
from unsigned char to unsigned short since the new bit is in position 12.

Action required for these errors is just mark the page as poisoned and
return from the machine check handler.

Backport note. Little value in backporting this patch to stable or LTS
kernels as this is only relevant with support for TDX, which I assume
won't be backported. But for anyone taking this to v6.1 or older, you
also need commit a51cbd0d86d3 ("x86/mce: Use severity table to handle
uncorrected errors in kernel")

Signed-off-by: Tony Luck <tony.luck@intel.com>

---
The SEAM_NR bit in IA32_MCG_STATUS hasn't yet made it into the Intel
Software Developers' Manual. But it is described in section 16.5.2
of "Intel(R) Trust Domain Extensions (Intel(R) TDX) Module Base
Architecture Specification" downloadable from:
https://cdrdv2.intel.com/v1/dl/getContent/733575
---
 arch/x86/include/asm/mce.h         |  2 ++
 arch/x86/kernel/cpu/mce/core.c     | 18 ++++++++++++++++++
 arch/x86/kernel/cpu/mce/severity.c | 16 ++++++++++++++--
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index de3118305838..dfd2e9699bd7 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -13,6 +13,7 @@
 #define MCG_CTL_P		BIT_ULL(8)   /* MCG_CTL register available */
 #define MCG_EXT_P		BIT_ULL(9)   /* Extended registers available */
 #define MCG_CMCI_P		BIT_ULL(10)  /* CMCI supported */
+#define MCG_SEAM_NR		BIT_ULL(12)  /* MCG_STATUS_SEAM_NR supported */
 #define MCG_EXT_CNT_MASK	0xff0000     /* Number of Extended registers */
 #define MCG_EXT_CNT_SHIFT	16
 #define MCG_EXT_CNT(c)		(((c) & MCG_EXT_CNT_MASK) >> MCG_EXT_CNT_SHIFT)
@@ -25,6 +26,7 @@
 #define MCG_STATUS_EIPV		BIT_ULL(1)   /* ip points to correct instruction */
 #define MCG_STATUS_MCIP		BIT_ULL(2)   /* machine check in progress */
 #define MCG_STATUS_LMCES	BIT_ULL(3)   /* LMCE signaled */
+#define MCG_STATUS_SEAM_NR	BIT_ULL(12)  /* Machine check inside SEAM non-root mode */
 
 /* MCG_EXT_CTL register defines */
 #define MCG_EXT_CTL_LMCE_EN	BIT_ULL(0) /* Enable LMCE */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 84d41be6d06b..771a9f183260 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1593,6 +1593,24 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		else
 			queue_task_work(&m, msg, kill_me_maybe);
 
+	} else if (m.mcgstatus & MCG_STATUS_SEAM_NR) {
+		/*
+		 * Saved RIP on stack makes it look like the machine check
+		 * was taken in the kernel on the instruction following
+		 * the entry to SEAM mode. But MCG_STATUS_SEAM_NR indicates
+		 * that the machine check was taken inside SEAM non-root
+		 * mode.  CPU core has already marked that guest as dead.
+		 * It is OK for the kernel to resume execution at the
+		 * apparent point of the machine check as the fault did
+		 * not occur there. Mark the page as poisoned so it won't
+		 * be added to free list when the guest is terminated.
+		 */
+		if (mce_usable_address(&m)) {
+			struct page *p = pfn_to_online_page(m.addr >> PAGE_SHIFT);
+
+			if (p)
+				SetPageHWPoison(p);
+		}
 	} else {
 		/*
 		 * Handle an MCE which has happened in kernel space but from
diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index c4477162c07d..fc8988cfe1c3 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -39,8 +39,8 @@ static struct severity {
 	u64 mask;
 	u64 result;
 	unsigned char sev;
-	unsigned char mcgmask;
-	unsigned char mcgres;
+	unsigned short mcgmask;
+	unsigned short mcgres;
 	unsigned char ser;
 	unsigned char context;
 	unsigned char excp;
@@ -173,6 +173,18 @@ static struct severity {
 		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
 		USER
 		),
+	MCESEV(
+		AR, "Data load error in SEAM non-root mode",
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
+		MCGMASK(MCG_STATUS_SEAM_NR, MCG_STATUS_SEAM_NR),
+		KERNEL
+		),
+	MCESEV(
+		AR, "Instruction fetch error in SEAM non-root mode",
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
+		MCGMASK(MCG_STATUS_SEAM_NR, MCG_STATUS_SEAM_NR),
+		KERNEL
+		),
 	MCESEV(
 		PANIC, "Data load in unrecoverable area of kernel",
 		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),

base-commit: fec50db7033ea478773b159e0e2efb135270e3b7
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] x86/mce: Implement recovery for errors in TDX/SEAM non-root mode
  2024-04-08 18:09 [PATCH] x86/mce: Implement recovery for errors in TDX/SEAM non-root mode Tony Luck
@ 2024-04-18 18:16 ` Yazen Ghannam
  2024-04-22 16:57   ` Luck, Tony
  0 siblings, 1 reply; 3+ messages in thread
From: Yazen Ghannam @ 2024-04-18 18:16 UTC (permalink / raw)
  To: Tony Luck, Borislav Petkov
  Cc: yazen.ghannam, x86, linux-kernel, linux-edac, patches

On 4/8/2024 2:09 PM, Tony Luck wrote:
> Machine check SMIs (MSMI) signaled during SEAM operation (typically
> inside TDX guests), on a system with Intel eMCA enabled, might eventually
> be reported to the kernel #MC handler with the saved RIP on the stack
> pointing to the instruction in kernel code after the SEAMCALL instruction
> that entered the SEAM operation. Linux currently says that is a fatal
> error and shuts down.
> 
> There is a new bit in IA32_MCG_STATUS that, when set to 1, indicates
> that the machine check didn't originally occur at that saved RIP, but
> during SEAM non-root operation.
> 
> Add new entries to the severity table to detect this for both data load
> and instruction fetch that set the severity to "AR" (action required).
> 
> Increase the width of the mcgmask/mcgres fields in "struct severity"
> from unsigned char to unsigned short since the new bit is in position 12.
> 
> Action required for these errors is just mark the page as poisoned and
> return from the machine check handler.
> 
> Backport note. Little value in backporting this patch to stable or LTS
> kernels as this is only relevant with support for TDX, which I assume
> won't be backported. But for anyone taking this to v6.1 or older, you
> also need commit a51cbd0d86d3 ("x86/mce: Use severity table to handle
> uncorrected errors in kernel")
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> 
> ---
> The SEAM_NR bit in IA32_MCG_STATUS hasn't yet made it into the Intel
> Software Developers' Manual. But it is described in section 16.5.2
> of "Intel(R) Trust Domain Extensions (Intel(R) TDX) Module Base
> Architecture Specification" downloadable from:
> https://cdrdv2.intel.com/v1/dl/getContent/733575
> ---
>  arch/x86/include/asm/mce.h         |  2 ++
>  arch/x86/kernel/cpu/mce/core.c     | 18 ++++++++++++++++++
>  arch/x86/kernel/cpu/mce/severity.c | 16 ++++++++++++++--
>  3 files changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index de3118305838..dfd2e9699bd7 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -13,6 +13,7 @@
>  #define MCG_CTL_P		BIT_ULL(8)   /* MCG_CTL register available */
>  #define MCG_EXT_P		BIT_ULL(9)   /* Extended registers available */
>  #define MCG_CMCI_P		BIT_ULL(10)  /* CMCI supported */
> +#define MCG_SEAM_NR		BIT_ULL(12)  /* MCG_STATUS_SEAM_NR supported */
>  #define MCG_EXT_CNT_MASK	0xff0000     /* Number of Extended registers */
>  #define MCG_EXT_CNT_SHIFT	16
>  #define MCG_EXT_CNT(c)		(((c) & MCG_EXT_CNT_MASK) >> MCG_EXT_CNT_SHIFT)
> @@ -25,6 +26,7 @@
>  #define MCG_STATUS_EIPV		BIT_ULL(1)   /* ip points to correct instruction */
>  #define MCG_STATUS_MCIP		BIT_ULL(2)   /* machine check in progress */
>  #define MCG_STATUS_LMCES	BIT_ULL(3)   /* LMCE signaled */
> +#define MCG_STATUS_SEAM_NR	BIT_ULL(12)  /* Machine check inside SEAM non-root mode */
>  
>  /* MCG_EXT_CTL register defines */
>  #define MCG_EXT_CTL_LMCE_EN	BIT_ULL(0) /* Enable LMCE */
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 84d41be6d06b..771a9f183260 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1593,6 +1593,24 @@ noinstr void do_machine_check(struct pt_regs *regs)
>  		else
>  			queue_task_work(&m, msg, kill_me_maybe);
>  
> +	} else if (m.mcgstatus & MCG_STATUS_SEAM_NR) {

MCG_CAP[12] (MCG_SEAM_NR) should be checked first, correct? This could be a
new mce_vendor_flags field set during MCA init.

> +		/*
> +		 * Saved RIP on stack makes it look like the machine check
> +		 * was taken in the kernel on the instruction following
> +		 * the entry to SEAM mode. But MCG_STATUS_SEAM_NR indicates
> +		 * that the machine check was taken inside SEAM non-root
> +		 * mode.  CPU core has already marked that guest as dead.
> +		 * It is OK for the kernel to resume execution at the
> +		 * apparent point of the machine check as the fault did
> +		 * not occur there. Mark the page as poisoned so it won't
> +		 * be added to free list when the guest is terminated.
> +		 */
> +		if (mce_usable_address(&m)) {
> +			struct page *p = pfn_to_online_page(m.addr >> PAGE_SHIFT);
> +
> +			if (p)
> +				SetPageHWPoison(p);
> +		}

I think this is okay, and it could even be more generalized as a "page
offline" action.

Here's some WIP for a generic MCE "action table":
https://github.com/AMDESE/linux/commit/cf0b8a97240ab

This is based on the short discussion here:
https://lore.kernel.org/linux-edac/ZD7gPkfWQeEeEfBe@agluck-desk3.sc.intel.com/

Basically, all the status bits would be checked in mce_severity() and the
appropriate action is set to be done later.

This would be future work, of course. What do you think?

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 3+ messages in thread

* RE: [PATCH] x86/mce: Implement recovery for errors in TDX/SEAM non-root mode
  2024-04-18 18:16 ` Yazen Ghannam
@ 2024-04-22 16:57   ` Luck, Tony
  0 siblings, 0 replies; 3+ messages in thread
From: Luck, Tony @ 2024-04-22 16:57 UTC (permalink / raw)
  To: Yazen Ghannam, Borislav Petkov; +Cc: x86, linux-kernel, linux-edac, patches

> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -1593,6 +1593,24 @@ noinstr void do_machine_check(struct pt_regs *regs)
> >             else
> >                     queue_task_work(&m, msg, kill_me_maybe);
> >
> > +   } else if (m.mcgstatus & MCG_STATUS_SEAM_NR) {
>
> MCG_CAP[12] (MCG_SEAM_NR) should be checked first, correct? This could be a
> new mce_vendor_flags field set during MCA init.

For absolute architectural purity you are right. But the MCG_SEAM_NR bit has never been
used in IA32_MCG_STATUS, so I felt it would just be extra noise in a busy piece of code
to add it.

> > +           /*
> > +            * Saved RIP on stack makes it look like the machine check
> > +            * was taken in the kernel on the instruction following
> > +            * the entry to SEAM mode. But MCG_STATUS_SEAM_NR indicates
> > +            * that the machine check was taken inside SEAM non-root
> > +            * mode.  CPU core has already marked that guest as dead.
> > +            * It is OK for the kernel to resume execution at the
> > +            * apparent point of the machine check as the fault did
> > +            * not occur there. Mark the page as poisoned so it won't
> > +            * be added to free list when the guest is terminated.
> > +            */
> > +           if (mce_usable_address(&m)) {
> > +                   struct page *p = pfn_to_online_page(m.addr >> PAGE_SHIFT);
> > +
> > +                   if (p)
> > +                           SetPageHWPoison(p);
> > +           }
>
> I think this is okay, and it could even be more generalized as a "page
> offline" action.
>
> Here's some WIP for a generic MCE "action table":
> https://github.com/AMDESE/linux/commit/cf0b8a97240ab
> This is based on the short discussion here:
> https://lore.kernel.org/linux-edac/ZD7gPkfWQeEeEfBe@agluck-desk3.sc.intel.com/
>
> Basically, all the status bits would be checked in mce_severity() and the
> appropriate action is set to be done later.
>
> This would be future work, of course. What do you think?

That looks like a very nice start to tackle this. Much appreciated as I'm
in the beginning steps to figure out some other SRAR recovery actions.
Having a way to keep track of which action to take will make everything
much cleaner.

It would also solve the " architectural purity" issue above as the check for the
MCG_CAP bit could be imbedded in the severity -> action code. So do_machine_check()
would just have a switch(action) { case MCE_DO_THIS: ... }

-Tony

-Tony

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-04-22 16:57 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-08 18:09 [PATCH] x86/mce: Implement recovery for errors in TDX/SEAM non-root mode Tony Luck
2024-04-18 18:16 ` Yazen Ghannam
2024-04-22 16:57   ` Luck, Tony

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).