[RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
@ 2020-09-06 21:21 Borislav Petkov
  2020-09-07 20:06 ` Luck, Tony
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Borislav Petkov @ 2020-09-06 21:21 UTC (permalink / raw)
  To: x86-ml, Tony Luck; +Cc: lkml

Hi,

Ingo and I talked about this thing this morning and tglx has had it on
his to-fix list too so here's a first attempt at it.

Below is just a brain dump of what we talked about so let's start with
it and see where it would take us.

Thx.

---

From: Borislav Petkov <bp@suse.de>

... without any exception handling and tracing.

If an exception needs to be handled while reading an MSR - which is in
most of the cases caused by a #GP on a non-existent MSR - then this
is most likely the incarnation of a BIOS or a hardware bug. Such bug
violates the architectural guarantee that MSR banks are present with all
MSRs belonging to them.

The proper fix belongs in the hardware/firmware - not in the kernel.

Handling exceptions while in #MC and while an NMI is being handled would
cause the nasty NMI nesting issue because of the shortcoming of IRET
of reenabling NMIs when executed. And the machine is in an #MC context
already so <Deity> be at its side.

Tracing MSR accesses while in #MC is another no-no due to tracing being
inherently a bad idea in atomic context:

  vmlinux.o: warning: objtool: do_machine_check()+0x4a: call to mce_rdmsrl() leaves .noinstr.text section

so remove all that "additional" functionality from mce_rdmsrl() and
concentrate on solely reading the MSRs.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/mce/core.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0ba24dfffdb2..14ebdf3e22f3 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -376,7 +376,7 @@ static int msr_to_offset(u32 msr)
 /* MSR access wrappers used for error injection */
 static u64 mce_rdmsrl(u32 msr)
 {
-	u64 v;
+	DECLARE_ARGS(val, low, high);

 	if (__this_cpu_read(injectm.finished)) {
 		int offset = msr_to_offset(msr);
@@ -386,17 +386,13 @@ static u64 mce_rdmsrl(u32 msr)
 		return *(u64 *)((char *)this_cpu_ptr(&injectm) + offset);
 	}

-	if (rdmsrl_safe(msr, &v)) {
-		WARN_ONCE(1, "mce: Unable to read MSR 0x%x!\n", msr);
-		/*
-		 * Return zero in case the access faulted. This should
-		 * not happen normally but can happen if the CPU does
-		 * something weird, or if the code is buggy.
-		 */
-		v = 0;
-	}
+	/*
+	 * RDMSR on MCA MSRs should not fault. If they do, this is very much an
+	 * architectural violation and needs to be reported to hw vendor.
+	 */
+	asm volatile("rdmsr" : EAX_EDX_RET(val, low, high) : "c" (msr));

-	return v;
+	return EAX_EDX_VAL(val, low, high);
 }

 static void mce_wrmsrl(u32 msr, u64 v)
-- 
2.21.0

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-06 21:21 [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only Borislav Petkov
@ 2020-09-07 20:06 ` Luck, Tony
  2020-09-08  9:46   ` Borislav Petkov
  2020-09-07 20:16 ` Andy Lutomirski
  2020-09-11  9:47 ` [tip: ras/core] x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR tip-bot2 for Borislav Petkov
  2 siblings, 1 reply; 17+ messages in thread
From: Luck, Tony @ 2020-09-07 20:06 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86-ml, lkml

On Sun, Sep 06, 2020 at 11:21:30PM +0200, Borislav Petkov wrote:
> Hi,
> 
> Ingo and I talked about this thing this morning and tglx has had it on
> his to-fix list too so here's a first attempt at it.
> 
> Below is just a brain dump of what we talked about so let's start with
> it and see where it would take us.

This very much looks to be the right thing to do. Returning "0" from
a failed RDMSR in MCE handling might be a much worse thing to do than
crashing. It papers over a serious error and the system might keep
running using corrupted data.

Digging into the history it seems that this rdmsrl_safe() was added for
a possible bug on a pentiumIII back in 2009 that was eventually closed
as "unreproducible".

Do we have three distinct classes of calls to RDMSR now?

1) This case. Architecture checks have been made. This call can't
fail. We are calling from a tricky state (on IST) so no tracing
allowed.

2) Normal case ... architecture checks have been done so shouldn't
fail. Safe state for tracing etc. Use rdmsrl().

3) Probe case. Architecture didn't provide definitive enumeration,
so we might take a #GP. Use the rdmsrl_safe().

> +	/*
> +	 * RDMSR on MCA MSRs should not fault. If they do, this is very much an
> +	 * architectural violation and needs to be reported to hw vendor.
> +	 */
> +	asm volatile("rdmsr" : EAX_EDX_RET(val, low, high) : "c" (msr));

If mce subsystem is the only instance of case "1", then this
inline is OK.  If there are others then rather than inlining
the asm here, we should have some new rdmsrl_notrace() inline
function declared for all to use.

Needs a:

Fixes: 11868a2dc4f5 ("x86: mce: Use safer ways to access MCE registers")

since this is undoing an earlier change.

-Tony

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-07 20:06 ` Luck, Tony
@ 2020-09-08  9:46   ` Borislav Petkov
  2020-09-08 10:08     ` Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2020-09-08  9:46 UTC (permalink / raw)
  To: Luck, Tony; +Cc: x86-ml, lkml

On Mon, Sep 07, 2020 at 01:06:22PM -0700, Luck, Tony wrote:
> Digging into the history it seems that this rdmsrl_safe() was added for
> a possible bug on a pentiumIII back in 2009 that was eventually closed
> as "unreproducible".

So this is the $ 10^6 question so far: if I can assume that those two
boxes we found in bug reports which triggered this warn because they
failed the MSR read even though those MSRs are MCA architectural, if I
can assume that those are a one-off thing and that we won't support such
and we'd let machines like that crash and burn in testing, hopefully
before they get released to the public, then, we can do the plain MSR
read there.

No, tracing, no exception handling.

HOWEVER!

*If* that non-existent MSR read happens after all, then the machine
would #GP.

Our #GP handler, however, doesn't panic unless panic_on_oops, as Andy
pointed out on IRC yesterday and this makes a nasty situation even
worse.

Now, even if it did panic, reporting this would not be very
user-friendly because people would have to look at the Code: output of
the splat to see which insn rIP pointed to and then figure out what
happened.

Now, we can look at the code bytes around rIP ourselves, in the #GP
handler, and decide to panic if 0F 32, i.e., RDMSR opcode.

But that is ewww ugly to say the least.

So, Andy suggested we do a simple .fixup so that when the RDMSR fails,
in the fixup we panic directly.

The problem there is:

NMI -> #MC -> #GP ... IRET

Another #MC won't happen after we IRET because MSR_IA32_MCG_STATUS is
not cleared yet but the state is mightily confused and you don't want to
run code on that box anymore but panic and not move.

> Do we have three distinct classes of calls to RDMSR now?
> 
> 
> 1) This case. Architecture checks have been made. This call can't
> fail. We are calling from a tricky state (on IST) so no tracing
> allowed.

*If* we can hold hw people to the architectural guarantee, then this
would be the cleanest fix.

> 2) Normal case ... architecture checks have been done so shouldn't
> fail. Safe state for tracing etc. Use rdmsrl().

Yeah, but see above.

> 3) Probe case. Architecture didn't provide definitive enumeration,
> so we might take a #GP. Use the rdmsrl_safe().

... and that is bad because above *and* tracing can't handle atomic
context properly.

> If mce subsystem is the only instance of case "1", then this
> inline is OK.  If there are others then rather than inlining
> the asm here, we should have some new rdmsrl_notrace() inline
> function declared for all to use.

Yeah, Ingo suggested the same thing but let's solve the bigger issue
first.

> Needs a:
> 
> Fixes: 11868a2dc4f5 ("x86: mce: Use safer ways to access MCE registers")
> 
> since this is undoing an earlier change.

Ok.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-08  9:46   ` Borislav Petkov
@ 2020-09-08 10:08     ` Borislav Petkov
  2020-09-08 15:07       ` Luck, Tony
  0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2020-09-08 10:08 UTC (permalink / raw)
  To: Luck, Tony; +Cc: x86-ml, lkml

On Tue, Sep 08, 2020 at 11:46:50AM +0200, Borislav Petkov wrote:
> So, Andy suggested we do a simple .fixup so that when the RDMSR fails,
> in the fixup we panic directly.

Ok, so I think this is what Andy meant last night and PeterZ just
suggested it too:

We do a:

	 _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_panic)

which panics straight in the #GP handler and avoids the IRET.

I mean, we want to panic there because architectural violation and we
won't have to deal with the IRET...

Hohumm, sounds like the right thing to do in the face of the brokenness
of this magnificent architecture.</maxed out sarcasm>.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-08 10:08     ` Borislav Petkov
@ 2020-09-08 15:07       ` Luck, Tony
  2020-09-08 15:25         ` Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: Luck, Tony @ 2020-09-08 15:07 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86-ml, lkml

> Ok, so I think this is what Andy meant last night and PeterZ just
> suggested it too:
>
> We do a:
>
>	 _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_panic)
>
> which panics straight in the #GP handler and avoids the IRET.

We can even get a nice diagnostic message since the handler
has access to "regs". It can print which MSR (regs->cx) and
where it happened (regs->ip).

Which sounds like you might want a specific ex_handler_rdmsr
function rather than a generic ex_handler_panic.

Maybe same deal for wrmsr() too? That would also print edx:eax
so you could see what was being written.

-Tony

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-08 15:07       ` Luck, Tony
@ 2020-09-08 15:25         ` Borislav Petkov
  2020-09-09 11:30           ` Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2020-09-08 15:25 UTC (permalink / raw)
  To: Luck, Tony; +Cc: x86-ml, lkml

On Tue, Sep 08, 2020 at 03:07:05PM +0000, Luck, Tony wrote:
> We can even get a nice diagnostic message since the handler
> has access to "regs". It can print which MSR (regs->cx) and
> where it happened (regs->ip).
> 
> Which sounds like you might want a specific ex_handler_rdmsr
> function rather than a generic ex_handler_panic.
> 
> Maybe same deal for wrmsr() too? That would also print edx:eax
> so you could see what was being written.

Sounds good. Lemme try to come up with something these days.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-08 15:25         ` Borislav Petkov
@ 2020-09-09 11:30           ` Borislav Petkov
  2020-09-09 18:20             ` Luck, Tony
  0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2020-09-09 11:30 UTC (permalink / raw)
  To: Luck, Tony; +Cc: x86-ml, lkml

I guess something as straightforward as this:

---
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0ba24dfffdb2..9893caaf2696 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -373,10 +373,27 @@ static int msr_to_offset(u32 msr)
 	return -1;
 }
 
+__visible bool ex_handler_rdmsr_fault(const struct exception_table_entry *fixup,
+				      struct pt_regs *regs, int trapnr,
+				      unsigned long error_code,
+				      unsigned long fault_addr)
+{
+	if (pr_warn_once("MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n",
+			 (unsigned int)regs->cx, regs->ip, (void *)regs->ip))
+		show_stack_regs(regs);
+
+	panic("MCA Architectural violation!\n");
+
+	while (true)
+		cpu_relax();
+
+	return true;
+}
+
 /* MSR access wrappers used for error injection */
 static u64 mce_rdmsrl(u32 msr)
 {
-	u64 v;
+	DECLARE_ARGS(val, low, high);
 
 	if (__this_cpu_read(injectm.finished)) {
 		int offset = msr_to_offset(msr);
@@ -386,21 +403,42 @@ static u64 mce_rdmsrl(u32 msr)
 		return *(u64 *)((char *)this_cpu_ptr(&injectm) + offset);
 	}
 
-	if (rdmsrl_safe(msr, &v)) {
-		WARN_ONCE(1, "mce: Unable to read MSR 0x%x!\n", msr);
-		/*
-		 * Return zero in case the access faulted. This should
-		 * not happen normally but can happen if the CPU does
-		 * something weird, or if the code is buggy.
-		 */
-		v = 0;
-	}
+	/*
+	 * RDMSR on MCA MSRs should not fault. If they do, this is very much an
+	 * architectural violation and needs to be reported to hw vendor. Panic
+	 * the box to not allow any further progress.
+	 */
+	asm volatile("1: rdmsr\n"
+		     "2:\n"
+		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_rdmsr_fault)
+		     : EAX_EDX_RET(val, low, high) : "c" (msr));
 
-	return v;
+
+	return EAX_EDX_VAL(val, low, high);
+}
+
+__visible bool ex_handler_wrmsr_fault(const struct exception_table_entry *fixup,
+				      struct pt_regs *regs, int trapnr,
+				      unsigned long error_code,
+				      unsigned long fault_addr)
+{
+	if (pr_warn_once("MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pS)\n",
+			 (unsigned int)regs->cx, (unsigned int)regs->dx, (unsigned int)regs->ax,
+			 regs->ip, (void *)regs->ip))
+		show_stack_regs(regs);
+
+	panic("MCA Architectural violation!\n");
+
+	while (true)
+		cpu_relax();
+
+	return true;
 }
 
 static void mce_wrmsrl(u32 msr, u64 v)
 {
+	u32 low, high;
+
 	if (__this_cpu_read(injectm.finished)) {
 		int offset = msr_to_offset(msr);
 
@@ -408,7 +446,15 @@ static void mce_wrmsrl(u32 msr, u64 v)
 			*(u64 *)((char *)this_cpu_ptr(&injectm) + offset) = v;
 		return;
 	}
-	wrmsrl(msr, v);
+
+	low  = (u32)v;
+	high = (u32)(v >> 32);
+
+	/* See comment in mce_rdmsrl() */
+	asm volatile("1: wrmsr\n"
+		     "2:\n"
+		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_fault)
+		     : : "c" (msr), "a"(low), "d" (high) : "memory");
 }
 
 /*

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-09 11:30           ` Borislav Petkov
@ 2020-09-09 18:20             ` Luck, Tony
  2020-09-09 20:03               ` Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: Luck, Tony @ 2020-09-09 18:20 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86-ml, lkml

On Wed, Sep 09, 2020 at 01:30:22PM +0200, Borislav Petkov wrote:
> I guess something as straightforward as this:

Do we think there will be other places where we want this
MSR-or-die behaviour?  If there are, then most of this
belongs elsewhere from arch/x86/kernel/cpu/mce/core.c

> ---
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 0ba24dfffdb2..9893caaf2696 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -373,10 +373,27 @@ static int msr_to_offset(u32 msr)
>  	return -1;
>  }
>  
> +__visible bool ex_handler_rdmsr_fault(const struct exception_table_entry *fixup,
> +				      struct pt_regs *regs, int trapnr,
> +				      unsigned long error_code,
> +				      unsigned long fault_addr)
> +{
> +	if (pr_warn_once("MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n",

The "_once" version seems a little pointless when the next statement in the function
is "panic()".

"warn" seems understated for an error that is going to crash the system.
Just go for "pr_emerg()".

There seems no consistency on using "rIP" or "RIP" ... but I think "RIP"
is slightly ahead.

> +			 (unsigned int)regs->cx, regs->ip, (void *)regs->ip))
> +		show_stack_regs(regs);
> +
> +	panic("MCA Architectural violation!\n");

nitpick: I don't thing Architectural needs to be capitalized.

> +
> +	while (true)
> +		cpu_relax();

Ugh. Is this why you have warn_once() ... because panic might return?

Above comments also apply to the wrmsr path.

-Tony

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-09 18:20             ` Luck, Tony
@ 2020-09-09 20:03               ` Borislav Petkov
  2020-09-10 18:29                 ` Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2020-09-09 20:03 UTC (permalink / raw)
  To: Luck, Tony; +Cc: x86-ml, lkml

On Wed, Sep 09, 2020 at 11:20:51AM -0700, Luck, Tony wrote:
> Do we think there will be other places where we want this
> MSR-or-die behaviour?

MSR-or-die - I like that. That belongs on a T-shirt. :-)

> If there are, then most of this belongs elsewhere from
> arch/x86/kernel/cpu/mce/core.c

Yeah, I can't think of any other users needing this ATM. But if they do,
lifting those and moving them somewhere else is trivial.

> The "_once" version seems a little pointless when the next statement
> in the function is "panic()".

True, removed.

> "warn" seems understated for an error that is going to crash the system.
> Just go for "pr_emerg()".

Done.

> There seems no consistency on using "rIP" or "RIP" ... but I think "RIP"
> is slightly ahead.

The "r" in rIP means that it can be RIP or EIP and that code builds on
32-bit too.

> nitpick: I don't thing Architectural needs to be capitalized.

Done.

> Ugh. Is this why you have warn_once() ... because panic might return?

People tend to put all kinds of things in panic() and I want to make
sure that even if it returns for whatever reason, at some point in
the future, we don't make any further progress here. Think of it as a
paranoid precation of sorts...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-09 20:03               ` Borislav Petkov
@ 2020-09-10 18:29                 ` Borislav Petkov
  2020-09-10 18:38                   ` [PATCH -v2] x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR Borislav Petkov
  2020-09-10 18:42                   ` [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only Luck, Tony
  0 siblings, 2 replies; 17+ messages in thread
From: Borislav Petkov @ 2020-09-10 18:29 UTC (permalink / raw)
  To: Luck, Tony; +Cc: x86-ml, lkml

Ok, with all those changes, I don't think the following nice and juicy
message can be overlooked:

[   32.267830] mce: MSR access error: RDMSR from 0x1234 at rIP: 0xffffffff8102ed62 (mce_rdmsrl+0x12/0x50)
[   32.267838] Call Trace:
[   32.267838]  <#MC>
[   32.267838]  do_machine_check+0xbd/0x9f0
[   32.267839]  ? trigger_mce+0x7/0x10
[   32.267839]  exc_machine_check+0x77/0xd0
[   32.267840]  asm_exc_machine_check+0x19/0x30
[   32.267840] RIP: 0010:trigger_mce+0x7/0x10
[   32.267841] Code: c0 c3 90 0f 1f 44 00 00 48 8b 47 58 48 89 06 31 c0 c3 90 0f 1f 44 00 00 8b 47 44 48 89 06 31 c0 c3 66 90 0f 1f 44 00 00 cd 12 <c3> 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 cd f4 c3 0f 1f 84 00 00
[   32.267841] RSP: 0018:ffffc90000003fc0 EFLAGS: 00000002
[   32.267842] RAX: ffffffff81032940 RBX: 0000000000000000 RCX: ffff88807dce9e80
[   32.267843] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   32.267843] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   32.267844] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   32.267844] R13: ffff88807dce9e80 R14: 0000000000000000 R15: 0000000000000000
[   32.267845]  ? inj_extcpu_get+0x10/0x10
[   32.267845]  </#MC>
[   32.267845]  <IRQ>
[   32.267846]  flush_smp_call_function_queue+0xce/0x1b0
[   32.267846]  __sysvec_call_function_single+0x2c/0xd0
[   32.267846]  asm_call_on_stack+0x12/0x20
[   32.267847]  </IRQ>
[   32.267847]  sysvec_call_function_single+0x6e/0x80
[   32.267847]  asm_sysvec_call_function_single+0x12/0x20
[   32.267848] RIP: 0010:default_idle+0x13/0x20
[   32.267849] Code: 0f ae 38 0f ae f0 eb b9 fb eb e3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 e9 07 00 00 00 0f 00 2d c1 9c 5e 00 fb f4 <c3> cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 65 8b 15 04
[   32.267849] RSP: 0018:ffffffff82203ec0 EFLAGS: 00000212
[   32.267850] RAX: ffffffff81819730 RBX: 0000000000000000 RCX: 0000000000029d00
[   32.267851] RDX: 000000000007de7e RSI: 7ffffff880b4b40b RDI: ffffffff8220f880
[   32.267851] RBP: ffffffff8220f880 R08: 0000000473333333 R09: 00000049ae089af4
[   32.267852] R10: 000000000006fecd R11: 0000000000006f96 R12: ffffffff8220f880
[   32.267852] R13: ffffffff8220f880 R14: 0000000000000000 R15: 0000000000000000
[   32.267852]  ? mwait_idle+0x80/0x80
[   32.267853]  default_idle_call+0x34/0xf0
[   32.267853]  do_idle+0x1ba/0x210
[   32.267853]  cpu_startup_entry+0x19/0x20
[   32.267854]  start_kernel+0x4f5/0x502
[   32.267854]  secondary_startup_64+0xa4/0xb0
[   32.267855] Kernel panic - not syncing: MCA architectural violation!
[   32.267855] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc4+ #4
[   32.267856] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[   32.267856] Call Trace:
[   32.267856]  <#MC>
[   32.267856]  dump_stack+0x57/0x6a
[   32.267857]  panic+0xf2/0x2c2
[   32.267857]  ? secondary_startup_64+0xa3/0xb0
[   32.267857]  ex_handler_rdmsr_fault+0x36/0x36
[   32.267858]  fixup_exception+0x56/0x80
[   32.267858]  exc_general_protection+0xb6/0x2d0
[   32.267858]  asm_exc_general_protection+0x1e/0x30
[   32.267859] RIP: 0010:mce_rdmsrl+0x12/0x50
[   32.267860] Code: 54 b9 15 82 48 c7 c1 20 e4 02 81 48 c7 c2 00 e4 02 81 e9 01 17 20 00 90 0f 1f 44 00 00 65 8a 05 07 81 fe 7e 84 c0 75 10 89 f9 <0f> 32 48 89 d7 48 c1 e7 20 48 09 f8 f3 c3 e8 7b f5 ff ff 48 63 d0
[   32.267860] RSP: 0018:fffffe000000fe30 EFLAGS: 00010046
[   32.267861] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000001234
[   32.267861] RDX: 0000001d39aa4ebc RSI: fffffe000000ff08 RDI: 0000000000001234
[   32.267862] RBP: fffffe000000fe88 R08: 0000000000000000 R09: 0000000000000000
[   32.267862] R10: 0000000000000000 R11: 0000000000000000 R12: fffffe000000ff58
[   32.267863] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   32.267863]  do_machine_check+0xbd/0x9f0
[   32.267864]  ? trigger_mce+0x7/0x10
[   32.267864]  exc_machine_check+0x77/0xd0
[   32.267864]  asm_exc_machine_check+0x19/0x30
[   32.267865] RIP: 0010:trigger_mce+0x7/0x10
[   32.267866] Code: c0 c3 90 0f 1f 44 00 00 48 8b 47 58 48 89 06 31 c0 c3 90 0f 1f 44 00 00 8b 47 44 48 89 06 31 c0 c3 66 90 0f 1f 44 00 00 cd 12 <c3> 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 cd f4 c3 0f 1f 84 00 00
[   32.267866] RSP: 0018:ffffc90000003fc0 EFLAGS: 00000002
[   32.267867] RAX: ffffffff81032940 RBX: 0000000000000000 RCX: ffff88807dce9e80
[   32.267867] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   32.267868] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   32.267868] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   32.267869] R13: ffff88807dce9e80 R14: 0000000000000000 R15: 0000000000000000
[   32.267869]  ? inj_extcpu_get+0x10/0x10
[   32.267869]  </#MC>
[   32.267870]  <IRQ>
[   32.267870]  flush_smp_call_function_queue+0xce/0x1b0
[   32.267870]  __sysvec_call_function_single+0x2c/0xd0
[   32.267871]  asm_call_on_stack+0x12/0x20
[   32.267871]  </IRQ>
[   32.267871]  sysvec_call_function_single+0x6e/0x80
[   32.267872]  asm_sysvec_call_function_single+0x12/0x20
[   32.267872] RIP: 0010:default_idle+0x13/0x20
[   32.267873] Code: 0f ae 38 0f ae f0 eb b9 fb eb e3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 e9 07 00 00 00 0f 00 2d c1 9c 5e 00 fb f4 <c3> cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 65 8b 15 04
[   32.267874] RSP: 0018:ffffffff82203ec0 EFLAGS: 00000212
[   32.267874] RAX: ffffffff81819730 RBX: 0000000000000000 RCX: 0000000000029d00
[   32.267875] RDX: 000000000007de7e RSI: 7ffffff880b4b40b RDI: ffffffff8220f880
[   32.267875] RBP: ffffffff8220f880 R08: 0000000473333333 R09: 00000049ae089af4
[   32.267876] R10: 000000000006fecd R11: 0000000000006f96 R12: ffffffff8220f880
[   32.267876] R13: ffffffff8220f880 R14: 0000000000000000 R15: 0000000000000000
[   32.267877]  ? mwait_idle+0x80/0x80
[   32.267877]  default_idle_call+0x34/0xf0
[   32.267877]  do_idle+0x1ba/0x210
[   32.267878]  cpu_startup_entry+0x19/0x20
[   32.267878]  start_kernel+0x4f5/0x502
[   32.267878]  secondary_startup_64+0xa4/0xb0
[   32.268204] Kernel Offset: disabled

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH -v2] x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR
  2020-09-10 18:29                 ` Borislav Petkov
@ 2020-09-10 18:38                   ` Borislav Petkov
  2020-09-10 18:42                   ` [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only Luck, Tony
  1 sibling, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2020-09-10 18:38 UTC (permalink / raw)
  To: Luck, Tony; +Cc: x86-ml, lkml

From: Borislav Petkov <bp@suse.de>

If an exception needs to be handled while reading an MSR - which is in
most of the cases caused by a #GP on a non-existent MSR - then this
is most likely the incarnation of a BIOS or a hardware bug. Such bug
violates the architectural guarantee that MSR banks are present with all
MSRs belonging to them.

The proper fix belongs in the hardware/firmware - not in the kernel.

Handling an #MC exception which is raised while an NMI is being handled
would cause the nasty NMI nesting issue because of the shortcoming of
IRET of reenabling NMIs when executed. And the machine is in an #MC
context already so <Deity> be at its side.

Tracing MSR accesses while in #MC is another no-no due to tracing being
inherently a bad idea in atomic context:

  vmlinux.o: warning: objtool: do_machine_check()+0x4a: call to mce_rdmsrl() leaves .noinstr.text section

so remove all that "additional" functionality from mce_rdmsrl() and
provide it with a special exception handler which panics the machine
when that MSR is not accessible.

The exception handler prints a human-readable message explaining what
the panic reason is but, what is more, it panics while in the #GP
handler and latter won't have executed an IRET, thus opening the NMI
nesting issue in the case when the #MC has happened while handling
an NMI. (#MC itself won't be reenabled until MCG_STATUS hasn't been
cleared).

Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mce/core.c | 72 ++++++++++++++++++++++++++++------
 1 file changed, 60 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0ba24dfffdb2..a697baee9aa0 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -373,10 +373,28 @@ static int msr_to_offset(u32 msr)
 	return -1;
 }
 
+__visible bool ex_handler_rdmsr_fault(const struct exception_table_entry *fixup,
+				      struct pt_regs *regs, int trapnr,
+				      unsigned long error_code,
+				      unsigned long fault_addr)
+{
+	pr_emerg("MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n",
+		 (unsigned int)regs->cx, regs->ip, (void *)regs->ip);
+
+	show_stack_regs(regs);
+
+	panic("MCA architectural violation!\n");
+
+	while (true)
+		cpu_relax();
+
+	return true;
+}
+
 /* MSR access wrappers used for error injection */
 static u64 mce_rdmsrl(u32 msr)
 {
-	u64 v;
+	DECLARE_ARGS(val, low, high);
 
 	if (__this_cpu_read(injectm.finished)) {
 		int offset = msr_to_offset(msr);
@@ -386,21 +404,43 @@ static u64 mce_rdmsrl(u32 msr)
 		return *(u64 *)((char *)this_cpu_ptr(&injectm) + offset);
 	}
 
-	if (rdmsrl_safe(msr, &v)) {
-		WARN_ONCE(1, "mce: Unable to read MSR 0x%x!\n", msr);
-		/*
-		 * Return zero in case the access faulted. This should
-		 * not happen normally but can happen if the CPU does
-		 * something weird, or if the code is buggy.
-		 */
-		v = 0;
-	}
+	/*
+	 * RDMSR on MCA MSRs should not fault. If they do, this is very much an
+	 * architectural violation and needs to be reported to hw vendor. Panic
+	 * the box to not allow any further progress.
+	 */
+	asm volatile("1: rdmsr\n"
+		     "2:\n"
+		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_rdmsr_fault)
+		     : EAX_EDX_RET(val, low, high) : "c" (msr));
 
-	return v;
+
+	return EAX_EDX_VAL(val, low, high);
+}
+
+__visible bool ex_handler_wrmsr_fault(const struct exception_table_entry *fixup,
+				      struct pt_regs *regs, int trapnr,
+				      unsigned long error_code,
+				      unsigned long fault_addr)
+{
+	pr_emerg("MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pS)\n",
+		 (unsigned int)regs->cx, (unsigned int)regs->dx, (unsigned int)regs->ax,
+		  regs->ip, (void *)regs->ip);
+
+	show_stack_regs(regs);
+
+	panic("MCA architectural violation!\n");
+
+	while (true)
+		cpu_relax();
+
+	return true;
 }
 
 static void mce_wrmsrl(u32 msr, u64 v)
 {
+	u32 low, high;
+
 	if (__this_cpu_read(injectm.finished)) {
 		int offset = msr_to_offset(msr);
 
@@ -408,7 +448,15 @@ static void mce_wrmsrl(u32 msr, u64 v)
 			*(u64 *)((char *)this_cpu_ptr(&injectm) + offset) = v;
 		return;
 	}
-	wrmsrl(msr, v);
+
+	low  = (u32)v;
+	high = (u32)(v >> 32);
+
+	/* See comment in mce_rdmsrl() */
+	asm volatile("1: wrmsr\n"
+		     "2:\n"
+		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_fault)
+		     : : "c" (msr), "a"(low), "d" (high) : "memory");
 }
 
 /*
-- 
2.21.0

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-10 18:29                 ` Borislav Petkov
  2020-09-10 18:38                   ` [PATCH -v2] x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR Borislav Petkov
@ 2020-09-10 18:42                   ` Luck, Tony
  2020-09-10 18:54                     ` Borislav Petkov
  1 sibling, 1 reply; 17+ messages in thread
From: Luck, Tony @ 2020-09-10 18:42 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86-ml, lkml

On Thu, Sep 10, 2020 at 08:29:19PM +0200, Borislav Petkov wrote:
> Ok, with all those changes, I don't think the following nice and juicy
> message can be overlooked:
> 
> [   32.267830] mce: MSR access error: RDMSR from 0x1234 at rIP: 0xffffffff8102ed62 (mce_rdmsrl+0x12/0x50)

With only one call site the rIP isn't super helpful at the moment. But
once you start selling those "MSR or die" T-shirts everyone will want
to use this :-)

> [   32.267838] Call Trace:
> [   32.267838]  <#MC>
> [   32.267838]  do_machine_check+0xbd/0x9f0
> [   32.267839]  ? trigger_mce+0x7/0x10
> [   32.267839]  exc_machine_check+0x77/0xd0
> [   32.267840]  asm_exc_machine_check+0x19/0x30

Do we need the stack trace twice? Once from your fixup
function, second from panic()?

-Tony

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-10 18:42                   ` [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only Luck, Tony
@ 2020-09-10 18:54                     ` Borislav Petkov
  2020-09-10 19:43                       ` Luck, Tony
  0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2020-09-10 18:54 UTC (permalink / raw)
  To: Luck, Tony; +Cc: x86-ml, lkml

On Thu, Sep 10, 2020 at 11:42:06AM -0700, Luck, Tony wrote:
> With only one call site the rIP isn't super helpful at the moment. But
> once you start selling those "MSR or die" T-shirts everyone will want
> to use this :-)

:-)))

> Do we need the stack trace twice? Once from your fixup
> function, second from panic()?

The second panic comes from:

        if (lmce) {
                if (no_way_out)
                        mce_panic("Fatal local machine check", &m, msg);

in do_machine_check() and we panic earlier in mce_no_way_out() where I
stuck

	mce_rdmsrl(0x1234);

because that one does reads MCi_STATUS. mce_gather_info() could cause it
too because it gathers all the other MSRs.

Now, if we want to avoid the second panic(), we'd have to tell that site
above that we have panicked already but that would require some flag
variable or even more uglification.

Considering how this situation is supposed to almost never happen and
how we're actually interested in the first line of the whole splat I
pasted, how much output comes after it, doesn't really matter. All it
matters is that the machine stops any further progress (as much as we
can do that with a panic, ofc).

Or do you see a more important issue with a second panic to warrant such
not-really-needed change?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-10 18:54                     ` Borislav Petkov
@ 2020-09-10 19:43                       ` Luck, Tony
  0 siblings, 0 replies; 17+ messages in thread
From: Luck, Tony @ 2020-09-10 19:43 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86-ml, lkml

> Considering how this situation is supposed to almost never happen and
> how we're actually interested in the first line of the whole splat I
> pasted, how much output comes after it, doesn't really matter. All it
> matters is that the machine stops any further progress (as much as we
> can do that with a panic, ofc).

It would be a "nice to have" ... because all that extra noise pushed the
useful line off the top of the screen. For anyone with a serial console
it’s a non-issue. But since this is a "should never happen" case it isn't
worth creating a complex bunch of infrastructure.

You can slap a:

Reviewed-by: Tony Luck <tony.luck@intel.com>

on it when you push into GIT.

-Tony

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-06 21:21 [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only Borislav Petkov
  2020-09-07 20:06 ` Luck, Tony
@ 2020-09-07 20:16 ` Andy Lutomirski
  2020-09-07 20:27   ` Borislav Petkov
  2020-09-11  9:47 ` [tip: ras/core] x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR tip-bot2 for Borislav Petkov
  2 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2020-09-07 20:16 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86-ml, Tony Luck, lkml

On Sun, Sep 6, 2020 at 2:21 PM Borislav Petkov <bp@alien8.de> wrote:
>
> Hi,
>
> Ingo and I talked about this thing this morning and tglx has had it on
> his to-fix list too so here's a first attempt at it.
>
> Below is just a brain dump of what we talked about so let's start with
> it and see where it would take us.
>
> Thx.
>
> ---
>
> From: Borislav Petkov <bp@suse.de>
>
> ... without any exception handling and tracing.
>
> If an exception needs to be handled while reading an MSR - which is in
> most of the cases caused by a #GP on a non-existent MSR - then this
> is most likely the incarnation of a BIOS or a hardware bug. Such bug
> violates the architectural guarantee that MSR banks are present with all
> MSRs belonging to them.
>
> The proper fix belongs in the hardware/firmware - not in the kernel.
>
> Handling exceptions while in #MC and while an NMI is being handled would
> cause the nasty NMI nesting issue because of the shortcoming of IRET
> of reenabling NMIs when executed. And the machine is in an #MC context
> already so <Deity> be at its side.
>
> Tracing MSR accesses while in #MC is another no-no due to tracing being
> inherently a bad idea in atomic context:
>
>   vmlinux.o: warning: objtool: do_machine_check()+0x4a: call to mce_rdmsrl() leaves .noinstr.text section
>
> so remove all that "additional" functionality from mce_rdmsrl() and
> concentrate on solely reading the MSRs.
>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 18 +++++++-----------
>  1 file changed, 7 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 0ba24dfffdb2..14ebdf3e22f3 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -376,7 +376,7 @@ static int msr_to_offset(u32 msr)
>  /* MSR access wrappers used for error injection */
>  static u64 mce_rdmsrl(u32 msr)
>  {
> -       u64 v;
> +       DECLARE_ARGS(val, low, high);
>
>         if (__this_cpu_read(injectm.finished)) {
>                 int offset = msr_to_offset(msr);
> @@ -386,17 +386,13 @@ static u64 mce_rdmsrl(u32 msr)
>                 return *(u64 *)((char *)this_cpu_ptr(&injectm) + offset);
>         }
>
> -       if (rdmsrl_safe(msr, &v)) {
> -               WARN_ONCE(1, "mce: Unable to read MSR 0x%x!\n", msr);
> -               /*
> -                * Return zero in case the access faulted. This should
> -                * not happen normally but can happen if the CPU does
> -                * something weird, or if the code is buggy.
> -                */
> -               v = 0;
> -       }
> +       /*
> +        * RDMSR on MCA MSRs should not fault. If they do, this is very much an
> +        * architectural violation and needs to be reported to hw vendor.
> +        */
> +       asm volatile("rdmsr" : EAX_EDX_RET(val, low, high) : "c" (msr));

I don't like this.  Plain rdmsrl() will at least print a nice error if it fails.

Perhaps we should add a read_msr_panic() variant that panics on
failure?  Or, if there is just this one case, then we can use
rdmsrl_safe() and print a nice error and panic on failure.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only
  2020-09-07 20:16 ` Andy Lutomirski
@ 2020-09-07 20:27   ` Borislav Petkov
  0 siblings, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2020-09-07 20:27 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: x86-ml, Tony Luck, lkml

On Mon, Sep 07, 2020 at 01:16:43PM -0700, Andy Lutomirski wrote:
> > +       asm volatile("rdmsr" : EAX_EDX_RET(val, low, high) : "c" (msr));
> 
> I don't like this.  Plain rdmsrl() will at least print a nice error if it fails.

I think you read my commit message too quickly :)

The point is to not print anything and not have *any* exception handling
or whatever - just plain RDMSR.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [tip: ras/core] x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR
  2020-09-06 21:21 [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only Borislav Petkov
  2020-09-07 20:06 ` Luck, Tony
  2020-09-07 20:16 ` Andy Lutomirski
@ 2020-09-11  9:47 ` tip-bot2 for Borislav Petkov
  2 siblings, 0 replies; 17+ messages in thread
From: tip-bot2 for Borislav Petkov @ 2020-09-11  9:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Peter Zijlstra, kernel test robot,
	Borislav Petkov, Tony Luck, x86, LKML

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     e2def7d49d0812ea40a224161b2001b2e815dce2
Gitweb:        https://git.kernel.org/tip/e2def7d49d0812ea40a224161b2001b2e815dce2
Author:        Borislav Petkov <bp@suse.de>
AuthorDate:    Sun, 06 Sep 2020 23:02:52 +02:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Fri, 11 Sep 2020 08:25:43 +02:00

x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR

If an exception needs to be handled while reading an MSR - which is in
most of the cases caused by a #GP on a non-existent MSR - then this
is most likely the incarnation of a BIOS or a hardware bug. Such bug
violates the architectural guarantee that MCA banks are present with all
MSRs belonging to them.

The proper fix belongs in the hardware/firmware - not in the kernel.

Handling an #MC exception which is raised while an NMI is being handled
would cause the nasty NMI nesting issue because of the shortcoming of
IRET of reenabling NMIs when executed. And the machine is in an #MC
context already so <Deity> be at its side.

Tracing MSR accesses while in #MC is another no-no due to tracing being
inherently a bad idea in atomic context:

  vmlinux.o: warning: objtool: do_machine_check()+0x4a: call to mce_rdmsrl() leaves .noinstr.text section

so remove all that "additional" functionality from mce_rdmsrl() and
provide it with a special exception handler which panics the machine
when that MSR is not accessible.

The exception handler prints a human-readable message explaining what
the panic reason is but, what is more, it panics while in the #GP
handler and latter won't have executed an IRET, thus opening the NMI
nesting issue in the case when the #MC has happened while handling
an NMI. (#MC itself won't be reenabled until MCG_STATUS hasn't been
cleared).

Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
[ Add missing prototypes for ex_handler_* ]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200906212130.GA28456@zn.tnic
---
 arch/x86/kernel/cpu/mce/core.c     | 72 ++++++++++++++++++++++++-----
 arch/x86/kernel/cpu/mce/internal.h | 10 ++++-
 2 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0ba24df..a697bae 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -373,10 +373,28 @@ static int msr_to_offset(u32 msr)
 	return -1;
 }
 
+__visible bool ex_handler_rdmsr_fault(const struct exception_table_entry *fixup,
+				      struct pt_regs *regs, int trapnr,
+				      unsigned long error_code,
+				      unsigned long fault_addr)
+{
+	pr_emerg("MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n",
+		 (unsigned int)regs->cx, regs->ip, (void *)regs->ip);
+
+	show_stack_regs(regs);
+
+	panic("MCA architectural violation!\n");
+
+	while (true)
+		cpu_relax();
+
+	return true;
+}
+
 /* MSR access wrappers used for error injection */
 static u64 mce_rdmsrl(u32 msr)
 {
-	u64 v;
+	DECLARE_ARGS(val, low, high);
 
 	if (__this_cpu_read(injectm.finished)) {
 		int offset = msr_to_offset(msr);
@@ -386,21 +404,43 @@ static u64 mce_rdmsrl(u32 msr)
 		return *(u64 *)((char *)this_cpu_ptr(&injectm) + offset);
 	}
 
-	if (rdmsrl_safe(msr, &v)) {
-		WARN_ONCE(1, "mce: Unable to read MSR 0x%x!\n", msr);
-		/*
-		 * Return zero in case the access faulted. This should
-		 * not happen normally but can happen if the CPU does
-		 * something weird, or if the code is buggy.
-		 */
-		v = 0;
-	}
+	/*
+	 * RDMSR on MCA MSRs should not fault. If they do, this is very much an
+	 * architectural violation and needs to be reported to hw vendor. Panic
+	 * the box to not allow any further progress.
+	 */
+	asm volatile("1: rdmsr\n"
+		     "2:\n"
+		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_rdmsr_fault)
+		     : EAX_EDX_RET(val, low, high) : "c" (msr));
 
-	return v;
+
+	return EAX_EDX_VAL(val, low, high);
+}
+
+__visible bool ex_handler_wrmsr_fault(const struct exception_table_entry *fixup,
+				      struct pt_regs *regs, int trapnr,
+				      unsigned long error_code,
+				      unsigned long fault_addr)
+{
+	pr_emerg("MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pS)\n",
+		 (unsigned int)regs->cx, (unsigned int)regs->dx, (unsigned int)regs->ax,
+		  regs->ip, (void *)regs->ip);
+
+	show_stack_regs(regs);
+
+	panic("MCA architectural violation!\n");
+
+	while (true)
+		cpu_relax();
+
+	return true;
 }
 
 static void mce_wrmsrl(u32 msr, u64 v)
 {
+	u32 low, high;
+
 	if (__this_cpu_read(injectm.finished)) {
 		int offset = msr_to_offset(msr);
 
@@ -408,7 +448,15 @@ static void mce_wrmsrl(u32 msr, u64 v)
 			*(u64 *)((char *)this_cpu_ptr(&injectm) + offset) = v;
 		return;
 	}
-	wrmsrl(msr, v);
+
+	low  = (u32)v;
+	high = (u32)(v >> 32);
+
+	/* See comment in mce_rdmsrl() */
+	asm volatile("1: wrmsr\n"
+		     "2:\n"
+		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_fault)
+		     : : "c" (msr), "a"(low), "d" (high) : "memory");
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 6473070..b122610 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -185,4 +185,14 @@ extern bool amd_filter_mce(struct mce *m);
 static inline bool amd_filter_mce(struct mce *m)			{ return false; };
 #endif
 
+__visible bool ex_handler_rdmsr_fault(const struct exception_table_entry *fixup,
+				      struct pt_regs *regs, int trapnr,
+				      unsigned long error_code,
+				      unsigned long fault_addr);
+
+__visible bool ex_handler_wrmsr_fault(const struct exception_table_entry *fixup,
+				      struct pt_regs *regs, int trapnr,
+				      unsigned long error_code,
+				      unsigned long fault_addr);
+
 #endif /* __X86_MCE_INTERNAL_H__ */

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-09-11  9:47 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-06 21:21 [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only Borislav Petkov
2020-09-07 20:06 ` Luck, Tony
2020-09-08  9:46   ` Borislav Petkov
2020-09-08 10:08     ` Borislav Petkov
2020-09-08 15:07       ` Luck, Tony
2020-09-08 15:25         ` Borislav Petkov
2020-09-09 11:30           ` Borislav Petkov
2020-09-09 18:20             ` Luck, Tony
2020-09-09 20:03               ` Borislav Petkov
2020-09-10 18:29                 ` Borislav Petkov
2020-09-10 18:38                   ` [PATCH -v2] x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR Borislav Petkov
2020-09-10 18:42                   ` [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only Luck, Tony
2020-09-10 18:54                     ` Borislav Petkov
2020-09-10 19:43                       ` Luck, Tony
2020-09-07 20:16 ` Andy Lutomirski
2020-09-07 20:27   ` Borislav Petkov
2020-09-11  9:47 ` [tip: ras/core] x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR tip-bot2 for Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).