linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/2] x86/msr: MSR access failure changes
@ 2015-09-21  0:02 Andy Lutomirski
  2015-09-21  0:02 ` [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops Andy Lutomirski
  2015-09-21  0:02 ` [PATCH v2 2/2] x86/msr: Set the return value to zero when native_rdmsr_safe fails Andy Lutomirski
  0 siblings, 2 replies; 23+ messages in thread
From: Andy Lutomirski @ 2015-09-21  0:02 UTC (permalink / raw)
  To: x86
  Cc: Paolo Bonzini, Peter Zijlstra, KVM list, Arjan van de Ven,
	xen-devel, linux-kernel, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Thomas Gleixner, Andy Lutomirski

This applies on top of my earlier paravirt MSR series.

Changes from v1:
 - Return zero instead of poison on bad RDMSRs.

Andy Lutomirski (2):
  x86/msr: Carry on after a non-"safe" MSR access fails without
    !panic_on_oops
  x86/msr: Set the return value to zero when native_rdmsr_safe fails

 arch/x86/include/asm/msr.h |  5 ++++-
 arch/x86/kernel/traps.c    | 55 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 1 deletion(-)

-- 
2.4.3


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21  0:02 [PATCH v2 0/2] x86/msr: MSR access failure changes Andy Lutomirski
@ 2015-09-21  0:02 ` Andy Lutomirski
  2015-09-21  0:15   ` Linus Torvalds
  2015-09-21  0:02 ` [PATCH v2 2/2] x86/msr: Set the return value to zero when native_rdmsr_safe fails Andy Lutomirski
  1 sibling, 1 reply; 23+ messages in thread
From: Andy Lutomirski @ 2015-09-21  0:02 UTC (permalink / raw)
  To: x86
  Cc: Paolo Bonzini, Peter Zijlstra, KVM list, Arjan van de Ven,
	xen-devel, linux-kernel, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Thomas Gleixner, Andy Lutomirski

This demotes an OOPS and likely panic due to a failed non-"safe" MSR
access to a WARN_ON_ONCE and a return of zero (in the RDMSR case).
We still write a pr_info entry unconditionally for debugging.

To be clear, this type of failure should *not* happen.  This patch
exists to minimize the chance of nasty, undebuggable failures on
systems that only worked by accident because of a now-fixed
CONFIG_PARAVIRT=y bug.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/traps.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 346eec73f7db..f82987643e32 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -437,6 +437,58 @@ exit_trap:
 	do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
 }
 
+static bool paper_over_kernel_gpf(struct pt_regs *regs)
+{
+	/*
+	 * Try to decode the opcode that failed.  So far, we only care
+	 * about boring two-byte unprefixed opcodes, so we don't need
+	 * the full instruction decoder machinery.
+	 */
+	u16 opcode;
+
+	if (probe_kernel_read(&opcode, (const void *)regs->ip, sizeof(opcode)))
+		return false;
+
+	if (opcode == 0x320f) {
+		/* RDMSR */
+		pr_info("bad kernel RDMSR from non-existent MSR 0x%x\n",
+			(unsigned int)regs->cx);
+		if (!panic_on_oops) {
+			WARN_ON_ONCE(true);
+
+			/*
+			 * Pretend that RDMSR worked and returned zero.  We
+			 * chose zero because zero seems less likely to
+			 * cause further malfunctions than any other value.
+			 */
+			regs->ax = 0;
+			regs->dx = 0;
+			regs->ip += 2;
+			return true;
+		} else {
+			/* Don't fix it up. */
+			return false;
+		}
+	} else if (opcode == 0x300f) {
+		/* WRMSR */
+		pr_info("bad kernel WRMSR writing 0x%08x%08x to MSR 0x%x\n",
+			(unsigned int)regs->dx, (unsigned int)regs->ax,
+			(unsigned int)regs->cx);
+		if (!panic_on_oops) {
+			WARN_ON_ONCE(true);
+
+			/* Pretend it worked and carry on. */
+			regs->ip += 2;
+			return true;
+		} else {
+			/* Don't fix it up. */
+			return false;
+		}
+	}
+
+	return false;
+}
+
 dotraplinkage void
 do_general_protection(struct pt_regs *regs, long error_code)
 {
@@ -456,6 +508,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
 		if (fixup_exception(regs))
 			return;
 
+		if (paper_over_kernel_gpf(regs))
+			return;
+
 		tsk->thread.error_code = error_code;
 		tsk->thread.trap_nr = X86_TRAP_GP;
 		if (notify_die(DIE_GPF, "general protection fault", regs, error_code,
-- 
2.4.3
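Editorial sketch (not part of the patch): the comparisons against 0x320f and 0x300f work because RDMSR is encoded as the bytes 0F 32 and WRMSR as 0F 30, and x86 is little-endian, so loading those two bytes as a u16 yields 0x320f and 0x300f. The user-space C below uses hypothetical names (classify_msr_opcode, enum msr_op, MSR_INSN_LEN) and assembles the u16 explicitly from the bytes in place of probe_kernel_read().

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum msr_op { MSR_OP_NONE, MSR_OP_RDMSR, MSR_OP_WRMSR };
#define MSR_INSN_LEN 2  /* RDMSR and WRMSR are both two bytes long */

/*
 * Classify the instruction at ip the way the patch does: read the first
 * two bytes as a little-endian u16 and compare against 0x320f / 0x300f.
 */
static enum msr_op classify_msr_opcode(const uint8_t *ip)
{
	/* Explicit little-endian load, equivalent to a u16 read on x86. */
	uint16_t opcode = (uint16_t)(ip[0] | (ip[1] << 8));

	if (opcode == 0x320f)   /* bytes 0F 32: RDMSR */
		return MSR_OP_RDMSR;
	if (opcode == 0x300f)   /* bytes 0F 30: WRMSR */
		return MSR_OP_WRMSR;
	return MSR_OP_NONE;     /* anything else is not papered over */
}
```

Because both instructions are two bytes long, the patch can skip over either one with regs->ip += 2.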



* [PATCH v2 2/2] x86/msr: Set the return value to zero when native_rdmsr_safe fails
  2015-09-21  0:02 [PATCH v2 0/2] x86/msr: MSR access failure changes Andy Lutomirski
  2015-09-21  0:02 ` [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops Andy Lutomirski
@ 2015-09-21  0:02 ` Andy Lutomirski
  1 sibling, 0 replies; 23+ messages in thread
From: Andy Lutomirski @ 2015-09-21  0:02 UTC (permalink / raw)
  To: x86
  Cc: Paolo Bonzini, Peter Zijlstra, KVM list, Arjan van de Ven,
	xen-devel, linux-kernel, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Thomas Gleixner, Andy Lutomirski

This will cause unchecked native_rdmsr_safe failures to return
deterministic results.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/msr.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 77d8b284e4a7..9eda52205d42 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -73,7 +73,10 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
 	asm volatile("2: rdmsr ; xor %[err],%[err]\n"
 		     "1:\n\t"
 		     ".section .fixup,\"ax\"\n\t"
-		     "3:  mov %[fault],%[err] ; jmp 1b\n\t"
+		     "3: mov %[fault],%[err]\n\t"
+		     "xorl %%eax, %%eax\n\t"
+		     "xorl %%edx, %%edx\n\t"
+		     "jmp 1b\n\t"
 		     ".previous\n\t"
 		     _ASM_EXTABLE(2b, 3b)
 		     : [err] "=r" (*err), EAX_EDX_RET(val, low, high)
-- 
2.4.3
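Editorial sketch (not part of the patch): a user-space model of the contract patch 2 gives native_read_msr_safe(). On a fault the fixup stores the error and zeroes EAX and EDX, so a caller that ignores *err still gets a deterministic 0 instead of whatever was left in those registers. The MSR table, its sample index and value, and model_read_msr_safe() are hypothetical; -EIO is assumed to match the fault value the kernel passes to the asm as %[fault].

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "MSRs this CPU implements". */
struct fake_msr { uint32_t index; uint64_t value; };

static const struct fake_msr fake_msrs[] = {
	{ 0x10, 0x123456789abcdef0ULL },  /* arbitrary sample entry */
};

/*
 * Model of the patched native_read_msr_safe(): EAX and EDX start out
 * as the fixup leaves them (zero), and are only filled in on success.
 */
static uint64_t model_read_msr_safe(uint32_t msr, int *err)
{
	uint32_t low = 0, high = 0;  /* EAX, EDX after "xorl" in the fixup */
	size_t i;

	*err = -EIO;                 /* fixup path: "mov %[fault],%[err]" */
	for (i = 0; i < sizeof(fake_msrs) / sizeof(fake_msrs[0]); i++) {
		if (fake_msrs[i].index == msr) {
			low = (uint32_t)fake_msrs[i].value;
			high = (uint32_t)(fake_msrs[i].value >> 32);
			*err = 0;    /* success path: "xor %[err],%[err]" */
			break;
		}
	}
	/* Combine EDX:EAX into one 64-bit result. */
	return low | ((uint64_t)high << 32);
}
```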



* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21  0:02 ` [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops Andy Lutomirski
@ 2015-09-21  0:15   ` Linus Torvalds
  2015-09-21  1:13     ` Andy Lutomirski
  0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2015-09-21  0:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: the arch/x86 maintainers, Paolo Bonzini, Peter Zijlstra,
	KVM list, Arjan van de Ven, xen-devel, Linux Kernel Mailing List,
	Andrew Morton, Ingo Molnar, Thomas Gleixner

On Sun, Sep 20, 2015 at 5:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
> This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> access to a WARN_ON_ONCE and a return of zero (in the RDMSR case).
> We still write a pr_info entry unconditionally for debugging.

No, this is wrong.

If you really want to do something like this, then just make all MSR
reads safe. So the only difference between "safe" and "unsafe" is that
the unsafe version just doesn't check the return value, and silently
just returns zero for reads (or writes nothing).

To quote Obi-Wan: "Use the exception table, Luke".

Because decoding instructions is just too ugly. We'll do it for CPU
errata where we might have to do it for user space code too (ie the
AMD prefetch mess), but for code that _we_ control? Hell no.

So NAK on this.

                   Linus


* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21  0:15   ` Linus Torvalds
@ 2015-09-21  1:13     ` Andy Lutomirski
  2015-09-21  8:46       ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Andy Lutomirski @ 2015-09-21  1:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Paolo Bonzini, xen-devel, Arjan van de Ven,
	Andrew Morton, KVM list, the arch/x86 maintainers, Ingo Molnar,
	Linux Kernel Mailing List, Peter Zijlstra

On Sep 20, 2015 5:15 PM, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:
>
> On Sun, Sep 20, 2015 at 5:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> > access to a WARN_ON_ONCE and a return of zero (in the RDMSR case).
> > We still write a pr_info entry unconditionally for debugging.
>
> No, this is wrong.
>
> If you really want to do something like this, then just make all MSR
> reads safe. So the only difference between "safe" and "unsafe" is that
> the unsafe version just doesn't check the return value, and silently
> just returns zero for reads (or writes nothing).
>
> To quote Obi-Wan: "Use the exception table, Luke".
>
> Because decoding instructions is just too ugly. We'll do it for CPU
> errata where we might have to do it for user space code too (ie the
> AMD prefetch mess), but for code that _we_ control? Hell no.
>
> So NAK on this.

My personal preference is to just not do this at all.  A couple people
disagree.  If we make the unsafe variants not oops, then I think we
want to have the nice loud warning, since these issues are bugs if
they happen.

We could certainly use the exception table for this, but it'll result
in bigger code, since each MSR access will need an exception table
entry and an associated fixup to call some helper that warns and sets
the result to zero.

I'd be happy to implement that, but only if it'll be applied.
Otherwise I'd rather just drop this patch and keep the rest of the
series.

--Andy


* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21  1:13     ` Andy Lutomirski
@ 2015-09-21  8:46       ` Ingo Molnar
  2015-09-21 12:27         ` Paolo Bonzini
  2015-09-21 16:36         ` Linus Torvalds
  0 siblings, 2 replies; 23+ messages in thread
From: Ingo Molnar @ 2015-09-21  8:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Thomas Gleixner, Paolo Bonzini, xen-devel,
	Arjan van de Ven, Andrew Morton, KVM list,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra


* Andy Lutomirski <luto@amacapital.net> wrote:

> On Sep 20, 2015 5:15 PM, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:
> >
> > On Sun, Sep 20, 2015 at 5:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > > This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> > > access to a WARN_ON_ONCE and a return of zero (in the RDMSR case).
> > > We still write a pr_info entry unconditionally for debugging.
> >
> > No, this is wrong.
> >
> > If you really want to do something like this, then just make all MSR reads 
> > safe. So the only difference between "safe" and "unsafe" is that the unsafe 
> > version just doesn't check the return value, and silently just returns zero 
> > for reads (or writes nothing).
> >
> > To quote Obi-Wan: "Use the exception table, Luke".
> >
> > Because decoding instructions is just too ugly. We'll do it for CPU errata 
> > where we might have to do it for user space code too (ie the AMD prefetch 
> > mess), but for code that _we_ control? Hell no.
> >
> > So NAK on this.
> 
> My personal preference is to just not do this at all.  A couple people disagree.  
> If we make the unsafe variants not oops, then I think we want to have the nice 
> loud warning, since these issues are bugs if they happen.
> 
> We could certainly use the exception table for this, but it'll result in bigger 
> code, since each MSR access will need an exception table entry and an associated 
> fixup to call some helper that warns and sets the result to zero.
> 
> I'd be happy to implement that, but only if it'll be applied. Otherwise I'd 
> rather just drop this patch and keep the rest of the series.

Linus, what's your preference?

Due to the bug mentioned earlier in this thread all MSR reads are currently 'safe' 
on all the major Linux distros (which all have CONFIG_PARAVIRT=y), i.e. by 
'fixing' them we'd reintroduce random crashes into various fragile pieces of 
code...

To add insult to injury, the current 'silently safe by accident' MSR code isn't so 
safe either, because it leaves the result of the read uninitialized...

To fix this all I'd really like to have:

 - safe MSR reads by default (i.e. never boot crash the kernel on some rare 
   condition - which to most users is either a silent boot hang or an instant 
   restart). Historically we had a stream of 'silly boot crashes' due to MSR reads 
   that generate a #GPF. They make Linux less usable around the edges, especially 
   in the x86 non-server (desktop) space where most hardware vendors are either 
   openly Linux hostile, or, at best, Linux oblivious.

 - proper result-zeroing behavior on exceptions

 - and we should also generate _some_ sort of warning when MSR exceptions happen
   in an 'unintended' fashion.

Maybe the warning could be put under a (default-enabled) config option for the 
size conscious.

Or we could extend exception table entry encoding to include a 'warning bit', to 
not bloat the kernel. If the exception handler code encounters such an exception 
it would generate a one-time warning for that entry, but otherwise not crash the 
kernel and continue execution with an all-zeroes result for the MSR read.

Thanks,

	Ingo
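Editorial sketch (not from the thread) of the 'warning bit' idea above: a hypothetical extended exception-table entry carries a flag, and on a fault the handler warns once for that entry, zeroes EDX:EAX and resumes at the fixup instead of oopsing. The struct layout, EX_FLAG_WARN_ZERO and model_fixup_exception() are assumptions. The real table is sorted and binary-searched, and is typically read-only after boot, so per-entry 'warned' state would have to live elsewhere.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag: warn once, zero the RDMSR result, and carry on. */
#define EX_FLAG_WARN_ZERO 0x1u

/* Hypothetical extended exception-table entry (not the kernel layout). */
struct model_ex_entry {
	uintptr_t insn;      /* address of the instruction that may fault */
	uintptr_t fixup;     /* where execution resumes */
	unsigned int flags;  /* EX_FLAG_WARN_ZERO, ... */
	bool warned;         /* one-time warning state for this entry */
};

/* Hypothetical register state: only what the sketch needs. */
struct model_regs {
	uintptr_t ip;
	uint64_t ax, dx;
};

/* Returns false if no entry matches (the caller would oops as today). */
static bool model_fixup_exception(struct model_ex_entry *table, size_t n,
				  struct model_regs *regs)
{
	for (size_t i = 0; i < n; i++) {
		struct model_ex_entry *e = &table[i];

		if (e->insn != regs->ip)
			continue;
		if (e->flags & EX_FLAG_WARN_ZERO) {
			if (!e->warned) {
				e->warned = true;
				fprintf(stderr, "unexpected MSR fault at %#lx\n",
					(unsigned long)regs->ip);
			}
			regs->ax = 0;  /* all-zeroes result for the read */
			regs->dx = 0;
		}
		regs->ip = e->fixup;
		return true;
	}
	return false;
}
```

A linear search keeps the sketch short; it is not how the kernel looks up entries.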


* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21  8:46       ` Ingo Molnar
@ 2015-09-21 12:27         ` Paolo Bonzini
  2015-09-21 16:36         ` Linus Torvalds
  1 sibling, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2015-09-21 12:27 UTC (permalink / raw)
  To: Ingo Molnar, Andy Lutomirski
  Cc: Linus Torvalds, Thomas Gleixner, xen-devel, Arjan van de Ven,
	Andrew Morton, KVM list, the arch/x86 maintainers,
	Linux Kernel Mailing List, Peter Zijlstra



On 21/09/2015 10:46, Ingo Molnar wrote:
> Or we could extend exception table entry encoding to include a 'warning bit', to 
> not bloat the kernel. If the exception handler code encounters such an exception 
> it would generate a one-time warning for that entry, but otherwise not crash the 
> kernel and continue execution with an all-zeroes result for the MSR read.

The 'warning bit' already exists, it is the opcode that caused the fault. :)

The concern about bloat is a good one.  However, why is it necessary to
keep native_*_msr* inline?  If they are moved out-of-line, using the
exception table becomes the obvious solution and doesn't cause bloat
anymore.

Paolo


* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21  8:46       ` Ingo Molnar
  2015-09-21 12:27         ` Paolo Bonzini
@ 2015-09-21 16:36         ` Linus Torvalds
  2015-09-21 16:49           ` Arjan van de Ven
                             ` (4 more replies)
  1 sibling, 5 replies; 23+ messages in thread
From: Linus Torvalds @ 2015-09-21 16:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, Thomas Gleixner, Paolo Bonzini, xen-devel,
	Arjan van de Ven, Andrew Morton, KVM list,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra

On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> Linus, what's your preference?

So quite frankly, is there any reason we don't just implement
native_read_msr() as just

   unsigned long long native_read_msr(unsigned int msr)
   {
      int err;
      unsigned long long val;

      val = native_read_msr_safe(msr, &err);
      WARN_ON_ONCE(err);
      return val;
   }

Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
done with it. I don't see the downside.

How many msr reads are *so* critical that the function call
overhead would matter? Get rid of the inline version of the _safe()
thing too, and put that thing there too.

                  Linus
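Editorial sketch (not from the thread): a user-space model of the out-of-line wrapper proposed above, showing that a failing read returns 0 and warns only once per call site instead of oopsing. MODEL_WARN_ON_ONCE, mock_read_msr_safe(), model_read_msr() and warn_count are hypothetical stand-ins for WARN_ON_ONCE(), native_read_msr_safe() and native_read_msr(); only MSR 0x10 "exists" in the mock.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static unsigned int warn_count;  /* warnings actually emitted */

/* Hypothetical stand-in for WARN_ON_ONCE(): one report per call site. */
#define MODEL_WARN_ON_ONCE(cond)				\
	do {							\
		static bool warned_;				\
		if ((cond) && !warned_) {			\
			warned_ = true;				\
			warn_count++;				\
			fprintf(stderr, "WARNING at %s:%d\n",	\
				__FILE__, __LINE__);		\
		}						\
	} while (0)

/* Hypothetical mock of native_read_msr_safe(): only MSR 0x10 exists. */
static uint64_t mock_read_msr_safe(uint32_t msr, int *err)
{
	if (msr == 0x10) {
		*err = 0;
		return 42;
	}
	*err = -EIO;
	return 0;  /* zeroed result on failure */
}

/* Deliberately not inline: warns once, returns 0 on failure, no oops. */
uint64_t model_read_msr(uint32_t msr)
{
	int err;
	uint64_t val = mock_read_msr_safe(msr, &err);

	MODEL_WARN_ON_ONCE(err);
	return val;
}
```

The static flag inside the macro gives one warning per expansion site, which matches how the kernel's WARN_ON_ONCE() behaves.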


* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21 16:36         ` Linus Torvalds
@ 2015-09-21 16:49           ` Arjan van de Ven
  2015-09-21 17:27             ` Linus Torvalds
  2015-09-21 17:43             ` Andy Lutomirski
  2015-09-21 18:16           ` Andy Lutomirski
                             ` (3 subsequent siblings)
  4 siblings, 2 replies; 23+ messages in thread
From: Arjan van de Ven @ 2015-09-21 16:49 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andy Lutomirski, Thomas Gleixner, Paolo Bonzini, xen-devel,
	Andrew Morton, KVM list, the arch/x86 maintainers,
	Linux Kernel Mailing List, Peter Zijlstra

On 9/21/2015 9:36 AM, Linus Torvalds wrote:
> On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> Linus, what's your preference?
>
> So quite frankly, is there any reason we don't just implement
> native_read_msr() as just
>
>     unsigned long long native_read_msr(unsigned int msr)
>     {
>        int err;
>        unsigned long long val;
>
>        val = native_read_msr_safe(msr, &err);
>        WARN_ON_ONCE(err);
>        return val;
>     }
>
> Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> done with it. I don't see the downside.
>
> How many msr reads are *so* critical that the function call
> overhead would matter?

if anything qualifies it'd be switch_to() and friends.

note that I'm not entirely happy about the notion of "safe" MSRs.
They're safe as in "won't fault".
Reading random MSRs isn't a generic safe operation though, but the name sort of gives people
the impression that it is. Even with _safe variants, you still need to KNOW the MSR exists (by means
of CPUID or similar) unfortunately.
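Editorial sketch (not from the thread) of the point above: a _safe read only answers "will this fault?", while the real question is whether the MSR exists, which comes from enumeration. The example uses IA32_TSC_AUX (0xc0000103), which is enumerated by the RDTSCP bit, CPUID.80000001H:EDX[27]. msr_known_present() is a hypothetical helper that takes CPUID output the caller has already collected.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_TSC_AUX               0xc0000103u  /* IA32_TSC_AUX */
#define CPUID_80000001_EDX_RDTSCP (1u << 27)   /* RDTSCP feature bit */

/*
 * Decide from enumeration data whether an MSR is known to exist.
 * "Won't fault" is a different question from "exists and means
 * what the caller thinks it means".
 */
static bool msr_known_present(uint32_t msr, uint32_t ext_edx)
{
	switch (msr) {
	case MSR_TSC_AUX:
		return ext_edx & CPUID_80000001_EDX_RDTSCP;
	default:
		return false;  /* unknown MSR: enumerate first, don't probe */
	}
}
```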




* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21 16:49           ` Arjan van de Ven
@ 2015-09-21 17:27             ` Linus Torvalds
  2015-09-21 17:43             ` Andy Lutomirski
  1 sibling, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2015-09-21 17:27 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Andy Lutomirski, Thomas Gleixner, Paolo Bonzini,
	xen-devel, Andrew Morton, KVM list, the arch/x86 maintainers,
	Linux Kernel Mailing List, Peter Zijlstra

On Mon, Sep 21, 2015 at 9:49 AM, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>
>> How many msr reads are *so* critical that the function call
>> overhead would matter?
>
> if anything qualifies it'd be switch_to() and friends.

Is there anything else than the FS/GS_BASE thing (possibly hidden
behind inlines etc that I didn't get from a quick grep)? And why is
that sometimes using the "safe" version (in do_arch_prctl()), and
sometimes not (switch_to())?

I'm not convinced that mess is a good argument for the status quo ;)

> note that I'm not entirely happy about the notion of "safe" MSRs.
> They're safe as in "won't fault".

I wouldn't object to renaming them.

                Linus


* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21 16:49           ` Arjan van de Ven
  2015-09-21 17:27             ` Linus Torvalds
@ 2015-09-21 17:43             ` Andy Lutomirski
  2015-09-22  8:12               ` Paolo Bonzini
  1 sibling, 1 reply; 23+ messages in thread
From: Andy Lutomirski @ 2015-09-21 17:43 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, Paolo Bonzini,
	xen-devel, Andrew Morton, KVM list, the arch/x86 maintainers,
	Linux Kernel Mailing List, Peter Zijlstra

On Mon, Sep 21, 2015 at 9:49 AM, Arjan van de Ven <arjan@linux.intel.com> wrote:
> On 9/21/2015 9:36 AM, Linus Torvalds wrote:
>>
>> On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>>
>>>
>>> Linus, what's your preference?
>>
>>
>> So quite frankly, is there any reason we don't just implement
>> native_read_msr() as just
>>
>>     unsigned long long native_read_msr(unsigned int msr)
>>     {
>>        int err;
>>        unsigned long long val;
>>
>>        val = native_read_msr_safe(msr, &err);
>>        WARN_ON_ONCE(err);
>>        return val;
>>     }
>>
>> Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
>> done with it. I don't see the downside.
>>
>> How many msr reads are *so* critical that the function call
>> overhead would matter?
>
>
> if anything qualifies it'd be switch_to() and friends.

And maybe the KVM user return notifier.  Unfortunately, switch_to
might gain another two MSR accesses at some point if we decide to fix
the bugs in there.  Sigh.

>
> note that I'm not entirely happy about the notion of "safe" MSRs.
> They're safe as in "won't fault".
> Reading random MSRs isn't a generic safe operation though, but the name sort
> of gives people the impression that it is. Even with _safe variants, you
> still need to KNOW the MSR exists (by means of CPUID or similar)
> unfortunately.
>

I tend to agree.

Anyway, the fully out-of-line approach isn't obviously a bad idea, and
it simplifies the whole mess (we can drop most of the paravirt
patches, too).  I'll give it a try and see what happens.

--Andy


* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21 16:36         ` Linus Torvalds
  2015-09-21 16:49           ` Arjan van de Ven
@ 2015-09-21 18:16           ` Andy Lutomirski
  2015-09-21 18:36             ` Borislav Petkov
  2015-09-21 18:47             ` Linus Torvalds
  2015-09-22  7:14           ` Ingo Molnar
                             ` (2 subsequent siblings)
  4 siblings, 2 replies; 23+ messages in thread
From: Andy Lutomirski @ 2015-09-21 18:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Thomas Gleixner, Paolo Bonzini, xen-devel,
	Arjan van de Ven, Andrew Morton, KVM list,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra, Borislav Petkov

On Mon, Sep 21, 2015 at 9:36 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> Linus, what's your preference?
>
> So quite frankly, is there any reason we don't just implement
> native_read_msr() as just
>
>    unsigned long long native_read_msr(unsigned int msr)
>    {
>       int err;
>       unsigned long long val;
>
>       val = native_read_msr_safe(msr, &err);
>       WARN_ON_ONCE(err);
>       return val;
>    }
>
> Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> done with it. I don't see the downside.

In the interest of sanity, I want to drop the "native_", too, since
there appear to be few or no good use cases for native_read_msr as
such.  I'm tempted to add new functions read_msr and write_msr that
forward to rdmsrl_safe and wrmsrl_safe.

It looks like the msr helpers are every bit as bad as the TSC helpers
used to be :(

--Andy


* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21 18:16           ` Andy Lutomirski
@ 2015-09-21 18:36             ` Borislav Petkov
  2015-09-21 18:47             ` Linus Torvalds
  1 sibling, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2015-09-21 18:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, Paolo Bonzini,
	xen-devel, Arjan van de Ven, Andrew Morton, KVM list,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra

On Mon, Sep 21, 2015 at 11:16:30AM -0700, Andy Lutomirski wrote:
> In the interest of sanity, I want to drop the "native_", too, since
> there appear to be few or no good use cases for native_read_msr as
> such.  I'm tempted to add new functions read_msr and write_msr that
> forward to rdmsrl_safe and wrmsrl_safe.

Just change the msr_read/msr_write() ones in arch/x86/lib/msr.c to take
a u64 and you're there.

> It looks like the msr helpers are every bit as bad as the TSC helpers
> used to be :(

Yap.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.


* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21 18:16           ` Andy Lutomirski
  2015-09-21 18:36             ` Borislav Petkov
@ 2015-09-21 18:47             ` Linus Torvalds
  1 sibling, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2015-09-21 18:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, Paolo Bonzini, xen-devel,
	Arjan van de Ven, Andrew Morton, KVM list,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra, Borislav Petkov

On Mon, Sep 21, 2015 at 11:16 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> In the interest of sanity, I want to drop the "native_", too

Yes. I think the only reason it exists is to have that wrapper layer
for PV. And that argument just goes away if you just make the
non-inline helper function do all the PV logic directly.

I really suspect we should do this for a *lot* of the PV ops. Yeah,
some are so performance-critical that we probably do have a good
reason for the inline indirections etc (historical example: native
spin-unlock, which traditionally could be done as a single store
instruction), but I suspect a lot of the PV indirection is for this
kind of "historical wrapper model" reason, and it often makes it
really hard to see what is going on because you have to go through
several layers of indirection, often in different files.

                      Linus


* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21 16:36         ` Linus Torvalds
  2015-09-21 16:49           ` Arjan van de Ven
  2015-09-21 18:16           ` Andy Lutomirski
@ 2015-09-22  7:14           ` Ingo Molnar
  2015-09-30 13:10           ` Peter Zijlstra
  2015-09-30 18:32           ` H. Peter Anvin
  4 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2015-09-22  7:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Thomas Gleixner, Paolo Bonzini, xen-devel,
	Arjan van de Ven, Andrew Morton, KVM list,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Linus, what's your preference?
> 
> So quite frankly, is there any reason we don't just implement
> native_read_msr() as just
> 
>    unsigned long long native_read_msr(unsigned int msr)
>    {
>       int err;
>       unsigned long long val;
> 
>       val = native_read_msr_safe(msr, &err);
>       WARN_ON_ONCE(err);
>       return val;
>    }
> 
> Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> done with it. I don't see the downside.

Absolutely!

> How many msr reads are *so* critical that the function call overhead would 
> matter? Get rid of the inline version of the _safe() thing too, and put that 
> thing there too.

Only a very small number of them are performance critical (because even 
hw-accelerated MSR accesses are generally slow so we try to avoid MSR accesses in 
fast paths as much as possible, via shadowing, etc.) - and in the few cases where 
we have to access an MSR in a fast path we can do those separately.

I'm only worried about the 'default' APIs, i.e. rdmsr() that is used throughout 
arch/x86/ over a hundred times, not about performance critical code paths that get 
enough testing and enough attention in general.

Thanks,

	Ingo


* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21 17:43             ` Andy Lutomirski
@ 2015-09-22  8:12               ` Paolo Bonzini
  0 siblings, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2015-09-22  8:12 UTC (permalink / raw)
  To: Andy Lutomirski, Arjan van de Ven
  Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, xen-devel,
	Andrew Morton, KVM list, the arch/x86 maintainers,
	Linux Kernel Mailing List, Peter Zijlstra



On 21/09/2015 19:43, Andy Lutomirski wrote:
> And maybe the KVM user return notifier.

No, not really.  If anything, the place in KVM where it makes a
difference is vmx_save_host_state, which is also only using
always-present MSRs.  But don't worry about KVM.

First clean it up, then we can add back inline versions like __rdmsr or
rdmsr_fault or rdmsr_unsafe or whatever.

Paolo


* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21 16:36         ` Linus Torvalds
                             ` (2 preceding siblings ...)
  2015-09-22  7:14           ` Ingo Molnar
@ 2015-09-30 13:10           ` Peter Zijlstra
  2015-09-30 14:01             ` Ingo Molnar
  2015-09-30 18:32           ` H. Peter Anvin
  4 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2015-09-30 13:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andy Lutomirski, Thomas Gleixner, Paolo Bonzini,
	xen-devel, Arjan van de Ven, Andrew Morton, KVM list,
	the arch/x86 maintainers, Linux Kernel Mailing List

On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote:
> On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Linus, what's your preference?
> 
> So quite frankly, is there any reason we don't just implement
> native_read_msr() as just
> 
>    unsigned long long native_read_msr(unsigned int msr)
>    {
>       int err;
>       unsigned long long val;
> 
>       val = native_read_msr_safe(msr, &err);
>       WARN_ON_ONCE(err);
>       return val;
>    }
> 
> Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> done with it. I don't see the downside.
> 
> How many msr reads are *so* critical that the function call
> overhead would matter? Get rid of the inline version of the _safe()
> thing too, and put that thing there too.

There are a few in the perf code, and esp. on cores without a stack
engine the call overhead is noticeable. Also note that the perf MSRs are
generally optimized MSRs and less slow (we cannot say fast, they're
still MSRs) than regular MSRs.


* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-30 13:10           ` Peter Zijlstra
@ 2015-09-30 14:01             ` Ingo Molnar
  2015-09-30 18:04               ` Andy Lutomirski
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2015-09-30 14:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Andy Lutomirski, Thomas Gleixner, Paolo Bonzini,
	xen-devel, Arjan van de Ven, Andrew Morton, KVM list,
	the arch/x86 maintainers, Linux Kernel Mailing List


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote:
> > On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > > Linus, what's your preference?
> > 
> > So quite frankly, is there any reason we don't just implement
> > native_read_msr() as just
> > 
> >    unsigned long long native_read_msr(unsigned int msr)
> >    {
> >       int err;
> >       unsigned long long val;
> > 
> >       val = native_read_msr_safe(msr, &err);
> >       WARN_ON_ONCE(err);
> >       return val;
> >    }
> > 
> > Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> > done with it. I don't see the downside.
> > 
> > How many msr reads are *so* critical that the function call
> > overhead would matter? Get rid of the inline version of the _safe()
> > thing too, and put that thing there too.
> 
> There are a few in the perf code, and esp. on cores without a stack engine the 
> call overhead is noticeable. Also note that the perf MSRs are generally 
> optimized MSRs and less slow (we cannot say fast, they're still MSRs) than 
> regular MSRs.

These could still be open coded in an inlined fashion, like the scheduler usage.
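One way to read this suggestion, together with the raw_rdmsr idea later in the thread, is a split: ordinary callers share one out-of-line, checked accessor, and hot paths such as perf use an always-inlined, unchecked one. The sketch below is a userspace model under stated assumptions. raw_rdmsr_sketch(), rdmsr_checked_sketch() and the simulated MSR space are hypothetical. In the model a faulting raw read simply yields 0, whereas on real hardware it would still trap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical simulated MSR space: only indices below 8 "exist". */
static uint64_t sim_msr_space[8] = { [1] = 100, [2] = 200 };

/* Simulated hardware access: sets *fault instead of raising #GP. */
static inline uint64_t sim_hw_rdmsr(uint32_t msr, bool *fault)
{
	assert(fault != NULL);
	*fault = msr >= 8;
	return *fault ? 0 : sim_msr_space[msr];
}

/*
 * Hot-path accessor: always inlined, no checking, no call overhead.
 * Callers must only use MSRs known to exist (model ignores the fault;
 * real hardware would still trap).
 */
static inline __attribute__((always_inline)) uint64_t raw_rdmsr_sketch(uint32_t msr)
{
	bool fault;

	return sim_hw_rdmsr(msr, &fault);
}

static unsigned long checked_failures;  /* failures seen by the checked path */

/* Everyone else: a single out-of-line, checked entry point. */
static __attribute__((noinline)) uint64_t rdmsr_checked_sketch(uint32_t msr)
{
	bool fault;
	uint64_t val = sim_hw_rdmsr(msr, &fault);

	if (fault) {
		checked_failures++;
		fprintf(stderr, "rdmsr(0x%x) failed, returning 0\n", (unsigned)msr);
		return 0;
	}
	return val;
}
```

With this split, the function-call overhead Peter is concerned about is paid only by callers that are not performance-critical.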

Thanks,

	Ingo

* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-30 14:01             ` Ingo Molnar
@ 2015-09-30 18:04               ` Andy Lutomirski
  2015-10-01  7:15                 ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Andy Lutomirski @ 2015-09-30 18:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, Thomas Gleixner, Paolo Bonzini,
	xen-devel, Arjan van de Ven, Andrew Morton, KVM list,
	the arch/x86 maintainers, Linux Kernel Mailing List

On Wed, Sep 30, 2015 at 7:01 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote:
>> > On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> > >
>> > > Linus, what's your preference?
>> >
>> > So quite frankly, is there any reason we don't just implement
>> > native_read_msr() as just
>> >
>> >    unsigned long long native_read_msr(unsigned int msr)
>> >    {
>> >       int err;
>> >       unsigned long long val;
>> >
>> >       val = native_read_msr_safe(msr, &err);
>> >       WARN_ON_ONCE(err);
>> >       return val;
>> >    }
>> >
>> > Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
>> > done with it. I don't see the downside.
>> >
>> > How many msr reads are *so* critical that the function call
>> > overhead would matter? Get rid of the inline version of the _safe()
>> > thing too, and put that thing there too.
>>
>> There are a few in the perf code, and esp. on cores without a stack engine the
>> call overhead is noticeable. Also note that the perf MSRs are generally
>> optimized MSRs and less slow (we cannot say fast, they're still MSRs) than
>> regular MSRs.
>
> These could still be open coded in an inlined fashion, like the scheduler usage.
>

We could have a raw_rdmsr for those.

OTOH, I'm still not 100% convinced that this warn-but-don't-die
behavior is worth the effort.  This isn't a frequent source of bugs to
my knowledge, and we don't try to recover from incorrect cr writes,
out-of-bounds MMIO, etc, so do we really gain much by rigging a
recovery mechanism for rdmsr and wrmsr failures for code that doesn't
use the _safe variants?

--Andy

> Thanks,
>
>         Ingo



-- 
Andy Lutomirski
AMA Capital Management, LLC

* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-21 16:36         ` Linus Torvalds
                             ` (3 preceding siblings ...)
  2015-09-30 13:10           ` Peter Zijlstra
@ 2015-09-30 18:32           ` H. Peter Anvin
  4 siblings, 0 replies; 23+ messages in thread
From: H. Peter Anvin @ 2015-09-30 18:32 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andy Lutomirski, Thomas Gleixner, Paolo Bonzini, xen-devel,
	Arjan van de Ven, Andrew Morton, KVM list,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Peter Zijlstra

On 09/21/2015 09:36 AM, Linus Torvalds wrote:
> 
> How many msr reads are *so* critical that the function call
> overhead would matter? Get rid of the inline version of the _safe()
> thing too, and put that thing there too.
> 

Probably only the ones that may go in the context switch path.

	-hpa



* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-09-30 18:04               ` Andy Lutomirski
@ 2015-10-01  7:15                 ` Ingo Molnar
  2016-03-11 16:48                   ` Andy Lutomirski
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2015-10-01  7:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Linus Torvalds, Thomas Gleixner, Paolo Bonzini,
	xen-devel, Arjan van de Ven, Andrew Morton, KVM list,
	the arch/x86 maintainers, Linux Kernel Mailing List


* Andy Lutomirski <luto@amacapital.net> wrote:

> > These could still be open coded in an inlined fashion, like the scheduler usage.
> 
> We could have a raw_rdmsr for those.
> 
> OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior is 
> worth the effort.  This isn't a frequent source of bugs to my knowledge, and we 
> don't try to recover from incorrect cr writes, out-of-bounds MMIO, etc, so do we 
> really gain much by rigging a recovery mechanism for rdmsr and wrmsr failures 
> for code that doesn't use the _safe variants?

It's just the general principle really: don't crash the kernel on bootup. There are
few things more user-hostile than that.

Also, this would maintain the status quo: since we now (accidentally) don't crash 
the kernel on distro kernels (but silently and unsafely ignore the faulting 
instruction), we should not regress that behavior (by adding the chance to crash 
again), but improve upon it.
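The behaviour this patch aims for can be sketched as a fault-handling policy. According to the patch description it demotes the oops to a warning and returns zero for RDMSR, and the subject line suggests this applies only when panic_on_oops is not set. This is a simplified model, not the patch's code. struct sim_regs, sim_handle_rdmsr_fault() and the sim_panic_on_oops flag are hypothetical. The only hardware facts it relies on are that RDMSR is the 2-byte opcode 0F 32, takes the MSR index in ECX, and returns the value in EDX:EAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, heavily simplified register frame. */
struct sim_regs {
	uint64_t ip;      /* address of the faulting RDMSR */
	uint64_t ax, dx;  /* result registers (EDX:EAX) */
	uint32_t cx;      /* MSR index */
};

enum sim_outcome { SIM_OOPS, SIM_CONTINUE };

static bool sim_panic_on_oops;  /* models the panic_on_oops setting */
static int sim_faults;          /* faults handled so far */

/*
 * On a #GP from a non-"safe" RDMSR: with panic_on_oops set, keep the old
 * behaviour (oops). Otherwise warn once, make the read return zero, and
 * resume after the 2-byte RDMSR instruction.
 */
static enum sim_outcome sim_handle_rdmsr_fault(struct sim_regs *regs)
{
	assert(regs != NULL);
	if (sim_panic_on_oops)
		return SIM_OOPS;

	if (sim_faults++ == 0)
		fprintf(stderr, "non-safe RDMSR from MSR 0x%x faulted\n",
			(unsigned)regs->cx);

	regs->ax = 0;
	regs->dx = 0;
	regs->ip += 2;  /* skip 0F 32 */
	return SIM_CONTINUE;
}
```

Ingo's point about not regressing the status quo corresponds to the SIM_CONTINUE branch. Execution carries on as before, but the failure is now reported, and the caller sees a defined zero instead of whatever happened to be in the registers.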

Thanks,

	Ingo

* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2015-10-01  7:15                 ` Ingo Molnar
@ 2016-03-11 16:48                   ` Andy Lutomirski
  2016-03-12 16:02                     ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Andy Lutomirski @ 2016-03-11 16:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, Thomas Gleixner, Paolo Bonzini,
	xen-devel, Arjan van de Ven, Andrew Morton, KVM list,
	the arch/x86 maintainers, Linux Kernel Mailing List

On Thu, Oct 1, 2015 at 12:15 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Andy Lutomirski <luto@amacapital.net> wrote:
>
>> > These could still be open coded in an inlined fashion, like the scheduler usage.
>>
>> We could have a raw_rdmsr for those.
>>
>> OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior is
>> worth the effort.  This isn't a frequent source of bugs to my knowledge, and we
>> don't try to recover from incorrect cr writes, out-of-bounds MMIO, etc, so do we
>> really gain much by rigging a recovery mechanism for rdmsr and wrmsr failures
>> for code that doesn't use the _safe variants?
>
> It's just the general principle really: don't crash the kernel on bootup. There are
> few things more user-hostile than that.
>
> Also, this would maintain the status quo: since we now (accidentally) don't crash
> the kernel on distro kernels (but silently and unsafely ignore the faulting
> instruction), we should not regress that behavior (by adding the chance to crash
> again), but improve upon it.

Just a heads up: the extable improvements in tip:ras/core make it
straightforward to get the best of all worlds: explicit failure
handling (written in C!), no fast path overhead whatsoever, and no new
garbage in the exception handlers.

Patches coming once I test them.
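The extable approach described above attaches a C handler to each exception-table entry. Recovery logic then runs only when an instruction actually faults, and the non-faulting path executes nothing extra. The following is a toy userspace model of that lookup. It uses hypothetical names (struct sim_extable_entry, sim_fixup_exception(), sim_ex_handler_rdmsr_unsafe()) rather than the kernel's actual structures. It also uses a linear search for brevity, whereas a real table would be sorted and binary-searched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct sim_regs { uint64_t ip, ax, dx; };

struct sim_extable_entry;
typedef bool (*sim_ex_handler_t)(const struct sim_extable_entry *e,
				 struct sim_regs *regs);

/* Hypothetical exception-table entry carrying a C recovery handler. */
struct sim_extable_entry {
	uint64_t insn;             /* address that may fault */
	uint64_t fixup;            /* where execution resumes */
	sim_ex_handler_t handler;  /* decides how to recover */
};

/* Plain fixup: just resume at the fixup address. */
static bool sim_ex_handler_default(const struct sim_extable_entry *e,
				   struct sim_regs *regs)
{
	regs->ip = e->fixup;
	return true;
}

/* Non-"safe" RDMSR: complain, hand 0 back to the caller, resume. */
static bool sim_ex_handler_rdmsr_unsafe(const struct sim_extable_entry *e,
					struct sim_regs *regs)
{
	fprintf(stderr, "unchecked RDMSR at 0x%llx\n",
		(unsigned long long)e->insn);
	regs->ax = 0;
	regs->dx = 0;
	regs->ip = e->fixup;
	return true;
}

static const struct sim_extable_entry sim_extable[] = {
	{ 0x1000, 0x1002, sim_ex_handler_rdmsr_unsafe },
	{ 0x2000, 0x2010, sim_ex_handler_default },
};

/* Called only from the fault path; normal execution never looks here. */
static bool sim_fixup_exception(struct sim_regs *regs)
{
	assert(regs != NULL);
	for (size_t i = 0; i < sizeof(sim_extable) / sizeof(sim_extable[0]); i++) {
		if (sim_extable[i].insn == regs->ip)
			return sim_extable[i].handler(&sim_extable[i], regs);
	}
	return false;  /* no entry: a genuine oops */
}
```

Because the handler is selected per entry, the MSR-specific recovery lives in ordinary C next to the table rather than in the generic trap handlers.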

>
> Thanks,
>
>         Ingo



-- 
Andy Lutomirski
AMA Capital Management, LLC

* Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
  2016-03-11 16:48                   ` Andy Lutomirski
@ 2016-03-12 16:02                     ` Ingo Molnar
  0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2016-03-12 16:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Linus Torvalds, Thomas Gleixner, Paolo Bonzini,
	xen-devel, Arjan van de Ven, Andrew Morton, KVM list,
	the arch/x86 maintainers, Linux Kernel Mailing List


* Andy Lutomirski <luto@amacapital.net> wrote:

> On Thu, Oct 1, 2015 at 12:15 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Andy Lutomirski <luto@amacapital.net> wrote:
> >
> >> > These could still be open coded in an inlined fashion, like the scheduler usage.
> >>
> >> We could have a raw_rdmsr for those.
> >>
> >> OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior is
> >> worth the effort.  This isn't a frequent source of bugs to my knowledge, and we
> >> don't try to recover from incorrect cr writes, out-of-bounds MMIO, etc, so do we
> >> really gain much by rigging a recovery mechanism for rdmsr and wrmsr failures
> >> for code that doesn't use the _safe variants?
> >
> > It's just the general principle really: don't crash the kernel on bootup. There are
> > few things more user-hostile than that.
> >
> > Also, this would maintain the status quo: since we now (accidentally) don't crash
> > the kernel on distro kernels (but silently and unsafely ignore the faulting
> > instruction), we should not regress that behavior (by adding the chance to crash
> > again), but improve upon it.
> 
> Just a heads up: the extable improvements in tip:ras/core make it
> straightforward to get the best of all worlds: explicit failure
> handling (written in C!), no fast path overhead whatsoever, and no new
> garbage in the exception handlers.

I _knew_ I should have merged them into tip:x86/mm, not tip:ras/core ;-)

I had a quick look at your new MSR series and I'm very happy with that direction!

Thanks,

	Ingo

Thread overview: 23+ messages
2015-09-21  0:02 [PATCH v2 0/2] x86/msr: MSR access failure changes Andy Lutomirski
2015-09-21  0:02 ` [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops Andy Lutomirski
2015-09-21  0:15   ` Linus Torvalds
2015-09-21  1:13     ` Andy Lutomirski
2015-09-21  8:46       ` Ingo Molnar
2015-09-21 12:27         ` Paolo Bonzini
2015-09-21 16:36         ` Linus Torvalds
2015-09-21 16:49           ` Arjan van de Ven
2015-09-21 17:27             ` Linus Torvalds
2015-09-21 17:43             ` Andy Lutomirski
2015-09-22  8:12               ` Paolo Bonzini
2015-09-21 18:16           ` Andy Lutomirski
2015-09-21 18:36             ` Borislav Petkov
2015-09-21 18:47             ` Linus Torvalds
2015-09-22  7:14           ` Ingo Molnar
2015-09-30 13:10           ` Peter Zijlstra
2015-09-30 14:01             ` Ingo Molnar
2015-09-30 18:04               ` Andy Lutomirski
2015-10-01  7:15                 ` Ingo Molnar
2016-03-11 16:48                   ` Andy Lutomirski
2016-03-12 16:02                     ` Ingo Molnar
2015-09-30 18:32           ` H. Peter Anvin
2015-09-21  0:02 ` [PATCH v2 2/2] x86/msr: Set the return value to zero when native_rdmsr_safe fails Andy Lutomirski
