* [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
[not found] <cover.1442523997.git.luto@kernel.org>
@ 2015-09-17 21:11 ` Andy Lutomirski
2015-09-17 21:11 ` [PATCH 2/2] x86/msr: Set the return value to zero when native_rdmsr_safe fails Andy Lutomirski
[not found] ` <203bb8a52efae1781281fb70ccd45c3e164fbce2.1442523997.git.luto@kernel.org>
2 siblings, 0 replies; 3+ messages in thread
From: Andy Lutomirski @ 2015-09-17 21:11 UTC (permalink / raw)
To: x86
Cc: KVM list, Peter Zijlstra, Linus Torvalds, linux-kernel,
xen-devel, Thomas Gleixner, Andy Lutomirski, Paolo Bonzini,
Andrew Morton, Arjan van de Ven, Ingo Molnar
This demotes an OOPS and likely panic due to a failed non-"safe" MSR
access to a WARN_ON_ONCE and a return of poisoned values (in the
RDMSR case). We still write a pr_info entry unconditionally for
debugging.
To be clear, this type of failure should *not* happen. This patch
exists to minimize the chance of nasty undebuggable failures due on
systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/kernel/traps.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 346eec73f7db..b7731765017a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -437,6 +437,54 @@ exit_trap:
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
}
+static bool paper_over_kernel_gpf(struct pt_regs *regs)
+{
+ /*
+ * Try to decode the opcode that failed. So far, we only care
+ * about boring two-byte unprefixed opcodes, so we don't need
+ * the full instruction decoder machinery.
+ */
+ u16 opcode;
+
+ if (probe_kernel_read(&opcode, (const void *)regs->ip, sizeof(opcode)))
+ return false;
+
+ if (opcode == 0x320f) {
+ /* RDMSR */
+ pr_info("bad kernel RDMSR from non-existent MSR 0x%x",
+ (unsigned int)regs->cx);
+ if (!panic_on_oops) {
+ WARN_ON_ONCE(true);
+
+ /* Patch it up with deterministic poison. */
+ regs->ax = 0x5aadc0de;
+ regs->dx = 0x8badf00d;
+ regs->ip += 2;
+ return true;
+ } else {
+ /* Don't fix it up. */
+ return false;
+ }
+ } else if (opcode == 0x300f) {
+ /* WRMSR */
+ pr_info("bad kernel WRMSR writing 0x%08x%08x to MSR 0x%x",
+ (unsigned int)regs->dx, (unsigned int)regs->ax,
+ (unsigned int)regs->cx);
+ if (!panic_on_oops) {
+ WARN_ON_ONCE(true);
+
+ /* Pretend it worked and carry on. */
+ regs->ip += 2;
+ return true;
+ } else {
+ /* Don't fix it up. */
+ return false;
+ }
+ }
+
+ return false;
+}
+
dotraplinkage void
do_general_protection(struct pt_regs *regs, long error_code)
{
@@ -456,6 +504,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
if (fixup_exception(regs))
return;
+ if (paper_over_kernel_gpf(regs))
+ return;
+
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = X86_TRAP_GP;
if (notify_die(DIE_GPF, "general protection fault", regs, error_code,
--
2.4.3
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH 2/2] x86/msr: Set the return value to zero when native_rdmsr_safe fails
[not found] <cover.1442523997.git.luto@kernel.org>
2015-09-17 21:11 ` [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops Andy Lutomirski
@ 2015-09-17 21:11 ` Andy Lutomirski
[not found] ` <203bb8a52efae1781281fb70ccd45c3e164fbce2.1442523997.git.luto@kernel.org>
2 siblings, 0 replies; 3+ messages in thread
From: Andy Lutomirski @ 2015-09-17 21:11 UTC (permalink / raw)
To: x86
Cc: KVM list, Peter Zijlstra, Linus Torvalds, linux-kernel,
xen-devel, Thomas Gleixner, Andy Lutomirski, Paolo Bonzini,
Andrew Morton, Arjan van de Ven, Ingo Molnar
This will cause unchecked native_rdmsr_safe failures to return
deterministic results.
We could poison the results with something other than zero, but that
would increase code size.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/include/asm/msr.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 77d8b284e4a7..9eda52205d42 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -73,7 +73,10 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
asm volatile("2: rdmsr ; xor %[err],%[err]\n"
"1:\n\t"
".section .fixup,\"ax\"\n\t"
- "3: mov %[fault],%[err] ; jmp 1b\n\t"
+ "3: mov %[fault],%[err]\n\t"
+ "xorl %%eax, %%eax\n\t"
+ "xorl %%edx, %%edx\n\t"
+ "jmp 1b\n\t"
".previous\n\t"
_ASM_EXTABLE(2b, 3b)
: [err] "=r" (*err), EAX_EDX_RET(val, low, high)
--
2.4.3
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
[not found] ` <203bb8a52efae1781281fb70ccd45c3e164fbce2.1442523997.git.luto@kernel.org>
@ 2015-09-18 7:14 ` Ingo Molnar
0 siblings, 0 replies; 3+ messages in thread
From: Ingo Molnar @ 2015-09-18 7:14 UTC (permalink / raw)
To: Andy Lutomirski
Cc: KVM list, Peter Zijlstra, Linus Torvalds, x86, linux-kernel,
xen-devel, Paolo Bonzini, Andrew Morton, Arjan van de Ven,
Thomas Gleixner
* Andy Lutomirski <luto@kernel.org> wrote:
> This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> access to a WARN_ON_ONCE and a return of poisoned values (in the
> RDMSR case). We still write a pr_info entry unconditionally for
> debugging.
>
> To be clear, this type of failure should *not* happen. This patch
> exists to minimize the chance of nasty undebuggable failures due on
> systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.
> + if (opcode == 0x320f) {
> + /* RDMSR */
> + pr_info("bad kernel RDMSR from non-existent MSR 0x%x",
> + (unsigned int)regs->cx);
> + if (!panic_on_oops) {
> + WARN_ON_ONCE(true);
> +
> + /* Patch it up with deterministic poison. */
> + regs->ax = 0x5aadc0de;
> + regs->dx = 0x8badf00d;
> + regs->ip += 2;
> + return true;
IMHO this should really not poison the result, but use zero as the result.
The poison might randomly indicate 'present' feature in various registers that
might be accessed in a buggy way. Don't send the code further down into la-la-land
by giving it a 'success'.
And yes, zero can mean success too, but we have to pick a side here ...
The warning will be enough to fix these ups, people (and in particular distro
testing people) will be watching out for them.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2015-09-18 7:14 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <cover.1442523997.git.luto@kernel.org>
2015-09-17 21:11 ` [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops Andy Lutomirski
2015-09-17 21:11 ` [PATCH 2/2] x86/msr: Set the return value to zero when native_rdmsr_safe fails Andy Lutomirski
[not found] ` <203bb8a52efae1781281fb70ccd45c3e164fbce2.1442523997.git.luto@kernel.org>
2015-09-18 7:14 ` [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).