From: Andy Lutomirski <luto@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Borislav Petkov <bp@alien8.de>
Cc: Laura Abbott <labbott@redhat.com>, X86 ML <x86@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
stable <stable@vger.kernel.org>,
Andy Lutomirski <luto@kernel.org>
Subject: Re: Yet another KPTI regression with 4.14.x series in a VM
Date: Fri, 12 Jan 2018 22:08:20 -0800
Message-ID: <CALCETrWOpg_OXThyMC66JQu7JF=TxzRwFrEgunUZt3H+VUfxrQ@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.20.1801122250080.2371@nanos>
On Fri, Jan 12, 2018 at 1:51 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 12 Jan 2018, Laura Abbott wrote:
>
> Cc+ Andy
>
> I'm almost crashed out by now. Andy might have an idea. I'll look again
> tomorrow with brain awake.
>
>> On 01/12/2018 10:51 AM, Thomas Gleixner wrote:
>> > On Fri, 12 Jan 2018, Laura Abbott wrote:
>> > > Fedora got a bug report on 4.14.11 of a panic when booting a
>> > > Fedora guest in a CentOS 6 VM, not reproducible with nopti.
>> > > The issue is still present as of 4.14.13 as well. The only
>> > > report is a panic screenshot
>> > > https://bugzilla.redhat.com/show_bug.cgi?id=1532458
>> > >
>> > > I've lost track of all the fixes that have been flying around,
>> > > is this a new issue or has a fix not yet made it to stable?
>> >
>> > Hmm. Looks kinda familiar, but that has been fixed I think even before
>> > 4.4.11. Could you please ask the reporter to provide a full console log via
>> > the VM "serial console" ?
>> >
>> > Thanks,
>> >
>> > tglx
>> >
>>
>> [ 2.910025] PANIC: double fault, error_code: 0x0
Probably a stack overflow
>> [ 2.910025] CPU: 1 PID: 56 Comm: modprobe Not tainted
>> 4.14.13-300.fc27.x86_64 #1
>> [ 2.910025] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
>> [ 2.910025] task: ffff891f78dc3c00 task.stack: ffffa6eac0594000
>> [ 2.910025] RIP: 0010:vprintk_default+0x5/0x30
>> [ 2.910025] RSP: 0000:fffffe000002e000 EFLAGS: 00010046
>> [ 2.910025] RAX: 0000000000000000 RBX: fffffe000002e118 RCX:
>> 0000000000000001
>> [ 2.910025] RDX: 0000000000000000 RSI: fffffe000002e018 RDI:
>> ffffffffbe0715a0
>> [ 2.910025] RBP: fffffe000002e008 R08: ffffffffbe0bb565 R09:
>> ffffffffbe07159b
>> [ 2.910025] R10: fffffe000002e080 R11: 0000000000000000 R12:
>> ffffffffbe070fdd
>> [ 2.910025] R13: 0000000000000000 R14: 0000000000000000 R15:
>> 0000000000000000
>> [ 2.910025] FS: 0000000000000000(0000) GS:ffff891f7fd00000(0000)
>> knlGS:0000000000000000
>> [ 2.910025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 2.910025] CR2: fffffe000002dff8 CR3: 0000000078da6000 CR4:
>> 00000000000006e0
CR2 and RSP are consistent with stack overflow, but that's not
terribly important, since...
>> [ 2.910025] Call Trace:
>> [ 2.910025] <ENTRY_TRAMPOLINE>
>> [ 2.910025] ? vprintk_func+0x27/0x60
>> [ 2.910025] printk+0x52/0x6e
>> [ 2.910025] __die+0x6b/0xe0
>> [ 2.910025] die+0x2f/0x50
>> [ 2.910025] do_general_protection+0x149/0x160
This is the real problem here.
>> [ 2.910025] general_protection+0x2c/0x60
>> [ 2.910025] RIP: 0010:swapgs_restore_regs_and_return_to_usermode+0x6f/0x80
>> [ 2.910025] RSP: 0000:fffffe000002e1c8 EFLAGS: 00000006
>> [ 2.910025] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
>> 0000000000000000
>> [ 2.910025] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
>> 0000000078da7800
>> [ 2.910025] RBP: 0000000000000000 R08: 0000000000000000 R09:
>> 0000000000000000
>> [ 2.910025] R10: 0000000000000000 R11: 0000000000000000 R12:
>> 0000000000000000
>> [ 2.910025] R13: 0000000000000000 R14: 0000000000000000 R15:
>> 0000000000000000
>> [ 2.910025] </ENTRY_TRAMPOLINE>
ffffffff81a00b3e: 65 48 0f b3 3c 25 8e btr %rdi,%gs:0x1e168e
ffffffff81a00b45: 16 1e 00
ffffffff81a00b48: 48 89 c7 mov %rax,%rdi
ffffffff81a00b4b: eb 08 jmp 0xffffffff81a00b55
ffffffff81a00b4d: 48 89 c7 mov %rax,%rdi
ffffffff81a00b50: 48 0f ba ef 3f bts $0x3f,%rdi
ffffffff81a00b55: 48 81 cf 00 18 00 00 or $0x1800,%rdi
ffffffff81a00b5c: 0f 22 df mov %rdi,%cr3 <-- #GP here
ffffffff81a00b5f: 58 pop %rax
ffffffff81a00b60: 5f pop %rdi
The CPU didn't like writing 0x0000000078da7800 to %cr3.
I'm guessing that CR4.PCIDE=0.
Now this is quite a strange value to write to CR3. The 0x1000 bit
being set corresponds to the user pgdir, and the 0x800 part means that
we're using the "user" variant of the address space that would have
ASID=0. But this is nonsense, since the kernel never uses PCID 0 for
user mode. We always start at 1. The only exception is if
X86_FEATURE_PCID is off. But, if X86_FEATURE_PCID is off, then we
shouldn't be setting any PCID bits.
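
To make the decoding above concrete, here is a minimal sketch that splits the faulting CR3 value into the fields being discussed. The bit positions for the user pgdir (0x1000) and the user PCID variant (0x800) come from this message; CR4.PCIDE being bit 17 is the architectural definition. The function and field names are made up for illustration, and the CR4 value is the one printed in the double-fault dump, which I am assuming still reflects the state at the time of the CR3 write.

```python
# Field layout as described in the message (PTI_SWITCH_MASK = 0x1800).
USER_PGTABLE_BIT = 1 << 12   # 0x1000: selects the user pgdir
USER_PCID_BIT = 1 << 11      # 0x800: "user" variant of the PCID
CR4_PCIDE = 1 << 17          # architectural CR4.PCIDE bit


def decode_cr3(cr3, cr4):
    """Split a CR3 value into the pieces discussed above (illustrative helper)."""
    return {
        "page_base": cr3 & ~0xFFF,            # 4k-aligned page table base
        "user_pgdir": bool(cr3 & USER_PGTABLE_BIT),
        "pcid_field": cr3 & 0xFFF,            # low 12 bits, PCID when PCIDE=1
        "user_pcid_bit": bool(cr3 & USER_PCID_BIT),
        "asid_bits": cr3 & 0x7FF,             # PCID without the user bit
        "pcide": bool(cr4 & CR4_PCIDE),
    }


if __name__ == "__main__":
    # RDI at the faulting "mov %rdi,%cr3" and CR4 from the panic log.
    fields = decode_cr3(0x0000000078DA7800, 0x00000000000006E0)
    for name, value in fields.items():
        shown = hex(value) if isinstance(value, int) and not isinstance(value, bool) else value
        print(f"{name:14} {shown}")
```

With the values from the log, this reports the user pgdir bit set, a PCID field of 0x800 with no ASID bits below it, and PCIDE clear, which matches the guess that PCID bits are being written while PCID is disabled.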
In fact, it looks like this code is totally bogus and has never been
correct at all. Even in:
commit 4b1d5ae3b103eda43f9d0f85c355bb6995b03a30
Author: Peter Zijlstra <peterz@infradead.org>
Date: Mon Dec 4 15:07:59 2017 +0100
x86/mm: Use/Fix PCID to optimize user/kernel switches
We have:
.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
        ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
        mov     %cr3, \scratch_reg
        ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
...
.Lwrcr3_\@:
        /* Flip the PGD and ASID to the user version */
        orq     $(PTI_SWITCH_MASK), \scratch_reg
        mov     \scratch_reg, %cr3
.Lend_\@:
That's bogus. PTI_SWITCH_MASK is 0x1800, which has PCID = 0x800.
This should probably use an alternative to select between 0x1800 (PCID)
and 0x1000 (!PCID) depending on X86_FEATURE_PCID or just use an
entirely different label for the !PCID case.
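
As a rough model of the difference, the sketch below compares what the current macro ORs into CR3 with what a PCID-aware version would OR in. It is plain Python arithmetic, not kernel code; the constant and function names are invented for illustration, and the NOFLUSH/flush-mask handling elided by the "..." in the macro is deliberately left out.

```python
# Illustrative constants: the two halves of PTI_SWITCH_MASK (0x1800).
USER_PGTABLE_MASK = 0x1000   # flip to the user pgdir
USER_PCID_MASK = 0x800       # flip to the "user" PCID variant


def user_cr3_current(kernel_cr3, has_pcid):
    """Model of the macro as quoted: both bits are set unconditionally."""
    return kernel_cr3 | USER_PGTABLE_MASK | USER_PCID_MASK


def user_cr3_proposed(kernel_cr3, has_pcid):
    """Model of the suggestion: only touch PCID bits when PCID is in use."""
    mask = USER_PGTABLE_MASK
    if has_pcid:
        mask |= USER_PCID_MASK
    return kernel_cr3 | mask


if __name__ == "__main__":
    kernel_cr3 = 0x78DA6000  # CR3 from the panic log, no PCID bits set
    print(hex(user_cr3_current(kernel_cr3, has_pcid=False)))   # 0x78da7800
    print(hex(user_cr3_proposed(kernel_cr3, has_pcid=False)))  # 0x78da7000
```

In the !PCID case the current model reproduces the 0x78da7800 value from the log, while the proposed one leaves the low 12 bits clear.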
FWIW, this bit in SAVE_AND_SWITCH_TO_KERNEL_CR3
        testq   $(PTI_SWITCH_MASK), \scratch_reg
        jz      .Ldone_\@
is a bit silly, too. It's *correct* (I think), but shouldn't that
just be bt $(PTI_SWITCH_PGTABLES_BIT), \scratch_reg, with the obvious
caveat that the headers don't actually define PTI_SWITCH_PGTABLES_BIT?
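
A small sketch of why the two checks should agree: testing the whole mask and testing only the page-table bit give the same answer as long as the 0x800 bit is never set without the 0x1000 bit. That invariant is my assumption about the CR3 values the entry code writes, and the names below are stand-ins (as noted, PTI_SWITCH_PGTABLES_BIT is not actually defined in the headers).

```python
PTI_SWITCH_MASK = 0x1800
PGTABLES_BIT = 12  # hypothetical stand-in for PTI_SWITCH_PGTABLES_BIT


def is_user_cr3_testq(cr3):
    """Model of 'testq $(PTI_SWITCH_MASK)': any of the two bits set."""
    return (cr3 & PTI_SWITCH_MASK) != 0


def is_user_cr3_bt(cr3):
    """Model of 'bt $(PTI_SWITCH_PGTABLES_BIT)': only the pgdir bit."""
    return ((cr3 >> PGTABLES_BIT) & 1) == 1


if __name__ == "__main__":
    # Kernel CR3 without/with a PCID, user CR3 without/with the user PCID bit.
    for cr3 in (0x78DA6000, 0x78DA6001, 0x78DA7000, 0x78DA7801):
        print(hex(cr3), is_user_cr3_testq(cr3), is_user_cr3_bt(cr3))
```

The two checks would only disagree for a value such as 0x78da6800, with the 0x800 bit set on a kernel pgdir.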
Was this code *ever* tested with nopcid?