* Re: BUG: unable to handle kernel paging request in __switch_to
       [not found] <001a1145e8548cbd3d055f73374f@google.com>
@ 2017-12-14 17:12 ` Thomas Gleixner
  2017-12-14 18:42   ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2017-12-14 17:12 UTC
  To: syzbot
  Cc: bp, dsafonov, hpa, linux-kernel, luto, me, mingo, syzkaller-bugs, x86

On Sun, 3 Dec 2017, syzbot wrote:
> BUG: unable to handle kernel paging request at fffffffffffffff8
> IP: switch_fpu_prepare arch/x86/include/asm/fpu/internal.h:535 [inline]
> IP: __switch_to+0x95b/0x1330 arch/x86/kernel/process_64.c:407
> PGD 5e28067 P4D 5e28067 PUD 5e2a067 PMD 0
> Oops: 0002 [#1] SMP KASAN
> Dumping ftrace buffer:
>   (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 4355 Comm: syz-executor1 Not tainted 4.15.0-rc1-next-20171129+ #55
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> 01/01/2011
> task: ffff8801cf1e80c0 task.stack: ffff8801d03a8000
> RIP: 0010:switch_fpu_prepare arch/x86/include/asm/fpu/internal.h:535 [inline]
> RIP: 0010:__switch_to+0x95b/0x1330 arch/x86/kernel/process_64.c:407
> RSP: 0018:ffff8801cb867468 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: ffff8801cc0b8500 RCX: ffff8801cc0b9a00
> RDX: 1ffff10039e3d2d0 RSI: 0000000000000000 RDI: ffff8801cf1e96c0
> RBP: ffff8801cb867628 R08: ffff8801db427918 R09: 1ffff1003a075dfe
> R10: ffff8801cf1e80c0 R11: 0000000000000003 R12: ffff8801cf1e80c0
> R13: ffff8801cf1e96c0 R14: ffff8801cf1e9680 R15: ffff8801cf1e95c0
> FS:  00007f16e6ea0700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: fffffffffffffff8 CR3: 00000001cc778000 CR4: 00000000001426f0
> Call Trace:
> Code: b8 00 00 00 00 00 fc ff df 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f
> 8e d5 06 00 00 8b 85 70 fe ff ff 41 89 84 24 c0 15 00 00 <cc> 1f 44 00 00 65

  <cc> is an int3 !?!?!

> 8b 05 99 01 dc 7e 89 c0 48 0f a3 05 df 97 39

That's the second report I'm staring at today which has CR2
fffffffffffffffx and points to a faulting instruction which does not make
any sense at all.

Thanks,

	tglx


* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-14 17:12 ` BUG: unable to handle kernel paging request in __switch_to Thomas Gleixner
@ 2017-12-14 18:42   ` Linus Torvalds
  2017-12-14 18:54     ` Andy Lutomirski
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2017-12-14 18:42 UTC
  To: Thomas Gleixner
  Cc: syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin,
	Linux Kernel Mailing List, Andrew Lutomirski, Kyle Huey,
	Ingo Molnar, syzkaller-bugs, the arch/x86 maintainers

On Thu, Dec 14, 2017 at 9:12 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sun, 3 Dec 2017, syzbot wrote:
>> BUG: unable to handle kernel paging request at fffffffffffffff8
>> Oops: 0002 [#1] SMP KASAN

System write of a non-existent page.
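
For readers who want to check that reading of "Oops: 0002", here is a minimal sketch that decodes the low bits of an x86 page-fault error code. The bit layout follows the Intel SDM; the function and constant names are made up for this illustration and are not kernel identifiers.

```python
# Low bits of the x86 page-fault error code (Intel SDM, Vol. 3A).
# Names are illustrative only, not the kernel's own definitions.
PF_PRESENT = 1 << 0  # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1    # 0: read access, 1: write access
PF_USER = 1 << 2     # 0: supervisor (kernel) access, 1: user access
PF_RSVD = 1 << 3     # reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4    # fault was an instruction fetch


def decode_pf_error_code(code):
    """Return the meaning of each low bit of a page-fault error code."""
    return {
        "present": bool(code & PF_PRESENT),
        "write": bool(code & PF_WRITE),
        "user": bool(code & PF_USER),
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


# "Oops: 0002" is printed in hex: write bit set, everything else clear.
decoded = decode_pf_error_code(0x0002)
print(decoded)
```

Only the write bit is set, so this was a kernel-mode ("system") write to a page that is not present.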

>> RIP: 0010:switch_fpu_prepare arch/x86/include/asm/fpu/internal.h:535 [inline]
>> RIP: 0010:__switch_to+0x95b/0x1330 arch/x86/kernel/process_64.c:407

This says it's

     old_fpu->last_cpu = cpu;

and the code disassembly ends up looking something like this:

   0: 48 c1 ea 03          shr    $0x3,%rdx
   4: 0f b6 04 02          movzbl (%rdx,%rax,1),%eax
   8: 84 c0                test   %al,%al
   a: 74 08                je     0x14
   c: 3c 03                cmp    $0x3,%al
   e: 0f 8e d5 06 00 00    jle    0x6e9
  14: 8b 85 70 fe ff ff    mov    -0x190(%rbp),%eax
  1a: 41 89 84 24 c0 15 00 mov    %eax,0x15c0(%r12)
  21: 00
  22:* cc                    int3    <-- trapping instruction

where those two preceding "mov" instructions look like they might indeed be that

     old_fpu->last_cpu = cpu;

thing, and the register state doesn't look insane for this.
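
A quick sanity check of "the register state doesn't look insane" can be sketched from the values in the oops. The assumption here is that the shr/movzbl/test sequence is the usual KASAN shadow check for the store that follows, and that the "00 00 00 00 00 fc ff df" immediate at the start of the Code: line is the shadow offset loaded into %rax; neither is stated explicitly in the report.

```python
# Register values copied from the oops.
R12 = 0xFFFF8801CF1E80C0
R14 = 0xFFFF8801CF1E9680
RDX = 0x1FFFF10039E3D2D0
RAX = 0x0

# Assumption: little-endian immediate "00 00 00 00 00 fc ff df" from the
# Code: line, taken to be the KASAN shadow offset.
SHADOW_OFFSET = 0xDFFFFC0000000000
MASK64 = (1 << 64) - 1

# Target of "mov %eax,0x15c0(%r12)".
store_addr = (R12 + 0x15C0) & MASK64

# "shr $0x3,%rdx" leaves the address divided by 8 in %rdx.
shifted = store_addr >> 3

# "movzbl (%rdx,%rax,1),%eax" reads the shadow byte at this address.
shadow_addr = (shifted + SHADOW_OFFSET) & MASK64

print(hex(store_addr))   # equals R14
print(hex(shifted))      # equals RDX
print(hex(shadow_addr))
```

Under those assumptions the store address equals R14, its shifted form equals RDX, and RAX = 0 is consistent with the value being stored on "CPU: 0".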

So I think the RIP->line encoding is slightly off, and that "int3" is
almost certainly due to the very next thing after the write:

                trace_x86_fpu_regs_deactivated(old_fpu);

and that actually makes sense if the test robot is doing some tracing,
particularly if it's just about to _start_ tracing, and it has
replaced the first byte of the instruction with 'int3' and is in the
process of doing the rewrite.

The fact that it then takes a system write fault is because some GDT
or IDT setup is screwed up. Or possibly the stack is screwed up and
started out as 0, and then the push to the stack would decrement the
stack pointer and try to push the error state or something.

> That's the second report I'm staring at today which has CR2
> fffffffffffffffx and points to a faulting instruction which does not make
> any sense at all.

That actually does make sense - see above.  It just requires that race
with the instruction rewriting.

*Normally* we never actually take the "int3" exception, because
normally we'll have completed the rewrite before another CPU actually
executes the instruction that is being rewritten.
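
The rewrite being described can be sketched as a small simulation. This is only a model of the breakpoint-based patching order as I understand it (int3 over the first byte, then the tail, then the final first byte, with the kernel synchronizing the other CPUs between steps); the jump displacement is a made-up value and the helper name is hypothetical.

```python
import struct

INT3 = 0xCC
NOP5 = bytes([0x0F, 0x1F, 0x44, 0x00, 0x00])
# Hypothetical target: "jmp rel32" with an arbitrary displacement of 0x10.
JMP5 = bytes([0xE9]) + struct.pack("<i", 0x10)


def poke_steps(old, new):
    """Return the bytes other CPUs could observe after each patching step."""
    if len(old) != len(new):
        raise ValueError("old and new instructions must have equal length")
    text = bytearray(old)
    states = [bytes(text)]      # original instruction
    text[0] = INT3              # step 1: breakpoint over the first byte
    states.append(bytes(text))
    text[1:] = new[1:]          # step 2: rewrite everything but byte 0
    states.append(bytes(text))
    text[0] = new[0]            # step 3: replace int3 with the new opcode
    states.append(bytes(text))
    return states


states = poke_steps(NOP5, JMP5)
for state in states:
    print(" ".join(f"{b:02x}" for b in state))
```

The second state, cc 1f 44 00 00, is the byte pattern at the trapping address in the Code: line of the oops, which is what a CPU would fetch if it ran the instruction in the middle of the rewrite.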

So I'm assuming this is with the page table isolation, and some
unusual case in exception handling got screwed up.

                 Linus


* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-14 18:42   ` Linus Torvalds
@ 2017-12-14 18:54     ` Andy Lutomirski
  2017-12-14 19:28       ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Andy Lutomirski @ 2017-12-14 18:54 UTC
  To: Linus Torvalds
  Cc: Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov,
	Peter Anvin, Linux Kernel Mailing List, Andrew Lutomirski,
	Kyle Huey, Ingo Molnar, syzkaller-bugs, the arch/x86 maintainers

SDM time.  Assuming the CPU actually decoded int3 and tried to execute
it, I can see a couple possible outcomes:

1. Something's wrong with the IDT and it can't read the vector.  I
think this would end up triple-faulting, though.

2. It actually tries to handle the breakpoint.  A breakpoint is a
benign exception, so any exception encountered while delivering it
would result in serial delivery.  I've never thought that serial
delivery made any sense -- presumably it just cancels the breakpoint
and delivers the other exception.  So this *could* be a page fault hit
during delivery of the int3 exception.  I don't believe it's a GDT
problem, though, because that would also likely lead to a triple
fault.  What I *would* believe is that the IST table got messed up and
we're seeing the result of trying to push to the stack with the
initial RSP=0 so the fault hits at address -8.
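
The "address -8" arithmetic can be written out as a small sketch. It models only the first 8-byte push of 64-bit exception delivery (the stack pointer is aligned down to 16 bytes, then SS is pushed); the function name is invented for the example.

```python
MASK64 = (1 << 64) - 1


def first_push_address(initial_rsp):
    """Address written by the first 8-byte push of exception delivery."""
    aligned = initial_rsp & ~0xF & MASK64  # 64-bit delivery aligns RSP to 16
    return (aligned - 8) & MASK64          # the push pre-decrements by 8


fault_addr = first_push_address(0)
print(hex(fault_addr))  # 0xfffffffffffffff8
```

A zero stack pointer wraps around to 0xfffffffffffffff8, which is the CR2 value in the report.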

I have no idea how that would happen, though.  Especially since int3
from userspace would have exactly the same problem, and we exercise
that code in the selftests.


* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-14 18:54     ` Andy Lutomirski
@ 2017-12-14 19:28       ` Linus Torvalds
  2017-12-14 21:27         ` Andy Lutomirski
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2017-12-14 19:28 UTC
  To: Andy Lutomirski
  Cc: Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov,
	Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar,
	syzkaller-bugs, the arch/x86 maintainers

On Thu, Dec 14, 2017 at 10:54 AM, Andy Lutomirski <luto@kernel.org> wrote:
>
> 2. It actually tries to handle the breakpoint.  A breakpoint is a
> benign exception, so any exception encountered while delivering it
> would result in serial delivery.

I don't think that's the case. "int3" is entirely synchronous, and
doesn't have the same odd issues as a breakpoint trap (which honors RF
etc). It's literally just a one-byte shorthand for "int $3".

There should be no serial delivery, although obviously if it's a trap
gate (as opposed to an interrupt gate), you can get a normal external
interrupt on the first instruction of the exception handler.

But that's not what the oops says: it says it happens on the "int3" instruction.

Now, it is possible that the "int3" was written _after_ the CPU took a
real page fault on the original instruction, and that the original
instruction actually caused a perfectly normal page fault, and then we
just report the "int3" because another CPU overwrote the instruction
after the original instruction had already trapped.

But that makes very little sense either. I really do think it's the
"int3" itself that causes the page fault due to some IDT/GDT change.
Because that would actually make sense considering what has changed in
the tree that Thomas is running.

Plus I think the instruction that gets overwritten is just a 5-byte
nop isn't it? So it really shouldn't take a fault without the "int3"
overwriting.

[ Goes back to the original report ]

Yeah, so looking back at the "Code:" line, the faulting instruction
looked like this:

  <cc> 1f 44 00 00

and a P6_NOP5 is

  #define P6_NOP5     0x0f,0x1f,0x44,0x00,0

so it's definitely "first byte of a 5-byte nop has been overwritten
with a 'int3' instruction". The nop does not fault on its own.
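
The byte comparison is easy to reproduce. This sketch compares the P6_NOP5 bytes quoted above with the five bytes at the trapping address in the Code: line; the variable names are just for illustration.

```python
P6_NOP5 = bytes([0x0F, 0x1F, 0x44, 0x00, 0x00])
observed = bytes.fromhex("cc 1f 44 00 00")  # from the Code: line of the oops

# Indices at which the observed bytes differ from the 5-byte nop.
diff = [i for i, (a, b) in enumerate(zip(P6_NOP5, observed)) if a != b]

print(diff)               # [0]
print(hex(observed[0]))   # 0xcc, the int3 opcode
```

Only byte 0 differs, and it holds 0xcc.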

                    Linus


* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-14 19:28       ` Linus Torvalds
@ 2017-12-14 21:27         ` Andy Lutomirski
  2017-12-14 21:39           ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Andy Lutomirski @ 2017-12-14 21:27 UTC
  To: Linus Torvalds
  Cc: Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov,
	Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List,
	Kyle Huey, Ingo Molnar, syzkaller-bugs, the arch/x86 maintainers

On Thu, Dec 14, 2017 at 11:28 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Dec 14, 2017 at 10:54 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> 2. It actually tries to handle the breakpoint.  A breakpoint is a
>> benign exception, so any exception encountered while delivering it
>> would result in serial delivery.
>
> I don't think that's the case. "int3" is entirely synchronous, and
> doesn't have the same odd issues as a breakpoint trap (which honors RF
> etc). It's literally just a one-byte shorthand for "int $3".
>

The SDM says precisely the same thing about INT N, so, whichever way
you dice it, int3 is a benign exception.

> There should be no serial delivery, although obviously if it's a trap
> gate (as opposed to an interrupt gate), you can get a normal external
> interrupt on the first instruction of the exception handler.
>
> But that's not what the oops says: it says it happens on the "int3" instruction.
>
> Now, it is possible that the "int3" was written _after_ the CPU took a
> real page fault on the original instruction, and that the original
> instruction actually caused a perfectly normal page fault, and then we
> just report the "int3" because another CPU overwrote the instruction
> after the original instruction had already trapped.
>
> But that makes very little sense either. I really do think it's the
> "int3" itself that causes the page fault due to some IDT/GDT change.
> Because that would actually make sense considering what has changed in
> the tree that Thomas is running.

I still have trouble figuring what IDT or GDT error would cause a page
fault and not a double-fault or triple-fault.  So I like my
bogus-IST-in-the-TSS theory more, even if I have no idea how it would
happen.  Entry stack underflow?  Overflow of whatever is mapped just
above the TSS in that kernel?  Some kind of fuckup where ioperm()
overwrote the IST?  (I tested that, but who knows?  This is a fuzz
test, after all.)

0xfffffffffffffff8 is *exactly* where the fault would be if the
microcoded push of SS faulted if the IST contained zeros.

Hmm.  There is another way that could happen.  If the IDT ended up
with the wrong IST entry, we could get the same failure.  But I don't
see how that would happen either.

Maybe it's the bloody debug_idt thing blowing up?

>
> Plus I think the instruction that gets overwritten is just a 5-byte
> nop isn't it? So it really shouldn't take a fault without the "int3"
> overwriting.

Unless it was being overwritten the other way and the oops hit while
tracing was being turned *off*.

--Andy


* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-14 21:27         ` Andy Lutomirski
@ 2017-12-14 21:39           ` Linus Torvalds
  2017-12-15  9:07             ` Dmitry Vyukov
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2017-12-14 21:39 UTC
  To: Andy Lutomirski
  Cc: Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov,
	Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar,
	syzkaller-bugs, the arch/x86 maintainers

On Thu, Dec 14, 2017 at 1:27 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Dec 14, 2017 at 11:28 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> I don't think that's the case. "int3" is entirely synchronous, and
>> doesn't have the same odd issues as a breakpoint trap (which honors RF
>> etc). It's literally just a one-byte shorthand for "int $3".
>
> The SDM says precisely the same thing about INT N, so, whichever way
> you dice it, int3 is a benign exception.

That just means that it doesn't double-fault when it takes the page fault.

Which we already know, because we see a page fault, not a double fault.

> 0xfffffffffffffff8 is *exactly* where the fault would be if the
> microcoded push of SS faulted if the IST contained zeros.

Yes, I suspect it's the stack that is buggered for some reason.

>> Plus I think the instruction that gets overwritten is just a 5-byte
>> nop isn't it? So it really shouldn't take a fault without the "int3"
>> overwriting.
>
> Unless it was being overwritten the other way and the oops hit while
> tracing was being turned *off*.

Doesn't really matter. The two forms of that instruction are "5-byte
nop" and "unconditional branch".

Neither of them will write to anything - the only page fault they
could take is for instruction fetch.

So it really must be the "int3" that fails. Unless we're looking at
some odd CPU errata, which sounds very very unlikely.

                 Linus


* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-14 21:39           ` Linus Torvalds
@ 2017-12-15  9:07             ` Dmitry Vyukov
  2017-12-15  9:13               ` Dmitry Vyukov
  2017-12-15  9:49               ` Thomas Gleixner
  0 siblings, 2 replies; 20+ messages in thread
From: Dmitry Vyukov @ 2017-12-15  9:07 UTC
  To: Linus Torvalds
  Cc: Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov,
	Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List,
	Kyle Huey, Ingo Molnar, syzkaller-bugs, the arch/x86 maintainers

FTR the commit is:

commit d127129e85a020879f334154300ddd3f7ec21c1e (HEAD, tag: next-20171129)
Author: Stephen Rothwell <sfr@canb.auug.org.au>
Date:   Wed Nov 29 14:09:56 2017 +1100
    Add linux-next specific files for 20171129

You can get it from
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next-history.git
Compiler is this: https://storage.googleapis.com/syzkaller/gcc-7.tar.gz
Config was attached.

I've built this exact kernel and here is __switch_to disasm:
https://gist.githubusercontent.com/dvyukov/8137559f7da08fbe32f9018972a4498c/raw/0ef2abf723b117f0d0f0306fd50e216d50c5cecb/gistfile1.txt

__switch_to+0x95b seems to point to (?):

ffffffff81252f6b: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)

which is a branch target alignment nop.
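
The symbol+offset lookup can be double-checked with a little arithmetic. The address below comes from this rebuilt kernel, so the sketch assumes the rebuild has the same __switch_to layout as the kernel that crashed; 0x95b and 0x1330 are the offset and function size from the oops.

```python
nop_addr = 0xFFFFFFFF81252F6B  # address of the nopl in the rebuilt kernel
offset = 0x95B                 # from "__switch_to+0x95b/0x1330"
size = 0x1330                  # function size from the same oops line

func_start = nop_addr - offset
func_end = func_start + size

print(hex(func_start))  # 0xffffffff81252610
print(hex(func_end))    # 0xffffffff81253940
print(func_start <= nop_addr < func_end)
```

If the disassembly lists __switch_to at the computed start address, then +0x95b does land on that 5-byte nop.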

We have a bunch of semi-similar nonsense crashes on syzbot:

https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/9nMSJo9jmGs/tkRYgZ-XAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/04-q4OZrerA/XfYdNnWXAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/6iC6rPtAHKQ/UiZ4fnWXAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/2zSDbzRIH_k/SLCMqmeXAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/uEsjx8VISco/Mwu_pbGWAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/kZ6Z7UQLbCQ/JHpjTGeXAwAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/UjYsJxiGxwU/mponQq2XAwAJ

Lots of them are at address 0xfffffffffffffff8.

I have some suspicion towards KVM. Potentially a nested KVM messed up
host processor state (CRn or page tables) so that we then get these
weird crashes.

One question: what would a triple fault look like? I am asking because we
have hundreds of cases where the kernel just starts silently rebooting
while running some unprivileged syscalls:
https://groups.google.com/forum/#!msg/syzkaller-bugs/w8dkVNrgzrc/4mLJLOAbCgAJ
Can these be triple faults? The reproducer for that one also seems to be
related to KVM.


* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-15  9:07             ` Dmitry Vyukov
@ 2017-12-15  9:13               ` Dmitry Vyukov
  2017-12-15  9:38                 ` Dmitry Vyukov
  2017-12-15  9:49               ` Thomas Gleixner
  1 sibling, 1 reply; 20+ messages in thread
From: Dmitry Vyukov @ 2017-12-15  9:13 UTC
  To: Linus Torvalds
  Cc: Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov,
	Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List,
	Kyle Huey, Ingo Molnar, syzkaller-bugs, the arch/x86 maintainers

Well, actually, replaying the log for this crash and for
https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
with:

./syz-execprog -procs=10 -sandbox=namespace -repeat=0 raw.txt
(you can find exact instructions on how to do this here
https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md)

I've got:


[  121.553588] binder: 3856:3857 ioctl 40046205 0 returned -22
[  121.557656] binder: 3856:3857 ERROR: BC_REGISTER_LOOPER called without request
[  121.559744] binder: 3857 RLIMIT_NICE not set
[  121.586339] binder: 3857 RLIMIT_NICE not set
[  121.591764] binder: 3856:3857 unknown command 1400526783
[  121.593226] binder: 3856:3857 ioctl c0306201 20002fd0 returned -22
[  121.598292] binder: 3857 RLIMIT_NICE not set
[  121.600827] binder: 3856:3857 ioctl c018620b 20000fe8 returned -14
[  121.618284] binder: 3856:3857 BC_FREE_BUFFER uffffffffffffffff no match
[  121.622181] binder: 3856:3857 got reply transaction with no transaction stack
[  121.626345] binder: 3856:3857 transaction failed 29201/-71, size 72-56 line 2747
[  121.628912] binder: 3856:3857 ioctl c0306201 20005fd0 returned -14
[  121.635620] binder: unexpected work type, 4, not freed
[  121.639753] binder: undelivered TRANSACTION_COMPLETE
[  121.645213] binder: undelivered TRANSACTION_ERROR: 29201
[  121.654860] binder: 3856:3857 BC_FREE_BUFFER u00000000ffffffff no match
[  121.667216] *** Guest State ***
[  121.667728] CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
early console in extract_kernel
input_data: 0x0000000005f13276
input_len: 0x0000000001e7fa4c
output: 0x0000000001000000
output_len: 0x0000000005c85958
kernel_total_size: 0x0000000006db2000

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[    0.000000] Linux version 4.15.0-rc1-next-20171129 (dvyukov@dvyukov-z840.muc.corp.google.com) (gcc version 7.1.1 20170620 (GCC)) #1 SMP Fri Dec 15 09:25:01 CET 2017
[    0.000000] Command line: kvm-intel.nested=1 kvm-intel.unrestricted_guest=1 kvm-intel.ept=1 kvm-intel.flexpriority=1 kvm-intel.vpid=1 kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1 kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1 kvm-intel.enable_apicv=1 console=ttyS0 root=/dev/sda earlyprintk=serial slub_debug=UZ vsyscall=native rodata=n oops=panic panic_on_warn=1 panic=86400
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] e820: BIOS-provided physical RAM map:
...


* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-15  9:13               ` Dmitry Vyukov
@ 2017-12-15  9:38                 ` Dmitry Vyukov
  2017-12-15  9:40                   ` Wanpeng Li
  2017-12-15  9:51                   ` David Hildenbrand
  0 siblings, 2 replies; 20+ messages in thread
From: Dmitry Vyukov @ 2017-12-15  9:38 UTC
  To: Linus Torvalds
  Cc: Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov,
	Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List,
	Kyle Huey, Ingo Molnar, syzkaller-bugs, the arch/x86 maintainers,
	Paolo Bonzini, Radim Krčmář,
	KVM list, tianyu.lan, James Mattson, Wanpeng Li,
	David Hildenbrand

On Fri, Dec 15, 2017 at 10:13 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Dec 15, 2017 at 10:07 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Thu, Dec 14, 2017 at 10:39 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>> On Thu, Dec 14, 2017 at 1:27 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>> On Thu, Dec 14, 2017 at 11:28 AM, Linus Torvalds
>>>> <torvalds@linux-foundation.org> wrote:
>>>>> I don't think that's the case. "int3" is entirely synchronous, and
>>>>> doesn't have the same odd issues as a breakpoint trap (which honors RF
>>>>> etc). It's literally just a one-byte shorthand for "int $3".
>>>>
>>>> The SDM says precisely the same thing about INT N, so, whichever way
>>>> you dice it, int3 is a benign exception.
>>>
>>> That just means that it doesn't double-fault when it takes the page fault.
>>>
>>> Which we already know, because we see a page fault, not a double fault.
>>>
>>>> 0xfffffffffffffff8 is *exactly* where the fault would be if the
>>>> microcoded push of SS faulted if the IST contained zeros.
>>>
>>> Yes, I suspect it's the stack that is buggered for some reason.
>>>
>>>>> Plus I think the instruction that gets overwritten is just a 5-byte
>>>>> nop isn't it? So it really shouldn't take a fault without the "int3"
>>>>> overwriting.
>>>>
>>>> Unless it was being overwritten the other way and the oops hit while
>>>> tracing was being turned *off*.
>>>
>>> Doesn't really matter. The two forms of that instruction are "5-byte
>>> nop" and "unconditional branch".
>>>
>>> Neither of them will write to anything - the only page fault they
>>> could take is for instruction fetch.
>>>
>>> So it really must be the "int3" that fails. Unless we're looking at
>>> some odd CPU errata, which sounds very very unlikely.
>>
>> FTR the commit is:
>>
>> commit d127129e85a020879f334154300ddd3f7ec21c1e (HEAD, tag: next-20171129)
>> Author: Stephen Rothwell <sfr@canb.auug.org.au>
>> Date:   Wed Nov 29 14:09:56 2017 +1100
>>     Add linux-next specific files for 20171129
>>
>> You can get it from
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next-history.git
>> Compiler is this: https://storage.googleapis.com/syzkaller/gcc-7.tar.gz
>> Config was attached.
>>
>> I've built this exact kernel and here is __switch_to disasm:
>> https://gist.githubusercontent.com/dvyukov/8137559f7da08fbe32f9018972a4498c/raw/0ef2abf723b117f0d0f0306fd50e216d50c5cecb/gistfile1.txt
>>
>> __switch_to+0x95b seems to point to (?):
>>
>> ffffffff81252f6b: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
>>
>> which is a branch target alignment nop.
>>
>> We have a bunch of semi-similar nonsense crashes on syzbot:
>>
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/9nMSJo9jmGs/tkRYgZ-XAwAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/04-q4OZrerA/XfYdNnWXAwAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/6iC6rPtAHKQ/UiZ4fnWXAwAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/2zSDbzRIH_k/SLCMqmeXAwAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/uEsjx8VISco/Mwu_pbGWAwAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/kZ6Z7UQLbCQ/JHpjTGeXAwAJ
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/UjYsJxiGxwU/mponQq2XAwAJ
>>
>> Lots of them are on 0xfffffffffffffff8 address.
>>
>> I have some suspicion towards KVM. Potentially a nested KVM messed
>> host processor state (CRn or page tables) so that then we get these
>> weird crashes.
>>
>> One question: what would a triple fault look like? I am asking because we
>> have hundreds of cases where the kernel just starts silently rebooting
>> while running some unprivileged syscalls:
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/w8dkVNrgzrc/4mLJLOAbCgAJ
>> Can these be triple faults? Reproducer for that one also seems to be
>> related to KVM.
>
>
>
> Well, actually replaying the log for this crash and for
> https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
> with:
>
> ./syz-execprog -procs=10 -sandbox=namespace -repeat=0 raw.txt
> (you can find exact instructions on how to do this here
> https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md)
>
> I've got:
>
>
> [  121.553588] binder: 3856:3857 ioctl 40046205 0 returned -22
> [  121.557656] binder: 3856:3857 ERROR: BC_REGISTER_LOOPER called
> without request
> [  121.559744] binder: 3857 RLIMIT_NICE not set
> [  121.586339] binder: 3857 RLIMIT_NICE not set
> [  121.591764] binder: 3856:3857 unknown command 1400526783
> [  121.593226] binder: 3856:3857 ioctl c0306201 20002fd0 returned -22
> [  121.598292] binder: 3857 RLIMIT_NICE not set
> [  121.600827] binder: 3856:3857 ioctl c018620b 20000fe8 returned -14
> [  121.618284] binder: 3856:3857 BC_FREE_BUFFER uffffffffffffffff no match
> [  121.622181] binder: 3856:3857 got reply transaction with no transaction stack
> [  121.626345] binder: 3856:3857 transaction failed 29201/-71, size
> 72-56 line 2747
> [  121.628912] binder: 3856:3857 ioctl c0306201 20005fd0 returned -14
> [  121.635620] binder: unexpected work type, 4, not freed
> [  121.639753] binder: undelivered TRANSACTION_COMPLETE
> [  121.645213] binder: undelivered TRANSACTION_ERROR: 29201
> [  121.654860] binder: 3856:3857 BC_FREE_BUFFER u00000000ffffffff no match
> [  121.667216] *** Guest State ***
> [  121.667728] CR0: actual=0x0000000000000030,
> shadow=0x0000000060000010, gh_mask=fffffffffffffff7
> early console in extract_kernel
> input_data: 0x0000000005f13276
> input_len: 0x0000000001e7fa4c
> output: 0x0000000001000000
> output_len: 0x0000000005c85958
> kernel_total_size: 0x0000000006db2000
>
> Decompressing Linux... Parsing ELF... done.
> Booting the kernel.
> [    0.000000] Linux version 4.15.0-rc1-next-20171129
> (dvyukov@dvyukov-z840.muc.corp.google.com) (gcc version 7.1.1 20170620
> (GCC)) #1 SMP Fri Dec 15 09:25:01 CET 2017
> [    0.000000] Command line: kvm-intel.nested=1
> kvm-intel.unrestricted_guest=1 kvm-intel.ept=1
> kvm-intel.flexpriority=1 kvm-intel.vpid=1
> kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1
> kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1
> kvm-intel.enable_apicv=1 console=ttyS0 root=/dev/sda
> earlyprintk=serial slub_debug=UZ vsyscall=native rodata=n oops=panic
> panic_on_warn=1 panic=86400
> [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating
> point registers'
> [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
> [    0.000000] x86/fpu: Enabled xstate features 0x7, context size is
> 832 bytes, using 'standard' format.
> [    0.000000] e820: BIOS-provided physical RAM map:
> ...


Well, the crash was minimized down to:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <linux/kvm.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main()
{
  int fd = open("/dev/kvm", 0x80102ul);
  int vm = ioctl(fd, KVM_CREATE_VM, 0);
  int  cpu = ioctl(vm, KVM_CREATE_VCPU, 4);
  ioctl(cpu, KVM_RUN, 0);
  return 0;
}

And, yes, this in fact triggers an instant reboot of the kernel (running in qemu).
Am I missing something here?

+kvm maintainers, you can see full thread here:
https://groups.google.com/forum/#!topic/syzkaller-bugs/_oveOKGm3jw
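
For anyone rerunning this by hand, here is a hedged variant of the same
ioctl sequence with error checks, so a failing step is reported instead
of being silently passed on to the next call. This is only a sketch: the
helper name try_bare_vcpu_run() and its return codes are made up for
illustration and are not part of syzkaller or KVM, and the open flags
assume that 0x80102 decodes to O_RDWR | O_NOCTTY | O_CLOEXEC, which is
the case on x86 Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper (name and return codes are illustrative only).
 * Returns 0 if KVM_RUN returned successfully, or a negative number
 * identifying the step that failed:
 *   -1 open, -2 KVM_CREATE_VM, -3 KVM_CREATE_VCPU, -4 KVM_RUN
 */
static int try_bare_vcpu_run(const char *kvm_path, unsigned long vcpu_id)
{
	int fd, vm = -1, cpu = -1, ret = 0;

	assert(kvm_path != NULL);

	/* Assumed decoding of the reproducer's 0x80102 on x86 Linux. */
	fd = open(kvm_path, O_RDWR | O_NOCTTY | O_CLOEXEC);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	vm = ioctl(fd, KVM_CREATE_VM, 0);
	if (vm < 0) {
		perror("KVM_CREATE_VM");
		ret = -2;
		goto out;
	}

	cpu = ioctl(vm, KVM_CREATE_VCPU, vcpu_id);
	if (cpu < 0) {
		perror("KVM_CREATE_VCPU");
		ret = -3;
		goto out;
	}

	/* No memory slots and no register setup, as in the reproducer. */
	if (ioctl(cpu, KVM_RUN, 0) < 0) {
		perror("KVM_RUN");
		ret = -4;
	}

out:
	if (cpu >= 0)
		close(cpu);
	if (vm >= 0)
		close(vm);
	close(fd);
	return ret;
}
```

Calling try_bare_vcpu_run("/dev/kvm", 4) from main() mirrors the
reproducer above; on an affected kernel the machine may still reboot
before any error is printed.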

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-15  9:38                 ` Dmitry Vyukov
@ 2017-12-15  9:40                   ` Wanpeng Li
  2017-12-15  9:51                   ` David Hildenbrand
  1 sibling, 0 replies; 20+ messages in thread
From: Wanpeng Li @ 2017-12-15  9:40 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Linus Torvalds, Andy Lutomirski, Thomas Gleixner, syzbot,
	Borislav Petkov, Dmitry Safonov, Peter Anvin,
	Linux Kernel Mailing List, Kyle Huey, Ingo Molnar,
	syzkaller-bugs, the arch/x86 maintainers, Paolo Bonzini,
	Radim Krčmář,
	KVM list, Lan, Tianyu, James Mattson, David Hildenbrand

2017-12-15 17:38 GMT+08:00 Dmitry Vyukov <dvyukov@google.com>:
> On Fri, Dec 15, 2017 at 10:13 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Fri, Dec 15, 2017 at 10:07 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> On Thu, Dec 14, 2017 at 10:39 PM, Linus Torvalds
>>> <torvalds@linux-foundation.org> wrote:
>>>> On Thu, Dec 14, 2017 at 1:27 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>>> On Thu, Dec 14, 2017 at 11:28 AM, Linus Torvalds
>>>>> <torvalds@linux-foundation.org> wrote:
>>>>>> I don't think that's the case. "int3" is entirely synchronous, and
>>>>>> doesn't have the same odd issues as a breakpoint trap (which honors RF
>>>>>> etc). It's literally just a one-byte shorthand for "int $3".
>>>>>
>>>>> The SDM says precisely the same thing about INT N, so, whichever way
>>>>> you dice it, int3 is a benign exception.
>>>>
>>>> That just means that it doesn't double-fault when it takes the page fault.
>>>>
>>>> Which we already know, because we see a page fault, not a double fault.
>>>>
>>>>> 0xfffffffffffffff8 is *exactly* where the fault would be if the
>>>>> microcoded push of SS faulted if the IST contained zeros.
>>>>
>>>> Yes, I suspect it's the stack that is buggered for some reason.
>>>>
>>>>>> Plus I think the instruction that gets overwritten is just a 5-byte
>>>>>> nop isn't it? So it really shouldn't take a fault without the "int3"
>>>>>> overwriting.
>>>>>
>>>>> Unless it was being overwritten the other way and the oops hit while
>>>>> tracing was being turned *off*.
>>>>
>>>> Doesn't really matter. The two forms of that instruction are "5-byte
>>>> nop" and "unconditional branch".
>>>>
>>>> Neither of them will write to anything - the only page fault they
>>>> could take is for instruction fetch.
>>>>
>>>> So it really must be the "int3" that fails. Unless we're looking at
>>>> some odd CPU errata, which sounds very very unlikely.
>>>
>>> FTR the commit is:
>>>
>>> commit d127129e85a020879f334154300ddd3f7ec21c1e (HEAD, tag: next-20171129)
>>> Author: Stephen Rothwell <sfr@canb.auug.org.au>
>>> Date:   Wed Nov 29 14:09:56 2017 +1100
>>>     Add linux-next specific files for 20171129
>>>
>>> You can get it from
>>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next-history.git
>>> Compiler is this: https://storage.googleapis.com/syzkaller/gcc-7.tar.gz
>>> Config was attached.
>>>
>>> I've built this exact kernel and here is __switch_to disasm:
>>> https://gist.githubusercontent.com/dvyukov/8137559f7da08fbe32f9018972a4498c/raw/0ef2abf723b117f0d0f0306fd50e216d50c5cecb/gistfile1.txt
>>>
>>> __switch_to+0x95b seems to point to (?):
>>>
>>> ffffffff81252f6b: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
>>>
>>> which is branch target alignment nop.
>>>
>>> We have a bunch of semi-similar nonsense crashes on syzbot:
>>>
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/9nMSJo9jmGs/tkRYgZ-XAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/04-q4OZrerA/XfYdNnWXAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/6iC6rPtAHKQ/UiZ4fnWXAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/2zSDbzRIH_k/SLCMqmeXAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/uEsjx8VISco/Mwu_pbGWAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/kZ6Z7UQLbCQ/JHpjTGeXAwAJ
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/UjYsJxiGxwU/mponQq2XAwAJ
>>>
>>> Lots of them are on 0xfffffffffffffff8 address.
>>>
>>> I have some suspicion towards KVM. Potentially a nested KVM messed
>>> host processor state (CRn or page tables) so that then we get these
>>> weird crashes.
>>>
>>> One question: what would a triple fault look like? I am asking because we
>>> have hundreds of cases where the kernel just starts silently rebooting
>>> while running some unprivileged syscalls:
>>> https://groups.google.com/forum/#!msg/syzkaller-bugs/w8dkVNrgzrc/4mLJLOAbCgAJ
>>> Can these be triple faults? Reproducer for that one also seems to be
>>> related to KVM.
>>
>>
>>
>> Well, actually replaying the log for this crash and for
>> https://groups.google.com/forum/#!msg/syzkaller-bugs/zGz7AVtMBV0/X_-CPbjNAgAJ
>> with:
>>
>> ./syz-execprog -procs=10 -sandbox=namespace -repeat=0 raw.txt
>> (you can find exact instructions on how to do this here
>> https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md)
>>
>> I've got:
>>
>>
>> [  121.553588] binder: 3856:3857 ioctl 40046205 0 returned -22
>> [  121.557656] binder: 3856:3857 ERROR: BC_REGISTER_LOOPER called
>> without request
>> [  121.559744] binder: 3857 RLIMIT_NICE not set
>> [  121.586339] binder: 3857 RLIMIT_NICE not set
>> [  121.591764] binder: 3856:3857 unknown command 1400526783
>> [  121.593226] binder: 3856:3857 ioctl c0306201 20002fd0 returned -22
>> [  121.598292] binder: 3857 RLIMIT_NICE not set
>> [  121.600827] binder: 3856:3857 ioctl c018620b 20000fe8 returned -14
>> [  121.618284] binder: 3856:3857 BC_FREE_BUFFER uffffffffffffffff no match
>> [  121.622181] binder: 3856:3857 got reply transaction with no transaction stack
>> [  121.626345] binder: 3856:3857 transaction failed 29201/-71, size
>> 72-56 line 2747
>> [  121.628912] binder: 3856:3857 ioctl c0306201 20005fd0 returned -14
>> [  121.635620] binder: unexpected work type, 4, not freed
>> [  121.639753] binder: undelivered TRANSACTION_COMPLETE
>> [  121.645213] binder: undelivered TRANSACTION_ERROR: 29201
>> [  121.654860] binder: 3856:3857 BC_FREE_BUFFER u00000000ffffffff no match
>> [  121.667216] *** Guest State ***
>> [  121.667728] CR0: actual=0x0000000000000030,
>> shadow=0x0000000060000010, gh_mask=fffffffffffffff7
>> early console in extract_kernel
>> input_data: 0x0000000005f13276
>> input_len: 0x0000000001e7fa4c
>> output: 0x0000000001000000
>> output_len: 0x0000000005c85958
>> kernel_total_size: 0x0000000006db2000
>>
>> Decompressing Linux... Parsing ELF... done.
>> Booting the kernel.
>> [    0.000000] Linux version 4.15.0-rc1-next-20171129
>> (dvyukov@dvyukov-z840.muc.corp.google.com) (gcc version 7.1.1 20170620
>> (GCC)) #1 SMP Fri Dec 15 09:25:01 CET 2017
>> [    0.000000] Command line: kvm-intel.nested=1
>> kvm-intel.unrestricted_guest=1 kvm-intel.ept=1
>> kvm-intel.flexpriority=1 kvm-intel.vpid=1
>> kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1
>> kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1
>> kvm-intel.enable_apicv=1 console=ttyS0 root=/dev/sda
>> earlyprintk=serial slub_debug=UZ vsyscall=native rodata=n oops=panic
>> panic_on_warn=1 panic=86400
>> [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating
>> point registers'
>> [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
>> [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
>> [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
>> [    0.000000] x86/fpu: Enabled xstate features 0x7, context size is
>> 832 bytes, using 'standard' format.
>> [    0.000000] e820: BIOS-provided physical RAM map:
>> ...
>
>
> Well, the crash was minimized down to:
>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #define _GNU_SOURCE
> #include <sys/syscall.h>
> #include <sys/ioctl.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <linux/kvm.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <string.h>
>
> int main()
> {
>   int fd = open("/dev/kvm", 0x80102ul);
>   int vm = ioctl(fd, KVM_CREATE_VM, 0);
>   int  cpu = ioctl(vm, KVM_CREATE_VCPU, 4);
>   ioctl(cpu, KVM_RUN, 0);
>   return 0;
> }
>
> And, yes, this in fact triggers instant reboot of kernel (running in qemu).
> Am I missing something here?
>
> +kvm maintainers, you can see full thread here:
> https://groups.google.com/forum/#!topic/syzkaller-bugs/_oveOKGm3jw

I will give it a try.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-15  9:07             ` Dmitry Vyukov
  2017-12-15  9:13               ` Dmitry Vyukov
@ 2017-12-15  9:49               ` Thomas Gleixner
  1 sibling, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2017-12-15  9:49 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Linus Torvalds, Andy Lutomirski, syzbot, Borislav Petkov,
	Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List,
	Kyle Huey, Ingo Molnar, syzkaller-bugs, the arch/x86 maintainers

On Fri, 15 Dec 2017, Dmitry Vyukov wrote:
> I've built this exact kernel and here is __switch_to disasm:
> https://gist.githubusercontent.com/dvyukov/8137559f7da08fbe32f9018972a4498c/raw/0ef2abf723b117f0d0f0306fd50e216d50c5cecb/gistfile1.txt
> 
> __switch_to+0x95b seems to point to (?):
> 
> ffffffff81252f6b: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
> 
> which is branch target alignment nop.

Which is a placeholder for a tracepoint, as Linus pointed out, and the
'faulting' instruction, which is int3, shows that a tracepoint
install/remove is in progress. Are your test cases fiddling with tracepoints?

Thanks,

	tglx
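
For illustration, the byte pattern in the oops can be checked against
the disassembly: the Code: line shows "<cc> 1f 44 00 00", which is the
5-byte nop "0f 1f 44 00 00" with only its first byte replaced by int3
(0xcc). The sketch below is plain userspace C, not kernel code; the
helper names arm_breakpoint() and is_mid_patch() are hypothetical and
only model the intermediate state of a patch site.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PATCH_LEN   5
#define INT3_OPCODE 0xcc

/* 5-byte nop from the disassembly: nopl 0x0(%rax,%rax,1) */
static const uint8_t nop5[PATCH_LEN] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/*
 * Illustrative only: copy a patch site into 'out' and replace its first
 * byte with int3, modelling the state while a site is being rewritten.
 */
static void arm_breakpoint(const uint8_t *site, uint8_t *out)
{
	assert(site != NULL && out != NULL);
	memcpy(out, site, PATCH_LEN);
	out[0] = INT3_OPCODE;
}

/* Return 1 if 'bytes' is nop5 with only the first byte set to int3. */
static int is_mid_patch(const uint8_t *bytes)
{
	return bytes[0] == INT3_OPCODE &&
	       memcmp(bytes + 1, nop5 + 1, PATCH_LEN - 1) == 0;
}
```

Feeding the five bytes from the Code: line to is_mid_patch() returns 1,
which matches the reading that the site was caught partway through
being patched.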

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-15  9:38                 ` Dmitry Vyukov
  2017-12-15  9:40                   ` Wanpeng Li
@ 2017-12-15  9:51                   ` David Hildenbrand
  2017-12-15  9:58                     ` Wanpeng Li
  1 sibling, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2017-12-15  9:51 UTC (permalink / raw)
  To: Dmitry Vyukov, Linus Torvalds
  Cc: Andy Lutomirski, Thomas Gleixner, syzbot, Borislav Petkov,
	Dmitry Safonov, Peter Anvin, Linux Kernel Mailing List,
	Kyle Huey, Ingo Molnar, syzkaller-bugs, the arch/x86 maintainers,
	Paolo Bonzini, Radim Krčmář,
	KVM list, tianyu.lan, James Mattson, Wanpeng Li


> int main()
> {
>   int fd = open("/dev/kvm", 0x80102ul);
>   int vm = ioctl(fd, KVM_CREATE_VM, 0);
>   int  cpu = ioctl(vm, KVM_CREATE_VCPU, 4);

Not even a memory region :) So maybe the first memory access directly
triggers a fault?

>   ioctl(cpu, KVM_RUN, 0);
>   return 0;
> }
> 
> And, yes, this in fact triggers instant reboot of kernel (running in qemu).
> Am I missing something here?
> 
> +kvm maintainers, you can see full thread here:
> https://groups.google.com/forum/#!topic/syzkaller-bugs/_oveOKGm3jw
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-15  9:51                   ` David Hildenbrand
@ 2017-12-15  9:58                     ` Wanpeng Li
  2017-12-15 10:02                       ` Dmitry Vyukov
  0 siblings, 1 reply; 20+ messages in thread
From: Wanpeng Li @ 2017-12-15  9:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dmitry Vyukov, Linus Torvalds, Andy Lutomirski, Thomas Gleixner,
	syzbot, Borislav Petkov, Dmitry Safonov, Peter Anvin,
	Linux Kernel Mailing List, Kyle Huey, Ingo Molnar,
	syzkaller-bugs, the arch/x86 maintainers, Paolo Bonzini,
	Radim Krčmář,
	KVM list, Lan, Tianyu, James Mattson

2017-12-15 17:51 GMT+08:00 David Hildenbrand <david@redhat.com>:
>
>> int main()
>> {
>>   int fd = open("/dev/kvm", 0x80102ul);
>>   int vm = ioctl(fd, KVM_CREATE_VM, 0);
>>   int  cpu = ioctl(vm, KVM_CREATE_VCPU, 4);
>
> Not even a memory region :) So maybe the first memory access directly
> triggers a fault?
>
>>   ioctl(cpu, KVM_RUN, 0);
>>   return 0;
>> }
>>
>> And, yes, this in fact triggers instant reboot of kernel (running in qemu).
>> Am I missing something here?
>>
>> +kvm maintainers, you can see full thread here:
>> https://groups.google.com/forum/#!topic/syzkaller-bugs/_oveOKGm3jw

I didn't see any issue after running the test.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-15  9:58                     ` Wanpeng Li
@ 2017-12-15 10:02                       ` Dmitry Vyukov
  2017-12-15 16:16                           ` Andy Lutomirski
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Vyukov @ 2017-12-15 10:02 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: David Hildenbrand, Linus Torvalds, Andy Lutomirski,
	Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov,
	Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar,
	syzkaller-bugs, the arch/x86 maintainers, Paolo Bonzini,
	Radim Krčmář,
	KVM list, Lan, Tianyu, James Mattson

On Fri, Dec 15, 2017 at 10:58 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
> 2017-12-15 17:51 GMT+08:00 David Hildenbrand <david@redhat.com>:
>>
>>> int main()
>>> {
>>>   int fd = open("/dev/kvm", 0x80102ul);
>>>   int vm = ioctl(fd, KVM_CREATE_VM, 0);
>>>   int  cpu = ioctl(vm, KVM_CREATE_VCPU, 4);
>>
>> Not even a memory region :) So maybe the first memory access directly
>> triggers a fault?
>>
>>>   ioctl(cpu, KVM_RUN, 0);
>>>   return 0;
>>> }
>>>
>>> And, yes, this in fact triggers instant reboot of kernel (running in qemu).
>>> Am I missing something here?
>>>
>>> +kvm maintainers, you can see full thread here:
>>> https://groups.google.com/forum/#!topic/syzkaller-bugs/_oveOKGm3jw
>
> I didn't see any issue after running the test.

Yes, it's strange. But I can reproduce it. There must be something
different in our setups.
Here is how to build the exact same kernel:
https://groups.google.com/d/msg/syzkaller-bugs/_oveOKGm3jw/vc1tXvsbCgAJ

Here is how I start qemu:

qemu-system-x86_64 -hda wheezy.img -net
user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
arch/x86/boot/bzImage -append "kvm-intel.nested=1
kvm-intel.unrestricted_guest=1 kvm-intel.ept=1
kvm-intel.flexpriority=1 kvm-intel.vpid=1
kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1
kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1
kvm-intel.enable_apicv=1 console=ttyS0 root=/dev/sda
earlyprintk=serial slub_debug=UZ vsyscall=native rodata=n oops=panic
panic_on_warn=1 panic=86400" -enable-kvm -pidfile vm_pid -m 2G -smp 4
-cpu host -usb -usbdevice mouse -usbdevice tablet -soundhw all

The image is here:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce

Host CPU is Intel(R) Xeon(R) CPU E5-2690 v3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-15 10:02                       ` Dmitry Vyukov
@ 2017-12-15 16:16                           ` Andy Lutomirski
  0 siblings, 0 replies; 20+ messages in thread
From: Andy Lutomirski @ 2017-12-15 16:16 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Wanpeng Li, David Hildenbrand, Linus Torvalds, Andy Lutomirski,
	Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov,
	Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar,
	syzkaller-bugs, the arch/x86 maintainers, Paolo Bonzini,
	Radim Krčmář,
	KVM list, Lan, Tianyu, James Mattson

On Fri, Dec 15, 2017 at 2:02 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Dec 15, 2017 at 10:58 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> 2017-12-15 17:51 GMT+08:00 David Hildenbrand <david@redhat.com>:
>>>
>>>> int main()
>>>> {
>>>>   int fd = open("/dev/kvm", 0x80102ul);
>>>>   int vm = ioctl(fd, KVM_CREATE_VM, 0);
>>>>   int  cpu = ioctl(vm, KVM_CREATE_VCPU, 4);
>>>
>>> Not even a memory region :) So maybe the first memory access directly
>>> triggers a fault?
>>>
>>>>   ioctl(cpu, KVM_RUN, 0);
>>>>   return 0;
>>>> }
>>>>
>>>> And, yes, this in fact triggers instant reboot of kernel (running in qemu).
>>>> Am I missing something here?
>>>>
>>>> +kvm maintainers, you can see full thread here:
>>>> https://groups.google.com/forum/#!topic/syzkaller-bugs/_oveOKGm3jw
>>
>> I didn't see any issue after running the test.
>
> Yes, it's strange. But I can reproduce it. There must be something
> different in our setups.
> Here is how to build exact same kernel:
> https://groups.google.com/d/msg/syzkaller-bugs/_oveOKGm3jw/vc1tXvsbCgAJ
>
> Here is how I start qemu:
>
> qemu-system-x86_64 -hda wheezy.img -net
> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
> arch/x86/boot/bzImage -append "kvm-intel.nested=1
> kvm-intel.unrestricted_guest=1 kvm-intel.ept=1
> kvm-intel.flexpriority=1 kvm-intel.vpid=1
> kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1
> kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1
> kvm-intel.enable_apicv=1 console=ttyS0 root=/dev/sda
> earlyprintk=serial slub_debug=UZ vsyscall=native rodata=n oops=panic
> panic_on_warn=1 panic=86400" -enable-kvm -pidfile vm_pid -m 2G -smp 4
> -cpu host -usb -usbdevice mouse -usbdevice tablet -soundhw all
>
> The image is here:
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce
>
> Host cpu is Intel(R) Xeon(R) CPU E5-2690 v3

Looking more closely, you seem to be testing this:

commit d127129e85a020879f334154300ddd3f7ec21c1e (HEAD, tag: next-20171129)
Author: Stephen Rothwell <s...@canb.auug.org.au>
Date:   Wed Nov 29 14:09:56 2017 +1100
    Add linux-next specific files for 20171129

which is almost certainly missing this fix:

https://lkml.kernel.org/r/bc7296f4c8d86af71c31a17588c79d89c0890edc.1512109321.git.luto@kernel.org

on account of the fix being sent the day after the tag.

The symptoms you're seeing are definitely consistent with a screwed up
TSS after VM exit.
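
To spell out the arithmetic behind that: the reported CR2 value is
fffffffffffffff8, which is where the first 8-byte push of an exception
frame lands if the stack pointer loaded from the TSS is zero. A minimal
sketch, where push_target() is a hypothetical helper name used only for
this illustration:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Address written by one 64-bit push, given the stack pointer before
 * the push. Unsigned subtraction wraps modulo 2^64, so a zero stack
 * pointer yields an address at the very top of the address space.
 */
static uint64_t push_target(uint64_t sp_before)
{
	return sp_before - sizeof(uint64_t);
}
```

push_target(0) evaluates to 0xfffffffffffffff8, the faulting address in
the original report.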

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-15 16:16                           ` Andy Lutomirski
@ 2017-12-15 16:44                             ` Ingo Molnar
  -1 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2017-12-15 16:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Vyukov, Wanpeng Li, David Hildenbrand, Linus Torvalds,
	Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov,
	Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar,
	syzkaller-bugs, the arch/x86 maintainers, Paolo Bonzini,
	Radim Krčmář,
	KVM list, Lan, Tianyu, James Mattson


* Andy Lutomirski <luto@kernel.org> wrote:

> On Fri, Dec 15, 2017 at 2:02 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> > On Fri, Dec 15, 2017 at 10:58 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
> >> 2017-12-15 17:51 GMT+08:00 David Hildenbrand <david@redhat.com>:
> >>>
> >>>> int main()
> >>>> {
> >>>>   int fd = open("/dev/kvm", 0x80102ul);
> >>>>   int vm = ioctl(fd, KVM_CREATE_VM, 0);
> >>>>   int  cpu = ioctl(vm, KVM_CREATE_VCPU, 4);
> >>>
> >>> Not even a memory region :) So maybe the first memory access directly
> >>> triggers a fault?
> >>>
> >>>>   ioctl(cpu, KVM_RUN, 0);
> >>>>   return 0;
> >>>> }
> >>>>
> >>>> And, yes, this in fact triggers instant reboot of kernel (running in qemu).
> >>>> Am I missing something here?
> >>>>
> >>>> +kvm maintainers, you can see full thread here:
> >>>> https://groups.google.com/forum/#!topic/syzkaller-bugs/_oveOKGm3jw
> >>
> >> I didn't see any issue after running the test.
> >
> > Yes, it's strange. But I can reproduce it. There must be something
> > different in our setups.
> > Here is how to build exact same kernel:
> > https://groups.google.com/d/msg/syzkaller-bugs/_oveOKGm3jw/vc1tXvsbCgAJ
> >
> > Here is how I start qemu:
> >
> > qemu-system-x86_64 -hda wheezy.img -net
> > user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
> > arch/x86/boot/bzImage -append "kvm-intel.nested=1
> > kvm-intel.unrestricted_guest=1 kvm-intel.ept=1
> > kvm-intel.flexpriority=1 kvm-intel.vpid=1
> > kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1
> > kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1
> > kvm-intel.enable_apicv=1 console=ttyS0 root=/dev/sda
> > earlyprintk=serial slub_debug=UZ vsyscall=native rodata=n oops=panic
> > panic_on_warn=1 panic=86400" -enable-kvm -pidfile vm_pid -m 2G -smp 4
> > -cpu host -usb -usbdevice mouse -usbdevice tablet -soundhw all
> >
> > The image is here:
> > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce
> >
> > Host cpu is Intel(R) Xeon(R) CPU E5-2690 v3
> 
> Looking more closely, you seem to be testing this:
> 
> commit d127129e85a020879f334154300ddd3f7ec21c1e (HEAD, tag: next-20171129)
> Author: Stephen Rothwell <s...@canb.auug.org.au>
> Date:   Wed Nov 29 14:09:56 2017 +1100
>     Add linux-next specific files for 20171129
> 
> which is almost certainly missing this fix:
> 
> https://lkml.kernel.org/r/bc7296f4c8d86af71c31a17588c79d89c0890edc.1512109321.git.luto@kernel.org
> 
> on account of the fix being sent the day after the tag.
> 
> The symptoms you're seeing are definitely consistent with a screwed up
> TSS after VM exit.

Note that this should all be fixed in WIP.x86/pti.

If you have:

  5ed1fcd523b9: x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss

then you should be fine.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: BUG: unable to handle kernel paging request in __switch_to
  2017-12-15 16:44                             ` Ingo Molnar
@ 2017-12-19 11:48                               ` Dmitry Vyukov
  0 siblings, 0 replies; 20+ messages in thread
From: Dmitry Vyukov @ 2017-12-19 11:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, Wanpeng Li, David Hildenbrand, Linus Torvalds,
	Thomas Gleixner, syzbot, Borislav Petkov, Dmitry Safonov,
	Peter Anvin, Linux Kernel Mailing List, Kyle Huey, Ingo Molnar,
	syzkaller-bugs, the arch/x86 maintainers, Paolo Bonzini,
	Radim Krčmář,
	KVM list, Lan, Tianyu, James Mattson

On Fri, Dec 15, 2017 at 5:44 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Andy Lutomirski <luto@kernel.org> wrote:
>
>> On Fri, Dec 15, 2017 at 2:02 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> > On Fri, Dec 15, 2017 at 10:58 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> >> 2017-12-15 17:51 GMT+08:00 David Hildenbrand <david@redhat.com>:
>> >>>
>> >>>> int main()
>> >>>> {
>> >>>>   int fd = open("/dev/kvm", 0x80102ul);
>> >>>>   int vm = ioctl(fd, KVM_CREATE_VM, 0);
>> >>>>   int  cpu = ioctl(vm, KVM_CREATE_VCPU, 4);
>> >>>
>> >>> Not even a memory region :) So maybe the first memory access directly
>> >>> triggers a fault?
>> >>>
>> >>>>   ioctl(cpu, KVM_RUN, 0);
>> >>>>   return 0;
>> >>>> }
>> >>>>
>> >>>> And, yes, this in fact triggers instant reboot of kernel (running in qemu).
>> >>>> Am I missing something here?
>> >>>>
>> >>>> +kvm maintainers, you can see full thread here:
>> >>>> https://groups.google.com/forum/#!topic/syzkaller-bugs/_oveOKGm3jw
>> >>
>> >> I didn't see any issue after running the test.
>> >
>> > Yes, it's strange. But I can reproduce it. There must be something
>> > different in our setups.
>> > Here is how to build exact same kernel:
>> > https://groups.google.com/d/msg/syzkaller-bugs/_oveOKGm3jw/vc1tXvsbCgAJ
>> >
>> > Here is how I start qemu:
>> >
>> > qemu-system-x86_64 -hda wheezy.img -net
>> > user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
>> > arch/x86/boot/bzImage -append "kvm-intel.nested=1
>> > kvm-intel.unrestricted_guest=1 kvm-intel.ept=1
>> > kvm-intel.flexpriority=1 kvm-intel.vpid=1
>> > kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1
>> > kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1
>> > kvm-intel.enable_apicv=1 console=ttyS0 root=/dev/sda
>> > earlyprintk=serial slub_debug=UZ vsyscall=native rodata=n oops=panic
>> > panic_on_warn=1 panic=86400" -enable-kvm -pidfile vm_pid -m 2G -smp 4
>> > -cpu host -usb -usbdevice mouse -usbdevice tablet -soundhw all
>> >
>> > The image is here:
>> > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce
>> >
>> > Host cpu is Intel(R) Xeon(R) CPU E5-2690 v3
>>
>> Looking more closely, you seem to be testing this:
>>
>> commit d127129e85a020879f334154300ddd3f7ec21c1e (HEAD, tag: next-20171129)
>> Author: Stephen Rothwell <s...@canb.auug.org.au>
>> Date:   Wed Nov 29 14:09:56 2017 +1100
>>     Add linux-next specific files for 20171129
>>
>> which is almost certainly missing this fix:
>>
>> https://lkml.kernel.org/r/bc7296f4c8d86af71c31a17588c79d89c0890edc.1512109321.git.luto@kernel.org
>>
>> on account of the fix being sent the day after the tag.
>>
>> The symptoms you're seeing are definitely consistent with a screwed up
>> TSS after VM exit.
>
> Note that this should all be fixed in WIP.x86/pti.
>
> If you have:
>
>   5ed1fcd523b9: x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
>
> then you should be fine.

Let's tell syzbot about the fix:

#syz fix:
x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-12-19 11:49 UTC | newest]

Thread overview: 20+ messages
     [not found] <001a1145e8548cbd3d055f73374f@google.com>
2017-12-14 17:12 ` BUG: unable to handle kernel paging request in __switch_to Thomas Gleixner
2017-12-14 18:42   ` Linus Torvalds
2017-12-14 18:54     ` Andy Lutomirski
2017-12-14 19:28       ` Linus Torvalds
2017-12-14 21:27         ` Andy Lutomirski
2017-12-14 21:39           ` Linus Torvalds
2017-12-15  9:07             ` Dmitry Vyukov
2017-12-15  9:13               ` Dmitry Vyukov
2017-12-15  9:38                 ` Dmitry Vyukov
2017-12-15  9:40                   ` Wanpeng Li
2017-12-15  9:51                   ` David Hildenbrand
2017-12-15  9:58                     ` Wanpeng Li
2017-12-15 10:02                       ` Dmitry Vyukov
2017-12-15 16:16                         ` Andy Lutomirski
2017-12-15 16:44                           ` Ingo Molnar
2017-12-19 11:48                             ` Dmitry Vyukov
2017-12-15  9:49               ` Thomas Gleixner
