linux-kernel.vger.kernel.org archive mirror
* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
@ 2016-02-26 18:05 Linus Torvalds
  2016-02-26 18:17 ` Borislav Petkov
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Linus Torvalds @ 2016-02-26 18:05 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Jiri Slaby, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On Fri, Feb 26, 2016 at 9:52 AM, Peter Hurley <peter@hurleysoftware.com> wrote:
>
> So more analysis would seem to confirm that RSP has been bumped +8
> while in ttwu_stat() so when the epilog executed, register restore
> was off by 1 qword. However, there's nothing in ttwu_stat() that
> results in stack pointer offset by +1 qword from prolog.

I agree.

That's why I'm actually starting to suspect that it's an AMD microcode
bug that we know very little about. There's apparently register
corruption (the guess being from NMI handling, but virtualization was
also involved) under some circumstances.

Of course, if Jiri isn't actually running this on an AMD CPU, that
theory flies right out the window. But we do have a reported oops on
the security list that looks totally different in the big picture, but
shares the exact same "corrupted stack pointer register state
resulting in crazy instruction pointer, resulting in NX fault"
behavior in the end.

In the other case, microcode patchlevel 0x0600081c was fine, and
0x06000832 is the one exhibiting the corruption problem.

I've contacted Robert Święcki (who found the microcode problem) in
case he wants to weigh in on this thread. He was talking to some AMD
people, but I don't know exactly who.

                  Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-26 18:05 BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2] Linus Torvalds
@ 2016-02-26 18:17 ` Borislav Petkov
  2016-02-26 18:18 ` Peter Hurley
  2016-02-26 19:44 ` Linus Torvalds
  2 siblings, 0 replies; 30+ messages in thread
From: Borislav Petkov @ 2016-02-26 18:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Hurley, Jiri Slaby, Greg KH, Linux Kernel Mailing List,
	Andrew Morton, stable, lwn, Steven Rostedt

On Fri, Feb 26, 2016 at 10:05:26AM -0800, Linus Torvalds wrote:
> Of course, if Jiri isn't actually running this on an AMD CPU, that
> theory flies right out the window. But we do have a reported oops on
> the security list that looks totally different in the big picture, but
> shares the exact same "corrupted stack pointer register state
> resulting in crazy instruction pointer, resulting in NX fault"
> behavior in the end.
> 
> In the other case, microcode patchlevel 0x0600081c was fine, and
> 0x06000832 is the one exhibiting the corruption problem.
> 
> I've contacted Robert Święcki (who found the microcode problem) in
> case he wants to weigh in on this thread. He was talking to some AMD
> people, but I don't know exactly who.

It most likely is that problem. If Jiri is using the IBS machines (and
from a quick look at http://labs.suse.cz/jslaby/bug-968218/gdb_log,
it looks like he is), then they are exactly those boxes with a b0rked
microcode patch. AMD is working on a fix.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-26 18:05 BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2] Linus Torvalds
  2016-02-26 18:17 ` Borislav Petkov
@ 2016-02-26 18:18 ` Peter Hurley
  2016-02-26 19:44 ` Linus Torvalds
  2 siblings, 0 replies; 30+ messages in thread
From: Peter Hurley @ 2016-02-26 18:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jiri Slaby, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On 02/26/2016 10:05 AM, Linus Torvalds wrote:
> On Fri, Feb 26, 2016 at 9:52 AM, Peter Hurley <peter@hurleysoftware.com> wrote:
>>
>> So more analysis would seem to confirm that RSP has been bumped +8
>> while in ttwu_stat() so when the epilog executed, register restore
>> was off by 1 qword. However, there's nothing in ttwu_stat() that
>> results in stack pointer offset by +1 qword from prolog.
> 
> I agree.
> 
> That's why I'm actually starting to suspect that it's an AMD microcode
> bug that we know very little about. There's apparently register
> corruption (the guess being from NMI handling, but virtualization was
> also involved) under some circumstances.

Yep, that could explain it.

> Of course, if Jiri isn't actually running this on an AMD CPU, that
> theory flies right out the window.

I'll wait for Jiri to confirm before sinking more time here.


> But we do have a reported oops on
> the security list that looks totally different in the big picture, but
> shares the exact same "corrupted stack pointer register state
> resulting in crazy instruction pointer, resulting in NX fault"
> behavior in the end.
> 
> In the other case, microcode patchlevel 0x0600081c was fine, and
> 0x06000832 is the one exhibiting the corruption problem.
> 
> I've contacted Robert Święcki (who found the microcode problem) in
> case he wants to weigh in on this thread. He was talking to some AMD
> people, but I don't know exactly who.

Ok, thanks for the info.

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-26 18:05 BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2] Linus Torvalds
  2016-02-26 18:17 ` Borislav Petkov
  2016-02-26 18:18 ` Peter Hurley
@ 2016-02-26 19:44 ` Linus Torvalds
  2016-02-26 19:59   ` Robert Święcki
  2 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2016-02-26 19:44 UTC (permalink / raw)
  To: Peter Hurley, Robert Święcki
  Cc: Jiri Slaby, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On Fri, Feb 26, 2016 at 10:05 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I've contacted Robert Święcki (who found the microcode problem) in
> case he wants to weigh in on this thread. He was talking to some AMD
> people, but I don't know exactly who.

And since it's looking increasingly likely that it's the same issue,
I'm adding Robert here explicitly to the cc so that he sees the
thread...

                Linus

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-26 19:44 ` Linus Torvalds
@ 2016-02-26 19:59   ` Robert Święcki
  2016-02-29  7:39     ` Jiri Slaby
  0 siblings, 1 reply; 30+ messages in thread
From: Robert Święcki @ 2016-02-26 19:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Hurley, Jiri Slaby, Greg KH, Linux Kernel Mailing List,
	Andrew Morton, stable, lwn, Steven Rostedt

2016-02-26 20:44 GMT+01:00 Linus Torvalds <torvalds@linux-foundation.org>:

>> I've contacted Robert Święcki (who found the microcode problem) in
>> case he wants to weigh in on this thread. He was talking to some AMD
>> people, but I don't know exactly who.
>
> And since it's looking increasingly likely that it's the same issue,
> I'm adding Robert here explicitly to the cc so that he sees the
> thread...

Thx,

Some data I was able to gather:

It happens only with 0x6000832 ucode, and Piledriver-based CPUs: i.e.
newer AMD FX, and Opteron 300 series (4300, 6300 etc.).

The visible effects are in ~80% of cases incorrect RSP leading to bad
'rets' into kernel data/bss or stack-protector faults. But there are
also more elusive ones, like registers being cleared before use in
indirect memory fetches or so.

I can trigger it from within a qemu guest (non-root), causing a bad RIP in
the host kernel. When testing, a couple of times (maybe 2) out of
maybe 30 oopses seen, I was able to set it to user-space addresses
mapped in the guest. It greatly depends on timing, but I think with
some more effort and populating the kernel stack with guest addresses it'd
be possible to create a more reliable qemu-guest to host ring0 escape.

I CC'd some AMD engineers from this list, and one of them replied with
"We are working on the final testing of a new microcode patch to
replace 0x06000832."
but without specifying any erratum number or ETA for the new ucode.

For now I can only suggest not using 0x06000832 if possible (i.e. if it's
not embedded in the BIOS). I tested a few from
http://www.amd64.org/microcode.html and only this version seemed
vulnerable.
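
For anyone who wants to check a box quickly, the sketch below compares the
patch level reported by the kernel against the revision named above. It
assumes an x86 Linux system where /proc/cpuinfo carries a "microcode" field
per logical CPU; the helper names are made up for illustration.

```python
import re

# The revision reported as problematic in this thread.
SUSPECT_PATCH_LEVEL = 0x06000832


def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        # Lines look like "microcode\t: 0x6000832" (leading zeros may be absent).
        match = re.match(r"microcode\s*:\s*(0x[0-9a-fA-F]+)", line)
        if match:
            revisions.add(int(match.group(1), 16))
    return revisions


def has_suspect_microcode(cpuinfo_text):
    """True if any logical CPU reports the suspect patch level."""
    return SUSPECT_PATCH_LEVEL in microcode_revisions(cpuinfo_text)
```

Feeding it the contents of /proc/cpuinfo, e.g.
has_suspect_microcode(open("/proc/cpuinfo").read()), should return True only
on machines running the 0x06000832 patch. Comparing as integers means
"0x6000832" and "0x06000832" are treated as the same revision.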

PS. There's a bug on vmware pages -
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2061211
- which looks very similar to this problem (affects Opteron 6300 which
is Piledriver-based), and it was "somehow" patched by vmware in their
kernel. It points to AMD errata #815 -
http://support.amd.com/TechDocs/48063_15h_Mod_00h-0Fh_Rev_Guide.pdf -
but I cannot tell whether it's really the same problem, or whether it
can be somehow by-passed on the kernel side.

-- 
Robert Święcki

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-26 19:59   ` Robert Święcki
@ 2016-02-29  7:39     ` Jiri Slaby
  2016-02-29 12:43       ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 30+ messages in thread
From: Jiri Slaby @ 2016-02-29  7:39 UTC (permalink / raw)
  To: Robert Święcki, Linus Torvalds
  Cc: Peter Hurley, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On 02/26/2016, 08:59 PM, Robert Święcki wrote:
> It happens only with 0x6000832 ucode, and Piledriver-based CPUs: i.e.
> newer AMD FX, and Opteron 300 series (4300, 6300 etc.).

Ok, I can confirm this is:
AMD Opteron(tm) Processor 6348

And:
microcode: CPU0: patch_level=0x06000836

Thanks to all the interested parties!

-- 
js
suse labs

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-29  7:39     ` Jiri Slaby
@ 2016-02-29 12:43       ` Henrique de Moraes Holschuh
  0 siblings, 0 replies; 30+ messages in thread
From: Henrique de Moraes Holschuh @ 2016-02-29 12:43 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Robert Święcki, Linus Torvalds, Peter Hurley, Greg KH,
	Linux Kernel Mailing List, Andrew Morton, stable, lwn,
	Steven Rostedt

On Mon, 29 Feb 2016, Jiri Slaby wrote:
> On 02/26/2016, 08:59 PM, Robert Święcki wrote:
> > It happens only with 0x6000832 ucode, and Piledriver-based CPUs: i.e.
> > newer AMD FX, and Opteron 300 series (4300, 6300 etc.).
> 
> Ok, I can confirm this is:
> AMD Opteron(tm) Processor 6348
> 
> And:
> microcode: CPU0: patch_level=0x06000836
> 
> Thanks to all the interested parties!

Jiri, does microcode 0x6000836 *fix* the erratum in microcode 0x6000832?

Or did you mean both 0x6000832 and 0x6000836 have the same erratum discussed
in this thread?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-26 17:12                 ` Linus Torvalds
@ 2016-02-29 15:45                   ` Paolo Bonzini
  0 siblings, 0 replies; 30+ messages in thread
From: Paolo Bonzini @ 2016-02-29 15:45 UTC (permalink / raw)
  To: Linus Torvalds, Jiri Slaby
  Cc: Peter Hurley, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt



On 26/02/2016 18:12, Linus Torvalds wrote:
> It does feel like CPU state corruption - either due to a qemu bug, or
> due to some odd trap/interrupt handling bug of ours.
> 
> Or possibly a CPU/microcode bug. You wouldn't happen to run this on an
> AMD Piledriver-based CPU with the 0x06000832 microcode?
> 
> Because we do have a pending qemu-related bug-report that turned out
> to be an AMD microcode problem with NMI delivery. Looking at that bug
> report, it actually looks rather similar - also due to a confused RIP.

Just a couple notes about QEMU and KVM...

First, if you suspect a QEMU or KVM bug, feel free to Cc me.

Second, people generally say "QEMU" because that's what the SMBIOS info
says, but it's helpful to distinguish the two.  Nowadays it's almost
always KVM, but at least Intel was running tests on QEMU's binary
translator (no VT-x, no KVM) because it supported SMEP and SMAP long
before hardware was common.  Similarly, the next version of QEMU should
support PKE so perhaps someone will be using it again to play with PKE.

Third, suspected QEMU bugs almost always end up being QEMU bugs, but KVM
bugs rarely show up as random crashes in a Linux guest.  KVM does
_really_ little these days unless the host is swapping.  (The puzzling
aspect of the NMI microcode issue was that it was a plausible KVM bug,
but such a KVM bug would also have shown up either on Intel or, if
AMD-only, on interrupts other than NMIs.)  On the other
hand, if your host is swapping and you hit a KVM bug, it's the host that
would crash, not the guest.

Thanks,

Paolo

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-26  8:56               ` Jiri Slaby
  2016-02-26  9:23                 ` Jiri Slaby
  2016-02-26 17:12                 ` Linus Torvalds
@ 2016-02-26 17:52                 ` Peter Hurley
  2 siblings, 0 replies; 30+ messages in thread
From: Peter Hurley @ 2016-02-26 17:52 UTC (permalink / raw)
  To: Jiri Slaby, Linus Torvalds
  Cc: Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn,
	Steven Rostedt

On 02/26/2016 12:56 AM, Jiri Slaby wrote:
> On 02/26/2016, 01:38 AM, Linus Torvalds wrote:
>> On Thu, Feb 25, 2016 at 1:32 PM, Jiri Slaby <jslaby@suse.cz> wrote:
>>>
>>> Interestingly, RBP contains address inside try_to_wake_up --
>>> ffffffff810a535a (dunno why) which is:
>>> ffffffff810a5355:       e8 66 a0 ff ff          callq  ffffffff8109f3c0
>>> <ttwu_stat>
>>> ffffffff810a535a:       e9 9d fe ff ff          jmpq   ffffffff810a51fc
>>> <try_to_wake_up+0x3c>
>>>
>>> ttwu_stat does in the beginning:
>>> mov    $0x16e80,%r14
>>>
>>> which is what we actually still have in r14 when it crashes. The first
>>> ttwu_stat's "if" has to go through the true branch (otherwise r14 would
>>> be overwritten).
>>
>> Hmm. That does sound very much like it might be ttwu_stat() that has
>> gotten the stack frame wrong, and when it exits, it does
>>
>>         popq    %rbp
>>         ret
>>
>> but in fact it popped the return address, and then returned to a crazy address.
>>
>> Which sounds like a corrupted stack pointer (not a corrupted stack).

So more analysis would seem to confirm that RSP has been bumped +8
while in ttwu_stat() so when the epilog executed, register restore
was off by 1 qword. However, there's nothing in ttwu_stat() that
results in stack pointer offset by +1 qword from prolog.

Below I highlighted key instructions from try_to_wake_up() => ttwu_stat() and
what presumably was the resultant stack state at each instruction:


call try_to_wake_up   ffffffff810a5585  \
push rbp              ffff8800bb2a7c90   |
push r15              0000000000010e30   |
push r14              0000000000000005   |
push r13              ffff88017ed2a830   |- values from stack trace
push r12              ffff880234e26a08   |
push rbx              ffff88017ee19f00   |
sub  0x10, rsp        000000008146e197   /
                      ffff88023fd40000   => rip @ crash

call raw_spin_lock_irqsave
mov  rax, r13
mov  0x16e80, r15

mov  1, r12d
call ttwu_stat        ffffffff810a535a   => rbp @ crash
push rbp              ffff8800bb2a7c80   => r15 @ crash
push r15              0000000000016e80   => r14 @ crash
push r14              ffff8800bb37e180   => r13 @ crash
push r13              0000000000000046   => r12 @ crash
push r12              0000000000000001   => rbx @ crash
push rbx                    ???
sub  8, rsp                 ???


So in addition to rbp <= ret addr and r15 <= saved rbp, note also

  rbx <= saved r12 (== 1)
  r12 <= saved r13 (rflags == 00046)
  r14 <= saved r15 (== 0x16e80)

which neatly corresponds to the ttwu_stat() epilog if rsp has
been offset by +1 qword.
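
To make the off-by-one-qword reading concrete, here is a rough Python model
of the ttwu_stat() epilog, offered as a sketch rather than as anything
extracted from the real vmlinux. The slot values come from the table above;
the frame layout, the run_epilog() helper and the assumption that the epilog
mirrors the prolog (add 8 to rsp, pop rbx through rbp, then ret) are
illustrative.

```python
# Stack slots in push order: index 0 is the highest address.
# Values are copied from the table above; the two "???" slots are None.
FRAME = [
    ("caller local (try_to_wake_up)", 0xffff88023fd40000),
    ("return address",                0xffffffff810a535a),
    ("saved rbp",                     0xffff8800bb2a7c80),
    ("saved r15",                     0x16e80),
    ("saved r14",                     0xffff8800bb37e180),
    ("saved r13",                     0x46),
    ("saved r12",                     0x1),
    ("saved rbx",                     None),
    ("sub 8 padding",                 None),
]

# Assumed epilog order; the trailing "rip" entry models the ret.
EPILOG_POPS = ["rbx", "r12", "r13", "r14", "r15", "rbp", "rip"]


def run_epilog(frame, skew_qwords=0):
    """Replay the assumed epilog with RSP moved up by skew_qwords * 8 bytes."""
    rsp = len(frame) - 1      # RSP points at the padding slot
    rsp -= 1                  # add $8, %rsp
    rsp -= skew_qwords        # hypothesised corruption of RSP
    regs = {}
    for reg in EPILOG_POPS:
        regs[reg] = frame[rsp][1]
        rsp -= 1              # each pop moves RSP one qword up the stack
    return regs


for skew in (0, 1):
    regs = run_epilog(FRAME, skew)
    shown = {r: (hex(v) if v is not None else "???") for r, v in regs.items()}
    print(f"skew={skew}: {shown}")
```

With skew_qwords=0 the model returns to ffffffff810a535a as expected. With
skew_qwords=1 it reproduces the pattern in the crash report: rbp holds the
return address, r15 holds the saved rbp, and rip is loaded from the
caller's frame.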


Regards,
Peter Hurley

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-26  8:56               ` Jiri Slaby
  2016-02-26  9:23                 ` Jiri Slaby
@ 2016-02-26 17:12                 ` Linus Torvalds
  2016-02-29 15:45                   ` Paolo Bonzini
  2016-02-26 17:52                 ` Peter Hurley
  2 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2016-02-26 17:12 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Peter Hurley, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On Fri, Feb 26, 2016 at 12:56 AM, Jiri Slaby <jslaby@suse.cz> wrote:
>
> Sure, both vmlinux and vmlinux.debug (its separated .debuginfo sections)
> are at:
> http://labs.suse.cz/jslaby/bug-968218/

I'm not seeing anything odd there.

It does feel like CPU state corruption - either due to a qemu bug, or
due to some odd trap/interrupt handling bug of ours.

Or possibly a CPU/microcode bug. You wouldn't happen to run this on an
AMD Piledriver-based CPU with the 0x06000832 microcode?

Because we do have a pending qemu-related bug-report that turned out
to be an AMD microcode problem with NMI delivery. Looking at that bug
report, it actually looks rather similar - also due to a confused RIP.

                   Linus

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-26  9:50                   ` Jiri Slaby
@ 2016-02-26 16:34                     ` Greg KH
  0 siblings, 0 replies; 30+ messages in thread
From: Greg KH @ 2016-02-26 16:34 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Linus Torvalds, Peter Hurley, Linux Kernel Mailing List,
	Andrew Morton, stable, lwn, Steven Rostedt

On Fri, Feb 26, 2016 at 10:50:27AM +0100, Jiri Slaby wrote:
> On 02/26/2016, 10:23 AM, Jiri Slaby wrote:
> > On 02/26/2016, 09:56 AM, Jiri Slaby wrote:
> >>> I really don't see how it would happen here - that code doesn't look
> >>> particularly odd.
> > 
> > Funnily enough, this is what I got today, when booting 4.4.2 in qemu VM
> > on my host.
> > 
> > RIP crashing (ffffffff810f28d5) is action->dev_id dereference in
> > handle_irq_event_percpu. Look:
> >    0xffffffff810f28d5 <+101>:   mov    0x8(%rbx),%rsi
> >    0xffffffff810f28d9 <+105>:   mov    %r12d,%edi
> >    0xffffffff810f28dc <+108>:   callq  *(%rbx)
> > which is
> >        trace_irq_handler_entry(irq, action);
> >        res = action->handler(irq, action->dev_id);
> >        trace_irq_handler_exit(irq, action, res);
> > 
> ...
> > So is this the same bug or not?
> 
> Seems not, actually. I think I need:
> commit 570540d50710ed192e98e2f7f74578c9486b6b05
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Wed Jan 13 14:07:25 2016 +0100
> 
>     genirq: Validate action before dereferencing it in handle_irq_event_percpu()

That's in my queue to pick up later today, sorry about that.

greg k-h

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-26  9:23                 ` Jiri Slaby
@ 2016-02-26  9:50                   ` Jiri Slaby
  2016-02-26 16:34                     ` Greg KH
  0 siblings, 1 reply; 30+ messages in thread
From: Jiri Slaby @ 2016-02-26  9:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Hurley, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On 02/26/2016, 10:23 AM, Jiri Slaby wrote:
> On 02/26/2016, 09:56 AM, Jiri Slaby wrote:
>>> I really don't see how it would happen here - that code doesn't look
>>> particularly odd.
> 
> Funnily enough, this is what I got today, when booting 4.4.2 in qemu VM
> on my host.
> 
> RIP crashing (ffffffff810f28d5) is action->dev_id dereference in
> handle_irq_event_percpu. Look:
>    0xffffffff810f28d5 <+101>:   mov    0x8(%rbx),%rsi
>    0xffffffff810f28d9 <+105>:   mov    %r12d,%edi
>    0xffffffff810f28dc <+108>:   callq  *(%rbx)
> which is
>        trace_irq_handler_entry(irq, action);
>        res = action->handler(irq, action->dev_id);
>        trace_irq_handler_exit(irq, action, res);
> 
...
> So is this the same bug or not?

Seems not, actually. I think I need:
commit 570540d50710ed192e98e2f7f74578c9486b6b05
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jan 13 14:07:25 2016 +0100

    genirq: Validate action before dereferencing it in handle_irq_event_percpu()

> [1] http://labs.suse.cz/jslaby/bug-968218/
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff810f28d5>] handle_irq_event_percpu+0x65/0x340
> PGD 0
> Oops: 0000 [#1] PREEMPT SMP
> Modules linked in: ...
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.2-13.g19ca782-default #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by
> qemu-project.org 04/01/2014
> task: ffffffff81e12540 ti: ffffffff81e00000 task.ti: ffffffff81e00000
> RIP: 0010:[<ffffffff810f28d5>]  [<ffffffff810f28d5>]
> handle_irq_event_percpu+0x65/0x340
> RSP: 0018:ffff880093e03d88  EFLAGS: 00010002
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000000000000f
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
> RBP: ffff880093e03dc8 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000004
> R13: ffff880087c3b058 R14: 0000000000000000 R15: ffffffff81e03df8
> FS:  0000000000000000(0000) GS:ffff880093e00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000008 CR3: 000000008a790000 CR4: 00000000000006f0
> Stack:
>  ffff880087c3b000 0000000087c3b0d8 ffff880087c3b058 ffff880087c3b000
>  ffff880087c3b0d8 ffff880087c3b058 0000000000000034 ffffffff81e03df8
>  ffff880093e03df0 ffffffff810f2bec ffff880087c3b000 ffff880087c3b0d8
> Call Trace:
>  [<ffffffff810f2bec>] handle_irq_event+0x3c/0x60
>  [<ffffffff810f5f60>] handle_edge_irq+0x80/0x150
>  [<ffffffff8101f49d>] handle_irq+0x1d/0x30
>  [<ffffffff81751ac1>] do_IRQ+0x61/0x120
>  [<ffffffff8174f80c>] common_interrupt+0x8c/0x8c
> Full inexact backtrace again:
> 
>  <IRQ>
>  [<ffffffff810f2bec>] handle_irq_event+0x3c/0x60
>  [<ffffffff810f5f60>] handle_edge_irq+0x80/0x150
>  [<ffffffff8101f49d>] handle_irq+0x1d/0x30
>  [<ffffffff81751ac1>] do_IRQ+0x61/0x120
>  [<ffffffff8174f80c>] common_interrupt+0x8c/0x8c
>  [<ffffffff8108bae7>] ? __do_softirq+0xa7/0x470
>  [<ffffffff8108bae0>] ? __do_softirq+0xa0/0x470
>  [<ffffffff8108c053>] irq_exit+0xb3/0xc0
>  [<ffffffff81751bc2>] smp_apic_timer_interrupt+0x42/0x50
>  [<ffffffff8174fb9c>] apic_timer_interrupt+0x8c/0xa0
>  <EOI>
>  [<ffffffff81067c96>] ? native_safe_halt+0x6/0x10
>  [<ffffffff810dcaed>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff81027753>] default_idle+0x23/0x170
>  [<ffffffff8102808f>] arch_cpu_idle+0xf/0x20
>  [<ffffffff810d270a>] default_idle_call+0x2a/0x40
>  [<ffffffff810d2b07>] cpu_startup_entry+0x387/0x400
>  [<ffffffff8173fef6>] rest_init+0x136/0x140
>  [<ffffffff81f59fe3>] start_kernel+0x499/0x4a6
>  [<ffffffff81f59120>] ? early_idt_handler_array+0x120/0x120
>  [<ffffffff81f59339>] x86_64_start_reservations+0x2a/0x2c
>  [<ffffffff81f59476>] x86_64_start_kernel+0x13b/0x14a
> Code: 7e 48 8b 05 5e 58 e2 00 e8 79 8e 00 00 85 c0 74 0d 80 3d 54 3a e2
> 00 00 0f 84 db 01 00 00 65 ff 0d 01 96 f1 7e 0f 84 89 01 00 00 <48> 8b
> 73 08 44 89 e7 ff 13 41 89 c5 0f 1f 44 00 00 65 ff 05 e3
> RIP  [<ffffffff810f28d5>] handle_irq_event_percpu+0x65/0x340
>  RSP <ffff880093e03d88>
> CR2: 0000000000000008
> 
> thanks,
> 


-- 
js
suse labs

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-26  8:56               ` Jiri Slaby
@ 2016-02-26  9:23                 ` Jiri Slaby
  2016-02-26  9:50                   ` Jiri Slaby
  2016-02-26 17:12                 ` Linus Torvalds
  2016-02-26 17:52                 ` Peter Hurley
  2 siblings, 1 reply; 30+ messages in thread
From: Jiri Slaby @ 2016-02-26  9:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Hurley, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On 02/26/2016, 09:56 AM, Jiri Slaby wrote:
>> I really don't see how it would happen here - that code doesn't look
>> particularly odd.

Funnily enough, this is what I got today, when booting 4.4.2 in a qemu VM
on my host.

The crashing RIP (ffffffff810f28d5) is the action->dev_id dereference in
handle_irq_event_percpu. Look:
   0xffffffff810f28d5 <+101>:   mov    0x8(%rbx),%rsi
   0xffffffff810f28d9 <+105>:   mov    %r12d,%edi
   0xffffffff810f28dc <+108>:   callq  *(%rbx)
which is
       trace_irq_handler_entry(irq, action);
       res = action->handler(irq, action->dev_id);
       trace_irq_handler_exit(irq, action, res);

Now, I feel a bit worried: crash involving percpu and trace together? I
have seen this pattern inlined in try_to_wake_up already (see
ffffffff810a54af in core.s [1]).

try_to_wake_up
  -> ttwu_queue
    -> ttwu_queue_remote
      -> trace_sched_wake_idle_without_ipi
  -> ttwu_stat ** CRASH somewhere here

So is this the same bug or not?

[1] http://labs.suse.cz/jslaby/bug-968218/

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810f28d5>] handle_irq_event_percpu+0x65/0x340
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.2-13.g19ca782-default #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
task: ffffffff81e12540 ti: ffffffff81e00000 task.ti: ffffffff81e00000
RIP: 0010:[<ffffffff810f28d5>]  [<ffffffff810f28d5>] handle_irq_event_percpu+0x65/0x340
RSP: 0018:ffff880093e03d88  EFLAGS: 00010002
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000000000000f
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
RBP: ffff880093e03dc8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000004
R13: ffff880087c3b058 R14: 0000000000000000 R15: ffffffff81e03df8
FS:  0000000000000000(0000) GS:ffff880093e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000008a790000 CR4: 00000000000006f0
Stack:
 ffff880087c3b000 0000000087c3b0d8 ffff880087c3b058 ffff880087c3b000
 ffff880087c3b0d8 ffff880087c3b058 0000000000000034 ffffffff81e03df8
 ffff880093e03df0 ffffffff810f2bec ffff880087c3b000 ffff880087c3b0d8
Call Trace:
 [<ffffffff810f2bec>] handle_irq_event+0x3c/0x60
 [<ffffffff810f5f60>] handle_edge_irq+0x80/0x150
 [<ffffffff8101f49d>] handle_irq+0x1d/0x30
 [<ffffffff81751ac1>] do_IRQ+0x61/0x120
 [<ffffffff8174f80c>] common_interrupt+0x8c/0x8c
Full inexact backtrace again:

 <IRQ>
 [<ffffffff810f2bec>] handle_irq_event+0x3c/0x60
 [<ffffffff810f5f60>] handle_edge_irq+0x80/0x150
 [<ffffffff8101f49d>] handle_irq+0x1d/0x30
 [<ffffffff81751ac1>] do_IRQ+0x61/0x120
 [<ffffffff8174f80c>] common_interrupt+0x8c/0x8c
 [<ffffffff8108bae7>] ? __do_softirq+0xa7/0x470
 [<ffffffff8108bae0>] ? __do_softirq+0xa0/0x470
 [<ffffffff8108c053>] irq_exit+0xb3/0xc0
 [<ffffffff81751bc2>] smp_apic_timer_interrupt+0x42/0x50
 [<ffffffff8174fb9c>] apic_timer_interrupt+0x8c/0xa0
 <EOI>
 [<ffffffff81067c96>] ? native_safe_halt+0x6/0x10
 [<ffffffff810dcaed>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81027753>] default_idle+0x23/0x170
 [<ffffffff8102808f>] arch_cpu_idle+0xf/0x20
 [<ffffffff810d270a>] default_idle_call+0x2a/0x40
 [<ffffffff810d2b07>] cpu_startup_entry+0x387/0x400
 [<ffffffff8173fef6>] rest_init+0x136/0x140
 [<ffffffff81f59fe3>] start_kernel+0x499/0x4a6
 [<ffffffff81f59120>] ? early_idt_handler_array+0x120/0x120
 [<ffffffff81f59339>] x86_64_start_reservations+0x2a/0x2c
 [<ffffffff81f59476>] x86_64_start_kernel+0x13b/0x14a
Code: 7e 48 8b 05 5e 58 e2 00 e8 79 8e 00 00 85 c0 74 0d 80 3d 54 3a e2 00 00 0f 84 db 01 00 00 65 ff 0d 01 96 f1 7e 0f 84 89 01 00 00 <48> 8b 73 08 44 89 e7 ff 13 41 89 c5 0f 1f 44 00 00 65 ff 05 e3
RIP  [<ffffffff810f28d5>] handle_irq_event_percpu+0x65/0x340
 RSP <ffff880093e03d88>
CR2: 0000000000000008
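
As a sanity check on the NULL-action reading, the faulting address can be
recomputed from the register dump above. This is a small sketch, with
made-up helper names, that pulls RBX and CR2 out of oops-style text and
applies the 0x8(%rbx) addressing from the disassembly.

```python
import re

# Two lines copied from the register dump in this oops.
OOPS_REGS = """\
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000000000000f
CR2: 0000000000000008 CR3: 000000008a790000 CR4: 00000000000006f0
"""


def parse_regs(text):
    """Map register names such as RBX or CR2 to integer values."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{2}):\s*([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def effective_address(base, displacement):
    """Address accessed by 'mov disp(%base),...' on x86-64 (64-bit wraparound)."""
    return (base + displacement) & (2**64 - 1)


regs = parse_regs(OOPS_REGS)
# mov 0x8(%rbx),%rsi loads action->dev_id, so the access is at RBX + 8.
fault = effective_address(regs["RBX"], 0x8)
print(hex(fault), fault == regs["CR2"])
```

RBX is 0 and the computed address equals CR2, which is consistent with
action being NULL when action->dev_id is loaded.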

thanks,
-- 
js
suse labs

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-26  0:38             ` Linus Torvalds
@ 2016-02-26  8:56               ` Jiri Slaby
  2016-02-26  9:23                 ` Jiri Slaby
                                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Jiri Slaby @ 2016-02-26  8:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Hurley, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On 02/26/2016, 01:38 AM, Linus Torvalds wrote:
> On Thu, Feb 25, 2016 at 1:32 PM, Jiri Slaby <jslaby@suse.cz> wrote:
>>
>> Interestingly, RBP contains address inside try_to_wake_up --
>> ffffffff810a535a (dunno why) which is:
>> ffffffff810a5355:       e8 66 a0 ff ff          callq  ffffffff8109f3c0
>> <ttwu_stat>
>> ffffffff810a535a:       e9 9d fe ff ff          jmpq   ffffffff810a51fc
>> <try_to_wake_up+0x3c>
>>
>> ttwu_stat does in the beginning:
>> mov    $0x16e80,%r14
>>
>> which is what we actually still have in r14 when it crashes. The first
>> ttwu_stat's "if" has to go through the true branch (otherwise r14 would
>> be overwritten).
> 
> Hmm. That does sound very much like it might be ttwu_stat() that has
> gotten the stack frame wrong, and when it exits, it does
> 
>         popq    %rbp
>         ret
> 
> but in fact it popped the return address, and then returned to a crazy address.
> 
> Which sounds like a corrupted stack pointer (not a corrupted stack).
> 
> Can you make just the "vmlinux" file available somewhere?

Sure, both vmlinux and vmlinux.debug (its separated .debuginfo sections)
are at:
http://labs.suse.cz/jslaby/bug-968218/

There is also core.s which is a result of:
objdump -d vmlinux-4.4.2-3-default | grep -A 10000 '<update_rq_clock>:'
>core.s

> In my own private configuration, ttwu_stat() doesn't actually touch
> the stack at all - no stack pointer action anywhere except for the
> 
> ttwu_stat:
> 1:      call    __fentry__
>         pushq   %rbp
>    ..
>         movq    %rsp, %rbp      #,
> 
>  .....
> 
>         popq    %rbp
>         ret
> 
> but yeah, as Peter says, maybe an exception screwed up %rsp somehow..

Lucky you. My ttwu_stat does a bit more stack save-restoring. But all
seem to be paired:

ffffffff8109f3c0 <ttwu_stat>:
ffffffff8109f3c0:       e8 fb ca 60 00          callq  ffffffff816abec0 <__fentry__>
ffffffff8109f3c5:       55                      push   %rbp
ffffffff8109f3c6:       48 89 e5                mov    %rsp,%rbp
ffffffff8109f3c9:       41 57                   push   %r15
ffffffff8109f3cb:       41 56                   push   %r14
ffffffff8109f3cd:       41 55                   push   %r13
ffffffff8109f3cf:       41 54                   push   %r12
ffffffff8109f3d1:       49 c7 c6 80 6e 01 00    mov    $0x16e80,%r14
ffffffff8109f3d8:       53                      push   %rbx
...
ffffffff8109f48c:       5b                      pop    %rbx
ffffffff8109f48d:       41 5c                   pop    %r12
ffffffff8109f48f:       41 5d                   pop    %r13
ffffffff8109f491:       41 5e                   pop    %r14
ffffffff8109f493:       41 5f                   pop    %r15
ffffffff8109f495:       5d                      pop    %rbp
ffffffff8109f496:       c3                      retq
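
The pairing claim can be checked mechanically. A minimal sketch in Python,
assuming only the prolog and epilog shown in the dump above (the elided "..."
body is not inspected, so this says nothing about stack adjustments made in
the middle of the function). The helper names are hypothetical.

```python
import re

# Prolog and epilog mnemonics copied from the ttwu_stat dump above;
# the elided "..." body of the function is not modelled here.
LISTING = """
push   %rbp
mov    %rsp,%rbp
push   %r15
push   %r14
push   %r13
push   %r12
mov    $0x16e80,%r14
push   %rbx
pop    %rbx
pop    %r12
pop    %r13
pop    %r14
pop    %r15
pop    %rbp
retq
"""

def pushes_and_pops(listing):
    """Return (pushed, popped) register names in program order."""
    pushed = re.findall(r"^push\s+%(\w+)", listing, re.M)
    popped = re.findall(r"^pop\s+%(\w+)", listing, re.M)
    return pushed, popped

def is_paired(listing):
    """True if the pops undo the pushes in exact reverse order."""
    pushed, popped = pushes_and_pops(listing)
    return popped == pushed[::-1]

print(is_paired(LISTING))
```

For the listing as quoted, the six pops mirror the six pushes, so the
compiler-generated prolog and epilog alone do not explain a shifted %rsp.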


> I really don't see how it would happen here - that code doesn't look
> particularly odd.
> 
> And the fentry code used by the function tracer can certainly screw
> things up, but even that would be hard-pressed to screw up %rbp, since
> the saving of rbp comes *after* fentry. Old pre-__fentry__ gcc
> versions had a much higher likelihood (the whole mcount thing is a
> disaster, but I'm assuming you have a compiler that does __fentry__
> and have CC_USING_FENTRY set?)

Yep, -mfentry is in use, as is obvious from the dump above; it is compiled
by gcc 5.3.1 rev231346.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-26  0:38               ` Peter Hurley
@ 2016-02-26  8:45                 ` Jiri Slaby
  0 siblings, 0 replies; 30+ messages in thread
From: Jiri Slaby @ 2016-02-26  8:45 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Linus Torvalds, Greg KH, Linux Kernel Mailing List,
	Andrew Morton, stable, lwn, Steven Rostedt

On 02/26/2016, 01:38 AM, Peter Hurley wrote:
>> That would imply that RSP was off by +8 when the ttwu_stat() epilog was
>> executed so that RBP <= ret addr and RIP <= some local var in try_to_wake_up()
>> stack frame.
>>
>> Looks like R15 in the crash report could be what RBP should have been.
>>
>> Now to find out why RSP is +8
> 
> Which I would investigate if I could download that kernel.
> Unfortunately, OBS doesn't like me so if you could make that
> kernel available some other way or send me a mixed listing
> of kernel/sched/core.c

Actually, I cannot produce a mixed listing with objdump, as my objdump
complains in the middle of vmlinux:
objdump: Dwarf Error: mangled line number section.

and dumps no more code afterwards. (And core.c comes after the point where
the error happens.)

Nevertheless, I did:
gdb vmlinux-4.4.2-3-default -ex 'disass /m try_to_wake_up' --batch >try_to_wake_up.mixed
gdb vmlinux-4.4.2-3-default -ex 'disass /m ttwu_stat' --batch >ttwu_stat.mixed

And both will appear at:
http://labs.suse.cz/jslaby/bug-968218/

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 22:00           ` Jiri Kosina
@ 2016-02-26  8:31             ` Jiri Slaby
  0 siblings, 0 replies; 30+ messages in thread
From: Jiri Slaby @ 2016-02-26  8:31 UTC (permalink / raw)
  To: Jiri Kosina, Linus Torvalds
  Cc: Peter Hurley, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On 02/25/2016, 11:00 PM, Jiri Kosina wrote:
>> If it was one of the calls _in_ try_to_wake_up() that called to insane 
>> code, I would have expected to see try_to_wake_up on the stack.
> 
> try_to_wake_up() is very likely to be inlined into wake_up_process(), and 
> therefore in such cases will never be on the stack as a return address; 
> it'll always be wake_up_process().

Actually it is not inlined, see core.s at:
http://labs.suse.cz/jslaby/bug-968218/

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 19:23       ` Steven Rostedt
@ 2016-02-26  8:25         ` Jiri Slaby
  0 siblings, 0 replies; 30+ messages in thread
From: Jiri Slaby @ 2016-02-26  8:25 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds
  Cc: Peter Hurley, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn

On 02/25/2016, 08:23 PM, Steven Rostedt wrote:
> On Thu, 25 Feb 2016 11:09:35 -0800
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> 
>> Jiri, were you messing around with tracing when this happened? Or
>> maybe shutting down CPU's? There was a RCU locking problem with CPU
>> shutdown, maybe this is one of the symptoms. The fix for that is
>> recent, and not in 4.4.2.
>>
>> Adding Steven Rostedt to the cc. Steven, does that look like a possible case?
> 
> Possible: yes. Likely: no
> 
> The recent fix would require shutting down a CPU at the same time as a
> tracepoint is enabled or disabled. Rather difficult to hit, but easier
> on a virtual machine. If Jiri was not enabling/disabling tracepoints or
> shutting down CPUs, then it would not be the bug.

As this is an automatic build, I very much doubt a CPU was offlined or
tracepoints were enabled. So I see it as even less likely that both
happened concurrently.

> But as the comm of the bug is gdb and this running on a virtual
> machine, I think the bug may be elsewhere. Corrupt stack possibly?

Seems so (broken stack frame pointer link), as indicated in the other e-mail.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 18:40   ` Peter Hurley
  2016-02-25 19:09     ` Linus Torvalds
@ 2016-02-26  8:15     ` Jiri Slaby
  1 sibling, 0 replies; 30+ messages in thread
From: Jiri Slaby @ 2016-02-26  8:15 UTC (permalink / raw)
  To: Peter Hurley, Greg KH, linux-kernel, Andrew Morton, torvalds, stable; +Cc: lwn

On 02/25/2016, 07:40 PM, Peter Hurley wrote:
>> This was triggered by a gdb build on our servers [1].
> 
> I noted that the crash is not strictly for building gdb but appears
> to be with gdb running?

Yes, when gdb is built, gdb tests are run. From the build log gdb_log
[1], checks were run on top of the built gdb:
./orphanripper make -j8 -k check//unix/-m64 check//unix/-m64/-fPIC/-pie check//unix/-m32 check//unix/-m32/-fPIC/-pie

[1] http://labs.suse.cz/jslaby/bug-968218/

> Perhaps some test that has failed?
> Maybe some ABI violation with gdb + kvm?

[  425s]                === gdb tests ===
[  425s]
[  425s] Schedule of variations:
[  425s]     unix/-m32
[  425s]
[  425s] Running target unix/-m32
[  425s] Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
[  425s] Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
[  425s] Using /home/abuild/rpmbuild/BUILD/gdb-7.10.1/gdb/testsuite/config/unix.exp as tool-and-target-specific interface file.
[  425s] Running /home/abuild/rpmbuild/BUILD/gdb-7.10.1/gdb/testsuite/gdb.base/break-interp.exp ...
[  425s] [  413.383880] kernel tried to execute NX-protected page - exploit attempt? (uid: 399)


There is plenty of code run in there:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/testsuite/gdb.base/break-interp.exp;h=713e1008fb3739f7fdcdb6c0a484a46b279ef1b6;hb=HEAD

> Is this reproducible?

I tried the whole day yesterday without luck :(.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 22:33             ` Peter Hurley
@ 2016-02-26  0:38               ` Peter Hurley
  2016-02-26  8:45                 ` Jiri Slaby
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Hurley @ 2016-02-26  0:38 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Linus Torvalds, Greg KH, Linux Kernel Mailing List,
	Andrew Morton, stable, lwn, Steven Rostedt

On 02/25/2016 02:33 PM, Peter Hurley wrote:
> On 02/25/2016 01:32 PM, Jiri Slaby wrote:
>> On 02/25/2016, 09:51 PM, Linus Torvalds wrote:
>>> Jiri, can you check your try_to_wake_up() disassembly for some
>>> indirect "jmp" instructions?
>>
>> Nope, there is none.
>>
>> I will reply to all your questions tomorrow.
>>
>> Just quickly, as I have to go (and don't want you to duplicate efforts)
>> the kernel which was used can be obtained here:
>> https://build.opensuse.org/package/binaries/openSUSE:Factory:Staging:I/kernel-default?repository=standard
>>
>> The issue is very weird, indeed, this is what I noted to our bugzilla:
>> The stack trace ends in call of try_to_wake_up. Then, there it has to be
>> some of the indirect calls:
>>
>> callq  *0x40(%rax)
>>   p->sched_class->select_task_rq from select_task_rq
>>
>> RAX is 0x00000000bb37e180, barely can be read with offset 0x40
>>
>> callq  *0xd85656(%rip) # ffffffff81e2aba0 <smp_ops+0x20>
>>   smp_ops.smp_send_reschedule from ttwu_queue_remote
>>
>> Which hardly can be it, given smp_ops is static.
>>
>> So it has to be some other "call *" from a nested function :(.
>>
>>
>>
>>
>> Interestingly, RBP contains address inside try_to_wake_up --
>> ffffffff810a535a (dunno why) which is:
>> ffffffff810a5355:       e8 66 a0 ff ff          callq  ffffffff8109f3c0
>> <ttwu_stat>
>> ffffffff810a535a:       e9 9d fe ff ff          jmpq   ffffffff810a51fc
>> <try_to_wake_up+0x3c>
> 
> That would imply that RSP was off by +8 when the ttwu_stat() epilog was
> executed so that RBP <= ret addr and RIP <= some local var in try_to_wake_up()
> stack frame.
> 
> Looks like R15 in the crash report could be what RBP should have been.
> 
> Now to find out why RSP is +8

Which I would investigate if I could download that kernel.
Unfortunately, OBS doesn't like me so if you could make that
kernel available some other way or send me a mixed listing
of kernel/sched/core.c


>> ttwu_stat does at the beginning:
>> mov    $0x16e80,%r14
>>
>> which is what we actually still have in r14 when it crashes. The first
>> ttwu_stat's "if" has to go through the true branch (otherwise r14 would
>> be overwritten).
>>
>>
>>
>> Another note: we die when jmp/calling to 0xffff88023fd40000.
>> RSI=RDI=0xffff88023fdd6e80. RSI-RIP is 0x96e80, which is R14 + 0x80000.
>> Coincidence?
>>
>> thanks,
>>
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 21:32           ` Jiri Slaby
  2016-02-25 22:33             ` Peter Hurley
@ 2016-02-26  0:38             ` Linus Torvalds
  2016-02-26  8:56               ` Jiri Slaby
  1 sibling, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2016-02-26  0:38 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Peter Hurley, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On Thu, Feb 25, 2016 at 1:32 PM, Jiri Slaby <jslaby@suse.cz> wrote:
>
> Interestingly, RBP contains address inside try_to_wake_up --
> ffffffff810a535a (dunno why) which is:
> ffffffff810a5355:       e8 66 a0 ff ff          callq  ffffffff8109f3c0
> <ttwu_stat>
> ffffffff810a535a:       e9 9d fe ff ff          jmpq   ffffffff810a51fc
> <try_to_wake_up+0x3c>
>
> ttwu_stat does at the beginning:
> mov    $0x16e80,%r14
>
> which is what we actually still have in r14 when it crashes. The first
> ttwu_stat's "if" has to go through the true branch (otherwise r14 would
> be overwritten).

Hmm. That does sound very much like it might be ttwu_stat() that has
gotten the stack frame wrong, and when it finishes and exits, it does

        popq    %rbp
        ret

but in fact it popped the return address, and then returned to a crazy address.

Which sounds like a corrupted stack pointer (not a corrupted stack).

Can you make just the "vmlinux" file available somewhere?

In my own private configuration, ttwu_stat() doesn't actually touch
the stack at all - no stack pointer action anywhere except for the

ttwu_stat:
1:      call    __fentry__
        pushq   %rbp
   ..
        movq    %rsp, %rbp      #,

 .....

        popq    %rbp
        ret

but yeah, as Peter says, maybe an exception screwed up %rsp somehow..

I really don't see how it would happen here - that code doesn't look
particularly odd.

And the fentry code used by the function tracer can certainly screw
things up, but even that would be hard-pressed to screw up %rbp, since
the saving of rbp comes *after* fentry. Old pre-__fentry__ gcc
versions had a much higher likelihood (the whole mcount thing is a
disaster, but I'm assuming you have a compiler that does __fentry__
and have CC_USING_FENTRY set?)

               Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 21:32           ` Jiri Slaby
@ 2016-02-25 22:33             ` Peter Hurley
  2016-02-26  0:38               ` Peter Hurley
  2016-02-26  0:38             ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Hurley @ 2016-02-25 22:33 UTC (permalink / raw)
  To: Jiri Slaby, Linus Torvalds
  Cc: Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn,
	Steven Rostedt

On 02/25/2016 01:32 PM, Jiri Slaby wrote:
> On 02/25/2016, 09:51 PM, Linus Torvalds wrote:
>> Jiri, can you check your try_to_wake_up() disassembly for some
>> indirect "jmp" instructions?
> 
> Nope, there is none.
> 
> I will reply to all your questions tomorrow.
> 
> Just quickly, as I have to go (and don't want you to duplicate efforts)
> the kernel which was used can be obtained here:
> https://build.opensuse.org/package/binaries/openSUSE:Factory:Staging:I/kernel-default?repository=standard
> 
> The issue is very weird, indeed, this is what I noted to our bugzilla:
> The stack trace ends in call of try_to_wake_up. Then, there it has to be
> some of the indirect calls:
> 
> callq  *0x40(%rax)
>   p->sched_class->select_task_rq from select_task_rq
> 
> RAX is 0x00000000bb37e180, barely can be read with offset 0x40
> 
> callq  *0xd85656(%rip) # ffffffff81e2aba0 <smp_ops+0x20>
>   smp_ops.smp_send_reschedule from ttwu_queue_remote
> 
> Which hardly can be it, given smp_ops is static.
> 
> So it has to be some other "call *" from a nested function :(.
> 
> 
> 
> 
> Interestingly, RBP contains address inside try_to_wake_up --
> ffffffff810a535a (dunno why) which is:
> ffffffff810a5355:       e8 66 a0 ff ff          callq  ffffffff8109f3c0
> <ttwu_stat>
> ffffffff810a535a:       e9 9d fe ff ff          jmpq   ffffffff810a51fc
> <try_to_wake_up+0x3c>

That would imply that RSP was off by +8 when the ttwu_stat() epilog was
executed so that RBP <= ret addr and RIP <= some local var in try_to_wake_up()
stack frame.

Looks like R15 in the crash report could be what RBP should have been.

Now to find out why RSP is +8
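
To make the off-by-one-qword reasoning concrete, here is a toy model in
Python of what "pop %rbp; ret" does when RSP is 8 too high. It is only a
sketch: the stack addresses and the values SAVED_RBP and CALLER_LOCAL are
made-up placeholders; only RET_ADDR is taken from the disassembly quoted
above.

```python
# Toy model of the x86-64 stack around the ttwu_stat() epilog.
# RET_ADDR is the return address into try_to_wake_up() from the thread;
# every other value below is a hypothetical placeholder.
RET_ADDR = 0xffffffff810a535a
SAVED_RBP = 0x1111      # hypothetical caller frame pointer
CALLER_LOCAL = 0x2222   # hypothetical qword in try_to_wake_up()'s frame

def epilog(stack, rsp):
    """Execute 'pop %rbp; ret' on a dict mapping address -> qword."""
    rbp = stack[rsp]        # pop %rbp
    rsp += 8
    rip = stack[rsp]        # ret pops the next qword into RIP
    rsp += 8
    return rbp, rip, rsp

base = 0x1000  # hypothetical address holding the saved %rbp
stack = {base: SAVED_RBP, base + 8: RET_ADDR, base + 16: CALLER_LOCAL}

good = epilog(stack, base)       # RSP correct at the epilog
bad = epilog(stack, base + 8)    # RSP bumped by +8 before the epilog

print("correct RSP: rbp=%#x rip=%#x" % good[:2])
print("RSP + 8:     rbp=%#x rip=%#x" % bad[:2])
```

With the shifted RSP, RBP receives the return address and RIP receives
whatever qword sits above it in the caller's frame, which matches the
register state described above.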


> 
> 
> ttwu_stat does at the beginning:
> mov    $0x16e80,%r14
> 
> which is what we actually still have in r14 when it crashes. The first
> ttwu_stat's "if" has to go through the true branch (otherwise r14 would
> be overwritten).
> 
> 
> 
> Another note: we die when jmp/calling to 0xffff88023fd40000.
> RSI=RDI=0xffff88023fdd6e80. RSI-RIP is 0x96e80, which is R14 + 0x80000.
> Coincidence?
> 
> thanks,
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 20:51         ` Linus Torvalds
  2016-02-25 21:32           ` Jiri Slaby
  2016-02-25 21:43           ` Peter Hurley
@ 2016-02-25 22:00           ` Jiri Kosina
  2016-02-26  8:31             ` Jiri Slaby
  2 siblings, 1 reply; 30+ messages in thread
From: Jiri Kosina @ 2016-02-25 22:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Hurley, Jiri Slaby, Greg KH, Linux Kernel Mailing List,
	Andrew Morton, stable, lwn, Steven Rostedt

On Thu, 25 Feb 2016, Linus Torvalds wrote:

> >>   tty_flip_buffer_push ->
> >>     (queue_work is inline) ->
> >>     queue_work_on ->
> >>       __queue_work ->
> >>         insert_work ->
> >>           (wake_up_worker is inlined)
> >>           wake_up_process ->
> >
> >               try_to_wake_up ->
> >
> >>             *insane non-code address*
> 
> The thing is, we don't actually have that try_to_wake_up() on the
> stack in the oops report. There are other things on the stack, but the
> first stack entry that is dumped that is a text address is that
> "ffffffff810a5585" which is wake_up_process.
> 
> That's why I said it might be stack corruption: we might be returning
> from try_to_wake_up(), but with a corrupt stack entry, and returning
> to garbage.
>
> If it was one of the calls _in_ try_to_wake_up() that called to insane 
> code, I would have expected to see try_to_wake_up on the stack.

try_to_wake_up() is very likely to be inlined into wake_up_process(), and 
therefore in such cases will never be on the stack as a return address; 
it'll always be wake_up_process().

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 20:51         ` Linus Torvalds
  2016-02-25 21:32           ` Jiri Slaby
@ 2016-02-25 21:43           ` Peter Hurley
  2016-02-25 22:00           ` Jiri Kosina
  2 siblings, 0 replies; 30+ messages in thread
From: Peter Hurley @ 2016-02-25 21:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jiri Slaby, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On 02/25/2016 12:51 PM, Linus Torvalds wrote:
> On Thu, Feb 25, 2016 at 12:32 PM, Peter Hurley <peter@hurleysoftware.com> wrote:
>>> But yes, the call trace looks accurate and makes sense, we have:
>>>
>>>   tty_flip_buffer_push ->
>>>     (queue_work is inline) ->
>>>     queue_work_on ->
>>>       __queue_work ->
>>>         insert_work ->
>>>           (wake_up_worker is inlined)
>>>           wake_up_process ->
>>
>>               try_to_wake_up ->
>>
>>>             *insane non-code address*
> 
> The thing is, we don't actually have that try_to_wake_up() on the
> stack in the oops report.

I know, but last execution prior to things going sideways
was definitely in try_to_wake_up().

> There are other things on the stack, but the
> first stack entry that is dumped that is a text address is that
> "ffffffff810a5585" which is wake_up_process.
> 
> That's why I said it might be stack corruption: we might be returning
> from try_to_wake_up(), but with a corrupt stack entry, and returning
> to garbage.
> 
> If it was one of the calls _in_ try_to_wake_up() that called to insane
> code, I would have expected to see try_to_wake_up on the stack.

Agreed, how execution got from try_to_wake_up() to the mysterious
percpu address without a call is the question.

> That's particularly true on modern machines, where things like the
> percpu area is hopefully marked NX, so that we shouldn't be executing
> random instructions. Which is the fault that actually triggers
> ("kernel tried to execute NX-protected page"), so the "we corrupted
> the stack by running random code at the original target of the jump"
> scenario sounds much less likely.
> 
> So the whole oops looks odd. If it really was one of the calls from
> try_to_wake_up(), why isn't that return address on the stack?

I don't think it's anything from code flow.

> Since this is under qemu, I'm wondering if this is a qemu bug, where
> the NX fault processing of a call instruction happens before the stack
> is pushed, but when the instruction pointer already points to the new
> address.

Or any fault processing really; an iret to the bogus address
would then trigger NX fault without leaving a trace of the broken
exception handling.


> Another alternative *might* be that gcc has turned an indirect
> tail-call call into a "jmp *", but I certainly don't see that when I
> compile the file myself. I've seen it in the past in some (very
> unusual) cases, so it's possible - gcc definitely knows about
> tail-call jmp conversion (even if it makes debugging sometimes a
> pain).
> 
> Jiri, can you check your try_to_wake_up() disassembly for some
> indirect "jmp" instructions?
> 
>                         Linus
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 20:51         ` Linus Torvalds
@ 2016-02-25 21:32           ` Jiri Slaby
  2016-02-25 22:33             ` Peter Hurley
  2016-02-26  0:38             ` Linus Torvalds
  2016-02-25 21:43           ` Peter Hurley
  2016-02-25 22:00           ` Jiri Kosina
  2 siblings, 2 replies; 30+ messages in thread
From: Jiri Slaby @ 2016-02-25 21:32 UTC (permalink / raw)
  To: Linus Torvalds, Peter Hurley
  Cc: Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn,
	Steven Rostedt

On 02/25/2016, 09:51 PM, Linus Torvalds wrote:
> Jiri, can you check your try_to_wake_up() disassembly for some
> indirect "jmp" instructions?

Nope, there is none.

I will reply to all your questions tomorrow.

Just quickly, as I have to go (and don't want you to duplicate efforts)
the kernel which was used can be obtained here:
https://build.opensuse.org/package/binaries/openSUSE:Factory:Staging:I/kernel-default?repository=standard

The issue is very weird indeed; this is what I noted in our bugzilla:
The stack trace ends in a call to try_to_wake_up. So it has to be
one of the indirect calls:

callq  *0x40(%rax)
  p->sched_class->select_task_rq from select_task_rq

RAX is 0x00000000bb37e180, which can hardly be dereferenced at offset 0x40

callq  *0xd85656(%rip) # ffffffff81e2aba0 <smp_ops+0x20>
  smp_ops.smp_send_reschedule from ttwu_queue_remote

Which can hardly be it, given smp_ops is static.

So it has to be some other "call *" from a nested function :(.




Interestingly, RBP contains address inside try_to_wake_up --
ffffffff810a535a (dunno why) which is:
ffffffff810a5355:       e8 66 a0 ff ff          callq  ffffffff8109f3c0 <ttwu_stat>
ffffffff810a535a:       e9 9d fe ff ff          jmpq   ffffffff810a51fc <try_to_wake_up+0x3c>
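
As a cross-check of the two instructions above, the targets can be
recomputed from the raw bytes. This is a small Python sketch that assumes
the standard x86-64 encoding of a 5-byte near call (e8) or jmp (e9) with a
signed 32-bit displacement relative to the next instruction; rel32_target
is a hypothetical helper name.

```python
def rel32_target(insn_addr, insn_bytes):
    """Target of a 5-byte x86-64 near call (e8) or jmp (e9) with rel32."""
    if len(insn_bytes) != 5 or insn_bytes[0] not in (0xE8, 0xE9):
        raise ValueError("not a 5-byte call/jmp rel32")
    # The displacement is little-endian, signed, and relative to the
    # address of the instruction that follows (insn_addr + 5).
    disp = int.from_bytes(insn_bytes[1:5], "little", signed=True)
    return (insn_addr + 5 + disp) & 0xFFFFFFFFFFFFFFFF

call = rel32_target(0xffffffff810a5355, bytes.fromhex("e866a0ffff"))
jmp = rel32_target(0xffffffff810a535a, bytes.fromhex("e99dfeffff"))
print(hex(call), hex(jmp))
```

The call at ffffffff810a5355 is 5 bytes long, so its return address is
ffffffff810a535a, which is exactly the value found in RBP.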


ttwu_stat does at the beginning:
mov    $0x16e80,%r14

which is what we actually still have in r14 when it crashes. The first
ttwu_stat's "if" has to go through the true branch (otherwise r14 would
be overwritten).



Another note: we die when jmp/calling to 0xffff88023fd40000.
RSI=RDI=0xffff88023fdd6e80. RSI-RIP is 0x96e80, which is R14 + 0x80000.
Coincidence?
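
The arithmetic can be spelled out as follows. This is a sketch in Python
using the register values from the oops; GS_BASE is the gs base for CPU 7
as quoted elsewhere in this thread, and treating it as the per-cpu base is
an assumption, not something the oops itself proves.

```python
RIP = 0xffff88023fd40000      # faulting address from the oops
RSI = 0xffff88023fdd6e80      # RSI (== RDI) from the oops
R14 = 0x16e80                 # immediate loaded at the start of ttwu_stat
GS_BASE = 0xffff88023fdc0000  # gs base for CPU 7, as quoted in the thread

print(hex(RSI - RIP))         # distance from the faulting IP to RSI
print(hex(RSI - RIP - R14))   # what remains after subtracting R14
print(hex(RSI - GS_BASE))     # RSI as an offset from the CPU 7 gs base
print(hex(GS_BASE - RIP))     # distance between gs base and faulting IP
```

If that gs base is right, RSI is simply gs base + R14, and the extra
0x80000 is the distance between the CPU 7 gs base and the faulting IP, so
the relation may be less of a coincidence than it looks.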

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 20:32       ` Peter Hurley
@ 2016-02-25 20:51         ` Linus Torvalds
  2016-02-25 21:32           ` Jiri Slaby
                             ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Linus Torvalds @ 2016-02-25 20:51 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Jiri Slaby, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On Thu, Feb 25, 2016 at 12:32 PM, Peter Hurley <peter@hurleysoftware.com> wrote:
>> But yes, the call trace looks accurate and makes sense, we have:
>>
>>   tty_flip_buffer_push ->
>>     (queue_work is inline) ->
>>     queue_work_on ->
>>       __queue_work ->
>>         insert_work ->
>>           (wake_up_worker is inlined)
>>           wake_up_process ->
>
>               try_to_wake_up ->
>
>>             *insane non-code address*

The thing is, we don't actually have that try_to_wake_up() on the
stack in the oops report. There are other things on the stack, but the
first stack entry that is dumped that is a text address is that
"ffffffff810a5585" which is wake_up_process.

That's why I said it might be stack corruption: we might be returning
from try_to_wake_up(), but with a corrupt stack entry, and returning
to garbage.

If it was one of the calls _in_ try_to_wake_up() that called to insane
code, I would have expected to see try_to_wake_up on the stack.

That's particularly true on modern machines, where things like the
percpu area is hopefully marked NX, so that we shouldn't be executing
random instructions. Which is the fault that actually triggers
("kernel tried to execute NX-protected page"), so the "we corrupted
the stack by running random code at the original target of the jump"
scenario sounds much less likely.

So the whole oops looks odd. If it really was one of the calls from
try_to_wake_up(), why isn't that return address on the stack?

Since this is under qemu, I'm wondering if this is a qemu bug, where
the NX fault processing of a call instruction happens before the stack
is pushed, but when the instruction pointer already points to the new
address.

Another alternative *might* be that gcc has turned an indirect
tail-call call into a "jmp *", but I certainly don't see that when I
compile the file myself. I've seen it in the past in some (very
unusual) cases, so it's possible - gcc definitely knows about
tail-call jmp conversion (even if it makes debugging sometimes a
pain).

Jiri, can you check your try_to_wake_up() disassembly for some
indirect "jmp" instructions?

                        Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 19:09     ` Linus Torvalds
  2016-02-25 19:23       ` Steven Rostedt
@ 2016-02-25 20:32       ` Peter Hurley
  2016-02-25 20:51         ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Hurley @ 2016-02-25 20:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jiri Slaby, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On 02/25/2016 11:09 AM, Linus Torvalds wrote:
> On Thu, Feb 25, 2016 at 10:40 AM, Peter Hurley <peter@hurleysoftware.com> wrote:
>>
>> The crash itself is in try_to_wake_up() (again, assuming the stacktrace is
>> valid).
> 
> No, the crash seems to be off in la-la-land

I meant the last-known-good address is try_to_wake_up(); in the same way
that RIP @ 0 crashes, but no one says the crash is @ NULL.


>, judging by the oops:
> 
>    IP: [<ffff88023fd40000>] 0xffff88023fd40000
> 
> which isn't kernel code at all. It is close to, but not at, the percpu
> area you point out.

Assuming ffff88023fdc0000 is percpu start for cpu 7 then I'm pretty sure
         ffff88023fd40000 is percpu start for cpu 6.

Either way, RIP is almost certainly in the percpu block.


> But yes, the call trace looks accurate and makes sense, we have:
> 
>   tty_flip_buffer_push ->
>     (queue_work is inline) ->
>     queue_work_on ->
>       __queue_work ->
>         insert_work ->
>           (wake_up_worker is inlined)
>           wake_up_process ->

              try_to_wake_up ->

>             *insane non-code address*
> 
> but I cannot for the life of me see how we get to an insane address.
> It smells like stack corruption when returning from try_to_wake_up()
> or something like that.
> 
> Hmm. Actually, try_to_wake_up() will do several indirect calls
> (task_waking and select_task_rq, and it_func_ptr->fn for tracing), but
> then I'd expect to see try_to_wake_up itself in the stack trace.


> Of course, when you jump to la-la-land, crazy things can happen. And
> that offending IP is at a page boundary, so it might have run some
> random code on the previous page.
> 
> Quite frankly, neither ->task_waking() nor ->select_task_rq() look
> very likely.

Agreed, the sched_class indirections do not seem likely.


> But the tracepoint stuff is actually fairly dynamic, and
> does things like
> 
>     it_func_ptr = rcu_dereference_sched((tp)->funcs);
> 
> to get the function pointer information, so if there is some race in
> there, anything can happen.
> 
> Jiri, were you messing around with tracing when this happened? Or
> maybe shutting down CPU's? There was a RCU locking problem with CPU
> shutdown, maybe this is one of the symptoms. The fix for that is
> recent, and not in 4.4.2.
> 
> Adding Steven Rostedt to the cc. Steven, does that look like a possible case?
> 
>                         Linus
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 19:09     ` Linus Torvalds
@ 2016-02-25 19:23       ` Steven Rostedt
  2016-02-26  8:25         ` Jiri Slaby
  2016-02-25 20:32       ` Peter Hurley
  1 sibling, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2016-02-25 19:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Hurley, Jiri Slaby, Greg KH, Linux Kernel Mailing List,
	Andrew Morton, stable, lwn

On Thu, 25 Feb 2016 11:09:35 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:


> Jiri, were you messing around with tracing when this happened? Or
> maybe shutting down CPU's? There was a RCU locking problem with CPU
> shutdown, maybe this is one of the symptoms. The fix for that is
> recent, and not in 4.4.2.
> 
> Adding Steven Rostedt to the cc. Steven, does that look like a possible case?

Possible: yes. Likely: no

The recent fix would require shutting down a CPU at the same time as a
tracepoint is enabled or disabled. Rather difficult to hit, but easier
on a virtual machine. If Jiri was not enabling/disabling tracepoints or
shutting down CPUs, then it would not be the bug.

But as the comm of the bug is gdb and this running on a virtual
machine, I think the bug may be elsewhere. Corrupt stack possibly?

-- Steve

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 18:40   ` Peter Hurley
@ 2016-02-25 19:09     ` Linus Torvalds
  2016-02-25 19:23       ` Steven Rostedt
  2016-02-25 20:32       ` Peter Hurley
  2016-02-26  8:15     ` Jiri Slaby
  1 sibling, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2016-02-25 19:09 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Jiri Slaby, Greg KH, Linux Kernel Mailing List, Andrew Morton,
	stable, lwn, Steven Rostedt

On Thu, Feb 25, 2016 at 10:40 AM, Peter Hurley <peter@hurleysoftware.com> wrote:
>
> The crash itself is in try_to_wake_up() (again, assuming the stacktrace is
> valid).

No, the crash seems to be off in la-la-land, judging by the oops:

   IP: [<ffff88023fd40000>] 0xffff88023fd40000

which isn't kernel code at all. It is close to, but not at, the percpu
area you point out.

But yes, the call trace looks accurate and makes sense, we have:

  tty_flip_buffer_push ->
    (queue_work is inline) ->
    queue_work_on ->
      __queue_work ->
        insert_work ->
          (wake_up_worker is inlined)
          wake_up_process ->
            *insane non-code address*

but I cannot for the life of me see how we get to an insane address.
It smells like stack corruption when returning from try_to_wake_up()
or something like that.

Hmm. Actually, try_to_wake_up() will do several indirect calls
(task_waking and select_task_rq, and it_func_ptr->fn for tracing), but
then I'd expect to see try_to_wake_up itself in the stack trace.

Of course, when you jump to la-la-land, crazy things can happen. And
that offending IP is at a page boundary, so it might have run some
random code on the previous page.

Quite frankly, neither ->task_waking() nor ->select_task_rq() look
very likely. But the tracepoint stuff is actually fairly dynamic, and
does things like

    it_func_ptr = rcu_dereference_sched((tp)->funcs);

to get the function pointer information, so if there is some race in
there, anything can happen.

Jiri, were you messing around with tracing when this happened? Or
maybe shutting down CPU's? There was a RCU locking problem with CPU
shutdown, maybe this is one of the symptoms. The fix for that is
recent, and not in 4.4.2.

Adding Steven Rostedt to the cc. Steven, does that look like a possible case?

                        Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-25 10:12 ` BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2] Jiri Slaby
@ 2016-02-25 18:40   ` Peter Hurley
  2016-02-25 19:09     ` Linus Torvalds
  2016-02-26  8:15     ` Jiri Slaby
  0 siblings, 2 replies; 30+ messages in thread
From: Peter Hurley @ 2016-02-25 18:40 UTC (permalink / raw)
  To: Jiri Slaby, Greg KH, linux-kernel, Andrew Morton, torvalds, stable; +Cc: lwn

On 02/25/2016 02:12 AM, Jiri Slaby wrote:
> On 02/17/2016, 09:37 PM, Greg KH wrote:
>> I'm announcing the release of the 4.4.2 kernel.
> ...
>> Peter Hurley (4):
>>       n_tty: Fix unsafe reference to "other" ldisc
>>       tty: Wait interruptibly for tty lock on reopen
>>       tty: Retry failed reopen if tty teardown in-progress
>>       tty: Fix unsafe ldisc reference via ioctl(TIOCGETD)
> 
> It seems like 4.4.2 schedules a tty flush work but the work is deleted
> meanwhile.

Assuming the stack backtrace is accurate, this doesn't look like a freed
work crash.

The process being woken here is a workqueue kworker for the
system_unbound_wq, not the work function being queued.

The crash itself is in try_to_wake_up() (again, assuming the stacktrace is
valid).

Looking at the gs base @ ffff88023fdc0000 which is for CPU7,
rip @ ffff88023fd40000 appears to be in the PERCPU area. You can confirm
this in the kernel log (grep PERCPU) which prints the pcpu base ptr.
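
A quick way to see how the two addresses relate is to subtract them. The sketch below uses only the GS base and RIP from the oops. The helper units_between() is hypothetical: it takes the per-cpu unit size as a parameter, because that value is not given anywhere in this thread and has to come from the PERCPU line in the kernel log. It also assumes the per-cpu units are equally sized and laid out back to back.

```python
GS_BASE_CPU7 = 0xffff88023fdc0000  # GS base from the oops (CPU 7)
RIP = 0xffff88023fd40000           # faulting instruction pointer

# Distance from the faulting address up to CPU 7's per-cpu base.
delta = GS_BASE_CPU7 - RIP
print(f"rip is {delta:#x} bytes below the CPU 7 GS base")


def units_between(unit_size):
    """Split delta into (whole per-cpu units, leftover bytes).

    unit_size is an assumption supplied by the caller; read the real
    value from the PERCPU line in the kernel log.
    """
    return divmod(delta, unit_size)
```

If the leftover is zero for the real unit size, rip points exactly at the start of another CPU's per-cpu unit.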


> This was triggered by a gdb build on our servers [1].

I note that the crash does not seem to happen while building gdb as such,
but while gdb itself is running (Comm: gdb). Perhaps during a test that
failed? Maybe some ABI violation with gdb + kvm?

Is this reproducible?

Regards,
Peter Hurley


> Going to investigate further, if this doesn't ring a bell?
> 
> [1]
> https://build.opensuse.org/package/live_build_log/openSUSE:Factory:Staging:I/gdb/standard/x86_64
> 
> kernel tried to execute NX-protected page - exploit attempt? (uid: 399)
> BUG: unable to handle kernel paging request at ffff88023fd40000
> IP: [<ffff88023fd40000>] 0xffff88023fd40000
> PGD 2240067 PUD 23fced063 PMD 23fcee063 PTE 800000023fd40163
> Oops: 0011 [#1] PREEMPT SMP
> Modules linked in: ata_generic ata_piix nls_iso8859_1 nls_cp437 vfat fat
> virtio_rng virtio_blk virtio_pci virtio
> k_ipv6 nf_defrag_ipv6 nf_conntrack btrfs xor raid6_pq reiserfs squashfs
> fuse dm_snapshot dm_bufio dm_mod binfmt_
> misc loop sg
> CPU: 7 PID: 3127 Comm: gdb Not tainted 4.4.2-3-default #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
> task: ffff8801e43a4240 ti: ffff8800bb2a4000 task.ti: ffff8800bb2a4000
> RIP: 0010:[<ffff88023fd40000>]  [<ffff88023fd40000>] 0xffff88023fd40000
> RSP: 0018:ffff8800bb2a7c50  EFLAGS: 00056686
> RAX: 00000000bb37e180 RBX: 0000000000000001 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: ffff88023fdd6e80 RDI: ffff88023fdd6e80
> RBP: ffffffff810a535a R08: 0000000000000000 R09: 0000000000000020
> R10: 0000000001b52cb0 R11: 0000000000000293 R12: 0000000000000046
> R13: ffff8800bb37e180 R14: 0000000000016e80 R15: ffff8800bb2a7c80
> FS:  00007fe3e4aba740(0000) GS:ffff88023fdc0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff88023fd40000 CR3: 00000002353cd000 CR4: 00000000000406e0
> Stack:
>  000000008146e197 ffff88017ee19f00 ffff880234e26a08 ffff88017ed2a830
>  0000000000000005 0000000000010e30 ffff8800bb2a7c90 ffffffff810a5585
>  ffff8800bb2a7cc8 ffffffff81092fe1 0000000000000000 ffff88017ee19f00
> Call Trace:
> Inexact backtrace:
> 
>  [<ffffffff810a5585>] ? wake_up_process+0x15/0x20
>  [<ffffffff81092fe1>] ? insert_work+0x81/0xc0
>  [<ffffffff8109326c>] ? __queue_work+0x24c/0x390
>  [<ffffffff81093947>] ? queue_work_on+0x27/0x40
>  [<ffffffff814732db>] ? tty_flip_buffer_push+0x2b/0x30
>  [<ffffffff81474f1a>] ? pty_write+0x4a/0x60
>  [<ffffffff8146e5c6>] ? n_tty_write+0x1b6/0x4d0
>  [<ffffffff810bd330>] ? __wake_up_sync+0x20/0x20
>  [<ffffffff8146a96b>] ? tty_write+0x1cb/0x2b0
>  [<ffffffff8146e410>] ? n_tty_open+0xe0/0xe0
>  [<ffffffff811fa858>] ? __vfs_write+0x28/0xf0
>  [<ffffffff81334a48>] ? apparmor_file_permission+0x18/0x20
>  [<ffffffff812ff05d>] ? security_file_permission+0x3d/0xc0
>  [<ffffffff811facbf>] ? rw_verify_area+0x4f/0xe0
>  [<ffffffff811faf29>] ? vfs_write+0xa9/0x1a0
>  [<ffffffff811fbb26>] ? SyS_write+0x46/0xa0
>  [<ffffffff816a96f6>] ? entry_SYSCALL_64_fastpath+0x16/0x75
> Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 0
> 00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> RIP  [<ffff88023fd40000>] 0xffff88023fd40000
>  RSP <ffff8800bb2a7c50>
> CR2: ffff88023fd40000
> ---[ end trace 14d86b882766d1bf ]---
> 
> thanks,
> 


* BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
  2016-02-17 20:37 Linux 4.4.2 Greg KH
@ 2016-02-25 10:12 ` Jiri Slaby
  2016-02-25 18:40   ` Peter Hurley
  0 siblings, 1 reply; 30+ messages in thread
From: Jiri Slaby @ 2016-02-25 10:12 UTC (permalink / raw)
  To: Greg KH, linux-kernel, Andrew Morton, torvalds, stable
  Cc: lwn, Jiri Slaby, Peter Hurley

On 02/17/2016, 09:37 PM, Greg KH wrote:
> I'm announcing the release of the 4.4.2 kernel.
...
> Peter Hurley (4):
>       n_tty: Fix unsafe reference to "other" ldisc
>       tty: Wait interruptibly for tty lock on reopen
>       tty: Retry failed reopen if tty teardown in-progress
>       tty: Fix unsafe ldisc reference via ioctl(TIOCGETD)

It seems like 4.4.2 schedules a tty flush work but the work is deleted
meanwhile. This was triggered by a gdb build on our servers [1]. Going
to investigate further, if this doesn't ring a bell?

[1]
https://build.opensuse.org/package/live_build_log/openSUSE:Factory:Staging:I/gdb/standard/x86_64

kernel tried to execute NX-protected page - exploit attempt? (uid: 399)
BUG: unable to handle kernel paging request at ffff88023fd40000
IP: [<ffff88023fd40000>] 0xffff88023fd40000
PGD 2240067 PUD 23fced063 PMD 23fcee063 PTE 800000023fd40163
Oops: 0011 [#1] PREEMPT SMP
Modules linked in: ata_generic ata_piix nls_iso8859_1 nls_cp437 vfat fat
virtio_rng virtio_blk virtio_pci virtio
k_ipv6 nf_defrag_ipv6 nf_conntrack btrfs xor raid6_pq reiserfs squashfs
fuse dm_snapshot dm_bufio dm_mod binfmt_
misc loop sg
CPU: 7 PID: 3127 Comm: gdb Not tainted 4.4.2-3-default #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
task: ffff8801e43a4240 ti: ffff8800bb2a4000 task.ti: ffff8800bb2a4000
RIP: 0010:[<ffff88023fd40000>]  [<ffff88023fd40000>] 0xffff88023fd40000
RSP: 0018:ffff8800bb2a7c50  EFLAGS: 00056686
RAX: 00000000bb37e180 RBX: 0000000000000001 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: ffff88023fdd6e80 RDI: ffff88023fdd6e80
RBP: ffffffff810a535a R08: 0000000000000000 R09: 0000000000000020
R10: 0000000001b52cb0 R11: 0000000000000293 R12: 0000000000000046
R13: ffff8800bb37e180 R14: 0000000000016e80 R15: ffff8800bb2a7c80
FS:  00007fe3e4aba740(0000) GS:ffff88023fdc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88023fd40000 CR3: 00000002353cd000 CR4: 00000000000406e0
Stack:
 000000008146e197 ffff88017ee19f00 ffff880234e26a08 ffff88017ed2a830
 0000000000000005 0000000000010e30 ffff8800bb2a7c90 ffffffff810a5585
 ffff8800bb2a7cc8 ffffffff81092fe1 0000000000000000 ffff88017ee19f00
Call Trace:
Inexact backtrace:

 [<ffffffff810a5585>] ? wake_up_process+0x15/0x20
 [<ffffffff81092fe1>] ? insert_work+0x81/0xc0
 [<ffffffff8109326c>] ? __queue_work+0x24c/0x390
 [<ffffffff81093947>] ? queue_work_on+0x27/0x40
 [<ffffffff814732db>] ? tty_flip_buffer_push+0x2b/0x30
 [<ffffffff81474f1a>] ? pty_write+0x4a/0x60
 [<ffffffff8146e5c6>] ? n_tty_write+0x1b6/0x4d0
 [<ffffffff810bd330>] ? __wake_up_sync+0x20/0x20
 [<ffffffff8146a96b>] ? tty_write+0x1cb/0x2b0
 [<ffffffff8146e410>] ? n_tty_open+0xe0/0xe0
 [<ffffffff811fa858>] ? __vfs_write+0x28/0xf0
 [<ffffffff81334a48>] ? apparmor_file_permission+0x18/0x20
 [<ffffffff812ff05d>] ? security_file_permission+0x3d/0xc0
 [<ffffffff811facbf>] ? rw_verify_area+0x4f/0xe0
 [<ffffffff811faf29>] ? vfs_write+0xa9/0x1a0
 [<ffffffff811fbb26>] ? SyS_write+0x46/0xa0
 [<ffffffff816a96f6>] ? entry_SYSCALL_64_fastpath+0x16/0x75
Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 0
00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
RIP  [<ffff88023fd40000>] 0xffff88023fd40000
 RSP <ffff8800bb2a7c50>
CR2: ffff88023fd40000
---[ end trace 14d86b882766d1bf ]---
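
The "Oops: 0011" error code and the PTE value above can be decoded by hand. The sketch below does so using the architectural x86-64 bit layouts for the page-fault error code and for a 4 KiB page-table entry. The function and table names are illustrative and are not taken from the kernel sources.

```python
# x86 page-fault error code bits.
FAULT_BITS = {
    0: "protection violation (page was present)",
    1: "write access",
    2: "user mode",
    3: "reserved bit set",
    4: "instruction fetch",
}

# Flag bits of a 4 KiB x86-64 page-table entry.
PTE_FLAGS = {
    0: "present", 1: "writable", 2: "user", 3: "write-through",
    4: "cache-disable", 5: "accessed", 6: "dirty", 7: "PAT",
    8: "global", 63: "no-execute",
}

PTE_ADDR_MASK = 0x000ffffffffff000  # physical frame address, bits 12..51


def decode_fault(code):
    """List the meanings of the bits set in a page-fault error code."""
    return [name for bit, name in FAULT_BITS.items() if code & (1 << bit)]


def decode_pte(pte):
    """Return (list of set flag names, physical address) for a PTE."""
    flags = [name for bit, name in PTE_FLAGS.items() if pte & (1 << bit)]
    return flags, pte & PTE_ADDR_MASK


fault = decode_fault(0x0011)                  # "Oops: 0011"
flags, phys = decode_pte(0x800000023fd40163)  # "PTE 800000023fd40163"
print(fault)
print(flags, hex(phys))
```

The error code decodes to an instruction fetch from a present page with the user-mode bit clear, and the PTE has the no-execute bit set, which agrees with the "kernel tried to execute NX-protected page" message.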

thanks,
-- 
js
suse labs


end of thread, other threads:[~2016-02-29 15:45 UTC | newest]

Thread overview: 30+ messages
2016-02-26 18:05 BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2] Linus Torvalds
2016-02-26 18:17 ` Borislav Petkov
2016-02-26 18:18 ` Peter Hurley
2016-02-26 19:44 ` Linus Torvalds
2016-02-26 19:59   ` Robert Święcki
2016-02-29  7:39     ` Jiri Slaby
2016-02-29 12:43       ` Henrique de Moraes Holschuh
  -- strict thread matches above, loose matches on Subject: below --
2016-02-17 20:37 Linux 4.4.2 Greg KH
2016-02-25 10:12 ` BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2] Jiri Slaby
2016-02-25 18:40   ` Peter Hurley
2016-02-25 19:09     ` Linus Torvalds
2016-02-25 19:23       ` Steven Rostedt
2016-02-26  8:25         ` Jiri Slaby
2016-02-25 20:32       ` Peter Hurley
2016-02-25 20:51         ` Linus Torvalds
2016-02-25 21:32           ` Jiri Slaby
2016-02-25 22:33             ` Peter Hurley
2016-02-26  0:38               ` Peter Hurley
2016-02-26  8:45                 ` Jiri Slaby
2016-02-26  0:38             ` Linus Torvalds
2016-02-26  8:56               ` Jiri Slaby
2016-02-26  9:23                 ` Jiri Slaby
2016-02-26  9:50                   ` Jiri Slaby
2016-02-26 16:34                     ` Greg KH
2016-02-26 17:12                 ` Linus Torvalds
2016-02-29 15:45                   ` Paolo Bonzini
2016-02-26 17:52                 ` Peter Hurley
2016-02-25 21:43           ` Peter Hurley
2016-02-25 22:00           ` Jiri Kosina
2016-02-26  8:31             ` Jiri Slaby
2016-02-26  8:15     ` Jiri Slaby
