linux-kernel.vger.kernel.org archive mirror
* frequent lockups in 3.18rc4
@ 2014-11-14 21:31 Dave Jones
  2014-11-14 22:01 ` Linus Torvalds
  2014-11-17 15:07 ` Don Zickus
  0 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-14 21:31 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Linus Torvalds

I'm not sure how long this goes back (3.17 was fine afair) but I'm
seeing these several times a day lately..


NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c129:25570]
irq event stamp: 74224
hardirqs last  enabled at (74223): [<ffffffff9c875664>] restore_args+0x0/0x30
hardirqs last disabled at (74224): [<ffffffff9c8759aa>] apic_timer_interrupt+0x6a/0x80
softirqs last  enabled at (74222): [<ffffffff9c07f43a>] __do_softirq+0x26a/0x6f0
softirqs last disabled at (74209): [<ffffffff9c07fb4d>] irq_exit+0x13d/0x170
CPU: 3 PID: 25570 Comm: trinity-c129 Not tainted 3.18.0-rc4+ #83 [loadavg: 198.04 186.66 181.58 24/442 26708]
task: ffff880213442f00 ti: ffff8801ea714000 task.ti: ffff8801ea714000
RIP: 0010:[<ffffffff9c11e98a>]  [<ffffffff9c11e98a>] generic_exec_single+0xea/0x1d0
RSP: 0018:ffff8801ea717a08  EFLAGS: 00000202
RAX: ffff880213442f00 RBX: ffffffff9c875664 RCX: 0000000000000006
RDX: 0000000000001370 RSI: ffff880213443790 RDI: ffff880213442f00
RBP: ffff8801ea717a68 R08: ffff880242b56690 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801ea717978
R13: ffff880213442f00 R14: ffff8801ea714000 R15: ffff880213442f00
FS:  00007f240994e700(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 000000019a017000 CR4: 00000000001407e0
DR0: 00007fb3367e0000 DR1: 00007f82542ab000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffffffff9ce4c620 0000000000000000 ffffffff9c048b20 ffff8801ea717b18
 0000000000000003 0000000052e0da3d ffffffff9cc7ef3c 0000000000000002
 ffffffff9c048b20 ffff8801ea717b18 0000000000000001 0000000000000003
Call Trace:
 [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
 [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
 [<ffffffff9c11ead6>] smp_call_function_single+0x66/0x110
 [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
 [<ffffffff9c11f021>] smp_call_function_many+0x2f1/0x390
 [<ffffffff9c049300>] flush_tlb_mm_range+0xe0/0x370
 [<ffffffff9c1d95a2>] tlb_flush_mmu_tlbonly+0x42/0x50
 [<ffffffff9c1d9cb5>] tlb_finish_mmu+0x45/0x50
 [<ffffffff9c1daf59>] zap_page_range_single+0x119/0x170
 [<ffffffff9c1db140>] unmap_mapping_range+0x140/0x1b0
 [<ffffffff9c1c7edd>] shmem_fallocate+0x43d/0x540
 [<ffffffff9c0b111b>] ? preempt_count_sub+0xab/0x100
 [<ffffffff9c0cdac7>] ? prepare_to_wait+0x27/0x80
 [<ffffffff9c2287f3>] ? __sb_start_write+0x103/0x1d0
 [<ffffffff9c223aba>] do_fallocate+0x12a/0x1c0
 [<ffffffff9c1f0bd3>] SyS_madvise+0x3d3/0x890
 [<ffffffff9c1a40d2>] ? context_tracking_user_exit+0x52/0x260
 [<ffffffff9c013ebd>] ? syscall_trace_enter_phase2+0x10d/0x3d0
 [<ffffffff9c874c89>] tracesys_phase2+0xd4/0xd9
Code: 63 c7 48 89 de 48 89 df 48 c7 c2 c0 50 1d 00 48 03 14 c5 40 b9 f2 9c e8 d5 ea 2b 00 84 c0 74 0b e9 bc 00 00 00 0f 1f 40 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48 8b 4d c8 65 48 33 0c 25 28 00 00 00 
Kernel panic - not syncing: softlockup: hung tasks


I've got a local hack to dump loadavg on traces, and as you can see in that
example, the machine was really busy, but we were at least making progress
before the trace spewed, and the machine rebooted. (I have reboot-on-lockup sysctl
set, without it, the machine just wedges indefinitely shortly after the spew).
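
(For reference, that setup is presumably just the standard watchdog knobs,
something like

    kernel.softlockup_panic = 1
    kernel.panic = 30

i.e. panic when a soft lockup is detected and reboot shortly afterwards;
the values above are only an illustration, not the actual config.)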

The trace doesn't really enlighten me as to what we should be doing
to prevent this though.

ideas?
I can try to bisect it, but it takes hours before it happens,
so it might take days to complete, and the next few weeks are
complicated timewise..

	Dave



* Re: frequent lockups in 3.18rc4
  2014-11-14 21:31 frequent lockups in 3.18rc4 Dave Jones
@ 2014-11-14 22:01 ` Linus Torvalds
  2014-11-14 22:30   ` Dave Jones
                     ` (2 more replies)
  2014-11-17 15:07 ` Don Zickus
  1 sibling, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-14 22:01 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel; +Cc: the arch/x86 maintainers

On Fri, Nov 14, 2014 at 1:31 PM, Dave Jones <davej@redhat.com> wrote:
> I'm not sure how long this goes back (3.17 was fine afair) but I'm
> seeing these several times a day lately..

Hmm. I don't see what would have changed in this area since v3.17.
There's a TLB range fix in mm/memory.c, but for the life of me I can't
see how that would possibly matter, given the way x86 does TLB flushing (if
the range fix does something bad and the range goes too large, x86
will just end up doing a full TLB invalidate instead).

Plus, judging by the fact that there's a stale "leave_mm+0x210/0x210"
(wouldn't that be the *next* function, namely do_flush_tlb_all())
pointer on the stack, I suspect that whole range-flushing doesn't even
trigger, and we are flushing everything.
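
For reference, the fallback described above looks roughly like this (a
simplified sketch of the ~3.18 arch/x86/mm/tlb.c logic, written from
memory, so names are approximate and this is not the real source):

	/* Sketch of the local-flush decision in flush_tlb_mm_range():
	 * an oversized or bogus range just degrades to a full flush. */
	static void flush_tlb_range_sketch(unsigned long start, unsigned long end)
	{
		unsigned long addr;

		if (end == TLB_FLUSH_ALL ||
		    (end - start) >> PAGE_SHIFT > tlb_single_page_flush_ceiling) {
			local_flush_tlb();		/* full invalidate */
		} else {
			for (addr = start; addr < end; addr += PAGE_SIZE)
				__flush_tlb_single(addr);	/* per-page */
		}
		/* ...the real function then calls flush_tlb_others() for
		 * the remote CPUs, which is where the trace above sits. */
	}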

But since you say "several times a day", just for fun, can you test
the follow-up patch to that one-liner fix that Will Deacon posted
today (Subject: "[PATCH] mmu_gather: move minimal range calculations
into generic code"). That does some further cleanup in this area.

I don't see any changes to the x86 IPI or TLB flush handling, but
maybe I'm missing something, so I'm adding the x86 maintainers to the
cc.

> I've got a local hack to dump loadavg on traces, and as you can see in that
> example, the machine was really busy, but we were at least making progress
> before the trace spewed, and the machine rebooted. (I have reboot-on-lockup sysctl
> set, without it, the machine just wedges indefinitely shortly after the spew).
>
> The trace doesn't really enlighten me as to what we should be doing
> to prevent this though.
>
> ideas?

I can't say I have any ideas except to point at the TLB range patch,
and quite frankly, I don't see how that would matter.

If Will's patch doesn't make a difference, what about reverting that
ce9ec37bddb6? Although it really *is* an "obvious bugfix", and I really
don't see why any of this would be noticeable on x86 (it triggered
issues on ARM64, but that was because ARM64 cared much more about the
exact range).

> I can try to bisect it, but it takes hours before it happens,
> so it might take days to complete, and the next few weeks are
> complicated timewise..

Hmm. Even narrowing it down a bit might help, ie if you could get say
four bisections in over a day, and see if that at least says "ok, it's
likely one of these pulls".

But yeah, I can see it being painful, so maybe a quick check of the
TLB ones, even if I can't for the life of me see why they would possibly
matter.

                 Linus

---
> NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c129:25570]
> irq event stamp: 74224
> hardirqs last  enabled at (74223): [<ffffffff9c875664>] restore_args+0x0/0x30
> hardirqs last disabled at (74224): [<ffffffff9c8759aa>] apic_timer_interrupt+0x6a/0x80
> softirqs last  enabled at (74222): [<ffffffff9c07f43a>] __do_softirq+0x26a/0x6f0
> softirqs last disabled at (74209): [<ffffffff9c07fb4d>] irq_exit+0x13d/0x170
> CPU: 3 PID: 25570 Comm: trinity-c129 Not tainted 3.18.0-rc4+ #83 [loadavg: 198.04 186.66 181.58 24/442 26708]
> RIP: 0010:[<ffffffff9c11e98a>]  [<ffffffff9c11e98a>] generic_exec_single+0xea/0x1d0
> Call Trace:
>  [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
>  [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
>  [<ffffffff9c11ead6>] smp_call_function_single+0x66/0x110
>  [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
>  [<ffffffff9c11f021>] smp_call_function_many+0x2f1/0x390
>  [<ffffffff9c049300>] flush_tlb_mm_range+0xe0/0x370
>  [<ffffffff9c1d95a2>] tlb_flush_mmu_tlbonly+0x42/0x50
>  [<ffffffff9c1d9cb5>] tlb_finish_mmu+0x45/0x50
>  [<ffffffff9c1daf59>] zap_page_range_single+0x119/0x170
>  [<ffffffff9c1db140>] unmap_mapping_range+0x140/0x1b0
>  [<ffffffff9c1c7edd>] shmem_fallocate+0x43d/0x540
>  [<ffffffff9c223aba>] do_fallocate+0x12a/0x1c0
>  [<ffffffff9c1f0bd3>] SyS_madvise+0x3d3/0x890
>  [<ffffffff9c874c89>] tracesys_phase2+0xd4/0xd9
> Kernel panic - not syncing: softlockup: hung tasks


* Re: frequent lockups in 3.18rc4
  2014-11-14 22:01 ` Linus Torvalds
@ 2014-11-14 22:30   ` Dave Jones
  2014-11-14 22:55   ` Thomas Gleixner
  2014-11-15 21:34   ` Dave Jones
  2 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-14 22:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, the arch/x86 maintainers

On Fri, Nov 14, 2014 at 02:01:27PM -0800, Linus Torvalds wrote:
 
 > Plus, judging by the fact that there's a stale "leave_mm+0x210/0x210"
 > (wouldn't that be the *next* function, namely do_flush_tlb_all())
 > pointer on the stack, I suspect that whole range-flushing doesn't even
 > trigger, and we are flushing everything.
 > 
 > But since you say "several times a day", just for fun, can you test
 > the follow-up patch to that one-liner fix that Will Deacon posted
 > today (Subject: "[PATCH] mmu_gather: move minimal range calculations
 > into generic code"). That does some further cleanup in this area.

I'll give it a shot. Should know by the morning if it changes anything.

 > > The trace doesn't really enlighten me as to what we should be doing
 > > to prevent this though.
 > >
 > > ideas?
 > 
 > I can't say I have any ideas except to point at the TLB range patch,
 > and quite frankly, I don't see how that would matter.
 > 
 > If Will's patch doesn't make a difference, what about reverting that
 > ce9ec37bddb6? Although it really *is* an "obvious bugfix", and I really
 > don't see why any of this would be noticeable on x86 (it triggered
 > issues on ARM64, but that was because ARM64 cared much more about the
 > exact range).

Digging through the serial console logs, there was one other trace variant,
which is even less informative..

NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c104:19168]
irq event stamp: 223186
hardirqs last  enabled at (223185): [<ffffffff941a4092>] context_tracking_user_exit+0x52/0x260
hardirqs last disabled at (223186): [<ffffffff948756aa>] apic_timer_interrupt+0x6a/0x80
softirqs last  enabled at (187030): [<ffffffff9407f43a>] __do_softirq+0x26a/0x6f0
softirqs last disabled at (187017): [<ffffffff9407fb4d>] irq_exit+0x13d/0x170
CPU: 3 PID: 19168 Comm: trinity-c104 Not tainted 3.18.0-rc4+ #82 [loadavg: 99.30 85.88 82.88 9/303 19302]
task: ffff88023f8b4680 ti: ffff880157418000 task.ti: ffff880157418000
RIP: 0010:[<ffffffff941a4094>]  [<ffffffff941a4094>] context_tracking_user_exit+0x54/0x260
RSP: 0018:ffff88015741bee8  EFLAGS: 00000246
RAX: ffff88023f8b4680 RBX: ffffffff940b111b RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88023f8b4680
RBP: ffff88015741bef8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88015741bf58 R14: ffff88023f8b4ae8 R15: ffff88023f8b4b18
FS:  00007f9a0789b740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000003dfa1b7c90 CR3: 0000000165f3c000 CR4: 00000000001407e0
DR0: 00000000ffffffbf DR1: 00007f2c0c3d9000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 0000000000080000 ffff88015741c000 ffff88015741bf78 ffffffff94013d35
 ffff88015741bf28 ffffffff940d865d 0000000000004b02 0000000000000000
 00007f9a071bb000 ffffffff943d816b 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff94013d35>] syscall_trace_enter_phase1+0x125/0x1a0
 [<ffffffff940d865d>] ? trace_hardirqs_on_caller+0x16d/0x210
 [<ffffffff943d816b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff9487487f>] tracesys+0x14/0x4a
Code: fa e8 51 0a f3 ff 48 c7 c7 26 52 cd 94 e8 f5 21 24 00 65 8b 04 25 f4 f8 1c 00 83 f8 01 74 28 f6 c7 02 74 13 e8 6e 46 f3 ff 53 9d <5b> 41 5c 5d c3 0f 1f 80 00 00 00 00 53 9d e8 19 0a f3 ff eb eb 

It looks like I've been seeing these since 3.18-rc1, but for those the
machine crashed before the trace even made it over usb-serial, leaving
just the "NMI watchdog" line.


 > > I can try to bisect it, but it takes hours before it happens,
 > > so it might take days to complete, and the next few weeks are
 > > complicated timewise..
 > 
 > Hmm. Even narrowing it down a bit might help, ie if you could get say
 > four bisections in over a day, and see if that at least says "ok, it's
 > likely one of these pulls".
 > 
 > But yeah, I can see it being painful, so maybe a quick check of the
 >  > TLB ones, even if I can't for the life of me see why they would possibly
 > matter.

Assuming the NMI watchdog traces I saw in rc1 are the same problem,
I'll see if I can bisect between .17 and .18rc1 on Monday, and see
if that yields anything interesting.

	Dave



* Re: frequent lockups in 3.18rc4
  2014-11-14 22:01 ` Linus Torvalds
  2014-11-14 22:30   ` Dave Jones
@ 2014-11-14 22:55   ` Thomas Gleixner
  2014-11-14 23:32     ` Dave Jones
  2014-11-15  1:59     ` Linus Torvalds
  2014-11-15 21:34   ` Dave Jones
  2 siblings, 2 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-14 22:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, the arch/x86 maintainers

On Fri, 14 Nov 2014, Linus Torvalds wrote:
> On Fri, Nov 14, 2014 at 1:31 PM, Dave Jones <davej@redhat.com> wrote:
> > I'm not sure how long this goes back (3.17 was fine afair) but I'm
> > seeing these several times a day lately..
>
> Plus, judging by the fact that there's a stale "leave_mm+0x210/0x210"
> (wouldn't that be the *next* function, namely do_flush_tlb_all())
> pointer on the stack, I suspect that whole range-flushing doesn't even
> trigger, and we are flushing everything.

This stale entry is not relevant here because the thing is stuck in
generic_exec_single().
 
> > NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c129:25570]
> > RIP: 0010:[<ffffffff9c11e98a>]  [<ffffffff9c11e98a>] generic_exec_single+0xea/0x1d0

> > Call Trace:
> >  [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
> >  [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
> >  [<ffffffff9c11ead6>] smp_call_function_single+0x66/0x110
> >  [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
> >  [<ffffffff9c11f021>] smp_call_function_many+0x2f1/0x390
> >  [<ffffffff9c049300>] flush_tlb_mm_range+0xe0/0x370

flush_tlb_mm_range()
	.....
out:
        if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
                flush_tlb_others(mm_cpumask(mm), mm, start, end);

which calls

      smp_call_function_many() via native_flush_tlb_others()

which is either inlined or not on the stack, as the invocation of
smp_call_function_many() is a tail call.
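
For reference, a rough sketch of that call (based on the ~3.18
arch/x86/mm/tlb.c code, simplified; field names approximate):

	void native_flush_tlb_others(const struct cpumask *cpumask,
				     struct mm_struct *mm,
				     unsigned long start, unsigned long end)
	{
		struct flush_tlb_info info = {
			.flush_mm	= mm,
			.flush_start	= start,
			.flush_end	= end,
		};

		/* wait == 1: the sender spins until every target CPU has
		 * run flush_tlb_func(), i.e. until csd_lock_wait() sees
		 * the csd unlocked again. */
		smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
	}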

So from smp_call_function_many() we end up via
smp_call_function_single() in generic_exec_single().

So the only ways to get stuck there are:

     csd_lock(csd);
and
     csd_lock_wait(csd);

The called function is flush_tlb_func() and I really can't see why
that would get stuck at all.
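
Both of those are, for reference, just spins on the LOCK bit; roughly
this in the ~3.18 kernel/smp.c (a sketch, not the verbatim source):

	static void csd_lock_wait(struct call_single_data *csd)
	{
		while (csd->flags & CSD_FLAG_LOCK)
			cpu_relax();
	}

	static void csd_lock(struct call_single_data *csd)
	{
		csd_lock_wait(csd);
		csd->flags |= CSD_FLAG_LOCK;

		/* order the flag update before the subsequent stores to
		 * csd->func/csd->info and the list insertion */
		smp_mb();
	}

So being stuck in either one means some CPU never cleared CSD_FLAG_LOCK,
i.e. either a previous user of the csd was never unlocked, or the target
CPU never ran (or never completed) the queued callback.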

So this looks more like a smp function call fuckup.

I assume Dave is running that stuff on KVM. So it might be worth while
to look at the IPI magic there.

Thanks,

	tglx



* Re: frequent lockups in 3.18rc4
  2014-11-14 22:55   ` Thomas Gleixner
@ 2014-11-14 23:32     ` Dave Jones
  2014-11-15  0:36       ` Thomas Gleixner
  2014-11-15  1:59     ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-14 23:32 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Fri, Nov 14, 2014 at 11:55:30PM +0100, Thomas Gleixner wrote:
 
 > So this looks more like a smp function call fuckup.
 > 
 > I assume Dave is running that stuff on KVM. So it might be worth while
 > to look at the IPI magic there.

no, bare metal.

    Dave


* Re: frequent lockups in 3.18rc4
  2014-11-14 23:32     ` Dave Jones
@ 2014-11-15  0:36       ` Thomas Gleixner
  2014-11-15  2:40         ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-15  0:36 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Fri, 14 Nov 2014, Dave Jones wrote:

> On Fri, Nov 14, 2014 at 11:55:30PM +0100, Thomas Gleixner wrote:
>  
>  > So this looks more like a smp function call fuckup.
>  > 
>  > I assume Dave is running that stuff on KVM. So it might be worth while
>  > to look at the IPI magic there.
> 
> no, bare metal.

Ok, but that does not change the fact that we are stuck in
smp_function_call land.

Enabling softlockup_all_cpu_backtrace will probably not help much as
we will end up waiting for csd_lock again :(

Is the machine still accessible when this happens? If yes, we might
enable a few trace points and functions and read out the trace
buffer. If not, we could just panic the machine and dump the trace
buffer over serial.
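
One low-effort way to do the latter (assuming the usual ftrace knobs,
nothing specific to this box): boot with something like

	ftrace_dump_on_oops softlockup_panic=1

enable the interesting tracepoints under /sys/kernel/debug/tracing/, and
the trace buffer should then be dumped to the console when the watchdog
panics the machine.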

Sigh

	tglx


* Re: frequent lockups in 3.18rc4
  2014-11-14 22:55   ` Thomas Gleixner
  2014-11-14 23:32     ` Dave Jones
@ 2014-11-15  1:59     ` Linus Torvalds
  2014-11-17 21:22       ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-15  1:59 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Dave Jones, Linux Kernel, the arch/x86 maintainers

On Fri, Nov 14, 2014 at 2:55 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> This stale entry is not relevant here because the thing is stuck in
> generic_exec_single().

That wasn't really my argument. The fact that "do_flush_tlb_all()" was
left over on the stack frame implies that we're not doing the
range-flush, and if it was some odd bug with a negative range or
something like that (due to the fix in commit ce9ec37bddb6), I'd
expect the lockup to be due to a hung do_kernel_range_flush() or
something. But the range flushing never even happens.

> So from smp_call_function_many() we end up via
> smp_call_function_single() in generic_exec_single().
>
> So the only ways to get stuck there are:
>
>      csd_lock(csd);
> and
>      csd_lock_wait(csd);

Judging by the code disassembly, it's the "csd_lock_wait(csd)" at the
end. The disassembly looks like

  29: f3 90                 pause
  2b:* f6 43 18 01           testb  $0x1,0x18(%rbx) <-- trapping instruction
  2f: 75 f8                 jne    0x29
  31: 31 c0                 xor    %eax,%eax

and that "xor %eax,%eax" seems to be part of the "return 0"
immediately afterwards.

But that's not entirely conclusive, it's just a strong hint.
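
For reference, what makes it a strong hint: with the ~3.18 struct
call_single_data layout on x86-64 (sketched below from memory, exact
field types may differ), csd->flags ends up at offset 0x18 and
CSD_FLAG_LOCK is 0x01, so "pause; testb $0x1,0x18(%rbx); jne" matches
the obvious "while (csd->flags & CSD_FLAG_LOCK) cpu_relax();" spin in
csd_lock_wait().

	/* sketch of the ~3.18 layout, 64-bit */
	struct call_single_data {
		struct llist_node llist;	/* offset 0x00 */
		smp_call_func_t func;		/* offset 0x08 */
		void *info;			/* offset 0x10 */
		u16 flags;			/* offset 0x18 <- the testb */
	};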

It does sound like there might be some IPI issue. I just don't see
*any* changes in this area since 3.17. Some unrelated APIC change? I
don't see that either. As you noted, there are KVM changes, but
apparently that isn't involved either.

                 Linus


* Re: frequent lockups in 3.18rc4
  2014-11-15  0:36       ` Thomas Gleixner
@ 2014-11-15  2:40         ` Dave Jones
  2014-11-16 12:16           ` Thomas Gleixner
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-15  2:40 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Sat, Nov 15, 2014 at 01:36:41AM +0100, Thomas Gleixner wrote:
 > On Fri, 14 Nov 2014, Dave Jones wrote:
 > 
 > > On Fri, Nov 14, 2014 at 11:55:30PM +0100, Thomas Gleixner wrote:
 > >  
 > >  > So this looks more like a smp function call fuckup.
 > >  > 
 > >  > I assume Dave is running that stuff on KVM. So it might be worth while
 > >  > to look at the IPI magic there.
 > > 
 > > no, bare metal.
 > 
 > Ok, but that does not change the fact that we are stuck in
 > smp_function_call land.
 > 
 > Enabling softlockup_all_cpu_backtrace will probably not help much as
 > we will end up waiting for csd_lock again :(
 > 
 >  > Is the machine still accessible when this happens? If yes, we might
 > enable a few trace points and functions and read out the trace
 > buffer. If not, we could just panic the machine and dump the trace
 > buffer over serial.

No, it wedges solid. Even though it says something like "CPU3 locked up",
apparently all cores also get stuck.
9 times out of 10 it doesn't stay alive long enough to even get the full
trace out over usb-serial.

	Dave



* Re: frequent lockups in 3.18rc4
  2014-11-14 22:01 ` Linus Torvalds
  2014-11-14 22:30   ` Dave Jones
  2014-11-14 22:55   ` Thomas Gleixner
@ 2014-11-15 21:34   ` Dave Jones
  2014-11-16  1:40     ` Dave Jones
  2 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-15 21:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, the arch/x86 maintainers

On Fri, Nov 14, 2014 at 02:01:27PM -0800, Linus Torvalds wrote:

 > But since you say "several times a day", just for fun, can you test
 > the follow-up patch to that one-liner fix that Will Deacon posted
 > today (Subject: "[PATCH] mmu_gather: move minimal range calculations
 > into generic code"). That does some further cleanup in this area.

A few hours ago it hit the NMI watchdog again with that patch applied.
Incomplete trace, but it looks different based on what did make it over.
Different RIP at least.

[65155.054155] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c127:12559]
[65155.054573] irq event stamp: 296752
[65155.054589] hardirqs last  enabled at (296751): [<ffffffff9d87403d>] _raw_spin_unlock_irqrestore+0x5d/0x80
[65155.054625] hardirqs last disabled at (296752): [<ffffffff9d875cea>] apic_timer_interrupt+0x6a/0x80
[65155.054657] softirqs last  enabled at (296188): [<ffffffff9d259943>] bdi_queue_work+0x83/0x270
[65155.054688] softirqs last disabled at (296184): [<ffffffff9d259920>] bdi_queue_work+0x60/0x270
[65155.054721] CPU: 1 PID: 12559 Comm: trinity-c127 Not tainted 3.18.0-rc4+ #84 [loadavg: 209.68 187.90 185.33 34/431 17515]
[65155.054795] task: ffff88023f664680 ti: ffff8801649f0000 task.ti: ffff8801649f0000
[65155.054820] RIP: 0010:[<ffffffff9d87403f>]  [<ffffffff9d87403f>] _raw_spin_unlock_irqrestore+0x5f/0x80
[65155.054852] RSP: 0018:ffff8801649f3be8  EFLAGS: 00000292
[65155.054872] RAX: ffff88023f664680 RBX: 0000000000000007 RCX: 0000000000000007
[65155.054895] RDX: 00000000000029e0 RSI: ffff88023f664ea0 RDI: ffff88023f664680
[65155.054919] RBP: ffff8801649f3bf8 R08: 0000000000000000 R09: 0000000000000000
[65155.055956] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[65155.056985] R13: ffff8801649f3b58 R14: ffffffff9d3e7d0e R15: 00000000000003e0
[65155.058037] FS:  00007f0dc957c700(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
[65155.059083] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[65155.060121] CR2: 00007f0dc958e000 CR3: 000000022f31e000 CR4: 00000000001407e0
[65155.061152] DR0: 00007f54162bc000 DR1: 00007feb92c3d000 DR2: 0000000000000000
[65155.062180] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[65155.063202] Stack:

And that's all she wrote.

 > If Will's patch doesn't make a difference, what about reverting that
 >  > ce9ec37bddb6? Although it really *is* an "obvious bugfix", and I really
 > don't see why any of this would be noticeable on x86 (it triggered
 > issues on ARM64, but that was because ARM64 cared much more about the
 > exact range).

I'll try that next, and check in on it tomorrow.

	Dave


* Re: frequent lockups in 3.18rc4
  2014-11-15 21:34   ` Dave Jones
@ 2014-11-16  1:40     ` Dave Jones
  2014-11-16  6:33       ` Linus Torvalds
  2014-11-20 15:28       ` Frederic Weisbecker
  0 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-16  1:40 UTC (permalink / raw)
  To: Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Sat, Nov 15, 2014 at 04:34:05PM -0500, Dave Jones wrote:
 > On Fri, Nov 14, 2014 at 02:01:27PM -0800, Linus Torvalds wrote:
 > 
 >  > But since you say "several times a day", just for fun, can you test
 >  > the follow-up patch to that one-liner fix that Will Deacon posted
 >  > today (Subject: "[PATCH] mmu_gather: move minimal range calculations
 >  > into generic code"). That does some further cleanup in this area.
 > 
 > A few hours ago it hit the NMI watchdog again with that patch applied.
 > Incomplete trace, but it looks different based on what did make it over.
 > Different RIP at least.
 > 
 > [65155.054155] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c127:12559]
 > [65155.054573] irq event stamp: 296752
 > [65155.054589] hardirqs last  enabled at (296751): [<ffffffff9d87403d>] _raw_spin_unlock_irqrestore+0x5d/0x80
 > [65155.054625] hardirqs last disabled at (296752): [<ffffffff9d875cea>] apic_timer_interrupt+0x6a/0x80
 > [65155.054657] softirqs last  enabled at (296188): [<ffffffff9d259943>] bdi_queue_work+0x83/0x270
 > [65155.054688] softirqs last disabled at (296184): [<ffffffff9d259920>] bdi_queue_work+0x60/0x270
 > [65155.054721] CPU: 1 PID: 12559 Comm: trinity-c127 Not tainted 3.18.0-rc4+ #84 [loadavg: 209.68 187.90 185.33 34/431 17515]
 > [65155.054795] task: ffff88023f664680 ti: ffff8801649f0000 task.ti: ffff8801649f0000
 > [65155.054820] RIP: 0010:[<ffffffff9d87403f>]  [<ffffffff9d87403f>] _raw_spin_unlock_irqrestore+0x5f/0x80
 > [65155.054852] RSP: 0018:ffff8801649f3be8  EFLAGS: 00000292
 > [65155.054872] RAX: ffff88023f664680 RBX: 0000000000000007 RCX: 0000000000000007
 > [65155.054895] RDX: 00000000000029e0 RSI: ffff88023f664ea0 RDI: ffff88023f664680
 > [65155.054919] RBP: ffff8801649f3bf8 R08: 0000000000000000 R09: 0000000000000000
 > [65155.055956] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 > [65155.056985] R13: ffff8801649f3b58 R14: ffffffff9d3e7d0e R15: 00000000000003e0
 > [65155.058037] FS:  00007f0dc957c700(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
 > [65155.059083] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 > [65155.060121] CR2: 00007f0dc958e000 CR3: 000000022f31e000 CR4: 00000000001407e0
 > [65155.061152] DR0: 00007f54162bc000 DR1: 00007feb92c3d000 DR2: 0000000000000000
 > [65155.062180] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
 > [65155.063202] Stack:
 > 
 > And that's all she wrote.
 > 
 >  > If Will's patch doesn't make a difference, what about reverting that
 >  >  > ce9ec37bddb6? Although it really *is* an "obvious bugfix", and I really
 >  > don't see why any of this would be noticeable on x86 (it triggered
 >  > issues on ARM64, but that was because ARM64 cared much more about the
 >  > exact range).
 > 
 > I'll try that next, and check in on it tomorrow.

No luck. Died even faster this time.

[  772.459481] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [modprobe:31400]
[  772.459858] irq event stamp: 3362
[  772.459872] hardirqs last  enabled at (3361): [<ffffffff941a437c>] context_tracking_user_enter+0x9c/0x2c0
[  772.459907] hardirqs last disabled at (3362): [<ffffffff94875bea>] apic_timer_interrupt+0x6a/0x80
[  772.459937] softirqs last  enabled at (0): [<ffffffff940764d5>] copy_process.part.26+0x635/0x1d80
[  772.459968] softirqs last disabled at (0): [<          (null)>]           (null)
[  772.459996] CPU: 3 PID: 31400 Comm: modprobe Not tainted 3.18.0-rc4+ #85 [loadavg: 207.70 163.33 92.64 11/433 31547]
[  772.460086] task: ffff88022f0b2f00 ti: ffff88019a944000 task.ti: ffff88019a944000
[  772.460110] RIP: 0010:[<ffffffff941a437e>]  [<ffffffff941a437e>] context_tracking_user_enter+0x9e/0x2c0
[  772.460142] RSP: 0018:ffff88019a947f00  EFLAGS: 00000282
[  772.460161] RAX: ffff88022f0b2f00 RBX: 0000000000000000 RCX: 0000000000000000
[  772.460184] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88022f0b2f00
[  772.460207] RBP: ffff88019a947f10 R08: 0000000000000000 R09: 0000000000000000
[  772.460229] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88019a947e90
[  772.460252] R13: ffffffff940f6d04 R14: ffff88019a947ec0 R15: ffff8802447cd640
[  772.460294] FS:  00007f3b71ee4700(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
[  772.460362] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  772.460391] CR2: 00007fffdad5af58 CR3: 000000011608e000 CR4: 00000000001407e0
[  772.460424] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  772.460447] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  772.460470] Stack:
[  772.460480]  ffff88019a947f58 00000000006233a8 ffff88019a947f40 ffffffff9401429d
[  772.460512]  00000000006233a8 000000000041d68a 00000000006233a8 0000000000000000
[  772.460543]  00000000006233a0 ffffffff94874fa4 000000001008feff 000507d93d73a434
[  772.460574] Call Trace:
[  772.461576]  [<ffffffff9401429d>] syscall_trace_leave+0xad/0x2e0
[  772.462572]  [<ffffffff94874fa4>] int_check_syscall_exit_work+0x34/0x3d
[  772.463575] Code: f8 1c 00 84 c0 75 46 48 c7 c7 51 53 cd 94 e8 aa 23 24 00 65 c7 04 25 f4 f8 1c 00 01 00 00 00 f6 c7 02 74 19 e8 84 43 f3 ff 53 9d <5b> 41 5c 5d c3 0f 1f 44 00 00 c3 0f 1f 80 00 00 00 00 53 9d e8 
[  772.465797] Kernel panic - not syncing: softlockup: hung tasks
[  772.466821] CPU: 3 PID: 31400 Comm: modprobe Tainted: G             L 3.18.0-rc4+ #85 [loadavg: 207.70 163.33 92.64 11/433 31547]
[  772.468915]  ffff88022f0b2f00 00000000de65d5f5 ffff880244603dc8 ffffffff94869e01
[  772.470031]  0000000000000000 ffffffff94c7599b ffff880244603e48 ffffffff94866b21
[  772.471085]  ffff880200000008 ffff880244603e58 ffff880244603df8 00000000de65d5f5
[  772.472141] Call Trace:
[  772.473183]  <IRQ>  [<ffffffff94869e01>] dump_stack+0x4f/0x7c
[  772.474253]  [<ffffffff94866b21>] panic+0xcf/0x202
[  772.475346]  [<ffffffff94154d1e>] watchdog_timer_fn+0x27e/0x290
[  772.476414]  [<ffffffff94106297>] __run_hrtimer+0xe7/0x740
[  772.477475]  [<ffffffff94106b64>] ? hrtimer_interrupt+0x94/0x270
[  772.478555]  [<ffffffff94154aa0>] ? watchdog+0x40/0x40
[  772.479627]  [<ffffffff94106be7>] hrtimer_interrupt+0x117/0x270
[  772.480703]  [<ffffffff940303db>] local_apic_timer_interrupt+0x3b/0x70
[  772.481777]  [<ffffffff948777f3>] smp_apic_timer_interrupt+0x43/0x60
[  772.482856]  [<ffffffff94875bef>] apic_timer_interrupt+0x6f/0x80
[  772.483915]  <EOI>  [<ffffffff941a437e>] ? context_tracking_user_enter+0x9e/0x2c0
[  772.484972]  [<ffffffff9401429d>] syscall_trace_leave+0xad/0x2e0
[  772.486042]  [<ffffffff94874fa4>] int_check_syscall_exit_work+0x34/0x3d
[  772.487187] Kernel Offset: 0x13000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)


	Dave



* Re: frequent lockups in 3.18rc4
  2014-11-16  1:40     ` Dave Jones
@ 2014-11-16  6:33       ` Linus Torvalds
  2014-11-16 10:06         ` Markus Trippelsdorf
                           ` (2 more replies)
  2014-11-20 15:28       ` Frederic Weisbecker
  1 sibling, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-16  6:33 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Sat, Nov 15, 2014 at 5:40 PM, Dave Jones <davej@redhat.com> wrote:
>  >
>  > I'll try that next, and check in on it tomorrow.
>
> No luck. Died even faster this time.

Yeah, and your other lockups haven't even been TLB related. Not that
they look like anything else *either*.

I have no ideas left. I'd go for a bisection - rather than try random
things, at least bisection will get us a smaller set of suspects if
you can go through a few cycles of it. Even if you decide that you
want to run for most of a day before you are convinced it's all good,
a couple of days should get you a handful of bisection points (that's
assuming you hit a couple of bad ones too that turn bad in a shorter
while). And 4 or five bisections should get us from 11k commits down
to the ~600 commit range. That would be a huge improvement.
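
(Rough arithmetic: every clean good/bad verdict halves the window, so
11000 -> 5500 -> 2750 -> 1375 -> ~690 commits after four steps, and
roughly 340 after five.)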

                   Linus


* Re: frequent lockups in 3.18rc4
  2014-11-16  6:33       ` Linus Torvalds
@ 2014-11-16 10:06         ` Markus Trippelsdorf
  2014-11-16 18:33           ` Linus Torvalds
  2014-11-17 17:03         ` Dave Jones
  2014-11-26  0:25         ` Dave Jones
  2 siblings, 1 reply; 486+ messages in thread
From: Markus Trippelsdorf @ 2014-11-16 10:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, the arch/x86 maintainers

On 2014.11.15 at 22:33 -0800, Linus Torvalds wrote:
> On Sat, Nov 15, 2014 at 5:40 PM, Dave Jones <davej@redhat.com> wrote:
> >  >
> >  > I'll try that next, and check in on it tomorrow.
> >
> > No luck. Died even faster this time.
> 
> Yeah, and your other lockups haven't even been TLB related. Not that
> they look like anything else *either*.
> 
> I have no ideas left. I'd go for a bisection

Before starting a bisection you could try disabling transparent_hugepages.
There are strange bugs in this area that were introduced during this
merge window. See: https://lkml.org/lkml/2014/11/4/144
https://lkml.org/lkml/2014/11/4/904
http://thread.gmane.org/gmane.linux.kernel.mm/124451

-- 
Markus


* Re: frequent lockups in 3.18rc4
  2014-11-15  2:40         ` Dave Jones
@ 2014-11-16 12:16           ` Thomas Gleixner
  0 siblings, 0 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-16 12:16 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Fri, 14 Nov 2014, Dave Jones wrote:
> On Sat, Nov 15, 2014 at 01:36:41AM +0100, Thomas Gleixner wrote:
>  > On Fri, 14 Nov 2014, Dave Jones wrote:
>  > 
>  > > On Fri, Nov 14, 2014 at 11:55:30PM +0100, Thomas Gleixner wrote:
>  > >  
>  > >  > So this looks more like a smp function call fuckup.
>  > >  > 
>  > >  > I assume Dave is running that stuff on KVM. So it might be worth while
>  > >  > to look at the IPI magic there.
>  > > 
>  > > no, bare metal.
>  > 
>  > Ok, but that does not change the fact that we are stuck in
>  > smp_function_call land.
>  > 
>  > Enabling softlockup_all_cpu_backtrace will probably not help much as
>  > we will end up waiting for csd_lock again :(
>  > 
>  > Is the machine still accessible when this happens? If yes, we might
>  > enable a few trace points and functions and read out the trace
>  > buffer. If not, we could just panic the machine and dump the trace
>  > buffer over serial.
> 
> No, it wedges solid. Even though it says something like "CPU3 locked up",
> apparently all cores also get stuck.

Does not surprise me. Once the smp function call machinery is wedged...

> 9 times out of 10 it doesn't stay alive long enough to even get the full
> trace out over usb-serial.

usb-serial is definitely not the best tool for stuff like this. I
wonder whether netconsole might give us some more info.

Last time I looked into something like that on my laptop I had to
resort to a crash kernel to get anything useful out of the box.

Thanks,

	tglx


* Re: frequent lockups in 3.18rc4
  2014-11-16 10:06         ` Markus Trippelsdorf
@ 2014-11-16 18:33           ` Linus Torvalds
  0 siblings, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-16 18:33 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: Dave Jones, Linux Kernel, the arch/x86 maintainers

On Sun, Nov 16, 2014 at 2:06 AM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
>
> Before starting a bisection you could try disabling transparent_hugepages.
> There are strange bugs in this area that were introduced during this
> merge window. See: https://lkml.org/lkml/2014/11/4/144
> https://lkml.org/lkml/2014/11/4/904
> http://thread.gmane.org/gmane.linux.kernel.mm/124451

Those look different, and hopefully should be fixed by commit
1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in
compact_zone"). Which admittedly isn't in -rc4 (it went in on
Thursday), but I think Dave tends to run git-of-the-day rather than
last rc, so he probably already had it.

I *think* that if it was the infinite compaction problem, you'd have
the soft-lockup reports showing that. Dave's are in random places.
Which is odd.

               Linus


* Re: frequent lockups in 3.18rc4
  2014-11-14 21:31 frequent lockups in 3.18rc4 Dave Jones
  2014-11-14 22:01 ` Linus Torvalds
@ 2014-11-17 15:07 ` Don Zickus
  1 sibling, 0 replies; 486+ messages in thread
From: Don Zickus @ 2014-11-17 15:07 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, Linus Torvalds

On Fri, Nov 14, 2014 at 04:31:24PM -0500, Dave Jones wrote:
> I'm not sure how long this goes back (3.17 was fine afair) but I'm
> seeing these several times a day lately..
> 
> 
> NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c129:25570]
> irq event stamp: 74224
> hardirqs last  enabled at (74223): [<ffffffff9c875664>] restore_args+0x0/0x30
> hardirqs last disabled at (74224): [<ffffffff9c8759aa>] apic_timer_interrupt+0x6a/0x80
> softirqs last  enabled at (74222): [<ffffffff9c07f43a>] __do_softirq+0x26a/0x6f0
> softirqs last disabled at (74209): [<ffffffff9c07fb4d>] irq_exit+0x13d/0x170
> CPU: 3 PID: 25570 Comm: trinity-c129 Not tainted 3.18.0-rc4+ #83 [loadavg: 198.04 186.66 181.58 24/442 26708]
> task: ffff880213442f00 ti: ffff8801ea714000 task.ti: ffff8801ea714000
> RIP: 0010:[<ffffffff9c11e98a>]  [<ffffffff9c11e98a>] generic_exec_single+0xea/0x1d0
> RSP: 0018:ffff8801ea717a08  EFLAGS: 00000202
> RAX: ffff880213442f00 RBX: ffffffff9c875664 RCX: 0000000000000006
> RDX: 0000000000001370 RSI: ffff880213443790 RDI: ffff880213442f00
> RBP: ffff8801ea717a68 R08: ffff880242b56690 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801ea717978
> R13: ffff880213442f00 R14: ffff8801ea714000 R15: ffff880213442f00
> FS:  00007f240994e700(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000004 CR3: 000000019a017000 CR4: 00000000001407e0
> DR0: 00007fb3367e0000 DR1: 00007f82542ab000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
>  ffffffff9ce4c620 0000000000000000 ffffffff9c048b20 ffff8801ea717b18
>  0000000000000003 0000000052e0da3d ffffffff9cc7ef3c 0000000000000002
>  ffffffff9c048b20 ffff8801ea717b18 0000000000000001 0000000000000003
> Call Trace:
>  [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
>  [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
>  [<ffffffff9c11ead6>] smp_call_function_single+0x66/0x110
>  [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
>  [<ffffffff9c11f021>] smp_call_function_many+0x2f1/0x390


Hi Dave,

When I see stuff like this, it is usually because another cpu is blocking
the IPI from smp_call_function_many from finishing, so this cpu waits
forever.

The problem usually becomes obvious with a dump of all cpus at the time
the lockup is detected.

Can you try adding 'softlockup_all_cpu_backtrace=1' to the kernel
commandline?  That should dump all the cpus to see if anything stands out.

Though I don't normally see it traverse down to smp_call_function_single.

Anyway something to try.
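
(If rebooting to change the command line is a pain, the same knob should
also be available at runtime as the kernel.softlockup_all_cpu_backtrace
sysctl; it went in around 3.16, if memory serves.)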

Cheers,
Don

>  [<ffffffff9c049300>] flush_tlb_mm_range+0xe0/0x370
>  [<ffffffff9c1d95a2>] tlb_flush_mmu_tlbonly+0x42/0x50
>  [<ffffffff9c1d9cb5>] tlb_finish_mmu+0x45/0x50
>  [<ffffffff9c1daf59>] zap_page_range_single+0x119/0x170
>  [<ffffffff9c1db140>] unmap_mapping_range+0x140/0x1b0
>  [<ffffffff9c1c7edd>] shmem_fallocate+0x43d/0x540
>  [<ffffffff9c0b111b>] ? preempt_count_sub+0xab/0x100
>  [<ffffffff9c0cdac7>] ? prepare_to_wait+0x27/0x80
>  [<ffffffff9c2287f3>] ? __sb_start_write+0x103/0x1d0
>  [<ffffffff9c223aba>] do_fallocate+0x12a/0x1c0
>  [<ffffffff9c1f0bd3>] SyS_madvise+0x3d3/0x890
>  [<ffffffff9c1a40d2>] ? context_tracking_user_exit+0x52/0x260
>  [<ffffffff9c013ebd>] ? syscall_trace_enter_phase2+0x10d/0x3d0
>  [<ffffffff9c874c89>] tracesys_phase2+0xd4/0xd9
> Code: 63 c7 48 89 de 48 89 df 48 c7 c2 c0 50 1d 00 48 03 14 c5 40 b9 f2 9c e8 d5 ea 2b 00 84 c0 74 0b e9 bc 00 00 00 0f 1f 40 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48 8b 4d c8 65 48 33 0c 25 28 00 00 00 
> Kernel panic - not syncing: softlockup: hung tasks
> 
> 
> I've got a local hack to dump loadavg on traces, and as you can see in that
> example, the machine was really busy, but we were at least making progress
> before the trace spewed, and the machine rebooted. (I have reboot-on-lockup sysctl
> set, without it, the machine just wedges indefinitely shortly after the spew).
> 
> The trace doesn't really enlighten me as to what we should be doing
> to prevent this though.
> 
> ideas?
> I can try to bisect it, but it takes hours before it happens,
> so it might take days to complete, and the next few weeks are
> complicated timewise..
> 
> 	Dave
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


* Re: frequent lockups in 3.18rc4
  2014-11-16  6:33       ` Linus Torvalds
  2014-11-16 10:06         ` Markus Trippelsdorf
@ 2014-11-17 17:03         ` Dave Jones
  2014-11-17 19:59           ` Linus Torvalds
  2014-11-20 15:08           ` Frederic Weisbecker
  2014-11-26  0:25         ` Dave Jones
  2 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-17 17:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, the arch/x86 maintainers

On Sat, Nov 15, 2014 at 10:33:19PM -0800, Linus Torvalds wrote:
 
 > >  > I'll try that next, and check in on it tomorrow.
 > >
 > > No luck. Died even faster this time.
 > 
 > Yeah, and your other lockups haven't even been TLB related. Not that
 > they look like anything else *either*.
 > 
 > I have no ideas left. I'd go for a bisection - rather than try random
 > things, at least bisection will get us a smaller set of suspects if
 > you can go through a few cycles of it. Even if you decide that you
 > want to run for most of a day before you are convinced it's all good,
 > a couple of days should get you a handful of bisection points (that's
 > assuming you hit a couple of bad ones too that turn bad in a shorter
 > while). And 4 or five bisections should get us from 11k commits down
 > to the ~600 commit range. That would be a huge improvement.

Great start to the week: I decided to confirm my recollection that .17
was ok, only to hit this within 10 minutes.

Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
CPU: 3 PID: 17176 Comm: trinity-c95 Not tainted 3.17.0+ #87
 0000000000000000 00000000f3a61725 ffff880244606bf0 ffffffff9583e9fa
 ffffffff95c67918 ffff880244606c78 ffffffff9583bcc0 0000000000000010
 ffff880244606c88 ffff880244606c20 00000000f3a61725 0000000000000000
Call Trace:
 <NMI>  [<ffffffff9583e9fa>] dump_stack+0x4e/0x7a
 [<ffffffff9583bcc0>] panic+0xd4/0x207
 [<ffffffff95150908>] watchdog_overflow_callback+0x118/0x120
 [<ffffffff95193dbe>] __perf_event_overflow+0xae/0x340
 [<ffffffff95192230>] ? perf_event_task_disable+0xa0/0xa0
 [<ffffffff9501a7bf>] ? x86_perf_event_set_period+0xbf/0x150
 [<ffffffff95194be4>] perf_event_overflow+0x14/0x20
 [<ffffffff95020676>] intel_pmu_handle_irq+0x206/0x410
 [<ffffffff9501966b>] perf_event_nmi_handler+0x2b/0x50
 [<ffffffff95007bb2>] nmi_handle+0xd2/0x390
 [<ffffffff95007ae5>] ? nmi_handle+0x5/0x390
 [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
 [<ffffffff950080a2>] default_do_nmi+0x72/0x1c0
 [<ffffffff950082a8>] do_nmi+0xb8/0x100
 [<ffffffff9584b9aa>] end_repeat_nmi+0x1e/0x2e
 [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
 [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
 [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
 <<EOE>>  <IRQ>  [<ffffffff95101685>] lock_hrtimer_base.isra.18+0x25/0x50
 [<ffffffff951019d3>] hrtimer_try_to_cancel+0x33/0x1f0
 [<ffffffff95101baa>] hrtimer_cancel+0x1a/0x30
 [<ffffffff95113557>] tick_nohz_restart+0x17/0x90
 [<ffffffff95114533>] __tick_nohz_full_check+0xc3/0x100
 [<ffffffff9511457e>] nohz_full_kick_work_func+0xe/0x10
 [<ffffffff95188894>] irq_work_run_list+0x44/0x70
 [<ffffffff951888ea>] irq_work_run+0x2a/0x50
 [<ffffffff9510109b>] update_process_times+0x5b/0x70
 [<ffffffff95113325>] tick_sched_handle.isra.20+0x25/0x60
 [<ffffffff95113801>] tick_sched_timer+0x41/0x60
 [<ffffffff95102281>] __run_hrtimer+0x81/0x480
 [<ffffffff951137c0>] ? tick_sched_do_timer+0xb0/0xb0
 [<ffffffff95102977>] hrtimer_interrupt+0x117/0x270
 [<ffffffff950346d7>] local_apic_timer_interrupt+0x37/0x60
 [<ffffffff9584c44f>] smp_apic_timer_interrupt+0x3f/0x50
 [<ffffffff9584a86f>] apic_timer_interrupt+0x6f/0x80
 <EOI>  [<ffffffff950d3f3a>] ? lock_release_holdtime.part.28+0x9a/0x160
 [<ffffffff950ef3b7>] ? rcu_is_watching+0x27/0x60
 [<ffffffff9508cb75>] kill_pid_info+0xf5/0x130
 [<ffffffff9508ca85>] ? kill_pid_info+0x5/0x130
 [<ffffffff9508ccd3>] SYSC_kill+0x103/0x330
 [<ffffffff9508cc7c>] ? SYSC_kill+0xac/0x330
 [<ffffffff9519b592>] ? context_tracking_user_exit+0x52/0x1a0
 [<ffffffff950d6f1d>] ? trace_hardirqs_on_caller+0x16d/0x210
 [<ffffffff950d6fcd>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff950137ad>] ? syscall_trace_enter+0x14d/0x330
 [<ffffffff9508f44e>] SyS_kill+0xe/0x10
 [<ffffffff95849b24>] tracesys+0xdd/0xe2
Kernel Offset: 0x14000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

It could be a completely different cause for lockup, but seeing this now
has me wondering if perhaps it's something unrelated to the kernel.
I have recollection of running late .17rc's for days without incident,
and I'm pretty sure .17 was ok too.  But a few weeks ago I did upgrade
that test box to the Fedora 21 beta.  Which means I have a new gcc.
I'm not sure I really trust 4.9.1 yet, so maybe I'll see if I can
get 4.8 back on there and see if that's any better.

	Dave



* Re: frequent lockups in 3.18rc4
  2014-11-17 17:03         ` Dave Jones
@ 2014-11-17 19:59           ` Linus Torvalds
  2014-11-18  2:09             ` Dave Jones
  2014-11-20 15:08           ` Frederic Weisbecker
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-17 19:59 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Mon, Nov 17, 2014 at 9:03 AM, Dave Jones <davej@redhat.com> wrote:
>
> It could be a completely different cause for lockup, but seeing this now
> has me wondering if perhaps it's something unrelated to the kernel.
> I have recollection of running late .17rc's for days without incident,
> and I'm pretty sure .17 was ok too.  But a few weeks ago I did upgrade
> that test box to the Fedora 21 beta.  Which means I have a new gcc.
> I'm not sure I really trust 4.9.1 yet, so maybe I'll see if I can
> get 4.8 back on there and see if that's any better.

I'm not sure if I should be relieved or horrified.

Horrified, I think.

It really would be a wonderful thing to have some kind of "compiler
bisection" with mixed object files to see exactly which file it
miscompiles (and by "miscompiles" it might just be a kernel bug where
we are missing a barrier or something, and older gcc's just happened
to not show it - so it could still easily be a kernel problem).

                  Linus


* Re: frequent lockups in 3.18rc4
  2014-11-15  1:59     ` Linus Torvalds
@ 2014-11-17 21:22       ` Linus Torvalds
  2014-11-17 22:31         ` Thomas Gleixner
  2014-11-17 23:04         ` Jens Axboe
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-17 21:22 UTC (permalink / raw)
  To: Thomas Gleixner, Jens Axboe, Ingo Molnar
  Cc: Dave Jones, Linux Kernel, the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 1762 bytes --]

On Fri, Nov 14, 2014 at 5:59 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Judging by the code disassembly, it's the "csd_lock_wait(csd)" at the
> end.

Btw, looking at this, I grew really suspicious of this code in csd_unlock():

        WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));

because that makes no sense at all. It basically removes a sanity
check, yet that sanity check makes a hell of a lot of sense. Unlocking
a CSD that is not locked is *wrong*.

The crazy code comes from commit c84a83e2aaab ("smp: don't warn
about csd->flags having CSD_FLAG_LOCK cleared for !wait") by Jens, but
the explanation and the code are pure crap.

There is no way in hell that it is ever correct to unlock an entry
that isn't locked, so that whole CSD_FLAG_WAIT thing is buggy as hell.

The explanation in commit c84a83e2aaab says that  "blk-mq reuses the
request potentially immediately" and claims that that is somehow ok,
but that's utter BS. Even if you don't ever wait for it, the CSD lock
bit fundamentally also protects the "csd->llist" pointer. So what that
commit actually does is to just remove a safety check, and do so in a
very unsafe manner. And apparently block-mq re-uses something THAT IS
STILL ACTIVELY IN USE. That's just horrible.

Now, I think we might do this differently, by doing the "csd_unlock()"
after we have loaded everything from the csd, but *before* actually
calling the callback function. That would seem to be equivalent
(interrupts are disabled, so this will not result in the func()
possibly called twice), more efficient, _and_  not remove a useful
check.

Hmm? Completely untested patch attached. Jens, does this still work for you?

Am I missing something?

                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 1215 bytes --]

 kernel/smp.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index f38a1e692259..fbeb9827bdae 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -19,7 +19,6 @@
 
 enum {
 	CSD_FLAG_LOCK		= 0x01,
-	CSD_FLAG_WAIT		= 0x02,
 };
 
 struct call_function_data {
@@ -126,7 +125,7 @@ static void csd_lock(struct call_single_data *csd)
 
 static void csd_unlock(struct call_single_data *csd)
 {
-	WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));
+	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
 
 	/*
 	 * ensure we're all done before releasing data:
@@ -173,9 +172,6 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
 	csd->func = func;
 	csd->info = info;
 
-	if (wait)
-		csd->flags |= CSD_FLAG_WAIT;
-
 	/*
 	 * The list addition should be visible before sending the IPI
 	 * handler locks the list to pull the entry off it because of
@@ -250,8 +246,11 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 	}
 
 	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
-		csd->func(csd->info);
+		smp_call_func_t func = csd->func;
+		void *info = csd->info;
 		csd_unlock(csd);
+
+		func(info);
 	}
 
 	/*


* Re: frequent lockups in 3.18rc4
  2014-11-17 21:22       ` Linus Torvalds
@ 2014-11-17 22:31         ` Thomas Gleixner
  2014-11-17 22:43           ` Thomas Gleixner
  2014-11-17 23:04         ` Jens Axboe
  1 sibling, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-17 22:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Ingo Molnar, Dave Jones, Linux Kernel,
	the arch/x86 maintainers

On Mon, 17 Nov 2014, Linus Torvalds wrote:
> On Fri, Nov 14, 2014 at 5:59 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Judging by the code disassembly, it's the "csd_lock_wait(csd)" at the
> > end.
> 
> Btw, looking at this, I grew really suspicious of this code in csd_unlock():
> 
>         WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));
> 
> because that makes no sense at all. It basically removes a sanity
> check, yet that sanity check makes a hell of a lot of sense. Unlocking
> a CSD that is not locked is *wrong*.
> 
> The crazy code comes from commit c84a83e2aaab ("smp: don't warn
> about csd->flags having CSD_FLAG_LOCK cleared for !wait") by Jens, but
> the explanation and the code are pure crap.
> 
> There is no way in hell that it is ever correct to unlock an entry
> that isn't locked, so that whole CSD_FLAG_WAIT thing is buggy as hell.
> 
> The explanation in commit c84a83e2aaab says that  "blk-mq reuses the
> request potentially immediately" and claims that that is somehow ok,
> but that's utter BS. Even if you don't ever wait for it, the CSD lock
> bit fundamentally also protects the "csd->llist" pointer. So what that
> commit actually does is to just remove a safety check, and do so in a
> very unsafe manner. And apparently block-mq re-uses something THAT IS
> STILL ACTIVELY IN USE. That's just horrible.
>  
> Now, I think we might do this differently, by doing the "csd_unlock()"
> after we have loaded everything from the csd, but *before* actually
> calling the callback function. That would seem to be equivalent
> (interrupts are disabled, so this will not result in the func()
> possibly called twice), more efficient, _and_  not remove a useful
> check.
> 
> Hmm? Completely untested patch attached. Jens, does this still work for you?
> 
> Am I missing something?

Yes. :)

> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -19,7 +19,6 @@
>  
>  enum {
>  	CSD_FLAG_LOCK		= 0x01,
> -	CSD_FLAG_WAIT		= 0x02,
>  };
>  
>  struct call_function_data {
> @@ -126,7 +125,7 @@ static void csd_lock(struct call_single_data *csd)
>  
>  static void csd_unlock(struct call_single_data *csd)
>  {
> -	WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));
> +	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
>  
>  	/*
>  	 * ensure we're all done before releasing data:
> @@ -173,9 +172,6 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
>  	csd->func = func;
>  	csd->info = info;
>  
> -	if (wait)
> -		csd->flags |= CSD_FLAG_WAIT;
> -
>  	/*
>  	 * The list addition should be visible before sending the IPI
>  	 * handler locks the list to pull the entry off it because of
> @@ -250,8 +246,11 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
>  	}
>  
>  	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
> -		csd->func(csd->info);
> +		smp_call_func_t func = csd->func;
> +		void *info = csd->info;
>  		csd_unlock(csd);
> +
> +		func(info);

No, that won't work for synchronous calls:

    CPU 0      	    		CPU 1

    csd_lock(csd);
    queue_csd();
    ipi();
				func = csd->func;
				info = csd->info;
				csd_unlock(csd);
    csd_lock_wait();    
				func(info);
   
The csd_lock_wait() side will succeed and therefore assume that the
call has been completed while the function has not been called at
all. Interesting explosions to follow.

The proper solution is to revert that commit and properly analyze the
problem which Jens was trying to solve and work from there.

Thanks,

	tglx


* Re: frequent lockups in 3.18rc4
  2014-11-17 22:31         ` Thomas Gleixner
@ 2014-11-17 22:43           ` Thomas Gleixner
  2014-11-17 22:58             ` Jens Axboe
  2014-11-17 23:59             ` Linus Torvalds
  0 siblings, 2 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-17 22:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Ingo Molnar, Dave Jones, Linux Kernel,
	the arch/x86 maintainers

On Mon, 17 Nov 2014, Thomas Gleixner wrote:
> On Mon, 17 Nov 2014, Linus Torvalds wrote:
> >  	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
> > -		csd->func(csd->info);
> > +		smp_call_func_t func = csd->func;
> > +		void *info = csd->info;
> >  		csd_unlock(csd);
> > +
> > +		func(info);
> 
> No, that won't work for synchronous calls:
> 
>     CPU 0      	    		CPU 1
> 
>     csd_lock(csd);
>     queue_csd();
>     ipi();
> 				func = csd->func;
> 				info = csd->info;
> 				csd_unlock(csd);
>     csd_lock_wait();    
> 				func(info);
>    
> The csd_lock_wait() side will succeed and therefor assume that the
> call has been completed while the function has not been called at
> all. Interesting explosions to follow.
> 
> The proper solution is to revert that commit and properly analyze the
> problem which Jens was trying to solve and work from there.

So a combo of both (Jens and yours) might do the trick. Patch below.

I think what Jens was trying to solve is:

     CPU 0      	    		CPU 1
 
     csd_lock(csd);
     queue_csd();
     ipi();
 				csd->func(csd->info);
     wait_for_completion(csd);
				   complete(csd);
     reuse_csd(csd);		
				csd_unlock(csd);

Thanks,

	tglx	

Index: linux/kernel/smp.c
===================================================================
--- linux.orig/kernel/smp.c
+++ linux/kernel/smp.c
@@ -126,7 +126,7 @@ static void csd_lock(struct call_single_
 
 static void csd_unlock(struct call_single_data *csd)
 {
-	WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));
+	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
 
 	/*
 	 * ensure we're all done before releasing data:
@@ -250,8 +250,23 @@ static void flush_smp_call_function_queu
 	}
 
 	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
-		csd->func(csd->info);
-		csd_unlock(csd);
+
+		/*
+		 * For synchronous calls we are not allowed to unlock
+		 * before the callback has returned. For the async case
+		 * it's the responsibility of the caller to keep
+		 * csd->info consistent while the callback runs.
+		 */
+		if (csd->flags & CSD_FLAG_WAIT) {
+			csd->func(csd->info);
+			csd_unlock(csd);
+		} else {
+			smp_call_func_t func = csd->func;
+			void *info = csd->info;
+
+			csd_unlock(csd);
+			func(info);
+		}
 	}
 
 	/*

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-17 22:43           ` Thomas Gleixner
@ 2014-11-17 22:58             ` Jens Axboe
  2014-11-17 23:59             ` Linus Torvalds
  1 sibling, 0 replies; 486+ messages in thread
From: Jens Axboe @ 2014-11-17 22:58 UTC (permalink / raw)
  To: Thomas Gleixner, Linus Torvalds
  Cc: Ingo Molnar, Dave Jones, Linux Kernel, the arch/x86 maintainers

On 11/17/2014 03:43 PM, Thomas Gleixner wrote:
> On Mon, 17 Nov 2014, Thomas Gleixner wrote:
>> On Mon, 17 Nov 2014, Linus Torvalds wrote:
>>>  	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
>>> -		csd->func(csd->info);
>>> +		smp_call_func_t func = csd->func;
>>> +		void *info = csd->info;
>>>  		csd_unlock(csd);
>>> +
>>> +		func(info);
>>
>> No, that won't work for synchronous calls:
>>
>>     CPU 0      	    		CPU 1
>>
>>     csd_lock(csd);
>>     queue_csd();
>>     ipi();
>> 				func = csd->func;
>> 				info = csd->info;
>> 				csd_unlock(csd);
>>     csd_lock_wait();    
>> 				func(info);
>>    
>> The csd_lock_wait() side will succeed and therefore assume that the
>> call has been completed while the function has not been called at
>> all. Interesting explosions to follow.
>>
>> The proper solution is to revert that commit and properly analyze the
>> problem which Jens was trying to solve and work from there.
> 
> So a combo of both (Jens and yours) might do the trick. Patch below.
> 
> I think what Jens was trying to solve is:
> 
>      CPU 0      	    		CPU 1
>  
>      csd_lock(csd);
>      queue_csd();
>      ipi();
>  				csd->func(csd->info);
>      wait_for_completion(csd);
> 				   complete(csd);
>      reuse_csd(csd);		
> 				csd_unlock(csd);

Maybe... The above looks ok to me from a functional point of view, but
now I can't convince myself that the blk-mq use case is correct.

I'll try backing out the original patch and reproducing the issue; that
should jog my memory and give me a full understanding of the issue I
faced back then.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-17 21:22       ` Linus Torvalds
  2014-11-17 22:31         ` Thomas Gleixner
@ 2014-11-17 23:04         ` Jens Axboe
  2014-11-17 23:17           ` Thomas Gleixner
  1 sibling, 1 reply; 486+ messages in thread
From: Jens Axboe @ 2014-11-17 23:04 UTC (permalink / raw)
  To: Linus Torvalds, Thomas Gleixner, Ingo Molnar
  Cc: Dave Jones, Linux Kernel, the arch/x86 maintainers

On 11/17/2014 02:22 PM, Linus Torvalds wrote:
> On Fri, Nov 14, 2014 at 5:59 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Judging by the code disassembly, it's the "csd_lock_wait(csd)" at the
>> end.
> 
> Btw, looking at this, I grew really suspicious of this code in csd_unlock():
> 
>         WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));
> 
> because that makes no sense at all. It basically removes a sanity
> check, yet that sanity check makes a hell of a lot of sense. Unlocking
> a CSD that is not locked is *wrong*.
> 
> The crazy code comes from commit c84a83e2aaab ("smp: don't warn
> about csd->flags having CSD_FLAG_LOCK cleared for !wait") by Jens, but
> the explanation and the code is pure crap.
> 
> There is no way in hell that it is ever correct to unlock an entry
> that isn't locked, so that whole CSD_FLAG_WAIT thing is buggy as hell.
> 
> The explanation in commit c84a83e2aaab says that  "blk-mq reuses the
> request potentially immediately" and claims that that is somehow ok,
> but that's utter BS. Even if you don't ever wait for it, the CSD lock
> bit fundamentally also protects the "csd->llist" pointer. So what that
> commit actually does is to just remove a safety check, and do so in a
> very unsafe manner. And apparently block-mq re-uses something THAT IS
> STILL ACTIVELY IN USE. That's just horrible.

I agree that this description is probably utter crap. And now I do
actually remember the issue at hand. The resource here is the tag, that
decides what request we'll use, and subsequently what call_single_data
storage is used. When this was originally done, blk-mq cleared the
request from the function callback, instead of doing it at allocation
time. The assumption here was cache hotness. That in turn also cleared
->csd, which meant that the flags got zeroed and csd_unlock() was
naturally unhappy. THAT was the reuse case, not that the request would
get reused before we had finished the IPI fn callback since that would
obviously create other badness. Now I'm not sure what made me create
that patch, which in retrospect is a bad hammer for this problem.

blk-mq doesn't do the init-at-finish time anymore, so it should not be
hit by the issue. But if we do bring that back, then it would still work
fine with Thomas' patch, since we unlock prior to running the callback.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-17 23:04         ` Jens Axboe
@ 2014-11-17 23:17           ` Thomas Gleixner
  2014-11-18  2:23             ` Jens Axboe
  0 siblings, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-17 23:17 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Ingo Molnar, Dave Jones, Linux Kernel,
	the arch/x86 maintainers

On Mon, 17 Nov 2014, Jens Axboe wrote:
> On 11/17/2014 02:22 PM, Linus Torvalds wrote:
> > On Fri, Nov 14, 2014 at 5:59 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >> Judging by the code disassembly, it's the "csd_lock_wait(csd)" at the
> >> end.
> > 
> > Btw, looking at this, I grew really suspicious of this code in csd_unlock():
> > 
> >         WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));
> > 
> > because that makes no sense at all. It basically removes a sanity
> > check, yet that sanity check makes a hell of a lot of sense. Unlocking
> > a CSD that is not locked is *wrong*.
> > 
> > The crazy code comes from commit c84a83e2aaab ("smp: don't warn
> > about csd->flags having CSD_FLAG_LOCK cleared for !wait") by Jens, but
> > the explanation and the code is pure crap.
> > 
> > There is no way in hell that it is ever correct to unlock an entry
> > that isn't locked, so that whole CSD_FLAG_WAIT thing is buggy as hell.
> > 
> > The explanation in commit c84a83e2aaab says that  "blk-mq reuses the
> > request potentially immediately" and claims that that is somehow ok,
> > but that's utter BS. Even if you don't ever wait for it, the CSD lock
> > bit fundamentally also protects the "csd->llist" pointer. So what that
> > commit actually does is to just remove a safety check, and do so in a
> > very unsafe manner. And apparently block-mq re-uses something THAT IS
> > STILL ACTIVELY IN USE. That's just horrible.
> 
> I agree that this description is probably utter crap. And now I do
> actually remember the issue at hand. The resource here is the tag, that
> decides what request we'll use, and subsequently what call_single_data
> storage is used. When this was originally done, blk-mq cleared the
> request from the function callback, instead of doing it at allocation
> time. The assumption here was cache hotness. That in turn also cleared
> ->csd, which meant that the flags got zeroed and csd_unlock() was
> naturally unhappy.

So that's exactly what I described in my other reply.

     csd_lock(csd);
     queue_csd();
     ipi();
				csd->func(csd->info);
     wait_for_completion(csd);
				  complete(csd);
     reuse_csd(csd);		
				csd_unlock(csd);

When you call complete() nothing can rely on csd anymore, except for
the smp core code ....

> THAT was the reuse case, not that the request would get reused
> before we had finished the IPI fn callback since that would
> obviously create other badness. Now I'm not sure what made me create
> that patch, which in retrospect is a bad hammer for this problem.

Performance blindness?
 
> blk-mq doesn't do the init-at-finish time anymore, so it should not be
> hit by the issue. But if we do bring that back, then it would still work
> fine with Thomas' patch, since we unlock prior to running the callback.

So if blk-mq is not relying on that, then we really should back out
that stuff for 3.18 and tag it for stable.

Treating sync and async function calls differently makes sense,
because any async caller which cannot deal with the unlock before call
scheme is broken by definition already today. But that's material for
next.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-17 22:43           ` Thomas Gleixner
  2014-11-17 22:58             ` Jens Axboe
@ 2014-11-17 23:59             ` Linus Torvalds
  2014-11-18  0:15               ` Thomas Gleixner
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-17 23:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jens Axboe, Ingo Molnar, Dave Jones, Linux Kernel,
	the arch/x86 maintainers

On Mon, Nov 17, 2014 at 2:43 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> No, that won't work for synchronous calls:

Right you are.

> So a combo of both (Jens and yours) might do the trick. Patch below.

Yeah, I guess that would work. The important part is that *if*
somebody really reuses the csd, we'd better have a release barrier
(which csd_unlock() does, although badly - but this probably isn't
that performance-critical) *before* we call the function, because
otherwise there's no real serialization for the reuse.

Of course, most of these things are presumably always per-cpu data
structures, so the whole worry about "csd" being accessed from
different CPU's probably doesn't even exist, and this all works fine
as-is anyway, even in the presence of odd memory ordering issues.

Judging from Jens' later email, it looks like we simply don't need
this code at all any more, though, and we could just revert the
commit.

NOTE! I don't think this actually has anything to do with the actual
problem that Dave saw. I just reacted to that WARN_ON() when I was
looking at the code, and it made me go "that looks extremely
suspicious".

Particularly on x86, with strong memory ordering, I don't think that
any random accesses to 'csd' after the call to 'csd->func()' could
actually matter. I just felt very nervous about the claim that
somebody can reuse the csd immediately, that smelled bad to me from a
*conceptual* standpoint, even if I suspect it works perfectly fine in
practice.

Anyway, I've found *another* race condition, which (again) doesn't
actually seem to be an issue on x86.

In particular, "csd_lock()" does things pretty well, in that it does a
smp_mb() after setting the lock bit, so certainly nothing afterwards
will leak out of that locked region.

But look at csd_lock_wait(). It just does

        while (csd->flags & CSD_FLAG_LOCK)
                cpu_relax();

and basically there are no memory barriers there. Now, on x86, this is a
non-issue, since all reads act as an acquire, but at least in *theory*
we have this completely unordered read going on. So any subsequent
memory operations (i.e. after the return from generic_exec_single())
could in theory see data from *before* the read.

So that whole kernel/smp.c locking looks rather dubious. The smp_mb()
in csd_lock() is overkill (a "smp_store_release()" should be
sufficient), and I think that the read of csd->flags in csd_unlock()
should be a smp_load_acquire().
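
Spelled out, a minimal untested sketch of that acquire/release pairing
against the 3.18-era helpers could look like this (the acquire goes on
the waiter's read of csd->flags, the release on the store that drops
the lock bit; smp_load_acquire()/smp_store_release() have existed
since 3.14, so nothing new is needed):

        static void csd_lock_wait(struct call_single_data *csd)
        {
                /*
                 * Acquire: nothing the caller does after the wait can
                 * be ordered before the final observation of the flag
                 * being clear.
                 */
                while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
                        cpu_relax();
        }

        static void csd_unlock(struct call_single_data *csd)
        {
                WARN_ON(!(csd->flags & CSD_FLAG_LOCK));

                /*
                 * Release: everything done on behalf of the caller is
                 * visible before the flag is cleared and the csd can
                 * be reused.
                 */
                smp_store_release(&csd->flags, 0);
        }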

Again, none of this has anything to do with Dave's problem. The memory
ordering issues really cannot be an issue on x86, I'm just saying that
there's code there that makes me a bit uncomfortable.

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-17 23:59             ` Linus Torvalds
@ 2014-11-18  0:15               ` Thomas Gleixner
  0 siblings, 0 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-18  0:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Ingo Molnar, Dave Jones, Linux Kernel,
	the arch/x86 maintainers

On Mon, 17 Nov 2014, Linus Torvalds wrote:
> On Mon, Nov 17, 2014 at 2:43 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > So a combo of both (Jens and yours) might do the trick. Patch below.
> 
> Yeah, I guess that would work. The important part is that *if*
> somebody really reuses the csd, we'd better have a release barrier
> (which csd_unlock() does, although badly - but this probably isn't
> that performance-critical) *before* we call the function, because
> otherwise there's no real serialization for the reuse.

Indeed.
 
> Of course, most of these things are presumably always per-cpu data
> structures, so the whole worry about "csd" being accessed from
> different CPU's probably doesn't even exist, and this all works fine
> as-is anyway, even in the presence of odd memory ordering issues.
> 
> Judging from Jens' later email, it looks like we simply don't need
> this code at all any more, though, and we could just revert the
> commit.

Right. Reverting it is the proper solution for now. Though we should
really think about the async separation later. It makes a lot of
sense.

> NOTE! I don't think this actually has anything to do with the actual
> problem that Dave saw. I just reacted to that WARN_ON() when I was
> looking at the code, and it made me go "that looks extremely
> suspicious".

One thing I was looking into today is the increased use of irq_work,
which uses IPIs as well. Not sure whether that's related, but it's
not off my radar yet.

But the possible compiler wreckage (or exposed kernel wreckage) is
frightening in several aspects ...

> Particularly on x86, with strong memory ordering, I don't think that
> any random accesses to 'csd' after the call to 'csd->func()' could
> actually matter. I just felt very nervous about the claim that
> somebody can reuse the csd immediately, that smelled bad to me from a
> *conceptual* standpoint, even if I suspect it works perfectly fine in
> practice.
> 
> Anyway, I've found *another* race condition, which (again) doesn't
> actually seem to be an issue on x86.
> 
> In particular, "csd_lock()" does things pretty well, in that it does a
> smp_mb() after setting the lock bit, so certainly nothing afterwards
> will leak out of that locked region.
> 
> But look at csd_lock_wait(). It just does
> 
>         while (csd->flags & CSD_FLAG_LOCK)
>                 cpu_relax();
> 
> and basically there are no memory barriers there. Now, on x86, this is a
> non-issue, since all reads act as an acquire, but at least in *theory*
> we have this completely unordered read going on. So any subsequent
> memory operations (i.e. after the return from generic_exec_single())
> could in theory see data from *before* the read.

True.
 
> So that whole kernel/smp.c locking looks rather dubious. The smp_mb()
> in csd_lock() is overkill (a "smp_store_release()" should be
> sufficient), and I think that the read of csd->flags in csd_unlock()
> should be a smp_load_acquire().
> 
> Again, none of this has anything to do with Dave's problem. The memory
> ordering issues really cannot be an issue on x86, I'm just saying that
> there's code there that makes me a bit uncomfortable.

Right you are and we should fix it asap.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-17 19:59           ` Linus Torvalds
@ 2014-11-18  2:09             ` Dave Jones
  2014-11-18  2:21               ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-18  2:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, the arch/x86 maintainers

On Mon, Nov 17, 2014 at 11:59:34AM -0800, Linus Torvalds wrote:
 > On Mon, Nov 17, 2014 at 9:03 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > It could a completely different cause for lockup, but seeing this now
 > > has me wondering if perhaps it's something unrelated to the kernel.
 > > I have recollection of running late .17rc's for days without incident,
 > > and I'm pretty sure .17 was ok too.  But a few weeks ago I did upgrade
 > > that test box to the Fedora 21 beta.  Which means I have a new gcc.
 > > I'm not sure I really trust 4.9.1 yet, so maybe I'll see if I can
 > > get 4.8 back on there and see if that's any better.
 > 
 > I'm not sure if I should be relieved or horrified.
 > 
 > Horrified, I think.
 > 
 > It really would be a wonderful thing to have some kind of "compiler
 > bisection" with mixed object files to see exactly which file it
 > miscompiles (and by "miscompiles" it might just be a kernel bug where
 > we are missing a barrier or something, and older gcc's just happened
 > to not show it - so it could still easily be a kernel problem).

After wasting countless hours rolling back to Fedora 20 and gcc 4.8.1,
I saw the exact same trace on 3.17, so now I don't know what to think.

So it's great that it's not a regression vs .17, but otoh, who knows
how far back this goes. This looks like a nightmarish bisect case, and
I've no idea why it's now happening so often.

I'll give Don's softlockup_all_cpu_backtrace=1 idea a try on 3.18rc5
and see if that shines any more light on this.

Deeply puzzling.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-18  2:09             ` Dave Jones
@ 2014-11-18  2:21               ` Linus Torvalds
  2014-11-18  2:39                 ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-18  2:21 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Mon, Nov 17, 2014 at 6:09 PM, Dave Jones <davej@redhat.com> wrote:
>
> After wasting countless hours rolling back to Fedora 20 and gcc 4.8.1,
> I saw the exact same trace on 3.17, so now I don't know what to think.

Uhhuh.

Has anything else changed? New trinity tests? If it has happened in as
little as ten minutes, and you don't recall having seen this until
about a week ago, it does sound like something changed.

But yeah, try the softlockup_all_cpu_backtrace, maybe there's a
pattern somewhere..

              Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-17 23:17           ` Thomas Gleixner
@ 2014-11-18  2:23             ` Jens Axboe
  0 siblings, 0 replies; 486+ messages in thread
From: Jens Axboe @ 2014-11-18  2:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Ingo Molnar, Dave Jones, Linux Kernel,
	the arch/x86 maintainers

On 11/17/2014 04:17 PM, Thomas Gleixner wrote:
> On Mon, 17 Nov 2014, Jens Axboe wrote:
>> On 11/17/2014 02:22 PM, Linus Torvalds wrote:
>>> On Fri, Nov 14, 2014 at 5:59 PM, Linus Torvalds
>>> <torvalds@linux-foundation.org> wrote:
>>>>
>>>> Judging by the code disassembly, it's the "csd_lock_wait(csd)" at the
>>>> end.
>>>
>>> Btw, looking at this, I grew really suspicious of this code in csd_unlock():
>>>
>>>          WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));
>>>
>>> because that makes no sense at all. It basically removes a sanity
>>> check, yet that sanity check makes a hell of a lot of sense. Unlocking
>>> a CSD that is not locked is *wrong*.
>>>
>>> The crazy code comes from commit c84a83e2aaab ("smp: don't warn
>>> about csd->flags having CSD_FLAG_LOCK cleared for !wait") by Jens, but
>>> the explanation and the code is pure crap.
>>>
>>> There is no way in hell that it is ever correct to unlock an entry
>>> that isn't locked, so that whole CSD_FLAG_WAIT thing is buggy as hell.
>>>
>>> The explanation in commit c84a83e2aaab says that  "blk-mq reuses the
>>> request potentially immediately" and claims that that is somehow ok,
>>> but that's utter BS. Even if you don't ever wait for it, the CSD lock
>>> bit fundamentally also protects the "csd->llist" pointer. So what that
>>> commit actually does is to just remove a safety check, and do so in a
>>> very unsafe manner. And apparently block-mq re-uses something THAT IS
>>> STILL ACTIVELY IN USE. That's just horrible.
>>
>> I agree that this description is probably utter crap. And now I do
>> actually remember the issue at hand. The resource here is the tag, that
>> decides what request we'll use, and subsequently what call_single_data
>> storage is used. When this was originally done, blk-mq cleared the
>> request from the function callback, instead of doing it at allocation
>> time. The assumption here was cache hotness. That in turn also cleared
>> ->csd, which meant that the flags got zeroed and csd_unlock() was
>> naturally unhappy.
>
> So that's exactly what I described in my other reply.
>
>       csd_lock(csd);
>       queue_csd();
>       ipi();
> 				csd->func(csd->info);
>       wait_for_completion(csd);
> 				  complete(csd);
>       reuse_csd(csd);		
> 				csd_unlock(csd);
>
> When you call complete() nothing can rely on csd anymore, except for
> the smp core code ....

Right, and I didn't. It was the core use of csd->flags afterwards that 
complained. blk-mq merely cleared ->flags in csd->func(), which 
(granted) was a bit weird. So it was just storing to csd (before 
unlock), but in an inappropriate way. It would obviously have broken a 
sync invocation, but the block layer never does that.
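
For reference, the old pattern was roughly this (a hedged sketch with
made-up names, not the actual blk-mq code):

        #include <linux/smp.h>
        #include <linux/string.h>

        struct my_request {
                struct call_single_data csd;
                /* ... request fields ... */
        };

        /* Async IPI callback: complete the request and re-init it for
         * reuse while it is still cache hot. */
        static void my_request_ipi_done(void *info)
        {
                struct my_request *rq = info;

                /* ... complete the request ... */

                /*
                 * Re-initialising here also zeroes rq->csd.flags, so
                 * CSD_FLAG_LOCK is already gone by the time the smp
                 * core calls csd_unlock(&rq->csd), which is exactly
                 * what the old WARN_ON tripped over.
                 */
                memset(rq, 0, sizeof(*rq));
        }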

>> THAT was the reuse case, not that the request would get reused
>> before we had finished the IPI fn callback since that would
>> obviously create other badness. Now I'm not sure what made me create
>> that patch, which in retrospect is a bad hammer for this problem.
>
> Performance blindness?

Possibly...

>> blk-mq doesn't do the init-at-finish time anymore, so it should not be
>> hit by the issue. But if we do bring that back, then it would still work
>> fine with Thomas' patch, since we unlock prior to running the callback.
>
> So if blk-mq is not relying on that, then we really should back out
> that stuff for 3.18 and tag it for stable.

Yeah, I'd be fine with doing that. I don't recall off the top of my head 
when we stopped doing the clear at free time, but it was relatively 
early. OK, so checked, and 3.15 does init at free time, and 3.16 and 
later does it at allocation time. So the revert can only safely be 
applied to 3.16 and later...

> Treating sync and async function calls differently makes sense,
> because any async caller which cannot deal with the unlock before call
> scheme is broken by definition already today. But that's material for
> next.

Agree.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-18  2:21               ` Linus Torvalds
@ 2014-11-18  2:39                 ` Dave Jones
  2014-11-18  2:51                   ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-18  2:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, the arch/x86 maintainers

On Mon, Nov 17, 2014 at 06:21:08PM -0800, Linus Torvalds wrote:
 > On Mon, Nov 17, 2014 at 6:09 PM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > After wasting countless hours rolling back to Fedora 20 and gcc 4.8.1,
 > > I saw the exact same trace on 3.17, so now I don't know what to think.
 > 
 > Uhhuh.
 > 
 > Has anything else changed? New trinity tests? If it has happened in as
 > little as ten minutes, and you don't recall having seen this until
 > about a week ago, it does sound like something changed.

Looking at the trinity commits over the last month or so, there's a few
new things, but nothing that sounds like it would trip up a bug like
this. "generate random ascii strings" and "mess with fcntl's after
opening fd's on startup" being the stand-outs. Everything else is pretty
much cleanups and code-motion. There was a lot of work on the code
that tracks mmaps about a month ago, but that shouldn't have had any
visible runtime differences.

<runs git diff>

hm, something I changed not that long ago, which I didn't commit yet,
was that it now runs more child processes than it used to (was 64, now 256).
I've been running like that for a while though. I want to say that was
before .17, but I'm not 100% sure.

So it could be that I'm just generating a lot more load now.
I could drop that back down and see if it 'goes away' or at least
happens less, but it strikes me that there's something here that needs
fixing regardless.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-18  2:39                 ` Dave Jones
@ 2014-11-18  2:51                   ` Linus Torvalds
  2014-11-18 14:52                     ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-18  2:51 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Mon, Nov 17, 2014 at 6:39 PM, Dave Jones <davej@redhat.com> wrote:
>
> So it could be that I'm just generating a lot more load now.
> I could drop that back down and see if it 'goes away' or at least
> happens less, but it strikes me that there's something here that needs
> fixing regardless.

Oh, absolutely. It's more a question of "maybe what changed can give us a clue".

But if it's something like "more load", that's not going to help
pinpoint it, and you might be better off just doing the all-cpu-backtrace
thing and hoping that gives some pattern to appreciate.

                  Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-18  2:51                   ` Linus Torvalds
@ 2014-11-18 14:52                     ` Dave Jones
  2014-11-18 17:20                       ` Linus Torvalds
  2014-11-18 18:54                       ` Thomas Gleixner
  0 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-18 14:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, the arch/x86 maintainers

On Mon, Nov 17, 2014 at 06:51:25PM -0800, Linus Torvalds wrote:
 > On Mon, Nov 17, 2014 at 6:39 PM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > So it could be that I'm just generating a lot more load now.
 > > I could drop that back down and see if it 'goes away' or at least
 > > happens less, but it strikes me that there's something here that needs
 > > fixing regardless.
 > 
 > Oh, absolutely. It's more a question of "maybe what changed can give us a clue".
 > 
 > But if it's something like "more load", that's not going to help
 > pinpoint it, and you might be better off just doing the all-cpu-backtrace
 > thing and hoping that gives some pattern to appreciate.

Here's the first hit. Curiously, one cpu is missing.


NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c180:17837]
Modules linked in: dlci snd_seq_dummy fuse tun rfcomm bnep hidp scsi_transport_iscsi af_key llc2 can_raw nfnetlink can_bcm sctp libcrc32c nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e snd_timer ptp shpchp snd pps_core soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
irq event stamp: 2258092
hardirqs last  enabled at (2258091): [<ffffffffa91a58b5>] get_page_from_freelist+0x555/0xaa0
hardirqs last disabled at (2258092): [<ffffffffa985396a>] apic_timer_interrupt+0x6a/0x80
softirqs last  enabled at (2244380): [<ffffffffa907b87f>] __do_softirq+0x24f/0x6f0
softirqs last disabled at (2244377): [<ffffffffa907c0dd>] irq_exit+0x13d/0x160
CPU: 1 PID: 17837 Comm: trinity-c180 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 35/402 20526]
task: ffff8801575e4680 ti: ffff880202434000 task.ti: ffff880202434000
RIP: 0010:[<ffffffffa91a0db0>]  [<ffffffffa91a0db0>] bad_range+0x0/0x90
RSP: 0018:ffff8802024377a0  EFLAGS: 00000246
RAX: ffff8801575e4680 RBX: 0000000000000007 RCX: 0000000000000006
RDX: 0000000000002a20 RSI: ffffea0000887fc0 RDI: ffff88024d64c740
RBP: ffff880202437898 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000020 R14: 00000000001d8608 R15: 00000000001d8668
FS:  00007fd3b8960740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd3b5ea0777 CR3: 00000001027cd000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffffffffa91a58c4 00000000000009e4 ffff8801575e4680 0000000000000001
 ffff88024d64dd08 0000010000000000 0000000000000000 ffff8802024377f8
 0000000000000000 ffff88024d64dd00 ffffffffa90ac411 ffffffff00000003
Call Trace:
 [<ffffffffa91a58c4>] ? get_page_from_freelist+0x564/0xaa0
 [<ffffffffa90ac411>] ? get_parent_ip+0x11/0x50
 [<ffffffffa91a6030>] __alloc_pages_nodemask+0x230/0xd20
 [<ffffffffa90ac411>] ? get_parent_ip+0x11/0x50
 [<ffffffffa90d1e45>] ? mark_held_locks+0x75/0xa0
 [<ffffffffa91f400e>] alloc_pages_vma+0xee/0x1b0
 [<ffffffffa91b643e>] ? shmem_alloc_page+0x6e/0xc0
 [<ffffffffa91b643e>] shmem_alloc_page+0x6e/0xc0
 [<ffffffffa90ac411>] ? get_parent_ip+0x11/0x50
 [<ffffffffa90ac58b>] ? preempt_count_sub+0x7b/0x100
 [<ffffffffa93dcc46>] ? __percpu_counter_add+0x86/0xb0
 [<ffffffffa91d50d6>] ? __vm_enough_memory+0x66/0x1c0
 [<ffffffffa919ad65>] ? find_get_entry+0x5/0x230
 [<ffffffffa933b10c>] ? cap_vm_enough_memory+0x4c/0x60
 [<ffffffffa91b8ff0>] shmem_getpage_gfp+0x630/0xa40
 [<ffffffffa90cee01>] ? match_held_lock+0x111/0x160
 [<ffffffffa91b9442>] shmem_write_begin+0x42/0x70
 [<ffffffffa919a684>] generic_perform_write+0xd4/0x1f0
 [<ffffffffa919d5d2>] __generic_file_write_iter+0x162/0x350
 [<ffffffffa92154a0>] ? new_sync_read+0xd0/0xd0
 [<ffffffffa919d7ff>] generic_file_write_iter+0x3f/0xb0
 [<ffffffffa919d7c0>] ? __generic_file_write_iter+0x350/0x350
 [<ffffffffa92155e8>] do_iter_readv_writev+0x78/0xc0
 [<ffffffffa9216e18>] do_readv_writev+0xd8/0x2a0
 [<ffffffffa919d7c0>] ? __generic_file_write_iter+0x350/0x350
 [<ffffffffa90cf426>] ? lock_release_holdtime.part.28+0xe6/0x160
 [<ffffffffa919d7c0>] ? __generic_file_write_iter+0x350/0x350
 [<ffffffffa90ac411>] ? get_parent_ip+0x11/0x50
 [<ffffffffa90ac58b>] ? preempt_count_sub+0x7b/0x100
 [<ffffffffa90e782e>] ? rcu_read_lock_held+0x6e/0x80
 [<ffffffffa921706c>] vfs_writev+0x3c/0x50
 [<ffffffffa92171dc>] SyS_writev+0x5c/0x100
 [<ffffffffa9852c49>] tracesys_phase2+0xd4/0xd9
Code: 09 48 83 f2 01 83 e2 01 eb a3 90 48 c7 c7 a0 8c e4 a9 e8 44 e1 f2 ff 85 c0 75 d2 eb c1 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 <0f> 1f 44 00 00 48 b8 00 00 00 00 00 16 00 00 55 4c 8b 47 68 48 
sending NMI to other CPUs:
NMI backtrace for cpu 2
CPU: 2 PID: 15913 Comm: trinity-c141 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 35/402 20526]
task: ffff880223229780 ti: ffff8801afca0000 task.ti: ffff8801afca0000
RIP: 0010:[<ffffffffa9116dbe>]  [<ffffffffa9116dbe>] generic_exec_single+0xee/0x1a0
RSP: 0018:ffff8801afca3928  EFLAGS: 00000202
RAX: ffff8802443d9d00 RBX: ffff8801afca3930 RCX: ffff8802443d9dc0
RDX: ffff8802443d4d80 RSI: ffff8801afca3930 RDI: ffff8801afca3930
RBP: ffff8801afca3988 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000001 R14: ffff8801afca3a48 R15: ffffffffa9045bb0
FS:  00007fd3b8960740(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000022f8bd000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff8801afca3a08 ffff8802443d9dc0 ffffffffa9045bb0 ffff8801afca3a48
 0000000000000003 000000007b19adc3 0000000000000001 00000000ffffffff
 0000000000000001 ffffffffa9045bb0 ffff8801afca3a48 0000000000000001
Call Trace:
 [<ffffffffa9045bb0>] ? do_flush_tlb_all+0x60/0x60
 [<ffffffffa9045bb0>] ? do_flush_tlb_all+0x60/0x60
 [<ffffffffa9116f3a>] smp_call_function_single+0x6a/0xe0
 [<ffffffffa93b2e1f>] ? cpumask_next_and+0x4f/0xb0
 [<ffffffffa9045bb0>] ? do_flush_tlb_all+0x60/0x60
 [<ffffffffa9117679>] smp_call_function_many+0x2b9/0x320
 [<ffffffffa9046370>] flush_tlb_mm_range+0xe0/0x370
 [<ffffffffa91cc762>] tlb_flush_mmu_tlbonly+0x42/0x50
 [<ffffffffa91cdd28>] unmap_single_vma+0x6b8/0x900
 [<ffffffffa91ce06c>] zap_page_range_single+0xfc/0x160
 [<ffffffffa91ce254>] unmap_mapping_range+0x134/0x190
 [<ffffffffa91bb9dd>] shmem_fallocate+0x4fd/0x520
 [<ffffffffa90c7c77>] ? prepare_to_wait+0x27/0x90
 [<ffffffffa9213bc2>] do_fallocate+0x132/0x1d0
 [<ffffffffa91e3228>] SyS_madvise+0x398/0x870
 [<ffffffffa983f6c0>] ? rcu_read_lock_sched_held+0x4e/0x6a
 [<ffffffffa9013877>] ? syscall_trace_enter_phase2+0xa7/0x2b0
 [<ffffffffa9852c49>] tracesys_phase2+0xd4/0xd9
Code: 48 89 de 48 03 14 c5 60 74 f1 a9 48 89 df e8 0a fa 2a 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 4d c8 65 48 33 0c 25 28 00 00 00 0f 85 8e 00 
NMI backtrace for cpu 0
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 37.091 msecs
CPU: 0 PID: 15851 Comm: trinity-c80 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 36/402 20526]
task: ffff8801874e8000 ti: ffff88022baec000 task.ti: ffff88022baec000
RIP: 0010:[<ffffffffa90ac450>]  [<ffffffffa90ac450>] preempt_count_add+0x0/0xc0
RSP: 0000:ffff880244003c30  EFLAGS: 00000092
RAX: 0000000000000001 RBX: ffffffffa9edb560 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000001
RBP: ffff880244003c48 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: ffff8801874e88c8 [23543.271956] NMI backtrace for cpu 3
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 100.612 msecs
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 37/402 20526]
task: ffff880242b5c680 ti: ffff880242b78000 task.ti: ffff880242b78000
RIP: 0010:[<ffffffffa94251b5>]  [<ffffffffa94251b5>] intel_idle+0xd5/0x180
RSP: 0018:ffff880242b7bdf8  EFLAGS: 00000046
RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff880242b7bfd8 RDI: 0000000000000003
RBP: ffff880242b7be28 R08: 000000008baf8f3d R09: 0000000000000000
R10: 0000000000000000 R11: ffff880242b5cea0 R12: 0000000000000005
R13: 0000000000000032 R14: 0000000000000004 R15: ffff880242b78000
FS:  0000000000000000(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000b1c9ac CR3: 0000000029e11000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 0000000342b7be28 afc453cb003f4590 ffffe8ffff402200 0000000000000005
 ffffffffa9eaa0c0 0000000000000003 ffff880242b7be78 ffffffffa96bbb45
 0000156cc07cf6e3 ffffffffa9eaa290 0000000000000096 ffffffffa9f197b0
Call Trace:
 [<ffffffffa96bbb45>] cpuidle_enter_state+0x55/0x300
 [<ffffffffa96bbea7>] cpuidle_enter+0x17/0x20
 [<ffffffffa90c88f5>] cpu_startup_entry+0x4e5/0x630
 [<ffffffffa902d523>] start_secondary+0x1a3/0x220
Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 ba 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 125.739 msecs


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-18 14:52                     ` Dave Jones
@ 2014-11-18 17:20                       ` Linus Torvalds
  2014-11-18 19:28                         ` Thomas Gleixner
  2014-11-18 18:54                       ` Thomas Gleixner
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-18 17:20 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, the arch/x86 maintainers, Don Zickus

On Tue, Nov 18, 2014 at 6:52 AM, Dave Jones <davej@redhat.com> wrote:
>
> Here's the first hit. Curiously, one cpu is missing.

That might be the CPU3 that isn't responding to IPIs due to some bug..

> NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c180:17837]
> RIP: 0010:[<ffffffffa91a0db0>]  [<ffffffffa91a0db0>] bad_range+0x0/0x90

Hmm. Something looping in the page allocator? Not waiting for a lock,
but livelocked? I'm not seeing anything here that should trigger the
NMI watchdog at all.

Can the NMI watchdog get confused somehow?

> Call Trace:
>  [<ffffffffa91a6030>] __alloc_pages_nodemask+0x230/0xd20
>  [<ffffffffa91f400e>] alloc_pages_vma+0xee/0x1b0
>  [<ffffffffa91b643e>] shmem_alloc_page+0x6e/0xc0
>  [<ffffffffa91b8ff0>] shmem_getpage_gfp+0x630/0xa40
>  [<ffffffffa91b9442>] shmem_write_begin+0x42/0x70
>  [<ffffffffa919a684>] generic_perform_write+0xd4/0x1f0
>  [<ffffffffa919d5d2>] __generic_file_write_iter+0x162/0x350
>  [<ffffffffa919d7ff>] generic_file_write_iter+0x3f/0xb0
>  [<ffffffffa92155e8>] do_iter_readv_writev+0x78/0xc0
>  [<ffffffffa9216e18>] do_readv_writev+0xd8/0x2a0
>  [<ffffffffa90cf426>] ? lock_release_holdtime.part.28+0xe6/0x160
>  [<ffffffffa921706c>] vfs_writev+0x3c/0x50

And CPU2 is in that TLB flusher again:

> NMI backtrace for cpu 2
> RIP: 0010:[<ffffffffa9116dbe>]  [<ffffffffa9116dbe>] generic_exec_single+0xee/0x1a0
> Call Trace:
>  [<ffffffffa9045bb0>] ? do_flush_tlb_all+0x60/0x60
>  [<ffffffffa9116f3a>] smp_call_function_single+0x6a/0xe0
>  [<ffffffffa9117679>] smp_call_function_many+0x2b9/0x320
>  [<ffffffffa9046370>] flush_tlb_mm_range+0xe0/0x370
>  [<ffffffffa91cc762>] tlb_flush_mmu_tlbonly+0x42/0x50
>  [<ffffffffa91cdd28>] unmap_single_vma+0x6b8/0x900
>  [<ffffffffa91ce06c>] zap_page_range_single+0xfc/0x160
>  [<ffffffffa91ce254>] unmap_mapping_range+0x134/0x190

.. and the code line implies that it's in that csd_lock_wait() loop,
again consistent with waiting for some other CPU. Presumably the
missing CPU3.

> NMI backtrace for cpu 0
> RIP: 0010:[<ffffffffa90ac450>]  [<ffffffffa90ac450>] preempt_count_add+0x0/0xc0
> Call Trace:
>  [<ffffffffa96bbb45>] cpuidle_enter_state+0x55/0x300
>  [<ffffffffa96bbea7>] cpuidle_enter+0x17/0x20
>  [<ffffffffa90c88f5>] cpu_startup_entry+0x4e5/0x630
>  [<ffffffffa902d523>] start_secondary+0x1a3/0x220

And CPU0 is just in the idle loop (that RIP is literally the
instruction after the "mwait" according to the code line).

> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 125.739 msecs

.. and that's us giving up on CPU3.

So it does look like CPU3 is the problem, but sadly, CPU3 is
apparently not listening, and doesn't even react to the NMI, much less
a TLB flush IPI.

Not reacting to NMI could be:
 (a) some APIC state issue
 (b) we're already stuck in a loop in the previous NMI handler
 (c) what?

Anybody?

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-18 14:52                     ` Dave Jones
  2014-11-18 17:20                       ` Linus Torvalds
@ 2014-11-18 18:54                       ` Thomas Gleixner
  2014-11-18 21:55                         ` Don Zickus
  1 sibling, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-18 18:54 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Tue, 18 Nov 2014, Dave Jones wrote:
> Here's the first hit. Curiously, one cpu is missing.

I don't think so

> NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c180:17837]

> irq event stamp: 2258092
> hardirqs last  enabled at (2258091): [<ffffffffa91a58b5>] get_page_from_freelist+0x555/0xaa0
> hardirqs last disabled at (2258092): [<ffffffffa985396a>] apic_timer_interrupt+0x6a/0x80

So that means we are in the timer interrupt and handling
watchdog_timer_fn.

> CPU: 1 PID: 17837 Comm: trinity-c180 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 35/402 20526]
> task: ffff8801575e4680 ti: ffff880202434000 task.ti: ffff880202434000
> RIP: 0010:[<ffffffffa91a0db0>]  [<ffffffffa91a0db0>] bad_range+0x0/0x90

So the softlockup tells us that the high-priority watchdog thread was
not able to touch the watchdog timestamp. That means this task was
hogging the CPU for 20+ seconds. I have no idea how that happens in
that call chain.

Call Trace:
 [<ffffffffa91a58c4>] ? get_page_from_freelist+0x564/0xaa0
 [<ffffffffa90ac411>] ? get_parent_ip+0x11/0x50
 [<ffffffffa91a6030>] __alloc_pages_nodemask+0x230/0xd20
 [<ffffffffa90ac411>] ? get_parent_ip+0x11/0x50
 [<ffffffffa90d1e45>] ? mark_held_locks+0x75/0xa0
 [<ffffffffa91f400e>] alloc_pages_vma+0xee/0x1b0
 [<ffffffffa91b643e>] ? shmem_alloc_page+0x6e/0xc0
 [<ffffffffa91b643e>] shmem_alloc_page+0x6e/0xc0
 [<ffffffffa90ac411>] ? get_parent_ip+0x11/0x50
 [<ffffffffa90ac58b>] ? preempt_count_sub+0x7b/0x100
 [<ffffffffa93dcc46>] ? __percpu_counter_add+0x86/0xb0
 [<ffffffffa91d50d6>] ? __vm_enough_memory+0x66/0x1c0
 [<ffffffffa919ad65>] ? find_get_entry+0x5/0x230
 [<ffffffffa933b10c>] ? cap_vm_enough_memory+0x4c/0x60
 [<ffffffffa91b8ff0>] shmem_getpage_gfp+0x630/0xa40
 [<ffffffffa90cee01>] ? match_held_lock+0x111/0x160
 [<ffffffffa91b9442>] shmem_write_begin+0x42/0x70
 [<ffffffffa919a684>] generic_perform_write+0xd4/0x1f0
 [<ffffffffa919d5d2>] __generic_file_write_iter+0x162/0x350
 [<ffffffffa92154a0>] ? new_sync_read+0xd0/0xd0
 [<ffffffffa919d7ff>] generic_file_write_iter+0x3f/0xb0
 [<ffffffffa919d7c0>] ? __generic_file_write_iter+0x350/0x350
 [<ffffffffa92155e8>] do_iter_readv_writev+0x78/0xc0
 [<ffffffffa9216e18>] do_readv_writev+0xd8/0x2a0
 [<ffffffffa919d7c0>] ? __generic_file_write_iter+0x350/0x350
 [<ffffffffa90cf426>] ? lock_release_holdtime.part.28+0xe6/0x160
 [<ffffffffa919d7c0>] ? __generic_file_write_iter+0x350/0x350
 [<ffffffffa90ac411>] ? get_parent_ip+0x11/0x50
 [<ffffffffa90ac58b>] ? preempt_count_sub+0x7b/0x100
 [<ffffffffa90e782e>] ? rcu_read_lock_held+0x6e/0x80
 [<ffffffffa921706c>] vfs_writev+0x3c/0x50
 [<ffffffffa92171dc>] SyS_writev+0x5c/0x100
 [<ffffffffa9852c49>] tracesys_phase2+0xd4/0xd9

But this gets pages for a write into shmem and the other one below
does a madvise on a shmem map. Coincidence?

> sending NMI to other CPUs:

So here we kick the other cpus

> NMI backtrace for cpu 2
> CPU: 2 PID: 15913 Comm: trinity-c141 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 35/402 20526]
> task: ffff880223229780 ti: ffff8801afca0000 task.ti: ffff8801afca0000
> RIP: 0010:[<ffffffffa9116dbe>]  [<ffffffffa9116dbe>] generic_exec_single+0xee/0x1a0
>  [<ffffffffa9045bb0>] ? do_flush_tlb_all+0x60/0x60
>  [<ffffffffa9045bb0>] ? do_flush_tlb_all+0x60/0x60
>  [<ffffffffa9116f3a>] smp_call_function_single+0x6a/0xe0
>  [<ffffffffa93b2e1f>] ? cpumask_next_and+0x4f/0xb0
>  [<ffffffffa9045bb0>] ? do_flush_tlb_all+0x60/0x60
>  [<ffffffffa9117679>] smp_call_function_many+0x2b9/0x320
>  [<ffffffffa9046370>] flush_tlb_mm_range+0xe0/0x370
>  [<ffffffffa91cc762>] tlb_flush_mmu_tlbonly+0x42/0x50
>  [<ffffffffa91cdd28>] unmap_single_vma+0x6b8/0x900
>  [<ffffffffa91ce06c>] zap_page_range_single+0xfc/0x160
>  [<ffffffffa91ce254>] unmap_mapping_range+0x134/0x190
>  [<ffffffffa91bb9dd>] shmem_fallocate+0x4fd/0x520
>  [<ffffffffa90c7c77>] ? prepare_to_wait+0x27/0x90
>  [<ffffffffa9213bc2>] do_fallocate+0x132/0x1d0
>  [<ffffffffa91e3228>] SyS_madvise+0x398/0x870
>  [<ffffffffa983f6c0>] ? rcu_read_lock_sched_held+0x4e/0x6a
>  [<ffffffffa9013877>] ? syscall_trace_enter_phase2+0xa7/0x2b0
>  [<ffffffffa9852c49>] tracesys_phase2+0xd4/0xd9

We've seen that before

> NMI backtrace for cpu 0
> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 37.091 msecs

So it complains that the backtrace handler took 37 msec, which is
indeed long for just dumping a stack trace.

> CPU: 0 PID: 15851 Comm: trinity-c80 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 36/402 20526]
> task: ffff8801874e8000 ti: ffff88022baec000 task.ti: ffff88022baec000
> RIP: 0010:[<ffffffffa90ac450>]  [<ffffffffa90ac450>] preempt_count_add+0x0/0xc0
> RSP: 0000:ffff880244003c30  EFLAGS: 00000092
> RAX: 0000000000000001 RBX: ffffffffa9edb560 RCX: 0000000000000001
> RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000001
> RBP: ffff880244003c48 R08: 0000000000000000 R09: 0000000000000001
> R10: 0000000000000000 R11: ffff8801874e88c8 [23543.271956] NMI backtrace for cpu 3

So here we mangle CPU3 in and lose the backtrace for cpu0, which might
be the real interesting one ....

> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 100.612 msecs

This one takes 100ms.

> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 37/402 20526]
> task: ffff880242b5c680 ti: ffff880242b78000 task.ti: ffff880242b78000
> RIP: 0010:[<ffffffffa94251b5>]  [<ffffffffa94251b5>] intel_idle+0xd5/0x180

So that one is simply idle.

> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 125.739 msecs
> 

And we get another backtrace handler taking too long. Of course we
cannot tell which of the 3 complaints comes from which cpu, because
the printk lacks a cpuid.

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-18 17:20                       ` Linus Torvalds
@ 2014-11-18 19:28                         ` Thomas Gleixner
  2014-11-18 21:25                           ` Don Zickus
  0 siblings, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-18 19:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Linux Kernel, the arch/x86 maintainers, Don Zickus

On Tue, 18 Nov 2014, Linus Torvalds wrote:
> On Tue, Nov 18, 2014 at 6:52 AM, Dave Jones <davej@redhat.com> wrote:
> >
> > Here's the first hit. Curiously, one cpu is missing.
> 
> That might be the CPU3 that isn't responding to IPIs due to some bug..
> 
> > NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c180:17837]
> > RIP: 0010:[<ffffffffa91a0db0>]  [<ffffffffa91a0db0>] bad_range+0x0/0x90
> 
> Hmm. Something looping in the page allocator? Not waiting for a lock,
> but livelocked? I'm not seeing anything here that should trigger the
> NMI watchdog at all.
> 
> Can the NMI watchdog get confused somehow?

That's the soft lockup detector which runs from the timer interrupt
not from NMI.
 
> So it does look like CPU3 is the problem, but sadly, CPU3 is
> apparently not listening, and doesn't even react to the NMI, much less

As I said in the other mail, it gets the NMI and reacts to it. It's
just mangled into the CPU0 backtrace.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-18 19:28                         ` Thomas Gleixner
@ 2014-11-18 21:25                           ` Don Zickus
  2014-11-18 21:31                             ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Don Zickus @ 2014-11-18 21:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Dave Jones, Linux Kernel, the arch/x86 maintainers

On Tue, Nov 18, 2014 at 08:28:01PM +0100, Thomas Gleixner wrote:
> On Tue, 18 Nov 2014, Linus Torvalds wrote:
> > On Tue, Nov 18, 2014 at 6:52 AM, Dave Jones <davej@redhat.com> wrote:
> > >
> > > Here's the first hit. Curiously, one cpu is missing.
> > 
> > That might be the CPU3 that isn't responding to IPIs due to some bug..
> > 
> > > NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c180:17837]
> > > RIP: 0010:[<ffffffffa91a0db0>]  [<ffffffffa91a0db0>] bad_range+0x0/0x90
> > 
> > Hmm. Something looping in the page allocator? Not waiting for a lock,
> > but livelocked? I'm not seeing anything here that should trigger the
> > NMI watchdog at all.
> > 
> > Can the NMI watchdog get confused somehow?
> 
> That's the soft lockup detector which runs from the timer interrupt
> not from NMI.
>  
> > So it does look like CPU3 is the problem, but sadly, CPU3 is
> > apparently not listening, and doesn't even react to the NMI, much less
> 
> As I said in the other mail, it gets the NMI and reacts to it. It's
> just mangled into the CPU0 backtrace.

I was going to reply about both points too. :-)  Though the mangling looks
odd because we have spin_locks serializing the output for each cpu.

Another thing I wanted to ask DaveJ: did you recently turn on
CONFIG_PREEMPT?  That would explain why you are seeing the softlockups
now.  If you disable CONFIG_PREEMPT, do the softlockups disappear?

Cheers,
Don

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-18 21:25                           ` Don Zickus
@ 2014-11-18 21:31                             ` Dave Jones
  0 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-18 21:31 UTC (permalink / raw)
  To: Don Zickus
  Cc: Thomas Gleixner, Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Tue, Nov 18, 2014 at 04:25:53PM -0500, Don Zickus wrote:
 
 > I was going to reply about both points too. :-)  Though the mangling looks
 > odd because we have spin_locks serializing the output for each cpu.
 > 
 > Another thing I wanted to ask DaveJ: did you recently turn on
 > CONFIG_PREEMPT?  That would explain why you are seeing the softlockups
 > now.  If you disable CONFIG_PREEMPT, do the softlockups disappear?

I've had it enabled on my test box forever.  I'll add turning it off
to the list of things to try.

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-18 18:54                       ` Thomas Gleixner
@ 2014-11-18 21:55                         ` Don Zickus
  2014-11-18 22:02                           ` Dave Jones
  2014-11-19  2:19                           ` Dave Jones
  0 siblings, 2 replies; 486+ messages in thread
From: Don Zickus @ 2014-11-18 21:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Jones, Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Tue, Nov 18, 2014 at 07:54:17PM +0100, Thomas Gleixner wrote:
> On Tue, 18 Nov 2014, Dave Jones wrote:
> > Here's the first hit. Curiously, one cpu is missing.
> 
> I don't think so
> 
> > NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c180:17837]
> 
> > irq event stamp: 2258092
> > hardirqs last  enabled at (2258091): [<ffffffffa91a58b5>] get_page_from_freelist+0x555/0xaa0
> > hardirqs last disabled at (2258092): [<ffffffffa985396a>] apic_timer_interrupt+0x6a/0x80
> 
> So that means we are in the timer interrupt and handling
> watchdog_timer_fn.
> 
> > CPU: 1 PID: 17837 Comm: trinity-c180 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 35/402 20526]
> > task: ffff8801575e4680 ti: ffff880202434000 task.ti: ffff880202434000
> > RIP: 0010:[<ffffffffa91a0db0>]  [<ffffffffa91a0db0>] bad_range+0x0/0x90
> 
> So the softlockup tells us that the high-priority watchdog thread was
> not able to touch the watchdog timestamp. That means this task was
> hogging the CPU for 20+ seconds. I have no idea how that happens in
> that call chain.
> 
> Call Trace:
>  [<ffffffffa91a58c4>] ? get_page_from_freelist+0x564/0xaa0
>  [<ffffffffa90ac411>] ? get_parent_ip+0x11/0x50
>  [<ffffffffa91a6030>] __alloc_pages_nodemask+0x230/0xd20
>  [<ffffffffa90ac411>] ? get_parent_ip+0x11/0x50
>  [<ffffffffa90d1e45>] ? mark_held_locks+0x75/0xa0
>  [<ffffffffa91f400e>] alloc_pages_vma+0xee/0x1b0
>  [<ffffffffa91b643e>] ? shmem_alloc_page+0x6e/0xc0
>  [<ffffffffa91b643e>] shmem_alloc_page+0x6e/0xc0
>  [<ffffffffa90ac411>] ? get_parent_ip+0x11/0x50
>  [<ffffffffa90ac58b>] ? preempt_count_sub+0x7b/0x100
>  [<ffffffffa93dcc46>] ? __percpu_counter_add+0x86/0xb0
>  [<ffffffffa91d50d6>] ? __vm_enough_memory+0x66/0x1c0
>  [<ffffffffa919ad65>] ? find_get_entry+0x5/0x230
>  [<ffffffffa933b10c>] ? cap_vm_enough_memory+0x4c/0x60
>  [<ffffffffa91b8ff0>] shmem_getpage_gfp+0x630/0xa40
>  [<ffffffffa90cee01>] ? match_held_lock+0x111/0x160
>  [<ffffffffa91b9442>] shmem_write_begin+0x42/0x70
>  [<ffffffffa919a684>] generic_perform_write+0xd4/0x1f0
>  [<ffffffffa919d5d2>] __generic_file_write_iter+0x162/0x350
>  [<ffffffffa92154a0>] ? new_sync_read+0xd0/0xd0
>  [<ffffffffa919d7ff>] generic_file_write_iter+0x3f/0xb0
>  [<ffffffffa919d7c0>] ? __generic_file_write_iter+0x350/0x350
>  [<ffffffffa92155e8>] do_iter_readv_writev+0x78/0xc0
>  [<ffffffffa9216e18>] do_readv_writev+0xd8/0x2a0
>  [<ffffffffa919d7c0>] ? __generic_file_write_iter+0x350/0x350
>  [<ffffffffa90cf426>] ? lock_release_holdtime.part.28+0xe6/0x160
>  [<ffffffffa919d7c0>] ? __generic_file_write_iter+0x350/0x350
>  [<ffffffffa90ac411>] ? get_parent_ip+0x11/0x50
>  [<ffffffffa90ac58b>] ? preempt_count_sub+0x7b/0x100
>  [<ffffffffa90e782e>] ? rcu_read_lock_held+0x6e/0x80
>  [<ffffffffa921706c>] vfs_writev+0x3c/0x50
>  [<ffffffffa92171dc>] SyS_writev+0x5c/0x100
>  [<ffffffffa9852c49>] tracesys_phase2+0xd4/0xd9
> 
> But this gets pages for a write into shmem and the other one below
> does a madvise on a shmem map. Coincidence?
> 
> > sending NMI to other CPUs:
> 
> So here we kick the other cpus
> 
> > NMI backtrace for cpu 2
> > CPU: 2 PID: 15913 Comm: trinity-c141 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 35/402 20526]
> > task: ffff880223229780 ti: ffff8801afca0000 task.ti: ffff8801afca0000
> > RIP: 0010:[<ffffffffa9116dbe>]  [<ffffffffa9116dbe>] generic_exec_single+0xee/0x1a0
> >  [<ffffffffa9045bb0>] ? do_flush_tlb_all+0x60/0x60
> >  [<ffffffffa9045bb0>] ? do_flush_tlb_all+0x60/0x60
> >  [<ffffffffa9116f3a>] smp_call_function_single+0x6a/0xe0
> >  [<ffffffffa93b2e1f>] ? cpumask_next_and+0x4f/0xb0
> >  [<ffffffffa9045bb0>] ? do_flush_tlb_all+0x60/0x60
> >  [<ffffffffa9117679>] smp_call_function_many+0x2b9/0x320
> >  [<ffffffffa9046370>] flush_tlb_mm_range+0xe0/0x370
> >  [<ffffffffa91cc762>] tlb_flush_mmu_tlbonly+0x42/0x50
> >  [<ffffffffa91cdd28>] unmap_single_vma+0x6b8/0x900
> >  [<ffffffffa91ce06c>] zap_page_range_single+0xfc/0x160
> >  [<ffffffffa91ce254>] unmap_mapping_range+0x134/0x190
> >  [<ffffffffa91bb9dd>] shmem_fallocate+0x4fd/0x520
> >  [<ffffffffa90c7c77>] ? prepare_to_wait+0x27/0x90
> >  [<ffffffffa9213bc2>] do_fallocate+0x132/0x1d0
> >  [<ffffffffa91e3228>] SyS_madvise+0x398/0x870
> >  [<ffffffffa983f6c0>] ? rcu_read_lock_sched_held+0x4e/0x6a
> >  [<ffffffffa9013877>] ? syscall_trace_enter_phase2+0xa7/0x2b0
> >  [<ffffffffa9852c49>] tracesys_phase2+0xd4/0xd9
> 
> We've seen that before
> 
> > NMI backtrace for cpu 0
> > INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 37.091 msecs
> 
> So it complains that the backtrace handler took 37 msec, which is
> indeed long for just dumping a stack trace.
> 
> > CPU: 0 PID: 15851 Comm: trinity-c80 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 36/402 20526]
> > task: ffff8801874e8000 ti: ffff88022baec000 task.ti: ffff88022baec000
> > RIP: 0010:[<ffffffffa90ac450>]  [<ffffffffa90ac450>] preempt_count_add+0x0/0xc0
> > RSP: 0000:ffff880244003c30  EFLAGS: 00000092
> > RAX: 0000000000000001 RBX: ffffffffa9edb560 RCX: 0000000000000001
> > RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000001
> > RBP: ffff880244003c48 R08: 0000000000000000 R09: 0000000000000001
> > R10: 0000000000000000 R11: ffff8801874e88c8 [23543.271956] NMI backtrace for cpu 3
> 
> So here we mangle CPU3 in and lose the backtrace for cpu0, which might
> be the real interesting one ....


Dave,

Can you provide another dump?  The hope is we get something not mangled?

The other option we have done in RHEL is panic the system and let kdump
capture the memory.  Then we can analyze the vmcore for the stack trace
cpu0 stored in memory to get a rough idea where it might be if the cpu
isn't responding very well.

Cheers,
Don

> 
> > INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 100.612 msecs
> 
> This one takes 100ms.
> 
> > CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 37/402 20526]
> > task: ffff880242b5c680 ti: ffff880242b78000 task.ti: ffff880242b78000
> > RIP: 0010:[<ffffffffa94251b5>]  [<ffffffffa94251b5>] intel_idle+0xd5/0x180
> 
> So that one is simply idle.
> 
> > INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 125.739 msecs
> > 
> 
> And we get another backtrace handler taking too long. Of course we
> cannot tell which of the 3 complaints comes from which cpu, because
> the printk lacks a cpuid.
> 
> Thanks,
> 
> 	tglx
> 
> 
> 

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-18 21:55                         ` Don Zickus
@ 2014-11-18 22:02                           ` Dave Jones
  2014-11-19 14:41                             ` Don Zickus
  2014-11-19  2:19                           ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-18 22:02 UTC (permalink / raw)
  To: Don Zickus
  Cc: Thomas Gleixner, Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Tue, Nov 18, 2014 at 04:55:40PM -0500, Don Zickus wrote:

 > > So here we mangle CPU3 in and lose the backtrace for cpu0, which might
 > > be the real interesting one ....
 > 
 > Can you provide another dump?  The hope is we get something not mangled?

Working on it..

 > The other option we have done in RHEL is panic the system and let kdump
 > capture the memory.  Then we can analyze the vmcore for the stack trace
 > cpu0 stored in memory to get a rough idea where it might be if the cpu
 > isn't responding very well.

I don't know if it's because of the debug options I typically run with,
or that I'm perpetually cursed, but I've never managed to get kdump to
do anything useful. (The last time I tried it was actively harmful in
that not only did it fail to dump anything, it wedged the machine so
it didn't reboot after panic).

Unless there's some magic step missing from the documentation at
http://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes
then I'm not optimistic it'll be useful.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-18 21:55                         ` Don Zickus
  2014-11-18 22:02                           ` Dave Jones
@ 2014-11-19  2:19                           ` Dave Jones
  2014-11-19  4:40                             ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-19  2:19 UTC (permalink / raw)
  To: Don Zickus
  Cc: Thomas Gleixner, Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Tue, Nov 18, 2014 at 04:55:40PM -0500, Don Zickus wrote:
 
 > Can you provide another dump?  The hope is we get something not mangled?
 
Ok, here's another instance.
This time around, we got all 4 cpu traces.

NMI watchdog: BUG: soft lockup - CPU#2 stuck for 21s! [trinity-c42:31480]
CPU: 2 PID: 31480 Comm: trinity-c42 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 9/411 32140]
task: ffff88023fda4680 ti: ffff880101ee0000 task.ti: ffff880101ee0000
RIP: 0010:[<ffffffff8a1798b4>]  [<ffffffff8a1798b4>] context_tracking_user_enter+0xa4/0x190
RSP: 0018:ffff880101ee3f00  EFLAGS: 00000282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8802445d1258
RDX: 0000000000000001 RSI: ffffffff8aac2c64 RDI: ffffffff8aa94505
RBP: ffff880101ee3f10 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880101ee3ec0
R13: ffffffff8a38c577 R14: ffff880101ee3e70 R15: ffffffff8aa9ee99
FS:  00007f706c089740(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001bf67f000 CR4: 00000000001407e0
DR0: 00007f0b19510000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff880101ee3f58 00007f706bcd0000 ffff880101ee3f40 ffffffff8a012fc5
 0000000000000000 0000000000007c1b 00007f706bcd0000 00007f706bcd0068
 0000000000000000 ffffffff8a7d8624 000000001008feff 0000000000000000
Call Trace:
 [<ffffffff8a012fc5>] syscall_trace_leave+0xa5/0x160
 [<ffffffff8a7d8624>] int_check_syscall_exit_work+0x34/0x3d
Code: 75 4d 48 c7 c7 64 2c ac 8a e8 e9 2c 21 00 65 c7 04 25 54 f7 1c 00 01 00 00 00 41 f7 c4 00 02 00 00 74 1c e8 8f 44 fd ff 41 54 9d <5b> 41 5c 5d c3 0f 1f 80 00 00 00 00 f3 c3 66 0f 1f 44 00 00 41 
sending NMI to other CPUs:
NMI backtrace for cpu 0
CPU: 0 PID: 27716 Comm: kworker/0:1 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 9/411 32140]
Workqueue: events nohz_kick_work_fn
task: ffff88017c358000 ti: ffff8801d7124000 task.ti: ffff8801d7124000
RIP: 0010:[<ffffffff8a0ffb52>]  [<ffffffff8a0ffb52>] smp_call_function_many+0x1b2/0x320
RSP: 0018:ffff8801d7127cb8  EFLAGS: 00000202
RAX: 0000000000000002 RBX: ffff8802441d4dc0 RCX: 0000000000000038
RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff8802445d84e8
RBP: ffff8801d7127d08 R08: ffff880243c3aa80 R09: 0000000000000000
R10: ffff880243c3aa80 R11: 0000000000000000 R12: ffffffff8a0fa2c0
R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000002
FS:  0000000000000000(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f70688ff000 CR3: 000000000ac11000 CR4: 00000000001407f0
DR0: 00007f0b19510000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff8801d7127d28 0000000000000246 00000000d7127ce8 00000000001d4d80
 ffff880206b99780 ffff88017f6ccc30 ffff8802441d3640 ffff8802441d9e00
 ffffffff8ac4e340 0000000000000000 ffff8801d7127d18 ffffffff8a0fa3f5
Call Trace:
 [<ffffffff8a0fa3f5>] tick_nohz_full_kick_all+0x35/0x70
 [<ffffffff8a0ec8fe>] nohz_kick_work_fn+0xe/0x10
 [<ffffffff8a08e61d>] process_one_work+0x1fd/0x590
 [<ffffffff8a08e597>] ? process_one_work+0x177/0x590
 [<ffffffff8a0c112e>] ? put_lock_stats.isra.23+0xe/0x30
 [<ffffffff8a08eacb>] worker_thread+0x11b/0x490
 [<ffffffff8a08e9b0>] ? process_one_work+0x590/0x590
 [<ffffffff8a0942e9>] kthread+0xf9/0x110
 [<ffffffff8a0c112e>] ? put_lock_stats.isra.23+0xe/0x30
 [<ffffffff8a0941f0>] ? kthread_create_on_node+0x250/0x250
 [<ffffffff8a7d82ac>] ret_from_fork+0x7c/0xb0
 [<ffffffff8a0941f0>] ? kthread_create_on_node+0x250/0x250
Code: a5 c1 00 49 89 c7 41 89 c6 7d 7e 49 63 c7 48 8b 3b 48 03 3c c5 e0 6b d1 8a 0f b7 57 18 f6 c2 01 74 12 0f 1f 80 00 00 00 00 f3 90 <0f> b7 57 18 f6 c2 01 75 f5 83 ca 01 66 89 57 18 0f ae f0 48 8b 
NMI backtrace for cpu 1
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 34.478 msecs
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 10/411 32140]
task: ffff880242b5de00 ti: ffff880242b64000 task.ti: ffff880242b64000
RIP: 0010:[<ffffffff8a3e14a5>]  [<ffffffff8a3e14a5>] intel_idle+0xd5/0x180
RSP: 0018:ffff880242b67df8  EFLAGS: 00000046
RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff880242b67fd8 RDI: 0000000000000001
RBP: ffff880242b67e28 R08: 000000008baf93be R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
R13: 0000000000000032 R14: 0000000000000004 R15: ffff880242b64000
FS:  0000000000000000(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcd3a34d000 CR3: 000000000ac11000 CR4: 00000000001407e0
DR0: 00007f0b19510000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 0000000142b67e28 84d4d6ff01c7f4fd ffffe8ffff002200 0000000000000005
 ffffffff8acaa080 0000000000000001 ffff880242b67e78 ffffffff8a666075
 000011f1e9f6e882 ffffffff8acaa250 ffff880242b64000 ffffffff8ad18f30
Call Trace:
 [<ffffffff8a666075>] cpuidle_enter_state+0x55/0x1c0
 [<ffffffff8a666297>] cpuidle_enter+0x17/0x20
 [<ffffffff8a0bb323>] cpu_startup_entry+0x433/0x4e0
 [<ffffffff8a02b763>] start_secondary+0x1a3/0x220
Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 ba 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
NMI backtrace for cpu 3
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 62.461 msecs
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 10/411 32140]
task: ffff880242b5c680 ti: ffff880242b78000 task.ti: ffff880242b78000
RIP: 0010:[<ffffffff8a3e14a5>]  [<ffffffff8a3e14a5>] intel_idle+0xd5/0x180
RSP: 0018:ffff880242b7bdf8  EFLAGS: 00000046
RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff880242b7bfd8 RDI: ffffffff8ac11000
RBP: ffff880242b7be28 R08: 000000008baf93be R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
R13: 0000000000000032 R14: 0000000000000004 R15: ffff880242b78000
FS:  0000000000000000(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000ac11000 CR4: 00000000001407e0
DR0: 00007f0b19510000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 0000000342b7be28 43db433bbd06c740 ffffe8ffff402200 0000000000000005
 ffffffff8acaa080 0000000000000003 ffff880242b7be78 ffffffff8a666075
 000011f1d69aaf37 ffffffff8acaa250 ffff880242b78000 ffffffff8ad18f30
Call Trace:
 [<ffffffff8a666075>] cpuidle_enter_state+0x55/0x1c0
 [<ffffffff8a666297>] cpuidle_enter+0x17/0x20
 [<ffffffff8a0bb323>] cpu_startup_entry+0x433/0x4e0
 [<ffffffff8a02b763>] start_secondary+0x1a3/0x220
Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 ba 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 89.635 msecs


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19  2:19                           ` Dave Jones
@ 2014-11-19  4:40                             ` Linus Torvalds
  2014-11-19  4:59                               ` Dave Jones
                                                 ` (2 more replies)
  0 siblings, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-19  4:40 UTC (permalink / raw)
  To: Dave Jones, Don Zickus, Thomas Gleixner, Linus Torvalds,
	Linux Kernel, the arch/x86 maintainers

On Tue, Nov 18, 2014 at 6:19 PM, Dave Jones <davej@redhat.com> wrote:
>
> NMI watchdog: BUG: soft lockup - CPU#2 stuck for 21s! [trinity-c42:31480]
> CPU: 2 PID: 31480 Comm: trinity-c42 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 9/411 32140]
> RIP: 0010:[<ffffffff8a1798b4>]  [<ffffffff8a1798b4>] context_tracking_user_enter+0xa4/0x190
> Call Trace:
>  [<ffffffff8a012fc5>] syscall_trace_leave+0xa5/0x160
>  [<ffffffff8a7d8624>] int_check_syscall_exit_work+0x34/0x3d

Hmm, if we are getting soft-lockups here, maybe it suggests too much exit-work.

Some TIF_NOHZ loop, perhaps? You have nohz on, don't you?

That makes me wonder: does the problem go away if you disable NOHZ?

> CPU: 0 PID: 27716 Comm: kworker/0:1 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 9/411 32140]
> Workqueue: events nohz_kick_work_fn
> RIP: 0010:[<ffffffff8a0ffb52>]  [<ffffffff8a0ffb52>] smp_call_function_many+0x1b2/0x320
> Call Trace:
>  [<ffffffff8a0fa3f5>] tick_nohz_full_kick_all+0x35/0x70
>  [<ffffffff8a0ec8fe>] nohz_kick_work_fn+0xe/0x10
>  [<ffffffff8a08e61d>] process_one_work+0x1fd/0x590
>  [<ffffffff8a08eacb>] worker_thread+0x11b/0x490
>  [<ffffffff8a0942e9>] kthread+0xf9/0x110
>  [<ffffffff8a7d82ac>] ret_from_fork+0x7c/0xb0

Yeah, there's certainly some NOHZ work going on on CPU0 too.


> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 10/411 32140]
> RIP: 0010:[<ffffffff8a3e14a5>]  [<ffffffff8a3e14a5>] intel_idle+0xd5/0x180
> Call Trace:
>  [<ffffffff8a666075>] cpuidle_enter_state+0x55/0x1c0
>  [<ffffffff8a666297>] cpuidle_enter+0x17/0x20
>  [<ffffffff8a0bb323>] cpu_startup_entry+0x433/0x4e0
>  [<ffffffff8a02b763>] start_secondary+0x1a3/0x220

Nothing.

> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 10/411 32140]
> RIP: 0010:[<ffffffff8a3e14a5>]  [<ffffffff8a3e14a5>] intel_idle+0xd5/0x180
>  [<ffffffff8a666075>] cpuidle_enter_state+0x55/0x1c0
>  [<ffffffff8a666297>] cpuidle_enter+0x17/0x20
>  [<ffffffff8a0bb323>] cpu_startup_entry+0x433/0x4e0
>  [<ffffffff8a02b763>] start_secondary+0x1a3/0x220

Nothing.

Hmm. NOHZ?

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19  4:40                             ` Linus Torvalds
@ 2014-11-19  4:59                               ` Dave Jones
  2014-11-19  5:15                               ` Dave Jones
  2014-11-19 14:59                               ` Dave Jones
  2 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-19  4:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Don Zickus, Thomas Gleixner, Linux Kernel, the arch/x86 maintainers

On Tue, Nov 18, 2014 at 08:40:55PM -0800, Linus Torvalds wrote:
 > On Tue, Nov 18, 2014 at 6:19 PM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > NMI watchdog: BUG: soft lockup - CPU#2 stuck for 21s! [trinity-c42:31480]
 > > CPU: 2 PID: 31480 Comm: trinity-c42 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 9/411 32140]
 > > RIP: 0010:[<ffffffff8a1798b4>]  [<ffffffff8a1798b4>] context_tracking_user_enter+0xa4/0x190
 > > Call Trace:
 > >  [<ffffffff8a012fc5>] syscall_trace_leave+0xa5/0x160
 > >  [<ffffffff8a7d8624>] int_check_syscall_exit_work+0x34/0x3d
 > 
 > Hmm, if we are getting soft-lockups here, maybe it suggests too much exit-work.
 > 
 > Some TIF_NOHZ loop, perhaps? You have nohz on, don't you?

I do.

 > That makes me wonder: does the problem go away if you disable NOHZ?

I'll give it a try, and see what falls out overnight.

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19  4:40                             ` Linus Torvalds
  2014-11-19  4:59                               ` Dave Jones
@ 2014-11-19  5:15                               ` Dave Jones
  2014-11-20 14:36                                 ` Frederic Weisbecker
  2014-11-19 14:59                               ` Dave Jones
  2 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-19  5:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Don Zickus, Thomas Gleixner, Linux Kernel, the arch/x86 maintainers

On Tue, Nov 18, 2014 at 08:40:55PM -0800, Linus Torvalds wrote:

 > Hmm, if we are getting soft-lockups here, maybe it suggests too much exit-work.
 > 
 > Some TIF_NOHZ loop, perhaps? You have nohz on, don't you?
 > 
 > That makes me wonder: does the problem go away if you disable NOHZ?

Does nohz=off do enough? I couldn't convince myself after looking at
dmesg, and still seeing dynticks stuff in there.

I'll do a rebuild with all the CONFIG_NO_HZ stuff off, though it also changes
some other config stuff wrt timers.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-18 22:02                           ` Dave Jones
@ 2014-11-19 14:41                             ` Don Zickus
  2014-11-19 15:03                               ` Vivek Goyal
  2014-11-20  9:54                               ` Dave Young
  0 siblings, 2 replies; 486+ messages in thread
From: Don Zickus @ 2014-11-19 14:41 UTC (permalink / raw)
  To: Dave Jones, Thomas Gleixner, Linus Torvalds, Linux Kernel,
	the arch/x86 maintainers, vgoyal

On Tue, Nov 18, 2014 at 05:02:54PM -0500, Dave Jones wrote:
> On Tue, Nov 18, 2014 at 04:55:40PM -0500, Don Zickus wrote:
> 
>  > > So here we mangle CPU3 in and lose the backtrace for cpu0, which might
>  > > be the real interesting one ....
>  > 
>  > Can you provide another dump?  The hope is we get something not mangled?
> 
> Working on it..
> 
>  > The other option we have done in RHEL is panic the system and let kdump
>  > capture the memory.  Then we can analyze the vmcore for the stack trace
>  > cpu0 stored in memory to get a rough idea where it might be if the cpu
>  > isn't responding very well.
> 
> I don't know if it's because of the debug options I typically run with,
> or that I'm perpetually cursed, but I've never managed to get kdump to
> do anything useful. (The last time I tried it was actively harmful in
> that not only did it fail to dump anything, it wedged the machine so
> it didn't reboot after panic).
> 
> Unless there's some magic step missing from the documentation at
> http://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes
> then I'm not optimistic it'll be useful.

Well, I don't know when you last ran it, but I know the RH kexec
folks started pursuing a Fedora-first package patch rule a couple of
years ago to ensure Fedora had a working kexec/kdump solution.

As for the wedging part, it was a common problem to have the kernel hang
while trying to boot the second kernel (and before console output
happened).  So the problem makes sense and is unfortunate.  I would
encourage you to try again.  :-)

Though, it is transitioning to have the app built into the kernel to deal
with the whole secure boot thing, so that might be another can of worms.

I cc'd Vivek and he can let us know how well it works with F21.

Cheers,
Don

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19  4:40                             ` Linus Torvalds
  2014-11-19  4:59                               ` Dave Jones
  2014-11-19  5:15                               ` Dave Jones
@ 2014-11-19 14:59                               ` Dave Jones
  2014-11-19 17:22                                 ` Linus Torvalds
                                                   ` (2 more replies)
  2 siblings, 3 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-19 14:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Don Zickus, Thomas Gleixner, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra

On Tue, Nov 18, 2014 at 08:40:55PM -0800, Linus Torvalds wrote:
 > On Tue, Nov 18, 2014 at 6:19 PM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > NMI watchdog: BUG: soft lockup - CPU#2 stuck for 21s! [trinity-c42:31480]
 > > CPU: 2 PID: 31480 Comm: trinity-c42 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 9/411 32140]
 > > RIP: 0010:[<ffffffff8a1798b4>]  [<ffffffff8a1798b4>] context_tracking_user_enter+0xa4/0x190
 > > Call Trace:
 > >  [<ffffffff8a012fc5>] syscall_trace_leave+0xa5/0x160
 > >  [<ffffffff8a7d8624>] int_check_syscall_exit_work+0x34/0x3d
 > 
 > Hmm, if we are getting soft-lockups here, maybe it suggests too much exit-work.
 > 
 > Some TIF_NOHZ loop, perhaps? You have nohz on, don't you?
 > 
 > That makes me wonder: does the problem go away if you disable NOHZ?

Apparently not.

NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c75:25175]
CPU: 3 PID: 25175 Comm: trinity-c75 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 9/410 27945]
task: ffff8800364e44d0 ti: ffff880192d2c000 task.ti: ffff880192d2c000
RIP: 0010:[<ffffffff94175be7>]  [<ffffffff94175be7>] context_tracking_user_exit+0x57/0x120
RSP: 0018:ffff880192d2fee8  EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000100000046 RCX: 000000336ee35b47
RDX: 0000000000000001 RSI: ffffffff94ac1e84 RDI: ffffffff94a93725
RBP: ffff880192d2fef8 R08: 00007f9b74d0b740 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff940d8503
R13: ffff880192d2fe98 R14: ffffffff943884e7 R15: ffff880192d2fe48
FS:  00007f9b74d0b740(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000336f1b7740 CR3: 0000000229a95000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff880192d30000 0000000000080000 ffff880192d2ff78 ffffffff94012c25
 00007f9b747a5000 00007f9b747a5068 0000000000000000 0000000000000000
 0000000000000000 ffffffff9437b3be 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff94012c25>] syscall_trace_enter_phase1+0x125/0x1a0
 [<ffffffff9437b3be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff947d41bf>] tracesys+0x14/0x4a
Code: 42 fd ff 48 c7 c7 7a 1e ac 94 e8 25 29 21 00 65 8b 04 25 34 f7 1c 00 83 f8 01 74 28 f6 c7 02 74 13 0f 1f 00 e8 bb 43 fd ff 53 9d <5b> 41 5c 5d c3 0f 1f 40 00 53 9d e8 89 42 fd ff eb ee 0f 1f 80 
sending NMI to other CPUs:
NMI backtrace for cpu 1
CPU: 1 PID: 25164 Comm: trinity-c64 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 9/410 27945]
task: ffff88011600dbc0 ti: ffff8801a99a4000 task.ti: ffff8801a99a4000
RIP: 0010:[<ffffffff940fb71e>]  [<ffffffff940fb71e>] generic_exec_single+0xee/0x1a0
RSP: 0018:ffff8801a99a7d18  EFLAGS: 00000202
RAX: 0000000000000000 RBX: ffff8801a99a7d20 RCX: 0000000000000038
RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
RBP: ffff8801a99a7d78 R08: ffff880242b57ce0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
R13: 0000000000000001 R14: ffff880083c28948 R15: ffffffff94166aa0
FS:  00007f9b74d0b740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000001 CR3: 00000001d8611000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff8801a99a7d28 0000000000000000 ffffffff94166aa0 ffff880083c28948
 0000000000000003 00000000e38f9aac ffff880083c28948 00000000ffffffff
 0000000000000003 ffffffff94166aa0 ffff880083c28948 0000000000000001
Call Trace:
 [<ffffffff94166aa0>] ? perf_swevent_add+0x120/0x120
 [<ffffffff94166aa0>] ? perf_swevent_add+0x120/0x120
 [<ffffffff940fb89a>] smp_call_function_single+0x6a/0xe0
 [<ffffffff940a172b>] ? preempt_count_sub+0x7b/0x100
 [<ffffffff941671aa>] perf_event_read+0xca/0xd0
 [<ffffffff94167240>] perf_event_read_value+0x90/0xe0
 [<ffffffff941689c6>] perf_read+0x226/0x370
 [<ffffffff942fbfb7>] ? security_file_permission+0x87/0xa0
 [<ffffffff941eafff>] vfs_read+0x9f/0x180
 [<ffffffff941ebbd8>] SyS_read+0x58/0xd0
 [<ffffffff947d42c9>] tracesys_phase2+0xd4/0xd9
Code: 48 89 de 48 03 14 c5 20 65 d1 94 48 89 df e8 8a 4b 28 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 4d c8 65 48 33 0c 25 28 00 00 00 0f 85 8e 00 
NMI backtrace for cpu 0
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 35.055 msecs
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 11/410 27945]
task: ffffffff94c164c0 ti: ffffffff94c00000 task.ti: ffffffff94c00000
RIP: 0010:[<ffffffff943dd415>]  [<ffffffff943dd415>] intel_idle+0xd5/0x180
RSP: 0018:ffffffff94c03e28  EFLAGS: 00000046
RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffff94c03fd8 RDI: 0000000000000000
RBP: ffffffff94c03e58 R08: 000000008baf8b86 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff94c00000
FS:  0000000000000000(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f162e060000 CR3: 0000000014c11000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 0000000094c03e58 5955c5b31ad5e8cf ffffe8fffee031a8 0000000000000005
 ffffffff94ca9dc0 0000000000000000 ffffffff94c03ea8 ffffffff94661f05
 00001cb7dcf6fd93 ffffffff94ca9f90 ffffffff94c00000 ffffffff94d18870
Call[31557.908912] NMI backtrace for cpu 2
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 68.178 msecs
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 10/410 27945]
task: ffff880242b596f0 ti: ffff880242b6c000 task.ti: ffff880242b6c000
RIP: 0010:[<ffffffff943dd415>]  [<ffffffff943dd415>] intel_idle+0xd5/0x180
RSP: 0018:ffff880242b6fdf8  EFLAGS: 00000046
RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff880242b6ffd8 RDI: 0000000000000002
RBP: ffff880242b6fe28 R08: 000000008baf8b86 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
R13: 0000000000000032 R14: 0000000000000004 R15: ffff880242b6c000
FS:  0000000000000000(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000014c11000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 0000000242b6fe28 da97aa9b9f42090a ffffe8ffff2031a8 0000000000000005
 ffffffff94ca9dc0 0000000000000002 ffff880242b6fe78 ffffffff94661f05
 00001cb7dcdd1af6 ffffffff94ca9f90 ffff880242b6c000 ffffffff94d18870
Call Trace:
 [<ffffffff94661f05>] cpuidle_enter_state+0x55/0x1c0
 [<ffffffff94662127>] cpuidle_enter+0x17/0x20
 [<ffffffff940b94a3>] cpu_startup_entry+0x423/0x4d0
 [<ffffffff9402b763>] start_secondary+0x1a3/0x220
Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 ba 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 95.994 msecs


<tangent>

Also, today I learned we can reach the perf_event_read code from
read(). Given I had /proc/sys/kernel/perf_event_paranoid set to 1,
I'm not sure how this is even possible. The only user of perf_fops
is the perf_event_open syscall, _after_ it has checked that sysctl.

Oh, there's an ioctl path to perf too. Though trinity
doesn't know anything about it, so I'd find it surprising if it
managed to pull the right combination of entropy to make that
do the right thing.  Still, that ioctl path probably needs
to also be checking that sysctl, shouldn't it?
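
(For reference, a minimal user-space sketch of how a perf fd ends up in that
read() path; the specific event chosen here is arbitrary and is not meant to
reproduce whatever trinity actually did.)

#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        struct perf_event_attr attr;
        uint64_t count;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_SOFTWARE;         /* per-task software event, */
        attr.config = PERF_COUNT_SW_TASK_CLOCK; /* permitted at paranoid=1  */

        /* perf_event_paranoid is checked here, at open time ... */
        fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0)
                return 1;

        /* ... but a plain read() on the resulting fd (or on one inherited
         * across fork/exec) goes straight to perf_read() ->
         * perf_event_read() with no further sysctl check. */
        if (read(fd, &count, sizeof(count)) == sizeof(count))
                printf("%llu\n", (unsigned long long)count);

        return 0;
}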

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 14:41                             ` Don Zickus
@ 2014-11-19 15:03                               ` Vivek Goyal
  2014-11-19 15:38                                 ` Dave Jones
  2014-11-20  9:54                               ` Dave Young
  1 sibling, 1 reply; 486+ messages in thread
From: Vivek Goyal @ 2014-11-19 15:03 UTC (permalink / raw)
  To: Don Zickus
  Cc: Dave Jones, Thomas Gleixner, Linus Torvalds, Linux Kernel,
	the arch/x86 maintainers

On Wed, Nov 19, 2014 at 09:41:05AM -0500, Don Zickus wrote:
> On Tue, Nov 18, 2014 at 05:02:54PM -0500, Dave Jones wrote:
> > On Tue, Nov 18, 2014 at 04:55:40PM -0500, Don Zickus wrote:
> > 
> >  > > So here we mangle CPU3 in and lose the backtrace for cpu0, which might
> >  > > be the real interesting one ....
> >  > 
> >  > Can you provide another dump?  The hope is we get something not mangled?
> > 
> > Working on it..
> > 
> >  > The other option we have done in RHEL is panic the system and let kdump
> >  > capture the memory.  Then we can analyze the vmcore for the stack trace
> >  > cpu0 stored in memory to get a rough idea where it might be if the cpu
> >  > isn't responding very well.
> > 
> > I don't know if it's because of the debug options I typically run with,
> > or that I'm perpetually cursed, but I've never managed to get kdump to
> > do anything useful. (The last time I tried it was actively harmful in
> > that not only did it fail to dump anything, it wedged the machine so
> > it didn't reboot after panic).

Hi Dave Jones,

Not being able to capture the dump I can understand, but having wedged
the machine so that it does not reboot after the dump failure sounds bad.
So you could not get the machine to boot even after a power cycle? Do
you remember what was failing? I am curious to know what kdump did to
make the machine unbootable.

> > 
> > Unless there's some magic step missing from the documentation at
> > http://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes
> > then I'm not optimistic it'll be useful.

I had a quick look at it and it basically looks fine. In Fedora it is
ideally just a two-step process.

- Reserve memory using crashkernel. Say crashkernel=160M
- systemctl start kdump
- Crash the system or wait for it to crash.

So despite your bad experience in the past, I would encourage you to
give it a try.

> 
> Well, I don't know when the last time you ran it, but I know the RH kexec
> folks have started pursuing a Fedora-first package patch rule a couple of
> years ago to ensure Fedora had a working kexec/kdump solution.

Yep, now we are putting everything in Fedora first, so it should be much
better. It is hard to say the same thing about driver authors. Sometimes they
might have a driver working in RHEL and not necessarily upstream. I am
not sure if you ran into one of those issues.

Also recently I have seen issues with graphics drivers too.

> 
> As for the wedging part, it was a common problem to have the kernel hang
> while trying to boot the second kernel (and before console output
> happened).  So the problem makes sense and is unfortunate.  I would
> encourage you to try again.  :-)
> 
> Though, it is transitioning to have the app built into the kernel to deal
> with the whole secure boot thing, so that might be another can of worms.

I doubt that secureboot bits will contribute to the failure.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 15:03                               ` Vivek Goyal
@ 2014-11-19 15:38                                 ` Dave Jones
  2014-11-19 16:28                                   ` Vivek Goyal
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-19 15:38 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Don Zickus, Thomas Gleixner, Linus Torvalds, Linux Kernel,
	the arch/x86 maintainers

On Wed, Nov 19, 2014 at 10:03:33AM -0500, Vivek Goyal wrote:

 > Not being able to capture the dump I can understand, but having wedged
 > the machine so that it does not reboot after the dump failure sounds bad.
 > So you could not get the machine to boot even after a power cycle? Do
 > you remember what was failing? I am curious to know what kdump did to
 > make the machine unbootable.

Power cycling was fine, because then it booted into the non-kdump kernel.
The issue was when I caused that kernel to panic, it would just sit there
wedged, with no indication it even tried to switch to the kdump kernel.

 > > > Unless there's some magic step missing from the documentation at
 > > > http://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes
 > > > then I'm not optimistic it'll be useful.
 > 
 > I had a quick look at it and it basically looks fine. In Fedora it is
 > ideally just a two-step process.
 > 
 > - Reserve memory using crashkernel. Say crashkernel=160M
 > - systemctl start kdump
 > - Crash the system or wait for it to crash.
 > 
 > So despite your bad experience in the past, I would encourage you to
 > give it a try.

'The past' here is two weeks ago, on Fedora 21.

But, since then, I've reinstalled that box with Fedora 20 because I didn't
trust gcc 4.9, and on f20 things are actually even worse.

Right now it doesn't even create the image correctly:

dracut: *** Stripping files done ***
dracut: *** Store current command line parameters ***
dracut: *** Creating image file ***
dracut: *** Creating image file done ***
kdumpctl: cat: write error: Broken pipe
kdumpctl: kexec: failed to load kdump kernel
kdumpctl: Starting kdump: [FAILED]

It works if I run a Fedora kernel, but not with a self-built one.
And there's zero information as to what I'm doing wrong.

I saw something similar on F21, got past it somehow a few weeks ago,
but I can't remember what I had to do. Unfortunately that was still
fruitless as it didn't actually dump anything, leading to my frustration
with the state of kdump.

I'll try again when I put F21 back on that machine, but I'm
not particularly optimistic tbh.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 15:38                                 ` Dave Jones
@ 2014-11-19 16:28                                   ` Vivek Goyal
  2014-11-20 16:10                                     ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Vivek Goyal @ 2014-11-19 16:28 UTC (permalink / raw)
  To: Dave Jones, Don Zickus, Thomas Gleixner, Linus Torvalds,
	Linux Kernel, the arch/x86 maintainers
  Cc: WANG Chao, Baoquan He, Dave Young

On Wed, Nov 19, 2014 at 10:38:52AM -0500, Dave Jones wrote:
> On Wed, Nov 19, 2014 at 10:03:33AM -0500, Vivek Goyal wrote:
> 
>  > Not being able to capture the dump I can understand but having wedged
>  > the machine so that it does not reboot after dump failure sounds bad.
>  > So you could not get machine to boot even after a power cycle? Would
>  > you remember what was failing. I am curious to know what did kdump do
>  > to make machine unbootable.
> 
> Power cycling was fine, because then it booted into the non-kdump kernel.
> The issue was when I caused that kernel to panic, it would just sit there
> wedged, with no indication it even tried to switch to the kdump kernel.

I have seen cases where we fail to boot into the second kernel, and often the
failure happens very early without any information on the graphics console.
I always have to hook up a serial console to get an idea of what went wrong
that early. It is not an ideal situation, but at the same time I don't know
how to improve it.

I am wondering whether in some cases we panic in the second kernel and just
sit there. Probably we should automatically append a kernel command line
parameter, say "panic=1", so that it reboots itself if the second kernel panics.

By any chance, have you enabled "CONFIG_RANDOMIZE_BASE"? If yes, please
disable that, as currently the kexec/kdump stuff does not work with it. It
hangs very early in the boot process, and I had to hook up a serial console
to get the following message:

arch/x86/boot/compressed/misc.c
error("32-bit relocation outside of kernel!\n");

I noticed that error() halts in a while loop after the error message. Maybe
there could be some way for it to try to reboot instead of halting in a
while loop.

> 
>  > > > Unless there's some magic step missing from the documentation at
>  > > > http://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes
>  > > > then I'm not optimistic it'll be useful.
>  > 
 >  > I had a quick look at it and it basically looks fine. In Fedora it is
 >  > ideally just a two-step process.
>  > 
>  > - Reserve memory using crashkernel. Say crashkernel=160M
>  > - systemctl start kdump
>  > - Crash the system or wait for it to crash.
>  > 
>  > So despite your bad experience in the past, I would encourage you to
>  > give it a try.
> 
> 'the past' here, is two weeks ago, on Fedora 21.
> 
> But, since then, I've reinstalled that box with Fedora 20 because I didn't
> trust gcc 4.9, and on f20 things are actually even worse.
> 
> Right now it doesn't even create the image correctly:
> 
> dracut: *** Stripping files done ***
> dracut: *** Store current command line parameters ***
> dracut: *** Creating image file ***
> dracut: *** Creating image file done ***
> kdumpctl: cat: write error: Broken pipe
> kdumpctl: kexec: failed to load kdump kernel
> kdumpctl: Starting kdump: [FAILED]

Hmmm... can you please enable debugging in kdumpctl using "set -x", do
"touch /etc/kdump.conf; kdumpctl restart", and send me the debug output?

> 
> It works if I run a Fedora kernel, but not with a self-built one.
> And there's zero information as to what I'm doing wrong.

I just tested F20 kdump on my box and it worked fine for me.

So for you the second kernel hangs and there is no info on the console? Is
there any possibility of hooking up a serial console, enabling early printk,
and seeing if something shows up there?

Apart from this, if you run into kdump issues in Fedora, please cc the
Fedora kexec mailing list too so that we are aware of it.

https://lists.fedoraproject.org/mailman/listinfo/kexec

Thanks
Vivek

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 14:59                               ` Dave Jones
@ 2014-11-19 17:22                                 ` Linus Torvalds
  2014-11-19 17:40                                   ` Linus Torvalds
  2014-11-19 19:15                                   ` Andy Lutomirski
  2014-11-19 21:01                                 ` Andy Lutomirski
  2014-11-20 15:04                                 ` Frederic Weisbecker
  2 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-19 17:22 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Frédéric Weisbecker, Andy Lutomirski,
	Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 6:59 AM, Dave Jones <davej@redhat.com> wrote:
> On Tue, Nov 18, 2014 at 08:40:55PM -0800, Linus Torvalds wrote:
>  >
>  > That makes me wonder: does the problem go away if you disable NOHZ?
>
> Apparently not.
>
> NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c75:25175]
> CPU: 3 PID: 25175 Comm: trinity-c75 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 9/410 27945]
> RIP: 0010:[<ffffffff94175be7>]  [<ffffffff94175be7>] context_tracking_user_exit+0x57/0x120
> Call Trace:
>  [<ffffffff94012c25>] syscall_trace_enter_phase1+0x125/0x1a0
>  [<ffffffff9437b3be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff947d41bf>] tracesys+0x14/0x4a

Ok, that's just crazy. This is the system call *entry* portion.

Last time it was the system call exit side, which at least made some
sense, because the "return to user space" thing actually has loops in
it to handle all events before we return to user space.

But the whole 'tracesys' part is only entered once, at the very
beginning of a system call. There's no loop over the work. That whole
call trace implies that the lockup happened just after we entered the
system call path from _user_ space.

And in fact, exactly like last time, the code line implies that the
timer interrupt happened on the return from the instruction, and
indeed in both cases the code looked like this (the registers
differed, but the "restore flags, start popping saved regs" was the
exact same):

  26: 53                   push   %rbx
  27: 9d                   popfq
  28:* 5b                   pop    %rbx <-- trapping instruction
  29: 41 5c                 pop    %r12

in both cases, the timer interrupt happened right after the "popfq",
but in both cases the value in the register that was used to restore
eflags was invalid. Here %rbx was 0x0000000100000046 (which is a valid
eflags value, but not the one we've actually restored!), and in your
previous oops (where it was %r12) it was completely invalid.

So it hasn't actually done the "push %rbx; popfq" part - there must be
a label at the return part, and context_tracking_user_exit() never
actually did the local_irq_save/restore at all. Which means that it
took one of the early exits instead:

        if (!context_tracking_is_enabled())
                return;

        if (in_interrupt())
                return;

So not only does this happen at early system call entry time, the
function that is claimed to lock up doesn't actually *do* anything.

Ho humm..

Oh, and to make matters worse, the only way this call chain can happen
is this in syscall_trace_enter_phase1():

        if (work & _TIF_NOHZ) {
                user_exit();
                work &= ~TIF_NOHZ;
        }

so there's still some NOHZ confusion there. It looks like TIF_NOHZ
gets set regardless of whether NOHZ is enabled or not..

I'm adding Frederic explicitly to the cc too, because this is just
fishy.  I am starting to blame context tracking, because it has now
shown up twice in different guises, and TIF_NOHZ seems to be
implicated.

> CPU: 1 PID: 25164 Comm: trinity-c64 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 9/410 27945]
> RIP: 0010:[<ffffffff940fb71e>]  [<ffffffff940fb71e>] generic_exec_single+0xee/0x1a0
> Call Trace:
>  [<ffffffff940fb89a>] smp_call_function_single+0x6a/0xe0
>  [<ffffffff941671aa>] perf_event_read+0xca/0xd0
>  [<ffffffff94167240>] perf_event_read_value+0x90/0xe0
>  [<ffffffff941689c6>] perf_read+0x226/0x370
>  [<ffffffff941eafff>] vfs_read+0x9f/0x180

Hmm.. We've certainly seen a lot of smp_call, for various different
reasons in your traces..

I'm wondering if the smp-call ended up corrupting something on CPU3.
Because even _with_ TIF_NOHZ confusion, I don't see how system call
*entry* could cause a watchdog event. There are no loops, there are no
locks I see, there is just *nothing* there I can see.

Let's add Andy L to the cc too, in case he hasn't seen this.  He's
been changing the lowlevel asm code, including very much this whole
"syscall_trace_enter_phase1" thing. Maybe he sees something I don't.

Andy?

> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 11/410 27945]
> RIP: 0010:[<ffffffff943dd415>]  [<ffffffff943dd415>] intel_idle+0xd5/0x180
> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 10/410 27945]
> RIP: 0010:[<ffffffff943dd415>]  [<ffffffff943dd415>] intel_idle+0xd5/0x180

Nothing there.

> Also, today I learned we can reach the perf_event_read code from
> read(). Given I had /proc/sys/kernel/perf_event_paranoid set to 1,
> I'm not sure how this is even possible. The only user of perf_fops
> is perf_event_open syscall _after_ it's checked that sysctl.
>
> Oh, there's an ioctl path to perf too. Though trinity
> doesn't know anything about it, so I find it surprising if it
> managed to pull the right combination of entropy to make that
> do the right thing.  Still, that ioctl path probably needs
> to also be checking that sysctl shouldn't it ?

Hmm. Perf people are already mostly on the list. Peter/Ingo/Arnaldo?

                      Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 17:22                                 ` Linus Torvalds
@ 2014-11-19 17:40                                   ` Linus Torvalds
  2014-11-19 19:02                                     ` Frederic Weisbecker
  2014-11-19 19:15                                   ` Andy Lutomirski
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-19 17:40 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Frédéric Weisbecker, Andy Lutomirski,
	Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 9:22 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So it hasn't actually done the "push %rbx; popfq" part - there must be
> a label at the return part, and context_tracking_user_exit() never
> actually did the local_irq_save/restore at all. Which means that it
> took one of the early exits instead:
>
>         if (!context_tracking_is_enabled())
>                 return;
>
>         if (in_interrupt())
>                 return;

Ho humm. Interesting. Neither of those should possibly have happened.

We "know" that "context_tracking_is_enabled()" must be true, because
the only way we get to context_tracking_user_exit() in the first place
is through "user_exit()", which does:

        if (context_tracking_is_enabled())
                context_tracking_user_exit();

and we know we shouldn't be in_interrupt(), because the backtrace is
the system call entry path, for chrissake!
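
(For reference: around 3.18 that gate is a jump-label test. The following is a
sketch paraphrased from memory rather than quoted verbatim, and it is why a
corrupted static key shows up in the list of possibilities below.)

/* sketch of include/linux/context_tracking_state.h, circa 3.18 */
extern struct static_key context_tracking_enabled;

static inline bool context_tracking_is_enabled(void)
{
        return static_key_false(&context_tracking_enabled);
}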

So we definitely have some corruption going on. A few possibilities:

 - either the register contents are corrupted (%rbx in your dump said
"0x0000000100000046", but the eflags we restored was 0x246)

 - in_interrupt() is wrong, and we've had some irq_count() corruption.
I'd expect that to result in "scheduling while atomic" messages,
though, especially if it goes on long enough that you get a watchdog
event..

 - there is something rotten in the land of
context_tracking_is_enabled(), which uses a static key.

 - I have misread the whole trace, and am a moron. But your earlier
report really had some very similar things, just in
context_tracking_user_enter() instead of exit.

In your previous oops, the register that was allegedly used to
restore %eflags was %r12:

  28: 41 54                 push   %r12
  2a: 9d                   popfq
  2b:* 5b                   pop    %rbx <-- trapping instruction
  2c: 41 5c                 pop    %r12
  2e: 5d                   pop    %rbp
  2f: c3                   retq

but:

  R12: ffff880101ee3ec0
  EFLAGS: 00000282

so again, it looks like we never actually did that "popfq"
instruction, and it would have exited through the (same) early exits.

But what an odd coincidence that it ended up in both of your reports
being *exactly* at that instruction after the "popf". If it had
actually *taken* the popf, I'd not be so surprised ("ok, popf enabled
interrupts, and there was an interrupt pending"), but since everything
seems to say that it came there through some control flow that did
*not* go through the popf, that's just a very odd coincidence.

And both context_tracking_user_enter() and exit() have that exact same
issue with the early returns. They shouldn't have happened in the
first place.

                      Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 17:40                                   ` Linus Torvalds
@ 2014-11-19 19:02                                     ` Frederic Weisbecker
  2014-11-19 19:03                                       ` Andy Lutomirski
  2014-11-19 21:56                                       ` Thomas Gleixner
  0 siblings, 2 replies; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-19 19:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Don Zickus, Thomas Gleixner, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra, Andy Lutomirski,
	Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 09:40:26AM -0800, Linus Torvalds wrote:
> On Wed, Nov 19, 2014 at 9:22 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So it hasn't actually done the "push %rbx; popfq" part - there must be
> > a label at the return part, and context_tracking_user_exit() never
> > actually did the local_irq_save/restore at all. Which means that it
> > took one of the early exits instead:
> >
> >         if (!context_tracking_is_enabled())
> >                 return;
> >
> >         if (in_interrupt())
> >                 return;
> 
> Ho humm. Interesting. Neither of those should possibly have happened.
> 
> We "know" that "context_tracking_is_enabled()" must be true, because
> the only way we get to context_tracking_user_exit() in the first place
> is through "user_exit()", which does:
> 
>         if (context_tracking_is_enabled())
>                 context_tracking_user_exit();
> 
> and we know we shouldn't be in_interrupt(), because the backtrace is
> the system call entry path, for chrissake!
> 
> So we definitely have some corruption going on. A few possibilities:
> 
>  - either the register contents are corrupted (%rbx in your dump said
> "0x0000000100000046", but the eflags we restored was 0x246)
> 
>  - in_interrupt() is wrong, and we've had some irq_count() corruption.
> I'd expect that to result in "scheduling while atomic" messages,
> though, especially if it goes on long enough that you get a watchdog
> event..
> 
>  - there is something rotten in the land of
> context_tracking_is_enabled(), which uses a static key.
> 
>  - I have misread the whole trace, and am a moron. But your earlier
> report really had some very similar things, just in
> context_tracking_user_enter() instead of exit.
> 
> In your previous oops, the register that was allegedly used to
> restore %eflags was %r12:
> 
>   28: 41 54                 push   %r12
>   2a: 9d                   popfq
>   2b:* 5b                   pop    %rbx <-- trapping instruction
>   2c: 41 5c                 pop    %r12
>   2e: 5d                   pop    %rbp
>   2f: c3                   retq
> 
> but:
> 
>   R12: ffff880101ee3ec0
>   EFLAGS: 00000282
> 
> so again, it looks like we never actually did that "popfq"
> instruction, and it would have exited through the (same) early exits.
> 
> But what an odd coincidence that it ended up in both of your reports
> being *exactly* at that instruction after the "popf". If it had
> actually *taken* the popf, I'd not be so surprised ("ok, popf enabled
> interrupts, and there was an interrupt pending"), but since everything
> seems to say that it came there through some control flow that did
> *not* go through the popf, that's just a very odd coincidence.
> 
> And both context_tracking_user_enter() and exit() have that exact same
> issue with the early returns. They shouldn't have happened in the
> first place.

I got a report lately involving context tracking. Not sure if it's
the same here, but the issue was that context tracking uses per cpu data,
per cpu allocations use vmalloc, and vmalloc'ed areas can fault due to
lazy paging.

With that in mind, just about anything can happen. Some parts of the context
tracking code really aren't fault-safe (or more generally exception-safe).
That's because context tracking itself tracks exceptions.

So for example if we enter a syscall, we go to context_tracking_user_exit() then
vtime_user_enter() which _takes a lock_ with write_seqlock().

If an exception occurs before we unlock the seqlock (it's possible for
example account_user_time() -> task_group_account_field()-> cpuacct_account_field()
accesses dynamically allocated per cpu area which can fault) then
the fault calls exception_enter() then user_exit() which does all the same again
and deadlocks.

I can certainly fix that with a bit of recursion protection.

Now we just need to determine if the current case has the same cause.
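
(A rough idea of what such protection could look like; this is purely a
sketch, with an invented per-cpu flag name, and is not the fix that
eventually went in.)

/* Sketch only: guard the context tracking slow path with a per-cpu flag so
 * that an exception taken while we hold the vtime seqlock bails out instead
 * of recursing into the same path.  'ct_in_transition' is a made-up name. */
static DEFINE_PER_CPU(int, ct_in_transition);

void context_tracking_user_exit(void)
{
        unsigned long flags;

        if (!context_tracking_is_enabled())
                return;

        if (in_interrupt())
                return;

        if (__this_cpu_read(ct_in_transition))
                return;                 /* re-entered via an exception */

        __this_cpu_write(ct_in_transition, 1);

        local_irq_save(flags);
        /* ... existing state update and vtime/seqlock work elided ... */
        local_irq_restore(flags);

        __this_cpu_write(ct_in_transition, 0);
}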

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 19:02                                     ` Frederic Weisbecker
@ 2014-11-19 19:03                                       ` Andy Lutomirski
  2014-11-19 23:00                                         ` Frederic Weisbecker
  2014-11-19 21:56                                       ` Thomas Gleixner
  1 sibling, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-19 19:03 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Linus Torvalds, Dave Jones, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 11:02 AM, Frederic Weisbecker
<fweisbec@gmail.com> wrote:
> On Wed, Nov 19, 2014 at 09:40:26AM -0800, Linus Torvalds wrote:
>> On Wed, Nov 19, 2014 at 9:22 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> >
>> > So it hasn't actually done the "push %rbx; popfq" part - there must be
>> > a label at the return part, and context_tracking_user_exit() never
>> > actually did the local_irq_save/restore at all. Which means that it
>> > took one of the early exits instead:
>> >
>> >         if (!context_tracking_is_enabled())
>> >                 return;
>> >
>> >         if (in_interrupt())
>> >                 return;
>>
>> Ho humm. Interesting. Neither of those should possibly have happened.
>>
>> We "know" that "context_tracking_is_enabled()" must be true, because
>> the only way we get to context_tracking_user_exit() in the first place
>> is through "user_exit()", which does:
>>
>>         if (context_tracking_is_enabled())
>>                 context_tracking_user_exit();
>>
>> and we know we shouldn't be in_interrupt(), because the backtrace is
>> the system call entry path, for chrissake!
>>
>> So we definitely have some corruption going on. A few possibilities:
>>
>>  - either the register contents are corrupted (%rbx in your dump said
>> "0x0000000100000046", but the eflags we restored was 0x246)
>>
>>  - in_interrupt() is wrong, and we've had some irq_count() corruption.
>> I'd expect that to result in "scheduling while atomic" messages,
>> though, especially if it goes on long enough that you get a watchdog
>> event..
>>
>>  - there is something rotten in the land of
>> context_tracking_is_enabled(), which uses a static key.
>>
>>  - I have misread the whole trace, and am a moron. But your earlier
>> report really had some very similar things, just in
>> context_tracking_user_enter() instead of exit.
>>
>> In your previous oops, the register that was allegedly used to
>> restore %eflags was %r12:
>>
>>   28: 41 54                 push   %r12
>>   2a: 9d                   popfq
>>   2b:* 5b                   pop    %rbx <-- trapping instruction
>>   2c: 41 5c                 pop    %r12
>>   2e: 5d                   pop    %rbp
>>   2f: c3                   retq
>>
>> but:
>>
>>   R12: ffff880101ee3ec0
>>   EFLAGS: 00000282
>>
>> so again, it looks like we never actually did that "popfq"
>> instruction, and it would have exited through the (same) early exits.
>>
>> But what an odd coincidence that it ended up in both of your reports
>> being *exactly* at that instruction after the "popf". If it had
>> actually *taken* the popf, I'd not be so surprised ("ok, popf enabled
>> interrupts, and there was an interrupt pending"), but since everything
>> seems to say that it came there through some control flow that did
>> *not* go through the popf, that's just a very odd coincidence.
>>
>> And both context_tracking_user_enter() and exit() have that exact same
>> issue with the early returns. They shouldn't have happened in the
>> first place.
>
> I got a report lately involving context tracking. Not sure if it's
> the same here, but the issue was that context tracking uses per cpu data,
> per cpu allocations use vmalloc, and vmalloc'ed areas can fault due to
> lazy paging.

Wait, what?  If something like kernel_stack ends with an unmapped pmd,
we are well and truly screwed.

--Andy

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 17:22                                 ` Linus Torvalds
  2014-11-19 17:40                                   ` Linus Torvalds
@ 2014-11-19 19:15                                   ` Andy Lutomirski
  2014-11-19 19:38                                     ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-19 19:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Don Zickus, Thomas Gleixner, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra,
	Frédéric Weisbecker, Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 9:22 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Nov 19, 2014 at 6:59 AM, Dave Jones <davej@redhat.com> wrote:
>> On Tue, Nov 18, 2014 at 08:40:55PM -0800, Linus Torvalds wrote:
>>  >
>>  > That makes me wonder: does the problem go away if you disable NOHZ?
>>
>> Aparently not.
>>
>> NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c75:25175]
>> CPU: 3 PID: 25175 Comm: trinity-c75 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 9/410 27945]
>> RIP: 0010:[<ffffffff94175be7>]  [<ffffffff94175be7>] context_tracking_user_exit+0x57/0x120
>> Call Trace:
>>  [<ffffffff94012c25>] syscall_trace_enter_phase1+0x125/0x1a0
>>  [<ffffffff9437b3be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>>  [<ffffffff947d41bf>] tracesys+0x14/0x4a
>
> Ok, that's just crazy. This is the system call *entry* portion.
>
> Last time it was the system call exit side, which at least made some
> sense, because the "return to user space" thing actually has loops in
> it to handle all events before we return to user space.
>
> But the whole 'tracesys' part is only entered once, at the very
> beginning of a system call. There's no loop over the work. That whole
> call trace implies that the lockup happened just after we entered the
> system call path from _user_ space.

I suspect that the regression was triggered by the seccomp pull, since
that reworked a lot of this code.

>
> And in fact, exactly like last time, the code line implies that the
> timer interrupt happened on the return from the instruction, and
> indeed in both cases the code looked like this (the registers
> differed, but the "restore flags, start popping saved regs" was the
> exact same):
>
>   26: 53                   push   %rbx
>   27: 9d                   popfq
>   28:* 5b                   pop    %rbx <-- trapping instruction
>   29: 41 5c                 pop    %r12

Just to make sure I understand: it says "NMI watchdog", but this trace
is from a timer interrupt, not NMI, right?

>
> in both cases, the timer interrupt happened right after the "popfq",
> but in both cases the value in the register that was used to restore
> eflags was invalid. Here %rbx was 0x0000000100000046 (which is a valid
> eflags value, but not the one we've actually restored!), and in your
> previous oops (where it was %r12) it was completely invalid.
>
> So it hasn't actually done the "push %rbx; popfq" part - there must be
> a label at the return part, and context_tracking_user_exit() never
> actually did the local_irq_save/restore at all. Which means that it
> took one of the early exits instead:
>
>         if (!context_tracking_is_enabled())
>                 return;
>
>         if (in_interrupt())
>                 return;
>
> So not only does this happen at early system call entry time, the
> function that is claimed to lock up doesn't actually *do* anything.
>
> Ho humm..
>
> Oh, and to make matters worse, the only way this call chain can happen
> is this in syscall_trace_enter_phase1():
>
>         if (work & _TIF_NOHZ) {
>                 user_exit();
>                 work &= ~TIF_NOHZ;
>         }
>
> so there's still some NOHZ confusion there. It looks like TIF_NOHZ
> gets set regardless of whether NOHZ is enabled or not..
>
> I'm adding Frederic explicitly to the cc too, because this is just
> fishy.  I am starting to blame context tracking, because it has now
> shown up twice in different guises, and TIF_NOHZ seems to be
> implicated.

Is it possible that we've managed to return to userspace with
interrupts off somehow?  A loop in userspace that somehow has
interrupts off can cause all kinds of fun lockups.

I don't understand the logic of what enables TIF_NOHZ.  That being
said, in the new 3.18 code, if TIF_NOHZ is set, we use part of the fast
path instead of the full syscall slow path, which means that we
meander differently through the asm than we used to: we do
syscall_trace_enter_phase1, then a fast-path syscall, then we get to
sysret_careful, which does this:

    /*
     * We have a signal, or exit tracing or single-step.
     * These all wind up with the iret return path anyway,
     * so just join that path right now.
     */
    FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
    jmp int_check_syscall_exit_work


In 3.17, I don't think that code would run with context tracking on,
although I don't immediately see any bugs here.

>
>> CPU: 1 PID: 25164 Comm: trinity-c64 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 9/410 27945]
>> RIP: 0010:[<ffffffff940fb71e>]  [<ffffffff940fb71e>] generic_exec_single+0xee/0x1a0
>> Call Trace:
>>  [<ffffffff940fb89a>] smp_call_function_single+0x6a/0xe0
>>  [<ffffffff941671aa>] perf_event_read+0xca/0xd0
>>  [<ffffffff94167240>] perf_event_read_value+0x90/0xe0
>>  [<ffffffff941689c6>] perf_read+0x226/0x370
>>  [<ffffffff941eafff>] vfs_read+0x9f/0x180
>
> Hmm.. We've certainly seen a lot of smp_call, for various different
> reasons in your traces..
>
> I'm wondering if the smp-call ended up corrupting something on CPU3.
> Because even _with_ TIF_NOHZ confusion, I don't see how system call
> *entry* could cause a watchdog event. There are no loops, there are no
> locks I see, there is just *nothing* there I can see.
>

If we ever landed in userspace with interrupts off, this could happen
quite easily.  It should be straightforward to add an assertion for
that, in trinity or in the kernel.
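
For what it's worth, a userspace-side assertion along those lines could be
as simple as the sketch below; the helper name and where trinity would call
it are made up, only the eflags bit position (IF is bit 9) is architectural:

    #include <stdlib.h>

    /* abort loudly if we ever find ourselves running in userspace with
     * the interrupt flag cleared */
    static void assert_interrupts_enabled(void)
    {
            unsigned long flags;

            asm volatile ("pushf ; pop %0" : "=r" (flags));
            if (!(flags & (1UL << 9)))      /* eflags.IF */
                    abort();
    }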

--Andy

> Let's add Andy L to the cc too, in case he hasn't seen this.  He's
> been changing the lowlevel asm code, including very much this whole
> "syscall_trace_enter_phase1" thing. Maybe he sees something I don't.
>
> Andy?
>
>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 11/410 27945]
>> RIP: 0010:[<ffffffff943dd415>]  [<ffffffff943dd415>] intel_idle+0xd5/0x180
>> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 10/410 27945]
>> RIP: 0010:[<ffffffff943dd415>]  [<ffffffff943dd415>] intel_idle+0xd5/0x180
>
> Nothing there.
>
>> Also, today I learned we can reach the perf_event_read code from
>> read(). Given I had /proc/sys/kernel/perf_event_paranoid set to 1,
>> I'm not sure how this is even possible. The only user of perf_fops
>> is perf_event_open syscall _after_ it's checked that sysctl.
>>
>> Oh, there's an ioctl path to perf too. Though trinity
>> doesn't know anything about it, so I find it surprising if it
>> managed to pull the right combination of entropy to make that
>> do the right thing.  Still, that ioctl path probably needs
>> to also be checking that sysctl shouldn't it ?
>
> Hmm. Perf people are already mostly on the list. Peter/Ingo/Arnaldo?
>
>                       Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 19:15                                   ` Andy Lutomirski
@ 2014-11-19 19:38                                     ` Linus Torvalds
  2014-11-19 22:18                                       ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-19 19:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Jones, Don Zickus, Thomas Gleixner, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra,
	Frédéric Weisbecker, Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 11:15 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> I suspect that the regression was triggered by the seccomp pull, since
> that reworked a lot of this code.

Note that it turns out that Dave can apparently see the same problems
with 3.17, so it's not actually a regression; it may have been going on
for a while.


> Just to make sure I understand: it says "NMI watchdog", but this trace
> is from a timer interrupt, not NMI, right?

Yeah. The kernel/watchdog.c code always says "NMI watchdog", but it's
actually just a regular timer function: watchdog_timer_fn(), started
with hrtimer_start().
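
Roughly, the arming looks like the sketch below. This is a simplified
illustration of the kernel/watchdog.c pattern, not the exact code; treat the
names other than watchdog_timer_fn()/hrtimer_start() as placeholders:

    static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
    static u64 sample_period;   /* nanoseconds between checks */

    static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
    {
            /* compare per-cpu progress timestamps here; if this CPU has
             * been stuck for too long, print the soft lockup splat */
            hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));
            return HRTIMER_RESTART;
    }

    static void watchdog_enable_sketch(void)
    {
            struct hrtimer *hrtimer = this_cpu_ptr(&watchdog_hrtimer);

            hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
            hrtimer->function = watchdog_timer_fn;
            hrtimer_start(hrtimer, ns_to_ktime(sample_period),
                          HRTIMER_MODE_REL_PINNED);
    }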

> Is it possible that we've managed to return to userspace with
> interrupts off somehow?  A loop in userspace that somehow has
> interrupts off can cause all kinds of fun lockups.

That sounds unlikely, but it's not impossible if there is some stack
corruption going on.

However, it wouldn't even explain things, because even if interrupts
had been disabled in user space, and even if that popf got executed,
this wouldn't be where they got enabled. That would be the :"sti" in
the system call entry path (hidden behind the ENABLE_INTERRUPTS
macro).

Of course, maybe Dave has paravirtualization enabled (what a crock
_that_ is), and there is something wrong with that whole code.

> I don't understand the logic of what enables TIF_NOHZ.

Yeah, that makes two of us.  But..

> In 3.17, I don't think that code would run with context tracking on,
> although I don't immediately see any bugs here.

See above: the problem apparently isn't new. Although it is possible
that we have two different issues going on..

                      Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 14:59                               ` Dave Jones
  2014-11-19 17:22                                 ` Linus Torvalds
@ 2014-11-19 21:01                                 ` Andy Lutomirski
  2014-11-19 21:47                                   ` Dave Jones
                                                     ` (2 more replies)
  2014-11-20 15:04                                 ` Frederic Weisbecker
  2 siblings, 3 replies; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-19 21:01 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra

On 11/19/2014 06:59 AM, Dave Jones wrote:
> On Tue, Nov 18, 2014 at 08:40:55PM -0800, Linus Torvalds wrote:
>  > On Tue, Nov 18, 2014 at 6:19 PM, Dave Jones <davej@redhat.com> wrote:
>  > >
>  > > NMI watchdog: BUG: soft lockup - CPU#2 stuck for 21s! [trinity-c42:31480]
>  > > CPU: 2 PID: 31480 Comm: trinity-c42 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 9/411 32140]
>  > > RIP: 0010:[<ffffffff8a1798b4>]  [<ffffffff8a1798b4>] context_tracking_user_enter+0xa4/0x190
>  > > Call Trace:
>  > >  [<ffffffff8a012fc5>] syscall_trace_leave+0xa5/0x160
>  > >  [<ffffffff8a7d8624>] int_check_syscall_exit_work+0x34/0x3d
>  > 
>  > Hmm, if we are getting soft-lockups here, maybe it suggest too much exit-work.
>  > 
>  > Some TIF_NOHZ loop, perhaps? You have nohz on, don't you?
>  > 
>  > That makes me wonder: does the problem go away if you disable NOHZ?
> 
> Aparently not.

TIF_NOHZ is not the same thing as NOHZ.  Can you try a kernel with
CONFIG_CONTEXT_TRACKING=n?  Doing that may involve fiddling with RCU
settings a bit.  The normal no HZ idle stuff has nothing to do with
TIF_NOHZ, and you either have TIF_NOHZ set or you have some kind of
thread_info corruption going on here.

Hmm.  This isn't a stack overflow, is it?  That could cause all of these
problems quite easily, although I'd expect other symptoms, too.

> 
> NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c75:25175]
> CPU: 3 PID: 25175 Comm: trinity-c75 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 9/410 27945]
> task: ffff8800364e44d0 ti: ffff880192d2c000 task.ti: ffff880192d2c000
> RIP: 0010:[<ffffffff94175be7>]  [<ffffffff94175be7>] context_tracking_user_exit+0x57/0x120

This RIP should be impossible if context tracking is off.

> RSP: 0018:ffff880192d2fee8  EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000100000046 RCX: 000000336ee35b47

                                    ^^^^^^^^^

That is a strange coincidence.  Where did 0x46 | (1<<32) come from?
That's a sensible interrupts-disabled flags value with the high part set
to 0x1.  Those high bits are undefined, but they ought to all be zero.
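
For reference, here is a tiny userspace decode of the two values being
compared; the bit positions (IF=9, ZF=6, PF=2) are architectural, the rest
is just illustration:

    #include <stdio.h>

    int main(void)
    {
            unsigned long vals[] = { 0x246UL, 0x0000000100000046UL };

            for (int i = 0; i < 2; i++) {
                    unsigned long f = vals[i];
                    printf("%#018lx: IF=%lu ZF=%lu PF=%lu high32=%#lx\n",
                           f, (f >> 9) & 1, (f >> 6) & 1,
                           (f >> 2) & 1, f >> 32);
            }
            return 0;
    }

0x246 decodes with IF set (interrupts enabled); 0x0000000100000046 decodes
with IF clear and bit 32 set, which should never happen for a real flags
image.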

> RDX: 0000000000000001 RSI: ffffffff94ac1e84 RDI: ffffffff94a93725
> RBP: ffff880192d2fef8 R08: 00007f9b74d0b740 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff940d8503
> R13: ffff880192d2fe98 R14: ffffffff943884e7 R15: ffff880192d2fe48
> FS:  00007f9b74d0b740(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000336f1b7740 CR3: 0000000229a95000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
>  ffff880192d30000 0000000000080000 ffff880192d2ff78 ffffffff94012c25
>  00007f9b747a5000 00007f9b747a5068 0000000000000000 0000000000000000
>  0000000000000000 ffffffff9437b3be 0000000000000000 0000000000000000
> Call Trace:
>  [<ffffffff94012c25>] syscall_trace_enter_phase1+0x125/0x1a0
>  [<ffffffff9437b3be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff947d41bf>] tracesys+0x14/0x4a
> Code: 42 fd ff 48 c7 c7 7a 1e ac 94 e8 25 29 21 00 65 8b 04 25 34 f7 1c 00 83 f8 01 74 28 f6 c7 02 74 13 0f 1f 00 e8 bb 43 fd ff 53 9d <5b> 41 5c 5d c3 0f 1f 40 00 53 9d e8 89 42 fd ff eb ee 0f 1f 80 
> sending NMI to other CPUs:
> NMI backtrace for cpu 1
> CPU: 1 PID: 25164 Comm: trinity-c64 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 9/410 27945]
> task: ffff88011600dbc0 ti: ffff8801a99a4000 task.ti: ffff8801a99a4000
> RIP: 0010:[<ffffffff940fb71e>]  [<ffffffff940fb71e>] generic_exec_single+0xee/0x1a0
> RSP: 0018:ffff8801a99a7d18  EFLAGS: 00000202
> RAX: 0000000000000000 RBX: ffff8801a99a7d20 RCX: 0000000000000038
> RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
> RBP: ffff8801a99a7d78 R08: ffff880242b57ce0 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
> R13: 0000000000000001 R14: ffff880083c28948 R15: ffffffff94166aa0
> FS:  00007f9b74d0b740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000001 CR3: 00000001d8611000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
>  ffff8801a99a7d28 0000000000000000 ffffffff94166aa0 ffff880083c28948
>  0000000000000003 00000000e38f9aac ffff880083c28948 00000000ffffffff
>  0000000000000003 ffffffff94166aa0 ffff880083c28948 0000000000000001
> Call Trace:
>  [<ffffffff94166aa0>] ? perf_swevent_add+0x120/0x120
>  [<ffffffff94166aa0>] ? perf_swevent_add+0x120/0x120
>  [<ffffffff940fb89a>] smp_call_function_single+0x6a/0xe0
>  [<ffffffff940a172b>] ? preempt_count_sub+0x7b/0x100
>  [<ffffffff941671aa>] perf_event_read+0xca/0xd0
>  [<ffffffff94167240>] perf_event_read_value+0x90/0xe0
>  [<ffffffff941689c6>] perf_read+0x226/0x370
>  [<ffffffff942fbfb7>] ? security_file_permission+0x87/0xa0
>  [<ffffffff941eafff>] vfs_read+0x9f/0x180
>  [<ffffffff941ebbd8>] SyS_read+0x58/0xd0
>  [<ffffffff947d42c9>] tracesys_phase2+0xd4/0xd9

Riddle me this: what are we doing in tracesys_phase2?  This is a full
slow-path syscall.  TIF_NOHZ doesn't cause that, I think.  I'd love to
see the value of ti->flags here.  Is trinity using ptrace?
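
(A throwaway way to grab that value, purely as a debugging hack; exactly
where to hook it into the slow-path C code is a judgment call:)

    /* one-off debug hack: dump the thread_info flags on this path */
    pr_warn("syscall slow path: ti->flags = %#lx\n",
            (unsigned long)current_thread_info()->flags);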

Um.  There's a bug.  Patch coming after lunch.  No clue whether it will
help here.

--Andy

> Code: 48 89 de 48 03 14 c5 20 65 d1 94 48 89 df e8 8a 4b 28 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 4d c8 65 48 33 0c 25 28 00 00 00 0f 85 8e 00 
> NMI backtrace for cpu 0
> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 35.055 msecs
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 11/410 27945]
> task: ffffffff94c164c0 ti: ffffffff94c00000 task.ti: ffffffff94c00000
> RIP: 0010:[<ffffffff943dd415>]  [<ffffffff943dd415>] intel_idle+0xd5/0x180
> RSP: 0018:ffffffff94c03e28  EFLAGS: 00000046
> RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
> RDX: 0000000000000000 RSI: ffffffff94c03fd8 RDI: 0000000000000000
> RBP: ffffffff94c03e58 R08: 000000008baf8b86 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
> R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff94c00000
> FS:  0000000000000000(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f162e060000 CR3: 0000000014c11000 CR4: 00000000001407f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
>  0000000094c03e58 5955c5b31ad5e8cf ffffe8fffee031a8 0000000000000005
>  ffffffff94ca9dc0 0000000000000000 ffffffff94c03ea8 ffffffff94661f05
>  00001cb7dcf6fd93 ffffffff94ca9f90 ffffffff94c00000 ffffffff94d18870
> Call[31557.908912] NMI backtrace for cpu 2
> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 68.178 msecs
> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 10/410 27945]
> task: ffff880242b596f0 ti: ffff880242b6c000 task.ti: ffff880242b6c000
> RIP: 0010:[<ffffffff943dd415>]  [<ffffffff943dd415>] intel_idle+0xd5/0x180
> RSP: 0018:ffff880242b6fdf8  EFLAGS: 00000046
> RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
> RDX: 0000000000000000 RSI: ffff880242b6ffd8 RDI: 0000000000000002
> RBP: ffff880242b6fe28 R08: 000000008baf8b86 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
> R13: 0000000000000032 R14: 0000000000000004 R15: ffff880242b6c000
> FS:  0000000000000000(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 0000000014c11000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
>  0000000242b6fe28 da97aa9b9f42090a ffffe8ffff2031a8 0000000000000005
>  ffffffff94ca9dc0 0000000000000002 ffff880242b6fe78 ffffffff94661f05
>  00001cb7dcdd1af6 ffffffff94ca9f90 ffff880242b6c000 ffffffff94d18870
> Call Trace:
>  [<ffffffff94661f05>] cpuidle_enter_state+0x55/0x1c0
>  [<ffffffff94662127>] cpuidle_enter+0x17/0x20
>  [<ffffffff940b94a3>] cpu_startup_entry+0x423/0x4d0
>  [<ffffffff9402b763>] start_secondary+0x1a3/0x220
> Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 ba 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 95.994 msecs
> 
> 
> <tangent>
> 
> Also, today I learned we can reach the perf_event_read code from
> read(). Given I had /proc/sys/kernel/perf_event_paranoid set to 1,
> I'm not sure how this is even possible. The only user of perf_fops
> is perf_event_open syscall _after_ it's checked that sysctl.
> 
> Oh, there's an ioctl path to perf too. Though trinity
> doesn't know anything about it, so I find it surprising if it
> managed to pull the right combination of entropy to make that
> do the right thing.  Still, that ioctl path probably needs
> to also be checking that sysctl shouldn't it ?
> 
> 	Dave
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 21:01                                 ` Andy Lutomirski
@ 2014-11-19 21:47                                   ` Dave Jones
  2014-11-19 21:58                                     ` Borislav Petkov
  2014-11-19 21:56                                   ` [PATCH] x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1 Andy Lutomirski
  2014-11-20 15:25                                   ` frequent lockups in 3.18rc4 Dave Jones
  2 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-19 21:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Don Zickus, Thomas Gleixner, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra

On Wed, Nov 19, 2014 at 01:01:36PM -0800, Andy Lutomirski wrote:
 
 > TIF_NOHZ is not the same thing as NOHZ.  Can you try a kernel with
 > CONFIG_CONTEXT_TRACKING=n?  Doing that may involve fiddling with RCU
 > settings a bit.  The normal no HZ idle stuff has nothing to do with
 > TIF_NOHZ, and you either have TIF_NOHZ set or you have some kind of
 > thread_info corruption going on here.

I'll try that next.

 > > RSP: 0018:ffff880192d2fee8  EFLAGS: 00000246
 > > RAX: 0000000000000000 RBX: 0000000100000046 RCX: 000000336ee35b47
 > 
 >                                     ^^^^^^^^^
 > 
 > That is a strange coincidence.  Where did 0x46 | (1<<32) come from?
 > That's a sensible interrupts-disabled flags value with the high part set
 > to 0x1.  Those high bits are undefined, but they ought to all be zero.

This box is usually pretty solid, but it's been in service as a 24/7
fuzzing box for over a year now, so it's not outside the realm of
possibility that this could all be a hardware fault if some memory
has gone bad or the like.  Unless we find something obvious in the
next few days, I'll try running memtest over the weekend (though
I've seen situations where that doesn't stress hardware enough to
manifest a problem, so it might not be entirely conclusive unless
it actually finds a fault).

I wish I had a second identical box to see if it would be reproducible.

 > >  [<ffffffff941689c6>] perf_read+0x226/0x370
 > >  [<ffffffff942fbfb7>] ? security_file_permission+0x87/0xa0
 > >  [<ffffffff941eafff>] vfs_read+0x9f/0x180
 > >  [<ffffffff941ebbd8>] SyS_read+0x58/0xd0
 > >  [<ffffffff947d42c9>] tracesys_phase2+0xd4/0xd9
 > 
 > Riddle me this: what are we doing in tracesys_phase2?  This is a full
 > slow-path syscall.  TIF_NOHZ doesn't cause that, I think.  I'd love to
 > see the value of ti->flags here.  Is trinity using ptrace?
 
That's one of the few syscalls we actually blacklist (mostly because it
requires some more thinking: just passing it crap can get the fuzzer
into a confused state where it thinks child processes are dead when
they aren't, etc.).  So it should never be calling ptrace.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* [PATCH] x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1
  2014-11-19 21:01                                 ` Andy Lutomirski
  2014-11-19 21:47                                   ` Dave Jones
@ 2014-11-19 21:56                                   ` Andy Lutomirski
  2014-11-19 22:13                                     ` Thomas Gleixner
  2014-11-20 22:04                                     ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
  2014-11-20 15:25                                   ` frequent lockups in 3.18rc4 Dave Jones
  2 siblings, 2 replies; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-19 21:56 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds
  Cc: Don Zickus, Thomas Gleixner, Linux Kernel, x86, Peter Zijlstra,
	Andy Lutomirski

TIF_NOHZ is 19 (i.e. _TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME |
_TIF_SINGLESTEP), not (1<<19).
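
For illustration only, assuming the usual convention that TIF_* is a bit
number and _TIF_* is the corresponding mask, the mix-up looks like this:

    #define TIF_NOHZ        19
    #define _TIF_NOHZ       (1 << TIF_NOHZ)

    work &= ~TIF_NOHZ;      /* clears bits 0, 1 and 4 (mask 0x13), i.e. the
                             * SYSCALL_TRACE/NOTIFY_RESUME/SINGLESTEP masks */
    work &= ~_TIF_NOHZ;     /* clears bit 19, which is what was intended */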

This code is involved in Dave's trinity lockup, but I don't see why
it would cause any of the problems he's seeing, except inadvertently
by causing a different path through entry_64.S's syscall handling.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/kernel/ptrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 749b0e423419..e510618b2e91 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1484,7 +1484,7 @@ unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
 	 */
 	if (work & _TIF_NOHZ) {
 		user_exit();
-		work &= ~TIF_NOHZ;
+		work &= ~_TIF_NOHZ;
 	}
 
 #ifdef CONFIG_SECCOMP
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 19:02                                     ` Frederic Weisbecker
  2014-11-19 19:03                                       ` Andy Lutomirski
@ 2014-11-19 21:56                                       ` Thomas Gleixner
  2014-11-19 22:56                                         ` Frederic Weisbecker
  1 sibling, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-19 21:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Linus Torvalds, Dave Jones, Don Zickus, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra, Andy Lutomirski,
	Arnaldo Carvalho de Melo

On Wed, 19 Nov 2014, Frederic Weisbecker wrote:
> I got a report lately involving context tracking. Not sure if it's
> the same here but the issue was that context tracking uses per cpu data
> and per cpu allocation use vmalloc and vmalloc'ed area can fault due to
> lazy paging.

This is complete nonsense. pcpu allocations are populated right
away. Otherwise no single line of kernel code which uses dynamically
allocated per cpu storage would be safe.
 
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 21:47                                   ` Dave Jones
@ 2014-11-19 21:58                                     ` Borislav Petkov
  2014-11-19 22:18                                       ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Borislav Petkov @ 2014-11-19 21:58 UTC (permalink / raw)
  To: Dave Jones
  Cc: Andy Lutomirski, Linus Torvalds, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra

On Wed, Nov 19, 2014 at 04:47:43PM -0500, Dave Jones wrote:
> This box is usually pretty solid, but it's been in service as a 24/7
> fuzzing box for over a year now, so it's not outside the realm of
> possibility that this could all be a hardware fault if some memory
> has gone bad or the like.

You could grep old logs for "Hardware Error" and the usual suspects
coming from MCE/EDAC. Also /var/log/mcelog or something like that,
depending on what's running on that box.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: [PATCH] x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1
  2014-11-19 21:56                                   ` [PATCH] x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1 Andy Lutomirski
@ 2014-11-19 22:13                                     ` Thomas Gleixner
  2014-11-20 20:33                                       ` Linus Torvalds
  2014-11-20 22:04                                     ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
  1 sibling, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-19 22:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Jones, Linus Torvalds, Don Zickus, Linux Kernel, x86,
	Peter Zijlstra

On Wed, 19 Nov 2014, Andy Lutomirski wrote:

> TIF_NOHZ is 19 (i.e. _TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME |
> _TIF_SINGLESTEP), not (1<<19).
> 
> This code is involved in Dave's trinity lockup, but I don't see why
> it would cause any of the problems he's seeing, except inadvertently
> by causing a different path through entry_64.S's syscall handling.

Right. While it is wrong, it does not explain the wreckage on 3.17,
which does not have that code.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 19:38                                     ` Linus Torvalds
@ 2014-11-19 22:18                                       ` Dave Jones
  0 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-19 22:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Don Zickus, Thomas Gleixner, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra,
	Frédéric Weisbecker, Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 11:38:09AM -0800, Linus Torvalds wrote:

 > > Is it possible that we've managed to return to userspace with
 > > interrupts off somehow?  A loop in userspace that somehow has
 > > interrupts off can cause all kinds of fun lockups.
 > 
 > That sounds unlikely, but if there is some stack corruption going on.
 > 
 > However, it wouldn't even explain things, because even if interrupts
 > had been disabled in user space, and even if that popf got executed,
 > this wouldn't be where they got enabled. That would be the :"sti" in
 > the system call entry path (hidden behind the ENABLE_INTERRUPTS
 > macro).
 > 
 > Of course, maybe Dave has paravirtualization enabled (what a crock
 > _that_ is), and there is something wrong with that whole code.

I've had HYPERVISOR_GUEST disabled for a while, which also disables
the paravirt code afaics.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 21:58                                     ` Borislav Petkov
@ 2014-11-19 22:18                                       ` Dave Jones
  2014-11-20 10:33                                         ` Borislav Petkov
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-19 22:18 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Linus Torvalds, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra

On Wed, Nov 19, 2014 at 10:58:14PM +0100, Borislav Petkov wrote:
 > On Wed, Nov 19, 2014 at 04:47:43PM -0500, Dave Jones wrote:
 > > This box is usually pretty solid, but it's been in service as a 24/7
 > > fuzzing box for over a year now, so it's not outside the realm of
 > > possibility that this could all be a hardware fault if some memory
 > > has gone bad or the like.
 > 
 > You could grep old logs for "Hardware Error" and the usual suspects
 > coming from MCE/EDAC. Also /var/log/mcelog or something like that,
 > depending on what's running on that box.

Nothing, but it wouldn't be the first time I'd seen a hardware fault
that didn't raise an MCE.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 21:56                                       ` Thomas Gleixner
@ 2014-11-19 22:56                                         ` Frederic Weisbecker
  2014-11-19 22:59                                           ` Andy Lutomirski
  2014-11-19 23:09                                           ` Thomas Gleixner
  0 siblings, 2 replies; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-19 22:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Dave Jones, Don Zickus, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra, Andy Lutomirski,
	Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 10:56:26PM +0100, Thomas Gleixner wrote:
> On Wed, 19 Nov 2014, Frederic Weisbecker wrote:
> > I got a report lately involving context tracking. Not sure if it's
> > the same here but the issue was that context tracking uses per cpu data
> > and per cpu allocation use vmalloc and vmalloc'ed area can fault due to
> > lazy paging.
> 
> This is complete nonsense. pcpu allocations are populated right
> away. Otherwise no single line of kernel code which uses dynamically
> allocated per cpu storage would be safe.

Note this isn't faulting because part of the allocation is swapped out. No,
it's all reserved in physical memory, but the mapping is lazy: part of it
isn't yet wired up in the P[UGM?]D. That's what vmalloc_fault() is for.

So it's a non-blocking/non-sleeping fault, which is why it's probably fine
most of the time, except in code that isn't fault-safe. And I suspect that
most people assume that kernel data won't fault, so probably some other
places have similar issues.

That's a long-standing issue. We even had to convert the perf callchain
allocation to ad-hoc kmalloc()-based per-cpu allocation to get over vmalloc
faults. At that time, NMIs couldn't handle faults, and many callchains were
populated in NMIs. We had serious crashes because of per-cpu memory faults.

I think that lazy addressing is there for allocation performance reasons. But
still, having faultable per-cpu memory is insane IMHO.


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 22:56                                         ` Frederic Weisbecker
@ 2014-11-19 22:59                                           ` Andy Lutomirski
  2014-11-19 23:07                                             ` Frederic Weisbecker
  2014-11-19 23:09                                           ` Thomas Gleixner
  1 sibling, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-19 22:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Linus Torvalds, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 2:56 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Wed, Nov 19, 2014 at 10:56:26PM +0100, Thomas Gleixner wrote:
>> On Wed, 19 Nov 2014, Frederic Weisbecker wrote:
>> > I got a report lately involving context tracking. Not sure if it's
>> > the same here but the issue was that context tracking uses per cpu data
>> > and per cpu allocation use vmalloc and vmalloc'ed area can fault due to
>> > lazy paging.
>>
>> This is complete nonsense. pcpu allocations are populated right
>> away. Otherwise no single line of kernel code which uses dynamically
>> allocated per cpu storage would be safe.
>
> Note this isn't faulting because part of the allocation is swapped. No
> it's all reserved in the physical memory, but it's a lazy allocation.
> Part of it isn't yet addressed in the P[UGM?]D. That's what vmalloc_fault() is for.
>
> So it's a non-blocking/sleeping fault which is why it's probably fine
> most of the time except on code that isn't fault-safe. And I suspect that
> most people assume that kernel data won't fault so probably some other
> places have similar issues.
>
> That's a long standing issue. We even had to convert the perf callchain
> allocation to ad-hoc kmalloc() based per cpu allocation to get over vmalloc
> faults. At that time, NMIs couldn't handle faults and many callchains were
> populated in NMIs. We had serious crashes because of per cpu memory faults.

Is there seriously more than 512GB of per-cpu virtual space or
whatever's needed to exceed a single pgd on x86_64?
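
(For scale: one x86_64 pgd entry covers 2^(9+9+9+12) = 2^39 bytes = 512 GB
of virtual address space, three 9-bit table levels plus the 12-bit page
offset, so a single pgd entry would only be exceeded by more than 512 GB of
vmalloc/per-cpu address space.)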

And there are definitely places that access per-cpu data in contexts
in which a non-IST fault is not allowed.  Maybe not dynamic per-cpu
data, though.

--Andy

>
> I think that lazy adressing is there for allocation performance reasons. But
> still having faultable per cpu memory is insame IMHO.
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 19:03                                       ` Andy Lutomirski
@ 2014-11-19 23:00                                         ` Frederic Weisbecker
  2014-11-19 23:07                                           ` Andy Lutomirski
  0 siblings, 1 reply; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-19 23:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Dave Jones, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 11:03:48AM -0800, Andy Lutomirski wrote:
> On Wed, Nov 19, 2014 at 11:02 AM, Frederic Weisbecker
> <fweisbec@gmail.com> wrote:
> > On Wed, Nov 19, 2014 at 09:40:26AM -0800, Linus Torvalds wrote:
> >> On Wed, Nov 19, 2014 at 9:22 AM, Linus Torvalds
> >> <torvalds@linux-foundation.org> wrote:
> >> >
> >> > So it hasn't actually done the "push %rbx; popfq" part - there must be
> >> > a label at the return part, and context_tracking_user_exit() never
> >> > actually did the local_irq_save/restore at all. Which means that it
> >> > took one of the early exits instead:
> >> >
> >> >         if (!context_tracking_is_enabled())
> >> >                 return;
> >> >
> >> >         if (in_interrupt())
> >> >                 return;
> >>
> >> Ho humm. Interesting. Neither of those should possibly have happened.
> >>
> >> We "know" that "context_tracking_is_enabled()" must be true, because
> >> the only way we get to context_tracking_user_exit() in the first place
> >> is through "user_exit()", which does:
> >>
> >>         if (context_tracking_is_enabled())
> >>                 context_tracking_user_exit();
> >>
> >> and we know we shouldn't be in_interrupt(), because the backtrace is
> >> the system call entry path, for chrissake!
> >>
> >> So we definitely have some corruption going on. A few possibilities:
> >>
> >>  - either the register contents are corrupted (%rbx in your dump said
> >> "0x0000000100000046", but the eflags we restored was 0x246)
> >>
> >>  - in_interrupt() is wrong, and we've had some irq_count() corruption.
> >> I'd expect that to result in "scheduling while atomic" messages,
> >> though, especially if it goes on long enough that you get a watchdog
> >> event..
> >>
> >>  - there is something rotten in the land of
> >> context_tracking_is_enabled(), which uses a static key.
> >>
> >>  - I have misread the whole trace, and am a moron. But your earlier
> >> report really had some very similar things, just in
> >> context_tracking_user_enter() instead of exit.
> >>
> >> In your previous oops, the registers that was allegedly used to
> >> restore %eflags was %r12:
> >>
> >>   28: 41 54                 push   %r12
> >>   2a: 9d                   popfq
> >>   2b:* 5b                   pop    %rbx <-- trapping instruction
> >>   2c: 41 5c                 pop    %r12
> >>   2e: 5d                   pop    %rbp
> >>   2f: c3                   retq
> >>
> >> but:
> >>
> >>   R12: ffff880101ee3ec0
> >>   EFLAGS: 00000282
> >>
> >> so again, it looks like we never actually did that "popfq"
> >> instruction, and it would have exited through the (same) early exits.
> >>
> >> But what an odd coincidence that it ended up in both of your reports
> >> being *exactly* at that instruction after the "popf". If it had
> >> actually *taken* the popf, I'd not be so surprised ("ok, popf enabled
> >> interrupts, and there was an interrupt pending"), but since everything
> >> seems to say that it came there through some control flow that did
> >> *not* go through the popf, that's just a very odd coincidence.
> >>
> >> And both context_tracking_user_enter() and exit() have that exact same
> >> issue with the early returns. They shouldn't have happened in the
> >> first place.
> >
> > I got a report lately involving context tracking. Not sure if it's
> > the same here but the issue was that context tracking uses per cpu data
> > and per cpu allocation use vmalloc and vmalloc'ed area can fault due to
> > lazy paging.
> 
> Wait, what?  If something like kernel_stack ends with an unmapped pmd,
> we are well and truly screwed.

Note those are non-sleeping faults. So probably most places are fine, except
a few of them that really don't want an exception to mess up some state. I
can imagine some entry code that really doesn't want that.

Is the kernel stack allocated by vmalloc or alloc_percpu()?

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 23:00                                         ` Frederic Weisbecker
@ 2014-11-19 23:07                                           ` Andy Lutomirski
  2014-11-19 23:13                                             ` Frederic Weisbecker
  0 siblings, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-19 23:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Linus Torvalds, Dave Jones, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 3:00 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Wed, Nov 19, 2014 at 11:03:48AM -0800, Andy Lutomirski wrote:
>> On Wed, Nov 19, 2014 at 11:02 AM, Frederic Weisbecker
>> <fweisbec@gmail.com> wrote:
>> > On Wed, Nov 19, 2014 at 09:40:26AM -0800, Linus Torvalds wrote:
>> >> On Wed, Nov 19, 2014 at 9:22 AM, Linus Torvalds
>> >> <torvalds@linux-foundation.org> wrote:
>> >> >
>> >> > So it hasn't actually done the "push %rbx; popfq" part - there must be
>> >> > a label at the return part, and context_tracking_user_exit() never
>> >> > actually did the local_irq_save/restore at all. Which means that it
>> >> > took one of the early exits instead:
>> >> >
>> >> >         if (!context_tracking_is_enabled())
>> >> >                 return;
>> >> >
>> >> >         if (in_interrupt())
>> >> >                 return;
>> >>
>> >> Ho humm. Interesting. Neither of those should possibly have happened.
>> >>
>> >> We "know" that "context_tracking_is_enabled()" must be true, because
>> >> the only way we get to context_tracking_user_exit() in the first place
>> >> is through "user_exit()", which does:
>> >>
>> >>         if (context_tracking_is_enabled())
>> >>                 context_tracking_user_exit();
>> >>
>> >> and we know we shouldn't be in_interrupt(), because the backtrace is
>> >> the system call entry path, for chrissake!
>> >>
>> >> So we definitely have some corruption going on. A few possibilities:
>> >>
>> >>  - either the register contents are corrupted (%rbx in your dump said
>> >> "0x0000000100000046", but the eflags we restored was 0x246)
>> >>
>> >>  - in_interrupt() is wrong, and we've had some irq_count() corruption.
>> >> I'd expect that to result in "scheduling while atomic" messages,
>> >> though, especially if it goes on long enough that you get a watchdog
>> >> event..
>> >>
>> >>  - there is something rotten in the land of
>> >> context_tracking_is_enabled(), which uses a static key.
>> >>
>> >>  - I have misread the whole trace, and am a moron. But your earlier
>> >> report really had some very similar things, just in
>> >> context_tracking_user_enter() instead of exit.
>> >>
>> >> In your previous oops, the registers that was allegedly used to
>> >> restore %eflags was %r12:
>> >>
>> >>   28: 41 54                 push   %r12
>> >>   2a: 9d                   popfq
>> >>   2b:* 5b                   pop    %rbx <-- trapping instruction
>> >>   2c: 41 5c                 pop    %r12
>> >>   2e: 5d                   pop    %rbp
>> >>   2f: c3                   retq
>> >>
>> >> but:
>> >>
>> >>   R12: ffff880101ee3ec0
>> >>   EFLAGS: 00000282
>> >>
>> >> so again, it looks like we never actually did that "popfq"
>> >> instruction, and it would have exited through the (same) early exits.
>> >>
>> >> But what an odd coincidence that it ended up in both of your reports
>> >> being *exactly* at that instruction after the "popf". If it had
>> >> actually *taken* the popf, I'd not be so surprised ("ok, popf enabled
>> >> interrupts, and there was an interrupt pending"), but since everything
>> >> seems to say that it came there through some control flow that did
>> >> *not* go through the popf, that's just a very odd coincidence.
>> >>
>> >> And both context_tracking_user_enter() and exit() have that exact same
>> >> issue with the early returns. They shouldn't have happened in the
>> >> first place.
>> >
>> > I got a report lately involving context tracking. Not sure if it's
>> > the same here but the issue was that context tracking uses per cpu data
>> > and per cpu allocation use vmalloc and vmalloc'ed area can fault due to
>> > lazy paging.
>>
>> Wait, what?  If something like kernel_stack ends with an unmapped pmd,
>> we are well and truly screwed.
>
> Note that's non-sleeping faults. So probably most places are fine except
> a few of them that really don't want exception to mess up some state. I
> can imagine some entry code that really don't want that.

Any non-IST fault at all on the kernel_stack reference in system_call
is instant root on non-SMAP systems and instant double-fault or more
challenging root on SMAP systems.  The issue is that rsp is
user-controlled, so the CPU cannot deliver a non-IST fault safely.

>
> Is kernel stack allocated by vmalloc or alloc_percpu()?

DEFINE_PER_CPU(unsigned long, kernel_stack)

Note that I'm talking about kernel_stack, not the kernel stack itself.
The actual stack is regular linearly-mapped memory, although I plan on
trying to change that, complete with all kinds of care to avoid double
faults.

--Andy



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 22:59                                           ` Andy Lutomirski
@ 2014-11-19 23:07                                             ` Frederic Weisbecker
  0 siblings, 0 replies; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-19 23:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Linus Torvalds, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 02:59:01PM -0800, Andy Lutomirski wrote:
> On Wed, Nov 19, 2014 at 2:56 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > On Wed, Nov 19, 2014 at 10:56:26PM +0100, Thomas Gleixner wrote:
> >> On Wed, 19 Nov 2014, Frederic Weisbecker wrote:
> >> > I got a report lately involving context tracking. Not sure if it's
> >> > the same here but the issue was that context tracking uses per cpu data
> >> > and per cpu allocation use vmalloc and vmalloc'ed area can fault due to
> >> > lazy paging.
> >>
> >> This is complete nonsense. pcpu allocations are populated right
> >> away. Otherwise no single line of kernel code which uses dynamically
> >> allocated per cpu storage would be safe.
> >
> > Note this isn't faulting because part of the allocation is swapped. No
> > it's all reserved in the physical memory, but it's a lazy allocation.
> > Part of it isn't yet addressed in the P[UGM?]D. That's what vmalloc_fault() is for.
> >
> > So it's a non-blocking/sleeping fault which is why it's probably fine
> > most of the time except on code that isn't fault-safe. And I suspect that
> > most people assume that kernel data won't fault so probably some other
> > places have similar issues.
> >
> > That's a long standing issue. We even had to convert the perf callchain
> > allocation to ad-hoc kmalloc() based per cpu allocation to get over vmalloc
> > faults. At that time, NMIs couldn't handle faults and many callchains were
> > populated in NMIs. We had serious crashes because of per cpu memory faults.
> 
> Is there seriously more than 512GB of per-cpu virtual space or
> whatever's needed to exceed a single pgd on x86_64?

No idea, I'm clueless about -mm details.

> 
> And there are definitely placed that access per-cpu data in contexts
> in which a non-IST fault is not allowed.  Maybe not dynamic per-cpu
> data, though.

It probably happens to be fine because the code that first accesses the
related data is fault-safe. Or maybe not, and some state is silently messed
up somewhere.

This doesn't leave a comfortable feeling.

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 22:56                                         ` Frederic Weisbecker
  2014-11-19 22:59                                           ` Andy Lutomirski
@ 2014-11-19 23:09                                           ` Thomas Gleixner
  2014-11-19 23:50                                             ` Frederic Weisbecker
  2014-11-19 23:54                                             ` Andy Lutomirski
  1 sibling, 2 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-19 23:09 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Linus Torvalds, Dave Jones, Don Zickus, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra, Andy Lutomirski,
	Arnaldo Carvalho de Melo

On Wed, 19 Nov 2014, Frederic Weisbecker wrote:

> On Wed, Nov 19, 2014 at 10:56:26PM +0100, Thomas Gleixner wrote:
> > On Wed, 19 Nov 2014, Frederic Weisbecker wrote:
> > > I got a report lately involving context tracking. Not sure if it's
> > > the same here but the issue was that context tracking uses per cpu data
> > > and per cpu allocation use vmalloc and vmalloc'ed area can fault due to
> > > lazy paging.
> > 
> > This is complete nonsense. pcpu allocations are populated right
> > away. Otherwise no single line of kernel code which uses dynamically
> > allocated per cpu storage would be safe.
> 
> Note this isn't faulting because part of the allocation is
> swapped. No it's all reserved in the physical memory, but it's a
> lazy allocation.  Part of it isn't yet addressed in the
> P[UGM?]D. That's what vmalloc_fault() is for.

Sorry, I can't follow your argumentation here.

pcpu_alloc()
   ....
area_found:
   ....

        /* clear the areas and return address relative to base address */
        for_each_possible_cpu(cpu)
                memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off, 0, size);

How would that memset fail to establish the mapping, which is
btw. already established via:

     pcpu_populate_chunk()
  
already before that memset?   	    
 
Are we talking about different per cpu allocators here or am I missing
something completely non obvious?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 23:07                                           ` Andy Lutomirski
@ 2014-11-19 23:13                                             ` Frederic Weisbecker
  0 siblings, 0 replies; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-19 23:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Dave Jones, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 03:07:17PM -0800, Andy Lutomirski wrote:
> On Wed, Nov 19, 2014 at 3:00 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > Note that's non-sleeping faults. So probably most places are fine except
> > a few of them that really don't want exception to mess up some state. I
> > can imagine some entry code that really don't want that.
> 
> Any non-IST fault at all on the kernel_stack reference in system_call
> is instant root on non-SMAP systems and instant double-fault or more
> challenging root on SMAP systems.  The issue is that rsp is
> user-controlled, so the CPU cannot deliver a non-IST fault safely.

Heh.

> >
> > Is kernel stack allocated by vmalloc or alloc_percpu()?
> 
> DEFINE_PER_CPU(unsigned long, kernel_stack)
> 
> Note that I'm talking about kernel_stack, not the kernel stack itself.

Ah. Note, static allocation like DEFINE_PER_CPU() is probably fine. The
issue is on dynamic allocations: alloc_percpu().
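
Concretely, the distinction being drawn is something like the sketch below
(variable names are illustrative):

    /* static per-cpu data: lives in the kernel image's percpu section,
     * whose page tables are set up at boot, so it never vmalloc-faults */
    static DEFINE_PER_CPU(unsigned long, example_stat);

    /* dynamic per-cpu data: the backing chunk can end up in the vmalloc
     * area, which is the case being worried about here */
    static unsigned long __percpu *example_counter;

    static int __init example_init(void)
    {
            example_counter = alloc_percpu(unsigned long);
            if (!example_counter)
                    return -ENOMEM;

            this_cpu_inc(*example_counter);     /* dynamic */
            this_cpu_inc(example_stat);         /* static */
            return 0;
    }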

> The actual stack is regular linearly-mapped memory, although I plan on
> trying to change that, complete with all kinds of care to avoid double
> faults.

If you do so, you must really ensure that the resulting memory will never
fault.

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 23:09                                           ` Thomas Gleixner
@ 2014-11-19 23:50                                             ` Frederic Weisbecker
  2014-11-20 12:23                                               ` Tejun Heo
  2014-11-19 23:54                                             ` Andy Lutomirski
  1 sibling, 1 reply; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-19 23:50 UTC (permalink / raw)
  To: Thomas Gleixner, Tejun Heo
  Cc: Linus Torvalds, Dave Jones, Don Zickus, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra, Andy Lutomirski,
	Arnaldo Carvalho de Melo

On Thu, Nov 20, 2014 at 12:09:22AM +0100, Thomas Gleixner wrote:
> On Wed, 19 Nov 2014, Frederic Weisbecker wrote:
> 
> > On Wed, Nov 19, 2014 at 10:56:26PM +0100, Thomas Gleixner wrote:
> > > On Wed, 19 Nov 2014, Frederic Weisbecker wrote:
> > > > I got a report lately involving context tracking. Not sure if it's
> > > > the same here but the issue was that context tracking uses per cpu data
> > > > and per cpu allocation use vmalloc and vmalloc'ed area can fault due to
> > > > lazy paging.
> > > 
> > > This is complete nonsense. pcpu allocations are populated right
> > > away. Otherwise no single line of kernel code which uses dynamically
> > > allocated per cpu storage would be safe.
> > 
> > Note this isn't faulting because part of the allocation is
> > swapped. No it's all reserved in the physical memory, but it's a
> > lazy allocation.  Part of it isn't yet addressed in the
> > P[UGM?]D. That's what vmalloc_fault() is for.
> 
> Sorry, I can't follow your argumentation here.
> 
> pcpu_alloc()
>    ....
> area_found:
>    ....
> 
>         /* clear the areas and return address relative to base address */
>         for_each_possible_cpu(cpu)
>                 memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off, 0, size);
> 
> How would that memset fail to establish the mapping, which is
> btw. already established via:
> 
>      pcpu_populate_chunk()
>   
> already before that memset?   	    
>  
> Are we talking about different per cpu allocators here or am I missing
> something completely non obvious?

That's the same allocator, yeah. So if the whole area is dereferenced at
allocation time, faults shouldn't happen.

Maybe that was a bug a few years ago, but not anymore.

I'm surprised because I got a report from Dave that very much suggested
a vmalloc fault. See the discussion "Deadlock in vtime_account_user() vs itself across a page fault":

http://marc.info/?l=linux-kernel&m=141047612120263&w=2

Is it possible that, somehow, some part isn't zeroed by pcpu_alloc()?
After all, it's allocated with vzalloc(), so that part could be skipped. The
memset(0) is passed the whole size though, so it looks like the whole area is
dereferenced.

(cc'ing Tejun just in case).

Now if faults on percpu memory don't happen anymore, perhaps we are accessing some
other vmalloc'ed area. In the above report from Dave, the fault happened somewhere
in account_user_time().

> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 23:09                                           ` Thomas Gleixner
  2014-11-19 23:50                                             ` Frederic Weisbecker
@ 2014-11-19 23:54                                             ` Andy Lutomirski
  2014-11-20  0:00                                               ` Thomas Gleixner
  1 sibling, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-19 23:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, Linus Torvalds, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Wed, Nov 19, 2014 at 3:09 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 19 Nov 2014, Frederic Weisbecker wrote:
>
>> On Wed, Nov 19, 2014 at 10:56:26PM +0100, Thomas Gleixner wrote:
>> > On Wed, 19 Nov 2014, Frederic Weisbecker wrote:
>> > > I got a report lately involving context tracking. Not sure if it's
>> > > the same here but the issue was that context tracking uses per cpu data
>> > > and per cpu allocation use vmalloc and vmalloc'ed area can fault due to
>> > > lazy paging.
>> >
>> > This is complete nonsense. pcpu allocations are populated right
>> > away. Otherwise no single line of kernel code which uses dynamically
>> > allocated per cpu storage would be safe.
>>
>> Note this isn't faulting because part of the allocation is
>> swapped. No it's all reserved in the physical memory, but it's a
>> lazy allocation.  Part of it isn't yet addressed in the
>> P[UGM?]D. That's what vmalloc_fault() is for.
>
> Sorry, I can't follow your argumentation here.
>
> pcpu_alloc()
>    ....
> area_found:
>    ....
>
>         /* clear the areas and return address relative to base address */
>         for_each_possible_cpu(cpu)
>                 memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off, 0, size);
>
> How would that memset fail to establish the mapping, which is
> btw. already established via:
>
>      pcpu_populate_chunk()
>
> already before that memset?

I think that this will map them into init_mm->pgd and
current->active_mm->pgd, but it won't necessarily map them into the
rest of the pgds.

At the risk of suggesting something awful, if we preallocated all 256
or whatever kernel pmd pages at boot, this whole problem would go away
forever.  It would only waste slightly under 1 MB of RAM (less on
extremely large systems).
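
A very rough sketch of that idea, restricted to the vmalloc range for
concreteness and assuming it runs early at boot before any other pgd is
copied from init_mm (the function name is made up):

    static void __init preallocate_kernel_pgds(void)
    {
            unsigned long addr;

            for (addr = VMALLOC_START; addr < VMALLOC_END;
                 addr = ALIGN(addr + 1, PGDIR_SIZE)) {
                    pgd_t *pgd = pgd_offset_k(addr);

                    if (pgd_none(*pgd)) {
                            pud_t *pud = (pud_t *)get_zeroed_page(GFP_KERNEL);

                            if (!pud)
                                    panic("can't preallocate kernel pgd entries");
                            pgd_populate(&init_mm, pgd, pud);
                    }
            }
    }

Once init_mm's kernel-half pgd entries all exist, every later pgd copies
them at creation time and the lazy-fault case never arises.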

--Andy

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 23:54                                             ` Andy Lutomirski
@ 2014-11-20  0:00                                               ` Thomas Gleixner
  2014-11-20  0:30                                                 ` Andy Lutomirski
  0 siblings, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-20  0:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Frederic Weisbecker, Linus Torvalds, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Wed, 19 Nov 2014, Andy Lutomirski wrote:
> On Wed, Nov 19, 2014 at 3:09 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > Sorry, I can't follow your argumentation here.
> >
> > pcpu_alloc()
> >    ....
> > area_found:
> >    ....
> >
> >         /* clear the areas and return address relative to base address */
> >         for_each_possible_cpu(cpu)
> >                 memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off, 0, size);
> >
> > How would that memset fail to establish the mapping, which is
> > btw. already established via:
> >
> >      pcpu_populate_chunk()
> >
> > already before that memset?
> 
> I think that this will map them into init_mm->pgd and
> current->active_mm->pgd, but it won't necessarily map them into the
> rest of the pgds.

And why would mapping them into the kernel mapping, i.e. init_mm not
be sufficient?

We are talking about kernel memory and not some random user space
mapping.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20  0:00                                               ` Thomas Gleixner
@ 2014-11-20  0:30                                                 ` Andy Lutomirski
  2014-11-20  0:40                                                   ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-20  0:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Linus Torvalds, Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Nov 19, 2014 4:00 PM, "Thomas Gleixner" <tglx@linutronix.de> wrote:
>
> On Wed, 19 Nov 2014, Andy Lutomirski wrote:
> > On Wed, Nov 19, 2014 at 3:09 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > Sorry, I can't follow your argumentation here.
> > >
> > > pcpu_alloc()
> > >    ....
> > > area_found:
> > >    ....
> > >
> > >         /* clear the areas and return address relative to base address */
> > >         for_each_possible_cpu(cpu)
> > >                 memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off, 0, size);
> > >
> > > How would that memset fail to establish the mapping, which is
> > > btw. already established via:
> > >
> > >      pcpu_populate_chunk()
> > >
> > > already before that memset?
> >
> > I think that this will map them into init_mm->pgd and
> > current->active_mm->pgd, but it won't necessarily map them into the
> > rest of the pgds.
>
> And why would mapping them into the kernel mapping, i.e. init_mm not
> be sufficient?

Because the kernel can run with any pgd loaded into cr3, and we rely
on vmalloc_fault to lazily populate pgds in all the non-init pgds as
needed.  But this only happens if the first TLB-missing reference to
the pgd in question with any given cr3 value happens from a safe
context.

This is why I think that the grsec kernels will crash on very large
memory systems.  They don't seem to get this right for the kernel
stack, and a page fault trying to access the stack is a big no-no.

--Andy

>
> We are talking about kernel memory and not some random user space
> mapping.
>
> Thanks,
>
>         tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20  0:30                                                 ` Andy Lutomirski
@ 2014-11-20  0:40                                                   ` Linus Torvalds
  2014-11-20  0:49                                                     ` Andy Lutomirski
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-20  0:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, linux-kernel, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Wed, Nov 19, 2014 at 4:30 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> This is why I think that the grsec kernels will crash on very large
> memory systems.  They don't seem to get this right for the kernel
> stack, and a page fault trying to access the stack is a big no-no.

For something like a stack, that's trivial, you could just probe it
before the actual task switch.

So I wouldn't worry about the kernel stack itself (although I think
vmallocing it isn't likely worth it), I'd worry more about some other
random dynamic percpu allocation. Although they arguably shouldn't
happen for low-level code that cannot handle the dynamic
pgd-population. And they generally don't.

It's really tracing that tends to be a special case not because of any
particular low-level code issue, but because instrumenting itself
recursively tends to be a bad idea.

                    Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20  0:40                                                   ` Linus Torvalds
@ 2014-11-20  0:49                                                     ` Andy Lutomirski
  2014-11-20  1:07                                                       ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-20  0:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, linux-kernel, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Wed, Nov 19, 2014 at 4:40 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Nov 19, 2014 at 4:30 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> This is why I think that the grsec kernels will crash on very large
>> memory systems.  They don't seem to get this right for the kernel
>> stack, and a page fault trying to access the stack is a big no-no.
>
> For something like a stack, that's trivial, you could just probe it
> before the actual task switch.

I thought so for a while, too, but now I disagree.  On PGE hardware,
it seems entirely possible that the new stack would be in the TLB even
if it's not visible via cr3.  Then, as soon as the TLB entry expires,
we double-fault.

>
> So I wouldn't worry about the kernel stack itself (although I think
> vmallocing it isn't likely worth it),

I don't want vmalloc to avoid low-order allocations -- I want it to
have guard pages.  The fact that a user-triggerable stack overflow is
basically root right now and doesn't reliably OOPS scares me.

> I'd worry more about some other
> random dynamic percpu allocation. Although they arguably shouldn't
> happen for low-level code that cannot handle the dynamic
> pgd-population. And they generally don't.

This issue ought to be limited to nokprobes code, and I doubt that any
of that code touches dynamic per-cpu things.

>
> It's really tracing that tends to be a special case not because of any
> particular low-level code issue, but because instrumenting itself
> recursively tends to be a bad idea.
>
>                     Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20  0:49                                                     ` Andy Lutomirski
@ 2014-11-20  1:07                                                       ` Linus Torvalds
  2014-11-20  1:16                                                         ` Andy Lutomirski
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-20  1:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, linux-kernel, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Wed, Nov 19, 2014 at 4:49 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> I thought so for a while, too, but now I disagree.  On PGE hardware,
> it seems entirely possible that the new stack would be in the TLB even
> if it's not visible via cr3.  Then, as soon as the TLB entry expires,
> we double-fault.

Ahh. Good point.

> I don't want vmalloc to avoid low-order allocations -- I want it to
> have guard pages.  The fact that a user-triggerable stack overflow is
> basically root right now and doesn't reliably OOPS scares me.

Well, if you do that, you would have to make the double-fault handler
aware of the stack issue anyway, and then you could just do the same
PGD repopulation that a page fault does and return (for the case where
you didn't overflow the stack, just had the page tables unpopulated -
obviously an actual stack overflow should do something more drastic).

                   Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20  1:07                                                       ` Linus Torvalds
@ 2014-11-20  1:16                                                         ` Andy Lutomirski
  2014-11-20  2:42                                                           ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-20  1:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, linux-kernel, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Wed, Nov 19, 2014 at 5:07 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Nov 19, 2014 at 4:49 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> I thought so for a while, too, but now I disagree.  On PGE hardware,
>> it seems entirely possible that the new stack would be in the TLB even
>> if it's not visible via cr3.  Then, as soon as the TLB entry expires,
>> we double-fault.
>
> Ahh. Good point.
>
>> I don't want vmalloc to avoid low-order allocations -- I want it to
>> have guard pages.  The fact that a user-triggerable stack overflow is
>> basically root right now and doesn't reliably OOPS scares me.
>
> Well, if you do that, you would have to make the double-fault handler
> aware of the stack issue anyway, and then you could just do the same
> PGD repopulation that a page fault does and return (for the case where
> you didn't overflow the stack, just had the page tables unpopulated -
> obviously an actual stack overflow should do something more drastic).

And you were calling me crazy? :)

We could be restarting just about anything if that happens.  Except
that if we double-faulted on a trap gate entry instead of an interrupt
gate entry, then we can't restart, and, unless we can somehow decode
the error code usefully (it's woefully undocumented), int 0x80 and
int3 might be impossible to handle correctly if it double-faults.  And
please don't suggest moving int 0x80 to an IST stack :)

The SDM specifically says that you must not try to recover after a
double-fault.  We do, however, recover from a double-fault in the
specific case of an iret failure during espfix64 processing (and I
even have a nice test case for it), but I think that hpa had a long
conversation with one of the microcode architects before he was okay
with that.

--Andy

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20  1:16                                                         ` Andy Lutomirski
@ 2014-11-20  2:42                                                           ` Linus Torvalds
  2014-11-20  6:16                                                             ` Andy Lutomirski
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-20  2:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, linux-kernel, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Wed, Nov 19, 2014 at 5:16 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> And you were calling me crazy? :)

Hey, I'm crazy like a fox.

> We could be restarting just about anything if that happens. Except
> that if we double-faulted on a trap gate entry instead of an interrupt
> gate entry, then we can't restart, and, unless we can somehow decode
> the error code usefully (it's woefully undocumented), int 0x80 and
> int3 might be impossible to handle correctly if it double-faults.  And
> please don't suggest moving int 0x80 to an IST stack :)

No, no.  So tell me if this won't work:

 - when forking a new process, make sure we allocate the vmalloc stack
*before* we copy the vm

 - this should guarantee that all new processes will at least have their
*own* stack always in their page tables, since vmalloc always fills in
the current page tables of the thread doing the vmalloc.

HOWEVER, that leaves the task switch *to* that process, and making
sure that the stack pointer is ok in between the "switch %rsp" and
"switch %cr3".

So then we make the rule be: switch %cr3 *before* switching %rsp, and
only in between those places can we get in trouble. Yes/no?

And that small section is all with interrupts disabled, and nothing
should take an exception. The C code might take a double fault on a
regular access to the old stack (the *new* stack is guaranteed to be
mapped, but the old stack is not), but that should be very similar to
what we already do with "iret". So we can just fill in the page tables
and return.

For safety, add a percpu counter that is cleared before the %cr3
setting, to make sure that we only do a *single* double-fault, but it
really sounds pretty safe. No?
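
To make the ordering concrete, something like the following (illustrative
only -- the real code would live in the context-switch path, and
"stack_df_fixups" is a made-up per-cpu guard, not an existing variable):

    this_cpu_write(stack_df_fixups, 0);   /* allow a single #DF fixup      */
    load_cr3(next->mm->pgd);              /* %cr3 first: the *new* stack is
                                             guaranteed mapped in there    */
    /*
     * Window: still running on the old stack.  Touching it can double-fault
     * if its PGD slot isn't populated in next's page tables; the #DF handler
     * fills the slot in (like vmalloc_fault() does), bumps stack_df_fixups
     * and returns.  A second fault in the window means a real overflow.
     */
    /* ... only now does the switch_to() asm load next's %rsp ... */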

The only deadly thing would be NMI, but that's an IST anyway, so not
an issue. No other traps should be able to happen except the double
page table miss.

But hey, maybe I'm not crazy like a fox. Maybe I'm just plain crazy,
and I missed something else.

And no, I don't think the above is necessarily a *good* idea. But it
doesn't seem really overly complicated either.

                      Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20  2:42                                                           ` Linus Torvalds
@ 2014-11-20  6:16                                                             ` Andy Lutomirski
  0 siblings, 0 replies; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-20  6:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, linux-kernel, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Wed, Nov 19, 2014 at 6:42 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Nov 19, 2014 at 5:16 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> And you were calling me crazy? :)
>
> Hey, I'm crazy like a fox.
>
>> We could be restarting just about anything if that happens. Except
>> that if we double-faulted on a trap gate entry instead of an interrupt
>> gate entry, then we can't restart, and, unless we can somehow decode
>> the error code usefully (it's woefully undocumented), int 0x80 and
>> int3 might be impossible to handle correctly if it double-faults.  And
>> please don't suggest moving int 0x80 to an IST stack :)
>
> No, no.  So tell me if this won't work:
>
>  - when forking a new process, make sure we allocate the vmalloc stack
> *before* we copy the vm
>
>  - this should guarantee that all new processes will at least have their
> *own* stack always in their page tables, since vmalloc always fills in
> the current page tables of the thread doing the vmalloc.

This gets interesting for kernel threads that don't really have an mm
in the first place, though.

>
> HOWEVER, that leaves the task switch *to* that process, and making
> sure that the stack pointer is ok in between the "switch %rsp" and
> "switch %cr3".
>
> So then we make the rule be: switch %cr3 *before* switching %rsp, and
> only in between those places can we get in trouble. Yes/no?
>

Kernel threads aside, sure.  And we do it in this order anyway, I think.

> And that small section is all with interrupts disabled, and nothing
> should take an exception. The C code might take a double fault on a
> regular access to the old stack (the *new* stack is guaranteed to be
> mapped, but the old stack is not), but that should be very similar to
> what we already do with "iret". So we can just fill in the page tables
> and return.

Unless we try to dump the stack from an NMI or something, but that
should be fine regardless.

>
> For safety, add a percpu counter that is cleared before the %cr3
> setting, to make sure that we only do a *single* double-fault, but it
> really sounds pretty safe. No?

I wouldn't be surprised if that's just as expensive as just fixing up
the pgd in the first place.  The fixup is just:

if (unlikely(pte_none(mm->pgd[pgd_address(rsp)]))) fix it;

or something like that.
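
(Spelled out slightly -- just a sketch; pgd_address() above is pseudocode, a
pgd entry is tested with pgd_none(), and "rsp" here stands for the new stack
address:)

    unsigned long rsp = (unsigned long)task_stack_page(next);  /* new stack */
    pgd_t *pgd        = pgd_offset(next->mm, rsp);
    pgd_t *pgd_ref    = pgd_offset_k(rsp);   /* kernel (init_mm) reference  */

    if (unlikely(pgd_none(*pgd)) && !pgd_none(*pgd_ref))
            set_pgd(pgd, *pgd_ref);           /* copy the slot from init_mm */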

>
> The only deadly thing would be NMI, but that's an IST anyway, so not
> an issue. No other traps should be able to happen except the double
> page table miss.
>
> But hey, maybe I'm not crazy like a fox. Maybe I'm just plain crazy,
> and I missed something else.

I actually kind of like it, other than the kernel thread issue.

We should arguably ditch lazy mm for kernel threads in favor of PCID,
but that's a different story.  Or we could beg Intel to give us
separate kernel and user page table hierarchies.

--Andy

>
> And no, I don't think the above is necessarily a *good* idea. But it
> doesn't seem really overly complicated either.
>
>                       Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 14:41                             ` Don Zickus
  2014-11-19 15:03                               ` Vivek Goyal
@ 2014-11-20  9:54                               ` Dave Young
  1 sibling, 0 replies; 486+ messages in thread
From: Dave Young @ 2014-11-20  9:54 UTC (permalink / raw)
  To: Don Zickus
  Cc: Dave Jones, Thomas Gleixner, Linus Torvalds, Linux Kernel,
	the arch/x86 maintainers, vgoyal

On 11/19/14 at 09:41am, Don Zickus wrote:
> On Tue, Nov 18, 2014 at 05:02:54PM -0500, Dave Jones wrote:
> > On Tue, Nov 18, 2014 at 04:55:40PM -0500, Don Zickus wrote:
> > 
> >  > > So here we mangle CPU3 in and lose the backtrace for cpu0, which might
> >  > > be the real interesting one ....
> >  > 
> >  > Can you provide another dump?  The hope is we get something not mangled?
> > 
> > Working on it..
> > 
> >  > The other option we have done in RHEL is panic the system and let kdump
> >  > capture the memory.  Then we can analyze the vmcore for the stack trace
> >  > cpu0 stored in memory to get a rough idea where it might be if the cpu
> >  > isn't responding very well.
> > 
> > I don't know if it's because of the debug options I typically run with,
> > or that I'm perpetually cursed, but I've never managed to get kdump to
> > do anything useful. (The last time I tried it was actively harmful in
> > that not only did it fail to dump anything, it wedged the machine so
> > it didn't reboot after panic).
> > 
> > Unless there's some magic step missing from the documentation at
> > http://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes
> > then I'm not optimistic it'll be useful.
> 
> Well, I don't know when the last time you ran it, but I know the RH kexec
> folks have started pursuing a Fedora-first package patch rule a couple of
> years ago to ensure Fedora had a working kexec/kdump solution.

It started from Fedora 17, I think. For Fedora releases before F17, kdump
support was very limited; it is getting better.

> 
> As for the wedging part, it was a common problem to have the kernel hang
> while trying to boot the second kernel (and before console output
> happened).  So the problem makes sense and is unfortunate.  I would
> encourage you to try again.  :-)

In Fedora we will have more such issues than RHEL because the kernel is updated
frequently. There are occasionally new problems in the upstream kernel, such as
the kaslr feature on x86.

The problem for Fedora is that kdump is not enabled by default, so the user needs
to explicitly specify the crashkernel reservation on the kernel cmdline and enable
the kdump service.

There are very few bugs reported from Fedora users, so I guess it is not well
tested in the Fedora community. Since Dave brought up this issue, I think it's at
least good news to us that someone is using it. We can address the problems case
by case then.

Probably a good way to get more testing is to add the kdump anaconda addon by
default at installation time so the user can choose whether to enable kdump or not.

Thanks
Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 22:18                                       ` Dave Jones
@ 2014-11-20 10:33                                         ` Borislav Petkov
  0 siblings, 0 replies; 486+ messages in thread
From: Borislav Petkov @ 2014-11-20 10:33 UTC (permalink / raw)
  To: Dave Jones
  Cc: Andy Lutomirski, Linus Torvalds, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra

On Wed, Nov 19, 2014 at 05:18:42PM -0500, Dave Jones wrote:
> Nothing, but it wouldn't be the first time I'd seen a hardware fault
> that didn't raise an MCE.

And maybe it tried but it didn't manage to come out due to hard wedging. :-)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 23:50                                             ` Frederic Weisbecker
@ 2014-11-20 12:23                                               ` Tejun Heo
  2014-11-20 21:58                                                 ` Thomas Gleixner
  0 siblings, 1 reply; 486+ messages in thread
From: Tejun Heo @ 2014-11-20 12:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Linus Torvalds, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Hello,

On Thu, Nov 20, 2014 at 12:50:36AM +0100, Frederic Weisbecker wrote:
> > Are we talking about different per cpu allocators here or am I missing
> > something completely non obvious?
> 
> That's the same allocator yeah. So if the whole memory is dereferenced,
> faults shouldn't happen indeed.
> 
> Maybe that was a bug a few years ago but not anymore.

It has always been like that tho.  Percpu memory given out is always
populated and cleared.

> Is it possible that, somehow, some part isn't zeroed by pcpu_alloc()?
> After all it's allocated with vzalloc() so that part could be skipped. The memset(0)

The vzalloc call is for the internal allocation bitmap, not the actual
percpu memory area.  The actual address areas for percpu memory are
obtained with a pcpu_get_vm_areas() call and later get populated using
map_kernel_range_noflush() (the flush is performed after the mapping is
complete).

Trying to remember what happens with vmalloc_fault().  Ah okay, so
when a new PUD gets created for the vmalloc area, we don't go through
all PGDs and update them.  The PGD entries get faulted in lazily.
Whether the percpu memory allocator clears the allocated area or not
doesn't have anything to do with it.  The memory area is always fully
populated in the kernel page table.  It's just that the population
happened while a different PGD was active and this PGD hasn't been
populated with the new PUD yet.

So, yeap, vmalloc_fault() can always happen when accessing vmalloc
areas and the only way to avoid that would be removing lazy PGD
population - going through all PGDs and populating new PUDs
immediately.
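
For reference, eager population would look roughly like this (a sketch only,
modeled on what sync_global_pgds() already does for the direct mapping on
x86_64; pgd_lock/pgd_list are the existing x86 globals, and the real thing
would also take each pgd page's page-table lock):

    /* Sketch: after a new PUD is hooked into init_mm for a vmalloc address,
     * copy that PGD entry into every pgd in the system so vmalloc_fault()
     * is never needed for it. */
    static void sync_vmalloc_pgd(unsigned long addr)
    {
            pgd_t *pgd_ref = pgd_offset_k(addr);
            struct page *page;

            if (pgd_none(*pgd_ref))
                    return;

            spin_lock(&pgd_lock);
            list_for_each_entry(page, &pgd_list, lru) {
                    pgd_t *pgd = (pgd_t *)page_address(page) + pgd_index(addr);

                    if (pgd_none(*pgd))
                            set_pgd(pgd, *pgd_ref);
            }
            spin_unlock(&pgd_lock);
    }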

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19  5:15                               ` Dave Jones
@ 2014-11-20 14:36                                 ` Frederic Weisbecker
  0 siblings, 0 replies; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-20 14:36 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers

On Wed, Nov 19, 2014 at 12:15:24AM -0500, Dave Jones wrote:
> On Tue, Nov 18, 2014 at 08:40:55PM -0800, Linus Torvalds wrote:
> 
>  > Hmm, if we are getting soft-lockups here, maybe it suggest too much exit-work.
>  > 
>  > Some TIF_NOHZ loop, perhaps? You have nohz on, don't you?
>  > 
>  > That makes me wonder: does the problem go away if you disable NOHZ?
> 
> Does nohz=off do enough ? I couldn't convince myself after looking at
> dmesg, and still seeing dynticks stuff in there.
> 
> I'll do a rebuild with all the CONFIG_NO_HZ stuff off, though it also changes
> some other config stuff wrt timers.

You also need to disable context tracking. So you also need to deactivate
CONFIG_RCU_USER_QS and CONFIG_CONTEXT_TRACKING_FORCE, and make sure
nothing else is turning on CONFIG_CONTEXT_TRACKING.

You can keep CONFIG_NO_HZ_IDLE though, just not CONFIG_NO_HZ_FULL.
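
Concretely, assuming the 3.18 option names, the resulting .config fragment
should end up looking something like this (CONTEXT_TRACKING itself is a
selected symbol, so it goes away once nothing selects it):

    CONFIG_NO_HZ_IDLE=y
    # CONFIG_NO_HZ_FULL is not set
    # CONFIG_RCU_USER_QS is not set
    # CONFIG_CONTEXT_TRACKING_FORCE is not set
    # CONFIG_CONTEXT_TRACKING is not set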

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 14:59                               ` Dave Jones
  2014-11-19 17:22                                 ` Linus Torvalds
  2014-11-19 21:01                                 ` Andy Lutomirski
@ 2014-11-20 15:04                                 ` Frederic Weisbecker
  2 siblings, 0 replies; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-20 15:04 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra

On Wed, Nov 19, 2014 at 09:59:02AM -0500, Dave Jones wrote:
> On Tue, Nov 18, 2014 at 08:40:55PM -0800, Linus Torvalds wrote:
>  > On Tue, Nov 18, 2014 at 6:19 PM, Dave Jones <davej@redhat.com> wrote:
>  > >
>  > > NMI watchdog: BUG: soft lockup - CPU#2 stuck for 21s! [trinity-c42:31480]
>  > > CPU: 2 PID: 31480 Comm: trinity-c42 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 9/411 32140]
>  > > RIP: 0010:[<ffffffff8a1798b4>]  [<ffffffff8a1798b4>] context_tracking_user_enter+0xa4/0x190
>  > > Call Trace:
>  > >  [<ffffffff8a012fc5>] syscall_trace_leave+0xa5/0x160
>  > >  [<ffffffff8a7d8624>] int_check_syscall_exit_work+0x34/0x3d
>  > 
>  > Hmm, if we are getting soft-lockups here, maybe it suggest too much exit-work.
>  > 
>  > Some TIF_NOHZ loop, perhaps? You have nohz on, don't you?
>  > 
>  > That makes me wonder: does the problem go away if you disable NOHZ?
> 
> Aparently not.
> 
> NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c75:25175]
> CPU: 3 PID: 25175 Comm: trinity-c75 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 9/410 27945]
> task: ffff8800364e44d0 ti: ffff880192d2c000 task.ti: ffff880192d2c000
> RIP: 0010:[<ffffffff94175be7>]  [<ffffffff94175be7>] context_tracking_user_exit+0x57/0x120
> RSP: 0018:ffff880192d2fee8  EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000100000046 RCX: 000000336ee35b47
> RDX: 0000000000000001 RSI: ffffffff94ac1e84 RDI: ffffffff94a93725
> RBP: ffff880192d2fef8 R08: 00007f9b74d0b740 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff940d8503
> R13: ffff880192d2fe98 R14: ffffffff943884e7 R15: ffff880192d2fe48
> FS:  00007f9b74d0b740(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000336f1b7740 CR3: 0000000229a95000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
>  ffff880192d30000 0000000000080000 ffff880192d2ff78 ffffffff94012c25
>  00007f9b747a5000 00007f9b747a5068 0000000000000000 0000000000000000
>  0000000000000000 ffffffff9437b3be 0000000000000000 0000000000000000
> Call Trace:
>  [<ffffffff94012c25>] syscall_trace_enter_phase1+0x125/0x1a0
>  [<ffffffff9437b3be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff947d41bf>] tracesys+0x14/0x4a
> Code: 42 fd ff 48 c7 c7 7a 1e ac 94 e8 25 29 21 00 65 8b 04 25 34 f7 1c 00 83 f8 01 74 28 f6 c7 02 74 13 0f 1f 00 e8 bb 43 fd ff 53 9d <5b> 41 5c 5d c3 0f 1f 40 00 53 9d e8 89 42 fd ff eb ee 0f 1f 80 
> sending NMI to other CPUs:
> NMI backtrace for cpu 1
> CPU: 1 PID: 25164 Comm: trinity-c64 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 9/410 27945]
> task: ffff88011600dbc0 ti: ffff8801a99a4000 task.ti: ffff8801a99a4000
> RIP: 0010:[<ffffffff940fb71e>]  [<ffffffff940fb71e>] generic_exec_single+0xee/0x1a0
> RSP: 0018:ffff8801a99a7d18  EFLAGS: 00000202
> RAX: 0000000000000000 RBX: ffff8801a99a7d20 RCX: 0000000000000038
> RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
> RBP: ffff8801a99a7d78 R08: ffff880242b57ce0 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
> R13: 0000000000000001 R14: ffff880083c28948 R15: ffffffff94166aa0
> FS:  00007f9b74d0b740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000001 CR3: 00000001d8611000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
>  ffff8801a99a7d28 0000000000000000 ffffffff94166aa0 ffff880083c28948
>  0000000000000003 00000000e38f9aac ffff880083c28948 00000000ffffffff
>  0000000000000003 ffffffff94166aa0 ffff880083c28948 0000000000000001
> Call Trace:
>  [<ffffffff94166aa0>] ? perf_swevent_add+0x120/0x120
>  [<ffffffff94166aa0>] ? perf_swevent_add+0x120/0x120
>  [<ffffffff940fb89a>] smp_call_function_single+0x6a/0xe0

One thing that happens a lot in your crashes is a CPU sending IPIs. Maybe
it's stuck polling on csd->lock or something. But it's not the CPU that soft
locks up. At least not the first one that gets reported.

>  [<ffffffff940a172b>] ? preempt_count_sub+0x7b/0x100
>  [<ffffffff941671aa>] perf_event_read+0xca/0xd0
>  [<ffffffff94167240>] perf_event_read_value+0x90/0xe0
>  [<ffffffff941689c6>] perf_read+0x226/0x370
>  [<ffffffff942fbfb7>] ? security_file_permission+0x87/0xa0
>  [<ffffffff941eafff>] vfs_read+0x9f/0x180
>  [<ffffffff941ebbd8>] SyS_read+0x58/0xd0
>  [<ffffffff947d42c9>] tracesys_phase2+0xd4/0xd9

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-17 17:03         ` Dave Jones
  2014-11-17 19:59           ` Linus Torvalds
@ 2014-11-20 15:08           ` Frederic Weisbecker
  2014-11-20 16:19             ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-20 15:08 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Mon, Nov 17, 2014 at 12:03:59PM -0500, Dave Jones wrote:
> On Sat, Nov 15, 2014 at 10:33:19PM -0800, Linus Torvalds wrote:
>  
>  > >  > I'll try that next, and check in on it tomorrow.
>  > >
>  > > No luck. Died even faster this time.
>  > 
>  > Yeah, and your other lockups haven't even been TLB related. Not that
>  > they look like anything else *either*.
>  > 
>  > I have no ideas left. I'd go for a bisection - rather than try random
>  > things, at least bisection will get us a smaller set of suspects if
>  > you can go through a few cycles of it. Even if you decide that you
>  > want to run for most of a day before you are convinced it's all good,
>  > a couple of days should get you a handful of bisection points (that's
>  > assuming you hit a couple of bad ones too that turn bad in a shorter
>  > while). And 4 or five bisections should get us from 11k commits down
>  > to the ~600 commit range. That would be a huge improvement.
> 
> Great start to the week: I decided to confirm my recollection that .17
> was ok, only to hit this within 10 minutes.
> 
> Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
> CPU: 3 PID: 17176 Comm: trinity-c95 Not tainted 3.17.0+ #87
>  0000000000000000 00000000f3a61725 ffff880244606bf0 ffffffff9583e9fa
>  ffffffff95c67918 ffff880244606c78 ffffffff9583bcc0 0000000000000010
>  ffff880244606c88 ffff880244606c20 00000000f3a61725 0000000000000000
> Call Trace:
>  <NMI>  [<ffffffff9583e9fa>] dump_stack+0x4e/0x7a
>  [<ffffffff9583bcc0>] panic+0xd4/0x207
>  [<ffffffff95150908>] watchdog_overflow_callback+0x118/0x120
>  [<ffffffff95193dbe>] __perf_event_overflow+0xae/0x340
>  [<ffffffff95192230>] ? perf_event_task_disable+0xa0/0xa0
>  [<ffffffff9501a7bf>] ? x86_perf_event_set_period+0xbf/0x150
>  [<ffffffff95194be4>] perf_event_overflow+0x14/0x20
>  [<ffffffff95020676>] intel_pmu_handle_irq+0x206/0x410
>  [<ffffffff9501966b>] perf_event_nmi_handler+0x2b/0x50
>  [<ffffffff95007bb2>] nmi_handle+0xd2/0x390
>  [<ffffffff95007ae5>] ? nmi_handle+0x5/0x390
>  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
>  [<ffffffff950080a2>] default_do_nmi+0x72/0x1c0
>  [<ffffffff950082a8>] do_nmi+0xb8/0x100
>  [<ffffffff9584b9aa>] end_repeat_nmi+0x1e/0x2e
>  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
>  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
>  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
>  <<EOE>>  <IRQ>  [<ffffffff95101685>] lock_hrtimer_base.isra.18+0x25/0x50
>  [<ffffffff951019d3>] hrtimer_try_to_cancel+0x33/0x1f0

Ah that one got fixed in the merge window and in -stable, right?

>  [<ffffffff95101baa>] hrtimer_cancel+0x1a/0x30
>  [<ffffffff95113557>] tick_nohz_restart+0x17/0x90
>  [<ffffffff95114533>] __tick_nohz_full_check+0xc3/0x100
>  [<ffffffff9511457e>] nohz_full_kick_work_func+0xe/0x10
>  [<ffffffff95188894>] irq_work_run_list+0x44/0x70
>  [<ffffffff951888ea>] irq_work_run+0x2a/0x50
>  [<ffffffff9510109b>] update_process_times+0x5b/0x70
>  [<ffffffff95113325>] tick_sched_handle.isra.20+0x25/0x60
>  [<ffffffff95113801>] tick_sched_timer+0x41/0x60
>  [<ffffffff95102281>] __run_hrtimer+0x81/0x480
>  [<ffffffff951137c0>] ? tick_sched_do_timer+0xb0/0xb0
>  [<ffffffff95102977>] hrtimer_interrupt+0x117/0x270
>  [<ffffffff950346d7>] local_apic_timer_interrupt+0x37/0x60
>  [<ffffffff9584c44f>] smp_apic_timer_interrupt+0x3f/0x50
>  [<ffffffff9584a86f>] apic_timer_interrupt+0x6f/0x80
>  <EOI>  [<ffffffff950d3f3a>] ? lock_release_holdtime.part.28+0x9a/0x160
>  [<ffffffff950ef3b7>] ? rcu_is_watching+0x27/0x60
>  [<ffffffff9508cb75>] kill_pid_info+0xf5/0x130
>  [<ffffffff9508ca85>] ? kill_pid_info+0x5/0x130
>  [<ffffffff9508ccd3>] SYSC_kill+0x103/0x330
>  [<ffffffff9508cc7c>] ? SYSC_kill+0xac/0x330
>  [<ffffffff9519b592>] ? context_tracking_user_exit+0x52/0x1a0
>  [<ffffffff950d6f1d>] ? trace_hardirqs_on_caller+0x16d/0x210
>  [<ffffffff950d6fcd>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff950137ad>] ? syscall_trace_enter+0x14d/0x330
>  [<ffffffff9508f44e>] SyS_kill+0xe/0x10
>  [<ffffffff95849b24>] tracesys+0xdd/0xe2
> Kernel Offset: 0x14000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> 
> It could a completely different cause for lockup, but seeing this now
> has me wondering if perhaps it's something unrelated to the kernel.
> I have recollection of running late .17rc's for days without incident,
> and I'm pretty sure .17 was ok too.  But a few weeks ago I did upgrade
> that test box to the Fedora 21 beta.  Which means I have a new gcc.
> I'm not sure I really trust 4.9.1 yet, so maybe I'll see if I can
> get 4.8 back on there and see if that's any better.
> 
> 	Dave
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 21:01                                 ` Andy Lutomirski
  2014-11-19 21:47                                   ` Dave Jones
  2014-11-19 21:56                                   ` [PATCH] x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1 Andy Lutomirski
@ 2014-11-20 15:25                                   ` Dave Jones
  2014-11-20 19:43                                     ` Linus Torvalds
  2014-11-25 12:22                                     ` Will Deacon
  2 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-20 15:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Don Zickus, Thomas Gleixner, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra

On Wed, Nov 19, 2014 at 01:01:36PM -0800, Andy Lutomirski wrote:
 
 > TIF_NOHZ is not the same thing as NOHZ.  Can you try a kernel with
 > CONFIG_CONTEXT_TRACKING=n?  Doing that may involve fiddling with RCU
 > settings a bit.  The normal no HZ idle stuff has nothing to do with
 > TIF_NOHZ, and you either have TIF_NOHZ set or you have some kind of
 > thread_info corruption going on here.

Disabling CONTEXT_TRACKING didn't change the problem.
Unfortunately the full trace didn't make it over usb-serial this time. Grr.

Here's what came over serial..

NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [trinity-c35:11634]
CPU: 2 PID: 11634 Comm: trinity-c35 Not tainted 3.18.0-rc5+ #94 [loadavg: 164.79 157.30 155.90 37/409 11893]
task: ffff88014e0d96f0 ti: ffff880220eb4000 task.ti: ffff880220eb4000
RIP: 0010:[<ffffffff88379605>]  [<ffffffff88379605>] copy_user_enhanced_fast_string+0x5/0x10
RSP: 0018:ffff880220eb7ef0  EFLAGS: 00010283
RAX: ffff880220eb4000 RBX: ffffffff887dac64 RCX: 0000000000006a18
RDX: 000000000000e02f RSI: 00007f766f466620 RDI: ffff88016f6a7617
RBP: ffff880220eb7f78 R08: 8000000000000063 R09: 0000000000000004
R10: 0000000000000010 R11: 0000000000000000 R12: ffffffff880bf50d
R13: 0000000000000001 R14: ffff880220eb4000 R15: 0000000000000001
FS:  00007f766f459740(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f766f461000 CR3: 000000018b00e000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffffffff882f4225 ffff880183db5a00 0000000001743440 00007f766f0fb000
 fffffffffffffeff 0000000000000000 0000000000008d79 00007f766f45f000
 ffffffff8837adae 00ff880220eb7f38 000000003203f1ac 0000000000000001
Call Trace:
 [<ffffffff882f4225>] ? SyS_add_key+0xd5/0x240
 [<ffffffff8837adae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff887da092>] system_call_fastpath+0x12/0x17
Code: 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f 1f 00 89 d1 <f3> a4 31 c0 0f 1f 00 c3 90 90 90 0f 1f 00 83 fa 08 0f 82 95 00 
sending NMI to other CPUs:


Here's a crappy phonecam pic of the screen. 
http://codemonkey.org.uk/junk/IMG_4311.jpg
There's a bit of trace missing between the above and what was on
the screen, so we missed some CPUs.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-16  1:40     ` Dave Jones
  2014-11-16  6:33       ` Linus Torvalds
@ 2014-11-20 15:28       ` Frederic Weisbecker
  1 sibling, 0 replies; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-20 15:28 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel,
	the arch/x86 maintainers, Andi Lutomirski

On Sat, Nov 15, 2014 at 08:40:06PM -0500, Dave Jones wrote:
> On Sat, Nov 15, 2014 at 04:34:05PM -0500, Dave Jones wrote:
>  > On Fri, Nov 14, 2014 at 02:01:27PM -0800, Linus Torvalds wrote:
>  > 
>  >  > But since you say "several times a day", just for fun, can you test
>  >  > the follow-up patch to that one-liner fix that Will Deacon posted
>  >  > today (Subject: "[PATCH] mmu_gather: move minimal range calculations
>  >  > into generic code"). That does some further cleanup in this area.
>  > 
>  > A few hours ago it hit the NMI watchdog again with that patch applied.
>  > Incomplete trace, but it looks different based on what did make it over.
>  > Different RIP at least.
>  > 
>  > [65155.054155] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c127:12559]
>  > [65155.054573] irq event stamp: 296752
>  > [65155.054589] hardirqs last  enabled at (296751): [<ffffffff9d87403d>] _raw_spin_unlock_irqrestore+0x5d/0x80
>  > [65155.054625] hardirqs last disabled at (296752): [<ffffffff9d875cea>] apic_timer_interrupt+0x6a/0x80
>  > [65155.054657] softirqs last  enabled at (296188): [<ffffffff9d259943>] bdi_queue_work+0x83/0x270
>  > [65155.054688] softirqs last disabled at (296184): [<ffffffff9d259920>] bdi_queue_work+0x60/0x270
>  > [65155.054721] CPU: 1 PID: 12559 Comm: trinity-c127 Not tainted 3.18.0-rc4+ #84 [loadavg: 209.68 187.90 185.33 34/431 17515]
>  > [65155.054795] task: ffff88023f664680 ti: ffff8801649f0000 task.ti: ffff8801649f0000
>  > [65155.054820] RIP: 0010:[<ffffffff9d87403f>]  [<ffffffff9d87403f>] _raw_spin_unlock_irqrestore+0x5f/0x80
>  > [65155.054852] RSP: 0018:ffff8801649f3be8  EFLAGS: 00000292
>  > [65155.054872] RAX: ffff88023f664680 RBX: 0000000000000007 RCX: 0000000000000007
>  > [65155.054895] RDX: 00000000000029e0 RSI: ffff88023f664ea0 RDI: ffff88023f664680
>  > [65155.054919] RBP: ffff8801649f3bf8 R08: 0000000000000000 R09: 0000000000000000
>  > [65155.055956] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>  > [65155.056985] R13: ffff8801649f3b58 R14: ffffffff9d3e7d0e R15: 00000000000003e0
>  > [65155.058037] FS:  00007f0dc957c700(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
>  > [65155.059083] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  > [65155.060121] CR2: 00007f0dc958e000 CR3: 000000022f31e000 CR4: 00000000001407e0
>  > [65155.061152] DR0: 00007f54162bc000 DR1: 00007feb92c3d000 DR2: 0000000000000000
>  > [65155.062180] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
>  > [65155.063202] Stack:
>  > 
>  > And that's all she wrote.
>  > 
>  >  > If Will's patch doesn't make a difference, what about reverting that
>  >  > ce9ec37bddb6? Although it really *is* a "obvious bugfix", and I really
>  >  > don't see why any of this would be noticeable on x86 (it triggered
>  >  > issues on ARM64, but that was because ARM64 cared much more about the
>  >  > exact range).
>  > 
>  > I'll try that next, and check in on it tomorrow.
> 
> No luck. Died even faster this time.
> 
> [  772.459481] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [modprobe:31400]
> [  772.459858] irq event stamp: 3362
> [  772.459872] hardirqs last  enabled at (3361): [<ffffffff941a437c>] context_tracking_user_enter+0x9c/0x2c0
> [  772.459907] hardirqs last disabled at (3362): [<ffffffff94875bea>] apic_timer_interrupt+0x6a/0x80
> [  772.459937] softirqs last  enabled at (0): [<ffffffff940764d5>] copy_process.part.26+0x635/0x1d80
> [  772.459968] softirqs last disabled at (0): [<          (null)>]           (null)
> [  772.459996] CPU: 3 PID: 31400 Comm: modprobe Not tainted 3.18.0-rc4+ #85 [loadavg: 207.70 163.33 92.64 11/433 31547]
> [  772.460086] task: ffff88022f0b2f00 ti: ffff88019a944000 task.ti: ffff88019a944000
> [  772.460110] RIP: 0010:[<ffffffff941a437e>]  [<ffffffff941a437e>] context_tracking_user_enter+0x9e/0x2c0
> [  772.460142] RSP: 0018:ffff88019a947f00  EFLAGS: 00000282
> [  772.460161] RAX: ffff88022f0b2f00 RBX: 0000000000000000 RCX: 0000000000000000
> [  772.460184] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88022f0b2f00
> [  772.460207] RBP: ffff88019a947f10 R08: 0000000000000000 R09: 0000000000000000
> [  772.460229] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88019a947e90
> [  772.460252] R13: ffffffff940f6d04 R14: ffff88019a947ec0 R15: ffff8802447cd640
> [  772.460294] FS:  00007f3b71ee4700(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
> [  772.460362] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  772.460391] CR2: 00007fffdad5af58 CR3: 000000011608e000 CR4: 00000000001407e0
> [  772.460424] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  772.460447] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  772.460470] Stack:
> [  772.460480]  ffff88019a947f58 00000000006233a8 ffff88019a947f40 ffffffff9401429d
> [  772.460512]  00000000006233a8 000000000041d68a 00000000006233a8 0000000000000000
> [  772.460543]  00000000006233a0 ffffffff94874fa4 000000001008feff 000507d93d73a434
> [  772.460574] Call Trace:
> [  772.461576]  [<ffffffff9401429d>] syscall_trace_leave+0xad/0x2e0
> [  772.462572]  [<ffffffff94874fa4>] int_check_syscall_exit_work+0x34/0x3d
> [  772.463575] Code: f8 1c 00 84 c0 75 46 48 c7 c7 51 53 cd 94 e8 aa 23 24 00 65 c7 04 25 f4 f8 1c 00 01 00 00 00 f6 c7 02 74 19 e8 84 43 f3 ff 53 9d <5b> 41 5c 5d c3 0f 1f 44 00 00 c3 0f 1f 80 00 00 00 00 53 9d e8 
> [  772.465797] Kernel panic - not syncing: softlockup: hung tasks
> [  772.466821] CPU: 3 PID: 31400 Comm: modprobe Tainted: G             L 3.18.0-rc4+ #85 [loadavg: 207.70 163.33 92.64 11/433 31547]
> [  772.468915]  ffff88022f0b2f00 00000000de65d5f5 ffff880244603dc8 ffffffff94869e01
> [  772.470031]  0000000000000000 ffffffff94c7599b ffff880244603e48 ffffffff94866b21
> [  772.471085]  ffff880200000008 ffff880244603e58 ffff880244603df8 00000000de65d5f5
> [  772.472141] Call Trace:
> [  772.473183]  <IRQ>  [<ffffffff94869e01>] dump_stack+0x4f/0x7c
> [  772.474253]  [<ffffffff94866b21>] panic+0xcf/0x202
> [  772.475346]  [<ffffffff94154d1e>] watchdog_timer_fn+0x27e/0x290
> [  772.476414]  [<ffffffff94106297>] __run_hrtimer+0xe7/0x740
> [  772.477475]  [<ffffffff94106b64>] ? hrtimer_interrupt+0x94/0x270
> [  772.478555]  [<ffffffff94154aa0>] ? watchdog+0x40/0x40
> [  772.479627]  [<ffffffff94106be7>] hrtimer_interrupt+0x117/0x270
> [  772.480703]  [<ffffffff940303db>] local_apic_timer_interrupt+0x3b/0x70
> [  772.481777]  [<ffffffff948777f3>] smp_apic_timer_interrupt+0x43/0x60
> [  772.482856]  [<ffffffff94875bef>] apic_timer_interrupt+0x6f/0x80
> [  772.483915]  <EOI>  [<ffffffff941a437e>] ? context_tracking_user_enter+0x9e/0x2c0
> [  772.484972]  [<ffffffff9401429d>] syscall_trace_leave+0xad/0x2e0

It looks like we are looping somewhere around syscall_trace_leave(). Maybe the
TIF_WORK_SYSCALL_EXIT flags aren't cleared properly after some of them get
processed. Or something keeps setting a TIF_WORK_SYSCALL_EXIT flag after they
get cleared and we loop endlessly, jumping back to int_check_syscall_exit_work().
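
Schematically, the suspected loop (the real code is the int_check_syscall_exit_work
path in entry_64.S assembly, not C; this is only to illustrate how a sticky work
flag would spin forever):

    while (current_thread_info()->flags & _TIF_WORK_SYSCALL_EXIT)
            syscall_trace_leave(regs);   /* expected to clear the work bits */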

Andi did some work there lately. Cc'ing him.

> [  772.486042]  [<ffffffff94874fa4>] int_check_syscall_exit_work+0x34/0x3d
> [  772.487187] Kernel Offset: 0x13000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> 
> 
> 	Dave
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-19 16:28                                   ` Vivek Goyal
@ 2014-11-20 16:10                                     ` Dave Jones
  2014-11-20 16:48                                       ` Vivek Goyal
  2014-11-20 16:54                                       ` Vivek Goyal
  0 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-20 16:10 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Don Zickus, Thomas Gleixner, Linus Torvalds, Linux Kernel,
	the arch/x86 maintainers, WANG Chao, Baoquan He, Dave Young

On Wed, Nov 19, 2014 at 11:28:06AM -0500, Vivek Goyal wrote:
 
 > I am wondering may be in some cases we panic in second kernel and sit
 > there. Probably we should append a kernel command line automatically
 > say "panic=1" so that it reboots itself if second kernel panics.
 > 
 > By any chance, have you enabled "CONFIG_RANDOMIZE_BASE"? If yes, please
 > disable that as currently kexec/kdump stuff does not work with it. And
 > it hangs very early in the boot process and I had to hook serial console
 > to get following message on console.

I did have that enabled. (Perhaps the kconfig should conflict?)

After rebuilding without it, this..

 > > dracut: *** Stripping files done ***
 > > dracut: *** Store current command line parameters ***
 > > dracut: *** Creating image file ***
 > > dracut: *** Creating image file done ***
 > > kdumpctl: cat: write error: Broken pipe
 > > kdumpctl: kexec: failed to load kdump kernel
 > > kdumpctl: Starting kdump: [FAILED]
 
went away. It generated the image, and things looked good.
I did echo c > /proc/sysrq-trigger and got this..

SysRq : Trigger a crash
BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1192
in_atomic(): 0, irqs_disabled(): 0, pid: 8860, name: bash
3 locks held by bash/8860:
 #0:  (sb_writers#5){......}, at: [<ffffffff811eac13>] vfs_write+0x1b3/0x1f0
 #1:  (rcu_read_lock){......}, at: [<ffffffff8144a435>] __handle_sysrq+0x5/0x1b0
 #2:  (&mm->mmap_sem){......}, at: [<ffffffff8103cb20>] __do_page_fault+0x140/0x600
Preemption disabled at:[<ffffffff817ca332>] printk+0x5c/0x72

CPU: 1 PID: 8860 Comm: bash Not tainted 3.18.0-rc5+ #95 [loadavg: 0.54 0.24 0.09 2/143 8909]
 00000000000004a8 00000000e1f75c1b ffff880236473c28 ffffffff817ce5c7
 0000000000000000 0000000000000000 ffff880236473c58 ffffffff8109af8a
 ffff880236473c58 0000000000000029 0000000000000000 ffff880236473d88
Call Trace:
 [<ffffffff817ce5c7>] dump_stack+0x4f/0x7c
 [<ffffffff8109af8a>] __might_sleep+0x12a/0x190
 [<ffffffff8103cb3b>] __do_page_fault+0x15b/0x600
 [<ffffffff811613b2>] ? irq_work_queue+0x62/0xd0
 [<ffffffff8137ad7d>] ? trace_hardirqs_off_thunk+0x3a/0x3f
 [<ffffffff8103cfec>] do_page_fault+0xc/0x10
 [<ffffffff817dbcf2>] page_fault+0x22/0x30
 [<ffffffff817ca332>] ? printk+0x5c/0x72
 [<ffffffff81449ce6>] ? sysrq_handle_crash+0x16/0x20
 [<ffffffff8144a567>] __handle_sysrq+0x137/0x1b0
 [<ffffffff8144a435>] ? __handle_sysrq+0x5/0x1b0
 [<ffffffff8144aa4a>] write_sysrq_trigger+0x4a/0x50
 [<ffffffff81259f2d>] proc_reg_write+0x3d/0x80
 [<ffffffff811eab1a>] vfs_write+0xba/0x1f0
 [<ffffffff811eb628>] SyS_write+0x58/0xd0
 [<ffffffff817da052>] system_call_fastpath+0x12/0x17
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 8860 Comm: bash Not tainted 3.18.0-rc5+ #95 [loadavg: 0.54 0.24 0.09 1/143 8909]
task: ffff8800a1a60000 ti: ffff880236470000 task.ti: ffff880236470000
RIP: 0010:[<ffffffff81449ce6>]  [<ffffffff81449ce6>] sysrq_handle_crash+0x16/0x20
RSP: 0018:ffff880236473e38  EFLAGS: 00010246
RAX: 000000000000000f RBX: ffffffff81cb4a00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff817ca332 RDI: 0000000000000063
RBP: ffff880236473e38 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000358 R11: 0000000000000357 R12: 0000000000000063
R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000
FS:  00007fc652f4e740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000023a3b2000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffff880236473e78 ffffffff8144a567 ffffffff8144a435 0000000000000002
 0000000000000002 00007fc652f51000 0000000000000002 ffff880236473f48
 ffff880236473ea8 ffffffff8144aa4a 0000000000000002 00007fc652f51000
Call Trace:
 [<ffffffff8144a567>] __handle_sysrq+0x137/0x1b0
 [<ffffffff8144a435>] ? __handle_sysrq+0x5/0x1b0
 [<ffffffff8144aa4a>] write_sysrq_trigger+0x4a/0x50
 [<ffffffff81259f2d>] proc_reg_write+0x3d/0x80
 [<ffffffff811eab1a>] vfs_write+0xba/0x1f0
 [<ffffffff811eb628>] SyS_write+0x58/0xd0
 [<ffffffff817da052>] system_call_fastpath+0x12/0x17
Code: 01 f4 45 39 a5 b4 00 00 00 75 e2 4c 89 ef e8 d2 f7 ff ff eb d8 0f 1f 44 00 00 55 c7 05 08 b7 7e 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 48 89 e5 
RIP  [<ffffffff81449ce6>] sysrq_handle_crash+0x16/0x20
 RSP <ffff880236473e38>
CR2: 0000000000000000

Which, aside from the sleeping-while-atomic thing (which isn't important),
does what I expected.  Shortly afterwards, it rebooted.

And then /var/crash was empty.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 15:08           ` Frederic Weisbecker
@ 2014-11-20 16:19             ` Dave Jones
  2014-11-20 16:42               ` Frederic Weisbecker
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-20 16:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Thu, Nov 20, 2014 at 04:08:00PM +0100, Frederic Weisbecker wrote:
 
 > > Great start to the week: I decided to confirm my recollection that .17
 > > was ok, only to hit this within 10 minutes.
 > > 
 > > Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
 > > CPU: 3 PID: 17176 Comm: trinity-c95 Not tainted 3.17.0+ #87
 > >  0000000000000000 00000000f3a61725 ffff880244606bf0 ffffffff9583e9fa
 > >  ffffffff95c67918 ffff880244606c78 ffffffff9583bcc0 0000000000000010
 > >  ffff880244606c88 ffff880244606c20 00000000f3a61725 0000000000000000
 > > Call Trace:
 > >  <NMI>  [<ffffffff9583e9fa>] dump_stack+0x4e/0x7a
 > >  [<ffffffff9583bcc0>] panic+0xd4/0x207
 > >  [<ffffffff95150908>] watchdog_overflow_callback+0x118/0x120
 > >  [<ffffffff95193dbe>] __perf_event_overflow+0xae/0x340
 > >  [<ffffffff95192230>] ? perf_event_task_disable+0xa0/0xa0
 > >  [<ffffffff9501a7bf>] ? x86_perf_event_set_period+0xbf/0x150
 > >  [<ffffffff95194be4>] perf_event_overflow+0x14/0x20
 > >  [<ffffffff95020676>] intel_pmu_handle_irq+0x206/0x410
 > >  [<ffffffff9501966b>] perf_event_nmi_handler+0x2b/0x50
 > >  [<ffffffff95007bb2>] nmi_handle+0xd2/0x390
 > >  [<ffffffff95007ae5>] ? nmi_handle+0x5/0x390
 > >  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
 > >  [<ffffffff950080a2>] default_do_nmi+0x72/0x1c0
 > >  [<ffffffff950082a8>] do_nmi+0xb8/0x100
 > >  [<ffffffff9584b9aa>] end_repeat_nmi+0x1e/0x2e
 > >  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
 > >  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
 > >  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
 > >  <<EOE>>  <IRQ>  [<ffffffff95101685>] lock_hrtimer_base.isra.18+0x25/0x50
 > >  [<ffffffff951019d3>] hrtimer_try_to_cancel+0x33/0x1f0
 > 
 > Ah that one got fixed in the merge window and in -stable, right?
 
If that's true, that changes everything, and this might be more
bisectable.  I did the test above on 3.17, but perhaps I should
try a run on 3.17.3.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 16:19             ` Dave Jones
@ 2014-11-20 16:42               ` Frederic Weisbecker
  0 siblings, 0 replies; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-20 16:42 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Thu, Nov 20, 2014 at 11:19:25AM -0500, Dave Jones wrote:
> On Thu, Nov 20, 2014 at 04:08:00PM +0100, Frederic Weisbecker wrote:
>  
>  > > Great start to the week: I decided to confirm my recollection that .17
>  > > was ok, only to hit this within 10 minutes.
>  > > 
>  > > Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
>  > > CPU: 3 PID: 17176 Comm: trinity-c95 Not tainted 3.17.0+ #87
>  > >  0000000000000000 00000000f3a61725 ffff880244606bf0 ffffffff9583e9fa
>  > >  ffffffff95c67918 ffff880244606c78 ffffffff9583bcc0 0000000000000010
>  > >  ffff880244606c88 ffff880244606c20 00000000f3a61725 0000000000000000
>  > > Call Trace:
>  > >  <NMI>  [<ffffffff9583e9fa>] dump_stack+0x4e/0x7a
>  > >  [<ffffffff9583bcc0>] panic+0xd4/0x207
>  > >  [<ffffffff95150908>] watchdog_overflow_callback+0x118/0x120
>  > >  [<ffffffff95193dbe>] __perf_event_overflow+0xae/0x340
>  > >  [<ffffffff95192230>] ? perf_event_task_disable+0xa0/0xa0
>  > >  [<ffffffff9501a7bf>] ? x86_perf_event_set_period+0xbf/0x150
>  > >  [<ffffffff95194be4>] perf_event_overflow+0x14/0x20
>  > >  [<ffffffff95020676>] intel_pmu_handle_irq+0x206/0x410
>  > >  [<ffffffff9501966b>] perf_event_nmi_handler+0x2b/0x50
>  > >  [<ffffffff95007bb2>] nmi_handle+0xd2/0x390
>  > >  [<ffffffff95007ae5>] ? nmi_handle+0x5/0x390
>  > >  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
>  > >  [<ffffffff950080a2>] default_do_nmi+0x72/0x1c0
>  > >  [<ffffffff950082a8>] do_nmi+0xb8/0x100
>  > >  [<ffffffff9584b9aa>] end_repeat_nmi+0x1e/0x2e
>  > >  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
>  > >  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
>  > >  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
>  > >  <<EOE>>  <IRQ>  [<ffffffff95101685>] lock_hrtimer_base.isra.18+0x25/0x50
>  > >  [<ffffffff951019d3>] hrtimer_try_to_cancel+0x33/0x1f0
>  > 
>  > Ah that one got fixed in the merge window and in -stable, right?
>  
> If that's true, that changes everything, and this might be more
> bisectable.  I did the test above on 3.17, but perhaps I should
> try a run on 3.17.3

It might not be easier to bisect because stable is a seperate branch than the next -rc1.
And that above got fixed in -rc1, perhaps in the same merge window where the new different
issues were introduced. So you'll probably need to shutdown the above issue in order to
bisect the others.

What you can do is bisect and, before every build, apply the patches that
fix the above issue in -stable, the ones I just enumerated to gregkh in our
discussion with him. There are only 4. Just try to apply all of them before each
build, unless they are already there.

I could give you a much simpler hack, but I fear it may apply chaotically depending on
whether the real fixes are applied fully, halfway or not at all, with unpredictable results.
So let's rather stick to what we know works.

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 16:10                                     ` Dave Jones
@ 2014-11-20 16:48                                       ` Vivek Goyal
  2014-11-20 17:38                                         ` Dave Jones
  2014-11-20 16:54                                       ` Vivek Goyal
  1 sibling, 1 reply; 486+ messages in thread
From: Vivek Goyal @ 2014-11-20 16:48 UTC (permalink / raw)
  To: Dave Jones, Don Zickus, Thomas Gleixner, Linus Torvalds,
	Linux Kernel, the arch/x86 maintainers, WANG Chao, Baoquan He,
	Dave Young

On Thu, Nov 20, 2014 at 11:10:55AM -0500, Dave Jones wrote:
> On Wed, Nov 19, 2014 at 11:28:06AM -0500, Vivek Goyal wrote:
>  
>  > I am wondering may be in some cases we panic in second kernel and sit
>  > there. Probably we should append a kernel command line automatically
>  > say "panic=1" so that it reboots itself if second kernel panics.
>  > 
>  > By any chance, have you enabled "CONFIG_RANDOMIZE_BASE"? If yes, please
>  > disable that as currently kexec/kdump stuff does not work with it. And
>  > it hangs very early in the boot process and I had to hook serial console
>  > to get following message on console.
> 
> I did have that enabled. (Perhaps the kconfig should conflict?)

Hi Dave,

Actually kexec/kdump allows booting into a different kernel than the running
kernel. So one could have KEXEC and CONFIG_RANDOMIZE_BASE enabled in
the kernel at the same time but still boot into a second kernel with
CONFIG_RANDOMIZE_BASE=n, and that should work. CONFIG_RANDOMIZE_BASE is
only a problem if it is enabled in the second kernel. So a kconfig conflict
might not be a good fit here.

> 
> After rebuilding without it, this..
> 
>  > > dracut: *** Stripping files done ***
>  > > dracut: *** Store current command line parameters ***
>  > > dracut: *** Creating image file ***
>  > > dracut: *** Creating image file done ***
>  > > kdumpctl: cat: write error: Broken pipe
>  > > kdumpctl: kexec: failed to load kdump kernel
>  > > kdumpctl: Starting kdump: [FAILED]
>  
> went away. It generated the image, and things looked good.
> I did echo c > /proc/sysrq-trigger and got this..
> 
> SysRq : Trigger a crash
> BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1192
> in_atomic(): 0, irqs_disabled(): 0, pid: 8860, name: bash
> 3 locks held by bash/8860:
>  #0:  (sb_writers#5){......}, at: [<ffffffff811eac13>] vfs_write+0x1b3/0x1f0
>  #1:  (rcu_read_lock){......}, at: [<ffffffff8144a435>] __handle_sysrq+0x5/0x1b0
>  #2:  (&mm->mmap_sem){......}, at: [<ffffffff8103cb20>] __do_page_fault+0x140/0x600
> Preemption disabled at:[<ffffffff817ca332>] printk+0x5c/0x72
> 
> CPU: 1 PID: 8860 Comm: bash Not tainted 3.18.0-rc5+ #95 [loadavg: 0.54 0.24 0.09 2/143 8909]
>  00000000000004a8 00000000e1f75c1b ffff880236473c28 ffffffff817ce5c7
>  0000000000000000 0000000000000000 ffff880236473c58 ffffffff8109af8a
>  ffff880236473c58 0000000000000029 0000000000000000 ffff880236473d88
> Call Trace:
>  [<ffffffff817ce5c7>] dump_stack+0x4f/0x7c
>  [<ffffffff8109af8a>] __might_sleep+0x12a/0x190
>  [<ffffffff8103cb3b>] __do_page_fault+0x15b/0x600
>  [<ffffffff811613b2>] ? irq_work_queue+0x62/0xd0
>  [<ffffffff8137ad7d>] ? trace_hardirqs_off_thunk+0x3a/0x3f
>  [<ffffffff8103cfec>] do_page_fault+0xc/0x10
>  [<ffffffff817dbcf2>] page_fault+0x22/0x30
>  [<ffffffff817ca332>] ? printk+0x5c/0x72
>  [<ffffffff81449ce6>] ? sysrq_handle_crash+0x16/0x20
>  [<ffffffff8144a567>] __handle_sysrq+0x137/0x1b0
>  [<ffffffff8144a435>] ? __handle_sysrq+0x5/0x1b0
>  [<ffffffff8144aa4a>] write_sysrq_trigger+0x4a/0x50
>  [<ffffffff81259f2d>] proc_reg_write+0x3d/0x80
>  [<ffffffff811eab1a>] vfs_write+0xba/0x1f0
>  [<ffffffff811eb628>] SyS_write+0x58/0xd0
>  [<ffffffff817da052>] system_call_fastpath+0x12/0x17
> Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 1 PID: 8860 Comm: bash Not tainted 3.18.0-rc5+ #95 [loadavg: 0.54 0.24 0.09 1/143 8909]
> task: ffff8800a1a60000 ti: ffff880236470000 task.ti: ffff880236470000
> RIP: 0010:[<ffffffff81449ce6>]  [<ffffffff81449ce6>] sysrq_handle_crash+0x16/0x20
> RSP: 0018:ffff880236473e38  EFLAGS: 00010246
> RAX: 000000000000000f RBX: ffffffff81cb4a00 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffffffff817ca332 RDI: 0000000000000063
> RBP: ffff880236473e38 R08: 0000000000000001 R09: 0000000000000001
> R10: 0000000000000358 R11: 0000000000000357 R12: 0000000000000063
> R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000
> FS:  00007fc652f4e740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000023a3b2000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Stack:
>  ffff880236473e78 ffffffff8144a567 ffffffff8144a435 0000000000000002
>  0000000000000002 00007fc652f51000 0000000000000002 ffff880236473f48
>  ffff880236473ea8 ffffffff8144aa4a 0000000000000002 00007fc652f51000
> Call Trace:
>  [<ffffffff8144a567>] __handle_sysrq+0x137/0x1b0
>  [<ffffffff8144a435>] ? __handle_sysrq+0x5/0x1b0
>  [<ffffffff8144aa4a>] write_sysrq_trigger+0x4a/0x50
>  [<ffffffff81259f2d>] proc_reg_write+0x3d/0x80
>  [<ffffffff811eab1a>] vfs_write+0xba/0x1f0
>  [<ffffffff811eb628>] SyS_write+0x58/0xd0
>  [<ffffffff817da052>] system_call_fastpath+0x12/0x17
> Code: 01 f4 45 39 a5 b4 00 00 00 75 e2 4c 89 ef e8 d2 f7 ff ff eb d8 0f 1f 44 00 00 55 c7 05 08 b7 7e 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 48 89 e5 
> RIP  [<ffffffff81449ce6>] sysrq_handle_crash+0x16/0x20
>  RSP <ffff880236473e38>
> CR2: 0000000000000000
> 
> Which, asides from the sleeping while atomic thing which isn't important,
> does what I expected.  Shortly later, it rebooted.
> 
> And then /var/crash was empty.

These messages came from the first kernel. I think we failed very early
in the second kernel's boot.

Can we try the following and retry, and see if some additional messages show
up on the console and help us narrow down the problem?

- Enable verbose boot messages. CONFIG_X86_VERBOSE_BOOTUP=y

- Enable early printk in the second kernel (earlyprintk=ttyS0,115200).

  You can either enable early printk in the first kernel and reboot; that way
  the second kernel will automatically have it enabled. Or you can edit
  "/etc/sysconfig/kdump" and append earlyprintk=<> to KDUMP_COMMANDLINE_APPEND.
  You will need to restart the kdump service after this.

- Enable some debug output at runtime from the kexec purgatory. For that one
  needs to pass additional arguments to /sbin/kexec. You can edit the
  /etc/sysconfig/kdump file and modify "KEXEC_ARGS" to pass additional
  arguments to /sbin/kexec during kernel load. I use the following for my
  serial console.

  KEXEC_ARGS="--console-serial --serial=0x3f8 --serial-baud=115200"

  You will need to restart the kdump service.

I hope the above gives us some information to work with and helps figure out
where we fail while booting into the second kernel.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 16:10                                     ` Dave Jones
  2014-11-20 16:48                                       ` Vivek Goyal
@ 2014-11-20 16:54                                       ` Vivek Goyal
  1 sibling, 0 replies; 486+ messages in thread
From: Vivek Goyal @ 2014-11-20 16:54 UTC (permalink / raw)
  To: Dave Jones, Don Zickus, Thomas Gleixner, Linus Torvalds,
	Linux Kernel, the arch/x86 maintainers, WANG Chao, Baoquan He,
	Dave Young

On Thu, Nov 20, 2014 at 11:10:55AM -0500, Dave Jones wrote:
> On Wed, Nov 19, 2014 at 11:28:06AM -0500, Vivek Goyal wrote:
>  
>  > I am wondering may be in some cases we panic in second kernel and sit
>  > there. Probably we should append a kernel command line automatically
>  > say "panic=1" so that it reboots itself if second kernel panics.
>  > 
>  > By any chance, have you enabled "CONFIG_RANDOMIZE_BASE"? If yes, please
>  > disable that as currently kexec/kdump stuff does not work with it. And
>  > it hangs very early in the boot process and I had to hook serial console
>  > to get following message on console.
> 
> I did have that enabled. (Perhaps the kconfig should conflict?)

Hi Dave,

Can you please also send me your kernel config file? I will try it on
my machine and see if I can reproduce the problem.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 16:48                                       ` Vivek Goyal
@ 2014-11-20 17:38                                         ` Dave Jones
  2014-11-21  9:46                                           ` Dave Young
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-20 17:38 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Don Zickus, Thomas Gleixner, Linus Torvalds, Linux Kernel,
	the arch/x86 maintainers, WANG Chao, Baoquan He, Dave Young

On Thu, Nov 20, 2014 at 11:48:09AM -0500, Vivek Goyal wrote:
 
 > Can we try following and retry and see if some additional messages show
 > up on console and help us narrow down the problem.
 > 
 > - Enable verbose boot messages. CONFIG_X86_VERBOSE_BOOTUP=y
 > 
 > - Enable early printk in second kernel. (earlyprintk=ttyS0,115200).
 > 
 >   You can either enable early printk in first kernel and reboot. That way
 >   second kernel will automatically have it enabled. Or you can edit
 >   "/etc/sysconfig/kdump" and append earlyprintk=<> to KDUMP_COMMANDLINE_APPEND. 
 >   You will need to restart kdump service after this.
 > 
 > - Enable some debug output during runtime from kexec purgatory. For that one
 >   needs to pass additional arguments to /sbin/kexec. You can edit
 >   /etc/sysconfig/kdump file and modify "KEXEC_ARGS" to pass additional
 >   arguments to /sbin/kexec during kernel load. I use following for my
 >   serial console.
 > 
 >   KEXEC_ARGS="--console-serial --serial=0x3f8 --serial-baud=115200"
 > 
 >   You will need to restart kdump service.

The only serial port on this machine is usb serial, which doesn't have io ports.

From my reading of the kexec man page, it doesn't look like I can tell
it to use ttyUSB0.

And because it relies on usb being initialized, this probably isn't
going to help too much with early boot.

earlyprintk=tty0 didn't show anything extra after the sysrq-c oops.
likewise, =ttyUSB0

I'm going to try bisecting the problem I'm debugging again, so I'm not
going to dig into this much more today.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 15:25                                   ` frequent lockups in 3.18rc4 Dave Jones
@ 2014-11-20 19:43                                     ` Linus Torvalds
  2014-11-20 20:06                                       ` Dave Jones
                                                         ` (2 more replies)
  2014-11-25 12:22                                     ` Will Deacon
  1 sibling, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-20 19:43 UTC (permalink / raw)
  To: Dave Jones, Andy Lutomirski, Linus Torvalds, Don Zickus,
	Thomas Gleixner, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra

On Thu, Nov 20, 2014 at 7:25 AM, Dave Jones <davej@redhat.com> wrote:
>
> Disabling CONTEXT_TRACKING didn't change the problem.
> Unfortunatly the full trace didn't make it over usb-serial this time. Grr.
>
> Here's what came over serial..
>
> NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [trinity-c35:11634]
> RIP: 0010:[<ffffffff88379605>]  [<ffffffff88379605>] copy_user_enhanced_fast_string+0x5/0x10
> RAX: ffff880220eb4000 RBX: ffffffff887dac64 RCX: 0000000000006a18
> RDX: 000000000000e02f RSI: 00007f766f466620 RDI: ffff88016f6a7617
> RBP: ffff880220eb7f78 R08: 8000000000000063 R09: 0000000000000004
> Call Trace:
>  [<ffffffff882f4225>] ? SyS_add_key+0xd5/0x240
>  [<ffffffff8837adae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff887da092>] system_call_fastpath+0x12/0x17

Ok, that's just about half-way in a ~57kB memory copy (you can see it
in the register state: %rdx contains the original size of the key
payload, rcx contains the current remaining size: 57kB total, 27kB
left).
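
A trivial userspace snippet, purely illustrative, to double-check those
numbers from the register dump (the kB figures are decimal):

  #include <stdio.h>

  int main(void)
  {
          unsigned long total = 0xe02f;           /* RDX: original payload size */
          unsigned long remaining = 0x6a18;       /* RCX: bytes still to copy   */

          printf("total:     %lu bytes (~%lu kB)\n", total, total / 1000);
          printf("remaining: %lu bytes (~%lu kB)\n", remaining, remaining / 1000);
          return 0;
  }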

And it's holding absolutely zero locks, and not even doing anything
odd. It wasn't doing anything particularly odd before either, although
the kmalloc() of a 64kB area might just have caused a fair amount of
VM work, of course.

You know what? I'm seriously starting to think that these bugs aren't
actually real. Or rather, I don't think it's really a true softlockup,
because most of them seem to happen in totally harmless code.

So I'm wondering whether the real issue might not be just this:

   [loadavg: 164.79 157.30 155.90 37/409 11893]

together with possibly a scheduler issue and/or a bug in the smpboot
thread logic (that the watchdog uses) or similar.

That's *especially* true if it turns out that the 3.17 problem you saw
was actually a perf bug that has already been fixed and is in stable.
We've been looking at kernel/smp.c changes, and looking for x86 IPI or
APIC changes, and found some harmlessly (at least on x86) suspicious
code and this exercise might be worth it for that reason, but what if
it's really just a scheduler regression.

There's been a *lot* more scheduler changes since 3.17 than the small
things we've looked at for x86 entry or IPI handling. And the
scheduler changes have been about things like overloaded scheduling
groups etc, and I could easily imaging that some bug *there* ends up
causing the watchdog process not to schedule.

Hmm? Scheduler people?

                       Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 19:43                                     ` Linus Torvalds
@ 2014-11-20 20:06                                       ` Dave Jones
  2014-11-20 20:37                                       ` Don Zickus
  2014-11-21  6:37                                       ` Ingo Molnar
  2 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-20 20:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Don Zickus, Thomas Gleixner, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra

On Thu, Nov 20, 2014 at 11:43:07AM -0800, Linus Torvalds wrote:
 
 > You know what? I'm seriously starting to think that these bugs aren't
 > actually real. Or rather, I don't think it's really a true softlockup,
 > because most of them seem to happen in totally harmless code.
 > 
 > So I'm wondering whether the real issue might not be just this:
 > 
 >    [loadavg: 164.79 157.30 155.90 37/409 11893]
 > 
 > together with possibly a scheduler issue and/or a bug in the smpboot
 > thread logic (that the watchdog uses) or similar.
 > 
 > That's *especially* true if it turns out that the 3.17 problem you saw
 > was actually a perf bug that has already been fixed and is in stable.
 > We've been looking at kernel/smp.c changes, and looking for x86 IPI or
 > APIC changes, and found some harmlessly (at least on x86) suspicious
 > code and this exercise might be worth it for that reason, but what if
 > it's really just a scheduler regression.

I started a run against 3.17 with the perf fixes. If that survives
today, I'll start a bisection tomorrow.

 > There's been a *lot* more scheduler changes since 3.17 than the small
 > things we've looked at for x86 entry or IPI handling. And the
 > scheduler changes have been about things like overloaded scheduling
 > groups etc, and I could easily imaging that some bug *there* ends up
 > causing the watchdog process not to schedule.

One other data point: I put another box into service for testing,
but it's considerably slower (a ~6 year old Xeon vs the Haswell).
Maybe it's just because it's so much slower that it'll take longer
(or it's slow enough that the bug is masked), but that machine hasn't had
a problem yet in almost a day of runtime.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: [PATCH] x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1
  2014-11-19 22:13                                     ` Thomas Gleixner
@ 2014-11-20 20:33                                       ` Linus Torvalds
  2014-11-20 22:07                                         ` Thomas Gleixner
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-20 20:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Dave Jones, Don Zickus, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra

On Wed, Nov 19, 2014 at 2:13 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Right, while it is wrong it does not explain the wreckage on 3.17,
> which does not have that code.

Thomas, I'm currently going off the assumption that I'll see this from
the x86 trees, and I can ignore the patch. It doesn't seem like this
is a particularly pressing bug.

If it's *not* going to show up as a pull request, holler, and I'll
just apply it.

                    Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 19:43                                     ` Linus Torvalds
  2014-11-20 20:06                                       ` Dave Jones
@ 2014-11-20 20:37                                       ` Don Zickus
  2014-11-20 20:51                                         ` Linus Torvalds
  2014-11-21  6:37                                       ` Ingo Molnar
  2 siblings, 1 reply; 486+ messages in thread
From: Don Zickus @ 2014-11-20 20:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Andy Lutomirski, Thomas Gleixner, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra

On Thu, Nov 20, 2014 at 11:43:07AM -0800, Linus Torvalds wrote:
> On Thu, Nov 20, 2014 at 7:25 AM, Dave Jones <davej@redhat.com> wrote:
> >
> > Disabling CONTEXT_TRACKING didn't change the problem.
> > Unfortunatly the full trace didn't make it over usb-serial this time. Grr.
> >
> > Here's what came over serial..
> >
> > NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [trinity-c35:11634]
> > RIP: 0010:[<ffffffff88379605>]  [<ffffffff88379605>] copy_user_enhanced_fast_string+0x5/0x10
> > RAX: ffff880220eb4000 RBX: ffffffff887dac64 RCX: 0000000000006a18
> > RDX: 000000000000e02f RSI: 00007f766f466620 RDI: ffff88016f6a7617
> > RBP: ffff880220eb7f78 R08: 8000000000000063 R09: 0000000000000004
> > Call Trace:
> >  [<ffffffff882f4225>] ? SyS_add_key+0xd5/0x240
> >  [<ffffffff8837adae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> >  [<ffffffff887da092>] system_call_fastpath+0x12/0x17
> 
> Ok, that's just about half-way in a ~57kB memory copy (you can see it
> in the register state: %rdx contains the original size of the key
> payload, rcx contains the current remaining size: 57kB total, 27kB
> left).
> 
> And it's holding absolutely zero locks, and not even doing anything
> odd. It wasn't doing anything particularly odd before either, although
> the kmalloc() of a 64kB area might just have caused a fair amount of
> VM work, of course.

Just for clarification, softlockups are processes hogging the cpu (thus
blocking the high priority per-cpu watchdog thread).

Hardlockups on the other hand are cpus with interrupts disabled for too
long (thus blocking the timer interrupt).
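
In kernel/watchdog.c terms the two checks look roughly like this (heavily
simplified sketch, not the real code; the per-cpu counter names mirror the
actual ones, the helpers and thresholds here are made up):

  static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);

  /* hrtimer callback, runs from the timer interrupt on each cpu */
  static void softlockup_check_sketch(void)
  {
          __this_cpu_inc(hrtimer_interrupts);

          /* the high priority per-cpu watchdog kthread could not run and
           * refresh its timestamp for too long -> soft lockup */
          if (now_secs() - __this_cpu_read(watchdog_touch_ts) > soft_thresh)
                  pr_emerg("BUG: soft lockup\n");
  }

  /* perf NMI callback, fires even with interrupts disabled */
  static void hardlockup_check_sketch(void)
  {
          /* the timer interrupt above never ran since the previous NMI,
           * i.e. interrupts have been off for too long -> hard lockup */
          if (__this_cpu_read(hrtimer_interrupts) ==
              __this_cpu_read(hrtimer_interrupts_saved))
                  pr_emerg("BUG: hard lockup\n");

          __this_cpu_write(hrtimer_interrupts_saved,
                           __this_cpu_read(hrtimer_interrupts));
  }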

That might coincide with your scheduler theory below.  Don't know.

Cheers,
Don

> 
> You know what? I'm seriously starting to think that these bugs aren't
> actually real. Or rather, I don't think it's really a true softlockup,
> because most of them seem to happen in totally harmless code.
> 
> So I'm wondering whether the real issue might not be just this:
> 
>    [loadavg: 164.79 157.30 155.90 37/409 11893]
> 
> together with possibly a scheduler issue and/or a bug in the smpboot
> thread logic (that the watchdog uses) or similar.
> 
> That's *especially* true if it turns out that the 3.17 problem you saw
> was actually a perf bug that has already been fixed and is in stable.
> We've been looking at kernel/smp.c changes, and looking for x86 IPI or
> APIC changes, and found some harmlessly (at least on x86) suspicious
> code and this exercise might be worth it for that reason, but what if
> it's really just a scheduler regression.
> 
> There's been a *lot* more scheduler changes since 3.17 than the small
> things we've looked at for x86 entry or IPI handling. And the
> scheduler changes have been about things like overloaded scheduling
> groups etc, and I could easily imaging that some bug *there* ends up
> causing the watchdog process not to schedule.
> 
> Hmm? Scheduler people?
> 
>                        Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 20:37                                       ` Don Zickus
@ 2014-11-20 20:51                                         ` Linus Torvalds
  0 siblings, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-20 20:51 UTC (permalink / raw)
  To: Don Zickus
  Cc: Dave Jones, Andy Lutomirski, Thomas Gleixner, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra

On Thu, Nov 20, 2014 at 12:37 PM, Don Zickus <dzickus@redhat.com> wrote:
>
> Just for clarification, softlockups are processes hogging the cpu (thus
> blocking the high priority per-cpu watchdog thread).

Right. And there is no actual sign of any CPU hogging going on.
There's a single system call with a small payload (I think it's safe
to call 64kB small these days), no hugely contended CPU-spinning
locking, nada.

                    Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 12:23                                               ` Tejun Heo
@ 2014-11-20 21:58                                                 ` Thomas Gleixner
  2014-11-20 22:06                                                   ` Andy Lutomirski
  2014-11-20 22:11                                                   ` Tejun Heo
  0 siblings, 2 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-20 21:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Frederic Weisbecker, Linus Torvalds, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Thu, 20 Nov 2014, Tejun Heo wrote:
> On Thu, Nov 20, 2014 at 12:50:36AM +0100, Frederic Weisbecker wrote:
> > > Are we talking about different per cpu allocators here or am I missing
> > > something completely non obvious?
> > 
> > That's the same allocator yeah. So if the whole memory is dereferenced,
> > faults shouldn't happen indeed.
> > 
> > Maybe that was a bug a few years ago but not anymore.
> 
> It has been always like that tho.  Percpu memory given out is always
> populated and cleared.
> 
> > Is it possible that, somehow, some part isn't zeroed by pcpu_alloc()?
> > After all it's allocated with vzalloc() so that part could be skipped. The memset(0)
> 
> The vzalloc call is for the internal allocation bitmap not the actual
> percpu memory area.  The actual address areas for percpu memory are
> obtained using pcpu_get_vm_areas() call and later get populated using
> map_kernel_range_noflush() (flush is performed after mapping is
> complete).
> 
> Trying to remember what happens with vmalloc_fault().  Ah okay, so
> when a new PUD gets created for vmalloc area, we don't go through all
> PGDs and update them.  The PGD entries get faulted in lazily.  Percpu
> memory allocator clearing or not clearing the allocated area doesn't
> have anything to do with it.  The memory area is always fully
> populated in the kernel page table.  It's just that the population
> happened while a different PGD was active and this PGD hasn't been
> populated with the new PUD yet.

It's completely undocumented behaviour, whether it has been that way
forever or not. And I agree with Frederic that it is insane. Actually
it's beyond insane, really.

> So, yeap, vmalloc_fault() can always happen when accessing vmalloc
> areas and the only way to avoid that would be removing lazy PGD
> population - going through all PGDs and populating new PUDs
> immediately.

There is no requirement to go through ALL PGDs and populate that stuff
immediately.

Lets look at the two types of allocations

   1) Kernel percpu allocations

   2) Per process/task percpu allocations

Of course we do not have a way to distinguish those, but we really
should have one.

#1 Kernel percpu allocations usually happen in the context of driver
   bringup, subsystem initialization, interrupt setup etc.

   So this is functionality which is not a hotpath and usually
   requires some form of synchronization versus the rest of the system
   anyway.

   The per cpu population stuff is serialized with a mutex anyway, so
   what's wrong with having a globally visible percpu sequence counter,
   which is incremented whenever a new allocation is populated or torn
   down?

   We can make that sequence counter a per cpu variable as well to
   avoid the issues of a global variable (preferably a
   compile/boot time allocated percpu variable to avoid the obvious
   circulus vitiosus).

   Now after that increment the allocation side needs to wait for a
   scheduling cycle on all cpus (we have mechanisms for that)
   
   So in the scheduler if the same task gets reselected you check that
   sequence count and update the PGD if different. If a task switch
   happens then you also need to check the sequence count and act
   accordingly.

   If we make the sequence counter a percpu variable as outlined above
   the overhead of checking this is just noise versus the other
   nonsense we do in schedule().


#2 That's process related statistics and instrumentation stuff.

   Now that just needs an immediate population of the process->mm->pgd
   alongside the init_mm.pgd, but that's really not a big deal.

Of course that does not solve the issues we have with the current
infrastructure retroactively, but it allows us to avoid fuckups like
the one Frederic was talking about, where perf invented its own kmalloc
based 'percpu' replacement just to work around the shortcoming in a
particular place.

What really frightens me is the well hidden fuckup potential which
lurks around the corner and the hard to debug once-in-a-while fallout
which might be caused by this.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 486+ messages in thread

* [tip:x86/urgent] x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1
  2014-11-19 21:56                                   ` [PATCH] x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1 Andy Lutomirski
  2014-11-19 22:13                                     ` Thomas Gleixner
@ 2014-11-20 22:04                                     ` tip-bot for Andy Lutomirski
  1 sibling, 0 replies; 486+ messages in thread
From: tip-bot for Andy Lutomirski @ 2014-11-20 22:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, dzickus, mingo, luto, davej, torvalds, hpa, linux-kernel, tglx

Commit-ID:  b5e212a3051b65e426a513901d9c7001681c7215
Gitweb:     http://git.kernel.org/tip/b5e212a3051b65e426a513901d9c7001681c7215
Author:     Andy Lutomirski <luto@amacapital.net>
AuthorDate: Wed, 19 Nov 2014 13:56:19 -0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 20 Nov 2014 23:01:53 +0100

x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1

TIF_NOHZ is 19 (i.e. _TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME |
_TIF_SINGLESTEP), not (1<<19).

This code is involved in Dave's trinity lockup, but I don't see why
it would cause any of the problems he's seeing, except inadvertently
by causing a different path through entry_64.S's syscall handling.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/a6cd3b60a3f53afb6e1c8081b0ec30ff19003dd7.1416434075.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/ptrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 749b0e4..e510618 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1484,7 +1484,7 @@ unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
 	 */
 	if (work & _TIF_NOHZ) {
 		user_exit();
-		work &= ~TIF_NOHZ;
+		work &= ~_TIF_NOHZ;
 	}
 
 #ifdef CONFIG_SECCOMP
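
To see why the old line was wrong, a minimal illustration (the flag values
below are, to the best of my knowledge, the x86 thread_info.h values of this
era; the commit message above already spells out the resulting mask):

  #define TIF_SYSCALL_TRACE   0
  #define TIF_NOTIFY_RESUME   1
  #define TIF_SINGLESTEP      4
  #define TIF_NOHZ            19

  #define _TIF_NOHZ           (1 << TIF_NOHZ)     /* 0x80000 */

  /*
   * work &= ~TIF_NOHZ clears ~19 == ~0x13, i.e. bits 0, 1 and 4: it drops
   * _TIF_SYSCALL_TRACE, _TIF_NOTIFY_RESUME and _TIF_SINGLESTEP from 'work'
   * while leaving bit 19 (_TIF_NOHZ) set.
   *
   * work &= ~_TIF_NOHZ clears only bit 19, which is what was intended.
   */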

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 21:58                                                 ` Thomas Gleixner
@ 2014-11-20 22:06                                                   ` Andy Lutomirski
  2014-11-20 22:11                                                   ` Tejun Heo
  1 sibling, 0 replies; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-20 22:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tejun Heo, Frederic Weisbecker, Linus Torvalds, Dave Jones,
	Don Zickus, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra, Arnaldo Carvalho de Melo

On Thu, Nov 20, 2014 at 1:58 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 20 Nov 2014, Tejun Heo wrote:
>> On Thu, Nov 20, 2014 at 12:50:36AM +0100, Frederic Weisbecker wrote:
>> > > Are we talking about different per cpu allocators here or am I missing
>> > > something completely non obvious?
>> >
>> > That's the same allocator yeah. So if the whole memory is dereferenced,
>> > faults shouldn't happen indeed.
>> >
>> > Maybe that was a bug a few years ago but not anymore.
>>
>> It has been always like that tho.  Percpu memory given out is always
>> populated and cleared.
>>
>> > Is it possible that, somehow, some part isn't zeroed by pcpu_alloc()?
>> > After all it's allocated with vzalloc() so that part could be skipped. The memset(0)
>>
>> The vzalloc call is for the internal allocation bitmap not the actual
>> percpu memory area.  The actual address areas for percpu memory are
>> obtained using pcpu_get_vm_areas() call and later get populated using
>> map_kernel_range_noflush() (flush is performed after mapping is
>> complete).
>>
>> Trying to remember what happens with vmalloc_fault().  Ah okay, so
>> when a new PUD gets created for vmalloc area, we don't go through all
>> PGDs and update them.  The PGD entries get faulted in lazily.  Percpu
>> memory allocator clearing or not clearing the allocated area doesn't
>> have anything to do with it.  The memory area is always fully
>> populated in the kernel page table.  It's just that the population
>> happened while a different PGD was active and this PGD hasn't been
>> populated with the new PUD yet.
>
> It's completely undocumented behaviour, whether it has been that way
> for ever or not. And I agree with Fredric, that it is insane. Actuallu
> it's beyond insane, really.
>
>> So, yeap, vmalloc_fault() can always happen when accessing vmalloc
>> areas and the only way to avoid that would be removing lazy PGD
>> population - going through all PGDs and populating new PUDs
>> immediately.
>
> There is no requirement to go through ALL PGDs and populate that stuff
> immediately.
>
> Lets look at the two types of allocations
>
>    1) Kernel percpu allocations
>
>    2) Per process/task percpu allocations
>
> Of course we do not have a way to distinguish those, but we really
> should have one.
>
> #1 Kernel percpu allocations usually happen in the context of driver
>    bringup, subsystem initialization, interrupt setup etc.
>
>    So this is functionality which is not a hotpath and usually
>    requires some form of synchronization versus the rest of the system
>    anyway.
>
>    The per cpu population stuff is serialized with a mutex anyway, so
>    what's wrong to have a globaly visible percpu sequence counter,
>    which is incremented whenever a new allocation is populated or torn
>    down?
>
>    We can make that sequence counter a per cpu variable as well to
>    avoid the issues of a global variable (preferrably that's a
>    compile/boot time allocated percpu variable to avoid the obvious
>    circulus vitiosus)
>
>    Now after that increment the allocation side needs to wait for a
>    scheduling cycle on all cpus (we have mechanisms for that)
>
>    So in the scheduler if the same task gets reselected you check that
>    sequence count and update the PGD if different. If a task switch
>    happens then you also need to check the sequence count and act
>    accordingly.
>
>    If we make the sequence counter a percpu variable as outlined above
>    the overhead of checking this is just noise versus the other
>    nonsense we do in schedule().

This seems like a reasonable idea, but I'd suggest a minor change:
rather than using a sequence number, track the number of kernel pgds.
That number should rarely change, and it's only one byte long.  That
means that we can easily stick it in mm_context_t without making it
any bigger.

The count for init_mm could be copied into cpu_tlbstate, which is
always hot on context switch.
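
Very roughly something like this (sketch only; the field, variable and
helper names below are invented for illustration, nothing like this exists
in the tree):

  /* one-byte count of populated kernel PGD entries, bumped whenever a new
   * PUD for the vmalloc/percpu area is instantiated; checked on context
   * switch so the incoming mm's PGD is synced eagerly instead of relying
   * on vmalloc_fault() */
  static DEFINE_PER_CPU(u8, kernel_pgds_seen);    /* mirrors init_mm's count */

  static inline void sync_kernel_pgds_sketch(struct mm_struct *next)
  {
          u8 cur = this_cpu_read(kernel_pgds_seen);

          /* 'kernel_pgds' would be a new one-byte field in mm_context_t */
          if (unlikely(next->context.kernel_pgds != cur)) {
                  /* hypothetical helper: copy init_mm's kernel PGD entries
                   * into next->pgd, like vmalloc_fault() does one at a time */
                  clone_kernel_pgds_sketch(next->pgd);
                  next->context.kernel_pgds = cur;
          }
  }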

>
>
> #2 That's process related statistics and instrumentation stuff.
>
>    Now that just needs a immediate population on the process->mm->pgd
>    aside of the init_mm.pgd, but that's really not a big deal.
>
> Of course that does not solve the issues we have with the current
> infrastructure retroactively, but it allows us to avoid fuckups like
> the one Frederic was talking about that perf invented its own kmalloc
> based 'percpu' replacement just to workaround the shortcoming in a
> particular place.
>
> What really frightens me is the potential and well hidden fuckup
> potential which lurks around the corner and the hard to debug once in
> a while fallout which might be caused by this.

The annoying part of this is that pgd allocation is *so* rare that
bugs here can probably go unnoticed for a long time.

--Andy

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: [PATCH] x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1
  2014-11-20 20:33                                       ` Linus Torvalds
@ 2014-11-20 22:07                                         ` Thomas Gleixner
  0 siblings, 0 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-20 22:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Dave Jones, Don Zickus, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra

On Thu, 20 Nov 2014, Linus Torvalds wrote:
> On Wed, Nov 19, 2014 at 2:13 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Right, while it is wrong it does not explain the wreckage on 3.17,
> > which does not have that code.
> 
> Thomas, I'm currently going off the assumption that I'll see this from
> the x86 trees, and I can ignore the patch. It doesn't seem like this
> is a particularly pressing bug.
> 
> If it's *not* going to show up as a pull request, holler, and I'll
> just apply it.

I'll send out an updated pull request for the one Ingo sent earlier
today in a second.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 21:58                                                 ` Thomas Gleixner
  2014-11-20 22:06                                                   ` Andy Lutomirski
@ 2014-11-20 22:11                                                   ` Tejun Heo
  2014-11-20 22:42                                                     ` Thomas Gleixner
  1 sibling, 1 reply; 486+ messages in thread
From: Tejun Heo @ 2014-11-20 22:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, Linus Torvalds, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Thu, Nov 20, 2014 at 10:58:26PM +0100, Thomas Gleixner wrote:
> It's completely undocumented behaviour, whether it has been that way
> for ever or not. And I agree with Fredric, that it is insane. Actuallu
> it's beyond insane, really.

This is exactly the same for any address in the vmalloc space.

..
>    So in the scheduler if the same task gets reselected you check that
>    sequence count and update the PGD if different. If a task switch
>    happens then you also need to check the sequence count and act
>    accordingly.

That isn't enough tho.  What if the percpu allocated pointer gets
passed to another CPU without task switching?  You'd at least need to
send IPIs to all CPUs so that all the active PGDs get updated
synchronously.

> What really frightens me is the potential and well hidden fuckup
> potential which lurks around the corner and the hard to debug once in
> a while fallout which might be caused by this.

Lazy vmalloc population through fault is something we accepted as
reasonable as it works fine for most of the kernel.  If the lazy
loading can be improved so that it doesn't depend on faulting, great.
For the time being, we can make percpu accessors complain when called
from nmi handlers so that the problematic ones can be easily
identified.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 22:11                                                   ` Tejun Heo
@ 2014-11-20 22:42                                                     ` Thomas Gleixner
  2014-11-20 23:05                                                       ` Tejun Heo
  0 siblings, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-20 22:42 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Frederic Weisbecker, Linus Torvalds, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Thu, 20 Nov 2014, Tejun Heo wrote:
> On Thu, Nov 20, 2014 at 10:58:26PM +0100, Thomas Gleixner wrote:
> > It's completely undocumented behaviour, whether it has been that way
> > for ever or not. And I agree with Fredric, that it is insane. Actuallu
> > it's beyond insane, really.
> 
> This is exactly the same for any address in the vmalloc space.

I know, but I really was not aware of the fact that dynamically
allocated percpu stuff is vmalloc based and therefore exposed to the
same issues.

The normal vmalloc space simply does not have the problems which are
generated by percpu allocations which have no documented access
restrictions.

You created a special case and that special case is clever but not
very well thought out considering the use cases of percpu variables
and the completely undocumented limitations you introduced silently.

Just admit it and don't try to educate me about trivial vmalloc
properties.

> ..
> >    So in the scheduler if the same task gets reselected you check that
> >    sequence count and update the PGD if different. If a task switch
> >    happens then you also need to check the sequence count and act
> >    accordingly.
> 
> That isn't enough tho.  What if the percpu allocated pointer gets
> passed to another CPU without task switching?  You'd at least need to
> send IPIs to all CPUs so that all the active PGDs get updated
> synchronously.

You obviously did not even take the time to carefully read what I
wrote:

   "Now after that increment the allocation side needs to wait for a
    scheduling cycle on all cpus (we have mechanisms for that)"

That's exactly stating what you claim to be 'not enough'. 

> > What really frightens me is the potential and well hidden fuckup
> > potential which lurks around the corner and the hard to debug once in
> > a while fallout which might be caused by this.
> 
> Lazy vmalloc population through fault is something we accepted as
> reasonable as it works fine for most of the kernel. 

Emphasis on most.

I'm well aware of the lazy vmalloc population, but I was definitely
not aware of the implications of the approach chosen by the dynamic percpu
allocator. I do not care about random discussion threads on LKML or
random slides you produced for a conference. All I care about is that
I cannot find a single word of documentation about that in the source
tree. Neither in the percpu implementation nor in Documentation/

> For the time being, we can make percpu accessors complain when
> called from nmi handlers so that the problematic ones can be easily
> identified.

You should have done that in the very first place instead of letting
other people run into issues which you should have thought of from the
very beginning.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 22:42                                                     ` Thomas Gleixner
@ 2014-11-20 23:05                                                       ` Tejun Heo
  2014-11-20 23:08                                                         ` Andy Lutomirski
  2014-11-21  0:54                                                         ` Thomas Gleixner
  0 siblings, 2 replies; 486+ messages in thread
From: Tejun Heo @ 2014-11-20 23:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, Linus Torvalds, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Hello,

On Thu, Nov 20, 2014 at 11:42:42PM +0100, Thomas Gleixner wrote:
> On Thu, 20 Nov 2014, Tejun Heo wrote:
> > On Thu, Nov 20, 2014 at 10:58:26PM +0100, Thomas Gleixner wrote:
> > > It's completely undocumented behaviour, whether it has been that way
> > > for ever or not. And I agree with Fredric, that it is insane. Actuallu
> > > it's beyond insane, really.
> > 
> > This is exactly the same for any address in the vmalloc space.
> 
> I know, but I really was not aware of the fact that dynamically
> allocated percpu stuff is vmalloc based and therefor exposed to the
> same issues.
> 
> The normal vmalloc space simply does not have the problems which are
> generated by percpu allocations which have no documented access
> restrictions.
>
> You created a special case and that special case is clever but not
> very well thought out considering the use cases of percpu variables
> and the completely undocumented limitations you introduced silently.
> 
> Just admit it and dont try to educate me about trivial vmalloc
> properties.

Why are you always so overly dramatic?  How is this productive?  Sure,
this could have been better, but I missed it at the beginning and this
is the first time I've heard about this issue.  Shit happens and we fix
it.

> > That isn't enough tho.  What if the percpu allocated pointer gets
> > passed to another CPU without task switching?  You'd at least need to
> > send IPIs to all CPUs so that all the active PGDs get updated
> > synchronously.
> 
> You obviously did not even take the time to carefully read what I
> wrote:
> 
>    "Now after that increment the allocation side needs to wait for a
>     scheduling cycle on all cpus (we have mechanisms for that)"
> 
> That's exactly stating what you claim to be 'not enough'. 

Missed that.  Sorry.

> > For the time being, we can make percpu accessors complain when
> > called from nmi handlers so that the problematic ones can be easily
> > identified.
> 
> You should have done that in the very first place instead of letting
> other people run into issues which you should have thought of from the
> very beginning.

Sure, it would have been better if I had noticed that from the get-go, but
I couldn't think of the NMI case at the time and neither did anybody who
reviewed the code.  It'd be awesome if we could have avoided it but it
didn't go that way, so let's fix it.  Can we please stay technical?

So, for now, all we need is adding nmi check in percpu accessors,
right?

-- 
tejun

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 23:05                                                       ` Tejun Heo
@ 2014-11-20 23:08                                                         ` Andy Lutomirski
  2014-11-20 23:34                                                           ` Linus Torvalds
  2014-11-20 23:39                                                           ` Tejun Heo
  2014-11-21  0:54                                                         ` Thomas Gleixner
  1 sibling, 2 replies; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-20 23:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Frederic Weisbecker, Linus Torvalds, Dave Jones,
	Don Zickus, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra, Arnaldo Carvalho de Melo

On Thu, Nov 20, 2014 at 3:05 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Thu, Nov 20, 2014 at 11:42:42PM +0100, Thomas Gleixner wrote:
>> On Thu, 20 Nov 2014, Tejun Heo wrote:
>> > On Thu, Nov 20, 2014 at 10:58:26PM +0100, Thomas Gleixner wrote:
>> > > It's completely undocumented behaviour, whether it has been that way
>> > > for ever or not. And I agree with Fredric, that it is insane. Actuallu
>> > > it's beyond insane, really.
>> >
>> > This is exactly the same for any address in the vmalloc space.
>>
>> I know, but I really was not aware of the fact that dynamically
>> allocated percpu stuff is vmalloc based and therefor exposed to the
>> same issues.
>>
>> The normal vmalloc space simply does not have the problems which are
>> generated by percpu allocations which have no documented access
>> restrictions.
>>
>> You created a special case and that special case is clever but not
>> very well thought out considering the use cases of percpu variables
>> and the completely undocumented limitations you introduced silently.
>>
>> Just admit it and dont try to educate me about trivial vmalloc
>> properties.
>
> Why are you always so overly dramatic?  How is this productive?  Sure,
> this could have been better but I missed it at the beginning and this
> is the first time I hear about this issue.  Shits happen and we fix
> them.
>
>> > That isn't enough tho.  What if the percpu allocated pointer gets
>> > passed to another CPU without task switching?  You'd at least need to
>> > send IPIs to all CPUs so that all the active PGDs get updated
>> > synchronously.
>>
>> You obviously did not even take the time to carefully read what I
>> wrote:
>>
>>    "Now after that increment the allocation side needs to wait for a
>>     scheduling cycle on all cpus (we have mechanisms for that)"
>>
>> That's exactly stating what you claim to be 'not enough'.
>
> Missed that.  Sorry.
>
>> > For the time being, we can make percpu accessors complain when
>> > called from nmi handlers so that the problematic ones can be easily
>> > identified.
>>
>> You should have done that in the very first place instead of letting
>> other people run into issues which you should have thought of from the
>> very beginning.
>
> Sure, it would have been better if I noticed that from the get-go, but
> I couldn't think of the NMI case that time and neither did anybody who
> reviewed the code.  It'd be awesome if we could have avoided it but it
> didn't go that way, so let's fix it.  Can we please stay technical?
>
> So, for now, all we need is adding nmi check in percpu accessors,
> right?
>

What's the issue with nmi?  Page faults are supposed to nest correctly
inside nmi, right?

--Andy

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 23:08                                                         ` Andy Lutomirski
@ 2014-11-20 23:34                                                           ` Linus Torvalds
  2014-11-20 23:39                                                           ` Tejun Heo
  1 sibling, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-20 23:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Tejun Heo, Thomas Gleixner, Frederic Weisbecker, Dave Jones,
	Don Zickus, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra, Arnaldo Carvalho de Melo

On Thu, Nov 20, 2014 at 3:08 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> What's the issue with nmi?  Page faults are supposed to nest correctly
> inside nmi, right?

They should, now, yes. There used to be issues with the whole "that
re-enables NMI".

Which reminds me. We never took your patches that use ljmp to handle
the return-to-kernel mode. You did them for performance reasons, but I
think the bigger deal was that it would have cleaned up that whole
special case.

Or did they have other problems? The ones to return to user space were
admittedly more fun, but just a tad too crazy (and not _quite_ in the
"crazy like a fox" camp ;)

            Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 23:08                                                         ` Andy Lutomirski
  2014-11-20 23:34                                                           ` Linus Torvalds
@ 2014-11-20 23:39                                                           ` Tejun Heo
  2014-11-20 23:55                                                             ` Andy Lutomirski
  2014-11-21  2:33                                                             ` Steven Rostedt
  1 sibling, 2 replies; 486+ messages in thread
From: Tejun Heo @ 2014-11-20 23:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Frederic Weisbecker, Linus Torvalds, Dave Jones,
	Don Zickus, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra, Arnaldo Carvalho de Melo

On Thu, Nov 20, 2014 at 03:08:03PM -0800, Andy Lutomirski wrote:
> > So, for now, all we need is adding nmi check in percpu accessors,
> > right?
> >
> 
> What's the issue with nmi?  Page faults are supposed to nest correctly
> inside nmi, right?

Thought they couldn't.  Looking at the trace that Frederic linked, it
looks like straight-out tracing function recursion due to an
unexpected fault while holding a lock.  I don't think this can be
annotated from percpu accessor side.  There's nothing special about
the context.  :(

Does this matter for anybody other than tracers?  Ultimately, the
solution would be removing the vmalloc area faulting as Thomas
suggested.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 23:39                                                           ` Tejun Heo
@ 2014-11-20 23:55                                                             ` Andy Lutomirski
  2014-11-21 16:27                                                               ` Tejun Heo
  2014-11-21  2:33                                                             ` Steven Rostedt
  1 sibling, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-20 23:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Frederic Weisbecker, Linus Torvalds, Dave Jones,
	Don Zickus, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra, Arnaldo Carvalho de Melo

On Thu, Nov 20, 2014 at 3:39 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Nov 20, 2014 at 03:08:03PM -0800, Andy Lutomirski wrote:
>> > So, for now, all we need is adding nmi check in percpu accessors,
>> > right?
>> >
>>
>> What's the issue with nmi?  Page faults are supposed to nest correctly
>> inside nmi, right?
>
> Thought they couldn't.  Looking at the trace that Frederic linked, it
> looks like straight-out tracing function recursion due to an
> unexpected fault while holding a lock.  I don't think this can be
> annotated from percpu accessor side.  There's nothing special about
> the context.  :(

That doesn't appear to have anything to with nmi though, right?

Wouldn't this issue be fixed by moving the vmalloc_fault check into
do_page_fault before exception_enter?
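
i.e. something along these lines against ~3.18 arch/x86/mm/fault.c
(untested sketch, just to show the ordering):

  dotraplinkage void notrace
  do_page_fault(struct pt_regs *regs, unsigned long error_code)
  {
          unsigned long address = read_cr2();
          enum ctx_state prev_state;

          /* Handle kernel-space vmalloc faults before exception_enter(),
           * so that context tracking / tracing (which may itself touch
           * percpu data in the vmalloc range) never runs on top of a PGD
           * that has not been synced yet. */
          if (unlikely(fault_in_kernel_space(address)) &&
              !(error_code & PF_USER) &&
              vmalloc_fault(address) >= 0)
                  return;

          prev_state = exception_enter();
          __do_page_fault(regs, error_code, address);
          exception_exit(prev_state);
  }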

--Andy

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 23:05                                                       ` Tejun Heo
  2014-11-20 23:08                                                         ` Andy Lutomirski
@ 2014-11-21  0:54                                                         ` Thomas Gleixner
  2014-11-21 14:13                                                           ` Frederic Weisbecker
  1 sibling, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-21  0:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Frederic Weisbecker, Linus Torvalds, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Tejun,

On Thu, 20 Nov 2014, Tejun Heo wrote:
> On Thu, Nov 20, 2014 at 11:42:42PM +0100, Thomas Gleixner wrote:
> > On Thu, 20 Nov 2014, Tejun Heo wrote:
> > > On Thu, Nov 20, 2014 at 10:58:26PM +0100, Thomas Gleixner wrote:
> > > > It's completely undocumented behaviour, whether it has been that way
> > > > for ever or not. And I agree with Fredric, that it is insane. Actuallu
> > > > it's beyond insane, really.
> > > 
> > > This is exactly the same for any address in the vmalloc space.
> > 
> > I know, but I really was not aware of the fact that dynamically
> > allocated percpu stuff is vmalloc based and therefor exposed to the
> > same issues.
> > 
> > The normal vmalloc space simply does not have the problems which are
> > generated by percpu allocations which have no documented access
> > restrictions.
> >
> > You created a special case and that special case is clever but not
> > very well thought out considering the use cases of percpu variables
> > and the completely undocumented limitations you introduced silently.
> > 
> > Just admit it and dont try to educate me about trivial vmalloc
> > properties.
> 
> Why are you always so overly dramatic?

This has nothing to do with being dramatic. It's a matter of fact that I do
not need an education on the basic properties of the vmalloc space.

I just refuse to accept that you try to tell me that I should be aware
of this:

> > > This is exactly the same for any address in the vmalloc space.

What I was not aware of, even after staring into
that code for quite some time, is the fact that the whole percpu
business is vmalloc based and therefore exposed to the same limitations
as the vmalloc space in general.

I'm not an mm expert, and aside from the documentation of the chunk
allocator, which is completely irrelevant in this context, there is not
a single word of explanation about the design and the resulting
limitations of that in the kernel tree.

So, I'm overly dramatic, because I tell you that I'm well aware of the
general vmalloc approach, which is btw. well documented?

> How is this productive?

It's obviously very productive, because AFAICT I'm the first person
who did not take your design decisions as given and sacrosanct.

> Sure, this could have been better but I missed it at the beginning
> and this is the first time I hear about this issue.

So the issues Frederic talked about in that very thread, the recursive
faults and perf having to emulate percpu stuff in order to work around
them, were never communicated to you?

If that's the case then that's not your problem, but a serious problem
in our overall process.

> Shits happen and we fix them.

I have no problem with that and I'm not trying to put blame on you.

As you might have noticed, I spent quite some time thinking about a
possible solution and also clearly stated that, while it's not complex
to implement, it's perhaps not solving the issue at hand and might be
too complex to backport. The response I get from you is:

> > > That isn't enough tho.  What if the percpu allocated pointer gets
> > > passed to another CPU without task switching?  You'd at least need to
> > > send IPIs to all CPUs so that all the active PGDs get updated
> > > synchronously.
> > 
> > You obviously did not even take the time to carefully read what I
> > wrote:
> > 
> >    "Now after that increment the allocation side needs to wait for a
> >     scheduling cycle on all cpus (we have mechanisms for that)"
> > 
> > That's exactly stating what you claim to be 'not enough'. 
> 
> Missed that.  Sorry.

Apology accepted.
 
> So, for now, all we need is adding nmi check in percpu accessors,
> right?

s/all we need/all we can do/

I think that is the proper technical expression for it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 23:39                                                           ` Tejun Heo
  2014-11-20 23:55                                                             ` Andy Lutomirski
@ 2014-11-21  2:33                                                             ` Steven Rostedt
  1 sibling, 0 replies; 486+ messages in thread
From: Steven Rostedt @ 2014-11-21  2:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andy Lutomirski, Thomas Gleixner, Frederic Weisbecker,
	Linus Torvalds, Dave Jones, Don Zickus, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Thu, Nov 20, 2014 at 06:39:20PM -0500, Tejun Heo wrote:
> On Thu, Nov 20, 2014 at 03:08:03PM -0800, Andy Lutomirski wrote:
> > > So, for now, all we need is adding nmi check in percpu accessors,
> > > right?
> > >
> > 
> > What's the issue with nmi?  Page faults are supposed to nest correctly
> > inside nmi, right?
> 
> Thought they couldn't.  Looking at the trace that Frederic linked, it
> looks like straight-out tracing function recursion due to an
> unexpected fault while holding a lock.  I don't think this can be
> annotated from percpu accessor side.  There's nothing special about
> the context.  :(

There used to be issues with page faults in NMI. One was that the iretq
from the page fault handler would re-enable NMIs, and if another NMI triggered
then it would stomp all over the stack of the initial NMI. But my triple
copy of the NMI stack frame solved that. You can read all about it here:

  http://lwn.net/Articles/484932/

The second bug was that if an NMI triggered right after a page fault, and
the NMI itself took a page fault, the content of the cr2 register (the
faulting address) would be lost for the page fault that was preempted by
the NMI. This too was solved by (cue irony) using per_cpu variables.

Now I'm hoping that kernel boot time per_cpu variables never take any
faults, otherwise we are all f*cked!
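
Roughly, the cr2 dance amounts to something like this (a simplified
sketch of the idea only, not the actual arch/x86 code; the names below
are made up):

    /* Sketch only: preserve cr2 across faults taken in NMI context. */
    static DEFINE_PER_CPU(unsigned long, nmi_saved_cr2);

    static void nmi_handler_sketch(struct pt_regs *regs)
    {
            /*
             * Save the faulting address of whatever page fault we may
             * have interrupted, since a fault taken inside the NMI will
             * overwrite cr2.
             */
            this_cpu_write(nmi_saved_cr2, read_cr2());

            do_nmi_work(regs);      /* hypothetical; may itself take a fault */

            /* Restore cr2 so the interrupted fault sees its own address. */
            if (read_cr2() != this_cpu_read(nmi_saved_cr2))
                    write_cr2(this_cpu_read(nmi_saved_cr2));
    }

Note that the save/restore itself only touches a boot time static
per_cpu variable, which is exactly why the hope above matters.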

> 
> Does this matter for anybody other than tracers?  Ultimately, the
> solution would be removing the vmalloc area faulting as Thomas
> suggested.

I don't know, but per_cpu variables are rather special and used all
over the place. Most other vmalloc-backed code isn't used as heavily as per_cpu is.

-- Steve


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 19:43                                     ` Linus Torvalds
  2014-11-20 20:06                                       ` Dave Jones
  2014-11-20 20:37                                       ` Don Zickus
@ 2014-11-21  6:37                                       ` Ingo Molnar
  2014-11-21 14:50                                         ` Dave Jones
  2 siblings, 1 reply; 486+ messages in thread
From: Ingo Molnar @ 2014-11-21  6:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Andy Lutomirski, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> [...]
> 
> That's *especially* true if it turns out that the 3.17 problem 
> you saw was actually a perf bug that has already been fixed and 
> is in stable. We've been looking at kernel/smp.c changes, and 
> looking for x86 IPI or APIC changes, and found some harmlessly 
> (at least on x86) suspicious code and this exercise might be 
> worth it for that reason, but what if it's really just a 
> scheduler regression.
> 
> There's been a *lot* more scheduler changes since 3.17 than the 
> small things we've looked at for x86 entry or IPI handling. And 
> the scheduler changes have been about things like overloaded 
> scheduling groups etc, and I could easily imagine that some bug 
> *there* ends up causing the watchdog process not to schedule.
> 
> Hmm? Scheduler people?

Hm, that's a possibility, yes.

The watchdog threads are pretty simple beasts though, using 
SCHED_FIFO:

 kernel/watchdog.c:      watchdog_set_prio(SCHED_FIFO, MAX_RT_PRIO - 1);

which is typically only affected by less than 10% of scheduler 
changes - but it's entirely possible still.
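
(For reference, that helper is tiny - from memory it boils down to
roughly this:

     static void watchdog_set_prio(unsigned int policy, unsigned int prio)
     {
             struct sched_param param = { .sched_priority = prio };

             sched_setscheduler(current, policy, &param);
     }

i.e. the watchdog thread runs at the top RT priority and is mostly
insulated from fair-class scheduling changes.)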

It might make sense to disable the softlockup detector altogether 
and just see whether trinity finishes/wedges, whether a login 
over the console is still possible - etc.

The softlockup messages in themselves are only analytical, unless 
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=1 is used.

Interesting bug.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 17:38                                         ` Dave Jones
@ 2014-11-21  9:46                                           ` Dave Young
  0 siblings, 0 replies; 486+ messages in thread
From: Dave Young @ 2014-11-21  9:46 UTC (permalink / raw)
  To: Dave Jones, Vivek Goyal, Don Zickus, Thomas Gleixner,
	Linus Torvalds, Linux Kernel, the arch/x86 maintainers,
	WANG Chao, Baoquan He

On 11/20/14 at 12:38pm, Dave Jones wrote:
> On Thu, Nov 20, 2014 at 11:48:09AM -0500, Vivek Goyal wrote:
>  
>  > Can we try following and retry and see if some additional messages show
>  > up on console and help us narrow down the problem.
>  > 
>  > - Enable verbose boot messages. CONFIG_X86_VERBOSE_BOOTUP=y
>  > 
>  > - Enable early printk in second kernel. (earlyprintk=ttyS0,115200).
>  > 
>  >   You can either enable early printk in first kernel and reboot. That way
>  >   second kernel will automatically have it enabled. Or you can edit
>  >   "/etc/sysconfig/kdump" and append earlyprintk=<> to KDUMP_COMMANDLINE_APPEND. 
>  >   You will need to restart kdump service after this.
>  > 
>  > - Enable some debug output during runtime from kexec purgatory. For that one
>  >   needs to pass additional arguments to /sbin/kexec. You can edit
>  >   /etc/sysconfig/kdump file and modify "KEXEC_ARGS" to pass additional
>  >   arguments to /sbin/kexec during kernel load. I use following for my
>  >   serial console.
>  > 
>  >   KEXEC_ARGS="--console-serial --serial=0x3f8 --serial-baud=115200"
>  > 
>  >   You will need to restart kdump service.
> 
> The only serial port on this machine is usb serial, which doesn't have io ports.
> 
> From my reading of the kexec man page, it doesn't look like I can tell
> it to use ttyUSB0.

Enabling ttyUSB0 still needs hacks in the dracut/kdump module to pack the usb serial
ko into the initramfs and load it early. We can work on it in Fedora because it may
help with similar problems later.

> 
> And because it relies on usb being initialized, this probably isn't
> going to help too much with early boot.
> 
> earlyprintk=tty0 didn't show anything extra after the sysrq-c oops.
> likewise, =ttyUSB0

earlyprintk=vga instead of tty0?
earlyprintk=efi in case of an EFI boot.

earlyprintk=dbgp sometimes also helps, but it's a little hard to set up because we
need a usb debugger. My Nokia N900 works well as a debugger. But finding a usable
usb debug port on the native host might fail, so this is my last resort for earlyprintk :(

> 
> I'm going to try bisecting the problem I'm debugging again, so I'm not
> going to dig into this much more today.
> 

Another kdump kernel issue I know about is that nouveau sometimes does not work.
If that is the case you can try adding "rd.driver.blacklist=nouveau" to the field
KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump. Or just add "nomodeset" to the 1st
kernel grub cmdline so that the 2nd kernel will reuse it and avoid loading drm modules;
then earlyprintk=vga could probably also show something.

Thanks
Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21  0:54                                                         ` Thomas Gleixner
@ 2014-11-21 14:13                                                           ` Frederic Weisbecker
  2014-11-21 16:25                                                             ` Tejun Heo
  0 siblings, 1 reply; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-21 14:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tejun Heo, Linus Torvalds, Dave Jones, Don Zickus, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra, Andy Lutomirski,
	Arnaldo Carvalho de Melo

On Fri, Nov 21, 2014 at 01:54:00AM +0100, Thomas Gleixner wrote:
> On Thu, 20 Nov 2014, Tejun Heo wrote:
> > Sure, this could have been better but I missed it at the beginning
> > and this is the first time I hear about this issue.
> 
> So the issues Frederic talked about in that very thread about
> recursive faults and the need that perf had to emulate percpu stuff in
> order to work around them have never been communicated to you?
> 
> If that's the case then that's not your problem, but a serious problem
> in our overall process.

So when the issue arose 4 years ago, it was a problem only for NMIs.
Like Linus says: "what happens in NMI stays in NMI". Ok, no, that's not quite
what he says :-)  But NMIs happen to be a corner case for just about everything,
and it's sometimes better to fix things from the NMI side, or have an NMI
special case, rather than grow the whole infrastructure in complexity to
support this very corner case.

Not saying that's the only valid approach to take wrt NMIs, but those vmalloc faults
seemed to be well established and generally known (except perhaps for percpu),
and NMI was the only corner case, and we are used to that, so fixing the issue
for NMIs only felt like the right direction when we fixed the callchain thing
with the other perf developers.

I certainly should have talked to Tejun about that, but it took a bit of time
for me to realize that randomly faultable memory is dangerous behaviour.

Add to that a bit of the "take the infrastructure for granted" problem when
you're not well experienced enough...

Anyway, I really hope we fix that, that's a bomb waiting to explode.

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21  6:37                                       ` Ingo Molnar
@ 2014-11-21 14:50                                         ` Dave Jones
  0 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-21 14:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andy Lutomirski, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra

On Fri, Nov 21, 2014 at 07:37:42AM +0100, Ingo Molnar wrote:

 > It might make sense to disable the softlockup detector altogether 
 > and just see whether trinity finishes/wedges, whether a login 
 > over the console is still possible - etc.

I can give that a try later.

 > The softlockup messages in themselves are only analytical, unless 
 > CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=1 is used.

Hm, I don't recall why I had that set. That should make things easier
to debug if the machine stays alive a little longer rather than
panicking. At least it might make sure that I get the full traces
over usb-serial.

Additionally, it might make ftrace an option.

The last thing I tested was 3.17 plus the perf fixes Frederic pointed
out yesterday. It's survived 20 hours of runtime, so I'm back to
believing that this is a recent (i.e., post-3.17) bug.

Running into the weekend though, so I'm not going to get to bisecting
until Monday probably. So maybe I'll try your idea at the top of this
mail in my over-the-weekend run.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 14:13                                                           ` Frederic Weisbecker
@ 2014-11-21 16:25                                                             ` Tejun Heo
  2014-11-21 17:01                                                               ` Steven Rostedt
  2014-11-21 21:44                                                               ` Frederic Weisbecker
  0 siblings, 2 replies; 486+ messages in thread
From: Tejun Heo @ 2014-11-21 16:25 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Linus Torvalds, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Hello, Frederic.

On Fri, Nov 21, 2014 at 03:13:35PM +0100, Frederic Weisbecker wrote:
...
> So when the issue arose 4 years ago, it was a problem only for NMIs.
> Like Linus says: "what happens in NMI stays in NMI". Ok no that's not quite
> what he says :-)  But NMIs happen to be a corner case for about everything
> and it's sometimes better to fix things from NMI itself, or have an NMI
> special case rather than grow the whole infrastructure in complexity to
> support this very corner case.

I'm not familiar with the innards of fault handling, so can you please
help me understand what may actually break?  Here is what I currently
understand.

* Static percpu areas wouldn't trigger faults lazily.  Note that this
  is not necessarily because the first percpu chunk which contains the
  static area is embedded inside the kernel linear mapping.  Depending
  on the memory layout and boot param, percpu allocator may choose to
  map the first chunk in vmalloc space too; however, this still works
  out fine because at that point there are no other page tables and
  the PUD entries covering the first chunk are faulted in before the other
  page tables are copied from the kernel one.

* NMI used to be a problem because the vmalloc fault handler couldn't
  safely nest inside the NMI handler, but this has since been fixed and it
  should work fine from NMI handlers now.

* Function tracers are problematic because they may end up nesting
  inside themselves through triggering a vmalloc fault while accessing
  dynamic percpu memory area.  This may lead to recursive locking and
  other surprises.

Are there other cases where the lazy vmalloc faults can break things?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 23:55                                                             ` Andy Lutomirski
@ 2014-11-21 16:27                                                               ` Tejun Heo
  2014-11-21 16:38                                                                 ` Andy Lutomirski
  0 siblings, 1 reply; 486+ messages in thread
From: Tejun Heo @ 2014-11-21 16:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Frederic Weisbecker, Linus Torvalds, Dave Jones,
	Don Zickus, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra, Arnaldo Carvalho de Melo

Hello, Andy.

On Thu, Nov 20, 2014 at 03:55:09PM -0800, Andy Lutomirski wrote:
> That doesn't appear to have anything to do with nmi though, right?

I thought that was the main offender but, apparently, not any more.

> Wouldn't this issue be fixed by moving the vmalloc_fault check into
> do_page_fault before exception_enter?

Can you please elaborate why that'd fix the issue?  I'm not
intimately familiar with the fault handling so it'd be great if you
can give me some pointers in terms of where to look.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 16:27                                                               ` Tejun Heo
@ 2014-11-21 16:38                                                                 ` Andy Lutomirski
  2014-11-21 16:48                                                                   ` Linus Torvalds
  2014-11-21 22:10                                                                   ` Frederic Weisbecker
  0 siblings, 2 replies; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-21 16:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, Thomas Gleixner, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Linus Torvalds, Frederic Weisbecker, Don Zickus,
	Dave Jones, the arch/x86 maintainers

On Nov 21, 2014 8:27 AM, "Tejun Heo" <tj@kernel.org> wrote:
>
> Hello, Andy.
>
> On Thu, Nov 20, 2014 at 03:55:09PM -0800, Andy Lutomirski wrote:
> > That doesn't appear to have anything to do with nmi though, right?
>
> I thought that was the main offender but, apparently, not any more.
>
> > Wouldn't this issue be fixed by moving the vmalloc_fault check into
> > do_page_fault before exception_enter?
>
> Can you please elaborate why that'd fix the issue?  I'm not
> intimately familiar with the fault handling so it'd be great if you
> can give me some pointers in terms of where to look.

do_page_fault is called directly from asm.  It does:

    prev_state = exception_enter();
    __do_page_fault(regs, error_code, address);
    exception_exit(prev_state);

The vmalloc fixup is in __do_page_fault.

exception_enter does various accounting and tracing things, and I
think that the recursion in the stack trace I saw was in exception_enter.

If you move the vmalloc fixup before exception_enter() and return if
the fault was from vmalloc, then you can't recurse.  You need to be
careful not to touch anything that uses RCU before exception_enter,
though.
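
Roughly like this (untested sketch just to show the shape; the guard
conditions are, as far as I remember, the ones __do_page_fault already
uses before calling vmalloc_fault(), and the kmemcheck/kmmio cases are
glossed over):

    dotraplinkage void notrace
    do_page_fault(struct pt_regs *regs, unsigned long error_code)
    {
            unsigned long address = read_cr2(); /* read cr2 before anything can fault */
            enum ctx_state prev_state;

            /*
             * Fix up kernel vmalloc-area faults before exception_enter(),
             * so a fault taken from tracing/context-tracking code cannot
             * recurse back into itself.
             */
            if (unlikely(fault_in_kernel_space(address)) &&
                !(error_code & (PF_RSVD | PF_USER | PF_PROT)) &&
                vmalloc_fault(address) >= 0)
                    return;

            prev_state = exception_enter();
            __do_page_fault(regs, error_code, address);
            exception_exit(prev_state);
    }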

--Andy

>
> Thanks.
>
> --
> tejun

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 16:38                                                                 ` Andy Lutomirski
@ 2014-11-21 16:48                                                                   ` Linus Torvalds
  2014-11-21 17:08                                                                     ` Steven Rostedt
  2014-11-21 22:10                                                                   ` Frederic Weisbecker
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-21 16:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 8:38 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> If you move the vmalloc fixup before exception_enter() and return if
> the fault was from vmalloc, then you can't recurse.  You need to be
> careful not to touch anything that uses RCU before exception_enter,
> though.

This is probably the right thing to do anyway.

The vmalloc fixup is purely about filling in hardware structures, so
there really shouldn't be any need for RCU or anything else. It should
probably be done first, before *anything* else (like the whole
kmemcheck/kmmio fault etc handling)

That said, the whole vmalloc_fault fixup routine does some odd things,
over and beyond just filling in the page tables. So I'm not 100% sure
that is safe as-is. The 32-bit version looks fine, but the x86-64
version is very very dubious.

The x86-64 version does crazy things like:

 - uses "current->active_mm", which is very dubious
 - flushes lazy mmu mode
 - walks down further in the page tables

and those are just bugs, imnsho. Get rid of that crap. The 32-bit code
does it right.

(The 64-bit mode also has a "WARN_ON_ONCE(in_nmi())", which I guess is
good - but it's good because the 64-bit version is written the way it
is).

                    Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 16:25                                                             ` Tejun Heo
@ 2014-11-21 17:01                                                               ` Steven Rostedt
  2014-11-21 17:11                                                                 ` Steven Rostedt
  2014-11-21 21:32                                                                 ` Frederic Weisbecker
  2014-11-21 21:44                                                               ` Frederic Weisbecker
  1 sibling, 2 replies; 486+ messages in thread
From: Steven Rostedt @ 2014-11-21 17:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Frederic Weisbecker, Thomas Gleixner, Linus Torvalds, Dave Jones,
	Don Zickus, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra, Andy Lutomirski, Arnaldo Carvalho de Melo

On Fri, Nov 21, 2014 at 11:25:06AM -0500, Tejun Heo wrote:
> 
> * Static percpu areas wouldn't trigger faults lazily.  Note that this
>   is not necessarily because the first percpu chunk which contains the
>   static area is embedded inside the kernel linear mapping.  Depending
>   on the memory layout and boot param, percpu allocator may choose to
>   map the first chunk in vmalloc space too; however, this still works
>   out fine because at that point there are no other page tables and
>   the PUD entries covering the first chunk are faulted in before the other
>   page tables are copied from the kernel one.

That sounds correct.

> 
> * NMI used to be a problem because the vmalloc fault handler couldn't
>   safely nest inside the NMI handler, but this has since been fixed and it
>   should work fine from NMI handlers now.

Right. Of course "should work fine" does not exactly mean "will work fine".


> 
> * Function tracers are problematic because they may end up nesting
>   inside themselves through triggering a vmalloc fault while accessing
>   dynamic percpu memory area.  This may lead to recursive locking and
>   other surprises.

The function tracer infrastructure now has a recursion check that happens
rather early in the call. Unless the registered OPS specifically states
it handles recursion (FTRACE_OPS_FL_RECURSION_SAFE), ftrace will add the
necessary recursion checks. If a registered OPS lies about being recursion
safe, well, we can't stop suicide.

Looking at kernel/trace/trace_functions.c: function_trace_call() which is
registered with RECURSION_SAFE, I see that the recursion check is done
before the per_cpu_ptr() call to the dynamically allocated per_cpu data.
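
The pattern, with made-up names (this is only an illustration of the
idea, not the actual trace_functions.c code; the real guards live in the
ftrace core), is basically:

    struct my_stats { unsigned long hits; };            /* invented payload */

    static DEFINE_PER_CPU(int, my_trace_nesting);       /* static percpu: never faults */

    static void my_trace_callback(unsigned long ip, unsigned long parent_ip,
                                  struct ftrace_ops *op, struct pt_regs *regs)
    {
            struct my_stats __percpu *stats = op->private; /* dynamically allocated */

            if (this_cpu_inc_return(my_trace_nesting) != 1)
                    goto out;   /* we recursed into ourselves, bail */

            /*
             * This access may hit a not-yet-synced vmalloc mapping and
             * fault; if the fault path is itself traced, the check above
             * stops the nesting before it runs away.
             */
            this_cpu_inc(stats->hits);
    out:
            this_cpu_dec(my_trace_nesting);
    }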

It looks OK, but...

Oh! But if we trace the page fault handler, and we fault here too,
we just nuked the cr2 register. Not good.

-- Steve


> 
> Are there other cases where the lazy vmalloc faults can break things?

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 16:48                                                                   ` Linus Torvalds
@ 2014-11-21 17:08                                                                     ` Steven Rostedt
  2014-11-21 17:19                                                                       ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Steven Rostedt @ 2014-11-21 17:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 08:48:58AM -0800, Linus Torvalds wrote:
> 
> (The 64-bit mode also has a "WARN_ON_ONCE(in_nmi())", which I guess is
> good - but it's good because the 64-bit version is written the way it
> is).

Actually, in_nmi() is now safe for vmalloc faults. In fact, it handles the
clobbering of the cr2 register just fine. I wrote tests to test this, and
submitted patches to get rid of that warn on. But that never went through.

https://lkml.org/lkml/2013/10/15/894

-- Steve


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 17:01                                                               ` Steven Rostedt
@ 2014-11-21 17:11                                                                 ` Steven Rostedt
  2014-11-21 21:32                                                                 ` Frederic Weisbecker
  1 sibling, 0 replies; 486+ messages in thread
From: Steven Rostedt @ 2014-11-21 17:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Frederic Weisbecker, Thomas Gleixner, Linus Torvalds, Dave Jones,
	Don Zickus, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra, Andy Lutomirski, Arnaldo Carvalho de Melo

On Fri, 21 Nov 2014 12:01:51 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
 
> Looking at kernel/trace/trace_functions.c: function_trace_call() which is
> registered with RECURSION_SAFE, I see that the recursion check is done
> before the per_cpu_ptr() call to the dynamically allocated per_cpu data.
> 
> It looks OK, but...
> 
> Oh! but if we trace the page fault handler, and we fault here too
> we just nuked the cr2 register. Not good.

Ah! Looking at the code, I see that do_page_fault (called from
assembly) is marked notrace. And the first thing it does is:

	unsigned long address = read_cr2();

And it uses that. Thus if the function tracer were to fault in
exception_enter() or __do_page_fault(), the address won't be
clobbered.

-- Steve

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 17:08                                                                     ` Steven Rostedt
@ 2014-11-21 17:19                                                                       ` Linus Torvalds
  2014-11-21 17:22                                                                         ` Andy Lutomirski
  2014-11-21 17:34                                                                         ` Steven Rostedt
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-21 17:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 9:08 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Actually, in_nmi() is now safe for vmalloc faults. In fact, it handles the
> clobbering of the cr2 register just fine.

That's not what I object to and find incorrect wrt NMI.

Compare the simple and correct 32-bit code to the complex and
incorrect 64-bit code.

In particular, look at how the 32-bit code relies *entirely* on hardware state.

Then look at where the 64-bit code does not.

                      Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 17:19                                                                       ` Linus Torvalds
@ 2014-11-21 17:22                                                                         ` Andy Lutomirski
  2014-11-21 18:22                                                                           ` Linus Torvalds
  2014-11-21 17:34                                                                         ` Steven Rostedt
  1 sibling, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-21 17:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 9:19 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Nov 21, 2014 at 9:08 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>>
>> Actually, in_nmi() is now safe for vmalloc faults. In fact, it handles the
>> clobbering of the cr2 register just fine.
>
> That's not what I object to and find incorrect wrt NMI.
>
> Compare the simple and correct 32-bit code to the complex and
> incorrect 64-bit code.
>
> In particular, look at how the 32-bit code relies *entirely* on hardware state.
>
> Then look at where the 64-bit code does not.

Both mystify me.  Why does the 32-bit version walk down the hierarchy
at all instead of just touching the top level?

And why does the 64-bit version assert that the leaves of the tables
match?  It's already asserted that it's walking down pgd pointers that
are *exactly the same pointers*, so of course the stuff they point to
is the same.

--Andy

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 17:19                                                                       ` Linus Torvalds
  2014-11-21 17:22                                                                         ` Andy Lutomirski
@ 2014-11-21 17:34                                                                         ` Steven Rostedt
  2014-11-21 18:24                                                                           ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Steven Rostedt @ 2014-11-21 17:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, 21 Nov 2014 09:19:02 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, Nov 21, 2014 at 9:08 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > Actually, in_nmi() is now safe for vmalloc faults. In fact, it handles the
> > clobbering of the cr2 register just fine.
> 
> That's not what I object to and find incorrect wrt NMI.

I was commenting about the WARN_ON() itself.

> 
> Compare the simple and correct 32-bit code to the complex and
> incorrect 64-bit code.
> 
> In particular, look at how the 32-bit code relies *entirely* on hardware state.
> 
> Then look at where the 64-bit code does not.

I see. You have issues with the use of current->active_mm instead of
just doing a read_cr3() (and I'm sure other things).

Doing a series of git blame, 64 bit has been like that since 2005 (start
of git).

Looks to me like we have more work to do on the merging of 64-bit and 32-bit.
Perhaps 64-bit can become more like 32-bit.

-- Steve

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 17:22                                                                         ` Andy Lutomirski
@ 2014-11-21 18:22                                                                           ` Linus Torvalds
  2014-11-21 18:28                                                                             ` Andy Lutomirski
  2014-11-21 19:06                                                                             ` Linus Torvalds
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-21 18:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Steven Rostedt, Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 9:22 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> Both mystify me.  Why does the 32-bit version walk down the hierarchy
> at all instead of just touching the top level?

Quite frankly, I think it's just due to historical reasons, and should
be removed.

But the historical reasons are that with the aliasing of the PUD and
PMD entries in the PGD, it's all fairly confusing. So I think we only
used to do the top level, but then when we expanded from two levels to
three, that "top level" became the pmd, and then when we expanded from
three to four, the pmd was actually two levels down. So it's all
basically mindless work.

So I do think we could simplify and unify things.

In 32-bit mode, we actually have two different cases:

 - in PAE, there's the magic top-level 4-entry PGD that always *has*
to be present (the P bit isn't actually checked by hardware)

    As a result, in PAE mode, the top PGD entries always exist, and
are always prepopulated, and for the kernel area (including obviously
the vmalloc space) always points to the init_pgd[] entry.

    Ergo, in PAE mode, I don't think we should ever hit this case in
the first place.

 - in non-PAE mode, we should just copy the top-level entry, and return.

And in 64-bit mode, we only have the "copy the top-level entry" case.

So I think we should

 (a) remove the 32-bit vs 64-bit difference, because that's not actually valid

 (b) make it a PAE vs non-PAE difference

 (c) the PAE case is a no-op

 (d) the non-PAE case would look something like this:

    static noinline int vmalloc_fault(unsigned long address)
    {
        unsigned index;
        pgd_t *pgd_dst, pgd_entry;

        /* Make sure we are in vmalloc area: */
        if (!(address >= VMALLOC_START && address < VMALLOC_END))
                return -1;

        index = pgd_index(address);
        pgd_entry = init_mm.pgd[index];
        if (!pgd_present(pgd_entry))
                return -1;

        pgd_dst = __va(PAGE_MASK & read_cr3());
        if (pgd_present(pgd_dst[index]))
                return -1;

        ACCESS_ONCE(pgd_dst[index]) = pgd_entry;
        return 0;
    }
    NOKPROBE_SYMBOL(vmalloc_fault);

and it's done.

Would anybody be willing to actually *test* something like the above?
The above may compile, but that's all the "testing" it got.

                    Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 17:34                                                                         ` Steven Rostedt
@ 2014-11-21 18:24                                                                           ` Linus Torvalds
  0 siblings, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-21 18:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 9:34 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I see. You have issues with the use of current->active_mm instead of
> just doing a read_cr3() (and I'm sure other things).

Yes. And I have this memory of it actually mattering, where we'd
get the page fault, but see that the (wrong) page table is already
populated, and say "it wasn't a vmalloc fault", and then go down the
oops path.

Of course, the context switch itself has changed completely over the
years, but I think it would still be true with NMI. "active_mm" may
point to a different page table than the one the CPU is actually
using, and then the whole thing is bogus.

               Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 18:22                                                                           ` Linus Torvalds
@ 2014-11-21 18:28                                                                             ` Andy Lutomirski
  2014-11-21 19:06                                                                             ` Linus Torvalds
  1 sibling, 0 replies; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-21 18:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 10:22 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Nov 21, 2014 at 9:22 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> Both mystify me.  Why does the 32-bit version walk down the hierarchy
>> at all instead of just touching the top level?
>
> Quite frankly, I think it's just due to historical reasons, and should
> be removed.
>
> But the historical reasons are that with the aliasing of the PUD and
> PMD entries in the PGD, it's all fairly confusing. So I think we only
> used to do the top level, but then when we expanded from two levels to
> three, that "top level" became the pmd, and then when we expanded from
> three to four, the pmd was actually two levels down. So it's all
> basically mindless work.
>
> So I do think we could simplify and unify things.
>
> In 32-bit mode, we actually have two different cases:
>
>  - in PAE, there's the magic top-level 4-entry PGD that always *has*
> to be present (the P bit isn't actually checked by hardware)
>
>     As a result, in PAE mode, the top PGD entries always exist, and
> are always prepopulated, and for the kernel area (including obviously
> the vmalloc space) always points to the init_pgd[] entry.
>
>     Ergo, in PAE mode, I don't think we should ever hit this case in
> the first place.
>
>  - in non-PAE mode, we should just copy the top-level entry, and return.
>
> And in 64-bit mode, we only have the "copy the top-level entry" case.
>
> So I think we should
>
>  (a) remove the 32-bit vs 64-bit difference, because that's not actually valid
>
>  (b) make it a PAE vs non-PAE difference
>
>  (c) the PAE case is a no-op
>
>  (d) the non-PAE case would look something like this:
>
>     static noinline int vmalloc_fault(unsigned long address)
>     {
>         unsigned index;
>         pgd_t *pgd_dst, pgd_entry;
>
>         /* Make sure we are in vmalloc area: */
>         if (!(address >= VMALLOC_START && address < VMALLOC_END))
>                 return -1;
>
>         index = pgd_index(address);
>         pgd_entry = init_mm.pgd[index];
>         if (!pgd_present(pgd_entry))
>                 return -1;
>
>         pgd_dst = __va(PAGE_MASK & read_cr3());
>         if (pgd_present(pgd_dst[index]))
>                 return -1;
>
>         ACCESS_ONCE(pgd_dst[index]) = pgd_entry;
>         return 0;
>     }
>     NOKPROBE_SYMBOL(vmalloc_fault);
>
> and it's done.
>
> Would anybody be willing to actually *test* something like the above?
> The above may compile, but that's all the "testing" it got.
>

I'd be happy to test it (i.e. boot it and try to use my computer), but
I have nowhere near enough RAM to do it right.

Is there any easy way to get the vmalloc code to randomize enough bits
to exercise this?

--Andy

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 18:22                                                                           ` Linus Torvalds
  2014-11-21 18:28                                                                             ` Andy Lutomirski
@ 2014-11-21 19:06                                                                             ` Linus Torvalds
  2014-11-21 19:23                                                                               ` Steven Rostedt
  2014-11-21 19:51                                                                               ` Thomas Gleixner
  1 sibling, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-21 19:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Steven Rostedt, Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 10:22 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>  (d) the non-PAE case would look something like this:
>
>     static noinline int vmalloc_fault(unsigned long address)
>     {
>         unsigned index;
>         pgd_t *pgd_dst, pgd_entry;
>
>         /* Make sure we are in vmalloc area: */
>         if (!(address >= VMALLOC_START && address < VMALLOC_END))
>                 return -1;

Side note: I think this is just unnecessary confusion, and generates
big constants for no good reason.

The thing is, the kernel PGD's should always be in sync. In fact, at
PGD allocation time, we just do

     clone_pgd_range(.. KERNEL_PGD_BOUNDARY, KERNEL_PGD_PTRS);

and it might actually be better to structure this to be that exact same thing.

So instead of checking the address, we could just do

        index = pgd_index(address);
        if (index < KERNEL_PGD_BOUNDARY)
                return -1;

which actually matches our initialization sequence much better anyway.
And avoids those random big constants.

Also, it turns out that this:

        if (pgd_present(pgd_dst[index]))

generates a crazy big constant because of bad compiler issues (the
"pgd_present()" thing only checks the low bit, but it does so on
pgd_flags(), which does "native_pgd_val(pgd) & PTE_FLAGS_MASK", so you
have an insane extra "and" with the constant 0xffffc00000000fff, just
to then "and" it again with "1". It doesn't do that with the first
pgd_present() check, oddly enough.

WTF, gcc?

Anyway, even more importantly, because of the whole issue with nesting
page tables, it's probably best to actually avoid all the
"pgd_present()" etc helpers, because those might be hardcoded to 1
etc. So avoid the whole issue by just accessing the raw data.

Simplify, simplify, simplify. The actual code generation for this all
should be maybe 20 instructions.

Here's the simplified end result. Again, this is TOTALLY UNTESTED. I
compiled it and verified that the code generation looks like what I'd
have expected, but that's literally it.

  static noinline int vmalloc_fault(unsigned long address)
  {
        pgd_t *pgd_dst;
        pgdval_t pgd_entry;
        unsigned index = pgd_index(address);

        if (index < KERNEL_PGD_BOUNDARY)
                return -1;

        pgd_entry = init_mm.pgd[index].pgd;
        if (!pgd_entry)
                return -1;

        pgd_dst = __va(PAGE_MASK & read_cr3());
        pgd_dst += index;

        if (pgd_dst->pgd)
                return -1;

        ACCESS_ONCE(pgd_dst->pgd) = pgd_entry;
        return 0;
  }
  NOKPROBE_SYMBOL(vmalloc_fault);

Hmm? Does anybody see anything fundamentally wrong with this?

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 19:06                                                                             ` Linus Torvalds
@ 2014-11-21 19:23                                                                               ` Steven Rostedt
  2014-11-21 19:34                                                                                 ` Linus Torvalds
  2014-11-21 19:51                                                                               ` Thomas Gleixner
  1 sibling, 1 reply; 486+ messages in thread
From: Steven Rostedt @ 2014-11-21 19:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, 21 Nov 2014 11:06:41 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:
 
>   static noinline int vmalloc_fault(unsigned long address)
>   {
>         pgd_t *pgd_dst;
>         pgdval_t pgd_entry;
>         unsigned index = pgd_index(address);
> 
>         if (index < KERNEL_PGD_BOUNDARY)
>                 return -1;
> 
>         pgd_entry = init_mm.pgd[index].pgd;
>         if (!pgd_entry)
>                 return -1;

Should we at least check to see if it is present?

	if (!(pgd_entry & 1))
		return -1;

?

-- Steve

> 
>         pgd_dst = __va(PAGE_MASK & read_cr3());
>         pgd_dst += index;
> 
>         if (pgd_dst->pgd)
>                 return -1;
> 
>         ACCESS_ONCE(pgd_dst->pgd) = pgd_entry;
>         return 0;
>   }
>   NOKPROBE_SYMBOL(vmalloc_fault);
> 
> Hmm? Does anybody see anything fundamentally wrong with this?
> 
>                      Linus


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 19:23                                                                               ` Steven Rostedt
@ 2014-11-21 19:34                                                                                 ` Linus Torvalds
  2014-11-21 19:46                                                                                   ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-21 19:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 11:23 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Should we at least check to see if it is present?
>
>         if (!(pgd_entry & 1))
>                 return -1;

Maybe. But what other entry could there be?

But yes, returning -1 is "safe", since it basically says "I'm not
doing a vmalloc thing, oops if this is a bad access". So that kind of
argues for being as aggressive as possible in returning -1.

So for the first one (!pgd_entry), instead of returning -1 only for a
completely empty entry, returning it for any non-present case is
probably right.

And for the second one (where we check whether there is anything at
all in the destination), returning -1 for "anything but zero" is
probably the right thing to do.

But in the end, if you have a corrupted top-level kernel page table,
it sounds to me like you're just royally screwed anyway. So I don't
think it matters *that* much.
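
IOW, the tightened version of those two checks would be something like
(same caveats as before - untested):

        pgd_entry = init_mm.pgd[index].pgd;
        if (!(pgd_entry & _PAGE_PRESENT))
                return -1;

        pgd_dst = __va(PAGE_MASK & read_cr3());
        pgd_dst += index;

        /* anything already there - even garbage - means "not ours, oops" */
        if (pgd_dst->pgd)
                return -1;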

So I kind of agree, but it wouldn't be my primary worry. My primary
worry is actually paravirt doing something insane.

                    Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 19:34                                                                                 ` Linus Torvalds
@ 2014-11-21 19:46                                                                                   ` Linus Torvalds
  2014-11-21 19:52                                                                                     ` Andy Lutomirski
  2014-11-21 20:00                                                                                     ` Dave Jones
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-21 19:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 11:34 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So I kind of agree, but it wouldn't be my primary worry. My primary
> worry is actually paravirt doing something insane.

Btw, on that tangent, does anybody actually care about paravirt any more?

I'd love to start moving away from it. It makes a lot of the low-level
code completely impossible to follow due to the random indirection
through "native" vs "paravirt op table". Not just the page table
handling, it's all over.

Anybody who seriously does virtualization uses hw virtualization that
is much better than it used to be. And the non-serious users aren't
that performance-sensitive by definition.

I note that the Fedora kernel config seems to include paravirt by
default, so you get a lot of the crazy overheads..

                   Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 19:06                                                                             ` Linus Torvalds
  2014-11-21 19:23                                                                               ` Steven Rostedt
@ 2014-11-21 19:51                                                                               ` Thomas Gleixner
  2014-11-21 20:00                                                                                 ` Linus Torvalds
  2014-11-21 22:33                                                                                 ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-21 19:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Steven Rostedt, Tejun Heo, linux-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, 21 Nov 2014, Linus Torvalds wrote:
> Here's the simplified end result. Again, this is TOTALLY UNTESTED. I
> compiled it and verified that the code generation looks like what I'd
> have expected, but that's literally it.
> 
>   static noinline int vmalloc_fault(unsigned long address)
>   {
>         pgd_t *pgd_dst;
>         pgdval_t pgd_entry;
>         unsigned index = pgd_index(address);
> 
>         if (index < KERNEL_PGD_BOUNDARY)
>                 return -1;
> 
>         pgd_entry = init_mm.pgd[index].pgd;
>         if (!pgd_entry)
>                 return -1;
> 
>         pgd_dst = __va(PAGE_MASK & read_cr3());
>         pgd_dst += index;
> 
>         if (pgd_dst->pgd)
>                 return -1;
> 
>         ACCESS_ONCE(pgd_dst->pgd) = pgd_entry;

This will break paravirt. set_pgd/set_pmd are paravirt functions.

But I'm fine with breaking it, then you just need to change
CONFIG_PARAVIRT to 'def_bool n'

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 19:46                                                                                   ` Linus Torvalds
@ 2014-11-21 19:52                                                                                     ` Andy Lutomirski
  2014-11-21 20:14                                                                                       ` Josh Boyer
  2014-11-21 20:00                                                                                     ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-21 19:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Tejun Heo, linux-kernel, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Fri, Nov 21, 2014 at 11:46 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Nov 21, 2014 at 11:34 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> So I kind of agree, but it wouldn't be my primary worry. My primary
>> worry is actually paravirt doing something insane.
>
> Btw, on that tangent, does anybody actually care about paravirt any more?
>

Amazon, for better or for worse.

> I'd love to start moving away from it. It makes a lot of the low-level
> code completely impossible to follow due to the random indirection
> through "native" vs "paravirt op table". Not just the page table
> handling, it's all over.
>
> Anybody who seriously does virtualization uses hw virtualization that
> is much better than it used to be. And the non-serious users aren't
> that performance-sensitive by definition.
>
> I note that the Fedora kernel config seems to include paravirt by
> default, so you get a lot of the crazy overheads..

I think that there is a move toward deprecating Xen PV in favor of
PVH, but we're not there yet.

--Andy

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 19:51                                                                               ` Thomas Gleixner
@ 2014-11-21 20:00                                                                                 ` Linus Torvalds
  2014-11-21 20:16                                                                                   ` Thomas Gleixner
  2014-11-21 22:33                                                                                 ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-21 20:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Steven Rostedt, Tejun Heo, linux-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 11:51 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> This will break paravirt. set_pgd/set_pmd are paravirt functions.

I suspect we could use "set_pgd()" here instead of the direct access.
I didn't want to walk through all the levels to see exactly which
random op I needed to use.
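
If set_pgd() does turn out to be the right op, the final store in the
sketch above would just become something like (untested):

        /* paravirt-aware version of the ACCESS_ONCE() store */
        set_pgd(pgd_dst, __pgd(pgd_entry));
        return 0;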

                Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 19:46                                                                                   ` Linus Torvalds
  2014-11-21 19:52                                                                                     ` Andy Lutomirski
@ 2014-11-21 20:00                                                                                     ` Dave Jones
  2014-11-21 20:02                                                                                       ` Andy Lutomirski
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-21 20:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Andy Lutomirski, Tejun Heo, linux-kernel,
	Thomas Gleixner, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Frederic Weisbecker, Don Zickus, the arch/x86 maintainers,
	Josh Boyer, Justin Forbes

On Fri, Nov 21, 2014 at 11:46:57AM -0800, Linus Torvalds wrote:
 
 > Anybody who seriously does virtualization uses hw virtualization that
 > is much better than it used to be. And the non-serious users aren't
 > that performance-sensitive by definition.
 > 
 > I note that the Fedora kernel config seems to include paravirt by
 > default, so you get a lot of the crazy overheads..

I'm not sure how many people actually use paravirt these days,
but the reason Fedora still has it enabled is probably
because..

config KVM_GUEST
         bool "KVM Guest support (including kvmclock)"
         depends on PARAVIRT

But tbh I've not looked at this stuff since it first got merged.
Will a full-virt system kvm boot a guest without KVM_GUEST enabled ?
(ie, is this just an optimisation for the paravirt case?)

I'm not a heavy virt user, so I don't even remember how a lot of
this stuff is supposed to work.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 20:00                                                                                     ` Dave Jones
@ 2014-11-21 20:02                                                                                       ` Andy Lutomirski
  0 siblings, 0 replies; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-21 20:02 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Steven Rostedt, Andy Lutomirski,
	Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, the arch/x86 maintainers, Josh Boyer, Justin Forbes

On Fri, Nov 21, 2014 at 12:00 PM, Dave Jones <davej@redhat.com> wrote:
> On Fri, Nov 21, 2014 at 11:46:57AM -0800, Linus Torvalds wrote:
>
>  > Anybody who seriously does virtualization uses hw virtualization that
>  > is much better than it used to be. And the non-serious users aren't
>  > that performance-sensitive by definition.
>  >
>  > I note that the Fedora kernel config seems to include paravirt by
>  > default, so you get a lot of the crazy overheads..
>
> I'm not sure how many people actually use paravirt these days,
> but the reason Fedora still has it enabled is probably
> because..
>
> config KVM_GUEST
>          bool "KVM Guest support (including kvmclock)"
>          depends on PARAVIRT
>
> But tbh I've not looked at this stuff since it first got merged.
> Will a full-virt system kvm boot a guest without KVM_GUEST enabled ?
> (ie, is this just an optimisation for the paravirt case?)
>

It will boot just fine, although there may be some timing glitches.

I think we should have PARAVIRT_LITE that's just enough for KVM.  That
probably involves some apic changes and nothing else.

--Andy

> I'm not a heavy virt user, so I don't even remember how a lot of
> this stuff is supposed to work.
>
>         Dave
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 19:52                                                                                     ` Andy Lutomirski
@ 2014-11-21 20:14                                                                                       ` Josh Boyer
  2014-11-21 20:16                                                                                         ` Andy Lutomirski
  0 siblings, 1 reply; 486+ messages in thread
From: Josh Boyer @ 2014-11-21 20:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Steven Rostedt, Tejun Heo, linux-kernel,
	Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker, Don Zickus,
	Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 2:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Nov 21, 2014 at 11:46 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Fri, Nov 21, 2014 at 11:34 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>>
>>> So I kind of agree, but it wouldn't be my primary worry. My primary
>>> worry is actually paravirt doing something insane.
>>
>> Btw, on that tangent, does anybody actually care about paravirt any more?
>>
>
> Amazon, for better or for worse.
>
>> I'd love to start moving away from it. It makes a lot of the low-level
>> code completely impossible to follow due to the random indirection
>> through "native" vs "paravirt op table". Not just the page table
>> handling, it's all over.
>>
>> Anybody who seriously does virtualization uses hw virtualization that
>> is much better than it used to be. And the non-serious users aren't
>> that performance-sensitive by definition.
>>
>> I note that the Fedora kernel config seems to include paravirt by
>> default, so you get a lot of the crazy overheads..
>
> I think that there is a move toward deprecating Xen PV in favor of
> PVH, but we're not there yet.

A move where?  The Xen stuff in Fedora is ... not paid attention to
very much.  If there's something we should be looking at turning off
(or on), we're happy to take suggestions.

josh

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 20:14                                                                                       ` Josh Boyer
@ 2014-11-21 20:16                                                                                         ` Andy Lutomirski
  2014-11-21 20:23                                                                                           ` Josh Boyer
  0 siblings, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-21 20:16 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Linus Torvalds, Steven Rostedt, Tejun Heo, linux-kernel,
	Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker, Don Zickus,
	Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 12:14 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> On Fri, Nov 21, 2014 at 2:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Nov 21, 2014 at 11:46 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>> On Fri, Nov 21, 2014 at 11:34 AM, Linus Torvalds
>>> <torvalds@linux-foundation.org> wrote:
>>>>
>>>> So I kind of agree, but it wouldn't be my primary worry. My primary
>>>> worry is actually paravirt doing something insane.
>>>
>>> Btw, on that tangent, does anybody actually care about paravirt any more?
>>>
>>
>> Amazon, for better or for worse.
>>
>>> I'd love to start moving away from it. It makes a lot of the low-level
>>> code completely impossible to follow due to the random indirection
>>> through "native" vs "paravirt op table". Not just the page table
>>> handling, it's all over.
>>>
>>> Anybody who seriously does virtualization uses hw virtualization that
>>> is much better than it used to be. And the non-serious users aren't
>>> that performance-sensitive by definition.
>>>
>>> I note that the Fedora kernel config seems to include paravirt by
>>> default, so you get a lot of the crazy overheads..
>>
>> I think that there is a move toward deprecating Xen PV in favor of
>> PVH, but we're not there yet.
>
> A move where?  The Xen stuff in Fedora is ... not paid attention to
> very much.  If there's something we should be looking at turning off
> (or on), we're happy to take suggestions.

A move in the Xen project.  As I understand it, Xen wants to deprecate
PV in favor of PVH, but PVH is still experimental.

I think that dropping PARAVIRT in Fedora might be a bad idea for
several more releases, since that's likely to break the EC2 images.

--Andy

>
> josh



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 20:00                                                                                 ` Linus Torvalds
@ 2014-11-21 20:16                                                                                   ` Thomas Gleixner
  2014-11-21 20:41                                                                                     ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-21 20:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Steven Rostedt, Tejun Heo, linux-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, 21 Nov 2014, Linus Torvalds wrote:

> On Fri, Nov 21, 2014 at 11:51 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > This will break paravirt. set_pgd/set_pmd are paravirt functions.
> 
> I suspect we could use "set_pgd()" here instead of the direct access.
> I didn't want to walk through all the levels to see exactly which
> random op I needed to use.

I don't think that works on 32bit. See the magic in
vmalloc_sync_one().

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 20:16                                                                                         ` Andy Lutomirski
@ 2014-11-21 20:23                                                                                           ` Josh Boyer
  2014-11-24 18:48                                                                                             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 486+ messages in thread
From: Josh Boyer @ 2014-11-21 20:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Steven Rostedt, Tejun Heo, linux-kernel,
	Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker, Don Zickus,
	Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 3:16 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Nov 21, 2014 at 12:14 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>> On Fri, Nov 21, 2014 at 2:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Fri, Nov 21, 2014 at 11:46 AM, Linus Torvalds
>>> <torvalds@linux-foundation.org> wrote:
>>>> On Fri, Nov 21, 2014 at 11:34 AM, Linus Torvalds
>>>> <torvalds@linux-foundation.org> wrote:
>>>>>
>>>>> So I kind of agree, but it wouldn't be my primary worry. My primary
>>>>> worry is actually paravirt doing something insane.
>>>>
>>>> Btw, on that tangent, does anybody actually care about paravirt any more?
>>>>
>>>
>>> Amazon, for better or for worse.
>>>
>>>> I'd love to start moving away from it. It makes a lot of the low-level
>>>> code completely impossible to follow due to the random indirection
>>>> through "native" vs "paravirt op table". Not just the page table
>>>> handling, it's all over.
>>>>
>>>> Anybody who seriously does virtualization uses hw virtualization that
>>>> is much better than it used to be. And the non-serious users aren't
>>>> that performance-sensitive by definition.
>>>>
>>>> I note that the Fedora kernel config seems to include paravirt by
>>>> default, so you get a lot of the crazy overheads..
>>>
>>> I think that there is a move toward deprecating Xen PV in favor of
>>> PVH, but we're not there yet.
>>
>> A move where?  The Xen stuff in Fedora is ... not paid attention to
>> very much.  If there's something we should be looking at turning off
>> (or on), we're happy to take suggestions.
>
> A move in the Xen project.  As I understand it, Xen wants to deprecate
> PV in favor of PVH, but PVH is still experimental.

OK.

> I think that dropping PARAVIRT in Fedora might be a bad idea for
> several more releases, since that's likely to break the EC2 images.

Yes, that's essentially the only reason we haven't looked at disabling
Xen completely for a while now, so <sad trombone>.

josh

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 20:16                                                                                   ` Thomas Gleixner
@ 2014-11-21 20:41                                                                                     ` Linus Torvalds
  2014-11-21 21:11                                                                                       ` Thomas Gleixner
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-21 20:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Steven Rostedt, Tejun Heo, linux-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 12:16 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> I don't think that works on 32bit. See the magic in
> vmalloc_sync_one().

Heh. I guess we could just add a wrapper around this crap, and make it
very clear that the paravirt case is a horrible horrible hack.

Something like

   #define set_one_pgd_entry(entry,pgdp) (pgdp)->pgd = (entry)

for the regular case, and then for paravirt we do something very
explicitly horrid, like

   #ifdef CONFIG_PARAVIRT
   #ifdef CONFIG_X86_32
   // The pmd is the top-level page directory on non-PAE x86, nested inside pgd/pud
   #define set_one_pgd_entry(entry,pgdp) set_pmd((pmd_t *)(pgdp), (pmd_t) { entry } )
   #else
   #define set_one_pgd_entry(entry, pgdp) do { set_pgd(pgdp, (pgd_t) { entry });  arch_flush_lazy_mmu_mode(); } while (0)
   #endif

because on x86-64, there seems to be that whole lazy_mode pv_ops
craziness (which I'm not at all convinced is needed here, but that's
what the current code does).

                Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 20:41                                                                                     ` Linus Torvalds
@ 2014-11-21 21:11                                                                                       ` Thomas Gleixner
  2014-11-21 22:55                                                                                         ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-21 21:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Steven Rostedt, Tejun Heo, linux-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, 21 Nov 2014, Linus Torvalds wrote:
> On Fri, Nov 21, 2014 at 12:16 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > I don't think that works on 32bit. See the magic in
> > vmalloc_sync_one().
> 
> Heh. I guess we could just add a wrapper around this crap, and make it
> very clear that the paravirt case is a horrible horrible hack.
> 
> Something like
> 
>    #define set_one_pgd_entry(entry,pgdp) (pgdp)->pgd = (entry)
> 
> for the regular case, and then for paravirt we do something very
> explicitly horrid, like
> 
>    #ifdef CONFIG_PARAVIRT
>    #ifdef CONFIG_X86_32
>    // The pmd is the top-level page directory on non-PAE x86, nested
> inside pgd/pud
>    #define set_one_pgd_entry(entry,pgdp) set_pmd((pmd_t *)(pgdp),
> (pmd_t) { entry } )
>    #else
>    #define set_one_pgd_entry(entry, pgdp) do { set_pgd(pgdp, (pgd_t) {
> entry });  arch_flush_lazy_mmu_mode(); } while (0)
>    #endif
> 
> because on x86-64, there seems to be that whole lazy_mode pv_ops
> craziness (which I'm not at all convinced is needed here, but that's
> what the current code does).

I'm fine with that. I just think it's not horrid enough, but that can
be fixed easily :)

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 17:01                                                               ` Steven Rostedt
  2014-11-21 17:11                                                                 ` Steven Rostedt
@ 2014-11-21 21:32                                                                 ` Frederic Weisbecker
  2014-11-21 21:34                                                                   ` Andy Lutomirski
  1 sibling, 1 reply; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-21 21:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Tejun Heo, Thomas Gleixner, Linus Torvalds, Dave Jones,
	Don Zickus, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra, Andy Lutomirski, Arnaldo Carvalho de Melo

On Fri, Nov 21, 2014 at 12:01:51PM -0500, Steven Rostedt wrote:
> On Fri, Nov 21, 2014 at 11:25:06AM -0500, Tejun Heo wrote:
> > 
> > * Static percpu areas wouldn't trigger fault lazily.  Note that this
> >   is not necessarily because the first percpu chunk which contains the
> >   static area is embedded inside the kernel linear mapping.  Depending
> >   on the memory layout and boot param, percpu allocator may choose to
> >   map the first chunk in vmalloc space too; however, this still works
> >   out fine because at that point there are no other page tables and
> >   the PUD entries covering the first chunk is faulted in before other
> >   pages tables are copied from the kernel one.
> 
> That sounds correct.
> 
> > 
> > * NMI used to be a problem because vmalloc fault handler couldn't
> >   safely nest inside NMI handler but this has been fixed since and it
> >   should work fine from NMI handlers now.
> 
> Right. Of course "should work fine" does not exactly mean "will work fine".
> 
> 
> > 
> > * Function tracers are problematic because they may end up nesting
> >   inside themselves through triggering a vmalloc fault while accessing
> >   dynamic percpu memory area.  This may lead to recursive locking and
> >   other surprises.
> 
> The function tracer infrastructure now has a recursive check that happens
> rather early in the call. Unless the registered OPS specifically states
it handles recursions (FTRACE_OPS_FL_RECURSION_SAFE), ftrace will add the
necessary recursion checks. If a registered OPS lies about being recursion
> safe, well we can't stop suicide.

Same if the recursion state is based on per cpu memory.

> 
> Looking at kernel/trace/trace_functions.c: function_trace_call() which is
> registered with RECURSION_SAFE, I see that the recursion check is done
> before the per_cpu_ptr() call to the dynamically allocated per_cpu data.
> 
> It looks OK, but...
> 
> Oh! but if we trace the page fault handler, and we fault here too
> we just nuked the cr2 register. Not good.

If we fault in the page fault handler, we double fault and apparently
recovering from that isn't quite expected anyway.

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 21:32                                                                 ` Frederic Weisbecker
@ 2014-11-21 21:34                                                                   ` Andy Lutomirski
  2014-11-21 21:50                                                                     ` Frederic Weisbecker
  0 siblings, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-21 21:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steven Rostedt, Tejun Heo, Thomas Gleixner, Linus Torvalds,
	Dave Jones, Don Zickus, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra, Arnaldo Carvalho de Melo

On Fri, Nov 21, 2014 at 1:32 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Fri, Nov 21, 2014 at 12:01:51PM -0500, Steven Rostedt wrote:
>> On Fri, Nov 21, 2014 at 11:25:06AM -0500, Tejun Heo wrote:
>> >
>> > * Static percpu areas wouldn't trigger fault lazily.  Note that this
>> >   is not necessarily because the first percpu chunk which contains the
>> >   static area is embedded inside the kernel linear mapping.  Depending
>> >   on the memory layout and boot param, percpu allocator may choose to
>> >   map the first chunk in vmalloc space too; however, this still works
>> >   out fine because at that point there are no other page tables and
>> >   the PUD entries covering the first chunk is faulted in before other
>> >   pages tables are copied from the kernel one.
>>
>> That sounds correct.
>>
>> >
>> > * NMI used to be a problem because vmalloc fault handler couldn't
>> >   safely nest inside NMI handler but this has been fixed since and it
>> >   should work fine from NMI handlers now.
>>
>> Right. Of course "should work fine" does not exactly mean "will work fine".
>>
>>
>> >
>> > * Function tracers are problematic because they may end up nesting
>> >   inside themselves through triggering a vmalloc fault while accessing
>> >   dynamic percpu memory area.  This may lead to recursive locking and
>> >   other surprises.
>>
>> The function tracer infrastructure now has a recursive check that happens
>> rather early in the call. Unless the registered OPS specifically states
>> it handles recursions (FTRACE_OPS_FL_RECURSION_SAFE), ftrace will add the
>> necessary recursion checks. If a registered OPS lies about being recursion
>> safe, well we can't stop suicide.
>
> Same if the recursion state is based on per cpu memory.
>
>>
>> Looking at kernel/trace/trace_functions.c: function_trace_call() which is
>> registered with RECURSION_SAFE, I see that the recursion check is done
>> before the per_cpu_ptr() call to the dynamically allocated per_cpu data.
>>
>> It looks OK, but...
>>
>> Oh! but if we trace the page fault handler, and we fault here too
>> we just nuked the cr2 register. Not good.
>
> If we fault in the page fault handler, we double fault and apparently
> recovering from that isn't quite expected anyway.

Not quite.  We only double fault if we fault while pushing the
hardware part of the state onto the stack.  That happens even before
the entry asm gets run.

Otherwise if we have a page fault inside do_page_fault, it's just a
nested page fault.

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 16:25                                                             ` Tejun Heo
  2014-11-21 17:01                                                               ` Steven Rostedt
@ 2014-11-21 21:44                                                               ` Frederic Weisbecker
  2014-11-22  0:11                                                                 ` Tejun Heo
  1 sibling, 1 reply; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-21 21:44 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Linus Torvalds, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Fri, Nov 21, 2014 at 11:25:06AM -0500, Tejun Heo wrote:
> Hello, Frederic.
> 
> On Fri, Nov 21, 2014 at 03:13:35PM +0100, Frederic Weisbecker wrote:
> ...
> > So when the issue arised 4 years ago, it was a problem only for NMIs.
> > Like Linus says: "what happens in NMI stays in NMI". Ok no that's not quite
> > what he says :-)  But NMIs happen to be a corner case for about everything
> > and it's sometimes better to fix things from NMI itself, or have an NMI
> > special case rather than grow the whole infrastructure in complexity to
> > support this very corner case.
> 
> I'm not familiar with the innards of fault handling, so can you please
> help me understand what may actually break?  Here are what I currently
> understand.
> 
> * Static percpu areas wouldn't trigger fault lazily.  Note that this
>   is not necessarily because the first percpu chunk which contains the
>   static area is embedded inside the kernel linear mapping.  Depending
>   on the memory layout and boot param, percpu allocator may choose to
>   map the first chunk in vmalloc space too; however, this still works
>   out fine because at that point there are no other page tables and
>   the PUD entries covering the first chunk is faulted in before other
>   pages tables are copied from the kernel one.
> 
> * NMI used to be a problem because vmalloc fault handler couldn't
>   safely nest inside NMI handler but this has been fixed since and it
>   should work fine from NMI handlers now.
> 
> * Function tracers are problematic because they may end up nesting
>   inside themselves through triggering a vmalloc fault while accessing
>   dynamic percpu memory area.  This may lead to recursive locking and
>   other surprises.
> 
> Are there other cases where the lazy vmalloc faults can break things?

I fear that enumerating and fixing the existing issues won't be enough.
We can't find all the code sites out there which rely on not being
faulted.

The best would be to fix that from the percpu allocator itself, or vmalloc.

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 21:34                                                                   ` Andy Lutomirski
@ 2014-11-21 21:50                                                                     ` Frederic Weisbecker
  2014-11-21 22:45                                                                       ` Steven Rostedt
  0 siblings, 1 reply; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-21 21:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Steven Rostedt, Tejun Heo, Thomas Gleixner, Linus Torvalds,
	Dave Jones, Don Zickus, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra, Arnaldo Carvalho de Melo

On Fri, Nov 21, 2014 at 01:34:08PM -0800, Andy Lutomirski wrote:
> On Fri, Nov 21, 2014 at 1:32 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > On Fri, Nov 21, 2014 at 12:01:51PM -0500, Steven Rostedt wrote:
> >> On Fri, Nov 21, 2014 at 11:25:06AM -0500, Tejun Heo wrote:
> >> >
> >> > * Static percpu areas wouldn't trigger fault lazily.  Note that this
> >> >   is not necessarily because the first percpu chunk which contains the
> >> >   static area is embedded inside the kernel linear mapping.  Depending
> >> >   on the memory layout and boot param, percpu allocator may choose to
> >> >   map the first chunk in vmalloc space too; however, this still works
> >> >   out fine because at that point there are no other page tables and
> >> >   the PUD entries covering the first chunk is faulted in before other
> >> >   pages tables are copied from the kernel one.
> >>
> >> That sounds correct.
> >>
> >> >
> >> > * NMI used to be a problem because vmalloc fault handler couldn't
> >> >   safely nest inside NMI handler but this has been fixed since and it
> >> >   should work fine from NMI handlers now.
> >>
> >> Right. Of course "should work fine" does not exactly mean "will work fine".
> >>
> >>
> >> >
> >> > * Function tracers are problematic because they may end up nesting
> >> >   inside themselves through triggering a vmalloc fault while accessing
> >> >   dynamic percpu memory area.  This may lead to recursive locking and
> >> >   other surprises.
> >>
> >> The function tracer infrastructure now has a recursive check that happens
> >> rather early in the call. Unless the registered OPS specifically states
> >> it handles recursions (FTRACE_OPS_FL_RECURSION_SAFE), ftrace will add the
> >> necessary recursion checks. If a registered OPS lies about being recursion
> >> safe, well we can't stop suicide.
> >
> > Same if the recursion state is based on per cpu memory.
> >
> >>
> >> Looking at kernel/trace/trace_functions.c: function_trace_call() which is
> >> registered with RECURSION_SAFE, I see that the recursion check is done
> >> before the per_cpu_ptr() call to the dynamically allocated per_cpu data.
> >>
> >> It looks OK, but...
> >>
> >> Oh! but if we trace the page fault handler, and we fault here too
> >> we just nuked the cr2 register. Not good.
> >
> > If we fault in the page fault handler, we double fault and apparently
> > recovering from that isn't quite expected anyway.
> 
> Not quite.  We only double fault if we fault while pushing the
> hardware part of the state onto the stack.  That happens even before
> the entry asm gets run.
> 
> Otherwise if we have a page fault inside do_page_fault, it's just a
> nested page fault.

Oh ok!

But we still have the cr2 issue that Steve talked about.

> 
> --Andy
> 
> 
> -- 
> Andy Lutomirski
> AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 16:38                                                                 ` Andy Lutomirski
  2014-11-21 16:48                                                                   ` Linus Torvalds
@ 2014-11-21 22:10                                                                   ` Frederic Weisbecker
  1 sibling, 0 replies; 486+ messages in thread
From: Frederic Weisbecker @ 2014-11-21 22:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Linus Torvalds,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 08:38:07AM -0800, Andy Lutomirski wrote:
> On Nov 21, 2014 8:27 AM, "Tejun Heo" <tj@kernel.org> wrote:
> >
> > Hello, Andy.
> >
> > On Thu, Nov 20, 2014 at 03:55:09PM -0800, Andy Lutomirski wrote:
> > > That doesn't appear to have anything to with nmi though, right?
> >
> > I thought that was the main offender but, apparently, not any more.
> >
> > > Wouldn't this issue be fixed by moving the vmalloc_fault check into
> > > do_page_fault before exception_enter?
> >
> > Can you please elaborate why that'd fix the issue?  I'm not
> > intimiately familiar with the fault handling so it'd be great if you
> > can give me some pointers in terms of where to look at.
> 
> do_page_fault is called directly from asm.  It does:
> 
>     prev_state = exception_enter();
>     __do_page_fault(regs, error_code, address);
>     exception_exit(prev_state);
> 
> The vmalloc fixup is in __do_page_fault.
> 
> exception_enter does various accounting and tracing things, and I
> think that the recursion in stack trace I saw was in exception_enter.
> 
> If you move the vmalloc fixup before exception_enter() and return if
> the fault was from vmalloc, then you can't recurse.  You need to be
> careful not to touch anything that uses RCU before exception_enter,
> though.

That fixes the exception_enter() recursion but surely more issues with
per cpu memory faults are lurking somewhere now or in the future.

I'm going to add recursion protection to user_exit()/user_enter() anyway.
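
(For reference, a purely untested sketch of the reordering discussed
above -- handling kernel vmalloc faults before exception_enter() runs --
might look something like the following.  It assumes the vmalloc check
itself touches nothing that uses RCU or dynamic percpu memory:)

dotraplinkage void notrace
do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
	unsigned long address = read_cr2();	/* grab cr2 before anything traceable */
	enum ctx_state prev_state;

	/*
	 * Fix up lazily-propagated kernel mappings first, without any
	 * context tracking or tracing involvement, so a fault on dynamic
	 * percpu memory can never recurse into exception_enter().
	 */
	if (unlikely(fault_in_kernel_space(address)) &&
	    !(error_code & (PF_RSVD | PF_USER | PF_PROT))) {
		if (vmalloc_fault(address) >= 0)
			return;
	}

	prev_state = exception_enter();
	__do_page_fault(regs, error_code, address);
	exception_exit(prev_state);
}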

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 19:51                                                                               ` Thomas Gleixner
  2014-11-21 20:00                                                                                 ` Linus Torvalds
@ 2014-11-21 22:33                                                                                 ` Konrad Rzeszutek Wilk
  2014-11-22  1:17                                                                                   ` Thomas Gleixner
  1 sibling, 1 reply; 486+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-21 22:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Andy Lutomirski, Steven Rostedt, Tejun Heo,
	linux-kernel, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers, xen-devel

On Fri, Nov 21, 2014 at 08:51:43PM +0100, Thomas Gleixner wrote:
> On Fri, 21 Nov 2014, Linus Torvalds wrote:
> > Here's the simplified end result. Again, this is TOTALLY UNTESTED. I
> > compiled it and verified that the code generation looks like what I'd
> > have expected, but that's literally it.
> > 
> >   static noinline int vmalloc_fault(unsigned long address)
> >   {
> >         pgd_t *pgd_dst;
> >         pgdval_t pgd_entry;
> >         unsigned index = pgd_index(address);
> > 
> >         if (index < KERNEL_PGD_BOUNDARY)
> >                 return -1;
> > 
> >         pgd_entry = init_mm.pgd[index].pgd;
> >         if (!pgd_entry)
> >                 return -1;
> > 
> >         pgd_dst = __va(PAGE_MASK & read_cr3());
> >         pgd_dst += index;
> > 
> >         if (pgd_dst->pgd)
> >                 return -1;
> > 
> >         ACCESS_ONCE(pgd_dst->pgd) = pgd_entry;
> 
> This will break paravirt. set_pgd/set_pmd are paravirt functions.
> 
> But I'm fine with breaking it, then you just need to change
> CONFIG_PARAVIRT to 'def_bool n'

That is not very nice.

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 21:50                                                                     ` Frederic Weisbecker
@ 2014-11-21 22:45                                                                       ` Steven Rostedt
  0 siblings, 0 replies; 486+ messages in thread
From: Steven Rostedt @ 2014-11-21 22:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andy Lutomirski, Tejun Heo, Thomas Gleixner, Linus Torvalds,
	Dave Jones, Don Zickus, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra, Arnaldo Carvalho de Melo

On Fri, 21 Nov 2014 22:50:41 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > Otherwise if we have a page fault inside do_page_fault, it's just a
> > nested page fault.
> 
> Oh ok!
> 
> But we still have the cr2 issue that Steve talked about.
>

Nope, as I looked at the code, I noticed that do_page_fault -- the
wrapper for __do_page_fault, which is traced -- isn't itself traced. And
do_page_fault() saves off the cr2 before calling anything else.

So we are ok in this respect as well.

-- Steve

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 21:11                                                                                       ` Thomas Gleixner
@ 2014-11-21 22:55                                                                                         ` Linus Torvalds
  2014-11-21 23:03                                                                                           ` Andy Lutomirski
  2014-12-16 19:28                                                                                           ` Peter Zijlstra
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-21 22:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Steven Rostedt, Tejun Heo, linux-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 1244 bytes --]

On Fri, Nov 21, 2014 at 1:11 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> I'm fine with that. I just think it's not horrid enough, but that can
> be fixed easily :)

Oh, I think it's plenty horrid.

Anyway, here's an actual patch. As usual, it has seen absolutely no
actual testing, but I did try to make sure it compiles and seems to do
the right thing on:
 - x86-32 no-PAE
 - x86-32 no-PAE with PARAVIRT
 - x86-32 PAE
 - x86-64

also, I just removed the noise that is "vmalloc_sync_all()", since
it's just all garbage and nothing actually uses it. Yeah, it's used by
"register_die_notifier()", which makes no sense what-so-ever.
Whatever. It's gone.

Can somebody actually *test* this? In particular, in any kind of real
paravirt environment? Or, any comments even without testing?

I *really* am not proud of the mess wrt the whole

  #ifdef CONFIG_PARAVIRT
  #ifdef CONFIG_X86_32
    ...

but I think that from a long-term perspective, we're actually better
off with this kind of really ugly - but very explicit - hack that very
clearly shows what is going on.

The old code that actually "walked" the page tables was more
"portable", but was somewhat misleading about what was actually going
on.

Comments?

                   Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 6316 bytes --]

 arch/x86/mm/fault.c | 243 +++++++++++++---------------------------------------
 1 file changed, 58 insertions(+), 185 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d973e61e450d..4b0a1b9404b1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -42,6 +42,64 @@ enum x86_pf_error_code {
 };
 
 /*
+ * Handle a possible vmalloc fault. We just copy the
+ * top-level page table entry if necessary.
+ *
+ * With PAE, the top-most pgd entry is always shared,
+ * and that's where the vmalloc area is.  So PAE had
+ * better never have any vmalloc faults.
+ *
+ * NOTE! This on purpose does *NOT* use pgd_present()
+ * and such generic accessor functions, because
+ * the pgd may contain a folded pud/pmd, and is thus
+ * always "present". We access the actual hardware
+ * state directly, except for the final "set_pgd()"
+ * that may go through a paravirtualization layer.
+ *
+ * Also note the disgusting hackery for the whole
+ * paravirtualization case. Since PAE isn't an issue,
+ * we know that the pmd is the top level, and we just
+ * short-circuit it all.
+ *
+ * We *seriously* need to get rid of the crazy
+ * paravirtualization crud.
+ */
+static nokprobe_inline int vmalloc_fault(unsigned long address)
+{
+#ifdef CONFIG_X86_PAE
+	return -1;
+#else
+	pgd_t *pgd_dst, pgd_entry;
+	unsigned index = pgd_index(address);
+
+	if (index < KERNEL_PGD_BOUNDARY)
+		 return -1;
+
+	pgd_entry = init_mm.pgd[index];
+	if (!(pgd_entry.pgd & 1))
+		return -1;
+
+	pgd_dst = __va(PAGE_MASK & read_cr3());
+	pgd_dst += index;
+
+	if (pgd_dst->pgd)
+		return -1;
+
+#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_X86_32
+	set_pmd((pmd_t *)pgd_dst, (pmd_t){(pud_t){pgd_entry}});
+#else
+	set_pgd(pgd_dst, pgd_entry);
+	arch_flush_lazy_mmu_mode(); // WTF?
+#endif
+#else
+	*pgd_dst = pgd_entry;
+#endif
+	return 0;
+#endif
+}
+
+/*
  * Returns 0 if mmiotrace is disabled, or if the fault is not
  * handled by mmiotrace:
  */
@@ -189,110 +247,6 @@ DEFINE_SPINLOCK(pgd_lock);
 LIST_HEAD(pgd_list);
 
 #ifdef CONFIG_X86_32
-static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
-{
-	unsigned index = pgd_index(address);
-	pgd_t *pgd_k;
-	pud_t *pud, *pud_k;
-	pmd_t *pmd, *pmd_k;
-
-	pgd += index;
-	pgd_k = init_mm.pgd + index;
-
-	if (!pgd_present(*pgd_k))
-		return NULL;
-
-	/*
-	 * set_pgd(pgd, *pgd_k); here would be useless on PAE
-	 * and redundant with the set_pmd() on non-PAE. As would
-	 * set_pud.
-	 */
-	pud = pud_offset(pgd, address);
-	pud_k = pud_offset(pgd_k, address);
-	if (!pud_present(*pud_k))
-		return NULL;
-
-	pmd = pmd_offset(pud, address);
-	pmd_k = pmd_offset(pud_k, address);
-	if (!pmd_present(*pmd_k))
-		return NULL;
-
-	if (!pmd_present(*pmd))
-		set_pmd(pmd, *pmd_k);
-	else
-		BUG_ON(pmd_page(*pmd) != pmd_page(*pmd_k));
-
-	return pmd_k;
-}
-
-void vmalloc_sync_all(void)
-{
-	unsigned long address;
-
-	if (SHARED_KERNEL_PMD)
-		return;
-
-	for (address = VMALLOC_START & PMD_MASK;
-	     address >= TASK_SIZE && address < FIXADDR_TOP;
-	     address += PMD_SIZE) {
-		struct page *page;
-
-		spin_lock(&pgd_lock);
-		list_for_each_entry(page, &pgd_list, lru) {
-			spinlock_t *pgt_lock;
-			pmd_t *ret;
-
-			/* the pgt_lock only for Xen */
-			pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
-
-			spin_lock(pgt_lock);
-			ret = vmalloc_sync_one(page_address(page), address);
-			spin_unlock(pgt_lock);
-
-			if (!ret)
-				break;
-		}
-		spin_unlock(&pgd_lock);
-	}
-}
-
-/*
- * 32-bit:
- *
- *   Handle a fault on the vmalloc or module mapping area
- */
-static noinline int vmalloc_fault(unsigned long address)
-{
-	unsigned long pgd_paddr;
-	pmd_t *pmd_k;
-	pte_t *pte_k;
-
-	/* Make sure we are in vmalloc area: */
-	if (!(address >= VMALLOC_START && address < VMALLOC_END))
-		return -1;
-
-	WARN_ON_ONCE(in_nmi());
-
-	/*
-	 * Synchronize this task's top level page-table
-	 * with the 'reference' page table.
-	 *
-	 * Do _not_ use "current" here. We might be inside
-	 * an interrupt in the middle of a task switch..
-	 */
-	pgd_paddr = read_cr3();
-	pmd_k = vmalloc_sync_one(__va(pgd_paddr), address);
-	if (!pmd_k)
-		return -1;
-
-	pte_k = pte_offset_kernel(pmd_k, address);
-	if (!pte_present(*pte_k))
-		return -1;
-
-	return 0;
-}
-NOKPROBE_SYMBOL(vmalloc_fault);
-
 /*
  * Did it hit the DOS screen memory VA from vm86 mode?
  */
@@ -347,87 +301,6 @@ out:
 
 #else /* CONFIG_X86_64: */
 
-void vmalloc_sync_all(void)
-{
-	sync_global_pgds(VMALLOC_START & PGDIR_MASK, VMALLOC_END, 0);
-}
-
-/*
- * 64-bit:
- *
- *   Handle a fault on the vmalloc area
- *
- * This assumes no large pages in there.
- */
-static noinline int vmalloc_fault(unsigned long address)
-{
-	pgd_t *pgd, *pgd_ref;
-	pud_t *pud, *pud_ref;
-	pmd_t *pmd, *pmd_ref;
-	pte_t *pte, *pte_ref;
-
-	/* Make sure we are in vmalloc area: */
-	if (!(address >= VMALLOC_START && address < VMALLOC_END))
-		return -1;
-
-	WARN_ON_ONCE(in_nmi());
-
-	/*
-	 * Copy kernel mappings over when needed. This can also
-	 * happen within a race in page table update. In the later
-	 * case just flush:
-	 */
-	pgd = pgd_offset(current->active_mm, address);
-	pgd_ref = pgd_offset_k(address);
-	if (pgd_none(*pgd_ref))
-		return -1;
-
-	if (pgd_none(*pgd)) {
-		set_pgd(pgd, *pgd_ref);
-		arch_flush_lazy_mmu_mode();
-	} else {
-		BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
-	}
-
-	/*
-	 * Below here mismatches are bugs because these lower tables
-	 * are shared:
-	 */
-
-	pud = pud_offset(pgd, address);
-	pud_ref = pud_offset(pgd_ref, address);
-	if (pud_none(*pud_ref))
-		return -1;
-
-	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
-		BUG();
-
-	pmd = pmd_offset(pud, address);
-	pmd_ref = pmd_offset(pud_ref, address);
-	if (pmd_none(*pmd_ref))
-		return -1;
-
-	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
-		BUG();
-
-	pte_ref = pte_offset_kernel(pmd_ref, address);
-	if (!pte_present(*pte_ref))
-		return -1;
-
-	pte = pte_offset_kernel(pmd, address);
-
-	/*
-	 * Don't use pte_page here, because the mappings can point
-	 * outside mem_map, and the NUMA hash lookup cannot handle
-	 * that:
-	 */
-	if (!pte_present(*pte) || pte_pfn(*pte) != pte_pfn(*pte_ref))
-		BUG();
-
-	return 0;
-}
-NOKPROBE_SYMBOL(vmalloc_fault);
-
 #ifdef CONFIG_CPU_SUP_AMD
 static const char errata93_warning[] =
 KERN_ERR 

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 22:55                                                                                         ` Linus Torvalds
@ 2014-11-21 23:03                                                                                           ` Andy Lutomirski
  2014-11-21 23:33                                                                                             ` Linus Torvalds
  2014-12-16 19:28                                                                                           ` Peter Zijlstra
  1 sibling, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-21 23:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Steven Rostedt, Tejun Heo, linux-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 2:55 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Nov 21, 2014 at 1:11 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> I'm fine with that. I just think it's not horrid enough, but that can
>> be fixed easily :)
>
> Oh, I think it's plenty horrid.
>
> Anyway, here's an actual patch. As usual, it has seen absolutely no
> actual testing, but I did try to make sure it compiles and seems to do
> the right thing on:
>  - x86-32 no-PAE
>  - x86-32 no-PAE with PARAVIRT
>  - x86-32 PAE
>  - x86-64
>
> also, I just removed the noise that is "vmalloc_sync_all()", since
> it's just all garbage and nothing actually uses it. Yeah, it's used by
> "register_die_notifier()", which makes no sense what-so-ever.
> Whatever. It's gone.
>
> Can somebody actually *test* this? In particular, in any kind of real
> paravirt environment? Or, any comments even without testing?
>
> I *really* am not proud of the mess wrt the whole
>
>   #ifdef CONFIG_PARAVIRT
>   #ifdef CONFIG_X86_32
>     ...
>
> but I think that from a long-term perspective, we're actually better
> off with this kind of really ugly - but very explicit - hack that very
> clearly shows what is going on.
>
> The old code that actually "walked" the page tables was more
> "portable", but was somewhat misleading about what was actually going
> on.

At the risk of going deeper down the rabbit hole, I grepped for
pgd_list.  I found:

__set_pmd_pte in pageattr.c.  It appears to be completely incorrect.
Unless I've misunderstood, other than the very first line, it will
either do nothing at all or crash when it falls off the end of the
page tables that it's pointlessly trying to update.

sync_global_pgds: OK, I guess -- this is for hot-add of memory, right?
 But if we teach the context switch code to check that the kernel
stack is okay, that can be removed, I think.  (We absolutely MUST keep
the static per-cpu stuff populated everywhere before running user
code, but that's never in hot-added memory.)

xen_mm_pin_all and xen_mm_unpin_all: I have no clue.  I wonder how
that works with SHARED_KERNEL_PMD.

Anyone want to attack these?  It would be kind of nice to remove
pgd_list entirely.  (I realize that doing so precludes the use of
bloody enormous 512GB kernel pages, but any attempt to use *those* is
so completely screwed without a major reworking of all of this (or
perhaps stop_machine) that keeping pgd_list around just for that is
probably a mistake.)
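
(Purely hypothetical sketch of the sync_global_pgds point above -- i.e.
what "teach the context switch code to check that the kernel stack is
okay" could look like.  The helper name is made up and this is not
existing kernel code; the idea is just to copy the top-level entry
covering the incoming task's kernel stack from init_mm if it is missing:)

static inline void sync_stack_pgd(struct mm_struct *next,
				  struct task_struct *tsk)
{
	/* hypothetical helper, for illustration only */
	unsigned long addr = (unsigned long)task_stack_page(tsk);
	unsigned index = pgd_index(addr);

	if (pgd_none(next->pgd[index]))
		set_pgd(next->pgd + index, init_mm.pgd[index]);
}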

--Andy

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 23:03                                                                                           ` Andy Lutomirski
@ 2014-11-21 23:33                                                                                             ` Linus Torvalds
  0 siblings, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-21 23:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Steven Rostedt, Tejun Heo, linux-kernel,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 3:03 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Nov 21, 2014 at 2:55 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Anyway, here's an actual patch. As usual, it has seen absolutely no
>> actual testing,

.. ok, it boots and works fine as far as I can tell on x86-64 with no
paravirt anywhere.

> At the risk of going deeper down the rabbit hole, I grepped for
> pgd_list.  I found:

Ugh.

> __set_pmd_pte in pageattr.c.  It appears to be completely incorrect.
> Unless I've misunderstood, other than the very first line, it will
> either do nothing at all or crash when it falls off the end of the
> page tables that it's pointlessly trying to update.

I think you found a rats nest.

I can't make heads nor tails of the logic. The !SHARED_KERNEL_PMD test
doesn't seem very sensible, since that's also the conditional for
adding anything to the list in the first place.

So I agree that the code doesn't make much sense. Although maybe it's
there just because that way the loop goes away at compile-time under
most circumstances. So maybe even that part does make sense.

And the "walk down to the pmd level" part actually looks ok. Remember:
this is on x86-32 only, and you have two cases: non-PAE where the
pmd/pud offset thing does nothing at all, and it just ends up
converting a "pgd_t *" to a "pmd_t *".  And for PAE, the top pud level
always exists, and the pmd is folded, so despite what looks like
walking two levels, it really just walks the one level - the
force-allocated PGD entries.

So it won't "fall off the end of the page tables" like you imply. It
will just walk to the pmd level. And there it will populate all the
page tables with the same pmd.

So I think it works.
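
(Illustration only -- this is not the actual __set_pmd_pte() code, just a
minimal sketch of the walk described above.  On non-PAE the pud/pmd
levels are folded, so the offset helpers merely re-type the top-level
slot; on PAE the pmd pages in question are force-allocated, so the one
real step can't hit a missing table:)

static void sync_one_top_level_entry(pgd_t *pgd_base, unsigned long address,
				     pmd_t entry)
{
	pgd_t *pgd = pgd_base + pgd_index(address);
	pud_t *pud = pud_offset(pgd, address);	/* folded: no real walk on non-PAE */
	pmd_t *pmd = pmd_offset(pud, address);	/* non-PAE: re-types the pgd slot; PAE: preallocated pmd */

	set_pmd(pmd, entry);
}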

                   Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 21:44                                                               ` Frederic Weisbecker
@ 2014-11-22  0:11                                                                 ` Tejun Heo
  2014-11-22  0:18                                                                   ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Tejun Heo @ 2014-11-22  0:11 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Linus Torvalds, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Hello, Frederic.

On Fri, Nov 21, 2014 at 10:44:46PM +0100, Frederic Weisbecker wrote:
> I fear that enumerating and fixing the existing issues won't be enough.
> We can't find all the code sites out there which rely on not being
> faulted.

Oh, sure but that can take some time so adding documentation in the
mean time probably isn't a bad idea.

> The best would be to fix that from the percpu allocator itself, or
> vmalloc.

I don't think there's much percpu allocator itself can do.  The
ability to grow dynamically comes from being able to allocate
relatively consistent layout among areas for different CPUs and pretty
much requires vmalloc area and it'd generally be a good idea to take
out the vmalloc fault anyway.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-22  0:11                                                                 ` Tejun Heo
@ 2014-11-22  0:18                                                                   ` Linus Torvalds
  2014-11-22  0:41                                                                     ` Andy Lutomirski
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-22  0:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Frederic Weisbecker, Thomas Gleixner, Dave Jones, Don Zickus,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Fri, Nov 21, 2014 at 4:11 PM, Tejun Heo <tj@kernel.org> wrote:
>
> I don't think there's much percpu allocator itself can do.  The
> ability to grow dynamically comes from being able to allocate
> relatively consistent layout among areas for different CPUs and pretty
> much requires vmalloc area and it'd generally be a good idea to take
> out the vmalloc fault anyway.

Why do you guys worry so much about the vmalloc fault?

This started because of a very different issue: putting the actual
stack in vmalloc space. Then it can cause nasty triple faults etc.

But the normal vmalloc fault? Who cares, really? If that causes
problems, they are bugs. Fix them.

                    Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-22  0:18                                                                   ` Linus Torvalds
@ 2014-11-22  0:41                                                                     ` Andy Lutomirski
  0 siblings, 0 replies; 486+ messages in thread
From: Andy Lutomirski @ 2014-11-22  0:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Frederic Weisbecker, Thomas Gleixner, Dave Jones,
	Don Zickus, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra, Arnaldo Carvalho de Melo

On Fri, Nov 21, 2014 at 4:18 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Nov 21, 2014 at 4:11 PM, Tejun Heo <tj@kernel.org> wrote:
>>
>> I don't think there's much percpu allocator itself can do.  The
>> ability to grow dynamically comes from being able to allocate
>> relatively consistent layout among areas for different CPUs and pretty
>> much requires vmalloc area and it'd generally be a good idea to take
>> out the vmalloc fault anyway.
>
> Why do you guys worry so much about the vmalloc fault?
>
> This started because of a very different issue: putting the actual
> stack in vmalloc space. Then it can cause nasty triple faults etc.
>
> But the normal vmalloc fault? Who cares, really? If that causes
> problems, they are bugs. Fix them.

Because of this in system_call_after_swapgs:

    movq    %rsp,PER_CPU_VAR(old_rsp)
    movq    PER_CPU_VAR(kernel_stack),%rsp

It occurs to me that, if we really want to change that, we could have
an array of syscall trampolines, one per CPU, that have the CPU number
hardcoded.  But I really don't think that's worth it.

Other than that, with your fix, vmalloc faults are no big deal :)

--Andy

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 22:33                                                                                 ` Konrad Rzeszutek Wilk
@ 2014-11-22  1:17                                                                                   ` Thomas Gleixner
  0 siblings, 0 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-11-22  1:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Linus Torvalds, Andy Lutomirski, Steven Rostedt, Tejun Heo,
	linux-kernel, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers, xen-devel

On Fri, 21 Nov 2014, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 21, 2014 at 08:51:43PM +0100, Thomas Gleixner wrote:
 > > On Fri, 21 Nov 2014, Linus Torvalds wrote:
> > > Here's the simplified end result. Again, this is TOTALLY UNTESTED. I
> > > compiled it and verified that the code generation looks like what I'd
> > > have expected, but that's literally it.
> > > 
> > >   static noinline int vmalloc_fault(unsigned long address)
> > >   {
> > >         pgd_t *pgd_dst;
> > >         pgdval_t pgd_entry;
> > >         unsigned index = pgd_index(address);
> > > 
> > >         if (index < KERNEL_PGD_BOUNDARY)
> > >                 return -1;
> > > 
> > >         pgd_entry = init_mm.pgd[index].pgd;
> > >         if (!pgd_entry)
> > >                 return -1;
> > > 
> > >         pgd_dst = __va(PAGE_MASK & read_cr3());
> > >         pgd_dst += index;
> > > 
> > >         if (pgd_dst->pgd)
> > >                 return -1;
> > > 
> > >         ACCESS_ONCE(pgd_dst->pgd) = pgd_entry;
> > 
> > This will break paravirt. set_pgd/set_pmd are paravirt functions.
> > 
> > But I'm fine with breaking it, then you just need to change
> > CONFIG_PARAVIRT to 'def_bool n'
> 
> That is not very nice.

Maybe not nice, but sensible.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 20:23                                                                                           ` Josh Boyer
@ 2014-11-24 18:48                                                                                             ` Konrad Rzeszutek Wilk
  2014-11-24 19:07                                                                                               ` Josh Boyer
  2014-11-25  5:36                                                                                               ` Jürgen Groß
  0 siblings, 2 replies; 486+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-24 18:48 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Andy Lutomirski, Linus Torvalds, Steven Rostedt, Tejun Heo,
	linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Fri, Nov 21, 2014 at 03:23:13PM -0500, Josh Boyer wrote:
> On Fri, Nov 21, 2014 at 3:16 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Fri, Nov 21, 2014 at 12:14 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> >> On Fri, Nov 21, 2014 at 2:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >>> On Fri, Nov 21, 2014 at 11:46 AM, Linus Torvalds
> >>> <torvalds@linux-foundation.org> wrote:
> >>>> On Fri, Nov 21, 2014 at 11:34 AM, Linus Torvalds
> >>>> <torvalds@linux-foundation.org> wrote:
> >>>>>
> >>>>> So I kind of agree, but it wouldn't be my primary worry. My primary
> >>>>> worry is actually paravirt doing something insane.
> >>>>
> >>>> Btw, on that tangent, does anybody actually care about paravirt any more?
> >>>>
> >>>
> >>> Amazon, for better or for worse.

And distros: Oracle and Novell.

> >>>
> >>>> I'd love to start moving away from it. It makes a lot of the low-level
> >>>> code completely impossible to follow due to the random indirection
> >>>> through "native" vs "paravirt op table". Not just the page table
> >>>> handling, it's all over.
> >>>>
> >>>> Anybody who seriously does virtualization uses hw virtualization that
> >>>> is much better than it used to be. And the non-serious users aren't
> >>>> that performance-sensitive by definition.

I would point out that the PV paravirt spinlock gives a huge boost
for virtualized guests (this is for both KVM and Xen).
> >>>>
> >>>> I note that the Fedora kernel config seems to include paravirt by
> >>>> default, so you get a lot of the crazy overheads..

Not that much. We ran benchmarks and it was in i-cache overhead - and
the numbers came out to be sub-1%.
> >>>
> >>> I think that there is a move toward deprecating Xen PV in favor of
> >>> PVH, but we're not there yet.
> >>
> >> A move where?  The Xen stuff in Fedora is ... not paid attention to
> >> very much.  If there's something we should be looking at turning off
> >> (or on), we're happy to take suggestions.
> >
> > A move in the Xen project.  As I understand it, Xen wants to deprecate
> > PV in favor of PVH, but PVH is still experimental.
> 
> OK.
> 
> > I think that dropping PARAVIRT in Fedora might be a bad idea for
> > several more releases, since that's likely to break the EC2 images.
> 
> Yes, that's essentially the only reason we haven't looked at disabling
> Xen completely for a while now, so <sad trombone>.

Heh. Didn't know you could play on a trombone!

As I had mentioned in the past - if there are Xen related bugs on
Fedora please CC me on them. Or perhaps CC xen-devel@lists.xenproject.org
if that is possible.

And as Andy has mentioned - we are moving towards using PVH as a way
to not use the PV MMU ops. But that is still off (<sad trombone played
from YouTube>).

> 
> josh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-24 18:48                                                                                             ` Konrad Rzeszutek Wilk
@ 2014-11-24 19:07                                                                                               ` Josh Boyer
  2014-11-25  5:36                                                                                               ` Jürgen Groß
  1 sibling, 0 replies; 486+ messages in thread
From: Josh Boyer @ 2014-11-24 19:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andy Lutomirski, Linus Torvalds, Steven Rostedt, Tejun Heo,
	linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Mon, Nov 24, 2014 at 1:48 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Fri, Nov 21, 2014 at 03:23:13PM -0500, Josh Boyer wrote:
>> On Fri, Nov 21, 2014 at 3:16 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> > On Fri, Nov 21, 2014 at 12:14 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>> >> On Fri, Nov 21, 2014 at 2:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> >>> On Fri, Nov 21, 2014 at 11:46 AM, Linus Torvalds
>> >>> <torvalds@linux-foundation.org> wrote:
>> >>>> On Fri, Nov 21, 2014 at 11:34 AM, Linus Torvalds
>> >>>> <torvalds@linux-foundation.org> wrote:
>> >>>>>
>> >>>>> So I kind of agree, but it wouldn't be my primary worry. My primary
>> >>>>> worry is actually paravirt doing something insane.
>> >>>>
>> >>>> Btw, on that tangent, does anybody actually care about paravirt any more?
>> >>>>
>> >>>
>> >>> Amazon, for better or for worse.
>
> And distros: Oracle and Novell.
>
>> >>>
>> >>>> I'd love to start moving away from it. It makes a lot of the low-level
>> >>>> code completely impossible to follow due to the random indirection
>> >>>> through "native" vs "paravirt op table". Not just the page table
>> >>>> handling, it's all over.
>> >>>>
>> >>>> Anybody who seriously does virtualization uses hw virtualization that
>> >>>> is much better than it used to be. And the non-serious users aren't
>> >>>> that performance-sensitive by definition.
>
> I would point out that the PV paravirt spinlock gives a huge boost
> for virtualized guests (this is for both KVM and Xen).
>> >>>>
>> >>>> I note that the Fedora kernel config seems to include paravirt by
>> >>>> default, so you get a lot of the crazy overheads..
>
> Not that much. We ran benchmarks and it was in i-cache overhead - and
> the numbers came out to be sub-1%.
>> >>>
>> >>> I think that there is a move toward deprecating Xen PV in favor of
>> >>> PVH, but we're not there yet.
>> >>
>> >> A move where?  The Xen stuff in Fedora is ... not paid attention to
>> >> very much.  If there's something we should be looking at turning off
>> >> (or on), we're happy to take suggestions.
>> >
>> > A move in the Xen project.  As I understand it, Xen wants to deprecate
>> > PV in favor of PVH, but PVH is still experimental.
>>
>> OK.
>>
>> > I think that dropping PARAVIRT in Fedora might be a bad idea for
>> > several more releases, since that's likely to break the EC2 images.
>>
>> Yes, that's essentially the only reason we haven't looked at disabling
>> Xen completely for a while now, so <sad trombone>.
>
> Heh. Didn't know you could play on a trombone!

It's sad because I can't really play the trombone and it sounds horrible.

> As I had mentioned in the past - if there are Xen related bugs on
> Fedora please CC me on them. Or perhaps CC xen-devel@lists.xenproject.org
> if that is possible.

Indeed, you have been massively helpful.  My comment about it not being
well attended to was a reflection on the distro maintainers, not you.
You've been great once we notice a Xen issue, but noticing takes a while
on our part and it isn't the best of user experiences :\.

> And as Andy has mentioned - we are moving towards using PVH as a way
> to not use the PV MMU ops. But that is still off (<sad trombone played
> from YouTube>).

OK.  I'll try and do better at keeping up with things.

josh

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-24 18:48                                                                                             ` Konrad Rzeszutek Wilk
  2014-11-24 19:07                                                                                               ` Josh Boyer
@ 2014-11-25  5:36                                                                                               ` Jürgen Groß
  2014-11-25 17:22                                                                                                 ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Jürgen Groß @ 2014-11-25  5:36 UTC (permalink / raw)
  To: torvalds
  Cc: Konrad Rzeszutek Wilk, Josh Boyer, Andy Lutomirski,
	Linus Torvalds, Steven Rostedt, Tejun Heo, linux-kernel,
	Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker, Don Zickus,
	Dave Jones, the arch/x86 maintainers

On 11/24/2014 07:48 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 21, 2014 at 03:23:13PM -0500, Josh Boyer wrote:
>> On Fri, Nov 21, 2014 at 3:16 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Fri, Nov 21, 2014 at 12:14 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>>>> On Fri, Nov 21, 2014 at 2:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>> On Fri, Nov 21, 2014 at 11:46 AM, Linus Torvalds
>>>>> <torvalds@linux-foundation.org> wrote:
>>>>>> On Fri, Nov 21, 2014 at 11:34 AM, Linus Torvalds
>>>>>> <torvalds@linux-foundation.org> wrote:
>>>>>>>
>>>>>>> So I kind of agree, but it wouldn't be my primary worry. My primary
>>>>>>> worry is actually paravirt doing something insane.
>>>>>>
>>>>>> Btw, on that tangent, does anybody actually care about paravirt any more?
>>>>>>

Funny, while testing some Xen-related patches I hit the lockup issue.
It looked a little bit different, but a variation of your patch solved
my problem. The difference from the original report might be due to the
rather low system load during my test, so the system was still
responsive when the first lockup messages appeared. I could see that the
hanging cpus were spinning in pmd_lock() called from
__handle_mm_fault().

I could reproduce the issue within a few minutes reliably without the
patch below. With it the machine survived 12 hours and is still running.

Why my test would trigger the problem so fast I have no idea. That I
saw it only on a rather huge machine (128GB memory, 120 cpus) is quite
understandable, though. My test remapped some pages via the hypervisor
and removed those mappings again; perhaps the TLB flushing involved in
these operations is triggering the problem.


diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d973e61..b847ff7 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -377,7 +377,7 @@ static noinline int vmalloc_fault(unsigned long address)
          * happen within a race in page table update. In the later
          * case just flush:
          */
-       pgd = pgd_offset(current->active_mm, address);
+       pgd = (pgd_t *)__va(read_cr3()) + pgd_index(address);
         pgd_ref = pgd_offset_k(address);
         if (pgd_none(*pgd_ref))
                 return -1;
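
To make the difference explicit, here is a rough sketch of what the two
lookups expand to (not the actual kernel source, just the helpers spelled
out):

        /* unpatched: walk the page tables the kernel *thinks* are active */
        pgd = current->active_mm->pgd + pgd_index(address);

        /* patched: walk the page tables the CPU is *actually* using, by
         * reading CR3 and converting it back to a virtual address */
        pgd = (pgd_t *)__va(read_cr3()) + pgd_index(address);

The two only differ when active_mm->pgd and CR3 are out of sync, e.g. in a
window during a context switch or - as suspected here - when the Xen pgd
bookkeeping has been corrupted.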



Juergen

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-20 15:25                                   ` frequent lockups in 3.18rc4 Dave Jones
  2014-11-20 19:43                                     ` Linus Torvalds
@ 2014-11-25 12:22                                     ` Will Deacon
  2014-12-01 11:48                                       ` Will Deacon
  1 sibling, 1 reply; 486+ messages in thread
From: Will Deacon @ 2014-11-25 12:22 UTC (permalink / raw)
  To: Dave Jones, Andy Lutomirski, Linus Torvalds, Don Zickus,
	Thomas Gleixner, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra

Hi Dave,

On Thu, Nov 20, 2014 at 10:25:09AM -0500, Dave Jones wrote:
> On Wed, Nov 19, 2014 at 01:01:36PM -0800, Andy Lutomirski wrote:
>  
>  > TIF_NOHZ is not the same thing as NOHZ.  Can you try a kernel with
>  > CONFIG_CONTEXT_TRACKING=n?  Doing that may involve fiddling with RCU
>  > settings a bit.  The normal no HZ idle stuff has nothing to do with
>  > TIF_NOHZ, and you either have TIF_NOHZ set or you have some kind of
>  > thread_info corruption going on here.
> 
> Disabling CONTEXT_TRACKING didn't change the problem.
> Unfortunatly the full trace didn't make it over usb-serial this time. Grr.
> 
> Here's what came over serial..
> 
> NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [trinity-c35:11634]
> CPU: 2 PID: 11634 Comm: trinity-c35 Not tainted 3.18.0-rc5+ #94 [loadavg: 164.79 157.30 155.90 37/409 11893]
> task: ffff88014e0d96f0 ti: ffff880220eb4000 task.ti: ffff880220eb4000
> RIP: 0010:[<ffffffff88379605>]  [<ffffffff88379605>] copy_user_enhanced_fast_string+0x5/0x10
> RSP: 0018:ffff880220eb7ef0  EFLAGS: 00010283
> RAX: ffff880220eb4000 RBX: ffffffff887dac64 RCX: 0000000000006a18
> RDX: 000000000000e02f RSI: 00007f766f466620 RDI: ffff88016f6a7617
> RBP: ffff880220eb7f78 R08: 8000000000000063 R09: 0000000000000004
> R10: 0000000000000010 R11: 0000000000000000 R12: ffffffff880bf50d
> R13: 0000000000000001 R14: ffff880220eb4000 R15: 0000000000000001
> FS:  00007f766f459740(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f766f461000 CR3: 000000018b00e000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
>  ffffffff882f4225 ffff880183db5a00 0000000001743440 00007f766f0fb000
>  fffffffffffffeff 0000000000000000 0000000000008d79 00007f766f45f000
>  ffffffff8837adae 00ff880220eb7f38 000000003203f1ac 0000000000000001
> Call Trace:
>  [<ffffffff882f4225>] ? SyS_add_key+0xd5/0x240
>  [<ffffffff8837adae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff887da092>] system_call_fastpath+0x12/0x17
> Code: 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f 1f 00 89 d1 <f3> a4 31 c0 0f 1f 00 c3 90 90 90 0f 1f 00 83 fa 08 0f 82 95 00 
> sending NMI to other CPUs:
> 
> 
> Here's a crappy phonecam pic of the screen. 
> http://codemonkey.org.uk/junk/IMG_4311.jpg
> There's a bit of trace missing between the above and what was on
> the screen, so we missed some CPUs.

I'm not sure if this is useful, but I've been seeing trinity lockups
on arm64 as well. Sometimes they happen a few times a day, sometimes it
takes a few days (I just saw my first one on -rc6, for example).

However, I have a little bit more trace than you do and *every single time*
the lockup has involved an execve to a virtual file system.

E.g.:

[child1:10700] [212] execve(name="/sys/fs/ext4/features/batched_discard", argv=0x91796a0, envp=0x911a9c0)

(I've seen cases with /proc too)

The child doing the execve then doesn't return an error from the syscall,
and instead seems to disappear from the face of the planet, sometimes with
the tasklist_lock held for write, which causes a lockup shortly afterwards.

I'm running under KVM with two virtual CPUs. When the machine is wedged,
one CPU is sitting in idle and the other seems to be kicking around do_wait
and pid_vnr, but it's difficult to really see what's going on.

I tried increasing the likelihood of execve syscalls in trinity, but it
didn't seem to help with reproducing this issue.

Will

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-25  5:36                                                                                               ` Jürgen Groß
@ 2014-11-25 17:22                                                                                                 ` Linus Torvalds
  0 siblings, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-25 17:22 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Konrad Rzeszutek Wilk, Josh Boyer, Andy Lutomirski,
	Steven Rostedt, Tejun Heo, linux-kernel, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Mon, Nov 24, 2014 at 9:36 PM, Jürgen Groß <jgross@suse.com> wrote:
>
> Funny, during testing some patches related to Xen I hit the lockup
> issue. It looked a little bit different, but a variation of your patch
> solved my problem.
>
> I could reproduce the issue within a few minutes reliably without the
> patch below. With it the machine survived 12 hours and is still running.

Do you have a backtrace for the failure case? I have no problem
applying this part of the patch (I really don't understand why x86-64
hadn't gotten the proper code from 32-bit), but I'd like to see (and
document) where the fault happens for this.

Since you can apparently reproduce this fairly easily with a broken
kernel, getting a backtrace shouldn't be too hard?

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-16  6:33       ` Linus Torvalds
  2014-11-16 10:06         ` Markus Trippelsdorf
  2014-11-17 17:03         ` Dave Jones
@ 2014-11-26  0:25         ` Dave Jones
  2014-11-26  1:48           ` Linus Torvalds
  2 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-26  0:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, the arch/x86 maintainers

On Sat, Nov 15, 2014 at 10:33:19PM -0800, Linus Torvalds wrote:

 > I have no ideas left. I'd go for a bisection - rather than try random
 > things, at least bisection will get us a smaller set of suspects if
 > you can go through a few cycles of it. Even if you decide that you
 > want to run for most of a day before you are convinced it's all good,
 > a couple of days should get you a handful of bisection points (that's
 > assuming you hit a couple of bad ones too that turn bad in a shorter
 > while). And 4 or five bisections should get us from 11k commits down
 > to the ~600 commit range. That would be a huge improvement.

There are 8 bisection steps remaining. The log so far:

git bisect start
# good: [bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9] Linux 3.17
git bisect good bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9
# bad: [f114040e3ea6e07372334ade75d1ee0775c355e1] Linux 3.18-rc1
git bisect bad f114040e3ea6e07372334ade75d1ee0775c355e1
# bad: [f114040e3ea6e07372334ade75d1ee0775c355e1] Linux 3.18-rc1
git bisect bad f114040e3ea6e07372334ade75d1ee0775c355e1
# bad: [35a9ad8af0bb0fa3525e6d0d20e32551d226f38e] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad 35a9ad8af0bb0fa3525e6d0d20e32551d226f38e
# bad: [35a9ad8af0bb0fa3525e6d0d20e32551d226f38e] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad 35a9ad8af0bb0fa3525e6d0d20e32551d226f38e
# bad: [683a52a10148e929fb4844f9237f059a47c0b01b] Merge tag 'tty-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect bad 683a52a10148e929fb4844f9237f059a47c0b01b
# bad: [683a52a10148e929fb4844f9237f059a47c0b01b] Merge tag 'tty-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect bad 683a52a10148e929fb4844f9237f059a47c0b01b
# bad: [76272ab3f348d303eb31a5a061601ca8e0f9c5ce] staging: rtl8821ae: remove driver
git bisect bad 76272ab3f348d303eb31a5a061601ca8e0f9c5ce
# bad: [e988e1f3f975a9d6013c6356c5b9369540c091f9] staging: comedi: ni_at_a2150: range check board index
git bisect bad e988e1f3f975a9d6013c6356c5b9369540c091f9
# bad: [bd8107b2b2dc9fb1113bfe1a9cf2533ee19c57ee] Staging: bcm: Bcmchar.c: Renamed variable: "RxCntrlMsgBitMask" -> "rx_cntrl_msg_bit_mask"
git bisect bad bd8107b2b2dc9fb1113bfe1a9cf2533ee19c57ee
# bad: [91ed283ab563727932d6cf92b74dd15226635870] staging: rtl8188eu: Remove unused function rtw_IOL_append_WD_cmd()
git bisect bad 91ed283ab563727932d6cf92b74dd15226635870


The reason I'm checking in now is that I'm starting to see different
bugs at this point, so I don't know whether I can call this good or bad,
unless someone has a fix for what I'm seeing now.

Reminiscent of a bug a couple releases ago. Processes about to exit, but stuck
in the kernel continuously faulting..
http://codemonkey.org.uk/junk/weird-hang.txt
The one I'm thinking of got fixed way before 3.17 though.

Does that trace ring a bell of something else I could try on top of
each bisection point ?

I rebooted and restarted my test at the current bisection point,
hopefully it'll show up as 'bad' before the bug above happens again.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-26  0:25         ` Dave Jones
@ 2014-11-26  1:48           ` Linus Torvalds
  2014-11-26  2:40             ` Dave Jones
  2014-11-26  4:39             ` Jürgen Groß
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-26  1:48 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Tue, Nov 25, 2014 at 4:25 PM, Dave Jones <davej@redhat.com> wrote:
>
> The reason I'm checking in at this point, is that I'm starting to see different
> bugs at this point, so I don't know if I can call this good or bad, unless
> someone has a fix for what I'm seeing now.

Hmm. The last three "bad" bisects are all just 3.17-rc1 plus staging fixes.

> Reminiscent of a bug a couple releases ago. Processes about to exit, but stuck
> in the kernel continuously faulting..
> http://codemonkey.org.uk/junk/weird-hang.txt
> The one I'm thinking of got fixed way before 3.17 though.

Well, the staging tree was based on that 3.17-rc1 tree, so it may well
have the bug without the fix.

You have also marked 3.18-rc1 bad *twice*, along with the network
merge, and the tty merge. That's just odd. But it doesn't make the
bisect wrong, it just means that you fat-fingered things and marked the
same thing bad a couple of times.

Nothing to worry about, unless it's a sign of early Parkinsons...

> Does that trace ring a bell of something else I could try on top of
> each bisection point ?

Hmm.

Smells somewhat like the "pipe/page fault oddness" bug you reported.

That one caused endless page faults on fault_in_pages_writeable()
because of a page table entry that the VM thought was present, but the
CPU thought was missing.

That caused the whole "pte_protnone()" thing, and trying to get rid of
the PTE_NUMA bit, but those patches have *not* been merged. And you
were never able to reproduce it, so we left it as pending.

But if you actually really think that the bisect log you posted is
real and true and actually is the bug you're chasing, I have bad news
for you: do a "gitk --bisect", and you'll see that all the remaining
commits are just to staging drivers.
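
(The same information is available with plain git commands, nothing
kernel-specific:

    git bisect view     # the remaining suspects; runs gitk, or falls back
                        # to "git log" when DISPLAY is not set
    git bisect log      # replayable record of the good/bad marks so far)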

So that would either imply you have some staging driver (unlikely), or
more likely that 3.17 really already has the problem, it's just that
it needs some particular code alignment or phase of the moon or
something to trigger.

                 Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-26  1:48           ` Linus Torvalds
@ 2014-11-26  2:40             ` Dave Jones
  2014-11-26 22:57               ` Dave Jones
  2014-11-26  4:39             ` Jürgen Groß
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-26  2:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, the arch/x86 maintainers

On Tue, Nov 25, 2014 at 05:48:15PM -0800, Linus Torvalds wrote:

 > You have also marked 3.18-rc1 bad *twice*, along with the network
 > merge, and the tty merge. That's just odd. But it doesn't make the
 >  > bisect wrong, it just means that you fat-fingered things and marked the
 > same thing bad a couple of times.
 > 
 > Nothing to worry about, unless it's a sign of early Parkinsons...

That was intentional on my part, though I didn't realize the first one
was recorded. The first time, it printed the usual bisect text but then
complained my tree was dirty (which it was). I unapplied the stuff I had
and ran the bisect command a second time.

 > > Does that trace ring a bell of something else I could try on top of
 > > each bisection point ?
 > 
 > Hmm.
 > 
 > Smells somewhat like the "pipe/page fault oddness" bug you reported.
 > 
 > That one caused endless page faults on fault_in_pages_writeable()
 > because of a page table entry that the VM thought was present, but the
 > CPU thought was missing.
 > 
 > That caused the whole "pte_protnone()" thing, and trying to get rid of
 > the PTE_NUMA bit, but those patches have *not* been merged. And you
 >  > were never able to reproduce it, so we left it as pending.

ah, yeah, now it comes back to me.

 > But if you actually really think that the bisect log you posted is
 > real and true and actually is the bug you're chasing, I have bad news
 > for you: do a "gitk --bisect", and you'll see that all the remaining
 > commits are just to staging drivers.
 > 
 > So that would either imply you have some staging driver (unlikely), or
 > more likely that 3.17 really already has the problem, it's just that
 > it needs some particular code alignment or phase of the moon or
 > something to trigger.

Maybe I'll try 3.17 + perf fix for an even longer runtime.
Like over thanksgiving or something.

If some of the bisection points so far had been 'good', I would
go back and re-check, but every step of the way I've been able
to reproduce it.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-26  1:48           ` Linus Torvalds
  2014-11-26  2:40             ` Dave Jones
@ 2014-11-26  4:39             ` Jürgen Groß
       [not found]               ` <CA+55aFx1SiFBzmA=k9jHxi3cZE3Ei_+2NHepujgf86KEvkz8eQ@mail.gmail.com>
  1 sibling, 1 reply; 486+ messages in thread
From: Jürgen Groß @ 2014-11-26  4:39 UTC (permalink / raw)
  To: Linus Torvalds, Dave Jones, Linux Kernel, the arch/x86 maintainers

On 11/26/2014 02:48 AM, Linus Torvalds wrote:
> On Tue, Nov 25, 2014 at 4:25 PM, Dave Jones <davej@redhat.com> wrote:
>>
>> The reason I'm checking in at this point, is that I'm starting to see different
>> bugs at this point, so I don't know if I can call this good or bad, unless
>> someone has a fix for what I'm seeing now.
>
> Hmm. The last three "bad" bisects are all just 3.17-rc1 plus staging fixes.
>
>> Reminiscent of a bug a couple releases ago. Processes about to exit, but stuck
>> in the kernel continuously faulting..
>> http://codemonkey.org.uk/junk/weird-hang.txt
>> The one I'm thinking of got fixed way before 3.17 though.
>
> Well, the staging tree was based on that 3.17-rc1 tree, so it may well
> have the bug without the fix.
>
> You have also marked 3.18-rc1 bad *twice*, along with the network
> merge, and the tty merge. That's just odd. But it doesn't make the
> bisect wrong, it just means that you fat-fingered things and marked the
> same thing bad a couple of times.
>
> Nothing to worry about, unless it's a sign of early Parkinsons...
>
>> Does that trace ring a bell of something else I could try on top of
>> each bisection point ?
>
> Hmm.
>
> Smells somewhat like the "pipe/page fault oddness" bug you reported.
>
> That one caused endless page faults on fault_in_pages_writeable()
> because of a page table entry that the VM thought was present, but the
> CPU thought was missing.
>
> That caused the whole "pte_protnone()" thing, and trying to get rid of
> the PTE_NUMA bit, but those patches have *not* been merged. And you
> were never able to reproduce it, so we left it as pending.
>
> But if you actually really think that the bisect log you posted is
> real and true and actually is the bug you're chasing, I have bad news
> for you: do a "gitk --bisect", and you'll see that all the remaining
> commits are just to staging drivers.
>
> So that would either imply you have some staging driver (unlikely), or
> more likely that 3.17 really already has the problem, it's just that
> it needs some particular code alignment or phase of the moon or
> something to trigger.

I COULD trigger it with 3.17. Took much longer, but I've seen it once.
And from Xen hypervisor data it was clear it was the same bug (cpu
spinning in pmd_lock()).


Juergen


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
       [not found]               ` <CA+55aFx1SiFBzmA=k9jHxi3cZE3Ei_+2NHepujgf86KEvkz8eQ@mail.gmail.com>
@ 2014-11-26  5:11                 ` Dave Jones
  2014-11-26  5:24                 ` Juergen Gross
  1 sibling, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-26  5:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jürgen Groß, the arch/x86 maintainers, Kernel Mailing List

On Tue, Nov 25, 2014 at 09:09:45PM -0800, Linus Torvalds wrote:
 > On Nov 25, 2014 8:39 PM, "Jürgen Groß" <jgross@suse.com> wrote:
 > >
 > > I COULD trigger it with 3.17. Took much longer, but I've seen it once.
 > > And from Xen hypervisor data it was clear it was the same bug (cpu
 > > spinning in pmd_lock()).
 > 
 > I'm still hoping you can give a back trace. I'd like to know what access it
 > is that can trigger this, and preferably what the call chain to it was...
 > 
 > I do believe it happened in 3.17, I just want to understand the bug more -
 > not just apply the fix..
 > 
 > Most of Dave's lockup back traces did not have the whole page fault in
 > them, so while Dave has seen this too, there might be different symptoms...

Before giving 3.17 a multi-day workout, I'll try rc6 with Jürgen's patch
to see if that makes any difference at all for me.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
       [not found]               ` <CA+55aFx1SiFBzmA=k9jHxi3cZE3Ei_+2NHepujgf86KEvkz8eQ@mail.gmail.com>
  2014-11-26  5:11                 ` Dave Jones
@ 2014-11-26  5:24                 ` Juergen Gross
  2014-11-26  5:52                   ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Juergen Gross @ 2014-11-26  5:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: the arch/x86 maintainers, Kernel Mailing List, Dave Jones

On 11/26/2014 06:09 AM, Linus Torvalds wrote:
>
> On Nov 25, 2014 8:39 PM, "Jürgen Groß" <jgross@suse.com
> <mailto:jgross@suse.com>> wrote:
>  >
>  > I COULD trigger it with 3.17. Took much longer, but I've seen it once.
>  > And from Xen hypervisor data it was clear it was the same bug (cpu
>  > spinning in pmd_lock()).
>
> I'm still hoping you can give a back trace. I'd like to know what access
> it is that can trigger this, and preferably what the call chain to it was...

Working on it. Triggering it via sysrq(l) isn't working: machine hung
up. I'll try a dump, but this might take some time due to the machine
size...

If this isn't working I can always modify the hypervisor to show me
more of the kernel stack in that situation. This will be a pure dump,
but it should be possible to extract the back trace from that.

>
> I do believe it happened in 3.17, I just want to understand the bug more
> - not just apply the fix..

Sure.

>
> Most of Dave's lockup back traces did not have the whole page fault in
> them, so while Dave has seen this too, there might be different symptoms...

Stay tuned... :-)


Juergen


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-26  5:24                 ` Juergen Gross
@ 2014-11-26  5:52                   ` Linus Torvalds
  2014-11-26  6:21                     ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-26  5:52 UTC (permalink / raw)
  To: Juergen Gross; +Cc: the arch/x86 maintainers, Kernel Mailing List, Dave Jones

On Tue, Nov 25, 2014 at 9:24 PM, Juergen Gross <jgross@suse.com> wrote:
>
> Working on it. Triggering it via sysrq(l) isn't working: machine hung
> up. I'll try a dump, but this might take some time due to the machine
> size...

Actually, in that patch that did this:

-       pgd = pgd_offset(current->active_mm, address);
+       pgd = (pgd_t *)__va(read_cr3()) + pgd_index(address);

make the code do:

        pgd = (pgd_t *)__va(read_cr3()) + pgd_index(address);
        WARN_ON(pdg != pgd_offset(current->active_mm, address));

and now you should get a nice backtrace for exactly when it happens,
but it's on a working kernel, so nothing will lock up.

Hmm?

And leave it running for a while, and see if the trace is always the
same, or if there are variations on it...

Thanks,

                      Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-26  5:52                   ` Linus Torvalds
@ 2014-11-26  6:21                     ` Linus Torvalds
  2014-11-26  6:52                       ` Juergen Gross
                                         ` (2 more replies)
  0 siblings, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-26  6:21 UTC (permalink / raw)
  To: Juergen Gross; +Cc: the arch/x86 maintainers, Kernel Mailing List, Dave Jones

On Tue, Nov 25, 2014 at 9:52 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And leave it running for a while, and see if the trace is always the
> same, or if there are variations on it...

Amusing.

Lookie here:

   http://lists.xenproject.org/archives/html/xen-changelog/2005-08/msg00310.html

That's from 2005.

Anyway, I don't see why the cr3 issue matters, *unless* there is some
situation where the scheduler can run with interrupts enabled. And why
this is Xen-related, I have no idea.

The Xen patches seem to have lost that

 /* On Xen the line below does not always work. Needs investigating! */

line when backporting the 2.6.29 patches to Xen. And clearly nobody
investigated.

So please do get me back-traces, and we'll investigate. Better late
than never. But it does sound Xen-specific - although it's possible
that Xen just triggers some timing (and has apparently been able to
trigger it since 2005) that DaveJ now triggers on his one machine.

So DaveJ, even though this does appear Xen-centric (Xentric?) and
you're running on bare hardware, maybe you could do the same thing in
that x86-64 vmalloc_fault(). The timing with Jürgen is kind of
intriguing - if 3.18-rc made it happen much more often for him, maybe
it really is very timing-sensitive, and you actually are seeing a
non-Xen version of the same thing...

                           Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-26  6:21                     ` Linus Torvalds
@ 2014-11-26  6:52                       ` Juergen Gross
  2014-11-26  9:44                       ` Juergen Gross
  2014-11-26 14:34                       ` Dave Jones
  2 siblings, 0 replies; 486+ messages in thread
From: Juergen Gross @ 2014-11-26  6:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: the arch/x86 maintainers, Kernel Mailing List, Dave Jones

On 11/26/2014 07:21 AM, Linus Torvalds wrote:
> On Tue, Nov 25, 2014 at 9:52 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> And leave it running for a while, and see if the trace is always the
>> same, or if there are variations on it...
>
> Amusing.
>
> Lookie here:
>
>     http://lists.xenproject.org/archives/html/xen-changelog/2005-08/msg00310.html
>
> That's from 2005.

:-)

>
> Anyway, I don't see why the cr3 issue matters, *unless* there is some
> situation where the scheduler can run with interrupts enabled. And why
> this is Xen-related, I have no idea.
>
> The Xen patches seem to have lost that
>
>   /* On Xen the line below does not always work. Needs investigating! */
>
> line when backporting the 2.6.29 patches to Xen. And clearly nobody
> investigated.
>
> So please do get me back-traces, and we'll investigate. Better late
> than never. But it does sound Xen-specific - although it's possible
> that Xen just triggers some timing (and has apparently been able to
> trigger it since 2005) that DaveJ now triggers on his one machine.

Yeah, this sounds plausible.

I'm working on the back traces right now, hope to have them soon.


Juergen

>
> So DaveJ, even though this does appear Xen-centric (Xentric?) and
> you're running on bare hardware, maybe you could do the same thing in
> that x86-64 vmalloc_fault(). The timing with Jürgen is kind of
> intriguing - if 3.18-rc made it happen much more often for him, maybe
> it really is very timing-sensitive, and you actually are seeing a
> non-Xen version of the same thing...
>
>                             Linus
>


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-26  6:21                     ` Linus Torvalds
  2014-11-26  6:52                       ` Juergen Gross
@ 2014-11-26  9:44                       ` Juergen Gross
  2014-11-26 14:34                       ` Dave Jones
  2 siblings, 0 replies; 486+ messages in thread
From: Juergen Gross @ 2014-11-26  9:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: the arch/x86 maintainers, Kernel Mailing List, Dave Jones,
	Konrad Rzeszutek Wilk, David Vrabel, xen-devel

On 11/26/2014 07:21 AM, Linus Torvalds wrote:
> On Tue, Nov 25, 2014 at 9:52 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> And leave it running for a while, and see if the trace is always the
>> same, or if there are variations on it...
>
> Amusing.
>
> Lookie here:
>
>     http://lists.xenproject.org/archives/html/xen-changelog/2005-08/msg00310.html
>
> That's from 2005.
>
> Anyway, I don't see why the cr3 issue matters, *unless* there is some
> situation where the scheduler can run with interrupts enabled. And why
> this is Xen-related, I have no idea.
>
> The Xen patches seem to have lost that
>
>   /* On Xen the line below does not always work. Needs investigating! */
>
> line when backporting the 2.6.29 patches to Xen. And clearly nobody
> investigated.
>
> So please do get me back-traces, and we'll investigate. Better late
> than never. But it does sound Xen-specific - although it's possible
> that Xen just triggers some timing (and has apparently been able to
> trigger it since 2005) that DaveJ now triggers on his one machine.
>
> So DaveJ, even though this does appear Xen-centric (Xentric?) and
> you're running on bare hardware, maybe you could do the same thing in
> that x86-64 vmalloc_fault(). The timing with Jürgen is kind of
> intriguing - if 3.18-rc made it happen much more often for him, maybe
> it really is very timing-sensitive, and you actually are seeing a
> non-Xen version of the same thing...

Very interesting: yesterday, after I had gotten rid of the lockups, I
updated my test machine to the newest Xen version to avoid another
problem I was seeing. With this version I don't get the lockups any
more, even with the unmodified 3.18-rc kernel.

Digging deeper I found something that makes me believe I've hit a
different issue than Dave, one that just looked similar on the surface. :-(

My Xen problem was related to an error in freeing grant pages (pages
mapped in from another domain). One detail in the handling of such
mappings is interesting: the "private" member of the page structure
is used to hold the machine frame number of the mapped memory page.
Another usage of this "private" member is in the pgd handling of Xen
(see xen_pgd_alloc() and xen_get_user_pgd()) to hold the pgd of the
user address space (kernel and user are in separate address spaces on
Xen). So with an error in the grant page handling I could imagine a
pgd's private member being clobbered, leading to effects like the one
I've observed. And this could have been the problem in 2005, too.
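
Roughly, the relevant part of that lookup is (simplified sketch, not the
verbatim mmu.c source):

        /* the user pgd is stashed in the pgd page's page->private: */
        pgd_t *pgd_page = (pgd_t *)((unsigned long)pgd & PAGE_MASK);
        pgd_t *user_pgd = (pgd_t *)page_private(virt_to_page(pgd_page));

so anything that scribbles over page->private of a pgd page (like the
grant page error above) silently redirects the user pgd pointer.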

And why is my patch working? I think it's just because cr3 is always
written with a page-aligned value while the clobbered "private" member
of the Xen pgd is not page-aligned, resulting in a different pointer.
I'm still using the wrong page for the user's pgd, but this seems not
to lead to fatal errors when nearly nothing is running on the machine.
I've seen Xen messages occasionally indicating there was something
wrong with the page table handling of the kernel (pages used as page
tables not known to Xen as such).

I hope this all makes sense.

And just for the record: with the current Xen version (tweaked to
show the grant page error again) I see different lockups with the
following backtrace:

[ 1122.256305] NMI watchdog: BUG: soft lockup - CPU#94 stuck for 23s! 
[systemd-udevd:1179]
[ 1122.303427] Modules linked in: xen_blkfront msr bridge stp llc 
iscsi_ibft ipmi_devintf nls_utf8 x86_pkg_temp_thermal intel_powerclamp 
nls_cp437 coretemp crct10dif_pclmul vfat crc32_pclmul fat crc32c_intel 
ghash_clmulni_intel snd_pcm aesni_intel aes_x86_64 snd_timer lrw 
be2iscsi be2net gf128mul libiscsi snd glue_helper joydev vxlan soundcore 
scsi_transport_iscsi ablk_helper iTCO_wdt ixgbe igb mdio ip6_udp_tunnel 
iTCO_vendor_support efivars evdev iscsi_boot_sysfs udp_tunnel cryptd dca 
pcspkr sb_edac e1000e edac_core lpc_ich i2c_i801 ptp mfd_core pps_core 
shpchp tpm_infineon ipmi_si tpm_tis ipmi_msghandler tpm button xenfs 
xen_privcmd xen_acpi_processor processor thermal_sys xen_pciback 
xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn dm_mod 
efivarfs crc32c_generic btrfs xor raid6_pq hid_generic
[ 1122.303450]  usbhid hid sd_mod mgag200 ehci_pci i2c_algo_bit ehci_hcd 
drm_kms_helper ttm usbcore drm megaraid_sas usb_common sg scsi_mod autofs4
[ 1122.303456] CPU: 94 PID: 1179 Comm: systemd-udevd Tainted: G 
     L 3.18.0-rc5+ #304
[ 1122.303458] Hardware name: FUJITSU PRIMEQUEST 2800E/SB, BIOS 
PRIMEQUEST 2000 Series BIOS Version 01.59 07/24/2014
[ 1122.303459] task: ffff881f17b56ce0 ti: ffff881f0fff0000 task.ti: 
ffff881f0fff0000
[ 1122.303460] RIP: e030:[<ffffffff814fcf5e>]  [<ffffffff814fcf5e>] 
_raw_spin_lock+0x1e/0x30
[ 1122.303462] RSP: e02b:ffff881f0fff3ce8  EFLAGS: 00000282
[ 1122.303463] RAX: 000000000000ba43 RBX: 00003ffffffff000 RCX: 
0000000000000190
[ 1122.303464] RDX: 0000000000000190 RSI: 000000190ba43067 RDI: 
ffffea000157c350
[ 1122.303465] RBP: ffff880000000c70 R08: 0000000000000000 R09: 
0000000000000000
[ 1122.303466] R10: 000000000001b688 R11: ffff881fdf24ad80 R12: 
ffffea0000000000
[ 1122.303466] R13: ffff88006237cc70 R14: 0000000000000000 R15: 
00007f70f438e000
[ 1122.303470] FS:  00007f70f5c49880(0000) GS:ffff881f4c5c0000(0000) 
knlGS:ffff881f4c5c0000
[ 1122.303471] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1122.303472] CR2: 00007f70f5c68000 CR3: 0000001f111b7000 CR4: 
0000000000042660
[ 1122.303473] Stack:
[ 1122.303474]  ffffffff81155850 ffff881fdf24ad80 00007f70f438f000 
ffff881f138ae5d8
[ 1122.303476]  ffff881f08ead400 ffff881f0fff3fd8 0000000000000000 
ffff881eff0cbd08
[ 1122.303477]  ffff881f18b57d08 ffffea000157c320 ffffea006ccc5ec8 
ffff881f0fc00800
[ 1122.303479] Call Trace:
[ 1122.303481]  [<ffffffff81155850>] ? copy_page_range+0x460/0xa10
[ 1122.303484]  [<ffffffff8105d727>] ? copy_process.part.27+0x13e7/0x1b10
[ 1122.303486]  [<ffffffff81435f41>] ? netlink_insert+0x91/0xb0
[ 1122.303488]  [<ffffffff813f85c9>] ? release_sock+0x19/0x160
[ 1122.303490]  [<ffffffff8105dff8>] ? do_fork+0xc8/0x320
[ 1122.303492]  [<ffffffff814fd779>] ? stub_clone+0x69/0x90
[ 1122.303493]  [<ffffffff814fd42d>] ? system_call_fastpath+0x16/0x1b
[ 1122.303494] Code: 90 0f b7 17 66 39 d0 75 f6 eb e8 66 90 b8 00 00 01 
00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 89 d1 75 01 c3 0f b7 07 66 39 d0 
74 f7 <f3> 90 0f b7 07 66 39 c8 75 f6 c3 0f 1f 80 00 00 00 00 65 81 04

But if my assumptions above are correct this is meaningless, as using
an arbitrary memory page as pgd might result in anything...


Juergen

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-26  6:21                     ` Linus Torvalds
  2014-11-26  6:52                       ` Juergen Gross
  2014-11-26  9:44                       ` Juergen Gross
@ 2014-11-26 14:34                       ` Dave Jones
  2014-11-26 17:37                         ` Linus Torvalds
  2 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-11-26 14:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Juergen Gross, the arch/x86 maintainers, Kernel Mailing List

On Tue, Nov 25, 2014 at 10:21:46PM -0800, Linus Torvalds wrote:
 
 > So DaveJ, even though this does appear Xen-centric (Xentric?) and
 > you're running on bare hardware, maybe you could do the same thing in
 > that x86-64 vmalloc_fault(). The timing with Jürgen is kind of
 > intriguing - if 3.18-rc made it happen much more often for him, maybe
 > it really is very timing-sensitive, and you actually are seeing a
 > non-Xen version of the same thing...

I did try your WARN variant (after fixing the typo).
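
For reference, the fixed-up variant in vmalloc_fault() reads:

        pgd = (pgd_t *)__va(read_cr3()) + pgd_index(address);
        WARN_ON(pgd != pgd_offset(current->active_mm, address));

(the typo being "pdg" vs "pgd" in the WARN_ON line).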

Woke up to the below trace. Looks like a different issue.

Nnngh.

	Dave

NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [trinity-c149:24766]
CPU: 2 PID: 24766 Comm: trinity-c149 Not tainted 3.18.0-rc6+ #98 [loadavg: 156.09 150.24 148.56 21/402 26750]
task: ffff8802285b96f0 ti: ffff8802260e0000 task.ti: ffff8802260e0000
RIP: 0010:[<ffffffff8104658c>]  [<ffffffff8104658c>] kernel_map_pages+0xbc/0x120
RSP: 0018:ffff8802260e3768  EFLAGS: 00000202
RAX: 00000000001407e0 RBX: ffffffff817e0c24 RCX: 0000000000140760
RDX: 0000000000000202 RSI: ffff8800000006b0 RDI: 0000000000000001
RBP: ffff8802260e37c8 R08: 8000000000000063 R09: ffff880000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff880200000001
R13: 0000000000010000 R14: 0000000001b60000 R15: 0000000000000000
FS:  00007fb8ef71d740(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000018b87f0 CR3: 00000002277fe000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 00007f60b71b4000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff880123cd7000 ffff8802260e3768 0000000000000000 0000000000000003
 0000000000000000 0000000100000001 0000000000123cd6 0000000000000000
 0000000000000000 00000000ade00558 ffff8802445d7638 0000000000000001
Call Trace:
 [<ffffffff81185ebf>] get_page_from_freelist+0x49f/0xaa0
 [<ffffffff810a7431>] ? get_parent_ip+0x11/0x50
 [<ffffffff811866ee>] __alloc_pages_nodemask+0x22e/0xb60
 [<ffffffff810ad5c5>] ? local_clock+0x25/0x30
 [<ffffffff810c6e7c>] ? __lock_acquire.isra.31+0x22c/0x9f0
 [<ffffffff813775e0>] ? __radix_tree_preload+0x60/0xf0
 [<ffffffff810a7431>] ? get_parent_ip+0x11/0x50
 [<ffffffff810c546d>] ? lock_release_holdtime.part.24+0x9d/0x160
 [<ffffffff811d093e>] alloc_pages_vma+0xee/0x1b0
 [<ffffffff81194f0e>] ? shmem_alloc_page+0x6e/0xc0
 [<ffffffff810c6e7c>] ? __lock_acquire.isra.31+0x22c/0x9f0
 [<ffffffff81194f0e>] shmem_alloc_page+0x6e/0xc0
 [<ffffffff810a7431>] ? get_parent_ip+0x11/0x50
 [<ffffffff810a75ab>] ? preempt_count_sub+0x7b/0x100
 [<ffffffff8139ac66>] ? __percpu_counter_add+0x86/0xb0
 [<ffffffff811b2396>] ? __vm_enough_memory+0x66/0x1c0
 [<ffffffff8117cac5>] ? find_get_entry+0x5/0x120
 [<ffffffff81300937>] ? cap_vm_enough_memory+0x47/0x50
 [<ffffffff81197880>] shmem_getpage_gfp+0x4d0/0x7e0
 [<ffffffff81197bd2>] shmem_write_begin+0x42/0x70
 [<ffffffff8117c2d4>] generic_perform_write+0xd4/0x1f0
 [<ffffffff8117eac2>] __generic_file_write_iter+0x162/0x350
 [<ffffffff811f0070>] ? new_sync_read+0xd0/0xd0
 [<ffffffff8117ecef>] generic_file_write_iter+0x3f/0xb0
 [<ffffffff8117ecb0>] ? __generic_file_write_iter+0x350/0x350
 [<ffffffff811f01b8>] do_iter_readv_writev+0x78/0xc0
 [<ffffffff811f19e8>] do_readv_writev+0xd8/0x2a0
 [<ffffffff8117ecb0>] ? __generic_file_write_iter+0x350/0x350
 [<ffffffff8117ecb0>] ? __generic_file_write_iter+0x350/0x350
 [<ffffffff810c54b6>] ? lock_release_holdtime.part.24+0xe6/0x160
 [<ffffffff810a7431>] ? get_parent_ip+0x11/0x50
 [<ffffffff810a75ab>] ? preempt_count_sub+0x7b/0x100
 [<ffffffff817df36b>] ? _raw_spin_unlock_irq+0x3b/0x60
 [<ffffffff811f1c3c>] vfs_writev+0x3c/0x50
 [<ffffffff811f1dac>] SyS_writev+0x5c/0x100
 [<ffffffff817e0249>] tracesys_phase2+0xd4/0xd9
Code: 65 48 33 04 25 28 00 00 00 75 75 48 83 c4 50 5b 41 5c 5d c3 0f 1f 00 9c 5a fa 0f 20 e0 48 89 c1 80 e1 7f 0f 22 e1 0f 22 e0 52 9d <eb> cf 66 90 49 bc 00 00 00 00 00 88 ff ff 48 63 f6 49 01 fc 48 
sending NMI to other CPUs:


<nothing further on console, accidentally had panic=1 set>

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-26 14:34                       ` Dave Jones
@ 2014-11-26 17:37                         ` Linus Torvalds
  0 siblings, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-26 17:37 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Juergen Gross,
	the arch/x86 maintainers, Kernel Mailing List

On Wed, Nov 26, 2014 at 6:34 AM, Dave Jones <davej@redhat.com> wrote:
>
> Woke up to the below trace. Looks like a different issue.

Yeah, apparently the Xen issue was really just a Xen bug.

> NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [trinity-c149:24766]
> RIP: 0010:[<ffffffff8104658c>]  [<ffffffff8104658c>] kernel_map_pages+0xbc/0x120

Well, this one at least makes some amount of sense. The "Code:" line says it's

  1b: 9c                   pushfq
  1c: 5a                   pop    %rdx
  1d: fa                   cli
  1e: 0f 20 e0             mov    %cr4,%rax
  21: 48 89 c1             mov    %rax,%rcx
  24: 80 e1 7f             and    $0x7f,%cl
  27: 0f 22 e1             mov    %rcx,%cr4
  2a: 0f 22 e0             mov    %rax,%cr4
  2d: 52                   push   %rdx
  2e: 9d                   popfq
  2f:* eb cf                 jmp    back <-- trapping instruction

and %rdx is 0x0202 which is actually a valid flags value.

That looks like the code for __native_flush_tlb_global().
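
Which is basically (a sketch of what that instruction sequence does, not the
exact source):

        unsigned long flags, cr4;

        raw_local_irq_save(flags);              /* pushfq ; pop %rdx ; cli */
        cr4 = native_read_cr4();                /* mov %cr4,%rax           */
        native_write_cr4(cr4 & ~X86_CR4_PGE);   /* clear PGE: global flush */
        native_write_cr4(cr4);                  /* restore PGE             */
        raw_local_irq_restore(flags);           /* push %rdx ; popfq       */

i.e. a global TLB flush done by toggling CR4.PGE with interrupts disabled
around it.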

Not that interrupts should have been disabled for very long.

> Call Trace:
>  [<ffffffff81185ebf>] get_page_from_freelist+0x49f/0xaa0
>  [<ffffffff811866ee>] __alloc_pages_nodemask+0x22e/0xb60
>  [<ffffffff811d093e>] alloc_pages_vma+0xee/0x1b0
>  [<ffffffff81194f0e>] shmem_alloc_page+0x6e/0xc0
>  [<ffffffff81197880>] shmem_getpage_gfp+0x4d0/0x7e0
>  [<ffffffff81197bd2>] shmem_write_begin+0x42/0x70
>  [<ffffffff8117c2d4>] generic_perform_write+0xd4/0x1f0
>  [<ffffffff8117eac2>] __generic_file_write_iter+0x162/0x350
>  [<ffffffff8117ecef>] generic_file_write_iter+0x3f/0xb0
>  [<ffffffff811f01b8>] do_iter_readv_writev+0x78/0xc0
>  [<ffffffff811f19e8>] do_readv_writev+0xd8/0x2a0
>  [<ffffffff811f1c3c>] vfs_writev+0x3c/0x50
>  [<ffffffff811f1dac>] SyS_writev+0x5c/0x100
>  [<ffffffff817e0249>] tracesys_phase2+0xd4/0xd9

Hmm. Maybe some oom issue, where we spent a long time before this
trying to free pages?

                  Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-26  2:40             ` Dave Jones
@ 2014-11-26 22:57               ` Dave Jones
  2014-11-27  0:46                 ` Linus Torvalds
  2014-11-27 19:17                 ` Linus Torvalds
  0 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-26 22:57 UTC (permalink / raw)
  To: Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Tue, Nov 25, 2014 at 09:40:32PM -0500, Dave Jones wrote:
 > On Tue, Nov 25, 2014 at 05:48:15PM -0800, Linus Torvalds wrote:
 > 
 >  > So that would either imply you have some staging driver (unlikely), or
 >  > more likely that 3.17 really already has the problem, it's just that
 >  > it needs some particular code alignment or phase of the moon or
 >  > something to trigger.
 > 
 > Maybe I'll try 3.17 + perf fix for an even longer runtime.
 > Like over thanksgiving or something.

Dammit, dammit, dammit.

I didn't even have to wait that long.

[19861.135201] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c132:26979]
[19861.135652] Modules linked in: snd_seq_dummy 8021q garp stp fuse tun hidp bnep rfcomm af_key llc2 scsi_transport_iscsi nfnetlink can_bcm nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can_raw can pppoe pppox ppp_ge
neric slhc irda crc_ccitt rds rose sctp libcrc32c x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi crc
t10dif_pclmul crc32c_intel snd_hda_intel snd_hda_controller snd_hda_codec ghash_clmulni_intel snd_hwdep pcspkr snd_seq snd_seq_device serio_raw usb_debug snd_pcm e1000e snd_timer microcode ptp snd pps_core shpchp soundcore nfsd auth_rpcgs
s oid_registry nfs_acl lockd sunrpc
[19861.138604] CPU: 1 PID: 26979 Comm: trinity-c132 Not tainted 3.17.0+ #2
[19861.139229] Hardware name: Intel Corporation Shark Bay Client platform/Flathead Creek Crb, BIOS HSWLPTU1.86C.0109.R03.1301282055 01/28/2013
[19861.139897] task: ffff8801ec6716f0 ti: ffff8801b5bf8000 task.ti: ffff8801b5bf8000
[19861.140564] RIP: 0010:[<ffffffff81369585>]  [<ffffffff81369585>] copy_user_enhanced_fast_string+0x5/0x10
[19861.141263] RSP: 0018:ffff8801b5bfbcf0  EFLAGS: 00010206
[19861.141974] RAX: ffff8801b5bfbe48 RBX: 0000000000000003 RCX: 0000000000000a1d
[19861.142688] RDX: 0000000000001000 RSI: 00007f6f89ef85e3 RDI: ffff8801750445e3
[19861.143416] RBP: ffff8801b5bfbd30 R08: 0000000000000000 R09: 0000000000000001
[19861.144164] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b5bfbc78
[19861.144909] R13: ffff8801d702ed70 R14: ffffffff810a3d2b R15: ffff8801b5bfbc60
[19861.145668] FS:  00007f6f89eeb740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
[19861.146440] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19861.147218] CR2: 00007f6f89ef3000 CR3: 00000001cddb5000 CR4: 00000000001407e0
[19861.148014] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19861.148828] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[19861.149640] Stack:
[19861.150443]  ffffffff8119a4f8 160000007c331000 0000000000001000 160000007c331000
[19861.151283]  0000000000001000 ffff8801b5bfbe58 0000000000000000 ffff8801d702f0a0
[19861.152133]  ffff8801b5bfbdc0 ffffffff81170474 ffff8801b5bfbd88 0000000000001000
[19861.152995] Call Trace:
[19861.153851]  [<ffffffff8119a4f8>] ? iov_iter_copy_from_user_atomic+0x78/0x1c0
[19861.154738]  [<ffffffff81170474>] generic_perform_write+0xf4/0x1e0
[19861.155636]  [<ffffffff811ff1da>] ? file_update_time+0xaa/0xf0
[19861.156536]  [<ffffffff81172ba2>] __generic_file_write_iter+0x162/0x350
[19861.157447]  [<ffffffff81172dcf>] generic_file_write_iter+0x3f/0xb0
[19861.158365]  [<ffffffff811e17ae>] new_sync_write+0x8e/0xd0
[19861.159287]  [<ffffffff811e202a>] vfs_write+0xba/0x1f0
[19861.160214]  [<ffffffff811e2e42>] SyS_pwrite64+0x92/0xc0
[19861.161152]  [<ffffffff817b62a4>] tracesys+0xdd/0xe2
[19861.162091] Code: 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f 1f 00 89 d1 <f3> a4 31 c0 0f 1f 00 c3 90 90 90 0f 1f 00 83 fa 08 0f 82 95 00 
[19861.164217] sending NMI to other CPUs:
[19861.165221] NMI backtrace for cpu 2
[19861.166099] CPU: 2 PID: 28083 Comm: trinity-c151 Not tainted 3.17.0+ #2
[19861.167084] Hardware name: Intel Corporation Shark Bay Client platform/Flathead Creek Crb, BIOS HSWLPTU1.86C.0109.R03.1301282055 01/28/2013
[19861.168113] task: ffff8800746116f0 ti: ffff8801c6894000 task.ti: ffff8801c6894000
[19861.169152] RIP: 0010:[<ffffffff810fb326>]  [<ffffffff810fb326>] smp_call_function_many+0x276/0x320
[19861.170223] RSP: 0000:ffff8801c6897b00  EFLAGS: 00000202
[19861.171295] RAX: 0000000000000001 RBX: ffff8802445d4c40 RCX: ffff8802443da408
[19861.172384] RDX: 0000000000000001 RSI: 0000000000000008 RDI: 0000000000000000
[19861.173483] RBP: ffff8801c6897b40 R08: ffff880242469ce0 R09: 0000000100180011
[19861.174590] R10: ffff880243c04240 R11: 0000000000000000 R12: 0000000000000001
[19861.175703] R13: 0000000000000000 R14: 0000000000000008 R15: 0000000000000008
[19861.176822] FS:  00007f6f89eeb740(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
[19861.177956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19861.179103] CR2: 0000000002400000 CR3: 0000000231685000 CR4: 00000000001407e0
[19861.180264] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19861.181428] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[19861.182595] Stack:
[19861.183764]  ffff88024d64dd00 000000014d64dd00 00000000001d4c00 ffffffff82be41a0
[19861.184969]  0000000000000002 ffffffff8117a700 0000000000000000 0000000000000001
[19861.186167]  ffff8801c6897b78 ffffffff810fb542 0000000000000003 0000000000000008
[19861.187355] Call Trace:
[19861.188538]  [<ffffffff8117a700>] ? drain_pages+0xc0/0xc0
[19861.189709]  [<ffffffff810fb542>] on_each_cpu_mask+0x42/0xc0
[19861.190853]  [<ffffffff811768b1>] drain_all_pages+0x101/0x120
[19861.191989]  [<ffffffff8117af40>] __alloc_pages_nodemask+0x7d0/0xb20
[19861.193130]  [<ffffffff811c2b11>] alloc_pages_vma+0xf1/0x1b0
[19861.194258]  [<ffffffff811d705c>] ? do_huge_pmd_anonymous_page+0x10c/0x3e0
[19861.195367]  [<ffffffff811d705c>] do_huge_pmd_anonymous_page+0x10c/0x3e0
[19861.196450]  [<ffffffff811a10dc>] handle_mm_fault+0x14c/0xe90
[19861.197509]  [<ffffffff81041940>] ? __do_page_fault+0x140/0x600
[19861.198540]  [<ffffffff810419a4>] __do_page_fault+0x1a4/0x600
[19861.199550]  [<ffffffff810a3bcd>] ? get_parent_ip+0xd/0x50
[19861.200539]  [<ffffffff810a3d2b>] ? preempt_count_sub+0x6b/0xf0
[19861.201514]  [<ffffffff810c0b6e>] ? put_lock_stats.isra.23+0xe/0x30
[19861.202467]  [<ffffffff8136ad3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[19861.203407]  [<ffffffff81041e0c>] do_page_fault+0xc/0x10
[19861.204331]  [<ffffffff817b7d72>] page_fault+0x22/0x30
[19861.205249] Code: 00 41 89 c4 39 f0 0f 8d 25 fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 a0 b9 d1 81 f6 41 18 01 74 14 0f 1f 44 00 00 f3 90 f6 41 18 01 <75> f8 48 63 35 45 3b c2 00 83 f8 ff 48 8b 7b 08 74 b0 39 c6 77 
[19861.207272] NMI backtrace for cpu 0
[19861.207376] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 42.050 msecs
[19861.209220] CPU: 0 PID: 28128 Comm: trinity-c242 Not tainted 3.17.0+ #2
[19861.210200] Hardware name: Intel Corporation Shark Bay Client platform/Flathead Creek Crb, BIOS HSWLPTU1.86C.0109.R03.1301282055 01/28/2013
[19861.211210] task: ffff8802168716f0 ti: ffff88007467c000 task.ti: ffff88007467c000
[19861.212215] RIP: 0010:[<ffffffff810c26ea>]  [<ffffffff810c26ea>] __lock_acquire.isra.31+0xfa/0x9f0
[19861.213248] RSP: 0000:ffff880244003cb0  EFLAGS: 00000046
[19861.214281] RAX: 0000000000000046 RBX: ffff8802168716f0 RCX: 0000000000000000
[19861.215318] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88023fcbbc40
[19861.216346] RBP: ffff880244003d18 R08: 0000000000000001 R09: 0000000000000000
[19861.217378] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[19861.218399] R13: 0000000000000000 R14: ffff88023fcbbc40 R15: 0000000000000000
[19861.219410] FS:  00007f6f89eeb740(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
[19861.220425] CS:  0010 DS: 0000 ES: 0[19861.273167] NMI backtrace for cpu 3
[19861.273315] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 107.946 msecs
[19861.274821] CPU: 3 PID: 27913 Comm: trinity-c37 Not tainted 3.17.0+ #2
[19861.275672] Hardware name: Intel Corporation Shark Bay Client platform/Flathead Creek Crb, BIOS HSWLPTU1.86C.0109.R03.1301282055 01/28/2013
[19861.276543] task: ffff88009a735bc0 ti: ffff8801eda8c000 task.ti: ffff8801eda8c000
[19861.277422] RIP: 0010:[<ffffffff810fb322>]  [<ffffffff810fb322>] smp_call_function_many+0x272/0x320
[19861.278339] RSP: 0000:ffff8801eda8fb00  EFLAGS: 00000202
[19861.279220] RAX: 0000000000000001 RBX: ffff8802447d4c40 RCX: ffff8802443da428
[19861.280115] RDX: 0000000000000001 RSI: 0000000000000008 RDI: 0000000000000000
[19861.281021] RBP: ffff8801eda8fb40 R08: ffff880242469a40 R09: 0000000100180011
[19861.281917] R10: ffff880243c04240 R11: 0000000000000000 R12: 0000000000000001
[19861.282813] R13: 0000000000000000 R14: 0000000000000008 R15: 0000000000000008
[19861.283690] FS:  00007f6f89eeb740(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
[19861.284570] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19861.285466] CR2: 0000000002400000 CR3: 00000001cdd92000 CR4: 00000000001407e0
[19861.286363] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19861.287255] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[19861.288142] Stack:
[19861.289024]  ffff88024d64dd00 000000014d64dd00 00000000001d4c00 ffffffff82be41a0
[19861.289934]  0000000000000003 ffffffff8117a700 0000000000000000 0000000000000001
[19861.290845]  ffff8801eda8fb78 ffffffff810fb542 0000000000000003 0000000000000008
[19861.291763] Call Trace:
[19861.292664]  [<ffffffff8117a700>] ? drain_pages+0xc0/0xc0
[19861.293582]  [<ffffffff810fb542>] on_each_cpu_mask+0x42/0xc0
[19861.294501]  [<ffffffff811768b1>] drain_all_pages+0x101/0x120
[19861.295439]  [<ffffffff8117af40>] __alloc_pages_nodemask+0x7d0/0xb20
[19861.296369]  [<ffffffff811c2b11>] alloc_pages_vma+0xf1/0x1b0
[19861.297292]  [<ffffffff811d705c>] ? do_huge_pmd_anonymous_page+0x10c/0x3e0
[19861.298218]  [<ffffffff811d705c>] do_huge_pmd_anonymous_page+0x10c/0x3e0
[19861.299146]  [<ffffffff811a10dc>] handle_mm_fault+0x14c/0xe90
[19861.300078]  [<ffffffff81041940>] ? __do_page_fault+0x140/0x600
[19861.301011]  [<ffffffff810419a4>] __do_page_fault+0x1a4/0x600
[19861.301946]  [<ffffffff810a3bcd>] ? get_parent_ip+0xd/0x50
[19861.302874]  [<ffffffff810a3d2b>] ? preempt_count_sub+0x6b/0xf0
[19861.303805]  [<ffffffff810c0b6e>] ? put_lock_stats.isra.23+0xe/0x30
[19861.304736]  [<ffffffff8136ad3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[19861.305665]  [<ffffffff81041e0c>] do_page_fault+0xc/0x10
[19861.306590]  [<ffffffff817b7d72>] page_fault+0x22/0x30
[19861.307527] Code: 35 78 3b c2 00 41 89 c4 39 f0 0f 8d 25 fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 a0 b9 d1 81 f6 41 18 01 74 14 0f 1f 44 00 00 f3 90 <f6> 41 18 01 75 f8 48 63 35 45 3b c2 00 83 f8 ff 48 8b 7b 08 74 
[19861.309600] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 144.376 msecs


So 3.17 also has this problem.
Good news I guess in that it's not a regression, but damn I really didn't
want to have to go digging through the mists of time to find the last 'good' point.
At least it shouldn't hold up 3.18.

I'll do a couple builds to run over the holidays, but next week
I think I'm going to need to approach this differently to add
more debugging somewhere/somehow.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-26 22:57               ` Dave Jones
@ 2014-11-27  0:46                 ` Linus Torvalds
  2014-11-27 19:17                 ` Linus Torvalds
  1 sibling, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-11-27  0:46 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Wed, Nov 26, 2014 at 2:57 PM, Dave Jones <davej@redhat.com> wrote:
>
> So 3.17 also has this problem.
> Good news I guess in that it's not a regression, but damn I really didn't
> want to have to go digging through the mists of time to find the last 'good' point.
> At least it shouldn't hold up 3.18

Ugh. That still doesn't make me very happy.

I'll try to think about this more.

                  Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-26 22:57               ` Dave Jones
  2014-11-27  0:46                 ` Linus Torvalds
@ 2014-11-27 19:17                 ` Linus Torvalds
  2014-11-27 22:56                   ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-27 19:17 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel,
	the arch/x86 maintainers, Don Zickus

On Wed, Nov 26, 2014 at 2:57 PM, Dave Jones <davej@redhat.com> wrote:
>
> So 3.17 also has this problem.
> Good news I guess in that it's not a regression, but damn I really didn't
> want to have to go digging through the mists of time to find the last 'good' point.

So I'm looking at the watchdog code, and it seems racy wrt parking and startup.

In particular, it sets the high priority *after* starting the hrtimer,
and it goes back to SCHED_NORMAL *before* canceling the timer.

Which seems completely ass-backwards. And the smp_hotplug_thread stuff
explicitly enables preemption around the setup/cleanup/park/unpark
operations.
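
For reference, a minimal sketch of the ordering being described; the names
below are illustrative (the _sketch suffixes and the priority helper are made
up, not copied from kernel/watchdog.c), and the real enable/disable paths do
more than this:

static void watchdog_set_prio_sketch(int policy, int prio);	/* hypothetical helper */

static void watchdog_enable_sketch(struct hrtimer *t, u64 period_ns)
{
	/* timer armed first... */
	hrtimer_start(t, ns_to_ktime(period_ns), HRTIMER_MODE_REL_PINNED);
	/* ...window: timer is already firing while we are still SCHED_NORMAL */
	watchdog_set_prio_sketch(SCHED_FIFO, MAX_RT_PRIO - 1);
}

static void watchdog_disable_sketch(struct hrtimer *t)
{
	/* RT priority dropped first... */
	watchdog_set_prio_sketch(SCHED_NORMAL, 0);
	/* ...window: no longer RT, but the timer is still armed */
	hrtimer_cancel(t);
}

If the watchdog thread gets preempted in either marked window under heavy
load, the softlockup timestamp can go stale even though nothing is actually
locked up.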

However, that would be an issue only if trinity might be doing things
that enable and disable the watchdog. And doing so under insane loads.
Even then it seems unlikely.

The insane loads you have. But even then, could a load average of 169
possibly delay running a non-RT process for 22 seconds? Doubtful.

But just in case: do you do cpu hotplug events (that will disable and
re-enable the watchdog process?).  Anything else that will park/unpark
the hotplug thread?

Quite frankly, I'm just grasping for straws here, but a lot of the
watchdog traces really have seemed spurious...

                   Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-27 19:17                 ` Linus Torvalds
@ 2014-11-27 22:56                   ` Dave Jones
  2014-11-29 20:38                     ` Dâniel Fraga
  2014-12-01 16:56                     ` Don Zickus
  0 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-11-27 22:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, the arch/x86 maintainers, Don Zickus

On Thu, Nov 27, 2014 at 11:17:16AM -0800, Linus Torvalds wrote:
 > On Wed, Nov 26, 2014 at 2:57 PM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > So 3.17 also has this problem.
 > > Good news I guess in that it's not a regression, but damn I really didn't
 > > want to have to go digging through the mists of time to find the last 'good' point.
 > 
 > So I'm looking at the watchdog code, and it seems racy wrt parking and startup.
 > 
 > In particular, it sets the high priority *after* starting the hrtimer,
 > and it goes back to SCHED_NORMAL *before* canceling the timer.
 > 
 > Which seems completely ass-backwards. And the smp_hotplug_thread stuff
 > explicitly enables preemption around the setup/cleanup/park/unpark
 > operations.
 > 
 > However, that would be an issue only if trinity might be doing things
 > that enable and disable the watchdog. And doing so under insane loads.
 > Even then it seems unlikely.
 > 
 > The insane loads you have. But even then, could a load average of 169
 > possibly delay running a non-RT process for 22 seconds? Doubtful.
 > 
 > But just in case: do you do cpu hotplug events (that will disable and
 > re-enable the watchdog process?).  Anything else that will park/unpark
 > the hotplug thread?

That's root-only iirc, and I'm not running trinity as root, so that
shouldn't be happening. There's also no sign of such behaviour in dmesg
when the problem occurs.

 > Quite frankly, I'm just grasping for straws here, but a lot of the
 > watchdog traces really have seemed spurious...

Agreed.

Currently leaving 3.16 running. 21hrs so far.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-27 22:56                   ` Dave Jones
@ 2014-11-29 20:38                     ` Dâniel Fraga
  2014-11-30 20:45                       ` Linus Torvalds
  2014-12-01 16:56                     ` Don Zickus
  1 sibling, 1 reply; 486+ messages in thread
From: Dâniel Fraga @ 2014-11-29 20:38 UTC (permalink / raw)
  To: linux-kernel

On Thu, 27 Nov 2014 17:56:37 -0500
Dave Jones <davej@redhat.com> wrote:

> Agreed.
> 
> Currently leaving 3.16 running. 21hrs so far.

	Dave, I think I reported this bug in this bug report:

https://bugzilla.kernel.org/show_bug.cgi?id=85941

	Just posting in case the call trace helps...

	In my case it happens when I watch a video on Youtube or play a
audio file...

-- 
Linux 3.16.0-00115-g19583ca: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL



^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-29 20:38                     ` Dâniel Fraga
@ 2014-11-30 20:45                       ` Linus Torvalds
  2014-11-30 21:21                         ` Dâniel Fraga
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-11-30 20:45 UTC (permalink / raw)
  To: Dâniel Fraga; +Cc: Linux Kernel Mailing List

On Sat, Nov 29, 2014 at 12:38 PM, Dâniel Fraga <fragabr@gmail.com> wrote:
>
>         Dave, I think I reported this bug in this bug report:

Yours looks very different. Dave (and Sasha Levin) have reported
rcu_preempt stalls too, but it's not clear it's the same issue.

In case yours is repeatable (you seem to say it is), can you try it
without TREE_PREEMPT_RCU?

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-30 20:45                       ` Linus Torvalds
@ 2014-11-30 21:21                         ` Dâniel Fraga
  2014-12-01  0:21                           ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Dâniel Fraga @ 2014-11-30 21:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Sun, 30 Nov 2014 12:45:31 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Yours looks very different. Dave (and Sasha Levin) have reported
> rcy_preempt stalls too, but it's not clear it's the same issue.
> 
> In case yours is repeatable (you seem to say it is), can you try it
> without TREE_PREEMPT_RCU?

	Yes, but "menuconfig" doesn't allow me to disable it (it's
always checked). Newbie question: does TREE_PREEMPT_RCU depend on any
other option? Thanks.

-- 
Linux 3.16.0-00115-g19583ca: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-30 21:21                         ` Dâniel Fraga
@ 2014-12-01  0:21                           ` Linus Torvalds
  2014-12-01  1:02                             ` Dâniel Fraga
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-01  0:21 UTC (permalink / raw)
  To: Dâniel Fraga; +Cc: Linux Kernel Mailing List

On Sun, Nov 30, 2014 at 1:21 PM, Dâniel Fraga <fragabr@gmail.com> wrote:
>
>         Yes, but "menuconfig" doesn't allow me to disable it (it's
> always checked). Newbie question: does TREE_PREEMPT_RCU depend on any
> other option? Thanks.

Maybe you'll have to turn off RCU_CPU_STALL_VERBOSE first.

Although I think you should be able to just edit the .config file,
delete the line that says

    CONFIG_TREE_PREEMPT_RCU=y

and then just do a "make oldconfig", and then verify that
TREE_PREEMPT_RCU hasn't been re-enabled by some dependency. But it
shouldn't have, and that "make oldconfig" should get rid of anything
that depends on TREE_PREEMPT_RCU.

                  Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01  0:21                           ` Linus Torvalds
@ 2014-12-01  1:02                             ` Dâniel Fraga
  2014-12-01 19:14                               ` Paul E. McKenney
  0 siblings, 1 reply; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-01  1:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Sun, 30 Nov 2014 16:21:19 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Maybe you'll have to turn off RCU_CPU_STALL_VERBOSE first.
> 
> Although I think you should be able to just edit the .config file,
> delete the line that says
> 
>     CONFIG_TREE_PREEMPT_RCU=y
> 
> and then just do a "make oldconfig", and then verify that
> TREE_PREEMPT_RCU hasn't been re-enabled by some dependency. But it
> shouldn't have, and that "make oldconfig" should get rid of anything
> that depends on TREE_PREEMPT_RCU.
	
	Ok, I did exactly that, but CONFIG_TREE_PREEMPT_RCU is
re-enabled. I talked with Pranith Kumar and he suggested I could just
disable preemption (No Forced Preemption (Server)) and that's the only
way to disable CONFIG_TREE_PREEMPT_RCU.

	Now I'll try to make the system freeze, then I'll send
you the Call trace.

	Thanks.

-- 
Linux 3.17.0: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-25 12:22                                     ` Will Deacon
@ 2014-12-01 11:48                                       ` Will Deacon
  2014-12-01 17:05                                         ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Will Deacon @ 2014-12-01 11:48 UTC (permalink / raw)
  To: Dave Jones, Andy Lutomirski, Linus Torvalds, Don Zickus,
	Thomas Gleixner, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra

On Tue, Nov 25, 2014 at 12:22:17PM +0000, Will Deacon wrote:
> I'm not sure if this is useful, but I've been seeing trinity lockups
> on arm64 as well. Sometimes they happen a few times a day, sometimes it
> takes a few days (I just saw my first one on -rc6, for example).
> 
> However, I have a little bit more trace than you do and *every single time*
> the lockup has involved an execve to a virtual file system.

Ok, just hit another one of these and I have a little bit more info this
time. The trinity log is:

[child1:27912] [438] execve(name="/proc/602/task/602/oom_score", argv=0x3a8426c0, envp=0x3a7a3bb0) = 0    # wtf
[child0:27837] [1081] setfsuid(uid=-128) = 0x3fffb000
[child0:27837] [1082] shmdt(shmaddr=0x7f92269000) = -1 (Invalid argument)
[child0:27837] [1083] fchmod(fd=676, mode=5237) = -1 (Operation not permitted)
[child0:27837] [1084] setuid(uid=0xffffe000) = -1 (Operation not permitted)
[child0:27837] [1085] newfstatat(dfd=676, filename="/proc/612/fdinfo/390", statbuf=0x7f935d8000, flag=0x0) = -1 (Permission denied)
[child0:27837] [1086] process_vm_readv(pid=0, lvec=0x3a7a3d70, liovcnt=47, rvec=0x3a7a4070, riovcnt=74, flags=0x0) = -1 (No such process)
[child0:27837] [1087] clock_gettime(which_clock=0x890000000000003f, tp=0x0) = -1 (Invalid argument)
[child0:27837] [1088] accept(fd=676, upeer_sockaddr=0x3a842360, upeer_addrlen=16) = -1 (Socket operation on non-socket)
[child0:27837] [1089] getpid() = 0x6cbd
[child0:27837] [1090] getpeername(fd=496, usockaddr=0x3a842310, usockaddr_len=16) = -1 (Socket operation on non-socket)
[child0:27837] [1091] timer_getoverrun(timer_id=0x4ff8e1) = -1 (Invalid argument)
[child0:27837] [1092] sigaltstack(uss=0x7f93069000, uoss=0x0, regs=0x0) = -1 (Invalid argument)
[child0:27837] [1093] io_cancel(ctx_id=-3, iocb=0x0, result=0xffffffc000080000) = -1 (Bad address)
[child0:27837] [1094] mknodat(dfd=496, filename="/proc/irq/84/affinity_hint", mode=0xa2c03013110804a0, dev=0xfbac6adf1379fada) = -1 (File exists)
[child0:27837] [1095] clock_nanosleep(which_clock=0x2, flags=0x1, rqtp=0x0, rmtp=0xffffffc000080000) = -1 (Bad address)
[child0:27837] [1096] reboot(magic1=-52, magic2=0xffffff1edbdf7fff, cmd=0xffb5179bfafbfbff, arg=0x0) = -1 (Operation not permitted)
[child0:27837] [1097] sched_yield() = 0
[child0:27837] [1098] getpid() = 0x6cbd
[child0:27837] [1099] newuname(name=0x400008200405) = -1 (Bad address)
[child0:27837] [1100] vmsplice(fd=384, iov=0x3a88fc20, nr_segs=687, flags=0x2) = -1 (Resource temporarily unavailable)
[child0:27837] [1101] timerfd_gettime(ufd=496, otmr=0x1) = -1 (Invalid argument)
[child0:27837] [1102] getcwd(buf=0x0, size=111) = -1 (Bad address)
[child0:27837] [1103] setdomainname(name=0x0, len=0) = -1 (Operation not permitted)
[child0:27837] [1104] sched_getparam(pid=0, param=0xbaedc7bf7ffaf2fe) = -1 (Bad address)
[child0:27837] [1105] readlinkat(dfd=496, pathname="/proc/4/task/4/net/netstat", buf=0x7f935d4000, bufsiz=0) = -1 (Invalid argument)
[child0:27837] [1106] shmctl(shmid=0xa1000000000000ff, cmd=0x7dad54836e49ff1d, buf=0x900000000000002c) = -1 (Invalid argument)
[child0:27837] [1107] getpgid(pid=0) = 0x6cbd
[child0:27837] [1108] flistxattr(fd=496, list=0xffffffffffffffdf, size=0xe7ff) = 0
[child0:27837] [1109] remap_file_pages(start=0x7f9324b000, size=0xfffffffffffaaead, prot=0, pgoff=0, flags=0x0) = -1 (Invalid argument)
[child0:27837] [1110] io_submit(ctx_id=0xffbf, nr=0xffbef, iocbpp=0x8) = -1 (Invalid argument)
[child0:27837] [1111] flistxattr(fd=384, list=0x0, size=0) = 0
[child0:27837] [1112] semtimedop(semid=0xffffffffefffffff, tsops=0x0, nsops=0xfffffffff71a7113, timeout=0xffffffa9) = -1 (Invalid argument)
[child0:27837] [1113] ioctl(fd=384, cmd=0x5100000080000000, arg=362) = -1 (Inappropriate ioctl for device)
[child0:27837] [1114] futex(uaddr=0x0, op=0xb, val=0x80000000000000de, utime=0x8, uaddr2=0x0, val3=0xffffffff00000fff) = -1 (Bad address)
[child0:27837] [1115] listxattr(pathname="/proc/219/net/softnet_stat", list=0x0, size=152) = 0
[child0:27837] [1116] getrusage(who=0xffffffffff080808, ru=0xffffffc000080000) = -1 (Invalid argument)
[child0:27837] [1117] clock_settime(which_clock=0xffffffff7fffffff, tp=0x0) = -1 (Invalid argument)
[child0:27837] [1118] mremap(addr=0x6680000000, old_len=0, new_len=8192, flags=0x2, new_addr=0x5080400000) = -1 (Invalid argument)
[child0:27837] [1119] waitid(which=0x80000702c966254, upid=0, infop=0x7f90069000, options=-166, ru=0x7f90069004) = -1 (Invalid argument)
[child0:27837] [1120] sigaltstack(uss=0x40000000bd5fff6f, uoss=0x8000000000000000, regs=0x0) = -1 (Bad address)
[child0:27837] [1121] timer_delete(timer_id=0x4300d68e28803329) = -1 (Invalid argument)
[child0:27837] [1122] preadv(fd=384, vec=0x3a88fc20, vlen=173, pos_l=0x82000000ff804000, pos_h=96) = -1 (Invalid argument)
[child0:27837] [1123] getdents64(fd=384, dirent=0x7f90a69000, count=0x2ab672e3) = -1 (Not a directory)
[child0:27837] [1124] mlock(addr=0x7f92e69000, len=0x1e0000) 

so for some bizarre reason, child1 (27912) managed to execve oom_score
from /proc. mlock then hangs waiting for a completion in flush_work,
although I'm not sure how the execve is responsible for that.
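
For reference, the shape of that path, paraphrased and heavily simplified from
the 3.17-era mm/swap.c (names with a _sketch suffix are made up; this is not
the exact code):

static DEFINE_PER_CPU(struct work_struct, lru_drain_work_sketch);

static void lru_drain_one_cpu_sketch(struct work_struct *w)
{
	/* drain this CPU's pagevecs back onto the LRU lists */
}

static void lru_add_drain_all_sketch(void)
{
	int cpu;

	/* queue a drain work item on every CPU... */
	for_each_online_cpu(cpu) {
		struct work_struct *work = &per_cpu(lru_drain_work_sketch, cpu);

		INIT_WORK(work, lru_drain_one_cpu_sketch);
		schedule_work_on(cpu, work);
	}

	/* ...then wait for each of them in turn */
	for_each_online_cpu(cpu)
		flush_work(&per_cpu(lru_drain_work_sketch, cpu));
}

so a single CPU whose drain work never gets to run leaves the mlock() caller
stuck in flush_work() indefinitely, which matches trinity-c0 in the trace
below.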

Looking at the task trace:


SysRq : Show State
  task                        PC stack   pid father

[...]

deferwq         S ffffffc0000855b0     0   599      2 0x00000000
Call trace:
[<ffffffc0000855b0>] __switch_to+0x74/0x8c
[<ffffffc000534214>] __schedule+0x214/0x680
[<ffffffc0005346a4>] schedule+0x24/0x74
[<ffffffc0000c5780>] rescuer_thread+0x200/0x29c
[<ffffffc0000ca404>] kthread+0xd8/0xf0
sh              S ffffffc0000855b0     0   602      1 0x00000000
Call trace:
[<ffffffc0000855b0>] __switch_to+0x74/0x8c
[<ffffffc000534214>] __schedule+0x214/0x680
[<ffffffc0005346a4>] schedule+0x24/0x74
[<ffffffc0000b1f94>] do_wait+0x1c4/0x1fc
[<ffffffc0000b306c>] SyS_wait4+0x74/0xf0
trinity         S ffffffc0000855b0     0   610    602 0x00000000
Call trace:
[<ffffffc0000855b0>] __switch_to+0x74/0x8c
[<ffffffc000534214>] __schedule+0x214/0x680
[<ffffffc0005346a4>] schedule+0x24/0x74
[<ffffffc0000b1f94>] do_wait+0x1c4/0x1fc
[<ffffffc0000b306c>] SyS_wait4+0x74/0xf0
trinity-watchdo R  running task        0   611    610 0x00000000
Call trace:
[<ffffffc0000855b0>] __switch_to+0x74/0x8c
[<ffffffc000534214>] __schedule+0x214/0x680
[<ffffffc0005346a4>] schedule+0x24/0x74
[<ffffffc0005373a0>] do_nanosleep+0xcc/0x134
[<ffffffc0000f9da4>] hrtimer_nanosleep+0x88/0x108
[<ffffffc0000f9eb0>] SyS_nanosleep+0x8c/0xa4
trinity-main    S ffffffc0000855b0     0   612    610 0x00000000
Call trace:
[<ffffffc0000855b0>] __switch_to+0x74/0x8c
[<ffffffc000534214>] __schedule+0x214/0x680
[<ffffffc0005346a4>] schedule+0x24/0x74
[<ffffffc0000b1f94>] do_wait+0x1c4/0x1fc
[<ffffffc0000b306c>] SyS_wait4+0x74/0xf0
trinity-c0      D ffffffc0000855b0     0 27837    612 0x00000000
Call trace:
[<ffffffc0000855b0>] __switch_to+0x74/0x8c
[<ffffffc000534214>] __schedule+0x214/0x680
[<ffffffc0005346a4>] schedule+0x24/0x74
[<ffffffc000537204>] schedule_timeout+0x134/0x18c
[<ffffffc000535364>] wait_for_common+0x9c/0x144
[<ffffffc00053541c>] wait_for_completion+0x10/0x1c
[<ffffffc0000c4cdc>] flush_work+0xbc/0x168
[<ffffffc00013f608>] lru_add_drain_all+0x12c/0x180
[<ffffffc00015cb78>] SyS_mlock+0x20/0x118
trinity-c1      R  running task        0 27912    612 0x00000000
Call trace:
[<ffffffc0000855b0>] __switch_to+0x74/0x8c
trinity-c1      R  running task        0 27921  27912 0x00000000
Call trace:


We can see the child that did the execve has somehow gained its own
child process (27921) that we're unable to backtrace. I can't see any
clone/fork syscalls in the log for 27912.

At this point, both of the CPUs are sitting in idle, so there's nothing
interesting in their register dumps.

Still confused.

Will

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-27 22:56                   ` Dave Jones
  2014-11-29 20:38                     ` Dâniel Fraga
@ 2014-12-01 16:56                     ` Don Zickus
  1 sibling, 0 replies; 486+ messages in thread
From: Don Zickus @ 2014-12-01 16:56 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel, the arch/x86 maintainers

On Thu, Nov 27, 2014 at 05:56:37PM -0500, Dave Jones wrote:
> On Thu, Nov 27, 2014 at 11:17:16AM -0800, Linus Torvalds wrote:
>  > On Wed, Nov 26, 2014 at 2:57 PM, Dave Jones <davej@redhat.com> wrote:
>  > >
>  > > So 3.17 also has this problem.
>  > > Good news I guess in that it's not a regression, but damn I really didn't
>  > > want to have to go digging through the mists of time to find the last 'good' point.
>  > 
>  > So I'm looking at the watchdog code, and it seems racy wrt parking and startup.
>  > 
>  > In particular, it sets the high priority *after* starting the hrtimer,
>  > and it goes back to SCHED_NORMAL *before* canceling the timer.
>  > 
>  > Which seems completely ass-backwards. And the smp_hotplug_thread stuff
>  > explicitly enables preemption around the setup/cleanup/park/unpark
>  > operations.
>  > 
>  > However, that would be an issue only if trinity might be doing things
>  > that enable and disable the watchdog. And doing so under insane loads.
>  > Even then it seems unlikely.
>  > 
>  > The insane loads you have. But even then, could a load average of 169
>  > possibly delay running a non-RT process for 22 seconds? Doubtful.
>  > 
>  > But just in case: do you do cpu hotplug events (that will disable and
>  > re-enable the watchdog process?).  Anything else that will park/unpark
>  > the hotplug thread?
> 
> That's root-only iirc, and I'm not running trinity as root, so that
> shouldn't be happening. There's also no sign of such behaviour in dmesg
> when the problem occurs.

Yeah, the watchdog code is very chatty during thread 'unparking'.  If
Dave's dmesg log isn't seeing any:

"enabled on all CPUs, permanently consumes one hw-PMU counter"

except on boot, then I believe the park/unpark race you see shouldn't
be occurring in this scenario.


> 
>  > Quite frankly, I'm just grasping for straws here, but a lot of the
>  > watchdog traces really have seemed spurious...
> 
> Agreed.

Well we can explore this route..

I added a patch below that just logs the watchdog timer function and
kernel thread for each cpu.  It's a little chatty but every 4 seconds you
will see something like this in the logs:

[ 2507.580184] 1: watchdog process kicked (reset)
[ 2507.581154] 0: watchdog process kicked (reset)
[ 2507.581172] 0: watchdog run
[ 2507.593469] 1: watchdog run
[ 2507.595106] 2: watchdog process kicked (reset)
[ 2507.595120] 2: watchdog run
[ 2507.608136] 3: watchdog process kicked (reset)
[ 2507.613204] 3: watchdog run

With the printk timestamps it would be interesting to see what the
watchdog was doing in its final moments and if the timestamps verify the
exceeded duration or if the watchdog screws up the calculation and falsely
reports a lockup.

Cheers,
Don


diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 70bf118..b1ea06c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -324,6 +324,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));
 
 	if (touch_ts == 0) {
+		printk("%d: watchdog process kicked (reset)\n", smp_processor_id());
 		if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
 			/*
 			 * If the time stamp was touched atomically
@@ -346,6 +347,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	 * this is a good indication some task is hogging the cpu
 	 */
 	duration = is_softlockup(touch_ts);
+	printk("%d: watchdog process kicked (%d seconds since last)\n", smp_processor_id(), duration);
 	if (unlikely(duration)) {
 		/*
 		 * If a virtual machine is stopped by the host it can look to
@@ -477,6 +479,7 @@ static void watchdog(unsigned int cpu)
 	__this_cpu_write(soft_lockup_hrtimer_cnt,
 			 __this_cpu_read(hrtimer_interrupts));
 	__touch_watchdog();
+	printk("%d: watchdog run\n", cpu);
 }
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 11:48                                       ` Will Deacon
@ 2014-12-01 17:05                                         ` Linus Torvalds
  2014-12-01 17:10                                           ` Will Deacon
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-01 17:05 UTC (permalink / raw)
  To: Will Deacon
  Cc: Dave Jones, Andy Lutomirski, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra

On Mon, Dec 1, 2014 at 3:48 AM, Will Deacon <will.deacon@arm.com> wrote:
>
> so for some bizarre reason, child1 (27912) managed to execve oom_score
> from /proc.

That sounds like you have a binfmt that accepts crap. Possibly
ARM-specific, although more likely it's just a misc script.

> We can see the child that did the execve has somehow gained its own
> child process (27921) that we're unable to backtrace. I can't see any
> clone/fork syscalls in the log for 27912.

Well, it wouldn't be trinity any more, it would likely be some execve
script (think "/bin/sh", except likely through binfmt_misc).

Do you have anything in /proc/sys/fs/binfmt_misc? I don't see anything
else that would trigger it.

This doesn't really look anything like DaveJ's issue, but who knows..

                       Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 17:05                                         ` Linus Torvalds
@ 2014-12-01 17:10                                           ` Will Deacon
  2014-12-01 17:53                                             ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Will Deacon @ 2014-12-01 17:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Andy Lutomirski, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra

On Mon, Dec 01, 2014 at 05:05:06PM +0000, Linus Torvalds wrote:
> On Mon, Dec 1, 2014 at 3:48 AM, Will Deacon <will.deacon@arm.com> wrote:
> > so for some bizarre reason, child1 (27912) managed to execve oom_score
> > from /proc.
> 
> That sounds like you have a binfmt that accepts crap. Possibly
> ARM-specific, although more likely it's just a misc script.
> 
> > We can see the child that did the execve has somehow gained its own
> > child process (27921) that we're unable to backtrace. I can't see any
> > clone/fork syscalls in the log for 27912.
> 
> Well, it wouldn't be trinity any more, it would likely be some execve
> script (think "/bin/sh", except likely through binfmt_misc).
> 
> Do you have anything in /proc/sys/fs/binfmt_misc? I don't see anything
> else that would trigger it.

So I don't even have binfmt-misc compiled in. The two handlers I have are
BINFMT_ELF and BINFMT_SCRIPT, but they both check for headers that we won't
get back from oom_score afaict.

> This doesn't really look anything like DaveJ's issue, but who knows..

It's the only lockup I'm seeing on arm64 with trinity, but I agree that it's
not very helpful.

Will

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 17:10                                           ` Will Deacon
@ 2014-12-01 17:53                                             ` Linus Torvalds
  2014-12-01 18:25                                               ` Kirill A. Shutemov
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-01 17:53 UTC (permalink / raw)
  To: Will Deacon, Tejun Heo
  Cc: Dave Jones, Andy Lutomirski, Don Zickus, Thomas Gleixner,
	Linux Kernel, the arch/x86 maintainers, Peter Zijlstra

On Mon, Dec 1, 2014 at 9:10 AM, Will Deacon <will.deacon@arm.com> wrote:
>
> So I don't even have binfmt-misc compiled in. The two handlers I have are
> BINFMT_ELF and BINFMT_SCRIPT, but they both check for headers that we won't
> get back from oom_score afaict.

Hmm. So I can't even get that "oom_score" file to be executable in the
first place, which should mean that execve() should terminate very
quickly with an EACCES error.

The fact that you have a "flush_work()" that is waiting for completion
is interesting. Maybe the odd new thread is a worker thread for some
modprobe or similar, and we end up waiting for it.  There's that whole

   request_module("binfmt-%04x", *(ushort *)(bprm->buf + 2))

which ends up creating a new work. Maybe the flush_work() is waiting
for that whole mess. Adding Tejun to the cc, since there *were*
changes to workqueues etc since 3.16..

Tejun, full thread on lkml, I'm assuming you can find it in your mail archives..

                      Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 17:53                                             ` Linus Torvalds
@ 2014-12-01 18:25                                               ` Kirill A. Shutemov
  2014-12-01 18:36                                                 ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Kirill A. Shutemov @ 2014-12-01 18:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Will Deacon, Tejun Heo, Dave Jones, Andy Lutomirski, Don Zickus,
	Thomas Gleixner, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra

On Mon, Dec 01, 2014 at 09:53:16AM -0800, Linus Torvalds wrote:
> On Mon, Dec 1, 2014 at 9:10 AM, Will Deacon <will.deacon@arm.com> wrote:
> >
> > So I don't even have binfmt-misc compiled in. The two handlers I have are
> > BINFMT_ELF and BINFMT_SCRIPT, but they both check for headers that we won't
> > get back from oom_score afaict.
> 
> Hmm. So I can't even get that "oom_score" file to be executable in the
> first place, which should mean that execve() should terminate very
> quickly with an EACCES error.

No idea about oom_score, but the kernel happily accepts chmod on any file
under /proc/PID/net/. It caused issues before[1].

Why do we allow this?

I've asked before, but no answer so far.


[1] https://lkml.org/lkml/2014/8/2/103

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 18:25                                               ` Kirill A. Shutemov
@ 2014-12-01 18:36                                                 ` Linus Torvalds
  2014-12-04 10:51                                                   ` Will Deacon
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-01 18:36 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Will Deacon, Tejun Heo, Dave Jones, Andy Lutomirski, Don Zickus,
	Thomas Gleixner, Linux Kernel, the arch/x86 maintainers,
	Peter Zijlstra

On Mon, Dec 1, 2014 at 10:25 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
>
> No idea about oom_score, but kernel happily accepts chmod on any file
> under /proc/PID/net/.

/proc used to accept that fairly widely, but no, we tightened things
down, and core /proc files end up not accepting chmod. See
'proc_setattr()':

        if (attr->ia_valid & ATTR_MODE)
                return -EPERM;

although particular /proc files could choose to not use 'proc_setattr'
if they want to.

The '/proc/pid/net' subtree is obviously not doing that. No idea why,
and probably for no good reason.
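
As an illustration only (a sketch, not the actual fs/proc code): an entry
whose inode_operations route setattr through proc_setattr() will bounce chmod
with -EPERM as quoted above, while a subtree that installs its own setattr,
or one that doesn't filter ATTR_MODE, keeps accepting it:

static const struct inode_operations proc_example_iops_sketch = {
	.setattr	= proc_setattr,		/* rejects ATTR_MODE with -EPERM */
};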

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01  1:02                             ` Dâniel Fraga
@ 2014-12-01 19:14                               ` Paul E. McKenney
  2014-12-01 20:28                                 ` Dâniel Fraga
  2014-12-02  8:40                                 ` Lai Jiangshan
  0 siblings, 2 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-01 19:14 UTC (permalink / raw)
  To: Dâniel Fraga; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Sun, Nov 30, 2014 at 11:02:43PM -0200, Dâniel Fraga wrote:
> On Sun, 30 Nov 2014 16:21:19 -0800
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > Maybe you'll have to turn off RCU_CPU_STALL_VERBOSE first.
> > 
> > Although I think you should be able to just edit the .config file,
> > delete the line that says
> > 
> >     CONFIG_TREE_PREEMPT_RCU=y
> > 
> > and then just do a "make oldconfig", and then verify that
> > TREE_PREEMPT_RCU hasn't been re-enabled by some dependency. But it
> > shouldn't have, and that "make oldconfig" should get rid of anything
> > that depends on TREE_PREEMPT_RCU.
> 	
> 	Ok, I did exactly that, but CONFIG_TREE_PREEMPT_RCU is
> re-enabled. I talked with Pranith Kumar and he suggested I could just
> disable preemption (No Forced Preemption (Server)) and that's the only
> way to disable CONFIG_TREE_PREEMPT_RCU.

If it would help to have !CONFIG_TREE_PREEMPT_RCU with CONFIG_PREEMPT=y,
please let me know and I will create a patch that forces this.
(Not mainline material, but if it helps with debug...)

							Thanx, Paul

> 	Now I'll try to make the system freeze, then I'll send
> you the Call trace.
> 
> 	Thanks.
> 
> -- 
> Linux 3.17.0: Shuffling Zombie Juror
> http://www.youtube.com/DanielFragaBR
> http://exchangewar.info
> Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 19:14                               ` Paul E. McKenney
@ 2014-12-01 20:28                                 ` Dâniel Fraga
  2014-12-01 20:36                                   ` Linus Torvalds
  2014-12-01 23:08                                   ` Paul E. McKenney
  2014-12-02  8:40                                 ` Lai Jiangshan
  1 sibling, 2 replies; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-01 20:28 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Mon, 1 Dec 2014 11:14:31 -0800
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> If it would help to have !CONFIG_TREE_PREEMPT_RCU with CONFIG_PREEMPT=y,
> please let me know and I will create a patch that forces this.
> (Not mainline material, but if it helps with debug...)

	Hi Paul. Please, I'd like the patch, because without
preemption, I'm unable to trigger this bug.

	Thanks.

-- 
Linux 3.17.0: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 20:28                                 ` Dâniel Fraga
@ 2014-12-01 20:36                                   ` Linus Torvalds
  2014-12-01 23:08                                     ` Chris Mason
                                                       ` (2 more replies)
  2014-12-01 23:08                                   ` Paul E. McKenney
  1 sibling, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-01 20:36 UTC (permalink / raw)
  To: Dâniel Fraga, Dave Jones, Sasha Levin
  Cc: Paul E. McKenney, Linux Kernel Mailing List

On Mon, Dec 1, 2014 at 12:28 PM, Dâniel Fraga <fragabr@gmail.com> wrote:
>
>         Hi Paul. Please, I'd like the patch, because without
> preemption, I'm unable to trigger this bug.

Ok, that's already interesting information. And yes, it would probably
be interesting to see if CONFIG_PREEMPT=y but !CONFIG_TREE_PREEMPT_RCU
then solves it too, to narrow it down to one but not the other..

DaveJ - what about your situation? The standard Fedora kernels use
CONFIG_PREEMPT_VOLUNTARY, do you have CONFIG_PREEMPT and
CONFIG_TREE_PREEMPT_RCU enabled? I think you and Sasha both saw some
RCU oddities too, no?

                    Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 20:28                                 ` Dâniel Fraga
  2014-12-01 20:36                                   ` Linus Torvalds
@ 2014-12-01 23:08                                   ` Paul E. McKenney
  2014-12-02 16:43                                     ` Dâniel Fraga
  1 sibling, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-01 23:08 UTC (permalink / raw)
  To: Dâniel Fraga; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Mon, Dec 01, 2014 at 06:28:31PM -0200, Dâniel Fraga wrote:
> On Mon, 1 Dec 2014 11:14:31 -0800
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > If it would help to have !CONFIG_TREE_PREEMPT_RCU with CONFIG_PREEMPT=y,
> > please let me know and I will create a patch that forces this.
> > (Not mainline material, but if it helps with debug...)
> 
> 	Hi Paul. Please, I'd like the patch, because without
> preemption, I'm unable to trigger this bug.

Well, this turned out to be way simpler than I expected.  Passes
light rcutorture testing.  Sometimes you get lucky...

							Thanx, Paul


diff --git a/init/Kconfig b/init/Kconfig
index 903505e66d1d..2cf71fcd514f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -469,7 +469,7 @@ choice
 
 config TREE_RCU
 	bool "Tree-based hierarchical RCU"
-	depends on !PREEMPT && SMP
+	depends on SMP
 	select IRQ_WORK
 	help
 	  This option selects the RCU implementation that is


^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 20:36                                   ` Linus Torvalds
@ 2014-12-01 23:08                                     ` Chris Mason
  2014-12-01 23:25                                       ` Linus Torvalds
                                                         ` (2 more replies)
  2014-12-02 19:31                                     ` Dave Jones
  2014-12-02 20:30                                     ` Dave Jones
  2 siblings, 3 replies; 486+ messages in thread
From: Chris Mason @ 2014-12-01 23:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dâniel Fraga, Dave Jones, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

I'm not sure if this is related, but running trinity here, I noticed it
was stuck at 100% system time on every CPU.  perf report tells me we are
spending all of our time in spin_lock under the sync system call.

I think it's coming from contention in the bdi_queue_work() call from
inside sync_inodes_sb, which is spin_lock_bh(). 

I wonder if we're just spinning so hard on this one bh lock that we're
starving the watchdog?

Dave, do you have spinlock debugging on?  

-chris

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 23:08                                     ` Chris Mason
@ 2014-12-01 23:25                                       ` Linus Torvalds
  2014-12-01 23:44                                         ` Chris Mason
  2014-12-02 14:13                                       ` Mike Galbraith
  2014-12-02 19:32                                       ` Dave Jones
  2 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-01 23:25 UTC (permalink / raw)
  To: Chris Mason, Linus Torvalds, Dâniel Fraga, Dave Jones,
	Sasha Levin, Paul E. McKenney, Linux Kernel Mailing List

On Mon, Dec 1, 2014 at 3:08 PM, Chris Mason <clm@fb.com> wrote:
> I'm not sure if this is related, but running trinity here, I noticed it
> was stuck at 100% system time on every CPU.  perf report tells me we are
> spending all of our time in spin_lock under the sync system call.
>
> I think it's coming from contention in the bdi_queue_work() call from
> inside sync_inodes_sb, which is spin_lock_bh().

Please do a perf run with -g to get the call chain to make sure..

> I wonder if we're just spinning so hard on this one bh lock that we're
> starving the watchdog?

If it was that simple, we should see it in the actual soft-lockup stack trace.

That said, looking at the bdi_queue_work() function, I don't think you
should see any real contention there, although:

 - spin-lock debugging can make any bad situation about 10x worse by
making the spinlocks just that much more horrible from a performance
standpoint

 - the whole "complete(work->done)" thing seems to be pointlessly done
inside the spinlock, and that just seems horrible. Do you have a ton
of BDI's that might fail that BDI_registered thing?

 - even the "mod_delayed_work()" is dubious wrt the wb_lock. From what
I can tell, the spinlock is supposed to just protect the list.

So I think that bdi_queue_work() quite possibly is horribly broken
crap and *if* it really is contention on wb_lock, we could rewrite it
to not be so bad locking-wise.
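
As a rough sketch of what such a rework could look like, assuming
bdi_queue_work() of that era is essentially "check BDI_registered, add to
bdi->work_list, kick the flusher work" (field names recalled from memory,
illustrative only, not a tested patch):

static void bdi_queue_work_sketch(struct backing_dev_info *bdi,
				  struct wb_writeback_work *work)
{
	bool registered;

	/* only the list manipulation really needs wb_lock */
	spin_lock_bh(&bdi->wb_lock);
	registered = test_bit(BDI_registered, &bdi->state);
	if (registered)
		list_add_tail(&work->list, &bdi->work_list);
	spin_unlock_bh(&bdi->wb_lock);

	if (!registered) {
		/* bdi is gone: complete the waiter outside the lock */
		if (work->done)
			complete(work->done);
		return;
	}

	/* kicking the flusher doesn't need wb_lock either */
	mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
}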

That said, contention that happens with spinlock debugging enabled
really tends to fall under the heading of "that's your own fault".

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 23:25                                       ` Linus Torvalds
@ 2014-12-01 23:44                                         ` Chris Mason
  2014-12-02  0:39                                           ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Chris Mason @ 2014-12-01 23:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linus Torvalds, Dâniel Fraga, Dave Jones, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List



On Mon, Dec 1, 2014 at 6:25 PM, Linus Torvalds 
<torvalds@linux-foundation.org> wrote:
> On Mon, Dec 1, 2014 at 3:08 PM, Chris Mason <clm@fb.com> wrote:
>>  I'm not sure if this is related, but running trinity here, I 
>> noticed it
>>  was stuck at 100% system time on every CPU.  perf report tells me 
>> we are
>>  spending all of our time in spin_lock under the sync system call.
>> 
>>  I think it's coming from contention in the bdi_queue_work() call 
>> from
>>  inside sync_inodes_sb, which is spin_lock_bh().
> 
> Please do a perf run with -g to get the call chain to make sure..

The call chain goes something like this:

               --- _raw_spin_lock
                   |
                   |--99.72%-- sync_inodes_sb
                   |          sync_inodes_one_sb
                   |          iterate_supers
                   |          sys_sync
                   |          |
                   |          |--79.66%-- system_call_fastpath
                   |          |          syscall
                   |          |
                   |           --20.34%-- ia32_sysret
                   |                     __do_syscall
                    --0.28%-- [...]

(the 64bit call variation is similar)  Adding -v doesn't really help, 
because it isn't giving me the address inside sync_inodes_sb()

I first read this and guessed it must be leaving out the call to 
bdi_queue_work, hoping the spin_lock_bh and lock debugging were teaming 
up to stall the box.

But looking harder it's probably inside wait_sb_inodes:

        spin_lock(&inode_sb_list_lock);

Which is a little harder to blame.  Maaaaaybe with lock debugging, but 
its enough of a stretch that I wouldn't have emailed at all if I hadn't 
fixated on the bdi code.

-chris




^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 23:44                                         ` Chris Mason
@ 2014-12-02  0:39                                           ` Linus Torvalds
  0 siblings, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-02  0:39 UTC (permalink / raw)
  To: Chris Mason
  Cc: Dâniel Fraga, Dave Jones, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Mon, Dec 1, 2014 at 3:44 PM, Chris Mason <clm@fb.com> wrote:
>
> But looking harder it's probably inside wait_sb_inodes:
>
>        spin_lock(&inode_sb_list_lock);

Yeah, that's a known pain-point for sync(), although nobody has really
cared enough, since performance of parallel sync() calls is usually
not very high on anybody's list of things to care about except when it
occasionally shows up on some old Unix benchmark (maybe AIM, I
forget).

Anyway, lock debugging will make what is usually not noticeable into a
"whee, that's horrible", because the lock debugging overhead is often
many *many* times higher than the cost of the code inside the lock..

                  Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 19:14                               ` Paul E. McKenney
  2014-12-01 20:28                                 ` Dâniel Fraga
@ 2014-12-02  8:40                                 ` Lai Jiangshan
  2014-12-02 16:58                                   ` Paul E. McKenney
  2014-12-02 16:58                                   ` Dâniel Fraga
  1 sibling, 2 replies; 486+ messages in thread
From: Lai Jiangshan @ 2014-12-02  8:40 UTC (permalink / raw)
  To: paulmck; +Cc: Dâniel Fraga, Linus Torvalds, Linux Kernel Mailing List

On 12/02/2014 03:14 AM, Paul E. McKenney wrote:
> On Sun, Nov 30, 2014 at 11:02:43PM -0200, Dâniel Fraga wrote:
>> On Sun, 30 Nov 2014 16:21:19 -0800
>> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>>> Maybe you'll have to turn off RCU_CPU_STALL_VERBOSE first.
>>>
>>> Although I think you should be able to just edit the .config file,
>>> delete the line that says
>>>
>>>     CONFIG_TREE_PREEMPT_RCU=y
>>>
>>> and then just do a "make oldconfig", and then verify that
>>> TREE_PREEMPT_RCU hasn't been re-enabled by some dependency. But it
>>> shouldn't have, and that "make oldconfig" should get rid of anything
>>> that depends on TREE_PREEMPT_RCU.
>> 	
>> 	Ok, I did exactly that, but CONFIG_TREE_PREEMPT_RCU is
>> re-enabled. I talked with Pranith Kumar and he suggested I could just
>> disable preemption (No Forced Preemption (Server)) and that's the only
>> way to disable CONFIG_TREE_PREEMPT_RCU.
> 
> If it would help to have !CONFIG_TREE_PREEMPT_RCU with CONFIG_PREEMPT=y,

It is needed at least for testing.

CONFIG_TREE_PREEMPT_RCU=y with CONFIG_PREEMPT=n is needed for testing too.

Please enable them (or enable them under CONFIG_RCU_TRACE=y)

> please let me know and I will create a patch that forces this.
> (Not mainline material, but if it helps with debug...)
> 
> 							Thanx, Paul
> 
>> 	Now I'll try to make the system freeze, then I'll send
>> you the Call trace.
>>
>> 	Thanks.
>>
>> -- 
>> Linux 3.17.0: Shuffling Zombie Juror
>> http://www.youtube.com/DanielFragaBR
>> http://exchangewar.info
>> Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 23:08                                     ` Chris Mason
  2014-12-01 23:25                                       ` Linus Torvalds
@ 2014-12-02 14:13                                       ` Mike Galbraith
  2014-12-02 16:33                                         ` Linus Torvalds
  2014-12-02 19:32                                       ` Dave Jones
  2 siblings, 1 reply; 486+ messages in thread
From: Mike Galbraith @ 2014-12-02 14:13 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Dâniel Fraga, Dave Jones, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Mon, 2014-12-01 at 18:08 -0500, Chris Mason wrote:
> I'm not sure if this is related, but running trinity here, I noticed it
> was stuck at 100% system time on every CPU.  perf report tells me we are
> spending all of our time in spin_lock under the sync system call.
> 
> I think it's coming from contention in the bdi_queue_work() call from
> inside sync_inodes_sb, which is spin_lock_bh(). 
> 
> I wonder if we're just spinning so hard on this one bh lock that we're
> starving the watchdog?

The bean counting problem below can contribute.

https://lkml.org/lkml/2014/3/30/7

	-Mike


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 14:13                                       ` Mike Galbraith
@ 2014-12-02 16:33                                         ` Linus Torvalds
  2014-12-02 17:14                                           ` Chris Mason
                                                             ` (2 more replies)
  0 siblings, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-02 16:33 UTC (permalink / raw)
  To: Mike Galbraith, Ingo Molnar, Peter Zijlstra
  Cc: Chris Mason, Dâniel Fraga, Dave Jones, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Tue, Dec 2, 2014 at 6:13 AM, Mike Galbraith <umgwanakikbuti@gmail.com> wrote:
>
> The bean counting problem below can contribute.
>
> https://lkml.org/lkml/2014/3/30/7

Hmm. That never got applied. I didn't apply it originally because of
timing and wanting clarifications, but apparently it never made it
into the -tip tree either.

Ingo, PeterZ - comments?

Looking again at that patch (the commit message still doesn't strike
me as wonderfully explanatory :^) makes me worry, though.

Is that

        if (rq->skip_clock_update-- > 0)
                return;

really right? If skip_clock_update was zero (normal), it now gets set
to -1, which has its own specific meaning (see "force clock update"
comment in kernel/sched/rt.c). Is that intentional? That seems insane.

Or should it be

        if (rq->skip_clock_update > 0) {
                rq->skip_clock_update = 0;
                return;
        }

or what? Maybe there was a reason the patch never got applied even to -tip.
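
For context, the consumer of that flag is roughly the following (paraphrased
from memory of kernel/sched/core.c of that era, with the decrement variant in
place); a stray -1 left behind by the decrement would then read as the special
"force clock update" value rather than "nothing to skip":

void update_rq_clock_sketch(struct rq *rq)
{
	s64 delta;

	if (rq->skip_clock_update-- > 0)	/* the line being questioned */
		return;

	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
	rq->clock += delta;
	update_rq_clock_task(rq, delta);
}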

At the same time, the whole "incapacitated by the rt throttle long
enough for the hard lockup detector to trigger" commentary about that
skip_clock_update issue does make me go "Hmmm..". It would certainly
explain Dave's incomprehensible watchdog messages..

               Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 23:08                                   ` Paul E. McKenney
@ 2014-12-02 16:43                                     ` Dâniel Fraga
  2014-12-02 17:04                                       ` Paul E. McKenney
  2014-12-02 17:08                                       ` Linus Torvalds
  0 siblings, 2 replies; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-02 16:43 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Mon, 1 Dec 2014 15:08:13 -0800
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> Well, this turned out to be way simpler than I expected.  Passes
> light rcutorture testing.  Sometimes you get lucky...

	Linus, Paul and others, I finally got a call trace with
only CONFIG_TREE_PREEMPT_RCU *disabled* using Paul's patch (to trigger 
it I compiled PHP with make -j8).

Dec  2 14:24:39 tux kernel: [ 8475.941616] conftest[9730]: segfault at 0 ip 0000000000400640 sp 00007fffa67ab300 error 4 in conftest[400000+1000]
Dec  2 14:24:40 tux kernel: [ 8476.104725] conftest[9753]: segfault at 0 ip 00007f6863024906 sp 00007fff0e31cc48 error 4 in libc-2.19.so[7f6862efe000+1a1000]
Dec  2 14:25:54 tux kernel: [ 8550.791697] INFO: rcu_sched detected stalls on CPUs/tasks: { 4} (detected by 0, t=60002 jiffies, g=112854, c=112853, q=0)
Dec  2 14:25:54 tux kernel: [ 8550.791702] Task dump for CPU 4:
Dec  2 14:25:54 tux kernel: [ 8550.791703] cc1             R  running task        0 14344  14340 0x00080008
Dec  2 14:25:54 tux kernel: [ 8550.791706]  000000001bcebcd8 ffff880100000003 ffffffff810cb7f1 ffff88021f5f5c00
Dec  2 14:25:54 tux kernel: [ 8550.791708]  ffff88011bcebfd8 ffff88011bcebce8 ffffffff811fb970 ffff8802149a2a00
Dec  2 14:25:54 tux kernel: [ 8550.791710]  ffff8802149a2cc8 ffff88011bcebd28 ffffffff8103e979 ffff88020ed01398
Dec  2 14:25:54 tux kernel: [ 8550.791712] Call Trace:
Dec  2 14:25:54 tux kernel: [ 8550.791718]  [<ffffffff810cb7f1>] ? release_pages+0xa1/0x1e0
Dec  2 14:25:54 tux kernel: [ 8550.791722]  [<ffffffff811fb970>] ? cpumask_any_but+0x30/0x40
Dec  2 14:25:54 tux kernel: [ 8550.791725]  [<ffffffff8103e979>] ? flush_tlb_page+0x49/0xf0
Dec  2 14:25:54 tux kernel: [ 8550.791727]  [<ffffffff810cbe72>] ? lru_cache_add_active_or_unevictable+0x22/0x90
Dec  2 14:25:54 tux kernel: [ 8550.791731]  [<ffffffff810fc4c2>] ? alloc_pages_vma+0x72/0x130
Dec  2 14:25:54 tux kernel: [ 8550.791733]  [<ffffffff810cbe72>] ? lru_cache_add_active_or_unevictable+0x22/0x90
Dec  2 14:25:54 tux kernel: [ 8550.791735]  [<ffffffff810e5220>] ? handle_mm_fault+0x3a0/0xaf0
Dec  2 14:25:54 tux kernel: [ 8550.791737]  [<ffffffff81039074>] ? __do_page_fault+0x224/0x4c0
Dec  2 14:25:54 tux kernel: [ 8550.791740]  [<ffffffff8110d54c>] ? new_sync_write+0x7c/0xb0
Dec  2 14:25:55 tux kernel: [ 8550.791743]  [<ffffffff8114765c>] ? fsnotify+0x27c/0x350
Dec  2 14:25:55 tux kernel: [ 8550.791746]  [<ffffffff81087233>] ? rcu_eqs_enter+0x93/0xa0
Dec  2 14:25:55 tux kernel: [ 8550.791748]  [<ffffffff81087a5e>] ? rcu_user_enter+0xe/0x10
Dec  2 14:25:55 tux kernel: [ 8550.791749]  [<ffffffff8103938a>] ? do_page_fault+0x5a/0x70
Dec  2 14:25:55 tux kernel: [ 8550.791752]  [<ffffffff8139d9d2>] ? page_fault+0x22/0x30

	If you need more info/testing, just ask.

-- 
Linux 3.17.0-dirty: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02  8:40                                 ` Lai Jiangshan
@ 2014-12-02 16:58                                   ` Paul E. McKenney
  2014-12-02 16:58                                   ` Dâniel Fraga
  1 sibling, 0 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-02 16:58 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Dâniel Fraga, Linus Torvalds, Linux Kernel Mailing List

On Tue, Dec 02, 2014 at 04:40:37PM +0800, Lai Jiangshan wrote:
> On 12/02/2014 03:14 AM, Paul E. McKenney wrote:
> > On Sun, Nov 30, 2014 at 11:02:43PM -0200, Dâniel Fraga wrote:
> >> On Sun, 30 Nov 2014 16:21:19 -0800
> >> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >>
> >>> Maybe you'll have to turn off RCU_CPU_STALL_VERBOSE first.
> >>>
> >>> Although I think you should be able to just edit the .config file,
> >>> delete the line that says
> >>>
> >>>     CONFIG_TREE_PREEMPT_RCU=y
> >>>
> >>> and then just do a "make oldconfig", and then verify that
> >>> TREE_PREEMPT_RCU hasn't been re-enabled by some dependency. But it
> >>> shouldn't have, and that "make oldconfig" should get rid of anything
> >>> that depends on TREE_PREEMPT_RCU.
> >> 	
> >> 	Ok, I did exactly that, but CONFIG_TREE_PREEMPT_RCU is
> >> re-enabled. I talked with Pranith Kumar and he suggested I could just
> >> disable preemption (No Forced Preemption (Server)) and that's the only
> >> way to disable CONFIG_TREE_PREEMPT_RCU.
> > 
> > If it would help to have !CONFIG_TREE_PREEMPT_RCU with CONFIG_PREEMPT=y,
> 
> It is needed at lest for testing.
> 
> CONFIG_TREE_PREEMPT_RCU=y with CONFIG_PREEMPT=n is needed for testing too.
> 
> Please enable them (or enable them under CONFIG_RCU_TRACE=y)

It is a really easy edit to Kconfig, but I don't want people using it
in production because I really don't need the extra test scenarios.
So I am happy to provide the patch below as needed, but not willing to
submit it to mainline without a lot more justification.  Because if it
appears in mainline, people will start using it in production, whether
I am doing proper testing of it or not.  :-/

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/init/Kconfig b/init/Kconfig
index 903505e66d1d..2cf71fcd514f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -469,7 +469,7 @@ choice
 
 config TREE_RCU
 	bool "Tree-based hierarchical RCU"
-	depends on !PREEMPT && SMP
+	depends on SMP
 	select IRQ_WORK
 	help
 	  This option selects the RCU implementation that is


^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02  8:40                                 ` Lai Jiangshan
  2014-12-02 16:58                                   ` Paul E. McKenney
@ 2014-12-02 16:58                                   ` Dâniel Fraga
  2014-12-02 17:17                                     ` Paul E. McKenney
  2014-12-03  2:03                                     ` Lai Jiangshan
  1 sibling, 2 replies; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-02 16:58 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: paulmck, Linus Torvalds, Linux Kernel Mailing List

On Tue, 2 Dec 2014 16:40:37 +0800
Lai Jiangshan <laijs@cn.fujitsu.com> wrote:

> It is needed at lest for testing.
> 
> CONFIG_TREE_PREEMPT_RCU=y with CONFIG_PREEMPT=n is needed for testing too.
> 
> Please enable them (or enable them under CONFIG_RCU_TRACE=y)

	Lai, sorry but I didn't understand. Do you mean both of them
enabled? Because how can CONFIG_TREE_PREEMPT_RCU be enabled without
CONFIG_PREEMPT? Thanks.

	If you mean both enabled, I already reported a call trace with
both enabled:

https://bugzilla.kernel.org/show_bug.cgi?id=85941

	Please see my previous answer to Linus and Paul too.

	Regarding CONFIG_RCU_TRACE, do you mean
"CONFIG_TREE_RCU_TRACE"? I couldn't find CONFIG_RCU_TRACE.

	Thanks.

-- 
Linux 3.17.0-dirty: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 16:43                                     ` Dâniel Fraga
@ 2014-12-02 17:04                                       ` Paul E. McKenney
  2014-12-02 17:14                                         ` Dâniel Fraga
  2014-12-02 18:09                                         ` Paul E. McKenney
  2014-12-02 17:08                                       ` Linus Torvalds
  1 sibling, 2 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-02 17:04 UTC (permalink / raw)
  To: Dâniel Fraga; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Tue, Dec 02, 2014 at 02:43:17PM -0200, Dâniel Fraga wrote:
> On Mon, 1 Dec 2014 15:08:13 -0800
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Well, this turned out to be way simpler than I expected.  Passes
> > light rcutorture testing.  Sometimes you get lucky...
> 
> 	Linus, Paul and others, I finally got a call trace with
> only CONFIG_TREE_PREEMPT_RCU *disabled* using Paul's patch (to trigger 
> it I compiled PHP with make -j8).

Is it harder to reproduce with CONFIG_PREEMPT=y and CONFIG_TREE_PREEMPT_RCU=n?

If it is a -lot- harder to reproduce, it might be worth bisecting among
the RCU read-side critical sections.  If making a few of them be
non-preemptible greatly reduces the probability of the bug occurring,
that might provide a clue about root cause.
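
As a hypothetical illustration of one such bisection step (made-up structure
and function names): wrap a single suspect read-side critical section so that
it is also non-preemptible, leaving everything else untouched:

struct foo_data_sketch { int val; };
struct foo_sketch { struct foo_data_sketch __rcu *ptr; };

static int foo_read_sketch(struct foo_sketch *f)
{
	int val;

	preempt_disable();		/* added only for the bisection experiment */
	rcu_read_lock();
	val = rcu_dereference(f->ptr)->val;
	rcu_read_unlock();
	preempt_enable();

	return val;
}

If converting a particular section makes the stall much rarer, that section
is the interesting one.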

On the other hand, if it is just a little harder to reproduce, this
RCU read-side bisection would likely be an exercise in futility.

							Thanx, Paul

> Dec  2 14:24:39 tux kernel: [ 8475.941616] conftest[9730]: segfault at 0 ip 0000000000400640 sp 00007fffa67ab300 error 4 in conftest[400000+1000]
> Dec  2 14:24:40 tux kernel: [ 8476.104725] conftest[9753]: segfault at 0 ip 00007f6863024906 sp 00007fff0e31cc48 error 4 in libc-2.19.so[7f6862efe000+1a1000]
> Dec  2 14:25:54 tux kernel: [ 8550.791697] INFO: rcu_sched detected stalls on CPUs/tasks: { 4} (detected by 0, t=60002 jiffies, g=112854, c=112853, q=0)
> Dec  2 14:25:54 tux kernel: [ 8550.791702] Task dump for CPU 4:
> Dec  2 14:25:54 tux kernel: [ 8550.791703] cc1             R  running task        0 14344  14340 0x00080008
> Dec  2 14:25:54 tux kernel: [ 8550.791706]  000000001bcebcd8 ffff880100000003 ffffffff810cb7f1 ffff88021f5f5c00
> Dec  2 14:25:54 tux kernel: [ 8550.791708]  ffff88011bcebfd8 ffff88011bcebce8 ffffffff811fb970 ffff8802149a2a00
> Dec  2 14:25:54 tux kernel: [ 8550.791710]  ffff8802149a2cc8 ffff88011bcebd28 ffffffff8103e979 ffff88020ed01398
> Dec  2 14:25:54 tux kernel: [ 8550.791712] Call Trace:
> Dec  2 14:25:54 tux kernel: [ 8550.791718]  [<ffffffff810cb7f1>] ? release_pages+0xa1/0x1e0
> Dec  2 14:25:54 tux kernel: [ 8550.791722]  [<ffffffff811fb970>] ? cpumask_any_but+0x30/0x40
> Dec  2 14:25:54 tux kernel: [ 8550.791725]  [<ffffffff8103e979>] ? flush_tlb_page+0x49/0xf0
> Dec  2 14:25:54 tux kernel: [ 8550.791727]  [<ffffffff810cbe72>] ? lru_cache_add_active_or_unevictable+0x22/0x90
> Dec  2 14:25:54 tux kernel: [ 8550.791731]  [<ffffffff810fc4c2>] ? alloc_pages_vma+0x72/0x130
> Dec  2 14:25:54 tux kernel: [ 8550.791733]  [<ffffffff810cbe72>] ? lru_cache_add_active_or_unevictable+0x22/0x90
> Dec  2 14:25:54 tux kernel: [ 8550.791735]  [<ffffffff810e5220>] ? handle_mm_fault+0x3a0/0xaf0
> Dec  2 14:25:54 tux kernel: [ 8550.791737]  [<ffffffff81039074>] ? __do_page_fault+0x224/0x4c0
> Dec  2 14:25:54 tux kernel: [ 8550.791740]  [<ffffffff8110d54c>] ? new_sync_write+0x7c/0xb0
> Dec  2 14:25:55 tux kernel: [ 8550.791743]  [<ffffffff8114765c>] ? fsnotify+0x27c/0x350
> Dec  2 14:25:55 tux kernel: [ 8550.791746]  [<ffffffff81087233>] ? rcu_eqs_enter+0x93/0xa0
> Dec  2 14:25:55 tux kernel: [ 8550.791748]  [<ffffffff81087a5e>] ? rcu_user_enter+0xe/0x10
> Dec  2 14:25:55 tux kernel: [ 8550.791749]  [<ffffffff8103938a>] ? do_page_fault+0x5a/0x70
> Dec  2 14:25:55 tux kernel: [ 8550.791752]  [<ffffffff8139d9d2>] ? page_fault+0x22/0x30
> 
> 	If you need more info/testing, just ask.
> 
> -- 
> Linux 3.17.0-dirty: Shuffling Zombie Juror
> http://www.youtube.com/DanielFragaBR
> http://exchangewar.info
> Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 16:43                                     ` Dâniel Fraga
  2014-12-02 17:04                                       ` Paul E. McKenney
@ 2014-12-02 17:08                                       ` Linus Torvalds
  2014-12-02 17:16                                         ` Dâniel Fraga
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-02 17:08 UTC (permalink / raw)
  To: Dâniel Fraga; +Cc: Paul E. McKenney, Linux Kernel Mailing List

On Tue, Dec 2, 2014 at 8:43 AM, Dâniel Fraga <fragabr@gmail.com> wrote:
>
>         Linus, Paul and others, I finally got a call trace with
> only CONFIG_TREE_PREEMPT_RCU *disabled* using Paul's patch (to trigger
> it I compiled PHP with make -j8).

So just to verify:

Without CONFIG_PREEMPT, things work well for you?

But with CONFIG_PREEMPT, you are able to create the rcu_sched stalls
both with and without CONFIG_TREE_PREEMPT_RCU?

Correct?

> Dec  2 14:25:54 tux kernel: [ 8550.791697] INFO: rcu_sched detected stalls on CPUs/tasks: { 4} (detected by 0, t=60002 jiffies, g=112854, c=112853, q=0)

Paul?

                    Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 17:04                                       ` Paul E. McKenney
@ 2014-12-02 17:14                                         ` Dâniel Fraga
  2014-12-02 18:42                                           ` Paul E. McKenney
  2014-12-02 18:09                                         ` Paul E. McKenney
  1 sibling, 1 reply; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-02 17:14 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Tue, 2 Dec 2014 09:04:07 -0800
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> Is it harder to reproduce with CONFIG_PREEMPT=y and CONFIG_TREE_PREEMPT_RCU=n?

	Yes, it's much harder! :)

> If it is a -lot- harder to reproduce, it might be worth bisecting among
> the RCU read-side critical sections.  If making a few of them be
> non-preemptible greatly reduces the probability of the bug occurring,
> that might provide a clue about root cause.
> 
> On the other hand, if it is just a little harder to reproduce, this
> RCU read-side bisection would likely be an exercise in futility.

	Ok, I want to bisect it. Since it could be painful to bisect,
could you suggest 2 commits between 3.16.0 and 3.17.0 so we can narrow
the bisect? I could just bisect between 3.16.0 and 3.17.0 but it would
take many days :).

	Ps: if you prefer I bisect between 3.16.0 and 3.17.0, no
problem, but you'll have to be patient ;).

-- 
Linux 3.17.0-dirty: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 16:33                                         ` Linus Torvalds
@ 2014-12-02 17:14                                           ` Chris Mason
  2014-12-03 18:41                                             ` Dave Jones
  2014-12-02 17:47                                           ` Mike Galbraith
  2014-12-17 11:13                                           ` Peter Zijlstra
  2 siblings, 1 reply; 486+ messages in thread
From: Chris Mason @ 2014-12-02 17:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Dave Jones, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Tue, Dec 2, 2014 at 11:33 AM, Linus Torvalds 
<torvalds@linux-foundation.org> wrote:
> On Tue, Dec 2, 2014 at 6:13 AM, Mike Galbraith 
> <umgwanakikbuti@gmail.com> wrote:
> 
> At the same time, the whole "incapacitated by the rt throttle long
> enough for the hard lockup detector to trigger" commentary about that
> skip_clock_update issue does make me go "Hmmm..". It would certainly
> explain Dave's incomprehensible watchdog messages..

Dave's first email mentioned that he had panic on softlockup enabled, 
but even with that off, the box wasn't recovering.

In my trinity runs here, I've gotten softlockup warnings where the box 
eventually recovered.  I'm wondering if some of the "bad" commits in 
the bisection are really false positives where the box would have been 
able to recover if we'd killed off all the trinity procs and given it 
time to breathe.

-chris




^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 17:08                                       ` Linus Torvalds
@ 2014-12-02 17:16                                         ` Dâniel Fraga
  0 siblings, 0 replies; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-02 17:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul E. McKenney, Linux Kernel Mailing List

On Tue, 2 Dec 2014 09:08:53 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> So just to verify:
> 
> Without CONFIG_PREEMPT, things work well for you?

	Yes.

> But with CONFIG_PREEMPT, you are able to create the rcu_sched stalls
> both with and without CONFIG_TREE_PREEMPT_RCU?
> 
> Correct?

	Yes, correct. And without CONFIG_TREE_PREEMPT_RCU it's
much harder to trigger the bug.

-- 
Linux 3.17.0-dirty: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 16:58                                   ` Dâniel Fraga
@ 2014-12-02 17:17                                     ` Paul E. McKenney
  2014-12-03  2:03                                     ` Lai Jiangshan
  1 sibling, 0 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-02 17:17 UTC (permalink / raw)
  To: Dâniel Fraga
  Cc: Lai Jiangshan, Linus Torvalds, Linux Kernel Mailing List

On Tue, Dec 02, 2014 at 02:58:38PM -0200, Dâniel Fraga wrote:
> On Tue, 2 Dec 2014 16:40:37 +0800
> Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> 
> > It is needed at least for testing.
> > 
> > CONFIG_TREE_PREEMPT_RCU=y with CONFIG_PREEMPT=n is needed for testing too.
> > 
> > Please enable them (or enable them under CONFIG_RCU_TRACE=y)
> 
> 	Lai, sorry but I didn't understand. Do you mean both of them
> enabled? Because how can CONFIG_TREE_PREEMPT_RCU be enabled without
> CONFIG_PREEMPT ?

Hmmm...  I did misread that in my reply.  A similar Kconfig edit will
enable that, but I am even less happy about the thought of pushing that
to mainline!  ;-)

							Thanx, Paul

> 	If you mean both enabled, I already reported a call trace with
> both enabled:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=85941
> 
> 	Please see my previous answer to Linus and Paul too.
> 
> 	Regarding CONFIG_RCU_TRACE, do you mean
> "CONFIG_TREE_RCU_TRACE"? I couldn't find CONFIG_RCU_TRACE.
> 
> 	Thanks.
> 
> -- 
> Linux 3.17.0-dirty: Shuffling Zombie Juror
> http://www.youtube.com/DanielFragaBR
> http://exchangewar.info
> Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 16:33                                         ` Linus Torvalds
  2014-12-02 17:14                                           ` Chris Mason
@ 2014-12-02 17:47                                           ` Mike Galbraith
  2014-12-13  8:11                                             ` Ingo Molnar
  2014-12-17 11:13                                           ` Peter Zijlstra
  2 siblings, 1 reply; 486+ messages in thread
From: Mike Galbraith @ 2014-12-02 17:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Peter Zijlstra, Chris Mason, Dâniel Fraga,
	Dave Jones, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Tue, 2014-12-02 at 08:33 -0800, Linus Torvalds wrote:

> Looking again at that patch (the commit message still doesn't strike
> me as wonderfully explanatory :^) makes me worry, though.
> 
> Is that
> 
>         if (rq->skip_clock_update-- > 0)
>                 return;
> 
> really right? If skip_clock_update was zero (normal), it now gets set
> to -1, which has its own specific meaning (see "force clock update"
> comment in kernel/sched/rt.c). Is that intentional? That seems insane.

Yeah, it was intentional.  Least lines.

> Or should it be
> 
>         if (rq->skip_clock_update > 0) {
>                 rq->skip_clock_update = 0;
>                 return;
>         }
> 
> or what? Maybe there was a reason the patch never got applied even to -tip.

Peterz was looking at corner case proofing the thing.  Saving those
cycles has been entirely too annoying.

https://lkml.org/lkml/2014/4/8/295

	-Mike


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 17:04                                       ` Paul E. McKenney
  2014-12-02 17:14                                         ` Dâniel Fraga
@ 2014-12-02 18:09                                         ` Paul E. McKenney
  2014-12-02 18:41                                           ` Dâniel Fraga
  1 sibling, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-02 18:09 UTC (permalink / raw)
  To: Dâniel Fraga; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Tue, Dec 02, 2014 at 09:04:07AM -0800, Paul E. McKenney wrote:
> On Tue, Dec 02, 2014 at 02:43:17PM -0200, Dâniel Fraga wrote:
> > On Mon, 1 Dec 2014 15:08:13 -0800
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > Well, this turned out to be way simpler than I expected.  Passes
> > > light rcutorture testing.  Sometimes you get lucky...
> > 
> > 	Linus, Paul and others, I finally got a call trace with
> > only CONFIG_TREE_PREEMPT_RCU *disabled* using Paul's patch (to trigger 
> > it I compiled PHP with make -j8).
> 
> Is it harder to reproduce with CONFIG_PREEMPT=y and CONFIG_TREE_PREEMPT_RCU=n?
> 
> If it is a -lot- harder to reproduce, it might be worth bisecting among
> the RCU read-side critical sections.  If making a few of them be
> non-preemptible greatly reduces the probability of the bug occurring,
> that might provide a clue about root cause.
> 
> On the other hand, if it is just a little harder to reproduce, this
> RCU read-side bisection would likely be an exercise in futility.

To Linus's point, I guess I could look at the RCU CPU stall warning.  ;-)

Summary:  Not seeing something that would loop for 21 seconds.
Dâniel, if you let this run, does it hit a second RCU CPU stall
warning, or does it just lock up?

Details:

First, what would we be looking for?  We know that with CONFIG_PREEMPT=n,
things work, or at least that the failure rate is quite low.
With CONFIG_PREEMPT=y, with or without CONFIG_TREE_PREEMPT_RCU=y,
things can break.  This is backwards of the usual behavior: Normally
CONFIG_PREEMPT=y kernels are a bit less prone to RCU CPU stall warnings,
at least assuming that the kernel spends a relatively small fraction of
its time in RCU read-side critical sections.

So, how could this be?

1.	Someone forgot an rcu_read_unlock() on one of the exit paths from
	some RCU read-side critical section somewhere.  This seems unlikely,
	but either CONFIG_PROVE_RCU=y or CONFIG_DEBUG_PREEMPT=y should
	catch it.

2.	Someone forgot a preempt_enable() on one of the exit paths from
	some preempt-disable region somewhere.  This also seems a bit
	unlikely, but CONFIG_DEBUG_PREEMPT=y should catch it.

3.	Preemption exposes a race condition that is extremely unlikely
	for CONFIG_PREEMPT=n.
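
To make #1 concrete, the pattern would be an exit path that skips the
unlock.  A purely illustrative sketch (the struct, pointer, and helper
names here are made up, not taken from any particular call site):

	static int lookup_thing(void)
	{
		struct thing *p;

		rcu_read_lock();
		p = rcu_dereference(global_thing);
		if (!p)
			return -ENOENT;	/* BUG: missing rcu_read_unlock() */
		use_thing(p);
		rcu_read_unlock();
		return 0;
	}

An exit like that leaves the task inside an RCU read-side critical
section indefinitely, which is the sort of imbalance the debug options
above are meant to flag.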

Of course, it wouldn't hurt for someone who knows mm better than I to
check my work.

> > Dec  2 14:24:39 tux kernel: [ 8475.941616] conftest[9730]: segfault at 0 ip 0000000000400640 sp 00007fffa67ab300 error 4 in conftest[400000+1000]
> > Dec  2 14:24:40 tux kernel: [ 8476.104725] conftest[9753]: segfault at 0 ip 00007f6863024906 sp 00007fff0e31cc48 error 4 in libc-2.19.so[7f6862efe000+1a1000]
> > Dec  2 14:25:54 tux kernel: [ 8550.791697] INFO: rcu_sched detected stalls on CPUs/tasks: { 4} (detected by 0, t=60002 jiffies, g=112854, c=112853, q=0)

Note that the patch I gave to Dâniel provides only rcu_sched, as opposed
to the usual CONFIG_PREEMPT=y rcu_preempt and rcu_sched.  This is expected
behavior for CONFIG_TREE_RCU=y/CONFIG_TREE_PREEMPT_RCU=n.

> > Dec  2 14:25:54 tux kernel: [ 8550.791702] Task dump for CPU 4:
> > Dec  2 14:25:54 tux kernel: [ 8550.791703] cc1             R  running task        0 14344  14340 0x00080008
> > Dec  2 14:25:54 tux kernel: [ 8550.791706]  000000001bcebcd8 ffff880100000003 ffffffff810cb7f1 ffff88021f5f5c00
> > Dec  2 14:25:54 tux kernel: [ 8550.791708]  ffff88011bcebfd8 ffff88011bcebce8 ffffffff811fb970 ffff8802149a2a00
> > Dec  2 14:25:54 tux kernel: [ 8550.791710]  ffff8802149a2cc8 ffff88011bcebd28 ffffffff8103e979 ffff88020ed01398
> > Dec  2 14:25:54 tux kernel: [ 8550.791712] Call Trace:
> > Dec  2 14:25:54 tux kernel: [ 8550.791718]  [<ffffffff810cb7f1>] ? release_pages+0xa1/0x1e0

This does have a loop whose length is controlled by the "nr" argument.

> > Dec  2 14:25:54 tux kernel: [ 8550.791722]  [<ffffffff811fb970>] ? cpumask_any_but+0x30/0x40

This one is inconsistent with the release_pages() called function.
Besides, its runtime is limited by the number of CPUs, so it shouldn't
go on forever.

> > Dec  2 14:25:54 tux kernel: [ 8550.791725]  [<ffffffff8103e979>] ? flush_tlb_page+0x49/0xf0

This one should also have a sharply limited runtime.

> > Dec  2 14:25:54 tux kernel: [ 8550.791727]  [<ffffffff810cbe72>] ? lru_cache_add_active_or_unevictable+0x22/0x90

This one does acquire locks, so could in theory run for a long time.
Would require high contention on ->lru_lock, though.  A pvec can only
contain 14 pages, so the move loop should have limited runtime.

> > Dec  2 14:25:54 tux kernel: [ 8550.791731]  [<ffffffff810fc4c2>] ? alloc_pages_vma+0x72/0x130

This one contains a retry loop, at least if CONFIG_NUMA=y.  But I don't
see anything here that would block an RCU grace period.

> > Dec  2 14:25:54 tux kernel: [ 8550.791733]  [<ffffffff810cbe72>] ? lru_cache_add_active_or_unevictable+0x22/0x90

Duplicate address above, presumably one or both are due to stack-trace
confusion.

> > Dec  2 14:25:54 tux kernel: [ 8550.791735]  [<ffffffff810e5220>] ? handle_mm_fault+0x3a0/0xaf0

If this one had a problem, I would expect to see it in some of its called
functions.

> > Dec  2 14:25:54 tux kernel: [ 8550.791737]  [<ffffffff81039074>] ? __do_page_fault+0x224/0x4c0

Ditto.

> > Dec  2 14:25:54 tux kernel: [ 8550.791740]  [<ffffffff8110d54c>] ? new_sync_write+0x7c/0xb0

Ditto.

> > Dec  2 14:25:55 tux kernel: [ 8550.791743]  [<ffffffff8114765c>] ? fsnotify+0x27c/0x350

This one uses SRCU, not RCU.

> > Dec  2 14:25:55 tux kernel: [ 8550.791746]  [<ffffffff81087233>] ? rcu_eqs_enter+0x93/0xa0
> > Dec  2 14:25:55 tux kernel: [ 8550.791748]  [<ffffffff81087a5e>] ? rcu_user_enter+0xe/0x10

These two don't call fsnotify(), so I am assuming that the stack trace is
confused here.  Any chance of enabling frame pointers or some such to get
an accurate stack trace?  (And yes, this is one CPU tracing another live
CPU's stack, so some confusion is inherent, but probably not this far up
the stack.)

> > Dec  2 14:25:55 tux kernel: [ 8550.791749]  [<ffffffff8103938a>] ? do_page_fault+0x5a/0x70

Wrapper for __do_page_fault().  Yay!  Functions that actually call each
other in this stack trace!  ;-)

> > Dec  2 14:25:55 tux kernel: [ 8550.791752]  [<ffffffff8139d9d2>] ? page_fault+0x22/0x30

Not seeing much in the way of loops here.

							Thanx, Paul

> > 
> > 	If you need more info/testing, just ask.
> > 
> > -- 
> > Linux 3.17.0-dirty: Shuffling Zombie Juror
> > http://www.youtube.com/DanielFragaBR
> > http://exchangewar.info
> > Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL
> > 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 18:09                                         ` Paul E. McKenney
@ 2014-12-02 18:41                                           ` Dâniel Fraga
  0 siblings, 0 replies; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-02 18:41 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Tue, 2 Dec 2014 10:09:47 -0800
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> To Linus's point, I guess I could look at the RCU CPU stall warning.  ;-)
> 
> Summary:  Not seeing something that would loop for 21 seconds.
> Dâniel, if you let this run, does it hit a second RCU CPU stall
> warning, or does it just lock up?

	It just locks up. I can't even use the keyboard or mouse, so I
have to hard reset the system.

	I'm trying the bisect you asked for... even if it takes longer,
maybe I can find something for you.

-- 
Linux 3.17.0-rc6-00235-gb94d525: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 17:14                                         ` Dâniel Fraga
@ 2014-12-02 18:42                                           ` Paul E. McKenney
  2014-12-02 18:47                                             ` Dâniel Fraga
  0 siblings, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-02 18:42 UTC (permalink / raw)
  To: Dâniel Fraga; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Tue, Dec 02, 2014 at 03:14:08PM -0200, Dâniel Fraga wrote:
> On Tue, 2 Dec 2014 09:04:07 -0800
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Is it harder to reproduce with CONFIG_PREEMPT=y and CONFIG_TREE_PREEMPT_RCU=n?
> 
> 	Yes, it's much harder! :)
> 
> > If it is a -lot- harder to reproduce, it might be worth bisecting among
> > the RCU read-side critical sections.  If making a few of them be
> > non-preemptible greatly reduces the probability of the bug occurring,
> > that might provide a clue about root cause.
> > 
> > On the other hand, if it is just a little harder to reproduce, this
> > RCU read-side bisection would likely be an exercise in futility.
> 
> 	Ok, I want to bisect it. Since it could be painful to bisect,
> could you suggest 2 commits between 3.16.0 and 3.17.0 so we can narrow
> the bisect? I could just bisect between 3.16.0 and 3.17.0 but it would
> take many days :).
> 
> 	Ps: if you prefer I bisect between 3.16.0 and 3.17.0, no
> problem, but you'll have to be patient ;).

I was actually suggesting something a bit different.  Instead of bisecting
by release, bisect by code.  The procedure is as follows:

1.	I figure out some reliable way of making RCU allow preemption to
	be disabled for some RCU read-side critical sections, but not for
	others.  I send you the patch, which has rcu_read_lock_test()
	as well as rcu_read_lock().

2.	You build a kernel without my Kconfig hack, with my patch from
	#1 above, and build a kernel with CONFIG_PREEMPT=y (which of
	course implies CONFIG_TREE_PREEMPT_RCU=y, given that you are
	building without my Kconfig hack).

3.	You make a list of all the rcu_read_lock() uses in the kernel
	(or ask me to provide it).  You change the rcu_read_lock()
	calls in the first half of this list to rcu_read_lock_test().

	If the kernel locks up as easily with this change as it did
	in a stock CONFIG_PREEMPT=y CONFIG_TREE_PREEMPT_RCU=y kernel,
	change half of the remaining rcu_read_lock() calls to
	rcu_read_lock_test().  If the kernel is much more resistant
	to lockup, change half of the rcu_read_lock_test() calls
	back to rcu_read_lock().

4.	It is quite possible that several of the RCU read-side critical
	sections contribute to the unreliability, in which case the
	bisection will get a bit more complicated.
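
For concreteness, one plausible shape for the helpers from step 1.
This is strictly a sketch of the idea (force the marked read-side
critical sections to be non-preemptible), not the actual patch, and it
assumes that a matching rcu_read_unlock_test() is added alongside:

	static inline void rcu_read_lock_test(void)
	{
		preempt_disable();	/* make this critical section non-preemptible */
		rcu_read_lock();
	}

	static inline void rcu_read_unlock_test(void)
	{
		rcu_read_unlock();
		preempt_enable();
	}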

Other thoughts on how to attack this?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 18:42                                           ` Paul E. McKenney
@ 2014-12-02 18:47                                             ` Dâniel Fraga
  2014-12-02 19:11                                               ` Paul E. McKenney
  0 siblings, 1 reply; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-02 18:47 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Tue, 2 Dec 2014 10:42:02 -0800
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> I was actually suggesting something a bit different.  Instead of bisecting
> by release, bisect by code.  The procedure is as follows:
> 
> 1.	I figure out some reliable way of making RCU allow preemption to
> 	be disabled for some RCU read-side critical sections, but not for
> 	others.  I send you the patch, which has rcu_read_lock_test()
> 	as well as rcu_read_lock().
> 
> 2.	You build a kernel without my Kconfig hack, with my patch from
> 	#1 above, and build a kernel with CONFIG_PREEMPT=y (which of
> 	course implies CONFIG_TREE_PREEMPT_RCU=y, given that you are
> 	building without my Kconfig hack).
> 
> 3.	You make a list of all the rcu_read_lock() uses in the kernel
> 	(or ask me to provide it).  You change the rcu_read_lock()
> 	calls in the first half of this list to rcu_read_lock_test().
> 
> 	If the kernel locks up as easily with this change as it did
> 	in a stock CONFIG_PREEMPT=y CONFIG_TREE_PREEMPT_RCU=y kernel,
> 	change half of the remaining rcu_read_lock() calls to
> 	rcu_read_lock_test().  If the kernel is much more resistant
> 	to lockup, change half of the rcu_read_lock_test() calls
> 	back to rcu_read_lock().

	Ok Paul, I want to do everything I can to help you debug this.

	So can you provide me the list you mentioned at point 3 (or
tell me how I can get it)? If you guide me through this, I can do
whatever you need. Thanks!

-- 
Linux 3.17.0-rc6-00235-gb94d525: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 18:47                                             ` Dâniel Fraga
@ 2014-12-02 19:11                                               ` Paul E. McKenney
  2014-12-02 19:24                                                 ` Dâniel Fraga
  0 siblings, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-02 19:11 UTC (permalink / raw)
  To: Dâniel Fraga; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Tue, Dec 02, 2014 at 04:47:31PM -0200, Dâniel Fraga wrote:
> On Tue, 2 Dec 2014 10:42:02 -0800
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > I was actually suggesting something a bit different.  Instead of bisecting
> > by release, bisect by code.  The procedure is as follows:
> > 
> > 1.	I figure out some reliable way of making RCU allow preemption to
> > 	be disabled for some RCU read-side critical sections, but not for
> > 	others.  I send you the patch, which has rcu_read_lock_test()
> > 	as well as rcu_read_lock().
> > 
> > 2.	You build a kernel without my Kconfig hack, with my patch from
> > 	#1 above, and build a kernel with CONFIG_PREEMPT=y (which of
> > 	course implies CONFIG_TREE_PREEMPT_RCU=y, given that you are
> > 	building without my Kconfig hack).
> > 
> > 3.	You make a list of all the rcu_read_lock() uses in the kernel
> > 	(or ask me to provide it).  You change the rcu_read_lock()
> > 	calls in the first half of this list to rcu_read_lock_test().
> > 
> > 	If the kernel locks up as easily with this change as it did
> > 	in a stock CONFIG_PREEMPT=y CONFIG_TREE_PREEMPT_RCU=y kernel,
> > 	change half of the remaining rcu_read_lock() calls to
> > 	rcu_read_lock_test().  If the kernel is much more resistant
> > 	to lockup, change half of the rcu_read_lock_test() calls
> > 	back to rcu_read_lock().
> 
> 	Ok Paul, I want to do everything I can to help you debug this.
> 
> 	So can you provide me the list you mentioned at point 3 (or
> tell me how can I get it)? If you guide me through this, I can do
> whatever you need. Thanks!

OK.  I need to know exactly what version of the Linux kernel you are
using.  3.18-rc7?  (I am not too worried about exactly which version
you are using as long as I know which version it is.)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 19:11                                               ` Paul E. McKenney
@ 2014-12-02 19:24                                                 ` Dâniel Fraga
  2014-12-02 20:56                                                   ` Paul E. McKenney
  0 siblings, 1 reply; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-02 19:24 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Tue, 2 Dec 2014 11:11:43 -0800
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> OK.  I need to know exactly what version of the Linux kernel you are
> using.  3.18-rc7?  (I am not too worried about exactly which version
> you are using as long as I know which version it is.)

	Ok, I stopped bisecting and went back to the stock 3.17.0 kernel.
I'm testing with the 3.17.0 kernel because this one is the first to show
problems. If you want me to go to 3.18-rc7, just ask and I can check it
out through git.

	Ps: my signature will reflect the kernel I'm using now ;)

-- 
Linux 3.17.0: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 20:36                                   ` Linus Torvalds
  2014-12-01 23:08                                     ` Chris Mason
@ 2014-12-02 19:31                                     ` Dave Jones
  2014-12-02 21:17                                       ` Linus Torvalds
  2014-12-02 20:30                                     ` Dave Jones
  2 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-02 19:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Mon, Dec 01, 2014 at 12:36:34PM -0800, Linus Torvalds wrote:
 > On Mon, Dec 1, 2014 at 12:28 PM, Dâniel Fraga <fragabr@gmail.com> wrote:
 > >
 > >         Hi Paul. Please, I'd like the patch, because without
 > > preemption, I'm unable to trigger this bug.
 > 
 > Ok, that's already interesting information. And yes, it would probably
 > be interesting to see if CONFIG_PREEMPT=y but !CONFIG_TREE_PREEMPT_RCU
 > then solves it too, to narrow it down to one but not the other..
 > 
 > DaveJ - what about your situation? The standard Fedora kernels use
 > CONFIG_PREEMPT_VOLUNTARY, do you have CONFIG_PREEMPT and
 > CONFIG_TREE_PREEMPT_RCU enabled?

I periodically switch PREEMPT options, just to see if anything new falls
out, though I've been on CONFIG_PREEMPT for quite a while.
So right now I'm testing the TREE_PREEMPT_RCU case.

I'm in the process of bisecting 3.16 -> 3.17 (currently on -rc1).
Thanksgiving kind of screwed up my flow, but 3.16 got a real
pounding for over 3 days with no problems.
I can give a !PREEMPT_RCU build a try, in case that turns out to
point to something quicker than a bisect is going to.

 > I think you and Sasha both saw some RCU oddities too, no?

I don't recall a kernel where I didn't see RCU oddities of some
description, but in recent times, not so much, though a few releases
back I did change some of the RCU related CONFIG options while
Paul & co were chasing down some bugs.

These days I've been running with..

# RCU Subsystem
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_TASKS_RCU is not set
CONFIG_RCU_STALL_COMMON=y
# CONFIG_RCU_USER_QS is not set
CONFIG_RCU_FANOUT=64
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FANOUT_EXACT is not set
CONFIG_TREE_RCU_TRACE=y
CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_PRIO=1
CONFIG_RCU_BOOST_DELAY=500
CONFIG_RCU_NOCB_CPU=y
# CONFIG_RCU_NOCB_CPU_NONE is not set
# CONFIG_RCU_NOCB_CPU_ZERO is not set
CONFIG_RCU_NOCB_CPU_ALL=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
# RCU Debugging
CONFIG_SPARSE_RCU_POINTER=y
# CONFIG_RCU_TORTURE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_CPU_STALL_VERBOSE=y
CONFIG_RCU_CPU_STALL_INFO=y
CONFIG_RCU_TRACE=y

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 23:08                                     ` Chris Mason
  2014-12-01 23:25                                       ` Linus Torvalds
  2014-12-02 14:13                                       ` Mike Galbraith
@ 2014-12-02 19:32                                       ` Dave Jones
  2014-12-02 23:32                                         ` Sasha Levin
  2 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-02 19:32 UTC (permalink / raw)
  To: Chris Mason, Linus Torvalds, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Mon, Dec 01, 2014 at 06:08:38PM -0500, Chris Mason wrote:
 > I'm not sure if this is related, but running trinity here, I noticed it
 > was stuck at 100% system time on every CPU.  perf report tells me we are
 > spending all of our time in spin_lock under the sync system call.
 > 
 > I think it's coming from contention in the bdi_queue_work() call from
 > inside sync_inodes_sb, which is spin_lock_bh(). 
 > 
 > I wonder if we're just spinning so hard on this one bh lock that we're
 > starving the watchdog?
 > 
 > Dave, do you have spinlock debugging on?  

That has been a constant, yes. I can try with that disabled some time.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 20:36                                   ` Linus Torvalds
  2014-12-01 23:08                                     ` Chris Mason
  2014-12-02 19:31                                     ` Dave Jones
@ 2014-12-02 20:30                                     ` Dave Jones
  2014-12-02 20:48                                       ` Paul E. McKenney
  2 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-02 20:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Mon, Dec 01, 2014 at 12:36:34PM -0800, Linus Torvalds wrote:
 > On Mon, Dec 1, 2014 at 12:28 PM, Dâniel Fraga <fragabr@gmail.com> wrote:
 > >
 > >         Hi Paul. Please, I'd like the patch, because without
 > > preemption, I'm unable to trigger this bug.
 > 
 > Ok, that's already interesting information. And yes, it would probably
 > be interesting to see if CONFIG_PREEMPT=y but !CONFIG_TREE_PREEMPT_RCU
 > then solves it too, to narrow it down to one but not the other..

That combination doesn't seem possible. TREE_PREEMPT_RCU is the only
possible choice if PREEMPT=y

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 20:30                                     ` Dave Jones
@ 2014-12-02 20:48                                       ` Paul E. McKenney
  0 siblings, 0 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-02 20:48 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Dâniel Fraga, Sasha Levin,
	Linux Kernel Mailing List

On Tue, Dec 02, 2014 at 03:30:44PM -0500, Dave Jones wrote:
> On Mon, Dec 01, 2014 at 12:36:34PM -0800, Linus Torvalds wrote:
>  > On Mon, Dec 1, 2014 at 12:28 PM, Dâniel Fraga <fragabr@gmail.com> wrote:
>  > >
>  > >         Hi Paul. Please, I'd like the patch, because without
>  > > preemption, I'm unable to trigger this bug.
>  > 
>  > Ok, that's already interesting information. And yes, it would probably
>  > be interesting to see if CONFIG_PREEMPT=y but !CONFIG_TREE_PREEMPT_RCU
>  > then solves it too, to narrow it down to one but not the other..
> 
> That combination doesn't seem possible. TREE_PREEMPT_RCU is the only
> possible choice if PREEMPT=y

Indeed, getting that combination requires a Kconfig patch, which I
supplied below.  Not for mainline, debugging only.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/init/Kconfig b/init/Kconfig
index 903505e66d1d..2cf71fcd514f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -469,7 +469,7 @@ choice
 
 config TREE_RCU
 	bool "Tree-based hierarchical RCU"
-	depends on !PREEMPT && SMP
+	depends on SMP
 	select IRQ_WORK
 	help
 	  This option selects the RCU implementation that is


^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 19:24                                                 ` Dâniel Fraga
@ 2014-12-02 20:56                                                   ` Paul E. McKenney
  2014-12-02 22:01                                                     ` Dâniel Fraga
  0 siblings, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-02 20:56 UTC (permalink / raw)
  To: Dâniel Fraga; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Tue, Dec 02, 2014 at 05:24:39PM -0200, Dâniel Fraga wrote:
> On Tue, 2 Dec 2014 11:11:43 -0800
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > OK.  I need to know exactly what version of the Linux kernel you are
> > using.  3.18-rc7?  (I am not too worried about exactly which version
> > you are using as long as I know which version it is.)
> 
> 	Ok, I stopped bisecting and went back to 3.17.0 stock kernel.
> I'm testing with 3.17.0 kernel because this one is the first to show
> problems. If you want me to go to 3.18-rc7, just ask I can checkout
> through git.
> 
> 	Ps: my signature will reflect the kernel I'm using now ;)

And I left out a step.  Let's make sure that my preempt_disabled() hack
to CONFIG_TREE_PREEMPT_RCU=y has the same effect as the Kconfig hack
that allowed CONFIG_PREEMPT=y and CONFIG_TREE_PREEMPT_RCU=n.  Could you
please try out the following patch configured with CONFIG_PREEMPT=y
and CONFIG_TREE_PREEMPT_RCU=y?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index e0d31a345ee6..fff605a9e87f 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -71,7 +71,11 @@ module_param(rcu_expedited, int, 0);
  */
 void __rcu_read_lock(void)
 {
-	current->rcu_read_lock_nesting++;
+	struct task_struct *t = current;
+
+	if (!t->rcu_read_lock_nesting)
+		preempt_disable();
+	t->rcu_read_lock_nesting++;
 	barrier();  /* critical section after entry code. */
 }
 EXPORT_SYMBOL_GPL(__rcu_read_lock);
@@ -92,6 +96,7 @@ void __rcu_read_unlock(void)
 	} else {
 		barrier();  /* critical section before exit code. */
 		t->rcu_read_lock_nesting = INT_MIN;
+		preempt_enable();
 		barrier();  /* assign before ->rcu_read_unlock_special load */
 		if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special.s)))
 			rcu_read_unlock_special(t);


^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 19:31                                     ` Dave Jones
@ 2014-12-02 21:17                                       ` Linus Torvalds
  0 siblings, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-02 21:17 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Tue, Dec 2, 2014 at 11:31 AM, Dave Jones <davej@redhat.com> wrote:
>
> I'm in the process of bisecting 3.16 -> 3.17 (currently on -rc1)
> Thanksgiving kind of screwed up my flow, but 3.16 got a real
> pounding for over 3 days with no problems.
> I can give a !PREEMPT_RCU build a try, in case that turns out to
> point to something quicker than a bisect is going to.

No, go on with the bisect. I think at this point we'll all be happier
narrowing down the range of commits than anything else. The behavior
Dâniel sees may be entirely unrelated anyway.

                   Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 20:56                                                   ` Paul E. McKenney
@ 2014-12-02 22:01                                                     ` Dâniel Fraga
  2014-12-02 22:10                                                       ` Paul E. McKenney
  2014-12-02 22:10                                                       ` Linus Torvalds
  0 siblings, 2 replies; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-02 22:01 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Tue, 2 Dec 2014 12:56:36 -0800
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> And I left out a step.  Let's make sure that my preempt_disabled() hack
> to CONFIG_TREE_PREEMPT_RCU=y has the same effect as the Kconfig hack
> that allowed CONFIG_PREEMPT=y and CONFIG_TREE_PREEMPT_RCU=n.  Could you
> please try out the following patch configured with CONFIG_PREEMPT=y
> and CONFIG_TREE_PREEMPT_RCU=y?

	Of course! I applied your patch to the stock 3.17 kernel and after
stressing it (compiling with -j8 and watching videos on YouTube) to
trigger the bug I got the following:

Dec  2 19:47:25 tux kernel: [  927.973547] INFO: rcu_preempt detected stalls on CPUs/tasks: { 5} (detected by 1, t=60002 jiffies, g=71142, c=71141, q=0)
Dec  2 19:47:26 tux kernel: [  927.973553] Task dump for CPU 5:
Dec  2 19:47:26 tux kernel: [  927.973555] cc1             R  running task        0 30691  30680 0x00080008
Dec  2 19:47:26 tux kernel: [  927.973558]  ffff88021f351bc0 ffff8801d5743f00 ffffffff8107062a ffff88021f351c38
Dec  2 19:47:26 tux kernel: [  927.973560]  ffff8801d5743ea8 ffff8800cd3b0000 ffff8800cd3b041c 0000000000000000
Dec  2 19:47:26 tux kernel: [  927.973562]  00007f89b4d7d8e8 00007f89b5939a60 ffff88021f34d2c0 00007f89b4d7f000
Dec  2 19:47:26 tux kernel: [  927.973564] Call Trace:
Dec  2 19:47:26 tux kernel: [  927.973573]  [<ffffffff8107062a>] ? pick_next_task_fair+0x6aa/0x890
Dec  2 19:47:26 tux kernel: [  927.973577]  [<ffffffff81087483>] ? rcu_eqs_enter+0x93/0xa0
Dec  2 19:47:26 tux kernel: [  927.973579]  [<ffffffff81087f2e>] ? rcu_user_enter+0xe/0x10
Dec  2 19:47:26 tux kernel: [  927.973582]  [<ffffffff8103935a>] ? do_page_fault+0x5a/0x70
Dec  2 19:47:26 tux kernel: [  927.973585]  [<ffffffff8139bed2>] ? page_fault+0x22/0x30
Dec  2 19:47:30 tux kernel: [  932.471964] CPU1: Core temperature above threshold, cpu clock throttled (total events = 820)
Dec  2 19:47:30 tux kernel: [  932.471966] CPU6: Package temperature above threshold, cpu clock throttled (total events = 2624)
Dec  2 19:47:30 tux kernel: [  932.471967] CPU3: Package temperature above threshold, cpu clock throttled (total events = 2624)
Dec  2 19:47:30 tux kernel: [  932.471968] CPU7: Package temperature above threshold, cpu clock throttled (total events = 2624)
Dec  2 19:47:30 tux kernel: [  932.471969] CPU0: Package temperature above threshold, cpu clock throttled (total events = 2624)
Dec  2 19:47:30 tux kernel: [  932.471970] CPU2: Package temperature above threshold, cpu clock throttled (total events = 2624)
Dec  2 19:47:30 tux kernel: [  932.471978] CPU1: Package temperature above threshold, cpu clock throttled (total events = 2624)
Dec  2 19:47:30 tux kernel: [  932.472922] CPU1: Core temperature/speed normal
Dec  2 19:47:30 tux kernel: [  932.472923] CPU6: Package temperature/speed normal
Dec  2 19:47:31 tux kernel: [  932.472923] CPU2: Package temperature/speed normal
Dec  2 19:47:31 tux kernel: [  932.472924] CPU3: Package temperature/speed normal
Dec  2 19:47:31 tux kernel: [  932.472925] CPU0: Package temperature/speed normal

	Waiting for your next instructions.

-- 
Linux 3.17.0-dirty: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 22:01                                                     ` Dâniel Fraga
@ 2014-12-02 22:10                                                       ` Paul E. McKenney
  2014-12-02 22:18                                                         ` Dâniel Fraga
  2014-12-02 22:10                                                       ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-02 22:10 UTC (permalink / raw)
  To: Dâniel Fraga; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Tue, Dec 02, 2014 at 08:01:49PM -0200, Dâniel Fraga wrote:
> On Tue, 2 Dec 2014 12:56:36 -0800
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > And I left out a step.  Let's make sure that my preempt_disabled() hack
> > to CONFIG_TREE_PREEMPT_RCU=y has the same effect as the Kconfig hack
> > that allowed CONFIG_PREEMPT=y and CONFIG_TREE_PREEMPT_RCU=n.  Could you
> > please try out the following patch configured with CONFIG_PREEMPT=y
> > and CONFIG_TREE_PREEMPT_RCU=y?
> 
> 	Of course! I applied your patch to 3.17 stock kernel and after
> stressing it (compiling with -j8 and watching videos on Youtube) to
> trigger the bug I got the following:

Thank you!!!

Was this as difficult to trigger as the version with the Kconfig hack
that used CONFIG_PREEMPT=y and CONFIG_TREE_PREEMPT_RCU=n?

							Thanx, Paul

> Dec  2 19:47:25 tux kernel: [  927.973547] INFO: rcu_preempt detected stalls on CPUs/tasks: { 5} (detected by 1, t=60002 jiffies, g=71142, c=71141, q=0)
> Dec  2 19:47:26 tux kernel: [  927.973553] Task dump for CPU 5:
> Dec  2 19:47:26 tux kernel: [  927.973555] cc1             R  running task        0 30691  30680 0x00080008
> Dec  2 19:47:26 tux kernel: [  927.973558]  ffff88021f351bc0 ffff8801d5743f00 ffffffff8107062a ffff88021f351c38
> Dec  2 19:47:26 tux kernel: [  927.973560]  ffff8801d5743ea8 ffff8800cd3b0000 ffff8800cd3b041c 0000000000000000
> Dec  2 19:47:26 tux kernel: [  927.973562]  00007f89b4d7d8e8 00007f89b5939a60 ffff88021f34d2c0 00007f89b4d7f000
> Dec  2 19:47:26 tux kernel: [  927.973564] Call Trace:
> Dec  2 19:47:26 tux kernel: [  927.973573]  [<ffffffff8107062a>] ? pick_next_task_fair+0x6aa/0x890
> Dec  2 19:47:26 tux kernel: [  927.973577]  [<ffffffff81087483>] ? rcu_eqs_enter+0x93/0xa0
> Dec  2 19:47:26 tux kernel: [  927.973579]  [<ffffffff81087f2e>] ? rcu_user_enter+0xe/0x10
> Dec  2 19:47:26 tux kernel: [  927.973582]  [<ffffffff8103935a>] ? do_page_fault+0x5a/0x70
> Dec  2 19:47:26 tux kernel: [  927.973585]  [<ffffffff8139bed2>] ? page_fault+0x22/0x30
> Dec  2 19:47:30 tux kernel: [  932.471964] CPU1: Core temperature above threshold, cpu clock throttled (total events = 820)
> Dec  2 19:47:30 tux kernel: [  932.471966] CPU6: Package temperature above threshold, cpu clock throttled (total events = 2624)
> Dec  2 19:47:30 tux kernel: [  932.471967] CPU3: Package temperature above threshold, cpu clock throttled (total events = 2624)
> Dec  2 19:47:30 tux kernel: [  932.471968] CPU7: Package temperature above threshold, cpu clock throttled (total events = 2624)
> Dec  2 19:47:30 tux kernel: [  932.471969] CPU0: Package temperature above threshold, cpu clock throttled (total events = 2624)
> Dec  2 19:47:30 tux kernel: [  932.471970] CPU2: Package temperature above threshold, cpu clock throttled (total events = 2624)
> Dec  2 19:47:30 tux kernel: [  932.471978] CPU1: Package temperature above threshold, cpu clock throttled (total events = 2624)
> Dec  2 19:47:30 tux kernel: [  932.472922] CPU1: Core temperature/speed normal
> Dec  2 19:47:30 tux kernel: [  932.472923] CPU6: Package temperature/speed normal
> Dec  2 19:47:31 tux kernel: [  932.472923] CPU2: Package temperature/speed normal
> Dec  2 19:47:31 tux kernel: [  932.472924] CPU3: Package temperature/speed normal
> Dec  2 19:47:31 tux kernel: [  932.472925] CPU0: Package temperature/speed normal
> 
> 	Waiting for your next instructions.
> 
> -- 
> Linux 3.17.0-dirty: Shuffling Zombie Juror
> http://www.youtube.com/DanielFragaBR
> http://exchangewar.info
> Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 22:01                                                     ` Dâniel Fraga
  2014-12-02 22:10                                                       ` Paul E. McKenney
@ 2014-12-02 22:10                                                       ` Linus Torvalds
  2014-12-02 22:16                                                         ` Dâniel Fraga
  2014-12-03  3:21                                                         ` Dâniel Fraga
  1 sibling, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-02 22:10 UTC (permalink / raw)
  To: Dâniel Fraga; +Cc: Paul E. McKenney, Linux Kernel Mailing List

On Tue, Dec 2, 2014 at 2:01 PM, Dâniel Fraga <fragabr@gmail.com> wrote:
>
>         Of course! I applied your patch to 3.17 stock kernel and after
> stressing it (compiling with -j8 and watching videos on Youtube) to
> trigger the bug I got the following:

So it appears that you can recreate this much more quickly than DaveJ
can recreate his issue.

The two issues may be entirely unrelated, but it is certainly
quite possible that they have some relation to each other, and the
timing is intriguing, in that 3.17 seems to be the first kernel
release this happened in.

So at this point I think I'd ask you to just go back to your bisection
that you apparently already started earlier. I take it 3.16 worked
fine, and that's what you used as the good base for your bisect?

Even if it's something else than what DaveJ sees (or perhaps
*particularly* if it's something else), bisecting when it started
would be very worthwhile.

There's 13k+ commits in between 3.16 and 3.17, so a full bisect should
be around 15 test-points. But judging by the timing of your emails,
you can generally reproduce this relatively quickly..
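
For reference, the mechanics are the standard git bisect flow; a
sketch only, assuming v3.16 and v3.17 as the good and bad endpoints:

	git bisect start
	git bisect bad v3.17
	git bisect good v3.16
	# build and boot the kernel git checks out, hammer it with the
	# reproducer, then mark the result:
	git bisect good     # or: git bisect bad
	# repeat until git names the first bad commit; with ~13k commits
	# that is roughly 14-15 build/test cycles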

                   Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 22:10                                                       ` Linus Torvalds
@ 2014-12-02 22:16                                                         ` Dâniel Fraga
  2014-12-03  3:21                                                         ` Dâniel Fraga
  1 sibling, 0 replies; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-02 22:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul E. McKenney, Linux Kernel Mailing List

On Tue, 2 Dec 2014 14:10:33 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> So it appears that you can recreate this much more quickly than DaveJ
> can recreate his issue.
> 
> The two issues may be entirely unrelated, but the it is certainly
> quite possible that they have some relation to each other, and the
> timing is intriguing, in that 3.17 seems to be the first kernel
> release this happened in.
> 
> So at this point I think I'd ask you to just go back to your bisection
> that you apparently already started earlier. I take it 3.16 worked
> fine, and that's what you used as the good base for your bisect?
> 
> Even if it's something else than what DaveJ sees (or perhaps
> *particularly* if it's something else), bisecting when it started
> would be very worthwhile.
> 
> There's 13k+ commits in between 3.16 and 3.17, so a full bisect should
> be around 15 test-points. But judging by the timing of your emails,
> you can generally reproduce this relatively quickly..

	No problem, Linus. I'll try the full bisect and post here
later with the final result.

-- 
Linux 3.17.0-dirty: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 22:10                                                       ` Paul E. McKenney
@ 2014-12-02 22:18                                                         ` Dâniel Fraga
  2014-12-02 22:35                                                           ` Paul E. McKenney
  0 siblings, 1 reply; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-02 22:18 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Tue, 2 Dec 2014 14:10:31 -0800
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> Thank you!!!

	;)

> Was this as difficult to trigger as the version with the Kconfig hack
> that used CONFIG_PREEMPT=y and CONFIG_TREE_PREEMPT_RCU=n?

	Yes. I had to try many times until I got the call trace.

	I'll try the bisect as Linus suggested, but if you have any
other suggestions, just ask ;). Thanks Paul.

-- 
Linux 3.17.0-dirty: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 22:18                                                         ` Dâniel Fraga
@ 2014-12-02 22:35                                                           ` Paul E. McKenney
  0 siblings, 0 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-02 22:35 UTC (permalink / raw)
  To: Dâniel Fraga; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Tue, Dec 02, 2014 at 08:18:46PM -0200, Dâniel Fraga wrote:
> On Tue, 2 Dec 2014 14:10:31 -0800
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Thank you!!!
> 
> 	;)
> 
> > Was this as difficult to trigger as the version with the Kconfig hack
> > that used CONFIG_PREEMPT=y and CONFIG_TREE_PREEMPT_RCU=n?
> 
> 	Yes. I had to try many times until I got the call trace.
> 
> 	I'll try the bisect as Linus suggested, but if you have any
> other suggestions, just ask ;). Thanks Paul.

Sounds good to me -- getting a single commit somewhere between
3.16 and 3.17 is going to be a lot better than reasoning indirectly
from some set of RCU read-side critical sections.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 19:32                                       ` Dave Jones
@ 2014-12-02 23:32                                         ` Sasha Levin
  2014-12-03  0:09                                           ` Linus Torvalds
  2014-12-05  5:00                                           ` Sasha Levin
  0 siblings, 2 replies; 486+ messages in thread
From: Sasha Levin @ 2014-12-02 23:32 UTC (permalink / raw)
  To: Dave Jones, Chris Mason, Linus Torvalds, Dâniel Fraga,
	Paul E. McKenney, Linux Kernel Mailing List

On 12/02/2014 02:32 PM, Dave Jones wrote:
> On Mon, Dec 01, 2014 at 06:08:38PM -0500, Chris Mason wrote:
>  > I'm not sure if this is related, but running trinity here, I noticed it
>  > was stuck at 100% system time on every CPU.  perf report tells me we are
>  > spending all of our time in spin_lock under the sync system call.
>  > 
>  > I think it's coming from contention in the bdi_queue_work() call from
>  > inside sync_inodes_sb, which is spin_lock_bh(). 
>  > 
>  > I wonder if we're just spinning so hard on this one bh lock that we're
>  > starving the watchdog?
>  > 
>  > Dave, do you have spinlock debugging on?  
> 
> That has been a constant, yes. I can try with that disabled some time.

Here's my side of the story: I was observing RCU lockups which went away when
I disabled verbose printing for fault injections. It seems that printing one
line ~10 times a second can cause that...

I've disabled lock debugging to see if anything new will show up, and hit
something that may be related:

[  787.894288] ================================================================================
[  787.897074] UBSan: Undefined behaviour in kernel/sched/fair.c:4541:17
[  787.898981] signed integer overflow:
[  787.900066] 361516561629678 * 101500 cannot be represented in type 'long long int'
[  787.900066] CPU: 18 PID: 12958 Comm: trinity-c103 Not tainted 3.18.0-rc6-next-20141201-sasha-00070-g028060a-dirty #1528
[  787.900066]  0000000000000000 0000000000000000 ffffffff93b0f890 ffff8806e3eff918
[  787.900066]  ffffffff91f1cf26 1ffffffff3c2de73 ffffffff93b0f8a8 ffff8806e3eff938
[  787.900066]  ffffffff91f1fb90 1ffffffff3c2de73 ffffffff93b0f8a8 ffff8806e3eff9f8
[  787.900066] Call Trace:
[  787.900066] dump_stack (lib/dump_stack.c:52)
[  787.900066] ubsan_epilogue (lib/ubsan.c:159)
[  787.900066] handle_overflow (lib/ubsan.c:191)
[  787.900066] ? __do_page_fault (arch/x86/mm/fault.c:1220)
[  787.900066] ? local_clock (kernel/sched/clock.c:392)
[  787.900066] __ubsan_handle_mul_overflow (lib/ubsan.c:218)
[  787.900066] select_task_rq_fair (kernel/sched/fair.c:4541 kernel/sched/fair.c:4755)
[  787.900066] try_to_wake_up (kernel/sched/core.c:1415 kernel/sched/core.c:1724)
[  787.900066] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:33)
[  787.900066] default_wake_function (kernel/sched/core.c:2979)
[  787.900066] ? get_parent_ip (kernel/sched/core.c:2559)
[  787.900066] autoremove_wake_function (kernel/sched/wait.c:295)
[  787.900066] ? get_parent_ip (kernel/sched/core.c:2559)
[  787.900066] __wake_up_common (kernel/sched/wait.c:73)
[  787.900066] __wake_up_sync_key (include/linux/spinlock.h:364 kernel/sched/wait.c:146)
[  787.900066] pipe_write (fs/pipe.c:466)
[  787.900066] ? kasan_poison_shadow (mm/kasan/kasan.c:48)
[  787.900066] ? new_sync_read (fs/read_write.c:480)
[  787.900066] do_iter_readv_writev (fs/read_write.c:681)
[  787.900066] compat_do_readv_writev (fs/read_write.c:1029)
[  787.900066] ? wait_for_partner (fs/pipe.c:340)
[  787.900066] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:183)
[  787.900066] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  787.900066] ? syscall_trace_enter_phase1 (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1486)
[  787.900066] compat_writev (fs/read_write.c:1145)
[  787.900066] compat_SyS_writev (fs/read_write.c:1163 fs/read_write.c:1151)
[  787.900066] ia32_do_call (arch/x86/ia32/ia32entry.S:446)
[  787.900066] ================================================================================

(For Linus asking himself "what the hell is this UBSan thing, I didn't merge that!" - it's an
undefined behaviour sanitizer that works with gcc5.)


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 23:32                                         ` Sasha Levin
@ 2014-12-03  0:09                                           ` Linus Torvalds
  2014-12-03  0:25                                             ` Sasha Levin
  2014-12-05  5:00                                           ` Sasha Levin
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-03  0:09 UTC (permalink / raw)
  To: Sasha Levin, Peter Zijlstra, Ingo Molnar
  Cc: Dave Jones, Chris Mason, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On Tue, Dec 2, 2014 at 3:32 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>
> I've disabled lock debugging to see if anything new will show up, and hit
> something that may be related:

Very interesting. But your source code doesn't match mine - can you
say what that

    kernel/sched/fair.c:4541:17

line is?

There are at least five multiplications there (all inlined):

 - "imbalance*min_load" from find_idlest_group()

 - "factor * p->wakee_flips" in wake_wide()

 - at least three in wake_affine:

    "prev_eff_load *= capacity_of(this_cpu)"
    "this_eff_load *= this_load + effective_load(tg, this_cpu, weight, weight)"
    "prev_eff_load *= load + effective_load(tg, prev_cpu, 0, weight)"

(There are other multiplications too, but they are by constants afaik
and don't match yours).

None of those seem to have anything to do with the 3.16..3.17 changes,
but I might be missing something, and obviously this also might have
nothing to do with the problems anyway.

Adding Ingo/PeterZ to the participants again.

                 Linus


---
> [  787.894288] ================================================================================
> [  787.897074] UBSan: Undefined behaviour in kernel/sched/fair.c:4541:17
> [  787.898981] signed integer overflow:
> [  787.900066] 361516561629678 * 101500 cannot be represented in type 'long long int'
> [  787.900066] ubsan_epilogue (lib/ubsan.c:159)
> [  787.900066] handle_overflow (lib/ubsan.c:191)
> [  787.900066] ? __do_page_fault (arch/x86/mm/fault.c:1220)
> [  787.900066] ? local_clock (kernel/sched/clock.c:392)
> [  787.900066] __ubsan_handle_mul_overflow (lib/ubsan.c:218)
> [  787.900066] select_task_rq_fair (kernel/sched/fair.c:4541 kernel/sched/fair.c:4755)

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03  0:09                                           ` Linus Torvalds
@ 2014-12-03  0:25                                             ` Sasha Levin
  0 siblings, 0 replies; 486+ messages in thread
From: Sasha Levin @ 2014-12-03  0:25 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra, Ingo Molnar
  Cc: Dave Jones, Chris Mason, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On 12/02/2014 07:09 PM, Linus Torvalds wrote:
> On Tue, Dec 2, 2014 at 3:32 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>> >
>> > I've disabled lock debugging to see if anything new will show up, and hit
>> > something that may be related:
> Very interesting. But your source code doesn't match mine - can you
> say what that
> 
>     kernel/sched/fair.c:4541:17
> 
> line is?


Sorry about that, I'm testing on the -next kernel. The relevant code snippet is
in wake_affine:

        prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
        prev_eff_load *= capacity_of(this_cpu);

        if (this_load > 0) {
                this_eff_load *= this_load +
                        effective_load(tg, this_cpu, weight, weight);  <==== This one

                prev_eff_load *= load + effective_load(tg, prev_cpu, 0, weight);
        }
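
For what it's worth, the operands from the report really do exceed what a
signed 64-bit multiply can hold; a quick userspace check (just an arithmetic
sanity check, not a proposed fix) shows it:

/* Widen to unsigned __int128 (gcc on x86-64) so the multiply itself cannot
 * overflow, then compare against LLONG_MAX. */
#include <limits.h>
#include <stdio.h>

int main(void)
{
	unsigned __int128 p = (unsigned __int128)361516561629678ULL * 101500ULL;

	printf("LLONG_MAX           = %lld\n", LLONG_MAX);
	printf("product > LLONG_MAX = %d\n", (int)(p > (unsigned __int128)LLONG_MAX));
	/* prints 1: the product (~3.7e19) cannot be represented in 'long long',
	 * so the in-kernel signed multiply is undefined behaviour */
	return 0;
}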


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 16:58                                   ` Dâniel Fraga
  2014-12-02 17:17                                     ` Paul E. McKenney
@ 2014-12-03  2:03                                     ` Lai Jiangshan
  2014-12-03  5:22                                       ` Paul E. McKenney
  1 sibling, 1 reply; 486+ messages in thread
From: Lai Jiangshan @ 2014-12-03  2:03 UTC (permalink / raw)
  To: paulmck; +Cc: Dâniel Fraga, Linus Torvalds, Linux Kernel Mailing List

On 12/03/2014 12:58 AM, Dâniel Fraga wrote:
> On Tue, 2 Dec 2014 16:40:37 +0800
> Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> 
>> It is needed at least for testing.
>>
>> CONFIG_TREE_PREEMPT_RCU=y with CONFIG_PREEMPT=n is needed for testing too.
>>
>> Please enable them (or enable them under CONFIG_RCU_TRACE=y)
> 
> 	Lai, sorry but I didn't understand. Do you mean both of them
> enabled? Because how can CONFIG_TREE_PREEMPT_RCU be enabled without
> CONFIG_PREEMPT ?


Sorry, I replied to Paul, and my reply was off-topic; it has nothing
to do with your reports. Sorry again.

I think we need two combinations for testing (not in mainline, but I think
they should be enabled for test farms).

So I hope Paul enables them.

combination1: CONFIG_TREE_PREEMPT_RCU=n & CONFIG_PREEMPT=y
combination2: CONFIG_TREE_PREEMPT_RCU=y & CONFIG_PREEMPT=n

The core code should work correctly in these combinations.
I agree with Paul that these combinations should not be enabled in production,
so my request is: enable these combinations under CONFIG_RCU_TRACE
or CONFIG_TREE_RCU_TRACE.

For myself, I always edit the Kconfig directly, thus it is not a problem
for me.  But there is no way for test farms to test these combinations.

Thanks,
Lai

> 
> 	If you mean both enabled, I already reported a call trace with
> both enabled:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=85941
> 
> 	Please see my previous answer to Linus and Paul too.
> 
> 	Regarding CONFIG_RCU_TRACE, do you mean
> "CONFIG_TREE_RCU_TRACE"? I couldn't find CONFIG_RCU_TRACE.
> 
> 	Thanks.
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 22:10                                                       ` Linus Torvalds
  2014-12-02 22:16                                                         ` Dâniel Fraga
@ 2014-12-03  3:21                                                         ` Dâniel Fraga
  2014-12-03  4:14                                                           ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-03  3:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul E. McKenney, Linux Kernel Mailing List

On Tue, 2 Dec 2014 14:10:33 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> There's 13k+ commits in between 3.16 and 3.17, so a full bisect should
> be around 15 test-points. But judging by the timing of your emails,
> you can generally reproduce this relatively quickly..

	Ok Linus and Paul, it took me almost 5 hours to bisect it and
the result is:

c9b88e9581828bb8bba06c5e7ee8ed1761172b6e is the first bad commit

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c9b88e9581828bb8bba06c5e7ee8ed1761172b6e

	I hope I didn't get any false positive/negative during 
bisect.

	And here's the complete bisect log (just in case):

git bisect start
# good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
# bad: [bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9] Linux 3.17
git bisect bad bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9
# bad: [f2d7e4d4398092d14fb039cb4d38e502d3f019ee] checkpatch: add fix_insert_line and fix_delete_line helpers
git bisect bad f2d7e4d4398092d14fb039cb4d38e502d3f019ee
# bad: [79eb238c76782a59d51adf8a3dd7f6444245b475] Merge tag 'tty-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect bad 79eb238c76782a59d51adf8a3dd7f6444245b475
# good: [3d582487beb83d650fbd25cb65688b0fbedc97f1] staging: vt6656: struct vnt_private pInterruptURB rename to interrupt_urb
git bisect good 3d582487beb83d650fbd25cb65688b0fbedc97f1
# bad: [e9c9eecabaa898ff3fedd98813ee4ac1a00d006a] Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad e9c9eecabaa898ff3fedd98813ee4ac1a00d006a
# bad: [c9b88e9581828bb8bba06c5e7ee8ed1761172b6e] Merge tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
git bisect bad c9b88e9581828bb8bba06c5e7ee8ed1761172b6e
# good: [47dfe4037e37b2843055ea3feccf1c335ea23a9c] Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
git bisect good 47dfe4037e37b2843055ea3feccf1c335ea23a9c
# good: [b11a6face1b6d5518319f797a74e22bb4309daa9] clk: Add missing of_clk_set_defaults export
git bisect good b11a6face1b6d5518319f797a74e22bb4309daa9
# good: [3a636388bae8390d23f31e061c0c6fdc14525786] tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST
git bisect good 3a636388bae8390d23f31e061c0c6fdc14525786
# good: [e17acfdc83b877794c119fac4627e80510ea3c09] Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
git bisect good e17acfdc83b877794c119fac4627e80510ea3c09
# good: [c7ed326fa7cafb83ced5a8b02517a61672fe9e90] Merge tag 'ktest-v3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
git bisect good c7ed326fa7cafb83ced5a8b02517a61672fe9e90
# good: [dc6f03f26f570104a2bb03f9d1deb588026d7c75] ftrace: Add warning if tramp hash does not match nr_trampolines
git bisect good dc6f03f26f570104a2bb03f9d1deb588026d7c75
# good: [ede392a75090aab49b01ecd6f7694bb9130ad461] tracing/uprobes: Kill the dead TRACE_EVENT_FL_USE_CALL_FILTER logic
git bisect good ede392a75090aab49b01ecd6f7694bb9130ad461
# good: [bb9ef1cb7d8668d6b0038b6f9f783c849135e40d] tracing: Change apply_subsystem_event_filter() paths to check file->system == dir
git bisect good bb9ef1cb7d8668d6b0038b6f9f783c849135e40d
# good: [6355d54438bfc3b636cb6453cd091f782fb9b4d7] tracing: Kill "filter_string" arg of replace_preds()
git bisect good 6355d54438bfc3b636cb6453cd091f782fb9b4d7
# good: [b8c0aa46b3e86083721b57ed2eec6bd2c29ebfba] Merge tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
git bisect good b8c0aa46b3e86083721b57ed2eec6bd2c29ebfba
# first bad commit: [c9b88e9581828bb8bba06c5e7ee8ed1761172b6e] Merge tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

-- 
Linux 3.16.0-00409-gb8c0aa4: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03  3:21                                                         ` Dâniel Fraga
@ 2014-12-03  4:14                                                           ` Linus Torvalds
  2014-12-03  4:51                                                             ` Dâniel Fraga
                                                                               ` (2 more replies)
  0 siblings, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-03  4:14 UTC (permalink / raw)
  To: Dâniel Fraga, Tejun Heo; +Cc: Paul E. McKenney, Linux Kernel Mailing List

On Tue, Dec 2, 2014 at 7:21 PM, Dâniel Fraga <fragabr@gmail.com> wrote:
>
>         Ok Linus and Paul, it took me almost 5 hours to bisect it and
> the result is:

Much faster than I expected. However:

> c9b88e9581828bb8bba06c5e7ee8ed1761172b6e is the first bad commit

Hgghnn.. A merge commit can certainly be the thing that introduces
bugs, but it *usually* isn't. Especially not one that is fairly small
and has no actual conflicts in it. Sure, there could be semantic
conflicts etc, but that's where "fairly small" comes in - that is just
not a complicated or subtle merge. And there are other reasons to
believe your bisection veered off into the weeds earlier. Read on.

So:

>         I hope I didn't get any false positive/negative during
> bisect.

Well, the "bad" ones should be pretty safe, since there is no question
at all about any case where things locked up. So unless you actually
mis-typed or did something else silly, I'll trust the ones you marked
bad.

It's the ones marked "good" that are more questionable, and might be
wrong, because you didn't run for long enough, and didn't happen to
hit the right condition.

Your bisection log also kind of points to a mistake: it ends with a
long run of "all good". That usually means that you're not actually
getting closer to the bug: if you were, you'd - pretty much by
definition - also get closer to the "edge" of the bug, and you should
generally see a mix of good/bad as you narrow in on it. Of course,
it's all statistical, so I'm not saying that a run of "good"
bisections is a sure-fire sign of anything, but it's just another
sign: you may have marked something "good" that wasn't, and that
actually took you *away* from the bug, so now everything that followed
that false positive was good.

>         And here's the complete bisect log (just in case):

So this part I'll believe in:

> git bisect start
> # good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
> git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
> # bad: [bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9] Linux 3.17
> git bisect bad bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9
> # bad: [f2d7e4d4398092d14fb039cb4d38e502d3f019ee] checkpatch: add fix_insert_line and fix_delete_line helpers
> git bisect bad f2d7e4d4398092d14fb039cb4d38e502d3f019ee
> # bad: [79eb238c76782a59d51adf8a3dd7f6444245b475] Merge tag 'tty-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> git bisect bad 79eb238c76782a59d51adf8a3dd7f6444245b475
> # good: [3d582487beb83d650fbd25cb65688b0fbedc97f1] staging: vt6656: struct vnt_private pInterruptURB rename to interrupt_urb
> git bisect good 3d582487beb83d650fbd25cb65688b0fbedc97f1
> # bad: [e9c9eecabaa898ff3fedd98813ee4ac1a00d006a] Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad e9c9eecabaa898ff3fedd98813ee4ac1a00d006a
> # bad: [c9b88e9581828bb8bba06c5e7ee8ed1761172b6e] Merge tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
> git bisect bad c9b88e9581828bb8bba06c5e7ee8ed1761172b6e

because anything marked "bad" clearly must be bad, and anything you
marked "good" before that was probably correct too - because you saw
"bad" cases after it, the good marking clearly hadn't made us ignore
the bug.

Put another way: "bad" is generally more trustworthy (because you
actively saw the bug), while a "good" _before_ a subsequent bad is
also trustworthy (because if the "good" kernel contained the bug and
you should have marked it bad, we'd then go on to test all the commits
that were *not* the bug, so we'd never see a "bad" kernel again).

Of course, the above rule-of-thumb is a simplification of reality. In
reality, there might be multiple bugs that come together and make the
whole good-vs-bad a much less black-and-white thing, but *generally* I
trust "git bisect bad" more than "git bisect good", and "git bisect
good" that is followed by "bad".

What is *really* suspicious is a series of "git bisect good" with no
"bad"s anywhere. Which is exactly what we see at the end of the
bisect.

So might I ask you to try starting from this point again (this is why
the bisect log is so useful - no need to retest the above part, you
can just mindlessly do that sequence by hand without testing), and
starting with this commit:

> # good: [47dfe4037e37b2843055ea3feccf1c335ea23a9c] Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
> git bisect good 47dfe4037e37b2843055ea3feccf1c335ea23a9c

Double-check whether that commit is really good. Run that "good"
kernel for a longer time, and under heavier load. Just to verify.

Because looking at the part of the bisect that seems trust-worthy, and
looking at what remains (hint: do "gitk --bisect" while bisecting to
see what is going on), these are the merges in that set (in my
"mergelog" format):

    Bjorn Helgaas (1):
      PCI updates

    Borislav Petkov (1):
      EDAC changes

    Herbert Xu (1):
      crypto update

    Jeff Layton (1):
      file locking related changes

    Mike Turquette (1):
      clock framework updates

    Steven Rostedt (3):
      config-bisect changes
      tracing updates
      tracing filter cleanups

    Tejun Heo (4):
      workqueue updates
      percpu updates
      cgroup changes
      libata changes

and quite frankly, for some core bug like this, I'd suspect the
workqueue or percpu updates from Tejun (possibly cgroup), *not* the
tracing pull.

Of course, bugs can come in from anywhere, so it *could* be the
tracing one, and it *could* be the merge commit, but my gut just
screams that you probably missed one bad kernel, and marked it good.
And it's really that very first one (ie commit
47dfe4037e37b2843055ea3feccf1c335ea23a9c) that contains most of the
actually suspect code, so I'd really like you to re-test that one a
lot before you call it "good" again.

Humor me.

I added Tejun to the Cc, just because I wanted to give him a heads-up
that I am tentatively starting to blame him in my dark little mind..

                 Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03  4:14                                                           ` Linus Torvalds
@ 2014-12-03  4:51                                                             ` Dâniel Fraga
  2014-12-03  6:02                                                             ` Chris Rorvick
  2014-12-03 14:54                                                             ` Tejun Heo
  2 siblings, 0 replies; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-03  4:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tejun Heo, Paul E. McKenney, Linux Kernel Mailing List

On Tue, 2 Dec 2014 20:14:52 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> What is *really* suspicious is a series of "git bisect good" with no
> "bad"s anywhere. Which is exactly what we see at the end of the
> bisect.
> 
> So might I ask you to try starting from this point again (this is why
> the bisect log is so useful - no need to retest the above part, you
> can just mindlessly do that sequence by hand without testing), and
> starting with this commit:
> 
> > # good: [47dfe4037e37b2843055ea3feccf1c335ea23a9c] Merge branch
> > 'for-3.17' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup git bisect
> > good 47dfe4037e37b2843055ea3feccf1c335ea23a9c

> Of course, bugs can come in from anywhere, so it *could* be the
> tracing one, and it *could* be the merge commit, but my gut just
> screams that you probably missed one bad kernel, and marked it good.
> And it's really that very first one (ie commit
> 47dfe4037e37b2843055ea3feccf1c335ea23a9c) that contains most of the
> actually suspect code, so I'd really like you to re-test that one a
> lot before you call it "good" again.
> 
> Humor me.
> 
> I added Tejun to the Cc, just because I wanted to give him a heads-up
> that I am tentatively starting to blame him in my dark little mind..

	:)

	I understand, Linus. I'll test the 47dfe4037 commit you
suggested more thoroughly and return tomorrow.

-- 
Linux 3.16.0-00409-gb8c0aa4: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03  2:03                                     ` Lai Jiangshan
@ 2014-12-03  5:22                                       ` Paul E. McKenney
  0 siblings, 0 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-03  5:22 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Dâniel Fraga, Linus Torvalds, Linux Kernel Mailing List

On Wed, Dec 03, 2014 at 10:03:56AM +0800, Lai Jiangshan wrote:
> On 12/03/2014 12:58 AM, Dâniel Fraga wrote:
> > On Tue, 2 Dec 2014 16:40:37 +0800
> > Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> > 
> >> It is needed at least for testing.
> >>
> >> CONFIG_TREE_PREEMPT_RCU=y with CONFIG_PREEMPT=n is needed for testing too.
> >>
> >> Please enable them (or enable them under CONFIG_RCU_TRACE=y)
> > 
> > 	Lai, sorry but I didn't understand. Do you mean both of them
> > enabled? Because how can CONFIG_TREE_PREEMPT_RCU be enabled without
> > CONFIG_PREEMPT ?
> 
> 
> Sorry, I replied to Paul, and my reply was off-topic; it has nothing
> to do with your reports. Sorry again.
> 
> I think we need two combinations for testing (not in mainline, but I think
> they should be enabled for test farms).
> 
> So I hope Paul enables them.
> 
> combination1: CONFIG_TREE_PREEMPT_RCU=n & CONFIG_PREEMPT=y
> combination2: CONFIG_TREE_PREEMPT_RCU=y & CONFIG_PREEMPT=n
> 
> The core code should work correctly in these combinations.
> I agree with Paul that these combinations should not be enabled in production,
> so my request is: enable these combinations under CONFIG_RCU_TRACE
> or CONFIG_TREE_RCU_TRACE.
> 
> For myself, I always edit the Kconfig directly, thus it is not a problem
> for me.  But there is no way for test farms to test these combinations.

OK, I'll bite...

How have these two combinations helped you in your testing?

The reason I ask is that I am actually trying to -decrease- the RCU
configurations, not increase them.  Added configurations need to have
strong justification, for example, the kernel-tracing/patching need
for tasks_rcu.

							Thanx, Paul

> Thanks,
> Lai
> 
> > 
> > 	If you mean both enabled, I already reported a call trace with
> > both enabled:
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=85941
> > 
> > 	Please see my previous answer to Linus and Paul too.
> > 
> > 	Regarding CONFIG_RCU_TRACE, do you mean
> > "CONFIG_TREE_RCU_TRACE"? I couldn't find CONFIG_RCU_TRACE.
> > 
> > 	Thanks.
> > 
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03  4:14                                                           ` Linus Torvalds
  2014-12-03  4:51                                                             ` Dâniel Fraga
@ 2014-12-03  6:02                                                             ` Chris Rorvick
  2014-12-03 15:22                                                               ` Linus Torvalds
  2014-12-03 14:54                                                             ` Tejun Heo
  2 siblings, 1 reply; 486+ messages in thread
From: Chris Rorvick @ 2014-12-03  6:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dâniel Fraga, Tejun Heo, Paul E. McKenney,
	Linux Kernel Mailing List

On Tue, Dec 2, 2014 at 10:14 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Put another way: "bad" is generally more trustworthy (because you
> actively saw the bug),

Makes sense, but ...

> while a "good" _before_ a subsequent bad is
> also trustworthy (because if the "good" kernel contained the bug and
> you should have marked it bad, we'd then go on to test all the commits
> that were *not* the bug, so we'd never see a "bad" kernel again).

wouldn't marking a bad commit "good" cause you to not see a *good*
kernel again?  Marking it "good" would seem to push the search away from
the bug toward the current "bad" commit.

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03  4:14                                                           ` Linus Torvalds
  2014-12-03  4:51                                                             ` Dâniel Fraga
  2014-12-03  6:02                                                             ` Chris Rorvick
@ 2014-12-03 14:54                                                             ` Tejun Heo
  2 siblings, 0 replies; 486+ messages in thread
From: Tejun Heo @ 2014-12-03 14:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dâniel Fraga, Paul E. McKenney, Linux Kernel Mailing List

Hello,

On Tue, Dec 02, 2014 at 08:14:52PM -0800, Linus Torvalds wrote:
> I added Tejun to the Cc, just because I wanted to give him a heads-up
> that I am tentatively starting to blame him in my dark little mind..

Yeap, keeping watch on the thread and working on a patch to dump more
workqueue info on sysrq (I don't know which sysrq letter to hang it
on yet, most likely it'd get appended to the tasks dump).  For all
three subsystems, the for-3.17 pulls contained quite a few changes.
I've skimmed through the commits but nothing rings a bell - I've
never seen this pattern of failures in any of the three subsystems.
Maybe a subtle percpu bug or cpuset messing up scheduling somehow?
Anyways, let's see how 47dfe4037 does.
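
(For context only, a rough sketch of how such a sysrq handler would typically
be hooked up - the key letter, the names, and the dump body below are
placeholders, not the actual patch:)

/* sketch only - names, key choice and the dump body are made up */
#include <linux/module.h>
#include <linux/sysrq.h>
#include <linux/printk.h>

static void sysrq_handle_show_workqueues(int key)
{
	/* a real patch would walk the worker pools here and print
	 * pending work items, busy workers, etc. */
	pr_info("sysrq: show workqueue state (sketch)\n");
}

static struct sysrq_key_op sysrq_show_workqueues_op = {
	.handler	= sysrq_handle_show_workqueues,
	.help_msg	= "show-workqueues",
	.action_msg	= "Show workqueue state",
	.enable_mask	= SYSRQ_ENABLE_DUMP,
};

static int __init sysrq_wq_sketch_init(void)
{
	/* 'y' picked arbitrarily just to illustrate the hook-up */
	return register_sysrq_key('y', &sysrq_show_workqueues_op);
}

static void __exit sysrq_wq_sketch_exit(void)
{
	unregister_sysrq_key('y', &sysrq_show_workqueues_op);
}

module_init(sysrq_wq_sketch_init);
module_exit(sysrq_wq_sketch_exit);
MODULE_LICENSE("GPL");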

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03  6:02                                                             ` Chris Rorvick
@ 2014-12-03 15:22                                                               ` Linus Torvalds
  2014-12-04  8:43                                                                 ` Dâniel Fraga
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-03 15:22 UTC (permalink / raw)
  To: Chris Rorvick
  Cc: Dâniel Fraga, Tejun Heo, Paul E. McKenney,
	Linux Kernel Mailing List

On Tue, Dec 2, 2014 at 10:02 PM, Chris Rorvick <chris@rorvick.com> wrote:
>
>> while a "good" _before_ a subsequent bad is
>> also trustworthy (because if the "good" kernel contained the bug and
>> you should have marked it bad, we'd then go on to test all the commits
>> that were *not* the bug, so we'd never see a "bad" kernel again).
>
> wouldn't marking a bad commit "good" cause you to not see a *good*
> kernel again?  Marking it "good" would seem to push the search away from
> the bug toward the current "bad" commit.

Yes, you're right.

The "long series of 'good'" at the end actually implies that the last
'bad' is questionable - just marking a bad kernel as being good should
push us further into 'bad' land, not the other way around. While
marking a 'good' kernel as 'bad' will push us into 'bug hasn't
happened yet' land.

Which is somewhat odd, because the bad kernels should be easy to spot.
But it could happen if the test got screwed up (by not booting the
right kernel, for example).

Or - and this is the scary part, and one of the huge downsides of 'git
bisect' - it just ends up meaning that the bug comes and goes and is
not quite repeatable enough.

Anyway, Dâniel, if you restart the bisection today, start it one
kernel earlier: re-test the last 'bad' kernel too. So start with
reconfirming that the c9b88e958182 kernel was bad (that *might* be as
easy as just checking your old kernel boot logs, and verifying that
"yes, I really booted it, and yes, it clearly hung and I had to
hard-reboot into it")

                   Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 17:14                                           ` Chris Mason
@ 2014-12-03 18:41                                             ` Dave Jones
  2014-12-03 18:45                                               ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-03 18:41 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Tue, Dec 02, 2014 at 12:14:53PM -0500, Chris Mason wrote:
 > On Tue, Dec 2, 2014 at 11:33 AM, Linus Torvalds 
 > <torvalds@linux-foundation.org> wrote:
 > > On Tue, Dec 2, 2014 at 6:13 AM, Mike Galbraith 
 > > <umgwanakikbuti@gmail.com> wrote:
 > > 
 > > At the same time, the whole "incapacitated by the rt throttle long
 > > enough for the hard lockup detector to trigger" commentary about that
 > > skip_clock_update issue does make me go "Hmmm..". It would certainly
 > > explain Dave's incomprehensible watchdog messages..
 > 
 > Dave's first email mentioned that he had panic on softlockup enabled, 
 > but even with that off the box wasn't recovering.

Not sure if I mentioned in an earlier post, but when I'm local to the machine,
I've disabled reboot-on-lockup, but yes the problem case is the
situation where it actually does lock up afterwards.

 > In my trinity runs here, I've gotten softlockup warnings where the box 
 > eventually recovered.  I'm wondering if some of the "bad" commits in 
 > the bisection are really false positives where the box would have been 
 > able to recover if we'd killed off all the trinity procs and given it 
 > time to breathe.

So I've done multiple runs against 3.17-rc1 during bisecting, and hit
the case you describe, where I get a dump like below, and then it
eventually recovers. (Trinity then exits because the taint flag
changes).

I've been stuck on this kernel for a few days now trying to prove it
good/bad one way or the other, and I'm leaning towards good, given
that it recovers, even though the traces look similar.

	Dave


[ 9862.915562] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c29:13237]
[ 9862.915684] Modules linked in: 8021q garp stp tun fuse bnep hidp llc2 af_key nfnetlink can_bcm scsi_transport_iscsi can_raw nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc rfcomm bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep e1000e snd_seq coretemp hwmon x86_pkg_temp_thermal nfsd kvm_intel kvm snd_seq_device snd_pcm snd_timer ptp auth_rpcgss snd shpchp oid_registry crct10dif_pclmul crc32c_intel ghash_clmulni_intel usb_debug soundcore pps_core nfs_acl microcode serio_raw pcspkr lockd sunrpc
[ 9862.915987] CPU: 0 PID: 13237 Comm: trinity-c29 Not tainted 3.17.0-rc1+ #112
[ 9862.916046] task: ffff88022657dbc0 ti: ffff8800962b0000 task.ti: ffff8800962b0000
[ 9862.916071] RIP: 0010:[<ffffffff81042569>]  [<ffffffff81042569>] lookup_address_in_pgd+0x89/0xe0
[ 9862.916103] RSP: 0018:ffff8800962b36a8  EFLAGS: 00000202
[ 9862.917024] RAX: ffff88024da748d0 RBX: ffffffff81164c63 RCX: 0000000000000001
[ 9862.917956] RDX: ffff8800962b3740 RSI: ffff8801a3417000 RDI: 000000024da74000
[ 9862.918891] RBP: ffff8800962b36a8 R08: 00003ffffffff000 R09: ffff880000000000
[ 9862.919828] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff81375a47
[ 9862.920758] R13: ffff8800962b3618 R14: ffff8802441d81f0 R15: fffff70c86134602
[ 9862.921681] FS:  00007f06c569f740(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
[ 9862.922603] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9862.923526] CR2: 0000000002378590 CR3: 00000000a1ba3000 CR4: 00000000001407f0
[ 9862.924459] DR0: 00007f40a579e000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9862.925386] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 9862.926342] Stack:
[ 9862.927255]  ffff8800962b36b8 ffffffff810425e8 ffff8800962b36c8 ffffffff810426ab
[ 9862.928188]  ffff8800962b37c0 ffffffff810427a0 ffffffff810bfc5e ffff8800962b36f0
[ 9862.929127]  ffffffff810a89f5 ffff8800962b3768 ffffffff810c19b4 0000000000000002
[ 9862.930072] Call Trace:
[ 9862.931008]  [<ffffffff810425e8>] lookup_address+0x28/0x30
[ 9862.931958]  [<ffffffff810426ab>] _lookup_address_cpa.isra.9+0x3b/0x40
[ 9862.932913]  [<ffffffff810427a0>] __change_page_attr_set_clr+0xf0/0xab0
[ 9862.933869]  [<ffffffff810bfc5e>] ? put_lock_stats.isra.23+0xe/0x30
[ 9862.934831]  [<ffffffff810a89f5>] ? local_clock+0x25/0x30
[ 9862.935827]  [<ffffffff810c19b4>] ? __lock_acquire.isra.31+0x264/0xa60
[ 9862.936798]  [<ffffffff8109bfed>] ? finish_task_switch+0x7d/0x120
[ 9862.937765]  [<ffffffff810bfc5e>] ? put_lock_stats.isra.23+0xe/0x30
[ 9862.938730]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[ 9862.939698]  [<ffffffff8104418b>] kernel_map_pages+0x7b/0x120
[ 9862.940653]  [<ffffffff81178517>] get_page_from_freelist+0x497/0xaa0
[ 9862.941597]  [<ffffffff81179498>] __alloc_pages_nodemask+0x228/0xb20
[ 9862.942539]  [<ffffffff810a89f5>] ? local_clock+0x25/0x30
[ 9862.943469]  [<ffffffff810c19b4>] ? __lock_acquire.isra.31+0x264/0xa60
[ 9862.944411]  [<ffffffff8135fe50>] ? __radix_tree_preload+0x60/0xf0
[ 9862.945357]  [<ffffffff8135fe50>] ? __radix_tree_preload+0x60/0xf0
[ 9862.946326]  [<ffffffff811c14c1>] alloc_pages_vma+0xf1/0x1b0
[ 9862.947263]  [<ffffffff8118766e>] ? shmem_alloc_page+0x6e/0xc0
[ 9862.948205]  [<ffffffff8118766e>] shmem_alloc_page+0x6e/0xc0
[ 9862.949148]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[ 9862.950090]  [<ffffffff810a2c5d>] ? get_parent_ip+0xd/0x50
[ 9862.951012]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[ 9862.951913]  [<ffffffff81382666>] ? __percpu_counter_add+0x86/0xb0
[ 9862.952811]  [<ffffffff811a4362>] ? __vm_enough_memory+0x62/0x1c0
[ 9862.953700]  [<ffffffff812eb0c7>] ? cap_vm_enough_memory+0x47/0x50
[ 9862.954591]  [<ffffffff81189f00>[ 9893.337880] [sched_delayed] sched: RT throttling activated
[ 9918.893057] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 24s! [swapper/1:0]
[ 9918.894352] Modules linked in: 8021q garp stp tun fuse bnep hidp llc2 af_key nfnetlink can_bcm scsi_transport_iscsi can_raw nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc rfcomm bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep e1000e snd_seq coretemp hwmon x86_pkg_temp_thermal nfsd kvm_intel kvm snd_seq_device snd_pcm snd_timer ptp auth_rpcgss snd shpchp oid_registry crct10dif_pclmul crc32c_intel ghash_clmulni_intel usb_debug soundcore pps_core nfs_acl microcode serio_raw pcspkr lockd sunrpc
[ 9918.901158] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G             L 3.17.0-rc1+ #112
[ 9918.903863] task: ffff880242b716f0 ti: ffff88024240c000 task.ti: ffff88024240c000
[ 9918.905218] RIP: 0010:[<ffffffff81645849>]  [<ffffffff81645849>] cpuidle_enter_state+0x79/0x1c0
[ 9918.906591] RSP: 0000:ffff88024240fe60  EFLAGS: 00000246
[ 9918.907933] RAX: 0000000000000000 RBX: ffff880242b716f0 RCX: 0000000000000019
[ 9918.909264] RDX: 20c49ba5e353f7cf RSI: 000000000003cea4 RDI: 00536da522b45eb6
[ 9918.910594] RBP: ffff88024240fe98 R08: 000000008baf9f86 R09: 0000000000000000
[ 9918.911916] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88024240fdf0
[ 9918.913243] R13: ffffffff810bfc5e R14: ffff88024240fdd0 R15: 00000000000001e1
[ 9918.914523] FS:  0000000000000000(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
[ 9918.915815] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9918.917108] CR2: 000000000044bfa0 CR3: 0000000001c11000 CR4: 00000000001407e0
[ 9918.918414] DR0: 00007f40a579e000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9918.919710] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 9918.920998] Stack:
[ 9918.922273]  00000906c89bd5bb ffffffff81cae7f0 ffffffff81d1d290 ffffe8ffff004da8
[ 9918.923576]  ffff88024240c000 ffffffff81cae620 ffff88024240c000 ffff88024240fea8
[ 9918.924845]  ffffffff81645a47 ffff88024240ff10 ffffffff810b9fb4 ffff88024240ffd8
[ 9918.926108] Call Trace:
[ 9918.927351]  [<ffffffff81645a47>] cpuidle_enter+0x17/0x20
[ 9918.928602]  [<ffffffff810b9fb4>] cpu_startup_entry+0x384/0x410
[ 9918.929850]  [<ffffffff8102ff37>] start_secondary+0x237/0x340
[ 9918.931086] Code: d0 48 89 df ff 50 48 41 89 c5 e8 b3 5c aa ff 44 8b 63 04 49 89 c7 0f 1f 44 00 00 e8 a2 19 b0 ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 2b 7d c8 4c 89 f8 49 c1 ff 3f 48 f7 ea b8 ff ff ff 7f 48 c1 
[ 9918.933755] sending NMI to other CPUs:
[ 9918.935008] NMI backtrace for cpu 2
[ 9918.936208] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.17.0-rc1+ #112
[ 9918.938592] task: ffff880242b744d0 ti: ffff880242414000 task.ti: ffff880242414000
[ 9918.939793] RIP: 0010:[<ffffffff813c9e65>]  [<ffffffff813c9e65>] intel_idle+0xd5/0x180
[ 9918.941003] RSP: 0018:ffff880242417e20  EFLAGS: 00000046
[ 9918.942191] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[ 9918.943369] RDX: 0000000000000000 RSI: ffff880242417fd8 RDI: 0000000000000002
[ 9918.944535] RBP: ffff880242417e50 R08: 000000008baf9f86 R09: 0000000000000000
[ 9918.945695] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[ 9918.946856] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880242414000
[ 9918.948013] FS:  0000000000000000(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
[ 9918.949175] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9918.950339] CR2: 000000000068f760 CR3: 0000000001c11000 CR4: 00000000001407e0
[ 9918.951515] DR0: 00007f40a579e000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9918.952678] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 9918.953840] Stack:
[ 9918.954988]  0000000242414000 d7251be6f43cb9e7 ffffe8ffff204da8 0000000000000005
[ 9918.956164]  ffffffff81cae620 0000000000000002 ffff880242417e98 ffffffff81645825
[ 9918.957345]  00000906cb95d0a1 ffffffff81cae7f0 ffffffff81d1d290 ffffe8ffff204da8
[ 9918.958527] Call Trace:
[ 9918.959692]  [<ffffffff81645825>] cpuidle_enter_state+0x55/0x1c0
[ 9918.960866]  [<ffffffff81645a47>] cpuidle_enter+0x17/0x20
[ 9918.962029]  [<ffffffff810b9fb4>] cpu_startup_entry+0x384/0x410
[ 9918.963185]  [<ffffffff8102ff37>] start_secondary+0x237/0x340
[ 9918.964339] Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 ba 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[ 9918.966852] NMI backtrace for cpu 3
[ 9918.968059] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.17.0-rc1+ #112
[ 9918.970450] task: ffff880242b72de0 ti: ffff880242418000 task.ti: ffff880242418000
[ 9918.971652] RIP: 0010:[<ffffffff813c9e65>]  [<ffffffff813c9e65>] intel_idle+0xd5/0x180
[ 9918.972862] RSP: 0018:ffff88024241be20  EFLAGS: 00000046
[ 9918.974045] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[ 9918.975212] RDX: 0000000000000000 RSI: ffff88024241bfd8 RDI: 0000000000000003
[ 9918.976349] RBP: ffff88024241be50 R08: 000000008baf9f86 R09: 0000000000000000
[ 9918.977463] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[ 9918.978550] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880242418000
[ 9918.979628] FS:  0000000000000000(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
[ 9918.980695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9918.981739] CR2: 00007f37d766f050 CR3: 0000000001c11000 CR4: 00000000001407e0
[ 9918.982786] DR0: 00007f40a579e000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9918.983821] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 9918.984849] Stack:
[ 9918.985866]  0000000342418000 63d0570b22f343d2 ffffe8ffff404da8 0000000000000005
[ 9918.986911]  ffffffff81cae620 0000000000000003 ffff88024241be98 ffffffff81645825
[ 9918.987961]  00000906cb9636ab ffffffff81cae7f0 ffffffff81d1d290 ffffe8ffff404da8
[ 9918.989007] Call Trace:
[ 9918.990036]  [<ffffffff81645825>] cpuidle_enter_state+0x55/0x1c0
[ 9918.991072]  [<ffffffff81645a47>] cpuidle_enter+0x17/0x20
[ 9918.992097]  [<ffffffff810b9fb4>] cpu_startup_entry+0x384/0x410
[ 9918.993127]  [<ffffffff8102ff37>] start_secondary+0x237/0x340
[ 9918.994152] Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 18:41                                             ` Dave Jones
@ 2014-12-03 18:45                                               ` Linus Torvalds
  2014-12-03 19:00                                                 ` Dave Jones
                                                                   ` (2 more replies)
  0 siblings, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-03 18:45 UTC (permalink / raw)
  To: Dave Jones, Chris Mason, Linus Torvalds, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Wed, Dec 3, 2014 at 10:41 AM, Dave Jones <davej@redhat.com> wrote:
>
> I've been stuck on this kernel for a few days now trying to prove it
> good/bad one way or the other, and I'm leaning towards good, given
> that it recovers, even though the traces look similar.

Ugh. But this does *not* happen with 3.16, right? Even the non-fatal case?

If so, I'd be inclined to call it "bad". But there might well be two
bugs: one that makes that NMI watchdog trigger, and another one that
then makes it be a hard lockup. I'd think it would be good to figure
out the "NMI watchdog starts triggering" one first, though.

                 Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 18:45                                               ` Linus Torvalds
@ 2014-12-03 19:00                                                 ` Dave Jones
  2014-12-03 19:25                                                   ` Linus Torvalds
  2014-12-03 19:59                                                   ` Chris Mason
  2014-12-04  0:27                                                 ` Dave Jones
  2014-12-05 17:15                                                 ` Dave Jones
  2 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-03 19:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Wed, Dec 03, 2014 at 10:45:57AM -0800, Linus Torvalds wrote:
 > On Wed, Dec 3, 2014 at 10:41 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > I've been stuck on this kernel for a few days now trying to prove it
 > > good/bad one way or the other, and I'm leaning towards good, given
 > > that it recovers, even though the traces look similar.
 > 
 > Ugh. But this does *not* happen with 3.16, right? Even the non-fatal case?

Correct. At least not in any of the runs that I did to date.

 > If so, I'd be inclined to call it "bad". But there might well be two
 > bugs: one that makes that NMI watchdog trigger, and another one that
 > then makes it be a hard lockup. I'd think it would be good to figure
 > out the "NMI watchdog starts triggering" one first, though.

I think you're right.

So right after sending my last mail, I rebooted, and restarted the run
on the same kernel again.

As I was writing this mail, this happened.

[  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c178:20182]

and that's all that made it over the console. I couldn't log in via ssh,
and thought "ah-ha, so it IS bad".  I walked over to reboot it, and
found I could actually log in on the console. Check out this dmesg..

[  503.683055] Clocksource tsc unstable (delta = -95946009388 ns)
[  503.692038] Switched to clocksource hpet
[  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c178:20182]
[  524.420972] Modules linked in: fuse tun rfcomm llc2 af_key nfnetlink scsi_transport_iscsi can_bcm bnep can_raw nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul crc32c_intel ghash_clmulni_intel e1000e snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer microcode snd serio_raw pcspkr usb_debug ptp pps_core shpchp soundcore
[  524.421288] CPU: 0 PID: 20182 Comm: trinity-c178 Not tainted 3.17.0-rc1+ #112
[  524.421351] task: ffff8801cd63c4d0 ti: ffff8801d2138000 task.ti: ffff8801d2138000
[  524.421377] RIP: 0010:[<ffffffff8136968d>]  [<ffffffff8136968d>] copy_user_handle_tail+0x6d/0x90
[  524.421411] RSP: 0018:ffff8801d213bf00  EFLAGS: 00000202
[  524.421430] RAX: 000000000007a8d9 RBX: ffffffff817b2c64 RCX: 0000000000000000
[  524.421455] RDX: 0000000000056ddc RSI: ffff88023412baf5 RDI: ffff88023412baf4
[  524.421480] RBP: ffff8801d213bf00 R08: 0000000000000000 R09: 0000000000000000
[  524.421504] R10: 0000000000000100 R11: 0000000000000000 R12: ffffffff817b92d0
[  524.421528] R13: 00007f6a24fe0000 R14: ffff8801d2138000 R15: 0000000000000001
[  524.421552] FS:  00007f6a24fd0740(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
[  524.421579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  524.421600] CR2: 00007f6a24fe0000 CR3: 00000002053c8000 CR4: 00000000001407f0
[  524.421624] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  524.421648] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  524.421672] Stack:
[  524.421683]  ffff8801d213bf78 ffffffff812e4a85 00007f6a2452f068 0000000000000000
[  524.421716]  ffffffff3fffffff 00000000000003e8 00007f6a2452f000 00000000000000f8
[  524.421748]  0000000000000000 00000000a417dc9d 00000000000000f8 00007f6a2452f000
[  524.421781] Call Trace:
[  524.422754]  [<ffffffff812e4a85>] SyS_add_key+0xd5/0x240
[  524.423736]  [<ffffffff817b2264>] tracesys+0xdd/0xe2
[  524.424713] Code: c0 74 d3 85 d2 89 d0 74 39 85 c9 74 35 45 31 c0 eb 0c 0f 1f 40 00 83 ea 01 74 17 48 89 f7 48 8d 77 01 44 89 c1 0f 1f 00 c6 07 00 <0f> 1f 00 85 c9 74 e4 0f 1f 00 5d c3 0f 1f 80 00 00 00 00 31 c0 
[  524.426861] sending NMI to other CPUs:
[  524.427867] NMI backtrace for cpu 3
[  524.428868] CPU: 3 PID: 20165 Comm: trinity-c161 Not tainted 3.17.0-rc1+ #112
[  524.430914] task: ffff8801fe67dbc0 ti: ffff8801fe70c000 task.ti: ffff8801fe70c000
[  524.431951] RIP: 0010:[<ffffffff810f99da>]  [<ffffffff810f99da>] generic_exec_single+0xea/0x1a0
[  524.433004] RSP: 0018:ffff8801fe70fc40  EFLAGS: 00000202
[  524.434051] RAX: 0000000000000000 RBX: ffff8801fe70fc40 RCX: 0000000000000038
[  524.435108] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[  524.436165] RBP: ffff8801fe70fc90 R08: ffff880242bfa3f0 R09: 0000000000000000
[  524.437221] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  524.438278] R13: 0000000000000001 R14: ffff880238ef1290 R15: ffffffff8115fd30
[  524.439339] FS:  00007f6a24fd0740(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
[  524.440415] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  524.441494] CR2: 00007f6a23446001 CR3: 00000001cd627000 CR4: 00000000001407e0
[  524.442579] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  524.443665] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  524.444749] Stack:
[  524.445825]  0000000000000000 ffffffff8115fd30 ffff880238ef1290 0000000000000003
[  524.446926]  00000000af3b5f31 00000000ffffffff 0000000000000000 ffffffff8115fd30
[  524.448028]  ffff880238ef1290 0000000000000001 ffff8801fe70fcd0 ffffffff810f9b5a
[  524.449115] Call Trace:
[  524.450179]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  524.451259]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  524.452334]  [<ffffffff810f9b5a>] smp_call_function_single+0x6a/0xe0
[  524.453411]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  524.454483]  [<ffffffff8116036a>] perf_event_read+0xca/0xd0
[  524.455549]  [<ffffffff81160400>] perf_event_read_value+0x90/0xe0
[  524.456618]  [<ffffffff81161b1e>] perf_read+0x20e/0x360
[  524.457689]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  524.458761]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  524.459827]  [<ffffffff811dfdd3>] do_loop_readv_writev+0x63/0x90
[  524.460887]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  524.461944]  [<ffffffff811e1c97>] do_readv_writev+0x267/0x280
[  524.462996]  [<ffffffff81375a47>] ? debug_smp_processor_id+0x17/0x20
[  524.464029]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  524.465040]  [<ffffffff810a2c5d>] ? get_parent_ip+0xd/0x50
[  524.466027]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  524.466995]  [<ffffffff817b1377>] ? _raw_spin_unlock_irq+0x37/0x60
[  524.467940]  [<ffffffff810e672a>] ? do_setitimer+0x1ca/0x250
[  524.468874]  [<ffffffff811e1ce9>] vfs_readv+0x39/0x50
[  524.469791]  [<ffffffff811e1dac>] SyS_readv+0x5c/0x100
[  524.470689]  [<ffffffff817b2264>] tracesys+0xdd/0xe2
[  524.471568] Code: 00 4c 1d 00 48 89 de 48 03 14 c5 e0 af d1 81 48 89 df e8 5a 47 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48 8b 4d d0 65 48 33 0c 25 28 00 00 00 
[  524.473500] NMI backtrace for cpu 1
[  524.473584] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 45.633 msecs
[  524.475347] CPU: 1 PID: 20241 Comm: trinity-c237 Not tainted 3.17.0-rc1+ #112
[  524.477246] task: ffff8801e32c16f0 ti: ffff88008696c000 task.ti: ffff88008696c000
[  524.478212] RIP: 0010:[<ffffffff810f99d8>]  [<ffffffff810f99d8>] generic_exec_single+0xe8/0x1a0
[  524.479189] RSP: 0018:ffff88008696fc40  EFLAGS: 00000202
[  524.480152] RAX: ffff880062d6bc00 RBX: ffff88008696fc40 RCX: ffff880062d6bc40
[  524.481130] RDX: ffff8802441d4c00 RSI: ffff88008696fc40 RDI: ffff88008696fc40
[  524.482108] RBP: ffff88008696fc90 R08: 0000000000000001 R09: 0000000000000001
[  524.483084] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  524.484051] R13: 0000000000000001 R14: ffff880238ef1bd8 R15: ffffffff8115fd30
[  524.485022] FS:  00007f6a24fd0740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
[  524.486001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  524.486975] CR2: 0000000000000000 CR3: 00000000869c1000 CR4: 00000000001407e0
[  524.487952] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  524.488935] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  524.489891] Stack:
[  524.490817]  ffff880062d6bc40 ffffffff8115fd30 ffff880238ef1bd8 0000000000000003
[  524.491770]  00000000d88cff08 00000000ffffffff 0000000000000000 ffffffff8115fd30
[  524.492729]  ffff880238ef1bd8 0000000000000001 ffff88008696fcd0 ffffffff810f9b5a
[  524.493684] Call Trace:
[  524.494611]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  524.495530]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  524.496433]  [<ffffffff810f9b5a>] smp_call_function_single+0x6a/0xe0
[  524.497334]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  524.498227]  [<ffffffff8116036a>] perf_event_read+0xca/0xd0
[  524.499117]  [<ffffffff81160400>] perf_event_read_value+0x90/0xe0
[  524.500007]  [<ffffffff81161b1e>] perf_read+0x20e/0x360
[  524.500895]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  524.501786]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  524.502668]  [<ffffffff811dfdd3>] do_loop_readv_writev+0x63/0x90
[  524.503548]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  524.504419]  [<ffffffff811e1c97>] do_readv_writev+0x267/0x280
[  524.505295]  [<ffffffff81375a47>] ? debug_smp_processor_id+0x17/0x20
[  524.506172]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  524.507045]  [<ffffffff810a2c5d>] ? get_parent_ip+0xd/0x50
[  524.507911]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  524.508774]  [<ffffffff817b1377>] ? _raw_spin_unlock_irq+0x37/0x60
[  524.509634]  [<ffffffff810e672a>] ? do_setitimer+0x1ca/0x250
[  524.510488]  [<ffffffff811e1ce9>] vfs_readv+0x39/0x50
[  524.511330]  [<ffffffff811e1dac>] SyS_readv+0x5c/0x100
[  524.512180]  [<ffffffff817b2264>] tracesys+0xdd/0xe2
[  524.513026] Code: c7 c2 00 4c 1d 00 48 89 de 48 03 14 c5 e0 af d1 81 48 89 df e8 5a 47 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 <f3> 90 f6 43 18 01 75 f8 31 c0 48 8b 4d d0 65 48 33 0c 25 28 00 
[  524.514900] NMI backtrace for cpu 2
[  524.514903] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 87.033 msecs
[  524.516709] CPU: 2 PID: 20160 Comm: trinity-c156 Not tainted 3.17.0-rc1+ #112
[  524.518568] task: ffff88006b945bc0 ti: ffff880062d68000 task.ti: ffff880062d68000
[  524.519521] RIP: 0010:[<ffffffff810f99de>]  [<ffffffff810f99de>] generic_exec_single+0xee/0x1a0
[  524.520488] RSP: 0018:ffff880062d6bc40  EFLAGS: 00000202
[  524.521456] RAX: ffff8801fe70fc00 RBX: ffff880062d6bc40 RCX: ffff8801fe70fc40
[  524.522424] RDX: ffff8802441d4c00 RSI: ffff880062d6bc40 RDI: ffff880062d6bc40
[  524.523394] RBP: ffff880062d6bc90 R08: 0000000000000001 R09: 0000000000000001
[  524.524360] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  524.525325] R13: 0000000000000001 R14: ffff880238ef5cd0 R15: ffffffff8115fd30
[  524.526293] FS:  00007f6a24fd0740(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
[  524.527269] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  524.528248] CR2: 00000000019ca288 CR3: 000000007a7bb000 CR4: 00000000001407e0
[  524.529238] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  524.530230] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  524.531200] Stack:
[  524.532141]  ffff8801fe70fc40 ffffffff8115fd30 ffff880238ef5cd0 0000000000000003
[  524.533104]  000000008b5d668d 00000000ffffffff 0000000000000000 ffffffff8115fd30
[  524.534065]  ffff880238ef5cd0 0000000000000001 ffff880062d6bcd0 ffffffff810f9b5a
[  524.535014] Call Trace:
[  524.535938]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  524.536854]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  524.537756]  [<ffffffff810f9b5a>] smp_call_function_single+0x6a/0xe0
[  524.538648]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  524.539536]  [<ffffffff8116036a>] perf_event_read+0xca/0xd0
[  524.540425]  [<ffffffff81160400>] perf_event_read_value+0x90/0xe0
[  524.541317]  [<ffffffff81161b1e>] perf_read+0x20e/0x360
[  524.542202]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  524.543089]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  524.543967]  [<ffffffff811dfdd3>] do_loop_readv_writev+0x63/0x90
[  524.544840]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  524.545709]  [<ffffffff811e1c97>] do_readv_writev+0x267/0x280
[  524.546581]  [<ffffffff81375a47>] ? debug_smp_processor_id+0x17/0x20
[  524.547455]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  524.548327]  [<ffffffff810a2c5d>] ? get_parent_ip+0xd/0x50
[  524.549188]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  524.550046]  [<ffffffff817b1377>] ? _raw_spin_unlock_irq+0x37/0x60
[  524.550905]  [<ffffffff810e672a>] ? do_setitimer+0x1ca/0x250
[  524.551755]  [<ffffffff811e1ce9>] vfs_readv+0x39/0x50
[  524.552600]  [<ffffffff811e1dac>] SyS_readv+0x5c/0x100
[  524.553444]  [<ffffffff817b2264>] tracesys+0xdd/0xe2
[  524.554281] Code: 48 89 de 48 03 14 c5 e0 af d1 81 48 89 df e8 5a 47 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 4d d0 65 48 33 0c 25 28 00 00 00 0f 85 8e 00 
[  524.556148] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 128.279 msecs
[  548.406844] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c178:20182]
[  548.407766] Modules linked in: fuse tun rfcomm llc2 af_key nfnetlink scsi_transport_iscsi can_bcm bnep can_raw nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul crc32c_intel ghash_clmulni_intel e1000e snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer microcode snd serio_raw pcspkr usb_debug ptp pps_core shpchp soundcore
[  548.412860] CPU: 0 PID: 20182 Comm: trinity-c178 Tainted: G             L 3.17.0-rc1+ #112
[  548.414955] task: ffff8801cd63c4d0 ti: ffff8801d2138000 task.ti: ffff8801d2138000
[  548.416020] RIP: 0010:[<ffffffff8136968d>]  [<ffffffff8136968d>] copy_user_handle_tail+0x6d/0x90
[  548.417110] RSP: 0018:ffff8801d213bf00  EFLAGS: 00000202
[  548.418188] RAX: 000000000007a8d9 RBX: ffffffff817b2c64 RCX: 0000000000000000
[  548.419278] RDX: 0000000000056ddc RSI: ffff88023412baf5 RDI: ffff88023412baf4
[  548.420372] RBP: ffff8801d213bf00 R08: 0000000000000000 R09: 0000000000000000
[  548.421473] R10: 0000000000000100 R11: 0000000000000000 R12: ffffffff817b92d0
[  548.422576] R13: 00007f6a24fe0000 R14: ffff8801d2138000 R15: 0000000000000001
[  548.423662] FS:  00007f6a24fd0740(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
[  548.424731] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  548.425801] CR2: 00007f6a24fe0000 CR3: 00000002053c8000 CR4: 00000000001407f0
[  548.426870] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  548.427923] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  548.428956] Stack:
[  548.429968]  ffff8801d213bf78 ffffffff812e4a85 00007f6a2452f068 0000000000000000
[  548.430992]  ffffffff3fffffff 00000000000003e8 00007f6a2452f000 00000000000000f8
[  548.432009]  0000000000000000 00000000a417dc9d 00000000000000f8 00007f6a2452f000
[  548.433023] Call Trace:
[  548.434031]  [<ffffffff812e4a85>] SyS_add_key+0xd5/0x240
[  548.435046]  [<ffffffff817b2264>] tracesys+0xdd/0xe2
[  548.436057] Code: c0 74 d3 85 d2 89 d0 74 39 85 c9 74 35 45 31 c0 eb 0c 0f 1f 40 00 83 ea 01 74 17 48 89 f7 48 8d 77 01 44 89 c1 0f 1f 00 c6 07 00 <0f> 1f 00 85 c9 74 e4 0f 1f 00 5d c3 0f 1f 80 00 00 00 00 31 c0 
[  548.438271] sending NMI to other CPUs:
[  548.439311] NMI backtrace for cpu 3
[  548.440341] CPU: 3 PID: 20165 Comm: trinity-c161 Tainted: G             L 3.17.0-rc1+ #112
[  548.442472] task: ffff8801fe67dbc0 ti: ffff8801fe70c000 task.ti: ffff8801fe70c000
[  548.443553] RIP: 0010:[<ffffffff810f99de>]  [<ffffffff810f99de>] generic_exec_single+0xee/0x1a0
[  548.444639] RSP: 0018:ffff8801fe70fc40  EFLAGS: 00000202
[  548.445718] RAX: 0000000000000000 RBX: ffff8801fe70fc40 RCX: 0000000000000038
[  548.446803] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[  548.447882] RBP: ffff8801fe70fc90 R08: ffff880242bfa3f0 R09: 0000000000000000
[  548.448957] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  548.450039] R13: 0000000000000001 R14: ffff880238ef1290 R15: ffffffff8115fd30
[  548.451120] FS:  00007f6a24fd0740(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
[  548.452211] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  548.453305] CR2: 00007f6a23446001 CR3: 00000001cd627000 CR4: 00000000001407e0
[  548.454411] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  548.455515] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  548.456617] Stack:
[  548.457711]  0000000000000000 ffffffff8115fd30 ffff880238ef1290 0000000000000003
[  548.458831]  00000000af3b5f31 00000000ffffffff 0000000000000000 ffffffff8115fd30
[  548.459954]  ffff880238ef1290 0000000000000001 ffff8801fe70fcd0 ffffffff810f9b5a
[  548.461078] Call Trace:
[  548.462187]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  548.463310]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  548.464425]  [<ffffffff810f9b5a>] smp_call_function_single+0x6a/0xe0
[  548.465541]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  548.466656]  [<ffffffff8116036a>] perf_event_read+0xca/0xd0
[  548.467772]  [<ffffffff81160400>] perf_event_read_value+0x90/0xe0
[  548.468887]  [<ffffffff81161b1e>] perf_read+0x20e/0x360
[  548.470003]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  548.471127]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  548.472230]  [<ffffffff811dfdd3>] do_loop_readv_writev+0x63/0x90
[  548.473310]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  548.474375]  [<ffffffff811e1c97>] do_readv_writev+0x267/0x280
[  548.475432]  [<ffffffff81375a47>] ? debug_smp_processor_id+0x17/0x20
[  548.476470]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  548.477486]  [<ffffffff810a2c5d>] ? get_parent_ip+0xd/0x50
[  548.478479]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  548.479452]  [<ffffffff817b1377>] ? _raw_spin_unlock_irq+0x37/0x60
[  548.480402]  [<ffffffff810e672a>] ? do_setitimer+0x1ca/0x250
[  548.481346]  [<ffffffff811e1ce9>] vfs_readv+0x39/0x50
[  548.482266]  [<ffffffff811e1dac>] SyS_readv+0x5c/0x100
[  548.483164]  [<ffffffff817b2264>] tracesys+0xdd/0xe2
[  548.484050] Code: 48 89 de 48 03 14 c5 e0 af d1 81 48 89 df e8 5a 47 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 4d d0 65 48 33 0c 25 28 00 00 00 0f 85 8e 00 
[  548.485991] NMI backtrace for cpu 2
[  548.486903] CPU: 2 PID: 20160 Comm: trinity-c156 Tainted: G             L 3.17.0-rc1+ #112
[  548.488787] task: ffff88006b945bc0 ti: ffff880062d68000 task.ti: ffff880062d68000
[  548.489743] RIP: 0010:[<ffffffff810f99da>]  [<ffffffff810f99da>] generic_exec_single+0xea/0x1a0
[  548.490709] RSP: 0018:ffff880062d6bc40  EFLAGS: 00000202
[  548.491667] RAX: ffff8801fe70fc00 RBX: ffff880062d6bc40 RCX: ffff8801fe70fc40
[  548.492629] RDX: ffff8802441d4c00 RSI: ffff880062d6bc40 RDI: ffff880062d6bc40
[  548.493590] RBP: ffff880062d6bc90 R08: 0000000000000001 R09: 0000000000000001
[  548.494553] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  548.495511] R13: 0000000000000001 R14: ffff880238ef5cd0 R15: ffffffff8115fd30
[  548.496462] FS:  00007f6a24fd0740(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
[  548.497423] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  548.498382] CR2: 00000000019ca288 CR3: 000000007a7bb000 CR4: 00000000001407e0
[  548.499347] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  548.500305] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  548.501264] Stack:
[  548.502193]  ffff8801fe70fc40 ffffffff8115fd30 ffff880238ef5cd0 0000000000000003
[  548.503131]  000000008b5d668d 00000000ffffffff 0000000000000000 ffffffff8115fd30
[  548.504073]  ffff880238ef5cd0 0000000000000001 ffff880062d6bcd0 ffffffff810f9b5a
[  548.505015] Call Trace:
[  548.505940]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  548.506857]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  548.507753]  [<ffffffff810f9b5a>] smp_call_function_single+0x6a/0xe0
[  548.508639]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  548.509518]  [<ffffffff8116036a>] perf_event_read+0xca/0xd0
[  548.510391]  [<ffffffff81160400>] perf_event_read_value+0x90/0xe0
[  548.511261]  [<ffffffff81161b1e>] perf_read+0x20e/0x360
[  548.512130]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  548.513005]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  548.513877]  [<ffffffff811dfdd3>] do_loop_readv_writev+0x63/0x90
[  548.514740]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  548.515600]  [<ffffffff811e1c97>] do_readv_writev+0x267/0x280
[  548.516451]  [<ffffffff81375a47>] ? debug_smp_processor_id+0x17/0x20
[  548.517313]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  548.518177]  [<ffffffff810a2c5d>] ? get_parent_ip+0xd/0x50
[  548.519031]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  548.519878]  [<ffffffff817b1377>] ? _raw_spin_unlock_irq+0x37/0x60
[  548.520719]  [<ffffffff810e672a>] ? do_setitimer+0x1ca/0x250
[  548.521563]  [<ffffffff811e1ce9>] vfs_readv+0x39/0x50
[  548.522397]  [<ffffffff811e1dac>] SyS_readv+0x5c/0x100
[  548.523224]  [<ffffffff817b2264>] tracesys+0xdd/0xe2
[  548.524052] Code: 00 4c 1d 00 48 89 de 48 03 14 c5 e0 af d1 81 48 89 df e8 5a 47 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48 8b 4d d0 65 48 33 0c 25 28 00 00 00 
[  548.525897] NMI backtrace for cpu 1
[  548.526765] CPU: 1 PID: 20241 Comm: trinity-c237 Tainted: G             L 3.17.0-rc1+ #112
[  548.528575] task: ffff8801e32c16f0 ti: ffff88008696c000 task.ti: ffff88008696c000
[  548.529499] RIP: 0010:[<ffffffff810f99de>]  [<ffffffff810f99de>] generic_exec_single+0xee/0x1a0
[  548.530433] RSP: 0018:ffff88008696fc40  EFLAGS: 00000202
[  548.531363] RAX: ffff880062d6bc00 RBX: ffff88008696fc40 RCX: ffff880062d6bc40
[  548.532305] RDX: ffff8802441d4c00 RSI: ffff88008696fc40 RDI: ffff88008696fc40
[  548.533244] RBP: ffff88008696fc90 R08: 0000000000000001 R09: 0000000000000001
[  548.534181] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  548.535114] R13: 0000000000000001 R14: ffff880238ef1bd8 R15: ffffffff8115fd30
[  548.536048] FS:  00007f6a24fd0740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
[  548.536995] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  548.537936] CR2: 0000000000000000 CR3: 00000000869c1000 CR4: 00000000001407e0
[  548.538888] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  548.539842] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  548.540795] Stack:
[  548.541745]  ffff880062d6bc40 ffffffff8115fd30 ffff880238ef1bd8 0000000000000003
[  548.542699]  00000000d88cff08 00000000ffffffff 0000000000000000 ffffffff8115fd30
[  548.543640]  ffff880238ef1bd8 0000000000000001 ffff88008696fcd0 ffffffff810f9b5a
[  548.544574] Call Trace:
[  548.545491]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  548.546404]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  548.547297]  [<ffffffff810f9b5a>] smp_call_function_single+0x6a/0xe0
[  548.548179]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  548.549053]  [<ffffffff8116036a>] perf_event_read+0xca/0xd0
[  548.549920]  [<ffffffff81160400>] perf_event_read_value+0x90/0xe0
[  548.550789]  [<ffffffff81161b1e>] perf_read+0x20e/0x360
[  548.551651]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  548.552520]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  548.553388]  [<ffffffff811dfdd3>] do_loop_readv_writev+0x63/0x90
[  548.554247]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  548.555103]  [<ffffffff811e1c97>] do_readv_writev+0x267/0x280
[  548.555952]  [<ffffffff81375a47>] ? debug_smp_processor_id+0x17/0x20
[  548.556809]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  548.557665]  [<ffffffff810a2c5d>] ? get_parent_ip+0xd/0x50
[  548.558511]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  548.559352]  [<ffffffff817b1377>] ? _raw_spin_unlock_irq+0x37/0x60
[  548.560189]  [<ffffffff810e672a>] ? do_setitimer+0x1ca/0x250
[  548.561021]  [<ffffffff811e1ce9>] vfs_readv+0x39/0x50
[  548.561847]  [<ffffffff811e1dac>] SyS_readv+0x5c/0x100
[  548.562664]  [<ffffffff817b2264>] tracesys+0xdd/0xe2
[  548.563483] Code: 48 89 de 48 03 14 c5 e0 af d1 81 48 89 df e8 5a 47 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 4d d0 65 48 33 0c 25 28 00 00 00 0f 85 8e 00 
[  564.237567] INFO: rcu_preempt self-detected stall on CPU
[  564.238434] 	0: (23495594 ticks this GP) idle=ee5/140000000000001/0 softirq=31067/31067 
[  564.239318] 	 (t=6000 jiffies g=12425 c=12424 q=0)
[  564.240203] Task dump for CPU 0:
[  564.241078] trinity-c178    R  running task    13424 20182  19467 0x10000008
[  564.241975]  ffff8801cd63c4d0 00000000a417dc9d ffff880244003dc8 ffffffff810a4406
[  564.242880]  ffffffff810a4372 0000000000000000 ffffffff81c50240 0000000000000086
[  564.243789]  ffff880244003de0 ffffffff810a83e9 0000000000000001 ffff880244003e10
[  564.244697] Call Trace:
[  564.245597]  <IRQ>  [<ffffffff810a4406>] sched_show_task+0x116/0x180
[  564.246511]  [<ffffffff810a4372>] ? sched_show_task+0x82/0x180
[  564.247420]  [<ffffffff810a83e9>] dump_cpu_task+0x39/0x40
[  564.248337]  [<ffffffff810d6360>] rcu_dump_cpu_stacks+0xa0/0xe0
[  564.249247]  [<ffffffff810ddba3>] rcu_check_callbacks+0x503/0x810
[  564.250155]  [<ffffffff81375a63>] ? __this_cpu_preempt_check+0x13/0x20
[  564.251069]  [<ffffffff810e5c93>] ? hrtimer_run_queues+0x43/0x130
[  564.251985]  [<ffffffff810e43e7>] update_process_times+0x47/0x70
[  564.252902]  [<ffffffff810f4c8a>] tick_sched_timer+0x4a/0x1a0
[  564.253796]  [<ffffffff810e4a71>] ? __run_hrtimer+0x81/0x250
[  564.254671]  [<ffffffff810e4a71>] __run_hrtimer+0x81/0x250
[  564.255541]  [<ffffffff810f4c40>] ? tick_init_highres+0x20/0x20
[  564.256404]  [<ffffffff810e5697>] hrtimer_interrupt+0x107/0x260
[  564.257251]  [<ffffffff81031cc4>] local_apic_timer_interrupt+0x34/0x60
[  564.258086]  [<ffffffff817b4b8f>] smp_apic_timer_interrupt+0x3f/0x60
[  564.258907]  [<ffffffff817b2faf>] apic_timer_interrupt+0x6f/0x80
[  564.259721]  <EOI>  [<ffffffff817b2c64>] ? retint_restore_args+0xe/0xe
[  564.260541]  [<ffffffff8136968d>] ? copy_user_handle_tail+0x6d/0x90
[  564.261360]  [<ffffffff812e4a85>] SyS_add_key+0xd5/0x240
[  564.262176]  [<ffffffff817b2264>] tracesys+0xdd/0xe2
[  572.432766] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [trinity-c156:20160]
[  572.433593] Modules linked in: fuse tun rfcomm llc2 af_key nfnetlink scsi_transport_iscsi can_bcm bnep can_raw nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul crc32c_intel ghash_clmulni_intel e1000e snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer microcode snd serio_raw pcspkr usb_debug ptp pps_core shpchp soundcore
[  572.438194] CPU: 2 PID: 20160 Comm: trinity-c156 Tainted: G             L 3.17.0-rc1+ #112
[  572.440070] task: ffff88006b945bc0 ti: ffff880062d68000 task.ti: ffff880062d68000
[  572.441020] RIP: 0010:[<ffffffff810f99de>]  [<ffffffff810f99de>] generic_exec_single+0xee/0x1a0
[  572.441982] RSP: 0018:ffff880062d6bc40  EFLAGS: 00000202
[  572.442935] RAX: ffff8801fe70fc00 RBX: ffffffff817b2c64 RCX: ffff8801fe70fc40
[  572.443898] RDX: ffff8802441d4c00 RSI: ffff880062d6bc40 RDI: ffff880062d6bc40
[  572.444868] RBP: ffff880062d6bc90 R08: 0000000000000001 R09: 0000000000000001
[  572.445836] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880062d6bbb8
[  572.446808] R13: 0000000000406040 R14: ffff880062d68000 R15: ffff88006b945bc0
[  572.447777] FS:  00007f6a24fd0740(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
[  572.448759] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  572.449742] CR2: 00000000019ca288 CR3: 000000007a7bb000 CR4: 00000000001407e0
[  572.450733] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  572.451722] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  572.452710] Stack:
[  572.453692]  ffff8801fe70fc40 ffffffff8115fd30 ffff880238ef5cd0 0000000000000003
[  572.454698]  000000008b5d668d 00000000ffffffff 0000000000000000 ffffffff8115fd30
[  572.455712]  ffff880238ef5cd0 0000000000000001 ffff880062d6bcd0 ffffffff810f9b5a
[  572.456721] Call Trace:
[  572.457721]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  572.458732]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  572.459736]  [<ffffffff810f9b5a>] smp_call_function_single+0x6a/0xe0
[  572.460745]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  572.461748]  [<ffffffff8116036a>] perf_event_read+0xca/0xd0
[  572.462763]  [<ffffffff81160400>] perf_event_read_value+0x90/0xe0
[  572.463776]  [<ffffffff81161b1e>] perf_read+0x20e/0x360
[  572.464765]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  572.465738]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  572.466709]  [<ffffffff811dfdd3>] do_loop_readv_writev+0x63/0x90
[  572.467669]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  572.468627]  [<ffffffff811e1c97>] do_readv_writev+0x267/0x280
[  572.469586]  [<ffffffff81375a47>] ? debug_smp_processor_id+0x17/0x20
[  572.470543]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  572.471493]  [<ffffffff810a2c5d>] ? get_parent_ip+0xd/0x50
[  572.472442]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  572.473386]  [<ffffffff817b1377>] ? _raw_spin_unlock_irq+0x37/0x60
[  572.474322]  [<ffffffff810e672a>] ? do_setitimer+0x1ca/0x250
[  572.475260]  [<ffffffff811e1ce9>] vfs_readv+0x39/0x50
[  572.476186]  [<ffffffff811e1dac>] SyS_readv+0x5c/0x100
[  572.477112]  [<ffffffff817b2264>] tracesys+0xdd/0xe2
[  572.478038] Code: 48 89 de 48 03 14 c5 e0 af d1 81 48 89 df e8 5a 47 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 4d d0 65 48 33 0c 25 28 00 00 00 0f 85 8e 00 
[  572.480077] sending NMI to other CPUs:
[  572.481045] NMI backtrace for cpu 1
[  572.482011] CPU: 1 PID: 20241 Comm: trinity-c237 Tainted: G             L 3.17.0-rc1+ #112
[  572.484001] task: ffff8801e32c16f0 ti: ffff88008696c000 task.ti: ffff88008696c000
[  572.485020] RIP: 0010:[<ffffffff810f99de>]  [<ffffffff810f99de>] generic_exec_single+0xee/0x1a0
[  572.486053] RSP: 0018:ffff88008696fc40  EFLAGS: 00000202
[  572.487084] RAX: ffff880062d6bc00 RBX: ffff88008696fc40 RCX: ffff880062d6bc40
[  572.488124] RDX: ffff8802441d4c00 RSI: ffff88008696fc40 RDI: ffff88008696fc40
[  572.489163] RBP: ffff88008696fc90 R08: 0000000000000001 R09: 0000000000000001
[  572.490199] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  572.491231] R13: 0000000000000001 R14: ffff880238ef1bd8 R15: ffffffff8115fd30
[  572.492262] FS:  00007f6a24fd0740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
[  572.493299] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  572.494343] CR2: 0000000000000000 CR3: 00000000869c1000 CR4: 00000000001407e0
[  572.495380] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  572.496396] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  572.497393] Stack:
[  572.498357]  ffff880062d6bc40 ffffffff8115fd30 ffff880238ef1bd8 0000000000000003
[  572.499325]  00000000d88cff08 00000000ffffffff 0000000000000000 ffffffff8115fd30
[  572.500276]  ffff880238ef1bd8 0000000000000001 ffff88008696fcd0 ffffffff810f9b5a
[  572.501223] Call Trace:
[  572.502140]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  572.503053]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  572.503947]  [<ffffffff810f9b5a>] smp_call_function_single+0x6a/0xe0
[  572.504837]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  572.505720]  [<ffffffff8116036a>] perf_event_read+0xca/0xd0
[  572.506600]  [<ffffffff81160400>] perf_event_read_value+0x90/0xe0
[  572.507477]  [<ffffffff81161b1e>] perf_read+0x20e/0x360
[  572.508350]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  572.509229]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  572.510102]  [<ffffffff811dfdd3>] do_loop_readv_writev+0x63/0x90
[  572.510968]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  572.511828]  [<ffffffff811e1c97>] do_readv_writev+0x267/0x280
[  572.512689]  [<ffffffff81375a47>] ? debug_smp_processor_id+0x17/0x20
[  572.513556]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  572.514413]  [<ffffffff810a2c5d>] ? get_parent_ip+0xd/0x50
[  572.515267]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  572.516114]  [<ffffffff817b1377>] ? _raw_spin_unlock_irq+0x37/0x60
[  572.516958]  [<ffffffff810e672a>] ? do_setitimer+0x1ca/0x250
[  572.517794]  [<ffffffff811e1ce9>] vfs_readv+0x39/0x50
[  572.518627]  [<ffffffff811e1dac>] SyS_readv+0x5c/0x100
[  572.519462]  [<ffffffff817b2264>] tracesys+0xdd/0xe2
[  572.520291] Code: 48 89 de 48 03 14 c5 e0 af d1 81 48 89 df e8 5a 47 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 4d d0 65 48 33 0c 25 28 00 00 00 0f 85 8e 00 
[  572.522141] NMI backtrace for cpu 3
[  572.523018] CPU: 3 PID: 20165 Comm: trinity-c161 Tainted: G             L 3.17.0-rc1+ #112
[  572.524827] task: ffff8801fe67dbc0 ti: ffff8801fe70c000 task.ti: ffff8801fe70c000
[  572.525748] RIP: 0010:[<ffffffff810f99de>]  [<ffffffff810f99de>] generic_exec_single+0xee/0x1a0
[  572.526677] RSP: 0018:ffff8801fe70fc40  EFLAGS: 00000202
[  572.527605] RAX: 0000000000000000 RBX: ffff8801fe70fc40 RCX: 0000000000000038
[  572.528545] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[  572.529480] RBP: ffff8801fe70fc90 R08: ffff880242bfa3f0 R09: 0000000000000000
[  572.530416] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  572.531350] R13: 0000000000000001 R14: ffff880238ef1290 R15: ffffffff8115fd30
[  572.532287] FS:  00007f6a24fd0740(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
[  572.533230] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  572.534174] CR2: 00007f6a23446001 CR3: 00000001cd627000 CR4: 00000000001407e0
[  572.535133] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  572.536091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  572.537048] Stack:
[  572.537979]  0000000000000000 ffffffff8115fd30 ffff880238ef1290 0000000000000003
[  572.538915]  00000000af3b5f31 00000000ffffffff 0000000000000000 ffffffff8115fd30
[  572.539850]  ffff880238ef1290 0000000000000001 ffff8801fe70fcd0 ffffffff810f9b5a
[  572.540790] Call Trace:
[  572.541711]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  572.542629]  [<ffffffff8115fd30>] ? perf_duration_warn+0x70/0x70
[  572.543522]  [<ffffffff810f9b5a>] smp_call_function_single+0x6a/0xe0
[  572.544408]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  572.545285]  [<ffffffff8116036a>] perf_event_read+0xca/0xd0
[  572.546154]  [<ffffffff81160400>] perf_event_read_value+0x90/0xe0
[  572.547027]  [<ffffffff81161b1e>] perf_read+0x20e/0x360
[  572.547890]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  572.548765]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  572.549637]  [<ffffffff811dfdd3>] do_loop_readv_writev+0x63/0x90
[  572.550497]  [<ffffffff81161910>] ? cpu_clock_event_init+0x40/0x40
[  572.551353]  [<ffffffff811e1c97>] do_readv_writev+0x267/0x280
[  572.552210]  [<ffffffff81375a47>] ? debug_smp_processor_id+0x17/0x20
[  572.553066]  [<ffffffff810bffc6>] ? lock_release_holdtime.part.24+0xe6/0x160
[  572.553924]  [<ffffffff810a2c5d>] ? get_parent_ip+0xd/0x50
[  572.554775]  [<ffffffff810a2dbb>] ? preempt_count_sub+0x6b/0xf0
[  572.555618]  [<ffffffff817b1377>] ? _raw_spin_unlock_irq+0x37/0x60
[  572.556453]  [<ffffffff810e672a>] ? do_setitimer+0x1ca/0x250
[  572.557284]  [<ffffffff811e1ce9>] vfs_readv+0x39/0x50
[  572.558111]  [<ffffffff811e1dac>] SyS_readv+0x5c/0x100
[  572.558934]  [<ffffffff817b2264>] tracesys+0xdd/0xe2
[  572.559756] Code: 48 89 de 48 03 14 c5 e0 af d1 81 48 89 df e8 5a 47 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 4d d0 65 48 33 0c 25 28 00 00 00 0f 85 8e 00 
[  572.561589] NMI backtrace for cpu 0
[  572.562451] CPU: 0 PID: 20182 Comm: trinity-c178 Tainted: G             L 3.17.0-rc1+ #112
[  572.564252] task: ffff8801cd63c4d0 ti: ffff8801d2138000 task.ti: ffff8801d2138000
[  572.565166] RIP: 0010:[<ffffffff8103cc16>]  [<ffffffff8103cc16>] read_hpet+0x16/0x20
[  572.566090] RSP: 0018:ffff880244003e70  EFLAGS: 00000046
[  572.567011] RAX: 00000000e8dd201c RBX: 000000000001cc86 RCX: ffff8802441d1118
[  572.567939] RDX: 0000000000010001 RSI: ffffffff81a86870 RDI: ffffffff81c28680
[  572.568869] RBP: ffff880244003e70 R08: 0000000000000000 R09: 0000000000000000
[  572.569802] R10: 0000000000000000 R11: 0000000000000000 R12: 000000854428057e
[  572.570732] R13: ffff8802441ce060 R14: ffff8802441cda80 R15: 0000008baf2377a5
[  572.571665] FS:  00007f6a24fd0740(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
[  572.572608] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  572.573555] CR2: 00007f6a24fe0000 CR3: 00000002053c8000 CR4: 00000000001407f0
[  572.574508] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  572.575459] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  572.576414] Stack:
[  572.577362]  ffff880244003e98 ffffffff810eb574 ffffffff810f4c63 ffff8801d213be58
[  572.578323]  ffff880244003f40 ffff880244003ec8 ffffffff810f4c63 ffff8802441cda80
[  572.579255]  ffff8802441ce060 ffff880244003f40 ffff8802441cdb08 ffff880244003f08
[  572.580181] Call Trace:
[  572.581092]  <IRQ> 

[  572.581988]  [<ffffffff810eb574>] ktime_get+0x94/0x120
[  572.582865]  [<ffffffff810f4c63>] ? tick_sched_timer+0x23/0x1a0
[  572.583735]  [<ffffffff810f4c63>] tick_sched_timer+0x23/0x1a0
[  572.584592]  [<ffffffff810e4a71>] __run_hrtimer+0x81/0x250
[  572.585447]  [<ffffffff810f4c40>] ? tick_init_highres+0x20/0x20
[  572.586297]  [<ffffffff810e5697>] hrtimer_interrupt+0x107/0x260
[  572.587148]  [<ffffffff81031cc4>] local_apic_timer_interrupt+0x34/0x60
[  572.588004]  [<ffffffff817b4b8f>] smp_apic_timer_interrupt+0x3f/0x60
[  572.588859]  [<ffffffff817b2faf>] apic_timer_interrupt+0x6f/0x80
[  572.589705]  <EOI> 

[  572.590542]  [<ffffffff817b2c64>] ? retint_restore_args+0xe/0xe
[  572.591379]  [<ffffffff8136968d>] ? copy_user_handle_tail+0x6d/0x90
[  572.592225]  [<ffffffff812e4a85>] SyS_add_key+0xd5/0x240
[  572.593071]  [<ffffffff817b2264>] tracesys+0xdd/0xe2
[  572.593901] Code: 00 29 c7 ba 00 00 00 00 b8 c2 ff ff ff 83 ff 7f 5d 0f 4f c2 c3 0f 1f 44 00 00 55 48 8b 05 a3 0c 0b 01 48 89 e5 8b 80 f0 00 00 00 <89> c0 5d c3 66 0f 1f 44 00 00 0f 1f 44 00 00 8b 0d d9 0b 0b 01 
[  599.566886] [sched_delayed] sched: RT throttling activated
[  599.573324] end_request: I/O error, dev sda, sector 0
[  624.402393] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [swapper/2:0]
[  624.403521] Modules linked in: fuse tun rfcomm llc2 af_key nfnetlink scsi_transport_iscsi can_bcm bnep can_raw nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul crc32c_intel ghash_clmulni_intel e1000e snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer microcode snd serio_raw pcspkr usb_debug ptp pps_core shpchp soundcore
[  624.409477] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.17.0-rc1+ #112
[  624.411931] task: ffff880242b744d0 ti: ffff880242414000 task.ti: ffff880242414000
[  624.413170] RIP: 0010:[<ffffffff81645849>]  [<ffffffff81645849>] cpuidle_enter_state+0x79/0x1c0
[  624.414396] RSP: 0018:ffff880242417e60  EFLAGS: 00000246
[  624.415614] RAX: 0000000000000000 RBX: ffff880242b744d0 RCX: 0000000000000019
[  624.416843] RDX: 20c49ba5e353f7cf RSI: 000000000003cd60 RDI: 002e512580cfca6e
[  624.418074] RBP: ffff880242417e98 R08: 000000008bafc4e0 R09: 0000000000000000
[  624.419397] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880242417df0
[  624.420626] R13: ffffffff810bfc5e R14: ffff880242417dd0 R15: 0000000000000210
[  624.421864] FS:  0000000000000000(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
[  624.423132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  624.424380] CR2: 00007f15cfbaf000 CR3: 0000000001c11000 CR4: 00000000001407e0
[  624.425640] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  624.426904] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  624.428164] Stack:
[  624.429393]  0000009177337316 ffffffff81cae7f0 ffffffff81d1d290 ffffe8ffff204da8
[  624.430630]  ffff880242414000 ffffffff81cae620 ffff880242414000 ffff880242417ea8
[  624.431865]  ffffffff81645a47 ffff880242417f10 ffffffff810b9fb4 ffff880242417fd8
[  624.433085] Call Trace:
[  624.434242]  [<ffffffff81645a47>] cpuidle_enter+0x17/0x20
[  624.435413]  [<ffffffff810b9fb4>] cpu_startup_entry+0x384/0x410
[  624.436588]  [<ffffffff8102ff37>] start_secondary+0x237/0x340
[  624.437761] Code: d0 48 89 df ff 50 48 41 89 c5 e8 b3 5c aa ff 44 8b 63 04 49 89 c7 0f 1f 44 00 00 e8 a2 19 b0 ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 2b 7d c8 4c 89 f8 49 c1 ff 3f 48 f7 ea b8 ff ff ff 7f 48 c1 
[  624.440251] sending NMI to other CPUs:
[  624.441500] NMI backtrace for cpu 1
[  624.442624] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G             L 3.17.0-rc1+ #112
[  624.444912] task: ffff880242b716f0 ti: ffff88024240c000 task.ti: ffff88024240c000
[  624.446070] RIP: 0010:[<ffffffff813c9e65>]  [<ffffffff813c9e65>] intel_idle+0xd5/0x180
[  624.447236] RSP: 0018:ffff88024240fe20  EFLAGS: 00000046
[  624.448397] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[  624.449570] RDX: 0000000000000000 RSI: ffff88024240ffd8 RDI: 0000000000000001
[  624.450735] RBP: ffff88024240fe50 R08: 000000008bafc4e0 R09: 0000000000000000
[  624.451894] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[  624.453053] R13: 0000000000000032 R14: 0000000000000004 R15: ffff88024240c000
[  624.454213] FS:  0000000000000000(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
[  624.455369] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  624.456531] CR2: 0000000000497120 CR3: 0000000001c11000 CR4: 00000000001407e0
[  624.457707] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  624.458881] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  624.460056] Stack:
[  624.461220]  000000014240c000 0d2996f8aba02290 ffffe8ffff004da8 0000000000000005
[  624.462418]  ffffffff81cae620 0000000000000001 ffff88024240fe98 ffffffff81645825
[  624.463622]  000000917994f078 ffffffff81cae7f0 ffffffff81d1d290 ffffe8ffff004da8
[  624.464826] Call Trace:
[  624.466015]  [<ffffffff81645825>] cpuidle_enter_state+0x55/0x1c0
[  624.467220]  [<ffffffff81645a47>] cpuidle_enter+0x17/0x20
[  624.468415]  [<ffffffff810b9fb4>] cpu_startup_entry+0x384/0x410
[  624.469607]  [<ffffffff8102ff37>] start_secondary+0x237/0x340
[  624.470800] Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 ba 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[  624.473394] NMI backtrace for cpu 0
[  624.474616] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.17.0-rc1+ #112
[  624.477061] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[  624.478299] RIP: 0010:[<ffffffff813c9e65>]  [<ffffffff813c9e65>] intel_idle+0xd5/0x180
[  624.479544] RSP: 0018:ffffffff81c03e68  EFLAGS: 00000046
[  624.480766] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[  624.481964] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[  624.483137] RBP: ffffffff81c03e98 R08: 000000008bafc4e0 R09: 0000000000000000
[  624.484284] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[  624.485403] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[  624.486516] FS:  0000000000000000(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
[  624.487613] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  624.488690] CR2: 00007fd405db1000 CR3: 0000000001c11000 CR4: 00000000001407f0
[  624.489768] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  624.490837] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  624.491894] Stack:
[  624.492941]  0000000081c00000 b6e98a804d03933a ffffe8fffee04da8 0000000000000005
[  624.494017]  ffffffff81cae620 0000000000000000 ffffffff81c03ee0 ffffffff81645825
[  624.495096]  000000917a0eca81 ffffffff81cae7f0 ffffffff81d1d290 ffffe8fffee04da8
[  624.496174] Call Trace:
[  624.497229]  [<ffffffff81645825>] cpuidle_enter_state+0x55/0x1c0
[  624.498294]  [<ffffffff81645a47>] cpuidle_enter+0x17/0x20
[  624.499348]  [<ffffffff810b9fb4>] cpu_startup_entry+0x384/0x410
[  624.500403]  [<ffffffff8179d7a0>] rest_init+0xc0/0xd0
[  624.501458]  [<ffffffff8179d6e5>] ? rest_init+0x5/0xd0
[  624.502507]  [<ffffffff81eff009>] start_kernel+0x475/0x496
[  624.503549]  [<ffffffff81efe98d>] ? set_init_arg+0x53/0x53
[  624.504585]  [<ffffffff81efe57b>] x86_64_start_reservations+0x2a/0x2c
[  624.505627]  [<ffffffff81efe66e>] x86_64_start_kernel+0xf1/0xf4
[  624.506655] Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 ba 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[  624.508927] NMI backtrace for cpu 3
[  624.510061] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.17.0-rc1+ #112
[  624.512303] task: ffff880242b72de0 ti: ffff880242418000 task.ti: ffff880242418000
[  624.513284] RIP: 0010:[<ffffffff813c9e65>]  [<ffffffff813c9e65>] intel_idle+0xd5/0x180
[  624.514281] RSP: 0000:ffff88024241be20  EFLAGS: 00000046
[  624.515289] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[  624.516286] RDX: 0000000000000000 RSI: ffff88024241bfd8 RDI: 0000000000000003
[  624.517264] RBP: ffff88024241be50 R08: 000000008bafc4e0 R09: 0000000000000000
[  624.518235] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[  624.519205] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880242418000
[  624.520169] FS:  0000000000000000(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
[  624.521142] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  624.522100] CR2: 00000000013a4738 CR3: 0000000001c11000 CR4: 00000000001407e0
[  624.523105] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  624.524063] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  624.525039] Stack:
[  624.526003]  0000000342418000 1e8c3d4850bfa337 ffffe8ffff404da8 0000000000000005
[  624.526991]  ffffffff81cae620 0000000000000003 ffff88024241be98 ffffffff81645825
[  624.527953]  000000917994f2a7 ffffffff81cae7f0 ffffffff81d1d290 ffffe8ffff404da8
[  624.528926] Call Trace:
[  624.529911]  [<ffffffff81645825>] cpuidle_enter_state+0x55/0x1c0
[  624.530911]  [<ffffffff81645a47>] cpuidle_enter+0x17/0x20
[  624.531883]  [<ffffffff810b9fb4>] cpu_startup_entry+0x384/0x410
[  624.532905]  [<ffffffff8102ff37>] start_secondary+0x237/0x340
[  624.533877] Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 ba 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[  652.386003] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
[  652.388580] Modules linked in: fuse tun rfcomm llc2 af_key nfnetlink scsi_transport_iscsi can_bcm bnep can_raw nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul crc32c_intel ghash_clmulni_intel e1000e snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer microcode snd serio_raw pcspkr usb_debug ptp pps_core shpchp soundcore
[  652.394750] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.17.0-rc1+ #112
[  652.397225] task: ffff880242b744d0 ti: ffff880242414000 task.ti: ffff880242414000
[  652.398457] RIP: 0010:[<ffffffff81645849>]  [<ffffffff81645849>] cpuidle_enter_state+0x79/0x1c0
[  652.399711] RSP: 0018:ffff880242417e60  EFLAGS: 00000246
[  652.400953] RAX: 0000000000000000 RBX: ffff880242b744d0 RCX: 0000000000000019
[  652.402203] RDX: 20c49ba5e353f7cf RSI: 0000000000039e2e RDI: 002e6b0e8bd4c66e
[  652.403446] RBP: ffff880242417e98 R08: 000000008bafc4e0 R09: 0000000000000000
[  652.404695] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880242417df0
[  652.405953] R13: ffffffff810bfc5e R14: ffff880242417dd0 R15: 00000000000001da
[  652.407176] FS:  0000000000000000(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
[  652.408441] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  652.409679] CR2: 00007f15cfbaf000 CR3: 0000000001c11000 CR4: 00000000001407e0
[  652.410923] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  652.412173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  652.413415] Stack:
[  652.414652]  00000097fc215a08 ffffffff81cae7f0 ffffffff81d1d290 ffffe8ffff204da8
[  652.415906]  ffff880242414000 ffffffff81cae620 ffff880242414000 ffff880242417ea8
[  652.417138]  ffffffff81645a47 ffff880242417f10 ffffffff810b9fb4 ffff880242417fd8
[  652.418379] Call Trace:
[  652.419637]  [<ffffffff81645a47>] cpuidle_enter+0x17/0x20
[  652.420917]  [<ffffffff810b9fb4>] cpu_startup_entry+0x384/0x410
[  652.422192]  [<ffffffff8102ff37>] start_secondary+0x237/0x340
[  652.423462] Code: d0 48 89 df ff 50 48 41 89 c5 e8 b3 5c aa ff 44 8b 63 04 49 89 c7 0f 1f 44 00 00 e8 a2 19 b0 ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 2b 7d c8 4c 89 f8 49 c1 ff 3f 48 f7 ea b8 ff ff ff 7f 48 c1 
[  652.426104] sending NMI to other CPUs:
[  652.427302] NMI backtrace for cpu 0
[  652.428449] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.17.0-rc1+ #112
[  652.430736] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[  652.431882] RIP: 0010:[<ffffffff813c9e65>]  [<ffffffff813c9e65>] intel_idle+0xd5/0x180
[  652.433036] RSP: 0018:ffffffff81c03e68  EFLAGS: 00000046
[  652.434181] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[  652.435341] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[  652.436491] RBP: ffffffff81c03e98 R08: 000000008bafc4e0 R09: 0000000000000000
[  652.437642] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[  652.438797] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[  652.439945] FS:  0000000000000000(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
[  652.441093] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  652.442234] CR2: 00007f00bcc2c000 CR3: 0000000001c11000 CR4: 00000000001407f0
[  652.443388] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  652.444540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  652.445680] Stack:
[  652.446808]  0000000081c00000 b6e98a804d03933a ffffe8fffee04da8 0000000000000005
[  652.447959]  ffffffff81cae620 0000000000000000 ffffffff81c03ee0 ffffffff81645825
[  652.449111]  00000097ff1f0b6f ffffffff81cae7f0 ffffffff81d1d290 ffffe8fffee04da8
[  652.450261] Call Trace:
[  652.451392]  [<ffffffff81645825>] cpuidle_enter_state+0x55/0x1c0
[  652.452539]  [<ffffffff81645a47>] cpuidle_enter+0x17/0x20
[  652.453682]  [<ffffffff810b9fb4>] cpu_startup_entry+0x384/0x410
[  652.454825]  [<ffffffff8179d7a0>] rest_init+0xc0/0xd0
[  652.455942]  [<ffffffff8179d6e5>] ? rest_init+0x5/0xd0
[  652.457036]  [<ffffffff81eff009>] start_kernel+0x475/0x496
[  652.458123]  [<ffffffff81efe98d>] ? set_init_arg+0x53/0x53
[  652.459218]  [<ffffffff81efe57b>] x86_64_start_reservations+0x2a/0x2c
[  652.460320]  [<ffffffff81efe66e>] x86_64_start_kernel+0xf1/0xf4
[  652.461419] Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 ba 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[  652.463812] NMI backtrace for cpu 3
[  652.464973] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.17.0-rc1+ #112
[  652.467190] task: ffff880242b72de0 ti: ffff880242418000 task.ti: ffff880242418000
[  652.468248] RIP: 0010:[<ffffffff813c9e65>]  [<ffffffff813c9e65>] intel_idle+0xd5/0x180
[  652.469309] RSP: 0018:ffff88024241be20  EFLAGS: 00000046
[  652.470336] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[  652.471368] RDX: 0000000000000000 RSI: ffff88024241bfd8 RDI: 0000000000000003
[  652.472378] RBP: ffff88024241be50 R08: 000000008bafc4e0 R09: 0000000000000000
[  652.473383] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[  652.474378] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880242418000
[  652.475366] FS:  0000000000000000(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
[  652.476385] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  652.477389] CR2: 00000000013a4738 CR3: 0000000001c11000 CR4: 00000000001407e0
[  652.478379] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  652.479371] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  652.480366] Stack:
[  652.481327]  0000000342418000 1e8c3d4850bfa337 ffffe8ffff404da8 0000000000000005
[  652.482326]  ffffffff81cae620 0000000000000003 ffff88024241be98 ffffffff81645825
[  652.483326]  00000097ff1b9892 ffffffff81cae7f0 ffffffff81d1d290 ffffe8ffff404da8
[  652.484332] Call Trace:
[  652.485323]  [<ffffffff81645825>] cpuidle_enter_state+0x55/0x1c0
[  652.486345]  [<ffffffff81645a47>] cpuidle_enter+0x17/0x20
[  652.487338]  [<ffffffff810b9fb4>] cpu_startup_entry+0x384/0x410
[  652.488335]  [<ffffffff8102ff37>] start_secondary+0x237/0x340
[  652.489322] Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 ba 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[  652.491493] NMI backtrace for cpu 1
[  652.492553] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G             L 3.17.0-rc1+ #112
[  652.494553] task: ffff880242b716f0 ti: ffff88024240c000 task.ti: ffff88024240c000
[  652.495559] RIP: 0010:[<ffffffff813c9e65>]  [<ffffffff813c9e65>] intel_idle+0xd5/0x180
[  652.496585] RSP: 0018:ffff88024240fe20  EFLAGS: 00000046
[  652.497569] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[  652.498545] RDX: 0000000000000000 RSI: ffff88024240ffd8 RDI: 0000000000000001
[  652.499502] RBP: ffff88024240fe50 R08: 000000008bafc4e0 R09: 0000000000000000
[  652.500452] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[  652.501398] R13: 0000000000000032 R14: 0000000000000004 R15: ffff88024240c000
[  652.502342] FS:  0000000000000000(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
[  652.503290] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  652.504236] CR2: 00007fecce638ab8 CR3: 0000000001c11000 CR4: 00000000001407e0
[  652.505195] DR0: 00007f6670c66000 DR1: 0000000000000000 DR2: 0000000000000000
[  652.506202] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  652.507177] Stack:
[  652.508136]  000000014240c000 0d2996f8aba02290 ffffe8ffff004da8 0000000000000005
[  652.509122]  ffffffff81cae620 0000000000000001 ffff88024240fe98 ffffffff81645825
[  652.510113]  00000097ff1b96a9 ffffffff81cae7f0 ffffffff81d1d290 ffffe8ffff004da8
[  652.511130] Call Trace:
[  652.512121]  [<ffffffff81645825>] cpuidle_enter_state+0x55/0x1c0
[  652.513102]  [<ffffffff81645a47>] cpuidle_enter+0x17/0x20
[  652.514086]  [<ffffffff810b9fb4>] cpu_startup_entry+0x384/0x410
[  652.515078]  [<ffffffff8102ff37>] start_secondary+0x237/0x340
[  652.516085] Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 ba 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 



It kept spewing lockups over and over.
Something weird that jumped out at me was this:

[  599.573324] end_request: I/O error, dev sda, sector 0

The user trinity was running as did not have permission
to read the block device directly, so that's just.. creepy.
Hopefully not a sign of impending disk death.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 19:00                                                 ` Dave Jones
@ 2014-12-03 19:25                                                   ` Linus Torvalds
  2014-12-03 19:30                                                     ` Dave Jones
                                                                       ` (2 more replies)
  2014-12-03 19:59                                                   ` Chris Mason
  1 sibling, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-03 19:25 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Thomas Gleixner,
	John Stultz

On Wed, Dec 3, 2014 at 11:00 AM, Dave Jones <davej@redhat.com> wrote:
>
> So right after sending my last mail, I rebooted, and restarted the run
> on the same kernel again.
>
> As I was writing this mail, this happened.
>
> [  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c178:20182]
>
> and that's all that made it over the console. I couldn't log in via ssh,
> and thought "ah-ha, so it IS bad".  I walked over to reboot it, and
> found I could actually log in on the console. check out this dmesg..
>
> [  503.683055] Clocksource tsc unstable (delta = -95946009388 ns)
> [  503.692038] Switched to clocksource hpet
> [  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c178:20182]

Interesting. That whole NMI watchdog thing happens pretty much 22s
after the "TSC unstable" message.

Have you ever seen that TSC issue before? The watchdog relies on
comparing get_timestamp() differences, so if the timestamp was
incorrect...
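
A rough sketch of what that check boils down to, paraphrased from memory
of kernel/watchdog.c (helper names and details may not match the tree
exactly):

        static int is_softlockup(unsigned long touch_ts)
        {
                unsigned long now = get_timestamp();  /* derived from local_clock() */

                /* A "soft lockup" is simply touch_ts failing to advance for
                 * longer than the threshold, so a timestamp source that
                 * jumps can manufacture one out of thin air. */
                if (time_after(now, touch_ts + get_softlockup_thresh()))
                        return now - touch_ts;

                return 0;
        }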

Maybe that whole "clocksource_watchdog()" is bogus. That delta is
about 96 seconds, which sounds very odd. I'm not seeing how the TSC could
actually screw up that badly, so I'd almost be more likely to blame the
"watchdog" clock.

I don't know. This piece of code:

        delta = clocksource_delta(wdnow, cs->wd_last, watchdog->mask);

makes no sense to me. Shouldn't it be

        delta = clocksource_delta(wdnow, watchdog->wd_last, watchdog->mask);

Thomas? John?

                  Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 19:25                                                   ` Linus Torvalds
@ 2014-12-03 19:30                                                     ` Dave Jones
  2014-12-03 19:48                                                     ` Linus Torvalds
  2014-12-03 19:56                                                     ` John Stultz
  2 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-03 19:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Thomas Gleixner, John Stultz

On Wed, Dec 03, 2014 at 11:25:29AM -0800, Linus Torvalds wrote:
 > On Wed, Dec 3, 2014 at 11:00 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > So right after sending my last mail, I rebooted, and restarted the run
 > > on the same kernel again.
 > >
 > > As I was writing this mail, this happened.
 > >
 > > [  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c178:20182]
 > >
 > > and that's all that made it over the console. I couldn't log in via ssh,
 > > and thought "ah-ha, so it IS bad".  I walked over to reboot it, and
 > > found I could actually log in on the console. check out this dmesg..
 > >
 > > [  503.683055] Clocksource tsc unstable (delta = -95946009388 ns)
 > > [  503.692038] Switched to clocksource hpet
 > > [  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c178:20182]
 > 
 > Interesting. That whole NMI watchdog thing happens pretty much 22s
 > after the "TSC unstable" message.
 > 
 > Have you ever seen that TSC issue before? The watchdog relies on
 > comparing get_timestamp() differences, so if the timestamp was
 > incorrect...
 
yeah, quite a lot.

# grep tsc\ unstable /var/log/messages* | wc -l
71

Usually happens pretty soon after boot, once I start the fuzzing run.
It sometimes occurs quite some time before the NMI issue though.

eg:

Dec  3 11:50:24 binary kernel: [ 4253.432642] Clocksource tsc unstable (delta = -243666538341 ns)
...
Dec  3 13:24:28 binary kernel: [ 9862.915562] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c29:13237]


	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 19:25                                                   ` Linus Torvalds
  2014-12-03 19:30                                                     ` Dave Jones
@ 2014-12-03 19:48                                                     ` Linus Torvalds
  2014-12-03 20:09                                                       ` Dave Jones
  2014-12-03 19:56                                                     ` John Stultz
  2 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-03 19:48 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Thomas Gleixner,
	John Stultz

On Wed, Dec 3, 2014 at 11:25 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I don't know. This piece of code:
>
>         delta = clocksource_delta(wdnow, cs->wd_last, watchdog->mask);
>
> makes no sense to me.

Yeah, no, I see what's up. I missed that whole wd_last vs cs_last
pairing. I guess that part is all good. There are other crazy issues
in there, though, like the double test of 'watchdog_reset_pending'. So
I still wonder, since that odd 96-second delta is just insane
and makes no sense from a TSC standpoint (it's closer to a 32-bit
overflow of a hpet counter, but that sounds off too).
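
As a back-of-the-envelope check on that, assuming a full wrap of the
32-bit HPET main counter at the common 14.31818 MHz rate (an assumption;
the actual rate on this box isn't stated anywhere in the thread):

        #include <stdio.h>

        int main(void)
        {
                const double hpet_hz = 14318180.0;   /* assumed HPET frequency */

                /* time for a 32-bit counter to wrap around once */
                printf("~%.0f seconds\n", 4294967296.0 / hpet_hz);
                return 0;
        }

which prints roughly 300 seconds, so one full wrap doesn't obviously line
up with a ~96 second delta either.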

                 Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 19:25                                                   ` Linus Torvalds
  2014-12-03 19:30                                                     ` Dave Jones
  2014-12-03 19:48                                                     ` Linus Torvalds
@ 2014-12-03 19:56                                                     ` John Stultz
  2014-12-03 20:37                                                       ` Thomas Gleixner
  2014-12-03 20:39                                                       ` Thomas Gleixner
  2 siblings, 2 replies; 486+ messages in thread
From: John Stultz @ 2014-12-03 19:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Thomas Gleixner

On Wed, Dec 3, 2014 at 11:25 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Dec 3, 2014 at 11:00 AM, Dave Jones <davej@redhat.com> wrote:
>>
>> So right after sending my last mail, I rebooted, and restarted the run
>> on the same kernel again.
>>
>> As I was writing this mail, this happened.
>>
>> [  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c178:20182]
>>
>> and that's all that made it over the console. I couldn't log in via ssh,
>> and thought "ah-ha, so it IS bad".  I walked over to reboot it, and
>> found I could actually log in on the console. check out this dmesg..
>>
>> [  503.683055] Clocksource tsc unstable (delta = -95946009388 ns)
>> [  503.692038] Switched to clocksource hpet
>> [  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c178:20182]
>
> Interesting. That whole NMI watchdog thing happens pretty much 22s
> after the "TSC unstable" message.
>
> Have you ever seen that TSC issue before? The watchdog relies on
> comparing get_timestamp() differences, so if the timestamp was
> incorrect...
>
> Maybe that whole "clocksource_watchdog()" is bogus. That delta is
> about 96 seconds, which sounds very odd. I'm not seeing how the TSC could
> actually screw up that badly, so I'd almost be more likely to blame the
> "watchdog" clock.
>
> I don't know. This piece of code:
>
>         delta = clocksource_delta(wdnow, cs->wd_last, watchdog->mask);
>
> makes no sense to me. Shouldn't it be
>
>         delta = clocksource_delta(wdnow, watchdog->wd_last, watchdog->mask);

So we store the wdnow value in cs->wd_last a few lines below, so I
don't think that's problematic.
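
For reference, the surrounding logic looks roughly like this, paraphrased
rather than quoted from kernel/time/clocksource.c, so exact field names
and ordering may be slightly off:

        csnow = cs->read(cs);
        wdnow = watchdog->read(watchdog);

        /* watchdog delta, taken against the per-clocksource copy of the
         * previous watchdog reading */
        delta   = clocksource_delta(wdnow, cs->wd_last, watchdog->mask);
        wd_nsec = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);

        /* delta of the clocksource being checked */
        delta   = clocksource_delta(csnow, cs->cs_last, cs->mask);
        cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);

        cs->cs_last = csnow;
        cs->wd_last = wdnow;

        /* the clocksource is declared unstable when the two deltas
         * disagree by more than the watchdog threshold */
        if (abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)
                clocksource_unstable(cs, cs_nsec - wd_nsec);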

I do recall seeing problematic watchdog behavior back in the day w/
PREEMPT_RT when a high priority task really starved the watchdog for a
long time. When we came back the hpet had wrapped, making the wd_delta
look quite small relative to the TSC delta, causing improper
disqualification of the TSC.

But in that case the watchdog would disqualify the TSC after the
stall, and here the stall is happening right afterwards. So I'm not
sure.
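
A minimal self-contained sketch of that failure mode, with made-up counter
values purely for illustration (clocksource_delta() modeled as the plain
masked subtraction it essentially is):

        #include <stdio.h>
        #include <stdint.h>

        /* simplified stand-in for the kernel's clocksource_delta() */
        static uint64_t clocksource_delta(uint64_t now, uint64_t last, uint64_t mask)
        {
                return (now - last) & mask;
        }

        int main(void)
        {
                const uint64_t hpet_mask = 0xffffffffULL;  /* 32-bit counter */
                uint64_t wd_last = 0x10000000;

                /* If the watchdog timer is starved for ~300s, the hpet wraps
                 * and ends up just past where it started... */
                uint64_t wd_now = 0x10001000;

                printf("apparent wd delta: %llu cycles\n",
                       (unsigned long long)clocksource_delta(wd_now, wd_last, hpet_mask));

                /* ...while the TSC delta over the same interval is huge, so
                 * cs_nsec dwarfs wd_nsec and the TSC gets the blame. */
                return 0;
        }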

I'll look around for some other suspects though. The nohz ntp
improvements might be high on my list there, since they were a 3.17 item.
Will dig.

thanks
-john

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 19:00                                                 ` Dave Jones
  2014-12-03 19:25                                                   ` Linus Torvalds
@ 2014-12-03 19:59                                                   ` Chris Mason
  2014-12-03 20:11                                                     ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Chris Mason @ 2014-12-03 19:59 UTC (permalink / raw)
  To: Dave Jones
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List



On Wed, Dec 3, 2014 at 2:00 PM, Dave Jones <davej@redhat.com> wrote:
> On Wed, Dec 03, 2014 at 10:45:57AM -0800, Linus Torvalds wrote:
>  > On Wed, Dec 3, 2014 at 10:41 AM, Dave Jones <davej@redhat.com> 
> wrote:
>  > >
>  > > I've been stuck on this kernel for a few days now trying to 
> prove it
>  > > good/bad one way or the other, and I'm leaning towards good, 
> given
>  > > that it recovers, even though the traces look similar.
>  >
>  > Ugh. But this does *not* happen with 3.16, right? Even the 
> non-fatal case?
> 
> correct. at least not in any of the runs that I did to date.
> 
>  > If so, I'd be inclined to call it "bad". But there might well be 
> two
>  > bugs: one that makes that NMI watchdog trigger, and another one 
> that
>  > then makes it be a hard lockup. I'd think it would be good to 
> figure
>  > out the "NMI watchdog starts triggering" one first, though.
> 
> I think you're right.
> 
> So right after sending my last mail, I rebooted, and restarted the run
> on the same kernel again.
> 
> As I was writing this mail, this happened.
> 
> [  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
> [trinity-c178:20182]
> 
> and that's all that made it over the console. I couldn't log in via 
> ssh,
> and thought "ah-ha, so it IS bad".  I walked over to reboot it, and
> found I could actually log in on the console. check out this dmesg..
> 
> [  503.683055] Clocksource tsc unstable (delta = -95946009388 ns)
> [  503.692038] Switched to clocksource hpet
> [  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
> [trinity-c178:20182]

Neat.  We often see a switch to hpet on boxes as they are diving into 
softlockup pain, but it's not usually before the softlockups.

Are you configured for CONFIG_NOHZ_FULL?

I'd love to blame the only commit to kernel/smp.c between 3.16 and 3.17

commit 478850160636c4f0b2558451df0e42f8c5a10939
Author: Frederic Weisbecker <fweisbec@gmail.com>
Date:   Thu May 8 01:37:48 2014 +0200

    irq_work: Implement remote queueing

You've also mentioned a few times that messages stopped hitting the 
console?


commit 5874af2003b1aaaa053128d655710140e3187226
Author: Jan Kara <jack@suse.cz>
Date:   Wed Aug 6 16:09:10 2014 -0700

    printk: enable interrupts before calling 
console_trylock_for_printk()

-chris


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 19:48                                                     ` Linus Torvalds
@ 2014-12-03 20:09                                                       ` Dave Jones
  2014-12-03 20:37                                                         ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-03 20:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Thomas Gleixner, John Stultz

On Wed, Dec 03, 2014 at 11:48:55AM -0800, Linus Torvalds wrote:
 > On Wed, Dec 3, 2014 at 11:25 AM, Linus Torvalds
 > <torvalds@linux-foundation.org> wrote:
 > >
 > > I don't know. This piece of code:
 > >
 > >         delta = clocksource_delta(wdnow, cs->wd_last, watchdog->mask);
 > >
 > > makes no sense to me.
 > 
 > Yeah, no, I see what's up. I missed that whole wd_last vs cs_last
 > pairing. I guess that part is all good. There are other crazy issues
 > in there, though, like the double test of 'watchdog_reset_pending'. So
 > I still wonder, though, since that odd 96-second delta is just insane
 > and makes no sense from a TSC standpoint (it's closer to a 32-bit
 > overflow of a hpet counter, but that sounds off too).

fwiw, there's quite a bit of variance in the delta that seems to show up.

Clocksource tsc unstable (delta = -1010986453 ns) 
Clocksource tsc unstable (delta = -112130224777 ns) 
Clocksource tsc unstable (delta = -154880389323 ns) 
Clocksource tsc unstable (delta = -165033940543 ns) 
Clocksource tsc unstable (delta = -16610147135 ns) 
Clocksource tsc unstable (delta = -169783264218 ns) 
Clocksource tsc unstable (delta = -183044061613 ns) 
Clocksource tsc unstable (delta = -188697049603 ns) 
Clocksource tsc unstable (delta = -190100649573 ns) 
Clocksource tsc unstable (delta = -192732788150 ns) 
Clocksource tsc unstable (delta = -211622574067 ns) 
Clocksource tsc unstable (delta = -219378634234 ns) 
Clocksource tsc unstable (delta = -226609871873 ns) 
Clocksource tsc unstable (delta = -228355467642 ns) 
Clocksource tsc unstable (delta = -233950238702 ns) 
Clocksource tsc unstable (delta = -243666538341 ns) 
Clocksource tsc unstable (delta = -251535074315 ns) 
Clocksource tsc unstable (delta = -26880030622 ns) 
Clocksource tsc unstable (delta = -37899993747 ns) 
Clocksource tsc unstable (delta = -50471031780 ns) 
Clocksource tsc unstable (delta = -6821168015 ns) 
Clocksource tsc unstable (delta = -7067883139 ns)
Clocksource tsc unstable (delta = -76682050692 ns) 
Clocksource tsc unstable (delta = -8296489689 ns) 
Clocksource tsc unstable (delta = -88373866635 ns) 
Clocksource tsc unstable (delta = -89099193219 ns) 
Clocksource tsc unstable (delta = -95946009388 ns) 


(sort -u'ed post grep, the ordering here doesn't mean anything)

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 19:59                                                   ` Chris Mason
@ 2014-12-03 20:11                                                     ` Dave Jones
  2014-12-03 20:56                                                       ` Chris Mason
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-03 20:11 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Wed, Dec 03, 2014 at 02:59:58PM -0500, Chris Mason wrote:
 
 
 > > [  503.692038] Switched to clocksource hpet
 > > [  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
 > > [trinity-c178:20182]
 > 
 > Neat.  We often see switching to hpet on boxes as they are diving into 
 > softlockup pain, but it's not usually before the softlockups.
 > 
 > Are you configured for CONFIG_NOHZ_FULL?
 

No. I have recollections that I did run with that early on in this, but
I think someone asked me to try without that a few weeks back, and it's
been off since. (made no difference)

 > I'd love to blame the only commit to kernel/smp.c between 3.16 and 3.17
 > 
 > commit 478850160636c4f0b2558451df0e42f8c5a10939
 > Author: Frederic Weisbecker <fweisbec@gmail.com>
 > Date:   Thu May 8 01:37:48 2014 +0200
 > 
 >     irq_work: Implement remote queueing
 > 
 > You've also mentioned a few times where messages stopped hitting the 
 > console?
 >
 > commit 5874af2003b1aaaa053128d655710140e3187226
 > Author: Jan Kara <jack@suse.cz>
 > Date:   Wed Aug 6 16:09:10 2014 -0700
 > 
 >     printk: enable interrupts before calling 
 > console_trylock_for_printk()

Hmm..

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 19:56                                                     ` John Stultz
@ 2014-12-03 20:37                                                       ` Thomas Gleixner
  2014-12-03 20:44                                                         ` Dave Jones
  2014-12-03 20:39                                                       ` Thomas Gleixner
  1 sibling, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-03 20:37 UTC (permalink / raw)
  To: John Stultz
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Wed, 3 Dec 2014, John Stultz wrote:
> On Wed, Dec 3, 2014 at 11:25 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Wed, Dec 3, 2014 at 11:00 AM, Dave Jones <davej@redhat.com> wrote:
> >>
> >> So right after sending my last mail, I rebooted, and restarted the run
> >> on the same kernel again.
> >>
> >> As I was writing this mail, this happened.
> >>
> >> [  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c178:20182]
> >>
> >> and that's all that made it over the console. I couldn't log in via ssh,
> >> and thought "ah-ha, so it IS bad".  I walked over to reboot it, and
> >> found I could actually log in on the console. check out this dmesg..
> >>
> >> [  503.683055] Clocksource tsc unstable (delta = -95946009388 ns)
> >> [  503.692038] Switched to clocksource hpet
> >> [  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c178:20182]
> >
> > Interesting. That whole NMI watchdog thing happens pretty much 22s
> > after the "TSC unstable" message.
> >
> > Have you ever seen that TSC issue before? The watchdog relies on
> > comparing get_timestamp() differences, so if the timestamp was
> > incorrect...
> >
> > Maybe that whole "clocksource_watchdog()" is bogus. That delta is
> > about 96 seconds, sounds very odd. I'm not seeing how the TSC could
> > actually screw up that badly, so I'd almost be more likely to blame the
> > "watchdog" clock.
> >
> > I don't know. This piece of code:
> >
> >         delta = clocksource_delta(wdnow, cs->wd_last, watchdog->mask);
> >
> > makes no sense to me. Shouldn't it be
> >
> >         delta = clocksource_delta(wdnow, watchdog->wd_last, watchdog->mask);
> 
> So we store wdnow value in the cs->wd_last a few lines below, so I
> don't think that's problematic.
> 
> I do recall seeing problematic watchdog behavior back in the day w/
> PREEMPT_RT when a high priority task really starved the watchdog for a
> long time. When we came back the hpet had wrapped, making the wd_delta
> look quite small relative to the TSC delta, causing improper
> disqualification of the TSC.

Right, that resulted in a delta > 0. I have no idea how we could
create a negative delta via wrapping the HPET around, i.e. HPET being
96 seconds ahead of TSC.
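
To make that wraparound scenario concrete, here is a minimal userspace model
of the masked watchdog delta (a sketch only: it assumes the usual 32-bit HPET
main counter at 14.318 MHz, and the function below merely mimics the kernel
helper of the same name quoted earlier):

#include <stdint.h>
#include <stdio.h>

/* rough model of (now - last) & mask as used by the clocksource watchdog */
static uint64_t clocksource_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

int main(void)
{
	uint64_t mask = 0xffffffffULL;               /* 32-bit counter */
	uint64_t last = 0xfff00000ULL;               /* previous watchdog read */
	uint64_t elapsed = (1ULL << 32) + 0x200000;  /* true ticks: one wrap plus a bit */
	uint64_t now = (last + elapsed) & mask;      /* what the wrapped counter shows */

	/* ~300s of real time is seen as ~0.15s, so the (correct) TSC delta
	 * looks far too big and the TSC gets disqualified, i.e. delta > 0. */
	printf("true elapsed: %llu ticks, watchdog sees: %llu ticks\n",
	       (unsigned long long)elapsed,
	       (unsigned long long)clocksource_delta(now, last, mask));
	return 0;
}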

This looks more like a genuine TSC wreckage. So we have these possible
causes:

   1) SMI

   2) Power states

   3) Writing to the wrong MSR

So I assume that 1/2 are a non issue. They should surface in normal
non fuzzed operation as well.

Dave, does that TSC unstable thing always happen AFTER you started
fuzzing? If yes, what is the fuzzer doing this time?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 20:09                                                       ` Dave Jones
@ 2014-12-03 20:37                                                         ` Linus Torvalds
  2014-12-03 20:55                                                           ` Thomas Gleixner
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-03 20:37 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Thomas Gleixner,
	John Stultz

On Wed, Dec 3, 2014 at 12:09 PM, Dave Jones <davej@redhat.com> wrote:
>
> fwiw, there's quite a bit of variance in the delta that seems to show up.

Yeah, anything from 1 to 251 seconds. That's definitely "quite a bit
of variance"

                Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 19:56                                                     ` John Stultz
  2014-12-03 20:37                                                       ` Thomas Gleixner
@ 2014-12-03 20:39                                                       ` Thomas Gleixner
  2014-12-04  3:15                                                         ` Chris Mason
  1 sibling, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-03 20:39 UTC (permalink / raw)
  To: John Stultz
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Wed, 3 Dec 2014, John Stultz wrote:
> I'll look around for some other suspects though. The nohz ntp
> improvments might be high on my list there, since it was a 3.17 item.
> Will dig.

Neither the clocksource watchdog nor the timestamp of the kernel
watchdog are affected by NTP adjustments.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 20:37                                                       ` Thomas Gleixner
@ 2014-12-03 20:44                                                         ` Dave Jones
  2014-12-03 20:59                                                           ` Thomas Gleixner
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-03 20:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: John Stultz, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Wed, Dec 03, 2014 at 09:37:10PM +0100, Thomas Gleixner wrote:
 
 > Right, that resulted in a delta > 0. I have no idea how we could
 > create a negative delta via wrapping the HPET around, i.e. HPET being
 > 96 seconds ahead of TSC.
 > 
 > This looks more like a genuine TSC wreckage. So we have these possible
 > causes:
 > 
 >    1) SMI
 > 
 >    2) Power states
 > 
 >    3) Writing to the wrong MSR
 > 
 > So I assume that 1/2 are a non issue. They should surface in normal
 > non fuzzed operation as well.
 > 
 > Dave, does that TSC unstable thing always happen AFTER you started
 > fuzzing? If yes, what is the fuzzer doing this time?

I've seen it even after doing just a kernel build sometimes. It's
happened so regularly that I've just assumed "the tsc is crap on this box".

On occasion I even see it shortly after boot, while idle.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 20:37                                                         ` Linus Torvalds
@ 2014-12-03 20:55                                                           ` Thomas Gleixner
  2014-12-03 21:14                                                             ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-03 20:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, John Stultz

On Wed, 3 Dec 2014, Linus Torvalds wrote:
> On Wed, Dec 3, 2014 at 12:09 PM, Dave Jones <davej@redhat.com> wrote:
> >
> > fwiw, there's quite a bit of variance in the delta that seems to show up.
> 
> Yeah, anything from 1 to 251 seconds. That's definitely "quite a bit
> of variance"

But it's always negative, which means HPET is always ahead of
TSC. That excludes pretty much the clocksource watchdog starvation
issue which results in TSC being ahead of HPET due to a HPET
wraparound (which takes ~300s).
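
(That ~300s figure is just the counter width and rate: assuming the usual
32-bit HPET main counter at 14.31818 MHz, 2^32 / 14318180 Hz works out to
roughly 300 seconds per wrap.)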

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 20:11                                                     ` Dave Jones
@ 2014-12-03 20:56                                                       ` Chris Mason
  0 siblings, 0 replies; 486+ messages in thread
From: Chris Mason @ 2014-12-03 20:56 UTC (permalink / raw)
  To: Dave Jones
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List



On Wed, Dec 3, 2014 at 3:11 PM, Dave Jones <davej@redhat.com> wrote:
> On Wed, Dec 03, 2014 at 02:59:58PM -0500, Chris Mason wrote:
> 
>  > You've also mentioned a few times where messages stopped hitting the
>  > console?
>  >
>  > commit 5874af2003b1aaaa053128d655710140e3187226
>  > Author: Jan Kara <jack@suse.cz>
>  > Date:   Wed Aug 6 16:09:10 2014 -0700
>  >
>  >     printk: enable interrupts before calling
>  > console_trylock_for_printk()
> 
> Hmm..

Jan Kara has been talking about printk deadlocks for some time.  I 
wouldn't have expected it to get worse since 3.16, but we do have a few 
printk cleanups and fixes in there.

-chris




^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 20:44                                                         ` Dave Jones
@ 2014-12-03 20:59                                                           ` Thomas Gleixner
  2014-12-03 21:05                                                             ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-03 20:59 UTC (permalink / raw)
  To: Dave Jones
  Cc: John Stultz, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Wed, 3 Dec 2014, Dave Jones wrote:
> On Wed, Dec 03, 2014 at 09:37:10PM +0100, Thomas Gleixner wrote:
>  
>  > Right, that resulted in a delta > 0. I have no idea how we could
>  > create a negative delta via wrapping the HPET around, i.e. HPET being
>  > 96 seconds ahead of TSC.
>  > 
>  > This looks more like a genuine TSC wreckage. So we have these possible
>  > causes:
>  > 
>  >    1) SMI
>  > 
>  >    2) Power states
>  > 
>  >    3) Writing to the wrong MSR
>  > 
>  > So I assume that 1/2 are a non issue. They should surface in normal
>  > non fuzzed operation as well.
>  > 
>  > Dave, does that TSC unstable thing always happen AFTER you started
>  > fuzzing? If yes, what is the fuzzer doing this time?
> 
> I've seen it even after doing just a kernel build sometimes. It's
> happened so regularly that I've just assumed "the tsc is crap on this box".
> 
> On occasion I even see it shortly after boot, while idle.

Can you please provide the cpuinfo flags of that box?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 20:59                                                           ` Thomas Gleixner
@ 2014-12-03 21:05                                                             ` Dave Jones
  2014-12-03 21:48                                                               ` Thomas Gleixner
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-03 21:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: John Stultz, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Wed, Dec 03, 2014 at 09:59:20PM +0100, Thomas Gleixner wrote:

 > Can you please provide the cpuinfo flags of that box?

flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64
monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept
vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
xsaveopt





^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 20:55                                                           ` Thomas Gleixner
@ 2014-12-03 21:14                                                             ` Linus Torvalds
  2014-12-03 22:19                                                               ` Thomas Gleixner
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-03 21:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, John Stultz

On Wed, Dec 3, 2014 at 12:55 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> But it's always negative, which means HPET is always ahead of
> TSC. That excludes pretty much the clocksource watchdog starvation
> issue which results in TSC being ahead of HPET due to a HPET
> wraparound (which takes ~300s).

Still, I'd be more likely to trust the TSC than the HPET on modern
machines.. And DaveJ's machine isn't some old one.

Of course, there's always BIOS games. Can we read the TSC offset
register and check it being constant (modulo sleep events)?

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 21:05                                                             ` Dave Jones
@ 2014-12-03 21:48                                                               ` Thomas Gleixner
  0 siblings, 0 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-03 21:48 UTC (permalink / raw)
  To: Dave Jones
  Cc: John Stultz, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Wed, 3 Dec 2014, Dave Jones wrote:
> On Wed, Dec 03, 2014 at 09:59:20PM +0100, Thomas Gleixner wrote:
> 
>  > Can you please provide the cpuinfo flags of that box?
> 
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
> syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
> nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64
> monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept
> vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
> xsaveopt

So that has nonstop_tsc and constant_tsc, which means that we switch
to sched_clock_stable, i.e. no range checks, nothing. We just take the
raw value and use it.

The clocksource code is a bit more paranoid and lets the TSC be
monitored by the watchdog. Now, if the TSC is detected as unstable we
should switch back to sched_clock_unstable, but we don't have a
mechanism for that.

That was obviously not considered when the sched_clock_stable stuff
was introduced. So sched_clock() happily uses TSC as a reliable thing
even when the clocksource code detected that it is crap.

For sure we need something here, but that sched_clock_stable
mechanism got introduced in 3.14, so it does not make any sense that
you observe that only post 3.16.
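
As a rough userspace analogue of the watchdog cross-check mentioned above
(a sketch only, x86 via __rdtsc(), and with an assumed rather than calibrated
3 GHz TSC frequency, so it can only show gross drift of the kind reported in
this thread):

#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <x86intrin.h>

static uint64_t mono_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(void)
{
	const double tsc_hz = 3.0e9;	/* assumed, not calibrated */
	uint64_t t0 = __rdtsc();
	uint64_t n0 = mono_ns();

	sleep(1);

	uint64_t t1 = __rdtsc();
	uint64_t n1 = mono_ns();

	/* a large mismatch here is the userspace version of
	 * "Clocksource tsc unstable (delta = ... ns)" */
	printf("tsc: %.0f ns elapsed, reference clock: %llu ns\n",
	       (double)(t1 - t0) / tsc_hz * 1e9,
	       (unsigned long long)(n1 - n0));
	return 0;
}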

Thanks,

	tglx









^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 21:14                                                             ` Linus Torvalds
@ 2014-12-03 22:19                                                               ` Thomas Gleixner
  2014-12-03 23:21                                                                 ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-03 22:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, John Stultz

On Wed, 3 Dec 2014, Linus Torvalds wrote:
> On Wed, Dec 3, 2014 at 12:55 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > But it's always negative, which means HPET is always ahead of
> > TSC. That excludes pretty much the clocksource watchdog starvation
> > issue which results in TSC being ahead of HPET due to a HPET
> > wraparound (which takes ~300s).
> 
> Still, I'd be more likely to trust the TSC than the HPET on modern
> machines.. And DaveJ's machine isn't some old one.

Well, that does not explain the softlock watchdog which is solely
relying on the TSC.

> Of course, there's always BIOS games. Can we read the TSC offset
> register and check it being constant (modulo sleep events)?

The kernel does not touch it. Here is an untested hack to verify it on
every local apic timer interrupt. Not nice, but simple :)

Thanks.

	tglx
---
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index ba6cc041edb1..69b0a8143e83 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -554,6 +554,7 @@ static struct clock_event_device lapic_clockevent = {
 	.irq		= -1,
 };
 static DEFINE_PER_CPU(struct clock_event_device, lapic_events);
+static DEFINE_PER_CPU(u64, tsc_adjust);
 
 /*
  * Setup the local APIC timer for this CPU. Copy the initialized values
@@ -569,6 +570,13 @@ static void setup_APIC_timer(void)
 		lapic_clockevent.rating = 150;
 	}
 
+	if (this_cpu_has(X86_FEATURE_TSC_ADJUST)) {
+		u64 adj;
+
+		rdmsrl(MSR_IA32_TSC_ADJUST, adj);
+		__this_cpu_write(tsc_adjust, adj);
+	}
+
 	memcpy(levt, &lapic_clockevent, sizeof(*levt));
 	levt->cpumask = cpumask_of(smp_processor_id());
 
@@ -912,6 +920,19 @@ static void local_apic_timer_interrupt(void)
 		return;
 	}
 
+	if (this_cpu_has(X86_FEATURE_TSC_ADJUST)) {
+		u64 adj;
+
+		rdmsrl(MSR_IA32_TSC_ADJUST, adj);
+		if (adj != __this_cpu_read(tsc_adjust)) {
+			pr_err("TSC adjustment on cpu %d changed %llu -> %llu\n",
+			       cpu,
+			       (unsigned long long) __this_cpu_read(tsc_adjust),
+			       (unsigned long long) adj);
+			__this_cpu_write(tsc_adjust, adj);
+		}
+	}
+
 	/*
 	 * the NMI deadlock-detector uses this.
 	 */

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 22:19                                                               ` Thomas Gleixner
@ 2014-12-03 23:21                                                                 ` Dave Jones
  2014-12-03 23:49                                                                   ` Thomas Gleixner
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-03 23:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, John Stultz

On Wed, Dec 03, 2014 at 11:19:11PM +0100, Thomas Gleixner wrote:
 > On Wed, 3 Dec 2014, Linus Torvalds wrote:
 > > On Wed, Dec 3, 2014 at 12:55 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
 > > >
 > > > But it's always negative, which means HPET is always ahead of
 > > > TSC. That excludes pretty much the clocksource watchdog starvation
 > > > issue which results in TSC being ahead of HPET due to a HPET
 > > > wraparound (which takes ~300s).
 > > 
 > > Still, I'd be more likely to trust the TSC than the HPET on modern
 > > machines.. And DaveJ's machine isn't some old one.
 > 
 > Well, that does not explain the softlock watchdog which is solely
 > relying on the TSC.
 > 
 > > Of course, there's always BIOS games. Can we read the TSC offset
 > > register and check it being constant (modulo sleep events)?
 > 
 > The kernel does not touch it. Here is a untested hack to verify it on
 > every local apic timer interrupt. Not nice, but simple :)
 
 > +			pr_err("TSC adjustment on cpu %d changed %llu -> %llu\n",
 > +			       cpu,
 > +			       (unsigned long long) __this_cpu_read(tsc_adjust),
 > +			       (unsigned long long) adj);

I just got 

[ 1472.614433] Clocksource tsc unstable (delta = -26373048906 ns)

without any sign of the pr_err above.

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 23:21                                                                 ` Dave Jones
@ 2014-12-03 23:49                                                                   ` Thomas Gleixner
  2014-12-04  0:19                                                                     ` Linus Torvalds
  2014-12-04  0:20                                                                     ` Dave Jones
  0 siblings, 2 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-03 23:49 UTC (permalink / raw)
  To: Dave Jones
  Cc: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, John Stultz

On Wed, 3 Dec 2014, Dave Jones wrote:
> On Wed, Dec 03, 2014 at 11:19:11PM +0100, Thomas Gleixner wrote:
>  > On Wed, 3 Dec 2014, Linus Torvalds wrote:
>  > > On Wed, Dec 3, 2014 at 12:55 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>  > > >
>  > > > But it's always negative, which means HPET is always ahead of
>  > > > TSC. That excludes pretty much the clocksource watchdog starvation
>  > > > issue which results in TSC being ahead of HPET due to a HPET
>  > > > wraparound (which takes ~300s).
>  > > 
>  > > Still, I'd be more likely to trust the TSC than the HPET on modern
>  > > machines.. And DaveJ's machine isn't some old one.
>  > 
>  > Well, that does not explain the softlock watchdog which is solely
>  > relying on the TSC.
>  > 
>  > > Of course, there's always BIOS games. Can we read the TSC offset
>  > > register and check it being constant (modulo sleep events)?
>  > 
>  > The kernel does not touch it. Here is a untested hack to verify it on
>  > every local apic timer interrupt. Not nice, but simple :)
>  
>  > +			pr_err("TSC adjustment on cpu %d changed %llu -> %llu\n",
>  > +			       cpu,
>  > +			       (unsigned long long) __this_cpu_read(tsc_adjust),
>  > +			       (unsigned long long) adj);
> 
> I just got 
> 
> [ 1472.614433] Clocksource tsc unstable (delta = -26373048906 ns)
> 
> without any sign of the pr_err above.

Bah. Would have been too simple ....

Could you please run Ingos time-warp test on that machine for a while?

   http://people.redhat.com/mingo/time-warp-test/time-warp-test.c

Please change:

- #define TEST_CLOCK 0
+ #define TEST_CLOCK 1
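
The core of such a warp test is just a bunch of threads hammering the same
clock under a lock and checking that it never steps backwards. A minimal
sketch (not Ingo's actual program; it reads clock_gettime() here; build with
gcc -pthread):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t last_ns;
static unsigned long warps;

static uint64_t now_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

static void *worker(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&lock);
		uint64_t now = now_ns();
		if (now < last_ns)
			printf("WARP: clock stepped back %llu ns (#%lu)\n",
			       (unsigned long long)(last_ns - now), ++warps);
		else
			last_ns = now;
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	int i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	pthread_join(tid[0], NULL);	/* runs until interrupted */
	return 0;
}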

I'll dig further into the time/clocksource whatever related changes
post 3.16

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 23:49                                                                   ` Thomas Gleixner
@ 2014-12-04  0:19                                                                     ` Linus Torvalds
  2014-12-04  1:02                                                                       ` Thomas Gleixner
  2014-12-04  0:20                                                                     ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-04  0:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, John Stultz

On Wed, Dec 3, 2014 at 3:49 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Bah. Would have been too simple ....

I do think you tend to trust the hpet too much.

Yes, we've had issues with people doing bad things to the tsc, but on
the whole I really would tend to trust the internal CPU counter a
_lot_ more than the external hpet.

So I think it's equally (if not more) likely that switching from tsc
to hpet causes trouble, because I'd trust the tsc more than the hpet.

DaveJ, do this:

> Could you please run Ingos time-warp test on that machine for a while?

but perhaps also boot with "tsc=reliable", which _should_ get rid of
that CLOCK_SOURCE_MUST_VERIFY, and the clocksource watchdog should do
nothing.

Thomas? Am I misreading that?

              Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 23:49                                                                   ` Thomas Gleixner
  2014-12-04  0:19                                                                     ` Linus Torvalds
@ 2014-12-04  0:20                                                                     ` Dave Jones
  2014-12-04  0:59                                                                       ` Thomas Gleixner
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-04  0:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, John Stultz

On Thu, Dec 04, 2014 at 12:49:29AM +0100, Thomas Gleixner wrote:

 > > I just got 
 > > 
 > > [ 1472.614433] Clocksource tsc unstable (delta = -26373048906 ns)
 > > 
 > > without any sign of the pr_err above.
 > 
 > Bah. Would have been too simple ....
 > 
 > Could you please run Ingos time-warp test on that machine for a while?
 > 
 >    http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
 > 
 > Please change:
 > 
 > - #define TEST_CLOCK 0
 > + #define TEST_CLOCK 1

Seems to be 32-bit only, so I built it with -m32. I assume that's ok?

Also, should I run it in isolation, with nothing else going on,
or under load where I see problems ?

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 18:45                                               ` Linus Torvalds
  2014-12-03 19:00                                                 ` Dave Jones
@ 2014-12-04  0:27                                                 ` Dave Jones
  2014-12-05 17:15                                                 ` Dave Jones
  2 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-04  0:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Wed, Dec 03, 2014 at 10:45:57AM -0800, Linus Torvalds wrote:
 > On Wed, Dec 3, 2014 at 10:41 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > I've been stuck on this kernel for a few days now trying to prove it
 > > good/bad one way or the other, and I'm leaning towards good, given
 > > that it recovers, even though the traces look similar.
 > 
 > Ugh. But this does *not* happen with 3.16, right? Even the non-fatal case?
 > 
 > If so, I'd be inclined to call it "bad". But there might well be two
 > bugs: one that makes that NMI watchdog trigger, and another one that
 > then makes it be a hard lockup. I'd think it would be good to figure
 > out the "NMI watchdog starts triggering" one first, though.

So I just got a definite "bad" case on 3.17-rc1: got NMI spew from two
CPUs, then the box was a boat anchor. Even the keyboard stopped responding;
I had to power cycle it to get it back up.

Given it took 2 days to prove this one, I'm really hoping subsequent
bisect branches prove themselves faster.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04  0:20                                                                     ` Dave Jones
@ 2014-12-04  0:59                                                                       ` Thomas Gleixner
  2014-12-04  1:32                                                                         ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-04  0:59 UTC (permalink / raw)
  To: Dave Jones
  Cc: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, John Stultz

On Wed, 3 Dec 2014, Dave Jones wrote:
> On Thu, Dec 04, 2014 at 12:49:29AM +0100, Thomas Gleixner wrote:
> 
>  > > I just got 
>  > > 
>  > > [ 1472.614433] Clocksource tsc unstable (delta = -26373048906 ns)
>  > > 
>  > > without any sign of the pr_err above.
>  > 
>  > Bah. Would have been too simple ....
>  > 
>  > Could you please run Ingos time-warp test on that machine for a while?
>  > 
>  >    http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
>  > 
>  > Please change:
>  > 
>  > - #define TEST_CLOCK 0
>  > + #define TEST_CLOCK 1
> 
> Seems to be 32-bit only, so I built it with -m32. I assume that's ok?

It has some _x86_64 ifdeffery, but I'm too tired to stare at that
now. 32bit should show the issue as well.
 
> Also, should I run it in isolation, with nothing else going on,
> or under load where I see problems ?

isolated is usually the best thing as it has the highest density of
reads.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04  0:19                                                                     ` Linus Torvalds
@ 2014-12-04  1:02                                                                       ` Thomas Gleixner
  0 siblings, 0 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-04  1:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, John Stultz

On Wed, 3 Dec 2014, Linus Torvalds wrote:
> On Wed, Dec 3, 2014 at 3:49 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Bah. Would have been too simple ....
> 
> I do think you tend to trust the hpet too much.
> 
> Yes, we've had issues with people doing bad things to the tsc, but on
> the whole I really would tend to trust the internal CPU counter a
> _lot_ more than the external hpet.
> 
> So I think it's equally (if not more) likely that switching from tsc
> to hpet causes trouble, because I'd trust the tsc more than the hpet.

Given my experience I trust neither of them too much which makes it
admittedly weird to assign one as the supervisor of the other.

But yes, on newer machines the TSC tends to be halfway reliable when
you are not exposed to the never ending supply of BIOS bugs.

Though, that still does not explain the softlock watchdog issue as
that is completely independent of HPET.

But in the case that the clocksource switches to HPET it is not
completely independent because the hrtimer event programming depends
on it. But that would require the following scenario:

 hrtimer_start(timer) /* timer is first to expire timer */
     clockevents_program_event(timer->expires)
       delta = expires - ktime_get();
       evtdev->set_next_event(delta);

 So if ktime_get() returns an insane value in the past then delta can
 become large and of course nothing is going to reprogram the clock
 event device unless there is a timer started which is earlier than
 the one which programmed the large delta.

 But ktime_get() cannot return a time which is before the last update
 of the timekeeper. It might return something in the future, but that
 would expire the timer earlier not later.

So there is something else here. If the programmed next event does not
fire _AND_ there is no earlier timer queued, then nothing which
depends on queued timers (hrtimer/timerlist) is going to be
scheduled. In the case of highres timers or nohz enabled not even the
scheduler tick on that cpu would kick in. We've seen that before and
it causes interesting failures ...

Now that machine has tsc_deadline_timer. I haven't seen bug reports
for that one yet, but it might be worthwhile to disable that for a
test.

But then that does not explain the post 3.16 issue at all.

More questions than answers, sigh.

> DaveJ, do this:
> 
> > Could you please run Ingos time-warp test on that machine for a while?
> 
> but perhaps also boot with "tsc=reliable", which _should_ get rid of
> that CLOCK_SOURCE_MUST_VERIFY, and the clocksource watchdog should do
> nothing.
> 
> Thomas? Am I misreading that?

No, that's what it is supposed to do and I actually know that it works.

Now we could also do it the other way round and boot with
"clocksource=hpet", which should show potential HPET wreckage right
away when exposed to Ingos time warp test.
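
(For reference, the clocksource currently in use can be checked at run time
via /sys/devices/system/clocksource/clocksource0/current_clocksource, and
available_clocksource in the same directory lists the candidates.)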

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04  0:59                                                                       ` Thomas Gleixner
@ 2014-12-04  1:32                                                                         ` Dave Jones
  2014-12-04  3:45                                                                           ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-04  1:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, John Stultz

On Thu, Dec 04, 2014 at 01:59:01AM +0100, Thomas Gleixner wrote:

 > > Also, should I run it in isolation, with nothing else going on,
 > > or under load where I see problems ?
 > 
 > isolated is usually the best thing as it has the highest density of
 > reads.

Ok, it's been running for just over an hour with no 'fails' yet.
How long should I leave it ?

	Dave





^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 20:39                                                       ` Thomas Gleixner
@ 2014-12-04  3:15                                                         ` Chris Mason
  2014-12-04  5:49                                                           ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Chris Mason @ 2014-12-04  3:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: John Stultz, Linus Torvalds, Dave Jones, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

I asked Dave for his lockups from 3.17-rc1, and they were in the
flush_tlb code waiting for remote CPUs to finish flushing.  It feels
like that's a common theme, and there are a few commits there between
3.16 and 3.17.

One guess is that trinity is generating a huge number of tlb
invalidations over sparse and horrible ranges.  Perhaps the old code was
falling back to full tlb flushes before Dave Hansen's string of fixes?

commit a5102476a24bce364b74f1110005542a2c964103
Author: Dave Hansen <dave.hansen@linux.intel.com>

    x86/mm: Set TLB flush tunable to sane value (33)

This entirely untested diff forces full tlb flushes on the remote CPUs.
It adds a few parens for good luck, but the nr_pages var is only sent to
ftrace, so it's not the bug we're looking for.

I'm only changing the flushes done on remote CPUs.  The local CPU is
still doing up to 33 fine grained flushes.  That may or may not be a
good idea, but my hand waving only makes sense to me if we've got a
long string of fine grained flushes from tons of procs fanning out to
the remote CPUs.

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index ee61c36..72c4ff0 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -120,7 +120,7 @@ static void flush_tlb_func(void *info)
 		} else {
 			unsigned long addr;
 			unsigned long nr_pages =
-				f->flush_end - f->flush_start / PAGE_SIZE;
+				(f->flush_end - f->flush_start) / PAGE_SIZE;
 			addr = f->flush_start;
 			while (addr < f->flush_end) {
 				__flush_tlb_single(addr);
@@ -214,10 +214,8 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	}
 	trace_tlb_flush(TLB_LOCAL_MM_SHOOTDOWN, base_pages_to_flush);
 out:
-	if (base_pages_to_flush == TLB_FLUSH_ALL) {
-		start = 0UL;
-		end = TLB_FLUSH_ALL;
-	}
+	start = 0UL;
+	end = TLB_FLUSH_ALL;
 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
 		flush_tlb_others(mm_cpumask(mm), mm, start, end);
 	preempt_enable();
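
The first hunk above matters because division binds tighter than subtraction,
so without the parentheses the "number of pages" is garbage (though, as noted,
it only feeds the tracepoint). A tiny standalone demo with made-up addresses:

#include <stdio.h>

int main(void)
{
	unsigned long flush_start = 0x400000, flush_end = 0x500000;
	unsigned long wrong = flush_end - flush_start / 4096;	/* end - (start/4096) */
	unsigned long right = (flush_end - flush_start) / 4096;	/* 256 pages */

	printf("without parens: %lu, with parens: %lu\n", wrong, right);
	return 0;
}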

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04  1:32                                                                         ` Dave Jones
@ 2014-12-04  3:45                                                                           ` Dave Jones
  0 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-04  3:45 UTC (permalink / raw)
  To: Thomas Gleixner, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, John Stultz

On Wed, Dec 03, 2014 at 08:32:00PM -0500, Dave Jones wrote:
 > On Thu, Dec 04, 2014 at 01:59:01AM +0100, Thomas Gleixner wrote:
 > 
 >  > > Also, should I run it in isolation, with nothing else going on,
 >  > > or under load where I see problems ?
 >  > 
 >  > isolated is usually the best thing as it has the highest density of
 >  > reads.
 > 
 > Ok, it's been running for just over an hour with no 'fails' yet.
 > How long should I leave it ?

Ok, I left it running for four hours with no observed failures.
I'd leave it running longer, but I might not have this machine
in the new year, so I'm running out of time to complete bisections
if they take a day or two to complete each step.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04  3:15                                                         ` Chris Mason
@ 2014-12-04  5:49                                                           ` Linus Torvalds
  2014-12-04 14:57                                                             ` Chris Mason
  2014-12-04 15:22                                                             ` Dave Hansen
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-04  5:49 UTC (permalink / raw)
  To: Chris Mason, Thomas Gleixner, John Stultz, Linus Torvalds,
	Dave Jones, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Wed, Dec 3, 2014 at 7:15 PM, Chris Mason <clm@fb.com> wrote:
>
> One guess is that trinity is generating a huge number of tlb
> invalidations over sparse and horrible ranges.  Perhaps the old code was
> falling back to full tlb flushes before Dave Hansen's string of fixes?

Hmm. I agree that we've had some of the backtraces look like TLB
flushing might be involved. Not all, though. And I'm not seeing where
a loop over up to 33 pages should matter over doing a full TLB flush.

What *might* matter is if we somehow get that number wrong, and a loop like

                        addr = f->flush_start;
                        while (addr < f->flush_end) {
                                __flush_tlb_single(addr);
                                addr += PAGE_SIZE;
                        }

ends up looping a *lot* due to some bug, and then the IPI itself would
take so long that the watchdog could trigger.

But I do not see how that could actually happen. As far as I can tell,
either the number of pages is limited to less than 33, or we have that
 TLB_FLUSH_ALL case.

Do  you see something I don't?

                  Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 15:22                                                               ` Linus Torvalds
@ 2014-12-04  8:43                                                                 ` Dâniel Fraga
  2014-12-04 16:18                                                                   ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-04  8:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Rorvick, Tejun Heo, Paul E. McKenney, Linux Kernel Mailing List

On Wed, 3 Dec 2014 07:22:44 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Anyway, Dâniel, if you restart the bisection today, start it one
> kernel earlier: re-test the last 'bad' kernel too. So start with
> reconfirming that the c9b88e958182 kernel was bad (that *might* be as
> easy as just checking your old kernel boot logs, and verifying that
> "yes, I really booted it, and yes, it clearly hung and I had to
> hard-reboot into it")

	Linus, today is your lucky day, because I think I found the
real bad commit (if it isn't, then it's something very close to it). I
managed to narrow the bisect and here's the result:

fd2ac4f4a65a7f34b0bc6433fcca1192d7ba8b8e is the first bad commit
commit fd2ac4f4a65a7f34b0bc6433fcca1192d7ba8b8e
Author: Frederic Weisbecker <fweisbec@gmail.com>
Date:   Tue Mar 18 21:12:53 2014 +0100

    nohz: Use nohz own full kick on 2nd task enqueue
    
    Now that we have a nohz full remote kick based on irq work, lets use
    it to notify a CPU that it's exiting single task mode.
    
    This unbloats a bit the scheduler IPI that the nohz code was abusing
    for its cool "callable anywhere/anytime" properties.
    
    Acked-by: Peter Zijlstra <peterz@infradead.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Kevin Hilman <khilman@linaro.org>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Viresh Kumar <viresh.kumar@linaro.org>
    Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

:040000 040000 5aa326eb5686b9343b56ab5d5e6779a0b988759d
3422b684c6d9121b888360789405050d0cf3cfdf M      kernel

****************************************************************

	And there's a nice Call Trace too:

Dec  4 06:03:33 tux kernel: [  737.180406] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 4, t=240007 jiffies, g=36360, c=36359, q=0)
Dec  4 06:03:33 tux kernel: [  737.180412] sending NMI to all CPUs:
Dec  4 06:03:33 tux kernel: [  737.180416] NMI backtrace for cpu 4
Dec  4 06:03:34 tux kernel: [  737.180419] CPU: 4 PID: 785 Comm: kwin Not tainted 3.16.0-rc1-00005-gfd2ac4f #11
Dec  4 06:03:34 tux kernel: [  737.180420] Hardware name: System manufacturer System Product Name/P8Z68-V PRO GEN3, BIOS 3603 11/09/2012
Dec  4 06:03:34 tux kernel: [  737.180422] task: ffff880215360780 ti: ffff88020630c000 task.ti: ffff88020630c000
Dec  4 06:03:34 tux kernel: [  737.180423] RIP: 0010:[<ffffffff811fec10>]  [<ffffffff811fec10>] __const_udelay+0x0/0x30
Dec  4 06:03:34 tux kernel: [  737.180429] RSP: 0000:ffff88021f303e08  EFLAGS: 00000086
Dec  4 06:03:34 tux kernel: [  737.180430] RAX: 0000000000000c00 RBX: 0000000000002710 RCX: 0000000000000006
Dec  4 06:03:34 tux kernel: [  737.180431] RDX: 0000000000000007 RSI: 0000000000000046 RDI: 0000000000418958
Dec  4 06:03:34 tux kernel: [  737.180433] RBP: ffff88021f303e18 R08: 00000000000005cb R09: 00000000000005cb
Dec  4 06:03:34 tux kernel: [  737.180434] R10: 00000268d4b0862d R11: 00000000000005ca R12: ffffffff814f9680
Dec  4 06:03:34 tux kernel: [  737.180435] R13: ffffffff81521bd8 R14: ffffffff814f9680 R15: 0000000000000004
Dec  4 06:03:34 tux kernel: [  737.180437] FS:  00007fe9f003a900(0000) GS:ffff88021f300000(0000) knlGS:0000000000000000
Dec  4 06:03:34 tux kernel: [  737.180438] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  4 06:03:34 tux kernel: [  737.180439] CR2: 00007fe93ba00000 CR3: 000000020635f000 CR4: 00000000000407e0
Dec  4 06:03:34 tux kernel: [  737.180440] Stack:
Dec  4 06:03:34 tux kernel: [  737.180441]  ffffffff8102f759 ffff88021f30d240 ffff88021f303e70 ffffffff8108dd41
Dec  4 06:03:34 tux kernel: [  737.180444]  ffffffff814f9680 ffffffff00000003 0000000000000000 0000000000000001
Dec  4 06:03:34 tux kernel: [  737.180445]  ffff880215360780 0000000000000000 0000000000000004 ffffffff81097730
Dec  4 06:03:34 tux kernel: [  737.180447] Call Trace:
Dec  4 06:03:34 tux kernel: [  737.180449]  <IRQ> 
Dec  4 06:03:34 tux kernel: [  737.180450]  [<ffffffff8102f759>] ? arch_trigger_all_cpu_backtrace+0x59/0x80
Dec  4 06:03:34 tux kernel: [  737.180457]  [<ffffffff8108dd41>] rcu_check_callbacks+0x6d1/0x740
Dec  4 06:03:35 tux kernel: [  737.180460]  [<ffffffff81097730>] ? tick_sched_handle.isra.20+0x40/0x40
Dec  4 06:03:35 tux kernel: [  737.180462]  [<ffffffff8104dea2>] update_process_times+0x42/0x70
Dec  4 06:03:35 tux kernel: [  737.180464]  [<ffffffff81097721>] tick_sched_handle.isra.20+0x31/0x40
Dec  4 06:03:35 tux kernel: [  737.180467]  [<ffffffff81097769>] tick_sched_timer+0x39/0x60
Dec  4 06:03:35 tux kernel: [  737.180469]  [<ffffffff810636a1>] __run_hrtimer.isra.33+0x41/0xd0
Dec  4 06:03:35 tux kernel: [  737.180472]  [<ffffffff81063a4f>] hrtimer_interrupt+0xef/0x250
Dec  4 06:03:35 tux kernel: [  737.180475]  [<ffffffff8102db65>] local_apic_timer_interrupt+0x35/0x60
Dec  4 06:03:35 tux kernel: [  737.180476]  [<ffffffff8102e12a>] smp_apic_timer_interrupt+0x3a/0x50
Dec  4 06:03:35 tux kernel: [  737.180480]  [<ffffffff81391a3a>] apic_timer_interrupt+0x6a/0x70
Dec  4 06:03:35 tux kernel: [  737.180481]  <EOI> 
Dec  4 06:03:35 tux kernel: [  737.180482]  [<ffffffff8109c4c6>] ? smp_call_function_many+0x256/0x270
Dec  4 06:03:35 tux kernel: [  737.180487]  [<ffffffff810c3290>] ? drain_pages+0x90/0x90
Dec  4 06:03:35 tux kernel: [  737.180489]  [<ffffffff810c3290>] ? drain_pages+0x90/0x90
Dec  4 06:03:35 tux kernel: [  737.180491]  [<ffffffff8109c5dc>] on_each_cpu_mask+0x2c/0x70
Dec  4 06:03:35 tux kernel: [  737.180493]  [<ffffffff810c0978>] drain_all_pages+0xb8/0xd0
Dec  4 06:03:35 tux kernel: [  737.180495]  [<ffffffff810c48a9>] __alloc_pages_nodemask+0x699/0x9f0
Dec  4 06:03:35 tux kernel: [  737.180498]  [<ffffffff810f84a2>] alloc_pages_vma+0x72/0x130
Dec  4 06:03:35 tux kernel: [  737.180501]  [<ffffffff8110579d>] do_huge_pmd_anonymous_page+0xed/0x3b0
Dec  4 06:03:35 tux kernel: [  737.180503]  [<ffffffff810e1721>] handle_mm_fault+0x141/0xae0
Dec  4 06:03:35 tux kernel: [  737.180506]  [<ffffffff810e6149>] ? vma_merge+0xf9/0x370
Dec  4 06:03:36 tux kernel: [  737.180508]  [<ffffffff81037f37>] __do_page_fault+0x167/0x4c0
Dec  4 06:03:36 tux kernel: [  737.180510]  [<ffffffff810e7e24>] ? do_mmap_pgoff+0x2e4/0x3d0
Dec  4 06:03:36 tux kernel: [  737.180513]  [<ffffffff8106f205>] ? local_clock+0x25/0x30
Dec  4 06:03:36 tux kernel: [  737.180516]  [<ffffffff8106fa5f>] ? vtime_account_user+0x4f/0x60
Dec  4 06:03:36 tux kernel: [  737.180517]  [<ffffffff810382ce>] do_page_fault+0x1e/0x70
Dec  4 06:03:36 tux kernel: [  737.180520]  [<ffffffff813921c2>] page_fault+0x22/0x30
Dec  4 06:03:36 tux kernel: [  737.180521] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 48 ff c8 75 fb 48 ff c8 5d c3 66 0f 1f 44 00 00 55 48 89 e5 ff 15 2e 61 30 00 5d c3 0f 1f 40 00 <55> 48 8d 04 bd 00 00 00 00 48 89 e5 65 48 8b 14 25 20 12 01 00 
Dec  4 06:03:36 tux kernel: [  737.180542] NMI backtrace for cpu 6
Dec  4 06:03:36 tux kernel: [  737.180545] CPU: 6 PID: 8441 Comm: sh Not tainted 3.16.0-rc1-00005-gfd2ac4f #11
Dec  4 06:03:36 tux kernel: [  737.180547] Hardware name: System manufacturer System Product Name/P8Z68-V PRO GEN3, BIOS 3603 11/09/2012
Dec  4 06:03:36 tux kernel: [  737.180548] task: ffff8800c93d7800 ti: ffff880202094000 task.ti: ffff880202094000
Dec  4 06:03:36 tux kernel: [  737.180550] RIP: 0033:[<00007fcbfadb200f>]  [<00007fcbfadb200f>] 0x7fcbfadb200f
Dec  4 06:03:36 tux kernel: [  737.180563] RSP: 002b:00007fffa287e200  EFLAGS: 00000206
Dec  4 06:03:36 tux kernel: [  737.180564] RAX: 00007fffa287e2f0 RBX: 00007fffa287e2d0 RCX: 000000000095200f
Dec  4 06:03:36 tux kernel: [  737.180565] RDX: 0000000000000001 RSI: 00007fffa287e2f0 RDI: 00007fcbfb12f8a0
Dec  4 06:03:36 tux kernel: [  737.180567] RBP: 000000000095200e R08: 0000000000000000 R09: 00007fffa287e2e0
Dec  4 06:03:36 tux kernel: [  737.180568] R10: 000000000095200d R11: 0000000000000000 R12: 000000000095200f
Dec  4 06:03:36 tux kernel: [  737.180569] R13: 00007fffa287e2e8 R14: 000000000095200f R15: 0000000000000006
Dec  4 06:03:36 tux kernel: [  737.180571] FS:  00007fcbfb728700(0000) GS:ffff88021f380000(0000) knlGS:0000000000000000
Dec  4 06:03:36 tux kernel: [  737.180572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  4 06:03:36 tux kernel: [  737.180573] CR2: 0000000000a91000 CR3: 00000001fdfbe000 CR4: 00000000000407e0
Dec  4 06:03:37 tux kernel: [  737.180574] 
Dec  4 06:03:37 tux kernel: [  737.180576] NMI backtrace for cpu 7
Dec  4 06:03:37 tux kernel: [  737.180579] CPU: 7 PID: 8427 Comm: cc1 Not tainted 3.16.0-rc1-00005-gfd2ac4f #11
Dec  4 06:03:37 tux kernel: [  737.180580] Hardware name: System manufacturer System Product Name/P8Z68-V PRO GEN3, BIOS 3603 11/09/2012
Dec  4 06:03:37 tux kernel: [  737.180582] task: ffff8802048ec380 ti: ffff880204b48000 task.ti: ffff880204b48000
Dec  4 06:03:37 tux kernel: [  737.180583] RIP: 0033:[<0000000000ad9689>]  [<0000000000ad9689>] 0xad9689
Dec  4 06:03:37 tux kernel: [  737.180589] RSP: 002b:00007fffd63b4180  EFLAGS: 00000297
Dec  4 06:03:37 tux kernel: [  737.180590] RAX: 0000000000000059 RBX: 00007f78fe6811b0 RCX: 0000000000000000
Dec  4 06:03:37 tux kernel: [  737.180591] RDX: 0000000000000003 RSI: 0000000000000072 RDI: 00007f78fe6811b0
Dec  4 06:03:37 tux kernel: [  737.180593] RBP: 00007f78fe6811b0 R08: 0000000000000001 R09: 0000000000000000
Dec  4 06:03:37 tux kernel: [  737.180594] R10: 0000000000000000 R11: 0000000000000003 R12: 00000000015f50fa
Dec  4 06:03:37 tux kernel: [  737.180595] R13: 0000000000000000 R14: 0000000000000ba4 R15: 00007f78fe680a20
Dec  4 06:03:37 tux kernel: [  737.180597] FS:  00007f79008f8880(0000) GS:ffff88021f3c0000(0000) knlGS:0000000000000000
Dec  4 06:03:37 tux kernel: [  737.180598] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  4 06:03:37 tux kernel: [  737.180599] CR2: 00007f78fe693000 CR3: 0000000204924000 CR4: 00000000000407e0
Dec  4 06:03:37 tux kernel: [  737.180600] 
Dec  4 06:03:37 tux kernel: [  737.180601] NMI backtrace for cpu 0
Dec  4 06:03:37 tux kernel: [  737.180604] CPU: 0 PID: 29099 Comm: cc1 Not tainted 3.16.0-rc1-00005-gfd2ac4f #11
Dec  4 06:03:37 tux kernel: [  737.180605] Hardware name: System manufacturer System Product Name/P8Z68-V PRO GEN3, BIOS 3603 11/09/2012
Dec  4 06:03:37 tux kernel: [  737.180606] task: ffff8800035e0780 ti: ffff8800035ec000 task.ti: ffff8800035ec000
Dec  4 06:03:37 tux kernel: [  737.180607] RIP: 0033:[<00000000006b1df2>]  [<00000000006b1df2>] 0x6b1df2
Dec  4 06:03:38 tux kernel: [  737.180611] RSP: 002b:00007fffbbb506f0  EFLAGS: 00000202
Dec  4 06:03:38 tux kernel: [  737.180613] RAX: 000000000329e508 RBX: 0000000002f1d888 RCX: 0000000002f1d8e8
Dec  4 06:03:38 tux kernel: [  737.180614] RDX: 000000000329e538 RSI: 0000000000000000 RDI: 0000000002f1d888
Dec  4 06:03:38 tux kernel: [  737.180615] RBP: 0000000003088008 R08: 000000000329e568 R09: 0000000000000000
Dec  4 06:03:38 tux kernel: [  737.180616] R10: 00000000ffffffff R11: 00000000ffffffff R12: 0000000002f1d2b8
Dec  4 06:03:38 tux kernel: [  737.180617] R13: 0000000003266ba8 R14: 0000000000000001 R15: 00007f72248ac3b8
Dec  4 06:03:38 tux kernel: [  737.180619] FS:  00007f72297ee880(0000) GS:ffff88021f200000(0000) knlGS:0000000000000000
Dec  4 06:03:38 tux kernel: [  737.180620] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  4 06:03:38 tux kernel: [  737.180621] CR2: 00007f72215d5e28 CR3: 0000000012d96000 CR4: 00000000000407f0
Dec  4 06:03:38 tux kernel: [  737.180622] 
Dec  4 06:03:38 tux kernel: [  737.180623] NMI backtrace for cpu 1
Dec  4 06:03:38 tux kernel: [  737.180626] CPU: 1 PID: 7864 Comm: cc1 Not tainted 3.16.0-rc1-00005-gfd2ac4f #11
Dec  4 06:03:38 tux kernel: [  737.180631] Hardware name: System manufacturer System Product Name/P8Z68-V PRO GEN3, BIOS 3603 11/09/2012
Dec  4 06:03:38 tux kernel: [  737.180635] task: ffff88020443e900 ti: ffff880203f80000 task.ti: ffff880203f80000
Dec  4 06:03:38 tux kernel: [  737.180638] RIP: 0033:[<00007f90422909df>]  [<00007f90422909df>] 0x7f90422909df
Dec  4 06:03:38 tux kernel: [  737.180648] RSP: 002b:00007fff76432930  EFLAGS: 00000246
Dec  4 06:03:38 tux kernel: [  737.180652] RAX: 00007f90425bb678 RBX: 000000000185f6e0 RCX: 0000000000000440
Dec  4 06:03:38 tux kernel: [  737.180655] RDX: 00007f90425bb678 RSI: 0000000000000000 RDI: 00007f90425bb620
Dec  4 06:03:38 tux kernel: [  737.180660] RBP: 0000000000000400 R08: 00007f904201bae0 R09: 000000000189f130
Dec  4 06:03:38 tux kernel: [  737.180663] R10: 0000000000000000 R11: 0000000000000001 R12: 00007f90425bb620
Dec  4 06:03:38 tux kernel: [  737.180665] R13: 000000000185f7b0 R14: 0000000000000330 R15: 0000000000000000
Dec  4 06:03:38 tux kernel: [  737.180667] FS:  00007f9043b24880(0000) GS:ffff88021f240000(0000) knlGS:0000000000000000
Dec  4 06:03:38 tux kernel: [  737.180668] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  4 06:03:39 tux kernel: [  737.180669] CR2: 00007f903faec000 CR3: 000000010499d000 CR4: 00000000000407e0
Dec  4 06:03:39 tux kernel: [  737.180670] 
Dec  4 06:03:39 tux kernel: [  737.180672] NMI backtrace for cpu 2
Dec  4 06:03:39 tux kernel: [  737.180674] CPU: 2 PID: 587 Comm: X Not tainted 3.16.0-rc1-00005-gfd2ac4f #11
Dec  4 06:03:39 tux kernel: [  737.180679] Hardware name: System manufacturer System Product Name/P8Z68-V PRO GEN3, BIOS 3603 11/09/2012
Dec  4 06:03:39 tux kernel: [  737.180683] task: ffff880215367080 ti: ffff88021391c000 task.ti: ffff88021391c000
Dec  4 06:03:39 tux kernel: [  737.180685] RIP: 0033:[<00007f93727697c2>]  [<00007f93727697c2>] 0x7f93727697c2
Dec  4 06:03:39 tux kernel: [  737.180694] RSP: 002b:00007fff5cb41b80  EFLAGS: 00003246
Dec  4 06:03:39 tux kernel: [  737.180696] RAX: 0000000000000021 RBX: 0000000000000001 RCX: 00007f9372a94620
Dec  4 06:03:39 tux kernel: [  737.180697] RDX: 0000000000000000 RSI: 00000000021abd90 RDI: 00007f9372a94620
Dec  4 06:03:39 tux kernel: [  737.180698] RBP: 00000000021abda0 R08: 0000000000000000 R09: 0000000000000000
Dec  4 06:03:39 tux kernel: [  737.180700] R10: 0000000000864d80 R11: 0000000000000000 R12: 000000000183a440
Dec  4 06:03:39 tux kernel: [  737.180701] R13: 0000000001b83b10 R14: 000000000086daf0 R15: 000000000085f078
Dec  4 06:03:39 tux kernel: [  737.180702] FS:  00007f93760ec880(0000) GS:ffff88021f280000(0000) knlGS:0000000000000000
Dec  4 06:03:39 tux kernel: [  737.180704] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  4 06:03:39 tux kernel: [  737.180705] CR2: 00007f46a794c000 CR3: 0000000214aeb000 CR4: 00000000000407e0
Dec  4 06:03:39 tux kernel: [  737.180706] 
Dec  4 06:03:39 tux kernel: [  737.180708] NMI backtrace for cpu 5
Dec  4 06:03:39 tux kernel: [  737.180710] CPU: 5 PID: 8163 Comm: cc1 Not tainted 3.16.0-rc1-00005-gfd2ac4f #11
Dec  4 06:03:39 tux kernel: [  737.180711] Hardware name: System manufacturer System Product Name/P8Z68-V PRO GEN3, BIOS 3603 11/09/2012
Dec  4 06:03:39 tux kernel: [  737.180713] task: ffff8801afd8d280 ti: ffff8801aff20000 task.ti: ffff8801aff20000
Dec  4 06:03:39 tux kernel: [  737.180714] RIP: 0033:[<00000000006b1ce8>]  [<00000000006b1ce8>] 0x6b1ce8
Dec  4 06:03:39 tux kernel: [  737.180718] RSP: 002b:00007fff3e67c270  EFLAGS: 00000246
Dec  4 06:03:40 tux kernel: [  737.180719] RAX: 00000000032530f8 RBX: 00000000032be448 RCX: 0000000000000000
Dec  4 06:03:40 tux kernel: [  737.180720] RDX: 00000000000000c7 RSI: 0000000003253128 RDI: 00000000032e7ec0
Dec  4 06:03:40 tux kernel: [  737.180722] RBP: 00000000032be488 R08: 00000000ffffffff R09: 0000000000000000
Dec  4 06:03:40 tux kernel: [  737.180723] R10: 0000000000000000 R11: 00000000fffffffe R12: 0000000000000001
Dec  4 06:03:40 tux kernel: [  737.180724] R13: 0000000000000000 R14: 00000000032be448 R15: 0000000003227358
Dec  4 06:03:40 tux kernel: [  737.180725] FS:  00007f3b1ed4b880(0000) GS:ffff88021f340000(0000) knlGS:0000000000000000
Dec  4 06:03:40 tux kernel: [  737.180727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  4 06:03:40 tux kernel: [  737.180728] CR2: 00007f3b1bd32000 CR3: 0000000137da8000 CR4: 00000000000407e0
Dec  4 06:03:40 tux kernel: [  737.180729] 
Dec  4 06:03:40 tux kernel: [  737.180730] NMI backtrace for cpu 3
Dec  4 06:03:40 tux kernel: [  737.180733] CPU: 3 PID: 18986 Comm: cc1 Not tainted 3.16.0-rc1-00005-gfd2ac4f #11
Dec  4 06:03:40 tux kernel: [  737.180734] Hardware name: System manufacturer System Product Name/P8Z68-V PRO GEN3, BIOS 3603 11/09/2012
Dec  4 06:03:40 tux kernel: [  737.180736] task: ffff8801af0fad00 ti: ffff88019d734000 task.ti: ffff88019d734000
Dec  4 06:03:40 tux kernel: [  737.180737] RIP: 0010:[<ffffffff8106378c>]  [<ffffffff8106378c>] hrtimer_try_to_cancel+0x5c/0x80
Dec  4 06:03:40 tux kernel: [  737.180740] RSP: 0000:ffff88021f2c3da0  EFLAGS: 00000002
Dec  4 06:03:40 tux kernel: [  737.180741] RAX: ffff88021f2ccf80 RBX: ffff88021f2cda80 RCX: 0000000000000058
Dec  4 06:03:40 tux kernel: [  737.180742] RDX: 0000000000000002 RSI: 000000000000006a RDI: ffff88021f2ccf40
Dec  4 06:03:40 tux kernel: [  737.180744] RBP: ffff88021f2c3db8 R08: 0000000000000df2 R09: 00000000000000ee
Dec  4 06:03:41 tux kernel: [  737.180745] R10: 0000000000000003 R11: 0000000000000020 R12: 00000000ffffffff
Dec  4 06:03:41 tux kernel: [  737.180746] R13: 0000000000000000 R14: 0000000000000003 R15: ffff88021f2ccf40
Dec  4 06:03:41 tux kernel: [  737.180748] FS:  00007ff8b2464880(0000) GS:ffff88021f2c0000(0000) knlGS:0000000000000000
Dec  4 06:03:41 tux kernel: [  737.180749] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  4 06:03:41 tux kernel: [  737.180750] CR2: 00007f9151c00000 CR3: 00000001e70f7000 CR4: 00000000000407e0
Dec  4 06:03:41 tux kernel: [  737.180751] Stack:
Dec  4 06:03:41 tux kernel: [  737.180752]  0000000000000082 ffff88021f2cda80 000000738285228a ffff88021f2c3dd0
Dec  4 06:03:41 tux kernel: [  737.180755]  ffffffff810637ca ffff88021f2cda80 ffff88021f2c3df0 ffffffff81097842
Dec  4 06:03:41 tux kernel: [  737.180757]  ffff88021f2cda80 000000738285228a ffff88021f2c3e10 ffffffff81097c4f
Dec  4 06:03:41 tux kernel: [  737.180759] Call Trace:
Dec  4 06:03:41 tux kernel: [  737.180760]  <IRQ> 
Dec  4 06:03:41 tux kernel: [  737.180761]  [<ffffffff810637ca>] hrtimer_cancel+0x1a/0x30
Dec  4 06:03:41 tux kernel: [  737.180766]  [<ffffffff81097842>] tick_nohz_restart+0x12/0x80
Dec  4 06:03:41 tux kernel: [  737.180769]  [<ffffffff81097c4f>] __tick_nohz_full_check+0x9f/0xb0
Dec  4 06:03:41 tux kernel: [  737.180771]  [<ffffffff81097c69>] nohz_full_kick_work_func+0x9/0x10
Dec  4 06:03:41 tux kernel: [  737.180774]  [<ffffffff810aecd4>] irq_work_run_list+0x44/0x70
Dec  4 06:03:41 tux kernel: [  737.180777]  [<ffffffff81097730>] ? tick_sched_handle.isra.20+0x40/0x40
Dec  4 06:03:41 tux kernel: [  737.180779]  [<ffffffff810aed19>] __irq_work_run+0x19/0x30
Dec  4 06:03:41 tux kernel: [  737.180782]  [<ffffffff810aed98>] irq_work_run+0x18/0x40
Dec  4 06:03:41 tux kernel: [  737.180784]  [<ffffffff8104deb6>] update_process_times+0x56/0x70
Dec  4 06:03:41 tux kernel: [  737.180786]  [<ffffffff81097721>] tick_sched_handle.isra.20+0x31/0x40
Dec  4 06:03:42 tux kernel: [  737.180788]  [<ffffffff81097769>] tick_sched_timer+0x39/0x60
Dec  4 06:03:42 tux kernel: [  737.180790]  [<ffffffff810636a1>] __run_hrtimer.isra.33+0x41/0xd0
Dec  4 06:03:42 tux kernel: [  737.180792]  [<ffffffff81063a4f>] hrtimer_interrupt+0xef/0x250
Dec  4 06:03:42 tux kernel: [  737.180795]  [<ffffffff8102db65>] local_apic_timer_interrupt+0x35/0x60
Dec  4 06:03:42 tux kernel: [  737.180797]  [<ffffffff8102e12a>] smp_apic_timer_interrupt+0x3a/0x50
Dec  4 06:03:42 tux kernel: [  737.180799]  [<ffffffff81391a3a>] apic_timer_interrupt+0x6a/0x70
Dec  4 06:03:42 tux kernel: [  737.180800]  <EOI> 
Dec  4 06:03:42 tux kernel: [  737.180802] Code: 2a 48 c7 c1 40 cf 00 00 65 48 03 0c 25 c8 cd 00 00 48 39 08 48 89 c6 48 89 df 41 b4 01 0f 94 c1 83 e2 02 0f b6 c9 e8 84 f8 ff ff <48> 8b 43 30 48 8b 75 e8 48 8b 38 e8 34 cb 32 00 48 83 c4 08 44 

***************************************************

	And here is the bisect log:

git bisect start
# good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
# bad: [e9c9eecabaa898ff3fedd98813ee4ac1a00d006a] Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad e9c9eecabaa898ff3fedd98813ee4ac1a00d006a
# good: [c9b88e9581828bb8bba06c5e7ee8ed1761172b6e] Merge tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
git bisect good c9b88e9581828bb8bba06c5e7ee8ed1761172b6e
# good: [5bda4f638f36ef4c4e3b1397b02affc3db94356e] Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 5bda4f638f36ef4c4e3b1397b02affc3db94356e
# good: [288be943b5024729cd6809b61b62f727960178f3] perf tools: Add dso__data_status_seen()
git bisect good 288be943b5024729cd6809b61b62f727960178f3
# good: [ef35ad26f8ff44d2c93e29952cdb336bda729d9d] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good ef35ad26f8ff44d2c93e29952cdb336bda729d9d
# bad: [b728ca06029d085a1585c1926610f26de93b9146] sched: Rework check_for_tasks()
git bisect bad b728ca06029d085a1585c1926610f26de93b9146
# bad: [541b82644d72c1ef4a0587515a619712c1c19bd3] sched/core: Fix formatting issues in sched_can_stop_tick()
git bisect bad 541b82644d72c1ef4a0587515a619712c1c19bd3
# bad: [3882ec643997757824cd5f25180cd8a787b9dbe1] nohz: Use IPI implicit full barrier against rq->nr_running r/w
git bisect bad 3882ec643997757824cd5f25180cd8a787b9dbe1
# good: [3d36aebc2e78923095575df954f3f3b430ac0a30] nohz: Support nohz full remote kick
git bisect good 3d36aebc2e78923095575df954f3f3b430ac0a30
# bad: [fd2ac4f4a65a7f34b0bc6433fcca1192d7ba8b8e] nohz: Use nohz own full kick on 2nd task enqueue
git bisect bad fd2ac4f4a65a7f34b0bc6433fcca1192d7ba8b8e
# good: [53c5fa16b4c843f1df91f7498e3c7bf95e0eaefa] nohz: Switch to nohz full remote kick on timer enqueue
git bisect good 53c5fa16b4c843f1df91f7498e3c7bf95e0eaefa
# first bad commit: [fd2ac4f4a65a7f34b0bc6433fcca1192d7ba8b8e] nohz: Use nohz own full kick on 2nd task enqueue

	**************

	I hope it's correct now! If you need more testing, don't 
hesitate to ask ;)

	PS: I have the call traces for the other bad commits, if needed.

-- 
Linux 3.16.0-rc1-00004-g53c5fa1: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-01 18:36                                                 ` Linus Torvalds
@ 2014-12-04 10:51                                                   ` Will Deacon
  2014-12-04 14:56                                                     ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Will Deacon @ 2014-12-04 10:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Tejun Heo, Dave Jones, Andy Lutomirski,
	Don Zickus, Thomas Gleixner, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra

On Mon, Dec 01, 2014 at 06:36:04PM +0000, Linus Torvalds wrote:
> On Mon, Dec 1, 2014 at 10:25 AM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
> >
> > No idea about oom_score, but kernel happily accepts chmod on any file
> > under /proc/PID/net/.
> 
> /proc used to accept that fairly widely, but no, we tightened things
> down, and core /proc files end up not accepting chmod. See
> 'proc_setattr()':
> 
>         if (attr->ia_valid & ATTR_MODE)
>                 return -EPERM;
> 
> although particular /proc files could choose to not use 'proc_setattr'
> if they want to.
> 
> The '/proc/pid/net' subtree is obviously not doing that. No idea why,
> and probably for no good reason.

I just hit another one of these, but it's slightly different this time:

  [child1:811] [2219] execve(name="/proc/610/attr/keycreate", argv=0x3a044bf0, envp=0x3a04c810)

this guy disappears off into the execve and never returns. A little later,
another guy gets stuck on a completion after a sync:

  [child0:856] [155] sync()

trinity-c0      D ffffffc000087570     0   856    612 0x00000000
Call trace:
[<ffffffc000087570>] __switch_to+0x74/0x8c
[<ffffffc0005350b4>] __schedule+0x204/0x670
[<ffffffc000535544>] schedule+0x24/0x74
[<ffffffc0005380a4>] schedule_timeout+0x134/0x18c
[<ffffffc000536204>] wait_for_common+0x9c/0x144
[<ffffffc0005362bc>] wait_for_completion+0x10/0x1c
[<ffffffc0001bbc14>] sync_inodes_sb+0x98/0x194
[<ffffffc0001c0244>] sync_inodes_one_sb+0x10/0x1c
[<ffffffc0001984c8>] iterate_supers+0x10c/0x114
[<ffffffc0001c04c0>] sys_sync+0x38/0xa4

The backtrace for 811 looks bogus to me (or we're missing some entries):

trinity-c1      R  running task        0   811    612 0x00000000
Call trace:
[<ffffffc000087570>] __switch_to+0x74/0x8c
[<ffffffc0000ecb48>] __handle_domain_irq+0x9c/0xf4
[<ffffffc000301da4>] __this_cpu_preempt_check+0x14/0x20
[<ffffffc000538a2c>] _raw_spin_lock_irq+0x18/0x58
[<ffffffc000538cb4>] _raw_spin_unlock_irq+0x1c/0x48
[<ffffffc0000fa6bc>] run_timer_softirq+0x68/0x240
[<ffffffc0000b5b2c>] __do_softirq+0x110/0x244
[<ffffffc000301d84>] debug_smp_processor_id+0x18/0x24

and, as before, it has a weird child process that I can't backtrace:

trinity-c1      R  running task        0   861    811 0x00000000
Call trace:

The RCU stall detector gets cross too, but the stall ends before it has
a chance to dump anything.

Will

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04 10:51                                                   ` Will Deacon
@ 2014-12-04 14:56                                                     ` Dave Jones
  2014-12-05 13:49                                                       ` Will Deacon
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-04 14:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linus Torvalds, Kirill A. Shutemov, Tejun Heo, Andy Lutomirski,
	Don Zickus, Thomas Gleixner, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra

On Thu, Dec 04, 2014 at 10:51:03AM +0000, Will Deacon wrote:

 > and, as before, it has a weird child process that I can't backtrace:
 > 
 > trinity-c1      R  running task        0   861    811 0x00000000
 > Call trace:

When I get these, ftrace is a godsend.

cd /sys/kernel/debug/tracing/
echo 861 >> set_ftrace_pid
echo function_graph >> current_tracer
echo 1 >> tracing_on
wait a little while
cat trace > trace.txt

repeat last two steps if necessary

(assuming it's stuck in kernel space, which /proc/861/stack should
 be able to confirm. Usually when I see this problem, that just shows
 0xffffffffffffffff)

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04  5:49                                                           ` Linus Torvalds
@ 2014-12-04 14:57                                                             ` Chris Mason
  2014-12-04 15:22                                                             ` Dave Hansen
  1 sibling, 0 replies; 486+ messages in thread
From: Chris Mason @ 2014-12-04 14:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, John Stultz, Linus Torvalds, Dave Jones,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Sasha Levin, Paul E. McKenney, Linux Kernel Mailing List



On Thu, Dec 4, 2014 at 12:49 AM, Linus Torvalds 
<torvalds@linux-foundation.org> wrote:
> On Wed, Dec 3, 2014 at 7:15 PM, Chris Mason <clm@fb.com> wrote:
>> 
>>  One guess is that trinity is generating a huge number of tlb
>>  invalidations over sparse and horrible ranges.  Perhaps the old 
>> code was
>>  falling back to full tlb flushes before Dave Hansen's string of 
>> fixes?
> 
> Hmm. I agree that we've had some of the backtraces look like TLB
> flushing might be involved. Not all, though. And I'm not seeing where
> a loop over up to 33 pages should matter over doing a full TLB flush.
> 
> What *might* matter is if we somehow get that number wrong, and the 
> loops like
> 
>                         addr = f->flush_start;
>                         while (addr < f->flush_end) {
>                                 __flush_tlb_single(addr);
>                                 addr += PAGE_SIZE;
>                         }
> 
> ends up looping a *lot* due to some bug, and then the IPI itself would
> take so long that the watchdog could trigger.
> 
> But I do not see how that could actually happen. As far as I can tell,
> either the number of pages is limited to less than 33, or we have that
>  TLB_FLUSH_ALL case.
> 
> Do  you see something I don't?

Sadly not.  Looking harder, I'm pretty sure all of the flushes coming 
through from this path are single page flushes anyway.  So the most 
likely explanation is that we're waiting on the remote CPU, who is 
stuck somewhere secret.

-chris




^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04  5:49                                                           ` Linus Torvalds
  2014-12-04 14:57                                                             ` Chris Mason
@ 2014-12-04 15:22                                                             ` Dave Hansen
  2014-12-04 15:30                                                               ` Chris Mason
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Hansen @ 2014-12-04 15:22 UTC (permalink / raw)
  To: Linus Torvalds, Chris Mason, Thomas Gleixner, John Stultz,
	Dave Jones, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On 12/03/2014 09:49 PM, Linus Torvalds wrote:
> On Wed, Dec 3, 2014 at 7:15 PM, Chris Mason <clm@fb.com> wrote:
>>
>> One guess is that trinity is generating a huge number of tlb
>> invalidations over sparse and horrible ranges.  Perhaps the old code was
>> falling back to full tlb flushes before Dave Hansen's string of fixes?
> 
> Hmm. I agree that we've had some of the backtraces look like TLB
> flushing might be involved. Not all, though. And I'm not seeing where
> a loop over up to 33 pages should matter over doing a full TLB flush.
> 
> What *might* matter is if we somehow get that number wrong, and the loops like
> 
>                         addr = f->flush_start;
>                         while (addr < f->flush_end) {
>                                 __flush_tlb_single(addr);
>                                 addr += PAGE_SIZE;
>                         }
> 
> ends up looping a *lot* due to some bug, and then the IPI itself would
> take so long that the watchdog could trigger.
> 
> But I do not see how that could actually happen. As far as I can tell,
> either the number of pages is limited to less than 33, or we have that
>  TLB_FLUSH_ALL case.
> 
> Do  you see something I don't?

The one thing I _do_ see now is a missed TLB flush if we're flushing one
page at the end of the address space.  We'd overflow flush_end back around
so that flush_end=0:

        if (!f->flush_end)	
                f->flush_end = f->flush_start + PAGE_SIZE; <-- overflow

and we'll never enter the while loop where we actually do the flush:

                        while (addr < f->flush_end) {
                                __flush_tlb_single(addr);
                                addr += PAGE_SIZE;
                        }

But we have a hole up there on x86_64, so this will never happen in
practice there.  It might theoretically apply to 32-bit, but this still
doesn't help with the bug.
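
As a minimal sketch of that wraparound (user-space C, not the kernel code;
the 4K page size and 64-bit unsigned longs are assumptions here):

#include <stdio.h>

#define PAGE_SIZE 4096UL

int main(void)
{
        /* Flushing the very last page of the address space. */
        unsigned long flush_start = 0UL - PAGE_SIZE;         /* 0xfffffffffffff000 */
        unsigned long flush_end   = flush_start + PAGE_SIZE; /* wraps around to 0 */
        unsigned long addr, flushed = 0;

        for (addr = flush_start; addr < flush_end; addr += PAGE_SIZE)
                flushed++;              /* never runs: addr < flush_end is always false */

        printf("flush_end=%#lx, pages flushed=%lu\n", flush_end, flushed);
        return 0;
}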

Oh, and the tracepoint is spitting out bogus numbers because we need
some parentheses around the 'nr_pages' calculation.
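
As a hypothetical illustration only (the actual tracepoint code isn't quoted
here): in C, division binds tighter than subtraction, so something like

        nr_pages = f->flush_end - f->flush_start / PAGE_SIZE;    /* parses as end - (start / PAGE_SIZE): bogus */

yields a huge meaningless value, whereas the intended page count needs

        nr_pages = (f->flush_end - f->flush_start) / PAGE_SIZE;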

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04 15:22                                                             ` Dave Hansen
@ 2014-12-04 15:30                                                               ` Chris Mason
  0 siblings, 0 replies; 486+ messages in thread
From: Chris Mason @ 2014-12-04 15:30 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Linus Torvalds, Thomas Gleixner, John Stultz, Dave Jones,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Sasha Levin, Paul E. McKenney, Linux Kernel Mailing List

On Thu, Dec 4, 2014 at 10:22 AM, Dave Hansen <dave.hansen@intel.com> 
wrote:
> On 12/03/2014 09:49 PM, Linus Torvalds wrote:
>>  On Wed, Dec 3, 2014 at 7:15 PM, Chris Mason <clm@fb.com> wrote:
>>> 
>>>  One guess is that trinity is generating a huge number of tlb
>>>  invalidations over sparse and horrible ranges.  Perhaps the old 
>>> code was
>>>  falling back to full tlb flushes before Dave Hansen's string of 
>>> fixes?
>> 
>>  Hmm. I agree that we've had some of the backtraces look like TLB
>>  flushing might be involved. Not all, though. And I'm not seeing 
>> where
>>  a loop over up to 33 pages should matter over doing a full TLB 
>> flush.
>> 
>>  What *might* matter is if we somehow get that number wrong, and the 
>> loops like
>> 
>>                          addr = f->flush_start;
>>                          while (addr < f->flush_end) {
>>                                  __flush_tlb_single(addr);
>>                                  addr += PAGE_SIZE;
>>                          }
>> 
>>  ends up looping a *lot* due to some bug, and then the IPI itself 
>> would
>>  take so long that the watchdog could trigger.
>> 
>>  But I do not see how that could actually happen. As far as I can 
>> tell,
>>  either the number of pages is limited to less than 33, or we have 
>> that
>>   TLB_FLUSH_ALL case.
>> 
>>  Do  you see something I don't?
> 
> The one thing I _do_ see now is a missed TLB flush is we're flushing 
> one
> page at the end of the address space.  We'd overflow flush_end back so
> flush_end=0:
> 
>         if (!f->flush_end)
>                 f->flush_end = f->flush_start + PAGE_SIZE; <-- 
> overflow
> 
> and we'll never enter the while loop where we actually do the flush:
> 
>                         while (addr < f->flush_end) {
>                                 __flush_tlb_single(addr);
>                                 addr += PAGE_SIZE;
>                         }
> 
> But we have a hole up there on x86_64, so this will never happen in
> practice there.  It might theoretically apply to 32-bit, but this 
> still
> doesn't help with the bug.
> 
> Oh, and the tracepoint is spitting out bogus numbers because we need
> some parenthesis around the 'nr_pages' calculation.

Yeah, I didn't see any problems with your changes, but I was hoping 
that even a small change like doing 33 flushes at a time was pushing 
Dave's box just over the line.

-chris




^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04  8:43                                                                 ` Dâniel Fraga
@ 2014-12-04 16:18                                                                   ` Linus Torvalds
  2014-12-04 16:52                                                                     ` Frederic Weisbecker
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-04 16:18 UTC (permalink / raw)
  To: Dâniel Fraga, Peter Zijlstra, Frederic Weisbecker, Dave Jones
  Cc: Chris Rorvick, Tejun Heo, Paul E. McKenney, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1100 bytes --]

On Thu, Dec 4, 2014 at 12:43 AM, Dâniel Fraga <fragabr@gmail.com> wrote:
>
>         Linus, today it's your lucky day, because I think I found the
> real bad commit (if it isn't, then it's some very close to it). I
> managed to narrow the bisect and here's the result:

Ok, that actually looks very reasonable, I had actually looked at it
because of the whole "changes IPI" thing.

One more thing to try: does a revert fix it on current git?

It doesn't revert entirely cleanly, but close enough - attached a
quick rough patch that may or may not work, but looks like a good
revert.

Dave - this might be worth testing for you too, exactly because of
that whole "it changes how we do IPI's". It was your bug report with
TLB IPI's that made me look at that commit originally.

                  Linus

---
> fd2ac4f4a65a7f34b0bc6433fcca1192d7ba8b8e is the first bad commit
> commit fd2ac4f4a65a7f34b0bc6433fcca1192d7ba8b8e
> Author: Frederic Weisbecker <fweisbec@gmail.com>
> Date:   Tue Mar 18 21:12:53 2014 +0100
>
>     nohz: Use nohz own full kick on 2nd task enqueue

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 1140 bytes --]

 kernel/sched/core.c  | 5 ++++-
 kernel/sched/sched.h | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 89e7283015a6..1b40aed13931 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1577,7 +1577,9 @@ void scheduler_ipi(void)
 	 */
 	preempt_fold_need_resched();
 
-	if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick())
+	if (llist_empty(&this_rq()->wake_list)
+			&& !tick_nohz_full_cpu(smp_processor_id())
+			&& !got_nohz_idle_kick())
 		return;
 
 	/*
@@ -1594,6 +1596,7 @@ void scheduler_ipi(void)
 	 * somewhat pessimize the simple resched case.
 	 */
 	irq_enter();
+	tick_nohz_full_check();
 	sched_ttwu_pending();
 
 	/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2df8ef067cc5..e9a73143d318 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1245,7 +1245,7 @@ static inline void add_nr_running(struct rq *rq, unsigned count)
 			 * new value of rq->nr_running is visible on reception
 			 * from the target.
 			 */
-			tick_nohz_full_kick_cpu(rq->cpu);
+			smp_send_reschedule(rq->cpu);
 		}
 #endif
 	}

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04 16:18                                                                   ` Linus Torvalds
@ 2014-12-04 16:52                                                                     ` Frederic Weisbecker
  2014-12-04 17:25                                                                       ` Dâniel Fraga
  0 siblings, 1 reply; 486+ messages in thread
From: Frederic Weisbecker @ 2014-12-04 16:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dâniel Fraga, Peter Zijlstra, Dave Jones, Chris Rorvick,
	Tejun Heo, Paul E. McKenney, Linux Kernel Mailing List

On Thu, Dec 04, 2014 at 08:18:10AM -0800, Linus Torvalds wrote:
> On Thu, Dec 4, 2014 at 12:43 AM, Dâniel Fraga <fragabr@gmail.com> wrote:
> >
> >         Linus, today it's your lucky day, because I think I found the
> > real bad commit (if it isn't, then it's some very close to it). I
> > managed to narrow the bisect and here's the result:
> 
> Ok, that actually looks very reasonable, I had actually looked at it
> because of the whole "changes IPI" thing.
> 
> One more thing to try: does a revert fix it on current git?
> 
> It doesn't revert entirely cleanly, but close enough - attached a
> quick rough patch that may or may not work, but looks like a good
> revert.
> 
> Dave - this might be worth testing for you too, exactly because of
> that whole "it changes how we do IPI's". It was your bug report with
> TLB IPI's that made me look at that commit originally.

I think this is a different issue. What Daniel reported is:

Dec  4 06:03:41 tux kernel: [  737.180761]  [<ffffffff810637ca>] hrtimer_cancel+0x1a/0x30
Dec  4 06:03:41 tux kernel: [  737.180766]  [<ffffffff81097842>] tick_nohz_restart+0x12/0x80
Dec  4 06:03:41 tux kernel: [  737.180769]  [<ffffffff81097c4f>] __tick_nohz_full_check+0x9f/0xb0
Dec  4 06:03:41 tux kernel: [  737.180771]  [<ffffffff81097c69>] nohz_full_kick_work_func+0x9/0x10
Dec  4 06:03:41 tux kernel: [  737.180774]  [<ffffffff810aecd4>] irq_work_run_list+0x44/0x70
Dec  4 06:03:41 tux kernel: [  737.180777]  [<ffffffff81097730>] ? tick_sched_handle.isra.20+0x40/0x40
Dec  4 06:03:41 tux kernel: [  737.180779]  [<ffffffff810aed19>] __irq_work_run+0x19/0x30
Dec  4 06:03:41 tux kernel: [  737.180782]  [<ffffffff810aed98>] irq_work_run+0x18/0x40
Dec  4 06:03:41 tux kernel: [  737.180784]  [<ffffffff8104deb6>] update_process_times+0x56/0x70
Dec  4 06:03:41 tux kernel: [  737.180786]  [<ffffffff81097721>] tick_sched_handle.isra.20+0x31/0x40
Dec  4 06:03:42 tux kernel: [  737.180788]  [<ffffffff81097769>] tick_sched_timer+0x39/0x60
Dec  4 06:03:42 tux kernel: [  737.180790]  [<ffffffff810636a1>] __run_hrtimer.isra.33+0x41/0xd0
Dec  4 06:03:42 tux kernel: [  737.180792]  [<ffffffff81063a4f>] hrtimer_interrupt+0xef/0x250
Dec  4 06:03:42 tux kernel: [  737.180795]  [<ffffffff8102db65>] local_apic_timer_interrupt+0x35/0x60
Dec  4 06:03:42 tux kernel: [  737.180797]  [<ffffffff8102e12a>] smp_apic_timer_interrupt+0x3a/0x50
Dec  4 06:03:42 tux kernel: [  737.180799]  [<ffffffff81391a3a>] apic_timer_interrupt+0x6a/0x70

And this bug has been fixed upstream with:

     _ nohz: nohz full depends on irq work self IPI support
     _ x86: Tell irq work about self IPI support
     _ irq_work: Force raised irq work to run on irq work interrupt
     _ nohz: Move nohz full init call to tick init

These patches have been backported to stable as well.

I suspect Daniel rewound far enough to fall on that old bug.

Daniel, did you see this very same stacktrace in the latest upstream too? Or was it
a different one?

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04 16:52                                                                     ` Frederic Weisbecker
@ 2014-12-04 17:25                                                                       ` Dâniel Fraga
  2014-12-04 17:47                                                                         ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-04 17:25 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Linus Torvalds, Peter Zijlstra, Dave Jones, Chris Rorvick,
	Tejun Heo, Paul E. McKenney, Linux Kernel Mailing List

On Thu, 4 Dec 2014 17:52:08 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> And this bug has been fixed upstream with:
> 
>      _ nohz: nohz full depends on irq work self IPI support
>      _ x86: Tell irq work about self IPI support
>      _ irq_work: Force raised irq work to run on irq work interrupt
>      _ nohz: Move nohz full init call to tick init
> 
> These patches have been backported to stable as well.
> 
> I suspect Daniel rewinded far enough to fall on that old bug.
> 
> Daniel, did you see the above very stacktrace in latest upstream too? Or was it
> a different one?

	You're completely right, Frederic. I was so obsessed with v3.17
that I forgot to check the fixes made after v3.17 (shame on me!).

	The revert patch Linus provided fixes the issue with v3.17, but
since Frederic just confirmed it was fixed after v3.17, it doesn't
matter anymore.

	Sorry about that, Linus. Well, so I'm using 3.18.0-rc7 and
everything is working perfectly here. :)

	Thank you.

-- 
Linux 3.18.0-rc7: Diseased Newt
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04 17:25                                                                       ` Dâniel Fraga
@ 2014-12-04 17:47                                                                         ` Linus Torvalds
  2014-12-04 18:07                                                                           ` Dâniel Fraga
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-04 17:47 UTC (permalink / raw)
  To: Dâniel Fraga
  Cc: Frederic Weisbecker, Peter Zijlstra, Dave Jones, Chris Rorvick,
	Tejun Heo, Paul E. McKenney, Linux Kernel Mailing List

On Thu, Dec 4, 2014 at 9:25 AM, Dâniel Fraga <fragabr@gmail.com> wrote:
>
>         Sorry about that Linus. Well, so I'm using 3.18.0-rc7 and
> everything is working perfect here. :)

Ok. Can you make sure to really beat on that kernel, just to make sure
there's nothing else hiding?

                  Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04 17:47                                                                         ` Linus Torvalds
@ 2014-12-04 18:07                                                                           ` Dâniel Fraga
  0 siblings, 0 replies; 486+ messages in thread
From: Dâniel Fraga @ 2014-12-04 18:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Frederic Weisbecker, Peter Zijlstra, Dave Jones, Chris Rorvick,
	Tejun Heo, Paul E. McKenney, Linux Kernel Mailing List

On Thu, 4 Dec 2014 09:47:53 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Ok. Can you make sure to really beat on that kernel, just to make sure
> there's nothing else hiding?

	Yes, I'll keep torturing this kernel. If I find something, I'll
report here, but so far, no problem at all.

-- 
Linux 3.18.0-rc7: Diseased Newt
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 23:32                                         ` Sasha Levin
  2014-12-03  0:09                                           ` Linus Torvalds
@ 2014-12-05  5:00                                           ` Sasha Levin
  2014-12-05  6:38                                             ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Sasha Levin @ 2014-12-05  5:00 UTC (permalink / raw)
  To: Dave Jones, Chris Mason, Linus Torvalds, Dâniel Fraga,
	Paul E. McKenney, Linux Kernel Mailing List

On 12/02/2014 06:32 PM, Sasha Levin wrote:
> On 12/02/2014 02:32 PM, Dave Jones wrote:
>> > On Mon, Dec 01, 2014 at 06:08:38PM -0500, Chris Mason wrote:
>> >  > I'm not sure if this is related, but running trinity here, I noticed it
>> >  > was stuck at 100% system time on every CPU.  perf report tells me we are
>> >  > spending all of our time in spin_lock under the sync system call.
>> >  > 
>> >  > I think it's coming from contention in the bdi_queue_work() call from
>> >  > inside sync_inodes_sb, which is spin_lock_bh(). 
>> >  > 
>> >  > I wonder if we're just spinning so hard on this one bh lock that we're
>> >  > starving the watchdog?
>> >  > 
>> >  > Dave, do you have spinlock debugging on?  
>> > 
>> > That has been a constant, yes. I can try with that disabled some time.
> Here's my side of the story: I was observing RCU lockups which went away when
> I disabled verbose printing for fault injections. It seems that printing one
> line ~10 times a second can cause that...

Just to add to this: I've enabled the simplest level of verbosity in the fault
injection options, and RCU stalls are again easy to trigger:

[ 3926.110026] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 3926.110026] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 3926.110026]  0: (62 ticks this GP) idle=b63/140000000000002/0 softirq=20384/20384 last_accelerate: f772/8874, nonlazy_posted: 619783, ..
[ 3926.110026]  (detected by 2, t=30002 jiffies, g=-222, c=-223, q=0)
[ 3926.110026] Task dump for CPU 0:
[ 3926.110026] kworker/dying   R  running task    13464     7      2 0x00080008
[ 3926.110026]  ffffffff814db441 ffff88006aa3bd68 ffffffff814db441 ffffe8fff3c6c9d4
[ 3926.110026]  00000000ffffffff ffff88006aa3bd88 ffffffff814db76b ffffffff93a27440
[ 3926.110026]  dfffe90000000000 ffff88006aa26890 ffff88006aa26000 ffff88006aa3b5d8
[ 3926.110026] Call Trace:
[ 3926.110026]  [<ffffffff814db441>] ? get_parent_ip+0x11/0x50
[ 3926.110026]  [<ffffffff814db441>] ? get_parent_ip+0x11/0x50
[ 3926.110026]  [<ffffffff814db76b>] ? preempt_count_sub+0x11b/0x1d0
[ 3926.110026]  [<ffffffff81429c57>] ? do_exit+0x1687/0x3f20
[ 3926.110026]  [<ffffffff81485578>] ? worker_thread+0xa28/0x1760
[ 3926.110026]  [<ffffffff81484b50>] ? process_one_work+0x17a0/0x17a0
[ 3926.110026]  [<ffffffff8149d719>] ? kthread+0x229/0x320
[ 3926.110026]  [<ffffffff8149d4f0>] ? kthread_worker_fn+0x7d0/0x7d0
[ 3926.110026]  [<ffffffff91ffc0fc>] ? ret_from_fork+0x7c/0xb0
[ 3926.110026]  [<ffffffff8149d4f0>] ? kthread_worker_fn+0x7d0/0x7d0
[ 3926.110033]
[ 3926.110033]  0: (62 ticks this GP) idle=b63/140000000000002/0 softirq=20384/20384 last_accelerate: f772/8876, nonlazy_posted: 619783, ..
[ 3926.110033]  (detected by 10, t=30004 jiffies, g=15638, c=15637, q=63609)
[ 3926.110033] Task dump for CPU 0:
[ 3926.110033] kworker/dying   R  running task    13464     7      2 0x00080008
[ 3926.110033]  ffffffff814db441 ffff88006aa3bd68 ffffffff814db441 ffffe8fff3c6c9d4
[ 3926.110033]  00000000ffffffff ffff88006aa3bd88 ffffffff814db76b ffffffff93a27440
[ 3926.110033]  dfffe90000000000 ffff88006aa26890 ffff88006aa26000 ffff88006aa3b5d8
[ 3926.110033] Call Trace:
[ 3926.110033]  [<ffffffff814db441>] ? get_parent_ip+0x11/0x50
[ 3926.110033]  [<ffffffff814db441>] ? get_parent_ip+0x11/0x50
[ 3926.110033]  [<ffffffff814db76b>] ? preempt_count_sub+0x11b/0x1d0
[ 3926.110033]  [<ffffffff81429c57>] ? do_exit+0x1687/0x3f20
[ 3926.110033]  [<ffffffff81485578>] ? worker_thread+0xa28/0x1760
[ 3926.110033]  [<ffffffff81484b50>] ? process_one_work+0x17a0/0x17a0
[ 3926.110033]  [<ffffffff8149d719>] ? kthread+0x229/0x320
[ 3926.110033]  [<ffffffff8149d4f0>] ? kthread_worker_fn+0x7d0/0x7d0
[ 3926.110033]  [<ffffffff91ffc0fc>] ? ret_from_fork+0x7c/0xb0
[ 3926.110033]  [<ffffffff8149d4f0>] ? kthread_worker_fn+0x7d0/0x7d0


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05  5:00                                           ` Sasha Levin
@ 2014-12-05  6:38                                             ` Linus Torvalds
  2014-12-05 15:03                                               ` Sasha Levin
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-05  6:38 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dave Jones, Chris Mason, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On Thu, Dec 4, 2014 at 9:00 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>
> Just to add to this: I've enabled the simplest level of verbosity in the fault
> injection options, and RCU stalls are again easy to trigger:

Where do you log to?

In particular, do you have some serial line logging enabled (perhaps
even without anybody listening)? Or network logging?

We've definitely had cases where logging itself is so slow that it
triggers watchdogs etc. Logging to a serial line can take for*ever* in
modern terms..

                      Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-04 14:56                                                     ` Dave Jones
@ 2014-12-05 13:49                                                       ` Will Deacon
  0 siblings, 0 replies; 486+ messages in thread
From: Will Deacon @ 2014-12-05 13:49 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Kirill A. Shutemov, Tejun Heo,
	Andy Lutomirski, Don Zickus, Thomas Gleixner, Linux Kernel,
	the arch/x86 maintainers, Peter Zijlstra

On Thu, Dec 04, 2014 at 02:56:31PM +0000, Dave Jones wrote:
> On Thu, Dec 04, 2014 at 10:51:03AM +0000, Will Deacon wrote:
> 
>  > and, as before, it has a weird child process that I can't backtrace:
>  > 
>  > trinity-c1      R  running task        0   861    811 0x00000000
>  > Call trace:
> 
> When I get these, ftrace is a godsend.
> 
> cd /sys/kernel/debug/tracing/
> echo 861 >> set_ftrace_pid
> echo function_graph >> current_tracer
> echo 1 >> tracing_on
> wait a little while
> cat trace > trace.txt
> 
> repeat last two steps if necessary
> 
> (assuming it's stuck in kernel space, which /proc/861/stack should
>  be able to confirm. Usually when I see this problem, that just shows
>  0xffffffffffffffff)

That would be great if I had a working shell :)

I tried enabling the thing before starting trinity, in the hope of dumping
the buffer using sysrq, but now I seem to have hit an unrelated panic in
the ftrace code.

Will

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05  6:38                                             ` Linus Torvalds
@ 2014-12-05 15:03                                               ` Sasha Levin
  2014-12-05 18:15                                                 ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Sasha Levin @ 2014-12-05 15:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On 12/05/2014 01:38 AM, Linus Torvalds wrote:
> On Thu, Dec 4, 2014 at 9:00 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>> >
>> > Just to add to this: I've enabled the simplest level of verbosity in the fault
>> > injection options, and RCU stalls are again easy to trigger:
> Where do you log to?
> 
> In particular, do you have some serial line logging enabled (perhaps
> even without anybody listening)? Or network logging?
> 
> We've definitely had cases where logging itself is so slow that it
> triggers watchdogs etc. Logging to a serial line can take for*ever* in
> modern terms..

Yes, it's going to a serial line, but it's only about 100 lines/second on
average. I wouldn't expect it to cause anything to hang!


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-03 18:45                                               ` Linus Torvalds
  2014-12-03 19:00                                                 ` Dave Jones
  2014-12-04  0:27                                                 ` Dave Jones
@ 2014-12-05 17:15                                                 ` Dave Jones
  2014-12-05 18:38                                                   ` Linus Torvalds
  2 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-05 17:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Wed, Dec 03, 2014 at 10:45:57AM -0800, Linus Torvalds wrote:
 > On Wed, Dec 3, 2014 at 10:41 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > I've been stuck on this kernel for a few days now trying to prove it
 > > good/bad one way or the other, and I'm leaning towards good, given
 > > that it recovers, even though the traces look similar.
 > 
 > Ugh. But this does *not* happen with 3.16, right? Even the non-fatal case?
 > 
 > If so, I'd be inclined to call it "bad". But there might well be two
 > bugs: one that makes that NMI watchdog trigger, and another one that
 > then makes it be a hard lockup. I'd think it would be good to figure
 > out the "NMI watchdog starts triggering" one first, though.

A bisect later, and I landed on a kernel that ran for a day, before
spewing NMI messages, recovering, and then..

http://codemonkey.org.uk/junk/log.txt

I could log in, but every command I tried (even shell built-ins) just printed 'bus error'.

I saw those end_request messages in an earlier bisect. I wonder if there
was an actual bug that got fixed which allowed non-root to try and do
bad things to raw devices. It's always sector 0 too.

Yet again, I'm wondering if this whole thing is just signs of early hardware death.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05 15:03                                               ` Sasha Levin
@ 2014-12-05 18:15                                                 ` Linus Torvalds
  2014-12-07 14:58                                                   ` Sasha Levin
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-05 18:15 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dave Jones, Chris Mason, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, Dec 5, 2014 at 7:03 AM, Sasha Levin <sasha.levin@oracle.com> wrote:
>
> Yes, it's going to a serial line, but it's only about 100 lines/second on
> average. I wouldn't expect it to cause anything to hang!

A regular 16650 serial chip? Running at 115kbps, I assume? So that's
about 11kB/s.

And the serial console is polling, since it can't sleep or depend on interrupts.

At an average line length of what, 40 characters? At less than 300
lines/s, you'd be using up 100% of one CPU. And since the printouts
are serialized, that would be all the other CPUs too..

100 lines/s _average_ means that I can easily see it being 300 lines/s for a while.
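
Back-of-the-envelope version of that arithmetic (a sketch assuming 115200
baud, 10 bits on the wire per byte, and ~40 bytes per line):

#include <stdio.h>

int main(void)
{
        double baud           = 115200.0; /* assumed serial line rate */
        double bits_per_byte  = 10.0;     /* 8 data bits + start + stop bit */
        double bytes_per_line = 40.0;     /* assumed average line length */

        double bytes_per_sec = baud / bits_per_byte;           /* ~11.5 kB/s */
        double lines_per_sec = bytes_per_sec / bytes_per_line; /* ~288 lines/s */

        printf("~%.0f bytes/s; the line saturates at ~%.0f lines/s\n",
               bytes_per_sec, lines_per_sec);
        return 0;
}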

So yeah. The serial console is simply not designed to handle
continuous output. It's for the "occasional" stuff.

The fact that your rcu lockups go away when you make the fault
injection be quiet makes me really suspect this is related.

                       Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05 17:15                                                 ` Dave Jones
@ 2014-12-05 18:38                                                   ` Linus Torvalds
  2014-12-05 18:48                                                     ` Dave Jones
                                                                       ` (2 more replies)
  0 siblings, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-05 18:38 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Fri, Dec 5, 2014 at 9:15 AM, Dave Jones <davej@redhat.com> wrote:
>
> A bisect later, and I landed on a kernel that ran for a day, before
> spewing NMI messages, recovering, and then..
>
> http://codemonkey.org.uk/junk/log.txt

I have to admit I'm seeing absolutely nothing sensible in there.

Call it bad, and see if bisection ends up slowly -oh so slowly -
pointing to some direction. Because I don't think it's the hardware,
considering that apparently 3.16 is solid. And the spews themselves
are so incomprehensible that I'm not seeing any pattern what-so-ever.

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05 18:38                                                   ` Linus Torvalds
@ 2014-12-05 18:48                                                     ` Dave Jones
  2014-12-05 19:31                                                       ` Linus Torvalds
  2014-12-06  9:37                                                       ` Chuck Ebbert
  2014-12-05 19:04                                                     ` Chris Mason
  2014-12-06  5:04                                                     ` Gene Heskett
  2 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-05 18:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, Dec 05, 2014 at 10:38:55AM -0800, Linus Torvalds wrote:
 > On Fri, Dec 5, 2014 at 9:15 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > A bisect later, and I landed on a kernel that ran for a day, before
 > > spewing NMI messages, recovering, and then..
 > >
 > > http://codemonkey.org.uk/junk/log.txt
 > 
 > I have to admit I'm seeing absolutely nothing sensible in there.
 > 
 > Call it bad, and see if bisection ends up slowly -oh so slowly -
 > pointing to some direction. Because I don't think it's the hardware,
 > considering that apparently 3.16 is solid. And the spews themselves
 > are so incomprehensible that I'm not seeing any pattern what-so-ever.

Will do.
In the meantime, I rebooted into the same kernel, and ran trinity
solely doing the lsetxattr syscalls. The load was a bit lower, so I
cranked up the number of child processes to 512, and then this
happened..

[ 1611.746960] ------------[ cut here ]------------
[ 1611.747053] WARNING: CPU: 0 PID: 14810 at kernel/watchdog.c:265 watchdog_overflow_callback+0xd5/0x120()
[ 1611.747083] Watchdog detected hard LOCKUP on cpu 0
[ 1611.747097] Modules linked in:
[ 1611.747112]  rfcomm hidp bnep scsi_transport_iscsi can_bcm nfnetlink can_raw nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm e1000e crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec microcode serio_raw pcspkr snd_hwdep snd_seq snd_seq_device nfsd usb_debug snd_pcm ptp shpchp pps_core snd_timer snd soundcore auth_rpcgss oid_registry nfs_acl lockd sunrpc
[ 1611.747389] CPU: 0 PID: 14810 Comm: trinity-c304 Not tainted 3.16.0+ #114
[ 1611.747449]  0000000000000000 000000007964733e ffff880244006be0 ffffffff8178fccb
[ 1611.747481]  ffff880244006c28 ffff880244006c18 ffffffff81073ecd 0000000000000000
[ 1611.747512]  0000000000000000 ffff880244006d58 ffff880244006ef8 0000000000000000
[ 1611.747544] Call Trace:
[ 1611.747555]  <NMI>  [<ffffffff8178fccb>] dump_stack+0x4e/0x7a
[ 1611.747582]  [<ffffffff81073ecd>] warn_slowpath_common+0x7d/0xa0
[ 1611.747604]  [<ffffffff81073f4c>] warn_slowpath_fmt+0x5c/0x80
[ 1611.747625]  [<ffffffff811255c0>] ? restart_watchdog_hrtimer+0x50/0x50
[ 1611.747648]  [<ffffffff81125695>] watchdog_overflow_callback+0xd5/0x120
[ 1611.747673]  [<ffffffff8116446c>] __perf_event_overflow+0xac/0x2a0
[ 1611.747696]  [<ffffffff81018ffe>] ? x86_perf_event_set_period+0xde/0x150
[ 1611.747720]  [<ffffffff81165034>] perf_event_overflow+0x14/0x20
[ 1611.747742]  [<ffffffff8101ed56>] intel_pmu_handle_irq+0x206/0x410
[ 1611.747764]  [<ffffffff81017e5b>] perf_event_nmi_handler+0x2b/0x50
[ 1611.747787]  [<ffffffff81007403>] nmi_handle+0xa3/0x1b0
[ 1611.747807]  [<ffffffff81007365>] ? nmi_handle+0x5/0x1b0
[ 1611.747827]  [<ffffffff810a12c8>] ? preempt_count_add+0x18/0xb0
[ 1611.748699]  [<ffffffff81007742>] default_do_nmi+0x72/0x1c0
[ 1611.749570]  [<ffffffff81007948>] do_nmi+0xb8/0xf0
[ 1611.750438]  [<ffffffff8179dd2a>] end_repeat_nmi+0x1e/0x2e
[ 1611.751312]  [<ffffffff810a12c8>] ? preempt_count_add+0x18/0xb0
[ 1611.752177]  [<ffffffff810a12c8>] ? preempt_count_add+0x18/0xb0
[ 1611.753025]  [<ffffffff810a12c8>] ? preempt_count_add+0x18/0xb0
[ 1611.753861]  <<EOE>>  [<ffffffff810fee07>] is_module_text_address+0x17/0x50
[ 1611.754734]  [<ffffffff81092ab8>] __kernel_text_address+0x58/0x80
[ 1611.755575]  [<ffffffff81006b5f>] print_context_stack+0x8f/0x100
[ 1611.756410]  [<ffffffff81005540>] dump_trace+0x140/0x370
[ 1611.757242]  [<ffffffff811e797f>] ? getname_flags+0x4f/0x1a0
[ 1611.758072]  [<ffffffff811e797f>] ? getname_flags+0x4f/0x1a0
[ 1611.758895]  [<ffffffff810137cb>] save_stack_trace+0x2b/0x50
[ 1611.759720]  [<ffffffff811c29a0>] set_track+0x70/0x140
[ 1611.760541]  [<ffffffff8178d993>] alloc_debug_processing+0x92/0x118
[ 1611.761366]  [<ffffffff8178e5d6>] __slab_alloc+0x45f/0x56f
[ 1611.762195]  [<ffffffff811e797f>] ? getname_flags+0x4f/0x1a0
[ 1611.763024]  [<ffffffff8178dd57>] ? __slab_free+0x114/0x309
[ 1611.763853]  [<ffffffff8137187e>] ? debug_check_no_obj_freed+0x17e/0x270
[ 1611.764712]  [<ffffffff811e797f>] ? getname_flags+0x4f/0x1a0
[ 1611.765539]  [<ffffffff811c6b26>] kmem_cache_alloc+0x1f6/0x270
[ 1611.766364]  [<ffffffff810a7035>] ? local_clock+0x25/0x30
[ 1611.767183]  [<ffffffff811e797f>] getname_flags+0x4f/0x1a0
[ 1611.768004]  [<ffffffff811ed7e5>] user_path_at_empty+0x45/0xc0
[ 1611.768827]  [<ffffffff810a13cb>] ? preempt_count_sub+0x6b/0xf0
[ 1611.769649]  [<ffffffff810be35e>] ? put_lock_stats.isra.23+0xe/0x30
[ 1611.770470]  [<ffffffff810be67d>] ? lock_release_holdtime.part.24+0x9d/0x160
[ 1611.771297]  [<ffffffff811fdedd>] ? mntput_no_expire+0x6d/0x160
[ 1611.772129]  [<ffffffff811ed871>] user_path_at+0x11/0x20
[ 1611.772959]  [<ffffffff812040cb>] SyS_lsetxattr+0x4b/0xf0
[ 1611.773783]  [<ffffffff8179bc92>] system_call_fastpath+0x16/0x1b
[ 1611.774631] ---[ end trace 5beef170ba6002cc ]---
[ 1611.775514] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 28.493 msecs
[ 1611.776368] perf interrupt took too long (223592 > 2500), lowering kernel.perf_event_max_sample_rate to 50000


I don't really know if that's indicative of anything useful, but it
at least might have been how we triggered the NMI in the previous run.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05 18:38                                                   ` Linus Torvalds
  2014-12-05 18:48                                                     ` Dave Jones
@ 2014-12-05 19:04                                                     ` Chris Mason
  2014-12-05 19:29                                                       ` Linus Torvalds
  2014-12-06  5:04                                                     ` Gene Heskett
  2 siblings, 1 reply; 486+ messages in thread
From: Chris Mason @ 2014-12-05 19:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Linus Torvalds, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List



On Fri, Dec 5, 2014 at 1:38 PM, Linus Torvalds 
<torvalds@linux-foundation.org> wrote:
> On Fri, Dec 5, 2014 at 9:15 AM, Dave Jones <davej@redhat.com> wrote:
>> 
>>  A bisect later, and I landed on a kernel that ran for a day, before
>>  spewing NMI messages, recovering, and then..
>> 
>>  http://codemonkey.org.uk/junk/log.txt
> 
> I have to admit I'm seeing absolutely nothing sensible in there.
> 
> Call it bad, and see if bisection ends up slowly -oh so slowly -
> pointing to some direction. Because I don't think it's the hardware,
> considering that apparently 3.16 is solid. And the spews themselves
> are so incomprehensible that I'm not seeing any pattern what-so-ever.

I went back through all of the traces Dave has posted in this thread.  
This one looks like vm debugging is on:

 http://marc.info/?l=linux-kernel&m=141632237304726&w=2

Another had a function call from CONFIG_DEBUG_PAGEALLOC:

http://marc.info/?l=linux-kernel&m=141701248210949&w=2

So one idea is that our allocation/freeing of pages is dramatically 
more expensive and we're hitting a strange edge condition.  Maybe we're 
even faulting on a readonly page from a horrible place?

[83246.925234] end_request: I/O error, dev sda, sector 0

Ext3/4 shouldn't be doing IO to sector zero.  Something is stomping on 
ram?

-chris


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05 19:04                                                     ` Chris Mason
@ 2014-12-05 19:29                                                       ` Linus Torvalds
  2014-12-11 14:54                                                         ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-05 19:29 UTC (permalink / raw)
  To: Chris Mason
  Cc: Dave Jones, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, Dec 5, 2014 at 11:04 AM, Chris Mason <clm@fb.com> wrote:
>
> So one idea is that our allocation/freeing of pages is dramatically more
> expensive and we're hitting a strange edge condition.  Maybe we're even
> faulting on a readonly page from a horrible place?

Well, various allocators have definitely shown up a lot.
DEBUG_PAGEALLOC does horrible things to performance, though, and the
kernel will just spend a *lot* of time in memory allocators when it is
on. So it might just be "yeah, the traces show allocations a lot, but
that might just be because allocation is slow". The last one showed
slub debugging - getting a call trace for the allocation.

> [83246.925234] end_request: I/O error, dev sda, sector 0
>
> Ext3/4 shouldn't be doing IO to sector zero.  Something is stomping on ram?

I'd buy memory corruption through wild pointers as the reason, but
quite frankly, that tends to have completely different failure modes.
Not NMI watchdogs.

So it must be some very particular corruption. I still vote for "let's
see if Dave can narrow it down with bisection".

                 Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05 18:48                                                     ` Dave Jones
@ 2014-12-05 19:31                                                       ` Linus Torvalds
  2014-12-05 19:37                                                         ` Dave Jones
  2014-12-06 22:38                                                         ` Thomas Gleixner
  2014-12-06  9:37                                                       ` Chuck Ebbert
  1 sibling, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-05 19:31 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Fri, Dec 5, 2014 at 10:48 AM, Dave Jones <davej@redhat.com> wrote:
>
> In the meantime, I rebooted into the same kernel, and ran trinity
> solely doing the lsetxattr syscalls.

Any particular reason for the lsetxattr guess? Just the last call
chain? I don't recognize it from the other traces, but maybe I just
didn't notice.

>   The load was a bit lower, so I
> cranked up the number of child processes to 512, and then this
> happened..

Ugh. "dump_trace()" being broken and looping forever? I don't actually
believe it, because this isn't even on the exception stack (well, the
NMI dumper is, but that one worked fine - this is the "nested" dumping
of just the allocation call chain)

Smells like more random callchains to me. Unless this one is repeatable.

Limiting trinity to just lsetxattr is interesting. Did it make things
fail faster?

                     Linus

> [ 1611.747053] WARNING: CPU: 0 PID: 14810 at kernel/watchdog.c:265 watchdog_overflow_callback+0xd5/0x120()
> [ 1611.747083] Watchdog detected hard LOCKUP on cpu 0
> [ 1611.747389] CPU: 0 PID: 14810 Comm: trinity-c304 Not tainted 3.16.0+ #114
> [ 1611.747544] Call Trace:
>    [ remnoved NMI perf event stack trace ]
> [ 1611.753861]  [<ffffffff810fee07>] is_module_text_address+0x17/0x50
> [ 1611.754734]  [<ffffffff81092ab8>] __kernel_text_address+0x58/0x80
> [ 1611.755575]  [<ffffffff81006b5f>] print_context_stack+0x8f/0x100
> [ 1611.756410]  [<ffffffff81005540>] dump_trace+0x140/0x370
> [ 1611.758895]  [<ffffffff810137cb>] save_stack_trace+0x2b/0x50
> [ 1611.759720]  [<ffffffff811c29a0>] set_track+0x70/0x140
> [ 1611.760541]  [<ffffffff8178d993>] alloc_debug_processing+0x92/0x118
> [ 1611.761366]  [<ffffffff8178e5d6>] __slab_alloc+0x45f/0x56f
> [ 1611.765539]  [<ffffffff811c6b26>] kmem_cache_alloc+0x1f6/0x270
> [ 1611.767183]  [<ffffffff811e797f>] getname_flags+0x4f/0x1a0
> [ 1611.768004]  [<ffffffff811ed7e5>] user_path_at_empty+0x45/0xc0
> [ 1611.772129]  [<ffffffff811ed871>] user_path_at+0x11/0x20
> [ 1611.772959]  [<ffffffff812040cb>] SyS_lsetxattr+0x4b/0xf0
> [ 1611.773783]  [<ffffffff8179bc92>] system_call_fastpath+0x16/0x1b

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05 19:31                                                       ` Linus Torvalds
@ 2014-12-05 19:37                                                         ` Dave Jones
  2014-12-06 22:38                                                         ` Thomas Gleixner
  1 sibling, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-05 19:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, Dec 05, 2014 at 11:31:11AM -0800, Linus Torvalds wrote:
 > On Fri, Dec 5, 2014 at 10:48 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > In the meantime, I rebooted into the same kernel, and ran trinity
 > > solely doing the lsetxattr syscalls.
 > 
 > Any particular reason for the lsetxattr guess? Just the last call
 > chain? I don't recognize it from the other traces, but maybe I just
 > didn't notice.

Yeah, just a wild guess; that trace just looked so.. clean.

 > >   The load was a bit lower, so I
 > > cranked up the number of child processes to 512, and then this
 > > happened..
 >
 > Ugh. "dump_trace()" being broken and looping forever? I don't actually
 > believe it, because this isn't even on the exception stack (well, the
 > NMI dumper is, but that one worked fine - this is the "nested" dumping
 > of just the allocation call chain)
 >
 > Smells like more random callchains to me. Unless this one is repeatable.
 >
 > Limiting trinity to just lsetxattr is interesting. Did it make things
 > fail faster?

It sure failed quickly, but not in the "machine is totally locked up"
sense, just "shit is all corrupted" sense. So it might be a completely different
thing, or it could be a different manifestation of a corruptor.

I guess we'll see how things go now that I marked it 'bad'.

I'll give it a quick run with just lsetxattr again just to see what
happens, but before I leave this one run over the weekend, I'll switch
it back to "do everything", and pick it up again on Monday.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05 18:38                                                   ` Linus Torvalds
  2014-12-05 18:48                                                     ` Dave Jones
  2014-12-05 19:04                                                     ` Chris Mason
@ 2014-12-06  5:04                                                     ` Gene Heskett
  2 siblings, 0 replies; 486+ messages in thread
From: Gene Heskett @ 2014-12-06  5:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Friday 05 December 2014, Linus Torvalds wrote:
>On Fri, Dec 5, 2014 at 9:15 AM, Dave Jones <davej@redhat.com> wrote:
>> A bisect later, and I landed on a kernel that ran for a day, before
>> spewing NMI messages, recovering, and then..
>> 
>> http://codemonkey.org.uk/junk/log.txt
>
>I have to admit I'm seeing absolutely nothing sensible in there.
>
>Call it bad, and see if bisection ends up slowly -oh so slowly -
>pointing to some direction. Because I don't think it's the hardware,
>considering that apparently 3.16 is solid. And the spews themselves
>are so incomprehensible that I'm not seeing any pattern what-so-ever.
>
>                     Linus

Sort of in the FWIW category, may not mean a thing.

I did find something in 3.16.0 that troubles me at times, keeping a
quad-core Phenom very busy.  But I have located the culprit in my case.

Look at your Xorg.0.log. Because the nouveau bits in the 3.16.0 kernel I'm 
using are no longer 100% compatible with an Xorg install that's now 5 years
old, I am generating Xorg.0.logs that can reach 500 megabytes or more in a
couple weeks. I am trying to run down the latest Xorg I can build here on 
this *buntu 10.04.4 LTS box, but haven't located a URL to get the tarball 
from yet. It's been busy here.

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>
US V Castleman, SCOTUS, Mar 2014 is grounds for Impeaching SCOTUS

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05 18:48                                                     ` Dave Jones
  2014-12-05 19:31                                                       ` Linus Torvalds
@ 2014-12-06  9:37                                                       ` Chuck Ebbert
  2014-12-06 16:22                                                         ` Martin van Es
  2014-12-06 22:14                                                         ` Thomas Gleixner
  1 sibling, 2 replies; 486+ messages in thread
From: Chuck Ebbert @ 2014-12-06  9:37 UTC (permalink / raw)
  To: Dave Jones
  Cc: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, 5 Dec 2014 13:48:08 -0500
Dave Jones <davej@redhat.com> wrote:

> [ 1611.749570]  [<ffffffff81007948>] do_nmi+0xb8/0xf0
> [ 1611.750438]  [<ffffffff8179dd2a>] end_repeat_nmi+0x1e/0x2e
> [ 1611.751312]  [<ffffffff810a12c8>] ? preempt_count_add+0x18/0xb0
> [ 1611.752177]  [<ffffffff810a12c8>] ? preempt_count_add+0x18/0xb0
> [ 1611.753025]  [<ffffffff810a12c8>] ? preempt_count_add+0x18/0xb0
> [ 1611.753861]  <<EOE>>  [<ffffffff810fee07>] is_module_text_address+0x17/0x50
> [ 1611.754734]  [<ffffffff81092ab8>] __kernel_text_address+0x58/0x80
> [ 1611.755575]  [<ffffffff81006b5f>] print_context_stack+0x8f/0x100
> [ 1611.756410]  [<ffffffff81005540>] dump_trace+0x140/0x370
> [ 1611.757242]  [<ffffffff811e797f>] ? getname_flags+0x4f/0x1a0
> [ 1611.758072]  [<ffffffff811e797f>] ? getname_flags+0x4f/0x1a0
> [ 1611.758895]  [<ffffffff810137cb>] save_stack_trace+0x2b/0x50
> [ 1611.759720]  [<ffffffff811c29a0>] set_track+0x70/0x140
> [ 1611.760541]  [<ffffffff8178d993>] alloc_debug_processing+0x92/0x118
> [ 1611.761366]  [<ffffffff8178e5d6>] __slab_alloc+0x45f/0x56f
> [ 1611.762195]  [<ffffffff811e797f>] ? getname_flags+0x4f/0x1a0
> [ 1611.763024]  [<ffffffff8178dd57>] ? __slab_free+0x114/0x309
> [ 1611.763853]  [<ffffffff8137187e>] ? debug_check_no_obj_freed+0x17e/0x270
> [ 1611.764712]  [<ffffffff811e797f>] ? getname_flags+0x4f/0x1a0
> [ 1611.765539]  [<ffffffff811c6b26>] kmem_cache_alloc+0x1f6/0x270

So, every time there is a slab allocation the entire stack trace gets
saved as human readable text. And for each line in the trace,
is_module_text_address() can be called, which has huge overhead
walking the entire list of loaded modules. No wonder there are
timeouts...
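
For illustration, here is a minimal, self-contained userspace sketch of
the kind of walk being described (all names here are invented; this is
not the kernel's implementation):

#include <stdbool.h>

/* Toy model: each loaded module owns one text range. */
struct module_sketch {
	unsigned long text_start;
	unsigned long text_size;
	struct module_sketch *next;	/* linked list of loaded modules */
};

/* Resolving one return address means a linear scan over every module,
 * and a saved stack trace repeats this once per frame. */
bool is_module_text_addr_sketch(unsigned long addr,
				const struct module_sketch *modules)
{
	const struct module_sketch *m;

	for (m = modules; m; m = m->next)
		if (addr - m->text_start < m->text_size)
			return true;
	return false;
}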

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-06  9:37                                                       ` Chuck Ebbert
@ 2014-12-06 16:22                                                         ` Martin van Es
  2014-12-06 20:09                                                           ` Linus Torvalds
  2014-12-06 22:14                                                         ` Thomas Gleixner
  1 sibling, 1 reply; 486+ messages in thread
From: Martin van Es @ 2014-12-06 16:22 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Hi,

I've been following this thread with some interest because my
mythtv-based media centre suffered from sudden freezes. I thought I had
tried everything to solve the problem (including exchanging hardware)
until I caught the slashdot article discussing this bug. I was running
3.17.3 and experienced daily freezes while watching DVB-C recorded
H.264 (HD) video, both live and pre-recorded streams, both with VAAPI
GPU acceleration and with CPU decoding. Ever since downgrading to
3.16.7 my system has been solid as a rock again.

The load on this machine is marginal. Most playback is done using GPU
and other than that, the main load is hammering at most 2 HD
(1G/10mins) streams to SSD. When idle, the system crawls DVB channels
for EPG info. Crashes happened mostly while watching TV and ~5 minutes
after a background recording started or after a couple of hours
watching live TV, but were very hard to trigger. When freezing the
system was completely inaccessible, without any panic logging on (ssh
remote connected) console. I have never been able to find any evidence
in old logs after reboot. The freezes happened both with and without
NMI watchdog enabled.

Hardware is a J1900 BayTrail (ValleyView) based ASRock miniITX board.
The DVB-C card requires out-of-tree ddbridge drivers
(~endriss/media_build_expermintal), so the kernels (both stable and
unstable) are tainted, I guess?

I'm a moderately experienced Linux user. I do build my own kernels,
but am not very knowledgeable about the inner workings you guys are
discussing here. The system is a (family) production device so there
is not a lot of room for testing, let alone bisecting. I do now have a
spare J1900 lying around, so there is opportunity to build a dedicated
test system if needed.

Hope this may help in finding the right direction for this bug?

Regards,
Martin
-- 
If 'but' was any useful, it would be a logic operator

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-06 16:22                                                         ` Martin van Es
@ 2014-12-06 20:09                                                           ` Linus Torvalds
  2014-12-06 20:41                                                             ` Linus Torvalds
                                                                               ` (2 more replies)
  0 siblings, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-06 20:09 UTC (permalink / raw)
  To: Martin van Es; +Cc: Linux Kernel Mailing List

On Sat, Dec 6, 2014 at 8:22 AM, Martin van Es <mrvanes@gmail.com> wrote:
>
> Hope this may help in finding the right direction for this bug?

If you can reproduce it with your spare J1900 system and could perhaps
bisect it there, that would be a huge help.

                            Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-06 20:09                                                           ` Linus Torvalds
@ 2014-12-06 20:41                                                             ` Linus Torvalds
  2014-12-06 21:14                                                             ` Martin van Es
  2014-12-12 12:58                                                             ` Martin van Es
  2 siblings, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-06 20:41 UTC (permalink / raw)
  To: Martin van Es; +Cc: Linux Kernel Mailing List

On Sat, Dec 6, 2014 at 12:09 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, Dec 6, 2014 at 8:22 AM, Martin van Es <mrvanes@gmail.com> wrote:
>>
>> Hope this may help in finding the right direction for this bug?
>
> If you can reproduce it with your spare J1900 system and could perhaps
> bisect it there, that would be a huge help.

Side note: your load sounds superficially more like the one Dâniel
Fraga had, who saw lockups with 3.17 but 3.18-rc7 actually works for
him.

But if you're running 3.17.3, I think you should already have all the
relevant fixes through the -stable tree.

                       Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-06 20:09                                                           ` Linus Torvalds
  2014-12-06 20:41                                                             ` Linus Torvalds
@ 2014-12-06 21:14                                                             ` Martin van Es
  2014-12-12 12:58                                                             ` Martin van Es
  2 siblings, 0 replies; 486+ messages in thread
From: Martin van Es @ 2014-12-06 21:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Sat, Dec 6, 2014 at 9:09 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, Dec 6, 2014 at 8:22 AM, Martin van Es <mrvanes@gmail.com> wrote:
>>
>> Hope this may help in finding the right direction for this bug?
>
> If you can reproduce it with your spare J1900 system and could perhaps
> bisect it there, that would be a huge help.
>

I'll give it a shot and see if I can get it to freeze on 3.17.3 as a
start by playing content from the prd backend, but don't expect fast
response times... busy man...

M.

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-06  9:37                                                       ` Chuck Ebbert
  2014-12-06 16:22                                                         ` Martin van Es
@ 2014-12-06 22:14                                                         ` Thomas Gleixner
  1 sibling, 0 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-06 22:14 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Sat, 6 Dec 2014, Chuck Ebbert wrote:
> On Fri, 5 Dec 2014 13:48:08 -0500
> Dave Jones <davej@redhat.com> wrote:
> 
> > [ 1611.749570]  [<ffffffff81007948>] do_nmi+0xb8/0xf0
> > [ 1611.750438]  [<ffffffff8179dd2a>] end_repeat_nmi+0x1e/0x2e
> > [ 1611.751312]  [<ffffffff810a12c8>] ? preempt_count_add+0x18/0xb0
> > [ 1611.752177]  [<ffffffff810a12c8>] ? preempt_count_add+0x18/0xb0
> > [ 1611.753025]  [<ffffffff810a12c8>] ? preempt_count_add+0x18/0xb0
> > [ 1611.753861]  <<EOE>>  [<ffffffff810fee07>] is_module_text_address+0x17/0x50
> > [ 1611.754734]  [<ffffffff81092ab8>] __kernel_text_address+0x58/0x80
> > [ 1611.755575]  [<ffffffff81006b5f>] print_context_stack+0x8f/0x100
> > [ 1611.756410]  [<ffffffff81005540>] dump_trace+0x140/0x370
> > [ 1611.757242]  [<ffffffff811e797f>] ? getname_flags+0x4f/0x1a0
> > [ 1611.758072]  [<ffffffff811e797f>] ? getname_flags+0x4f/0x1a0
> > [ 1611.758895]  [<ffffffff810137cb>] save_stack_trace+0x2b/0x50
> > [ 1611.759720]  [<ffffffff811c29a0>] set_track+0x70/0x140
> > [ 1611.760541]  [<ffffffff8178d993>] alloc_debug_processing+0x92/0x118
> > [ 1611.761366]  [<ffffffff8178e5d6>] __slab_alloc+0x45f/0x56f
> > [ 1611.762195]  [<ffffffff811e797f>] ? getname_flags+0x4f/0x1a0
> > [ 1611.763024]  [<ffffffff8178dd57>] ? __slab_free+0x114/0x309
> > [ 1611.763853]  [<ffffffff8137187e>] ? debug_check_no_obj_freed+0x17e/0x270
> > [ 1611.764712]  [<ffffffff811e797f>] ? getname_flags+0x4f/0x1a0
> > [ 1611.765539]  [<ffffffff811c6b26>] kmem_cache_alloc+0x1f6/0x270
> 
> So, every time there is a slab allocation the entire stack trace gets
> saved as human readable text. And for each line in the trace,

Wrong. It gets saved as addresses. No conversion to text at all.

> is_module_text_address() can be called, which has huge overhead
> walking the entire list of loaded modules. No wonder there are
> timeouts...

You would have to have a gazillion modules to make that overhead
big enough to trigger a multi-second watchdog.
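
As a back-of-envelope sketch of that point (the per-check cost below is
an assumption, not a measurement):

#include <stdio.h>

int main(void)
{
	double ns_per_module_check = 100.0;	/* assumed cost of one range check */
	double frames_per_trace = 16.0;		/* typical saved-trace depth */
	double loaded_modules = 100.0;		/* already a fairly fat system */

	double ns_per_alloc = ns_per_module_check * frames_per_trace
			      * loaded_modules;

	/* ~160 microseconds per traced allocation under these assumptions.
	 * You need either absurd module counts or an absurd allocation
	 * rate inside one non-preemptible region before this alone
	 * explains a 20+ second watchdog trip. */
	printf("%.0f us per traced allocation\n", ns_per_alloc / 1000.0);
	return 0;
}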

Thanks,

	tglx





^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05 19:31                                                       ` Linus Torvalds
  2014-12-05 19:37                                                         ` Dave Jones
@ 2014-12-06 22:38                                                         ` Thomas Gleixner
  1 sibling, 0 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-06 22:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, 5 Dec 2014, Linus Torvalds wrote:
> On Fri, Dec 5, 2014 at 10:48 AM, Dave Jones <davej@redhat.com> wrote:
> >
> > In the meantime, I rebooted into the same kernel, and ran trinity
> > solely doing the lsetxattr syscalls.
> 
> Any particular reason for the lsetxattr guess? Just the last call
> chain? I don't recognize it from the other traces, but maybe I just
> didn't notice.
> 
> >   The load was a bit lower, so I
> > cranked up the number of child processes to 512, and then this
> > happened..
> 
> Ugh. "dump_trace()" being broken and looping forever? I don't actually

Looking at the callchain: up to the point where dump_stack() is called
everything is preemptible context. So dump_stack() would need to loop
for a few seconds to trigger the NMI watchdog.

> believe it, because this isn't even on the exception stack (well, the
> NMI dumper is, but that one worked fine - this is the "nested" dumping
> of just the allocation call chain)

I doubt that dump_trace() itself is broken, but the call site might
have handed in something which causes memory corruption. And looking
at set_track() and the completely undocumented way it retrieves
the storage for the trace entries via get_track() makes my brain melt.
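
To make the shape of that concrete, a hedged, self-contained model of
what gets stored (field names and layout are mine, not mm/slub.c):

#include <stddef.h>

#define TRACK_DEPTH 16
enum track_kind { TRACK_ALLOC, TRACK_FREE };

/* Per-object record: raw return addresses, no text. */
struct track_sketch {
	unsigned long addrs[TRACK_DEPTH];
	int cpu;
	int pid;
	unsigned long when;
};

/* In this model the two records live right behind the object payload.
 * If the object pointer or size handed in is already bogus, this
 * happily computes a wild address to scribble the trace into, which
 * is exactly the kind of corruption vector being worried about here. */
struct track_sketch *get_track_sketch(void *object, size_t object_size,
				      enum track_kind kind)
{
	struct track_sketch *t =
		(struct track_sketch *)((char *)object + object_size);

	return t + kind;
}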

Thanks,

	tglx








^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05 18:15                                                 ` Linus Torvalds
@ 2014-12-07 14:58                                                   ` Sasha Levin
  2014-12-07 18:24                                                     ` Paul E. McKenney
  2014-12-07 23:53                                                     ` Linus Torvalds
  0 siblings, 2 replies; 486+ messages in thread
From: Sasha Levin @ 2014-12-07 14:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On 12/05/2014 01:15 PM, Linus Torvalds wrote:
> On Fri, Dec 5, 2014 at 7:03 AM, Sasha Levin <sasha.levin@oracle.com> wrote:
>>
>> Yes, it's going to a serial line, but it's only about 100 lines/second on
>> average. I wouldn't expect it to cause anything to hang!
> 
> A regular 16650 serial chip? Running at 115kbps, I assume? So that's
> about 11kB/s.
> 
> And the serial console is polling, since it can't sleep or depend on interrupts.
> 
> At a average line length of what, 40 characters? At less than 300
> lines/s, you'd be using up 100% of one CPU. And since the printouts
> are serialized, that would be all other CPU's too..
> 
> 100 lines/s _average_ means that I can easily see it be 300lines/s for a while.
> 
> So yeah. The serial console is simply not designed to handle
> continuous output. It's for the "occasional" stuff.
> 
> The fact that your rcu lockups go away when you make the fault
> injection be quiet makes me really suspect this is related.

The lockups themselves "go away", but looking closer at the log
without those extra prints, I'm seeing:

[ 1458.700070] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1458.700133]  (detected by 19, t=30502 jiffies, g=12293, c=12292, q=0)
[ 1458.702764] INFO: Stall ended before state dump start

Quite often.

Maybe the extra prints were just a catalyst?


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-07 14:58                                                   ` Sasha Levin
@ 2014-12-07 18:24                                                     ` Paul E. McKenney
  2014-12-07 19:43                                                       ` Paul E. McKenney
  2014-12-07 23:53                                                     ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-07 18:24 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Dâniel Fraga,
	Linux Kernel Mailing List

On Sun, Dec 07, 2014 at 09:58:14AM -0500, Sasha Levin wrote:
> On 12/05/2014 01:15 PM, Linus Torvalds wrote:
> > On Fri, Dec 5, 2014 at 7:03 AM, Sasha Levin <sasha.levin@oracle.com> wrote:
> >>
> >> Yes, it's going to a serial line, but it's only about 100 lines/second on
> >> average. I wouldn't expect it to cause anything to hang!
> > 
> > A regular 16650 serial chip? Running at 115kbps, I assume? So that's
> > about 11kB/s.
> > 
> > And the serial console is polling, since it can't sleep or depend on interrupts.
> > 
> > At a average line length of what, 40 characters? At less than 300
> > lines/s, you'd be using up 100% of one CPU. And since the printouts
> > are serialized, that would be all other CPU's too..
> > 
> > 100 lines/s _average_ means that I can easily see it be 300lines/s for a while.
> > 
> > So yeah. The serial console is simply not designed to handle
> > continuous output. It's for the "occasional" stuff.
> > 
> > The fact that your rcu lockups go away when you make the fault
> > injection be quiet makes me really suspect this is related.
> 
> The lockups themselves "go away", but looking closer at the log
> without those extra prints, I'm seeing:
> 
> [ 1458.700070] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1458.700133]  (detected by 19, t=30502 jiffies, g=12293, c=12292, q=0)
> [ 1458.702764] INFO: Stall ended before state dump start
> 
> Quite often.
> 
> Maybe the extra prints were just a catalyst?

Is anything else being printed about the time that these messages show
up?  Or is this the only output for the 40,000 jiffies preceding this
message?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-07 18:24                                                     ` Paul E. McKenney
@ 2014-12-07 19:43                                                       ` Paul E. McKenney
  2014-12-07 23:28                                                         ` Sasha Levin
  0 siblings, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-07 19:43 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Dâniel Fraga,
	Linux Kernel Mailing List

On Sun, Dec 07, 2014 at 10:24:20AM -0800, Paul E. McKenney wrote:
> On Sun, Dec 07, 2014 at 09:58:14AM -0500, Sasha Levin wrote:
> > On 12/05/2014 01:15 PM, Linus Torvalds wrote:
> > > On Fri, Dec 5, 2014 at 7:03 AM, Sasha Levin <sasha.levin@oracle.com> wrote:
> > >>
> > >> Yes, it's going to a serial line, but it's only about 100 lines/second on
> > >> average. I wouldn't expect it to cause anything to hang!
> > > 
> > > A regular 16650 serial chip? Running at 115kbps, I assume? So that's
> > > about 11kB/s.
> > > 
> > > And the serial console is polling, since it can't sleep or depend on interrupts.
> > > 
> > > At a average line length of what, 40 characters? At less than 300
> > > lines/s, you'd be using up 100% of one CPU. And since the printouts
> > > are serialized, that would be all other CPU's too..
> > > 
> > > 100 lines/s _average_ means that I can easily see it be 300lines/s for a while.
> > > 
> > > So yeah. The serial console is simply not designed to handle
> > > continuous output. It's for the "occasional" stuff.
> > > 
> > > The fact that your rcu lockups go away when you make the fault
> > > injection be quiet makes me really suspect this is related.
> > 
> > The lockups themselves "go away", but looking closer at the log
> > without those extra prints, I'm seeing:
> > 
> > [ 1458.700070] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [ 1458.700133]  (detected by 19, t=30502 jiffies, g=12293, c=12292, q=0)
> > [ 1458.702764] INFO: Stall ended before state dump start
> > 
> > Quite often.
> > 
> > Maybe the extra prints were just a catalyst?
> 
> Is anything else being printed about the time that these messages show
> up?  Or is this the only output for the 40,000 jiffies preceding this
> message?

And could you please build with CONFIG_RCU_CPU_STALL_INFO=y and try
this again?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-07 19:43                                                       ` Paul E. McKenney
@ 2014-12-07 23:28                                                         ` Sasha Levin
  2014-12-08  5:20                                                           ` Paul E. McKenney
  0 siblings, 1 reply; 486+ messages in thread
From: Sasha Levin @ 2014-12-07 23:28 UTC (permalink / raw)
  To: paulmck
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Dâniel Fraga,
	Linux Kernel Mailing List

On 12/07/2014 02:43 PM, Paul E. McKenney wrote:
> On Sun, Dec 07, 2014 at 10:24:20AM -0800, Paul E. McKenney wrote:
>> On Sun, Dec 07, 2014 at 09:58:14AM -0500, Sasha Levin wrote:
>>> On 12/05/2014 01:15 PM, Linus Torvalds wrote:
>>>> On Fri, Dec 5, 2014 at 7:03 AM, Sasha Levin <sasha.levin@oracle.com> wrote:
>>>>>
>>>>> Yes, it's going to a serial line, but it's only about 100 lines/second on
>>>>> average. I wouldn't expect it to cause anything to hang!
>>>>
>>>> A regular 16650 serial chip? Running at 115kbps, I assume? So that's
>>>> about 11kB/s.
>>>>
>>>> And the serial console is polling, since it can't sleep or depend on interrupts.
>>>>
>>>> At a average line length of what, 40 characters? At less than 300
>>>> lines/s, you'd be using up 100% of one CPU. And since the printouts
>>>> are serialized, that would be all other CPU's too..
>>>>
>>>> 100 lines/s _average_ means that I can easily see it be 300lines/s for a while.
>>>>
>>>> So yeah. The serial console is simply not designed to handle
>>>> continuous output. It's for the "occasional" stuff.
>>>>
>>>> The fact that your rcu lockups go away when you make the fault
>>>> injection be quiet makes me really suspect this is related.
>>>
>>> The lockups themselves "go away", but looking closer at the log
>>> without those extra prints, I'm seeing:
>>>
>>> [ 1458.700070] INFO: rcu_preempt detected stalls on CPUs/tasks:
>>> [ 1458.700133]  (detected by 19, t=30502 jiffies, g=12293, c=12292, q=0)
>>> [ 1458.702764] INFO: Stall ended before state dump start
>>>
>>> Quite often.
>>>
>>> Maybe the extra prints were just a catalyst?
>>
>> Is anything else being printed about the time that these messages show
>> up?  Or is this the only output for the 40,000 jiffies preceding this
>> message?

There's actually nothing going on (beyond fuzzing noise) before/after:

[  756.618342] kexec-bzImage64: Not a bzImage
[  762.381734] kexec-bzImage64: Not a bzImage
[  765.129612] Unable to find swap-space signature
[  771.022304] Unable to find swap-space signature
[  793.434417] kexec-bzImage64: Not a bzImage
[  797.978210] => alloc_cpumask_var: failed!
[  800.253116] kexec-bzImage64: Not a bzImage
[  818.280056] INFO: rcu_sched detected stalls on CPUs/tasks:
[  818.280056]  (detected by 11, t=30503 jiffies, g=-295, c=-296, q=0)
[  818.283400] INFO: Stall ended before state dump start
[  829.523992] audit: type=1326 audit(39.680:47): auid=4294967295 uid=2385760256 gid=2214330370 ses=4294967295 pid=13307 comm="trinity-c353" exe="/trinity/trinity" sig=9 arch=c000003e syscall=96 compat=0 ip=0x7fffcb7bee47 code=0x0
[  830.890841] audit: type=1326 audit(41.010:48): auid=4294967295 uid=310902784 gid=201841673 ses=4294967295 pid=13294 comm="trinity-c350" exe="/trinity/trinity" sig=9 arch=c000003e syscall=96 compat=0 ip=0x7fffcb7bee47 code=0x0

> And could you please build with CONFIG_RCU_CPU_STALL_INFO=y and try
> this again?

I already have it set:

$ grep STALL_INFO .config
CONFIG_RCU_CPU_STALL_INFO=y


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-07 14:58                                                   ` Sasha Levin
  2014-12-07 18:24                                                     ` Paul E. McKenney
@ 2014-12-07 23:53                                                     ` Linus Torvalds
  1 sibling, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-07 23:53 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dave Jones, Chris Mason, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On Sun, Dec 7, 2014 at 6:58 AM, Sasha Levin <sasha.levin@oracle.com> wrote:
>
> Maybe the extra prints were just a catalyst?

So there's an interesting change in between 3.16..3.17 - a commit that
was already reverted once due to unrelated problems (it apparently hit
lockdep issues): commit 5874af2003b1 ("printk: enable interrupts
before calling console_trylock_for_printk()").

In particular, that commit means that interrupts get re-enabled in the
middle of the printk (if they were enabled before the printk), and
while I don't see why that would be wrong, it definitely might change
behavior. That code has often been fragile (the whole lockdep example
was just the latest case of that). For example, it ends up looping
over "goto again" with preemption disabled if new console messages
keep coming in.
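
Roughly, the pattern in question looks like this (a self-contained
sketch, not the kernel's printk.c; the names are invented):

/* Producer/consumer positions in the log buffer (model only). */
static unsigned long log_next_seq;	/* where printk() appends */
static unsigned long console_seq;	/* what has reached the console */

static void emit_one_record(void)
{
	/* stand-in for a slow, polled write to e.g. a 115200-baud UART */
	console_seq++;
}

void console_flush_model(void)
{
again:
	while (console_seq < log_next_seq)
		emit_one_record();

	/* If another CPU queued more records while we were draining,
	 * loop instead of returning.  With a continuous producer the
	 * flusher never gets out of here: it spins with preemption
	 * disabled, and before that commit with interrupts disabled
	 * all the way as well. */
	if (console_seq != log_next_seq)
		goto again;
}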

So I don't think that "enable interrupts" commit itself is necessarily
buggy, but looking at all the printk changes in the relevant time
range, I can easily see that particular commit having some subtle
interaction under heavy printk activity. Before that commit, all the
queued printouts would be written with interrupts disabled all the
way. After that commit, interrupts get re-enabled before and in
between the messages actually getting pushed to the console.

Should it matter? No. But I don't think we figured out what went wrong
with the lockdep issue that an earlier version of that commit had
either, and that problem caused lockups at boot for some people.  The
whole "print to console" is just fragile, and the addition of serial
console migth just make it even worse.

I dunno. But especially since your RCU issues seem to solve themselves
when *not* having lots of printk's, maybe the lockup is somehow
related to this all. Maybe the lockdep recursion hang ends up being a
"RCU debugging" hang when the timer interrupt causes printk recursion
with the console lock held..

                       Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-07 23:28                                                         ` Sasha Levin
@ 2014-12-08  5:20                                                           ` Paul E. McKenney
  2014-12-08 14:33                                                             ` Sasha Levin
  0 siblings, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-08  5:20 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Dâniel Fraga,
	Linux Kernel Mailing List

On Sun, Dec 07, 2014 at 06:28:43PM -0500, Sasha Levin wrote:
> On 12/07/2014 02:43 PM, Paul E. McKenney wrote:
> > On Sun, Dec 07, 2014 at 10:24:20AM -0800, Paul E. McKenney wrote:
> >> On Sun, Dec 07, 2014 at 09:58:14AM -0500, Sasha Levin wrote:
> >>> On 12/05/2014 01:15 PM, Linus Torvalds wrote:
> >>>> On Fri, Dec 5, 2014 at 7:03 AM, Sasha Levin <sasha.levin@oracle.com> wrote:
> >>>>>
> >>>>> Yes, it's going to a serial line, but it's only about 100 lines/second on
> >>>>> average. I wouldn't expect it to cause anything to hang!
> >>>>
> >>>> A regular 16650 serial chip? Running at 115kbps, I assume? So that's
> >>>> about 11kB/s.
> >>>>
> >>>> And the serial console is polling, since it can't sleep or depend on interrupts.
> >>>>
> >>>> At a average line length of what, 40 characters? At less than 300
> >>>> lines/s, you'd be using up 100% of one CPU. And since the printouts
> >>>> are serialized, that would be all other CPU's too..
> >>>>
> >>>> 100 lines/s _average_ means that I can easily see it be 300lines/s for a while.
> >>>>
> >>>> So yeah. The serial console is simply not designed to handle
> >>>> continuous output. It's for the "occasional" stuff.
> >>>>
> >>>> The fact that your rcu lockups go away when you make the fault
> >>>> injection be quiet makes me really suspect this is related.
> >>>
> >>> The lockups themselves "go away", but looking closer at the log
> >>> without those extra prints, I'm seeing:
> >>>
> >>> [ 1458.700070] INFO: rcu_preempt detected stalls on CPUs/tasks:
> >>> [ 1458.700133]  (detected by 19, t=30502 jiffies, g=12293, c=12292, q=0)
> >>> [ 1458.702764] INFO: Stall ended before state dump start
> >>>
> >>> Quite often.
> >>>
> >>> Maybe the extra prints were just a catalyst?
> >>
> >> Is anything else being printed about the time that these messages show
> >> up?  Or is this the only output for the 40,000 jiffies preceding this
> >> message?
> 
> There's actually nothing going on (beyond fuzzing noise) before/after:
> 
> [  756.618342] kexec-bzImage64: Not a bzImage
> [  762.381734] kexec-bzImage64: Not a bzImage
> [  765.129612] Unable to find swap-space signature
> [  771.022304] Unable to find swap-space signature
> [  793.434417] kexec-bzImage64: Not a bzImage
> [  797.978210] => alloc_cpumask_var: failed!
> [  800.253116] kexec-bzImage64: Not a bzImage
> [  818.280056] INFO: rcu_sched detected stalls on CPUs/tasks:
> [  818.280056]  (detected by 11, t=30503 jiffies, g=-295, c=-296, q=0)
> [  818.283400] INFO: Stall ended before state dump start
> [  829.523992] audit: type=1326 audit(39.680:47): auid=4294967295 uid=2385760256 gid=2214330370 ses=4294967295 pid=13307 comm="trinity-c353" exe="/trinity/trinity" sig=9 arch=c000003e syscall=96 compat=0 ip=0x7fffcb7bee47 code=0x0
> [  830.890841] audit: type=1326 audit(41.010:48): auid=4294967295 uid=310902784 gid=201841673 ses=4294967295 pid=13294 comm="trinity-c350" exe="/trinity/trinity" sig=9 arch=c000003e syscall=96 compat=0 ip=0x7fffcb7bee47 code=0x0

I have seen this caused by lost IPIs, but you have to lose two of them,
which seems less than fully likely.

> > And could you please build with CONFIG_RCU_CPU_STALL_INFO=y and try
> > this again?
> 
> I already have it set:
> 
> $ grep STALL_INFO .config
> CONFIG_RCU_CPU_STALL_INFO=y

Ah, apologies.  OK, time to make this the default, and later to remove
the option...

							Thanx, Paul


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-08  5:20                                                           ` Paul E. McKenney
@ 2014-12-08 14:33                                                             ` Sasha Levin
  2014-12-08 15:28                                                               ` Sasha Levin
  2014-12-08 15:56                                                               ` Paul E. McKenney
  0 siblings, 2 replies; 486+ messages in thread
From: Sasha Levin @ 2014-12-08 14:33 UTC (permalink / raw)
  To: paulmck
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Dâniel Fraga,
	Linux Kernel Mailing List

On 12/08/2014 12:20 AM, Paul E. McKenney wrote:
> I have seen this caused by lost IPIs, but you have to lose two of them,
> which seems less than fully likely.

It does seem that it can cause full blown stalls as well, just pretty
rarely (notice the lack of any prints before):

[11373.032327] audit: type=1326 audit(1397703594.974:502): auid=4294967295 uid=7884781 gid=0 ses=4294967295 pid=9853 comm="trinity-c768" exe="/trinity/trinity" sig=9 arch=c000003e syscall=96 compat=0 ip=0x7fff2c3fee47 code=0x0
[11374.565881] audit: type=1326 audit(1397703596.504:503): auid=4294967295 uid=32 gid=0 ses=4294967295 pid=9801 comm="trinity-c710" exe="/trinity/trinity" sig=9 arch=c000003e syscall=96 compat=0 ip=0x7fff2c3fee47 code=0x0
[11839.353539] Hangcheck: hangcheck value past margin!
[12040.010128] INFO: rcu_sched detected stalls on CPUs/tasks:
[12040.012072]  (detected by 4, t=213513 jiffies, g=-222, c=-223, q=0)
[12040.014200] INFO: Stall ended before state dump start
[12159.730069] INFO: rcu_preempt detected stalls on CPUs/tasks:
[12159.730069]  (detected by 3, t=396537 jiffies, g=24095, c=24094, q=1346)
[12159.730069] INFO: Stall ended before state dump start
[12602.162439] Hangcheck: hangcheck value past margin!
[12655.560806] INFO: rcu_sched detected stalls on CPUs/tasks:
[12655.560806]  0: (3 ticks this GP) idle=bc3/140000000000002/0 softirq=26674/26674 last_accelerate: b2a8/da68, nonlazy_posted: 20893, ..
[12655.602171]  (detected by 13, t=30506 jiffies, g=-219, c=-220, q=0)
[12655.602171] Task dump for CPU 0:
[12655.602171] trinity-c39     R  running task    11904  6558  26120 0x0008000c
[12655.602171]  ffffffff81593bf7 ffff8808d5d58d40 0000000000000282 ffffffff9ef40538
[12655.602171]  ffff880481400000 ffff8808d5dcb638 ffffffff83f2ed2b ffffffff9eaa1718
[12655.602171]  ffff8808d5d58d08 00000b820c44b0ae 0000000000000000 0000000000000001
[12655.602171] Call Trace:
[12655.602171]  [<ffffffff81593bf7>] ? trace_hardirqs_on_caller+0x677/0x900
[12655.602171]  [<ffffffff83f2ed2b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[12655.602171]  [<ffffffff91677ffb>] ? retint_restore_args+0x13/0x13
[12655.602171]  [<ffffffff91676612>] ? _raw_spin_unlock_irqrestore+0xa2/0xf0
[12655.602171]  [<ffffffff83f93705>] ? __debug_check_no_obj_freed+0x2f5/0xd90
[12655.602171]  [<ffffffff81593bf7>] ? trace_hardirqs_on_caller+0x677/0x900
[12655.602171]  [<ffffffff83f95ba9>] ? debug_check_no_obj_freed+0x19/0x20
[12655.602171]  [<ffffffff819049bf>] ? free_pages_prepare+0x5bf/0x1000
[12655.602171]  [<ffffffff83f90fe3>] ? __this_cpu_preempt_check+0x13/0x20
[12655.602171]  [<ffffffff8190cacd>] ? __free_pages_ok+0x3d/0x360
[12655.602171]  [<ffffffff8190ce7d>] ? free_compound_page+0x8d/0xd0
[12655.602171]  [<ffffffff81929986>] ? __put_compound_page+0x46/0x70
[12655.602171]  [<ffffffff8192b395>] ? put_compound_page+0xf5/0x10e0
[12655.602171]  [<ffffffff814d99ab>] ? preempt_count_sub+0x11b/0x1d0
[12655.602171]  [<ffffffff8192da4d>] ? release_pages+0x41d/0x6f0
[12655.602171]  [<ffffffff81a0188b>] ? free_pages_and_swap_cache+0x11b/0x1a0
[12655.602171]  [<ffffffff819a6b92>] ? tlb_flush_mmu_free+0x72/0x180
[12655.602171]  [<ffffffff819ace76>] ? unmap_single_vma+0x1326/0x2170
[12655.602171]  [<ffffffff83f90fe3>] ? __this_cpu_preempt_check+0x13/0x20
[12655.602171]  [<ffffffff819b0644>] ? unmap_vmas+0xd4/0x250
[12655.602171]  [<ffffffff819d62c9>] ? exit_mmap+0x169/0x610
[12655.602171]  [<ffffffff81a678fd>] ? kmem_cache_free+0x7cd/0xbb0
[12655.602171]  [<ffffffff814095b2>] ? mmput+0xd2/0x2c0
[12655.602171]  [<ffffffff81423551>] ? do_exit+0x7e1/0x39c0
[12655.602171]  [<ffffffff81456fb2>] ? get_signal+0x7a2/0x2130
[12655.602171]  [<ffffffff81426891>] ? do_group_exit+0x101/0x490
[12655.602171]  [<ffffffff814d99ab>] ? preempt_count_sub+0x11b/0x1d0
[12655.602171]  [<ffffffff81456f4e>] ? get_signal+0x73e/0x2130
[12655.602171]  [<ffffffff811d59f1>] ? sched_clock+0x31/0x50
[12655.602171]  [<ffffffff81585ded>] ? get_lock_stats+0x1d/0x100
[12655.602171]  [<ffffffff811ac828>] ? do_signal+0x28/0x3750
[12655.602171]  [<ffffffff814f7f73>] ? vtime_account_user+0x173/0x220
[12655.602171]  [<ffffffff814d96c1>] ? get_parent_ip+0x11/0x50
[12655.602171]  [<ffffffff83f90fe3>] ? __this_cpu_preempt_check+0x13/0x20
[12655.602171]  [<ffffffff81593bf7>] ? trace_hardirqs_on_caller+0x677/0x900
[12655.602171]  [<ffffffff81593e8d>] ? trace_hardirqs_on+0xd/0x10
[12655.602171]  [<ffffffff811affb9>] ? do_notify_resume+0x69/0x100
[12655.602171]  [<ffffffff9167744f>] ? int_signal+0x12/0x17


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-08 14:33                                                             ` Sasha Levin
@ 2014-12-08 15:28                                                               ` Sasha Levin
  2014-12-08 15:57                                                                 ` Paul E. McKenney
  2014-12-08 15:56                                                               ` Paul E. McKenney
  1 sibling, 1 reply; 486+ messages in thread
From: Sasha Levin @ 2014-12-08 15:28 UTC (permalink / raw)
  To: paulmck
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Dâniel Fraga,
	Linux Kernel Mailing List

On 12/08/2014 09:33 AM, Sasha Levin wrote:
> On 12/08/2014 12:20 AM, Paul E. McKenney wrote:
>> > I have seen this caused by lost IPIs, but you have to lose two of them,
>> > which seems less than fully likely.
> It does seem that it can cause full blown stalls as well, just pretty
> rarely (notice the lack of any prints before):

Forgot to mention, I cranked the rcu lockup timeout to 300 seconds and got
that stall.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-08 14:33                                                             ` Sasha Levin
  2014-12-08 15:28                                                               ` Sasha Levin
@ 2014-12-08 15:56                                                               ` Paul E. McKenney
  1 sibling, 0 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-08 15:56 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Dâniel Fraga,
	Linux Kernel Mailing List

On Mon, Dec 08, 2014 at 09:33:29AM -0500, Sasha Levin wrote:
> On 12/08/2014 12:20 AM, Paul E. McKenney wrote:
> > I have seen this caused by lost IPIs, but you have to lose two of them,
> > which seems less than fully likely.
> 
> It does seem that it can cause full blown stalls as well, just pretty
> rarely (notice the lack of any prints before):

This is back in the TLB-flush and free-pages arena.  But unless you
have an extremely large process, I wouldn't expect this to take 30K
jiffies.

> [11373.032327] audit: type=1326 audit(1397703594.974:502): auid=4294967295 uid=7884781 gid=0 ses=4294967295 pid=9853 comm="trinity-c768" exe="/trinity/trinity" sig=9 arch=c000003e syscall=96 compat=0 ip=0x7fff2c3fee47 code=0x0
> [11374.565881] audit: type=1326 audit(1397703596.504:503): auid=4294967295 uid=32 gid=0 ses=4294967295 pid=9801 comm="trinity-c710" exe="/trinity/trinity" sig=9 arch=c000003e syscall=96 compat=0 ip=0x7fff2c3fee47 code=0x0
> [11839.353539] Hangcheck: hangcheck value past margin!
> [12040.010128] INFO: rcu_sched detected stalls on CPUs/tasks:
> [12040.012072]  (detected by 4, t=213513 jiffies, g=-222, c=-223, q=0)
> [12040.014200] INFO: Stall ended before state dump start
> [12159.730069] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [12159.730069]  (detected by 3, t=396537 jiffies, g=24095, c=24094, q=1346)
> [12159.730069] INFO: Stall ended before state dump start
> [12602.162439] Hangcheck: hangcheck value past margin!
> [12655.560806] INFO: rcu_sched detected stalls on CPUs/tasks:
> [12655.560806]  0: (3 ticks this GP) idle=bc3/140000000000002/0 softirq=26674/26674 last_accelerate: b2a8/da68, nonlazy_posted: 20893, ..

And the above is what I was looking for from RCU_CPU_STALL_INFO, thank you!

This CPU only saw three scheduling-clock ticks during this grace period,
which suggests that your workload is not very intense or that you are
running NO_HZ_FULL with this CPU having a single CPU-bound userspace task.

The CPU is currently not idle (no surprise).  There has been no softirq
activity during this grace period.  There was at least one callback
acceleration during this grace period (irrelevant to this bug), and there
have been more than 20K non-lazy callbacks posted during this grace
period (no surprise, given that the grace period has been in force for
more than 30K jiffies).

							Thanx, Paul

> [12655.602171]  (detected by 13, t=30506 jiffies, g=-219, c=-220, q=0)
> [12655.602171] Task dump for CPU 0:
> [12655.602171] trinity-c39     R  running task    11904  6558  26120 0x0008000c
> [12655.602171]  ffffffff81593bf7 ffff8808d5d58d40 0000000000000282 ffffffff9ef40538
> [12655.602171]  ffff880481400000 ffff8808d5dcb638 ffffffff83f2ed2b ffffffff9eaa1718
> [12655.602171]  ffff8808d5d58d08 00000b820c44b0ae 0000000000000000 0000000000000001
> [12655.602171] Call Trace:
> [12655.602171]  [<ffffffff81593bf7>] ? trace_hardirqs_on_caller+0x677/0x900
> [12655.602171]  [<ffffffff83f2ed2b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [12655.602171]  [<ffffffff91677ffb>] ? retint_restore_args+0x13/0x13
> [12655.602171]  [<ffffffff91676612>] ? _raw_spin_unlock_irqrestore+0xa2/0xf0
> [12655.602171]  [<ffffffff83f93705>] ? __debug_check_no_obj_freed+0x2f5/0xd90
> [12655.602171]  [<ffffffff81593bf7>] ? trace_hardirqs_on_caller+0x677/0x900
> [12655.602171]  [<ffffffff83f95ba9>] ? debug_check_no_obj_freed+0x19/0x20
> [12655.602171]  [<ffffffff819049bf>] ? free_pages_prepare+0x5bf/0x1000
> [12655.602171]  [<ffffffff83f90fe3>] ? __this_cpu_preempt_check+0x13/0x20
> [12655.602171]  [<ffffffff8190cacd>] ? __free_pages_ok+0x3d/0x360
> [12655.602171]  [<ffffffff8190ce7d>] ? free_compound_page+0x8d/0xd0
> [12655.602171]  [<ffffffff81929986>] ? __put_compound_page+0x46/0x70
> [12655.602171]  [<ffffffff8192b395>] ? put_compound_page+0xf5/0x10e0
> [12655.602171]  [<ffffffff814d99ab>] ? preempt_count_sub+0x11b/0x1d0
> [12655.602171]  [<ffffffff8192da4d>] ? release_pages+0x41d/0x6f0
> [12655.602171]  [<ffffffff81a0188b>] ? free_pages_and_swap_cache+0x11b/0x1a0
> [12655.602171]  [<ffffffff819a6b92>] ? tlb_flush_mmu_free+0x72/0x180
> [12655.602171]  [<ffffffff819ace76>] ? unmap_single_vma+0x1326/0x2170
> [12655.602171]  [<ffffffff83f90fe3>] ? __this_cpu_preempt_check+0x13/0x20
> [12655.602171]  [<ffffffff819b0644>] ? unmap_vmas+0xd4/0x250
> [12655.602171]  [<ffffffff819d62c9>] ? exit_mmap+0x169/0x610
> [12655.602171]  [<ffffffff81a678fd>] ? kmem_cache_free+0x7cd/0xbb0
> [12655.602171]  [<ffffffff814095b2>] ? mmput+0xd2/0x2c0
> [12655.602171]  [<ffffffff81423551>] ? do_exit+0x7e1/0x39c0
> [12655.602171]  [<ffffffff81456fb2>] ? get_signal+0x7a2/0x2130
> [12655.602171]  [<ffffffff81426891>] ? do_group_exit+0x101/0x490
> [12655.602171]  [<ffffffff814d99ab>] ? preempt_count_sub+0x11b/0x1d0
> [12655.602171]  [<ffffffff81456f4e>] ? get_signal+0x73e/0x2130
> [12655.602171]  [<ffffffff811d59f1>] ? sched_clock+0x31/0x50
> [12655.602171]  [<ffffffff81585ded>] ? get_lock_stats+0x1d/0x100
> [12655.602171]  [<ffffffff811ac828>] ? do_signal+0x28/0x3750
> [12655.602171]  [<ffffffff814f7f73>] ? vtime_account_user+0x173/0x220
> [12655.602171]  [<ffffffff814d96c1>] ? get_parent_ip+0x11/0x50
> [12655.602171]  [<ffffffff83f90fe3>] ? __this_cpu_preempt_check+0x13/0x20
> [12655.602171]  [<ffffffff81593bf7>] ? trace_hardirqs_on_caller+0x677/0x900
> [12655.602171]  [<ffffffff81593e8d>] ? trace_hardirqs_on+0xd/0x10
> [12655.602171]  [<ffffffff811affb9>] ? do_notify_resume+0x69/0x100
> [12655.602171]  [<ffffffff9167744f>] ? int_signal+0x12/0x17
> 
> 
> Thanks,
> Sasha
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-08 15:28                                                               ` Sasha Levin
@ 2014-12-08 15:57                                                                 ` Paul E. McKenney
  2014-12-08 16:34                                                                   ` Sasha Levin
  0 siblings, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-08 15:57 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Dâniel Fraga,
	Linux Kernel Mailing List

On Mon, Dec 08, 2014 at 10:28:53AM -0500, Sasha Levin wrote:
> On 12/08/2014 09:33 AM, Sasha Levin wrote:
> > On 12/08/2014 12:20 AM, Paul E. McKenney wrote:
> >> > I have seen this caused by lost IPIs, but you have to lose two of them,
> >> > which seems less than fully likely.
> > It does seem that it can cause full blown stalls as well, just pretty
> > rarely (notice the lack of any prints before):
> 
> Forgot to mention, I cranked the rcu lockup timeout to 300 seconds and got
> that stall.

So with the default of 21 seconds, you presumably get huge numbers of
RCU CPU stall warnings?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-08 15:57                                                                 ` Paul E. McKenney
@ 2014-12-08 16:34                                                                   ` Sasha Levin
  0 siblings, 0 replies; 486+ messages in thread
From: Sasha Levin @ 2014-12-08 16:34 UTC (permalink / raw)
  To: paulmck
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Dâniel Fraga,
	Linux Kernel Mailing List

On 12/08/2014 10:57 AM, Paul E. McKenney wrote:
> On Mon, Dec 08, 2014 at 10:28:53AM -0500, Sasha Levin wrote:
>> > On 12/08/2014 09:33 AM, Sasha Levin wrote:
>>> > > On 12/08/2014 12:20 AM, Paul E. McKenney wrote:
>>>>> > >> > I have seen this caused by lost IPIs, but you have to lose two of them,
>>>>> > >> > which seems less than fully likely.
>>> > > It does seem that it can cause full blown stalls as well, just pretty
>>> > > rarely (notice the lack of any prints before):
>> > 
>> > Forgot to mention, I cranked the rcu lockup timeout to 300 seconds and got
>> > that stall.
> So with the default of 21 seconds, you presumably get huge numbers of
> RCU CPU stall warnings?

Yes, I'm seeing 1 lockup every ~5 minutes on my setup.

The traces do seem to be different every time.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-05 19:29                                                       ` Linus Torvalds
@ 2014-12-11 14:54                                                         ` Dave Jones
  2014-12-11 21:49                                                           ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-11 14:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, Dec 05, 2014 at 11:29:05AM -0800, Linus Torvalds wrote:

 > So it must be some very particular corruption. I still vote for "let's
 > see if Dave can narrow it down with bisection".

I vote for "plane ticket to drinking island".

So I've been continuing the bisect, leaving the 'good' cases running for
as long as a day and a half, and it's taking me down a path of staging
commits again (which is crap, as I don't even have that enabled).

git bisect start
git bisect bad 7d1311b93e58ed55f3a31cc8f94c4b8fe988a2b9
git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
git bisect bad ae045e2455429c418a418a3376301a9e5753a0a8
git bisect bad 53ee983378ff23e8f3ff95ecf99dea7c6c221900
git bisect good 2042088cd67d0064d18c52c13c69af2499907bb1
git bisect good 98959948a7ba33cf8c708626e0d2a1456397e1c6
git bisect bad 6f929b4e5a022c3ca806c1675ccb833c42086853


So either one of those 'good's actually wasn't, or I'm just cursed.

I'm going to lose access to this machine next week, so unless I see
this on different hardware in the new year, I'm running out of options.

(on the bright side, having people stare at various suspect code
 throughout this thread does seem to have yielded some good from all of
 this).

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-11 14:54                                                         ` Dave Jones
@ 2014-12-11 21:49                                                           ` Linus Torvalds
  2014-12-11 21:52                                                             ` Sasha Levin
                                                                               ` (3 more replies)
  0 siblings, 4 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-11 21:49 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Thu, Dec 11, 2014 at 6:54 AM, Dave Jones <davej@redhat.com> wrote:
>
> So either one of those 'good's actually wasn't, or I'm just cursed.

Even if there was a good that wasn't, that last "bad"  (6f929b4e5a02)
is already sufficient just on its own to say that likely v3.16 already
had the problem.

Just do

   gitk v3.16..6f929b4e5a02

and cry.

(or "git diff --stat -M v3.16...6f929b4e5a02" to see what that commit
brought in from the common ancestor).

So I'd call that bisect a failure, and your "v3.16 is fine" is
actually suspect after all. Which *might* mean that it's some hardware
issue after all. Or there are multiple different problems, and while
v3.16 was fine, the problem was introduced earlier (in the common
ancestor of that staging tree), then fixed for 3.16, and then
re-introduced later again.

Anyway, you might as well stop bisecting. Regardless of where it lands
in the remaining pile, it's not going to give us any useful
information, methinks.

I'm stumped.

Maybe it's worth it to concentrate on just testing current kernels,
and instead try to limit the triggering some other way. In particular,
you had a trinity run that was *only* testing lsetxattr(). Is that
really *all* that was going on? Obviously trinity will be using
timers, fork, and other things? Can you recreate that lsetxattr thing,
and just try to get as many problem reports as possible from one
particular kernel (say, 3.18, since that should be a reasonable modern
base with hopefully not a lot of other random issues)?

Together with perhaps config checks. You've done some those already.
Did it reproduce without preemption, for example?

Does anybody have any smart ideas?

                            Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-11 21:49                                                           ` Linus Torvalds
@ 2014-12-11 21:52                                                             ` Sasha Levin
  2014-12-11 21:57                                                               ` Chris Mason
  2014-12-11 22:36                                                               ` Linus Torvalds
  2014-12-11 21:57                                                             ` Borislav Petkov
                                                                               ` (2 subsequent siblings)
  3 siblings, 2 replies; 486+ messages in thread
From: Sasha Levin @ 2014-12-11 21:52 UTC (permalink / raw)
  To: Linus Torvalds, Dave Jones, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On 12/11/2014 04:49 PM, Linus Torvalds wrote:
> On Thu, Dec 11, 2014 at 6:54 AM, Dave Jones <davej@redhat.com> wrote:
>> >
>> > So either one of those 'good's actually wasn't, or I'm just cursed.
> Even if there was a good that wasn't, that last "bad"  (6f929b4e5a02)
> is already sufficient just on its own to say that likely v3.16 already
> had the problem.
> 
> Just do
> 
>    gitk v3.16..6f929b4e5a02
> 
> and cry.
> 
> (or "git diff --stat -M v3.16...6f929b4e5a02" to see what that commit
> brought in from the common ancestor).
> 
> So I'd call that bisect a failure, and your "v3.16 is fine" is
> actually suspect after all. Which *might* mean that it's some hardware
> issue after all. Or there are multiple different problems, and while
> v3.16 was fine, the problem was introduced earlier (in the common
> ancestor of that staging tree), then fixed for 3.16, and then
> re-introduced later again.

Is it possible that Dave and myself were seeing the same problem after
all?

I'll go bisect it even further back...


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-11 21:52                                                             ` Sasha Levin
@ 2014-12-11 21:57                                                               ` Chris Mason
  2014-12-11 22:00                                                                 ` Sasha Levin
  2014-12-11 22:36                                                               ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Chris Mason @ 2014-12-11 21:57 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Dave Jones, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On Thu, Dec 11, 2014 at 4:52 PM, Sasha Levin <sasha.levin@oracle.com> 
wrote:
> On 12/11/2014 04:49 PM, Linus Torvalds wrote:
>>  On Thu, Dec 11, 2014 at 6:54 AM, Dave Jones <davej@redhat.com> 
>> wrote:
>>>  >
>>>  > So either one of those 'good's actually wasn't, or I'm just 
>>> cursed.
>>  Even if there was a good that wasn't, that last "bad"  
>> (6f929b4e5a02)
>>  is already sufficient just on its own to say that likely v3.16 
>> already
>>  had the problem.
>> 
>>  Just do
>> 
>>     gitk v3.16..6f929b4e5a02
>> 
>>  and cry.
>> 
>>  (or "git diff --stat -M v3.16...6f929b4e5a02" to see what that 
>> commit
>>  brought in from the common ancestor).
>> 
>>  So I'd call that bisect a failure, and your "v3.16 is fine" is
>>  actually suspect after all. Which *might* mean that it's some 
>> hardware
>>  issue after all. Or there are multiple different problems, and while
>>  v3.16 was fine, the problem was introduced earlier (in the common
>>  ancestor of that staging tree), then fixed for 3.16, and then
>>  re-introduced later again.
> 
> Is it possible that Dave and myself were seeing the same problem after
> all?
> 
> I'll go bisect it even further back...

For both of you, I'm curious how long 3.18 lasts if you turn off the 
serial console (and netconsole) completely.
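
(To be clear, I mean really off: no console= entries on the kernel
command line and no netconsole configured at all - something like
dropping

   console=ttyS0,115200 console=tty0
   netconsole=...

from the boot setup entirely, so nothing is pushing printk output
out a slow console path while the box is wedging.)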

-chris




^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-11 21:49                                                           ` Linus Torvalds
  2014-12-11 21:52                                                             ` Sasha Levin
@ 2014-12-11 21:57                                                             ` Borislav Petkov
  2014-12-12  3:03                                                             ` Dave Jones
  2014-12-12 18:54                                                             ` Dave Jones
  3 siblings, 0 replies; 486+ messages in thread
From: Borislav Petkov @ 2014-12-11 21:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Thu, Dec 11, 2014 at 01:49:17PM -0800, Linus Torvalds wrote:
> Does anybody have any smart ideas?

Don't know if it's smart, but we can check whether it is a hardware
failure by trying to reproduce the exact same failure on a second,
identical machine. If Dave gets one, of course.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-11 21:57                                                               ` Chris Mason
@ 2014-12-11 22:00                                                                 ` Sasha Levin
  0 siblings, 0 replies; 486+ messages in thread
From: Sasha Levin @ 2014-12-11 22:00 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Dave Jones, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On 12/11/2014 04:57 PM, Chris Mason wrote:
> On Thu, Dec 11, 2014 at 4:52 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>> On 12/11/2014 04:49 PM, Linus Torvalds wrote:
>>>  On Thu, Dec 11, 2014 at 6:54 AM, Dave Jones <davej@redhat.com> wrote:
>>>>  >
>>>>  > So either one of those 'good's actually wasn't, or I'm just cursed.
>>>  Even if there was a good that wasn't, that last "bad"  (6f929b4e5a02)
>>>  is already sufficient just on its own to say that likely v3.16 already
>>>  had the problem.
>>>
>>>  Just do
>>>
>>>     gitk v3.16..6f929b4e5a02
>>>
>>>  and cry.
>>>
>>>  (or "git diff --stat -M v3.16...6f929b4e5a02" to see what that commit
>>>  brought in from the common ancestor).
>>>
>>>  So I'd call that bisect a failure, and your "v3.16 is fine" is
>>>  actually suspect after all. Which *might* mean that it's some hardware
>>>  issue after all. Or there are multiple different problems, and while
>>>  v3.16 was fine, the problem was introduced earlier (in the common
>>>  ancestor of that staging tree), then fixed for 3.16, and then
>>>  re-introduced later again.
>>
>> Is it possible that Dave and myself were seeing the same problem after
>> all?
>>
>> I'll go bisect it even further back...
> 
> For both of you, I'm curious how long 3.18 lasts if you turn off the serial console (and netconsole) completely.

I didn't try turning it off, but I tried switching the debug level to critical,
which meant that nothing was going out. I still saw the same hang...


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-11 21:52                                                             ` Sasha Levin
  2014-12-11 21:57                                                               ` Chris Mason
@ 2014-12-11 22:36                                                               ` Linus Torvalds
  2014-12-11 22:57                                                                 ` Sasha Levin
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-11 22:36 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On Thu, Dec 11, 2014 at 1:52 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>
> Is it possible that Dave and myself were seeing the same problem after
> all?

Could be. You do have commonalities, even if the actual symptoms then
differ. And while it looked different when you could trigger it with
3.16 but DaveJ couldn't, that's up in the air now that I doubt that
3.16 really is ok for DaveJ after all..

And you might have better luck bisecting it, since you seem to be
able to trigger your RCU lockup much more quickly (and apparently
reliably? Correct?)

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-11 22:36                                                               ` Linus Torvalds
@ 2014-12-11 22:57                                                                 ` Sasha Levin
  2014-12-12  6:54                                                                   ` Ingo Molnar
  2014-12-12 23:54                                                                   ` Sasha Levin
  0 siblings, 2 replies; 486+ messages in thread
From: Sasha Levin @ 2014-12-11 22:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On 12/11/2014 05:36 PM, Linus Torvalds wrote:
> On Thu, Dec 11, 2014 at 1:52 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>> >
>> > Is it possible that Dave and myself were seeing the same problem after
>> > all?
> Could be. You do have commonalities, even if the actual symptoms then
> differ. And while it looked different when you could trigger it with
> 3.16 but DaveJ couldn't, that's up in the air now that I doubt that
> 3.16 really is ok for DaveJ after all..
> 
> And you might have better luck bisecting it, since you seem to be
> able to trigger your RCU lockup much more quickly (and apparently
> reliably? Correct?)

Right, and it reproduces in 3.10 as well, so it's not really a new thing.

What's odd is that I don't remember seeing this bug so long in the past,
I'll try bisecting trinity rather than the kernel - it's the only other
thing that changed.
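
(Same mechanics as a kernel bisect, just pointed at the trinity tree
and with the kernel held constant - roughly:

   git bisect start
   git bisect bad HEAD
   git bisect good <some-older-trinity-commit-that-behaved>
   # rebuild trinity, rerun on the same kernel, then
   git bisect good    # or "git bisect bad", and repeat

until it lands on the trinity change that started triggering this.)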


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-11 21:49                                                           ` Linus Torvalds
  2014-12-11 21:52                                                             ` Sasha Levin
  2014-12-11 21:57                                                             ` Borislav Petkov
@ 2014-12-12  3:03                                                             ` Dave Jones
  2014-12-12  4:45                                                               ` Dave Jones
  2014-12-12 18:54                                                             ` Dave Jones
  3 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-12  3:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Thu, Dec 11, 2014 at 01:49:17PM -0800, Linus Torvalds wrote:
 
 > Anyway, you might as well stop bisecting. Regardless of where it lands
 > in the remaining pile, it's not going to give us any useful
 > information, methinks.
 > 
 > I'm stumped.

yeah, likewise.  I don't recall any bug that's given me this much headache.
I don't think it's helped that the symptoms are vague enough that a
number of people have thought they've seen the same thing, which have
turned out to be unrelated incidents.  At least some of those have
gotten closure though it seems.

 > Maybe it's worth it to concentrate on just testing current kernels,
 > and instead try to limit the triggering some other way. In particular,
 > you had a trinity run that was *only* testing lsetxattr(). Is that
 > really *all* that was going on? Obviously trinity will be using
 > timers, fork, and other things? Can you recreate that lsetxattr thing,
 > and just try to get as many problem reports as possible from one
 > particular kernel (say, 3.18, since that should be a reasonable modern
 > base with hopefully not a lot of other random issues)?

I'll let it run overnight, but so far after 4hrs, on .18 it's not done
anything.

 > Together with perhaps config checks. You've done some of those already.
 > Did it reproduce without preemption, for example?

Next kernel build I try, I'll turn that off.  I don't remember if
we've already tried that.  I *think* we just tried the non-preempt rcu
stuff, but not "no preemption at all".  I wish I'd kept better notes
about everything tried so far too, but I hadn't anticipated this
dragging out so long. Live and learn..
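
For my own notes, since I clearly need them: if I'm remembering the
config names right, the choices under "Preemption Model" are

   CONFIG_PREEMPT_NONE=y        # no forced preemption at all
   CONFIG_PREEMPT_VOLUNTARY=y   # what most distro kernels ship
   CONFIG_PREEMPT=y             # fully preemptible, what I believe I've been running

so "no preemption at all" means a rebuild with the first one.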

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12  3:03                                                             ` Dave Jones
@ 2014-12-12  4:45                                                               ` Dave Jones
  2014-12-12 14:38                                                                 ` Dave Jones
  2014-12-12 18:10                                                                 ` Paul E. McKenney
  0 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-12  4:45 UTC (permalink / raw)
  To: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Thu, Dec 11, 2014 at 10:03:43PM -0500, Dave Jones wrote:
 > On Thu, Dec 11, 2014 at 01:49:17PM -0800, Linus Torvalds wrote:
 >  
 >  > Anyway, you might as well stop bisecting. Regardless of where it lands
 >  > in the remaining pile, it's not going to give us any useful
 >  > information, methinks.
 >  > 
 >  > I'm stumped.
 > 
 > yeah, likewise.  I don't recall any bug that's given me this much headache.
 > I don't think it's helped that the symptoms are vague enough that a
 > number of people have thought they've seen the same thing, which have
 > turned out to be unrelated incidents.  At least some of those have
 > gotten closure though it seems.
 > 
 >  > Maybe it's worth it to concentrate on just testing current kernels,
 >  > and instead try to limit the triggering some other way. In particular,
 >  > you had a trinity run that was *only* testing lsetxattr(). Is that
 >  > really *all* that was going on? Obviously trinity will be using
 >  > timers, fork, and other things? Can you recreate that lsetxattr thing,
 >  > and just try to get as many problem reports as possible from one
 >  > particular kernel (say, 3.18, since that should be a reasonable modern
 >  > base with hopefully not a lot of other random issues)?
 > 
 > I'll let it run overnight, but so far after 4hrs, on .18 it's not done
 > anything.

Two hours later, it had spewed this, but survived. (Trinity had quit after that
point because /proc/sys/kernel/tainted changed).


[18755.303442] WARNING: CPU: 1 PID: 25572 at kernel/watchdog.c:317 watchdog_overflow_callback+0xdd/0x130()
[18755.303472] Watchdog detected hard LOCKUP on cpu 1
[18755.303487] CPU: 1 PID: 25572 Comm: trinity-c25 Not tainted 3.18.0+ #101 
[18755.303527]  ffffffff81a66315 00000000c1ad8e75 ffff880244205b88 ffffffff817d317e
[18755.303556]  0000000000110001 ffff880244205be0 ffff880244205bc8 ffffffff81078a01
[18755.303586]  0000000000000000 0000000000000001 0000000000000000 ffff880244205d30
[18755.303616] Call Trace:
[18755.303627]  <NMI>  [<ffffffff817d317e>] dump_stack+0x4f/0x7c
[18755.303654]  [<ffffffff81078a01>] warn_slowpath_common+0x81/0xa0
[18755.303675]  [<ffffffff81078a75>] warn_slowpath_fmt+0x55/0x70
[18755.303696]  [<ffffffff8112fea0>] ? restart_watchdog_hrtimer+0x60/0x60
[18755.303718]  [<ffffffff8112ff7d>] watchdog_overflow_callback+0xdd/0x130
[18755.303742]  [<ffffffff81173a7c>] __perf_event_overflow+0xac/0x2a0
[18755.303765]  [<ffffffff81019952>] ? x86_perf_event_set_period+0xe2/0x150
[18755.303787]  [<ffffffff81174644>] perf_event_overflow+0x14/0x20
[18755.303809]  [<ffffffff8101f479>] intel_pmu_handle_irq+0x209/0x410
[18755.303831]  [<ffffffff8101875b>] perf_event_nmi_handler+0x2b/0x50
[18755.303853]  [<ffffffff81007634>] nmi_handle+0xa4/0x1e0
[18755.303872]  [<ffffffff81007595>] ? nmi_handle+0x5/0x1e0
[18755.303892]  [<ffffffff810079aa>] default_do_nmi+0x7a/0x1d0
[18755.303911]  [<ffffffff81007bb8>] do_nmi+0xb8/0xf0
[18755.303929]  [<ffffffff817e0c2a>] end_repeat_nmi+0x1e/0x2e
[18755.303948]  <<EOE>>  <UNK> 
[18755.303959] ---[ end trace 6362f5b39b85eb2c ]---
[18755.303983] perf interrupt took too long (7018 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
[18758.481443] ------------[ cut here ]------------
[18758.481475] WARNING: CPU: 3 PID: 25965 at kernel/watchdog.c:317 watchdog_overflow_callback+0xdd/0x130()
[18758.481512] Watchdog detected hard LOCKUP on cpu 3
[18758.481531] CPU: 3 PID: 25965 Comm: trinity-c418 Tainted: G        W      3.18.0+ #101 
[18758.482554]  ffffffff81a66315 00000000c93c9d92 ffff880244605b88 ffffffff817d317e
[18758.483585]  0000000000110004 ffff880244605be0 ffff880244605bc8 ffffffff81078a01
[18758.484614]  0000000000000000 0000000000000003 0000000000000000 ffff880244605d30
[18758.485651] Call Trace:
[18758.486657]  <NMI>  [<ffffffff817d317e>] dump_stack+0x4f/0x7c
[18758.487670]  [<ffffffff81078a01>] warn_slowpath_common+0x81/0xa0
[18758.488676]  [<ffffffff81078a75>] warn_slowpath_fmt+0x55/0x70
[18758.489675]  [<ffffffff8112fea0>] ? restart_watchdog_hrtimer+0x60/0x60
[18758.490681]  [<ffffffff8112ff7d>] watchdog_overflow_callback+0xdd/0x130
[18758.491687]  [<ffffffff81173a7c>] __perf_event_overflow+0xac/0x2a0
[18758.492677]  [<ffffffff81019952>] ? x86_perf_event_set_period+0xe2/0x150
[18758.493668]  [<ffffffff81174644>] perf_event_overflow+0x14/0x20
[18758.494662]  [<ffffffff8101f479>] intel_pmu_handle_irq+0x209/0x410
[18758.495653]  [<ffffffff8101875b>] perf_event_nmi_handler+0x2b/0x50
[18758.496652]  [<ffffffff81007634>] nmi_handle+0xa4/0x1e0
[18758.497646]  [<ffffffff81007595>] ? nmi_handle+0x5/0x1e0
[18758.498644]  [<ffffffff811080cf>] ? is_module_text_address+0x3f/0x50
[18758.499644]  [<ffffffff810079aa>] default_do_nmi+0x7a/0x1d0
[18758.500643]  [<ffffffff81007bb8>] do_nmi+0xb8/0xf0
[18758.501633]  [<ffffffff817e0c2a>] end_repeat_nmi+0x1e/0x2e
[18758.502619]  [<ffffffff811080cf>] ? is_module_text_address+0x3f/0x50
[18758.503606]  [<ffffffff811080cf>] ? is_module_text_address+0x3f/0x50
[18758.504583]  [<ffffffff811080cf>] ? is_module_text_address+0x3f/0x50
[18758.505548]  <<EOE>>  [<ffffffff810986b8>] __kernel_text_address+0x58/0x80
[18758.506526]  [<ffffffff81182a24>] ? free_one_page+0x1c4/0x520
[18758.507503]  [<ffffffff81006d8f>] print_context_stack+0x8f/0x100
[18758.508483]  [<ffffffff81005710>] dump_trace+0x140/0x370
[18758.509464]  [<ffffffff811b0961>] ? remove_vma+0x71/0x80
[18758.510445]  [<ffffffff81013ecf>] save_stack_trace+0x2f/0x50
[18758.511428]  [<ffffffff811d4f20>] set_track+0x70/0x140
[18758.512412]  [<ffffffff817d0f85>] free_debug_processing+0x157/0x22a
[18758.513387]  [<ffffffff810c50fe>] ? put_lock_stats.isra.23+0xe/0x30
[18758.514344]  [<ffffffff817d10ad>] __slab_free+0x55/0x320
[18758.515298]  [<ffffffff8138e016>] ? debug_check_no_obj_freed+0x156/0x250
[18758.516252]  [<ffffffff811d7fd2>] kmem_cache_free+0x262/0x280
[18758.517201]  [<ffffffff811b0961>] ? remove_vma+0x71/0x80
[18758.518133]  [<ffffffff811b0961>] remove_vma+0x71/0x80
[18758.519042]  [<ffffffff811b3f1c>] exit_mmap+0x13c/0x1a0
[18758.519938]  [<ffffffff81075a2b>] mmput+0x6b/0x100
[18758.520826]  [<ffffffff8107a02e>] do_exit+0x29e/0xba0
[18758.521707]  [<ffffffff81088d41>] ? get_signal+0x2c1/0x710
[18758.522585]  [<ffffffff8138cb27>] ? debug_smp_processor_id+0x17/0x20
[18758.523463]  [<ffffffff810c50fe>] ? put_lock_stats.isra.23+0xe/0x30
[18758.524336]  [<ffffffff810c5466>] ? lock_release_holdtime.part.24+0xe6/0x160
[18758.525220]  [<ffffffff8107b9dc>] do_group_exit+0x4c/0xc0
[18758.526099]  [<ffffffff81088d8c>] get_signal+0x30c/0x710
[18758.526977]  [<ffffffff8138cb27>] ? debug_smp_processor_id+0x17/0x20
[18758.527858]  [<ffffffff81002477>] do_signal+0x37/0x770
[18758.528735]  [<ffffffff8138d131>] ? free_object+0x81/0xb0
[18758.529608]  [<ffffffff8138db87>] ? debug_object_free+0xf7/0x150
[18758.530480]  [<ffffffff810ec285>] ? hrtimer_nanosleep+0x155/0x1c0
[18758.531357]  [<ffffffff810eac60>] ? hrtimer_get_res+0x50/0x50
[18758.532225]  [<ffffffff81002c15>] do_notify_resume+0x65/0x80
[18758.533102]  [<ffffffff8137f9ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[18758.533980]  [<ffffffff817deeff>] int_signal+0x12/0x17
[18758.534858] ---[ end trace 6362f5b39b85eb2d ]---
[18758.535840] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 54.297 msecs
[18758.537332] perf interrupt took too long (426767 > 10000), lowering kernel.perf_event_max_sample_rate to 12500

Few seconds later rcu craps itself..

[18801.941908] INFO: rcu_preempt detected stalls on CPUs/tasks:
[18801.942920] 	3: (3 GPs behind) idle=bf4/0/0 softirq=1597256/1597257 
[18801.943890] 	(detected by 0, t=6002 jiffies, g=763359, c=763358, q=0)
[18801.944843] Task dump for CPU 3:
[18801.945770] swapper/3       R  running task    14576     0      1 0x00200000
[18801.946706]  0000000342b6fe28 def23185c07e1b3d ffffe8ffff403518 0000000000000001
[18801.947629]  ffffffff81cb2000 0000000000000003 ffff880242b6fe78 ffffffff8166cb95
[18801.948557]  0000111242adb59f ffffffff81cb2070 ffff880242b6c000 ffffffff81d21ab0
[18801.949478] Call Trace:
[18801.950384]  [<ffffffff8166cb95>] ? cpuidle_enter_state+0x55/0x1c0
[18801.951303]  [<ffffffff8166cdb7>] ? cpuidle_enter+0x17/0x20
[18801.952211]  [<ffffffff810bf303>] ? cpu_startup_entry+0x423/0x4d0
[18801.953125]  [<ffffffff810314c3>] ? start_secondary+0x1a3/0x220

More of the same a minute later..

[18861.937095] INFO: rcu_preempt detected stalls on CPUs/tasks:
[18861.938050] 	1: (3 GPs behind) idle=89a/0/0 softirq=1498125/1498197 
[18861.938992] 	3: (4 GPs behind) idle=bf6/0/0 softirq=1597256/1597257 
[18861.939897] 	(detected by 0, t=6002 jiffies, g=763360, c=763359, q=0)
[18861.940812] Task dump for CPU 1:
[18861.941719] swapper/1       R  running task    14576     0      1 0x00200000
[18861.942649]  0000000142b5be28 8d64c020bc383a15 ffffe8ffff003518 0000000000000005
[18861.943584]  ffffffff81cb2000 0000000000000001 ffff880242b5be78 ffffffff8166cb95
[18861.944531]  0000112a29d67eee ffffffff81cb21d0 ffff880242b58000 ffffffff81d21ab0
[18861.945482] Call Trace:
[18861.946417]  [<ffffffff8166cb95>] ? cpuidle_enter_state+0x55/0x1c0
[18861.947368]  [<ffffffff8166cdb7>] ? cpuidle_enter+0x17/0x20
[18861.948315]  [<ffffffff810bf303>] ? cpu_startup_entry+0x423/0x4d0
[18861.949262]  [<ffffffff810314c3>] ? start_secondary+0x1a3/0x220
[18861.950214] Task dump for CPU 3:
[18861.951168] swapper/3       R  running task    14576     0      1 0x00200000
[18861.952143]  0000000342b6fe28 def23185c07e1b3d ffffe8ffff403518 0000000000000001
[18861.953117]  ffffffff81cb2000 0000000000000003 ffff880242b6fe78 ffffffff8166cb95
[18861.954099]  0000111c49e2522b ffffffff81cb2070 ffff880242b6c000 ffffffff81d21ab0
[18861.955082] Call Trace:
[18861.956054]  [<ffffffff8166cb95>] ? cpuidle_enter_state+0x55/0x1c0
[18861.957045]  [<ffffffff8166cdb7>] ? cpuidle_enter+0x17/0x20
[18861.958034]  [<ffffffff810bf303>] ? cpu_startup_entry+0x423/0x4d0
[18861.959019]  [<ffffffff810314c3>] ? start_secondary+0x1a3/0x220

CPU2 also gets 'stuck'.

[18889.800920] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 133s! [kworker/2:2:17558]
[18889.801789] CPU: 2 PID: 17558 Comm: kworker/2:2 Tainted: G        W      3.18.0+ #101 
[18889.802646] Workqueue: events free_obj_work
[18889.803490] task: ffff880231380000 ti: ffff8801b8c34000 task.ti: ffff8801b8c34000
[18889.804347] RIP: 0010:[<ffffffff81379d90>]  [<ffffffff81379d90>] memchr_inv+0x30/0x150
[18889.805219] RSP: 0018:ffff8801b8c37b08  EFLAGS: 00000287
[18889.806090] RAX: ffff8801cc9208a0 RBX: ffffffff81801b30 RCX: 000000000000005a
[18889.806973] RDX: 0000000000000008 RSI: 000000000000005a RDI: ffff8801cc92089c
[18889.807862] RBP: ffff8801b8c37b08 R08: ffff8801cc920898 R09: 000000000000005a
[18889.808756] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b8c37af8
[18889.809650] R13: 0000000000000000 R14: ffffffffffffc000 R15: ffff8801b8c37b68
[18889.810545] FS:  0000000000000000(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
[18889.811489] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18889.812384] CR2: 00007f1da06eaf30 CR3: 0000000001c11000 CR4: 00000000001407e0
[18889.813290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[18889.814194] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[18889.815096] Stack:
[18889.815990]  ffff8801b8c37b58 ffffffff811d5aeb ffffea0007324800 ffff8801cc920898
[18889.816912]  00000000000000ce ffff880243c0fa80 ffff8801cc920730 ffffea0007324800
[18889.817839]  00000000000000bb ffff880243c0fa80 ffff8801b8c37b98 ffffffff811d6e78
[18889.818764] Call Trace:
[18889.819681]  [<ffffffff811d5aeb>] check_bytes_and_report+0x3b/0x110
[18889.820608]  [<ffffffff811d6e78>] check_object+0xa8/0x250
[18889.821525]  [<ffffffff811d7328>] __free_slab+0x158/0x1b0
[18889.822444]  [<ffffffff811d73b9>] discard_slab+0x39/0x50
[18889.823355]  [<ffffffff817d133a>] __slab_free+0x2e2/0x320
[18889.824250]  [<ffffffff8138ceed>] ? free_obj_work+0x5d/0xa0
[18889.825130]  [<ffffffff8138cb27>] ? debug_smp_processor_id+0x17/0x20
[18889.826007]  [<ffffffff811d7fd2>] kmem_cache_free+0x262/0x280
[18889.826882]  [<ffffffff8138cefc>] ? free_obj_work+0x6c/0xa0
[18889.827749]  [<ffffffff8138cefc>] free_obj_work+0x6c/0xa0
[18889.828612]  [<ffffffff810942ad>] process_one_work+0x1fd/0x590
[18889.829468]  [<ffffffff81094227>] ? process_one_work+0x177/0x590
[18889.830321]  [<ffffffff810c50fe>] ? put_lock_stats.isra.23+0xe/0x30
[18889.831215]  [<ffffffff8109475b>] worker_thread+0x11b/0x490
[18889.832054]  [<ffffffff81094640>] ? process_one_work+0x590/0x590
[18889.832896]  [<ffffffff81099f79>] kthread+0xf9/0x110
[18889.833738]  [<ffffffff810c50fe>] ? put_lock_stats.isra.23+0xe/0x30
[18889.834580]  [<ffffffff81099e80>] ? kthread_create_on_node+0x250/0x250
[18889.835425]  [<ffffffff817deb6c>] ret_from_fork+0x7c/0xb0
[18889.836268]  [<ffffffff81099e80>] ? kthread_create_on_node+0x250/0x250
[18889.837107] Code: 48 89 e5 77 3e 40 0f b6 f6 85 d2 89 d0 89 f1 74 2b 40 3a 37 0f 85 f1 00 00 00 83 e8 01 48 8d 44 07 01 eb 0f 0f 1f 80 00 00 00 00 <3a> 0f 0f 85 d8 00 00 00 48 83 c7 01 48 39 c7 75 ef 31 c0 5d c3 
[18889.838955] sending NMI to other CPUs:
[18889.839845] NMI backtrace for cpu 0
[18889.840760] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W      3.18.0+ #101 
[18889.841697] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[18889.842634] RIP: 0010:[<ffffffff813e1b15>]  [<ffffffff813e1b15>] intel_idle+0xd5/0x180
[18889.843583] RSP: 0018:ffffffff81c03e28  EFLAGS: 00000046
[18889.844523] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[18889.845467] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[18889.846410] RBP: ffffffff81c03e58 R08: 000000008baf90f8 R09: 0000000000000000
[18889.847355] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[18889.848300] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[18889.849240] FS:  0000000000000000(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
[18889.850183] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18889.851118] CR2: 00007f7c0148c443 CR3: 0000000001c11000 CR4: 00000000001407f0
[18889.852058] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[18889.852998] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[18889.853923] Stack:
[18889.854837]  0000000081c03e58 1972379d2eec42e9 ffffe8fffee03518 0000000000000005
[18889.855780]  ffffffff81cb2000 0000000000000000 ffffffff81c03ea8 ffffffff8166cb95
[18889.856727]  00001130b01225d0 ffffffff81cb21d0 ffffffff81c00000 ffffffff81d21ab0
[18889.857674] Call Trace:
[18889.858606]  [<ffffffff8166cb95>] cpuidle_enter_state+0x55/0x1c0
[18889.859532]  [<ffffffff8166cdb7>] cpuidle_enter+0x17/0x20
[18889.860435]  [<ffffffff810bf303>] cpu_startup_entry+0x423/0x4d0
[18889.861332]  [<ffffffff817ca403>] rest_init+0xc3/0xd0
[18889.862219]  [<ffffffff817ca345>] ? rest_init+0x5/0xd0
[18889.863099]  [<ffffffff81f21ee0>] ? ftrace_init+0xa8/0x13b
[18889.863975]  [<ffffffff81f0304c>] start_kernel+0x49d/0x4be
[18889.864840]  [<ffffffff81f0299f>] ? set_init_arg+0x55/0x55
[18889.865703]  [<ffffffff81f02581>] x86_64_start_reservations+0x2a/0x2c
[18889.866564]  [<ffffffff81f02675>] x86_64_start_kernel+0xf2/0xf6
[18889.867421] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[18889.869301] NMI backtrace for cpu 1
[18889.869312] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 29.456 msecs
[18889.871119] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W      3.18.0+ #101 
[18889.872056] task: ffff8802428bade0 ti: ffff880242b58000 task.ti: ffff880242b58000
[18889.872994] RIP: 0010:[<ffffffff813e1b15>]  [<ffffffff813e1b15>] intel_idle+0xd5/0x180
[18889.873934] RSP: 0018:ffff880242b5bdf8  EFLAGS: 00000046
[18889.874867] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[18889.875814] RDX: 0000000000000000 RSI: ffff880242b5bfd8 RDI: 0000000000000001
[18889.876751] RBP: ffff880242b5be28 R08: 000000008baf90f8 R09: 0000000000000000
[18889.877685] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[18889.878619] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880242b58000
[18889.879556] FS:  0000000000000000(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
[18889.880479] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18889.881425] CR2: 00007f0f10335000 CR3: 0000000001c11000 CR4: 00000000001407e0
[18889.882361] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[18889.883292] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[18889.884219] Stack:
[18889.885136]  0000000142b5be28 8d64c020bc383a15 ffffe8ffff003518 0000000000000005
[18889.886087]  ffffffff81cb2000 0000000000000001 ffff880242b5be78 ffffffff8166cb95
[18889.887044]  00001130b011a3f3 ffffffff81cb21d0 ffff880242b58000 ffffffff81d21ab0
[18889.888005] Call Trace:
[18889.888970]  [<ffffffff8166cb95>] cpuidle_enter_state+0x55/0x1c0
[18889.889949]  [<ffffffff8166cdb7>] cpuidle_enter+0x17/0x20
[18889.890918]  [<ffffffff810bf303>] cpu_startup_entry+0x423/0x4d0
[18889.891884]  [<ffffffff810314c3>] start_secondary+0x1a3/0x220
[18889.892843] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[18889.894938] NMI backtrace for cpu 3
[18889.894946] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 55.092 msecs
[18889.896934] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W      3.18.0+ #101 
[18889.897911] task: ffff880242b65bc0 ti: ffff880242b6c000 task.ti: ffff880242b6c000
[18889.898890] RIP: 0010:[<ffffffff813e1b15>]  [<ffffffff813e1b15>] intel_idle+0xd5/0x180
[18889.899884] RSP: 0018:ffff880242b6fdf8  EFLAGS: 00000046
[18889.900889] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[18889.901859] RDX: 0000000000000000 RSI: ffff880242b6ffd8 RDI: 0000000000000003
[18889.902800] RBP: ffff880242b6fe28 R08: 000000008baf90f8 R09: 0000000000000000
[18889.903734] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[18889.904666] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880242b6c000
[18889.905599] FS:  0000000000000000(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
[18889.906546] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18889.907502] CR2: 00007f0f10335000 CR3: 0000000001c11000 CR4: 00000000001407e0
[18889.908452] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[18889.909396] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[18889.910329] Stack:
[18889.911297]  0000000342b6fe28 def23185c07e1b3d ffffe8ffff403518 0000000000000005
[18889.912265]  ffffffff81cb2000 0000000000000003 ffff880242b6fe78 ffffffff8166cb95
[18889.913254]  00001130b01223e8 ffffffff81cb21d0 ffff880242b6c000 ffffffff81d21ab0
[18889.914291] Call Trace:
[18889.915361]  [<ffffffff8166cb95>] cpuidle_enter_state+0x55/0x1c0
[18889.916402]  [<ffffffff8166cdb7>] cpuidle_enter+0x17/0x20
[18889.917448]  [<ffffffff810bf303>] cpu_startup_entry+0x423/0x4d0
[18889.918434]  [<ffffffff810314c3>] start_secondary+0x1a3/0x220
[18889.919385] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[18889.921511] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 81.657 msecs
[18889.922586] perf interrupt took too long (423445 > 19841), lowering kernel.perf_event_max_sample_rate to 6300

Then things are idle. Curiously, periodically I continued to see these..

[19729.163946] perf interrupt took too long (420164 > 39062), lowering kernel.perf_event_max_sample_rate to 3200
[20944.488815] perf interrupt took too long (416920 > 78125), lowering kernel.perf_event_max_sample_rate to 1600
[22102.596107] perf interrupt took too long (413690 > 156250), lowering kernel.perf_event_max_sample_rate to 800

I've seen those messages a fair bit on other machines too, and they drive me nuts
because there's no 'perf' being run. I think it means "NMI watchdog", but is
worded badly.  Still, it's curious that they appeared during what should have
been idle time.
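
As far as I can tell the NMI watchdog is itself implemented as a perf
hardware event, so that message can fire with no userspace perf running
at all - it's the perf NMI handler complaining about its own latency and
throttling the sample rate. The knobs it keeps lowering should be the
usual sysctls, something like:

   sysctl kernel.perf_event_max_sample_rate
   sysctl kernel.nmi_watchdog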

I'll reboot the box and give it another shot, and see what falls out in the morning.

Oh, worth noting: on this run, I gave Chris's idea of disabling usb serial console a try.
I don't know if that's why I didn't see a total lockup this time or not..

Also this was from a run with just lsetxattr, 512 children..

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-11 22:57                                                                 ` Sasha Levin
@ 2014-12-12  6:54                                                                   ` Ingo Molnar
  2014-12-12 23:54                                                                   ` Sasha Levin
  1 sibling, 0 replies; 486+ messages in thread
From: Ingo Molnar @ 2014-12-12  6:54 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Mike Galbraith,
	Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List


* Sasha Levin <sasha.levin@oracle.com> wrote:

> Right, and it reproduces in 3.10 as well, so it's not really a 
> new thing.
> 
> What's odd is that I don't remember seeing this bug so long in 
> the past, I'll try bisecting trinity rather than the kernel - 
> it's the only other thing that changed.

So I think DaveJ mentioned that Trinity recently changed its 
test task count and is now loading the system more aggressively. 
Such a change might have made a dormant, resource-limit-related 
bug or load-dependent race more likely.

I think at this point it would also be useful to debug the hang 
itself directly: using triggered printks and kgdb and drilling 
into all the data structures to figure out why the system isn't 
progressing.
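
Roughly, assuming a serial line (or virtual serial port) into 
the hung box - just a sketch:

   CONFIG_KGDB=y
   CONFIG_KGDB_SERIAL_CONSOLE=y
   # boot with: kgdboc=ttyS0,115200
   # once it wedges, break in with sysrq-g and attach:
   gdb vmlinux
   (gdb) target remote /dev/ttyS0

That would let us poke at the rcu_state and runqueue data 
directly on the wedged kernel, instead of inferring things 
from the stall printouts.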

If the bug triggers in a VM (which your testing uses), the failed 
kernel state ought to be a lot more accessible than on bare metal.

And if it's the same bug as DaveJ's, the fact that it triggers in 
a VM also makes the hardware bug theory a lot less likely.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-06 20:09                                                           ` Linus Torvalds
  2014-12-06 20:41                                                             ` Linus Torvalds
  2014-12-06 21:14                                                             ` Martin van Es
@ 2014-12-12 12:58                                                             ` Martin van Es
  2014-12-15 12:07                                                               ` Martin van Es
  2 siblings, 1 reply; 486+ messages in thread
From: Martin van Es @ 2014-12-12 12:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Sat, Dec 6, 2014 at 9:09 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, Dec 6, 2014 at 8:22 AM, Martin van Es <mrvanes@gmail.com> wrote:
>>
>> Hope this may help in finding the right direction for this bug?
>
> If you can reproduce it with your spare J1900 system and could perhaps
> bisect it there, that would be a huge help.
>

I've finally received the memory I needed to prepare the spare J1900
and have it now configured as mythfrontend to the DVB-C backend that
was freezing on 3.17.3. It's been playing liveTV for hours now and is
still going strong. I'd say the freezes can't be reproduced this way.
The only difference being the disk I/O that is missing on the
front-end.

I will give 3.18 a try on production J1900. Knowing I can go back to
safety in 3.16.7 won't hurt too much of my reputation I hope.

Best regards,
Martin
-- 
If 'but' was any useful, it would be a logic operator

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12  4:45                                                               ` Dave Jones
@ 2014-12-12 14:38                                                                 ` Dave Jones
  2014-12-12 18:24                                                                   ` Paul E. McKenney
  2014-12-12 18:10                                                                 ` Paul E. McKenney
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-12 14:38 UTC (permalink / raw)
  To: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Thu, Dec 11, 2014 at 11:45:09PM -0500, Dave Jones wrote:

 > I've seen those messages a fair bit on other machines too, and they drive me nuts
 > because there's no 'perf' being run. I think it means "NMI watchdog", but is
 > worded badly.  Still, it's curious that they appeared during what should have
 > been idle time.
 > 
 > I'll reboot the box and give it another shot, and see what falls out in the morning.

Same deal again. It happened pretty quick after I'd gone to bed (typical).

[ 2754.509747] Clocksource tsc unstable (delta = -243594587656 ns)
[ 2754.519197] Switched to clocksource hpet
[ 2754.782940] INFO: rcu_preempt self-detected stall on CPU
[ 2754.782972] 	0: (1 GPs behind) idle=6ef/140000000000001/0 softirq=247160/247161 
[ 2754.782999] 	 (t=24343 jiffies g=104086 c=104085 q=0)
[ 2754.783022] Task dump for CPU 0:
[ 2754.783037] trinity-c387    R  running task    14016 13658  12780 0x00000008
[ 2754.783070]  ffff880222365bc0 000000005ed04994 ffff880244003d68 ffffffff810a8d46
[ 2754.783104]  ffffffff810a8cb2 0000000000000000 0000000000000001 0000000000000000
[ 2754.783138]  ffffffff81c51e40 0000000000000092 ffff880244003d88 ffffffff810acf4d
[ 2754.783171] Call Trace:
[ 2754.783184]  <IRQ>  [<ffffffff810a8d46>] sched_show_task+0x116/0x180
[ 2754.783215]  [<ffffffff810a8cb2>] ? sched_show_task+0x82/0x180
[ 2754.783239]  [<ffffffff810acf4d>] dump_cpu_task+0x3d/0x50
[ 2754.783261]  [<ffffffff810dc0c0>] rcu_dump_cpu_stacks+0x90/0xd0
[ 2754.783286]  [<ffffffff810e3db3>] rcu_check_callbacks+0x573/0x850
[ 2754.783311]  [<ffffffff8138cb43>] ? __this_cpu_preempt_check+0x13/0x20
[ 2754.783337]  [<ffffffff810ec043>] ? hrtimer_run_queues+0x43/0x130
[ 2754.783361]  [<ffffffff810ea5eb>] update_process_times+0x4b/0x80
[ 2754.783386]  [<ffffffff810fb2cc>] tick_sched_timer+0x4c/0x1b0
[ 2754.783409]  [<ffffffff810eb5db>] ? __run_hrtimer+0xbb/0x2e0
[ 2754.783432]  [<ffffffff810eb5db>] __run_hrtimer+0xbb/0x2e0
[ 2754.783454]  [<ffffffff810eb984>] ? hrtimer_interrupt+0x94/0x260
[ 2754.783478]  [<ffffffff810fb280>] ? tick_init_highres+0x20/0x20
[ 2754.783501]  [<ffffffff810eb9f7>] hrtimer_interrupt+0x107/0x260
[ 2754.783526]  [<ffffffff81033258>] local_apic_timer_interrupt+0x38/0x70
[ 2754.783552]  [<ffffffff817e16f5>] smp_apic_timer_interrupt+0x45/0x60
[ 2754.783578]  [<ffffffff817dfadf>] apic_timer_interrupt+0x6f/0x80
[ 2754.783600]  <EOI>  [<ffffffff810c541d>] ? lock_release_holdtime.part.24+0x9d/0x160
[ 2754.783634]  [<ffffffff810c50fe>] ? put_lock_stats.isra.23+0xe/0x30
[ 2754.783659]  [<ffffffff8118524c>] ? __alloc_pages_nodemask+0x1ac/0xb60
[ 2754.783684]  [<ffffffff811cf4be>] ? alloc_pages_vma+0xee/0x1b0
[ 2754.783708]  [<ffffffff810ad575>] ? local_clock+0x25/0x30
[ 2754.783731]  [<ffffffff810c6e2c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[ 2754.783756]  [<ffffffff810a73e1>] ? get_parent_ip+0x11/0x50
[ 2754.783779]  [<ffffffff810c541d>] ? lock_release_holdtime.part.24+0x9d/0x160
[ 2754.784613]  [<ffffffff811cf4be>] alloc_pages_vma+0xee/0x1b0
[ 2754.785452]  [<ffffffff811aa21a>] ? do_wp_page+0xca/0x7d0
[ 2754.786305]  [<ffffffff811aa21a>] do_wp_page+0xca/0x7d0
[ 2754.787140]  [<ffffffff811acb6b>] handle_mm_fault+0x6cb/0xe90
[ 2754.787948]  [<ffffffff81042b20>] ? __do_page_fault+0x140/0x600
[ 2754.788748]  [<ffffffff81042b84>] __do_page_fault+0x1a4/0x600
[ 2754.789562]  [<ffffffff810c50fe>] ? put_lock_stats.isra.23+0xe/0x30
[ 2754.790340]  [<ffffffff810c541d>] ? lock_release_holdtime.part.24+0x9d/0x160
[ 2754.791113]  [<ffffffff810a73e1>] ? get_parent_ip+0x11/0x50
[ 2754.791879]  [<ffffffff810a755b>] ? preempt_count_sub+0x7b/0x100
[ 2754.792646]  [<ffffffff8137fa0d>] ? trace_hardirqs_off_thunk+0x3a/0x3f
[ 2754.793413]  [<ffffffff81042fec>] do_page_fault+0xc/0x10
[ 2754.794176]  [<ffffffff817e0862>] page_fault+0x22/0x30
[ 2754.794938] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 2754.795736] 	Tasks blocked on level-0 rcu_node (CPUs 0-7):
[ 2754.796528] 	0: (1 GPs behind) idle=6ef/140000000000001/0 softirq=247160/247161 
[ 2754.797334] 	Tasks blocked on level-0 rcu_node (CPUs 0-7):
[ 2754.798153] 	(detected by 3, t=24343 jiffies, g=104086, c=104085, q=0)
[ 2754.798981] Task dump for CPU 0:
[ 2754.799797] trinity-c387    R  running task    14016 13658  12780 0x00000008
[ 2754.800630]  ffff880222365bc0 0000000000000246 0000000127e77e08 8000000000000865
[ 2754.801461]  ffff8802000000a9 800000008d044865 0000000000000000 ffff8802256d3c70
[ 2754.802288]  ffff880227e77e28 00000000000000a9 0000000000d8eff8 ffff880227e77f58
[ 2754.803109] Call Trace:
[ 2754.803929]  [<ffffffff81042b84>] ? __do_page_fault+0x1a4/0x600
[ 2754.804765]  [<ffffffff810c50fe>] ? put_lock_stats.isra.23+0xe/0x30
[ 2754.805596]  [<ffffffff810c541d>] ? lock_release_holdtime.part.24+0x9d/0x160
[ 2754.806444]  [<ffffffff810a73e1>] ? get_parent_ip+0x11/0x50
[ 2754.807267]  [<ffffffff810a755b>] ? preempt_count_sub+0x7b/0x100
[ 2754.808088]  [<ffffffff8137fa0d>] ? trace_hardirqs_off_thunk+0x3a/0x3f
[ 2754.808920]  [<ffffffff81042fec>] ? do_page_fault+0xc/0x10
[ 2754.809742]  [<ffffffff817e0862>] ? page_fault+0x22/0x30
[ 2771.561356] ------------[ cut here ]------------
[ 2771.562079] WARNING: CPU: 0 PID: 13696 at kernel/watchdog.c:317 watchdog_overflow_callback+0xdd/0x130()
[ 2771.562879] Watchdog detected hard LOCKUP on cpu 0
[ 2771.562895] CPU: 0 PID: 13696 Comm: trinity-c425 Not tainted 3.18.0+ #101 
[ 2771.564490]  ffffffff81a66315 00000000fce35109 ffff880244005b88 ffffffff817d317e
[ 2771.565315]  0000000000110004 ffff880244005be0 ffff880244005bc8 ffffffff81078a01
[ 2771.566136]  0000000000000000 0000000000000000 0000000000000000 ffff880244005d30
[ 2771.566954] Call Trace:
[ 2771.567759]  <NMI>  [<ffffffff817d317e>] dump_stack+0x4f/0x7c
[ 2771.568584]  [<ffffffff81078a01>] warn_slowpath_common+0x81/0xa0
[ 2771.569405]  [<ffffffff81078a75>] warn_slowpath_fmt+0x55/0x70
[ 2771.570253]  [<ffffffff8112fea0>] ? restart_watchdog_hrtimer+0x60/0x60
[ 2771.571074]  [<ffffffff8112ff7d>] watchdog_overflow_callback+0xdd/0x130
[ 2771.571894]  [<ffffffff81173a7c>] __perf_event_overflow+0xac/0x2a0
[ 2771.572721]  [<ffffffff81019952>] ? x86_perf_event_set_period+0xe2/0x150
[ 2771.573551]  [<ffffffff81174644>] perf_event_overflow+0x14/0x20
[ 2771.574378]  [<ffffffff8101f479>] intel_pmu_handle_irq+0x209/0x410
[ 2771.575210]  [<ffffffff8101875b>] perf_event_nmi_handler+0x2b/0x50
[ 2771.576040]  [<ffffffff81007634>] nmi_handle+0xa4/0x1e0
[ 2771.576868]  [<ffffffff81007595>] ? nmi_handle+0x5/0x1e0
[ 2771.577698]  [<ffffffff81006de1>] ? print_context_stack+0xe1/0x100
[ 2771.578526]  [<ffffffff810079aa>] default_do_nmi+0x7a/0x1d0
[ 2771.579354]  [<ffffffff81007bb8>] do_nmi+0xb8/0xf0
[ 2771.580206]  [<ffffffff817e0c2a>] end_repeat_nmi+0x1e/0x2e
[ 2771.581023]  [<ffffffff817d0f85>] ? free_debug_processing+0x157/0x22a
[ 2771.581836]  [<ffffffff817d0f85>] ? free_debug_processing+0x157/0x22a
[ 2771.582644]  [<ffffffff81006de1>] ? print_context_stack+0xe1/0x100
[ 2771.583452]  [<ffffffff81006de1>] ? print_context_stack+0xe1/0x100
[ 2771.584253]  [<ffffffff81006de1>] ? print_context_stack+0xe1/0x100
[ 2771.585042]  <<EOE>>  [<ffffffff81005710>] dump_trace+0x140/0x370
[ 2771.585841]  [<ffffffff812005c6>] ? final_putname+0x26/0x50
[ 2771.586636]  [<ffffffff81013ecf>] save_stack_trace+0x2f/0x50
[ 2771.587430]  [<ffffffff811d4f20>] set_track+0x70/0x140
[ 2771.588217]  [<ffffffff817d0f85>] free_debug_processing+0x157/0x22a
[ 2771.589015]  [<ffffffff810c50fe>] ? put_lock_stats.isra.23+0xe/0x30
[ 2771.589815]  [<ffffffff817d10ad>] __slab_free+0x55/0x320
[ 2771.590636]  [<ffffffff8138e016>] ? debug_check_no_obj_freed+0x156/0x250
[ 2771.591442]  [<ffffffff81212294>] ? mntput+0x24/0x40
[ 2771.592242]  [<ffffffff811d7fd2>] kmem_cache_free+0x262/0x280
[ 2771.593036]  [<ffffffff812005c6>] ? final_putname+0x26/0x50
[ 2771.593831]  [<ffffffff812005c6>] final_putname+0x26/0x50
[ 2771.594622]  [<ffffffff81200869>] putname+0x29/0x40
[ 2771.595411]  [<ffffffff8120166e>] user_path_at_empty+0x6e/0xc0
[ 2771.596199]  [<ffffffff81212197>] ? mntput_no_expire+0x67/0x140
[ 2771.596986]  [<ffffffff81212135>] ? mntput_no_expire+0x5/0x140
[ 2771.597766]  [<ffffffff81207df6>] ? dput+0x56/0x190
[ 2771.598542]  [<ffffffff812016d1>] user_path_at+0x11/0x20
[ 2771.599311]  [<ffffffff812187ec>] path_setxattr+0x4c/0xe0
[ 2771.600097]  [<ffffffff81218a51>] SyS_lsetxattr+0x11/0x20
[ 2771.600848]  [<ffffffff817dec12>] system_call_fastpath+0x12/0x17
[ 2771.601598] ---[ end trace 7b78126c55dcb717 ]---
[ 2771.602404] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 40.989 msecs
[ 2771.603175] perf interrupt took too long (322423 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 3471.463812] perf interrupt took too long (319933 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
[ 4563.539619] perf interrupt took too long (317460 > 10000), lowering kernel.perf_event_max_sample_rate to 12500
[ 5676.723413] perf interrupt took too long (315015 > 19841), lowering kernel.perf_event_max_sample_rate to 6300
[ 6800.751151] perf interrupt took too long (312583 > 39062), lowering kernel.perf_event_max_sample_rate to 3200
[ 8056.882309] perf interrupt took too long (310176 > 78125), lowering kernel.perf_event_max_sample_rate to 1600
[ 9233.809073] perf interrupt took too long (307790 > 156250), lowering kernel.perf_event_max_sample_rate to 800

again, the box survived.   Next run I'll try undoing Chris' idea of no serial,
and see if it wedges after the spew.  After that, I'll do a no preempt run.

	Dave



^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12  4:45                                                               ` Dave Jones
  2014-12-12 14:38                                                                 ` Dave Jones
@ 2014-12-12 18:10                                                                 ` Paul E. McKenney
  2014-12-12 18:42                                                                   ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-12 18:10 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Linux Kernel Mailing List

On Thu, Dec 11, 2014 at 11:45:09PM -0500, Dave Jones wrote:
> On Thu, Dec 11, 2014 at 10:03:43PM -0500, Dave Jones wrote:
>  > On Thu, Dec 11, 2014 at 01:49:17PM -0800, Linus Torvalds wrote:
>  >  
>  >  > Anyway, you might as well stop bisecting. Regardless of where it lands
>  >  > in the remaining pile, it's not going to give us any useful
>  >  > information, methinks.
>  >  > 
>  >  > I'm stumped.
>  > 
>  > yeah, likewise.  I don't recall any bug that's given me this much headache.
>  > I don't think it's helped that the symptoms are vague enough that a
>  > number of people have thought they've seen the same thing, which have
>  > turned out to be unrelated incidents.  At least some of those have
>  > gotten closure though it seems.
>  > 
>  >  > Maybe it's worth it to concentrate on just testing current kernels,
>  >  > and instead try to limit the triggering some other way. In particular,
>  >  > you had a trinity run that was *only* testing lsetxattr(). Is that
>  >  > really *all* that was going on? Obviously trinity will be using
>  >  > timers, fork, and other things? Can you recreate that lsetxattr thing,
>  >  > and just try to get as many problem reports as possible from one
>  >  > particular kernel (say, 3.18, since that should be a reasonable modern
>  >  > base with hopefully not a lot of other random issues)?
>  > 
>  > I'll let it run overnight, but so far after 4hrs, on .18 it's not done
>  > anything.
> 
> Two hours later, it had spewed this, but survived. (Trinity had quit after that
> point because /proc/sys/kernel/tainted changed).

[ . . . ]

> Few seconds later rcu craps itself..
> 
> [18801.941908] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [18801.942920] 	3: (3 GPs behind) idle=bf4/0/0 softirq=1597256/1597257 
> [18801.943890] 	(detected by 0, t=6002 jiffies, g=763359, c=763358, q=0)
> [18801.944843] Task dump for CPU 3:
> [18801.945770] swapper/3       R  running task    14576     0      1 0x00200000
> [18801.946706]  0000000342b6fe28 def23185c07e1b3d ffffe8ffff403518 0000000000000001
> [18801.947629]  ffffffff81cb2000 0000000000000003 ffff880242b6fe78 ffffffff8166cb95
> [18801.948557]  0000111242adb59f ffffffff81cb2070 ffff880242b6c000 ffffffff81d21ab0
> [18801.949478] Call Trace:
> [18801.950384]  [<ffffffff8166cb95>] ? cpuidle_enter_state+0x55/0x1c0
> [18801.951303]  [<ffffffff8166cdb7>] ? cpuidle_enter+0x17/0x20
> [18801.952211]  [<ffffffff810bf303>] ? cpu_startup_entry+0x423/0x4d0
> [18801.953125]  [<ffffffff810314c3>] ? start_secondary+0x1a3/0x220

Very strange.  Both cpuidle_enter() and cpuidle_enter_state() should be
within the idle loop, so that RCU should be ignoring this CPU.  And the
"idle=bf4/0/0" means that it really has marked itself as being idle from
an RCU perspective.  So I am guessing that the RCU grace-period kthread
has not gotten a chance to run.

If you are willing to live a bit dangerously, could you please see if
the (not for mainline) patch below clears this up?

							Thanx, Paul

------------------------------------------------------------------------

rcu: Run grace-period kthreads at real-time priority

This is an experimental commit that attempts to better handle high-load
situations.

Not-yet-signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/init/Kconfig b/init/Kconfig
index cecce1b13825..6db1f304157c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -677,7 +677,6 @@ config RCU_BOOST
 config RCU_KTHREAD_PRIO
 	int "Real-time priority to use for RCU worker threads"
 	range 1 99
-	depends on RCU_BOOST
 	default 1
 	help
 	  This option specifies the SCHED_FIFO priority value that will be
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 93bca38925a9..57fd8f5bd1ad 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -156,6 +156,10 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 static void invoke_rcu_core(void);
 static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
 
+/* rcuc/rcub kthread realtime priority */
+static int kthread_prio = CONFIG_RCU_KTHREAD_PRIO;
+module_param(kthread_prio, int, 0644);
+
 /*
  * Track the rcutorture test sequence number and the update version
  * number within a given test.  The rcutorture_testseq is incremented
@@ -3631,15 +3635,19 @@ static int __init rcu_spawn_gp_kthread(void)
 	unsigned long flags;
 	struct rcu_node *rnp;
 	struct rcu_state *rsp;
+	struct sched_param sp;
 	struct task_struct *t;
 
 	rcu_scheduler_fully_active = 1;
 	for_each_rcu_flavor(rsp) {
-		t = kthread_run(rcu_gp_kthread, rsp, "%s", rsp->name);
+		t = kthread_create(rcu_gp_kthread, rsp, "%s", rsp->name);
 		BUG_ON(IS_ERR(t));
 		rnp = rcu_get_root(rsp);
 		raw_spin_lock_irqsave(&rnp->lock, flags);
 		rsp->gp_kthread = t;
+		sp.sched_priority = kthread_prio;
+		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
+		wake_up_process(t);
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	}
 	rcu_spawn_nocb_kthreads();
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index cf3b4d532379..564944964f14 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -34,10 +34,6 @@
 
 #include "../locking/rtmutex_common.h"
 
-/* rcuc/rcub kthread realtime priority */
-static int kthread_prio = CONFIG_RCU_KTHREAD_PRIO;
-module_param(kthread_prio, int, 0644);
-
 /*
  * Control variables for per-CPU and per-rcu_node kthreads.  These
  * handle all flavors of RCU.

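
If you do give it a spin and the default priority of 1 turns out not
to be enough, it should also be tunable at boot time via the (now
unconditionally built) module parameter - if I have the prefix right,
something like

	rcutree.kthread_prio=50

on the kernel command line.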

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 14:38                                                                 ` Dave Jones
@ 2014-12-12 18:24                                                                   ` Paul E. McKenney
  0 siblings, 0 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-12 18:24 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Linux Kernel Mailing List

On Fri, Dec 12, 2014 at 09:38:46AM -0500, Dave Jones wrote:
> On Thu, Dec 11, 2014 at 11:45:09PM -0500, Dave Jones wrote:
> 
 >  > I've seen those messages a fair bit on other machines too, and they drive me nuts
>  > because there's no 'perf' being run. I think it means "NMI watchdog", but is
>  > worded badly.  Still, it's curious that they appeared during what should have
>  > been idle time.
>  > 
>  > I'll reboot the box and give it another shot, and see what falls out in the morning.
> 
> Same deal again. It happened pretty quick after I'd gone to bed (typical).
> 
> [ 2754.509747] Clocksource tsc unstable (delta = -243594587656 ns)
> [ 2754.519197] Switched to clocksource hpet
> [ 2754.782940] INFO: rcu_preempt self-detected stall on CPU
> [ 2754.782972] 	0: (1 GPs behind) idle=6ef/140000000000001/0 softirq=247160/247161 

In this one, the CPU is at least non-idle.  ;-)

> [ 2754.782999] 	 (t=24343 jiffies g=104086 c=104085 q=0)
> [ 2754.783022] Task dump for CPU 0:
> [ 2754.783037] trinity-c387    R  running task    14016 13658  12780 0x00000008
> [ 2754.783070]  ffff880222365bc0 000000005ed04994 ffff880244003d68 ffffffff810a8d46
> [ 2754.783104]  ffffffff810a8cb2 0000000000000000 0000000000000001 0000000000000000
> [ 2754.783138]  ffffffff81c51e40 0000000000000092 ffff880244003d88 ffffffff810acf4d
> [ 2754.783171] Call Trace:
> [ 2754.783184]  <IRQ>  [<ffffffff810a8d46>] sched_show_task+0x116/0x180
> [ 2754.783215]  [<ffffffff810a8cb2>] ? sched_show_task+0x82/0x180
> [ 2754.783239]  [<ffffffff810acf4d>] dump_cpu_task+0x3d/0x50
> [ 2754.783261]  [<ffffffff810dc0c0>] rcu_dump_cpu_stacks+0x90/0xd0
> [ 2754.783286]  [<ffffffff810e3db3>] rcu_check_callbacks+0x573/0x850
> [ 2754.783311]  [<ffffffff8138cb43>] ? __this_cpu_preempt_check+0x13/0x20
> [ 2754.783337]  [<ffffffff810ec043>] ? hrtimer_run_queues+0x43/0x130
> [ 2754.783361]  [<ffffffff810ea5eb>] update_process_times+0x4b/0x80
> [ 2754.783386]  [<ffffffff810fb2cc>] tick_sched_timer+0x4c/0x1b0
> [ 2754.783409]  [<ffffffff810eb5db>] ? __run_hrtimer+0xbb/0x2e0
> [ 2754.783432]  [<ffffffff810eb5db>] __run_hrtimer+0xbb/0x2e0
> [ 2754.783454]  [<ffffffff810eb984>] ? hrtimer_interrupt+0x94/0x260
> [ 2754.783478]  [<ffffffff810fb280>] ? tick_init_highres+0x20/0x20
> [ 2754.783501]  [<ffffffff810eb9f7>] hrtimer_interrupt+0x107/0x260
> [ 2754.783526]  [<ffffffff81033258>] local_apic_timer_interrupt+0x38/0x70
> [ 2754.783552]  [<ffffffff817e16f5>] smp_apic_timer_interrupt+0x45/0x60
> [ 2754.783578]  [<ffffffff817dfadf>] apic_timer_interrupt+0x6f/0x80

Looks like standard scheduling-clock interrupt above this point.

> [ 2754.783600]  <EOI>  [<ffffffff810c541d>] ? lock_release_holdtime.part.24+0x9d/0x160

If this was an acquisition rather than a release, I would suspect high
lock contention.  Could just be luck of the draw, I suppose.

Or am I missing something subtle here?

							Thanx, Paul

> [ 2754.783634]  [<ffffffff810c50fe>] ? put_lock_stats.isra.23+0xe/0x30
> [ 2754.783659]  [<ffffffff8118524c>] ? __alloc_pages_nodemask+0x1ac/0xb60
> [ 2754.783684]  [<ffffffff811cf4be>] ? alloc_pages_vma+0xee/0x1b0
> [ 2754.783708]  [<ffffffff810ad575>] ? local_clock+0x25/0x30
> [ 2754.783731]  [<ffffffff810c6e2c>] ? __lock_acquire.isra.31+0x22c/0x9f0
> [ 2754.783756]  [<ffffffff810a73e1>] ? get_parent_ip+0x11/0x50
> [ 2754.783779]  [<ffffffff810c541d>] ? lock_release_holdtime.part.24+0x9d/0x160
> [ 2754.784613]  [<ffffffff811cf4be>] alloc_pages_vma+0xee/0x1b0
> [ 2754.785452]  [<ffffffff811aa21a>] ? do_wp_page+0xca/0x7d0
> [ 2754.786305]  [<ffffffff811aa21a>] do_wp_page+0xca/0x7d0
> [ 2754.787140]  [<ffffffff811acb6b>] handle_mm_fault+0x6cb/0xe90
> [ 2754.787948]  [<ffffffff81042b20>] ? __do_page_fault+0x140/0x600
> [ 2754.788748]  [<ffffffff81042b84>] __do_page_fault+0x1a4/0x600
> [ 2754.789562]  [<ffffffff810c50fe>] ? put_lock_stats.isra.23+0xe/0x30
> [ 2754.790340]  [<ffffffff810c541d>] ? lock_release_holdtime.part.24+0x9d/0x160
> [ 2754.791113]  [<ffffffff810a73e1>] ? get_parent_ip+0x11/0x50
> [ 2754.791879]  [<ffffffff810a755b>] ? preempt_count_sub+0x7b/0x100
> [ 2754.792646]  [<ffffffff8137fa0d>] ? trace_hardirqs_off_thunk+0x3a/0x3f
> [ 2754.793413]  [<ffffffff81042fec>] do_page_fault+0xc/0x10
> [ 2754.794176]  [<ffffffff817e0862>] page_fault+0x22/0x30
> [ 2754.794938] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 2754.795736] 	Tasks blocked on level-0 rcu_node (CPUs 0-7):
> [ 2754.796528] 	0: (1 GPs behind) idle=6ef/140000000000001/0 softirq=247160/247161 
> [ 2754.797334] 	Tasks blocked on level-0 rcu_node (CPUs 0-7):
> [ 2754.798153] 	(detected by 3, t=24343 jiffies, g=104086, c=104085, q=0)
> [ 2754.798981] Task dump for CPU 0:
> [ 2754.799797] trinity-c387    R  running task    14016 13658  12780 0x00000008
> [ 2754.800630]  ffff880222365bc0 0000000000000246 0000000127e77e08 8000000000000865
> [ 2754.801461]  ffff8802000000a9 800000008d044865 0000000000000000 ffff8802256d3c70
> [ 2754.802288]  ffff880227e77e28 00000000000000a9 0000000000d8eff8 ffff880227e77f58
> [ 2754.803109] Call Trace:
> [ 2754.803929]  [<ffffffff81042b84>] ? __do_page_fault+0x1a4/0x600
> [ 2754.804765]  [<ffffffff810c50fe>] ? put_lock_stats.isra.23+0xe/0x30
> [ 2754.805596]  [<ffffffff810c541d>] ? lock_release_holdtime.part.24+0x9d/0x160
> [ 2754.806444]  [<ffffffff810a73e1>] ? get_parent_ip+0x11/0x50
> [ 2754.807267]  [<ffffffff810a755b>] ? preempt_count_sub+0x7b/0x100
> [ 2754.808088]  [<ffffffff8137fa0d>] ? trace_hardirqs_off_thunk+0x3a/0x3f
> [ 2754.808920]  [<ffffffff81042fec>] ? do_page_fault+0xc/0x10
> [ 2754.809742]  [<ffffffff817e0862>] ? page_fault+0x22/0x30
> [ 2771.561356] ------------[ cut here ]------------
> [ 2771.562079] WARNING: CPU: 0 PID: 13696 at kernel/watchdog.c:317 watchdog_overflow_callback+0xdd/0x130()
> [ 2771.562879] Watchdog detected hard LOCKUP on cpu 0
> [ 2771.562895] CPU: 0 PID: 13696 Comm: trinity-c425 Not tainted 3.18.0+ #101 
> [ 2771.564490]  ffffffff81a66315 00000000fce35109 ffff880244005b88 ffffffff817d317e
> [ 2771.565315]  0000000000110004 ffff880244005be0 ffff880244005bc8 ffffffff81078a01
> [ 2771.566136]  0000000000000000 0000000000000000 0000000000000000 ffff880244005d30
> [ 2771.566954] Call Trace:
> [ 2771.567759]  <NMI>  [<ffffffff817d317e>] dump_stack+0x4f/0x7c
> [ 2771.568584]  [<ffffffff81078a01>] warn_slowpath_common+0x81/0xa0
> [ 2771.569405]  [<ffffffff81078a75>] warn_slowpath_fmt+0x55/0x70
> [ 2771.570253]  [<ffffffff8112fea0>] ? restart_watchdog_hrtimer+0x60/0x60
> [ 2771.571074]  [<ffffffff8112ff7d>] watchdog_overflow_callback+0xdd/0x130
> [ 2771.571894]  [<ffffffff81173a7c>] __perf_event_overflow+0xac/0x2a0
> [ 2771.572721]  [<ffffffff81019952>] ? x86_perf_event_set_period+0xe2/0x150
> [ 2771.573551]  [<ffffffff81174644>] perf_event_overflow+0x14/0x20
> [ 2771.574378]  [<ffffffff8101f479>] intel_pmu_handle_irq+0x209/0x410
> [ 2771.575210]  [<ffffffff8101875b>] perf_event_nmi_handler+0x2b/0x50
> [ 2771.576040]  [<ffffffff81007634>] nmi_handle+0xa4/0x1e0
> [ 2771.576868]  [<ffffffff81007595>] ? nmi_handle+0x5/0x1e0
> [ 2771.577698]  [<ffffffff81006de1>] ? print_context_stack+0xe1/0x100
> [ 2771.578526]  [<ffffffff810079aa>] default_do_nmi+0x7a/0x1d0
> [ 2771.579354]  [<ffffffff81007bb8>] do_nmi+0xb8/0xf0
> [ 2771.580206]  [<ffffffff817e0c2a>] end_repeat_nmi+0x1e/0x2e
> [ 2771.581023]  [<ffffffff817d0f85>] ? free_debug_processing+0x157/0x22a
> [ 2771.581836]  [<ffffffff817d0f85>] ? free_debug_processing+0x157/0x22a
> [ 2771.582644]  [<ffffffff81006de1>] ? print_context_stack+0xe1/0x100
> [ 2771.583452]  [<ffffffff81006de1>] ? print_context_stack+0xe1/0x100
> [ 2771.584253]  [<ffffffff81006de1>] ? print_context_stack+0xe1/0x100
> [ 2771.585042]  <<EOE>>  [<ffffffff81005710>] dump_trace+0x140/0x370
> [ 2771.585841]  [<ffffffff812005c6>] ? final_putname+0x26/0x50
> [ 2771.586636]  [<ffffffff81013ecf>] save_stack_trace+0x2f/0x50
> [ 2771.587430]  [<ffffffff811d4f20>] set_track+0x70/0x140
> [ 2771.588217]  [<ffffffff817d0f85>] free_debug_processing+0x157/0x22a
> [ 2771.589015]  [<ffffffff810c50fe>] ? put_lock_stats.isra.23+0xe/0x30
> [ 2771.589815]  [<ffffffff817d10ad>] __slab_free+0x55/0x320
> [ 2771.590636]  [<ffffffff8138e016>] ? debug_check_no_obj_freed+0x156/0x250
> [ 2771.591442]  [<ffffffff81212294>] ? mntput+0x24/0x40
> [ 2771.592242]  [<ffffffff811d7fd2>] kmem_cache_free+0x262/0x280
> [ 2771.593036]  [<ffffffff812005c6>] ? final_putname+0x26/0x50
> [ 2771.593831]  [<ffffffff812005c6>] final_putname+0x26/0x50
> [ 2771.594622]  [<ffffffff81200869>] putname+0x29/0x40
> [ 2771.595411]  [<ffffffff8120166e>] user_path_at_empty+0x6e/0xc0
> [ 2771.596199]  [<ffffffff81212197>] ? mntput_no_expire+0x67/0x140
> [ 2771.596986]  [<ffffffff81212135>] ? mntput_no_expire+0x5/0x140
> [ 2771.597766]  [<ffffffff81207df6>] ? dput+0x56/0x190
> [ 2771.598542]  [<ffffffff812016d1>] user_path_at+0x11/0x20
> [ 2771.599311]  [<ffffffff812187ec>] path_setxattr+0x4c/0xe0
> [ 2771.600097]  [<ffffffff81218a51>] SyS_lsetxattr+0x11/0x20
> [ 2771.600848]  [<ffffffff817dec12>] system_call_fastpath+0x12/0x17
> [ 2771.601598] ---[ end trace 7b78126c55dcb717 ]---
> [ 2771.602404] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 40.989 msecs
> [ 2771.603175] perf interrupt took too long (322423 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> [ 3471.463812] perf interrupt took too long (319933 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
> [ 4563.539619] perf interrupt took too long (317460 > 10000), lowering kernel.perf_event_max_sample_rate to 12500
> [ 5676.723413] perf interrupt took too long (315015 > 19841), lowering kernel.perf_event_max_sample_rate to 6300
> [ 6800.751151] perf interrupt took too long (312583 > 39062), lowering kernel.perf_event_max_sample_rate to 3200
> [ 8056.882309] perf interrupt took too long (310176 > 78125), lowering kernel.perf_event_max_sample_rate to 1600
> [ 9233.809073] perf interrupt took too long (307790 > 156250), lowering kernel.perf_event_max_sample_rate to 800
> 
> Again, the box survived.  Next run I'll try undoing Chris' idea of no serial,
> and see if it wedges after the spew.  After that, I'll do a no-preempt run.
> 
> 	Dave
> 
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 18:10                                                                 ` Paul E. McKenney
@ 2014-12-12 18:42                                                                   ` Dave Jones
  0 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-12 18:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Linux Kernel Mailing List

On Fri, Dec 12, 2014 at 10:10:44AM -0800, Paul E. McKenney wrote:

 > > [18801.941908] INFO: rcu_preempt detected stalls on CPUs/tasks:
 > > [18801.942920] 	3: (3 GPs behind) idle=bf4/0/0 softirq=1597256/1597257 
 > > [18801.943890] 	(detected by 0, t=6002 jiffies, g=763359, c=763358, q=0)
 > > [18801.944843] Task dump for CPU 3:
 > > [18801.945770] swapper/3       R  running task    14576     0      1 0x00200000
 > > [18801.946706]  0000000342b6fe28 def23185c07e1b3d ffffe8ffff403518 0000000000000001
 > > [18801.947629]  ffffffff81cb2000 0000000000000003 ffff880242b6fe78 ffffffff8166cb95
 > > [18801.948557]  0000111242adb59f ffffffff81cb2070 ffff880242b6c000 ffffffff81d21ab0
 > > [18801.949478] Call Trace:
 > > [18801.950384]  [<ffffffff8166cb95>] ? cpuidle_enter_state+0x55/0x1c0
 > > [18801.951303]  [<ffffffff8166cdb7>] ? cpuidle_enter+0x17/0x20
 > > [18801.952211]  [<ffffffff810bf303>] ? cpu_startup_entry+0x423/0x4d0
 > > [18801.953125]  [<ffffffff810314c3>] ? start_secondary+0x1a3/0x220
 > 
 > Very strange.  Both cpuidle_enter() and cpuidle_enter_state() should be
 > within the idle loop, so that RCU should be ignoring this CPU.  And the
 > "idle=bf4/0/0" means that it really has marked itself as being idle from
 > an RCU perspective.  So I am guessing that the RCU grace-period kthread
 > has not gotten a chance to run.
 > 
 > If you are willing to live a bit dangerously, could you please see if
 > the (not for mainline) patch below clears this up?

I'll try anything at this point, regardless of danger level :)

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-11 21:49                                                           ` Linus Torvalds
                                                                               ` (2 preceding siblings ...)
  2014-12-12  3:03                                                             ` Dave Jones
@ 2014-12-12 18:54                                                             ` Dave Jones
  2014-12-12 19:14                                                               ` Linus Torvalds
  3 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-12 18:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Thu, Dec 11, 2014 at 01:49:17PM -0800, Linus Torvalds wrote:

 > Maybe it's worth it to concentrate on just testing current kernels,
 > and instead try to limit the triggering some other way. In particular,
 > you had a trinity run that was *only* testing lsetxattr(). Is that
 > really *all* that was going on? Obviously trinity will be using
 > timers, fork, and other things? Can you recreate that lsetxattr thing,
 > and just try to get as many problem reports as possible from one
 > particular kernel (say, 3.18, since that should be a reasonable modern
 > base with hopefully not a lot of other random issues)?

Something that's still making me wonder if it's some kind of hardware
problem is the non-deterministic nature of this bug.
Take the example above, where trinity is limited to doing nothing but lsetxattr's.
Why would the bug sometimes take 3-4 hours to shake out, and another
run take just 45 minutes?

"different entropy" really shouldn't matter a huge amount here. Even if
we end up picking different pathnames to pass in, it's the same source
(proc,sys,/dev).   The other arguments are a crapshoot, but it seems
unlikely that it would matter hugely whatever values they are.

If it *is* a kernel bug, it's not going to be in lsetxattr, but rather
some kind of scheduling or mm related thing that happens in some corner
case when we're under extreme load. That I can drive up the loadavg with
lsetxattr is, I suspect, just a symptom rather than the cause.

If enough callers pass in huge 'len' arguments, and an mmap that's big
enough to cover that size, I could see that giving the kernel a lot of
work to do.
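
Purely as an illustration (a sketch, not trinity's actual code; the file
name, attribute name and sizes below are made up), the kind of call being
described would look roughly like this:

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <sys/xattr.h>
    #include <stdio.h>

    int main(void)
    {
        /* 64MB picked arbitrarily; trinity randomizes both pointer and len */
        size_t len = 64UL << 20;
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED)
            return 1;

        /* hand the whole mmap()ed buffer to the kernel as an xattr value */
        if (lsetxattr("testfile", "user.fuzz", buf, len, 0))
            perror("lsetxattr");
        return 0;
    }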

Another thing I keep thinking is "well, how is this different from
a forkbomb?". The user account I'm running under has no ulimit set on
the maximum memory size for eg, but if that were the problem, surely
I'd be seeing the oom-killer rather than lockups.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 18:54                                                             ` Dave Jones
@ 2014-12-12 19:14                                                               ` Linus Torvalds
  2014-12-12 19:23                                                                 ` Dave Jones
                                                                                   ` (4 more replies)
  0 siblings, 5 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-12 19:14 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Fri, Dec 12, 2014 at 10:54 AM, Dave Jones <davej@redhat.com> wrote:
>
> Something that's still making me wonder if it's some kind of hardware
> problem is the non-deterministic nature of this bug.

I'd expect it to be a race condition, though. Which can easily cause
these kinds of issues, and the timing will be pretty random even if
the load is very regular.

And we know that the scheduler has an integer overflow under Sasha's
loads, although I didn't hear anything from Ingo and friends about it.
Ingo/Peter, you were cc'd on that report, where at least one of the
multiplications in wake_affine() ended up overflowing..
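
For illustration only (this is not the kernel's wake_affine() code, and the
task counts below are invented), the general failure mode is a product of
two scaled load values that no longer fits in a 32-bit signed quantity:

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        /* load weights are fixed-point; a nice-0 task weighs 1024 */
        long long this_load = 3000LL * 1024;   /* invented: ~3000 tasks */
        long long new_load  = 2000LL * 1024;   /* invented: another big load */
        long long product   = this_load * new_load;

        printf("product = %lld, INT_MAX = %d\n", product, INT_MAX);
        printf("fits in 32 bits? %s\n", product > INT_MAX ? "no" : "yes");
        return 0;
    }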

Some scheduler thing that overflows only under heavy load, and screws
up scheduling could easily account for the RCU thread thing. I see it
*less* easily accounting for DaveJ's case, though, because the
watchdog is running at RT priority,  and the scheduler would have to
screw up much more to then not schedule an RT task, but..

I'm also not sure if the bug ever happens with preemption disabled.
Sasha, was that you who reported that you cannot reproduce it without
preemption? It strikes me that there's a race condition in
__cond_resched() wrt preemption, for example: we do

        __preempt_count_add(PREEMPT_ACTIVE);
        __schedule();
        __preempt_count_sub(PREEMPT_ACTIVE);

and in between the __schedule() and __preempt_count_sub(), if an
interrupt comes in and wakes up some important process, it won't
reschedule (because preemption is active), but then we enable
preemption again and don't check whether we should reschedule (again),
and we just go on our merry ways.
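
Spelled out as a timeline (a sketch of the window, not actual kernel code):

    /*
     * task A calls __cond_resched():
     *
     *     __preempt_count_add(PREEMPT_ACTIVE);
     *     __schedule();                  // may or may not switch away
     *         <-- IRQ fires here, wakes an important task and sets
     *             need_resched; the interrupt return path declines to
     *             preempt because preempt_count is non-zero (PREEMPT_ACTIVE)
     *     __preempt_count_sub(PREEMPT_ACTIVE);
     *     return;                        // need_resched is never rechecked
     */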

Now, I don't see how that could really matter for a long time -
returning to user space will check need_resched, and sleeping will
obviously force a reschedule anyway, so these kinds of races should at
most delay things by just a tiny amount, but maybe there is some case
where we screw up in a bigger way. So I do *not* believe that the one
in __cond_resched() matters, but I'm giving it as an example of the
kind of things that could go wrong.

                        Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 19:14                                                               ` Linus Torvalds
@ 2014-12-12 19:23                                                                 ` Dave Jones
  2014-12-12 19:58                                                                 ` David Lang
                                                                                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-12 19:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, Dec 12, 2014 at 11:14:06AM -0800, Linus Torvalds wrote:
 > On Fri, Dec 12, 2014 at 10:54 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > Something that's still making me wonder if it's some kind of hardware
 > > problem is the non-deterministic nature of this bug.
 > 
 > I'd expect it to be a race condition, though. Which can easily cause
 > these kinds of issues, and the timing will be pretty random even if
 > the load is very regular.
 > 
 > I'm also not sure if the bug ever happens with preemption disabled.

After tomorrow, I'm not going to be in front of this machine until
Wednesday, so I'll leave a no-preempt build running for the duration.
Hopefully that will give us some clues.  I might be able to log into
it while travelling, so I'll provide updates where possible as long
as it doesn't wedge solid.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 19:14                                                               ` Linus Torvalds
  2014-12-12 19:23                                                                 ` Dave Jones
@ 2014-12-12 19:58                                                                 ` David Lang
  2014-12-12 20:20                                                                   ` Linus Torvalds
  2014-12-12 20:34                                                                   ` Paul E. McKenney
  2014-12-13  7:36                                                                 ` [PATCH] sched: Fix lost reschedule in __cond_resched() Ingo Molnar
                                                                                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 486+ messages in thread
From: David Lang @ 2014-12-12 19:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, 12 Dec 2014, Linus Torvalds wrote:

> I'm also not sure if the bug ever happens with preemption disabled.
> Sasha, was that you who reported that you cannot reproduce it without
> preemption? It strikes me that there's a race condition in
> __cond_resched() wrt preemption, for example: we do
>
>        __preempt_count_add(PREEMPT_ACTIVE);
>        __schedule();
>        __preempt_count_sub(PREEMPT_ACTIVE);
>
> and in between the __schedule() and __preempt_count_sub(), if an
> interrupt comes in and wakes up some important process, it won't
> reschedule (because preemption is active), but then we enable
> preemption again and don't check whether we should reschedule (again),
> and we just go on our merry ways.
>
> Now, I don't see how that could really matter for a long time -
> returning to user space will check need_resched, and sleeping will
> obviously force a reschedule anyway, so these kinds of races should at
> most delay things by just a tiny amount,

If the machine has NOHZ and has a cpu bound userspace task, it could take quite 
a while before userspace would trigger a reschedule (at least if I've understood 
the comments on this thread properly)

David Lang

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 19:58                                                                 ` David Lang
@ 2014-12-12 20:20                                                                   ` Linus Torvalds
  2014-12-13  7:43                                                                     ` Ingo Molnar
  2014-12-12 20:34                                                                   ` Paul E. McKenney
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-12 20:20 UTC (permalink / raw)
  To: David Lang
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, Dec 12, 2014 at 11:58 AM, David Lang <david@lang.hm> wrote:
>
> If the machine has NOHZ and has a cpu bound userspace task, it could take
> quite a while before userspace would trigger a reschedule (at least if I've
> understood the comments on this thread properly)

The thing is, we'd have to return to user space for that to happen.
And when we do that, we check the "should we schedule" flag again. So
races like this really shouldn't matter, but there could be something
kind-of-similar that just ends up causing a wakeup to be delayed.

But it would need to be delayed for seconds (for the RCU threads) or
for tens of seconds (for the watchdog) to matter.

Which just seems unlikely. Even the "very high load" thing shouldn't
really matter, since while that could delay one particular thread
being scheduled, it shouldn't delay the next "should we schedule"
test. In fact, high load would normally be expected to make the next
"should we schedule" come faster.

But this is where some load calculation overflow might screw things
up, of course.

                 Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 19:58                                                                 ` David Lang
  2014-12-12 20:20                                                                   ` Linus Torvalds
@ 2014-12-12 20:34                                                                   ` Paul E. McKenney
  2014-12-12 21:23                                                                     ` Sasha Levin
  1 sibling, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-12 20:34 UTC (permalink / raw)
  To: David Lang
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Linux Kernel Mailing List

On Fri, Dec 12, 2014 at 11:58:50AM -0800, David Lang wrote:
> On Fri, 12 Dec 2014, Linus Torvalds wrote:
> 
> >I'm also not sure if the bug ever happens with preemption disabled.
> >Sasha, was that you who reported that you cannot reproduce it without
> >preemption? It strikes me that there's a race condition in
> >__cond_resched() wrt preemption, for example: we do
> >
> >       __preempt_count_add(PREEMPT_ACTIVE);
> >       __schedule();
> >       __preempt_count_sub(PREEMPT_ACTIVE);
> >
> >and in between the __schedule() and __preempt_count_sub(), if an
> >interrupt comes in and wakes up some important process, it won't
> >reschedule (because preemption is active), but then we enable
> >preemption again and don't check whether we should reschedule (again),
> >and we just go on our merry ways.
> >
> >Now, I don't see how that could really matter for a long time -
> >returning to user space will check need_resched, and sleeping will
> >obviously force a reschedule anyway, so these kinds of races should at
> >most delay things by just a tiny amount,
> 
> If the machine has NOHZ and has a cpu bound userspace task, it could
> take quite a while before userspace would trigger a reschedule (at
> least if I've understood the comments on this thread properly)

Dave, Sasha, if you guys are running CONFIG_NO_HZ_FULL=y and
CONFIG_NO_HZ_FULL_ALL=y, please let me know.  I am currently assuming
that none of your CPUs are in NO_HZ_FULL mode.  If this assumption is
incorrect, there are some other pieces of RCU that I should be taking
a hard look at.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 20:34                                                                   ` Paul E. McKenney
@ 2014-12-12 21:23                                                                     ` Sasha Levin
  2014-12-13  0:58                                                                       ` Paul E. McKenney
  2014-12-13  8:30                                                                       ` Ingo Molnar
  0 siblings, 2 replies; 486+ messages in thread
From: Sasha Levin @ 2014-12-12 21:23 UTC (permalink / raw)
  To: paulmck, David Lang
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List

On 12/12/2014 03:34 PM, Paul E. McKenney wrote:
> On Fri, Dec 12, 2014 at 11:58:50AM -0800, David Lang wrote:
>> > On Fri, 12 Dec 2014, Linus Torvalds wrote:
>> > 
>>> > >I'm also not sure if the bug ever happens with preemption disabled.
>>> > >Sasha, was that you who reported that you cannot reproduce it without
>>> > >preemption? It strikes me that there's a race condition in
>>> > >__cond_resched() wrt preemption, for example: we do
>>> > >
>>> > >       __preempt_count_add(PREEMPT_ACTIVE);
>>> > >       __schedule();
>>> > >       __preempt_count_sub(PREEMPT_ACTIVE);
>>> > >
>>> > >and in between the __schedule() and __preempt_count_sub(), if an
>>> > >interrupt comes in and wakes up some important process, it won't
>>> > >reschedule (because preemption is active), but then we enable
>>> > >preemption again and don't check whether we should reschedule (again),
>>> > >and we just go on our merry ways.
>>> > >
>>> > >Now, I don't see how that could really matter for a long time -
>>> > >returning to user space will check need_resched, and sleeping will
>>> > >obviously force a reschedule anyway, so these kinds of races should at
>>> > >most delay things by just a tiny amount,
>> > 
>> > If the machine has NOHZ and has a cpu bound userspace task, it could
>> > take quite a while before userspace would trigger a reschedule (at
>> > least if I've understood the comments on this thread properly)
> Dave, Sasha, if you guys are running CONFIG_NO_HZ_FULL=y and
> CONFIG_NO_HZ_FULL_ALL=y, please let me know.  I am currently assuming
> that none of your CPUs are in NO_HZ_FULL mode.  If this assumption is
> incorrect, there are some other pieces of RCU that I should be taking
> a hard look at.

This is my no_hz related config:

$ grep NO_HZ .config
CONFIG_NO_HZ_COMMON=y
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y
CONFIG_NO_HZ_FULL_SYSIDLE=y
CONFIG_NO_HZ_FULL_SYSIDLE_SMALL=8
CONFIG_NO_HZ=y
CONFIG_RCU_FAST_NO_HZ=y

And from dmesg:

[    0.000000] Preemptible hierarchical RCU implementation.
[    0.000000]  RCU debugfs-based tracing is enabled.
[    0.000000]  Hierarchical RCU autobalancing is disabled.
[    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
[    0.000000]  Additional per-CPU info printed with stalls.
[    0.000000]  RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=28.
[    0.000000]  RCU kthread priority: 1.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=28
[    0.000000] NR_IRQS:524544 nr_irqs:648 16
[    0.000000] NO_HZ: Clearing 0 from nohz_full range for timekeeping
[    0.000000] NO_HZ: Full dynticks CPUs: 1-27.
[    0.000000]  Offload RCU callbacks from CPUs: 1-27.

Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-11 22:57                                                                 ` Sasha Levin
  2014-12-12  6:54                                                                   ` Ingo Molnar
@ 2014-12-12 23:54                                                                   ` Sasha Levin
  2014-12-13  0:23                                                                     ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Sasha Levin @ 2014-12-12 23:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On 12/11/2014 05:57 PM, Sasha Levin wrote:
> On 12/11/2014 05:36 PM, Linus Torvalds wrote:
>> > On Thu, Dec 11, 2014 at 1:52 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>>>> >> >
>>>> >> > Is it possible that Dave and myself were seeing the same problem after
>>>> >> > all?
>> > Could be. You do have commonalities, even if the actual symptoms then
>> > differ. And while it looked different when you could trigger it with
>> > 3.16 but DaveJ couldn't, that's up in the air now that I doubt that
>> > 3.16 really is ok for DaveJ after all..
>> > 
>> > And you might have a better luck bisecting it, since you seem to be
>> > able to trigger your RCU lockup much more quickly (and apparently
>> > reliably? Correct?)
> Right, and it reproduces in 3.10 as well, so it's not really a new thing.
> 
> What's odd is that I don't remember seeing this bug so long in the past,
> so I'll try bisecting trinity rather than the kernel - it's the only other
> thing that changed.

So I checked out trinity from half a year ago, and could not reproduce the
stall any more. Not on v3.16 nor on the current -next.

I ran bisection on trinity, rather than the kernel, and got the following
result:

commit f2be2d5ffe4bf896eb5418972013822a2bef0cee
Author: Dave Jones <davej@redhat.com>
Date:   Mon Aug 4 19:55:17 2014 -0400

    begin some infrastructure to use a bunch of test files for fsx like ops.

I've been running trinity f2be2d5ff^ on -next for two hours now, and there's
no sign of a lockup. Previously it took ~10 minutes to trigger.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 23:54                                                                   ` Sasha Levin
@ 2014-12-13  0:23                                                                     ` Linus Torvalds
  2014-12-13  0:34                                                                       ` Sasha Levin
  2014-12-13  2:32                                                                       ` Dave Jones
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-13  0:23 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, Dec 12, 2014 at 3:54 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>
> I ran bisection on trinity, rather than the kernel, and got the following
> result:

Heh. That commit is pretty small, but I guess the effect of having a
number of regular files open and being used on the trinity loads can
be almost arbitrarily large.

Where do those files get opened? What filesystem?

That might actually explain your and DaveJ's big differences: you are
running in virtualization, and I remember some 9p traces with virtio
etc from your reports.

While DaveJ obviously runs on bare hardware, possibly on /tmp and tmpfs?

> I've been running trinity f2be2d5ff^ on -next for two hours now, and there's
> no sign of a lockup. Previously it took ~10 minutes to trigger.

DaveJ?  Is there anything limiting the size of those files?

                   Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13  0:23                                                                     ` Linus Torvalds
@ 2014-12-13  0:34                                                                       ` Sasha Levin
  2014-12-13  0:44                                                                         ` Linus Torvalds
  2014-12-13  2:32                                                                       ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Sasha Levin @ 2014-12-13  0:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On 12/12/2014 07:23 PM, Linus Torvalds wrote:
> On Fri, Dec 12, 2014 at 3:54 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>> >
>> > I ran bisection on trinity, rather than the kernel, and got the following
>> > result:
> Heh. That commit is pretty small, but I guess the effect of having a
> number of regular files open and being used on the trinity loads can
> be almost arbitrarily large.
> 
> Where do those files get opened? What filesystem?

Right, it's virtio-9p. However, virtio-9p acts merely as a proxy to an underlying
tmpfs - so while it's slow, I don't think it's way slower than the average
disk-backed ext4.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13  0:34                                                                       ` Sasha Levin
@ 2014-12-13  0:44                                                                         ` Linus Torvalds
  2014-12-13 16:28                                                                           ` Jeff Chua
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-13  0:44 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, Dec 12, 2014 at 4:34 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>
> Right, it's virtio-9p. However, virtio-9p acts merely as a proxy to an underlying
> tmpfs - so while it's slow, I don't think it's way slower than the average disk
> backed ext4.

I was thinking more in the sense of "how much of the trouble is about
something like tmpfs eating tons of memory when trinity starts doing
random system calls on those files".

I was also thinking that some of it might be filesystem-specific. We
already *did* see one trace where it was in the loop getting virtio
channel data. Maybe it's actually possible to overwhelm the 9p
filesystem exactly because the backing store is tmpfs, and basically
have a CPU 100% busy handling ring events from the virtual
filesystem..

But I'm just flailing..

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 21:23                                                                     ` Sasha Levin
@ 2014-12-13  0:58                                                                       ` Paul E. McKenney
  2014-12-13 12:08                                                                         ` Paul E. McKenney
  2014-12-13  8:30                                                                       ` Ingo Molnar
  1 sibling, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-13  0:58 UTC (permalink / raw)
  To: Sasha Levin
  Cc: David Lang, Linus Torvalds, Dave Jones, Chris Mason,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List

On Fri, Dec 12, 2014 at 04:23:56PM -0500, Sasha Levin wrote:
> On 12/12/2014 03:34 PM, Paul E. McKenney wrote:
> > On Fri, Dec 12, 2014 at 11:58:50AM -0800, David Lang wrote:
> >> > On Fri, 12 Dec 2014, Linus Torvalds wrote:
> >> > 
> >>> > >I'm also not sure if the bug ever happens with preemption disabled.
> >>> > >Sasha, was that you who reported that you cannot reproduce it without
> >>> > >preemption? It strikes me that there's a race condition in
> >>> > >__cond_resched() wrt preemption, for example: we do
> >>> > >
> >>> > >       __preempt_count_add(PREEMPT_ACTIVE);
> >>> > >       __schedule();
> >>> > >       __preempt_count_sub(PREEMPT_ACTIVE);
> >>> > >
> >>> > >and in between the __schedule() and __preempt_count_sub(), if an
> >>> > >interrupt comes in and wakes up some important process, it won't
> >>> > >reschedule (because preemption is active), but then we enable
> >>> > >preemption again and don't check whether we should reschedule (again),
> >>> > >and we just go on our merry ways.
> >>> > >
> >>> > >Now, I don't see how that could really matter for a long time -
> >>> > >returning to user space will check need_resched, and sleeping will
> >>> > >obviously force a reschedule anyway, so these kinds of races should at
> >>> > >most delay things by just a tiny amount,
> >> > 
> >> > If the machine has NOHZ and has a cpu bound userspace task, it could
> >> > take quite a while before userspace would trigger a reschedule (at
> >> > least if I've understood the comments on this thread properly)
> > Dave, Sasha, if you guys are running CONFIG_NO_HZ_FULL=y and
> > CONFIG_NO_HZ_FULL_ALL=y, please let me know.  I am currently assuming
> > that none of your CPUs are in NO_HZ_FULL mode.  If this assumption is
> > incorrect, there are some other pieces of RCU that I should be taking
> > a hard look at.
> 
> This is my no_hz related config:
> 
> $ grep NO_HZ .config
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_NO_HZ_IDLE is not set
> CONFIG_NO_HZ_FULL=y
> CONFIG_NO_HZ_FULL_ALL=y
> CONFIG_NO_HZ_FULL_SYSIDLE=y
> CONFIG_NO_HZ_FULL_SYSIDLE_SMALL=8
> CONFIG_NO_HZ=y
> CONFIG_RCU_FAST_NO_HZ=y
> 
> And from dmesg:
> 
> [    0.000000] Preemptible hierarchical RCU implementation.
> [    0.000000]  RCU debugfs-based tracing is enabled.
> [    0.000000]  Hierarchical RCU autobalancing is disabled.
> [    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
> [    0.000000]  Additional per-CPU info printed with stalls.
> [    0.000000]  RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=28.
> [    0.000000]  RCU kthread priority: 1.
> [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=28
> [    0.000000] NR_IRQS:524544 nr_irqs:648 16
> [    0.000000] NO_HZ: Clearing 0 from nohz_full range for timekeeping
> [    0.000000] NO_HZ: Full dynticks CPUs: 1-27.
> [    0.000000]  Offload RCU callbacks from CPUs: 1-27.

Thank you, Sasha.  Looks like I have a few more places to take a hard
look at, then!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13  0:23                                                                     ` Linus Torvalds
  2014-12-13  0:34                                                                       ` Sasha Levin
@ 2014-12-13  2:32                                                                       ` Dave Jones
  1 sibling, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-13  2:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sasha Levin, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, Dec 12, 2014 at 04:23:16PM -0800, Linus Torvalds wrote:
 > On Fri, Dec 12, 2014 at 3:54 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
 > >
 > > I ran bisection on trinity, rather than the kernel, and got the following
 > > result:
 > 
 > Heh. That commit is pretty small, but I guess the effect of having a
 > number of regular files open and being used on the trinity loads can
 > be almost arbitrarily large.
 > 
 > Where do those files get opened? What filesystem?

in the cwd where trinity is run from. In my case, ext4

 > > I've been running trinity f2be2d5ff^ on -next for two hours now, and there's
 > > no sign of a lockup. Previously it took ~10 minutes to trigger.
 > 
 > DaveJ?  Is there anything limiting the size of those files?

Nope. We could, for example, do a random truncate() with a huge size, and it
would try to create something enormous.  write()'s are limited to page size,
but there might be some other syscalls I haven't added safety guards to.
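
For example (a sketch only; the file name and size are made up, not what
trinity actually picked), a single random truncate() is enough to create
an enormous sparse file in the cwd:

    #define _FILE_OFFSET_BITS 64
    #include <unistd.h>
    #include <stdio.h>

    int main(void)
    {
        /* extend a test file to ~1TB; on ext4 this just becomes sparse */
        if (truncate("testfile", (off_t)1 << 40))
            perror("truncate");
        return 0;
    }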

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* [PATCH] sched: Fix lost reschedule in __cond_resched()
  2014-12-12 19:14                                                               ` Linus Torvalds
  2014-12-12 19:23                                                                 ` Dave Jones
  2014-12-12 19:58                                                                 ` David Lang
@ 2014-12-13  7:36                                                                 ` Ingo Molnar
  2014-12-14 18:04                                                                   ` Frederic Weisbecker
  2014-12-13  8:19                                                                 ` frequent lockups in 3.18rc4 Ingo Molnar
  2014-12-13 16:59                                                                 ` Dave Jones
  4 siblings, 1 reply; 486+ messages in thread
From: Ingo Molnar @ 2014-12-13  7:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> I'm also not sure if the bug ever happens with preemption 
> disabled. Sasha, was that you who reported that you cannot 
> reproduce it without preemption? It strikes me that there's a 
> race condition in __cond_resched() wrt preemption, for example: 
> we do
> 
>         __preempt_count_add(PREEMPT_ACTIVE);
>         __schedule();
>         __preempt_count_sub(PREEMPT_ACTIVE);
> 
> and in between the __schedule() and __preempt_count_sub(), if 
> an interrupt comes in and wakes up some important process, it 
> won't reschedule (because preemption is active), but then we 
> enable preemption again and don't check whether we should 
> reschedule (again), and we just go on our merry ways.

Indeed, that's a really good find regardless of whether it's the 
source of these lockups - the (untested) patch below ought to 
cure that.

> Now, I don't see how that could really matter for a long time - 
> returning to user space will check need_resched, and sleeping 
> will obviously force a reschedule anyway, so these kinds of 
> races should at most delay things by just a tiny amount, but 
> maybe there is some case where we screw up in a bigger way. So 
> I do *not* believe that the one in __cond_resched() matters, 
> but I'm giving it as an example of the kind of things that 
> could go wrong.

(as you later note) NOHZ is somewhat special in this regard, 
because there we try really hard not to run anything 
periodically, so a lost reschedule will matter more.

But ... I'd be surprised if this patch made a difference: it 
should normally not be possible to go idle with tasks on the 
runqueue (even with this bug present), and with at least one busy 
task on the CPU we get the regular scheduler tick which ought to 
hide such latencies.

It's nevertheless a good thing to fix, I'm just not sure it's the 
root cause of the observed lockup here.

Thanks,

	Ingo

--

Reported-by: Linus Torvalds <torvalds@linux-foundation.org> 

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bb398c0c5f08..532809aa0544 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4207,6 +4207,8 @@ static void __cond_resched(void)
 	__preempt_count_add(PREEMPT_ACTIVE);
 	__schedule();
 	__preempt_count_sub(PREEMPT_ACTIVE);
+	if (need_resched())
+		__schedule();
 }
 
 int __sched _cond_resched(void)

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 20:20                                                                   ` Linus Torvalds
@ 2014-12-13  7:43                                                                     ` Ingo Molnar
  0 siblings, 0 replies; 486+ messages in thread
From: Ingo Molnar @ 2014-12-13  7:43 UTC (permalink / raw)
  To: Linus Torvalds, Frédéric Weisbecker
  Cc: David Lang, Dave Jones, Chris Mason, Mike Galbraith,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, Dec 12, 2014 at 11:58 AM, David Lang <david@lang.hm> wrote:
> >
> > If the machine has NOHZ and has a cpu bound userspace task, 
> > it could take quite a while before userspace would trigger a 
> > reschedule (at least if I've understood the comments on this 
> > thread properly)
> 
> The thing is, we'd have to return to user space for that to 
> happen. And when we do that, we check the "should we schedule" 
> flag again. So races like this really shouldn't matter, but 
> there could be something kind-of-similar that just ends up 
> causing a wakeup to be delayed.

Furthermore there ought to be a scheduler tick active in that 
case - which won't be as fast as an immediate reschedule, but 
fast enough to beat the softlockup watchdog's threshold of 20 
seconds or so.

That is why I think it would be interesting to examine what the
locked-up state looks like: is the system truly locked up,
impossible to log in to, locks held but not released, etc., or is 
the lockup transient?

> But it would need to be delayed for seconds (for the RCU 
> threads) or for tens of seconds (for the watchdog) to matter.
> 
> Which just seems unlikely. Even the "very high load" thing 
> shouldn't really matter, since while that could delay one 
> particular thread being scheduled, it shouldn't delay the next 
> "should we schedule" test. In fact, high load would normally be 
> extected to make the next "should we schedule" come faster.
> 
> But this is where some load calculation overflow might screw 
> things up, of course.

Also, the percpu watchdog threads are SCHED_FIFO:99, woken up 
through percpu hrtimers, which are not easy to delay through high 
SCHED_OTHER load.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 17:47                                           ` Mike Galbraith
@ 2014-12-13  8:11                                             ` Ingo Molnar
  2014-12-13  9:57                                               ` Mike Galbraith
  0 siblings, 1 reply; 486+ messages in thread
From: Ingo Molnar @ 2014-12-13  8:11 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Linus Torvalds, Peter Zijlstra, Chris Mason, Dâniel Fraga,
	Dave Jones, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List


* Mike Galbraith <umgwanakikbuti@gmail.com> wrote:

> On Tue, 2014-12-02 at 08:33 -0800, Linus Torvalds wrote:
> 
> > Looking again at that patch (the commit message still doesn't strike
> > me as wonderfully explanatory :^) makes me worry, though.
> > 
> > Is that
> > 
> >         if (rq->skip_clock_update-- > 0)
> >                 return;
> > 
> > really right? If skip_clock_update was zero (normal), it now gets set
> > to -1, which has its own specific meaning (see "force clock update"
> > comment in kernel/sched/rt.c). Is that intentional? That seems insane.
> 
> Yeah, it was intentional.  Least lines.
> 
> > Or should it be
> > 
> >         if (rq->skip_clock_update > 0) {
> >                 rq->skip_clock_update = 0;
> >                 return;
> >         }
> > 
> > or what? Maybe there was a reason the patch never got applied even to -tip.
> 
> Peterz was looking at corner case proofing the thing.  Saving those
> cycles has been entirely too annoying.
> 
> https://lkml.org/lkml/2014/4/8/295

Hm, so that discussion died with:

  https://lkml.org/lkml/2014/4/8/343

Did you ever get around to trying Peter's patch?

But ... I've yet to see rq_clock problems cause actual lockups. 
That's the main problem we have with its (un)robustness and why 
Peter created that rq_clock debug facility: bugs there cause 
latencies but no easily actionable symptoms, which are much 
harder to debug.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 19:14                                                               ` Linus Torvalds
                                                                                   ` (2 preceding siblings ...)
  2014-12-13  7:36                                                                 ` [PATCH] sched: Fix lost reschedule in __cond_resched() Ingo Molnar
@ 2014-12-13  8:19                                                                 ` Ingo Molnar
  2014-12-13  8:27                                                                   ` Ingo Molnar
  2014-12-13 16:59                                                                 ` Dave Jones
  4 siblings, 1 reply; 486+ messages in thread
From: Ingo Molnar @ 2014-12-13  8:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, Dec 12, 2014 at 10:54 AM, Dave Jones <davej@redhat.com> wrote:
>
> >
> > Something that's still making me wonder if it's some kind of 
> > hardware problem is the non-deterministic nature of this bug.
> 
> I'd expect it to be a race condition, though. Which can easily 
> cause these kinds of issues, and the timing will be pretty 
> random even if the load is very regular.
> 
> And we know that the scheduler has an integer overflow under 
> Sasha's loads, although I didn't hear anything from Ingo and 
> friends about it. Ingo/Peter, you were cc'd on that report, 
> where at least one of the multiplications in wake_affine() ended 
> up overflowing..

Just to make sure, is there any other wake_affine report other 
than the one in this thread? (I tried a wake_affine full text 
search on my inbox and didn't find anything that appeared 
relevant.)

> Some scheduler thing that overflows only under heavy load, and 
> screws up scheduling could easily account for the RCU thread 
> thing. I see it *less* easily accounting for DaveJ's case, 
> though, because the watchdog is running at RT priority, and the 
> scheduler would have to screw up much more to then not schedule 
> an RT task, but..

Yeah, the RT scheduler is harder (but not impossible) to confuse 
due to its simplicity, but scheduler counts overflowing could 
definitely cause all sorts of trouble and make debugging harder, 
so we want to fix it regardless of its likelihood of causing 
lockups.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13  8:19                                                                 ` frequent lockups in 3.18rc4 Ingo Molnar
@ 2014-12-13  8:27                                                                   ` Ingo Molnar
  2014-12-13 14:15                                                                     ` Sasha Levin
  0 siblings, 1 reply; 486+ messages in thread
From: Ingo Molnar @ 2014-12-13  8:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Fri, Dec 12, 2014 at 10:54 AM, Dave Jones <davej@redhat.com> wrote:
> >
> > >
> > > Something that's still making me wonder if it's some kind of 
> > > hardware problem is the non-deterministic nature of this bug.
> > 
> > I'd expect it to be a race condition, though. Which can easily 
> > cause these kinds of issues, and the timing will be pretty 
> > random even if the load is very regular.
> > 
> > And we know that the scheduler has an integer overflow under 
> > Sasha's loads, although I didn't hear anything from Ingo and 
> > friends about it. Ingo/Peter, you were cc'd on that report, 
> > where at least one of the multiplications in wake_affine() ended 
> > up overflowing..
> 
> Just to make sure, is there any other wake_affine report other 
> than the one in this thread? (I tried a wake_affine full text 
> search on my inbox and didn't find anything that appeared 
> relevant.)

Found the report from Sasha:

    sched: odd values for effective load calculations

right?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 21:23                                                                     ` Sasha Levin
  2014-12-13  0:58                                                                       ` Paul E. McKenney
@ 2014-12-13  8:30                                                                       ` Ingo Molnar
  2014-12-13 15:53                                                                         ` Sasha Levin
  1 sibling, 1 reply; 486+ messages in thread
From: Ingo Molnar @ 2014-12-13  8:30 UTC (permalink / raw)
  To: Sasha Levin
  Cc: paulmck, David Lang, Linus Torvalds, Dave Jones, Chris Mason,
	Mike Galbraith, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List


* Sasha Levin <sasha.levin@oracle.com> wrote:

> On 12/12/2014 03:34 PM, Paul E. McKenney wrote:
> > On Fri, Dec 12, 2014 at 11:58:50AM -0800, David Lang wrote:
> >> > On Fri, 12 Dec 2014, Linus Torvalds wrote:
> >> > 
> >>> > >I'm also not sure if the bug ever happens with preemption disabled.
> >>> > >Sasha, was that you who reported that you cannot reproduce it without
> >>> > >preemption? It strikes me that there's a race condition in
> >>> > >__cond_resched() wrt preemption, for example: we do
> >>> > >
> >>> > >       __preempt_count_add(PREEMPT_ACTIVE);
> >>> > >       __schedule();
> >>> > >       __preempt_count_sub(PREEMPT_ACTIVE);
> >>> > >
> >>> > >and in between the __schedule() and __preempt_count_sub(), if an
> >>> > >interrupt comes in and wakes up some important process, it won't
> >>> > >reschedule (because preemption is active), but then we enable
> >>> > >preemption again and don't check whether we should reschedule (again),
> >>> > >and we just go on our merry ways.
> >>> > >
> >>> > >Now, I don't see how that could really matter for a long time -
> >>> > >returning to user space will check need_resched, and sleeping will
> >>> > >obviously force a reschedule anyway, so these kinds of races should at
> >>> > >most delay things by just a tiny amount,
> >> > 
> >> > If the machine has NOHZ and has a cpu bound userspace task, it could
> >> > take quite a while before userspace would trigger a reschedule (at
> >> > least if I've understood the comments on this thread properly)
> > Dave, Sasha, if you guys are running CONFIG_NO_HZ_FULL=y and
> > CONFIG_NO_HZ_FULL_ALL=y, please let me know.  I am currently assuming
> > that none of your CPUs are in NO_HZ_FULL mode.  If this assumption is
> > incorrect, there are some other pieces of RCU that I should be taking
> > a hard look at.
> 
> This is my no_hz related config:
> 
> $ grep NO_HZ .config
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_NO_HZ_IDLE is not set
> CONFIG_NO_HZ_FULL=y
> CONFIG_NO_HZ_FULL_ALL=y

Just curious, if you disable NO_HZ_FULL_ALL, does the bug change?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13  8:11                                             ` Ingo Molnar
@ 2014-12-13  9:57                                               ` Mike Galbraith
  0 siblings, 0 replies; 486+ messages in thread
From: Mike Galbraith @ 2014-12-13  9:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Peter Zijlstra, Chris Mason, Dâniel Fraga,
	Dave Jones, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Sat, 2014-12-13 at 09:11 +0100, Ingo Molnar wrote: 
> * Mike Galbraith <umgwanakikbuti@gmail.com> wrote:
> 
> > On Tue, 2014-12-02 at 08:33 -0800, Linus Torvalds wrote:
> > 
> > > Looking again at that patch (the commit message still doesn't strike
> > > me as wonderfully explanatory :^) makes me worry, though.
> > > 
> > > Is that
> > > 
> > >         if (rq->skip_clock_update-- > 0)
> > >                 return;
> > > 
> > > really right? If skip_clock_update was zero (normal), it now gets set
> > > to -1, which has its own specific meaning (see "force clock update"
> > > comment in kernel/sched/rt.c). Is that intentional? That seems insane.
> > 
> > Yeah, it was intentional.  Least lines.
> > 
> > > Or should it be
> > > 
> > >         if (rq->skip_clock_update > 0) {
> > >                 rq->skip_clock_update = 0;
> > >                 return;
> > >         }
> > > 
> > > or what? Maybe there was a reason the patch never got applied even to -tip.
> > 
> > Peterz was looking at corner case proofing the thing.  Saving those
> > cycles has been entirely too annoying.
> > 
> > https://lkml.org/lkml/2014/4/8/295
> 
> Hm, so that discussion died with:
> 
>   https://lkml.org/lkml/2014/4/8/343
> 
> Did you ever get around to trying Peter's patch?

I couldn't plug it into the production -ENOBOOT IO beasts from hell, but
did run it on my little desktop a bit.

> But ... I've yet to see rq_clock problems cause actual lockups. 
> That's the main problem we have with its (un)robustness and why 
> Peter created that rq_clock debug facility: bugs there cause 
> latencies but no easily actionable symptoms, which are much 
> harder to debug.

If the watchdog gets credit for the time spent detecting a zillion disks, it
can end up throttled for what is effectively forever.

-Mike


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13  0:58                                                                       ` Paul E. McKenney
@ 2014-12-13 12:08                                                                         ` Paul E. McKenney
  0 siblings, 0 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-13 12:08 UTC (permalink / raw)
  To: Sasha Levin
  Cc: David Lang, Linus Torvalds, Dave Jones, Chris Mason,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List

On Fri, Dec 12, 2014 at 04:58:07PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 12, 2014 at 04:23:56PM -0500, Sasha Levin wrote:
> > On 12/12/2014 03:34 PM, Paul E. McKenney wrote:
> > > On Fri, Dec 12, 2014 at 11:58:50AM -0800, David Lang wrote:
> > >> > On Fri, 12 Dec 2014, Linus Torvalds wrote:
> > >> > 
> > >>> > >I'm also not sure if the bug ever happens with preemption disabled.
> > >>> > >Sasha, was that you who reported that you cannot reproduce it without
> > >>> > >preemption? It strikes me that there's a race condition in
> > >>> > >__cond_resched() wrt preemption, for example: we do
> > >>> > >
> > >>> > >       __preempt_count_add(PREEMPT_ACTIVE);
> > >>> > >       __schedule();
> > >>> > >       __preempt_count_sub(PREEMPT_ACTIVE);
> > >>> > >
> > >>> > >and in between the __schedule() and __preempt_count_sub(), if an
> > >>> > >interrupt comes in and wakes up some important process, it won't
> > >>> > >reschedule (because preemption is active), but then we enable
> > >>> > >preemption again and don't check whether we should reschedule (again),
> > >>> > >and we just go on our merry ways.
> > >>> > >
> > >>> > >Now, I don't see how that could really matter for a long time -
> > >>> > >returning to user space will check need_resched, and sleeping will
> > >>> > >obviously force a reschedule anyway, so these kinds of races should at
> > >>> > >most delay things by just a tiny amount,
> > >> > 
> > >> > If the machine has NOHZ and has a cpu bound userspace task, it could
> > >> > take quite a while before userspace would trigger a reschedule (at
> > >> > least if I've understood the comments on this thread properly)
> > > Dave, Sasha, if you guys are running CONFIG_NO_HZ_FULL=y and
> > > CONFIG_NO_HZ_FULL_ALL=y, please let me know.  I am currently assuming
> > > that none of your CPUs are in NO_HZ_FULL mode.  If this assumption is
> > > incorrect, there are some other pieces of RCU that I should be taking
> > > a hard look at.
> > 
> > This is my no_hz related config:
> > 
> > $ grep NO_HZ .config
> > CONFIG_NO_HZ_COMMON=y
> > # CONFIG_NO_HZ_IDLE is not set
> > CONFIG_NO_HZ_FULL=y
> > CONFIG_NO_HZ_FULL_ALL=y
> > CONFIG_NO_HZ_FULL_SYSIDLE=y
> > CONFIG_NO_HZ_FULL_SYSIDLE_SMALL=8
> > CONFIG_NO_HZ=y
> > CONFIG_RCU_FAST_NO_HZ=y
> > 
> > And from dmesg:
> > 
> > [    0.000000] Preemptible hierarchical RCU implementation.
> > [    0.000000]  RCU debugfs-based tracing is enabled.
> > [    0.000000]  Hierarchical RCU autobalancing is disabled.
> > [    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
> > [    0.000000]  Additional per-CPU info printed with stalls.
> > [    0.000000]  RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=28.
> > [    0.000000]  RCU kthread priority: 1.
> > [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=28
> > [    0.000000] NR_IRQS:524544 nr_irqs:648 16
> > [    0.000000] NO_HZ: Clearing 0 from nohz_full range for timekeeping
> > [    0.000000] NO_HZ: Full dynticks CPUs: 1-27.
> > [    0.000000]  Offload RCU callbacks from CPUs: 1-27.
> 
> Thank you, Sasha.  Looks like I have a few more places to take a hard
> look at, then!

And one effect of CONFIG_NO_HZ_FULL=y and CONFIG_NO_HZ_FULL_ALL=y is
that all of the grace-period kthreads are pinned to CPU 0.  In addition,
all of CPUs 1-27 are offloaded, and all of the resulting rcuo kthreads
(which invoke RCU callbacks) are also pinned to CPU 0.  If you are then
running a heavy in-kernel workload that generates lots of callbacks, it
is easy to imagine that CPU 0 might be getting overloaded.  After all,
this combination of Kconfig parameters was designed for HPC and real-time
workloads that spend most of their time in userspace.

If you are allowing your workload to run on CPU 0, it would be very
interesting to see what happens if you restrict your workload to run on
CPUs 1-27.

Alternatively, you could boot with nohz_full=2-27 (or maybe even
nohz_full=4-27).  This will override CONFIG_NO_HZ_FULL_ALL=y and will
provide two (or four with 4-27) housekeeping CPUs that are available to
run things like RCU grace-period kthreads and RCU callback processing.
This might allow RCU to get the CPU bandwidth it needs despite
competition from your workload.
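
For reference, restricting the workload to CPUs 1-27 is just a CPU-affinity
call before the test processes start.  A minimal userspace sketch (not from
this thread; the 28-CPU range comes from the dmesg above, and CPU 0 is
assumed to be the housekeeping CPU):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	cpu_set_t set;
	int cpu;

	if (argc < 2) {
		fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
		return 1;
	}

	CPU_ZERO(&set);
	for (cpu = 1; cpu < 28; cpu++)	/* leave CPU 0 for the RCU kthreads */
		CPU_SET(cpu, &set);

	if (sched_setaffinity(0, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return 1;
	}

	execvp(argv[1], argv + 1);	/* e.g. the trinity command line */
	perror("execvp");
	return 1;
}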

							Thanx, Paul


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13  8:27                                                                   ` Ingo Molnar
@ 2014-12-13 14:15                                                                     ` Sasha Levin
  0 siblings, 0 replies; 486+ messages in thread
From: Sasha Levin @ 2014-12-13 14:15 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Peter Zijlstra,
	Dâniel Fraga, Paul E. McKenney, Linux Kernel Mailing List

On 12/13/2014 03:27 AM, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
>>
>> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>>> On Fri, Dec 12, 2014 at 10:54 AM, Dave Jones <davej@redhat.com> wrote:
>>>
>>>>
>>>> Something that's still making me wonder if it's some kind of 
>>>> hardware problem is the non-deterministic nature of this bug.
>>>
>>> I'd expect it to be a race condition, though. Which can easily 
>>> cause these kinds of issues, and the timing will be pretty 
>>> random even if the load is very regular.
>>>
>>> And we know that the scheduler has an integer overflow under 
>>> Sasha's loads, although I didn't hear anything from Ingo and 
>>> friends about it. Ingo/Peter, you were cc'd on that report, 
> > >>> > >where at least one of the multiplications in wake_affine() ended
>>> up overflowing..
>>
>> Just to make sure, is there any other wake_affine report other 
>> than the one in this thread? (I tried a wake_affine full text 
>> search on my inbox and didn't find anything that appeared 
>> relevant.)
> 
> Found the report from Sasha:
> 
>     sched: odd values for effective load calculations
> 
> right?

Yup, that's the one.


Thanks,
Sasha


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13  8:30                                                                       ` Ingo Molnar
@ 2014-12-13 15:53                                                                         ` Sasha Levin
  2014-12-13 18:07                                                                           ` Paul E. McKenney
  0 siblings, 1 reply; 486+ messages in thread
From: Sasha Levin @ 2014-12-13 15:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: paulmck, David Lang, Linus Torvalds, Dave Jones, Chris Mason,
	Mike Galbraith, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List

On 12/13/2014 03:30 AM, Ingo Molnar wrote:
>> > This is my no_hz related config:
>> > 
>> > $ grep NO_HZ .config
>> > CONFIG_NO_HZ_COMMON=y
>> > # CONFIG_NO_HZ_IDLE is not set
>> > CONFIG_NO_HZ_FULL=y
>> > CONFIG_NO_HZ_FULL_ALL=y
> Just curious, if you disable NO_HZ_FULL_ALL, does the bug change?

On 12/13/2014 07:08 AM, Paul E. McKenney wrote:
> Alternatively, you could boot with nohz_full=2-27 (or maybe even
> nohz_full=4-27).  This will override CONFIG_NO_HZ_FULL_ALL=y and will
> provide two (or four with 4-27) housekeeping CPUs that are available to
> run things like RCU grace-period kthreads and RCU callback processing.
> This might allow RCU to get the CPU bandwidth it needs despite
> competition from your workload.

I've tried both nohz_full=4-27 and disabling CONFIG_NO_HZ_FULL_ALL
altogether, but I'm still seeing the stall:

[  725.670017] INFO: rcu_preempt detected stalls on CPUs/tasks:
[  725.670017]  0: (11 ticks this GP) idle=bbd/140000000000002/0 softirq=11529/11529 fqs=0 last_accelerate: 9d0e/a648, nonlazy_posted: 721357, ..
[  725.670017]  (detected by 16, t=2102 jiffies, g=9857, c=9856, q=2581)
[  725.670017] Task dump for CPU 0:
[  725.670017] kworker/0:1     S ffff8800633abde8 13016   520      2 0x10080008
[  725.670017]  ffffffffb03027a8 ffff880a70f24017 ffffffffb043ef40 ffff88005ffea310
[  725.670017]  0000000000000000 dfffe90000000000 0000000000000000 1ffffffff63bcdeb
[  725.670017]  ffff88006be15030 ffffffffb1de6f58 ffffffffffffff10 ffffffffb0301237
[  725.670017] Call Trace:
[  725.670017]  [<ffffffffb03027a8>] ? retint_restore_args+0x13/0x13
[  725.670017]  [<ffffffffb0301237>] ? _raw_spin_unlock_irq+0x57/0x200
[  725.670017]  [<ffffffffb0301203>] ? _raw_spin_unlock_irq+0x23/0x200
[  725.670017]  [<ffffffffa04630cb>] ? worker_thread+0x15b/0x1680
[  725.670017]  [<ffffffffb02effef>] ? __schedule+0xf6f/0x2fc0
[  725.670017]  [<ffffffffa0462f70>] ? process_one_work+0x1650/0x1650
[  725.670017]  [<ffffffffa047ae12>] ? kthread+0x1f2/0x2b0
[  725.670017]  [<ffffffffa047ac20>] ? kthread_worker_fn+0x6a0/0x6a0
[  725.670017]  [<ffffffffb03018bc>] ? ret_from_fork+0x7c/0xb0
[  725.670017]  [<ffffffffa047ac20>] ? kthread_worker_fn+0x6a0/0x6a0


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13  0:44                                                                         ` Linus Torvalds
@ 2014-12-13 16:28                                                                           ` Jeff Chua
  0 siblings, 0 replies; 486+ messages in thread
From: Jeff Chua @ 2014-12-13 16:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sasha Levin, Dave Jones, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List

I started seeing this behavior somewhere around 3.16 with
CONFIG_PREEMPT set. Setting CONFIG_PREEMPT off seems to help. And,
yes, it happens on high load (compiling mozilla, xul) and using qemu
chroot to compile mesa.

I'm seeing a few people bisecting already. If you want, I could start
bisecting too, but 3.16 was unstable for me as I'm still on reiserfs.

Jeff.



On Sat, Dec 13, 2014 at 8:44 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Dec 12, 2014 at 4:34 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>>
>> Right, it's virtio-9p. However, virtio-9p acts merely as a proxy to an underlying
>> tmpfs - so while it's slow, I don't think it's way slower than the average disk
>> backed ext4.
>
> I was thinking more in the sense of "how much of the trouble is about
> something like tmpfs eating tons of memory when trinity starts doing
> random system calls on those files".
>
> I was also thinking that some of it might be filesystem-specific. We
> already *did* see one trace where it was in the loop getting virtio
> channel data. Maybe it's actually possible to overwhelm the 9p
> filesystem exactly because the backing store is tmpfs, and basically
> have a CPU 100% busy handling ring events from the virtual
> filesystem..
>
> But I'm just flailing..
>
>                      Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 19:14                                                               ` Linus Torvalds
                                                                                   ` (3 preceding siblings ...)
  2014-12-13  8:19                                                                 ` frequent lockups in 3.18rc4 Ingo Molnar
@ 2014-12-13 16:59                                                                 ` Dave Jones
  2014-12-13 18:04                                                                   ` Paul E. McKenney
  2014-12-13 22:36                                                                   ` Dave Jones
  4 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-13 16:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Fri, Dec 12, 2014 at 11:14:06AM -0800, Linus Torvalds wrote:
 > On Fri, Dec 12, 2014 at 10:54 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > Something that's still making me wonder if it's some kind of hardware
 > > problem is the non-deterministic nature of this bug.
 > 
 > I'd expect it to be a race condition, though. Which can easily cause
 > these kinds of issues, and the timing will be pretty random even if
 > the load is very regular.
 > 
 > And we know that the scheduler has an integer overflow under Sasha's
 > loads, although I didn't hear anything from Ingo and friends about it.
 > Ingo/Peter, you were cc'd on that report, where at least one of the
 > multiplications in wake_affine() ended up overflowing..
 > 
 > Some scheduler thing that overflows only under heavy load, and screws
 > up scheduling could easily account for the RCU thread thing. I see it
 > *less* easily accounting for DaveJ's case, though, because the
 > watchdog is running at RT priority,  and the scheduler would have to
 > screw up much more to then not schedule an RT task, but..
 > 
 > I'm also not sure if the bug ever happens with preemption disabled.

Bah, so I see some watchdog traces with preemption off, and that then
taints the kernel, and the fuzzing stops.  I'll hack something up
so it ignores the taint and keeps going. All I really care about here
is the "machine hangs completely" case, which the trace below didn't
hit..

(back to fuzzing almost everything, not just lsetxattr btw)

[34917.468470] WARNING: CPU: 1 PID: 9226 at kernel/watchdog.c:317 watchdog_overflow_callback+0x9c/0xd0()
[34917.468500] Watchdog detected hard LOCKUP on cpu 1
[34917.468516] CPU: 1 PID: 9226 Comm: trinity-c107 Not tainted 3.18.0+ #102 
[34917.468542] [loadavg: 155.62 139.10 140.12 10/405 11756]
[34917.468559]  ffffffff81a65d99 000000005606cf60 ffff880244205b98 ffffffff817c4f75
[34917.468591]  ffffffff810cd5a1 ffff880244205bf0 ffff880244205bd8 ffffffff81077cb1
[34917.468623]  ffff880244205bd8 ffff880243c55388 0000000000000000 ffff880244205d30
[34917.468655] Call Trace:
[34917.468667]  <NMI>  [<ffffffff817c4f75>] dump_stack+0x4e/0x68
[34917.468696]  [<ffffffff810cd5a1>] ? console_unlock+0x1f1/0x4e0
[34917.468718]  [<ffffffff81077cb1>] warn_slowpath_common+0x81/0xa0
[34917.468740]  [<ffffffff81077d25>] warn_slowpath_fmt+0x55/0x70
[34917.468761]  [<ffffffff817c3710>] ? __slab_alloc+0x3c4/0x58f
[34917.468783]  [<ffffffff8112bce0>] ? restart_watchdog_hrtimer+0x60/0x60
[34917.468806]  [<ffffffff8112bd7c>] watchdog_overflow_callback+0x9c/0xd0
[34917.468830]  [<ffffffff8116ebed>] __perf_event_overflow+0x9d/0x2a0
[34917.468856]  [<ffffffff8116d7c3>] ? perf_event_update_userpage+0x103/0x180
[34917.469785]  [<ffffffff8116d6c0>] ? perf_event_task_disable+0x90/0x90
[34917.470705]  [<ffffffff8116f7c4>] perf_event_overflow+0x14/0x20
[34917.471632]  [<ffffffff8101e749>] intel_pmu_handle_irq+0x1f9/0x3f0
[34917.472553]  [<ffffffff81017cbb>] perf_event_nmi_handler+0x2b/0x50
[34917.473459]  [<ffffffff81007330>] nmi_handle+0xc0/0x1b0
[34917.474355]  [<ffffffff81007275>] ? nmi_handle+0x5/0x1b0
[34917.475245]  [<ffffffff8100761a>] default_do_nmi+0x4a/0x140
[34917.476128]  [<ffffffff810077d0>] do_nmi+0xc0/0x100
[34917.477012]  [<ffffffff817d237a>] end_repeat_nmi+0x1e/0x2e
[34917.477902]  [<ffffffff81383a37>] ? debug_check_no_obj_freed+0xe7/0x250
[34917.478788]  [<ffffffff81383a37>] ? debug_check_no_obj_freed+0xe7/0x250
[34917.479660]  [<ffffffff81383a37>] ? debug_check_no_obj_freed+0xe7/0x250
[34917.480523]  <<EOE>>  [<ffffffff8117b87f>] free_pages_prepare+0x1af/0x240
[34917.481396]  [<ffffffff8117dd51>] __free_pages_ok+0x21/0x100
[34917.482270]  [<ffffffff8117de4b>] free_compound_page+0x1b/0x20
[34917.483144]  [<ffffffff81184d23>] __put_compound_page+0x23/0x30
[34917.484022]  [<ffffffff81184da8>] put_compound_page+0x48/0x2e0
[34917.484895]  [<ffffffff811854d9>] release_pages+0x239/0x270
[34917.485768]  [<ffffffff811b9d1d>] free_pages_and_swap_cache+0x8d/0xa0
[34917.486648]  [<ffffffff811a25d4>] tlb_flush_mmu_free+0x34/0x60
[34917.487530]  [<ffffffff811a4021>] unmap_single_vma+0x6d1/0x900
[34917.488405]  [<ffffffff811a4d51>] unmap_vmas+0x51/0xa0
[34917.489277]  [<ffffffff811ae175>] exit_mmap+0xe5/0x1a0
[34917.490143]  [<ffffffff81074d6b>] mmput+0x6b/0x100
[34917.490995]  [<ffffffff810792ae>] do_exit+0x29e/0xb60
[34917.491823]  [<ffffffff8107abfc>] do_group_exit+0x4c/0xc0
[34917.492645]  [<ffffffff8107ac84>] SyS_exit_group+0x14/0x20
[34917.493462]  [<ffffffff817d0589>] tracesys_phase2+0xd4/0xd9
[34917.494268] ---[ end trace c48441b18b6523a2 ]---
[34917.495171] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 26.690 msecs
[34917.496031] perf interrupt took too long (211387 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
[34967.056860] INFO: rcu_sched detected stalls on CPUs/tasks:
[34967.057898] 	1: (0 ticks this GP) idle=b4c/0/0 softirq=2168971/2168971 
[34967.058900] 	(detected by 2, t=6002 jiffies, g=1058044, c=1058043, q=0)
[34967.059867] Task dump for CPU 1:
[34967.060827] swapper/1       R  running task    14576     0      1 0x00200000
[34967.061802]  0000000142b4be38 4a979bec19cdc3d2 ffffe8ffff003200 0000000000000003
[34967.062786]  ffffffff81cb1b80 0000000000000001 ffff880242b4be88 ffffffff8165ff65
[34967.063759]  00001fcecca713d2 ffffffff81cb1ca0 ffffffff81cb1b80 ffffffff81d215b0
[34967.064721] Call Trace:
[34967.065649]  [<ffffffff8165ff65>] ? cpuidle_enter_state+0x55/0x190
[34967.066570]  [<ffffffff81660157>] ? cpuidle_enter+0x17/0x20
[34967.067498]  [<ffffffff810bd6c5>] ? cpu_startup_entry+0x355/0x410
[34967.068425]  [<ffffffff8103016a>] ? start_secondary+0x1aa/0x230
[35027.731690] INFO: rcu_sched detected stalls on CPUs/tasks:
[35027.732701] 	1: (0 ticks this GP) idle=b82/0/0 softirq=2168971/2168971 
[35027.733652] 	(detected by 2, t=6002 jiffies, g=1058047, c=1058046, q=0)
[35027.734593] Task dump for CPU 1:
[35027.735514] swapper/1       R  running task    14576     0      1 0x00200000
[35027.736445]  0000000142b4be38 4a979bec19cdc3d2 ffffe8ffff003200 0000000000000004
[35027.737369]  ffffffff81cb1b80 0000000000000001 ffff880242b4be88 ffffffff8165ff65
[35027.738285]  00001fde808fc8c8 ffffffff81cb1cf8 ffffffff81cb1b80 ffffffff81d215b0
[35027.739206] Call Trace:
[35027.740114]  [<ffffffff8165ff65>] ? cpuidle_enter_state+0x55/0x190
[35027.741032]  [<ffffffff81660157>] ? cpuidle_enter+0x17/0x20
[35027.741949]  [<ffffffff810bd6c5>] ? cpu_startup_entry+0x355/0x410
[35027.742858]  [<ffffffff8103016a>] ? start_secondary+0x1aa/0x230
[35982.698415] perf interrupt took too long (209762 > 10000), lowering kernel.perf_event_max_sample_rate to 12500
[37306.241794] perf interrupt took too long (208160 > 19841), lowering kernel.perf_event_max_sample_rate to 6300
[38626.487390] perf interrupt took too long (206565 > 39062), lowering kernel.perf_event_max_sample_rate to 3200
[39781.429034] perf interrupt took too long (204990 > 78125), lowering kernel.perf_event_max_sample_rate to 1600
[41041.380281] perf interrupt took too long (203427 > 156250), lowering kernel.perf_event_max_sample_rate to 800


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 16:59                                                                 ` Dave Jones
@ 2014-12-13 18:04                                                                   ` Paul E. McKenney
  2014-12-13 20:41                                                                     ` Dave Jones
  2014-12-13 22:36                                                                   ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-13 18:04 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Linux Kernel Mailing List

On Sat, Dec 13, 2014 at 11:59:15AM -0500, Dave Jones wrote:
> On Fri, Dec 12, 2014 at 11:14:06AM -0800, Linus Torvalds wrote:
>  > On Fri, Dec 12, 2014 at 10:54 AM, Dave Jones <davej@redhat.com> wrote:
>  > >
>  > > Something that's still making me wonder if it's some kind of hardware
>  > > problem is the non-deterministic nature of this bug.
>  > 
>  > I'd expect it to be a race condition, though. Which can easily cause
>  > these kinds of issues, and the timing will be pretty random even if
>  > the load is very regular.
>  > 
>  > And we know that the scheduler has an integer overflow under Sasha's
>  > loads, although I didn't hear anything from Ingo and friends about it.
>  > Ingo/Peter, you were cc'd on that report, where at least one of the
 >  > multiplications in wake_affine() ended up overflowing..
>  > 
>  > Some scheduler thing that overflows only under heavy load, and screws
>  > up scheduling could easily account for the RCU thread thing. I see it
>  > *less* easily accounting for DaveJ's case, though, because the
>  > watchdog is running at RT priority,  and the scheduler would have to
>  > screw up much more to then not schedule an RT task, but..
>  > 
>  > I'm also not sure if the bug ever happens with preemption disabled.
> 
> Bah, so I see some watchdog traces with preemption off, and that then
> taints the kernel, and the fuzzing stops.  I'll hack something up
> so it ignores the taint and keeps going. All I really care about here
> is the "machine hangs completely" case, which the trace below didn't
> hit..
> 
> (back to fuzzing almost everything, not just lsetxattr btw)

Hmmm...  This one looks like the RCU grace-period kthread is getting
starved: "idle=b4c/0/0".  Is this running with the "dangerous" patch
that sets these kthreads to RT priority?
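
For reference, the patch itself is not reproduced here; giving a kthread RT
priority is typically a single call, roughly like the sketch below, where the
task pointer and the priority value are placeholders rather than the actual
patch:

#include <linux/sched.h>

/* Placeholder names; not the actual patch being referred to above. */
static void boost_kthread_to_rt(struct task_struct *kt)
{
	struct sched_param sp = { .sched_priority = 1 };

	sched_setscheduler_nocheck(kt, SCHED_FIFO, &sp);
}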

							Thanx, Paul

> [34917.468470] WARNING: CPU: 1 PID: 9226 at kernel/watchdog.c:317 watchdog_overflow_callback+0x9c/0xd0()
> [34917.468500] Watchdog detected hard LOCKUP on cpu 1
> [34917.468516] CPU: 1 PID: 9226 Comm: trinity-c107 Not tainted 3.18.0+ #102 
> [34917.468542] [loadavg: 155.62 139.10 140.12 10/405 11756]
> [34917.468559]  ffffffff81a65d99 000000005606cf60 ffff880244205b98 ffffffff817c4f75
> [34917.468591]  ffffffff810cd5a1 ffff880244205bf0 ffff880244205bd8 ffffffff81077cb1
> [34917.468623]  ffff880244205bd8 ffff880243c55388 0000000000000000 ffff880244205d30
> [34917.468655] Call Trace:
> [34917.468667]  <NMI>  [<ffffffff817c4f75>] dump_stack+0x4e/0x68
> [34917.468696]  [<ffffffff810cd5a1>] ? console_unlock+0x1f1/0x4e0
> [34917.468718]  [<ffffffff81077cb1>] warn_slowpath_common+0x81/0xa0
> [34917.468740]  [<ffffffff81077d25>] warn_slowpath_fmt+0x55/0x70
> [34917.468761]  [<ffffffff817c3710>] ? __slab_alloc+0x3c4/0x58f
> [34917.468783]  [<ffffffff8112bce0>] ? restart_watchdog_hrtimer+0x60/0x60
> [34917.468806]  [<ffffffff8112bd7c>] watchdog_overflow_callback+0x9c/0xd0
> [34917.468830]  [<ffffffff8116ebed>] __perf_event_overflow+0x9d/0x2a0
> [34917.468856]  [<ffffffff8116d7c3>] ? perf_event_update_userpage+0x103/0x180
> [34917.469785]  [<ffffffff8116d6c0>] ? perf_event_task_disable+0x90/0x90
> [34917.470705]  [<ffffffff8116f7c4>] perf_event_overflow+0x14/0x20
> [34917.471632]  [<ffffffff8101e749>] intel_pmu_handle_irq+0x1f9/0x3f0
> [34917.472553]  [<ffffffff81017cbb>] perf_event_nmi_handler+0x2b/0x50
> [34917.473459]  [<ffffffff81007330>] nmi_handle+0xc0/0x1b0
> [34917.474355]  [<ffffffff81007275>] ? nmi_handle+0x5/0x1b0
> [34917.475245]  [<ffffffff8100761a>] default_do_nmi+0x4a/0x140
> [34917.476128]  [<ffffffff810077d0>] do_nmi+0xc0/0x100
> [34917.477012]  [<ffffffff817d237a>] end_repeat_nmi+0x1e/0x2e
> [34917.477902]  [<ffffffff81383a37>] ? debug_check_no_obj_freed+0xe7/0x250
> [34917.478788]  [<ffffffff81383a37>] ? debug_check_no_obj_freed+0xe7/0x250
> [34917.479660]  [<ffffffff81383a37>] ? debug_check_no_obj_freed+0xe7/0x250
> [34917.480523]  <<EOE>>  [<ffffffff8117b87f>] free_pages_prepare+0x1af/0x240
> [34917.481396]  [<ffffffff8117dd51>] __free_pages_ok+0x21/0x100
> [34917.482270]  [<ffffffff8117de4b>] free_compound_page+0x1b/0x20
> [34917.483144]  [<ffffffff81184d23>] __put_compound_page+0x23/0x30
> [34917.484022]  [<ffffffff81184da8>] put_compound_page+0x48/0x2e0
> [34917.484895]  [<ffffffff811854d9>] release_pages+0x239/0x270
> [34917.485768]  [<ffffffff811b9d1d>] free_pages_and_swap_cache+0x8d/0xa0
> [34917.486648]  [<ffffffff811a25d4>] tlb_flush_mmu_free+0x34/0x60
> [34917.487530]  [<ffffffff811a4021>] unmap_single_vma+0x6d1/0x900
> [34917.488405]  [<ffffffff811a4d51>] unmap_vmas+0x51/0xa0
> [34917.489277]  [<ffffffff811ae175>] exit_mmap+0xe5/0x1a0
> [34917.490143]  [<ffffffff81074d6b>] mmput+0x6b/0x100
> [34917.490995]  [<ffffffff810792ae>] do_exit+0x29e/0xb60
> [34917.491823]  [<ffffffff8107abfc>] do_group_exit+0x4c/0xc0
> [34917.492645]  [<ffffffff8107ac84>] SyS_exit_group+0x14/0x20
> [34917.493462]  [<ffffffff817d0589>] tracesys_phase2+0xd4/0xd9
> [34917.494268] ---[ end trace c48441b18b6523a2 ]---
> [34917.495171] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 26.690 msecs
> [34917.496031] perf interrupt took too long (211387 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
> [34967.056860] INFO: rcu_sched detected stalls on CPUs/tasks:
> [34967.057898] 	1: (0 ticks this GP) idle=b4c/0/0 softirq=2168971/2168971 
> [34967.058900] 	(detected by 2, t=6002 jiffies, g=1058044, c=1058043, q=0)
> [34967.059867] Task dump for CPU 1:
> [34967.060827] swapper/1       R  running task    14576     0      1 0x00200000
> [34967.061802]  0000000142b4be38 4a979bec19cdc3d2 ffffe8ffff003200 0000000000000003
> [34967.062786]  ffffffff81cb1b80 0000000000000001 ffff880242b4be88 ffffffff8165ff65
> [34967.063759]  00001fcecca713d2 ffffffff81cb1ca0 ffffffff81cb1b80 ffffffff81d215b0
> [34967.064721] Call Trace:
> [34967.065649]  [<ffffffff8165ff65>] ? cpuidle_enter_state+0x55/0x190
> [34967.066570]  [<ffffffff81660157>] ? cpuidle_enter+0x17/0x20
> [34967.067498]  [<ffffffff810bd6c5>] ? cpu_startup_entry+0x355/0x410
> [34967.068425]  [<ffffffff8103016a>] ? start_secondary+0x1aa/0x230
> [35027.731690] INFO: rcu_sched detected stalls on CPUs/tasks:
> [35027.732701] 	1: (0 ticks this GP) idle=b82/0/0 softirq=2168971/2168971 
> [35027.733652] 	(detected by 2, t=6002 jiffies, g=1058047, c=1058046, q=0)
> [35027.734593] Task dump for CPU 1:
> [35027.735514] swapper/1       R  running task    14576     0      1 0x00200000
> [35027.736445]  0000000142b4be38 4a979bec19cdc3d2 ffffe8ffff003200 0000000000000004
> [35027.737369]  ffffffff81cb1b80 0000000000000001 ffff880242b4be88 ffffffff8165ff65
> [35027.738285]  00001fde808fc8c8 ffffffff81cb1cf8 ffffffff81cb1b80 ffffffff81d215b0
> [35027.739206] Call Trace:
> [35027.740114]  [<ffffffff8165ff65>] ? cpuidle_enter_state+0x55/0x190
> [35027.741032]  [<ffffffff81660157>] ? cpuidle_enter+0x17/0x20
> [35027.741949]  [<ffffffff810bd6c5>] ? cpu_startup_entry+0x355/0x410
> [35027.742858]  [<ffffffff8103016a>] ? start_secondary+0x1aa/0x230
> [35982.698415] perf interrupt took too long (209762 > 10000), lowering kernel.perf_event_max_sample_rate to 12500
> [37306.241794] perf interrupt took too long (208160 > 19841), lowering kernel.perf_event_max_sample_rate to 6300
> [38626.487390] perf interrupt took too long (206565 > 39062), lowering kernel.perf_event_max_sample_rate to 3200
> [39781.429034] perf interrupt took too long (204990 > 78125), lowering kernel.perf_event_max_sample_rate to 1600
> [41041.380281] perf interrupt took too long (203427 > 156250), lowering kernel.perf_event_max_sample_rate to 800
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 15:53                                                                         ` Sasha Levin
@ 2014-12-13 18:07                                                                           ` Paul E. McKenney
  2014-12-14 17:50                                                                             ` Paul E. McKenney
  0 siblings, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-13 18:07 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Ingo Molnar, David Lang, Linus Torvalds, Dave Jones, Chris Mason,
	Mike Galbraith, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List

On Sat, Dec 13, 2014 at 10:53:35AM -0500, Sasha Levin wrote:
> On 12/13/2014 03:30 AM, Ingo Molnar wrote:
> >> > This is my no_hz related config:
> >> > 
> >> > $ grep NO_HZ .config
> >> > CONFIG_NO_HZ_COMMON=y
> >> > # CONFIG_NO_HZ_IDLE is not set
> >> > CONFIG_NO_HZ_FULL=y
> >> > CONFIG_NO_HZ_FULL_ALL=y
> > Just curious, if you disable NO_HZ_FULL_ALL, does the bug change?
> 
> On 12/13/2014 07:08 AM, Paul E. McKenney wrote:
> > Alternatively, you could boot with nohz_full=2-27 (or maybe even
> > nohz_full=4-27).  This will override CONFIG_NO_HZ_FULL_ALL=y and will
> > provide two (or four with 4-27) housekeeping CPUs that are available to
> > run things like RCU grace-period kthreads and RCU callback processing.
> > This might allow RCU to get the CPU bandwidth it needs despite
> > competition from your workload.
> 
> I've tried both nohz_full=4-27 and disabling CONFIG_NO_HZ_FULL_ALL
> altogether, but I'm still seeing the stall:

And again looping in workqueues, despite the cond_resched_rcu_qs() there.
And the reason for that is that cond_resched_rcu_qs() currently only
provides quiescent states for tasks RCU.  I will put together something
that makes it work for other RCU flavors.

Not that this is likely to do much about Dave Jones's lockup, but one
thing at a time...

							Thanx, Paul

> [  725.670017] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  725.670017]  0: (11 ticks this GP) idle=bbd/140000000000002/0 softirq=11529/11529 fqs=0 last_accelerate: 9d0e/a648, nonlazy_posted: 721357, ..
> [  725.670017]  (detected by 16, t=2102 jiffies, g=9857, c=9856, q=2581)
> [  725.670017] Task dump for CPU 0:
> [  725.670017] kworker/0:1     S ffff8800633abde8 13016   520      2 0x10080008
> [  725.670017]  ffffffffb03027a8 ffff880a70f24017 ffffffffb043ef40 ffff88005ffea310
> [  725.670017]  0000000000000000 dfffe90000000000 0000000000000000 1ffffffff63bcdeb
> [  725.670017]  ffff88006be15030 ffffffffb1de6f58 ffffffffffffff10 ffffffffb0301237
> [  725.670017] Call Trace:
> [  725.670017]  [<ffffffffb03027a8>] ? retint_restore_args+0x13/0x13
> [  725.670017]  [<ffffffffb0301237>] ? _raw_spin_unlock_irq+0x57/0x200
> [  725.670017]  [<ffffffffb0301203>] ? _raw_spin_unlock_irq+0x23/0x200
> [  725.670017]  [<ffffffffa04630cb>] ? worker_thread+0x15b/0x1680
> [  725.670017]  [<ffffffffb02effef>] ? __schedule+0xf6f/0x2fc0
> [  725.670017]  [<ffffffffa0462f70>] ? process_one_work+0x1650/0x1650
> [  725.670017]  [<ffffffffa047ae12>] ? kthread+0x1f2/0x2b0
> [  725.670017]  [<ffffffffa047ac20>] ? kthread_worker_fn+0x6a0/0x6a0
> [  725.670017]  [<ffffffffb03018bc>] ? ret_from_fork+0x7c/0xb0
> [  725.670017]  [<ffffffffa047ac20>] ? kthread_worker_fn+0x6a0/0x6a0
> 
> 
> Thanks,
> Sasha
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 18:04                                                                   ` Paul E. McKenney
@ 2014-12-13 20:41                                                                     ` Dave Jones
  2014-12-14  4:04                                                                       ` Paul E. McKenney
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-13 20:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Linux Kernel Mailing List

On Sat, Dec 13, 2014 at 10:04:08AM -0800, Paul E. McKenney wrote:
 > On Sat, Dec 13, 2014 at 11:59:15AM -0500, Dave Jones wrote:
 > > On Fri, Dec 12, 2014 at 11:14:06AM -0800, Linus Torvalds wrote:
 > >  > On Fri, Dec 12, 2014 at 10:54 AM, Dave Jones <davej@redhat.com> wrote:
 > >  > >
 > >  > > Something that's still making me wonder if it's some kind of hardware
 > >  > > problem is the non-deterministic nature of this bug.
 > >  > 
 > >  > I'd expect it to be a race condition, though. Which can easily cause
 > >  > these kinds of issues, and the timing will be pretty random even if
 > >  > the load is very regular.
 > >  > 
 > >  > And we know that the scheduler has an integer overflow under Sasha's
 > >  > loads, although I didn't hear anything from Ingo and friends about it.
 > >  > Ingo/Peter, you were cc'd on that report, where at least one of the
 >  >  > multiplications in wake_affine() ended up overflowing..
 > >  > 
 > >  > Some scheduler thing that overflows only under heavy load, and screws
 > >  > up scheduling could easily account for the RCU thread thing. I see it
 > >  > *less* easily accounting for DaveJ's case, though, because the
 > >  > watchdog is running at RT priority,  and the scheduler would have to
 > >  > screw up much more to then not schedule an RT task, but..
 > >  > 
 > >  > I'm also not sure if the bug ever happens with preemption disabled.
 > > 
 > > Bah, so I see some watchdog traces with preemption off, and that then
 > > taints the kernel, and the fuzzing stops.  I'll hack something up
 > > so it ignores the taint and keeps going. All I really care about here
 > > is the "machine hangs completely" case, which the trace below didn't
 > > hit..
 > > 
 > > (back to fuzzing almost everything, not just lsetxattr btw)
 > 
 > Hmmm...  This one looks like the RCU grace-period kthread is getting
 > starved: "idle=b4c/0/0".  Is this running with the "dangerous" patch
 > that sets these kthreads to RT priority?

sorry, no. Ran out of time yesterday. I'll try and get to applying that
later this evening if I get a chance.

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 16:59                                                                 ` Dave Jones
  2014-12-13 18:04                                                                   ` Paul E. McKenney
@ 2014-12-13 22:36                                                                   ` Dave Jones
  2014-12-13 22:40                                                                     ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-13 22:36 UTC (permalink / raw)
  To: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Sat, Dec 13, 2014 at 11:59:15AM -0500, Dave Jones wrote:
 > On Fri, Dec 12, 2014 at 11:14:06AM -0800, Linus Torvalds wrote:
 >  > On Fri, Dec 12, 2014 at 10:54 AM, Dave Jones <davej@redhat.com> wrote:
 >  > >
 >  > > Something that's still making me wonder if it's some kind of hardware
 >  > > problem is the non-deterministic nature of this bug.
 >  > 
 >  > I'd expect it to be a race condition, though. Which can easily cause
 >  > these kinds of issues, and the timing will be pretty random even if
 >  > the load is very regular.
 >  > 
 >  > And we know that the scheduler has an integer overflow under Sasha's
 >  > loads, although I didn't hear anything from Ingo and friends about it.
 >  > Ingo/Peter, you were cc'd on that report, where at least one of the
 >  >  > multiplications in wake_affine() ended up overflowing..
 >  > 
 >  > Some scheduler thing that overflows only under heavy load, and screws
 >  > up scheduling could easily account for the RCU thread thing. I see it
 >  > *less* easily accounting for DaveJ's case, though, because the
 >  > watchdog is running at RT priority,  and the scheduler would have to
 >  > screw up much more to then not schedule an RT task, but..
 >  > 
 >  > I'm also not sure if the bug ever happens with preemption disabled.
 > 
 > Bah, so I see some watchdog traces with preemption off, and that then
 > taints the kernel, and the fuzzing stops.  I'll hack something up
 > so it ignores the taint and keeps going. All I really care about here
 > is the "machine hangs completely" case, which the trace below didn't
 > hit..

Ok, I think we can rule out preemption. I just checked on it, and
found it wedged.  Here's what I got over usb-serial.
(tainting was from previous post).

[76132.505590] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c8:19387]
[76132.506438] CPU: 3 PID: 19387 Comm: trinity-c8 Tainted: G        W      3.18.0+ #102 [loadavg: 148.33 137.64 140.62 48/406 19489]
[76132.507293] task: ffff880226a9ada0 ti: ffff8801aee08000 task.ti: ffff8801aee08000
[76132.508149] RIP: 0010:[<ffffffff81045d3c>]  [<ffffffff81045d3c>] kernel_map_pages+0xbc/0x120
[76132.509022] RSP: 0000:ffff8801aee0ba08  EFLAGS: 00000202
[76132.509889] RAX: 00000000001407e0 RBX: 0000000000000000 RCX: 0000000000140760
[76132.510760] RDX: 0000000000000202 RSI: ffff880000000188 RDI: 0000000000000001
[76132.511636] RBP: ffff8801aee0ba68 R08: 8000000000000063 R09: ffff880000000000
[76132.512512] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[76132.513394] R13: 0000000000000000 R14: 0000000001b5f000 R15: 0000000000000000
[76132.514269] FS:  00007fb1263cc740(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
[76132.515152] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76132.516055] CR2: 000000000277dff8 CR3: 0000000233290000 CR4: 00000000001407e0
[76132.516957] DR0: 00007f07ef05a000 DR1: 00007fb7761bb000 DR2: 0000000000000000
[76132.517858] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[76132.518757] Stack:
[76132.519650]  ffff880097a32000 ffff8801aee0ba08 0000000000000000 0000000000000003
[76132.520590]  0000000000000000 0000000100000001 0000000000097a31 0000000000000000
[76132.521530]  0000000000000000 0000000078052420 ffff8802447d7348 0000000000000001
[76132.522447] Call Trace:
[76132.523359]  [<ffffffff8117f9d4>] get_page_from_freelist+0x4a4/0xaa0
[76132.524281]  [<ffffffff811801fe>] __alloc_pages_nodemask+0x22e/0xb40
[76132.525205]  [<ffffffff810abb55>] ? local_clock+0x25/0x30
[76132.526135]  [<ffffffff810c518c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[76132.527065]  [<ffffffff8118545d>] ? release_pages+0x1bd/0x270
[76132.528004]  [<ffffffff810c367f>] ? lock_release_holdtime.part.24+0xf/0x190
[76132.528944]  [<ffffffff811c941e>] alloc_pages_vma+0xee/0x1b0
[76132.529883]  [<ffffffff811a465a>] ? do_wp_page+0xca/0x770
[76132.530823]  [<ffffffff8109faff>] ? __might_sleep+0x1f/0x140
[76132.531770]  [<ffffffff811a465a>] do_wp_page+0xca/0x770
[76132.532705]  [<ffffffff811a6eab>] handle_mm_fault+0x6cb/0xe90
[76132.533633]  [<ffffffff810423f8>] ? __do_page_fault+0x198/0x5c0
[76132.534561]  [<ffffffff8104245c>] __do_page_fault+0x1fc/0x5c0
[76132.535465]  [<ffffffff817cf790>] ? _raw_spin_unlock_irq+0x30/0x40
[76132.536372]  [<ffffffff8109f2bd>] ? finish_task_switch+0x7d/0x120
[76132.537269]  [<ffffffff8109f27f>] ? finish_task_switch+0x3f/0x120
[76132.538154]  [<ffffffff817c9822>] ? __schedule+0x352/0x8c0
[76132.539043]  [<ffffffff8137576d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[76132.539921]  [<ffffffff8104282c>] do_page_fault+0xc/0x10
[76132.540791]  [<ffffffff817d1fb2>] page_fault+0x22/0x30
[76132.541654] Code: 65 48 33 04 25 28 00 00 00 75 75 48 83 c4 50 5b 41 5c 5d c3 0f 1f 00 9c 5a fa 0f 20 e0 48 89 c1 80 e1 7f 0f 22 e1 0f 22 e0 52 9d <eb> cf 66 90 49 bc 00 00 00 00 00 88 ff ff 48 63 f6 49 01 fc 48 
[76132.543541] sending NMI to other CPUs:
[76132.544438] NMI backtrace for cpu 1
[76132.545300] CPU: 1 PID: 17326 Comm: trinity-c93 Tainted: G        W      3.18.0+ #102 [loadavg: 148.33 137.64 140.62 48/406 19489]
[76132.546193] task: ffff8800098b2da0 ti: ffff8801aec38000 task.ti: ffff8801aec38000
[76132.547085] RIP: 0010:[<ffffffff810c50a2>]  [<ffffffff810c50a2>] __lock_acquire.isra.31+0x142/0x9f0
[76132.547987] RSP: 0018:ffff8801aec3bd58  EFLAGS: 00000082
[76132.548883] RAX: 0000000000000000 RBX: ffff8800098b2da0 RCX: 0000000000000002
[76132.549786] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81c0a098
[76132.550683] RBP: ffff8801aec3bdc8 R08: 0000000000000000 R09: 0000000000000000
[76132.551576] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff825a4ab0
[76132.552462] R13: 0000000000000000 R14: ffffffff81c0a098 R15: 0000000000000000
[76132.553347] FS:  00007fb1263cc740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
[76132.554237] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76132.555118] CR2: 00007fc56b68e000 CR3: 0000000176d68000 CR4: 00000000001407e0
[76132.556014] DR0: 00007f07ef05a000 DR1: 00007fb7761bb000 DR2: 0000000000000000
[76132.556908] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[76132.557805] Stack:
[76132.558679]  ffff8801aec3bdd8 0000000000000046 0000000000000000 0000000000000000
[76132.559582]  0000000000000000 0000000000000000 ffff8801aec3bdf8 0000000000000046
[76132.560486]  0000000000000000 0000000000000246 0000000000000000 0000000000000000
[76132.561394] Call Trace:
[76132.562285]  [<ffffffff810c605f>] lock_acquire+0x9f/0x120
[76132.563172]  [<ffffffff8107a939>] ? do_wait+0xd9/0x280
[76132.564038]  [<ffffffff817cf851>] _raw_read_lock+0x41/0x80
[76132.564891]  [<ffffffff8107a939>] ? do_wait+0xd9/0x280
[76132.565734]  [<ffffffff8107a939>] do_wait+0xd9/0x280
[76132.566566]  [<ffffffff8107af00>] SyS_wait4+0x80/0x110
[76132.567389]  [<ffffffff810789e0>] ? task_stopped_code+0x60/0x60
[76132.568220]  [<ffffffff817d0392>] system_call_fastpath+0x12/0x17
[76132.569044] Code: 00 00 45 31 ed e9 f3 01 00 00 0f 1f 80 00 00 00 00 44 89 e8 4d 8b 64 c6 08 4d 85 e4 0f 84 21 ff ff ff f0 41 ff 84 24 98 01 00 00 <8b> 3d 70 31 ab 01 44 8b ab 68 07 00 00 85 ff 75 0a 41 83 fd 2f 
[76132.570883] NMI backtrace for cpu 2
[76132.570886] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 26.444 msecs
[76132.572634] CPU: 2 PID: 18775 Comm: trinity-c59 Tainted: G        W      3.18.0+ #102 [loadavg: 148.33 137.64 140.62 49/406 19489]
[76132.573536] task: ffff880096135b40 ti: ffff8802408f0000 task.ti: ffff8802408f0000
[76132.574440] RIP: 0010:[<ffffffff810fbd4e>]  [<ffffffff810fbd4e>] generic_exec_single+0xee/0x1b0
[76132.575364] RSP: 0018:ffff8802408f3d28  EFLAGS: 00000202
[76132.576275] RAX: ffff880223a67d00 RBX: ffff8802408f3d40 RCX: ffff880223a67d40
[76132.577188] RDX: ffff8802447d3ac0 RSI: ffff8802408f3d40 RDI: ffff8802408f3d40
[76132.578096] RBP: ffff8802408f3d88 R08: 0000000000000001 R09: 0000000000000001
[76132.579011] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
[76132.579918] R13: 0000000000000001 R14: ffff880096159290 R15: ffff8802408f3e80
[76132.580822] FS:  00007fb1263cc740(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
[76132.581737] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76132.582653] CR2: 0000000081000000 CR3: 00000001ccf39000 CR4: 00000000001407e0
[76132.583573] DR0: 00007f07ef05a000 DR1: 00007fb7761bb000 DR2: 0000000000000000
[76132.584495] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[76132.585411] Stack:
[76132.586319]  0000000000000001 ffff880096135b40 ffff8802408f3d48 ffff880223a67d40
[76132.587256]  ffffffff81166730 ffff8802258737b0 0000000000000003 00000000803d2842
[76132.588198]  ffff8802408f3da8 00000000ffffffff 0000000000000003 ffffffff81166730
[76132.589142] Call Trace:
[76132.590074]  [<ffffffff81166730>] ? perf_swevent_add+0x110/0x110
[76132.591018]  [<ffffffff81166730>] ? perf_swevent_add+0x110/0x110
[76132.591957]  [<ffffffff810fbeb0>] smp_call[76148.896166] sched: RT throttling activated
[76172.492491] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/u16:1:24452]
[76172.493490] CPU: 3 PID: 24452 Comm: kworker/u16:1 Tainted: G        W    L 3.18.0+ #102 [loadavg: 180.20 147.58 143.86 23/399 24452]
[76172.494533] task: ffff88007309c470 ti: ffff880223a18000 task.ti: ffff880223a18000
[76172.495572] RIP: 0010:[<ffffffff817cf799>]  [<ffffffff817cf799>] _raw_spin_unlock_irq+0x39/0x40
[76172.496598] RSP: 0018:ffff880223a1bec8  EFLAGS: 00000286
[76172.497628] RAX: 0000000000000003 RBX: 0000000000000046 RCX: 0000000000000380
[76172.498691] RDX: ffff88024460dc40 RSI: 0000000000000000 RDI: ffff8802447d2f00
[76172.499749] RBP: ffff880223a1bed8 R08: 0000000000000000 R09: 0000000000000000
[76172.500804] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[76172.501920] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[76172.502958] FS:  0000000000000000(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
[76172.504054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76172.505140] CR2: 00007fe411e27000 CR3: 0000000042ae7000 CR4: 00000000001407e0
[76172.506171] DR0: 00007f07ef05a000 DR1: 00007fb7761bb000 DR2: 0000000000000000
[76172.507176] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[76172.508171] Stack:
[76172.509170]  0000000000000001 ffff8802447d2f00 ffff880223a1bf28 ffffffff8109f2bd
[76172.510185]  ffffffff8109f27f 0000000000000000 0000000000000000 ffff8802447d2f00
[76172.511234]  ffff8802447d2f00 ffff880094cc5dc0 0000000000000000 ffff8801aed38208
[76172.512313] Call Trace:
[76172.513371]  [<ffffffff8109f2bd>] finish_task_switch+0x7d/0x120
[76172.514399]  [<ffffffff8109f27f>] ? finish_task_switch+0x3f/0x120
[76172.515429]  [<ffffffff810a5c87>] schedule_tail+0x27/0xb0
[76172.516455]  [<ffffffff817d027f>] ret_from_fork+0xf/0xb0
[76172.517479]  [<ffffffff8108e390>] ? call_helper+0x20/0x20
[76172.518515] Code: 53 48 89 fb 48 8d 7f 18 48 83 ec 08 48 8b 55 08 e8 8d 6c 8f ff 48 89 df e8 b5 9f 8f ff e8 d0 aa 97 ff fb 65 ff 0c 25 e0 a9 00 00 <48> 83 c4 08 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 
[76172.520732] sending NMI to other CPUs:
[76172.521757] NMI backtrace for cpu 1
[76172.522745] CPU: 1 PID: 24454 Comm: modprobe Tainted: G        W    L 3.18.0+ #102 [loadavg: 180.20 147.58 143.86 19/402 24456]
[76172.523759] task: ffff880223174470 ti: ffff880225ba4000 task.ti: ffff880225ba4000
[76172.524750] RIP: 0033:[<000000336f609ffd>]  [<000000336f609ffd>] 0x336f609ffd
[76172.525747] RSP: 002b:00007fffcd99c458  EFLAGS: 00000202
[76172.526721] RAX: 0000000000000001 RBX: 0000000000000003 RCX: 0000000000000000
[76172.527697] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000336ec20908
[76172.528662] RBP: 000000336ee80fc0 R08: 000000336ec20908 R09: 0000000000000000
[76172.529632] R10: 000000336ee879b0 R11: 0000000000000010 R12: 00007fffcd99c4d0
[76172.530599] R13: 0000000000418c2f R14: 00007fffcd99d918 R15: 0000000000000000
[76172.531567] FS:  00007f8ca6d7e740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
[76172.532543] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76172.533512] CR2: 000000336eed9c00 CR3: 00000000963b7000 CR4: 00000000001407e0
[76172.534486] DR0: 00007f07ef05a000 DR1: 00007fb7761bb000 DR2: 0000000000000000
[76172.535450] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[76172.536413] 
[76172.537368] NMI backtrace for cpu 2
[76172.538331] CPU: 2 PID: 20261 Comm: trinity-c241 Tainted: G        W    L 3.18.0+ #102 [loadavg: 180.20 147.58 143.86 18/402 24456]
[76172.539329] task: ffff88017968ada0 ti: ffff8801c0508000 task.ti: ffff8801c0508000
[76172.540316] RIP: 0033:[<000000336ee39e40>]  [<000000336ee39e40>] 0x336ee39e40
[76172.541314] RSP: 002b:00007fffe71236e8  EFLAGS: 00000206
[76172.542300] RAX: 0000000000000003 RBX: 0000000001b66720 RCX: 000000336f1b70cc
[76172.543289] RDX: 000000336f1b70c4 RSI: 00007fffe71236fc RDI: 000000336f1b76e0
[76172.544282] RBP: 000000000000000f R08: 000000336f1b713c R09: 000000336f1b7140
[76172.545273] R10: ffffffffffff9f00 R11: 0000000000000202 R12: 00007fe41545d000
[76172.546266] R13: 00007fe41545d068 R14: 0000000000000000 R15: 0000000000000000
[76172.547263] FS:  00007fe416231740(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
[76172.548267] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76172.549270] CR2: 0000000000000008 CR3: 000000017304b000 CR4: 00000000001407e0
[76172.550284] DR0: 00007f07ef05a000 DR1: 00007fb7761bb000 DR2: 0000000000000000
[76172.551300] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[76172.552316] 
[76172.553304] NMI backtrace for cpu 0
[76172.554289] CPU: 0 PID: 22642 Comm: trinity-c51 Tainted: G        W    L 3.18.0+ #102 [loadavg: 180.20 147.58 143.86 19/400 24457]
[76172.555284] task: ffff88009f375b40 ti: ffff880229be8000 task.ti: ffff880229be8000
[76172.556277] RIP: 0010:[<ffffffff811a5de0>]  [<ffffffff811a5de0>] copy_page_range+0x550/0xa20
[76172.557304] RSP: 0018:ffff880229bebc50  EFLAGS: 00000286
[76172.558340] RAX: 001ffe000008007c RBX: 00007fe413bbf000 RCX: 00000000022c3700
[76172.559328] RDX: ffffea00022c3700 RSI: 00007fe413bbf000 RDI: ffff880201b78c00
[76172.560375] RBP: ffff880229bebd80 R08: 0000000000000000 R09: 0000000000000001
[76172.561423] R10: 0000000000000000 R11: 0000000000000000 R12: 800000008b0dc007
[76172.562465] R13: 00007fe413c00000 R14: ffff88022475ddf8 R15: 0000000000000018
[76172.563432] FS:  00007fe416231740(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
[76172.564401] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76172.565375] CR2: 0000000001d04b68 CR3: 00000001aee5d000 CR4: 00000000001407f0
[76172.566408] DR0: 00007f07ef05a000 DR1: 00007fb7761bb000 DR2: 0000000000000000
[76172.567423] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[76172.568394] Stack:
[76172.569353]  ffffffff00000000 00007fe413ea6fff 0000000000000000 ffff88021242b988
[76172.570337]  00007fe413ea6fff 00007fe4134a7000 ffff88006681f7f8 ffff880175a72d08
[76172.571323]  ffff8800905d9c80 ffff8801aee5d7f8 00007fe413ea7000 ffff880223ff3c80
[76172.572320] Call Trace:
[76172.573300]  [<ffffffff810764ef>] copy_process.part.26+0x146f/0x1a40

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 22:36                                                                   ` Dave Jones
@ 2014-12-13 22:40                                                                     ` Linus Torvalds
  2014-12-13 22:59                                                                       ` Linus Torvalds
  2014-12-14 23:46                                                                       ` Dave Jones
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-13 22:40 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List

On Sat, Dec 13, 2014 at 2:36 PM, Dave Jones <davej@redhat.com> wrote:
>
> Ok, I think we can rule out preemption. I just checked on it, and
> found it wedged.

Ok, one more. Mind checking what happens without CONFIG_DEBUG_PAGEALLOC?

                    Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 22:40                                                                     ` Linus Torvalds
@ 2014-12-13 22:59                                                                       ` Linus Torvalds
  2014-12-13 23:09                                                                         ` Linus Torvalds
  2014-12-13 23:39                                                                         ` Al Viro
  2014-12-14 23:46                                                                       ` Dave Jones
  1 sibling, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-13 22:59 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Al Viro,
	Thomas Gleixner

Side note: I think I've found a real potential lockup bug in
fs/namespace.c, but afaik it could only trigger with the RT patches.

I'm looking at what lsetxattr() does, since you had that
lsetxattr-only lockup. I doubt it's really related to lsetxattr(), but
whatever. The generic code does that mnt_want_write/mnt_drop_write
dance around the call to setxattr, and that in turn does

        while (ACCESS_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD)
                cpu_relax();

with preemption explicitly disabled. It's waiting for
mnt_make_readonly() to go away if it is racing with it.

But mnt_make_readonly() doesn't actually explicitly disable preemption
while it sets that MNT_WRITE_HOLD bit. Instead, it depends on
lock_mount_hash() to disable preemption for it. Which it does, because
it is a seq-writelock, which uses a spinlock, which will disable
preemption.

Except it won't with the RT patches, I guess. So it looks like you could have:

 - mnt_make_readonly() sets that bit
 - gets preempted with the RT patches
 - we run mnt_want_write() on all CPU's, which disables preemption and
waits for the bit to be cleared
 - nothing happens.
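
Condensed, the two sides of that race look roughly like this (simplified from
fs/namespace.c for illustration; not a verbatim quote of the code):

	/* mnt_make_readonly() side, runs under lock_mount_hash(): */
	mnt->mnt.mnt_flags |= MNT_WRITE_HOLD;
	/*
	 * ...count the writers, maybe set MNT_READONLY...
	 * Mainline is non-preemptible here because of the spinlock;
	 * with the RT patches this section can be preempted.
	 */
	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;

	/* __mnt_want_write() side, with preemption disabled: */
	while (ACCESS_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD)
		cpu_relax();	/* spins forever if the holder never runs again */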

This is clearly not what happens in your lockup, but it does seem to
be a potential issue for the RT kernel.

Added Al and Thomas to the cc, for fs/namespace.c and RT kernel
respectively. Maybe the RT patches already fix this, I didn't actually
check.

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 22:59                                                                       ` Linus Torvalds
@ 2014-12-13 23:09                                                                         ` Linus Torvalds
  2014-12-13 23:35                                                                           ` Al Viro
  2014-12-13 23:39                                                                         ` Al Viro
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-13 23:09 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Al Viro,
	Thomas Gleixner

On Sat, Dec 13, 2014 at 2:59 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> The generic code does that mnt_want_write/mnt_drop_write
> dance around the call to setxattr, and that in turn does
>
>         while (ACCESS_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD)
>                 cpu_relax();
>
> with preemption explicitly disabled.

Btw, I see no reason why mnt_want_write/mnt_drop_write disables
preemption. They don't care, they just care about the ordering of the
write counts and the MNT_WRITE_HOLD bit. It's the code that sets the
bit that should care, afaik. But maybe I'm missing something.

                 Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 23:09                                                                         ` Linus Torvalds
@ 2014-12-13 23:35                                                                           ` Al Viro
  2014-12-13 23:38                                                                             ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Al Viro @ 2014-12-13 23:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Thomas Gleixner

On Sat, Dec 13, 2014 at 03:09:59PM -0800, Linus Torvalds wrote:
> On Sat, Dec 13, 2014 at 2:59 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > The generic code does that mnt_want_write/mnt_drop_write
> > dance around the call to setxattr, and that in turn does
> >
> >         while (ACCESS_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD)
> >                 cpu_relax();
> >
> > with preemption explicitly disabled.
> 
> Btw, I see no reason why mnt_want_write/mnt_drop_write disables
> preemption. They don't care, they just care about the ordering of the
> write counts and the MNT_WRITE_HOLD bit. It's the code that sets the
> bit that should care, afaik. But maybe I'm missing something.

Er...  There's a much more direct reason - suppose we get a timer interrupt
right in the middle of mnt_drop_write().  And we lose the timeslice.
On UP we have mnt->mnt_writers--, with no locks held.  On SMP we have
this_cpu_dec() instead, also without any locks.  You really don't want to
lose the timeslice in the middle of either...

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 23:35                                                                           ` Al Viro
@ 2014-12-13 23:38                                                                             ` Linus Torvalds
  2014-12-13 23:47                                                                               ` Al Viro
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-13 23:38 UTC (permalink / raw)
  To: Al Viro
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Thomas Gleixner

On Sat, Dec 13, 2014 at 3:35 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Er...  There's a much more direct reason - suppose we get a timer interrupt
> right in the middle of mnt_drop_write().  And we lose the timeslice.

So?

You didn't have preemption disabled in *between* the mnt_want_write()
and mnt_drop_write(), there's absolutely no reason to have it inside
of them.

Nobody cares if you get preempted and go away for a while. It's
exactly equivalent to sleeping while doing the write that the pair was
protecting.

Seriously, the preemption disable looks like just voodoo code. It
doesn't protect anything, it doesn't fix anything, it doesn't change
anything. All it does is disable preemption over a random sequence of
code.

                      Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 22:59                                                                       ` Linus Torvalds
  2014-12-13 23:09                                                                         ` Linus Torvalds
@ 2014-12-13 23:39                                                                         ` Al Viro
  1 sibling, 0 replies; 486+ messages in thread
From: Al Viro @ 2014-12-13 23:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Thomas Gleixner

On Sat, Dec 13, 2014 at 02:59:43PM -0800, Linus Torvalds wrote:
> Side note: I think I've found a real potential lockup bug in
> fs/namespace.c, but afaik it could only trigger with the RT patches.

> Except it won't with the RT patches, I guess. So it looks like you could have:
> 
>  - mnt_make_readonly() sets that bit
>  - gets preempted with the RT patches
>  - we run mnt_want_write() on all CPU's, which disables preemption and
> waits for the bit to be cleared
>  - nothing happens.
> 
> This is clearly not what happens in your lockup, but it does seem to
> be a potential issue for the RT kernel.
> 
> Added Al and Thomas to the cc, for fs/namespace.c and RT kernel
> respectively. Maybe the RT patches already fix this, I didn't actually
> check.

I agree that it's a thing to keep in mind on the RT side of things, but
IMO it belongs in RT patches...

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 23:38                                                                             ` Linus Torvalds
@ 2014-12-13 23:47                                                                               ` Al Viro
  2014-12-14  0:14                                                                                 ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Al Viro @ 2014-12-13 23:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Thomas Gleixner

On Sat, Dec 13, 2014 at 03:38:57PM -0800, Linus Torvalds wrote:
> On Sat, Dec 13, 2014 at 3:35 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Er...  There's a much more direct reason - suppose we get a timer interrupt
> > right in the middle of mnt_drop_write(), and lose the timeslice.
> 
> So?
> 
> You didn't have preemption disabled in *between* the mnt_want_write()
> and mnt_drop_write(), there's absolutely no reason to have it inside
> of them.
> 
> Nobody cares if you get preempted and go away for a while. It's
> exactly equivalent to sleeping while doing the write that the pair was
> protecting.
> 
> Seriously, the preemption disable looks like just voodoo code. It
> doesn't protect anything, it doesn't fix anything, it doesn't change
> anything. All it does is disable preemption over a random sequence of
> code.

Huh?  Sure, we can enable it after mnt_inc_writers() and disable just prior to
mnt_dec_writers(), but we absolutely *do* need it disabled during either.
Is that what you are talking about?  If so, yes, we can do that.

But that applies only to __mnt_want_write() - __mnt_drop_write() is nothing but
mnt_dec_writers(), and we can't call that one with preemption enabled.
Seriously, look at mnt_dec_writers():
static inline void mnt_dec_writers(struct mount *mnt)
{
#ifdef CONFIG_SMP
        this_cpu_dec(mnt->mnt_pcp->mnt_writers);  
#else
        mnt->mnt_writers--;
#endif
}
It's load/modify/store, without any kind of atomicity; get preempted in the
middle of that sequence by another caller of mnt_dec_writers() and obvious bad
things will happen...
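
To illustrate, one possible interleaving of two preemptible callers of that
UP decrement (a sketch, assuming mnt_writers starts at 2):

	task A: load  mnt_writers	/* reads 2 */
		... A loses the timeslice here ...
	task B: load  mnt_writers	/* also reads 2 */
	task B: store mnt_writers = 1
		... A runs again ...
	task A: store mnt_writers = 1	/* B's decrement is lost */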

Al, really confused by now...

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 23:47                                                                               ` Al Viro
@ 2014-12-14  0:14                                                                                 ` Linus Torvalds
  2014-12-14  0:33                                                                                   ` Al Viro
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-14  0:14 UTC (permalink / raw)
  To: Al Viro
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Thomas Gleixner

On Sat, Dec 13, 2014 at 3:47 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> static inline void mnt_dec_writers(struct mount *mnt)
> {
> #ifdef CONFIG_SMP
>         this_cpu_dec(mnt->mnt_pcp->mnt_writers);
> #else
>         mnt->mnt_writers--;
> #endif
> }
> It's load/modify/store, without any kind of atomicity; get preempted in the
> middle of that sequence by another caller of mnt_dec_writers() and obvious bad
> things will happen...

Ugh, yes ok, the UP case needs it for the actual counter itself. Ugh.
What an ugly mess. I'd rather have the preemption disable where it is
actually *needed*, in that function itself for the UP case (or just
make it "atomic_t", which would likely be better still).
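
A rough sketch of those two options (illustrative only, not a patch from
this thread; the second variant assumes the UP mnt_writers field is changed
to atomic_t):

static inline void mnt_dec_writers(struct mount *mnt)
{
#ifdef CONFIG_SMP
	this_cpu_dec(mnt->mnt_pcp->mnt_writers);
#else
	/* disable preemption only around the non-atomic UP decrement */
	preempt_disable();
	mnt->mnt_writers--;
	preempt_enable();
#endif
}

or, with the UP counter made atomic:

static inline void mnt_dec_writers(struct mount *mnt)
{
#ifdef CONFIG_SMP
	this_cpu_dec(mnt->mnt_pcp->mnt_writers);
#else
	atomic_dec(&mnt->mnt_writers);	/* assumes mnt_writers is atomic_t */
#endif
}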

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-14  0:14                                                                                 ` Linus Torvalds
@ 2014-12-14  0:33                                                                                   ` Al Viro
  2014-12-14  1:35                                                                                     ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Al Viro @ 2014-12-14  0:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Thomas Gleixner

On Sat, Dec 13, 2014 at 04:14:58PM -0800, Linus Torvalds wrote:
> On Sat, Dec 13, 2014 at 3:47 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > static inline void mnt_dec_writers(struct mount *mnt)
> > {
> > #ifdef CONFIG_SMP
> >         this_cpu_dec(mnt->mnt_pcp->mnt_writers);
> > #else
> >         mnt->mnt_writers--;
> > #endif
> > }
> > It's load/modify/store, without any kind of atomicity; get preempted in the
> > middle of that sequence by another caller of mnt_dec_writers() and obvious bad
> > things will happen...
> 
> Ugh, yes ok, the UP case needs it for the actual counter itself. Ugh.
> What an ugly mess. I'd rather have the preemption disable where it is
> actually *needed*, in that function itself for the UP case (or just
> make it "atomic_t", which would likely be better still).

So does SMP - this_cpu_dec() relies on preemption being disabled.  On x86
we might get away with that, what with having it compiled into decl %gs:const,
but in the generic implementation it turns into
	*raw_cpu_ptr(&pcp) -= 1;
and the compiler has every right to turn it into
	p = raw_cpu_ptr(&pcp);
	(*p)--;
again, with no locking.  Lose the timeslice in the middle of that and you
risk getting a different CPU when you are scheduled again, with
another process doing this_cpu_dec() on your old CPU.  Have fun - two
non-atomic decrements of the same variable by different CPUs in parallel...

We really need preemption disabled there, UP or no UP.

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-14  0:33                                                                                   ` Al Viro
@ 2014-12-14  1:35                                                                                     ` Linus Torvalds
  2014-12-14  3:14                                                                                       ` Al Viro
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-14  1:35 UTC (permalink / raw)
  To: Al Viro
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Thomas Gleixner

On Sat, Dec 13, 2014 at 4:33 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> So does SMP - this_cpu_dec() relies on preemption being disabled.

No. really. It very much does not. Not on x86, not elsewhere. It's
part of the whole point of "this_cpu_op()". They are preemption and
interrupt safe.

It's the "__this_cpu_op()" ones that need external protection.
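
Roughly, the contract (an illustrative sketch of the percpu API usage, not
code from the thread):

	/* this_cpu_dec() is a single preempt- and irq-safe operation: */
	this_cpu_dec(mnt->mnt_pcp->mnt_writers);

	/* __this_cpu_dec() assumes the caller already excludes preemption: */
	preempt_disable();
	__this_cpu_dec(mnt->mnt_pcp->mnt_writers);
	preempt_enable();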

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-14  1:35                                                                                     ` Linus Torvalds
@ 2014-12-14  3:14                                                                                       ` Al Viro
  2014-12-15  0:18                                                                                         ` Al Viro
  0 siblings, 1 reply; 486+ messages in thread
From: Al Viro @ 2014-12-14  3:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Thomas Gleixner

On Sat, Dec 13, 2014 at 05:35:17PM -0800, Linus Torvalds wrote:
> On Sat, Dec 13, 2014 at 4:33 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > So does SMP - this_cpu_dec() relies on preemption being disabled.
> 
> No. really. It very much does not. Not on x86, not elsewhere. It's
> part of the whole point of "this_cpu_op()". They are preemption and
> interrupt safe.
> 
> It's the "__this_cpu_op()" ones that need external protection.

Right you are - I really need to get some coffee...  Sorry...

FWIW, do we need to disable interrupts there?  After all, mnt_want_write()
and mnt_drop_write() shouldn't be done from interrupt context - they can
happen via schedule_delayed_work(), but that's it...

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 20:41                                                                     ` Dave Jones
@ 2014-12-14  4:04                                                                       ` Paul E. McKenney
  0 siblings, 0 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-14  4:04 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Linux Kernel Mailing List

On Sat, Dec 13, 2014 at 03:41:52PM -0500, Dave Jones wrote:
> On Sat, Dec 13, 2014 at 10:04:08AM -0800, Paul E. McKenney wrote:
>  > On Sat, Dec 13, 2014 at 11:59:15AM -0500, Dave Jones wrote:
>  > > On Fri, Dec 12, 2014 at 11:14:06AM -0800, Linus Torvalds wrote:
>  > >  > On Fri, Dec 12, 2014 at 10:54 AM, Dave Jones <davej@redhat.com> wrote:
>  > >  > >
>  > >  > > Something that's still making me wonder if it's some kind of hardware
>  > >  > > problem is the non-deterministic nature of this bug.
>  > >  > 
>  > >  > I'd expect it to be a race condition, though. Which can easily cause
>  > >  > these kinds of issues, and the timing will be pretty random even if
>  > >  > the load is very regular.
>  > >  > 
>  > >  > And we know that the scheduler has an integer overflow under Sasha's
>  > >  > loads, although I didn't hear anything from Ingo and friends about it.
>  > >  > Ingo/Peter, you were cc'd on that report, where at least one of the
>  > >  > multiplcations in wake_affine() ended up overflowing..
>  > >  > 
>  > >  > Some scheduler thing that overflows only under heavy load, and screws
>  > >  > up scheduling could easily account for the RCU thread thing. I see it
>  > >  > *less* easily accounting for DaveJ's case, though, because the
>  > >  > watchdog is running at RT priority,  and the scheduler would have to
>  > >  > screw up much more to then not schedule an RT task, but..
>  > >  > 
>  > >  > I'm also not sure if the bug ever happens with preemption disabled.
>  > > 
>  > > Bah, so I see some watchdog traces with preemption off, and that then
>  > > taints the kernel, and the fuzzing stops.  I'll hack something up
>  > > so it ignores the taint and keeps going. All I really care about here
>  > > is the "machine hangs completely" case, which the trace below didn't
>  > > hit..
>  > > 
>  > > (back to fuzzing almost everything, not just lsetxattr btw)
>  > 
>  > Hmmm...  This one looks like the RCU grace-period kthread is getting
>  > starved: "idle=b4c/0/0".  Is this running with the "dangerous" patch
>  > that sets these kthreads to RT priority?
> 
> sorry, no. Ran out of time yesterday. I'll try and get to applying that
> later this evening if I get chance.

Whew!!!  You had me worried there for a bit!  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 18:07                                                                           ` Paul E. McKenney
@ 2014-12-14 17:50                                                                             ` Paul E. McKenney
  2014-12-14 23:46                                                                               ` Sasha Levin
  0 siblings, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-14 17:50 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Ingo Molnar, David Lang, Linus Torvalds, Dave Jones, Chris Mason,
	Mike Galbraith, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List

On Sat, Dec 13, 2014 at 10:07:52AM -0800, Paul E. McKenney wrote:
> On Sat, Dec 13, 2014 at 10:53:35AM -0500, Sasha Levin wrote:
> > On 12/13/2014 03:30 AM, Ingo Molnar wrote:
> > >> > This is my no_hz related config:
> > >> > 
> > >> > $ grep NO_HZ .config
> > >> > CONFIG_NO_HZ_COMMON=y
> > >> > # CONFIG_NO_HZ_IDLE is not set
> > >> > CONFIG_NO_HZ_FULL=y
> > >> > CONFIG_NO_HZ_FULL_ALL=y
> > > Just curious, if you disable NO_HZ_FULL_ALL, does the bug change?
> > 
> > On 12/13/2014 07:08 AM, Paul E. McKenney wrote:
> > > Alternatively, your could boot with nohz_full=2-27 (or maybe even
> > > nohz_full=4-27).  This will override CONFIG_NO_HZ_FULL_ALL=y and will
> > > provide two (or four with 4-27) housekeeping CPUs that are available to
> > > run things like RCU grace-period kthreads and RCU callback processing.
> > > This might allow RCU to get the CPU bandwidth it needs despite
> > > competition from your workload.
> > 
> > I've tried both nohz_full=4-27 and disabling CONFIG_NO_HZ_FULL_ALL
> > altogether, but I'm still seeing the stall:
> 
> And again looping in workqueues, despite the cond_resched_rcu_qs() there.
> And the reason for that is that cond_resched_rcu_qs() currently only
> provides quiescent states for tasks RCU.  I will put together something
> that makes it work for other RCU flavors.
> 
> Not that this is likely to do much about Dave Jones's lockup, but one
> thing at a time...

And here is a patch for this purpose that passes moderate rcutorture
testing.  Please note that this patch is designed to help the case you
were seeing, which is workqueues running indefinitely on the housekeeping
CPU, namely CPU 0.  If this also ends up happening on the other CPUs,
I will need to get you another CPU-kicking mechanism.

							Thanx, Paul

------------------------------------------------------------------------

rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors

Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
in places where it would be useful for it to apply to the normal RCU
flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
case for workloads that aggressively overload the system, particularly
those that generate large numbers of RCU updates on systems running
NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
from cond_resched_rcu_qs() to the normal RCU flavors.

Note that it is unfortunately necessary to leave the old ->passed_quiesce
mechanism in place to allow quiescent states that apply to only one
flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
that case, but that is not so good for debugging of RCU internals.)

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index b63b9bb3bc0c..08651da15448 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -56,14 +56,14 @@ rcuboost:
 
 The output of "cat rcu/rcu_preempt/rcudata" looks as follows:
 
-  0!c=30455 g=30456 pq=1 qp=1 dt=126535/140000000000000/0 df=2002 of=4 ql=0/0 qs=N... b=10 ci=74572 nci=0 co=1131 ca=716
-  1!c=30719 g=30720 pq=1 qp=0 dt=132007/140000000000000/0 df=1874 of=10 ql=0/0 qs=N... b=10 ci=123209 nci=0 co=685 ca=982
-  2!c=30150 g=30151 pq=1 qp=1 dt=138537/140000000000000/0 df=1707 of=8 ql=0/0 qs=N... b=10 ci=80132 nci=0 co=1328 ca=1458
-  3 c=31249 g=31250 pq=1 qp=0 dt=107255/140000000000000/0 df=1749 of=6 ql=0/450 qs=NRW. b=10 ci=151700 nci=0 co=509 ca=622
-  4!c=29502 g=29503 pq=1 qp=1 dt=83647/140000000000000/0 df=965 of=5 ql=0/0 qs=N... b=10 ci=65643 nci=0 co=1373 ca=1521
-  5 c=31201 g=31202 pq=1 qp=1 dt=70422/0/0 df=535 of=7 ql=0/0 qs=.... b=10 ci=58500 nci=0 co=764 ca=698
-  6!c=30253 g=30254 pq=1 qp=1 dt=95363/140000000000000/0 df=780 of=5 ql=0/0 qs=N... b=10 ci=100607 nci=0 co=1414 ca=1353
-  7 c=31178 g=31178 pq=1 qp=0 dt=91536/0/0 df=547 of=4 ql=0/0 qs=.... b=10 ci=109819 nci=0 co=1115 ca=969
+  0!c=30455 g=30456 pq=1/0 qp=1 dt=126535/140000000000000/0 df=2002 of=4 ql=0/0 qs=N... b=10 ci=74572 nci=0 co=1131 ca=716
+  1!c=30719 g=30720 pq=1/0 qp=0 dt=132007/140000000000000/0 df=1874 of=10 ql=0/0 qs=N... b=10 ci=123209 nci=0 co=685 ca=982
+  2!c=30150 g=30151 pq=1/1 qp=1 dt=138537/140000000000000/0 df=1707 of=8 ql=0/0 qs=N... b=10 ci=80132 nci=0 co=1328 ca=1458
+  3 c=31249 g=31250 pq=1/1 qp=0 dt=107255/140000000000000/0 df=1749 of=6 ql=0/450 qs=NRW. b=10 ci=151700 nci=0 co=509 ca=622
+  4!c=29502 g=29503 pq=1/0 qp=1 dt=83647/140000000000000/0 df=965 of=5 ql=0/0 qs=N... b=10 ci=65643 nci=0 co=1373 ca=1521
+  5 c=31201 g=31202 pq=1/0 qp=1 dt=70422/0/0 df=535 of=7 ql=0/0 qs=.... b=10 ci=58500 nci=0 co=764 ca=698
+  6!c=30253 g=30254 pq=1/0 qp=1 dt=95363/140000000000000/0 df=780 of=5 ql=0/0 qs=N... b=10 ci=100607 nci=0 co=1414 ca=1353
+  7 c=31178 g=31178 pq=1/0 qp=0 dt=91536/0/0 df=547 of=4 ql=0/0 qs=.... b=10 ci=109819 nci=0 co=1115 ca=969
 
 This file has one line per CPU, or eight for this 8-CPU system.
 The fields are as follows:
@@ -188,14 +188,14 @@ o	"ca" is the number of RCU callbacks that have been adopted by this
 Kernels compiled with CONFIG_RCU_BOOST=y display the following from
 /debug/rcu/rcu_preempt/rcudata:
 
-  0!c=12865 g=12866 pq=1 qp=1 dt=83113/140000000000000/0 df=288 of=11 ql=0/0 qs=N... kt=0/O ktl=944 b=10 ci=60709 nci=0 co=748 ca=871
-  1 c=14407 g=14408 pq=1 qp=0 dt=100679/140000000000000/0 df=378 of=7 ql=0/119 qs=NRW. kt=0/W ktl=9b6 b=10 ci=109740 nci=0 co=589 ca=485
-  2 c=14407 g=14408 pq=1 qp=0 dt=105486/0/0 df=90 of=9 ql=0/89 qs=NRW. kt=0/W ktl=c0c b=10 ci=83113 nci=0 co=533 ca=490
-  3 c=14407 g=14408 pq=1 qp=0 dt=107138/0/0 df=142 of=8 ql=0/188 qs=NRW. kt=0/W ktl=b96 b=10 ci=121114 nci=0 co=426 ca=290
-  4 c=14405 g=14406 pq=1 qp=1 dt=50238/0/0 df=706 of=7 ql=0/0 qs=.... kt=0/W ktl=812 b=10 ci=34929 nci=0 co=643 ca=114
-  5!c=14168 g=14169 pq=1 qp=0 dt=45465/140000000000000/0 df=161 of=11 ql=0/0 qs=N... kt=0/O ktl=b4d b=10 ci=47712 nci=0 co=677 ca=722
-  6 c=14404 g=14405 pq=1 qp=0 dt=59454/0/0 df=94 of=6 ql=0/0 qs=.... kt=0/W ktl=e57 b=10 ci=55597 nci=0 co=701 ca=811
-  7 c=14407 g=14408 pq=1 qp=1 dt=68850/0/0 df=31 of=8 ql=0/0 qs=.... kt=0/W ktl=14bd b=10 ci=77475 nci=0 co=508 ca=1042
+  0!c=12865 g=12866 pq=1/0 qp=1 dt=83113/140000000000000/0 df=288 of=11 ql=0/0 qs=N... kt=0/O ktl=944 b=10 ci=60709 nci=0 co=748 ca=871
+  1 c=14407 g=14408 pq=1/0 qp=0 dt=100679/140000000000000/0 df=378 of=7 ql=0/119 qs=NRW. kt=0/W ktl=9b6 b=10 ci=109740 nci=0 co=589 ca=485
+  2 c=14407 g=14408 pq=1/0 qp=0 dt=105486/0/0 df=90 of=9 ql=0/89 qs=NRW. kt=0/W ktl=c0c b=10 ci=83113 nci=0 co=533 ca=490
+  3 c=14407 g=14408 pq=1/0 qp=0 dt=107138/0/0 df=142 of=8 ql=0/188 qs=NRW. kt=0/W ktl=b96 b=10 ci=121114 nci=0 co=426 ca=290
+  4 c=14405 g=14406 pq=1/0 qp=1 dt=50238/0/0 df=706 of=7 ql=0/0 qs=.... kt=0/W ktl=812 b=10 ci=34929 nci=0 co=643 ca=114
+  5!c=14168 g=14169 pq=1/0 qp=0 dt=45465/140000000000000/0 df=161 of=11 ql=0/0 qs=N... kt=0/O ktl=b4d b=10 ci=47712 nci=0 co=677 ca=722
+  6 c=14404 g=14405 pq=1/0 qp=0 dt=59454/0/0 df=94 of=6 ql=0/0 qs=.... kt=0/W ktl=e57 b=10 ci=55597 nci=0 co=701 ca=811
+  7 c=14407 g=14408 pq=1/0 qp=1 dt=68850/0/0 df=31 of=8 ql=0/0 qs=.... kt=0/W ktl=14bd b=10 ci=77475 nci=0 co=508 ca=1042
 
 This is similar to the output discussed above, but contains the following
 additional fields:
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 386ba288084a..7f08f5079757 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -331,6 +331,7 @@ static inline void rcu_init_nohz(void)
 extern struct srcu_struct tasks_rcu_exit_srcu;
 #define rcu_note_voluntary_context_switch(t) \
 	do { \
+		rcu_all_qs(); \
 		if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
 			ACCESS_ONCE((t)->rcu_tasks_holdout) = false; \
 	} while (0)
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 984192160e9b..937edaeb150d 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -186,7 +186,10 @@ static inline bool rcu_is_watching(void)
 	return true;
 }
 
-
 #endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
 
+static inline void rcu_all_qs(void)
+{
+}
+
 #endif /* __LINUX_RCUTINY_H */
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index c0dd124e69ec..fbfd61d40e73 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -100,4 +100,10 @@ extern int rcu_scheduler_active __read_mostly;
 
 bool rcu_is_watching(void);
 
+DECLARE_PER_CPU(unsigned long, rcu_qs_ctr);
+static inline void rcu_all_qs(void)
+{
+	this_cpu_inc(rcu_qs_ctr);
+}
+
 #endif /* __LINUX_RCUTREE_H */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 57fd8f5bd1ad..58ea6bf55fd2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -219,6 +219,9 @@ static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
 #endif /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
 };
 
+DEFINE_PER_CPU_SHARED_ALIGNED(unsigned long, rcu_qs_ctr);
+EXPORT_PER_CPU_SYMBOL_GPL(rcu_qs_ctr);
+
 /*
  * Let the RCU core know that this CPU has gone through the scheduler,
  * which is a quiescent state.  This is called when the need for a
@@ -1630,6 +1633,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
 		rdp->gpnum = rnp->gpnum;
 		trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpustart"));
 		rdp->passed_quiesce = 0;
+		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
 		rdp->qs_pending = !!(rnp->qsmask & rdp->grpmask);
 		zero_cpu_stall_ticks(rdp);
 		ACCESS_ONCE(rdp->gpwrap) = false;
@@ -2096,8 +2100,10 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
 	rnp = rdp->mynode;
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	smp_mb__after_unlock_lock();
-	if (rdp->passed_quiesce == 0 || rdp->gpnum != rnp->gpnum ||
-	    rnp->completed == rnp->gpnum || rdp->gpwrap) {
+	if ((rdp->passed_quiesce == 0 &&
+	     rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr)) ||
+	    rdp->gpnum != rnp->gpnum || rnp->completed == rnp->gpnum ||
+	    rdp->gpwrap) {
 
 		/*
 		 * The grace period in which this quiescent state was
@@ -2106,6 +2112,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
 		 * within the current grace period.
 		 */
 		rdp->passed_quiesce = 0;	/* need qs for new gp. */
+		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		return;
 	}
@@ -2150,7 +2157,8 @@ rcu_check_quiescent_state(struct rcu_state *rsp, struct rcu_data *rdp)
 	 * Was there a quiescent state since the beginning of the grace
 	 * period? If no, then exit and wait for the next call.
 	 */
-	if (!rdp->passed_quiesce)
+	if (!rdp->passed_quiesce &&
+	    rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr))
 		return;
 
 	/*
@@ -3206,9 +3214,12 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
 
 	/* Is the RCU core waiting for a quiescent state from this CPU? */
 	if (rcu_scheduler_fully_active &&
-	    rdp->qs_pending && !rdp->passed_quiesce) {
+	    rdp->qs_pending && !rdp->passed_quiesce &&
+	    rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr)) {
 		rdp->n_rp_qs_pending++;
-	} else if (rdp->qs_pending && rdp->passed_quiesce) {
+	} else if (rdp->qs_pending &&
+		   (rdp->passed_quiesce ||
+		    rdp->rcu_qs_ctr_snap != __this_cpu_read(rcu_qs_ctr))) {
 		rdp->n_rp_report_qs++;
 		return 1;
 	}
@@ -3542,6 +3553,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 			rdp->gpnum = rnp->completed;
 			rdp->completed = rnp->completed;
 			rdp->passed_quiesce = 0;
+			rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
 			rdp->qs_pending = 0;
 			trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpuonl"));
 		}
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4c156b43ff1a..f1dfc0bbb498 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -252,6 +252,8 @@ struct rcu_data {
 					/*  in order to detect GP end. */
 	unsigned long	gpnum;		/* Highest gp number that this CPU */
 					/*  is aware of having started. */
+	unsigned long	rcu_qs_ctr_snap;/* Snapshot of rcu_qs_ctr to check */
+					/*  for rcu_all_qs() invocations. */
 	bool		passed_quiesce;	/* User-mode/idle loop etc. */
 	bool		qs_pending;	/* Core waits for quiesc state. */
 	bool		beenonline;	/* CPU online at least once. */
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index 5cdc62e1beeb..4ec028a9987a 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -115,11 +115,13 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp)
 
 	if (!rdp->beenonline)
 		return;
-	seq_printf(m, "%3d%cc=%ld g=%ld pq=%d qp=%d",
+	seq_printf(m, "%3d%cc=%ld g=%ld pq=%d/%d qp=%d",
 		   rdp->cpu,
 		   cpu_is_offline(rdp->cpu) ? '!' : ' ',
 		   ulong2long(rdp->completed), ulong2long(rdp->gpnum),
-		   rdp->passed_quiesce, rdp->qs_pending);
+		   rdp->passed_quiesce,
+		   rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr),
+		   rdp->qs_pending);
 	seq_printf(m, " dt=%d/%llx/%d df=%lu",
 		   atomic_read(&rdp->dynticks->dynticks),
 		   rdp->dynticks->dynticks_nesting,
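
(Usage sketch, not part of the patch above: with this change, a long-running
loop that already calls cond_resched_rcu_qs() now also reports a quiescent
state to the normal RCU flavors, since rcu_note_voluntary_context_switch()
gains the rcu_all_qs() call.  The loop body below is hypothetical.)

	for (i = 0; i < nr_items; i++) {
		process_item(i);	/* hypothetical per-item work */
		cond_resched_rcu_qs();	/* now also increments rcu_qs_ctr */
	}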


^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: [PATCH] sched: Fix lost reschedule in __cond_resched()
  2014-12-13  7:36                                                                 ` [PATCH] sched: Fix lost reschedule in __cond_resched() Ingo Molnar
@ 2014-12-14 18:04                                                                   ` Frederic Weisbecker
  2014-12-14 19:43                                                                     ` Ingo Molnar
  2014-12-14 19:50                                                                     ` Linus Torvalds
  0 siblings, 2 replies; 486+ messages in thread
From: Frederic Weisbecker @ 2014-12-14 18:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Mike Galbraith,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Sat, Dec 13, 2014 at 08:36:34AM +0100, Ingo Molnar wrote:
> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > I'm also not sure if the bug ever happens with preemption 
> > disabled. Sasha, was that you who reported that you cannot 
> > reproduce it without preemption? It strikes me that there's a 
> > race condition in __cond_resched() wrt preemption, for example: 
> > we do
> > 
> >         __preempt_count_add(PREEMPT_ACTIVE);
> >         __schedule();
> >         __preempt_count_sub(PREEMPT_ACTIVE);
> > 
> > and in between the __schedule() and __preempt_count_sub(), if 
> > an interrupt comes in and wakes up some important process, it 
> > won't reschedule (because preemption is active), but then we 
> > enable preemption again and don't check whether we should 
> > reschedule (again), and we just go on our merry ways.
> 
> Indeed, that's a really good find regardless of whether it's the 
> source of these lockups - the (untested) patch below ought to 
> cure that.
> 
> > Now, I don't see how that could really matter for a long time - 
> > returning to user space will check need_resched, and sleeping 
> > will obviously force a reschedule anyway, so these kinds of 
> > races should at most delay things by just a tiny amount, but 
> > maybe there is some case where we screw up in a bigger way. So 
> > I do *not* believe that the one in __cond_resched() matters, 
> > but I'm giving it as an example of the kind of things that 
> > could go wrong.
> 
> (as you later note) NOHZ is somewhat special in this regard, 
> because there we try really hard not to run anything 
> periodically, so a lost reschedule will matter more.
> 
> But ... I'd be surprised if this patch made a difference: it 
> should normally not be possible to go idle with tasks on the 
> runqueue (even with this bug present), and with at least one busy 
> task on the CPU we get the regular scheduler tick which ought to 
> hide such latencies.
> 
> It's nevertheless a good thing to fix, I'm just not sure it's the 
> root cause of the observed lockup here.
> 
> Thanks,
> 
> 	Ingo
> 
> --
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> 
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bb398c0c5f08..532809aa0544 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4207,6 +4207,8 @@ static void __cond_resched(void)
>  	__preempt_count_add(PREEMPT_ACTIVE);
>  	__schedule();
>  	__preempt_count_sub(PREEMPT_ACTIVE);
> +	if (need_resched())
> +		__schedule();
>  }

Nice catch! This indeed matters a lot for full nohz, where a lost reschedule
interrupt might be ignored and not fixed up by a nearby tick. And even if it
is fixed by a tick, a reschedule delayed by up to HZ is a latency issue.

Anyway, the above __schedule() call should probably stay wrapped as a
preemption point, to make sure that a task in TASK_[UN]INTERRUPTIBLE is
handled as expected and is not deactivated early.

Such as:

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 240157c..6e942f3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2922,6 +2922,21 @@ void __sched schedule_preempt_disabled(void)
 	preempt_disable();
 }
 
+static void __preempt_schedule(void)
+{
+	do {
+		__preempt_count_add(PREEMPT_ACTIVE);
+		__schedule();
+		__preempt_count_sub(PREEMPT_ACTIVE);
+
+		/*
+		 * Check again in case we missed a preemption opportunity
+		 * between schedule and now.
+		 */
+		barrier();
+	} while (need_resched());
+}
+
 #ifdef CONFIG_PREEMPT
 /*
  * this is the entry point to schedule() from in-kernel preemption
@@ -2937,17 +2952,7 @@ asmlinkage __visible void __sched notrace preempt_schedule(void)
 	if (likely(!preemptible()))
 		return;
 
-	do {
-		__preempt_count_add(PREEMPT_ACTIVE);
-		__schedule();
-		__preempt_count_sub(PREEMPT_ACTIVE);
-
-		/*
-		 * Check again in case we missed a preemption opportunity
-		 * between schedule and now.
-		 */
-		barrier();
-	} while (need_resched());
+	__preempt_schedule();
 }
 NOKPROBE_SYMBOL(preempt_schedule);
 EXPORT_SYMBOL(preempt_schedule);
@@ -4249,9 +4254,7 @@ SYSCALL_DEFINE0(sched_yield)
 
 static void __cond_resched(void)
 {
-	__preempt_count_add(PREEMPT_ACTIVE);
-	__schedule();
-	__preempt_count_sub(PREEMPT_ACTIVE);
+	__preempt_schedule();
 }
 
 int __sched _cond_resched(void)

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: [PATCH] sched: Fix lost reschedule in __cond_resched()
  2014-12-14 18:04                                                                   ` Frederic Weisbecker
@ 2014-12-14 19:43                                                                     ` Ingo Molnar
  2014-12-14 19:50                                                                     ` Linus Torvalds
  1 sibling, 0 replies; 486+ messages in thread
From: Ingo Molnar @ 2014-12-14 19:43 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Linus Torvalds, Dave Jones, Chris Mason, Mike Galbraith,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Sat, Dec 13, 2014 at 08:36:34AM +0100, Ingo Molnar wrote:
> > 
> > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> > > I'm also not sure if the bug ever happens with preemption 
> > > disabled. Sasha, was that you who reported that you cannot 
> > > reproduce it without preemption? It strikes me that there's a 
> > > race condition in __cond_resched() wrt preemption, for example: 
> > > we do
> > > 
> > >         __preempt_count_add(PREEMPT_ACTIVE);
> > >         __schedule();
> > >         __preempt_count_sub(PREEMPT_ACTIVE);
> > > 
> > > and in between the __schedule() and __preempt_count_sub(), if 
> > > an interrupt comes in and wakes up some important process, it 
> > > won't reschedule (because preemption is active), but then we 
> > > enable preemption again and don't check whether we should 
> > > reschedule (again), and we just go on our merry ways.
> > 
> > Indeed, that's a really good find regardless of whether it's the 
> > source of these lockups - the (untested) patch below ought to 
> > cure that.
> > 
> > > Now, I don't see how that could really matter for a long time - 
> > > returning to user space will check need_resched, and sleeping 
> > > will obviously force a reschedule anyway, so these kinds of 
> > > races should at most delay things by just a tiny amount, but 
> > > maybe there is some case where we screw up in a bigger way. So 
> > > I do *not* believe that the one in __cond_resched() matters, 
> > > but I'm giving it as an example of the kind of things that 
> > > could go wrong.
> > 
> > (as you later note) NOHZ is somewhat special in this regard, 
> > because there we try really hard not to run anything 
> > periodically, so a lost reschedule will matter more.
> > 
> > But ... I'd be surprised if this patch made a difference: it 
> > should normally not be possible to go idle with tasks on the 
> > runqueue (even with this bug present), and with at least one busy 
> > task on the CPU we get the regular scheduler tick which ought to 
> > hide such latencies.
> > 
> > It's nevertheless a good thing to fix, I'm just not sure it's the 
> > root cause of the observed lockup here.
> > 
> > Thanks,
> > 
> > 	Ingo
> > 
> > --
> > 
> > Reported-by: Linus Torvalds <torvalds@linux-foundation.org> 
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index bb398c0c5f08..532809aa0544 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4207,6 +4207,8 @@ static void __cond_resched(void)
> >  	__preempt_count_add(PREEMPT_ACTIVE);
> >  	__schedule();
> >  	__preempt_count_sub(PREEMPT_ACTIVE);
> > +	if (need_resched())
> > +		__schedule();
> >  }
> 
> Nice catch! This indeed matters a lot for full nohz where a lost reschedule
> interrupt might be ignored and not fixed with a near tick. Although even if
> it is fixed by a tick, a missed reschedule delayed by HZ involves latency issue.
> 
> Anyway, probably the above __schedule() should stay as a preemption point
> to make sure that a TASK_[UN]INTERRUPTIBLE is handled as expected and avoids
> early task deactivation.
> 
> Such as:
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 240157c..6e942f3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2922,6 +2922,21 @@ void __sched schedule_preempt_disabled(void)
>  	preempt_disable();
>  }
>  
> +static void __preempt_schedule(void)
> +{
> +	do {
> +		__preempt_count_add(PREEMPT_ACTIVE);
> +		__schedule();
> +		__preempt_count_sub(PREEMPT_ACTIVE);
> +
> +		/*
> +		 * Check again in case we missed a preemption opportunity
> +		 * between schedule and now.
> +		 */
> +		barrier();
> +	} while (need_resched());
> +}
> +
>  #ifdef CONFIG_PREEMPT
>  /*
>   * this is the entry point to schedule() from in-kernel preemption
> @@ -2937,17 +2952,7 @@ asmlinkage __visible void __sched notrace preempt_schedule(void)
>  	if (likely(!preemptible()))
>  		return;
>  
> -	do {
> -		__preempt_count_add(PREEMPT_ACTIVE);
> -		__schedule();
> -		__preempt_count_sub(PREEMPT_ACTIVE);
> -
> -		/*
> -		 * Check again in case we missed a preemption opportunity
> -		 * between schedule and now.
> -		 */
> -		barrier();
> -	} while (need_resched());
> +	__preempt_schedule();
>  }
>  NOKPROBE_SYMBOL(preempt_schedule);
>  EXPORT_SYMBOL(preempt_schedule);
> @@ -4249,9 +4254,7 @@ SYSCALL_DEFINE0(sched_yield)
>  
>  static void __cond_resched(void)
>  {
> -	__preempt_count_add(PREEMPT_ACTIVE);
> -	__schedule();
> -	__preempt_count_sub(PREEMPT_ACTIVE);
> +	__preempt_schedule();
>  }

Yeah, agreed, your variant is even nicer.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: [PATCH] sched: Fix lost reschedule in __cond_resched()
  2014-12-14 18:04                                                                   ` Frederic Weisbecker
  2014-12-14 19:43                                                                     ` Ingo Molnar
@ 2014-12-14 19:50                                                                     ` Linus Torvalds
  2014-12-14 20:30                                                                       ` Frederic Weisbecker
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-14 19:50 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Dave Jones, Chris Mason, Mike Galbraith,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Sun, Dec 14, 2014 at 10:04 AM, Frederic Weisbecker
<fweisbec@gmail.com> wrote:
>
> Such as:

So I like your patch, but quite frankly, can we go one step further?

Look at the callers of __schedule().

EVERY SINGLE ONE now has that loop around it that goes along the lines of

   do {
      .. disable preemption somehow ..
      __schedule();
      ...enable preemption without scheduling ..
   } while (need_resched());

except for one - the regular "schedule()" function.

Furthermore, look inside __schedule() itself: it has the same loop,
except with a count of one.

So I would suggest going the extra mile, and
 - remove the loop from __schedule() itself
 - add the same loop as everywhere else to "schedule()"

IOW, just make this "you have to loop and disable preemption" thing be
a rule that __schedule() can depend on.
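
For reference, a rough sketch (illustrative only, not a patch from this
thread) of what schedule() could look like with the loop pulled out of
__schedule() and into its callers:

asmlinkage __visible void __sched schedule(void)
{
	struct task_struct *tsk = current;

	sched_submit_work(tsk);
	do {
		/* callers now own the "disable preemption and loop" rule */
		preempt_disable();
		__schedule();
		sched_preempt_enable_no_resched();
	} while (need_resched());
}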

                 Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: [PATCH] sched: Fix lost reschedule in __cond_resched()
  2014-12-14 19:50                                                                     ` Linus Torvalds
@ 2014-12-14 20:30                                                                       ` Frederic Weisbecker
  0 siblings, 0 replies; 486+ messages in thread
From: Frederic Weisbecker @ 2014-12-14 20:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Dave Jones, Chris Mason, Mike Galbraith,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Sun, Dec 14, 2014 at 11:50:20AM -0800, Linus Torvalds wrote:
> On Sun, Dec 14, 2014 at 10:04 AM, Frederic Weisbecker
> <fweisbec@gmail.com> wrote:
> >
> > Such as:
> 
> So I like your patch, but quite frankly, can we go one step further?
> 
> Look at the callers of __schedule().
> 
> EVERY SINGLE ONE now has that loop around it that goes along the lines of
> 
>    do {
>       .. disable preemption somehow ..
>       __schedule();
>       ...enable preemption without scheduling ..
>    } while (need_resched());
> 
> except for one - the regular "schedule()" function.
> 
> Furthermore, look inside __schedule() itself: it has the same loop,
> except with a count of one.
> 
> So I would suggest going the extra mile, and
>  - remove the loop from __schedule() itself

That sounds like a good idea. Unless the loop inside __schedule()
is very frequent and sensitive enough to show visible overhead if we
force it to pass through the preempt_count_add/sub() and local_irq_*()
operations in the preempt_schedule_*() functions.

I suspect it's not, so I'm cooking that patch.

>  - add the same loop as everywhere else to "schedule()"

Right. I'm doing that too.

> IOW, just make this "you have to loop and disable preemption" thing be
> a rule that __schedule() can depend on.

Ok. It would be nice if we could have a common function that does the loop
and the PREEMPT_ACTIVE increments. But the variable code is inside that loop,
so that's only factorizable with a function pointer (a no-go in that fast path)
or a macro, which would make things even uglier.

So I think I'll just keep all those loops explicit.

Thanks.

>                  Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-14 17:50                                                                             ` Paul E. McKenney
@ 2014-12-14 23:46                                                                               ` Sasha Levin
  2014-12-15  0:11                                                                                 ` Paul E. McKenney
  0 siblings, 1 reply; 486+ messages in thread
From: Sasha Levin @ 2014-12-14 23:46 UTC (permalink / raw)
  To: paulmck
  Cc: Ingo Molnar, David Lang, Linus Torvalds, Dave Jones, Chris Mason,
	Mike Galbraith, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List

On 12/14/2014 12:50 PM, Paul E. McKenney wrote:
> rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors
> 
> Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
> in places where it would be useful for it to apply to the normal RCU
> flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
> case for workloads that aggressively overload the system, particularly
> those that generate large numbers of RCU updates on systems running
> NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
> from cond_resched_rcu_qs() to the normal RCU flavors.
> 
> Note that it is unfortunately necessary to leave the old ->passed_quiesce
> mechanism in place to allow quiescent states that apply to only one
> flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
> that case, but that is not so good for debugging of RCU internals.)
> 
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Does it depend on anything not currently in -next? My build fails with

kernel/rcu/tree.c: In function ‘rcu_report_qs_rdp’:
kernel/rcu/tree.c:2099:6: error: ‘struct rcu_data’ has no member named ‘gpwrap’
   rdp->gpwrap) {
      ^

On an unrelated subject, I've tried disabling preemption, and am seeing different
stalls even when I have the testfiles fuzzing in trinity disabled (which means I'm
not seeing hangs in the preempt case):

[  332.920142] INFO: rcu_sched self-detected stall on CPU
[  332.920142] 	19: (2099 ticks this GP) idle=f7d/140000000000001/0 softirq=21726/21726 fqs=1751
[  332.920142] 	 (t=2100 jiffies g=10656 c=10655 q=212427)
[  332.920142] Task dump for CPU 19:
[  332.920142] trinity-c522    R  running task    13544  9447   8279 0x1008000a
[  332.920142]  00000000000034e8 00000000000034e8 ffff8808a678a000 ffff8808bc203c18
[  332.920142]  ffffffff814b66f6 dfffe900000054de 0000000000000013 ffff8808bc215800
[  332.920142]  0000000000000013 ffffffff9cb5d018 dfffe90000000000 ffff8808bc203c48
[  332.920142] Call Trace:
[  332.920142] <IRQ> sched_show_task (kernel/sched/core.c:4541)
[  332.920142] dump_cpu_task (kernel/sched/core.c:8383)
[  332.940081] INFO: rcu_sched detected stalls on CPUs/tasks:
[  332.920142] rcu_dump_cpu_stacks (kernel/rcu/tree.c:1093)
[  332.920142] rcu_check_callbacks (kernel/rcu/tree.c:1199 kernel/rcu/tree.c:1261 kernel/rcu/tree.c:3194 kernel/rcu/tree.c:3254 kernel/rcu/tree.c:2507)
[  332.920142] update_process_times (./arch/x86/include/asm/preempt.h:22 kernel/time/timer.c:1386)
[  332.920142] tick_sched_timer (kernel/time/tick-sched.c:152 kernel/time/tick-sched.c:1128)
[  332.920142] __run_hrtimer (kernel/time/hrtimer.c:1216 (discriminator 3))
[  332.920142] ? tick_init_highres (kernel/time/tick-sched.c:1115)
[  332.920142] hrtimer_interrupt (include/linux/timerqueue.h:37 kernel/time/hrtimer.c:1275)
[  332.920142] ? acct_account_cputime (kernel/tsacct.c:168)
[  332.920142] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:921)
[  332.920142] smp_apic_timer_interrupt (./arch/x86/include/asm/apic.h:660 arch/x86/kernel/apic/apic.c:945)
[  332.920142] apic_timer_interrupt (arch/x86/kernel/entry_64.S:983)
[  332.920142] <EOI> ? retint_restore_args (arch/x86/kernel/entry_64.S:844)
[  332.920142] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
[  332.920142] __debug_check_no_obj_freed (lib/debugobjects.c:713)
[  332.920142] debug_check_no_obj_freed (lib/debugobjects.c:727)
[  332.920142] free_pages_prepare (mm/page_alloc.c:829)
[  332.920142] free_hot_cold_page (mm/page_alloc.c:1496)
[  332.920142] __free_pages (mm/page_alloc.c:2982)
[  332.920142] ? __vunmap (mm/vmalloc.c:1459 (discriminator 2))
[  332.920142] __vunmap (mm/vmalloc.c:1455 (discriminator 2))
[  332.920142] vfree (mm/vmalloc.c:1500)
[  332.920142] SyS_init_module (kernel/module.c:2483 kernel/module.c:3359 kernel/module.c:3346)
[  332.920142] ia32_do_call (arch/x86/ia32/ia32entry.S:446)


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-13 22:40                                                                     ` Linus Torvalds
  2014-12-13 22:59                                                                       ` Linus Torvalds
@ 2014-12-14 23:46                                                                       ` Dave Jones
  2014-12-15  0:38                                                                         ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-14 23:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Sat, Dec 13, 2014 at 02:40:51PM -0800, Linus Torvalds wrote:
 > On Sat, Dec 13, 2014 at 2:36 PM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > Ok, I think we can rule out preemption. I just checked on it, and
 > > found it wedged.
 > 
 > Ok, one more. Mind checking what happens without CONFIG_DEBUG_PAGEALLOC?

Crap. Looks like it wedged. It's stuck that way until I get back to it
on Wednesday.


[ 6188.985536] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c175:14205]
[ 6188.985612] CPU: 1 PID: 14205 Comm: trinity-c175 Not tainted 3.18.0+ #103 [loadavg: 200.63 151.07 150.40 179/407 17316]
[ 6188.985652] task: ffff880056ac96d0 ti: ffff8800975d8000 task.ti: ffff8800975d8000
[ 6188.985680] RIP: 0010:[<ffffffff810c6430>]  [<ffffffff810c6430>] lock_release+0xc0/0x240
[ 6188.985714] RSP: 0018:ffff8800975dbaa8  EFLAGS: 00000292
[ 6188.985734] RAX: ffff880056ac96d0 RBX: ffff8800975dbaf0 RCX: 00000000000003a0
[ 6188.985759] RDX: ffff88024500dd20 RSI: 0000000000000000 RDI: ffff880056ac9e40
[ 6188.985785] RBP: ffff8800975dbad8 R08: 0000000000000000 R09: 0000000000000000
[ 6188.985810] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000292
[ 6188.985835] R13: ffff8800975dba28 R14: 0000000000000292 R15: 0000000000000292
[ 6188.985861] FS:  00007f107fc69740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[ 6188.985890] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6188.985912] CR2: 00007fff7457af40 CR3: 00000000145ed000 CR4: 00000000001407e0
[ 6188.985937] DR0: 00007f322081b000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6188.985963] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 6188.985988] Stack:
[ 6188.986000]  ffff8800975dbb08 0000000000000000 0000000000000000 00000000001ce380
[ 6188.986034]  ffff8800975dbd08 ffff8802451ce380 ffff8800975dbbd8 ffffffff8116f928
[ 6188.986067]  ffffffff8116f842 ffff8800975dbaf0 ffff8800975dbaf0 0000000000000001
[ 6188.986101] Call Trace:
[ 6188.986116]  [<ffffffff8116f928>] __perf_sw_event+0x168/0x240
[ 6188.987079]  [<ffffffff8116f842>] ? __perf_sw_event+0x82/0x240
[ 6188.988045]  [<ffffffff81178ab2>] ? __lock_page_or_retry+0xb2/0xc0
[ 6188.989008]  [<ffffffff811a68f8>] ? handle_mm_fault+0x458/0xe90
[ 6188.989986]  [<ffffffff8104250e>] __do_page_fault+0x28e/0x5c0
[ 6188.990940]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6188.991884]  [<ffffffff8107c25d>] ? __do_softirq+0x1ed/0x310
[ 6188.992826]  [<ffffffff817d09e0>] ? retint_restore_args+0xe/0xe
[ 6188.993773]  [<ffffffff8137511d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 6188.994715]  [<ffffffff8104284c>] do_page_fault+0xc/0x10
[ 6188.995658]  [<ffffffff817d1a32>] page_fault+0x22/0x30
[ 6188.996590]  [<ffffffff81375266>] ? __clear_user+0x36/0x60
[ 6188.997518]  [<ffffffff81375247>] ? __clear_user+0x17/0x60
[ 6188.998440]  [<ffffffff8100f3f1>] save_xstate_sig+0x81/0x220
[ 6188.999362]  [<ffffffff817cf1cf>] ? _raw_spin_unlock_irqrestore+0x4f/0x60
[ 6189.000291]  [<ffffffff810029e7>] do_signal+0x5c7/0x740
[ 6189.001220]  [<ffffffff81209acf>] ? mnt_drop_write+0x2f/0x40
[ 6189.002164]  [<ffffffff811e527e>] ? chmod_common+0xfe/0x150
[ 6189.003096]  [<ffffffff81002bc5>] do_notify_resume+0x65/0x80
[ 6189.004038]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6189.004972]  [<ffffffff817d00ff>] int_signal+0x12/0x17
[ 6189.005899] Code: ff 0f 85 7c 00 00 00 4c 89 ea 4c 89 e6 48 89 df e8 26 fc ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 00 00 41 56 9d <48> 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d f3 c3 65 ff 04 25 e0 
[ 6189.007935] sending NMI to other CPUs:
[ 6189.008904] NMI backtrace for cpu 2
[ 6189.009755] CPU: 2 PID: 14224 Comm: trinity-c194 Not tainted 3.18.0+ #103 [loadavg: 200.63 151.07 150.40 179/407 17316]
[ 6189.010618] task: ffff880224af5b40 ti: ffff880225aec000 task.ti: ffff880225aec000
[ 6189.011555] RIP: 0010:[<ffffffff81176890>]  [<ffffffff81176890>] pagecache_get_page+0x0/0x220
[ 6189.012501] RSP: 0018:ffff880225aefb50  EFLAGS: 00000282
[ 6189.013442] RAX: ffff88023f4b9d00 RBX: 00007fff7457b07f RCX: 0000000000000000
[ 6189.014396] RDX: 0000000000000000 RSI: 000000000001dda7 RDI: ffffffff81c6aa80
[ 6189.015357] RBP: ffff880225aefb58 R08: 0000000000000000 R09: 0000000007769c80
[ 6189.016317] R10: 0000000000000000 R11: 0000000000000029 R12: ffff88022a63be00
[ 6189.017283] R13: ffff8801c64afd10 R14: ffff880000000bd8 R15: ffff880187efea40
[ 6189.018253] FS:  00007f107fc69740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[ 6189.019247] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6189.020245] CR2: 00007fff7457b07f CR3: 000000022789e000 CR4: 00000000001407e0
[ 6189.021230] DR0: 00007f322081b000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6189.022193] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 6189.023155] Stack:
[ 6189.024104]  ffffffff811b99da ffff880225aefbf8 ffffffff811a68a1 ffff880225aefbd8
[ 6189.025084]  0000000000000246 ffffffff81042418 ffffffff00000000 0000000000000000
[ 6189.026063]  0000000000000246 0000000100000000 ffff880187efeb58 0000000000000080
[ 6189.027025] Call Trace:
[ 6189.027962]  [<ffffffff811b99da>] ? lookup_swap_cache+0x2a/0x70
[ 6189.028897]  [<ffffffff811a68a1>] handle_mm_fault+0x401/0xe90
[ 6189.029819]  [<ffffffff81042418>] ? __do_page_fault+0x198/0x5c0
[ 6189.030731]  [<ffffffff8104247c>] __do_page_fault+0x1fc/0x5c0
[ 6189.031635]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6189.032537]  [<ffffffff8107c25d>] ? __do_softirq+0x1ed/0x310
[ 6189.033432]  [<ffffffff817d09e0>] ? retint_restore_args+0xe/0xe
[ 6189.034334]  [<ffffffff8137511d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 6189.035238]  [<ffffffff8104284c>] do_page_fault+0xc/0x10
[ 6189.036146]  [<ffffffff817d1a32>] page_fault+0x22/0x30
[ 6189.037043]  [<ffffffff8100f408>] ? save_xstate_sig+0x98/0x220
[ 6189.037934]  [<ffffffff8100f3f1>] ? save_xstate_sig+0x81/0x220
[ 6189.038819]  [<ffffffff810029e7>] do_signal+0x5c7/0x740
[ 6189.039699]  [<ffffffff817cf210>] ? _raw_spin_unlock_irq+0x30/0x40
[ 6189.040583]  [<ffffffff81002bc5>] do_notify_resume+0x65/0x80
[ 6189.041464]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6189.042340]  [<ffffffff817d00ff>] int_signal+0x12/0x17
[ 6189.043210] Code: f0 80 a6 81 48 89 df e8 7f a5 02 00 0f 0b 48 89 df e8 45 fd ff ff 48 89 df e8 8d e4 00 00 eb 83 66 66 2e 0f 1f 84 00 00 00 00 00 <0f> 1f 44 00 00 55 48 89 e5 41 57 45 89 c7 41 56 49 89 f6 41 55 
[ 6189.045130] NMI backtrace for cpu 3
[ 6189.045244] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 36.225 msecs
[ 6189.046980] CPU: 3 PID: 14076 Comm: trinity-c46 Not tainted 3.18.0+ #103 [loadavg: 200.63 151.07 150.40 181/407 17316]
[ 6189.047934] task: ffff88008a6c4470 ti: ffff8801cdb58000 task.ti: ffff8801cdb58000
[ 6189.048893] RIP: 0010:[<ffffffff810c63a4>]  [<ffffffff810c63a4>] lock_release+0x34/0x240
[ 6189.049867] RSP: 0000:ffff8801cdb5bad0  EFLAGS: 00000296
[ 6189.050834] RAX: ffff88008a6c4470 RBX: ffff88013f39e4a8 RCX: 00000000000003a0
[ 6189.051815] RDX: ffffffff81178a7f RSI: 0000000000000001 RDI: ffff88013f39e518
[ 6189.052799] RBP: ffff8801cdb5bb08 R08: 0000000000000000 R09: 00000000073e8480
[ 6189.053781] R10: ffffea0007927a80 R11: 0000000000000029 R12: ffff88013f39e518
[ 6189.054764] R13: ffffffff81178a7f R14: ffff880000000bd8 R15: 0000000000000001
[ 6189.055748] FS:  00007f107fc69740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[ 6189.056746] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6189.057751] CR2: 00007fff7457b07f CR3: 0000000226f46000 CR4: 00000000001407e0
[ 6189.058770] DR0: 00007f322081b000 DR1: 0000000000[ 6216.969357] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c175:14205]
[ 6216.970331] CPU: 1 PID: 14205 Comm: trinity-c175 Tainted: G             L 3.18.0+ #103 [loadavg: 221.85 160.04 153.39 183/407 17316]
[ 6216.971359] task: ffff880056ac96d0 ti: ffff8800975d8000 task.ti: ffff8800975d8000
[ 6216.972366] RIP: 0010:[<ffffffff817cf1b8>]  [<ffffffff817cf1b8>] _raw_spin_unlock_irqrestore+0x38/0x60
[ 6216.973391] RSP: 0018:ffff8800975dba18  EFLAGS: 00000292
[ 6216.974423] RAX: 0000000000000001 RBX: ffff880056ac96d0 RCX: 0000000000005040
[ 6216.975459] RDX: ffff88024502f580 RSI: 0000000000000000 RDI: ffff88024e581e28
[ 6216.976507] RBP: ffff8800975dba28 R08: 0000000000000000 R09: ffff8800975dbaf0
[ 6216.977551] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000046
[ 6216.978594] R13: ffff8800975db9e8 R14: 0000000000000000 R15: 0000000000000000
[ 6216.979635] FS:  00007f107fc69740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[ 6216.980686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6216.981735] CR2: 00007fff7457af40 CR3: 00000000145ed000 CR4: 00000000001407e0
[ 6216.982774] DR0: 00007f322081b000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6216.983792] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 6216.984799] Stack:
[ 6216.985785]  ffff8800975dbad8 ffff8800975dbaf0 ffff8800975dba58 ffffffff810bcd56
[ 6216.986789]  ffff8800975dbac0 ffff8800975dbad8 ffff88024e581e28 0000000000000082
[ 6216.987794]  ffff8800975dbaa8 ffffffff817c9f4e ffff8800975dba78 000000010073e800
[ 6216.988770] Call Trace:
[ 6216.989729]  [<ffffffff810bcd56>] finish_wait+0x56/0x70
[ 6216.990693]  [<ffffffff817c9f4e>] __wait_on_bit+0x7e/0x90
[ 6216.991661]  [<ffffffff811789d7>] wait_on_page_bit_killable+0xc7/0xf0
[ 6216.992632]  [<ffffffff810bd050>] ? autoremove_wake_function+0x40/0x40
[ 6216.993609]  [<ffffffff81178ab2>] __lock_page_or_retry+0xb2/0xc0
[ 6216.994586]  [<ffffffff811a6e5c>] handle_mm_fault+0x9bc/0xe90
[ 6216.995555]  [<ffffffff81042418>] ? __do_page_fault+0x198/0x5c0
[ 6216.996516]  [<ffffffff8104247c>] __do_page_fault+0x1fc/0x5c0
[ 6216.997468]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6216.998428]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6216.999386]  [<ffffffff8107c25d>] ? __do_softirq+0x1ed/0x310
[ 6217.000330]  [<ffffffff817d09e0>] ? retint_restore_args+0xe/0xe
[ 6217.001269]  [<ffffffff8137511d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 6217.002205]  [<ffffffff8104284c>] do_page_fault+0xc/0x10
[ 6217.003136]  [<ffffffff817d1a32>] page_fault+0x22/0x30
[ 6217.004055]  [<ffffffff81375266>] ? __clear_user+0x36/0x60
[ 6217.004972]  [<ffffffff81375247>] ? __clear_user+0x17/0x60
[ 6217.005882]  [<ffffffff8100f3f1>] save_xstate_sig+0x81/0x220
[ 6217.006800]  [<ffffffff817cf1cf>] ? _raw_spin_unlock_irqrestore+0x4f/0x60
[ 6217.007716]  [<ffffffff810029e7>] do_signal+0x5c7/0x740
[ 6217.008635]  [<ffffffff81209acf>] ? mnt_drop_write+0x2f/0x40
[ 6217.009555]  [<ffffffff811e527e>] ? chmod_common+0xfe/0x150
[ 6217.010470]  [<ffffffff81002bc5>] do_notify_resume+0x65/0x80
[ 6217.011382]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6217.012297]  [<ffffffff817d00ff>] int_signal+0x12/0x17
[ 6217.013210] Code: fc 48 8b 55 08 53 48 8d 7f 18 48 89 f3 be 01 00 00 00 e8 cc 71 8f ff 4c 89 e7 e8 f4 a4 8f ff f6 c7 02 74 17 e8 0a b0 97 ff 53 9d <5b> 65 ff 0c 25 e0 a9 00 00 41 5c 5d c3 0f 1f 00 53 9d e8 f1 ae 
[ 6217.015229] sending NMI to other CPUs:
[ 6217.016191] NMI backtrace for cpu 3
[ 6217.017110] CPU: 3 PID: 14076 Comm: trinity-c46 Tainted: G             L 3.18.0+ #103 [loadavg: 221.85 160.04 153.39 183/407 17316]
[ 6217.018066] task: ffff88008a6c4470 ti: ffff8801cdb58000 task.ti: ffff8801cdb58000
[ 6217.019021] RIP: 0010:[<ffffffff810c5071>]  [<ffffffff810c5071>] __lock_acquire.isra.31+0x1b1/0x9f0
[ 6217.019997] RSP: 0000:ffff8801cdb5b9d8  EFLAGS: 00000083
[ 6217.020972] RAX: 000000000000001e RBX: ffff88008a6c4470 RCX: 0000000000000002
[ 6217.021966] RDX: 0000000000000157 RSI: 0000000000000008 RDI: 0000000000000000
[ 6217.022961] RBP: ffff8801cdb5ba48 R08: 0000000000000000 R09: 0000000000000000
[ 6217.023953] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000001d
[ 6217.024945] R13: 0000000000000001 R14: ffffffff81c50e60 R15: ffff88008a6c4c18
[ 6217.025939] FS:  00007f107fc69740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[ 6217.026943] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6217.027943] CR2: 00007fff7457b07f CR3: 0000000226f46000 CR4: 00000000001407e0
[ 6217.028933] DR0: 00007f322081b000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6217.029896] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 6217.030846] Stack:
[ 6217.031772]  ffff88008a6c4470 ffff88024e52e040 ffff8801cdb5ba28 0000000000000092
[ 6217.032708]  ffff8801cdb5ba08 ffffffff810abab5 ffff8801cdb5ba88 ffffffff810c50ec
[ 6217.033625]  0000000000000296 0000000000000246 0000000000000000 0000000000000000
[ 6217.034537] Call Trace:
[ 6217.035425]  [<ffffffff810abab5>] ? local_clock+0x25/0x30
[ 6217.036316]  [<ffffffff810c50ec>] ? __lock_acquire.isra.31+0x22c/0x9f0
[ 6217.037210]  [<ffffffff810c5fbf>] lock_acquire+0x9f/0x120
[ 6217.038103]  [<ffffffff811766d5>] ? find_get_entry+0x5/0x120
[ 6217.038995]  [<ffffffff81176717>] find_get_entry+0x47/0x120
[ 6217.039891]  [<ffffffff811766d5>] ? find_get_entry+0x5/0x120
[ 6217.040776]  [<ffffffff811768bf>] pagecache_get_page+0x2f/0x220
[ 6217.041653]  [<ffffffff8116f842>] ? __perf_sw_event+0x82/0x240
[ 6217.042527]  [<ffffffff811b99da>] lookup_swap_cache+0x2a/0x70
[ 6217.043399]  [<ffffffff811a68a1>] handle_mm_fault+0x401/0xe90
[ 6217.044273]  [<ffffffff81042418>] ? __do_page_fault+0x198/0x5c0
[ 6217.045140]  [<ffffffff8104247c>] __do_page_fault+0x1fc/0x5c0
[ 6217.045999]  [<ffffffff8107c25d>] ? __do_softirq+0x1ed/0x310
[ 6217.046857]  [<ffffffff817d09e0>] ? retint_restore_args+0xe/0xe
[ 6217.047713]  [<ffffffff81042358>] ? __do_page_fault+0xd8/0x5c0
[ 6217.048562]  [<ffffffff8137511d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 6217.049414]  [<ffffffff8104284c>] do_page_fault+0xc/0x10
[ 6217.050263]  [<ffffffff817d1a32>] page_fault+0x22/0x30
[ 6217.051108]  [<ffffffff8100f408>] ? save_xstate_sig+0x98/0x220
[ 6217.051953]  [<ffffffff8100f3f1>] ? save_xstate_sig+0x81/0x220
[ 6217.052787]  [<ffffffff810029e7>] do_signal+0x5c7/0x740
[ 6217.053620]  [<ffffffff817cf210>] ? _raw_spin_unlock_irq+0x30/0x40
[ 6217.054457]  [<ffffffff81002bc5>] do_notify_resume+0x65/0x80
[ 6217.055294]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6217.056133]  [<ffffffff817d00ff>] int_signal+0x12/0x17
[ 6217.056977] Code: ea 48 8d 34 d5 00 00 00 00 48 c1 e2 06 48 29 f2 4c 8d bc 13 70 07 00 00 41 0f b7 57 f8 81 e2 ff 1f 00 00 39 d0 0f 84 3f 01 00 00 <41> 0f b7 57 30 66 25 ff 1f 4c 89 4d c8 41 c1 e2 07 83 e1 03 4d 
[ 6217.058855] NMI backtrace for cpu 2
[ 6217.059739] CPU: 2 PID: 14224 Comm: trinity-c194 Tainted: G             L 3.18.0+ #103 [loadavg: 221.85 160.04 153.39 183/407 17316]
[ 6217.060662] task: ffff880224af5b40 ti: ffff880225aec000 task.ti: ffff880225aec000
[ 6217.061589] RIP: 0010:[<ffffffff810c6396>]  [<ffffffff810c6396>] lock_release+0x26/0x240
[ 6217.062533] RSP: 0018:ffff880225aefa10  EFLAGS: 00000046
[ 6217.063477] RAX: ffff880224af5b40 RBX: 0000000000000296 RCX: 0000000000000002
[ 6217.064435] RDX: ffffffff810bcd56 RSI: 0000000000000001 RDI: ffff88024e54da40
[ 6217.065392] RBP: ffff880225aefa28 R08: 0000000000000000 R09: ffff880225aefb10
[ 6217.066342] R[ 6217.158570] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 142.293 msecs
[ 6225.814243] INFO: rcu_sched self-detected stall on CPU
[ 6225.815127] 	3: (5990 ticks this GP) idle=f83/140000000000001/0 softirq=390686/390686 
[ 6225.816000] 	 (t=6000 jiffies g=166553 c=166552 q=0)
[ 6225.816870] Task dump for CPU 3:
[ 6225.817736] trinity-c46     R  running task    13568 14076  13551 0x1000000c
[ 6225.818626]  ffff88008a6c4470 00000000ef08e0c8 ffff880245403d68 ffffffff810a73fc
[ 6225.819531]  ffffffff810a7362 0000000000000003 0000000000000008 0000000000000003
[ 6225.820432]  ffffffff81c523c0 0000000000000092 ffff880245403d88 ffffffff810ab4ad
[ 6225.821340] Call Trace:
[ 6225.822218]  <IRQ>  [<ffffffff810a73fc>] sched_show_task+0x11c/0x190
[ 6225.823122]  [<ffffffff810a7362>] ? sched_show_task+0x82/0x190
[ 6225.824021]  [<ffffffff810ab4ad>] dump_cpu_task+0x3d/0x50
[ 6225.824916]  [<ffffffff810d9bf0>] rcu_dump_cpu_stacks+0x90/0xd0
[ 6225.825813]  [<ffffffff810e0783>] rcu_check_callbacks+0x503/0x770
[ 6225.826697]  [<ffffffff811304dc>] ? acct_account_cputime+0x1c/0x20
[ 6225.827581]  [<ffffffff810abde7>] ? account_system_time+0x97/0x180
[ 6225.828464]  [<ffffffff810e645b>] update_process_times+0x4b/0x80
[ 6225.829350]  [<ffffffff810f6e13>] ? tick_sched_timer+0x23/0x1b0
[ 6225.830233]  [<ffffffff810f6e3f>] tick_sched_timer+0x4f/0x1b0
[ 6225.831108]  [<ffffffff810e72ff>] __run_hrtimer+0xaf/0x240
[ 6225.831977]  [<ffffffff810e76eb>] ? hrtimer_interrupt+0x16b/0x260
[ 6225.832844]  [<ffffffff810f6df0>] ? tick_init_highres+0x20/0x20
[ 6225.833709]  [<ffffffff810e7687>] hrtimer_interrupt+0x107/0x260
[ 6225.834565]  [<ffffffff81031e9b>] local_apic_timer_interrupt+0x3b/0x70
[ 6225.835384]  [<ffffffff817d28c5>] smp_apic_timer_interrupt+0x45/0x60
[ 6225.836203]  [<ffffffff817d0caf>] apic_timer_interrupt+0x6f/0x80
[ 6225.837023]  <EOI>  [<ffffffff810c50ec>] ? __lock_acquire.isra.31+0x22c/0x9f0
[ 6225.837858]  [<ffffffff810c5fd4>] ? lock_acquire+0xb4/0x120
[ 6225.838688]  [<ffffffff81042418>] ? __do_page_fault+0x198/0x5c0
[ 6225.839517]  [<ffffffff810c225a>] down_read_trylock+0x5a/0x60
[ 6225.840345]  [<ffffffff81042418>] ? __do_page_fault+0x198/0x5c0
[ 6225.841175]  [<ffffffff81042418>] __do_page_fault+0x198/0x5c0
[ 6225.842004]  [<ffffffff8107c25d>] ? __do_softirq+0x1ed/0x310
[ 6225.842836]  [<ffffffff817d09e0>] ? retint_restore_args+0xe/0xe
[ 6225.843672]  [<ffffffff81042358>] ? __do_page_fault+0xd8/0x5c0
[ 6225.844506]  [<ffffffff8137511d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 6225.845340]  [<ffffffff8104284c>] do_page_fault+0xc/0x10
[ 6225.846174]  [<ffffffff817d1a32>] page_fault+0x22/0x30
[ 6225.847002]  [<ffffffff8100f408>] ? save_xstate_sig+0x98/0x220
[ 6225.847827]  [<ffffffff8100f3f1>] ? save_xstate_sig+0x81/0x220
[ 6225.848648]  [<ffffffff810029e7>] do_signal+0x5c7/0x740
[ 6225.849468]  [<ffffffff817cf210>] ? _raw_spin_unlock_irq+0x30/0x40
[ 6225.850287]  [<ffffffff81002bc5>] do_notify_resume+0x65/0x80
[ 6225.851104]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6225.851927]  [<ffffffff817d00ff>] int_signal+0x12/0x17
[ 6225.852746] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 6225.853609] 	3: (5991 ticks this GP) idle=f83/140000000000000/0 softirq=390686/390686 
[ 6225.854481] 	(detected by 1, t=6004 jiffies, g=166553, c=166552, q=0)
[ 6225.855354] Task dump for CPU 3:
[ 6225.856225] trinity-c46     R  running task    13568 14076  13551 0x1000000c
[ 6225.857127]  ffffffff810bcd38 ffff88008a6c4470 ffff8801cdb5b9c8 ffffffff810abab5
[ 6225.858045]  ffff88024e52e040 0000000000000046 ffff88008a6c4470 ffff88024e52e040
[ 6225.858951]  ffff8801cdb5ba28 0000000000000092 ffffffff810bcb37 ffffffff810abab5
[ 6225.859860] Call Trace:
[ 6225.860761]  [<ffffffff810c35df>] ? lock_release_holdtime.part.24+0xf/0x190
[ 6225.861690]  [<ffffffff810abab5>] ? local_clock+0x25/0x30
[ 6225.862612]  [<ffffffff810c35df>] lock_release_holdtime.part.24+0xf/0x190
[ 6225.863543]  [<ffffffff810abab5>] ? local_clock+0x25/0x30
[ 6225.864473]  [<ffffffff810c50ec>] ? __lock_acquire.isra.31+0x22c/0x9f0
[ 6225.865402]  [<ffffffff810bcd56>] ? finish_wait+0x56/0x70
[ 6225.866329]  [<ffffffff817c9f4e>] ? __wait_on_bit+0x7e/0x90
[ 6225.867236]  [<ffffffff811789d7>] ? wait_on_page_bit_killable+0xc7/0xf0
[ 6225.868122]  [<ffffffff810bd050>] ? autoremove_wake_function+0x40/0x40
[ 6225.868996]  [<ffffffff811b99da>] ? lookup_swap_cache+0x2a/0x70
[ 6225.869855]  [<ffffffff811a68f8>] ? handle_mm_fault+0x458/0xe90
[ 6225.870706]  [<ffffffff810c225a>] ? down_read_trylock+0x5a/0x60
[ 6225.871545]  [<ffffffff8104247c>] ? __do_page_fault+0x1fc/0x5c0
[ 6225.872385]  [<ffffffff8107c25d>] ? __do_softirq+0x1ed/0x310
[ 6225.873219]  [<ffffffff817d09e0>] ? retint_restore_args+0xe/0xe
[ 6225.874046]  [<ffffffff81042358>] ? __do_page_fault+0xd8/0x5c0
[ 6225.874873]  [<ffffffff8137511d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 6225.875692]  [<ffffffff8104284c>] ? do_page_fault+0xc/0x10
[ 6225.876503]  [<ffffffff817d1a32>] ? page_fault+0x22/0x30
[ 6225.877307]  [<ffffffff8100f408>] ? save_xstate_sig+0x98/0x220
[ 6225.878107]  [<ffffffff8100f3f1>] ? save_xstate_sig+0x81/0x220
[ 6225.878901]  [<ffffffff810029e7>] ? do_signal+0x5c7/0x740
[ 6225.879695]  [<ffffffff817cf210>] ? _raw_spin_unlock_irq+0x30/0x40
[ 6225.880496]  [<ffffffff81002bc5>] ? do_notify_resume+0x65/0x80
[ 6225.881292]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6225.882097]  [<ffffffff817d00ff>] ? int_signal+0x12/0x17
[ 6244.953181] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [trinity-c194:14224]
[ 6244.953995] CPU: 2 PID: 14224 Comm: trinity-c194 Tainted: G             L 3.18.0+ #103 [loadavg: 238.82 170.06 156.93 185/407 17316]
[ 6244.954854] task: ffff880224af5b40 ti: ffff880225aec000 task.ti: ffff880225aec000
[ 6244.955699] RIP: 0010:[<ffffffff810c5f60>]  [<ffffffff810c5f60>] lock_acquire+0x40/0x120
[ 6244.956560] RSP: 0018:ffff880225aefb78  EFLAGS: 00000246
[ 6244.957418] RAX: ffff880224af5b40 RBX: ffff8802453ce380 RCX: 0000000000000001
[ 6244.958281] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[ 6244.959146] RBP: ffff880225aefbd8 R08: 0000000000000001 R09: 0000000000000000
[ 6244.960008] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000001ce380
[ 6244.960867] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880225aefb28
[ 6244.961716] FS:  00007f107fc69740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[ 6244.962571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6244.963429] CR2: 00007fff7457b07f CR3: 000000022789e000 CR4: 00000000001407e0
[ 6244.964296] DR0: 00007f322081b000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6244.965160] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 6244.966025] Stack:
[ 6244.966881]  0000000005080021 ffffffff00000000 0000000000000000 0000000000000246
[ 6244.967776]  0000000100000000 ffff880187efeb58 0000000000000000 0000000000000029
[ 6244.968678]  00007fff7457b07f ffff880225aefd28 0000000000000002 ffff880187efea40
[ 6244.969564] Call Trace:
[ 6244.970445]  [<ffffffff810c225a>] down_read_trylock+0x5a/0x60
[ 6244.971336]  [<ffffffff81042418>] ? __do_page_fault+0x198/0x5c0
[ 6244.972226]  [<ffffffff81042418>] __do_page_fault+0x198/0x5c0
[ 6244.973125]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6244.974021]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6244.974913]  [<ffffffff8107c25d>] ? __do_softirq+0x1ed/0x310
[ 6244.975800]  [<ffffffff817d09e0>] ? retint_restore_args+0xe/0xe
[ 6244.976685]  [<ffffffff8137511d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 6244.977573]  [<ffffffff8104284c>] do_page_fault+0xc/0x10
[ 6244.978455]  [<ffffffff817d1a32>] page_fault+0x22/0x30
[ 6244.979327]  [<ffffffff8100f408>] ? save_xstate_sig+0x98/0x220
[ 6244.980190]  [<ffffffff8100f3f1>] ? save_xstate_sig+0x81/0x220
[ 6244.981045]  [<ffffffff810029e7>] do_signal+0x5c7/0x740
[ 6244.981892]  [<ffffffff817cf210>] ? _raw_spin_unlock_irq+0x30/0x40
[ 6244.982743]  [<ffffffff81002bc5>] do_notify_resume+0x65/0x80
[ 6244.983582]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 6244.984425]  [<ffffffff817d00ff>] int_signal+0x12/0x17
[ 6244.985266] Code: 65 48 8b 04 25 00 aa 00 00 8b b8 6c 07 00 00 44 89 45 c4 85 ff 0f 85 84 00 00 00 41 89 f4 41 89 d5 41 89 ce 4d 89 cf 9c 8f 45 b8 <fa> c7 80 6c 07 00 00 01 00 00 00 0f 1f 44 00 00 65 ff 04 25 e0 
[ 6244.987120] sending NMI to other CPUs:

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-14 23:46                                                                               ` Sasha Levin
@ 2014-12-15  0:11                                                                                 ` Paul E. McKenney
  2014-12-15  1:20                                                                                   ` Sasha Levin
  0 siblings, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-15  0:11 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Ingo Molnar, David Lang, Linus Torvalds, Dave Jones, Chris Mason,
	Mike Galbraith, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List

On Sun, Dec 14, 2014 at 06:46:21PM -0500, Sasha Levin wrote:
> On 12/14/2014 12:50 PM, Paul E. McKenney wrote:
> > rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors
> > 
> > Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
> > in places where it would be useful for it to apply to the normal RCU
> > flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
> > case for workloads that aggressively overload the system, particularly
> > those that generate large numbers of RCU updates on systems running
> > NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
> > from cond_resched_rcu_qs() to the normal RCU flavors.
> > 
> > Note that it is unfortunately necessary to leave the old ->passed_quiesce
> > mechanism in place to allow quiescent states that apply to only one
> > flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
> > that case, but that is not so good for debugging of RCU internals.)
> > 
> > Reported-by: Sasha Levin <sasha.levin@oracle.com>
> > Reported-by: Dave Jones <davej@redhat.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Does it depend on anything not currently in -next? My build fails with
> 
> kernel/rcu/tree.c: In function ‘rcu_report_qs_rdp’:
> kernel/rcu/tree.c:2099:6: error: ‘struct rcu_data’ has no member named ‘gpwrap’
>    rdp->gpwrap) {

Indeed it does.  Please see below for a port to current mainline.

> On an unrelated subject, I've tried disabling preemption, and am seeing different
> stalls even when I have the testfiles fuzzing in trinity disabled (which means I'm
> not seeing hangs in the preempt case):
> 
> [  332.920142] INFO: rcu_sched self-detected stall on CPU
> [  332.920142] 	19: (2099 ticks this GP) idle=f7d/140000000000001/0 softirq=21726/21726 fqs=1751
> [  332.920142] 	 (t=2100 jiffies g=10656 c=10655 q=212427)

More than 200K RCU callbacks queued.  Impressive!  ;-)

> [  332.920142] Task dump for CPU 19:
> [  332.920142] trinity-c522    R  running task    13544  9447   8279 0x1008000a
> [  332.920142]  00000000000034e8 00000000000034e8 ffff8808a678a000 ffff8808bc203c18
> [  332.920142]  ffffffff814b66f6 dfffe900000054de 0000000000000013 ffff8808bc215800
> [  332.920142]  0000000000000013 ffffffff9cb5d018 dfffe90000000000 ffff8808bc203c48
> [  332.920142] Call Trace:
> [  332.920142] <IRQ> sched_show_task (kernel/sched/core.c:4541)
> [  332.920142] dump_cpu_task (kernel/sched/core.c:8383)
> [  332.940081] INFO: rcu_sched detected stalls on CPUs/tasks:
> [  332.920142] rcu_dump_cpu_stacks (kernel/rcu/tree.c:1093)
> [  332.920142] rcu_check_callbacks (kernel/rcu/tree.c:1199 kernel/rcu/tree.c:1261 kernel/rcu/tree.c:3194 kernel/rcu/tree.c:3254 kernel/rcu/tree.c:2507)
> [  332.920142] update_process_times (./arch/x86/include/asm/preempt.h:22 kernel/time/timer.c:1386)
> [  332.920142] tick_sched_timer (kernel/time/tick-sched.c:152 kernel/time/tick-sched.c:1128)
> [  332.920142] __run_hrtimer (kernel/time/hrtimer.c:1216 (discriminator 3))
> [  332.920142] ? tick_init_highres (kernel/time/tick-sched.c:1115)
> [  332.920142] hrtimer_interrupt (include/linux/timerqueue.h:37 kernel/time/hrtimer.c:1275)
> [  332.920142] ? acct_account_cputime (kernel/tsacct.c:168)
> [  332.920142] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:921)
> [  332.920142] smp_apic_timer_interrupt (./arch/x86/include/asm/apic.h:660 arch/x86/kernel/apic/apic.c:945)
> [  332.920142] apic_timer_interrupt (arch/x86/kernel/entry_64.S:983)
> [  332.920142] <EOI> ? retint_restore_args (arch/x86/kernel/entry_64.S:844)
> [  332.920142] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
> [  332.920142] __debug_check_no_obj_freed (lib/debugobjects.c:713)
> [  332.920142] debug_check_no_obj_freed (lib/debugobjects.c:727)
> [  332.920142] free_pages_prepare (mm/page_alloc.c:829)
> [  332.920142] free_hot_cold_page (mm/page_alloc.c:1496)
> [  332.920142] __free_pages (mm/page_alloc.c:2982)

If this still appears after applying the below patch, I would be tempted
to place a few cond_resched_rcu_qs() calls in the above three functions.
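
(For context, cond_resched_rcu_qs() at this point in time expands to roughly
the macro below, so once the patch further down teaches
rcu_note_voluntary_context_switch() to also call rcu_all_qs(), every such
call site reports a quiescent state to the normal RCU flavors as well as to
RCU-tasks.  The loop around it is only an illustration -- scan_many_objects()
and process() are made-up names, not code from the trace above.)

	/* Approximate definition, include/linux/rcupdate.h circa 3.18: */
	#define cond_resched_rcu_qs() \
	do { \
		rcu_note_voluntary_context_switch(current); \
		cond_resched(); \
	} while (0)

	/* Hypothetical long-running loop that would otherwise hold off
	 * grace periods for the duration of the whole scan: */
	static void scan_many_objects(struct list_head *head)
	{
		struct obj *o;

		list_for_each_entry(o, head, node) {
			process(o);		/* placeholder for real work */
			cond_resched_rcu_qs();	/* report a quiescent state */
		}
	}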

							Thanx, Paul

> [  332.920142] ? __vunmap (mm/vmalloc.c:1459 (discriminator 2))
> [  332.920142] __vunmap (mm/vmalloc.c:1455 (discriminator 2))
> [  332.920142] vfree (mm/vmalloc.c:1500)
> [  332.920142] SyS_init_module (kernel/module.c:2483 kernel/module.c:3359 kernel/module.c:3346)
> [  332.920142] ia32_do_call (arch/x86/ia32/ia32entry.S:446)

------------------------------------------------------------------------

rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors

Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
in places where it would be useful for it to apply to the normal RCU
flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
case for workloads that aggressively overload the system, particularly
those that generate large numbers of RCU updates on systems running
NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
from cond_resched_rcu_qs() to the normal RCU flavors.

Note that it is unfortunately necessary to leave the old ->passed_quiesce
mechanism in place to allow quiescent states that apply to only one
flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
that case, but that is not so good for debugging of RCU internals.)

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index b63b9bb3bc0c..08651da15448 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -56,14 +56,14 @@ rcuboost:
 
 The output of "cat rcu/rcu_preempt/rcudata" looks as follows:
 
-  0!c=30455 g=30456 pq=1 qp=1 dt=126535/140000000000000/0 df=2002 of=4 ql=0/0 qs=N... b=10 ci=74572 nci=0 co=1131 ca=716
-  1!c=30719 g=30720 pq=1 qp=0 dt=132007/140000000000000/0 df=1874 of=10 ql=0/0 qs=N... b=10 ci=123209 nci=0 co=685 ca=982
-  2!c=30150 g=30151 pq=1 qp=1 dt=138537/140000000000000/0 df=1707 of=8 ql=0/0 qs=N... b=10 ci=80132 nci=0 co=1328 ca=1458
-  3 c=31249 g=31250 pq=1 qp=0 dt=107255/140000000000000/0 df=1749 of=6 ql=0/450 qs=NRW. b=10 ci=151700 nci=0 co=509 ca=622
-  4!c=29502 g=29503 pq=1 qp=1 dt=83647/140000000000000/0 df=965 of=5 ql=0/0 qs=N... b=10 ci=65643 nci=0 co=1373 ca=1521
-  5 c=31201 g=31202 pq=1 qp=1 dt=70422/0/0 df=535 of=7 ql=0/0 qs=.... b=10 ci=58500 nci=0 co=764 ca=698
-  6!c=30253 g=30254 pq=1 qp=1 dt=95363/140000000000000/0 df=780 of=5 ql=0/0 qs=N... b=10 ci=100607 nci=0 co=1414 ca=1353
-  7 c=31178 g=31178 pq=1 qp=0 dt=91536/0/0 df=547 of=4 ql=0/0 qs=.... b=10 ci=109819 nci=0 co=1115 ca=969
+  0!c=30455 g=30456 pq=1/0 qp=1 dt=126535/140000000000000/0 df=2002 of=4 ql=0/0 qs=N... b=10 ci=74572 nci=0 co=1131 ca=716
+  1!c=30719 g=30720 pq=1/0 qp=0 dt=132007/140000000000000/0 df=1874 of=10 ql=0/0 qs=N... b=10 ci=123209 nci=0 co=685 ca=982
+  2!c=30150 g=30151 pq=1/1 qp=1 dt=138537/140000000000000/0 df=1707 of=8 ql=0/0 qs=N... b=10 ci=80132 nci=0 co=1328 ca=1458
+  3 c=31249 g=31250 pq=1/1 qp=0 dt=107255/140000000000000/0 df=1749 of=6 ql=0/450 qs=NRW. b=10 ci=151700 nci=0 co=509 ca=622
+  4!c=29502 g=29503 pq=1/0 qp=1 dt=83647/140000000000000/0 df=965 of=5 ql=0/0 qs=N... b=10 ci=65643 nci=0 co=1373 ca=1521
+  5 c=31201 g=31202 pq=1/0 qp=1 dt=70422/0/0 df=535 of=7 ql=0/0 qs=.... b=10 ci=58500 nci=0 co=764 ca=698
+  6!c=30253 g=30254 pq=1/0 qp=1 dt=95363/140000000000000/0 df=780 of=5 ql=0/0 qs=N... b=10 ci=100607 nci=0 co=1414 ca=1353
+  7 c=31178 g=31178 pq=1/0 qp=0 dt=91536/0/0 df=547 of=4 ql=0/0 qs=.... b=10 ci=109819 nci=0 co=1115 ca=969
 
 This file has one line per CPU, or eight for this 8-CPU system.
 The fields are as follows:
@@ -188,14 +188,14 @@ o	"ca" is the number of RCU callbacks that have been adopted by this
 Kernels compiled with CONFIG_RCU_BOOST=y display the following from
 /debug/rcu/rcu_preempt/rcudata:
 
-  0!c=12865 g=12866 pq=1 qp=1 dt=83113/140000000000000/0 df=288 of=11 ql=0/0 qs=N... kt=0/O ktl=944 b=10 ci=60709 nci=0 co=748 ca=871
-  1 c=14407 g=14408 pq=1 qp=0 dt=100679/140000000000000/0 df=378 of=7 ql=0/119 qs=NRW. kt=0/W ktl=9b6 b=10 ci=109740 nci=0 co=589 ca=485
-  2 c=14407 g=14408 pq=1 qp=0 dt=105486/0/0 df=90 of=9 ql=0/89 qs=NRW. kt=0/W ktl=c0c b=10 ci=83113 nci=0 co=533 ca=490
-  3 c=14407 g=14408 pq=1 qp=0 dt=107138/0/0 df=142 of=8 ql=0/188 qs=NRW. kt=0/W ktl=b96 b=10 ci=121114 nci=0 co=426 ca=290
-  4 c=14405 g=14406 pq=1 qp=1 dt=50238/0/0 df=706 of=7 ql=0/0 qs=.... kt=0/W ktl=812 b=10 ci=34929 nci=0 co=643 ca=114
-  5!c=14168 g=14169 pq=1 qp=0 dt=45465/140000000000000/0 df=161 of=11 ql=0/0 qs=N... kt=0/O ktl=b4d b=10 ci=47712 nci=0 co=677 ca=722
-  6 c=14404 g=14405 pq=1 qp=0 dt=59454/0/0 df=94 of=6 ql=0/0 qs=.... kt=0/W ktl=e57 b=10 ci=55597 nci=0 co=701 ca=811
-  7 c=14407 g=14408 pq=1 qp=1 dt=68850/0/0 df=31 of=8 ql=0/0 qs=.... kt=0/W ktl=14bd b=10 ci=77475 nci=0 co=508 ca=1042
+  0!c=12865 g=12866 pq=1/0 qp=1 dt=83113/140000000000000/0 df=288 of=11 ql=0/0 qs=N... kt=0/O ktl=944 b=10 ci=60709 nci=0 co=748 ca=871
+  1 c=14407 g=14408 pq=1/0 qp=0 dt=100679/140000000000000/0 df=378 of=7 ql=0/119 qs=NRW. kt=0/W ktl=9b6 b=10 ci=109740 nci=0 co=589 ca=485
+  2 c=14407 g=14408 pq=1/0 qp=0 dt=105486/0/0 df=90 of=9 ql=0/89 qs=NRW. kt=0/W ktl=c0c b=10 ci=83113 nci=0 co=533 ca=490
+  3 c=14407 g=14408 pq=1/0 qp=0 dt=107138/0/0 df=142 of=8 ql=0/188 qs=NRW. kt=0/W ktl=b96 b=10 ci=121114 nci=0 co=426 ca=290
+  4 c=14405 g=14406 pq=1/0 qp=1 dt=50238/0/0 df=706 of=7 ql=0/0 qs=.... kt=0/W ktl=812 b=10 ci=34929 nci=0 co=643 ca=114
+  5!c=14168 g=14169 pq=1/0 qp=0 dt=45465/140000000000000/0 df=161 of=11 ql=0/0 qs=N... kt=0/O ktl=b4d b=10 ci=47712 nci=0 co=677 ca=722
+  6 c=14404 g=14405 pq=1/0 qp=0 dt=59454/0/0 df=94 of=6 ql=0/0 qs=.... kt=0/W ktl=e57 b=10 ci=55597 nci=0 co=701 ca=811
+  7 c=14407 g=14408 pq=1/0 qp=1 dt=68850/0/0 df=31 of=8 ql=0/0 qs=.... kt=0/W ktl=14bd b=10 ci=77475 nci=0 co=508 ca=1042
 
 This is similar to the output discussed above, but contains the following
 additional fields:
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index ed4f5939a452..04eb366ab7fa 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -331,6 +331,7 @@ static inline void rcu_init_nohz(void)
 extern struct srcu_struct tasks_rcu_exit_srcu;
 #define rcu_note_voluntary_context_switch(t) \
 	do { \
+		rcu_all_qs(); \
 		if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
 			ACCESS_ONCE((t)->rcu_tasks_holdout) = false; \
 	} while (0)
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 0e5366200154..fabd3fad8516 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -154,7 +154,10 @@ static inline bool rcu_is_watching(void)
 	return true;
 }
 
-
 #endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
 
+static inline void rcu_all_qs(void)
+{
+}
+
 #endif /* __LINUX_RCUTINY_H */
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 52953790dcca..3344783af1f2 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -97,4 +97,10 @@ extern int rcu_scheduler_active __read_mostly;
 
 bool rcu_is_watching(void);
 
+DECLARE_PER_CPU(unsigned long, rcu_qs_ctr);
+static inline void rcu_all_qs(void)
+{
+	this_cpu_inc(rcu_qs_ctr);
+}
+
 #endif /* __LINUX_RCUTREE_H */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7680fc275036..38f0009b999b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -215,6 +215,9 @@ static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
 #endif /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
 };
 
+DEFINE_PER_CPU_SHARED_ALIGNED(unsigned long, rcu_qs_ctr);
+EXPORT_PER_CPU_SYMBOL_GPL(rcu_qs_ctr);
+
 /*
  * Let the RCU core know that this CPU has gone through the scheduler,
  * which is a quiescent state.  This is called when the need for a
@@ -1554,6 +1557,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
 		rdp->gpnum = rnp->gpnum;
 		trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpustart"));
 		rdp->passed_quiesce = 0;
+		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
 		rdp->qs_pending = !!(rnp->qsmask & rdp->grpmask);
 		zero_cpu_stall_ticks(rdp);
 	}
@@ -2020,6 +2024,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
 		 * within the current grace period.
 		 */
 		rdp->passed_quiesce = 0;	/* need qs for new gp. */
+		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		return;
 	}
@@ -2064,7 +2069,8 @@ rcu_check_quiescent_state(struct rcu_state *rsp, struct rcu_data *rdp)
 	 * Was there a quiescent state since the beginning of the grace
 	 * period? If no, then exit and wait for the next call.
 	 */
-	if (!rdp->passed_quiesce)
+	if (!rdp->passed_quiesce &&
+	    rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr))
 		return;
 
 	/*
@@ -3109,9 +3115,12 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
 
 	/* Is the RCU core waiting for a quiescent state from this CPU? */
 	if (rcu_scheduler_fully_active &&
-	    rdp->qs_pending && !rdp->passed_quiesce) {
+	    rdp->qs_pending && !rdp->passed_quiesce &&
+	    rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr)) {
 		rdp->n_rp_qs_pending++;
-	} else if (rdp->qs_pending && rdp->passed_quiesce) {
+	} else if (rdp->qs_pending &&
+		   (rdp->passed_quiesce ||
+		    rdp->rcu_qs_ctr_snap != __this_cpu_read(rcu_qs_ctr))) {
 		rdp->n_rp_report_qs++;
 		return 1;
 	}
@@ -3444,6 +3453,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 			rdp->gpnum = rnp->completed;
 			rdp->completed = rnp->completed;
 			rdp->passed_quiesce = 0;
+			rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
 			rdp->qs_pending = 0;
 			trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpuonl"));
 		}
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 8e7b1843896e..c259a0bc0d97 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -257,6 +257,8 @@ struct rcu_data {
 					/*  in order to detect GP end. */
 	unsigned long	gpnum;		/* Highest gp number that this CPU */
 					/*  is aware of having started. */
+	unsigned long	rcu_qs_ctr_snap;/* Snapshot of rcu_qs_ctr to check */
+					/*  for rcu_all_qs() invocations. */
 	bool		passed_quiesce;	/* User-mode/idle loop etc. */
 	bool		qs_pending;	/* Core waits for quiesc state. */
 	bool		beenonline;	/* CPU online at least once. */
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index 5cdc62e1beeb..4ec028a9987a 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -115,11 +115,13 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp)
 
 	if (!rdp->beenonline)
 		return;
-	seq_printf(m, "%3d%cc=%ld g=%ld pq=%d qp=%d",
+	seq_printf(m, "%3d%cc=%ld g=%ld pq=%d/%d qp=%d",
 		   rdp->cpu,
 		   cpu_is_offline(rdp->cpu) ? '!' : ' ',
 		   ulong2long(rdp->completed), ulong2long(rdp->gpnum),
-		   rdp->passed_quiesce, rdp->qs_pending);
+		   rdp->passed_quiesce,
+		   rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr),
+		   rdp->qs_pending);
 	seq_printf(m, " dt=%d/%llx/%d df=%lu",
 		   atomic_read(&rdp->dynticks->dynticks),
 		   rdp->dynticks->dynticks_nesting,


^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-14  3:14                                                                                       ` Al Viro
@ 2014-12-15  0:18                                                                                         ` Al Viro
  0 siblings, 0 replies; 486+ messages in thread
From: Al Viro @ 2014-12-15  0:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Thomas Gleixner

On Sun, Dec 14, 2014 at 03:14:29AM +0000, Al Viro wrote:
> On Sat, Dec 13, 2014 at 05:35:17PM -0800, Linus Torvalds wrote:
> > On Sat, Dec 13, 2014 at 4:33 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > So does SMP - this_cpu_dec() relies on preemption being disabled.
> > 
> > No. really. It very much does not. Not on x86, not elsewhere. It's
> > part of the whole point of "this_cpu_p()". They are preemption and
> > interrupt safe.
> > 
> > It's the "__this_cpu_op()" ones that need external protection.
> 
> Right you are - I really need to get some coffee...  Sorry...
> 
> FWIW, do we need to disable interrupts there?  After all, mnt_want_write()
> and mnt_drop_write() shouldn't be done from interrupt context - they can
> happen via schedule_delayed_work(), but that's it...

OK, having looked through the tree - we really don't need to bother with
disabling interrupts (fortunately - or UP case would be broken).  So how
about turning those into __this_cpu_{inc,dec} and yes, moving preempt
disabling into mnt_{inc,dec}_writers()?  Like this:

diff --git a/fs/namespace.c b/fs/namespace.c
index 5b66b2b..48cb162 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -274,20 +274,32 @@ EXPORT_SYMBOL_GPL(__mnt_is_readonly);
 
 static inline void mnt_inc_writers(struct mount *mnt)
 {
+	preempt_disable();
 #ifdef CONFIG_SMP
-	this_cpu_inc(mnt->mnt_pcp->mnt_writers);
+	__this_cpu_inc(mnt->mnt_pcp->mnt_writers);
 #else
 	mnt->mnt_writers++;
 #endif
+	preempt_enable();
 }
 
-static inline void mnt_dec_writers(struct mount *mnt)
+/**
+ * __mnt_drop_write - give up write access to a mount
+ * @mnt: the mount on which to give up write access
+ *
+ * Tells the low-level filesystem that we are done
+ * performing writes to it.  Must be matched with
+ * __mnt_want_write() call above.
+ */
+void __mnt_drop_write(struct vfsmount *m)
 {
+	preempt_disable();
 #ifdef CONFIG_SMP
-	this_cpu_dec(mnt->mnt_pcp->mnt_writers);
+	__this_cpu_dec(real_mount(m)->mnt_pcp->mnt_writers);
 #else
-	mnt->mnt_writers--;
+	real_mount(m)->mnt_writers--;
 #endif
+	preempt_enable();
 }
 
 static unsigned int mnt_get_writers(struct mount *mnt)
@@ -336,7 +348,6 @@ int __mnt_want_write(struct vfsmount *m)
 	struct mount *mnt = real_mount(m);
 	int ret = 0;
 
-	preempt_disable();
 	mnt_inc_writers(mnt);
 	/*
 	 * The store to mnt_inc_writers must be visible before we pass
@@ -353,10 +364,9 @@ int __mnt_want_write(struct vfsmount *m)
 	 */
 	smp_rmb();
 	if (mnt_is_readonly(m)) {
-		mnt_dec_writers(mnt);
+		__mnt_drop_write(m);
 		ret = -EROFS;
 	}
-	preempt_enable();
 
 	return ret;
 }
@@ -399,9 +409,7 @@ int mnt_clone_write(struct vfsmount *mnt)
 	/* superblock may be r/o */
 	if (__mnt_is_readonly(mnt))
 		return -EROFS;
-	preempt_disable();
 	mnt_inc_writers(real_mount(mnt));
-	preempt_enable();
 	return 0;
 }
 EXPORT_SYMBOL_GPL(mnt_clone_write);
@@ -441,21 +449,6 @@ int mnt_want_write_file(struct file *file)
 EXPORT_SYMBOL_GPL(mnt_want_write_file);
 
 /**
- * __mnt_drop_write - give up write access to a mount
- * @mnt: the mount on which to give up write access
- *
- * Tells the low-level filesystem that we are done
- * performing writes to it.  Must be matched with
- * __mnt_want_write() call above.
- */
-void __mnt_drop_write(struct vfsmount *mnt)
-{
-	preempt_disable();
-	mnt_dec_writers(real_mount(mnt));
-	preempt_enable();
-}
-
-/**
  * mnt_drop_write - give up write access to a mount
  * @mnt: the mount on which to give up write access
  *
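
(An aside on the percpu primitives this relies on, since the distinction
quoted above is easy to miss: this_cpu_*() is safe against preemption and
interrupts on its own, while the __this_cpu_*() forms assume the caller
already keeps the task from migrating.  A minimal illustration -- the
counter and function names are invented for the example and are not part
of the patch:)

	static DEFINE_PER_CPU(unsigned int, example_ctr);

	static void count_event(void)
	{
		/* Safe from any context; no explicit protection needed. */
		this_cpu_inc(example_ctr);
	}

	static void count_event_raw(void)
	{
		/* The double-underscore form is cheaper on some arches but
		 * leaves preemption handling entirely to the caller. */
		preempt_disable();
		__this_cpu_inc(example_ctr);
		preempt_enable();
	}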

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-14 23:46                                                                       ` Dave Jones
@ 2014-12-15  0:38                                                                         ` Linus Torvalds
  2014-12-15  0:42                                                                           ` Dave Jones
                                                                                             ` (2 more replies)
  0 siblings, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-15  0:38 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List
  Cc: Suresh Siddha, Oleg Nesterov, Peter Anvin

On Sun, Dec 14, 2014 at 3:46 PM, Dave Jones <davej@redhat.com> wrote:
> On Sat, Dec 13, 2014 at 02:40:51PM -0800, Linus Torvalds wrote:
>  > On Sat, Dec 13, 2014 at 2:36 PM, Dave Jones <davej@redhat.com> wrote:
>  > >
>  > > Ok, I think we can rule out preemption. I just checked on it, and
>  > > found it wedged.
>  >
>  > Ok, one more. Mind checking what happens without CONFIG_DEBUG_PAGEALLOC?
>
> Crap. Looks like it wedged. It's stuck that way until I get back to it
> on Wednesday.

Hey, this is not "crap" at all. Quite the reverse. I think you may
have hit on the real bug now, and it's possible that the
DEBUG_PAGEALLOC code was hiding it because it was trying to handle the
page fault.

Or something.

Anyway, this time your backtrace is *interesting*. It's showing
something that looks real. Namely "save_xstate_sig" apparently taking
a page fault. And unlike your earlier traces, now all the different
CPU traces show very similar things, which again is something that
makes a lot more sense than your previous lockups have.

That said, maybe I'm just being optimistic, because while the NMI
watchdog messages now look ostensibly much saner, I'm not actually
seeing what's really going on. But at least this time I *could*
imagine that it's something like infinitely taking a page fault in
save_xstate_sig. This is some pretty special code, with the whole FPU
save state handling being one mess of random really subtle issues
with FPU exceptions, page faults, delayed allocation, yadda yadda.

And I could fairly easily imagine endless page faults due to the
exception table, or even endless signal handling loops due to getting
a signal while trying to handle a signal. Both things that would
actually reasonably result in a watchdog.

So I'm adding some x86 FPU save people to the cc.

Can anybody make sense of that backtrace, keeping in mind that we're
looking for some kind of endless loop where we don't make progress?

There's more in the original email (see on lkml if you haven't seen
the thread earlier already), but they look similar with that whole
do_signal -> save_xstate_sig -> do_page_fault thing just on other
CPU's.

DaveJ, do you have the kernel image for this? I'd love to see what the
code is around that "save_xstate_sig+0x81" or around those
__clear_user+0x17/0x36 points...

                          Linus


> [ 6188.985536] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c175:14205]
> [ 6188.985612] CPU: 1 PID: 14205 Comm: trinity-c175 Not tainted 3.18.0+ #103 [loadavg: 200.63 151.07 150.40 179/407 17316]
> [ 6188.985652] task: ffff880056ac96d0 ti: ffff8800975d8000 task.ti: ffff8800975d8000
> [ 6188.985680] RIP: 0010:[<ffffffff810c6430>]  [<ffffffff810c6430>] lock_release+0xc0/0x240
> [ 6188.985988] Stack:
> [ 6188.986101] Call Trace:
> [ 6188.986116]  [<ffffffff8116f928>] __perf_sw_event+0x168/0x240
> [ 6188.987079]  [<ffffffff8116f842>] ? __perf_sw_event+0x82/0x240
> [ 6188.988045]  [<ffffffff81178ab2>] ? __lock_page_or_retry+0xb2/0xc0
> [ 6188.989008]  [<ffffffff811a68f8>] ? handle_mm_fault+0x458/0xe90
> [ 6188.989986]  [<ffffffff8104250e>] __do_page_fault+0x28e/0x5c0
> [ 6188.990940]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 6188.991884]  [<ffffffff8107c25d>] ? __do_softirq+0x1ed/0x310
> [ 6188.992826]  [<ffffffff817d09e0>] ? retint_restore_args+0xe/0xe
> [ 6188.993773]  [<ffffffff8137511d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [ 6188.994715]  [<ffffffff8104284c>] do_page_fault+0xc/0x10
> [ 6188.995658]  [<ffffffff817d1a32>] page_fault+0x22/0x30
> [ 6188.996590]  [<ffffffff81375266>] ? __clear_user+0x36/0x60
> [ 6188.997518]  [<ffffffff81375247>] ? __clear_user+0x17/0x60
> [ 6188.998440]  [<ffffffff8100f3f1>] save_xstate_sig+0x81/0x220
> [ 6188.999362]  [<ffffffff817cf1cf>] ? _raw_spin_unlock_irqrestore+0x4f/0x60
> [ 6189.000291]  [<ffffffff810029e7>] do_signal+0x5c7/0x740
> [ 6189.001220]  [<ffffffff81209acf>] ? mnt_drop_write+0x2f/0x40
> [ 6189.002164]  [<ffffffff811e527e>] ? chmod_common+0xfe/0x150
> [ 6189.003096]  [<ffffffff81002bc5>] do_notify_resume+0x65/0x80
> [ 6189.004038]  [<ffffffff813750de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 6189.004972]  [<ffffffff817d00ff>] int_signal+0x12/0x17
> [ 6189.005899] Code: ff 0f 85 7c 00 00 00 4c 89 ea 4c 89 e6 48 89 df e8 26 fc ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 00 00 41 56 9d <48> 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d f3 c3 65 ff 04 25 e0

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15  0:38                                                                         ` Linus Torvalds
@ 2014-12-15  0:42                                                                           ` Dave Jones
  2014-12-15  5:47                                                                           ` Linus Torvalds
  2014-12-17 18:22                                                                           ` frequent lockups in 3.18rc4 Dave Jones
  2 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-15  0:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Sun, Dec 14, 2014 at 04:38:00PM -0800, Linus Torvalds wrote:
 > On Sun, Dec 14, 2014 at 3:46 PM, Dave Jones <davej@redhat.com> wrote:
 > > On Sat, Dec 13, 2014 at 02:40:51PM -0800, Linus Torvalds wrote:
 > >  > On Sat, Dec 13, 2014 at 2:36 PM, Dave Jones <davej@redhat.com> wrote:
 > >  > >
 > >  > > Ok, I think we can rule out preemption. I just checked on it, and
 > >  > > found it wedged.
 > >  >
 > >  > Ok, one more. Mind checking what happens without CONFIG_DEBUG_PAGEALLOC?
 > >
 > > Crap. Looks like it wedged. It's stuck that way until I get back to it
 > > on Wednesday.
 > 
 > Hey, this is not "crap" at all. Quite the reverse.

It was more a "Crap it's wedged and I can't do anything about it for days".
I didn't look closely at the actual traces this time.

 > DaveJ, do you have the kernel image for this? I'd love to see what the
 > code is around that "save_xstate_sig+0x81" or around those
 > __clear_user+0x17/0x36 points...
 
Not until I get back on Wednesday and reboot it.

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15  0:11                                                                                 ` Paul E. McKenney
@ 2014-12-15  1:20                                                                                   ` Sasha Levin
  2014-12-15  6:33                                                                                     ` Paul E. McKenney
  0 siblings, 1 reply; 486+ messages in thread
From: Sasha Levin @ 2014-12-15  1:20 UTC (permalink / raw)
  To: paulmck
  Cc: Ingo Molnar, David Lang, Linus Torvalds, Dave Jones, Chris Mason,
	Mike Galbraith, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List

On 12/14/2014 07:11 PM, Paul E. McKenney wrote:
>> Does it depend on anything not currently in -next? My build fails with
>> > 
>> > kernel/rcu/tree.c: In function ‘rcu_report_qs_rdp’:
>> > kernel/rcu/tree.c:2099:6: error: ‘struct rcu_data’ has no member named ‘gpwrap’
>> >    rdp->gpwrap) {
> Indeed it does.  Please see below for a port to current mainline.

With the patch:


[  620.340045] INFO: rcu_preempt detected stalls on CPUs/tasks:
[  620.341087]  (detected by 22, t=8407 jiffies, g=10452, c=10451, q=3622)
[  620.342154] All QSes seen, last rcu_preempt kthread activity 4294990929/4294999330, jiffies_till_next_fqs=1
[  643.710049] INFO: rcu_preempt detected stalls on CPUs/tasks:
[  643.710073] INFO: rcu_sched detected stalls on CPUs/tasks:
[  643.710093]  0: (6 ticks this GP) idle=bd5/140000000000002/0 softirq=12421/12421 fqs=0 last_accelerate: 5283/8643, nonlazy_posted: 643841, ..
[  643.710110]  (detected by 1, t=2102 jiffies, g=-292, c=-293, q=0)
[  643.710112] Task dump for CPU 0:
[  643.710129] kworker/0:1     R  running task    13016   628      2 0x10080008
[  643.710148] Workqueue: events vmstat_update
[  643.710156]  ffffffffb0301dc4 ffff88006be15000 ffff880060ba1c70 0000000000000000
[  643.710161]  ffff88006be10680 ffff8800633efde8 ffffffffa0461f1b ffff88006a776000
[  643.710166]  ffff880060ba1cb8 ffff880060ba1c78 ffff880060ba1c80 ffff880060ba1c90
[  643.710168] Call Trace:
[  643.710181]  [<ffffffffb0301dc4>] ? _raw_spin_unlock_irq+0x64/0x200
[  643.710191]  [<ffffffffa0461f1b>] ? process_one_work+0x5fb/0x1660
[  643.710197]  [<ffffffffa0463545>] ? worker_thread+0x5c5/0x1680
[  643.710205]  [<ffffffffb02f0b6f>] ? __schedule+0xf6f/0x2fc0
[  643.710211]  [<ffffffffa0462f80>] ? process_one_work+0x1660/0x1660
[  643.710216]  [<ffffffffa047ae22>] ? kthread+0x1f2/0x2b0
[  643.710221]  [<ffffffffa047ac30>] ? kthread_worker_fn+0x6a0/0x6a0
[  643.710226]  [<ffffffffb030243c>] ? ret_from_fork+0x7c/0xb0
[  643.710233]  [<ffffffffa047ac30>] ? kthread_worker_fn+0x6a0/0x6a0
[  643.711486]
[  643.711486]  (detected by 22, t=2104 jiffies, g=10453, c=10452, q=1570)
[  643.711486] All QSes seen, last rcu_preempt kthread activity 4294999565/4295001669, jiffies_till_next_fqs=1


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15  0:38                                                                         ` Linus Torvalds
  2014-12-15  0:42                                                                           ` Dave Jones
@ 2014-12-15  5:47                                                                           ` Linus Torvalds
  2014-12-15  5:57                                                                             ` Dave Jones
                                                                                               ` (2 more replies)
  2014-12-17 18:22                                                                           ` frequent lockups in 3.18rc4 Dave Jones
  2 siblings, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-15  5:47 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List
  Cc: Suresh Siddha, Oleg Nesterov, Peter Anvin

On Sun, Dec 14, 2014 at 4:38 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Can anybody make sense of that backtrace, keeping in mind that we're
> looking for some kind of endless loop where we don't make progress?

So looking at all the backtraces, which is kind of messy because
there's some missing data (presumably buffers overflowed from all the
CPU's printing at the same time), it looks like:

 - CPU 0 is missing. No idea why.
 - CPU's 1-3 all have the same trace for

    int_signal ->
    do_notify_resume ->
    do_signal ->
      ....
    page_fault ->
    do_page_fault

and "save_xstate_sig+0x81" shows up on all stacks, although only on
CPU1 does it show up as a "guaranteed" part of the stack chain (ie it
matches frame pointer data too). CPU1 also has that __clear_user show
up (which is called from save_xstate_sig), but not other CPU's.  CPU2
and CPU3 have "save_xstate_sig+0x98" in addition to that +0x81 thing.

My guess is that "save_xstate_sig+0x81" is the instruction after the
__clear_user call, and that CPU1 took the fault in __clear_user(),
while CPU2 and CPU3 took the fault at "save_xstate_sig+0x98" instead,
which I'd guess is the

        xsave64 (%rdi)

and in fact, with CONFIG_FTRACE on, my own kernel build gives exactly
those two offsets for those things in save_xstate_sig().

So I'm pretty certain that on all three CPU's, we had page faults for
save_xstate_sig() accessing user space, with the only difference being
that on CPU1 it happened from __clear_user, while on CPU's 2/3 it
happened on the xsaveq instruction itself.
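
(A paraphrased sketch of that path, from memory of the 3.18-era
arch/x86/include/asm/xsave.h rather than a verbatim quote; save_user_xstate()
and xsave_user() are inlined into save_xstate_sig(), which is why both
offsets land in that function.  xsave64_to_user() below stands in for the
real exception-table-protected inline asm:)

	static inline int xsave_user(struct xsave_struct __user *buf)
	{
		int err;

		/*
		 * Zero the xsave header so reserved fields start out clean;
		 * the +0x81 return address would be just after this call.
		 */
		err = __clear_user(&buf->xsave_hdr, sizeof(buf->xsave_hdr));
		if (unlikely(err))
			return -EFAULT;

		/*
		 * The "xsave64 (%rdi)" itself -- presumably the +0x98 site --
		 * writes the extended state straight into the user buffer,
		 * with a fixup so a fault is reported through err rather
		 * than killing the task.
		 */
		err = xsave64_to_user(buf);
		return err;
	}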

That sounds like much more than coincidence. I have no idea where CPU0
is hiding, and all CPU's were at different stages of actually handling
the fault, but that's to be expected if the page fault just keeps
repeating.

In fact, CPU2 shows up three different times, and the call trace
changes in between, so it's "making progress", just never getting out
of that loop. The traces are

    pagecache_get_page+0x0/0x220
    ? lookup_swap_cache+0x2a/0x70
    handle_mm_fault+0x401/0xe90
    ? __do_page_fault+0x198/0x5c0
    __do_page_fault+0x1fc/0x5c0
    ? trace_hardirqs_on_thunk+0x3a/0x3f
    ? __do_softirq+0x1ed/0x310
    ? retint_restore_args+0xe/0xe
    ? trace_hardirqs_off_thunk+0x3a/0x3c
    do_page_fault+0xc/0x10
    page_fault+0x22/0x30
    ? save_xstate_sig+0x98/0x220
    ? save_xstate_sig+0x81/0x220
    do_signal+0x5c7/0x740
    ? _raw_spin_unlock_irq+0x30/0x40
    do_notify_resume+0x65/0x80
    ? trace_hardirqs_on_thunk+0x3a/0x3f
    int_signal+0x12/0x17

and

    ? __lock_acquire.isra.31+0x22c/0x9f0
    ? lock_acquire+0xb4/0x120
    ? __do_page_fault+0x198/0x5c0
    down_read_trylock+0x5a/0x60
    ? __do_page_fault+0x198/0x5c0
    __do_page_fault+0x198/0x5c0
    ? __do_softirq+0x1ed/0x310
    ? retint_restore_args+0xe/0xe
    ? __do_page_fault+0xd8/0x5c0
    ? trace_hardirqs_off_thunk+0x3a/0x3c
    do_page_fault+0xc/0x10
    page_fault+0x22/0x30
    ? save_xstate_sig+0x98/0x220
    ? save_xstate_sig+0x81/0x220
    do_signal+0x5c7/0x740
    ? _raw_spin_unlock_irq+0x30/0x40
    do_notify_resume+0x65/0x80
    ? trace_hardirqs_on_thunk+0x3a/0x3f
    int_signal+0x12/0x17

and

    lock_acquire+0x40/0x120
    down_read_trylock+0x5a/0x60
    ? __do_page_fault+0x198/0x5c0
    __do_page_fault+0x198/0x5c0
    ? trace_hardirqs_on_thunk+0x3a/0x3f
    ? trace_hardirqs_on_thunk+0x3a/0x3f
    ? __do_softirq+0x1ed/0x310
    ? retint_restore_args+0xe/0xe
    ? trace_hardirqs_off_thunk+0x3a/0x3c
    do_page_fault+0xc/0x10
    page_fault+0x22/0x30
    ? save_xstate_sig+0x98/0x220
    ? save_xstate_sig+0x81/0x220
    do_signal+0x5c7/0x740
    ? _raw_spin_unlock_irq+0x30/0x40
    do_notify_resume+0x65/0x80
    ? trace_hardirqs_on_thunk+0x3a/0x3f
    int_signal+0x12/0x17

so it's always in __do_page_fault, but at sometimes it has gotten into
handle_mm_fault too. So it really really looks like it is taking an
endless stream of page faults on that "xsaveq" instruction. Presumably
the page faulting never actually makes any progress, even though it
*thinks* the page tables are fine.

DaveJ - you've seen that "endless page faults" behavior before. You
had a few traces that showed it. That was in that whole "pipe/page
fault oddness." email thread, where you would get endless faults in
copy_page_to_iter() with an error_code=0x2.

That was the one where I chased it down to "page table entry must be
marked with _PAGE_PROTNONE", but VM_WRITE in the vma, because your
machine was alive enough that you got traces out of the endless loop.

Very odd.

              Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15  5:47                                                                           ` Linus Torvalds
@ 2014-12-15  5:57                                                                             ` Dave Jones
  2014-12-15 18:21                                                                               ` Linus Torvalds
  2014-12-15 14:00                                                                             ` Borislav Petkov
  2014-12-18 21:17                                                                             ` save_xstate_sig (Re: frequent lockups in 3.18rc4) Andy Lutomirski
  2 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-15  5:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Sun, Dec 14, 2014 at 09:47:26PM -0800, Linus Torvalds wrote:

 > so it's always in __do_page_fault, but at sometimes it has gotten into
 > handle_mm_fault too. So it really really looks like it is taking an
 > endless stream of page faults on that "xsaveq" instruction. Presumably
 > the page faulting never actually makes any progress, even though it
 > *thinks* the page tables are fine.
 > 
 > DaveJ - you've seen that "endless page faults" behavior before. You
 > had a few traces that showed it. That was in that whole "pipe/page
 > fault oddness." email thread, where you would get endless faults in
 > copy_page_to_iter() with an error_code=0x2.
 > 
 > That was the one where I chased it down to "page table entry must be
 > marked with _PAGE_PROTNONE", but VM_WRITE in the vma, because your
 > machine was alive enough that you got traces out of the endless loop.

We had a flashback to that old bug last month too.
See this mail & your followup. : https://lkml.org/lkml/2014/11/25/1171
That was during a bisect though, so may have been something
entirely different, but it is a spooky coincidence.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15  1:20                                                                                   ` Sasha Levin
@ 2014-12-15  6:33                                                                                     ` Paul E. McKenney
  2014-12-15 12:56                                                                                       ` Paul E. McKenney
  0 siblings, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-15  6:33 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Ingo Molnar, David Lang, Linus Torvalds, Dave Jones, Chris Mason,
	Mike Galbraith, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List

On Sun, Dec 14, 2014 at 08:20:13PM -0500, Sasha Levin wrote:
> On 12/14/2014 07:11 PM, Paul E. McKenney wrote:
> >> Does it depend on anything not currently in -next? My build fails with
> >> > 
> >> > kernel/rcu/tree.c: In function ‘rcu_report_qs_rdp’:
> >> > kernel/rcu/tree.c:2099:6: error: ‘struct rcu_data’ has no member named ‘gpwrap’
> >> >    rdp->gpwrap) {
> > Indeed it does.  Please see below for a port to current mainline.
> 
> With the patch:
> 
> 
> [  620.340045] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  620.341087]  (detected by 22, t=8407 jiffies, g=10452, c=10451, q=3622)
> [  620.342154] All QSes seen, last rcu_preempt kthread activity 4294990929/4294999330, jiffies_till_next_fqs=1

OK, 8401 jiffies since the grace-period kthread ran.  This is without
the "Run grace-period kthreads at real-time priority" patch?

> [  643.710049] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  643.710073] INFO: rcu_sched detected stalls on CPUs/tasks:
> [  643.710093]  0: (6 ticks this GP) idle=bd5/140000000000002/0 softirq=12421/12421 fqs=0 last_accelerate: 5283/8643, nonlazy_posted: 643841, ..
> [  643.710110]  (detected by 1, t=2102 jiffies, g=-292, c=-293, q=0)

But this one is real.

> [  643.710112] Task dump for CPU 0:
> [  643.710129] kworker/0:1     R  running task    13016   628      2 0x10080008
> [  643.710148] Workqueue: events vmstat_update
> [  643.710156]  ffffffffb0301dc4 ffff88006be15000 ffff880060ba1c70 0000000000000000
> [  643.710161]  ffff88006be10680 ffff8800633efde8 ffffffffa0461f1b ffff88006a776000
> [  643.710166]  ffff880060ba1cb8 ffff880060ba1c78 ffff880060ba1c80 ffff880060ba1c90
> [  643.710168] Call Trace:
> [  643.710181]  [<ffffffffb0301dc4>] ? _raw_spin_unlock_irq+0x64/0x200
> [  643.710191]  [<ffffffffa0461f1b>] ? process_one_work+0x5fb/0x1660
> [  643.710197]  [<ffffffffa0463545>] ? worker_thread+0x5c5/0x1680
> [  643.710205]  [<ffffffffb02f0b6f>] ? __schedule+0xf6f/0x2fc0
> [  643.710211]  [<ffffffffa0462f80>] ? process_one_work+0x1660/0x1660
> [  643.710216]  [<ffffffffa047ae22>] ? kthread+0x1f2/0x2b0
> [  643.710221]  [<ffffffffa047ac30>] ? kthread_worker_fn+0x6a0/0x6a0
> [  643.710226]  [<ffffffffb030243c>] ? ret_from_fork+0x7c/0xb0
> [  643.710233]  [<ffffffffa047ac30>] ? kthread_worker_fn+0x6a0/0x6a0

Which in theory should have been addressed by the "Make
cond_resched_rcu_qs() apply to normal RCU flavors" patch,
given that this is CPU 0, which should be taking scheduling
clock interrupts.  Well, I guess that theory and practice
are only the same in theory.  :-/

Will dig into it more.

							Thanx, Paul

> [  643.711486]
> [  643.711486]  (detected by 22, t=2104 jiffies, g=10453, c=10452, q=1570)
> [  643.711486] All QSes seen, last rcu_preempt kthread activity 4294999565/4295001669, jiffies_till_next_fqs=1
> 
> 
> Thanks,
> Sasha
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-12 12:58                                                             ` Martin van Es
@ 2014-12-15 12:07                                                               ` Martin van Es
  0 siblings, 0 replies; 486+ messages in thread
From: Martin van Es @ 2014-12-15 12:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Fri, Dec 12, 2014 at 1:58 PM, Martin van Es <mrvanes@gmail.com> wrote:
> On Sat, Dec 6, 2014 at 9:09 PM, Linus Torvalds
> I will give 3.18 a try on production J1900. Knowing I can go back to
> safety in 3.16.7 won't hurt too much of my reputation I hope.

3.18 froze twice (just to be sure) as well. Will commence the slow and
painful bisect between 3.16.7 and 3.17rc1 after I've established that
the latter freezes reliably.
I assume my problems are not linked to the RCU problems described in
this thread, since I get no kernel log output and my symptoms are different?
Should I start a new thread, or maybe better file a bug and report
progress on the bisect there? I won't be too noisy on the list with my
findings until I'm done, though.

Best regards,
Martin
--
If 'but' was any useful, it would be a logic operator

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15  6:33                                                                                     ` Paul E. McKenney
@ 2014-12-15 12:56                                                                                       ` Paul E. McKenney
  2014-12-15 13:16                                                                                         ` Sasha Levin
  0 siblings, 1 reply; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-15 12:56 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Ingo Molnar, David Lang, Linus Torvalds, Dave Jones, Chris Mason,
	Mike Galbraith, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List

On Sun, Dec 14, 2014 at 10:33:31PM -0800, Paul E. McKenney wrote:
> On Sun, Dec 14, 2014 at 08:20:13PM -0500, Sasha Levin wrote:
> > On 12/14/2014 07:11 PM, Paul E. McKenney wrote:
> > >> Does it depend on anything not currently in -next? My build fails with
> > >> > 
> > >> > kernel/rcu/tree.c: In function ‘rcu_report_qs_rdp’:
> > >> > kernel/rcu/tree.c:2099:6: error: ‘struct rcu_data’ has no member named ‘gpwrap’
> > >> >    rdp->gpwrap) {
> > > Indeed it does.  Please see below for a port to current mainline.
> > 
> > With the patch:
> > 
> > 
> > [  620.340045] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [  620.341087]  (detected by 22, t=8407 jiffies, g=10452, c=10451, q=3622)
> > [  620.342154] All QSes seen, last rcu_preempt kthread activity 4294990929/4294999330, jiffies_till_next_fqs=1
> 
> OK, 8401 jiffies since the grace-period kthread ran.  This is without
> the "Run grace-period kthreads at real-time priority" patch?
> 
> > [  643.710049] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [  643.710073] INFO: rcu_sched detected stalls on CPUs/tasks:
> > [  643.710093]  0: (6 ticks this GP) idle=bd5/140000000000002/0 softirq=12421/12421 fqs=0 last_accelerate: 5283/8643, nonlazy_posted: 643841, ..
> > [  643.710110]  (detected by 1, t=2102 jiffies, g=-292, c=-293, q=0)
> 
> But this one is real.
> 
> > [  643.710112] Task dump for CPU 0:
> > [  643.710129] kworker/0:1     R  running task    13016   628      2 0x10080008
> > [  643.710148] Workqueue: events vmstat_update
> > [  643.710156]  ffffffffb0301dc4 ffff88006be15000 ffff880060ba1c70 0000000000000000
> > [  643.710161]  ffff88006be10680 ffff8800633efde8 ffffffffa0461f1b ffff88006a776000
> > [  643.710166]  ffff880060ba1cb8 ffff880060ba1c78 ffff880060ba1c80 ffff880060ba1c90
> > [  643.710168] Call Trace:
> > [  643.710181]  [<ffffffffb0301dc4>] ? _raw_spin_unlock_irq+0x64/0x200
> > [  643.710191]  [<ffffffffa0461f1b>] ? process_one_work+0x5fb/0x1660
> > [  643.710197]  [<ffffffffa0463545>] ? worker_thread+0x5c5/0x1680
> > [  643.710205]  [<ffffffffb02f0b6f>] ? __schedule+0xf6f/0x2fc0
> > [  643.710211]  [<ffffffffa0462f80>] ? process_one_work+0x1660/0x1660
> > [  643.710216]  [<ffffffffa047ae22>] ? kthread+0x1f2/0x2b0
> > [  643.710221]  [<ffffffffa047ac30>] ? kthread_worker_fn+0x6a0/0x6a0
> > [  643.710226]  [<ffffffffb030243c>] ? ret_from_fork+0x7c/0xb0
> > [  643.710233]  [<ffffffffa047ac30>] ? kthread_worker_fn+0x6a0/0x6a0
> 
> Which in theory should have been addressed by the "Make
> cond_resched_rcu_qs() apply to normal RCU flavors" patch,
> given that this is CPU 0, which should be taking scheduling
> clock interrupts.  Well, I guess that theory and practice
> are only the same in theory.  :-/
> 
> Will dig into it more.

And maybe it would help if I did the CONFIG_TASKS_RCU=n case as well as
the CONFIG_TASKS_RCU=y case.  Please see below for an updated patch.

							Thanx, Paul

------------------------------------------------------------------------

rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors

Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
in places where it would be useful for it to apply to the normal RCU
flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
case for workloads that aggressively overload the system, particularly
those that generate large numbers of RCU updates on systems running
NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
from cond_resched_rcu_qs() to the normal RCU flavors.

Note that it is unfortunately necessary to leave the old ->passed_quiesce
mechanism in place to allow quiescent states that apply to only one
flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
that case, but that is not so good for debugging of RCU internals.)
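
Roughly, the mechanism boils down to the following toy, single-CPU
model (a sketch only, reusing the field names from the patch; the real
code is per-CPU and lives in the diff below):

#include <stdbool.h>
#include <stdio.h>

static unsigned long rcu_qs_ctr;	/* per-CPU in the real patch */
static unsigned long rcu_qs_ctr_snap;	/* rdp->rcu_qs_ctr_snap */
static bool passed_quiesce;		/* rdp->passed_quiesce */

/* Called from cond_resched_rcu_qs(): cheap, no locks or atomics. */
static void rcu_all_qs(void)
{
	rcu_qs_ctr++;
}

/* Grace-period start (__note_gp_changes() analogue): take a snapshot. */
static void note_gp_start(void)
{
	passed_quiesce = false;
	rcu_qs_ctr_snap = rcu_qs_ctr;
}

/* rcu_check_quiescent_state() analogue: a quiescent state counts if
 * either the old flag was set or the counter moved since the snapshot. */
static bool cpu_passed_qs(void)
{
	return passed_quiesce || rcu_qs_ctr_snap != rcu_qs_ctr;
}

int main(void)
{
	note_gp_start();
	printf("before cond_resched_rcu_qs(): qs=%d\n", cpu_passed_qs());
	rcu_all_qs();	/* overloaded kthread hits a resched point */
	printf("after  cond_resched_rcu_qs(): qs=%d\n", cpu_passed_qs());
	return 0;
}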

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index b63b9bb3bc0c..08651da15448 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -56,14 +56,14 @@ rcuboost:
 
 The output of "cat rcu/rcu_preempt/rcudata" looks as follows:
 
-  0!c=30455 g=30456 pq=1 qp=1 dt=126535/140000000000000/0 df=2002 of=4 ql=0/0 qs=N... b=10 ci=74572 nci=0 co=1131 ca=716
-  1!c=30719 g=30720 pq=1 qp=0 dt=132007/140000000000000/0 df=1874 of=10 ql=0/0 qs=N... b=10 ci=123209 nci=0 co=685 ca=982
-  2!c=30150 g=30151 pq=1 qp=1 dt=138537/140000000000000/0 df=1707 of=8 ql=0/0 qs=N... b=10 ci=80132 nci=0 co=1328 ca=1458
-  3 c=31249 g=31250 pq=1 qp=0 dt=107255/140000000000000/0 df=1749 of=6 ql=0/450 qs=NRW. b=10 ci=151700 nci=0 co=509 ca=622
-  4!c=29502 g=29503 pq=1 qp=1 dt=83647/140000000000000/0 df=965 of=5 ql=0/0 qs=N... b=10 ci=65643 nci=0 co=1373 ca=1521
-  5 c=31201 g=31202 pq=1 qp=1 dt=70422/0/0 df=535 of=7 ql=0/0 qs=.... b=10 ci=58500 nci=0 co=764 ca=698
-  6!c=30253 g=30254 pq=1 qp=1 dt=95363/140000000000000/0 df=780 of=5 ql=0/0 qs=N... b=10 ci=100607 nci=0 co=1414 ca=1353
-  7 c=31178 g=31178 pq=1 qp=0 dt=91536/0/0 df=547 of=4 ql=0/0 qs=.... b=10 ci=109819 nci=0 co=1115 ca=969
+  0!c=30455 g=30456 pq=1/0 qp=1 dt=126535/140000000000000/0 df=2002 of=4 ql=0/0 qs=N... b=10 ci=74572 nci=0 co=1131 ca=716
+  1!c=30719 g=30720 pq=1/0 qp=0 dt=132007/140000000000000/0 df=1874 of=10 ql=0/0 qs=N... b=10 ci=123209 nci=0 co=685 ca=982
+  2!c=30150 g=30151 pq=1/1 qp=1 dt=138537/140000000000000/0 df=1707 of=8 ql=0/0 qs=N... b=10 ci=80132 nci=0 co=1328 ca=1458
+  3 c=31249 g=31250 pq=1/1 qp=0 dt=107255/140000000000000/0 df=1749 of=6 ql=0/450 qs=NRW. b=10 ci=151700 nci=0 co=509 ca=622
+  4!c=29502 g=29503 pq=1/0 qp=1 dt=83647/140000000000000/0 df=965 of=5 ql=0/0 qs=N... b=10 ci=65643 nci=0 co=1373 ca=1521
+  5 c=31201 g=31202 pq=1/0 qp=1 dt=70422/0/0 df=535 of=7 ql=0/0 qs=.... b=10 ci=58500 nci=0 co=764 ca=698
+  6!c=30253 g=30254 pq=1/0 qp=1 dt=95363/140000000000000/0 df=780 of=5 ql=0/0 qs=N... b=10 ci=100607 nci=0 co=1414 ca=1353
+  7 c=31178 g=31178 pq=1/0 qp=0 dt=91536/0/0 df=547 of=4 ql=0/0 qs=.... b=10 ci=109819 nci=0 co=1115 ca=969
 
 This file has one line per CPU, or eight for this 8-CPU system.
 The fields are as follows:
@@ -188,14 +188,14 @@ o	"ca" is the number of RCU callbacks that have been adopted by this
 Kernels compiled with CONFIG_RCU_BOOST=y display the following from
 /debug/rcu/rcu_preempt/rcudata:
 
-  0!c=12865 g=12866 pq=1 qp=1 dt=83113/140000000000000/0 df=288 of=11 ql=0/0 qs=N... kt=0/O ktl=944 b=10 ci=60709 nci=0 co=748 ca=871
-  1 c=14407 g=14408 pq=1 qp=0 dt=100679/140000000000000/0 df=378 of=7 ql=0/119 qs=NRW. kt=0/W ktl=9b6 b=10 ci=109740 nci=0 co=589 ca=485
-  2 c=14407 g=14408 pq=1 qp=0 dt=105486/0/0 df=90 of=9 ql=0/89 qs=NRW. kt=0/W ktl=c0c b=10 ci=83113 nci=0 co=533 ca=490
-  3 c=14407 g=14408 pq=1 qp=0 dt=107138/0/0 df=142 of=8 ql=0/188 qs=NRW. kt=0/W ktl=b96 b=10 ci=121114 nci=0 co=426 ca=290
-  4 c=14405 g=14406 pq=1 qp=1 dt=50238/0/0 df=706 of=7 ql=0/0 qs=.... kt=0/W ktl=812 b=10 ci=34929 nci=0 co=643 ca=114
-  5!c=14168 g=14169 pq=1 qp=0 dt=45465/140000000000000/0 df=161 of=11 ql=0/0 qs=N... kt=0/O ktl=b4d b=10 ci=47712 nci=0 co=677 ca=722
-  6 c=14404 g=14405 pq=1 qp=0 dt=59454/0/0 df=94 of=6 ql=0/0 qs=.... kt=0/W ktl=e57 b=10 ci=55597 nci=0 co=701 ca=811
-  7 c=14407 g=14408 pq=1 qp=1 dt=68850/0/0 df=31 of=8 ql=0/0 qs=.... kt=0/W ktl=14bd b=10 ci=77475 nci=0 co=508 ca=1042
+  0!c=12865 g=12866 pq=1/0 qp=1 dt=83113/140000000000000/0 df=288 of=11 ql=0/0 qs=N... kt=0/O ktl=944 b=10 ci=60709 nci=0 co=748 ca=871
+  1 c=14407 g=14408 pq=1/0 qp=0 dt=100679/140000000000000/0 df=378 of=7 ql=0/119 qs=NRW. kt=0/W ktl=9b6 b=10 ci=109740 nci=0 co=589 ca=485
+  2 c=14407 g=14408 pq=1/0 qp=0 dt=105486/0/0 df=90 of=9 ql=0/89 qs=NRW. kt=0/W ktl=c0c b=10 ci=83113 nci=0 co=533 ca=490
+  3 c=14407 g=14408 pq=1/0 qp=0 dt=107138/0/0 df=142 of=8 ql=0/188 qs=NRW. kt=0/W ktl=b96 b=10 ci=121114 nci=0 co=426 ca=290
+  4 c=14405 g=14406 pq=1/0 qp=1 dt=50238/0/0 df=706 of=7 ql=0/0 qs=.... kt=0/W ktl=812 b=10 ci=34929 nci=0 co=643 ca=114
+  5!c=14168 g=14169 pq=1/0 qp=0 dt=45465/140000000000000/0 df=161 of=11 ql=0/0 qs=N... kt=0/O ktl=b4d b=10 ci=47712 nci=0 co=677 ca=722
+  6 c=14404 g=14405 pq=1/0 qp=0 dt=59454/0/0 df=94 of=6 ql=0/0 qs=.... kt=0/W ktl=e57 b=10 ci=55597 nci=0 co=701 ca=811
+  7 c=14407 g=14408 pq=1/0 qp=1 dt=68850/0/0 df=31 of=8 ql=0/0 qs=.... kt=0/W ktl=14bd b=10 ci=77475 nci=0 co=508 ca=1042
 
 This is similar to the output discussed above, but contains the following
 additional fields:
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index ed4f5939a452..aa894d4fe375 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -331,12 +331,13 @@ static inline void rcu_init_nohz(void)
 extern struct srcu_struct tasks_rcu_exit_srcu;
 #define rcu_note_voluntary_context_switch(t) \
 	do { \
+		rcu_all_qs(); \
 		if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
 			ACCESS_ONCE((t)->rcu_tasks_holdout) = false; \
 	} while (0)
 #else /* #ifdef CONFIG_TASKS_RCU */
 #define TASKS_RCU(x) do { } while (0)
-#define rcu_note_voluntary_context_switch(t)	do { } while (0)
+#define rcu_note_voluntary_context_switch(t)	do { rcu_all_qs(); } while (0)
 #endif /* #else #ifdef CONFIG_TASKS_RCU */
 
 /**
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 0e5366200154..fabd3fad8516 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -154,7 +154,10 @@ static inline bool rcu_is_watching(void)
 	return true;
 }
 
-
 #endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
 
+static inline void rcu_all_qs(void)
+{
+}
+
 #endif /* __LINUX_RCUTINY_H */
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 52953790dcca..3344783af1f2 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -97,4 +97,10 @@ extern int rcu_scheduler_active __read_mostly;
 
 bool rcu_is_watching(void);
 
+DECLARE_PER_CPU(unsigned long, rcu_qs_ctr);
+static inline void rcu_all_qs(void)
+{
+	this_cpu_inc(rcu_qs_ctr);
+}
+
 #endif /* __LINUX_RCUTREE_H */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7680fc275036..38f0009b999b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -215,6 +215,9 @@ static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
 #endif /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
 };
 
+DEFINE_PER_CPU_SHARED_ALIGNED(unsigned long, rcu_qs_ctr);
+EXPORT_PER_CPU_SYMBOL_GPL(rcu_qs_ctr);
+
 /*
  * Let the RCU core know that this CPU has gone through the scheduler,
  * which is a quiescent state.  This is called when the need for a
@@ -1554,6 +1557,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
 		rdp->gpnum = rnp->gpnum;
 		trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpustart"));
 		rdp->passed_quiesce = 0;
+		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
 		rdp->qs_pending = !!(rnp->qsmask & rdp->grpmask);
 		zero_cpu_stall_ticks(rdp);
 	}
@@ -2020,6 +2024,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
 		 * within the current grace period.
 		 */
 		rdp->passed_quiesce = 0;	/* need qs for new gp. */
+		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		return;
 	}
@@ -2064,7 +2069,8 @@ rcu_check_quiescent_state(struct rcu_state *rsp, struct rcu_data *rdp)
 	 * Was there a quiescent state since the beginning of the grace
 	 * period? If no, then exit and wait for the next call.
 	 */
-	if (!rdp->passed_quiesce)
+	if (!rdp->passed_quiesce &&
+	    rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr))
 		return;
 
 	/*
@@ -3109,9 +3115,12 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
 
 	/* Is the RCU core waiting for a quiescent state from this CPU? */
 	if (rcu_scheduler_fully_active &&
-	    rdp->qs_pending && !rdp->passed_quiesce) {
+	    rdp->qs_pending && !rdp->passed_quiesce &&
+	    rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr)) {
 		rdp->n_rp_qs_pending++;
-	} else if (rdp->qs_pending && rdp->passed_quiesce) {
+	} else if (rdp->qs_pending &&
+		   (rdp->passed_quiesce ||
+		    rdp->rcu_qs_ctr_snap != __this_cpu_read(rcu_qs_ctr))) {
 		rdp->n_rp_report_qs++;
 		return 1;
 	}
@@ -3444,6 +3453,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 			rdp->gpnum = rnp->completed;
 			rdp->completed = rnp->completed;
 			rdp->passed_quiesce = 0;
+			rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
 			rdp->qs_pending = 0;
 			trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpuonl"));
 		}
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 8e7b1843896e..c259a0bc0d97 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -257,6 +257,8 @@ struct rcu_data {
 					/*  in order to detect GP end. */
 	unsigned long	gpnum;		/* Highest gp number that this CPU */
 					/*  is aware of having started. */
+	unsigned long	rcu_qs_ctr_snap;/* Snapshot of rcu_qs_ctr to check */
+					/*  for rcu_all_qs() invocations. */
 	bool		passed_quiesce;	/* User-mode/idle loop etc. */
 	bool		qs_pending;	/* Core waits for quiesc state. */
 	bool		beenonline;	/* CPU online at least once. */
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index 5cdc62e1beeb..4ec028a9987a 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -115,11 +115,13 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp)
 
 	if (!rdp->beenonline)
 		return;
-	seq_printf(m, "%3d%cc=%ld g=%ld pq=%d qp=%d",
+	seq_printf(m, "%3d%cc=%ld g=%ld pq=%d/%d qp=%d",
 		   rdp->cpu,
 		   cpu_is_offline(rdp->cpu) ? '!' : ' ',
 		   ulong2long(rdp->completed), ulong2long(rdp->gpnum),
-		   rdp->passed_quiesce, rdp->qs_pending);
+		   rdp->passed_quiesce,
+		   rdp->rcu_qs_ctr_snap == __this_cpu_read(rcu_qs_ctr),
+		   rdp->qs_pending);
 	seq_printf(m, " dt=%d/%llx/%d df=%lu",
 		   atomic_read(&rdp->dynticks->dynticks),
 		   rdp->dynticks->dynticks_nesting,


^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15 12:56                                                                                       ` Paul E. McKenney
@ 2014-12-15 13:16                                                                                         ` Sasha Levin
  2014-12-16  3:40                                                                                           ` Paul E. McKenney
  0 siblings, 1 reply; 486+ messages in thread
From: Sasha Levin @ 2014-12-15 13:16 UTC (permalink / raw)
  To: paulmck
  Cc: Ingo Molnar, David Lang, Linus Torvalds, Dave Jones, Chris Mason,
	Mike Galbraith, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List

On 12/15/2014 07:56 AM, Paul E. McKenney wrote:
> And maybe it would help if I did the CONFIG_TASKS_RCU=n case as well as
> the CONFIG_TASKS_RCU=y case.  Please see below for an updated patch.

I do have CONFIG_TASKS_RCU=y


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15  5:47                                                                           ` Linus Torvalds
  2014-12-15  5:57                                                                             ` Dave Jones
@ 2014-12-15 14:00                                                                             ` Borislav Petkov
  2014-12-18 21:17                                                                             ` save_xstate_sig (Re: frequent lockups in 3.18rc4) Andy Lutomirski
  2 siblings, 0 replies; 486+ messages in thread
From: Borislav Petkov @ 2014-12-15 14:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Sun, Dec 14, 2014 at 09:47:26PM -0800, Linus Torvalds wrote:
> and "save_xstate_sig+0x81" shows up on all stacks, although only on
> CPU1 does it show up as a "guaranteed" part of the stack chain (ie it
> matches frame pointer data too). CPU1 also has that __clear_user show
> up (which is called from save_xstate_sig), but not other CPU's.  CPU2
> and CPU3 have "save_xstate_sig+0x98" in addition to that +0x81 thing.
> 
> My guess is that "save_xstate_sig+0x81" is the instruction after the
> __clear_user call, and that CPU1 took the fault in __clear_user(),
> while CPU2 and CPU3 took the fault at "save_xstate_sig+0x98" instead,
> which I'd guess is the
> 
>         xsave64 (%rdi)

Err, maybe a wild guess, but could XSAVE be encountering some problems,
like store ordering violations or somesuch?

Quick search shows

"AZ72. Store Ordering Violation When Using XSAVE"

here http://download.intel.com/design/mobile/specupdt/320121.pdf which
talks about SSE context stores happening out of order. Now, there are a
lot of IFs here, like whether Dave's machine even has the erratum and,
even if it does, whether that erratum could cause some sort of livelock
leading to the kernel lockups, and so on and so on...

It might be worth ruling out, though.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15  5:57                                                                             ` Dave Jones
@ 2014-12-15 18:21                                                                               ` Linus Torvalds
  2014-12-15 23:46                                                                                 ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-15 18:21 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

[-- Attachment #1: Type: text/plain, Size: 1575 bytes --]

On Sun, Dec 14, 2014 at 9:57 PM, Dave Jones <davej@redhat.com> wrote:
>
> We had a flashback to that old bug last month too.
> See this mail & your followup: https://lkml.org/lkml/2014/11/25/1171
> That was during a bisect though, so may have been something
> entirely different, but it is a spooky coincidence.

Yeah, there's something funny going on there.

Anyway, I've looked at the page fault patch, and I mentioned this last
time it came up: there's a nasty possible kernel loop in the "retry"
case if there's also a fatal signal pending, and we're returning to
kernel mode rather than returning to user mode.

If we return to user mode, the return will handle signals, and we'll
kill the process due to the fatal pending signal and everything is
fine.

But if we're returning to kernel mode, we'll just take the page fault
again. And again. And again. Until the condition that caused the retry
is finally cleared.

Now, normally finishing IO on the page or whatever should get things
done, but whatever. Us busy-looping on it in kernel space might end up
delaying that too forever.
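
Roughly, the failure mode has this shape (a toy model with made-up
helper names, just to show the loop; the real logic is in
__do_page_fault()):

#include <stdbool.h>
#include <stdio.h>

#define VM_FAULT_RETRY 0x0400

/* Stand-in for handle_mm_fault(): the page never becomes ready. */
static int fake_handle_mm_fault(void)
{
	return VM_FAULT_RETRY;
}

static void fake_page_fault(bool to_user, bool fatal_signal)
{
	unsigned long refaults = 0;

	for (;;) {	/* the faulting instruction re-executes */
		int fault = fake_handle_mm_fault();

		if ((fault & VM_FAULT_RETRY) && fatal_signal) {
			if (to_user)
				return;	/* signal delivered on return to user */
			/* return to kernel mode: nobody delivers the signal,
			 * so we immediately fault again on the same address */
			if (++refaults > 3) {
				printf("busy-looping on the same fault\n");
				return;	/* the watchdog fires here in reality */
			}
			continue;
		}
		return;
	}
}

int main(void)
{
	fake_page_fault(false, true);	/* kernel mode, fatal signal pending */
	return 0;
}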

So let's just fix it. Here's a completely untested patch. It looks
bigger than it really is: it moves the "up_read()" up a bit in
__do_page_fault(), so that all the logic is saner. This is "tested" in
the sense that I am running a kernel with this patch, but I could
easily have screwed up some fault handling case.

Anyway, at least CPU1 in your traces was actually going through that
__lock_page_or_retry() code that could trigger this, so...

                               Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 4271 bytes --]

 arch/x86/mm/fault.c | 65 +++++++++++++++++++++++++----------------------------
 1 file changed, 30 insertions(+), 35 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d973e61e450d..b38adc1cd39f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -844,11 +844,8 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 	  unsigned int fault)
 {
 	struct task_struct *tsk = current;
-	struct mm_struct *mm = tsk->mm;
 	int code = BUS_ADRERR;
 
-	up_read(&mm->mmap_sem);
-
 	/* Kernel mode? Handle exceptions or die: */
 	if (!(error_code & PF_USER)) {
 		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
@@ -879,7 +876,6 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 	       unsigned long address, unsigned int fault)
 {
 	if (fatal_signal_pending(current) && !(error_code & PF_USER)) {
-		up_read(&current->mm->mmap_sem);
 		no_context(regs, error_code, address, 0, 0);
 		return;
 	}
@@ -887,14 +883,11 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 	if (fault & VM_FAULT_OOM) {
 		/* Kernel mode? Handle exceptions or die: */
 		if (!(error_code & PF_USER)) {
-			up_read(&current->mm->mmap_sem);
 			no_context(regs, error_code, address,
 				   SIGSEGV, SEGV_MAPERR);
 			return;
 		}
 
-		up_read(&current->mm->mmap_sem);
-
 		/*
 		 * We ran out of memory, call the OOM killer, and return the
 		 * userspace (which will retry the fault, or kill us if we got
@@ -1062,7 +1055,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
 	struct mm_struct *mm;
-	int fault;
+	int fault, major = 0;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	tsk = current;
@@ -1237,14 +1230,31 @@ good_area:
 	 * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked.
 	 */
 	fault = handle_mm_fault(mm, vma, address, flags);
+	major |= fault & VM_FAULT_MAJOR;
 
 	/*
-	 * If we need to retry but a fatal signal is pending, handle the
-	 * signal first. We do not need to release the mmap_sem because it
-	 * would already be released in __lock_page_or_retry in mm/filemap.c.
+	 * If we need to retry the mmap_sem has already been released,
+	 * and if there is a fatal signal pending there is no guarantee
+	 * that we made any progress. Handle this case first.
 	 */
-	if (unlikely((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)))
+	if (unlikely(fault & VM_FAULT_RETRY)) {
+		if ((flags & FAULT_FLAG_ALLOW_RETRY) && !fatal_signal_pending(current)) {
+			/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
+			 * of starvation. */
+			flags &= ~FAULT_FLAG_ALLOW_RETRY;
+			flags |= FAULT_FLAG_TRIED;
+			goto retry;
+		}
+
+		/* Not returning to user mode? Handle exceptions or die: */
+		if (!(flags & FAULT_FLAG_USER))
+			no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
+
+		/* User mode? Just return to handle the fatal exception */
 		return;
+	}
+
+	up_read(&mm->mmap_sem);
 
 	if (unlikely(fault & VM_FAULT_ERROR)) {
 		mm_fault_error(regs, error_code, address, fault);
@@ -1252,32 +1262,17 @@ good_area:
 	}
 
 	/*
-	 * Major/minor page fault accounting is only done on the
-	 * initial attempt. If we go through a retry, it is extremely
-	 * likely that the page will be found in page cache at that point.
+	 * Major/minor page fault accounting. If any of the events
+	 * returned VM_FAULT_MAJOR, we account it as a major fault.
 	 */
-	if (flags & FAULT_FLAG_ALLOW_RETRY) {
-		if (fault & VM_FAULT_MAJOR) {
-			tsk->maj_flt++;
-			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
-				      regs, address);
-		} else {
-			tsk->min_flt++;
-			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
-				      regs, address);
-		}
-		if (fault & VM_FAULT_RETRY) {
-			/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
-			 * of starvation. */
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
-			flags |= FAULT_FLAG_TRIED;
-			goto retry;
-		}
+	if (major) {
+		tsk->maj_flt++;
+		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
+	} else {
+		tsk->min_flt++;
+		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
 	}
-
 	check_v8086_mode(regs, address, tsk);
-
-	up_read(&mm->mmap_sem);
 }
 NOKPROBE_SYMBOL(__do_page_fault);
 

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15 18:21                                                                               ` Linus Torvalds
@ 2014-12-15 23:46                                                                                 ` Linus Torvalds
  2014-12-18  2:42                                                                                   ` Sasha Levin
  2014-12-18  5:13                                                                                   ` Dave Jones
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-15 23:46 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

On Mon, Dec 15, 2014 at 10:21 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So let's just fix it. Here's a completely untested patch.

So after looking at this more, I'm actually really convinced that this
was a pretty nasty bug.

I'm *not* convinced that it's necessarily *your* bug, but I still
think it could be.

I cleaned up the patch a bit, split it up into two to clarify it, and
have committed it to my tree. I'm not marking the patches for stable,
because while I'm convinced it's a bug, I'm also not sure why even if
it triggers it doesn't eventually recover when the IO completes. So
I'd mark them for stable only if they are actually confirmed to fix
anything in the wild, and after they've gotten some testing in
general. The patches *look* straightforward, they remove more lines
than they add, and I think the code is more understandable too, but
maybe I just screwed up. Whatever. Some care is warranted, but this is
the first time I feel like I actually fixed something that matched at
least one of your lockup symptoms.

Anyway, it's there as

  26178ec11ef3 ("x86: mm: consolidate VM_FAULT_RETRY handling")
  7fb08eca4527 ("x86: mm: move mmap_sem unlock from mm_fault_error() to caller")

and I'll continue to look at the page fault patch. I still have a
slight worry that it's something along the lines of corrupted page
tables or some core VM issue, but apart from my general nervousness
about the auto-numa code (which will be cleaned up eventually through
the pte_protnone patches), I can't actually see how you'd get into
endless page faults any other way. So I'm really hoping that the buggy
VM_FAULT_RETRY handling explains it.

But me not seeing any other bug clearly doesn't mean it doesn't exist.

                    Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15 13:16                                                                                         ` Sasha Levin
@ 2014-12-16  3:40                                                                                           ` Paul E. McKenney
  0 siblings, 0 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-16  3:40 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Ingo Molnar, David Lang, Linus Torvalds, Dave Jones, Chris Mason,
	Mike Galbraith, Peter Zijlstra, Dâniel Fraga,
	Linux Kernel Mailing List

On Mon, Dec 15, 2014 at 08:16:04AM -0500, Sasha Levin wrote:
> On 12/15/2014 07:56 AM, Paul E. McKenney wrote:
> > And maybe it would help if I did the CONFIG_TASKS_RCU=n case as well as
> > the CONFIG_TASKS_RCU=y case.  Please see below for an updated patch.
> 
> I do have CONFIG_TASKS_RCU=y

OK, back to the drawing board...

							Thanx, Paul


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-11-21 22:55                                                                                         ` Linus Torvalds
  2014-11-21 23:03                                                                                           ` Andy Lutomirski
@ 2014-12-16 19:28                                                                                           ` Peter Zijlstra
  2014-12-16 20:46                                                                                             ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Peter Zijlstra @ 2014-12-16 19:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Andy Lutomirski, Steven Rostedt, Tejun Heo,
	linux-kernel, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Fri, Nov 21, 2014 at 02:55:27PM -0800, Linus Torvalds wrote:
> On Fri, Nov 21, 2014 at 1:11 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > I'm fine with that. I just think it's not horrid enough, but that can
> > be fixed easily :)
> 
> Oh, I think it's plenty horrid.
> 
> Anyway, here's an actual patch. As usual, it has seen absolutely no
> actual testing, but I did try to make sure it compiles and seems to do
> the right thing on:
>  - x86-32 no-PAE
>  - x86-32 no-PAE with PARAVIRT
>  - x86-32 PAE
>  - x86-64
> 
> also, I just removed the noise that is "vmalloc_sync_all()", since
> it's just all garbage and nothing actually uses it. Yeah, it's used by
> "register_die_notifier()", which makes no sense what-so-ever.
> Whatever. It's gone.
> 
> Can somebody actually *test* this? In particular, in any kind of real
> paravirt environment? Or, any comments even without testing?
> 
> I *really* am not proud of the mess wrt the whole
> 
>   #ifdef CONFIG_PARAVIRT
>   #ifdef CONFIG_X86_32
>     ...
> 
> but I think that from a long-term perspective, we're actually better
> off with this kind of really ugly - but very explicit - hack that very
> clearly shows what is going on.
> 
> The old code that actually "walked" the page tables was more
> "portable", but was somewhat misleading about what was actually going
> on.
> 
> Comments?

While going through this thread I wondered whatever became of this
patch. It seems a shame to forget about it entirely. Maybe just queued
for later while hunting wabbits?

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-16 19:28                                                                                           ` Peter Zijlstra
@ 2014-12-16 20:46                                                                                             ` Linus Torvalds
  2014-12-16 21:19                                                                                               ` Mel Gorman
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-16 20:46 UTC (permalink / raw)
  To: Peter Zijlstra, Mel Gorman
  Cc: Thomas Gleixner, Andy Lutomirski, Steven Rostedt, Tejun Heo,
	linux-kernel, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	Don Zickus, Dave Jones, the arch/x86 maintainers

On Tue, Dec 16, 2014 at 11:28 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> While going through this thread I wondered whatever became of this
> patch. It seems a shame to forget about it entirely. Maybe just queued
> for later while hunting wabbits?

Mel Gorman took it up, cleaned up some stuff, and I think it's in -mm
or on its way there. I'm assuming it's 3.20 material by now.

                         Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-16 20:46                                                                                             ` Linus Torvalds
@ 2014-12-16 21:19                                                                                               ` Mel Gorman
  2014-12-16 23:02                                                                                                 ` Peter Zijlstra
  0 siblings, 1 reply; 486+ messages in thread
From: Mel Gorman @ 2014-12-16 21:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Thomas Gleixner, Andy Lutomirski, Steven Rostedt,
	Tejun Heo, linux-kernel, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Tue, Dec 16, 2014 at 12:46:57PM -0800, Linus Torvalds wrote:
> On Tue, Dec 16, 2014 at 11:28 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > While going through this thread I wondered whatever became of this
> > patch. It seems a shame to forget about it entirely. Maybe just queued
> > for later while hunting wabbits?
> 
> Mel Gorman took it up, cleaned up some stuff, and I think it's in -mm
> or on its way there. I'm assuming it's 3.20 material by now.
> 

I didn't pick up this one. My pickup and cleaning was on the PROT_NONE
material for automatic NUMA balancing.  It was posted way too close to
the merge window for -mm. It's post 3.19-rc1 + post Christmas holiday
material.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-16 21:19                                                                                               ` Mel Gorman
@ 2014-12-16 23:02                                                                                                 ` Peter Zijlstra
  2014-12-17  0:00                                                                                                   ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Peter Zijlstra @ 2014-12-16 23:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linus Torvalds, Thomas Gleixner, Andy Lutomirski, Steven Rostedt,
	Tejun Heo, linux-kernel, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Tue, Dec 16, 2014 at 09:19:21PM +0000, Mel Gorman wrote:
> On Tue, Dec 16, 2014 at 12:46:57PM -0800, Linus Torvalds wrote:
> > On Tue, Dec 16, 2014 at 11:28 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > While going through this thread I wondered whatever became of this
> > > patch. It seems a shame to forget about it entirely. Maybe just queued
> > > for later while hunting wabbits?
> > 
> > Mel Gorman took it up, cleaned up some stuff, and I think it's in -mm
> > or on its way there. I'm assuming it's 3.20 material by now.
> > 
> 
> I didn't pick up this one. My pickup and cleaning was on the PROT_NONE
> material for automatic NUMA balancing.  It was posted way too close to
> the merge window for -mm. It's post 3.19-rc1 + post Christmas holiday
> material.

OK, should we just stick it in the x86 tree and see if anything
explodes? ;-)

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-16 23:02                                                                                                 ` Peter Zijlstra
@ 2014-12-17  0:00                                                                                                   ` Linus Torvalds
  2014-12-17  0:41                                                                                                     ` Andy Lutomirski
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-17  0:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mel Gorman, Thomas Gleixner, Andy Lutomirski, Steven Rostedt,
	Tejun Heo, linux-kernel, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Tue, Dec 16, 2014 at 3:02 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> OK, should we just stick it in the x86 tree and see if anything
> explodes? ;-)

Gaah, I got confused about the patches.

And something did explode, it showed some Xen nasties. Xen has that
odd "we don't share PMD entries between MM's" thing going on, which
means that the vmalloc fault thing does actually have to occasionally
walk two levels rather than just copy the top level. I'm still not
sure why Xen doesn't share PMD's, since threads that share the MM
clearly can share PMD's within Xen, but I gave up on it.

That said, making x86-64 use "read_cr3()" instead of
"current->active_mm" would at least make things a bit safer wrt NMI's
during the task switch, of course.  So *some* 32/64-bit consolidation
should be done, but my patch went a bit too far for Xen.
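
For reference, the 64-bit part being discussed is roughly this shape
(a sketch only, modeled on the existing x86-64 vmalloc_fault(); the
pud/pmd/pte-level checks are omitted, and locating the pgd via
"read_cr3() & PAGE_MASK" is the one change relative to the
current->active_mm lookup in mainline):

/* Fix up a vmalloc-area fault by copying the missing top-level entry
 * into the page table the CPU is actually using.  Reading CR3 is safe
 * even if an NMI lands in the middle of a task switch, where
 * current->active_mm may not match the hardware state yet. */
static noinline int vmalloc_fault(unsigned long address)
{
	pgd_t *pgd_ref = pgd_offset_k(address);		/* init_mm's entry */
	pgd_t *pgd;

	if (!(address >= VMALLOC_START && address < VMALLOC_END))
		return -1;

	pgd = (pgd_t *)__va(read_cr3() & PAGE_MASK) + pgd_index(address);

	if (pgd_none(*pgd_ref))
		return -1;				/* nothing to copy */

	if (pgd_none(*pgd))
		set_pgd(pgd, *pgd_ref);			/* copy the top level */
	else
		BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));

	return 0;
}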

                      Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-17  0:00                                                                                                   ` Linus Torvalds
@ 2014-12-17  0:41                                                                                                     ` Andy Lutomirski
  2014-12-17 17:01                                                                                                       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 486+ messages in thread
From: Andy Lutomirski @ 2014-12-17  0:41 UTC (permalink / raw)
  To: Linus Torvalds, Konrad Rzeszutek Wilk
  Cc: Peter Zijlstra, Mel Gorman, Thomas Gleixner, Steven Rostedt,
	Tejun Heo, linux-kernel, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Don Zickus, Dave Jones,
	the arch/x86 maintainers

On Tue, Dec 16, 2014 at 4:00 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Dec 16, 2014 at 3:02 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> OK, should we just stick it in the x86 tree and see if anything
>> explodes? ;-)
>
> Gaah, I got confused about the patches.
>
> And something did explode, it showed some Xen nasties. Xen has that
> odd "we don't share PMD entries between MM's" thing going on, which
> means that the vmalloc fault thing does actually have to occasionally
> walk two levels rather than just copy the top level. I'm still not
> sure why Xen doesn't share PMD's, since threads that share the MM
> clearly can share PMD's within Xen, but I gave up on it.

Sounds like it's time to ask Konrad, the source of all Xen understanding :)

Linus, do you have a pointer to whatever version of the patch you tried?

--Andy

>
> That said, making x86-64 use "read_cr3()" instead of
> "current->active_mm" would at least make things a bit safer wrt NMI's
> during the task switch, of course.  So *some* 32/64-bit consolidation
> should be done, but my patch went a bit too far for Xen.
>
>                       Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-02 16:33                                         ` Linus Torvalds
  2014-12-02 17:14                                           ` Chris Mason
  2014-12-02 17:47                                           ` Mike Galbraith
@ 2014-12-17 11:13                                           ` Peter Zijlstra
  2 siblings, 0 replies; 486+ messages in thread
From: Peter Zijlstra @ 2014-12-17 11:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Galbraith, Ingo Molnar, Chris Mason, Dâniel Fraga,
	Dave Jones, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List

On Tue, Dec 02, 2014 at 08:33:53AM -0800, Linus Torvalds wrote:
> On Tue, Dec 2, 2014 at 6:13 AM, Mike Galbraith <umgwanakikbuti@gmail.com> wrote:
> >
> > The bean counting problem below can contribute.
> >
> > https://lkml.org/lkml/2014/3/30/7
> 
> Hmm. That never got applied. I didn't apply it originally because of
> timing and wanting clarifications, but apparently it never made it
> into the -tip tree either.
> 
> Ingo, PeterZ - comments?

My comment at the time was:

https://lkml.org/lkml/2014/4/8/295

Of course that debug patch doesn't apply anymore. But we've had so many
fails with that skip_clock_update thing that we need more than yet
another fudge without real means of validating stuff.

I'll go see if I can get that debug infra back up.


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-17  0:41                                                                                                     ` Andy Lutomirski
@ 2014-12-17 17:01                                                                                                       ` Konrad Rzeszutek Wilk
  2014-12-17 17:14                                                                                                         ` Peter Zijlstra
  0 siblings, 1 reply; 486+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-17 17:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Peter Zijlstra, Mel Gorman, Thomas Gleixner,
	Steven Rostedt, Tejun Heo, linux-kernel,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Don Zickus,
	Dave Jones, the arch/x86 maintainers

On Tue, Dec 16, 2014 at 04:41:16PM -0800, Andy Lutomirski wrote:
> On Tue, Dec 16, 2014 at 4:00 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Tue, Dec 16, 2014 at 3:02 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >> OK, should we just stick it in the x86 tree and see if anything
> >> explodes? ;-)
> >
> > Gaah, I got confused about the patches.
> >
> > And something did explode, it showed some Xen nasties. Xen has that
> > odd "we don't share PMD entries between MM's" thing going on, which
> > means that the vmalloc fault thing does actually have to occasionally
> > walk two levels rather than just copy the top level. I'm still not
> > sure why Xen doesn't share PMD's, since threads that share the MM
> > clearly can share PMD's within Xen, but I gave up on it.
> 
> Sounds like it's time to ask Konrad, the source of all Xen understanding :)

Awesome :-)
> 
> Linus, do you have a pointer to whatever version of the patch you tried?

The patch was this:

a) http://article.gmane.org/gmane.linux.kernel/1835331

Then Jurgen had a patch:
https://lkml.kernel.org/g/CA+55aFxSRujj=cM1NkXYvxmo=Y1hb1e3tgLhdh1JDphzV6WKRw@mail.gmail.com
which was one fix for one bug that ended up being fixed in QEMU - so
it can be ignored.

But my understanding of that thread was that patch 'a)' did not fix
Dave's issues - and the conversation went off on an NMI watchdog tangent?

I will look up the giant thread to make sense of it.


> 
> --Andy
> 
> >
> > That said, making x86-64 use "read_cr3()" instead of
> > "current->active_mm" would at least make things a bit safer wrt NMI's
> > during the task switch, of course.  So *some* 32/64-bit consolidation
> > should be done, but my patch went a bit too far for Xen.
> >
> >                       Linus
> 
> 
> 
> -- 
> Andy Lutomirski
> AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-17 17:01                                                                                                       ` Konrad Rzeszutek Wilk
@ 2014-12-17 17:14                                                                                                         ` Peter Zijlstra
  0 siblings, 0 replies; 486+ messages in thread
From: Peter Zijlstra @ 2014-12-17 17:14 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andy Lutomirski, Linus Torvalds, Mel Gorman, Thomas Gleixner,
	Steven Rostedt, Tejun Heo, linux-kernel,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Don Zickus,
	Dave Jones, the arch/x86 maintainers

On Wed, Dec 17, 2014 at 12:01:39PM -0500, Konrad Rzeszutek Wilk wrote:
> > Linus, do you have a pointer to whatever version of the patch you tried?
> 
> The patch was this:
> 
> a) http://article.gmane.org/gmane.linux.kernel/1835331
> 
> Then Jurgen had a patch:
> https://lkml.kernel.org/g/CA+55aFxSRujj=cM1NkXYvxmo=Y1hb1e3tgLhdh1JDphzV6WKRw@mail.gmail.com
> which was one fix for one bug that ended up being fixed in QEMU - so
> it can be ignored.
> 
> But my understanding of that thread was that it said patch 'a)' did not
> fix Dave's issues - and the conversation went off on NMI watchdog?
> 
> I will look up the giant thread to make sense.

No, you're right in that the patch didn't solve the issue at hand and
that the conversation moved on into a different direction.

But I would very much like to see it (or something very much like it)
happen, because it does make the code much saner.

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15  0:38                                                                         ` Linus Torvalds
  2014-12-15  0:42                                                                           ` Dave Jones
  2014-12-15  5:47                                                                           ` Linus Torvalds
@ 2014-12-17 18:22                                                                           ` Dave Jones
  2014-12-17 18:57                                                                             ` Dave Jones
  2014-12-17 19:41                                                                             ` Linus Torvalds
  2 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-17 18:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Sun, Dec 14, 2014 at 04:38:00PM -0800, Linus Torvalds wrote:

 > And I could fairly easily imagine endless page faults due to the
 > exception table, or even endless signal handling loops due to getting
 > a signal while trying to handle a signal. Both things that would
 > actually reasonably result in a watchdog.
 > 
 > So I'm adding some x86 FPU save people to the cc.
 > 
 > Can anybody make sense of that backtrace, keeping in mind that we're
 > looking for some kind of endless loop where we don't make progress?
 > 
 > There's more in the original email (see on lkml if you haven't seen
 > the thread earlier already), but they look similar with that whole
 > do_signal -> save_xstate_sig -> do_page_fault thing just on other
 > CPU's.
 > 
 > DaveJ, do you have the kernel image for this? I'd love to see what the
 > code is around that "save_xstate_sig+0x81" or around those
 > __clear_user+0x17/0x36 points...

Finally back in front of that machine.

Here's save_xstate_sig:

ffffffff8100f370 <save_xstate_sig>:
ffffffff8100f370:       e8 0b 2b 7c 00          callq  ffffffff817d1e80 <__fentry__>
ffffffff8100f375:       55                      push   %rbp
ffffffff8100f376:       48 63 d2                movslq %edx,%rdx
ffffffff8100f379:       48 89 e5                mov    %rsp,%rbp
ffffffff8100f37c:       41 57                   push   %r15
ffffffff8100f37e:       41 56                   push   %r14
ffffffff8100f380:       45 31 f6                xor    %r14d,%r14d
ffffffff8100f383:       41 55                   push   %r13
ffffffff8100f385:       41 54                   push   %r12
ffffffff8100f387:       49 89 fc                mov    %rdi,%r12
ffffffff8100f38a:       53                      push   %rbx
ffffffff8100f38b:       48 89 f3                mov    %rsi,%rbx
ffffffff8100f38e:       48 83 ec 18             sub    $0x18,%rsp
ffffffff8100f392:       65 4c 8b 2c 25 00 aa    mov    %gs:0xaa00,%r13
ffffffff8100f399:       00 00 
ffffffff8100f39b:       48 39 f7                cmp    %rsi,%rdi
ffffffff8100f39e:       4d 8b bd 98 05 00 00    mov    0x598(%r13),%r15
ffffffff8100f3a5:       41 0f 95 c6             setne  %r14b
ffffffff8100f3a9:       65 48 8b 04 25 08 aa    mov    %gs:0xaa08,%rax
ffffffff8100f3b0:       00 00 
ffffffff8100f3b2:       48 01 fa                add    %rdi,%rdx
ffffffff8100f3b5:       48 8b 88 48 c0 ff ff    mov    -0x3fb8(%rax),%rcx
ffffffff8100f3bc:       b8 f3 ff ff ff          mov    $0xfffffff3,%eax
ffffffff8100f3c1:       0f 82 00 01 00 00       jb     ffffffff8100f4c7 <save_xstate_sig+0x157>
ffffffff8100f3c7:       48 39 d1                cmp    %rdx,%rcx
ffffffff8100f3ca:       0f 82 f7 00 00 00       jb     ffffffff8100f4c7 <save_xstate_sig+0x157>
ffffffff8100f3d0:       41 8b 85 94 05 00 00    mov    0x594(%r13),%eax
ffffffff8100f3d7:       85 c0                   test   %eax,%eax
ffffffff8100f3d9:       74 3d                   je     ffffffff8100f418 <save_xstate_sig+0xa8>
ffffffff8100f3db:       e9 00 01 00 00          jmpq   ffffffff8100f4e0 <save_xstate_sig+0x170>
ffffffff8100f3e0:       48 8d bb 00 02 00 00    lea    0x200(%rbx),%rdi
ffffffff8100f3e7:       be 40 00 00 00          mov    $0x40,%esi
ffffffff8100f3ec:       e8 3f 5e 36 00          callq  ffffffff81375230 <__clear_user>
ffffffff8100f3f1:       85 c0                   test   %eax,%eax
ffffffff8100f3f3:       0f 85 81 01 00 00       jne    ffffffff8100f57a <save_xstate_sig+0x20a>
ffffffff8100f3f9:       ba ff ff ff ff          mov    $0xffffffff,%edx
ffffffff8100f3fe:       89 c1                   mov    %eax,%ecx
ffffffff8100f400:       48 89 df                mov    %rbx,%rdi
ffffffff8100f403:       89 d0                   mov    %edx,%eax
ffffffff8100f405:       0f 1f 00                nopl   (%rax)
ffffffff8100f408:       48 0f ae 27             xsave64 (%rdi)
ffffffff8100f40c:       0f 1f 00                nopl   (%rax)
ffffffff8100f40f:       e9 ea 00 00 00          jmpq   ffffffff8100f4fe <save_xstate_sig+0x18e>
ffffffff8100f414:       0f 1f 40 00             nopl   0x0(%rax)
ffffffff8100f418:       e9 33 01 00 00          jmpq   ffffffff8100f550 <save_xstate_sig+0x1e0>
ffffffff8100f41d:       4c 89 ef                mov    %r13,%rdi
ffffffff8100f420:       e8 1b fe ff ff          callq  ffffffff8100f240 <__sanitize_i387_state>
ffffffff8100f425:       8b 05 95 d5 0b 01       mov    0x10bd595(%rip),%eax        # ffffffff820cc9c0 <xstate_size>
ffffffff8100f42b:       89 45 cc                mov    %eax,-0x34(%rbp)
ffffffff8100f42e:       e8 7d 33 19 00          callq  ffffffff811a27b0 <might_fault>
ffffffff8100f433:       48 89 df                mov    %rbx,%rdi
ffffffff8100f436:       4c 89 fe                mov    %r15,%rsi
ffffffff8100f439:       8b 55 cc                mov    -0x34(%rbp),%edx
ffffffff8100f43c:       e8 2f 44 36 00          callq  ffffffff81373870 <copy_user_generic_unrolled>
ffffffff8100f441:       85 c0                   test   %eax,%eax
ffffffff8100f443:       0f 85 27 01 00 00       jne    ffffffff8100f570 <save_xstate_sig+0x200>
ffffffff8100f449:       45 85 f6                test   %r14d,%r14d
ffffffff8100f44c:       0f 85 c4 00 00 00       jne    ffffffff8100f516 <save_xstate_sig+0x1a6>
ffffffff8100f452:       49 c7 c4 60 cd 0c 82    mov    $0xffffffff820ccd60,%r12
ffffffff8100f459:       e8 52 33 19 00          callq  ffffffff811a27b0 <might_fault>
ffffffff8100f45e:       48 8d bb d0 01 00 00    lea    0x1d0(%rbx),%rdi
ffffffff8100f465:       4c 89 e6                mov    %r12,%rsi
ffffffff8100f468:       ba 30 00 00 00          mov    $0x30,%edx
ffffffff8100f46d:       e8 fe 43 36 00          callq  ffffffff81373870 <copy_user_generic_unrolled>
ffffffff8100f472:       41 89 c5                mov    %eax,%r13d
ffffffff8100f475:       41 89 c4                mov    %eax,%r12d
ffffffff8100f478:       e9 bb 00 00 00          jmpq   ffffffff8100f538 <save_xstate_sig+0x1c8>
ffffffff8100f47d:       31 d2                   xor    %edx,%edx
ffffffff8100f47f:       8b 05 3b d5 0b 01       mov    0x10bd53b(%rip),%eax        # ffffffff820cc9c0 <xstate_size>
ffffffff8100f485:       41 89 d4                mov    %edx,%r12d
ffffffff8100f488:       0f 1f 00                nopl   (%rax)
ffffffff8100f48b:       c7 04 03 45 58 50 46    movl   $0x46505845,(%rbx,%rax,1)
ffffffff8100f492:       0f 1f 00                nopl   (%rax)
ffffffff8100f495:       89 d0                   mov    %edx,%eax
ffffffff8100f497:       0f 1f 00                nopl   (%rax)
ffffffff8100f49a:       8b 8b 00 02 00 00       mov    0x200(%rbx),%ecx
ffffffff8100f4a0:       0f 1f 00                nopl   (%rax)
ffffffff8100f4a3:       41 09 c4                or     %eax,%r12d
ffffffff8100f4a6:       83 c9 03                or     $0x3,%ecx
ffffffff8100f4a9:       89 d0                   mov    %edx,%eax
ffffffff8100f4ab:       45 09 ec                or     %r13d,%r12d
ffffffff8100f4ae:       0f 1f 00                nopl   (%rax)
ffffffff8100f4b1:       89 8b 00 02 00 00       mov    %ecx,0x200(%rbx)
ffffffff8100f4b7:       0f 1f 00                nopl   (%rax)
ffffffff8100f4ba:       41 09 c4                or     %eax,%r12d
ffffffff8100f4bd:       31 c0                   xor    %eax,%eax
ffffffff8100f4bf:       45 85 e4                test   %r12d,%r12d
ffffffff8100f4c2:       0f 95 c0                setne  %al
ffffffff8100f4c5:       f7 d8                   neg    %eax
ffffffff8100f4c7:       48 83 c4 18             add    $0x18,%rsp
ffffffff8100f4cb:       5b                      pop    %rbx
ffffffff8100f4cc:       41 5c                   pop    %r12
ffffffff8100f4ce:       41 5d                   pop    %r13
ffffffff8100f4d0:       41 5e                   pop    %r14
ffffffff8100f4d2:       41 5f                   pop    %r15
ffffffff8100f4d4:       5d                      pop    %rbp
ffffffff8100f4d5:       c3                      retq   



and  __clear_user :

ffffffff81375230 <__clear_user>:
ffffffff81375230:       e8 4b cc 45 00          callq  ffffffff817d1e80 <__fentry__>
ffffffff81375235:       55                      push   %rbp
ffffffff81375236:       48 89 e5                mov    %rsp,%rbp
ffffffff81375239:       41 54                   push   %r12
ffffffff8137523b:       49 89 fc                mov    %rdi,%r12
ffffffff8137523e:       53                      push   %rbx
ffffffff8137523f:       48 89 f3                mov    %rsi,%rbx
ffffffff81375242:       e8 69 d5 e2 ff          callq  ffffffff811a27b0 <might_fault>
ffffffff81375247:       0f 1f 00                nopl   (%rax)
ffffffff8137524a:       48 89 d8                mov    %rbx,%rax
ffffffff8137524d:       48 c1 eb 03             shr    $0x3,%rbx
ffffffff81375251:       4c 89 e7                mov    %r12,%rdi
ffffffff81375254:       83 e0 07                and    $0x7,%eax
ffffffff81375257:       48 89 d9                mov    %rbx,%rcx
ffffffff8137525a:       be 08 00 00 00          mov    $0x8,%esi
ffffffff8137525f:       31 d2                   xor    %edx,%edx
ffffffff81375261:       48 85 c9                test   %rcx,%rcx
ffffffff81375264:       74 0a                   je     ffffffff81375270 <__clear_user+0x40>
ffffffff81375266:       48 89 17                mov    %rdx,(%rdi)
ffffffff81375269:       48 01 f7                add    %rsi,%rdi
ffffffff8137526c:       ff c9                   dec    %ecx
ffffffff8137526e:       75 f6                   jne    ffffffff81375266 <__clear_user+0x36>
ffffffff81375270:       48 89 c1                mov    %rax,%rcx
ffffffff81375273:       85 c9                   test   %ecx,%ecx
ffffffff81375275:       74 09                   je     ffffffff81375280 <__clear_user+0x50>
ffffffff81375277:       88 17                   mov    %dl,(%rdi)
ffffffff81375279:       48 ff c7                inc    %rdi
ffffffff8137527c:       ff c9                   dec    %ecx
ffffffff8137527e:       75 f7                   jne    ffffffff81375277 <__clear_user+0x47>
ffffffff81375280:       0f 1f 00                nopl   (%rax)
ffffffff81375283:       5b                      pop    %rbx
ffffffff81375284:       48 89 c8                mov    %rcx,%rax
ffffffff81375287:       41 5c                   pop    %r12
ffffffff81375289:       5d                      pop    %rbp
ffffffff8137528a:       c3                      retq   
ffffffff8137528b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
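
For reference, lining the trace offsets up against the listings above:
save_xstate_sig+0x81 is ffffffff8100f3f1, the instruction immediately
after the callq to __clear_user, and save_xstate_sig+0x98 is
ffffffff8100f408, the xsave64 (%rdi) itself.  __clear_user+0x36 is
ffffffff81375266, the "mov %rdx,(%rdi)" store in the clearing loop.
So the offsets match the guess earlier in the thread.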


If you need more, I've kept the vmlinux handy.  I'm going to try your two patches
on top of .18, with the same kernel config, and see where that takes us.
Hopefully to happier places.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-17 18:22                                                                           ` frequent lockups in 3.18rc4 Dave Jones
@ 2014-12-17 18:57                                                                             ` Dave Jones
  2014-12-17 19:24                                                                               ` Dave Jones
  2014-12-17 19:51                                                                               ` Linus Torvalds
  2014-12-17 19:41                                                                             ` Linus Torvalds
  1 sibling, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-17 18:57 UTC (permalink / raw)
  To: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Wed, Dec 17, 2014 at 01:22:41PM -0500, Dave Jones wrote:
 
 > I'm going to try your two patches on top of .18, with the same kernel
 > config, and see where that takes us.
 > Hopefully to happier places.

Not so much.  Died very quickly.

[  270.822490] BUG: unable to handle kernel paging request at 000000000249db90
[  270.822573] IP: [<000000336ef04084>] 0x336ef04084
[  270.822602] PGD 20e5ee067 PUD 20e5ef067 PMD 23126d067 PTE 94ec80
[  270.822633] Oops: 0006 [#1] SMP 
[  270.822652] Modules linked in: hidp llc2 af_key fuse bnep can_raw scsi_transport_iscsi nfnetlink can_bcm rfcomm nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose sctp libcrc32c x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm coretemp hwmon x86_pkg_temp_thermal kvm_intel snd_timer kvm snd crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw usb_debug shpchp e1000e soundcore ptp pps_core nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[  270.822979] CPU: 3 PID: 9856 Comm: trinity-c93 Not tainted 3.18.0+ #105
[  270.823042] task: ffff8801a45416d0 ti: ffff88020e72c000 task.ti: ffff88020e72c000
[  270.823067] RIP: 0033:[<000000336ef04084>]  [<000000336ef04084>] 0x336ef04084
[  270.823096] RSP: 002b:00007fff9c3304c0  EFLAGS: 00010202
[  270.823117] RAX: 000000336f1b68c0 RBX: 000000000249db90 RCX: 0000000000000000
[  270.823142] RDX: fffffffffffffffe RSI: 00000000fbad8000 RDI: 00007fff9c3304c0
[  270.823168] RBP: 000000000249db90 R08: 0000000000000000 R09: 0000000000002680
[  270.823192] R10: 000000000000001f R11: 0000000000000246 R12: ffffffffffffffff
[  270.823217] R13: 000000000041edb9 R14: 00007fff9c3305f8 R15: 0000000000000001
[  270.823241] FS:  00007f5fd4acf740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[  270.823268] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  270.823288] CR2: 00007f5fd2640113 CR3: 000000020e5ed000 CR4: 00000000001407e0
[  270.823312] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  270.823336] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  270.823360] 
[  270.823370] RIP  [<000000336ef04084>] 0x336ef04084
[  270.823392]  RSP <00007fff9c3304c0>
[  270.824407] CR2: 000000000249db90
[  270.825443] ---[ end trace d6eb8dccb8df6213 ]---
[  270.826448] Kernel panic - not syncing: Fatal exception


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-17 18:57                                                                             ` Dave Jones
@ 2014-12-17 19:24                                                                               ` Dave Jones
  2014-12-17 19:51                                                                               ` Linus Torvalds
  1 sibling, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-17 19:24 UTC (permalink / raw)
  To: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Wed, Dec 17, 2014 at 01:57:55PM -0500, Dave Jones wrote:
 > On Wed, Dec 17, 2014 at 01:22:41PM -0500, Dave Jones wrote:
 >  
 >  > I'm going to try your two patches on top of .18, with the same kernel
 >  > config, and see where that takes us.
 >  > Hopefully to happier places.
 > 
 > Not so much.  Died very quickly.
 > 
 > [  270.822490] BUG: unable to handle kernel paging request at 000000000249db90
 > [  270.822573] IP: [<000000336ef04084>] 0x336ef04084
 > [  270.822602] PGD 20e5ee067 PUD 20e5ef067 PMD 23126d067 PTE 94ec80
 > [  270.822633] Oops: 0006 [#1] SMP 
 > [  270.822652] Modules linked in: hidp llc2 af_key fuse bnep can_raw scsi_transport_iscsi nfnetlink can_bcm rfcomm nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose sctp libcrc32c x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm coretemp hwmon x86_pkg_temp_thermal kvm_intel snd_timer kvm snd crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw usb_debug shpchp e1000e soundcore ptp pps_core nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
 > [  270.822979] CPU: 3 PID: 9856 Comm: trinity-c93 Not tainted 3.18.0+ #105
 > [  270.823042] task: ffff8801a45416d0 ti: ffff88020e72c000 task.ti: ffff88020e72c000
 > [  270.823067] RIP: 0033:[<000000336ef04084>]  [<000000336ef04084>] 0x336ef04084
 > [  270.823096] RSP: 002b:00007fff9c3304c0  EFLAGS: 00010202
 > [  270.823117] RAX: 000000336f1b68c0 RBX: 000000000249db90 RCX: 0000000000000000
 > [  270.823142] RDX: fffffffffffffffe RSI: 00000000fbad8000 RDI: 00007fff9c3304c0
 > [  270.823168] RBP: 000000000249db90 R08: 0000000000000000 R09: 0000000000002680
 > [  270.823192] R10: 000000000000001f R11: 0000000000000246 R12: ffffffffffffffff
 > [  270.823217] R13: 000000000041edb9 R14: 00007fff9c3305f8 R15: 0000000000000001
 > [  270.823241] FS:  00007f5fd4acf740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
 > [  270.823268] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 > [  270.823288] CR2: 00007f5fd2640113 CR3: 000000020e5ed000 CR4: 00000000001407e0
 > [  270.823312] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 > [  270.823336] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 > [  270.823360] 
 > [  270.823370] RIP  [<000000336ef04084>] 0x336ef04084
 > [  270.823392]  RSP <00007fff9c3304c0>
 > [  270.824407] CR2: 000000000249db90
 > [  270.825443] ---[ end trace d6eb8dccb8df6213 ]---
 > [  270.826448] Kernel panic - not syncing: Fatal exception

different flavour of the same thing

[  298.759018] BUG: unable to handle kernel paging request at 00000000016edc30
[  298.759108] IP: [<0000000000412c20>] 0x412c20
[  298.759130] PGD 2315d1067 PUD 2315d2067 PMD 2315d7067 PTE 3c2a880
[  298.759159] Oops: 0004 [#1] SMP 
[  298.759177] Modules linked in: rfcomm hidp bnep llc2 af_key scsi_transport_iscsi nfnetlink can_raw can_bcm nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose sctp libcrc32c x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device usb_debug snd_pcm e1000e ptp snd_timer shpchp snd pps_core soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[  298.759487] CPU: 3 PID: 4568 Comm: trinity-c193 Not tainted 3.18.0+ #105
[  298.759550] task: ffff88018c534470 ti: ffff8801deafc000 task.ti: ffff8801deafc000
[  298.759575] RIP: 0033:[<0000000000412c20>]  [<0000000000412c20>] 0x412c20
[  298.759601] RSP: 002b:00007fff8b5d80c0  EFLAGS: 00010202
[  298.759621] RAX: 00000000016edc20 RBX: 00000000017769f0 RCX: 00000000016edc20
[  298.759645] RDX: 0000000000000003 RSI: 0000000000000003 RDI: 000000336f1b76e0
[  298.759668] RBP: 00000000016edc20 R08: 000000336f1b70fc R09: 000000336f1b7140
[  298.759692] R10: 000000000000001f R11: 0000000000000246 R12: 0000000001776ef0
[  298.759716] R13: 00000000017769f0 R14: 0000000000000000 R15: 0000000000000000
[  298.759740] FS:  00007fbc02017740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[  298.759766] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  298.760773] CR2: 0000000000000004 CR3: 00000002315d0000 CR4: 00000000001407e0
[  298.761788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  298.762802] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[  298.763812] 
[  298.764808] RIP  [<0000000000412c20>] 0x412c20
[  298.765806]  RSP <00007fff8b5d80c0>
[  298.766772] CR2: 00000000016edc30
[  298.767725] ---[ end trace eaa888b859a91308 ]---
[  298.768672] Kernel panic - not syncing: Fatal exception

This seems to be easily reproducible at least..

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-17 18:22                                                                           ` frequent lockups in 3.18rc4 Dave Jones
  2014-12-17 18:57                                                                             ` Dave Jones
@ 2014-12-17 19:41                                                                             ` Linus Torvalds
  1 sibling, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-17 19:41 UTC (permalink / raw)
  To: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Wed, Dec 17, 2014 at 10:22 AM, Dave Jones <davej@redhat.com> wrote:
>
> Here's save_xstate_sig:

Ok, that just confirmed that it was the call to __clear_user and the
"xsave64" instruction, as expected. And the offset in __clear_user()
was just the return address after the call to "might_fault", so this
all matches with __clear_user and/or the xsave64 instruction taking
infinite page-faults.

.. which also kind of matches with your old "pipe/page fault oddness"
report, where it was the single-byte write in
"fault_in_pages_writable()" that kept taking infinite page faults.

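For reference, that probe is tiny.  Here's a rough sketch of what
fault_in_pages_writeable() in include/linux/pagemap.h does (the kernel
spells it "writeable"), reconstructed from memory rather than copied out
of the 3.18 tree: it's basically a one-byte __put_user() per page, so if
that single-byte write keeps faulting, the caller just spins on it
forever.

    static inline int fault_in_pages_writeable(char __user *uaddr, int size)
    {
            int ret;

            if (unlikely(size == 0))
                    return 0;

            /* Writing a zero is fine - the caller will overwrite it anyway */
            ret = __put_user(0, uaddr);
            if (ret == 0) {
                    char __user *end = uaddr + size - 1;

                    /* Touch the last page too if the range crosses a page */
                    if (((unsigned long)uaddr & PAGE_MASK) !=
                        ((unsigned long)end & PAGE_MASK))
                            ret = __put_user(0, end);
            }
            return ret;
    }
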
However.

I'm back looking at your old trace for the pipe/page fault oddness
report, and it goes like this (simplified):

    __do_page_fault() {
      down_read_trylock();
      __might_sleep();
      find_vma();
      handle_mm_fault() {
        _raw_spin_lock();
        ptep_set_access_flags();
        _raw_spin_unlock();
      }
      up_read();
    }

which is a bit hard to read because it doesn't trace all functions -
it only traces the ones that didn't get inlined.

But "didn't get inlined" already means that we know the
handle_mm_fault() codepath didn't go outside of mm/memory.c apart
from the quoted parts. So no hugetlb.c code, and no mm/filemap.c code.
So there are no lock_page_or_retry() failures, for example.

There are only two ptep_set_access_flags() calls in mm/memory.c - in
do_wp_page() for the page re-use case, and in handle_pte_fault() for
the "pte was present and already writable".

And it's almost certainly not the do_wp_page() one, because that is
not something that gcc inlines at least for me (it has two different
call sites).
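
To make that concrete, the handle_pte_fault() case boils down to roughly
the following - a paraphrased sketch, not the literal mm/memory.c code,
and the wrapper name is made up purely for illustration:

    /*
     * The pte is present and already writable, so all the "fix" amounts
     * to is marking it young (and dirty for a write) and flushing a
     * possibly stale TLB entry.  If the same fault comes straight back
     * after this, the page tables and the CPU disagree about that pte.
     * (fixup_present_pte is not a real kernel function, just a name
     * for this sketch.)
     */
    static int fixup_present_pte(struct vm_area_struct *vma,
                                 unsigned long address, pte_t *pte,
                                 pte_t entry, unsigned int flags)
    {
            if (flags & FAULT_FLAG_WRITE)
                    entry = pte_mkdirty(entry);
            entry = pte_mkyoung(entry);

            if (ptep_set_access_flags(vma, address, pte, entry,
                                      flags & FAULT_FLAG_WRITE))
                    update_mmu_cache(vma, address, pte);
            else if (flags & FAULT_FLAG_WRITE)
                    /* pte unchanged: assume a stale TLB entry */
                    flush_tlb_fix_spurious_fault(vma, address);

            return 0;
    }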

So I'm a bit less optimistic about the VM_FAULT_RETRY +
fatal_signal_pending() scenario, because it simply doesn't match your
earlier odd page fault thing.

Your earlier page fault problem really looks like the VM is confused
about the page table contents. The kernel thinks the page is already
perfectly writable, and just marks it accessed and returns. But the
page fault just kept happening. And from that thread, you had
"error-code=0", which means "not present page, write fault".

So that earlier "infinite page fault" bug really smells like something
else went wrong. One of:

 - we're using the wrong "mm". The x86 fault code uses "tsk->mm",
rather than "tsk->active_mm", which is somewhat dubious. At the same
time, they should always match, unless "mm" is NULL. And we know mm
isn't NULL, because __do_page_fault checks for that lack of user
context..

   There are also small areas in the scheduler where the current task
itself is kind of a gray area, and the CPU hasn't been switched to the
new cr3 yet, but those are all irq-off. They don't match your stack
trace anyway.

 - we walk the page table directories without necessarily checking
that they are writable.  So maybe the PTE itself is writable, but the
PMD isn't. We do have read-only PMD's (see pmd_wrprotect). But that
*should* only trigger for hugepage entries. And again, your old error
code really said that the CPU thinks it is "not present".

 - the whole NUMA protnone mess. That's what I suspected back then,
and I keep coming back to it. That's the code that makes the kernel
think that "pte_present is set", even though the CPU sees that the
actual PTE_PRESENT bit is clear.
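
To spell that out, the idea is roughly the following - a simplified
illustration, not the literal arch/x86 code (the helper name is invented
here, and the exact flag names moved around between releases):

    /*
     * The software "present" test answers yes for a protnone/NUMA
     * hinting pte even though the hardware _PAGE_PRESENT bit is clear,
     * so the CPU keeps faulting on an entry the VM thinks is fine.
     * (pte_present_in_sw is just a name for this sketch, not a real
     * kernel function.)
     */
    static inline int pte_present_in_sw(pte_t pte)
    {
            return pte_flags(pte) &
                   (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_NUMA);
    }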

Ugh. I'd have loved for the VM_FAULT_RETRY thing to explain all your
problems, but it doesn't.

                        Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-17 18:57                                                                             ` Dave Jones
  2014-12-17 19:24                                                                               ` Dave Jones
@ 2014-12-17 19:51                                                                               ` Linus Torvalds
  2014-12-17 20:16                                                                                 ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-17 19:51 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

[-- Attachment #1: Type: text/plain, Size: 806 bytes --]

On Wed, Dec 17, 2014 at 10:57 AM, Dave Jones <davej@redhat.com> wrote:
> On Wed, Dec 17, 2014 at 01:22:41PM -0500, Dave Jones wrote:
>
>  > I'm going to try your two patches on top of .18, with the same kernel
>  > config, and see where that takes us.
>  > Hopefully to happier places.
>
> Not so much.  Died very quickly.

Damn, damn, damn. That's because of a stupid typo on the patches. We
have these very similar variables ("flags" and "fault") that have very
similar fault information, but they are completely different.

The "fault & FAULT_FLAG_USER" test is wrong, it should test "flags &
FAULT_FLAG_USER". Patch attached.

The half-way good news is that this certainly confirms that trinity is
triggering the "page fault with fatal signal pending" special case.

                             Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 466 bytes --]

 arch/x86/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index b74a7e130b03..38dcec403b46 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1247,7 +1247,7 @@ good_area:
 		}
 
 		/* User mode? Just return to handle the fatal exception */
-		if (fault & FAULT_FLAG_USER)
+		if (flags & FAULT_FLAG_USER)
 			return;
 
 		/* Not returning to user mode? Handle exceptions or die: */

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-17 19:51                                                                               ` Linus Torvalds
@ 2014-12-17 20:16                                                                                 ` Dave Jones
  0 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-17 20:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Wed, Dec 17, 2014 at 11:51:45AM -0800, Linus Torvalds wrote:
 > On Wed, Dec 17, 2014 at 10:57 AM, Dave Jones <davej@redhat.com> wrote:
 > > On Wed, Dec 17, 2014 at 01:22:41PM -0500, Dave Jones wrote:
 > >
 > >  > I'm going to try your two patches on top of .18, with the same kernel
 > >  > config, and see where that takes us.
 > >  > Hopefully to happier places.
 > >
 > > Not so much.  Died very quickly.
 > 
 > Damn, damn, damn. That's because of a stupid typo on the patches. We
 > have these very similar variables ("flags" and "fault") that have very
 > similar fault information, but they are completely different.
 > 
 > The "fault & FAULT_FLAG_USER" test is wrong, it should test "flags &
 > FAULT_FLAG_USER". Patch attached.

Yup, that seems to be the only obvious bug in those commits.
It's survived so far with that change on top.

 > The half-way good news is that this certainly confirms that trinity is
 > triggering the "page fault with fatal signal pending" special case.

Now let's see how long it runs for..

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15 23:46                                                                                 ` Linus Torvalds
@ 2014-12-18  2:42                                                                                   ` Sasha Levin
  2014-12-18  2:45                                                                                     ` Linus Torvalds
  2014-12-18  5:13                                                                                   ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Sasha Levin @ 2014-12-18  2:42 UTC (permalink / raw)
  To: Linus Torvalds, Dave Jones, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On 12/15/2014 06:46 PM, Linus Torvalds wrote:
> I cleaned up the patch a bit, split it up into two to clarify it, and
> have committed it to my tree. I'm not marking the patches for stable,
> because while I'm convinced it's a bug, I'm also not sure why even if
> it triggers it doesn't eventually recover when the IO completes. So
> I'd mark them for stable only if they are actually confirmed to fix
> anything in the wild, and after they've gotten some testing in
> general. The patches *look* straightforward, they remove more lines
> than they add, and I think the code is more understandable too, but
> maybe I just screwed up. Whatever. Some care is warranted, but this is
> the first time I feel like I actually fixed something that matched at
> least one of your lockup symptoms.
> 
> Anyway, it's there as
> 
>   26178ec11ef3 ("x86: mm: consolidate VM_FAULT_RETRY handling")
>   7fb08eca4527 ("x86: mm: move mmap_sem unlock from mm_fault_error() to caller")

I guess you did "just screwed up"...

I've started seeing this:

[  240.190061] BUG: unable to handle kernel paging request at 00007f341768b000
[  240.190061] IP: [<00007f341baf61fb>] 0x7f341baf61fb
[  240.190061] PGD 12b3e4067 PUD 12b3e5067 PMD 29a700067 PTE 0
[  240.190061] Oops: 0004 [#10] PREEMPT SMP
[  240.190061] Dumping ftrace buffer:
[  240.190061]    (ftrace buffer empty)
[  240.190061] Modules linked in:
[  240.190061] CPU: 6 PID: 9691 Comm: trinity-c619 Tainted: G      D        3.18.0-sasha-08443-g2b40f4a #1618
[  240.190061] task: ffff88012b346000 ti: ffff88012b3d4000 task.ti: ffff88012b3d4000
[  240.190061] RIP: 0033:[<00007f341baf61fb>]  [<00007f341baf61fb>] 0x7f341baf61fb
[  240.190061] RSP: 002b:00007fff39f045f8  EFLAGS: 00010206
[  240.190061] RAX: 00007fff39f04600 RBX: 0000000000000363 RCX: 0000000000000200
[  240.190061] RDX: 0000000000001000 RSI: 00007f341768b000 RDI: 00007fff39f04600
[  240.190061] RBP: 00007fff39f05640 R08: 00007f341bdf20a8 R09: 00007f341bdf2100
[  240.190061] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000000001000
[  240.190061] R13: 0000000000001000 R14: 0000000000362000 R15: 00007fff39f04600
[  240.190061] FS:  00007f341bffb700(0000) GS:ffff8802da400000(0000) knlGS:0000000000000000
[  240.190061] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  240.190061] CR2: 00007f341894801c CR3: 000000012b364000 CR4: 00000000000006a0
[  240.190061] DR0: ffffffff81000000 DR1: 0000000000000000 DR2: 0000000000000000
[  240.190061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000b0602
[  240.190061]
[  240.190061] RIP  [<00007f341baf61fb>] 0x7f341baf61fb
[  240.190061]  RSP <00007fff39f045f8>
[  240.190061] CR2: 00007f341768b000

Which was bisected down to:

	26178ec11ef3 ("x86: mm: consolidate VM_FAULT_RETRY handling")


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-18  2:42                                                                                   ` Sasha Levin
@ 2014-12-18  2:45                                                                                     ` Linus Torvalds
  0 siblings, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-18  2:45 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Wed, Dec 17, 2014 at 6:42 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>
> I guess you did "just screwed up"...

See the email to Dave, pick the fix from there, or from commit
cf3c0a1579ef ("x86: mm: fix VM_FAULT_RETRY handling")

                   Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-15 23:46                                                                                 ` Linus Torvalds
  2014-12-18  2:42                                                                                   ` Sasha Levin
@ 2014-12-18  5:13                                                                                   ` Dave Jones
  2014-12-18 15:54                                                                                     ` Chris Mason
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-18  5:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Mon, Dec 15, 2014 at 03:46:41PM -0800, Linus Torvalds wrote:
 > On Mon, Dec 15, 2014 at 10:21 AM, Linus Torvalds
 > <torvalds@linux-foundation.org> wrote:
 > >
 > > So let's just fix it. Here's a completely untested patch.
 > 
 > So after looking at this more, I'm actually really convinced that this
 > was a pretty nasty bug.
 > 
 > I'm *not* convinced that it's necessarily *your* bug, but I still
 > think it could be.

Bah, I was getting all optimistic.
I came home this evening to a locked up machine.
Serial console had a *lot* more traces than usual though.
Full log below.  We seemed to recover from the 12xxx.xxxxxx traces, which
were followed by silence for a while, before the real fun begins at 157xx.xxxxxx


[12525.310743] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [trinity-c60:28574]
[12525.311429] Modules linked in: bridge 8021q garp stp snd_seq_dummy dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[12525.315877] CPU: 2 PID: 28574 Comm: trinity-c60 Not tainted 3.18.0+ #106
[12525.318136] task: ffff880153a20000 ti: ffff8801c251c000 task.ti: ffff8801c251c000
[12525.318931] RIP: 0010:[<ffffffff81373805>]  [<ffffffff81373805>] copy_user_enhanced_fast_string+0x5/0x10
[12525.319758] RSP: 0018:ffff8801c251fbe0  EFLAGS: 00010206
[12525.320555] RAX: 00007f9c44b07f1d RBX: 0000000000000003 RCX: 00000000000009c0
[12525.321403] RDX: 0000000000001000 RSI: 00007f9c44b0855d RDI: ffff88007cb1f640
[12525.322227] RBP: ffff8801c251fc28 R08: 0000000000000000 R09: 0000000000000001
[12525.323046] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff817cf22b
[12525.323854] R13: ffff8801c251fb68 R14: ffff880061dccd40 R15: ffff880061dcce60
[12525.324676] FS:  00007f9c4714c740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[12525.325468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12525.326262] CR2: 00007f9c45b905b8 CR3: 000000022a30b000 CR4: 00000000001407e0
[12525.327065] DR0: 00007f6c9030b000 DR1: 0000000000000000 DR2: 0000000000000000
[12525.327867] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[12525.328666] Stack:
[12525.329459]  ffffffff8119fe56 ffff8801c251fbf8 140000010718f000 0000000000001000
[12525.330283]  140000010718f000 0000000000001000 ffff8801c251fd60 0000000000000000
[12525.331139]  ffff880061dcd070 ffff8801c251fcc8 ffffffff81175fe7 ffff8801c251fc88
[12525.331969] Call Trace:
[12525.332801]  [<ffffffff8119fe56>] ? iov_iter_copy_from_user_atomic+0x156/0x180
[12525.333642]  [<ffffffff81175fe7>] generic_perform_write+0xf7/0x1f0
[12525.334475]  [<ffffffff81178722>] __generic_file_write_iter+0x162/0x350
[12525.335315]  [<ffffffff811e7940>] ? new_sync_read+0xd0/0xd0
[12525.336149]  [<ffffffff8117894f>] generic_file_write_iter+0x3f/0xb0
[12525.336981]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12525.337809]  [<ffffffff811e7a88>] do_iter_readv_writev+0x78/0xc0
[12525.338630]  [<ffffffff811e92b8>] do_readv_writev+0xd8/0x2a0
[12525.339435]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12525.340238]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12525.341068]  [<ffffffff810c361f>] ? lock_release_holdtime.part.24+0xf/0x190
[12525.341865]  [<ffffffff817cf310>] ? _raw_spin_unlock_irq+0x30/0x40
[12525.342664]  [<ffffffff811e950c>] vfs_writev+0x3c/0x50
[12525.343486]  [<ffffffff811e98d2>] SyS_pwritev+0xc2/0xf0
[12525.344292]  [<ffffffff817cff12>] system_call_fastpath+0x12/0x17
[12525.345099] Code: 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f 1f 00 89 d1 <f3> a4 31 c0 0f 1f 00 c3 90 90 90 0f 1f 00 83 fa 08 0f 82 95 00 
[12525.346820] sending NMI to other CPUs:
[12525.347621] NMI backtrace for cpu 3
[12525.348409] CPU: 3 PID: 25560 Comm: trinity-c201 Not tainted 3.18.0+ #106
[12525.350733] task: ffff880095342da0 ti: ffff8801c048c000 task.ti: ffff8801c048c000
[12525.351503] RIP: 0010:[<ffffffff810fbc4e>]  [<ffffffff810fbc4e>] generic_exec_single+0xee/0x1b0
[12525.352270] RSP: 0018:ffff8801c048f9f8  EFLAGS: 00000202
[12525.353017] RAX: 0000000000000008 RBX: ffff8801c048fa10 RCX: 0000000000000038
[12525.353757] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[12525.354479] RBP: ffff8801c048fa58 R08: ffff8802437418f0 R09: 0000000000000000
[12525.355187] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[12525.355878] R13: 0000000000000001 R14: ffff880061e34fc0 R15: 0000000000000003
[12525.356558] FS:  00007f9c4714c740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[12525.357236] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12525.357905] CR2: 0000000001ac9498 CR3: 00000002247cb000 CR4: 00000000001407e0
[12525.358577] DR0: 00007f6c9030b000 DR1: 0000000000000000 DR2: 0000000000000000
[12525.359246] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[12525.359911] Stack:
[12525.360568]  0000000000000000 0000000000000000 ffff8801c048fa78 0000000000000000
[12525.361250]  ffffffff81048cd0 ffff8801c048fb08 0000000000000003 00000000639bd7ce
[12525.361928]  ffff880095342da0 00000000ffffffff 0000000000000002 ffffffff81048cd0
[12525.362601] Call Trace:
[12525.363259]  [<ffffffff81048cd0>] ? do_flush_tlb_all+0x60/0x60
[12525.363926]  [<ffffffff81048cd0>] ? do_flush_tlb_all+0x60/0x60
[12525.364589]  [<ffffffff810fbdb0>] smp_call_function_single+0x70/0xd0
[12525.365250]  [<ffffffff81048cd0>] ? do_flush_tlb_all+0x60/0x60
[12525.365904]  [<ffffffff810fc4a9>] smp_call_function_many+0x2b9/0x320
[12525.366556]  [<ffffffff811a396d>] ? unmap_single_vma+0x50d/0x900
[12525.367204]  [<ffffffff81049020>] flush_tlb_mm_range+0x90/0x1d0
[12525.367843]  [<ffffffff811a25b2>] tlb_flush_mmu_tlbonly+0x42/0x50
[12525.368476]  [<ffffffff811a2bac>] tlb_flush_mmu+0x1c/0x30
[12525.369102]  [<ffffffff811a2bd4>] tlb_finish_mmu+0x14/0x40
[12525.369726]  [<ffffffff811a3e78>] zap_page_range_single+0x118/0x160
[12525.370348]  [<ffffffff811a4044>] unmap_mapping_range+0x134/0x190
[12525.370971]  [<ffffffff81192a5d>] shmem_fallocate+0x4fd/0x520
[12525.371591]  [<ffffffff810bcb77>] ? prepare_to_wait+0x27/0x90
[12525.372210]  [<ffffffff811e6062>] do_fallocate+0x132/0x1d0
[12525.372825]  [<ffffffff811b8558>] SyS_madvise+0x398/0x870
[12525.373439]  [<ffffffff817cff12>] system_call_fastpath+0x12/0x17
[12525.374061] Code: 48 89 de 48 03 14 c5 a0 f2 d1 81 48 89 df e8 fa e0 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 0f 85 98 00 
[12525.375458] NMI backtrace for cpu 0
[12525.375565] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 27.836 msecs
[12525.376788] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0+ #106
[12525.378827] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[12525.379514] RIP: 0010:[<ffffffff813d6cab>]  [<ffffffff813d6cab>] intel_idle+0xdb/0x180
[12525.380215] RSP: 0018:ffffffff81c03e38  EFLAGS: 00000046
[12525.380905] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[12525.381588] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[12525.382254] RBP: ffffffff81c03e68 R08: 000000008baf8a0b R09: 0000000000000000
[12525.382911] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[12525.383558] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[12525.384194] FS:  0000000000000000(0000) GS:ffff880244e00000(0000) knlGS:00000000000000[12525.398359] NMI backtrace for cpu 1
[12525.398368] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 50.720 msecs
[12525.399764] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.18.0+ #106
[12525.401932] task: ffff8802434b96d0 ti: ffff880243748000 task.ti: ffff880243748000
[12525.402687] RIP: 0010:[<ffffffff813d6cab>]  [<ffffffff813d6cab>] intel_idle+0xdb/0x180
[12525.403457] RSP: 0018:ffff88024374be08  EFLAGS: 00000046
[12525.404241] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[12525.405038] RDX: 0000000000000000 RSI: ffff88024374bfd8 RDI: 0000000000000001
[12525.405832] RBP: ffff88024374be38 R08: 000000008baf8a0b R09: 0000000000000000
[12525.406611] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[12525.407377] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880243748000
[12525.408131] FS:  0000000000000000(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[12525.408881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12525.409601] CR2: 00000000000000d2 CR3: 0000000001c11000 CR4: 00000000001407e0
[12525.410310] DR0: 00007f6c9030b000 DR1: 0000000000000000 DR2: 0000000000000000
[12525.411011] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[12525.411694] Stack:
[12525.412363]  000000014374be38 b532cae23f43ed20 ffffe8ffff003200 0000000000000005
[12525.413051]  ffffffff81cb1bc0 0000000000000001 ffff88024374be88 ffffffff8165f7b5
[12525.413744]  00000b65f893e397 ffffffff81cb1d90 ffffffff81cb1bc0 ffffffff81d215f0
[12525.414439] Call Trace:
[12525.415125]  [<ffffffff8165f7b5>] cpuidle_enter_state+0x55/0x190
[12525.415824]  [<ffffffff8165f9a7>] cpuidle_enter+0x17/0x20
[12525.416523]  [<ffffffff810bd665>] cpu_startup_entry+0x355/0x410
[12525.417216]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[12525.417904] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[12525.419435] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 71.790 msecs
[12549.296847] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [trinity-c60:28574]
[12549.297585] Modules linked in: bridge 8021q garp stp snd_seq_dummy dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[12549.302426] CPU: 2 PID: 28574 Comm: trinity-c60 Tainted: G             L 3.18.0+ #106
[12549.304874] task: ffff880153a20000 ti: ffff8801c251c000 task.ti: ffff8801c251c000
[12549.305730] RIP: 0010:[<ffffffff81373805>]  [<ffffffff81373805>] copy_user_enhanced_fast_string+0x5/0x10
[12549.306609] RSP: 0018:ffff8801c251fbe0  EFLAGS: 00010206
[12549.307502] RAX: 00007f9c44b07f1d RBX: 0000000000000003 RCX: 00000000000009c0
[12549.308400] RDX: 0000000000001000 RSI: 00007f9c44b0855d RDI: ffff88007cb1f640
[12549.309284] RBP: ffff8801c251fc28 R08: 0000000000000000 R09: 0000000000000001
[12549.310159] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff817cf22b
[12549.311018] R13: ffff8801c251fb68 R14: ffff880061dccd40 R15: ffff880061dcce60
[12549.311862] FS:  00007f9c4714c740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[12549.312716] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12549.313581] CR2: 00007f9c45b905b8 CR3: 000000022a30b000 CR4: 00000000001407e0
[12549.314418] DR0: 00007f6c9030b000 DR1: 0000000000000000 DR2: 0000000000000000
[12549.315239] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[12549.316051] Stack:
[12549.316861]  ffffffff8119fe56 ffff8801c251fbf8 140000010718f000 0000000000001000
[12549.317679]  140000010718f000 0000000000001000 ffff8801c251fd60 0000000000000000
[12549.318514]  ffff880061dcd070 ffff8801c251fcc8 ffffffff81175fe7 ffff8801c251fc88
[12549.319338] Call Trace:
[12549.320153]  [<ffffffff8119fe56>] ? iov_iter_copy_from_user_atomic+0x156/0x180
[12549.320988]  [<ffffffff81175fe7>] generic_perform_write+0xf7/0x1f0
[12549.321824]  [<ffffffff81178722>] __generic_file_write_iter+0x162/0x350
[12549.322658]  [<ffffffff811e7940>] ? new_sync_read+0xd0/0xd0
[12549.323484]  [<ffffffff8117894f>] generic_file_write_iter+0x3f/0xb0
[12549.324306]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12549.325133]  [<ffffffff811e7a88>] do_iter_readv_writev+0x78/0xc0
[12549.325961]  [<ffffffff811e92b8>] do_readv_writev+0xd8/0x2a0
[12549.326783]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12549.327646]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12549.328470]  [<ffffffff810c361f>] ? lock_release_holdtime.part.24+0xf/0x190
[12549.329279]  [<ffffffff817cf310>] ? _raw_spin_unlock_irq+0x30/0x40
[12549.330078]  [<ffffffff811e950c>] vfs_writev+0x3c/0x50
[12549.330874]  [<ffffffff811e98d2>] SyS_pwritev+0xc2/0xf0
[12549.331672]  [<ffffffff817cff12>] system_call_fastpath+0x12/0x17
[12549.332471] Code: 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f 1f 00 89 d1 <f3> a4 31 c0 0f 1f 00 c3 90 90 90 0f 1f 00 83 fa 08 0f 82 95 00 
[12549.334222] sending NMI to other CPUs:
[12549.335023] NMI backtrace for cpu 3
[12549.335811] CPU: 3 PID: 25560 Comm: trinity-c201 Tainted: G             L 3.18.0+ #106
[12549.338137] task: ffff880095342da0 ti: ffff8801c048c000 task.ti: ffff8801c048c000
[12549.338904] RIP: 0010:[<ffffffff810fbc4e>]  [<ffffffff810fbc4e>] generic_exec_single+0xee/0x1b0
[12549.339670] RSP: 0018:ffff8801c048f9f8  EFLAGS: 00000202
[12549.340416] RAX: 0000000000000008 RBX: ffff8801c048fa10 RCX: 0000000000000038
[12549.341157] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[12549.341879] RBP: ffff8801c048fa58 R08: ffff8802437418f0 R09: 0000000000000000
[12549.342587] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[12549.343280] R13: 0000000000000001 R14: ffff880061e34fc0 R15: 0000000000000003
[12549.343961] FS:  00007f9c4714c740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[12549.344639] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12549.345309] CR2: 0000000001ac9498 CR3: 00000002247cb000 CR4: 00000000001407e0
[12549.345981] DR0: 00007f6c9030b000 DR1: 0000000000000000 DR2: 0000000000000000
[12549.346649] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[12549.347313] Stack:
[12549.347969]  0000000000000000 0000000000000000 ffff8801c048fa78 0000000000000000
[12549.348648]  ffffffff81048cd0 ffff8801c048fb08 0000000000000003 00000000639bd7ce
[12549.349325]  ffff880095342da0 00000000ffffffff 0000000000000002 ffffffff81048cd0
[12549.349997] Call Trace:
[12549.350653]  [<ffffffff81048cd0>] ? do_flush_tlb_all+0x60/0x60
[12549.351319]  [<ffffffff81048cd0>] ? do_flush_tlb_all+0x60/0x60
[12549.351983]  [<ffffffff810fbdb0>] smp_call_function_single+0x70/0xd0
[12549.352644]  [<ffffffff81048cd0>] ? do_flush_tlb_all+0x60/0x60
[12549.353298]  [<ffffffff810fc4a9>] smp_call_function_many+0x2b9/0x320
[12549.353954]  [<ffffffff811a396d>] ? unmap_single_vma+0x50d/0x900
[12549.354604]  [<ffffffff81049020>] flush_tlb_mm_range+0x90/0x1d0
[12549.355245]  [<ffffffff811a25b2>] tlb_flush_mmu_tlbonly+0x42/0x50
[12549.355881]  [<ffffffff811a2bac>] tlb_flush_mmu+0x1c/0x30
[12549.356508]  [<ffffffff811a2bd4>] tlb_finish_mmu+0x14/0x40
[12549.357131]  [<ffffffff811a3e78>] zap_page_range_single+0x118/0x160
[12549.357756]  [<ffffffff811a4044>] unmap_mapping_range+0x134/0x190
[12549.358379]  [<ffffffff81192a5d>] shmem_fallocate+0x4fd/0x520
[12549.359001]  [<ffffffff810bcb77>] ? prepare_to_wait+0x27/0x90
[12549.359620]  [<ffffffff811e6062>] do_fallocate+0x132/0x1d0
[12549.360239]  [<ffffffff811b8558>] SyS_madvise+0x398/0x870
[12549.360855]  [<ffffffff817cff12>] system_call_fastpath+0x12/0x17
[12549.361476] Code: 48 89 de 48 03 14 c5 a0 f2 d1 81 48 89 df e8 fa e0 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 0f 85 98 00 
[12549.362876] NMI backtrace for cpu 1
[12549.363530] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G             L 3.18.0+ #106
[12549.365548] task: ffff8802434b96d0 ti: ffff880243748000 task.ti: ffff880243748000
[12549.366222] RIP: 0010:[<ffffffff813d6cab>]  [<ffffffff813d6cab>] intel_idle+0xdb/0x180
[12549.366902] RSP: 0018:ffff88024374be08  EFLAGS: 00000046
[12549.367578] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[12549.368258] RDX: 0000000000000000 RSI: ffff88024374bfd8 RDI: 0000000000000001
[12549.368920] RBP: ffff88024374be38 R08: 000000008baf89f8 R09: 0000000000000000
[12549.369565] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[12549.370201] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880243748000
[12549.370828] FS:  0000000000000000(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[12549.371451] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12549.372069][12563.088868] INFO: rcu_sched self-detected stall on CPU
[12563.089596] 	2: (1 GPs behind) idle=d4b/140000000000001/0 softirq=781780/781781 
[12563.090304] 	 (t=6000 jiffies g=382738 c=382737 q=0)
[12563.090996] Task dump for CPU 2:
[12563.091676] trinity-c60     R  running task    12768 28574  19163 0x0000000c
[12563.092378]  ffff880153a20000 000000005c7b7872 ffff880245203d68 ffffffff810a743c
[12563.093085]  ffffffff810a73a2 0000000000000002 0000000000000004 0000000000000002
[12563.093810]  ffffffff81c52380 0000000000000092 ffff880245203d88 ffffffff810ab4ed
[12563.094507] Call Trace:
[12563.095197]  <IRQ>  [<ffffffff810a743c>] sched_show_task+0x11c/0x190
[12563.095910]  [<ffffffff810a73a2>] ? sched_show_task+0x82/0x190
[12563.096636]  [<ffffffff810ab4ed>] dump_cpu_task+0x3d/0x50
[12563.097336]  [<ffffffff810d9b90>] rcu_dump_cpu_stacks+0x90/0xd0
[12563.098019]  [<ffffffff810e0723>] rcu_check_callbacks+0x503/0x770
[12563.098712]  [<ffffffff8113058c>] ? acct_account_cputime+0x1c/0x20
[12563.099408]  [<ffffffff810abe27>] ? account_system_time+0x97/0x180
[12563.100022]  [<ffffffff810e63fb>] update_process_times+0x4b/0x80
[12563.100623]  [<ffffffff810f6db3>] ? tick_sched_timer+0x23/0x1b0
[12563.101214]  [<ffffffff810f6ddf>] tick_sched_timer+0x4f/0x1b0
[12563.101795]  [<ffffffff810e729f>] __run_hrtimer+0xaf/0x240
[12563.102370]  [<ffffffff810e75ab>] ? hrtimer_interrupt+0x8b/0x260
[12563.102941]  [<ffffffff810f6d90>] ? tick_init_highres+0x20/0x20
[12563.103514]  [<ffffffff810e7627>] hrtimer_interrupt+0x107/0x260
[12563.104086]  [<ffffffff81031e9b>] local_apic_timer_interrupt+0x3b/0x70
[12563.104660]  [<ffffffff817d29c5>] smp_apic_timer_interrupt+0x45/0x60
[12563.105234]  [<ffffffff817d0daf>] apic_timer_interrupt+0x6f/0x80
[12563.105805]  <EOI>  [<ffffffff81373805>] ? copy_user_enhanced_fast_string+0x5/0x10
[12563.106387]  [<ffffffff8119fe56>] ? iov_iter_copy_from_user_atomic+0x156/0x180
[12563.106965]  [<ffffffff81175fe7>] generic_perform_write+0xf7/0x1f0
[12563.107545]  [<ffffffff81178722>] __generic_file_write_iter+0x162/0x350
[12563.108128]  [<ffffffff811e7940>] ? new_sync_read+0xd0/0xd0
[12563.108705]  [<ffffffff8117894f>] generic_file_write_iter+0x3f/0xb0
[12563.109299]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12563.109869]  [<ffffffff811e7a88>] do_iter_readv_writev+0x78/0xc0
[12563.110440]  [<ffffffff811e92b8>] do_readv_writev+0xd8/0x2a0
[12563.111003]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12563.111566]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12563.112125]  [<ffffffff810c361f>] ? lock_release_holdtime.part.24+0xf/0x190
[12563.112686]  [<ffffffff817cf310>] ? _raw_spin_unlock_irq+0x30/0x40
[12563.113248]  [<ffffffff811e950c>] vfs_writev+0x3c/0x50
[12563.113810]  [<ffffffff811e98d2>] SyS_pwritev+0xc2/0xf0
[12563.114373]  [<ffffffff817cff12>] system_call_fastpath+0x12/0x17
[12577.290614] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c201:25560]
[12577.291259] Modules linked in: bridge 8021q garp stp snd_seq_dummy dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[12577.295639] CPU: 3 PID: 25560 Comm: trinity-c201 Tainted: G             L 3.18.0+ #106
[12577.297656] task: ffff880095342da0 ti: ffff8801c048c000 task.ti: ffff8801c048c000
[12577.298367] RIP: 0010:[<ffffffff810fbc4a>]  [<ffffffff810fbc4a>] generic_exec_single+0xea/0x1b0
[12577.299091] RSP: 0018:ffff8801c048f9f8  EFLAGS: 00000202
[12577.299825] RAX: 0000000000000008 RBX: ffffffff817d0ae0 RCX: 0000000000000038
[12577.300553] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[12577.301296] RBP: ffff8801c048fa58 R08: ffff8802437418f0 R09: 0000000000000000
[12577.302007] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801c048f968
[12577.302726] R13: 0000000000001a4a R14: ffff8801c048c000 R15: ffff880095342da0
[12577.303444] FS:  00007f9c4714c740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[12577.304169] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12577.304897] CR2: 0000000001ac9498 CR3: 00000002247cb000 CR4: 00000000001407e0
[12577.305636] DR0: 00007f6c9030b000 DR1: 0000000000000000 DR2: 0000000000000000
[12577.306386] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[12577.307114] Stack:
[12577.307830]  0000000000000000 0000000000000000 ffff8801c048fa78 0000000000000000
[12577.308585]  ffffffff81048cd0 ffff8801c048fb08 0000000000000003 00000000639bd7ce
[12577.309355]  ffff880095342da0 00000000ffffffff 0000000000000002 ffffffff81048cd0
[12577.310106] Call Trace:
[12577.310898]  [<ffffffff81048cd0>] ? do_flush_tlb_all+0x60/0x60
[12577.311664]  [<ffffffff81048cd0>] ? do_flush_tlb_all+0x60/0x60
[12577.312440]  [<ffffffff810fbdb0>] smp_call_function_single+0x70/0xd0
[12577.313196]  [<ffffffff81048cd0>] ? do_flush_tlb_all+0x60/0x60
[12577.313935]  [<ffffffff810fc4a9>] smp_call_function_many+0x2b9/0x320
[12577.314684]  [<ffffffff811a396d>] ? unmap_single_vma+0x50d/0x900
[12577.315429]  [<ffffffff81049020>] flush_tlb_mm_range+0x90/0x1d0
[12577.316182]  [<ffffffff811a25b2>] tlb_flush_mmu_tlbonly+0x42/0x50
[12577.316913]  [<ffffffff811a2bac>] tlb_flush_mmu+0x1c/0x30
[12577.317664]  [<ffffffff811a2bd4>] tlb_finish_mmu+0x14/0x40
[12577.318399]  [<ffffffff811a3e78>] zap_page_range_single+0x118/0x160
[12577.319123]  [<ffffffff811a4044>] unmap_mapping_range+0x134/0x190
[12577.319871]  [<ffffffff81192a5d>] shmem_fallocate+0x4fd/0x520
[12577.320624]  [<ffffffff810bcb77>] ? prepare_to_wait+0x27/0x90
[12577.321411]  [<ffffffff811e6062>] do_fallocate+0x132/0x1d0
[12577.322149]  [<ffffffff811b8558>] SyS_madvise+0x398/0x870
[12577.322868]  [<ffffffff817cff12>] system_call_fastpath+0x12/0x17
[12577.323617] Code: c0 3a 1d 00 48 89 de 48 03 14 c5 a0 f2 d1 81 48 89 df e8 fa e0 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 
[12577.325246] sending NMI to other CPUs:
[12577.325989] NMI backtrace for cpu 2
[12577.326717] CPU: 2 PID: 28574 Comm: trinity-c60 Tainted: G             L 3.18.0+ #106
[12577.328845] task: ffff880153a20000 ti: ffff8801c251c000 task.ti: ffff8801c251c000
[12577.329548] RIP: 0010:[<ffffffff810c523d>]  [<ffffffff810c523d>] __lock_acquire.isra.31+0x33d/0x9f0
[12577.330250] RSP: 0018:ffff880245203d88  EFLAGS: 00000097
[12577.330933] RAX: 0000000000000000 RBX: ffff880153a20000 RCX: ffff8802453cff98
[12577.331610] RDX: 0000000000000980 RSI: 0000000000000010 RDI: 0000000000000000
[12577.332272] RBP: ffff880245203df8 R08: 0000000000000001 R09: 0000000000000000
[12577.332924] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000097
[12577.333564] R13: 0000000000000002 R14: ffff8802453cc898 R15: ffff880153a207e0
[12577.334195] FS:  00007f9c4714c740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[12577.334831] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12577.335469] CR2: 00007f9c45b905b8 CR3: 000000022a30b000 CR4: 00000000001407e0
[12577.336116] DR0: 00007f6c9030b000 DR1: 0000000000000000 DR2: 0000000000000000
[12577.336759] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[12577.337399] Stack:
[12577.338024]  ffff880153a20000 0000000000000046 ffff880245203dd8 0000000000000092
[12577.338671]  ffff8802453d2f18 ffff8802453d2f00 0000000000000002 0000000000000000
[12577.339313]  ffff880245203dc8 0000000000000046 0000000000000000 0000000000000000
[12577.339955] Call Trace:
[12577.340585]  <IRQ> 
[12577.340593]  [<ffffffff810c5fff>] lock_acquire+0x9f/0x120
[12577.341830]  [<ffffffff810e72c6>] ? __run_hrtimer+0xd6/0x240
[12577.342450]  [<ffffffff810f6db3>] ? tick_sched_timer+0x23/0x1b0
[12577.343065]  [<ffffffff817cee9e>] _raw_spin_lock+0x3e/0x80
[12577.343673]  [<ffffffff810e72c6>] ? __run_hrtimer+0xd6/0x240
[12577.344277]  [<ffffffff810e72c6>] __run_hrtimer+0xd6/0x240
[12577.344886]  [<ffffffff810e75ab>] ? hrtimer_interrupt+0x8b/0x260
[12577.345493]  [<ffffffff810f6d90>] ? tick_init_highres+0x20/0x20
[12577.346099]  [<ffffffff810e7627>] hrtimer_interrupt+0x107/0x260
[12577.346703]  [<ffffffff81031e9b>] local_apic_timer_interrupt+0x3b/0x70
[12577.347309]  [<ffffffff817d29c5>] smp_apic_timer_interrupt+0x45/0x60
[12577.347916]  [<ffffffff817d0daf>] apic_timer_interrupt+0x6f/0x80
[12577.348522]  <EOI> 
[12577.348529]  [<ffffffff81373805>] ? copy_user_enhanced_fast_string+0x5/0x10
[12577.349750]  [<ffffffff8119fe56>] ? iov_iter_copy_from_user_atomic+0x156/0x180
[12577.350375]  [<ffffffff81175fe7>] generic_perform_write+0xf7/0x1f0
[12577.351001]  [<ffffffff81178722>] __generic_file_write_iter+0x162/0x350
[12577.351626]  [<ffffffff811e7940>] ? new_sync_read+0xd0/0xd0
[12577.352253]  [<ffffffff8117894f>] generic_file_write_iter+0x3f/0xb0
[12577.352883]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12577.353515]  [<ffffffff811e7a88>] do_iter_readv_writev+0x78/0xc0
[12577.354147]  [<ffffffff811e92b8>] do_readv_writev+0xd8/0x2a0
[12577.354763]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12577.355366]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12577.355956]  [<ffffffff810c361f>] ? lock_release_holdtime.part.24+0xf/0x190
[12577.356550]  [<ffffffff817cf310>] ? _raw_spin_unlock_irq+0x30/0x40
[12577.357141]  [<ffffffff811e950c>] vfs_writev+0x3c/0x50
[12577.357713]  [<ffffffff811e98d2>] SyS_pwritev+0xc2/0xf0
[12577.358267]  [<ffffffff817cff12>] system_call_fastpath+0x12/0x17
[12577.358815] Code: 00 00 48 83 c4 48 44 89 e8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 2e 0f 1f 84 00 00 00 00 00 41 81 fc fe 1f 00 00 0f 87 34 04 00 00 <45> 85 ed 48 8b 83 60 07 00 00 75 09 48 85 c0 0f 85 df 03 00 00 
[12577.360038] NMI backtrace for cpu 0
[12577.360598] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #106
[12577.381697] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G             L 3.18.0+ #106
[12577.383862] task: ffff8802434b96d0 ti: ffff880243748000 task.ti: ffff880243748000
[12577.384585] RIP: 0010:[<ffffffff813d6cab>]  [<ffffffff813d6cab>] intel_idle+0xdb/0x180
[12577.385332] RSP: 0018:ffff88024374be08  EFLAGS: 00000046
[12577.386066] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[12577.386807] RDX: 0000000000000000 RSI: ffff88024374bfd8 RDI: 0000000000000001
[12577.387545] RBP: ffff88024374be38 R08: 000000008baf89e1 R09: 0000000000000000
[12577.388285] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[12577.389028] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880243748000
[12577.389752] FS:  0000000000000000(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[12577.390463] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12577.391170] CR2: 00007f5271c07000 CR3: 0000000001c11000 CR4: 00000000001407e0
[12577.391888] DR0: 00007f6c9030b000 DR1: 0000000000000000 DR2: 0000000000000000
[12577.392584] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[12577.393258] Stack:
[12577.393915]  000000014374be38 b532cae23f43ed20 ffffe8ffff003200 0000000000000005
[12577.394595]  ffffffff81cb1bc0 0000000000000001 ffff88024374be88 ffffffff8165f7b5
[12577.395278]  00000b72149ea4e1 ffffffff81cb1d90 ffffffff81cb1bc0 ffffffff81d215f0
[12577.395972] Call Trace:
[12577.396643]  [<ffffffff8165f7b5>] cpuidle_enter_state+0x55/0x190
[12577.397327]  [<ffffffff8165f9a7>] cpuidle_enter+0x17/0x20
[12577.398010]  [<ffffffff810bd665>] cpu_startup_entry+0x355/0x410
[12577.398691]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[12577.399364] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[12577.400864] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 74.851 msecs
[12591.532403] sched: RT throttling activated
[12613.269774] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 21s! [trinity-c12:30629]
[12613.270528] Modules linked in: bridge 8021q garp stp snd_seq_dummy dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[12613.275557] CPU: 3 PID: 30629 Comm: trinity-c12 Tainted: G             L 3.18.0+ #106
[12613.278212] task: ffff8802433b5b40 ti: ffff880180f34000 task.ti: ffff880180f34000
[12613.279152] RIP: 0010:[<ffffffff8117e2df>]  [<ffffffff8117e2df>] free_hot_cold_page+0x11f/0x1a0
[12613.280117] RSP: 0018:ffff880180f37b38  EFLAGS: 00000202
[12613.281030] RAX: 0000000000000003 RBX: 0000000000000002 RCX: 0000000000000240
[12613.281939] RDX: ffff88024540d380 RSI: 0000000000000000 RDI: ffff88024e5d3f40
[12613.282817] RBP: ffff880180f37b78 R08: 0000000000000000 R09: 000000000024e600
[12613.283701] R10: 0000000000000002 R11: 0000000000000000 R12: ffff88024e5d3f40
[12613.284572] R13: 0000000180f37b08 R14: ffffffff8117df5b R15: ffff880180f37b28
[12613.285454] FS:  00007f688fc33740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[12613.286408] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12613.287338] CR2: 00007f688baa9000 CR3: 00000002276f9000 CR4: 00000000001407e0
[12613.288228] DR0: 00007f688e6a9000 DR1: 0000000000000000 DR2: 0000000000000000
[12613.289102] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
[12613.289957] Stack:
[12613.290804]  00000000001ddf58 ffff88024e5d3e80 ffff8802433b5b40 ffffea00076a8540
[12613.291675]  ffffea000777d600 0000000000000000 ffffea00062ec580 ffff880180f37c00
[12613.292574]  ffff880180f37bb8 ffffffff8117e3c5 000000000000000d ffffea00076a8540
[12613.293471] Call Trace:
[12613.294325]  [<ffffffff8117e3c5>] free_hot_cold_page_list+0x65/0xd0
[12613.295179]  [<ffffffff81184f2d>] release_pages+0x1bd/0x270
[12613.296027]  [<ffffffff81185f63>] __pagevec_release+0x43/0x60
[12613.296869]  [<ffffffff81191f80>] shmem_undo_range+0x460/0x710
[12613.297704]  [<ffffffff81192248>] shmem_truncate_range+0x18/0x40
[12613.298541]  [<ffffffff811924d6>] shmem_setattr+0x116/0x1a0
[12613.299383]  [<ffffffff812075d1>] notify_change+0x241/0x390
[12613.300250]  [<ffffffff811e5ad5>] do_truncate+0x75/0xc0
[12613.301103]  [<ffffffff811e5e4a>] ? do_sys_ftruncate.constprop.14+0xda/0x160
[12613.301962]  [<ffffffff811e5e7f>] do_sys_ftruncate.constprop.14+0x10f/0x160
[12613.302815]  [<ffffffff81374f6e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[12613.303656]  [<ffffffff811e5f0e>] SyS_ftruncate+0xe/0x10
[12613.304497]  [<ffffffff817cff12>] system_call_fastpath+0x12/0x17
[12613.305362] Code: 8d 7b 20 e8 64 3b 20 00 41 8b 04 24 83 c0 01 41 3b 44 24 04 41 89 04 24 7d 52 41 f7 c6 00 02 00 00 74 31 e8 94 bf fc ff 41 56 9d <48> 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 90 49 8d 34 04 
[12613.307230] sending NMI to other CPUs:
[12613.308088] NMI backtrace for cpu 2
[12613.308871] CPU: 2 PID: 30778 Comm: trinity-c161 Tainted: G             L 3.18.0+ #106
[12613.311213] task: ffff8800478eada0 ti: ffff8802251dc000 task.ti: ffff8802251dc000
[12613.311995] RIP: 0010:[<ffffffff810c96bd>]  [<ffffffff810c96bd>] do_raw_spin_trylock+0x2d/0x50
[12613.312776] RSP: 0018:ffff8802251dfd88  EFLAGS: 00000246
[12613.313536] RAX: 0000000000008f8f RBX: ffff880235610238 RCX: 0000000000000000
[12613.314287] RDX: 0000000000008f8f RSI: 000000000000908f RDI: ffff880235610238
[12613.315022] RBP: ffff8802251dfd88 R08: 0000000000000000 R09: 0000000000000000
[12613.315743] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880235610250
[12613.316446] R13: ffff88009f3cc640 R14: 0000000000000000 R15: ffff8802356103c0
[12613.317133] FS:  00007f688fc33740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[12613.317813] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12613.318483] CR2: 000000288076098d CR3: 000000022a0d2000 CR4: 00000000001407e0
[12613.319155] DR0: 00007f688e6a9000 DR1: 0000000000000000 DR2: 0000000000000000
[12613.319819] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
[12613.320475] Stack:
[12613.321122]  ffff8802251dfdb8 ffffffff817ceea6 ffffffff81214c6a ffffffff817cf22b
[12613.321796]  ffff8802356101b0 ffff880235610238 ffff8802251dff08 ffffffff81214c6a
[12613.322474]  00000000251dfde8 7fffffffffffffff ffff88009f3cc520 ffff88017b9a3c28
[12613.323153] Call Trace:
[12613.323815]  [<ffffffff817ceea6>] _raw_spin_lock+0x46/0x80
[12613.324479]  [<ffffffff81214c6a>] ? sync_inodes_sb+0x1ca/0x2b0
[12613.325138]  [<ffffffff817cf22b>] ? _raw_spin_unlock+0x2b/0x40
[12613.325798]  [<ffffffff81214c6a>] sync_inodes_sb+0x1ca/0x2b0
[12613.326458]  [<ffffffff817cb01f>] ? wait_for_completion+0xff/0x130
[12613.327114]  [<ffffffff8121c560>] ? vfs_fsync+0x40/0x40
[12613.327761]  [<ffffffff8121c579>] sync_inodes_one_sb+0x19/0x20
[12613.328404]  [<ffffffff811ec002>] iterate_supers+0xb2/0x110
[12613.329041]  [<ffffffff8121c824>] sys_sync+0x44/0xb0
[12613.329669]  [<ffffffff817cff12>] system_call_fastpath+0x12/0x17
[12613.330292] Code: 44 00 00 0f b7 17 55 31 c9 48 89 e5 38 d6 74 0e 89 c8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 8d b2 00 01 00 00 89 d0 f0 66 0f b1 37 <66> 39 d0 75 e0 b1 01 5d 65 8b 04 25 2c a0 00 00 89 47 08 65 48 
[12613.331704] NMI backtrace for cpu 0
[12613.332380] CPU: 0 PID: 1460 Comm: trinity-c155 Tainted: G             L 3.18.0+ #106
[12613.334512] task: ffff880232a72da0 ti: ffff880232820000 task.ti: ffff880232820000
[12613.335239] RIP: 0033:[<000000336ee93d7d>]  [<000000336ee93d7d>] 0x336ee93d7d
[12613.335984] RSP: 002b:00007fff9d870a18  EFLAGS: 00000206
[12613.336708] RAX: ffffff68f0ef8610 RBX: 0000000000000200 RCX: 00007fff9d870f00
[12613.337440] RDX: 00007fff9d871a00 RSI: 00007f688e769020 RDI: 00007fff9d870a20
[12613.338173] RBP: 00007fff9d871a60 R08: ffffff68f0ef8600 R09: ffffff68f0ef85f0
[12613.338905] R10: ffffff68f0ef85e0 R11: 0000000000000246 R12: 0000000000000129
[12613.339627] R13: 0000000000001000 R14: 0000000000001000 R15: 00007f688e6a9000
[12613.340332] FS:  00007f688fc33740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[12613.341040] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12613.341753] CR2: 00007f688e702000 CR3: 00000001c9675000 CR4: 00000000001407f0
[12613.342468] DR0: 00007f688e6a9000 DR1: 0000000000000000 DR2: 0000000000000000
[12613.343161] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
[12613.343836] 
[12613.344498] NMI backtrace for cpu 1
[12613.345187] CPU: 1 PID: 30783 Comm: trinity-c166 Tainted: G             L 3.18.0+ #106
[12613.347247] task: ffff8801c059ada0 ti: ffff880227058000 task.ti: ffff880227058000
[12613.347978] RIP: 0033:[<000000336ee39bc0>]  [<000000336ee39bc0>] 0x336ee39bc0
[12613.348715] RSP: 002b:00007fff9d8719d8  EFLAGS: 00000202
[12613.349440] RAX: e8ffff3573f747e9 RBX: 00007f688f22e000 RCX: 0000000000000000
[12613.350175] RDX: 8000000000000000 RSI: 00007fff9d8719cc RDI: 000000336f1b76e0
[12613.350903] RBP: 00007f688f22e068 R08: 000000336f1b7120 R09: 000000336f1b7140
[12613.351639] R10: ffffffffffff9f00 R11: 0000000000000202 R12: 00007f688fbb7528
[12613.352362] R13: 00007f688f22e068 R14: 0000000000000000 R15: 0000000000000000
[12613.353089] FS:  00007f688fc33740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[12613.353814] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12613.354537] CR2: 0000000000000008 CR3: 0000000153ac1000 CR4: 00000000001407e0
[12613.355265] DR0: 00007f688e6a9000 DR1: 0000000000000000 DR2: 0000000000000000
[12613.355989] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
[12613.356701] 
[12641.253566] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [trinity-main:30108]
[12641.254492] Modules linked in: bridge 8021q garp stp snd_seq_dummy dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[12641.260655] CPU: 3 PID: 30108 Comm: trinity-main Tainted: G             L 3.18.0+ #106
[12641.263755] task: ffff8800a0d15b40 ti: ffff880234ec4000 task.ti: ffff880234ec4000
[12641.264753] RIP: 0010:[<ffffffff811729d7>]  [<ffffffff811729d7>] task_bp_pinned.isra.6.constprop.12+0x37/0x90
[12641.265768] RSP: 0018:ffff880234ec7a40  EFLAGS: 00000287
[12641.266763] RAX: ffff88023f199d18 RBX: 000000000000005c RCX: 00000000001ce490
[12641.267751] RDX: ffff8801c87c2fa8 RSI: ffff8801ad860000 RDI: 0000000000000007
[12641.268728] RBP: ffff880234ec7a70 R08: ffffffff81d215f8 R09: ffff880245dce490
[12641.269700] R10: 0000000000000000 R11: 0000000000000008 R12: ffff880234ec7a30
[12641.270657] R13: ffffffff810abaf5 R14: ffff880234ec79b0 R15: ffffea000252c000
[12641.271618] FS:  00007f688fc33740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[12641.272585] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12641.273547] CR2: 000000336f1b8490 CR3: 0000000229035000 CR4: 00000000001407e0
[12641.274502] DR0: 00007f688e6a9000 DR1: 0000000000000000 DR2: 0000000000000000
[12641.275443] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 000000000017060a
[12641.276381] Stack:
[12641.277317]  0000000000000002 ffff8800a0d15b40 ffff880234ec7aa0 0000000000000001
[12641.278267]  ffff880094b06618 0000000000000007 ffff880234ec7ad0 ffffffff81172af6
[12641.279214]  ffffffff81c64868 0000000100000246 00000000001ce490 ffffffff81d215f8
[12641.280161] Call Trace:
[12641.281101]  [<ffffffff81172af6>] toggle_bp_slot.constprop.7+0xc6/0x1b0
[12641.282055]  [<ffffffff81172dfb>] __reserve_bp_slot+0x1eb/0x230
[12641.283017]  [<ffffffff81172e77>] reserve_bp_slot+0x27/0x40
[12641.283972]  [<ffffffff81172f68>] register_perf_hw_breakpoint+0x18/0x60
[12641.284884]  [<ffffffff81172fdf>] hw_breakpoint_event_init+0x2f/0x50
[12641.285766]  [<ffffffff8116fcfb>] perf_init_event+0x17b/0x1e0
[12641.286650]  [<ffffffff8116fb80>] ? perf_bp_event+0xd0/0xd0
[12641.287512]  [<ffffffff811700f8>] perf_event_alloc+0x398/0x440
[12641.288373]  [<ffffffff810e3c1c>] ? do_init_timer+0x5c/0x60
[12641.289239]  [<ffffffff81170e3e>] inherit_event.isra.90+0x8e/0x260
[12641.290095]  [<ffffffff8117104e>] inherit_task_group.isra.92.part.93+0x3e/0xe0
[12641.290959]  [<ffffffff811716e3>] perf_event_init_task+0x163/0x2e0
[12641.291812]  [<ffffffff81075756>] copy_process.part.26+0x726/0x1a40
[12641.292664]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[12641.293514]  [<ffffffff810c361f>] ? lock_release_holdtime.part.24+0xf/0x190
[12641.294347]  [<ffffffff810424b2>] ? __do_page_fault+0x232/0x610
[12641.295158]  [<ffffffff81076c37>] do_fork+0xe7/0x490
[12641.295969]  [<ffffffff811ea0b1>] ? __fput+0x191/0x200
[12641.296791]  [<ffffffff81374fe4>] ? lockdep_sys_exit_thunk+0x35/0x67
[12641.297597]  [<ffffffff81077066>] SyS_clone+0x16/0x20
[12641.298395]  [<ffffffff817d0299>] stub_clone+0x69/0x90
[12641.299173]  [<ffffffff817cff12>] ? system_call_fastpath+0x12/0x17
[12641.299935] Code: 54 53 48 83 ec 18 48 8b 05 57 1f af 00 48 8d 98 c0 fe ff ff 48 3d 10 49 c6 81 74 4c 41 89 fd 45 31 e4 eb 17 48 8b 93 40 01 00 00 <48> 8d 9a c0 fe ff ff 48 81 fa 10 49 c6 81 74 39 48 3b b3 28 01 
[12641.301567] sending NMI to other CPUs:
[12641.302311] NMI backtrace for cpu 1
[12641.303001] CPU: 1 PID: 3545 Comm: trinity-c7 Tainted: G             L 3.18.0+ #106
[12641.305035] task: ffff8800963f96d0 ti: ffff880215b28000 task.ti: ffff880215b28000
[12641.305715] RIP: 0010:[<ffffffff810fc466>]  [<ffffffff810fc466>] smp_call_function_many+0x276/0x320
[12641.306405] RSP: 0000:ffff880215b2baa8  EFLAGS: 00000202
[12641.307074] RAX: 0000000000000003 RBX: ffff8802451d3b00 RCX: ffff8802455d9488
[12641.307748] RDX: 0000000000000003 RSI: 0000000000000008 RDI: 0000000000000000
[12641.308423] RBP: ffff880215b2baf8 R08: ffff880243741e30 R09: 0000000100180011
[12641.309097] R10: ffff880244804240 R11: 0000000000000000 R12: 0000000000000003
[12641.309770] R13: 0000000000000000 R14: 0000000000000008 R15: 0000000000000008
[12641.310436] FS:  00007f688fc33740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[12641.311099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12641.311757] CR2: 0000000002258ff8 CR3: 0000000180fec000 CR4: 00000000001407e0
[12641.312420] DR0: 00007f688e6a9000 DR1: 0000000000000000 DR2: 0000000000000000
[12641.313082] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
[12641.313744] Stack:
[12641.314390]  ffff88024e5d4d00 0000000000000001 0000000100000003 00000000001d3ac0
[12641.315059]  0000004000000000 ffffffff82bea298 0000000000000001 ffffffff8117e150
[12641.315726]  0000000000000000 0000000000000000 ffff880215b2bb28 ffffffff810fc652
[12641.316393] Call Trace:
[12641.317045]  [<ffffffff8117e150>] ? drain_pages+0xc0/0xc0
[12641.317698]  [<ffffffff810fc652>] on_each_cpu_mask+0x32/0xa0
[12641.318351]  [<ffffffff8117c371>] drain_all_pages+0x101/0x120
[12641.319003]  [<ffffffff8118024f>] __alloc_pages_nodemask+0x7af/0xb40
[12641.319657]  [<ffffffff811c8ebe>] alloc_pages_vma+0xee/0x1b0
[12641.320310]  [<ffffffff811a416a>] ? do_wp_page+0xca/0x770
[12641.320959]  [<ffffffff811a416a>] do_wp_page+0xca/0x770
[12641.321608]  [<ffffffff811a69bb>] handle_mm_fault+0x6cb/0xe90
[12641.322258]  [<ffffffff8104241d>] ? __do_page_fault+0x19d/0x610
[12641.322910]  [<ffffffff8104248c>] __do_page_fault+0x20c/0x610
[12641.323557]  [<ffffffff811aa2dc>] ? validate_mm+0x15c/0x2c0
[12641.324204]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[12641.324847]  [<ffffffff810c361f>] ? lock_release_holdtime.part.24+0xf/0x190
[12641.325494]  [<ffffffff81374fad>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[12641.326142]  [<ffffffff8104289c>] do_page_fault+0xc/0x10
[12641.326785]  [<ffffffff817d1b32>] page_fault+0x22/0x30
[12641.327423] Code: 00 41 89 c4 39 f0 0f 8d 27 fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 a0 f2 d1 81 f6 41 18 01 74 14 0f 1f 44 00 00 f3 90 f6 41 18 01 <75> f8 48 63 35 05 63 c2 00 83 f8 ff 48 8b 7b 08 74 b0 39 c6 77 
[12641.328871] NMI backtrace for cpu 0
[12641.329555] CPU: 0 PID: 30823 Comm: trinity-c206 Tainted: G             L 3.18.0+ #106
[12641.331677] task: ffff8801ad862da0 ti: ffff88017d7d0000 task.ti: ffff88017d7d0000
[12641.332396] RIP: 0010:[<ffffffff810fc466>]  [<ffffffff810fc466>] smp_call_function_many+0x276/0x320
[12641.333117] RSP: 0000:ffff88017d7d3a18  EFLAGS: 00000202
[12641.333827] RAX: 0000000000000003 RBX: ffff880244fd3b00 RCX: ffff8802455d7228
[12641.334546] RDX: 0000000000000003 RSI: 0000000000000008 RDI: 0000000000000000
[12641.335264] RBP: ffff88017d7d3a68 R08: ffff88024483b650 R09: 0000000100180011
[12641.335968] R10: ffff880244804240 R11: 0000000000000000 R12: 0000000000000003
[12641.336653] R13: 0000000000000000 R14: 0000000000000008 R15: 0000000000000008
[12641.337327] FS:  00007f688fc33740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[12641.338000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12641.338665] CR2: 0000000002258ff8 CR3: 0000000232b93000 CR4: 00000000001407f0
[126[12641.356633] NMI backtrace for cpu 2
[12641.357349] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.18.0+ #106
[12641.359555] task: ffff8802434bc470 ti: ffff880243758000 task.ti: ffff880243758000
[12641.360314] RIP: 0010:[<ffffffff813d6cab>]  [<ffffffff813d6cab>] intel_idle+0xdb/0x180
[12641.361070] RSP: 0000:ffff88024375be08  EFLAGS: 00000046
[12641.361812] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000001
[12641.362561] RDX: 0000000000000000 RSI: ffff88024375bfd8 RDI: 0000000000000002
[12641.363289] RBP: ffff88024375be38 R08: 000000008baf89af R09: 0000000000000000
[12641.363998] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[12641.364680] R13: 0000000000000000 R14: 0000000000000001 R15: ffff880243758000
[12641.365350] FS:  0000000000000000(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[12641.366017] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12641.366678] CR2: 0000000002258ff8 CR3: 0000000097dd1000 CR4: 00000000001407e0
[12641.367347] DR0: 00007f688e6a9000 DR1: 0000000000000000 DR2: 0000000000000000
[12641.368016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[12641.368680] Stack:
[12641.369340]  000000024375be38 5dccb806cd27b931 ffffe8ffff203200 0000000000000001
[12641.370021]  ffffffff81cb1bc0 0000000000000002 ffff88024375be88 ffffffff8165f7b5
[12641.370710]  00000b80fc4e5dd1 ffffffff81cb1c30 ffffffff81cb1bc0 ffffffff81d215f0
[12641.371403] Call Trace:
[12641.372073]  [<ffffffff8165f7b5>] cpuidle_enter_state+0x55/0x190
[12641.372758]  [<ffffffff8165f9a7>] cpuidle_enter+0x17/0x20
[12641.373441]  [<ffffffff810bd665>] cpu_startup_entry+0x355/0x410
[12641.374108]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[12641.374769] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[12669.237353] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [trinity-c59:7816]
[12669.238086] Modules linked in: bridge 8021q garp stp snd_seq_dummy dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[12669.242957] CPU: 3 PID: 7816 Comm: trinity-c59 Tainted: G             L 3.18.0+ #106
[12669.245468] task: ffff8802258796d0 ti: ffff88007df64000 task.ti: ffff88007df64000
[12669.246378] RIP: 0010:[<ffffffff810c6014>]  [<ffffffff810c6014>] lock_acquire+0xb4/0x120
[12669.247281] RSP: 0018:ffff88007df67da8  EFLAGS: 00000246
[12669.248181] RAX: ffff8802258796d0 RBX: ffffffff810c512c RCX: ffff8802455cff98
[12669.249074] RDX: 00000000000001e0 RSI: 0000000000000010 RDI: 0000000000000000
[12669.249977] RBP: ffff88007df67e08 R08: 0000000000000000 R09: 0000000000000000
[12669.250881] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff810abaf5
[12669.251790] R13: ffff88007df67d18 R14: ffffffff810abaf5 R15: ffff88007df67d08
[12669.252693] FS:  00007f688fc33740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[12669.253585] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12669.254486] CR2: 000000336f1b7740 CR3: 00000000a0ef4000 CR4: 00000000001407e0
[12669.255431] DR0: 00007f688e6a9000 DR1: 0000000000000000 DR2: 0000000000000000
[12669.256368] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
[12669.257263] Stack:
[12669.258097]  ffffffff81086b05 0000000000000000 0000000000000000 0000000000000246
[12669.258933]  0000000000000002 ffffffff81c50e20 ffff88007df67e48 00000000000024ca
[12669.259768]  0000000000000000 ffff88007df67eb0 ffff8800953896d0 0000000000000000
[12669.260638] Call Trace:
[12669.261471]  [<ffffffff81086b05>] ? group_send_sig_info+0x5/0xc0
[12669.262318]  [<ffffffff81086b48>] group_send_sig_info+0x48/0xc0
[12669.263170]  [<ffffffff81086b05>] ? group_send_sig_info+0x5/0xc0
[12669.264028]  [<ffffffff81086d05>] kill_pid_info+0x65/0xb0
[12669.264945]  [<ffffffff81086ca5>] ? kill_pid_info+0x5/0xb0
[12669.265851]  [<ffffffff81086e3c>] SYSC_kill+0xcc/0x240
[12669.266674]  [<ffffffff81086df8>] ? SYSC_kill+0x88/0x240
[12669.267508]  [<ffffffff8107aebb>] ? SyS_wait4+0x8b/0x110
[12669.268328]  [<ffffffff81088cde>] SyS_kill+0xe/0x10
[12669.269145]  [<ffffffff817cff12>] system_call_fastpath+0x12/0x17
[12669.269958] Code: d8 49 c1 e8 09 48 89 04 24 49 83 f0 01 41 83 e0 01 e8 01 ef ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 00 00 53 9d <48> 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 65 
[12669.271728] sending NMI to other CPUs:
[12669.272548] NMI backtrace for cpu 1
[12669.273314] CPU: 1 PID: 7878 Comm: trinity-c219 Tainted: G             L 3.18.0+ #106
[12669.275686] task: ffff880096860000 ti: ffff880097f2c000 task.ti: ffff880097f2c000
[12669.276489] RIP: 0010:[<ffffffff811037c2>]  [<ffffffff811037c2>] is_module_text_address+0x22/0x30
[12669.277296] RSP: 0018:ffff880097f2f9e8  EFLAGS: 00000046
[12669.278089] RAX: 0000000000000000 RBX: ffff880097f2fd70 RCX: 0000000000000004
[12669.278876] RDX: ffff880227e01b80 RSI: 0000000000000000 RDI: ffff880097f2fd70
[12669.279650] RBP: ffff880097f2fa00 R08: ffff880097f2fb40 R09: 0000000000000000
[12669.280408] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[12669.281147] R13: ffff880097f2fd80 R14: ffff880097f2fd50 R15: ffffffff81801b30
[12669.281874] FS:  00007f688fc33740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[12669.282595] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12669.283299] CR2: 0000000001cb9a28 CR3: 00000000948d4000 CR4: 00000000001407e0
[12669.283997] DR0: 00007f688e6a9000 DR1: 0000000000000000 DR2: 0000000000000000
[12669.284677] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
[12669.285348] Stack:
[12669.286003]  ffffffff810973c8 0000000000000000 ffff880097f2fd70 ffff880097f2fa70
[12669.286678]  ffffffff81006a7f ffff880097f2fa20 ffff880097f2fff8 ffff880097f2c000
[12669.287346]  ffff880097f2fd80 ffff880097f2fb40 ffffffffffffc000 0000000000000000
[12669.288011] Call Trace:
[12669.288672]  [<ffffffff810973c8>] ? __kernel_text_address+0x58/0x80
[12669.289343]  [<ffffffff81006a7f>] print_context_stack+0x8f/0x100
[12669.290009]  [<ffffffff8100559f>] dump_trace+0x16f/0x350
[12669.290668]  [<ffffffff811b22a9>] ? anon_vma_clone+0x49/0x140
[12669.291321]  [<ffffffff811b22a9>] ? anon_vma_clone+0x49/0x140
[12669.291963]  [<ffffffff810135bf>] save_stack_trace+0x2f/0x50
[12669.292603]  [<ffffffff811ce860>] set_track+0x70/0x140
[12669.293240]  [<ffffffff817c26de>] alloc_debug_processing+0x92/0x118
[12669.293872]  [<ffffffff817c33b2>] __slab_alloc+0x4da/0x58f
[12669.294498]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[12669.295119]  [<ffffffff810c512c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[12669.295745]  [<ffffffff811b22a9>] ? anon_vma_clone+0x49/0x140
[12669.296362]  [<ffffffff811b22a9>] ? anon_vma_clone+0x49/0x140
[12669.296969]  [<ffffffff811d295b>] kmem_cache_alloc+0x1cb/0x1f0
[12669.297577]  [<ffffffff811b22a9>] anon_vma_clone+0x49/0x140
[12669.298185]  [<ffffffff811b23cd>] anon_vma_fork+0x2d/0x100
[12669.298790]  [<ffffffff81076392>] copy_process.part.26+0x1362/0x1a40
[12669.299398]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[12669.300009]  [<ffffffff81076c37>] do_fork+0xe7/0x490
[12669.300619]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[12669.301232]  [<ffffffff81374fe4>] ? lockdep_sys_exit_thunk+0x35/0x67
[12669.301850]  [<ffffffff81077066>] SyS_clone+0x16/0x20
[12669.302466]  [<ffffffff817d0299>] stub_clone+0x69/0x90
[12669.303062]  [<ffffffff817cff12>] ? system_call_fastpath+0x12/0x17
[12669.303646] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 65 ff 04 25 e0 a9 00 00 e8 4a b7 ff ff 65 ff 0c 25 e0 a9 00 00 48 85 c0 5d <0f> 95 c0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 
[12669.304961] NMI backtrace for cpu 0
[12669.305579] CPU: 0 PID: 5863 Comm: trinity-c230 Tainted: G             L 3.18.0+ #106
[12669.307401] task: ffff88023f830000 ti: ffff880096b74000 task.ti: ffff880096b74000
[12669.308023] RIP: 0010:[<ffffffff81191f24>]  [<ffffffff81191f24>] shmem_undo_range+0x404/0x710
[12669.308649] RSP: 0000:ffff880096b77c78  EFLAGS: 00000246
[12669.309266] RAX: 002ffe000008003d RBX: 000000000000000c RCX: 0000000000000835
[12669.309888] RDX: ffffea0005e915c0 RSI: ffffea0005e91580 RDI:[12669.325471] NMI backtrace for cpu 2
[12669.326185] CPU: 2 PID: 7188 Comm: trinity-c210 Tainted: G             L 3.18.0+ #106
[12669.328331] task: ffff8800a0d14470 ti: ffff8801c96d4000 task.ti: ffff8801c96d4000
[12669.329085] RIP: 0010:[<ffffffff810c4f21>]  [<ffffffff810c4f21>] __lock_acquire.isra.31+0x21/0x9f0
[12669.329858] RSP: 0018:ffff8801c96d79b8  EFLAGS: 00000096
[12669.330627] RAX: ffffffff81185068 RBX: ffff8800a0d14470 RCX: 0000000000000000
[12669.331406] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88024e5d3c58
[12669.332191] RBP: ffff8801c96d7a28 R08: 0000000000000001 R09: 0000000000000000
[12669.332984] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[12669.333780] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[12669.334577] FS:  00007f688fc33740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[12669.335370] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12669.336152] CR2: 00007f688d8a9000 CR3: 00000002271f7000 CR4: 00000000001407e0
[12669.336934] DR0: 00007f688e6a9000 DR1: 0000000000000000 DR2: 0000000000000000
[12669.337707] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
[12669.338474] Stack:
[12669.339224]  ffff8801c96d79f8 0000000093c05f9b 00000000000000d0 ffffffff82c00cc0
[12669.340023]  ffff8800a0d14470 00000000000200da ffff8800a0d15780 0000000000000000
[12669.340803]  ffff8801c96d7a08 0000000000000046 0000000000000000 0000000000000000
[12669.341541] Call Trace:
[12669.342270]  [<ffffffff810c5fff>] lock_acquire+0x9f/0x120
[12669.342998]  [<ffffffff81185068>] ? pagevec_lru_move_fn+0x88/0x100
[12669.343728]  [<ffffffff817cf0a9>] _raw_spin_lock_irqsave+0x49/0x90
[12669.344455]  [<ffffffff81185068>] ? pagevec_lru_move_fn+0x88/0x100
[12669.345190]  [<ffffffff810c361f>] ? lock_release_holdtime.part.24+0xf/0x190
[12669.345916]  [<ffffffff81185068>] pagevec_lru_move_fn+0x88/0x100
[12669.346640]  [<ffffffff81185160>] ? __pagevec_lru_add+0x20/0x20
[12669.347374]  [<ffffffff81185811>] __lru_cache_add+0x71/0x90
[12669.348097]  [<ffffffff81185bf9>] lru_cache_add_anon+0x19/0x20
[12669.348811]  [<ffffffff81190b08>] shmem_getpage_gfp+0x528/0x7a0
[12669.349559]  [<ffffffff81190dc2>] shmem_write_begin+0x42/0x70
[12669.350260]  [<ffffffff81175fc4>] generic_perform_write+0xd4/0x1f0
[12669.350957]  [<ffffffff81178722>] __generic_file_write_iter+0x162/0x350
[12669.351655]  [<ffffffff811e7940>] ? new_sync_read+0xd0/0xd0
[12669.352348]  [<ffffffff8117894f>] generic_file_write_iter+0x3f/0xb0
[12669.353037]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12669.353738]  [<ffffffff811e7a88>] do_iter_readv_writev+0x78/0xc0
[12669.354434]  [<ffffffff811e92b8>] do_readv_writev+0xd8/0x2a0
[12669.355126]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12669.355828]  [<ffffffff81178910>] ? __generic_file_write_iter+0x350/0x350
[12669.356531]  [<ffffffff810c361f>] ? lock_release_holdtime.part.24+0xf/0x190
[12669.357228]  [<ffffffff817cf310>] ? _raw_spin_unlock_irq+0x30/0x40
[12669.357921]  [<ffffffff811e950c>] vfs_writev+0x3c/0x50
[12669.358596]  [<ffffffff811e967c>] SyS_writev+0x5c/0x100
[12669.359258]  [<ffffffff817cff12>] system_call_fastpath+0x12/0x17
[12669.359905] Code: 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 48 44 8b 25 48 80 be 00 65 48 8b 1c 25 00 aa 00 00 <45> 85 e4 0f 84 ef 00 00 00 44 8b 1d e7 32 ab 01 49 89 fe 41 89 
[12669.361354] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 88.798 msecs
[15739.449422] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/u16:3:14112]
[15739.450087] Modules linked in: bridge 8021q garp stp snd_seq_dummy dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15739.454475] CPU: 2 PID: 14112 Comm: kworker/u16:3 Tainted: G             L 3.18.0+ #106
[15739.456686] Workqueue: khelper __call_usermodehelper
[15739.457473] task: ffff8801c95f0000 ti: ffff880227eac000 task.ti: ffff880227eac000
[15739.458231] RIP: 0010:[<ffffffff817c3407>]  [<ffffffff817c3407>] __slab_alloc+0x52f/0x58f
[15739.459015] RSP: 0018:ffff880227eaf8f8  EFLAGS: 00000246
[15739.459794] RAX: 0000000000000002 RBX: ffff8802304cf5c8 RCX: 00000000000002e0
[15739.460570] RDX: ffff88024520d7e0 RSI: 0000000000000000 RDI: ffff880244802000
[15739.461343] RBP: ffff880227eaf9e8 R08: 0000000000000000 R09: 0000000000000000
[15739.462113] R10: 0000000000000092 R11: 0000000000000000 R12: ffffffff810135bf
[15739.462885] R13: ffff880227eaf878 R14: 0000000100160015 R15: ffffffff8138278d
[15739.463648] FS:  0000000000000000(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15739.464431] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15739.465203] CR2: 0000000000000008 CR3: 0000000225ab9000 CR4: 00000000001407e0
[15739.465985] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 0000000000000000
[15739.466766] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15739.467542] Stack:
[15739.468308]  000000000000005c ffff880240f8d790 ffff880240f8d790 ffff880240f8dd00
[15739.469106]  0000000180230020 000000010000000f ffffffff8112ee12 0000000000000000
[15739.469912]  ffff8802453d7260 000000020023001f ffff880227eaf968 ffffffff8138278d
[15739.470717] Call Trace:
[15739.471511]  [<ffffffff8112ee12>] ? __delayacct_tsk_init+0x22/0x50
[15739.472325]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
[15739.473138]  [<ffffffff811ce860>] ? set_track+0x70/0x140
[15739.473947]  [<ffffffff811cf35d>] ? init_object+0x3d/0x70
[15739.474757]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
[15739.475571]  [<ffffffff811d295b>] kmem_cache_alloc+0x1cb/0x1f0
[15739.476374]  [<ffffffff8138278d>] __debug_object_init+0x43d/0x450
[15739.477174]  [<ffffffff813827bb>] debug_object_init+0x1b/0x20
[15739.477983]  [<ffffffff810e66d5>] hrtimer_init+0x25/0xb0
[15739.478781]  [<ffffffff8109f069>] __sched_fork+0x99/0x230
[15739.479590]  [<ffffffff810a59c9>] sched_fork+0x29/0x200
[15739.480387]  [<ffffffff8107568c>] copy_process.part.26+0x65c/0x1a40
[15739.481184]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[15739.481983]  [<ffffffff810c512c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15739.482797]  [<ffffffff8108e340>] ? call_helper+0x20/0x20
[15739.483598]  [<ffffffff81076c37>] do_fork+0xe7/0x490
[15739.484382]  [<ffffffff810c512c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15739.485160]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
[15739.485922]  [<ffffffff81077006>] kernel_thread+0x26/0x30
[15739.486669]  [<ffffffff8108e1b4>] __call_usermodehelper+0x64/0x80
[15739.487407]  [<ffffffff8109301a>] process_one_work+0x1fa/0x550
[15739.488126]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
[15739.488834]  [<ffffffff8109348b>] worker_thread+0x11b/0x490
[15739.489529]  [<ffffffff81093370>] ? process_one_work+0x550/0x550
[15739.490197]  [<ffffffff81098c89>] kthread+0xf9/0x110
[15739.490845]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[15739.491474]  [<ffffffff81098b90>] ? kthread_create_on_node+0x250/0x250
[15739.492092]  [<ffffffff817cfe6c>] ret_from_fork+0x7c/0xb0
[15739.492689]  [<ffffffff81098b90>] ? kthread_create_on_node+0x250/0x250
[15739.493288] Code: 00 02 00 00 49 c7 45 00 00 00 00 00 75 11 ff b5 78 ff ff ff 9d e8 7a 6d 98 ff 4c 89 e0 eb 0f e8 70 6e 98 ff ff b5 78 ff ff ff 9d <4c> 89 e0 48 8b 55 c8 65 48 33 14 25 28 00 00 00 74 3c e8 22 46 
[15739.494617] sending NMI to other CPUs:
[15739.495221] NMI backtrace for cpu 3
[15739.495787] CPU: 3 PID: 1650 Comm: trinity-c76 Tainted: G             L 3.18.0+ #106
[15739.497539] task: ffff8801adac4470 ti: ffff880180e04000 task.ti: ffff880180e04000
[15739.498153] RIP: 0010:[<ffffffff810c6014>]  [<ffffffff810c6014>] lock_acquire+0xb4/0x120
[15739.498772] RSP: 0018:ffff880180e07dd8  EFLAGS: 00000246
[15739.499382] RAX: ffff8801adac4470 RBX: 0000000000000246 RCX: ffff8802455cff98
[15739.499994] RDX: 00000000000006a0 RSI: 0000000000000000 RDI: 0000000000000000
[15739.500609] RBP: ffff880180e07e38 R08: 0000000000000000 R09: 0000000000000000
[15739.501224] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[15739.501829] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
[15739.502422] FS:  00007f79977d8740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15739.503021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15739.503621] CR2: 0000000000000001 CR3: 00000001c9593000 CR4: 00000000001407e0
[15739.504217] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 0000000000000000
[15739.504812] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15739.505403] Stack:
[15739.505988]  ffffffff8107a8e9 0000000000000000 ffff8801adac4470 0000000000000246
[15739.506603]  0000000127cc4da0 ffffffff81c0a098 ffff8801adac4470 ffffffff81c0a080
[15739.507220]  ffffffff81c0a098 ffff8801adac4470 ffff8801adac4470 ffff8801adac4470
[15739.507839] Call Trace:
[15739.508448]  [<ffffffff8107a8e9>] ? do_wait+0xd9/0x280
[15739.509064]  [<ffffffff817cf3d1>] _raw_read_lock+0x41/0x80
[15739.509678]  [<ffffffff8107a8e9>] ? do_wait+0xd9/0x280
[15739.510293]  [<ffffffff8107a8e9>] do_wait+0xd9/0x280
[15739.510907]  [<ffffffff8107aeb0>] SyS_wait4+0x80/0x110
[15739.511518]  [<ffffffff81078990>] ? task_stopped_code+0x60/0x60
[15739.512130]  [<ffffffff817d0109>] tracesys_phase2+0xd4/0xd9
[15739.512739] Code: d8 49 c1 e8 09 48 89 04 24 49 83 f0 01 41 83 e0 01 e8 01 ef ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 00 00 53 9d <48> 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 65 
[15739.514121] NMI backtrace for cpu 1
[15739.514769] CPU: 1 PID: 2849 Comm: trinity-c95 Tainted: G             L 3.18.0+ #106
[15739.516771] task: ffff880096b4c470 ti: ffff880153a4c000 task.ti: ffff880153a4c000
[15739.517467] RIP: 0010:[<ffffffff810961f9>]  [<ffffffff810961f9>] find_pid_ns+0x39/0x90
[15739.518177] RSP: 0018:ffff880153a4fe78  EFLAGS: 00000207
[15739.518884] RAX: ffff88024e517120 RBX: 0000000000000d7f RCX: 0000000000000034
[15739.519599] RDX: ffff880094be6500 RSI: ffffffff81c486c0 RDI: 0000000000000d7f
[15739.520316] RBP: ffff880153a4fe78 R08: 0000000000000000 R09: 0000000000000000
[15739.521036] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[15739.521752] R13: ffff880096b4c470 R14: 0000000000000000 R15: 0000000000000000
[15739.522464] FS:  00007f79977d8740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15739.523181] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15739.523882] CR2: 00007f7996f24220 CR3: 00000002251b4000 CR4: 00000000001407e0
[15739.524580] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 0000000000000000
[15739.525269] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15739.525957] Stack:
[15739.526632]  ffff880153a4fe88 ffffffff8109627f ffff880153a4ff68 ffffffff81086e2a
[15739.527319]  ffffffff81086df8 0000000000000000 ffff880096b4c840 0000000000000000
[15739.527992]  ffff880100000000 000003e800000b21 ffff880094be64c0 00007fff123fb870
[15739.528656] Call Trace:
[15739.529301]  [<ffffffff8109627f>] find_vpid+0x2f/0x50
[15739.529947]  [<ffffffff81086e2a>] SYSC_kill+0xba/0x240
[15739.530587]  [<ffffffff81086df8>] ? SYSC_kill+0x88/0x240
[15739[15763.435537] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/u16:3:14112]
[15763.436252] Modules linked in: bridge 8021q garp stp snd_seq_dummy dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15763.441025] CPU: 2 PID: 14112 Comm: kworker/u16:3 Tainted: G             L 3.18.0+ #106
[15763.443320] Workqueue: khelper __call_usermodehelper
[15763.444090] task: ffff8801c95f0000 ti: ffff880227eac000 task.ti: ffff880227eac000
[15763.444861] RIP: 0010:[<ffffffff817c3407>]  [<ffffffff817c3407>] __slab_alloc+0x52f/0x58f
[15763.445654] RSP: 0018:ffff880227eaf8f8  EFLAGS: 00000246
[15763.446422] RAX: 0000000000000002 RBX: ffff8802304cf5c8 RCX: 00000000000002e0
[15763.447201] RDX: ffff88024520d7e0 RSI: 0000000000000000 RDI: ffff880244802000
[15763.447985] RBP: ffff880227eaf9e8 R08: 0000000000000000 R09: 0000000000000000
[15763.448764] R10: 0000000000000092 R11: 0000000000000000 R12: ffffffff810135bf
[15763.449545] R13: ffff880227eaf878 R14: 0000000100160015 R15: ffffffff8138278d
[15763.450319] FS:  0000000000000000(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15763.451100] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15763.451876] CR2: 0000000000000008 CR3: 0000000225ab9000 CR4: 00000000001407e0
[15763.452662] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 0000000000000000
[15763.453451] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15763.454230] Stack:
[15763.455003]  000000000000005c ffff880240f8d790 ffff880240f8d790 ffff880240f8dd00
[15763.455818]  0000000180230020 000000010000000f ffffffff8112ee12 0000000000000000
[15763.456608]  ffff8802453d7260 000000020023001f ffff880227eaf968 ffffffff8138278d
[15763.457393] Call Trace:
[15763.458166]  [<ffffffff8112ee12>] ? __delayacct_tsk_init+0x22/0x50
[15763.458960]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
[15763.459753]  [<ffffffff811ce860>] ? set_track+0x70/0x140
[15763.460544]  [<ffffffff811cf35d>] ? init_object+0x3d/0x70
[15763.461338]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
[15763.462127]  [<ffffffff811d295b>] kmem_cache_alloc+0x1cb/0x1f0
[15763.462921]  [<ffffffff8138278d>] __debug_object_init+0x43d/0x450
[15763.463707]  [<ffffffff813827bb>] debug_object_init+0x1b/0x20
[15763.464503]  [<ffffffff810e66d5>] hrtimer_init+0x25/0xb0
[15763.465294]  [<ffffffff8109f069>] __sched_fork+0x99/0x230
[15763.466086]  [<ffffffff810a59c9>] sched_fork+0x29/0x200
[15763.466875]  [<ffffffff8107568c>] copy_process.part.26+0x65c/0x1a40
[15763.467664]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[15763.468458]  [<ffffffff810c512c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15763.469251]  [<ffffffff8108e340>] ? call_helper+0x20/0x20
[15763.470041]  [<ffffffff81076c37>] do_fork+0xe7/0x490
[15763.470828]  [<ffffffff810c512c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15763.471603]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
[15763.472371]  [<ffffffff81077006>] kernel_thread+0x26/0x30
[15763.473120]  [<ffffffff8108e1b4>] __call_usermodehelper+0x64/0x80
[15763.473854]  [<ffffffff8109301a>] process_one_work+0x1fa/0x550
[15763.474578]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
[15763.475287]  [<ffffffff8109348b>] worker_thread+0x11b/0x490
[15763.475975]  [<ffffffff81093370>] ? process_one_work+0x550/0x550
[15763.476640]  [<ffffffff81098c89>] kthread+0xf9/0x110
[15763.477285]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[15763.477915]  [<ffffffff81098b90>] ? kthread_create_on_node+0x250/0x250
[15763.478532]  [<ffffffff817cfe6c>] ret_from_fork+0x7c/0xb0
[15763.479131]  [<ffffffff81098b90>] ? kthread_create_on_node+0x250/0x250
[15763.479741] Code: 00 02 00 00 49 c7 45 00 00 00 00 00 75 11 ff b5 78 ff ff ff 9d e8 7a 6d 98 ff 4c 89 e0 eb 0f e8 70 6e 98 ff ff b5 78 ff ff ff 9d <4c> 89 e0 48 8b 55 c8 65 48 33 14 25 28 00 00 00 74 3c e8 22 46 
[15763.481087] sending NMI to other CPUs:
[15763.481693] NMI backtrace for cpu 3
[15763.482255] CPU: 3 PID: 1650 Comm: trinity-c76 Tainted: G             L 3.18.0+ #106
[15763.484005] task: ffff8801adac4470 ti: ffff880180e04000 task.ti: ffff880180e04000
[15763.484618] RIP: 0010:[<ffffffff810c47f5>]  [<ffffffff810c47f5>] lock_acquired+0x45/0x370
[15763.485234] RSP: 0018:ffff880180e07db8  EFLAGS: 00000046
[15763.485841] RAX: 0000000000000001 RBX: ffff880227cc4da0 RCX: 0000000000000001
[15763.486453] RDX: 000000000000dbdb RSI: ffffffff810bcc8d RDI: ffff880227cc4db8
[15763.487068] RBP: ffff880180e07df8 R08: 0000000000000000 R09: 0000000000000001
[15763.487687] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801adac4470
[15763.488303] R13: ffff880227cc4db8 R14: 0000000000000046 R15: ffff8801adac4460
[15763.488909] FS:  00007f79977d8740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15763.489522] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15763.490133] CR2: 0000000000000001 CR3: 00000001c9593000 CR4: 00000000001407e0
[15763.490747] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 0000000000000000
[15763.491356] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15763.491964] Stack:
[15763.492564]  0000000100000002 ffffffff810bcc8d ffff880180e07dd8 ffff880227cc4da0
[15763.493194]  ffff880227cc4db8 0000000000000292 ffff8801adac4470 ffff8801adac4460
[15763.493828]  ffff880180e07e38 ffffffff817cf0d5 ffffffff810bcc8d 0000000000000296
[15763.494464] Call Trace:
[15763.495088]  [<ffffffff810bcc8d>] ? remove_wait_queue+0x1d/0x40
[15763.495726]  [<ffffffff817cf0d5>] _raw_spin_lock_irqsave+0x75/0x90
[15763.496364]  [<ffffffff810bcc8d>] ? remove_wait_queue+0x1d/0x40
[15763.497005]  [<ffffffff810bcc8d>] remove_wait_queue+0x1d/0x40
[15763.497645]  [<ffffffff8107a95b>] do_wait+0x14b/0x280
[15763.498283]  [<ffffffff8107aeb0>] SyS_wait4+0x80/0x110
[15763.498917]  [<ffffffff81078990>] ? task_stopped_code+0x60/0x60
[15763.499551]  [<ffffffff817d0109>] tracesys_phase2+0xd4/0xd9
[15763.500183] Code: b8 00 45 85 c9 0f 84 d8 00 00 00 65 4c 8b 24 25 00 aa 00 00 45 8b 84 24 6c 07 00 00 45 85 c0 0f 85 be 00 00 00 49 89 fd 9c 41 5e <fa> 8b 35 1c 3a ab 01 41 c7 84 24 6c 07 00 00 01 00 00 00 41 8b 
[15763.501611] NMI backtrace for cpu 1
[15763.502274] CPU: 1 PID: 3298 Comm: trinity-c183 Tainted: G             L 3.18.0+ #106
[15763.504345] task: ffff880227de16d0 ti: ffff880071060000 task.ti: ffff880071060000
[15763.505065] RIP: 0033:[<000000336eebc2fc>]  [<000000336eebc2fc>] 0x336eebc2fc
[15763.505796] RSP: 002b:00007fff123fb868  EFLAGS: 00000246
[15763.506522] RAX: 0000000000000000 RBX: 0000000000000d7b RCX: ffffffffffffffff
[15763.507257] RDX: 000000000000000b RSI: 00007fff123fb870 RDI: 0000000000000d7b
[15763.507994] RBP: 0000000000000000 R08: 00007f79977d8740 R09: 0000000000000000
[15763.508729] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7996cf6000
[15763.509460] R13: 00007f7996cf6068 R14: 0000000000000000 R15: 0000000000000000
[15763.510188] FS:  00007f79977d8740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15763.510908] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15763.511612] CR2: 00007f7996f24220 CR3: 000000009a646000 CR4: 00000000001407e0
[15763.512319] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 0000000000000000
[15763.513027] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15763.513727] 
[15763.514403] NMI backtrace for cpu 0
[15763.515067] CPU: 0 PID: 1876 Comm: trinity-c189 Tainted: G             L 3.18.0+ #106
[15763.517087] task: ffff880096b4ada0 ti: ffff8802253d0000 task.ti: ffff8802253d0000
[15763.517776] RIP: 0010:[<[15779.306349] INFO: rcu_sched detected stalls on CPUs/tasks:
[15779.307024] 	(detected by 0, t=6002 jiffies, g=481360, c=481359, q=0)
[15779.307662] INFO: Stall ended before state dump start
[15787.421647] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u16:3:14112]
[15787.422333] Modules linked in: bridge 8021q garp stp snd_seq_dummy dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15787.426946] CPU: 2 PID: 14112 Comm: kworker/u16:3 Tainted: G             L 3.18.0+ #106
[15787.429181] Workqueue: khelper __call_usermodehelper
[15787.429945] task: ffff8801c95f0000 ti: ffff880227eac000 task.ti: ffff880227eac000
[15787.430719] RIP: 0010:[<ffffffff817c3407>]  [<ffffffff817c3407>] __slab_alloc+0x52f/0x58f
[15787.431507] RSP: 0018:ffff880227eaf8f8  EFLAGS: 00000246
[15787.432297] RAX: 0000000000000002 RBX: ffff8802304cf5c8 RCX: 00000000000002e0
[15787.433094] RDX: ffff88024520d7e0 RSI: 0000000000000000 RDI: ffff880244802000
[15787.433886] RBP: ffff880227eaf9e8 R08: 0000000000000000 R09: 0000000000000000
[15787.434671] R10: 0000000000000092 R11: 0000000000000000 R12: ffffffff810135bf
[15787.435452] R13: ffff880227eaf878 R14: 0000000100160015 R15: ffffffff8138278d
[15787.436234] FS:  0000000000000000(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15787.437028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15787.437816] CR2: 0000000000000008 CR3: 0000000225ab9000 CR4: 00000000001407e0
[15787.438605] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 0000000000000000
[15787.439400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15787.440183] Stack:
[15787.440959]  000000000000005c ffff880240f8d790 ffff880240f8d790 ffff880240f8dd00
[15787.441792]  0000000180230020 000000010000000f ffffffff8112ee12 0000000000000000
[15787.442615]  ffff8802453d7260 000000020023001f ffff880227eaf968 ffffffff8138278d
[15787.443422] Call Trace:
[15787.444217]  [<ffffffff8112ee12>] ? __delayacct_tsk_init+0x22/0x50
[15787.445027]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
[15787.445833]  [<ffffffff811ce860>] ? set_track+0x70/0x140
[15787.446647]  [<ffffffff811cf35d>] ? init_object+0x3d/0x70
[15787.447449]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
[15787.448258]  [<ffffffff811d295b>] kmem_cache_alloc+0x1cb/0x1f0
[15787.449067]  [<ffffffff8138278d>] __debug_object_init+0x43d/0x450
[15787.449879]  [<ffffffff813827bb>] debug_object_init+0x1b/0x20
[15787.450688]  [<ffffffff810e66d5>] hrtimer_init+0x25/0xb0
[15787.451497]  [<ffffffff8109f069>] __sched_fork+0x99/0x230
[15787.452307]  [<ffffffff810a59c9>] sched_fork+0x29/0x200
[15787.453122]  [<ffffffff8107568c>] copy_process.part.26+0x65c/0x1a40
[15787.453928]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[15787.454733]  [<ffffffff810c512c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15787.455524]  [<ffffffff8108e340>] ? call_helper+0x20/0x20
[15787.456299]  [<ffffffff81076c37>] do_fork+0xe7/0x490
[15787.457062]  [<ffffffff810c512c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15787.457826]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
[15787.458587]  [<ffffffff81077006>] kernel_thread+0x26/0x30
[15787.459336]  [<ffffffff8108e1b4>] __call_usermodehelper+0x64/0x80
[15787.460078]  [<ffffffff8109301a>] process_one_work+0x1fa/0x550
[15787.460794]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
[15787.461495]  [<ffffffff8109348b>] worker_thread+0x11b/0x490
[15787.462182]  [<ffffffff81093370>] ? process_one_work+0x550/0x550
[15787.462850]  [<ffffffff81098c89>] kthread+0xf9/0x110
[15787.463496]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[15787.464124]  [<ffffffff81098b90>] ? kthread_create_on_node+0x250/0x250
[15787.464742]  [<ffffffff817cfe6c>] ret_from_fork+0x7c/0xb0
[15787.465341]  [<ffffffff81098b90>] ? kthread_create_on_node+0x250/0x250
[15787.465947] Code: 00 02 00 00 49 c7 45 00 00 00 00 00 75 11 ff b5 78 ff ff ff 9d e8 7a 6d 98 ff 4c 89 e0 eb 0f e8 70 6e 98 ff ff b5 78 ff ff ff 9d <4c> 89 e0 48 8b 55 c8 65 48 33 14 25 28 00 00 00 74 3c e8 22 46 
[15787.467291] sending NMI to other CPUs:
[15787.467904] NMI backtrace for cpu 3
[15787.468463] CPU: 3 PID: 1650 Comm: trinity-c76 Tainted: G             L 3.18.0+ #106
[15787.470211] task: ffff8801adac4470 ti: ffff880180e04000 task.ti: ffff880180e04000
[15787.470824] RIP: 0033:[<000000336eebc2fc>]  [<000000336eebc2fc>] 0x336eebc2fc
[15787.471440] RSP: 002b:00007fff123fb868  EFLAGS: 00000246
[15787.472043] RAX: 0000000000000000 RBX: 0000000000000d7e RCX: ffffffffffffffff
[15787.472652] RDX: 000000000000000b RSI: 00007fff123fb870 RDI: 0000000000000d7e
[15787.473259] RBP: 0000000000000000 R08: 00007f79977d8740 R09: 0000000000000000
[15787.473868] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7997265000
[15787.474468] R13: 00007f7997265068 R14: 0000000000000000 R15: 0000000000000000
[15787.475063] FS:  00007f79977d8740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15787.475660] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15787.476254] CR2: 0000000000000001 CR3: 00000001c9593000 CR4: 00000000001407e0
[15787.476851] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 0000000000000000
[15787.477442] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15787.478032] 
[15787.478615] NMI backtrace for cpu 1
[15787.479205] CPU: 1 PID: 2849 Comm: trinity-c95 Tainted: G             L 3.18.0+ #106
[15787.481052] task: ffff880096b4c470 ti: ffff880153a4c000 task.ti: ffff880153a4c000
[15787.481696] RIP: 0010:[<ffffffff817cfea0>]  [<ffffffff817cfea0>] system_call+0x0/0x3
[15787.482351] RSP: 0018:00007fff123fb868  EFLAGS: 00000046
[15787.483007] RAX: 000000000000003d RBX: 0000000000000d7f RCX: 000000336eebc2fc
[15787.483674] RDX: 000000000000000b RSI: 00007fff123fb870 RDI: 0000000000000d7f
[15787.484338] RBP: 0000000000000000 R08: 00007f79977d8740 R09: 0000000000000000
[15787.485000] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f799716e000
[15787.485658] R13: 00007f799716e068 R14: 0000000000000000 R15: 0000000000000000
[15787.486314] FS:  00007f79977d8740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15787.486979] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15787.487641] CR2: 00007f7996f24220 CR3: 00000002251b4000 CR4: 00000000001407e0
[15787.488311] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 0000000000000000
[15787.488982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15787.489650] Stack:
[15787.490312]  000000000041401b 00007f79976944e8 00007f799716e000 0000000000000001
[15787.490999]  00007f799716e07c 00007f799716e000 0000000000000000 00007f799716e07c
[15787.491690]  0000000000416c03 00000000000066b2 0000000000000155 00000000cccccccd
[15787.492383] Call Trace:
[15787.493063]  <UNK> 
[15787.493070] Code: 8b 3c 24 4c 8b 74 24 08 4c 8b 6c 24 10 4c 8b 64 24 18 48 8b 6c 24 20 48 8b 5c 24 28 48 83 c4 30 e9 74 02 00 00 66 0f 1f 44 00 00 <0f> 01 f8 65 48 89 24 25 80 a0 00 00 65 48 8b 24 25 08 aa 00 00 
[15787.495270] NMI backtrace for cpu 0
[15787.495987] CPU: 0 PID: 1876 Comm: trinity-c189 Tainted: G             L 3.18.0+ #106
[15787.498136] task: ffff880096b4ada0 ti: ffff8802253d0000 task.ti: ffff8802253d0000
[15787.498870] RIP: 0010:[<ffffffff810c9698>]  [<ffffffff810c9698>] do_raw_spin_trylock+0x8/0x50
[15787.499618] RSP: 0018:ffff880244e03d00  EFLAGS: 00000092
[15787.500344] RAX: ffff880096b4ada0 RBX: ffff880240c51578 RCX: ffff880244fcff98
[15787.501064] RDX: 0000000000004a4a RSI: 0000000000000018 RDI: ffff880240c51578
[15787.501780] RBP: ffff880244e03d38 R08: 0000000000000001 R09: 0000000000000000
[15787.502487] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880240c51590
[15787.503190] R13: 0000000000000092 R14: ff[15811.407761] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/u16:3:14112]
[15811.408334] Modules linked in: bridge 8021q garp stp snd_seq_dummy dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15811.412296] CPU: 2 PID: 14112 Comm: kworker/u16:3 Tainted: G             L 3.18.0+ #106
[15811.414324] Workqueue: khelper __call_usermodehelper
[15811.415025] task: ffff8801c95f0000 ti: ffff880227eac000 task.ti: ffff880227eac000
[15811.415733] RIP: 0010:[<ffffffff817c3407>]  [<ffffffff817c3407>] __slab_alloc+0x52f/0x58f
[15811.416455] RSP: 0018:ffff880227eaf8f8  EFLAGS: 00000246
[15811.417171] RAX: 0000000000000002 RBX: ffff8802304cf5c8 RCX: 00000000000002e0
[15811.417913] RDX: ffff88024520d7e0 RSI: 0000000000000000 RDI: ffff880244802000
[15811.418645] RBP: ffff880227eaf9e8 R08: 0000000000000000 R09: 0000000000000000
[15811.419374] R10: 0000000000000092 R11: 0000000000000000 R12: ffffffff810135bf
[15811.420102] R13: ffff880227eaf878 R14: 0000000100160015 R15: ffffffff8138278d
[15811.420831] FS:  0000000000000000(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15811.421572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15811.422317] CR2: 0000000000000008 CR3: 0000000225ab9000 CR4: 00000000001407e0
[15811.423070] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 0000000000000000
[15811.423828] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15811.424583] Stack:
[15811.425327]  000000000000005c ffff880240f8d790 ffff880240f8d790 ffff880240f8dd00
[15811.426099]  0000000180230020 000000010000000f ffffffff8112ee12 0000000000000000
[15811.426872]  ffff8802453d7260 000000020023001f ffff880227eaf968 ffffffff8138278d
[15811.427657] Call Trace:
[15811.428437]  [<ffffffff8112ee12>] ? __delayacct_tsk_init+0x22/0x50
[15811.429231]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
[15811.430025]  [<ffffffff811ce860>] ? set_track+0x70/0x140
[15811.430814]  [<ffffffff811cf35d>] ? init_object+0x3d/0x70
[15811.431600]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
[15811.432394]  [<ffffffff811d295b>] kmem_cache_alloc+0x1cb/0x1f0
[15811.433180]  [<ffffffff8138278d>] __debug_object_init+0x43d/0x450
[15811.433972]  [<ffffffff813827bb>] debug_object_init+0x1b/0x20
[15811.434768]  [<ffffffff810e66d5>] hrtimer_init+0x25/0xb0
[15811.435559]  [<ffffffff8109f069>] __sched_fork+0x99/0x230
[15811.436349]  [<ffffffff810a59c9>] sched_fork+0x29/0x200
[15811.437141]  [<ffffffff8107568c>] copy_process.part.26+0x65c/0x1a40
[15811.437937]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[15811.438725]  [<ffffffff810c512c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15811.439519]  [<ffffffff8108e340>] ? call_helper+0x20/0x20
[15811.440307]  [<ffffffff81076c37>] do_fork+0xe7/0x490
[15811.441094]  [<ffffffff810c512c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15811.441871]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
[15811.442637]  [<ffffffff81077006>] kernel_thread+0x26/0x30
[15811.443385]  [<ffffffff8108e1b4>] __call_usermodehelper+0x64/0x80
[15811.444121]  [<ffffffff8109301a>] process_one_work+0x1fa/0x550
[15811.444838]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
[15811.445538]  [<ffffffff8109348b>] worker_thread+0x11b/0x490
[15811.446219]  [<ffffffff81093370>] ? process_one_work+0x550/0x550
[15811.446883]  [<ffffffff81098c89>] kthread+0xf9/0x110
[15811.447527]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[15811.448159]  [<ffffffff81098b90>] ? kthread_create_on_node+0x250/0x250
[15811.448776]  [<ffffffff817cfe6c>] ret_from_fork+0x7c/0xb0
[15811.449371]  [<ffffffff81098b90>] ? kthread_create_on_node+0x250/0x250
[15811.449968] Code: 00 02 00 00 49 c7 45 00 00 00 00 00 75 11 ff b5 78 ff ff ff 9d e8 7a 6d 98 ff 4c 89 e0 eb 0f e8 70 6e 98 ff ff b5 78 ff ff ff 9d <4c> 89 e0 48 8b 55 c8 65 48 33 14 25 28 00 00 00 74 3c e8 22 46 
[15811.451292] sending NMI to other CPUs:
[15811.451901] NMI backtrace for cpu 3
[15811.452459] CPU: 3 PID: 1650 Comm: trinity-c76 Tainted: G             L 3.18.0+ #106
[15811.454206] task: ffff8801adac4470 ti: ffff880180e04000 task.ti: ffff880180e04000
[15811.454817] RIP: 0010:[<ffffffff810c511a>]  [<ffffffff810c511a>] __lock_acquire.isra.31+0x21a/0x9f0
[15811.455437] RSP: 0018:ffff880180e07d18  EFLAGS: 00000002
[15811.456045] RAX: 0000000000000008 RBX: ffff8801adac4470 RCX: 0000000000000000
[15811.456660] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[15811.457274] RBP: ffff880180e07d88 R08: 0000000000000001 R09: 0000000000000000
[15811.457889] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000014e
[15811.458493] R13: 0000000000000000 R14: ffff880227cc4db8 R15: ffff8801adac4be0
[15811.459092] FS:  00007f79977d8740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15811.459699] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15811.460303] CR2: 0000000000000001 CR3: 00000001c9593000 CR4: 00000000001407e0
[15811.460907] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 0000000000000000
[15811.461507] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15811.462108] Stack:
[15811.462702]  ffff880180e07d98 ffffffff810c512c 0000000000000102 0000000000000000
[15811.463325]  ffff880180e07d48 ffffffff810abaf5 ffff880180e07dc8 0000000000000000
[15811.463945]  ffff880180e07dd8 0000000000000046 0000000000000000 0000000000000000
[15811.464564] Call Trace:
[15811.465173]  [<ffffffff810c512c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15811.465797]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[15811.466421]  [<ffffffff810c5fff>] lock_acquire+0x9f/0x120
[15811.467047]  [<ffffffff810bcc8d>] ? remove_wait_queue+0x1d/0x40
[15811.467672]  [<ffffffff817cf0a9>] _raw_spin_lock_irqsave+0x49/0x90
[15811.468300]  [<ffffffff810bcc8d>] ? remove_wait_queue+0x1d/0x40
[15811.468924]  [<ffffffff810bcc8d>] remove_wait_queue+0x1d/0x40
[15811.469546]  [<ffffffff8107a95b>] do_wait+0x14b/0x280
[15811.470166]  [<ffffffff8107aeb0>] SyS_wait4+0x80/0x110
[15811.470782]  [<ffffffff81078990>] ? task_stopped_code+0x60/0x60
[15811.471398]  [<ffffffff817d0109>] tracesys_phase2+0xd4/0xd9
[15811.472011] Code: e0 7f 44 09 d0 41 88 47 31 41 0f b6 47 32 83 e0 f0 45 85 c0 0f 95 c2 09 c8 c1 e2 03 09 d0 41 88 47 32 0f b7 55 18 41 0f b7 47 32 <c1> e2 04 83 e0 0f 09 d0 66 41 89 47 32 e8 a4 69 fe ff 4c 8b 4d 
[15811.473409] NMI backtrace for cpu 1
[15811.474061] CPU: 1 PID: 2849 Comm: trinity-c95 Tainted: G             L 3.18.0+ #106
[15811.476098] task: ffff880096b4c470 ti: ffff880153a4c000 task.ti: ffff880153a4c000
[15811.476808] RIP: 0010:[<ffffffff810c63cf>]  [<ffffffff810c63cf>] lock_release+0x1f/0x240
[15811.477535] RSP: 0018:ffff880153a4fe40  EFLAGS: 00000246
[15811.478253] RAX: ffff880096b4c470 RBX: 0000000000000000 RCX: 00000000000003a0
[15811.478982] RDX: ffffffff81086d0c RSI: 0000000000000001 RDI: ffffffff81c50e20
[15811.479707] RBP: ffff880153a4fe48 R08: 0000000000000000 R09: 0000000000000000
[15811.480419] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880094be64c0
[15811.481115] R13: ffff880153a4feb0 R14: 0000000000000000 R15: 0000000000000000
[15811.481803] FS:  00007f79977d8740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15811.482504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15811.483200] CR2: 00007f7996f24220 CR3: 00000002251b4000 CR4: 00000000001407e0
[15811.483890] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 0000000000000000
[15811.484562] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15811.485221] Stack:
[15811.485862]  0000000000000000 ffff880153a4fe88 ffffffff81086d24 ffffffff81086ca5
[15811.486521]  0000000000000d7f 0000000000000d7f 0000000000000000 ffff880096[15835.393872] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/u16:3:14112]
[15835.394432] Modules linked in: bridge 8021q garp stp snd_seq_dummy dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15835.398378] CPU: 2 PID: 14112 Comm: kworker/u16:3 Tainted: G             L 3.18.0+ #106
[15835.400379] Workqueue: khelper __call_usermodehelper
[15835.401080] task: ffff8801c95f0000 ti: ffff880227eac000 task.ti: ffff880227eac000
[15835.401795] RIP: 0010:[<ffffffff817c3407>]  [<ffffffff817c3407>] __slab_alloc+0x52f/0x58f
[15835.402534] RSP: 0018:ffff880227eaf8f8  EFLAGS: 00000246
[15835.403252] RAX: 0000000000000002 RBX: ffff8802304cf5c8 RCX: 00000000000002e0
[15835.403995] RDX: ffff88024520d7e0 RSI: 0000000000000000 RDI: ffff880244802000
[15835.404727] RBP: ffff880227eaf9e8 R08: 0000000000000000 R09: 0000000000000000
[15835.405456] R10: 0000000000000092 R11: 0000000000000000 R12: ffffffff810135bf
[15835.406185] R13: ffff880227eaf878 R14: 0000000100160015 R15: ffffffff8138278d
[15835.406916] FS:  0000000000000000(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15835.407667] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15835.408410] CR2: 0000000000000008 CR3: 0000000225ab9000 CR4: 00000000001407e0
[15835.409165] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 0000000000000000
[15835.409924] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15835.410679] Stack:
[15835.411433]  000000000000005c ffff880240f8d790 ffff880240f8d790 ffff880240f8dd00
[15835.412226]  0000000180230020 000000010000000f ffffffff8112ee12 0000000000000000
[15835.413002]  ffff8802453d7260 000000020023001f ffff880227eaf968 ffffffff8138278d
[15835.413784] Call Trace:
[15835.414563]  [<ffffffff8112ee12>] ? __delayacct_tsk_init+0x22/0x50
[15835.415358]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
[15835.416162]  [<ffffffff811ce860>] ? set_track+0x70/0x140
[15835.416952]  [<ffffffff811cf35d>] ? init_object+0x3d/0x70
[15835.417745]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
[15835.418533]  [<ffffffff811d295b>] kmem_cache_alloc+0x1cb/0x1f0
[15835.419328]  [<ffffffff8138278d>] __debug_object_init+0x43d/0x450
[15835.420114]  [<ffffffff813827bb>] debug_object_init+0x1b/0x20
[15835.420904]  [<ffffffff810e66d5>] hrtimer_init+0x25/0xb0
[15835.421701]  [<ffffffff8109f069>] __sched_fork+0x99/0x230
[15835.422491]  [<ffffffff810a59c9>] sched_fork+0x29/0x200
[15835.423282]  [<ffffffff8107568c>] copy_process.part.26+0x65c/0x1a40
[15835.424077]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[15835.424865]  [<ffffffff810c512c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15835.425658]  [<ffffffff8108e340>] ? call_helper+0x20/0x20
[15835.426447]  [<ffffffff81076c37>] do_fork+0xe7/0x490
[15835.427239]  [<ffffffff810c512c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15835.428018]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
[15835.428776]  [<ffffffff81077006>] kernel_thread+0x26/0x30
[15835.429525]  [<ffffffff8108e1b4>] __call_usermodehelper+0x64/0x80
[15835.430259]  [<ffffffff8109301a>] process_one_work+0x1fa/0x550
[15835.430986]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
[15835.431694]  [<ffffffff8109348b>] worker_thread+0x11b/0x490
[15835.432374]  [<ffffffff81093370>] ? process_one_work+0x550/0x550
[15835.433049]  [<ffffffff81098c89>] kthread+0xf9/0x110
[15835.433694]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
[15835.434328]  [<ffffffff81098b90>] ? kthread_create_on_node+0x250/0x250
[15835.434947]  [<ffffffff817cfe6c>] ret_from_fork+0x7c/0xb0
[15835.435543]  [<ffffffff81098b90>] ? kthread_create_on_node+0x250/0x250
[15835.436148] Code: 00 02 00 00 49 c7 45 00 00 00 00 00 75 11 ff b5 78 ff ff ff 9d e8 7a 6d 98 ff 4c 89 e0 eb 0f e8 70 6e 98 ff ff b5 78 ff ff ff 9d <4c> 89 e0 48 8b 55 c8 65 48 33 14 25 28 00 00 00 74 3c e8 22 46 
[15835.437494] sending NMI to other CPUs:



^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-18  5:13                                                                                   ` Dave Jones
@ 2014-12-18 15:54                                                                                     ` Chris Mason
  2014-12-18 16:12                                                                                       ` Dave Jones
  2014-12-18 18:54                                                                                       ` Linus Torvalds
  0 siblings, 2 replies; 486+ messages in thread
From: Chris Mason @ 2014-12-18 15:54 UTC (permalink / raw)
  To: Dave Jones
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin



On Thu, Dec 18, 2014 at 12:13 AM, Dave Jones <davej@redhat.com> wrote:
> On Mon, Dec 15, 2014 at 03:46:41PM -0800, Linus Torvalds wrote:
>  > On Mon, Dec 15, 2014 at 10:21 AM, Linus Torvalds
>  > <torvalds@linux-foundation.org> wrote:
>  > >
>  > > So let's just fix it. Here's a completely untested patch.
>  >
>  > So after looking at this more, I'm actually really convinced that 
> this
>  > was a pretty nasty bug.
>  >
>  > I'm *not* convinced that it's necessarily *your* bug, but I still
>  > think it could be.
> 
> Bah, I was getting all optimistic.
> I came home this evening to a locked up machine.
> Serial console had a *lot* more traces than usual though.
> Full log below.  The 12xxx.xxxxxx traces we seemed to recover from,
> followed by silence for a while, before the real fun begins at 
> 157xx.xxxxxx

CPU 2 seems to be the one making the least progress.  I think he's 
calling fork and then trying to allocate a debug object for his 
hrtimer, eventually wandering into fill_pool from __debug_object_init():

static void fill_pool(void)
{
        gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
        struct debug_obj *new;
        unsigned long flags;

        if (likely(obj_pool_free >= ODEBUG_POOL_MIN_LEVEL))
                return;

        if (unlikely(!obj_cache))
                return;

        while (obj_pool_free < ODEBUG_POOL_MIN_LEVEL) {

                new = kmem_cache_zalloc(obj_cache, gfp);
                if (!new)
                        return;

                raw_spin_lock_irqsave(&pool_lock, flags);
                hlist_add_head(&new->node, &obj_pool);
                obj_pool_free++;
                raw_spin_unlock_irqrestore(&pool_lock, flags);
        }
}

It doesn't seem to be making progress out of __slab_alloc+0x52f/0x58f, 
but maybe the slab code is just a victim of being called in a while 
loop with GFP_ATOMIC set from a starvation prone loop.  Can you please 
line up where 0x52f is in __slab_alloc?

It might be fun to run with CONFIG_DEBUG_OBJECTS off...Linus' patch 
clearly helped, I think we're off in a different bug now.

> [12669.359905] Code: 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 
> 41 56 41 55 41 54 53 48 83 ec 48 44 8b 25 48 80 be 00 65 48 8b 1c 25 
> 00 aa 00 00 <45> 85 e4 0f 84 ef 00 00 00 44 8b 1d e7 32 ab 01 49 89 
> fe 41 89
> [12669.361354] INFO: NMI handler 
> (arch_trigger_all_cpu_backtrace_handler) took too long to run: 88.798 
> msecs
> [15739.449422] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! 
> [kworker/u16:3:14112]
> [15739.450087] Modules linked in: bridge 8021q garp stp snd_seq_dummy 
> dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm 
> scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif 
> af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox 
> ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx 
> p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon 
> x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel 
> ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek 
> snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel 
> snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device 
> snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd 
> auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
> [15739.454475] CPU: 2 PID: 14112 Comm: kworker/u16:3 Tainted: G       
>       L 3.18.0+ #106
> [15739.456686] Workqueue: khelper __call_usermodehelper
> [15739.457473] task: ffff8801c95f0000 ti: ffff880227eac000 task.ti: 
> ffff880227eac000
> [15739.458231] RIP: 0010:[<ffffffff817c3407>]  [<ffffffff817c3407>] 
> __slab_alloc+0x52f/0x58f
> [15739.459015] RSP: 0018:ffff880227eaf8f8  EFLAGS: 00000246
> [15739.459794] RAX: 0000000000000002 RBX: ffff8802304cf5c8 RCX: 
> 00000000000002e0
> [15739.460570] RDX: ffff88024520d7e0 RSI: 0000000000000000 RDI: 
> ffff880244802000
> [15739.461343] RBP: ffff880227eaf9e8 R08: 0000000000000000 R09: 
> 0000000000000000
> [15739.462113] R10: 0000000000000092 R11: 0000000000000000 R12: 
> ffffffff810135bf
> [15739.462885] R13: ffff880227eaf878 R14: 0000000100160015 R15: 
> ffffffff8138278d
> [15739.463648] FS:  0000000000000000(0000) GS:ffff880245200000(0000) 
> knlGS:0000000000000000
> [15739.464431] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15739.465203] CR2: 0000000000000008 CR3: 0000000225ab9000 CR4: 
> 00000000001407e0
> [15739.465985] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [15739.466766] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000600
> [15739.467542] Stack:
> [15739.468308]  000000000000005c ffff880240f8d790 ffff880240f8d790 
> ffff880240f8dd00
> [15739.469106]  0000000180230020 000000010000000f ffffffff8112ee12 
> 0000000000000000
> [15739.469912]  ffff8802453d7260 000000020023001f ffff880227eaf968 
> ffffffff8138278d
> [15739.470717] Call Trace:
> [15739.471511]  [<ffffffff8112ee12>] ? __delayacct_tsk_init+0x22/0x50
> [15739.472325]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
> [15739.473138]  [<ffffffff811ce860>] ? set_track+0x70/0x140
> [15739.473947]  [<ffffffff811cf35d>] ? init_object+0x3d/0x70
> [15739.474757]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
> [15739.475571]  [<ffffffff811d295b>] kmem_cache_alloc+0x1cb/0x1f0
> [15739.476374]  [<ffffffff8138278d>] __debug_object_init+0x43d/0x450
> [15739.477174]  [<ffffffff813827bb>] debug_object_init+0x1b/0x20
> [15739.477983]  [<ffffffff810e66d5>] hrtimer_init+0x25/0xb0
> [15739.478781]  [<ffffffff8109f069>] __sched_fork+0x99/0x230
> [15739.479590]  [<ffffffff810a59c9>] sched_fork+0x29/0x200
> [15739.480387]  [<ffffffff8107568c>] copy_process.part.26+0x65c/0x1a40
> [15739.481184]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
> [15739.481983]  [<ffffffff810c512c>] ? 
> __lock_acquire.isra.31+0x22c/0x9f0
> [15739.482797]  [<ffffffff8108e340>] ? call_helper+0x20/0x20
> [15739.483598]  [<ffffffff81076c37>] do_fork+0xe7/0x490
> [15739.484382]  [<ffffffff810c512c>] ? 
> __lock_acquire.isra.31+0x22c/0x9f0
> [15739.485160]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
> [15739.485922]  [<ffffffff81077006>] kernel_thread+0x26/0x30
> [15739.486669]  [<ffffffff8108e1b4>] __call_usermodehelper+0x64/0x80
> [15739.487407]  [<ffffffff8109301a>] process_one_work+0x1fa/0x550
> [15739.488126]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
> [15739.488834]  [<ffffffff8109348b>] worker_thread+0x11b/0x490
> [15739.489529]  [<ffffffff81093370>] ? process_one_work+0x550/0x550
> [15739.490197]  [<ffffffff81098c89>] kthread+0xf9/0x110
> [15739.490845]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
> [15739.491474]  [<ffffffff81098b90>] ? 
> kthread_create_on_node+0x250/0x250
> [15739.492092]  [<ffffffff817cfe6c>] ret_from_fork+0x7c/0xb0
> [15739.492689]  [<ffffffff81098b90>] ? 
> kthread_create_on_node+0x250/0x250
> [15739.493288] Code: 00 02 00 00 49 c7 45 00 00 00 00 00 75 11 ff b5 
> 78 ff ff ff 9d e8 7a 6d 98 ff 4c 89 e0 eb 0f e8 70 6e 98 ff ff b5 78 
> ff ff ff 9d <4c> 89 e0 48 8b 55 c8 65 48 33 14 25 28 00 00 00 74 3c 
> e8 22 46
> [15739.494617] sending NMI to other CPUs:
> [15739.495221] NMI backtrace for cpu 3
> [15739.495787] CPU: 3 PID: 1650 Comm: trinity-c76 Tainted: G          
>    L 3.18.0+ #106
> [15739.497539] task: ffff8801adac4470 ti: ffff880180e04000 task.ti: 
> ffff880180e04000
> [15739.498153] RIP: 0010:[<ffffffff810c6014>]  [<ffffffff810c6014>] 
> lock_acquire+0xb4/0x120
> [15739.498772] RSP: 0018:ffff880180e07dd8  EFLAGS: 00000246
> [15739.499382] RAX: ffff8801adac4470 RBX: 0000000000000246 RCX: 
> ffff8802455cff98
> [15739.499994] RDX: 00000000000006a0 RSI: 0000000000000000 RDI: 
> 0000000000000000
> [15739.500609] RBP: ffff880180e07e38 R08: 0000000000000000 R09: 
> 0000000000000000
> [15739.501224] R10: 0000000000000000 R11: 0000000000000000 R12: 
> 0000000000000000
> [15739.501829] R13: 0000000000000000 R14: 0000000000000002 R15: 
> 0000000000000000
> [15739.502422] FS:  00007f79977d8740(0000) GS:ffff880245400000(0000) 
> knlGS:0000000000000000
> [15739.503021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15739.503621] CR2: 0000000000000001 CR3: 00000001c9593000 CR4: 
> 00000000001407e0
> [15739.504217] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [15739.504812] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000600
> [15739.505403] Stack:
> [15739.505988]  ffffffff8107a8e9 0000000000000000 ffff8801adac4470 
> 0000000000000246
> [15739.506603]  0000000127cc4da0 ffffffff81c0a098 ffff8801adac4470 
> ffffffff81c0a080
> [15739.507220]  ffffffff81c0a098 ffff8801adac4470 ffff8801adac4470 
> ffff8801adac4470
> [15739.507839] Call Trace:
> [15739.508448]  [<ffffffff8107a8e9>] ? do_wait+0xd9/0x280
> [15739.509064]  [<ffffffff817cf3d1>] _raw_read_lock+0x41/0x80
> [15739.509678]  [<ffffffff8107a8e9>] ? do_wait+0xd9/0x280
> [15739.510293]  [<ffffffff8107a8e9>] do_wait+0xd9/0x280
> [15739.510907]  [<ffffffff8107aeb0>] SyS_wait4+0x80/0x110
> [15739.511518]  [<ffffffff81078990>] ? task_stopped_code+0x60/0x60
> [15739.512130]  [<ffffffff817d0109>] tracesys_phase2+0xd4/0xd9
> [15739.512739] Code: d8 49 c1 e8 09 48 89 04 24 49 83 f0 01 41 83 e0 
> 01 e8 01 ef ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 
> 00 00 53 9d <48> 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 
> 00 00 65
> [15739.514121] NMI backtrace for cpu 1
> [15739.514769] CPU: 1 PID: 2849 Comm: trinity-c95 Tainted: G          
>    L 3.18.0+ #106
> [15739.516771] task: ffff880096b4c470 ti: ffff880153a4c000 task.ti: 
> ffff880153a4c000
> [15739.517467] RIP: 0010:[<ffffffff810961f9>]  [<ffffffff810961f9>] 
> find_pid_ns+0x39/0x90
> [15739.518177] RSP: 0018:ffff880153a4fe78  EFLAGS: 00000207
> [15739.518884] RAX: ffff88024e517120 RBX: 0000000000000d7f RCX: 
> 0000000000000034
> [15739.519599] RDX: ffff880094be6500 RSI: ffffffff81c486c0 RDI: 
> 0000000000000d7f
> [15739.520316] RBP: ffff880153a4fe78 R08: 0000000000000000 R09: 
> 0000000000000000
> [15739.521036] R10: 0000000000000000 R11: 0000000000000000 R12: 
> 0000000000000000
> [15739.521752] R13: ffff880096b4c470 R14: 0000000000000000 R15: 
> 0000000000000000
> [15739.522464] FS:  00007f79977d8740(0000) GS:ffff880245000000(0000) 
> knlGS:0000000000000000
> [15739.523181] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15739.523882] CR2: 00007f7996f24220 CR3: 00000002251b4000 CR4: 
> 00000000001407e0
> [15739.524580] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [15739.525269] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000600
> [15739.525957] Stack:
> [15739.526632]  ffff880153a4fe88 ffffffff8109627f ffff880153a4ff68 
> ffffffff81086e2a
> [15739.527319]  ffffffff81086df8 0000000000000000 ffff880096b4c840 
> 0000000000000000
> [15739.527992]  ffff880100000000 000003e800000b21 ffff880094be64c0 
> 00007fff123fb870
> [15739.528656] Call Trace:
> [15739.529301]  [<ffffffff8109627f>] find_vpid+0x2f/0x50
> [15739.529947]  [<ffffffff81086e2a>] SYSC_kill+0xba/0x240
> [15739.530587]  [<ffffffff81086df8>] ? SYSC_kill+0x88/0x240
> [15739[15763.435537] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 
> 22s! [kworker/u16:3:14112]
> [15763.436252] Modules linked in: bridge 8021q garp stp snd_seq_dummy 
> dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm 
> scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif 
> af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox 
> ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx 
> p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon 
> x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel 
> ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek 
> snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel 
> snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device 
> snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd 
> auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
> [15763.441025] CPU: 2 PID: 14112 Comm: kworker/u16:3 Tainted: G       
>       L 3.18.0+ #106
> [15763.443320] Workqueue: khelper __call_usermodehelper
> [15763.444090] task: ffff8801c95f0000 ti: ffff880227eac000 task.ti: 
> ffff880227eac000
> [15763.444861] RIP: 0010:[<ffffffff817c3407>]  [<ffffffff817c3407>] 
> __slab_alloc+0x52f/0x58f
> [15763.445654] RSP: 0018:ffff880227eaf8f8  EFLAGS: 00000246
> [15763.446422] RAX: 0000000000000002 RBX: ffff8802304cf5c8 RCX: 
> 00000000000002e0
> [15763.447201] RDX: ffff88024520d7e0 RSI: 0000000000000000 RDI: 
> ffff880244802000
> [15763.447985] RBP: ffff880227eaf9e8 R08: 0000000000000000 R09: 
> 0000000000000000
> [15763.448764] R10: 0000000000000092 R11: 0000000000000000 R12: 
> ffffffff810135bf
> [15763.449545] R13: ffff880227eaf878 R14: 0000000100160015 R15: 
> ffffffff8138278d
> [15763.450319] FS:  0000000000000000(0000) GS:ffff880245200000(0000) 
> knlGS:0000000000000000
> [15763.451100] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15763.451876] CR2: 0000000000000008 CR3: 0000000225ab9000 CR4: 
> 00000000001407e0
> [15763.452662] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [15763.453451] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000600
> [15763.454230] Stack:
> [15763.455003]  000000000000005c ffff880240f8d790 ffff880240f8d790 
> ffff880240f8dd00
> [15763.455818]  0000000180230020 000000010000000f ffffffff8112ee12 
> 0000000000000000
> [15763.456608]  ffff8802453d7260 000000020023001f ffff880227eaf968 
> ffffffff8138278d
> [15763.457393] Call Trace:
> [15763.458166]  [<ffffffff8112ee12>] ? __delayacct_tsk_init+0x22/0x50
> [15763.458960]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
> [15763.459753]  [<ffffffff811ce860>] ? set_track+0x70/0x140
> [15763.460544]  [<ffffffff811cf35d>] ? init_object+0x3d/0x70
> [15763.461338]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
> [15763.462127]  [<ffffffff811d295b>] kmem_cache_alloc+0x1cb/0x1f0
> [15763.462921]  [<ffffffff8138278d>] __debug_object_init+0x43d/0x450
> [15763.463707]  [<ffffffff813827bb>] debug_object_init+0x1b/0x20
> [15763.464503]  [<ffffffff810e66d5>] hrtimer_init+0x25/0xb0
> [15763.465294]  [<ffffffff8109f069>] __sched_fork+0x99/0x230
> [15763.466086]  [<ffffffff810a59c9>] sched_fork+0x29/0x200
> [15763.466875]  [<ffffffff8107568c>] copy_process.part.26+0x65c/0x1a40
> [15763.467664]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
> [15763.468458]  [<ffffffff810c512c>] ? 
> __lock_acquire.isra.31+0x22c/0x9f0
> [15763.469251]  [<ffffffff8108e340>] ? call_helper+0x20/0x20
> [15763.470041]  [<ffffffff81076c37>] do_fork+0xe7/0x490
> [15763.470828]  [<ffffffff810c512c>] ? 
> __lock_acquire.isra.31+0x22c/0x9f0
> [15763.471603]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
> [15763.472371]  [<ffffffff81077006>] kernel_thread+0x26/0x30
> [15763.473120]  [<ffffffff8108e1b4>] __call_usermodehelper+0x64/0x80
> [15763.473854]  [<ffffffff8109301a>] process_one_work+0x1fa/0x550
> [15763.474578]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
> [15763.475287]  [<ffffffff8109348b>] worker_thread+0x11b/0x490
> [15763.475975]  [<ffffffff81093370>] ? process_one_work+0x550/0x550
> [15763.476640]  [<ffffffff81098c89>] kthread+0xf9/0x110
> [15763.477285]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
> [15763.477915]  [<ffffffff81098b90>] ? 
> kthread_create_on_node+0x250/0x250
> [15763.478532]  [<ffffffff817cfe6c>] ret_from_fork+0x7c/0xb0
> [15763.479131]  [<ffffffff81098b90>] ? 
> kthread_create_on_node+0x250/0x250
> [15763.479741] Code: 00 02 00 00 49 c7 45 00 00 00 00 00 75 11 ff b5 
> 78 ff ff ff 9d e8 7a 6d 98 ff 4c 89 e0 eb 0f e8 70 6e 98 ff ff b5 78 
> ff ff ff 9d <4c> 89 e0 48 8b 55 c8 65 48 33 14 25 28 00 00 00 74 3c 
> e8 22 46
> [15763.481087] sending NMI to other CPUs:
> [15763.481693] NMI backtrace for cpu 3
> [15763.482255] CPU: 3 PID: 1650 Comm: trinity-c76 Tainted: G          
>    L 3.18.0+ #106
> [15763.484005] task: ffff8801adac4470 ti: ffff880180e04000 task.ti: 
> ffff880180e04000
> [15763.484618] RIP: 0010:[<ffffffff810c47f5>]  [<ffffffff810c47f5>] 
> lock_acquired+0x45/0x370
> [15763.485234] RSP: 0018:ffff880180e07db8  EFLAGS: 00000046
> [15763.485841] RAX: 0000000000000001 RBX: ffff880227cc4da0 RCX: 
> 0000000000000001
> [15763.486453] RDX: 000000000000dbdb RSI: ffffffff810bcc8d RDI: 
> ffff880227cc4db8
> [15763.487068] RBP: ffff880180e07df8 R08: 0000000000000000 R09: 
> 0000000000000001
> [15763.487687] R10: 0000000000000000 R11: 0000000000000000 R12: 
> ffff8801adac4470
> [15763.488303] R13: ffff880227cc4db8 R14: 0000000000000046 R15: 
> ffff8801adac4460
> [15763.488909] FS:  00007f79977d8740(0000) GS:ffff880245400000(0000) 
> knlGS:0000000000000000
> [15763.489522] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15763.490133] CR2: 0000000000000001 CR3: 00000001c9593000 CR4: 
> 00000000001407e0
> [15763.490747] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [15763.491356] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000600
> [15763.491964] Stack:
> [15763.492564]  0000000100000002 ffffffff810bcc8d ffff880180e07dd8 
> ffff880227cc4da0
> [15763.493194]  ffff880227cc4db8 0000000000000292 ffff8801adac4470 
> ffff8801adac4460
> [15763.493828]  ffff880180e07e38 ffffffff817cf0d5 ffffffff810bcc8d 
> 0000000000000296
> [15763.494464] Call Trace:
> [15763.495088]  [<ffffffff810bcc8d>] ? remove_wait_queue+0x1d/0x40
> [15763.495726]  [<ffffffff817cf0d5>] _raw_spin_lock_irqsave+0x75/0x90
> [15763.496364]  [<ffffffff810bcc8d>] ? remove_wait_queue+0x1d/0x40
> [15763.497005]  [<ffffffff810bcc8d>] remove_wait_queue+0x1d/0x40
> [15763.497645]  [<ffffffff8107a95b>] do_wait+0x14b/0x280
> [15763.498283]  [<ffffffff8107aeb0>] SyS_wait4+0x80/0x110
> [15763.498917]  [<ffffffff81078990>] ? task_stopped_code+0x60/0x60
> [15763.499551]  [<ffffffff817d0109>] tracesys_phase2+0xd4/0xd9
> [15763.500183] Code: b8 00 45 85 c9 0f 84 d8 00 00 00 65 4c 8b 24 25 
> 00 aa 00 00 45 8b 84 24 6c 07 00 00 45 85 c0 0f 85 be 00 00 00 49 89 
> fd 9c 41 5e <fa> 8b 35 1c 3a ab 01 41 c7 84 24 6c 07 00 00 01 00 00 
> 00 41 8b
> [15763.501611] NMI backtrace for cpu 1
> [15763.502274] CPU: 1 PID: 3298 Comm: trinity-c183 Tainted: G         
>     L 3.18.0+ #106
> [15763.504345] task: ffff880227de16d0 ti: ffff880071060000 task.ti: 
> ffff880071060000
> [15763.505065] RIP: 0033:[<000000336eebc2fc>]  [<000000336eebc2fc>] 
> 0x336eebc2fc
> [15763.505796] RSP: 002b:00007fff123fb868  EFLAGS: 00000246
> [15763.506522] RAX: 0000000000000000 RBX: 0000000000000d7b RCX: 
> ffffffffffffffff
> [15763.507257] RDX: 000000000000000b RSI: 00007fff123fb870 RDI: 
> 0000000000000d7b
> [15763.507994] RBP: 0000000000000000 R08: 00007f79977d8740 R09: 
> 0000000000000000
> [15763.508729] R10: 0000000000000000 R11: 0000000000000246 R12: 
> 00007f7996cf6000
> [15763.509460] R13: 00007f7996cf6068 R14: 0000000000000000 R15: 
> 0000000000000000
> [15763.510188] FS:  00007f79977d8740(0000) GS:ffff880245000000(0000) 
> knlGS:0000000000000000
> [15763.510908] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15763.511612] CR2: 00007f7996f24220 CR3: 000000009a646000 CR4: 
> 00000000001407e0
> [15763.512319] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [15763.513027] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000600
> [15763.513727]
> [15763.514403] NMI backtrace for cpu 0
> [15763.515067] CPU: 0 PID: 1876 Comm: trinity-c189 Tainted: G         
>     L 3.18.0+ #106
> [15763.517087] task: ffff880096b4ada0 ti: ffff8802253d0000 task.ti: 
> ffff8802253d0000
> [15763.517776] RIP: 0010:[<[15779.306349] INFO: rcu_sched detected 
> stalls on CPUs/tasks:
> [15779.307024] 	(detected by 0, t=6002 jiffies, g=481360, c=481359, 
> q=0)
> [15779.307662] INFO: Stall ended before state dump start
> [15787.421647] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! 
> [kworker/u16:3:14112]
> [15787.422333] Modules linked in: bridge 8021q garp stp snd_seq_dummy 
> dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm 
> scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif 
> af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox 
> ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx 
> p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon 
> x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel 
> ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek 
> snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel 
> snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device 
> snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd 
> auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
> [15787.426946] CPU: 2 PID: 14112 Comm: kworker/u16:3 Tainted: G       
>       L 3.18.0+ #106
> [15787.429181] Workqueue: khelper __call_usermodehelper
> [15787.429945] task: ffff8801c95f0000 ti: ffff880227eac000 task.ti: 
> ffff880227eac000
> [15787.430719] RIP: 0010:[<ffffffff817c3407>]  [<ffffffff817c3407>] 
> __slab_alloc+0x52f/0x58f
> [15787.431507] RSP: 0018:ffff880227eaf8f8  EFLAGS: 00000246
> [15787.432297] RAX: 0000000000000002 RBX: ffff8802304cf5c8 RCX: 
> 00000000000002e0
> [15787.433094] RDX: ffff88024520d7e0 RSI: 0000000000000000 RDI: 
> ffff880244802000
> [15787.433886] RBP: ffff880227eaf9e8 R08: 0000000000000000 R09: 
> 0000000000000000
> [15787.434671] R10: 0000000000000092 R11: 0000000000000000 R12: 
> ffffffff810135bf
> [15787.435452] R13: ffff880227eaf878 R14: 0000000100160015 R15: 
> ffffffff8138278d
> [15787.436234] FS:  0000000000000000(0000) GS:ffff880245200000(0000) 
> knlGS:0000000000000000
> [15787.437028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15787.437816] CR2: 0000000000000008 CR3: 0000000225ab9000 CR4: 
> 00000000001407e0
> [15787.438605] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [15787.439400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000600
> [15787.440183] Stack:
> [15787.440959]  000000000000005c ffff880240f8d790 ffff880240f8d790 
> ffff880240f8dd00
> [15787.441792]  0000000180230020 000000010000000f ffffffff8112ee12 
> 0000000000000000
> [15787.442615]  ffff8802453d7260 000000020023001f ffff880227eaf968 
> ffffffff8138278d
> [15787.443422] Call Trace:
> [15787.444217]  [<ffffffff8112ee12>] ? __delayacct_tsk_init+0x22/0x50
> [15787.445027]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
> [15787.445833]  [<ffffffff811ce860>] ? set_track+0x70/0x140
> [15787.446647]  [<ffffffff811cf35d>] ? init_object+0x3d/0x70
> [15787.447449]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
> [15787.448258]  [<ffffffff811d295b>] kmem_cache_alloc+0x1cb/0x1f0
> [15787.449067]  [<ffffffff8138278d>] __debug_object_init+0x43d/0x450
> [15787.449879]  [<ffffffff813827bb>] debug_object_init+0x1b/0x20
> [15787.450688]  [<ffffffff810e66d5>] hrtimer_init+0x25/0xb0
> [15787.451497]  [<ffffffff8109f069>] __sched_fork+0x99/0x230
> [15787.452307]  [<ffffffff810a59c9>] sched_fork+0x29/0x200
> [15787.453122]  [<ffffffff8107568c>] copy_process.part.26+0x65c/0x1a40
> [15787.453928]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
> [15787.454733]  [<ffffffff810c512c>] ? 
> __lock_acquire.isra.31+0x22c/0x9f0
> [15787.455524]  [<ffffffff8108e340>] ? call_helper+0x20/0x20
> [15787.456299]  [<ffffffff81076c37>] do_fork+0xe7/0x490
> [15787.457062]  [<ffffffff810c512c>] ? 
> __lock_acquire.isra.31+0x22c/0x9f0
> [15787.457826]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
> [15787.458587]  [<ffffffff81077006>] kernel_thread+0x26/0x30
> [15787.459336]  [<ffffffff8108e1b4>] __call_usermodehelper+0x64/0x80
> [15787.460078]  [<ffffffff8109301a>] process_one_work+0x1fa/0x550
> [15787.460794]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
> [15787.461495]  [<ffffffff8109348b>] worker_thread+0x11b/0x490
> [15787.462182]  [<ffffffff81093370>] ? process_one_work+0x550/0x550
> [15787.462850]  [<ffffffff81098c89>] kthread+0xf9/0x110
> [15787.463496]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
> [15787.464124]  [<ffffffff81098b90>] ? 
> kthread_create_on_node+0x250/0x250
> [15787.464742]  [<ffffffff817cfe6c>] ret_from_fork+0x7c/0xb0
> [15787.465341]  [<ffffffff81098b90>] ? 
> kthread_create_on_node+0x250/0x250
> [15787.465947] Code: 00 02 00 00 49 c7 45 00 00 00 00 00 75 11 ff b5 
> 78 ff ff ff 9d e8 7a 6d 98 ff 4c 89 e0 eb 0f e8 70 6e 98 ff ff b5 78 
> ff ff ff 9d <4c> 89 e0 48 8b 55 c8 65 48 33 14 25 28 00 00 00 74 3c 
> e8 22 46
> [15787.467291] sending NMI to other CPUs:
> [15787.467904] NMI backtrace for cpu 3
> [15787.468463] CPU: 3 PID: 1650 Comm: trinity-c76 Tainted: G          
>    L 3.18.0+ #106
> [15787.470211] task: ffff8801adac4470 ti: ffff880180e04000 task.ti: 
> ffff880180e04000
> [15787.470824] RIP: 0033:[<000000336eebc2fc>]  [<000000336eebc2fc>] 
> 0x336eebc2fc
> [15787.471440] RSP: 002b:00007fff123fb868  EFLAGS: 00000246
> [15787.472043] RAX: 0000000000000000 RBX: 0000000000000d7e RCX: 
> ffffffffffffffff
> [15787.472652] RDX: 000000000000000b RSI: 00007fff123fb870 RDI: 
> 0000000000000d7e
> [15787.473259] RBP: 0000000000000000 R08: 00007f79977d8740 R09: 
> 0000000000000000
> [15787.473868] R10: 0000000000000000 R11: 0000000000000246 R12: 
> 00007f7997265000
> [15787.474468] R13: 00007f7997265068 R14: 0000000000000000 R15: 
> 0000000000000000
> [15787.475063] FS:  00007f79977d8740(0000) GS:ffff880245400000(0000) 
> knlGS:0000000000000000
> [15787.475660] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15787.476254] CR2: 0000000000000001 CR3: 00000001c9593000 CR4: 
> 00000000001407e0
> [15787.476851] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [15787.477442] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000600
> [15787.478032]
> [15787.478615] NMI backtrace for cpu 1
> [15787.479205] CPU: 1 PID: 2849 Comm: trinity-c95 Tainted: G          
>    L 3.18.0+ #106
> [15787.481052] task: ffff880096b4c470 ti: ffff880153a4c000 task.ti: 
> ffff880153a4c000
> [15787.481696] RIP: 0010:[<ffffffff817cfea0>]  [<ffffffff817cfea0>] 
> system_call+0x0/0x3
> [15787.482351] RSP: 0018:00007fff123fb868  EFLAGS: 00000046
> [15787.483007] RAX: 000000000000003d RBX: 0000000000000d7f RCX: 
> 000000336eebc2fc
> [15787.483674] RDX: 000000000000000b RSI: 00007fff123fb870 RDI: 
> 0000000000000d7f
> [15787.484338] RBP: 0000000000000000 R08: 00007f79977d8740 R09: 
> 0000000000000000
> [15787.485000] R10: 0000000000000000 R11: 0000000000000246 R12: 
> 00007f799716e000
> [15787.485658] R13: 00007f799716e068 R14: 0000000000000000 R15: 
> 0000000000000000
> [15787.486314] FS:  00007f79977d8740(0000) GS:ffff880245000000(0000) 
> knlGS:0000000000000000
> [15787.486979] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15787.487641] CR2: 00007f7996f24220 CR3: 00000002251b4000 CR4: 
> 00000000001407e0
> [15787.488311] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [15787.488982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000600
> [15787.489650] Stack:
> [15787.490312]  000000000041401b 00007f79976944e8 00007f799716e000 
> 0000000000000001
> [15787.490999]  00007f799716e07c 00007f799716e000 0000000000000000 
> 00007f799716e07c
> [15787.491690]  0000000000416c03 00000000000066b2 0000000000000155 
> 00000000cccccccd
> [15787.492383] Call Trace:
> [15787.493063]  <UNK>
> [15787.493070] Code: 8b 3c 24 4c 8b 74 24 08 4c 8b 6c 24 10 4c 8b 64 
> 24 18 48 8b 6c 24 20 48 8b 5c 24 28 48 83 c4 30 e9 74 02 00 00 66 0f 
> 1f 44 00 00 <0f> 01 f8 65 48 89 24 25 80 a0 00 00 65 48 8b 24 25 08 
> aa 00 00
> [15787.495270] NMI backtrace for cpu 0
> [15787.495987] CPU: 0 PID: 1876 Comm: trinity-c189 Tainted: G         
>     L 3.18.0+ #106
> [15787.498136] task: ffff880096b4ada0 ti: ffff8802253d0000 task.ti: 
> ffff8802253d0000
> [15787.498870] RIP: 0010:[<ffffffff810c9698>]  [<ffffffff810c9698>] 
> do_raw_spin_trylock+0x8/0x50
> [15787.499618] RSP: 0018:ffff880244e03d00  EFLAGS: 00000092
> [15787.500344] RAX: ffff880096b4ada0 RBX: ffff880240c51578 RCX: 
> ffff880244fcff98
> [15787.501064] RDX: 0000000000004a4a RSI: 0000000000000018 RDI: 
> ffff880240c51578
> [15787.501780] RBP: ffff880244e03d38 R08: 0000000000000001 R09: 
> 0000000000000000
> [15787.502487] R10: 0000000000000000 R11: 0000000000000000 R12: 
> ffff880240c51590
> [15787.503190] R13: 0000000000000092 R14: ff[15811.407761] NMI 
> watchdog: BUG: soft lockup - CPU#2 stuck for 22s! 
> [kworker/u16:3:14112]
> [15811.408334] Modules linked in: bridge 8021q garp stp snd_seq_dummy 
> dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm 
> scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif 
> af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox 
> ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx 
> p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon 
> x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel 
> ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek 
> snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel 
> snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device 
> snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd 
> auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
> [15811.412296] CPU: 2 PID: 14112 Comm: kworker/u16:3 Tainted: G       
>       L 3.18.0+ #106
> [15811.414324] Workqueue: khelper __call_usermodehelper
> [15811.415025] task: ffff8801c95f0000 ti: ffff880227eac000 task.ti: 
> ffff880227eac000
> [15811.415733] RIP: 0010:[<ffffffff817c3407>]  [<ffffffff817c3407>] 
> __slab_alloc+0x52f/0x58f
> [15811.416455] RSP: 0018:ffff880227eaf8f8  EFLAGS: 00000246
> [15811.417171] RAX: 0000000000000002 RBX: ffff8802304cf5c8 RCX: 
> 00000000000002e0
> [15811.417913] RDX: ffff88024520d7e0 RSI: 0000000000000000 RDI: 
> ffff880244802000
> [15811.418645] RBP: ffff880227eaf9e8 R08: 0000000000000000 R09: 
> 0000000000000000
> [15811.419374] R10: 0000000000000092 R11: 0000000000000000 R12: 
> ffffffff810135bf
> [15811.420102] R13: ffff880227eaf878 R14: 0000000100160015 R15: 
> ffffffff8138278d
> [15811.420831] FS:  0000000000000000(0000) GS:ffff880245200000(0000) 
> knlGS:0000000000000000
> [15811.421572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15811.422317] CR2: 0000000000000008 CR3: 0000000225ab9000 CR4: 
> 00000000001407e0
> [15811.423070] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [15811.423828] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000600
> [15811.424583] Stack:
> [15811.425327]  000000000000005c ffff880240f8d790 ffff880240f8d790 
> ffff880240f8dd00
> [15811.426099]  0000000180230020 000000010000000f ffffffff8112ee12 
> 0000000000000000
> [15811.426872]  ffff8802453d7260 000000020023001f ffff880227eaf968 
> ffffffff8138278d
> [15811.427657] Call Trace:
> [15811.428437]  [<ffffffff8112ee12>] ? __delayacct_tsk_init+0x22/0x50
> [15811.429231]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
> [15811.430025]  [<ffffffff811ce860>] ? set_track+0x70/0x140
> [15811.430814]  [<ffffffff811cf35d>] ? init_object+0x3d/0x70
> [15811.431600]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
> [15811.432394]  [<ffffffff811d295b>] kmem_cache_alloc+0x1cb/0x1f0
> [15811.433180]  [<ffffffff8138278d>] __debug_object_init+0x43d/0x450
> [15811.433972]  [<ffffffff813827bb>] debug_object_init+0x1b/0x20
> [15811.434768]  [<ffffffff810e66d5>] hrtimer_init+0x25/0xb0
> [15811.435559]  [<ffffffff8109f069>] __sched_fork+0x99/0x230
> [15811.436349]  [<ffffffff810a59c9>] sched_fork+0x29/0x200
> [15811.437141]  [<ffffffff8107568c>] copy_process.part.26+0x65c/0x1a40
> [15811.437937]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
> [15811.438725]  [<ffffffff810c512c>] ? 
> __lock_acquire.isra.31+0x22c/0x9f0
> [15811.439519]  [<ffffffff8108e340>] ? call_helper+0x20/0x20
> [15811.440307]  [<ffffffff81076c37>] do_fork+0xe7/0x490
> [15811.441094]  [<ffffffff810c512c>] ? 
> __lock_acquire.isra.31+0x22c/0x9f0
> [15811.441871]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
> [15811.442637]  [<ffffffff81077006>] kernel_thread+0x26/0x30
> [15811.443385]  [<ffffffff8108e1b4>] __call_usermodehelper+0x64/0x80
> [15811.444121]  [<ffffffff8109301a>] process_one_work+0x1fa/0x550
> [15811.444838]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
> [15811.445538]  [<ffffffff8109348b>] worker_thread+0x11b/0x490
> [15811.446219]  [<ffffffff81093370>] ? process_one_work+0x550/0x550
> [15811.446883]  [<ffffffff81098c89>] kthread+0xf9/0x110
> [15811.447527]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
> [15811.448159]  [<ffffffff81098b90>] ? 
> kthread_create_on_node+0x250/0x250
> [15811.448776]  [<ffffffff817cfe6c>] ret_from_fork+0x7c/0xb0
> [15811.449371]  [<ffffffff81098b90>] ? 
> kthread_create_on_node+0x250/0x250
> [15811.449968] Code: 00 02 00 00 49 c7 45 00 00 00 00 00 75 11 ff b5 
> 78 ff ff ff 9d e8 7a 6d 98 ff 4c 89 e0 eb 0f e8 70 6e 98 ff ff b5 78 
> ff ff ff 9d <4c> 89 e0 48 8b 55 c8 65 48 33 14 25 28 00 00 00 74 3c 
> e8 22 46
> [15811.451292] sending NMI to other CPUs:
> [15811.451901] NMI backtrace for cpu 3
> [15811.452459] CPU: 3 PID: 1650 Comm: trinity-c76 Tainted: G          
>    L 3.18.0+ #106
> [15811.454206] task: ffff8801adac4470 ti: ffff880180e04000 task.ti: 
> ffff880180e04000
> [15811.454817] RIP: 0010:[<ffffffff810c511a>]  [<ffffffff810c511a>] 
> __lock_acquire.isra.31+0x21a/0x9f0
> [15811.455437] RSP: 0018:ffff880180e07d18  EFLAGS: 00000002
> [15811.456045] RAX: 0000000000000008 RBX: ffff8801adac4470 RCX: 
> 0000000000000000
> [15811.456660] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
> 0000000000000000
> [15811.457274] RBP: ffff880180e07d88 R08: 0000000000000001 R09: 
> 0000000000000000
> [15811.457889] R10: 0000000000000000 R11: 0000000000000000 R12: 
> 000000000000014e
> [15811.458493] R13: 0000000000000000 R14: ffff880227cc4db8 R15: 
> ffff8801adac4be0
> [15811.459092] FS:  00007f79977d8740(0000) GS:ffff880245400000(0000) 
> knlGS:0000000000000000
> [15811.459699] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15811.460303] CR2: 0000000000000001 CR3: 00000001c9593000 CR4: 
> 00000000001407e0
> [15811.460907] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [15811.461507] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000600
> [15811.462108] Stack:
> [15811.462702]  ffff880180e07d98 ffffffff810c512c 0000000000000102 
> 0000000000000000
> [15811.463325]  ffff880180e07d48 ffffffff810abaf5 ffff880180e07dc8 
> 0000000000000000
> [15811.463945]  ffff880180e07dd8 0000000000000046 0000000000000000 
> 0000000000000000
> [15811.464564] Call Trace:
> [15811.465173]  [<ffffffff810c512c>] ? 
> __lock_acquire.isra.31+0x22c/0x9f0
> [15811.465797]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
> [15811.466421]  [<ffffffff810c5fff>] lock_acquire+0x9f/0x120
> [15811.467047]  [<ffffffff810bcc8d>] ? remove_wait_queue+0x1d/0x40
> [15811.467672]  [<ffffffff817cf0a9>] _raw_spin_lock_irqsave+0x49/0x90
> [15811.468300]  [<ffffffff810bcc8d>] ? remove_wait_queue+0x1d/0x40
> [15811.468924]  [<ffffffff810bcc8d>] remove_wait_queue+0x1d/0x40
> [15811.469546]  [<ffffffff8107a95b>] do_wait+0x14b/0x280
> [15811.470166]  [<ffffffff8107aeb0>] SyS_wait4+0x80/0x110
> [15811.470782]  [<ffffffff81078990>] ? task_stopped_code+0x60/0x60
> [15811.471398]  [<ffffffff817d0109>] tracesys_phase2+0xd4/0xd9
> [15811.472011] Code: e0 7f 44 09 d0 41 88 47 31 41 0f b6 47 32 83 e0 
> f0 45 85 c0 0f 95 c2 09 c8 c1 e2 03 09 d0 41 88 47 32 0f b7 55 18 41 
> 0f b7 47 32 <c1> e2 04 83 e0 0f 09 d0 66 41 89 47 32 e8 a4 69 fe ff 
> 4c 8b 4d
> [15811.473409] NMI backtrace for cpu 1
> [15811.474061] CPU: 1 PID: 2849 Comm: trinity-c95 Tainted: G          
>    L 3.18.0+ #106
> [15811.476098] task: ffff880096b4c470 ti: ffff880153a4c000 task.ti: 
> ffff880153a4c000
> [15811.476808] RIP: 0010:[<ffffffff810c63cf>]  [<ffffffff810c63cf>] 
> lock_release+0x1f/0x240
> [15811.477535] RSP: 0018:ffff880153a4fe40  EFLAGS: 00000246
> [15811.478253] RAX: ffff880096b4c470 RBX: 0000000000000000 RCX: 
> 00000000000003a0
> [15811.478982] RDX: ffffffff81086d0c RSI: 0000000000000001 RDI: 
> ffffffff81c50e20
> [15811.479707] RBP: ffff880153a4fe48 R08: 0000000000000000 R09: 
> 0000000000000000
> [15811.480419] R10: 0000000000000000 R11: 0000000000000000 R12: 
> ffff880094be64c0
> [15811.481115] R13: ffff880153a4feb0 R14: 0000000000000000 R15: 
> 0000000000000000
> [15811.481803] FS:  00007f79977d8740(0000) GS:ffff880245000000(0000) 
> knlGS:0000000000000000
> [15811.482504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15811.483200] CR2: 00007f7996f24220 CR3: 00000002251b4000 CR4: 
> 00000000001407e0
> [15811.483890] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [15811.484562] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000600
> [15811.485221] Stack:
> [15811.485862]  0000000000000000 ffff880153a4fe88 ffffffff81086d24 
> ffffffff81086ca5
> [15811.486521]  0000000000000d7f 0000000000000d7f 0000000000000000 
> ffff880096[15835.393872] NMI watchdog: BUG: soft lockup - CPU#2 stuck 
> for 22s! [kworker/u16:3:14112]
> [15835.394432] Modules linked in: bridge 8021q garp stp snd_seq_dummy 
> dlci tun fuse rfcomm hidp bnep af_key llc2 nfnetlink can_bcm 
> scsi_transport_iscsi can_raw sctp libcrc32c nfc caif_socket caif 
> af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox 
> ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx 
> p8023 psnap p8022 llc ax25 usb_debug cfg80211 rfkill coretemp hwmon 
> x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel 
> ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek 
> snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel 
> snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device 
> snd_pcm e1000e ptp pps_core snd_timer snd soundcore shpchp nfsd 
> auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
> [15835.398378] CPU: 2 PID: 14112 Comm: kworker/u16:3 Tainted: G       
>       L 3.18.0+ #106
> [15835.400379] Workqueue: khelper __call_usermodehelper
> [15835.401080] task: ffff8801c95f0000 ti: ffff880227eac000 task.ti: 
> ffff880227eac000
> [15835.401795] RIP: 0010:[<ffffffff817c3407>]  [<ffffffff817c3407>] 
> __slab_alloc+0x52f/0x58f
> [15835.402534] RSP: 0018:ffff880227eaf8f8  EFLAGS: 00000246
> [15835.403252] RAX: 0000000000000002 RBX: ffff8802304cf5c8 RCX: 
> 00000000000002e0
> [15835.403995] RDX: ffff88024520d7e0 RSI: 0000000000000000 RDI: 
> ffff880244802000
> [15835.404727] RBP: ffff880227eaf9e8 R08: 0000000000000000 R09: 
> 0000000000000000
> [15835.405456] R10: 0000000000000092 R11: 0000000000000000 R12: 
> ffffffff810135bf
> [15835.406185] R13: ffff880227eaf878 R14: 0000000100160015 R15: 
> ffffffff8138278d
> [15835.406916] FS:  0000000000000000(0000) GS:ffff880245200000(0000) 
> knlGS:0000000000000000
> [15835.407667] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15835.408410] CR2: 0000000000000008 CR3: 0000000225ab9000 CR4: 
> 00000000001407e0
> [15835.409165] DR0: 00007fbe591ef000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [15835.409924] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000600
> [15835.410679] Stack:
> [15835.411433]  000000000000005c ffff880240f8d790 ffff880240f8d790 
> ffff880240f8dd00
> [15835.412226]  0000000180230020 000000010000000f ffffffff8112ee12 
> 0000000000000000
> [15835.413002]  ffff8802453d7260 000000020023001f ffff880227eaf968 
> ffffffff8138278d
> [15835.413784] Call Trace:
> [15835.414563]  [<ffffffff8112ee12>] ? __delayacct_tsk_init+0x22/0x50
> [15835.415358]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
> [15835.416162]  [<ffffffff811ce860>] ? set_track+0x70/0x140
> [15835.416952]  [<ffffffff811cf35d>] ? init_object+0x3d/0x70
> [15835.417745]  [<ffffffff8138278d>] ? __debug_object_init+0x43d/0x450
> [15835.418533]  [<ffffffff811d295b>] kmem_cache_alloc+0x1cb/0x1f0
> [15835.419328]  [<ffffffff8138278d>] __debug_object_init+0x43d/0x450
> [15835.420114]  [<ffffffff813827bb>] debug_object_init+0x1b/0x20
> [15835.420904]  [<ffffffff810e66d5>] hrtimer_init+0x25/0xb0
> [15835.421701]  [<ffffffff8109f069>] __sched_fork+0x99/0x230
> [15835.422491]  [<ffffffff810a59c9>] sched_fork+0x29/0x200
> [15835.423282]  [<ffffffff8107568c>] copy_process.part.26+0x65c/0x1a40
> [15835.424077]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
> [15835.424865]  [<ffffffff810c512c>] ? 
> __lock_acquire.isra.31+0x22c/0x9f0
> [15835.425658]  [<ffffffff8108e340>] ? call_helper+0x20/0x20
> [15835.426447]  [<ffffffff81076c37>] do_fork+0xe7/0x490
> [15835.427239]  [<ffffffff810c512c>] ? 
> __lock_acquire.isra.31+0x22c/0x9f0
> [15835.428018]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
> [15835.428776]  [<ffffffff81077006>] kernel_thread+0x26/0x30
> [15835.429525]  [<ffffffff8108e1b4>] __call_usermodehelper+0x64/0x80
> [15835.430259]  [<ffffffff8109301a>] process_one_work+0x1fa/0x550
> [15835.430986]  [<ffffffff81092f98>] ? process_one_work+0x178/0x550
> [15835.431694]  [<ffffffff8109348b>] worker_thread+0x11b/0x490
> [15835.432374]  [<ffffffff81093370>] ? process_one_work+0x550/0x550
> [15835.433049]  [<ffffffff81098c89>] kthread+0xf9/0x110
> [15835.433694]  [<ffffffff810abaf5>] ? local_clock+0x25/0x30
> [15835.434328]  [<ffffffff81098b90>] ? 
> kthread_create_on_node+0x250/0x250
> [15835.434947]  [<ffffffff817cfe6c>] ret_from_fork+0x7c/0xb0
> [15835.435543]  [<ffffffff81098b90>] ? 
> kthread_create_on_node+0x250/0x250
> [15835.436148] Code: 00 02 00 00 49 c7 45 00 00 00 00 00 75 11 ff b5 
> 78 ff ff ff 9d e8 7a 6d 98 ff 4c 89 e0 eb 0f e8 70 6e 98 ff ff b5 78 
> ff ff ff 9d <4c> 89 e0 48 8b 55 c8 65 48 33 14 25 28 00 00 00 74 3c 
> e8 22 46
> [15835.437494] sending NMI to other CPUs:
> 
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-18 15:54                                                                                     ` Chris Mason
@ 2014-12-18 16:12                                                                                       ` Dave Jones
  2014-12-19  2:45                                                                                         ` Dave Jones
  2014-12-18 18:54                                                                                       ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-18 16:12 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Thu, Dec 18, 2014 at 10:54:19AM -0500, Chris Mason wrote:
 
 
 > CPU 2 seems to be the one making the least progress.  I think he's 
 > calling fork and then trying to allocate a debug object for his 
 > hrtimer, eventually wandering into fill_pool from __debug_object_init():
 > 
 > static void fill_pool(void)
 > {
 >         gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
 >         struct debug_obj *new;
 >         unsigned long flags;
 > 
 >         if (likely(obj_pool_free >= ODEBUG_POOL_MIN_LEVEL))
 >                 return;
 > 
 >         if (unlikely(!obj_cache))
 >                 return;
 > 
 >         while (obj_pool_free < ODEBUG_POOL_MIN_LEVEL) {
 > 
 >                 new = kmem_cache_zalloc(obj_cache, gfp);
 >                 if (!new)
 >                         return;
 > 
 >                 raw_spin_lock_irqsave(&pool_lock, flags);
 >                 hlist_add_head(&new->node, &obj_pool);
 >                 obj_pool_free++;
 >                 raw_spin_unlock_irqrestore(&pool_lock, flags);
 >         }
 > }
 > 
 > It doesn't seem to be making progress out of __slab_alloc+0x52f/0x58f, 
 > but maybe the slab code is just a victim of being called in a while 
 > loop with GFP_ATOMIC set from a starvation prone loop.  Can you please 
 > line up where 0x52f is in __slab_alloc?

http://codemonkey.org.uk/junk/slub.txt is the whole disassembly.
Is that at 10ac, the end of an inlined get_freepointer?
If so, the looping thing sounds plausible.
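
For reference, the helper that would be getting inlined there is tiny;
in mm/slub.c of this era it's roughly the following (quoting from
memory, so treat the exact form as approximate):

static inline void *get_freepointer(struct kmem_cache *s, void *object)
{
        return *(void **)(object + s->offset);
}

so there's nothing in it to spin on by itself; it would just be where
the sampled RIP keeps landing while the outer fill_pool loop hammers
the allocator.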

 > It might be fun to run with CONFIG_DEBUG_OBJECTS off.

That was going to be my next move in the absence of any better ideas.

 > ..Linus' patch clearly helped, I think we're off in a different bug now.

It's certainly not showing the xsave traces any more, which is a good
sign.  I just had this happen..

[36195.185301] WARNING: CPU: 2 PID: 23893 at kernel/watchdog.c:290 watchdog_overflow_callback+0x9c/0xd0()
[36195.185333] Watchdog detected hard LOCKUP on cpu 2
[36195.185347] Modules linked in:
[36195.185363]  8021q garp dlci bridge stp snd_seq_dummy fuse tun rfcomm bnep llc2 af_key hidp can_raw nfnetlink sctp libcrc32c can_bcm nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds scsi_transport_iscsi rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel usb_debug snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm shpchp snd_timer snd e1000e ptp soundcore pps_core nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[36195.186809] CPU: 2 PID: 23893 Comm: modprobe Not tainted 3.18.0+ #106
[36195.189025]  0000000000000000 00000000ac0a0aab ffff880245205b98 ffffffff817c4aef
[36195.190140]  0000000000000000 ffff880245205bf0 ffff880245205bd8 ffffffff81077c61
[36195.191266]  0000000000000000 ffff880244852520 0000000000000000 ffff880245205d30
[36195.192380] Call Trace:
[36195.193482]  <NMI>  [<ffffffff817c4aef>] dump_stack+0x4e/0x68
[36195.194603]  [<ffffffff81077c61>] warn_slowpath_common+0x81/0xa0
[36195.195722]  [<ffffffff81077cd5>] warn_slowpath_fmt+0x55/0x70
[36195.196810]  [<ffffffff8112bcf0>] ? restart_watchdog_hrtimer+0x60/0x60
[36195.197898]  [<ffffffff8112bd8c>] watchdog_overflow_callback+0x9c/0xd0
[36195.198982]  [<ffffffff8116ebfd>] __perf_event_overflow+0x9d/0x2a0
[36195.200058]  [<ffffffff8116d7d3>] ? perf_event_update_userpage+0x103/0x180
[36195.201137]  [<ffffffff8116d6d0>] ? perf_event_task_disable+0x90/0x90
[36195.202291]  [<ffffffff8116f7d4>] perf_event_overflow+0x14/0x20
[36195.203501]  [<ffffffff8101e739>] intel_pmu_handle_irq+0x1f9/0x3f0
[36195.204696]  [<ffffffff81017cab>] perf_event_nmi_handler+0x2b/0x50
[36195.205881]  [<ffffffff81007320>] nmi_handle+0xc0/0x1b0
[36195.207018]  [<ffffffff81007265>] ? nmi_handle+0x5/0x1b0
[36195.208119]  [<ffffffff8100760a>] default_do_nmi+0x4a/0x140
[36195.209191]  [<ffffffff810077c0>] do_nmi+0xc0/0x100
[36195.210259]  [<ffffffff817d1efa>] end_repeat_nmi+0x1e/0x2e
[36195.211323]  <<EOE>>  <UNK> 
[36195.211335] ---[ end trace b7e2af452c79e16a ]---
[36195.213538] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 28.230 msecs
[36195.214595] perf interrupt took too long (223362 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
[36225.945721] perf interrupt took too long (221634 > 10000), lowering kernel.perf_event_max_sample_rate to 12500
[36243.519426] INFO: rcu_sched detected stalls on CPUs/tasks:
[36243.520799] 	2: (0 ticks this GP) idle=3b5/140000000000000/0 softirq=2159968/2159968 
[36243.522120] 	(detected by 0, t=6002 jiffies, g=972859, c=972858, q=0)
[36243.523406] Task dump for CPU 2:
[36243.524699] swapper/2       R  running task    14576     0      1 0x00200000
[36243.526016]  000000024375be38 ffffffffffffff10 ffffffff8165f7d9 0000000000000001
[36243.527326]  ffffffff81cb1bc0 0000000000000002 ffff88024375be88 ffffffff8165f7b5
[36243.528664]  000020fb799d5682 ffffffff81cb1c30 00000000001cc300 ffffffff81d215f0
[36243.530015] Call Trace:
[36243.531328]  [<ffffffff810ed5f4>] ? ktime_get+0x94/0x120
[36243.532647]  [<ffffffff8165f7b5>] cpuidle_enter_state+0x55/0x190
[36243.533974]  [<ffffffff8165f9a7>] cpuidle_enter+0x17/0x20
[36243.535290]  [<ffffffff810bd4a4>] cpu_startup_entry+0x194/0x410
[36243.536590]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230


Which didn't lock up the box, but did taint the kernel so it stopped the fuzzer.
That <UNK> trace is pretty hopeless.

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-18 15:54                                                                                     ` Chris Mason
  2014-12-18 16:12                                                                                       ` Dave Jones
@ 2014-12-18 18:54                                                                                       ` Linus Torvalds
  1 sibling, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-18 18:54 UTC (permalink / raw)
  To: Chris Mason
  Cc: Dave Jones, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Thu, Dec 18, 2014 at 7:54 AM, Chris Mason <clm@fb.com> wrote:
>
> CPU 2 seems to be the one making the least progress.  I think he's calling
> fork and then trying to allocate a debug object for his hrtimer, eventually
> wandering into fill_pool from __debug_object_init():

Good call.

I agree - fill_pool() seems to be just plain nasty.

We've had this bug before, btw - a *loong* time ago in the original
kmalloc stuff. You really should not fill a pool of memory that way.
It's fundamentally wrong to fill a pool and then (later - after having
released and re-acquired the lock) allocate from the pool. Somebody
else will steal the allocations you did, and take advantage of your
work.

The high/low watermarks are done completely wrong for that thing too -
if things fall below a minimum level, you want to try to make sure it
grows clearly past the minimum, so that you don't get stuck just
around the minimum. But you need to spread out the pain, rather than
make one unlucky allocator have to do all the work.
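
As a sketch of what I mean (purely illustrative; the batch size and the
function name are invented for the example): refill in one go, and let
the task that paid for the refill keep one object outright so its work
can't be stolen out from under it:

static struct debug_obj *fill_pool_and_take(void)
{
        gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
        struct debug_obj *batch[16];    /* well past the low watermark */
        unsigned long flags;
        int i, n = 0;

        while (n < ARRAY_SIZE(batch)) {
                batch[n] = kmem_cache_zalloc(obj_cache, gfp);
                if (!batch[n])
                        break;
                n++;
        }
        if (!n)
                return NULL;

        /* the task that did the allocation work keeps one object */
        n--;

        raw_spin_lock_irqsave(&pool_lock, flags);
        for (i = 0; i < n; i++) {
                hlist_add_head(&batch[i]->node, &obj_pool);
                obj_pool_free++;
        }
        raw_spin_unlock_irqrestore(&pool_lock, flags);

        return batch[n];
}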

> It might be fun to run with CONFIG_DEBUG_OBJECTS off...Linus' patch clearly
> helped, I think we're off in a different bug now.

I'm not sure it was my patch. I'm wondering if it's because Dave still
has preemption off, and the backtraces look different (and better) as
a result.

But yes, trying with DEBUG_OBJECTS off might be a good idea. It's
entirely possible that the debug code is actually triggering bugs of
its own, rather than showing other people's bugs.

                         Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* save_xstate_sig (Re: frequent lockups in 3.18rc4)
  2014-12-15  5:47                                                                           ` Linus Torvalds
  2014-12-15  5:57                                                                             ` Dave Jones
  2014-12-15 14:00                                                                             ` Borislav Petkov
@ 2014-12-18 21:17                                                                             ` Andy Lutomirski
  2014-12-18 21:34                                                                               ` Linus Torvalds
  2014-12-18 21:37                                                                               ` Dave Jones
  2 siblings, 2 replies; 486+ messages in thread
From: Andy Lutomirski @ 2014-12-18 21:17 UTC (permalink / raw)
  To: Linus Torvalds, Dave Jones, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List
  Cc: Suresh Siddha, Oleg Nesterov, Peter Anvin

[-- Attachment #1: Type: text/plain, Size: 2640 bytes --]

On 12/14/2014 09:47 PM, Linus Torvalds wrote:
> On Sun, Dec 14, 2014 at 4:38 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Can anybody make sense of that backtrace, keeping in mind that we're
>> looking for some kind of endless loop where we don't make progress?
>
> So looking at all the backtraces, which is kind of messy because
> there's some missing data (presumably buffers overflowed from all the
> CPU's printing at the same time), it looks  like:
>
>   - CPU 0 is missing. No idea why.
>   - CPU's 1-3 all have the same trace for
>
>      int_signal ->
>      do_notify_resume ->
>      do_signal ->
>        ....
>      page_fault ->
>      do_page_fault
>
> and "save_xstate_sig+0x81" shows up on all stacks, although only on
> CPU1 does it show up as a "guaranteed" part of the stack chain (ie it
> matches frame pointer data too). CPU1 also has that __clear_user show
> up (which is called from save_xstate_sig), but not other CPU's.  CPU2
> and CPU3 have "save_xstate_sig+0x98" in addition to that +0x81 thing.
>
> My guess is that "save_xstate_sig+0x81" is the instruction after the
> __clear_user call, and that CPU1 took the fault in __clear_user(),
> while CPU2 and CPU3 took the fault at "save_xstate_sig+0x98" instead,
> which I'd guess is the
>
>          xsave64 (%rdi)

I admit that my understanding of the disaster that is x86's FPU handling 
is limited, but I'm moderately confident that save_xstate_sig is broken.

The code is:

	if (user_has_fpu()) {
		/* Save the live register state to the user directly. */
		if (save_user_xstate(buf_fx))
			return -1;
		/* Update the thread's fxstate to save the fsave header. */
		if (ia32_fxstate)
			fpu_fxsave(&tsk->thread.fpu);
	} else {
		sanitize_i387_state(tsk);
		if (__copy_to_user(buf_fx, xsave, xstate_size))
			return -1;
	}

Suppose that user_has_fpu() returns true, we call save_user_xstate, and 
the xsave instruction (or anything else in there, for that matter) 
causes a page fault.

The page fault handler is well within its rights to schedule.  At that 
point, *we might not own the FPU any more*, depending on the vagaries of 
eager vs lazy mode.  So, when we schedule back in and resume from the 
page fault, we are in the wrong branch of the if statement.

At this point, we're going to write garbage (possibly sensitive garbage) 
to the userspace signal frame.  I don't see why this would cause an 
infinite loop, but I don't think it's healthy.
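
To spell out where that goes wrong (the identifiers are the ones from the
snippet above; this is just an annotated sketch, not a proposed fix):

	if (user_has_fpu()) {	/* true when we test it here... */
		/*
		 * ...but anything inside save_user_xstate() -- the xsave
		 * itself, or __clear_user() -- can take a page fault, the
		 * fault handler can schedule, and by the time we resume we
		 * may not own the FPU any more.  We are still executing the
		 * "live registers are ours" branch, though, rather than the
		 * copy-from-the-kernel-state branch below it.
		 */
		if (save_user_xstate(buf_fx))
			return -1;
		if (ia32_fxstate)
			fpu_fxsave(&tsk->thread.fpu);
	}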

FWIW, if xsave traps with a bogus cr2 value, then there would indeed be an 
infinite loop in here.  It seems to work right on my machine.  Dave, 
want to run the attached little test?

--Andy

[-- Attachment #2: xsave_cr2.c --]
[-- Type: text/plain, Size: 2841 bytes --]

#define _GNU_SOURCE
#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/ucontext.h>

static volatile unsigned char *buf, *xsave_addr;
static volatile int nfailures = 0;

static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
		       int flags)
{
	struct sigaction sa;
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = handler;
	sa.sa_flags = SA_SIGINFO | flags;
	sigemptyset(&sa.sa_mask);
	if (sigaction(sig, &sa, 0))
		err(1, "sigaction");
}

static void clearhandler(int sig)
{
	struct sigaction sa;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = SIG_DFL;
	sigemptyset(&sa.sa_mask);
	if (sigaction(sig, &sa, 0))
		err(1, "sigaction");
}

static void sigsegv(int sig, siginfo_t *si, void *ctx_void)
{
	ucontext_t *ctx = (ucontext_t*)ctx_void;

	unsigned long cr2 = (unsigned long)ctx->uc_mcontext.gregs[REG_CR2];
	unsigned long start = (unsigned long)buf;

	extern unsigned char xsave_insn[], after_xsave_insn[];
	
	if (ctx->uc_mcontext.gregs[REG_RIP] != (unsigned long)xsave_insn) {
		printf("Uncorrectable segfault\n");
		clearhandler(SIGSEGV);
		return;
	}

	if (si->si_code != SEGV_ACCERR) {
		printf("Segfault was %d (trap %d), not SEGV_ACCERR\n",
		       si->si_code, ctx->uc_mcontext.gregs[REG_TRAPNO]);
		clearhandler(SIGSEGV);
		return;
	}

	if (cr2 != (unsigned long)si->si_addr) {
		printf("CR2 (0x%lx) != si_addr (0x%lx)\n",
		       cr2, (unsigned long)si->si_addr);
		clearhandler(SIGSEGV);
		return;
	}

	/*
	 * The fault should be reported against one of the PROT_NONE pages
	 * (first or third); a cr2 inside the writable middle page would be
	 * a bogus fault address.
	 */
	if (cr2 >= start && cr2 <= (start + 4095)) {
		printf("[OK]\txsave offset = %d, cr2 offset = %d\n",
		       (int)(xsave_addr - buf), (int)(cr2 - start));
	} else if (cr2 >= start + 4096 && cr2 <= start + 8191) {
		printf("[FAIL]\txsave offset = %d, cr2 offset = %d\n",
		       (int)(xsave_addr - buf), (int)(cr2 - start));

		nfailures++;
	} else if (cr2 >= start + 8192 && cr2 <= start + 12287) {
		printf("[OK]\txsave offset = %d, cr2 offset = %d\n",
		       (int)(xsave_addr - buf), (int)(cr2 - start));
	} else {
		printf("[FAIL]\tcr2 is completely out of range\n");
		abort();
	}

	ctx->uc_mcontext.gregs[REG_RIP] = (unsigned long)after_xsave_insn;
}

int main()
{
	int i;

	/*
	 * Reserve three pages.  The first and last stay PROT_NONE; the middle
	 * one is remapped read/write below.  A faulting xsave should report a
	 * cr2 inside one of the PROT_NONE pages, never the writable one.
	 */
	buf = mmap(NULL, 4096*3, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (buf == MAP_FAILED)
		err(1, "mmap");

	/* Make the middle page writable. */
	if (mmap((unsigned char *)buf + 4096, 4096, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		err(1, "mmap");

	sethandler(SIGSEGV, sigsegv, 0);

	for (i = 0; i < 8193; i += 64) {
		xsave_addr = buf + i;
		printf("XSAVE to offset %d\n", i);
		asm volatile ("xsave_insn: xsaveq %0 ; after_xsave_insn:"
			      : "=m" (*xsave_addr)
			      : "a" (0xffffffff), "d" (0xffffffff));
	}

	if (nfailures)
		printf("%d failures\n", nfailures);
	else
		printf("PASS!\n");

	return 0;
}

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: save_xstate_sig (Re: frequent lockups in 3.18rc4)
  2014-12-18 21:17                                                                             ` save_xstate_sig (Re: frequent lockups in 3.18rc4) Andy Lutomirski
@ 2014-12-18 21:34                                                                               ` Linus Torvalds
  2014-12-18 21:41                                                                                 ` Andy Lutomirski
  2014-12-18 21:37                                                                               ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-18 21:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Thu, Dec 18, 2014 at 1:17 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> I admit that my understanding of the disaster that is x86's FPU handling is
> limited, but I'm moderately confident that save_xstate_sig is broken.

Very possible. The FPU code *is* nasty.

> The code is:
>
>         if (user_has_fpu()) {
>                 /* Save the live register state to the user directly. */
>                 if (save_user_xstate(buf_fx))
>                         return -1;
>                 /* Update the thread's fxstate to save the fsave header. */
>                 if (ia32_fxstate)
>                         fpu_fxsave(&tsk->thread.fpu);
>         } else {
>                 sanitize_i387_state(tsk);
>                 if (__copy_to_user(buf_fx, xsave, xstate_size))
>                         return -1;
>         }
>
> Suppose that user_has_fpu() returns true, we call save_user_xstate, and the
> xsave instruction (or anything else in there, for that matter) causes a page
> fault.
>
> The page fault handler is well within its rights to schedule.

You don't even have to page fault. Preemption..

But that shouldn't actually be the bug. This is just an optimization.
If we have the FPU, we save it from the FP state, rather than copying
it from our kernel copy. If we schedule (page fault, preemption,
whatever) and lose the FPU, the code still works - we'll just take a
TS fault, and have to reload the information.

So I'm with you in that there can certainly be bugs in the FPU
handling, but I don't think this is one.

                        Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: save_xstate_sig (Re: frequent lockups in 3.18rc4)
  2014-12-18 21:17                                                                             ` save_xstate_sig (Re: frequent lockups in 3.18rc4) Andy Lutomirski
  2014-12-18 21:34                                                                               ` Linus Torvalds
@ 2014-12-18 21:37                                                                               ` Dave Jones
  1 sibling, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-18 21:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Thu, Dec 18, 2014 at 01:17:59PM -0800, Andy Lutomirski wrote:

 > FWIW, if xsave traps with a bogus cr2 value, then there would indeed be an 
 > infinite loop in here.  It seems to work right on my machine.  Dave, 
 > want to run the attached little test?

XSAVE to offset 0
[OK]	xsave offset = 0, cr2 offset = 831
XSAVE to offset 64
[OK]	xsave offset = 64, cr2 offset = 895
XSAVE to offset 128
[OK]	xsave offset = 128, cr2 offset = 959
XSAVE to offset 192
[OK]	xsave offset = 192, cr2 offset = 1023
XSAVE to offset 256
[OK]	xsave offset = 256, cr2 offset = 1087
XSAVE to offset 320
[OK]	xsave offset = 320, cr2 offset = 1151
XSAVE to offset 384
[OK]	xsave offset = 384, cr2 offset = 1215
XSAVE to offset 448
[OK]	xsave offset = 448, cr2 offset = 1279
XSAVE to offset 512
[OK]	xsave offset = 512, cr2 offset = 1343
XSAVE to offset 576
[OK]	xsave offset = 576, cr2 offset = 1407
XSAVE to offset 640
[OK]	xsave offset = 640, cr2 offset = 1471
XSAVE to offset 704
[OK]	xsave offset = 704, cr2 offset = 1535
XSAVE to offset 768
[OK]	xsave offset = 768, cr2 offset = 1599
XSAVE to offset 832
[OK]	xsave offset = 832, cr2 offset = 1663
XSAVE to offset 896
[OK]	xsave offset = 896, cr2 offset = 1727
XSAVE to offset 960
[OK]	xsave offset = 960, cr2 offset = 1791
XSAVE to offset 1024
[OK]	xsave offset = 1024, cr2 offset = 1855
XSAVE to offset 1088
[OK]	xsave offset = 1088, cr2 offset = 1919
XSAVE to offset 1152
[OK]	xsave offset = 1152, cr2 offset = 1983
XSAVE to offset 1216
[OK]	xsave offset = 1216, cr2 offset = 2047
XSAVE to offset 1280
[OK]	xsave offset = 1280, cr2 offset = 2111
XSAVE to offset 1344
[OK]	xsave offset = 1344, cr2 offset = 2175
XSAVE to offset 1408
[OK]	xsave offset = 1408, cr2 offset = 2239
XSAVE to offset 1472
[OK]	xsave offset = 1472, cr2 offset = 2303
XSAVE to offset 1536
[OK]	xsave offset = 1536, cr2 offset = 2367
XSAVE to offset 1600
[OK]	xsave offset = 1600, cr2 offset = 2431
XSAVE to offset 1664
[OK]	xsave offset = 1664, cr2 offset = 2495
XSAVE to offset 1728
[OK]	xsave offset = 1728, cr2 offset = 2559
XSAVE to offset 1792
[OK]	xsave offset = 1792, cr2 offset = 2623
XSAVE to offset 1856
[OK]	xsave offset = 1856, cr2 offset = 2687
XSAVE to offset 1920
[OK]	xsave offset = 1920, cr2 offset = 2751
XSAVE to offset 1984
[OK]	xsave offset = 1984, cr2 offset = 2815
XSAVE to offset 2048
[OK]	xsave offset = 2048, cr2 offset = 2879
XSAVE to offset 2112
[OK]	xsave offset = 2112, cr2 offset = 2943
XSAVE to offset 2176
[OK]	xsave offset = 2176, cr2 offset = 3007
XSAVE to offset 2240
[OK]	xsave offset = 2240, cr2 offset = 3071
XSAVE to offset 2304
[OK]	xsave offset = 2304, cr2 offset = 3135
XSAVE to offset 2368
[OK]	xsave offset = 2368, cr2 offset = 3199
XSAVE to offset 2432
[OK]	xsave offset = 2432, cr2 offset = 3263
XSAVE to offset 2496
[OK]	xsave offset = 2496, cr2 offset = 3327
XSAVE to offset 2560
[OK]	xsave offset = 2560, cr2 offset = 3391
XSAVE to offset 2624
[OK]	xsave offset = 2624, cr2 offset = 3455
XSAVE to offset 2688
[OK]	xsave offset = 2688, cr2 offset = 3519
XSAVE to offset 2752
[OK]	xsave offset = 2752, cr2 offset = 3583
XSAVE to offset 2816
[OK]	xsave offset = 2816, cr2 offset = 3647
XSAVE to offset 2880
[OK]	xsave offset = 2880, cr2 offset = 3711
XSAVE to offset 2944
[OK]	xsave offset = 2944, cr2 offset = 3775
XSAVE to offset 3008
[OK]	xsave offset = 3008, cr2 offset = 3839
XSAVE to offset 3072
[OK]	xsave offset = 3072, cr2 offset = 3903
XSAVE to offset 3136
[OK]	xsave offset = 3136, cr2 offset = 3967
XSAVE to offset 3200
[OK]	xsave offset = 3200, cr2 offset = 4031
XSAVE to offset 3264
[OK]	xsave offset = 3264, cr2 offset = 4095
XSAVE to offset 3328
[OK]	xsave offset = 3328, cr2 offset = 3328
XSAVE to offset 3392
[OK]	xsave offset = 3392, cr2 offset = 3392
XSAVE to offset 3456
[OK]	xsave offset = 3456, cr2 offset = 3456
XSAVE to offset 3520
[OK]	xsave offset = 3520, cr2 offset = 3520
XSAVE to offset 3584
[OK]	xsave offset = 3584, cr2 offset = 3584
XSAVE to offset 3648
[OK]	xsave offset = 3648, cr2 offset = 3648
XSAVE to offset 3712
[OK]	xsave offset = 3712, cr2 offset = 3712
XSAVE to offset 3776
[OK]	xsave offset = 3776, cr2 offset = 3776
XSAVE to offset 3840
[OK]	xsave offset = 3840, cr2 offset = 3840
XSAVE to offset 3904
[OK]	xsave offset = 3904, cr2 offset = 3904
XSAVE to offset 3968
[OK]	xsave offset = 3968, cr2 offset = 3968
XSAVE to offset 4032
[OK]	xsave offset = 4032, cr2 offset = 4032
XSAVE to offset 4096
XSAVE to offset 4160
XSAVE to offset 4224
XSAVE to offset 4288
XSAVE to offset 4352
XSAVE to offset 4416
XSAVE to offset 4480
XSAVE to offset 4544
XSAVE to offset 4608
XSAVE to offset 4672
XSAVE to offset 4736
XSAVE to offset 4800
XSAVE to offset 4864
XSAVE to offset 4928
XSAVE to offset 4992
XSAVE to offset 5056
XSAVE to offset 5120
XSAVE to offset 5184
XSAVE to offset 5248
XSAVE to offset 5312
XSAVE to offset 5376
XSAVE to offset 5440
XSAVE to offset 5504
XSAVE to offset 5568
XSAVE to offset 5632
XSAVE to offset 5696
XSAVE to offset 5760
XSAVE to offset 5824
XSAVE to offset 5888
XSAVE to offset 5952
XSAVE to offset 6016
XSAVE to offset 6080
XSAVE to offset 6144
XSAVE to offset 6208
XSAVE to offset 6272
XSAVE to offset 6336
XSAVE to offset 6400
XSAVE to offset 6464
XSAVE to offset 6528
XSAVE to offset 6592
XSAVE to offset 6656
XSAVE to offset 6720
XSAVE to offset 6784
XSAVE to offset 6848
XSAVE to offset 6912
XSAVE to offset 6976
XSAVE to offset 7040
XSAVE to offset 7104
XSAVE to offset 7168
XSAVE to offset 7232
XSAVE to offset 7296
XSAVE to offset 7360
XSAVE to offset 7424
[OK]	xsave offset = 7424, cr2 offset = 8255
XSAVE to offset 7488
[OK]	xsave offset = 7488, cr2 offset = 8319
XSAVE to offset 7552
[OK]	xsave offset = 7552, cr2 offset = 8383
XSAVE to offset 7616
[OK]	xsave offset = 7616, cr2 offset = 8447
XSAVE to offset 7680
[OK]	xsave offset = 7680, cr2 offset = 8511
XSAVE to offset 7744
[OK]	xsave offset = 7744, cr2 offset = 8575
XSAVE to offset 7808
[OK]	xsave offset = 7808, cr2 offset = 8639
XSAVE to offset 7872
[OK]	xsave offset = 7872, cr2 offset = 8703
XSAVE to offset 7936
[OK]	xsave offset = 7936, cr2 offset = 8767
XSAVE to offset 8000
[OK]	xsave offset = 8000, cr2 offset = 8831
XSAVE to offset 8064
[OK]	xsave offset = 8064, cr2 offset = 8895
XSAVE to offset 8128
[OK]	xsave offset = 8128, cr2 offset = 8959
XSAVE to offset 8192
[OK]	xsave offset = 8192, cr2 offset = 9023
PASS!


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: save_xstate_sig (Re: frequent lockups in 3.18rc4)
  2014-12-18 21:34                                                                               ` Linus Torvalds
@ 2014-12-18 21:41                                                                                 ` Andy Lutomirski
  0 siblings, 0 replies; 486+ messages in thread
From: Andy Lutomirski @ 2014-12-18 21:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Thu, Dec 18, 2014 at 1:34 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Dec 18, 2014 at 1:17 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> I admit that my understanding of the disaster that is x86's FPU handling is
>> limited, but I'm moderately confident that save_xstate_sig is broken.
>
> Very possible. The FPU code *is* nasty.
>
>> The code is:
>>
>>         if (user_has_fpu()) {
>>                 /* Save the live register state to the user directly. */
>>                 if (save_user_xstate(buf_fx))
>>                         return -1;
>>                 /* Update the thread's fxstate to save the fsave header. */
>>                 if (ia32_fxstate)
>>                         fpu_fxsave(&tsk->thread.fpu);
>>         } else {
>>                 sanitize_i387_state(tsk);
>>                 if (__copy_to_user(buf_fx, xsave, xstate_size))
>>                         return -1;
>>         }
>>
>> Suppose that user_has_fpu() returns true, we call save_user_xstate, and the
>> xsave instruction (or anything else in there, for that matter) causes a page
>> fault.
>>
>> The page fault handler is well within its rights to schedule.
>
> You don't even have to page fault. Preemption..
>
> But that shouldn't actually be the bug. This is just an optimization.
> If we have the FPU, we save it from the FP state, rather than copying
> it from our kernel copy. If we schedule (page fault, preemption,
> whatever) and lose the FPU, the code still works - we'll just take a
> TS fault, and have to reload the information.
>

Not if this happens:

    /*
     * Paranoid restore. send a SIGSEGV if we fail to restore the state.
     */
    if (unlikely(restore_fpu_checking(tsk))) {
        drop_init_fpu(tsk);
        force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
        return;
    }

I have no idea what, if anything, can cause FPU restore to fail, but
that looks like an infinite loop to me.

And the fact that we have an xsave instruction that can cause page
faults *and* has an extable fixup doesn't exactly inspire confidence,
but the code looks correct.

If this is easy enough for Dave to trigger, it could be worth
instrumenting __do_page_fault to log when a fault happens on that
xsave instruction and to maybe also log the outcome.  Do we know
whether your fault retry fixes solved the problem yet?
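
A rough sketch of the sort of instrumentation I mean (assuming 3.18's
__do_page_fault(regs, error_code, address); note this would log every
kernel-mode fault that has an exception-table fixup, not just the xsave,
and logging the outcome would need a second hook further down):

	/* Near the top of __do_page_fault(): */
	if (!user_mode_vm(regs) && search_exception_tables(regs->ip))
		pr_warn_ratelimited("extable fault: ip=%pS addr=%lx err=%lx\n",
				    (void *)regs->ip, address, error_code);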


FWIW, Dave's run of my test seems to rule out easy bugs in his CPU,
and I couldn't trigger a bogus cr2 value on Sandy Bridge or Core 2
Quad.

--Andy

> So I'm with you in that there can certainly be bugs in the FPU
> handling, but I don't think this is one.
>
>                         Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-18 16:12                                                                                       ` Dave Jones
@ 2014-12-19  2:45                                                                                         ` Dave Jones
  2014-12-19  3:49                                                                                           ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-19  2:45 UTC (permalink / raw)
  To: Chris Mason, Linus Torvalds, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Thu, Dec 18, 2014 at 11:12:30AM -0500, Dave Jones wrote:
 
 >  > It might be fun to run with CONFIG_DEBUG_OBJECTS off.
 > 
 > That was going to be my next move in absence of any better ideas.

So this is interesting. So far today with that change, I've not been able to
reproduce the "NMI watchdog kicks in, and then locks up the box" case, but I
keep hitting lockups that taint the kernel, which usually makes the fuzzer
quit. I hacked something up so it ignores that and keeps going; it spews a
lot of traces, but the box does actually seem to be staying up.

Example of the spew-o-rama below.

	Dave


[15351.809622] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
[15351.809723] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15351.811011] CPU: 1 PID: 20128 Comm: trinity-c195 Not tainted 3.18.0+ #107
[15351.813048] task: ffff880069c02da0 ti: ffff8801c06dc000 task.ti: ffff8801c06dc000
[15351.814094] RIP: 0010:[<ffffffff810c5cf4>]  [<ffffffff810c5cf4>] lock_acquire+0xb4/0x120
[15351.815142] RSP: 0018:ffff8801c06dfbb0  EFLAGS: 00000246
[15351.816174] RAX: ffff880069c02da0 RBX: 0000000180140014 RCX: ffff8802451cff98
[15351.817211] RDX: 00000000000015a0 RSI: 0000000000000018 RDI: 0000000000000000
[15351.818242] RBP: ffff8801c06dfc10 R08: 0000000000000000 R09: 0000000000000000
[15351.819265] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801c06dfba0
[15351.820314] R13: ffffffff810ab7d5 R14: ffff8801c06dfb20 R15: ffff88024483fc70
[15351.821343] FS:  00007f9dcd485740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15351.822373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15351.823401] CR2: 0000000001e259f8 CR3: 00000002289f7000 CR4: 00000000001407e0
[15351.824436] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15351.825473] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15351.826507] Stack:
[15351.827536]  ffffffff811a4cc3 ffff880000000000 ffff8801c06dfc10 0000000000000246
[15351.828596]  000000019449eff0 ffff88005e137c38 00000000000000d0 ffff88005e137c20
[15351.829683]  ffff88005e137c38 0000000002422000 ffff8802332eb110 0000000000000000
[15351.830752] Call Trace:
[15351.831807]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15351.832873]  [<ffffffff817ccadc>] _raw_spin_lock_nested+0x3c/0x80
[15351.833942]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15351.834999]  [<ffffffff811a4cc3>] copy_page_range+0x493/0xa20
[15351.836044]  [<ffffffff8107648f>] copy_process.part.26+0x146f/0x1a40
[15351.837077]  [<ffffffff81076c27>] do_fork+0xe7/0x490
[15351.838105]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15351.839141]  [<ffffffff8109ef5d>] ? finish_task_switch+0x7d/0x120
[15351.840210]  [<ffffffff8109ef1f>] ? finish_task_switch+0x3f/0x120
[15351.841244]  [<ffffffff817c6eb2>] ? __schedule+0x352/0x8c0
[15351.842280]  [<ffffffff813743a4>] ? lockdep_sys_exit_thunk+0x35/0x67
[15351.843311]  [<ffffffff81077056>] SyS_clone+0x16/0x20
[15351.844337]  [<ffffffff817cddd9>] stub_clone+0x69/0x90
[15351.845358]  [<ffffffff817cda52>] ? system_call_fastpath+0x12/0x17
[15351.846384] Code: d8 49 c1 e8 09 48 89 04 24 49 83 f0 01 41 83 e0 01 e8 01 ef ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 00 00 53 9d <48> 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 65 
[15351.848623] sending NMI to other CPUs:
[15351.849722] NMI backtrace for cpu 2
[15351.850796] CPU: 2 PID: 22823 Comm: trinity-c154 Not tainted 3.18.0+ #107
[15351.852950] task: ffff880235dedb40 ti: ffff880094558000 task.ti: ffff880094558000
[15351.854042] RIP: 0010:[<ffffffff810fb22a>]  [<ffffffff810fb22a>] generic_exec_single+0xea/0x1b0
[15351.855149] RSP: 0000:ffff88009455b938  EFLAGS: 00000202
[15351.856246] RAX: 0000000000000008 RBX: ffff88009455b950 RCX: 0000000000000038
[15351.857348] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[15351.858449] RBP: ffff88009455b998 R08: ffff88024370c3f0 R09: 0000000000000000
[15351.859558] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[15351.860668] R13: 0000000000000001 R14: ffff880225e61780 R15: 0000000000000002
[15351.861787] FS:  00007f9dcd485740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15351.862915] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15351.864023] CR2: 0000000001f9cfd8 CR3: 000000009ae2d000 CR4: 00000000001407e0
[15351.865121] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15351.866204] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15351.867269] Stack:
[15351.868316]  ffff88009455b948 0000000000000001 ffff88024370c3f0 0000000000000000
[15351.869381]  ffffffff81048cc0 ffff88009455ba48 0000000000000003 0000000077fab51f
[15351.870445]  0000000000000001 00000000ffffffff 0000000000000001 ffffffff81048cc0
[15351.871506] Call Trace:
[15351.872550]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15351.873579]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15351.874582]  [<ffffffff810fb390>] smp_call_function_single+0x70/0xd0
[15351.875564]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15351.876525]  [<ffffffff810fba89>] smp_call_function_many+0x2b9/0x320
[15351.877469]  [<ffffffff81049010>] flush_tlb_mm_range+0x90/0x1d0
[15351.878397]  [<ffffffff811a1a42>] tlb_flush_mmu_tlbonly+0x42/0x50
[15351.879303]  [<ffffffff811a2fa8>] unmap_single_vma+0x6b8/0x900
[15351.880192]  [<ffffffff811a32ec>] zap_page_range_single+0xfc/0x160
[15351.881072]  [<ffffffff811a34d4>] unmap_mapping_range+0x134/0x190
[15351.881949]  [<ffffffff81191f0d>] shmem_fallocate+0x4fd/0x520
[15351.882818]  [<ffffffff810bc857>] ? prepare_to_wait+0x27/0x90
[15351.883684]  [<ffffffff811e54b2>] do_fallocate+0x132/0x1d0
[15351.884552]  [<ffffffff811b79d8>] SyS_madvise+0x398/0x870
[15351.885416]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15351.886279] Code: c0 3a 1d 00 48 89 de 48 03 14 c5 a0 f0 d1 81 48 89 df e8 da de 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 
[15351.888165] NMI backtrace for cpu 0
[15351.888278] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 38.443 msecs
[15351.889964] CPU: 0 PID: 24440 Comm: trinity-c229 Not tainted 3.18.0+ #107
[15351.891825] task: ffff880240c14470 ti: ffff880095bb4000 task.ti: ffff880095bb4000
[15351.892771] RIP: 0010:[<ffffffff810c6155>]  [<ffffffff810c6155>] lock_release+0xc5/0x240
[15351.893724] RSP: 0018:ffff880095bb7e98  EFLAGS: 00000296
[15351.894666] RAX: ffff880240c14470 RBX: ffff8802236cc380 RCX: 0000000000000360
[15351.895616] RDX: ffff880244e0db60 RSI: 0000000000000000 RDI: ffff880240c14be0
[15351.896562] RBP: ffff880095bb7eb8 R08: 0000000000000000 R09: 0000000000000000
[15351.897507] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81c50da0
[15351.898452] R13: ffffffff810961fd R14: 0000000000000296 R15: 0000000000000001
[15351.899401] FS:  00007f9dcd485740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15351.900361] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15351.901318] CR2: 0000000000000008 CR3: 000000009afbc000 CR4: 00000000001407f0
[15351.902288] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15351.903254] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15351.904220] Stack:
[15351.905181]  00007f9dcc74d000 00007fffe01fcb90 0000000000000000 0000000000000000
[15351.906167]  ffff880095bb7ee8 ffffffff81096215 ffffffff810961b5 00007fffe01fcb90
[15351.907153]  0000000000000000 000000000000000b ffff880095bb7f78 ffffffff8107ae75
[15351.908147] Call Trace:
[15351.909131]  [<ffffffff81096215>] find_get_pid+0x65/0x80
[15351.910101]  [<ffffffff810961b5>] ? find_get_pid+0x5/0x80
[15351.911051]  [<ffffffff8107ae75>] SyS_wait4+0x55/0x110
[15351.911995]  [<ffffffff8137432e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[15351.912944]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15351.913886] Code: 00 00 4c 89 ea 4c 89 e6 48 89 df e8 26 fc ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 00 00 41 56 9d 48 83 c4 08 5b <41> 5c 41 5d 41 5e 41 5f 5d f3 c3 65 ff 04 25 e0 a9 00 00 48 8b 
[15351.915903] NMI backtrace for cpu 3
[15351.915906] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 66.180 msecs
[15351.917854] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0+ #107
[15351.919856] task: ffff88024348ada0 ti: ffff880243728000 task.ti: ffff880243728000
[15351.920877] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15351.921916] RSP: 0018:ffff88024372be08  EFLAGS: 00000046
[15351.922951] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15351.923981] RDX: 0000000000000000 RSI: ffff88024372bfd8 RDI: 0000000000000003
[15351.925008] RBP: ffff88024372be38 R08: 000000008baf5f56 R09: 0000000000000000
[15351.926027] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15351.927053] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880243728000
[15351.928080] FS:  0000000000000000(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15351.929111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15351.930133] CR2: 00007f8b7d063000 CR3: 0000000001c11000 CR4: 00000000001407e0
[15351.931161] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15351.932184] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15351.933197] Stack:
[15351.934198]  000000034372be38 492161474c69335b ffffe8ffff402118 0000000000000005
[15351.935238]  ffffffff81cb19c0 0000000000000003 ffff88024372be88 ffffffff8165d385
[15351.936280]  00000df86c4d4975 ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15351.937324] Call Trace:
[15351.938358]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15351.939406]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15351.940448]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15351.941495]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15351.942533] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15351.944799] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 95.053 msecs


[15375.795837] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
[15375.796930] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15375.802851] CPU: 1 PID: 20128 Comm: trinity-c195 Tainted: G             L 3.18.0+ #107
[15375.805152] task: ffff880069c02da0 ti: ffff8801c06dc000 task.ti: ffff8801c06dc000
[15375.806342] RIP: 0010:[<ffffffff810c5cf4>]  [<ffffffff810c5cf4>] lock_acquire+0xb4/0x120
[15375.807487] RSP: 0018:ffff8801c06dfbb0  EFLAGS: 00000246
[15375.808618] RAX: ffff880069c02da0 RBX: 0000000180140014 RCX: ffff8802451cff98
[15375.809753] RDX: 00000000000015a0 RSI: 0000000000000018 RDI: 0000000000000000
[15375.810881] RBP: ffff8801c06dfc10 R08: 0000000000000000 R09: 0000000000000000
[15375.812007] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801c06dfba0
[15375.813131] R13: ffffffff810ab7d5 R14: ffff8801c06dfb20 R15: ffff88024483fc70
[15375.814262] FS:  00007f9dcd485740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15375.815399] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15375.816583] CR2: 0000000001e259f8 CR3: 00000002289f7000 CR4: 00000000001407e0
[15375.817716] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15375.818842] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15375.819963] Stack:
[15375.821085]  ffffffff811a4cc3 ffff880000000000 ffff8801c06dfc10 0000000000000246
[15375.822227]  000000019449eff0 ffff88005e137c38 00000000000000d0 ffff88005e137c20
[15375.823368]  ffff88005e137c38 0000000002422000 ffff8802332eb110 0000000000000000
[15375.824506] Call Trace:
[15375.825630]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15375.826798]  [<ffffffff817ccadc>] _raw_spin_lock_nested+0x3c/0x80
[15375.827930]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15375.829057]  [<ffffffff811a4cc3>] copy_page_range+0x493/0xa20
[15375.830190]  [<ffffffff8107648f>] copy_process.part.26+0x146f/0x1a40
[15375.831320]  [<ffffffff81076c27>] do_fork+0xe7/0x490
[15375.832445]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15375.833558]  [<ffffffff8109ef5d>] ? finish_task_switch+0x7d/0x120
[15375.834647]  [<ffffffff8109ef1f>] ? finish_task_switch+0x3f/0x120
[15375.835727]  [<ffffffff817c6eb2>] ? __schedule+0x352/0x8c0
[15375.836821]  [<ffffffff813743a4>] ? lockdep_sys_exit_thunk+0x35/0x67
[15375.837883]  [<ffffffff81077056>] SyS_clone+0x16/0x20
[15375.838926]  [<ffffffff817cddd9>] stub_clone+0x69/0x90
[15375.839948]  [<ffffffff817cda52>] ? system_call_fastpath+0x12/0x17
[15375.840965] Code: d8 49 c1 e8 09 48 89 04 24 49 83 f0 01 41 83 e0 01 e8 01 ef ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 00 00 53 9d <48> 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 65 
[15375.843169] sending NMI to other CPUs:
[15375.844211] NMI backtrace for cpu 2
[15375.845266] CPU: 2 PID: 22823 Comm: trinity-c154 Tainted: G             L 3.18.0+ #107
[15375.847416] task: ffff880235dedb40 ti: ffff880094558000 task.ti: ffff880094558000
[15375.848501] RIP: 0010:[<ffffffff810fb22e>]  [<ffffffff810fb22e>] generic_exec_single+0xee/0x1b0
[15375.849601] RSP: 0000:ffff88009455b938  EFLAGS: 00000202
[15375.850691] RAX: 0000000000000008 RBX: ffff88009455b950 RCX: 0000000000000038
[15375.851789] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[15375.852887] RBP: ffff88009455b998 R08: ffff88024370c3f0 R09: 0000000000000000
[15375.853982] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[15375.855070] R13: 0000000000000001 R14: ffff880225e61780 R15: 0000000000000002
[15375.856156] FS:  00007f9dcd485740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15375.857242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15375.858325] CR2: 0000000001f9cfd8 CR3: 000000009ae2d000 CR4: 00000000001407e0
[15375.859414] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15375.860503] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15375.861585] Stack:
[15375.862664]  ffff88009455b948 0000000000000001 ffff88024370c3f0 0000000000000000
[15375.863747]  ffffffff81048cc0 ffff88009455ba48 0000000000000003 0000000077fab51f
[15375.864811]  0000000000000001 00000000ffffffff 0000000000000001 ffffffff81048cc0
[15375.865862] Call Trace:
[15375.866891]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15375.867913]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15375.868904]  [<ffffffff810fb390>] smp_call_function_single+0x70/0xd0
[15375.869873]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15375.870823]  [<ffffffff810fba89>] smp_call_function_many+0x2b9/0x320
[15375.871751]  [<ffffffff81049010>] flush_tlb_mm_range+0x90/0x1d0
[15375.872665]  [<ffffffff811a1a42>] tlb_flush_mmu_tlbonly+0x42/0x50
[15375.873561]  [<ffffffff811a2fa8>] unmap_single_vma+0x6b8/0x900
[15375.874438]  [<ffffffff811a32ec>] zap_page_range_single+0xfc/0x160
[15375.875309]  [<ffffffff811a34d4>] unmap_mapping_range+0x134/0x190
[15375.876176]  [<ffffffff81191f0d>] shmem_fallocate+0x4fd/0x520
[15375.877032]  [<ffffffff810bc857>] ? prepare_to_wait+0x27/0x90
[15375.877890]  [<ffffffff811e54b2>] do_fallocate+0x132/0x1d0
[15375.878743]  [<ffffffff811b79d8>] SyS_madvise+0x398/0x870
[15375.879599]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15375.880452] Code: 48 89 de 48 03 14 c5 a0 f0 d1 81 48 89 df e8 da de 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 0f 85 98 00 
[15375.882317] NMI backtrace for cpu 3
[15375.883194] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #107
[15375.885009] task: ffff88024348ada0 ti: ffff880243728000 task.ti: ffff880243728000
[15375.885929] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15375.886858] RSP: 0018:ffff88024372be08  EFLAGS: 00000046
[15375.887779] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15375.888702] RDX: 0000000000000000 RSI: ffff88024372bfd8 RDI: 0000000000000003
[15375.889621] RBP: ffff88024372be38 R08: 000000008baf5f56 R09: 0000000000000000
[15375.890534] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15375.891446] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880243728000
[15375.892362] FS:  0000000000000000(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15375.893286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15375.894208] CR2: 00000000a0d5536f CR3: 0000000001c11000 CR4: 00000000001407e0
[15375.895138] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15375.896066] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15375.896990] Stack:
[15375.897914]  000000034372be38 492161474c69335b ffffe8ffff402118 0000000000000005
[15375.898865]  ffffffff81cb19c0 0000000000000003 ffff88024372be88 ffffffff8165d385
[15375.899819]  00000dfe02d03a98 ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15375.900772] Call Trace:
[15375.901721]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15375.902675]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15375.903607]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15375.904520]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15375.905430] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15375.907432] NMI backtrace for cpu 0
[15375.908366] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #107
[15375.910239] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[15375.911180] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15375.912114] RSP: 0018:ffffffff81c03e38  EFLAGS: 00000046
[15375.913039] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15375.913969] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[15375.914895] RBP: ffffffff81c03e68 R08: 000000008baf5f56 R09: 0000000000000000
[15375.915851] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15375.916776] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[15375.917694] FS:  0000000000000000(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15375.918612] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15375.919528] CR2: 00007f8b7d063000 CR3: 0000000001c11000 CR4: 00000000001407f0
[15375.920459] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15375.921382] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15375.922294] Stack:
[15375.923199]  0000000081c03e68 76e04e88cc48b66f ffffe8fffee02118 0000000000000005
[15375.924126]  ffffffff81cb19c0 0000000000000000 ffffffff81c03eb8 ffffffff8165d385
[15375.925050]  00000dfe03416b9a ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15375.925994] Call Trace:
[15375.926912]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15375.927843]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15375.928768]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15375.929692]  [<ffffffff817b9639>] rest_init+0xc9/0xd0
[15375.930613]  [<ffffffff817b9575>] ? rest_init+0x5/0xd0
[15375.931531]  [<ffffffff81f21de6>] ? ftrace_init+0xa8/0x13b
[15375.932450]  [<ffffffff81f03041>] start_kernel+0x492/0x4b3
[15375.933364]  [<ffffffff81f0299f>] ? set_init_arg+0x55/0x55
[15375.934282]  [<ffffffff81f02581>] x86_64_start_reservations+0x2a/0x2c
[15375.935202]  [<ffffffff81f02675>] x86_64_start_kernel+0xf2/0xf6
[15375.936136] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15390.907159] INFO: rcu_sched self-detected stall on CPU
[15390.908187] 	1: (1 GPs behind) idle=52b/140000000000001/0 softirq=984789/984790 
[15390.909165] 	 (t=6000 jiffies g=472111 c=472110 q=0)
[15390.910126] Task dump for CPU 1:
[15390.911080] trinity-c195    R  running task    13248 20128  19429 0x0000000c
[15390.912048]  ffff880069c02da0 0000000017646c73 ffff880245003d68 ffffffff810a711c
[15390.913012]  ffffffff810a7082 0000000000000001 0000000000000002 0000000000000001
[15390.913958]  ffffffff81c52300 0000000000000092 ffff880245003d88 ffffffff810ab1cd
[15390.914898] Call Trace:
[15390.915822]  <IRQ>  [<ffffffff810a711c>] sched_show_task+0x11c/0x190
[15390.916767]  [<ffffffff810a7082>] ? sched_show_task+0x82/0x190
[15390.917748]  [<ffffffff810ab1cd>] dump_cpu_task+0x3d/0x50
[15390.918709]  [<ffffffff810d9760>] rcu_dump_cpu_stacks+0x90/0xd0
[15390.919675]  [<ffffffff810e0213>] rcu_check_callbacks+0x503/0x770
[15390.920637]  [<ffffffff8112fadc>] ? acct_account_cputime+0x1c/0x20
[15390.921595]  [<ffffffff810abb07>] ? account_system_time+0x97/0x180
[15390.922549]  [<ffffffff810e5c0b>] update_process_times+0x4b/0x80
[15390.923497]  [<ffffffff810f63a3>] ? tick_sched_timer+0x23/0x1b0
[15390.924449]  [<ffffffff810f63cf>] tick_sched_timer+0x4f/0x1b0
[15390.925402]  [<ffffffff810e6900>] __run_hrtimer+0xa0/0x230
[15390.926350]  [<ffffffff810e6beb>] ? hrtimer_interrupt+0x8b/0x260
[15390.927310]  [<ffffffff810f6380>] ? tick_init_highres+0x20/0x20
[15390.928214]  [<ffffffff810e6c67>] hrtimer_interrupt+0x107/0x260
[15390.929111]  [<ffffffff81031e9b>] local_apic_timer_interrupt+0x3b/0x70
[15390.930006]  [<ffffffff817d0505>] smp_apic_timer_interrupt+0x45/0x60
[15390.930898]  [<ffffffff817ce8ef>] apic_timer_interrupt+0x6f/0x80
[15390.931796]  <EOI>  [<ffffffff810c5cf4>] ? lock_acquire+0xb4/0x120
[15390.932697]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15390.933599]  [<ffffffff817ccadc>] _raw_spin_lock_nested+0x3c/0x80
[15390.934483]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15390.935343]  [<ffffffff811a4cc3>] copy_page_range+0x493/0xa20
[15390.936201]  [<ffffffff8107648f>] copy_process.part.26+0x146f/0x1a40
[15390.937050]  [<ffffffff81076c27>] do_fork+0xe7/0x490
[15390.937883]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15390.938701]  [<ffffffff8109ef5d>] ? finish_task_switch+0x7d/0x120
[15390.939505]  [<ffffffff8109ef1f>] ? finish_task_switch+0x3f/0x120
[15390.940303]  [<ffffffff817c6eb2>] ? __schedule+0x352/0x8c0
[15390.941094]  [<ffffffff813743a4>] ? lockdep_sys_exit_thunk+0x35/0x67
[15390.941893]  [<ffffffff81077056>] SyS_clone+0x16/0x20
[15390.942690]  [<ffffffff817cddd9>] stub_clone+0x69/0x90
[15390.943477]  [<ffffffff817cda52>] ? system_call_fastpath+0x12/0x17
[15390.944276] INFO: rcu_sched detected stalls on CPUs/tasks:
[15390.945074] 	1: (1 GPs behind) idle=52b/140000000000001/0 softirq=984789/984790 
[15390.945854] 	(detected by 0, t=6002 jiffies, g=472111, c=472110, q=0)
[15390.946637] Task dump for CPU 1:
[15390.947442] trinity-c195    R  running task    13248 20128  19429 0x0000000c
[15390.948244]  ffff8801c06dff60 ffffffff817c6eb2 000000000203f56b 0000000017646c73
[15390.949047]  ffffffff813743a4 0000000000000000 00007fffe01fcb40 0000000000000000
[15390.949842]  0000000000000000 0000000000000000 ffff8801c06dff40 ffffffff81077056
[15390.950641] Call Trace:
[15390.951431]  [<ffffffff817c6eb2>] ? __schedule+0x352/0x8c0
[15390.952218]  [<ffffffff813743a4>] ? lockdep_sys_exit_thunk+0x35/0x67
[15390.953006]  [<ffffffff81077056>] ? SyS_clone+0x16/0x20
[15390.953793]  [<ffffffff817cddd9>] ? stub_clone+0x69/0x90
[15390.954582]  [<ffffffff817cda52>] ? system_call_fastpath+0x12/0x17
[15403.789654] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [trinity-c154:22823]
[15403.790519] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15403.795345] CPU: 2 PID: 22823 Comm: trinity-c154 Tainted: G             L 3.18.0+ #107
[15403.797194] task: ffff880235dedb40 ti: ffff880094558000 task.ti: ffff880094558000
[15403.798137] RIP: 0010:[<ffffffff810fb22e>]  [<ffffffff810fb22e>] generic_exec_single+0xee/0x1b0
[15403.799068] RSP: 0000:ffff88009455b938  EFLAGS: 00000202
[15403.800047] RAX: 0000000000000008 RBX: ffffffff817ce620 RCX: 0000000000000038
[15403.800987] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[15403.801923] RBP: ffff88009455b998 R08: ffff88024370c3f0 R09: 0000000000000000
[15403.802866] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88009455b8a8
[15403.803780] R13: 0000000000406040 R14: ffff880094558000 R15: ffff880235dedb40
[15403.804720] FS:  00007f9dcd485740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15403.805633] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15403.806572] CR2: 0000000001f9cfd8 CR3: 000000009ae2d000 CR4: 00000000001407e0
[15403.807495] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15403.808423] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15403.809361] Stack:
[15403.810339]  ffff88009455b948 0000000000000001 ffff88024370c3f0 0000000000000000
[15403.811301]  ffffffff81048cc0 ffff88009455ba48 0000000000000003 0000000077fab51f
[15403.812253]  0000000000000001 00000000ffffffff 0000000000000001 ffffffff81048cc0
[15403.813215] Call Trace:
[15403.814145]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15403.815102]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15403.816051]  [<ffffffff810fb390>] smp_call_function_single+0x70/0xd0
[15403.817000]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15403.817966]  [<ffffffff810fba89>] smp_call_function_many+0x2b9/0x320
[15403.818923]  [<ffffffff81049010>] flush_tlb_mm_range+0x90/0x1d0
[15403.819903]  [<ffffffff811a1a42>] tlb_flush_mmu_tlbonly+0x42/0x50
[15403.820853]  [<ffffffff811a2fa8>] unmap_single_vma+0x6b8/0x900
[15403.821801]  [<ffffffff811a32ec>] zap_page_range_single+0xfc/0x160
[15403.822753]  [<ffffffff811a34d4>] unmap_mapping_range+0x134/0x190
[15403.823711]  [<ffffffff81191f0d>] shmem_fallocate+0x4fd/0x520
[15403.824645]  [<ffffffff810bc857>] ? prepare_to_wait+0x27/0x90
[15403.825601]  [<ffffffff811e54b2>] do_fallocate+0x132/0x1d0
[15403.826536]  [<ffffffff811b79d8>] SyS_madvise+0x398/0x870
[15403.827482]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15403.828423] Code: 48 89 de 48 03 14 c5 a0 f0 d1 81 48 89 df e8 da de 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 0f 85 98 00 
[15403.830537] sending NMI to other CPUs:
[15403.831515] NMI backtrace for cpu 1
[15403.832519] CPU: 1 PID: 20128 Comm: trinity-c195 Tainted: G             L 3.18.0+ #107
[15403.834564] task: ffff880069c02da0 ti: ffff8801c06dc000 task.ti: ffff8801c06dc000
[15403.835608] RIP: 0010:[<ffffffff810a5d95>]  [<ffffffff810a5d95>] scheduler_tick+0x35/0xe0
[15403.836664] RSP: 0018:ffff880245003e08  EFLAGS: 00000086
[15403.837713] RAX: 0000000000000000 RBX: ffff8802451d2f00 RCX: 0000000000000009
[15403.838768] RDX: 000000000007342f RSI: ffff8802451cc5c0 RDI: ffff8802451ce358
[15403.839816] RBP: ffff880245003e28 R08: 0000000000000000 R09: 0000000000000000
[15403.840866] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[15403.841908] R13: 00000000001d2f00 R14: ffff880069c02da0 R15: 00000e2fcb303732
[15403.842944] FS:  00007f9dcd485740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15403.843986] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15403.845025] CR2: 0000000001e259f8 CR3: 00000002289f7000 CR4: 00000000001407e0
[15403.846067] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15403.847105] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15403.848139] Stack:
[15403.849168]  ffff880069c02da0 0000000000000000 0000000000000001 ffff8802451cc880
[15403.850226]  ffff880245003e58 ffffffff810e5c24 ffffffff810f63a3 ffff8802451cce60
[15403.851285]  ffff8801c06dfb08 00000e048872d5bd ffff880245003e98 ffffffff810f63cf
[15403.852339] Call Trace:
[15403.853358]  <IRQ> 

[15403.854360]  [<ffffffff810e5c24>] update_process_times+0x64/0x80
[15403.855339]  [<ffffffff810f63a3>] ? tick_sched_timer+0x23/0x1b0
[15403.856296]  [<ffffffff810f63cf>] tick_sched_timer+0x4f/0x1b0
[15403.857230]  [<ffffffff810e6900>] __run_hrtimer+0xa0/0x230
[15403.858151]  [<ffffffff810e6beb>] ? hrtimer_interrupt+0x8b/0x260
[15403.859055]  [<ffffffff810f6380>] ? tick_init_highres+0x20/0x20
[15403.859937]  [<ffffffff810e6c67>] hrtimer_interrupt+0x107/0x260
[15403.860809]  [<ffffffff81031e9b>] local_apic_timer_interrupt+0x3b/0x70
[15403.861677]  [<ffffffff817d0505>] smp_apic_timer_interrupt+0x45/0x60
[15403.862539]  [<ffffffff817ce8ef>] apic_timer_interrupt+0x6f/0x80
[15403.863404]  <EOI> 

[15403.864270]  [<ffffffff810c5cf4>] ? lock_acquire+0xb4/0x120
[15403.865133]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15403.865999]  [<ffffffff817ccadc>] _raw_spin_lock_nested+0x3c/0x80
[15403.866855]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15403.867709]  [<ffffffff811a4cc3>] copy_page_range+0x493/0xa20
[15403.868551]  [<ffffffff8107648f>] copy_process.part.26+0x146f/0x1a40
[15403.869397]  [<ffffffff81076c27>] do_fork+0xe7/0x490
[15403.870244]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15403.871090]  [<ffffffff8109ef5d>] ? finish_task_switch+0x7d/0x120
[15403.871928]  [<ffffffff8109ef1f>] ? finish_task_switch+0x3f/0x120
[15403.872757]  [<ffffffff817c6eb2>] ? __schedule+0x352/0x8c0
[15403.873581]  [<ffffffff813743a4>] ? lockdep_sys_exit_thunk+0x35/0x67
[15403.874400]  [<ffffffff81077056>] SyS_clone+0x16/0x20
[15403.875215]  [<ffffffff817cddd9>] stub_clone+0x69/0x90
[15403.876034]  [<ffffffff817cda52>] ? system_call_fastpath+0x12/0x17
[15403.876854] Code: 56 41 55 49 c7 c5 00 2f 1d 00 41 54 65 44 8b 24 25 2c a0 00 00 4d 63 e4 53 4c 89 eb 4a 03 1c e5 a0 f0 d1 81 4c 8b b3 50 09 00 00 <e8> 46 5a 00 00 48 89 df e8 fe 6b 72 00 8b b3 88 00 00 00 85 f6 
[15403.878669] NMI backtrace for cpu 0
[15403.879531] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #107
[15403.881306] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[15403.882213] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15403.883130] RSP: 0018:ffffffff81c03e38  EFLAGS: 00000046
[15403.884046] RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
[15403.884974] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[15403.885895] RBP: ffffffff81c03e68 R08: 000000008baf94ae R09: 0000000000000000
[15403.886819] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
[15403.887743] R13: 0000000000000020 R14: 0000000000000003 R15: ffffffff81c00000
[15403.888643] FS:  0000000000000000(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15403.889530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15403.890410] CR2: 00007f8b7d063000 CR3: 0000000001c11000 CR4: 00000000001407f0
[15403.891303] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15403.892187] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15403.893049] Stack:
[15403.893886]  0000000081c03e68 76e04e88cc48b66f ffffe8fffee02118 0000000000000004
[15403.894739]  ffffffff81cb19c0 0000000000000000 ffffffff81c03eb8 ffffffff8165d385
[15403.895587]  00000e048864f135 ffffffff81cb1b38 ffffffff81cb19c0 ffffffff81d213f0
[15403.896437] Call Trace:
[15403.897273]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15403.898118]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15403.898959]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15403.899803]  [<ffffffff817b9639>] rest_init+0xc9/0xd0
[15403.900634]  [<ffffffff817b9575>] ? rest_init+0x5/0xd0
[15403.901461]  [<ffffffff81f21de6>] ? ftrace_init+0xa8/0x13b
[15403.902279]  [<ffffffff81f03041>] start_kernel+0x492/0x4b3
[15403.903098]  [<ffffffff81f0299f>] ? set_init_arg+0x55/0x55
[15403.903919]  [<ffffffff81f02581>] x86_64_start_reservations+0x2a/0x2c
[15403.904738]  [<ffffffff81f02675>] x86_64_start_kernel+0xf2/0xf6
[15403.905542] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15403.907331] NMI backtrace for cpu 3
[15403.908216] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #107
[15403.910040] task: ffff88024348ada0 ti: ffff880243728000 task.ti: ffff880243728000
[15403.910967] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15403.911886] RSP: 0018:ffff88024372be08  EFLAGS: 00000046
[15403.912807] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15403.913741] RDX: 0000000000000000 RSI: ffff88024372bfd8 RDI: 0000000000000003
[15403.914671] RBP: ffff88024372be38 R08: 000000008baf94ae R09: 0000000000000000
[15403.915600] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15403.916530] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880243728000
[15403.917459] FS:  0000000000000000(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15403.918390] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15403.919323] CR2: 00007f3d704ad510 CR3: 0000000001c11000 CR4: 00000000001407e0
[15403.920290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15403.921227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15403.922170] Stack:
[15403.923088]  000000034372be38 492161474c69335b ffffe8ffff402118 0000000000000005
[15403.924041]  ffffffff81cb19c0 0000000000000003 ffff88024372be88 ffffffff8165d385
[15403.924999]  00000e048856e4d7 ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15403.925963] Call Trace:
[15403.926915]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15403.927880]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15403.928851]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15403.929856]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15403.930814] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15403.932954] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 101.416 msecs
[15431.763416] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
[15431.764423] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15431.769803] CPU: 1 PID: 20128 Comm: trinity-c195 Tainted: G             L 3.18.0+ #107
[15431.771905] task: ffff880069c02da0 ti: ffff8801c06dc000 task.ti: ffff8801c06dc000
[15431.772970] RIP: 0010:[<ffffffff810c5cf4>]  [<ffffffff810c5cf4>] lock_acquire+0xb4/0x120
[15431.774087] RSP: 0018:ffff8801c06dfbb0  EFLAGS: 00000246
[15431.775164] RAX: ffff880069c02da0 RBX: 0000000180140014 RCX: ffff8802451cff98
[15431.776247] RDX: 00000000000015a0 RSI: 0000000000000018 RDI: 0000000000000000
[15431.777335] RBP: ffff8801c06dfc10 R08: 0000000000000000 R09: 0000000000000000
[15431.778416] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801c06dfba0
[15431.779490] R13: ffffffff810ab7d5 R14: ffff8801c06dfb20 R15: ffff88024483fc70
[15431.780563] FS:  00007f9dcd485740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15431.781650] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15431.782741] CR2: 0000000001e259f8 CR3: 00000002289f7000 CR4: 00000000001407e0
[15431.783871] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15431.784955] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15431.786032] Stack:
[15431.787101]  ffffffff811a4cc3 ffff880000000000 ffff8801c06dfc10 0000000000000246
[15431.788192]  000000019449eff0 ffff88005e137c38 00000000000000d0 ffff88005e137c20
[15431.789284]  ffff88005e137c38 0000000002422000 ffff8802332eb110 0000000000000000
[15431.790380] Call Trace:
[15431.791470]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15431.792574]  [<ffffffff817ccadc>] _raw_spin_lock_nested+0x3c/0x80
[15431.793719]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15431.794824]  [<ffffffff811a4cc3>] copy_page_range+0x493/0xa20
[15431.795932]  [<ffffffff8107648f>] copy_process.part.26+0x146f/0x1a40
[15431.797039]  [<ffffffff81076c27>] do_fork+0xe7/0x490
[15431.798144]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15431.799257]  [<ffffffff8109ef5d>] ? finish_task_switch+0x7d/0x120
[15431.800350]  [<ffffffff8109ef1f>] ? finish_task_switch+0x3f/0x120
[15431.801415]  [<ffffffff817c6eb2>] ? __schedule+0x352/0x8c0
[15431.802477]  [<ffffffff813743a4>] ? lockdep_sys_exit_thunk+0x35/0x67
[15431.803569]  [<ffffffff81077056>] SyS_clone+0x16/0x20
[15431.804612]  [<ffffffff817cddd9>] stub_clone+0x69/0x90
[15431.805638]  [<ffffffff817cda52>] ? system_call_fastpath+0x12/0x17
[15431.806658] Code: d8 49 c1 e8 09 48 89 04 24 49 83 f0 01 41 83 e0 01 e8 01 ef ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 00 00 53 9d <48> 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 65 
[15431.808871] sending NMI to other CPUs:
[15431.809913] NMI backtrace for cpu 2
[15431.810970] CPU: 2 PID: 22823 Comm: trinity-c154 Tainted: G             L 3.18.0+ #107
[15431.813123] task: ffff880235dedb40 ti: ffff880094558000 task.ti: ffff880094558000
[15431.814216] RIP: 0010:[<ffffffff810fb22e>]  [<ffffffff810fb22e>] generic_exec_single+0xee/0x1b0
[15431.815316] RSP: 0000:ffff88009455b938  EFLAGS: 00000202
[15431.816408] RAX: 0000000000000008 RBX: ffff88009455b950 RCX: 0000000000000038
[15431.817508] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[15431.818611] RBP: ffff88009455b998 R08: ffff88024370c3f0 R09: 0000000000000000
[15431.819708] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[15431.820802] R13: 0000000000000001 R14: ffff880225e61780 R15: 0000000000000002
[15431.821891] FS:  00007f9dcd485740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15431.822984] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15431.824070] CR2: 0000000001f9cfd8 CR3: 000000009ae2d000 CR4: 00000000001407e0
[15431.825163] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15431.826259] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15431.827346] Stack:
[15431.828426]  ffff88009455b948 0000000000000001 ffff88024370c3f0 0000000000000000
[15431.829512]  ffffffff81048cc0 ffff88009455ba48 0000000000000003 0000000077fab51f
[15431.830583]  0000000000000001 00000000ffffffff 0000000000000001 ffffffff81048cc0
[15431.831638] Call Trace:
[15431.832672]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15431.833696]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15431.834691]  [<ffffffff810fb390>] smp_call_function_single+0x70/0xd0
[15431.835665]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15431.836615]  [<ffffffff810fba89>] smp_call_function_many+0x2b9/0x320
[15431.837546]  [<ffffffff81049010>] flush_tlb_mm_range+0x90/0x1d0
[15431.838465]  [<ffffffff811a1a42>] tlb_flush_mmu_tlbonly+0x42/0x50
[15431.839368]  [<ffffffff811a2fa8>] unmap_single_vma+0x6b8/0x900
[15431.840247]  [<ffffffff811a32ec>] zap_page_range_single+0xfc/0x160
[15431.841119]  [<ffffffff811a34d4>] unmap_mapping_range+0x134/0x190
[15431.841984]  [<ffffffff81191f0d>] shmem_fallocate+0x4fd/0x520
[15431.842844]  [<ffffffff810bc857>] ? prepare_to_wait+0x27/0x90
[15431.843703]  [<ffffffff811e54b2>] do_fallocate+0x132/0x1d0
[15431.844564]  [<ffffffff811b79d8>] SyS_madvise+0x398/0x870
[15431.845423]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15431.846275] Code: 48 89 de 48 03 14 c5 a0 f0 d1 81 48 89 df e8 da de 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 0f 85 98 00 
[15431.848144] NMI backtrace for cpu 3
[15431.849023] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #107
[15431.850835] task: ffff88024348ada0 ti: ffff880243728000 task.ti: ffff880243728000
[15431.851759] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15431.852688] RSP: 0018:ffff88024372be08  EFLAGS: 00000046
[15431.853609] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15431.854534] RDX: 0000000000000000 RSI: ffff88024372bfd8 RDI: 0000000000000003
[15431.855455] RBP: ffff88024372be38 R08: 000000008baf9453 R09: 0000000000000000
[15431.856371] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15431.857285] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880243728000
[15431.858198] FS:  0000000000000000(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15431.859119] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15431.860043] CR2: 00007fb66fe8d000 CR3: 0000000001c11000 CR4: 00000000001407e0
[15431.860974] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15431.861900] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15431.862823] Stack:
[15431.863743]  000000034372be38 492161474c69335b ffffe8ffff402118 0000000000000005
[15431.864692]  ffffffff81cb19c0 0000000000000003 ffff88024372be88 ffffffff8165d385
[15431.865643]  00000e0b0cac5a1b ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15431.866597] Call Trace:
[15431.867545]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15431.868502]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15431.869430]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15431.870340]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15431.871248] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15431.873246] NMI backtrace for cpu 0
[15431.874180] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #107
[15431.875997] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[15431.876912] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15431.877839] RSP: 0018:ffffffff81c03e38  EFLAGS: 00000046
[15431.878761] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15431.879690] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[15431.880614] RBP: ffffffff81c03e68 R08: 000000008baf9453 R09: 0000000000000000
[15431.881543] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15431.882464] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[15431.883404] FS:  0000000000000000(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15431.884320] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15431.885237] CR2: 00007f8b7d063000 CR3: 0000000001c11000 CR4: 00000000001407f0
[15431.886166] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15431.887087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15431.887999] Stack:
[15431.888902]  0000000081c03e68 76e04e88cc48b66f ffffe8fffee02118 0000000000000005
[15431.889827]  ffffffff81cb19c0 0000000000000000 ffffffff81c03eb8 ffffffff8165d385
[15431.890750]  00000e0b0d00864a ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15431.891667] Call Trace:
[15431.892585]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15431.893539]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15431.894459]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15431.895379]  [<ffffffff817b9639>] rest_init+0xc9/0xd0
[15431.896298]  [<ffffffff817b9575>] ? rest_init+0x5/0xd0
[15431.897213]  [<ffffffff81f21de6>] ? ftrace_init+0xa8/0x13b
[15431.898129]  [<ffffffff81f03041>] start_kernel+0x492/0x4b3
[15431.899043]  [<ffffffff81f0299f>] ? set_init_arg+0x55/0x55
[15431.899956]  [<ffffffff81f02581>] x86_64_start_reservations+0x2a/0x2c
[15431.900870]  [<ffffffff81f02675>] x86_64_start_kernel+0xf2/0xf6
[15431.901781] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15455.749491] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
[15455.750478] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15455.755714] CPU: 1 PID: 20128 Comm: trinity-c195 Tainted: G             L 3.18.0+ #107
[15455.757759] task: ffff880069c02da0 ti: ffff8801c06dc000 task.ti: ffff8801c06dc000
[15455.758788] RIP: 0010:[<ffffffff810c5cf4>]  [<ffffffff810c5cf4>] lock_acquire+0xb4/0x120
[15455.759874] RSP: 0018:ffff8801c06dfbb0  EFLAGS: 00000246
[15455.760911] RAX: ffff880069c02da0 RBX: 0000000180140014 RCX: ffff8802451cff98
[15455.761959] RDX: 00000000000015a0 RSI: 0000000000000018 RDI: 0000000000000000
[15455.763008] RBP: ffff8801c06dfc10 R08: 0000000000000000 R09: 0000000000000000
[15455.764058] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801c06dfba0
[15455.765101] R13: ffffffff810ab7d5 R14: ffff8801c06dfb20 R15: ffff88024483fc70
[15455.766140] FS:  00007f9dcd485740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15455.767184] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15455.768231] CR2: 0000000001e259f8 CR3: 00000002289f7000 CR4: 00000000001407e0
[15455.769293] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15455.770388] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15455.771434] Stack:
[15455.772464]  ffffffff811a4cc3 ffff880000000000 ffff8801c06dfc10 0000000000000246
[15455.773521]  000000019449eff0 ffff88005e137c38 00000000000000d0 ffff88005e137c20
[15455.774595]  ffff88005e137c38 0000000002422000 ffff8802332eb110 0000000000000000
[15455.775650] Call Trace:
[15455.776695]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15455.777753]  [<ffffffff817ccadc>] _raw_spin_lock_nested+0x3c/0x80
[15455.778815]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15455.779897]  [<ffffffff811a4cc3>] copy_page_range+0x493/0xa20
[15455.780917]  [<ffffffff8107648f>] copy_process.part.26+0x146f/0x1a40
[15455.781937]  [<ffffffff81076c27>] do_fork+0xe7/0x490
[15455.782953]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15455.783954]  [<ffffffff8109ef5d>] ? finish_task_switch+0x7d/0x120
[15455.784939]  [<ffffffff8109ef1f>] ? finish_task_switch+0x3f/0x120
[15455.785912]  [<ffffffff817c6eb2>] ? __schedule+0x352/0x8c0
[15455.786881]  [<ffffffff813743a4>] ? lockdep_sys_exit_thunk+0x35/0x67
[15455.787847]  [<ffffffff81077056>] SyS_clone+0x16/0x20
[15455.788813]  [<ffffffff817cddd9>] stub_clone+0x69/0x90
[15455.789820]  [<ffffffff817cda52>] ? system_call_fastpath+0x12/0x17
[15455.790787] Code: d8 49 c1 e8 09 48 89 04 24 49 83 f0 01 41 83 e0 01 e8 01 ef ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 00 00 53 9d <48> 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 65 
[15455.792887] sending NMI to other CPUs:
[15455.793879] NMI backtrace for cpu 2
[15455.794911] CPU: 2 PID: 22823 Comm: trinity-c154 Tainted: G             L 3.18.0+ #107
[15455.796992] task: ffff880235dedb40 ti: ffff880094558000 task.ti: ffff880094558000
[15455.798051] RIP: 0010:[<ffffffff810fb22e>]  [<ffffffff810fb22e>] generic_exec_single+0xee/0x1b0
[15455.799120] RSP: 0000:ffff88009455b938  EFLAGS: 00000202
[15455.800181] RAX: 0000000000000008 RBX: ffff88009455b950 RCX: 0000000000000038
[15455.801251] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[15455.802312] RBP: ffff88009455b998 R08: ffff88024370c3f0 R09: 0000000000000000
[15455.803370] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[15455.804434] R13: 0000000000000001 R14: ffff880225e61780 R15: 0000000000000002
[15455.805497] FS:  00007f9dcd485740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15455.806568] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15455.807644] CR2: 0000000001f9cfd8 CR3: 000000009ae2d000 CR4: 00000000001407e0
[15455.808728] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15455.809809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15455.810889] Stack:
[15455.811962]  ffff88009455b948 0000000000000001 ffff88024370c3f0 0000000000000000
[15455.813062]  ffffffff81048cc0 ffff88009455ba48 0000000000000003 0000000077fab51f
[15455.814144]  0000000000000001 00000000ffffffff 0000000000000001 ffffffff81048cc0
[15455.815201] Call Trace:
[15455.816245]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15455.817277]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15455.818277]  [<ffffffff810fb390>] smp_call_function_single+0x70/0xd0
[15455.819260]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15455.820220]  [<ffffffff810fba89>] smp_call_function_many+0x2b9/0x320
[15455.821159]  [<ffffffff81049010>] flush_tlb_mm_range+0x90/0x1d0
[15455.822081]  [<ffffffff811a1a42>] tlb_flush_mmu_tlbonly+0x42/0x50
[15455.822985]  [<ffffffff811a2fa8>] unmap_single_vma+0x6b8/0x900
[15455.823872]  [<ffffffff811a32ec>] zap_page_range_single+0xfc/0x160
[15455.824751]  [<ffffffff811a34d4>] unmap_mapping_range+0x134/0x190
[15455.825624]  [<ffffffff81191f0d>] shmem_fallocate+0x4fd/0x520
[15455.826490]  [<ffffffff810bc857>] ? prepare_to_wait+0x27/0x90
[15455.827357]  [<ffffffff811e54b2>] do_fallocate+0x132/0x1d0
[15455.828223]  [<ffffffff811b79d8>] SyS_madvise+0x398/0x870
[15455.829087]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15455.829946] Code: 48 89 de 48 03 14 c5 a0 f0 d1 81 48 89 df e8 da de 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 0f 85 98 00 
[15455.831821] NMI backtrace for cpu 0
[15455.832705] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #107
[15455.834530] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[15455.835459] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15455.836393] RSP: 0018:ffffffff81c03e38  EFLAGS: 00000046
[15455.837317] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15455.838245] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[15455.839167] RBP: ffffffff81c03e68 R08: 000000008baf9406 R09: 0000000000000000
[15455.840088] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15455.841005] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[15455.841922] FS:  0000000000000000(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15455.842846] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15455.843771] CR2: 00007f8b7d063000 CR3: 0000000001c11000 CR4: 00000000001407f0
[15455.844702] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15455.845633] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15455.846557] Stack:
[15455.847479]  0000000081c03e68 76e04e88cc48b66f ffffe8fffee02118 0000000000000005
[15455.848429]  ffffffff81cb19c0 0000000000000000 ffffffff81c03eb8 ffffffff8165d385
[15455.849383]  00000e10a3641194 ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15455.850336] Call Trace:
[15455.851281]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15455.852238]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15455.853168]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15455.854074]  [<ffffffff817b9639>] rest_init+0xc9/0xd0
[15455.854979]  [<ffffffff817b9575>] ? rest_init+0x5/0xd0
[15455.855879]  [<ffffffff81f21de6>] ? ftrace_init+0xa8/0x13b
[15455.856777]  [<ffffffff81f03041>] start_kernel+0x492/0x4b3
[15455.857652]  [<ffffffff81f0299f>] ? set_init_arg+0x55/0x55
[15455.858509]  [<ffffffff81f02581>] x86_64_start_reservations+0x2a/0x2c
[15455.859360]  [<ffffffff81f02675>] x86_64_start_kernel+0xf2/0xf6
[15455.860198] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15455.862051] NMI backtrace for cpu 3
[15455.862965] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #107
[15455.864827] task: ffff88024348ada0 ti: ffff880243728000 task.ti: ffff880243728000
[15455.865772] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15455.866725] RSP: 0018:ffff88024372be08  EFLAGS: 00000046
[15455.867666] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15455.868617] RDX: 0000000000000000 RSI: ffff88024372bfd8 RDI: 0000000000000003
[15455.869587] RBP: ffff88024372be38 R08: 000000008baf9406 R09: 0000000000000000
[15455.870547] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15455.871512] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880243728000
[15455.872459] FS:  0000000000000000(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15455.873402] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15455.874337] CR2: 00007fcb01b8ddc8 CR3: 0000000001c11000 CR4: 00000000001407e0
[15455.875280] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15455.876223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15455.877167] Stack:
[15455.878099]  000000034372be38 492161474c69335b ffffe8ffff402118 0000000000000005
[15455.879058]  ffffffff81cb19c0 0000000000000003 ffff88024372be88 ffffffff8165d385
[15455.880070]  00000e10a32f62cd ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15455.881036] Call Trace:
[15455.881988]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15455.882958]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15455.883922]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15455.884889]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15455.885855] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15479.735569] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
[15479.736560] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15479.741807] CPU: 1 PID: 20128 Comm: trinity-c195 Tainted: G             L 3.18.0+ #107
[15479.743855] task: ffff880069c02da0 ti: ffff8801c06dc000 task.ti: ffff8801c06dc000
[15479.744885] RIP: 0010:[<ffffffff810c5cf4>]  [<ffffffff810c5cf4>] lock_acquire+0xb4/0x120
[15479.745974] RSP: 0018:ffff8801c06dfbb0  EFLAGS: 00000246
[15479.747013] RAX: ffff880069c02da0 RBX: 0000000180140014 RCX: ffff8802451cff98
[15479.748060] RDX: 00000000000015a0 RSI: 0000000000000018 RDI: 0000000000000000
[15479.749108] RBP: ffff8801c06dfc10 R08: 0000000000000000 R09: 0000000000000000
[15479.750160] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801c06dfba0
[15479.751208] R13: ffffffff810ab7d5 R14: ffff8801c06dfb20 R15: ffff88024483fc70
[15479.752252] FS:  00007f9dcd485740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15479.753302] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15479.754354] CR2: 0000000001e259f8 CR3: 00000002289f7000 CR4: 00000000001407e0
[15479.755417] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15479.756512] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15479.757559] Stack:
[15479.758599]  ffffffff811a4cc3 ffff880000000000 ffff8801c06dfc10 0000000000000246
[15479.759664]  000000019449eff0 ffff88005e137c38 00000000000000d0 ffff88005e137c20
[15479.760720]  ffff88005e137c38 0000000002422000 ffff8802332eb110 0000000000000000
[15479.761780] Call Trace:
[15479.762833]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15479.763899]  [<ffffffff817ccadc>] _raw_spin_lock_nested+0x3c/0x80
[15479.764966]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15479.766079]  [<ffffffff811a4cc3>] copy_page_range+0x493/0xa20
[15479.767149]  [<ffffffff8107648f>] copy_process.part.26+0x146f/0x1a40
[15479.768222]  [<ffffffff81076c27>] do_fork+0xe7/0x490
[15479.769292]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15479.770362]  [<ffffffff8109ef5d>] ? finish_task_switch+0x7d/0x120
[15479.771437]  [<ffffffff8109ef1f>] ? finish_task_switch+0x3f/0x120
[15479.772489]  [<ffffffff817c6eb2>] ? __schedule+0x352/0x8c0
[15479.773518]  [<ffffffff813743a4>] ? lockdep_sys_exit_thunk+0x35/0x67
[15479.774548]  [<ffffffff81077056>] SyS_clone+0x16/0x20
[15479.775590]  [<ffffffff817cddd9>] stub_clone+0x69/0x90
[15479.776595]  [<ffffffff817cda52>] ? system_call_fastpath+0x12/0x17
[15479.777587] Code: d8 49 c1 e8 09 48 89 04 24 49 83 f0 01 41 83 e0 01 e8 01 ef ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 00 00 53 9d <48> 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 65 
[15479.779714] sending NMI to other CPUs:
[15479.780720] NMI backtrace for cpu 2
[15479.781764] CPU: 2 PID: 22823 Comm: trinity-c154 Tainted: G             L 3.18.0+ #107
[15479.783889] task: ffff880235dedb40 ti: ffff880094558000 task.ti: ffff880094558000
[15479.784972] RIP: 0010:[<ffffffff810fb22a>]  [<ffffffff810fb22a>] generic_exec_single+0xea/0x1b0
[15479.786068] RSP: 0000:ffff88009455b938  EFLAGS: 00000202
[15479.787156] RAX: 0000000000000008 RBX: ffff88009455b950 RCX: 0000000000000038
[15479.788247] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[15479.789336] RBP: ffff88009455b998 R08: ffff88024370c3f0 R09: 0000000000000000
[15479.790430] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[15479.791519] R13: 0000000000000001 R14: ffff880225e61780 R15: 0000000000000002
[15479.792602] FS:  00007f9dcd485740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15479.793683] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15479.794764] CR2: 0000000001f9cfd8 CR3: 000000009ae2d000 CR4: 00000000001407e0
[15479.795847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15479.796924] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15479.797999] Stack:
[15479.799068]  ffff88009455b948 0000000000000001 ffff88024370c3f0 0000000000000000
[15479.800165]  ffffffff81048cc0 ffff88009455ba48 0000000000000003 0000000077fab51f
[15479.801246]  0000000000000001 00000000ffffffff 0000000000000001 ffffffff81048cc0
[15479.802301] Call Trace:
[15479.803339]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15479.804368]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15479.805369]  [<ffffffff810fb390>] smp_call_function_single+0x70/0xd0
[15479.806347]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15479.807302]  [<ffffffff810fba89>] smp_call_function_many+0x2b9/0x320
[15479.808240]  [<ffffffff81049010>] flush_tlb_mm_range+0x90/0x1d0
[15479.809161]  [<ffffffff811a1a42>] tlb_flush_mmu_tlbonly+0x42/0x50
[15479.810064]  [<ffffffff811a2fa8>] unmap_single_vma+0x6b8/0x900
[15479.810945]  [<ffffffff811a32ec>] zap_page_range_single+0xfc/0x160
[15479.811823]  [<ffffffff811a34d4>] unmap_mapping_range+0x134/0x190
[15479.812694]  [<ffffffff81191f0d>] shmem_fallocate+0x4fd/0x520
[15479.813558]  [<ffffffff810bc857>] ? prepare_to_wait+0x27/0x90
[15479.814420]  [<ffffffff811e54b2>] do_fallocate+0x132/0x1d0
[15479.815282]  [<ffffffff811b79d8>] SyS_madvise+0x398/0x870
[15479.816142]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15479.816999] Code: c0 3a 1d 00 48 89 de 48 03 14 c5 a0 f0 d1 81 48 89 df e8 da de 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 
[15479.818876] NMI backtrace for cpu 0
[15479.819760] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #107
[15479.821583] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[15479.822511] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15479.823444] RSP: 0018:ffffffff81c03e38  EFLAGS: 00000046
[15479.824367] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15479.825295] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[15479.826216] RBP: ffffffff81c03e68 R08: 000000008baf93ba R09: 0000000000000000
[15479.827135] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15479.828051] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[15479.828966] FS:  0000000000000000(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15479.829890] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15479.830816] CR2: 00007f8b7d063000 CR3: 0000000001c11000 CR4: 00000000001407f0
[15479.831748] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15479.832678] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15479.833606] Stack:
[15479.834532]  0000000081c03e68 76e04e88cc48b66f ffffe8fffee02118 0000000000000005
[15479.835483]  ffffffff81cb19c0 0000000000000000 ffffffff81c03eb8 ffffffff8165d385
[15479.836437]  00000e1639f2763b ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15479.837392] Call Trace:
[15479.838340]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15479.839299]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15479.840232]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15479.841143]  [<ffffffff817b9639>] rest_init+0xc9/0xd0
[15479.842054]  [<ffffffff817b9575>] ? rest_init+0x5/0xd0
[15479.842961]  [<ffffffff81f21de6>] ? ftrace_init+0xa8/0x13b
[15479.843859]  [<ffffffff81f03041>] start_kernel+0x492/0x4b3
[15479.844740]  [<ffffffff81f0299f>] ? set_init_arg+0x55/0x55
[15479.845601]  [<ffffffff81f02581>] x86_64_start_reservations+0x2a/0x2c
[15479.846457]  [<ffffffff81f02675>] x86_64_start_kernel+0xf2/0xf6
[15479.847298] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15479.849161] NMI backtrace for cpu 3
[15479.850081] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #107
[15479.851956] task: ffff88024348ada0 ti: ffff880243728000 task.ti: ffff880243728000
[15479.852907] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15479.853869] RSP: 0018:ffff88024372be08  EFLAGS: 00000046
[15479.854814] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15479.855797] RDX: 0000000000000000 RSI: ffff88024372bfd8 RDI: 0000000000000003
[15479.856752] RBP: ffff88024372be38 R08: 000000008baf93ba R09: 0000000000000000
[15479.857706] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15479.858655] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880243728000
[15479.859594] FS:  0000000000000000(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15479.860536] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15479.861477] CR2: 00007fb66fe8d000 CR3: 0000000001c11000 CR4: 00000000001407e0
[15479.862424] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15479.863374] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15479.864320] Stack:
[15479.865259]  000000034372be38 492161474c69335b ffffe8ffff402118 0000000000000005
[15479.866246]  ffffffff81cb19c0 0000000000000003 ffff88024372be88 ffffffff8165d385
[15479.867210]  00000e1639b23221 ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15479.868181] Call Trace:
[15479.869141]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15479.870115]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15479.871084]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15479.872056]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15479.873024] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15503.721645] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
[15503.722660] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15503.728065] CPU: 1 PID: 20128 Comm: trinity-c195 Tainted: G             L 3.18.0+ #107
[15503.730161] task: ffff880069c02da0 ti: ffff8801c06dc000 task.ti: ffff8801c06dc000
[15503.731220] RIP: 0010:[<ffffffff810c5cf4>]  [<ffffffff810c5cf4>] lock_acquire+0xb4/0x120
[15503.732323] RSP: 0018:ffff8801c06dfbb0  EFLAGS: 00000246
[15503.733390] RAX: ffff880069c02da0 RBX: 0000000180140014 RCX: ffff8802451cff98
[15503.734464] RDX: 00000000000015a0 RSI: 0000000000000018 RDI: 0000000000000000
[15503.735538] RBP: ffff8801c06dfc10 R08: 0000000000000000 R09: 0000000000000000
[15503.736613] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801c06dfba0
[15503.737686] R13: ffffffff810ab7d5 R14: ffff8801c06dfb20 R15: ffff88024483fc70
[15503.738753] FS:  00007f9dcd485740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15503.739825] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15503.740900] CR2: 0000000001e259f8 CR3: 00000002289f7000 CR4: 00000000001407e0
[15503.742025] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15503.743107] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15503.744174] Stack:
[15503.745237]  ffffffff811a4cc3 ffff880000000000 ffff8801c06dfc10 0000000000000246
[15503.746321]  000000019449eff0 ffff88005e137c38 00000000000000d0 ffff88005e137c20
[15503.747402]  ffff88005e137c38 0000000002422000 ffff8802332eb110 0000000000000000
[15503.748483] Call Trace:
[15503.749558]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15503.750647]  [<ffffffff817ccadc>] _raw_spin_lock_nested+0x3c/0x80
[15503.751777]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15503.752869]  [<ffffffff811a4cc3>] copy_page_range+0x493/0xa20
[15503.753960]  [<ffffffff8107648f>] copy_process.part.26+0x146f/0x1a40
[15503.755052]  [<ffffffff81076c27>] do_fork+0xe7/0x490
[15503.756141]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15503.757236]  [<ffffffff8109ef5d>] ? finish_task_switch+0x7d/0x120
[15503.758334]  [<ffffffff8109ef1f>] ? finish_task_switch+0x3f/0x120
[15503.759408]  [<ffffffff817c6eb2>] ? __schedule+0x352/0x8c0
[15503.760459]  [<ffffffff813743a4>] ? lockdep_sys_exit_thunk+0x35/0x67
[15503.761510]  [<ffffffff81077056>] SyS_clone+0x16/0x20
[15503.762592]  [<ffffffff817cddd9>] stub_clone+0x69/0x90
[15503.763620]  [<ffffffff817cda52>] ? system_call_fastpath+0x12/0x17
[15503.764630] Code: d8 49 c1 e8 09 48 89 04 24 49 83 f0 01 41 83 e0 01 e8 01 ef ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 00 00 53 9d <48> 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 65 
[15503.766839] sending NMI to other CPUs:
[15503.767869] NMI backtrace for cpu 2
[15503.768911] CPU: 2 PID: 22823 Comm: trinity-c154 Tainted: G             L 3.18.0+ #107
[15503.771029] task: ffff880235dedb40 ti: ffff880094558000 task.ti: ffff880094558000
[15503.772108] RIP: 0010:[<ffffffff810fb22e>]  [<ffffffff810fb22e>] generic_exec_single+0xee/0x1b0
[15503.773197] RSP: 0000:ffff88009455b938  EFLAGS: 00000202
[15503.774278] RAX: 0000000000000008 RBX: ffff88009455b950 RCX: 0000000000000038
[15503.775363] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[15503.776449] RBP: ffff88009455b998 R08: ffff88024370c3f0 R09: 0000000000000000
[15503.777541] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[15503.778624] R13: 0000000000000001 R14: ffff880225e61780 R15: 0000000000000002
[15503.779703] FS:  00007f9dcd485740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15503.780783] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15503.781859] CR2: 0000000001f9cfd8 CR3: 000000009ae2d000 CR4: 00000000001407e0
[15503.782940] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15503.784017] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15503.785088] Stack:
[15503.786155]  ffff88009455b948 0000000000000001 ffff88024370c3f0 0000000000000000
[15503.787248]  ffffffff81048cc0 ffff88009455ba48 0000000000000003 0000000077fab51f
[15503.788321]  0000000000000001 00000000ffffffff 0000000000000001 ffffffff81048cc0
[15503.789368] Call Trace:
[15503.790402]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15503.791424]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15503.792418]  [<ffffffff810fb390>] smp_call_function_single+0x70/0xd0
[15503.793393]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15503.794348]  [<ffffffff810fba89>] smp_call_function_many+0x2b9/0x320
[15503.795279]  [<ffffffff81049010>] flush_tlb_mm_range+0x90/0x1d0
[15503.796195]  [<ffffffff811a1a42>] tlb_flush_mmu_tlbonly+0x42/0x50
[15503.797090]  [<ffffffff811a2fa8>] unmap_single_vma+0x6b8/0x900
[15503.797967]  [<ffffffff811a32ec>] zap_page_range_single+0xfc/0x160
[15503.798842]  [<ffffffff811a34d4>] unmap_mapping_range+0x134/0x190
[15503.799708]  [<ffffffff81191f0d>] shmem_fallocate+0x4fd/0x520
[15503.800568]  [<ffffffff810bc857>] ? prepare_to_wait+0x27/0x90
[15503.801426]  [<ffffffff811e54b2>] do_fallocate+0x132/0x1d0
[15503.802284]  [<ffffffff811b79d8>] SyS_madvise+0x398/0x870
[15503.803144]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15503.803995] Code: 48 89 de 48 03 14 c5 a0 f0 d1 81 48 89 df e8 da de 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 0f 85 98 00 
[15503.805861] NMI backtrace for cpu 0
[15503.806741] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #107
[15503.808556] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[15503.809478] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15503.810409] RSP: 0018:ffffffff81c03e38  EFLAGS: 00000046
[15503.811329] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15503.812252] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[15503.813170] RBP: ffffffff81c03e68 R08: 000000008baf936d R09: 0000000000000000
[15503.814082] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15503.814996] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[15503.815908] FS:  0000000000000000(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15503.816831] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15503.817751] CR2: 00007f8b7d063000 CR3: 0000000001c11000 CR4: 00000000001407f0
[15503.818679] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15503.819606] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15503.820528] Stack:
[15503.821448]  0000000081c03e68 76e04e88cc48b66f ffffe8fffee02118 0000000000000005
[15503.822395]  ffffffff81cb19c0 0000000000000000 ffffffff81c03eb8 ffffffff8165d385
[15503.823347]  00000e1bd0854ef6 ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15503.824298] Call Trace:
[15503.825238]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15503.826192]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15503.827119]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15503.828030]  [<ffffffff817b9639>] rest_init+0xc9/0xd0
[15503.828933]  [<ffffffff817b9575>] ? rest_init+0x5/0xd0
[15503.829838]  [<ffffffff81f21de6>] ? ftrace_init+0xa8/0x13b
[15503.830737]  [<ffffffff81f03041>] start_kernel+0x492/0x4b3
[15503.831615]  [<ffffffff81f0299f>] ? set_init_arg+0x55/0x55
[15503.832476]  [<ffffffff81f02581>] x86_64_start_reservations+0x2a/0x2c
[15503.833324]  [<ffffffff81f02675>] x86_64_start_kernel+0xf2/0xf6
[15503.834162] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15503.836024] NMI backtrace for cpu 3
[15503.836937] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #107
[15503.838808] task: ffff88024348ada0 ti: ffff880243728000 task.ti: ffff880243728000
[15503.839754] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15503.840709] RSP: 0018:ffff88024372be08  EFLAGS: 00000046
[15503.841668] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15503.842628] RDX: 0000000000000000 RSI: ffff88024372bfd8 RDI: 0000000000000003
[15503.843584] RBP: ffff88024372be38 R08: 000000008baf936d R09: 0000000000000000
[15503.844532] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15503.845472] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880243728000
[15503.846410] FS:  0000000000000000(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15503.847351] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15503.848287] CR2: 00007f3d70900000 CR3: 0000000001c11000 CR4: 00000000001407e0
[15503.849227] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15503.850175] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15503.851116] Stack:
[15503.852075]  000000034372be38 492161474c69335b ffffe8ffff402118 0000000000000005
[15503.853033]  ffffffff81cb19c0 0000000000000003 ffff88024372be88 ffffffff8165d385
[15503.853996]  00000e1bd035579e ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15503.854964] Call Trace:
[15503.855919]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15503.856889]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15503.857856]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15503.858823]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15503.859792] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15527.707724] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
[15527.708738] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15527.714144] CPU: 1 PID: 20128 Comm: trinity-c195 Tainted: G             L 3.18.0+ #107
[15527.716239] task: ffff880069c02da0 ti: ffff8801c06dc000 task.ti: ffff8801c06dc000
[15527.717296] RIP: 0010:[<ffffffff810c5cf4>]  [<ffffffff810c5cf4>] lock_acquire+0xb4/0x120
[15527.718404] RSP: 0018:ffff8801c06dfbb0  EFLAGS: 00000246
[15527.719467] RAX: ffff880069c02da0 RBX: 0000000180140014 RCX: ffff8802451cff98
[15527.720542] RDX: 00000000000015a0 RSI: 0000000000000018 RDI: 0000000000000000
[15527.721617] RBP: ffff8801c06dfc10 R08: 0000000000000000 R09: 0000000000000000
[15527.722698] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801c06dfba0
[15527.723769] R13: ffffffff810ab7d5 R14: ffff8801c06dfb20 R15: ffff88024483fc70
[15527.724838] FS:  00007f9dcd485740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15527.725913] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15527.726988] CR2: 0000000001e259f8 CR3: 00000002289f7000 CR4: 00000000001407e0
[15527.728114] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15527.729195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15527.730264] Stack:
[15527.731329]  ffffffff811a4cc3 ffff880000000000 ffff8801c06dfc10 0000000000000246
[15527.732413]  000000019449eff0 ffff88005e137c38 00000000000000d0 ffff88005e137c20
[15527.733496]  ffff88005e137c38 0000000002422000 ffff8802332eb110 0000000000000000
[15527.734579] Call Trace:
[15527.735657]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15527.736745]  [<ffffffff817ccadc>] _raw_spin_lock_nested+0x3c/0x80
[15527.737876]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15527.738967]  [<ffffffff811a4cc3>] copy_page_range+0x493/0xa20
[15527.740061]  [<ffffffff8107648f>] copy_process.part.26+0x146f/0x1a40
[15527.741153]  [<ffffffff81076c27>] do_fork+0xe7/0x490
[15527.742247]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15527.743341]  [<ffffffff8109ef5d>] ? finish_task_switch+0x7d/0x120
[15527.744436]  [<ffffffff8109ef1f>] ? finish_task_switch+0x3f/0x120
[15527.745510]  [<ffffffff817c6eb2>] ? __schedule+0x352/0x8c0
[15527.746561]  [<ffffffff813743a4>] ? lockdep_sys_exit_thunk+0x35/0x67
[15527.747613]  [<ffffffff81077056>] SyS_clone+0x16/0x20
[15527.748697]  [<ffffffff817cddd9>] stub_clone+0x69/0x90
[15527.749728]  [<ffffffff817cda52>] ? system_call_fastpath+0x12/0x17
[15527.750744] Code: d8 49 c1 e8 09 48 89 04 24 49 83 f0 01 41 83 e0 01 e8 01 ef ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 00 00 53 9d <48> 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 65 
[15527.752936] sending NMI to other CPUs:
[15527.753966] NMI backtrace for cpu 2
[15527.755011] CPU: 2 PID: 22823 Comm: trinity-c154 Tainted: G             L 3.18.0+ #107
[15527.757137] task: ffff880235dedb40 ti: ffff880094558000 task.ti: ffff880094558000
[15527.758219] RIP: 0010:[<ffffffff810fb22a>]  [<ffffffff810fb22a>] generic_exec_single+0xea/0x1b0
[15527.759312] RSP: 0000:ffff88009455b938  EFLAGS: 00000202
[15527.760398] RAX: 0000000000000008 RBX: ffff88009455b950 RCX: 0000000000000038
[15527.761486] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[15527.762575] RBP: ffff88009455b998 R08: ffff88024370c3f0 R09: 0000000000000000
[15527.763669] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[15527.764759] R13: 0000000000000001 R14: ffff880225e61780 R15: 0000000000000002
[15527.765841] FS:  00007f9dcd485740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15527.766922] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15527.768002] CR2: 0000000001f9cfd8 CR3: 000000009ae2d000 CR4: 00000000001407e0
[15527.769085] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15527.770164] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15527.771239] Stack:
[15527.772311]  ffff88009455b948 0000000000000001 ffff88024370c3f0 0000000000000000
[15527.773405]  ffffffff81048cc0 ffff88009455ba48 0000000000000003 0000000077fab51f
[15527.774481]  0000000000000001 00000000ffffffff 0000000000000001 ffffffff81048cc0
[15527.775537] Call Trace:
[15527.776573]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15527.777602]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15527.778601]  [<ffffffff810fb390>] smp_call_function_single+0x70/0xd0
[15527.779578]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15527.780533]  [<ffffffff810fba89>] smp_call_function_many+0x2b9/0x320
[15527.781466]  [<ffffffff81049010>] flush_tlb_mm_range+0x90/0x1d0
[15527.782385]  [<ffffffff811a1a42>] tlb_flush_mmu_tlbonly+0x42/0x50
[15527.783286]  [<ffffffff811a2fa8>] unmap_single_vma+0x6b8/0x900
[15527.784169]  [<ffffffff811a32ec>] zap_page_range_single+0xfc/0x160
[15527.785043]  [<ffffffff811a34d4>] unmap_mapping_range+0x134/0x190
[15527.785910]  [<ffffffff81191f0d>] shmem_fallocate+0x4fd/0x520
[15527.786771]  [<ffffffff810bc857>] ? prepare_to_wait+0x27/0x90
[15527.787633]  [<ffffffff811e54b2>] do_fallocate+0x132/0x1d0
[15527.788494]  [<ffffffff811b79d8>] SyS_madvise+0x398/0x870
[15527.789353]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15527.790209] Code: c0 3a 1d 00 48 89 de 48 03 14 c5 a0 f0 d1 81 48 89 df e8 da de 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 
[15527.792084] NMI backtrace for cpu 0
[15527.792967] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #107
[15527.794790] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[15527.795714] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15527.796646] RSP: 0018:ffffffff81c03e38  EFLAGS: 00000046
[15527.797567] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15527.798491] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[15527.799410] RBP: ffffffff81c03e68 R08: 000000008baf9322 R09: 0000000000000000
[15527.800326] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15527.801238] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[15527.802152] FS:  0000000000000000(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15527.803078] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15527.804002] CR2: 00007f8b7d063000 CR3: 0000000001c11000 CR4: 00000000001407f0
[15527.804932] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15527.805862] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15527.806783] Stack:
[15527.807706]  0000000081c03e68 76e04e88cc48b66f ffffe8fffee02118 0000000000000005
[15527.808654]  ffffffff81cb19c0 0000000000000000 ffffffff81c03eb8 ffffffff8165d385
[15527.809606]  00000e216708a957 ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15527.810560] Call Trace:
[15527.811502]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15527.812458]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15527.813387]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15527.814298]  [<ffffffff817b9639>] rest_init+0xc9/0xd0
[15527.815206]  [<ffffffff817b9575>] ? rest_init+0x5/0xd0
[15527.816111]  [<ffffffff81f21de6>] ? ftrace_init+0xa8/0x13b
[15527.817012]  [<ffffffff81f03041>] start_kernel+0x492/0x4b3
[15527.817888]  [<ffffffff81f0299f>] ? set_init_arg+0x55/0x55
[15527.818748]  [<ffffffff81f02581>] x86_64_start_reservations+0x2a/0x2c
[15527.819602]  [<ffffffff81f02675>] x86_64_start_kernel+0xf2/0xf6
[15527.820441] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15527.822303] NMI backtrace for cpu 3
[15527.823217] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #107
[15527.825093] task: ffff88024348ada0 ti: ffff880243728000 task.ti: ffff880243728000
[15527.826045] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15527.827004] RSP: 0018:ffff88024372be08  EFLAGS: 00000046
[15527.827975] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15527.828929] RDX: 0000000000000000 RSI: ffff88024372bfd8 RDI: 0000000000000003
[15527.829884] RBP: ffff88024372be38 R08: 000000008baf9322 R09: 0000000000000000
[15527.830835] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15527.831782] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880243728000
[15527.832725] FS:  0000000000000000(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15527.833671] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15527.834610] CR2: 00007fb66fe8d000 CR3: 0000000001c11000 CR4: 00000000001407e0
[15527.835553] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15527.836504] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15527.837452] Stack:
[15527.838415]  000000034372be38 492161474c69335b ffffe8ffff402118 0000000000000005
[15527.839379]  ffffffff81cb19c0 0000000000000003 ffff88024372be88 ffffffff8165d385
[15527.840346]  00000e2166b83852 ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15527.841318] Call Trace:
[15527.842279]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15527.843254]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15527.844223]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15527.845193]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15527.846163] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15551.693803] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
[15551.694820] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15551.700151] CPU: 1 PID: 20128 Comm: trinity-c195 Tainted: G             L 3.18.0+ #107
[15551.702255] task: ffff880069c02da0 ti: ffff8801c06dc000 task.ti: ffff8801c06dc000
[15551.703316] RIP: 0010:[<ffffffff810c5cf4>]  [<ffffffff810c5cf4>] lock_acquire+0xb4/0x120
[15551.704425] RSP: 0018:ffff8801c06dfbb0  EFLAGS: 00000246
[15551.705491] RAX: ffff880069c02da0 RBX: 0000000180140014 RCX: ffff8802451cff98
[15551.706569] RDX: 00000000000015a0 RSI: 0000000000000018 RDI: 0000000000000000
[15551.707645] RBP: ffff8801c06dfc10 R08: 0000000000000000 R09: 0000000000000000
[15551.708722] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801c06dfba0
[15551.709794] R13: ffffffff810ab7d5 R14: ffff8801c06dfb20 R15: ffff88024483fc70
[15551.710862] FS:  00007f9dcd485740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15551.711937] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15551.713015] CR2: 0000000001e259f8 CR3: 00000002289f7000 CR4: 00000000001407e0
[15551.714153] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15551.715236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15551.716311] Stack:
[15551.717374]  ffffffff811a4cc3 ffff880000000000 ffff8801c06dfc10 0000000000000246
[15551.718462]  000000019449eff0 ffff88005e137c38 00000000000000d0 ffff88005e137c20
[15551.719545]  ffff88005e137c38 0000000002422000 ffff8802332eb110 0000000000000000
[15551.720630] Call Trace:
[15551.721706]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15551.722796]  [<ffffffff817ccadc>] _raw_spin_lock_nested+0x3c/0x80
[15551.723935]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15551.725027]  [<ffffffff811a4cc3>] copy_page_range+0x493/0xa20
[15551.726119]  [<ffffffff8107648f>] copy_process.part.26+0x146f/0x1a40
[15551.727214]  [<ffffffff81076c27>] do_fork+0xe7/0x490
[15551.728307]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15551.729402]  [<ffffffff8109ef5d>] ? finish_task_switch+0x7d/0x120
[15551.730507]  [<ffffffff8109ef1f>] ? finish_task_switch+0x3f/0x120
[15551.731572]  [<ffffffff817c6eb2>] ? __schedule+0x352/0x8c0
[15551.732624]  [<ffffffff813743a4>] ? lockdep_sys_exit_thunk+0x35/0x67
[15551.733679]  [<ffffffff81077056>] SyS_clone+0x16/0x20
[15551.734760]  [<ffffffff817cddd9>] stub_clone+0x69/0x90
[15551.735791]  [<ffffffff817cda52>] ? system_call_fastpath+0x12/0x17
[15551.736805] Code: d8 49 c1 e8 09 48 89 04 24 49 83 f0 01 41 83 e0 01 e8 01 ef ff ff 65 48 8b 04 25 00 aa 00 00 c7 80 6c 07 00 00 00 00 00 00 53 9d <48> 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 65 
[15551.738995] sending NMI to other CPUs:
[15551.740026] NMI backtrace for cpu 2
[15551.741070] CPU: 2 PID: 22823 Comm: trinity-c154 Tainted: G             L 3.18.0+ #107
[15551.743195] task: ffff880235dedb40 ti: ffff880094558000 task.ti: ffff880094558000
[15551.744276] RIP: 0010:[<ffffffff810fb22e>]  [<ffffffff810fb22e>] generic_exec_single+0xee/0x1b0
[15551.745368] RSP: 0000:ffff88009455b938  EFLAGS: 00000202
[15551.746450] RAX: 0000000000000008 RBX: ffff88009455b950 RCX: 0000000000000038
[15551.747536] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[15551.748624] RBP: ffff88009455b998 R08: ffff88024370c3f0 R09: 0000000000000000
[15551.749717] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[15551.750805] R13: 0000000000000001 R14: ffff880225e61780 R15: 0000000000000002
[15551.751885] FS:  00007f9dcd485740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15551.752967] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15551.754047] CR2: 0000000001f9cfd8 CR3: 000000009ae2d000 CR4: 00000000001407e0
[15551.755130] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15551.756204] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15551.757280] Stack:
[15551.758349]  ffff88009455b948 0000000000000001 ffff88024370c3f0 0000000000000000
[15551.759443]  ffffffff81048cc0 ffff88009455ba48 0000000000000003 0000000077fab51f
[15551.760518]  0000000000000001 00000000ffffffff 0000000000000001 ffffffff81048cc0
[15551.761574] Call Trace:
[15551.762611]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15551.763634]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15551.764628]  [<ffffffff810fb390>] smp_call_function_single+0x70/0xd0
[15551.765605]  [<ffffffff81048cc0>] ? do_flush_tlb_all+0x60/0x60
[15551.766558]  [<ffffffff810fba89>] smp_call_function_many+0x2b9/0x320
[15551.767490]  [<ffffffff81049010>] flush_tlb_mm_range+0x90/0x1d0
[15551.768408]  [<ffffffff811a1a42>] tlb_flush_mmu_tlbonly+0x42/0x50
[15551.769309]  [<ffffffff811a2fa8>] unmap_single_vma+0x6b8/0x900
[15551.770190]  [<ffffffff811a32ec>] zap_page_range_single+0xfc/0x160
[15551.771068]  [<ffffffff811a34d4>] unmap_mapping_range+0x134/0x190
[15551.771935]  [<ffffffff81191f0d>] shmem_fallocate+0x4fd/0x520
[15551.772795]  [<ffffffff810bc857>] ? prepare_to_wait+0x27/0x90
[15551.773656]  [<ffffffff811e54b2>] do_fallocate+0x132/0x1d0
[15551.774518]  [<ffffffff811b79d8>] SyS_madvise+0x398/0x870
[15551.775378]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15551.776235] Code: 48 89 de 48 03 14 c5 a0 f0 d1 81 48 89 df e8 da de 27 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 0f 85 98 00 
[15551.778111] NMI backtrace for cpu 0
[15551.778992] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #107
[15551.780810] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[15551.781738] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15551.782669] RSP: 0018:ffffffff81c03e38  EFLAGS: 00000046
[15551.783592] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15551.784519] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[15551.785439] RBP: ffffffff81c03e68 R08: 000000008baf92d6 R09: 0000000000000000
[15551.786357] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15551.787273] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[15551.788189] FS:  0000000000000000(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15551.789114] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15551.790036] CR2: 00007f8b7d063000 CR3: 0000000001c11000 CR4: 00000000001407f0
[15551.790968] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15551.791900] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15551.792824] Stack:
[15551.793747]  0000000081c03e68 76e04e88cc48b66f ffffe8fffee02118 0000000000000005
[15551.794696]  ffffffff81cb19c0 0000000000000000 ffffffff81c03eb8 ffffffff8165d385
[15551.795650]  00000e26fd8b172b ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15551.796601] Call Trace:
[15551.797546]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15551.798502]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15551.799432]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15551.800343]  [<ffffffff817b9639>] rest_init+0xc9/0xd0
[15551.801247]  [<ffffffff817b9575>] ? rest_init+0x5/0xd0
[15551.802151]  [<ffffffff81f21de6>] ? ftrace_init+0xa8/0x13b
[15551.803050]  [<ffffffff81f03041>] start_kernel+0x492/0x4b3
[15551.803928]  [<ffffffff81f0299f>] ? set_init_arg+0x55/0x55
[15551.804786]  [<ffffffff81f02581>] x86_64_start_reservations+0x2a/0x2c
[15551.805637]  [<ffffffff81f02675>] x86_64_start_kernel+0xf2/0xf6
[15551.806475] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15551.808334] NMI backtrace for cpu 3
[15551.809246] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #107
[15551.811116] task: ffff88024348ada0 ti: ffff880243728000 task.ti: ffff880243728000
[15551.812066] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15551.813027] RSP: 0018:ffff88024372be08  EFLAGS: 00000046
[15551.813992] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15551.814942] RDX: 0000000000000000 RSI: ffff88024372bfd8 RDI: 0000000000000003
[15551.815894] RBP: ffff88024372be38 R08: 000000008baf92d6 R09: 0000000000000000
[15551.816843] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15551.817784] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880243728000
[15551.818718] FS:  0000000000000000(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15551.819658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15551.820593] CR2: 00007f3d70914008 CR3: 0000000001c11000 CR4: 00000000001407e0
[15551.821533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15551.822478] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15551.823418] Stack:
[15551.824374]  000000034372be38 492161474c69335b ffffe8ffff402118 0000000000000005
[15551.825334]  ffffffff81cb19c0 0000000000000003 ffff88024372be88 ffffffff8165d385
[15551.826295]  00000e26fd3b2c1a ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15551.827258] Call Trace:
[15551.828213]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15551.829182]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15551.830143]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15551.831107]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15551.832072] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15570.852710] INFO: rcu_sched self-detected stall on CPU
[15570.853735] 	1: (1 GPs behind) idle=52b/140000000000001/0 softirq=984789/984790 
[15570.854714] 	 (t=24005 jiffies g=472111 c=472110 q=0)
[15570.855675] Task dump for CPU 1:
[15570.856626] trinity-c195    R  running task    13248 20128  19429 0x0000000c
[15570.857593]  ffff880069c02da0 0000000017646c73 ffff880245003d68 ffffffff810a711c
[15570.858554]  ffffffff810a7082 0000000000000001 0000000000000002 0000000000000001
[15570.859513]  ffffffff81c52300 0000000000000092 ffff880245003d88 ffffffff810ab1cd
[15570.860447] Call Trace:
[15570.861368]  <IRQ>  [<ffffffff810a711c>] sched_show_task+0x11c/0x190
[15570.862309]  [<ffffffff810a7082>] ? sched_show_task+0x82/0x190
[15570.863287]  [<ffffffff810ab1cd>] dump_cpu_task+0x3d/0x50
[15570.864250]  [<ffffffff810d9760>] rcu_dump_cpu_stacks+0x90/0xd0
[15570.865213]  [<ffffffff810e0213>] rcu_check_callbacks+0x503/0x770
[15570.866178]  [<ffffffff8112fadc>] ? acct_account_cputime+0x1c/0x20
[15570.867139]  [<ffffffff810abb07>] ? account_system_time+0x97/0x180
[15570.868092]  [<ffffffff810e5c0b>] update_process_times+0x4b/0x80
[15570.869053]  [<ffffffff810f63a3>] ? tick_sched_timer+0x23/0x1b0
[15570.869997]  [<ffffffff810f63cf>] tick_sched_timer+0x4f/0x1b0
[15570.870951]  [<ffffffff810e6900>] __run_hrtimer+0xa0/0x230
[15570.871901]  [<ffffffff810e6beb>] ? hrtimer_interrupt+0x8b/0x260
[15570.872862]  [<ffffffff810f6380>] ? tick_init_highres+0x20/0x20
[15570.873767]  [<ffffffff810e6c67>] hrtimer_interrupt+0x107/0x260
[15570.874670]  [<ffffffff81031e9b>] local_apic_timer_interrupt+0x3b/0x70
[15570.875569]  [<ffffffff817d0505>] smp_apic_timer_interrupt+0x45/0x60
[15570.876467]  [<ffffffff817ce8ef>] apic_timer_interrupt+0x6f/0x80
[15570.877366]  <EOI>  [<ffffffff810c5cf4>] ? lock_acquire+0xb4/0x120
[15570.878275]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15570.879182]  [<ffffffff817ccadc>] _raw_spin_lock_nested+0x3c/0x80
[15570.880088]  [<ffffffff811a4cc3>] ? copy_page_range+0x493/0xa20
[15570.881000]  [<ffffffff811a4cc3>] copy_page_range+0x493/0xa20
[15570.881911]  [<ffffffff8107648f>] copy_process.part.26+0x146f/0x1a40
[15570.882824]  [<ffffffff81076c27>] do_fork+0xe7/0x490
[15570.883732]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15570.884647]  [<ffffffff8109ef5d>] ? finish_task_switch+0x7d/0x120
[15570.885544]  [<ffffffff8109ef1f>] ? finish_task_switch+0x3f/0x120
[15570.886410]  [<ffffffff817c6eb2>] ? __schedule+0x352/0x8c0
[15570.887274]  [<ffffffff813743a4>] ? lockdep_sys_exit_thunk+0x35/0x67
[15570.888136]  [<ffffffff81077056>] SyS_clone+0x16/0x20
[15570.888977]  [<ffffffff817cddd9>] stub_clone+0x69/0x90
[15570.889798]  [<ffffffff817cda52>] ? system_call_fastpath+0x12/0x17
[15570.890616] INFO: rcu_sched detected stalls on CPUs/tasks:
[15570.891487] 	1: (1 GPs behind) idle=52b/140000000000001/0 softirq=984789/984790 
[15570.892485] 	(detected by 2, t=24008 jiffies, g=472111, c=472110, q=0)
[15570.893412] Task dump for CPU 1:
[15570.894307] trinity-c195    R  running task    13248 20128  19429 0x0000000c
[15570.895226]  ffff8801c06dff60 ffffffff817c6eb2 000000000203f56b 0000000017646c73
[15570.896250]  ffffffff813743a4 0000000000000000 00007fffe01fcb40 0000000000000000
[15570.897134]  0000000000000000 0000000000000000 ffff8801c06dff40 ffffffff81077056
[15570.898021] Call Trace:
[15570.898891]  [<ffffffff817c6eb2>] ? __schedule+0x352/0x8c0
[15570.899789]  [<ffffffff813743a4>] ? lockdep_sys_exit_thunk+0x35/0x67
[15570.900688]  [<ffffffff81077056>] ? SyS_clone+0x16/0x20
[15570.901556]  [<ffffffff817cddd9>] ? stub_clone+0x69/0x90
[15570.902430]  [<ffffffff817cda52>] ? system_call_fastpath+0x12/0x17
[15589.522001] sched: RT throttling activated
[15611.669013] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 21s! [trinity-c170:30232]
[15611.669927] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15611.674793] CPU: 1 PID: 30232 Comm: trinity-c170 Tainted: G             L 3.18.0+ #107
[15611.676697] task: ffff880226e70000 ti: ffff8802276e4000 task.ti: ffff8802276e4000
[15611.677659] RIP: 0010:[<ffffffff8118f845>]  [<ffffffff8118f845>] shmem_write_end+0x65/0xf0
[15611.678654] RSP: 0018:ffff8802276e7c08  EFLAGS: 00000202
[15611.679726] RAX: 001ffe0000080005 RBX: ffff8802276e7c80 RCX: 1000ef5abfc6b000
[15611.680754] RDX: 1000ef5abfc6a000 RSI: ffff88009ac54a00 RDI: ffff88020ccd9500
[15611.681797] RBP: ffff8802276e7c28 R08: 0000000000001000 R09: ffffea00011734c0
[15611.682859] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8118ffe4
[15611.683899] R13: ffff8802276e7c18 R14: 0000000000000003 R15: 0000000000000003
[15611.684945] FS:  00007fdf412f4740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15611.686005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15611.687067] CR2: 00007fdf412fb055 CR3: 0000000227d30000 CR4: 00000000001407e0
[15611.688105] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15611.689152] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15611.690228] Stack:
[15611.691283]  1000ef5abfc6a000 0000000000001000 ffff8802276e7d60 0000000000000000
[15611.692367]  ffff8802276e7cc8 ffffffff811754da ffff8802276e7d18 0000000000001000
[15611.693438]  ffff880226e70000 0000000081204df1 0000000003341000 0000000000000000
[15611.694522] Call Trace:
[15611.695606]  [<ffffffff811754da>] generic_perform_write+0x11a/0x1f0
[15611.696694]  [<ffffffff81177bf2>] __generic_file_write_iter+0x162/0x350
[15611.697783]  [<ffffffff811e6d90>] ? new_sync_read+0xd0/0xd0
[15611.698872]  [<ffffffff81177e1f>] generic_file_write_iter+0x3f/0xb0
[15611.699974]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15611.701062]  [<ffffffff811e6ed8>] do_iter_readv_writev+0x78/0xc0
[15611.702161]  [<ffffffff811e8708>] do_readv_writev+0xd8/0x2a0
[15611.703259]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15611.704386]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15611.705488]  [<ffffffff810c32ff>] ? lock_release_holdtime.part.24+0xf/0x190
[15611.706610]  [<ffffffff811e895c>] vfs_writev+0x3c/0x50
[15611.707726]  [<ffffffff811e8acc>] SyS_writev+0x5c/0x100
[15611.708838]  [<ffffffff817cdc49>] tracesys_phase2+0xd4/0xd9
[15611.709950] Code: df e8 20 63 fe ff 48 89 df e8 a8 47 ff ff 5b 44 89 e0 41 5c 41 5d 41 5e 5d c3 0f 1f 40 00 41 81 fc ff 0f 00 00 76 0f f0 80 0b 08 <eb> c9 66 0f 1f 84 00 00 00 00 00 81 e2 ff 0f 00 00 46 8d 2c 22 
[15611.712349] sending NMI to other CPUs:
[15611.713501] NMI backtrace for cpu 3
[15611.714511] CPU: 3 PID: 29352 Comm: trinity-c56 Tainted: G             L 3.18.0+ #107
[15611.716572] task: ffff8801c067c470 ti: ffff88017c840000 task.ti: ffff88017c840000
[15611.717621] RIP: 0010:[<ffffffff81372bc5>]  [<ffffffff81372bc5>] copy_user_enhanced_fast_string+0x5/0x10
[15611.718691] RSP: 0018:ffff88017c843c60  EFLAGS: 00010286
[15611.719758] RAX: ffff8801711e0000 RBX: ffff88017c843d60 RCX: 0000000000000fc0
[15611.720830] RDX: 0000000000001000 RSI: ffff8801711e0040 RDI: 00007fdf3fdd67c2
[15611.721900] RBP: ffff88017c843cb8 R08: ffff8801711e0000 R09: ffff88022d82a6d8
[15611.722963] R10: ffff88017c843b88 R11: 0000000000000000 R12: 0000000000001000
[15611.724022] R13: 000000000006c782 R14: ffff880095736ad0 R15: 0000000000001000
[15611.725076] FS:  00007fdf412f4740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15611.726136] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15611.727191] CR2: 00007fdf403eaa16 CR3: 0000000223d4e000 CR4: 00000000001407e0
[15611.728253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15611.729308] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15611.730357] Stack:
[15611.731402]  ffffffff8119f74d ffff8801711e0000 00007fdf3fdd6782 ffff8801711e0000
[15611.732469]  ffffea0005c47800 ffffea0005c47800 000000000008d423 0000000000000000
[15611.733543]  ffff88009ac56820 0000000000001000 0000000000000000 ffff88017c843d48
[15611.734620] Call Trace:
[15611.735681]  [<ffffffff8119f74d>] ? copy_page_to_iter+0x19d/0x340
[15611.736742]  [<ffffffff81190b7b>] shmem_file_read_iter+0xcb/0x300
[15611.737783]  [<ffffffff81190ab0>] ? shmem_fault+0x1c0/0x1c0
[15611.738800]  [<ffffffff811e6cc0>] ? do_sync_readv_writev+0xa0/0xa0
[15611.739797]  [<ffffffff811e6ed8>] do_iter_readv_writev+0x78/0xc0
[15611.740770]  [<ffffffff811e8708>] do_readv_writev+0xd8/0x2a0
[15611.741721]  [<ffffffff81190ab0>] ? shmem_fault+0x1c0/0x1c0
[15611.742657]  [<ffffffff810c32ff>] ? lock_release_holdtime.part.24+0xf/0x190
[15611.743576]  [<ffffffff811e8909>] vfs_readv+0x39/0x50
[15611.744474]  [<ffffffff811e89cc>] SyS_readv+0x5c/0x100
[15611.745359]  [<ffffffff817cdc49>] tracesys_phase2+0xd4/0xd9
[15611.746238] Code: 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f 1f 00 89 d1 <f3> a4 31 c0 0f 1f 00 c3 90 90 90 0f 1f 00 83 fa 08 0f 82 95 00 
[15611.748165] NMI backtrace for cpu 2
[15611.749082] CPU: 2 PID: 28939 Comm: trinity-c179 Tainted: G             L 3.18.0+ #107
[15611.750987] task: ffff880222f696d0 ti: ffff88019cc6c000 task.ti: ffff88019cc6c000
[15611.751942] RIP: 0010:[<ffffffff81372bc5>]  [<ffffffff81372bc5>] copy_user_enhanced_fast_string+0x5/0x10
[15611.752910] RSP: 0000:ffff88019cc6fbe0  EFLAGS: 00010202
[15611.753867] RAX: 00007fdf3d6bdc01 RBX: 0000000000000000 RCX: 0000000000000880
[15611.754833] RDX: 0000000000001000 RSI: 00007fdf3d6be381 RDI: ffff88011f624780
[15611.755806] RBP: ffff88019cc6fc28 R08: 0000000000000000 R09: 0000000000000001
[15611.756775] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880069dae0a8
[15611.757738] R13: 0000000000001000 R14: 0000000000001000 R15: ffff88011f624000
[15611.758699] FS:  00007fdf412f4740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15611.759671] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15611.760632] CR2: 00007fdf400c0727 CR3: 00000001c07a0000 CR4: 00000000001407e0
[15611.761606] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15611.762573] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15611.763535] Stack:
[15611.764491]  ffffffff8119f2e6 ffff88019cc6fbf8 000000000f8c4000 0000000000001000
[15611.765477]  000000000f8c4000 0000000000001000 ffff88019cc6fd60 0000000000000000
[15611.766460]  ffff88009ac57710 ffff88019cc6fcc8 ffffffff811754b7 ffff88019cc6fc88
[15611.767449] Call Trace:
[15611.768424]  [<ffffffff8119f2e6>] ? iov_iter_copy_from_user_atomic+0x156/0x180
[15611.769424]  [<ffffffff811754b7>] generic_perform_write+0xf7/0x1f0
[15611.770420]  [<ffffffff81177bf2>] __generic_file_write_iter+0x162/0x350
[15611.771416]  [<ffffffff811e6d90>] ? new_sync_read+0xd0/0xd0
[15611.772417]  [<ffffffff81177e1f>] generic_file_write_iter+0x3f/0xb0
[15611.773419]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15611.774405]  [<ffffffff811e6ed8>] do_iter_readv_writev+0x78/0xc0
[15611.775369]  [<ffffffff811e8708>] do_readv_writev+0xd8/0x2a0
[15611.776326]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15611.777292]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15611.778248]  [<ffffffff810c32ff>] ? lock_release_holdtime.part.24+0xf/0x190
[15611.779188]  [<ffffffff811e895c>] vfs_writev+0x3c/0x50
[15611.780107]  [<ffffffff811e8acc>] SyS_writev+0x5c/0x100
[15611.781012]  [<ffffffff817cdc49>] tracesys_phase2+0xd4/0xd9
[15611.781902] Code: 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f 1f 00 89 d1 <f3> a4 31 c0 0f 1f 00 c3 90 90 90 0f 1f 00 83 fa 08 0f 82 95 00 
[15611.783859] NMI backtrace for cpu 0
[15611.784812] CPU: 0 PID: 29177 Comm: trinity-c146 Tainted: G             L 3.18.0+ #107
[15611.786780] task: ffff880095be0000 ti: ffff880223dd8000 task.ti: ffff880223dd8000
[15611.787780] RIP: 0010:[<ffffffff8136b93c>]  [<ffffffff8136b93c>] __radix_tree_create+0x6c/0x220
[15611.788788] RSP: 0000:ffff880223ddba88  EFLAGS: 00000046
[15611.789786] RAX: ffff8800984c7100 RBX: 0000000000000000 RCX: 0000000000000000
[15611.790790] RDX: 0000000000000000 RSI: 000000000008d689 RDI: ffff88009ac56a38
[15611.791801] RBP: ffff880223ddbad8 R08: 0000000000000000 R09: 0000000000000001
[15611.792806] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffa
[15611.793794] R13: 000000000008d689 R14: ffff88009ac56a38 R15: 0000000000000009
[15611.794770] FS:  00007fdf412f4740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15611.795754] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15611.796746] CR2: 00007fdf40250096 CR3: 000000017a9d0000 CR4: 00000000001407f0
[15611.797760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15611.798765] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15611.799775] Stack:
[15611.800774]  ffff880223ddbaf8 ffff880223ddbaf0 0000000000000002 ffff880223ddbaf8
[15611.801798]  ffff880223ddbaf8 ffffea0008891980 0000000000000000 000000000008d689
[15611.802817]  ffff88009ac56a38 0000000000000000 ffff880223ddbb28 ffffffff8136bb31
[15611.803841] Call Trace:
[15611.804858]  [<ffffffff8136bb31>] radix_tree_insert+0x41/0xf0
[15611.805881]  [<ffffffff8118f798>] shmem_add_to_page_cache+0xf8/0x140
[15611.806908]  [<ffffffff8118ff9c>] shmem_getpage_gfp+0x50c/0x7a0
[15611.807935]  [<ffffffff81190272>] shmem_write_begin+0x42/0x70
[15611.808950]  [<ffffffff81175494>] generic_perform_write+0xd4/0x1f0
[15611.809917]  [<ffffffff81177bf2>] __generic_file_write_iter+0x162/0x350
[15611.810880]  [<ffffffff811e6d90>] ? new_sync_read+0xd0/0xd0
[15611.811838]  [<ffffffff81177e1f>] generic_file_write_iter+0x3f/0xb0
[15611.812779]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15611.813705]  [<ffffffff811e6ed8>] do_iter_readv_writev+0x78/0xc0
[15611.814616]  [<ffffffff811e8708>] do_readv_writev+0xd8/0x2a0
[15611.815515]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15611.816418]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15611.817310]  [<ffffffff810c32ff>] ? lock_release_holdtime.part.24+0xf/0x190
[15611.818198]  [<ffffffff811e895c>] vfs_writev+0x3c/0x50
[15611.819090]  [<ffffffff811e8d22>] SyS_pwritev+0xc2/0xf0
[15611.819979]  [<ffffffff817cdc49>] tracesys_phase2+0xd4/0xd9
[15611.820859] Code: 12 fa 74 75 45 31 ff 31 c9 eb 28 0f 1f 40 00 44 89 e1 4d 89 ef 41 83 ec 06 49 d3 ef 41 83 e7 3f 83 eb 01 44 89 fa 48 8b 54 d0 28 <74> 52 48 89 c1 48 89 d0 48 85 c0 75 d7 4c 89 f7 48 89 4d c0 e8 
[15611.822791] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 109.285 msecs
[15639.652762] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/3:32]
[15639.653833] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15639.659569] CPU: 1 PID: 32 Comm: rcuos/3 Tainted: G             L 3.18.0+ #107
[15639.661870] task: ffff8802437e4470 ti: ffff88024300c000 task.ti: ffff88024300c000
[15639.663045] RIP: 0010:[<ffffffff817ccdf8>]  [<ffffffff817ccdf8>] _raw_spin_unlock_irqrestore+0x38/0x60
[15639.664199] RSP: 0018:ffff88024300fc48  EFLAGS: 00000292
[15639.665344] RAX: 0000000000000001 RBX: ffffffff811cdcd0 RCX: 00000000000002a0
[15639.666499] RDX: ffff88024500d620 RSI: 0000000000000000 RDI: ffff88024483e200
[15639.667656] RBP: ffff88024300fc58 R08: 0000000000000000 R09: ffff8800984a0448
[15639.668811] R10: ffff880243b13840 R11: 0000000000000000 R12: ffff8800984a04f0
[15639.669955] R13: 00000000000000cc R14: ffffffff810135bf R15: ffff88024300fbb8
[15639.671114] FS:  0000000000000000(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15639.672277] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15639.673431] CR2: 00007f8b7d063000 CR3: 0000000001c11000 CR4: 00000000001407e0
[15639.674570] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15639.675708] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15639.676821] Stack:
[15639.677905]  ffffea0002612800 ffffea0002612800 ffff88024300fd18 ffffffff817c0513
[15639.679014]  ffff88024300fc78 ffffffff810ab7d5 ffff88024300fcf8 ffff8800984a0000
[15639.680119]  0000000000000000 0000000000000292 0000000000000000 0000000100190016
[15639.681210] Call Trace:
[15639.682277]  [<ffffffff817c0513>] __slab_free+0x75/0x317
[15639.683348]  [<ffffffff810ab7d5>] ? local_clock+0x25/0x30
[15639.684402]  [<ffffffff81255bcc>] ? proc_i_callback+0x1c/0x20
[15639.685449]  [<ffffffff811d0bee>] kmem_cache_free+0x1ae/0x240
[15639.686495]  [<ffffffff81255bb0>] ? proc_destroy_inode+0x20/0x20
[15639.687538]  [<ffffffff81255bcc>] proc_i_callback+0x1c/0x20
[15639.688590]  [<ffffffff810dd8df>] rcu_nocb_kthread+0x58f/0xd60
[15639.689645]  [<ffffffff810dd62a>] ? rcu_nocb_kthread+0x2da/0xd60
[15639.690670]  [<ffffffff810bcd30>] ? prepare_to_wait_event+0xf0/0xf0
[15639.691706]  [<ffffffff810dd350>] ? rcu_process_callbacks+0x9d0/0x9d0
[15639.692735]  [<ffffffff81098979>] kthread+0xf9/0x110
[15639.693763]  [<ffffffff810ab7d5>] ? local_clock+0x25/0x30
[15639.694793]  [<ffffffff81098880>] ? kthread_create_on_node+0x250/0x250
[15639.695817]  [<ffffffff817cd9ac>] ret_from_fork+0x7c/0xb0
[15639.696852]  [<ffffffff81098880>] ? kthread_create_on_node+0x250/0x250
[15639.697866] Code: fc 48 8b 55 08 53 48 8d 7f 18 48 89 f3 be 01 00 00 00 e8 ac 92 8f ff 4c 89 e7 e8 d4 c5 8f ff f6 c7 02 74 17 e8 aa c9 97 ff 53 9d <5b> 65 ff 0c 25 e0 a9 00 00 41 5c 5d c3 0f 1f 00 53 9d e8 91 c8 
[15639.700102] sending NMI to other CPUs:
[15639.701151] NMI backtrace for cpu 3
[15639.702181] CPU: 3 PID: 5935 Comm: trinity-main Tainted: G             L 3.18.0+ #107
[15639.704275] task: ffff880229bbc470 ti: ffff8801b65f4000 task.ti: ffff8801b65f4000
[15639.705341] RIP: 0010:[<ffffffff811cdcf4>]  [<ffffffff811cdcf4>] set_track+0x94/0x140
[15639.706420] RSP: 0018:ffff8801b65f7bd8  EFLAGS: 00000012
[15639.707490] RAX: 000000000000000a RBX: ffff8802288fa228 RCX: ffff8802288fa230
[15639.708570] RDX: 0000000000000010 RSI: 000000000000000f RDI: 00007fffda3ab4c8
[15639.709650] RBP: ffff8801b65f7c08 R08: ffff8801b65f7bd8 R09: 0000000000000000
[15639.710730] R10: ffff88024480e1c0 R11: 0000000000000000 R12: ffffffff811f8726
[15639.711816] R13: ffff8802288f9180 R14: ffff88024483ea00 R15: ffff8801b65f7ca0
[15639.712902] FS:  00007fdd90bc4740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15639.714004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15639.715104] CR2: 0000000000bb4000 CR3: 000000019ce21000 CR4: 00000000001407e0
[15639.716211] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15639.717315] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
[15639.718420] Stack:
[15639.719492]  000000100000000a ffff8802288fa230 ffffea0000000000 00000000ff68a7ed
[15639.720572]  ffffea0008a23e00 ffff88024480e1c0 ffff8801b65f7c68 ffffffff817c03cb
[15639.721639]  ffff88009a3a4fd8 ffff88009a3a4fc0 ffff88024480e1c0 ffff88024483ea00
[15639.722694] Call Trace:
[15639.723719]  [<ffffffff817c03cb>] free_debug_processing+0x157/0x22a
[15639.724736]  [<ffffffff817c04f3>] __slab_free+0x55/0x317
[15639.725727]  [<ffffffff811f6fda>] ? path_lookupat+0x7a/0x770
[15639.726696]  [<ffffffff811f8726>] ? final_putname+0x26/0x50
[15639.727641]  [<ffffffff811d0bee>] kmem_cache_free+0x1ae/0x240
[15639.728570]  [<ffffffff811f8726>] final_putname+0x26/0x50
[15639.729478]  [<ffffffff811f89c9>] putname+0x29/0x40
[15639.730366]  [<ffffffff811f97ce>] user_path_at_empty+0x6e/0xc0
[15639.731243]  [<ffffffff811a1ad7>] ? might_fault+0x47/0x50
[15639.732111]  [<ffffffff811ece47>] ? cp_new_stat+0x157/0x190
[15639.732969]  [<ffffffff811f9831>] user_path_at+0x11/0x20
[15639.733827]  [<ffffffff811ec983>] vfs_fstatat+0x63/0xc0
[15639.734682]  [<ffffffff811ecf64>] SYSC_newfstatat+0x24/0x60
[15639.735543]  [<ffffffff81012217>] ? syscall_trace_enter_phase2+0xa7/0x200
[15639.736404]  [<ffffffff811ed1ce>] SyS_newfstatat+0xe/0x10
[15639.737263]  [<ffffffff817cdc49>] tracesys_phase2+0xd4/0xd9
[15639.738114] Code: 00 00 e8 c0 58 e4 ff 8b 55 d0 85 d2 0f 85 95 00 00 00 31 d2 0f 1f 00 48 63 f2 83 c2 01 48 c7 44 f3 08 00 00 00 00 83 fa 0f 7e ec <4c> 89 23 65 8b 04 25 2c a0 00 00 89 83 88 00 00 00 65 48 8b 04 
[15639.739992] NMI backtrace for cpu 2
[15639.740889] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.18.0+ #107
[15639.742712] task: ffff88024348c470 ti: ffff88024371c000 task.ti: ffff88024371c000
[15639.743639] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15639.744575] RSP: 0018:ffff88024371fe08  EFLAGS: 00000046
[15639.745502] RAX: 0000000000000001 RBX: 0000000000000002 RCX: 0000000000000001
[15639.746434] RDX: 0000000000000000 RSI: ffff88024371ffd8 RDI: 0000000000000002
[15639.747365] RBP: ffff88024371fe38 R08: 000000008baf91c6 R09: 0000000000000000
[15639.748300] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[15639.749235] R13: 0000000000000001 R14: 0000000000000001 R15: ffff88024371c000
[15639.750168] FS:  0000000000000000(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15639.751109] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15639.752051] CR2: 0000000000000000 CR3: 0000000240f57000 CR4: 00000000001407e0
[15639.753000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15639.753946] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15639.754884] Stack:
[15639.755819]  000000024371fe38 87ce37a9ab100d43 ffffe8ffff202118 0000000000000002
[15639.756778]  ffffffff81cb19c0 0000000000000002 ffff88024371fe88 ffffffff8165d385
[15639.757741]  00000e3b7b884bf4 ffffffff81cb1a88 ffffffff81cb19c0 ffffffff81d213f0
[15639.758707] Call Trace:
[15639.759662]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15639.760606]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15639.761524]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15639.762440]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15639.763358] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15639.765344] NMI backtrace for cpu 0
[15639.766269] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #107
[15639.768133] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[15639.769074] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15639.770030] RSP: 0018:ffffffff81c03e38  EFLAGS: 00000046
[15639.770979] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15639.771933] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[15639.772884] RBP: ffffffff81c03e68 R08: 000000008baf91c6 R09: 0000000000000000
[15639.773827] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15639.774767] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[15639.775699] FS:  0000000000000000(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15639.776639] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15639.777580] CR2: 0000003370219050 CR3: 0000000001c11000 CR4: 00000000001407f0
[15639.778527] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15639.779463] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15639.780391] Stack:
[15639.781314]  0000000081c03e68 76e04e88cc48b66f ffffe8fffee02118 0000000000000005
[15639.782258]  ffffffff81cb19c0 0000000000000000 ffffffff81c03eb8 ffffffff8165d385
[15639.783208]  00000e3b7b79aa5b ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15639.784151] Call Trace:
[15639.785088]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15639.786034]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15639.786978]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15639.787921]  [<ffffffff817b9639>] rest_init+0xc9/0xd0
[15639.788869]  [<ffffffff817b9575>] ? rest_init+0x5/0xd0
[15639.789815]  [<ffffffff81f21de6>] ? ftrace_init+0xa8/0x13b
[15639.790758]  [<ffffffff81f03041>] start_kernel+0x492/0x4b3
[15639.791704]  [<ffffffff81f0299f>] ? set_init_arg+0x55/0x55
[15639.792649]  [<ffffffff81f02581>] x86_64_start_reservations+0x2a/0x2c
[15639.793603]  [<ffffffff81f02675>] x86_64_start_kernel+0xf2/0xf6
[15639.794549] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15667.636533] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c206:14314]
[15667.637609] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15667.643390] CPU: 1 PID: 14314 Comm: trinity-c206 Tainted: G             L 3.18.0+ #107
[15667.645698] task: ffff880227440000 ti: ffff880227c7c000 task.ti: ffff880227c7c000
[15667.646864] RIP: 0010:[<ffffffff81372bc5>]  [<ffffffff81372bc5>] copy_user_enhanced_fast_string+0x5/0x10
[15667.648013] RSP: 0018:ffff880227c7fc60  EFLAGS: 00010286
[15667.649167] RAX: ffff880152231000 RBX: ffffffff81175db2 RCX: 0000000000000fc0
[15667.650341] RDX: 0000000000001000 RSI: ffff880152231040 RDI: 00007fdd8cc56b7f
[15667.651499] RBP: ffff880227c7fcb8 R08: ffff880152231000 R09: ffff8800982fe9f0
[15667.652654] R10: ffff880227c7fb88 R11: 0000000000000000 R12: ffffea0005488c40
[15667.653825] R13: 000100000009cf84 R14: 0000000000000246 R15: ffff880227c7fbf8
[15667.654982] FS:  00007fdd90bc4740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15667.656129] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15667.657274] CR2: 00007fdd8f9f58e0 CR3: 0000000224a01000 CR4: 00000000001407e0
[15667.658421] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15667.659556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
[15667.660673] Stack:
[15667.661780]  ffffffff8119f74d ffff880152231000 00007fdd8cc56b3f ffff880152231000
[15667.662915]  ffffea0005488c40 ffffea0005488c40 000100000009cf84 0000000000000000
[15667.664058]  ffff8802252be820 0000000000001000 0000000000000000 ffff880227c7fd48
[15667.665205] Call Trace:
[15667.666336]  [<ffffffff8119f74d>] ? copy_page_to_iter+0x19d/0x340
[15667.667512]  [<ffffffff81190b7b>] shmem_file_read_iter+0xcb/0x300
[15667.668671]  [<ffffffff81190ab0>] ? shmem_fault+0x1c0/0x1c0
[15667.669837]  [<ffffffff811e6cc0>] ? do_sync_readv_writev+0xa0/0xa0
[15667.671003]  [<ffffffff811e6ed8>] do_iter_readv_writev+0x78/0xc0
[15667.672149]  [<ffffffff811e8708>] do_readv_writev+0xd8/0x2a0
[15667.673252]  [<ffffffff81190ab0>] ? shmem_fault+0x1c0/0x1c0
[15667.674329]  [<ffffffff810c32ff>] ? lock_release_holdtime.part.24+0xf/0x190
[15667.675387]  [<ffffffff811e8909>] vfs_readv+0x39/0x50
[15667.676432]  [<ffffffff811e89cc>] SyS_readv+0x5c/0x100
[15667.677537]  [<ffffffff817cdc49>] tracesys_phase2+0xd4/0xd9
[15667.678574] Code: 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f 1f 00 89 d1 <f3> a4 31 c0 0f 1f 00 c3 90 90 90 0f 1f 00 83 fa 08 0f 82 95 00 
[15667.680846] sending NMI to other CPUs:
[15667.681910] NMI backtrace for cpu 2
[15667.682941] CPU: 2 PID: 15662 Comm: trinity-c188 Tainted: G             L 3.18.0+ #107
[15667.685036] task: ffff8801db9e0000 ti: ffff880226ca8000 task.ti: ffff880226ca8000
[15667.686094] RIP: 0010:[<ffffffff810c44d6>]  [<ffffffff810c44d6>] lock_acquired+0x46/0x370
[15667.687169] RSP: 0018:ffff880226cabdf8  EFLAGS: 00000046
[15667.688244] RAX: 0000000000000001 RBX: ffffffff81c0a080 RCX: ffff8802453cff98
[15667.689324] RDX: 0000000000000680 RSI: ffffffff8107a8d9 RDI: ffffffff81c0a098
[15667.690392] RBP: ffff880226cabe38 R08: 0000000000000000 R09: 0000000000000001
[15667.691464] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801db9e0000
[15667.692532] R13: ffffffff81c0a098 R14: 0000000000000246 R15: ffff8801db9e0000
[15667.693595] FS:  00007fdd90bc4740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15667.694665] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15667.695738] CR2: 000000336f1b7740 CR3: 0000000223f31000 CR4: 00000000001407e0
[15667.696820] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15667.697901] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
[15667.698977] Stack:
[15667.700049]  0000000125edaea0 ffffffff8107a8d9 ffff8801db9e0000 ffffffff81c0a080
[15667.701151]  ffffffff81c0a098 ffff8801db9e0000 ffff8801db9e0000 ffff8801db9e0000
[15667.702257]  ffff880226cabe68 ffffffff817ccf3d ffffffff8107a8d9 ffff880226cabef8
[15667.703361] Call Trace:
[15667.704456]  [<ffffffff8107a8d9>] ? do_wait+0xd9/0x280
[15667.705539]  [<ffffffff817ccf3d>] _raw_read_lock+0x6d/0x80
[15667.706599]  [<ffffffff8107a8d9>] ? do_wait+0xd9/0x280
[15667.707646]  [<ffffffff8107a8d9>] do_wait+0xd9/0x280
[15667.708669]  [<ffffffff8107aea0>] SyS_wait4+0x80/0x110
[15667.709667]  [<ffffffff81078980>] ? task_stopped_code+0x60/0x60
[15667.710642]  [<ffffffff817cdc49>] tracesys_phase2+0xd4/0xd9
[15667.711594] Code: 00 45 85 c9 0f 84 d8 00 00 00 65 4c 8b 24 25 00 aa 00 00 45 8b 84 24 6c 07 00 00 45 85 c0 0f 85 be 00 00 00 49 89 fd 9c 41 5e fa <8b> 35 3c ed aa 01 41 c7 84 24 6c 07 00 00 01 00 00 00 41 8b 9c 
[15667.713623] NMI backtrace for cpu 0
[15667.714570] CPU: 0 PID: 12614 Comm: trinity-c134 Tainted: G             L 3.18.0+ #107
[15667.716458] task: ffff8802254716d0 ti: ffff88009463c000 task.ti: ffff88009463c000
[15667.717406] RIP: 0010:[<ffffffff8136c297>]  [<ffffffff8136c297>] rb_first+0x17/0x30
[15667.718363] RSP: 0018:ffff88009463fd10  EFLAGS: 00000282
[15667.719311] RAX: ffff8801be9e1220 RBX: ffff8800955da020 RCX: ffff8800955da620
[15667.720271] RDX: ffff8801be9e2220 RSI: 0000000000000000 RDI: ffff88009ee6abc8
[15667.721229] RBP: ffff88009463fd10 R08: 0000000000000000 R09: ffff8801beaa8398
[15667.722185] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800955da020
[15667.723129] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8800955da620
[15667.724066] FS:  00007fdd90bc4740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15667.725004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15667.725946] CR2: 0000000000e308f0 CR3: 000000007260b000 CR4: 00000000001407f0
[15667.726902] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15667.727844] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
[15667.728781] Stack:
[15667.729706]  ffff88009463fd40 ffffffff811a98ea ffff8801beaa83b0 ffff8800955da020
[15667.730658]  ffff8800955da020 ffff88009ee6abc8 ffff88009463fd80 ffffffff811aa413
[15667.731604]  ffff88009463fd70 ffff8800955da000 ffff8801beaa83b0 0000000001200011
[15667.732545] Call Trace:
[15667.733480]  [<ffffffff811a98ea>] validate_mm_rb+0x1a/0x70
[15667.734421]  [<ffffffff811aa413>] __vma_link_rb+0xc3/0x100
[15667.735361]  [<ffffffff81076465>] copy_process.part.26+0x1445/0x1a40
[15667.736305]  [<ffffffff810ab7d5>] ? local_clock+0x25/0x30
[15667.737250]  [<ffffffff81076c27>] do_fork+0xe7/0x490
[15667.738191]  [<ffffffff81012217>] ? syscall_trace_enter_phase2+0xa7/0x200
[15667.739138]  [<ffffffff81077056>] SyS_clone+0x16/0x20
[15667.740081]  [<ffffffff817cddd9>] stub_clone+0x69/0x90
[15667.741024]  [<ffffffff817cdc49>] ? tracesys_phase2+0xd4/0xd9
[15667.741969] Code: 31 c0 e8 18 1e 45 00 eb d2 90 90 90 90 90 90 90 90 90 90 90 48 8b 07 55 48 89 e5 48 85 c0 75 07 eb 10 66 90 48 89 d0 48 8b 50 10 <48> 85 d2 75 f4 5d c3 31 c0 5d c3 66 66 66 66 66 2e 0f 1f 84 00 
[15667.743980] NMI backtrace for cpu 3
[15667.744977] CPU: 3 PID: 15728 Comm: trinity-c168 Tainted: G             L 3.18.0+ #107
[15667.747026] task: ffff88009af74470 ti: ffff88022752c000 task.ti: ffff88022752c000
[15667.748038] RIP: 0010:[<ffffffff8117d366>]  [<ffffffff8117d366>] free_pcppages_bulk+0x3c6/0x530
[15667.749008] RSP: 0018:ffff88022752fa98  EFLAGS: 00000087
[15667.749964] RAX: ffffea0006cb6000 RBX: 0000000000000584 RCX: 0000000000000002
[15667.750928] RDX: 00000000ffffff80 RSI: 000000000000000a RDI: 00000000ffffff80
[15667.751884] RBP: ffff88022752fb28 R08: ffffea0006cb6100 R09: ffff88024e5d3ed8
[15667.752845] R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000580
[15667.753803] R13: 0000000000000002 R14: ffffea0006cb6180 R15: 0000000000000002
[15667.754760] FS:  00007fdd90bc4740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15667.755727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15667.756685] CR2: 0000000000000000 CR3: 0000000223c1b000 CR4: 00000000001407e0
[15667.757649] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15667.758606] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
[15667.759561] Stack:
[15667.760508]  ffff88022752fb18 ffff88024e5d3f40 ffff8802455d7328 0000000900000002
[15667.761476]  0000000000000002 0000000000000586 ffffea0006cb6180 ffff88024e5d3ed8
[15667.762445]  ffffea0006cb6100 000000090000000a ffff8802455d7358 ffff88024e5d3e80
[15667.763410] Call Trace:
[15667.764368]  [<ffffffff8117d7f3>] free_hot_cold_page+0x173/0x1a0
[15667.765331]  [<ffffffff8117d885>] free_hot_cold_page_list+0x65/0xd0
[15667.766290]  [<ffffffff811843ed>] release_pages+0x1bd/0x270
[15667.767252]  [<ffffffff81185423>] __pagevec_release+0x43/0x60
[15667.768213]  [<ffffffff81191430>] shmem_undo_range+0x460/0x710
[15667.769178]  [<ffffffff811916f8>] shmem_truncate_range+0x18/0x40
[15667.770143]  [<ffffffff81191986>] shmem_setattr+0x116/0x1a0
[15667.771107]  [<ffffffff81206a21>] notify_change+0x241/0x390
[15667.772067]  [<ffffffff811e4f25>] do_truncate+0x75/0xc0
[15667.773009]  [<ffffffff811e529a>] ? do_sys_ftruncate.constprop.14+0xda/0x160
[15667.773932]  [<ffffffff811e52cf>] do_sys_ftruncate.constprop.14+0x10f/0x160
[15667.774858]  [<ffffffff811e535e>] SyS_ftruncate+0xe/0x10
[15667.775773]  [<ffffffff817cdc49>] tracesys_phase2+0xd4/0xd9
[15667.776667] Code: 48 c1 e0 06 49 01 c0 41 39 f7 73 24 41 bc 01 00 00 00 41 d3 e4 4d 63 e4 49 31 dc 4c 89 e0 48 29 d8 48 c1 e0 06 4c 01 c0 8b 50 18 <83> fa 80 74 0d 44 89 f9 e9 4d fd ff ff 0f 1f 44 00 00 48 8d 51 
[15695.620295] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c243:22756]
[15695.621317] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15695.626986] CPU: 1 PID: 22756 Comm: trinity-c243 Tainted: G             L 3.18.0+ #107
[15695.629248] task: ffff8801dbac0000 ti: ffff880222ec8000 task.ti: ffff880222ec8000
[15695.630424] RIP: 0010:[<ffffffff81372bc5>]  [<ffffffff81372bc5>] copy_user_enhanced_fast_string+0x5/0x10
[15695.631649] RSP: 0000:ffff880222ecbbe0  EFLAGS: 00010202
[15695.632813] RAX: 00007f5352671e15 RBX: 0000000000000003 RCX: 00000000000003c0
[15695.633981] RDX: 0000000000001000 RSI: 00007f5352672a55 RDI: ffff880022849c40
[15695.635160] RBP: ffff880222ecbc28 R08: 0000000000000000 R09: 0000000000000001
[15695.636360] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff817ccd6b
[15695.637495] R13: ffff880222ecbb68 R14: ffff88007758c6d0 R15: ffff88007758c7f0
[15695.638629] FS:  00007f53535df740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15695.639773] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15695.640919] CR2: 00007f5352440748 CR3: 0000000227cf6000 CR4: 00000000001407e0
[15695.642073] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15695.643247] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15695.644422] Stack:
[15695.645589]  ffffffff8119f2e6 ffff880222ecbbf8 00000000119c1000 0000000000001000
[15695.646776]  00000000119c1000 0000000000001000 ffff880222ecbd60 0000000000000000
[15695.647982]  ffff88007758ca00 ffff880222ecbcc8 ffffffff811754b7 ffff880222ecbc88
[15695.649174] Call Trace:
[15695.650348]  [<ffffffff8119f2e6>] ? iov_iter_copy_from_user_atomic+0x156/0x180
[15695.651487]  [<ffffffff811754b7>] generic_perform_write+0xf7/0x1f0
[15695.652622]  [<ffffffff81177bf2>] __generic_file_write_iter+0x162/0x350
[15695.653748]  [<ffffffff811e6d90>] ? new_sync_read+0xd0/0xd0
[15695.654855]  [<ffffffff81177e1f>] generic_file_write_iter+0x3f/0xb0
[15695.655943]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15695.657028]  [<ffffffff811e6ed8>] do_iter_readv_writev+0x78/0xc0
[15695.658103]  [<ffffffff811e8708>] do_readv_writev+0xd8/0x2a0
[15695.659170]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15695.660255]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15695.661334]  [<ffffffff810c32ff>] ? lock_release_holdtime.part.24+0xf/0x190
[15695.662409]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15695.663490]  [<ffffffff811e895c>] vfs_writev+0x3c/0x50
[15695.664559]  [<ffffffff811e8acc>] SyS_writev+0x5c/0x100
[15695.665610]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15695.666663] Code: 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f 1f 00 89 d1 <f3> a4 31 c0 0f 1f 00 c3 90 90 90 0f 1f 00 83 fa 08 0f 82 95 00 
[15695.668947] sending NMI to other CPUs:
[15695.670050] NMI backtrace for cpu 0
[15695.671085] CPU: 0 PID: 24460 Comm: trinity-c223 Tainted: G             L 3.18.0+ #107
[15695.673189] task: ffff88022551db40 ti: ffff88009afa8000 task.ti: ffff88009afa8000
[15695.674252] RIP: 0010:[<ffffffff815f25f2>]  [<ffffffff815f25f2>] xhci_irq+0x42/0x1c70
[15695.675331] RSP: 0018:ffff880244e03dd8  EFLAGS: 00000086
[15695.676404] RAX: 0000000000000008 RBX: ffff8802400bd480 RCX: 0000000000000000
[15695.677487] RDX: ffff880244e0c3c0 RSI: ffff880240298060 RDI: ffff88022551e2b0
[15695.678572] RBP: ffff880244e03e78 R08: 0000000000000000 R09: 0000000000000001
[15695.679657] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000001c
[15695.680744] R13: ffff8800a171dab8 R14: ffff880240298048 R15: ffff880240298000
[15695.681831] FS:  00007f53535df740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15695.682930] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15695.684032] CR2: 00007f53535ada54 CR3: 00000000a1a8c000 CR4: 00000000001407f0
[15695.685141] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15695.686246] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15695.687345] Stack:
[15695.688438]  ffff880244e03de8 ffffffff810ab7d5 ffff880244e03e68 ffffffff810c4e0c
[15695.689541]  ffff880240d05450 0000000000000000 00000000fffe8920 0000000000000046
[15695.690619]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[15695.691688] Call Trace:
[15695.692723]  <IRQ> 
[15695.693742]  [<ffffffff810ab7d5>] ? local_clock+0x25/0x30
[15695.694734]  [<ffffffff810c4e0c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15695.695708]  [<ffffffff815f4231>] xhci_msi_irq+0x11/0x20
[15695.696664]  [<ffffffff810cf8f5>] handle_irq_event_percpu+0x55/0x1f0
[15695.697605]  [<ffffffff810cfad1>] handle_irq_event+0x41/0x70
[15695.698524]  [<ffffffff810d2b5e>] ? handle_edge_irq+0x1e/0x140
[15695.699425]  [<ffffffff810d2bbf>] handle_edge_irq+0x7f/0x140
[15695.700311]  [<ffffffff810053a1>] handle_irq+0xb1/0x140
[15695.701189]  [<ffffffff8107a9f1>] ? do_wait+0x1f1/0x280
[15695.702059]  [<ffffffff817d0413>] do_IRQ+0x53/0x100
[15695.702916]  [<ffffffff8107a9f1>] ? do_wait+0x1f1/0x280
[15695.703776]  [<ffffffff817ce56f>] common_interrupt+0x6f/0x6f
[15695.704634]  <EOI> 
[15695.705488]  [<ffffffff810c6150>] ? lock_release+0xc0/0x240
[15695.706335]  [<ffffffff817cd103>] _raw_read_unlock+0x23/0x40
[15695.707179]  [<ffffffff8107a9f1>] do_wait+0x1f1/0x280
[15695.708016]  [<ffffffff8107aea0>] SyS_wait4+0x80/0x110
[15695.708850]  [<ffffffff81078980>] ? task_stopped_code+0x60/0x60
[15695.709689]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15695.710525] Code: ec 78 65 48 8b 04 25 28 00 00 00 48 89 45 c8 31 c0 4c 8b bf d0 03 00 00 4d 8d 77 48 4c 89 f7 e8 b5 a3 1d 00 49 8b 47 18 8b 40 04 <83> f8 ff 0f 84 09 01 00 00 a8 08 0f 84 35 02 00 00 a8 04 0f 85 
[15695.712357] NMI backtrace for cpu 3
[15695.713263] CPU: 3 PID: 21578 Comm: trinity-c97 Tainted: G             L 3.18.0+ #107
[15695.715097] task: ffff88009acc2da0 ti: ffff88019cdf8000 task.ti: ffff88019cdf8000
[15695.716028] RIP: 0010:[<ffffffff81185b7b>]  [<ffffffff81185b7b>] cancel_dirty_page+0xb/0xc0
[15695.716970] RSP: 0018:ffff88019cdfbc40  EFLAGS: 00000246
[15695.717914] RAX: 00000000fffffffb RBX: ffffea0007bfe600 RCX: 0000000000000415
[15695.718861] RDX: ffffea0007bfe600 RSI: 0000000000001000 RDI: ffffea0007bfe600
[15695.719804] RBP: ffff88019cdfbc48 R08: 0000000000000000 R09: ffff88010f861c40
[15695.720746] R10: ffff88019cdfbbd8 R11: 0000000000000000 R12: ffff880227e63040
[15695.721687] R13: ffff88019cdfbd40 R14: 0000000000000000 R15: 0000000000047bc6
[15695.722623] FS:  00007f53535df740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15695.723585] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15695.724539] CR2: 000000336f1b7740 CR3: 00000002240c0000 CR4: 00000000001407e0
[15695.725502] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15695.726465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15695.727420] Stack:
[15695.728374]  ffff88019cdfbd40 ffff88019cdfbc68 ffffffff8118609e 0000000000000008
[15695.729334]  ffff88019cdfbcd0 ffff88019cdfbdf8 ffffffff811913f3 ffffea0006fad800
[15695.730268]  ffff880227e62e30 0000000000000000 0000000000000000 0000000000000000
[15695.731203] Call Trace:
[15695.732131]  [<ffffffff8118609e>] truncate_inode_page+0x4e/0x90
[15695.733069]  [<ffffffff811913f3>] shmem_undo_range+0x423/0x710
[15695.733989]  [<ffffffff811916f8>] shmem_truncate_range+0x18/0x40
[15695.734891]  [<ffffffff81191986>] shmem_setattr+0x116/0x1a0
[15695.735787]  [<ffffffff81206a21>] notify_change+0x241/0x390
[15695.736664]  [<ffffffff811e4f25>] do_truncate+0x75/0xc0
[15695.737535]  [<ffffffff811e529a>] ? do_sys_ftruncate.constprop.14+0xda/0x160
[15695.738413]  [<ffffffff811e52cf>] do_sys_ftruncate.constprop.14+0x10f/0x160
[15695.739291]  [<ffffffff8137432e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[15695.740176]  [<ffffffff811e535e>] SyS_ftruncate+0xe/0x10
[15695.741050]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15695.741916] Code: eb d5 be f7 02 00 00 48 c7 c7 a2 cc a6 81 48 89 55 d8 e8 e9 21 ef ff 48 8b 55 d8 e9 4f ff ff ff 0f 1f 44 00 00 55 48 89 e5 41 55 <41> 54 41 89 f4 53 48 83 ec 08 f0 0f ba 37 04 72 14 48 83 c4 08 
[15695.743812] NMI backtrace for cpu 2
[15695.744736] CPU: 2 PID: 24408 Comm: trinity-c238 Tainted: G             L 3.18.0+ #107
[15695.746590] task: ffff880240cd8000 ti: ffff880226d38000 task.ti: ffff880226d38000
[15695.747528] RIP: 0010:[<ffffffff81372bc5>]  [<ffffffff81372bc5>] copy_user_enhanced_fast_string+0x5/0x10
[15695.748479] RSP: 0018:ffff880226d3bc60  EFLAGS: 00010286
[15695.749430] RAX: ffff8800020a1000 RBX: ffff880226d3bd60 RCX: 00000000000009c0
[15695.750385] RDX: 0000000000001000 RSI: ffff8800020a1640 RDI: 00007f5351be001e
[15695.751344] RBP: ffff880226d3bcb8 R08: ffff8800020a1000 R09: ffff8800981a8388
[15695.752303] R10: ffff880226d3bb88 R11: 0000000000000000 R12: 0000000000001000
[15695.753261] R13: 000000000018a9de R14: ffff880094522380 R15: 0000000000001000
[15695.754220] FS:  00007f53535df740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15695.755190] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15695.756162] CR2: 00007f53524a684a CR3: 0000000226d63000 CR4: 00000000001407e0
[15695.757142] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15695.758127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15695.759108] Stack:
[15695.760127]  ffffffff8119f74d ffff8800020a1000 00007f5351bdf9de ffff8800020a1000
[15695.761224]  ffffea0000082840 ffffea00051adc80 0000000000001965 0000000000000000
[15695.762316]  ffff880227e64e60 0000000000001000 0000000000000000 ffff880226d3bd48
[15695.763408] Call Trace:
[15695.764482]  [<ffffffff8119f74d>] ? copy_page_to_iter+0x19d/0x340
[15695.765503]  [<ffffffff81190b7b>] shmem_file_read_iter+0xcb/0x300
[15695.766524]  [<ffffffff81190ab0>] ? shmem_fault+0x1c0/0x1c0
[15695.767542]  [<ffffffff811e6cc0>] ? do_sync_readv_writev+0xa0/0xa0
[15695.768565]  [<ffffffff811e6ed8>] do_iter_readv_writev+0x78/0xc0
[15695.769583]  [<ffffffff811e8708>] do_readv_writev+0xd8/0x2a0
[15695.770606]  [<ffffffff81190ab0>] ? shmem_fault+0x1c0/0x1c0
[15695.771609]  [<ffffffff810c32ff>] ? lock_release_holdtime.part.24+0xf/0x190
[15695.772593]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15695.773598]  [<ffffffff811e8909>] vfs_readv+0x39/0x50
[15695.774616]  [<ffffffff811e8c32>] SyS_preadv+0xc2/0xf0
[15695.775607]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15695.776553] Code: 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f 1f 00 89 d1 <f3> a4 31 c0 0f 1f 00 c3 90 90 90 0f 1f 00 83 fa 08 0f 82 95 00 
[15723.604074] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-main:773]
[15723.605115] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15723.610645] CPU: 1 PID: 773 Comm: trinity-main Tainted: G             L 3.18.0+ #107
[15723.612819] task: ffff880240c116d0 ti: ffff8800582d4000 task.ti: ffff8800582d4000
[15723.613925] RIP: 0010:[<ffffffff817c0f17>]  [<ffffffff817c0f17>] __slab_alloc+0x52f/0x58f
[15723.615063] RSP: 0018:ffff8800582d7a58  EFLAGS: 00000246
[15723.616174] RAX: 0000000000000001 RBX: ffff88014bc57900 RCX: 00000000000002a0
[15723.617287] RDX: ffff88024500d620 RSI: 0000000000000000 RDI: ffff88024483d600
[15723.618407] RBP: ffff8800582d7b48 R08: 0000000000000000 R09: 0000000000000000
[15723.619522] R10: 0000000000000092 R11: ffff88014bc54f30 R12: ffffffff810135bf
[15723.620632] R13: ffff8800582d79d8 R14: 0000000100190010 R15: ffffffff81202809
[15723.621744] FS:  00007f2495d4c740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15723.622868] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15723.623990] CR2: 0000000000e39000 CR3: 0000000094e20000 CR4: 00000000001407e0
[15723.625133] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15723.626265] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15723.627394] Stack:
[15723.628515]  ffff8800582d7a68 ffffffff810ab7d5 ffff8800582d7ae8 ffffffff810c4e0c
[15723.629664]  ffff8800582d7a98 ffffffff810c32ff ffff880240c116d0 0000000000000000
[15723.630811]  ffff8802451d80a0 0000000200000092 ffff8800582d7ad8 ffffffff81202809
[15723.631984] Call Trace:
[15723.633117]  [<ffffffff810ab7d5>] ? local_clock+0x25/0x30
[15723.634276]  [<ffffffff810c4e0c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15723.635346]  [<ffffffff810c32ff>] ? lock_release_holdtime.part.24+0xf/0x190
[15723.636425]  [<ffffffff81202809>] ? __d_alloc+0x29/0x1d0
[15723.637510]  [<ffffffff810ab7d5>] ? local_clock+0x25/0x30
[15723.638551]  [<ffffffff810c32ff>] ? lock_release_holdtime.part.24+0xf/0x190
[15723.639590]  [<ffffffff81202809>] ? __d_alloc+0x29/0x1d0
[15723.640631]  [<ffffffff811d172b>] kmem_cache_alloc+0x1cb/0x1f0
[15723.641649]  [<ffffffff81202fcf>] ? __d_lookup+0xdf/0x1c0
[15723.642661]  [<ffffffff81202ef5>] ? __d_lookup+0x5/0x1c0
[15723.643670]  [<ffffffff81202809>] __d_alloc+0x29/0x1d0
[15723.644712]  [<ffffffff812029d1>] d_alloc+0x21/0x80
[15723.645731]  [<ffffffff811f217b>] lookup_dcache+0x8b/0xb0
[15723.646734]  [<ffffffff817c128a>] ? lookup_slow+0x38/0xad
[15723.647731]  [<ffffffff811f21cd>] __lookup_hash+0x2d/0x60
[15723.648709]  [<ffffffff817c1299>] lookup_slow+0x47/0xad
[15723.649687]  [<ffffffff811f76a8>] path_lookupat+0x748/0x770
[15723.650669]  [<ffffffff811d1666>] ? kmem_cache_alloc+0x106/0x1f0
[15723.651666]  [<ffffffff811f879f>] ? getname_flags+0x4f/0x1a0
[15723.652634]  [<ffffffff811f76fb>] filename_lookup+0x2b/0xc0
[15723.653603]  [<ffffffff811f97c3>] user_path_at_empty+0x63/0xc0
[15723.654619]  [<ffffffff811a1ad7>] ? might_fault+0x47/0x50
[15723.655582]  [<ffffffff811ece47>] ? cp_new_stat+0x157/0x190
[15723.656540]  [<ffffffff811f9831>] user_path_at+0x11/0x20
[15723.657495]  [<ffffffff811ec983>] vfs_fstatat+0x63/0xc0
[15723.658458]  [<ffffffff811ecf64>] SYSC_newfstatat+0x24/0x60
[15723.659412]  [<ffffffff8137432e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[15723.660354]  [<ffffffff811ed1ce>] SyS_newfstatat+0xe/0x10
[15723.661310]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15723.662268] Code: 00 02 00 00 49 c7 45 00 00 00 00 00 75 11 ff b5 78 ff ff ff 9d e8 9a 87 98 ff 4c 89 e0 eb 0f e8 90 88 98 ff ff b5 78 ff ff ff 9d <4c> 89 e0 48 8b 55 c8 65 48 33 14 25 28 00 00 00 74 3c e8 02 6b 
[15723.664427] sending NMI to other CPUs:
[15723.665416] NMI backtrace for cpu 2
[15723.666409] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.18.0+ #107
[15723.668424] task: ffff88024348c470 ti: ffff88024371c000 task.ti: ffff88024371c000
[15723.669432] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15723.670429] RSP: 0018:ffff88024371fe08  EFLAGS: 00000046
[15723.671412] RAX: 0000000000000001 RBX: 0000000000000002 RCX: 0000000000000001
[15723.672382] RDX: 0000000000000000 RSI: ffff88024371ffd8 RDI: 0000000000000002
[15723.673330] RBP: ffff88024371fe38 R08: 000000008baf90c8 R09: 0000000000000000
[15723.674260] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[15723.675167] R13: 0000000000000001 R14: 0000000000000001 R15: ffff88024371c000
[15723.676047] FS:  0000000000000000(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15723.676921] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15723.677771] CR2: 0000000000da6000 CR3: 000000007252c000 CR4: 00000000001407e0
[15723.678613] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15723.679446] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15723.680268] Stack:
[15723.681077]  000000024371fe38 87ce37a9ab100d43 ffffe8ffff202118 0000000000000002
[15723.681913]  ffffffff81cb19c0 0000000000000002 ffff88024371fe88 ffffffff8165d385
[15723.682749]  00000e4f0b188364 ffffffff81cb1a88 ffffffff81cb19c0 ffffffff81d213f0
[15723.683590] Call Trace:
[15723.684416]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15723.685247]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15723.686064]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15723.686880]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15723.687697] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15723.689514] NMI backtrace for cpu 0
[15723.690374] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #107
[15723.692130] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[15723.693019] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15723.693920] RSP: 0018:ffffffff81c03e38  EFLAGS: 00000046
[15723.694813] RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
[15723.695721] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[15723.696616] RBP: ffffffff81c03e68 R08: 000000008baf90c8 R09: 0000000000000000
[15723.697519] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
[15723.698420] R13: 0000000000000020 R14: 0000000000000003 R15: ffffffff81c00000
[15723.699315] FS:  0000000000000000(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15723.700217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15723.701120] CR2: 0000003370219050 CR3: 0000000001c11000 CR4: 00000000001407f0
[15723.702038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15723.702952] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15723.703863] Stack:
[15723.704771]  0000000081c03e68 76e04e88cc48b66f ffffe8fffee02118 0000000000000004
[15723.705706]  ffffffff81cb19c0 0000000000000000 ffffffff81c03eb8 ffffffff8165d385
[15723.706649]  00000e4f0b0b0da0 ffffffff81cb1b38 ffffffff81cb19c0 ffffffff81d213f0
[15723.707593] Call Trace:
[15723.708528]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15723.709476]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15723.710415]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15723.711355]  [<ffffffff817b9639>] rest_init+0xc9/0xd0
[15723.712298]  [<ffffffff817b9575>] ? rest_init+0x5/0xd0
[15723.713241]  [<ffffffff81f21de6>] ? ftrace_init+0xa8/0x13b
[15723.714184]  [<ffffffff81f03041>] start_kernel+0x492/0x4b3
[15723.715128]  [<ffffffff81f0299f>] ? set_init_arg+0x55/0x55
[15723.716070]  [<ffffffff81f02581>] x86_64_start_reservations+0x2a/0x2c
[15723.717018]  [<ffffffff81f02675>] x86_64_start_kernel+0xf2/0xf6
[15723.717959] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15723.720020] NMI backtrace for cpu 3
[15723.721025] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #107
[15723.723020] task: ffff88024348ada0 ti: ffff880243728000 task.ti: ffff880243728000
[15723.724043] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15723.725069] RSP: 0018:ffff88024372be08  EFLAGS: 00000046
[15723.726079] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15723.727071] RDX: 0000000000000000 RSI: ffff88024372bfd8 RDI: 0000000000000003
[15723.728051] RBP: ffff88024372be38 R08: 000000008baf90c8 R09: 0000000000000000
[15723.729029] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15723.730028] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880243728000
[15723.731028] FS:  0000000000000000(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15723.732019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15723.732997] CR2: 0000003370219050 CR3: 0000000001c11000 CR4: 00000000001407e0
[15723.734000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15723.734971] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15723.735943] Stack:
[15723.736901]  000000034372be38 492161474c69335b ffffe8ffff402118 0000000000000005
[15723.737889]  ffffffff81cb19c0 0000000000000003 ffff88024372be88 ffffffff8165d385
[15723.738904]  00000e4f0b04a497 ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15723.739903] Call Trace:
[15723.740880]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15723.741867]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15723.742849]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15723.743828]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15723.744820] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15751.587822] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c127:9822]
[15751.588924] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15751.594608] CPU: 1 PID: 9822 Comm: trinity-c127 Tainted: G             L 3.18.0+ #107
[15751.596824] task: ffff880229bb96d0 ti: ffff8801699f4000 task.ti: ffff8801699f4000
[15751.597973] RIP: 0010:[<ffffffff81117f09>]  [<ffffffff81117f09>] map_id_up+0x9/0x80
[15751.599126] RSP: 0018:ffff8801699f7e78  EFLAGS: 00000246
[15751.600280] RAX: ffff88002394c000 RBX: 0000000000000292 RCX: ffff88022357edb0
[15751.601445] RDX: ffffffff81c48680 RSI: 00000000000003e8 RDI: ffffffff81c41700
[15751.602611] RBP: ffff8801699f7e88 R08: 00007f2495d4c740 R09: 0000000000000001
[15751.603772] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000096
[15751.604933] R13: ffff8801699f7e18 R14: ffff88022fdd3e38 R15: ffff880229bb96d0
[15751.606071] FS:  00007f2495d4c740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15751.607202] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15751.608347] CR2: 0000000000639210 CR3: 00000001dbbc4000 CR4: 00000000001407e0
[15751.609502] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15751.610645] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15751.611774] Stack:
[15751.612884]  ffff8801699f7e88 ffffffff81117fae ffff8801699f7f68 ffffffff81086dda
[15751.614045]  00007f2495542000 0000000000000000 ffff880229bb9aa0 0000000000000000
[15751.615175]  ffff880100000000 000000000000265e ffff880061283800 00007fffa68babd0
[15751.616317] Call Trace:
[15751.617454]  [<ffffffff81117fae>] ? from_kuid_munged+0xe/0x20
[15751.618617]  [<ffffffff81086dda>] SYSC_kill+0x7a/0x240
[15751.619749]  [<ffffffff8107aeab>] ? SyS_wait4+0x8b/0x110
[15751.620889]  [<ffffffff81088cce>] SyS_kill+0xe/0x10
[15751.622034]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15751.623177] Code: 04 52 5d 03 74 87 08 89 f0 29 c8 c3 8b 47 0c 01 c8 83 e8 01 39 c6 77 a7 31 d2 eb e2 66 0f 1f 44 00 00 0f 1f 44 00 00 55 44 8b 0f <48> 89 e5 45 85 c9 b8 ff ff ff ff 74 3c 8b 4f 08 39 ce 73 4d 48 
[15751.625658] sending NMI to other CPUs:
[15751.626847] NMI backtrace for cpu 0
[15751.627968] CPU: 0 PID: 9692 Comm: trinity-c41 Tainted: G             L 3.18.0+ #107
[15751.630204] task: ffff88020ce1c470 ti: ffff88009acb0000 task.ti: ffff88009acb0000
[15751.631339] RIP: 0010:[<ffffffff810c93c0>]  [<ffffffff810c93c0>] do_raw_spin_unlock+0x0/0xa0
[15751.632467] RSP: 0018:ffff88009acb3b10  EFLAGS: 00000092
[15751.633576] RAX: ffff88020ce1c470 RBX: ffff88009d38f728 RCX: 0000000000002640
[15751.634685] RDX: ffff880244e1cf80 RSI: 0000000000000000 RDI: ffff88009d38f728
[15751.635789] RBP: ffff88009acb3b28 R08: 0000000000000000 R09: 0000000000000001
[15751.636888] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88009d38f710
[15751.637990] R13: 0000000000000000 R14: ffff88009d38f728 R15: 0000000000000000
[15751.639090] FS:  00007f2495d4c740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15751.640201] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15751.641312] CR2: 0000000000000004 CR3: 0000000097156000 CR4: 00000000001407f0
[15751.642424] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15751.643531] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15751.644627] Stack:
[15751.645719]  ffffffff817cce4b ffff88009acb3b28 ffffea000820ddc0 ffff88009acb3b68
[15751.646844]  ffffffff8118f783 ffff88009acb3b58 0000000000000003 0000000000000000
[15751.647958]  0000000000000000 ffff88009d38f500 0000000000000000 ffff88009acb3c18
[15751.649070] Call Trace:
[15751.650165]  [<ffffffff817cce4b>] ? _raw_spin_unlock_irq+0x2b/0x40
[15751.651269]  [<ffffffff8118f783>] shmem_add_to_page_cache+0xe3/0x140
[15751.652368]  [<ffffffff8118ff9c>] shmem_getpage_gfp+0x50c/0x7a0
[15751.653456]  [<ffffffff81190272>] shmem_write_begin+0x42/0x70
[15751.654552]  [<ffffffff81175494>] generic_perform_write+0xd4/0x1f0
[15751.655646]  [<ffffffff81177bf2>] __generic_file_write_iter+0x162/0x350
[15751.656742]  [<ffffffff811e6d90>] ? new_sync_read+0xd0/0xd0
[15751.657819]  [<ffffffff81177e1f>] generic_file_write_iter+0x3f/0xb0
[15751.658872]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15751.659919]  [<ffffffff811e6ed8>] do_iter_readv_writev+0x78/0xc0
[15751.660948]  [<ffffffff811e8708>] do_readv_writev+0xd8/0x2a0
[15751.661950]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15751.662938]  [<ffffffff81177de0>] ? __generic_file_write_iter+0x350/0x350
[15751.663898]  [<ffffffff810c32ff>] ? lock_release_holdtime.part.24+0xf/0x190
[15751.664836]  [<ffffffff817cce50>] ? _raw_spin_unlock_irq+0x30/0x40
[15751.665755]  [<ffffffff811e895c>] vfs_writev+0x3c/0x50
[15751.666657]  [<ffffffff811e8acc>] SyS_writev+0x5c/0x100
[15751.667535]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15751.668406] Code: 00 89 d0 f0 66 0f b1 37 66 39 d0 75 e0 b1 01 5d 65 8b 04 25 2c a0 00 00 89 47 08 65 48 8b 04 25 00 aa 00 00 48 89 47 10 89 c8 c3 <0f> 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 81 7f 04 ad 
[15751.670313] NMI backtrace for cpu 3
[15751.671258] CPU: 3 PID: 8780 Comm: trinity-c136 Tainted: G             L 3.18.0+ #107
[15751.673201] task: ffff88009ed316d0 ti: ffff8802333fc000 task.ti: ffff8802333fc000
[15751.674212] RIP: 0010:[<ffffffff81185b90>]  [<ffffffff81185b90>] cancel_dirty_page+0x20/0xc0
[15751.675256] RSP: 0018:ffff8802333ffc30  EFLAGS: 00000296
[15751.676330] RAX: ffffffff81d24160 RBX: ffff88016985bd20 RCX: 0000000000005ed5
[15751.677366] RDX: ffffea0005d7c700 RSI: 0000000000001000 RDI: ffffea0005d7c700
[15751.678392] RBP: ffff8802333ffc48 R08: 0000000000000000 R09: 0000000000000046
[15751.679383] R10: ffff8802333ffbd8 R11: 0000000000000000 R12: 0000000000001000
[15751.680368] R13: ffff8802333ffd40 R14: 0000000000000000 R15: 0000800000129380
[15751.681341] FS:  00007f2495d4c740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15751.682321] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15751.683301] CR2: 00000000ffffff48 CR3: 0000000226fe9000 CR4: 00000000001407e0
[15751.684300] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15751.685278] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15751.686254] Stack:
[15751.687221]  ffffea0005d7c700 ffff88016985bd20 ffff8802333ffd40 ffff8802333ffc68
[15751.688225]  ffffffff8118609e 0000000000000009 ffff8802333ffcd0 ffff8802333ffdf8
[15751.689222]  ffffffff811913f3 ffff880243b12f40 ffff88016985bb10 000000eb00000000
[15751.690199] Call Trace:
[15751.691159]  [<ffffffff8118609e>] truncate_inode_page+0x4e/0x90
[15751.692117]  [<ffffffff811913f3>] shmem_undo_range+0x423/0x710
[15751.693078]  [<ffffffff811916f8>] shmem_truncate_range+0x18/0x40
[15751.694033]  [<ffffffff81191986>] shmem_setattr+0x116/0x1a0
[15751.694964]  [<ffffffff81206a21>] notify_change+0x241/0x390
[15751.695880]  [<ffffffff811e4f25>] do_truncate+0x75/0xc0
[15751.696783]  [<ffffffff811e529a>] ? do_sys_ftruncate.constprop.14+0xda/0x160
[15751.697697]  [<ffffffff811e52cf>] do_sys_ftruncate.constprop.14+0x10f/0x160
[15751.698595]  [<ffffffff8137432e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[15751.699490]  [<ffffffff811e5370>] compat_SyS_ftruncate+0x10/0x20
[15751.700403]  [<ffffffff817d0249>] ia32_do_call+0x13/0x13
[15751.701301] Code: ef ff 48 8b 55 d8 e9 4f ff ff ff 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 41 89 f4 53 48 83 ec 08 f0 0f ba 37 04 72 14 48 83 c4 08 <5b> 41 5c 41 5d 5d c3 66 0f 1f 84 00 00 00 00 00 48 8b 5f 08 48 
[15751.703287] NMI backtrace for cpu 2
[15751.704239] CPU: 2 PID: 9809 Comm: trinity-c81 Tainted: G             L 3.18.0+ #107
[15751.706139] task: ffff8802277e0000 ti: ffff880222cb0000 task.ti: ffff880222cb0000
[15751.707119] RIP: 0010:[<ffffffff810c4d22>]  [<ffffffff810c4d22>] __lock_acquire.isra.31+0x142/0x9f0
[15751.708111] RSP: 0018:ffff880222cb3d68  EFLAGS: 00000002
[15751.709102] RAX: 0000000000000000 RBX: ffff8802277e0000 RCX: 0000000000000002
[15751.710083] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81c50da0
[15751.711071] RBP: ffff880222cb3dd8 R08: 0000000000000000 R09: 0000000000000000
[15751.712049] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82596770
[15751.713027] R13: 0000000000000000 R14: ffffffff81c50da0 R15: 0000000000000000
[15751.714017] FS:  00007f2495d4c740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15751.715002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15751.715993] CR2: 0000000000000000 CR3: 000000017cbc0000 CR4: 00000000001407e0
[15751.716989] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15751.717995] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15751.718991] Stack:
[15751.720012]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[15751.721025]  ffff880222cb3d98 ffffffff810ab7d5 ffff880222cb3e18 ffffffff810c4e0c
[15751.722043]  ffff880222cb3db8 0000000000000246 0000000000000000 0000000000000000
[15751.723057] Call Trace:
[15751.724058]  [<ffffffff810ab7d5>] ? local_clock+0x25/0x30
[15751.725094]  [<ffffffff810c4e0c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15751.726114]  [<ffffffff810c5cdf>] lock_acquire+0x9f/0x120
[15751.727134]  [<ffffffff81086c95>] ? kill_pid_info+0x5/0xb0
[15751.728178]  [<ffffffff81086cd8>] kill_pid_info+0x48/0xb0
[15751.729197]  [<ffffffff81086c95>] ? kill_pid_info+0x5/0xb0
[15751.730225]  [<ffffffff81086e2c>] SYSC_kill+0xcc/0x240
[15751.731216]  [<ffffffff81086de8>] ? SYSC_kill+0x88/0x240
[15751.732193]  [<ffffffff8107aeab>] ? SyS_wait4+0x8b/0x110
[15751.733174]  [<ffffffff81088cce>] SyS_kill+0xe/0x10
[15751.734133]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15751.735076] Code: 00 00 45 31 ed e9 f3 01 00 00 0f 1f 80 00 00 00 00 44 89 e8 4d 8b 64 c6 08 4d 85 e4 0f 84 21 ff ff ff f0 41 ff 84 24 98 01 00 00 <8b> 3d f0 e4 aa 01 44 8b ab 68 07 00 00 85 ff 75 0a 41 83 fd 2f 
[15751.737141] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 110.288 msecs
[15779.571634] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[15779.572815] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15779.579104] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G             L 3.18.0+ #107
[15779.581677] task: ffff8802434896d0 ti: ffff880243714000 task.ti: ffff880243714000
[15779.582931] RIP: 0010:[<ffffffff8165d3a9>]  [<ffffffff8165d3a9>] cpuidle_enter_state+0x79/0x190
[15779.584212] RSP: 0000:ffff880243717e48  EFLAGS: 00000246
[15779.585481] RAX: 00000e5c114d0f5f RBX: ffffffff82bb8048 RCX: 0000000000000019
[15779.586751] RDX: 20c49ba5e353f7cf RSI: 000000000002c480 RDI: 00410b3fa729c634
[15779.588025] RBP: ffff880243717e88 R08: 000000008baf9021 R09: 0000000000000000
[15779.589299] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff810c32ff
[15779.590568] R13: ffff880243717dc8 R14: ffffffff810ab7d5 R15: ffff880243717da8
[15779.591843] FS:  0000000000000000(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15779.593080] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15779.594318] CR2: 0000000000d80c80 CR3: 0000000001c11000 CR4: 00000000001407e0
[15779.595572] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15779.596822] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15779.598066] Stack:
[15779.599304]  00000e5c1132f748 ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15779.600571]  ffffe8ffff002118 ffff880243714000 ffffffff81cb19c0 ffff880243714000
[15779.601840]  ffff880243717e98 ffffffff8165d577 ffff880243717f08 ffffffff810bd345
[15779.603062] Call Trace:
[15779.604277]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15779.605482]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15779.606660]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15779.607831] Code: c8 48 89 df ff 50 48 41 89 c5 e8 c3 f7 a8 ff 44 8b 63 04 49 89 c7 0f 1f 44 00 00 e8 02 c4 ae ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 2b 7d c0 4c 89 f8 49 c1 ff 3f 48 f7 ea b8 ff ff ff 7f 48 c1 
[15779.610336] sending NMI to other CPUs:
[15779.611504] NMI backtrace for cpu 0
[15779.612633] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #107
[15779.614924] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[15779.616088] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15779.617265] RSP: 0018:ffffffff81c03e38  EFLAGS: 00000046
[15779.618434] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000001
[15779.619614] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[15779.620782] RBP: ffffffff81c03e68 R08: 000000008baf9021 R09: 0000000000000000
[15779.621945] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[15779.623100] R13: 0000000000000000 R14: 0000000000000001 R15: ffffffff81c00000
[15779.624258] FS:  0000000000000000(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15779.625426] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15779.626584] CR2: 00007f249422c269 CR3: 0000000001c11000 CR4: 00000000001407f0
[15779.627747] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15779.628901] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15779.630043] Stack:
[15779.631174]  0000000081c03e68 76e04e88cc48b66f ffffe8fffee02118 0000000000000001
[15779.632331]  ffffffff81cb19c0 0000000000000000 ffffffff81c03eb8 ffffffff8165d385
[15779.633491]  00000e5c139d54ef ffffffff81cb1a30 ffffffff81cb19c0 ffffffff81d213f0
[15779.634649] Call Trace:
[15779.635799]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15779.636965]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15779.638128]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15779.639291]  [<ffffffff817b9639>] rest_init+0xc9/0xd0
[15779.640454]  [<ffffffff817b9575>] ? rest_init+0x5/0xd0
[15779.641617]  [<ffffffff81f21de6>] ? ftrace_init+0xa8/0x13b
[15779.642783]  [<ffffffff81f03041>] start_kernel+0x492/0x4b3
[15779.643943]  [<ffffffff81f0299f>] ? set_init_arg+0x55/0x55
[15779.645078]  [<ffffffff81f02581>] x86_64_start_reservations+0x2a/0x2c
[15779.646196]  [<ffffffff81f02675>] x86_64_start_kernel+0xf2/0xf6
[15779.647291] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15779.649635] NMI backtrace for cpu 3
[15779.650755] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #107
[15779.652899] task: ffff88024348ada0 ti: ffff880243728000 task.ti: ffff880243728000
[15779.653930] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15779.654957] RSP: 0018:ffff88024372be08  EFLAGS: 00000046
[15779.655960] RAX: 0000000000000010 RBX: 0000000000000004 RCX: 0000000000000001
[15779.656955] RDX: 0000000000000000 RSI: ffff88024372bfd8 RDI: 0000000000000003
[15779.657940] RBP: ffff88024372be38 R08: 000000008baf9021 R09: 0000000000000000
[15779.658918] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
[15779.659893] R13: 0000000000000010 R14: 0000000000000002 R15: ffff880243728000
[15779.660862] FS:  0000000000000000(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15779.661833] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15779.662808] CR2: 00007f24941d6269 CR3: 0000000001c11000 CR4: 00000000001407e0
[15779.663797] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15779.664768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15779.665737] Stack:
[15779.666688]  000000034372be38 492161474c69335b ffffe8ffff402118 0000000000000003
[15779.667667]  ffffffff81cb19c0 0000000000000003 ffff88024372be88 ffffffff8165d385
[15779.668656]  00000e5c1316d5c0 ffffffff81cb1ae0 ffffffff81cb19c0 ffffffff81d213f0
[15779.669642] Call Trace:
[15779.670611]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15779.671604]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15779.672584]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15779.673558]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15779.674526] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15779.676645] NMI backtrace for cpu 2
[15779.677640] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.18.0+ #107
[15779.679625] task: ffff88024348c470 ti: ffff88024371c000 task.ti: ffff88024371c000
[15779.680640] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15779.681674] RSP: 0018:ffff88024371fe08  EFLAGS: 00000046
[15779.682678] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[15779.683669] RDX: 0000000000000000 RSI: ffff88024371ffd8 RDI: 0000000000000002
[15779.684654] RBP: ffff88024371fe38 R08: 000000008baf9021 R09: 0000000000000000
[15779.685629] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[15779.686605] R13: 0000000000000032 R14: 0000000000000004 R15: ffff88024371c000
[15779.687581] FS:  0000000000000000(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15779.688566] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15779.689551] CR2: 00000000006ee1e0 CR3: 0000000001c11000 CR4: 00000000001407e0
[15779.690546] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15779.691544] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15779.692532] Stack:
[15779.693514]  000000024371fe38 87ce37a9ab100d43 ffffe8ffff202118 0000000000000005
[15779.694530]  ffffffff81cb19c0 0000000000000002 ffff88024371fe88 ffffffff8165d385
[15779.695552]  00000e5c13171780 ffffffff81cb1b90 ffffffff81cb19c0 ffffffff81d213f0
[15779.696576] Call Trace:
[15779.697590]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15779.698615]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15779.699626]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15779.700624]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15779.701619] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15807.555361] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c35:18287]
[15807.556447] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15807.562334] CPU: 1 PID: 18287 Comm: trinity-c35 Tainted: G             L 3.18.0+ #107
[15807.564718] task: ffff8802252edb40 ti: ffff8802253f0000 task.ti: ffff8802253f0000
[15807.565907] RIP: 0010:[<ffffffff811a30c9>]  [<ffffffff811a30c9>] unmap_single_vma+0x7d9/0x900
[15807.567106] RSP: 0018:ffff8802253f38e8  EFLAGS: 00000286
[15807.568294] RAX: 0000000000000001 RBX: 0000000000000246 RCX: 00000000007d6dc0
[15807.569470] RDX: ffff8800972c1010 RSI: ffff8802253f3ac0 RDI: ffff880223c22000
[15807.570629] RBP: ffff8802253f39d8 R08: 0000000000000000 R09: 0000000000000001
[15807.571790] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8802252edb40
[15807.572940] R13: 0000000000000003 R14: 0000000000000246 R15: ffff8802253f38a8
[15807.574084] FS:  00007f1edb7cb740(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15807.575239] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15807.576412] CR2: 0000000000639058 CR3: 00000001dbb98000 CR4: 00000000001407e0
[15807.577594] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15807.578759] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15807.579916] Stack:
[15807.581064]  00007f1ed9440fff 00007f1ed9440fff ffff88020cfc83d8 ffff8801b64d47f0
[15807.582230]  00007f1ed9441000 00007f1ed9441000 00007f1ed9440fff 00007f1ed9441000
[15807.583396]  ffff8802253f3968 ffff8802253f39a0 ffff8802252edb40 00003ffffffff000
[15807.584565] Call Trace:
[15807.585732]  [<ffffffff811a32ec>] zap_page_range_single+0xfc/0x160
[15807.586897]  [<ffffffff811a3415>] ? unmap_mapping_range+0x75/0x190
[15807.588054]  [<ffffffff810ab7d5>] ? local_clock+0x25/0x30
[15807.589203]  [<ffffffff811a3517>] unmap_mapping_range+0x177/0x190
[15807.590353]  [<ffffffff8118607d>] truncate_inode_page+0x2d/0x90
[15807.591490]  [<ffffffff811913f3>] shmem_undo_range+0x423/0x710
[15807.592633]  [<ffffffff811916f8>] shmem_truncate_range+0x18/0x40
[15807.593779]  [<ffffffff81191c47>] shmem_fallocate+0x237/0x520
[15807.594916]  [<ffffffff810bc857>] ? prepare_to_wait+0x27/0x90
[15807.596039]  [<ffffffff811e54b2>] do_fallocate+0x132/0x1d0
[15807.597135]  [<ffffffff811b79d8>] SyS_madvise+0x398/0x870
[15807.598225]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15807.599315] Code: f7 45 98 00 08 00 00 48 0f 45 c2 49 89 04 24 e9 8b fc ff ff 66 0f 1f 44 00 00 4d 85 f6 0f 84 4e fc ff ff 48 8b 75 a8 48 8b 56 08 <48> 85 d2 74 0a 49 3b 56 08 0f 85 e6 fb ff ff 48 8b 75 a8 48 83 
[15807.601631] sending NMI to other CPUs:
[15807.602717] NMI backtrace for cpu 0
[15807.603771] CPU: 0 PID: 16807 Comm: trinity-c87 Tainted: G             L 3.18.0+ #107
[15807.605890] task: ffff88005e3e16d0 ti: ffff880225c54000 task.ti: ffff880225c54000
[15807.606966] RIP: 0010:[<ffffffff811496a0>]  [<ffffffff811496a0>] trace_hardirqs_off+0x0/0x100
[15807.608063] RSP: 0018:ffff880244e03cc0  EFLAGS: 00000092
[15807.609150] RAX: 0000000000000000 RBX: 0000000000000092 RCX: 0000000000007560
[15807.610243] RDX: ffff880244e3f960 RSI: 0000000000000000 RDI: ffff88023fa7da98
[15807.611329] RBP: ffff880244e03cd8 R08: 0000000000000000 R09: 0000000000000001
[15807.612412] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88023fa7da98
[15807.613498] R13: 0000000000000092 R14: ffff88023fa7da98 R15: ffff880240298000
[15807.614589] FS:  00007f1edb7cb740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15807.615682] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15807.616767] CR2: 00007f1ed9c41000 CR3: 0000000225685000 CR4: 00000000001407f0
[15807.617855] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15807.618938] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15807.620007] Stack:
[15807.621068]  ffffffff817cce0f ffff88023fa7d668 ffff88023fa7da98 ffff880244e03d38
[15807.622157]  ffffffff815fbecd ffff88023fa7da98 ffff88023fa7dcb8 000000203fa7da98
[15807.623246]  ffff88023fa7dbb8 ffff88023fa7da98 ffff88023fa7d668 ffff880240d06aa8
[15807.624334] Call Trace:
[15807.625388]  <IRQ> 

[15807.626421]  [<ffffffff817cce0f>] ? _raw_spin_unlock_irqrestore+0x4f/0x60
[15807.627453]  [<ffffffff815fbecd>] usb_serial_generic_write_start+0x11d/0x200
[15807.628469]  [<ffffffff815fc193>] usb_serial_generic_write_bulk_callback+0x93/0x1b0
[15807.629462]  [<ffffffff815b0542>] __usb_hcd_giveback_urb+0x72/0x120
[15807.630430]  [<ffffffff815b0733>] usb_hcd_giveback_urb+0x43/0x120
[15807.631379]  [<ffffffff815f3203>] xhci_irq+0xc53/0x1c70
[15807.632303]  [<ffffffff810ab7d5>] ? local_clock+0x25/0x30
[15807.633212]  [<ffffffff810c4e0c>] ? __lock_acquire.isra.31+0x22c/0x9f0
[15807.634107]  [<ffffffff815f4231>] xhci_msi_irq+0x11/0x20
[15807.634981]  [<ffffffff810cf8f5>] handle_irq_event_percpu+0x55/0x1f0
[15807.635848]  [<ffffffff810cfad1>] handle_irq_event+0x41/0x70
[15807.636708]  [<ffffffff810d2b5e>] ? handle_edge_irq+0x1e/0x140
[15807.637557]  [<ffffffff810d2bbf>] handle_edge_irq+0x7f/0x140
[15807.638404]  [<ffffffff810053a1>] handle_irq+0xb1/0x140
[15807.639247]  [<ffffffff817d0413>] do_IRQ+0x53/0x100
[15807.640085]  [<ffffffff817ce56f>] common_interrupt+0x6f/0x6f
[15807.640927]  <EOI> 

[15807.641761]  [<ffffffff81079bc4>] ? wait_consider_task+0x84/0xcc0
[15807.642588]  [<ffffffff8107a8d9>] ? do_wait+0xd9/0x280
[15807.643412]  [<ffffffff8107a920>] do_wait+0x120/0x280
[15807.644234]  [<ffffffff8107aea0>] SyS_wait4+0x80/0x110
[15807.645064]  [<ffffffff81078980>] ? task_stopped_code+0x60/0x60
[15807.645885]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15807.646697] Code: e8 66 f6 ff ff 4c 8b 4d c8 4a 8b 04 cd a0 f0 d1 81 41 c7 04 06 01 00 00 00 f0 41 ff 0f e9 46 ff ff ff 66 0f 1f 84 00 00 00 00 00 <f6> 05 51 a0 bd 00 02 75 07 c3 66 0f 1f 44 00 00 9c 58 f6 c4 02 
[15807.648500] NMI backtrace for cpu 2
[15807.649383] CPU: 2 PID: 15128 Comm: trinity-c180 Tainted: G             L 3.18.0+ #107
[15807.651182] task: ffff880225dd2da0 ti: ffff88009aed8000 task.ti: ffff88009aed8000
[15807.652102] RIP: 0010:[<ffffffff810c610b>]  [<ffffffff810c610b>] lock_release+0x7b/0x240
[15807.653030] RSP: 0018:ffff88009aedbe10  EFLAGS: 00000046
[15807.653960] RAX: 0000000000000000 RBX: ffff880225dd2da0 RCX: ffff8802266f74d8
[15807.654896] RDX: ffffffff8107a9f1 RSI: 0000000000000000 RDI: 0000000000000001
[15807.655839] RBP: ffff88009aedbe48 R08: 0000000000000000 R09: 0000000000000001
[15807.656782] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81c0a098
[15807.657724] R13: ffffffff8107a9f1 R14: 0000000000000296 R15: 0000000000000001
[15807.658669] FS:  00007f1edb7cb740(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15807.659626] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15807.660583] CR2: 0000000000000001 CR3: 00000000613c7000 CR4: 00000000001407e0
[15807.661526] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15807.662446] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15807.663355] Stack:
[15807.664262]  0000000000000046 ffffffff81c0a098 ffffffff81c0a080 ffff880225dd3108
[15807.665197]  ffff880225dd2da0 ffff880225dd2da0 ffff880225dd2d90 ffff88009aedbe68
[15807.666125]  ffffffff817cd103 ffff880225dd2da0 ffff88009aedbef8 ffff88009aedbee8
[15807.667012] Call Trace:
[15807.667883]  [<ffffffff817cd103>] _raw_read_unlock+0x23/0x40
[15807.668761]  [<ffffffff8107a9f1>] do_wait+0x1f1/0x280
[15807.669630]  [<ffffffff8107aea0>] SyS_wait4+0x80/0x110
[15807.670494]  [<ffffffff81078980>] ? task_stopped_code+0x60/0x60
[15807.671365]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15807.672238] Code: 05 0b cc c3 00 8b 3d 75 6d be 00 65 ff 0c 25 e0 a9 00 00 65 48 8b 1c 25 00 aa 00 00 85 ff 74 3a 8b 35 12 d1 aa 01 85 f6 75 0b 9c <58> f6 c4 02 0f 85 6e 01 00 00 8b 93 68 07 00 00 85 d2 0f 8e 45 
[15807.674141] NMI backtrace for cpu 3
[15807.675053] CPU: 3 PID: 18981 Comm: trinity-c225 Tainted: G             L 3.18.0+ #107
[15807.676970] task: ffff88011c4e0000 ti: ffff880225e94000 task.ti: ffff880225e94000
[15807.677939] RIP: 0010:[<ffffffff810c4dfa>]  [<ffffffff810c4dfa>] __lock_acquire.isra.31+0x21a/0x9f0
[15807.678917] RSP: 0018:ffff880225e97a48  EFLAGS: 00000002
[15807.679875] RAX: 0000000000000008 RBX: ffff88011c4e0000 RCX: 0000000000000000
[15807.680839] RDX: 0000000000000000 RSI: 0000000000000010 RDI: 0000000000000000
[15807.681802] RBP: ffff880225e97ab8 R08: 0000000000000001 R09: 0000000000000000
[15807.682757] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000135
[15807.683717] R13: 0000000000000002 R14: ffff88024e5d4398 R15: ffff88011c4e07e0
[15807.684675] FS:  00007f1edb7cb740(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15807.685642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15807.686606] CR2: 00007f1edb72f07c CR3: 0000000229ac5000 CR4: 00000000001407e0
[15807.687579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15807.688548] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15807.689514] Stack:
[15807.690473]  ffff880225e97ac8 ffffea00087f7440 000000021fdd0000 0000000000000000
[15807.691463]  0000000000000000 000ffe0000000000 ffff880225e97ad8 0000000000000000
[15807.692449]  ffffea0006f59d80 0000000000000046 0000000000000000 0000000000000000
[15807.693433] Call Trace:
[15807.694415]  [<ffffffff810c5cdf>] lock_acquire+0x9f/0x120
[15807.695418]  [<ffffffff81184331>] ? release_pages+0x101/0x270
[15807.696412]  [<ffffffff817ccbe9>] _raw_spin_lock_irqsave+0x49/0x90
[15807.697408]  [<ffffffff81184331>] ? release_pages+0x101/0x270
[15807.698408]  [<ffffffff810ab7d5>] ? local_clock+0x25/0x30
[15807.699404]  [<ffffffff81184331>] release_pages+0x101/0x270
[15807.700406]  [<ffffffff81185423>] __pagevec_release+0x43/0x60
[15807.701396]  [<ffffffff81191430>] shmem_undo_range+0x460/0x710
[15807.702391]  [<ffffffff81191d96>] shmem_fallocate+0x386/0x520
[15807.703386]  [<ffffffff811ea115>] ? __sb_start_write+0x105/0x1c0
[15807.704382]  [<ffffffff811e549c>] ? do_fallocate+0x11c/0x1d0
[15807.705389]  [<ffffffff811e549c>] ? do_fallocate+0x11c/0x1d0
[15807.706377]  [<ffffffff811e54b2>] do_fallocate+0x132/0x1d0
[15807.707346]  [<ffffffff811e5598>] SyS_fallocate+0x48/0x80
[15807.708287]  [<ffffffff817cda52>] system_call_fastpath+0x12/0x17
[15807.709228] Code: e0 7f 44 09 d0 41 88 47 31 41 0f b6 47 32 83 e0 f0 45 85 c0 0f 95 c2 09 c8 c1 e2 03 09 d0 41 88 47 32 0f b7 55 18 41 0f b7 47 32 <c1> e2 04 83 e0 0f 09 d0 66 41 89 47 32 e8 a4 69 fe ff 4c 8b 4d 
[15835.539166] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[15835.540319] Modules linked in: 8021q garp bridge stp snd_seq_dummy hidp fuse tun rfcomm bnep af_key llc2 can_raw nfnetlink sctp libcrc32c can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic microcode serio_raw pcspkr usb_debug snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep shpchp snd_seq e1000e snd_seq_device ptp pps_core snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[15835.546410] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G             L 3.18.0+ #107
[15835.548896] task: ffff8802434896d0 ti: ffff880243714000 task.ti: ffff880243714000
[15835.550154] RIP: 0010:[<ffffffff8165d3a9>]  [<ffffffff8165d3a9>] cpuidle_enter_state+0x79/0x190
[15835.551425] RSP: 0018:ffff880243717e48  EFLAGS: 00000246
[15835.552686] RAX: 00000e691b28f962 RBX: ffffffff82bb8048 RCX: 0000000000000019
[15835.553975] RDX: 20c49ba5e353f7cf RSI: 00000000000276af RDI: 0041352987dbd37c
[15835.555275] RBP: ffff880243717e88 R08: 000000008baf8f7d R09: 0000000000000000
[15835.556557] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff810c32ff
[15835.557841] R13: ffff880243717dc8 R14: ffffffff810ab7d5 R15: ffff880243717da8
[15835.559116] FS:  0000000000000000(0000) GS:ffff880245000000(0000) knlGS:0000000000000000
[15835.560378] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15835.561620] CR2: 00007f8b7d063000 CR3: 0000000001c11000 CR4: 00000000001407e0
[15835.562868] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15835.564124] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15835.565374] Stack:
[15835.566619]  00000e691a90eefa ffffffff81cb1ae0 ffffffff81cb19c0 ffffffff81d213f0
[15835.567895]  ffffe8ffff002118 ffff880243714000 ffffffff81cb19c0 ffff880243714000
[15835.569155]  ffff880243717e98 ffffffff8165d577 ffff880243717f08 ffffffff810bd345
[15835.570338] Call Trace:
[15835.571508]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15835.572690]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15835.573850]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15835.574978] Code: c8 48 89 df ff 50 48 41 89 c5 e8 c3 f7 a8 ff 44 8b 63 04 49 89 c7 0f 1f 44 00 00 e8 02 c4 ae ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 2b 7d c0 4c 89 f8 49 c1 ff 3f 48 f7 ea b8 ff ff ff 7f 48 c1 
[15835.577402] sending NMI to other CPUs:
[15835.578547] NMI backtrace for cpu 0
[15835.579666] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #107
[15835.581963] task: ffffffff81c164c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[15835.583133] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15835.584310] RSP: 0018:ffffffff81c03e38  EFLAGS: 00000046
[15835.585475] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000001
[15835.586643] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[15835.587805] RBP: ffffffff81c03e68 R08: 000000008baf8f7d R09: 0000000000000000
[15835.588972] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[15835.590132] R13: 0000000000000000 R14: 0000000000000001 R15: ffffffff81c00000
[15835.591285] FS:  0000000000000000(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
[15835.592443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15835.593595] CR2: 00000000f0000137 CR3: 0000000001c11000 CR4: 00000000001407f0
[15835.594753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15835.595903] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15835.597050] Stack:
[15835.598189]  0000000081c03e68 76e04e88cc48b66f ffffe8fffee02118 0000000000000001
[15835.599357]  ffffffff81cb19c0 0000000000000000 ffffffff81c03eb8 ffffffff8165d385
[15835.600530]  00000e691d725867 ffffffff81cb1a30 ffffffff81cb19c0 ffffffff81d213f0
[15835.601707] Call Trace:
[15835.602870]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15835.604053]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15835.605230]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15835.606407]  [<ffffffff817b9639>] rest_init+0xc9/0xd0
[15835.607581]  [<ffffffff817b9575>] ? rest_init+0x5/0xd0
[15835.608761]  [<ffffffff81f21de6>] ? ftrace_init+0xa8/0x13b
[15835.609942]  [<ffffffff81f03041>] start_kernel+0x492/0x4b3
[15835.611123]  [<ffffffff81f0299f>] ? set_init_arg+0x55/0x55
[15835.612296]  [<ffffffff81f02581>] x86_64_start_reservations+0x2a/0x2c
[15835.613451]  [<ffffffff81f02675>] x86_64_start_kernel+0xf2/0xf6
[15835.614573] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15835.616991] NMI backtrace for cpu 3
[15835.618146] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #107
[15835.620322] task: ffff88024348ada0 ti: ffff880243728000 task.ti: ffff880243728000
[15835.621352] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15835.622392] RSP: 0000:ffff88024372be08  EFLAGS: 00000046
[15835.623397] RAX: 0000000000000001 RBX: 0000000000000002 RCX: 0000000000000001
[15835.624391] RDX: 0000000000000000 RSI: ffff88024372bfd8 RDI: 0000000000000003
[15835.625376] RBP: ffff88024372be38 R08: 000000008baf8f7d R09: 0000000000000000
[15835.626352] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[15835.627324] R13: 0000000000000001 R14: 0000000000000001 R15: ffff880243728000
[15835.628290] FS:  0000000000000000(0000) GS:ffff880245400000(0000) knlGS:0000000000000000
[15835.629259] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15835.630229] CR2: 0000000001186ff8 CR3: 0000000094ed2000 CR4: 00000000001407e0
[15835.631206] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15835.632173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15835.633133] Stack:
[15835.634080]  000000034372be38 492161474c69335b ffffe8ffff402118 0000000000000002
[15835.635053]  ffffffff81cb19c0 0000000000000003 ffff88024372be88 ffffffff8165d385
[15835.636035]  00000e691d2afb93 ffffffff81cb1a88 ffffffff81cb19c0 ffffffff81d213f0
[15835.637018] Call Trace:
[15835.637986]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15835.638969]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15835.639942]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15835.640908]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15835.641872] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[15835.643982] NMI backtrace for cpu 2
[15835.644955] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.18.0+ #107
[15835.646877] task: ffff88024348c470 ti: ffff88024371c000 task.ti: ffff88024371c000
[15835.647860] RIP: 0010:[<ffffffff813d4cab>]  [<ffffffff813d4cab>] intel_idle+0xdb/0x180
[15835.648868] RSP: 0018:ffff88024371fe08  EFLAGS: 00000046
[15835.649868] RAX: 0000000000000010 RBX: 0000000000000004 RCX: 0000000000000001
[15835.650828] RDX: 0000000000000000 RSI: ffff88024371ffd8 RDI: ffffffff81c11000
[15835.651782] RBP: ffff88024371fe38 R08: 000000008baf8f7d R09: 0000000000000000
[15835.652734] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
[15835.653682] R13: 0000000000000010 R14: 0000000000000002 R15: ffff88024371c000
[15835.654633] FS:  0000000000000000(0000) GS:ffff880245200000(0000) knlGS:0000000000000000
[15835.655588] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15835.656543] CR2: 000000336f1b7740 CR3: 0000000001c11000 CR4: 00000000001407e0
[15835.657514] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15835.658480] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[15835.659460] Stack:
[15835.660403]  000000024371fe38 87ce37a9ab100d43 ffffe8ffff202118 0000000000000003
[15835.661376]  ffffffff81cb19c0 0000000000000002 ffff88024371fe88 ffffffff8165d385
[15835.662354]  00000e691d059748 ffffffff81cb1ae0 ffffffff81cb19c0 ffffffff81d213f0
[15835.663333] Call Trace:
[15835.664296]  [<ffffffff8165d385>] cpuidle_enter_state+0x55/0x190
[15835.665268]  [<ffffffff8165d577>] cpuidle_enter+0x17/0x20
[15835.666239]  [<ffffffff810bd345>] cpu_startup_entry+0x355/0x410
[15835.667202]  [<ffffffff8103015a>] start_secondary+0x1aa/0x230
[15835.668163] Code: 31 d2 65 48 8b 34 25 08 aa 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 aa 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19  2:45                                                                                         ` Dave Jones
@ 2014-12-19  3:49                                                                                           ` Linus Torvalds
  2014-12-19  3:58                                                                                             ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-19  3:49 UTC (permalink / raw)
  To: Dave Jones, Chris Mason, Linus Torvalds, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

On Thu, Dec 18, 2014 at 6:45 PM, Dave Jones <davej@redhat.com> wrote:
>
> Example of the spew-o-rama below.

Hmm. Not only does it apparently stay up for you now, the traces seem
to be improving in quality.

There's a decided pattern of "copy_page_range()" and "zap_page_range()" here.

Now, what's *also* intriguing is how many "_raw_spin_lock_nested"
things there are in there. Which makes me suspect that you are
actually hitting some really nasty spinlock contention, and that your
22s lockups could be due to lock hold times going exponential.

So I don't think that it's the copy_page_range() itself that is
necessarily all that horribly expensive (although it's certainly not a
lightweight function), but the fact that you get contention on some
lock inside that loop, and when you have every single CPU hammering on
it things just go to hell in a handbasket.

And when spinlocks start getting  contention, *nested* spinlocks
really really hurt. And you've got all the spinlock debugging on etc,
don't you? Which just makes the locks really expensive, and much much
easier to start becoming contended (and there's *another* level of
nesting, because I think the lockdep stuff has its own locking
inside). So you have three levels of spinlock nesting, and the outer
one will be completely hammered.

So I think the locks you have are from copy_pte_range:

        dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
        if (!dst_pte)
                return -ENOMEM;
        src_pte = pte_offset_map(src_pmd, addr);
        src_ptl = pte_lockptr(src_mm, src_pmd);
        spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);

and we do have some mitigation in place for horrible contention (we
try to release the locks every few entries), but with all CPUs
hammering on these locks, and things being slow due to all the
debugging, I think we may finally be hitting the right place here.
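
(For reference, the mitigation mentioned above is the periodic
lock-dropping in the copy_pte_range() inner loop. A simplified sketch
of its shape - paraphrased rather than quoted verbatim from the 3.18
tree - looks roughly like this:)

        /* inside copy_pte_range(), with dst_ptl held and src_ptl nested */
        do {
                /*
                 * Periodically drop both page-table locks so that other
                 * CPUs spinning on them (or a pending reschedule) get a
                 * chance to run before we continue copying.
                 */
                if (progress >= 32) {
                        progress = 0;
                        if (need_resched() ||
                            spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
                                break;  /* unlock, cond_resched(), then retry from addr */
                }
                /* ... copy one pte entry, bumping 'progress' ... */
        } while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);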

Also, you do have this:

  sched: RT throttling activated

so there's something going on with RT scheduling too. I'd consider all
the softlockups after that point suspect - the softlockup thread has
presumably used so much CPU spewing out the debug messages that things
aren't really working any more RT-wise.

Lookie here (the "soft lockup" grep is to skip all the cross-CPU
traces from other CPUs that weren't necessarily the problem case):

  [torvalds@i7 linux]$ grep -5 "soft lockup" ~/0.txt | grep RIP
   RIP: 0010:   lock_acquire+0xb4/0x120
   RIP: 0010:   lock_acquire+0xb4/0x120
   RIP: 0010:   generic_exec_single+0xee/0x1b0
   RIP: 0010:   lock_acquire+0xb4/0x120
   RIP: 0010:   lock_acquire+0xb4/0x120
   RIP: 0010:   lock_acquire+0xb4/0x120
   RIP: 0010:   lock_acquire+0xb4/0x120
   RIP: 0010:   lock_acquire+0xb4/0x120
   RIP: 0010:   lock_acquire+0xb4/0x120
   RIP: 0010:   shmem_write_end+0x65/0xf0
   RIP: 0010:   _raw_spin_unlock_irqrestore+0x38/0x60
   RIP: 0010:   copy_user_enhanced_fast_string+0x5/0x10
   RIP: 0010:   copy_user_enhanced_fast_string+0x5/0x10
   RIP: 0010:   __slab_alloc+0x52f/0x58f
   RIP: 0010:   map_id_up+0x9/0x80
   RIP: 0010:   cpuidle_enter_state+0x79/0x190
   RIP: 0010:   unmap_single_vma+0x7d9/0x900
   RIP: 0010:   cpuidle_enter_state+0x79/0x190

Notice the pattern? All those early RIP cases are the page table
locks for copy_page_range(), plus that one TLB flush for zap_page_range().

So your printouts are finally starting to make sense. But I'm also
starting to suspect strongly that the problem is that with all your
lock debugging and other overheads (does this still have
DEBUG_PAGEALLOC?) you really are getting into a "real" softlockup
because things are scaling so horribly badly.

If you now disable spinlock debugging and lockdep, hopefully that page
table lock now doesn't always get hung up on the lockdep locking, so
it starts scaling much better, and maybe you'd not see this...

                    Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19  3:49                                                                                           ` Linus Torvalds
@ 2014-12-19  3:58                                                                                             ` Dave Jones
  2014-12-19  4:03                                                                                               ` Dave Jones
  2014-12-19 14:30                                                                                               ` Chris Mason
  0 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-19  3:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Thu, Dec 18, 2014 at 07:49:41PM -0800, Linus Torvalds wrote:

 > And when spinlocks start getting  contention, *nested* spinlocks
 > really really hurt. And you've got all the spinlock debugging on etc,
 > don't you?

Yeah, though remember this seems to have gotten worse for some reason
in more recent builds. I've been running kitchen-sink debug kernels
for my trinity runs for the last three years, and it's only in the
last few months that this has become enough of a problem that I'm
not seeing the more interesting bugs. (Or perhaps we're just getting
better at fixing them in -next now, so my runs are lasting longer.)

 > Also, you do have this:
 > 
 >   sched: RT throttling activated
 > 
 > so there's something going on with RT scheduling too.

I see that fairly often. I've never dug into exactly what causes it, but
it seems to be triggerable just by some long-running CPU hogs.

 > So your printouts are finally starting to make sense. But I'm also
 > starting to suspect strongly that the problem is that with all your
 > lock debugging and other overheads (does this still have
 > DEBUG_PAGEALLOC?) you really are getting into a "real" softlockup
 > because things are scaling so horribly badly.
 > 
 > If you now disable spinlock debugging and lockdep, hopefully that page
 > table lock now doesn't always get hung up on the lockdep locking, so
 > it starts scaling much better, and maybe you'd not see this...

I can give it a shot.  Hopefully there's some further mitigation that
could be done to allow a workload like this to survive under a debug
build though, as we've caught *so many* bugs with this stuff in the past.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19  3:58                                                                                             ` Dave Jones
@ 2014-12-19  4:03                                                                                               ` Dave Jones
  2014-12-19  4:48                                                                                                 ` Linus Torvalds
  2014-12-19 14:30                                                                                               ` Chris Mason
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-19  4:03 UTC (permalink / raw)
  To: Linus Torvalds, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Thu, Dec 18, 2014 at 10:58:59PM -0500, Dave Jones wrote:
 >  > lock debugging and other overheads (does this still have
 >  > DEBUG_PAGEALLOC?) you really are getting into a "real" softlockup
 >  > because things are scaling so horribly badly.
 >  > 
 >  > If you now disable spinlock debugging and lockdep, hopefully that page
 >  > table lock now doesn't always get hung up on the lockdep locking, so
 >  > it starts scaling much better, and maybe you'd not see this...
 > 
 > I can give it a shot.  Hopefully there's some further mitigation that
 > could be done to allow a workload like this to survive under a debug
 > build though, as we've caught *so many* bugs with this stuff in the past.
 
Turns out also that this build didn't have PROVE_LOCKING enabled.
CONFIG_LOCKDEP was, but that just bloats the structures a little, and
afaik doesn't incur the same runtime overhead.

I also forgot to answer the question above, PAGEALLOC is also off.

So the only thing that was on that could cause spinlock overhead
was DEBUG_SPINLOCK (and LOCK_STAT, though iirc that's not huge either)

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19  4:03                                                                                               ` Dave Jones
@ 2014-12-19  4:48                                                                                                 ` Linus Torvalds
  2014-12-19 11:35                                                                                                   ` Peter Zijlstra
  2014-12-19 14:55                                                                                                   ` Dave Jones
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-19  4:48 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

On Thu, Dec 18, 2014 at 8:03 PM, Dave Jones <davej@redhat.com> wrote:
>
> So the only thing that was on that could cause spinlock overhead
> was DEBUG_SPINLOCK (and LOCK_STAT, though iirc that's not huge either)

So DEBUG_SPINLOCK does have one big downside if I recall correctly -
the debugging spinlocks are very much not fair. So they don't work
like the real ticket spinlocks. That might have serious effects on the
contention case, with some thread not making any progress due to just
the implementation of the debug spinlocks.

Peter, Ingo, maybe I'm full of crap on the debug spinlock thing, but
a quick look tells me they are all built on top of the "trylock"
primitive, which does indeed not queue anything and is thus not fair.

I'm not sure why the debug spinlocks couldn't just be ticket locks
instead. But there you are - once more, the debug infrastructure is
actually much weaker and inferior to the "real" code.
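
(To make the unfairness concrete: the debug slowpath of that era was
essentially a trylock polling loop, roughly along these lines - a
simplified sketch of lib/spinlock_debug.c from memory, not the
verbatim code:)

        static void __spin_lock_debug(raw_spinlock_t *lock)
        {
                u64 i;
                u64 loops = loops_per_jiffy * HZ;

                for (i = 0; i < loops; i++) {
                        /*
                         * Pure polling: whichever CPU happens to call
                         * trylock at the right instant wins, so there is
                         * no FIFO ordering the way a ticket lock queues
                         * its waiters.
                         */
                        if (arch_spin_trylock(&lock->raw_lock))
                                return;
                        __delay(1);
                }
                /*
                 * Not acquired after roughly a second of polling:
                 * report a suspected lockup, then just spin for it.
                 */
                spin_dump(lock, "lockup suspected");
                arch_spin_lock(&lock->raw_lock);
        }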

                   Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19  4:48                                                                                                 ` Linus Torvalds
@ 2014-12-19 11:35                                                                                                   ` Peter Zijlstra
  2014-12-19 14:55                                                                                                   ` Dave Jones
  1 sibling, 0 replies; 486+ messages in thread
From: Peter Zijlstra @ 2014-12-19 11:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Thu, Dec 18, 2014 at 08:48:24PM -0800, Linus Torvalds wrote:
> On Thu, Dec 18, 2014 at 8:03 PM, Dave Jones <davej@redhat.com> wrote:
> >
> > So the only thing that was on that could cause spinlock overhead
> > was DEBUG_SPINLOCK (and LOCK_STAT, though iirc that's not huge either)
> 
> So DEBUG_SPINLOCK does have one big downside if I recall correctly -
> the debugging spinlocks are very much not fair. So they don't work
> like the real ticket spinlocks. That might have serious effects on the
> contention case, with some thread not making any progress due to just
> the implementation of the debug spinlocks.
> 
> Peter, Ingo, maybe I'm full of crap on the debug spinlock thing, but
> a quick look tells me they are all built on top of the "trylock"
> primitive, which does indeed not queue anything and is thus not fair.
> 
> I'm not sure why the debug spinlocks couldn't just be ticket locks
> instead. But there you are - once more, the debug infrastructure is
> actually much weaker and inferior to the "real" code.

Yeah, the DEBUG_SPINLOCK stuff is horrible. The trylock loops were
designed to 'detect' actual lockups, but this was all done before
lockdep.

I think one can make an argument to remove the trylock loops and fully
rely on lockdep to detect such issues, keeping only the integrity
checks, similar to the mutex debugging stuff.

There's a related issue with the trylock loops in that they rely on
delay(1), and DVFS-heavy (or virtualized) platforms often have 'dubious'
quality delay loops.
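
For reference, the loop in question has roughly this shape (a
from-memory sketch of the lib/spinlock_debug.c pattern, not a verbatim
copy):

static void __spin_lock_debug(raw_spinlock_t *lock)
{
	u64 i;
	u64 loops = loops_per_jiffy * HZ;	/* roughly one second of spinning */

	for (i = 0; i < loops; i++) {
		if (arch_spin_trylock(&lock->raw_lock))
			return;
		__delay(1);	/* calibrated delay; this is the bit that goes
				 * sideways on DVFS/virt platforms */
	}
	/* lockup suspected: report it, then fall back to the real lock */
	spin_dump(lock, "lockup suspected");
	arch_spin_lock(&lock->raw_lock);
}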

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19  3:58                                                                                             ` Dave Jones
  2014-12-19  4:03                                                                                               ` Dave Jones
@ 2014-12-19 14:30                                                                                               ` Chris Mason
  2014-12-19 15:12                                                                                                 ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Chris Mason @ 2014-12-19 14:30 UTC (permalink / raw)
  To: Dave Jones
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin



On Thu, Dec 18, 2014 at 10:58 PM, Dave Jones <davej@redhat.com> wrote:
> On Thu, Dec 18, 2014 at 07:49:41PM -0800, Linus Torvalds wrote:
> 
>  > And when spinlocks start getting  contention, *nested* spinlocks
>  > really really hurt. And you've got all the spinlock debugging on 
> etc,
>  > don't you?
> 
> Yeah, though remember this seems to have for some reason gotten worse
> in more recent builds. I've been running kitchen-sink debug kernels
> for my trinity runs for the last three years, and it's only this
> last few months that this has got to be enough of a problem that I'm
> not seeing the more interesting bugs. (Or perhaps we're just getting
> better at fixing them in -next now, so my runs are lasting longer..)

I think we're also adding more and more debugging options.  It's 
definitely a good thing, but I think a lot of them are expected to stay 
off until you're trying to track down a specific problem.  I do always 
run with CONFIG_DEBUG_PAGEALLOC and lock debugging/lockdep here, and 
aside from being slow I haven't hit trouble.

I know it's 3.16 instead of 3.17, but 16K stacks are probably 
increasing the pressure on everything in these runs.  It's my favorite 
kernel feature this year, but it's likely to make trinity hurt more on 
memory constrained boxes.

Your trace with hrtimer debugging yesterday made some sense, but it 
still should have been survivable.  I mean you should have kept seeing 
lockups from that one poor task being starved while trying to fill up 
its pool.  I know you have traces with a ton more output, but I'm still 
wondering if usb-serial and printk from NMI really get along well.  I'd 
try with debugging back on and serial consoles off.  We carry patches 
to make oom print less, just because the time spent on our slow 
emulated serial console is enough to back the box up into a death 
spiral.

The fairness of spinlock debugging is a really great point too, 
definitely worth trying with that off (and fixing, I love spinlock 
debugging).

-chris




^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19  4:48                                                                                                 ` Linus Torvalds
  2014-12-19 11:35                                                                                                   ` Peter Zijlstra
@ 2014-12-19 14:55                                                                                                   ` Dave Jones
  2014-12-19 15:14                                                                                                     ` Chris Mason
  2014-12-19 19:15                                                                                                     ` Linus Torvalds
  1 sibling, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-19 14:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Thu, Dec 18, 2014 at 08:48:24PM -0800, Linus Torvalds wrote:
 > On Thu, Dec 18, 2014 at 8:03 PM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > So the only thing that was on that could cause spinlock overhead
 > > was DEBUG_SPINLOCK (and LOCK_STAT, though iirc that's not huge either)
 > 
 > So DEBUG_SPINLOCK does have one big downside if I recall correctly -
 > the debugging spinlocks are very much not fair. So they don't work
 > like the real ticket spinlocks. That might have serious effects on the
 > contention case, with some thread not making any progress due to just
 > the implementation of the debug spinlocks.

With DEBUG_SPINLOCK disabled, I see the same behaviour.
Lots of traces spewed, but it seems to run and run (at least so far).

	Dave

[24053.092097] trinity-c206 (26510) used greatest stack depth: 8888 bytes left
[24998.017355] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c205:636]
[24998.017457] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[24998.019523] CPU: 1 PID: 636 Comm: trinity-c205 Not tainted 3.18.0+ #108
[24998.021412] task: ffff8800962fed60 ti: ffff880242678000 task.ti: ffff880242678000
[24998.022367] RIP: 0010:[<ffffffff810ee9ca>]  [<ffffffff810ee9ca>] generic_exec_single+0xea/0x1b0
[24998.023332] RSP: 0018:ffff88024267b9c8  EFLAGS: 00000202
[24998.024287] RAX: 0000000000000008 RBX: ffffffff81799e0d RCX: 0000000000000038
[24998.025245] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[24998.026194] RBP: ffff88024267ba28 R08: ffff8802444f43f0 R09: 0000000000000000
[24998.027143] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88024267b938
[24998.028144] R13: 0000000000406040 R14: ffff880242678000 R15: ffff8800962fed60
[24998.029095] FS:  00007ff08e78d740(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[24998.030050] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24998.031005] CR2: 0000000000000000 CR3: 0000000228031000 CR4: 00000000001407e0
[24998.031967] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[24998.032930] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[24998.033898] Stack:
[24998.034855]  ffffffff8124055d ffff8801b4eff080 ffffffff8124055d 0000000000000000
[24998.035843]  ffffffff81047bb0 ffff88024267bad8 0000000000000003 0000000077930768
[24998.036832]  ffff88024267ba48 00000000ffffffff 0000000000000000 ffffffff81047bb0
[24998.037855] Call Trace:
[24998.038837]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[24998.039826]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[24998.040804]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[24998.041779]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[24998.042748]  [<ffffffff810eeb30>] smp_call_function_single+0x70/0xd0
[24998.043705]  [<ffffffff81173f62>] ? release_pages+0x1c2/0x280
[24998.044647]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[24998.045585]  [<ffffffff810ef229>] smp_call_function_many+0x2b9/0x320
[24998.046528]  [<ffffffff81047ed0>] flush_tlb_mm_range+0x90/0x1b0
[24998.047510]  [<ffffffff81190d62>] tlb_flush_mmu_tlbonly+0x42/0x50
[24998.048456]  [<ffffffff811922c8>] unmap_single_vma+0x6d8/0x900
[24998.049409]  [<ffffffff8116d377>] ? __free_pages+0x37/0x50
[24998.050361]  [<ffffffff811925ec>] zap_page_range_single+0xfc/0x160
[24998.051309]  [<ffffffff811927cf>] unmap_mapping_range+0x12f/0x190
[24998.052260]  [<ffffffff8118193d>] shmem_fallocate+0x38d/0x4c0
[24998.053209]  [<ffffffff8178df51>] ? cmpxchg_double_slab.isra.49+0xd8/0x122
[24998.054156]  [<ffffffff811d38af>] do_fallocate+0x12f/0x1d0
[24998.055100]  [<ffffffff811a6995>] SyS_madvise+0x385/0x860
[24998.056044]  [<ffffffff81012047>] ? syscall_trace_enter_phase2+0xa7/0x1e0
[24998.056992]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[24998.057963] Code: 00 2b 01 00 48 89 de 48 03 14 c5 e0 bf cf 81 48 89 df e8 5a eb 26 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 
[24998.059993] sending NMI to other CPUs:
[24998.060963] NMI backtrace for cpu 0
[24998.061989] CPU: 0 PID: 2940 Comm: trinity-c150 Not tainted 3.18.0+ #108
[24998.064073] task: ffff8801bf3536b0 ti: ffff880197e0c000 task.ti: ffff880197e0c000
[24998.065137] RIP: 0010:[<ffffffff8103e006>]  [<ffffffff8103e006>] read_hpet+0x16/0x20
[24998.066226] RSP: 0018:ffff88024e203e38  EFLAGS: 00000046
[24998.067310] RAX: 0000000061fece8a RBX: 0000000000510792 RCX: 0000000000000000
[24998.068407] RDX: 0000000000000000 RSI: ffff88024e20c710 RDI: ffffffff81c26f40
[24998.069491] RBP: ffff88024e203e38 R08: 0000000000000000 R09: 000000000000000f
[24998.070569] R10: 0000000000000526 R11: 000000000000000f R12: 000016bf99600917
[24998.071642] R13: 0000000000000000 R14: ffff88024e20c700 R15: 000016eacc784674
[24998.072713] FS:  00007ff08e78d740(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[24998.073771] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24998.074813] CR2: 0000000000000001 CR3: 0000000229d22000 CR4: 00000000001407f0
[24998.075855] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[24998.076891] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[24998.077903] Stack:
[24998.078886]  ffff88024e203e68 ffffffff810e0d3e ffff88024e20ca80 ffff88024e20ca80
[24998.079872]  ffff880197e0fe38 ffff88024e20c6c0 ffff88024e203e98 ffffffff810e9cd3
[24998.080832]  ffff88024e203f28 ffff88024e20ca80 ffff88024e203f28 ffff88024e20c6c0
[24998.081769] Call Trace:
[24998.082680]  <IRQ> 

[24998.083577]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
[24998.084450]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
[24998.085315]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
[24998.086173]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
[24998.087025]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
[24998.087877]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
[24998.088732]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
[24998.089583]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
[24998.090435]  <EOI> 

[24998.091279]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
[24998.092118]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
[24998.092951]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
[24998.093779]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
[24998.094605]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[24998.095420] Code: 00 29 c7 ba 00 00 00 00 b8 c2 ff ff ff 83 ff 7f 5d 0f 4f c2 c3 0f 1f 44 00 00 55 48 8b 05 d3 c8 ec 00 48 89 e5 8b 80 f0 00 00 00 <89> c0 5d c3 66 0f 1f 44 00 00 0f 1f 44 00 00 8b 0d 29 c8 ec 00 
[24998.097220] NMI backtrace for cpu 3
[24998.098074] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0+ #108
[24998.099812] task: ffff8802444fe270 ti: ffff880244718000 task.ti: ffff880244718000
[24998.100708] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[24998.101617] RSP: 0018:ffff88024471be08  EFLAGS: 00000046
[24998.102529] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[24998.103448] RDX: 0000000000000000 RSI: ffff88024471bfd8 RDI: 0000000000000003
[24998.104364] RBP: ffff88024471be38 R08: 0000000000000010 R09: 000000000000265a
[24998.105283] R10: 0000000000007d5b R11: 00000000000003ff R12: 0000000000000005
[24998.106201] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880244718000
[24998.107115] FS:  0000000000000000(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[24998.108039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24998.108962] CR2: 00007fda8919e000 CR3: 0000000001c11000 CR4: 00000000001407e0
[24998.109894] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[24998.110820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[24998.111747] Stack:
[24998.112668]  000000034471be38 5fbbf5c12849ac78 ffffe8ffffcc1ba8 0000000000000005
[24998.113610]  ffffffff81c9e440 0000000000000003 ffff88024471be88 ffffffff8163a995
[24998.114553]  000016bfb0c185fc ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[24998.115501] Call Trace:
[24998.116440]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[24998.117389]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[24998.118309]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[24998.119208]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[24998.120106] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[24998.122074] NMI backtrace for cpu 2
[24998.122981] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.18.0+ #108
[24998.124771] task: ffff8802444f8af0 ti: ffff88024470c000 task.ti: ffff88024470c000
[24998.125679] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[24998.126593] RSP: 0018:ffff88024470fe08  EFLAGS: 00000046
[24998.127528] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[24998.128444] RDX: 0000000000000000 RSI: ffff88024470ffd8 RDI: 0000000000000002
[24998.129357] RBP: ffff88024470fe38 R08: ffff88024e28ecf0 R09: 00000000ffffffff
[24998.130267] R10: 000000000000265c R11: 00000000000003ff R12: 0000000000000005
[24998.131172] R13: 0000000000000032 R14: 0000000000000004 R15: ffff88024470c000
[24998.132071] FS:  0000000000000000(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[24998.132974] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24998.133876] CR2: 00007fda8919e000 CR3: 0000000001c11000 CR4: 00000000001407e0
[24998.134789] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[24998.135694] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[24998.136589] Stack:
[24998.137505]  000000024470fe38 c6aa8015cfd0a8b5 ffffe8ffffc81ba8 0000000000000005
[24998.138424]  ffffffff81c9e440 0000000000000002 ffff88024470fe88 ffffffff8163a995
[24998.139340]  000016bfb0c18159 ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[24998.140262] Call Trace:
[24998.141176]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[24998.142103]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[24998.143026]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[24998.143948]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[24998.144868] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25026.001132] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c205:636]
[25026.002121] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[25026.007365] CPU: 1 PID: 636 Comm: trinity-c205 Tainted: G             L 3.18.0+ #108
[25026.009524] task: ffff8800962fed60 ti: ffff880242678000 task.ti: ffff880242678000
[25026.010591] RIP: 0010:[<ffffffff810ee9ce>]  [<ffffffff810ee9ce>] generic_exec_single+0xee/0x1b0
[25026.011710] RSP: 0018:ffff88024267b9c8  EFLAGS: 00000202
[25026.012787] RAX: 0000000000000008 RBX: ffffffff81799e0d RCX: 0000000000000038
[25026.013854] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[25026.014902] RBP: ffff88024267ba28 R08: ffff8802444f43f0 R09: 0000000000000000
[25026.015941] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88024267b938
[25026.016982] R13: 0000000000406040 R14: ffff880242678000 R15: ffff8800962fed60
[25026.018019] FS:  00007ff08e78d740(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[25026.019065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25026.020113] CR2: 0000000000000000 CR3: 0000000228031000 CR4: 00000000001407e0
[25026.021205] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25026.022263] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25026.023310] Stack:
[25026.024349]  ffffffff8124055d ffff8801b4eff080 ffffffff8124055d 0000000000000000
[25026.025411]  ffffffff81047bb0 ffff88024267bad8 0000000000000003 0000000077930768
[25026.026472]  ffff88024267ba48 00000000ffffffff 0000000000000000 ffffffff81047bb0
[25026.027535] Call Trace:
[25026.028585]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25026.029640]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25026.030683]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25026.031747]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25026.032771]  [<ffffffff810eeb30>] smp_call_function_single+0x70/0xd0
[25026.033800]  [<ffffffff81173f62>] ? release_pages+0x1c2/0x280
[25026.034826]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25026.035854]  [<ffffffff810ef229>] smp_call_function_many+0x2b9/0x320
[25026.036883]  [<ffffffff81047ed0>] flush_tlb_mm_range+0x90/0x1b0
[25026.037896]  [<ffffffff81190d62>] tlb_flush_mmu_tlbonly+0x42/0x50
[25026.038886]  [<ffffffff811922c8>] unmap_single_vma+0x6d8/0x900
[25026.039875]  [<ffffffff8116d377>] ? __free_pages+0x37/0x50
[25026.040865]  [<ffffffff811925ec>] zap_page_range_single+0xfc/0x160
[25026.041867]  [<ffffffff811927cf>] unmap_mapping_range+0x12f/0x190
[25026.042818]  [<ffffffff8118193d>] shmem_fallocate+0x38d/0x4c0
[25026.043763]  [<ffffffff8178df51>] ? cmpxchg_double_slab.isra.49+0xd8/0x122
[25026.044707]  [<ffffffff811d38af>] do_fallocate+0x12f/0x1d0
[25026.045647]  [<ffffffff811a6995>] SyS_madvise+0x385/0x860
[25026.046587]  [<ffffffff81012047>] ? syscall_trace_enter_phase2+0xa7/0x1e0
[25026.047529]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25026.048469] Code: 48 89 de 48 03 14 c5 e0 bf cf 81 48 89 df e8 5a eb 26 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 0f 85 98 00 
[25026.050509] sending NMI to other CPUs:
[25026.051516] NMI backtrace for cpu 0
[25026.052541] CPU: 0 PID: 2940 Comm: trinity-c150 Tainted: G             L 3.18.0+ #108
[25026.054635] task: ffff8801bf3536b0 ti: ffff880197e0c000 task.ti: ffff880197e0c000
[25026.055696] RIP: 0010:[<ffffffff8103e006>]  [<ffffffff8103e006>] read_hpet+0x16/0x20
[25026.056761] RSP: 0018:ffff88024e203e38  EFLAGS: 00000046
[25026.057822] RAX: 0000000079e588fc RBX: 0000000000511d6e RCX: 0000000000000000
[25026.058886] RDX: 0000000000000000 RSI: ffff88024e20c710 RDI: ffffffff81c26f40
[25026.059947] RBP: ffff88024e203e38 R08: 0000000000000000 R09: 000000000000000f
[25026.061009] R10: 0000000000000526 R11: 000000000000000f R12: 000016c61e4e2117
[25026.062073] R13: 0000000000000000 R14: ffff88024e20c700 R15: 000016eacc784674
[25026.063139] FS:  00007ff08e78d740(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[25026.064215] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25026.065271] CR2: 0000000000000001 CR3: 0000000229d22000 CR4: 00000000001407f0
[25026.066314] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25026.067349] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25026.068359] Stack:
[25026.069339]  ffff88024e203e68 ffffffff810e0d3e ffff88024e20ca80 ffff88024e20ca80
[25026.070326]  ffff880197e0fe38 ffff88024e20c6c0 ffff88024e203e98 ffffffff810e9cd3
[25026.071289]  ffff88024e203f28 ffff88024e20ca80 ffff88024e203f28 ffff88024e20c6c0
[25026.072225] Call Trace:
[25026.073133]  <IRQ> 

[25026.074028]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
[25026.074902]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
[25026.075767]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
[25026.076622]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
[25026.077476]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
[25026.078328]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
[25026.079183]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
[25026.080035]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
[25026.080885]  <EOI> 

[25026.081728]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
[25026.082565]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
[25026.083397]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
[25026.084226]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
[25026.085050]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25026.085867] Code: 00 29 c7 ba 00 00 00 00 b8 c2 ff ff ff 83 ff 7f 5d 0f 4f c2 c3 0f 1f 44 00 00 55 48 8b 05 d3 c8 ec 00 48 89 e5 8b 80 f0 00 00 00 <89> c0 5d c3 66 0f 1f 44 00 00 0f 1f 44 00 00 8b 0d 29 c8 ec 00 
[25026.087667] NMI backtrace for cpu 3
[25026.088525] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #108
[25026.090268] task: ffff8802444fe270 ti: ffff880244718000 task.ti: ffff880244718000
[25026.091168] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25026.092081] RSP: 0018:ffff88024471be08  EFLAGS: 00000046
[25026.092994] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25026.093915] RDX: 0000000000000000 RSI: ffff88024471bfd8 RDI: 0000000000000003
[25026.094834] RBP: ffff88024471be38 R08: ffff88024e2ced08 R09: 00000000ffffffff
[25026.095754] R10: 000000000000265a R11: 00000000000003ff R12: 0000000000000005
[25026.096677] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880244718000
[25026.097598] FS:  0000000000000000(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[25026.098526] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25026.099454] CR2: 00007fffcba6fea0 CR3: 0000000001c11000 CR4: 00000000001407e0
[25026.100392] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25026.101324] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25026.102257] Stack:
[25026.103183]  000000034471be38 5fbbf5c12849ac78 ffffe8ffffcc1ba8 0000000000000005
[25026.104133]  ffffffff81c9e440 0000000000000003 ffff88024471be88 ffffffff8163a995
[25026.105083]  000016c63648391d ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25026.106036] Call Trace:
[25026.106984]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25026.107936]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25026.108865]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25026.109771]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25026.110674] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25026.112654] NMI backtrace for cpu 2
[25026.113565] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.18.0+ #108
[25026.115375] task: ffff8802444f8af0 ti: ffff88024470c000 task.ti: ffff88024470c000
[25026.116284] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25026.117207] RSP: 0018:ffff88024470fe08  EFLAGS: 00000046
[25026.118123] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25026.119041] RDX: 0000000000000000 RSI: ffff88024470ffd8 RDI: 0000000000000002
[25026.119958] RBP: ffff88024470fe38 R08: ffff88024e28ed08 R09: 00000000ffffffff
[25026.120875] R10: 0000000000002659 R11: 00000000000003ff R12: 0000000000000005
[25026.121814] R13: 0000000000000032 R14: 0000000000000004 R15: ffff88024470c000
[25026.122719] FS:  0000000000000000(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[25026.123625] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25026.124533] CR2: 00007fda8919e000 CR3: 0000000001c11000 CR4: 00000000001407e0
[25026.125450] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25026.126361] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25026.127262] Stack:
[25026.128157]  000000024470fe38 c6aa8015cfd0a8b5 ffffe8ffffc81ba8 0000000000000005
[25026.129079]  ffffffff81c9e440 0000000000000002 ffff88024470fe88 ffffffff8163a995
[25026.129996]  000016c636483734 ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25026.130920] Call Trace:
[25026.131864]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25026.132794]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25026.133725]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25026.134652]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25026.135579] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25037.184648] INFO: rcu_sched detected stalls on CPUs/tasks:
[25037.185608] 	(detected by 0, t=6002 jiffies, g=777861, c=777860, q=0)
[25037.186536] INFO: Stall ended before state dump start
[25053.984908] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c205:636]
[25053.985884] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[25053.991028] CPU: 1 PID: 636 Comm: trinity-c205 Tainted: G             L 3.18.0+ #108
[25053.993097] task: ffff8800962fed60 ti: ffff880242678000 task.ti: ffff880242678000
[25053.994131] RIP: 0010:[<ffffffff810ee9ca>]  [<ffffffff810ee9ca>] generic_exec_single+0xea/0x1b0
[25053.995202] RSP: 0018:ffff88024267b9c8  EFLAGS: 00000202
[25053.996223] RAX: 0000000000000008 RBX: ffffffff81799e0d RCX: 0000000000000038
[25053.997247] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[25053.998270] RBP: ffff88024267ba28 R08: ffff8802444f43f0 R09: 0000000000000000
[25053.999294] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88024267b938
[25054.000323] R13: 0000000000406040 R14: ffff880242678000 R15: ffff8800962fed60
[25054.001349] FS:  00007ff08e78d740(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[25054.002384] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25054.003420] CR2: 0000000000000000 CR3: 0000000228031000 CR4: 00000000001407e0
[25054.004455] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25054.005522] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25054.006552] Stack:
[25054.007575]  ffffffff8124055d ffff8801b4eff080 ffffffff8124055d 0000000000000000
[25054.008619]  ffffffff81047bb0 ffff88024267bad8 0000000000000003 0000000077930768
[25054.009658]  ffff88024267ba48 00000000ffffffff 0000000000000000 ffffffff81047bb0
[25054.010692] Call Trace:
[25054.011716]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25054.012743]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25054.013763]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25054.014779]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25054.015830]  [<ffffffff810eeb30>] smp_call_function_single+0x70/0xd0
[25054.016846]  [<ffffffff81173f62>] ? release_pages+0x1c2/0x280
[25054.017846]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25054.018823]  [<ffffffff810ef229>] smp_call_function_many+0x2b9/0x320
[25054.019798]  [<ffffffff81047ed0>] flush_tlb_mm_range+0x90/0x1b0
[25054.020774]  [<ffffffff81190d62>] tlb_flush_mmu_tlbonly+0x42/0x50
[25054.021732]  [<ffffffff811922c8>] unmap_single_vma+0x6d8/0x900
[25054.022671]  [<ffffffff8116d377>] ? __free_pages+0x37/0x50
[25054.023604]  [<ffffffff811925ec>] zap_page_range_single+0xfc/0x160
[25054.024534]  [<ffffffff811927cf>] unmap_mapping_range+0x12f/0x190
[25054.025495]  [<ffffffff8118193d>] shmem_fallocate+0x38d/0x4c0
[25054.026421]  [<ffffffff8178df51>] ? cmpxchg_double_slab.isra.49+0xd8/0x122
[25054.027353]  [<ffffffff811d38af>] do_fallocate+0x12f/0x1d0
[25054.028281]  [<ffffffff811a6995>] SyS_madvise+0x385/0x860
[25054.029206]  [<ffffffff81012047>] ? syscall_trace_enter_phase2+0xa7/0x1e0
[25054.030136]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25054.031054] Code: 00 2b 01 00 48 89 de 48 03 14 c5 e0 bf cf 81 48 89 df e8 5a eb 26 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 
[25054.033040] sending NMI to other CPUs:
[25054.033992] NMI backtrace for cpu 0
[25054.035027] CPU: 0 PID: 2940 Comm: trinity-c150 Tainted: G             L 3.18.0+ #108
[25054.037117] task: ffff8801bf3536b0 ti: ffff880197e0c000 task.ti: ffff880197e0c000
[25054.038172] RIP: 0010:[<ffffffff8103e006>]  [<ffffffff8103e006>] read_hpet+0x16/0x20
[25054.039238] RSP: 0018:ffff88024e203e38  EFLAGS: 00000046
[25054.040297] RAX: 0000000091ca7f65 RBX: 0000000000513346 RCX: 0000000000000000
[25054.041365] RDX: 0000000000000000 RSI: ffff88024e20c710 RDI: ffffffff81c26f40
[25054.042436] RBP: ffff88024e203e38 R08: 0000000000000000 R09: 000000000000000f
[25054.043514] R10: 0000000000000526 R11: 0000000000aaaaaa R12: 000016cca33c3917
[25054.044572] R13: 0000000000000000 R14: ffff88024e20c700 R15: 000016eacc784674
[25054.045606] FS:  00007ff08e78d740(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[25054.046645] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25054.047685] CR2: 0000000000000001 CR3: 0000000229d22000 CR4: 00000000001407f0
[25054.048728] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25054.049763] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25054.050774] Stack:
[25054.051757]  ffff88024e203e68 ffffffff810e0d3e ffff88024e20ca80 ffff88024e20ca80
[25054.052744]  ffff880197e0fe38 ffff88024e20c6c0 ffff88024e203e98 ffffffff810e9cd3
[25054.053709]  ffff88024e203f28 ffff88024e20ca80 ffff88024e203f28 ffff88024e20c6c0
[25054.054649] Call Trace:
[25054.055559]  <IRQ> 

[25054.056455]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
[25054.057326]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
[25054.058194]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
[25054.059050]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
[25054.059904]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
[25054.060759]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
[25054.061614]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
[25054.062466]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
[25054.063317]  <EOI> 

[25054.064159]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
[25054.064994]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
[25054.065826]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
[25054.066656]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
[25054.067480]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25054.068297] Code: 00 29 c7 ba 00 00 00 00 b8 c2 ff ff ff 83 ff 7f 5d 0f 4f c2 c3 0f 1f 44 00 00 55 48 8b 05 d3 c8 ec 00 48 89 e5 8b 80 f0 00 00 00 <89> c0 5d c3 66 0f 1f 44 00 00 0f 1f 44 00 00 8b 0d 29 c8 ec 00 
[25054.070095] NMI backtrace for cpu 2
[25054.070949] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.18.0+ #108
[25054.072692] task: ffff8802444f8af0 ti: ffff88024470c000 task.ti: ffff88024470c000
[25054.073589] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25054.074498] RSP: 0018:ffff88024470fe08  EFLAGS: 00000046
[25054.075411] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25054.076329] RDX: 0000000000000000 RSI: ffff88024470ffd8 RDI: 0000000000000002
[25054.077248] RBP: ffff88024470fe38 R08: ffff88024e28ed08 R09: 00000000ffffffff
[25054.078167] R10: 000000000000265b R11: 00000000000003ff R12: 0000000000000005
[25054.079087] R13: 0000000000000032 R14: 0000000000000004 R15: ffff88024470c000
[25054.080007] FS:  0000000000000000(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[25054.080934] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25054.081856] CR2: 00007fda8919e000 CR3: 0000000001c11000 CR4: 00000000001407e0
[25054.082790] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25054.083721] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25054.084651] Stack:
[25054.085578]  000000024470fe38 c6aa8015cfd0a8b5 ffffe8ffffc81ba8 0000000000000005
[25054.086525]  ffffffff81c9e440 0000000000000002 ffff88024470fe88 ffffffff8163a995
[25054.087477]  000016ccba9dadfd ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25054.088429] Call Trace:
[25054.089371]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25054.090324]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25054.091251]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25054.092158]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25054.093059] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25054.095041] NMI backtrace for cpu 3
[25054.095952] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #108
[25054.097763] task: ffff8802444fe270 ti: ffff880244718000 task.ti: ffff880244718000
[25054.098676] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25054.099598] RSP: 0018:ffff88024471be08  EFLAGS: 00000046
[25054.100517] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25054.101441] RDX: 0000000000000000 RSI: ffff88024471bfd8 RDI: 0000000000000003
[25054.102360] RBP: ffff88024471be38 R08: 0000000000000018 R09: 0000000000002655
[25054.103278] R10: 000000000000be78 R11: 00000000000003ff R12: 0000000000000005
[25054.104189] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880244718000
[25054.105118] FS:  0000000000000000(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[25054.106026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25054.106933] CR2: 00007fda8919e000 CR3: 0000000001c11000 CR4: 00000000001407e0
[25054.107847] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25054.108756] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25054.109656] Stack:
[25054.110546]  000000034471be38 5fbbf5c12849ac78 ffffe8ffffcc1ba8 0000000000000005
[25054.111466]  ffffffff81c9e440 0000000000000003 ffff88024471be88 ffffffff8163a995
[25054.112383]  000016ccba9dc98b ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25054.113305] Call Trace:
[25054.114222]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25054.115173]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25054.116098]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25054.117022]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25054.117945] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25081.968684] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c205:636]
[25081.969644] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[25081.974748] CPU: 1 PID: 636 Comm: trinity-c205 Tainted: G             L 3.18.0+ #108
[25081.976852] task: ffff8800962fed60 ti: ffff880242678000 task.ti: ffff880242678000
[25081.977892] RIP: 0010:[<ffffffff810ee9ca>]  [<ffffffff810ee9ca>] generic_exec_single+0xea/0x1b0
[25081.978981] RSP: 0018:ffff88024267b9c8  EFLAGS: 00000202
[25081.980030] RAX: 0000000000000008 RBX: ffffffff81799e0d RCX: 0000000000000038
[25081.981068] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[25081.982090] RBP: ffff88024267ba28 R08: ffff8802444f43f0 R09: 0000000000000000
[25081.983107] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88024267b938
[25081.984121] R13: 0000000000406040 R14: ffff880242678000 R15: ffff8800962fed60
[25081.985132] FS:  00007ff08e78d740(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[25081.986155] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25081.987179] CR2: 0000000000000000 CR3: 0000000228031000 CR4: 00000000001407e0
[25081.988208] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25081.989271] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25081.990294] Stack:
[25081.991306]  ffffffff8124055d ffff8801b4eff080 ffffffff8124055d 0000000000000000
[25081.992338]  ffffffff81047bb0 ffff88024267bad8 0000000000000003 0000000077930768
[25081.993369]  ffff88024267ba48 00000000ffffffff 0000000000000000 ffffffff81047bb0
[25081.994406] Call Trace:
[25081.995430]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25081.996456]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25081.997470]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25081.998481]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25081.999512]  [<ffffffff810eeb30>] smp_call_function_single+0x70/0xd0
[25082.000509]  [<ffffffff81173f62>] ? release_pages+0x1c2/0x280
[25082.001510]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25082.002510]  [<ffffffff810ef229>] smp_call_function_many+0x2b9/0x320
[25082.003514]  [<ffffffff81047ed0>] flush_tlb_mm_range+0x90/0x1b0
[25082.004498]  [<ffffffff81190d62>] tlb_flush_mmu_tlbonly+0x42/0x50
[25082.005463]  [<ffffffff811922c8>] unmap_single_vma+0x6d8/0x900
[25082.006425]  [<ffffffff8116d377>] ? __free_pages+0x37/0x50
[25082.007386]  [<ffffffff811925ec>] zap_page_range_single+0xfc/0x160
[25082.008335]  [<ffffffff811927cf>] unmap_mapping_range+0x12f/0x190
[25082.009300]  [<ffffffff8118193d>] shmem_fallocate+0x38d/0x4c0
[25082.010222]  [<ffffffff8178df51>] ? cmpxchg_double_slab.isra.49+0xd8/0x122
[25082.011144]  [<ffffffff811d38af>] do_fallocate+0x12f/0x1d0
[25082.012057]  [<ffffffff811a6995>] SyS_madvise+0x385/0x860
[25082.012971]  [<ffffffff81012047>] ? syscall_trace_enter_phase2+0xa7/0x1e0
[25082.013889]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25082.014807] Code: 00 2b 01 00 48 89 de 48 03 14 c5 e0 bf cf 81 48 89 df e8 5a eb 26 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 
[25082.016796] sending NMI to other CPUs:
[25082.017742] NMI backtrace for cpu 0
[25082.018768] CPU: 0 PID: 2940 Comm: trinity-c150 Tainted: G             L 3.18.0+ #108
[25082.020865] task: ffff8801bf3536b0 ti: ffff880197e0c000 task.ti: ffff880197e0c000
[25082.021929] RIP: 0010:[<ffffffff8103e006>]  [<ffffffff8103e006>] read_hpet+0x16/0x20
[25082.022996] RSP: 0018:ffff88024e203e38  EFLAGS: 00000046
[25082.024057] RAX: 00000000a9afbd0d RBX: 000000000051491e RCX: 0000000000000000
[25082.025125] RDX: 0000000000000000 RSI: ffff88024e20c710 RDI: ffffffff81c26f40
[25082.026185] RBP: ffff88024e203e38 R08: 0000000000000000 R09: 000000000000000f
[25082.027248] R10: 0000000000000526 R11: 0000000000aaaaaa R12: 000016d3282a5117
[25082.028317] R13: 0000000000000000 R14: ffff88024e20c700 R15: 000016eacc784674
[25082.029383] FS:  00007ff08e78d740(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[25082.030459] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25082.031518] CR2: 0000000000000001 CR3: 0000000229d22000 CR4: 00000000001407f0
[25082.032563] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25082.033598] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25082.034612] Stack:
[25082.035595]  ffff88024e203e68 ffffffff810e0d3e ffff88024e20ca80 ffff88024e20ca80
[25082.036582]  ffff880197e0fe38 ffff88024e20c6c0 ffff88024e203e98 ffffffff810e9cd3
[25082.037546]  ffff88024e203f28 ffff88024e20ca80 ffff88024e203f28 ffff88024e20c6c0
[25082.038485] Call Trace:
[25082.039397]  <IRQ> 

[25082.040290]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
[25082.041164]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
[25082.042032]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
[25082.042890]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
[25082.043748]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
[25082.044603]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
[25082.045460]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
[25082.046317]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
[25082.047170]  <EOI> 

[25082.048018]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
[25082.048856]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
[25082.049691]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
[25082.050520]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
[25082.051347]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25082.052166] Code: 00 29 c7 ba 00 00 00 00 b8 c2 ff ff ff 83 ff 7f 5d 0f 4f c2 c3 0f 1f 44 00 00 55 48 8b 05 d3 c8 ec 00 48 89 e5 8b 80 f0 00 00 00 <89> c0 5d c3 66 0f 1f 44 00 00 0f 1f 44 00 00 8b 0d 29 c8 ec 00 
[25082.053966] NMI backtrace for cpu 3
[25082.054824] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #108
[25082.056568] task: ffff8802444fe270 ti: ffff880244718000 task.ti: ffff880244718000
[25082.057464] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25082.058376] RSP: 0018:ffff88024471be08  EFLAGS: 00000046
[25082.059292] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25082.060214] RDX: 0000000000000000 RSI: ffff88024471bfd8 RDI: 0000000000000003
[25082.061132] RBP: ffff88024471be38 R08: ffff88024e2ced08 R09: 00000000ffffffff
[25082.062054] R10: 0000000000002655 R11: 00000000000003ff R12: 0000000000000005
[25082.062977] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880244718000
[25082.063899] FS:  0000000000000000(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[25082.064825] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25082.065751] CR2: 00007fda8919e000 CR3: 0000000001c11000 CR4: 00000000001407e0
[25082.066685] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25082.067618] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25082.068552] Stack:
[25082.069477]  000000034471be38 5fbbf5c12849ac78 ffffe8ffffcc1ba8 0000000000000005
[25082.070426]  ffffffff81c9e440 0000000000000003 ffff88024471be88 ffffffff8163a995
[25082.071378]  000016d33f8be858 ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25082.072334] Call Trace:
[25082.073279]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25082.074233]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25082.075161]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25082.076066]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25082.076968] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25082.078949] NMI backtrace for cpu 2
[25082.079865] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.18.0+ #108
[25082.081672] task: ffff8802444f8af0 ti: ffff88024470c000 task.ti: ffff88024470c000
[25082.082586] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25082.083507] RSP: 0018:ffff88024470fe08  EFLAGS: 00000046
[25082.084424] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25082.085345] RDX: 0000000000000000 RSI: ffff88024470ffd8 RDI: 0000000000000002
[25082.086261] RBP: ffff88024470fe38 R08: ffff88024e28ed08 R09: 00000000ffffffff
[25082.087178] R10: 000000000000265b R11: 00000000000003ff R12: 0000000000000005
[25082.088088] R13: 0000000000000032 R14: 0000000000000004 R15: ffff88024470c000
[25082.089016] FS:  0000000000000000(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[25082.089919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25082.090826] CR2: 00007f274c4061c0 CR3: 0000000001c11000 CR4: 00000000001407e0
[25082.091740] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25082.092652] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25082.093556] Stack:
[25082.094451]  000000024470fe38 c6aa8015cfd0a8b5 ffffe8ffffc81ba8 0000000000000005
[25082.095371]  ffffffff81c9e440 0000000000000002 ffff88024470fe88 ffffffff8163a995
[25082.096293]  000016d33f8bcc84 ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25082.097220] Call Trace:
[25082.098141]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25082.099095]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25082.100026]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25082.100956]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25082.101882] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25097.149884] INFO: rcu_sched self-detected stall on CPU
[25097.149885] INFO: rcu_sched self-detected stall on CPU

[25097.149886] 	1: (5973 ticks this GP) idle=6b3/140000000000001/0 softirq=1672732/1672732 
[25097.149897] 	
[25097.149898]  (t=6000 jiffies g=777862 c=777861 q=0)
[25097.149899] Task dump for CPU 0:
[25097.149899] trinity-c150    R
[25097.149900]   running task    
[25097.149908] 13312  2940  29562 0x10000008
[25097.149909]  ffff880197e0c000
[25097.149909]  000000002e7fd874
[25097.149909]  ffff880197e0fb88
[25097.149910]  ffffffff810f1c86

[25097.149910]  ffff8801bf3536b0
[25097.149910]  00007fffa4bfd708
[25097.149911]  ffff880197e0fb98
[25097.149911]  000000002e7fd874

[25097.149911]  ffff880197e0fbb8
[25097.149911]  ffffffff810f1c86
[25097.149912]  ffff880197e0fc08
[25097.149912]  00007fffa4bfd708

[25097.149912] Call Trace:

[25097.149915]  [<ffffffff810f1c86>] ? __module_text_address+0x16/0x80

[25097.149917]  [<ffffffff810f1c86>] ? __module_text_address+0x16/0x80

[25097.149919]  [<ffffffff810f63f6>] ? is_module_text_address+0x16/0x30

[25097.149921]  [<ffffffff810942f8>] ? __kernel_text_address+0x58/0x80

[25097.149923]  [<ffffffff81006a2f>] ? print_context_stack+0x8f/0x100

[25097.149925]  [<ffffffff8100556f>] ? dump_trace+0x16f/0x350

[25097.149926]  [<ffffffff811e2966>] ? link_path_walk+0x266/0x850

[25097.149927]  [<ffffffff811e5dff>] ? getname_flags+0x4f/0x1a0

[25097.149929]  [<ffffffff811e5d86>] ? final_putname+0x26/0x50

[25097.149930]  [<ffffffff810133af>] ? save_stack_trace+0x2f/0x50

[25097.149941]  [<ffffffff811bc6e0>] ? set_track+0x70/0x140

[25097.149943]  [<ffffffff8178e291>] ? free_debug_processing+0x1de/0x232

[25097.149944]  [<ffffffff8178e407>] ? __slab_free+0x122/0x2e9

[25097.149946]  [<ffffffff81352ef4>] ? timerqueue_del+0x24/0x70

[25097.149948]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0

[25097.149950]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0

[25097.149951]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0

[25097.149953]  [<ffffffff812c8d6d>] ? SyS_msgget+0x4d/0x70

[25097.149955]  [<ffffffff817994c4>] ? tracesys_phase2+0xd4/0xd9
[25097.149955] Task dump for CPU 1:
[25097.149956] trinity-c205    R
[25097.149956]   running task    
[25097.149958] 13312   636  29562 0x10000008
[25097.149959]  ffff8800962fed60
[25097.149959]  0000000077930768
[25097.149959]  ffff88024e243d78
[25097.149959]  ffffffff810a3932

[25097.149960]  0000000000000001
[25097.149960]  ffffffff81c4bf80
[25097.149960]  ffff88024e243d98
[25097.149960]  ffffffff810a76dd

[25097.149961]  ffff88024e24c440
[25097.149961]  0000000000000002
[25097.149961]  ffff88024e243dc8
[25097.149961]  ffffffff810ce920

[25097.149962] Call Trace:
[25097.149962]  <IRQ> 

[25097.149964]  [<ffffffff810a3932>] sched_show_task+0xd2/0x140

[25097.149965]  [<ffffffff810a76dd>] dump_cpu_task+0x3d/0x50

[25097.149969]  [<ffffffff810ce920>] rcu_dump_cpu_stacks+0x90/0xd0

[25097.149970]  [<ffffffff810d4c33>] rcu_check_callbacks+0x4c3/0x730

[25097.149973]  [<ffffffff81120b4c>] ? acct_account_cputime+0x1c/0x20

[25097.149974]  [<ffffffff810a800e>] ? account_system_time+0x8e/0x190

[25097.149976]  [<ffffffff810da34b>] update_process_times+0x4b/0x80

[25097.149977]  [<ffffffff810e9cff>] tick_sched_timer+0x4f/0x160

[25097.149978]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0

[25097.149979]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20

[25097.149980]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260

[25097.149982]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70

[25097.149983]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60

[25097.149985]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
[25097.149985]  <EOI> 

[25097.149987]  [<ffffffff81799e0d>] ? retint_restore_args+0xe/0xe

[25097.149989]  [<ffffffff810ee9ce>] ? generic_exec_single+0xee/0x1b0

[25097.149990]  [<ffffffff810eea09>] ? generic_exec_single+0x129/0x1b0

[25097.149991]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0

[25097.149992]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0

[25097.149993]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60

[25097.149994]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60

[25097.149995]  [<ffffffff810eeb30>] smp_call_function_single+0x70/0xd0

[25097.149998]  [<ffffffff81173f62>] ? release_pages+0x1c2/0x280

[25097.149999]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60

[25097.150000]  [<ffffffff810ef229>] smp_call_function_many+0x2b9/0x320

[25097.150001]  [<ffffffff81047ed0>] flush_tlb_mm_range+0x90/0x1b0

[25097.150003]  [<ffffffff81190d62>] tlb_flush_mmu_tlbonly+0x42/0x50

[25097.150004]  [<ffffffff811922c8>] unmap_single_vma+0x6d8/0x900

[25097.150006]  [<ffffffff8116d377>] ? __free_pages+0x37/0x50

[25097.150007]  [<ffffffff811925ec>] zap_page_range_single+0xfc/0x160

[25097.150009]  [<ffffffff811927cf>] unmap_mapping_range+0x12f/0x190

[25097.150011]  [<ffffffff8118193d>] shmem_fallocate+0x38d/0x4c0

[25097.150012]  [<ffffffff8178df51>] ? cmpxchg_double_slab.isra.49+0xd8/0x122

[25097.150013]  [<ffffffff811d38af>] do_fallocate+0x12f/0x1d0

[25097.150015]  [<ffffffff811a6995>] SyS_madvise+0x385/0x860

[25097.150016]  [<ffffffff81012047>] ? syscall_trace_enter_phase2+0xa7/0x1e0

[25097.150018]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9

[25097.206563] 	0: (1 GPs behind) idle=845/140000000000001/0 softirq=1616166/1616167 
[25097.207170] 	 (t=6000 jiffies g=777862 c=777861 q=0)
[25097.207778] Task dump for CPU 0:
[25097.208382] trinity-c150    R  running task    13312  2940  29562 0x10000008
[25097.209002]  ffff8801bf3536b0 000000002e7fd874 ffff88024e203d78 ffffffff810a3932
[25097.209635]  0000000000000000 ffffffff81c4bf80 ffff88024e203d98 ffffffff810a76dd
[25097.210304]  ffff88024e20c440 0000000000000001 ffff88024e203dc8 ffffffff810ce920
[25097.210965] Call Trace:
[25097.211614]  <IRQ>  [<ffffffff810a3932>] sched_show_task+0xd2/0x140
[25097.212284]  [<ffffffff810a76dd>] dump_cpu_task+0x3d/0x50
[25097.212955]  [<ffffffff810ce920>] rcu_dump_cpu_stacks+0x90/0xd0
[25097.213628]  [<ffffffff810d4c33>] rcu_check_callbacks+0x4c3/0x730
[25097.214300]  [<ffffffff81120b4c>] ? acct_account_cputime+0x1c/0x20
[25097.214973]  [<ffffffff810a800e>] ? account_system_time+0x8e/0x190
[25097.215645]  [<ffffffff810da34b>] update_process_times+0x4b/0x80
[25097.216317]  [<ffffffff810e9cff>] tick_sched_timer+0x4f/0x160
[25097.216996]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
[25097.217676]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
[25097.218359]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
[25097.219040]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
[25097.219719]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
[25097.220415]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
[25097.221091]  <EOI>  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
[25097.221778]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
[25097.222454]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
[25097.223125]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
[25097.223793]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25097.224461] Task dump for CPU 1:
[25097.225124] trinity-c205    R  running task    13312   636  29562 0x10000008
[25097.225806]  00000000458261c0 ffff880066bae1b8 ffff880066bae1f0 ffff88021cbef200
[25097.226496]  ffff88024267bd20 0000000000000001 ffff88024267bd78 ffffffff811927cf
[25097.227190]  0000000000000000 0000000000000000 ffff880066bae1b8 0000000000000000
[25097.227889] Call Trace:
[25097.228578]  [<ffffffff811927cf>] ? unmap_mapping_range+0x12f/0x190
[25097.229279]  [<ffffffff8118193d>] ? shmem_fallocate+0x38d/0x4c0
[25097.230004]  [<ffffffff8178df51>] ? cmpxchg_double_slab.isra.49+0xd8/0x122
[25097.230708]  [<ffffffff811d38af>] ? do_fallocate+0x12f/0x1d0
[25097.231409]  [<ffffffff811a6995>] ? SyS_madvise+0x385/0x860
[25097.232106]  [<ffffffff81012047>] ? syscall_trace_enter_phase2+0xa7/0x1e0
[25097.232810]  [<ffffffff817994c4>] ? tracesys_phase2+0xd4/0xd9
[25121.945509] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c205:636]
[25121.946236] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[25121.950189] CPU: 1 PID: 636 Comm: trinity-c205 Tainted: G             L 3.18.0+ #108
[25121.951854] task: ffff8800962fed60 ti: ffff880242678000 task.ti: ffff880242678000
[25121.952705] RIP: 0010:[<ffffffff810ee9ce>]  [<ffffffff810ee9ce>] generic_exec_single+0xee/0x1b0
[25121.953566] RSP: 0018:ffff88024267b9c8  EFLAGS: 00000202
[25121.954428] RAX: 0000000000000008 RBX: ffffffff81799e0d RCX: 0000000000000038
[25121.955296] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[25121.956199] RBP: ffff88024267ba28 R08: ffff8802444f43f0 R09: 0000000000000000
[25121.957065] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88024267b938
[25121.957937] R13: 0000000000406040 R14: ffff880242678000 R15: ffff8800962fed60
[25121.958811] FS:  00007ff08e78d740(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[25121.959692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25121.960571] CR2: 0000000000000000 CR3: 0000000228031000 CR4: 00000000001407e0
[25121.961462] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25121.962346] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25121.963231] Stack:
[25121.964105]  ffffffff8124055d ffff8801b4eff080 ffffffff8124055d 0000000000000000
[25121.965010]  ffffffff81047bb0 ffff88024267bad8 0000000000000003 0000000077930768
[25121.965945]  ffff88024267ba48 00000000ffffffff 0000000000000000 ffffffff81047bb0
[25121.966838] Call Trace:
[25121.967723]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25121.968615]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25121.969494]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25121.970371]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25121.971239]  [<ffffffff810eeb30>] smp_call_function_single+0x70/0xd0
[25121.972107]  [<ffffffff81173f62>] ? release_pages+0x1c2/0x280
[25121.972968]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25121.973829]  [<ffffffff810ef229>] smp_call_function_many+0x2b9/0x320
[25121.974690]  [<ffffffff81047ed0>] flush_tlb_mm_range+0x90/0x1b0
[25121.975592]  [<ffffffff81190d62>] tlb_flush_mmu_tlbonly+0x42/0x50
[25121.976454]  [<ffffffff811922c8>] unmap_single_vma+0x6d8/0x900
[25121.977319]  [<ffffffff8116d377>] ? __free_pages+0x37/0x50
[25121.978178]  [<ffffffff811925ec>] zap_page_range_single+0xfc/0x160
[25121.979044]  [<ffffffff811927cf>] unmap_mapping_range+0x12f/0x190
[25121.979905]  [<ffffffff8118193d>] shmem_fallocate+0x38d/0x4c0
[25121.980768]  [<ffffffff8178df51>] ? cmpxchg_double_slab.isra.49+0xd8/0x122
[25121.981636]  [<ffffffff811d38af>] do_fallocate+0x12f/0x1d0
[25121.982501]  [<ffffffff811a6995>] SyS_madvise+0x385/0x860
[25121.983362]  [<ffffffff81012047>] ? syscall_trace_enter_phase2+0xa7/0x1e0
[25121.984228]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25121.985086] Code: 48 89 de 48 03 14 c5 e0 bf cf 81 48 89 df e8 5a eb 26 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 0f 85 98 00 
[25121.986993] sending NMI to other CPUs:
[25121.987891] NMI backtrace for cpu 0
[25121.988871] CPU: 0 PID: 2940 Comm: trinity-c150 Tainted: G             L 3.18.0+ #108
[25121.990863] task: ffff8801bf3536b0 ti: ffff880197e0c000 task.ti: ffff880197e0c000
[25121.991871] RIP: 0010:[<ffffffff8103e006>]  [<ffffffff8103e006>] read_hpet+0x16/0x20
[25121.992886] RSP: 0018:ffff88024e203e38  EFLAGS: 00000046
[25121.993898] RAX: 00000000cbd1340c RBX: 000000000051684a RCX: 0000000000000000
[25121.994911] RDX: 0000000000000000 RSI: ffff88024e20c710 RDI: ffffffff81c26f40
[25121.995923] RBP: ffff88024e203e38 R08: 0000000000000000 R09: 000000000000000f
[25121.996942] R10: 0000000000000526 R11: 0000000000aaaaaa R12: 000016dc7859e117
[25121.997961] R13: 0000000000000000 R14: ffff88024e20c700 R15: 000016eacc784674
[25121.998975] FS:  00007ff08e78d740(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[25122.000003] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25122.001032] CR2: 0000000000000001 CR3: 0000000229d22000 CR4: 00000000001407f0
[25122.002066] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25122.003094] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25122.004099] Stack:
[25122.005070]  ffff88024e203e68 ffffffff810e0d3e ffff88024e20ca80 ffff88024e20ca80
[25122.006050]  ffff880197e0fe38 ffff88024e20c6c0 ffff88024e203e98 ffffffff810e9cd3
[25122.007005]  ffff88024e203f28 ffff88024e20ca80 ffff88024e203f28 ffff88024e20c6c0
[25122.007936] Call Trace:
[25122.008841]  <IRQ> 
[25122.009729]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
[25122.010594]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
[25122.011453]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
[25122.012301]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
[25122.013148]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
[25122.013996]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
[25122.014843]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
[25122.015690]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
[25122.016534]  <EOI> 
[25122.017374]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
[25122.018207]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
[25122.019037]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
[25122.019859]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
[25122.020679]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25122.021490] Code: 00 29 c7 ba 00 00 00 00 b8 c2 ff ff ff 83 ff 7f 5d 0f 4f c2 c3 0f 1f 44 00 00 55 48 8b 05 d3 c8 ec 00 48 89 e5 8b 80 f0 00 00 00 <89> c0 5d c3 66 0f 1f 44 00 00 0f 1f 44 00 00 8b 0d 29 c8 ec 00 
[25122.023285] NMI backtrace for cpu 2
[25122.024138] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.18.0+ #108
[25122.025877] task: ffff8802444f8af0 ti: ffff88024470c000 task.ti: ffff88024470c000
[25122.026772] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25122.027680] RSP: 0018:ffff88024470fe08  EFLAGS: 00000046
[25122.028590] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25122.029511] RDX: 0000000000000000 RSI: ffff88024470ffd8 RDI: 0000000000000002
[25122.030427] RBP: ffff88024470fe38 R08: ffff88024e28ed08 R09: 00000000ffffffff
[25122.031345] R10: 000000000000265b R11: 00000000000003ff R12: 0000000000000005
[25122.032268] R13: 0000000000000032 R14: 0000000000000004 R15: ffff88024470c000
[25122.033189] FS:  0000000000000000(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[25122.034115] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25122.035042] CR2: 00007f6b1f71a000 CR3: 0000000001c11000 CR4: 00000000001407e0
[25122.035977] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25122.036911] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25122.037844] Stack:
[25122.038768]  000000024470fe38 c6aa8015cfd0a8b5 ffffe8ffffc81ba8 0000000000000005
[25122.039719]  ffffffff81c9e440 0000000000000002 ffff88024470fe88 ffffffff8163a995
[25122.040671]  000016dc8fbb58a7 ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25122.041629] Call Trace:
[25122.042576]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25122.043530]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25122.044462]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25122.045370]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25122.046275] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25122.048261] NMI backtrace for cpu 3
[25122.049177] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #108
[25122.050992] task: ffff8802444fe270 ti: ffff880244718000 task.ti: ffff880244718000
[25122.051911] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25122.052836] RSP: 0018:ffff88024471be08  EFLAGS: 00000046
[25122.053757] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25122.054682] RDX: 0000000000000000 RSI: ffff88024471bfd8 RDI: 0000000000000003
[25122.055628] RBP: ffff88024471be38 R08: ffff88024e2ced08 R09: 00000000ffffffff
[25122.056550] R10: 0000000000002655 R11: 00000000000003ff R12: 0000000000000005
[25122.057466] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880244718000
[25122.058374] FS:  0000000000000000(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[25122.059282] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25122.060191] CR2: 00007fda8919e000 CR3: 0000000001c11000 CR4: 00000000001407e0
[25122.061107] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25122.062020] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25122.062925] Stack:
[25122.063822]  000000034471be38 5fbbf5c12849ac78 ffffe8ffffcc1ba8 0000000000000005
[25122.064744]  ffffffff81c9e440 0000000000000003 ffff88024471be88 ffffffff8163a995
[25122.065690]  000016dc8fbb74c1 ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25122.066617] Call Trace:
[25122.067536]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25122.068470]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25122.069400]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25122.070329]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25122.071258] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25149.929286] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c205:636]
[25149.930256] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[25149.935393] CPU: 1 PID: 636 Comm: trinity-c205 Tainted: G             L 3.18.0+ #108
[25149.937508] task: ffff8800962fed60 ti: ffff880242678000 task.ti: ffff880242678000
[25149.938555] RIP: 0010:[<ffffffff810ee9ce>]  [<ffffffff810ee9ce>] generic_exec_single+0xee/0x1b0
[25149.939652] RSP: 0018:ffff88024267b9c8  EFLAGS: 00000202
[25149.940706] RAX: 0000000000000008 RBX: ffffffff81799e0d RCX: 0000000000000038
[25149.941751] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[25149.942780] RBP: ffff88024267ba28 R08: ffff8802444f43f0 R09: 0000000000000000
[25149.943803] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88024267b938
[25149.944822] R13: 0000000000406040 R14: ffff880242678000 R15: ffff8800962fed60
[25149.945839] FS:  00007ff08e78d740(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[25149.946862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25149.947891] CR2: 0000000000000000 CR3: 0000000228031000 CR4: 00000000001407e0
[25149.948928] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25149.949997] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25149.951027] Stack:
[25149.952045]  ffffffff8124055d ffff8801b4eff080 ffffffff8124055d 0000000000000000
[25149.953082]  ffffffff81047bb0 ffff88024267bad8 0000000000000003 0000000077930768
[25149.954119]  ffff88024267ba48 00000000ffffffff 0000000000000000 ffffffff81047bb0
[25149.955159] Call Trace:
[25149.956184]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25149.957219]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25149.958239]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25149.959258]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25149.960299]  [<ffffffff810eeb30>] smp_call_function_single+0x70/0xd0
[25149.961305]  [<ffffffff81173f62>] ? release_pages+0x1c2/0x280
[25149.962310]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25149.963313]  [<ffffffff810ef229>] smp_call_function_many+0x2b9/0x320
[25149.964320]  [<ffffffff81047ed0>] flush_tlb_mm_range+0x90/0x1b0
[25149.965312]  [<ffffffff81190d62>] tlb_flush_mmu_tlbonly+0x42/0x50
[25149.966281]  [<ffffffff811922c8>] unmap_single_vma+0x6d8/0x900
[25149.967249]  [<ffffffff8116d377>] ? __free_pages+0x37/0x50
[25149.968212]  [<ffffffff811925ec>] zap_page_range_single+0xfc/0x160
[25149.969163]  [<ffffffff811927cf>] unmap_mapping_range+0x12f/0x190
[25149.970131]  [<ffffffff8118193d>] shmem_fallocate+0x38d/0x4c0
[25149.971056]  [<ffffffff8178df51>] ? cmpxchg_double_slab.isra.49+0xd8/0x122
[25149.971977]  [<ffffffff811d38af>] do_fallocate+0x12f/0x1d0
[25149.972895]  [<ffffffff811a6995>] SyS_madvise+0x385/0x860
[25149.973812]  [<ffffffff81012047>] ? syscall_trace_enter_phase2+0xa7/0x1e0
[25149.974735]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25149.975654] Code: 48 89 de 48 03 14 c5 e0 bf cf 81 48 89 df e8 5a eb 26 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 0f 85 98 00 
[25149.977642] sending NMI to other CPUs:
[25149.978587] NMI backtrace for cpu 0
[25149.979615] CPU: 0 PID: 2940 Comm: trinity-c150 Tainted: G             L 3.18.0+ #108
[25149.981704] task: ffff8801bf3536b0 ti: ffff880197e0c000 task.ti: ffff880197e0c000
[25149.982766] RIP: 0010:[<ffffffff810cf1db>]  [<ffffffff810cf1db>] invoke_rcu_core+0x2b/0x50
[25149.983834] RSP: 0018:ffff88024e203db8  EFLAGS: 00000093
[25149.984894] RAX: ffffffff81cfe330 RBX: 0000000000000000 RCX: 00000000000bde85
[25149.985959] RDX: 00000000000bde86 RSI: ffff88024e20c440 RDI: ffffffff81c4bf80
[25149.987021] RBP: ffff88024e203dc8 R08: 0000000000000000 R09: 000000000000000f
[25149.988084] R10: 0000000000000526 R11: 0000000000aaaaaa R12: ffffffff81c4bf80
[25149.989155] R13: ffffffff81c4bf80 R14: 0000000000000000 R15: 000016eacc784674
[25149.990227] FS:  00007ff08e78d740(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[25149.991310] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25149.992374] CR2: 0000000000000001 CR3: 0000000229d22000 CR4: 00000000001407f0
[25149.993420] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25149.994462] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25149.995477] Stack:
[25149.996463]  ffff8801bf3536b0 ffff88024e20c440 ffff88024e203e38 ffffffff810d48f3
[25149.997456]  ffff88024e203de8 ffffffff81120b4c ffff88024e203e28 ffffffff810a800e
[25149.998421]  ffff88024e203e08 ffffffff81cfe338 ffff88024e203e38 ffff8801bf3536b0
[25149.999364] Call Trace:
[25150.000280]  <IRQ> 
[25150.001179]  [<ffffffff810d48f3>] rcu_check_callbacks+0x183/0x730
[25150.002063]  [<ffffffff81120b4c>] ? acct_account_cputime+0x1c/0x20
[25150.002939]  [<ffffffff810a800e>] ? account_system_time+0x8e/0x190
[25150.003805]  [<ffffffff810da34b>] update_process_times+0x4b/0x80
[25150.004669]  [<ffffffff810e9cff>] tick_sched_timer+0x4f/0x160
[25150.005530]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
[25150.006391]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
[25150.007248]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
[25150.008106]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
[25150.008958]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
[25150.009803]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
[25150.010645]  <EOI> 
[25150.011487]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
[25150.012328]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
[25150.013162]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
[25150.013991]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
[25150.014809]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25150.015627] Code: 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 65 8b 1c 25 2c a0 00 00 3b 1d e8 02 c3 00 73 24 89 db 48 8b 05 31 3e 74 00 48 0f a3 18 <19> db 85 db 74 0a bf 09 00 00 00 e8 b5 be fa ff 48 83 c4 08 5b 
[25150.017421] NMI backtrace for cpu 2
[25150.018283] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.18.0+ #108
[25150.020063] task: ffff8802444f8af0 ti: ffff88024470c000 task.ti: ffff88024470c000
[25150.020972] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25150.021897] RSP: 0018:ffff88024470fe08  EFLAGS: 00000046
[25150.022819] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25150.023745] RDX: 0000000000000000 RSI: ffff88024470ffd8 RDI: 0000000000000002
[25150.024669] RBP: ffff88024470fe38 R08: ffff88024e28ed08 R09: 00000000ffffffff
[25150.025595] R10: 000000000000265b R11: 00000000000003ff R12: 0000000000000005
[25150.026523] R13: 0000000000000032 R14: 0000000000000004 R15: ffff88024470c000
[25150.027445] FS:  0000000000000000(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[25150.028378] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25150.029311] CR2: 00007f6b1f71a000 CR3: 0000000001c11000 CR4: 00000000001407e0
[25150.030250] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25150.031190] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25150.032125] Stack:
[25150.033056]  000000024470fe38 c6aa8015cfd0a8b5 ffffe8ffffc81ba8 0000000000000005
[25150.034012]  ffffffff81c9e440 0000000000000002 ffff88024470fe88 ffffffff8163a995
[25150.034966]  000016e314a978e8 ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25150.035903] Call Trace:
[25150.036807]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25150.037724]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25150.038637]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25150.039548]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25150.040438] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25150.042345] NMI backtrace for cpu 3
[25150.043231] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #108
[25150.045019] task: ffff8802444fe270 ti: ffff880244718000 task.ti: ffff880244718000
[25150.045926] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25150.046843] RSP: 0018:ffff88024471be08  EFLAGS: 00000046
[25150.047755] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25150.048670] RDX: 0000000000000000 RSI: ffff88024471bfd8 RDI: 0000000000000003
[25150.049598] RBP: ffff88024471be38 R08: ffff88024e2ced08 R09: 00000000ffffffff
[25150.050500] R10: 0000000000002655 R11: 00000000000003ff R12: 0000000000000005
[25150.051406] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880244718000
[25150.052313] FS:  0000000000000000(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[25150.053216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25150.054114] CR2: 00007fda8919e000 CR3: 0000000001c11000 CR4: 00000000001407e0
[25150.055021] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25150.055929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25150.056829] Stack:
[25150.057727]  000000034471be38 5fbbf5c12849ac78 ffffe8ffffcc1ba8 0000000000000005
[25150.058649]  ffffffff81c9e440 0000000000000003 ffff88024471be88 ffffffff8163a995
[25150.059600]  000016e314a99619 ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25150.060530] Call Trace:
[25150.061450]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25150.062380]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25150.063303]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25150.064231]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25150.065157] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25177.913063] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c205:636]
[25177.914027] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[25177.919143] CPU: 1 PID: 636 Comm: trinity-c205 Tainted: G             L 3.18.0+ #108
[25177.921249] task: ffff8800962fed60 ti: ffff880242678000 task.ti: ffff880242678000
[25177.922289] RIP: 0010:[<ffffffff810ee9ca>]  [<ffffffff810ee9ca>] generic_exec_single+0xea/0x1b0
[25177.923382] RSP: 0018:ffff88024267b9c8  EFLAGS: 00000202
[25177.924433] RAX: 0000000000000008 RBX: ffffffff81799e0d RCX: 0000000000000038
[25177.925476] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[25177.926504] RBP: ffff88024267ba28 R08: ffff8802444f43f0 R09: 0000000000000000
[25177.927525] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88024267b938
[25177.928542] R13: 0000000000406040 R14: ffff880242678000 R15: ffff8800962fed60
[25177.929556] FS:  00007ff08e78d740(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[25177.930580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25177.931606] CR2: 0000000000000000 CR3: 0000000228031000 CR4: 00000000001407e0
[25177.932638] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25177.933707] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25177.934735] Stack:
[25177.935750]  ffffffff8124055d ffff8801b4eff080 ffffffff8124055d 0000000000000000
[25177.936786]  ffffffff81047bb0 ffff88024267bad8 0000000000000003 0000000077930768
[25177.937819]  ffff88024267ba48 00000000ffffffff 0000000000000000 ffffffff81047bb0
[25177.938856] Call Trace:
[25177.939882]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25177.940911]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25177.941928]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25177.942941]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25177.943980]  [<ffffffff810eeb30>] smp_call_function_single+0x70/0xd0
[25177.944982]  [<ffffffff81173f62>] ? release_pages+0x1c2/0x280
[25177.945985]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25177.946988]  [<ffffffff810ef229>] smp_call_function_many+0x2b9/0x320
[25177.947993]  [<ffffffff81047ed0>] flush_tlb_mm_range+0x90/0x1b0
[25177.948984]  [<ffffffff81190d62>] tlb_flush_mmu_tlbonly+0x42/0x50
[25177.949951]  [<ffffffff811922c8>] unmap_single_vma+0x6d8/0x900
[25177.950920]  [<ffffffff8116d377>] ? __free_pages+0x37/0x50
[25177.951886]  [<ffffffff811925ec>] zap_page_range_single+0xfc/0x160
[25177.952837]  [<ffffffff811927cf>] unmap_mapping_range+0x12f/0x190
[25177.953805]  [<ffffffff8118193d>] shmem_fallocate+0x38d/0x4c0
[25177.954730]  [<ffffffff8178df51>] ? cmpxchg_double_slab.isra.49+0xd8/0x122
[25177.955653]  [<ffffffff811d38af>] do_fallocate+0x12f/0x1d0
[25177.956573]  [<ffffffff811a6995>] SyS_madvise+0x385/0x860
[25177.957491]  [<ffffffff81012047>] ? syscall_trace_enter_phase2+0xa7/0x1e0
[25177.958414]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25177.959332] Code: 00 2b 01 00 48 89 de 48 03 14 c5 e0 bf cf 81 48 89 df e8 5a eb 26 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 
[25177.961321] sending NMI to other CPUs:
[25177.962268] NMI backtrace for cpu 0
[25177.963295] CPU: 0 PID: 2940 Comm: trinity-c150 Tainted: G             L 3.18.0+ #108
[25177.965391] task: ffff8801bf3536b0 ti: ffff880197e0c000 task.ti: ffff880197e0c000
[25177.966456] RIP: 0010:[<ffffffff8103e006>]  [<ffffffff8103e006>] read_hpet+0x16/0x20
[25177.967525] RSP: 0018:ffff88024e203e38  EFLAGS: 00000046
[25177.968586] RAX: 00000000fb9d303f RBX: 00000000005193fc RCX: 0000000000000000
[25177.969651] RDX: 0000000000000000 RSI: ffff88024e20c710 RDI: ffffffff81c26f40
[25177.970712] RBP: ffff88024e203e38 R08: 0000000000000000 R09: 000000000000000f
[25177.971777] R10: 0000000000000526 R11: 0000000000aaaaaa R12: 000016e982361117
[25177.972846] R13: 0000000000000000 R14: ffff88024e20c700 R15: 000016eacc784674
[25177.973915] FS:  00007ff08e78d740(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[25177.974998] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25177.976058] CR2: 0000000000000001 CR3: 0000000229d22000 CR4: 00000000001407f0
[25177.977103] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25177.978139] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25177.979151] Stack:
[25177.980134]  ffff88024e203e68 ffffffff810e0d3e ffff88024e20ca80 ffff88024e20ca80
[25177.981123]  ffff880197e0fe38 ffff88024e20c6c0 ffff88024e203e98 ffffffff810e9cd3
[25177.982089]  ffff88024e203f28 ffff88024e20ca80 ffff88024e203f28 ffff88024e20c6c0
[25177.983029] Call Trace:
[25177.983943]  <IRQ> 
[25177.984839]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
[25177.985714]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
[25177.986581]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
[25177.987439]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
[25177.988296]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
[25177.989149]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
[25177.990007]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
[25177.990862]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
[25177.991714]  <EOI> 
[25177.992558]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
[25177.993396]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
[25177.994233]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
[25177.995063]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
[25177.995891]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25177.996712] Code: 00 29 c7 ba 00 00 00 00 b8 c2 ff ff ff 83 ff 7f 5d 0f 4f c2 c3 0f 1f 44 00 00 55 48 8b 05 d3 c8 ec 00 48 89 e5 8b 80 f0 00 00 00 <89> c0 5d c3 66 0f 1f 44 00 00 0f 1f 44 00 00 8b 0d 29 c8 ec 00 
[25177.998518] NMI backtrace for cpu 2
[25177.999377] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.18.0+ #108
[25178.001121] task: ffff8802444f8af0 ti: ffff88024470c000 task.ti: ffff88024470c000
[25178.002019] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25178.002932] RSP: 0018:ffff88024470fe08  EFLAGS: 00000046
[25178.003849] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25178.004771] RDX: 0000000000000000 RSI: ffff88024470ffd8 RDI: 0000000000000002
[25178.005688] RBP: ffff88024470fe38 R08: ffff88024e28ed08 R09: 00000000ffffffff
[25178.006609] R10: 000000000000265b R11: 00000000000003ff R12: 0000000000000005
[25178.007532] R13: 0000000000000032 R14: 0000000000000004 R15: ffff88024470c000
[25178.008453] FS:  0000000000000000(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[25178.009379] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25178.010307] CR2: 00007f6b1f71a000 CR3: 0000000001c11000 CR4: 00000000001407e0
[25178.011246] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25178.012181] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25178.013112] Stack:
[25178.014036]  000000024470fe38 c6aa8015cfd0a8b5 ffffe8ffffc81ba8 0000000000000005
[25178.014985]  ffffffff81c9e440 0000000000000002 ffff88024470fe88 ffffffff8163a995
[25178.015936]  000016e99997935a ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25178.016893] Call Trace:
[25178.017836]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25178.018791]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25178.019719]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25178.020625]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25178.021526] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25178.023504] NMI backtrace for cpu 3
[25178.024417] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #108
[25178.026229] task: ffff8802444fe270 ti: ffff880244718000 task.ti: ffff880244718000
[25178.027145] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25178.028067] RSP: 0018:ffff88024471be08  EFLAGS: 00000046
[25178.028983] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25178.029906] RDX: 0000000000000000 RSI: ffff88024471bfd8 RDI: 0000000000000003
[25178.030826] RBP: ffff88024471be38 R08: ffff88024e2ced08 R09: 00000000ffffffff
[25178.031744] R10: 0000000000002656 R11: 00000000000003ff R12: 0000000000000005
[25178.032657] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880244718000
[25178.033586] FS:  0000000000000000(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[25178.034496] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25178.035401] CR2: 00007fda8919e000 CR3: 0000000001c11000 CR4: 00000000001407e0
[25178.036319] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25178.037229] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25178.038130] Stack:
[25178.039023]  000000034471be38 5fbbf5c12849ac78 ffffe8ffffcc1ba8 0000000000000005
[25178.039948]  ffffffff81c9e440 0000000000000003 ffff88024471be88 ffffffff8163a995
[25178.040867]  000016e99997b046 ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25178.041793] Call Trace:
[25178.042713]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25178.043663]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25178.044589]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25178.045515]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25178.046441] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25183.090076] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 88.570 msecs
[25183.090087] sched: RT throttling activated
[25205.896861] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 21s! [trinity-c65:8051]
[25205.897959] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[25205.904284] CPU: 1 PID: 8051 Comm: trinity-c65 Tainted: G             L 3.18.0+ #108
[25205.906737] task: ffff8800a1b98000 ti: ffff880071a70000 task.ti: ffff880071a70000
[25205.907986] RIP: 0010:[<ffffffff8178ecca>]  [<ffffffff8178ecca>] __slab_alloc+0x4e5/0x53b
[25205.909296] RSP: 0018:ffff880071a739f8  EFLAGS: 00000246
[25205.910532] RAX: 0000000000000001 RBX: ffff88022f966c40 RCX: 0000000180240022
[25205.911789] RDX: 0000000000000024 RSI: ffffea0008be5800 RDI: ffff880245827cc0
[25205.913018] RBP: ffff880071a73ac8 R08: ffff88022f965138 R09: 0000000000000000
[25205.914222] R10: 0000000000000092 R11: 0000000000000000 R12: ffffffff810133af
[25205.915459] R13: ffff880071a73978 R14: 0000000100240021 R15: ffffffff8134f019
[25205.916728] FS:  00007f2805635740(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[25205.917955] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25205.919158] CR2: 0000000001a86a10 CR3: 0000000097832000 CR4: 00000000001407e0
[25205.920348] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25205.921514] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25205.922678] Stack:
[25205.923828]  ffff880071a73a68 ffffffff811b7057 ffff8800a1b98000 ffff8800a1b98000
[25205.925006]  0000000000000000 ffff88022f9669f0 0000000200000000 ffffffff8134f019
[25205.926179]  ffffffff00000000 0000000000000246 0000000000000000 0000000000000000
[25205.927394] Call Trace:
[25205.928579]  [<ffffffff811b7057>] ? alloc_pages_vma+0x97/0x160
[25205.929762]  [<ffffffff8134f019>] ? __radix_tree_preload+0x49/0xc0
[25205.930972]  [<ffffffff8117ccde>] ? shmem_alloc_page+0x6e/0xc0
[25205.932185]  [<ffffffff8134fd05>] ? __radix_tree_create+0x85/0x220
[25205.933398]  [<ffffffff8134f019>] ? __radix_tree_preload+0x49/0xc0
[25205.934603]  [<ffffffff811c012b>] kmem_cache_alloc+0x1bb/0x1e0
[25205.935825]  [<ffffffff8134f019>] __radix_tree_preload+0x49/0xc0
[25205.937011]  [<ffffffff8134f101>] radix_tree_maybe_preload+0x21/0x30
[25205.938175]  [<ffffffff8117fa46>] shmem_getpage_gfp+0x466/0x810
[25205.939323]  [<ffffffff8109bd1a>] ? finish_task_switch+0x4a/0x100
[25205.940453]  [<ffffffff8117fe2f>] shmem_write_begin+0x3f/0x60
[25205.941549]  [<ffffffff811652e4>] generic_perform_write+0xd4/0x1f0
[25205.942627]  [<ffffffff8116789f>] __generic_file_write_iter+0x15f/0x350
[25205.943694]  [<ffffffff811d5140>] ? new_sync_read+0xd0/0xd0
[25205.944754]  [<ffffffff81167acd>] generic_file_write_iter+0x3d/0xb0
[25205.945813]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25205.946885]  [<ffffffff811d5288>] do_iter_readv_writev+0x78/0xc0
[25205.948034]  [<ffffffff811d6a18>] do_readv_writev+0xd8/0x2a0
[25205.949150]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25205.950297]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25205.951463]  [<ffffffff810dac62>] ? __hrtimer_start_range_ns+0x252/0x380
[25205.952586]  [<ffffffff810dadc8>] ? hrtimer_start+0x18/0x20
[25205.953678]  [<ffffffff811d6c69>] vfs_writev+0x39/0x50
[25205.954747]  [<ffffffff811d6dc9>] SyS_writev+0x59/0xf0
[25205.955854]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25205.956941] Code: 00 02 00 00 49 c7 45 00 00 00 00 00 75 11 ff b5 78 ff ff ff 9d e8 07 b4 9a ff 4c 89 e0 eb 0f e8 fd b4 9a ff ff b5 78 ff ff ff 9d <4c> 89 e0 48 8b 75 c8 65 48 33 34 25 28 00 00 00 74 32 e8 ff 79 
[25205.959247] sending NMI to other CPUs:
[25205.960438] NMI backtrace for cpu 2
[25205.961454] CPU: 2 PID: 7491 Comm: trinity-c14 Tainted: G             L 3.18.0+ #108
[25205.963519] task: ffff88022b98a0d0 ti: ffff880096408000 task.ti: ffff880096408000
[25205.964574] RIP: 0010:[<ffffffff811741ea>]  [<ffffffff811741ea>] __lru_cache_add+0x4a/0x90
[25205.965645] RSP: 0018:ffff88009640bb38  EFLAGS: 00000293
[25205.966690] RAX: 000000000000000c RBX: ffff88024e28e0c0 RCX: 000000000001602a
[25205.967724] RDX: 000000000000000d RSI: 0000000000000018 RDI: ffffea00042410c0
[25205.968745] RBP: ffff88009640bb48 R08: ffff8801020dae90 R09: 0000000000000000
[25205.969761] R10: 0000000000000001 R11: 0000000000000000 R12: ffffea00042410c0
[25205.970773] R13: 0000000000000000 R14: ffff880096339f18 R15: 0000000000000000
[25205.971763] FS:  00007f2805635740(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[25205.972737] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25205.973689] CR2: 0000003370219050 CR3: 000000009ab9e000 CR4: 00000000001407e0
[25205.974628] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25205.975543] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25205.976440] Stack:
[25205.977315]  0000000000000003 0000000000000000 ffff88009640bb58 ffffffff81174919
[25205.978198]  ffff88009640bc18 ffffffff8117fa80 ffff88009640bb98 ffffffff8109bdb0
[25205.979071]  ffff8802292c8980 ffff88009640bc80 0000000000000000 00000000000000d0
[25205.979939] Call Trace:
[25205.980791]  [<ffffffff81174919>] lru_cache_add_anon+0x19/0x20
[25205.981659]  [<ffffffff8117fa80>] shmem_getpage_gfp+0x4a0/0x810
[25205.982526]  [<ffffffff8109bdb0>] ? finish_task_switch+0xe0/0x100
[25205.983385]  [<ffffffff8117fe2f>] shmem_write_begin+0x3f/0x60
[25205.984245]  [<ffffffff811652e4>] generic_perform_write+0xd4/0x1f0
[25205.985098]  [<ffffffff8116789f>] __generic_file_write_iter+0x15f/0x350
[25205.985946]  [<ffffffff811d5140>] ? new_sync_read+0xd0/0xd0
[25205.986787]  [<ffffffff81167acd>] generic_file_write_iter+0x3d/0xb0
[25205.987629]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25205.988479]  [<ffffffff811d5288>] do_iter_readv_writev+0x78/0xc0
[25205.989323]  [<ffffffff811d6a18>] do_readv_writev+0xd8/0x2a0
[25205.990156]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25205.990994]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25205.991822]  [<ffffffff810dac62>] ? __hrtimer_start_range_ns+0x252/0x380
[25205.992644]  [<ffffffff810dadc8>] ? hrtimer_start+0x18/0x20
[25205.993461]  [<ffffffff811d6c69>] vfs_writev+0x39/0x50
[25205.994277]  [<ffffffff811d6dc9>] SyS_writev+0x59/0xf0
[25205.995089]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25205.995900] Code: 48 03 1c 25 68 bd 00 00 48 8b 07 f6 c4 80 75 47 41 8b 44 24 1c 85 c0 7e 4c f0 41 ff 44 24 1c 48 8b 03 83 f8 0e 74 1a 48 8d 50 01 <48> 89 13 4c 89 64 c3 10 5b 65 ff 0c 25 60 a8 00 00 41 5c 5d c3 
[25205.997698] NMI backtrace for cpu 3
[25205.998563] CPU: 3 PID: 10185 Comm: trinity-c162 Tainted: G             L 3.18.0+ #108
[25206.000332] task: ffff880197ea2bc0 ti: ffff8801c6410000 task.ti: ffff8801c6410000
[25206.001236] RIP: 0010:[<ffffffff810ee9ce>]  [<ffffffff810ee9ce>] generic_exec_single+0xee/0x1b0
[25206.002161] RSP: 0018:ffff8801c64139c8  EFLAGS: 00000202
[25206.003077] RAX: 0000000000000008 RBX: ffff8801c64139e0 RCX: 0000000000000038
[25206.004003] RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
[25206.004926] RBP: ffff8801c6413a28 R08: ffff8802444f47e0 R09: 0000000000000000
[25206.005848] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[25206.006769] R13: 0000000000000001 R14: ffff880229f27a00 R15: 0000000000000003
[25206.007683] FS:  00007f2805635740(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[25206.008612] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25206.009539] CR2: 00007f28048e2220 CR3: 000000021ca6c000 CR4: 00000000001407e0
[25206.010484] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25206.011426] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25206.012367] Stack:
[25206.013300]  ffffffff8124055d ffff8801e640ddc0 ffffffff8124055d 0000000000000000
[25206.014263]  ffffffff81047bb0 ffff8801c6413ad8 0000000000000003 00000000e731055f
[25206.015228]  ffff8801c6413a48 00000000ffffffff 0000000000000001 ffffffff81047bb0
[25206.016172] Call Trace:
[25206.017090]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25206.018026]  [<ffffffff8124055d>] ? proc_alloc_inode+0x1d/0xb0
[25206.018958]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25206.019879]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25206.020766]  [<ffffffff810eeb30>] smp_call_function_single+0x70/0xd0
[25206.021637]  [<ffffffff81047bb0>] ? do_flush_tlb_all+0x60/0x60
[25206.022498]  [<ffffffff810ef229>] smp_call_function_many+0x2b9/0x320
[25206.023356]  [<ffffffff81184681>] ? zone_statistics+0x81/0xa0
[25206.024210]  [<ffffffff81047ed0>] flush_tlb_mm_range+0x90/0x1b0
[25206.025067]  [<ffffffff81190d62>] tlb_flush_mmu_tlbonly+0x42/0x50
[25206.025914]  [<ffffffff811922c8>] unmap_single_vma+0x6d8/0x900
[25206.026757]  [<ffffffff811925ec>] zap_page_range_single+0xfc/0x160
[25206.027601]  [<ffffffff811927cf>] unmap_mapping_range+0x12f/0x190
[25206.028439]  [<ffffffff8118193d>] shmem_fallocate+0x38d/0x4c0
[25206.029270]  [<ffffffff811d38af>] do_fallocate+0x12f/0x1d0
[25206.030089]  [<ffffffff811a6995>] SyS_madvise+0x385/0x860
[25206.030910]  [<ffffffff811c029e>] ? kmem_cache_free+0x14e/0x1e0
[25206.031733]  [<ffffffff811e6029>] ? putname+0x29/0x40
[25206.032546]  [<ffffffff81012047>] ? syscall_trace_enter_phase2+0xa7/0x1e0
[25206.033361]  [<ffffffff817994c4>] tracesys_phase2+0xd4/0xd9
[25206.034170] Code: 48 89 de 48 03 14 c5 e0 bf cf 81 48 89 df e8 5a eb 26 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 7d d8 65 48 33 3c 25 28 00 00 00 0f 85 98 00 
[25206.035960] NMI backtrace for cpu 0
[25206.036822] CPU: 0 PID: 9587 Comm: trinity-c129 Tainted: G             L 3.18.0+ #108
[25206.038638] task: ffff88009a8695e0 ti: ffff880039f48000 task.ti: ffff880039f48000
[25206.039565] RIP: 0033:[<000000336ee891e5>]  [<000000336ee891e5>] 0x336ee891e5
[25206.040510] RSP: 002b:00007fffdd789658  EFLAGS: 00000202
[25206.041437] RAX: 0000000001b33250 RBX: 0000000001b33240 RCX: 000000000202fe00
[25206.042378] RDX: 0000000002533240 RSI: 0000000000000054 RDI: 0000000001b33250
[25206.043317] RBP: 0000000000a00010 R08: 0000000001b33240 R09: 000000000000002e
[25206.044255] R10: 0000000000003666 R11: 0000000000000246 R12: 0000000002533250
[25206.045186] R13: 0000000000020db0 R14: 0000000000000000 R15: 0000000000000000
[25206.046117] FS:  00007f2805635740(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[25206.047047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25206.047954] CR2: 00000000019fcad0 CR3: 000000009edfa000 CR4: 00000000001407f0
[25206.048866] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25206.049779] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600

[25206.051604] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 91.159 msecs
[25233.880621] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c251:17822]
[25233.881750] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[25233.887204] CPU: 1 PID: 17822 Comm: trinity-c251 Tainted: G             L 3.18.0+ #108
[25233.889315] task: ffff8801c65ec1a0 ti: ffff88009b7b0000 task.ti: ffff88009b7b0000
[25233.890406] RIP: 0010:[<ffffffff81356ec5>]  [<ffffffff81356ec5>] copy_user_enhanced_fast_string+0x5/0x10
[25233.891548] RSP: 0018:ffff88009b7b3be0  EFLAGS: 00010206
[25233.892666] RAX: 00007f95d93d2239 RBX: ffffffff8117faa1 RCX: 00000000000000a0
[25233.893801] RDX: 0000000000001000 RSI: 00007f95d93d3199 RDI: ffff88019049ef60
[25233.894943] RBP: ffff88009b7b3c28 R08: ffff88022f8d1668 R09: 0000000000000000
[25233.896102] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[25233.897252] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
[25233.898395] FS:  00007f95dc296740(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[25233.899559] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25233.900733] CR2: 00007f95da9e0bbe CR3: 0000000095a94000 CR4: 00000000001407e0
[25233.901878] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25233.903009] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25233.904151] Stack:
[25233.905253]  ffffffff8118e606 ffff88009b7b3bf8 000000001bb22000 0000000000001000
[25233.906364]  000000001bb22000 0000000000001000 ffff88009b7b3d60 0000000000000000
[25233.907478]  ffff8800a2518d28 ffff88009b7b3cc8 ffffffff81165307 ffff88009b7b3c88
[25233.908613] Call Trace:
[25233.909759]  [<ffffffff8118e606>] ? iov_iter_copy_from_user_atomic+0x156/0x180
[25233.910938]  [<ffffffff81165307>] generic_perform_write+0xf7/0x1f0
[25233.912164]  [<ffffffff8116789f>] __generic_file_write_iter+0x15f/0x350
[25233.913376]  [<ffffffff811d5140>] ? new_sync_read+0xd0/0xd0
[25233.914574]  [<ffffffff81167acd>] generic_file_write_iter+0x3d/0xb0
[25233.915730]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25233.916890]  [<ffffffff811d5288>] do_iter_readv_writev+0x78/0xc0
[25233.918061]  [<ffffffff811d6a18>] do_readv_writev+0xd8/0x2a0
[25233.919202]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25233.920376]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25233.921584]  [<ffffffff810dac62>] ? __hrtimer_start_range_ns+0x252/0x380
[25233.922742]  [<ffffffff810dadc8>] ? hrtimer_start+0x18/0x20
[25233.923883]  [<ffffffff811d6c69>] vfs_writev+0x39/0x50
[25233.925034]  [<ffffffff811d6dc9>] SyS_writev+0x59/0xf0
[25233.926189]  [<ffffffff817992d2>] system_call_fastpath+0x12/0x17
[25233.927320] Code: 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f 1f 00 89 d1 <f3> a4 31 c0 0f 1f 00 c3 90 90 90 0f 1f 00 83 fa 08 0f 82 95 00 
[25233.929775] sending NMI to other CPUs:
[25233.930989] NMI backtrace for cpu 0
[25233.932037] CPU: 0 PID: 17755 Comm: trinity-c184 Tainted: G             L 3.18.0+ #108
[25233.934172] task: ffff88022d25cc90 ti: ffff880229f78000 task.ti: ffff880229f78000
[25233.935239] RIP: 0010:[<ffffffff81593482>]  [<ffffffff81593482>] usb_unanchor_urb+0x12/0x60
[25233.936302] RSP: 0018:ffff88024e203d50  EFLAGS: 00000082
[25233.937352] RAX: 0000000000000000 RBX: ffff880242a2ecb0 RCX: 0000000000000000
[25233.938412] RDX: 000000000000000e RSI: 0000000000000046 RDI: ffff880242a2ecb0
[25233.939464] RBP: ffff88024e203d68 R08: ffffffff81f5dea8 R09: 00000000000fffe2
[25233.940527] R10: fffffffff8000000 R11: 0000000000000004 R12: ffff88009f014f38
[25233.941592] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880242080000
[25233.942660] FS:  00007f95dc296740(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[25233.943738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25233.944808] CR2: 0000003370219050 CR3: 000000022e902000 CR4: 00000000001407f0
[25233.945886] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25233.946955] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25233.948024] Stack:
[25233.949088]  ffff88009f014f38 0000000000000000 0000000000000000 ffff88024e203d98
[25233.950171]  ffffffff8158fc9c ffff880241d1d6c0 ffff880242a2ecb0 ffff8802416a4558
[25233.951243]  ffff880241d1d6c0 ffff88024e203dc8 ffffffff8158fec0 ffff88024e203dc8
[25233.952305] Call Trace:
[25233.953331]  <IRQ> 
[25233.954339]  [<ffffffff8158fc9c>] __usb_hcd_giveback_urb+0x5c/0x120
[25233.955323]  [<ffffffff8158fec0>] usb_hcd_giveback_urb+0x40/0xf0
[25233.956290]  [<ffffffff815d276f>] xhci_irq+0xccf/0x1bc0
[25233.957234]  [<ffffffff815d3671>] xhci_msi_irq+0x11/0x20
[25233.958161]  [<ffffffff810c4d2e>] handle_irq_event_percpu+0x3e/0x1c0
[25233.959072]  [<ffffffff810c4ef2>] handle_irq_event+0x42/0x70
[25233.959958]  [<ffffffff810c801f>] handle_edge_irq+0x7f/0x150
[25233.960829]  [<ffffffff81005371>] handle_irq+0xb1/0x140
[25233.961691]  [<ffffffff8179bbb3>] do_IRQ+0x53/0x100
[25233.962544]  [<ffffffff81799daf>] common_interrupt+0x6f/0x6f
[25233.963393]  <EOI> 
[25233.964240]  [<ffffffff81165a08>] ? unlock_page+0x18/0x90
[25233.965088]  [<ffffffff81180d42>] shmem_undo_range+0x1b2/0x710
[25233.965941]  [<ffffffff811812b8>] shmem_truncate_range+0x18/0x40
[25233.966786]  [<ffffffff81181520>] shmem_setattr+0x110/0x1a0
[25233.967622]  [<ffffffff811f3361>] notify_change+0x241/0x390
[25233.968453]  [<ffffffff811d3323>] do_truncate+0x73/0xc0
[25233.969279]  [<ffffffff811d36cc>] do_sys_ftruncate.constprop.14+0x10c/0x160
[25233.970113]  [<ffffffff8135862e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[25233.970943]  [<ffffffff811d375e>] SyS_ftruncate+0xe/0x10
[25233.971761]  [<ffffffff817992d2>] system_call_fastpath+0x12/0x17
[25233.972581] Code: 00 49 8d 7c 24 10 31 c9 ba 01 00 00 00 be 03 00 00 00 e8 42 50 b2 ff eb cd 0f 1f 44 00 00 55 48 85 ff 48 89 e5 41 56 41 55 41 54 <53> 48 89 fb 74 35 4c 8b 67 40 4d 85 e4 74 2c 4d 8d 6c 24 28 4c 
[25233.974377] NMI backtrace for cpu 3
[25233.975270] CPU: 3 PID: 17777 Comm: trinity-c206 Tainted: G             L 3.18.0+ #108
[25233.977090] task: ffff880228aa2bc0 ti: ffff88022936c000 task.ti: ffff88022936c000
[25233.978013] RIP: 0010:[<ffffffff81165a15>]  [<ffffffff81165a15>] unlock_page+0x25/0x90
[25233.978949] RSP: 0018:ffff88022936fbf8  EFLAGS: 00000202
[25233.979880] RAX: ffffea0008ee2d80 RBX: ffffea0008ee2d80 RCX: 00000000146c9000
[25233.980816] RDX: 002ffe000008001c RSI: 9e37fffffffc0001 RDI: ffffea0008ee2d80
[25233.981761] RBP: ffff88022936fbf8 R08: 0000000000001000 R09: ffffea0008ee2d80
[25233.982702] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000001000
[25233.983653] R13: ffff88022936fd60 R14: 0000000000000000 R15: ffff8800a251da08
[25233.984603] FS:  00007f95dc296740(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[25233.985565] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25233.986528] CR2: 0000000000000008 CR3: 000000022d223000 CR4: 00000000001407e0
[25233.987505] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25233.988484] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25233.989436] Stack:
[25233.990361]  ffff88022936fc28 ffffffff8117f380 00000000146c8000 0000000000001000
[25233.991308]  ffff88022936fd60 0000000000000000 ffff88022936fcc8 ffffffff8116532a
[25233.992263]  ffff88022936fd68 0000000000001000 ffff880228aa2bc0 00000000811f1681
[25233.993212] Call Trace:
[25233.994140]  [<ffffffff8117f380>] shmem_write_end+0x40/0xf0
[25233.995056]  [<ffffffff8116532a>] generic_perform_write+0x11a/0x1f0
[25233.995969]  [<ffffffff8116789f>] __generic_file_write_iter+0x15f/0x350
[25233.996873]  [<ffffffff811d5140>] ? new_sync_read+0xd0/0xd0
[25233.997776]  [<ffffffff81167acd>] generic_file_write_iter+0x3d/0xb0
[25233.998673]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25233.999570]  [<ffffffff811d5288>] do_iter_readv_writev+0x78/0xc0
[25234.000462]  [<ffffffff811d6a18>] do_readv_writev+0xd8/0x2a0
[25234.001357]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25234.002277]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25234.003155]  [<ffffffff810dac62>] ? __hrtimer_start_range_ns+0x252/0x380
[25234.004045]  [<ffffffff810dadc8>] ? hrtimer_start+0x18/0x20
[25234.004942]  [<ffffffff811d6c69>] vfs_writev+0x39/0x50
[25234.005827]  [<ffffffff811d6dc9>] SyS_writev+0x59/0xf0
[25234.006705]  [<ffffffff817992d2>] system_call_fastpath+0x12/0x17
[25234.007581] Code: 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 f8 48 8b 17 48 89 e5 83 e2 01 74 60 f0 80 27 fe 48 be 01 00 fc ff ff ff 37 9e 48 8b 17 <48> 0f af f7 48 89 d1 48 c1 ea 34 83 e2 03 48 c1 e9 36 48 8d 14 
[25234.009513] NMI backtrace for cpu 2
[25234.010438] CPU: 2 PID: 17723 Comm: trinity-c152 Tainted: G             L 3.18.0+ #108
[25234.012415] task: ffff880229a50af0 ti: ffff88022e918000 task.ti: ffff88022e918000
[25234.013417] RIP: 0010:[<ffffffff8103e006>]  [<ffffffff8103e006>] read_hpet+0x16/0x20
[25234.014445] RSP: 0018:ffff88022e91be58  EFLAGS: 00000046
[25234.015465] RAX: 000000002b67efe4 RBX: 000000000051c224 RCX: 0000000000000004
[25234.016505] RDX: 0000000000004949 RSI: ffff88022e91be98 RDI: ffffffff81c26f40
[25234.017547] RBP: ffff88022e91be58 R08: 0000000000000004 R09: 00000000004323f1
[25234.018587] R10: 0000000000000004 R11: 0000000000000202 R12: 000016f68c124117
[25234.019640] R13: 0000000000000000 R14: ffff880229825108 R15: 0000000000000000
[25234.020693] FS:  00007f95dc296740(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[25234.021740] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25234.022761] CR2: 00007f95dc055220 CR3: 000000014fecd000 CR4: 00000000001407e0
[25234.023813] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25234.024844] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25234.025871] Stack:
[25234.026890]  ffff88022e91be88 ffffffff810e0d3e ffff88022e91be98 ffff880229825108
[25234.027926]  000016f6dfac1843 ffff88022e91bf18 ffff88022e91beb8 ffffffff810da9da
[25234.028977]  0000000000000086 00000000749a29d5 ffff880229825108 ffff88022e91bf38
[25234.030035] Call Trace:
[25234.031074]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
[25234.032113]  [<ffffffff810da9da>] hrtimer_get_remaining+0x3a/0x70
[25234.033154]  [<ffffffff810dbbf6>] itimer_get_remtime+0x16/0x60
[25234.034172]  [<ffffffff810dc242>] do_setitimer+0xb2/0x260
[25234.035175]  [<ffffffff810dc439>] alarm_setitimer+0x49/0x90
[25234.036173]  [<ffffffff810da3ae>] SyS_alarm+0xe/0x20
[25234.037171]  [<ffffffff817992d2>] system_call_fastpath+0x12/0x17
[25234.038136] Code: 00 29 c7 ba 00 00 00 00 b8 c2 ff ff ff 83 ff 7f 5d 0f 4f c2 c3 0f 1f 44 00 00 55 48 8b 05 d3 c8 ec 00 48 89 e5 8b 80 f0 00 00 00 <89> c0 5d c3 66 0f 1f 44 00 00 0f 1f 44 00 00 8b 0d 29 c8 ec 00 
[25234.040197] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 109.194 msecs
[25261.864386] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c24:30558]
[25261.865435] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[25261.871111] CPU: 1 PID: 30558 Comm: trinity-c24 Tainted: G             L 3.18.0+ #108
[25261.873609] task: ffff88009b63abc0 ti: ffff8802283d4000 task.ti: ffff8802283d4000
[25261.874875] RIP: 0033:[<0000000000412fc8>]  [<0000000000412fc8>] 0x412fc8
[25261.876181] RSP: 002b:00007fff73bbaf70  EFLAGS: 00000202
[25261.877378] RAX: 00000000759aa432 RBX: ffffffff81799dfa RCX: 0000000000000038
[25261.878584] RDX: 000000000bc2aa05 RSI: 00007fff73bbaf4c RDI: 000000336f1b76e0
[25261.879738] RBP: 0000000000000002 R08: 000000336f1b70e8 R09: 000000336f1b7140
[25261.880885] R10: 0000000000000000 R11: 0000000000000206 R12: ffff8802283d7f78
[25261.882058] R13: 0000000000000000 R14: ffff8802283d4000 R15: ffff88009b63abc0
[25261.883248] FS:  00007f95dc296740(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[25261.884396] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25261.885546] CR2: 0000000000000001 CR3: 0000000235a45000 CR4: 00000000001407e0
[25261.886701] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25261.887903] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600

[25261.890230] sending NMI to other CPUs:
[25261.891374] NMI backtrace for cpu 0
[25261.892452] CPU: 0 PID: 30551 Comm: trinity-c4 Tainted: G             L 3.18.0+ #108
[25261.894667] task: ffff880229a515e0 ti: ffff88000e014000 task.ti: ffff88000e014000
[25261.895791] RIP: 0010:[<ffffffff81799260>]  [<ffffffff81799260>] system_call+0x0/0x3
[25261.896927] RSP: 0018:00007fff73bbafa8  EFLAGS: 00000046
[25261.898061] RAX: 000000000000003d RBX: 0000000000007780 RCX: 000000336eebc2fc
[25261.899206] RDX: 000000000000000b RSI: 00007fff73bbafb0 RDI: 0000000000007780
[25261.900327] RBP: 0000000000000000 R08: 00007f95dc296740 R09: 0000000000000000
[25261.901427] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f95dc0cb000
[25261.902524] R13: 00007f95dc0cb068 R14: 0000000000000000 R15: 0000000000000000
[25261.903615] FS:  00007f95dc296740(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[25261.904695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25261.905755] CR2: 000000336f1b7740 CR3: 0000000095b7d000 CR4: 00000000001407f0
[25261.906819] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25261.907870] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25261.908922] Stack:
[25261.909964]  000000000041401b 00007f95dc216768 00007f95dc0cb000 0000000000000000
[25261.911029]  00007f95dc0cb07c 00007f95dc0cb000 0000000000000000 00007f95dc0cb07c
[25261.912090]  0000000000416c03 00000000000042bc 0000000000000017 00000000cccccccd
[25261.913152] Call Trace:
[25261.914193]  <UNK> 
[25261.914200] Code: 
[25261.915236] 8b 3c 24 4c 8b 74 24 08 4c 8b 6c 24 10 4c 8b 64 24 18 48 8b 6c 24 20 48 8b 5c 24 28 48 83 c4 30 e9 6f 02 00 00 66 0f 1f 44 00 00 <0f> 01 f8 65 48 89 24 25 80 a0 00 00 65 48 8b 24 25 88 a8 00 00 
[25261.917470] NMI backtrace for cpu 3
[25261.918589] CPU: 3 PID: 30572 Comm: trinity-c230 Tainted: G             L 3.18.0+ #108
[25261.920852] task: ffff8800967736b0 ti: ffff88022cbc0000 task.ti: ffff88022cbc0000
[25261.922001] RIP: 0010:[<ffffffff8118f1c4>]  [<ffffffff8118f1c4>] iov_iter_fault_in_readable+0x64/0x80
[25261.923165] RSP: 0018:ffff88022cbc3c18  EFLAGS: 00000206
[25261.924316] RAX: 0000000000000000 RBX: 00000000029ef000 RCX: 0000000000001007
[25261.925475] RDX: 00007f95d952b800 RSI: 0000000000001000 RDI: 000000000001e800
[25261.926633] RBP: ffff88022cbc3c28 R08: ffff880066aca360 R09: 0000000000094524
[25261.927794] R10: ffff88022cbc3af8 R11: 00000000000003ff R12: 0000000000001000
[25261.928954] R13: ffff88022cbc3d60 R14: 0000000000001000 R15: ffff8800a2518d28
[25261.930112] FS:  00007f95dc296740(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[25261.931280] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25261.932458] CR2: 00007f95d9a2036e CR3: 000000022b2d6000 CR4: 00000000001407e0
[25261.933618] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25261.934753] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25261.935873] Stack:
[25261.936974]  0000000000001000 00ff8800a2518d28 ffff88022cbc3cc8 ffffffff811652b8
[25261.938081]  ffff88022cbc3c98 0000000000001000 ffff8800967736b0 00000000811f1681
[25261.939164]  0000000000f60000 0000000000000000 ffff88009f3c9400 ffffffff81828380
[25261.940227] Call Trace:
[25261.941261]  [<ffffffff811652b8>] generic_perform_write+0xa8/0x1f0
[25261.942290]  [<ffffffff8116789f>] __generic_file_write_iter+0x15f/0x350
[25261.943300]  [<ffffffff811d5140>] ? new_sync_read+0xd0/0xd0
[25261.944291]  [<ffffffff81167acd>] generic_file_write_iter+0x3d/0xb0
[25261.945270]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25261.946243]  [<ffffffff811d5288>] do_iter_readv_writev+0x78/0xc0
[25261.947210]  [<ffffffff811d6a18>] do_readv_writev+0xd8/0x2a0
[25261.948174]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25261.949141]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25261.950106]  [<ffffffff810dac62>] ? __hrtimer_start_range_ns+0x252/0x380
[25261.951068]  [<ffffffff810dadc8>] ? hrtimer_start+0x18/0x20
[25261.952027]  [<ffffffff811d6c69>] vfs_writev+0x39/0x50
[25261.952982]  [<ffffffff811d6dc9>] SyS_writev+0x59/0xf0
[25261.953928]  [<ffffffff817992d2>] system_call_fastpath+0x12/0x17
[25261.954865] Code: 10 0f 1f 00 40 8a 39 0f 1f 00 85 d2 40 88 7d ff 75 2d 48 63 f6 48 8d 54 31 ff 48 31 d1 48 f7 c1 00 f0 ff ff 74 0f 0f 1f 00 8a 12 <0f> 1f 00 88 55 ff 0f b6 55 ff c9 f3 c3 0f 1f 80 00 00 00 00 c9 
[25261.956947] NMI backtrace for cpu 2
[25261.957928] CPU: 2 PID: 30581 Comm: trinity-c42 Tainted: G             L 3.18.0+ #108
[25261.959978] task: ffff88022c1f2bc0 ti: ffff8800a2604000 task.ti: ffff8800a2604000
[25261.961020] RIP: 0033:[<00007fff73bd3bfe>]  [<00007fff73bd3bfe>] 0x7fff73bd3bfe
[25261.962065] RSP: 002b:00007fff73bbaf20  EFLAGS: 00000246
[25261.963081] RAX: 0000000043481221 RBX: 00007f95dbedd068 RCX: 000000336eeef4b9
[25261.964103] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 00007f95dbedd000
[25261.965115] RBP: 00007fff73bbaf20 R08: 002b34bd7fef2fdf R09: 00000000004d1e4c
[25261.966125] R10: 0000000000000002 R11: 0000000000000246 R12: 00007f95dbedd000
[25261.967135] R13: 00000000000001ff R14: 0000000000000000 R15: 00007f95dc2966a0
[25261.968161] FS:  00007f95dc296740(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[25261.969187] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25261.970198] CR2: 0000000000000008 CR3: 0000000040239000 CR4: 00000000001407e0
[25261.971236] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25261.972260] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600

[25289.848160] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c27:31317]
[25289.849258] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[25289.855194] CPU: 1 PID: 31317 Comm: trinity-c27 Tainted: G             L 3.18.0+ #108
[25289.857533] task: ffff8802415a0af0 ti: ffff8802290e4000 task.ti: ffff8802290e4000
[25289.858723] RIP: 0010:[<ffffffff81356c07>]  [<ffffffff81356c07>] clear_page_c_e+0x7/0x10
[25289.859915] RSP: 0000:ffff8802290e7ce0  EFLAGS: 00010246
[25289.861099] RAX: 0000000000000000 RBX: ffff8802415a0af0 RCX: 00000000000001c0
[25289.862267] RDX: 0000000000007a55 RSI: 0000000000000ead RDI: ffff880123d7ae40
[25289.863466] RBP: ffff8802290e7d18 R08: ffffea00048f0000 R09: 00000000000160f0
[25289.864660] R10: ffff88024e5d4d80 R11: 0000000000000000 R12: 0000000000000000
[25289.865808] R13: ffff8802415a0af0 R14: ffff88024e5d5b00 R15: 0000000200000000
[25289.866970] FS:  00007fd3c5236740(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[25289.868134] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25289.869294] CR2: 0000000001c00000 CR3: 0000000227866000 CR4: 00000000001407e0
[25289.870491] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25289.871683] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25289.872843] Stack:
[25289.874017]  ffffffff81196633 ffff880235814070 0000000001c00000 0000000000000000
[25289.875201]  ffff8800967dcc00 ffff880235814070 ffff8801c64b4600 ffff8802290e7d78
[25289.876372]  ffffffff811ca951 ffff8802290e7e18 ffffffff8115f002 ffffea00048f0000
[25289.877537] Call Trace:
[25289.878705]  [<ffffffff81196633>] ? clear_huge_page+0xa3/0x160
[25289.879878]  [<ffffffff811ca951>] do_huge_pmd_anonymous_page+0x151/0x3b0
[25289.881085]  [<ffffffff8115f002>] ? __perf_sw_event+0x1b2/0x200
[25289.882274]  [<ffffffff81194c1b>] handle_mm_fault+0x14b/0xe90
[25289.883439]  [<ffffffff810413cc>] __do_page_fault+0x20c/0x610
[25289.884554]  [<ffffffff8119b61d>] ? do_brk+0x24d/0x350
[25289.885663]  [<ffffffff8135866a>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[25289.886786]  [<ffffffff810417dc>] do_page_fault+0xc/0x10
[25289.887911]  [<ffffffff8179ae62>] page_fault+0x22/0x30
[25289.889033] Code: bc 0f 1f 00 e8 fb fa d1 ff 90 90 90 90 90 90 90 90 90 90 90 b9 00 02 00 00 31 c0 f3 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 <f3> aa c3 66 0f 1f 44 00 00 eb ee 0f 1f 84 00 00 00 00 00 0f 1f 
[25289.891387] sending NMI to other CPUs:
[25289.892488] NMI backtrace for cpu 2
[25289.893575] CPU: 2 PID: 350 Comm: trinity-c163 Tainted: G             L 3.18.0+ #108
[25289.895771] task: ffff8802293236b0 ti: ffff88009ab34000 task.ti: ffff88009ab34000
[25289.896886] RIP: 0010:[<ffffffff8136544a>]  [<ffffffff8136544a>] __list_add+0x1a/0xc0
[25289.898020] RSP: 0000:ffff88009ab37b08  EFLAGS: 00000092
[25289.899149] RAX: 0000000000000030 RBX: ffffea00079e7c20 RCX: 000000000000000d
[25289.900265] RDX: ffffea00079e7c60 RSI: ffff88024e296120 RDI: ffffea00079e7c20
[25289.901361] RBP: ffff88009ab37b28 R08: ffff88024e296120 R09: 00000000001e79f0
[25289.902457] R10: 0000000000000002 R11: 0000000000000002 R12: ffffea00079e7c60
[25289.903549] R13: ffff88024e296120 R14: 0000000000000002 R15: 0000000000000202
[25289.904623] FS:  00007fd3c5236740(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[25289.905686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25289.906746] CR2: 000000336ef629f2 CR3: 00000002359ea000 CR4: 00000000001407e0
[25289.907809] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25289.908869] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25289.909925] Stack:
[25289.910974]  002ffe000008000c ffffea00079e7c00 ffff88024e2960f0 0000000000000000
[25289.912045]  ffff88009ab37b78 ffffffff8116d218 00000000001e79f0 ffff88024e5d4d80
[25289.913122]  ffff880238414300 ffffea00079e7cc0 ffffea00079e7c00 0000000000000000
[25289.914193] Call Trace:
[25289.915250]  [<ffffffff8116d218>] free_hot_cold_page+0x148/0x1b0
[25289.916316]  [<ffffffff8116d2ce>] free_hot_cold_page_list+0x4e/0xc0
[25289.917375]  [<ffffffff81223d4f>] ? locks_alloc_lock+0x1f/0x70
[25289.918438]  [<ffffffff81173f62>] release_pages+0x1c2/0x280
[25289.919489]  [<ffffffff81174c93>] __pagevec_release+0x43/0x60
[25289.920534]  [<ffffffff81180ff0>] shmem_undo_range+0x460/0x710
[25289.921557]  [<ffffffff811812b8>] shmem_truncate_range+0x18/0x40
[25289.922553]  [<ffffffff81181520>] shmem_setattr+0x110/0x1a0
[25289.923527]  [<ffffffff811f3361>] notify_change+0x241/0x390
[25289.924478]  [<ffffffff811d3323>] do_truncate+0x73/0xc0
[25289.925401]  [<ffffffff811d36cc>] do_sys_ftruncate.constprop.14+0x10c/0x160
[25289.926315]  [<ffffffff8135862e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[25289.927214]  [<ffffffff811d3770>] compat_SyS_ftruncate+0x10/0x20
[25289.928093]  [<ffffffff8179b9e9>] ia32_do_call+0x13/0x13
[25289.928958] Code: ff ff e9 3b ff ff ff b8 f4 ff ff ff e9 31 ff ff ff 55 48 89 e5 41 55 49 89 f5 41 54 49 89 d4 53 48 89 fb 48 83 ec 08 4c 8b 42 08 <49> 39 f0 75 2e 4d 8b 45 00 4d 39 c4 75 6c 4c 39 e3 74 42 4c 39 
[25289.930848] NMI backtrace for cpu 3
[25289.931762] CPU: 3 PID: 31454 Comm: trinity-c164 Tainted: G             L 3.18.0+ #108
[25289.933619] task: ffff88009f3f95e0 ti: ffff880229bf4000 task.ti: ffff880229bf4000
[25289.934573] RIP: 0010:[<ffffffff811753f9>]  [<ffffffff811753f9>] cancel_dirty_page+0x39/0xc0
[25289.935536] RSP: 0000:ffff880229bf7c28  EFLAGS: 00000286
[25289.936482] RAX: 00000000fffffffb RBX: ffff8800a2518d28 RCX: 0000000000001fdd
[25289.937434] RDX: ffffea0008433d00 RSI: 0000000000001000 RDI: ffffea0008433d00
[25289.938386] RBP: ffff880229bf7c48 R08: 000000000001182a R09: ffff880121d39fc8
[25289.939345] R10: ffff880229bf7bd8 R11: 0000000000000220 R12: 0000000000001000
[25289.940295] R13: ffff880229bf7d40 R14: 0000000000000000 R15: 00000000000118ee
[25289.941237] FS:  00007fd3c5236740(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[25289.942185] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25289.943137] CR2: 00000000014f4fd8 CR3: 000000010750d000 CR4: 00000000001407e0
[25289.944088] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25289.945036] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25289.945982] Stack:
[25289.946922]  ffff880229bf7d40 ffffea0008433d00 ffff8800a2518d28 ffff880229bf7d40
[25289.947887]  ffff880229bf7c68 ffffffff811758fe 000000000000000a ffff880229bf7cd0
[25289.948859]  ffff880229bf7df8 ffffffff81180fab ffffea00027bbe00 ffff8800a2518be0
[25289.949826] Call Trace:
[25289.950786]  [<ffffffff811758fe>] truncate_inode_page+0x4e/0x90
[25289.951752]  [<ffffffff81180fab>] shmem_undo_range+0x41b/0x710
[25289.952721]  [<ffffffff811812b8>] shmem_truncate_range+0x18/0x40
[25289.953666]  [<ffffffff81181520>] shmem_setattr+0x110/0x1a0
[25289.954589]  [<ffffffff811f3361>] notify_change+0x241/0x390
[25289.955510]  [<ffffffff811d3323>] do_truncate+0x73/0xc0
[25289.956428]  [<ffffffff811d36cc>] do_sys_ftruncate.constprop.14+0x10c/0x160
[25289.957345]  [<ffffffff8135862e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[25289.958247]  [<ffffffff811d375e>] SyS_ftruncate+0xe/0x10
[25289.959129]  [<ffffffff817992d2>] system_call_fastpath+0x12/0x17
[25289.960005] Code: 89 f4 53 48 83 ec 08 f0 0f ba 37 04 72 14 48 83 c4 08 5b 41 5c 41 5d 5d c3 66 0f 1f 84 00 00 00 00 00 48 8b 5f 08 48 85 db 74 e3 <48> 8b 83 88 00 00 00 f6 40 20 01 75 d6 be 0b 00 00 00 e8 f0 e9 
[25289.961909] NMI backtrace for cpu 0
[25289.962833] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #108
[25289.964723] task: ffffffff81c16460 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[25289.965679] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25289.966648] RSP: 0018:ffffffff81c03e58  EFLAGS: 00000046
[25289.967596] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25289.968548] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[25289.969493] RBP: ffffffff81c03e88 R08: 0000000000000018 R09: 00000000000006c9
[25289.970443] R10: 0000000000000fc9 R11: 0000000000000400 R12: 0000000000000005
[25289.971384] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[25289.972314] FS:  0000000000000000(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[25289.973248] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25289.974183] CR2: 0000000000000008 CR3: 0000000001c11000 CR4: 00000000001407f0
[25289.975117] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25289.976047] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25289.976979] Stack:
[25289.977905]  0000000081c03e88 b7fee4166800c9e8 ffffe8ffffc01ba8 0000000000000005
[25289.978865]  ffffffff81c9e440 0000000000000000 ffffffff81c03ed8 ffffffff8163a995
[25289.979820]  00001703ad828700 ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25289.980783] Call Trace:
[25289.981737]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25289.982705]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25289.983669]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25289.984636]  [<ffffffff81787ac7>] rest_init+0x87/0x90
[25289.985602]  [<ffffffff81d1e032>] start_kernel+0x483/0x4a4
[25289.986565]  [<ffffffff81d1d99f>] ? set_init_arg+0x55/0x55
[25289.987521]  [<ffffffff81d1d581>] x86_64_start_reservations+0x2a/0x2c
[25289.988488]  [<ffffffff81d1d675>] x86_64_start_kernel+0xf2/0xf6
[25289.989425] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25317.831994] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [swapper/1:0]
[25317.833213] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[25317.839376] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G             L 3.18.0+ #108
[25317.841869] task: ffff8802444fed60 ti: ffff880244704000 task.ti: ffff880244704000
[25317.843147] RIP: 0010:[<ffffffff8163a9b9>]  [<ffffffff8163a9b9>] cpuidle_enter_state+0x79/0x190
[25317.844445] RSP: 0018:ffff880244707e48  EFLAGS: 00000246
[25317.845727] RAX: 0000170a2fdbc2c9 RBX: ffffffff81799e0d RCX: 0000000000000019
[25317.847008] RDX: 20c49ba5e353f7cf RSI: ffff880244707fd8 RDI: ffffffff81c26f40
[25317.848288] RBP: ffff880244707e88 R08: ffff88024e24ecf0 R09: 00000000ffffffff
[25317.849570] R10: 000000000000262f R11: 0000000000aaaaaa R12: ffff880244707db8
[25317.850860] R13: 0000000000000000 R14: ffff880244704000 R15: ffff8802444fed60
[25317.852157] FS:  0000000000000000(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[25317.853393] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25317.854633] CR2: 00007fda8919e000 CR3: 0000000001c11000 CR4: 00000000001407e0
[25317.855885] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25317.857121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25317.858353] Stack:
[25317.859581]  0000170a2f44020c ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25317.860841]  ffffe8ffffc41ba8 ffff880244704000 ffffffff81c9e440 ffff880244704000
[25317.862108]  ffff880244707e98 ffffffff8163ab87 ffff880244707f08 ffffffff810b9095
[25317.863330] Call Trace:
[25317.864545]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25317.865774]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25317.866999]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25317.868218] Code: c8 48 89 df ff 50 48 41 89 c5 e8 63 63 aa ff 44 8b 63 04 49 89 c7 0f 1f 44 00 00 e8 12 f8 af ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 2b 7d c0 4c 89 f8 49 c1 ff 3f 48 f7 ea b8 ff ff ff 7f 48 c1 
[25317.870820] sending NMI to other CPUs:
[25317.872055] NMI backtrace for cpu 0
[25317.873204] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #108
[25317.875524] task: ffffffff81c16460 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[25317.876676] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25317.877832] RSP: 0018:ffffffff81c03e58  EFLAGS: 00000046
[25317.878979] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25317.880133] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[25317.881280] RBP: ffffffff81c03e88 R08: 0000000000000018 R09: 0000000000000401
[25317.882431] R10: 0000000000000f4c R11: 0000000000000000 R12: 0000000000000005
[25317.883583] R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff81c00000
[25317.884735] FS:  0000000000000000(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[25317.885886] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25317.887030] CR2: 00000000006392c8 CR3: 0000000001c11000 CR4: 00000000001407f0
[25317.888181] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25317.889331] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25317.890475] Stack:
[25317.891613]  0000000081c03e88 b7fee4166800c9e8 ffffe8ffffc01ba8 0000000000000005
[25317.892770]  ffffffff81c9e440 0000000000000000 ffffffff81c03ed8 ffffffff8163a995
[25317.893926]  0000170a323f875b ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25317.895082] Call Trace:
[25317.896217]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25317.897361]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25317.898503]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25317.899646]  [<ffffffff81787ac7>] rest_init+0x87/0x90
[25317.900788]  [<ffffffff81d1e032>] start_kernel+0x483/0x4a4
[25317.901931]  [<ffffffff81d1d99f>] ? set_init_arg+0x55/0x55
[25317.903073]  [<ffffffff81d1d581>] x86_64_start_reservations+0x2a/0x2c
[25317.904216]  [<ffffffff81d1d675>] x86_64_start_kernel+0xf2/0xf6
[25317.905357] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25317.907755] NMI backtrace for cpu 3
[25317.908965] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #108
[25317.911350] task: ffff8802444fe270 ti: ffff880244718000 task.ti: ffff880244718000
[25317.912485] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25317.913548] RSP: 0018:ffff88024471be08  EFLAGS: 00000046
[25317.914602] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25317.915637] RDX: 0000000000000000 RSI: ffff88024471bfd8 RDI: 0000000000000003
[25317.916642] RBP: ffff88024471be38 R08: ffff88024e2cecf0 R09: 00000000ffffffff
[25317.917629] R10: 000000000000263b R11: 0000000000000400 R12: 0000000000000005
[25317.918611] R13: 0000000000000032 R14: 0000000000000004 R15: ffff880244718000
[25317.919581] FS:  0000000000000000(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[25317.920557] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25317.921528] CR2: 0000000000000024 CR3: 0000000001c11000 CR4: 00000000001407e0
[25317.922527] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25317.923502] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25317.924477] Stack:
[25317.925437]  000000034471be38 5fbbf5c12849ac78 ffffe8ffffcc1ba8 0000000000000005
[25317.926419]  ffffffff81c9e440 0000000000000003 ffff88024471be88 ffffffff8163a995
[25317.927423]  0000170a323e077e ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25317.928430] Call Trace:
[25317.929409]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25317.930397]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25317.931370]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25317.932373]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25317.933346] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25317.935489] NMI backtrace for cpu 2
[25317.936503] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.18.0+ #108
[25317.938500] task: ffff8802444f8af0 ti: ffff88024470c000 task.ti: ffff88024470c000
[25317.939486] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25317.940485] RSP: 0018:ffff88024470fe08  EFLAGS: 00000046
[25317.941483] RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
[25317.942501] RDX: 0000000000000000 RSI: ffff88024470ffd8 RDI: 0000000000000002
[25317.943457] RBP: ffff88024470fe38 R08: ffff88024e28ecf0 R09: 00000000ffffffff
[25317.944395] R10: 0000000000002639 R11: 0000000000000400 R12: 0000000000000005
[25317.945329] R13: 0000000000000032 R14: 0000000000000004 R15: ffff88024470c000
[25317.946259] FS:  0000000000000000(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[25317.947190] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25317.948124] CR2: 0000000000639058 CR3: 0000000001c11000 CR4: 00000000001407e0
[25317.949063] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25317.950009] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25317.950953] Stack:
[25317.951915]  000000024470fe38 c6aa8015cfd0a8b5 ffffe8ffffc81ba8 0000000000000005
[25317.952893]  ffffffff81c9e440 0000000000000002 ffff88024470fe88 ffffffff8163a995
[25317.953885]  0000170a323df4ab ffffffff81c9e610 ffffffff81c9e440 ffffffff81cfe330
[25317.954862] Call Trace:
[25317.955831]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25317.956807]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25317.957776]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25317.958741]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25317.959702] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25345.815706] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c103:12391]
[25345.816962] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[25345.823837] CPU: 1 PID: 12391 Comm: trinity-c103 Tainted: G             L 3.18.0+ #108
[25345.826600] task: ffff88022911abc0 ti: ffff880143340000 task.ti: ffff880143340000
[25345.827931] RIP: 0010:[<ffffffff8178ecca>]  [<ffffffff8178ecca>] __slab_alloc+0x4e5/0x53b
[25345.829277] RSP: 0018:ffff880143343ca0  EFLAGS: 00000246
[25345.830607] RAX: 0000000000000001 RBX: ffff880143343c60 RCX: 000000018020001e
[25345.831928] RDX: 0000000000000020 RSI: ffffea00076ef700 RDI: ffff880245825680
[25345.833232] RBP: ffff880143343d70 R08: ffff8801dbbdc000 R09: 0000000000000000
[25345.834528] R10: 0000000000000082 R11: 00003ffffffff000 R12: 00000000000000bb
[25345.835836] R13: ffffffff810133af R14: ffff880143343c20 R15: 000000010020001d
[25345.837084] FS:  00007f5c9bc2f740(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[25345.838337] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25345.839599] CR2: 00000000020d4968 CR3: 00000002299b4000 CR4: 00000000001407e0
[25345.840857] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25345.842117] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25345.843379] Stack:
[25345.844620]  01ffffff00000000 ffff88022ea27288 00007f5c97aa4fff 00007f5c97a24000
[25345.845915]  0000000000000000 ffff8801dbbdc600 0000000227852b98 ffffffff810750f5
[25345.847139]  ffffffff00000000 0000000000000246 00007f5c97aa5000 00007f5c97aa4fff
[25345.848385] Call Trace:
[25345.849599]  [<ffffffff810750f5>] ? copy_process.part.29+0x12a5/0x1970
[25345.850835]  [<ffffffff810750f5>] ? copy_process.part.29+0x12a5/0x1970
[25345.852052]  [<ffffffff811c012b>] kmem_cache_alloc+0x1bb/0x1e0
[25345.853258]  [<ffffffff810750f5>] copy_process.part.29+0x12a5/0x1970
[25345.854460]  [<ffffffff811d79ad>] ? get_empty_filp+0xdd/0x1f0
[25345.855664]  [<ffffffff81075981>] do_fork+0xe1/0x3d0
[25345.856837]  [<ffffffff81012047>] ? syscall_trace_enter_phase2+0xa7/0x1e0
[25345.858003]  [<ffffffff81075cf6>] SyS_clone+0x16/0x20
[25345.859172]  [<ffffffff81799609>] stub_clone+0x69/0x90
[25345.860312]  [<ffffffff817994c4>] ? tracesys_phase2+0xd4/0xd9
[25345.861429] Code: 00 02 00 00 49 c7 45 00 00 00 00 00 75 11 ff b5 78 ff ff ff 9d e8 07 b4 9a ff 4c 89 e0 eb 0f e8 fd b4 9a ff ff b5 78 ff ff ff 9d <4c> 89 e0 48 8b 75 c8 65 48 33 34 25 28 00 00 00 74 32 e8 ff 79 
[25345.863833] sending NMI to other CPUs:
[25345.864970] NMI backtrace for cpu 2
[25345.866051] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L 3.18.0+ #108
[25345.868240] task: ffff8802444f8af0 ti: ffff88024470c000 task.ti: ffff88024470c000
[25345.869350] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25345.870475] RSP: 0018:ffff88024470fe08  EFLAGS: 00000046
[25345.871594] RAX: 0000000000000001 RBX: 0000000000000002 RCX: 0000000000000001
[25345.872721] RDX: 0000000000000000 RSI: ffff88024470ffd8 RDI: 0000000000000002
[25345.873846] RBP: ffff88024470fe38 R08: 0000000000000018 R09: 0000000000001033
[25345.874967] R10: 000000000000331d R11: 0000000000000002 R12: 0000000000000002
[25345.876083] R13: 0000000000000001 R14: 0000000000000001 R15: ffff88024470c000
[25345.877191] FS:  0000000000000000(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[25345.878306] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25345.879419] CR2: 0000003370219050 CR3: 0000000229371000 CR4: 00000000001407e0
[25345.880537] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25345.881648] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25345.882753] Stack:
[25345.883844]  000000024470fe38 c6aa8015cfd0a8b5 ffffe8ffffc81ba8 0000000000000002
[25345.884958]  ffffffff81c9e440 0000000000000002 ffff88024470fe88 ffffffff8163a995
[25345.886071]  00001710b7b5ad88 ffffffff81c9e508 ffffffff81c9e440 ffffffff81cfe330
[25345.887189] Call Trace:
[25345.888301]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25345.889424]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25345.890517]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25345.891590]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25345.892652] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25345.894882] NMI backtrace for cpu 0
[25345.895925] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L 3.18.0+ #108
[25345.897992] task: ffffffff81c16460 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[25345.899014] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25345.900033] RSP: 0018:ffffffff81c03e58  EFLAGS: 00000046
[25345.901026] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000001
[25345.902020] RDX: 0000000000000000 RSI: ffffffff81c03fd8 RDI: 0000000000000000
[25345.903001] RBP: ffffffff81c03e88 R08: 0000000000000018 R09: 000000000000032a
[25345.903982] R10: 00000000000003f6 R11: 00000000000fff9d R12: 0000000000000001
[25345.904960] R13: 0000000000000000 R14: 0000000000000001 R15: ffffffff81c00000
[25345.905936] FS:  0000000000000000(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[25345.906947] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25345.907962] CR2: 0000000000000001 CR3: 00000000973c9000 CR4: 00000000001407f0
[25345.909015] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25345.910072] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25345.911128] Stack:
[25345.912132]  0000000081c03e88 b7fee4166800c9e8 ffffe8ffffc01ba8 0000000000000001
[25345.913130]  ffffffff81c9e440 0000000000000000 ffffffff81c03ed8 ffffffff8163a995
[25345.914121]  00001710b7b5d674 ffffffff81c9e4b0 ffffffff81c9e440 ffffffff81cfe330
[25345.915112] Call Trace:
[25345.916097]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25345.917082]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25345.918056]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25345.919028]  [<ffffffff81787ac7>] rest_init+0x87/0x90
[25345.920015]  [<ffffffff81d1e032>] start_kernel+0x483/0x4a4
[25345.920988]  [<ffffffff81d1d99f>] ? set_init_arg+0x55/0x55
[25345.921960]  [<ffffffff81d1d581>] x86_64_start_reservations+0x2a/0x2c
[25345.922929]  [<ffffffff81d1d675>] x86_64_start_kernel+0xf2/0xf6
[25345.923900] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25345.926016] NMI backtrace for cpu 3
[25345.927071] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G             L 3.18.0+ #108
[25345.929211] task: ffff8802444fe270 ti: ffff880244718000 task.ti: ffff880244718000
[25345.930264] RIP: 0010:[<ffffffff813b84eb>]  [<ffffffff813b84eb>] intel_idle+0xdb/0x180
[25345.931337] RSP: 0018:ffff88024471be08  EFLAGS: 00000046
[25345.932420] RAX: 0000000000000001 RBX: 0000000000000002 RCX: 0000000000000001
[25345.933563] RDX: 0000000000000000 RSI: ffff88024471bfd8 RDI: 0000000000000003
[25345.934692] RBP: ffff88024471be38 R08: 0000000000000018 R09: 0000000000000d4b
[25345.935804] R10: 0000000000002da4 R11: 0000000000000002 R12: 0000000000000002
[25345.936797] R13: 0000000000000001 R14: 0000000000000001 R15: ffff880244718000
[25345.937778] FS:  0000000000000000(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[25345.938766] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25345.939752] CR2: 0000000001e81930 CR3: 00000001f9d90000 CR4: 00000000001407e0
[25345.940747] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25345.941745] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25345.942743] Stack:
[25345.943728]  000000034471be38 5fbbf5c12849ac78 ffffe8ffffcc1ba8 0000000000000002
[25345.944753]  ffffffff81c9e440 0000000000000003 ffff88024471be88 ffffffff8163a995
[25345.945764]  00001710b7b5d25c ffffffff81c9e508 ffffffff81c9e440 ffffffff81cfe330
[25345.946777] Call Trace:
[25345.947789]  [<ffffffff8163a995>] cpuidle_enter_state+0x55/0x190
[25345.948807]  [<ffffffff8163ab87>] cpuidle_enter+0x17/0x20
[25345.949812]  [<ffffffff810b9095>] cpu_startup_entry+0x355/0x410
[25345.950811]  [<ffffffff8102fd0a>] start_secondary+0x1aa/0x230
[25345.951826] Code: 31 d2 65 48 8b 34 25 88 a8 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 88 a8 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48 
[25373.799486] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c17:29947]
[25373.800608] Modules linked in: dlci bridge snd_seq_dummy fuse 8021q garp stp tun bnep rfcomm scsi_transport_iscsi af_key llc2 hidp can_bcm sctp libcrc32c can_raw nfnetlink nfc caif_socket caif af_802154 ieee802154 phonet af_rxrpc bluetooth can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq e1000e shpchp snd_seq_device usb_debug coretemp hwmon x86_pkg_temp_thermal ptp pps_core snd_pcm nfsd snd_timer snd soundcore auth_rpcgss kvm_intel kvm crct10dif_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr serio_raw oid_registry nfs_acl lockd grace sunrpc
[25373.806511] CPU: 1 PID: 29947 Comm: trinity-c17 Tainted: G             L 3.18.0+ #108
[25373.808876] task: ffff88022908abc0 ti: ffff8801bf164000 task.ti: ffff8801bf164000
[25373.810057] RIP: 0010:[<ffffffff810933ee>]  [<ffffffff810933ee>] find_get_pid+0x1e/0x30
[25373.811282] RSP: 0018:ffff8801bf167ee8  EFLAGS: 00000202
[25373.812535] RAX: ffff8801bf33a140 RBX: ffffffff810850e0 RCX: ffff8801bf33a170
[25373.813781] RDX: 0000000000000030 RSI: ffffffff81c44840 RDI: 0000000000007d64
[25373.815031] RBP: ffff8801bf167ee8 R08: 00007f0b34598740 R09: 0000000000000000
[25373.816283] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[25373.817515] R13: ffff8801bf167eb0 R14: ffff8801bf33a140 R15: ffff8801bf167e68
[25373.818738] FS:  00007f0b34598740(0000) GS:ffff88024e240000(0000) knlGS:0000000000000000
[25373.819941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25373.821149] CR2: 000000336f1b7740 CR3: 0000000242a70000 CR4: 00000000001407e0
[25373.822379] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25373.823593] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25373.824799] Stack:
[25373.825983]  ffff8801bf167f78 ffffffff81079935 0000000f00000000 ffff8801bf33a140
[25373.827203]  0000000000000000 00007fff74741110 0000000000000000 0000000000000000
[25373.828434]  ffff88022908abc0 ffffffff8135862e 0000000000000246 0000000000000000
[25373.829660] Call Trace:
[25373.830833]  [<ffffffff81079935>] SyS_wait4+0x55/0x110
[25373.832014]  [<ffffffff8135862e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[25373.833206]  [<ffffffff817992d2>] system_call_fastpath+0x12/0x17
[25373.834409] Code: 80 00 00 00 00 31 c0 eb eb 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 65 ff 04 25 60 a8 00 00 e8 ca fe ff ff 48 85 c0 74 03 f0 ff 00 <65> ff 0c 25 60 a8 00 00 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 
[25373.837037] sending NMI to other CPUs:
[25373.838287] NMI backtrace for cpu 0
[25373.839429] CPU: 0 PID: 32111 Comm: modprobe Tainted: G             L 3.18.0+ #108
[25373.841745] task: ffff8802293be270 ti: ffff880103704000 task.ti: ffff880103704000
[25373.842890] RIP: 0010:[<ffffffff811f70b4>]  [<ffffffff811f70b4>] __lookup_mnt+0x64/0x80
[25373.844045] RSP: 0018:ffff880103707bb8  EFLAGS: 00000286
[25373.845190] RAX: ffff880241670780 RBX: ffff880103707da8 RCX: 000000000000000e
[25373.846327] RDX: ffff880244b56300 RSI: ffff880241b03ae8 RDI: ffff8802416707a0
[25373.847446] RBP: ffff880103707bb8 R08: 70646f6d2f6e7572 R09: ffff880103707bdc
[25373.848558] R10: ffff88014331e921 R11: 0000000000000003 R12: ffff880241b03ae8
[25373.849665] R13: ffff880103707c60 R14: ffff880103707c50 R15: ffff8802416707a0
[25373.850770] FS:  00007fdddcfb2740(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[25373.851886] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25373.853000] CR2: 00000000016211d8 CR3: 00000001c6569000 CR4: 00000000001407f0
[25373.854123] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25373.855252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25373.856371] Stack:
[25373.857476]  ffff880103707c18 ffffffff811e082a 0000000000000081 ffff880200000001
[25373.858600]  0000000203707be8 00000000d85f4923 ffff880103707c08 ffff88014331e925
[25373.859730]  ffff8802293be270 6f636e6c2e6d7471 00000000ffffff9c ffff880103707da8
[25373.860865] Call Trace:
[25373.861983]  [<ffffffff811e082a>] lookup_fast+0xca/0x2e0
[25373.863101]  [<ffffffff811e28a5>] link_path_walk+0x1a5/0x850
[25373.864213]  [<ffffffff811e3424>] path_lookupat+0x64/0x770
[25373.865321]  [<ffffffff811c006f>] ? kmem_cache_alloc+0xff/0x1e0
[25373.866424]  [<ffffffff811e5dff>] ? getname_flags+0x4f/0x1a0
[25373.867522]  [<ffffffff811e3b5b>] filename_lookup+0x2b/0xc0
[25373.868622]  [<ffffffff811e6e23>] user_path_at_empty+0x63/0xc0
[25373.869721]  [<ffffffff811d7600>] ? fput+0xa0/0xa0
[25373.870814]  [<ffffffff812577da>] ? ext4_release_dir+0x2a/0x40
[25373.871890]  [<ffffffff811e6e91>] user_path_at+0x11/0x20
[25373.872942]  [<ffffffff811daa63>] vfs_fstatat+0x63/0xc0
[25373.873985]  [<ffffffff811d77e8>] ? __fput+0x188/0x1f0
[25373.875004]  [<ffffffff811daf8e>] SYSC_newstat+0x2e/0x60
[25373.876000]  [<ffffffff810941b4>] ? task_work_run+0xc4/0xe0
[25373.876971]  [<ffffffff81002bb9>] ? do_notify_resume+0x59/0x80
[25373.877921]  [<ffffffff8135862e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[25373.878846]  [<ffffffff811db26e>] SyS_newstat+0xe/0x10
[25373.879757]  [<ffffffff817992d2>] system_call_fastpath+0x12/0x17
[25373.880651] Code: c2 48 8b 10 31 c0 48 85 d2 75 1c 5d c3 66 2e 0f 1f 84 00 00 00 00 00 48 39 72 18 74 23 48 8b 12 48 85 d2 74 17 66 90 48 8b 42 10 <48> 83 c0 20 48 39 c7 74 e3 48 8b 12 48 85 d2 75 eb 31 c0 5d c3 
[25373.882565] NMI backtrace for cpu 2
[25373.883509] CPU: 2 PID: 31661 Comm: trinity-c90 Tainted: G             L 3.18.0+ #108
[25373.885431] task: ffff88022e86a0d0 ti: ffff880039fc0000 task.ti: ffff880039fc0000
[25373.886415] RIP: 0010:[<ffffffff8118f1c4>]  [<ffffffff8118f1c4>] iov_iter_fault_in_readable+0x64/0x80
[25373.887422] RSP: 0018:ffff880039fc3c18  EFLAGS: 00000206
[25373.888422] RAX: 0000000000000000 RBX: 000000000468e000 RCX: 0000000000003001
[25373.889432] RDX: 00007f0b31a7ea00 RSI: 0000000000001000 RDI: 000000000026fa00
[25373.890454] RBP: ffff880039fc3c28 R08: ffff8800978c0cf0 R09: 00000000008b89e2
[25373.891462] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000001000
[25373.892470] R13: ffff880039fc3d60 R14: 0000000000001000 R15: ffff88022e9718b0
[25373.893546] FS:  00007f0b34598740(0000) GS:ffff88024e280000(0000) knlGS:0000000000000000
[25373.894591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25373.895602] CR2: 00007f0b33620294 CR3: 000000009f3b0000 CR4: 00000000001407e0
[25373.896666] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25373.897705] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25373.898733] Stack:
[25373.899757]  0000000000001000 00ff88022e9718b0 ffff880039fc3cc8 ffffffff811652b8
[25373.900776]  0000000054940ba3 0000000000001000 ffff88022e86a0d0 0000000000000a01
[25373.901816]  000000000468dffa 0000000000000000 ffff880071a82580 ffffffff81828380
[25373.902857] Call Trace:
[25373.903836]  [<ffffffff811652b8>] generic_perform_write+0xa8/0x1f0
[25373.904819]  [<ffffffff8116789f>] __generic_file_write_iter+0x15f/0x350
[25373.905807]  [<ffffffff811d5140>] ? new_sync_read+0xd0/0xd0
[25373.906776]  [<ffffffff81167acd>] generic_file_write_iter+0x3d/0xb0
[25373.907731]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25373.908671]  [<ffffffff811d5288>] do_iter_readv_writev+0x78/0xc0
[25373.909602]  [<ffffffff811d6a18>] do_readv_writev+0xd8/0x2a0
[25373.910526]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25373.911458]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25373.912427]  [<ffffffff810dac62>] ? __hrtimer_start_range_ns+0x252/0x380
[25373.913397]  [<ffffffff810dadc8>] ? hrtimer_start+0x18/0x20
[25373.914320]  [<ffffffff811d6c69>] vfs_writev+0x39/0x50
[25373.915231]  [<ffffffff811d6dc9>] SyS_writev+0x59/0xf0
[25373.916128]  [<ffffffff817992d2>] system_call_fastpath+0x12/0x17
[25373.917017] Code: 10 0f 1f 00 40 8a 39 0f 1f 00 85 d2 40 88 7d ff 75 2d 48 63 f6 48 8d 54 31 ff 48 31 d1 48 f7 c1 00 f0 ff ff 74 0f 0f 1f 00 8a 12 <0f> 1f 00 88 55 ff 0f b6 55 ff c9 f3 c3 0f 1f 80 00 00 00 00 c9 
[25373.918994] NMI backtrace for cpu 3
[25373.919961] CPU: 3 PID: 29575 Comm: trinity-c135 Tainted: G             L 3.18.0+ #108
[25373.921902] task: ffff88022d2fa0d0 ti: ffff88009aa0c000 task.ti: ffff88009aa0c000
[25373.922890] RIP: 0010:[<ffffffff81165a08>]  [<ffffffff81165a08>] unlock_page+0x18/0x90
[25373.923900] RSP: 0018:ffff88009aa0fbf8  EFLAGS: 00000202
[25373.924888] RAX: ffffea0001505d00 RBX: ffffea0001505d00 RCX: 007c8fc09347b000
[25373.925868] RDX: 0000000000000001 RSI: ffff88022e973770 RDI: ffffea0001505d00
[25373.926844] RBP: ffff88009aa0fbf8 R08: 0000000000001000 R09: ffffea0001505d00
[25373.927819] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000001000
[25373.928790] R13: ffff88009aa0fd60 R14: 0000000000000000 R15: ffff88022e973770
[25373.929770] FS:  00007f0b34598740(0000) GS:ffff88024e2c0000(0000) knlGS:0000000000000000
[25373.930759] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25373.931746] CR2: 00007f0b3320417d CR3: 0000000039ddb000 CR4: 00000000001407e0
[25373.932739] DR0: 00007fade9372000 DR1: 00007fe6ed782000 DR2: 0000000000000000
[25373.933800] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[25373.934848] Stack:
[25373.935836]  ffff88009aa0fc28 ffffffff8117f380 007c8fc09347a000 0000000000001000
[25373.936902]  ffff88009aa0fd60 0000000000000000 ffff88009aa0fcc8 ffffffff8116532a
[25373.938049]  ffff88009aa0fd68 0000000000001000 ffff88022d2fa0d0 0000000045806f40
[25373.939153] Call Trace:
[25373.940209]  [<ffffffff8117f380>] shmem_write_end+0x40/0xf0
[25373.941247]  [<ffffffff8116532a>] generic_perform_write+0x11a/0x1f0
[25373.942283]  [<ffffffff8116789f>] __generic_file_write_iter+0x15f/0x350
[25373.943317]  [<ffffffff811d5140>] ? new_sync_read+0xd0/0xd0
[25373.944323]  [<ffffffff81167acd>] generic_file_write_iter+0x3d/0xb0
[25373.945306]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25373.946298]  [<ffffffff811d5288>] do_iter_readv_writev+0x78/0xc0
[25373.947279]  [<ffffffff811d6a18>] do_readv_writev+0xd8/0x2a0
[25373.948237]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25373.949185]  [<ffffffff81167a90>] ? __generic_file_write_iter+0x350/0x350
[25373.950126]  [<ffffffff810dac62>] ? __hrtimer_start_range_ns+0x252/0x380
[25373.951061]  [<ffffffff810dadc8>] ? hrtimer_start+0x18/0x20
[25373.951985]  [<ffffffff811d6c69>] vfs_writev+0x39/0x50
[25373.952958]  [<ffffffff811d6dc9>] SyS_writev+0x59/0xf0
[25373.953919]  [<ffffffff817992d2>] system_call_fastpath+0x12/0x17
[25373.954946] Code: 41 f6 c4 01 0f 84 7d ff ff ff eb eb 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 f8 48 8b 17 48 89 e5 83 e2 01 74 60 f0 80 27 fe <48> be 01 00 fc ff ff ff 37 9e 48 8b 17 48 0f af f7 48 89 d1 48 
[25373.957052] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 118.758 msecs

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 14:30                                                                                               ` Chris Mason
@ 2014-12-19 15:12                                                                                                 ` Dave Jones
  0 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-19 15:12 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, Dec 19, 2014 at 09:30:37AM -0500, Chris Mason wrote:

 > > in more recent builds. I've been running kitchen-sink debug kernels
 > > for my trinity runs for the last three years, and it's only this
 > > last few months that this has got to be enough of a problem that I'm
 > > not seeing the more interesting bugs. (Or perhaps we're just getting
 > > better at fixing them in -next now, so my runs are lasting longer..)
 > 
 > I think we're also adding more and more debugging.  It's definitely a 
 > good thing, but I think a lot of them are expected to stay off until 
 > you're trying to track down a specific problem.  I do always run with 
 > CONFIG_DEBUG_PAGEALLOC here and lock debugging/lockdep, and aside from 
 > being slow haven't hit trouble.

I think in the new year I'll hack up something I run on each kernel
build that picks a random subset of the debug options.  It's been on
my whiteboard for a while anyway, to try and get more 'real world'
looking kernel testing.  If I can get enough machines to test on,
we should still get enough coverage to catch stuff early on.

It does seem like things have gotten so 'heavy' that a lot of what I've
been seeing has been ghosts. That said, there have also been several
real problems that have been shaken out during this thread over the last
two months, so I don't feel like we've wasted our time entirely.

 > I know it's 3.16 instead of 3.17, but 16K stacks are probably 
 > increasing the pressure on everything in these runs.  It's my favorite 
 > kernel feature this year, but it's likely to make trinity hurt more on 
 > memory constrained boxes.

That's actually a good point. Even just the forking/exiting overhead
is now much higher when we're starting & tearing down hundreds of child
processes every few seconds. Couple that with some children 'stuck'
in VM functions, and I could see the kernel struggling to find order-2
pages for a while. (Though never to the point where it fails.)
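
For anyone who hasn't looked at the 16K stack change, the order-2 pressure
comes straight from the thread_info/stack allocation. Roughly (simplified
sketch; the allocator helper and gfp flags are approximations):

#define THREAD_SIZE_ORDER   2                               /* 16K = 4 pages */
#define THREAD_SIZE         (PAGE_SIZE << THREAD_SIZE_ORDER)

/* every clone() effectively ends up doing: */
static struct thread_info *alloc_thread_info_node(struct task_struct *tsk, int node)
{
        /* order-2: four physically contiguous pages per child */
        struct page *page = alloc_pages_node(node, GFP_KERNEL, THREAD_SIZE_ORDER);

        return page ? page_address(page) : NULL;
}

so hundreds of forks every few seconds means hundreds of order-2 allocations
on top of whatever else trinity is doing to the page allocator.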

 > I know you have traces with a ton more output, but I'm still 
 > wondering if usb-serial and printk from NMI really get along well.  I'd 
 > try with debugging back on and serial consoles off.  We carry patches 
 > to make oom print less, just because the time spent on our slow 
 > emulated serial console is enough to back the box up into a death 
 > spiral.

So I'm running out of time on this, and will realistically only
have this machine over the weekend.  I can give that a try, hopefully
if it fails, it'll fail early so we can try something else.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 14:55                                                                                                   ` Dave Jones
@ 2014-12-19 15:14                                                                                                     ` Chris Mason
  2014-12-19 19:15                                                                                                     ` Linus Torvalds
  1 sibling, 0 replies; 486+ messages in thread
From: Chris Mason @ 2014-12-19 15:14 UTC (permalink / raw)
  To: Dave Jones
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin



On Fri, Dec 19, 2014 at 9:55 AM, Dave Jones <davej@redhat.com> wrote:
> On Thu, Dec 18, 2014 at 08:48:24PM -0800, Linus Torvalds wrote:
>  > On Thu, Dec 18, 2014 at 8:03 PM, Dave Jones <davej@redhat.com> 
> wrote:
>  > >
>  > > So the only thing that was on that could cause spinlock overhead
>  > > was DEBUG_SPINLOCK (and LOCK_STAT, though iirc that's not huge 
> either)
>  >
>  > So DEBUG_SPINLOCK does have one big downside if I recall correctly 
> -
>  > the debugging spinlocks are very much not fair. So they don't work
>  > like the real ticket spinlocks. That might have serious effects on 
> the
>  > contention case, with some thread not making any progress due to 
> just
>  > the implementation of the debug spinlocks.
> 
> With DEBUG_SPINLOCK disabled, I see the same behaviour.
> Lots of traces spewed, but it seems to run and run (at least so far).

Not quite the same: the spinlocks are gone.

-chris




^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 14:55                                                                                                   ` Dave Jones
  2014-12-19 15:14                                                                                                     ` Chris Mason
@ 2014-12-19 19:15                                                                                                     ` Linus Torvalds
  2014-12-19 19:44                                                                                                       ` Peter Zijlstra
                                                                                                                         ` (3 more replies)
  1 sibling, 4 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-19 19:15 UTC (permalink / raw)
  To: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, Dec 19, 2014 at 6:55 AM, Dave Jones <davej@redhat.com> wrote:
>
> With DEBUG_SPINLOCK disabled, I see the same behaviour.
> Lots of traces spewed, but it seems to run and run (at least so far).

Ok, so it's not spinlock debugging.

There are some interesting patterns here, once again. Lookie:

     RIP: 0010:   generic_exec_single+0xea/0x1b0
     RIP: 0010:   generic_exec_single+0xee/0x1b0
     RIP: 0010:   generic_exec_single+0xea/0x1b0
     RIP: 0010:   generic_exec_single+0xea/0x1b0
     RIP: 0010:   generic_exec_single+0xee/0x1b0
     RIP: 0010:   generic_exec_single+0xee/0x1b0
     RIP: 0010:   generic_exec_single+0xea/0x1b0
     sched: RT throttling activated
     RIP: 0010:   __slab_alloc+0x4e5/0x53b
     RIP: 0010:   copy_user_enhanced_fast_string+0x5/0x10
     RIP: 0033:   0x412fc8
     RIP: 0010:   clear_page_c_e+0x7/0x10
     RIP: 0010:   cpuidle_enter_state+0x79/0x190
     RIP: 0010:   __slab_alloc+0x4e5/0x53b
     RIP: 0010:   find_get_pid+0x1e/0x30

so now copy_page_range() is gone, but every single case before the RT
throttling is activated is that zap_page_range() followed by the TLB
invalidate that we saw last time.

And after RT throttling, it's random (not even always trinity), but
that's probably because the watchdog thread doesn't run reliably any
more.

Another pattern in this one: it's CPU#1 that is stuck. Every single
time. There are stack traces from other CPUs, but they are all the
NMI broadcast *due* to the soft lockup on CPU#1.

And that is true even after the RT throttling thing.

And let's take another look at your previous one (with lock debugging,
but that config detail is clearly not that important - it hasn't
really changed anything major except make that lock very visible):

     RIP: 0010:   lock_acquire+0xb4/0x120
     RIP: 0010:   lock_acquire+0xb4/0x120
     RIP: 0010:   generic_exec_single+0xee/0x1b0
     RIP: 0010:   lock_acquire+0xb4/0x120
     RIP: 0010:   lock_acquire+0xb4/0x120
     RIP: 0010:   lock_acquire+0xb4/0x120
     RIP: 0010:   lock_acquire+0xb4/0x120
     RIP: 0010:   lock_acquire+0xb4/0x120
     RIP: 0010:   lock_acquire+0xb4/0x120
     sched: RT throttling activated
     RIP: 0010:   shmem_write_end+0x65/0xf0
     RIP: 0010:   _raw_spin_unlock_irqrestore+0x38/0x60
     RIP: 0010:   copy_user_enhanced_fast_string+0x5/0x10
     RIP: 0010:   copy_user_enhanced_fast_string+0x5/0x10
     RIP: 0010:   __slab_alloc+0x52f/0x58f
     RIP: 0010:   map_id_up+0x9/0x80
     RIP: 0010:   cpuidle_enter_state+0x79/0x190
     RIP: 0010:   unmap_single_vma+0x7d9/0x900
     RIP: 0010:   cpuidle_enter_state+0x79/0x190

same pattern: after the RT throttling, it's random because the
watchdog is no longer sane; before that it's always reliably either
the lock_acquire as part of copy_page_range(), or it's that TLB flush
as part of zap_page_range().

And the CPU patterns are interesting too:

   NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
   NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
   NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [trinity-c154:22823]
   NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
   NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
   NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
   NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
   NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]
   NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c195:20128]

CPU#1 again, *except* for the third lockup, which happens to match the
exact same pattern of copy_page_range() (CPU#1) vs zap_page_range()
(CPU#2).

It's also the case that before the RT throttling, it really does seem
to be one particular thread (i.e. trinity-c195:20128 does the
copy_page_range on your previous one, and in the newer one it's
trinity-c205:636 that does the zap_page_range()). So those threads
really seem to be stuck for real. The fact that they *eventually* go
away at all is interesting in itself.

And that "generic_exec_single()" place where it is stuck is the
instruction after the "pause" (aka "cpu_relax()") in the final
"csd_lock_wait()" once more. So it's waiting on some CPU to pick up
the IPI, and that never happens.
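
For reference, that wait is essentially just this (sketch of kernel/smp.c
from memory, so the flag and field names may be slightly off):

/* caller side of smp_call_function_single() and friends */
static void csd_lock_wait(struct call_single_data *csd)
{
        /*
         * cpu_relax() is "pause" on x86, which is why the RIP lands on the
         * instruction right after it.  If the target CPU never runs the IPI
         * handler and clears CSD_FLAG_LOCK, this loop never terminates.
         */
        while (csd->flags & CSD_FLAG_LOCK)
                cpu_relax();
}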

Here's another pattern. In your latest thing, every single time that
CPU1 is waiting for some other CPU to pick up the IPI, we have CPU0
doing this:

[24998.060963] NMI backtrace for cpu 0
[24998.061989] CPU: 0 PID: 2940 Comm: trinity-c150 Not tainted 3.18.0+ #108
[24998.064073] task: ffff8801bf3536b0 ti: ffff880197e0c000 task.ti: ffff880197e0c000
[24998.065137] RIP: 0010:[<ffffffff8103e006>]  [<ffffffff8103e006>] read_hpet+0x16/0x20
[24998.083577]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
[24998.084450]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
[24998.085315]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
[24998.086173]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
[24998.087025]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
[24998.087877]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
[24998.088732]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
[24998.089583]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
[24998.090435]  <EOI>
[24998.091279]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
[24998.092118]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
[24998.092951]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
[24998.093779]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70

and I think that's the smoking gun. The reason CPU0 isn't picking up
any IPIs is that it is in some endless loop around read_hpet().

There is even time information in the register dump:

 RAX: 0000000061fece8a RBX: 0000000000510792 RCX: 0000000000000000
 RAX: 0000000079e588fc RBX: 0000000000511d6e RCX: 0000000000000000
 RAX: 0000000091ca7f65 RBX: 0000000000513346 RCX: 0000000000000000
 RAX: 00000000a9afbd0d RBX: 000000000051491e RCX: 0000000000000000
 RAX: 00000000cbd1340c RBX: 000000000051684a RCX: 0000000000000000
 RAX: 00000000fb9d303f RBX: 00000000005193fc RCX: 0000000000000000
 RAX: 000000002b67efe4 RBX: 000000000051c224 RCX: 0000000000000004

That RAX value is the value we just read from the HPET, and RBX seems
to be monotonically increasing too, so it's likely the sequence
counter in ktime_get().
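
For context, the read side is roughly this (heavily simplified, field names
approximate for the current timekeeping code):

ktime_t ktime_get(void)
{
        struct timekeeper *tk = &tk_core.timekeeper;
        unsigned int seq;
        ktime_t base;
        s64 nsecs;

        do {
                seq   = read_seqcount_begin(&tk_core.seq);
                base  = tk->tkr.base_mono;
                nsecs = timekeeping_get_ns(&tk->tkr);   /* -> read_hpet() */
        } while (read_seqcount_retry(&tk_core.seq, seq));

        return ktime_add_ns(base, nsecs);
}

The retry loop only spins while a timekeeping update is in flight, so the
sequence count creeping up slowly between dumps suggests ktime_get() keeps
being re-entered from the timer path rather than spinning in that loop.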

So it's not stuck *inside* read_hpet(), and it's almost certainly not
the loop over the sequence counter in ktime_get() either (it's not
increasing *that* quickly). But some basically infinite __run_hrtimer
thing or something?

In your earlier trace (with spinlock debugging), the softlockup
detection was in lock_acquire for copy_page_range(), but CPU2 was
always in that "generic_exec_single" due to a TLB flush from that
zap_page_range thing again. But there are no timer traces from that
one, so I dunno.

Anyway, I do think we're getting somewhere. Your traces are
interesting and have real patterns in them. Which is very different
from the mess it used to be.

                            Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 19:15                                                                                                     ` Linus Torvalds
@ 2014-12-19 19:44                                                                                                       ` Peter Zijlstra
  2014-12-19 19:51                                                                                                       ` Linus Torvalds
                                                                                                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 486+ messages in thread
From: Peter Zijlstra @ 2014-12-19 19:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, Dec 19, 2014 at 11:15:21AM -0800, Linus Torvalds wrote:

>      sched: RT throttling activated

> 
> And after RT throttling, it's random (not even always trinity), but
> that's probably because the watchdog thread doesn't run reliably any
> more.

So if we want to shoot that RT throttling in the head you can do:

echo -1 > /proc/sys/kernel/sched_rt_runtime_us

That should completely disable that stuff; of course, at that point a
runaway RR/FIFO thread will hog your CPU indefinitely.

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 19:15                                                                                                     ` Linus Torvalds
  2014-12-19 19:44                                                                                                       ` Peter Zijlstra
@ 2014-12-19 19:51                                                                                                       ` Linus Torvalds
  2014-12-19 20:46                                                                                                         ` Linus Torvalds
  2014-12-19 20:31                                                                                                       ` Chris Mason
  2014-12-19 23:14                                                                                                       ` Thomas Gleixner
  3 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-19 19:51 UTC (permalink / raw)
  To: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, Dec 19, 2014 at 11:15 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> In your earlier trace (with spinlock debugging), the softlockup
> detection was in lock_acquire for copy_page_range(), but CPU2 was
> always in that "generic_exec_single" due to a TLB flush from that
> zap_page_range thing again. But there are no timer traces from that
> one, so I dunno.

Ahh, and that's because the TLB flushing is done under the page table
lock these days (see commit 1cf35d47712d: "mm: split 'tlb_flush_mmu()'
into tlb flushing and memory freeing parts").

Which means that if the TLB flushing gets stuck on CPU#2, CPU#1 that
is trying to get the page table lock will be locked up too.

So this is all very consistent, actually. The underlying bug in both
cases seems to be that the IPI for the TLB flushing doesn't happen for
some reason.

In your second trace, that's explained by the fact that CPU0 is in a
timer interrupt. In the first trace with spinlock debugging, no such
obvious explanation exists. It could be that an IPI has gotten lost
for some reason.

However, the first trace does have this:

   NMI backtrace for cpu 3
   INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too
long to run: 66.180 msecs
   CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0+ #107
   RIP: 0010:   intel_idle+0xdb/0x180
   Code: 31 d2 65 48 8b 34 ...
   INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too
long to run: 95.053 msecs

so something odd is happening (probably on CPU3). It took a long time
to react to the NMI IPI too.

So there's definitely something screwy going on here in IPI-land.

I do note that we depend on the "new mwait" semantics where we do
mwait with interrupts disabled and a non-zero RCX value. Are there
possibly even any known CPU errata in that area? Not that it sounds
likely, but still..

                         Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 19:15                                                                                                     ` Linus Torvalds
  2014-12-19 19:44                                                                                                       ` Peter Zijlstra
  2014-12-19 19:51                                                                                                       ` Linus Torvalds
@ 2014-12-19 20:31                                                                                                       ` Chris Mason
  2014-12-19 20:36                                                                                                         ` Dave Jones
  2014-12-19 23:22                                                                                                         ` Thomas Gleixner
  2014-12-19 23:14                                                                                                       ` Thomas Gleixner
  3 siblings, 2 replies; 486+ messages in thread
From: Chris Mason @ 2014-12-19 20:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, Dec 19, 2014 at 11:15:21AM -0800, Linus Torvalds wrote:
> Here's another pattern. In your latest thing, every single time that
> CPU1 is waiting for some other CPU to pick up the IPI, we have CPU0
> doing this:
> 
> [24998.060963] NMI backtrace for cpu 0
> [24998.061989] CPU: 0 PID: 2940 Comm: trinity-c150 Not tainted 3.18.0+ #108
> [24998.064073] task: ffff8801bf3536b0 ti: ffff880197e0c000 task.ti:
> ffff880197e0c000
> [24998.065137] RIP: 0010:[<ffffffff8103e006>]  [<ffffffff8103e006>]
> read_hpet+0x16/0x20
> [24998.083577]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
> [24998.084450]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
> [24998.085315]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
> [24998.086173]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
> [24998.087025]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
> [24998.087877]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
> [24998.088732]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
> [24998.089583]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
> [24998.090435]  <EOI>
> [24998.091279]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
> [24998.092118]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
> [24998.092951]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
> [24998.093779]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
> 
> and I think that's the smoking gun. The reason CPU0 isn't picking up
> any IPI's is because it is in some endless loop around read_hpet().
> 
> There is even time information in the register dump:
> 
>  RAX: 0000000061fece8a RBX: 0000000000510792 RCX: 0000000000000000
>  RAX: 0000000079e588fc RBX: 0000000000511d6e RCX: 0000000000000000
>  RAX: 0000000091ca7f65 RBX: 0000000000513346 RCX: 0000000000000000
>  RAX: 00000000a9afbd0d RBX: 000000000051491e RCX: 0000000000000000
>  RAX: 00000000cbd1340c RBX: 000000000051684a RCX: 0000000000000000
>  RAX: 00000000fb9d303f RBX: 00000000005193fc RCX: 0000000000000000
>  RAX: 000000002b67efe4 RBX: 000000000051c224 RCX: 0000000000000004
> 
> That RAX value is the value we just read from the HPET, and RBX seems
> to be monotonically increasing too, so it's likely the sequence
> counter in ktime_get().
> 
> So it's not stuck *inside* read_hpet(), and it's almost certainly not
> the loop over the sequence counter in ktime_get() either (it's not
> increasing *that* quickly). But some basically infinite __run_hrtimer
> thing or something?

Really interesting.

So, we're calling __run_hrtimer in a loop:

                while ((node = timerqueue_getnext(&base->active))) {
				...
				__run_hrtimer(timer, &basenow);
				...
		}

The easy question is how often does trinity call nanosleep?

Looking at __run_hrtimer(), it drops the lock and runs the function and then
takes the lock again, maybe enqueueing us again right away.

timer->state is supposed to protect us from other CPUs jumping in and doing
something else with the timer, but it feels racy wrt remove_hrtimer().
Something like this, but I'm not sure how often __hrtimer_start_range_ns gets
called:

CPU 0						CPU 1
__run_hrtimer()
    timer->state = HRTIMER_STATE_CALLBACK
    removed from list
    unlock cpu_base->lock
    restart = fn(timer)
    						__hrtimer_start_range_ns()
						base = lock_hrtimer_base()
						ret = remove_hrtimer()
						    finds timer->state = HRTIMER_STATE_CALLBACK
						    does nothing
						new_base = switch_hrtimer_base()
						    now we're on a different base, different lock
    lock(cpu_base->lock)
    enqueue the timer
    						enqueue the timer
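
(For readers following along, the window being described is in
__run_hrtimer(); the condensed sketch below is paraphrased from memory of the
3.18-era kernel/hrtimer.c, not the verbatim source.)

	static void __run_hrtimer(struct hrtimer *timer, ktime_t *now)
	{
		struct hrtimer_clock_base *base = timer->base;
		struct hrtimer_cpu_base *cpu_base = base->cpu_base;
		enum hrtimer_restart (*fn)(struct hrtimer *) = timer->function;
		int restart;

		/* mark the timer HRTIMER_STATE_CALLBACK and unlink it */
		__remove_hrtimer(timer, base, HRTIMER_STATE_CALLBACK, 0);

		/* the window: the callback runs without cpu_base->lock held */
		raw_spin_unlock(&cpu_base->lock);
		restart = fn(timer);
		raw_spin_lock(&cpu_base->lock);

		/* ...and may be put straight back on the queue */
		if (restart != HRTIMER_NORESTART)
			enqueue_hrtimer(timer, base);

		timer->state &= ~HRTIMER_STATE_CALLBACK;
	}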

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 20:31                                                                                                       ` Chris Mason
@ 2014-12-19 20:36                                                                                                         ` Dave Jones
  2014-12-19 23:22                                                                                                         ` Thomas Gleixner
  1 sibling, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-19 20:36 UTC (permalink / raw)
  To: Chris Mason, Linus Torvalds, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, Dec 19, 2014 at 03:31:36PM -0500, Chris Mason wrote:

 > > So it's not stuck *inside* read_hpet(), and it's almost certainly not
 > > the loop over the sequence counter in ktime_get() either (it's not
 > > increasing *that* quickly). But some basically infinite __run_hrtimer
 > > thing or something?
 > 
 > Really interesting.
 > 
 > So, we're calling __run_hrtimer in a loop:
 > 
 >                 while ((node = timerqueue_getnext(&base->active))) {
 > 				...
 > 				__run_hrtimer(timer, &basenow);
 > 				...
 > 		}
 > 
 > The easy question is how often does trinity call nanosleep?

It shouldn't call it directly. (syscalls/nanosleep.c)

/*
 * SYSCALL_DEFINE2(nanosleep, struct timespec __user *, rqtp, struct timespec __user *, rmtp)
 */
#include "sanitise.h"

struct syscallentry syscall_nanosleep = {
	.name = "nanosleep",
	.num_args = 2,
	.arg1name = "rqtp",
	.arg1type = ARG_ADDRESS,
	.arg2name = "rmtp",
	.arg2type = ARG_ADDRESS,
	.flags = AVOID_SYSCALL, // Boring.  Can cause long sleeps.
};


That last line being the key one.  We used to do it, but it's well.. boring.
We could do something smarter, but given it's never triggered anything
interesting in the past, focussing runtime on the more interesting
syscalls seems to have been more fruitful.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 19:51                                                                                                       ` Linus Torvalds
@ 2014-12-19 20:46                                                                                                         ` Linus Torvalds
  2014-12-19 20:54                                                                                                           ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-19 20:46 UTC (permalink / raw)
  To: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, Dec 19, 2014 at 11:51 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I do note that we depend on the "new mwait" semantics where we do
> mwait with interrupts disabled and a non-zero RCX value. Are there
> possibly even any known CPU errata in that area? Not that it sounds
> likely, but still..

Remind me what CPU you have in that machine again? The %rax value for
the mwait cases in question seems to be 0x32, which is either C7s-HSW
or C7s-BDW, and in both cases has the "TLB flushed" flag set.

I'm pretty sure you have a Haswell, I'm just checking. Which model?
I'm assuming it's family 6, model 60, stepping 3? I found you
mentioning i5-4670T in a perf thread.. That the one?

Anyway, I don't actually believe in any CPU bugs, but you could try
"intel_idle.max_cstate=0" and see if that makes any difference, for
example.

Or perhaps just "intel_idle.max_cstate=1", which leaves intel_idle
active, but gets rid of the deeper sleep states (that incidentally
also play games with leave_mm() etc)

                            Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 20:46                                                                                                         ` Linus Torvalds
@ 2014-12-19 20:54                                                                                                           ` Dave Jones
  2014-12-19 22:05                                                                                                             ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-19 20:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, Dec 19, 2014 at 12:46:16PM -0800, Linus Torvalds wrote:
 > On Fri, Dec 19, 2014 at 11:51 AM, Linus Torvalds
 > <torvalds@linux-foundation.org> wrote:
 > >
 > > I do note that we depend on the "new mwait" semantics where we do
 > > mwait with interrupts disabled and a non-zero RCX value. Are there
 > > possibly even any known CPU errata in that area? Not that it sounds
 > > likely, but still..
 > 
 > Remind me what CPU you have in that machine again? The %rax value for
 > the mwait cases in question seems to be 0x32, which is either C7s-HSW
 > or C7s-BDW, and in both cases has the "TLB flushed" flag set.
 > 
 > I'm pretty sure you have a Haswell, I'm just checking. Which model?
 > I'm assuming it's family 6, model 60, stepping 3? I found you
 > mentioning i5-4670T in a perf thread.. That the one?

Yep.

vendor_id	: GenuineIntel
cpu family	: 6
model		: 60
model name	: Intel(R) Core(TM) i5-4670T CPU @ 2.30GHz
stepping	: 3
microcode	: 0x1a

 > Anyway, I don't actually believe in any CPU bugs, but you could try
 > "intel_idle.max_cstate=0" and see if that makes any difference, for
 > example.
 > 
 > Or perhaps just "intel_idle.max_cstate=1", which leaves intel_idle
 > active, but gets rid of the deeper sleep states (that incidentally
 > also play games with leave_mm() etc)

So I'm leaving Red Hat on Tuesday, and can realistically only do one
more experiment over the weekend before I give them this box back.

Right now I'm doing Chris' idea of "turn debugging back on,
and try without serial console".  Shall I try your suggestion
on top of that ?

I *hate* for this to be "the one that got away", but we've
at least gotten some good mileage out of this bug in the last
two months.  Who knows, maybe I'll find some new hardware that
will exhibit the same behaviour in the new year.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 20:54                                                                                                           ` Dave Jones
@ 2014-12-19 22:05                                                                                                             ` Linus Torvalds
  2014-12-20 16:49                                                                                                               ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-19 22:05 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

On Fri, Dec 19, 2014 at 12:54 PM, Dave Jones <davej@redhat.com> wrote:
>
> Right now I'm doing Chris' idea of "turn debugging back on,
> and try without serial console".  Shall I try your suggestion
> on top of that ?

Might as well. I doubt it really will make any difference, but I also
don't think it will interact badly in any way, so..

                      Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 19:15                                                                                                     ` Linus Torvalds
                                                                                                                         ` (2 preceding siblings ...)
  2014-12-19 20:31                                                                                                       ` Chris Mason
@ 2014-12-19 23:14                                                                                                       ` Thomas Gleixner
  2014-12-19 23:55                                                                                                         ` Linus Torvalds
  3 siblings, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-19 23:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, 19 Dec 2014, Linus Torvalds wrote:
> Here's another pattern. In your latest thing, every single time that
> CPU1 is waiting for some other CPU to pick up the IPI, we have CPU0
> doing this:
> 
> [24998.060963] NMI backtrace for cpu 0
> [24998.061989] CPU: 0 PID: 2940 Comm: trinity-c150 Not tainted 3.18.0+ #108
> [24998.064073] task: ffff8801bf3536b0 ti: ffff880197e0c000 task.ti:
> ffff880197e0c000
> [24998.065137] RIP: 0010:[<ffffffff8103e006>]  [<ffffffff8103e006>]
> read_hpet+0x16/0x20
> [24998.083577]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
> [24998.084450]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
> [24998.085315]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
> [24998.086173]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
> [24998.087025]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
> [24998.087877]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
> [24998.088732]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
> [24998.089583]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
> [24998.090435]  <EOI>
> [24998.091279]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
> [24998.092118]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
> [24998.092951]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
> [24998.093779]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
> 
> and I think that's the smoking gun. The reason CPU0 isn't picking up
> any IPI's is because it is in some endless loop around read_hpet().
> 
> There is even time information in the register dump:
> 
>  RAX: 0000000061fece8a RBX: 0000000000510792 RCX: 0000000000000000
>  RAX: 0000000079e588fc RBX: 0000000000511d6e RCX: 0000000000000000
>  RAX: 0000000091ca7f65 RBX: 0000000000513346 RCX: 0000000000000000
>  RAX: 00000000a9afbd0d RBX: 000000000051491e RCX: 0000000000000000
>  RAX: 00000000cbd1340c RBX: 000000000051684a RCX: 0000000000000000
>  RAX: 00000000fb9d303f RBX: 00000000005193fc RCX: 0000000000000000
>  RAX: 000000002b67efe4 RBX: 000000000051c224 RCX: 0000000000000004
> 
> That RAX value is the value we just read from the HPET, and RBX seems
> to be monotonically increasing too, so it's likely the sequence
> counter in ktime_get().

Here is the full diff of the first and the second splat for CPU0

 task: ffff8801bf3536b0 ti: ffff880197e0c000 task.ti: ffff880197e0c000
 read_hpet+0x16/0x20
 RSP: 0018:ffff88024e203e38  EFLAGS: 00000046
-RAX: 0000000061fece8a RBX: 0000000000510792 RCX: 0000000000000000
+RAX: 0000000079e588fc RBX: 0000000000511d6e RCX: 0000000000000000

RAX: 
(0x0000000079e588fc - 0x0000000061fece8a) / 14.318180e6 ~= 28.0061

So HPET @14.3MHz progressed by 28 seconds, which matches the splat
delta between the first and the second one.

25026.001132 - 24998.017355 = 27.9838

RBX:
0x0000000000511d6e - 0x0000000000510792 = 5596 

The sequence counter increments by 2 per tick. So:

28 / (5596/2) ~= 0.01 s

==> HZ = 100

The sequence counter is even, so ktime_get() will succeed.

 RDX: 0000000000000000 RSI: ffff88024e20c710 RDI: ffffffff81c26f40
 RBP: ffff88024e203e38 R08: 0000000000000000 R09: 000000000000000f
-R10: 0000000000000526 R11: 000000000000000f R12: 000016bf99600917
+R10: 0000000000000526 R11: 000000000000000f R12: 000016c61e4e2117

R12:
0x000016c61e4e2117 - 0x000016bf99600917 = 2.8e+10

That's the nanoseconds timestamp: 2.8e10/1e9 = 28
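
(The same arithmetic as a standalone program, for anyone who wants to
double-check the values pulled out of the register dumps; 14.318180 MHz is the
standard HPET frequency assumed above.)

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		/* RAX: HPET main counter, 14.318180 MHz */
		uint64_t hpet1 = 0x61fece8aULL, hpet2 = 0x79e588fcULL;
		/* RBX: timekeeping sequence counter, +2 per update */
		uint64_t seq1  = 0x510792ULL,   seq2  = 0x511d6eULL;
		/* R12: monotonic nanoseconds */
		uint64_t ns1   = 0x16bf99600917ULL, ns2 = 0x16c61e4e2117ULL;

		double secs  = (hpet2 - hpet1) / 14.318180e6;
		double ticks = (seq2 - seq1) / 2.0;

		printf("HPET delta: %.4f s\n", secs);		/* ~28.006 */
		printf("ticks: %.0f -> HZ ~ %.0f\n", ticks, ticks / secs); /* ~100 */
		printf("ns delta:   %.4f s\n", (ns2 - ns1) / 1e9);	/* 28.000 */
		return 0;
	}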

Now that all looks correct. So there is something else going on. After
staring some more at it, I think we are looking at it from the wrong
angle.

The watchdog always detects CPU1 as stuck and we got completely
fixated on the csd_wait() in the stack trace on CPU1. Now we have
stack traces which show a different picture, i.e. CPU1 makes progress
after a gazillion of seconds.

I think we really need to look at CPU1 itself.

AFAICT all these 'stuck' events happen in fully interruptible
context. So an undetected interrupt storm can cause that.

We only detect interrupt storms for unhandled interrupts, but for
those where the handler returns IRQ_HANDLED, we just count them.

For directly handled vectors we do not even have a detection mechanism
at all.

That also might explain the RT throttling. If that storm hits a high
prio task, the throttler will trigger.

Just a theory, but worth exploring, IMO.

So adding a dump of the total interrupt counts to the watchdog trace
might give us some insight.

Debug patch below.

Thanks,

	tglx
---
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 501baa9ac1be..2021662663c7 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -315,11 +315,21 @@ asmlinkage __visible void do_softirq(void)
 	local_irq_restore(flags);
 }
 
+static DEFINE_PER_CPU(unsigned long, irqcnt);
+
+void show_irqcnt(int cpu)
+{
+	pr_emerg("CPU#%d: IRQ %lu NMI %u\n", cpu, this_cpu_read(irqcnt),
+		 this_cpu_read(irq_stat.__nmi_count));
+}
+
 /*
  * Enter an interrupt context.
  */
 void irq_enter(void)
 {
+	this_cpu_inc(irqcnt);
+
 	rcu_irq_enter();
 	if (is_idle_task(current) && !in_interrupt()) {
 		/*
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 70bf11815f84..f505cc58d354 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -306,6 +306,8 @@ static void watchdog_interrupt_count(void)
 static int watchdog_nmi_enable(unsigned int cpu);
 static void watchdog_nmi_disable(unsigned int cpu);
 
+extern void show_irqcnt(int cpu);
+
 /* watchdog kicker functions */
 static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 {
@@ -388,6 +390,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			smp_processor_id(), duration,
 			current->comm, task_pid_nr(current));
 		__this_cpu_write(softlockup_task_ptr_saved, current);
+		show_irqcnt(smp_processor_id());
 		print_modules();
 		print_irqtrace_events(current);
 		if (regs)









^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 20:31                                                                                                       ` Chris Mason
  2014-12-19 20:36                                                                                                         ` Dave Jones
@ 2014-12-19 23:22                                                                                                         ` Thomas Gleixner
  2014-12-20  0:12                                                                                                           ` Chris Mason
  1 sibling, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-19 23:22 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Dave Jones, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, 19 Dec 2014, Chris Mason wrote:
> On Fri, Dec 19, 2014 at 11:15:21AM -0800, Linus Torvalds wrote:
> > Here's another pattern. In your latest thing, every single time that
> > CPU1 is waiting for some other CPU to pick up the IPI, we have CPU0
> > doing this:
> > 
> > [24998.060963] NMI backtrace for cpu 0
> > [24998.061989] CPU: 0 PID: 2940 Comm: trinity-c150 Not tainted 3.18.0+ #108
> > [24998.064073] task: ffff8801bf3536b0 ti: ffff880197e0c000 task.ti:
> > ffff880197e0c000
> > [24998.065137] RIP: 0010:[<ffffffff8103e006>]  [<ffffffff8103e006>]
> > read_hpet+0x16/0x20
> > [24998.083577]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
> > [24998.084450]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
> > [24998.085315]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
> > [24998.086173]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
> > [24998.087025]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
> > [24998.087877]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
> > [24998.088732]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
> > [24998.089583]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
> > [24998.090435]  <EOI>
> > [24998.091279]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
> > [24998.092118]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
> > [24998.092951]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
> > [24998.093779]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
> > 
> > and I think that's the smoking gun. The reason CPU0 isn't picking up
> > any IPI's is because it is in some endless loop around read_hpet().
> > 
> > There is even time information in the register dump:
> > 
> >  RAX: 0000000061fece8a RBX: 0000000000510792 RCX: 0000000000000000
> >  RAX: 0000000079e588fc RBX: 0000000000511d6e RCX: 0000000000000000
> >  RAX: 0000000091ca7f65 RBX: 0000000000513346 RCX: 0000000000000000
> >  RAX: 00000000a9afbd0d RBX: 000000000051491e RCX: 0000000000000000
> >  RAX: 00000000cbd1340c RBX: 000000000051684a RCX: 0000000000000000
> >  RAX: 00000000fb9d303f RBX: 00000000005193fc RCX: 0000000000000000
> >  RAX: 000000002b67efe4 RBX: 000000000051c224 RCX: 0000000000000004
> > 
> > That RAX value is the value we just read from the HPET, and RBX seems
> > to be monotonically increasing too, so it's likely the sequence
> > counter in ktime_get().
> > 
> > So it's not stuck *inside* read_hpet(), and it's almost certainly not
> > the loop over the sequence counter in ktime_get() either (it's not
> > increasing *that* quickly). But some basically infinite __run_hrtimer
> > thing or something?
> 
> Really interesting.
> 
> So, we're calling __run_hrtimer in a loop:
> 
>                 while ((node = timerqueue_getnext(&base->active))) {
> 				...
> 				__run_hrtimer(timer, &basenow);
> 				...
> 		}
> 
> The easy question is how often does trinity call nanosleep?
> 
> Looking at __run_hrtimer(), it drops the lock and runs the function and then
> takes the lock again, maybe enqueueing us again right away.
> 
> timer->state is supposed to protect us from other CPUs jumping in and doing
> something else with the timer, but it feels racey wrt remove_hrtimer().
> Something like this, but I'm not sure how often __hrtimer_start_range_ns gets
> called
> 
> CPU 0						CPU 1
> __run_hrtimer()
>     timer->state = HRTIMER_STATE_CALLBACK
>     removed from list
>     unlock cpu_base->lock
>     restrt = fn(timer)
>     						__hrtimer_start_range_ns()
> 						base = lock_hrtimer_base()
> 						ret = remove_hrtimer()
> 						    finds timer->state = HRTIMER_STATE_CALLBACK
> 						    does nothing
> 						new_base = switch_hrtimer_base()
> 						    now we're on a different base, different lock
>     lock(cpu_base->lock)
>     enqueue the timer
>     						enqueue the timer

But at the very end this would be detected by the runtime check of the
hrtimer interrupt, which does not trigger. And it would trigger at
some point as ALL cpus including CPU0 in that trace dump make
progress.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 23:14                                                                                                       ` Thomas Gleixner
@ 2014-12-19 23:55                                                                                                         ` Linus Torvalds
  2014-12-20  1:00                                                                                                           ` Thomas Gleixner
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-19 23:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, Dec 19, 2014 at 3:14 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Now that all looks correct. So there is something else going on. After
> staring some more at it, I think we are looking at it from the wrong
> angle.
>
> The watchdog always detects CPU1 as stuck and we got completely
> fixated on the csd_wait() in the stack trace on CPU1. Now we have
> stack traces which show a different picture, i.e. CPU1 makes progress
> after a gazillion of seconds.

.. but that doesn't explain why CPU0 ends up always being at that
*exact* same instruction in the NMI backtrace.

While a fairly tight loop, together with "mmio read is very expensive
and synchronizing" would explain it. An MMIO read can easily be as
expensive as several thousand instructions.

> I think we really need to look at CPU1 itself.

Not so fast. Take another look at CPU0.

[24998.083577]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
[24998.084450]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
[24998.085315]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
[24998.086173]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
[24998.087025]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
[24998.087877]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
[24998.088732]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
[24998.089583]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
[24998.090435]  <EOI>
[24998.091279]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
[24998.092118]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
[24998.092951]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
[24998.093779]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70


Really. None of that changed. NONE. The likelihood that we hit the
exact same instruction every time? Over a timefraem of more than a
minute?

The only way I see that happening is (a) NMI is completely buggered,
and the backtrace is just random crap that is always the same.  Or (b)
it's really a fairly tight loop.

The fact that you had a hrtimer interrupt happen in the *middle* of
__remove_hrtimer() is really another fairly strong hint. That smells
like "__remove_hrtimer() has a race with hrtimer interrupts".

And that race results in a basically endless loop (which perhaps ends
when the HRtimer overflows, in what, a few minutes?)

I really don't think you should look at CPU1. Not when CPU0 has such
an interesting pattern that you dismissed just because the HPET is
making progress.

                            Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 23:22                                                                                                         ` Thomas Gleixner
@ 2014-12-20  0:12                                                                                                           ` Chris Mason
  2014-12-20  1:06                                                                                                             ` Thomas Gleixner
  0 siblings, 1 reply; 486+ messages in thread
From: Chris Mason @ 2014-12-20  0:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Dave Jones, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin



On Fri, Dec 19, 2014 at 6:22 PM, Thomas Gleixner <tglx@linutronix.de> 
wrote:
> On Fri, 19 Dec 2014, Chris Mason wrote:
>>  On Fri, Dec 19, 2014 at 11:15:21AM -0800, Linus Torvalds wrote:
>>  > Here's another pattern. In your latest thing, every single time 
>> that
>>  > CPU1 is waiting for some other CPU to pick up the IPI, we have 
>> CPU0
>>  > doing this:
>>  >
>>  > [24998.060963] NMI backtrace for cpu 0
>>  > [24998.061989] CPU: 0 PID: 2940 Comm: trinity-c150 Not tainted 
>> 3.18.0+ #108
>>  > [24998.064073] task: ffff8801bf3536b0 ti: ffff880197e0c000 
>> task.ti:
>>  > ffff880197e0c000
>>  > [24998.065137] RIP: 0010:[<ffffffff8103e006>]  
>> [<ffffffff8103e006>]
>>  > read_hpet+0x16/0x20
>>  > [24998.083577]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
>>  > [24998.084450]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
>>  > [24998.085315]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
>>  > [24998.086173]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
>>  > [24998.087025]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
>>  > [24998.087877]  [<ffffffff81031a4b>] 
>> local_apic_timer_interrupt+0x3b/0x70
>>  > [24998.088732]  [<ffffffff8179bca5>] 
>> smp_apic_timer_interrupt+0x45/0x60
>>  > [24998.089583]  [<ffffffff8179a0df>] 
>> apic_timer_interrupt+0x6f/0x80
>>  > [24998.090435]  <EOI>
>>  > [24998.091279]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
>>  > [24998.092118]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
>>  > [24998.092951]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
>>  > [24998.093779]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
>>  >
>>  > and I think that's the smoking gun. The reason CPU0 isn't picking 
>> up
>>  > any IPI's is because it is in some endless loop around 
>> read_hpet().
>>  >
>>  > There is even time information in the register dump:
>>  >
>>  >  RAX: 0000000061fece8a RBX: 0000000000510792 RCX: 0000000000000000
>>  >  RAX: 0000000079e588fc RBX: 0000000000511d6e RCX: 0000000000000000
>>  >  RAX: 0000000091ca7f65 RBX: 0000000000513346 RCX: 0000000000000000
>>  >  RAX: 00000000a9afbd0d RBX: 000000000051491e RCX: 0000000000000000
>>  >  RAX: 00000000cbd1340c RBX: 000000000051684a RCX: 0000000000000000
>>  >  RAX: 00000000fb9d303f RBX: 00000000005193fc RCX: 0000000000000000
>>  >  RAX: 000000002b67efe4 RBX: 000000000051c224 RCX: 0000000000000004
>>  >
>>  > That RAX value is the value we just read from the HPET, and RBX 
>> seems
>>  > to be monotonically increasing too, so it's likely the sequence
>>  > counter in ktime_get().
>>  >
>>  > So it's not stuck *inside* read_hpet(), and it's almost certainly 
>> not
>>  > the loop over the sequence counter in ktime_get() either (it's not
>>  > increasing *that* quickly). But some basically infinite 
>> __run_hrtimer
>>  > thing or something?
>> 
>>  Really interesting.
>> 
>>  So, we're calling __run_hrtimer in a loop:
>> 
>>                  while ((node = timerqueue_getnext(&base->active))) {
>>  				...
>>  				__run_hrtimer(timer, &basenow);
>>  				...
>>  		}
>> 
>>  The easy question is how often does trinity call nanosleep?
>> 
>>  Looking at __run_hrtimer(), it drops the lock and runs the function 
>> and then
>>  takes the lock again, maybe enqueueing us again right away.
>> 
>>  timer->state is supposed to protect us from other CPUs jumping in 
>> and doing
>>  something else with the timer, but it feels racey wrt 
>> remove_hrtimer().
>>  Something like this, but I'm not sure how often 
>> __hrtimer_start_range_ns gets
>>  called
>> 
>>  CPU 0						CPU 1
>>  __run_hrtimer()
>>      timer->state = HRTIMER_STATE_CALLBACK
>>      removed from list
>>      unlock cpu_base->lock
>>      restrt = fn(timer)
>>      						__hrtimer_start_range_ns()
>>  						base = lock_hrtimer_base()
>>  						ret = remove_hrtimer()
>>  						    finds timer->state = HRTIMER_STATE_CALLBACK
>>  						    does nothing
>>  						new_base = switch_hrtimer_base()
>>  						    now we're on a different base, different lock
>>      lock(cpu_base->lock)
>>      enqueue the timer
>>      						enqueue the timer
> 
> But at the very end this would be detected by the runtime check of the
> hrtimer interrupt, which does not trigger. And it would trigger at
> some point as ALL cpus including CPU0 in that trace dump make
> progress.

I'll admit that at some point we should be hitting one of the WARN or 
BUG_ON, but it's possible to thread that needle and corrupt the timer 
list, without hitting a warning (CPU 1 in my example has to enqueue 
last).  Once the rbtree is hosed, it can go forever.  Probably not the 
bug we're looking for, but still suspect in general.

-chris




^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 23:55                                                                                                         ` Linus Torvalds
@ 2014-12-20  1:00                                                                                                           ` Thomas Gleixner
  2014-12-20  1:57                                                                                                             ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-20  1:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, 19 Dec 2014, Linus Torvalds wrote:
> On Fri, Dec 19, 2014 at 3:14 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > Now that all looks correct. So there is something else going on. After
> > staring some more at it, I think we are looking at it from the wrong
> > angle.
> >
> > The watchdog always detects CPU1 as stuck and we got completely
> > fixated on the csd_wait() in the stack trace on CPU1. Now we have
> > stack traces which show a different picture, i.e. CPU1 makes progress
> > after a gazillion of seconds.
> 
> .. but that doesn't explain why CPU0 ends up always being at that
> *exact* same instruction in the NMI backtrace.

Up to the point where it exposes different instructions and completely
different code paths. I'm not agreeing with your theory that, after
the RT throttler hit, the watchdog and everything else goes completely
bonkers. And even before that happens we see a different backtrace on
CPU0:

[25149.982766] RIP: 0010:[<ffffffff810cf1db>]  [<ffffffff810cf1db>] invoke_rcu_core+0x2b/0x50

Though I have to admit that this "very same instruction" pattern
puzzled me for quite a while as well.

> While a fairly tight loop, together with "mmio read is very expensive
> and synchronizing" would explain it. An MMIO read can easily be as
> expensive as several thousand instructions.

The watchdog timer runs on a fully periodic schedule. It's self
rearming via

	 hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

So if that aligns with the equally periodic tick interrupt on the
other CPU then you might get into that situation due to the fully
synchronizing and serializing nature of HPET reads.

That can drift apart over time because the timer device (apic or that
newfangled tscdeadline timer) is not frequency corrected versus
timekeeping.

> > I think we really need to look at CPU1 itself.
> 
> Not so fast. Take another look at CPU0.
> 
> [24998.083577]  [<ffffffff810e0d3e>] ktime_get+0x3e/0xa0
> [24998.084450]  [<ffffffff810e9cd3>] tick_sched_timer+0x23/0x160
> [24998.085315]  [<ffffffff810daf96>] __run_hrtimer+0x76/0x1f0
> [24998.086173]  [<ffffffff810e9cb0>] ? tick_init_highres+0x20/0x20
> [24998.087025]  [<ffffffff810db2e7>] hrtimer_interrupt+0x107/0x260
> [24998.087877]  [<ffffffff81031a4b>] local_apic_timer_interrupt+0x3b/0x70
> [24998.088732]  [<ffffffff8179bca5>] smp_apic_timer_interrupt+0x45/0x60
> [24998.089583]  [<ffffffff8179a0df>] apic_timer_interrupt+0x6f/0x80
> [24998.090435]  <EOI>
> [24998.091279]  [<ffffffff810da66e>] ? __remove_hrtimer+0x4e/0xa0
> [24998.092118]  [<ffffffff812c7c7a>] ? ipcget+0x8a/0x1e0
> [24998.092951]  [<ffffffff812c7c6c>] ? ipcget+0x7c/0x1e0
> [24998.093779]  [<ffffffff812c8d6d>] SyS_msgget+0x4d/0x70
> 
> 
> Really. None of that changed. NONE. The likelihood that we hit the
> exact same instruction every time? Over a timeframe of more than a
> minute?
> 
> The only way I see that happening is (a) NMI is completely buggered,
> and the backtrace is just random crap that is always the same.  Or (b)
> it's really a fairly tight loop.
> 
> The fact that you had a hrtimer interrupt happen in the *middle* of
> __remove_hrtimer() is really another fairly strong hint. That smells
> like "__remove_hrtimer() has a race with hrtimer interrupts".

That __remove_hrtimer has a '?' in front of it. So it's not a reliable
trace entry.

There is NO hrtimer related operation in the msgget() syscall at all
and SyS_msgget() is the only reliable entry on that stack trace.

So that __remove_hrtimer operation happened before that msgget()
syscall and is just a stack artifact. poll/select/nanosleep whatever.

> And that race results in a basically endless loop (which perhaps ends
> when the HRtimer overflows, in what, a few minutes?)

hrtimers overflow in about 584 years
 
> I really don't think you should look at CPU1. Not when CPU0 has such
> an interesting pattern that you dismissed just because the HPET is
> making progress.

No. I did not dismiss it because HPET is making progress. I looked at
it from a different angle.

So lets assume there is that hrtimer_remove race (I'm certainly going
to stare at that tomorrow with fully awake brain. It's past beer
o'clock here). How do you explain that:

1) the watchdog always triggers on CPU1?

2) the race only ever happens on CPU0?

3) the hrtimer interrupt took too long message never appears?

   If that timer interrupt loops forever then it will complain about
   that. And it leaves that code for sure as the backtrace of CPU0
   hits user space later on.

4) the RT throttler hit?

   Admittedly we don't know from which CPU and which task that comes,
   but that's very simple to figure out. Debug patch below.

5) that the system makes progress afterwards?

6) ....

If my assumption about an interrupt storm turns out to be true, then
it explains all of the above. I might be wrong as usual, but I still
think it's worth having a look.

Thanks,

	tglx
---

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ee15f5a0d1c1..d9e4153d405b 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -895,7 +895,8 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
 		 */
 		if (likely(rt_b->rt_runtime)) {
 			rt_rq->rt_throttled = 1;
-			printk_deferred_once("sched: RT throttling activated\n");
+			printk_deferred_once("sched: RT throttling activated cpu %d task %s %d\n",
+					     smp_processor_id(), current->comm, current->pid);
 		} else {
 			/*
 			 * In case we did anyway, make it go away,

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-20  0:12                                                                                                           ` Chris Mason
@ 2014-12-20  1:06                                                                                                             ` Thomas Gleixner
  0 siblings, 0 replies; 486+ messages in thread
From: Thomas Gleixner @ 2014-12-20  1:06 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Dave Jones, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, 19 Dec 2014, Chris Mason wrote:
> On Fri, Dec 19, 2014 at 6:22 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > But at the very end this would be detected by the runtime check of the
> > hrtimer interrupt, which does not trigger. And it would trigger at
> > some point as ALL cpus including CPU0 in that trace dump make
> > progress.
> 
> I'll admit that at some point we should be hitting one of the WARN or BUG_ON,
> but it's possible to thread that needle and corrupt the timer list, without
> hitting a warning (CPU 1 in my example has to enqueue last).  Once the rbtree
> is hosed, it can go forever.  Probably not the bug we're looking for, but
> still suspect in general.

I'll surely have a close look at that, but in that case we get out of
that state later on, and I doubt that we have

     A) a corruption of the rbtree
     B) a self healing of the rbtree afterwards

I doubt it, but who knows.

Though even if A & B happened, we would still get the 'hrtimer
interrupt took a gazillion of seconds' warning, because CPU0 definitely
leaves the timer interrupt at some point; otherwise we would not see
backtraces from usb, userspace and idle later on.

Thanks,

	tglx





^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-20  1:00                                                                                                           ` Thomas Gleixner
@ 2014-12-20  1:57                                                                                                             ` Linus Torvalds
  2014-12-20 18:25                                                                                                               ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-20  1:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, Dec 19, 2014 at 5:00 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> The watchdog timer runs on a fully periodic schedule. It's self
> rearming via
>
>          hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));
>
> So if that aligns with the equally periodic tick interrupt on the
> other CPU then you might get into that situation due to the fully
> synchronizing and serializing nature of HPET reads.

No. Really. No.

There is no way in hell that a modern system ends up being that
synchronized. Not to the point that the interrupt also happens at the
*exact* same point.

Thomas, you're in denial. It's not just the exact same instruction in
the timer interrupt handler ("timer interrupts and NMI's are in
sync"), the call chain *past* the timer interrupt is the exact same
thing too.  The stack has the same contents. So not only must the two
CPU's be perfectly in sync so that the NMI always triggers on exactly
the same instruction, the CPU that takes the timer interrupt (in your
scenario, over and over again) must be magically filling the stack
with the exact same thing every time.

That CPU isn't making progress that just happens to be "synchronized".

Admit it, you've not seen a busy system that is *so* synchronized that
you get timer interrupts and NMI's that hit on the same instruction
over a sequence of minutes.

> So lets assume there is that hrtimer_remove race (I'm certainly going
> to stare at that tomorrow with fully awake brain. It's past beer
> o'clock here). How do you explain that:
>
> 1) the watchdog always triggers on CPU1?
>
> 2) the race only ever happens on CPU0?

I'm claiming that the race happened *once*. And it then corrupted some
data structure or similar sufficiently that CPU0 keeps looping.

Perhaps something keeps re-adding itself to the head of the timerqueue
due to the race.

The watchdog doesn't trigger on CPU0, because this is the software
watchdog, and interrupts are disabled on CPU0, and CPU0 isn't making
any progress. Because it's looping in a fairly tight loop.

The watchdog triggers on CPU1 (over and over again) because CPU1 is
waiting for the TLB shootdown to complete. And it doesn't, because
interrupts are disabled on CPU0 that it's trying to shoot down the TLB
on.

That theory at least fits the data. So CPU1 isn't doing anything odd at all.

In a way that "magic happens so that everything is so synchronized
that you cross-synchronize two CPU's making real progress over many
minutes". THAT sounds like just a fairy tale to me.

Your "it has a question mark in front of it" objection is
bogus. We got an *interrupt* in the middle of the call chain. Of
*course* the call chain is unreliable. It doesn't matter. What matters
is that the stack under the interrupt DOES NOT CHANGE. It doesn't even
matter if it's a real honest-to-god callchain or not, what matters is
that the kernel stack under the interrupt is unchanged. No way does
that happen if it's making progress at the same time.

> 3) the hrtimer interrupt took too long message never appears?

At a guess, it's looping (for a long long time) on
timerqueue_getnext() in hrtimer_interrupt(), and then returns. Never
gets to the retry or the "interrupt took %llu" messages.

And we know what timer entry it is looping on:  tick_sched_timer.
Which does a HRTIMER_RESTART, so the re-enqueueing isn't exactly
unlikely.

All it needs is that the re-enqueueing has gotten confused enough that
it re-enqueues it on the same queue over and over again.  Which would
run tick_sched_timer over and over again. No? Until the re-enqueuing
magically stops (and we do actually have a HPET counter overflow
there. Just look at the RAX values:

 RAX: 00000000fb9d303f
 RAX: 000000002b67efe4

that 00000000fb9d303f is the last time we see that particular
callchain. The next time we see read_hpet(), it's that
000000002b67efe4 thing.

So my "maybe it has something to do with HPET overflow" wasn't just a
random throw-away comment. We actually have real data saying that the
HPET *did* overflow, and it in fact happened somewhere around the time
when the lockup went away.

Are they related? Maybe not. But dammit, there's a lot of
"coincidences" here. Not just the "oh, it always takes the NMI on the
exact same instruction".

> 4) the RT throttler hit?
> 5) that the system makes progress afterwards?

.. something eventually clears the problem, maybe because of the HPET
overflow. I dunno. I'm just saying that your arguments to ignore CPU0
are pretty damn weak.

So I think the CPU1 behavior is 100% consistent with CPU0 just having
interrupts disabled.

So I think CPU1 is _trivial_ to explain if you accept that CPU0 is
doing something weird.

Which is why I think your "look at CPU1" sounds so strange. I don't
think CPU1 is all that interesting. I can *trivially* explain it with
a single sentence, and did exactly that above.

Can you trivially explain CPU0? Without the "it's just a big
coincidence that we take  the NMI on the same instruction for several
minutes, and the stack under the timer interrupt hasn't changed at all
in that same time".

                     Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-19 22:05                                                                                                             ` Linus Torvalds
@ 2014-12-20 16:49                                                                                                               ` Dave Jones
  0 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-20 16:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, Dec 19, 2014 at 02:05:20PM -0800, Linus Torvalds wrote:
 > > Right now I'm doing Chris' idea of "turn debugging back on,
 > > and try without serial console".  Shall I try your suggestion
 > > on top of that ?
 > 
 > Might as well. I doubt it really will make any difference, but I also
 > don't think it will interact badly in any way, so..

It locked up. It's not even responding to icmp.
It might be Monday before I can see what's on the console.

	Dave



^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-20  1:57                                                                                                             ` Linus Torvalds
@ 2014-12-20 18:25                                                                                                               ` Linus Torvalds
  2014-12-20 21:16                                                                                                                 ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-20 18:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Fri, Dec 19, 2014 at 5:57 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'm claiming that the race happened *once*. And it then corrupted some
> data structure or similar sufficiently that CPU0 keeps looping.
>
> Perhaps something keeps re-adding itself to the head of the timerqueue
> due to the race.

So tick_sched_timer() does

        ktime_t now = ktime_get();
        ...
        hrtimer_forward(timer, now, tick_period);
        return HRTIMER_RESTART;

and then __run_hrtimer does

        enqueue_hrtimer(timer, base);

which just adds the timer back on the timer heap.

So all you need to get an infinite loop (as far as I can see) is that
hrtimer_forward() doesn't actually move the timer forward.

The most likely reason would seem to be this:

        delta = ktime_sub(now, hrtimer_get_expires(timer));

        if (delta.tv64 < 0)
                return 0;

and clearly it *should* return a positive number, since the timer has
expired, so the expiry time _should_ be smaller than "now". So it
should never trigger, and this bug is clearly impossible.

HOWEVER.

It turns out that while tick_sched_timer() does "ktime_get()" to get
the current time, the actual timer machinery does *not* do that at
all. The actual timer machinery does

        entry_time = now = hrtimer_update_base(cpu_base);

                base = cpu_base->clock_base + i;
                basenow = ktime_add(now, base->offset);

_once_ per hrtimer_clock_base. And then it iterates using that
"basenow" thing, and  compares it to the timer expiry.

So we have two different times. Now, let's think about what happens if
those clocks aren't quite in sync.

We know (since __run_hrtimer was called) that

        basenow.tv64 > hrtimer_get_softexpires_tv64(timer)

but here we have "basenow" - which is not that ktime_get(), and we
have "hrtimer_get_softexpires_tv64()" (which is not
hrtimer_get_expires() in general - we have all that "delta" range
handling, but for the scheduling tick it *should* be the same).

So I can see at least one lockup:

 - if "expires < basenow" hrtimer_interrupt() will run the timer
 - if "now < expires" hrtimer_forward() will not do anything, and will
just reschedule the timer with the same expiration

iow, all you need for a re-arming of the same timer is:

   now < expires < basenow
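
(A minimal standalone model of that condition, not kernel code: the "forward"
step refuses to advance an expiry that is still in the future relative to its
own "now", while the interrupt loop keeps expiring anything older than its
cached "basenow". The concrete numbers are made up purely for illustration.)

	#include <stdio.h>

	int main(void)
	{
		/* hypothetical times (ns), chosen so that now < expires < basenow */
		long long now = 1000, expires = 1005, basenow = 1010;
		long long tick_period = 4;
		int runs = 0;

		/* model of the hrtimer_interrupt() loop over expired timers */
		while (expires <= basenow && runs < 5) {	/* bound the demo */
			/* model of hrtimer_forward(): only move the expiry
			 * forward if it has already passed w.r.t. "now" */
			while (now - expires >= 0)
				expires += tick_period;
			runs++;		/* timer re-enqueued, expiry unchanged */
		}
		printf("callback ran %d times, expiry still at %lld\n", runs, expires);
		return 0;
	}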

now, the two clocks (now and basenow) are not the same, but they do
have the same *base*. So they are related, but even the base time was
gotten under two different sequence locks, so even the base could have
been updated in between the hrtimer_update_base() time and the
ktime_get(). And even though they have the same base, they have
different offsets: basenow does that "base->offset" thing (and
ktime_get_update_offsets_now() does timekeeping_get_ns()

 - now = ktime_get() does

                base = tk->tkr.base_mono;
                nsecs = timekeeping_get_ns(&tk->tkr);

 - basenow = ktime_get_update_offsets_now() does

                base = tk->tkr.base_mono;
                nsecs = timekeeping_get_ns(&tk->tkr);
          .. and then ..
                ktime_add(.., base->offset);

and if I read the thing right, the ktime_add() should be a no-op,
because base->offset should be 0 for the normal monotonic clock.
Right?

So the two times (now and basenow) *should* be the same time, and the
whole "now < expires < basenow" situation can never happen. Right?

Potentially wrong.

Because that's where the whole "different sequence locks" comes in.
The one-time race could be something that updates the base in between
the (one-time) ktime_get_update_offsets_now() and the (then as a
result pseudo-infinitely repeating) ktime_get.

Hmm? If "base" ever goes backwards, or if "base" does *not* update
atomically with the HPET timer overflows, I can see that happening. Of
course, that would imply that ktime_get() is not monotonic. But we do
know that we've had odd time issues on that machine.

I think you already had DaveJ check clock monotonicity. But that was
with the TSC, wasn't it? I'm claiming maybe the HPET isn't monotonic,
and there is some HPET clocksource issue with overflow in 32 bits.

(I think the HPET *should* be 64-bit, and just the comparators for
interrupts may be 32-bit, but we use a "readl()" and only use the low
32-bits even if the upper 32 bits *might* be ok).
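
(For reference, the clocksource read in question is tiny; roughly, paraphrased
from memory of arch/x86/kernel/hpet.c of that era, not verbatim:)

	/* a single MMIO read of the low 32 bits of the HPET main counter */
	static cycle_t read_hpet(struct clocksource *cs)
	{
		return (cycle_t)hpet_readl(HPET_COUNTER);
	}

	/* where hpet_readl() is essentially */
	static inline unsigned int hpet_readl(unsigned int a)
	{
		return readl(hpet_virt_address + a);
	}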

I keep harping on that HPET overflow, because we actually have the 6
"locked up" HPET traces, and then we have a seventh without that
lockup, and there definitely was an overflow in 32 bits:

  [torvalds@i7 linux]$ grep -3 read_hpet+0x16 ~/dj-1.txt | grep RAX
   RAX: 0000000061fece8a RBX: 0000000000510792 RCX: 0000000000000000
   RAX: 0000000079e588fc RBX: 0000000000511d6e RCX: 0000000000000000
   RAX: 0000000091ca7f65 RBX: 0000000000513346 RCX: 0000000000000000
   RAX: 00000000a9afbd0d RBX: 000000000051491e RCX: 0000000000000000
   RAX: 00000000cbd1340c RBX: 000000000051684a RCX: 0000000000000000
   RAX: 00000000fb9d303f RBX: 00000000005193fc RCX: 0000000000000000
   RAX: 000000002b67efe4 RBX: 000000000051c224 RCX: 0000000000000004

and I have just gotten hung up on that small detail.

How/where is the HPET overflow case handled? I don't know the code well enough.

(Also, maybe I shouldn't be so hung up on this *one* long trace from
DaveJ. There's been a lot of crazy traces from that machine. We've had
some time-handling questions about it before, but *most* of the traces
have not been implicating the HPET like this one, so..)

                            Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-20 18:25                                                                                                               ` Linus Torvalds
@ 2014-12-20 21:16                                                                                                                 ` Linus Torvalds
  2014-12-21  3:52                                                                                                                   ` Paul E. McKenney
  2014-12-21 21:22                                                                                                                   ` Linus Torvalds
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-20 21:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Sat, Dec 20, 2014 at 10:25 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> How/where is the HPET overflow case handled? I don't know the code well enough.

Hmm, ok, I've re-acquainted myself with it. And I have to admit that I
can't see anything wrong. The whole "update_wall_clock" and the shadow
timekeeping state is confusing as hell, but seems fine. We'd have to
avoid update_wall_clock for a *long* time for overflows to occur.

And the overflow in 32 bits isn't that special, since the only thing
that really matters is the overflow of "cycle_now - tkr->cycle_last"
within the mask.
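
(For reference, that masked delta is what clocksource_delta() computes;
modulo debug wrappers it is essentially just:

        static inline cycle_t clocksource_delta(cycle_t now, cycle_t last, cycle_t mask)
        {
                return (now - last) & mask;
        }

so a single 32-bit wrap of the HPET shows up as a perfectly
normal-looking delta as long as update_wall_time() runs often enough.)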

So I'm not seeing anything even halfway suspicious.

                  Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-20 21:16                                                                                                                 ` Linus Torvalds
@ 2014-12-21  3:52                                                                                                                   ` Paul E. McKenney
  2014-12-21 21:22                                                                                                                   ` Linus Torvalds
  1 sibling, 0 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-21  3:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Dave Jones, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Sat, Dec 20, 2014 at 01:16:29PM -0800, Linus Torvalds wrote:
> On Sat, Dec 20, 2014 at 10:25 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > How/where is the HPET overflow case handled? I don't know the code well enough.
> 
> Hmm, ok, I've re-acquainted myself with it. And I have to admit that I
> can't see anything wrong. The whole "update_wall_clock" and the shadow
> timekeeping state is confusing as hell, but seems fine. We'd have to
> avoid update_wall_clock for a *long* time for overflows to occur.
> 
> And the overflow in 32 bits isn't that special, since the only thing
> that really matters is the overflow of "cycle_now - tkr->cycle_last"
> within the mask.
> 
> So I'm not seeing anything even halfway suspicious.

One long shot is a bug in rcu_barrier() that I introduced in v3.18-rc1.
This is a low-probability race that can cause rcu_barrier() and friends
to return too soon, which can of course result in arbitrary misbehavior.
Please see below for a fix which looks good thus far in reasonably
intense rcutorture testing.

Might be what Dave and Sasha are seeing.  Or not.

							Thanx, Paul

------------------------------------------------------------------------

rcu: Fix rcu_barrier() race that could result in too-short wait

The rcu_barrier() no-callbacks check for no-CBs CPUs has race conditions.
It checks a given CPU's lists of callbacks, and if all three no-CBs lists
are empty, ignores that CPU.  However, these three lists could potentially
be empty even when callbacks are present if the check executed just as
the callbacks were being moved from one list to another.  It turns out
that recent versions of rcutorture can spot this race.

This commit plugs this hole by consolidating the per-list counts of
no-CBs callbacks into a single count, which is incremented before
the corresponding callback is posted and decremented after it is invoked.  Then
rcu_barrier() checks this single count to reliably determine whether
the corresponding CPU has no-CBs callbacks.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7680fc275036..658b691dc32b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3318,6 +3318,7 @@ static void _rcu_barrier(struct rcu_state *rsp)
 			} else {
 				_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
 						   rsp->n_barrier_done);
+				smp_mb__before_atomic();
 				atomic_inc(&rsp->barrier_cpu_count);
 				__call_rcu(&rdp->barrier_head,
 					   rcu_barrier_callback, rsp, cpu, 0);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 8e7b1843896e..cb5908672f11 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -340,14 +340,10 @@ struct rcu_data {
 #ifdef CONFIG_RCU_NOCB_CPU
 	struct rcu_head *nocb_head;	/* CBs waiting for kthread. */
 	struct rcu_head **nocb_tail;
-	atomic_long_t nocb_q_count;	/* # CBs waiting for kthread */
-	atomic_long_t nocb_q_count_lazy; /*  (approximate). */
+	atomic_long_t nocb_q_count;	/* # CBs waiting for nocb */
+	atomic_long_t nocb_q_count_lazy; /*  invocation (all stages). */
 	struct rcu_head *nocb_follower_head; /* CBs ready to invoke. */
 	struct rcu_head **nocb_follower_tail;
-	atomic_long_t nocb_follower_count; /* # CBs ready to invoke. */
-	atomic_long_t nocb_follower_count_lazy; /*  (approximate). */
-	int nocb_p_count;		/* # CBs being invoked by kthread */
-	int nocb_p_count_lazy;		/*  (approximate). */
 	wait_queue_head_t nocb_wq;	/* For nocb kthreads to sleep on. */
 	struct task_struct *nocb_kthread;
 	int nocb_defer_wakeup;		/* Defer wakeup of nocb_kthread. */
@@ -356,8 +352,6 @@ struct rcu_data {
 	struct rcu_head *nocb_gp_head ____cacheline_internodealigned_in_smp;
 					/* CBs waiting for GP. */
 	struct rcu_head **nocb_gp_tail;
-	long nocb_gp_count;
-	long nocb_gp_count_lazy;
 	bool nocb_leader_sleep;		/* Is the nocb leader thread asleep? */
 	struct rcu_data *nocb_next_follower;
 					/* Next follower in wakeup chain. */
@@ -622,24 +616,15 @@ static void rcu_dynticks_task_exit(void);
 #endif /* #ifndef RCU_TREE_NONCORE */
 
 #ifdef CONFIG_RCU_TRACE
-#ifdef CONFIG_RCU_NOCB_CPU
-/* Sum up queue lengths for tracing. */
+/* Read out queue lengths for tracing. */
 static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
 {
-	*ql = atomic_long_read(&rdp->nocb_q_count) +
-	      rdp->nocb_p_count +
-	      atomic_long_read(&rdp->nocb_follower_count) +
-	      rdp->nocb_p_count + rdp->nocb_gp_count;
-	*qll = atomic_long_read(&rdp->nocb_q_count_lazy) +
-	       rdp->nocb_p_count_lazy +
-	       atomic_long_read(&rdp->nocb_follower_count_lazy) +
-	       rdp->nocb_p_count_lazy + rdp->nocb_gp_count_lazy;
-}
+#ifdef CONFIG_RCU_NOCB_CPU
+	*ql = atomic_long_read(&rdp->nocb_q_count);
+	*qll = atomic_long_read(&rdp->nocb_q_count_lazy);
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
-static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
-{
 	*ql = 0;
 	*qll = 0;
-}
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
+}
 #endif /* #ifdef CONFIG_RCU_TRACE */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3ec85cb5d544..e5c43b7f63f2 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2056,9 +2056,26 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
 static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
 {
 	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+	unsigned long ret;
+#ifdef CONFIG_PROVE_RCU
 	struct rcu_head *rhp;
+#endif /* #ifdef CONFIG_PROVE_RCU */
+
+	/*
+	 * Check count of all no-CBs callbacks awaiting invocation.
+	 * There needs to be a barrier before this function is called,
+	 * but associated with a prior determination that no more
+	 * callbacks would be posted.  In the worst case, the first
+	 * barrier in _rcu_barrier() suffices (but the caller cannot
+	 * necessarily rely on this, not a substitute for the caller
+	 * getting the concurrency design right!).  There must also be
+	 * a barrier between the following load and posting of a callback
+	 * (if a callback is in fact needed).  This is associated with an
+	 * atomic_inc() in the caller.
+	 */
+	ret = atomic_long_read(&rdp->nocb_q_count);
 
-	/* No-CBs CPUs might have callbacks on any of three lists. */
+#ifdef CONFIG_PROVE_RCU
 	rhp = ACCESS_ONCE(rdp->nocb_head);
 	if (!rhp)
 		rhp = ACCESS_ONCE(rdp->nocb_gp_head);
@@ -2072,8 +2089,9 @@ static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
 		       cpu, rhp->func);
 		WARN_ON_ONCE(1);
 	}
+#endif /* #ifdef CONFIG_PROVE_RCU */
 
-	return !!rhp;
+	return !!ret;
 }
 
 /*
@@ -2095,9 +2113,10 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
 	struct task_struct *t;
 
 	/* Enqueue the callback on the nocb list and update counts. */
+	atomic_long_add(rhcount, &rdp->nocb_q_count);
+	/* rcu_barrier() relies on ->nocb_q_count add before xchg. */
 	old_rhpp = xchg(&rdp->nocb_tail, rhtp);
 	ACCESS_ONCE(*old_rhpp) = rhp;
-	atomic_long_add(rhcount, &rdp->nocb_q_count);
 	atomic_long_add(rhcount_lazy, &rdp->nocb_q_count_lazy);
 	smp_mb__after_atomic(); /* Store *old_rhpp before _wake test. */
 
@@ -2288,9 +2307,6 @@ wait_again:
 		/* Move callbacks to wait-for-GP list, which is empty. */
 		ACCESS_ONCE(rdp->nocb_head) = NULL;
 		rdp->nocb_gp_tail = xchg(&rdp->nocb_tail, &rdp->nocb_head);
-		rdp->nocb_gp_count = atomic_long_xchg(&rdp->nocb_q_count, 0);
-		rdp->nocb_gp_count_lazy =
-			atomic_long_xchg(&rdp->nocb_q_count_lazy, 0);
 		gotcbs = true;
 	}
 
@@ -2338,9 +2354,6 @@ wait_again:
 		/* Append callbacks to follower's "done" list. */
 		tail = xchg(&rdp->nocb_follower_tail, rdp->nocb_gp_tail);
 		*tail = rdp->nocb_gp_head;
-		atomic_long_add(rdp->nocb_gp_count, &rdp->nocb_follower_count);
-		atomic_long_add(rdp->nocb_gp_count_lazy,
-				&rdp->nocb_follower_count_lazy);
 		smp_mb__after_atomic(); /* Store *tail before wakeup. */
 		if (rdp != my_rdp && tail == &rdp->nocb_follower_head) {
 			/*
@@ -2415,13 +2428,11 @@ static int rcu_nocb_kthread(void *arg)
 		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "WokeNonEmpty");
 		ACCESS_ONCE(rdp->nocb_follower_head) = NULL;
 		tail = xchg(&rdp->nocb_follower_tail, &rdp->nocb_follower_head);
-		c = atomic_long_xchg(&rdp->nocb_follower_count, 0);
-		cl = atomic_long_xchg(&rdp->nocb_follower_count_lazy, 0);
-		rdp->nocb_p_count += c;
-		rdp->nocb_p_count_lazy += cl;
 
 		/* Each pass through the following loop invokes a callback. */
-		trace_rcu_batch_start(rdp->rsp->name, cl, c, -1);
+		trace_rcu_batch_start(rdp->rsp->name,
+				      atomic_long_read(&rdp->nocb_q_count_lazy),
+				      atomic_long_read(&rdp->nocb_q_count), -1);
 		c = cl = 0;
 		while (list) {
 			next = list->next;
@@ -2443,9 +2454,9 @@ static int rcu_nocb_kthread(void *arg)
 			list = next;
 		}
 		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
-		ACCESS_ONCE(rdp->nocb_p_count) = rdp->nocb_p_count - c;
-		ACCESS_ONCE(rdp->nocb_p_count_lazy) =
-						rdp->nocb_p_count_lazy - cl;
+		smp_mb__before_atomic();  /* _add after CB invocation. */
+		atomic_long_add(-c, &rdp->nocb_q_count);
+		atomic_long_add(-cl, &rdp->nocb_q_count_lazy);
 		rdp->n_nocbs_invoked += c;
 	}
 	return 0;


^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-20 21:16                                                                                                                 ` Linus Torvalds
  2014-12-21  3:52                                                                                                                   ` Paul E. McKenney
@ 2014-12-21 21:22                                                                                                                   ` Linus Torvalds
  2014-12-21 22:19                                                                                                                     ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-21 21:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Sat, Dec 20, 2014 at 1:16 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm, ok, I've re-acquainted myself with it. And I have to admit that I
> can't see anything wrong. The whole "update_wall_clock" and the shadow
> timekeeping state is confusing as hell, but seems fine. We'd have to
> avoid update_wall_clock for a *long* time for overflows to occur.
>
> And the overflow in 32 bits isn't that special, since the only thing
> that really matters is the overflow of "cycle_now - tkr->cycle_last"
> within the mask.
>
> So I'm not seeing anything even halfway suspicious.

.. of course, this reminds me of the "clocksource TSC unstable" issue.

The *simple* solution may actually be that the HPET itself is
buggered. That would explain both the "clocksource TSC unstable"
messages _and_ the "time went backwards, so now we're re-arming the
scheduler tick 'forever' until time has gone forwards again".

And googling for this actually shows other people seeing similar
issues, including hangs after switching to another clocksource. See
for example

   http://stackoverflow.com/questions/13796944/system-hang-with-possible-relevance-to-clocksource-tsc-unstable

which switches to acpi_pm (not HPET) and then hangs afterwards.

Of course, it may be the switching itself that causes some issue.

Btw, there's another reason to think that it's the HPET, I just realized.

DaveJ posted all his odd TSC unstable things, and the delta was pretty
damn random. But it did have a range: it was in the 1-251 second
range.

With a 14.318MHz clock (which is, I think, the normal HPET frequency),
a 32-bit overflow happens in about 300 seconds.

So the range of 1-251 seconds  is not entirely random. It's all in
that "32-bit HPET range".
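
(Quick sanity check on that number, assuming the standard 14.31818 MHz
HPET frequency:

        2^32 cycles / 14,318,180 Hz = 4,294,967,296 / 14,318,180 ~= 300 seconds

so one 32-bit wrap of the HPET corresponds to a jump somewhere in that
0-300 second window.)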

In contrast, wrt the TSC frequency, that kind of range makes no sense at all.

                          Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-21 21:22                                                                                                                   ` Linus Torvalds
@ 2014-12-21 22:19                                                                                                                     ` Linus Torvalds
  2014-12-21 22:32                                                                                                                       ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-21 22:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Sun, Dec 21, 2014 at 1:22 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So the range of 1-251 seconds  is not entirely random. It's all in
> that "32-bit HPET range".

DaveJ, I assume it's too late now, and you don't effectively have any
access to the machine any more, but "hpet=disable" or "nohpet" on the
command line might be worth trying if you ever see that piece of
hardware again.

And for posterity, do you have a dmidecode with motherboard/BIOS
information for the problematic machine? And your configuration?

And finally, and stupidly, is there any chance that you have anything
accessing /dev/hpet?

                       Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-21 22:19                                                                                                                     ` Linus Torvalds
@ 2014-12-21 22:32                                                                                                                       ` Dave Jones
  2014-12-21 23:58                                                                                                                         ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-21 22:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Sun, Dec 21, 2014 at 02:19:03PM -0800, Linus Torvalds wrote:

 > > So the range of 1-251 seconds  is not entirely random. It's all in
 > > that "32-bit HPET range".
 > 
 > DaveJ, I assume it's too late now, and you don't effectively have any
 > access to the machine any more, but "hpet=disable" or "nohpet" on the
 > command line might be worth trying if you ever see that piece of
 > hardware again.

I can give it a try tomorrow. I'm probably saying goodbye to that
machine on Tuesday, so we'll have 24hrs of testing at least.

 > And for posterity, do you have a dmidecode with motherboard/BIOS
 > information for the problematic machine? And your configuration?

I can grab that in the morning too.

 > And finally, and stupidly, is there any chance that you have anything
 > accessing /dev/hpet?

Not knowingly at least, but who the hell knows what systemd has its
fingers in these days.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-21 22:32                                                                                                                       ` Dave Jones
@ 2014-12-21 23:58                                                                                                                         ` Linus Torvalds
  2014-12-22  0:41                                                                                                                           ` Linus Torvalds
  2015-01-12 10:05                                                                                                                           ` Thomas Gleixner
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-21 23:58 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Thomas Gleixner, Chris Mason,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Sasha Levin, Paul E. McKenney, Linux Kernel Mailing List,
	Suresh Siddha, Oleg Nesterov, Peter Anvin

On Sun, Dec 21, 2014 at 2:32 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
> On Sun, Dec 21, 2014 at 02:19:03PM -0800, Linus Torvalds wrote:
> >
>  > And finally, and stupidly, is there any chance that you have anything
>  > accessing /dev/hpet?
>
> Not knowingly at least, but who the hell knows what systemd has its
> fingers in these days.

Actually, it looks like /dev/hpet doesn't allow write access.

I can do the mmap(/dev/mem) thing and access the HPET by hand, and
when I write zero to it I immediately get something like this:

  Clocksource tsc unstable (delta = -284317725450 ns)
  Switched to clocksource hpet

just to confirm that yes, a jump in the HPET counter would indeed give
those kinds of symptoms: blaming the TSC with a negative delta in the
0-300s range, even though it's the HPET that is broken.

And if the HPET then occasionally jumps around afterwards, it would
show up as ktime_get() occasionally going backwards, which in turn
would - as far as I can tell - result in exactly that pseudo-infinite
loop with timers.

Anyway, any wild kernel pointer access *could* happen to just hit the
HPET and write to the main counter value, although I'd personally be
more inclined to blame BIOS/SMM kind of code playing tricks with
time.. We do have a few places where we explicitly write the value on
purpose, but they are in the HPET init code, and in the clocksource
resume code, so they should not be involved.
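
(Going from memory, the explicit writes I mean are the counter resets
in the init/resume paths, roughly:

        static void hpet_reset_counter(void)
        {
                hpet_writel(0, HPET_COUNTER);
                hpet_writel(0, HPET_COUNTER + 4);
        }

and those only run at HPET init and clocksource resume, as said above.)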

Thomas - have you had reports of HPET breakage in RT circles, the same
way BIOSes have been tinkering with TSC?

Also, would it perhaps be a good idea to make "ktime_get()" save the
last time in a percpu variable, and warn if time ever goes backwards
on a particular CPU?  A percpu thing should be pretty cheap, even if
we write to it every time somebody asks for time..
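
Something like this completely untested sketch - the naming and the
exact hook point are just illustrative - is what I have in mind:

        /* Untested sketch: flag ktime_get() going backwards on a given CPU.     */
        /* Racy across preemption/migration, but good enough as a debug warning. */
        static DEFINE_PER_CPU(u64, last_ktime_ns);

        static void ktime_sanity_check(ktime_t now)
        {
                u64 ns = ktime_to_ns(now);

                if (unlikely(ns < this_cpu_read(last_ktime_ns)))
                        WARN_ONCE(1, "ktime_get() went backwards\n");
                this_cpu_write(last_ktime_ns, ns);
        }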

                       Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-21 23:58                                                                                                                         ` Linus Torvalds
@ 2014-12-22  0:41                                                                                                                           ` Linus Torvalds
  2014-12-22  0:52                                                                                                                             ` Linus Torvalds
  2014-12-22 19:47                                                                                                                             ` Linus Torvalds
  2015-01-12 10:05                                                                                                                           ` Thomas Gleixner
  1 sibling, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-22  0:41 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Thomas Gleixner, Chris Mason,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Sasha Levin, Paul E. McKenney, Linux Kernel Mailing List,
	Suresh Siddha, Oleg Nesterov, Peter Anvin

[-- Attachment #1: Type: text/plain, Size: 1534 bytes --]

On Sun, Dec 21, 2014 at 3:58 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I can do the mmap(/dev/mem) thing and access the HPET by hand, and
> when I write zero to it I immediately get something like this:
>
>   Clocksource tsc unstable (delta = -284317725450 ns)
>   Switched to clocksource hpet
>
> just to confirm that yes, a jump in the HPET counter would indeed give
> those kinds of symptoms: blaming the TSC with a negative delta in the
> 0-300s range, even though it's the HPET that is broken.
>
> And if the HPET then occasionally jumps around afterwards, it would
> show up as ktime_get() occasionally going backwards, which in turn
> would - as far as I can tell - result in exactly that pseudo-infinite
> loop with timers.

Ok, so I tried that too.

It's actually a pretty easy experiment to do: just mmap(/dev/mem) at
the HPET offset (the kernel prints it out at boot, it should normally
be at 0xfed00000). And then just write a zero to offset 0xf0, which is
the main counter.

The first time, you get the "Clocksource tsc unstable".

The second time (or third, or fourth - it might not take immediately)
you get a lockup or similar. Bad things happen.

This is *not* to say that this is the bug you're hitting. But it does show that

 (a) a flaky HPET can do some seriously bad stuff
 (b) the kernel is very fragile wrt time going backwards.

and maybe we can use this test program to at least try to alleviate problem (b).

Trivial HPET mess-up program attached.

                                Linus

[-- Attachment #2: hpet-mess.c --]
[-- Type: text/x-csrc, Size: 479 bytes --]

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <stdio.h>


int main(int argc, char **argv)
{
	int fd = open("/dev/mem", O_RDWR);
	void *base;

	if (fd < 0) {
		fputs("Unable to open /dev/mem\n", stderr);
		return -1;
	}
	/* Map one page of the HPET register block (normally at 0xfed00000) */
	base = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0xfed00000);
	if (base == MAP_FAILED) {
		fputs("Unable to mmap HPET\n", stderr);
		return -1;
	}
	/* Offset 0xf0 is the HPET main counter register - zero it */
	*(unsigned long *) ((char *) base + 0xf0) = 0;
	return 0;
}

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-22  0:41                                                                                                                           ` Linus Torvalds
@ 2014-12-22  0:52                                                                                                                             ` Linus Torvalds
  2014-12-22  1:22                                                                                                                               ` Dave Jones
  2014-12-22  3:11                                                                                                                               ` Paul E. McKenney
  2014-12-22 19:47                                                                                                                             ` Linus Torvalds
  1 sibling, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-22  0:52 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Thomas Gleixner, Chris Mason,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Sasha Levin, Paul E. McKenney, Linux Kernel Mailing List,
	Suresh Siddha, Oleg Nesterov, Peter Anvin

On Sun, Dec 21, 2014 at 4:41 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The second time (or third, or fourth - it might not take immediately)
> you get a lockup or similar. Bad things happen.

I've only tested it twice now, but the first time I got a weird
lockup-like thing (things *kind* of worked, but I could imagine that
one CPU was stuck with a lock held, because things eventually ground
to a screeching halt).

The second time I got

  INFO: rcu_sched self-detected stall on CPU { 5}  (t=84533 jiffies
g=11971 c=11970 q=17)

and then

   INFO: rcu_sched detected stalls on CPUs/tasks: { 1 2 3 4 5 6 7}
(detected by 0, t=291309 jiffies, g=12031, c=12030, q=57)

with backtraces that made no sense (because obviously no actual stall
had taken place), and with the CPUs mostly being idle.

I could easily see it resulting in your softlockup scenario too.

                          Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-22  0:52                                                                                                                             ` Linus Torvalds
@ 2014-12-22  1:22                                                                                                                               ` Dave Jones
  2014-12-22  3:11                                                                                                                               ` Paul E. McKenney
  1 sibling, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-22  1:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Sun, Dec 21, 2014 at 04:52:28PM -0800, Linus Torvalds wrote:
 > > The second time (or third, or fourth - it might not take immediately)
 > > you get a lockup or similar. Bad things happen.
 > 
 > I've only tested it twice now, but the first time I got a weird
 > lockup-like thing (things *kind* of worked, but I could imagine that
 > one CPU was stuck with a lock held, because things eventually ground
 > to a screeching halt).
 > 
 > The second time I got
 > 
 >   INFO: rcu_sched self-detected stall on CPU { 5}  (t=84533 jiffies
 > g=11971 c=11970 q=17)
 > 
 > and then
 > 
 >    INFO: rcu_sched detected stalls on CPUs/tasks: { 1 2 3 4 5 6 7}
 > (detected by 0, t=291309 jiffies, g=12031, c=12030, q=57)
 > 
 > with backtraces that made no sense (because obviously no actual stall
 > had taken place), and with the CPUs mostly being idle.
 > 
 > I could easily see it resulting in your softlockup scenario too.

So something trinity does when it doesn't have a better idea of
something to pass a syscall is to generate a random number.

A wild hypothesis could be that we're in one of these situations,
and we randomly generated 0xfed000f0 and passed that as a value to
a syscall, and the kernel wrote 0 to that address.

What syscall could do that, and not just fail an access_ok() or similar,
is a mystery though.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-22  0:52                                                                                                                             ` Linus Torvalds
  2014-12-22  1:22                                                                                                                               ` Dave Jones
@ 2014-12-22  3:11                                                                                                                               ` Paul E. McKenney
  1 sibling, 0 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-22  3:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Thomas Gleixner, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Sun, Dec 21, 2014 at 04:52:28PM -0800, Linus Torvalds wrote:
> On Sun, Dec 21, 2014 at 4:41 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > The second time (or third, or fourth - it might not take immediately)
> > you get a lockup or similar. Bad things happen.
> 
> I've only tested it twice now, but the first time I got a weird
> lockup-like thing (things *kind* of worked, but I could imagine that
> one CPU was stuck with a lock held, because things eventually ground
> to a screeching halt).
> 
> The second time I got
> 
>   INFO: rcu_sched self-detected stall on CPU { 5}  (t=84533 jiffies
> g=11971 c=11970 q=17)
> 
> and then
> 
>    INFO: rcu_sched detected stalls on CPUs/tasks: { 1 2 3 4 5 6 7}
> (detected by 0, t=291309 jiffies, g=12031, c=12030, q=57)
> 
> with backtraces that made no sense (because obviously no actual stall
> had taken place), and with the CPUs mostly being idle.

Yep, if time gets messed up too much, RCU can incorrectly decide that
21 seconds have elapsed since the grace period started, and can even
decide this pretty much immediately after the grace period starts.
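
(Schematically - not the exact code - the stall check is just a jiffies
comparison, so if jiffies leaps forward along with the messed-up clock,
it fires right away:

        /* when a grace period starts: */
        rsp->jiffies_stall = jiffies + rcu_jiffies_till_stall_check(); /* ~21s by default */

        /* later, in the stall check: */
        if (ULONG_CMP_GE(jiffies, rsp->jiffies_stall))
                print_cpu_stall(rsp);   /* reports a "stall" */

even though nothing was actually stuck.)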

							Thanx, Paul

> I could easily see it resulting in your softlockup scenario too.
> 
>                           Linus
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-22  0:41                                                                                                                           ` Linus Torvalds
  2014-12-22  0:52                                                                                                                             ` Linus Torvalds
@ 2014-12-22 19:47                                                                                                                             ` Linus Torvalds
  2014-12-22 20:06                                                                                                                               ` Linus Torvalds
                                                                                                                                                 ` (2 more replies)
  1 sibling, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-22 19:47 UTC (permalink / raw)
  To: Dave Jones, Thomas Gleixner, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin, John Stultz

[-- Attachment #1: Type: text/plain, Size: 4578 bytes --]

On Sun, Dec 21, 2014 at 4:41 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> This is *not* to say that this is the bug you're hitting. But it does show that
>
>  (a) a flaky HPET can do some seriously bad stuff
>  (b) the kernel is very fragile wrt time going backwards.
>
> and maybe we can use this test program to at least try to alleviate problem (b).

Ok, so after several false starts (ktime_get() is really really
fragile - called in scheduler things, and doing magic things at
bootup), here is something that seems to alleviate the problem for me.

I still get a lot of RCU  messages like "self-detected stall" etc, but
that's to be expected. When the clock does odd things, crap *will*
happen.

But what this does is:

 (a) make the error more visible as a clock error rather than various
random downstream users

     IOW, it prints things out when it looks like we're getting odd
clock read errors (arbitrary cut-off: we expect clock read-outs to be
within 1/8th of the range of the expected clock value)

 (b) try to alleviate the horrible things that happen when the clock
error is big

     The patch tries to "correct" for the huge time jump by basically
undoing it. We'll still see time jumps (there really is no way to
avoid it), but we limit the range of them.

With the attached patch, my machine seems to survive me writing to the
HPET master counter register. It spews warnings, and it is noisy about
the odd clock reads:

    ...
    Clocksource hpet had cycles off by 642817751
    Cutting it too close for hpet in in update_wall_time (offset = 4034102337)
    INFO: rcu_sched self-detected stall on CPU { 0}  (t=281743 jiffies
g=4722 c=4721 q=14)
    ...

and there may still be situations where it does horrible horrible
things due to the (smaller) time leaps, but it does seem a lot more
robust.

NOTE! There's an (intentional) difference in how we handle the time
leaps at time read time vs write (wall-clock update).

At time read time, we just refuse to believe the big delta, and we set
the "cycle_error" value so that future time reads will be relative to
the error we just got. We also don't print anything out, because we're
possibly deep in the scheduler or in tracing, and we do not want to
spam the console about our fixup.

At time *write* time, we first report about any read-time errors, and
then we report (but believe in) overlarge clocksource delta errors as
we update the time.

This seems to be the best way to limit the damage.

Also note that the patch is entirely clock-agnostic. It's just that I
can trivially screw up my HPET, I didn't look at other clocks.

One note: my current limit of clocksource delta errors is based on the
range of the clock (1/8th of the range). I actually think that's
bogus, and it should instead be based on the expected frequency of the
clock (ie "we are guaranteed to update the wall clock at least once
every second, so if the clock source delta read is larger than one
second, we've done something wrong"). So this patch is meant very much
as an RFC, rather than anything else. It's pretty hacky. But it does
actually make a huge difference for me wrt the "mess up HPET time on
purpose". That used to crash my machine pretty hard, and pretty
reliably. With this patch, I've done it ten+ times, and while it spews
a lot of garbage, the machine stays up and _works_.

Making the sanity check tighter (ie the "one second" band rather than
"1/8th of the clock range") would probably just improve it further.

Thomas, what do you think? Hate it? Any better ideas?

And again: this is not trying to make the kernel clock not jump. There
is no way I can come up with even in theory to try to really *fix* a
fundamentally broken clock.

So this is not meant to be a real "fix" for anything, but is meant to
make sure that if the clock is unreliable, we pinpoint the clock
itself, and it mitigates the absolutely horrendously bad behavior we
currently have with bad clocks. So think of this as debug and band-aid
rather than "this makes clocks magically reliable".

.. and we might still lock up under some circumstances. But at least
from my limited testing, it is infinitely much better, even if it
might not be perfect. Also note that my "testing" has been writing
zero to the HPET clock (so the HPET clock difference tends to be pretty
specific), while my next step is to see what happens when I write
random values (and a lot of them).

Since I expect that to cause more problems, I thought I'd send this
RFC out before I start killing my machine again ;)

                             Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 3206 bytes --]

 include/linux/timekeeper_internal.h |  1 +
 kernel/time/timekeeping.c           | 25 ++++++++++++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 05af9a334893..0fcb60d77079 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -32,6 +32,7 @@ struct tk_read_base {
 	cycle_t			(*read)(struct clocksource *cs);
 	cycle_t			mask;
 	cycle_t			cycle_last;
+	cycle_t			cycle_error;
 	u32			mult;
 	u32			shift;
 	u64			xtime_nsec;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6a931852082f..1c842ddd567f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -140,6 +140,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	tk->tkr.read = clock->read;
 	tk->tkr.mask = clock->mask;
 	tk->tkr.cycle_last = tk->tkr.read(clock);
+	tk->tkr.cycle_error = 0;
 
 	/* Do the ns -> cycle conversion first, using original mult */
 	tmp = NTP_INTERVAL_LENGTH;
@@ -197,11 +198,17 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 	s64 nsec;
 
 	/* read clocksource: */
-	cycle_now = tkr->read(tkr->clock);
+	cycle_now = tkr->read(tkr->clock) + tkr->cycle_error;
 
 	/* calculate the delta since the last update_wall_time: */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
 
+	/* Hmm? This is really not good, we're too close to overflowing */
+	if (unlikely(delta > (tkr->mask >> 3))) {
+		tkr->cycle_error = delta;
+		delta = 0;
+	}
+
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
 	nsec >>= tkr->shift;
 
@@ -465,6 +472,16 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action)
 	update_fast_timekeeper(tk);
 }
 
+static void check_cycle_error(struct tk_read_base *tkr)
+{
+	cycle_t error = tkr->cycle_error;
+
+	if (unlikely(error)) {
+		tkr->cycle_error = 0;
+		pr_err("Clocksource %s had cycles off by %llu\n", tkr->clock->name, error);
+	}
+}
+
 /**
  * timekeeping_forward_now - update clock to the current time
  *
@@ -481,6 +498,7 @@ static void timekeeping_forward_now(struct timekeeper *tk)
 	cycle_now = tk->tkr.read(clock);
 	delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask);
 	tk->tkr.cycle_last = cycle_now;
+	check_cycle_error(&tk->tkr);
 
 	tk->tkr.xtime_nsec += delta * tk->tkr.mult;
 
@@ -1237,6 +1255,7 @@ static void timekeeping_resume(void)
 
 	/* Re-base the last cycle value */
 	tk->tkr.cycle_last = cycle_now;
+	tk->tkr.cycle_error = 0;
 	tk->ntp_error = 0;
 	timekeeping_suspended = 0;
 	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
@@ -1591,11 +1610,15 @@ void update_wall_time(void)
 	if (unlikely(timekeeping_suspended))
 		goto out;
 
+	check_cycle_error(&real_tk->tkr);
+
 #ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
 	offset = real_tk->cycle_interval;
 #else
 	offset = clocksource_delta(tk->tkr.read(tk->tkr.clock),
 				   tk->tkr.cycle_last, tk->tkr.mask);
+	if (unlikely(offset > (tk->tkr.mask >> 3)))
+		pr_err("Cutting it too close for %s in in update_wall_time (offset = %llu)\n", tk->tkr.clock->name, offset);
 #endif
 
 	/* Check if there's really nothing to do */

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-22 19:47                                                                                                                             ` Linus Torvalds
@ 2014-12-22 20:06                                                                                                                               ` Linus Torvalds
  2014-12-22 22:57                                                                                                                               ` Dave Jones
  2014-12-22 23:59                                                                                                                               ` John Stultz
  2 siblings, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-22 20:06 UTC (permalink / raw)
  To: Dave Jones, Thomas Gleixner, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin, John Stultz

On Mon, Dec 22, 2014 at 11:47 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> .. and we might still lock up under some circumstances. But at least
> from my limited testing, it is infinitely much better, even if it
> might not be perfect. Also note that my "testing" has been writing
> zero to the HPET clock (so the HPET clock difference tends to be pretty
> specific), while my next step is to see what happens when I write
> random values (and a lot of them).
>
> Since I expect that to cause more problems, I thought I'd send this
> RFC out before I start killing my machine again ;)

Ok, not horrible. Although I'd suggest not testing in a terminal
window while running X. The time jumping will confuse X input timing
and the screensaver, to the point that the machine may not be dead,
but it isn't exactly usable. Do it in a virtual console.

Again, making the limit tighter (one second?) and perhaps not trusting
insane values too much at walltime clock update time either, might
make it all work smoother still.

I did manage to confuse systemd with all the garbage the kernel
spewed, with a lot of stuff like:

   systemd-journald[779]: Failed to write entry (9 items, 276 bytes),
ignoring: Invalid argument

showing up in the logs, but I'm writing this without having had to
reboot the machine.

                         Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-22 19:47                                                                                                                             ` Linus Torvalds
  2014-12-22 20:06                                                                                                                               ` Linus Torvalds
@ 2014-12-22 22:57                                                                                                                               ` Dave Jones
  2014-12-22 23:59                                                                                                                                 ` Linus Torvalds
  2014-12-22 23:59                                                                                                                               ` John Stultz
  2 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-22 22:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin, John Stultz

On Mon, Dec 22, 2014 at 11:47:37AM -0800, Linus Torvalds wrote:
 > And again: this is not trying to make the kernel clock not jump. There
 > is no way I can come up with even in theory to try to really *fix* a
 > fundamentally broken clock.
 > 
 > So this is not meant to be a real "fix" for anything, but is meant to
 > make sure that if the clock is unreliable, we pinpoint the clock
 > itself, and it mitigates the absolutely horrendously bad behavior we
 > currently have with bad clocks. So think of this as debug and band-aid
 > rather than "this makes clocks magically reliable".
 > 
 > .. and we might still lock up under some circumstances. But at least
 > from my limited testing, it is infinitely much better, even if it
 > might not be perfect. Also note that my "testing" has been writing
 > zero to the HPET clock (so the HPET clock difference tends to be pretty
 > specific), while my next step is to see what happens when I write
 > random values (and a lot of them).
 > 
 > Since I expect that to cause more problems, I thought I'd send this
 > RFC out before I start killing my machine again ;)

I tried the nohpet thing for a few hours this morning and didn't see
anything weird, but it may have been that I just didn't run long enough.
When I saw your patch, I gave that a shot instead, with hpet enabled
again.  Just got back to find lots of messages in dmesg, but none
of the usual NMI/lockup messages.

[ 2256.430694] Clocksource tsc unstable (delta = -142018897190 ns)
[ 2256.437433] Switched to clocksource hpet
[ 2279.788605] Clocksource hpet had cycles off by 4294946559
[ 2280.191272] Clocksource hpet had cycles off by 4294905111
[ 2282.605990] Clocksource hpet had cycles off by 4294960721
[ 2284.485410] Clocksource hpet had cycles off by 4294953427
[ 2288.954844] Clocksource hpet had cycles off by 4294924880
[ 2305.202931] Clocksource hpet had cycles off by 4294960429
[ 2315.527247] Clocksource hpet had cycles off by 4294956296
[ 2318.954066] Clocksource hpet had cycles off by 4293673652
[ 2332.370923] Clocksource hpet had cycles off by 4294907221
[ 2332.739861] Clocksource hpet had cycles off by 4294919496
[ 2345.459694] Clocksource hpet had cycles off by 4294959592
[ 2346.159780] Clocksource hpet had cycles off by 4294952613
[ 2348.132071] Clocksource hpet had cycles off by 4294903415
[ 2348.207593] Clocksource hpet had cycles off by 4294966900
[ 2351.699779] Clocksource hpet had cycles off by 4294906755
[ 2354.125982] Clocksource hpet had cycles off by 4294941028
[ 2365.249438] Clocksource hpet had cycles off by 4294942458
[ 2370.247560] Clocksource hpet had cycles off by 4294927938
[ 2372.554642] Clocksource hpet had cycles off by 4294950723
[ 2377.361721] Clocksource hpet had cycles off by 4294952569
[ 2384.747820] Clocksource hpet had cycles off by 4294947263
[ 2389.133886] Clocksource hpet had cycles off by 4294967233
[ 2392.423458] Clocksource hpet had cycles off by 4294946214
[ 2397.648955] Clocksource hpet had cycles off by 4294967205
[ 2405.228015] Clocksource hpet had cycles off by 4294917938
[ 2429.571163] Clocksource hpet had cycles off by 4294957112
[ 2434.214788] Clocksource hpet had cycles off by 4294866662
[ 2438.686705] Clocksource hpet had cycles off by 4294945380
[ 2440.280478] Clocksource hpet had cycles off by 4294878090
[ 2458.370164] Clocksource hpet had cycles off by 4294875577
[ 2496.916971] Clocksource hpet had cycles off by 4294887574
[ 2516.314875] Clocksource hpet had cycles off by 4294899744
[ 2519.857221] Clocksource hpet had cycles off by 4294836752
[ 2522.696576] Clocksource hpet had cycles off by 4294965711
[ 2527.599967] Clocksource hpet had cycles off by 4294876467
[ 2528.573678] Clocksource hpet had cycles off by 4294815154
[ 2537.325296] Clocksource hpet had cycles off by 4294862624
[ 2542.296016] Clocksource hpet had cycles off by 4294954228
[ 2558.634123] Clocksource hpet had cycles off by 4294845883
[ 2560.804973] Clocksource hpet had cycles off by 4294958781
[ 2579.057030] Clocksource hpet had cycles off by 4294921012
[ 2588.139716] Clocksource hpet had cycles off by 4294950381
[ 2594.076877] Clocksource hpet had cycles off by 4294941777
[ 2597.645800] Clocksource hpet had cycles off by 4294927609
[ 2605.032338] Clocksource hpet had cycles off by 4294915823
[ 2605.239672] Clocksource hpet had cycles off by 4294952275
[ 2605.294230] Clocksource hpet had cycles off by 4294886603
[ 2609.801532] Clocksource hpet had cycles off by 4294887976
[ 2615.003674] Clocksource hpet had cycles off by 4294957202
[ 2641.039536] Clocksource hpet had cycles off by 4294943689
[ 2644.554947] Clocksource hpet had cycles off by 4294837076
[ 2648.576203] Clocksource hpet had cycles off by 4294928887
[ 2648.627249] Clocksource hpet had cycles off by 4294913656
[ 2680.465314] Clocksource hpet had cycles off by 4294963565
[ 2705.231925] Clocksource hpet had cycles off by 4294949762
[ 2708.181981] Clocksource hpet had cycles off by 4294924526
[ 2713.622343] Clocksource hpet had cycles off by 4294874217
[ 2714.725619] Clocksource hpet had cycles off by 4294961341
[ 2722.302868] Clocksource hpet had cycles off by 4294937888
[ 2723.351842] Clocksource hpet had cycles off by 4294943821
[ 2724.230634] Clocksource hpet had cycles off by 4294953908
[ 2734.508428] Clocksource hpet had cycles off by 4294900255
[ 2743.480843] Clocksource hpet had cycles off by 4294934465
[ 2748.638267] Clocksource hpet had cycles off by 4294928479
[ 2750.374907] Clocksource hpet had cycles off by 4294962242
[ 2752.883492] Clocksource hpet had cycles off by 4294961729
[ 2762.358287] Clocksource hpet had cycles off by 4294957673
[ 2777.020231] Clocksource hpet had cycles off by 4294951532
[ 2789.811124] Clocksource hpet had cycles off by 4294832640
[ 2808.599221] perf interrupt took too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 2812.309248] Clocksource hpet had cycles off by 4294959285
[ 2820.167264] Clocksource hpet had cycles off by 4294922522
[ 2825.981201] Clocksource hpet had cycles off by 4294961107
[ 2849.266035] Clocksource hpet had cycles off by 4294322241
[ 2862.090994] Clocksource hpet had cycles off by 4294807837
[ 2864.602231] Clocksource hpet had cycles off by 4292691328
[ 2875.567269] Clocksource hpet had cycles off by 4294892749
[ 2876.792253] Clocksource hpet had cycles off by 4294954356
[ 2877.242823] Clocksource hpet had cycles off by 4294942513
[ 2883.445407] Clocksource hpet had cycles off by 4294854046
[ 2884.018795] Clocksource hpet had cycles off by 4294943942
[ 2924.508029] Clocksource hpet had cycles off by 4294913383
[ 2928.152682] Clocksource hpet had cycles off by 4294951984
[ 2942.827401] Clocksource hpet had cycles off by 4294905917
[ 2979.433869] Clocksource hpet had cycles off by 4294939435
[ 3003.908148] Clocksource hpet had cycles off by 4294961616
[ 3038.850267] Clocksource hpet had cycles off by 4294927946
[ 3044.954639] Clocksource hpet had cycles off by 4294958095
[ 3061.620024] Clocksource hpet had cycles off by 4294885942
[ 3073.112084] Clocksource hpet had cycles off by 4294904313
[ 3102.306426] Clocksource hpet had cycles off by 4294886586
[ 3132.345432] Clocksource hpet had cycles off by 4294938217
[ 3138.942577] Clocksource hpet had cycles off by 4294924394
[ 3142.071929] Clocksource hpet had cycles off by 4294907682
[ 3146.492397] Clocksource hpet had cycles off by 4294864373
[ 3166.314671] Clocksource hpet had cycles off by 4294953858
[ 3178.918199] Clocksource hpet had cycles off by 4294942016
[ 3188.422027] Clocksource hpet had cycles off by 4294951735
[ 3194.288347] Clocksource hpet had cycles off by 4294955749
[ 3209.949342] Clocksource hpet had cycles off by 4294954855
[ 3210.908565] Clocksource hpet had cycles off by 4294958025
[ 3215.309835] Clocksource hpet had cycles off by 4294903310
[ 3215.478186] Clocksource hpet had cycles off by 4294925603
[ 3220.315692] Clocksource hpet had cycles off by 4294921130
[ 3240.091173] Clocksource hpet had cycles off by 4294965151
[ 3241.310332] Clocksource hpet had cycles off by 4294920337
[ 3256.946306] Clocksource hpet had cycles off by 4294895161
[ 3274.502015] Clocksource hpet had cycles off by 4294954311
[ 3292.591837] Clocksource hpet had cycles off by 4294950141
[ 3303.335861] Clocksource hpet had cycles off by 4294803418
[ 3326.010088] Clocksource hpet had cycles off by 4294841425
[ 3326.811161] Clocksource hpet had cycles off by 4294962464
[ 3330.472170] Clocksource hpet had cycles off by 4294917788
[ 3340.393483] Clocksource hpet had cycles off by 4294816705
[ 3344.508292] Clocksource hpet had cycles off by 4294856944
[ 3353.028263] Clocksource hpet had cycles off by 4294929946
[ 3358.636907] Clocksource hpet had cycles off by 4294902855
[ 3393.123323] Clocksource hpet had cycles off by 4294954661
[ 3395.373158] Clocksource hpet had cycles off by 4294938408
[ 3396.705795] Clocksource hpet had cycles off by 4294889623
[ 3398.205662] Clocksource hpet had cycles off by 4294879106
[ 3401.179862] Clocksource hpet had cycles off by 4294937424
[ 3411.096003] Clocksource hpet had cycles off by 4294910612
[ 3435.494311] Clocksource hpet had cycles off by 4294875772
[ 3444.877243] Clocksource hpet had cycles off by 4294899367
[ 3474.106134] Clocksource hpet had cycles off by 4294959423
[ 3474.166008] Clocksource hpet had cycles off by 4294960630
[ 3494.407860] Clocksource hpet had cycles off by 4294909541
[ 3500.723084] Clocksource hpet had cycles off by 4294925671
[ 3502.921757] Clocksource hpet had cycles off by 4294926449
[ 3508.518955] Clocksource hpet had cycles off by 4294920226
[ 3533.172055] Clocksource hpet had cycles off by 4294957897
[ 3533.185116] Clocksource hpet had cycles off by 4294913906
[ 3540.982091] Clocksource hpet had cycles off by 4294892669
[ 3543.816807] Clocksource hpet had cycles off by 4294944846
[ 3544.456059] Clocksource hpet had cycles off by 4294950209
[ 3549.528972] Clocksource hpet had cycles off by 4294866480
[ 3550.765097] Clocksource hpet had cycles off by 4294911727
[ 3552.171078] Clocksource hpet had cycles off by 4294957875
[ 3552.331984] Clocksource hpet had cycles off by 4294800210
[ 3561.750983] Clocksource hpet had cycles off by 4294879617
[ 3564.864088] Clocksource hpet had cycles off by 4294952537
[ 3566.032836] Clocksource hpet had cycles off by 4294960751
[ 3584.384888] Clocksource hpet had cycles off by 4294922502
[ 3585.272470] Clocksource hpet had cycles off by 4294949766
[ 3590.822550] Clocksource hpet had cycles off by 4294902676
[ 3593.347261] Clocksource hpet had cycles off by 4294957358
[ 3603.374532] Clocksource hpet had cycles off by 4294913488
[ 3627.928013] Clocksource hpet had cycles off by 4294946425
[ 3629.977816] Clocksource hpet had cycles off by 4294932328
[ 3631.198377] Clocksource hpet had cycles off by 4294914213
[ 3631.238396] Clocksource hpet had cycles off by 4294913701
[ 3631.333341] Clocksource hpet had cycles off by 4294841996
[ 3631.626025] Clocksource hpet had cycles off by 4294944292
[ 3631.745065] Clocksource hpet had cycles off by 4294957078
[ 3632.092954] Clocksource hpet had cycles off by 4294841257
[ 3632.454357] Clocksource hpet had cycles off by 4294961345
[ 3634.493806] Clocksource hpet had cycles off by 4294952357
[ 3635.153726] Clocksource hpet had cycles off by 4294948027
[ 3652.272623] Clocksource hpet had cycles off by 4294965136
[ 3656.902557] Clocksource hpet had cycles off by 4294927721
[ 3662.138415] Clocksource hpet had cycles off by 4294943626
[ 3664.735885] Clocksource hpet had cycles off by 4294958345
[ 3667.755405] Clocksource hpet had cycles off by 4294940238
[ 3685.903347] Clocksource hpet had cycles off by 4294962480
[ 3704.449524] Clocksource hpet had cycles off by 4294863510
[ 3728.823421] Clocksource hpet had cycles off by 4294892160
[ 3749.284831] Clocksource hpet had cycles off by 4294845549
[ 3799.003642] Clocksource hpet had cycles off by 4294880141
[ 3800.812777] Clocksource hpet had cycles off by 4294877511
[ 3806.316479] Clocksource hpet had cycles off by 4294922331
[ 3818.379672] Clocksource hpet had cycles off by 4294919579
[ 3824.536343] Clocksource hpet had cycles off by 4294916256
[ 3912.043508] Clocksource hpet had cycles off by 4294947596
[ 3914.773341] Clocksource hpet had cycles off by 4294927332
[ 3915.092603] Clocksource hpet had cycles off by 4294935220
[ 3918.711418] Clocksource hpet had cycles off by 4294922241
[ 3935.700688] Clocksource hpet had cycles off by 4294935334
[ 3944.982394] trinity-c119 (26679) used greatest stack depth: 8608 bytes left
[ 3945.144701] Clocksource hpet had cycles off by 4294942818
[ 3946.224074] Clocksource hpet had cycles off by 4294942933
[ 3952.200779] Clocksource hpet had cycles off by 4294940591
[ 3955.858738] Clocksource hpet had cycles off by 4294939648
[ 3958.188509] Clocksource hpet had cycles off by 4294923538
[ 3958.658586] Clocksource hpet had cycles off by 4294918578
[ 3975.317160] Clocksource hpet had cycles off by 4294944423
[ 3983.803077] Clocksource hpet had cycles off by 4294932553
[ 3985.311172] Clocksource hpet had cycles off by 4294947521
[ 3985.913043] Clocksource hpet had cycles off by 4294915749
[ 3998.665901] Clocksource hpet had cycles off by 4294912421
[ 4004.910893] Clocksource hpet had cycles off by 4294932330
[ 4014.406515] Clocksource hpet had cycles off by 4294916459
[ 4015.213388] Clocksource hpet had cycles off by 4294954637
[ 4024.019290] Clocksource hpet had cycles off by 4294940364
[ 4024.667464] Clocksource hpet had cycles off by 4294961248
[ 4024.937651] Clocksource hpet had cycles off by 4294956335
[ 4029.585715] Clocksource hpet had cycles off by 4294945561
[ 4030.797962] Clocksource hpet had cycles off by 4294903378
[ 4051.989581] Clocksource hpet had cycles off by 4294847969
[ 4070.615040] Clocksource hpet had cycles off by 4294902056
[ 4073.994411] Clocksource hpet had cycles off by 4294883107
[ 4080.916317] Clocksource hpet had cycles off by 4294941652
[ 4086.311462] Clocksource hpet had cycles off by 4294966537
[ 4090.010466] Clocksource hpet had cycles off by 4294950146
[ 4096.838086] Clocksource hpet had cycles off by 4294927769
[ 4104.491718] Clocksource hpet had cycles off by 4294955539
[ 4107.629077] Clocksource hpet had cycles off by 4294967271
[ 4125.898702] Clocksource hpet had cycles off by 4294964631
[ 4132.125374] Clocksource hpet had cycles off by 4294960839
[ 4142.642241] Clocksource hpet had cycles off by 4294918707
[ 4148.550288] Clocksource hpet had cycles off by 4294897635
[ 4148.745473] Clocksource hpet had cycles off by 4294965015
[ 4151.090145] Clocksource hpet had cycles off by 4294878713
[ 4152.214313] Clocksource hpet had cycles off by 4294952880
[ 4156.038261] Clocksource hpet had cycles off by 4294864713
[ 4162.478929] Clocksource hpet had cycles off by 4294945082
[ 4176.830542] Clocksource hpet had cycles off by 4294946310
[ 4195.020466] Clocksource hpet had cycles off by 4294796816
[ 4202.680429] Clocksource hpet had cycles off by 4294877002
[ 4249.878052] Clocksource hpet had cycles off by 4294949958
[ 4252.020356] Clocksource hpet had cycles off by 4294899168
[ 4252.637177] Clocksource hpet had cycles off by 4294796450
[ 4284.053708] Clocksource hpet had cycles off by 4294872739
[ 4294.238642] Clocksource hpet had cycles off by 4294860546
[ 4301.431434] Clocksource hpet had cycles off by 4294904168
[ 4307.412321] Clocksource hpet had cycles off by 4294841970
[ 4312.143379] Clocksource hpet had cycles off by 4294930826
[ 4314.093635] Clocksource hpet had cycles off by 4294910975
[ 4326.764818] Clocksource hpet had cycles off by 4294932296
[ 4329.571886] Clocksource hpet had cycles off by 4294950989
[ 4330.222706] Clocksource hpet had cycles off by 4294933891
[ 4347.066665] Clocksource hpet had cycles off by 4294880937
[ 4374.504888] Clocksource hpet had cycles off by 4294965440
[ 4382.493267] Clocksource hpet had cycles off by 4294922671
[ 4389.818116] Clocksource hpet had cycles off by 4294935624
[ 4393.376766] Clocksource hpet had cycles off by 4294925525
[ 4399.072216] Clocksource hpet had cycles off by 4294943563
[ 4419.290063] Clocksource hpet had cycles off by 4294950072
[ 4450.537859] Clocksource hpet had cycles off by 4294865975
[ 4464.675533] Clocksource hpet had cycles off by 4294925511
[ 4507.840126] Clocksource hpet had cycles off by 4294931748
[ 4508.161862] Clocksource hpet had cycles off by 4294915748
[ 4512.498940] Clocksource hpet had cycles off by 4294910106
[ 4514.624517] Clocksource hpet had cycles off by 4294955930
[ 4520.082370] Clocksource hpet had cycles off by 4294941408
[ 4531.884141] Clocksource hpet had cycles off by 4294961494
[ 4545.935603] Clocksource hpet had cycles off by 4294967229
[ 4556.241469] Clocksource hpet had cycles off by 4294941038
[ 4569.914804] Clocksource hpet had cycles off by 4294923318
[ 4570.422413] Clocksource hpet had cycles off by 4294953359
[ 4570.602296] Clocksource hpet had cycles off by 4294810379
[ 4570.794821] Clocksource hpet had cycles off by 4294915889
[ 4570.961255] Clocksource hpet had cycles off by 4294884105
[ 4575.453845] Clocksource hpet had cycles off by 4294891136
[ 4576.261732] Clocksource hpet had cycles off by 4294914711
[ 4605.592394] Clocksource hpet had cycles off by 4294948696
[ 4612.348715] Clocksource hpet had cycles off by 4294945417
[ 4634.628273] Clocksource hpet had cycles off by 4294910320
[ 4649.729320] Clocksource hpet had cycles off by 4294913504
[ 4661.279664] Clocksource hpet had cycles off by 4294956035
[ 4674.144390] Clocksource hpet had cycles off by 4294924970
[ 4712.119906] Clocksource hpet had cycles off by 4294960987
[ 4718.898730] Clocksource hpet had cycles off by 4294921730
[ 4751.991138] Clocksource hpet had cycles off by 4294899738
[ 4773.210994] Clocksource hpet had cycles off by 4294869476
[ 4789.097013] Clocksource hpet had cycles off by 4294938181
[ 4794.215907] Clocksource hpet had cycles off by 4294911648
[ 4808.794228] Clocksource hpet had cycles off by 4294958260
[ 4822.398742] Clocksource hpet had cycles off by 4294780976
[ 4822.596511] Clocksource hpet had cycles off by 4294954567
[ 4822.727585] Clocksource hpet had cycles off by 4294938077
[ 4822.817637] Clocksource hpet had cycles off by 4294936644
[ 4826.665756] Clocksource hpet had cycles off by 4294931735
[ 4838.959274] Clocksource hpet had cycles off by 4294779568
[ 4840.277796] Clocksource hpet had cycles off by 4294932970
[ 4845.753143] Clocksource hpet had cycles off by 4294954120
[ 4846.282983] Clocksource hpet had cycles off by 4294952329
[ 4864.693100] Clocksource hpet had cycles off by 4294941337
[ 4864.922970] Clocksource hpet had cycles off by 4294941269
[ 4867.012004] Clocksource hpet had cycles off by 4294937786
[ 4869.792338] Clocksource hpet had cycles off by 4294910021
[ 4870.748390] Clocksource hpet had cycles off by 4294958640
[ 4870.954498] Clocksource hpet had cycles off by 4294872876
[ 4872.812320] Clocksource hpet had cycles off by 4294885275
[ 4872.823645] Clocksource hpet had cycles off by 4294866238
[ 4872.867261] Clocksource hpet had cycles off by 4294957268
[ 4873.676168] Clocksource hpet had cycles off by 4294966242
[ 4877.229892] Clocksource hpet had cycles off by 4294883515
[ 4893.816464] Clocksource hpet had cycles off by 4294938409
[ 4894.318274] Clocksource hpet had cycles off by 4294908408
[ 4905.959500] Clocksource hpet had cycles off by 4294937652
[ 4924.827197] Clocksource hpet had cycles off by 4294957667
[ 4925.456653] Clocksource hpet had cycles off by 4294960141
[ 4932.326392] Clocksource hpet had cycles off by 4294907128
[ 4934.732113] Clocksource hpet had cycles off by 4294948395
[ 4934.743365] Clocksource hpet had cycles off by 4294930445
[ 4942.552790] Clocksource hpet had cycles off by 4294874035
[ 4945.457208] Clocksource hpet had cycles off by 4294929837
[ 4945.647660] Clocksource hpet had cycles off by 4294921846
[ 4950.555687] Clocksource hpet had cycles off by 4294909405
[ 4976.967578] Clocksource hpet had cycles off by 4294950121
[ 4979.117036] Clocksource hpet had cycles off by 4294940069
[ 5002.403831] Clocksource hpet had cycles off by 4294936372
[ 5020.353880] Clocksource hpet had cycles off by 4294930262
[ 5036.692914] Clocksource hpet had cycles off by 4294952001
[ 5045.171781] Clocksource hpet had cycles off by 4294898045
[ 5071.963409] Clocksource hpet had cycles off by 4294796101
[ 5078.613560] Clocksource hpet had cycles off by 4294882096
[ 5079.087553] Clocksource hpet had cycles off by 4294820969
[ 5081.004328] Clocksource hpet had cycles off by 4294851267
[ 5082.917976] Clocksource hpet had cycles off by 4294926333
[ 5087.054632] Clocksource hpet had cycles off by 4294939955
[ 5093.631733] Clocksource hpet had cycles off by 4294926983
[ 5094.330283] Clocksource hpet had cycles off by 4294941989
[ 5110.395745] Clocksource hpet had cycles off by 4294873876
[ 5110.435634] Clocksource hpet had cycles off by 4294875223
[ 5110.509490] Clocksource hpet had cycles off by 4294962589
[ 5110.589587] Clocksource hpet had cycles off by 4294960565
[ 5128.092220] Clocksource hpet had cycles off by 4294921061
[ 5132.569735] Clocksource hpet had cycles off by 4294919599
[ 5156.547880] Clocksource hpet had cycles off by 4294890788
[ 5159.944541] Clocksource hpet had cycles off by 4294910541
[ 5160.333493] Clocksource hpet had cycles off by 4294922199
[ 5162.315113] Clocksource hpet had cycles off by 4294882633
[ 5181.319714] Clocksource hpet had cycles off by 4294945810
[ 5182.884907] Clocksource hpet had cycles off by 4294858390
[ 5186.778012] Clocksource hpet had cycles off by 4294924884
[ 5197.855842] Clocksource hpet had cycles off by 4294864213
[ 5197.952885] Clocksource hpet had cycles off by 4294905809
[ 5198.679846] Clocksource hpet had cycles off by 4294943313
[ 5199.982167] Clocksource hpet had cycles off by 4294899268
[ 5204.351106] Clocksource hpet had cycles off by 4294878340
[ 5205.110198] Clocksource hpet had cycles off by 4294885031
[ 5209.299707] Clocksource hpet had cycles off by 4294857429
[ 5239.444653] Clocksource hpet had cycles off by 4294966446
[ 5255.008651] Clocksource hpet had cycles off by 4294923526
[ 5257.772678] Clocksource hpet had cycles off by 4294842949
[ 5259.896201] Clocksource hpet had cycles off by 4294918089
[ 5284.770634] Clocksource hpet had cycles off by 4294935032
[ 5302.299728] Clocksource hpet had cycles off by 4294945991
[ 5307.619643] Clocksource hpet had cycles off by 4294903178
[ 5315.291445] Clocksource hpet had cycles off by 4294957003
[ 5326.100708] Clocksource hpet had cycles off by 4294878037
[ 5338.598352] Clocksource hpet had cycles off by 4294951503
[ 5342.316899] Clocksource hpet had cycles off by 4294941428
[ 5344.665771] Clocksource hpet had cycles off by 4294938139
[ 5356.528904] Clocksource hpet had cycles off by 4294938129
[ 5380.477968] Clocksource hpet had cycles off by 4294896543
[ 5407.848043] Clocksource hpet had cycles off by 4294955079
[ 5410.097635] Clocksource hpet had cycles off by 4294942267
[ 5414.573618] Clocksource hpet had cycles off by 4294962698
[ 5414.983855] Clocksource hpet had cycles off by 4294955952
[ 5424.537540] Clocksource hpet had cycles off by 4294967133
[ 5425.499051] Clocksource hpet had cycles off by 4294937518
[ 5429.511923] Clocksource hpet had cycles off by 4294863228
[ 5430.414199] Clocksource hpet had cycles off by 4294823155
[ 5437.801616] Clocksource hpet had cycles off by 4294798993
[ 5468.287497] Clocksource hpet had cycles off by 4294891844
[ 5489.800970] Clocksource hpet had cycles off by 4294950293
[ 5510.281769] Clocksource hpet had cycles off by 4294912422
[ 5532.073472] Clocksource hpet had cycles off by 4294850674
[ 5549.445596] Clocksource hpet had cycles off by 4294962724
[ 5575.904611] Clocksource hpet had cycles off by 4294900881
[ 5579.483300] Clocksource hpet had cycles off by 4294889966
[ 5588.857974] Clocksource hpet had cycles off by 4294888621
[ 5589.455840] Clocksource hpet had cycles off by 4294914245
[ 5620.635120] Clocksource hpet had cycles off by 4294952529
[ 5620.928365] Clocksource hpet had cycles off by 4294903738
[ 5625.722839] Clocksource hpet had cycles off by 4294943093
[ 5631.856542] Clocksource hpet had cycles off by 4294839313
[ 5644.032367] Clocksource hpet had cycles off by 4294941402
[ 5647.813466] Clocksource hpet had cycles off by 4294894388
[ 5651.100292] Clocksource hpet had cycles off by 4294912467
[ 5678.273899] Clocksource hpet had cycles off by 4294922211
[ 5678.742115] Clocksource hpet had cycles off by 4294943849
[ 5695.958200] Clocksource hpet had cycles off by 4294857269
[ 5697.575874] Clocksource hpet had cycles off by 4294877152
[ 5705.158889] Clocksource hpet had cycles off by 4294914338
[ 5713.403739] Clocksource hpet had cycles off by 4294919820
[ 5729.534543] Clocksource hpet had cycles off by 4294917802
[ 5730.382999] Clocksource hpet had cycles off by 4294932957
[ 5742.686335] Clocksource hpet had cycles off by 4294926410
[ 5745.702133] Clocksource hpet had cycles off by 4294961654
[ 5746.141520] Clocksource hpet had cycles off by 4294966741
[ 5769.528576] Clocksource hpet had cycles off by 4294958417
[ 5784.972097] Clocksource hpet had cycles off by 4294923535
[ 5787.578966] Clocksource hpet had cycles off by 4294946783
[ 5813.633583] Clocksource hpet had cycles off by 4294951596
[ 5820.970202] Clocksource hpet had cycles off by 4294939310
[ 5830.287369] Clocksource hpet had cycles off by 4294902872
[ 5837.819734] Clocksource hpet had cycles off by 4294949911
[ 5838.588926] Clocksource hpet had cycles off by 4294954964
[ 5841.180607] Clocksource hpet had cycles off by 4294909572
[ 5848.836677] Clocksource hpet had cycles off by 4294902509
[ 5863.227184] Clocksource hpet had cycles off by 4294919298
[ 5884.519000] Clocksource hpet had cycles off by 4294860472
[ 5888.972993] Clocksource hpet had cycles off by 4294909499
[ 5909.998691] Clocksource hpet had cycles off by 4294797337
[ 5915.465763] Clocksource hpet had cycles off by 4294937155
[ 5915.924585] Clocksource hpet had cycles off by 4294950242
[ 5937.331075] Clocksource hpet had cycles off by 4294823436
[ 5943.651689] Clocksource hpet had cycles off by 4294905410
[ 5974.163191] Clocksource hpet had cycles off by 4294917888
[ 5977.359562] Clocksource hpet had cycles off by 4294943445
[ 5995.301687] Clocksource hpet had cycles off by 4294907800
[ 6029.468474] Clocksource hpet had cycles off by 4294957421
[ 6056.226569] Clocksource hpet had cycles off by 4294906619
[ 6062.484070] Clocksource hpet had cycles off by 4294890605
[ 6064.308707] Clocksource hpet had cycles off by 4294809203
[ 6098.279543] Clocksource hpet had cycles off by 4294802756
[ 6135.069521] Clocksource hpet had cycles off by 4294928208
[ 6135.285280] Clocksource hpet had cycles off by 4294843953
[ 6139.976491] Clocksource hpet had cycles off by 4294931068
[ 6143.423703] Clocksource hpet had cycles off by 4294942395
[ 6157.906601] Clocksource hpet had cycles off by 4294924337
[ 6177.083163] Clocksource hpet had cycles off by 4294958092
[ 6179.092624] Clocksource hpet had cycles off by 4294949225
[ 6214.012542] Clocksource hpet had cycles off by 4294947915
[ 6221.297576] Clocksource hpet had cycles off by 4294958776
[ 6226.018320] Clocksource hpet had cycles off by 4294909084
[ 6250.934516] Clocksource hpet had cycles off by 4294900612
[ 6260.946682] Clocksource hpet had cycles off by 4294929947
[ 6276.895078] Clocksource hpet had cycles off by 4294964159
[ 6283.232687] Clocksource hpet had cycles off by 4294945917
[ 6290.228100] Clocksource hpet had cycles off by 4294953716
[ 6297.087200] Clocksource hpet had cycles off by 4294909852
[ 6299.046671] Clocksource hpet had cycles off by 4294901253
[ 6299.187211] Clocksource hpet had cycles off by 4294892320
[ 6301.790171] Clocksource hpet had cycles off by 4294828528
[ 6306.810805] Clocksource hpet had cycles off by 4294921000
[ 6314.129911] Clocksource hpet had cycles off by 4294873175
[ 6327.787826] Clocksource hpet had cycles off by 4294790003
[ 6346.530949] Clocksource hpet had cycles off by 4294876632
[ 6370.781799] Clocksource hpet had cycles off by 4294950110
[ 6371.842423] Clocksource hpet had cycles off by 4294932421
[ 6374.793833] Clocksource hpet had cycles off by 4294887737
[ 6417.213990] Clocksource hpet had cycles off by 4294963984
[ 6418.003396] Clocksource hpet had cycles off by 4294822757
[ 6435.553685] Clocksource hpet had cycles off by 4294959791
[ 6464.638872] Clocksource hpet had cycles off by 4294931170
[ 6464.708534] Clocksource hpet had cycles off by 4294935528
[ 6468.948025] Clocksource hpet had cycles off by 4294907633
[ 6484.551740] Clocksource hpet had cycles off by 4294868507
[ 6484.699955] Clocksource hpet had cycles off by 4294892759
[ 6505.195090] Clocksource hpet had cycles off by 4294935944
[ 6515.271475] Clocksource hpet had cycles off by 4294904414
[ 6517.549058] Clocksource hpet had cycles off by 4294920126
[ 6519.321184] Clocksource hpet had cycles off by 4294875045
[ 6519.908060] Clocksource hpet had cycles off by 4294914921
[ 6527.724728] Clocksource hpet had cycles off by 4294897847
[ 6533.573475] Clocksource hpet had cycles off by 4294867477
[ 6557.035495] Clocksource hpet had cycles off by 4294930640
[ 6572.334089] Clocksource hpet had cycles off by 4294967233
[ 6574.727422] Clocksource hpet had cycles off by 4294899749
[ 6576.013240] Clocksource hpet had cycles off by 4294948971
[ 6588.328161] Clocksource hpet had cycles off by 4294776650
[ 6591.846962] Clocksource hpet had cycles off by 4294907791
[ 6591.890104] Clocksource hpet had cycles off by 4294862481
[ 6601.459236] Clocksource hpet had cycles off by 4294938828
[ 6610.623869] Clocksource hpet had cycles off by 4294939901
[ 6611.433532] Clocksource hpet had cycles off by 4294938116
[ 6633.251682] Clocksource hpet had cycles off by 4294927045
[ 6635.375075] Clocksource hpet had cycles off by 4294860954
[ 6656.878708] Clocksource hpet had cycles off by 4294917232
[ 6661.040050] Clocksource hpet had cycles off by 4294863572
[ 6662.034977] Clocksource hpet had cycles off by 4294927973
[ 6664.421749] Clocksource hpet had cycles off by 4294954448
[ 6672.247917] Clocksource hpet had cycles off by 4294944422
[ 6675.467838] Clocksource hpet had cycles off by 4294918962
[ 6677.270902] Clocksource hpet had cycles off by 4294860250
[ 6678.633280] Clocksource hpet had cycles off by 4294957945
[ 6685.963978] Clocksource hpet had cycles off by 4294887362
[ 6701.873035] Clocksource hpet had cycles off by 4294912353
[ 6702.662357] Clocksource hpet had cycles off by 4294915436
[ 6703.431587] Clocksource hpet had cycles off by 4294920195
[ 6704.522236] Clocksource hpet had cycles off by 4294901823
[ 6713.083410] Clocksource hpet had cycles off by 4294957243
[ 6713.114765] Clocksource hpet had cycles off by 4294937637
[ 6716.082258] Clocksource hpet had cycles off by 4294948909
[ 6718.203099] Clocksource hpet had cycles off by 4294919408
[ 6718.971994] Clocksource hpet had cycles off by 4294928882
[ 6720.708849] Clocksource hpet had cycles off by 4294959583
[ 6721.028787] Clocksource hpet had cycles off by 4294957753
[ 6726.836580] Clocksource hpet had cycles off by 4294941202
[ 6727.583387] Clocksource hpet had cycles off by 4294837664
[ 6742.105971] Clocksource hpet had cycles off by 4294966775
[ 6758.356617] Clocksource hpet had cycles off by 4294966130
[ 6762.800567] Clocksource hpet had cycles off by 4294872687
[ 6777.845714] Clocksource hpet had cycles off by 4294960889
[ 6798.288427] Clocksource hpet had cycles off by 4294895965
[ 6798.514300] Clocksource hpet had cycles off by 4294953069
[ 6798.734637] Clocksource hpet had cycles off by 4294803356
[ 6799.934953] Clocksource hpet had cycles off by 4294932050
[ 6804.485824] Clocksource hpet had cycles off by 4294881922
[ 6830.087641] Clocksource hpet had cycles off by 4294930326
[ 6838.518600] Clocksource hpet had cycles off by 4294846768
[ 6857.801095] Clocksource hpet had cycles off by 4294937726
[ 6898.388286] Clocksource hpet had cycles off by 4294785053
[ 6907.121034] Clocksource hpet had cycles off by 4294816610
[ 6926.043057] Clocksource hpet had cycles off by 4294917312
[ 6926.803442] Clocksource hpet had cycles off by 4294905471
[ 6927.075925] Clocksource hpet had cycles off by 4294867702
[ 6929.839822] Clocksource hpet had cycles off by 4294932181
[ 6954.442526] Clocksource hpet had cycles off by 4294832962
[ 6966.174058] Clocksource hpet had cycles off by 4294857017
[ 6968.395284] Clocksource hpet had cycles off by 4294964282
[ 6978.293338] Clocksource hpet had cycles off by 4294910164
[ 7020.205322] Clocksource hpet had cycles off by 4294964231
[ 7031.379481] Clocksource hpet had cycles off by 4294955343
[ 7038.454684] Clocksource hpet had cycles off by 4294965434
[ 7039.017680] Clocksource hpet had cycles off by 4294917868
[ 7066.359342] Clocksource hpet had cycles off by 4294954047
[ 7068.741632] Clocksource hpet had cycles off by 4294901425
[ 7083.980073] Clocksource hpet had cycles off by 4294940744
[ 7088.905444] Clocksource hpet had cycles off by 4294966277
[ 7099.930773] Clocksource hpet had cycles off by 4294941903
[ 7105.897852] Clocksource hpet had cycles off by 4294934229
[ 7110.933612] Clocksource hpet had cycles off by 4294953279
[ 7129.072933] Clocksource hpet had cycles off by 4294955897
[ 7129.767004] Clocksource hpet had cycles off by 4294891957
[ 7156.722538] Clocksource hpet had cycles off by 4294875889
[ 7179.913404] Clocksource hpet had cycles off by 4294957740
[ 7189.328997] Clocksource hpet had cycles off by 4294942902
[ 7210.145915] Clocksource hpet had cycles off by 4294957773
[ 7236.795287] Clocksource hpet had cycles off by 4294889295
[ 7243.509122] Clocksource hpet had cycles off by 4294922023
[ 7271.875409] Clocksource hpet had cycles off by 4294740258
[ 7297.285172] Clocksource hpet had cycles off by 4294962657
[ 7310.990716] Clocksource hpet had cycles off by 4294912929
[ 7312.084582] Clocksource hpet had cycles off by 4294848642
[ 7314.470384] Clocksource hpet had cycles off by 4294889002
[ 7315.035741] Clocksource hpet had cycles off by 4294950647
[ 7315.307236] Clocksource hpet had cycles off by 4294927050
[ 7317.355527] Clocksource hpet had cycles off by 4294934516
[ 7317.507286] Clocksource hpet had cycles off by 4294908108
[ 7332.817230] Clocksource hpet had cycles off by 4294925346
[ 7340.929623] Clocksource hpet had cycles off by 4294966964
[ 7379.138915] Clocksource hpet had cycles off by 4294947109
[ 7382.226767] Clocksource hpet had cycles off by 4294809185
[ 7386.444425] Clocksource hpet had cycles off by 4294950895
[ 7409.150469] Clocksource hpet had cycles off by 4294962611
[ 7426.042688] Clocksource hpet had cycles off by 4294934167
[ 7426.082070] Clocksource hpet had cycles off by 4294942677
[ 7430.354331] Clocksource hpet had cycles off by 4294874886
[ 7434.067328] Clocksource hpet had cycles off by 4294944448
[ 7436.374986] Clocksource hpet had cycles off by 4294958881
[ 7437.165057] Clocksource hpet had cycles off by 4294951297
[ 7437.576178] Clocksource hpet had cycles off by 4294931847
[ 7469.267601] Clocksource hpet had cycles off by 4294935388
[ 7472.926004] Clocksource hpet had cycles off by 4294927961
[ 7473.035131] Clocksource hpet had cycles off by 4294939573
[ 7477.606732] Clocksource hpet had cycles off by 4294878650
[ 7484.316221] Clocksource hpet had cycles off by 4294830436
[ 7485.829596] Clocksource hpet had cycles off by 4294912868
[ 7489.836062] Clocksource hpet had cycles off by 4294930459
[ 7517.439309] Clocksource hpet had cycles off by 4294941503
[ 7518.214518] Clocksource hpet had cycles off by 4294860491
[ 7528.973593] Clocksource hpet had cycles off by 4294927857
[ 7538.526153] Clocksource hpet had cycles off by 4294955154
[ 7540.727143] Clocksource hpet had cycles off by 4294922835
[ 7541.785108] Clocksource hpet had cycles off by 4294943237
[ 7547.270907] Clocksource hpet had cycles off by 4294957904
[ 7548.399906] Clocksource hpet had cycles off by 4294962899
[ 7553.329945] Clocksource hpet had cycles off by 4294921515
[ 7558.594497] Clocksource hpet had cycles off by 4294955867
[ 7571.296453] Clocksource hpet had cycles off by 4294965875
[ 7572.297917] Clocksource hpet had cycles off by 4294936696
[ 7584.129477] Clocksource hpet had cycles off by 4294959463
[ 7631.334389] Clocksource hpet had cycles off by 4294927694
[ 7658.144277] Clocksource hpet had cycles off by 4294850625
[ 7665.387109] Clocksource hpet had cycles off by 4294893204
[ 7714.453000] Clocksource hpet had cycles off by 4294831930
[ 7739.171354] Clocksource hpet had cycles off by 4294937098
[ 7749.445785] Clocksource hpet had cycles off by 4294931693
[ 7764.919405] Clocksource hpet had cycles off by 4294895069
[ 7772.220322] Clocksource hpet had cycles off by 4294965063
[ 7775.778338] Clocksource hpet had cycles off by 4294963388
[ 7792.889439] Clocksource hpet had cycles off by 4294949185
[ 7797.879080] Clocksource hpet had cycles off by 4294912987
[ 7800.596555] Clocksource hpet had cycles off by 4294926575
[ 7811.046111] Clocksource hpet had cycles off by 4294846442
[ 7817.048445] Clocksource hpet had cycles off by 4294906500
[ 7822.443159] Clocksource hpet had cycles off by 4294937437
[ 7849.680883] Clocksource hpet had cycles off by 4294887672
[ 7854.373766] Clocksource hpet had cycles off by 4294950629
[ 7863.277494] Clocksource hpet had cycles off by 4294966728
[ 7863.477635] Clocksource hpet had cycles off by 4294962968
[ 7864.313563] Clocksource hpet had cycles off by 4294871251
[ 7865.821339] Clocksource hpet had cycles off by 4294890597
[ 7876.291192] Clocksource hpet had cycles off by 4294949092
[ 7879.888696] Clocksource hpet had cycles off by 4294955026
[ 7910.026184] Clocksource hpet had cycles off by 4294884664
[ 7915.601307] Clocksource hpet had cycles off by 4294908311
[ 7920.607557] Clocksource hpet had cycles off by 4294920518
[ 7932.572470] Clocksource hpet had cycles off by 4294894280
[ 7937.241556] Clocksource hpet had cycles off by 4294868692
[ 7940.775581] Clocksource hpet had cycles off by 4294924933
[ 7949.147913] Clocksource hpet had cycles off by 4294965384
[ 7950.379028] Clocksource hpet had cycles off by 4294939201
[ 7999.898462] Clocksource hpet had cycles off by 4294966898
[ 8000.239317] Clocksource hpet had cycles off by 4294951726
[ 8007.101739] Clocksource hpet had cycles off by 4294860141
[ 8018.431564] Clocksource hpet had cycles off by 4294912041
[ 8038.027568] Clocksource hpet had cycles off by 4294950192
[ 8069.892109] Clocksource hpet had cycles off by 4294764480
[ 8101.450098] Clocksource hpet had cycles off by 4294961535
[ 8101.786097] Clocksource hpet had cycles off by 4294872842
[ 8102.842211] Clocksource hpet had cycles off by 4294776561
[ 8103.965146] Clocksource hpet had cycles off by 4294868328
[ 8122.896036] Clocksource hpet had cycles off by 4294842099
[ 8149.283965] Clocksource hpet had cycles off by 4294939642
[ 8154.503130] Clocksource hpet had cycles off by 4294908378
[ 8165.305183] Clocksource hpet had cycles off by 4294932669
[ 8171.667909] Clocksource hpet had cycles off by 4294840995
[ 8196.928476] Clocksource hpet had cycles off by 4294910078
[ 8229.599255] Clocksource hpet had cycles off by 4294914687
[ 8239.193123] Clocksource hpet had cycles off by 4294923080
[ 8239.650541] Clocksource hpet had cycles off by 4294956286
[ 8239.947251] Clocksource hpet had cycles off by 4294857682
[ 8241.924557] Clocksource hpet had cycles off by 4294879976
[ 8245.030679] Clocksource hpet had cycles off by 4294909712
[ 8251.775849] Clocksource hpet had cycles off by 4294922984
[ 8263.898042] Clocksource hpet had cycles off by 4294934385
[ 8266.278976] Clocksource hpet had cycles off by 4294901363
[ 8281.454261] Clocksource hpet had cycles off by 4294843228
[ 8285.870006] Clocksource hpet had cycles off by 4294867567
[ 8298.621180] Clocksource hpet had cycles off by 4294888436
[ 8298.757273] Clocksource hpet had cycles off by 4294943246
[ 8308.035690] Clocksource hpet had cycles off by 4294889036
[ 8311.154045] Clocksource hpet had cycles off by 4294886733
[ 8312.848648] Clocksource hpet had cycles off by 4294949960
[ 8312.863102] Clocksource hpet had cycles off by 4294886142
[ 8315.476778] Clocksource hpet had cycles off by 4294954936
[ 8318.814522] Clocksource hpet had cycles off by 4294959686
[ 8319.834864] Clocksource hpet had cycles off by 4294946289
[ 8349.397739] Clocksource hpet had cycles off by 4294946741
[ 8382.389404] Clocksource hpet had cycles off by 4294935992
[ 8408.794696] Clocksource hpet had cycles off by 4294927930
[ 8410.900853] Clocksource hpet had cycles off by 4294965564
[ 8424.086268] Clocksource hpet had cycles off by 4294921997
[ 8427.317733] Clocksource hpet had cycles off by 4294874276
[ 8444.811443] Clocksource hpet had cycles off by 4294962606
[ 8445.113445] Clocksource hpet had cycles off by 4294931560
[ 8462.581299] Clocksource hpet had cycles off by 4294960739
[ 8462.715757] Clocksource hpet had cycles off by 4294895796
[ 8493.622948] Clocksource hpet had cycles off by 4294966474
[ 8494.833670] Clocksource hpet had cycles off by 4294946137
[ 8501.019390] Clocksource hpet had cycles off by 4294956147
[ 8533.765179] Clocksource hpet had cycles off by 4294888459
[ 8538.547991] Clocksource hpet had cycles off by 4294808556
[ 8554.721417] Clocksource hpet had cycles off by 4294911891
[ 8556.567332] Clocksource hpet had cycles off by 4294955177
[ 8558.766400] Clocksource hpet had cycles off by 4294950299
[ 8576.729899] Clocksource hpet had cycles off by 4294894690
[ 8603.246371] Clocksource hpet had cycles off by 4294868839
[ 8632.497979] Clocksource hpet had cycles off by 4294889916
[ 8642.138536] Clocksource hpet had cycles off by 4294945200
[ 8643.223516] Clocksource hpet had cycles off by 4294864937
[ 8672.322200] Clocksource hpet had cycles off by 4294929168
[ 8690.759412] Clocksource hpet had cycles off by 4294959493
[ 8709.492255] Clocksource hpet had cycles off by 4294907059
[ 8724.280863] Clocksource hpet had cycles off by 4294947618
[ 8733.784764] Clocksource hpet had cycles off by 4294956463
[ 8736.674757] Clocksource hpet had cycles off by 4294932319
[ 8739.246240] Clocksource hpet had cycles off by 4294889784
[ 8744.718494] Clocksource hpet had cycles off by 4294955354
[ 8748.845567] Clocksource hpet had cycles off by 4294963121
[ 8756.253080] Clocksource hpet had cycles off by 4294937365
[ 8762.767729] Clocksource hpet had cycles off by 4294960073
[ 8785.657751] Clocksource hpet had cycles off by 4294913298
[ 8808.529748] Clocksource hpet had cycles off by 4294838591
[ 8813.760909] Clocksource hpet had cycles off by 4294921712
[ 8815.321019] Clocksource hpet had cycles off by 4294907336
[ 8818.077544] Clocksource hpet had cycles off by 4294934173
[ 8838.651774] Clocksource hpet had cycles off by 4294846372
[ 8840.092610] Clocksource hpet had cycles off by 4294965757
[ 8841.833566] Clocksource hpet had cycles off by 4294937637
[ 8868.628460] Clocksource hpet had cycles off by 4294932050
[ 8874.366140] Clocksource hpet had cycles off by 4294917706
[ 8874.732828] Clocksource hpet had cycles off by 4294962074
[ 8880.059558] Clocksource hpet had cycles off by 4294964809
[ 8891.447021] Clocksource hpet had cycles off by 4294906820
[ 8896.750796] Clocksource hpet had cycles off by 4294951994
[ 8898.189306] Clocksource hpet had cycles off by 4294961458
[ 8898.709462] Clocksource hpet had cycles off by 4294954950
[ 8899.369753] Clocksource hpet had cycles off by 4294945259
[ 8928.971992] Clocksource hpet had cycles off by 4294954376
[ 8929.021722] Clocksource hpet had cycles off by 4294957830
[ 8936.577523] Clocksource hpet had cycles off by 4294955418
[ 8936.687488] Clocksource hpet had cycles off by 4294954959
[ 8937.019020] Clocksource hpet had cycles off by 4294930347
[ 8937.042867] Clocksource hpet had cycles off by 4294875044
[ 8941.655197] Clocksource hpet had cycles off by 4294946703
[ 8943.261263] Clocksource hpet had cycles off by 4294846525
[ 8949.283066] Clocksource hpet had cycles off by 4294914024
[ 9002.000288] Clocksource hpet had cycles off by 4294946758
[ 9027.774997] Clocksource hpet had cycles off by 4294952261
[ 9032.419466] Clocksource hpet had cycles off by 4294849853
[ 9033.882828] Clocksource hpet had cycles off by 4294789534
[ 9054.591077] Clocksource hpet had cycles off by 4294929529
[ 9055.470296] Clocksource hpet had cycles off by 4294933377
[ 9070.914430] Clocksource hpet had cycles off by 4294889478
[ 9088.885051] Clocksource hpet had cycles off by 4294874940
[ 9091.500745] Clocksource hpet had cycles off by 4294914900
[ 9123.008947] Clocksource hpet had cycles off by 4294966056
[ 9144.431073] Clocksource hpet had cycles off by 4294901401
[ 9146.349907] Clocksource hpet had cycles off by 4294902199
[ 9149.985011] Clocksource hpet had cycles off by 4294942253
[ 9159.173212] Clocksource hpet had cycles off by 4294891821
[ 9159.706348] Clocksource hpet had cycles off by 4294842545
[ 9165.735597] Clocksource hpet had cycles off by 4294946486
[ 9184.604714] Clocksource hpet had cycles off by 4294946134
[ 9190.991487] Clocksource hpet had cycles off by 4294939336
[ 9224.390628] Clocksource hpet had cycles off by 4294961394
[ 9231.459587] Clocksource hpet had cycles off by 4294944626
[ 9241.941939] Clocksource hpet had cycles off by 4294940579
[ 9242.801999] Clocksource hpet had cycles off by 4294932526
[ 9260.811589] Clocksource hpet had cycles off by 4294932405
[ 9261.491175] Clocksource hpet had cycles off by 4294932739
[ 9268.155226] Clocksource hpet had cycles off by 4294819608
[ 9287.714075] Clocksource hpet had cycles off by 4294960443




^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-22 22:57                                                                                                                               ` Dave Jones
@ 2014-12-22 23:59                                                                                                                                 ` Linus Torvalds
  2014-12-23 14:56                                                                                                                                   ` Dave Jones
  2014-12-24  3:01                                                                                                                                   ` Dave Jones
  0 siblings, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-22 23:59 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Thomas Gleixner, Chris Mason,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Sasha Levin, Paul E. McKenney, Linux Kernel Mailing List,
	Suresh Siddha, Oleg Nesterov, Peter Anvin, John Stultz

On Mon, Dec 22, 2014 at 2:57 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
>
> I tried the nohpet thing for a few hours this morning and didn't see
> anything weird, but it may have been that I just didn't run long enough.
> When I saw your patch, I gave that a shot instead, with hpet enabled
> again.  Just got back to find lots of messages in dmesg, but none
> of the usual NMI/lockup messages.

Hmm. So my patch is a bit sloppy, and I suspect that the sloppiness
may account for some - and quite probably all - of those numbers.

Some of your numbers are pretty big (it's a 32-bit mask, so they are
all really just pretty small negative numbers), but they are still in
the 2us .. 165ms range when taking the 14MHz HPET counter into
account.  So not huge timer shifts.
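
To put numbers on that: here's a quick back-of-the-envelope conversion of a
few of the logged values (standalone userspace C, not kernel code; the
14.318MHz figure is the nominal HPET rate and just an assumption here, the
real rate is board-specific):

    #include <stdio.h>
    #include <stdint.h>

    #define HPET_HZ 14318180.0      /* nominal HPET rate - an assumption */

    int main(void)
    {
            /* a few "cycles off by" values taken from the log above */
            uint32_t off[] = { 4294967233u, 4294902855u, 4294740258u };
            unsigned int i;

            for (i = 0; i < sizeof(off) / sizeof(off[0]); i++) {
                    /* the delta is masked to 32 bits, so a value close to
                       2^32 is really a small negative number */
                    int64_t cycles = (int64_t)off[i] - ((int64_t)1 << 32);

                    printf("%u -> %lld cycles = %.1f us\n",
                           off[i], (long long)cycles,
                           (double)-cycles * 1e6 / HPET_HZ);
            }
            return 0;
    }

which works out to roughly 4.4us, 4.5ms and 15.9ms for those three samples -
comfortably inside that range.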

And the sloppiness of the patch is two-fold:

One problem with my patch is that it does that "tkr->cycle_error"
without any locking (because it's running in various environments
where locking really doesn't work, and we can't even sanely disable
preemption because we might well be inside the scheduler etc.).

So this:

          cycle_now = tkr->read(tkr->clock) + tkr->cycle_error;

          /* calculate the delta since the last update_wall_time: */
          delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);

          /* Hmm? This is really not good, we're too close to overflowing */
          if (unlikely(delta > (tkr->mask >> 3))) {
                  tkr->cycle_error = delta;
                  delta = 0;
          }

might run concurrently on two CPU's, and then that tkr->cycle_error
access isn't serialized, so a "later" read of the HPET clock could end
up being written to tkr->cycle_error before. So that can account for
small errors: you'd have a "cycle_error" that gets updated on one CPU,
and then used to correct for an "earlier" read of the clock on another
CPU, and that could make the cycle error possibly worse.

However, that first race only matters if you get errors to begin with,
so if that was the only race, it would still show that some real error
happened.

BUT.

The *bigger* problem is that since the reading side cannot hold any
locks at all, it can also race against the writing side. That's by
design, and we will use the sequence counter to recover from it and
try again, but it means that some of those small errors are just a
reader racing with the wall-time update code, and since this error
code is done _inside_ the read-sequence code, it's not aware of the
retry, and will give a false positive even if we then later on throw
the known-bad result out and re-try.

So your small negative numbers are most likely just those false positives.
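
The shape of that race, much simplified (standalone C with a toy sequence
counter, purely illustrative - not the real timekeeping code, and the names
are made up):

    #include <stdio.h>

    static unsigned seq;                   /* even = stable, odd = writer active */
    static unsigned long long cycle_last;  /* updated by the writer side */

    static unsigned long long toy_get_delta(unsigned long long now)
    {
            unsigned long long delta;
            unsigned s;

            do {
                    s = seq;                    /* read_seqcount_begin() analogue */
                    delta = now - cycle_last;   /* may be garbage mid-update */

                    /* The error check sits _inside_ the retry loop, so a
                       bogus delta caused by a concurrent writer gets
                       reported here even though the retry below throws
                       the result away: a false positive. */
                    if (delta > (~0u >> 3)) {
                            printf("cycle_error: %llu\n", delta);
                            delta = 0;
                    }
            } while (s != seq || (s & 1));      /* read_seqcount_retry() analogue */

            return delta;
    }

    int main(void)
    {
            cycle_last = 1000;
            printf("delta = %llu\n", toy_get_delta(1234));
            return 0;
    }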

I was more hoping to see some big sudden jumps on the order of your
20-second delays - the kinds of jumps that your "tsc unstable"
messages implied (which weren't in the 2us .. 165ms range, but in the
2s to 250s range)

Ugh. I guess I'll have to try to figure out a non-sloppy thing, but
quite frankly, the non-sloppy things I tried first were rather painful
failures. The sloppy thing was sloppy, but worked well to see the
disaster case.

I'll put on my thinking cap. Maybe I can move the "cycle_error" logic
to outside the sequence lock retry loop.

But in the meantime please do keep that thing running as long as you
can. Let's see if we get bigger jumps. Or perhaps we'll get a negative
result - the original softlockup bug happening *without* any bigger
hpet jumps.

                         Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-22 19:47                                                                                                                             ` Linus Torvalds
  2014-12-22 20:06                                                                                                                               ` Linus Torvalds
  2014-12-22 22:57                                                                                                                               ` Dave Jones
@ 2014-12-22 23:59                                                                                                                               ` John Stultz
  2014-12-23  0:46                                                                                                                                 ` Linus Torvalds
  2 siblings, 1 reply; 486+ messages in thread
From: John Stultz @ 2014-12-22 23:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Thomas Gleixner, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

On Mon, Dec 22, 2014 at 11:47 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Dec 21, 2014 at 4:41 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> This is *not* to say that this is the bug you're hitting. But it does show that
>>
>>  (a) a flaky HPET can do some seriously bad stuff
>>  (b) the kernel is very fragile wrt time going backwards.
>>
>> and maybe we can use this test program to at least try to alleviate problem (b).
>
> Ok, so after several false starts (ktime_get() is really really
> fragile - called in scheduler things, and doing magic things at
> bootup), here is something that seems to alleviate the problem for me.
>
> I still get a lot of RCU  messages like "self-detected stall" etc, but
> that's to be expected. When the clock does odd things, crap *will*
> happen.
>
> But what this does is:
>
>  (a) make the error more visible as a clock error rather than various
> random downstream users
>
>      IOW, it prints things out when it looks like we're getting odd
> clock read errors (arbitrary cut-off: we expect clock read-outs to be
> within 1/8th of the range of the expected clock value)

(Warning: I'm replying with my vacation goggles on)

A few thoughts from quickly looking at the patch (some of this is
repeating your comments here):

* So 1/8th of the interval seems way too short, as there's
clocksources like the ACPI PM, which wrap every 2.5 seconds or so. And
even with more reasonable clocksource wrapping intervals, the tick
scheduler may not schedule the next tick till after that time, which
could cause major problems (we don't want the hrtimer expiration
calculation to get capped out here, since the timer may be scheduled
past 1/8th of the interval, which would keep us from ever accumulating
time and clearing the cycle_error added here)

* I suspect something closer to the clocksource_max_deferment() value
(which I think is max interval before multiplication overflows could
happen - ~12%) which we use in the scheduler would make more sense.
Especially since the timer scheduler uses that to calculate how long
we can idle for.

* Nulling out delta in timekeeping_get_ns() seems like it could cause
problems since time would then possibly go backwards compared to
previous reads (as you mentioned, resulting in smaller time jumps).
Instead it would probably make more sense to cap the delta at the
maximum value (though this assumes the clock doesn't jump back in the
interval before the next call to update_wall_time).

* Also, as you note, this would just cause the big time jump to only
happen at the next update, since there's no logic in
update_wall_time() to limit the jump. I'm not sure if "believing" the
large jump at write time make that much more sense, though.

* Finally, you're writing to error while only holding a read lock, but
that's sort of a minor thing.

I do agree something that is more helpful in validating the
timekeeping here would be nice to avoid further wild goose chases in the
future.

Some possible ideas:
* Checking the accumulation interval isn't beyond the
clocksource_max_deferment() value seems like a very good check to have
in update_wall_time().

* Maybe when we schedule the next timekeeping update, the tick
scheduler could store the expected time for that to fire, and then we
could validate that we're relatively close after that value when we do
accumulate time (warning if we're running too early or far too late -
though with virtualization, defining a "reasonable" late value is
difficult).

* This "expected next tick" time could be used to try to cap read-time
intervals in a similar fashion as done here. (Of course, again, we'd
have to be careful, since if that expected next tick ends up somehow
being before the actual hrtimer expiration value, we could end up
stopping time - and the system).

I can try to add some of this when I'm back from holiday in the new year.

Maybe Thomas will have some other ideas?

thanks
-john

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-22 23:59                                                                                                                               ` John Stultz
@ 2014-12-23  0:46                                                                                                                                 ` Linus Torvalds
  2014-12-27 20:33                                                                                                                                   ` Paul E. McKenney
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-23  0:46 UTC (permalink / raw)
  To: John Stultz
  Cc: Dave Jones, Thomas Gleixner, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

On Mon, Dec 22, 2014 at 3:59 PM, John Stultz <john.stultz@linaro.org> wrote:
>
> * So 1/8th of the interval seems way too short, as there's
> clocksources like the ACPI PM, which wrap every 2.5 seconds or so.

Ugh. At the same time, 1/8th of a range is actually bigger than I'd
like, since if there is some timer corruption, it means that we only
catch it when it's really big.

But as I said, I'd actually prefer it to be time-based, because it
would be good if this approach worked on things like the TSC which is
a 64-bit counter..

So yes, that capping was very much arbitrary, and was mostly a case of
"this works with the one timer source that I can easily trigger"

> * I suspect something closer to the clocksource_max_deferment() value
> (which I think is max interval before multiplication overflows could
> happen - ~12%) which we use in the scheduler would make more sense.
> Especially since the timer scheduler uses that to calculate how long
> we can idle for.

I'd rather not be anywhere *close* to any overflow problems. Even for
the scheduler all-idle case, I'd argue that there are rather quickly
diminishing returns. Yes, a thousand timer interrupts per second are
expensive and a noticeable power draw. The difference between "one
timer interrupt every two seconds" and "every 20 seconds" is rather
less noticeable.

Of course, reasonable clock sources have *much* longer periods than a
second (yeah, the acpi pm timer really isn't a good one), so there are
probably good middle grounds. The 1/8th was a hack, and one that was
aware of the 300s cycle of the HPET at that..

> * Nulling out delta in timekeeping_get_ns() seems like it could cause
> problems since time would then possibly go backwards compared to
> previous reads (as you mentioned, resulting in smaller time jumps).
> Instead it would probably make more sense to cap the delta at the
> maximum value (though this assumes the clock doesn't jump back in the
> interval before the next call to update_wall_time).

So part of the nulling was that it was simpler, and part of it was
that I expected to get backwards jumps (see the other email to Dave
about the inherent races). And with the whole timer mask modulo
arithmetic, those backwards jumps just look like biggish positive
numbers, not even negative. So it ends up being things like "is it an
unsigned number larger than half the mask? Consider it negative" etc.
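
Roughly this kind of test, as a standalone sketch (made-up helper name, not
the kernel's actual code):

    #include <stdio.h>
    #include <stdint.h>

    /* interpret a masked, wrapped delta as signed: anything above half
       the mask is really a small negative number */
    static int64_t masked_delta_signed(uint64_t now, uint64_t last,
                                       uint64_t mask)
    {
            uint64_t d = (now - last) & mask;

            return d > (mask >> 1) ? (int64_t)d - (int64_t)(mask + 1)
                                   : (int64_t)d;
    }

    int main(void)
    {
            uint64_t mask = 0xffffffffULL;  /* 32-bit clocksource like the HPET */

            /* a clock read that went backwards by 30 cycles shows up as the
               huge unsigned delta 4294967266, i.e. really -30 */
            printf("%lld\n", (long long)masked_delta_signed(100, 130, mask));
            return 0;
    }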

The "zero it out" was simple, and it worked for my test-case, which
was "ok, my machine no longer locks up when I mess with the timer".

And I didn't post the earlier versions of that patch that didn't even *boot*.

I started out trying to do it at a higher level (not on a clock read
level, but outside the whole 'convert-to-ns and do the sequence value
check'), but during bootup we play a lot of games with initializing
the timer sources etc.

So that explains the approach of doing it at that

   cycle_now = tkr->read(tkr->clock);

level, and keeping it very low-level.

But as I already explained in the email that crossed, that low-level
thing also results in some fundamental races.

> * Also, as you note, this would just cause the big time jump to only
> happen at the next update, since there's no logic in
> update_wall_time() to limit the jump. I'm not sure if "believing" the
> large jump at write time make that much more sense, though.

So I considered just capping it there (to a single interval or
something). Again, just ignoring - like the read side does - it would
have been easier, but at the same time I *really* wanted to make time
go forward, so just taking the big value seemed safest.

But yes, this was very much an RFC patch. It's not even ready for real
use, as DaveJ found out (although it might be good enough in practice,
despite its flaws)

> * Finally, you're writing to error while only holding a read lock, but
> that's sort of a minor thing.

It's not a minor thing, but the alternatives looked worse.

I really wanted to make it per-cpu, and do this with interrupts
disabled or something. But that then pushes a big problem to the write
time to go over all cpu's and see if there are errors.

So it's not right. But .. It's a hacky patch to get discussion
started, and it's actually hard to do "right" when this code has to be
basically lockless.

> * Checking the accumulation interval isn't beyond the
> clocksource_max_deferment() value seems like a very good check to have
> in update_wall_time().

Sounds like a good idea. Also, quite frankly, reading all the code I
wasn't ever really able to figure out that things don't overflow. The
overflow protection is a bit ad-hoc (that maxshift thing in
update_wall_time() really makes baby Jesus cry, despite the season,
and it wasn't at all obvious that ntp_tick_length() is fundamentally
bigger than xtime_interval, for example).

It's also not clear that the complicated and frankly not-very-obvious
shift-loop is any faster than just using a divide - possibly with the
"single interval" case being a special case to avoid dividing then.

I was a bit nervous that the whole update of tkr.cycle_last in there
could just overrun the actual *read* value of 'tk->tkr.clock'. With
the whole offset logic split between update_wall_time() and
logarithmic_accumulation(), the code isn't exactly self-explanatory.

Heh.

> * Maybe when we schedule the next timekeeping update, the tick
> scheduler could store the expected time for that to fire, and then we
> could validate that we're relatively close after that value when we do
> accumulate time (warning if we're running too early or far too late -
> though with virtualization, defining a "reasonable" late value is
> difficult).

In general, it would be really nice to know what the expected limits
are. It was hard to impossible to figure out the interaction between
the timer subsystem and the scheduler tick. It's pretty incestuous,
and if there's an explanation for it, I missed it.

> * This "expected next tick" time could be used to try to cap read-time
> intervals in a similar fashion as done here. (Of course, again, we'd
> have to be careful, since if that expected next tick ends up somehow
> being before the actual hrtimer expiration value, we could end up
> stopping time - and the system).

I don't think you can cap them to exactly the expected value anyway,
since the wall time update *will* get delayed by locking and just
interrupts being off etc. And virtual environments will obviously make
it much worse. So the capping needs to be somewhat loose anyway.

The patch I posted was actually sloppy by design, exactly because I
had so much trouble with trying to be strict. My first patch was a
percpu thing that just limited ktime_get() from ever going backwards
on that particular cpu (really simple, really stupid), and it got
*nowhere*.

                        Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-22 23:59                                                                                                                                 ` Linus Torvalds
@ 2014-12-23 14:56                                                                                                                                   ` Dave Jones
  2014-12-24 13:58                                                                                                                                     ` Sasha Levin
  2014-12-24  3:01                                                                                                                                   ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-23 14:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin, John Stultz

On Mon, Dec 22, 2014 at 03:59:19PM -0800, Linus Torvalds wrote:
 
 > But in the meantime please do keep that thing running as long as you
 > can. Let's see if we get bigger jumps. Or perhaps we'll get a negative
 > result - the original softlockup bug happening *without* any bigger
 > hpet jumps.

It's been going for 18 hours, with just a bunch more of those hpet
messages, all in the same range.  I'll leave it go a few more hours,
before I have to wipe it, but I've got feel-good vibes about this.
Even if that patch isn't the solution, it seems like we're finally
looking in the right direction.

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-22 23:59                                                                                                                                 ` Linus Torvalds
  2014-12-23 14:56                                                                                                                                   ` Dave Jones
@ 2014-12-24  3:01                                                                                                                                   ` Dave Jones
  2014-12-26 16:34                                                                                                                                     ` Dave Jones
  1 sibling, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-24  3:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin, John Stultz

On Mon, Dec 22, 2014 at 03:59:19PM -0800, Linus Torvalds wrote:
 
 > But in the meantime please do keep that thing running as long as you
 > can. Let's see if we get bigger jumps. Or perhaps we'll get a negative
 > result - the original softlockup bug happening *without* any bigger
 > hpet jumps.

So I've got this box a *little* longer than anticipated.
It's now been running 30 hours with not a single NMI lockup.
and that's with my kitchen-sink debugging kernel.

The 'hpet off' messages continue to be spewed, and again they're
all in the same range of 4293198075 -> 4294967266

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-23 14:56                                                                                                                                   ` Dave Jones
@ 2014-12-24 13:58                                                                                                                                     ` Sasha Levin
  0 siblings, 0 replies; 486+ messages in thread
From: Sasha Levin @ 2014-12-24 13:58 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Thomas Gleixner, Chris Mason,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin, John Stultz

On 12/23/2014 09:56 AM, Dave Jones wrote:
> On Mon, Dec 22, 2014 at 03:59:19PM -0800, Linus Torvalds wrote:
>  
>  > But in the meantime please do keep that thing running as long as you
>  > can. Let's see if we get bigger jumps. Or perhaps we'll get a negative
>  > result - the original softlockup bug happening *without* any bigger
>  > hpet jumps.
> 
> It's been going for 18 hours, with just a bunch more of those hpet
> messages, all in the same range.  I'll leave it go a few more hours,
> before I have to wipe it, but I've got feel-good vibes about this.
> Even if that patch isn't the solution, It seems like we're finally
> looking in the right direction.

I've got myself a physical server to play with, and running trinity on it
seems to cause similar stalls:

 2338.389210] INFO: rcu_sched self-detected stall on CPU[ 2338.429153] INFO: rcu_sched detected stalls on CPUs/tasks:[ 2338.429164] 	16: (5999 ticks this GP) idle=4b5/140000000000001/0 softirq=24859/24860 last_accelerate: 039d/1b78, nonlazy_posted: 64, ..
[ 2338.429165] 	
[ 2338.680231] 	16: (5999 ticks this GP) idle=4b5/140000000000001/0 softirq=24859/24860 last_accelerate: 039d/1b91, nonlazy_posted: 64, ..
[ 2338.828353] 	 (t=6044 jiffies g=16473 c=16472 q=4915881)

Oddly enough, there's no stacktrace...


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-24  3:01                                                                                                                                   ` Dave Jones
@ 2014-12-26 16:34                                                                                                                                     ` Dave Jones
  2014-12-26 18:12                                                                                                                                       ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-26 16:34 UTC (permalink / raw)
  To: Linus Torvalds, Thomas Gleixner, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin, John Stultz

On Tue, Dec 23, 2014 at 10:01:25PM -0500, Dave Jones wrote:
 > On Mon, Dec 22, 2014 at 03:59:19PM -0800, Linus Torvalds wrote:
 >  
 >  > But in the meantime please do keep that thing running as long as you
 >  > can. Let's see if we get bigger jumps. Or perhaps we'll get a negative
 >  > result - the original softlockup bug happening *without* any bigger
 >  > hpet jumps.
 > 
 > So I've got this box a *little* longer than anticipated.
 > It's now been running 30 hours with not a single NMI lockup.
 > and that's with my kitchen-sink debugging kernel.
 > 
 > The 'hpet off' messages continue to be spewed, and again they're
 > all in the same range of 4293198075 -> 4294967266

In case there was any doubt remaining, it's now been running
3 days, 20 hours with no lockups at all.  I haven't seen it
run this long in months.

Either tomorrow or Sunday I'm finally wiping that box
to give it back on Monday, so if there's anything else
you'd like to try, the next 24hrs are pretty much the only
remaining time I have.

One thing I think I'll try is to try and narrow down which
syscalls are triggering those "Clocksource hpet had cycles off"
messages.  I'm still unclear on exactly what is doing
the stomping on the hpet.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-26 16:34                                                                                                                                     ` Dave Jones
@ 2014-12-26 18:12                                                                                                                                       ` Dave Jones
  2014-12-26 20:57                                                                                                                                         ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-26 18:12 UTC (permalink / raw)
  To: Linus Torvalds, Thomas Gleixner, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin, John Stultz

On Fri, Dec 26, 2014 at 11:34:10AM -0500, Dave Jones wrote:

 > One thing I think I'll try is to try and narrow down which
 > syscalls are triggering those "Clocksource hpet had cycles off"
 > messages.  I'm still unclear on exactly what is doing
 > the stomping on the hpet.

First I ran trinity with "-g vm" which limits it to use just
a subset of syscalls, specifically VM related ones.
That triggered the messages. Further experiments revealed:

-c mremap triggered it, but only when I also passed -C256
to crank up the number of child processes. The same thing
occurred with mprotect, madvise, and remap_file_pages.

I couldn't trigger it with -c mmap, or msync, mbind, move_pages,
migrate_pages, mlock, regardless of how many child processes there were.


Given the high child count necessary to trigger it,
it's nigh on impossible to weed through all the calls
that trinity made to figure out which one actually
triggered the messages.

I'm not convinced that the syscall parameters are even particularly
interesting.  The "needs high load to trigger" aspect of the bug
still smells of scheduler interaction or a side effect of lock
contention. Looking at one child's syscall params in isolation might
look quite dull, but if we have N processes hammering on the same
mapping, that's probably a lot more interesting.
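
To make that last point concrete, the load trinity generates here is
roughly many children doing something like the following against one
shared mapping (a toy sketch only, not trinity itself; the sizes and
child count are made up):

#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t sz = 1 << 20;
	void *map = mmap(NULL, sz, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	int i;

	if (map == MAP_FAILED)
		return 1;

	for (i = 0; i < 64; i++) {		/* think -C256 */
		if (fork() == 0) {
			void *cur = map;
			size_t cursz = sz;
			for (;;) {
				/* grow and shrink the mapping over and over;
				 * each pass tends to force TLB shootdowns */
				size_t newsz = (cursz == sz) ? 2 * sz : sz;
				void *m = mremap(cur, cursz, newsz,
						 MREMAP_MAYMOVE);
				if (m != MAP_FAILED) {
					cur = m;
					cursz = newsz;
				}
			}
		}
	}
	pause();
	return 0;
}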

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-26 18:12                                                                                                                                       ` Dave Jones
@ 2014-12-26 20:57                                                                                                                                         ` Linus Torvalds
  2014-12-26 21:20                                                                                                                                           ` Dave Jones
                                                                                                                                                             ` (2 more replies)
  0 siblings, 3 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-26 20:57 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Thomas Gleixner, Chris Mason,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Sasha Levin, Paul E. McKenney, Linux Kernel Mailing List,
	Suresh Siddha, Oleg Nesterov, Peter Anvin, John Stultz

[-- Attachment #1: Type: text/plain, Size: 2983 bytes --]

On Fri, Dec 26, 2014 at 10:12 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
> On Fri, Dec 26, 2014 at 11:34:10AM -0500, Dave Jones wrote:
>
>  > One thing I think I'll try is to try and narrow down which
>  > syscalls are triggering those "Clocksource hpet had cycles off"
>  > messages.  I'm still unclear on exactly what is doing
>  > the stomping on the hpet.
>
> First I ran trinity with "-g vm" which limits it to use just
> a subset of syscalls, specifically VM related ones.
> That triggered the messages. Further experiments revealed:

So I can trigger the false positives with my original patch quite
easily by just putting my box under some load. My numbers are nowhere
near as bad as yours, but then, I didn't put it under as much load
anyway. Just a regular "make -j64" of the kernel.

I suspect your false positives are bigger partly because of the load,
but mostly because you presumably have preemption enabled too. I don't
do preemption in my normal kernels, and that limits the damage of the
race a bit.

I have a newer version of the patch that gets rid of the false
positives with some ordering rules instead, and just for you I hacked
it up to say where the problem happens too, but it's likely too late.

The fact that the original racy patch seems to make a difference for
you does say that yes, we seem to be zeroing in on the right area
here, but I'm not seeing what's wrong. I was hoping for big jumps from
your HPET, since your "TSC unstable" messages do kind of imply that
such really big jumps can happen.

I'm attaching my updated hacky patch, although I assume it's much too
late for that machine. Don't look too closely at the backtrace
generation part, that's just a quick hack, and only works with frame
pointers enabled anyway.

So I'm still a bit unhappy about not figuring out *what* is wrong. And
I'd still like the dmidecode from that machine, just for posterity. In
case we can figure out some pattern.

So right now I can imagine several reasons:

 - actual hardware bug.

   This is *really* unlikely, though. It should hit everybody. The
HPET is in the core Intel chipset; we're not talking random unusual
hardware by fly-by-night vendors here.

 - some SMM/BIOS "power management" feature.

   We've seen this before, where the SMM saves/restores the TSC on
entry/exit in order to hide itself from the system. I could imagine
similar code for the HPET counter. SMM writers use some bad drugs to
dull their pain.

   And with the HPET counter, since it's not even per-CPU, the "save
and restore HPET" will actually show up as "HPET went backwards" to
the other, non-SMM CPUs if it happens (sketched below).

 - a bug in our own clocksource handling.

   I'm not seeing it. But maybe my patch hides it for some magical reason.

 - gremlins.
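
(The sketch mentioned above - a user-space toy, not kernel code, just
to show why a counter that steps backwards looks like a huge forward
jump to an unsigned reader:)

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t mask = 0xffffffffULL;	/* 32-bit HPET counter */
	uint64_t last = 1000000;	/* cycle_last sampled before SMM */
	uint64_t now  =  999000;	/* counter after SMM restored an older value */
	uint64_t delta = (now - last) & mask;

	printf("apparent delta = %llu cycles\n", (unsigned long long)delta);
	/* prints 4294966296 - at a typical 14.3MHz HPET that reads as a
	 * ~300 second forward jump, not as a 1000-cycle backwards step */
	return 0;
}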

So I dunno. I hope more people will look at this after the holidays,
even if your machine is gone. My test-program to do bad things to the
HPET shows *something*, and works on any machine.

                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 5169 bytes --]

 arch/x86/kernel/entry_64.S          |  5 +++
 include/linux/timekeeper_internal.h |  1 +
 kernel/time/timekeeping.c           | 78 +++++++++++++++++++++++++++++++++++--
 3 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 9ebaf63ba182..0a4c34b4658e 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -312,6 +312,11 @@ ENTRY(save_paranoid)
 	CFI_ENDPROC
 END(save_paranoid)
 
+ENTRY(save_back_trace)
+	movq %rbp,%rdi
+	jmp do_save_back_trace
+END(save_back_trace)
+
 /*
  * A newly forked process directly context switches into this address.
  *
diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 05af9a334893..0fcb60d77079 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -32,6 +32,7 @@ struct tk_read_base {
 	cycle_t			(*read)(struct clocksource *cs);
 	cycle_t			mask;
 	cycle_t			cycle_last;
+	cycle_t			cycle_error;
 	u32			mult;
 	u32			shift;
 	u64			xtime_nsec;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6a931852082f..1c924c80b462 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -140,6 +140,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	tk->tkr.read = clock->read;
 	tk->tkr.mask = clock->mask;
 	tk->tkr.cycle_last = tk->tkr.read(clock);
+	tk->tkr.cycle_error = 0;
 
 	/* Do the ns -> cycle conversion first, using original mult */
 	tmp = NTP_INTERVAL_LENGTH;
@@ -191,16 +192,59 @@ u32 (*arch_gettimeoffset)(void) = default_arch_gettimeoffset;
 static inline u32 arch_gettimeoffset(void) { return 0; }
 #endif
 
+unsigned long tracebuffer[16];
+
+extern void save_back_trace(long dummy, void *ptr);
+
+void do_save_back_trace(long rbp, void *ptr)
+{
+	int i;
+	unsigned long frame = rbp;
+
+	for (i = 0; i < 15; i++) {
+		unsigned long nextframe = ((unsigned long *)frame)[0];
+		unsigned long rip = ((unsigned long *)frame)[1];
+		tracebuffer[i] = rip;
+		if ((nextframe ^ frame) >> 13)
+			break;
+		if (nextframe <= frame)
+			break;
+		frame = nextframe;
+	}
+	tracebuffer[i] = 0;
+}
+
+/*
+ * At read time, we read "cycle_last" *before* we read
+ * the clock.
+ *
+ * At write time, we read the clock before we update
+ * 'cycle_last'.
+ *
+ * Thus, any 'cycle_last' value read here *must* be smaller
+ * than the clock read. Unless the clock is buggy.
+ */
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
-	cycle_t cycle_now, delta;
+	cycle_t cycle_last, cycle_now, delta;
 	s64 nsec;
 
+	/* Read previous cycle - *before* reading clocksource */
+	cycle_last = smp_load_acquire(&tkr->cycle_last);
+
 	/* read clocksource: */
-	cycle_now = tkr->read(tkr->clock);
+	cycle_now = smp_load_acquire(&tkr->cycle_error);
+	cycle_now += tkr->read(tkr->clock);
 
 	/* calculate the delta since the last update_wall_time: */
-	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+	delta = clocksource_delta(cycle_now, cycle_last, tkr->mask);
+
+	/* Hmm? This is really not good, we're too close to overflowing */
+	if (unlikely(delta > (tkr->mask >> 3))) {
+		smp_store_release(&tkr->cycle_error, delta);
+		delta = 0;
+		save_back_trace(0, tracebuffer);
+	}
 
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
 	nsec >>= tkr->shift;
@@ -465,6 +509,28 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action)
 	update_fast_timekeeper(tk);
 }
 
+static void check_cycle_error(struct tk_read_base *tkr)
+{
+	cycle_t error = tkr->cycle_error;
+
+	if (unlikely(error)) {
+		int i;
+		const char *sign = "";
+		tkr->cycle_error = 0;
+		if (error > tkr->mask/2) {
+			error = tkr->mask - error + 1;
+			sign = "-";
+		}
+		pr_err("Clocksource %s had cycles off by %s%llu\n", tkr->clock->name, sign, error);
+		for (i = 0; i < 16; i++) {
+			unsigned long rip = tracebuffer[i];
+			if (!rip)
+				break;
+			printk("  %pS\n", (void *)rip);
+		}
+	}
+}
+
 /**
  * timekeeping_forward_now - update clock to the current time
  *
@@ -481,6 +547,7 @@ static void timekeeping_forward_now(struct timekeeper *tk)
 	cycle_now = tk->tkr.read(clock);
 	delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask);
 	tk->tkr.cycle_last = cycle_now;
+	check_cycle_error(&tk->tkr);
 
 	tk->tkr.xtime_nsec += delta * tk->tkr.mult;
 
@@ -1237,6 +1304,7 @@ static void timekeeping_resume(void)
 
 	/* Re-base the last cycle value */
 	tk->tkr.cycle_last = cycle_now;
+	tk->tkr.cycle_error = 0;
 	tk->ntp_error = 0;
 	timekeeping_suspended = 0;
 	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
@@ -1591,11 +1659,15 @@ void update_wall_time(void)
 	if (unlikely(timekeeping_suspended))
 		goto out;
 
+	check_cycle_error(&real_tk->tkr);
+
 #ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
 	offset = real_tk->cycle_interval;
 #else
 	offset = clocksource_delta(tk->tkr.read(tk->tkr.clock),
 				   tk->tkr.cycle_last, tk->tkr.mask);
+	if (unlikely(offset > (tk->tkr.mask >> 3)))
+		pr_err("Cutting it too close for %s in update_wall_time (offset = %llu)\n", tk->tkr.clock->name, offset);
 #endif
 
 	/* Check if there's really nothing to do */

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-26 20:57                                                                                                                                         ` Linus Torvalds
@ 2014-12-26 21:20                                                                                                                                           ` Dave Jones
  2014-12-26 22:57                                                                                                                                           ` Dave Jones
  2015-01-03  0:27                                                                                                                                           ` John Stultz
  2 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-26 21:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin, John Stultz

On Fri, Dec 26, 2014 at 12:57:07PM -0800, Linus Torvalds wrote:

 > I have a newer version of the patch that gets rid of the false
 > positives with some ordering rules instead, and just for you I hacked
 > it up to say where the problem happens too, but it's likely too late.

I'll give it a spin and see what falls out this evening.

 > So I'm still a bit unhappy about not figuring out *what* is wrong. And
 > I'd still like the dmidecode from that machine, just for posterity. In
 > case we can figure out some pattern.

So this is something I should have done a long time ago.
Googling for the board name turns up a very similar report
from a year ago, except that was within KVM, and was apparently fixed.
https://lkml.org/lkml/2013/10/9/206 and 
https://bugzilla.kernel.org/show_bug.cgi?id=69491
(dmidecode attachment there is pretty much the same as mine)

 >  - actual hardware bug.
 >    This is *really* unlikely, though. It should hit everybody. The
 > HPET is in the core intel chipset, we're not talking random unusual
 > hardware by fly-by-night vendors here.

This machine is allegedly a 'production' box from Intel, but
given that Kashyap saw something very similar, I'm wondering now if
there was some board/BIOS erratum for this system.

There are a few Intel folks cc'd here; maybe one of those can dig up
whether there was anything peculiar about Shark Bay systems that would
explain the HPET getting screwed up.

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-26 20:57                                                                                                                                         ` Linus Torvalds
  2014-12-26 21:20                                                                                                                                           ` Dave Jones
@ 2014-12-26 22:57                                                                                                                                           ` Dave Jones
  2014-12-26 23:16                                                                                                                                             ` Linus Torvalds
  2014-12-26 23:30                                                                                                                                             ` Linus Torvalds
  2015-01-03  0:27                                                                                                                                           ` John Stultz
  2 siblings, 2 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-26 22:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin, John Stultz

On Fri, Dec 26, 2014 at 12:57:07PM -0800, Linus Torvalds wrote:
 
 > I have a newer version of the patch that gets rid of the false
 > positives with some ordering rules instead, and just for you I hacked
 > it up to say where the problem happens too, but it's likely too late.

hm.


[ 2733.047100] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 2733.047188] 	Tasks blocked on level-0 rcu_node (CPUs 0-7): P25811
[ 2733.047216] 	Tasks blocked on level-0 rcu_node (CPUs 0-7): P25811
[ 2733.047242] 	(detected by 0, t=6502 jiffies, g=52141, c=52140, q=0)
[ 2733.047271] trinity-c406    R  running task    13416 25811  24907 0x00000000
[ 2733.047305]  ffff88022208fd28 0000000000000002 ffffffffa819f627 ffff8801df2c0000
[ 2733.047341]  00000000001d31c0 0000000000000002 ffff88022208ffd8 00000000001d31c0
[ 2733.047375]  ffff8800806e1780 ffff8801df2c0000 ffff88022208fd18 ffff88022208ffd8
[ 2733.047411] Call Trace:
[ 2733.047429]  [<ffffffffa819f627>] ? context_tracking_user_exit+0x67/0x280
[ 2733.047457]  [<ffffffffa88522a2>] preempt_schedule_irq+0x52/0xb0
[ 2733.047482]  [<ffffffffa8859820>] retint_kernel+0x20/0x30
[ 2733.047505]  [<ffffffffa808a361>] ? check_kill_permission+0xb1/0x1e0
[ 2733.047531]  [<ffffffffa808a402>] ? check_kill_permission+0x152/0x1e0
[ 2733.047557]  [<ffffffffa808dc25>] group_send_sig_info+0x65/0x150
[ 2733.047581]  [<ffffffffa808dbc5>] ? group_send_sig_info+0x5/0x150
[ 2733.047607]  [<ffffffffa80ed71e>] ? rcu_read_lock_held+0x6e/0x80
[ 2733.047632]  [<ffffffffa808dee8>] kill_pid_info+0x78/0x130
[ 2733.047654]  [<ffffffffa808de75>] ? kill_pid_info+0x5/0x130
[ 2733.047677]  [<ffffffffa808e0b2>] SYSC_kill+0xf2/0x2f0
[ 2733.047699]  [<ffffffffa808e05b>] ? SYSC_kill+0x9b/0x2f0
[ 2733.047721]  [<ffffffffa80d7ffd>] ? trace_hardirqs_on+0xd/0x10
[ 2733.047745]  [<ffffffffa8013765>] ? syscall_trace_enter_phase1+0x125/0x1a0
[ 2733.048607]  [<ffffffffa80d7f2d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
[ 2733.049469]  [<ffffffffa809079e>] SyS_kill+0xe/0x10
[ 2733.050332]  [<ffffffffa8858aa2>] system_call_fastpath+0x12/0x17
[ 2733.051197] trinity-c406    R  running task    13416 25811  24907 0x00000000
[ 2733.052064]  ffff88022208fd28 0000000000000002 ffffffffa819f627 ffff8801df2c0000
[ 2733.052932]  00000000001d31c0 0000000000000002 ffff88022208ffd8 00000000001d31c0
[ 2733.053792]  ffff880209e2c680 ffff8801df2c0000 ffff88022208fd18 ffff88022208ffd8
[ 2733.054651] Call Trace:
[ 2733.055500]  [<ffffffffa819f627>] ? context_tracking_user_exit+0x67/0x280
[ 2733.056362]  [<ffffffffa88522a2>] preempt_schedule_irq+0x52/0xb0
[ 2733.057222]  [<ffffffffa8859820>] retint_kernel+0x20/0x30
[ 2733.058076]  [<ffffffffa808a361>] ? check_kill_permission+0xb1/0x1e0
[ 2733.058930]  [<ffffffffa808a402>] ? check_kill_permission+0x152/0x1e0
[ 2733.059778]  [<ffffffffa808dc25>] group_send_sig_info+0x65/0x150
[ 2733.060624]  [<ffffffffa808dbc5>] ? group_send_sig_info+0x5/0x150
[ 2733.061472]  [<ffffffffa80ed71e>] ? rcu_read_lock_held+0x6e/0x80
[ 2733.062322]  [<ffffffffa808dee8>] kill_pid_info+0x78/0x130
[ 2733.063168]  [<ffffffffa808de75>] ? kill_pid_info+0x5/0x130
[ 2733.064015]  [<ffffffffa808e0b2>] SYSC_kill+0xf2/0x2f0
[ 2733.064863]  [<ffffffffa808e05b>] ? SYSC_kill+0x9b/0x2f0
[ 2733.065704]  [<ffffffffa80d7ffd>] ? trace_hardirqs_on+0xd/0x10
[ 2733.066541]  [<ffffffffa8013765>] ? syscall_trace_enter_phase1+0x125/0x1a0
[ 2733.067384]  [<ffffffffa80d7f2d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
[ 2733.068217]  [<ffffffffa809079e>] SyS_kill+0xe/0x10
[ 2733.069045]  [<ffffffffa8858aa2>] system_call_fastpath+0x12/0x17
[ 3708.217920] perf interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 4583.530580] request_module: runaway loop modprobe personality-87


still running though..

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-26 22:57                                                                                                                                           ` Dave Jones
@ 2014-12-26 23:16                                                                                                                                             ` Linus Torvalds
  2014-12-27  0:36                                                                                                                                               ` Dave Jones
  2014-12-26 23:30                                                                                                                                             ` Linus Torvalds
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-26 23:16 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Thomas Gleixner, Chris Mason,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Sasha Levin, Paul E. McKenney, Linux Kernel Mailing List,
	Suresh Siddha, Oleg Nesterov, Peter Anvin, John Stultz

On Fri, Dec 26, 2014 at 2:57 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
>
> hm.

So with the previous patch that had the false positives, you never saw
this? You saw the false positives instead?

I'm wondering if the added debug noise just ended up helping. Doing a
printk() will automatically cause some scheduler activity. And they
also caused some time-reading jiggle.

That said, it's also possible that I screwed something up in the
second version of the patch, just breaking it and making it generally
ineffective.

Oh - and have you actually seen the "TSC unstable (delta = xyz)" +
"switched to hpet" messages there yet?

                         Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-26 22:57                                                                                                                                           ` Dave Jones
  2014-12-26 23:16                                                                                                                                             ` Linus Torvalds
@ 2014-12-26 23:30                                                                                                                                             ` Linus Torvalds
  2014-12-27  0:39                                                                                                                                               ` Dave Jones
  2014-12-27  2:53                                                                                                                                               ` Dave Jones
  1 sibling, 2 replies; 486+ messages in thread
From: Linus Torvalds @ 2014-12-26 23:30 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Thomas Gleixner, Chris Mason,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Sasha Levin, Paul E. McKenney, Linux Kernel Mailing List,
	Suresh Siddha, Oleg Nesterov, Peter Anvin, John Stultz

On Fri, Dec 26, 2014 at 2:57 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
>
> still running though..

Btw, did you ever boot with "tsc=reliable" as a kernel command line option?

For the last night, can you see if you can just run it with that, and
things work? Because by now, my gut feel is that we should start
derating the HPET rather than the TSC, especially going forward on
modern hardware. And if this is some subtle timing issue with our hpet
code (the whole -ETIME thing when getting close to setting a timer is
subtle, for example, even if the HPET hardware itself would be ok),
I'm wondering if the fix isn't to just stop believing in HPET if there
are better alternatives around.

So I'm not even convinced that trying to debug some HPET issue is
really worth it. Especially if your machine is a preproduction board
from Intel.

But verifying that with just the TSC everything is ok might still be worth it.

                       Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-26 23:16                                                                                                                                             ` Linus Torvalds
@ 2014-12-27  0:36                                                                                                                                               ` Dave Jones
  2014-12-27  3:14                                                                                                                                                 ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Dave Jones @ 2014-12-27  0:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin, John Stultz

On Fri, Dec 26, 2014 at 03:16:41PM -0800, Linus Torvalds wrote:
 > On Fri, Dec 26, 2014 at 2:57 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
 > >
 > > hm.
 > 
 > So with the previous patch that had the false positives, you never saw
 > this? You saw the false positives instead?
 
correct.

 > I'm wondering if the added debug noise just ended up helping. Doing a
 > printk() will automatically cause some scheduler activity. And they
 > also caused the time reading jiggle.
 > 
 > That said, it's also possible that I screwed something up in the
 > second version of the patch, just breaking it and making it generally
 > ineffective.
 > 
 > Oh - and have you actually seen the "TSC unstable (delta = xyz)" +
 > "switched to hpet" messages there yet?

not yet. 3 hrs in.

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-26 23:30                                                                                                                                             ` Linus Torvalds
@ 2014-12-27  0:39                                                                                                                                               ` Dave Jones
  2014-12-27  2:53                                                                                                                                               ` Dave Jones
  1 sibling, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-27  0:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin, John Stultz

On Fri, Dec 26, 2014 at 03:30:20PM -0800, Linus Torvalds wrote:
 > On Fri, Dec 26, 2014 at 2:57 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
 > >
 > > still running though..
 > 
 > Btw, did you ever boot with "tsc=reliable" as a kernel command line option?

I don't think so.

 > For the last night, can you see if you can just run it with that, and
 > things work?

Sure.

 > So I'm not even convinced that trying to debug some HPET issue is
 > really worth it. Especially if your machine is a preproduction board
 > from Intel.

Yeah, I agree. Even though it's strange that this only became a problem
these last few months for me, after over a year of abuse.

Hopefully the new year brings more trustworthy hardware.

	Dave


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-26 23:30                                                                                                                                             ` Linus Torvalds
  2014-12-27  0:39                                                                                                                                               ` Dave Jones
@ 2014-12-27  2:53                                                                                                                                               ` Dave Jones
  1 sibling, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-27  2:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin, John Stultz

On Fri, Dec 26, 2014 at 03:30:20PM -0800, Linus Torvalds wrote:
 > On Fri, Dec 26, 2014 at 2:57 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
 > >
 > > still running though..
 > 
 > Btw, did you ever boot with "tsc=reliable" as a kernel command line option?

I'll check it again in the morning, but before I turn in for the night,
so far the only thing is this:


[ 6713.394395] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 6713.394489] 	Tasks blocked on level-0 rcu_node (CPUs 0-7):
[ 6713.394513] 	Tasks blocked on level-0 rcu_node (CPUs 0-7):
[ 6713.394536] 	(detected by 3, t=6502 jiffies, g=141292, c=141291, q=0)
[ 6713.394564] INFO: Stall ended before state dump start


	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-27  0:36                                                                                                                                               ` Dave Jones
@ 2014-12-27  3:14                                                                                                                                                 ` Linus Torvalds
  2014-12-27 16:48                                                                                                                                                   ` Dave Jones
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2014-12-27  3:14 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Thomas Gleixner, Chris Mason,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Sasha Levin, Paul E. McKenney, Linux Kernel Mailing List,
	Suresh Siddha, Oleg Nesterov, Peter Anvin, John Stultz

On Fri, Dec 26, 2014 at 4:36 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
>  >
>  > Oh - and have you actually seen the "TSC unstable (delta = xyz)" +
>  > "switched to hpet" messages there yet?
>
> not yet. 3 hrs in.

Ok, so then the

     INFO: rcu_preempt detected stalls on CPUs/tasks:

has nothing to do with HPET, since you'd still be running with the TSC enabled.

My googling around did find a number of "machine locks up a few hours
after switching to hpet" reports, so it is possible that the whole rcu
stall and nmi watchdog thing is independent and unrelated to the
actual locking up.

It *is* intriguing that my broken patch seemed to prevent it from
happening, though. And both NMI watchdogs and the rcu stall are
related to wall-clock time.  But hey, maybe there really is some odd
loop in the kernel that stops scheduling or RCU grace periods. It just
seems never to get caught by your backtraces...

             Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-27  3:14                                                                                                                                                 ` Linus Torvalds
@ 2014-12-27 16:48                                                                                                                                                   ` Dave Jones
  0 siblings, 0 replies; 486+ messages in thread
From: Dave Jones @ 2014-12-27 16:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin, John Stultz

On Fri, Dec 26, 2014 at 07:14:55PM -0800, Linus Torvalds wrote:
 > On Fri, Dec 26, 2014 at 4:36 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
 > >  >
 > >  > Oh - and have you actually seen the "TSC unstable (delta = xyz)" +
 > >  > "switched to hpet" messages there yet?
 > >
 > > not yet. 3 hrs in.
 > 
 > Ok, so then the
 > 
 >      INFO: rcu_preempt detected stalls on CPUs/tasks:
 > 
 > has nothing to do with HPET, since you'd still be running with the TSC enabled.

Right. 16 hrs later, that's the only thing that's spewed.

 > My googling around did find a number of "machine locks up a few hours
 > after switching to hpet" reports, so it is possible that the whole rcu
 > stall and nmi watchdog thing is independent and unrelated to the
 > actual locking up.

Possible.  I'm heading home in a few hours to start the wipe of that
box. This is going to be 'the one that got away', but at least we've
managed to find a number of other things that needed fixing along the way.

	Dave

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-23  0:46                                                                                                                                 ` Linus Torvalds
@ 2014-12-27 20:33                                                                                                                                   ` Paul E. McKenney
  0 siblings, 0 replies; 486+ messages in thread
From: Paul E. McKenney @ 2014-12-27 20:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: John Stultz, Dave Jones, Thomas Gleixner, Chris Mason,
	Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
	Sasha Levin, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

On Mon, Dec 22, 2014 at 04:46:42PM -0800, Linus Torvalds wrote:
> On Mon, Dec 22, 2014 at 3:59 PM, John Stultz <john.stultz@linaro.org> wrote:
> >
> > * So 1/8th of the interval seems way too short, as there's
> > clocksources like the ACPI PM, which wrap every 2.5 seconds or so.
> 
> Ugh. At the same time, 1/8th of a range is actually bigger than I'd
> like, since if there is some timer corruption, it means that we only
> catch it when it's in really big.
> 
> But as I said, I'd actually prefer it to be time-based, because it
> would be good if this approach worked on things like the TSC which is
> a 64-bit counter..
> 
> So yes, that capping was very much arbitrary, and was mostly a case of
> "this works with the one timer source that I can easily trigger"
> 
> > * I suspect something closer to the clocksource_max_deferment() value
> > (which I think is max interval before multiplication overflows could
> > happen - ~12%) which we use in the scheduler would make more sense.
> > Especially since the timer scheduler uses that to calculate how long
> > we can idle for.
> 
> I'd rather not be anywhere *close* to any overflow problems. Even for
> the scheduler all-idle case, I'd argue that there is rather quickly
> diminishing returns. Yes, a thousand timer interrupts per second are
> expensive and a noticeable power draw. The difference between "one
> timer interrupt every two seconds" and "every 20 seconds" is rather
> less noticeable.
> 
> Of course, reasonable clock sources have *much* longer periods than a
> second (yeah, the acpi pm timer really isn't a good one), so there are
> probably good middle grounds. The 1/8th was a hack, and one that was
> aware of the 300s cycle of the HPET at that..

I of course very much like the idea of the timekeeping system doing a
bit of self-checking.  ;-)

Can we simplify things by just not doing self-checking on clocks that
overflow in less than a few minutes?  In other words, if someone
reports oddball RCU CPU stall warnings when using the ACPI PM
timer, am I within my rights to tell them to reproduce the stall
using a better timer?

If so, we could possibly get away with the assumption that preemptions
don't last longer than the soft-lockup interval, which is currently
a bit over 20 seconds (hence "a few minutes" above).  And yes, you
might avoid a soft lockup by having a series of 15-second preemptions
between each pair of instructions, but my response would be to increase
my "a few minutes" a bit and to invoke probabilities.

I don't claim to understand the timer code for all the reasons that
Linus calls out below, but I believe that this simplifying
assumption would in turn simplify the self-check code.

						Thanx, Paul

> > * Nulling out delta in timekeeping_get_ns() seems like it could cause
> > problems since time would then possibly go backwards compared to
> > previous reads (as you mentioned, resulting in smaller time jumps).
> > Instead it would probably make more sense to cap the delta at the
> > maximum value (though this assumes the clock doesn't jump back in the
> > interval before the next call to update_wall_time).
> 
> So part of the nulling was that it was simpler, and part of it was
> that I expected to get backwards jumps (see the other email to Dave
> about the inherent races). And with the whole timer mask modulo
> arithmetic, those backwards jumps just look like biggish positive
> numbers, not even negative. So it ends up being things like "is it an
> unsigned number larger than half the mask? Consider it negative" etc.
> 
> The "zero it out" was simple, and it worked for my test-case, which
> was "ok, my machine no longer locks up when I mess with the timer".
> 
> And I didn't post the earlier versions of that patch that didn't even *boot*.
> 
> I started out trying to do it at a higher level (not on a clock read
> level, but outside the whole 'convert-to-ns and do the sequence  value
> check'), but during bootup we play a lot of games with initializing
> the timer sources etc.
> 
> So that explains the approach of doing it at that
> 
>    cycle_now = tkr->read(tkr->clock);
> 
> level, and keeping it very low-level.
> 
> But as I already explained in the email that crossed, that low-level
> thing also results in some fundamental races.
> 
> > * Also, as you note, this would just cause the big time jump to only
> > happen at the next update, since there's no logic in
> > update_wall_time() to limit the jump. I'm not sure if "believing" the
> > large jump at write time make that much more sense, though.
> 
> So I considered just capping it there (to a single interval or
> something). Again, just ignoring - like the read side does - it would
> have been easier, but at the same time I *really* wanted to make time
> go forward, so just taking the big value seemed safest.
> 
> But yes. this was very much a RFC patch. It's not even ready for real
> use, as DaveJ found out (although it might be good enough in practice,
> despite its flaws)
> 
> > * Finally, you're writing to error while only holding a read lock, but
> > that's sort of a minor thing.
> 
> It's not a minor thing, but the alternatives looked worse.
> 
> I really wanted to make it per-cpu, and do this with interrupts
> disabled or something. But that then pushes a big problem to the write
> time to go over all cpu's and see if there are errors.
> 
> So it's not right. But .. It's a hacky patch to get discussion
> started, and it's actually hard to do "right" when this code has to be
> basically lockless.
> 
> > * Checking the accumulation interval isn't beyond the
> > clocksource_max_deferment() value seems like a very good check to have
> > in update_wall_time().
> 
> Sounds like a good idea. Also, quite frankly, reading all the code I
> wasn't ever really able to figure out that things don't overflow. The
> overflow protection is a bit ad-hoc (that maxshift thing in
> update_wall_time() really makes baby Jesus cry, despite the season,
> and it wasn't at all obvious that ntp_tick_length() is fundamentally
> bigger than xtime_interval, for example).
> 
> It's also not clear that the complicated and frankly not-very-obvious
> shift-loop is any faster than just using a divide - possibly with the
> "single interval" case being a special case to avoid dividing then.
> 
> I was a bit nervous that the whole update of tkr.cycle_last in there
> could just overrun the actual *read* value of 'tk->tkr.clock'. With
> the whole offset logic split between update_wall_time() and
> logarithmic_accumulation(), the code isn't exactly self-explanatory.
> 
> Heh.
> 
> > * Maybe when we schedule the next timekeeping update, the tick
> > scheduler could store the expected time for that to fire, and then we
> > could validate that we're relatively close after that value when we do
> > accumulate time (warning if we're running too early or far too late -
> > though with virtualziation, defining a "reasonable" late value is
> > difficult).
> 
> In general, it would be really nice to know what the expected limits
> are. It was hard to impossible to figure out the interaction between
> the timer subsystem and the scheduler tick. It's pretty incestuous,
> and if there's an explanation for it, I missed it.
> 
> > * This "expected next tick" time could be used to try to cap read-time
> > intervals in a similar fashion as done here. (Of course, again, we'd
> > have to be careful, since if that expected next tick ends up somehow
> > being before the actual hrtimer expiration value, we could end up
> > stopping time - and the system).
> 
> I don't think you can cap them to exactly the expected value anyway,
> since the wall time update *will* get delayed by locking and just
> interrupts being off etc. And virtual environments will obviously make
> it much worse. So the capping needs to be somewhat loose anyway.
> 
> The patch I posted was actually sloppy by design, exactly because I
> had so much trouble with trying to be strict. My first patch was a
> percpu thing that just limited ktime_get() from ever going backwards
> on that particular cpu (really simple, really stupid), and it got
> *nowhere*.
> 
>                         Linus
> 


^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-26 20:57                                                                                                                                         ` Linus Torvalds
  2014-12-26 21:20                                                                                                                                           ` Dave Jones
  2014-12-26 22:57                                                                                                                                           ` Dave Jones
@ 2015-01-03  0:27                                                                                                                                           ` John Stultz
  2015-01-03 14:58                                                                                                                                             ` Sasha Levin
  2015-01-04 19:46                                                                                                                                             ` Linus Torvalds
  2 siblings, 2 replies; 486+ messages in thread
From: John Stultz @ 2015-01-03  0:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Thomas Gleixner, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

On Fri, Dec 26, 2014 at 12:57 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Dec 26, 2014 at 10:12 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
>> On Fri, Dec 26, 2014 at 11:34:10AM -0500, Dave Jones wrote:
>>
>>  > One thing I think I'll try is to try and narrow down which
>>  > syscalls are triggering those "Clocksource hpet had cycles off"
>>  > messages.  I'm still unclear on exactly what is doing
>>  > the stomping on the hpet.
>>
>> First I ran trinity with "-g vm" which limits it to use just
>> a subset of syscalls, specifically VM related ones.
>> That triggered the messages. Further experiments revealed:
>
> So I can trigger the false positives with my original patch quite
> easily by just putting my box under some load. My numbers are nowhere
> near as bad as yours, but then, I didn't put it under as much load
> anyway. Just a regular "make -j64" of the kernel.
>
> I suspect your false positives are bigger partly because of the load,
> but mostly because you presumably have preemption enabled too. I don't
> do preemption in my normal kernels, and that limits the damage of the
> race a bit.
>
> I have a newer version of the patch that gets rid of the false
> positives with some ordering rules instead, and just for you I hacked
> it up to say where the problem happens too, but it's likely too late.
>
> The fact that the original racy patch seems to make a difference for
> you does say that yes, we seem to be zeroing in on the right area
> here, but I'm not seeing what's wrong. I was hoping for big jumps from
> your HPET, since your "TSC unstable" messages do kind of imply that
> such really big jumps can happen.
>
> I'm attaching my updated hacky patch, although I assume it's much too
> late for that machine. Don't look too closely at the backtrace
> generation part, that's just a quick hack, and only works with frame
> pointers enabled anyway.
>
> So I'm still a bit unhappy about not figuring out *what* is wrong. And
> I'd still like the dmidecode from that machine, just for posterity. In
> case we can figure out some pattern.
>
> So right now I can imagine several reasons:
>
>  - actual hardware bug.
>
>    This is *really* unlikely, though. It should hit everybody. The
> HPET is in the core intel chipset, we're not talking random unusual
> hardware by fly-by-night vendors here.
>
>  - some SMM/BIOS "power management" feature.
>
>    We've seen this before, where the SMM saves/restores the TSC on
> entry/exit in order to hide itself from the system. I could imagine
> similar code for the HPET counter. SMM writers use some bad drugs to
> dull their pain.
>
>    And with the HPET counter, since it's not even per-CPU, the "save
> and restore HPET" will actually show up as "HPET went backwards" to
> the other non-SMM CPU's if it happens
>
>  - a bug in our own clocksource handling.
>
>    I'm not seeing it. But maybe my patch hides it for some magical reason.

So I sent out a first-step validation check to warn us if we end up
with idle periods that are larger than we expect.

It doesn't yet cap the timekeeping_get_ns() output (like your patch
effectively does), but it would be easy to do that in a following
patch.

I did notice while testing this that the max_idle_ns (the max idle time
we report to the scheduler) for the hpet is only ~16 sec, and we'll
overflow after just ~21 seconds. This second number maps closely to the
22-second stalls seen in the NMI watchdog reports, which seems
interesting, but I also realize that qemu uses a 100MHz hpet, whereas
real hardware is likely to be a bit slower, so maybe that's just
chance..
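
(Rough numbers behind that, purely back-of-the-envelope - the real
max_idle_ns calculation applies the extra signed-range and ~12% margins
discussed later in the thread:)

#include <stdio.h>

int main(void)
{
	double hz = 100e6;			/* qemu's 100MHz hpet */
	double full = 4294967296.0 / hz;	/* full 32-bit wrap */

	printf("full wrap %.1f s, signed-safe half range %.1f s\n",
	       full, full / 2);
	/* ~42.9s and ~21.5s - the latter lines up with the ~21 second
	 * overflow figure and the ~22 second watchdog stalls above */
	return 0;
}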

I'd be interested if folks seeing anything similar to Dave would give
my patch a shot.

thanks
-john

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2015-01-03  0:27                                                                                                                                           ` John Stultz
@ 2015-01-03 14:58                                                                                                                                             ` Sasha Levin
  2015-01-04 19:46                                                                                                                                             ` Linus Torvalds
  1 sibling, 0 replies; 486+ messages in thread
From: Sasha Levin @ 2015-01-03 14:58 UTC (permalink / raw)
  To: John Stultz, Linus Torvalds
  Cc: Dave Jones, Thomas Gleixner, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On 01/02/2015 07:27 PM, John Stultz wrote:
> On Fri, Dec 26, 2014 at 12:57 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> > On Fri, Dec 26, 2014 at 10:12 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
>>> >> On Fri, Dec 26, 2014 at 11:34:10AM -0500, Dave Jones wrote:
>>> >>
>>> >>  > One thing I think I'll try is to try and narrow down which
>>> >>  > syscalls are triggering those "Clocksource hpet had cycles off"
>>> >>  > messages.  I'm still unclear on exactly what is doing
>>> >>  > the stomping on the hpet.
>>> >>
>>> >> First I ran trinity with "-g vm" which limits it to use just
>>> >> a subset of syscalls, specifically VM related ones.
>>> >> That triggered the messages. Further experiments revealed:
>> >
>> > So I can trigger the false positives with my original patch quite
>> > easily by just putting my box under some load. My numbers are nowhere
>> > near as bad as yours, but then, I didn't put it under as much load
>> > anyway. Just a regular "make -j64" of the kernel.
>> >
>> > I suspect your false positives are bigger partly because of the load,
>> > but mostly because you presumably have preemption enabled too. I don't
>> > do preemption in my normal kernels, and that limits the damage of the
>> > race a bit.
>> >
>> > I have a newer version of the patch that gets rid of the false
>> > positives with some ordering rules instead, and just for you I hacked
>> > it up to say where the problem happens too, but it's likely too late.
>> >
>> > The fact that the original racy patch seems to make a difference for
>> > you does say that yes, we seem to be zeroing in on the right area
>> > here, but I'm not seeing what's wrong. I was hoping for big jumps from
>> > your HPET, since your "TSC unstable" messages do kind of imply that
>> > such really big jumps can happen.
>> >
>> > I'm attaching my updated hacky patch, although I assume it's much too
>> > late for that machine. Don't look too closely at the backtrace
>> > generation part, that's just a quick hack, and only works with frame
>> > pointers enabled anyway.
>> >
>> > So I'm still a bit unhappy about not figuring out *what* is wrong. And
>> > I'd still like the dmidecode from that machine, just for posterity. In
>> > case we can figure out some pattern.
>> >
>> > So right now I can imagine several reasons:
>> >
>> >  - actual hardware bug.
>> >
>> >    This is *really* unlikely, though. It should hit everybody. The
>> > HPET is in the core intel chipset, we're not talking random unusual
>> > hardware by fly-by-night vendors here.
>> >
>> >  - some SMM/BIOS "power management" feature.
>> >
>> >    We've seen this before, where the SMM saves/restores the TSC on
>> > entry/exit in order to hide itself from the system. I could imagine
>> > similar code for the HPET counter. SMM writers use some bad drugs to
>> > dull their pain.
>> >
>> >    And with the HPET counter, since it's not even per-CPU, the "save
>> > and restore HPET" will actually show up as "HPET went backwards" to
>> > the other non-SMM CPU's if it happens
>> >
>> >  - a bug in our own clocksource handling.
>> >
>> >    I'm not seeing it. But maybe my patch hides it for some magical reason.
> So I sent out a first step validation check to warn us if we end up
> with idle periods that are larger then we expect.
> 
> It doesn't yet cap the timekeeping_get_ns() output (like you're patch
> effectively does), but it would be easy to do that in a following
> patch.
> 
> I did notice while testing this that the max_idle_ns (max idle time we
> report to the scheduler) for the hpet is only ~16sec, and we'll
> overflow after just ~21seconds. This second number maps closely to the
> 22 second stalls seen in the  nmi watchdog reports which seems
> interesting, but I also realize that qemu uses a 100MHz hpet, where as
> real hardware is likely to be a bit slower, so maybe that's just
> chance..
> 
> I'd be interested if folks seeing anything similar to Dave would give
> my patch a shot.

I ran it overnight, but I didn't see any of the new warnings in the logs.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2015-01-03  0:27                                                                                                                                           ` John Stultz
  2015-01-03 14:58                                                                                                                                             ` Sasha Levin
@ 2015-01-04 19:46                                                                                                                                             ` Linus Torvalds
  2015-01-06  1:17                                                                                                                                               ` John Stultz
  1 sibling, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2015-01-04 19:46 UTC (permalink / raw)
  To: John Stultz
  Cc: Dave Jones, Thomas Gleixner, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

On Fri, Jan 2, 2015 at 4:27 PM, John Stultz <john.stultz@linaro.org> wrote:
>
> So I sent out a first step validation check to warn us if we end up
> with idle periods that are larger then we expect.

.. not having tested it, this is just from reading the patch, but it
would *seem* that it doesn't actually validate the clock reading much
at all.

Why? Because most of the time, for crap clocks like HPET, the real
limitation will be not the multiplication overflow, but the "mask",
which is just 32-bit (or worse - I think the ACPI PM timer might be
just 24 bits).

So then you effectively "validate" that the timer difference value
fits in mask, but that isn't any validation at all - it's just a
truism. Since we by definition mask the difference to just the valid
bitmask.

So I really think that the maximum valid clock needs to be narrowed
down from the "technically, this clock can count to X".

But maybe I'm wrong, and the multiplication overflow is actually often
the real limit. What are the actual values for real timer sources?

                        Linus

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2015-01-04 19:46                                                                                                                                             ` Linus Torvalds
@ 2015-01-06  1:17                                                                                                                                               ` John Stultz
  2015-01-06  1:25                                                                                                                                                 ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: John Stultz @ 2015-01-06  1:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Thomas Gleixner, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

On Sun, Jan 4, 2015 at 11:46 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Jan 2, 2015 at 4:27 PM, John Stultz <john.stultz@linaro.org> wrote:
>>
>> So I sent out a first step validation check to warn us if we end up
>> with idle periods that are larger then we expect.
>
> .. not having tested it, this is just from reading the patch, but it
> would *seem* that it doesn't actually validate the clock reading much
> at all.
>
> Why? Because most of the time, for crap clocks like HPET, the real
> limitation will be not the multiplication overflow, but the "mask",
> which is just 32-bit (or worse - I think the ACPI PM timer might be
> just 24 bits).
>
> So then you effectively "validate" that the timer difference value
> fits in mask, but that isn't any validation at all - it's just a
> truism. Since we by definition mask the difference to just the valid
> bitmask.
>
> So I really think that the maximum valid clock needs to be narrowed
> down from the "technically, this clock can count to X".
>
> But maybe I'm wrong, and the multiplication overflow is actually often
> the real limit. What are the actual values for real timer sources?


As you point out, for clocksources where the mask is 32 bits or under,
we shouldn't have any risk of multiplication overflow, since mult is
32 bits. So yes, the max_cycles on those probably should be the same
as the mask, and it isn't useful on those clocksources to test if we
run over (though warning if we're within the 12% margin could be
useful). But for clocksources that have larger masks, it could still
be a useful check (a 2GHz TSC overflows 32 bits in about 2 seconds),
although the mult value targets a mult overflow at ~10 minutes, so
it's less likely that we really hit it.

However, it turns out the calculations we use are a little more
conservative, treating the result as signed, so they avoid
multiplications that could run into that sign bit. This looks like
an error to me (the code is also used for clockevents, so I haven't
run through to see if the requirements are different there), but a
conservative one, which results in the maximum idle interval being
roughly half of what it could be (and we add yet another 12% margin
on top of that - so we probably need to just pick one or the other,
not both).

So even on 32-bit masks, max_cycles in my patch is smaller than 32 bits.
That's why I was able to hit both warnings in my testing with the hpet
by sending SIGSTOP to qemu.

Anyway, it may be worth keeping the 50% margin (and dropping the 12%
reduction to simplify things), since I've not heard recent complaints
about timekeeping limiting idle lengths (but I could be wrong here).
This would give you something closer to the 1/8th of the mask that you
were using in your patch (and on larger mask clocksources, we do
already cap the interval at 10 minutes - so most really crazy values
would be caught for clocksources like the TSC - and maybe we can make
this more configurable so we can shorten it as done in your patch to
try to debug things).

I've also got a capping patch that I'm testing that keeps time reads
from passing that interval. The only thing I'm really cautious about
with that change is that we have to make sure the hrtimer that
triggers update_wall_clock is always set to expire within that cap (I
need to review it again) or else we'll hang ourselves.

thanks
-john
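
A minimal sketch of the "cap the delta on read" idea described above
(illustrative only, with made-up names; not the actual patch being
tested):

#include <stdint.h>

static inline uint64_t clamped_delta(uint64_t now, uint64_t last,
				     uint64_t mask, uint64_t max_cycles)
{
	uint64_t delta = (now - last) & mask;

	/*
	 * If the counter claims more cycles passed than we could possibly
	 * have been idle (max_cycles), assume the hardware glitched or we
	 * raced with an update, and clamp rather than turning a bogus
	 * delta into a huge forward jump in time.
	 */
	return delta < max_cycles ? delta : max_cycles;
}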

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2015-01-06  1:17                                                                                                                                               ` John Stultz
@ 2015-01-06  1:25                                                                                                                                                 ` Linus Torvalds
  2015-01-06  2:05                                                                                                                                                   ` John Stultz
  0 siblings, 1 reply; 486+ messages in thread
From: Linus Torvalds @ 2015-01-06  1:25 UTC (permalink / raw)
  To: John Stultz
  Cc: Dave Jones, Thomas Gleixner, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

On Mon, Jan 5, 2015 at 5:17 PM, John Stultz <john.stultz@linaro.org> wrote:
>
> Anyway, It may be worth keeping the 50% margin (and dropping the 12%
> reduction to simplify things)

Again, the 50% margin is only on the multiplication overflow. Not on the mask.

So it won't do anything at all for the case we actually care about,
namely a broken HPET, afaik.

I'd much rather limit to 50% of the mask too.

Also, why do we actually play games with ilog2 for that overflow
calculation? It seems pointless. This is for the setup code, doing a
real division there would seem to be a whole lot more straightforward,
and not need that big comment. And there's no performance issue. Am I
missing something?

> I've also got a capping patch that I'm testing that keeps time reads
> from passing that interval. The only thing I'm really cautious about
> with that change is that we have to make sure the hrtimer that
> triggers update_wall_clock is always set to expire within that cap (I
> need to review it again) or else we'll hang ourselves.

 Yeah, that thing is fragile. And quite possibly part of the problem.

                       Linus
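
For reference, the setup-time calculation being suggested really is
just a couple of divisions (a sketch of the idea only, with assumed
names; not the kernel's actual clocksource code):

#include <stdint.h>

static uint64_t suggested_max_cycles(uint64_t mask, uint32_t mult)
{
	/* plain division: the largest delta whose 64-bit product cannot overflow */
	uint64_t max_cycles = UINT64_MAX / mult;

	/* never trust a delta larger than the counter can even represent */
	if (max_cycles > mask)
		max_cycles = mask;

	/* and leave a 50% safety margin on whichever limit applied */
	return max_cycles / 2;
}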

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2015-01-06  1:25                                                                                                                                                 ` Linus Torvalds
@ 2015-01-06  2:05                                                                                                                                                   ` John Stultz
  0 siblings, 0 replies; 486+ messages in thread
From: John Stultz @ 2015-01-06  2:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Thomas Gleixner, Chris Mason, Mike Galbraith,
	Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	Paul E. McKenney, Linux Kernel Mailing List, Suresh Siddha,
	Oleg Nesterov, Peter Anvin

On Mon, Jan 5, 2015 at 5:25 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Jan 5, 2015 at 5:17 PM, John Stultz <john.stultz@linaro.org> wrote:
>>
>> Anyway, it may be worth keeping the 50% margin (and dropping the 12%
>> reduction to simplify things)
>
> Again, the 50% margin is only on the multiplication overflow. Not on the mask.

Right, but we calculate the mult value based on the mask (or 10
minutes' worth of cycles, whichever is shorter).

So then when we go back and calculate the max_cycles/max_idle_ns using
the mult, we end up with a value smaller than the mask. So the
scheduler shouldn't push idle times out beyond that and the debug
logic in my patch should be able to catch strangely large values.

> So it won't do anything at all for the case we actually care about,
> namely a broken HPET, afaik.

Yea, the case my code doesn't catch that yours did is for slightly
broken clocksources (I'm thinking two cpus with virtual hpets
embedded in them that are slightly off) where you could get negative
deltas right after the update. In that case the capping on read is
really needed since by the next update the stale value has grown large
enough to look like a reasonable offset. The TSC has a similar issue,
but it's easier to check for negative values because it won't
reasonably ever overflow.
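
(To illustrate the ambiguity with assumed numbers: a 10-cycle backwards
step on a 32-bit-masked counter becomes a huge forward delta after
masking, while the same step on the TSC is plainly negative.)

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t mask = 0xffffffffULL;		/* 32-bit HPET mask */
	uint64_t last = 1000, now = 990;	/* counter stepped back by 10 */

	uint64_t hpet_delta = (now - last) & mask;	/* 0xfffffff6: looks like ~4.3e9 cycles forward */
	int64_t  tsc_delta  = (int64_t)(now - last);	/* -10: easy to reject with a signed check */

	printf("masked delta: %llu\n", (unsigned long long)hpet_delta);
	printf("signed delta: %lld\n", (long long)tsc_delta);
	return 0;
}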

>
> I'd much rather limit to 50% of the mask too.

Ok, I'll try to rework the code to make this choice and make it more
explicitly clear.


> Also, why do we actually play games with ilog2 for that overflow
> calculation? It seems pointless. This is for the setup code, doing a
> real division there would seem to be a whole lot more straightforward,
> and not need that big comment. And there's no performance issue. Am I
> missing something?

I feel like there was a time when this may have been called by some of
the clocksource code if they changed frequency (I think over
suspend/resume), but I'm not seeing it in the current source. So yea,
likely something to simplify.

>> I've also got a capping patch that I'm testing that keeps time reads
>> from passing that interval. The only thing I'm really cautious about
>> with that change is that we have to make sure the hrtimer that
>> triggers update_wall_clock is always set to expire within that cap (I
>> need to review it again) or else we'll hang ourselves.
>
>  Yeah, that thing is fragile. And quite possibly part of the problem.

"Time is a flat circle..." and thus unfortunately requires some
circular logic. :)

thanks
-john

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2014-12-21 23:58                                                                                                                         ` Linus Torvalds
  2014-12-22  0:41                                                                                                                           ` Linus Torvalds
@ 2015-01-12 10:05                                                                                                                           ` Thomas Gleixner
  1 sibling, 0 replies; 486+ messages in thread
From: Thomas Gleixner @ 2015-01-12 10:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Chris Mason, Mike Galbraith, Ingo Molnar,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin, Paul E. McKenney,
	Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov,
	Peter Anvin

On Sun, 21 Dec 2014, Linus Torvalds wrote:
> On Sun, Dec 21, 2014 at 2:32 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
> > On Sun, Dec 21, 2014 at 02:19:03PM -0800, Linus Torvalds wrote:
> > >
> >  > And finally, and stupidly, is there any chance that you have anything
> >  > accessing /dev/hpet?
> >
> > Not knowingly at least, but who the hell knows what systemd has its
> > fingers in these days.
> 
> Actually, it looks like /dev/hpet doesn't allow write access.
> 
> I can do the mmap(/dev/mem) thing and access the HPET by hand, and
> when I write zero to it I immediately get something like this:
> 
>   Clocksource tsc unstable (delta = -284317725450 ns)
>   Switched to clocksource hpet
> 
> just to confirm that yes, a jump in the HPET counter would indeed give
> those kinds of symptoms: blaming the TSC with a negative delta in the
> 0-300s range, even though it's the HPET that is broken.
> 
> And if the HPET then occasionally jumps around afterwards, it would
> show up as ktime_get() occasionally going backwards, which in turn
> would - as far as I can tell - result in exactly that pseudo-infinite
> loop with timers.
> 
> Anyway, any wild kernel pointer access *could* happen to just hit the
> HPET and write to the main counter value, although I'd personally be
> more inclined to blame BIOS/SMM kind of code playing tricks with
> time.. We do have a few places where we explicitly write the value on
> purpose, but they are in the HPET init code, and in the clocksource
> resume code, so they should not be involved.

Right.
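
(For reference, the poke described in the quoted mail can be sketched
roughly as below. This is deliberately destructive and needs root; the
0xFED00000 base is only the common default reported by the ACPI HPET
table, and 0xF0 is the main-counter offset from the HPET spec.)

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPET_BASE    0xFED00000UL	/* board-specific; common default */
#define HPET_COUNTER 0xF0		/* main counter register offset */

int main(void)
{
	int fd = open("/dev/mem", O_RDWR | O_SYNC);
	if (fd < 0) { perror("open /dev/mem"); return 1; }

	volatile uint8_t *hpet = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
				      MAP_SHARED, fd, HPET_BASE);
	if (hpet == MAP_FAILED) { perror("mmap"); return 1; }

	/* zero the main counter: the kernel sees time jump backwards */
	*(volatile uint64_t *)(hpet + HPET_COUNTER) = 0;

	munmap((void *)hpet, 4096);
	close(fd);
	return 0;
}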
 
> Thomas - have you had reports of HPET breakage in RT circles, the same
> way BIOSes have been tinkering with TSC?

Not that I'm aware of.
 
> Also, would it perhaps be a good idea to make "ktime_get()" save the
> last time in a percpu variable, and warn if time ever goes backwards
> on a particular CPU?  A percpu thing should be pretty cheap, even if
> we write to it every time somebody asks for time..

That should be simple enough to implement.
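
Something along these lines, perhaps (only a sketch of the idea, with
assumed names and placement - not an actual patch):

static DEFINE_PER_CPU(u64, last_ktime_ns);

static inline void debug_check_monotonic(u64 now_ns)
{
	u64 last = this_cpu_read(last_ktime_ns);

	/* warn (once) if the clock appears to have gone backwards here */
	WARN_ONCE(now_ns < last, "ktime_get() went backwards on this CPU\n");
	this_cpu_write(last_ktime_ns, now_ns);
}

(Migration between the read and the write could cause the occasional
false positive, but for a debug check that is probably acceptable.)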

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
  2015-02-12 11:09 Martin van Es
@ 2015-02-12 16:01 ` Linus Torvalds
  0 siblings, 0 replies; 486+ messages in thread
From: Linus Torvalds @ 2015-02-12 16:01 UTC (permalink / raw)
  To: Martin van Es; +Cc: Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 934 bytes --]

On Thu, Feb 12, 2015 at 3:09 AM, Martin van Es <mrvanes@gmail.com> wrote:
>
> Best I can come up with now is try the next mainline that has all the
> fixes and ideas in this thread incorporated. Would that be 3.19?

Yes. I'm attaching a patch (very much experimental - it might
introduce new problems rather than fix old ones) that might also be
worth testing on top of 3.19.

> I'm sorry I couldn't be more helpful.

Hey, so far nobody else has been able to pin this down either. It
seems to be very timing-specific, and it's possible (even likely,
considering DaveJ's adventures) that while you cannot trigger it with
3.16.7, it might be lurking there too, just not with the kind of
timing that can trigger it on your machine. Which would explain the
bisection trouble.

It would have been wonderful if somebody had been able to really
reproduce it truly reliably, but it seems to be very slippery.

                          Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 4698 bytes --]

 kernel/smp.c | 78 ++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 47 insertions(+), 31 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index f38a1e692259..2aaac2c47683 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -19,7 +19,7 @@
 
 enum {
 	CSD_FLAG_LOCK		= 0x01,
-	CSD_FLAG_WAIT		= 0x02,
+	CSD_FLAG_SYNCHRONOUS	= 0x02,
 };
 
 struct call_function_data {
@@ -107,7 +107,7 @@ void __init call_function_init(void)
  */
 static void csd_lock_wait(struct call_single_data *csd)
 {
-	while (csd->flags & CSD_FLAG_LOCK)
+	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
 		cpu_relax();
 }
 
@@ -121,19 +121,17 @@ static void csd_lock(struct call_single_data *csd)
 	 * to ->flags with any subsequent assignments to other
 	 * fields of the specified call_single_data structure:
 	 */
-	smp_mb();
+	smp_wmb();
 }
 
 static void csd_unlock(struct call_single_data *csd)
 {
-	WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));
+	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
 
 	/*
 	 * ensure we're all done before releasing data:
 	 */
-	smp_mb();
-
-	csd->flags &= ~CSD_FLAG_LOCK;
+	smp_store_release(&csd->flags, 0);
 }
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_single_data, csd_data);
@@ -144,13 +142,16 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_single_data, csd_data);
  * ->func, ->info, and ->flags set.
  */
 static int generic_exec_single(int cpu, struct call_single_data *csd,
-			       smp_call_func_t func, void *info, int wait)
+			       smp_call_func_t func, void *info)
 {
-	struct call_single_data csd_stack = { .flags = 0 };
-	unsigned long flags;
-
-
 	if (cpu == smp_processor_id()) {
+		unsigned long flags;
+
+		/*
+		 * We can unlock early even for the synchronous on-stack case,
+		 * since we're doing this from the same CPU..
+		 */
+		csd_unlock(csd);
 		local_irq_save(flags);
 		func(info);
 		local_irq_restore(flags);
@@ -161,21 +162,9 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
 	if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))
 		return -ENXIO;
 
-
-	if (!csd) {
-		csd = &csd_stack;
-		if (!wait)
-			csd = this_cpu_ptr(&csd_data);
-	}
-
-	csd_lock(csd);
-
 	csd->func = func;
 	csd->info = info;
 
-	if (wait)
-		csd->flags |= CSD_FLAG_WAIT;
-
 	/*
 	 * The list addition should be visible before sending the IPI
 	 * handler locks the list to pull the entry off it because of
@@ -190,9 +179,6 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
 	if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
 		arch_send_call_function_single_ipi(cpu);
 
-	if (wait)
-		csd_lock_wait(csd);
-
 	return 0;
 }
 
@@ -250,8 +236,17 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 	}
 
 	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
-		csd->func(csd->info);
-		csd_unlock(csd);
+		smp_call_func_t func = csd->func;
+		void *info = csd->info;
+
+		/* Do we wait until *after* callback? */
+		if (csd->flags & CSD_FLAG_SYNCHRONOUS) {
+			func(info);
+			csd_unlock(csd);
+		} else {
+			csd_unlock(csd);
+			func(info);
+		}
 	}
 
 	/*
@@ -274,6 +269,8 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 			     int wait)
 {
+	struct call_single_data *csd;
+	struct call_single_data csd_stack = { .flags = CSD_FLAG_LOCK | CSD_FLAG_SYNCHRONOUS };
 	int this_cpu;
 	int err;
 
@@ -292,7 +289,16 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
 		     && !oops_in_progress);
 
-	err = generic_exec_single(cpu, NULL, func, info, wait);
+	csd = &csd_stack;
+	if (!wait) {
+		csd = this_cpu_ptr(&csd_data);
+		csd_lock(csd);
+	}
+
+	err = generic_exec_single(cpu, csd, func, info);
+
+	if (wait)
+		csd_lock_wait(csd);
 
 	put_cpu();
 
@@ -321,7 +327,15 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
 	int err = 0;
 
 	preempt_disable();
-	err = generic_exec_single(cpu, csd, csd->func, csd->info, 0);
+
+	/* We could deadlock if we have to wait here with interrupts disabled! */
+	if (WARN_ON_ONCE(csd->flags & CSD_FLAG_LOCK))
+		csd_lock_wait(csd);
+
+	csd->flags = CSD_FLAG_LOCK;
+	smp_wmb();
+
+	err = generic_exec_single(cpu, csd, csd->func, csd->info);
 	preempt_enable();
 
 	return err;
@@ -433,6 +447,8 @@ void smp_call_function_many(const struct cpumask *mask,
 		struct call_single_data *csd = per_cpu_ptr(cfd->csd, cpu);
 
 		csd_lock(csd);
+		if (wait)
+			csd->flags |= CSD_FLAG_SYNCHRONOUS;
 		csd->func = func;
 		csd->info = info;
 		llist_add(&csd->llist, &per_cpu(call_single_queue, cpu));
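
As a caller sees it, the two paths this patch reworks look roughly like
this (an illustrative kernel-style usage sketch, not part of the patch):

static void remote_work(void *info)
{
	/* runs on the target CPU, in interrupt context */
}

static void example(int cpu)
{
	/* wait == 1: synchronous. With the patch this uses an on-stack csd
	 * marked CSD_FLAG_SYNCHRONOUS, and the caller spins in
	 * csd_lock_wait() until the callback has run on the remote CPU. */
	smp_call_function_single(cpu, remote_work, NULL, 1);

	/* wait == 0: asynchronous. The per-cpu csd_data is used and the
	 * caller returns once the request is queued and the IPI is sent. */
	smp_call_function_single(cpu, remote_work, NULL, 0);
}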

^ permalink raw reply related	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
@ 2015-02-12 11:09 Martin van Es
  2015-02-12 16:01 ` Linus Torvalds
  0 siblings, 1 reply; 486+ messages in thread
From: Martin van Es @ 2015-02-12 11:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

To follow up on this long standing promise to bisect.

I've made two attempts at bisecting and both landed in limbo. It's
hard to explain, but it feels like this bug has quantum properties.

I know for sure it's present in 3.17 and not in 3.16(.7). But once I
start bisecting it gets less pronounced. I know it sounds vague, but
the best explanations I can come up with are:
- The out-of-kernel dvb-c driver and hardware are playing funny games
(although being absolutely stable in 3.16?)
- and/or the problem consists of two or more commits that
independently don't express themselves like I see in 3.17?

So, my second bisection attempt ended at
3be738ad55d8e2b9d949eb0d830de5aa4d4f8e05, which is nonsense because
it's a commit to a staging module that I don't even compile.

Best I can come up with now is try the next mainline that has all the
fixes and ideas in this thread incorporated. Would that be 3.19?

I'm sorry I couldn't be more helpful.

Best regards,
Martin

On Sat, Dec 6, 2014 at 10:14 PM, Martin van Es <mrvanes@gmail.com> wrote:
> On Sat, Dec 6, 2014 at 9:09 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Sat, Dec 6, 2014 at 8:22 AM, Martin van Es <mrvanes@gmail.com> wrote:
>>>
>>> Hope this may help in finding the right direction for this bug?
>>
>> If you can reproduce it with your spare J1900 system and could perhaps
>> bisect it there, that would be a huge help.
>>
>
> I'll give it a shot and see if I can get it to freeze on 3.17.3 as a
> start by playing content from the prd backend, but don't expect fast
> response times... busy man...
>
> M.



-- 
If 'but' was any useful, it would be a logic operator

^ permalink raw reply	[flat|nested] 486+ messages in thread

* Re: frequent lockups in 3.18rc4
@ 2014-12-16  3:04 Hillf Danton
  0 siblings, 0 replies; 486+ messages in thread
From: Hillf Danton @ 2014-12-16  3:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
	'Chris Mason',
	linux-kernel

> 
> But me not seeing any other bug clearly doesn't mean it doesn't exist.
> 
Perhaps we can ease the zap loop if it is busy.

thanks
Hillf

--- a/mm/memory.c	Tue Dec 16 10:38:03 2014
+++ b/mm/memory.c	Tue Dec 16 10:42:07 2014
@@ -1212,8 +1212,10 @@ again:
 		force_flush = 0;
 		tlb_flush_mmu_free(tlb);
 
-		if (addr != end)
+		if (addr != end) {
+			cond_resched();
 			goto again;
+		}
 	}
 
 	return addr;
--



^ permalink raw reply	[flat|nested] 486+ messages in thread

end of thread, other threads:[~2015-02-12 16:01 UTC | newest]

Thread overview: 486+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-14 21:31 frequent lockups in 3.18rc4 Dave Jones
2014-11-14 22:01 ` Linus Torvalds
2014-11-14 22:30   ` Dave Jones
2014-11-14 22:55   ` Thomas Gleixner
2014-11-14 23:32     ` Dave Jones
2014-11-15  0:36       ` Thomas Gleixner
2014-11-15  2:40         ` Dave Jones
2014-11-16 12:16           ` Thomas Gleixner
2014-11-15  1:59     ` Linus Torvalds
2014-11-17 21:22       ` Linus Torvalds
2014-11-17 22:31         ` Thomas Gleixner
2014-11-17 22:43           ` Thomas Gleixner
2014-11-17 22:58             ` Jens Axboe
2014-11-17 23:59             ` Linus Torvalds
2014-11-18  0:15               ` Thomas Gleixner
2014-11-17 23:04         ` Jens Axboe
2014-11-17 23:17           ` Thomas Gleixner
2014-11-18  2:23             ` Jens Axboe
2014-11-15 21:34   ` Dave Jones
2014-11-16  1:40     ` Dave Jones
2014-11-16  6:33       ` Linus Torvalds
2014-11-16 10:06         ` Markus Trippelsdorf
2014-11-16 18:33           ` Linus Torvalds
2014-11-17 17:03         ` Dave Jones
2014-11-17 19:59           ` Linus Torvalds
2014-11-18  2:09             ` Dave Jones
2014-11-18  2:21               ` Linus Torvalds
2014-11-18  2:39                 ` Dave Jones
2014-11-18  2:51                   ` Linus Torvalds
2014-11-18 14:52                     ` Dave Jones
2014-11-18 17:20                       ` Linus Torvalds
2014-11-18 19:28                         ` Thomas Gleixner
2014-11-18 21:25                           ` Don Zickus
2014-11-18 21:31                             ` Dave Jones
2014-11-18 18:54                       ` Thomas Gleixner
2014-11-18 21:55                         ` Don Zickus
2014-11-18 22:02                           ` Dave Jones
2014-11-19 14:41                             ` Don Zickus
2014-11-19 15:03                               ` Vivek Goyal
2014-11-19 15:38                                 ` Dave Jones
2014-11-19 16:28                                   ` Vivek Goyal
2014-11-20 16:10                                     ` Dave Jones
2014-11-20 16:48                                       ` Vivek Goyal
2014-11-20 17:38                                         ` Dave Jones
2014-11-21  9:46                                           ` Dave Young
2014-11-20 16:54                                       ` Vivek Goyal
2014-11-20  9:54                               ` Dave Young
2014-11-19  2:19                           ` Dave Jones
2014-11-19  4:40                             ` Linus Torvalds
2014-11-19  4:59                               ` Dave Jones
2014-11-19  5:15                               ` Dave Jones
2014-11-20 14:36                                 ` Frederic Weisbecker
2014-11-19 14:59                               ` Dave Jones
2014-11-19 17:22                                 ` Linus Torvalds
2014-11-19 17:40                                   ` Linus Torvalds
2014-11-19 19:02                                     ` Frederic Weisbecker
2014-11-19 19:03                                       ` Andy Lutomirski
2014-11-19 23:00                                         ` Frederic Weisbecker
2014-11-19 23:07                                           ` Andy Lutomirski
2014-11-19 23:13                                             ` Frederic Weisbecker
2014-11-19 21:56                                       ` Thomas Gleixner
2014-11-19 22:56                                         ` Frederic Weisbecker
2014-11-19 22:59                                           ` Andy Lutomirski
2014-11-19 23:07                                             ` Frederic Weisbecker
2014-11-19 23:09                                           ` Thomas Gleixner
2014-11-19 23:50                                             ` Frederic Weisbecker
2014-11-20 12:23                                               ` Tejun Heo
2014-11-20 21:58                                                 ` Thomas Gleixner
2014-11-20 22:06                                                   ` Andy Lutomirski
2014-11-20 22:11                                                   ` Tejun Heo
2014-11-20 22:42                                                     ` Thomas Gleixner
2014-11-20 23:05                                                       ` Tejun Heo
2014-11-20 23:08                                                         ` Andy Lutomirski
2014-11-20 23:34                                                           ` Linus Torvalds
2014-11-20 23:39                                                           ` Tejun Heo
2014-11-20 23:55                                                             ` Andy Lutomirski
2014-11-21 16:27                                                               ` Tejun Heo
2014-11-21 16:38                                                                 ` Andy Lutomirski
2014-11-21 16:48                                                                   ` Linus Torvalds
2014-11-21 17:08                                                                     ` Steven Rostedt
2014-11-21 17:19                                                                       ` Linus Torvalds
2014-11-21 17:22                                                                         ` Andy Lutomirski
2014-11-21 18:22                                                                           ` Linus Torvalds
2014-11-21 18:28                                                                             ` Andy Lutomirski
2014-11-21 19:06                                                                             ` Linus Torvalds
2014-11-21 19:23                                                                               ` Steven Rostedt
2014-11-21 19:34                                                                                 ` Linus Torvalds
2014-11-21 19:46                                                                                   ` Linus Torvalds
2014-11-21 19:52                                                                                     ` Andy Lutomirski
2014-11-21 20:14                                                                                       ` Josh Boyer
2014-11-21 20:16                                                                                         ` Andy Lutomirski
2014-11-21 20:23                                                                                           ` Josh Boyer
2014-11-24 18:48                                                                                             ` Konrad Rzeszutek Wilk
2014-11-24 19:07                                                                                               ` Josh Boyer
2014-11-25  5:36                                                                                               ` Jürgen Groß
2014-11-25 17:22                                                                                                 ` Linus Torvalds
2014-11-21 20:00                                                                                     ` Dave Jones
2014-11-21 20:02                                                                                       ` Andy Lutomirski
2014-11-21 19:51                                                                               ` Thomas Gleixner
2014-11-21 20:00                                                                                 ` Linus Torvalds
2014-11-21 20:16                                                                                   ` Thomas Gleixner
2014-11-21 20:41                                                                                     ` Linus Torvalds
2014-11-21 21:11                                                                                       ` Thomas Gleixner
2014-11-21 22:55                                                                                         ` Linus Torvalds
2014-11-21 23:03                                                                                           ` Andy Lutomirski
2014-11-21 23:33                                                                                             ` Linus Torvalds
2014-12-16 19:28                                                                                           ` Peter Zijlstra
2014-12-16 20:46                                                                                             ` Linus Torvalds
2014-12-16 21:19                                                                                               ` Mel Gorman
2014-12-16 23:02                                                                                                 ` Peter Zijlstra
2014-12-17  0:00                                                                                                   ` Linus Torvalds
2014-12-17  0:41                                                                                                     ` Andy Lutomirski
2014-12-17 17:01                                                                                                       ` Konrad Rzeszutek Wilk
2014-12-17 17:14                                                                                                         ` Peter Zijlstra
2014-11-21 22:33                                                                                 ` Konrad Rzeszutek Wilk
2014-11-22  1:17                                                                                   ` Thomas Gleixner
2014-11-21 17:34                                                                         ` Steven Rostedt
2014-11-21 18:24                                                                           ` Linus Torvalds
2014-11-21 22:10                                                                   ` Frederic Weisbecker
2014-11-21  2:33                                                             ` Steven Rostedt
2014-11-21  0:54                                                         ` Thomas Gleixner
2014-11-21 14:13                                                           ` Frederic Weisbecker
2014-11-21 16:25                                                             ` Tejun Heo
2014-11-21 17:01                                                               ` Steven Rostedt
2014-11-21 17:11                                                                 ` Steven Rostedt
2014-11-21 21:32                                                                 ` Frederic Weisbecker
2014-11-21 21:34                                                                   ` Andy Lutomirski
2014-11-21 21:50                                                                     ` Frederic Weisbecker
2014-11-21 22:45                                                                       ` Steven Rostedt
2014-11-21 21:44                                                               ` Frederic Weisbecker
2014-11-22  0:11                                                                 ` Tejun Heo
2014-11-22  0:18                                                                   ` Linus Torvalds
2014-11-22  0:41                                                                     ` Andy Lutomirski
2014-11-19 23:54                                             ` Andy Lutomirski
2014-11-20  0:00                                               ` Thomas Gleixner
2014-11-20  0:30                                                 ` Andy Lutomirski
2014-11-20  0:40                                                   ` Linus Torvalds
2014-11-20  0:49                                                     ` Andy Lutomirski
2014-11-20  1:07                                                       ` Linus Torvalds
2014-11-20  1:16                                                         ` Andy Lutomirski
2014-11-20  2:42                                                           ` Linus Torvalds
2014-11-20  6:16                                                             ` Andy Lutomirski
2014-11-19 19:15                                   ` Andy Lutomirski
2014-11-19 19:38                                     ` Linus Torvalds
2014-11-19 22:18                                       ` Dave Jones
2014-11-19 21:01                                 ` Andy Lutomirski
2014-11-19 21:47                                   ` Dave Jones
2014-11-19 21:58                                     ` Borislav Petkov
2014-11-19 22:18                                       ` Dave Jones
2014-11-20 10:33                                         ` Borislav Petkov
2014-11-19 21:56                                   ` [PATCH] x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1 Andy Lutomirski
2014-11-19 22:13                                     ` Thomas Gleixner
2014-11-20 20:33                                       ` Linus Torvalds
2014-11-20 22:07                                         ` Thomas Gleixner
2014-11-20 22:04                                     ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
2014-11-20 15:25                                   ` frequent lockups in 3.18rc4 Dave Jones
2014-11-20 19:43                                     ` Linus Torvalds
2014-11-20 20:06                                       ` Dave Jones
2014-11-20 20:37                                       ` Don Zickus
2014-11-20 20:51                                         ` Linus Torvalds
2014-11-21  6:37                                       ` Ingo Molnar
2014-11-21 14:50                                         ` Dave Jones
2014-11-25 12:22                                     ` Will Deacon
2014-12-01 11:48                                       ` Will Deacon
2014-12-01 17:05                                         ` Linus Torvalds
2014-12-01 17:10                                           ` Will Deacon
2014-12-01 17:53                                             ` Linus Torvalds
2014-12-01 18:25                                               ` Kirill A. Shutemov
2014-12-01 18:36                                                 ` Linus Torvalds
2014-12-04 10:51                                                   ` Will Deacon
2014-12-04 14:56                                                     ` Dave Jones
2014-12-05 13:49                                                       ` Will Deacon
2014-11-20 15:04                                 ` Frederic Weisbecker
2014-11-20 15:08           ` Frederic Weisbecker
2014-11-20 16:19             ` Dave Jones
2014-11-20 16:42               ` Frederic Weisbecker
2014-11-26  0:25         ` Dave Jones
2014-11-26  1:48           ` Linus Torvalds
2014-11-26  2:40             ` Dave Jones
2014-11-26 22:57               ` Dave Jones
2014-11-27  0:46                 ` Linus Torvalds
2014-11-27 19:17                 ` Linus Torvalds
2014-11-27 22:56                   ` Dave Jones
2014-11-29 20:38                     ` Dâniel Fraga
2014-11-30 20:45                       ` Linus Torvalds
2014-11-30 21:21                         ` Dâniel Fraga
2014-12-01  0:21                           ` Linus Torvalds
2014-12-01  1:02                             ` Dâniel Fraga
2014-12-01 19:14                               ` Paul E. McKenney
2014-12-01 20:28                                 ` Dâniel Fraga
2014-12-01 20:36                                   ` Linus Torvalds
2014-12-01 23:08                                     ` Chris Mason
2014-12-01 23:25                                       ` Linus Torvalds
2014-12-01 23:44                                         ` Chris Mason
2014-12-02  0:39                                           ` Linus Torvalds
2014-12-02 14:13                                       ` Mike Galbraith
2014-12-02 16:33                                         ` Linus Torvalds
2014-12-02 17:14                                           ` Chris Mason
2014-12-03 18:41                                             ` Dave Jones
2014-12-03 18:45                                               ` Linus Torvalds
2014-12-03 19:00                                                 ` Dave Jones
2014-12-03 19:25                                                   ` Linus Torvalds
2014-12-03 19:30                                                     ` Dave Jones
2014-12-03 19:48                                                     ` Linus Torvalds
2014-12-03 20:09                                                       ` Dave Jones
2014-12-03 20:37                                                         ` Linus Torvalds
2014-12-03 20:55                                                           ` Thomas Gleixner
2014-12-03 21:14                                                             ` Linus Torvalds
2014-12-03 22:19                                                               ` Thomas Gleixner
2014-12-03 23:21                                                                 ` Dave Jones
2014-12-03 23:49                                                                   ` Thomas Gleixner
2014-12-04  0:19                                                                     ` Linus Torvalds
2014-12-04  1:02                                                                       ` Thomas Gleixner
2014-12-04  0:20                                                                     ` Dave Jones
2014-12-04  0:59                                                                       ` Thomas Gleixner
2014-12-04  1:32                                                                         ` Dave Jones
2014-12-04  3:45                                                                           ` Dave Jones
2014-12-03 19:56                                                     ` John Stultz
2014-12-03 20:37                                                       ` Thomas Gleixner
2014-12-03 20:44                                                         ` Dave Jones
2014-12-03 20:59                                                           ` Thomas Gleixner
2014-12-03 21:05                                                             ` Dave Jones
2014-12-03 21:48                                                               ` Thomas Gleixner
2014-12-03 20:39                                                       ` Thomas Gleixner
2014-12-04  3:15                                                         ` Chris Mason
2014-12-04  5:49                                                           ` Linus Torvalds
2014-12-04 14:57                                                             ` Chris Mason
2014-12-04 15:22                                                             ` Dave Hansen
2014-12-04 15:30                                                               ` Chris Mason
2014-12-03 19:59                                                   ` Chris Mason
2014-12-03 20:11                                                     ` Dave Jones
2014-12-03 20:56                                                       ` Chris Mason
2014-12-04  0:27                                                 ` Dave Jones
2014-12-05 17:15                                                 ` Dave Jones
2014-12-05 18:38                                                   ` Linus Torvalds
2014-12-05 18:48                                                     ` Dave Jones
2014-12-05 19:31                                                       ` Linus Torvalds
2014-12-05 19:37                                                         ` Dave Jones
2014-12-06 22:38                                                         ` Thomas Gleixner
2014-12-06  9:37                                                       ` Chuck Ebbert
2014-12-06 16:22                                                         ` Martin van Es
2014-12-06 20:09                                                           ` Linus Torvalds
2014-12-06 20:41                                                             ` Linus Torvalds
2014-12-06 21:14                                                             ` Martin van Es
2014-12-12 12:58                                                             ` Martin van Es
2014-12-15 12:07                                                               ` Martin van Es
2014-12-06 22:14                                                         ` Thomas Gleixner
2014-12-05 19:04                                                     ` Chris Mason
2014-12-05 19:29                                                       ` Linus Torvalds
2014-12-11 14:54                                                         ` Dave Jones
2014-12-11 21:49                                                           ` Linus Torvalds
2014-12-11 21:52                                                             ` Sasha Levin
2014-12-11 21:57                                                               ` Chris Mason
2014-12-11 22:00                                                                 ` Sasha Levin
2014-12-11 22:36                                                               ` Linus Torvalds
2014-12-11 22:57                                                                 ` Sasha Levin
2014-12-12  6:54                                                                   ` Ingo Molnar
2014-12-12 23:54                                                                   ` Sasha Levin
2014-12-13  0:23                                                                     ` Linus Torvalds
2014-12-13  0:34                                                                       ` Sasha Levin
2014-12-13  0:44                                                                         ` Linus Torvalds
2014-12-13 16:28                                                                           ` Jeff Chua
2014-12-13  2:32                                                                       ` Dave Jones
2014-12-11 21:57                                                             ` Borislav Petkov
2014-12-12  3:03                                                             ` Dave Jones
2014-12-12  4:45                                                               ` Dave Jones
2014-12-12 14:38                                                                 ` Dave Jones
2014-12-12 18:24                                                                   ` Paul E. McKenney
2014-12-12 18:10                                                                 ` Paul E. McKenney
2014-12-12 18:42                                                                   ` Dave Jones
2014-12-12 18:54                                                             ` Dave Jones
2014-12-12 19:14                                                               ` Linus Torvalds
2014-12-12 19:23                                                                 ` Dave Jones
2014-12-12 19:58                                                                 ` David Lang
2014-12-12 20:20                                                                   ` Linus Torvalds
2014-12-13  7:43                                                                     ` Ingo Molnar
2014-12-12 20:34                                                                   ` Paul E. McKenney
2014-12-12 21:23                                                                     ` Sasha Levin
2014-12-13  0:58                                                                       ` Paul E. McKenney
2014-12-13 12:08                                                                         ` Paul E. McKenney
2014-12-13  8:30                                                                       ` Ingo Molnar
2014-12-13 15:53                                                                         ` Sasha Levin
2014-12-13 18:07                                                                           ` Paul E. McKenney
2014-12-14 17:50                                                                             ` Paul E. McKenney
2014-12-14 23:46                                                                               ` Sasha Levin
2014-12-15  0:11                                                                                 ` Paul E. McKenney
2014-12-15  1:20                                                                                   ` Sasha Levin
2014-12-15  6:33                                                                                     ` Paul E. McKenney
2014-12-15 12:56                                                                                       ` Paul E. McKenney
2014-12-15 13:16                                                                                         ` Sasha Levin
2014-12-16  3:40                                                                                           ` Paul E. McKenney
2014-12-13  7:36                                                                 ` [PATCH] sched: Fix lost reschedule in __cond_resched() Ingo Molnar
2014-12-14 18:04                                                                   ` Frederic Weisbecker
2014-12-14 19:43                                                                     ` Ingo Molnar
2014-12-14 19:50                                                                     ` Linus Torvalds
2014-12-14 20:30                                                                       ` Frederic Weisbecker
2014-12-13  8:19                                                                 ` frequent lockups in 3.18rc4 Ingo Molnar
2014-12-13  8:27                                                                   ` Ingo Molnar
2014-12-13 14:15                                                                     ` Sasha Levin
2014-12-13 16:59                                                                 ` Dave Jones
2014-12-13 18:04                                                                   ` Paul E. McKenney
2014-12-13 20:41                                                                     ` Dave Jones
2014-12-14  4:04                                                                       ` Paul E. McKenney
2014-12-13 22:36                                                                   ` Dave Jones
2014-12-13 22:40                                                                     ` Linus Torvalds
2014-12-13 22:59                                                                       ` Linus Torvalds
2014-12-13 23:09                                                                         ` Linus Torvalds
2014-12-13 23:35                                                                           ` Al Viro
2014-12-13 23:38                                                                             ` Linus Torvalds
2014-12-13 23:47                                                                               ` Al Viro
2014-12-14  0:14                                                                                 ` Linus Torvalds
2014-12-14  0:33                                                                                   ` Al Viro
2014-12-14  1:35                                                                                     ` Linus Torvalds
2014-12-14  3:14                                                                                       ` Al Viro
2014-12-15  0:18                                                                                         ` Al Viro
2014-12-13 23:39                                                                         ` Al Viro
2014-12-14 23:46                                                                       ` Dave Jones
2014-12-15  0:38                                                                         ` Linus Torvalds
2014-12-15  0:42                                                                           ` Dave Jones
2014-12-15  5:47                                                                           ` Linus Torvalds
2014-12-15  5:57                                                                             ` Dave Jones
2014-12-15 18:21                                                                               ` Linus Torvalds
2014-12-15 23:46                                                                                 ` Linus Torvalds
2014-12-18  2:42                                                                                   ` Sasha Levin
2014-12-18  2:45                                                                                     ` Linus Torvalds
2014-12-18  5:13                                                                                   ` Dave Jones
2014-12-18 15:54                                                                                     ` Chris Mason
2014-12-18 16:12                                                                                       ` Dave Jones
2014-12-19  2:45                                                                                         ` Dave Jones
2014-12-19  3:49                                                                                           ` Linus Torvalds
2014-12-19  3:58                                                                                             ` Dave Jones
2014-12-19  4:03                                                                                               ` Dave Jones
2014-12-19  4:48                                                                                                 ` Linus Torvalds
2014-12-19 11:35                                                                                                   ` Peter Zijlstra
2014-12-19 14:55                                                                                                   ` Dave Jones
2014-12-19 15:14                                                                                                     ` Chris Mason
2014-12-19 19:15                                                                                                     ` Linus Torvalds
2014-12-19 19:44                                                                                                       ` Peter Zijlstra
2014-12-19 19:51                                                                                                       ` Linus Torvalds
2014-12-19 20:46                                                                                                         ` Linus Torvalds
2014-12-19 20:54                                                                                                           ` Dave Jones
2014-12-19 22:05                                                                                                             ` Linus Torvalds
2014-12-20 16:49                                                                                                               ` Dave Jones
2014-12-19 20:31                                                                                                       ` Chris Mason
2014-12-19 20:36                                                                                                         ` Dave Jones
2014-12-19 23:22                                                                                                         ` Thomas Gleixner
2014-12-20  0:12                                                                                                           ` Chris Mason
2014-12-20  1:06                                                                                                             ` Thomas Gleixner
2014-12-19 23:14                                                                                                       ` Thomas Gleixner
2014-12-19 23:55                                                                                                         ` Linus Torvalds
2014-12-20  1:00                                                                                                           ` Thomas Gleixner
2014-12-20  1:57                                                                                                             ` Linus Torvalds
2014-12-20 18:25                                                                                                               ` Linus Torvalds
2014-12-20 21:16                                                                                                                 ` Linus Torvalds
2014-12-21  3:52                                                                                                                   ` Paul E. McKenney
2014-12-21 21:22                                                                                                                   ` Linus Torvalds
2014-12-21 22:19                                                                                                                     ` Linus Torvalds
2014-12-21 22:32                                                                                                                       ` Dave Jones
2014-12-21 23:58                                                                                                                         ` Linus Torvalds
2014-12-22  0:41                                                                                                                           ` Linus Torvalds
2014-12-22  0:52                                                                                                                             ` Linus Torvalds
2014-12-22  1:22                                                                                                                               ` Dave Jones
2014-12-22  3:11                                                                                                                               ` Paul E. McKenney
2014-12-22 19:47                                                                                                                             ` Linus Torvalds
2014-12-22 20:06                                                                                                                               ` Linus Torvalds
2014-12-22 22:57                                                                                                                               ` Dave Jones
2014-12-22 23:59                                                                                                                                 ` Linus Torvalds
2014-12-23 14:56                                                                                                                                   ` Dave Jones
2014-12-24 13:58                                                                                                                                     ` Sasha Levin
2014-12-24  3:01                                                                                                                                   ` Dave Jones
2014-12-26 16:34                                                                                                                                     ` Dave Jones
2014-12-26 18:12                                                                                                                                       ` Dave Jones
2014-12-26 20:57                                                                                                                                         ` Linus Torvalds
2014-12-26 21:20                                                                                                                                           ` Dave Jones
2014-12-26 22:57                                                                                                                                           ` Dave Jones
2014-12-26 23:16                                                                                                                                             ` Linus Torvalds
2014-12-27  0:36                                                                                                                                               ` Dave Jones
2014-12-27  3:14                                                                                                                                                 ` Linus Torvalds
2014-12-27 16:48                                                                                                                                                   ` Dave Jones
2014-12-26 23:30                                                                                                                                             ` Linus Torvalds
2014-12-27  0:39                                                                                                                                               ` Dave Jones
2014-12-27  2:53                                                                                                                                               ` Dave Jones
2015-01-03  0:27                                                                                                                                           ` John Stultz
2015-01-03 14:58                                                                                                                                             ` Sasha Levin
2015-01-04 19:46                                                                                                                                             ` Linus Torvalds
2015-01-06  1:17                                                                                                                                               ` John Stultz
2015-01-06  1:25                                                                                                                                                 ` Linus Torvalds
2015-01-06  2:05                                                                                                                                                   ` John Stultz
2014-12-22 23:59                                                                                                                               ` John Stultz
2014-12-23  0:46                                                                                                                                 ` Linus Torvalds
2014-12-27 20:33                                                                                                                                   ` Paul E. McKenney
2015-01-12 10:05                                                                                                                           ` Thomas Gleixner
2014-12-19 14:30                                                                                               ` Chris Mason
2014-12-19 15:12                                                                                                 ` Dave Jones
2014-12-18 18:54                                                                                       ` Linus Torvalds
2014-12-15 14:00                                                                             ` Borislav Petkov
2014-12-18 21:17                                                                             ` save_xstate_sig (Re: frequent lockups in 3.18rc4) Andy Lutomirski
2014-12-18 21:34                                                                               ` Linus Torvalds
2014-12-18 21:41                                                                                 ` Andy Lutomirski
2014-12-18 21:37                                                                               ` Dave Jones
2014-12-17 18:22                                                                           ` frequent lockups in 3.18rc4 Dave Jones
2014-12-17 18:57                                                                             ` Dave Jones
2014-12-17 19:24                                                                               ` Dave Jones
2014-12-17 19:51                                                                               ` Linus Torvalds
2014-12-17 20:16                                                                                 ` Dave Jones
2014-12-17 19:41                                                                             ` Linus Torvalds
2014-12-06  5:04                                                     ` Gene Heskett
2014-12-02 17:47                                           ` Mike Galbraith
2014-12-13  8:11                                             ` Ingo Molnar
2014-12-13  9:57                                               ` Mike Galbraith
2014-12-17 11:13                                           ` Peter Zijlstra
2014-12-02 19:32                                       ` Dave Jones
2014-12-02 23:32                                         ` Sasha Levin
2014-12-03  0:09                                           ` Linus Torvalds
2014-12-03  0:25                                             ` Sasha Levin
2014-12-05  5:00                                           ` Sasha Levin
2014-12-05  6:38                                             ` Linus Torvalds
2014-12-05 15:03                                               ` Sasha Levin
2014-12-05 18:15                                                 ` Linus Torvalds
2014-12-07 14:58                                                   ` Sasha Levin
2014-12-07 18:24                                                     ` Paul E. McKenney
2014-12-07 19:43                                                       ` Paul E. McKenney
2014-12-07 23:28                                                         ` Sasha Levin
2014-12-08  5:20                                                           ` Paul E. McKenney
2014-12-08 14:33                                                             ` Sasha Levin
2014-12-08 15:28                                                               ` Sasha Levin
2014-12-08 15:57                                                                 ` Paul E. McKenney
2014-12-08 16:34                                                                   ` Sasha Levin
2014-12-08 15:56                                                               ` Paul E. McKenney
2014-12-07 23:53                                                     ` Linus Torvalds
2014-12-02 19:31                                     ` Dave Jones
2014-12-02 21:17                                       ` Linus Torvalds
2014-12-02 20:30                                     ` Dave Jones
2014-12-02 20:48                                       ` Paul E. McKenney
2014-12-01 23:08                                   ` Paul E. McKenney
2014-12-02 16:43                                     ` Dâniel Fraga
2014-12-02 17:04                                       ` Paul E. McKenney
2014-12-02 17:14                                         ` Dâniel Fraga
2014-12-02 18:42                                           ` Paul E. McKenney
2014-12-02 18:47                                             ` Dâniel Fraga
2014-12-02 19:11                                               ` Paul E. McKenney
2014-12-02 19:24                                                 ` Dâniel Fraga
2014-12-02 20:56                                                   ` Paul E. McKenney
2014-12-02 22:01                                                     ` Dâniel Fraga
2014-12-02 22:10                                                       ` Paul E. McKenney
2014-12-02 22:18                                                         ` Dâniel Fraga
2014-12-02 22:35                                                           ` Paul E. McKenney
2014-12-02 22:10                                                       ` Linus Torvalds
2014-12-02 22:16                                                         ` Dâniel Fraga
2014-12-03  3:21                                                         ` Dâniel Fraga
2014-12-03  4:14                                                           ` Linus Torvalds
2014-12-03  4:51                                                             ` Dâniel Fraga
2014-12-03  6:02                                                             ` Chris Rorvick
2014-12-03 15:22                                                               ` Linus Torvalds
2014-12-04  8:43                                                                 ` Dâniel Fraga
2014-12-04 16:18                                                                   ` Linus Torvalds
2014-12-04 16:52                                                                     ` Frederic Weisbecker
2014-12-04 17:25                                                                       ` Dâniel Fraga
2014-12-04 17:47                                                                         ` Linus Torvalds
2014-12-04 18:07                                                                           ` Dâniel Fraga
2014-12-03 14:54                                                             ` Tejun Heo
2014-12-02 18:09                                         ` Paul E. McKenney
2014-12-02 18:41                                           ` Dâniel Fraga
2014-12-02 17:08                                       ` Linus Torvalds
2014-12-02 17:16                                         ` Dâniel Fraga
2014-12-02  8:40                                 ` Lai Jiangshan
2014-12-02 16:58                                   ` Paul E. McKenney
2014-12-02 16:58                                   ` Dâniel Fraga
2014-12-02 17:17                                     ` Paul E. McKenney
2014-12-03  2:03                                     ` Lai Jiangshan
2014-12-03  5:22                                       ` Paul E. McKenney
2014-12-01 16:56                     ` Don Zickus
2014-11-26  4:39             ` Jürgen Groß
     [not found]               ` <CA+55aFx1SiFBzmA=k9jHxi3cZE3Ei_+2NHepujgf86KEvkz8eQ@mail.gmail.com>
2014-11-26  5:11                 ` Dave Jones
2014-11-26  5:24                 ` Juergen Gross
2014-11-26  5:52                   ` Linus Torvalds
2014-11-26  6:21                     ` Linus Torvalds
2014-11-26  6:52                       ` Juergen Gross
2014-11-26  9:44                       ` Juergen Gross
2014-11-26 14:34                       ` Dave Jones
2014-11-26 17:37                         ` Linus Torvalds
2014-11-20 15:28       ` Frederic Weisbecker
2014-11-17 15:07 ` Don Zickus
2014-12-16  3:04 Hillf Danton
2015-02-12 11:09 Martin van Es
2015-02-12 16:01 ` Linus Torvalds
