* crash in 3.12.51 (likely in 3.12.52 as well) in timer code
@ 2016-02-03 10:58 Nikolay Borisov
  2016-02-04  8:56 ` Thomas Gleixner
  2016-02-04 11:32 ` Mike Galbraith
  0 siblings, 2 replies; 7+ messages in thread
From: Nikolay Borisov @ 2016-02-03 10:58 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org
  Cc: Jiri Slaby, Oleg Nesterov, tglx, SiteGround Operations

Hello, 

I've observed the following crash on a machine running 3.12.51:

[2711471.041886] Modules linked in: xt_length xt_state xt_pkttype xt_dscp xt_multiport xt_set(O) ip_set_list_set(O) ip_set_hash_ip(O) ip_set(O) act_police cls_basic sch_ingress veth dm_snapshot netconsole openvswitch gre vxlan ip_tunnel nf_nat_ftp nf_conntrack_ftp xt_owner xt_conntrack iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log ipmi_devintf ipmi_si ipmi_msghandler i2c_i801 lpc_ich mfd_core shpchp ioapic ioatdma ses enclosure ixgbe dca
[2711471.059208] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G           O 3.12.51-clouder5 #2
[2711471.059563] Hardware name: Supermicro PIO-628U-TR4T+-ST031/X10DRU-i+, BIOS 1.0c 03/23/2015
[2711471.059919] task: ffff881fd31db870 ti: ffff881fd31ea000 task.ti: ffff881fd31ea000
[2711471.060273] RIP: 0010:[<ffffffff81097718>]  [<ffffffff81097718>] detach_if_pending+0x48/0x100
[2711471.060972] RSP: 0018:ffff883fff203bd0  EFLAGS: 00010002
[2711471.061320] RAX: dead000000200200 RBX: ffffffffa018be20 RCX: 0000000000000074
[2711471.061672] RDX: ffff883fd2e14638 RSI: ffff883fd2df8000 RDI: ffffffffa018be20
[2711471.062025] RBP: ffff883fff203bf0 R08: 0000000000000000 R09: ffff881fff403700
[2711471.062377] R10: ffffea00d4178f80 R11: 0000000000000000 R12: ffff883fd2df8000
[2711471.062729] R13: 0000000000000000 R14: 0000000000000001 R15: ffff883fff203c88
[2711471.063081] FS:  0000000000000000(0000) GS:ffff883fff200000(0000) knlGS:0000000000000000
[2711471.063437] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2711471.063787] CR2: 00007f7b9b4b2000 CR3: 000000380ddae000 CR4: 00000000001407e0
[2711471.064143] Stack:
[2711471.064483]  ffffffffa018be20 ffff883fd2df8000 0000000000000000 0000000000000000
[2711471.065090]  ffff883fff203c30 ffffffff810978f1 0000000000000000 0000000000000082
[2711471.065695]  ffff883fff203c88 ffffffffa018be00 ffff883fff203c88 0000000000000001
[2711471.066301] Call Trace:
[2711471.066643]  <IRQ>
[2711471.067112]  [<ffffffff810978f1>] del_timer+0x41/0x70
[2711471.067465]  [<ffffffff810a5271>] try_to_grab_pending+0x121/0x1d0
[2711471.067818]  [<ffffffff810a5532>] mod_delayed_work_on+0x42/0xa0
[2711471.068171]  [<ffffffffa018b1fa>] set_timeout+0x3a/0x40 [ib_addr]
[2711471.068523]  [<ffffffffa018b22d>] netevent_callback+0x2d/0x40 [ib_addr]
[2711471.068879]  [<ffffffff810b45c4>] notifier_call_chain+0x54/0x80
[2711471.069231]  [<ffffffff810b461a>] __atomic_notifier_call_chain+0x2a/0x40
[2711471.069584]  [<ffffffff810b4646>] atomic_notifier_call_chain+0x16/0x20
[2711471.069940]  [<ffffffff81590f8b>] call_netevent_notifiers+0x1b/0x20
[2711471.070292]  [<ffffffff81593a2e>] neigh_update_notify+0x1e/0x40
[2711471.070643]  [<ffffffff815941c6>] neigh_timer_handler+0x116/0x270
[2711471.070995]  [<ffffffff815940b0>] ? neigh_periodic_work+0x270/0x270
[2711471.071346]  [<ffffffff810975b9>] call_timer_fn+0x49/0x160
[2711471.079597]  [<ffffffff81098298>] run_timer_softirq+0x278/0x2e0
[2711471.079948]  [<ffffffff815940b0>] ? neigh_periodic_work+0x270/0x270
[2711471.080301]  [<ffffffff8108f037>] __do_softirq+0x137/0x2e0
[2711471.080653]  [<ffffffff8164c54c>] call_softirq+0x1c/0x30
[2711471.081006]  [<ffffffff8104a35d>] do_softirq+0x8d/0xc0
[2711471.081356]  [<ffffffff8108ebd5>] irq_exit+0x95/0xa0
[2711471.081706]  [<ffffffff8164cc8a>] smp_apic_timer_interrupt+0x4a/0x5a
[2711471.082057]  [<ffffffff8164b92f>] apic_timer_interrupt+0x6f/0x80
[2711471.082406]  <EOI>
[2711471.082877]  [<ffffffff81051b53>] ? mwait_idle+0x73/0x90
[2711471.083227]  [<ffffffff81051b4a>] ? mwait_idle+0x6a/0x90
[2711471.083577]  [<ffffffff81051bc6>] arch_cpu_idle+0x26/0x30
[2711471.083929]  [<ffffffff810d28db>] cpu_startup_entry+0xcb/0x2a0
[2711471.084283]  [<ffffffff81071369>] start_secondary+0x1e9/0x250
[2711471.084633] Code: 44 00 00 31 c0 41 89 d6 48 89 fb 48 8b 17 49 89 f4 48 85 d2 74 4a 8b 05 ff ad c0 00 85 c0 7f 6c 48 8b 43 08 45 84 f6 48 89 42 08 <48> 89 10 74 07 48 c7 03 00 00 00 00 48 b9 00 02 20 00 00 00 ad 
[2711471.089662] RIP  [<ffffffff81097718>] detach_if_pending+0x48/0x100
[2711471.090078]  RSP <ffff883fff203bd0>


Analysing the issue, it seems that a neighbor timer expires, which in
turn causes the subscribed ib_addr module to invoke set_timeout, which
queues delayed work. However, something appears to have already corrupted
the timer_list, since the crash actually occurs in detach_timer (inlined
into detach_if_pending). Here is the annotated assembly:

------------[detach_timer]----------------------
/home/projects/linux-stable/kernel/timer.c: 662
0xffffffff8109770d <detach_if_pending+61>:      mov    rax,QWORD PTR [rbx+0x8]  ; rbx holds value of rdi = timer_list
/home/projects/linux-stable/kernel/timer.c: 663
0xffffffff81097711 <detach_if_pending+65>:      test   r14b,r14b 
----------[__list_del]----------------------
/home/projects/linux-stable/include/linux/list.h: 88
0xffffffff81097714 <detach_if_pending+68>:      mov    QWORD PTR [rdx+0x8],rax ; ffffffffa018be20
/home/projects/linux-stable/include/linux/list.h: 89
0xffffffff81097718 <detach_if_pending+72>:      mov    QWORD PTR [rax],rdx 
---------------[__list_del]----------------
/home/projects/linux-stable/kernel/timer.c: 663
0xffffffff8109771b <detach_if_pending+75>:      je     0xffffffff81097724 <detach_if_pending+84> 
/home/projects/linux-stable/kernel/timer.c: 664
0xffffffff8109771d <detach_if_pending+77>:      mov    QWORD PTR [rbx],0x0
/home/projects/linux-stable/kernel/timer.c: 665
0xffffffff81097724 <detach_if_pending+84>:      movabs rcx,0xdead000000200200
------------[end detach_timer]-------------


It seems that when the code tries to do prev->next = next in __list_del,
called from detach_timer, rax holds dead000000200200 (LIST_POISON2).
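
For reference, here is the corresponding source (kernel/timer.c and
include/linux/list.h in 3.12; reproduced from memory, so it may differ
cosmetically from the exact stable tree):

static inline void __list_del(struct list_head *prev, struct list_head *next)
{
	next->prev = prev;	/* list.h:88 - mov QWORD PTR [rdx+0x8],rax */
	prev->next = next;	/* list.h:89 - faults: rax == LIST_POISON2 */
}

static inline void detach_timer(struct timer_list *timer, bool clear_pending)
{
	struct list_head *entry = &timer->entry;

	debug_deactivate(timer);

	__list_del(entry->prev, entry->next);	/* timer.c:662 */
	if (clear_pending)			/* timer.c:663 */
		entry->next = NULL;		/* timer.c:664 */
	entry->prev = LIST_POISON2;		/* timer.c:665 */
}

So the fault is the store through prev, i.e. through the value loaded from
entry->prev of the timer being detached, which held the poison value at
that point.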

ffffffffa018be20 is the address of the timer_list passed to detach_timer,
which looks like this:

crash> struct timer_list ffffffffa018be20
struct timer_list {
  entry = {
    next = 0xffff883fd2e14638, 
    prev = 0xffff883fff223e60
  }, 
  expires = 4565976929, 
  base = 0xffff883fd2e14002, 
  function = 0xffffffff810a4f70 <delayed_work_timer_fn>, 
  data = 18446744072100560384, 
  slack = -1
}

So in the crash dump the prev/next entries do not look corrupted, whereas
while the list was being manipulated inside detach_timer they did. This is
really odd; any ideas how to debug this further?

Regards, 
Nikolay


* Re: crash in 3.12.51 (likely in 3.12.52 as well) in timer code
  2016-02-03 10:58 crash in 3.12.51 (likely in 3.12.52 as well) in timer code Nikolay Borisov
@ 2016-02-04  8:56 ` Thomas Gleixner
  2016-02-04 11:32 ` Mike Galbraith
  1 sibling, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2016-02-04  8:56 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: linux-kernel@vger.kernel.org, Jiri Slaby, Oleg Nesterov,
	SiteGround Operations

On Wed, 3 Feb 2016, Nikolay Borisov wrote:
> 
> It seems that when the code tries to do prev->next = next in __list_del,
> called from detach_timer, rax holds dead000000200200 (LIST_POISON2).

> So in the crash dump the prev/next entries do not look corrupted, whereas
> while the list was being manipulated inside detach_timer they did. This is
> really odd; any ideas how to debug this further?

Please enable

CONFIG_DEBUG_OBJECTS
CONFIG_DEBUG_OBJECTS_TIMERS
CONFIG_DEBUG_OBJECTS_WORK

That should pinpoint the offending code.
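
In .config terms (debugobjects has runtime overhead, so expect some
slowdown):

CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_WORK=y

With those set, debugobjects warns (with a stack trace) as soon as a timer
or work item is used in an invalid state, e.g. activated while already
active or activated without being initialized, instead of the corruption
surfacing later.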

Thanks,

	tglx


* Re: crash in 3.12.51 (likely in 3.12.52 as well) in timer code
  2016-02-03 10:58 crash in 3.12.51 (likely in 3.12.52 as well) in timer code Nikolay Borisov
  2016-02-04  8:56 ` Thomas Gleixner
@ 2016-02-04 11:32 ` Mike Galbraith
  2016-02-04 11:51   ` Nikolay Borisov
  1 sibling, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2016-02-04 11:32 UTC (permalink / raw)
  To: Nikolay Borisov, linux-kernel@vger.kernel.org
  Cc: Jiri Slaby, Oleg Nesterov, tglx, SiteGround Operations

On Wed, 2016-02-03 at 12:58 +0200, Nikolay Borisov wrote:
> So in the crash dump the prev/next entries do not look corrupted, whereas
> while the list was being manipulated inside detach_timer they did. This is
> really odd; any ideas how to debug this further?

Suspiciously similar to https://lkml.org/lkml/2016/2/4/247

	-Mike


* Re: crash in 3.12.51 (likely in 3.12.52 as well) in timer code
  2016-02-04 11:32 ` Mike Galbraith
@ 2016-02-04 11:51   ` Nikolay Borisov
  2016-02-04 12:17     ` Mike Galbraith
  0 siblings, 1 reply; 7+ messages in thread
From: Nikolay Borisov @ 2016-02-04 11:51 UTC (permalink / raw)
  To: Mike Galbraith, linux-kernel@vger.kernel.org
  Cc: Jiri Slaby, Oleg Nesterov, tglx, SiteGround Operations



On 02/04/2016 01:32 PM, Mike Galbraith wrote:
> On Wed, 2016-02-03 at 12:58 +0200, Nikolay Borisov wrote:
>>
>> So in the crash dump the prev/next entries do not look corrupted, whereas
>> while the list was being manipulated inside detach_timer they did. This is
>> really odd; any ideas how to debug this further?
> 
> Suspiciously similar to https://lkml.org/lkml/2016/2/4/247

Right, I've been following this thread cursorily, but I was left with the
impression that this only occurs on machines where CPUs can go offline.
The server on which this happened should never offline any of its CPUs,
since power management is disabled (though I will have to double-check
this).

On a different note - is there a way to safely reproduce this so I can
test the suggested fix by Thomas?


> 
> 	-Mike
> 


* Re: crash in 3.12.51 (likely in 3.12.52 as well) in timer code
  2016-02-04 11:51   ` Nikolay Borisov
@ 2016-02-04 12:17     ` Mike Galbraith
  2016-02-04 12:21       ` Nikolay Borisov
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2016-02-04 12:17 UTC (permalink / raw)
  To: Nikolay Borisov, linux-kernel@vger.kernel.org
  Cc: Jiri Slaby, Oleg Nesterov, tglx, SiteGround Operations

On Thu, 2016-02-04 at 13:51 +0200, Nikolay Borisov wrote:
> 
> On 02/04/2016 01:32 PM, Mike Galbraith wrote:
> > On Wed, 2016-02-03 at 12:58 +0200, Nikolay Borisov wrote:
> > > 
> > > So in the crash dump the prev/next entries do not look corrupted,
> > > whereas while the list was being manipulated inside detach_timer they
> > > did. This is really odd; any ideas how to debug this further?
> > 
> > Suspiciously similar to https://lkml.org/lkml/2016/2/4/247
> 
> Right, I've been following this thread cursorily, but I was left with the
> impression that this only occurs on machines where CPUs can go offline.
> The server on which this happened should never offline any of its CPUs,
> since power management is disabled (though I will have to double-check
> this).

AFAIU, hotplug isn't required, only mod_delayed_work() being called
from a different CPU than where the timer was born, migrating it at a
bad time.
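
The window is in __mod_timer() (kernel/timer.c in 3.12; reproduced from
memory, so details may be slightly off): when the timer is moved to another
CPU's base, its base pointer is briefly NULLed and the old base lock is
dropped before the new one is taken:

	new_base = per_cpu(tvec_bases, cpu);

	if (base != new_base) {
		/*
		 * Can't switch the base under a running timer, or
		 * del_timer_sync() can't detect that the handler
		 * hasn't finished yet.
		 */
		if (likely(base->running_timer != timer)) {
			/* See the comment in lock_timer_base() */
			timer_set_base(timer, NULL);
			spin_unlock(&base->lock);
			base = new_base;
			spin_lock(&base->lock);
			timer_set_base(timer, base);
		}
	}

So there is a real mid-flight state while the timer migrates, no hotplug
needed.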

> On a different note - is there a way to safely reproduce this so I can
> test the suggested fix by Thomas?

Hm, write a module to beat mod_delayed_work() to pulp with a NR_CPUS
horde, and run it in a vm where you don't care about shrapnel?

	-Mike


* Re: crash in 3.12.51 (likely in 3.12.52 as well) in timer code
  2016-02-04 12:17     ` Mike Galbraith
@ 2016-02-04 12:21       ` Nikolay Borisov
  2016-02-04 12:27         ` Mike Galbraith
  0 siblings, 1 reply; 7+ messages in thread
From: Nikolay Borisov @ 2016-02-04 12:21 UTC (permalink / raw)
  To: Mike Galbraith, linux-kernel@vger.kernel.org
  Cc: Jiri Slaby, Oleg Nesterov, tglx, SiteGround Operations



On 02/04/2016 02:17 PM, Mike Galbraith wrote:
> On Thu, 2016-02-04 at 13:51 +0200, Nikolay Borisov wrote:
>>
>> On 02/04/2016 01:32 PM, Mike Galbraith wrote:
>>> On Wed, 2016-02-03 at 12:58 +0200, Nikolay Borisov wrote:
>>>>
>>>> So in the crash dump the prev/next entries do not look corrupted,
>>>> whereas while the list was being manipulated inside detach_timer they
>>>> did. This is really odd; any ideas how to debug this further?
>>>
>>> Suspiciously similar to https://lkml.org/lkml/2016/2/4/247
>>
>> Right, I've been following this thread cursorily, but I was left with the
>> impression that this only occurs on machines where CPUs can go offline.
>> The server on which this happened should never offline any of its CPUs,
>> since power management is disabled (though I will have to double-check
>> this).
> 
> AFAIU, hotplug isn't required, only mod_delayed_work() being called
> from a different CPU than where the timer was born, migrating it at a
> bad time.

Right, in this case ib_addr was indeed using mod_delayed_work, so things
line up so far.

> 
>> On a different note - is there a way to safely reproduce this so I can
>> test the suggested fix by Thomas?
> 
> Hm, write a module to beat mod_delayed_work() to pulp with a NR_CPUS
> horde, and run it in a vm where you don't care about shrapnel?

In other words, have multiple threads (NR_CPUS) that spin on
mod_delayed_work?


> 
> 	-Mike
> 


* Re: crash in 3.12.51 (likely in 3.12.52 as well) in timer code
  2016-02-04 12:21       ` Nikolay Borisov
@ 2016-02-04 12:27         ` Mike Galbraith
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Galbraith @ 2016-02-04 12:27 UTC (permalink / raw)
  To: Nikolay Borisov, linux-kernel@vger.kernel.org
  Cc: Jiri Slaby, Oleg Nesterov, tglx, SiteGround Operations

On Thu, 2016-02-04 at 14:21 +0200, Nikolay Borisov wrote:

> > > On a different note - is there a way to safely reproduce this so I can
> > > test the suggested fix by Thomas?
> > 
> > Hm, write a module to beat mod_delayed_work() to pulp with a NR_CPUS
> > horde, and run it in a vm where you don't care about shrapnel?
> 
> In other words, have multiple threads (NR_CPUS) that spin on
> mod_delayed_work?

Yeah, give threads lots of opportunities to collide. (not forgetting to
sleep at least occasionally;)
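
Something along these lines, maybe - a completely untested sketch against
the 3.12 APIs (names invented, and it will eventually wreck the box, hence
the throwaway vm):

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/workqueue.h>
#include <linux/cpumask.h>
#include <linux/delay.h>

static struct delayed_work dwork;
static struct task_struct *hammers[NR_CPUS];

static void dwork_fn(struct work_struct *work)
{
	/* body doesn't matter, the race is in (re)arming the timer */
}

/* one pinned thread per CPU, all requeueing the same delayed work */
static int hammer_thread(void *unused)
{
	unsigned long n = 0;

	while (!kthread_should_stop()) {
		mod_delayed_work(system_wq, &dwork, 1);
		if (!(++n % 1024))
			msleep(1);	/* let the timer actually fire sometimes */
	}
	return 0;
}

static int __init hammer_init(void)
{
	int cpu;

	INIT_DELAYED_WORK(&dwork, dwork_fn);
	for_each_online_cpu(cpu) {
		struct task_struct *t;

		t = kthread_create(hammer_thread, NULL, "mdw-hammer/%d", cpu);
		if (IS_ERR(t))
			continue;
		kthread_bind(t, cpu);
		wake_up_process(t);
		hammers[cpu] = t;
	}
	return 0;
}

static void __exit hammer_exit(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		if (hammers[cpu])
			kthread_stop(hammers[cpu]);
	cancel_delayed_work_sync(&dwork);
}

module_init(hammer_init);
module_exit(hammer_exit);
MODULE_LICENSE("GPL");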

	-Mike

