From: Nikolay Borisov <kernel@kyup.com>
To: "Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>
Cc: Jiri Slaby <jslaby@suse.cz>, Oleg Nesterov <oleg@redhat.com>,
	tglx@linutronix.de,
	SiteGround Operations <operations@siteground.com>
Subject: crash in 3.12.51 (likely in 3.12.52 as well) in timer code
Date: Wed, 3 Feb 2016 12:58:42 +0200
Message-ID: <56B1DD62.9030900@kyup.com>

Hello, 

I've observed the following crash on a machine running 3.12.51:

[2711471.041886] Modules linked in: xt_length xt_state xt_pkttype xt_dscp xt_multiport xt_set(O) ip_set_list_set(O) ip_set_hash_ip(O) ip_set(O) act_police cls_basic sch_ingress veth dm_snapshot netconsole openvswitch gre vxlan ip_tunnel nf_nat_ftp nf_conntrack_ftp xt_owner xt_conntrack iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log ipmi_devintf ipmi_si ipmi_msghandler i2c_i801 lpc_ich mfd_core shpchp ioapic ioatdma ses enclosure ixgbe dca
[2711471.059208] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G           O 3.12.51-clouder5 #2
[2711471.059563] Hardware name: Supermicro PIO-628U-TR4T+-ST031/X10DRU-i+, BIOS 1.0c 03/23/2015
[2711471.059919] task: ffff881fd31db870 ti: ffff881fd31ea000 task.ti: ffff881fd31ea000
[2711471.060273] RIP: 0010:[<ffffffff81097718>]  [<ffffffff81097718>] detach_if_pending+0x48/0x100
[2711471.060972] RSP: 0018:ffff883fff203bd0  EFLAGS: 00010002
[2711471.061320] RAX: dead000000200200 RBX: ffffffffa018be20 RCX: 0000000000000074
[2711471.061672] RDX: ffff883fd2e14638 RSI: ffff883fd2df8000 RDI: ffffffffa018be20
[2711471.062025] RBP: ffff883fff203bf0 R08: 0000000000000000 R09: ffff881fff403700
[2711471.062377] R10: ffffea00d4178f80 R11: 0000000000000000 R12: ffff883fd2df8000
[2711471.062729] R13: 0000000000000000 R14: 0000000000000001 R15: ffff883fff203c88
[2711471.063081] FS:  0000000000000000(0000) GS:ffff883fff200000(0000) knlGS:0000000000000000
[2711471.063437] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2711471.063787] CR2: 00007f7b9b4b2000 CR3: 000000380ddae000 CR4: 00000000001407e0
[2711471.064143] Stack:
[2711471.064483]  ffffffffa018be20 ffff883fd2df8000 0000000000000000 0000000000000000
[2711471.065090]  ffff883fff203c30 ffffffff810978f1 0000000000000000 0000000000000082
[2711471.065695]  ffff883fff203c88 ffffffffa018be00 ffff883fff203c88 0000000000000001
[2711471.066301] Call Trace:
[2711471.066643]  <IRQ> 
[2711471.066709] 
[2711471.067112]  [<ffffffff810978f1>] del_timer+0x41/0x70
[2711471.067465]  [<ffffffff810a5271>] try_to_grab_pending+0x121/0x1d0
[2711471.067818]  [<ffffffff810a5532>] mod_delayed_work_on+0x42/0xa0
[2711471.068171]  [<ffffffffa018b1fa>] set_timeout+0x3a/0x40 [ib_addr]
[2711471.068523]  [<ffffffffa018b22d>] netevent_callback+0x2d/0x40 [ib_addr]
[2711471.068879]  [<ffffffff810b45c4>] notifier_call_chain+0x54/0x80
[2711471.069231]  [<ffffffff810b461a>] __atomic_notifier_call_chain+0x2a/0x40
[2711471.069584]  [<ffffffff810b4646>] atomic_notifier_call_chain+0x16/0x20
[2711471.069940]  [<ffffffff81590f8b>] call_netevent_notifiers+0x1b/0x20
[2711471.070292]  [<ffffffff81593a2e>] neigh_update_notify+0x1e/0x40
[2711471.070643]  [<ffffffff815941c6>] neigh_timer_handler+0x116/0x270
[2711471.070995]  [<ffffffff815940b0>] ? neigh_periodic_work+0x270/0x270
[2711471.071346]  [<ffffffff810975b9>] call_timer_fn+0x49/0x160
[2711471.079597]  [<ffffffff81098298>] run_timer_softirq+0x278/0x2e0
[2711471.079948]  [<ffffffff815940b0>] ? neigh_periodic_work+0x270/0x270
[2711471.080301]  [<ffffffff8108f037>] __do_softirq+0x137/0x2e0
[2711471.080653]  [<ffffffff8164c54c>] call_softirq+0x1c/0x30
[2711471.081006]  [<ffffffff8104a35d>] do_softirq+0x8d/0xc0
[2711471.081356]  [<ffffffff8108ebd5>] irq_exit+0x95/0xa0
[2711471.081706]  [<ffffffff8164cc8a>] smp_apic_timer_interrupt+0x4a/0x5a
[2711471.082057]  [<ffffffff8164b92f>] apic_timer_interrupt+0x6f/0x80
[2711471.082406]  <EOI> 
[2711471.082472] 
[2711471.082877]  [<ffffffff81051b53>] ? mwait_idle+0x73/0x90
[2711471.083227]  [<ffffffff81051b4a>] ? mwait_idle+0x6a/0x90
[2711471.083577]  [<ffffffff81051bc6>] arch_cpu_idle+0x26/0x30
[2711471.083929]  [<ffffffff810d28db>] cpu_startup_entry+0xcb/0x2a0
[2711471.084283]  [<ffffffff81071369>] start_secondary+0x1e9/0x250
[2711471.084633] Code: 44 00 00 31 c0 41 89 d6 48 89 fb 48 8b 17 49 89 f4 48 85 d2 74 4a 8b 05 ff ad c0 00 85 c0 7f 6c 48 8b 43 08 45 84 f6 48 89 42 08 <48> 89 10 74 07 48 c7 03 00 00 00 00 48 b9 00 02 20 00 00 00 ad 
[2711471.089662] RIP  [<ffffffff81097718>] detach_if_pending+0x48/0x100
[2711471.090078]  RSP <ffff883fff203bd0>


From what I can tell, a neighbor timer expires and the netevent notifier
calls into the subscribed ib_addr module, which invokes set_timeout to
requeue its delayed work. However, something seems to have corrupted the
timer_list already, because the crash occurs in detach_timer, which is
inlined into detach_if_pending. Here is the annotated assembly:

------------[detach_timer]----------------------
/home/projects/linux-stable/kernel/timer.c: 662
0xffffffff8109770d <detach_if_pending+61>:      mov    rax,QWORD PTR [rbx+0x8]  ; rbx holds value of rdi = timer_list
/home/projects/linux-stable/kernel/timer.c: 663
0xffffffff81097711 <detach_if_pending+65>:      test   r14b,r14b 
----------[__list_del]----------------------
/home/projects/linux-stable/include/linux/list.h: 88
0xffffffff81097714 <detach_if_pending+68>:      mov    QWORD PTR [rdx+0x8],rax ; ffffffffa018be20
/home/projects/linux-stable/include/linux/list.h: 89
0xffffffff81097718 <detach_if_pending+72>:      mov    QWORD PTR [rax],rdx 
---------------[end __list_del]----------------
/home/projects/linux-stable/kernel/timer.c: 663
0xffffffff8109771b <detach_if_pending+75>:      je     0xffffffff81097724 <detach_if_pending+84> 
/home/projects/linux-stable/kernel/timer.c: 664
0xffffffff8109771d <detach_if_pending+77>:      mov    QWORD PTR [rbx],0x0
/home/projects/linux-stable/kernel/timer.c: 665
0xffffffff81097724 <detach_if_pending+84>:      movabs rcx,0xdead000000200200
------------[end detach_timer]-------------


The faulting instruction is the prev->next = next store in __list_del (called
from detach_timer): at that point rax, which holds prev, has a value of
dead000000200200 (LIST_POISON2).

ffffffffa018be20 is the address of the timer_list passed to detach_timer,
which looks like this:

crash> struct timer_list ffffffffa018be20
struct timer_list {
  entry = {
    next = 0xffff883fd2e14638, 
    prev = 0xffff883fff223e60
  }, 
  expires = 4565976929, 
  base = 0xffff883fd2e14002, 
  function = 0xffffffff810a4f70 <delayed_work_timer_fn>, 
  data = 18446744072100560384, 
  slack = -1
}

So in the dump the prev/next entries do not look corrupted, whereas they
did when detach_timer was manipulating the list. This is really odd. Any
ideas on how to debug this further?

Regards, 
Nikolay
