All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: Nikolay Borisov <kernel@kyup.com>,
	"Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>
Cc: Jiri Slaby <jslaby@suse.cz>, Oleg Nesterov <oleg@redhat.com>,
	tglx@linutronix.de,
	SiteGround Operations <operations@siteground.com>
Subject: Re: crash in 3.12.51 (likely in 3.12.52 as well) in timer code
Date: Thu, 04 Feb 2016 12:32:30 +0100	[thread overview]
Message-ID: <1454585550.3407.126.camel@gmail.com> (raw)
In-Reply-To: <56B1DD62.9030900@kyup.com>

On Wed, 2016-02-03 at 12:58 +0200, Nikolay Borisov wrote:
> Hello, 
> 
> I've observed the following crash on a machine running 3.12.51:
> 
> [2711471.041886] Modules linked in: xt_length xt_state xt_pkttype xt_dscp xt_multiport xt_set(O) ip_set_list_set(O) ip_set_hash_ip(O) ip_set(O) act_police cls_basic sch_ingress veth dm_snapshot netconsole openvswitch gre vxlan ip_tunnel nf_nat_ftp nf_conntrack_ftp xt_owner xt_conntrack iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log ipmi_devintf ipmi_si ipmi_msghandler i2c_i801 lpc_ich mfd_core shpchp ioapic ioatdma ses enclosure ixgbe dca
> [2711471.059208] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G           O 3.12.51-clouder5 #2
> [2711471.059563] Hardware name: Supermicro PIO-628U-TR4T+-ST031/X10DRU-i+, BIOS 1.0c 03/23/2015
> [2711471.059919] task: ffff881fd31db870 ti: ffff881fd31ea000 task.ti: ffff881fd31ea000
> [2711471.060273] RIP: 0010:[]  [] detach_if_pending+0x48/0x100
> [2711471.060972] RSP: 0018:ffff883fff203bd0  EFLAGS: 00010002
> [2711471.061320] RAX: dead000000200200 RBX: ffffffffa018be20 RCX: 0000000000000074
> [2711471.061672] RDX: ffff883fd2e14638 RSI: ffff883fd2df8000 RDI: ffffffffa018be20
> [2711471.062025] RBP: ffff883fff203bf0 R08: 0000000000000000 R09: ffff881fff403700
> [2711471.062377] R10: ffffea00d4178f80 R11: 0000000000000000 R12: ffff883fd2df8000
> [2711471.062729] R13: 0000000000000000 R14: 0000000000000001 R15: ffff883fff203c88
> [2711471.063081] FS:  0000000000000000(0000) GS:ffff883fff200000(0000) knlGS:0000000000000000
> [2711471.063437] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2711471.063787] CR2: 00007f7b9b4b2000 CR3: 000000380ddae000 CR4: 00000000001407e0
> [2711471.064143] Stack:
> [2711471.064483]  ffffffffa018be20 ffff883fd2df8000 0000000000000000 0000000000000000
> [2711471.065090]  ffff883fff203c30 ffffffff810978f1 0000000000000000 0000000000000082
> [2711471.065695]  ffff883fff203c88 ffffffffa018be00 ffff883fff203c88 0000000000000001
> [2711471.066301] Call Trace:
> [2711471.066643]   
> [2711471.066709] 
> [2711471.067112]  [] del_timer+0x41/0x70
> [2711471.067465]  [] try_to_grab_pending+0x121/0x1d0
> [2711471.067818]  [] mod_delayed_work_on+0x42/0xa0
> [2711471.068171]  [] set_timeout+0x3a/0x40 [ib_addr]
> [2711471.068523]  [] netevent_callback+0x2d/0x40 [ib_addr]
> [2711471.068879]  [] notifier_call_chain+0x54/0x80
> [2711471.069231]  [] __atomic_notifier_call_chain+0x2a/0x40
> [2711471.069584]  [] atomic_notifier_call_chain+0x16/0x20
> [2711471.069940]  [] call_netevent_notifiers+0x1b/0x20
> [2711471.070292]  [] neigh_update_notify+0x1e/0x40
> [2711471.070643]  [] neigh_timer_handler+0x116/0x270
> [2711471.070995]  [] ? neigh_periodic_work+0x270/0x270
> [2711471.071346]  [] call_timer_fn+0x49/0x160
> [2711471.079597]  [] run_timer_softirq+0x278/0x2e0
> [2711471.079948]  [] ? neigh_periodic_work+0x270/0x270
> [2711471.080301]  [] __do_softirq+0x137/0x2e0
> [2711471.080653]  [] call_softirq+0x1c/0x30
> [2711471.081006]  [] do_softirq+0x8d/0xc0
> [2711471.081356]  [] irq_exit+0x95/0xa0
> [2711471.081706]  [] smp_apic_timer_interrupt+0x4a/0x5a
> [2711471.082057]  [] apic_timer_interrupt+0x6f/0x80
> [2711471.082406]   
> [2711471.082472] 
> [2711471.082877]  [] ? mwait_idle+0x73/0x90
> [2711471.083227]  [] ? mwait_idle+0x6a/0x90
> [2711471.083577]  [] arch_cpu_idle+0x26/0x30
> [2711471.083929]  [] cpu_startup_entry+0xcb/0x2a0
> [2711471.084283]  [] start_secondary+0x1e9/0x250
> [2711471.084633] Code: 44 00 00 31 c0 41 89 d6 48 89 fb 48 8b 17 49 89 f4 48 85 d2 74 4a 8b 05 ff ad c0 00 85 c0 7f 6c 48 8b 43 08 45 84 f6 48 89 42 08 <48> 89 10 74 07 48 c7 03 00 00 00 00 48 b9 00 02 20 00 00 00 ad 
> [2711471.089662] RIP  [] detach_if_pending+0x48/0x100
> [2711471.090078]  RSP 
> 
> 
> Analysing the issue it seems what happens is that a neighbor timer 
> expires which in turn causes the subscribed ib_addr module to invoke 
> set_timeout which queues delayed work. However, it seems something has 
> already corrupted the timer_list since the crash actually occurs in the 
> inlined detach_timer inside detach_if_pending, here is annotated assembly: 
> 
> ------------[detach_timer]----------------------
> /home/projects/linux-stable/kernel/timer.c: 662
> 0xffffffff8109770d :      mov    rax,QWORD PTR [rbx+0x8]  ; rbx holds value of rdi = timer_list
> /home/projects/linux-stable/kernel/timer.c: 663
> 0xffffffff81097711 :      test   r14b,r14b 
> ----------[__list_del]----------------------
> /home/projects/linux-stable/include/linux/list.h: 88
> 0xffffffff81097714 :      mov    QWORD PTR [rdx+0x8],rax ; ffffffffa018be20
> /home/projects/linux-stable/include/linux/list.h: 89
> 0xffffffff81097718 :      mov    QWORD PTR [rax],rdx 
> ---------------[__list_del]----------------
> /home/projects/linux-stable/kernel/timer.c: 663
> 0xffffffff8109771b :      je     0xffffffff81097724  
> /home/projects/linux-stable/kernel/timer.c: 664
> 0xffffffff8109771d :      mov    QWORD PTR [rbx],0x0
> /home/projects/linux-stable/kernel/timer.c: 665
> 0xffffffff81097724 :      movabs rcx,0xdead000000200200
> ------------[end detach_timer]-------------
> 
> 
> It seems when the code tries to do prev->next = next in __list_del from detach_timer, 
> rax has a value of dead000000200200 (LIST_POISON2).
> 
> ffffffffa018be20 is the address of the timer_list passed to detach_timer which looks 
> like so: 
> 
> crash> struct timer_list ffffffffa018be20
> struct timer_list {
>   entry = {
>     next = 0xffff883fd2e14638, 
>     prev = 0xffff883fff223e60
>   }, 
>   expires = 4565976929, 
>   base = 0xffff883fd2e14002, 
>   function = 0xffffffff810a4f70 , 
>   data = 18446744072100560384, 
>   slack = -1
> }
> 
> So in this case the prev/next entries do not look like corrupted, whereas
> when manipulating the list inside detach_timer they do. This is really
> odd, any ideas how to further debug this?

Suspiciously similar to https://lkml.org/lkml/2016/2/4/247

	-Mike

  parent reply	other threads:[~2016-02-04 11:32 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-03 10:58 crash in 3.12.51 (likely in 3.12.52 as well) in timer code Nikolay Borisov
2016-02-04  8:56 ` Thomas Gleixner
2016-02-04 11:32 ` Mike Galbraith [this message]
2016-02-04 11:51   ` Nikolay Borisov
2016-02-04 12:17     ` Mike Galbraith
2016-02-04 12:21       ` Nikolay Borisov
2016-02-04 12:27         ` Mike Galbraith

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1454585550.3407.126.camel@gmail.com \
    --to=umgwanakikbuti@gmail.com \
    --cc=jslaby@suse.cz \
    --cc=kernel@kyup.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=operations@siteground.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.