netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Linux 3.1-rc9
       [not found]                   ` <20111024190203.GA24410@hostway.ca>
@ 2011-10-25  7:13                     ` Linus Torvalds
  2011-10-25  9:01                       ` David Miller
  2011-10-25 20:20                     ` Simon Kirby
  1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2011-10-25  7:13 UTC (permalink / raw)
  To: Simon Kirby, Network Development
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar

Added netdev, because this seems to be a generic networking bug (ABBA
between sk_lock and icsk_retransmit_timer if my quick scan looks
correct).

Davem?

               Linus

On Mon, Oct 24, 2011 at 9:02 PM, Simon Kirby <sim@hostway.ca> wrote:
>
> Ok, hit the hang about 4 more times, but only this morning on a box with
> a serial cable attached. Yay!
>
> Simon-
>
> [216695.579770] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216695.589435]
> [216695.589437] =======================================================
> [216695.593380] [ INFO: possible circular locking dependency detected ]
> [216695.593380] 3.1.0-rc10-hw-lockdep+ #51
> [216695.593380] -------------------------------------------------------
> [216695.593380] kworker/0:1/0 is trying to acquire lock:
> [216695.593380]  (&icsk->icsk_retransmit_timer){+.-.-.}, at: [<ffffffff8106cc88>] run_timer_softirq+0x198/0x410
> [216695.593380]
> [216695.593380] but task is already holding lock:
> [216695.593380]  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
> [216695.593380]
> [216695.593380] which lock already depends on the new lock.
> [216695.593380]
> [216695.593380]
> [216695.593380] the existing dependency chain (in reverse order) is:
> [216695.593380]
> [216695.593380] -> #1 (slock-AF_INET){+.-.-.}:
> [216695.593380]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
> [216695.593380]        [<ffffffff816f55fc>] _raw_spin_lock+0x3c/0x50
> [216695.593380]        [<ffffffff81661cc3>] tcp_write_timer+0x23/0x230
> [216695.682901]        [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [216695.682901]        [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [216695.682901]        [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [216695.682901]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [216695.682901]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [216695.682901]        [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [216695.682901]        [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [216695.682901]        [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [216695.682901]        [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [216695.682901]
> [216695.682901] -> #0 (&icsk->icsk_retransmit_timer){+.-.-.}:
> [216695.682901]        [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
> [216695.682901]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
> [216695.682901]        [<ffffffff8106cd09>] run_timer_softirq+0x219/0x410
> [216695.682901]        [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [216695.682901]        [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [216695.682901]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [216695.682901]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [216695.682901]        [<ffffffff81014b74>] do_IRQ+0x64/0xe0
> [216695.682901]        [<ffffffff816f6273>] ret_from_intr+0x0/0x1a
> [216695.682901]        [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [216695.682901]        [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [216695.682901]
> [216695.682901] other info that might help us debug this:
> [216695.682901]
> [216695.682901]  Possible unsafe locking scenario:
> [216695.682901]
> [216695.682901]        CPU0                    CPU1
> [216695.682901]        ----                    ----
> [216695.682901]   lock(slock-AF_INET);
> [216695.682901]                                lock(&icsk->icsk_retransmit_timer);
> [216695.682901]                                lock(slock-AF_INET);
> [216695.682901]   lock(&icsk->icsk_retransmit_timer);
> [216695.682901]
> [216695.682901]  *** DEADLOCK ***
> [216695.682901]
> [216695.682901] 1 lock held by kworker/0:1/0:
> [216695.682901]  #0:  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
> [216695.682901]
> [216695.682901] stack backtrace:
> [216695.682901] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51
> [216695.682901] Call Trace:
> [216695.682901]  <IRQ>  [<ffffffff81097eab>] print_circular_bug+0x21b/0x330
> [216695.682901]  [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
> [216695.682901]  [<ffffffffa001d6e2>] ? nf_conntrack_free+0x42/0x50 [nf_conntrack]
> [216695.682901]  [<ffffffffa001d6e2>] ? nf_conntrack_free+0x42/0x50 [nf_conntrack]
> [216695.682901]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
> [216695.682901]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [216695.682901]  [<ffffffff81096b4c>] ? trace_hardirqs_on_caller+0x7c/0x1c0
> [216695.682901]  [<ffffffff8106cd09>] run_timer_softirq+0x219/0x410
> [216695.682901]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [216695.682901]  [<ffffffff816f16c1>] ? printk+0x67/0x69
> [216695.682901]  [<ffffffff81661ca0>] ? tcp_delack_timer+0x230/0x230
> [216695.682901]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [216695.682901]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [216695.682901]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [216695.682901]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [216695.682901]  [<ffffffff81014b74>] do_IRQ+0x64/0xe0
> [216695.682901]  [<ffffffff816f6273>] common_interrupt+0x73/0x73
> [216695.682901]  <EOI>  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [216695.682901]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [216695.682901]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [216695.682901]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [216696.019296] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000105?
> [216697.762956] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216698.597297] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216701.489681] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216701.667999] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216704.580592] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [216709.468971] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [216712.845904] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216716.588502] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [216725.072958] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [216725.603879] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216725.828374] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216727.588978] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216735.513864] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216740.581530] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216756.278571] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [218855.312903] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [218855.323133] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [218858.293355] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [218864.301938] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [218876.333821] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [218885.332651] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [218900.313590] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [220821.012017] TCP: Peer 32.176.160.153:49226/80 unexpectedly shrunk window 665256753:665268993 (repaired)
> [221075.224300] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221075.234579] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221075.277593] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221075.780515] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221075.780713] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221077.349279] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221077.905587] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221077.915567] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221081.498430] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221081.703277] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221082.088513] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221082.167985] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221089.772578] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221090.487927] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221090.686394] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221094.587131] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221105.255699] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221105.280699] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221105.291634] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221106.325794] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221107.286029] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221107.622736] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221107.734471] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221120.381643] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [223936.264020] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
> [223936.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.268002] irq event stamp: 2595159887
> [223936.268002] hardirqs last  enabled at (2595159887): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223936.268002] hardirqs last disabled at (2595159886): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223936.268002] softirqs last  enabled at (2595159878): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223936.268002] softirqs last disabled at (2595159873): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.268002] CPU 0
> [223936.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.268002]
> [223936.268002] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223936.268002] RIP: 0010:[<ffffffff813a4ee3>]  [<ffffffff813a4ee3>] delay_tsc+0x73/0xd0
> [223936.268002] RSP: 0018:ffff88022fc03cf0  EFLAGS: 00000202
> [223936.268002] RAX: 00017b5d5932dd02 RBX: ffffffff816f6334 RCX: 000000005932dd02
> [223936.372028] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/0:0:0]
> [223936.372031] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.372042] irq event stamp: 2598787699
> [223936.372044] hardirqs last  enabled at (2598787699): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223936.372054] hardirqs last disabled at (2598787698): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223936.372058] softirqs last  enabled at (2598787696): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223936.372063] softirqs last disabled at (2598787681): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.372069] CPU 1
> [223936.372070] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.372079]
> [223936.372081] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223936.372086] RIP: 0010:[<ffffffff8101afab>]  [<ffffffff8101afab>] native_read_tsc+0xb/0x20
> [223936.372091] RSP: 0018:ffff88022fc43ce0  EFLAGS: 00000202
> [223936.372093] RAX: 0000000000017b5d RBX: ffffffff816f6334 RCX: 00000000652f810e
> [223936.372096] RDX: 0000000000017b5d RSI: 0000000000000000 RDI: 0000000000000001
> [223936.372098] RBP: ffff88022fc43ce0 R08: 00000000652f80c8 R09: 0000000000000000
> [223936.372101] R10: ffff88022690dd60 R11: 0000000000000000 R12: ffff88022fc43c58
> [223936.372103] R13: ffffffff816feb33 R14: ffff88022fc43ce0 R15: 00000000180bbeb8
> [223936.372106] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
> [223936.372108] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223936.372111] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
> [223936.372113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223936.372116] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223936.372119] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
> [223936.372121] Stack:
> [223936.372123]  ffff88022fc43d30 ffffffff813a4eaf ffff880226928000 00000000652f8090
> [223936.372128]  000000012fc43d18 ffff88002e90e348 00000000180bbeb8 000000006efcdc62
> [223936.372132]  0000000000000001 ffff88022690dd60 ffff88022fc43d40 ffffffff813a4f6a
> [223936.372136] Call Trace:
> [223936.372139]  <IRQ>
> [223936.372144]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223936.372148]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.372153]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.372159]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.372164]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.372168]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.372174]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.372178]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.372182]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.372186]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.372190]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.372194]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.372198]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.372203]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.372208]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.372210]  <EOI>
> [223936.372214]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.372218]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.372222]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.372226]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.372230]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.372233] Code: a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 89 c1 48 89 d0
> [223936.372253]  c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 00 00 00 00 00
> [223936.372262] Call Trace:
> [223936.372264]  <IRQ>  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223936.372269]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.372272]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.372276]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.372280]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.372283]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.372286]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.372289]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.372293]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.372297]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.372300]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.372303]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.372307]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.372310]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.372313]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.372315]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.372321]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.372324]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.372327]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.372331]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.476032] BUG: soft lockup - CPU#2 stuck for 23s! [kworker/0:1:0]
> [223936.476034] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.476043] irq event stamp: 2613824057
> [223936.476045] hardirqs last  enabled at (2613824057): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223936.476050] hardirqs last disabled at (2613824056): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223936.476054] softirqs last  enabled at (2613824048): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223936.476058] softirqs last disabled at (2613824031): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.476062] CPU 2
> [223936.476063] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.476071]
> [223936.476073] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223936.476077] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223936.476082] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000202
> [223936.476084] RAX: 0000000070ba7dfc RBX: ffffffff813a60ae RCX: 0000000070ba7dc4
> [223936.476086] RDX: 0000000000017b5d RSI: 0000000000000000 RDI: 0000000000000001
> [223936.476089] RBP: ffff88022fc83ce0 R08: 0000000070ba7d7e R09: 0000000000000000
> [223936.476091] R10: ffff880226948000 R11: 0000000000000000 R12: ffff88022fc83c58
> [223936.476093] R13: ffffffff816feb33 R14: ffff88022fc83ce0 R15: 00000000182285f9
> [223936.476096] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
> [223936.476099] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223936.476101] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
> [223936.476104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223936.476106] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223936.476109] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
> [223936.476111] Stack:
> [223936.476113]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 0000000070ba7dc4
> [223936.476117]  00000002ffffff10 ffff88006afd8948 00000000182285f9 000000006efcdc62
> [223936.476121]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
> [223936.476126] Call Trace:
> [223936.476128]  <IRQ>
> [223936.476132]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223936.476136]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.476141]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.476147]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.476153]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.476157]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.476163]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.476167]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.476171]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.476176]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.476180]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.476184]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.476187]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.476193]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.476197]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.476199]  <EOI>
> [223936.476203]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.476207]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.476211]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.476215]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.476219]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.476222] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223936.476241]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223936.476251] Call Trace:
> [223936.476252]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223936.476257]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.476261]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.476265]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.476268]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.476272]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.476275]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.476278]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.476282]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.476286]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.476289]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.476292]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.476295]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.476299]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.476302]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.476304]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.476310]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.476313]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.476316]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.476320]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.580039] BUG: soft lockup - CPU#3 stuck for 23s! [kworker/0:1:0]
> [223936.580041] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.580050] irq event stamp: 2615464042
> [223936.580052] hardirqs last  enabled at (2615464042): [<ffffffff816f5edb>] _raw_spin_unlock_irq+0x2b/0x50
> [223936.580057] hardirqs last disabled at (2615464041): [<ffffffff816f56a8>] _raw_spin_lock_irq+0x18/0x60
> [223936.580061] softirqs last  enabled at (2615463964): [<ffffffff81063cce>] _local_bh_enable+0xe/0x10
> [223936.580065] softirqs last disabled at (2615463965): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.580069] CPU 3
> [223936.580070] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.580078]
> [223936.580080] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223936.580085] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223936.580090] RSP: 0018:ffff88022fcc3ce0  EFLAGS: 00000202
> [223936.580092] RAX: 000000007c457b06 RBX: ffffffff816f6334 RCX: 000000007c457ad5
> [223936.580094] RDX: 0000000000017b5d RSI: ffffffff818f9896 RDI: 0000000000000001
> [223936.580097] RBP: ffff88022fcc3ce0 R08: 000000007c457a88 R09: 0000000000000000
> [223936.580099] R10: ffff880226981f20 R11: 0000000000000000 R12: ffff88022fcc3c58
> [223936.580101] R13: ffffffff816feb33 R14: ffff88022fcc3ce0 R15: 00000000183a1380
> [223936.580104] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
> [223936.580107] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223936.580109] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
> [223936.580112] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223936.580114] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223936.580117] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
> [223936.580119] Stack:
> [223936.580120]  ffff88022fcc3d30 ffffffff813a4ee8 ffff880226988000 000000007c457ad5
> [223936.580125]  00000003ffffff10 ffff880031438948 00000000183a1380 000000006efcdc62
> [223936.580129]  0000000000000001 ffff880226981f20 ffff88022fcc3d40 ffffffff813a4f6a
> [223936.580133] Call Trace:
> [223936.580135]  <IRQ>
> [223936.580138]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223936.580142]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.580147]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.580151]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.580156]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.580160]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.580164]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.580168]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.580172]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.580176]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.580181]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.580185]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.580188]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.580192]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.580196]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.580199]  <EOI>
> [223936.580202]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.580206]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.580211]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.580214]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.580218]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.580221] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223936.580240]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223936.580250] Call Trace:
> [223936.580251]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223936.580256]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.580260]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.580264]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.580267]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.580270]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.580274]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.580277]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.580280]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.580284]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.580288]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.580291]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.580294]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.580297]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.580300]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.580302]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.580308]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.580312]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.580315]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.580318]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.268002] RDX: 000000005932dd02 RSI: ffffffff818f9896 RDI: 0000000000000001
> [223936.268002] RBP: ffff88022fc03d30 R08: 000000005932dcb5 R09: 0000000000000000
> [223936.268002] R10: ffffffff81a2b020 R11: 0000000000000000 R12: ffff88022fc03c68
> [223936.268002] R13: ffffffff816feb33 R14: ffff88022fc03d30 R15: 0000000017f328cd
> [223936.268002] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
> [223936.268002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223936.268002] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
> [223936.268002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223936.268002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223936.268002] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
> [223936.268002] Stack:
> [223936.268002]  ffffffff819a6000 000000005932dd02 000000002fc03d18 ffff8801f6c22448
> [223936.268002]  0000000017f328cd 000000006efcdc62 0000000000000001 ffffffff81a2b020
> [223936.268002]  ffff88022fc03d40 ffffffff813a4f6a ffff88022fc03d80 ffffffff813ac2ab
> [223936.268002] Call Trace:
> [223936.268002]  <IRQ>
> [223936.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.268002]  <EOI>
> [223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223936.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223936.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223936.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223936.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223936.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223936.268002] Code: 4c 89 7d c8 eb 1f 66 90 48 8b 45 c0 83 68 1c 01 f3 90 83 40 1c 01 65 44 8b 3c 25 50 d3 00 00 44 3b 7d d4 75 3b 66 66 90 0f ae e8 <e8> b8 60 c7 ff 66 90 4c 63 c0 4c 89 c0 48 2b 45 c8 48 39 d8 72
> [223936.268002] Call Trace:
> [223936.268002]  <IRQ>  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.268002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223936.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223936.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223936.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223936.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223936.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223964.264018] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
> [223964.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.268002] irq event stamp: 2595159887
> [223964.268002] hardirqs last  enabled at (2595159887): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223964.268002] hardirqs last disabled at (2595159886): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223964.268002] softirqs last  enabled at (2595159878): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223964.268002] softirqs last disabled at (2595159873): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.268002] CPU 0
> [223964.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.268002]
> [223964.268002] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223964.268002] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223964.268002] RSP: 0018:ffff88022fc03ce0  EFLAGS: 00000202
> [223964.268002] RAX: 000000007cb6c61b RBX: ffffffff816f6334 RCX: 000000007cb6c5e3
> [223964.372025] BUG: soft lockup - CPU#1 stuck for 23s! [kworker/0:0:0]
> [223964.372027] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.372036] irq event stamp: 2598787699
> [223964.372037] hardirqs last  enabled at (2598787699): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223964.372042] hardirqs last disabled at (2598787698): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223964.372045] softirqs last  enabled at (2598787696): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223964.372049] softirqs last disabled at (2598787681): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.372052] CPU 1
> [223964.372053] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.372061]
> [223964.372063] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223964.372067] RIP: 0010:[<ffffffff8101afa0>]  [<ffffffff8101afa0>] read_persistent_clock+0x30/0x30
> [223964.372072] RSP: 0018:ffff88022fc43ce8  EFLAGS: 00000202
> [223964.372074] RAX: 0000000000000001 RBX: ffff88022fc43c68 RCX: 0000000088b369fd
> [223964.372076] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000001
> [223964.372078] RBP: ffff88022fc43d30 R08: ffffffff88b369fd R09: 0000000000000000
> [223964.372081] R10: ffff88022690dd60 R11: 0000000000000000 R12: ffff88022fc43c58
> [223964.372083] R13: ffffffff816feb33 R14: ffff88022fc43d30 R15: 00000000307e58b4
> [223964.372086] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
> [223964.372089] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223964.372091] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
> [223964.372093] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223964.372096] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223964.372098] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
> [223964.372100] Stack:
> [223964.372102]  ffffffff813a4eaf ffff880226928000 ffffffff88b369c5 000000012fc43d18
> [223964.372106]  ffff88002e90e348 00000000307e58b4 000000006efcdc62 0000000000000001
> [223964.372111]  ffff88022690dd60 ffff88022fc43d40 ffffffff813a4f6a ffff88022fc43d80
> [223964.372115] Call Trace:
> [223964.372116]  <IRQ>
> [223964.372119]  [<ffffffff813a4eaf>] ? delay_tsc+0x3f/0xd0
> [223964.372123]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.372127]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.372132]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.372136]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.372140]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.372144]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.372148]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.372153]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.372158]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.372162]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.372166]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.372170]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.372174]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.372178]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.372180]  <EOI>
> [223964.372184]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.372188]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.372192]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.372196]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.372200]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.372203] Code: 48 89 fb 48 83 ec 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00
> [223964.372221]  48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9
> [223964.372231] Call Trace:
> [223964.372232]  <IRQ>  [<ffffffff813a4eaf>] ? delay_tsc+0x3f/0xd0
> [223964.372237]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.372241]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.372245]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.372248]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.372251]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.372255]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.372258]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.372261]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.372265]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.372268]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.372271]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.372275]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.372278]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.372281]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.372282]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.372288]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.372292]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.372295]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.372298]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.476031] BUG: soft lockup - CPU#2 stuck for 23s! [kworker/0:1:0]
> [223964.476033] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.476042] irq event stamp: 2613824057
> [223964.476043] hardirqs last  enabled at (2613824057): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223964.476048] hardirqs last disabled at (2613824056): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223964.476051] softirqs last  enabled at (2613824048): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223964.476055] softirqs last disabled at (2613824031): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.476059] CPU 2
> [223964.476060] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.476067]
> [223964.476070] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223964.476074] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223964.476078] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000206
> [223964.476080] RAX: 00000000943e6715 RBX: ffffffff816f6334 RCX: 00000000943e66dd
> [223964.476083] RDX: 0000000000017b69 RSI: 0000000000000000 RDI: 0000000000000001
> [223964.476085] RBP: ffff88022fc83ce0 R08: ffffffff943e6697 R09: 0000000000000000
> [223964.476087] R10: ffff880226948000 R11: 0000000000000000 R12: ffff88022fc83c58
> [223964.476090] R13: ffffffff816feb33 R14: ffff88022fc83ce0 R15: 000000003094ad30
> [223964.476092] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
> [223964.476095] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223964.476097] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
> [223964.476100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223964.476102] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223964.476105] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
> [223964.476107] Stack:
> [223964.476108]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 ffffffff943e66dd
> [223964.476113]  00000002ffffff10 ffff88006afd8948 000000003094ad30 000000006efcdc62
> [223964.476117]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
> [223964.476121] Call Trace:
> [223964.476123]  <IRQ>
> [223964.476126]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.476130]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.476134]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.476139]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.476143]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.476147]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.476151]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.476155]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.476159]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.476164]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.476168]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.476172]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.476176]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.476180]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.476184]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.476186]  <EOI>
> [223964.476190]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.476194]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.476198]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.476202]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.476206]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.476208] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223964.476227]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223964.476236] Call Trace:
> [223964.476238]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.476243]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.476246]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.476250]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.476254]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.476257]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.476260]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.476264]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.476267]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.476271]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.476274]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.476277]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.476281]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.476284]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.476287]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.476289]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.476295]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.476298]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.476301]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.476304]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.580038] BUG: soft lockup - CPU#3 stuck for 23s! [kworker/0:1:0]
> [223964.580040] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.580049] irq event stamp: 2615464042
> [223964.580050] hardirqs last  enabled at (2615464042): [<ffffffff816f5edb>] _raw_spin_unlock_irq+0x2b/0x50
> [223964.580054] hardirqs last disabled at (2615464041): [<ffffffff816f56a8>] _raw_spin_lock_irq+0x18/0x60
> [223964.580058] softirqs last  enabled at (2615463964): [<ffffffff81063cce>] _local_bh_enable+0xe/0x10
> [223964.580062] softirqs last disabled at (2615463965): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.580066] CPU 3
> [223964.580067] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.580075]
> [223964.580077] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223964.580081] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223964.580086] RSP: 0018:ffff88022fcc3ce0  EFLAGS: 00000206
> [223964.580088] RAX: 000000009fc963af RBX: ffffffff816f6334 RCX: 000000009fc96377
> [223964.580090] RDX: 0000000000017b69 RSI: ffffffff818f9896 RDI: 0000000000000001
> [223964.580093] RBP: ffff88022fcc3ce0 R08: ffffffff9fc96331 R09: 0000000000000000
> [223964.580095] R10: ffff880226981f20 R11: 0000000000000000 R12: ffff88022fcc3c58
> [223964.580097] R13: ffffffff816feb33 R14: ffff88022fcc3ce0 R15: 0000000030ac88b0
> [223964.580100] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
> [223964.580103] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223964.580105] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
> [223964.580107] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223964.580110] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223964.580112] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
> [223964.580114] Stack:
> [223964.580116]  ffff88022fcc3d30 ffffffff813a4ee8 ffff880226988000 ffffffff9fc96377
> [223964.580120]  000000039c3b34d8 ffff880031438948 0000000030ac88b0 000000006efcdc62
> [223964.580124]  0000000000000001 ffff880226981f20 ffff88022fcc3d40 ffffffff813a4f6a
> [223964.580128] Call Trace:
> [223964.580130]  <IRQ>
> [223964.580133]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.580137]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.580141]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.580146]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.580150]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.580154]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.580158]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.580162]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.580167]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.580171]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.580176]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.580180]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.580184]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.580188]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.580192]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.580194]  <EOI>
> [223964.580198]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.580202]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.580206]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.580210]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.580214]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.580217] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223964.580235]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223964.580245] Call Trace:
> [223964.580246]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.580252]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.580255]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.580259]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.580262]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.580265]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.580269]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.580272]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.580276]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.580279]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.580283]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.580286]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.580289]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.580292]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.580295]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.580297]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.580303]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.580307]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.580310]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.580313]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.268002] RDX: 0000000000017b69 RSI: ffffffff818f9896 RDI: 0000000000000001
> [223964.268002] RBP: ffff88022fc03ce0 R08: 000000007cb6c596 R09: 0000000000000000
> [223964.268002] R10: ffffffff81a2b020 R11: 0000000000000000 R12: ffff88022fc03c58
> [223964.268002] R13: ffffffff816feb33 R14: ffff88022fc03ce0 R15: 000000002eb85d38
> [223964.268002] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
> [223964.268002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223964.268002] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
> [223964.268002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223964.268002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223964.268002] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
> [223964.268002] Stack:
> [223964.268002]  ffff88022fc03d30 ffffffff813a4ee8 ffffffff819a6000 000000007cb6c5e3
> [223964.268002]  000000007c44ac9c ffff8801f6c22448 000000002eb85d38 000000006efcdc62
> [223964.268002]  0000000000000001 ffffffff81a2b020 ffff88022fc03d40 ffffffff813a4f6a
> [223964.268002] Call Trace:
> [223964.268002]  <IRQ>
> [223964.268002]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.268002]  <EOI>
> [223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223964.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223964.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223964.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223964.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223964.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223964.268002] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223964.268002]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223964.268002] Call Trace:
> [223964.268002]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.268002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223964.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223964.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223964.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223964.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223964.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223968.815995] INFO: rcu_sched_state detected stall on CPU 1 (t=15000 jiffies)
> [223968.819995] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1} (detected by 3, t=15002 jiffies)
> [223968.820000] sending NMI to all CPUs:
> [223968.820002] NMI backtrace for cpu 3
> [223968.820002] CPU 3
> [223968.820002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223968.820002]
> [223968.820002] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223968.820002] RIP: 0010:[<ffffffff813a4f86>]  [<ffffffff813a4f86>] __const_udelay+0x16/0x40
> [223968.820002] RSP: 0018:ffff88022fcc3a90  EFLAGS: 00000002
> [223968.820002] RAX: 0000000000e34d8a RBX: 0000000000000001 RCX: 0000000001062560
> [223968.820002] RDX: 000000000071a6c5 RSI: 0000000000000002 RDI: 0000000000418958
> [223968.820002] RBP: ffff88022fcc3ab0 R08: 0000000000000002 R09: 0000000000000000
> [223968.820002] R10: 0000000000000006 R11: 000000000000000a R12: ffffffff81a40d80
> [223968.820002] R13: 0000000000000010 R14: ffffffff81a40e40 R15: ffffffff81a40fc0
> [223968.820002] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
> [223968.820002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223968.820002] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
> [223968.820002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223968.820002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223968.820002] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
> [223968.820002] Stack:
> [223968.820002]  ffff88022fcc3ab0 ffffffff81031695 ffff88022fccdfa0 ffff88022fccdfa0
> [223968.820002]  ffff88022fcc3af0 ffffffff810bb9d2 ffffffff81a40fc0 0000000000000003
> [223968.820002]  0000000000000003 ffff880226981f20 ffffffff810921f0 ffff88022fcc3be0
> [223968.820002] Call Trace:
> [223968.820002]  <IRQ>
> [223968.820002]  [<ffffffff81031695>] ? arch_trigger_all_cpu_backtrace+0x65/0x90
> [223968.820002]  [<ffffffff810bb9d2>] __rcu_pending+0x382/0x3b0
> [223968.820002]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
> [223968.820002]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
> [223968.820002]  [<ffffffff8106d861>] update_process_times+0x41/0x80
> [223968.820002]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
> [223968.820002]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
> [223968.820002]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
> [223968.820002]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
> [223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820002]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
> [223968.820002]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
> [223968.820002]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223968.820002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820002]  <EOI>
> [223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820002]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.820002] Code: 00 00 00 00 00 55 48 89 e5 ff 15 8e a5 6c 00 c9 c3 0f 1f 40 00 55 48 8d 0c bd 00 00 00 00 65 48 8b 14 25 58 2d 01 00 48 8d 04 12
> [223968.820002]  c1 e2 06 48 89 e5 48 29 c2 48 89 c8 f7 e2 48 8d 7a 01 ff 15
> [223968.820002] Call Trace:
> [223968.820002]  <IRQ>  [<ffffffff81031695>] ? arch_trigger_all_cpu_backtrace+0x65/0x90
> [223968.820002]  [<ffffffff810bb9d2>] __rcu_pending+0x382/0x3b0
> [223968.820002]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
> [223968.820002]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
> [223968.820002]  [<ffffffff8106d861>] update_process_times+0x41/0x80
> [223968.820002]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
> [223968.820002]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
> [223968.820002]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
> [223968.820002]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
> [223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820002]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
> [223968.820002]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
> [223968.820002]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223968.820002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820002]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.820335] NMI backtrace for cpu 0
> [223968.820337] CPU 0
> [223968.820338] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223968.820347]
> [223968.820349] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223968.820353] RIP: 0010:[<ffffffff813a4ef0>]  [<ffffffff813a4ef0>] delay_tsc+0x80/0xd0
> [223968.820358] RSP: 0018:ffff88022fc03cf0  EFLAGS: 00000206
> [223968.820360] RAX: 000000007659b10f RBX: 0000000000000001 RCX: 000000007659b10f
> [223968.820363] RDX: 000000007659b10f RSI: ffffffff818f9896 RDI: 0000000000000001
> [223968.820365] RBP: ffff88022fc03d30 R08: 000000007659b10f R09: 0000000000000000
> [223968.820367] R10: ffffffff81a2b020 R11: 0000000000000000 R12: 0000000031026962
> [223968.820370] R13: 000000006efcdc62 R14: ffffffff819a6000 R15: 000000007659b0de
> [223968.820373] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
> [223968.820375] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223968.820377] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
> [223968.820380] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223968.820382] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223968.820385] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
> [223968.820387] Stack:
> [223968.820388]  ffffffff819a6000 000000007659b0de 00000000818f9896 ffff8801f6c22448
> [223968.820393]  0000000031026962 000000006efcdc62 0000000000000001 ffffffff81a2b020
> [223968.820397]  ffff88022fc03d40 ffffffff813a4f6a ffff88022fc03d80 ffffffff813ac2ab
> [223968.820401] Call Trace:
> [223968.820402]  <IRQ>
> [223968.820406]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820410]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820414]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820417]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820420]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820424]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820427]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820430]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820434]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820437]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820441]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820444]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820447]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820450]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820452]  <EOI>
> [223968.820455]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820459]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820462]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820465]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820468]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223968.820471]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223968.820475]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223968.820478]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223968.820481]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223968.820484]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223968.820486] Code: 68 1c 01 f3 90 83 40 1c 01 65 44 8b 3c 25 50 d3 00 00 44 3b 7d d4 75 3b 66 66 90 0f ae e8 e8 b8 60 c7 ff 66 90 4c 63 c0 4c 89 c0
> [223968.820504]  2b 45 c8 48 39 d8 72 c7 65 48 8b 04 25 08 c4 00 00 83 a8 44
> [223968.820514] Call Trace:
> [223968.820515]  <IRQ>  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820521]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820525]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820528]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820532]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820535]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820538]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820542]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820546]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820549]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820552]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820556]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820559]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820562]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820564]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820570]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820573]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820576]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820579]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223968.820583]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223968.820586]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223968.820589]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223968.820593]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223968.820596]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223968.820599] NMI backtrace for cpu 2
> [223968.820600] CPU 2
> [223968.820602] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223968.820610]
> [223968.820612] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223968.820616] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223968.820621] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000202
> [223968.820623] RAX: 000000007659b116 RBX: 0000000000000001 RCX: 000000007659b0e5
> [223968.820625] RDX: 0000000000017b6b RSI: 0000000000000000 RDI: 0000000000000001
> [223968.820628] RBP: ffff88022fc83ce0 R08: 000000007659b098 R09: 0000000000000000
> [223968.820630] R10: ffff880226948000 R11: 0000000000000000 R12: 00000000345f87d7
> [223968.820632] R13: 000000006efcdc62 R14: ffff88022693e000 R15: 000000007659b0e5
> [223968.820635] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
> [223968.820638] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223968.820640] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
> [223968.820642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223968.820645] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223968.820647] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
> [223968.820649] Stack:
> [223968.820651]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 000000007659b0e5
> [223968.820655]  000000026b4044c5 ffff88006afd8948 00000000345f87d7 000000006efcdc62
> [223968.820659]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
> [223968.820663] Call Trace:
> [223968.820665]  <IRQ>
> [223968.820668]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223968.820671]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820674]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820678]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820682]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820685]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820688]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820691]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820695]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820699]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820702]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820705]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820708]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820712]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820715]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820717]  <EOI>
> [223968.820720]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820723]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820727]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820730]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820733]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.820735] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223968.820753]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223968.820763] Call Trace:
> [223968.820764]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223968.820769]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820773]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820777]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820780]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820783]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820787]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820790]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820793]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820797]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820801]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820804]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820807]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820810]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820813]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820815]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820821]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820824]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820827]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820831]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.816001] NMI backtrace for cpu 1
> [223968.816001] CPU 1
> [223968.816001] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223968.816001]
> [223968.816001] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223968.816001] RIP: 0010:[<ffffffff81440955>]  [<ffffffff81440955>] io_serial_out+0x15/0x20
> [223968.816001] RSP: 0018:ffff88022fc437f0  EFLAGS: 00000002
> [223968.816001] RAX: 0000000000000073 RBX: ffffffff8243eec0 RCX: 0000000000000000
> [223968.816001] RDX: 00000000000003f8 RSI: 00000000000003f8 RDI: ffffffff8243eec0
> [223968.816001] RBP: ffff88022fc437f0 R08: 000000007659a435 R09: 0000000000000000
> [223968.816001] R10: 0000000000000001 R11: 000000000000000a R12: 0000000000000073
> [223968.816001] R13: ffffffff81bc648d R14: 0000000000000050 R15: ffffffff8243eec0
> [223968.816001] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
> [223968.816001] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223968.816001] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
> [223968.816001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223968.816001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223968.816001] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
> [223968.816001] Stack:
> [223968.816001]  ffff88022fc43810 ffffffff814410dc 0000000000000030 ffffffff814410b0
> [223968.816001]  ffff88022fc43850 ffffffff8143cdb5 0000000000000087 0000000000000000
> [223968.816001]  ffffffff8243eec0 0000000000000001 0000000000000087 000000000000000d
> [223968.816001] Call Trace:
> [223968.816001]  <IRQ>
> [223968.816001]  [<ffffffff814410dc>] serial8250_console_putchar+0x2c/0x40
> [223968.816001]  [<ffffffff814410b0>] ? wait_for_xmitr+0xa0/0xa0
> [223968.816001]  [<ffffffff8143cdb5>] uart_console_write+0x35/0x70
> [223968.816001]  [<ffffffff814417be>] serial8250_console_write+0xbe/0x1a0
> [223968.816001]  [<ffffffff8105c78e>] __call_console_drivers+0x8e/0xb0
> [223968.816001]  [<ffffffff8105c7f5>] _call_console_drivers+0x45/0x70
> [223968.816001]  [<ffffffff8105d02f>] console_unlock+0x17f/0x2b0
> [223968.816001]  [<ffffffff8105d64d>] vprintk+0x1fd/0x520
> [223968.816001]  [<ffffffff816f16c1>] printk+0x67/0x69
> [223968.816001]  [<ffffffff816f5fa6>] ? _raw_spin_unlock+0x26/0x40
> [223968.816001]  [<ffffffff8105388b>] ? account_system_time+0xab/0x190
> [223968.816001]  [<ffffffff810bb7e4>] __rcu_pending+0x194/0x3b0
> [223968.816001]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
> [223968.816001]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
> [223968.816001]  [<ffffffff8106d861>] update_process_times+0x41/0x80
> [223968.816001]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
> [223968.816001]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
> [223968.816001]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
> [223968.816001]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
> [223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.816001]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
> [223968.816001]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
> [223968.816001]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223968.816001]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.816001]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.816001]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.816001]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.816001]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.816001]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.816001]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.816001]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.816001]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.816001]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.816001]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.816001]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.816001]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.816001]  <EOI>
> [223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.816001]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.816001]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.816001]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.816001] Code: 48 89 e5 d3 e2 03 57 38 ec 0f b6 c0 c9 c3 0f 1f 84 00 00 00 00 00 0f b6 8f 81 00 00 00 55 89 d0 48 89 e5 d3 e6 03 77 38 89 f2 ee <c9> c3 66 0f 1f 84 00 00 00 00 00 55 80 bf 82 00 00 00 08 48 89
> [223968.816001] Call Trace:
> [223968.816001]  <IRQ>  [<ffffffff814410dc>] serial8250_console_putchar+0x2c/0x40
> [223968.816001]  [<ffffffff814410b0>] ? wait_for_xmitr+0xa0/0xa0
> [223968.816001]  [<ffffffff8143cdb5>] uart_console_write+0x35/0x70
> [223968.816001]  [<ffffffff814417be>] serial8250_console_write+0xbe/0x1a0
> [223968.816001]  [<ffffffff8105c78e>] __call_console_drivers+0x8e/0xb0
> [223968.816001]  [<ffffffff8105c7f5>] _call_console_drivers+0x45/0x70
> [223968.816001]  [<ffffffff8105d02f>] console_unlock+0x17f/0x2b0
> [223968.816001]  [<ffffffff8105d64d>] vprintk+0x1fd/0x520
> [223968.816001]  [<ffffffff816f16c1>] printk+0x67/0x69
> [223968.816001]  [<ffffffff816f5fa6>] ? _raw_spin_unlock+0x26/0x40
> [223968.816001]  [<ffffffff8105388b>] ? account_system_time+0xab/0x190
> [223968.816001]  [<ffffffff810bb7e4>] __rcu_pending+0x194/0x3b0
> [223968.816001]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
> [223968.816001]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
> [223968.816001]  [<ffffffff8106d861>] update_process_times+0x41/0x80
> [223968.816001]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
> [223968.816001]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
> [223968.816001]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
> [223968.816001]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
> [223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.816001]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
> [223968.816001]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
> [223968.816001]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223968.816001]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.816001]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.816001]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.816001]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.816001]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.816001]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.816001]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.816001]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.816001]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.816001]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.816001]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.816001]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.816001]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.816001]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.816001]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.816001]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.816001]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
>
> [ goes on for another ~300kB, trimmed ]
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-25  7:13                     ` Linux 3.1-rc9 Linus Torvalds
@ 2011-10-25  9:01                       ` David Miller
  2011-10-25 12:30                         ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: David Miller @ 2011-10-25  9:01 UTC (permalink / raw)
  To: torvalds
  Cc: sim, netdev, tglx, a.p.zijlstra, linux-kernel, davej, schwidefsky, mingo

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 25 Oct 2011 09:13:48 +0200

> Added netdev, because this seems to be a generic networking bug (ABBA
> between sk_lock and icsk_retransmit_timer if my quick scan looks
> correct).
> 
> Davem?

I suspect that's all just a side effect of whatever is creating the
preempt_count imbalance.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-25  9:01                       ` David Miller
@ 2011-10-25 12:30                         ` Thomas Gleixner
  2011-10-25 23:18                           ` David Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2011-10-25 12:30 UTC (permalink / raw)
  To: David Miller
  Cc: torvalds, sim, netdev, a.p.zijlstra, linux-kernel, davej,
	schwidefsky, mingo

On Tue, 25 Oct 2011, David Miller wrote:

> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Tue, 25 Oct 2011 09:13:48 +0200
> 
> > Added netdev, because this seems to be a generic networking bug (ABBA
> > between sk_lock and icsk_retransmit_timer if my quick scan looks
> > correct).
> > 
> > Davem?
> 
> I suspect that's all just a side effect of whatever is creating the
> preempt_count imbalance.

Something is holding socket lock and it was acquired in sk_clone()
which does bh_lock_sock() and returns with the lock held, though I got
completely lost in the gazillions of possible callchains ...

While staring at it I found an missing unlock in sk_clone() itself,
but that's not the one which causes the leak. Lockdep would have
complained about that separately :)

Thanks,

	tglx

--------->
Subject: net: Unlock sock before calling sk_free()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Index: linux-2.6/net/core/sock.c
===================================================================
--- linux-2.6.orig/net/core/sock.c
+++ linux-2.6/net/core/sock.c
@@ -1260,6 +1260,7 @@ struct sock *sk_clone(const struct sock 
 			/* It is still raw copy of parent, so invalidate
 			 * destructor and make plain sk_free() */
 			newsk->sk_destruct = NULL;
+			bh_unlock_sock(newsk);
 			sk_free(newsk);
 			newsk = NULL;
 			goto out;

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
       [not found]                   ` <20111024190203.GA24410@hostway.ca>
  2011-10-25  7:13                     ` Linux 3.1-rc9 Linus Torvalds
@ 2011-10-25 20:20                     ` Simon Kirby
  2011-10-31 17:32                       ` Simon Kirby
  1 sibling, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2011-10-25 20:20 UTC (permalink / raw)
  To: Thomas Gleixner, David Miller
  Cc: Peter Zijlstra, Linus Torvalds, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar, Network Development

On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:

> Ok, hit the hang about 4 more times, but only this morning on a box with
> a serial cable attached. Yay!

Here's lockdep output from another box. This one looks a bit different.

Simon-

[583223.799383] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[583223.805083] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[583223.805093] 
[583223.805094] =================================
[583223.805096] [ INFO: inconsistent lock state ]
[583223.805098] 3.1.0-rc10-hw-lockdep+ #51
[583223.805100] ---------------------------------
[583223.805102] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[583223.805105] swapper/0 [HC0[0]:SC0[0]:HE1:SE1] takes:
[583223.805107]  (slock-AF_INET){+.?.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[583223.805116] {IN-SOFTIRQ-W} state was registered at:
[583223.805117]   [<ffffffff81098c7c>] __lock_acquire+0xcbc/0x2180
[583223.805123]   [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[583223.805126]   [<ffffffff816f55fc>] _raw_spin_lock+0x3c/0x50
[583223.805131]   [<ffffffff8166bd3d>] udp_queue_rcv_skb+0x26d/0x4b0
[583223.805135]   [<ffffffff8166c6a3>] __udp4_lib_rcv+0x2f3/0x910
[583223.805139]   [<ffffffff8166ccd5>] udp_rcv+0x15/0x20
[583223.805142]   [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[583223.805146]   [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[583223.805149]   [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510
[583223.805152]   [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[583223.805154]   [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[583223.805158]   [<ffffffff81610ec0>] process_backlog+0xd0/0x1e0
[583223.805161]   [<ffffffff81613880>] net_rx_action+0x140/0x2c0
[583223.805164]   [<ffffffff810640b8>] __do_softirq+0x138/0x250
[583223.805168]   [<ffffffff817002bc>] call_softirq+0x1c/0x30
[583223.805172]   [<ffffffff810153c5>] do_softirq+0x95/0xd0
[583223.805176]   [<ffffffff81063ecd>] local_bh_enable+0xed/0x110
[583223.805179]   [<ffffffff81614c48>] dev_queue_xmit+0x1a8/0x8a0
[583223.805181]   [<ffffffff8161f1aa>] neigh_resolve_output+0x17a/0x220
[583223.805185]   [<ffffffff81647d4c>] ip_finish_output+0x2ec/0x590
[583223.805188]   [<ffffffff81648078>] ip_output+0x88/0xe0
[583223.805191]   [<ffffffff81646cd8>] ip_local_out+0x28/0x80
[583223.805194]   [<ffffffff81646d39>] ip_send_skb+0x9/0x40
[583223.805197]   [<ffffffff8166aeb2>] udp_send_skb+0x122/0x390
[583223.805200]   [<ffffffff8166db0c>] udp_sendmsg+0x7dc/0x920
[583223.805203]   [<ffffffff81675e1f>] inet_sendmsg+0xbf/0x120
[583223.805207]   [<ffffffff815ff333>] sock_sendmsg+0xe3/0x110
[583223.805209]   [<ffffffff815ffc55>] sys_sendto+0x105/0x140
[583223.805212]   [<ffffffff816fe052>] system_call_fastpath+0x16/0x1b
[583223.805217] irq event stamp: 4284605374
[583223.805219] hardirqs last  enabled at (4284605372): [<ffffffff816101ad>] net_rps_action_and_irq_enable+0x8d/0xa0
[583223.805222] hardirqs last disabled at (4284605373): [<ffffffff8106412d>] __do_softirq+0x1ad/0x250
[583223.805226] softirqs last  enabled at (4284605374): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[583223.805230] softirqs last disabled at (4284605313): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[583223.805233] 
[583223.805233] other info that might help us debug this:
[583223.805235]  Possible unsafe locking scenario:
[583223.805236] 
[583223.805237]        CPU0
[583223.805238]        ----
[583223.805239]   lock(slock-AF_INET);
[583223.805241]   <Interrupt>
[583223.805242]     lock(slock-AF_INET);
[583223.805244] 
[583223.805245]  *** DEADLOCK ***
[583223.805246] 
[583223.805248] 1 lock held by swapper/0:
[583223.805249]  #0:  (slock-AF_INET){+.?.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[583223.805254] 
[583223.805254] stack backtrace:
[583223.805257] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51
[583223.805259] Call Trace:
[583223.805264]  [<ffffffff81096033>] print_usage_bug+0x243/0x310
[583223.805267]  [<ffffffff810965b4>] mark_lock+0x4b4/0x6c0
[583223.805271]  [<ffffffff81097400>] ? check_usage_forwards+0x110/0x110
[583223.805275]  [<ffffffff81096862>] mark_held_locks+0xa2/0x130
[583223.805278]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[583223.805281]  [<ffffffff81096c0d>] trace_hardirqs_on_caller+0x13d/0x1c0
[583223.805286]  [<ffffffff813a60ae>] trace_hardirqs_on_thunk+0x3a/0x3f
[583223.805290]  [<ffffffff81092b8e>] ? tick_nohz_stop_sched_tick+0x2fe/0x430
[583223.805293]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[583223.805297]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[583223.805301]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[583223.805304]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[583223.805307]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[583223.805310]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[583223.805315]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[583223.805318]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[583223.805321]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[583223.805325]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[583226.813848] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[583232.802948] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[583244.833571] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[583253.849631] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[583268.837126] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[587843.931805] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[587846.165584] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[587850.602316] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[587859.482841] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[587873.940136] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[587877.240624] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[590476.272022] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
[590476.276002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[590476.276002] irq event stamp: 4284605374
[590476.276002] hardirqs last  enabled at (4284605372): [<ffffffff816101ad>] net_rps_action_and_irq_enable+0x8d/0xa0
[590476.276002] hardirqs last disabled at (4284605373): [<ffffffff8106412d>] __do_softirq+0x1ad/0x250
[590476.276002] softirqs last  enabled at (4284605374): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[590476.276002] softirqs last disabled at (4284605313): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[590476.276002] CPU 0 
[590476.276002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[590476.276002] 
[590476.276002] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0UR033
[590476.276002] RIP: 0010:[<ffffffff813a4ee3>]  [<ffffffff813a4ee3>] delay_tsc+0x73/0xd0
[590476.276002] RSP: 0018:ffff88022fc03cf0  EFLAGS: 00000206
[590476.276002] RAX: 00042f884dcdaa24 RBX: ffff88022fc0d3c0 RCX: 000000004dcdaa24
[590476.380029] BUG: soft lockup - CPU#1 stuck for 22s! [php:10828]
[590476.380033] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[590476.380044] irq event stamp: 0
[590476.380045] hardirqs last  enabled at (0): [<          (null)>]           (null)
[590476.380048] hardirqs last disabled at (0): [<ffffffff8105aa8b>] copy_process+0x65b/0x1450
[590476.380056] softirqs last  enabled at (0): [<ffffffff8105aa8b>] copy_process+0x65b/0x1450
[590476.380060] softirqs last disabled at (0): [<          (null)>]           (null)
[590476.380063] CPU 1 
[590476.380064] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[590476.380072] 
[590476.380075] Pid: 10828, comm: php Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0UR033
[590476.380079] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[590476.380086] RSP: 0000:ffff88022fc43ce0  EFLAGS: 00000206
[590476.380088] RAX: 000000005aa56d04 RBX: ffffffff816f6334 RCX: 000000005aa56c92
[590476.380091] RDX: 0000000000042f88 RSI: ffffffff818f9896 RDI: 0000000000000001
[590476.380093] RBP: ffff88022fc43ce0 R08: 000000005aa56c92 R09: 0000000000000000
[590476.380096] R10: ffff88014b9a9f20 R11: 0000000000000000 R12: ffff88022fc43c58
[590476.380098] R13: ffffffff816feb33 R14: ffff88022fc43ce0 R15: 000000000e27878c
[590476.380101] FS:  00007fb61c8fa720(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
[590476.380103] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[590476.380106] CR2: 00000000027914a0 CR3: 000000013a070000 CR4: 00000000000006e0
[590476.380108] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[590476.380110] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[590476.380113] Process php (pid: 10828, threadinfo ffff88014a1f2000, task ffff88014b9a9f20)
[590476.380115] Stack:
[590476.380117]  ffff88022fc43d30 ffffffff813a4eaf ffff88014a1f2000 000000005aa56c38
[590476.380121]  00000001818f9896 ffff88001db58048 000000000e27878c 0000000076e96800
[590476.380125]  0000000000000001 ffff88014b9a9f20 ffff88022fc43d40 ffffffff813a4f6a
[590476.380129] Call Trace:
[590476.380132]  <IRQ> 
[590476.380137]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[590476.380141]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[590476.380145]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[590476.380151]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[590476.380157]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[590476.380161]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[590476.380166]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[590476.380169]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[590476.380174]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[590476.380179]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[590476.380184]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[590476.380188]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[590476.380191]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[590476.380196]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[590476.380200]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[590476.380203]  <EOI> 
[590476.380206]  [<ffffffff816f6319>] ? retint_swapgs+0x13/0x1b
[590476.380208] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[590476.380227]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[590476.380236] Call Trace:
[590476.380237]  <IRQ>  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[590476.380242]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[590476.380246]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[590476.380249]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[590476.380252]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[590476.380256]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[590476.380259]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[590476.380262]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[590476.380265]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[590476.380268]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[590476.380271]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[590476.380274]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[590476.380277]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[590476.380280]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[590476.380283]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[590476.380285]  <EOI>  [<ffffffff816f6319>] ? retint_swapgs+0x13/0x1b
[590476.484032] BUG: soft lockup - CPU#2 stuck for 23s! [suexec:10831]
...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-25 12:30                         ` Thomas Gleixner
@ 2011-10-25 23:18                           ` David Miller
  0 siblings, 0 replies; 31+ messages in thread
From: David Miller @ 2011-10-25 23:18 UTC (permalink / raw)
  To: tglx
  Cc: torvalds, sim, netdev, a.p.zijlstra, linux-kernel, davej,
	schwidefsky, mingo

From: Thomas Gleixner <tglx@linutronix.de>
Date: Tue, 25 Oct 2011 14:30:50 +0200 (CEST)

> Subject: net: Unlock sock before calling sk_free()
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Good spotting, applied, thanks Thomas!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-25 20:20                     ` Simon Kirby
@ 2011-10-31 17:32                       ` Simon Kirby
  2011-11-02 16:40                         ` Thomas Gleixner
  2011-11-02 22:10                         ` Steven Rostedt
  0 siblings, 2 replies; 31+ messages in thread
From: Simon Kirby @ 2011-10-31 17:32 UTC (permalink / raw)
  To: Thomas Gleixner, David Miller
  Cc: Peter Zijlstra, Linus Torvalds, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar, Network Development

On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:

> On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> 
> > Ok, hit the hang about 4 more times, but only this morning on a box with
> > a serial cable attached. Yay!
> 
> Here's lockdep output from another box. This one looks a bit different.

One more, again a bit different. The last few lockups have looked like
this. Not sure why, but we're hitting this at a few a day now. Thomas,
this is without your patch, but as you said, that's right before a free
and should print a separate lockdep warning.

No "huh" lines until after the trace on this one. I'll move to 3.1 with
cherry-picked b0691c8e now.

Simon-

[104661.173798] 
[104661.173801] =======================================================
[104661.179922] [ INFO: possible circular locking dependency detected ]
[104661.179922] 3.1.0-rc10-hw-lockdep+ #51
[104661.179922] -------------------------------------------------------
[104661.179922] watchdog.pl/29331 is trying to acquire lock:
[104661.179922]  (slock-AF_INET/1){+.-.-.}, at: [<ffffffff81664887>] tcp_v4_rcv+0x867/0xc10
[104661.179922] 
[104661.179922] but task is already holding lock:
[104661.179922]  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[104661.179922] 
[104661.179922] which lock already depends on the new lock.
[104661.179922] 
[104661.179922] 
[104661.179922] the existing dependency chain (in reverse order) is:
[104661.239412] 
[104661.239412] -> #1 (slock-AF_INET){+.-.-.}:
[104661.244767]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[104661.244767]        [<ffffffff816f55fc>] _raw_spin_lock+0x3c/0x50
[104661.244767]        [<ffffffff81604540>] sk_clone+0x120/0x420
[104661.244767]        [<ffffffff8164cb33>] inet_csk_clone+0x13/0x90
[104661.244767]        [<ffffffff816669a5>] tcp_create_openreq_child+0x25/0x4d0
[104661.244767]        [<ffffffff81664c78>] tcp_v4_syn_recv_sock+0x48/0x2c0
[104661.244767]        [<ffffffff816667f5>] tcp_check_req+0x335/0x4c0
[104661.244767]        [<ffffffff81663e5e>] tcp_v4_do_rcv+0x29e/0x460
[104661.244767]        [<ffffffff816648ac>] tcp_v4_rcv+0x88c/0xc10   
[104661.244767]        [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[104661.244767]        [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[104661.244767]        [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510 
[104661.244767]        [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[104661.244767]        [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[104661.244767]        [<ffffffff81610ec0>] process_backlog+0xd0/0x1e0
[104661.244767]        [<ffffffff81613880>] net_rx_action+0x140/0x2c0 
[104661.244767]        [<ffffffff810640b8>] __do_softirq+0x138/0x250  
[104661.244767]        [<ffffffff817002bc>] call_softirq+0x1c/0x30    
[104661.244767]        [<ffffffff810153c5>] do_softirq+0x95/0xd0      
[104661.244767]        [<ffffffff81063dbd>] local_bh_enable_ip+0xed/0x110
[104661.244767]        [<ffffffff816f5e9f>] _raw_spin_unlock_bh+0x3f/0x50
[104661.244767]        [<ffffffff81602e41>] release_sock+0x161/0x1d0
[104661.244767]        [<ffffffff816762ed>] inet_stream_connect+0x6d/0x2f0
[104661.244767]        [<ffffffff815fcfeb>] kernel_connect+0xb/0x10
[104661.244767]        [<ffffffff816aaf86>] xs_tcp_setup_socket+0x2a6/0x4c0
[104661.244767]        [<ffffffff81078cf9>] process_one_work+0x1e9/0x560   
[104661.244767]        [<ffffffff81079403>] worker_thread+0x193/0x420      
[104661.244767]        [<ffffffff81080466>] kthread+0x96/0xb0
[104661.244767]        [<ffffffff817001c4>] kernel_thread_helper+0x4/0x10
[104661.244767] 
[104661.244767] -> #0 (slock-AF_INET/1){+.-.-.}:
[104661.244767]        [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
[104661.244767]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[104661.244767]        [<ffffffff816f55aa>] _raw_spin_lock_nested+0x3a/0x50
[104661.244767]        [<ffffffff81664887>] tcp_v4_rcv+0x867/0xc10
[104661.244767]        [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[104661.244767]        [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[104661.244767]        [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510 
[104661.244767]        [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[104661.244767]        [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[104661.244767]        [<ffffffff81612e24>] netif_receive_skb+0x104/0x120  
[104661.244767]        [<ffffffff81612f70>] napi_skb_finish+0x50/0x70
[104661.244767]        [<ffffffff81613635>] napi_gro_receive+0xc5/0xd0
[104661.244767]        [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[104661.244767]        [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[104661.244767]        [<ffffffff81613880>] net_rx_action+0x140/0x2c0  
[104661.244767]        [<ffffffff810640b8>] __do_softirq+0x138/0x250   
[104661.244767]        [<ffffffff817002bc>] call_softirq+0x1c/0x30     
[104661.244767]        [<ffffffff810153c5>] do_softirq+0x95/0xd0       
[104661.244767]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110        
[104661.244767]        [<ffffffff81014b74>] do_IRQ+0x64/0xe0           
[104661.244767]        [<ffffffff816f6273>] ret_from_intr+0x0/0x1a     
[104661.244767]        [<ffffffff816f65b5>] page_fault+0x25/0x30     
[104661.244767] 
[104661.244767] other info that might help us debug this:
[104661.244767] 
[104661.244767]  Possible unsafe locking scenario:
[104661.244767]        
[104661.244767]        CPU0                    CPU1
[104661.244767]        ----                    ----
[104661.244767]   lock(slock-AF_INET);
[104661.244767]                                lock(slock-AF_INET);
[104661.244767]                                lock(slock-AF_INET);
[104661.244767]   lock(slock-AF_INET);
[104661.244767] 
[104661.244767]  *** DEADLOCK ***
[104661.244767] 
[104661.244767] 3 locks held by watchdog.pl/29331:
[104661.244767]  #0:  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[104661.244767]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff816109f5>] __netif_receive_skb+0x165/0x560
[104661.244767]  #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816418a0>] ip_local_deliver_finish+0x40/0x2f0
[104661.244767] 
[104661.244767] stack backtrace:
[104661.244767] Pid: 29331, comm: watchdog.pl Not tainted 3.1.0-rc10-hw-lockdep+ #51
[104661.244767] Call Trace:
[104661.244767]  <IRQ>  [<ffffffff81097eab>] print_circular_bug+0x21b/0x330
[104661.244767]  [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
[104661.244767]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[104661.244767]  [<ffffffff81664887>] ? tcp_v4_rcv+0x867/0xc10
[104661.244767]  [<ffffffff816f55aa>] _raw_spin_lock_nested+0x3a/0x50
[104661.244767]  [<ffffffff81664887>] ? tcp_v4_rcv+0x867/0xc10
[104661.244767]  [<ffffffff81664887>] tcp_v4_rcv+0x867/0xc10  
[104661.244767]  [<ffffffff816418a0>] ? ip_local_deliver_finish+0x40/0x2f0
[104661.244767]  [<ffffffff81636978>] ? nf_hook_slow+0x148/0x1a0
[104661.244767]  [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[104661.244767]  [<ffffffff816418a0>] ? ip_local_deliver_finish+0x40/0x2f0
[104661.244767]  [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[104661.244767]  [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510 
[104661.244767]  [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[104661.244767]  [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[104661.244767]  [<ffffffff816109f5>] ? __netif_receive_skb+0x165/0x560
[104661.244767]  [<ffffffff81612e24>] netif_receive_skb+0x104/0x120
[104661.244767]  [<ffffffff81612d43>] ? netif_receive_skb+0x23/0x120
[104661.244767]  [<ffffffff816133ab>] ? dev_gro_receive+0x29b/0x380 
[104661.244767]  [<ffffffff816132a2>] ? dev_gro_receive+0x192/0x380 
[104661.244767]  [<ffffffff81612f70>] napi_skb_finish+0x50/0x70
[104661.244767]  [<ffffffff81613635>] napi_gro_receive+0xc5/0xd0
[104661.244767]  [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[104661.244767]  [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[104661.244767]  [<ffffffff81613880>] net_rx_action+0x140/0x2c0  
[104661.244767]  [<ffffffff810640b8>] __do_softirq+0x138/0x250   
[104661.244767]  [<ffffffff817002bc>] call_softirq+0x1c/0x30     
[104661.244767]  [<ffffffff810153c5>] do_softirq+0x95/0xd0       
[104661.244767]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110        
[104661.244767]  [<ffffffff81014b74>] do_IRQ+0x64/0xe0           
[104661.244767]  [<ffffffff816f6273>] common_interrupt+0x73/0x73
[104661.244767]  <EOI>  [<ffffffff816f99b3>] ? do_page_fault+0x93/0x520
[104661.244767]  [<ffffffff816f99af>] ? do_page_fault+0x8f/0x520
[104661.244767]  [<ffffffff81149afc>] ? vfsmount_lock_local_unlock+0x1c/0x40
[104661.244767]  [<ffffffff8114a79b>] ? mntput_no_expire+0x3b/0x150
[104661.244767]  [<ffffffff8114a8ca>] ? mntput+0x1a/0x30
[104661.244767]  [<ffffffff8112c540>] ? fput+0x190/0x230
[104661.244767]  [<ffffffff813a60ed>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[104661.244767]  [<ffffffff816f65b5>] page_fault+0x25/0x30
[104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104663.418206] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104666.420003] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104672.425159] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104684.423542] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[104691.206752] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-31 17:32                       ` Simon Kirby
@ 2011-11-02 16:40                         ` Thomas Gleixner
  2011-11-02 17:27                           ` Eric Dumazet
  2011-11-02 18:28                           ` Simon Kirby
  2011-11-02 22:10                         ` Steven Rostedt
  1 sibling, 2 replies; 31+ messages in thread
From: Thomas Gleixner @ 2011-11-02 16:40 UTC (permalink / raw)
  To: Simon Kirby
  Cc: David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Mon, 31 Oct 2011, Simon Kirby wrote:
> On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:
> 
> > On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> > 
> > > Ok, hit the hang about 4 more times, but only this morning on a box with
> > > a serial cable attached. Yay!
> > 
> > Here's lockdep output from another box. This one looks a bit different.
> 
> One more, again a bit different. The last few lockups have looked like
> this. Not sure why, but we're hitting this at a few a day now. Thomas,
> this is without your patch, but as you said, that's right before a free
> and should print a separate lockdep warning.
> 
> No "huh" lines until after the trace on this one. I'll move to 3.1 with

That means that the lockdep warning hit in the same net_rx cycle
before the leak was detected by the softirq code.

> cherry-picked b0691c8e now.

Can you please add the debug patch below and try the following:

Enable CONFIG_FUNCTION_TRACER & CONFIG_FUNCTION_GRAPH_TRACER

# cd $DEBUGFSMOUNTPOINT/tracing
# echo sk_clone >set_ftrace_filter
# echo function >current_tracer
# echo 1 >options/func_stack_trace

Now wait until it reproduces (which stops the trace) and read out

# cat trace >/tmp/trace.txt

Please provide the trace file along with the lockdep splat. That
should tell us which callchain is responsible for the spinlock
leakage.

Thanks,

	tglx

--------------->
 kernel/softirq.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6/kernel/softirq.c
===================================================================
--- linux-2.6.orig/kernel/softirq.c
+++ linux-2.6/kernel/softirq.c
@@ -238,6 +238,7 @@ restart:
 			h->action(h);
 			trace_softirq_exit(vec_nr);
 			if (unlikely(prev_count != preempt_count())) {
+				tracing_off();
 				printk(KERN_ERR "huh, entered softirq %u %s %p"
 				       "with preempt_count %08x,"
 				       " exited with %08x?\n", vec_nr,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 16:40                         ` Thomas Gleixner
@ 2011-11-02 17:27                           ` Eric Dumazet
  2011-11-02 17:46                             ` Linus Torvalds
                                               ` (2 more replies)
  2011-11-02 18:28                           ` Simon Kirby
  1 sibling, 3 replies; 31+ messages in thread
From: Eric Dumazet @ 2011-11-02 17:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

Le mercredi 02 novembre 2011 à 17:40 +0100, Thomas Gleixner a écrit :
> On Mon, 31 Oct 2011, Simon Kirby wrote:
> > On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:
> > 
> > > On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> > > 
> > > > Ok, hit the hang about 4 more times, but only this morning on a box with
> > > > a serial cable attached. Yay!
> > > 
> > > Here's lockdep output from another box. This one looks a bit different.
> > 
> > One more, again a bit different. The last few lockups have looked like
> > this. Not sure why, but we're hitting this at a few a day now. Thomas,
> > this is without your patch, but as you said, that's right before a free
> > and should print a separate lockdep warning.
> > 
> > No "huh" lines until after the trace on this one. I'll move to 3.1 with
> 
> That means that the lockdep warning hit in the same net_rx cycle
> before the leak was detected by the softirq code.
> 
> > cherry-picked b0691c8e now.
> 
> Can you please add the debug patch below and try the following:
> 
> Enable CONFIG_FUNCTION_TRACER & CONFIG_FUNCTION_GRAPH_TRACER
> 
> # cd $DEBUGFSMOUNTPOINT/tracing
> # echo sk_clone >set_ftrace_filter
> # echo function >current_tracer
> # echo 1 >options/func_stack_trace
> 
> Now wait until it reproduces (which stops the trace) and read out
> 
> # cat trace >/tmp/trace.txt
> 
> Please provide the trace file along with the lockdep splat. That
> should tell us which callchain is responsible for the spinlock
> leakage.
> 
> Thanks,
> 
> 	tglx
> 
> --------------->
>  kernel/softirq.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> Index: linux-2.6/kernel/softirq.c
> ===================================================================
> --- linux-2.6.orig/kernel/softirq.c
> +++ linux-2.6/kernel/softirq.c
> @@ -238,6 +238,7 @@ restart:
>  			h->action(h);
>  			trace_softirq_exit(vec_nr);
>  			if (unlikely(prev_count != preempt_count())) {
> +				tracing_off();
>  				printk(KERN_ERR "huh, entered softirq %u %s %p"
>  				       "with preempt_count %08x,"
>  				       " exited with %08x?\n", vec_nr,


I believe it might come from commit 0e734419
(ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)

In case inet_csk_route_child_sock() returns NULL, we dont release socket
lock.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:27                           ` Eric Dumazet
@ 2011-11-02 17:46                             ` Linus Torvalds
  2011-11-02 17:53                               ` Eric Dumazet
  2011-11-02 17:49                             ` Eric Dumazet
  2011-11-02 17:54                             ` Thomas Gleixner
  2 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2011-11-02 17:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 2, 2011 at 10:27 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> I believe it might come from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
>
> In case inet_csk_route_child_sock() returns NULL, we dont release socket
> lock.

Hmm. I'm not seeing it. We're not even taking the socket lock there.
Or is it hidden somehow?

                    Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:27                           ` Eric Dumazet
  2011-11-02 17:46                             ` Linus Torvalds
@ 2011-11-02 17:49                             ` Eric Dumazet
  2011-11-02 17:58                               ` Eric Dumazet
  2011-11-02 17:54                             ` Thomas Gleixner
  2 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2011-11-02 17:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

Le mercredi 02 novembre 2011 à 18:27 +0100, Eric Dumazet a écrit :

> I believe it might come from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> 
> In case inet_csk_route_child_sock() returns NULL, we dont release socket
> lock.
> 
> 

Yes, thats the problem. I am testing following patch :

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0ea10ee..683d97a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1510,6 +1510,7 @@ exit:
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
 	return NULL;
 put_and_exit:
+	bh_unlock_sock(newsk);
 	sock_put(newsk);
 	goto exit;
 }

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:46                             ` Linus Torvalds
@ 2011-11-02 17:53                               ` Eric Dumazet
  2011-11-02 18:00                                 ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2011-11-02 17:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

Le mercredi 02 novembre 2011 à 10:46 -0700, Linus Torvalds a écrit :
> On Wed, Nov 2, 2011 at 10:27 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > I believe it might come from commit 0e734419
> > (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> >
> > In case inet_csk_route_child_sock() returns NULL, we dont release socket
> > lock.
> 
> Hmm. I'm not seeing it. We're not even taking the socket lock there.
> Or is it hidden somehow?
> 
>                     Linus

tcp_v4_syn_recv_sock()
{
	newsk = tcp_create_openreq_child(sk, req, skb);

...
	

}

newsk is locked at this point.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:27                           ` Eric Dumazet
  2011-11-02 17:46                             ` Linus Torvalds
  2011-11-02 17:49                             ` Eric Dumazet
@ 2011-11-02 17:54                             ` Thomas Gleixner
  2011-11-02 18:04                               ` Eric Dumazet
  2 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2011-11-02 17:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2793 bytes --]

On Wed, 2 Nov 2011, Eric Dumazet wrote:

> Le mercredi 02 novembre 2011 à 17:40 +0100, Thomas Gleixner a écrit :
> > On Mon, 31 Oct 2011, Simon Kirby wrote:
> > > On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:
> > > 
> > > > On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> > > > 
> > > > > Ok, hit the hang about 4 more times, but only this morning on a box with
> > > > > a serial cable attached. Yay!
> > > > 
> > > > Here's lockdep output from another box. This one looks a bit different.
> > > 
> > > One more, again a bit different. The last few lockups have looked like
> > > this. Not sure why, but we're hitting this at a few a day now. Thomas,
> > > this is without your patch, but as you said, that's right before a free
> > > and should print a separate lockdep warning.
> > > 
> > > No "huh" lines until after the trace on this one. I'll move to 3.1 with
> > 
> > That means that the lockdep warning hit in the same net_rx cycle
> > before the leak was detected by the softirq code.
> > 
> > > cherry-picked b0691c8e now.
> > 
> > Can you please add the debug patch below and try the following:
> > 
> > Enable CONFIG_FUNCTION_TRACER & CONFIG_FUNCTION_GRAPH_TRACER
> > 
> > # cd $DEBUGFSMOUNTPOINT/tracing
> > # echo sk_clone >set_ftrace_filter
> > # echo function >current_tracer
> > # echo 1 >options/func_stack_trace
> > 
> > Now wait until it reproduces (which stops the trace) and read out
> > 
> > # cat trace >/tmp/trace.txt
> > 
> > Please provide the trace file along with the lockdep splat. That
> > should tell us which callchain is responsible for the spinlock
> > leakage.
> > 
> > Thanks,
> > 
> > 	tglx
> > 
> > --------------->
> >  kernel/softirq.c |    1 +
> >  1 file changed, 1 insertion(+)
> > 
> > Index: linux-2.6/kernel/softirq.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/softirq.c
> > +++ linux-2.6/kernel/softirq.c
> > @@ -238,6 +238,7 @@ restart:
> >  			h->action(h);
> >  			trace_softirq_exit(vec_nr);
> >  			if (unlikely(prev_count != preempt_count())) {
> > +				tracing_off();
> >  				printk(KERN_ERR "huh, entered softirq %u %s %p"
> >  				       "with preempt_count %08x,"
> >  				       " exited with %08x?\n", vec_nr,
> 
> 
> I believe it might come from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> 
> In case inet_csk_route_child_sock() returns NULL, we dont release socket
> lock.

The same applies for if (__inet_inherit_port(sk, newsk) < 0) a few
lines further down, but that part was leaking the lock before that
commit already.

Just for the record, the locking in that code is mind boggling. It
took me some detective work to find even the place where the success
code path unlocks the lock :(

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:49                             ` Eric Dumazet
@ 2011-11-02 17:58                               ` Eric Dumazet
  2011-11-02 19:16                                 ` Simon Kirby
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2011-11-02 17:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

Le mercredi 02 novembre 2011 à 18:49 +0100, Eric Dumazet a écrit :
> Le mercredi 02 novembre 2011 à 18:27 +0100, Eric Dumazet a écrit :
> 
> > I believe it might come from commit 0e734419
> > (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> > 
> > In case inet_csk_route_child_sock() returns NULL, we dont release socket
> > lock.
> > 
> > 
> 
> Yes, thats the problem. I am testing following patch :
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 0ea10ee..683d97a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1510,6 +1510,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }
> 


This indeed solves the problem, but more closer inspection is needed to
close all bugs, not this only one.

# netstat -s
Ip:
    6961157 total packets received
    0 forwarded
    0 incoming packets discarded
    6961157 incoming packets delivered
    6961049 requests sent out
    2 dropped because of missing route    //// HERE, this is the origin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:53                               ` Eric Dumazet
@ 2011-11-02 18:00                                 ` Linus Torvalds
  2011-11-02 18:05                                   ` Eric Dumazet
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2011-11-02 18:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 2, 2011 at 10:53 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> tcp_v4_syn_recv_sock()
> {
>        newsk = tcp_create_openreq_child(sk, req, skb);
>
> newsk is locked at this point.

Umm, if that is the case, then the bug predates the commit you point
to. There were exit paths before that too.

                   Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:54                             ` Thomas Gleixner
@ 2011-11-02 18:04                               ` Eric Dumazet
  0 siblings, 0 replies; 31+ messages in thread
From: Eric Dumazet @ 2011-11-02 18:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

Le mercredi 02 novembre 2011 à 18:54 +0100, Thomas Gleixner a écrit :

> The same applies for if (__inet_inherit_port(sk, newsk) < 0) a few
> lines further down, but that part was leaking the lock before that
> commit already.
> 

Yes, but in normal condition, this never happened, this is why this
problem was never noticed.

tproxy is probaby very seldom used, and when used the error path is
probably never reached...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 18:00                                 ` Linus Torvalds
@ 2011-11-02 18:05                                   ` Eric Dumazet
  2011-11-02 18:10                                     ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2011-11-02 18:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

Le mercredi 02 novembre 2011 à 11:00 -0700, Linus Torvalds a écrit :
> On Wed, Nov 2, 2011 at 10:53 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > tcp_v4_syn_recv_sock()
> > {
> >        newsk = tcp_create_openreq_child(sk, req, skb);
> >
> > newsk is locked at this point.
> 
> Umm, if that is the case, then the bug predates the commit you point
> to. There were exit paths before that too.
> 

Yes, but only when tproxy is used, and in some obscure error
conditions... Probably nobody ever hit them or complained.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 18:05                                   ` Eric Dumazet
@ 2011-11-02 18:10                                     ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2011-11-02 18:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 2, 2011 at 11:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Yes, but only when tproxy is used, and in some obscure error
> conditions... Probably nobody ever hit them or complained.

Yes, I'm not disputing that. However, it does show how incredibly
fragile that code is.

May I suggest renaming those "clone_sk()" kinds of functions
"clone_sk_lock()" or something? So that you *see* that it's locked as
it is cloned. That might have made the bug not happen in the first
place..

Of course, maybe it's obvious to most net people - just not me looking
at the code - that the new socket ended up being locked at allocation.
But considering the bug happened twice, that "obvious" part is clearly
debatable..

                          Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 16:40                         ` Thomas Gleixner
  2011-11-02 17:27                           ` Eric Dumazet
@ 2011-11-02 18:28                           ` Simon Kirby
  2011-11-02 18:30                             ` Thomas Gleixner
  1 sibling, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2011-11-02 18:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 05:40:53PM +0100, Thomas Gleixner wrote:

> On Mon, 31 Oct 2011, Simon Kirby wrote:
> 
> > One more, again a bit different. The last few lockups have looked like
> > this. Not sure why, but we're hitting this at a few a day now. Thomas,
> > this is without your patch, but as you said, that's right before a free
> > and should print a separate lockdep warning.
> > 
> > No "huh" lines until after the trace on this one. I'll move to 3.1 with
> 
> That means that the lockdep warning hit in the same net_rx cycle
> before the leak was detected by the softirq code.
> 
> > cherry-picked b0691c8e now.
> 
> Can you please add the debug patch below and try the following:
> 
> Enable CONFIG_FUNCTION_TRACER & CONFIG_FUNCTION_GRAPH_TRACER
> 
> # cd $DEBUGFSMOUNTPOINT/tracing
> # echo sk_clone >set_ftrace_filter
> # echo function >current_tracer
> # echo 1 >options/func_stack_trace
> 
> Now wait until it reproduces (which stops the trace) and read out
> 
> # cat trace >/tmp/trace.txt
> 
> Please provide the trace file along with the lockdep splat. That
> should tell us which callchain is responsible for the spinlock
> leakage.
> Thanks,
> 
> 	tglx
> 
> --------------->
>  kernel/softirq.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> Index: linux-2.6/kernel/softirq.c
> ===================================================================
> --- linux-2.6.orig/kernel/softirq.c
> +++ linux-2.6/kernel/softirq.c
> @@ -238,6 +238,7 @@ restart:
>  			h->action(h);
>  			trace_softirq_exit(vec_nr);
>  			if (unlikely(prev_count != preempt_count())) {
> +				tracing_off();
>  				printk(KERN_ERR "huh, entered softirq %u %s %p"
>  				       "with preempt_count %08x,"
>  				       " exited with %08x?\n", vec_nr,

Ok, I'll try this. Hmm, all CPUs typically try to grab the lock fairly
quickly after it happens, which could make it difficult to cat the file.
I'll try ftrace_dump(DUMP_ALL); in there instead.

Simon-

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 18:28                           ` Simon Kirby
@ 2011-11-02 18:30                             ` Thomas Gleixner
  0 siblings, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2011-11-02 18:30 UTC (permalink / raw)
  To: Simon Kirby
  Cc: David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, 2 Nov 2011, Simon Kirby wrote:
> On Wed, Nov 02, 2011 at 05:40:53PM +0100, Thomas Gleixner wrote:
> Ok, I'll try this. Hmm, all CPUs typically try to grab the lock fairly
> quickly after it happens, which could make it difficult to cat the file.
> I'll try ftrace_dump(DUMP_ALL); in there instead.

Eric has spotted the source of trouble already. Can you try his patch
first? If it still persists, we still can resort to hardcore tracing :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:58                               ` Eric Dumazet
@ 2011-11-02 19:16                                 ` Simon Kirby
  2011-11-02 22:42                                   ` Eric Dumazet
  0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2011-11-02 19:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 06:58:21PM +0100, Eric Dumazet wrote:

> Le mercredi 02 novembre 2011 ?? 18:49 +0100, Eric Dumazet a ??crit :
> > Le mercredi 02 novembre 2011 ?? 18:27 +0100, Eric Dumazet a ??crit :
> > 
> > > I believe it might come from commit 0e734419
> > > (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> > > 
> > > In case inet_csk_route_child_sock() returns NULL, we dont release socket
> > > lock.
> > > 
> > > 
> > 
> > Yes, thats the problem. I am testing following patch :
> > 
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index 0ea10ee..683d97a 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1510,6 +1510,7 @@ exit:
> >  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
> >  	return NULL;
> >  put_and_exit:
> > +	bh_unlock_sock(newsk);
> >  	sock_put(newsk);
> >  	goto exit;
> >  }
> > 
> 
> 
> This indeed solves the problem, but more closer inspection is needed to
> close all bugs, not this only one.
> 
> # netstat -s
> Ip:
>     6961157 total packets received
>     0 forwarded
>     0 incoming packets discarded
>     6961157 incoming packets delivered
>     6961049 requests sent out
>     2 dropped because of missing route    //// HERE, this is the origin

Actually, we have an anti-abuse daemon that injects blackhole routes, so
this makes sense. (The daemon was written before ipsets were merged and
normal netfilter rules make it fall over under attack.)

I'll try with this patch. Thanks!

Simon-

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-31 17:32                       ` Simon Kirby
  2011-11-02 16:40                         ` Thomas Gleixner
@ 2011-11-02 22:10                         ` Steven Rostedt
  2011-11-02 23:00                           ` Steven Rostedt
  1 sibling, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2011-11-02 22:10 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

Thomas pointed me here.

On Mon, Oct 31, 2011 at 10:32:46AM -0700, Simon Kirby wrote:
> [104661.244767] 
> [104661.244767]  Possible unsafe locking scenario:
> [104661.244767]        
> [104661.244767]        CPU0                    CPU1
> [104661.244767]        ----                    ----
> [104661.244767]   lock(slock-AF_INET);
> [104661.244767]                                lock(slock-AF_INET);
> [104661.244767]                                lock(slock-AF_INET);
> [104661.244767]   lock(slock-AF_INET);
> [104661.244767] 
> [104661.244767]  *** DEADLOCK ***
> [104661.244767] 

Bah, I used the __print_lock_name() function to show the lock names in
the above, which leaves off the subclass number. I'll go write up a
patch that fixes that.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 19:16                                 ` Simon Kirby
@ 2011-11-02 22:42                                   ` Eric Dumazet
  2011-11-03  0:24                                     ` Thomas Gleixner
                                                       ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Eric Dumazet @ 2011-11-02 22:42 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development, Balazs Scheidler,
	KOVACS Krisztian

On 02/11/2011 20:16, Simon Kirby wrote:

 
> Actually, we have an anti-abuse daemon that injects blackhole routes, so
> this makes sense. (The daemon was written before ipsets were merged and
> normal netfilter rules make it fall over under attack.)
> 
> I'll try with this patch. Thanks!
> 


Thanks !

Here is the official submission, please add your 'Tested-by' signature
when you can confirm problem goes away.

(It did here, when I injected random NULL returns from
inet_csk_route_child_sock(), so I am confident this is the problem you hit )

[PATCH] net: add missing bh_unlock_sock() calls

Simon Kirby reported lockdep warnings and following messages :

[104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740
preempt_count 00000101, exited with 00000102?

[104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740
preempt_count 00000101, exited with 00000102?

Problem comes from commit 0e734419
(ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)

If inet_csk_route_child_sock() returns NULL, we should release socket
lock before freeing it.

Another lock imbalance exists if __inet_inherit_port() returns an error
since commit 093d282321da ( tproxy: fix hash locking issue when using
port redirection in __inet_inherit_port()) a backport is also needed for
>= 2.6.37 kernels.

Reported-by: Dimon Kirby <sim@hostway.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Balazs Scheidler <bazsi@balabit.hu>
CC: KOVACS Krisztian <hidden@balabit.hu>
---
 net/dccp/ipv4.c     |    1 +
 net/ipv4/tcp_ipv4.c |    1 +
 2 files changed, 2 insertions(+)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 332639b..90a919a 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -433,6 +433,7 @@ exit:
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
 	return NULL;
 put_and_exit:
+	bh_unlock_sock(newsk);
 	sock_put(newsk);
 	goto exit;
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0ea10ee..683d97a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1510,6 +1510,7 @@ exit:
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
 	return NULL;
 put_and_exit:
+	bh_unlock_sock(newsk);
 	sock_put(newsk);
 	goto exit;
 }

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 22:10                         ` Steven Rostedt
@ 2011-11-02 23:00                           ` Steven Rostedt
  2011-11-03  0:09                             ` Simon Kirby
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2011-11-02 23:00 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 06:10:23PM -0400, Steven Rostedt wrote:
> Thomas pointed me here.
> 
> On Mon, Oct 31, 2011 at 10:32:46AM -0700, Simon Kirby wrote:
> > [104661.244767] 
> > [104661.244767]  Possible unsafe locking scenario:
> > [104661.244767]        
> > [104661.244767]        CPU0                    CPU1
> > [104661.244767]        ----                    ----
> > [104661.244767]   lock(slock-AF_INET);
> > [104661.244767]                                lock(slock-AF_INET);
> > [104661.244767]                                lock(slock-AF_INET);
> > [104661.244767]   lock(slock-AF_INET);
> > [104661.244767] 
> > [104661.244767]  *** DEADLOCK ***
> > [104661.244767] 
> 
> Bah, I used the __print_lock_name() function to show the lock names in
> the above, which leaves off the subclass number. I'll go write up a
> patch that fixes that.
> 

Simon,

If you are still triggering the bug. Could you do me a favor and apply
the following patch. Just to make sure it fixes the confusing output
from above.

Thanks,

-- Steve


diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 91d67ce..d821ac9 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -490,16 +490,22 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
 	usage[i] = '\0';
 }
 
-static int __print_lock_name(struct lock_class *class)
+static void __print_lock_name(struct lock_class *class)
 {
 	char str[KSYM_NAME_LEN];
 	const char *name;
 
 	name = class->name;
-	if (!name)
+	if (!name) {
 		name = __get_key_name(class->key, str);
-
-	return printk("%s", name);
+		printk("%s", name);
+	} else {
+		printk("%s", name);
+		if (class->name_version > 1)
+			printk("#%d", class->name_version);
+		if (class->subclass)
+			printk("/%d", class->subclass);
+	}
 }
 
 static void print_lock_name(struct lock_class *class)
@@ -509,17 +515,8 @@ static void print_lock_name(struct lock_class *class)
 
 	get_usage_chars(class, usage);
 
-	name = class->name;
-	if (!name) {
-		name = __get_key_name(class->key, str);
-		printk(" (%s", name);
-	} else {
-		printk(" (%s", name);
-		if (class->name_version > 1)
-			printk("#%d", class->name_version);
-		if (class->subclass)
-			printk("/%d", class->subclass);
-	}
+	printk(" (");
+	__print_lock_name(class);
 	printk("){%s}", usage);
 }
 

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 23:00                           ` Steven Rostedt
@ 2011-11-03  0:09                             ` Simon Kirby
  2011-11-03  0:15                               ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2011-11-03  0:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 07:00:10PM -0400, Steven Rostedt wrote:

> On Wed, Nov 02, 2011 at 06:10:23PM -0400, Steven Rostedt wrote:
> > Thomas pointed me here.
> > 
> > On Mon, Oct 31, 2011 at 10:32:46AM -0700, Simon Kirby wrote:
> > > [104661.244767] 
> > > [104661.244767]  Possible unsafe locking scenario:
> > > [104661.244767]        
> > > [104661.244767]        CPU0                    CPU1
> > > [104661.244767]        ----                    ----
> > > [104661.244767]   lock(slock-AF_INET);
> > > [104661.244767]                                lock(slock-AF_INET);
> > > [104661.244767]                                lock(slock-AF_INET);
> > > [104661.244767]   lock(slock-AF_INET);
> > > [104661.244767] 
> > > [104661.244767]  *** DEADLOCK ***
> > > [104661.244767] 
> > 
> > Bah, I used the __print_lock_name() function to show the lock names in
> > the above, which leaves off the subclass number. I'll go write up a
> > patch that fixes that.
> > 
> 
> Simon,
> 
> If you are still triggering the bug. Could you do me a favor and apply
> the following patch. Just to make sure it fixes the confusing output
> from above.
> 
> Thanks,
> 
> -- Steve
> 
> 
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 91d67ce..d821ac9 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -490,16 +490,22 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
>  	usage[i] = '\0';
>  }
>  
> -static int __print_lock_name(struct lock_class *class)
> +static void __print_lock_name(struct lock_class *class)
>  {
>  	char str[KSYM_NAME_LEN];
>  	const char *name;
>  
>  	name = class->name;
> -	if (!name)
> +	if (!name) {
>  		name = __get_key_name(class->key, str);
> -
> -	return printk("%s", name);
> +		printk("%s", name);
> +	} else {
> +		printk("%s", name);
> +		if (class->name_version > 1)
> +			printk("#%d", class->name_version);
> +		if (class->subclass)
> +			printk("/%d", class->subclass);
> +	}
>  }
>  
>  static void print_lock_name(struct lock_class *class)
> @@ -509,17 +515,8 @@ static void print_lock_name(struct lock_class *class)
>  
>  	get_usage_chars(class, usage);
>  
> -	name = class->name;
> -	if (!name) {
> -		name = __get_key_name(class->key, str);
> -		printk(" (%s", name);
> -	} else {
> -		printk(" (%s", name);
> -		if (class->name_version > 1)
> -			printk("#%d", class->name_version);
> -		if (class->subclass)
> -			printk("/%d", class->subclass);
> -	}
> +	printk(" (");
> +	__print_lock_name(class);
>  	printk("){%s}", usage);
>  }

Hello!

I am now able to reproduce on demand by just starting an "ab" from
another box and "ip route add blackhole <other machine>" on the target
box while the ab is running. The first time I tried this without your
patch, and got the trace I had before. With your patch, I got this:

[  366.198866] huh, entered softirq 3 NET_RX ffffffff81616560 preempt_count 00000102, exited with 00000103?
[  366.198981] 
[  366.198982] =================================
[  366.199118] [ INFO: inconsistent lock state ]
[  366.199189] 3.1.0-hw-lockdep+ #58
[  366.199259] ---------------------------------
[  366.199331] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  366.199407] kworker/0:1/0 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  366.199480]  (slock-AF_INET){+.?...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[  366.199773] {IN-SOFTIRQ-W} state was registered at:
[  366.199846]   [<ffffffff81098b7c>] __lock_acquire+0xcbc/0x2180
[  366.199973]   [<ffffffff8109a149>] lock_acquire+0x109/0x140
[  366.200096]   [<ffffffff816f842c>] _raw_spin_lock+0x3c/0x50
[  366.200220]   [<ffffffff8166eb6d>] udp_queue_rcv_skb+0x26d/0x4b0
[  366.200346]   [<ffffffff8166f4d3>] __udp4_lib_rcv+0x2f3/0x910
[  366.200470]   [<ffffffff8166fb05>] udp_rcv+0x15/0x20
[  366.200592]   [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[  366.200718]   [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[  366.200841]   [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[  366.200965]   [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[  366.201086]   [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[  366.201211]   [<ffffffff81613ce0>] process_backlog+0xd0/0x1e0
[  366.201335]   [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[  366.201458]   [<ffffffff810640e8>] __do_softirq+0x138/0x250
[  366.201582]   [<ffffffff817030fc>] call_softirq+0x1c/0x30
[  366.201706]   [<ffffffff810153c5>] do_softirq+0x95/0xd0
[  366.202822]   [<ffffffff81063efd>] local_bh_enable+0xed/0x110
[  366.202822]   [<ffffffff81617a68>] dev_queue_xmit+0x1a8/0x8a0
[  366.202822]   [<ffffffff81621fca>] neigh_resolve_output+0x17a/0x220
[  366.202822]   [<ffffffff8164ab7c>] ip_finish_output+0x2ec/0x590
[  366.202822]   [<ffffffff8164aea8>] ip_output+0x88/0xe0
[  366.202822]   [<ffffffff81649b08>] ip_local_out+0x28/0x80
[  366.202822]   [<ffffffff81649b69>] ip_send_skb+0x9/0x40
[  366.202822]   [<ffffffff8166dce8>] udp_send_skb+0x108/0x370
[  366.202822]   [<ffffffff8167093c>] udp_sendmsg+0x7dc/0x920
[  366.202822]   [<ffffffff81678c4f>] inet_sendmsg+0xbf/0x120
[  366.202822]   [<ffffffff81602193>] sock_sendmsg+0xe3/0x110
[  366.202822]   [<ffffffff81602ab5>] sys_sendto+0x105/0x140
[  366.202822]   [<ffffffff81700e92>] system_call_fastpath+0x16/0x1b
[  366.202822] irq event stamp: 1175966
[  366.202822] hardirqs last  enabled at (1175964): [<ffffffff816f9174>] restore_args+0x0/0x30
[  366.202822] hardirqs last disabled at (1175965): [<ffffffff8106415d>] __do_softirq+0x1ad/0x250
[  366.202822] softirqs last  enabled at (1175966): [<ffffffff810641a6>] __do_softirq+0x1f6/0x250
[  366.202822] softirqs last disabled at (1175907): [<ffffffff817030fc>] call_softirq+0x1c/0x30
[  366.202822] 
[  366.202822] other info that might help us debug this:
[  366.202822]  Possible unsafe locking scenario:
[  366.202822] 
[  366.202822]        CPU0
[  366.202822]        ----
[  366.202822]   lock(slock-AF_INET);
[  366.202822]   <Interrupt>
[  366.202822]     lock(slock-AF_INET);
[  366.202822] 
[  366.202822]  *** DEADLOCK ***
[  366.202822] 
[  366.202822] 1 lock held by kworker/0:1/0:
[  366.202822]  #0:  (slock-AF_INET){+.?...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[  366.202822] 
[  366.202822] stack backtrace:
[  366.202822] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-hw-lockdep+ #58
[  366.202822] Call Trace:
[  366.202822]  [<ffffffff81095f31>] print_usage_bug+0x241/0x310
[  366.202822]  [<ffffffff810964b4>] mark_lock+0x4b4/0x6c0
[  366.202822]  [<ffffffff81097300>] ? check_usage_forwards+0x110/0x110
[  366.202822]  [<ffffffff81096762>] mark_held_locks+0xa2/0x130
[  366.202822]  [<ffffffff816f9174>] ? retint_restore_args+0x13/0x13
[  366.202822]  [<ffffffff81096b0d>] trace_hardirqs_on_caller+0x13d/0x1c0
[  366.202822]  [<ffffffff813a60ee>] trace_hardirqs_on_thunk+0x3a/0x3f
[  366.202822]  [<ffffffff816f9174>] ? retint_restore_args+0x13/0x13
[  366.202822]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[  366.202822]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[  366.202822]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[  366.202822]  [<ffffffff816ef2eb>] start_secondary+0x1ca/0x1ff

...which of course is a different splat, so I ran it again:

[   49.028097] =======================================================
[   49.028244] [ INFO: possible circular locking dependency detected ]
[   49.028321] 3.1.0-hw-lockdep+ #58
[   49.028391] -------------------------------------------------------
[   49.028466] tcsh/2490 is trying to acquire lock:
[   49.028539]  (slock-AF_INET/1){+.-...}, at: [<ffffffff816676b7>] tcp_v4_rcv+0x867/0xc10
[   49.028882] 
[   49.028883] but task is already holding lock:
[   49.029018]  (slock-AF_INET){+.-...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[   49.029310] 
[   49.029310] which lock already depends on the new lock.
[   49.029312] 
[   49.029513] 
[   49.029514] the existing dependency chain (in reverse order) is:
[   49.029653] 
[   49.029654] -> #1 (slock-AF_INET){+.-...}:
[   49.029986]        [<ffffffff8109a149>] lock_acquire+0x109/0x140
[   49.030115]        [<ffffffff816f842c>] _raw_spin_lock+0x3c/0x50
[   49.030242]        [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[   49.031959]        [<ffffffff8164f963>] inet_csk_clone+0x13/0x90
[   49.032008]        [<ffffffff816697d5>] tcp_create_openreq_child+0x25/0x4d0
[   49.032008]        [<ffffffff81667aa8>] tcp_v4_syn_recv_sock+0x48/0x2c0
[   49.032008]        [<ffffffff81669625>] tcp_check_req+0x335/0x4c0
[   49.032008]        [<ffffffff81666c8e>] tcp_v4_do_rcv+0x29e/0x460
[   49.032008]        [<ffffffff816676dc>] tcp_v4_rcv+0x88c/0xc10
[   49.032008]        [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[   49.032008]        [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[   49.032008]        [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[   49.032008]        [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[   49.032008]        [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[   49.032008]        [<ffffffff81613ce0>] process_backlog+0xd0/0x1e0
[   49.032008]        [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[   49.032008]        [<ffffffff810640e8>] __do_softirq+0x138/0x250
[   49.032008]        [<ffffffff817030fc>] call_softirq+0x1c/0x30
[   49.032008]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
[   49.032008]        [<ffffffff81063ded>] local_bh_enable_ip+0xed/0x110
[   49.032008]        [<ffffffff816f8ccf>] _raw_spin_unlock_bh+0x3f/0x50
[   49.032008]        [<ffffffff81605ca1>] release_sock+0x161/0x1d0
[   49.032008]        [<ffffffff8167911d>] inet_stream_connect+0x6d/0x2f0
[   49.032008]        [<ffffffff815ffe4b>] kernel_connect+0xb/0x10
[   49.032008]        [<ffffffff816addb6>] xs_tcp_setup_socket+0x2a6/0x4c0
[   49.032008]        [<ffffffff81078d29>] process_one_work+0x1e9/0x560
[   49.032008]        [<ffffffff81079433>] worker_thread+0x193/0x420
[   49.032008]        [<ffffffff81080496>] kthread+0x96/0xb0
[   49.032008]        [<ffffffff81703004>] kernel_thread_helper+0x4/0x10
[   49.032008] 
[   49.032008] -> #0 (slock-AF_INET/1){+.-...}:
[   49.032008]        [<ffffffff81099f00>] __lock_acquire+0x2040/0x2180
[   49.032008]        [<ffffffff8109a149>] lock_acquire+0x109/0x140
[   49.032008]        [<ffffffff816f83da>] _raw_spin_lock_nested+0x3a/0x50
[   49.032008]        [<ffffffff816676b7>] tcp_v4_rcv+0x867/0xc10
[   49.032008]        [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[   49.032008]        [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[   49.032008]        [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[   49.032008]        [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[   49.032008]        [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[   49.032008]        [<ffffffff81615c44>] netif_receive_skb+0x104/0x120
[   49.032008]        [<ffffffff81615d90>] napi_skb_finish+0x50/0x70
[   49.032008]        [<ffffffff81616455>] napi_gro_receive+0xc5/0xd0
[   49.032008]        [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[   49.032008]        [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[   49.032008]        [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[   49.032008]        [<ffffffff810640e8>] __do_softirq+0x138/0x250
[   49.032008]        [<ffffffff817030fc>] call_softirq+0x1c/0x30
[   49.032008]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
[   49.032008]        [<ffffffff81063cbd>] irq_exit+0xdd/0x110
[   49.032008]        [<ffffffff81014b74>] do_IRQ+0x64/0xe0
[   49.032008]        [<ffffffff816f90b3>] ret_from_intr+0x0/0x1a
[   49.032008]        [<ffffffff8105f63f>] release_task+0x24f/0x4c0
[   49.032008]        [<ffffffff810601de>] wait_consider_task+0x92e/0xb90
[   49.032008]        [<ffffffff81060590>] do_wait+0x150/0x270
[   49.032008]        [<ffffffff81060751>] sys_wait4+0xa1/0xf0
[   49.032008]        [<ffffffff81700e92>] system_call_fastpath+0x16/0x1b
[   49.032008] 
[   49.032008] other info that might help us debug this:
[   49.032008] 
[   49.032008]  Possible unsafe locking scenario:
[   49.032008] 
[   49.032008]        CPU0                    CPU1
[   49.032008]        ----                    ----
[   49.032008]   lock(slock-AF_INET);
[   49.039565]                                lock(slock-AF_INET/1);
[   49.039565]                                lock(slock-AF_INET);
[   49.039565]   lock(slock-AF_INET/1);
[   49.039565] 
[   49.039565]  *** DEADLOCK ***
[   49.039565] 
[   49.039565] 3 locks held by tcsh/2490:
[   49.039565]  #0:  (slock-AF_INET){+.-...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[   49.039565]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81613815>] __netif_receive_skb+0x165/0x560
[   49.039565]  #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816446d0>] ip_local_deliver_finish+0x40/0x2f0
[   49.039565] 
[   49.039565] stack backtrace:
[   49.039565] Pid: 2490, comm: tcsh Not tainted 3.1.0-hw-lockdep+ #58
[   49.039565] Call Trace:
[   49.039565]  <IRQ>  [<ffffffff81097dab>] print_circular_bug+0x21b/0x330
[   49.039565]  [<ffffffff81099f00>] __lock_acquire+0x2040/0x2180
[   49.039565]  [<ffffffff8109a149>] lock_acquire+0x109/0x140
[   49.039565]  [<ffffffff816676b7>] ? tcp_v4_rcv+0x867/0xc10
[   49.039565]  [<ffffffff816f83da>] _raw_spin_lock_nested+0x3a/0x50
[   49.039565]  [<ffffffff816676b7>] ? tcp_v4_rcv+0x867/0xc10
[   49.039565]  [<ffffffff816676b7>] tcp_v4_rcv+0x867/0xc10
[   49.039565]  [<ffffffff816446d0>] ? ip_local_deliver_finish+0x40/0x2f0
[   49.039565]  [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[   49.039565]  [<ffffffff816446d0>] ? ip_local_deliver_finish+0x40/0x2f0
[   49.039565]  [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[   49.039565]  [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[   49.039565]  [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[   49.039565]  [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[   49.039565]  [<ffffffff81613815>] ? __netif_receive_skb+0x165/0x560
[   49.039565]  [<ffffffff81615c44>] netif_receive_skb+0x104/0x120
[   49.039565]  [<ffffffff81615b63>] ? netif_receive_skb+0x23/0x120
[   49.039565]  [<ffffffff816161cb>] ? dev_gro_receive+0x29b/0x380
[   49.039565]  [<ffffffff816160c2>] ? dev_gro_receive+0x192/0x380
[   49.039565]  [<ffffffff81615d90>] napi_skb_finish+0x50/0x70
[   49.039565]  [<ffffffff81616455>] napi_gro_receive+0xc5/0xd0
[   49.039565]  [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[   49.039565]  [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[   49.039565]  [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[   49.039565]  [<ffffffff810640e8>] __do_softirq+0x138/0x250
[   49.039565]  [<ffffffff817030fc>] call_softirq+0x1c/0x30
[   49.039565]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[   49.039565]  [<ffffffff81063cbd>] irq_exit+0xdd/0x110
[   49.039565]  [<ffffffff81014b74>] do_IRQ+0x64/0xe0
[   49.039565]  [<ffffffff816f90b3>] common_interrupt+0x73/0x73
[   49.039565]  <EOI>  [<ffffffff810944fd>] ? trace_hardirqs_off+0xd/0x10
[   49.039565]  [<ffffffff816f864f>] ? _raw_write_unlock_irq+0x2f/0x50
[   49.039565]  [<ffffffff816f864b>] ? _raw_write_unlock_irq+0x2b/0x50
[   49.039565]  [<ffffffff8105f63f>] release_task+0x24f/0x4c0
[   49.039565]  [<ffffffff8105f414>] ? release_task+0x24/0x4c0
[   49.039565]  [<ffffffff810601de>] wait_consider_task+0x92e/0xb90
[   49.039565]  [<ffffffff81096b0d>] ? trace_hardirqs_on_caller+0x13d/0x1c0
[   49.039565]  [<ffffffff81060590>] do_wait+0x150/0x270
[   49.039565]  [<ffffffff81096b9d>] ? trace_hardirqs_on+0xd/0x10
[   49.039565]  [<ffffffff81060751>] sys_wait4+0xa1/0xf0
[   49.039565]  [<ffffffff8105e9b0>] ? wait_noreap_copyout+0x150/0x150
[   49.039565]  [<ffffffff81700e92>] system_call_fastpath+0x16/0x1b
[   49.045277] huh, entered softirq 3 NET_RX ffffffff81616560 preempt_count 00000102, exited with 00000103?

Did that help? I'm not sure if that's what you wanted to see...

Simon-

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-03  0:09                             ` Simon Kirby
@ 2011-11-03  0:15                               ` Steven Rostedt
  2011-11-03  0:17                                 ` Simon Kirby
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2011-11-03  0:15 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, 2011-11-02 at 17:09 -0700, Simon Kirby wrote:
>  
> [   49.032008] other info that might help us debug this:
> [   49.032008] 
> [   49.032008]  Possible unsafe locking scenario:
> [   49.032008] 
> [   49.032008]        CPU0                    CPU1
> [   49.032008]        ----                    ----
> [   49.032008]   lock(slock-AF_INET);
> [   49.039565]                                lock(slock-AF_INET/1);
> [   49.039565]                                lock(slock-AF_INET);
> [   49.039565]   lock(slock-AF_INET/1);
> [   49.039565] 
> [   49.039565]  *** DEADLOCK ***
> [   49.039565] 

> Did that help? I'm not sure if that's what you wanted to see...


Yes, this looks much better than what you previously showed. The added
"/1" makes a world of difference.

Thanks!

I'll add your "Tested-by". Seems rather strange as we didn't fix the bug
you are chasing, but instead fixed the output of what the bug
produced ;)

-- Steve

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-03  0:15                               ` Steven Rostedt
@ 2011-11-03  0:17                                 ` Simon Kirby
  0 siblings, 0 replies; 31+ messages in thread
From: Simon Kirby @ 2011-11-03  0:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 08:15:51PM -0400, Steven Rostedt wrote:

> On Wed, 2011-11-02 at 17:09 -0700, Simon Kirby wrote:
> >  
> > [   49.032008] other info that might help us debug this:
> > [   49.032008] 
> > [   49.032008]  Possible unsafe locking scenario:
> > [   49.032008] 
> > [   49.032008]        CPU0                    CPU1
> > [   49.032008]        ----                    ----
> > [   49.032008]   lock(slock-AF_INET);
> > [   49.039565]                                lock(slock-AF_INET/1);
> > [   49.039565]                                lock(slock-AF_INET);
> > [   49.039565]   lock(slock-AF_INET/1);
> > [   49.039565] 
> > [   49.039565]  *** DEADLOCK ***
> > [   49.039565] 
> 
> > Did that help? I'm not sure if that's what you wanted to see...
> 
> 
> Yes, this looks much better than what you previously showed. The added
> "/1" makes a world of difference.
> 
> Thanks!
> 
> I'll add your "Tested-by". Seems rather strange as we didn't fix the bug
> you are chasing, but instead fixed the output of what the bug
> produced ;)

Well, I was testing this without Eric's patch as I figured you wanted to
see the splat. :) Testing again with Eric's patch now.

Simon-

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 22:42                                   ` Eric Dumazet
@ 2011-11-03  0:24                                     ` Thomas Gleixner
  2011-11-03  0:52                                     ` Simon Kirby
  2011-11-03  6:06                                     ` Jörg-Volker Peetz
  2 siblings, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2011-11-03  0:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development, Balazs Scheidler,
	KOVACS Krisztian

On Wed, 2 Nov 2011, Eric Dumazet wrote:
> On 02/11/2011 20:16, Simon Kirby wrote:
> 
>  
> > Actually, we have an anti-abuse daemon that injects blackhole routes, so
> > this makes sense. (The daemon was written before ipsets were merged and
> > normal netfilter rules make it fall over under attack.)
> > 
> > I'll try with this patch. Thanks!
> > 
> 
> 
> Thanks !
> 
> Here is the official submission, please add your 'Tested-by' signature
> when you can confirm problem goes away.
> 
> (It did here, when I injected random NULL returns from
> inet_csk_route_child_sock(), so I am confident this is the problem you hit )
> 
> [PATCH] net: add missing bh_unlock_sock() calls
> 
> Simon Kirby reported lockdep warnings and following messages :
> 
> [104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740
> preempt_count 00000101, exited with 00000102?
> 
> [104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740
> preempt_count 00000101, exited with 00000102?
> 
> Problem comes from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> 
> If inet_csk_route_child_sock() returns NULL, we should release socket
> lock before freeing it.
> 
> Another lock imbalance exists if __inet_inherit_port() returns an error
> since commit 093d282321da ( tproxy: fix hash locking issue when using
> port redirection in __inet_inherit_port()) a backport is also needed for
> >= 2.6.37 kernels.
> 
> Reported-by: Dimon Kirby <sim@hostway.ca>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Balazs Scheidler <bazsi@balabit.hu>
> CC: KOVACS Krisztian <hidden@balabit.hu>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

You probably also want: CC: stable@vger.kernel.org

Thanks,

	tglx

> ---
>  net/dccp/ipv4.c     |    1 +
>  net/ipv4/tcp_ipv4.c |    1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
> index 332639b..90a919a 100644
> --- a/net/dccp/ipv4.c
> +++ b/net/dccp/ipv4.c
> @@ -433,6 +433,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 0ea10ee..683d97a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1510,6 +1510,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 22:42                                   ` Eric Dumazet
  2011-11-03  0:24                                     ` Thomas Gleixner
@ 2011-11-03  0:52                                     ` Simon Kirby
  2011-11-03 22:07                                       ` David Miller
  2011-11-03  6:06                                     ` Jörg-Volker Peetz
  2 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2011-11-03  0:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development, Balazs Scheidler,
	KOVACS Krisztian

On Wed, Nov 02, 2011 at 11:42:56PM +0100, Eric Dumazet wrote:

> On 02/11/2011 20:16, Simon Kirby wrote:
> 
>  
> > Actually, we have an anti-abuse daemon that injects blackhole routes, so
> > this makes sense. (The daemon was written before ipsets were merged and
> > normal netfilter rules make it fall over under attack.)
> > 
> > I'll try with this patch. Thanks!
> > 
> 
> 
> Thanks !
> 
> Here is the official submission, please add your 'Tested-by' signature
> when you can confirm problem goes away.
> 
> (It did here, when I injected random NULL returns from
> inet_csk_route_child_sock(), so I am confident this is the problem you hit )
> 
> [PATCH] net: add missing bh_unlock_sock() calls
> 
> Simon Kirby reported lockdep warnings and following messages :
> 
> [104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740
> preempt_count 00000101, exited with 00000102?
> 
> [104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740
> preempt_count 00000101, exited with 00000102?
> 
> Problem comes from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> 
> If inet_csk_route_child_sock() returns NULL, we should release socket
> lock before freeing it.
> 
> Another lock imbalance exists if __inet_inherit_port() returns an error
> since commit 093d282321da ( tproxy: fix hash locking issue when using
> port redirection in __inet_inherit_port()) a backport is also needed for
> >= 2.6.37 kernels.
> 
> Reported-by: Dimon Kirby <sim@hostway.ca>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Balazs Scheidler <bazsi@balabit.hu>
> CC: KOVACS Krisztian <hidden@balabit.hu>
> ---
>  net/dccp/ipv4.c     |    1 +
>  net/ipv4/tcp_ipv4.c |    1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
> index 332639b..90a919a 100644
> --- a/net/dccp/ipv4.c
> +++ b/net/dccp/ipv4.c
> @@ -433,6 +433,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 0ea10ee..683d97a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1510,6 +1510,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }

Tested-by: Simon Kirby <sim@hostway.ca>

I tried many times, with route unreach/blackhole, and could not reproduce
the issue with this patch applied.

Thanks!

Simon-

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 22:42                                   ` Eric Dumazet
  2011-11-03  0:24                                     ` Thomas Gleixner
  2011-11-03  0:52                                     ` Simon Kirby
@ 2011-11-03  6:06                                     ` Jörg-Volker Peetz
  2 siblings, 0 replies; 31+ messages in thread
From: Jörg-Volker Peetz @ 2011-11-03  6:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev

Eric Dumazet wrote, on 11/02/11 23:42:
<snip>
> Reported-by: Dimon Kirby <sim@hostway.ca>
??             Simon                     ??
<snip>
-- 
Best regards,
Jörg-Volker.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-03  0:52                                     ` Simon Kirby
@ 2011-11-03 22:07                                       ` David Miller
  0 siblings, 0 replies; 31+ messages in thread
From: David Miller @ 2011-11-03 22:07 UTC (permalink / raw)
  To: sim
  Cc: eric.dumazet, tglx, a.p.zijlstra, torvalds, linux-kernel, davej,
	schwidefsky, mingo, netdev, bazsi, hidden

From: Simon Kirby <sim@hostway.ca>
Date: Wed, 2 Nov 2011 17:52:55 -0700

>> [PATCH] net: add missing bh_unlock_sock() calls
 ...
> Tested-by: Simon Kirby <sim@hostway.ca>
> 
> I tried many times, with route unreach/blackhole, and could not reproduce
> the issue with this patch applied.

Applied and queued up for -stable, thanks everyone!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.1-rc9
       [not found]           ` <20111012213555.GC24461@hostway.ca>
@ 2011-10-18  5:40             ` Simon Kirby
  0 siblings, 0 replies; 31+ messages in thread
From: Simon Kirby @ 2011-10-18  5:40 UTC (permalink / raw)
  To: Linux Kernel Mailing List, netdev

On Wed, Oct 12, 2011 at 02:35:55PM -0700, Simon Kirby wrote:

> > > patching file kernel/posix-cpu-timers.c
> > > patching file kernel/sched_stats.h 
> > 
> > yes that would be fine.
> 
> This patch (s/raw_//) has been stable on 5 boxes for a day. I'll push to
> another 15 shortly and confirm tomorrow. Meanwhile, we had another ~4
> boxes lock up on 3.1-rc9 _with_ d670ec13 reverted (all CPUs spinning),
> but there weren't enough serial cables to log all of them and we haven't
> been lucky enough to capture anything other than what fits on 80x25.
> I'm hoping it's just the same bug you've already fixed.

Looks to be a different bug. It just happened on a box with serial
console logging, on the same build I was testing the above patch on --
Linus master circa Oct 7th. This seems to be specific to TCP. I'm not
sure what is with all of the doubled backtraces. I've only seen this on
a couple of different boxes so far.

Full log at http://0x.ca/sim/ref/3.1-rc9/3.1-rc9-tcp-lockup.log

First 100 lines:

[516112.140013] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
[516112.144001] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.144001] CPU 0 
[516112.144001] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.144001] 
[516112.144001] Pid: 0, comm: swapper Not tainted 3.1.0-rc9-hw+ #48 Dell Inc. PowerEdge 1950/0UR033
[516112.144001] RIP: 0010:[<ffffffff816b6694>]  [<ffffffff816b6694>] _raw_spin_lock+0x14/0x20
[516112.144001] RSP: 0018:ffff88022fc03e10  EFLAGS: 00000297
[516112.144001] RAX: 0000000000000100 RBX: ffffffff81022674 RCX: ffffffff81b4df20
[516112.144001] RDX: ffff8801002aebe0 RSI: dead000000200200 RDI: ffff8801002ad188
[516112.144001] RBP: ffff88022fc03e10 R08: 00000000000000f7 R09: 0000000000000000
[516112.144001] R10: 0000000000000000 R11: 0000000000000010 R12: ffff88022fc03d88
[516112.144001] R13: ffffffff816bed1e R14: ffff88022fc03e10 R15: ffffffff81b4df00
[516112.144001] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
[516112.244020] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/0:0:0]
[516112.244024] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.244033] CPU 1 
[516112.244035] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.244041] 
[516112.244044] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc9-hw+ #48 Dell Inc. PowerEdge 1950/0UR033
[516112.244048] RIP: 0010:[<ffffffff816b6694>]  [<ffffffff816b6694>] _raw_spin_lock+0x14/0x20
[516112.244057] RSP: 0018:ffff88022fc43e10  EFLAGS: 00000297
[516112.244059] RAX: 0000000000000100 RBX: ffffffff81022674 RCX: ffff880226888020
[516112.244062] RDX: ffff88001ece1aa0 RSI: dead000000200200 RDI: ffff88001ece1f88
[516112.244064] RBP: ffff88022fc43e10 R08: 00000000000000df R09: 0000000000000000
[516112.244066] R10: 0000000000000000 R11: 0000000000000010 R12: ffff88022fc43d88
[516112.244068] R13: ffffffff816bed1e R14: ffff88022fc43e10 R15: ffff880226888000
[516112.244071] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
[516112.244074] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[516112.244076] CR2: ffffffffff600400 CR3: 0000000126d93000 CR4: 00000000000006e0
[516112.244078] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[516112.244081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[516112.244083] Process kworker/0:0 (pid: 0, threadinfo ffff880226918000, task ffff880226911640)
[516112.244085] Stack:
[516112.244086]  ffff88022fc43e40 ffffffff8162a613 0000000000000000 0000000000000000
[516112.244090]  ffff880226888000 ffff88001ece20e0 ffff88022fc43ee0 ffffffff810692dc
[516112.244094]  0000000000000000 ffff880226919fd8 ffff880226919fd8 ffff880226919fd8
[516112.244098] Call Trace:
[516112.244099]  <IRQ> 
[516112.244105]  [<ffffffff8162a613>] tcp_keepalive_timer+0x23/0x260
[516112.244110]  [<ffffffff810692dc>] run_timer_softirq+0x1ac/0x310
[516112.244113]  [<ffffffff8162a5f0>] ? tcp_init_xmit_timers+0x20/0x20
[516112.244118]  [<ffffffff8102e838>] ? lapic_next_event+0x18/0x20
[516112.244121]  [<ffffffff81060bf0>] __do_softirq+0xe0/0x1d0
[516112.244125]  [<ffffffff816c04ac>] call_softirq+0x1c/0x30
[516112.244129]  [<ffffffff81014255>] do_softirq+0x65/0xa0
[516112.244132]  [<ffffffff810608fd>] irq_exit+0xad/0xe0
[516112.244135]  [<ffffffff8102f569>] smp_apic_timer_interrupt+0x69/0xa0
[516112.244139]  [<ffffffff816bed1e>] apic_timer_interrupt+0x6e/0x80
[516112.244140]  <EOI> 
[516112.244144]  [<ffffffff8101a337>] ? mwait_idle+0x117/0x120
[516112.244147]  [<ffffffff810120c6>] cpu_idle+0x86/0xe0
[516112.244151]  [<ffffffff816ae77c>] start_secondary+0x1a3/0x1e7
[516112.244153] Code: 0f b6 c2 85 c0 c9 0f 95 c0 0f b6 c0 c3 66 2e 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 38 e0 74 06 f3 90 <8a> 07 eb f6 c9 c3 66 0f 1f 44 00 00 55 48 89 e5 9c 58 66 66 90 
[516112.244173] Call Trace:
[516112.244174]  <IRQ>  [<ffffffff8162a613>] tcp_keepalive_timer+0x23/0x260
[516112.244179]  [<ffffffff810692dc>] run_timer_softirq+0x1ac/0x310
[516112.244182]  [<ffffffff8162a5f0>] ? tcp_init_xmit_timers+0x20/0x20
[516112.244185]  [<ffffffff8102e838>] ? lapic_next_event+0x18/0x20
[516112.244188]  [<ffffffff81060bf0>] __do_softirq+0xe0/0x1d0
[516112.244191]  [<ffffffff816c04ac>] call_softirq+0x1c/0x30
[516112.244194]  [<ffffffff81014255>] do_softirq+0x65/0xa0
[516112.244197]  [<ffffffff810608fd>] irq_exit+0xad/0xe0
[516112.244199]  [<ffffffff8102f569>] smp_apic_timer_interrupt+0x69/0xa0
[516112.244202]  [<ffffffff816bed1e>] apic_timer_interrupt+0x6e/0x80
[516112.244204]  <EOI>  [<ffffffff8101a337>] ? mwait_idle+0x117/0x120
[516112.244209]  [<ffffffff810120c6>] cpu_idle+0x86/0xe0
[516112.244212]  [<ffffffff816ae77c>] start_secondary+0x1a3/0x1e7
[516112.344023] BUG: soft lockup - CPU#2 stuck for 23s! [php:1486]
[516112.344025] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.344033] CPU 2 
[516112.344034] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.344040] 
[516112.344042] Pid: 1486, comm: php Not tainted 3.1.0-rc9-hw+ #48 Dell Inc. PowerEdge 1950/0UR033
[516112.344046] RIP: 0010:[<ffffffff816b6694>]  [<ffffffff816b6694>] _raw_spin_lock+0x14/0x20
[516112.344051] RSP: 0000:ffff88022fc83e10  EFLAGS: 00000297
[516112.344053] RAX: 0000000000000100 RBX: ffffffff81022674 RCX: ffff880226920020
[516112.344056] RDX: ffff88022198c660 RSI: dead000000200200 RDI: ffff8800ac758cc8
[516112.344058] RBP: ffff88022fc83e10 R08: 00000000000000ef R09: 0000000000000000
[516112.344060] R10: 000000000000018b R11: 0000000000000010 R12: ffff88022fc83d88
[516112.344062] R13: ffffffff816bed1e R14: ffff88022fc83e10 R15: ffff880226920000
[516112.344065] FS:  00007faafda03720(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
[516112.344068] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[516112.344070] CR2: ffffffffff600400 CR3: 00000002223de000 CR4: 00000000000006e0
[516112.344072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[516112.344075] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[516112.344077] Process php (pid: 1486, threadinfo ffff880039262000, task ffff88003e675900)
[516112.344079] Stack:
[516112.344081]  ffff88022fc83e40 ffffffff8162a613 0000000000000000 0000000000000000
[516112.344084]  ffff880226920000 ffff8800ac758e20 ffff88022fc83ee0 ffffffff810692dc
[516112.344088]  0000000000000001 ffff880039263fd8 ffff880039263fd8 ffff880039263fd8
[516112.344091] Call Trace:
[516112.344093]  <IRQ> 
[516112.344099]  [<ffffffff8162a613>] tcp_keepalive_timer+0x23/0x260
[516112.344104]  [<ffffffff810692dc>] run_timer_softirq+0x1ac/0x310
[516112.344107]  [<ffffffff8162a5f0>] ? tcp_init_xmit_timers+0x20/0x20
[516112.344111]  [<ffffffff8102e838>] ? lapic_next_event+0x18/0x20
[516112.344115]  [<ffffffff81060bf0>] __do_softirq+0xe0/0x1d0
[516112.344119]  [<ffffffff816c04ac>] call_softirq+0x1c/0x30
[516112.344123]  [<ffffffff81014255>] do_softirq+0x65/0xa0

Simon-

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2011-11-03 22:09 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1318847658.6594.40.camel@twins>
     [not found] ` <CA+55aFxaGKEyhXdHXNxvPrPQ-SGSpbXdfoeXrxfjPx3VXsgvtg@mail.gmail.com>
     [not found]   ` <1318874090.4172.84.camel@twins>
     [not found]     ` <CA+55aFwCBy=4YK6amE=H-BYu9-boj4Po2Zkgf4V261mCx0DC4A@mail.gmail.com>
     [not found]       ` <1318879396.4172.92.camel@twins>
     [not found]         ` <alpine.LFD.2.02.1110172237030.3240@ionos>
     [not found]           ` <alpine.LFD.2.02.1110181037120.3240@ionos>
     [not found]             ` <1318928713.21167.4.camel@twins>
     [not found]               ` <20111018182046.GF1309@hostway.ca>
     [not found]                 ` <alpine.LFD.2.02.1110182146440.3240@ionos>
     [not found]                   ` <20111024190203.GA24410@hostway.ca>
2011-10-25  7:13                     ` Linux 3.1-rc9 Linus Torvalds
2011-10-25  9:01                       ` David Miller
2011-10-25 12:30                         ` Thomas Gleixner
2011-10-25 23:18                           ` David Miller
2011-10-25 20:20                     ` Simon Kirby
2011-10-31 17:32                       ` Simon Kirby
2011-11-02 16:40                         ` Thomas Gleixner
2011-11-02 17:27                           ` Eric Dumazet
2011-11-02 17:46                             ` Linus Torvalds
2011-11-02 17:53                               ` Eric Dumazet
2011-11-02 18:00                                 ` Linus Torvalds
2011-11-02 18:05                                   ` Eric Dumazet
2011-11-02 18:10                                     ` Linus Torvalds
2011-11-02 17:49                             ` Eric Dumazet
2011-11-02 17:58                               ` Eric Dumazet
2011-11-02 19:16                                 ` Simon Kirby
2011-11-02 22:42                                   ` Eric Dumazet
2011-11-03  0:24                                     ` Thomas Gleixner
2011-11-03  0:52                                     ` Simon Kirby
2011-11-03 22:07                                       ` David Miller
2011-11-03  6:06                                     ` Jörg-Volker Peetz
2011-11-02 17:54                             ` Thomas Gleixner
2011-11-02 18:04                               ` Eric Dumazet
2011-11-02 18:28                           ` Simon Kirby
2011-11-02 18:30                             ` Thomas Gleixner
2011-11-02 22:10                         ` Steven Rostedt
2011-11-02 23:00                           ` Steven Rostedt
2011-11-03  0:09                             ` Simon Kirby
2011-11-03  0:15                               ` Steven Rostedt
2011-11-03  0:17                                 ` Simon Kirby
     [not found] <CA+55aFxPNszU5UHFrDDYnshLEMupaviFwhgEsgmPkqpmuWNZ8A@mail.gmail.com>
     [not found] ` <20111007070842.GA27555@hostway.ca>
     [not found]   ` <20111007174848.GA11011@hostway.ca>
     [not found]     ` <1318010515.398.8.camel@twins>
     [not found]       ` <20111008005035.GC22843@hostway.ca>
     [not found]         ` <1318060551.8395.0.camel@twins>
     [not found]           ` <20111012213555.GC24461@hostway.ca>
2011-10-18  5:40             ` Simon Kirby

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).