* Fw: [Bug 14470] New: freez in TCP stack
@ 2009-10-26 15:41 Stephen Hemminger
  2009-10-28 22:13 ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2009-10-26 15:41 UTC (permalink / raw)
  To: netdev

Begin forwarded message:

Date: Mon, 26 Oct 2009 12:47:22 GMT
From: bugzilla-daemon@bugzilla.kernel.org
To: shemminger@linux-foundation.org
Subject: [Bug 14470] New: freez in TCP stack

http://bugzilla.kernel.org/show_bug.cgi?id=14470

           Summary: freez in TCP stack
           Product: Networking
           Version: 2.5
    Kernel Version: 2.6.31
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: IPV4
        AssignedTo: shemminger@linux-foundation.org
        ReportedBy: kolo@albatani.cz
        Regression: No


We are hitting kernel panics on Dell R610 servers with e1000e NICs; it
appears usually under high network traffic (around 100 Mbit/s), but that is
not a rule: it has happened even under low traffic.

The servers are used as a reverse HTTP proxy (Varnish).

On 6 identical servers this panic happens approx. 2 times a day, depending on
network load. The machine completely freezes until the management watchdog
reboots it.

We had to put a serial console on these servers to catch the oops. Is there
anything else we can do to debug this?

The RIP is always the same:

RIP: 0010:[<ffffffff814203cc>]  [<ffffffff814203cc>]
tcp_xmit_retransmit_queue+0x8c/0x290

The rest of the oops always differs a little ...
here is an example: RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] tcp_xmit_retransmit_queue+0x8c/0x290 RSP: 0018:ffffc90000003a40 EFLAGS: 00010246 RAX: ffff8807e7420678 RBX: ffff8807e74205c0 RCX: 0000000000000000 RDX: 000000004598a105 RSI: 0000000000000000 RDI: ffff8807e74205c0 RBP: ffffc90000003a80 R08: 0000000000000003 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8807e74205c0 R14: ffff8807e7420678 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffc90000000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffffffff81608000, task ffffffff81631440) Stack: ffffc90000003a60 0000000000000000 4598a105e74205c0 000000004598a101 <0> 000000000000050e ffff8807e74205c0 0000000000000003 0000000000000000 <0> ffffc90000003b40 ffffffff8141ae4a ffff8807e7420678 0000000000000000 Call Trace: <IRQ> [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90 [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 [<ffffffff81406b70>] ? ip_local_deliver_finish+0x0/0x120 [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 [<ffffffff8140701f>] ip_rcv+0x24f/0x350 [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 [<ffffffff81054ce1>] ? 
ktime_get+0x11/0x50 [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 [<ffffffff8100c27c>] call_softirq+0x1c/0x30 [<ffffffff8100e04d>] do_softirq+0x3d/0x80 [<ffffffff81041b0b>] irq_exit+0x7b/0x90 [<ffffffff8100d613>] do_IRQ+0x73/0xe0 [<ffffffff8100bb13>] ret_from_intr+0x0/0xa <EOI> [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0 [<ffffffff81468db6>] ? rest_init+0x66/0x70 [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340 [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90 [<ffffffff8169fe32>] ? x86_64_start_kernel+0xd2/0x100 Code: 00 eb 28 8b 83 d0 03 00 00 41 39 44 24 40 0f 89 00 01 00 00 41 0f b6 cd 41 bd 2f 00 00 00 83 e1 03 0f 84 fc 00 00 00 4d 8b 24 24 <49> 8b 04 24 4d 39 f4 0f 18 08 0f 84 d9 00 00 00 4c 3b a3 b8 01 RIP [<ffffffff814203cc>] tcp_xmit_retransmit_queue+0x8c/0x290 RSP <ffffc90000003a40> CR2: 0000000000000000 ---[ end trace d97d99c9ae1d52cc ]--- Kernel panic - not syncing: Fatal exception in interrupt Pid: 0, comm: swapper Tainted: G D 2.6.31 #2 Call Trace: <IRQ> [<ffffffff8103cab0>] panic+0xa0/0x170 [<ffffffff8100bb13>] ? ret_from_intr+0x0/0xa [<ffffffff8103c74e>] ? print_oops_end_marker+0x1e/0x20 [<ffffffff8100f38e>] oops_end+0x9e/0xb0 [<ffffffff81025b9a>] no_context+0x15a/0x250 [<ffffffff81025e2b>] __bad_area_nosemaphore+0xdb/0x1c0 [<ffffffff813e89e9>] ? dev_hard_start_xmit+0x269/0x2f0 [<ffffffff81025fae>] bad_area_nosemaphore+0xe/0x10 [<ffffffff8102639f>] do_page_fault+0x17f/0x260 [<ffffffff8147eadf>] page_fault+0x1f/0x30 [<ffffffff814203cc>] ? tcp_xmit_retransmit_queue+0x8c/0x290 [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 [<ffffffff813fd02f>] ? 
nf_iterate+0x5f/0x90 [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 [<ffffffff81406b70>] ? ip_local_deliver_finish+0x0/0x120 [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 [<ffffffff8140701f>] ip_rcv+0x24f/0x350 [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 [<ffffffff81054ce1>] ? ktime_get+0x11/0x50 [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 [<ffffffff8100c27c>] call_softirq+0x1c/0x30 [<ffffffff8100e04d>] do_softirq+0x3d/0x80 [<ffffffff81041b0b>] irq_exit+0x7b/0x90 [<ffffffff8100d613>] do_IRQ+0x73/0xe0 [<ffffffff8100bb13>] ret_from_intr+0x0/0xa <EOI> [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0 [<ffffffff81468db6>] ? rest_init+0x66/0x70 [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340 [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90 [<ffffffff8169fe32>] ? x86_64_start_kernel+0xd2/0x100 -- Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug. -- ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Fw: [Bug 14470] New: freez in TCP stack 2009-10-26 15:41 Fw: [Bug 14470] New: freez in TCP stack Stephen Hemminger @ 2009-10-28 22:13 ` Andrew Morton 2009-10-28 22:27 ` Denys Fedoryschenko 2009-10-29 5:35 ` Eric Dumazet 0 siblings, 2 replies; 17+ messages in thread From: Andrew Morton @ 2009-10-28 22:13 UTC (permalink / raw) To: Stephen Hemminger; +Cc: netdev, kolo, bugzilla-daemon On Mon, 26 Oct 2009 08:41:32 -0700 Stephen Hemminger <shemminger@linux-foundation.org> wrote: > > > Begin forwarded message: > > Date: Mon, 26 Oct 2009 12:47:22 GMT > From: bugzilla-daemon@bugzilla.kernel.org > To: shemminger@linux-foundation.org > Subject: [Bug 14470] New: freez in TCP stack > Stephen, please retain the bugzilla and reporter email cc's when forwarding a report to a mailing list. > http://bugzilla.kernel.org/show_bug.cgi?id=14470 > > Summary: freez in TCP stack > Product: Networking > Version: 2.5 > Kernel Version: 2.6.31 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: high > Priority: P1 > Component: IPV4 > AssignedTo: shemminger@linux-foundation.org > ReportedBy: kolo@albatani.cz > Regression: No > > > We are hiting kernel panics on Dell R610 servers with e1000e NICs; it apears > usualy under a high network trafic ( around 100Mbit/s) but it is not a rule it > has happened even on low trafic. > > Servers are used as reverse http proxy (varnish). > > On 6 equal servers this panic happens aprox 2 times a day depending on network > load. Machine completly freezes till the management watchdog reboots. > Twice a day on six separate machines. That ain't no hardware glitch. Vaclav, are you able to say whether this is a regression? Did those machines run 2.6.30 (for example)? Thanks. > We had to put serial console on these servers to catch the oops. Is there > anything else We can do to debug this? 
> The RIP is always the same: > > RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] > tcp_xmit_retransmit_queue+0x8c/0x290 > > rest of the oops always differs a litle ... here is an example: > > RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] > tcp_xmit_retransmit_queue+0x8c/0x290 > RSP: 0018:ffffc90000003a40 EFLAGS: 00010246 > RAX: ffff8807e7420678 RBX: ffff8807e74205c0 RCX: 0000000000000000 > RDX: 000000004598a105 RSI: 0000000000000000 RDI: ffff8807e74205c0 > RBP: ffffc90000003a80 R08: 0000000000000003 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > R13: ffff8807e74205c0 R14: ffff8807e7420678 R15: 0000000000000000 > FS: 0000000000000000(0000) GS:ffffc90000000000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process swapper (pid: 0, threadinfo ffffffff81608000, task ffffffff81631440) > Stack: > ffffc90000003a60 0000000000000000 4598a105e74205c0 000000004598a101 > <0> 000000000000050e ffff8807e74205c0 0000000000000003 0000000000000000 > <0> ffffc90000003b40 ffffffff8141ae4a ffff8807e7420678 0000000000000000 > Call Trace: > <IRQ> > [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 > [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 > [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 > [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90 > [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 > [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 > [<ffffffff81406b70>] ? 
ip_local_deliver_finish+0x0/0x120 > [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 > [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 > [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 > [<ffffffff8140701f>] ip_rcv+0x24f/0x350 > [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 > [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 > [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 > [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 > [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 > [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 > [<ffffffff81054ce1>] ? ktime_get+0x11/0x50 > [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 > [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 > [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 > [<ffffffff8100c27c>] call_softirq+0x1c/0x30 > [<ffffffff8100e04d>] do_softirq+0x3d/0x80 > [<ffffffff81041b0b>] irq_exit+0x7b/0x90 > [<ffffffff8100d613>] do_IRQ+0x73/0xe0 > [<ffffffff8100bb13>] ret_from_intr+0x0/0xa > <EOI> > [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 > [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 > [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 > [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0 > [<ffffffff81468db6>] ? rest_init+0x66/0x70 > [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340 > [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90 > [<ffffffff8169fe32>] ? x86_64_start_kernel+0xd2/0x100 > Code: 00 eb 28 8b 83 d0 03 00 00 41 39 44 24 40 0f 89 00 01 00 00 41 0f b6 cd > 41 bd 2f 00 00 00 83 e1 03 0f 84 fc 00 00 00 4d 8b 24 24 <49> 8b 04 24 4d 39 f4 > 0f 18 08 0f 84 d9 00 00 00 4c 3b a3 b8 01 > RIP [<ffffffff814203cc>] tcp_xmit_retransmit_queue+0x8c/0x290 > RSP <ffffc90000003a40> > CR2: 0000000000000000 > ---[ end trace d97d99c9ae1d52cc ]--- > Kernel panic - not syncing: Fatal exception in interrupt > Pid: 0, comm: swapper Tainted: G D 2.6.31 #2 > Call Trace: > <IRQ> [<ffffffff8103cab0>] panic+0xa0/0x170 > [<ffffffff8100bb13>] ? ret_from_intr+0x0/0xa > [<ffffffff8103c74e>] ? 
print_oops_end_marker+0x1e/0x20 > [<ffffffff8100f38e>] oops_end+0x9e/0xb0 > [<ffffffff81025b9a>] no_context+0x15a/0x250 > [<ffffffff81025e2b>] __bad_area_nosemaphore+0xdb/0x1c0 > [<ffffffff813e89e9>] ? dev_hard_start_xmit+0x269/0x2f0 > [<ffffffff81025fae>] bad_area_nosemaphore+0xe/0x10 > [<ffffffff8102639f>] do_page_fault+0x17f/0x260 > [<ffffffff8147eadf>] page_fault+0x1f/0x30 > [<ffffffff814203cc>] ? tcp_xmit_retransmit_queue+0x8c/0x290 > [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 > [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 > [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 > [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90 > [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 > [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 > [<ffffffff81406b70>] ? ip_local_deliver_finish+0x0/0x120 > [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 > [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 > [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 > [<ffffffff8140701f>] ip_rcv+0x24f/0x350 > [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 > [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 > [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 > [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 > [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 > [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 > [<ffffffff81054ce1>] ? ktime_get+0x11/0x50 > [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 > [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 > [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 > [<ffffffff8100c27c>] call_softirq+0x1c/0x30 > [<ffffffff8100e04d>] do_softirq+0x3d/0x80 > [<ffffffff81041b0b>] irq_exit+0x7b/0x90 > [<ffffffff8100d613>] do_IRQ+0x73/0xe0 > [<ffffffff8100bb13>] ret_from_intr+0x0/0xa > <EOI> [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 > [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 > [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 > [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0 > [<ffffffff81468db6>] ? rest_init+0x66/0x70 > [<ffffffff816a082f>] ? 
start_kernel+0x2ef/0x340 > [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90 > [<ffffffff8169fe32>] ? x86_64_start_kernel+0xd2/0x100 > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Fw: [Bug 14470] New: freez in TCP stack
  2009-10-28 22:13 ` Andrew Morton
@ 2009-10-28 22:27   ` Denys Fedoryschenko
  2009-10-29  5:35   ` Eric Dumazet
  1 sibling, 0 replies; 17+ messages in thread
From: Denys Fedoryschenko @ 2009-10-28 22:27 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger

> > Twice a day on six separate machines.  That ain't no hardware glitch.
> >
> > Vaclav, are you able to say whether this is a regression?  Did those
> > machines run 2.6.30 (for example)?
> >
> > Thanks.

I had issues on Dell also. On one server it was fixed by a BIOS update; on
another only after tuning some voodoo settings in sysctl (I was in a hurry:
no redundancy for that server, and it was rebooting 1-3 times a day). It
happened on both 32-bit and 64-bit kernels (32-bit userspace), also under a
"heavy" TCP workload; both machines act as proxies. But my issue is probably
different: on both Dell servers I had bnx2 NICs with IPMI.

It was very weird: nmi_watchdog, panic on oops, softlockup detection,
deadlock detection, hung-task detection, the hangcheck timer - none of them
helped; only a hardware watchdog (IPMI or iTCO) was able to catch the hang
and reboot the server.

Because I didn't have anything useful to report (it is a remote server, and
netconsole didn't give anything), I didn't file a bugzilla report. Not sure
my post is useful in this case, but I am sharing the experience anyway.

> > We had to put serial console on these servers to catch the oops. Is there
> > anything else We can do to debug this?
> > The RIP is always the same:
> >
> > RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>]
> > tcp_xmit_retransmit_queue+0x8c/0x290
> >
> > rest of the oops always differs a litle ...
here is an example: > > > > RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] > > tcp_xmit_retransmit_queue+0x8c/0x290 > > RSP: 0018:ffffc90000003a40 EFLAGS: 00010246 > > RAX: ffff8807e7420678 RBX: ffff8807e74205c0 RCX: 0000000000000000 > > RDX: 000000004598a105 RSI: 0000000000000000 RDI: ffff8807e74205c0 > > RBP: ffffc90000003a80 R08: 0000000000000003 R09: 0000000000000000 > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > > R13: ffff8807e74205c0 R14: ffff8807e7420678 R15: 0000000000000000 > > FS: 0000000000000000(0000) GS:ffffc90000000000(0000) > > knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > > CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006f0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > Process swapper (pid: 0, threadinfo ffffffff81608000, task > > ffffffff81631440) Stack: > > ffffc90000003a60 0000000000000000 4598a105e74205c0 000000004598a101 > > <0> 000000000000050e ffff8807e74205c0 0000000000000003 0000000000000000 > > <0> ffffc90000003b40 ffffffff8141ae4a ffff8807e7420678 0000000000000000 > > Call Trace: > > <IRQ> > > [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 > > [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 > > [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 > > [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90 > > [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 > > [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 > > [<ffffffff81406b70>] ? 
ip_local_deliver_finish+0x0/0x120 > > [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 > > [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 > > [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 > > [<ffffffff8140701f>] ip_rcv+0x24f/0x350 > > [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 > > [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 > > [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 > > [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 > > [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 > > [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 > > [<ffffffff81054ce1>] ? ktime_get+0x11/0x50 > > [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 > > [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 > > [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 > > [<ffffffff8100c27c>] call_softirq+0x1c/0x30 > > [<ffffffff8100e04d>] do_softirq+0x3d/0x80 > > [<ffffffff81041b0b>] irq_exit+0x7b/0x90 > > [<ffffffff8100d613>] do_IRQ+0x73/0xe0 > > [<ffffffff8100bb13>] ret_from_intr+0x0/0xa > > <EOI> > > [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 > > [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 > > [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 > > [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0 > > [<ffffffff81468db6>] ? rest_init+0x66/0x70 > > [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340 > > [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90 > > [<ffffffff8169fe32>] ? 
x86_64_start_kernel+0xd2/0x100 > > Code: 00 eb 28 8b 83 d0 03 00 00 41 39 44 24 40 0f 89 00 01 00 00 41 0f > > b6 cd 41 bd 2f 00 00 00 83 e1 03 0f 84 fc 00 00 00 4d 8b 24 24 <49> 8b 04 > > 24 4d 39 f4 0f 18 08 0f 84 d9 00 00 00 4c 3b a3 b8 01 > > RIP [<ffffffff814203cc>] tcp_xmit_retransmit_queue+0x8c/0x290 > > RSP <ffffc90000003a40> > > CR2: 0000000000000000 > > ---[ end trace d97d99c9ae1d52cc ]--- > > Kernel panic - not syncing: Fatal exception in interrupt > > Pid: 0, comm: swapper Tainted: G D 2.6.31 #2 > > Call Trace: > > <IRQ> [<ffffffff8103cab0>] panic+0xa0/0x170 > > [<ffffffff8100bb13>] ? ret_from_intr+0x0/0xa > > [<ffffffff8103c74e>] ? print_oops_end_marker+0x1e/0x20 > > [<ffffffff8100f38e>] oops_end+0x9e/0xb0 > > [<ffffffff81025b9a>] no_context+0x15a/0x250 > > [<ffffffff81025e2b>] __bad_area_nosemaphore+0xdb/0x1c0 > > [<ffffffff813e89e9>] ? dev_hard_start_xmit+0x269/0x2f0 > > [<ffffffff81025fae>] bad_area_nosemaphore+0xe/0x10 > > [<ffffffff8102639f>] do_page_fault+0x17f/0x260 > > [<ffffffff8147eadf>] page_fault+0x1f/0x30 > > [<ffffffff814203cc>] ? tcp_xmit_retransmit_queue+0x8c/0x290 > > [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 > > [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 > > [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 > > [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90 > > [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 > > [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 > > [<ffffffff81406b70>] ? 
ip_local_deliver_finish+0x0/0x120 > > [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 > > [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 > > [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 > > [<ffffffff8140701f>] ip_rcv+0x24f/0x350 > > [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 > > [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 > > [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 > > [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 > > [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 > > [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 > > [<ffffffff81054ce1>] ? ktime_get+0x11/0x50 > > [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 > > [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 > > [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 > > [<ffffffff8100c27c>] call_softirq+0x1c/0x30 > > [<ffffffff8100e04d>] do_softirq+0x3d/0x80 > > [<ffffffff81041b0b>] irq_exit+0x7b/0x90 > > [<ffffffff8100d613>] do_IRQ+0x73/0xe0 > > [<ffffffff8100bb13>] ret_from_intr+0x0/0xa > > <EOI> [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 > > [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 > > [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 > > [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0 > > [<ffffffff81468db6>] ? rest_init+0x66/0x70 > > [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340 > > [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90 > > [<ffffffff8169fe32>] ? x86_64_start_kernel+0xd2/0x100 > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Fw: [Bug 14470] New: freez in TCP stack 2009-10-28 22:13 ` Andrew Morton 2009-10-28 22:27 ` Denys Fedoryschenko @ 2009-10-29 5:35 ` Eric Dumazet 2009-10-29 5:59 ` Eric Dumazet 2009-10-29 12:58 ` Fw: " Ilpo Järvinen 1 sibling, 2 replies; 17+ messages in thread From: Eric Dumazet @ 2009-10-29 5:35 UTC (permalink / raw) To: Andrew Morton; +Cc: Stephen Hemminger, netdev, kolo, bugzilla-daemon Andrew Morton a écrit : > On Mon, 26 Oct 2009 08:41:32 -0700 > Stephen Hemminger <shemminger@linux-foundation.org> wrote: > >> >> Begin forwarded message: >> >> Date: Mon, 26 Oct 2009 12:47:22 GMT >> From: bugzilla-daemon@bugzilla.kernel.org >> To: shemminger@linux-foundation.org >> Subject: [Bug 14470] New: freez in TCP stack >> > > Stephen, please retain the bugzilla and reporter email cc's when > forwarding a report to a mailing list. > > >> http://bugzilla.kernel.org/show_bug.cgi?id=14470 >> >> Summary: freez in TCP stack >> Product: Networking >> Version: 2.5 >> Kernel Version: 2.6.31 >> Platform: All >> OS/Version: Linux >> Tree: Mainline >> Status: NEW >> Severity: high >> Priority: P1 >> Component: IPV4 >> AssignedTo: shemminger@linux-foundation.org >> ReportedBy: kolo@albatani.cz >> Regression: No >> >> >> We are hiting kernel panics on Dell R610 servers with e1000e NICs; it apears >> usualy under a high network trafic ( around 100Mbit/s) but it is not a rule it >> has happened even on low trafic. >> >> Servers are used as reverse http proxy (varnish). >> >> On 6 equal servers this panic happens aprox 2 times a day depending on network >> load. Machine completly freezes till the management watchdog reboots. >> > > Twice a day on six separate machines. That ain't no hardware glitch. > > Vaclav, are you able to say whether this is a regression? Did those > machines run 2.6.30 (for example)? > > Thanks. > >> We had to put serial console on these servers to catch the oops. Is there >> anything else We can do to debug this? 
>> The RIP is always the same: >> >> RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] >> tcp_xmit_retransmit_queue+0x8c/0x290 >> >> rest of the oops always differs a litle ... here is an example: >> >> RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] >> tcp_xmit_retransmit_queue+0x8c/0x290 >> RSP: 0018:ffffc90000003a40 EFLAGS: 00010246 >> RAX: ffff8807e7420678 RBX: ffff8807e74205c0 RCX: 0000000000000000 >> RDX: 000000004598a105 RSI: 0000000000000000 RDI: ffff8807e74205c0 >> RBP: ffffc90000003a80 R08: 0000000000000003 R09: 0000000000000000 >> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 >> R13: ffff8807e74205c0 R14: ffff8807e7420678 R15: 0000000000000000 >> FS: 0000000000000000(0000) GS:ffffc90000000000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b >> CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006f0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> Process swapper (pid: 0, threadinfo ffffffff81608000, task ffffffff81631440) >> Stack: >> ffffc90000003a60 0000000000000000 4598a105e74205c0 000000004598a101 >> <0> 000000000000050e ffff8807e74205c0 0000000000000003 0000000000000000 >> <0> ffffc90000003b40 ffffffff8141ae4a ffff8807e7420678 0000000000000000 >> Call Trace: >> <IRQ> >> [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 >> [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 >> [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 >> [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90 >> [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 >> [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 >> [<ffffffff81406b70>] ? 
ip_local_deliver_finish+0x0/0x120 >> [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 >> [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 >> [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 >> [<ffffffff8140701f>] ip_rcv+0x24f/0x350 >> [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 >> [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 >> [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 >> [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 >> [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 >> [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 >> [<ffffffff81054ce1>] ? ktime_get+0x11/0x50 >> [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 >> [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 >> [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 >> [<ffffffff8100c27c>] call_softirq+0x1c/0x30 >> [<ffffffff8100e04d>] do_softirq+0x3d/0x80 >> [<ffffffff81041b0b>] irq_exit+0x7b/0x90 >> [<ffffffff8100d613>] do_IRQ+0x73/0xe0 >> [<ffffffff8100bb13>] ret_from_intr+0x0/0xa >> <EOI> >> [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 >> [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 >> [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 >> [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0 >> [<ffffffff81468db6>] ? rest_init+0x66/0x70 >> [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340 >> [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90 >> [<ffffffff8169fe32>] ? x86_64_start_kernel+0xd2/0x100 >> Code: 00 eb 28 8b 83 d0 03 00 00 41 39 44 24 40 0f 89 00 01 00 00 41 0f b6 cd >> 41 bd 2f 00 00 00 83 e1 03 0f 84 fc 00 00 00 4d 8b 24 24 <49> 8b 04 24 4d 39 f4 >> 0f 18 08 0f 84 d9 00 00 00 4c 3b a3 b8 01 >> RIP [<ffffffff814203cc>] tcp_xmit_retransmit_queue+0x8c/0x290 >> RSP <ffffc90000003a40> >> CR2: 0000000000000000 >> ---[ end trace d97d99c9ae1d52cc ]--- >> Kernel panic - not syncing: Fatal exception in interrupt >> Pid: 0, comm: swapper Tainted: G D 2.6.31 #2 >> Call Trace: >> <IRQ> [<ffffffff8103cab0>] panic+0xa0/0x170 >> [<ffffffff8100bb13>] ? 
ret_from_intr+0x0/0xa >> [<ffffffff8103c74e>] ? print_oops_end_marker+0x1e/0x20 >> [<ffffffff8100f38e>] oops_end+0x9e/0xb0 >> [<ffffffff81025b9a>] no_context+0x15a/0x250 >> [<ffffffff81025e2b>] __bad_area_nosemaphore+0xdb/0x1c0 >> [<ffffffff813e89e9>] ? dev_hard_start_xmit+0x269/0x2f0 >> [<ffffffff81025fae>] bad_area_nosemaphore+0xe/0x10 >> [<ffffffff8102639f>] do_page_fault+0x17f/0x260 >> [<ffffffff8147eadf>] page_fault+0x1f/0x30 >> [<ffffffff814203cc>] ? tcp_xmit_retransmit_queue+0x8c/0x290 >> [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 >> [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 >> [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 >> [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90 >> [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 >> [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 >> [<ffffffff81406b70>] ? ip_local_deliver_finish+0x0/0x120 >> [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 >> [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 >> [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 >> [<ffffffff8140701f>] ip_rcv+0x24f/0x350 >> [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 >> [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 >> [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 >> [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 >> [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 >> [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 >> [<ffffffff81054ce1>] ? ktime_get+0x11/0x50 >> [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 >> [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 >> [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 >> [<ffffffff8100c27c>] call_softirq+0x1c/0x30 >> [<ffffffff8100e04d>] do_softirq+0x3d/0x80 >> [<ffffffff81041b0b>] irq_exit+0x7b/0x90 >> [<ffffffff8100d613>] do_IRQ+0x73/0xe0 >> [<ffffffff8100bb13>] ret_from_intr+0x0/0xa >> <EOI> [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 >> [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 >> [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 >> [<ffffffff8100a104>] ? 
cpu_idle+0x94/0xd0
>> [<ffffffff81468db6>] ? rest_init+0x66/0x70
>> [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340
>> [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90
>> [<ffffffff8169fe32>] ? x86_64_start_kernel+0xd2/0x100
>> Code: 00 eb 28 8b 83 d0 03 00 00 41 39 44 24 40 0f 89 00 01 00 00 41 0f
>> b6 cd 41 bd 2f 00 00 00 83 e1 03 0f 84 fc 00 00 00 4d 8b 24 24 <49> 8b 04
>> 24 4d 39 f4 0f 18 08 0f 84 d9 00 00 00 4c 3b a3 b8 01

Annotated disassembly of the faulting code:

41 39 44 24 40       cmp    %eax,0x40(%r12)
0f 89 00 01 00 00    jns    ...
41 0f b6 cd          movzbl %r13b,%ecx
41 bd 2f 00 00 00    mov    $0x2f000000,%r13d
83 e1 03             and    $0x3,%ecx
0f 84 fc 00 00 00    je     ...
4d 8b 24 24          mov    (%r12),%r12      skb = skb->next
<49> 8b 04 24        mov    (%r12),%rax      << NULL POINTER dereference >>
4d 39 f4             cmp    %r14,%r12
0f 18 08             prefetcht0 (%rax)
0f 84 d9 00 00 00    je     ...
4c 3b a3 b8 01       cmp    ...

The crash is in:

void tcp_xmit_retransmit_queue(struct sock *sk)
{
	...
	<< HERE >>
	tcp_for_write_queue_from(skb, sk) {
	...
	}
}

Some skb in sk_write_queue has a NULL ->next pointer.

The strange thing is that R14 and RAX both hold ffff8807e7420678
(&sk->sk_write_queue). R14 is a stable value during the loop, while RAX is a
scratch register. I don't have a full disassembly of this function, but I
guess we had just entered the loop (otherwise RAX should be really different
at this point).

So maybe the list head itself is corrupted (sk->sk_write_queue.next == NULL),
or it is a retransmit_skb_hint problem (do we forget to set it to NULL in
some cases?).

^ permalink raw reply	[flat|nested] 17+ messages in thread
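[Editorial note: Eric's two hypotheses above - a corrupted circular list head,
or a stale cached hint pointing at an skb whose ->next was cleared when it
left the queue - can be sketched as a minimal user-space model. Everything
here is illustrative, not kernel code: mini_skb, mini_queue, queue_append,
walk_from and retrans_hint are invented names standing in for sk_buff, the
socket write queue, tcp_for_write_queue_from() and retransmit_skb_hint.]

```c
#include <stddef.h>

/* Miniature of a circular write queue: the sentinel head's next pointer
 * loops back to itself when the queue is empty, and a walk stops only
 * when it wraps around to the sentinel. */
struct mini_skb {
	struct mini_skb *next;
	int seq;                       /* stand-in for real skb payload */
};

struct mini_queue {
	struct mini_skb head;          /* sentinel: head.next == &head when empty */
	struct mini_skb *retrans_hint; /* cached walk start; can go stale */
};

static void queue_init(struct mini_queue *q)
{
	q->head.next = &q->head;
	q->retrans_hint = NULL;
}

static void queue_append(struct mini_queue *q, struct mini_skb *skb)
{
	struct mini_skb *p = &q->head; /* O(n) tail insert, fine for a sketch */

	while (p->next != &q->head)
		p = p->next;
	p->next = skb;
	skb->next = &q->head;
}

/* Walk from 'from' (or from the queue start when 'from' is NULL) until we
 * wrap to the sentinel.  Returns the number of skbs visited, or -1 when an
 * skb has a NULL ->next.  The kernel loop has no such check - the circular
 * list is supposed to be an invariant - so there it oopses exactly at the
 * equivalent of the 'skb->next' load. */
static int walk_from(struct mini_queue *q, struct mini_skb *from)
{
	struct mini_skb *skb = from ? from : q->head.next;
	int count = 0;

	for (; skb != &q->head; skb = skb->next) {
		if (skb->next == NULL)
			return -1; /* corruption: would be a NULL dereference */
		count++;
	}
	return count;
}
```

In this model the list itself can stay perfectly well-formed after an unlink,
yet a walk started from a cached hint that was never cleared still hits the
NULL ->next of the removed element - which is why "forgetting to set
retransmit_skb_hint to NULL in some cases" fits the observed oops.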
* Re: Fw: [Bug 14470] New: freez in TCP stack 2009-10-29 5:35 ` Eric Dumazet @ 2009-10-29 5:59 ` Eric Dumazet 2009-10-29 6:02 ` David Miller 2009-10-29 8:00 ` David Miller 2009-10-29 12:58 ` Fw: " Ilpo Järvinen 1 sibling, 2 replies; 17+ messages in thread From: Eric Dumazet @ 2009-10-29 5:59 UTC (permalink / raw) To: David S. Miller Cc: Andrew Morton, Stephen Hemminger, netdev, kolo, bugzilla-daemon Eric Dumazet a écrit : > Andrew Morton a écrit : >> On Mon, 26 Oct 2009 08:41:32 -0700 >> Stephen Hemminger <shemminger@linux-foundation.org> wrote: >> >>> Begin forwarded message: >>> >>> Date: Mon, 26 Oct 2009 12:47:22 GMT >>> From: bugzilla-daemon@bugzilla.kernel.org >>> To: shemminger@linux-foundation.org >>> Subject: [Bug 14470] New: freez in TCP stack >>> >> Stephen, please retain the bugzilla and reporter email cc's when >> forwarding a report to a mailing list. >> >> >>> http://bugzilla.kernel.org/show_bug.cgi?id=14470 >>> >>> Summary: freez in TCP stack >>> Product: Networking >>> Version: 2.5 >>> Kernel Version: 2.6.31 >>> Platform: All >>> OS/Version: Linux >>> Tree: Mainline >>> Status: NEW >>> Severity: high >>> Priority: P1 >>> Component: IPV4 >>> AssignedTo: shemminger@linux-foundation.org >>> ReportedBy: kolo@albatani.cz >>> Regression: No >>> >>> >>> We are hiting kernel panics on Dell R610 servers with e1000e NICs; it apears >>> usualy under a high network trafic ( around 100Mbit/s) but it is not a rule it >>> has happened even on low trafic. >>> >>> Servers are used as reverse http proxy (varnish). >>> >>> On 6 equal servers this panic happens aprox 2 times a day depending on network >>> load. Machine completly freezes till the management watchdog reboots. >>> >> Twice a day on six separate machines. That ain't no hardware glitch. >> >> Vaclav, are you able to say whether this is a regression? Did those >> machines run 2.6.30 (for example)? >> >> Thanks. >> >>> We had to put serial console on these servers to catch the oops. 
Is there >>> anything else We can do to debug this? >>> The RIP is always the same: >>> >>> RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] >>> tcp_xmit_retransmit_queue+0x8c/0x290 >>> >>> rest of the oops always differs a litle ... here is an example: >>> >>> RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] >>> tcp_xmit_retransmit_queue+0x8c/0x290 >>> RSP: 0018:ffffc90000003a40 EFLAGS: 00010246 >>> RAX: ffff8807e7420678 RBX: ffff8807e74205c0 RCX: 0000000000000000 >>> RDX: 000000004598a105 RSI: 0000000000000000 RDI: ffff8807e74205c0 >>> RBP: ffffc90000003a80 R08: 0000000000000003 R09: 0000000000000000 >>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 >>> R13: ffff8807e74205c0 R14: ffff8807e7420678 R15: 0000000000000000 >>> FS: 0000000000000000(0000) GS:ffffc90000000000(0000) knlGS:0000000000000000 >>> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b >>> CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006f0 >>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>> Process swapper (pid: 0, threadinfo ffffffff81608000, task ffffffff81631440) >>> Stack: >>> ffffc90000003a60 0000000000000000 4598a105e74205c0 000000004598a101 >>> <0> 000000000000050e ffff8807e74205c0 0000000000000003 0000000000000000 >>> <0> ffffc90000003b40 ffffffff8141ae4a ffff8807e7420678 0000000000000000 >>> Call Trace: >>> <IRQ> >>> [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 >>> [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 >>> [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 >>> [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90 >>> [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 >>> [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 >>> [<ffffffff81406b70>] ? 
ip_local_deliver_finish+0x0/0x120 >>> [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 >>> [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 >>> [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 >>> [<ffffffff8140701f>] ip_rcv+0x24f/0x350 >>> [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 >>> [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 >>> [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 >>> [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 >>> [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 >>> [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 >>> [<ffffffff81054ce1>] ? ktime_get+0x11/0x50 >>> [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 >>> [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 >>> [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 >>> [<ffffffff8100c27c>] call_softirq+0x1c/0x30 >>> [<ffffffff8100e04d>] do_softirq+0x3d/0x80 >>> [<ffffffff81041b0b>] irq_exit+0x7b/0x90 >>> [<ffffffff8100d613>] do_IRQ+0x73/0xe0 >>> [<ffffffff8100bb13>] ret_from_intr+0x0/0xa >>> <EOI> >>> [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 >>> [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 >>> [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 >>> [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0 >>> [<ffffffff81468db6>] ? rest_init+0x66/0x70 >>> [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340 >>> [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90 >>> [<ffffffff8169fe32>] ? 
x86_64_start_kernel+0xd2/0x100 >>> Code: 00 eb 28 8b 83 d0 03 00 00 41 39 44 24 40 0f 89 00 01 00 00 41 0f b6 cd >>> 41 bd 2f 00 00 00 83 e1 03 0f 84 fc 00 00 00 4d 8b 24 24 <49> 8b 04 24 4d 39 f4 >>> 0f 18 08 0f 84 d9 00 00 00 4c 3b a3 b8 01 >>> RIP [<ffffffff814203cc>] tcp_xmit_retransmit_queue+0x8c/0x290 >>> RSP <ffffc90000003a40> >>> CR2: 0000000000000000 >>> ---[ end trace d97d99c9ae1d52cc ]--- >>> Kernel panic - not syncing: Fatal exception in interrupt >>> Pid: 0, comm: swapper Tainted: G D 2.6.31 #2 >>> Call Trace: >>> <IRQ> [<ffffffff8103cab0>] panic+0xa0/0x170 >>> [<ffffffff8100bb13>] ? ret_from_intr+0x0/0xa >>> [<ffffffff8103c74e>] ? print_oops_end_marker+0x1e/0x20 >>> [<ffffffff8100f38e>] oops_end+0x9e/0xb0 >>> [<ffffffff81025b9a>] no_context+0x15a/0x250 >>> [<ffffffff81025e2b>] __bad_area_nosemaphore+0xdb/0x1c0 >>> [<ffffffff813e89e9>] ? dev_hard_start_xmit+0x269/0x2f0 >>> [<ffffffff81025fae>] bad_area_nosemaphore+0xe/0x10 >>> [<ffffffff8102639f>] do_page_fault+0x17f/0x260 >>> [<ffffffff8147eadf>] page_fault+0x1f/0x30 >>> [<ffffffff814203cc>] ? tcp_xmit_retransmit_queue+0x8c/0x290 >>> [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 >>> [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 >>> [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 >>> [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90 >>> [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 >>> [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 >>> [<ffffffff81406b70>] ? 
ip_local_deliver_finish+0x0/0x120 >>> [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 >>> [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 >>> [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 >>> [<ffffffff8140701f>] ip_rcv+0x24f/0x350 >>> [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 >>> [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 >>> [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 >>> [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 >>> [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 >>> [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 >>> [<ffffffff81054ce1>] ? ktime_get+0x11/0x50 >>> [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 >>> [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 >>> [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 >>> [<ffffffff8100c27c>] call_softirq+0x1c/0x30 >>> [<ffffffff8100e04d>] do_softirq+0x3d/0x80 >>> [<ffffffff81041b0b>] irq_exit+0x7b/0x90 >>> [<ffffffff8100d613>] do_IRQ+0x73/0xe0 >>> [<ffffffff8100bb13>] ret_from_intr+0x0/0xa >>> <EOI> [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 >>> [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 >>> [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 >>> [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0 >>> [<ffffffff81468db6>] ? rest_init+0x66/0x70 >>> [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340 >>> [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90 >>> [<ffffffff8169fe32>] ? x86_64_start_kernel+0xd2/0x100 >>> > > > Code: 00 eb 28 8b 83 d0 03 00 00 > 41 39 44 24 40 cmp %eax,0x40(%r12) > 0f 89 00 01 00 00 jns ... > 41 0f b6 cd movzbl %r13b,%ecx > 41 bd 2f 00 00 00 mov $0x2f000000,%r13d > 83 e1 03 and $0x3,%ecx > 0f 84 fc 00 00 00 je ... > 4d 8b 24 24 mov (%r12),%r12 skb = skb->next > <>49 8b 04 24 mov (%r12),%rax << NULL POINTER dereference >> > 4d 39 f4 cmp %r14,%r12 > 0f 18 08 prefetcht0 (%rax) > 0f 84 d9 00 00 00 je ... 
> 4c 3b a3 b8 01 cmp > > > crash is in > void tcp_xmit_retransmit_queue(struct sock *sk) > { > > << HERE >> tcp_for_write_queue_from(skb, sk) { > > } > > > Some skb in sk_write_queue has a NULL ->next pointer > > Strange thing is R14 and RAX =ffff8807e7420678 (&sk->sk_write_queue) > R14 is the stable value during the loop, while RAW is scratch register. > > I dont have full disassembly for this function, but I guess we just entered the loop > (or RAX should be really different at this point) > > So, maybe list head itself is corrupted (sk->sk_write_queue->next = NULL) > > or, retransmit_skb_hint problem ? (we forget to set it to NULL in some cases ?) > David, what do you think of following patch ? I wonder if we should reorganize code to add sanity checks in tcp_unlink_write_queue() that the skb we delete from queue is not still referenced. [PATCH] tcp: clear retrans hints in tcp_send_synack() There is a small possibility the skb we unlink from write queue is still referenced by retrans hints. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index fcd278a..b22a72d 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2201,6 +2201,7 @@ int tcp_send_synack(struct sock *sk) struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC); if (nskb == NULL) return -ENOMEM; + tcp_clear_all_retrans_hints(tcp_sk(sk)); tcp_unlink_write_queue(skb, sk); skb_header_release(nskb); __tcp_add_write_queue_head(sk, nskb); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Bug 14470] New: freez in TCP stack 2009-10-29 5:59 ` Eric Dumazet @ 2009-10-29 6:02 ` David Miller 2009-10-29 8:00 ` David Miller 1 sibling, 0 replies; 17+ messages in thread From: David Miller @ 2009-10-29 6:02 UTC (permalink / raw) To: eric.dumazet; +Cc: akpm, shemminger, netdev, kolo, bugzilla-daemon From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 29 Oct 2009 06:59:41 +0100 > David, what do you think of following patch ? > > I wonder if we should reorganize code to add sanity checks in tcp_unlink_write_queue() > that the skb we delete from queue is not still referenced. > > [PATCH] tcp: clear retrans hints in tcp_send_synack() > > There is a small possibility the skb we unlink from write queue > is still referenced by retrans hints. > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Yes, the first thing I thought of when I saw this crash was the hints. I'll think this over. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Bug 14470] New: freez in TCP stack 2009-10-29 5:59 ` Eric Dumazet 2009-10-29 6:02 ` David Miller @ 2009-10-29 8:00 ` David Miller 2009-11-26 21:54 ` Ilpo Järvinen 1 sibling, 1 reply; 17+ messages in thread From: David Miller @ 2009-10-29 8:00 UTC (permalink / raw) To: eric.dumazet; +Cc: akpm, shemminger, netdev, kolo, bugzilla-daemon From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 29 Oct 2009 06:59:41 +0100 > [PATCH] tcp: clear retrans hints in tcp_send_synack() > > There is a small possibility the skb we unlink from write queue > is still referenced by retrans hints. > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> So, this would only be true if we were dealing with a data packet here. We're not, this is a SYN+ACK which happens to be cloned in the write queue. The hint SKBs pointers can only point to real data packets. And we're only dealing with data packets once we enter established state, and when we enter established by definition we have unlinked and freed up any SYN and SYN+ACK SKBs in the write queue. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Bug 14470] New: freez in TCP stack 2009-10-29 8:00 ` David Miller @ 2009-11-26 21:54 ` Ilpo Järvinen 2009-11-26 23:37 ` David Miller ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Ilpo Järvinen @ 2009-11-26 21:54 UTC (permalink / raw) To: David Miller Cc: eric.dumazet, Andrew Morton, Stephen Hemminger, Netdev, kolo, bugzilla-daemon [-- Attachment #1: Type: TEXT/PLAIN, Size: 1981 bytes --] On Thu, 29 Oct 2009, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Thu, 29 Oct 2009 06:59:41 +0100 > > > [PATCH] tcp: clear retrans hints in tcp_send_synack() > > > > There is a small possibility the skb we unlink from write queue > > is still referenced by retrans hints. > > > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > So, this would only be true if we were dealing with a data > packet here. We're not, this is a SYN+ACK which happens to > be cloned in the write queue. > > The hint SKBs pointers can only point to real data packets. > > And we're only dealing with data packets once we enter established > state, and when we enter established by definition we have unlinked > and freed up any SYN and SYN+ACK SKBs in the write queue. How about this then... Does the original reporter have NFS in use? [PATCH] tcp: clear hints to avoid a stale one (nfs only affected?) Eric Dumazet mentioned in the context of another problem: "Well, it seems NFS reuses its socket, so maybe we miss some cleaning as spotted in this old patch" I've not checked under which conditions that actually happens but if true, we need to make sure we don't accidentally leave stale hints behind when the write queue had to be purged (whether reusing with NFS can actually happen if purging took place is something I'm not sure of). ...At least it compiles. 
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- include/net/tcp.h | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 03a49c7..6b13faa 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1228,6 +1228,7 @@ static inline void tcp_write_queue_purge(struct sock *sk) while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL) sk_wmem_free_skb(sk, skb); sk_mem_reclaim(sk); + tcp_clear_all_retrans_hints(tcp_sk(sk)); } static inline struct sk_buff *tcp_write_queue_head(struct sock *sk) -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Bug 14470] New: freez in TCP stack 2009-11-26 21:54 ` Ilpo Järvinen @ 2009-11-26 23:37 ` David Miller 2009-11-27 6:17 ` Eric Dumazet 2009-12-02 23:10 ` David Miller 2009-12-03 6:24 ` David Miller 2 siblings, 1 reply; 17+ messages in thread From: David Miller @ 2009-11-26 23:37 UTC (permalink / raw) To: ilpo.jarvinen Cc: eric.dumazet, akpm, shemminger, netdev, kolo, bugzilla-daemon From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Thu, 26 Nov 2009 23:54:53 +0200 (EET) > How about this then... Does the original reporter have NFS in use? > > [PATCH] tcp: clear hints to avoid a stale one (nfs only affected?) I must be getting old and senile, but I specifically remembered that we prevented a socket from ever being bound again once it has been bound one time specifically so we didn't have to deal with issues like this. I really don't think it's valid for NFS to reuse the socket structure like this over and over again. And that's why only NFS can reproduce this, the interfaces provided userland can't actually go through this sequence after a socket goes down one time all the way to close. Do we really want to audit each and every odd member of the socket structure from the generic portion all the way down to INET and TCP specifics to figure out what needs to get zero'd out? So much relies upon the one-time full zero out during sock allocation. Let's fix NFS instead. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Bug 14470] New: freez in TCP stack 2009-11-26 23:37 ` David Miller @ 2009-11-27 6:17 ` Eric Dumazet 0 siblings, 0 replies; 17+ messages in thread From: Eric Dumazet @ 2009-11-27 6:17 UTC (permalink / raw) To: David Miller Cc: ilpo.jarvinen, akpm, shemminger, netdev, kolo, bugzilla-daemon, Trond Myklebust David Miller a écrit : > I must be getting old and senile, but I specifically remembered that > we prevented a socket from ever being bound again once it has been > bound one time specifically so we didn't have to deal with issues > like this. > > I really don't think it's valid for NFS to reuse the socket structure > like this over and over again. And that's why only NFS can reproduce > this, the interfaces provided userland can't actually go through this > sequence after a socket goes down one time all the way to close. > > Do we really want to audit each and every odd member of the socket > structure from the generic portion all the way down to INET and > TCP specifics to figure out what needs to get zero'd out? An audit is always welcomed, we might find bugs :) > > So much relies upon the one-time full zero out during sock allocation. > > Let's fix NFS instead. bugzilla reference : http://bugzilla.kernel.org/show_bug.cgi?id=14580 Trond said : NFS MUST reuse the same port because on most servers, the replay cache is keyed to the port number. In other words, when we replay an RPC call, the server will only recognise it as a replay if it originates from the same port. See http://www.connectathon.org/talks96/werme1.html Please note the socket stays bound to a given local port. We want to connect() it to a possible other target, that's all. In NFS case 'other target' is in fact the same target, but this is a special case of a more general one. Hmm... if an application wants to keep a local port for itself (not allowing another one to get this (ephemeral ?) port during the close()/socket()/bind() window), this is the only way. 
TCP state machine allows this IMHO. google for "tcp AF_UNSPEC connect" to find many references and man pages for this stuff. http://kerneltrap.org/Linux/Connect_Specification_versus_Man_Page How other Unixes / OS handle this ? How many applications use this trick ? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Bug 14470] New: freez in TCP stack 2009-11-26 21:54 ` Ilpo Järvinen 2009-11-26 23:37 ` David Miller @ 2009-12-02 23:10 ` David Miller 2009-12-03 6:24 ` David Miller 2 siblings, 0 replies; 17+ messages in thread From: David Miller @ 2009-12-02 23:10 UTC (permalink / raw) To: ilpo.jarvinen Cc: eric.dumazet, akpm, shemminger, netdev, kolo, bugzilla-daemon From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Thu, 26 Nov 2009 23:54:53 +0200 (EET) > [PATCH] tcp: clear hints to avoid a stale one (nfs only affected?) > > Eric Dumazet mentioned in the context of another problem: > > "Well, it seems NFS reuses its socket, so maybe we miss some > cleaning as spotted in this old patch" > > I've not checked under which conditions that actually happens but > if true, we need to make sure we don't accidentally leave stale > hints behind when the write queue had to be purged (whether reusing > with NFS can actually happen if purging took place is something I'm > not sure of). > > ...At least it compiles. > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> I think this is a good safety net even if it doesn't fix a specific problem. But I'd like to see this patch tested by the person seeing the problem so we can know whether that is fixed or not. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Bug 14470] New: freez in TCP stack 2009-11-26 21:54 ` Ilpo Järvinen 2009-11-26 23:37 ` David Miller 2009-12-02 23:10 ` David Miller @ 2009-12-03 6:24 ` David Miller 2010-03-18 21:04 ` Andrew Morton 2 siblings, 1 reply; 17+ messages in thread From: David Miller @ 2009-12-03 6:24 UTC (permalink / raw) To: ilpo.jarvinen Cc: eric.dumazet, akpm, shemminger, netdev, kolo, bugzilla-daemon From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Thu, 26 Nov 2009 23:54:53 +0200 (EET) > [PATCH] tcp: clear hints to avoid a stale one (nfs only affected?) Ok, since Linus just released 2.6.32 I'm tossing this into net-next-2.6 so it gets wider exposure. I still want to see test results from the bug reporter, and if it fixes things we can toss this into -stable too. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Bug 14470] New: freez in TCP stack 2009-12-03 6:24 ` David Miller @ 2010-03-18 21:04 ` Andrew Morton 2010-03-19 15:52 ` Ilpo Järvinen 0 siblings, 1 reply; 17+ messages in thread From: Andrew Morton @ 2010-03-18 21:04 UTC (permalink / raw) To: David Miller Cc: ilpo.jarvinen, eric.dumazet, shemminger, netdev, kolo, bugzilla-daemon On Wed, 02 Dec 2009 22:24:46 -0800 (PST) David Miller <davem@davemloft.net> wrote: > From: "Ilpo J__rvinen" <ilpo.jarvinen@helsinki.fi> > Date: Thu, 26 Nov 2009 23:54:53 +0200 (EET) > > > [PATCH] tcp: clear hints to avoid a stale one (nfs only affected?) > > Ok, since Linus just released 2.6.32 I'm tossing this into net-next-2.6 > so it gets wider exposure. > > I still want to see test results from the bug reporter, and if it fixes > things we can toss this into -stable too. Despite my request to take this to email, quite a few people have been jumping onto this report via bugzilla: http://bugzilla.kernel.org/show_bug.cgi?id=14470 Bit of a pita, but it'd be worth someone taking a look to ensure that we're all talking about the same bug. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Bug 14470] New: freez in TCP stack 2010-03-18 21:04 ` Andrew Morton @ 2010-03-19 15:52 ` Ilpo Järvinen 0 siblings, 0 replies; 17+ messages in thread From: Ilpo Järvinen @ 2010-03-19 15:52 UTC (permalink / raw) To: Andrew Morton Cc: David Miller, eric.dumazet, Stephen Hemminger, Netdev, kolo, bugzilla-daemon On Thu, 18 Mar 2010, Andrew Morton wrote: > On Wed, 02 Dec 2009 22:24:46 -0800 (PST) > David Miller <davem@davemloft.net> wrote: > > > From: "Ilpo J__rvinen" <ilpo.jarvinen@helsinki.fi> > > Date: Thu, 26 Nov 2009 23:54:53 +0200 (EET) > > > > > [PATCH] tcp: clear hints to avoid a stale one (nfs only affected?) > > > > Ok, since Linus just released 2.6.32 I'm tossing this into net-next-2.6 > > so it gets wider exposure. > > > > I still want to see test results from the bug reporter, and if it fixes > > things we can toss this into -stable too. > > Despite my request to take this to email, quite a few people have been > jumping onto this report via bugzilla: > http://bugzilla.kernel.org/show_bug.cgi?id=14470 > > Bit of a pita, but it'd be worth someone taking a look to ensure that > we're all talking about the same bug. Could one try with this debug patch: http://marc.info/?l=linux-kernel&m=126624014117610&w=2 It should prevent crashing too. -- i. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Fw: [Bug 14470] New: freez in TCP stack 2009-10-29 5:35 ` Eric Dumazet 2009-10-29 5:59 ` Eric Dumazet @ 2009-10-29 12:58 ` Ilpo Järvinen 2009-10-29 14:08 ` Eric Dumazet 2009-10-30 20:18 ` Herbert Xu 1 sibling, 2 replies; 17+ messages in thread From: Ilpo Järvinen @ 2009-10-29 12:58 UTC (permalink / raw) To: Eric Dumazet, David Miller Cc: Andrew Morton, Stephen Hemminger, Netdev, kolo, bugzilla-daemon [-- Attachment #1: Type: TEXT/PLAIN, Size: 10018 bytes --] On Thu, 29 Oct 2009, Eric Dumazet wrote: > Andrew Morton a écrit : > > On Mon, 26 Oct 2009 08:41:32 -0700 > > Stephen Hemminger <shemminger@linux-foundation.org> wrote: > > > >> > >> Begin forwarded message: > >> > >> Date: Mon, 26 Oct 2009 12:47:22 GMT > >> From: bugzilla-daemon@bugzilla.kernel.org > >> To: shemminger@linux-foundation.org > >> Subject: [Bug 14470] New: freez in TCP stack > >> > > > > Stephen, please retain the bugzilla and reporter email cc's when > > forwarding a report to a mailing list. > > > > > >> http://bugzilla.kernel.org/show_bug.cgi?id=14470 > >> > >> Summary: freez in TCP stack > >> Product: Networking > >> Version: 2.5 > >> Kernel Version: 2.6.31 > >> Platform: All > >> OS/Version: Linux > >> Tree: Mainline > >> Status: NEW > >> Severity: high > >> Priority: P1 > >> Component: IPV4 > >> AssignedTo: shemminger@linux-foundation.org > >> ReportedBy: kolo@albatani.cz > >> Regression: No > >> > >> > >> We are hiting kernel panics on Dell R610 servers with e1000e NICs; it apears > >> usualy under a high network trafic ( around 100Mbit/s) but it is not a rule it > >> has happened even on low trafic. > >> > >> Servers are used as reverse http proxy (varnish). > >> > >> On 6 equal servers this panic happens aprox 2 times a day depending on network > >> load. Machine completly freezes till the management watchdog reboots. > >> > > > > Twice a day on six separate machines. That ain't no hardware glitch. > > > > Vaclav, are you able to say whether this is a regression? 
Did those > > machines run 2.6.30 (for example)? > > > > Thanks. > > > >> We had to put serial console on these servers to catch the oops. Is there > >> anything else We can do to debug this? > >> The RIP is always the same: > >> > >> RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] > >> tcp_xmit_retransmit_queue+0x8c/0x290 > >> > >> rest of the oops always differs a litle ... here is an example: > >> > >> RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] > >> tcp_xmit_retransmit_queue+0x8c/0x290 > >> RSP: 0018:ffffc90000003a40 EFLAGS: 00010246 > >> RAX: ffff8807e7420678 RBX: ffff8807e74205c0 RCX: 0000000000000000 > >> RDX: 000000004598a105 RSI: 0000000000000000 RDI: ffff8807e74205c0 > >> RBP: ffffc90000003a80 R08: 0000000000000003 R09: 0000000000000000 > >> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > >> R13: ffff8807e74205c0 R14: ffff8807e7420678 R15: 0000000000000000 > >> FS: 0000000000000000(0000) GS:ffffc90000000000(0000) knlGS:0000000000000000 > >> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > >> CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006f0 > >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > >> Process swapper (pid: 0, threadinfo ffffffff81608000, task ffffffff81631440) > >> Stack: > >> ffffc90000003a60 0000000000000000 4598a105e74205c0 000000004598a101 > >> <0> 000000000000050e ffff8807e74205c0 0000000000000003 0000000000000000 > >> <0> ffffc90000003b40 ffffffff8141ae4a ffff8807e7420678 0000000000000000 > >> Call Trace: > >> <IRQ> > >> [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 > >> [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 > >> [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 > >> [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90 > >> [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 > >> [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 > >> [<ffffffff81406b70>] ? 
ip_local_deliver_finish+0x0/0x120 > >> [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 > >> [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 > >> [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 > >> [<ffffffff8140701f>] ip_rcv+0x24f/0x350 > >> [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 > >> [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 > >> [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 > >> [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 > >> [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 > >> [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 > >> [<ffffffff81054ce1>] ? ktime_get+0x11/0x50 > >> [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 > >> [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 > >> [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 > >> [<ffffffff8100c27c>] call_softirq+0x1c/0x30 > >> [<ffffffff8100e04d>] do_softirq+0x3d/0x80 > >> [<ffffffff81041b0b>] irq_exit+0x7b/0x90 > >> [<ffffffff8100d613>] do_IRQ+0x73/0xe0 > >> [<ffffffff8100bb13>] ret_from_intr+0x0/0xa > >> <EOI> > >> [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 > >> [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 > >> [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 > >> [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0 > >> [<ffffffff81468db6>] ? rest_init+0x66/0x70 > >> [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340 > >> [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90 > >> [<ffffffff8169fe32>] ? 
x86_64_start_kernel+0xd2/0x100 > >> Code: 00 eb 28 8b 83 d0 03 00 00 41 39 44 24 40 0f 89 00 01 00 00 41 0f b6 cd > >> 41 bd 2f 00 00 00 83 e1 03 0f 84 fc 00 00 00 4d 8b 24 24 <49> 8b 04 24 4d 39 f4 > >> 0f 18 08 0f 84 d9 00 00 00 4c 3b a3 b8 01 > >> RIP [<ffffffff814203cc>] tcp_xmit_retransmit_queue+0x8c/0x290 > >> RSP <ffffc90000003a40> > >> CR2: 0000000000000000 > >> ---[ end trace d97d99c9ae1d52cc ]--- > >> Kernel panic - not syncing: Fatal exception in interrupt > >> Pid: 0, comm: swapper Tainted: G D 2.6.31 #2 > >> Call Trace: > >> <IRQ> [<ffffffff8103cab0>] panic+0xa0/0x170 > >> [<ffffffff8100bb13>] ? ret_from_intr+0x0/0xa > >> [<ffffffff8103c74e>] ? print_oops_end_marker+0x1e/0x20 > >> [<ffffffff8100f38e>] oops_end+0x9e/0xb0 > >> [<ffffffff81025b9a>] no_context+0x15a/0x250 > >> [<ffffffff81025e2b>] __bad_area_nosemaphore+0xdb/0x1c0 > >> [<ffffffff813e89e9>] ? dev_hard_start_xmit+0x269/0x2f0 > >> [<ffffffff81025fae>] bad_area_nosemaphore+0xe/0x10 > >> [<ffffffff8102639f>] do_page_fault+0x17f/0x260 > >> [<ffffffff8147eadf>] page_fault+0x1f/0x30 > >> [<ffffffff814203cc>] ? tcp_xmit_retransmit_queue+0x8c/0x290 > >> [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 > >> [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 > >> [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 > >> [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90 > >> [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 > >> [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 > >> [<ffffffff81406b70>] ? 
ip_local_deliver_finish+0x0/0x120 > >> [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 > >> [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 > >> [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 > >> [<ffffffff8140701f>] ip_rcv+0x24f/0x350 > >> [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 > >> [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 > >> [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 > >> [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 > >> [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 > >> [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 > >> [<ffffffff81054ce1>] ? ktime_get+0x11/0x50 > >> [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 > >> [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 > >> [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 > >> [<ffffffff8100c27c>] call_softirq+0x1c/0x30 > >> [<ffffffff8100e04d>] do_softirq+0x3d/0x80 > >> [<ffffffff81041b0b>] irq_exit+0x7b/0x90 > >> [<ffffffff8100d613>] do_IRQ+0x73/0xe0 > >> [<ffffffff8100bb13>] ret_from_intr+0x0/0xa > >> <EOI> [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 > >> [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 > >> [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 > >> [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0 > >> [<ffffffff81468db6>] ? rest_init+0x66/0x70 > >> [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340 > >> [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90 > >> [<ffffffff8169fe32>] ? x86_64_start_kernel+0xd2/0x100 > >> > > > Code: 00 eb 28 8b 83 d0 03 00 00 > 41 39 44 24 40 cmp %eax,0x40(%r12) > 0f 89 00 01 00 00 jns ... > 41 0f b6 cd movzbl %r13b,%ecx > 41 bd 2f 00 00 00 mov $0x2f000000,%r13d > 83 e1 03 and $0x3,%ecx > 0f 84 fc 00 00 00 je ... > 4d 8b 24 24 mov (%r12),%r12 skb = skb->next > <>49 8b 04 24 mov (%r12),%rax << NULL POINTER dereference >> > 4d 39 f4 cmp %r14,%r12 > 0f 18 08 prefetcht0 (%rax) > 0f 84 d9 00 00 00 je ... 
> 4c 3b a3 b8 01 cmp > > > crash is in > void tcp_xmit_retransmit_queue(struct sock *sk) > { > > << HERE >> tcp_for_write_queue_from(skb, sk) { > > } > > > Some skb in sk_write_queue has a NULL ->next pointer > > Strange thing is R14 and RAX =ffff8807e7420678 (&sk->sk_write_queue) > R14 is the stable value during the loop, while RAW is scratch register. > > I dont have full disassembly for this function, but I guess we just > entered the loop (or RAX should be really different at this point) > > So, maybe list head itself is corrupted (sk->sk_write_queue->next = NULL) One more alternative along those lines could perhaps be: we enter with an empty write_queue there and with the hint being NULL, so we take the else branch... and skb_peek then gives us the NULL ptr. However, I cannot see how this could happen, as all branches trap with a return before they reach tcp_xmit_retransmit_queue. > or, retransmit_skb_hint problem ? (we forget to set it to NULL in some > cases ?) ...I don't understand how a stale reference would yield a consistent NULL ptr crash there rather than hard-to-track corruption most of the time and random crashes here and there. Or perhaps we were just very lucky to immediately get only those reports which point to the right track :-). ...I tried to find what is wrong with it but sadly came up with only ah-this-is-it-oh-wait-it's-ok type of things. -- i. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Fw: [Bug 14470] New: freez in TCP stack 2009-10-29 12:58 ` Fw: " Ilpo Järvinen @ 2009-10-29 14:08 ` Eric Dumazet 2009-10-30 20:18 ` Herbert Xu 1 sibling, 0 replies; 17+ messages in thread From: Eric Dumazet @ 2009-10-29 14:08 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, Andrew Morton, Stephen Hemminger, Netdev, kolo, bugzilla-daemon > ...I don't understand how a stale reference would yield to a consistent > NULL ptr crash there rather than hard to track corruption for most of the > times and random crashes then here and there. Or perhaps we were just very > lucky to immediately get only those reports which point out to the right > track :-). > When a skb is freed and re-allocated, we clear most of its fields in __alloc_skb(): memset(skb, 0, offsetof(struct sk_buff, tail)); Then if this skb is freed again, not queued anywhere, its skb->next stays NULL. So if we have a stale reference to a freed skb, we can: - Get a NULL pointer, or a poisoned value (if SLUB_DEBUG) Here is a debug patch to check we don't have stale pointers; maybe this will help? [PATCH] tcp: check stale pointers in tcp_unlink_write_queue() In order to track some obscure bug, we check in tcp_unlink_write_queue() that we don't have stale references to the unlinked skb Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- include/net/tcp.h | 4 ++++ net/ipv4/tcp.c | 2 +- net/ipv4/tcp_input.c | 4 ++-- net/ipv4/tcp_output.c | 8 ++++---- 4 files changed, 11 insertions(+), 7 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 740d09b..09da342 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1357,6 +1357,10 @@ static inline void tcp_insert_write_queue_before(struct sk_buff *new, static inline void tcp_unlink_write_queue(struct sk_buff *skb, struct sock *sk) { + WARN_ON(skb == tcp_sk(sk)->retransmit_skb_hint); + WARN_ON(skb == tcp_sk(sk)->lost_skb_hint); + WARN_ON(skb == tcp_sk(sk)->scoreboard_skb_hint); + WARN_ON(skb == sk->sk_send_head); __skb_unlink(skb, 
&sk->sk_write_queue); } diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index e0cfa63..328bdb1 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1102,11 +1102,11 @@ out: do_fault: if (!skb->len) { - tcp_unlink_write_queue(skb, sk); /* It is the one place in all of TCP, except connection * reset, where we can be unlinking the send_head. */ tcp_check_send_head(sk, skb); + tcp_unlink_write_queue(skb, sk); sk_wmem_free_skb(sk, skb); } diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index ba0eab6..fccc6e9 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3251,13 +3251,13 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets, if (!fully_acked) break; - tcp_unlink_write_queue(skb, sk); - sk_wmem_free_skb(sk, skb); tp->scoreboard_skb_hint = NULL; if (skb == tp->retransmit_skb_hint) tp->retransmit_skb_hint = NULL; if (skb == tp->lost_skb_hint) tp->lost_skb_hint = NULL; + tcp_unlink_write_queue(skb, sk); + sk_wmem_free_skb(sk, skb); } if (likely(between(tp->snd_up, prior_snd_una, tp->snd_una))) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 616c686..196171d 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1791,6 +1791,10 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb) tcp_highest_sack_combine(sk, next_skb, skb); + /* changed transmit queue under us so clear hints */ + tcp_clear_retrans_hints_partial(tp); + if (next_skb == tp->retransmit_skb_hint) + tp->retransmit_skb_hint = skb; tcp_unlink_write_queue(next_skb, sk); skb_copy_from_linear_data(next_skb, skb_put(skb, next_skb_size), @@ -1813,10 +1817,6 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb) */ TCP_SKB_CB(skb)->sacked |= TCP_SKB_CB(next_skb)->sacked & TCPCB_EVER_RETRANS; - /* changed transmit queue under us so clear hints */ - tcp_clear_retrans_hints_partial(tp); - if (next_skb == tp->retransmit_skb_hint) - tp->retransmit_skb_hint = skb; tcp_adjust_pcount(sk, next_skb, tcp_skb_pcount(next_skb)); 
* Re: Fw: [Bug 14470] New: freez in TCP stack
  2009-10-29 12:58 ` Fw: " Ilpo Järvinen
  2009-10-29 14:08 ` Eric Dumazet
@ 2009-10-30 20:18 ` Herbert Xu
  1 sibling, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2009-10-30 20:18 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: eric.dumazet, davem, akpm, shemminger, netdev, kolo, bugzilla-daemon

Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
>
> One more alternative along those lines could perhaps be:
>
> We enter with an empty write_queue there and with the hint being NULL,
> so we take the else branch... and skb_peek then gives us the NULL ptr.
> However, I cannot see how this could happen, as all branches return
> before they reach tcp_xmit_retransmit_queue.

Why don't we add a WARN_ON in there and see if it triggers?

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
end of thread, other threads: [~2010-03-19 15:52 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-26 15:41 Fw: [Bug 14470] New: freez in TCP stack Stephen Hemminger
2009-10-28 22:13 ` Andrew Morton
2009-10-28 22:27   ` Denys Fedoryschenko
2009-10-29  5:35   ` Eric Dumazet
2009-10-29  5:59     ` Eric Dumazet
2009-10-29  6:02       ` David Miller
2009-10-29  8:00         ` David Miller
2009-11-26 21:54           ` Ilpo Järvinen
2009-11-26 23:37             ` David Miller
2009-11-27  6:17               ` Eric Dumazet
2009-12-02 23:10                 ` David Miller
2009-12-03  6:24                   ` David Miller
2010-03-18 21:04                     ` Andrew Morton
2010-03-19 15:52                       ` Ilpo Järvinen
2009-10-29 12:58       ` Fw: " Ilpo Järvinen
2009-10-29 14:08         ` Eric Dumazet
2009-10-30 20:18         ` Herbert Xu