* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
[not found] <D2CA09D3-A93C-48CF-A23B-DC1B76D66818@rcs-rds.ro>
@ 2013-05-21 21:01 ` Eric Dumazet
2013-05-22 8:36 ` Daniel Petre
0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2013-05-21 21:01 UTC (permalink / raw)
To: Daniel Petre; +Cc: netdev
On Tue, 2013-05-21 at 20:53 +0300, Daniel Petre wrote:
> Hello,
> This mini patch mitigates the kernel panic with "stack-protector: Kernel stack
> is corrupted" seen when GRE tunnels are in use and a network interface flaps:
> ip_gre sends an ICMP destination unreachable that re-enters the IP stack and most probably corrupts it.
>
> Current ip_gre panics on both 3.7.10 and 3.8.13 a few seconds after the ether interface goes down;
> I managed to look at the vmcore with the crash utility and found icmp_send each time:
>
> crash> bt
>
> PID: 0 TASK: ffffffff81813420 CPU: 0 COMMAND: "swapper/0"
> #0 [ffff88003fc03798] machine_kexec at ffffffff81027430
> #1 [ffff88003fc037e8] crash_kexec at ffffffff8107da80
> #2 [ffff88003fc038b8] panic at ffffffff81540026
> #3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
> #4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
> #5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
> #6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
> #7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
> #8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
> #9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
> #10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
> #11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596
> #12 [ffff88003fc03ce8] __netif_receive_skb at ffffffff8146ed13
> #13 [ffff88003fc03d88] napi_gro_receive at ffffffff8146fc50
> #14 [ffff88003fc03da8] e1000_clean_rx_irq at ffffffff813bc67b
> #15 [ffff88003fc03e48] e1000e_poll at ffffffff813c3a20
> #16 [ffff88003fc03e98] net_rx_action at ffffffff8146f796
> #17 [ffff88003fc03ee8] __do_softirq at ffffffff8103ebb9
> #18 [ffff88003fc03f38] call_softirq at ffffffff8154444c
> #19 [ffff88003fc03f50] do_softirq at ffffffff810047dd
> #20 [ffff88003fc03f80] do_IRQ at ffffffff81003f6c
> --- <IRQ stack> ---
> #21 [ffffffff81801ea8] ret_from_intr at ffffffff81542c2a
> [exception RIP: mwait_idle+95]
> RIP: ffffffff8100ad8f RSP: ffffffff81801f50 RFLAGS: 00000246
> RAX: 0000000000000000 RBX: ffffffff8154194e RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffffffff81801fd8 RDI: ffff88003fc0d840
> RBP: ffffffff8185be80 R8: 0000000000000000 R9: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffffff81813420 R14: ffff88003fc11000 R15: ffffffff81813420
> ORIG_RAX: ffffffffffffff1e CS: 0010 SS: 0018
> #22 [ffffffff81801f50] cpu_idle at ffffffff8100b126
>
> crash> log
>
> [..]
>
> [ 6772.560124] e1000e: eth3 NIC Link is Down
> [ 6786.680876] e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow
> Control: Rx
> [ 6928.050119] e1000e: eth3 NIC Link is Down
> [ 6945.240875] e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow
> Control: Rx
> [ 6945.738103] Kernel panic - not syncing: stack-protector: Kernel stack
> is corrupted in: ffffffff814d5fec
>
> [ 6945.738165] Pid: 0, comm: swapper/0 Not tainted 3.8.13 #3
> [ 6945.738189] Call Trace:
> [ 6945.738212] <IRQ> [<ffffffff8154001f>] ? panic+0xbf/0x1c9
> [ 6945.738245] [<ffffffff814ab505>] ? ip_finish_output+0x255/0x3e0
> [ 6945.738271] [<ffffffff814d5fec>] ? icmp_send+0x6bc/0x730
> [ 6945.738296] [<ffffffff81037f77>] ? __stack_chk_fail+0x17/0x30
> [ 6945.738320] [<ffffffff814d5fec>] ? icmp_send+0x6bc/0x730
> [ 6945.738344] [<ffffffff81009470>] ? nommu_map_sg+0x110/0x110
> [ 6945.738369] [<ffffffff815428f9>] ? _raw_spin_lock_bh+0x9/0x30
> [ 6945.738393] [<ffffffff814a1795>] ? ipv4_link_failure+0x15/0x60
> [ 6945.738418] [<ffffffff814e7965>] ? ipgre_tunnel_xmit+0x7f5/0x9f0
> [ 6945.738443] [<ffffffff8146e032>] ? dev_hard_start_xmit+0x102/0x490
> [ 6945.738470] [<ffffffff81487d66>] ? sch_direct_xmit+0x106/0x1e0
> [ 6945.738494] [<ffffffff81487efd>] ? __qdisc_run+0xbd/0x150
> [ 6945.738518] [<ffffffff8146e5a7>] ? dev_queue_xmit+0x1e7/0x3a0
> [ 6945.738542] [<ffffffff814ab596>] ? ip_finish_output+0x2e6/0x3e0
> [ 6945.738567] [<ffffffff8146ed13>] ? __netif_receive_skb+0x5b3/0x7c0
> [ 6945.738591] [<ffffffff8146f114>] ? netif_receive_skb+0x24/0x80
> [ 6945.738616] [<ffffffff8146fc50>] ? napi_gro_receive+0x110/0x140
> [ 6945.738642] [<ffffffff813bc67b>] ? e1000_clean_rx_irq+0x29b/0x490
> [ 6945.738667] [<ffffffff813c3a20>] ? e1000e_poll+0x90/0x3a0
> [ 6945.738690] [<ffffffff8146f796>] ? net_rx_action+0xc6/0x1e0
> [ 6945.738715] [<ffffffff8103ebb9>] ? __do_softirq+0xa9/0x170
> [ 6945.738739] [<ffffffff8154444c>] ? call_softirq+0x1c/0x30
> [ 6945.738762] [<ffffffff810047dd>] ? do_softirq+0x4d/0x80
> [ 6945.738786] [<ffffffff8103e9cd>] ? irq_exit+0x7d/0x90
> [ 6945.738808] [<ffffffff81003f6c>] ? do_IRQ+0x5c/0xd0
> [ 6945.738831] [<ffffffff81542c2a>] ? common_interrupt+0x6a/0x6a
> [ 6945.738854] <EOI> [<ffffffff8154194e>] ? __schedule+0x26e/0x5b0
> [ 6945.738884] [<ffffffff8100ad8f>] ? mwait_idle+0x5f/0x70
> [ 6945.738907] [<ffffffff8100b126>] ? cpu_idle+0xf6/0x110
> [ 6945.738930] [<ffffffff81875c58>] ? start_kernel+0x33d/0x348
> [ 6945.738954] [<ffffffff8187573b>] ? repair_env_string+0x5b/0x5b
> [ 6945.738978] [<ffffffff8187541d>] ? x86_64_start_kernel+0xee/0xf2
>
>
> Signed-off-by: Daniel Petre <daniel.petre@rcs-rds.ro>
> ---
>
> --- linux-3.8.13/net/ipv4/ip_gre.c.orig 2013-05-21 20:28:37.340537935 +0300
> +++ linux-3.8.13/net/ipv4/ip_gre.c 2013-05-21 20:32:47.248722835 +0300
> @@ -728,7 +728,9 @@ static int ipgre_rcv(struct sk_buff *skb
> gro_cells_receive(&tunnel->gro_cells, skb);
> return 0;
> }
> - icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
> + /* don't send icmp destination unreachable if tunnel is down
> + the IP stack gets corrupted and machine panics!
> + icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0); */
>
> drop:
> kfree_skb(skb);
Hmm... can you reproduce this bug on the latest kernel?
(Preferably David Miller's net tree:
http://git.kernel.org/cgit/linux/kernel/git/davem/net.git
)
Thanks
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
2013-05-21 21:01 ` [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach Eric Dumazet
@ 2013-05-22 8:36 ` Daniel Petre
2013-05-22 11:37 ` Eric Dumazet
0 siblings, 1 reply; 17+ messages in thread
From: Daniel Petre @ 2013-05-22 8:36 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
On 05/22/2013 12:01 AM, Eric Dumazet wrote:
> On Tue, 2013-05-21 at 20:53 +0300, Daniel Petre wrote:
>> Hello,
>> This mini patch mitigates the kernel panic with "stack-protector: Kernel stack
>> is corrupted" when using gre tunnels and network interface flaps and ip_gre
>> sends a icmp dest unreachable which gets in the IP stack and most probably messes things up.
>>
>> Current ip_gre panics both 3.7.10 and 3.8.13 few seconds after ether interface is down,
>> i managed to take a look at the vmcore with crash utility to find each time icmp_send :
>>
>> [... backtrace, crash log and patch snipped ...]
>
> Hmm... can you reproduce this bug on latest kernel ?
>
> (preferably David Miller net tree :
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git
> )
Hello Eric,
unfortunately the machine we have been working on for the last few weeks can
no longer be used for tests, as it was and still is a production router.
I can tell you we have tested kernels 3.6.3, 3.6.8, 3.7.1, 3.7.6 and 3.7.10,
and right now it runs 3.8.13 with Intel e1000e and Broadcom tg3 NICs; each
time the interface carrying the few GRE tunnels goes down, the up-to-date
Debian squeeze router panics.
I might be able to get a similar setup in the next weeks, but it's a
little uncertain.
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
2013-05-22 8:36 ` Daniel Petre
@ 2013-05-22 11:37 ` Eric Dumazet
2013-05-22 11:49 ` Daniel Petre
0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2013-05-22 11:37 UTC (permalink / raw)
To: Daniel Petre; +Cc: netdev
On Wed, 2013-05-22 at 11:36 +0300, Daniel Petre wrote:
> [... quoted backtrace, crash log and patch snipped ...]
>
> Hello Eric,
> unfortunately the machine we have worked on the last weeks cannot be
> used anymore for tests as it was and still is a production router..
>
> I can tell you we have tested kernel 3.6.3, 3.6.8, 3.7.1, 3.7.6, 3.7.10
> and right now it runs 3.8.13 with intel e1000e and broadcom tg3 nics and
> each time the interface where we have the few gre tunnels goes down the
> debian squeeze up-to-date router will panic.
>
> I might be able to get a similar setup in the next weeks but it's a
> little uncertain.
>
What's the setup of the machine, exactly?
You receive packets from e1000e, and forward them through a GRE tunnel
via tg3?
Please give us some details so that we can reproduce the bug and fix it.
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
2013-05-22 11:37 ` Eric Dumazet
@ 2013-05-22 11:49 ` Daniel Petre
2013-05-22 11:53 ` Eric Dumazet
2013-05-22 13:52 ` Eric Dumazet
0 siblings, 2 replies; 17+ messages in thread
From: Daniel Petre @ 2013-05-22 11:49 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
On 05/22/2013 02:37 PM, Eric Dumazet wrote:
> [... quoted backtrace, crash log and patch snipped ...]
>
> What's the setup of the machine exactly ?
>
> You receive packets from e1000e, add forward them through a GRE tunnel
> via tg3 ?
>
> Please give us some details so that we can reproduce the bug and fix it.
>
Hello Eric,
some machines have e1000e, others have tg3 (with MTU 1524); then we have a
few GRE tunnels on top of the downlink Ethernet, and the traffic goes up
the router via the second Ethernet interface. Nothing complicated.
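For anyone trying to reproduce, the setup described here might be sketched roughly as follows. All interface names and addresses are hypothetical, chosen only for illustration; they are not taken from the thread:

```shell
# Assumed topology: GRE tunnel on the downlink NIC (eth3), uplink on a
# second NIC, then a link flap on the downlink while traffic flows.
ip tunnel add gre1 mode gre local 192.0.2.1 remote 192.0.2.2 dev eth3
ip addr add 10.0.0.1/30 dev gre1
ip link set gre1 up

ping 10.0.0.2 &          # keep traffic flowing through the tunnel
ip link set eth3 down    # flap: tunnel xmit now hits ipv4_link_failure()
sleep 5
ip link set eth3 up
kill %1
```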
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
2013-05-22 11:49 ` Daniel Petre
@ 2013-05-22 11:53 ` Eric Dumazet
2013-05-22 13:52 ` Eric Dumazet
1 sibling, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2013-05-22 11:53 UTC (permalink / raw)
To: Daniel Petre; +Cc: netdev
On Wed, 2013-05-22 at 14:49 +0300, Daniel Petre wrote:
>
> Hello Eric,
> some machines have e1000e others have tg3 (with mtu 1524) then we have
> few gre tunnels on top of the downlink ethernet and the traffic goes up
> the router via the second ethernet interface, nothing complicated.
Nothing complicated, but I am not going to spend one hour to reproduce
the bug by guessing what you do.
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
2013-05-22 11:49 ` Daniel Petre
2013-05-22 11:53 ` Eric Dumazet
@ 2013-05-22 13:52 ` Eric Dumazet
2013-05-22 15:40 ` Daniel Petre
1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2013-05-22 13:52 UTC (permalink / raw)
To: Daniel Petre; +Cc: netdev
On Wed, 2013-05-22 at 14:49 +0300, Daniel Petre wrote:
> Hello Eric,
> some machines have e1000e others have tg3 (with mtu 1524) then we have
> few gre tunnels on top of the downlink ethernet and the traffic goes up
> the router via the second ethernet interface, nothing complicated.
>
The crash, by the way, is happening in icmp_send(), called from
ipv4_link_failure(), called from ipgre_tunnel_xmit() when the IPv6 destination
cannot be reached.
Your patch therefore should not 'avoid' the problem ...
My guess is the kernel stack is too small to afford icmp_send() being called
twice (recursively).
Could you try :
net/ipv4/icmp.c | 72 ++++++++++++++++++++++++----------------------
1 file changed, 38 insertions(+), 34 deletions(-)
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 76e10b4..e33f3b0 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -208,7 +208,7 @@ static struct sock *icmp_sk(struct net *net)
return net->ipv4.icmp_sk[smp_processor_id()];
}
-static inline struct sock *icmp_xmit_lock(struct net *net)
+static struct sock *icmp_xmit_lock(struct net *net)
{
struct sock *sk;
@@ -226,7 +226,7 @@ static inline struct sock *icmp_xmit_lock(struct net *net)
return sk;
}
-static inline void icmp_xmit_unlock(struct sock *sk)
+static void icmp_xmit_unlock(struct sock *sk)
{
spin_unlock_bh(&sk->sk_lock.slock);
}
@@ -235,8 +235,8 @@ static inline void icmp_xmit_unlock(struct sock *sk)
* Send an ICMP frame.
*/
-static inline bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
- struct flowi4 *fl4, int type, int code)
+static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
+ struct flowi4 *fl4, int type, int code)
{
struct dst_entry *dst = &rt->dst;
bool rc = true;
@@ -375,19 +375,22 @@ out_unlock:
icmp_xmit_unlock(sk);
}
-static struct rtable *icmp_route_lookup(struct net *net,
- struct flowi4 *fl4,
- struct sk_buff *skb_in,
- const struct iphdr *iph,
- __be32 saddr, u8 tos,
- int type, int code,
- struct icmp_bxm *param)
+struct icmp_send_data {
+ struct icmp_bxm icmp_param;
+ struct ipcm_cookie ipc;
+ struct flowi4 fl4;
+};
+
+static noinline_for_stack struct rtable *
+icmp_route_lookup(struct net *net, struct flowi4 *fl4,
+ struct sk_buff *skb_in, const struct iphdr *iph,
+ __be32 saddr, u8 tos, int type, int code,
+ struct icmp_bxm *param)
{
struct rtable *rt, *rt2;
struct flowi4 fl4_dec;
int err;
- memset(fl4, 0, sizeof(*fl4));
fl4->daddr = (param->replyopts.opt.opt.srr ?
param->replyopts.opt.opt.faddr : iph->saddr);
fl4->saddr = saddr;
@@ -482,14 +485,12 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
{
struct iphdr *iph;
int room;
- struct icmp_bxm icmp_param;
struct rtable *rt = skb_rtable(skb_in);
- struct ipcm_cookie ipc;
- struct flowi4 fl4;
__be32 saddr;
u8 tos;
struct net *net;
struct sock *sk;
+ struct icmp_send_data *data = NULL;
if (!rt)
goto out;
@@ -585,7 +586,11 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
IPTOS_PREC_INTERNETCONTROL) :
iph->tos;
- if (ip_options_echo(&icmp_param.replyopts.opt.opt, skb_in))
+ data = kzalloc(sizeof(*data), GFP_ATOMIC);
+ if (!data)
+ goto out_unlock;
+
+ if (ip_options_echo(&data->icmp_param.replyopts.opt.opt, skb_in))
goto out_unlock;
@@ -593,23 +598,21 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
* Prepare data for ICMP header.
*/
- icmp_param.data.icmph.type = type;
- icmp_param.data.icmph.code = code;
- icmp_param.data.icmph.un.gateway = info;
- icmp_param.data.icmph.checksum = 0;
- icmp_param.skb = skb_in;
- icmp_param.offset = skb_network_offset(skb_in);
+ data->icmp_param.data.icmph.type = type;
+ data->icmp_param.data.icmph.code = code;
+ data->icmp_param.data.icmph.un.gateway = info;
+ data->icmp_param.skb = skb_in;
+ data->icmp_param.offset = skb_network_offset(skb_in);
inet_sk(sk)->tos = tos;
- ipc.addr = iph->saddr;
- ipc.opt = &icmp_param.replyopts.opt;
- ipc.tx_flags = 0;
+ data->ipc.addr = iph->saddr;
+ data->ipc.opt = &data->icmp_param.replyopts.opt;
- rt = icmp_route_lookup(net, &fl4, skb_in, iph, saddr, tos,
- type, code, &icmp_param);
+ rt = icmp_route_lookup(net, &data->fl4, skb_in, iph, saddr, tos,
+ type, code, &data->icmp_param);
if (IS_ERR(rt))
goto out_unlock;
- if (!icmpv4_xrlim_allow(net, rt, &fl4, type, code))
+ if (!icmpv4_xrlim_allow(net, rt, &data->fl4, type, code))
goto ende;
/* RFC says return as much as we can without exceeding 576 bytes. */
@@ -617,19 +620,20 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
room = dst_mtu(&rt->dst);
if (room > 576)
room = 576;
- room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
+ room -= sizeof(struct iphdr) + data->icmp_param.replyopts.opt.opt.optlen;
room -= sizeof(struct icmphdr);
- icmp_param.data_len = skb_in->len - icmp_param.offset;
- if (icmp_param.data_len > room)
- icmp_param.data_len = room;
- icmp_param.head_len = sizeof(struct icmphdr);
+ data->icmp_param.data_len = skb_in->len - data->icmp_param.offset;
+ if (data->icmp_param.data_len > room)
+ data->icmp_param.data_len = room;
+ data->icmp_param.head_len = sizeof(struct icmphdr);
- icmp_push_reply(&icmp_param, &fl4, &ipc, &rt);
+ icmp_push_reply(&data->icmp_param, &data->fl4, &data->ipc, &rt);
ende:
ip_rt_put(rt);
out_unlock:
icmp_xmit_unlock(sk);
+ kfree(data);
out:;
}
EXPORT_SYMBOL(icmp_send);
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
2013-05-22 13:52 ` Eric Dumazet
@ 2013-05-22 15:40 ` Daniel Petre
2013-05-23 8:47 ` Daniel Petre
0 siblings, 1 reply; 17+ messages in thread
From: Daniel Petre @ 2013-05-22 15:40 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
On 05/22/2013 04:52 PM, Eric Dumazet wrote:
> On Wed, 2013-05-22 at 14:49 +0300, Daniel Petre wrote:
>
>> Hello Eric,
>> some machines have e1000e, others have tg3 (with mtu 1524), then we have
>> few gre tunnels on top of the downlink ethernet and the traffic goes up
>> the router via the second ethernet interface, nothing complicated.
>>
>
> The crash by the way is happening in icmp_send() called from
> ipv4_link_failure(), called from ip_tunnel_xmit() when IPv6 destination
> cannot be reached.
>
> Your patch therefore should not 'avoid' the problem ...
>
> My guess is kernel stack is too small to afford icmp_send() being called
> twice (recursively)
>
> Could you try :
>
Hello Eric,
thanks for the patch, we managed to compile and push the kernel live,
it went into a panic when we shut down the port to the server.
crash> bt
PID: 0 TASK: ffffffff81813420 CPU: 0 COMMAND: "swapper/0"
#0 [ffff88003fc05df0] machine_kexec at ffffffff81027430
#1 [ffff88003fc05e40] crash_kexec at ffffffff8107da80
#2 [ffff88003fc05f10] oops_end at ffffffff81005bf8
#3 [ffff88003fc05f30] do_stack_segment at ffffffff8100365f
#4 [ffff88003fc05f50] retint_signal at ffffffff81542d12
[exception RIP: __kmalloc+144]
RIP: ffffffff810d0a20 RSP: ffff88003fc03a30 RFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88003d672a00 RCX: 00000000003c1bf9
RDX: 00000000003c1bf8 RSI: 0000000000008020 RDI: 0000000000013ba0
RBP: 37f5089fae060a80 R8: ffffffff814d5def R9: ffff88003fc03a80
R10: 00000000557809c3 R11: ffff88003e1053c0 R12: ffff88003e001240
R13: 0000000000008020 R14: 0000000000000000 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <STACKFAULT exception stack> ---
#5 [ffff88003fc03a30] __kmalloc at ffffffff810d0a20
#6 [ffff88003fc03a58] icmp_send at ffffffff814d5def
#7 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
#8 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
#9 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
#10 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596
#11 [ffff88003fc03ce8] __netif_receive_skb at ffffffff8146ed13
#12 [ffff88003fc03d88] napi_gro_receive at ffffffff8146fc50
#13 [ffff88003fc03da8] e1000_clean_rx_irq at ffffffff813bc67b
#14 [ffff88003fc03e48] e1000e_poll at ffffffff813c3a20
#15 [ffff88003fc03e98] net_rx_action at ffffffff8146f796
#16 [ffff88003fc03ee8] __do_softirq at ffffffff8103ebb9
#17 [ffff88003fc03f38] segment_not_present at ffffffff8154438c
#18 [ffff88003fc03f70] irq_exit at ffffffff8103e9cd
#19 [ffff88003fc03f80] do_IRQ at ffffffff81003f6c
#20 [ffff88003fc03fb0] save_paranoid at ffffffff81542b6a
--- <IRQ stack> ---
#21 [ffffffff81801ea8] save_paranoid at ffffffff81542b6a
[exception RIP: mwait_idle+95]
RIP: ffffffff8100ad8f RSP: ffffffff81801f50 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffff8154189e RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff81801fd8 RDI: ffff88003fc0d840
RBP: ffffffff8185be80 R8: 0000000000000000 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81813420 R14: ffff88003fc11000 R15: ffffffff81813420
ORIG_RAX: ffffffffffffff1e CS: 0010 SS: 0018
#22 [ffffffff81801f50] cpu_idle at ffffffff8100b126
---------------------
[ 645.650121] e1000e: eth3 NIC Link is Down
[ 664.596968] stack segment: 0000 [#1] SMP
[ 664.597121] Modules linked in: coretemp
[ 664.597264] CPU 0
[ 664.597309] Pid: 0, comm: swapper/0 Not tainted 3.8.13 #4 IBM IBM System x3250 M2
[ 664.597447] RIP: 0010:[<ffffffff810d0a20>] [<ffffffff810d0a20>] __kmalloc+0x90/0x180
[ 664.597559] RSP: 0018:ffff88003fc03a30 EFLAGS: 00010202
[ 664.597621] RAX: 0000000000000000 RBX: ffff88003d672a00 RCX: 00000000003c1bf9
[ 664.597687] RDX: 00000000003c1bf8 RSI: 0000000000008020 RDI: 0000000000013ba0
[ 664.597752] RBP: 37f5089fae060a80 R08: ffffffff814d5def R09: ffff88003fc03a80
[ 664.597817] R10: 00000000557809c3 R11: ffff88003e1053c0 R12: ffff88003e001240
[ 664.597882] R13: 0000000000008020 R14: 0000000000000000 R15: 0000000000000001
[ 664.597948] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 664.598015] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 664.598077] CR2: 00007fefa9e458e0 CR3: 000000003d848000 CR4: 00000000000007f0
[ 664.598143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 664.598208] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 664.598273] Process swapper/0 (pid: 0, threadinfo ffffffff81800000, task ffffffff81813420)
[ 664.598340] Stack:
[ 664.598396] 00000000c3097855 ffff88003d672a00 0000000000000003 0000000000000001
[ 664.598627] ffff880039ead70e ffffffff814d5def ffff88003ce11840 0000000000000246
[ 664.598859] ffff88003d0b4000 ffffffff814a2beb 0000000000010018 ffff88003e1053c0
[ 664.599090] Call Trace:
[ 664.599147] <IRQ>
[ 664.599190]
[ 664.599289] [<ffffffff814d5def>] ? icmp_send+0x11f/0x390
[ 664.599353] [<ffffffff814a2beb>] ? __ip_rt_update_pmtu+0xbb/0x110
[ 664.599418] [<ffffffff814a1795>] ? ipv4_link_failure+0x15/0x60
[ 664.599482] [<ffffffff814e78b5>] ? ipgre_tunnel_xmit+0x7f5/0x9f0
[ 664.599547] [<ffffffff8146e032>] ? dev_hard_start_xmit+0x102/0x490
[ 664.599612] [<ffffffff81487d66>] ? sch_direct_xmit+0x106/0x1e0
[ 664.599676] [<ffffffff81487efd>] ? __qdisc_run+0xbd/0x150
[ 664.599739] [<ffffffff8146e5a7>] ? dev_queue_xmit+0x1e7/0x3a0
[ 664.600002] [<ffffffff814ab596>] ? ip_finish_output+0x2e6/0x3e0
[ 664.600002] [<ffffffff8146ed13>] ? __netif_receive_skb+0x5b3/0x7c0
[ 664.600002] [<ffffffff8146f114>] ? netif_receive_skb+0x24/0x80
[ 664.600002] [<ffffffff8146fc50>] ? napi_gro_receive+0x110/0x140
[ 664.600002] [<ffffffff813bc67b>] ? e1000_clean_rx_irq+0x29b/0x490
[ 664.600002] [<ffffffff813c3a20>] ? e1000e_poll+0x90/0x3a0
[ 664.600002] [<ffffffff8146f796>] ? net_rx_action+0xc6/0x1e0
[ 664.600002] [<ffffffff8103ebb9>] ? __do_softirq+0xa9/0x170
[ 664.600002] [<ffffffff8154438c>] ? call_softirq+0x1c/0x30
[ 664.600002] [<ffffffff810047dd>] ? do_softirq+0x4d/0x80
[ 664.600002] [<ffffffff8103e9cd>] ? irq_exit+0x7d/0x90
[ 664.600002] [<ffffffff81003f6c>] ? do_IRQ+0x5c/0xd0
[ 664.600002] [<ffffffff81542b6a>] ? common_interrupt+0x6a/0x6a
[ 664.600002] <EOI>
[ 664.600002]
[ 664.600002] [<ffffffff8154189e>] ? __schedule+0x26e/0x5b0
[ 664.600002] [<ffffffff8100ad8f>] ? mwait_idle+0x5f/0x70
[ 664.600002] [<ffffffff8100b126>] ? cpu_idle+0xf6/0x110
[ 664.600002] [<ffffffff81875c58>] ? start_kernel+0x33d/0x348
[ 664.600002] [<ffffffff8187573b>] ? repair_env_string+0x5b/0x5b
[ 664.600002] [<ffffffff8187541d>] ? x86_64_start_kernel+0xee/0xf2
[ 664.600002] Code: 28 49 8b 0c 24 65 48 03 0c 25 88 cc 00 00 48 8b 51 08 48 8b 29 48 85 ed 0f 84 d3 00 00 00 49 63 44 24 20 49 8b 3c 24 48 8d 4a 01 <48> 8b 5c 05 00 48 89 e8 65 48 0f c7 0f 0f 94 c0 3c 01 75 c2 49
[ 664.600002] RIP [<ffffffff810d0a20>] __kmalloc+0x90/0x180
[ 664.600002] RSP <ffff88003fc03a30>
> [snip: patch quoted in full above]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
2013-05-22 15:40 ` Daniel Petre
@ 2013-05-23 8:47 ` Daniel Petre
2013-05-23 15:53 ` Eric Dumazet
0 siblings, 1 reply; 17+ messages in thread
From: Daniel Petre @ 2013-05-23 8:47 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
On 05/22/2013 06:40 PM, Daniel Petre wrote:
> On 05/22/2013 04:52 PM, Eric Dumazet wrote:
>> On Wed, 2013-05-22 at 14:49 +0300, Daniel Petre wrote:
>>
>>> Hello Eric,
>>> some machines have e1000e, others have tg3 (with mtu 1524), then we have
>>> few gre tunnels on top of the downlink ethernet and the traffic goes up
>>> the router via the second ethernet interface, nothing complicated.
>>>
>>
>> The crash by the way is happening in icmp_send() called from
>> ipv4_link_failure(), called from ip_tunnel_xmit() when IPv6 destination
>> cannot be reached.
>>
>> Your patch therefore should not 'avoid' the problem ...
>>
>> My guess is kernel stack is too small to afford icmp_send() being called
>> twice (recursively)
>>
>> Could you try :
>>
>
> Hello Eric,
> thanks for the patch, we managed to compile and push the kernel live,
> it went into a panic when we shut down the port to the server.
Hello again Eric,
we applied the little patch from:
http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
we have flapped the link a few times and everything recovered smoothly.
> [snip: crash trace and patch quoted in full above]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
2013-05-23 8:47 ` Daniel Petre
@ 2013-05-23 15:53 ` Eric Dumazet
2013-05-23 16:59 ` Daniel Petre
2013-05-23 17:10 ` Eric Dumazet
0 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2013-05-23 15:53 UTC (permalink / raw)
To: Daniel Petre; +Cc: netdev
On Thu, 2013-05-23 at 11:47 +0300, Daniel Petre wrote:
>
> Hello again Eric,
> we applied the little patch from:
> http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
> we have flapped the link a few times and everything recovered smoothly.
>
That's a very good catch; now we have to fix the bug at the right place.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
2013-05-23 15:53 ` Eric Dumazet
@ 2013-05-23 16:59 ` Daniel Petre
2013-05-23 17:11 ` Eric Dumazet
2013-05-23 17:10 ` Eric Dumazet
1 sibling, 1 reply; 17+ messages in thread
From: Daniel Petre @ 2013-05-23 16:59 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
On May 23, 2013, at 6:53 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-05-23 at 11:47 +0300, Daniel Petre wrote:
>
>>
>> Hello again Eric,
>> we applied the little patch from:
>> http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
> >> we have flapped the link a few times and everything recovered smoothly.
>>
>
> That's a very good catch; now we have to fix the bug at the right place.
>
Hey Eric,
maybe this could work?
--- linux-3.8.13/net/ipv4/ip_gre.c.orig 2013-05-23 19:54:58.317798942 +0300
+++ linux-3.8.13/net/ipv4/ip_gre.c 2013-05-23 19:56:30.290029424 +0300
@@ -882,7 +882,7 @@ static netdev_tx_t ipgre_tunnel_xmit(str
if (time_before(jiffies,
tunnel->err_time + IPTUNNEL_ERR_TIMEO)) {
tunnel->err_count--;
-
+ memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
dst_link_failure(skb);
} else
tunnel->err_count = 0;
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
2013-05-23 15:53 ` Eric Dumazet
2013-05-23 16:59 ` Daniel Petre
@ 2013-05-23 17:10 ` Eric Dumazet
2013-05-24 9:40 ` Daniel Petre
2013-05-24 15:49 ` [PATCH] ip_tunnel: " Eric Dumazet
1 sibling, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2013-05-23 17:10 UTC (permalink / raw)
To: Daniel Petre; +Cc: netdev
On Thu, 2013-05-23 at 08:53 -0700, Eric Dumazet wrote:
> On Thu, 2013-05-23 at 11:47 +0300, Daniel Petre wrote:
>
> >
> > Hello again Eric,
> > we applied the little patch from:
> > http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
> > > we have flapped the link a few times and everything recovered smoothly.
> >
>
> That's a very good catch; now we have to fix the bug at the right place.
Please try the following patch :
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 91d66db..563358e 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -795,6 +795,7 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
if (dev->type == ARPHRD_ETHER)
IPCB(skb)->flags = 0;
+ memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
if (dev->header_ops && dev->type == ARPHRD_IPGRE) {
gre_hlen = 0;
@@ -952,7 +953,6 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
skb_push(skb, gre_hlen);
skb_reset_network_header(skb);
skb_set_transport_header(skb, sizeof(*iph));
- memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
IPSKB_REROUTED);
skb_dst_drop(skb);
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
2013-05-23 16:59 ` Daniel Petre
@ 2013-05-23 17:11 ` Eric Dumazet
0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2013-05-23 17:11 UTC (permalink / raw)
To: Daniel Petre; +Cc: netdev
On Thu, 2013-05-23 at 19:59 +0300, Daniel Petre wrote:
> On May 23, 2013, at 6:53 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > On Thu, 2013-05-23 at 11:47 +0300, Daniel Petre wrote:
> >
> >>
> >> Hello again Eric,
> >> we applied the little patch from:
> >> http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
> >> we have flapped the link a few times and everything recovered smoothly.
> >>
> >
> > That's a very good catch; now we have to fix the bug in the right place.
> >
>
> Hey Eric,
> maybe this could work?
>
> --- linux-3.8.13/net/ipv4/ip_gre.c.orig 2013-05-23 19:54:58.317798942 +0300
> +++ linux-3.8.13/net/ipv4/ip_gre.c 2013-05-23 19:56:30.290029424 +0300
> @@ -882,7 +882,7 @@ static netdev_tx_t ipgre_tunnel_xmit(str
> if (time_before(jiffies,
> tunnel->err_time + IPTUNNEL_ERR_TIMEO)) {
> tunnel->err_count--;
> -
> + memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
> dst_link_failure(skb);
> } else
> tunnel->err_count = 0;
>
Not exactly, please try the patch I sent.
Thanks !
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
2013-05-23 17:10 ` Eric Dumazet
@ 2013-05-24 9:40 ` Daniel Petre
2013-05-24 13:47 ` Eric Dumazet
2013-05-24 15:49 ` [PATCH] ip_tunnel: " Eric Dumazet
1 sibling, 1 reply; 17+ messages in thread
From: Daniel Petre @ 2013-05-24 9:40 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
On 05/23/2013 08:10 PM, Eric Dumazet wrote:
> On Thu, 2013-05-23 at 08:53 -0700, Eric Dumazet wrote:
>> On Thu, 2013-05-23 at 11:47 +0300, Daniel Petre wrote:
>>
>>>
>>> Hello again Eric,
>>> we applied the little patch from:
>>> http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
>>> we have flapped the link a few times and everything recovered smoothly.
>>>
>>
>> That's a very good catch; now we have to fix the bug in the right place.
>
>
> Please try the following patch :
we compiled it, tested it a few times, and... nothing evil happens anymore!
>
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 91d66db..563358e 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -795,6 +795,7 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
>
> if (dev->type == ARPHRD_ETHER)
> IPCB(skb)->flags = 0;
> + memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
>
> if (dev->header_ops && dev->type == ARPHRD_IPGRE) {
> gre_hlen = 0;
> @@ -952,7 +953,6 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
> skb_push(skb, gre_hlen);
> skb_reset_network_header(skb);
> skb_set_transport_header(skb, sizeof(*iph));
> - memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
> IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
> IPSKB_REROUTED);
> skb_dst_drop(skb);
>
>
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
2013-05-24 9:40 ` Daniel Petre
@ 2013-05-24 13:47 ` Eric Dumazet
0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2013-05-24 13:47 UTC (permalink / raw)
To: Daniel Petre; +Cc: netdev
On Fri, 2013-05-24 at 12:40 +0300, Daniel Petre wrote:
> On 05/23/2013 08:10 PM, Eric Dumazet wrote:
>
> we compiled it, tested it a few times, and... nothing evil happens anymore!
Thanks for the report. I'll cook up the various variants of the patch.
* [PATCH] ip_tunnel: fix kernel panic with icmp_dest_unreach
2013-05-23 17:10 ` Eric Dumazet
2013-05-24 9:40 ` Daniel Petre
@ 2013-05-24 15:49 ` Eric Dumazet
2013-05-26 6:27 ` David Miller
1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2013-05-24 15:49 UTC (permalink / raw)
To: Daniel Petre, David Miller; +Cc: netdev
From: Eric Dumazet <edumazet@google.com>
Daniel Petre reported crashes in icmp_dst_unreach() with the following call
graph:
#3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
#4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
#5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
#6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
#7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
#8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
#9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
#10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
#11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596
Daniel found a similar problem mentioned in
http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
And indeed this is the root cause: skb->cb[] contains data fooling the IP
stack.
We must clear IPCB in ip_tunnel_xmit() sooner, in case dst_link_failure()
is called; otherwise skb->cb[] might contain garbage from the GSO
segmentation layer.
A similar fix was tested on linux-3.9, but the gre code was refactored in
linux-3.10. I'll send patches for stable kernels as well.
Many thanks to Daniel for providing reports, patches and testing !
Reported-by: Daniel Petre <daniel.petre@rcs-rds.ro>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/ip_tunnel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index e4147ec..be2f8da 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -503,6 +503,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
inner_iph = (const struct iphdr *)skb_inner_network_header(skb);
+ memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
dst = tnl_params->daddr;
if (dst == 0) {
/* NBMA tunnel */
@@ -658,7 +659,6 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
skb_dst_drop(skb);
skb_dst_set(skb, &rt->dst);
- memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
/* Push down and install the IP header. */
skb_push(skb, sizeof(struct iphdr));
* Re: [PATCH] ip_tunnel: fix kernel panic with icmp_dest_unreach
2013-05-24 15:49 ` [PATCH] ip_tunnel: " Eric Dumazet
@ 2013-05-26 6:27 ` David Miller
0 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2013-05-26 6:27 UTC (permalink / raw)
To: eric.dumazet; +Cc: daniel.petre, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 24 May 2013 08:49:58 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> Daniel Petre reported crashes in icmp_dst_unreach() with the following call
> graph:
...
> Daniel found a similar problem mentioned in
> http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
>
> And indeed this is the root cause: skb->cb[] contains data fooling the IP
> stack.
>
> We must clear IPCB in ip_tunnel_xmit() sooner, in case dst_link_failure()
> is called; otherwise skb->cb[] might contain garbage from the GSO
> segmentation layer.
>
> A similar fix was tested on linux-3.9, but gre code was refactored in
> linux-3.10. I'll send patches for stable kernels as well.
>
> Many thanks to Daniel for providing reports, patches and testing !
>
> Reported-by: Daniel Petre <daniel.petre@rcs-rds.ro>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks a lot everyone.
* [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
@ 2013-05-21 17:53 Daniel Petre
0 siblings, 0 replies; 17+ messages in thread
From: Daniel Petre @ 2013-05-21 17:53 UTC (permalink / raw)
To: linux-kernel
Hello,
This mini patch mitigates the kernel panic ("stack-protector: Kernel stack
is corrupted") seen when using GRE tunnels: when a network interface flaps,
ip_gre sends an ICMP destination unreachable that gets into the IP stack and
most probably messes things up.
The current ip_gre panics both 3.7.10 and 3.8.13 a few seconds after the
Ethernet interface goes down. I managed to look at the vmcore with the crash
utility, and each time found icmp_send:
crash> bt
PID: 0 TASK: ffffffff81813420 CPU: 0 COMMAND: "swapper/0"
#0 [ffff88003fc03798] machine_kexec at ffffffff81027430
#1 [ffff88003fc037e8] crash_kexec at ffffffff8107da80
#2 [ffff88003fc038b8] panic at ffffffff81540026
#3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
#4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
#5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
#6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
#7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
#8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
#9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
#10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
#11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596
#12 [ffff88003fc03ce8] __netif_receive_skb at ffffffff8146ed13
#13 [ffff88003fc03d88] napi_gro_receive at ffffffff8146fc50
#14 [ffff88003fc03da8] e1000_clean_rx_irq at ffffffff813bc67b
#15 [ffff88003fc03e48] e1000e_poll at ffffffff813c3a20
#16 [ffff88003fc03e98] net_rx_action at ffffffff8146f796
#17 [ffff88003fc03ee8] __do_softirq at ffffffff8103ebb9
#18 [ffff88003fc03f38] call_softirq at ffffffff8154444c
#19 [ffff88003fc03f50] do_softirq at ffffffff810047dd
#20 [ffff88003fc03f80] do_IRQ at ffffffff81003f6c
--- <IRQ stack> ---
#21 [ffffffff81801ea8] ret_from_intr at ffffffff81542c2a
[exception RIP: mwait_idle+95]
RIP: ffffffff8100ad8f RSP: ffffffff81801f50 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffff8154194e RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff81801fd8 RDI: ffff88003fc0d840
RBP: ffffffff8185be80 R8: 0000000000000000 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81813420 R14: ffff88003fc11000 R15: ffffffff81813420
ORIG_RAX: ffffffffffffff1e CS: 0010 SS: 0018
#22 [ffffffff81801f50] cpu_idle at ffffffff8100b126
crash> log
[..]
[ 6772.560124] e1000e: eth3 NIC Link is Down
[ 6786.680876] e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
[ 6928.050119] e1000e: eth3 NIC Link is Down
[ 6945.240875] e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
[ 6945.738103] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff814d5fec
[ 6945.738165] Pid: 0, comm: swapper/0 Not tainted 3.8.13 #3
[ 6945.738189] Call Trace:
[ 6945.738212] <IRQ> [<ffffffff8154001f>] ? panic+0xbf/0x1c9
[ 6945.738245] [<ffffffff814ab505>] ? ip_finish_output+0x255/0x3e0
[ 6945.738271] [<ffffffff814d5fec>] ? icmp_send+0x6bc/0x730
[ 6945.738296] [<ffffffff81037f77>] ? __stack_chk_fail+0x17/0x30
[ 6945.738320] [<ffffffff814d5fec>] ? icmp_send+0x6bc/0x730
[ 6945.738344] [<ffffffff81009470>] ? nommu_map_sg+0x110/0x110
[ 6945.738369] [<ffffffff815428f9>] ? _raw_spin_lock_bh+0x9/0x30
[ 6945.738393] [<ffffffff814a1795>] ? ipv4_link_failure+0x15/0x60
[ 6945.738418] [<ffffffff814e7965>] ? ipgre_tunnel_xmit+0x7f5/0x9f0
[ 6945.738443] [<ffffffff8146e032>] ? dev_hard_start_xmit+0x102/0x490
[ 6945.738470] [<ffffffff81487d66>] ? sch_direct_xmit+0x106/0x1e0
[ 6945.738494] [<ffffffff81487efd>] ? __qdisc_run+0xbd/0x150
[ 6945.738518] [<ffffffff8146e5a7>] ? dev_queue_xmit+0x1e7/0x3a0
[ 6945.738542] [<ffffffff814ab596>] ? ip_finish_output+0x2e6/0x3e0
[ 6945.738567] [<ffffffff8146ed13>] ? __netif_receive_skb+0x5b3/0x7c0
[ 6945.738591] [<ffffffff8146f114>] ? netif_receive_skb+0x24/0x80
[ 6945.738616] [<ffffffff8146fc50>] ? napi_gro_receive+0x110/0x140
[ 6945.738642] [<ffffffff813bc67b>] ? e1000_clean_rx_irq+0x29b/0x490
[ 6945.738667] [<ffffffff813c3a20>] ? e1000e_poll+0x90/0x3a0
[ 6945.738690] [<ffffffff8146f796>] ? net_rx_action+0xc6/0x1e0
[ 6945.738715] [<ffffffff8103ebb9>] ? __do_softirq+0xa9/0x170
[ 6945.738739] [<ffffffff8154444c>] ? call_softirq+0x1c/0x30
[ 6945.738762] [<ffffffff810047dd>] ? do_softirq+0x4d/0x80
[ 6945.738786] [<ffffffff8103e9cd>] ? irq_exit+0x7d/0x90
[ 6945.738808] [<ffffffff81003f6c>] ? do_IRQ+0x5c/0xd0
[ 6945.738831] [<ffffffff81542c2a>] ? common_interrupt+0x6a/0x6a
[ 6945.738854] <EOI> [<ffffffff8154194e>] ? __schedule+0x26e/0x5b0
[ 6945.738884] [<ffffffff8100ad8f>] ? mwait_idle+0x5f/0x70
[ 6945.738907] [<ffffffff8100b126>] ? cpu_idle+0xf6/0x110
[ 6945.738930] [<ffffffff81875c58>] ? start_kernel+0x33d/0x348
[ 6945.738954] [<ffffffff8187573b>] ? repair_env_string+0x5b/0x5b
[ 6945.738978] [<ffffffff8187541d>] ? x86_64_start_kernel+0xee/0xf2
Signed-off-by: Daniel Petre <daniel.petre@rcs-rds.ro>
---
--- linux-3.8.13/net/ipv4/ip_gre.c.orig 2013-05-21 20:28:37.340537935 +0300
+++ linux-3.8.13/net/ipv4/ip_gre.c 2013-05-21 20:32:47.248722835 +0300
@@ -728,7 +728,9 @@ static int ipgre_rcv(struct sk_buff *skb
gro_cells_receive(&tunnel->gro_cells, skb);
return 0;
}
- icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
+ /* don't send icmp destination unreachable if the tunnel is down;
+ the IP stack gets corrupted and the machine panics!
+ icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0); */
drop:
kfree_skb(skb);
end of thread, other threads:[~2013-05-26 6:27 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <D2CA09D3-A93C-48CF-A23B-DC1B76D66818@rcs-rds.ro>
2013-05-21 21:01 ` [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach Eric Dumazet
2013-05-22 8:36 ` Daniel Petre
2013-05-22 11:37 ` Eric Dumazet
2013-05-22 11:49 ` Daniel Petre
2013-05-22 11:53 ` Eric Dumazet
2013-05-22 13:52 ` Eric Dumazet
2013-05-22 15:40 ` Daniel Petre
2013-05-23 8:47 ` Daniel Petre
2013-05-23 15:53 ` Eric Dumazet
2013-05-23 16:59 ` Daniel Petre
2013-05-23 17:11 ` Eric Dumazet
2013-05-23 17:10 ` Eric Dumazet
2013-05-24 9:40 ` Daniel Petre
2013-05-24 13:47 ` Eric Dumazet
2013-05-24 15:49 ` [PATCH] ip_tunnel: " Eric Dumazet
2013-05-26 6:27 ` David Miller
2013-05-21 17:53 [PATCH] ip_gre: " Daniel Petre