netdev.vger.kernel.org archive mirror
* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
       [not found] <D2CA09D3-A93C-48CF-A23B-DC1B76D66818@rcs-rds.ro>
@ 2013-05-21 21:01 ` Eric Dumazet
  2013-05-22  8:36   ` Daniel Petre
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2013-05-21 21:01 UTC (permalink / raw)
  To: Daniel Petre; +Cc: netdev

On Tue, 2013-05-21 at 20:53 +0300, Daniel Petre wrote:
> Hello,
> This mini patch mitigates the kernel panic with "stack-protector: Kernel stack
> is corrupted" that occurs when GRE tunnels are in use and a network interface
> flaps: ip_gre sends an ICMP destination unreachable, which re-enters the IP
> stack and most probably corrupts it.
> 
> The current ip_gre panics both 3.7.10 and 3.8.13 a few seconds after the
> Ethernet interface goes down; I examined the vmcore with the crash utility
> and found icmp_send in the backtrace each time:
> 
> crash> bt
> 
> PID: 0      TASK: ffffffff81813420  CPU: 0   COMMAND: "swapper/0"
> #0 [ffff88003fc03798] machine_kexec at ffffffff81027430
> #1 [ffff88003fc037e8] crash_kexec at ffffffff8107da80
> #2 [ffff88003fc038b8] panic at ffffffff81540026
> #3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
> #4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
> #5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
> #6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
> #7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
> #8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
> #9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
> #10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
> #11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596
> #12 [ffff88003fc03ce8] __netif_receive_skb at ffffffff8146ed13
> #13 [ffff88003fc03d88] napi_gro_receive at ffffffff8146fc50
> #14 [ffff88003fc03da8] e1000_clean_rx_irq at ffffffff813bc67b
> #15 [ffff88003fc03e48] e1000e_poll at ffffffff813c3a20
> #16 [ffff88003fc03e98] net_rx_action at ffffffff8146f796
> #17 [ffff88003fc03ee8] __do_softirq at ffffffff8103ebb9
> #18 [ffff88003fc03f38] call_softirq at ffffffff8154444c
> #19 [ffff88003fc03f50] do_softirq at ffffffff810047dd
> #20 [ffff88003fc03f80] do_IRQ at ffffffff81003f6c
> --- <IRQ stack> ---
> #21 [ffffffff81801ea8] ret_from_intr at ffffffff81542c2a
>  [exception RIP: mwait_idle+95]
>  RIP: ffffffff8100ad8f  RSP: ffffffff81801f50  RFLAGS: 00000246
>  RAX: 0000000000000000  RBX: ffffffff8154194e  RCX: 0000000000000000
>  RDX: 0000000000000000  RSI: ffffffff81801fd8  RDI: ffff88003fc0d840
>  RBP: ffffffff8185be80   R8: 0000000000000000   R9: 0000000000000001
>  R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
>  R13: ffffffff81813420  R14: ffff88003fc11000  R15: ffffffff81813420
>  ORIG_RAX: ffffffffffffff1e  CS: 0010  SS: 0018
> #22 [ffffffff81801f50] cpu_idle at ffffffff8100b126
> 
> crash> log
> 
> [..]
> 
> [ 6772.560124] e1000e: eth3 NIC Link is Down
> [ 6786.680876] e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow
> Control: Rx
> [ 6928.050119] e1000e: eth3 NIC Link is Down
> [ 6945.240875] e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow
> Control: Rx
> [ 6945.738103] Kernel panic - not syncing: stack-protector: Kernel stack
> is corrupted in: ffffffff814d5fec
> 
> [ 6945.738165] Pid: 0, comm: swapper/0 Not tainted 3.8.13 #3
> [ 6945.738189] Call Trace:
> [ 6945.738212]  <IRQ>  [<ffffffff8154001f>] ? panic+0xbf/0x1c9
> [ 6945.738245]  [<ffffffff814ab505>] ? ip_finish_output+0x255/0x3e0
> [ 6945.738271]  [<ffffffff814d5fec>] ? icmp_send+0x6bc/0x730
> [ 6945.738296]  [<ffffffff81037f77>] ? __stack_chk_fail+0x17/0x30
> [ 6945.738320]  [<ffffffff814d5fec>] ? icmp_send+0x6bc/0x730
> [ 6945.738344]  [<ffffffff81009470>] ? nommu_map_sg+0x110/0x110
> [ 6945.738369]  [<ffffffff815428f9>] ? _raw_spin_lock_bh+0x9/0x30
> [ 6945.738393]  [<ffffffff814a1795>] ? ipv4_link_failure+0x15/0x60
> [ 6945.738418]  [<ffffffff814e7965>] ? ipgre_tunnel_xmit+0x7f5/0x9f0
> [ 6945.738443]  [<ffffffff8146e032>] ? dev_hard_start_xmit+0x102/0x490
> [ 6945.738470]  [<ffffffff81487d66>] ? sch_direct_xmit+0x106/0x1e0
> [ 6945.738494]  [<ffffffff81487efd>] ? __qdisc_run+0xbd/0x150
> [ 6945.738518]  [<ffffffff8146e5a7>] ? dev_queue_xmit+0x1e7/0x3a0
> [ 6945.738542]  [<ffffffff814ab596>] ? ip_finish_output+0x2e6/0x3e0
> [ 6945.738567]  [<ffffffff8146ed13>] ? __netif_receive_skb+0x5b3/0x7c0
> [ 6945.738591]  [<ffffffff8146f114>] ? netif_receive_skb+0x24/0x80
> [ 6945.738616]  [<ffffffff8146fc50>] ? napi_gro_receive+0x110/0x140
> [ 6945.738642]  [<ffffffff813bc67b>] ? e1000_clean_rx_irq+0x29b/0x490
> [ 6945.738667]  [<ffffffff813c3a20>] ? e1000e_poll+0x90/0x3a0
> [ 6945.738690]  [<ffffffff8146f796>] ? net_rx_action+0xc6/0x1e0
> [ 6945.738715]  [<ffffffff8103ebb9>] ? __do_softirq+0xa9/0x170
> [ 6945.738739]  [<ffffffff8154444c>] ? call_softirq+0x1c/0x30
> [ 6945.738762]  [<ffffffff810047dd>] ? do_softirq+0x4d/0x80
> [ 6945.738786]  [<ffffffff8103e9cd>] ? irq_exit+0x7d/0x90
> [ 6945.738808]  [<ffffffff81003f6c>] ? do_IRQ+0x5c/0xd0
> [ 6945.738831]  [<ffffffff81542c2a>] ? common_interrupt+0x6a/0x6a
> [ 6945.738854]  <EOI>  [<ffffffff8154194e>] ? __schedule+0x26e/0x5b0
> [ 6945.738884]  [<ffffffff8100ad8f>] ? mwait_idle+0x5f/0x70
> [ 6945.738907]  [<ffffffff8100b126>] ? cpu_idle+0xf6/0x110
> [ 6945.738930]  [<ffffffff81875c58>] ? start_kernel+0x33d/0x348
> [ 6945.738954]  [<ffffffff8187573b>] ? repair_env_string+0x5b/0x5b
> [ 6945.738978]  [<ffffffff8187541d>] ? x86_64_start_kernel+0xee/0xf2
> 
> 
> Signed-off-by: Daniel Petre <daniel.petre@rcs-rds.ro>
> ---
> 
> --- linux-3.8.13/net/ipv4/ip_gre.c.orig	2013-05-21 20:28:37.340537935 +0300
> +++ linux-3.8.13/net/ipv4/ip_gre.c	2013-05-21 20:32:47.248722835 +0300
> @@ -728,7 +728,9 @@ static int ipgre_rcv(struct sk_buff *skb
> 		gro_cells_receive(&tunnel->gro_cells, skb);
> 		return 0;
> 	}
> -	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
> +	/* don't send icmp destination unreachable if tunnel is down
> +	the IP stack gets corrupted and machine panics!
> +	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0); */
> 
> drop:
> 	kfree_skb(skb);

Hmm... can you reproduce this bug on the latest kernel?

(preferably David Miller's net tree:
http://git.kernel.org/cgit/linux/kernel/git/davem/net.git
)

Thanks

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
  2013-05-21 21:01 ` [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach Eric Dumazet
@ 2013-05-22  8:36   ` Daniel Petre
  2013-05-22 11:37     ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Petre @ 2013-05-22  8:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 05/22/2013 12:01 AM, Eric Dumazet wrote:
> On Tue, 2013-05-21 at 20:53 +0300, Daniel Petre wrote:
>> Hello,
>> [snip: full crash backtrace and patch quoted from the original message]
> 
> Hmm... can you reproduce this bug on the latest kernel?
> 
> (preferably David Miller's net tree:
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git
> )

Hello Eric,
unfortunately, the machine we have worked on for the last few weeks cannot
be used for tests anymore, as it was and still is a production router.

I can tell you we have tested kernels 3.6.3, 3.6.8, 3.7.1, 3.7.6, and
3.7.10, and right now it runs 3.8.13 with Intel e1000e and Broadcom tg3
NICs; each time the interface carrying the few GRE tunnels goes down, the
up-to-date Debian Squeeze router panics.

I might be able to get a similar setup in the next few weeks, but it's a
little uncertain.

> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
  2013-05-22  8:36   ` Daniel Petre
@ 2013-05-22 11:37     ` Eric Dumazet
  2013-05-22 11:49       ` Daniel Petre
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2013-05-22 11:37 UTC (permalink / raw)
  To: Daniel Petre; +Cc: netdev

On Wed, 2013-05-22 at 11:36 +0300, Daniel Petre wrote:
> On 05/22/2013 12:01 AM, Eric Dumazet wrote:
> > On Tue, 2013-05-21 at 20:53 +0300, Daniel Petre wrote:
> >> Hello,
> >> [snip: full crash backtrace and patch quoted from the original message]
> > 
> > Hmm... can you reproduce this bug on the latest kernel?
> > 
> > (preferably David Miller's net tree:
> > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git
> > )
> 
> Hello Eric,
> unfortunately, the machine we have worked on for the last few weeks cannot
> be used for tests anymore, as it was and still is a production router.
> 
> I can tell you we have tested kernels 3.6.3, 3.6.8, 3.7.1, 3.7.6, and
> 3.7.10, and right now it runs 3.8.13 with Intel e1000e and Broadcom tg3
> NICs; each time the interface carrying the few GRE tunnels goes down, the
> up-to-date Debian Squeeze router panics.
> 
> I might be able to get a similar setup in the next few weeks, but it's a
> little uncertain.
> 

What's the setup of the machine exactly?

You receive packets via e1000e, and forward them through a GRE tunnel
via tg3?

Please give us some details so that we can reproduce the bug and fix it.


* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
  2013-05-22 11:37     ` Eric Dumazet
@ 2013-05-22 11:49       ` Daniel Petre
  2013-05-22 11:53         ` Eric Dumazet
  2013-05-22 13:52         ` Eric Dumazet
  0 siblings, 2 replies; 17+ messages in thread
From: Daniel Petre @ 2013-05-22 11:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 05/22/2013 02:37 PM, Eric Dumazet wrote:
> On Wed, 2013-05-22 at 11:36 +0300, Daniel Petre wrote:
>> On 05/22/2013 12:01 AM, Eric Dumazet wrote:
>>> On Tue, 2013-05-21 at 20:53 +0300, Daniel Petre wrote:
>>>> Hello,
>>>> [snip: full crash backtrace and patch quoted from the original message]
>>>
>>> Hmm... can you reproduce this bug on the latest kernel?
>>>
>>> (preferably David Miller's net tree:
>>> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git
>>> )
>>
>> Hello Eric,
>> unfortunately, the machine we have worked on for the last few weeks cannot
>> be used for tests anymore, as it was and still is a production router.
>>
>> I can tell you we have tested kernels 3.6.3, 3.6.8, 3.7.1, 3.7.6, and
>> 3.7.10, and right now it runs 3.8.13 with Intel e1000e and Broadcom tg3
>> NICs; each time the interface carrying the few GRE tunnels goes down, the
>> up-to-date Debian Squeeze router panics.
>>
>> I might be able to get a similar setup in the next few weeks, but it's a
>> little uncertain.
>>
> 
> What's the setup of the machine exactly?
> 
> You receive packets via e1000e, and forward them through a GRE tunnel
> via tg3?
> 
> Please give us some details so that we can reproduce the bug and fix it.
> 

Hello Eric,
some machines have e1000e, others have tg3 (with MTU 1524); we have a
few GRE tunnels on top of the downlink Ethernet, and the traffic goes up
the router via the second Ethernet interface. Nothing complicated.



* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
  2013-05-22 11:49       ` Daniel Petre
@ 2013-05-22 11:53         ` Eric Dumazet
  2013-05-22 13:52         ` Eric Dumazet
  1 sibling, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2013-05-22 11:53 UTC (permalink / raw)
  To: Daniel Petre; +Cc: netdev

On Wed, 2013-05-22 at 14:49 +0300, Daniel Petre wrote:

> 
> Hello Eric,
> some machines have e1000e, others have tg3 (with MTU 1524); we have a
> few GRE tunnels on top of the downlink Ethernet, and the traffic goes up
> the router via the second Ethernet interface. Nothing complicated.

Nothing complicated, but I am not going to spend one hour to reproduce
the bug by guessing what you do.


* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
  2013-05-22 11:49       ` Daniel Petre
  2013-05-22 11:53         ` Eric Dumazet
@ 2013-05-22 13:52         ` Eric Dumazet
  2013-05-22 15:40           ` Daniel Petre
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2013-05-22 13:52 UTC (permalink / raw)
  To: Daniel Petre; +Cc: netdev

On Wed, 2013-05-22 at 14:49 +0300, Daniel Petre wrote:

> Hello Eric,
> some machines have e1000e, others have tg3 (with MTU 1524); we have a
> few GRE tunnels on top of the downlink Ethernet, and the traffic goes up
> the router via the second Ethernet interface. Nothing complicated.
> 

The crash, by the way, is happening in icmp_send(), called from
ipv4_link_failure(), which is called from ipgre_tunnel_xmit() when the
IPv4 destination cannot be reached.

Your patch therefore should not 'avoid' the problem ...

My guess is the kernel stack is too small to afford icmp_send() being
called twice (recursively)

Could you try :

 net/ipv4/icmp.c |   72 ++++++++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 34 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 76e10b4..e33f3b0 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -208,7 +208,7 @@ static struct sock *icmp_sk(struct net *net)
 	return net->ipv4.icmp_sk[smp_processor_id()];
 }
 
-static inline struct sock *icmp_xmit_lock(struct net *net)
+static struct sock *icmp_xmit_lock(struct net *net)
 {
 	struct sock *sk;
 
@@ -226,7 +226,7 @@ static inline struct sock *icmp_xmit_lock(struct net *net)
 	return sk;
 }
 
-static inline void icmp_xmit_unlock(struct sock *sk)
+static void icmp_xmit_unlock(struct sock *sk)
 {
 	spin_unlock_bh(&sk->sk_lock.slock);
 }
@@ -235,8 +235,8 @@ static inline void icmp_xmit_unlock(struct sock *sk)
  *	Send an ICMP frame.
  */
 
-static inline bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
-				      struct flowi4 *fl4, int type, int code)
+static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
+			       struct flowi4 *fl4, int type, int code)
 {
 	struct dst_entry *dst = &rt->dst;
 	bool rc = true;
@@ -375,19 +375,22 @@ out_unlock:
 	icmp_xmit_unlock(sk);
 }
 
-static struct rtable *icmp_route_lookup(struct net *net,
-					struct flowi4 *fl4,
-					struct sk_buff *skb_in,
-					const struct iphdr *iph,
-					__be32 saddr, u8 tos,
-					int type, int code,
-					struct icmp_bxm *param)
+struct icmp_send_data {
+	struct icmp_bxm icmp_param;
+	struct ipcm_cookie ipc;
+	struct flowi4 fl4;
+};
+
+static noinline_for_stack struct rtable *
+icmp_route_lookup(struct net *net, struct flowi4 *fl4,
+		  struct sk_buff *skb_in, const struct iphdr *iph,
+		  __be32 saddr, u8 tos, int type, int code,
+		  struct icmp_bxm *param)
 {
 	struct rtable *rt, *rt2;
 	struct flowi4 fl4_dec;
 	int err;
 
-	memset(fl4, 0, sizeof(*fl4));
 	fl4->daddr = (param->replyopts.opt.opt.srr ?
 		      param->replyopts.opt.opt.faddr : iph->saddr);
 	fl4->saddr = saddr;
@@ -482,14 +485,12 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 {
 	struct iphdr *iph;
 	int room;
-	struct icmp_bxm icmp_param;
 	struct rtable *rt = skb_rtable(skb_in);
-	struct ipcm_cookie ipc;
-	struct flowi4 fl4;
 	__be32 saddr;
 	u8  tos;
 	struct net *net;
 	struct sock *sk;
+	struct icmp_send_data *data = NULL;
 
 	if (!rt)
 		goto out;
@@ -585,7 +586,11 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 					   IPTOS_PREC_INTERNETCONTROL) :
 					  iph->tos;
 
-	if (ip_options_echo(&icmp_param.replyopts.opt.opt, skb_in))
+	data = kzalloc(sizeof(*data), GFP_ATOMIC);
+	if (!data)
+		goto out_unlock;
+
+	if (ip_options_echo(&data->icmp_param.replyopts.opt.opt, skb_in))
 		goto out_unlock;
 
 
@@ -593,23 +598,21 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	 *	Prepare data for ICMP header.
 	 */
 
-	icmp_param.data.icmph.type	 = type;
-	icmp_param.data.icmph.code	 = code;
-	icmp_param.data.icmph.un.gateway = info;
-	icmp_param.data.icmph.checksum	 = 0;
-	icmp_param.skb	  = skb_in;
-	icmp_param.offset = skb_network_offset(skb_in);
+	data->icmp_param.data.icmph.type	 = type;
+	data->icmp_param.data.icmph.code	 = code;
+	data->icmp_param.data.icmph.un.gateway = info;
+	data->icmp_param.skb	  = skb_in;
+	data->icmp_param.offset = skb_network_offset(skb_in);
 	inet_sk(sk)->tos = tos;
-	ipc.addr = iph->saddr;
-	ipc.opt = &icmp_param.replyopts.opt;
-	ipc.tx_flags = 0;
+	data->ipc.addr = iph->saddr;
+	data->ipc.opt = &data->icmp_param.replyopts.opt;
 
-	rt = icmp_route_lookup(net, &fl4, skb_in, iph, saddr, tos,
-			       type, code, &icmp_param);
+	rt = icmp_route_lookup(net, &data->fl4, skb_in, iph, saddr, tos,
+			       type, code, &data->icmp_param);
 	if (IS_ERR(rt))
 		goto out_unlock;
 
-	if (!icmpv4_xrlim_allow(net, rt, &fl4, type, code))
+	if (!icmpv4_xrlim_allow(net, rt, &data->fl4, type, code))
 		goto ende;
 
 	/* RFC says return as much as we can without exceeding 576 bytes. */
@@ -617,19 +620,20 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	room = dst_mtu(&rt->dst);
 	if (room > 576)
 		room = 576;
-	room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
+	room -= sizeof(struct iphdr) + data->icmp_param.replyopts.opt.opt.optlen;
 	room -= sizeof(struct icmphdr);
 
-	icmp_param.data_len = skb_in->len - icmp_param.offset;
-	if (icmp_param.data_len > room)
-		icmp_param.data_len = room;
-	icmp_param.head_len = sizeof(struct icmphdr);
+	data->icmp_param.data_len = skb_in->len - data->icmp_param.offset;
+	if (data->icmp_param.data_len > room)
+		data->icmp_param.data_len = room;
+	data->icmp_param.head_len = sizeof(struct icmphdr);
 
-	icmp_push_reply(&icmp_param, &fl4, &ipc, &rt);
+	icmp_push_reply(&data->icmp_param, &data->fl4, &data->ipc, &rt);
 ende:
 	ip_rt_put(rt);
 out_unlock:
 	icmp_xmit_unlock(sk);
+	kfree(data);
 out:;
 }
 EXPORT_SYMBOL(icmp_send);
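
The refactor above is a standard pattern for trimming stack depth: the
large locals in icmp_send() move into one kzalloc'ed struct that is freed
on every exit path. A minimal userspace sketch of the same pattern — the
*_like types below are stand-ins, not the kernel structures:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the large objects icmp_send() used to keep on the stack. */
struct icmp_bxm_like { unsigned char payload[576]; };
struct ipc_like      { unsigned long addr; };
struct fl4_like      { unsigned int daddr, saddr; };

/* Bundling them means one allocation replaces ~600 bytes of stack. */
struct icmp_send_data_like {
	struct icmp_bxm_like icmp_param;
	struct ipc_like ipc;
	struct fl4_like fl4;
};

int send_reply(unsigned int daddr)
{
	/* calloc() here plays the role of kzalloc(..., GFP_ATOMIC). */
	struct icmp_send_data_like *data = calloc(1, sizeof(*data));
	int rc;

	if (!data)
		return -1;		/* kernel patch: goto out_unlock */

	data->fl4.daddr = daddr;
	memcpy(data->icmp_param.payload, "reply", 6);

	rc = data->fl4.daddr ? 0 : -1;
	free(data);			/* kfree(data) on every exit path */
	return rc;
}
```

Note that in the patch the kfree(data) sits after the out_unlock label, so
the allocation is released even when ip_options_echo() or the route lookup
bails out early.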

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
  2013-05-22 13:52         ` Eric Dumazet
@ 2013-05-22 15:40           ` Daniel Petre
  2013-05-23  8:47             ` Daniel Petre
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Petre @ 2013-05-22 15:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 05/22/2013 04:52 PM, Eric Dumazet wrote:
> On Wed, 2013-05-22 at 14:49 +0300, Daniel Petre wrote:
> 
>> Hello Eric,
>> some machines have e1000e others have tg3 (with mtu 1524) then we have
>> few gre tunnels on top of the downlink ethernet and the traffic goes up
>> the router via the second ethernet interface, nothing complicated.
>>
> 
> The crash by the way is happening in icmp_send() called from
> ipv4_link_failure(), called from ip_tunnel_xmit() when IPv6 destination
> cannot be reached.
> 
> Your patch therefore should not 'avoid' the problem ...
> 
> My guess is kernel stack is too small to afford icmp_send() being called
> twice (recursively)
> 
> Could you try :
> 

Hello Eric,
thanks for the patch, we managed to compile and push the kernel live,
but it went into a panic when we shut down the port to the server.

crash> bt
PID: 0      TASK: ffffffff81813420  CPU: 0   COMMAND: "swapper/0"
 #0 [ffff88003fc05df0] machine_kexec at ffffffff81027430
 #1 [ffff88003fc05e40] crash_kexec at ffffffff8107da80
 #2 [ffff88003fc05f10] oops_end at ffffffff81005bf8
 #3 [ffff88003fc05f30] do_stack_segment at ffffffff8100365f
 #4 [ffff88003fc05f50] retint_signal at ffffffff81542d12
    [exception RIP: __kmalloc+144]
    RIP: ffffffff810d0a20  RSP: ffff88003fc03a30  RFLAGS: 00010202
    RAX: 0000000000000000  RBX: ffff88003d672a00  RCX: 00000000003c1bf9
    RDX: 00000000003c1bf8  RSI: 0000000000008020  RDI: 0000000000013ba0
    RBP: 37f5089fae060a80   R8: ffffffff814d5def   R9: ffff88003fc03a80
    R10: 00000000557809c3  R11: ffff88003e1053c0  R12: ffff88003e001240
    R13: 0000000000008020  R14: 0000000000000000  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <STACKFAULT exception stack> ---
 #5 [ffff88003fc03a30] __kmalloc at ffffffff810d0a20
 #6 [ffff88003fc03a58] icmp_send at ffffffff814d5def
 #7 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
 #8 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
 #9 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
#10 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596
#11 [ffff88003fc03ce8] __netif_receive_skb at ffffffff8146ed13
#12 [ffff88003fc03d88] napi_gro_receive at ffffffff8146fc50
#13 [ffff88003fc03da8] e1000_clean_rx_irq at ffffffff813bc67b
#14 [ffff88003fc03e48] e1000e_poll at ffffffff813c3a20
#15 [ffff88003fc03e98] net_rx_action at ffffffff8146f796
#16 [ffff88003fc03ee8] __do_softirq at ffffffff8103ebb9
#17 [ffff88003fc03f38] segment_not_present at ffffffff8154438c
#18 [ffff88003fc03f70] irq_exit at ffffffff8103e9cd
#19 [ffff88003fc03f80] do_IRQ at ffffffff81003f6c
#20 [ffff88003fc03fb0] save_paranoid at ffffffff81542b6a
--- <IRQ stack> ---
#21 [ffffffff81801ea8] save_paranoid at ffffffff81542b6a
    [exception RIP: mwait_idle+95]
    RIP: ffffffff8100ad8f  RSP: ffffffff81801f50  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff8154189e  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffffffff81801fd8  RDI: ffff88003fc0d840
    RBP: ffffffff8185be80   R8: 0000000000000000   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: ffffffff81813420  R14: ffff88003fc11000  R15: ffffffff81813420
    ORIG_RAX: ffffffffffffff1e  CS: 0010  SS: 0018
#22 [ffffffff81801f50] cpu_idle at ffffffff8100b126

---------------------

[  645.650121] e1000e: eth3 NIC Link is Down
[  664.596968] stack segment: 0000 [#1] SMP
[  664.597121] Modules linked in: coretemp
[  664.597264] CPU 0
[  664.597309] Pid: 0, comm: swapper/0 Not tainted 3.8.13 #4 IBM IBM System x3250 M2
[  664.597447] RIP: 0010:[<ffffffff810d0a20>]  [<ffffffff810d0a20>] __kmalloc+0x90/0x180
[  664.597559] RSP: 0018:ffff88003fc03a30  EFLAGS: 00010202
[  664.597621] RAX: 0000000000000000 RBX: ffff88003d672a00 RCX: 00000000003c1bf9
[  664.597687] RDX: 00000000003c1bf8 RSI: 0000000000008020 RDI: 0000000000013ba0
[  664.597752] RBP: 37f5089fae060a80 R08: ffffffff814d5def R09: ffff88003fc03a80
[  664.597817] R10: 00000000557809c3 R11: ffff88003e1053c0 R12: ffff88003e001240
[  664.597882] R13: 0000000000008020 R14: 0000000000000000 R15: 0000000000000001
[  664.597948] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  664.598015] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  664.598077] CR2: 00007fefa9e458e0 CR3: 000000003d848000 CR4: 00000000000007f0
[  664.598143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  664.598208] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  664.598273] Process swapper/0 (pid: 0, threadinfo ffffffff81800000, task ffffffff81813420)
[  664.598340] Stack:
[  664.598396]  00000000c3097855 ffff88003d672a00 0000000000000003 0000000000000001
[  664.598627]  ffff880039ead70e ffffffff814d5def ffff88003ce11840 0000000000000246
[  664.598859]  ffff88003d0b4000 ffffffff814a2beb 0000000000010018 ffff88003e1053c0
[  664.599090] Call Trace:
[  664.599147]  <IRQ>
[  664.599190]
[  664.599289]  [<ffffffff814d5def>] ? icmp_send+0x11f/0x390
[  664.599353]  [<ffffffff814a2beb>] ? __ip_rt_update_pmtu+0xbb/0x110
[  664.599418]  [<ffffffff814a1795>] ? ipv4_link_failure+0x15/0x60
[  664.599482]  [<ffffffff814e78b5>] ? ipgre_tunnel_xmit+0x7f5/0x9f0
[  664.599547]  [<ffffffff8146e032>] ? dev_hard_start_xmit+0x102/0x490
[  664.599612]  [<ffffffff81487d66>] ? sch_direct_xmit+0x106/0x1e0
[  664.599676]  [<ffffffff81487efd>] ? __qdisc_run+0xbd/0x150
[  664.599739]  [<ffffffff8146e5a7>] ? dev_queue_xmit+0x1e7/0x3a0
[  664.600002]  [<ffffffff814ab596>] ? ip_finish_output+0x2e6/0x3e0
[  664.600002]  [<ffffffff8146ed13>] ? __netif_receive_skb+0x5b3/0x7c0
[  664.600002]  [<ffffffff8146f114>] ? netif_receive_skb+0x24/0x80
[  664.600002]  [<ffffffff8146fc50>] ? napi_gro_receive+0x110/0x140
[  664.600002]  [<ffffffff813bc67b>] ? e1000_clean_rx_irq+0x29b/0x490
[  664.600002]  [<ffffffff813c3a20>] ? e1000e_poll+0x90/0x3a0
[  664.600002]  [<ffffffff8146f796>] ? net_rx_action+0xc6/0x1e0
[  664.600002]  [<ffffffff8103ebb9>] ? __do_softirq+0xa9/0x170
[  664.600002]  [<ffffffff8154438c>] ? call_softirq+0x1c/0x30
[  664.600002]  [<ffffffff810047dd>] ? do_softirq+0x4d/0x80
[  664.600002]  [<ffffffff8103e9cd>] ? irq_exit+0x7d/0x90
[  664.600002]  [<ffffffff81003f6c>] ? do_IRQ+0x5c/0xd0
[  664.600002]  [<ffffffff81542b6a>] ? common_interrupt+0x6a/0x6a
[  664.600002]  <EOI>
[  664.600002]
[  664.600002]  [<ffffffff8154189e>] ? __schedule+0x26e/0x5b0
[  664.600002]  [<ffffffff8100ad8f>] ? mwait_idle+0x5f/0x70
[  664.600002]  [<ffffffff8100b126>] ? cpu_idle+0xf6/0x110
[  664.600002]  [<ffffffff81875c58>] ? start_kernel+0x33d/0x348
[  664.600002]  [<ffffffff8187573b>] ? repair_env_string+0x5b/0x5b
[  664.600002]  [<ffffffff8187541d>] ? x86_64_start_kernel+0xee/0xf2
[  664.600002] Code: 28 49 8b 0c 24 65 48 03 0c 25 88 cc 00 00 48 8b 51 08 48 8b 29 48 85 ed 0f 84 d3 00 00 00 49 63 44 24 20 49 8b 3c 24 48 8d 4a 01 <48> 8b 5c 05 00 48 89 e8 65 48 0f c7 0f 0f 94 c0 3c 01 75 c2 49
[  664.600002] RIP  [<ffffffff810d0a20>] __kmalloc+0x90/0x180
[  664.600002]  RSP <ffff88003fc03a30>


>  net/ipv4/icmp.c |   72 ++++++++++++++++++++++++----------------------
>  1 file changed, 38 insertions(+), 34 deletions(-)
> 
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 76e10b4..e33f3b0 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -208,7 +208,7 @@ static struct sock *icmp_sk(struct net *net)
>  	return net->ipv4.icmp_sk[smp_processor_id()];
>  }
>  
> -static inline struct sock *icmp_xmit_lock(struct net *net)
> +static struct sock *icmp_xmit_lock(struct net *net)
>  {
>  	struct sock *sk;
>  
> @@ -226,7 +226,7 @@ static inline struct sock *icmp_xmit_lock(struct net *net)
>  	return sk;
>  }
>  
> -static inline void icmp_xmit_unlock(struct sock *sk)
> +static void icmp_xmit_unlock(struct sock *sk)
>  {
>  	spin_unlock_bh(&sk->sk_lock.slock);
>  }
> @@ -235,8 +235,8 @@ static inline void icmp_xmit_unlock(struct sock *sk)
>   *	Send an ICMP frame.
>   */
>  
> -static inline bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
> -				      struct flowi4 *fl4, int type, int code)
> +static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
> +			       struct flowi4 *fl4, int type, int code)
>  {
>  	struct dst_entry *dst = &rt->dst;
>  	bool rc = true;
> @@ -375,19 +375,22 @@ out_unlock:
>  	icmp_xmit_unlock(sk);
>  }
>  
> -static struct rtable *icmp_route_lookup(struct net *net,
> -					struct flowi4 *fl4,
> -					struct sk_buff *skb_in,
> -					const struct iphdr *iph,
> -					__be32 saddr, u8 tos,
> -					int type, int code,
> -					struct icmp_bxm *param)
> +struct icmp_send_data {
> +	struct icmp_bxm icmp_param;
> +	struct ipcm_cookie ipc;
> +	struct flowi4 fl4;
> +};
> +
> +static noinline_for_stack struct rtable *
> +icmp_route_lookup(struct net *net, struct flowi4 *fl4,
> +		  struct sk_buff *skb_in, const struct iphdr *iph,
> +		  __be32 saddr, u8 tos, int type, int code,
> +		  struct icmp_bxm *param)
>  {
>  	struct rtable *rt, *rt2;
>  	struct flowi4 fl4_dec;
>  	int err;
>  
> -	memset(fl4, 0, sizeof(*fl4));
>  	fl4->daddr = (param->replyopts.opt.opt.srr ?
>  		      param->replyopts.opt.opt.faddr : iph->saddr);
>  	fl4->saddr = saddr;
> @@ -482,14 +485,12 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>  {
>  	struct iphdr *iph;
>  	int room;
> -	struct icmp_bxm icmp_param;
>  	struct rtable *rt = skb_rtable(skb_in);
> -	struct ipcm_cookie ipc;
> -	struct flowi4 fl4;
>  	__be32 saddr;
>  	u8  tos;
>  	struct net *net;
>  	struct sock *sk;
> +	struct icmp_send_data *data = NULL;
>  
>  	if (!rt)
>  		goto out;
> @@ -585,7 +586,11 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>  					   IPTOS_PREC_INTERNETCONTROL) :
>  					  iph->tos;
>  
> -	if (ip_options_echo(&icmp_param.replyopts.opt.opt, skb_in))
> +	data = kzalloc(sizeof(*data), GFP_ATOMIC);
> +	if (!data)
> +		goto out_unlock;
> +
> +	if (ip_options_echo(&data->icmp_param.replyopts.opt.opt, skb_in))
>  		goto out_unlock;
>  
>  
> @@ -593,23 +598,21 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>  	 *	Prepare data for ICMP header.
>  	 */
>  
> -	icmp_param.data.icmph.type	 = type;
> -	icmp_param.data.icmph.code	 = code;
> -	icmp_param.data.icmph.un.gateway = info;
> -	icmp_param.data.icmph.checksum	 = 0;
> -	icmp_param.skb	  = skb_in;
> -	icmp_param.offset = skb_network_offset(skb_in);
> +	data->icmp_param.data.icmph.type	 = type;
> +	data->icmp_param.data.icmph.code	 = code;
> +	data->icmp_param.data.icmph.un.gateway = info;
> +	data->icmp_param.skb	  = skb_in;
> +	data->icmp_param.offset = skb_network_offset(skb_in);
>  	inet_sk(sk)->tos = tos;
> -	ipc.addr = iph->saddr;
> -	ipc.opt = &icmp_param.replyopts.opt;
> -	ipc.tx_flags = 0;
> +	data->ipc.addr = iph->saddr;
> +	data->ipc.opt = &data->icmp_param.replyopts.opt;
>  
> -	rt = icmp_route_lookup(net, &fl4, skb_in, iph, saddr, tos,
> -			       type, code, &icmp_param);
> +	rt = icmp_route_lookup(net, &data->fl4, skb_in, iph, saddr, tos,
> +			       type, code, &data->icmp_param);
>  	if (IS_ERR(rt))
>  		goto out_unlock;
>  
> -	if (!icmpv4_xrlim_allow(net, rt, &fl4, type, code))
> +	if (!icmpv4_xrlim_allow(net, rt, &data->fl4, type, code))
>  		goto ende;
>  
>  	/* RFC says return as much as we can without exceeding 576 bytes. */
> @@ -617,19 +620,20 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>  	room = dst_mtu(&rt->dst);
>  	if (room > 576)
>  		room = 576;
> -	room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
> +	room -= sizeof(struct iphdr) + data->icmp_param.replyopts.opt.opt.optlen;
>  	room -= sizeof(struct icmphdr);
>  
> -	icmp_param.data_len = skb_in->len - icmp_param.offset;
> -	if (icmp_param.data_len > room)
> -		icmp_param.data_len = room;
> -	icmp_param.head_len = sizeof(struct icmphdr);
> +	data->icmp_param.data_len = skb_in->len - data->icmp_param.offset;
> +	if (data->icmp_param.data_len > room)
> +		data->icmp_param.data_len = room;
> +	data->icmp_param.head_len = sizeof(struct icmphdr);
>  
> -	icmp_push_reply(&icmp_param, &fl4, &ipc, &rt);
> +	icmp_push_reply(&data->icmp_param, &data->fl4, &data->ipc, &rt);
>  ende:
>  	ip_rt_put(rt);
>  out_unlock:
>  	icmp_xmit_unlock(sk);
> +	kfree(data);
>  out:;
>  }
>  EXPORT_SYMBOL(icmp_send);
> 
> 


* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
  2013-05-22 15:40           ` Daniel Petre
@ 2013-05-23  8:47             ` Daniel Petre
  2013-05-23 15:53               ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Petre @ 2013-05-23  8:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 05/22/2013 06:40 PM, Daniel Petre wrote:
> On 05/22/2013 04:52 PM, Eric Dumazet wrote:
>> On Wed, 2013-05-22 at 14:49 +0300, Daniel Petre wrote:
>>
>>> Hello Eric,
>>> some machines have e1000e others have tg3 (with mtu 1524) then we have
>>> few gre tunnels on top of the downlink ethernet and the traffic goes up
>>> the router via the second ethernet interface, nothing complicated.
>>>
>>
>> The crash by the way is happening in icmp_send() called from
>> ipv4_link_failure(), called from ip_tunnel_xmit() when IPv6 destination
>> cannot be reached.
>>
>> Your patch therefore should not 'avoid' the problem ...
>>
>> My guess is kernel stack is too small to afford icmp_send() being called
>> twice (recursively)
>>
>> Could you try :
>>
> 
> Hello Eric,
> thanks for the patch, we managed to compile and push the kernel live,
> it went in panic when we shut the port to the server..

Hello again Eric,
we applied the little patch from:
http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
We flapped the link a few times and everything recovered smoothly.

> 
> crash> bt
> PID: 0      TASK: ffffffff81813420  CPU: 0   COMMAND: "swapper/0"
>  #0 [ffff88003fc05df0] machine_kexec at ffffffff81027430
>  #1 [ffff88003fc05e40] crash_kexec at ffffffff8107da80
>  #2 [ffff88003fc05f10] oops_end at ffffffff81005bf8
>  #3 [ffff88003fc05f30] do_stack_segment at ffffffff8100365f
>  #4 [ffff88003fc05f50] retint_signal at ffffffff81542d12
>     [exception RIP: __kmalloc+144]
>     RIP: ffffffff810d0a20  RSP: ffff88003fc03a30  RFLAGS: 00010202
>     RAX: 0000000000000000  RBX: ffff88003d672a00  RCX: 00000000003c1bf9
>     RDX: 00000000003c1bf8  RSI: 0000000000008020  RDI: 0000000000013ba0
>     RBP: 37f5089fae060a80   R8: ffffffff814d5def   R9: ffff88003fc03a80
>     R10: 00000000557809c3  R11: ffff88003e1053c0  R12: ffff88003e001240
>     R13: 0000000000008020  R14: 0000000000000000  R15: 0000000000000001
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> --- <STACKFAULT exception stack> ---
>  #5 [ffff88003fc03a30] __kmalloc at ffffffff810d0a20
>  #6 [ffff88003fc03a58] icmp_send at ffffffff814d5def
>  #7 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
>  #8 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
>  #9 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
> #10 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596
> #11 [ffff88003fc03ce8] __netif_receive_skb at ffffffff8146ed13
> #12 [ffff88003fc03d88] napi_gro_receive at ffffffff8146fc50
> #13 [ffff88003fc03da8] e1000_clean_rx_irq at ffffffff813bc67b
> #14 [ffff88003fc03e48] e1000e_poll at ffffffff813c3a20
> #15 [ffff88003fc03e98] net_rx_action at ffffffff8146f796
> #16 [ffff88003fc03ee8] __do_softirq at ffffffff8103ebb9
> #17 [ffff88003fc03f38] segment_not_present at ffffffff8154438c
> #18 [ffff88003fc03f70] irq_exit at ffffffff8103e9cd
> #19 [ffff88003fc03f80] do_IRQ at ffffffff81003f6c
> #20 [ffff88003fc03fb0] save_paranoid at ffffffff81542b6a
> --- <IRQ stack> ---
> #21 [ffffffff81801ea8] save_paranoid at ffffffff81542b6a
>     [exception RIP: mwait_idle+95]
>     RIP: ffffffff8100ad8f  RSP: ffffffff81801f50  RFLAGS: 00000246
>     RAX: 0000000000000000  RBX: ffffffff8154189e  RCX: 0000000000000000
>     RDX: 0000000000000000  RSI: ffffffff81801fd8  RDI: ffff88003fc0d840
>     RBP: ffffffff8185be80   R8: 0000000000000000   R9: 0000000000000001
>     R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
>     R13: ffffffff81813420  R14: ffff88003fc11000  R15: ffffffff81813420
>     ORIG_RAX: ffffffffffffff1e  CS: 0010  SS: 0018
> #22 [ffffffff81801f50] cpu_idle at ffffffff8100b126
> 
> ---------------------
> 
> [  645.650121] e1000e: eth3 NIC Link is Down
> [  664.596968] stack segment: 0000 [#1] SMP
> [  664.597121] Modules linked in: coretemp
> [  664.597264] CPU 0
> [  664.597309] Pid: 0, comm: swapper/0 Not tainted 3.8.13 #4 IBM IBM System x3250 M2
> [  664.597447] RIP: 0010:[<ffffffff810d0a20>]  [<ffffffff810d0a20>] __kmalloc+0x90/0x180
> [  664.597559] RSP: 0018:ffff88003fc03a30  EFLAGS: 00010202
> [  664.597621] RAX: 0000000000000000 RBX: ffff88003d672a00 RCX: 00000000003c1bf9
> [  664.597687] RDX: 00000000003c1bf8 RSI: 0000000000008020 RDI: 0000000000013ba0
> [  664.597752] RBP: 37f5089fae060a80 R08: ffffffff814d5def R09: ffff88003fc03a80
> [  664.597817] R10: 00000000557809c3 R11: ffff88003e1053c0 R12: ffff88003e001240
> [  664.597882] R13: 0000000000008020 R14: 0000000000000000 R15: 0000000000000001
> [  664.597948] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
> [  664.598015] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  664.598077] CR2: 00007fefa9e458e0 CR3: 000000003d848000 CR4: 00000000000007f0
> [  664.598143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  664.598208] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  664.598273] Process swapper/0 (pid: 0, threadinfo ffffffff81800000, task ffffffff81813420)
> [  664.598340] Stack:
> [  664.598396]  00000000c3097855 ffff88003d672a00 0000000000000003 0000000000000001
> [  664.598627]  ffff880039ead70e ffffffff814d5def ffff88003ce11840 0000000000000246
> [  664.598859]  ffff88003d0b4000 ffffffff814a2beb 0000000000010018 ffff88003e1053c0
> [  664.599090] Call Trace:
> [  664.599147]  <IRQ>
> [  664.599190]
> [  664.599289]  [<ffffffff814d5def>] ? icmp_send+0x11f/0x390
> [  664.599353]  [<ffffffff814a2beb>] ? __ip_rt_update_pmtu+0xbb/0x110
> [  664.599418]  [<ffffffff814a1795>] ? ipv4_link_failure+0x15/0x60
> [  664.599482]  [<ffffffff814e78b5>] ? ipgre_tunnel_xmit+0x7f5/0x9f0
> [  664.599547]  [<ffffffff8146e032>] ? dev_hard_start_xmit+0x102/0x490
> [  664.599612]  [<ffffffff81487d66>] ? sch_direct_xmit+0x106/0x1e0
> [  664.599676]  [<ffffffff81487efd>] ? __qdisc_run+0xbd/0x150
> [  664.599739]  [<ffffffff8146e5a7>] ? dev_queue_xmit+0x1e7/0x3a0
> [  664.600002]  [<ffffffff814ab596>] ? ip_finish_output+0x2e6/0x3e0
> [  664.600002]  [<ffffffff8146ed13>] ? __netif_receive_skb+0x5b3/0x7c0
> [  664.600002]  [<ffffffff8146f114>] ? netif_receive_skb+0x24/0x80
> [  664.600002]  [<ffffffff8146fc50>] ? napi_gro_receive+0x110/0x140
> [  664.600002]  [<ffffffff813bc67b>] ? e1000_clean_rx_irq+0x29b/0x490
> [  664.600002]  [<ffffffff813c3a20>] ? e1000e_poll+0x90/0x3a0
> [  664.600002]  [<ffffffff8146f796>] ? net_rx_action+0xc6/0x1e0
> [  664.600002]  [<ffffffff8103ebb9>] ? __do_softirq+0xa9/0x170
> [  664.600002]  [<ffffffff8154438c>] ? call_softirq+0x1c/0x30
> [  664.600002]  [<ffffffff810047dd>] ? do_softirq+0x4d/0x80
> [  664.600002]  [<ffffffff8103e9cd>] ? irq_exit+0x7d/0x90
> [  664.600002]  [<ffffffff81003f6c>] ? do_IRQ+0x5c/0xd0
> [  664.600002]  [<ffffffff81542b6a>] ? common_interrupt+0x6a/0x6a
> [  664.600002]  <EOI>
> [  664.600002]
> [  664.600002]  [<ffffffff8154189e>] ? __schedule+0x26e/0x5b0
> [  664.600002]  [<ffffffff8100ad8f>] ? mwait_idle+0x5f/0x70
> [  664.600002]  [<ffffffff8100b126>] ? cpu_idle+0xf6/0x110
> [  664.600002]  [<ffffffff81875c58>] ? start_kernel+0x33d/0x348
> [  664.600002]  [<ffffffff8187573b>] ? repair_env_string+0x5b/0x5b
> [  664.600002]  [<ffffffff8187541d>] ? x86_64_start_kernel+0xee/0xf2
> [  664.600002] Code: 28 49 8b 0c 24 65 48 03 0c 25 88 cc 00 00 48 8b 51 08 48 8b 29 48 85 ed 0f 84 d3 00 00 00 49 63 44 24 20 49 8b 3c 24 48 8d 4a 01 <48> 8b 5c 05 00 48 89 e8 65 48 0f c7 0f 0f 94 c0 3c 01 75 c2 49
> [  664.600002] RIP  [<ffffffff810d0a20>] __kmalloc+0x90/0x180
> [  664.600002]  RSP <ffff88003fc03a30>
> 
> 
>>  net/ipv4/icmp.c |   72 ++++++++++++++++++++++++----------------------
>>  1 file changed, 38 insertions(+), 34 deletions(-)
>>
>> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
>> index 76e10b4..e33f3b0 100644
>> --- a/net/ipv4/icmp.c
>> +++ b/net/ipv4/icmp.c
>> @@ -208,7 +208,7 @@ static struct sock *icmp_sk(struct net *net)
>>  	return net->ipv4.icmp_sk[smp_processor_id()];
>>  }
>>  
>> -static inline struct sock *icmp_xmit_lock(struct net *net)
>> +static struct sock *icmp_xmit_lock(struct net *net)
>>  {
>>  	struct sock *sk;
>>  
>> @@ -226,7 +226,7 @@ static inline struct sock *icmp_xmit_lock(struct net *net)
>>  	return sk;
>>  }
>>  
>> -static inline void icmp_xmit_unlock(struct sock *sk)
>> +static void icmp_xmit_unlock(struct sock *sk)
>>  {
>>  	spin_unlock_bh(&sk->sk_lock.slock);
>>  }
>> @@ -235,8 +235,8 @@ static inline void icmp_xmit_unlock(struct sock *sk)
>>   *	Send an ICMP frame.
>>   */
>>  
>> -static inline bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
>> -				      struct flowi4 *fl4, int type, int code)
>> +static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
>> +			       struct flowi4 *fl4, int type, int code)
>>  {
>>  	struct dst_entry *dst = &rt->dst;
>>  	bool rc = true;
>> @@ -375,19 +375,22 @@ out_unlock:
>>  	icmp_xmit_unlock(sk);
>>  }
>>  
>> -static struct rtable *icmp_route_lookup(struct net *net,
>> -					struct flowi4 *fl4,
>> -					struct sk_buff *skb_in,
>> -					const struct iphdr *iph,
>> -					__be32 saddr, u8 tos,
>> -					int type, int code,
>> -					struct icmp_bxm *param)
>> +struct icmp_send_data {
>> +	struct icmp_bxm icmp_param;
>> +	struct ipcm_cookie ipc;
>> +	struct flowi4 fl4;
>> +};
>> +
>> +static noinline_for_stack struct rtable *
>> +icmp_route_lookup(struct net *net, struct flowi4 *fl4,
>> +		  struct sk_buff *skb_in, const struct iphdr *iph,
>> +		  __be32 saddr, u8 tos, int type, int code,
>> +		  struct icmp_bxm *param)
>>  {
>>  	struct rtable *rt, *rt2;
>>  	struct flowi4 fl4_dec;
>>  	int err;
>>  
>> -	memset(fl4, 0, sizeof(*fl4));
>>  	fl4->daddr = (param->replyopts.opt.opt.srr ?
>>  		      param->replyopts.opt.opt.faddr : iph->saddr);
>>  	fl4->saddr = saddr;
>> @@ -482,14 +485,12 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>>  {
>>  	struct iphdr *iph;
>>  	int room;
>> -	struct icmp_bxm icmp_param;
>>  	struct rtable *rt = skb_rtable(skb_in);
>> -	struct ipcm_cookie ipc;
>> -	struct flowi4 fl4;
>>  	__be32 saddr;
>>  	u8  tos;
>>  	struct net *net;
>>  	struct sock *sk;
>> +	struct icmp_send_data *data = NULL;
>>  
>>  	if (!rt)
>>  		goto out;
>> @@ -585,7 +586,11 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>>  					   IPTOS_PREC_INTERNETCONTROL) :
>>  					  iph->tos;
>>  
>> -	if (ip_options_echo(&icmp_param.replyopts.opt.opt, skb_in))
>> +	data = kzalloc(sizeof(*data), GFP_ATOMIC);
>> +	if (!data)
>> +		goto out_unlock;
>> +
>> +	if (ip_options_echo(&data->icmp_param.replyopts.opt.opt, skb_in))
>>  		goto out_unlock;
>>  
>>  
>> @@ -593,23 +598,21 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>>  	 *	Prepare data for ICMP header.
>>  	 */
>>  
>> -	icmp_param.data.icmph.type	 = type;
>> -	icmp_param.data.icmph.code	 = code;
>> -	icmp_param.data.icmph.un.gateway = info;
>> -	icmp_param.data.icmph.checksum	 = 0;
>> -	icmp_param.skb	  = skb_in;
>> -	icmp_param.offset = skb_network_offset(skb_in);
>> +	data->icmp_param.data.icmph.type	 = type;
>> +	data->icmp_param.data.icmph.code	 = code;
>> +	data->icmp_param.data.icmph.un.gateway = info;
>> +	data->icmp_param.skb	  = skb_in;
>> +	data->icmp_param.offset = skb_network_offset(skb_in);
>>  	inet_sk(sk)->tos = tos;
>> -	ipc.addr = iph->saddr;
>> -	ipc.opt = &icmp_param.replyopts.opt;
>> -	ipc.tx_flags = 0;
>> +	data->ipc.addr = iph->saddr;
>> +	data->ipc.opt = &data->icmp_param.replyopts.opt;
>>  
>> -	rt = icmp_route_lookup(net, &fl4, skb_in, iph, saddr, tos,
>> -			       type, code, &icmp_param);
>> +	rt = icmp_route_lookup(net, &data->fl4, skb_in, iph, saddr, tos,
>> +			       type, code, &data->icmp_param);
>>  	if (IS_ERR(rt))
>>  		goto out_unlock;
>>  
>> -	if (!icmpv4_xrlim_allow(net, rt, &fl4, type, code))
>> +	if (!icmpv4_xrlim_allow(net, rt, &data->fl4, type, code))
>>  		goto ende;
>>  
>>  	/* RFC says return as much as we can without exceeding 576 bytes. */
>> @@ -617,19 +620,20 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>>  	room = dst_mtu(&rt->dst);
>>  	if (room > 576)
>>  		room = 576;
>> -	room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
>> +	room -= sizeof(struct iphdr) + data->icmp_param.replyopts.opt.opt.optlen;
>>  	room -= sizeof(struct icmphdr);
>>  
>> -	icmp_param.data_len = skb_in->len - icmp_param.offset;
>> -	if (icmp_param.data_len > room)
>> -		icmp_param.data_len = room;
>> -	icmp_param.head_len = sizeof(struct icmphdr);
>> +	data->icmp_param.data_len = skb_in->len - data->icmp_param.offset;
>> +	if (data->icmp_param.data_len > room)
>> +		data->icmp_param.data_len = room;
>> +	data->icmp_param.head_len = sizeof(struct icmphdr);
>>  
>> -	icmp_push_reply(&icmp_param, &fl4, &ipc, &rt);
>> +	icmp_push_reply(&data->icmp_param, &data->fl4, &data->ipc, &rt);
>>  ende:
>>  	ip_rt_put(rt);
>>  out_unlock:
>>  	icmp_xmit_unlock(sk);
>> +	kfree(data);
>>  out:;
>>  }
>>  EXPORT_SYMBOL(icmp_send);
>>
>>
> 


* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
  2013-05-23  8:47             ` Daniel Petre
@ 2013-05-23 15:53               ` Eric Dumazet
  2013-05-23 16:59                 ` Daniel Petre
  2013-05-23 17:10                 ` Eric Dumazet
  0 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2013-05-23 15:53 UTC (permalink / raw)
  To: Daniel Petre; +Cc: netdev

On Thu, 2013-05-23 at 11:47 +0300, Daniel Petre wrote:

> 
> Hello again Eric,
> we applied the little patch from:
> http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
> we have flapped the link few times and everything recovered smooth.
> 

That's a very good catch; now we have to fix the bug in the right place.


* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
  2013-05-23 15:53               ` Eric Dumazet
@ 2013-05-23 16:59                 ` Daniel Petre
  2013-05-23 17:11                   ` Eric Dumazet
  2013-05-23 17:10                 ` Eric Dumazet
  1 sibling, 1 reply; 17+ messages in thread
From: Daniel Petre @ 2013-05-23 16:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev


On May 23, 2013, at 6:53 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2013-05-23 at 11:47 +0300, Daniel Petre wrote:
> 
>> 
>> Hello again Eric,
>> we applied the little patch from:
>> http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
>> we have flapped the link few times and everything recovered smooth.
>> 
> 
> Thats a very good catch, now we have to fix the bug at the right place.
> 

Hey Eric,
maybe this could work?

--- linux-3.8.13/net/ipv4/ip_gre.c.orig	2013-05-23 19:54:58.317798942 +0300
+++ linux-3.8.13/net/ipv4/ip_gre.c	2013-05-23 19:56:30.290029424 +0300
@@ -882,7 +882,7 @@ static netdev_tx_t ipgre_tunnel_xmit(str
 		if (time_before(jiffies,
 				tunnel->err_time + IPTUNNEL_ERR_TIMEO)) {
 			tunnel->err_count--;
-
+			memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
 			dst_link_failure(skb);
 		} else
 			tunnel->err_count = 0;
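
Why clearing IPCB(skb)->opt here can matter: dst_link_failure() leads to
icmp_send(), and its option-echo step trusts the option length recorded in
the skb control block, which on this path may still hold stale bytes from
an earlier layer. A rough userspace sketch of that hazard, using stand-in
types rather than the kernel structures — the defensive length check below
is for illustration only; the fix under discussion instead zeroes the
control block so the recorded length is simply 0:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the IP options area kept in skb->cb (not the kernel type). */
struct ip_options_like {
	unsigned char optlen;		/* real IP options are at most 40 bytes */
	unsigned char data[40];
};

/* Echo options into a reply buffer: 0 on success, -1 if the recorded
 * length is impossible, e.g. stale garbage left behind by another layer. */
int echo_options(const struct ip_options_like *src,
		 unsigned char *reply, size_t reply_len)
{
	if (src->optlen > sizeof(src->data) || src->optlen > reply_len)
		return -1;		/* refusing avoids overrunning 'reply' */
	memcpy(reply, src->data, src->optlen);
	return 0;
}
```

Without either safeguard, a stale optlen would drive the copy past the
40-byte buffer — on the kernel stack, exactly the stack-protector
corruption seen in the traces above.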


> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
  2013-05-23 15:53               ` Eric Dumazet
  2013-05-23 16:59                 ` Daniel Petre
@ 2013-05-23 17:10                 ` Eric Dumazet
  2013-05-24  9:40                   ` Daniel Petre
  2013-05-24 15:49                   ` [PATCH] ip_tunnel: " Eric Dumazet
  1 sibling, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2013-05-23 17:10 UTC (permalink / raw)
  To: Daniel Petre; +Cc: netdev

On Thu, 2013-05-23 at 08:53 -0700, Eric Dumazet wrote:
> On Thu, 2013-05-23 at 11:47 +0300, Daniel Petre wrote:
> 
> > 
> > Hello again Eric,
> > we applied the little patch from:
> > http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
> > we have flapped the link a few times and everything recovered smoothly.
> > 
> 
> That's a very good catch; now we have to fix the bug at the right place.


Please try the following patch :

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 91d66db..563358e 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -795,6 +795,7 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
 
 	if (dev->type == ARPHRD_ETHER)
 		IPCB(skb)->flags = 0;
+	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
 
 	if (dev->header_ops && dev->type == ARPHRD_IPGRE) {
 		gre_hlen = 0;
@@ -952,7 +953,6 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
 	skb_push(skb, gre_hlen);
 	skb_reset_network_header(skb);
 	skb_set_transport_header(skb, sizeof(*iph));
-	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
 	IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
 			      IPSKB_REROUTED);
 	skb_dst_drop(skb);


* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
  2013-05-23 16:59                 ` Daniel Petre
@ 2013-05-23 17:11                   ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2013-05-23 17:11 UTC (permalink / raw)
  To: Daniel Petre; +Cc: netdev

On Thu, 2013-05-23 at 19:59 +0300, Daniel Petre wrote:
> On May 23, 2013, at 6:53 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > On Thu, 2013-05-23 at 11:47 +0300, Daniel Petre wrote:
> > 
> >> 
> >> Hello again Eric,
> >> we applied the little patch from:
> >> http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
> >> we have flapped the link a few times and everything recovered smoothly.
> >> 
> > 
> > That's a very good catch; now we have to fix the bug at the right place.
> > 
> 
> Hey Eric,
> maybe this could work?
> 
> --- linux-3.8.13/net/ipv4/ip_gre.c.orig	2013-05-23 19:54:58.317798942 +0300
> +++ linux-3.8.13/net/ipv4/ip_gre.c	2013-05-23 19:56:30.290029424 +0300
> @@ -882,7 +882,7 @@ static netdev_tx_t ipgre_tunnel_xmit(str
>  		if (time_before(jiffies,
>  				tunnel->err_time + IPTUNNEL_ERR_TIMEO)) {
>  			tunnel->err_count--;
> -
> +			memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
>  			dst_link_failure(skb);
>  		} else
>  			tunnel->err_count = 0;
> 

Not exactly, please try the patch I sent.

Thanks !


* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
  2013-05-23 17:10                 ` Eric Dumazet
@ 2013-05-24  9:40                   ` Daniel Petre
  2013-05-24 13:47                     ` Eric Dumazet
  2013-05-24 15:49                   ` [PATCH] ip_tunnel: " Eric Dumazet
  1 sibling, 1 reply; 17+ messages in thread
From: Daniel Petre @ 2013-05-24  9:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 05/23/2013 08:10 PM, Eric Dumazet wrote:
> On Thu, 2013-05-23 at 08:53 -0700, Eric Dumazet wrote:
>> On Thu, 2013-05-23 at 11:47 +0300, Daniel Petre wrote:
>>
>>>
>>> Hello again Eric,
>>> we applied the little patch from:
>>> http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
>>> we have flapped the link a few times and everything recovered smoothly.
>>>
>>
>> That's a very good catch; now we have to fix the bug at the right place.
> 
> 
> Please try the following patch :

we have compiled it, tested it a few times, and... nothing evil happens anymore!

> 
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 91d66db..563358e 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -795,6 +795,7 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
>  
>  	if (dev->type == ARPHRD_ETHER)
>  		IPCB(skb)->flags = 0;
> +	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
>  
>  	if (dev->header_ops && dev->type == ARPHRD_IPGRE) {
>  		gre_hlen = 0;
> @@ -952,7 +953,6 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
>  	skb_push(skb, gre_hlen);
>  	skb_reset_network_header(skb);
>  	skb_set_transport_header(skb, sizeof(*iph));
> -	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
>  	IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
>  			      IPSKB_REROUTED);
>  	skb_dst_drop(skb);
> 
> 


* Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
  2013-05-24  9:40                   ` Daniel Petre
@ 2013-05-24 13:47                     ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2013-05-24 13:47 UTC (permalink / raw)
  To: Daniel Petre; +Cc: netdev

On Fri, 2013-05-24 at 12:40 +0300, Daniel Petre wrote:
> On 05/23/2013 08:10 PM, Eric Dumazet wrote:

> 
> we have compiled it, tested it a few times, and... nothing evil happens anymore!

Thanks for the report. I'll cook the various variants of the patch.


* [PATCH] ip_tunnel: fix kernel panic with icmp_dest_unreach
  2013-05-23 17:10                 ` Eric Dumazet
  2013-05-24  9:40                   ` Daniel Petre
@ 2013-05-24 15:49                   ` Eric Dumazet
  2013-05-26  6:27                     ` David Miller
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2013-05-24 15:49 UTC (permalink / raw)
  To: Daniel Petre, David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

Daniel Petre reported crashes in icmp_dst_unreach() with the following call
graph:

#3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
#4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
#5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
#6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
#7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
#8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
#9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
#10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
#11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596

Daniel found a similar problem mentioned in 
 http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html

And indeed this is the root cause: skb->cb[] contains data fooling the IP
stack.

We must clear IPCB in ip_tunnel_xmit() sooner, in case dst_link_failure()
is called; otherwise skb->cb[] might contain garbage from the GSO
segmentation layer.

A similar fix was tested on linux-3.9, but the gre code was refactored in
linux-3.10. I'll send patches for stable kernels as well.

Many thanks to Daniel for providing reports, patches and testing!

Reported-by: Daniel Petre <daniel.petre@rcs-rds.ro>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ip_tunnel.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index e4147ec..be2f8da 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -503,6 +503,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	inner_iph = (const struct iphdr *)skb_inner_network_header(skb);
 
+	memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
 	dst = tnl_params->daddr;
 	if (dst == 0) {
 		/* NBMA tunnel */
@@ -658,7 +659,6 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	skb_dst_drop(skb);
 	skb_dst_set(skb, &rt->dst);
-	memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
 
 	/* Push down and install the IP header. */
 	skb_push(skb, sizeof(struct iphdr));


* Re: [PATCH] ip_tunnel: fix kernel panic with icmp_dest_unreach
  2013-05-24 15:49                   ` [PATCH] ip_tunnel: " Eric Dumazet
@ 2013-05-26  6:27                     ` David Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2013-05-26  6:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: daniel.petre, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 24 May 2013 08:49:58 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Daniel Petre reported crashes in icmp_dst_unreach() with the following call
> graph:
 ...
> Daniel found a similar problem mentioned in 
>  http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
> 
> And indeed this is the root cause: skb->cb[] contains data fooling the IP
> stack.
> 
> We must clear IPCB in ip_tunnel_xmit() sooner, in case dst_link_failure()
> is called; otherwise skb->cb[] might contain garbage from the GSO
> segmentation layer.
> 
> A similar fix was tested on linux-3.9, but the gre code was refactored in
> linux-3.10. I'll send patches for stable kernels as well.
> 
> Many thanks to Daniel for providing reports, patches and testing!
> 
> Reported-by: Daniel Petre <daniel.petre@rcs-rds.ro>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks a lot everyone.


* [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
@ 2013-05-21 17:53 Daniel Petre
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Petre @ 2013-05-21 17:53 UTC (permalink / raw)
  To: linux-kernel

Hello,
This mini patch mitigates the kernel panic with "stack-protector: Kernel stack
is corrupted" when using gre tunnels: when a network interface flaps, ip_gre
sends an icmp dest unreachable which gets into the IP stack and most probably messes things up.

Current ip_gre panics both 3.7.10 and 3.8.13 a few seconds after the ether interface goes down;
I managed to take a look at the vmcore with the crash utility and found icmp_send each time:

crash> bt

PID: 0      TASK: ffffffff81813420  CPU: 0   COMMAND: "swapper/0"
#0 [ffff88003fc03798] machine_kexec at ffffffff81027430
#1 [ffff88003fc037e8] crash_kexec at ffffffff8107da80
#2 [ffff88003fc038b8] panic at ffffffff81540026
#3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
#4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
#5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
#6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
#7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
#8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
#9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
#10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
#11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596
#12 [ffff88003fc03ce8] __netif_receive_skb at ffffffff8146ed13
#13 [ffff88003fc03d88] napi_gro_receive at ffffffff8146fc50
#14 [ffff88003fc03da8] e1000_clean_rx_irq at ffffffff813bc67b
#15 [ffff88003fc03e48] e1000e_poll at ffffffff813c3a20
#16 [ffff88003fc03e98] net_rx_action at ffffffff8146f796
#17 [ffff88003fc03ee8] __do_softirq at ffffffff8103ebb9
#18 [ffff88003fc03f38] call_softirq at ffffffff8154444c
#19 [ffff88003fc03f50] do_softirq at ffffffff810047dd
#20 [ffff88003fc03f80] do_IRQ at ffffffff81003f6c
--- <IRQ stack> ---
#21 [ffffffff81801ea8] ret_from_intr at ffffffff81542c2a
 [exception RIP: mwait_idle+95]
 RIP: ffffffff8100ad8f  RSP: ffffffff81801f50  RFLAGS: 00000246
 RAX: 0000000000000000  RBX: ffffffff8154194e  RCX: 0000000000000000
 RDX: 0000000000000000  RSI: ffffffff81801fd8  RDI: ffff88003fc0d840
 RBP: ffffffff8185be80   R8: 0000000000000000   R9: 0000000000000001
 R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
 R13: ffffffff81813420  R14: ffff88003fc11000  R15: ffffffff81813420
 ORIG_RAX: ffffffffffffff1e  CS: 0010  SS: 0018
#22 [ffffffff81801f50] cpu_idle at ffffffff8100b126

crash> log

[..]

[ 6772.560124] e1000e: eth3 NIC Link is Down
[ 6786.680876] e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
[ 6928.050119] e1000e: eth3 NIC Link is Down
[ 6945.240875] e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
[ 6945.738103] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff814d5fec

[ 6945.738165] Pid: 0, comm: swapper/0 Not tainted 3.8.13 #3
[ 6945.738189] Call Trace:
[ 6945.738212]  <IRQ>  [<ffffffff8154001f>] ? panic+0xbf/0x1c9
[ 6945.738245]  [<ffffffff814ab505>] ? ip_finish_output+0x255/0x3e0
[ 6945.738271]  [<ffffffff814d5fec>] ? icmp_send+0x6bc/0x730
[ 6945.738296]  [<ffffffff81037f77>] ? __stack_chk_fail+0x17/0x30
[ 6945.738320]  [<ffffffff814d5fec>] ? icmp_send+0x6bc/0x730
[ 6945.738344]  [<ffffffff81009470>] ? nommu_map_sg+0x110/0x110
[ 6945.738369]  [<ffffffff815428f9>] ? _raw_spin_lock_bh+0x9/0x30
[ 6945.738393]  [<ffffffff814a1795>] ? ipv4_link_failure+0x15/0x60
[ 6945.738418]  [<ffffffff814e7965>] ? ipgre_tunnel_xmit+0x7f5/0x9f0
[ 6945.738443]  [<ffffffff8146e032>] ? dev_hard_start_xmit+0x102/0x490
[ 6945.738470]  [<ffffffff81487d66>] ? sch_direct_xmit+0x106/0x1e0
[ 6945.738494]  [<ffffffff81487efd>] ? __qdisc_run+0xbd/0x150
[ 6945.738518]  [<ffffffff8146e5a7>] ? dev_queue_xmit+0x1e7/0x3a0
[ 6945.738542]  [<ffffffff814ab596>] ? ip_finish_output+0x2e6/0x3e0
[ 6945.738567]  [<ffffffff8146ed13>] ? __netif_receive_skb+0x5b3/0x7c0
[ 6945.738591]  [<ffffffff8146f114>] ? netif_receive_skb+0x24/0x80
[ 6945.738616]  [<ffffffff8146fc50>] ? napi_gro_receive+0x110/0x140
[ 6945.738642]  [<ffffffff813bc67b>] ? e1000_clean_rx_irq+0x29b/0x490
[ 6945.738667]  [<ffffffff813c3a20>] ? e1000e_poll+0x90/0x3a0
[ 6945.738690]  [<ffffffff8146f796>] ? net_rx_action+0xc6/0x1e0
[ 6945.738715]  [<ffffffff8103ebb9>] ? __do_softirq+0xa9/0x170
[ 6945.738739]  [<ffffffff8154444c>] ? call_softirq+0x1c/0x30
[ 6945.738762]  [<ffffffff810047dd>] ? do_softirq+0x4d/0x80
[ 6945.738786]  [<ffffffff8103e9cd>] ? irq_exit+0x7d/0x90
[ 6945.738808]  [<ffffffff81003f6c>] ? do_IRQ+0x5c/0xd0
[ 6945.738831]  [<ffffffff81542c2a>] ? common_interrupt+0x6a/0x6a
[ 6945.738854]  <EOI>  [<ffffffff8154194e>] ? __schedule+0x26e/0x5b0
[ 6945.738884]  [<ffffffff8100ad8f>] ? mwait_idle+0x5f/0x70
[ 6945.738907]  [<ffffffff8100b126>] ? cpu_idle+0xf6/0x110
[ 6945.738930]  [<ffffffff81875c58>] ? start_kernel+0x33d/0x348
[ 6945.738954]  [<ffffffff8187573b>] ? repair_env_string+0x5b/0x5b
[ 6945.738978]  [<ffffffff8187541d>] ? x86_64_start_kernel+0xee/0xf2


Signed-off-by: Daniel Petre <daniel.petre@rcs-rds.ro>
---

--- linux-3.8.13/net/ipv4/ip_gre.c.orig	2013-05-21 20:28:37.340537935 +0300
+++ linux-3.8.13/net/ipv4/ip_gre.c	2013-05-21 20:32:47.248722835 +0300
@@ -728,7 +728,9 @@ static int ipgre_rcv(struct sk_buff *skb
		gro_cells_receive(&tunnel->gro_cells, skb);
		return 0;
	}
-	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
+	/* don't send icmp destination unreachable if tunnel is down
+	the IP stack gets corrupted and machine panics!
+	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0); */

drop:
	kfree_skb(skb);


end of thread, other threads:[~2013-05-26  6:27 UTC | newest]

Thread overview: 17+ messages
     [not found] <D2CA09D3-A93C-48CF-A23B-DC1B76D66818@rcs-rds.ro>
2013-05-21 21:01 ` [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach Eric Dumazet
2013-05-22  8:36   ` Daniel Petre
2013-05-22 11:37     ` Eric Dumazet
2013-05-22 11:49       ` Daniel Petre
2013-05-22 11:53         ` Eric Dumazet
2013-05-22 13:52         ` Eric Dumazet
2013-05-22 15:40           ` Daniel Petre
2013-05-23  8:47             ` Daniel Petre
2013-05-23 15:53               ` Eric Dumazet
2013-05-23 16:59                 ` Daniel Petre
2013-05-23 17:11                   ` Eric Dumazet
2013-05-23 17:10                 ` Eric Dumazet
2013-05-24  9:40                   ` Daniel Petre
2013-05-24 13:47                     ` Eric Dumazet
2013-05-24 15:49                   ` [PATCH] ip_tunnel: " Eric Dumazet
2013-05-26  6:27                     ` David Miller
2013-05-21 17:53 [PATCH] ip_gre: " Daniel Petre
