All of lore.kernel.org
 help / color / mirror / Atom feed
* Possible bug in net/ipv4/route.c?
@ 2010-07-01 23:00 Sol Kavy
  2010-07-05 12:04   ` Herbert Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Sol Kavy @ 2010-07-01 23:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Ren, Guojun Jin, Murat Sezgin, Sener Ilgen

Found Linux: 2.6.28
Arch: Ubicom32 <not yet pushed>
Project: uCLinux based Router
Test: Bit torrent Stress Test

Note: The top of Linus git net/ipv4/route.c appears to have the same issue.

The following is a patch for clearing out IP options area in an input skb during link failure processing.  Without this patch, the icmp_send() can result in a call to ip_options_echo() where the common buffer area of the skb is incorrectly interpreted.  Depending on the previous use of the skb->cb[], the interpreted option length values can cause stack corruption by copying more than 40 bytes to the output options.

In our case, a driver is using the skb->cb[] area to hold driver specific data.   The driver is not zeroing out the area after use.  I can see three basic solutions:

1) Drivers are not allowed to use the skb->cb[] area at all.  Ubicom should modify the driver to use a different approach.

2) The layer using skb->cb[] should clear this area after use and before handing the skb to another layer.  Ubicom should modify the driver to clear the skb->cb[] area before sending it up the line.

3) Any layer that "uses" the skb->cb[] area must clear the area before use.  In which case, the proposed patch would fix the problem for the ipv4_link_failure().  I believe that this is the correct fix because I see ip_rcv() clears the skb->cb[] before using it.

Can someone confirm that this is the appropriate fix?  If this is documented somewhere, please direct me to the documentation.

Please send email to sol@ubicom.com in addition to posting your response.

Thanks,

Sol Kavy/Murat Sezgin
Ubicom, Inc.

Patch:  

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 125ee64..d13805f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1606,6 +1606,14 @@ static void ipv4_link_failure(struct sk_buff *skb)
{
        struct rtable *rt;

+       /*
+         * Since link failure can be called with skbs from many layers (see arp)
+         * the cb area of the skb must be cleared before use. Because the cb area 
+         * can be formatted according to the caller layer's cb area format and it may cause
+         * corruptions when it is handled in a different network layer.
+         */
+       memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
        icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
        rt = skb->rtable;

The packet is enqueud by:
do_IRQ()->do_softirq()->__do_softirq()->net_rx_action()->ubi32_eth_napi_poll()->ubi32_eth_receive()->__vlan_hwaccel_rx()->netif_receive_skb()->br_handle_frame()->nf_hook_slow()->br_nf_pre_routing_finish()->br_nfr_pre_routing_finish_bridge()->neight_resolve_output()->__neigh_event_send().

The packet is then dequeued by: 
do_IRQ() -> irq_exit() -> do_softirq() -> run_timer_softirq() -> neigh_timer_handler() -> arp_error_report() -> ipv4_link_failure() -> icmp_send() -> ip_options_echo().

Because the Ubicom Ethernet driver overwrites the common buffer area, the enqueued packet contains garbage when casted as an IP options data structure.  This results in ip_options_echo() miss reading the option length information and overwriting memory.  By clearing the skb->cb[] before processing the icmp_send() against the packet, we ensure that ip_options_echo() does not corrupt memory.   




^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Possible bug in net/ipv4/route.c?
  2010-07-01 23:00 Possible bug in net/ipv4/route.c? Sol Kavy
@ 2010-07-05 12:04   ` Herbert Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Herbert Xu @ 2010-07-05 12:04 UTC (permalink / raw)
  To: Sol Kavy
  Cc: linux-kernel, gren, gjin, msezgin, silgen, David S. Miller,
	Stephen Hemminger, netdev

Sol Kavy <skavy@ubicom.com> wrote:
> Found Linux: 2.6.28
> Arch: Ubicom32 <not yet pushed>
> Project: uCLinux based Router
> Test: Bit torrent Stress Test
> 
> Note: The top of Linus git net/ipv4/route.c appears to have the same issue.
> 
> The following is a patch for clearing out IP options area in an input skb during link failure processing.  Without this patch, the icmp_send() can result in a call to ip_options_echo() where the common buffer area of the skb is incorrectly interpreted.  Depending on the previous use of the skb->cb[], the interpreted option length values can cause stack corruption by copying more than 40 bytes to the output options.
> 
> In our case, a driver is using the skb->cb[] area to hold driver specific data.   The driver is not zeroing out the area after use.  I can see three basic solutions:
> 
> 1) Drivers are not allowed to use the skb->cb[] area at all.  Ubicom should modify the driver to use a different approach.
> 
> 2) The layer using skb->cb[] should clear this area after use and before handing the skb to another layer.  Ubicom should modify the driver to clear the skb->cb[] area before sending it up the line.
> 
> 3) Any layer that "uses" the skb->cb[] area must clear the area before use.  In which case, the proposed patch would fix the problem for the ipv4_link_failure().  I believe that this is the correct fix because I see ip_rcv() clears the skb->cb[] before using it.
> 
> Can someone confirm that this is the appropriate fix?  If this is documented somewhere, please direct me to the documentation.

Thanks for the report!
 
> Patch:  
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 125ee64..d13805f 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1606,6 +1606,14 @@ static void ipv4_link_failure(struct sk_buff *skb)
> {
>         struct rtable *rt;
> 
> +       /*
> +         * Since link failure can be called with skbs from many layers (see arp)
> +         * the cb area of the skb must be cleared before use. Because the cb area 
> +         * can be formatted according to the caller layer's cb area format and it may cause
> +         * corruptions when it is handled in a different network layer.
> +         */
> +       memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
>         icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
>         rt = skb->rtable;
> 
> The packet is enqueud by:
> do_IRQ()->do_softirq()->__do_softirq()->net_rx_action()->ubi32_eth_napi_poll()->ubi32_eth_receive()->__vlan_hwaccel_rx()->netif_receive_skb()->br_handle_frame()->nf_hook_slow()->br_nf_pre_routing_finish()->br_nfr_pre_routing_finish_bridge()->neight_resolve_output()->__neigh_event_send().
> 
> The packet is then dequeued by: 
> do_IRQ() -> irq_exit() -> do_softirq() -> run_timer_softirq() -> neigh_timer_handler() -> arp_error_report() -> ipv4_link_failure() -> icmp_send() -> ip_options_echo().
> 
> Because the Ubicom Ethernet driver overwrites the common buffer area, the enqueued packet contains garbage when casted as an IP options data structure.  This results in ip_options_echo() miss reading the option length information and overwriting memory.  By clearing the skb->cb[] before processing the icmp_send() against the packet, we ensure that ip_options_echo() does not corrupt memory.   

Generally this area should be cleared on entry to each stack
intending on using it.  So in this case, I'd point the finger
of blame at the bridge stack for letting this packet into the
IP stack through the back entrance without taking the proper
precautions.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible bug in net/ipv4/route.c?
@ 2010-07-05 12:04   ` Herbert Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Herbert Xu @ 2010-07-05 12:04 UTC (permalink / raw)
  To: Sol Kavy
  Cc: linux-kernel, gren, gjin, msezgin, silgen, David S. Miller,
	Stephen Hemminger, netdev

Sol Kavy <skavy@ubicom.com> wrote:
> Found Linux: 2.6.28
> Arch: Ubicom32 <not yet pushed>
> Project: uCLinux based Router
> Test: Bit torrent Stress Test
> 
> Note: The top of Linus git net/ipv4/route.c appears to have the same issue.
> 
> The following is a patch for clearing out IP options area in an input skb during link failure processing.  Without this patch, the icmp_send() can result in a call to ip_options_echo() where the common buffer area of the skb is incorrectly interpreted.  Depending on the previous use of the skb->cb[], the interpreted option length values can cause stack corruption by copying more than 40 bytes to the output options.
> 
> In our case, a driver is using the skb->cb[] area to hold driver specific data.   The driver is not zeroing out the area after use.  I can see three basic solutions:
> 
> 1) Drivers are not allowed to use the skb->cb[] area at all.  Ubicom should modify the driver to use a different approach.
> 
> 2) The layer using skb->cb[] should clear this area after use and before handing the skb to another layer.  Ubicom should modify the driver to clear the skb->cb[] area before sending it up the line.
> 
> 3) Any layer that "uses" the skb->cb[] area must clear the area before use.  In which case, the proposed patch would fix the problem for the ipv4_link_failure().  I believe that this is the correct fix because I see ip_rcv() clears the skb->cb[] before using it.
> 
> Can someone confirm that this is the appropriate fix?  If this is documented somewhere, please direct me to the documentation.

Thanks for the report!
 
> Patch:  
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 125ee64..d13805f 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1606,6 +1606,14 @@ static void ipv4_link_failure(struct sk_buff *skb)
> {
>         struct rtable *rt;
> 
> +       /*
> +         * Since link failure can be called with skbs from many layers (see arp)
> +         * the cb area of the skb must be cleared before use. Because the cb area 
> +         * can be formatted according to the caller layer's cb area format and it may cause
> +         * corruptions when it is handled in a different network layer.
> +         */
> +       memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
>         icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
>         rt = skb->rtable;
> 
> The packet is enqueud by:
> do_IRQ()->do_softirq()->__do_softirq()->net_rx_action()->ubi32_eth_napi_poll()->ubi32_eth_receive()->__vlan_hwaccel_rx()->netif_receive_skb()->br_handle_frame()->nf_hook_slow()->br_nf_pre_routing_finish()->br_nfr_pre_routing_finish_bridge()->neight_resolve_output()->__neigh_event_send().
> 
> The packet is then dequeued by: 
> do_IRQ() -> irq_exit() -> do_softirq() -> run_timer_softirq() -> neigh_timer_handler() -> arp_error_report() -> ipv4_link_failure() -> icmp_send() -> ip_options_echo().
> 
> Because the Ubicom Ethernet driver overwrites the common buffer area, the enqueued packet contains garbage when casted as an IP options data structure.  This results in ip_options_echo() miss reading the option length information and overwriting memory.  By clearing the skb->cb[] before processing the icmp_send() against the packet, we ensure that ip_options_echo() does not corrupt memory.   

Generally this area should be cleared on entry to each stack
intending on using it.  So in this case, I'd point the finger
of blame at the bridge stack for letting this packet into the
IP stack through the back entrance without taking the proper
precautions.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Possible bug in net/ipv4/route.c?
  2010-07-05 12:04   ` Herbert Xu
  (?)
@ 2010-07-05 16:37   ` Sol Kavy
  -1 siblings, 0 replies; 13+ messages in thread
From: Sol Kavy @ 2010-07-05 16:37 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Greg Ren, Guojun Jin, Murat Sezgin, Sener Ilgen, David S. Miller,
	Stephen Hemminger, netdev

So is there some place else that should have the clear instead of ipv4_link_failure()?

Sol

-----Original Message-----
From: Herbert Xu [mailto:herbert@gondor.apana.org.au] 
Sent: Monday, July 05, 2010 5:04 AM
To: Sol Kavy
Cc: linux-kernel@vger.kernel.org; Greg Ren; Guojun Jin; Murat Sezgin; Sener Ilgen; David S. Miller; Stephen Hemminger; netdev@vger.kernel.org
Subject: Re: Possible bug in net/ipv4/route.c?

Sol Kavy <skavy@ubicom.com> wrote:
> Found Linux: 2.6.28
> Arch: Ubicom32 <not yet pushed>
> Project: uCLinux based Router
> Test: Bit torrent Stress Test
> 
> Note: The top of Linus git net/ipv4/route.c appears to have the same issue.
> 
> The following is a patch for clearing out IP options area in an input skb during link failure processing.  Without this patch, the icmp_send() can result in a call to ip_options_echo() where the common buffer area of the skb is incorrectly interpreted.  Depending on the previous use of the skb->cb[], the interpreted option length values can cause stack corruption by copying more than 40 bytes to the output options.
> 
> In our case, a driver is using the skb->cb[] area to hold driver specific data.   The driver is not zeroing out the area after use.  I can see three basic solutions:
> 
> 1) Drivers are not allowed to use the skb->cb[] area at all.  Ubicom should modify the driver to use a different approach.
> 
> 2) The layer using skb->cb[] should clear this area after use and before handing the skb to another layer.  Ubicom should modify the driver to clear the skb->cb[] area before sending it up the line.
> 
> 3) Any layer that "uses" the skb->cb[] area must clear the area before use.  In which case, the proposed patch would fix the problem for the ipv4_link_failure().  I believe that this is the correct fix because I see ip_rcv() clears the skb->cb[] before using it.
> 
> Can someone confirm that this is the appropriate fix?  If this is documented somewhere, please direct me to the documentation.

Thanks for the report!
 
> Patch:
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 
> 125ee64..d13805f 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1606,6 +1606,14 @@ static void ipv4_link_failure(struct sk_buff 
> *skb) {
>         struct rtable *rt;
> 
> +       /*
> +         * Since link failure can be called with skbs from many 
> +layers (see arp)
> +         * the cb area of the skb must be cleared before use. Because 
> +the cb area
> +         * can be formatted according to the caller layer's cb area 
> +format and it may cause
> +         * corruptions when it is handled in a different network layer.
> +         */
> +       memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
>         icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
>         rt = skb->rtable;
> 
> The packet is enqueud by:
> do_IRQ()->do_softirq()->__do_softirq()->net_rx_action()->ubi32_eth_napi_poll()->ubi32_eth_receive()->__vlan_hwaccel_rx()->netif_receive_skb()->br_handle_frame()->nf_hook_slow()->br_nf_pre_routing_finish()->br_nfr_pre_routing_finish_bridge()->neight_resolve_output()->__neigh_event_send().
> 
> The packet is then dequeued by: 
> do_IRQ() -> irq_exit() -> do_softirq() -> run_timer_softirq() -> neigh_timer_handler() -> arp_error_report() -> ipv4_link_failure() -> icmp_send() -> ip_options_echo().
> 
> Because the Ubicom Ethernet driver overwrites the common buffer area, 
> the enqueued packet contains garbage when casted as an IP options data structure.  This results in ip_options_echo() miss reading the option length information and overwriting memory.  By clearing the skb->cb[] before processing the icmp_send() against the packet, we ensure that ip_options_echo() does not corrupt memory.

Generally this area should be cleared on entry to each stack intending on using it.  So in this case, I'd point the finger of blame at the bridge stack for letting this packet into the IP stack through the back entrance without taking the proper precautions.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible bug in net/ipv4/route.c?
  2010-07-05 12:04   ` Herbert Xu
@ 2010-07-05 18:03     ` Stephen Hemminger
  -1 siblings, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2010-07-05 18:03 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Sol Kavy, linux-kernel, gren, gjin, msezgin, silgen,
	David S. Miller, netdev

On Mon, 5 Jul 2010 20:04:13 +0800
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Sol Kavy <skavy@ubicom.com> wrote:
> > Found Linux: 2.6.28
> > Arch: Ubicom32 <not yet pushed>
> > Project: uCLinux based Router
> > Test: Bit torrent Stress Test
> > 
> > Note: The top of Linus git net/ipv4/route.c appears to have the same issue.
> > 
> > The following is a patch for clearing out IP options area in an input skb during link failure processing.  Without this patch, the icmp_send() can result in a call to ip_options_echo() where the common buffer area of the skb is incorrectly interpreted.  Depending on the previous use of the skb->cb[], the interpreted option length values can cause stack corruption by copying more than 40 bytes to the output options.
> > 
> > In our case, a driver is using the skb->cb[] area to hold driver specific data.   The driver is not zeroing out the area after use.  I can see three basic solutions:
> > 
> > 1) Drivers are not allowed to use the skb->cb[] area at all.  Ubicom should modify the driver to use a different approach.
> > 
> > 2) The layer using skb->cb[] should clear this area after use and before handing the skb to another layer.  Ubicom should modify the driver to clear the skb->cb[] area before sending it up the line.
> > 
> > 3) Any layer that "uses" the skb->cb[] area must clear the area before use.  In which case, the proposed patch would fix the problem for the ipv4_link_failure().  I believe that this is the correct fix because I see ip_rcv() clears the skb->cb[] before using it.
> > 
> > Can someone confirm that this is the appropriate fix?  If this is documented somewhere, please direct me to the documentation.
> 
> Thanks for the report!
>  
> > Patch:  
> > 
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 125ee64..d13805f 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -1606,6 +1606,14 @@ static void ipv4_link_failure(struct sk_buff *skb)
> > {
> >         struct rtable *rt;
> > 
> > +       /*
> > +         * Since link failure can be called with skbs from many layers (see arp)
> > +         * the cb area of the skb must be cleared before use. Because the cb area 
> > +         * can be formatted according to the caller layer's cb area format and it may cause
> > +         * corruptions when it is handled in a different network layer.
> > +         */
> > +       memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
> >         icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
> >         rt = skb->rtable;
> > 
> > The packet is enqueud by:
> > do_IRQ()->do_softirq()->__do_softirq()->net_rx_action()->ubi32_eth_napi_poll()->ubi32_eth_receive()->__vlan_hwaccel_rx()->netif_receive_skb()->br_handle_frame()->nf_hook_slow()->br_nf_pre_routing_finish()->br_nfr_pre_routing_finish_bridge()->neight_resolve_output()->__neigh_event_send().
> > 
> > The packet is then dequeued by: 
> > do_IRQ() -> irq_exit() -> do_softirq() -> run_timer_softirq() -> neigh_timer_handler() -> arp_error_report() -> ipv4_link_failure() -> icmp_send() -> ip_options_echo().
> > 
> > Because the Ubicom Ethernet driver overwrites the common buffer area, the enqueued packet contains garbage when casted as an IP options data structure.  This results in ip_options_echo() miss reading the option length information and overwriting memory.  By clearing the skb->cb[] before processing the icmp_send() against the packet, we ensure that ip_options_echo() does not corrupt memory.   
> 
> Generally this area should be cleared on entry to each stack
> intending on using it.  So in this case, I'd point the finger
> of blame at the bridge stack for letting this packet into the
> IP stack through the back entrance without taking the proper
> precautions.

The CB is used in two places in the bridge code.
1) The recent multicast changes (IGMP)
2) Netfilter / ebtables to store header.

Ebtables is okay because the only part of the cb[] it uses
is the incoming device (brdev) which is always initialized coming into
the bridge: br_dev_xmit and br_handle_frame_finish

The IGMP code looks buggy.

/* net device transmit always called with no BH (preempt_disabled) */
netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
{
... [A]
	BR_INPUT_SKB_CB(skb)->brdev = dev;

	skb_reset_mac_header(skb);
	skb_pull(skb, ETH_HLEN);

	if (is_multicast_ether_addr(dest)) {
		if (br_multicast_rcv(br, NULL, skb))
			goto out;

		mdst = br_mdb_get(br, skb);
		if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb))
			... [B]

The problem is that br_dev_xmit is looking at flags in the CB[] that
are uninitialized.

if br_dev_xmit cleared the CB at [A] the mrouters_only would always be zero
at [B].

Where should the mrouters and igmp_only fields in skb be initialized?



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible bug in net/ipv4/route.c?
@ 2010-07-05 18:03     ` Stephen Hemminger
  0 siblings, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2010-07-05 18:03 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Sol Kavy, linux-kernel, gren, gjin, msezgin, silgen,
	David S. Miller, netdev

On Mon, 5 Jul 2010 20:04:13 +0800
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Sol Kavy <skavy@ubicom.com> wrote:
> > Found Linux: 2.6.28
> > Arch: Ubicom32 <not yet pushed>
> > Project: uCLinux based Router
> > Test: Bit torrent Stress Test
> > 
> > Note: The top of Linus git net/ipv4/route.c appears to have the same issue.
> > 
> > The following is a patch for clearing out IP options area in an input skb during link failure processing.  Without this patch, the icmp_send() can result in a call to ip_options_echo() where the common buffer area of the skb is incorrectly interpreted.  Depending on the previous use of the skb->cb[], the interpreted option length values can cause stack corruption by copying more than 40 bytes to the output options.
> > 
> > In our case, a driver is using the skb->cb[] area to hold driver specific data.   The driver is not zeroing out the area after use.  I can see three basic solutions:
> > 
> > 1) Drivers are not allowed to use the skb->cb[] area at all.  Ubicom should modify the driver to use a different approach.
> > 
> > 2) The layer using skb->cb[] should clear this area after use and before handing the skb to another layer.  Ubicom should modify the driver to clear the skb->cb[] area before sending it up the line.
> > 
> > 3) Any layer that "uses" the skb->cb[] area must clear the area before use.  In which case, the proposed patch would fix the problem for the ipv4_link_failure().  I believe that this is the correct fix because I see ip_rcv() clears the skb->cb[] before using it.
> > 
> > Can someone confirm that this is the appropriate fix?  If this is documented somewhere, please direct me to the documentation.
> 
> Thanks for the report!
>  
> > Patch:  
> > 
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 125ee64..d13805f 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -1606,6 +1606,14 @@ static void ipv4_link_failure(struct sk_buff *skb)
> > {
> >         struct rtable *rt;
> > 
> > +       /*
> > +         * Since link failure can be called with skbs from many layers (see arp)
> > +         * the cb area of the skb must be cleared before use. Because the cb area 
> > +         * can be formatted according to the caller layer's cb area format and it may cause
> > +         * corruptions when it is handled in a different network layer.
> > +         */
> > +       memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
> >         icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
> >         rt = skb->rtable;
> > 
> > The packet is enqueud by:
> > do_IRQ()->do_softirq()->__do_softirq()->net_rx_action()->ubi32_eth_napi_poll()->ubi32_eth_receive()->__vlan_hwaccel_rx()->netif_receive_skb()->br_handle_frame()->nf_hook_slow()->br_nf_pre_routing_finish()->br_nfr_pre_routing_finish_bridge()->neight_resolve_output()->__neigh_event_send().
> > 
> > The packet is then dequeued by: 
> > do_IRQ() -> irq_exit() -> do_softirq() -> run_timer_softirq() -> neigh_timer_handler() -> arp_error_report() -> ipv4_link_failure() -> icmp_send() -> ip_options_echo().
> > 
> > Because the Ubicom Ethernet driver overwrites the common buffer area, the enqueued packet contains garbage when casted as an IP options data structure.  This results in ip_options_echo() miss reading the option length information and overwriting memory.  By clearing the skb->cb[] before processing the icmp_send() against the packet, we ensure that ip_options_echo() does not corrupt memory.   
> 
> Generally this area should be cleared on entry to each stack
> intending on using it.  So in this case, I'd point the finger
> of blame at the bridge stack for letting this packet into the
> IP stack through the back entrance without taking the proper
> precautions.

The CB is used in two places in the bridge code.
1) The recent multicast changes (IGMP)
2) Netfilter / ebtables to store header.

Ebtables is okay because the only part of the cb[] it uses
is the incoming device (brdev) which is always initialized coming into
the bridge: br_dev_xmit and br_handle_frame_finish

The IGMP code looks buggy.

/* net device transmit always called with no BH (preempt_disabled) */
netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
{
... [A]
	BR_INPUT_SKB_CB(skb)->brdev = dev;

	skb_reset_mac_header(skb);
	skb_pull(skb, ETH_HLEN);

	if (is_multicast_ether_addr(dest)) {
		if (br_multicast_rcv(br, NULL, skb))
			goto out;

		mdst = br_mdb_get(br, skb);
		if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb))
			... [B]

The problem is that br_dev_xmit is looking at flags in the CB[] that
are uninitialized.

if br_dev_xmit cleared the CB at [A] the mrouters_only would always be zero
at [B].

Where should the mrouters and igmp_only fields in skb be initialized?



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible bug in net/ipv4/route.c?
  2010-07-05 18:03     ` Stephen Hemminger
@ 2010-07-06  0:28       ` Herbert Xu
  -1 siblings, 0 replies; 13+ messages in thread
From: Herbert Xu @ 2010-07-06  0:28 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Sol Kavy, linux-kernel, gren, gjin, msezgin, silgen,
	David S. Miller, netdev

On Mon, Jul 05, 2010 at 11:03:28AM -0700, Stephen Hemminger wrote:
>
> The problem is that br_dev_xmit is looking at flags in the CB[] that
> are uninitialized.
> 
> if br_dev_xmit cleared the CB at [A] the mrouters_only would always be zero
> at [B].
> 
> Where should the mrouters and igmp_only fields in skb be initialized?

They are initialised in br_multicast_rcv.

Anyway, this isn't the problem here.  The problem is that before
bridge netfilter passes a packet through IPv4 through ARP, it must
ensure that the cb used by IPv4 is cleared.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible bug in net/ipv4/route.c?
@ 2010-07-06  0:28       ` Herbert Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Herbert Xu @ 2010-07-06  0:28 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Sol Kavy, linux-kernel, gren, gjin, msezgin, silgen,
	David S. Miller, netdev

On Mon, Jul 05, 2010 at 11:03:28AM -0700, Stephen Hemminger wrote:
>
> The problem is that br_dev_xmit is looking at flags in the CB[] that
> are uninitialized.
> 
> if br_dev_xmit cleared the CB at [A] the mrouters_only would always be zero
> at [B].
> 
> Where should the mrouters and igmp_only fields in skb be initialized?

They are initialised in br_multicast_rcv.

Anyway, this isn't the problem here.  The problem is that before
bridge netfilter passes a packet through IPv4 through ARP, it must
ensure that the cb used by IPv4 is cleared.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible bug in net/ipv4/route.c?
       [not found]       ` <CB2DD11991B27C4F99935E6229450D3203950BE5@STORK.scenix.com>
@ 2010-07-06  7:29           ` Herbert Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Herbert Xu @ 2010-07-06  7:29 UTC (permalink / raw)
  To: Guojun Jin
  Cc: Stephen Hemminger, Sol Kavy, linux-kernel, Greg Ren,
	Murat Sezgin, Sener Ilgen, David S. Miller, netdev

On Mon, Jul 05, 2010 at 11:06:30PM -0700, Guojun Jin wrote:
>
> Would you please point to where is the best place that CB should be cleared for this case?
> 
> I also wonder why IP packet is routed through ARP queue for ICMP delievry.

I was hoping that you guys would figure this out :)

Anyway, this patch should fix the crash.

bridge: Clear IPCB before possible entry into IP stack

The bridge protocol lives dangerously by having incestuous relations
with the IP stack.  In this instance an abomination has been created
where a bogus IPCB area from a bridged packet leads to a crash in
the IP stack because it's interpreted as IP options.

This patch papers over the problem by clearing the IPCB area in that
particular spot.  To fix this properly we'd also need to parse any
IP options if present but I'm way too lazy for that.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 4442099..8fb75f8 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -591,6 +591,9 @@ static unsigned int br_nf_pre_routing(unsigned int hook, struct sk_buff *skb,
 
 	pskb_trim_rcsum(skb, len);
 
+	/* BUG: Should really parse the IP options here. */
+	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+
 	nf_bridge_put(skb->nf_bridge);
 	if (!nf_bridge_alloc(skb))
 		return NF_DROP;

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Possible bug in net/ipv4/route.c?
@ 2010-07-06  7:29           ` Herbert Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Herbert Xu @ 2010-07-06  7:29 UTC (permalink / raw)
  To: Guojun Jin
  Cc: Stephen Hemminger, Sol Kavy, linux-kernel, Greg Ren,
	Murat Sezgin, Sener Ilgen, David S. Miller, netdev

On Mon, Jul 05, 2010 at 11:06:30PM -0700, Guojun Jin wrote:
>
> Would you please point to where is the best place that CB should be cleared for this case?
> 
> I also wonder why IP packet is routed through ARP queue for ICMP delievry.

I was hoping that you guys would figure this out :)

Anyway, this patch should fix the crash.

bridge: Clear IPCB before possible entry into IP stack

The bridge protocol lives dangerously by having incestuous relations
with the IP stack.  In this instance an abomination has been created
where a bogus IPCB area from a bridged packet leads to a crash in
the IP stack because it's interpreted as IP options.

This patch papers over the problem by clearing the IPCB area in that
particular spot.  To fix this properly we'd also need to parse any
IP options if present but I'm way too lazy for that.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 4442099..8fb75f8 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -591,6 +591,9 @@ static unsigned int br_nf_pre_routing(unsigned int hook, struct sk_buff *skb,
 
 	pskb_trim_rcsum(skb, len);
 
+	/* BUG: Should really parse the IP options here. */
+	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+
 	nf_bridge_put(skb->nf_bridge);
 	if (!nf_bridge_alloc(skb))
 		return NF_DROP;

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Possible bug in net/ipv4/route.c?
  2010-07-06  7:29           ` Herbert Xu
@ 2010-07-07 21:46             ` David Miller
  -1 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2010-07-07 21:46 UTC (permalink / raw)
  To: herbert
  Cc: gjin, shemminger, skavy, linux-kernel, gren, msezgin, silgen, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 6 Jul 2010 15:29:28 +0800

> bridge: Clear IPCB before possible entry into IP stack
> 
> The bridge protocol lives dangerously by having incestuous relations
> with the IP stack.  In this instance an abomination has been created
> where a bogus IPCB area from a bridged packet leads to a crash in
> the IP stack because it's interpreted as IP options.
> 
> This patch papers over the problem by clearing the IPCB area in that
> particular spot.  To fix this properly we'd also need to parse any
> IP options if present but I'm way too lazy for that.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks a lot!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible bug in net/ipv4/route.c?
@ 2010-07-07 21:46             ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2010-07-07 21:46 UTC (permalink / raw)
  To: herbert
  Cc: gjin, shemminger, skavy, linux-kernel, gren, msezgin, silgen, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 6 Jul 2010 15:29:28 +0800

> bridge: Clear IPCB before possible entry into IP stack
> 
> The bridge protocol lives dangerously by having incestuous relations
> with the IP stack.  In this instance an abomination has been created
> where a bogus IPCB area from a bridged packet leads to a crash in
> the IP stack because it's interpreted as IP options.
> 
> This patch papers over the problem by clearing the IPCB area in that
> particular spot.  To fix this properly we'd also need to parse any
> IP options if present but I'm way too lazy for that.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks a lot!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Possible bug in net/ipv4/route.c?
@ 2010-07-02  0:19 Sol Kavy
  0 siblings, 0 replies; 13+ messages in thread
From: Sol Kavy @ 2010-07-02  0:19 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Ren, Guojun Jin, Murat Sezgin, Sener Ilgen

Found Linux: 2.6.28
Arch: Ubicom32 <not yet pushed>
Project: uCLinux based Router
Test: Bit torrent Stress Test

Note: The top of Linus git net/ipv4/route.c appears to have the same issue.

The following is a patch for clearing out IP options area in an input skb during link failure processing.  Without this patch, the icmp_send() can result in a call to ip_options_echo() where the common buffer area of the skb is incorrectly interpreted.  Depending on the previous use of the skb->cb[], the interpreted option length values can cause stack corruption by copying more than 40 bytes to the output options.

In our case, a driver is using the skb->cb[] area to hold driver specific data.   The driver is not zeroing out the area after use.  I can see three basic solutions:

1) Drivers are not allowed to use the skb->cb[] area at all.  Ubicom should modify the driver to use a different approach.

2) The layer using skb->cb[] should clear this area after use and before handing the skb to another layer.  Ubicom should modify the driver to clear the skb->cb[] area before sending it up the line.

3) Any layer that "uses" the skb->cb[] area must clear the area before use.  In which case, the proposed patch would fix the problem for the ipv4_link_failure().  I believe that this is the correct fix because I see ip_rcv() clears the skb->cb[] before using it.

Can someone confirm that this is the appropriate fix?  If this is documented somewhere, please direct me to the documentation.

Please send email to sol@ubicom.com in addition to posting your response.

Thanks,

Sol Kavy/Murat Sezgin
Ubicom, Inc.

Patch:  

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 125ee64..d13805f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1606,6 +1606,14 @@ static void ipv4_link_failure(struct sk_buff *skb)
{
        struct rtable *rt;

+       /*
+         * Since link failure can be called with skbs from many layers (see arp)
+         * the cb area of the skb must be cleared before use. Because the cb area 
+         * can be formatted according to the caller layer's cb area format and it may cause
+         * corruptions when it is handled in a different network layer.
+         */
+       memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
        icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
        rt = skb->rtable;

The packet is enqueud by:
do_IRQ()->do_softirq()->__do_softirq()->net_rx_action()->ubi32_eth_napi_poll()->ubi32_eth_receive()->__vlan_hwaccel_rx()->netif_receive_skb()->br_handle_frame()->nf_hook_slow()->br_nf_pre_routing_finish()->br_nfr_pre_routing_finish_bridge()->neight_resolve_output()->__neigh_event_send().

The packet is then dequeued by: 
do_IRQ() -> irq_exit() -> do_softirq() -> run_timer_softirq() -> neigh_timer_handler() -> arp_error_report() -> ipv4_link_failure() -> icmp_send() -> ip_options_echo().

Because the Ubicom Ethernet driver overwrites the common buffer area, the enqueued packet contains garbage when casted as an IP options data structure.  This results in ip_options_echo() miss reading the option length information and overwriting memory.  By clearing the skb->cb[] before processing the icmp_send() against the packet, we ensure that ip_options_echo() does not corrupt memory.   




^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-07-07 21:45 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-01 23:00 Possible bug in net/ipv4/route.c? Sol Kavy
2010-07-05 12:04 ` Herbert Xu
2010-07-05 12:04   ` Herbert Xu
2010-07-05 16:37   ` Sol Kavy
2010-07-05 18:03   ` Stephen Hemminger
2010-07-05 18:03     ` Stephen Hemminger
2010-07-06  0:28     ` Herbert Xu
2010-07-06  0:28       ` Herbert Xu
     [not found]       ` <CB2DD11991B27C4F99935E6229450D3203950BE5@STORK.scenix.com>
2010-07-06  7:29         ` Herbert Xu
2010-07-06  7:29           ` Herbert Xu
2010-07-07 21:46           ` David Miller
2010-07-07 21:46             ` David Miller
2010-07-02  0:19 Sol Kavy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.