All of lore.kernel.org
 help / color / mirror / Atom feed
* net: af_packet: skb_orphan should be avoided in TX path.
@ 2010-09-05 17:18 Changli Gao
  2010-09-05 17:43 ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Changli Gao @ 2010-09-05 17:18 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet; +Cc: Linux Netdev List

af_packet uses tpacket_destruct_skb() to notify its user a frame is
sent out through NIC, and the memory for that frame is available for
the others. If the driver calls skb_orphan() before the frame is sent
out successfully, and the user may fill other data into the space for
this frame, this frame will be corrupted. It became more likely after
skb_try_orphan() was added into dev_hard_start_xmit().

Am I correct?

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: net: af_packet: skb_orphan should be avoided in TX path.
  2010-09-05 17:18 net: af_packet: skb_orphan should be avoided in TX path Changli Gao
@ 2010-09-05 17:43 ` Eric Dumazet
  2010-09-05 17:51   ` Changli Gao
  2010-09-06 10:35   ` Michael S. Tsirkin
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2010-09-05 17:43 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Linux Netdev List

Le lundi 06 septembre 2010 à 01:18 +0800, Changli Gao a écrit :
> af_packet uses tpacket_destruct_skb() to notify its user a frame is
> sent out through NIC, and the memory for that frame is available for
> the others. If the driver calls skb_orphan() before the frame is sent
> out successfully, and the user may fill other data into the space for
> this frame, this frame will be corrupted. It became more likely after
> skb_try_orphan() was added into dev_hard_start_xmit().
> 
> Am I correct?
> 

Yes good catch. We might add a :

SKBTX_NO_EARLY_ORPHAN = 1 << 4,

so that skb_orphan_try() do not early orphan this kind of skb


diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f900ffc..9c1a480 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -176,6 +176,9 @@ enum {
 
 	/* ensure the originating sk reference is available on driver level */
 	SKBTX_DRV_NEEDS_SK_REF = 1 << 3,
+
+	/* dont early orphan this skb in skb_orphan_try() */
+	SKBTX_NO_EARLY_ORPHAN = 1 << 4,
 };
 
 /* This data is invariant across clones and lives at
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 3616f27..306795d 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1029,6 +1029,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		}
 
 		skb->destructor = tpacket_destruct_skb;
+		skb_shinfo(skb)->tx_flags |= SKBTX_NO_EARLY_ORPHAN;
 		__packet_set_status(po, ph, TP_STATUS_SENDING);
 		atomic_inc(&po->tx_ring.pending);
 



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: net: af_packet: skb_orphan should be avoided in TX path.
  2010-09-05 17:43 ` Eric Dumazet
@ 2010-09-05 17:51   ` Changli Gao
  2010-09-05 18:08     ` Oliver Hartkopp
  2010-09-06 10:35   ` Michael S. Tsirkin
  1 sibling, 1 reply; 9+ messages in thread
From: Changli Gao @ 2010-09-05 17:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Linux Netdev List

On Mon, Sep 6, 2010 at 1:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le lundi 06 septembre 2010 à 01:18 +0800, Changli Gao a écrit :
>> af_packet uses tpacket_destruct_skb() to notify its user a frame is
>> sent out through NIC, and the memory for that frame is available for
>> the others. If the driver calls skb_orphan() before the frame is sent
>> out successfully, and the user may fill other data into the space for
>> this frame, this frame will be corrupted. It became more likely after
>> skb_try_orphan() was added into dev_hard_start_xmit().
>>
>> Am I correct?
>>
>
> Yes good catch. We might add a :
>
> SKBTX_NO_EARLY_ORPHAN = 1 << 4,
>
> so that skb_orphan_try() do not early orphan this kind of skb
>

It may solve the issue of skb_orphan_try(), but some NICs still call
skb_orphan(). Maybe replacing skb_orphan() with skb_orphan_try() can
work around this issue.

localhost linux # grep skb_orphan drivers/net/ -r
drivers/net/can/dev.c:                  skb_orphan(skb);
drivers/net/mlx4/en_tx.c:               skb_orphan(skb);
drivers/net/cxgb3/sge.c:                skb_orphan(skb);
drivers/net/cxgb4/sge.c:                skb_orphan(skb);
drivers/net/niu.c:              skb_orphan(skb);
drivers/net/cxgb4vf/sge.c:              skb_orphan(skb);
drivers/net/tun.c:      skb_orphan(skb);
drivers/net/virtio_net.c:       skb_orphan(skb);
drivers/net/loopback.c: skb_orphan(skb);
drivers/net/wireless/mac80211_hwsim.c:  skb_orphan(skb);
drivers/net/wireless/libertas/tx.c:             skb_orphan(skb);

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: net: af_packet: skb_orphan should be avoided in TX path.
  2010-09-05 17:51   ` Changli Gao
@ 2010-09-05 18:08     ` Oliver Hartkopp
  0 siblings, 0 replies; 9+ messages in thread
From: Oliver Hartkopp @ 2010-09-05 18:08 UTC (permalink / raw)
  To: Changli Gao; +Cc: Eric Dumazet, David S. Miller, Linux Netdev List

On 05.09.2010 19:51, Changli Gao wrote:
> On Mon, Sep 6, 2010 at 1:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Le lundi 06 septembre 2010 à 01:18 +0800, Changli Gao a écrit :
>>> af_packet uses tpacket_destruct_skb() to notify its user a frame is
>>> sent out through NIC, and the memory for that frame is available for
>>> the others. If the driver calls skb_orphan() before the frame is sent
>>> out successfully, and the user may fill other data into the space for
>>> this frame, this frame will be corrupted. It became more likely after
>>> skb_try_orphan() was added into dev_hard_start_xmit().
>>>
>>> Am I correct?
>>>
>>
>> Yes good catch. We might add a :
>>
>> SKBTX_NO_EARLY_ORPHAN = 1 << 4,
>>
>> so that skb_orphan_try() do not early orphan this kind of skb
>>
> 
> It may solve the issue of skb_orphan_try(), but some NICs still call
> skb_orphan(). Maybe replacing skb_orphan() with skb_orphan_try() can
> work around this issue.
> 

Hm - i'm not really sure if skb_orphan_try() helps on this level.

E.g. for drivers/net/can/dev.c the skb_orphan is used to handle the correct
order of echo'ed skbs. Other drivers may do similar things.

AFAIK skb_orphan_try() helps to increase the performance for usual network
traffic, as it allows the orphan on a 'higher' level inside the networking stack.

In some cases the skb needs to be untouched, and this can be indicated in the
shared tx_flags. IMO skb_orphan_try() on driver level would break the correct
behaviour in most cases.

Regards,
Oliver

> localhost linux # grep skb_orphan drivers/net/ -r
> drivers/net/can/dev.c:                  skb_orphan(skb);
> drivers/net/mlx4/en_tx.c:               skb_orphan(skb);
> drivers/net/cxgb3/sge.c:                skb_orphan(skb);
> drivers/net/cxgb4/sge.c:                skb_orphan(skb);
> drivers/net/niu.c:              skb_orphan(skb);
> drivers/net/cxgb4vf/sge.c:              skb_orphan(skb);
> drivers/net/tun.c:      skb_orphan(skb);
> drivers/net/virtio_net.c:       skb_orphan(skb);
> drivers/net/loopback.c: skb_orphan(skb);
> drivers/net/wireless/mac80211_hwsim.c:  skb_orphan(skb);
> drivers/net/wireless/libertas/tx.c:             skb_orphan(skb);
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: net: af_packet: skb_orphan should be avoided in TX path.
  2010-09-05 17:43 ` Eric Dumazet
  2010-09-05 17:51   ` Changli Gao
@ 2010-09-06 10:35   ` Michael S. Tsirkin
  2010-09-06 15:44     ` Eric Dumazet
  1 sibling, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2010-09-06 10:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, David S. Miller, Linux Netdev List

On Sun, Sep 05, 2010 at 07:43:55PM +0200, Eric Dumazet wrote:
> Le lundi 06 septembre 2010 à 01:18 +0800, Changli Gao a écrit :
> > af_packet uses tpacket_destruct_skb() to notify its user a frame is
> > sent out through NIC, and the memory for that frame is available for
> > the others. If the driver calls skb_orphan() before the frame is sent
> > out successfully, and the user may fill other data into the space for
> > this frame, this frame will be corrupted. It became more likely after
> > skb_try_orphan() was added into dev_hard_start_xmit().
> > 
> > Am I correct?
> > 
> 
> Yes good catch. We might add a :
> 
> SKBTX_NO_EARLY_ORPHAN = 1 << 4,
> 
> so that skb_orphan_try() do not early orphan this kind of skb
> 

I think there are bigger issues here.  As was pointed out, drivers might
orphan skbs before they transmit them.
And at least for tun, the reason is that we might hang on
to skbs indefinitely because userspace is not reading them.

So in that case, if you just prevent tun from orphaning skbs, the socket
will be prevented from sending any more packets out even if they are for
a completely unrelated destinations, right?
Further, module can't get unloaded and I think socket can not get
closed, so user can't kill the task which has the socket?

And thinking about this, I think I see
another issue related to the use of the destructor callback:

static void tpacket_destruct_skb(struct sk_buff *skb)
{
        struct packet_sock *po = pkt_sk(skb->sk);
        void *ph;

        BUG_ON(skb == NULL);

        if (likely(po->tx_ring.pg_vec)) {
                ph = skb_shinfo(skb)->destructor_arg;
                BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);
                BUG_ON(atomic_read(&po->tx_ring.pending) == 0);
                atomic_dec(&po->tx_ring.pending);
                __packet_set_status(po, ph, TP_STATUS_AVAILABLE);
        }

        sock_wfree(skb);

<-----
at this point we still have to execute instructions
in this function to return from it. However
socket and thus module reference count
got already dropped to 0, so I think module could get unloaded
and these instructions could get overwritten.

}

I conclude that destructor callback should never point to a function residing
in a module, always to a function that is guaranteed to be builtin, this
function must be the one that drops the last module reference.

Comments?


> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index f900ffc..9c1a480 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -176,6 +176,9 @@ enum {
>  
>  	/* ensure the originating sk reference is available on driver level */
>  	SKBTX_DRV_NEEDS_SK_REF = 1 << 3,
> +
> +	/* dont early orphan this skb in skb_orphan_try() */
> +	SKBTX_NO_EARLY_ORPHAN = 1 << 4,
>  };
>  
>  /* This data is invariant across clones and lives at
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 3616f27..306795d 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1029,6 +1029,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
>  		}
>  
>  		skb->destructor = tpacket_destruct_skb;
> +		skb_shinfo(skb)->tx_flags |= SKBTX_NO_EARLY_ORPHAN;
>  		__packet_set_status(po, ph, TP_STATUS_SENDING);
>  		atomic_inc(&po->tx_ring.pending);
>  
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: net: af_packet: skb_orphan should be avoided in TX path.
  2010-09-06 10:35   ` Michael S. Tsirkin
@ 2010-09-06 15:44     ` Eric Dumazet
  2010-09-06 19:53       ` Michael S. Tsirkin
  2010-09-08 17:39       ` David Miller
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2010-09-06 15:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Changli Gao, David S. Miller, Linux Netdev List

Le lundi 06 septembre 2010 à 13:35 +0300, Michael S. Tsirkin a écrit :

> I think there are bigger issues here.  As was pointed out, drivers might
> orphan skbs before they transmit them.
> And at least for tun, the reason is that we might hang on
> to skbs indefinitely because userspace is not reading them.
> 
> So in that case, if you just prevent tun from orphaning skbs, the socket
> will be prevented from sending any more packets out even if they are for
> a completely unrelated destinations, right?
> Further, module can't get unloaded and I think socket can not get
> closed, so user can't kill the task which has the socket?
> 
> And thinking about this, I think I see
> another issue related to the use of the destructor callback:
> 
> static void tpacket_destruct_skb(struct sk_buff *skb)
> {
>         struct packet_sock *po = pkt_sk(skb->sk);
>         void *ph;
> 
>         BUG_ON(skb == NULL);
> 
>         if (likely(po->tx_ring.pg_vec)) {
>                 ph = skb_shinfo(skb)->destructor_arg;
>                 BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);
>                 BUG_ON(atomic_read(&po->tx_ring.pending) == 0);
>                 atomic_dec(&po->tx_ring.pending);
>                 __packet_set_status(po, ph, TP_STATUS_AVAILABLE);
>         }
> 
>         sock_wfree(skb);
> 
> <-----
> at this point we still have to execute instructions
> in this function to return from it. However
> socket and thus module reference count
> got already dropped to 0, so I think module could get unloaded
> and these instructions could get overwritten.
> 
> }
> 
> I conclude that destructor callback should never point to a function residing
> in a module, always to a function that is guaranteed to be builtin, this
> function must be the one that drops the last module reference.

It would be a surprise to use tx mmap (presumably to get high
performance), and a modular af_unix ;)

> 
> Comments?

The whole thing (packet / tx mmap) is broken, if you ask me.

skb_orphan() is not about protecting data, but doing per socket memory
accounting.

We have to skb_orphan() while data is still in use by skb, not only in
drivers but in core network stack. (loopback case for example, no need
to think about TUN being special ;) )

So I believe using mmap and tx on af_unix is racy in its current design.

We probably can remove some skb_orphan() calls (now its done in core
network, no real need to make it from some drivers), but not have a
complete solution to the problem Changli raised, without adding yet
another field into skb_shared_info...




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: net: af_packet: skb_orphan should be avoided in TX path.
  2010-09-06 15:44     ` Eric Dumazet
@ 2010-09-06 19:53       ` Michael S. Tsirkin
  2010-09-06 20:11         ` David Miller
  2010-09-08 17:39       ` David Miller
  1 sibling, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2010-09-06 19:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, David S. Miller, Linux Netdev List

On Mon, Sep 06, 2010 at 05:44:27PM +0200, Eric Dumazet wrote:
> Le lundi 06 septembre 2010 à 13:35 +0300, Michael S. Tsirkin a écrit :
> 
> > I think there are bigger issues here.  As was pointed out, drivers might
> > orphan skbs before they transmit them.
> > And at least for tun, the reason is that we might hang on
> > to skbs indefinitely because userspace is not reading them.
> > 
> > So in that case, if you just prevent tun from orphaning skbs, the socket
> > will be prevented from sending any more packets out even if they are for
> > a completely unrelated destinations, right?
> > Further, module can't get unloaded and I think socket can not get
> > closed, so user can't kill the task which has the socket?
> > 
> > And thinking about this, I think I see
> > another issue related to the use of the destructor callback:
> > 
> > static void tpacket_destruct_skb(struct sk_buff *skb)
> > {
> >         struct packet_sock *po = pkt_sk(skb->sk);
> >         void *ph;
> > 
> >         BUG_ON(skb == NULL);
> > 
> >         if (likely(po->tx_ring.pg_vec)) {
> >                 ph = skb_shinfo(skb)->destructor_arg;
> >                 BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);
> >                 BUG_ON(atomic_read(&po->tx_ring.pending) == 0);
> >                 atomic_dec(&po->tx_ring.pending);
> >                 __packet_set_status(po, ph, TP_STATUS_AVAILABLE);
> >         }
> > 
> >         sock_wfree(skb);
> > 
> > <-----
> > at this point we still have to execute instructions
> > in this function to return from it. However
> > socket and thus module reference count
> > got already dropped to 0, so I think module could get unloaded
> > and these instructions could get overwritten.
> > 
> > }
> > 
> > I conclude that destructor callback should never point to a function residing
> > in a module, always to a function that is guaranteed to be builtin, this
> > function must be the one that drops the last module reference.
> 
> It would be a surprise to use tx mmap (presumably to get high
> performance), and a modular af_unix ;)

Hmm, this seems to be what distros ship. If it's illegal, we should disable
this in Kconfig?

> > 
> > Comments?
> 
> The whole thing (packet / tx mmap) is broken, if you ask me.
> 
> skb_orphan() is not about protecting data, but doing per socket memory
> accounting.

Right.

> We have to skb_orphan() while data is still in use by skb, not only in
> drivers but in core network stack. (loopback case for example, no need
> to think about TUN being special ;) )
> 
> So I believe using mmap and tx on af_unix is racy in its current design.
> 
> We probably can remove some skb_orphan() calls (now its done in core
> network, no real need to make it from some drivers), but not have a
> complete solution to the problem Changli raised, without adding yet
> another field into skb_shared_info...

Whouldn't that just make the problem harder to debug?

-- 
MST

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: net: af_packet: skb_orphan should be avoided in TX path.
  2010-09-06 19:53       ` Michael S. Tsirkin
@ 2010-09-06 20:11         ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2010-09-06 20:11 UTC (permalink / raw)
  To: mst; +Cc: eric.dumazet, xiaosuo, netdev

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Mon, 6 Sep 2010 22:53:30 +0300

> On Mon, Sep 06, 2010 at 05:44:27PM +0200, Eric Dumazet wrote:
>> It would be a surprise to use tx mmap (presumably to get high
>> performance), and a modular af_unix ;)
> 
> Hmm, this seems to be what distros ship. If it's illegal, we should disable
> this in Kconfig?

He said it would be illogical to configure the kernel like this if
the goal is top performance, he didn't say it was illegal.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: net: af_packet: skb_orphan should be avoided in TX path.
  2010-09-06 15:44     ` Eric Dumazet
  2010-09-06 19:53       ` Michael S. Tsirkin
@ 2010-09-08 17:39       ` David Miller
  1 sibling, 0 replies; 9+ messages in thread
From: David Miller @ 2010-09-08 17:39 UTC (permalink / raw)
  To: eric.dumazet; +Cc: mst, xiaosuo, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 06 Sep 2010 17:44:27 +0200

> skb_orphan() is not about protecting data, but doing per socket memory
> accounting.

Right, therefore I would suggest we use a different callback mechanism
for af_packet to fix this problem.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-09-08 17:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-05 17:18 net: af_packet: skb_orphan should be avoided in TX path Changli Gao
2010-09-05 17:43 ` Eric Dumazet
2010-09-05 17:51   ` Changli Gao
2010-09-05 18:08     ` Oliver Hartkopp
2010-09-06 10:35   ` Michael S. Tsirkin
2010-09-06 15:44     ` Eric Dumazet
2010-09-06 19:53       ` Michael S. Tsirkin
2010-09-06 20:11         ` David Miller
2010-09-08 17:39       ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.