All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Changli Gao <xiaosuo@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Linux Netdev List <netdev@vger.kernel.org>
Subject: Re: net: af_packet: skb_orphan should be avoided in TX path.
Date: Mon, 6 Sep 2010 22:53:30 +0300	[thread overview]
Message-ID: <20100906195330.GA30715@redhat.com> (raw)
In-Reply-To: <1283787867.2654.600.camel@edumazet-laptop>

On Mon, Sep 06, 2010 at 05:44:27PM +0200, Eric Dumazet wrote:
> Le lundi 06 septembre 2010 à 13:35 +0300, Michael S. Tsirkin a écrit :
> 
> > I think there are bigger issues here.  As was pointed out, drivers might
> > orphan skbs before they transmit them.
> > And at least for tun, the reason is that we might hang on
> > to skbs indefinitely because userspace is not reading them.
> > 
> > So in that case, if you just prevent tun from orphaning skbs, the socket
> > will be prevented from sending any more packets out even if they are for
> > a completely unrelated destinations, right?
> > Further, module can't get unloaded and I think socket can not get
> > closed, so user can't kill the task which has the socket?
> > 
> > And thinking about this, I think I see
> > another issue related to the use of the destructor callback:
> > 
> > static void tpacket_destruct_skb(struct sk_buff *skb)
> > {
> >         struct packet_sock *po = pkt_sk(skb->sk);
> >         void *ph;
> > 
> >         BUG_ON(skb == NULL);
> > 
> >         if (likely(po->tx_ring.pg_vec)) {
> >                 ph = skb_shinfo(skb)->destructor_arg;
> >                 BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);
> >                 BUG_ON(atomic_read(&po->tx_ring.pending) == 0);
> >                 atomic_dec(&po->tx_ring.pending);
> >                 __packet_set_status(po, ph, TP_STATUS_AVAILABLE);
> >         }
> > 
> >         sock_wfree(skb);
> > 
> > <-----
> > at this point we still have to execute instructions
> > in this function to return from it. However
> > socket and thus module reference count
> > got already dropped to 0, so I think module could get unloaded
> > and these instructions could get overwritten.
> > 
> > }
> > 
> > I conclude that destructor callback should never point to a function residing
> > in a module, always to a function that is guaranteed to be builtin, this
> > function must be the one that drops the last module reference.
> 
> It would be a surprise to use tx mmap (presumably to get high
> performance), and a modular af_unix ;)

Hmm, this seems to be what distros ship. If it's illegal, we should disable
this in Kconfig?

> > 
> > Comments?
> 
> The whole thing (packet / tx mmap) is broken, if you ask me.
> 
> skb_orphan() is not about protecting data, but doing per socket memory
> accounting.

Right.

> We have to skb_orphan() while data is still in use by skb, not only in
> drivers but in core network stack. (loopback case for example, no need
> to think about TUN being special ;) )
> 
> So I believe using mmap and tx on af_unix is racy in its current design.
> 
> We probably can remove some skb_orphan() calls (now its done in core
> network, no real need to make it from some drivers), but not have a
> complete solution to the problem Changli raised, without adding yet
> another field into skb_shared_info...

Whouldn't that just make the problem harder to debug?

-- 
MST

  reply	other threads:[~2010-09-06 19:59 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-05 17:18 net: af_packet: skb_orphan should be avoided in TX path Changli Gao
2010-09-05 17:43 ` Eric Dumazet
2010-09-05 17:51   ` Changli Gao
2010-09-05 18:08     ` Oliver Hartkopp
2010-09-06 10:35   ` Michael S. Tsirkin
2010-09-06 15:44     ` Eric Dumazet
2010-09-06 19:53       ` Michael S. Tsirkin [this message]
2010-09-06 20:11         ` David Miller
2010-09-08 17:39       ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100906195330.GA30715@redhat.com \
    --to=mst@redhat.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=xiaosuo@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.