All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: "Alexander Duyck" <alexander.duyck@gmail.com>,
	"David Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	"Neal Cardwell" <ncardwell@google.com>,
	"Tom Herbert" <therbert@google.com>,
	"Jeff Kirsher" <jeffrey.t.kirsher@intel.com>,
	"Michael Chan" <mchan@broadcom.com>,
	"Matt Carlson" <mcarlson@broadcom.com>,
	"Herbert Xu" <herbert@gondor.apana.org.au>,
	"Ben Hutchings" <bhutchings@solarflare.com>,
	"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>,
	"Maciej Żenczykowski" <maze@google.com>
Subject: Re: [PATCH net-next] net: take care of cloned skbs in tcp_try_coalesce()
Date: Wed, 02 May 2012 18:12:48 +0200	[thread overview]
Message-ID: <1335975168.22133.578.camel@edumazet-glaptop> (raw)
In-Reply-To: <4FA15830.6080600@intel.com>

On Wed, 2012-05-02 at 08:52 -0700, Alexander Duyck wrote:
> On 05/02/2012 01:13 AM, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Before stealing fragments or skb head, we must make sure skb is not
> > cloned.
> >
> > If skb is cloned, we must take references on pages instead.
> >
> > Bug happened using tcpdump (if not using mmap())
> >
> > Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  net/ipv4/tcp_input.c |   17 ++++++++++++++---
> >  1 file changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 96a631d..7686d7f 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -4467,7 +4467,7 @@ static bool tcp_try_coalesce(struct sock *sk,
> >  			     struct sk_buff *from,
> >  			     bool *fragstolen)
> >  {
> > -	int delta, len = from->len;
> > +	int i, delta, len = from->len;
> >  
> >  	*fragstolen = false;
> >  	if (tcp_hdr(from)->fin)
> > @@ -4497,7 +4497,13 @@ copyfrags:
> >  		       skb_shinfo(from)->frags,
> >  		       skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
> >  		skb_shinfo(to)->nr_frags += skb_shinfo(from)->nr_frags;
> > -		skb_shinfo(from)->nr_frags = 0;
> > +
> > +		if (skb_cloned(from))
> > +			for (i = 0; i < skb_shinfo(from)->nr_frags; i++)
> > +				skb_frag_ref(from, i);
> > +		else
> > +			skb_shinfo(from)->nr_frags = 0;
> > +
> >  		to->truesize += delta;
> >  		atomic_add(delta, &sk->sk_rmem_alloc);
> >  		sk_mem_charge(sk, delta);
> I am fairly certain the bug I saw is only masked over by this change. 
> The underlying problem is that we shouldn't be messing with nr_frags on
> the from or the to if either one is clone.  You now have a check in
> place for the from, but what about the to?  This function should
> probably be calling a pskb_expand_head on the to skb in order to
> guarantee that the skb->head isn't shared.  Otherwise this is going to
> cause other issues for any functions that are sharing these skbs that
> just walk through frags without checking skb->len or skb->data_len first. 

Its safe to increase to->len and increase nr_frags in this context,
because we hold a reference to dataref : It cannot disappear under us.

clones will still have their skb->len at skb_clone() time and wont care
we expanded the frags.

> 
> > @@ -4515,7 +4521,12 @@ copyfrags:
> >  		offset = from->data - (unsigned char *)page_address(page);
> >  		skb_fill_page_desc(to, skb_shinfo(to)->nr_frags,
> >  				   page, offset, skb_headlen(from));
> > -		*fragstolen = true;
> > +
> > +		if (skb_cloned(from))
> > +			get_page(page);
> > +		else
> > +			*fragstolen = true;
> > +
> >  		delta = len; /* we dont know real truesize... */
> >  		goto copyfrags;
> >  	}
> >
> >
> I don't see where we are now addressing the put_page call to release the
> page afterwards.  By calling get_page you are incrementing the page
> count by one, but where are you decrementing dataref in the shared
> info?  Without that we are looking at a memory leak because __kfree_skb
> will decrement the dataref but it will never reach 0 so it will never
> call put_page on the head frag.

really the dataref was already incremented at skb_clone() time

It will be properly decremented since we call __kfree_skb()

Only the last decrement will perform the put_page()

Think about splice() is doing, its the same get_page() game.

  reply	other threads:[~2012-05-02 16:13 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-27 10:37 [PATCH 3/4 net-next] net: make GRO aware of skb->head_frag Eric Dumazet
2012-04-30 17:54 ` Eric Dumazet
2012-04-30 18:10 ` [PATCH 3/4 v2 " Eric Dumazet
2012-04-30 23:36   ` Alexander Duyck
2012-05-01  1:27     ` Eric Dumazet
2012-05-01  5:33       ` Alexander Duyck
2012-05-01  6:39         ` Eric Dumazet
2012-05-01 16:17           ` Alexander Duyck
2012-05-01 17:04             ` Eric Dumazet
2012-05-01 19:45               ` Alexander Duyck
2012-05-02  2:45                 ` Eric Dumazet
2012-05-02  8:24                 ` Eric Dumazet
2012-05-02 16:16                   ` Alexander Duyck
2012-05-02 16:19                     ` Eric Dumazet
2012-05-02 16:27                       ` Eric Dumazet
2012-05-02 17:04                         ` Alexander Duyck
2012-05-02 17:02                       ` Alexander Duyck
2012-05-02 17:16                   ` Rick Jones
2012-05-01 22:58               ` Alexander Duyck
2012-05-01 23:10                 ` Alexander Duyck
2012-05-02  2:47                   ` Eric Dumazet
2012-05-02  3:54                     ` Eric Dumazet
2012-05-02  8:13                     ` [PATCH net-next] net: take care of cloned skbs in tcp_try_coalesce() Eric Dumazet
2012-05-02 15:52                       ` Alexander Duyck
2012-05-02 16:12                         ` Eric Dumazet [this message]
2012-05-02 16:27                           ` Alexander Duyck
2012-05-02 16:46                             ` Eric Dumazet
2012-05-02 17:55                               ` [PATCH v2 " Eric Dumazet
2012-05-02 19:58                                 ` [PATCH net-next] net: implement tcp coalescing in tcp_queue_rcv() Eric Dumazet
2012-05-02 20:11                                   ` Joe Perches
2012-05-02 20:23                                     ` Eric Dumazet
2012-05-02 20:34                                       ` Joe Perches
2012-05-03  0:32                                       ` David Miller
2012-05-03  1:11                                   ` David Miller
2012-05-03  2:14                                     ` Eric Dumazet
2012-05-03  2:21                                       ` David Miller
2012-05-03  1:11                                 ` [PATCH v2 net-next] net: take care of cloned skbs in tcp_try_coalesce() David Miller
2012-05-02 18:05                               ` [PATCH " Alexander Duyck
2012-05-02 18:15                                 ` Eric Dumazet
2012-05-02 20:55                                   ` Alexander Duyck
2012-05-03  1:52                                     ` Eric Dumazet
2012-05-03  3:00                                       ` Alexander Duyck
2012-05-03  3:14                                         ` Eric Dumazet
2012-05-03  3:28                                           ` Alexander Duyck
2012-05-01  1:48   ` [PATCH 3/4 v2 net-next] net: make GRO aware of skb->head_frag David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1335975168.22133.578.camel@edumazet-glaptop \
    --to=eric.dumazet@gmail.com \
    --cc=alexander.duyck@gmail.com \
    --cc=alexander.h.duyck@intel.com \
    --cc=bhutchings@solarflare.com \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=ilpo.jarvinen@helsinki.fi \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=maze@google.com \
    --cc=mcarlson@broadcom.com \
    --cc=mchan@broadcom.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=therbert@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.