All of lore.kernel.org
 help / color / mirror / Atom feed
* tcp_shift_skb_data uses wrong mss in non-gso case?
@ 2010-08-17 22:35 Robert Evans
  2010-08-23 22:34 ` Jesse Brandeburg
  0 siblings, 1 reply; 6+ messages in thread
From: Robert Evans @ 2010-08-17 22:35 UTC (permalink / raw)
  To: 'netdev@vger.kernel.org'

Hello all,

In reading through the latest SACK code introduced by 832d11c, I have noticed
that for the in_sack case tcp_shift_skb will take mss = tcp_skb_seglen(skb).
This seems to be wrong since the queue might contain small packets (f.e.
TCP_NODELAY). If the collapse succeeds, the resulting skb will have an
arbitrarily small gso_size equal to the original skb length.

8ed88d4:net/ipv4/tcp_input.c
   1506         in_sack = !after(start_seq, TCP_SKB_CB(skb)->seq) &&
   1507                   !before(end_seq, TCP_SKB_CB(skb)->end_seq);
   1508 
   1509         if (in_sack) {
   1510                 len = skb->len;
   1511                 pcount = tcp_skb_pcount(skb);
   1512                 mss = tcp_skb_seglen(skb);     /* possibly wrong? */
   1513 
   1514                 /* TODO: Fix DSACKs to not fragment already SACKed and w
   1514 e can
   1515                  * drop this restriction as unnecessary
   1516                  */
   1517                 if (mss != tcp_skb_seglen(prev))
   1518                         goto fallback;
   1519         } else {

This ends up being troublesome if the segment is later retransmitted and the
device driver has trouble with very small gso_size (e1000e seems to be an
example).

Is the small gso_size the correct and/or desired behavior?  Or am I missing
something else that prevents this from being a problem?


-Bob Evans

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: tcp_shift_skb_data uses wrong mss in non-gso case?
  2010-08-17 22:35 tcp_shift_skb_data uses wrong mss in non-gso case? Robert Evans
@ 2010-08-23 22:34 ` Jesse Brandeburg
  2010-08-24  7:46   ` Herbert Xu
  2010-08-24  8:59   ` Ilpo Järvinen
  0 siblings, 2 replies; 6+ messages in thread
From: Jesse Brandeburg @ 2010-08-23 22:34 UTC (permalink / raw)
  To: Robert Evans, Herbert Xu, Eric Dumazet, Ilpo Järvinen; +Cc: netdev

CC Ilpo, the creator of this patch being discussed:

On Tue, Aug 17, 2010 at 3:35 PM, Robert Evans
<robert.evans@nasdaqomx.com> wrote:
> Hello all,
>
> In reading through the latest SACK code introduced by 832d11c, I have noticed
> that for the in_sack case tcp_shift_skb will take mss = tcp_skb_seglen(skb).
> This seems to be wrong since the queue might contain small packets (f.e.
> TCP_NODELAY). If the collapse succeeds, the resulting skb will have an
> arbitrarily small gso_size equal to the original skb length.

yeah, gso_size really should never be == skb->length, because then it
implies you're offloading a frame to be segmented with no segmentable
data.

> 8ed88d4:net/ipv4/tcp_input.c
>   1506         in_sack = !after(start_seq, TCP_SKB_CB(skb)->seq) &&
>   1507                   !before(end_seq, TCP_SKB_CB(skb)->end_seq);
>   1508
>   1509         if (in_sack) {
>   1510                 len = skb->len;
>   1511                 pcount = tcp_skb_pcount(skb);
>   1512                 mss = tcp_skb_seglen(skb);     /* possibly wrong? */
>   1513
>   1514                 /* TODO: Fix DSACKs to not fragment already SACKed and w
>   1514 e can
>   1515                  * drop this restriction as unnecessary
>   1516                  */
>   1517                 if (mss != tcp_skb_seglen(prev))
>   1518                         goto fallback;
>   1519         } else {
>
> This ends up being troublesome if the segment is later retransmitted and the
> device driver has trouble with very small gso_size (e1000e seems to be an
> example).

I bet lots of other drivers will have issue with this too.

> Is the small gso_size the correct and/or desired behavior?  Or am I missing
> something else that prevents this from being a problem?

I believe that this is invalid for the stack to do, Ilpo, Herbert?
what do you think?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: tcp_shift_skb_data uses wrong mss in non-gso case?
  2010-08-23 22:34 ` Jesse Brandeburg
@ 2010-08-24  7:46   ` Herbert Xu
  2010-08-24  8:59   ` Ilpo Järvinen
  1 sibling, 0 replies; 6+ messages in thread
From: Herbert Xu @ 2010-08-24  7:46 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: Robert Evans, Eric Dumazet, Ilpo Järvinen, netdev

On Mon, Aug 23, 2010 at 03:34:04PM -0700, Jesse Brandeburg wrote:
> CC Ilpo, the creator of this patch being discussed:
> 
> On Tue, Aug 17, 2010 at 3:35 PM, Robert Evans
> > This ends up being troublesome if the segment is later retransmitted and the
> > device driver has trouble with very small gso_size (e1000e seems to be an
> > example).

I don't see the problem.  If mss == skb->len then tcp_shifted_skb
will zero gso_size.

Which code path is setting gso_size to skb->len?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: tcp_shift_skb_data uses wrong mss in non-gso case?
  2010-08-23 22:34 ` Jesse Brandeburg
  2010-08-24  7:46   ` Herbert Xu
@ 2010-08-24  8:59   ` Ilpo Järvinen
  2010-08-25 14:43     ` Robert Evans
  1 sibling, 1 reply; 6+ messages in thread
From: Ilpo Järvinen @ 2010-08-24  8:59 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: Robert Evans, Herbert Xu, Eric Dumazet, netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2951 bytes --]

On Mon, 23 Aug 2010, Jesse Brandeburg wrote:

> On Tue, Aug 17, 2010 at 3:35 PM, Robert Evans
> <robert.evans@nasdaqomx.com> wrote:
> > Hello all,
> >
> > In reading through the latest SACK code introduced by 832d11c, I have noticed
> > that for the in_sack case tcp_shift_skb will take mss = tcp_skb_seglen(skb).
> > This seems to be wrong since the queue might contain small packets (f.e.
> > TCP_NODELAY). If the collapse succeeds, the resulting skb will have an
> > arbitrarily small gso_size equal to the original skb length.
> 
> yeah, gso_size really should never be == skb->length, because then it
> implies you're offloading a frame to be segmented with no segmentable
> data.

What makes you think we'd get such segments? Point here is verify that the 
segments to be combined into a single skb share segment size, small or 
large doesn't make any difference in that respect as both were counted as 
a packet (or pcount worth of packets) w.r.t. cwnd consumption.

> > 8ed88d4:net/ipv4/tcp_input.c
> >   1506         in_sack = !after(start_seq, TCP_SKB_CB(skb)->seq) &&
> >   1507                   !before(end_seq, TCP_SKB_CB(skb)->end_seq);
> >   1508
> >   1509         if (in_sack) {
> >   1510                 len = skb->len;
> >   1511                 pcount = tcp_skb_pcount(skb);
> >   1512                 mss = tcp_skb_seglen(skb);     /* possibly wrong? */
> >   1513
> >   1514                 /* TODO: Fix DSACKs to not fragment already SACKed and w
> >   1514 e can
> >   1515                  * drop this restriction as unnecessary
> >   1516                  */
> >   1517                 if (mss != tcp_skb_seglen(prev))
> >   1518                         goto fallback;
> >   1519         } else {
> >
> > This ends up being troublesome if the segment is later retransmitted and the
> > device driver has trouble with very small gso_size (e1000e seems to be an
> > example).
> 
> I bet lots of other drivers will have issue with this too.
>
> > Is the small gso_size the correct and/or desired behavior?  Or am I missing
> > something else that prevents this from being a problem?
> 
> I believe that this is invalid for the stack to do, Ilpo, Herbert?
> what do you think?

The point of this code is to work with already SACKed segments (or to 
become SACKed from the current ACK), only TCP code is interested in them 
anymore. They should not be sent again to wire unless SACK reneging occurs 
but then retransmission code should deal with this just fine:

        if (skb->len > cur_mss) {
                if (tcp_fragment(sk, skb, cur_mss, cur_mss))
                        return -ENOMEM; /* We'll try again later. */
        } else {
                int oldpcount = tcp_skb_pcount(skb);

                if (unlikely(oldpcount > 1)) {
                        tcp_init_tso_segs(sk, skb, cur_mss);
                        tcp_adjust_pcount(sk, skb, oldpcount - tcp_skb_pcount(skb));
                }
        }


-- 
 i.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: tcp_shift_skb_data uses wrong mss in non-gso case?
  2010-08-24  8:59   ` Ilpo Järvinen
@ 2010-08-25 14:43     ` Robert Evans
  2010-08-25 23:43       ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Robert Evans @ 2010-08-25 14:43 UTC (permalink / raw)
  To: 'Ilpo Järvinen', Jesse Brandeburg
  Cc: Herbert Xu, Eric Dumazet, netdev

> > > Is the small gso_size the correct and/or desired behavior?  Or am I missing
> > > something else that prevents this from being a problem?
> > 
> > I believe that this is invalid for the stack to do, Ilpo, Herbert?
> > what do you think?
>
> The point of this code is to work with already SACKed segments (or to 
> become SACKed from the current ACK), only TCP code is interested in them 
> anymore. They should not be sent again to wire unless SACK reneging occurs 
> but then retransmission code should deal with this just fine:
>
>         if (skb->len > cur_mss) {
>                 if (tcp_fragment(sk, skb, cur_mss, cur_mss))
>                         return -ENOMEM; /* We'll try again later. */
>         } else {
>                 int oldpcount = tcp_skb_pcount(skb);
>
>                 if (unlikely(oldpcount > 1)) {
>                         tcp_init_tso_segs(sk, skb, cur_mss);
>                         tcp_adjust_pcount(sk, skb, oldpcount - tcp_skb_pcount(skb));
>                 }
>         }

Ilpo is right, the code in the retransmit path will prevent any packets 
with tiny gso_size from being sent out to the net device.

However, that code was added after Ilpo's SACK patches. The original patch 
went in for 2.6.29, and Ilpo fixed the retransmit in 2.6.30.  So 2.6.29 and 
its descendants can have this problem. I'm using 2.6.29.4, so there you 
have it.

A kernel update will definitely fix my problem, so that should be the 
end of it.  Thanks for your time everyone!

-Bob

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: tcp_shift_skb_data uses wrong mss in non-gso case?
  2010-08-25 14:43     ` Robert Evans
@ 2010-08-25 23:43       ` David Miller
  0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2010-08-25 23:43 UTC (permalink / raw)
  To: robert.evans
  Cc: ilpo.jarvinen, jesse.brandeburg, herbert, eric.dumazet, netdev

From: Robert Evans <robert.evans@nasdaqomx.com>
Date: Wed, 25 Aug 2010 10:43:23 -0400

> However, that code was added after Ilpo's SACK patches. The original patch 
> went in for 2.6.29, and Ilpo fixed the retransmit in 2.6.30.  So 2.6.29 and 
> its descendants can have this problem. I'm using 2.6.29.4, so there you 
> have it.
> 
> A kernel update will definitely fix my problem, so that should be the 
> end of it.  Thanks for your time everyone!

Thanks for hunting down the true cause of the problem.

If 2.6.29.x-stable was still active we should submit a backport but
to the best of my knowledge that branch -stable is no longer active.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-08-25 23:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-17 22:35 tcp_shift_skb_data uses wrong mss in non-gso case? Robert Evans
2010-08-23 22:34 ` Jesse Brandeburg
2010-08-24  7:46   ` Herbert Xu
2010-08-24  8:59   ` Ilpo Järvinen
2010-08-25 14:43     ` Robert Evans
2010-08-25 23:43       ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.