All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] tcp: Don't collapse if resulting skb could overflow skb->csum_start
@ 2013-02-28 10:26 Thomas Graf
  2013-02-28 16:18 ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Graf @ 2013-02-28 10:26 UTC (permalink / raw)
  To: davem; +Cc: netdev, foraker1

If a TCP retransmission gets partially ACKed and collapsed multiple
times it is possible for the headroom to grow beyond 64K which will
overflow the 16bit skb->csum_start which is based on the start of
the headroom. It has been observed rarely in the wild with IPoIB due
to the 64K MTU.

With this patch, the overflow has not been observed for over a week
while previously it would occur within ~ 1 day.

A big thank you to Jim Foraker <foraker1@llnl.gov> and the team at
LLNL for helping out with the investigation and testing.

Reported-by: Jim Foraker <foraker1@llnl.gov>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/ipv4/tcp_output.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e2b4461..1902fee 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2305,6 +2305,12 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
 		if (after(TCP_SKB_CB(skb)->end_seq, tcp_wnd_end(tp)))
 			break;
 
+		/* Never collapse if the resulting headroom + data exceeds
+		 * 64K as that is the maximum csum_start can cover.
+		 */
+		if (skb_headroom(to) + to->len + skb->len > 0xFFFF)
+			break;
+
 		tcp_collapse_retrans(sk, to);
 	}
 }

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: Don't collapse if resulting skb could overflow skb->csum_start
  2013-02-28 10:26 [PATCH net] tcp: Don't collapse if resulting skb could overflow skb->csum_start Thomas Graf
@ 2013-02-28 16:18 ` Eric Dumazet
  2013-02-28 16:45   ` Thomas Graf
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2013-02-28 16:18 UTC (permalink / raw)
  To: Thomas Graf; +Cc: davem, netdev, foraker1

On Thu, 2013-02-28 at 10:26 +0000, Thomas Graf wrote:
> If a TCP retransmission gets partially ACKed and collapsed multiple
> times it is possible for the headroom to grow beyond 64K which will
> overflow the 16bit skb->csum_start which is based on the start of
> the headroom. It has been observed rarely in the wild with IPoIB due
> to the 64K MTU.
> 
> With this patch, the overflow has not been observed for over a week
> while previously it would occur within ~ 1 day.
> 
> A big thank you to Jim Foraker <foraker1@llnl.gov> and the team at
> LLNL for helping out with the investigation and testing.
> 
> Reported-by: Jim Foraker <foraker1@llnl.gov>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
>  net/ipv4/tcp_output.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index e2b4461..1902fee 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2305,6 +2305,12 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
>  		if (after(TCP_SKB_CB(skb)->end_seq, tcp_wnd_end(tp)))
>  			break;
>  
> +		/* Never collapse if the resulting headroom + data exceeds
> +		 * 64K as that is the maximum csum_start can cover.
> +		 */
> +		if (skb_headroom(to) + to->len + skb->len > 0xFFFF)
> +			break;
> +
>  		tcp_collapse_retrans(sk, to);
>  	}
>  }

but.... what is the value of skb_availroom(to) ?

The earlier test at line 2302 should already guard this case ?

               /* Punt if not enough space exists in the first SKB for
                 * the data in the second
                 */
                if (skb->len > skb_availroom(to))
                        break;

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: Don't collapse if resulting skb could overflow skb->csum_start
  2013-02-28 16:18 ` Eric Dumazet
@ 2013-02-28 16:45   ` Thomas Graf
  2013-02-28 17:35     ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Graf @ 2013-02-28 16:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, foraker1

On 02/28/13 at 08:18am, Eric Dumazet wrote:
> but.... what is the value of skb_availroom(to) ?
> 
> The earlier test at line 2302 should already guard this case ?
> 
>                /* Punt if not enough space exists in the first SKB for
>                  * the data in the second
>                  */
>                 if (skb->len > skb_availroom(to))
>                         break;

Only if it is guaranteed that we never see an MSS > 64K.

Assuming that is true forever, then a21d45726 (tcp: avoid order-1
allocations on wifi and tx path) does in fact resolve this
issue at the cost of not being able to use the extra room
kmalloc() might have given us in __alloc_skb() for collapsing.

Was that an intentional side effect of a21d45726?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: Don't collapse if resulting skb could overflow skb->csum_start
  2013-02-28 16:45   ` Thomas Graf
@ 2013-02-28 17:35     ` Eric Dumazet
  2013-02-28 17:40       ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2013-02-28 17:35 UTC (permalink / raw)
  To: Thomas Graf; +Cc: davem, netdev, foraker1

On Thu, 2013-02-28 at 16:45 +0000, Thomas Graf wrote:
> On 02/28/13 at 08:18am, Eric Dumazet wrote:
> > but.... what is the value of skb_availroom(to) ?
> > 
> > The earlier test at line 2302 should already guard this case ?
> > 
> >                /* Punt if not enough space exists in the first SKB for
> >                  * the data in the second
> >                  */
> >                 if (skb->len > skb_availroom(to))
> >                         break;
> 
> Only if it is guaranteed that we never see an MSS > 64K.
> 
> Assuming that is true forever, then a21d45726 (tcp: avoid order-1
> allocations on wifi and tx path) does in fact resolve this
> issue at the cost of not being able to use the extra room
> kmalloc() might have given us in __alloc_skb() for collapsing.
> 
> Was that an intentional side effect of a21d45726?

This had a followup with 22b4a4f22da4
(tcp: fix retransmit of partially acked frames)

I also wonder if there is not another similar potential problem in
__tcp_retransmit_skb) after call to 

tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq)

csum_start can overflow again because of MAX_TCP_HEADER headroom
reserve.

So maybe we should limit TCP MTU to (64K - MAX_TCP_HEADER)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: Don't collapse if resulting skb could overflow skb->csum_start
  2013-02-28 17:35     ` Eric Dumazet
@ 2013-02-28 17:40       ` Eric Dumazet
  2013-02-28 19:23         ` Thomas Graf
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2013-02-28 17:40 UTC (permalink / raw)
  To: Thomas Graf; +Cc: davem, netdev, foraker1

On Thu, 2013-02-28 at 09:35 -0800, Eric Dumazet wrote:
> On Thu, 2013-02-28 at 16:45 +0000, Thomas Graf wrote:
> > On 02/28/13 at 08:18am, Eric Dumazet wrote:
> > > but.... what is the value of skb_availroom(to) ?
> > > 
> > > The earlier test at line 2302 should already guard this case ?
> > > 
> > >                /* Punt if not enough space exists in the first SKB for
> > >                  * the data in the second
> > >                  */
> > >                 if (skb->len > skb_availroom(to))
> > >                         break;
> > 
> > Only if it is guaranteed that we never see an MSS > 64K.
> > 
> > Assuming that is true forever, then a21d45726 (tcp: avoid order-1
> > allocations on wifi and tx path) does in fact resolve this
> > issue at the cost of not being able to use the extra room
> > kmalloc() might have given us in __alloc_skb() for collapsing.
> > 
> > Was that an intentional side effect of a21d45726?
> 
> This had a followup with 22b4a4f22da4
> (tcp: fix retransmit of partially acked frames)
> 
> I also wonder if there is not another similar potential problem in
> __tcp_retransmit_skb) after call to 
> 
> tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq)
> 
> csum_start can overflow again because of MAX_TCP_HEADER headroom
> reserve.
> 
> So maybe we should limit TCP MTU to (64K - MAX_TCP_HEADER)
> 

Or adapt the test at line 2390 in net/ipv4/tcp_output.c

to force in the case there could be an overflow a :

struct sk_buff *nskb = __pskb_copy()

So that we have a new skb with minimal headroom.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: Don't collapse if resulting skb could overflow skb->csum_start
  2013-02-28 17:40       ` Eric Dumazet
@ 2013-02-28 19:23         ` Thomas Graf
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas Graf @ 2013-02-28 19:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, foraker1

On 02/28/13 at 09:40am, Eric Dumazet wrote:
> On Thu, 2013-02-28 at 09:35 -0800, Eric Dumazet wrote:
> > This had a followup with 22b4a4f22da4
> > (tcp: fix retransmit of partially acked frames)
> > 
> > I also wonder if there is not another similar potential problem in
> > __tcp_retransmit_skb) after call to 
> > 
> > tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq)
> > 
> > csum_start can overflow again because of MAX_TCP_HEADER headroom
> > reserve.
> > 
> > So maybe we should limit TCP MTU to (64K - MAX_TCP_HEADER)
> > 
> 
> Or adapt the test at line 2390 in net/ipv4/tcp_output.c
> 
> to force in the case there could be an overflow a :
> 
> struct sk_buff *nskb = __pskb_copy()
> 
> So that we have a new skb with minimal headroom.

I would favour the __pskb_copy() variant as that would be more
generic. I'll redo the patch.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-02-28 19:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-28 10:26 [PATCH net] tcp: Don't collapse if resulting skb could overflow skb->csum_start Thomas Graf
2013-02-28 16:18 ` Eric Dumazet
2013-02-28 16:45   ` Thomas Graf
2013-02-28 17:35     ` Eric Dumazet
2013-02-28 17:40       ` Eric Dumazet
2013-02-28 19:23         ` Thomas Graf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.