[REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")

All of lore.kernel.org
 help / color / mirror / Atom feed

* [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
@ 2016-06-23 10:08 Arjun V.
  2016-06-23 11:51 ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Arjun V. @ 2016-06-23 10:08 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, Hariprasad S, Casey Leedom, Kumar A S,
	Santosh Rastapur, Nirranjan Kirubaharan, davem, ycheng

Hi all, 

The following patch introduced a regression in Chelsio cxgb4 driver, causing port failure when running heavy TSO traffic:

commit 10d3be569243def8d92ac3722395ef5a59c504e6
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Apr 21 10:55:23 2016 -0700

    tcp-tso: do not split TSO packets at retransmit time

    Linux TCP stack painfully segments all TSO/GSO packets before retransmits.

    This was fine back in the days when TSO/GSO were emerging, with their
    bugs, but we believe the dark age is over.

    Keeping big packets in write queues, but also in stack traversal
    has a lot of benefits.
     - Less memory overhead, because write queues have less skbs
     - Less cpu overhead at ACK processing.
     - Better SACK processing, as lot of studies mentioned how
       awful linux was at this ;)
     - Less cpu overhead to send the rtx packets
       (IP stack traversal, netfilter traversal, drivers...)
     - Better latencies in presence of losses.
     - Smaller spikes in fq like packet schedulers, as retransmits
       are not constrained by TCP Small Queues.

    1 % packet losses are common today, and at 100Gbit speeds, this
    translates to ~80,000 losses per second.
    Losses are often correlated, and we see many retransmit events
    leading to 1-MSS train of packets, at the time hosts are already
    under stress.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Yuchung Cheng <ycheng@google.com>
    Signed-off-by: David S. Miller davem@davemloft.net

When the number of TCP retransmissions are quite high, the packet length coming from stack does not seems to be proper, due to which our TSO module gets stuck. 
If I change segs back to 1 in __tcp_retransmit_skb(),  traffic is running fine. Please let us know if we are missing something.

Thanks,
Arjun.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-23 10:08 [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time") Arjun V.
@ 2016-06-23 11:51 ` Eric Dumazet
  2016-06-24 11:38   ` Arjun V.
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-06-23 11:51 UTC (permalink / raw)
  To: Arjun V.
  Cc: netdev, Hariprasad S, Casey Leedom, Kumar A S, Santosh Rastapur,
	Nirranjan Kirubaharan, davem, ycheng

On Thu, Jun 23, 2016 at 3:08 AM, Arjun V. <arjun@chelsio.com> wrote:
> Hi all,
>
> The following patch introduced a regression in Chelsio cxgb4 driver, causing port failure when running heavy TSO traffic:
>
> commit 10d3be569243def8d92ac3722395ef5a59c504e6
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Thu Apr 21 10:55:23 2016 -0700
>
>     tcp-tso: do not split TSO packets at retransmit time
>
>     Linux TCP stack painfully segments all TSO/GSO packets before retransmits.
>
>     This was fine back in the days when TSO/GSO were emerging, with their
>     bugs, but we believe the dark age is over.
>
>     Keeping big packets in write queues, but also in stack traversal
>     has a lot of benefits.
>      - Less memory overhead, because write queues have less skbs
>      - Less cpu overhead at ACK processing.
>      - Better SACK processing, as lot of studies mentioned how
>        awful linux was at this ;)
>      - Less cpu overhead to send the rtx packets
>        (IP stack traversal, netfilter traversal, drivers...)
>      - Better latencies in presence of losses.
>      - Smaller spikes in fq like packet schedulers, as retransmits
>        are not constrained by TCP Small Queues.
>
>     1 % packet losses are common today, and at 100Gbit speeds, this
>     translates to ~80,000 losses per second.
>     Losses are often correlated, and we see many retransmit events
>     leading to 1-MSS train of packets, at the time hosts are already
>     under stress.
>
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Acked-by: Yuchung Cheng <ycheng@google.com>
>     Signed-off-by: David S. Miller davem@davemloft.net
>
> When the number of TCP retransmissions are quite high, the packet length coming from stack does not seems to be proper, due to which our TSO module gets stuck.
> If I change segs back to 1 in __tcp_retransmit_skb(),  traffic is running fine. Please let us know if we are missing something.
>
> Thanks,
> Arjun.
>

Hmm... I see nothing wrong in TCP stack.

Can you give me more details on the wrong packet length you see ?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-23 11:51 ` Eric Dumazet
@ 2016-06-24 11:38   ` Arjun V.
  2016-06-24 14:25     ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Arjun V. @ 2016-06-24 11:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Hariprasad S, Casey Leedom, Kumar A S, Santosh Rastapur,
	Nirranjan Kirubaharan, davem, ycheng

[-- Attachment #1: Type: text/plain, Size: 3018 bytes --]

Eric,
We are seeing skb's with length(skb->len) greater than 65536 coming into our ndo_start_xmit() callback routine. 
We can add a check in our eth_xmit() routine to skip those packets, but it will be better if you fix this in kernel.


I have attached pcap file obtained from tcpdump. In the pcap file there are 2 such packets(I used tcpdump filter to extract out those packets).

Let us know if you need anything else.

Thanks,
Arjun.

-----Original Message-----
From: Eric Dumazet [mailto:edumazet@google.com] 
Sent: Thursday, June 23, 2016 5:21 PM
To: Arjun V.
Cc: netdev@vger.kernel.org; Hariprasad S; Casey Leedom; Kumar A S; Santosh Rastapur; Nirranjan Kirubaharan; davem@davemloft.net; ycheng@google.com
Subject: Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")

On Thu, Jun 23, 2016 at 3:08 AM, Arjun V. <arjun@chelsio.com> wrote:
> Hi all,
>
> The following patch introduced a regression in Chelsio cxgb4 driver, causing port failure when running heavy TSO traffic:
>
> commit 10d3be569243def8d92ac3722395ef5a59c504e6
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Thu Apr 21 10:55:23 2016 -0700
>
>     tcp-tso: do not split TSO packets at retransmit time
>
>     Linux TCP stack painfully segments all TSO/GSO packets before retransmits.
>
>     This was fine back in the days when TSO/GSO were emerging, with their
>     bugs, but we believe the dark age is over.
>
>     Keeping big packets in write queues, but also in stack traversal
>     has a lot of benefits.
>      - Less memory overhead, because write queues have less skbs
>      - Less cpu overhead at ACK processing.
>      - Better SACK processing, as lot of studies mentioned how
>        awful linux was at this ;)
>      - Less cpu overhead to send the rtx packets
>        (IP stack traversal, netfilter traversal, drivers...)
>      - Better latencies in presence of losses.
>      - Smaller spikes in fq like packet schedulers, as retransmits
>        are not constrained by TCP Small Queues.
>
>     1 % packet losses are common today, and at 100Gbit speeds, this
>     translates to ~80,000 losses per second.
>     Losses are often correlated, and we see many retransmit events
>     leading to 1-MSS train of packets, at the time hosts are already
>     under stress.
>
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Acked-by: Yuchung Cheng <ycheng@google.com>
>     Signed-off-by: David S. Miller davem@davemloft.net
>
> When the number of TCP retransmissions are quite high, the packet length coming from stack does not seems to be proper, due to which our TSO module gets stuck.
> If I change segs back to 1 in __tcp_retransmit_skb(),  traffic is running fine. Please let us know if we are missing something.
>
> Thanks,
> Arjun.
>

Hmm... I see nothing wrong in TCP stack.

Can you give me more details on the wrong packet length you see ?

[-- Attachment #2: tcpdump.pcap --]
[-- Type: application/octet-stream, Size: 131116 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-24 11:38   ` Arjun V.
@ 2016-06-24 14:25     ` Eric Dumazet
  2016-06-24 16:25       ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-06-24 14:25 UTC (permalink / raw)
  To: Arjun V.
  Cc: netdev, Hariprasad S, Casey Leedom, Kumar A S, Santosh Rastapur,
	Nirranjan Kirubaharan, davem, ycheng

Please do not top post on netdev mailing list

On Fri, Jun 24, 2016 at 4:38 AM, Arjun V. <arjun@chelsio.com> wrote:
> Eric,
> We are seeing skb's with length(skb->len) greater than 65536 coming into our ndo_start_xmit() callback routine.
> We can add a check in our eth_xmit() routine to skip those packets, but it will be better if you fix this in kernel.
>
>
> I have attached pcap file obtained from tcpdump. In the pcap file there are 2 such packets(I used tcpdump filter to extract out those packets).
>
> Let us know if you need anything else.
>

Beats me really.

At retransmit time, we only can eventually reduce packet sizes
(assuming GSO is used, because we might coalesce sub-mss packets in
tcp_retrans_try_collapse())

So why are you seeing too big packets at retransmit, I really have no idea.

MIght be some bug related to MSS computation, overflowing somehow ?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-24 14:25     ` Eric Dumazet
@ 2016-06-24 16:25       ` Eric Dumazet
  2016-06-27  7:17         ` Arjun V
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-06-24 16:25 UTC (permalink / raw)
  To: Eric Dumazet, Arjun V.
  Cc: netdev, Hariprasad S, Casey Leedom, Kumar A S, Santosh Rastapur,
	Nirranjan Kirubaharan, davem, ycheng

On Fri, 2016-06-24 at 07:25 -0700, Eric Dumazet wrote:
> Please do not top post on netdev mailing list
> 
> On Fri, Jun 24, 2016 at 4:38 AM, Arjun V. <arjun@chelsio.com> wrote:
> > Eric,
> > We are seeing skb's with length(skb->len) greater than 65536 coming into our ndo_start_xmit() callback routine.
> > We can add a check in our eth_xmit() routine to skip those packets, but it will be better if you fix this in kernel.
> >
> >
> > I have attached pcap file obtained from tcpdump. In the pcap file there are 2 such packets(I used tcpdump filter to extract out those packets).
> >
> > Let us know if you need anything else.
> >
> 
> Beats me really.
> 
> At retransmit time, we only can eventually reduce packet sizes
> (assuming GSO is used, because we might coalesce sub-mss packets in
> tcp_retrans_try_collapse())
> 
> So why are you seeing too big packets at retransmit, I really have no idea.
> 
> MIght be some bug related to MSS computation, overflowing somehow ?

Could you try this ?

Thanks !

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index d39e9e47a26e55ad2b8f775bf9ea9dfb5b12aee5..27013056bcfb9aa49601806bb3aa55a1ac664873 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1177,6 +1177,7 @@ int inet_sk_rebuild_header(struct sock *sk)
 
 		/* Routing failed... */
 		sk->sk_route_caps = 0;
+		sk->sk_gso_max_segs = 1;
 		/*
 		 * Other protocols have to map its equivalent state to TCP_SYN_SENT.
 		 * DCCP maps its DCCP_REQUESTING state to TCP_SYN_SENT. -acme
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 2076c21107d07e4e78a0a29f1d374c3414b8e1bd..ecc0281acfb702b138c68ac51e3a0518052785b0 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -685,6 +685,7 @@ int inet6_sk_rebuild_header(struct sock *sk)
 		dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 		if (IS_ERR(dst)) {
 			sk->sk_route_caps = 0;
+			sk->sk_gso_max_segs = 1;
 			sk->sk_err_soft = -PTR_ERR(dst);
 			return PTR_ERR(dst);
 		}

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-24 16:25       ` Eric Dumazet
@ 2016-06-27  7:17         ` Arjun V
  2016-06-27  8:06           ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Arjun V @ 2016-06-27  7:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, netdev, Hariprasad S, Casey Leedom, Kumar A S,
	Santosh Rastapur, Nirranjan Kirubaharan, davem, ycheng

On Friday, June 06/24/16, 2016 at 21:55:07 +0530, Eric Dumazet wrote:
> On Fri, 2016-06-24 at 07:25 -0700, Eric Dumazet wrote:
> > Please do not top post on netdev mailing list
> > 
> > On Fri, Jun 24, 2016 at 4:38 AM, Arjun V. <arjun@chelsio.com> wrote:
> > > Eric,
> > > We are seeing skb's with length(skb->len) greater than 65536 coming into our ndo_start_xmit() callback routine.
> > > We can add a check in our eth_xmit() routine to skip those packets, but it will be better if you fix this in kernel.
> > >
> > >
> > > I have attached pcap file obtained from tcpdump. In the pcap file there are 2 such packets(I used tcpdump filter to extract out those packets).
> > >
> > > Let us know if you need anything else.
> > >
> > 
> > Beats me really.
> > 
> > At retransmit time, we only can eventually reduce packet sizes
> > (assuming GSO is used, because we might coalesce sub-mss packets in
> > tcp_retrans_try_collapse())
> > 
> > So why are you seeing too big packets at retransmit, I really have no idea.
> > 
> > MIght be some bug related to MSS computation, overflowing somehow ?
> 
> Could you try this ?
> 
> Thanks !
> 
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index d39e9e47a26e55ad2b8f775bf9ea9dfb5b12aee5..27013056bcfb9aa49601806bb3aa55a1ac664873 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1177,6 +1177,7 @@ int inet_sk_rebuild_header(struct sock *sk)
>  
>  		/* Routing failed... */
>  		sk->sk_route_caps = 0;
> +		sk->sk_gso_max_segs = 1;
>  		/*
>  		 * Other protocols have to map its equivalent state to TCP_SYN_SENT.
>  		 * DCCP maps its DCCP_REQUESTING state to TCP_SYN_SENT. -acme
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index 2076c21107d07e4e78a0a29f1d374c3414b8e1bd..ecc0281acfb702b138c68ac51e3a0518052785b0 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -685,6 +685,7 @@ int inet6_sk_rebuild_header(struct sock *sk)
>  		dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
>  		if (IS_ERR(dst)) {
>  			sk->sk_route_caps = 0;
> +			sk->sk_gso_max_segs = 1;
>  			sk->sk_err_soft = -PTR_ERR(dst);
>  			return PTR_ERR(dst);
>  		}
> 
> 
> 
> 
Eric,

Thanks for the quick patch. It didn't help much. We still see packets larger than 65536 with the patch.

Below is the dump_stack() trace I am seeing for packets larger than 65536 in our xmit routine:

[  619.094366] CPU: 5 PID: 19322 Comm: netperf Not tainted 4.7.0-rc2+ #85
[  619.094373] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015
[  619.094379]  0000000000000001 ffff88067fd42c58 ffffffff81534635 ffff88065b59c2e0
[  619.094393]  ffff88065b520000 ffff8806570e1ae0 ffff88065b520000 ffff88067fd42d68
[  619.094404]  ffffffffa001cbee 000005f27fd42ca8 ffff880600000005 ffff880600000300
[  619.094417] Call Trace:
[  619.094421]  <IRQ>  [<ffffffff81534635>] dump_stack+0xa5/0xe0
[  619.094445]  [<ffffffffa001cbee>] t4_eth_xmit+0x43e/0xd90 [cxgb4]
[  619.094455]  [<ffffffff81187fa9>] ? trace_clock_local+0x9/0x10
[  619.094461]  [<ffffffff81198672>] ? rb_reserve_next_event+0x642/0x750
[  619.094469]  [<ffffffff81aee5ed>] dev_hard_start_xmit+0x15d/0x460
[  619.094475]  [<ffffffff81aedb80>] ? __netdev_pick_tx+0x170/0x1f0
[  619.094482]  [<ffffffff81af58ff>] ? validate_xmit_skb_list+0x4f/0xc0
[  619.094489]  [<ffffffff81b2b861>] sch_direct_xmit+0x221/0x2b0
[  619.094497]  [<ffffffff81af602c>] __dev_queue_xmit+0x6bc/0x9a0
[  619.094504]  [<ffffffff81bf79bf>] ? ipt_do_table+0x30f/0x620
[  619.094512]  [<ffffffff81af6330>] dev_queue_xmit+0x10/0x20
[  619.094518]  [<ffffffff81b774f9>] ip_finish_output2+0x459/0x570
[  619.094525]  [<ffffffff81b426c4>] ? nf_iterate+0xa4/0x110
[  619.094531]  [<ffffffff81b426c4>] ? nf_iterate+0xa4/0x110
[  619.094538]  [<ffffffff81b7973b>] ip_finish_output+0x1eb/0x330
[  619.094545]  [<ffffffff81b798f4>] ip_output+0x74/0x110
[  619.094552]  [<ffffffff81b78802>] ? __ip_local_out+0xb2/0x100
[  619.094557]  [<ffffffff81b79550>] ? ip_queue_xmit+0x4f0/0x4f0
[  619.094564]  [<ffffffff81b788d1>] ip_local_out+0x81/0xa0
[  619.094571]  [<ffffffff81b792b1>] ip_queue_xmit+0x251/0x4f0
[  619.094579]  [<ffffffff81b9bf19>] ? tcp_options_write+0x279/0x350
[  619.094585]  [<ffffffff81b9c7bb>] tcp_transmit_skb+0x7cb/0xe70
[  619.094592]  [<ffffffff81b9ef03>] ? tcp_current_mss+0x53/0xc0
[  619.094599]  [<ffffffff81b9f815>] __tcp_retransmit_skb+0x2e5/0x9c0
[  619.094607]  [<ffffffff810cb4a6>] ? get_nohz_timer_target+0x26/0x140
[  619.094613]  [<ffffffff81b9ff07>] tcp_retransmit_skb+0x17/0x100
[  619.094620]  [<ffffffff81ba046e>] tcp_xmit_retransmit_queue+0x47e/0x670
[  619.094626]  [<ffffffff81b8e753>] tcp_xmit_recovery+0xa3/0xb0
[  619.094631]  [<ffffffff81b98897>] tcp_ack+0xd67/0x1e70
[  619.094639]  [<ffffffff81b9634f>] ? tcp_validate_incoming+0x7f/0x560
[  619.094646]  [<ffffffff81b99af2>] tcp_rcv_established+0x152/0x780
[  619.094654]  [<ffffffff81baa13e>] tcp_v4_do_rcv+0x10e/0x3a0
[  619.094661]  [<ffffffff81bab24d>] tcp_v4_rcv+0xe7d/0xf20
[  619.094667]  [<ffffffff81bf4f93>] ? ipv4_confirm+0x133/0x1d0
[  619.094673]  [<ffffffff81b426c4>] ? nf_iterate+0xa4/0x110
[  619.094680]  [<ffffffff81b70138>] ip_local_deliver_finish+0xc8/0x2c0
[  619.094686]  [<ffffffff81b703f8>] ip_local_deliver+0xc8/0x140
[  619.094692]  [<ffffffff81ba9f02>] ? tcp_v4_early_demux+0x162/0x290
[  619.094697]  [<ffffffff81b70070>] ? ip_call_ra_chain+0x220/0x220
[  619.094703]  [<ffffffff81b6fc5f>] ip_rcv_finish+0x3ef/0x5e0
[  619.094709]  [<ffffffff81b4294a>] ? nf_hook_slow+0x4a/0x1b0
[  619.094714]  [<ffffffff81b70abb>] ip_rcv+0x64b/0x750
[  619.094721]  [<ffffffff81b6f870>] ? inet_add_protocol+0x60/0x60
[  619.094727]  [<ffffffff81af1e93>] __netif_receive_skb_core+0xd53/0x13e0
[  619.094735]  [<ffffffff81bb325f>] ? tcp4_gro_receive+0xef/0x2d0
[  619.094743]  [<ffffffff81bcd256>] ? inet_gro_receive+0x3b6/0x4e0
[  619.094749]  [<ffffffff81ae794a>] ? gro_pull_from_frag0+0x16a/0x190
[  619.094755]  [<ffffffff81af25a5>] __netif_receive_skb+0x85/0xe0
[  619.094762]  [<ffffffff81af2928>] netif_receive_skb_internal+0xf8/0x160
[  619.094768]  [<ffffffff81ae789d>] ? gro_pull_from_frag0+0xbd/0x190
[  619.094774]  [<ffffffff81b29f6c>] ? eth_type_trans+0x14c/0x280
[  619.094780]  [<ffffffff81af44d5>] napi_gro_frags+0x3e5/0x400
[  619.094792]  [<ffffffffa001bd85>] t4_ethrx_handler+0x8e5/0x9f0 [cxgb4]
[  619.094799]  [<ffffffff81b796c9>] ? ip_finish_output+0x179/0x330
[  619.094809]  [<ffffffffa001af55>] process_responses+0x735/0x8b0 [cxgb4]
[  619.094815]  [<ffffffff81b798f4>] ? ip_output+0x74/0x110
[  619.094822]  [<ffffffff81b788d1>] ? ip_local_out+0x81/0xa0
[  619.094828]  [<ffffffff8103e532>] ? native_sched_clock+0x82/0x1a0
[  619.094834]  [<ffffffff81187fa9>] ? trace_clock_local+0x9/0x10
[  619.094840]  [<ffffffff81198672>] ? rb_reserve_next_event+0x642/0x750
[  619.094849]  [<ffffffff81193e82>] ? rb_set_commit_to_write+0x212/0x350
[  619.094861]  [<ffffffffa001b186>] napi_rx_handler+0xb6/0x260 [cxgb4]
[  619.094868]  [<ffffffff81af3b4c>] napi_poll+0x1fc/0x3d0
[  619.094873]  [<ffffffff81198985>] ? ring_buffer_lock_reserve+0x205/0x260
[  619.094881]  [<ffffffff811933b3>] ? rb_event_data+0x53/0xa0
[  619.094886]  [<ffffffff81af3e1e>] net_rx_action+0xfe/0x2f0
[  619.094893]  [<ffffffff811b88e9>] ? trace_event_buffer_commit+0x179/0x2b0
[  619.094901]  [<ffffffff8109b225>] ? trace_event_raw_event_softirq+0x115/0x150
[  619.094909]  [<ffffffff81deed4d>] __do_softirq+0x1cd/0x4e1
[  619.094916]  [<ffffffff8109a868>] ? irq_exit+0x58/0x110
[  619.094925]  [<ffffffff81dedf1c>] do_softirq_own_stack+0x1c/0x30
[  619.094929]  <EOI>  [<ffffffff8109a458>] do_softirq+0x68/0x70
[  619.094940]  [<ffffffff8109a629>] __local_bh_enable_ip+0xf9/0x160
[  619.094946]  [<ffffffff81bf79bf>] ipt_do_table+0x30f/0x620
[  619.094954]  [<ffffffff81bfae3f>] iptable_mangle_hook+0xef/0x1d0
[  619.094960]  [<ffffffff81b426c4>] nf_iterate+0xa4/0x110
[  619.094966]  [<ffffffff81b4294a>] nf_hook_slow+0x4a/0x1b0
[  619.094972]  [<ffffffff81b7884b>] __ip_local_out+0xfb/0x100
[  619.094978]  [<ffffffff81b74f00>] ? neigh_key_eq32+0x20/0x20
[  619.094984]  [<ffffffff81b78882>] ip_local_out+0x32/0xa0
[  619.094990]  [<ffffffff81b792b1>] ip_queue_xmit+0x251/0x4f0
[  619.094997]  [<ffffffff81b9bf19>] ? tcp_options_write+0x279/0x350
[  619.095003]  [<ffffffff81b9c7bb>] tcp_transmit_skb+0x7cb/0xe70
[  619.095010]  [<ffffffff81b9e942>] tcp_send_ack+0x182/0x1c0
[  619.095016]  [<ffffffff81b96093>] __tcp_ack_snd_check+0x63/0x100
[  619.095023]  [<ffffffff81b99d9f>] tcp_rcv_established+0x3ff/0x780
[  619.095029]  [<ffffffff81baa13e>] tcp_v4_do_rcv+0x10e/0x3a0
[  619.095035]  [<ffffffff81b86969>] tcp_prequeue_process+0x169/0x190
[  619.095042]  [<ffffffff81b84e32>] ? tcp_cleanup_rbuf+0x72/0x280
[  619.095049]  [<ffffffff81b8b6ed>] tcp_recvmsg+0xfed/0x12d0
[  619.095057]  [<ffffffff812bb776>] ? __fget_light+0x96/0x100
[  619.095064]  [<ffffffff81bcc4b9>] inet_recvmsg+0xa9/0xe0
[  619.095071]  [<ffffffff81ac13da>] sock_recvmsg+0x4a/0x60
[  619.095078]  [<ffffffff81ac343d>] SYSC_recvfrom+0x13d/0x1d0
[  619.095085]  [<ffffffff8100276e>] ? do_audit_syscall_entry+0xae/0xe0
[  619.095093]  [<ffffffff81002a1e>] ? syscall_trace_enter_phase2+0x27e/0x2f0
[  619.095100]  [<ffffffff81ac34de>] SyS_recvfrom+0xe/0x10
[  619.095105]  [<ffffffff81003426>] do_syscall_64+0xa6/0x170
[  619.095111]  [<ffffffff81002479>] ? prepare_exit_to_usermode+0x49/0x80
[  619.095118]  [<ffffffff81dec4fc>] entry_SYSCALL64_slow_path+0x25/0x25

Thanks,
Arjun.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-27  7:17         ` Arjun V
@ 2016-06-27  8:06           ` Eric Dumazet
  2016-06-27  8:59             ` Arjun V
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-06-27  8:06 UTC (permalink / raw)
  To: Arjun V
  Cc: Eric Dumazet, netdev, Hariprasad S, Casey Leedom, Kumar A S,
	Santosh Rastapur, Nirranjan Kirubaharan, davem, ycheng

On Mon, 2016-06-27 at 12:47 +0530, Arjun V wrote:

> Eric,
> 
> Thanks for the quick patch. It didn't help much. We still see packets larger than 65536 with the patch.
> 
> Below is the dump_stack() trace I am seeing for packets larger than 65536 in our xmit routine:

What values do you get for skb->len ?

Again, at retransmit time we never grow packets, we only can split
existing packets. So there is something very wrong.

The original xmit should already have hit this issue.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-27  8:06           ` Eric Dumazet
@ 2016-06-27  8:59             ` Arjun V
  2016-06-27  9:21               ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Arjun V @ 2016-06-27  8:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, netdev, Hariprasad S, Casey Leedom, Kumar A S,
	Santosh Rastapur, Nirranjan Kirubaharan, davem, ycheng

On Monday, June 06/27/16, 2016 at 13:36:22 +0530, Eric Dumazet wrote:
> On Mon, 2016-06-27 at 12:47 +0530, Arjun V wrote:
> 
> > Eric,
> > 
> > Thanks for the quick patch. It didn't help much. We still see packets larger than 65536 with the patch.
> > 
> > Below is the dump_stack() trace I am seeing for packets larger than 65536 in our xmit routine:
> 
> What values do you get for skb->len ?
>
These are the skb->len values I am hitting in t4_eth_xmit() for around half hour run of netperf bidi(800 connections total)
packet with skb->len 69570 detected
packet with skb->len 66674 detected
packet with skb->len 68122 detected
packet with skb->len 68122 detected
packet with skb->len 72466 detected
packet with skb->len 76810 detected
packet with skb->len 69570 detected
packet with skb->len 66674 detected
packet with skb->len 66674 detected
packet with skb->len 68122 detected
packet with skb->len 71018 detected
packet with skb->len 66674 detected
packet with skb->len 66674 detected
packet with skb->len 66674 detected
packet with skb->len 69570 detected
packet with skb->len 68122 detected
packet with skb->len 73914 detected
packet with skb->len 66674 detected
packet with skb->len 69570 detected
packet with skb->len 69570 detected
packet with skb->len 69570 detected
packet with skb->len 68122 detected
packet with skb->len 72466 detected
packet with skb->len 68122 detected
packet with skb->len 69570 detected
packet with skb->len 69570 detected
packet with skb->len 72466 detected
packet with skb->len 72466 detected

> Again, at retransmit time we never grow packets, we only can split
> existing packets. So there is something very wrong.
> 
> The original xmit should already have hit this issue.
> 
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-27  8:59             ` Arjun V
@ 2016-06-27  9:21               ` Eric Dumazet
  2016-06-27  9:54                 ` Arjun V
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-06-27  9:21 UTC (permalink / raw)
  To: Arjun V
  Cc: Eric Dumazet, netdev, Hariprasad S, Casey Leedom, Kumar A S,
	Santosh Rastapur, Nirranjan Kirubaharan, davem, ycheng

On Mon, 2016-06-27 at 14:29 +0530, Arjun V wrote:
> On Monday, June 06/27/16, 2016 at 13:36:22 +0530, Eric Dumazet wrote:
> > On Mon, 2016-06-27 at 12:47 +0530, Arjun V wrote:
> > 
> > > Eric,
> > > 
> > > Thanks for the quick patch. It didn't help much. We still see packets larger than 65536 with the patch.
> > > 
> > > Below is the dump_stack() trace I am seeing for packets larger than 65536 in our xmit routine:
> > 
> > What values do you get for skb->len ?
> >
> These are the skb->len values I am hitting in t4_eth_xmit() for around half hour run of netperf bidi(800 connections total)
> packet with skb->len 69570 detected
> packet with skb->len 66674 detected

Ok please add :

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5c7ed147449c1b7ba029b12e033ad779a631460a..39bd2a2b8f4d931c514fe62a00541efef29157ad 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -845,6 +845,7 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	u32 new_size_goal, size_goal;
+	u32 res;
 
 	if (!large_allowed || !sk_can_gso(sk))
 		return mss_now;
@@ -862,7 +863,11 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
 		size_goal = tp->gso_segs * mss_now;
 	}
 
-	return max(size_goal, mss_now);
+	res = max(size_goal, mss_now);
+	if (res >= 65536)
+		pr_err_once("tcp_xmit_size_goal() -> %u size_goal %u, mss_now %u gso_max_size %u gso_segs %u\n",
+			    res, size_goal, mss_now, sk->sk_gso_max_size, tp->gso_segs);
+	return res;
 }
 
 static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-27  9:21               ` Eric Dumazet
@ 2016-06-27  9:54                 ` Arjun V
  2016-06-27 10:50                   ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Arjun V @ 2016-06-27  9:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, netdev, Hariprasad S, Casey Leedom, Kumar A S,
	Santosh Rastapur, Nirranjan Kirubaharan, davem, ycheng

On Monday, June 06/27/16, 2016 at 14:51:19 +0530, Eric Dumazet wrote:
> On Mon, 2016-06-27 at 14:29 +0530, Arjun V wrote:
> > On Monday, June 06/27/16, 2016 at 13:36:22 +0530, Eric Dumazet wrote:
> > > On Mon, 2016-06-27 at 12:47 +0530, Arjun V wrote:
> > > 
> > > > Eric,
> > > > 
> > > > Thanks for the quick patch. It didn't help much. We still see packets larger than 65536 with the patch.
> > > > 
> > > > Below is the dump_stack() trace I am seeing for packets larger than 65536 in our xmit routine:
> > > 
> > > What values do you get for skb->len ?
> > >
> > These are the skb->len values I am hitting in t4_eth_xmit() for around half hour run of netperf bidi(800 connections total)
> > packet with skb->len 69570 detected
> > packet with skb->len 66674 detected
> 
> Ok please add :
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 5c7ed147449c1b7ba029b12e033ad779a631460a..39bd2a2b8f4d931c514fe62a00541efef29157ad 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -845,6 +845,7 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
>  {
>  	struct tcp_sock *tp = tcp_sk(sk);
>  	u32 new_size_goal, size_goal;
> +	u32 res;
>  
>  	if (!large_allowed || !sk_can_gso(sk))
>  		return mss_now;
> @@ -862,7 +863,11 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
>  		size_goal = tp->gso_segs * mss_now;
>  	}
>  
> -	return max(size_goal, mss_now);
> +	res = max(size_goal, mss_now);
> +	if (res >= 65536)
> +		pr_err_once("tcp_xmit_size_goal() -> %u size_goal %u, mss_now %u gso_max_size %u gso_segs %u\n",
> +			    res, size_goal, mss_now, sk->sk_gso_max_size, tp->gso_segs);
> +	return res;
>  }
>  
>  static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
> 
> 
Applied your patch.
The above debug print is not getting invoked, when skb->len is greater than 65536.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-27  9:54                 ` Arjun V
@ 2016-06-27 10:50                   ` Eric Dumazet
  2016-06-27 12:46                     ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-06-27 10:50 UTC (permalink / raw)
  To: Arjun V
  Cc: Eric Dumazet, netdev, Hariprasad S, Casey Leedom, Kumar A S,
	Santosh Rastapur, Nirranjan Kirubaharan, davem, ycheng

On Mon, 2016-06-27 at 15:24 +0530, Arjun V wrote:

> Applied your patch.
> The above debug print is not getting invoked, when skb->len is greater than 65536.

Interesting.

It looks like we might have a bug in tcp_shift_skb_data() not respecting
sk->sk_gso_max_size at all ...

I will send a patch.

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-27 10:50                   ` Eric Dumazet
@ 2016-06-27 12:46                     ` Eric Dumazet
  2016-06-27 13:12                       ` Neal Cardwell
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-06-27 12:46 UTC (permalink / raw)
  To: Arjun V
  Cc: Eric Dumazet, netdev, Hariprasad S, Casey Leedom, Kumar A S,
	Santosh Rastapur, Nirranjan Kirubaharan, davem, ycheng

On Mon, 2016-06-27 at 12:50 +0200, Eric Dumazet wrote:
> On Mon, 2016-06-27 at 15:24 +0530, Arjun V wrote:
> 
> > Applied your patch.
> > The above debug print is not getting invoked, when skb->len is greater than 65536.
> 
> Interesting.
> 
> It looks like we might have a bug in tcp_shift_skb_data() not respecting
> sk->sk_gso_max_size at all ...

Note that this also means TCP receiver misbehaves and force us to SACK
reneging.

Patch would be :

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8bd9911fdd16..3587efe22864 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2784,6 +2784,10 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		segs = tp->snd_cwnd - tcp_packets_in_flight(tp);
 		if (segs <= 0)
 			return;
+		/* In case tcp_shift_skb_data() have aggregated large skbs,
+		 * we need to make sure not sending too big TSO packets.
+		 */
+		segs = min_t(int, segs, tp->gso_segs);
 
 		if (fwd_rexmitting) {
 begin_fwd:

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-27 12:46                     ` Eric Dumazet
@ 2016-06-27 13:12                       ` Neal Cardwell
  2016-06-27 13:27                         ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Neal Cardwell @ 2016-06-27 13:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Arjun V, Eric Dumazet, netdev, Hariprasad S, Casey Leedom,
	Kumar A S, Santosh Rastapur, Nirranjan Kirubaharan, davem,
	ycheng

On Mon, Jun 27, 2016 at 8:46 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Patch would be :
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 8bd9911fdd16..3587efe22864 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2784,6 +2784,10 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
>                 segs = tp->snd_cwnd - tcp_packets_in_flight(tp);
>                 if (segs <= 0)
>                         return;
> +               /* In case tcp_shift_skb_data() have aggregated large skbs,
> +                * we need to make sure not sending too big TSO packets.
> +                */
> +               segs = min_t(int, segs, tp->gso_segs);
>
>                 if (fwd_rexmitting) {
>  begin_fwd:

Nice catch, Eric. What do you think about using tcp_tso_autosize()
instead of tp->gso_segs? The goal would be to get autosized skbs in
this case, instead of 64KByte skbs. In addition to helping this corner
case of SACK reneging, this might also help things in general, since
if we are retransmitting packets then the cwnd and hence pacing rate
and hence autosized skb length might be smaller now than they were
when the packets were first sent. Just a thought.

neal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-27 13:12                       ` Neal Cardwell
@ 2016-06-27 13:27                         ` Eric Dumazet
  2016-06-27 13:36                           ` Neal Cardwell
  2016-06-27 15:38                           ` [PATCH net] tcp: do not send too big packets at retransmit time Eric Dumazet
  0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2016-06-27 13:27 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Arjun V, Eric Dumazet, netdev, Hariprasad S, Casey Leedom,
	Kumar A S, Santosh Rastapur, Nirranjan Kirubaharan, davem,
	ycheng

On Mon, 2016-06-27 at 09:12 -0400, Neal Cardwell wrote:
> On Mon, Jun 27, 2016 at 8:46 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Patch would be :
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 8bd9911fdd16..3587efe22864 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -2784,6 +2784,10 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
> >                 segs = tp->snd_cwnd - tcp_packets_in_flight(tp);
> >                 if (segs <= 0)
> >                         return;
> > +               /* In case tcp_shift_skb_data() have aggregated large skbs,
> > +                * we need to make sure not sending too big TSO packets.
> > +                */
> > +               segs = min_t(int, segs, tp->gso_segs);
> >
> >                 if (fwd_rexmitting) {
> >  begin_fwd:
> 
> Nice catch, Eric. What do you think about using tcp_tso_autosize()
> instead of tp->gso_segs? The goal would be to get autosized skbs in
> this case, instead of 64KByte skbs. In addition to helping this corner
> case of SACK reneging, this might also help things in general, since
> if we are retransmitting packets then the cwnd and hence pacing rate
> and hence autosized skb length might be smaller now than they were
> when the packets were first sent. Just a thought.

Excellent idea ;)

Here is the v2 patch.

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8bd9911fdd16..e00e972c4e6a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2751,7 +2751,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 	struct sk_buff *hole = NULL;
-	u32 last_lost;
+	u32 max_segs, last_lost;
 	int mib_idx;
 	int fwd_rexmitting = 0;
 
@@ -2771,6 +2771,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		last_lost = tp->snd_una;
 	}
 
+	max_segs = tcp_tso_autosize(sk, tcp_current_mss(sk));
 	tcp_for_write_queue_from(skb, sk) {
 		__u8 sacked = TCP_SKB_CB(skb)->sacked;
 		int segs;
@@ -2784,6 +2785,10 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		segs = tp->snd_cwnd - tcp_packets_in_flight(tp);
 		if (segs <= 0)
 			return;
+		/* In case tcp_shift_skb_data() have aggregated large skbs,
+		 * we need to make sure not sending too big TSO packets.
+		 */
+		segs = min_t(int, segs, max_segs);
 
 		if (fwd_rexmitting) {
 begin_fwd:

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-27 13:27                         ` Eric Dumazet
@ 2016-06-27 13:36                           ` Neal Cardwell
  2016-06-27 14:13                             ` Arjun V
  2016-06-27 15:38                           ` [PATCH net] tcp: do not send too big packets at retransmit time Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Neal Cardwell @ 2016-06-27 13:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Arjun V, Eric Dumazet, netdev, Hariprasad S, Casey Leedom,
	Kumar A S, Santosh Rastapur, Nirranjan Kirubaharan, davem,
	ycheng

On Mon, Jun 27, 2016 at 9:27 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Excellent idea ;)
>
> Here is the v2 patch.
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 8bd9911fdd16..e00e972c4e6a 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2751,7 +2751,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
>         struct tcp_sock *tp = tcp_sk(sk);
>         struct sk_buff *skb;
>         struct sk_buff *hole = NULL;
> -       u32 last_lost;
> +       u32 max_segs, last_lost;
>         int mib_idx;
>         int fwd_rexmitting = 0;
>
> @@ -2771,6 +2771,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
>                 last_lost = tp->snd_una;
>         }
>
> +       max_segs = tcp_tso_autosize(sk, tcp_current_mss(sk));
>         tcp_for_write_queue_from(skb, sk) {
>                 __u8 sacked = TCP_SKB_CB(skb)->sacked;
>                 int segs;
> @@ -2784,6 +2785,10 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
>                 segs = tp->snd_cwnd - tcp_packets_in_flight(tp);
>                 if (segs <= 0)
>                         return;
> +               /* In case tcp_shift_skb_data() have aggregated large skbs,
> +                * we need to make sure not sending too big TSO packets.
> +                */
> +               segs = min_t(int, segs, max_segs);
>
>                 if (fwd_rexmitting) {
>  begin_fwd:
>

Looks great to me. Thanks, Eric!

neal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")
  2016-06-27 13:36                           ` Neal Cardwell
@ 2016-06-27 14:13                             ` Arjun V
  0 siblings, 0 replies; 19+ messages in thread
From: Arjun V @ 2016-06-27 14:13 UTC (permalink / raw)
  To: Neal Cardwell, Eric Dumazet
  Cc: netdev, Hariprasad S, Casey Leedom, Kumar A S, Santosh Rastapur,
	Nirranjan Kirubaharan, davem, ycheng

On Monday, June 06/27/16, 2016 at 19:06:05 +0530, Neal Cardwell wrote:
> On Mon, Jun 27, 2016 at 9:27 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Excellent idea ;)
> >
> > Here is the v2 patch.
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 8bd9911fdd16..e00e972c4e6a 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -2751,7 +2751,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
> >         struct tcp_sock *tp = tcp_sk(sk);
> >         struct sk_buff *skb;
> >         struct sk_buff *hole = NULL;
> > -       u32 last_lost;
> > +       u32 max_segs, last_lost;
> >         int mib_idx;
> >         int fwd_rexmitting = 0;
> >
> > @@ -2771,6 +2771,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
> >                 last_lost = tp->snd_una;
> >         }
> >
> > +       max_segs = tcp_tso_autosize(sk, tcp_current_mss(sk));
> >         tcp_for_write_queue_from(skb, sk) {
> >                 __u8 sacked = TCP_SKB_CB(skb)->sacked;
> >                 int segs;
> > @@ -2784,6 +2785,10 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
> >                 segs = tp->snd_cwnd - tcp_packets_in_flight(tp);
> >                 if (segs <= 0)
> >                         return;
> > +               /* In case tcp_shift_skb_data() have aggregated large skbs,
> > +                * we need to make sure not sending too big TSO packets.
> > +                */
> > +               segs = min_t(int, segs, max_segs);
> >
> >                 if (fwd_rexmitting) {
> >  begin_fwd:
> >
> 
> Looks great to me. Thanks, Eric!
> 
> neal

This patch works fine. Now I am not seeing any packets with len > 64K. Thanks Eric for your support.

Arjun.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net] tcp: do not send too big packets at retransmit time
  2016-06-27 13:27                         ` Eric Dumazet
  2016-06-27 13:36                           ` Neal Cardwell
@ 2016-06-27 15:38                           ` Eric Dumazet
  2016-06-27 15:47                             ` Neal Cardwell
  2016-06-29  9:26                             ` David Miller
  1 sibling, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2016-06-27 15:38 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Arjun V, Eric Dumazet, netdev, Hariprasad S, Casey Leedom,
	Kumar A S, Santosh Rastapur, Nirranjan Kirubaharan, davem,
	ycheng

From: Eric Dumazet <edumazet@google.com>

Arjun reported a bug in TCP stack and bisected it to a recent commit.

In case where we process SACK, we can coalesce multiple skbs
into fat ones (tcp_shift_skb_data()), to lower write queue
overhead, because we do not expect to retransmit these packets.

However, SACK reneging can happen, forcing the sender to retransmit
all these packets. If skb->len is above 64KB, we then send buggy
IP packets that could hang TSO engine on cxgb4.

Neal suggested to use tcp_tso_autosize() instead of tp->gso_segs
so that we cook packets of optimal size vs TCP/pacing.

Thanks to Arjun for reporting the bug and running the tests !

Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Arjun V <arjun@chelsio.com>
Tested-by: Arjun V <arjun@chelsio.com>
---
 net/ipv4/tcp_output.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8bd9911fdd16..e00e972c4e6a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2751,7 +2751,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 	struct sk_buff *hole = NULL;
-	u32 last_lost;
+	u32 max_segs, last_lost;
 	int mib_idx;
 	int fwd_rexmitting = 0;
 
@@ -2771,6 +2771,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		last_lost = tp->snd_una;
 	}
 
+	max_segs = tcp_tso_autosize(sk, tcp_current_mss(sk));
 	tcp_for_write_queue_from(skb, sk) {
 		__u8 sacked = TCP_SKB_CB(skb)->sacked;
 		int segs;
@@ -2784,6 +2785,10 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		segs = tp->snd_cwnd - tcp_packets_in_flight(tp);
 		if (segs <= 0)
 			return;
+		/* In case tcp_shift_skb_data() have aggregated large skbs,
+		 * we need to make sure not sending too bigs TSO packets
+		 */
+		segs = min_t(int, segs, max_segs);
 
 		if (fwd_rexmitting) {
 begin_fwd:

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net] tcp: do not send too big packets at retransmit time
  2016-06-27 15:38                           ` [PATCH net] tcp: do not send too big packets at retransmit time Eric Dumazet
@ 2016-06-27 15:47                             ` Neal Cardwell
  2016-06-29  9:26                             ` David Miller
  1 sibling, 0 replies; 19+ messages in thread
From: Neal Cardwell @ 2016-06-27 15:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Arjun V, Eric Dumazet, netdev, Hariprasad S, Casey Leedom,
	Kumar A S, Santosh Rastapur, Nirranjan Kirubaharan, davem,
	ycheng

On Mon, Jun 27, 2016 at 11:38 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Arjun reported a bug in TCP stack and bisected it to a recent commit.
>
> In case where we process SACK, we can coalesce multiple skbs
> into fat ones (tcp_shift_skb_data()), to lower write queue
> overhead, because we do not expect to retransmit these packets.
>
> However, SACK reneging can happen, forcing the sender to retransmit
> all these packets. If skb->len is above 64KB, we then send buggy
> IP packets that could hang TSO engine on cxgb4.
>
> Neal suggested to use tcp_tso_autosize() instead of tp->gso_segs
> so that we cook packets of optimal size vs TCP/pacing.
>
> Thanks to Arjun for reporting the bug and running the tests !
>
> Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Arjun V <arjun@chelsio.com>
> Tested-by: Arjun V <arjun@chelsio.com>
> ---
>  net/ipv4/tcp_output.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!

neal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net] tcp: do not send too big packets at retransmit time
  2016-06-27 15:38                           ` [PATCH net] tcp: do not send too big packets at retransmit time Eric Dumazet
  2016-06-27 15:47                             ` Neal Cardwell
@ 2016-06-29  9:26                             ` David Miller
  1 sibling, 0 replies; 19+ messages in thread
From: David Miller @ 2016-06-29  9:26 UTC (permalink / raw)
  To: eric.dumazet
  Cc: ncardwell, arjun, edumazet, netdev, hariprasad, leedom, kumaras,
	santosh, nirranjan, ycheng

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 27 Jun 2016 17:38:50 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> Arjun reported a bug in TCP stack and bisected it to a recent commit.
> 
> In case where we process SACK, we can coalesce multiple skbs
> into fat ones (tcp_shift_skb_data()), to lower write queue
> overhead, because we do not expect to retransmit these packets.
> 
> However, SACK reneging can happen, forcing the sender to retransmit
> all these packets. If skb->len is above 64KB, we then send buggy
> IP packets that could hang TSO engine on cxgb4.
> 
> Neal suggested to use tcp_tso_autosize() instead of tp->gso_segs
> so that we cook packets of optimal size vs TCP/pacing.
> 
> Thanks to Arjun for reporting the bug and running the tests !
> 
> Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Arjun V <arjun@chelsio.com>
> Tested-by: Arjun V <arjun@chelsio.com>

Applied.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-06-29  9:26 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-23 10:08 [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time") Arjun V.
2016-06-23 11:51 ` Eric Dumazet
2016-06-24 11:38   ` Arjun V.
2016-06-24 14:25     ` Eric Dumazet
2016-06-24 16:25       ` Eric Dumazet
2016-06-27  7:17         ` Arjun V
2016-06-27  8:06           ` Eric Dumazet
2016-06-27  8:59             ` Arjun V
2016-06-27  9:21               ` Eric Dumazet
2016-06-27  9:54                 ` Arjun V
2016-06-27 10:50                   ` Eric Dumazet
2016-06-27 12:46                     ` Eric Dumazet
2016-06-27 13:12                       ` Neal Cardwell
2016-06-27 13:27                         ` Eric Dumazet
2016-06-27 13:36                           ` Neal Cardwell
2016-06-27 14:13                             ` Arjun V
2016-06-27 15:38                           ` [PATCH net] tcp: do not send too big packets at retransmit time Eric Dumazet
2016-06-27 15:47                             ` Neal Cardwell
2016-06-29  9:26                             ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.