All of lore.kernel.org
 help / color / mirror / Atom feed
* Why does tcp collapse behavior depend on nr_frags?
@ 2016-11-01  8:15 Ilya Lesokhin
  2016-11-01 13:33 ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Ilya Lesokhin @ 2016-11-01  8:15 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, tls-fpga-sw-dev

Hi,
I've notice that tcp_can_collapse() returns false if skb_shinfo(skb)->nr_frags != 0.
Is there a reason why we want to base the collapse decision in retransmission on whether
the data is located in a frag or the linear part?

The relevant commit is
	 tcp: collapse more than two on retransmission  ('4a17fc3add594fcc1c778e93a95b6ecf47f630e5')

Thanks,
Ilya

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Why does tcp collapse behavior depend on nr_frags?
  2016-11-01  8:15 Why does tcp collapse behavior depend on nr_frags? Ilya Lesokhin
@ 2016-11-01 13:33 ` Eric Dumazet
  2016-11-01 15:32   ` [PATCH net-next] tcp: enhance tcp collapsing Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2016-11-01 13:33 UTC (permalink / raw)
  To: Ilya Lesokhin; +Cc: netdev, tls-fpga-sw-dev, Ilpo Järvinen

On Tue, 2016-11-01 at 08:15 +0000, Ilya Lesokhin wrote:
> Hi,
> I've notice that tcp_can_collapse() returns false if skb_shinfo(skb)->nr_frags != 0.
> Is there a reason why we want to base the collapse decision in retransmission on whether
> the data is located in a frag or the linear part?
> 
> The relevant commit is
> 	 tcp: collapse more than two on retransmission  ('4a17fc3add594fcc1c778e93a95b6ecf47f630e5')
> 
> Thanks,
> Ilya

Good point.

I am guessing you see this after commit
3613b3dbd1ade9a6a626dae1f608c57638eb5e8a
("tcp: prepare skbs for better sack shifting")  ?

Problem is that the left skb might have no skb_availroom() anyway...

We could theoretically use the same helpers we use at sack shifting,
but are these collapse events frequent enough we should care ?

Thanks !

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net-next] tcp: enhance tcp collapsing
  2016-11-01 13:33 ` Eric Dumazet
@ 2016-11-01 15:32   ` Eric Dumazet
  2016-11-01 16:35     ` Neal Cardwell
  2016-11-01 17:53     ` [PATCH v2 " Eric Dumazet
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2016-11-01 15:32 UTC (permalink / raw)
  To: Ilya Lesokhin, David Miller
  Cc: netdev, Ilpo Järvinen, Neal Cardwell, Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

As Ilya Lesokhin suggested, we can collapse two skbs at retransmit
time even if the skb at the right has fragments.

We simply have to use more generic skb_copy_bits() instead of
skb_copy_from_linear_data() in tcp_collapse_retrans()

Tested:

Used following packetdrill test

// Establish a connection.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
+.100 < . 1:1(0) ack 1 win 257
   +0 accept(3, ..., ...) = 4

   +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
   +0 write(4, ..., 200) = 200
   +0 > P. 1:201(200) ack 1
+.001 write(4, ..., 200) = 200
   +0 > P. 201:401(200) ack 1
+.001 write(4, ..., 200) = 200
   +0 > P. 401:601(200) ack 1
+.001 write(4, ..., 200) = 200
   +0 > P. 601:801(200) ack 1
+.001 write(4, ..., 200) = 200
   +0 > P. 801:1001(200) ack 1
+.001 write(4, ..., 100) = 100
   +0 > P. 1001:1101(100) ack 1
+.001 write(4, ..., 100) = 100
   +0 > P. 1101:1201(100) ack 1
+.001 write(4, ..., 100) = 100
   +0 > P. 1201:1301(100) ack 1
+.001 write(4, ..., 100) = 100
   +0 > P. 1301:1401(100) ack 1

+.100 < . 1:1(0) ack 1 win 257 <nop,nop,sack 1001:1401>
// Check that TCP collapse works :
   +0 > P. 1:1001(1000) ack 1


Reported-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
---
net/ipv4/tcp_output.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 896e9dfbdb5cd9ca0fa003f6be2c5cd332dde7cf..ad668d97fbb1660a9ad44ddb459ea45e15a22de1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2529,8 +2529,7 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
 
 	tcp_unlink_write_queue(next_skb, sk);
 
-	skb_copy_from_linear_data(next_skb, skb_put(skb, next_skb_size),
-				  next_skb_size);
+	skb_copy_bits(next_skb, 0, skb_put(skb, next_skb_size), next_skb_size);
 
 	if (next_skb->ip_summed == CHECKSUM_PARTIAL)
 		skb->ip_summed = CHECKSUM_PARTIAL;
@@ -2567,14 +2566,11 @@ static bool tcp_can_collapse(const struct sock *sk, const struct sk_buff *skb)
 {
 	if (tcp_skb_pcount(skb) > 1)
 		return false;
-	/* TODO: SACK collapsing could be used to remove this condition */
-	if (skb_shinfo(skb)->nr_frags != 0)
-		return false;
 	if (skb_cloned(skb))
 		return false;
 	if (skb == tcp_send_head(sk))
 		return false;
-	/* Some heurestics for collapsing over SACK'd could be invented */
+	/* Some heuristics for collapsing over SACK'd could be invented */
 	if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)
 		return false;
 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] tcp: enhance tcp collapsing
  2016-11-01 15:32   ` [PATCH net-next] tcp: enhance tcp collapsing Eric Dumazet
@ 2016-11-01 16:35     ` Neal Cardwell
  2016-11-01 17:39       ` Eric Dumazet
  2016-11-01 17:53     ` [PATCH v2 " Eric Dumazet
  1 sibling, 1 reply; 8+ messages in thread
From: Neal Cardwell @ 2016-11-01 16:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ilya Lesokhin, David Miller, netdev, Ilpo Järvinen, Yuchung Cheng

On Tue, Nov 1, 2016 at 11:32 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> As Ilya Lesokhin suggested, we can collapse two skbs at retransmit
> time even if the skb at the right has fragments.
>
> We simply have to use more generic skb_copy_bits() instead of
> skb_copy_from_linear_data() in tcp_collapse_retrans()
>
> Tested:
>
> Used following packetdrill test
>
> // Establish a connection.
> 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
>    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>    +0 bind(3, ..., ...) = 0
>    +0 listen(3, 1) = 0
>
>    +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8>
>    +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
> +.100 < . 1:1(0) ack 1 win 257
>    +0 accept(3, ..., ...) = 4
>
>    +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
>    +0 write(4, ..., 200) = 200
>    +0 > P. 1:201(200) ack 1
> +.001 write(4, ..., 200) = 200
>    +0 > P. 201:401(200) ack 1
> +.001 write(4, ..., 200) = 200
>    +0 > P. 401:601(200) ack 1
> +.001 write(4, ..., 200) = 200
>    +0 > P. 601:801(200) ack 1
> +.001 write(4, ..., 200) = 200
>    +0 > P. 801:1001(200) ack 1
> +.001 write(4, ..., 100) = 100
>    +0 > P. 1001:1101(100) ack 1
> +.001 write(4, ..., 100) = 100
>    +0 > P. 1101:1201(100) ack 1
> +.001 write(4, ..., 100) = 100
>    +0 > P. 1201:1301(100) ack 1
> +.001 write(4, ..., 100) = 100
>    +0 > P. 1301:1401(100) ack 1
>
> +.100 < . 1:1(0) ack 1 win 257 <nop,nop,sack 1001:1401>
> // Check that TCP collapse works :
>    +0 > P. 1:1001(1000) ack 1
>
>
> Reported-by: Ilya Lesokhin <ilyal@mellanox.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!

neal

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] tcp: enhance tcp collapsing
  2016-11-01 16:35     ` Neal Cardwell
@ 2016-11-01 17:39       ` Eric Dumazet
  2016-11-01 17:47         ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2016-11-01 17:39 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Ilya Lesokhin, David Miller, netdev, Ilpo Järvinen, Yuchung Cheng

On Tue, 2016-11-01 at 12:35 -0400, Neal Cardwell wrote:
> On Tue, Nov 1, 2016 at 11:32 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > From: Eric Dumazet <edumazet@google.com>
> >
> > As Ilya Lesokhin suggested, we can collapse two skbs at retransmit
> > time even if the skb at the right has fragments.
> >
> > We simply have to use more generic skb_copy_bits() instead of
> > skb_copy_from_linear_data() in tcp_collapse_retrans()
> >

> >
> > Reported-by: Ilya Lesokhin <ilyal@mellanox.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Neal Cardwell <ncardwell@google.com>
> > ---
> 
> Acked-by: Neal Cardwell <ncardwell@google.com>
> 

Thanks Neal.

David please hold this patch, further tests show an issue I need to
track and fix.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] tcp: enhance tcp collapsing
  2016-11-01 17:39       ` Eric Dumazet
@ 2016-11-01 17:47         ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2016-11-01 17:47 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Ilya Lesokhin, David Miller, netdev, Ilpo Järvinen, Yuchung Cheng

On Tue, 2016-11-01 at 10:39 -0700, Eric Dumazet wrote:

> Thanks Neal.
> 
> David please hold this patch, further tests show an issue I need to
> track and fix.

The issue is that the right skb can be a FIN packet with no data,
merged to a left skb with frag(s).

Thus I need to call skb_copy_bits() only if there is something to copy,
otherwise skb_put() can BUG_ON in SKB_LINEAR_ASSERT(), even if @len is
0.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 net-next] tcp: enhance tcp collapsing
  2016-11-01 15:32   ` [PATCH net-next] tcp: enhance tcp collapsing Eric Dumazet
  2016-11-01 16:35     ` Neal Cardwell
@ 2016-11-01 17:53     ` Eric Dumazet
  2016-11-01 19:35       ` Neal Cardwell
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2016-11-01 17:53 UTC (permalink / raw)
  To: Ilya Lesokhin, David Miller
  Cc: netdev, Ilpo Järvinen, Neal Cardwell, Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

As Ilya Lesokhin suggested, we can collapse two skbs at retransmit
time even if the skb at the right has fragments.

We simply have to use more generic skb_copy_bits() instead of
skb_copy_from_linear_data() in tcp_collapse_retrans()

Also need to guard this skb_copy_bits() in case there is nothing to
copy, otherwise skb_put() could panic if left skb has frags.

Tested:

Used following packetdrill test

// Establish a connection.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
+.100 < . 1:1(0) ack 1 win 257
   +0 accept(3, ..., ...) = 4

   +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
   +0 write(4, ..., 200) = 200
   +0 > P. 1:201(200) ack 1
+.001 write(4, ..., 200) = 200
   +0 > P. 201:401(200) ack 1
+.001 write(4, ..., 200) = 200
   +0 > P. 401:601(200) ack 1
+.001 write(4, ..., 200) = 200
   +0 > P. 601:801(200) ack 1
+.001 write(4, ..., 200) = 200
   +0 > P. 801:1001(200) ack 1
+.001 write(4, ..., 100) = 100
   +0 > P. 1001:1101(100) ack 1
+.001 write(4, ..., 100) = 100
   +0 > P. 1101:1201(100) ack 1
+.001 write(4, ..., 100) = 100
   +0 > P. 1201:1301(100) ack 1
+.001 write(4, ..., 100) = 100
   +0 > P. 1301:1401(100) ack 1

+.100 < . 1:1(0) ack 1 win 257 <nop,nop,sack 1001:1401>
// Check that TCP collapse works :
   +0 > P. 1:1001(1000) ack 1


Reported-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_output.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 896e9dfbdb5cd9ca0fa003f6be2c5cd332dde7cf..f57b5aa51b59cf0a58975fe34a7dcdb886ea8c50 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2529,8 +2529,9 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
 
 	tcp_unlink_write_queue(next_skb, sk);
 
-	skb_copy_from_linear_data(next_skb, skb_put(skb, next_skb_size),
-				  next_skb_size);
+	if (next_skb_size)
+		skb_copy_bits(next_skb, 0, skb_put(skb, next_skb_size),
+			      next_skb_size);
 
 	if (next_skb->ip_summed == CHECKSUM_PARTIAL)
 		skb->ip_summed = CHECKSUM_PARTIAL;
@@ -2567,14 +2568,11 @@ static bool tcp_can_collapse(const struct sock *sk, const struct sk_buff *skb)
 {
 	if (tcp_skb_pcount(skb) > 1)
 		return false;
-	/* TODO: SACK collapsing could be used to remove this condition */
-	if (skb_shinfo(skb)->nr_frags != 0)
-		return false;
 	if (skb_cloned(skb))
 		return false;
 	if (skb == tcp_send_head(sk))
 		return false;
-	/* Some heurestics for collapsing over SACK'd could be invented */
+	/* Some heuristics for collapsing over SACK'd could be invented */
 	if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)
 		return false;
 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next] tcp: enhance tcp collapsing
  2016-11-01 17:53     ` [PATCH v2 " Eric Dumazet
@ 2016-11-01 19:35       ` Neal Cardwell
  0 siblings, 0 replies; 8+ messages in thread
From: Neal Cardwell @ 2016-11-01 19:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ilya Lesokhin, David Miller, netdev, Ilpo Järvinen, Yuchung Cheng

On Tue, Nov 1, 2016 at 1:53 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> As Ilya Lesokhin suggested, we can collapse two skbs at retransmit
> time even if the skb at the right has fragments.
>
> We simply have to use more generic skb_copy_bits() instead of
> skb_copy_from_linear_data() in tcp_collapse_retrans()
>
> Also need to guard this skb_copy_bits() in case there is nothing to
> copy, otherwise skb_put() could panic if left skb has frags.
>
> Tested:
>
> Used following packetdrill test
>
> // Establish a connection.
> 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
>    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>    +0 bind(3, ..., ...) = 0
>    +0 listen(3, 1) = 0
>
>    +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8>
>    +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
> +.100 < . 1:1(0) ack 1 win 257
>    +0 accept(3, ..., ...) = 4
>
>    +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
>    +0 write(4, ..., 200) = 200
>    +0 > P. 1:201(200) ack 1
> +.001 write(4, ..., 200) = 200
>    +0 > P. 201:401(200) ack 1
> +.001 write(4, ..., 200) = 200
>    +0 > P. 401:601(200) ack 1
> +.001 write(4, ..., 200) = 200
>    +0 > P. 601:801(200) ack 1
> +.001 write(4, ..., 200) = 200
>    +0 > P. 801:1001(200) ack 1
> +.001 write(4, ..., 100) = 100
>    +0 > P. 1001:1101(100) ack 1
> +.001 write(4, ..., 100) = 100
>    +0 > P. 1101:1201(100) ack 1
> +.001 write(4, ..., 100) = 100
>    +0 > P. 1201:1301(100) ack 1
> +.001 write(4, ..., 100) = 100
>    +0 > P. 1301:1401(100) ack 1
>
> +.100 < . 1:1(0) ack 1 win 257 <nop,nop,sack 1001:1401>
> // Check that TCP collapse works :
>    +0 > P. 1:1001(1000) ack 1
>
>
> Reported-by: Ilya Lesokhin <ilyal@mellanox.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric. :-)

neal

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-11-01 19:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-01  8:15 Why does tcp collapse behavior depend on nr_frags? Ilya Lesokhin
2016-11-01 13:33 ` Eric Dumazet
2016-11-01 15:32   ` [PATCH net-next] tcp: enhance tcp collapsing Eric Dumazet
2016-11-01 16:35     ` Neal Cardwell
2016-11-01 17:39       ` Eric Dumazet
2016-11-01 17:47         ` Eric Dumazet
2016-11-01 17:53     ` [PATCH v2 " Eric Dumazet
2016-11-01 19:35       ` Neal Cardwell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.