* [PATCH net-next] tcp: prepare skbs for better sack shifting
@ 2016-09-15 16:33 Eric Dumazet
  2016-09-15 17:52 ` Yuchung Cheng
  2016-09-17 14:05 ` David Miller
  0 siblings, 2 replies; 3+ messages in thread
From: Eric Dumazet @ 2016-09-15 16:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

With large BDP TCP flows and lossy networks, it is very important
to keep a low number of skbs in the write queue.

RACK and SACK processing can perform a linear scan of it.

We should avoid putting any payload in skb->head, so that SACK
shifting can be done if needed.

With this patch, we can pack ~0.5 MB per skb instead of
the 64KB initially cooked at tcp_sendmsg() time.

This reduces the number of skbs in the write queue by a factor
of eight. tcp_rack_detect_loss() likes this.
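
For reference, the ~0.5 MB figure is MAX_SKB_FRAGS frags of 32KB
each. A minimal user-space sketch of the arithmetic (assuming the
usual 4KB pages, so MAX_SKB_FRAGS is 65536/4096 + 1 = 17, and the
32KB order-3 page frags the TCP frag allocator hands out):

#include <stdio.h>

#define PAGE_SIZE	4096
#define MAX_SKB_FRAGS	(65536 / PAGE_SIZE + 1)	/* 17 */
#define FRAG_SZ		(32 * 1024)		/* order-3 page frag */

int main(void)
{
	int per_skb = MAX_SKB_FRAGS * FRAG_SZ;	/* 557056 bytes, ~0.53 MB */

	/* 557056 / 65536 truncates to 8: the factor-of-eight above */
	printf("payload per skb: %d bytes (%dx the old 64KB limit)\n",
	       per_skb, per_skb / 65536);
	return 0;
}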

We still allow payload in skb->head for the first skb put in the
queue, so that RPC workloads are not impacted.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
 net/ipv4/tcp.c |   31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a13fcb369f52fe85def7c9d856259bc0509f3453..7dae800092e62cec330544851289d20a68642561 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1020,17 +1020,31 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 }
 EXPORT_SYMBOL(tcp_sendpage);
 
-static inline int select_size(const struct sock *sk, bool sg)
+/* Do not bother using a page frag for very small frames.
+ * But use this heuristic only for the first skb in the write queue.
+ *
+ * Having no payload in skb->head allows better SACK shifting
+ * in tcp_shift_skb_data(), reducing sack/rack overhead, because
+ * the write queue has fewer skbs.
+ * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
+ * This also speeds up tso_fragment(), since it won't fall back
+ * to tcp_fragment().
+ */
+static int linear_payload_sz(bool first_skb)
+{
+	if (first_skb)
+		return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
+	return 0;
+}
+
+static int select_size(const struct sock *sk, bool sg, bool first_skb)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	int tmp = tp->mss_cache;
 
 	if (sg) {
 		if (sk_can_gso(sk)) {
-			/* Small frames wont use a full page:
-			 * Payload will immediately follow tcp header.
-			 */
-			tmp = SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
+			tmp = linear_payload_sz(first_skb);
 		} else {
 			int pgbreak = SKB_MAX_HEAD(MAX_TCP_HEADER);
 
@@ -1161,6 +1175,8 @@ restart:
 		}
 
 		if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
+			bool first_skb;
+
 new_segment:
 			/* Allocate new segment. If the interface is SG,
 			 * allocate skb fitting to single page.
@@ -1172,10 +1188,11 @@ new_segment:
 				process_backlog = false;
 				goto restart;
 			}
+			first_skb = skb_queue_empty(&sk->sk_write_queue);
 			skb = sk_stream_alloc_skb(sk,
-						  select_size(sk, sg),
+						  select_size(sk, sg, first_skb),
 						  sk->sk_allocation,
-						  skb_queue_empty(&sk->sk_write_queue));
+						  first_skb);
 			if (!skb)
 				goto wait_for_memory;
 

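To see the heuristic in isolation: only the first skb in the write
queue gets linear room for payload; every later skb gets none, so its
payload lands in page frags that tcp_shift_skb_data() can shift. A
standalone user-space sketch (the MAX_TCP_HEADER and overhead values
below are illustrative stand-ins, not the kernel's config-dependent
constants):

#include <stdio.h>
#include <stdbool.h>

#define MAX_TCP_HEADER		320	/* assumed; config-dependent in the kernel */
#define SKB_DATA_OVERHEAD	320	/* assumed stand-in for skb_shared_info etc. */
#define SKB_WITH_OVERHEAD(x)	((x) - SKB_DATA_OVERHEAD)

static int linear_payload_sz(bool first_skb)
{
	/* First skb keeps a small linear area so short RPC-style
	 * writes fit in one allocation; later skbs get none, forcing
	 * payload into page frags that SACK shifting can move.
	 */
	if (first_skb)
		return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
	return 0;
}

int main(void)
{
	printf("first skb : %d linear bytes\n", linear_payload_sz(true));
	printf("later skbs: %d linear bytes\n", linear_payload_sz(false));
	return 0;
}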

* Re: [PATCH net-next] tcp: prepare skbs for better sack shifting
  2016-09-15 16:33 [PATCH net-next] tcp: prepare skbs for better sack shifting Eric Dumazet
@ 2016-09-15 17:52 ` Yuchung Cheng
  2016-09-17 14:05 ` David Miller
  1 sibling, 0 replies; 3+ messages in thread
From: Yuchung Cheng @ 2016-09-15 17:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On Thu, Sep 15, 2016 at 9:33 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> With large BDP TCP flows and lossy networks, it is very important
> to keep a low number of skbs in the write queue.
>
> RACK and SACK processing can perform a linear scan of it.
>
> We should avoid putting any payload in skb->head, so that SACK
> shifting can be done if needed.
>
> With this patch, we can pack ~0.5 MB per skb instead of
> the 64KB initially cooked at tcp_sendmsg() time.
>
> This reduces the number of skbs in the write queue by a factor
> of eight. tcp_rack_detect_loss() likes this.
>
> We still allow payload in skb->head for the first skb put in the
> queue, so that RPC workloads are not impacted.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>

* Re: [PATCH net-next] tcp: prepare skbs for better sack shifting
  2016-09-15 16:33 [PATCH net-next] tcp: prepare skbs for better sack shifting Eric Dumazet
  2016-09-15 17:52 ` Yuchung Cheng
@ 2016-09-17 14:05 ` David Miller
  1 sibling, 0 replies; 3+ messages in thread
From: David Miller @ 2016-09-17 14:05 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, ycheng

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 15 Sep 2016 09:33:02 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> With large BDP TCP flows and lossy networks, it is very important
> to keep a low number of skbs in the write queue.
> 
> RACK and SACK processing can perform a linear scan of it.
> 
> We should avoid putting any payload in skb->head, so that SACK
> shifting can be done if needed.
> 
> With this patch, we can pack ~0.5 MB per skb instead of
> the 64KB initially cooked at tcp_sendmsg() time.
> 
> This reduces the number of skbs in the write queue by a factor
> of eight. tcp_rack_detect_loss() likes this.
> 
> We still allow payload in skb->head for the first skb put in the
> queue, so that RPC workloads are not impacted.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>

Applied.
