* [PATCH net-next] tcp: fix slow start after idle vs TSO/GSO
@ 2015-08-20 17:08 Eric Dumazet
  2015-08-21 15:10 ` Neal Cardwell
  2015-08-21 19:30 ` [PATCH v2 " Eric Dumazet
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2015-08-20 17:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

Slow start after idle might reduce cwnd, but we currently perform this
check only after the first packet has been cooked and sent.

With TSO/GSO, this means that we might send a full TSO packet
even though cwnd should have been reduced to IW10.

Moving the SSAI check into skb_entail() makes sense, because
it slightly reduces the number of times this check is done,
especially for large send() calls and TCP Small Queues callbacks
from softirq context.

Tested:

The following packetdrill test demonstrates the problem:
// Test of slow start after idle

`sysctl -q net.ipv4.tcp_slow_start_after_idle=1`

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0    bind(3, ..., ...) = 0
+0    listen(3, 1) = 0

+0    < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.100 < . 1:1(0) ack 1 win 511
+0    accept(3, ..., ...) = 4
+0    setsockopt(4, SOL_SOCKET, SO_SNDBUF, [200000], 4) = 0

+0    write(4, ..., 26000) = 26000
+0    > . 1:5001(5000) ack 1
+0    > . 5001:10001(5000) ack 1
+0    %{ assert tcpi_snd_cwnd == 10 }%

+.100 < . 1:1(0) ack 10001 win 511
+0    %{ assert tcpi_snd_cwnd == 20, tcpi_snd_cwnd }%
+0    > . 10001:20001(10000) ack 1
+0    > P. 20001:26001(6000) ack 1

+.100 < . 1:1(0) ack 26001 win 511
+0    %{ assert tcpi_snd_cwnd == 36, tcpi_snd_cwnd }%

+4 write(4, ..., 20000) = 20000
// If slow start after idle works properly, we should send 5 MSS here (cwnd/2)
+0    > . 26001:31001(5000) ack 1
+0    %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%
+0    > . 31001:36001(5000) ack 1
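The cwnd assertions in this test follow classic slow start accounting:
each ACK grows cwnd by the number of full-sized segments it newly
acknowledges. A minimal model of that growth (a simplification of the
kernel's tcp_slow_start(); the byte-counting MSS arithmetic here is an
assumption for illustration):

```c
#include <stdint.h>

/* Slow start sketch: cwnd grows by one segment per newly ACKed MSS. */
static uint32_t slow_start(uint32_t cwnd, uint32_t acked_bytes, uint32_t mss)
{
	return cwnd + acked_bytes / mss;
}
```

At MSS 1000, the ACK of 10000 bytes takes cwnd from 10 to 20, and the
next ACK of 16000 bytes takes it from 20 to 36, matching the assertions
above.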

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
 include/net/tcp.h     |    1 +
 net/ipv4/tcp.c        |    8 ++++++++
 net/ipv4/tcp_output.c |   12 ++++--------
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 364426a..639f64e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1165,6 +1165,7 @@ static inline void tcp_sack_reset(struct tcp_options_received *rx_opt)
 }
 
 u32 tcp_default_init_rwnd(u32 mss);
+void tcp_cwnd_restart(struct sock *sk, s32 delta);
 
 /* Determine a window scaling and initial window to offer. */
 void tcp_select_initial_window(int __space, __u32 mss, __u32 *rcv_wnd,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 45534a5..e228433 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -627,6 +627,14 @@ static void skb_entail(struct sock *sk, struct sk_buff *skb)
 	sk_mem_charge(sk, skb->truesize);
 	if (tp->nonagle & TCP_NAGLE_PUSH)
 		tp->nonagle &= ~TCP_NAGLE_PUSH;
+
+	if (sysctl_tcp_slow_start_after_idle &&
+	    sk->sk_write_queue.next == skb) {
+		s32 delta = tcp_time_stamp - tp->lsndtime;
+
+		if (delta > inet_csk(sk)->icsk_rto)
+			tcp_cwnd_restart(sk, delta);
+	}
 }
 
 static inline void tcp_mark_urg(struct tcp_sock *tp, int flags)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 444ab5b..1188e4f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -137,12 +137,12 @@ static __u16 tcp_advertise_mss(struct sock *sk)
 }
 
 /* RFC2861. Reset CWND after idle period longer RTO to "restart window".
- * This is the first part of cwnd validation mechanism. */
-static void tcp_cwnd_restart(struct sock *sk, const struct dst_entry *dst)
+ * This is the first part of cwnd validation mechanism.
+ */
+void tcp_cwnd_restart(struct sock *sk, s32 delta)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	s32 delta = tcp_time_stamp - tp->lsndtime;
-	u32 restart_cwnd = tcp_init_cwnd(tp, dst);
+	u32 restart_cwnd = tcp_init_cwnd(tp, __sk_dst_get(sk));
 	u32 cwnd = tp->snd_cwnd;
 
 	tcp_ca_event(sk, CA_EVENT_CWND_RESTART);
@@ -164,10 +164,6 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	const u32 now = tcp_time_stamp;
 
-	if (sysctl_tcp_slow_start_after_idle &&
-	    (!tp->packets_out && (s32)(now - tp->lsndtime) > icsk->icsk_rto))
-		tcp_cwnd_restart(sk, __sk_dst_get(sk));
-
 	tp->lsndtime = now;
 
 	/* If it is a reply for ato after last received


* Re: [PATCH net-next] tcp: fix slow start after idle vs TSO/GSO
  2015-08-20 17:08 [PATCH net-next] tcp: fix slow start after idle vs TSO/GSO Eric Dumazet
@ 2015-08-21 15:10 ` Neal Cardwell
  2015-08-21 16:27   ` Eric Dumazet
  2015-08-21 19:30 ` [PATCH v2 " Eric Dumazet
  1 sibling, 1 reply; 6+ messages in thread
From: Neal Cardwell @ 2015-08-21 15:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Yuchung Cheng

On Thu, Aug 20, 2015 at 1:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Slow start after idle might reduce cwnd, but we currently perform this
> check only after the first packet has been cooked and sent.
>
> With TSO/GSO, this means that we might send a full TSO packet
> even though cwnd should have been reduced to IW10.
>
> Moving the SSAI check into skb_entail() makes sense, because
> it slightly reduces the number of times this check is done,
> especially for large send() calls and TCP Small Queues callbacks
> from softirq context.

Very nice catch, and this fix seems like a definite improvement.

One potential issue is that the connection can restart from idle not
just because new data has been written (which this patch addresses),
but also because the receive window opens and so now packets can be
sent again. The old version of the code implicitly fired the restart
code path in the "receive window opens" case as well, since it fired
every time new data was sent. We might want to check if we need to
call tcp_cwnd_restart() in tcp_ack_update_window(), next to the call
to tcp_fast_path_check()?

neal


* Re: [PATCH net-next] tcp: fix slow start after idle vs TSO/GSO
  2015-08-21 15:10 ` Neal Cardwell
@ 2015-08-21 16:27   ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2015-08-21 16:27 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: David Miller, netdev, Yuchung Cheng

On Fri, 2015-08-21 at 11:10 -0400, Neal Cardwell wrote:

> Very nice catch, and this fix seems like a definite improvement.
> 
> One potential issue is that the connection can restart from idle not
> just because new data has been written (which this patch addresses),
> but also because the receive window opens and so now packets can be
> sent again. The old version of the code implicitly fired the restart
> code path in the "receive window opens" case as well, since it fired
> every time new data was sent. We might want to check if we need to
> call tcp_cwnd_restart() in tcp_ack_update_window(), next to the call
> to tcp_fast_path_check()?

Excellent. I wrote a second packetdrill test to exercise this path and
will submit a v2 soon.

Thanks Neal


* [PATCH v2 net-next] tcp: fix slow start after idle vs TSO/GSO
  2015-08-20 17:08 [PATCH net-next] tcp: fix slow start after idle vs TSO/GSO Eric Dumazet
  2015-08-21 15:10 ` Neal Cardwell
@ 2015-08-21 19:30 ` Eric Dumazet
  2015-08-22  1:43   ` Neal Cardwell
  2015-08-25 18:23   ` David Miller
  1 sibling, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2015-08-21 19:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

Slow start after idle might reduce cwnd, but we currently perform this
check only after the first packet has been cooked and sent.

With TSO/GSO, this means that we might send a full TSO packet
even though cwnd should have been reduced to IW10.

Moving the SSAI check into skb_entail() makes sense, because
it slightly reduces the number of times this check is done,
especially for large send() calls and TCP Small Queues callbacks
from softirq context.

As Neal pointed out, we also need to perform the check
if/when the receive window opens.
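The gating that v2's tcp_slow_start_after_idle_check() helper performs
before calling tcp_cwnd_restart() can be sketched in plain C: restart
only when SSAI is enabled, no packets are in flight, and the connection
has been idle for longer than one RTO. The flattened struct and the
function names below are illustrative assumptions, not the kernel's
actual types.

```c
#include <stdint.h>

struct conn_state {
	int     packets_out;	/* data packets currently in flight */
	int32_t lsndtime;	/* timestamp of last data send, in ms */
	int32_t rto;		/* current retransmission timeout, in ms */
};

/* Return nonzero when a cwnd restart is warranted, mirroring the
 * conditions checked before calling tcp_cwnd_restart().
 */
static int needs_cwnd_restart(const struct conn_state *c, int32_t now,
			      int ssai_enabled)
{
	if (!ssai_enabled || c->packets_out)
		return 0;			/* disabled, or not idle */
	return (now - c->lsndtime) > c->rto;	/* idle longer than one RTO */
}
```

In v2 this check fires from both trigger points: skb_entail() when new
data is written, and tcp_ack_update_window() when the receive window
opens with data still queued.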

Tested:

The following packetdrill test demonstrates the problem:
// Test of slow start after idle

`sysctl -q net.ipv4.tcp_slow_start_after_idle=1`

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0    bind(3, ..., ...) = 0
+0    listen(3, 1) = 0

+0    < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.100 < . 1:1(0) ack 1 win 511
+0    accept(3, ..., ...) = 4
+0    setsockopt(4, SOL_SOCKET, SO_SNDBUF, [200000], 4) = 0

+0    write(4, ..., 26000) = 26000
+0    > . 1:5001(5000) ack 1
+0    > . 5001:10001(5000) ack 1
+0    %{ assert tcpi_snd_cwnd == 10 }%

+.100 < . 1:1(0) ack 10001 win 511
+0    %{ assert tcpi_snd_cwnd == 20, tcpi_snd_cwnd }%
+0    > . 10001:20001(10000) ack 1
+0    > P. 20001:26001(6000) ack 1

+.100 < . 1:1(0) ack 26001 win 511
+0    %{ assert tcpi_snd_cwnd == 36, tcpi_snd_cwnd }%

+4 write(4, ..., 20000) = 20000
// If slow start after idle works properly, we should send 5 MSS here (cwnd/2)
+0    > . 26001:31001(5000) ack 1
+0    %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%
+0    > . 31001:36001(5000) ack 1

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
 include/net/tcp.h     |   13 +++++++++++++
 net/ipv4/tcp.c        |    2 ++
 net/ipv4/tcp_input.c  |    3 +++
 net/ipv4/tcp_output.c |   12 ++++--------
 4 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 364426a..309801f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1165,6 +1165,19 @@ static inline void tcp_sack_reset(struct tcp_options_received *rx_opt)
 }
 
 u32 tcp_default_init_rwnd(u32 mss);
+void tcp_cwnd_restart(struct sock *sk, s32 delta);
+
+static inline void tcp_slow_start_after_idle_check(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	s32 delta;
+
+	if (!sysctl_tcp_slow_start_after_idle || tp->packets_out)
+		return;
+	delta = tcp_time_stamp - tp->lsndtime;
+	if (delta > inet_csk(sk)->icsk_rto)
+		tcp_cwnd_restart(sk, delta);
+}
 
 /* Determine a window scaling and initial window to offer. */
 void tcp_select_initial_window(int __space, __u32 mss, __u32 *rcv_wnd,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 45534a5..b8b8fa1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -627,6 +627,8 @@ static void skb_entail(struct sock *sk, struct sk_buff *skb)
 	sk_mem_charge(sk, skb->truesize);
 	if (tp->nonagle & TCP_NAGLE_PUSH)
 		tp->nonagle &= ~TCP_NAGLE_PUSH;
+
+	tcp_slow_start_after_idle_check(sk);
 }
 
 static inline void tcp_mark_urg(struct tcp_sock *tp, int flags)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4e4d6bc..0abca28 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3332,6 +3332,9 @@ static int tcp_ack_update_window(struct sock *sk, const struct sk_buff *skb, u32
 			tp->pred_flags = 0;
 			tcp_fast_path_check(sk);
 
+			if (tcp_send_head(sk))
+				tcp_slow_start_after_idle_check(sk);
+
 			if (nwin > tp->max_window) {
 				tp->max_window = nwin;
 				tcp_sync_mss(sk, inet_csk(sk)->icsk_pmtu_cookie);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 444ab5b..1188e4f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -137,12 +137,12 @@ static __u16 tcp_advertise_mss(struct sock *sk)
 }
 
 /* RFC2861. Reset CWND after idle period longer RTO to "restart window".
- * This is the first part of cwnd validation mechanism. */
-static void tcp_cwnd_restart(struct sock *sk, const struct dst_entry *dst)
+ * This is the first part of cwnd validation mechanism.
+ */
+void tcp_cwnd_restart(struct sock *sk, s32 delta)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	s32 delta = tcp_time_stamp - tp->lsndtime;
-	u32 restart_cwnd = tcp_init_cwnd(tp, dst);
+	u32 restart_cwnd = tcp_init_cwnd(tp, __sk_dst_get(sk));
 	u32 cwnd = tp->snd_cwnd;
 
 	tcp_ca_event(sk, CA_EVENT_CWND_RESTART);
@@ -164,10 +164,6 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	const u32 now = tcp_time_stamp;
 
-	if (sysctl_tcp_slow_start_after_idle &&
-	    (!tp->packets_out && (s32)(now - tp->lsndtime) > icsk->icsk_rto))
-		tcp_cwnd_restart(sk, __sk_dst_get(sk));
-
 	tp->lsndtime = now;
 
 	/* If it is a reply for ato after last received


* Re: [PATCH v2 net-next] tcp: fix slow start after idle vs TSO/GSO
  2015-08-21 19:30 ` [PATCH v2 " Eric Dumazet
@ 2015-08-22  1:43   ` Neal Cardwell
  2015-08-25 18:23   ` David Miller
  1 sibling, 0 replies; 6+ messages in thread
From: Neal Cardwell @ 2015-08-22  1:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Yuchung Cheng

On Fri, Aug 21, 2015 at 3:30 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Slow start after idle might reduce cwnd, but we currently perform this
> check only after the first packet has been cooked and sent.
>
> With TSO/GSO, this means that we might send a full TSO packet
> even though cwnd should have been reduced to IW10.
>
> Moving the SSAI check into skb_entail() makes sense, because
> it slightly reduces the number of times this check is done,
> especially for large send() calls and TCP Small Queues callbacks
> from softirq context.
>
> As Neal pointed out, we also need to perform the check
> if/when the receive window opens.
...
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

Looks good to me. Thanks, Eric!

neal


* Re: [PATCH v2 net-next] tcp: fix slow start after idle vs TSO/GSO
  2015-08-21 19:30 ` [PATCH v2 " Eric Dumazet
  2015-08-22  1:43   ` Neal Cardwell
@ 2015-08-25 18:23   ` David Miller
  1 sibling, 0 replies; 6+ messages in thread
From: David Miller @ 2015-08-25 18:23 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, ncardwell, ycheng

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 21 Aug 2015 12:30:00 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Slow start after idle might reduce cwnd, but we currently perform this
> check only after the first packet has been cooked and sent.
>
> With TSO/GSO, this means that we might send a full TSO packet
> even though cwnd should have been reduced to IW10.
>
> Moving the SSAI check into skb_entail() makes sense, because
> it slightly reduces the number of times this check is done,
> especially for large send() calls and TCP Small Queues callbacks
> from softirq context.
>
> As Neal pointed out, we also need to perform the check
> if/when the receive window opens.
> 
> Tested:
> 
> The following packetdrill test demonstrates the problem:
 ...
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

