* [PATCH net-next 0/2] tcp: minor adjustments for low pacing rates
@ 2020-05-04 18:27 Eric Dumazet
  2020-05-04 18:27 ` [PATCH net-next 1/2] tcp: refine tcp_pacing_delay() for very " Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Eric Dumazet @ 2020-05-04 18:27 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Neal Cardwell, Yuchung Cheng

After the addition of the pacing horizon, we have to adjust how we arm the RTO
timer, otherwise we might freeze very low pacing rate flows.

Eric Dumazet (2):
  tcp: refine tcp_pacing_delay() for very low pacing rates
  tcp: defer xmit timer reset in tcp_xmit_retransmit_queue()

 include/net/tcp.h     | 21 ++++++++-------------
 net/ipv4/tcp_input.c  |  4 ++--
 net/ipv4/tcp_output.c | 22 ++++++++++++----------
 3 files changed, 22 insertions(+), 25 deletions(-)

-- 
2.26.2.526.g744177e7f7-goog


* [PATCH net-next 1/2] tcp: refine tcp_pacing_delay() for very low pacing rates
  2020-05-04 18:27 [PATCH net-next 0/2] tcp: minor adjustments for low pacing rates Eric Dumazet
@ 2020-05-04 18:27 ` Eric Dumazet
  2020-05-04 18:27 ` [PATCH net-next 2/2] tcp: defer xmit timer reset in tcp_xmit_retransmit_queue() Eric Dumazet
  2020-05-07  0:29 ` [PATCH net-next 0/2] tcp: minor adjustments for low pacing rates David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2020-05-04 18:27 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Neal Cardwell, Yuchung Cheng

With the addition of the horizon feature to sch_fq, we noticed some
suboptimal behavior for extremely low pacing rate TCP flows, especially
when TCP is not aware of a drop happening in lower layers of the stack.

Back in commit 3f80e08f40cd ("tcp: add tcp_reset_xmit_timer() helper"),
tcp_pacing_delay() was added to estimate an extra delay to add to standard
RTO timers.

This patch removes the skb argument from this helper and from
tcp_reset_xmit_timer(), because it makes more sense to simply
consider the time at which the next packet is allowed to be sent,
instead of the time of whatever packet has already been sent.

This avoids arming the RTO timer too soon and removes
spurious horizon drops.
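
For illustration, here is a minimal userspace sketch (not kernel code; the
simplified helper, HZ value, packet size and pacing rates below are made up
for the example) of how the gap between tp->tcp_wstamp_ns and
tp->tcp_clock_cache turns into the jiffies delay that tcp_pacing_delay()
adds on top of the RTO:

#include <stdio.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define HZ           1000ULL		/* assume CONFIG_HZ=1000 */

/* Simplified stand-in for the kernel's nsecs_to_jiffies(). */
static uint64_t nsecs_to_jiffies(uint64_t ns)
{
	return ns * HZ / NSEC_PER_SEC;
}

int main(void)
{
	uint64_t rates[] = { 500, 5000, 50000 };	/* pacing rates, bytes/sec */
	uint64_t skb_len = 1500;			/* one MTU-sized packet */

	for (int i = 0; i < 3; i++) {
		/* Roughly how long sch_fq holds the next packet: len / rate. */
		uint64_t gap_ns = skb_len * NSEC_PER_SEC / rates[i];

		printf("rate %6llu B/s -> extra RTO delay ~%llu jiffies\n",
		       (unsigned long long)rates[i],
		       (unsigned long long)nsecs_to_jiffies(gap_ns));
	}
	return 0;
}

At 500 bytes/sec this is already ~3000 jiffies (with HZ=1000): basing the
delay on the EDT of a packet that was already sent, rather than on
tcp_wstamp_ns, can therefore arm the RTO seconds before the flow is even
allowed to transmit again.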

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h     | 21 ++++++++-------------
 net/ipv4/tcp_input.c  |  4 ++--
 net/ipv4/tcp_output.c |  8 +++-----
 3 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 1beed50522b116e670887e7af132174d21c71744..43b87a8d4790658b28356b5394e07b1419e27b77 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1289,26 +1289,22 @@ static inline bool tcp_needs_internal_pacing(const struct sock *sk)
 	return smp_load_acquire(&sk->sk_pacing_status) == SK_PACING_NEEDED;
 }
 
-/* Return in jiffies the delay before one skb is sent.
- * If @skb is NULL, we look at EDT for next packet being sent on the socket.
+/* Estimates in how many jiffies next packet for this flow can be sent.
+ * Scheduling a retransmit timer too early would be silly.
  */
-static inline unsigned long tcp_pacing_delay(const struct sock *sk,
-					     const struct sk_buff *skb)
+static inline unsigned long tcp_pacing_delay(const struct sock *sk)
 {
-	s64 pacing_delay = skb ? skb->tstamp : tcp_sk(sk)->tcp_wstamp_ns;
+	s64 delay = tcp_sk(sk)->tcp_wstamp_ns - tcp_sk(sk)->tcp_clock_cache;
 
-	pacing_delay -= tcp_sk(sk)->tcp_clock_cache;
-
-	return pacing_delay > 0 ? nsecs_to_jiffies(pacing_delay) : 0;
+	return delay > 0 ? nsecs_to_jiffies(delay) : 0;
 }
 
 static inline void tcp_reset_xmit_timer(struct sock *sk,
 					const int what,
 					unsigned long when,
-					const unsigned long max_when,
-					const struct sk_buff *skb)
+					const unsigned long max_when)
 {
-	inet_csk_reset_xmit_timer(sk, what, when + tcp_pacing_delay(sk, skb),
+	inet_csk_reset_xmit_timer(sk, what, when + tcp_pacing_delay(sk),
 				  max_when);
 }
 
@@ -1336,8 +1332,7 @@ static inline void tcp_check_probe_timer(struct sock *sk)
 {
 	if (!tcp_sk(sk)->packets_out && !inet_csk(sk)->icsk_pending)
 		tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0,
-				     tcp_probe0_base(sk), TCP_RTO_MAX,
-				     NULL);
+				     tcp_probe0_base(sk), TCP_RTO_MAX);
 }
 
 static inline void tcp_init_wl(struct tcp_sock *tp, u32 seq)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d68128a672ab05899d26eb9b1978c4a34023d51f..7d205b2a733ce940dba289a89ca7b78974b52caf 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3014,7 +3014,7 @@ void tcp_rearm_rto(struct sock *sk)
 			rto = usecs_to_jiffies(max_t(int, delta_us, 1));
 		}
 		tcp_reset_xmit_timer(sk, ICSK_TIME_RETRANS, rto,
-				     TCP_RTO_MAX, tcp_rtx_queue_head(sk));
+				     TCP_RTO_MAX);
 	}
 }
 
@@ -3291,7 +3291,7 @@ static void tcp_ack_probe(struct sock *sk)
 		unsigned long when = tcp_probe0_when(sk, TCP_RTO_MAX);
 
 		tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0,
-				     when, TCP_RTO_MAX, NULL);
+				     when, TCP_RTO_MAX);
 	}
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c414aeb1efa927c84bd4cd6afd0c21d46f049d5b..32c9db902f180aad3776b85bcdeb482443a9a5be 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2593,8 +2593,7 @@ bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto)
 	if (rto_delta_us > 0)
 		timeout = min_t(u32, timeout, usecs_to_jiffies(rto_delta_us));
 
-	tcp_reset_xmit_timer(sk, ICSK_TIME_LOSS_PROBE, timeout,
-			     TCP_RTO_MAX, NULL);
+	tcp_reset_xmit_timer(sk, ICSK_TIME_LOSS_PROBE, timeout, TCP_RTO_MAX);
 	return true;
 }
 
@@ -3174,8 +3173,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		    icsk->icsk_pending != ICSK_TIME_REO_TIMEOUT)
 			tcp_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
 					     inet_csk(sk)->icsk_rto,
-					     TCP_RTO_MAX,
-					     skb);
+					     TCP_RTO_MAX);
 	}
 }
 
@@ -3907,7 +3905,7 @@ void tcp_send_probe0(struct sock *sk)
 		 */
 		timeout = TCP_RESOURCE_PROBE_INTERVAL;
 	}
-	tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0, timeout, TCP_RTO_MAX, NULL);
+	tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0, timeout, TCP_RTO_MAX);
 }
 
 int tcp_rtx_synack(const struct sock *sk, struct request_sock *req)
-- 
2.26.2.526.g744177e7f7-goog


* [PATCH net-next 2/2] tcp: defer xmit timer reset in tcp_xmit_retransmit_queue()
  2020-05-04 18:27 [PATCH net-next 0/2] tcp: minor adjustments for low pacing rates Eric Dumazet
  2020-05-04 18:27 ` [PATCH net-next 1/2] tcp: refine tcp_pacing_delay() for very " Eric Dumazet
@ 2020-05-04 18:27 ` Eric Dumazet
  2020-05-07  0:29 ` [PATCH net-next 0/2] tcp: minor adjustments for low pacing rates David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2020-05-04 18:27 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Neal Cardwell, Yuchung Cheng

As hinted in the prior change ("tcp: refine tcp_pacing_delay()
for very low pacing rates"), it is probably best to arm
the xmit timer only when all the packets have been scheduled,
rather than when the head of the rtx queue has been re-sent.

This matters for flows with extremely low pacing rates,
since their tp->tcp_wstamp_ns could be far in the future.

Note that the regular xmit path has a stronger limit
in tcp_small_queue_check(), meaning it is less likely to
go beyond the pacing horizon.
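
A generic sketch of the pattern applied here (the struct, helper names and
values are hypothetical, not the kernel ones): early exits in the loop become
breaks, the decision to re-arm is only recorded, and the timer is touched
exactly once after the loop:

#include <stdbool.h>
#include <stdio.h>

struct pkt { int id; int ok_to_send; struct pkt *next; };

/* Hypothetical stand-ins for tcp_retransmit_skb() and the timer reset. */
static int send_one(struct pkt *p)
{
	return p->ok_to_send ? 0 : -1;	/* non-zero: stop and retry later */
}

static void arm_retransmit_timer(void)
{
	printf("timer armed once, after the loop\n");
}

static void retransmit_queue(struct pkt *head)
{
	bool rearm_timer = false;

	for (struct pkt *p = head; p; p = p->next) {
		if (send_one(p))
			break;		/* a "return" here would skip the re-arm */
		rearm_timer = true;	/* remember the decision, act later */
	}
	if (rearm_timer)
		arm_retransmit_timer();
}

int main(void)
{
	struct pkt c = { 3, 0, NULL }, b = { 2, 1, &c }, a = { 1, 1, &b };

	retransmit_queue(&a);
	return 0;
}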

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_output.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 32c9db902f180aad3776b85bcdeb482443a9a5be..a50e1990a845a258d4cc6a2a989d09068ea3a973 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3112,6 +3112,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct sk_buff *skb, *rtx_head, *hole = NULL;
 	struct tcp_sock *tp = tcp_sk(sk);
+	bool rearm_timer = false;
 	u32 max_segs;
 	int mib_idx;
 
@@ -3134,7 +3135,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 
 		segs = tp->snd_cwnd - tcp_packets_in_flight(tp);
 		if (segs <= 0)
-			return;
+			break;
 		sacked = TCP_SKB_CB(skb)->sacked;
 		/* In case tcp_shift_skb_data() have aggregated large skbs,
 		 * we need to make sure not sending too bigs TSO packets
@@ -3159,10 +3160,10 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 			continue;
 
 		if (tcp_small_queue_check(sk, skb, 1))
-			return;
+			break;
 
 		if (tcp_retransmit_skb(sk, skb, segs))
-			return;
+			break;
 
 		NET_ADD_STATS(sock_net(sk), mib_idx, tcp_skb_pcount(skb));
 
@@ -3171,10 +3172,13 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 
 		if (skb == rtx_head &&
 		    icsk->icsk_pending != ICSK_TIME_REO_TIMEOUT)
-			tcp_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
-					     inet_csk(sk)->icsk_rto,
-					     TCP_RTO_MAX);
+			rearm_timer = true;
+
 	}
+	if (rearm_timer)
+		tcp_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
+				     inet_csk(sk)->icsk_rto,
+				     TCP_RTO_MAX);
 }
 
 /* We allow to exceed memory limits for FIN packets to expedite
-- 
2.26.2.526.g744177e7f7-goog


* Re: [PATCH net-next 0/2] tcp: minor adjustments for low pacing rates
  2020-05-04 18:27 [PATCH net-next 0/2] tcp: minor adjustments for low pacing rates Eric Dumazet
  2020-05-04 18:27 ` [PATCH net-next 1/2] tcp: refine tcp_pacing_delay() for very " Eric Dumazet
  2020-05-04 18:27 ` [PATCH net-next 2/2] tcp: defer xmit timer reset in tcp_xmit_retransmit_queue() Eric Dumazet
@ 2020-05-07  0:29 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2020-05-07  0:29 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, ncardwell, ycheng

From: Eric Dumazet <edumazet@google.com>
Date: Mon,  4 May 2020 11:27:48 -0700

> After the addition of the pacing horizon, we have to adjust how we arm the RTO
> timer, otherwise we might freeze very low pacing rate flows.

Series applied, thanks Eric.
