linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH stable 4.19 0/4] tcp: fix the timeout value calculated by tcp_model_timeout() is not accurate
@ 2021-09-26  7:18 Qiumiao Zhang
  2021-09-26  7:18 ` [PATCH stable 4.19 1/4] tcp: address problems caused by EDT misshaps Qiumiao Zhang
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Qiumiao Zhang @ 2021-09-26  7:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: stable, Greg Kroah-Hartman, Sasha Levin, David S . Miller,
	Alexey Kuznetsov, netdev, yanan, rose.chen

This patch series is present in v5.15 and fixes the problem that the
timeout value calculated by tcp_model_timeout() is not accurate.

Pseudo-Shortlog of commits:

Eric Dumazet <edumazet@google.com>
    tcp: address problems caused by EDT misshaps

Yuchung Cheng <ycheng@google.com>
    tcp: always set retrans_stamp on recovery

Yuchung Cheng <ycheng@google.com>
    tcp: create a helper to model exponential backoff

Eric Dumazet <edumazet@google.com>
    tcp: adjust rto_base in retransmits_timed_out()

 net/ipv4/tcp_input.c  | 16 ++++++-----
 net/ipv4/tcp_output.c |  9 +++----
 net/ipv4/tcp_timer.c  | 63 ++++++++++++++++++++-----------------------
 3 files changed, 43 insertions(+), 45 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH stable 4.19 1/4] tcp: address problems caused by EDT misshaps
  2021-09-26  7:18 [PATCH stable 4.19 0/4] tcp: fix the timeout value calculated by tcp_model_timeout() is not accurate Qiumiao Zhang
@ 2021-09-26  7:18 ` Qiumiao Zhang
  2021-09-26  7:18 ` [PATCH stable 4.19 2/4] tcp: always set retrans_stamp on recovery Qiumiao Zhang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Qiumiao Zhang @ 2021-09-26  7:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: stable, Greg Kroah-Hartman, Sasha Levin, David S . Miller,
	Alexey Kuznetsov, netdev, yanan, rose.chen

From: Eric Dumazet <edumazet@google.com>

commit 9efdda4e3abed13f0903b7b6e4d4c2102019440a upstream

When a qdisc setup including pacing FQ is dismantled and recreated,
some TCP packets are sent earlier than instructed by TCP stack.

TCP can be fooled when ACK comes back, because the following
operation can return a negative value.

    tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;

Some paths in TCP stack were not dealing properly with this,
this patch addresses four of them.

Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
---
 net/ipv4/tcp_input.c | 16 ++++++++++------
 net/ipv4/tcp_timer.c | 10 ++++++----
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a058fb936c2a..2a5f7db9a653 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -581,10 +581,12 @@ static inline void tcp_rcv_rtt_measure_ts(struct sock *sk,
 		u32 delta = tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
 		u32 delta_us;
 
-		if (!delta)
-			delta = 1;
-		delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
-		tcp_rcv_rtt_update(tp, delta_us, 0);
+		if (likely(delta < INT_MAX / (USEC_PER_SEC / TCP_TS_HZ))) {
+			if (!delta)
+				delta = 1;
+			delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
+			tcp_rcv_rtt_update(tp, delta_us, 0);
+		}
 	}
 }
 
@@ -2934,9 +2936,11 @@ static bool tcp_ack_update_rtt(struct sock *sk, const int flag,
 	if (seq_rtt_us < 0 && tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
 	    flag & FLAG_ACKED) {
 		u32 delta = tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
-		u32 delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
 
-		seq_rtt_us = ca_rtt_us = delta_us;
+		if (likely(delta < INT_MAX / (USEC_PER_SEC / TCP_TS_HZ))) {
+			seq_rtt_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
+			ca_rtt_us = seq_rtt_us;
+		}
 	}
 	rs->rtt_us = ca_rtt_us; /* RTT of last (S)ACKed packet (or -1) */
 	if (seq_rtt_us < 0)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 681882a40968..7c7f7e6cd955 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -40,15 +40,17 @@ static u32 tcp_clamp_rto_to_user_timeout(const struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	u32 elapsed, start_ts;
+	s32 remaining;
 
 	start_ts = tcp_retransmit_stamp(sk);
 	if (!icsk->icsk_user_timeout || !start_ts)
 		return icsk->icsk_rto;
 	elapsed = tcp_time_stamp(tcp_sk(sk)) - start_ts;
-	if (elapsed >= icsk->icsk_user_timeout)
+	remaining = icsk->icsk_user_timeout - elapsed;
+	if (remaining <= 0)
 		return 1; /* user timeout has passed; fire ASAP */
-	else
-		return min_t(u32, icsk->icsk_rto, msecs_to_jiffies(icsk->icsk_user_timeout - elapsed));
+
+	return min_t(u32, icsk->icsk_rto, msecs_to_jiffies(remaining));
 }
 
 /**
@@ -210,7 +212,7 @@ static bool retransmits_timed_out(struct sock *sk,
 				(boundary - linear_backoff_thresh) * TCP_RTO_MAX;
 		timeout = jiffies_to_msecs(timeout);
 	}
-	return (tcp_time_stamp(tcp_sk(sk)) - start_ts) >= timeout;
+	return (s32)(tcp_time_stamp(tcp_sk(sk)) - start_ts - timeout) >= 0;
 }
 
 /* A write timeout has occurred. Process the after effects. */
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH stable 4.19 2/4] tcp: always set retrans_stamp on recovery
  2021-09-26  7:18 [PATCH stable 4.19 0/4] tcp: fix the timeout value calculated by tcp_model_timeout() is not accurate Qiumiao Zhang
  2021-09-26  7:18 ` [PATCH stable 4.19 1/4] tcp: address problems caused by EDT misshaps Qiumiao Zhang
@ 2021-09-26  7:18 ` Qiumiao Zhang
  2021-09-26  7:18 ` [PATCH stable 4.19 3/4] tcp: create a helper to model exponential backoff Qiumiao Zhang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Qiumiao Zhang @ 2021-09-26  7:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: stable, Greg Kroah-Hartman, Sasha Levin, David S . Miller,
	Alexey Kuznetsov, netdev, yanan, rose.chen

From: Yuchung Cheng <ycheng@google.com>

commit 7ae189759cc48cf8b54beebff566e9fd2d4e7d7c upstream

Previously TCP socket's retrans_stamp is not set if the
retransmission has failed to send. As a result if a socket is
experiencing local issues to retransmit packets, determining when
to abort a socket is complicated w/o knowning the starting time of
the recovery since retrans_stamp may remain zero.

This complication causes sub-optimal behavior that TCP may use the
latest, instead of the first, retransmission time to compute the
elapsed time of a stalling connection due to local issues. Then TCP
may disrecard TCP retries settings and keep retrying until it finally
succeed: not a good idea when the local host is already strained.

The simple fix is to always timestamp the start of a recovery.
It's worth noting that retrans_stamp is also used to compare echo
timestamp values to detect spurious recovery. This patch does
not break that because retrans_stamp is still later than when the
original packet was sent.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
---
 net/ipv4/tcp_output.c |  9 ++++-----
 net/ipv4/tcp_timer.c  | 23 +++--------------------
 2 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 97b9d671a83c..6710056fd1b2 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3011,13 +3011,12 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 #endif
 		TCP_SKB_CB(skb)->sacked |= TCPCB_RETRANS;
 		tp->retrans_out += tcp_skb_pcount(skb);
-
-		/* Save stamp of the first retransmit. */
-		if (!tp->retrans_stamp)
-			tp->retrans_stamp = tcp_skb_timestamp(skb);
-
 	}
 
+	/* Save stamp of the first (attempted) retransmit. */
+	if (!tp->retrans_stamp)
+		tp->retrans_stamp = tcp_skb_timestamp(skb);
+
 	if (tp->undo_retrans < 0)
 		tp->undo_retrans = 0;
 	tp->undo_retrans += tcp_skb_pcount(skb);
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 7c7f7e6cd955..9a904bf8decc 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -22,28 +22,14 @@
 #include <linux/gfp.h>
 #include <net/tcp.h>
 
-static u32 tcp_retransmit_stamp(const struct sock *sk)
-{
-	u32 start_ts = tcp_sk(sk)->retrans_stamp;
-
-	if (unlikely(!start_ts)) {
-		struct sk_buff *head = tcp_rtx_queue_head(sk);
-
-		if (!head)
-			return 0;
-		start_ts = tcp_skb_timestamp(head);
-	}
-	return start_ts;
-}
-
 static u32 tcp_clamp_rto_to_user_timeout(const struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	u32 elapsed, start_ts;
 	s32 remaining;
 
-	start_ts = tcp_retransmit_stamp(sk);
-	if (!icsk->icsk_user_timeout || !start_ts)
+	start_ts = tcp_sk(sk)->retrans_stamp;
+	if (!icsk->icsk_user_timeout)
 		return icsk->icsk_rto;
 	elapsed = tcp_time_stamp(tcp_sk(sk)) - start_ts;
 	remaining = icsk->icsk_user_timeout - elapsed;
@@ -198,10 +184,7 @@ static bool retransmits_timed_out(struct sock *sk,
 	if (!inet_csk(sk)->icsk_retransmits)
 		return false;
 
-	start_ts = tcp_retransmit_stamp(sk);
-	if (!start_ts)
-		return false;
-
+	start_ts = tcp_sk(sk)->retrans_stamp;
 	if (likely(timeout == 0)) {
 		linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base);
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH stable 4.19 3/4] tcp: create a helper to model exponential backoff
  2021-09-26  7:18 [PATCH stable 4.19 0/4] tcp: fix the timeout value calculated by tcp_model_timeout() is not accurate Qiumiao Zhang
  2021-09-26  7:18 ` [PATCH stable 4.19 1/4] tcp: address problems caused by EDT misshaps Qiumiao Zhang
  2021-09-26  7:18 ` [PATCH stable 4.19 2/4] tcp: always set retrans_stamp on recovery Qiumiao Zhang
@ 2021-09-26  7:18 ` Qiumiao Zhang
  2021-09-26  7:18 ` [PATCH stable 4.19 4/4] tcp: adjust rto_base in retransmits_timed_out() Qiumiao Zhang
  2021-09-27 12:23 ` [PATCH stable 4.19 0/4] tcp: fix the timeout value calculated by tcp_model_timeout() is not accurate Greg Kroah-Hartman
  4 siblings, 0 replies; 6+ messages in thread
From: Qiumiao Zhang @ 2021-09-26  7:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: stable, Greg Kroah-Hartman, Sasha Levin, David S . Miller,
	Alexey Kuznetsov, netdev, yanan, rose.chen

From: Yuchung Cheng <ycheng@google.com>

commit 01a523b071618abbc634d1958229fe3bd2dfa5fa upstream

Create a helper to model TCP exponential backoff for the next patch.
This is pure refactor w no behavior change.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
---
 net/ipv4/tcp_timer.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 9a904bf8decc..9e9507f125a2 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -160,7 +160,20 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
 	tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
 }
 
-
+static unsigned int tcp_model_timeout(struct sock *sk,
+				      unsigned int boundary,
+				      unsigned int rto_base)
+{
+	unsigned int linear_backoff_thresh, timeout;
+
+	linear_backoff_thresh = ilog2(TCP_RTO_MAX / rto_base);
+	if (boundary <= linear_backoff_thresh)
+		timeout = ((2 << boundary) - 1) * rto_base;
+	else
+		timeout = ((2 << linear_backoff_thresh) - 1) * rto_base +
+			(boundary - linear_backoff_thresh) * TCP_RTO_MAX;
+	return jiffies_to_msecs(timeout);
+}
 /**
  *  retransmits_timed_out() - returns true if this connection has timed out
  *  @sk:       The current socket
@@ -178,23 +191,15 @@ static bool retransmits_timed_out(struct sock *sk,
 				  unsigned int boundary,
 				  unsigned int timeout)
 {
-	const unsigned int rto_base = TCP_RTO_MIN;
-	unsigned int linear_backoff_thresh, start_ts;
+	unsigned int start_ts;
 
 	if (!inet_csk(sk)->icsk_retransmits)
 		return false;
 
 	start_ts = tcp_sk(sk)->retrans_stamp;
-	if (likely(timeout == 0)) {
-		linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base);
-
-		if (boundary <= linear_backoff_thresh)
-			timeout = ((2 << boundary) - 1) * rto_base;
-		else
-			timeout = ((2 << linear_backoff_thresh) - 1) * rto_base +
-				(boundary - linear_backoff_thresh) * TCP_RTO_MAX;
-		timeout = jiffies_to_msecs(timeout);
-	}
+	if (likely(timeout == 0))
+		timeout = tcp_model_timeout(sk, boundary, TCP_RTO_MIN);
+
 	return (s32)(tcp_time_stamp(tcp_sk(sk)) - start_ts - timeout) >= 0;
 }
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH stable 4.19 4/4] tcp: adjust rto_base in retransmits_timed_out()
  2021-09-26  7:18 [PATCH stable 4.19 0/4] tcp: fix the timeout value calculated by tcp_model_timeout() is not accurate Qiumiao Zhang
                   ` (2 preceding siblings ...)
  2021-09-26  7:18 ` [PATCH stable 4.19 3/4] tcp: create a helper to model exponential backoff Qiumiao Zhang
@ 2021-09-26  7:18 ` Qiumiao Zhang
  2021-09-27 12:23 ` [PATCH stable 4.19 0/4] tcp: fix the timeout value calculated by tcp_model_timeout() is not accurate Greg Kroah-Hartman
  4 siblings, 0 replies; 6+ messages in thread
From: Qiumiao Zhang @ 2021-09-26  7:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: stable, Greg Kroah-Hartman, Sasha Levin, David S . Miller,
	Alexey Kuznetsov, netdev, yanan, rose.chen

From: Eric Dumazet <edumazet@google.com>

commit 3256a2d6ab1f71f9a1bd2d7f6f18eb8108c48d17 upstream

The cited commit exposed an old retransmits_timed_out() bug
which assumed it could call tcp_model_timeout() with
TCP_RTO_MIN as rto_base for all states.

But flows in SYN_SENT or SYN_RECV state uses a different
RTO base (1 sec instead of 200 ms, unless BPF choses
another value)

This caused a reduction of SYN retransmits from 6 to 4 with
the default /proc/sys/net/ipv4/tcp_syn_retries value.

Fixes: a41e8a88b06e ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
---
 net/ipv4/tcp_timer.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 9e9507f125a2..d071ed6b8b9a 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -197,8 +197,13 @@ static bool retransmits_timed_out(struct sock *sk,
 		return false;
 
 	start_ts = tcp_sk(sk)->retrans_stamp;
-	if (likely(timeout == 0))
-		timeout = tcp_model_timeout(sk, boundary, TCP_RTO_MIN);
+	if (likely(timeout == 0)) {
+		unsigned int rto_base = TCP_RTO_MIN;
+
+		if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))
+			rto_base = tcp_timeout_init(sk);
+		timeout = tcp_model_timeout(sk, boundary, rto_base);
+	}
 
 	return (s32)(tcp_time_stamp(tcp_sk(sk)) - start_ts - timeout) >= 0;
 }
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH stable 4.19 0/4] tcp: fix the timeout value calculated by tcp_model_timeout() is not accurate
  2021-09-26  7:18 [PATCH stable 4.19 0/4] tcp: fix the timeout value calculated by tcp_model_timeout() is not accurate Qiumiao Zhang
                   ` (3 preceding siblings ...)
  2021-09-26  7:18 ` [PATCH stable 4.19 4/4] tcp: adjust rto_base in retransmits_timed_out() Qiumiao Zhang
@ 2021-09-27 12:23 ` Greg Kroah-Hartman
  4 siblings, 0 replies; 6+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-27 12:23 UTC (permalink / raw)
  To: Qiumiao Zhang
  Cc: linux-kernel, stable, Sasha Levin, David S . Miller,
	Alexey Kuznetsov, netdev, yanan, rose.chen

On Sun, Sep 26, 2021 at 03:18:38PM +0800, Qiumiao Zhang wrote:
> This patch series is present in v5.15 and fixes the problem that the
> timeout value calculated by tcp_model_timeout() is not accurate.

5.15 is not yet released :)

These are all in 5.1 and 5.4 and newer...

Anyway, I've queued them all up now, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-09-27 12:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-26  7:18 [PATCH stable 4.19 0/4] tcp: fix the timeout value calculated by tcp_model_timeout() is not accurate Qiumiao Zhang
2021-09-26  7:18 ` [PATCH stable 4.19 1/4] tcp: address problems caused by EDT misshaps Qiumiao Zhang
2021-09-26  7:18 ` [PATCH stable 4.19 2/4] tcp: always set retrans_stamp on recovery Qiumiao Zhang
2021-09-26  7:18 ` [PATCH stable 4.19 3/4] tcp: create a helper to model exponential backoff Qiumiao Zhang
2021-09-26  7:18 ` [PATCH stable 4.19 4/4] tcp: adjust rto_base in retransmits_timed_out() Qiumiao Zhang
2021-09-27 12:23 ` [PATCH stable 4.19 0/4] tcp: fix the timeout value calculated by tcp_model_timeout() is not accurate Greg Kroah-Hartman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).