[PATCH net 0/3] tcp: fix xmit timer rearming to avoid stalls

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH net 0/3] tcp: fix xmit timer rearming to avoid stalls
@ 2017-08-01  2:58 Neal Cardwell
  2017-08-01  2:58 ` [PATCH net 1/3] tcp: introduce tcp_rto_delta_us() helper for xmit timer fix Neal Cardwell
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Neal Cardwell @ 2017-08-01  2:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Neal Cardwell

This patch series is a bug fix for a TCP loss recovery performance bug
reported independently in recent netdev threads:
    
 (i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
 (ii) July 26, 2017: netdev thread:
       "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
       outstanding TLP retransmission"

Many thanks to Klavs Klavsen and Mao Wenan for the detailed reports,
traces, and packetdrill test cases, which enabled us to root-cause
this issue and verify the fix.

Neal Cardwell (3):
  tcp: introduce tcp_rto_delta_us() helper for xmit timer fix
  tcp: enable xmit timer fix by having TLP use time when RTO should fire
  tcp: fix xmit timer to only be reset if data ACKed/SACKed

 include/net/tcp.h     | 10 ++++++++++
 net/ipv4/tcp_input.c  | 30 +++++++++++++++++-------------
 net/ipv4/tcp_output.c | 21 ++++-----------------
 3 files changed, 31 insertions(+), 30 deletions(-)

-- 
2.14.0.rc0.400.g1c36432dff-goog

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH net 1/3] tcp: introduce tcp_rto_delta_us() helper for xmit timer fix
  2017-08-01  2:58 [PATCH net 0/3] tcp: fix xmit timer rearming to avoid stalls Neal Cardwell
@ 2017-08-01  2:58 ` Neal Cardwell
  2017-08-01  7:16   ` Eric Dumazet
  2017-08-01  2:58 ` [PATCH net 2/3] tcp: enable xmit timer fix by having TLP use time when RTO should fire Neal Cardwell
  2017-08-01  2:58 ` [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed Neal Cardwell
  2 siblings, 1 reply; 22+ messages in thread
From: Neal Cardwell @ 2017-08-01  2:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng, Nandita Dukkipati

Pure refactor. This helper will be required in the xmit timer fix
later in the patch series. (Because the TLP logic will want to make
this calculation.)

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
---
 include/net/tcp.h    | 10 ++++++++++
 net/ipv4/tcp_input.c |  5 +----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 70483296157f..ada65e767b28 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1916,6 +1916,16 @@ extern void tcp_rack_advance(struct tcp_sock *tp, u8 sacked, u32 end_seq,
 			     u64 xmit_time);
 extern void tcp_rack_reo_timeout(struct sock *sk);
 
+/* At how many usecs into the future should the RTO fire? */
+static inline s64 tcp_rto_delta_us(const struct sock *sk)
+{
+	const struct sk_buff *skb = tcp_write_queue_head(sk);
+	u32 rto = inet_csk(sk)->icsk_rto;
+	u64 rto_time_stamp_us = skb->skb_mstamp + jiffies_to_usecs(rto);
+
+	return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
+}
+
 /*
  * Save and compile IPv4 options, return a pointer to it
  */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2920e0cb09f8..345febf0a46e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3004,10 +3004,7 @@ void tcp_rearm_rto(struct sock *sk)
 		/* Offset the time elapsed after installing regular RTO */
 		if (icsk->icsk_pending == ICSK_TIME_REO_TIMEOUT ||
 		    icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
-			struct sk_buff *skb = tcp_write_queue_head(sk);
-			u64 rto_time_stamp = skb->skb_mstamp +
-					     jiffies_to_usecs(rto);
-			s64 delta_us = rto_time_stamp - tp->tcp_mstamp;
+			s64 delta_us = tcp_rto_delta_us(sk);
 			/* delta_us may not be positive if the socket is locked
 			 * when the retrans timer fires and is rescheduled.
 			 */
-- 
2.14.0.rc0.400.g1c36432dff-goog

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 2/3] tcp: enable xmit timer fix by having TLP use time when RTO should fire
  2017-08-01  2:58 [PATCH net 0/3] tcp: fix xmit timer rearming to avoid stalls Neal Cardwell
  2017-08-01  2:58 ` [PATCH net 1/3] tcp: introduce tcp_rto_delta_us() helper for xmit timer fix Neal Cardwell
@ 2017-08-01  2:58 ` Neal Cardwell
  2017-08-01  7:22   ` Eric Dumazet
  2017-08-01  2:58 ` [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed Neal Cardwell
  2 siblings, 1 reply; 22+ messages in thread
From: Neal Cardwell @ 2017-08-01  2:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng, Nandita Dukkipati

Have tcp_schedule_loss_probe() base the TLP scheduling decision based
on when the RTO *should* fire. This is to enable the upcoming xmit
timer fix in this series, where tcp_schedule_loss_probe() cannot
assume that the last timer installed was an RTO timer (because we are
no longer doing the "rearm RTO, rearm RTO, rearm TLP" dance on every
ACK). So tcp_schedule_loss_probe() must independently figure out when
an RTO would want to fire.

In the new TLP implementation following in this series, we cannot
assume that icsk_timeout was set based on an RTO; after processing a
cumulative ACK the icsk_timeout we see can be from a previous TLP or
RTO. So we need to independently recalculate the RTO time (instead of
reading it out of icsk_timeout). Removing this dependency on the
nature of icsk_timeout makes things a little easier to reason about
anyway.

Note that the old and new code should be equivalent, since they are
both saying: "if the RTO is in the future, but at an earlier time than
the normal TLP time, then set the TLP timer to fire when the RTO would
have fired".

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
---
 net/ipv4/tcp_output.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2f1588bf73da..0ae6b5d176c0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2377,8 +2377,8 @@ bool tcp_schedule_loss_probe(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
-	u32 timeout, tlp_time_stamp, rto_time_stamp;
 	u32 rtt = usecs_to_jiffies(tp->srtt_us >> 3);
+	u32 timeout, rto_delta_us;

 	/* No consecutive loss probes. */
 	if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
@@ -2418,13 +2418,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
 	timeout = max_t(u32, timeout, msecs_to_jiffies(10));

 	/* If RTO is shorter, just schedule TLP in its place. */
-	tlp_time_stamp = tcp_jiffies32 + timeout;
-	rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout;
-	if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) {
-		s32 delta = rto_time_stamp - tcp_jiffies32;
-		if (delta > 0)
-			timeout = delta;
-	}
+	rto_delta_us = tcp_rto_delta_us(sk);  /* How far in future is RTO? */
+	if (rto_delta_us > 0)
+		timeout = min_t(u32, timeout, usecs_to_jiffies(rto_delta_us));

 	inet_csk_reset_xmit_timer(sk, ICSK_TIME_LOSS_PROBE, timeout,
 				  TCP_RTO_MAX);
-- 
2.14.0.rc0.400.g1c36432dff-goog

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-01  2:58 [PATCH net 0/3] tcp: fix xmit timer rearming to avoid stalls Neal Cardwell
  2017-08-01  2:58 ` [PATCH net 1/3] tcp: introduce tcp_rto_delta_us() helper for xmit timer fix Neal Cardwell
  2017-08-01  2:58 ` [PATCH net 2/3] tcp: enable xmit timer fix by having TLP use time when RTO should fire Neal Cardwell
@ 2017-08-01  2:58 ` Neal Cardwell
  2017-08-01  7:25   ` Eric Dumazet
                     ` (3 more replies)
  2 siblings, 4 replies; 22+ messages in thread
From: Neal Cardwell @ 2017-08-01  2:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng, Nandita Dukkipati

Fix a TCP loss recovery performance bug raised recently on the netdev
list, in two threads:

(i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
(ii) July 26, 2017: netdev thread:
     "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
     outstanding TLP retransmission"

The basic problem is that incoming TCP packets that did not indicate
forward progress could cause the xmit timer (TLP or RTO) to be rearmed
and pushed back in time. In certain corner cases this could result in
the following problems noted in these threads:

 - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
   could cause TCP to repeatedly schedule TLPs forever. We kept
   sending TLPs after every ~200ms, which elicited bogus SACKs, which
   caused more TLPs, ad infinitum; we never fired an RTO to fill in
   the holes.

 - Incoming data segments could, in some cases, cause us to reschedule
   our RTO or TLP timer further out in time, for no good reason. This
   could cause repeated inbound data to result in stalls in outbound
   data, in the presence of packet loss.

This commit fixes these bugs by changing the TLP and RTO ACK
processing to:

 (a) Only reschedule the xmit timer once per ACK.

 (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
     ACK indicates sufficient forward progress (a packet was
     cumulatively ACKed, or we got a SACK for a packet that was sent
     before the most recent retransmit of the write queue head).

This brings us back into closer compliance with the RFCs, since, as
the comment for tcp_rearm_rto() notes, we should only restart the RTO
timer after forward progress on the connection. Previously we were
restarting the xmit timer even in these cases where there was no
forward progress.

As a side benefit, this commit simplifies and speeds up the TCP timer
arming logic. We had been calling inet_csk_reset_xmit_timer() three
times on normal ACKs that cumulatively acknowledged some data:

1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
        if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
               tcp_rearm_rto(sk);

2) Once in tcp_clean_rtx_queue(), to update the RTO:
        if (flag & FLAG_ACKED) {
               tcp_rearm_rto(sk);

3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
   to TLP:
        if (icsk->icsk_pending == ICSK_TIME_RETRANS)
               tcp_schedule_loss_probe(sk);

This commit, by only rescheduling the xmit timer once per ACK,
simplifies the code and reduces CPU overhead.

This commit was tested in an A/B test with Google web server
traffic. SNMP stats and request latency metrics were within noise
levels, substantiating that for normal web traffic patterns this is a
rare issue. This commit was also tested with packetdrill tests to
verify that it fixes the timer behavior in the corner cases discussed
in the netdev threads mentioned above.

This patch is a bug fix patch intended to be queued for -stable
relases.

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Reported-by: Klavs Klavsen <kl@vsen.dk>
Reported-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
---
 net/ipv4/tcp_input.c  | 25 ++++++++++++++++---------
 net/ipv4/tcp_output.c |  9 ---------
 2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 345febf0a46e..3e777cfbba56 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -107,6 +107,7 @@ int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
 #define FLAG_ORIG_SACK_ACKED	0x200 /* Never retransmitted data are (s)acked	*/
 #define FLAG_SND_UNA_ADVANCED	0x400 /* Snd_una was changed (!= FLAG_DATA_ACKED) */
 #define FLAG_DSACKING_ACK	0x800 /* SACK blocks contained D-SACK info */
+#define FLAG_SET_XMIT_TIMER	0x1000 /* Set TLP or RTO timer */
 #define FLAG_SACK_RENEGING	0x2000 /* snd_una advanced to a sacked seq */
 #define FLAG_UPDATE_TS_RECENT	0x4000 /* tcp_replace_ts_recent() */
 #define FLAG_NO_CHALLENGE_ACK	0x8000 /* do not call tcp_send_challenge_ack()	*/
@@ -3016,6 +3017,13 @@ void tcp_rearm_rto(struct sock *sk)
 	}
 }
 
+/* Try to schedule a loss probe; if that doesn't work, then schedule an RTO. */
+static void tcp_set_xmit_timer(struct sock *sk)
+{
+	if (!tcp_schedule_loss_probe(sk))
+		tcp_rearm_rto(sk);
+}
+
 /* If we get here, the whole TSO packet has not been acked. */
 static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb)
 {
@@ -3177,7 +3185,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 					ca_rtt_us, sack->rate);
 
 	if (flag & FLAG_ACKED) {
-		tcp_rearm_rto(sk);
+		flag |= FLAG_SET_XMIT_TIMER;  /* set TLP or RTO timer */
 		if (unlikely(icsk->icsk_mtup.probe_size &&
 			     !after(tp->mtu_probe.probe_seq_end, tp->snd_una))) {
 			tcp_mtup_probe_success(sk);
@@ -3205,7 +3213,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 		 * after when the head was last (re)transmitted. Otherwise the
 		 * timeout may continue to extend in loss recovery.
 		 */
-		tcp_rearm_rto(sk);
+		flag |= FLAG_SET_XMIT_TIMER;  /* set TLP or RTO timer */
 	}
 
 	if (icsk->icsk_ca_ops->pkts_acked) {
@@ -3577,9 +3585,6 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	if (after(ack, tp->snd_nxt))
 		goto invalid_ack;
 
-	if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
-		tcp_rearm_rto(sk);
-
 	if (after(ack, prior_snd_una)) {
 		flag |= FLAG_SND_UNA_ADVANCED;
 		icsk->icsk_retransmits = 0;
@@ -3644,18 +3649,20 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	flag |= tcp_clean_rtx_queue(sk, prior_fackets, prior_snd_una, &acked,
 				    &sack_state);
 
+	if (tp->tlp_high_seq)
+		tcp_process_tlp_ack(sk, ack, flag);
+	/* If needed, reset TLP/RTO timer; RACK may later override this. */
+	if (flag & FLAG_SET_XMIT_TIMER)
+		tcp_set_xmit_timer(sk);
+
 	if (tcp_ack_is_dubious(sk, flag)) {
 		is_dupack = !(flag & (FLAG_SND_UNA_ADVANCED | FLAG_NOT_DUP));
 		tcp_fastretrans_alert(sk, acked, is_dupack, &flag, &rexmit);
 	}
-	if (tp->tlp_high_seq)
-		tcp_process_tlp_ack(sk, ack, flag);
 
 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
 		sk_dst_confirm(sk);
 
-	if (icsk->icsk_pending == ICSK_TIME_RETRANS)
-		tcp_schedule_loss_probe(sk);
 	delivered = tp->delivered - delivered;	/* freshly ACKed or SACKed */
 	lost = tp->lost - lost;			/* freshly marked lost */
 	tcp_rate_gen(sk, delivered, lost, sack_state.rate);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0ae6b5d176c0..c99cba897b9c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2380,21 +2380,12 @@ bool tcp_schedule_loss_probe(struct sock *sk)
 	u32 rtt = usecs_to_jiffies(tp->srtt_us >> 3);
 	u32 timeout, rto_delta_us;
 
-	/* No consecutive loss probes. */
-	if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
-		tcp_rearm_rto(sk);
-		return false;
-	}
 	/* Don't do any loss probe on a Fast Open connection before 3WHS
 	 * finishes.
 	 */
 	if (tp->fastopen_rsk)
 		return false;
 
-	/* TLP is only scheduled when next timer event is RTO. */
-	if (icsk->icsk_pending != ICSK_TIME_RETRANS)
-		return false;
-
 	/* Schedule a loss probe in 2*RTT for SACK capable connections
 	 * in Open state, that are either limited by cwnd or application.
 	 */
-- 
2.14.0.rc0.400.g1c36432dff-goog

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net 1/3] tcp: introduce tcp_rto_delta_us() helper for xmit timer fix
  2017-08-01  2:58 ` [PATCH net 1/3] tcp: introduce tcp_rto_delta_us() helper for xmit timer fix Neal Cardwell
@ 2017-08-01  7:16   ` Eric Dumazet
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2017-08-01  7:16 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: David Miller, netdev, Yuchung Cheng, Nandita Dukkipati

On Mon, 2017-07-31 at 22:58 -0400, Neal Cardwell wrote:
> Pure refactor. This helper will be required in the xmit timer fix
> later in the patch series. (Because the TLP logic will want to make
> this calculation.)
> 
> Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Nandita Dukkipati <nanditad@google.com>
> ---
>  include/net/tcp.h    | 10 ++++++++++
>  net/ipv4/tcp_input.c |  5 +----
>  2 files changed, 11 insertions(+), 4 deletions(-)

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 2/3] tcp: enable xmit timer fix by having TLP use time when RTO should fire
  2017-08-01  2:58 ` [PATCH net 2/3] tcp: enable xmit timer fix by having TLP use time when RTO should fire Neal Cardwell
@ 2017-08-01  7:22   ` Eric Dumazet
  2017-08-01 14:35     ` Neal Cardwell
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2017-08-01  7:22 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: David Miller, netdev, Yuchung Cheng, Nandita Dukkipati

On Mon, 2017-07-31 at 22:58 -0400, Neal Cardwell wrote:
> Have tcp_schedule_loss_probe() base the TLP scheduling decision based
> on when the RTO *should* fire. This is to enable the upcoming xmit
> timer fix in this series, where tcp_schedule_loss_probe() cannot
> assume that the last timer installed was an RTO timer (because we are
> no longer doing the "rearm RTO, rearm RTO, rearm TLP" dance on every
> ACK). So tcp_schedule_loss_probe() must independently figure out when
> an RTO would want to fire.
> 
> In the new TLP implementation following in this series, we cannot
> assume that icsk_timeout was set based on an RTO; after processing a
> cumulative ACK the icsk_timeout we see can be from a previous TLP or
> RTO. So we need to independently recalculate the RTO time (instead of
> reading it out of icsk_timeout). Removing this dependency on the
> nature of icsk_timeout makes things a little easier to reason about
> anyway.
> 
> Note that the old and new code should be equivalent, since they are
> both saying: "if the RTO is in the future, but at an earlier time than
> the normal TLP time, then set the TLP timer to fire when the RTO would
> have fired".
> 
> Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Nandita Dukkipati <nanditad@google.com>
> ---
>  net/ipv4/tcp_output.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 2f1588bf73da..0ae6b5d176c0 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2377,8 +2377,8 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>  {
>  	struct inet_connection_sock *icsk = inet_csk(sk);
>  	struct tcp_sock *tp = tcp_sk(sk);
> -	u32 timeout, tlp_time_stamp, rto_time_stamp;
>  	u32 rtt = usecs_to_jiffies(tp->srtt_us >> 3);
> +	u32 timeout, rto_delta_us;
>  
>  	/* No consecutive loss probes. */
>  	if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
> @@ -2418,13 +2418,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>  	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
>  
>  	/* If RTO is shorter, just schedule TLP in its place. */

I have hard time to read this comment.

We are here trying to arm a timer based on TLP.

If RTO is shorter, we'll arm the timer based on RTO instead of TLP.

Is "If RTO is shorter, just schedule TLP in its place." really correct ?

I suggest we reword the comment or simply get rid of it now the code is
more obvious.

> -	tlp_time_stamp = tcp_jiffies32 + timeout;
> -	rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout;
> -	if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) {
> -		s32 delta = rto_time_stamp - tcp_jiffies32;
> -		if (delta > 0)
> -			timeout = delta;
> -	}
> +	rto_delta_us = tcp_rto_delta_us(sk);  /* How far in future is RTO? */
> +	if (rto_delta_us > 0)
> +		timeout = min_t(u32, timeout, usecs_to_jiffies(rto_delta_us));
>  
>  	inet_csk_reset_xmit_timer(sk, ICSK_TIME_LOSS_PROBE, timeout,
>  				  TCP_RTO_MAX);

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-01  2:58 ` [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed Neal Cardwell
@ 2017-08-01  7:25   ` Eric Dumazet
  2017-08-01 12:20   ` maowenan
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2017-08-01  7:25 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: David Miller, netdev, Yuchung Cheng, Nandita Dukkipati

On Mon, 2017-07-31 at 22:58 -0400, Neal Cardwell wrote:
> Fix a TCP loss recovery performance bug raised recently on the netdev
> list, in two threads:
> 
> (i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
> (ii) July 26, 2017: netdev thread:
>      "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
>      outstanding TLP retransmission"

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-01  2:58 ` [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed Neal Cardwell
  2017-08-01  7:25   ` Eric Dumazet
@ 2017-08-01 12:20   ` maowenan
  2017-08-01 14:24     ` Neal Cardwell
  2017-08-04  7:12   ` maowenan
  2017-08-04  7:33   ` maowenan
  3 siblings, 1 reply; 22+ messages in thread
From: maowenan @ 2017-08-01 12:20 UTC (permalink / raw)
  To: Neal Cardwell, David Miller; +Cc: netdev, Yuchung Cheng, Nandita Dukkipati



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Neal Cardwell
> Sent: Tuesday, August 01, 2017 10:58 AM
> To: David Miller
> Cc: netdev@vger.kernel.org; Neal Cardwell; Yuchung Cheng; Nandita Dukkipati
> Subject: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data
> ACKed/SACKed
> 
> Fix a TCP loss recovery performance bug raised recently on the netdev list, in
> two threads:
> 
> (i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
> (ii) July 26, 2017: netdev thread:
>      "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
>      outstanding TLP retransmission"
> 
> The basic problem is that incoming TCP packets that did not indicate forward
> progress could cause the xmit timer (TLP or RTO) to be rearmed and pushed
> back in time. In certain corner cases this could result in the following problems
> noted in these threads:
> 
>  - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
>    could cause TCP to repeatedly schedule TLPs forever. We kept
>    sending TLPs after every ~200ms, which elicited bogus SACKs, which
>    caused more TLPs, ad infinitum; we never fired an RTO to fill in
>    the holes.
> 
>  - Incoming data segments could, in some cases, cause us to reschedule
>    our RTO or TLP timer further out in time, for no good reason. This
>    could cause repeated inbound data to result in stalls in outbound
>    data, in the presence of packet loss.
> 
> This commit fixes these bugs by changing the TLP and RTO ACK processing to:
> 
>  (a) Only reschedule the xmit timer once per ACK.
> 
>  (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
>      ACK indicates sufficient forward progress (a packet was
>      cumulatively ACKed, or we got a SACK for a packet that was sent
>      before the most recent retransmit of the write queue head).
> 
> This brings us back into closer compliance with the RFCs, since, as the
> comment for tcp_rearm_rto() notes, we should only restart the RTO timer
> after forward progress on the connection. Previously we were restarting the
> xmit timer even in these cases where there was no forward progress.
> 
> As a side benefit, this commit simplifies and speeds up the TCP timer arming
> logic. We had been calling inet_csk_reset_xmit_timer() three times on normal
> ACKs that cumulatively acknowledged some data:
> 
> 1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
>         if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
>                tcp_rearm_rto(sk);
> 
> 2) Once in tcp_clean_rtx_queue(), to update the RTO:
>         if (flag & FLAG_ACKED) {
>                tcp_rearm_rto(sk);
> 
> 3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
>    to TLP:
>         if (icsk->icsk_pending == ICSK_TIME_RETRANS)
>                tcp_schedule_loss_probe(sk);
> 
> This commit, by only rescheduling the xmit timer once per ACK, simplifies the
> code and reduces CPU overhead.
> 
> This commit was tested in an A/B test with Google web server traffic. SNMP
> stats and request latency metrics were within noise levels, substantiating that
> for normal web traffic patterns this is a rare issue. This commit was also tested
> with packetdrill tests to verify that it fixes the timer behavior in the corner
> cases discussed in the netdev threads mentioned above.
> 
> This patch is a bug fix patch intended to be queued for -stable relases.
> 
> Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
> Reported-by: Klavs Klavsen <kl@vsen.dk>
> Reported-by: Mao Wenan <maowenan@huawei.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Nandita Dukkipati <nanditad@google.com>
> ---
>  net/ipv4/tcp_input.c  | 25 ++++++++++++++++---------
> net/ipv4/tcp_output.c |  9 ---------
>  2 files changed, 16 insertions(+), 18 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index
> 345febf0a46e..3e777cfbba56 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -107,6 +107,7 @@ int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
>  #define FLAG_ORIG_SACK_ACKED	0x200 /* Never retransmitted data are
> (s)acked	*/
>  #define FLAG_SND_UNA_ADVANCED	0x400 /* Snd_una was changed (!=
> FLAG_DATA_ACKED) */
>  #define FLAG_DSACKING_ACK	0x800 /* SACK blocks contained D-SACK info
> */
> +#define FLAG_SET_XMIT_TIMER	0x1000 /* Set TLP or RTO timer */
>  #define FLAG_SACK_RENEGING	0x2000 /* snd_una advanced to a sacked
> seq */
>  #define FLAG_UPDATE_TS_RECENT	0x4000 /* tcp_replace_ts_recent() */
>  #define FLAG_NO_CHALLENGE_ACK	0x8000 /* do not call
> tcp_send_challenge_ack()	*/
> @@ -3016,6 +3017,13 @@ void tcp_rearm_rto(struct sock *sk)
>  	}
>  }
> 
> +/* Try to schedule a loss probe; if that doesn't work, then schedule an
> +RTO. */ static void tcp_set_xmit_timer(struct sock *sk) {
> +	if (!tcp_schedule_loss_probe(sk))
> +		tcp_rearm_rto(sk);
> +}
> +
>  /* If we get here, the whole TSO packet has not been acked. */  static u32
> tcp_tso_acked(struct sock *sk, struct sk_buff *skb)  { @@ -3177,7 +3185,7
> @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
>  					ca_rtt_us, sack->rate);
> 
>  	if (flag & FLAG_ACKED) {
> -		tcp_rearm_rto(sk);
> +		flag |= FLAG_SET_XMIT_TIMER;  /* set TLP or RTO timer */
>  		if (unlikely(icsk->icsk_mtup.probe_size &&
>  			     !after(tp->mtu_probe.probe_seq_end, tp->snd_una))) {
>  			tcp_mtup_probe_success(sk);
> @@ -3205,7 +3213,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int
> prior_fackets,
>  		 * after when the head was last (re)transmitted. Otherwise the
>  		 * timeout may continue to extend in loss recovery.
>  		 */
> -		tcp_rearm_rto(sk);
> +		flag |= FLAG_SET_XMIT_TIMER;  /* set TLP or RTO timer */
>  	}
> 
>  	if (icsk->icsk_ca_ops->pkts_acked) {
> @@ -3577,9 +3585,6 @@ static int tcp_ack(struct sock *sk, const struct
> sk_buff *skb, int flag)
>  	if (after(ack, tp->snd_nxt))
>  		goto invalid_ack;
> 
> -	if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
> -		tcp_rearm_rto(sk);
> -
>  	if (after(ack, prior_snd_una)) {
>  		flag |= FLAG_SND_UNA_ADVANCED;
>  		icsk->icsk_retransmits = 0;
> @@ -3644,18 +3649,20 @@ static int tcp_ack(struct sock *sk, const struct
> sk_buff *skb, int flag)
>  	flag |= tcp_clean_rtx_queue(sk, prior_fackets, prior_snd_una, &acked,
>  				    &sack_state);
> 
> +	if (tp->tlp_high_seq)
> +		tcp_process_tlp_ack(sk, ack, flag);
> +	/* If needed, reset TLP/RTO timer; RACK may later override this. */
[Mao Wenan] I have question about RACK, if there is no RACK feature in lower version, who can clear this flag:FLAG_SET_XMIT_TIMER?
 
> +	if (flag & FLAG_SET_XMIT_TIMER)
> +		tcp_set_xmit_timer(sk);
> +
>  	if (tcp_ack_is_dubious(sk, flag)) {
>  		is_dupack = !(flag & (FLAG_SND_UNA_ADVANCED |
> FLAG_NOT_DUP));
>  		tcp_fastretrans_alert(sk, acked, is_dupack, &flag, &rexmit);
>  	}
> -	if (tp->tlp_high_seq)
> -		tcp_process_tlp_ack(sk, ack, flag);
> 
>  	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
>  		sk_dst_confirm(sk);
> 
> -	if (icsk->icsk_pending == ICSK_TIME_RETRANS)
> -		tcp_schedule_loss_probe(sk);
>  	delivered = tp->delivered - delivered;	/* freshly ACKed or SACKed */
>  	lost = tp->lost - lost;			/* freshly marked lost */
>  	tcp_rate_gen(sk, delivered, lost, sack_state.rate); diff --git
> a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index
> 0ae6b5d176c0..c99cba897b9c 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2380,21 +2380,12 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>  	u32 rtt = usecs_to_jiffies(tp->srtt_us >> 3);
>  	u32 timeout, rto_delta_us;
> 
> -	/* No consecutive loss probes. */
> -	if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
> -		tcp_rearm_rto(sk);
> -		return false;
> -	}
>  	/* Don't do any loss probe on a Fast Open connection before 3WHS
>  	 * finishes.
>  	 */
>  	if (tp->fastopen_rsk)
>  		return false;
> 
> -	/* TLP is only scheduled when next timer event is RTO. */
> -	if (icsk->icsk_pending != ICSK_TIME_RETRANS)
> -		return false;
> -
>  	/* Schedule a loss probe in 2*RTT for SACK capable connections
>  	 * in Open state, that are either limited by cwnd or application.
>  	 */
> --
> 2.14.0.rc0.400.g1c36432dff-goog

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-01 12:20   ` maowenan
@ 2017-08-01 14:24     ` Neal Cardwell
  0 siblings, 0 replies; 22+ messages in thread
From: Neal Cardwell @ 2017-08-01 14:24 UTC (permalink / raw)
  To: maowenan; +Cc: David Miller, netdev, Yuchung Cheng, Nandita Dukkipati

On Tue, Aug 1, 2017 at 8:20 AM, maowenan <maowenan@huawei.com> wrote:
> > +     /* If needed, reset TLP/RTO timer; RACK may later override this. */
> [Mao Wenan] I have question about RACK, if there is no RACK feature
> in lower version, who can clear this flag:FLAG_SET_XMIT_TIMER?

In the comment, "this" is referring to the xmit timer. The comment is
referring to the fact that RACK (or similarly
tcp_pause_early_retransmit() in earlier kernels) can override the
value of the xmit timer set in tcp_set_xmit_timer(). The comment is
not claiming that RACK clears the FLAG_SET_XMIT_TIMER bit in "flag".

Note that "flag" is an argument to tcp_ack(). And while processing a
given incoming ACK, the callers of tcp_ack() pass in either 0 or one
or more constant bit flags. So FLAG_SET_XMIT_TIMER and all the other
bit flags are implicitly set to zero at the beginning of processing
each ACK.

neal

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 2/3] tcp: enable xmit timer fix by having TLP use time when RTO should fire
  2017-08-01  7:22   ` Eric Dumazet
@ 2017-08-01 14:35     ` Neal Cardwell
  2017-08-01 20:40       ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Neal Cardwell @ 2017-08-01 14:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Netdev, Yuchung Cheng, Nandita Dukkipati

On Tue, Aug 1, 2017 at 3:22 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2017-07-31 at 22:58 -0400, Neal Cardwell wrote:
>> @@ -2418,13 +2418,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>>       timeout = max_t(u32, timeout, msecs_to_jiffies(10));
>>
>>       /* If RTO is shorter, just schedule TLP in its place. */
>
> I have hard time to read this comment.
>
> We are here trying to arm a timer based on TLP.
>
> If RTO is shorter, we'll arm the timer based on RTO instead of TLP.
>
> Is "If RTO is shorter, just schedule TLP in its place." really correct ?
>
> I suggest we reword the comment or simply get rid of it now the code is
> more obvious.

OK, how about:

  /* If the RTO formula yields an earlier time, then use that time. */

We can also add a reference to the RACK/TLP Internet Draft at the top
of tcp_schedule_loss_probe().

Whatever wording we decide on, I am happy to send a patch for net-next
once this fix is merged into net-next.

neal

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 2/3] tcp: enable xmit timer fix by having TLP use time when RTO should fire
  2017-08-01 14:35     ` Neal Cardwell
@ 2017-08-01 20:40       ` Eric Dumazet
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2017-08-01 20:40 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: David Miller, Netdev, Yuchung Cheng, Nandita Dukkipati

On Tue, 2017-08-01 at 10:35 -0400, Neal Cardwell wrote:
> On Tue, Aug 1, 2017 at 3:22 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Mon, 2017-07-31 at 22:58 -0400, Neal Cardwell wrote:
> >> @@ -2418,13 +2418,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
> >>       timeout = max_t(u32, timeout, msecs_to_jiffies(10));
> >>
> >>       /* If RTO is shorter, just schedule TLP in its place. */
> >
> > I have hard time to read this comment.
> >
> > We are here trying to arm a timer based on TLP.
> >
> > If RTO is shorter, we'll arm the timer based on RTO instead of TLP.
> >
> > Is "If RTO is shorter, just schedule TLP in its place." really correct ?
> >
> > I suggest we reword the comment or simply get rid of it now the code is
> > more obvious.
> 
> OK, how about:
> 
>   /* If the RTO formula yields an earlier time, then use that time. */
> 

Sounds better :)

> We can also add a reference to the RACK/TLP Internet Draft at the top
> of tcp_schedule_loss_probe().
> 
> Whatever wording we decide on, I am happy to send a patch for net-next
> once this fix is merged into net-next.

Sure.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-01  2:58 ` [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed Neal Cardwell
  2017-08-01  7:25   ` Eric Dumazet
  2017-08-01 12:20   ` maowenan
@ 2017-08-04  7:12   ` maowenan
  2017-08-04 15:54     ` Neal Cardwell
  2017-08-04  7:33   ` maowenan
  3 siblings, 1 reply; 22+ messages in thread
From: maowenan @ 2017-08-04  7:12 UTC (permalink / raw)
  To: maowenan, Neal Cardwell, David Miller
  Cc: netdev, Yuchung Cheng, Nandita Dukkipati



> -----Original Message-----
> From: maowenan
> Sent: Tuesday, August 01, 2017 8:20 PM
> To: 'Neal Cardwell'; David Miller
> Cc: netdev@vger.kernel.org; Yuchung Cheng; Nandita Dukkipati
> Subject: RE: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data
> ACKed/SACKed
> 
> 
> 
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org
> > [mailto:netdev-owner@vger.kernel.org]
> > On Behalf Of Neal Cardwell
> > Sent: Tuesday, August 01, 2017 10:58 AM
> > To: David Miller
> > Cc: netdev@vger.kernel.org; Neal Cardwell; Yuchung Cheng; Nandita
> > Dukkipati
> > Subject: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data
> > ACKed/SACKed
> >
> > Fix a TCP loss recovery performance bug raised recently on the netdev
> > list, in two threads:
> >
> > (i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
> > (ii) July 26, 2017: netdev thread:
> >      "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
> >      outstanding TLP retransmission"
> >
> > The basic problem is that incoming TCP packets that did not indicate
> > forward progress could cause the xmit timer (TLP or RTO) to be rearmed
> > and pushed back in time. In certain corner cases this could result in
> > the following problems noted in these threads:
> >
> >  - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
> >    could cause TCP to repeatedly schedule TLPs forever. We kept
> >    sending TLPs after every ~200ms, which elicited bogus SACKs, which
> >    caused more TLPs, ad infinitum; we never fired an RTO to fill in
> >    the holes.
> >
> >  - Incoming data segments could, in some cases, cause us to reschedule
> >    our RTO or TLP timer further out in time, for no good reason. This
> >    could cause repeated inbound data to result in stalls in outbound
> >    data, in the presence of packet loss.
> >
> > This commit fixes these bugs by changing the TLP and RTO ACK processing to:
> >
> >  (a) Only reschedule the xmit timer once per ACK.
> >
> >  (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
> >      ACK indicates sufficient forward progress (a packet was
> >      cumulatively ACKed, or we got a SACK for a packet that was sent
> >      before the most recent retransmit of the write queue head).
> >
> > This brings us back into closer compliance with the RFCs, since, as
> > the comment for tcp_rearm_rto() notes, we should only restart the RTO
> > timer after forward progress on the connection. Previously we were
> > restarting the xmit timer even in these cases where there was no forward
> progress.
> >
> > As a side benefit, this commit simplifies and speeds up the TCP timer
> > arming logic. We had been calling inet_csk_reset_xmit_timer() three
> > times on normal ACKs that cumulatively acknowledged some data:
> >
> > 1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
> >         if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
> >                tcp_rearm_rto(sk);
> >
> > 2) Once in tcp_clean_rtx_queue(), to update the RTO:
> >         if (flag & FLAG_ACKED) {
> >                tcp_rearm_rto(sk);
> >
> > 3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
> >    to TLP:
> >         if (icsk->icsk_pending == ICSK_TIME_RETRANS)
> >                tcp_schedule_loss_probe(sk);
> >
> > This commit, by only rescheduling the xmit timer once per ACK,
> > simplifies the code and reduces CPU overhead.
> >
> > This commit was tested in an A/B test with Google web server traffic.
> > SNMP stats and request latency metrics were within noise levels,
> > substantiating that for normal web traffic patterns this is a rare
> > issue. This commit was also tested with packetdrill tests to verify
> > that it fixes the timer behavior in the corner cases discussed in the netdev
> threads mentioned above.
> >
> > This patch is a bug fix patch intended to be queued for -stable relases.
> >
> > Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
> > Reported-by: Klavs Klavsen <kl@vsen.dk>
> > Reported-by: Mao Wenan <maowenan@huawei.com>
> > Signed-off-by: Neal Cardwell <ncardwell@google.com>
> > Signed-off-by: Yuchung Cheng <ycheng@google.com>
> > Signed-off-by: Nandita Dukkipati <nanditad@google.com>
> > ---
> >  net/ipv4/tcp_input.c  | 25 ++++++++++++++++---------
> > net/ipv4/tcp_output.c |  9 ---------
> >  2 files changed, 16 insertions(+), 18 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index
> > 345febf0a46e..3e777cfbba56 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -107,6 +107,7 @@ int sysctl_tcp_invalid_ratelimit __read_mostly =
> HZ/2;
> >  #define FLAG_ORIG_SACK_ACKED	0x200 /* Never retransmitted data are
> > (s)acked	*/
> >  #define FLAG_SND_UNA_ADVANCED	0x400 /* Snd_una was changed (!=
> > FLAG_DATA_ACKED) */
> >  #define FLAG_DSACKING_ACK	0x800 /* SACK blocks contained D-SACK info
> > */
> > +#define FLAG_SET_XMIT_TIMER	0x1000 /* Set TLP or RTO timer */
> >  #define FLAG_SACK_RENEGING	0x2000 /* snd_una advanced to a
> sacked
> > seq */
> >  #define FLAG_UPDATE_TS_RECENT	0x4000 /* tcp_replace_ts_recent() */
> >  #define FLAG_NO_CHALLENGE_ACK	0x8000 /* do not call
> > tcp_send_challenge_ack()	*/
> > @@ -3016,6 +3017,13 @@ void tcp_rearm_rto(struct sock *sk)
> >  	}
> >  }
> >
> > +/* Try to schedule a loss probe; if that doesn't work, then schedule
> > +an RTO. */ static void tcp_set_xmit_timer(struct sock *sk) {
> > +	if (!tcp_schedule_loss_probe(sk))
> > +		tcp_rearm_rto(sk);
> > +}
> > +
> >  /* If we get here, the whole TSO packet has not been acked. */
> > static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb)  { @@
> > -3177,7 +3185,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int
> prior_fackets,
> >  					ca_rtt_us, sack->rate);
> >
> >  	if (flag & FLAG_ACKED) {
> > -		tcp_rearm_rto(sk);
> > +		flag |= FLAG_SET_XMIT_TIMER;  /* set TLP or RTO timer */
> >  		if (unlikely(icsk->icsk_mtup.probe_size &&
> >  			     !after(tp->mtu_probe.probe_seq_end, tp->snd_una))) {
> >  			tcp_mtup_probe_success(sk);
> > @@ -3205,7 +3213,7 @@ static int tcp_clean_rtx_queue(struct sock *sk,
> > int prior_fackets,
> >  		 * after when the head was last (re)transmitted. Otherwise the
> >  		 * timeout may continue to extend in loss recovery.
> >  		 */
> > -		tcp_rearm_rto(sk);
> > +		flag |= FLAG_SET_XMIT_TIMER;  /* set TLP or RTO timer */
> >  	}
> >
> >  	if (icsk->icsk_ca_ops->pkts_acked) { @@ -3577,9 +3585,6 @@ static
> > int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
> >  	if (after(ack, tp->snd_nxt))
> >  		goto invalid_ack;
> >
> > -	if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
> > -		tcp_rearm_rto(sk);
> > -
> >  	if (after(ack, prior_snd_una)) {
> >  		flag |= FLAG_SND_UNA_ADVANCED;
> >  		icsk->icsk_retransmits = 0;
> > @@ -3644,18 +3649,20 @@ static int tcp_ack(struct sock *sk, const
> > struct sk_buff *skb, int flag)
> >  	flag |= tcp_clean_rtx_queue(sk, prior_fackets, prior_snd_una, &acked,
> >  				    &sack_state);
> >
> > +	if (tp->tlp_high_seq)
> > +		tcp_process_tlp_ack(sk, ack, flag);
> > +	/* If needed, reset TLP/RTO timer; RACK may later override this. */
> [Mao Wenan] I have question about RACK, if there is no RACK feature in lower
> version, who can clear this flag:FLAG_SET_XMIT_TIMER?
> 
> > +	if (flag & FLAG_SET_XMIT_TIMER)
> > +		tcp_set_xmit_timer(sk);
> > +
> >  	if (tcp_ack_is_dubious(sk, flag)) {
> >  		is_dupack = !(flag & (FLAG_SND_UNA_ADVANCED |
> FLAG_NOT_DUP));
> >  		tcp_fastretrans_alert(sk, acked, is_dupack, &flag, &rexmit);
> >  	}
> > -	if (tp->tlp_high_seq)
> > -		tcp_process_tlp_ack(sk, ack, flag);
> >
> >  	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
> >  		sk_dst_confirm(sk);
> >
> > -	if (icsk->icsk_pending == ICSK_TIME_RETRANS)
> > -		tcp_schedule_loss_probe(sk);
> >  	delivered = tp->delivered - delivered;	/* freshly ACKed or SACKed */
> >  	lost = tp->lost - lost;			/* freshly marked lost */
> >  	tcp_rate_gen(sk, delivered, lost, sack_state.rate); diff --git
> > a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index
> > 0ae6b5d176c0..c99cba897b9c 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -2380,21 +2380,12 @@ bool tcp_schedule_loss_probe(struct sock *sk)
> >  	u32 rtt = usecs_to_jiffies(tp->srtt_us >> 3);
> >  	u32 timeout, rto_delta_us;
> >
> > -	/* No consecutive loss probes. */
> > -	if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
> > -		tcp_rearm_rto(sk);
> > -		return false;
> > -	}
[Mao Wenan] I'm sorry I can't get why you delete this and below "if" branch?
> >  	/* Don't do any loss probe on a Fast Open connection before 3WHS
> >  	 * finishes.
> >  	 */
> >  	if (tp->fastopen_rsk)
> >  		return false;
> >
> > -	/* TLP is only scheduled when next timer event is RTO. */
> > -	if (icsk->icsk_pending != ICSK_TIME_RETRANS)
> > -		return false;
> > -
> >  	/* Schedule a loss probe in 2*RTT for SACK capable connections
> >  	 * in Open state, that are either limited by cwnd or application.
> >  	 */
> > --
> > 2.14.0.rc0.400.g1c36432dff-goog

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-01  2:58 ` [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed Neal Cardwell
                     ` (2 preceding siblings ...)
  2017-08-04  7:12   ` maowenan
@ 2017-08-04  7:33   ` maowenan
  2017-08-04 16:59     ` Neal Cardwell
  3 siblings, 1 reply; 22+ messages in thread
From: maowenan @ 2017-08-04  7:33 UTC (permalink / raw)
  To: maowenan, Neal Cardwell, David Miller
  Cc: netdev, Yuchung Cheng, Nandita Dukkipati



> -----Original Message-----
> From: maowenan
> Sent: Tuesday, August 01, 2017 8:20 PM
> To: 'Neal Cardwell'; David Miller
> Cc: netdev@vger.kernel.org; Yuchung Cheng; Nandita Dukkipati
> Subject: RE: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data
> ACKed/SACKed
> 
> 
> 
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org
> > [mailto:netdev-owner@vger.kernel.org]
> > On Behalf Of Neal Cardwell
> > Sent: Tuesday, August 01, 2017 10:58 AM
> > To: David Miller
> > Cc: netdev@vger.kernel.org; Neal Cardwell; Yuchung Cheng; Nandita
> > Dukkipati
> > Subject: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data
> > ACKed/SACKed
> >
> > Fix a TCP loss recovery performance bug raised recently on the netdev
> > list, in two threads:
> >
> > (i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
> > (ii) July 26, 2017: netdev thread:
> >      "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
> >      outstanding TLP retransmission"
> >
> > The basic problem is that incoming TCP packets that did not indicate
> > forward progress could cause the xmit timer (TLP or RTO) to be rearmed
> > and pushed back in time. In certain corner cases this could result in
> > the following problems noted in these threads:
> >
> >  - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
> >    could cause TCP to repeatedly schedule TLPs forever. We kept
> >    sending TLPs after every ~200ms, which elicited bogus SACKs, which
> >    caused more TLPs, ad infinitum; we never fired an RTO to fill in
> >    the holes.
> >
> >  - Incoming data segments could, in some cases, cause us to reschedule
> >    our RTO or TLP timer further out in time, for no good reason. This
> >    could cause repeated inbound data to result in stalls in outbound
> >    data, in the presence of packet loss.
> >
> > This commit fixes these bugs by changing the TLP and RTO ACK processing to:
> >
> >  (a) Only reschedule the xmit timer once per ACK.
> >
> >  (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
> >      ACK indicates sufficient forward progress (a packet was
> >      cumulatively ACKed, or we got a SACK for a packet that was sent
> >      before the most recent retransmit of the write queue head).
> >
> > This brings us back into closer compliance with the RFCs, since, as
> > the comment for tcp_rearm_rto() notes, we should only restart the RTO
> > timer after forward progress on the connection. Previously we were
> > restarting the xmit timer even in these cases where there was no forward
> progress.
> >
> > As a side benefit, this commit simplifies and speeds up the TCP timer
> > arming logic. We had been calling inet_csk_reset_xmit_timer() three
> > times on normal ACKs that cumulatively acknowledged some data:
> >
> > 1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
> >         if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
> >                tcp_rearm_rto(sk);
> >
> > 2) Once in tcp_clean_rtx_queue(), to update the RTO:
> >         if (flag & FLAG_ACKED) {
> >                tcp_rearm_rto(sk);
> >
> > 3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
> >    to TLP:
> >         if (icsk->icsk_pending == ICSK_TIME_RETRANS)
> >                tcp_schedule_loss_probe(sk);
> >
> > This commit, by only rescheduling the xmit timer once per ACK,
> > simplifies the code and reduces CPU overhead.
> >
> > This commit was tested in an A/B test with Google web server traffic.
> > SNMP stats and request latency metrics were within noise levels,
> > substantiating that for normal web traffic patterns this is a rare
> > issue. This commit was also tested with packetdrill tests to verify
> > that it fixes the timer behavior in the corner cases discussed in the netdev
> threads mentioned above.
> >
> > This patch is a bug fix patch intended to be queued for -stable relases.
> >
> > Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
> > Reported-by: Klavs Klavsen <kl@vsen.dk>
> > Reported-by: Mao Wenan <maowenan@huawei.com>
> > Signed-off-by: Neal Cardwell <ncardwell@google.com>
> > Signed-off-by: Yuchung Cheng <ycheng@google.com>
> > Signed-off-by: Nandita Dukkipati <nanditad@google.com>
> > ---
> >  net/ipv4/tcp_input.c  | 25 ++++++++++++++++---------
> > net/ipv4/tcp_output.c |  9 ---------
> >  2 files changed, 16 insertions(+), 18 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index
> > 345febf0a46e..3e777cfbba56 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -107,6 +107,7 @@ int sysctl_tcp_invalid_ratelimit __read_mostly =
> HZ/2;
> >  #define FLAG_ORIG_SACK_ACKED	0x200 /* Never retransmitted data are
> > (s)acked	*/
> >  #define FLAG_SND_UNA_ADVANCED	0x400 /* Snd_una was changed (!=
> > FLAG_DATA_ACKED) */
> >  #define FLAG_DSACKING_ACK	0x800 /* SACK blocks contained D-SACK info
> > */
> > +#define FLAG_SET_XMIT_TIMER	0x1000 /* Set TLP or RTO timer */
> >  #define FLAG_SACK_RENEGING	0x2000 /* snd_una advanced to a
> sacked
> > seq */
> >  #define FLAG_UPDATE_TS_RECENT	0x4000 /* tcp_replace_ts_recent() */
> >  #define FLAG_NO_CHALLENGE_ACK	0x8000 /* do not call
> > tcp_send_challenge_ack()	*/
> > @@ -3016,6 +3017,13 @@ void tcp_rearm_rto(struct sock *sk)
> >  	}
> >  }
> >
> > +/* Try to schedule a loss probe; if that doesn't work, then schedule
> > +an RTO. */ static void tcp_set_xmit_timer(struct sock *sk) {
> > +	if (!tcp_schedule_loss_probe(sk))
> > +		tcp_rearm_rto(sk);
> > +}
> > +
> >  /* If we get here, the whole TSO packet has not been acked. */
> > static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb)  { @@
> > -3177,7 +3185,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int
> prior_fackets,
> >  					ca_rtt_us, sack->rate);
> >
> >  	if (flag & FLAG_ACKED) {
> > -		tcp_rearm_rto(sk);
> > +		flag |= FLAG_SET_XMIT_TIMER;  /* set TLP or RTO timer */
> >  		if (unlikely(icsk->icsk_mtup.probe_size &&
> >  			     !after(tp->mtu_probe.probe_seq_end, tp->snd_una))) {
> >  			tcp_mtup_probe_success(sk);
> > @@ -3205,7 +3213,7 @@ static int tcp_clean_rtx_queue(struct sock *sk,
> > int prior_fackets,
> >  		 * after when the head was last (re)transmitted. Otherwise the
> >  		 * timeout may continue to extend in loss recovery.
> >  		 */
> > -		tcp_rearm_rto(sk);
> > +		flag |= FLAG_SET_XMIT_TIMER;  /* set TLP or RTO timer */
> >  	}
> >
> >  	if (icsk->icsk_ca_ops->pkts_acked) { @@ -3577,9 +3585,6 @@ static
> > int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
> >  	if (after(ack, tp->snd_nxt))
> >  		goto invalid_ack;
> >
> > -	if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
> > -		tcp_rearm_rto(sk);
> > -
> >  	if (after(ack, prior_snd_una)) {
> >  		flag |= FLAG_SND_UNA_ADVANCED;
> >  		icsk->icsk_retransmits = 0;
> > @@ -3644,18 +3649,20 @@ static int tcp_ack(struct sock *sk, const
> > struct sk_buff *skb, int flag)
> >  	flag |= tcp_clean_rtx_queue(sk, prior_fackets, prior_snd_una, &acked,
> >  				    &sack_state);
> >
> > +	if (tp->tlp_high_seq)
> > +		tcp_process_tlp_ack(sk, ack, flag);
> > +	/* If needed, reset TLP/RTO timer; RACK may later override this. */
> [Mao Wenan] I have question about RACK, if there is no RACK feature in lower
> version, who can clear this flag:FLAG_SET_XMIT_TIMER?
> 
> > +	if (flag & FLAG_SET_XMIT_TIMER)
> > +		tcp_set_xmit_timer(sk);
> > +
> >  	if (tcp_ack_is_dubious(sk, flag)) {
> >  		is_dupack = !(flag & (FLAG_SND_UNA_ADVANCED |
> FLAG_NOT_DUP));
> >  		tcp_fastretrans_alert(sk, acked, is_dupack, &flag, &rexmit);
> >  	}
> > -	if (tp->tlp_high_seq)
> > -		tcp_process_tlp_ack(sk, ack, flag);
> >
> >  	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
> >  		sk_dst_confirm(sk);
> >
> > -	if (icsk->icsk_pending == ICSK_TIME_RETRANS)
> > -		tcp_schedule_loss_probe(sk);
> >  	delivered = tp->delivered - delivered;	/* freshly ACKed or SACKed */
> >  	lost = tp->lost - lost;			/* freshly marked lost */
> >  	tcp_rate_gen(sk, delivered, lost, sack_state.rate); diff --git
> > a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index
> > 0ae6b5d176c0..c99cba897b9c 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -2380,21 +2380,12 @@ bool tcp_schedule_loss_probe(struct sock *sk)
> >  	u32 rtt = usecs_to_jiffies(tp->srtt_us >> 3);
> >  	u32 timeout, rto_delta_us;
> >
> > -	/* No consecutive loss probes. */
> > -	if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
> > -		tcp_rearm_rto(sk);
> > -		return false;
> > -	}
[Mao Wenan]Follow previous mail, in lower version such as 3.10, I found
there are many timer type, e.g:ICSK_TIME_EARLY_RETRANS, RTO,PTO
are used. I'm not sure there exist some unknown problem if we don't check 
isck_pending here and below if branch? And could you please post one lower
version patch such as 3.10?  Thanks a lot. 
 
#define ICSK_TIME_RETRANS	1	/* Retransmit timer */
#define ICSK_TIME_DACK		2	/* Delayed ack timer */
#define ICSK_TIME_PROBE0	3	/* Zero window probe timer */
#define ICSK_TIME_EARLY_RETRANS 4	/* Early retransmit timer */
#define ICSK_TIME_LOSS_PROBE	5	/* Tail loss probe timer */

> >  	/* Don't do any loss probe on a Fast Open connection before 3WHS
> >  	 * finishes.
> >  	 */
> >  	if (tp->fastopen_rsk)
> >  		return false;
> >
> > -	/* TLP is only scheduled when next timer event is RTO. */
> > -	if (icsk->icsk_pending != ICSK_TIME_RETRANS)
> > -		return false;
> > -
> >  	/* Schedule a loss probe in 2*RTT for SACK capable connections
> >  	 * in Open state, that are either limited by cwnd or application.
> >  	 */
> > --
> > 2.14.0.rc0.400.g1c36432dff-goog

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-04  7:12   ` maowenan
@ 2017-08-04 15:54     ` Neal Cardwell
  0 siblings, 0 replies; 22+ messages in thread
From: Neal Cardwell @ 2017-08-04 15:54 UTC (permalink / raw)
  To: maowenan; +Cc: David Miller, netdev, Yuchung Cheng, Nandita Dukkipati

On Fri, Aug 4, 2017 at 3:12 AM, maowenan <maowenan@huawei.com> wrote:
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -2380,21 +2380,12 @@ bool tcp_schedule_loss_probe(struct sock *sk)
> > >     u32 rtt = usecs_to_jiffies(tp->srtt_us >> 3);
> > >     u32 timeout, rto_delta_us;
> > >
> > > -   /* No consecutive loss probes. */
> > > -   if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
> > > -           tcp_rearm_rto(sk);
> > > -           return false;
> > > -   }
> [Mao Wenan] I'm sorry I can't get why you delete this and below "if" branch?

We deleted those two "if" branches in tcp_schedule_loss_probe()
because they were assuming that TLP probes would only be scheduled in
a context where an RTO had already been scheduled. With the old
implementation that was true: on every ACK (tcp_ack()) or send of new
data (tcp_event_new_data_sent()) we would first schedule an RTO (by
calling tcp_rearm_rto()) and then schedule a TLP (by calling
tcp_schedule_loss_probe()). So the checks were the right ones for the
old implementation.

With the new implementation, we do not first rearm the RTO on every
incoming ACK. That means when we get to tcp_schedule_loss_probe() we
may find either an RTO or TLP is pending.

Hope that helps clear that up.

cheers,
neal

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-04  7:33   ` maowenan
@ 2017-08-04 16:59     ` Neal Cardwell
  2017-08-04 17:10       ` Willy Tarreau
  0 siblings, 1 reply; 22+ messages in thread
From: Neal Cardwell @ 2017-08-04 16:59 UTC (permalink / raw)
  To: maowenan; +Cc: David Miller, netdev, Yuchung Cheng, Nandita Dukkipati

[-- Attachment #1: Type: text/plain, Size: 1366 bytes --]

On Fri, Aug 4, 2017 at 3:33 AM, maowenan <maowenan@huawei.com> wrote:
> [Mao Wenan]Follow previous mail, in lower version such as 3.10, I found
> there are many timer type, e.g:ICSK_TIME_EARLY_RETRANS, RTO,PTO
> are used. I'm not sure there exist some unknown problem if we don't check
> isck_pending here and below if branch? And could you please post one lower
> version patch such as 3.10?  Thanks a lot.
>
> #define ICSK_TIME_RETRANS       1       /* Retransmit timer */
> #define ICSK_TIME_DACK          2       /* Delayed ack timer */
> #define ICSK_TIME_PROBE0        3       /* Zero window probe timer */
> #define ICSK_TIME_EARLY_RETRANS 4       /* Early retransmit timer */
> #define ICSK_TIME_LOSS_PROBE    5       /* Tail loss probe timer */

I think you'll find that it you audit how these patches interact with
those other timer types you'll find that the behavior either matches
the existing code, or is improved. We ran these patches through our
packetdrill test suite and found that was the case for all our
existing tests. But please let us know if you find a specific scenario
where that is not the case.

I have attached patches for this fix rebased on to v3.10.107, the
latest stable release for 3.10. That's pretty far back in history, so
there were substantial conflict resolutions and adjustments required.
:-) Hope that helps.

thanks,
neal

[-- Attachment #2: 0001-tcp-introduce-tcp_rto_delta_us-helper-for-xmit-timer.patch --]
[-- Type: application/octet-stream, Size: 2261 bytes --]

From 6e5001de12d20fee1696fe50df51b05438278f4c Mon Sep 17 00:00:00 2001
From: Neal Cardwell <ncardwell@google.com>
Date: Thu, 27 Jul 2017 10:01:20 -0400
Subject: [PATCH 1/3] tcp: introduce tcp_rto_delta_us() helper for xmit timer
 fix

Pure refactor. This helper will be required in the xmit timer fix
later in the patch series. (Because the TLP logic will want to make
this calculation.)

[This version of the commit was compiled and briefly tested
based on top of v3.10.107.]

Change-Id: I1ccfba0b00465454bf5ce22e6fef5f7b5dd94d15
Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h    | 10 ++++++++++
 net/ipv4/tcp_input.c |  4 +---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 79cd118d5994..c4db9acefa9c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1592,4 +1592,14 @@ struct tcp_request_sock_ops {
 extern void tcp_v4_init(void);
 extern void tcp_init(void);
 
+/* At how many jiffies into the future should the RTO fire? */
+static inline s32 tcp_rto_delta(const struct sock *sk)
+{
+	const struct sk_buff *skb = tcp_write_queue_head(sk);
+	const u32 rto = inet_csk(sk)->icsk_rto;
+	const u32 rto_time_stamp = TCP_SKB_CB(skb)->when + rto;
+
+	return (s32)(rto_time_stamp - tcp_time_stamp);
+}
+
 #endif	/* _TCP_H */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0680058fe693..a36b7c55b76b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2972,9 +2972,7 @@ void tcp_rearm_rto(struct sock *sk)
 		/* Offset the time elapsed after installing regular RTO */
 		if (icsk->icsk_pending == ICSK_TIME_EARLY_RETRANS ||
 		    icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
-			struct sk_buff *skb = tcp_write_queue_head(sk);
-			const u32 rto_time_stamp = TCP_SKB_CB(skb)->when + rto;
-			s32 delta = (s32)(rto_time_stamp - tcp_time_stamp);
+			s32 delta = tcp_rto_delta(sk);
 			/* delta may not be positive if the socket is locked
 			 * when the retrans timer fires and is rescheduled.
 			 */
-- 
2.14.0.rc1.383.gd1ce394fe2-goog


[-- Attachment #3: 0002-tcp-enable-xmit-timer-fix-by-having-TLP-use-time-whe.patch --]
[-- Type: application/octet-stream, Size: 3106 bytes --]

From 50c4e5e3bd4a437a3fbc237e0c248f758aa2ab85 Mon Sep 17 00:00:00 2001
From: Neal Cardwell <ncardwell@google.com>
Date: Thu, 27 Jul 2017 10:09:30 -0400
Subject: [PATCH 2/3] tcp: enable xmit timer fix by having TLP use time when
 RTO should fire

Have tcp_schedule_loss_probe() base the TLP scheduling decision based
on when the RTO *should* fire. This is to enable the upcoming xmit
timer fix in this series, where tcp_schedule_loss_probe() cannot
assume that the last timer installed was an RTO timer (because we are
no longer doing the "rearm RTO, rearm RTO, rearm TLP" dance on every
ACK). So tcp_schedule_loss_probe() must independently figure out when
an RTO would want to fire.

In the new TLP implementation following in this series, we cannot
assume that icsk_timeout was set based on an RTO; after processing a
cumulative ACK the icsk_timeout we see can be from a previous TLP or
RTO. So we need to independently recalculate the RTO time (instead of
reading it out of icsk_timeout). Removing this dependency on the
nature of icsk_timeout makes things a little easier to reason about
anyway.

Note that the old and new code should be equivalent, since they are
both saying: "if the RTO is in the future, but at an earlier time than
the normal TLP time, then set the TLP timer to fire when the RTO would
have fired".

[This version of the commit was compiled and briefly tested
based on top of v3.10.107.]

Change-Id: I597ad6446edde15bf2cea8e56d603a2c52f8221b
Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_output.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8729a934124f..135440283f21 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1947,8 +1947,8 @@ bool tcp_schedule_loss_probe(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
-	u32 timeout, tlp_time_stamp, rto_time_stamp;
 	u32 rtt = tp->srtt >> 3;
+	u32 timeout, rto_delta;
 
 	if (WARN_ON(icsk->icsk_pending == ICSK_TIME_EARLY_RETRANS))
 		return false;
@@ -1987,14 +1987,10 @@ bool tcp_schedule_loss_probe(struct sock *sk)
 				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
 	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
 
-	/* If RTO is shorter, just schedule TLP in its place. */
-	tlp_time_stamp = tcp_time_stamp + timeout;
-	rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout;
-	if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) {
-		s32 delta = rto_time_stamp - tcp_time_stamp;
-		if (delta > 0)
-			timeout = delta;
-	}
+	/* If the RTO formula yields an earlier time, then use that time. */
+	rto_delta = tcp_rto_delta(sk);  /* How far in future is RTO? */
+	if (rto_delta > 0)
+		timeout = min_t(u32, timeout, rto_delta);
 
 	inet_csk_reset_xmit_timer(sk, ICSK_TIME_LOSS_PROBE, timeout,
 				  TCP_RTO_MAX);
-- 
2.14.0.rc1.383.gd1ce394fe2-goog


[-- Attachment #4: 0003-tcp-fix-xmit-timer-to-only-be-reset-if-data-ACKed-SA.patch --]
[-- Type: application/octet-stream, Size: 7689 bytes --]

From c920720b6bd7fa0cc855b16799b9ea11ef95db76 Mon Sep 17 00:00:00 2001
From: Neal Cardwell <ncardwell@google.com>
Date: Wed, 26 Jul 2017 21:43:27 -0400
Subject: [PATCH 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed

Fix a TCP loss recovery performance bug raised recently on the netdev
list, in two threads:

(i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
(ii) July 26, 2017: netdev thread:
     "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
     outstanding TLP retransmission"

The basic problem is that incoming TCP packets that did not indicate
forward progress could cause the xmit timer (TLP or RTO) to be rearmed
and pushed back in time. In certain corner cases this could result in
the following problems noted in these threads:

 - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
   could cause TCP to repeatedly schedule TLPs forever. We kept
   sending TLPs after every ~200ms, which elicited bogus SACKs, which
   caused more TLPs, ad infinitum; we never fired an RTO to fill in
   the holes.

 - Incoming data segments could, in some cases, cause us to reschedule
   our RTO or TLP timer further out in time, for no good reason. This
   could cause repeated inbound data to result in stalls in outbound
   data, in the presence of packet loss.

This commit fixes these bugs by changing the TLP and RTO ACK
processing to:

 (a) Only reschedule the xmit timer once per ACK.

 (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
     ACK indicates sufficient forward progress (a packet was
     cumulatively ACKed, or we got a SACK for a packet that was sent
     before the most recent retransmit of the write queue head).

This brings us back into closer compliance with the RFCs, since, as
the comment for tcp_rearm_rto() notes, we should only restart the RTO
timer after forward progress on the connection. Previously we were
restarting the xmit timer even in these cases where there was no
forward progress.

As a side benefit, this commit simplifies and speeds up the TCP timer
arming logic. We had been calling inet_csk_reset_xmit_timer() three
times on normal ACKs that cumulatively acknowledged some data:

1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
        if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
               tcp_rearm_rto(sk);

2) Once in tcp_clean_rtx_queue(), to update the RTO:
        if (flag & FLAG_ACKED) {
               tcp_rearm_rto(sk);

3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
   to TLP:
        if (icsk->icsk_pending == ICSK_TIME_RETRANS)
               tcp_schedule_loss_probe(sk);

This commit, by only rescheduling the xmit timer once per ACK,
simplifies the code and reduces CPU overhead.

This commit was tested in an A/B test with Google web server
traffic. SNMP stats and request latency metrics were within noise
levels, substantiating that for normal web traffic patterns this is a
rare issue. This commit was also tested with packetdrill tests to
verify that it fixes the timer behavior in the corner cases discussed
in the netdev threads mentioned above.

This patch is a bug fix patch intended to be queued for -stable
relases.

[This version of the commit was compiled and briefly tested
based on top of v3.10.107.]

Change-Id: If0417380fd59290b65cf04a415373aa13dd1dad7
Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Reported-by: Klavs Klavsen <kl@vsen.dk>
Reported-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c  | 25 +++++++++++++++----------
 net/ipv4/tcp_output.c | 12 ------------
 2 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a36b7c55b76b..70f217cb3a1b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -111,6 +111,7 @@ int sysctl_tcp_early_retrans __read_mostly = 3;
 #define FLAG_ORIG_SACK_ACKED	0x200 /* Never retransmitted data are (s)acked	*/
 #define FLAG_SND_UNA_ADVANCED	0x400 /* Snd_una was changed (!= FLAG_DATA_ACKED) */
 #define FLAG_DSACKING_ACK	0x800 /* SACK blocks contained D-SACK info */
+#define FLAG_SET_XMIT_TIMER	0x1000 /* Set TLP or RTO timer */
 #define FLAG_SACK_RENEGING	0x2000 /* snd_una advanced to a sacked seq */
 #define FLAG_UPDATE_TS_RECENT	0x4000 /* tcp_replace_ts_recent() */
 
@@ -3002,6 +3003,13 @@ void tcp_resume_early_retransmit(struct sock *sk)
 	tcp_xmit_retransmit_queue(sk);
 }
 
+/* Try to schedule a loss probe; if that doesn't work, then schedule an RTO. */
+static void tcp_set_xmit_timer(struct sock *sk)
+{
+	if (!tcp_schedule_loss_probe(sk))
+		tcp_rearm_rto(sk);
+}
+
 /* If we get here, the whole TSO packet has not been acked. */
 static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb)
 {
@@ -3132,7 +3140,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 		}
 
 		tcp_ack_update_rtt(sk, flag, seq_rtt);
-		tcp_rearm_rto(sk);
+		flag |= FLAG_SET_XMIT_TIMER;  /* set TLP or RTO timer */
 
 		if (tcp_is_reno(tp)) {
 			tcp_remove_reno_sacks(sk, pkts_acked);
@@ -3392,10 +3400,6 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	if (after(ack, tp->snd_nxt))
 		goto invalid_ack;
 
-	if (icsk->icsk_pending == ICSK_TIME_EARLY_RETRANS ||
-	    icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
-		tcp_rearm_rto(sk);
-
 	if (after(ack, prior_snd_una))
 		flag |= FLAG_SND_UNA_ADVANCED;
 
@@ -3452,6 +3456,12 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 
 	pkts_acked = previous_packets_out - tp->packets_out;
 
+	if (tp->tlp_high_seq)
+		tcp_process_tlp_ack(sk, ack, flag);
+	/* If needed, reset TLP/RTO timer; RACK may later override this. */
+	if (flag & FLAG_SET_XMIT_TIMER)
+		tcp_set_xmit_timer(sk);
+
 	if (tcp_ack_is_dubious(sk, flag)) {
 		/* Advance CWND, if state allows this. */
 		if ((flag & FLAG_DATA_ACKED) && tcp_may_raise_cwnd(sk, flag))
@@ -3464,17 +3474,12 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 			tcp_cong_avoid(sk, ack, prior_in_flight);
 	}
 
-	if (tp->tlp_high_seq)
-		tcp_process_tlp_ack(sk, ack, flag);
-
 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
 		struct dst_entry *dst = __sk_dst_get(sk);
 		if (dst)
 			dst_confirm(dst);
 	}
 
-	if (icsk->icsk_pending == ICSK_TIME_RETRANS)
-		tcp_schedule_loss_probe(sk);
 	if (tp->srtt != prior_rtt || tp->snd_cwnd != prior_cwnd)
 		tcp_update_pacing_rate(sk);
 	return 1;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 135440283f21..f5d670ccd403 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1945,28 +1945,16 @@ repair:
 
 bool tcp_schedule_loss_probe(struct sock *sk)
 {
-	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	u32 rtt = tp->srtt >> 3;
 	u32 timeout, rto_delta;
 
-	if (WARN_ON(icsk->icsk_pending == ICSK_TIME_EARLY_RETRANS))
-		return false;
-	/* No consecutive loss probes. */
-	if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
-		tcp_rearm_rto(sk);
-		return false;
-	}
 	/* Don't do any loss probe on a Fast Open connection before 3WHS
 	 * finishes.
 	 */
 	if (sk->sk_state == TCP_SYN_RECV)
 		return false;
 
-	/* TLP is only scheduled when next timer event is RTO. */
-	if (icsk->icsk_pending != ICSK_TIME_RETRANS)
-		return false;
-
 	/* Schedule a loss probe in 2*RTT for SACK capable connections
 	 * in Open state, that are either limited by cwnd or application.
 	 */
-- 
2.14.0.rc1.383.gd1ce394fe2-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-04 16:59     ` Neal Cardwell
@ 2017-08-04 17:10       ` Willy Tarreau
  2017-08-04 18:01         ` Neal Cardwell
  0 siblings, 1 reply; 22+ messages in thread
From: Willy Tarreau @ 2017-08-04 17:10 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: maowenan, David Miller, netdev, Yuchung Cheng, Nandita Dukkipati

Hi Neal,

On Fri, Aug 04, 2017 at 12:59:51PM -0400, Neal Cardwell wrote:
> I have attached patches for this fix rebased on to v3.10.107, the
> latest stable release for 3.10. That's pretty far back in history, so
> there were substantial conflict resolutions and adjustments required.
> :-) Hope that helps.

At least it will help me :-)

Do you suggest that I queue them for 3.10.108, that I wait for Maowenan
to test them more broadly first or anything else ?

I'm fine with any option.
Thanks!
Willy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-04 17:10       ` Willy Tarreau
@ 2017-08-04 18:01         ` Neal Cardwell
  2017-08-04 18:18           ` Willy Tarreau
  0 siblings, 1 reply; 22+ messages in thread
From: Neal Cardwell @ 2017-08-04 18:01 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: maowenan, David Miller, netdev, Yuchung Cheng, Nandita Dukkipati,
	Eric Dumazet

On Fri, Aug 4, 2017 at 1:10 PM, Willy Tarreau <w@1wt.eu> wrote:
> Hi Neal,
>
> On Fri, Aug 04, 2017 at 12:59:51PM -0400, Neal Cardwell wrote:
>> I have attached patches for this fix rebased on to v3.10.107, the
>> latest stable release for 3.10. That's pretty far back in history, so
>> there were substantial conflict resolutions and adjustments required.
>> :-) Hope that helps.
>
> At least it will help me :-)
>
> Do you suggest that I queue them for 3.10.108, that I wait for Maowenan
> to test them more broadly first or anything else ?

Let's wait for Maowenan to test them first.

Thanks!

neal

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-04 18:01         ` Neal Cardwell
@ 2017-08-04 18:18           ` Willy Tarreau
  2017-08-06  7:39             ` maowenan
  0 siblings, 1 reply; 22+ messages in thread
From: Willy Tarreau @ 2017-08-04 18:18 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: maowenan, David Miller, netdev, Yuchung Cheng, Nandita Dukkipati,
	Eric Dumazet

On Fri, Aug 04, 2017 at 02:01:34PM -0400, Neal Cardwell wrote:
> On Fri, Aug 4, 2017 at 1:10 PM, Willy Tarreau <w@1wt.eu> wrote:
> > Hi Neal,
> >
> > On Fri, Aug 04, 2017 at 12:59:51PM -0400, Neal Cardwell wrote:
> >> I have attached patches for this fix rebased on to v3.10.107, the
> >> latest stable release for 3.10. That's pretty far back in history, so
> >> there were substantial conflict resolutions and adjustments required.
> >> :-) Hope that helps.
> >
> > At least it will help me :-)
> >
> > Do you suggest that I queue them for 3.10.108, that I wait for Maowenan
> > to test them more broadly first or anything else ?
> 
> Let's wait for Maowenan to test them first.

Fine, thanks.
Willy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-04 18:18           ` Willy Tarreau
@ 2017-08-06  7:39             ` maowenan
  2017-08-06  7:44               ` Willy Tarreau
  0 siblings, 1 reply; 22+ messages in thread
From: maowenan @ 2017-08-06  7:39 UTC (permalink / raw)
  To: Willy Tarreau, Neal Cardwell
  Cc: David Miller, netdev, Yuchung Cheng, Nandita Dukkipati, Eric Dumazet



> -----Original Message-----
> From: Willy Tarreau [mailto:w@1wt.eu]
> Sent: Saturday, August 05, 2017 2:19 AM
> To: Neal Cardwell
> Cc: maowenan; David Miller; netdev@vger.kernel.org; Yuchung Cheng; Nandita
> Dukkipati; Eric Dumazet
> Subject: Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data
> ACKed/SACKed
> 
> On Fri, Aug 04, 2017 at 02:01:34PM -0400, Neal Cardwell wrote:
> > On Fri, Aug 4, 2017 at 1:10 PM, Willy Tarreau <w@1wt.eu> wrote:
> > > Hi Neal,
> > >
> > > On Fri, Aug 04, 2017 at 12:59:51PM -0400, Neal Cardwell wrote:
> > >> I have attached patches for this fix rebased on to v3.10.107, the
> > >> latest stable release for 3.10. That's pretty far back in history,
> > >> so there were substantial conflict resolutions and adjustments required.
> > >> :-) Hope that helps.
> > >
> > > At least it will help me :-)
> > >
> > > Do you suggest that I queue them for 3.10.108, that I wait for
> > > Maowenan to test them more broadly first or anything else ?
> >
> > Let's wait for Maowenan to test them first.
[Mao Wenan]It works well with these patches of v3.10, and the 
retransmission packet is about 250ms after TLP probe.
Thank you very much for these patches under 3.10.
> 
> Fine, thanks.
> Willy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-06  7:39             ` maowenan
@ 2017-08-06  7:44               ` Willy Tarreau
  2017-08-06 15:21                 ` Neal Cardwell
  2017-08-06 15:24                 ` Neal Cardwell
  0 siblings, 2 replies; 22+ messages in thread
From: Willy Tarreau @ 2017-08-06  7:44 UTC (permalink / raw)
  To: maowenan
  Cc: Neal Cardwell, David Miller, netdev, Yuchung Cheng,
	Nandita Dukkipati, Eric Dumazet

On Sun, Aug 06, 2017 at 07:39:57AM +0000, maowenan wrote:
> 
> 
> > -----Original Message-----
> > From: Willy Tarreau [mailto:w@1wt.eu]
> > Sent: Saturday, August 05, 2017 2:19 AM
> > To: Neal Cardwell
> > Cc: maowenan; David Miller; netdev@vger.kernel.org; Yuchung Cheng; Nandita
> > Dukkipati; Eric Dumazet
> > Subject: Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data
> > ACKed/SACKed
> > 
> > On Fri, Aug 04, 2017 at 02:01:34PM -0400, Neal Cardwell wrote:
> > > On Fri, Aug 4, 2017 at 1:10 PM, Willy Tarreau <w@1wt.eu> wrote:
> > > > Hi Neal,
> > > >
> > > > On Fri, Aug 04, 2017 at 12:59:51PM -0400, Neal Cardwell wrote:
> > > >> I have attached patches for this fix rebased on to v3.10.107, the
> > > >> latest stable release for 3.10. That's pretty far back in history,
> > > >> so there were substantial conflict resolutions and adjustments required.
> > > >> :-) Hope that helps.
> > > >
> > > > At least it will help me :-)
> > > >
> > > > Do you suggest that I queue them for 3.10.108, that I wait for
> > > > Maowenan to test them more broadly first or anything else ?
> > >
> > > Let's wait for Maowenan to test them first.
> [Mao Wenan]It works well with these patches of v3.10, and the 
> retransmission packet is about 250ms after TLP probe.
> Thank you very much for these patches under 3.10.

Thanks, I'll take them then.

Willy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-06  7:44               ` Willy Tarreau
@ 2017-08-06 15:21                 ` Neal Cardwell
  2017-08-06 15:24                 ` Neal Cardwell
  1 sibling, 0 replies; 22+ messages in thread
From: Neal Cardwell @ 2017-08-06 15:21 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: maowenan, David Miller, netdev, Yuchung Cheng, Nandita Dukkipati,
	Eric Dumazet

On Sun, Aug 6, 2017 at 3:44 AM, Willy Tarreau <w@1wt.eu> wrote:
>
> On Sun, Aug 06, 2017 at 07:39:57AM +0000, maowenan wrote:
> >
> > [Mao Wenan]It works well with these patches of v3.10, and the
> > retransmission packet is about 250ms after TLP probe.
> > Thank you very much for these patches under 3.10.

Great. Thanks for testing those patches!

> Thanks, I'll take them then.

And thanks for applying the patches to 3.10.y!

neal

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed
  2017-08-06  7:44               ` Willy Tarreau
  2017-08-06 15:21                 ` Neal Cardwell
@ 2017-08-06 15:24                 ` Neal Cardwell
  1 sibling, 0 replies; 22+ messages in thread
From: Neal Cardwell @ 2017-08-06 15:24 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: maowenan, David Miller, netdev, Yuchung Cheng, Nandita Dukkipati,
	Eric Dumazet

On Sun, Aug 6, 2017 at 3:44 AM, Willy Tarreau <w@1wt.eu> wrote:
>
> On Sun, Aug 06, 2017 at 07:39:57AM +0000, maowenan wrote:
> >
> > [Mao Wenan]It works well with these patches of v3.10, and the
> > retransmission packet is about 250ms after TLP probe.
> > Thank you very much for these patches under 3.10.

Great. Thanks for testing those patches!

> Thanks, I'll take them then.

And thanks for applying the patches to 3.10.y!

neal

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-08-06 15:24 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-01  2:58 [PATCH net 0/3] tcp: fix xmit timer rearming to avoid stalls Neal Cardwell
2017-08-01  2:58 ` [PATCH net 1/3] tcp: introduce tcp_rto_delta_us() helper for xmit timer fix Neal Cardwell
2017-08-01  7:16   ` Eric Dumazet
2017-08-01  2:58 ` [PATCH net 2/3] tcp: enable xmit timer fix by having TLP use time when RTO should fire Neal Cardwell
2017-08-01  7:22   ` Eric Dumazet
2017-08-01 14:35     ` Neal Cardwell
2017-08-01 20:40       ` Eric Dumazet
2017-08-01  2:58 ` [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed Neal Cardwell
2017-08-01  7:25   ` Eric Dumazet
2017-08-01 12:20   ` maowenan
2017-08-01 14:24     ` Neal Cardwell
2017-08-04  7:12   ` maowenan
2017-08-04 15:54     ` Neal Cardwell
2017-08-04  7:33   ` maowenan
2017-08-04 16:59     ` Neal Cardwell
2017-08-04 17:10       ` Willy Tarreau
2017-08-04 18:01         ` Neal Cardwell
2017-08-04 18:18           ` Willy Tarreau
2017-08-06  7:39             ` maowenan
2017-08-06  7:44               ` Willy Tarreau
2017-08-06 15:21                 ` Neal Cardwell
2017-08-06 15:24                 ` Neal Cardwell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.