* [PATCH net-next 0/8] tcp: default RACK loss recovery
@ 2018-05-16 23:40 Yuchung Cheng
  2018-05-16 23:40 ` [PATCH net-next 1/8] tcp: support DUPACK threshold in RACK Yuchung Cheng
                   ` (8 more replies)
  0 siblings, 9 replies; 11+ messages in thread
From: Yuchung Cheng @ 2018-05-16 23:40 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, soheil, priyarjha, Yuchung Cheng

This patch set implements the features corresponding to the
draft-ietf-tcpm-rack-03 version of the RACK draft.
https://datatracker.ietf.org/meeting/101/materials/slides-101-tcpm-update-on-tcp-rack-00

1. SACK: implement equivalent DUPACK threshold heuristic in RACK to
   replace existing RFC6675 recovery (tcp_mark_head_lost).

2. Non-SACK: simplify RFC6582 NewReno implementation

3. RTO: apply RACK's time-based approach to avoid spuriously
   marking very recently sent packets lost.

4. With (1)(2)(3), make RACK the exclusive fast recovery mechanism to
   mark losses based on time upon S/ACK. Tail loss probe and F-RTO remain
   enabled by default as complementary mechanisms to send probes in
   CA_Open and CA_Loss states. The probes would solicit S/ACKs to trigger
   RACK time-based loss detection.

All Google web and internal servers have been running RACK-only mode
(4) for a while now. A/B experiments indicate RACK/TLP on average
reduces recovery latency by 10% compared to RFC6675. RFC6675
is default-off now but can be enabled by disabling RACK (sysctl
net.ipv4.tcp_recovery=0) in case of unforeseen issues.

Yuchung Cheng (8):
  tcp: support DUPACK threshold in RACK
  tcp: disable RFC6675 loss detection
  tcp: simpler NewReno implementation
  tcp: account lost retransmit after timeout
  tcp: new helper tcp_timeout_mark_lost
  tcp: separate loss marking and state update on RTO
  tcp: new helper tcp_rack_skb_timeout
  tcp: don't mark recently sent packets lost on RTO

 Documentation/networking/ip-sysctl.txt |  4 +-
 include/net/tcp.h                      |  5 ++
 net/ipv4/tcp_input.c                   | 99 ++++++++++++++------------
 net/ipv4/tcp_recovery.c                | 80 ++++++++++++++++-----
 4 files changed, 124 insertions(+), 64 deletions(-)

-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net-next 1/8] tcp: support DUPACK threshold in RACK
  2018-05-16 23:40 [PATCH net-next 0/8] tcp: default RACK loss recovery Yuchung Cheng
@ 2018-05-16 23:40 ` Yuchung Cheng
  2018-05-16 23:40 ` [PATCH net-next 2/8] tcp: disable RFC6675 loss detection Yuchung Cheng
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Yuchung Cheng @ 2018-05-16 23:40 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, soheil, priyarjha, Yuchung Cheng

This patch adds support for the classic DUPACK threshold rule
(#DupThresh) in RACK.

When the number of packets SACKed is greater than or equal to the
threshold, RACK sets the reordering window to zero, which would
immediately mark all the unsacked packets below the highest SACKed
sequence as lost. Since this approach is known not to work well with
reordering, RACK only uses it if no reordering has been observed.

The DUPACK threshold rule is a particularly useful extension to the
fast recoveries triggered by the RACK reordering timer, for example on
data-center transfers where the RTT is much smaller than a timer
tick, or on high-RTT paths where the default RTT/4 may take too long.

Note that this patch differs slightly from RFC6675. RFC6675
considers a packet lost when at least #DupThresh higher-sequence
packets are SACKed.

With RACK, for connections that have seen reordering, RACK
continues to use a dynamically-adaptive time-based reordering
window to detect losses. But for connections on which we have not
yet seen reordering, this patch considers a packet lost when at
least one higher-sequence packet is SACKed and the total number
of SACKed packets is at least DupThresh. For example, suppose a
connection has not seen reordering, and sends 10 packets, and
packets 3, 5, 7 are SACKed. RFC6675 considers packets 1 and 2
lost. RACK considers packets 1, 2, 4, 6 lost.
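
As a purely illustrative, self-contained sketch (not the kernel code;
the dupthresh_lost() helper, the boolean array, and dupthresh == 3 are
assumptions made only for this example), the rule applied to the
scenario above can be expressed as:

#include <stdbool.h>
#include <stdio.h>

/* Illustrative only: a packet is considered lost under the RACK
 * DUPACK-threshold rule if it is not SACKed, at least one
 * higher-sequence packet is SACKed, and the total number of SACKed
 * packets has reached dupthresh.
 */
static bool dupthresh_lost(int pkt, const bool *sacked, int n, int dupthresh)
{
	int total = 0, higher = 0;

	for (int i = 1; i <= n; i++)
		total += sacked[i];
	for (int i = pkt + 1; i <= n; i++)
		higher += sacked[i];
	return !sacked[pkt] && higher >= 1 && total >= dupthresh;
}

int main(void)
{
	bool sacked[11] = { false };

	sacked[3] = sacked[5] = sacked[7] = true;	/* packets 3, 5, 7 SACKed */
	for (int pkt = 1; pkt <= 10; pkt++)
		if (dupthresh_lost(pkt, sacked, 10, 3))
			printf("packet %d lost\n", pkt);	/* prints 1, 2, 4 and 6 */
	return 0;
}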

There is some small risk of spurious retransmits here due to
reordering. However, this is mostly limited to the first flight of
a connection on which the sender receives SACKs from reordering.
And RFC 6675 and FACK loss detection have a similar risk on the
first flight with reordering (it's just that the risk of spurious
retransmits from reordering was slightly narrower for those older
algorithms due to the margin of 3*MSS).

Also, the minimum reordering window is reduced from 1 msec to 0
to recover more quickly on short-RTT transfers. Therefore RACK is more
aggressive in marking packets lost during recovery, to reduce
reordering window timeouts.
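
For a concrete (hypothetical) sense of the adaptive window computed by
the new tcp_rack_reo_wnd() helper below: with min_rtt = 4 ms,
reo_wnd_steps = 1, and a smoothed RTT of 20 ms, the window is
min(4/4 * 1, 20) = 1 ms; if DSACKs later raise reo_wnd_steps to 3, it
grows to min(1 * 3, 20) = 3 ms, still capped by the smoothed RTT.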

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
---
 Documentation/networking/ip-sysctl.txt |  1 +
 include/net/tcp.h                      |  1 +
 net/ipv4/tcp_recovery.c                | 40 +++++++++++++++++---------
 3 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 59afc9a10b4f..13bbac50dc8b 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -451,6 +451,7 @@ tcp_recovery - INTEGER
 	RACK: 0x1 enables the RACK loss detection for fast detection of lost
 	      retransmissions and tail drops.
 	RACK: 0x2 makes RACK's reordering window static (min_rtt/4).
+	RACK: 0x4 disables RACK's DUPACK threshold heuristic
 
 	Default: 0x1
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3b1d617b0110..85000c85ddcd 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -245,6 +245,7 @@ extern long sysctl_tcp_mem[3];
 
 #define TCP_RACK_LOSS_DETECTION  0x1 /* Use RACK to detect losses */
 #define TCP_RACK_STATIC_REO_WND  0x2 /* Use static RACK reo wnd */
+#define TCP_RACK_NO_DUPTHRESH    0x4 /* Do not use DUPACK threshold in RACK */
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index 3a81720ac0c4..1c1bdf12a96f 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -21,6 +21,32 @@ static bool tcp_rack_sent_after(u64 t1, u64 t2, u32 seq1, u32 seq2)
 	return t1 > t2 || (t1 == t2 && after(seq1, seq2));
 }
 
+u32 tcp_rack_reo_wnd(const struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (!tp->rack.reord) {
+		/* If reordering has not been observed, be aggressive during
+		 * the recovery or starting the recovery by DUPACK threshold.
+		 */
+		if (inet_csk(sk)->icsk_ca_state >= TCP_CA_Recovery)
+			return 0;
+
+		if (tp->sacked_out >= tp->reordering &&
+		    !(sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_NO_DUPTHRESH))
+			return 0;
+	}
+
+	/* To be more reordering resilient, allow min_rtt/4 settling delay.
+	 * Use min_rtt instead of the smoothed RTT because reordering is
+	 * often a path property and less related to queuing or delayed ACKs.
+	 * Upon receiving DSACKs, linearly increase the window up to the
+	 * smoothed RTT.
+	 */
+	return min((tcp_min_rtt(tp) >> 2) * tp->rack.reo_wnd_steps,
+		   tp->srtt_us >> 3);
+}
+
 /* RACK loss detection (IETF draft draft-ietf-tcpm-rack-01):
  *
  * Marks a packet lost, if some packet sent later has been (s)acked.
@@ -44,23 +70,11 @@ static bool tcp_rack_sent_after(u64 t1, u64 t2, u32 seq1, u32 seq2)
 static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	u32 min_rtt = tcp_min_rtt(tp);
 	struct sk_buff *skb, *n;
 	u32 reo_wnd;
 
 	*reo_timeout = 0;
-	/* To be more reordering resilient, allow min_rtt/4 settling delay
-	 * (lower-bounded to 1000uS). We use min_rtt instead of the smoothed
-	 * RTT because reordering is often a path property and less related
-	 * to queuing or delayed ACKs.
-	 */
-	reo_wnd = 1000;
-	if ((tp->rack.reord || inet_csk(sk)->icsk_ca_state < TCP_CA_Recovery) &&
-	    min_rtt != ~0U) {
-		reo_wnd = max((min_rtt >> 2) * tp->rack.reo_wnd_steps, reo_wnd);
-		reo_wnd = min(reo_wnd, tp->srtt_us >> 3);
-	}
-
+	reo_wnd = tcp_rack_reo_wnd(sk);
 	list_for_each_entry_safe(skb, n, &tp->tsorted_sent_queue,
 				 tcp_tsorted_anchor) {
 		struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next 2/8] tcp: disable RFC6675 loss detection
  2018-05-16 23:40 [PATCH net-next 0/8] tcp: default RACK loss recovery Yuchung Cheng
  2018-05-16 23:40 ` [PATCH net-next 1/8] tcp: support DUPACK threshold in RACK Yuchung Cheng
@ 2018-05-16 23:40 ` Yuchung Cheng
  2018-05-16 23:40 ` [PATCH net-next 3/8] tcp: simpler NewReno implementation Yuchung Cheng
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Yuchung Cheng @ 2018-05-16 23:40 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, soheil, priyarjha, Yuchung Cheng

This patch disables RFC6675 loss detection and makes the sysctl
net.ipv4.tcp_recovery control a binary choice between RACK
(1) and RFC6675 (0).

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
---
 Documentation/networking/ip-sysctl.txt |  3 ++-
 net/ipv4/tcp_input.c                   | 12 ++++++++----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 13bbac50dc8b..ea304a23c8d7 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -449,7 +449,8 @@ tcp_recovery - INTEGER
 	features.
 
 	RACK: 0x1 enables the RACK loss detection for fast detection of lost
-	      retransmissions and tail drops.
+	      retransmissions and tail drops. It also subsumes and disables
+	      RFC6675 recovery for SACK connections.
 	RACK: 0x2 makes RACK's reordering window static (min_rtt/4).
 	RACK: 0x4 disables RACK's DUPACK threshold heuristic
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b188e0d75edd..ccbe04f80040 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2035,6 +2035,11 @@ static inline int tcp_dupack_heuristics(const struct tcp_sock *tp)
 	return tp->sacked_out + 1;
 }
 
+static bool tcp_is_rack(const struct sock *sk)
+{
+	return sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION;
+}
+
 /* Linux NewReno/SACK/ECN state machine.
  * --------------------------------------
  *
@@ -2141,7 +2146,7 @@ static bool tcp_time_to_recover(struct sock *sk, int flag)
 		return true;
 
 	/* Not-A-Trick#2 : Classic rule... */
-	if (tcp_dupack_heuristics(tp) > tp->reordering)
+	if (!tcp_is_rack(sk) && tcp_dupack_heuristics(tp) > tp->reordering)
 		return true;
 
 	return false;
@@ -2722,8 +2727,7 @@ static void tcp_rack_identify_loss(struct sock *sk, int *ack_flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	/* Use RACK to detect loss */
-	if (sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION) {
+	if (tcp_is_rack(sk)) {
 		u32 prior_retrans = tp->retrans_out;
 
 		tcp_rack_mark_lost(sk);
@@ -2862,7 +2866,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
 		fast_rexmit = 1;
 	}
 
-	if (do_lost)
+	if (!tcp_is_rack(sk) && do_lost)
 		tcp_update_scoreboard(sk, fast_rexmit);
 	*rexmit = REXMIT_LOST;
 }
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next 3/8] tcp: simpler NewReno implementation
  2018-05-16 23:40 [PATCH net-next 0/8] tcp: default RACK loss recovery Yuchung Cheng
  2018-05-16 23:40 ` [PATCH net-next 1/8] tcp: support DUPACK threshold in RACK Yuchung Cheng
  2018-05-16 23:40 ` [PATCH net-next 2/8] tcp: disable RFC6675 loss detection Yuchung Cheng
@ 2018-05-16 23:40 ` Yuchung Cheng
  2018-05-16 23:40 ` [PATCH net-next 4/8] tcp: account lost retransmit after timeout Yuchung Cheng
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Yuchung Cheng @ 2018-05-16 23:40 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, soheil, priyarjha, Yuchung Cheng

This is a rewrite of the NewReno loss recovery implementation that is
simpler and standalone, for readability and better performance through
fewer states.

Note that NewReno here refers to the RFC6582 modification of the fast
recovery algorithm. It is used in Linux only if the connection does not
support SACK. It should not be confused with the Reno (AIMD)
congestion control.
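
As a concrete (hypothetical) illustration of the new helper added
below: on a non-SACK connection with the default reordering threshold
of 3, suppose packets 1 and 2 are dropped. The third DUPACK starts
fast recovery and marks the head (packet 1) lost; once the
retransmission of packet 1 is cumulatively ACKed, SND.UNA advances and
the new head (packet 2) is marked lost in turn, matching cases (a) and
(b) described in tcp_newreno_mark_lost().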

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
---
 include/net/tcp.h       |  1 +
 net/ipv4/tcp_input.c    | 19 +++++++++++--------
 net/ipv4/tcp_recovery.c | 27 +++++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 85000c85ddcd..d7f81325bee5 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1878,6 +1878,7 @@ void tcp_v4_init(void);
 void tcp_init(void);
 
 /* tcp_recovery.c */
+void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced);
 extern void tcp_rack_mark_lost(struct sock *sk);
 extern void tcp_rack_advance(struct tcp_sock *tp, u8 sacked, u32 end_seq,
 			     u64 xmit_time);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ccbe04f80040..076206873e3e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2223,9 +2223,7 @@ static void tcp_update_scoreboard(struct sock *sk, int fast_rexmit)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	if (tcp_is_reno(tp)) {
-		tcp_mark_head_lost(sk, 1, 1);
-	} else {
+	if (tcp_is_sack(tp)) {
 		int sacked_upto = tp->sacked_out - tp->reordering;
 		if (sacked_upto >= 0)
 			tcp_mark_head_lost(sk, sacked_upto, 0);
@@ -2723,11 +2721,16 @@ static bool tcp_try_undo_partial(struct sock *sk, u32 prior_snd_una)
 	return false;
 }
 
-static void tcp_rack_identify_loss(struct sock *sk, int *ack_flag)
+static void tcp_identify_packet_loss(struct sock *sk, int *ack_flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	if (tcp_is_rack(sk)) {
+	if (tcp_rtx_queue_empty(sk))
+		return;
+
+	if (unlikely(tcp_is_reno(tp))) {
+		tcp_newreno_mark_lost(sk, *ack_flag & FLAG_SND_UNA_ADVANCED);
+	} else if (tcp_is_rack(sk)) {
 		u32 prior_retrans = tp->retrans_out;
 
 		tcp_rack_mark_lost(sk);
@@ -2823,11 +2826,11 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
 			tcp_try_keep_open(sk);
 			return;
 		}
-		tcp_rack_identify_loss(sk, ack_flag);
+		tcp_identify_packet_loss(sk, ack_flag);
 		break;
 	case TCP_CA_Loss:
 		tcp_process_loss(sk, flag, is_dupack, rexmit);
-		tcp_rack_identify_loss(sk, ack_flag);
+		tcp_identify_packet_loss(sk, ack_flag);
 		if (!(icsk->icsk_ca_state == TCP_CA_Open ||
 		      (*ack_flag & FLAG_LOST_RETRANS)))
 			return;
@@ -2844,7 +2847,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
 		if (icsk->icsk_ca_state <= TCP_CA_Disorder)
 			tcp_try_undo_dsack(sk);
 
-		tcp_rack_identify_loss(sk, ack_flag);
+		tcp_identify_packet_loss(sk, ack_flag);
 		if (!tcp_time_to_recover(sk, flag)) {
 			tcp_try_to_open(sk, flag);
 			return;
diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index 1c1bdf12a96f..299b0e38aa9a 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -216,3 +216,30 @@ void tcp_rack_update_reo_wnd(struct sock *sk, struct rate_sample *rs)
 		tp->rack.reo_wnd_steps = 1;
 	}
 }
+
+/* RFC6582 NewReno recovery for non-SACK connection. It simply retransmits
+ * the next unacked packet upon receiving
+ * a) three or more DUPACKs to start the fast recovery
+ * b) an ACK acknowledging new data during the fast recovery.
+ */
+void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced)
+{
+	const u8 state = inet_csk(sk)->icsk_ca_state;
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if ((state < TCP_CA_Recovery && tp->sacked_out >= tp->reordering) ||
+	    (state == TCP_CA_Recovery && snd_una_advanced)) {
+		struct sk_buff *skb = tcp_rtx_queue_head(sk);
+		u32 mss;
+
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+			return;
+
+		mss = tcp_skb_mss(skb);
+		if (tcp_skb_pcount(skb) > 1 && skb->len > mss)
+			tcp_fragment(sk, TCP_FRAG_IN_RTX_QUEUE, skb,
+				     mss, mss, GFP_ATOMIC);
+
+		tcp_skb_mark_lost_uncond_verify(tp, skb);
+	}
+}
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next 4/8] tcp: account lost retransmit after timeout
  2018-05-16 23:40 [PATCH net-next 0/8] tcp: default RACK loss recovery Yuchung Cheng
                   ` (2 preceding siblings ...)
  2018-05-16 23:40 ` [PATCH net-next 3/8] tcp: simpler NewReno implementation Yuchung Cheng
@ 2018-05-16 23:40 ` Yuchung Cheng
  2018-05-16 23:40 ` [PATCH net-next 5/8] tcp: new helper tcp_timeout_mark_lost Yuchung Cheng
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Yuchung Cheng @ 2018-05-16 23:40 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, soheil, priyarjha, Yuchung Cheng

The previous approach for the lost and retransmit bits was to
wipe the slate clean: zero all the lost and retransmit bits,
correspondingly zero the lost_out and retrans_out counters, and
then add back the lost bits (and correspondingly increment lost_out).

The new approach is to treat this very much like marking packets
lost in fast recovery. We don’t wipe the slate clean. We just say
that for all packets that were not yet marked sacked or lost, we now
mark them as lost in exactly the same way we do for fast recovery.

This fixes the lost retransmit accounting at RTO time and greatly
simplifies the RTO code by sharing much of the logic with Fast
Recovery.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
---
 include/net/tcp.h       |  1 +
 net/ipv4/tcp_input.c    | 18 +++---------------
 net/ipv4/tcp_recovery.c |  4 ++--
 3 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index d7f81325bee5..402484ed9b57 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1878,6 +1878,7 @@ void tcp_v4_init(void);
 void tcp_init(void);
 
 /* tcp_recovery.c */
+void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb);
 void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced);
 extern void tcp_rack_mark_lost(struct sock *sk);
 extern void tcp_rack_advance(struct tcp_sock *tp, u8 sacked, u32 end_seq,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 076206873e3e..6fb0a28977a0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1929,7 +1929,6 @@ void tcp_enter_loss(struct sock *sk)
 	struct sk_buff *skb;
 	bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery;
 	bool is_reneg;			/* is receiver reneging on SACKs? */
-	bool mark_lost;
 
 	/* Reduce ssthresh if it has not yet been made inside this window. */
 	if (icsk->icsk_ca_state <= TCP_CA_Disorder ||
@@ -1945,9 +1944,6 @@ void tcp_enter_loss(struct sock *sk)
 	tp->snd_cwnd_cnt   = 0;
 	tp->snd_cwnd_stamp = tcp_jiffies32;
 
-	tp->retrans_out = 0;
-	tp->lost_out = 0;
-
 	if (tcp_is_reno(tp))
 		tcp_reset_reno_sack(tp);
 
@@ -1959,21 +1955,13 @@ void tcp_enter_loss(struct sock *sk)
 		/* Mark SACK reneging until we recover from this loss event. */
 		tp->is_sack_reneg = 1;
 	}
-	tcp_clear_all_retrans_hints(tp);
-
 	skb_rbtree_walk_from(skb) {
-		mark_lost = (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) ||
-			     is_reneg);
-		if (mark_lost)
-			tcp_sum_lost(tp, skb);
-		TCP_SKB_CB(skb)->sacked &= (~TCPCB_TAGBITS)|TCPCB_SACKED_ACKED;
-		if (mark_lost) {
+		if (is_reneg)
 			TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED;
-			TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
-			tp->lost_out += tcp_skb_pcount(skb);
-		}
+		tcp_mark_skb_lost(sk, skb);
 	}
 	tcp_verify_left_out(tp);
+	tcp_clear_all_retrans_hints(tp);
 
 	/* Timeout in disordered state after receiving substantial DUPACKs
 	 * suggests that the degree of reordering is over-estimated.
diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index 299b0e38aa9a..b2f9be388bf3 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -2,7 +2,7 @@
 #include <linux/tcp.h>
 #include <net/tcp.h>
 
-static void tcp_rack_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
+void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
@@ -95,7 +95,7 @@ static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout)
 		remaining = tp->rack.rtt_us + reo_wnd -
 			    tcp_stamp_us_delta(tp->tcp_mstamp, skb->skb_mstamp);
 		if (remaining <= 0) {
-			tcp_rack_mark_skb_lost(sk, skb);
+			tcp_mark_skb_lost(sk, skb);
 			list_del_init(&skb->tcp_tsorted_anchor);
 		} else {
 			/* Record maximum wait time */
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next 5/8] tcp: new helper tcp_timeout_mark_lost
  2018-05-16 23:40 [PATCH net-next 0/8] tcp: default RACK loss recovery Yuchung Cheng
                   ` (3 preceding siblings ...)
  2018-05-16 23:40 ` [PATCH net-next 4/8] tcp: account lost retransmit after timeout Yuchung Cheng
@ 2018-05-16 23:40 ` Yuchung Cheng
  2018-05-16 23:40 ` [PATCH net-next 6/8] tcp: separate loss marking and state update on RTO Yuchung Cheng
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Yuchung Cheng @ 2018-05-16 23:40 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, soheil, priyarjha, Yuchung Cheng

Refactor using a new helper, tcp_timeout_mark_lost(), that marks packets
lost upon RTO.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
---
 net/ipv4/tcp_input.c | 50 +++++++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 21 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6fb0a28977a0..af32accda2a9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1917,18 +1917,43 @@ static inline void tcp_init_undo(struct tcp_sock *tp)
 	tp->undo_retrans = tp->retrans_out ? : -1;
 }
 
-/* Enter Loss state. If we detect SACK reneging, forget all SACK information
+/* If we detect SACK reneging, forget all SACK information
  * and reset tags completely, otherwise preserve SACKs. If receiver
  * dropped its ofo queue, we will know this due to reneging detection.
  */
+static void tcp_timeout_mark_lost(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+	bool is_reneg;			/* is receiver reneging on SACKs? */
+
+	skb = tcp_rtx_queue_head(sk);
+	is_reneg = skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED);
+	if (is_reneg) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSACKRENEGING);
+		tp->sacked_out = 0;
+		/* Mark SACK reneging until we recover from this loss event. */
+		tp->is_sack_reneg = 1;
+	} else if (tcp_is_reno(tp)) {
+		tcp_reset_reno_sack(tp);
+	}
+
+	skb_rbtree_walk_from(skb) {
+		if (is_reneg)
+			TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED;
+		tcp_mark_skb_lost(sk, skb);
+	}
+	tcp_verify_left_out(tp);
+	tcp_clear_all_retrans_hints(tp);
+}
+
+/* Enter Loss state. */
 void tcp_enter_loss(struct sock *sk)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct net *net = sock_net(sk);
-	struct sk_buff *skb;
 	bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery;
-	bool is_reneg;			/* is receiver reneging on SACKs? */
 
 	/* Reduce ssthresh if it has not yet been made inside this window. */
 	if (icsk->icsk_ca_state <= TCP_CA_Disorder ||
@@ -1944,24 +1969,7 @@ void tcp_enter_loss(struct sock *sk)
 	tp->snd_cwnd_cnt   = 0;
 	tp->snd_cwnd_stamp = tcp_jiffies32;
 
-	if (tcp_is_reno(tp))
-		tcp_reset_reno_sack(tp);
-
-	skb = tcp_rtx_queue_head(sk);
-	is_reneg = skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED);
-	if (is_reneg) {
-		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSACKRENEGING);
-		tp->sacked_out = 0;
-		/* Mark SACK reneging until we recover from this loss event. */
-		tp->is_sack_reneg = 1;
-	}
-	skb_rbtree_walk_from(skb) {
-		if (is_reneg)
-			TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED;
-		tcp_mark_skb_lost(sk, skb);
-	}
-	tcp_verify_left_out(tp);
-	tcp_clear_all_retrans_hints(tp);
+	tcp_timeout_mark_lost(sk);
 
 	/* Timeout in disordered state after receiving substantial DUPACKs
 	 * suggests that the degree of reordering is over-estimated.
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next 6/8] tcp: separate loss marking and state update on RTO
  2018-05-16 23:40 [PATCH net-next 0/8] tcp: default RACK loss recovery Yuchung Cheng
                   ` (4 preceding siblings ...)
  2018-05-16 23:40 ` [PATCH net-next 5/8] tcp: new helper tcp_timeout_mark_lost Yuchung Cheng
@ 2018-05-16 23:40 ` Yuchung Cheng
  2018-05-16 23:40 ` [PATCH net-next 7/8] tcp: new helper tcp_rack_skb_timeout Yuchung Cheng
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Yuchung Cheng @ 2018-05-16 23:40 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, soheil, priyarjha, Yuchung Cheng

Previously when TCP times out, it first updates cwnd and ssthresh,
marks packets lost, and then updates congestion state again. This
was fine because everything not yet delivered is marked lost,
so the inflight is always 0 and cwnd can be safely set to 1 to
retransmit one packet on timeout.

But the inflight may not always be 0 on timeout if TCP changes to
mark packets lost based on packet sent time. Therefore we must
first mark the packets lost, then set the cwnd based on the
(updated) inflight.
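
For a concrete (hypothetical) illustration: if 10 packets are
outstanding at RTO time and, once the last patch in this series is
applied, only 7 of them are old enough to be marked lost, the inflight
after marking is 3; the cwnd set afterwards can then be derived from
that updated inflight instead of the stale pre-marking value.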

This is not a pure refactor. Congestion control may potentially
break if it uses the (not yet updated) inflight to compute ssthresh.
Fortunately, no existing congestion control module does that.
Also, this changes the inflight seen when CA_EVENT_LOSS is raised;
only Westwood processes such an event, and it does not use inflight.

This change has two other minor side benefits:
1) consistent with Fast Recovery, where the inflight is updated
   before tcp_enter_recovery flips the state to CA_Recovery.

2) avoid intertwining loss marking with state update, making the
   code more readable.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
---
 net/ipv4/tcp_input.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index af32accda2a9..1ccc97b368c7 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1955,6 +1955,8 @@ void tcp_enter_loss(struct sock *sk)
 	struct net *net = sock_net(sk);
 	bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery;
 
+	tcp_timeout_mark_lost(sk);
+
 	/* Reduce ssthresh if it has not yet been made inside this window. */
 	if (icsk->icsk_ca_state <= TCP_CA_Disorder ||
 	    !after(tp->high_seq, tp->snd_una) ||
@@ -1969,8 +1971,6 @@ void tcp_enter_loss(struct sock *sk)
 	tp->snd_cwnd_cnt   = 0;
 	tp->snd_cwnd_stamp = tcp_jiffies32;
 
-	tcp_timeout_mark_lost(sk);
-
 	/* Timeout in disordered state after receiving substantial DUPACKs
 	 * suggests that the degree of reordering is over-estimated.
 	 */
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next 7/8] tcp: new helper tcp_rack_skb_timeout
  2018-05-16 23:40 [PATCH net-next 0/8] tcp: default RACK loss recovery Yuchung Cheng
                   ` (5 preceding siblings ...)
  2018-05-16 23:40 ` [PATCH net-next 6/8] tcp: separate loss marking and state update on RTO Yuchung Cheng
@ 2018-05-16 23:40 ` Yuchung Cheng
  2018-05-16 23:40 ` [PATCH net-next 8/8] tcp: don't mark recently sent packets lost on RTO Yuchung Cheng
  2018-05-17 19:45 ` [PATCH net-next 0/8] tcp: default RACK loss recovery David Miller
  8 siblings, 0 replies; 11+ messages in thread
From: Yuchung Cheng @ 2018-05-16 23:40 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, soheil, priyarjha, Yuchung Cheng

Create and export a new helper, tcp_rack_skb_timeout(), and move
tcp_is_rack() to prepare for the final RTO change.
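
As a hypothetical example of the computation performed by the new
helper: with a RACK RTT of 50 ms, a reordering window of 2 ms, and a
packet last (re)sent 20 ms ago, tcp_rack_skb_timeout() returns
50 + 2 - 20 = 32 ms, i.e. the packet has not yet timed out; for a
packet sent 60 ms ago it returns -8 ms, so the packet would be marked
lost.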

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
---
 include/net/tcp.h       |  2 ++
 net/ipv4/tcp_input.c    | 10 +++++-----
 net/ipv4/tcp_recovery.c |  9 +++++++--
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 402484ed9b57..b46d0f9adbdb 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1880,6 +1880,8 @@ void tcp_init(void);
 /* tcp_recovery.c */
 void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb);
 void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced);
+extern s32 tcp_rack_skb_timeout(struct tcp_sock *tp, struct sk_buff *skb,
+				u32 reo_wnd);
 extern void tcp_rack_mark_lost(struct sock *sk);
 extern void tcp_rack_advance(struct tcp_sock *tp, u8 sacked, u32 end_seq,
 			     u64 xmit_time);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1ccc97b368c7..ba8a8e3464aa 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1917,6 +1917,11 @@ static inline void tcp_init_undo(struct tcp_sock *tp)
 	tp->undo_retrans = tp->retrans_out ? : -1;
 }
 
+static bool tcp_is_rack(const struct sock *sk)
+{
+	return sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION;
+}
+
 /* If we detect SACK reneging, forget all SACK information
  * and reset tags completely, otherwise preserve SACKs. If receiver
  * dropped its ofo queue, we will know this due to reneging detection.
@@ -2031,11 +2036,6 @@ static inline int tcp_dupack_heuristics(const struct tcp_sock *tp)
 	return tp->sacked_out + 1;
 }
 
-static bool tcp_is_rack(const struct sock *sk)
-{
-	return sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION;
-}
-
 /* Linux NewReno/SACK/ECN state machine.
  * --------------------------------------
  *
diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index b2f9be388bf3..30cbfb69b1de 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -47,6 +47,12 @@ u32 tcp_rack_reo_wnd(const struct sock *sk)
 		   tp->srtt_us >> 3);
 }
 
+s32 tcp_rack_skb_timeout(struct tcp_sock *tp, struct sk_buff *skb, u32 reo_wnd)
+{
+	return tp->rack.rtt_us + reo_wnd -
+	       tcp_stamp_us_delta(tp->tcp_mstamp, skb->skb_mstamp);
+}
+
 /* RACK loss detection (IETF draft draft-ietf-tcpm-rack-01):
  *
  * Marks a packet lost, if some packet sent later has been (s)acked.
@@ -92,8 +98,7 @@ static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout)
 		/* A packet is lost if it has not been s/acked beyond
 		 * the recent RTT plus the reordering window.
 		 */
-		remaining = tp->rack.rtt_us + reo_wnd -
-			    tcp_stamp_us_delta(tp->tcp_mstamp, skb->skb_mstamp);
+		remaining = tcp_rack_skb_timeout(tp, skb, reo_wnd);
 		if (remaining <= 0) {
 			tcp_mark_skb_lost(sk, skb);
 			list_del_init(&skb->tcp_tsorted_anchor);
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next 8/8] tcp: don't mark recently sent packets lost on RTO
  2018-05-16 23:40 [PATCH net-next 0/8] tcp: default RACK loss recovery Yuchung Cheng
                   ` (6 preceding siblings ...)
  2018-05-16 23:40 ` [PATCH net-next 7/8] tcp: new helper tcp_rack_skb_timeout Yuchung Cheng
@ 2018-05-16 23:40 ` Yuchung Cheng
  2018-05-17 19:45 ` [PATCH net-next 0/8] tcp: default RACK loss recovery David Miller
  8 siblings, 0 replies; 11+ messages in thread
From: Yuchung Cheng @ 2018-05-16 23:40 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, soheil, priyarjha, Yuchung Cheng

An RTO event indicates the head has not been acked for a long time
after its last (re)transmission. But the other packets are not
necessarily lost if they have only been sent recently (for example
because the flow was application-limited). This patch avoids marking
packets sent within one RTT of the RTO as lost, using logic similar
to the TCP RACK loss detection.
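
For a hypothetical example: suppose the RTO fires when the RACK RTT is
50 ms, the head was sent 300 ms ago, but an application-limited tail
of the window was sent only 20 ms ago. For those tail packets the
remaining timeout from tcp_rack_skb_timeout(tp, skb, 0) is
50 - 20 = 30 ms > 0, so they are skipped; the head and the older
packets are still marked lost.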

Normally the head (SND.UNA) would be marked lost since RTO should
fire strictly after the head was sent. An exception is when the
most recent RACK RTT measurement is larger than the (previous)
RTO. To address this exception the head is always marked lost.

Congestion control interaction: since we may not mark every packet
lost, the congestion window may be more than 1 (inflight plus 1).
But only one packet will be retransmitted after RTO, since
tcp_retransmit_timer() calls tcp_retransmit_skb(...,segs=1). The
connection still performs slow start from one packet (with Cubic
congestion control).
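
E.g. (hypothetical numbers): with 10 packets outstanding, 7 of them
marked lost and none SACKed or retransmitted, tcp_packets_in_flight()
is 3 and cwnd is set to 4, yet tcp_retransmit_timer() still
retransmits only the head segment as noted above.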

This commit was tested in an A/B test with Google web servers,
and showed a reduction of 2% in (spurious) retransmits post
timeout (SlowStartRetrans), and correspondingly reduced DSACKs
(DSACKIgnoredOld) by 7%.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
---
 net/ipv4/tcp_input.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ba8a8e3464aa..0bf032839548 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1929,11 +1929,11 @@ static bool tcp_is_rack(const struct sock *sk)
 static void tcp_timeout_mark_lost(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	struct sk_buff *skb;
+	struct sk_buff *skb, *head;
 	bool is_reneg;			/* is receiver reneging on SACKs? */
 
-	skb = tcp_rtx_queue_head(sk);
-	is_reneg = skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED);
+	head = tcp_rtx_queue_head(sk);
+	is_reneg = head && (TCP_SKB_CB(head)->sacked & TCPCB_SACKED_ACKED);
 	if (is_reneg) {
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSACKRENEGING);
 		tp->sacked_out = 0;
@@ -1943,9 +1943,13 @@ static void tcp_timeout_mark_lost(struct sock *sk)
 		tcp_reset_reno_sack(tp);
 	}
 
+	skb = head;
 	skb_rbtree_walk_from(skb) {
 		if (is_reneg)
 			TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED;
+		else if (tcp_is_rack(sk) && skb != head &&
+			 tcp_rack_skb_timeout(tp, skb, 0) > 0)
+			continue; /* Don't mark recently sent ones lost yet */
 		tcp_mark_skb_lost(sk, skb);
 	}
 	tcp_verify_left_out(tp);
@@ -1972,7 +1976,7 @@ void tcp_enter_loss(struct sock *sk)
 		tcp_ca_event(sk, CA_EVENT_LOSS);
 		tcp_init_undo(tp);
 	}
-	tp->snd_cwnd	   = 1;
+	tp->snd_cwnd	   = tcp_packets_in_flight(tp) + 1;
 	tp->snd_cwnd_cnt   = 0;
 	tp->snd_cwnd_stamp = tcp_jiffies32;
 
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next 0/8] tcp: default RACK loss recovery
  2018-05-16 23:40 [PATCH net-next 0/8] tcp: default RACK loss recovery Yuchung Cheng
                   ` (7 preceding siblings ...)
  2018-05-16 23:40 ` [PATCH net-next 8/8] tcp: don't mark recently sent packets lost on RTO Yuchung Cheng
@ 2018-05-17 19:45 ` David Miller
  2018-05-21 18:39   ` hiren panchasara
  8 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2018-05-17 19:45 UTC (permalink / raw)
  To: ycheng; +Cc: netdev, edumazet, ncardwell, soheil, priyarjha

From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 16 May 2018 16:40:09 -0700

> This patch set implements the features corresponding to the
> draft-ietf-tcpm-rack-03 version of the RACK draft.
> https://datatracker.ietf.org/meeting/101/materials/slides-101-tcpm-update-on-tcp-rack-00
> 
> 1. SACK: implement equivalent DUPACK threshold heuristic in RACK to
>    replace existing RFC6675 recovery (tcp_mark_head_lost).
> 
> 2. Non-SACK: simplify RFC6582 NewReno implementation
> 
> 3. RTO: apply RACK's time-based approach to avoid spuriously
>    marking very recently sent packets lost.
> 
> 4. With (1)(2)(3), make RACK the exclusive fast recovery mechanism to
>    mark losses based on time upon S/ACK. Tail loss probe and F-RTO remain
>    enabled by default as complementary mechanisms to send probes in
>    CA_Open and CA_Loss states. The probes would solicit S/ACKs to trigger
>    RACK time-based loss detection.
> 
> All Google web and internal servers have been running RACK-only mode
> (4) for a while now. A/B experiments indicate RACK/TLP on average
> reduces recovery latency by 10% compared to RFC6675. RFC6675
> is default-off now but can be enabled by disabling RACK (sysctl
> net.ipv4.tcp_recovery=0) in case of unforeseen issues.

Series applied.

These patches, the design of the ordering of changes in the patch series,
and the commit messages themselves were more than a pleasure to read.

Really, this patch series is a great model for others who want to
improve the quality and reviewability of their submissions.

Thank you.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next 0/8] tcp: default RACK loss recovery
  2018-05-17 19:45 ` [PATCH net-next 0/8] tcp: default RACK loss recovery David Miller
@ 2018-05-21 18:39   ` hiren panchasara
  0 siblings, 0 replies; 11+ messages in thread
From: hiren panchasara @ 2018-05-21 18:39 UTC (permalink / raw)
  To: David Miller; +Cc: ycheng, netdev, edumazet, ncardwell, soheil, priyarjha

[-- Attachment #1: Type: text/plain, Size: 1830 bytes --]

On 05/17/18 at 03:45P, David Miller wrote:
> From: Yuchung Cheng <ycheng@google.com>
> Date: Wed, 16 May 2018 16:40:09 -0700
> 
> > This patch set implements the features corresponding to the
> > draft-ietf-tcpm-rack-03 version of the RACK draft.
> > https://datatracker.ietf.org/meeting/101/materials/slides-101-tcpm-update-on-tcp-rack-00
> > 
> > 1. SACK: implement equivalent DUPACK threshold heuristic in RACK to
> >    replace existing RFC6675 recovery (tcp_mark_head_lost).
> > 
> > 2. Non-SACK: simplify RFC6582 NewReno implementation
> > 
> > 3. RTO: apply RACK's time-based approach to avoid spuriously
> >    marking very recently sent packets lost.
> > 
> > 4. With (1)(2)(3), make RACK the exclusive fast recovery mechanism to
> >    mark losses based on time upon S/ACK. Tail loss probe and F-RTO remain
> >    enabled by default as complementary mechanisms to send probes in
> >    CA_Open and CA_Loss states. The probes would solicit S/ACKs to trigger
> >    RACK time-based loss detection.
> > 
> > All Google web and internal servers have been running RACK-only mode
> > (4) for a while now. A/B experiments indicate RACK/TLP on average
> > reduces recovery latency by 10% compared to RFC6675. RFC6675
> > is default-off now but can be enabled by disabling RACK (sysctl
> > net.ipv4.tcp_recovery=0) in case of unforeseen issues.
> 
> Series applied.
> 
> These patches, the design of the ordering of changes in the patch series,
> and the commit messages themselves were more than a pleasure to read.
> 
> Really, this patch series is a great model for others who want to
> improve the quality and reviewability of their submissions.

Second that, as a maintainer of a non-Linux stack. Thanks a ton for such
clear and crisp commit-log messages along with the actual changes.

Cheers,
Hiren

[-- Attachment #2: Type: application/pgp-signature, Size: 603 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

