netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] simplify TCP loss marking code
@ 2020-09-25 17:04 Yuchung Cheng
  2020-09-25 17:04 ` [PATCH net-next 1/4] tcp: consistently check retransmit hint Yuchung Cheng
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Yuchung Cheng @ 2020-09-25 17:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, Yuchung Cheng

The TCP loss marking is implemented by a set of intertwined
subroutines. TCP has several loss detection algorithms
(RACK, RFC6675/FACK, NewReno, etc) each calls a subset of
these routines to mark a packet lost. This has led to
various bugs (and fixes and fixes of fixes).

This patch set is to consolidate the loss marking code so
all detection algorithms call the same routine tcp_mark_skb_lost().

Yuchung Cheng (4):
  tcp: consistently check retransmit hint
  tcp: move tcp_mark_skb_lost
  tcp: simplify tcp_mark_skb_lost
  tcp: consolidate tcp_mark_skb_lost and tcp_skb_mark_lost

 net/ipv4/tcp_input.c    | 60 +++++++++++++++--------------------------
 net/ipv4/tcp_recovery.c | 16 +----------
 2 files changed, 23 insertions(+), 53 deletions(-)

-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net-next 1/4] tcp: consistently check retransmit hint
  2020-09-25 17:04 [PATCH net-next 0/4] simplify TCP loss marking code Yuchung Cheng
@ 2020-09-25 17:04 ` Yuchung Cheng
  2020-09-25 17:04 ` [PATCH net-next 2/4] tcp: move tcp_mark_skb_lost Yuchung Cheng
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Yuchung Cheng @ 2020-09-25 17:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, Yuchung Cheng

tcp_simple_retransmit() used for path MTU discovery may not adjust
the retransmit hint properly by deducting retrans_out before checking
it to adjust the hint. This patch fixes this by a correct routine
tcp_mark_skb_lost() already used by the RACK loss detection.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c    | 9 ++-------
 net/ipv4/tcp_recovery.c | 2 +-
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 50834e7f958e..f84420dc7d37 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2675,13 +2675,8 @@ void tcp_simple_retransmit(struct sock *sk)
 
 	skb_rbtree_walk(skb, &sk->tcp_rtx_queue) {
 		if (tcp_skb_seglen(skb) > mss &&
-		    !(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)) {
-			if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS) {
-				TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
-				tp->retrans_out -= tcp_skb_pcount(skb);
-			}
-			tcp_skb_mark_lost_uncond_verify(tp, skb);
-		}
+		    !(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
+			tcp_mark_skb_lost(sk, skb);
 	}
 
 	tcp_clear_retrans_hints_partial(tp);
diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index fdb715bdd2d1..26a42289a870 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -246,6 +246,6 @@ void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced)
 			tcp_fragment(sk, TCP_FRAG_IN_RTX_QUEUE, skb,
 				     mss, mss, GFP_ATOMIC);
 
-		tcp_skb_mark_lost_uncond_verify(tp, skb);
+		tcp_mark_skb_lost(sk, skb);
 	}
 }
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net-next 2/4] tcp: move tcp_mark_skb_lost
  2020-09-25 17:04 [PATCH net-next 0/4] simplify TCP loss marking code Yuchung Cheng
  2020-09-25 17:04 ` [PATCH net-next 1/4] tcp: consistently check retransmit hint Yuchung Cheng
@ 2020-09-25 17:04 ` Yuchung Cheng
  2020-09-25 17:04 ` [PATCH net-next 3/4] tcp: simplify tcp_mark_skb_lost Yuchung Cheng
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Yuchung Cheng @ 2020-09-25 17:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, Yuchung Cheng

A pure refactor to move tcp_mark_skb_lost to tcp_input.c to prepare
for the later loss marking consolidation.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c    | 14 ++++++++++++++
 net/ipv4/tcp_recovery.c | 14 --------------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index f84420dc7d37..0f8d33b95678 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1016,6 +1016,20 @@ static void tcp_verify_retransmit_hint(struct tcp_sock *tp, struct sk_buff *skb)
 		tp->retransmit_skb_hint = skb;
 }
 
+void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	tcp_skb_mark_lost_uncond_verify(tp, skb);
+	if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS) {
+		/* Account for retransmits that are lost again */
+		TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
+		tp->retrans_out -= tcp_skb_pcount(skb);
+		NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPLOSTRETRANSMIT,
+			      tcp_skb_pcount(skb));
+	}
+}
+
 /* Sum the number of packets on the wire we have marked as lost.
  * There are two cases we care about here:
  * a) Packet hasn't been marked lost (nor retransmitted),
diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index 26a42289a870..f65a3ddd0d58 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -2,20 +2,6 @@
 #include <linux/tcp.h>
 #include <net/tcp.h>
 
-void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
-{
-	struct tcp_sock *tp = tcp_sk(sk);
-
-	tcp_skb_mark_lost_uncond_verify(tp, skb);
-	if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS) {
-		/* Account for retransmits that are lost again */
-		TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
-		tp->retrans_out -= tcp_skb_pcount(skb);
-		NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPLOSTRETRANSMIT,
-			      tcp_skb_pcount(skb));
-	}
-}
-
 static bool tcp_rack_sent_after(u64 t1, u64 t2, u32 seq1, u32 seq2)
 {
 	return t1 > t2 || (t1 == t2 && after(seq1, seq2));
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net-next 3/4] tcp: simplify tcp_mark_skb_lost
  2020-09-25 17:04 [PATCH net-next 0/4] simplify TCP loss marking code Yuchung Cheng
  2020-09-25 17:04 ` [PATCH net-next 1/4] tcp: consistently check retransmit hint Yuchung Cheng
  2020-09-25 17:04 ` [PATCH net-next 2/4] tcp: move tcp_mark_skb_lost Yuchung Cheng
@ 2020-09-25 17:04 ` Yuchung Cheng
  2020-09-25 17:04 ` [PATCH net-next 4/4] tcp: consolidate tcp_mark_skb_lost and tcp_skb_mark_lost Yuchung Cheng
  2020-09-26  0:17 ` [PATCH net-next 0/4] simplify TCP loss marking code David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: Yuchung Cheng @ 2020-09-25 17:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, Yuchung Cheng

This patch consolidates and simplifes the loss marking logic used
by a few loss detections (RACK, RFC6675, NewReno). Previously
each detection uses a subset of several intertwined subroutines.
This unncessary complexity has led to bugs (and fixes of bug fixes).

tcp_mark_skb_lost now is the single one routine to mark a packet loss
when a loss detection caller deems an skb ist lost:

   1. rewind tp->retransmit_hint_skb if skb has lower sequence or
      all lost ones have been retransmitted.

   2. book-keeping: adjust flags and counts depending on if skb was
      retransmitted or not.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 59 +++++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 37 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0f8d33b95678..9be41b69a75b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1006,7 +1006,11 @@ static void tcp_check_sack_reordering(struct sock *sk, const u32 low_seq,
 		      ts ? LINUX_MIB_TCPTSREORDER : LINUX_MIB_TCPSACKREORDER);
 }
 
-/* This must be called before lost_out is incremented */
+ /* This must be called before lost_out or retrans_out are updated
+  * on a new loss, because we want to know if all skbs previously
+  * known to be lost have already been retransmitted, indicating
+  * that this newly lost skb is our next skb to retransmit.
+  */
 static void tcp_verify_retransmit_hint(struct tcp_sock *tp, struct sk_buff *skb)
 {
 	if ((!tp->retransmit_skb_hint && tp->retrans_out >= tp->lost_out) ||
@@ -1018,32 +1022,25 @@ static void tcp_verify_retransmit_hint(struct tcp_sock *tp, struct sk_buff *skb)
 
 void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
 {
+	__u8 sacked = TCP_SKB_CB(skb)->sacked;
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	tcp_skb_mark_lost_uncond_verify(tp, skb);
-	if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS) {
-		/* Account for retransmits that are lost again */
-		TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
-		tp->retrans_out -= tcp_skb_pcount(skb);
-		NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPLOSTRETRANSMIT,
-			      tcp_skb_pcount(skb));
-	}
-}
-
-/* Sum the number of packets on the wire we have marked as lost.
- * There are two cases we care about here:
- * a) Packet hasn't been marked lost (nor retransmitted),
- *    and this is the first loss.
- * b) Packet has been marked both lost and retransmitted,
- *    and this means we think it was lost again.
- */
-static void tcp_sum_lost(struct tcp_sock *tp, struct sk_buff *skb)
-{
-	__u8 sacked = TCP_SKB_CB(skb)->sacked;
+	if (sacked & TCPCB_SACKED_ACKED)
+		return;
 
-	if (!(sacked & TCPCB_LOST) ||
-	    ((sacked & TCPCB_LOST) && (sacked & TCPCB_SACKED_RETRANS)))
-		tp->lost += tcp_skb_pcount(skb);
+	tcp_verify_retransmit_hint(tp, skb);
+	if (sacked & TCPCB_LOST) {
+		if (sacked & TCPCB_SACKED_RETRANS) {
+			/* Account for retransmits that are lost again */
+			TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
+			tp->retrans_out -= tcp_skb_pcount(skb);
+			NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPLOSTRETRANSMIT,
+				      tcp_skb_pcount(skb));
+		}
+	} else {
+		tp->lost_out += tcp_skb_pcount(skb);
+		TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
+	}
 }
 
 static void tcp_skb_mark_lost(struct tcp_sock *tp, struct sk_buff *skb)
@@ -1057,17 +1054,6 @@ static void tcp_skb_mark_lost(struct tcp_sock *tp, struct sk_buff *skb)
 	}
 }
 
-void tcp_skb_mark_lost_uncond_verify(struct tcp_sock *tp, struct sk_buff *skb)
-{
-	tcp_verify_retransmit_hint(tp, skb);
-
-	tcp_sum_lost(tp, skb);
-	if (!(TCP_SKB_CB(skb)->sacked & (TCPCB_LOST|TCPCB_SACKED_ACKED))) {
-		tp->lost_out += tcp_skb_pcount(skb);
-		TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
-	}
-}
-
 /* Updates the delivered and delivered_ce counts */
 static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
 				bool ece_ack)
@@ -2688,8 +2674,7 @@ void tcp_simple_retransmit(struct sock *sk)
 	unsigned int mss = tcp_current_mss(sk);
 
 	skb_rbtree_walk(skb, &sk->tcp_rtx_queue) {
-		if (tcp_skb_seglen(skb) > mss &&
-		    !(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
+		if (tcp_skb_seglen(skb) > mss)
 			tcp_mark_skb_lost(sk, skb);
 	}
 
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net-next 4/4] tcp: consolidate tcp_mark_skb_lost and tcp_skb_mark_lost
  2020-09-25 17:04 [PATCH net-next 0/4] simplify TCP loss marking code Yuchung Cheng
                   ` (2 preceding siblings ...)
  2020-09-25 17:04 ` [PATCH net-next 3/4] tcp: simplify tcp_mark_skb_lost Yuchung Cheng
@ 2020-09-25 17:04 ` Yuchung Cheng
  2020-09-26  0:17 ` [PATCH net-next 0/4] simplify TCP loss marking code David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: Yuchung Cheng @ 2020-09-25 17:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, Yuchung Cheng

tcp_skb_mark_lost is used by RFC6675-SACK and can easily be replaced
with the new tcp_mark_skb_lost handler.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9be41b69a75b..2ebfe87210f7 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1043,17 +1043,6 @@ void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
 	}
 }
 
-static void tcp_skb_mark_lost(struct tcp_sock *tp, struct sk_buff *skb)
-{
-	if (!(TCP_SKB_CB(skb)->sacked & (TCPCB_LOST|TCPCB_SACKED_ACKED))) {
-		tcp_verify_retransmit_hint(tp, skb);
-
-		tp->lost_out += tcp_skb_pcount(skb);
-		tcp_sum_lost(tp, skb);
-		TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
-	}
-}
-
 /* Updates the delivered and delivered_ce counts */
 static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
 				bool ece_ack)
@@ -2308,7 +2297,8 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int mark_head)
 		if (cnt > packets)
 			break;
 
-		tcp_skb_mark_lost(tp, skb);
+		if (!(TCP_SKB_CB(skb)->sacked & TCPCB_LOST))
+			tcp_mark_skb_lost(sk, skb);
 
 		if (mark_head)
 			break;
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next 0/4] simplify TCP loss marking code
  2020-09-25 17:04 [PATCH net-next 0/4] simplify TCP loss marking code Yuchung Cheng
                   ` (3 preceding siblings ...)
  2020-09-25 17:04 ` [PATCH net-next 4/4] tcp: consolidate tcp_mark_skb_lost and tcp_skb_mark_lost Yuchung Cheng
@ 2020-09-26  0:17 ` David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2020-09-26  0:17 UTC (permalink / raw)
  To: ycheng; +Cc: netdev, edumazet, ncardwell

From: Yuchung Cheng <ycheng@google.com>
Date: Fri, 25 Sep 2020 10:04:27 -0700

> The TCP loss marking is implemented by a set of intertwined
> subroutines. TCP has several loss detection algorithms
> (RACK, RFC6675/FACK, NewReno, etc) each calls a subset of
> these routines to mark a packet lost. This has led to
> various bugs (and fixes and fixes of fixes).
> 
> This patch set is to consolidate the loss marking code so
> all detection algorithms call the same routine tcp_mark_skb_lost().

Looks good, series applied.

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-09-26  0:17 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-25 17:04 [PATCH net-next 0/4] simplify TCP loss marking code Yuchung Cheng
2020-09-25 17:04 ` [PATCH net-next 1/4] tcp: consistently check retransmit hint Yuchung Cheng
2020-09-25 17:04 ` [PATCH net-next 2/4] tcp: move tcp_mark_skb_lost Yuchung Cheng
2020-09-25 17:04 ` [PATCH net-next 3/4] tcp: simplify tcp_mark_skb_lost Yuchung Cheng
2020-09-25 17:04 ` [PATCH net-next 4/4] tcp: consolidate tcp_mark_skb_lost and tcp_skb_mark_lost Yuchung Cheng
2020-09-26  0:17 ` [PATCH net-next 0/4] simplify TCP loss marking code David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).