All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/3] tcp: better deal with delayed TX completions
@ 2021-03-11 20:35 Eric Dumazet
  2021-03-11 20:35 ` [PATCH net-next 1/3] tcp: plug skb_still_in_host_queue() to TSQ Eric Dumazet
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Eric Dumazet @ 2021-03-11 20:35 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
	Neil Spring

From: Eric Dumazet <edumazet@google.com>

Jakub and Neil reported an increase of RTO timers whenever
TX completions are delayed a bit more (by increasing
NIC TX coalescing parameters)

While problems have been there forever, second patch might
introduce some regressions so I prefer not backport
them to stable releases before things settle.

Many thanks to FB team for their help and tests.

Few packetdrill tests need to be changed to reflect
the improvements brought by this series.

Eric Dumazet (3):
  tcp: plug skb_still_in_host_queue() to TSQ
  tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()
  tcp: remove obsolete check in __tcp_retransmit_skb()

 include/linux/skbuff.h |  2 +-
 net/ipv4/tcp_input.c   | 10 ++++------
 net/ipv4/tcp_output.c  | 20 ++++++++------------
 3 files changed, 13 insertions(+), 19 deletions(-)

-- 
2.31.0.rc2.261.g7f71774620-goog


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net-next 1/3] tcp: plug skb_still_in_host_queue() to TSQ
  2021-03-11 20:35 [PATCH net-next 0/3] tcp: better deal with delayed TX completions Eric Dumazet
@ 2021-03-11 20:35 ` Eric Dumazet
  2021-03-11 20:35 ` [PATCH net-next 2/3] tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack() Eric Dumazet
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2021-03-11 20:35 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
	Neil Spring

From: Eric Dumazet <edumazet@google.com>

Jakub and Neil reported an increase of RTO timers whenever
TX completions are delayed a bit more (by increasing
NIC TX coalescing parameters)

Main issue is that TCP stack has a logic preventing a packet
being retransmit if the prior clone has not yet been
orphaned or freed.

This logic came with commit 1f3279ae0c13 ("tcp: avoid
retransmits of TCP packets hanging in host queues")

Thankfully, in the case skb_still_in_host_queue() detects
the initial clone is still in flight, it can use TSQ logic
that will eventually retry later, at the moment the clone
is freed or orphaned.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Neil Spring <ntspring@fb.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
 include/linux/skbuff.h |  2 +-
 net/ipv4/tcp_output.c  | 12 ++++++++----
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0503c917d77301f433122bf34a659bb855763144..483e89348f78b48235748de37ae3ea7ec9450491 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1140,7 +1140,7 @@ static inline bool skb_fclone_busy(const struct sock *sk,
 
 	return skb->fclone == SKB_FCLONE_ORIG &&
 	       refcount_read(&fclones->fclone_ref) > 1 &&
-	       fclones->skb2.sk == sk;
+	       READ_ONCE(fclones->skb2.sk) == sk;
 }
 
 /**
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fbf140a770d8e21b936369b79abbe9857537acd8..0dbf208a4f2f17c630084e87f4a9a2ad0dc24168 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2775,13 +2775,17 @@ bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto)
  * a packet is still in a qdisc or driver queue.
  * In this case, there is very little point doing a retransmit !
  */
-static bool skb_still_in_host_queue(const struct sock *sk,
+static bool skb_still_in_host_queue(struct sock *sk,
 				    const struct sk_buff *skb)
 {
 	if (unlikely(skb_fclone_busy(sk, skb))) {
-		NET_INC_STATS(sock_net(sk),
-			      LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES);
-		return true;
+		set_bit(TSQ_THROTTLED, &sk->sk_tsq_flags);
+		smp_mb__after_atomic();
+		if (skb_fclone_busy(sk, skb)) {
+			NET_INC_STATS(sock_net(sk),
+				      LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES);
+			return true;
+		}
 	}
 	return false;
 }
-- 
2.31.0.rc2.261.g7f71774620-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next 2/3] tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()
  2021-03-11 20:35 [PATCH net-next 0/3] tcp: better deal with delayed TX completions Eric Dumazet
  2021-03-11 20:35 ` [PATCH net-next 1/3] tcp: plug skb_still_in_host_queue() to TSQ Eric Dumazet
@ 2021-03-11 20:35 ` Eric Dumazet
  2021-03-11 20:35 ` [PATCH net-next 3/3] tcp: remove obsolete check in __tcp_retransmit_skb() Eric Dumazet
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2021-03-11 20:35 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
	Neil Spring

From: Eric Dumazet <edumazet@google.com>

Jakub reported Data included in a Fastopen SYN that had to be
retransmit would have to wait for an RTO if TX completions are slow,
even with prior fix.

This is because tcp_rcv_fastopen_synack() does not use standard
rtx logic, meaning TSQ handler exits early in tcp_tsq_write()
because tp->lost_out == tp->retrans_out

Lets make tcp_rcv_fastopen_synack() use standard rtx logic,
by using tcp_mark_skb_lost() on the skb thats needs to be
sent again.

Not this raised a warning in tcp_fastretrans_alert() during my tests
since we consider the data not being aknowledged
by the receiver does not mean packet was lost on the network.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
 net/ipv4/tcp_input.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 69a545db80d2ead47ffcf2f3819a6d066e95f35d..4cf4dd532d1c65bba417a66ba6b7783491b6380a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2914,7 +2914,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
 	/* D. Check state exit conditions. State can be terminated
 	 *    when high_seq is ACKed. */
 	if (icsk->icsk_ca_state == TCP_CA_Open) {
-		WARN_ON(tp->retrans_out != 0);
+		WARN_ON(tp->retrans_out != 0 && !tp->syn_data);
 		tp->retrans_stamp = 0;
 	} else if (!before(tp->snd_una, tp->high_seq)) {
 		switch (icsk->icsk_ca_state) {
@@ -5994,11 +5994,9 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack,
 			tp->fastopen_client_fail = TFO_SYN_RETRANSMITTED;
 		else
 			tp->fastopen_client_fail = TFO_DATA_NOT_ACKED;
-		skb_rbtree_walk_from(data) {
-			if (__tcp_retransmit_skb(sk, data, 1))
-				break;
-		}
-		tcp_rearm_rto(sk);
+		skb_rbtree_walk_from(data)
+			 tcp_mark_skb_lost(sk, data);
+		tcp_xmit_retransmit_queue(sk);
 		NET_INC_STATS(sock_net(sk),
 				LINUX_MIB_TCPFASTOPENACTIVEFAIL);
 		return true;
-- 
2.31.0.rc2.261.g7f71774620-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next 3/3] tcp: remove obsolete check in __tcp_retransmit_skb()
  2021-03-11 20:35 [PATCH net-next 0/3] tcp: better deal with delayed TX completions Eric Dumazet
  2021-03-11 20:35 ` [PATCH net-next 1/3] tcp: plug skb_still_in_host_queue() to TSQ Eric Dumazet
  2021-03-11 20:35 ` [PATCH net-next 2/3] tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack() Eric Dumazet
@ 2021-03-11 20:35 ` Eric Dumazet
  2021-03-12 18:18 ` [PATCH net-next 0/3] tcp: better deal with delayed TX completions Jakub Kicinski
  2021-03-12 20:40 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2021-03-11 20:35 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
	Neil Spring

From: Eric Dumazet <edumazet@google.com>

TSQ provides a nice way to avoid bufferbloat on individual socket,
including retransmit packets. We can get rid of the old
heuristic:

	/* Do not sent more than we queued. 1/4 is reserved for possible
	 * copying overhead: fragmentation, tunneling, mangling etc.
	 */
	if (refcount_read(&sk->sk_wmem_alloc) >
	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
		  sk->sk_sndbuf))
		return -EAGAIN;

This heuristic was giving false positives according to Jakub,
whenever TX completions are delayed above RTT. (Ack packets
are processed by TCP stack before clones are orphaned/freed)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
 net/ipv4/tcp_output.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0dbf208a4f2f17c630084e87f4a9a2ad0dc24168..bde781f46b41a5dd9eb8db3fb65b45d73e592b4b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3151,14 +3151,6 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 	if (icsk->icsk_mtup.probe_size)
 		icsk->icsk_mtup.probe_size = 0;
 
-	/* Do not sent more than we queued. 1/4 is reserved for possible
-	 * copying overhead: fragmentation, tunneling, mangling etc.
-	 */
-	if (refcount_read(&sk->sk_wmem_alloc) >
-	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
-		  sk->sk_sndbuf))
-		return -EAGAIN;
-
 	if (skb_still_in_host_queue(sk, skb))
 		return -EBUSY;
 
-- 
2.31.0.rc2.261.g7f71774620-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 0/3] tcp: better deal with delayed TX completions
  2021-03-11 20:35 [PATCH net-next 0/3] tcp: better deal with delayed TX completions Eric Dumazet
                   ` (2 preceding siblings ...)
  2021-03-11 20:35 ` [PATCH net-next 3/3] tcp: remove obsolete check in __tcp_retransmit_skb() Eric Dumazet
@ 2021-03-12 18:18 ` Jakub Kicinski
  2021-03-12 19:04   ` Yuchung Cheng
  2021-03-12 20:40 ` patchwork-bot+netdevbpf
  4 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2021-03-12 18:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Eric Dumazet, Neal Cardwell,
	Yuchung Cheng, Neil Spring

On Thu, 11 Mar 2021 12:35:03 -0800 Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Jakub and Neil reported an increase of RTO timers whenever
> TX completions are delayed a bit more (by increasing
> NIC TX coalescing parameters)
> 
> While problems have been there forever, second patch might
> introduce some regressions so I prefer not backport
> them to stable releases before things settle.
> 
> Many thanks to FB team for their help and tests.
> 
> Few packetdrill tests need to be changed to reflect
> the improvements brought by this series.

FWIW I run some workloads with this for a day and looks good:

Tested-by: Jakub Kicinski <kuba@kernel.org>

Thank you!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 0/3] tcp: better deal with delayed TX completions
  2021-03-12 18:18 ` [PATCH net-next 0/3] tcp: better deal with delayed TX completions Jakub Kicinski
@ 2021-03-12 19:04   ` Yuchung Cheng
  2021-03-12 19:09     ` Neal Cardwell
  0 siblings, 1 reply; 8+ messages in thread
From: Yuchung Cheng @ 2021-03-12 19:04 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eric Dumazet, David S . Miller, netdev, Eric Dumazet,
	Neal Cardwell, Neil Spring

On Fri, Mar 12, 2021 at 10:18 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 11 Mar 2021 12:35:03 -0800 Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Jakub and Neil reported an increase of RTO timers whenever
> > TX completions are delayed a bit more (by increasing
> > NIC TX coalescing parameters)
> >
> > While problems have been there forever, second patch might
> > introduce some regressions so I prefer not backport
> > them to stable releases before things settle.
> >
> > Many thanks to FB team for their help and tests.
> >
> > Few packetdrill tests need to be changed to reflect
> > the improvements brought by this series.
>
> FWIW I run some workloads with this for a day and looks good:
>
> Tested-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Yuchung Cheng <ycheng@google.com>
Thank you Eric for fixing the bug.

>
> Thank you!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 0/3] tcp: better deal with delayed TX completions
  2021-03-12 19:04   ` Yuchung Cheng
@ 2021-03-12 19:09     ` Neal Cardwell
  0 siblings, 0 replies; 8+ messages in thread
From: Neal Cardwell @ 2021-03-12 19:09 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Jakub Kicinski, Eric Dumazet, David S . Miller, netdev,
	Eric Dumazet, Neil Spring

On Fri, Mar 12, 2021 at 2:05 PM Yuchung Cheng <ycheng@google.com> wrote:
>
> On Fri, Mar 12, 2021 at 10:18 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Thu, 11 Mar 2021 12:35:03 -0800 Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > >
> > > Jakub and Neil reported an increase of RTO timers whenever
> > > TX completions are delayed a bit more (by increasing
> > > NIC TX coalescing parameters)
> > >
> > > While problems have been there forever, second patch might
> > > introduce some regressions so I prefer not backport
> > > them to stable releases before things settle.
> > >
> > > Many thanks to FB team for their help and tests.
> > >
> > > Few packetdrill tests need to be changed to reflect
> > > the improvements brought by this series.
> >
> > FWIW I run some workloads with this for a day and looks good:
> >
> > Tested-by: Jakub Kicinski <kuba@kernel.org>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> Thank you Eric for fixing the bug.

The series looks good to me as well.

Re this one:
-               WARN_ON(tp->retrans_out != 0);
+               WARN_ON(tp->retrans_out != 0 && !tp->syn_data);

it seems a little unfortunate to lose the power of this WARN_ON for
the lifetime of TFO connections, but I do not have a better idea. :-)

thanks,
neal

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 0/3] tcp: better deal with delayed TX completions
  2021-03-11 20:35 [PATCH net-next 0/3] tcp: better deal with delayed TX completions Eric Dumazet
                   ` (3 preceding siblings ...)
  2021-03-12 18:18 ` [PATCH net-next 0/3] tcp: better deal with delayed TX completions Jakub Kicinski
@ 2021-03-12 20:40 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-03-12 20:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, netdev, edumazet, ncardwell, ycheng, ntspring

Hello:

This series was applied to netdev/net-next.git (refs/heads/master):

On Thu, 11 Mar 2021 12:35:03 -0800 you wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Jakub and Neil reported an increase of RTO timers whenever
> TX completions are delayed a bit more (by increasing
> NIC TX coalescing parameters)
> 
> While problems have been there forever, second patch might
> introduce some regressions so I prefer not backport
> them to stable releases before things settle.
> 
> [...]

Here is the summary with links:
  - [net-next,1/3] tcp: plug skb_still_in_host_queue() to TSQ
    https://git.kernel.org/netdev/net-next/c/f4dae54e486d
  - [net-next,2/3] tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()
    https://git.kernel.org/netdev/net-next/c/a7abf3cd76e1
  - [net-next,3/3] tcp: remove obsolete check in __tcp_retransmit_skb()
    https://git.kernel.org/netdev/net-next/c/ac3959fd0dcc

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-03-12 20:41 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-11 20:35 [PATCH net-next 0/3] tcp: better deal with delayed TX completions Eric Dumazet
2021-03-11 20:35 ` [PATCH net-next 1/3] tcp: plug skb_still_in_host_queue() to TSQ Eric Dumazet
2021-03-11 20:35 ` [PATCH net-next 2/3] tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack() Eric Dumazet
2021-03-11 20:35 ` [PATCH net-next 3/3] tcp: remove obsolete check in __tcp_retransmit_skb() Eric Dumazet
2021-03-12 18:18 ` [PATCH net-next 0/3] tcp: better deal with delayed TX completions Jakub Kicinski
2021-03-12 19:04   ` Yuchung Cheng
2021-03-12 19:09     ` Neal Cardwell
2021-03-12 20:40 ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.