linux-kernel.vger.kernel.org archive mirror
* [PATCH net-next,v3] tcp: Set pingpong threshold via sysctl
@ 2023-10-11 20:30 Haiyang Zhang
  2023-10-16  9:10 ` Simon Horman
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Haiyang Zhang @ 2023-10-11 20:30 UTC
  To: linux-hyperv, netdev
  Cc: haiyangz, kys, davem, edumazet, kuba, pabeni, corbet, dsahern,
	ncardwell, ycheng, kuniyu, morleyd, mfreemon, mubashirq,
	linux-doc, weiwan, linux-kernel

TCP pingpong threshold is 1 by default. But some applications, like SQL DB
may prefer a higher pingpong threshold to activate delayed acks in quick
ack mode for better performance.

The pingpong threshold and related code were changed to 3 in the year
2019 in:
  commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
And reverted to 1 in the year 2022 in:
  commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")

There is no single value that fits all applications.
Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
optimal performance based on the application needs.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
v3: Updated doc as suggested by Neal Cardwell.
    Updated variable location in struct netns_ipv4 as suggested by Kuniyuki
    Iwashima.

v2: Make it a per-namespace setting, and other updates suggested by Neal Cardwell
and Kuniyuki Iwashima.
---
 Documentation/networking/ip-sysctl.rst | 13 +++++++++++++
 include/net/inet_connection_sock.h     | 16 ++++++++++++----
 include/net/netns/ipv4.h               |  2 ++
 net/ipv4/sysctl_net_ipv4.c             |  8 ++++++++
 net/ipv4/tcp_ipv4.c                    |  2 ++
 net/ipv4/tcp_output.c                  |  4 ++--
 6 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index f7dfde3b09a9..e7ec9026e5db 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1183,6 +1183,19 @@ tcp_plb_cong_thresh - INTEGER
 
 	Default: 128
 
+tcp_pingpong_thresh - INTEGER
+	The number of estimated data replies sent for estimated incoming data
+	requests that must happen before TCP considers that a connection is a
+	"ping-pong" (request-response) connection for which delayed
+	acknowledgments can provide benefits.
+
+	This threshold is 1 by default, but some applications may need a higher
+	threshold for optimal performance.
+
+	Possible Values: 1 - 255
+
+	Default: 1
+
 UDP variables
 =============
 
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index d6d9d1c1985a..086d1193c9ef 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -328,11 +328,10 @@ void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
 
 struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);
 
-#define TCP_PINGPONG_THRESH	1
-
 static inline void inet_csk_enter_pingpong_mode(struct sock *sk)
 {
-	inet_csk(sk)->icsk_ack.pingpong = TCP_PINGPONG_THRESH;
+	inet_csk(sk)->icsk_ack.pingpong =
+		READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
 }
 
 static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
@@ -342,7 +341,16 @@ static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
 
 static inline bool inet_csk_in_pingpong_mode(struct sock *sk)
 {
-	return inet_csk(sk)->icsk_ack.pingpong >= TCP_PINGPONG_THRESH;
+	return inet_csk(sk)->icsk_ack.pingpong >=
+	       READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
+}
+
+static inline void inet_csk_inc_pingpong_cnt(struct sock *sk)
+{
+	struct inet_connection_sock *icsk = inet_csk(sk);
+
+	if (icsk->icsk_ack.pingpong < U8_MAX)
+		icsk->icsk_ack.pingpong++;
 }
 
 static inline bool inet_csk_has_ulp(const struct sock *sk)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index d96d05b08819..73f43f699199 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -133,6 +133,8 @@ struct netns_ipv4 {
 	u8 sysctl_tcp_migrate_req;
 	u8 sysctl_tcp_comp_sack_nr;
 	u8 sysctl_tcp_backlog_ack_defer;
+	u8 sysctl_tcp_pingpong_thresh;
+
 	int sysctl_tcp_reordering;
 	u8 sysctl_tcp_retries1;
 	u8 sysctl_tcp_retries2;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index e7f024d93572..f63a545a7374 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -1498,6 +1498,14 @@ static struct ctl_table ipv4_net_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
+	{
+		.procname	= "tcp_pingpong_thresh",
+		.data		= &init_net.ipv4.sysctl_tcp_pingpong_thresh,
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= proc_dou8vec_minmax,
+		.extra1		= SYSCTL_ONE,
+	},
 	{ }
 };
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a441740616d7..f603ad9307af 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3288,6 +3288,8 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.sysctl_tcp_syn_linear_timeouts = 4;
 	net->ipv4.sysctl_tcp_shrink_window = 0;
 
+	net->ipv4.sysctl_tcp_pingpong_thresh = 1;
+
 	return 0;
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f207712eece1..7d0fe76d56ef 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -170,10 +170,10 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
 	tp->lsndtime = now;
 
 	/* If it is a reply for ato after last received
-	 * packet, enter pingpong mode.
+	 * packet, increase pingpong count.
 	 */
 	if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
-		inet_csk_enter_pingpong_mode(sk);
+		inet_csk_inc_pingpong_cnt(sk);
 }
 
 /* Account for an ACK we sent. */
-- 
2.25.1
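
For readers following the counter logic, here is a minimal userspace sketch of
what the helpers in include/net/inet_connection_sock.h do after this patch. It
is not kernel code: struct pp_sock and the pp_* names are hypothetical
stand-ins for the socket state and the per-netns sysctl, and resetting the
counter to 0 on exit is an assumption, since the body of
inet_csk_exit_pingpong_mode() is not shown in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the per-socket and per-netns state. */
struct pp_sock {
	uint8_t pingpong;	/* models icsk_ack.pingpong */
	uint8_t thresh;		/* models sysctl_tcp_pingpong_thresh */
};

/* Models inet_csk_inc_pingpong_cnt(): increment, saturating at 255. */
static inline void pp_inc(struct pp_sock *s)
{
	if (s->pingpong < UINT8_MAX)
		s->pingpong++;
}

/* Models inet_csk_enter_pingpong_mode(): jump straight to the threshold. */
static inline void pp_enter(struct pp_sock *s)
{
	s->pingpong = s->thresh;
}

/* Models inet_csk_exit_pingpong_mode(). Resetting to 0 is an assumption;
 * the function body is not part of this patch.
 */
static inline void pp_exit(struct pp_sock *s)
{
	s->pingpong = 0;
}

/* Models inet_csk_in_pingpong_mode(): counter has reached the threshold. */
static inline bool pp_in_mode(const struct pp_sock *s)
{
	return s->pingpong >= s->thresh;
}
```

With thresh set to 3, a socket in this model only reports pingpong mode after
three calls to pp_inc(), whereas with the default of 1 a single timely reply is
enough.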



* Re: [PATCH net-next,v3] tcp: Set pingpong threshold via sysctl
  2023-10-11 20:30 [PATCH net-next,v3] tcp: Set pingpong threshold via sysctl Haiyang Zhang
@ 2023-10-16  9:10 ` Simon Horman
  2023-10-16 11:40 ` Eric Dumazet
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Simon Horman @ 2023-10-16  9:10 UTC
  To: Haiyang Zhang
  Cc: linux-hyperv, netdev, kys, davem, edumazet, kuba, pabeni, corbet,
	dsahern, ncardwell, ycheng, kuniyu, morleyd, mfreemon, mubashirq,
	linux-doc, weiwan, linux-kernel

On Wed, Oct 11, 2023 at 01:30:44PM -0700, Haiyang Zhang wrote:
> TCP pingpong threshold is 1 by default. But some applications, like SQL DB
> may prefer a higher pingpong threshold to activate delayed acks in quick
> ack mode for better performance.
> 
> The pingpong threshold and related code were changed to 3 in the year
> 2019 in:
>   commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
> And reverted to 1 in the year 2022 in:
>   commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")
> 
> There is no single value that fits all applications.
> Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
> optimal performance based on the application needs.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
> v3: Updated doc as suggested by Neal Cardwell.
>     Updated variable location in struct netns_ipv4 as suggested by Kuniyuki
>     Iwashima.
> 
> v2: Make it a per-namespace setting, and other updates suggested by Neal Cardwell
> and Kuniyuki Iwashima.

Thanks,

this looks clean to me. It seems to address the review of v2
and keeps the knob as a sysctl, as discussed in v2.

Reviewed-by: Simon Horman <horms@kernel.org>
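
As an aside on the accepted range of the knob: the documentation hunk lists
1 - 255, and the ctl_table entry sets only .extra1 = SYSCTL_ONE with
proc_dou8vec_minmax, so the upper bound comes from the u8 type. A rough
userspace sketch of the same range check follows. parse_pingpong_thresh() is a
hypothetical helper for illustration, not something the patch adds, and its
string handling follows strtol() rather than the kernel's proc parser.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a candidate tcp_pingpong_thresh value.
 * Returns 0 and stores the value on success, or -1 if the string is not
 * a decimal number (as strtol() parses it) in the range 1..255. A single
 * trailing newline is tolerated, as when reading a line from a file.
 */
static int parse_pingpong_thresh(const char *s, unsigned char *out)
{
	char *end;
	long v;

	if (s == NULL || *s == '\0')
		return -1;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || end == s)
		return -1;

	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;

	/* Lower bound mirrors .extra1 = SYSCTL_ONE; upper bound is U8_MAX. */
	if (v < 1 || v > 255)
		return -1;

	*out = (unsigned char)v;
	return 0;
}
```

In this sketch "0" and "256" are rejected, while "1", "3\n" and "255" are
accepted.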


* Re: [PATCH net-next,v3] tcp: Set pingpong threshold via sysctl
  2023-10-11 20:30 [PATCH net-next,v3] tcp: Set pingpong threshold via sysctl Haiyang Zhang
  2023-10-16  9:10 ` Simon Horman
@ 2023-10-16 11:40 ` Eric Dumazet
  2023-10-16 16:17 ` Neal Cardwell
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2023-10-16 11:40 UTC
  To: Haiyang Zhang
  Cc: linux-hyperv, netdev, kys, davem, kuba, pabeni, corbet, dsahern,
	ncardwell, ycheng, kuniyu, morleyd, mfreemon, mubashirq,
	linux-doc, weiwan, linux-kernel

On Wed, Oct 11, 2023 at 10:31 PM Haiyang Zhang <haiyangz@microsoft.com> wrote:
>
> TCP pingpong threshold is 1 by default. But some applications, like SQL DB
> may prefer a higher pingpong threshold to activate delayed acks in quick
> ack mode for better performance.
>

...

>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index f207712eece1..7d0fe76d56ef 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -170,10 +170,10 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
>         tp->lsndtime = now;
>
>         /* If it is a reply for ato after last received
> -        * packet, enter pingpong mode.
> +        * packet, increase pingpong count.
>          */
>         if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
> -               inet_csk_enter_pingpong_mode(sk);
> +               inet_csk_inc_pingpong_cnt(sk);
>  }
>
>  /* Account for an ACK we sent. */

OK, but I do not think we solved the fundamental problem of using
jiffies for this heuristic,
especially for HZ=100 or HZ=250 builds.

Reviewed-by: Eric Dumazet <edumazet@google.com>
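
One possible reading of the jiffies concern above, sketched in userspace C. The
check in tcp_event_data_sent() compares tick counts, so at low HZ both
timestamps and ato are quantized coarsely. The model below is an illustration
only: jiffies_at() and reply_counts() are hypothetical names, and the 40 ms ato
used in the usage note is an assumed value, not one taken from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: tick counter value at a time given in microseconds. */
static uint32_t jiffies_at(uint64_t t_us, unsigned int hz)
{
	return (uint32_t)(t_us * hz / 1000000u);
}

/* Same shape as the check in tcp_event_data_sent(), with the receive
 * time, the send time and ato all converted to ticks first.
 */
static bool reply_counts(uint64_t rcv_us, uint64_t snd_us,
			 uint64_t ato_us, unsigned int hz)
{
	uint32_t lrcvtime = jiffies_at(rcv_us, hz);
	uint32_t now = jiffies_at(snd_us, hz);
	uint32_t ato = (uint32_t)(ato_us * hz / 1000000u);

	return (uint32_t)(now - lrcvtime) < ato;
}
```

In this model, a request received at 9 ms and answered at 44 ms (35 ms later)
with an assumed ato of 40 ms is counted at HZ=1000 (35 < 40 ticks) but not at
HZ=100, where the timestamps fall in ticks 0 and 4 and ato is 4 ticks. A
similar miss appears at HZ=250 for a request at 3.9 ms answered at 40.1 ms.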


* Re: [PATCH net-next,v3] tcp: Set pingpong threshold via sysctl
  2023-10-11 20:30 [PATCH net-next,v3] tcp: Set pingpong threshold via sysctl Haiyang Zhang
  2023-10-16  9:10 ` Simon Horman
  2023-10-16 11:40 ` Eric Dumazet
@ 2023-10-16 16:17 ` Neal Cardwell
  2023-10-16 16:49 ` Kuniyuki Iwashima
  2023-10-16 22:30 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 6+ messages in thread
From: Neal Cardwell @ 2023-10-16 16:17 UTC
  To: Haiyang Zhang
  Cc: linux-hyperv, netdev, kys, davem, edumazet, kuba, pabeni, corbet,
	dsahern, ycheng, kuniyu, morleyd, mfreemon, mubashirq, linux-doc,
	weiwan, linux-kernel

On Wed, Oct 11, 2023 at 4:31 PM Haiyang Zhang <haiyangz@microsoft.com> wrote:
>
> TCP pingpong threshold is 1 by default. But some applications, like SQL DB
> may prefer a higher pingpong threshold to activate delayed acks in quick
> ack mode for better performance.
>
> The pingpong threshold and related code were changed to 3 in the year
> 2019 in:
>   commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
> And reverted to 1 in the year 2022 in:
>   commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")
>
> There is no single value that fits all applications.
> Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
> optimal performance based on the application needs.
>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
> v3: Updated doc as suggested by Neal Cardwell.
>     Updated variable location in struct netns_ipv4 as suggested by Kuniyuki
>     Iwashima.
>
> v2: Make it a per-namespace setting, and other updates suggested by Neal Cardwell
> and Kuniyuki Iwashima.
> ---
>  Documentation/networking/ip-sysctl.rst | 13 +++++++++++++
>  include/net/inet_connection_sock.h     | 16 ++++++++++++----
>  include/net/netns/ipv4.h               |  2 ++
>  net/ipv4/sysctl_net_ipv4.c             |  8 ++++++++
>  net/ipv4/tcp_ipv4.c                    |  2 ++
>  net/ipv4/tcp_output.c                  |  4 ++--
>  6 files changed, 39 insertions(+), 6 deletions(-)

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks!

neal


* Re: [PATCH net-next,v3] tcp: Set pingpong threshold via sysctl
  2023-10-11 20:30 [PATCH net-next,v3] tcp: Set pingpong threshold via sysctl Haiyang Zhang
                   ` (2 preceding siblings ...)
  2023-10-16 16:17 ` Neal Cardwell
@ 2023-10-16 16:49 ` Kuniyuki Iwashima
  2023-10-16 22:30 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2023-10-16 16:49 UTC
  To: haiyangz
  Cc: corbet, davem, dsahern, edumazet, kuba, kuniyu, kys, linux-doc,
	linux-hyperv, linux-kernel, mfreemon, morleyd, mubashirq,
	ncardwell, netdev, pabeni, weiwan, ycheng

From: Haiyang Zhang <haiyangz@microsoft.com>
Date: Wed, 11 Oct 2023 13:30:44 -0700
> TCP pingpong threshold is 1 by default. But some applications, like SQL DB
> may prefer a higher pingpong threshold to activate delayed acks in quick
> ack mode for better performance.
> 
> The pingpong threshold and related code were changed to 3 in the year
> 2019 in:
>   commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
> And reverted to 1 in the year 2022 in:
>   commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")
> 
> There is no single value that fits all applications.
> Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
> optimal performance based on the application needs.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

Thanks!


* Re: [PATCH net-next,v3] tcp: Set pingpong threshold via sysctl
  2023-10-11 20:30 [PATCH net-next,v3] tcp: Set pingpong threshold via sysctl Haiyang Zhang
                   ` (3 preceding siblings ...)
  2023-10-16 16:49 ` Kuniyuki Iwashima
@ 2023-10-16 22:30 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-10-16 22:30 UTC
  To: Haiyang Zhang
  Cc: linux-hyperv, netdev, kys, davem, edumazet, kuba, pabeni, corbet,
	dsahern, ncardwell, ycheng, kuniyu, morleyd, mfreemon, mubashirq,
	linux-doc, weiwan, linux-kernel

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 11 Oct 2023 13:30:44 -0700 you wrote:
> TCP pingpong threshold is 1 by default. But some applications, like SQL DB
> may prefer a higher pingpong threshold to activate delayed acks in quick
> ack mode for better performance.
> 
> The pingpong threshold and related code were changed to 3 in the year
> 2019 in:
>   commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
> And reverted to 1 in the year 2022 in:
>   commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")
> 
> [...]

Here is the summary with links:
  - [net-next,v3] tcp: Set pingpong threshold via sysctl
    https://git.kernel.org/netdev/net-next/c/562b1fdf061b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


