All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] tcp: allow user and drivers to tweak TSQ logic through sysctl knobs
@ 2018-01-05 11:32 Natale Patriciello
  2018-01-05 11:32 ` [RFC PATCH 1/3] Revert "tcp: allow drivers to tweak TSQ logic" Natale Patriciello
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Natale Patriciello @ 2018-01-05 11:32 UTC (permalink / raw)
  To: netdev
  Cc: Natale Patriciello, Eric Dumazet,
	Toke Høiland-Jørgensen, Carlo Augusto Grazia

With the commit 3a9b76fd0db9 ("tcp: allow drivers to tweak TSQ logic") a
network device driver can tune the number of bytes that TCP can queue in
the qdisc/devices. The tuning is done through changing the log scale
used to derive the TSQ credit from the current pacing rate (given in
B/s). By default, it is 10 (equivalent to 1/1024 of a second), but
drivers can select smaller values (from 1/512 to 1/64 of a second) that
reflects in a higher number of bytes that can be enqueued.

The approach has three inherent problems:
0. The values that can be chosen are on a logarithmic scale;
1. It assumes that drivers maintainer will test different values and
   select the best one, putting it in the stone of the driver source code;
2. It prevents end users unable to recompile the kernel from
   experimenting other values.

The users that rely on out-of-tree drivers should have a good knowledge
of the WiFi ecosystem, and be able to patch their driver, to benefit
from the approach: not everyone is a C expert.

This series reverts the above approach and adds sysctl knobs for the
tuning. Firstly, we added the option to disable the TSQ with the magic
value -1 on tcp_limit_output_bytes. Then, we added two sysctl knobs to
allow tuning the number of packets or the ms value that can be enqueued.
So, the user (or the drivers, assuming they can modify sysctl values)
can easily use other values without recompiling the kernel.

A driver that right now has chosen to use a value of 64 for the
shifting, with this patch should put in the sysctl_tcp_limit_output_ms
variable a value of 16 (which corresponds to ~16 ms of bytes that can be
enqueued) to obtain the same effect. The advantage of our series is that
the driver developer (or the end user) can also test (without
recompiling the kernel) values from 9 to 15 ms, which with the current
kernel approach cannot be tested.

Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1670041
Cc: Eric Dumazet <edumazet@google.com>
Cc: Toke Høiland-Jørgensen <toke@toke.dk>
Cc: Carlo Augusto Grazia <carloaugusto.grazia@unimore.it>

Natale Patriciello (3):
  Revert "tcp: allow drivers to tweak TSQ logic"
  tcp: Negative values of sysctl_tcp_limit_output_bytes disable TSQ
  tcp: Add tunable parameters for TSQ

 Documentation/networking/ip-sysctl.txt | 22 ++++++++++++++++++++++
 include/net/netns/ipv4.h               |  2 ++
 include/net/sock.h                     |  2 --
 net/core/sock.c                        |  1 -
 net/ipv4/sysctl_net_ipv4.c             | 14 ++++++++++++++
 net/ipv4/tcp_output.c                  | 10 ++++++++--
 6 files changed, 46 insertions(+), 5 deletions(-)

-- 
2.15.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC PATCH 1/3] Revert "tcp: allow drivers to tweak TSQ logic"
  2018-01-05 11:32 [RFC PATCH 0/3] tcp: allow user and drivers to tweak TSQ logic through sysctl knobs Natale Patriciello
@ 2018-01-05 11:32 ` Natale Patriciello
  2018-01-05 11:44   ` Eric Dumazet
  2018-01-05 11:32 ` [RFC PATCH 2/3] tcp: Negative values of sysctl_tcp_limit_output_bytes disable TSQ Natale Patriciello
  2018-01-05 11:32 ` [RFC PATCH 3/3] tcp: Add tunable parameters for TSQ Natale Patriciello
  2 siblings, 1 reply; 10+ messages in thread
From: Natale Patriciello @ 2018-01-05 11:32 UTC (permalink / raw)
  To: netdev; +Cc: Natale Patriciello, Carlo Augusto Grazia, Eric Dumazet

There are a lot of drivers that are still outside the mainline, and it is
impossible to expect the authors (which in most cases do not maintain the
driver anymore) to check the best value for the device, and then to insert
it into the driver code. Moreover, if the users want to check or to use
other values, they have to recompile the driver (or the kernel).

This reverts commit 3a9b76fd0db9f0d426533f96a68a62a58753a51e, preparing the
field for tunable sysctl knobs.

Signed-off-by: Natale Patriciello <natale.patriciello@gmail.com>
Signed-off-by: Carlo Augusto Grazia <carloaugusto.grazia@unimore.it>
Cc: Eric Dumazet <edumazet@google.com>

---
 include/net/sock.h    | 2 --
 net/core/sock.c       | 1 -
 net/ipv4/tcp_output.c | 4 ++--
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7a7b14e9628a..20c6a9ae25f4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -267,7 +267,6 @@ struct sock_common {
   *	@sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
   *	@sk_gso_max_size: Maximum GSO segment size to build
   *	@sk_gso_max_segs: Maximum number of GSO segments
-  *	@sk_pacing_shift: scaling factor for TCP Small Queues
   *	@sk_lingertime: %SO_LINGER l_linger setting
   *	@sk_backlog: always used with the per-socket spinlock held
   *	@sk_callback_lock: used with the callbacks in the end of this struct
@@ -449,7 +448,6 @@ struct sock {
 				sk_type      : 16;
 #define SK_PROTOCOL_MAX U8_MAX
 	u16			sk_gso_max_segs;
-	u8			sk_pacing_shift;
 	unsigned long	        sk_lingertime;
 	struct proto		*sk_prot_creator;
 	rwlock_t		sk_callback_lock;
diff --git a/net/core/sock.c b/net/core/sock.c
index c0b5b2f17412..78e16e06b0b7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2744,7 +2744,6 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 
 	sk->sk_max_pacing_rate = ~0U;
 	sk->sk_pacing_rate = ~0U;
-	sk->sk_pacing_shift = 10;
 	sk->sk_incoming_cpu = -1;
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a4d214c7b506..ef1ae727320f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1720,7 +1720,7 @@ u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 {
 	u32 bytes, segs;
 
-	bytes = min(sk->sk_pacing_rate >> sk->sk_pacing_shift,
+	bytes = min(sk->sk_pacing_rate >> 10,
 		    sk->sk_gso_max_size - 1 - MAX_TCP_HEADER);
 
 	/* Goal is to send at least one packet per ms,
@@ -2198,7 +2198,7 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 {
 	unsigned int limit;
 
-	limit = max(2 * skb->truesize, sk->sk_pacing_rate >> sk->sk_pacing_shift);
+	limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
 	limit = min_t(u32, limit,
 		      sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes);
 	limit <<= factor;
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 2/3] tcp: Negative values of sysctl_tcp_limit_output_bytes disable TSQ
  2018-01-05 11:32 [RFC PATCH 0/3] tcp: allow user and drivers to tweak TSQ logic through sysctl knobs Natale Patriciello
  2018-01-05 11:32 ` [RFC PATCH 1/3] Revert "tcp: allow drivers to tweak TSQ logic" Natale Patriciello
@ 2018-01-05 11:32 ` Natale Patriciello
  2018-01-05 11:43   ` Eric Dumazet
  2018-01-05 11:32 ` [RFC PATCH 3/3] tcp: Add tunable parameters for TSQ Natale Patriciello
  2 siblings, 1 reply; 10+ messages in thread
From: Natale Patriciello @ 2018-01-05 11:32 UTC (permalink / raw)
  To: netdev; +Cc: Natale Patriciello, Eric Dumazet, Carlo Augusto Grazia

This commit adds the possibility for a user to disable the TSQ by using an
existing sysctl knob. This feature existed until it was removed by the
commit c9eeec26e32e ("tcp: TSQ can use a dynamic limit").

Fixes: c9eeec26e32e ("tcp: TSQ can use a dynamic limit")

Signed-off-by: Natale Patriciello <natale.patriciello@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Carlo Augusto Grazia <carloaugusto.grazia@unimore.it>
Tested-by: Carlo Augusto Grazia <carloaugusto.grazia@unimore.it>
---
 Documentation/networking/ip-sysctl.txt | 1 +
 net/ipv4/tcp_output.c                  | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 46c7e1085efc..3b530fe8a494 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -721,6 +721,7 @@ tcp_limit_output_bytes - INTEGER
 	typical pfifo_fast qdiscs.
 	tcp_limit_output_bytes limits the number of bytes on qdisc
 	or device to reduce artificial RTT/cwnd and reduce bufferbloat.
+	Set to -1 to disable.
 	Default: 262144
 
 tcp_challenge_ack_limit - INTEGER
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ef1ae727320f..997a6fbdbe1a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2198,6 +2198,9 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 {
 	unsigned int limit;
 
+	if (sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes < 0)
+		return false;
+
 	limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
 	limit = min_t(u32, limit,
 		      sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 3/3] tcp: Add tunable parameters for TSQ
  2018-01-05 11:32 [RFC PATCH 0/3] tcp: allow user and drivers to tweak TSQ logic through sysctl knobs Natale Patriciello
  2018-01-05 11:32 ` [RFC PATCH 1/3] Revert "tcp: allow drivers to tweak TSQ logic" Natale Patriciello
  2018-01-05 11:32 ` [RFC PATCH 2/3] tcp: Negative values of sysctl_tcp_limit_output_bytes disable TSQ Natale Patriciello
@ 2018-01-05 11:32 ` Natale Patriciello
  2018-01-05 11:53   ` Eric Dumazet
  2 siblings, 1 reply; 10+ messages in thread
From: Natale Patriciello @ 2018-01-05 11:32 UTC (permalink / raw)
  To: netdev; +Cc: Natale Patriciello, Carlo Augusto Grazia

The original TSQ algorithm limits the number of packets in qdisc/devices to
two packets / or ~1 ms. With this commit, two sysctl knobs are added to
allow tuning the number of packets or the ms value.

Signed-off-by: Natale Patriciello <natale.patriciello@gmail.com>
Cc: Carlo Augusto Grazia <carloaugusto.grazia@unimore.it>
Tested-by: Carlo Augusto Grazia <carloaugusto.grazia@unimore.it>
---
 Documentation/networking/ip-sysctl.txt | 23 ++++++++++++++++++++++-
 include/net/netns/ipv4.h               |  2 ++
 net/ipv4/sysctl_net_ipv4.c             | 14 ++++++++++++++
 net/ipv4/tcp_output.c                  |  5 ++++-
 4 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 3b530fe8a494..2510ef885746 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -721,9 +721,30 @@ tcp_limit_output_bytes - INTEGER
 	typical pfifo_fast qdiscs.
 	tcp_limit_output_bytes limits the number of bytes on qdisc
 	or device to reduce artificial RTT/cwnd and reduce bufferbloat.
-	Set to -1 to disable.
+	The overall limit is given by the following (rate is in B/ms):
+	limit = min(output_bytes, max(output_pkt * mss, output_ms * rate)
+	Set to -1 to unconditionally disable TSQ, regardless of the
+	values of tcp_limit_output_ms and tcp_limit_output_pkt.
 	Default: 262144
 
+tcp_limit_output_ms - UNSIGNED INTEGER
+	Controls TCP Small Queue limit per TCP socket, under a time point
+	of view. Given a transmission rate, limit the bytes on qdisc or
+	device to a value that can be transmitted approximately in the
+	time provided in this parameter at the given rate. This limit
+	is doubled for retransmissions. The overall limit is given by
+	the following (rate is in B/ms):
+	limit = min(output_bytes, max(output_pkt * mss, output_ms * rate)
+	Default: 1
+
+tcp_limit_output_pkt - UNSIGNED INTEGER
+	Controls TCP Small Queue limit per tcp socket.
+	tcp_limit_output_pkt limits the number of packets queued in
+	qdisc/device. This limit is doubled for retransmissions.
+	The overall limit is given by the following (rate is in B/ms):
+	limit = min(output_bytes, max(output_pkt * mss, output_ms * rate)
+	Default: 2
+
 tcp_challenge_ack_limit - INTEGER
 	Limits number of Challenge ACK sent per second, as recommended
 	in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 44668c29701a..e2c06827d0bb 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -148,6 +148,8 @@ struct netns_ipv4 {
 	int sysctl_tcp_tso_win_divisor;
 	int sysctl_tcp_workaround_signed_windows;
 	int sysctl_tcp_limit_output_bytes;
+	unsigned int sysctl_tcp_limit_output_ms;
+	unsigned int sysctl_tcp_limit_output_pkt;
 	int sysctl_tcp_challenge_ack_limit;
 	int sysctl_tcp_min_tso_segs;
 	int sysctl_tcp_min_rtt_wlen;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 93e172118a94..775a4d079a9b 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -1094,6 +1094,20 @@ static struct ctl_table ipv4_net_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "tcp_limit_output_ms",
+		.data		= &init_net.ipv4.sysctl_tcp_limit_output_ms,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec
+	},
+	{
+		.procname	= "tcp_limit_output_pkt",
+		.data		= &init_net.ipv4.sysctl_tcp_limit_output_pkt,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec
+	},
 	{
 		.procname	= "tcp_challenge_ack_limit",
 		.data		= &init_net.ipv4.sysctl_tcp_challenge_ack_limit,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 997a6fbdbe1a..eae715c4a005 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2201,7 +2201,10 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 	if (sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes < 0)
 		return false;
 
-	limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
+	limit = sock_net(sk)->ipv4.sysctl_tcp_limit_output_ms *
+		(sk->sk_pacing_rate >> 10);
+	limit = max(sock_net(sk)->ipv4.sysctl_tcp_limit_output_pkt * skb->truesize,
+		    limit);
 	limit = min_t(u32, limit,
 		      sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes);
 	limit <<= factor;
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/3] tcp: Negative values of sysctl_tcp_limit_output_bytes disable TSQ
  2018-01-05 11:32 ` [RFC PATCH 2/3] tcp: Negative values of sysctl_tcp_limit_output_bytes disable TSQ Natale Patriciello
@ 2018-01-05 11:43   ` Eric Dumazet
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2018-01-05 11:43 UTC (permalink / raw)
  To: Natale Patriciello; +Cc: netdev, Carlo Augusto Grazia

On Fri, Jan 5, 2018 at 3:32 AM, Natale Patriciello
<natale.patriciello@gmail.com> wrote:
> This commit adds the possibility for a user to disable the TSQ by using an
> existing sysctl knob. This feature existed until it was removed by the
> commit c9eeec26e32e ("tcp: TSQ can use a dynamic limit").
>
> Fixes: c9eeec26e32e ("tcp: TSQ can use a dynamic limit")
>
> Signed-off-by: Natale Patriciello <natale.patriciello@gmail.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Carlo Augusto Grazia <carloaugusto.grazia@unimore.it>
> Tested-by: Carlo Augusto Grazia <carloaugusto.grazia@unimore.it>
> ---
>  Documentation/networking/ip-sysctl.txt | 1 +
>  net/ipv4/tcp_output.c                  | 3 +++
>  2 files changed, 4 insertions(+)
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index 46c7e1085efc..3b530fe8a494 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -721,6 +721,7 @@ tcp_limit_output_bytes - INTEGER
>         typical pfifo_fast qdiscs.
>         tcp_limit_output_bytes limits the number of bytes on qdisc
>         or device to reduce artificial RTT/cwnd and reduce bufferbloat.
> +       Set to -1 to disable.
>         Default: 262144
>
>  tcp_challenge_ack_limit - INTEGER
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index ef1ae727320f..997a6fbdbe1a 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2198,6 +2198,9 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
>  {
>         unsigned int limit;
>
> +       if (sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes < 0)
> +               return false;
> +
>

NACK

I do not want to add yet another condition in fast path.

Just put an arbitrary large value in the existing sysctl, no need for
extra code.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 1/3] Revert "tcp: allow drivers to tweak TSQ logic"
  2018-01-05 11:32 ` [RFC PATCH 1/3] Revert "tcp: allow drivers to tweak TSQ logic" Natale Patriciello
@ 2018-01-05 11:44   ` Eric Dumazet
  2018-01-05 15:26     ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2018-01-05 11:44 UTC (permalink / raw)
  To: Natale Patriciello; +Cc: netdev, Carlo Augusto Grazia

On Fri, Jan 5, 2018 at 3:32 AM, Natale Patriciello
<natale.patriciello@gmail.com> wrote:
> There are a lot of drivers that are still outside the mainline, and it is
> impossible to expect the authors (which in most cases do not maintain the
> driver anymore) to check the best value for the device, and then to insert
> it into the driver code. Moreover, if the users want to check or to use
> other values, they have to recompile the driver (or the kernel).
>

This is nonsense

I do not care of out of tree drivers.

NACK.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 3/3] tcp: Add tunable parameters for TSQ
  2018-01-05 11:32 ` [RFC PATCH 3/3] tcp: Add tunable parameters for TSQ Natale Patriciello
@ 2018-01-05 11:53   ` Eric Dumazet
  2018-01-06 19:22     ` Natale Patriciello
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2018-01-05 11:53 UTC (permalink / raw)
  To: Natale Patriciello, netdev; +Cc: Carlo Augusto Grazia

On Fri, 2018-01-05 at 12:32 +0100, Natale Patriciello wrote:
> The original TSQ algorithm limits the number of packets in qdisc/devices to
> two packets / or ~1 ms. With this commit, two sysctl knobs are added to
> allow tuning the number of packets or the ms value.
> 
> Signed-off-by: Natale Patriciello <natale.patriciello@gmail.com>
> Cc: Carlo Augusto Grazia <carloaugusto.grazia@unimore.it>
> Tested-by: Carlo Augusto Grazia <carloaugusto.grazia@unimore.it>
> ---

NACK

You provide dubious reasons, and no real tests done on various
hardwares.

We do not want sysctls that wont solve the issue you try to address.

Your patch series comes too late.

A linux host can have one 10Gbit NIC and a wifi adapter, they require
different tunings.

This is why we added sk_pacing_shift_update().

If this needs refinement, lets talk.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 1/3] Revert "tcp: allow drivers to tweak TSQ logic"
  2018-01-05 11:44   ` Eric Dumazet
@ 2018-01-05 15:26     ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2018-01-05 15:26 UTC (permalink / raw)
  To: edumazet; +Cc: natale.patriciello, netdev, carloaugusto.grazia

From: Eric Dumazet <edumazet@google.com>
Date: Fri, 5 Jan 2018 03:44:20 -0800

> On Fri, Jan 5, 2018 at 3:32 AM, Natale Patriciello
> <natale.patriciello@gmail.com> wrote:
>> There are a lot of drivers that are still outside the mainline, and it is
>> impossible to expect the authors (which in most cases do not maintain the
>> driver anymore) to check the best value for the device, and then to insert
>> it into the driver code. Moreover, if the users want to check or to use
>> other values, they have to recompile the driver (or the kernel).
>>
> 
> This is nonsense
> 
> I do not care of out of tree drivers.
> 
> NACK.

Agreed, big NACK from me as well.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 3/3] tcp: Add tunable parameters for TSQ
  2018-01-05 11:53   ` Eric Dumazet
@ 2018-01-06 19:22     ` Natale Patriciello
  2018-01-09 19:04       ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Natale Patriciello @ 2018-01-06 19:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, carloaugusto.grazia

Thank you, Eric and David, for the time spent in reviewing our work.
Some comments inline:

On 05/01/18 at 03:53am, Eric Dumazet wrote:

> I do not want to add yet another condition in fast path.
> Just put an arbitrary large value in the existing sysctl, no need for
> extra code.

Due to the minimum statement at the line 2202, the algorithm will ignore
the arbitrarily large value and will use ~1 ms of data at the current
rate or 2 segments instead. Therefore, right now there is not the
possibility to completely disable TSQ, while there was in the first
version of it.

> You provide dubious reasons, and no real tests done on various
> hardwares.

We did perform some test internally on a 4.13 kernel for an academic
submission. By varying the parameters, we were able to double the
throughput reachable by any congestion {avoidance, control} algorithm on
top of 2.4GHz networks with a channel of 40 MHz, and to reduce latency
(maybe there is some kind of data waiting that is done at
driver/firmware/hardware level). Then we saw the patch, and we became
aware of the community interest in the topic and decided to ask for
feedback on a revised version.

We will for sure increase the number of test cases (including CPU usage)
and report as soon as the academic world allows us. We are happily using
Flent for the testing phase.

> A linux host can have one 10Gbit NIC and a wifi adapter, they require
> different tunings.

This is an excellent example that we did not consider while developing
the patch. Thanks.

> This is why we added sk_pacing_shift_update(). If this needs
> refinement, lets talk.

We believe that it is fundamental to give the user the runtime control
of the algorithm, which right now starts with latency-saving default
values but is tailored for a specific kind of network. As you pointed
out, these should be tuned per-interface. In the following days, we will
perform more focused testing, trying to assess if there are cases in
which a fine-tuning is preferable to a logarithmic one.

Meanwhile, what would be the best way to expose sk_pacing_shift to the
userspace?

Thank you again

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 3/3] tcp: Add tunable parameters for TSQ
  2018-01-06 19:22     ` Natale Patriciello
@ 2018-01-09 19:04       ` Eric Dumazet
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2018-01-09 19:04 UTC (permalink / raw)
  To: Natale Patriciello; +Cc: netdev, carloaugusto.grazia

On Sat, 2018-01-06 at 20:22 +0100, Natale Patriciello wrote:
> Thank you, Eric and David, for the time spent in reviewing our work.
> Some comments inline:
> 
> On 05/01/18 at 03:53am, Eric Dumazet wrote:
> 
> > I do not want to add yet another condition in fast path.
> > Just put an arbitrary large value in the existing sysctl, no need for
> > extra code.
> 
> Due to the minimum statement at the line 2202, the algorithm will ignore
> the arbitrarily large value and will use ~1 ms of data at the current
> rate or 2 segments instead. Therefore, right now there is not the
> possibility to completely disable TSQ, while there was in the first
> version of it.

We do not want to disable TSQ, or even allow someone to disable it.

limit = max(2 * skb->truesize,
            sk->sk_pacing_rate >> sk->sk_pacing_shift);
limit = min_t(u32, limit,
             sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes);

Meaning that if you set sk_pacing_shift to 0, you allow to fill the NIC
with 1 second of TCP data per flow. Which is insanely high.

> 
> > You provide dubious reasons, and no real tests done on various
> > hardwares.
> 
> We did perform some test internally on a 4.13 kernel for an academic
> submission. By varying the parameters, we were able to double the
> throughput reachable by any congestion {avoidance, control} algorithm on
> top of 2.4GHz networks with a channel of 40 MHz, and to reduce latency
> (maybe there is some kind of data waiting that is done at
> driver/firmware/hardware level).

Right, we know that bufferbloat helps to get high throughput on a
single flow in a controlled environment.


>  Then we saw the patch, and we became
> aware of the community interest in the topic and decided to ask for
> feedback on a revised version.

No driver uses yet the infra, yet you want to revise it ?
This makes no sense.

> 
> We will for sure increase the number of test cases (including CPU usage)
> and report as soon as the academic world allows us. We are happily using
> Flent for the testing phase.
> 
> > A linux host can have one 10Gbit NIC and a wifi adapter, they require
> > different tunings.
> 
> This is an excellent example that we did not consider while developing
> the patch. Thanks.
> 
> > This is why we added sk_pacing_shift_update(). If this needs
> > refinement, lets talk.
> 
> We believe that it is fundamental to give the user the runtime control
> of the algorithm, which right now starts with latency-saving default
> values but is tailored for a specific kind of network. As you pointed
> out, these should be tuned per-interface. In the following days, we will
> perform more focused testing, trying to assess if there are cases in
> which a fine-tuning is preferable to a logarithmic one.

fine-tuning looks overkill here, especially if you need an extra divide
or more socket space...

> 
> Meanwhile, what would be the best way to expose sk_pacing_shift to the
> userspace?
> 

sk_pacing_shift is not yet part of the ABI. We do not know if it will
be enough or not.

There is no point exposing it, unless we want to keep it for the years
to come (ABI is in stone) and believe user should get it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-01-09 19:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-05 11:32 [RFC PATCH 0/3] tcp: allow user and drivers to tweak TSQ logic through sysctl knobs Natale Patriciello
2018-01-05 11:32 ` [RFC PATCH 1/3] Revert "tcp: allow drivers to tweak TSQ logic" Natale Patriciello
2018-01-05 11:44   ` Eric Dumazet
2018-01-05 15:26     ` David Miller
2018-01-05 11:32 ` [RFC PATCH 2/3] tcp: Negative values of sysctl_tcp_limit_output_bytes disable TSQ Natale Patriciello
2018-01-05 11:43   ` Eric Dumazet
2018-01-05 11:32 ` [RFC PATCH 3/3] tcp: Add tunable parameters for TSQ Natale Patriciello
2018-01-05 11:53   ` Eric Dumazet
2018-01-06 19:22     ` Natale Patriciello
2018-01-09 19:04       ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.