linux-s390.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] Separate SMC parameter settings from TCP sysctls
@ 2022-09-20  2:45 Wen Gu
  2022-09-20  2:53 ` [PATCH net-next 1/2] net/smc: Introduce a specific sysctl for TEST_LINK time Wen Gu
  2022-09-20  2:54 ` [PATCH net-next 2/2] net/smc: Unbind r/w buffer size from clcsock and make them tunable Wen Gu
  0 siblings, 2 replies; 8+ messages in thread
From: Wen Gu @ 2022-09-20  2:45 UTC (permalink / raw)
  To: kgraul, wenjia, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel

SMC shares some sysctls with TCP, but considering the difference
between these two protocols, it may not be very suitable for SMC
to reuse TCP parameter settings in some cases, such as keepalive
time or buffer size.

So this patch set aims to introduce some SMC specific sysctls to
independently and flexibly set the parameters that suit SMC.

Tony Lu (1):
  net/smc: Unbind r/w buffer size from clcsock and make them tunable

Wen Gu (1):
  net/smc: Introduce a specific sysctl for TEST_LINK time

 Documentation/networking/smc-sysctl.rst | 25 ++++++++++++++++++++++++
 include/net/netns/smc.h                 |  3 +++
 net/smc/af_smc.c                        |  5 ++---
 net/smc/smc_core.c                      |  8 ++++----
 net/smc/smc_llc.c                       |  2 +-
 net/smc/smc_llc.h                       |  1 +
 net/smc/smc_sysctl.c                    | 34 +++++++++++++++++++++++++++++++++
 7 files changed, 70 insertions(+), 8 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net-next 1/2] net/smc: Introduce a specific sysctl for TEST_LINK time
  2022-09-20  2:45 [PATCH net-next 0/2] Separate SMC parameter settings from TCP sysctls Wen Gu
@ 2022-09-20  2:53 ` Wen Gu
  2022-09-20  2:57   ` Tony Lu
  2022-09-20  4:55   ` dust.li
  2022-09-20  2:54 ` [PATCH net-next 2/2] net/smc: Unbind r/w buffer size from clcsock and make them tunable Wen Gu
  1 sibling, 2 replies; 8+ messages in thread
From: Wen Gu @ 2022-09-20  2:53 UTC (permalink / raw)
  To: kgraul, wenjia, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel

SMC-R tests the viability of link by sending out TEST_LINK LLC
messages over RoCE fabric when connections on link have been
idle for a time longer than keepalive interval (testlink time).

But using tcp_keepalive_time as testlink time maybe not quite
suitable because it is default no less than two hours[1], which
is too long for single link to find peer dead. The active host
will still use peer-dead link (QP) sending messages, and can't
find out until get IB_WC_RETRY_EXC_ERR error CQEs, which takes
more time than TEST_LINK timeout (SMC_LLC_WAIT_TIME) normally.

So this patch introduces a independent sysctl for SMC-R to set
link keepalive time, in order to detect link down in time. The
default value is 30 seconds.

[1] https://www.rfc-editor.org/rfc/rfc1122#page-101

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 Documentation/networking/smc-sysctl.rst |  7 +++++++
 include/net/netns/smc.h                 |  1 +
 net/smc/smc_llc.c                       |  2 +-
 net/smc/smc_llc.h                       |  1 +
 net/smc/smc_sysctl.c                    | 14 ++++++++++++++
 5 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
index 742e90e..f8c3d59 100644
--- a/Documentation/networking/smc-sysctl.rst
+++ b/Documentation/networking/smc-sysctl.rst
@@ -34,3 +34,10 @@ smcr_buf_type - INTEGER
         - 1 - Use virtually contiguous buffers
         - 2 - Mixed use of the two types. Try physically contiguous buffers first.
           If not available, use virtually contiguous buffers then.
+
+smcr_testlink_time - INTEGER
+	How frequently SMC-R link sends out TEST_LINK LLC messages to confirm
+	viability, after the last activity of connections on it. The maximum
+	value is (INT_MAX / HZ) seconds, the minimum value is 1 second.
+
+	Default: 30 seconds.
diff --git a/include/net/netns/smc.h b/include/net/netns/smc.h
index 2adbe2b..d295e2c 100644
--- a/include/net/netns/smc.h
+++ b/include/net/netns/smc.h
@@ -19,5 +19,6 @@ struct netns_smc {
 #endif
 	unsigned int			sysctl_autocorking_size;
 	unsigned int			sysctl_smcr_buf_type;
+	int				sysctl_smcr_testlink_time;
 };
 #endif
diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index 175026a..388bd2e 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -2127,7 +2127,7 @@ void smc_llc_lgr_init(struct smc_link_group *lgr, struct smc_sock *smc)
 	init_waitqueue_head(&lgr->llc_flow_waiter);
 	init_waitqueue_head(&lgr->llc_msg_waiter);
 	mutex_init(&lgr->llc_conf_mutex);
-	lgr->llc_testlink_time = READ_ONCE(net->ipv4.sysctl_tcp_keepalive_time);
+	lgr->llc_testlink_time = READ_ONCE(net->smc.sysctl_smcr_testlink_time) * HZ;
 }
 
 /* called after lgr was removed from lgr_list */
diff --git a/net/smc/smc_llc.h b/net/smc/smc_llc.h
index 4404e52..1de9a29 100644
--- a/net/smc/smc_llc.h
+++ b/net/smc/smc_llc.h
@@ -19,6 +19,7 @@
 
 #define SMC_LLC_WAIT_FIRST_TIME		(5 * HZ)
 #define SMC_LLC_WAIT_TIME		(2 * HZ)
+#define SMC_LLC_TESTLINK_DEFAULT_TIME	30
 
 enum smc_llc_reqresp {
 	SMC_LLC_REQ,
diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
index 0613868..7f68520 100644
--- a/net/smc/smc_sysctl.c
+++ b/net/smc/smc_sysctl.c
@@ -16,8 +16,12 @@
 
 #include "smc.h"
 #include "smc_core.h"
+#include "smc_llc.h"
 #include "smc_sysctl.h"
 
+static int smcr_testlink_time_min = 1;
+static int smcr_testlink_time_max = (INT_MAX / HZ);
+
 static struct ctl_table smc_table[] = {
 	{
 		.procname       = "autocorking_size",
@@ -35,6 +39,15 @@
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_TWO,
 	},
+	{
+		.procname	= "smcr_testlink_time",
+		.data		= &init_net.smc.sysctl_smcr_testlink_time,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &smcr_testlink_time_min,
+		.extra2		= &smcr_testlink_time_max,
+	},
 	{  }
 };
 
@@ -60,6 +73,7 @@ int __net_init smc_sysctl_net_init(struct net *net)
 
 	net->smc.sysctl_autocorking_size = SMC_AUTOCORKING_DEFAULT_SIZE;
 	net->smc.sysctl_smcr_buf_type = SMCR_PHYS_CONT_BUFS;
+	net->smc.sysctl_smcr_testlink_time = SMC_LLC_TESTLINK_DEFAULT_TIME;
 
 	return 0;
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next 2/2] net/smc: Unbind r/w buffer size from clcsock and make them tunable
  2022-09-20  2:45 [PATCH net-next 0/2] Separate SMC parameter settings from TCP sysctls Wen Gu
  2022-09-20  2:53 ` [PATCH net-next 1/2] net/smc: Introduce a specific sysctl for TEST_LINK time Wen Gu
@ 2022-09-20  2:54 ` Wen Gu
  1 sibling, 0 replies; 8+ messages in thread
From: Wen Gu @ 2022-09-20  2:54 UTC (permalink / raw)
  To: kgraul, wenjia, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel

From: Tony Lu <tonylu@linux.alibaba.com>

Currently, SMC uses smc->sk.sk_{rcv|snd}buf to create buffers for
send buffer and RMB. And the values of buffer size are from tcp_{w|r}mem
in clcsock.

The buffer size from TCP socket doesn't fit SMC well. Generally, buffers
are usually larger than TCP for SMC-R/-D to get higher performance, for
they are different underlay devices and paths.

So this patch unbinds buffer size from TCP, and introduces two sysctl
knobs to tune them independently. Also, these knobs are per net
namespace and work for containers.

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 Documentation/networking/smc-sysctl.rst | 18 ++++++++++++++++++
 include/net/netns/smc.h                 |  2 ++
 net/smc/af_smc.c                        |  5 ++---
 net/smc/smc_core.c                      |  8 ++++----
 net/smc/smc_sysctl.c                    | 20 ++++++++++++++++++++
 5 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
index f8c3d59..2e45bbe 100644
--- a/Documentation/networking/smc-sysctl.rst
+++ b/Documentation/networking/smc-sysctl.rst
@@ -41,3 +41,21 @@ smcr_testlink_time - INTEGER
 	value is (INT_MAX / HZ) seconds, the minimum value is 1 second.
 
 	Default: 30 seconds.
+
+wmem - INTEGER
+	Initial size of send buffer used by SMC sockets.
+	The default value inherits from net.ipv4.tcp_wmem[1].
+
+	The minimum value is 16KiB and there is no hard limit for max value, but
+	only allowed 512KiB for SMC-R and 1MiB for SMC-D.
+
+	Default: 16K
+
+rmem - INTEGER
+	Initial size of receive buffer (RMB) used by SMC sockets.
+	The default value inherits from net.ipv4.tcp_rmem[1].
+
+	The minimum value is 16KiB and there is no hard limit for max value, but
+	only allowed 512KiB for SMC-R and 1MiB for SMC-D.
+
+	Default: 128K
diff --git a/include/net/netns/smc.h b/include/net/netns/smc.h
index d295e2c..582212a 100644
--- a/include/net/netns/smc.h
+++ b/include/net/netns/smc.h
@@ -20,5 +20,7 @@ struct netns_smc {
 	unsigned int			sysctl_autocorking_size;
 	unsigned int			sysctl_smcr_buf_type;
 	int				sysctl_smcr_testlink_time;
+	int				sysctl_wmem;
+	int				sysctl_rmem;
 };
 #endif
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 0939cc3..e44ca70 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -379,6 +379,8 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
 	sk->sk_state = SMC_INIT;
 	sk->sk_destruct = smc_destruct;
 	sk->sk_protocol = protocol;
+	WRITE_ONCE(sk->sk_sndbuf, READ_ONCE(net->smc.sysctl_wmem));
+	WRITE_ONCE(sk->sk_rcvbuf, READ_ONCE(net->smc.sysctl_rmem));
 	smc = smc_sk(sk);
 	INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work);
 	INIT_WORK(&smc->connect_work, smc_connect_work);
@@ -3253,9 +3255,6 @@ static int __smc_create(struct net *net, struct socket *sock, int protocol,
 		smc->clcsock = clcsock;
 	}
 
-	smc->sk.sk_sndbuf = max(smc->clcsock->sk->sk_sndbuf, SMC_BUF_MIN_SIZE);
-	smc->sk.sk_rcvbuf = max(smc->clcsock->sk->sk_rcvbuf, SMC_BUF_MIN_SIZE);
-
 out:
 	return rc;
 }
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index ebf56cd..ea41f22 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -2307,10 +2307,10 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
 
 	if (is_rmb)
 		/* use socket recv buffer size (w/o overhead) as start value */
-		sk_buf_size = smc->sk.sk_rcvbuf / 2;
+		sk_buf_size = smc->sk.sk_rcvbuf;
 	else
 		/* use socket send buffer size (w/o overhead) as start value */
-		sk_buf_size = smc->sk.sk_sndbuf / 2;
+		sk_buf_size = smc->sk.sk_sndbuf;
 
 	for (bufsize_short = smc_compress_bufsize(sk_buf_size, is_smcd, is_rmb);
 	     bufsize_short >= 0; bufsize_short--) {
@@ -2369,7 +2369,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
 	if (is_rmb) {
 		conn->rmb_desc = buf_desc;
 		conn->rmbe_size_short = bufsize_short;
-		smc->sk.sk_rcvbuf = bufsize * 2;
+		smc->sk.sk_rcvbuf = bufsize;
 		atomic_set(&conn->bytes_to_rcv, 0);
 		conn->rmbe_update_limit =
 			smc_rmb_wnd_update_limit(buf_desc->len);
@@ -2377,7 +2377,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
 			smc_ism_set_conn(conn); /* map RMB/smcd_dev to conn */
 	} else {
 		conn->sndbuf_desc = buf_desc;
-		smc->sk.sk_sndbuf = bufsize * 2;
+		smc->sk.sk_sndbuf = bufsize;
 		atomic_set(&conn->sndbuf_space, bufsize);
 	}
 	return 0;
diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
index 7f68520..0046a88 100644
--- a/net/smc/smc_sysctl.c
+++ b/net/smc/smc_sysctl.c
@@ -21,6 +21,8 @@
 
 static int smcr_testlink_time_min = 1;
 static int smcr_testlink_time_max = (INT_MAX / HZ);
+static int min_sndbuf = SMC_BUF_MIN_SIZE;
+static int min_rcvbuf = SMC_BUF_MIN_SIZE;
 
 static struct ctl_table smc_table[] = {
 	{
@@ -48,6 +50,22 @@
 		.extra1		= &smcr_testlink_time_min,
 		.extra2		= &smcr_testlink_time_max,
 	},
+	{
+		.procname	= "wmem",
+		.data		= &init_net.smc.sysctl_wmem,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &min_sndbuf,
+	},
+	{
+		.procname	= "rmem",
+		.data		= &init_net.smc.sysctl_rmem,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &min_rcvbuf,
+	},
 	{  }
 };
 
@@ -74,6 +92,8 @@ int __net_init smc_sysctl_net_init(struct net *net)
 	net->smc.sysctl_autocorking_size = SMC_AUTOCORKING_DEFAULT_SIZE;
 	net->smc.sysctl_smcr_buf_type = SMCR_PHYS_CONT_BUFS;
 	net->smc.sysctl_smcr_testlink_time = SMC_LLC_TESTLINK_DEFAULT_TIME;
+	WRITE_ONCE(net->smc.sysctl_wmem, READ_ONCE(net->ipv4.sysctl_tcp_wmem[1]));
+	WRITE_ONCE(net->smc.sysctl_rmem, READ_ONCE(net->ipv4.sysctl_tcp_rmem[1]));
 
 	return 0;
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 1/2] net/smc: Introduce a specific sysctl for TEST_LINK time
  2022-09-20  2:53 ` [PATCH net-next 1/2] net/smc: Introduce a specific sysctl for TEST_LINK time Wen Gu
@ 2022-09-20  2:57   ` Tony Lu
  2022-09-20  4:55   ` dust.li
  1 sibling, 0 replies; 8+ messages in thread
From: Tony Lu @ 2022-09-20  2:57 UTC (permalink / raw)
  To: Wen Gu
  Cc: kgraul, wenjia, davem, edumazet, kuba, pabeni, linux-s390,
	netdev, linux-kernel

On Tue, Sep 20, 2022 at 10:53:54AM +0800, Wen Gu wrote:
> SMC-R tests the viability of link by sending out TEST_LINK LLC
> messages over RoCE fabric when connections on link have been
> idle for a time longer than keepalive interval (testlink time).
> 
> But using tcp_keepalive_time as testlink time maybe not quite
> suitable because it is default no less than two hours[1], which
> is too long for single link to find peer dead. The active host
> will still use peer-dead link (QP) sending messages, and can't
> find out until get IB_WC_RETRY_EXC_ERR error CQEs, which takes
> more time than TEST_LINK timeout (SMC_LLC_WAIT_TIME) normally.
> 
> So this patch introduces a independent sysctl for SMC-R to set
> link keepalive time, in order to detect link down in time. The
> default value is 30 seconds.
> 
> [1] https://www.rfc-editor.org/rfc/rfc1122#page-101
> 
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>

LGTM.

Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>

> ---
>  Documentation/networking/smc-sysctl.rst |  7 +++++++
>  include/net/netns/smc.h                 |  1 +
>  net/smc/smc_llc.c                       |  2 +-
>  net/smc/smc_llc.h                       |  1 +
>  net/smc/smc_sysctl.c                    | 14 ++++++++++++++
>  5 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
> index 742e90e..f8c3d59 100644
> --- a/Documentation/networking/smc-sysctl.rst
> +++ b/Documentation/networking/smc-sysctl.rst
> @@ -34,3 +34,10 @@ smcr_buf_type - INTEGER
>          - 1 - Use virtually contiguous buffers
>          - 2 - Mixed use of the two types. Try physically contiguous buffers first.
>            If not available, use virtually contiguous buffers then.
> +
> +smcr_testlink_time - INTEGER
> +	How frequently SMC-R link sends out TEST_LINK LLC messages to confirm
> +	viability, after the last activity of connections on it. The maximum
> +	value is (INT_MAX / HZ) seconds, the minimum value is 1 second.
> +
> +	Default: 30 seconds.
> diff --git a/include/net/netns/smc.h b/include/net/netns/smc.h
> index 2adbe2b..d295e2c 100644
> --- a/include/net/netns/smc.h
> +++ b/include/net/netns/smc.h
> @@ -19,5 +19,6 @@ struct netns_smc {
>  #endif
>  	unsigned int			sysctl_autocorking_size;
>  	unsigned int			sysctl_smcr_buf_type;
> +	int				sysctl_smcr_testlink_time;
>  };
>  #endif
> diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
> index 175026a..388bd2e 100644
> --- a/net/smc/smc_llc.c
> +++ b/net/smc/smc_llc.c
> @@ -2127,7 +2127,7 @@ void smc_llc_lgr_init(struct smc_link_group *lgr, struct smc_sock *smc)
>  	init_waitqueue_head(&lgr->llc_flow_waiter);
>  	init_waitqueue_head(&lgr->llc_msg_waiter);
>  	mutex_init(&lgr->llc_conf_mutex);
> -	lgr->llc_testlink_time = READ_ONCE(net->ipv4.sysctl_tcp_keepalive_time);
> +	lgr->llc_testlink_time = READ_ONCE(net->smc.sysctl_smcr_testlink_time) * HZ;
>  }
>  
>  /* called after lgr was removed from lgr_list */
> diff --git a/net/smc/smc_llc.h b/net/smc/smc_llc.h
> index 4404e52..1de9a29 100644
> --- a/net/smc/smc_llc.h
> +++ b/net/smc/smc_llc.h
> @@ -19,6 +19,7 @@
>  
>  #define SMC_LLC_WAIT_FIRST_TIME		(5 * HZ)
>  #define SMC_LLC_WAIT_TIME		(2 * HZ)
> +#define SMC_LLC_TESTLINK_DEFAULT_TIME	30
>  
>  enum smc_llc_reqresp {
>  	SMC_LLC_REQ,
> diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
> index 0613868..7f68520 100644
> --- a/net/smc/smc_sysctl.c
> +++ b/net/smc/smc_sysctl.c
> @@ -16,8 +16,12 @@
>  
>  #include "smc.h"
>  #include "smc_core.h"
> +#include "smc_llc.h"
>  #include "smc_sysctl.h"
>  
> +static int smcr_testlink_time_min = 1;
> +static int smcr_testlink_time_max = (INT_MAX / HZ);
> +
>  static struct ctl_table smc_table[] = {
>  	{
>  		.procname       = "autocorking_size",
> @@ -35,6 +39,15 @@
>  		.extra1		= SYSCTL_ZERO,
>  		.extra2		= SYSCTL_TWO,
>  	},
> +	{
> +		.procname	= "smcr_testlink_time",
> +		.data		= &init_net.smc.sysctl_smcr_testlink_time,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= &smcr_testlink_time_min,
> +		.extra2		= &smcr_testlink_time_max,
> +	},
>  	{  }
>  };
>  
> @@ -60,6 +73,7 @@ int __net_init smc_sysctl_net_init(struct net *net)
>  
>  	net->smc.sysctl_autocorking_size = SMC_AUTOCORKING_DEFAULT_SIZE;
>  	net->smc.sysctl_smcr_buf_type = SMCR_PHYS_CONT_BUFS;
> +	net->smc.sysctl_smcr_testlink_time = SMC_LLC_TESTLINK_DEFAULT_TIME;
>  
>  	return 0;
>  
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 1/2] net/smc: Introduce a specific sysctl for TEST_LINK time
  2022-09-20  2:53 ` [PATCH net-next 1/2] net/smc: Introduce a specific sysctl for TEST_LINK time Wen Gu
  2022-09-20  2:57   ` Tony Lu
@ 2022-09-20  4:55   ` dust.li
  2022-09-20  6:23     ` Wen Gu
  1 sibling, 1 reply; 8+ messages in thread
From: dust.li @ 2022-09-20  4:55 UTC (permalink / raw)
  To: Wen Gu, kgraul, wenjia, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel

On Tue, Sep 20, 2022 at 10:53:54AM +0800, Wen Gu wrote:
>SMC-R tests the viability of link by sending out TEST_LINK LLC
>messages over RoCE fabric when connections on link have been
>idle for a time longer than keepalive interval (testlink time).
>
>But using tcp_keepalive_time as testlink time maybe not quite
>suitable because it is default no less than two hours[1], which
>is too long for single link to find peer dead. The active host
>will still use peer-dead link (QP) sending messages, and can't
>find out until get IB_WC_RETRY_EXC_ERR error CQEs, which takes
>more time than TEST_LINK timeout (SMC_LLC_WAIT_TIME) normally.
>
>So this patch introduces a independent sysctl for SMC-R to set
>link keepalive time, in order to detect link down in time. The
>default value is 30 seconds.
>
>[1] https://www.rfc-editor.org/rfc/rfc1122#page-101
>
>Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
>---
> Documentation/networking/smc-sysctl.rst |  7 +++++++
> include/net/netns/smc.h                 |  1 +
> net/smc/smc_llc.c                       |  2 +-
> net/smc/smc_llc.h                       |  1 +
> net/smc/smc_sysctl.c                    | 14 ++++++++++++++
> 5 files changed, 24 insertions(+), 1 deletion(-)
>
>diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
>index 742e90e..f8c3d59 100644
>--- a/Documentation/networking/smc-sysctl.rst
>+++ b/Documentation/networking/smc-sysctl.rst
>@@ -34,3 +34,10 @@ smcr_buf_type - INTEGER
>         - 1 - Use virtually contiguous buffers
>         - 2 - Mixed use of the two types. Try physically contiguous buffers first.
>           If not available, use virtually contiguous buffers then.
>+
>+smcr_testlink_time - INTEGER
>+	How frequently SMC-R link sends out TEST_LINK LLC messages to confirm
>+	viability, after the last activity of connections on it. The maximum
>+	value is (INT_MAX / HZ) seconds, the minimum value is 1 second.
>+
>+	Default: 30 seconds.
>diff --git a/include/net/netns/smc.h b/include/net/netns/smc.h
>index 2adbe2b..d295e2c 100644
>--- a/include/net/netns/smc.h
>+++ b/include/net/netns/smc.h
>@@ -19,5 +19,6 @@ struct netns_smc {
> #endif
> 	unsigned int			sysctl_autocorking_size;
> 	unsigned int			sysctl_smcr_buf_type;
>+	int				sysctl_smcr_testlink_time;
> };
> #endif
>diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
>index 175026a..388bd2e 100644
>--- a/net/smc/smc_llc.c
>+++ b/net/smc/smc_llc.c
>@@ -2127,7 +2127,7 @@ void smc_llc_lgr_init(struct smc_link_group *lgr, struct smc_sock *smc)
> 	init_waitqueue_head(&lgr->llc_flow_waiter);
> 	init_waitqueue_head(&lgr->llc_msg_waiter);
> 	mutex_init(&lgr->llc_conf_mutex);
>-	lgr->llc_testlink_time = READ_ONCE(net->ipv4.sysctl_tcp_keepalive_time);
>+	lgr->llc_testlink_time = READ_ONCE(net->smc.sysctl_smcr_testlink_time) * HZ;
> }
> 
> /* called after lgr was removed from lgr_list */
>diff --git a/net/smc/smc_llc.h b/net/smc/smc_llc.h
>index 4404e52..1de9a29 100644
>--- a/net/smc/smc_llc.h
>+++ b/net/smc/smc_llc.h
>@@ -19,6 +19,7 @@
> 
> #define SMC_LLC_WAIT_FIRST_TIME		(5 * HZ)
> #define SMC_LLC_WAIT_TIME		(2 * HZ)
>+#define SMC_LLC_TESTLINK_DEFAULT_TIME	30

I'm wondering why we don't follow the upper to macros using (30 * HZ) ?


> 
> enum smc_llc_reqresp {
> 	SMC_LLC_REQ,
>diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
>index 0613868..7f68520 100644
>--- a/net/smc/smc_sysctl.c
>+++ b/net/smc/smc_sysctl.c
>@@ -16,8 +16,12 @@
> 
> #include "smc.h"
> #include "smc_core.h"
>+#include "smc_llc.h"
> #include "smc_sysctl.h"
> 
>+static int smcr_testlink_time_min = 1;
>+static int smcr_testlink_time_max = (INT_MAX / HZ);
>+
> static struct ctl_table smc_table[] = {
> 	{
> 		.procname       = "autocorking_size",
>@@ -35,6 +39,15 @@
> 		.extra1		= SYSCTL_ZERO,
> 		.extra2		= SYSCTL_TWO,
> 	},
>+	{
>+		.procname	= "smcr_testlink_time",
>+		.data		= &init_net.smc.sysctl_smcr_testlink_time,
>+		.maxlen		= sizeof(int),
>+		.mode		= 0644,
>+		.proc_handler	= proc_dointvec_minmax,
>+		.extra1		= &smcr_testlink_time_min,
>+		.extra2		= &smcr_testlink_time_max,
>+	},
> 	{  }
> };
> 
>@@ -60,6 +73,7 @@ int __net_init smc_sysctl_net_init(struct net *net)
> 
> 	net->smc.sysctl_autocorking_size = SMC_AUTOCORKING_DEFAULT_SIZE;
> 	net->smc.sysctl_smcr_buf_type = SMCR_PHYS_CONT_BUFS;
>+	net->smc.sysctl_smcr_testlink_time = SMC_LLC_TESTLINK_DEFAULT_TIME;
> 
> 	return 0;
> 
>-- 
>1.8.3.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 1/2] net/smc: Introduce a specific sysctl for TEST_LINK time
  2022-09-20  4:55   ` dust.li
@ 2022-09-20  6:23     ` Wen Gu
  2022-09-20  8:21       ` dust.li
  0 siblings, 1 reply; 8+ messages in thread
From: Wen Gu @ 2022-09-20  6:23 UTC (permalink / raw)
  To: dust.li, kgraul, wenjia, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel



On 2022/9/20 12:55, dust.li wrote:

> On Tue, Sep 20, 2022 at 10:53:54AM +0800, Wen Gu wrote:
>> SMC-R tests the viability of link by sending out TEST_LINK LLC
>> messages over RoCE fabric when connections on link have been
>> idle for a time longer than keepalive interval (testlink time).
>>
>> But using tcp_keepalive_time as testlink time maybe not quite
>> suitable because it is default no less than two hours[1], which
>> is too long for single link to find peer dead. The active host
>> will still use peer-dead link (QP) sending messages, and can't
>> find out until get IB_WC_RETRY_EXC_ERR error CQEs, which takes
>> more time than TEST_LINK timeout (SMC_LLC_WAIT_TIME) normally.
>>
>> So this patch introduces a independent sysctl for SMC-R to set
>> link keepalive time, in order to detect link down in time. The
>> default value is 30 seconds.
>>
>> [1] https://www.rfc-editor.org/rfc/rfc1122#page-101
>>
>> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
>> ---

>> /* called after lgr was removed from lgr_list */
>> diff --git a/net/smc/smc_llc.h b/net/smc/smc_llc.h
>> index 4404e52..1de9a29 100644
>> --- a/net/smc/smc_llc.h
>> +++ b/net/smc/smc_llc.h
>> @@ -19,6 +19,7 @@
>>
>> #define SMC_LLC_WAIT_FIRST_TIME		(5 * HZ)
>> #define SMC_LLC_WAIT_TIME		(2 * HZ)
>> +#define SMC_LLC_TESTLINK_DEFAULT_TIME	30
> 
> I'm wondering why we don't follow the upper to macros using (30 * HZ) ?
> 
Thanks for the reivew.

Because the value of sysctl_smcr_testlink_time is in seconds, and the value
of llc_testlink_time is jiffies.

I have thought about
1) using proc_dointvec_jiffies as sysctl's proc_handler just like TCP does.
    But proc_dointvec_jiffies has no minimum limit, value 0 makes no sense for SMC testlink.
2) using proc_dointvec_ms_jiffies_minmax as proc_handler. But millisecond interval
    seems expensive for SMC test link.

So, I choose to use proc_dointvec_minmax, make sysctl_smcr_testlink_time in
seconds, and convert to jiffies when assigning to llc_testlink_time.

Thanks,
Wen Gu.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 1/2] net/smc: Introduce a specific sysctl for TEST_LINK time
  2022-09-20  6:23     ` Wen Gu
@ 2022-09-20  8:21       ` dust.li
  2022-09-20  9:03         ` Wen Gu
  0 siblings, 1 reply; 8+ messages in thread
From: dust.li @ 2022-09-20  8:21 UTC (permalink / raw)
  To: Wen Gu, kgraul, wenjia, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel

On Tue, Sep 20, 2022 at 02:23:09PM +0800, Wen Gu wrote:
>
>
>On 2022/9/20 12:55, dust.li wrote:
>
>> On Tue, Sep 20, 2022 at 10:53:54AM +0800, Wen Gu wrote:
>> > SMC-R tests the viability of link by sending out TEST_LINK LLC
>> > messages over RoCE fabric when connections on link have been
>> > idle for a time longer than keepalive interval (testlink time).
>> > 
>> > But using tcp_keepalive_time as testlink time maybe not quite
>> > suitable because it is default no less than two hours[1], which
>> > is too long for single link to find peer dead. The active host
>> > will still use peer-dead link (QP) sending messages, and can't
>> > find out until get IB_WC_RETRY_EXC_ERR error CQEs, which takes
>> > more time than TEST_LINK timeout (SMC_LLC_WAIT_TIME) normally.
>> > 
>> > So this patch introduces a independent sysctl for SMC-R to set
>> > link keepalive time, in order to detect link down in time. The
>> > default value is 30 seconds.
>> > 
>> > [1] https://www.rfc-editor.org/rfc/rfc1122#page-101
>> > 
>> > Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
>> > ---
>
>> > /* called after lgr was removed from lgr_list */
>> > diff --git a/net/smc/smc_llc.h b/net/smc/smc_llc.h
>> > index 4404e52..1de9a29 100644
>> > --- a/net/smc/smc_llc.h
>> > +++ b/net/smc/smc_llc.h
>> > @@ -19,6 +19,7 @@
>> > 
>> > #define SMC_LLC_WAIT_FIRST_TIME		(5 * HZ)
>> > #define SMC_LLC_WAIT_TIME		(2 * HZ)
>> > +#define SMC_LLC_TESTLINK_DEFAULT_TIME	30
>> 
>> I'm wondering why we don't follow the upper to macros using (30 * HZ) ?
>> 
>Thanks for the reivew.
>
>Because the value of sysctl_smcr_testlink_time is in seconds, and the value
>of llc_testlink_time is jiffies.
>
>I have thought about
>1) using proc_dointvec_jiffies as sysctl's proc_handler just like TCP does.
>   But proc_dointvec_jiffies has no minimum limit, value 0 makes no sense for SMC testlink.

Maybe 0 means disable the LLC TEST LINK ?


>2) using proc_dointvec_ms_jiffies_minmax as proc_handler. But millisecond interval
>   seems expensive for SMC test link.
>
>So, I choose to use proc_dointvec_minmax, make sysctl_smcr_testlink_time in
>seconds, and convert to jiffies when assigning to llc_testlink_time.

If proc_dointvec_jiffies_minmax is really the problem, maybe you can
write your own proc handler.


Thanks

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 1/2] net/smc: Introduce a specific sysctl for TEST_LINK time
  2022-09-20  8:21       ` dust.li
@ 2022-09-20  9:03         ` Wen Gu
  0 siblings, 0 replies; 8+ messages in thread
From: Wen Gu @ 2022-09-20  9:03 UTC (permalink / raw)
  To: dust.li, kgraul, wenjia, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel



On 2022/9/20 16:21, dust.li wrote:

> On Tue, Sep 20, 2022 at 02:23:09PM +0800, Wen Gu wrote:
>>
>>
>> On 2022/9/20 12:55, dust.li wrote:
>>
>>> On Tue, Sep 20, 2022 at 10:53:54AM +0800, Wen Gu wrote:

>>>> +++ b/net/smc/smc_llc.h
>>>> @@ -19,6 +19,7 @@
>>>>
>>>> #define SMC_LLC_WAIT_FIRST_TIME		(5 * HZ)
>>>> #define SMC_LLC_WAIT_TIME		(2 * HZ)
>>>> +#define SMC_LLC_TESTLINK_DEFAULT_TIME	30
>>>
>>> I'm wondering why we don't follow the upper to macros using (30 * HZ) ?
>>>
>> Thanks for the reivew.
>>
>> Because the value of sysctl_smcr_testlink_time is in seconds, and the value
>> of llc_testlink_time is jiffies.
>>
>> I have thought about
>> 1) using proc_dointvec_jiffies as sysctl's proc_handler just like TCP does.
>>    But proc_dointvec_jiffies has no minimum limit, value 0 makes no sense for SMC testlink.
> 
> Maybe 0 means disable the LLC TEST LINK ?
> 
> 
>> 2) using proc_dointvec_ms_jiffies_minmax as proc_handler. But millisecond interval
>>    seems expensive for SMC test link.
>>
>> So, I choose to use proc_dointvec_minmax, make sysctl_smcr_testlink_time in
>> seconds, and convert to jiffies when assigning to llc_testlink_time.
> 
> If proc_dointvec_jiffies_minmax is really the problem, maybe you can
> write your own proc handler.
> 

Oops, I didn't noticed that value 0 means disabling LLC testlink in smc_llc_link_active().

So no need to set the minimum limit and proc_dointvec_jiffies will be fine. I will send a v2 to improve it.

Thanks,
Wen Gu

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-09-20  9:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-20  2:45 [PATCH net-next 0/2] Separate SMC parameter settings from TCP sysctls Wen Gu
2022-09-20  2:53 ` [PATCH net-next 1/2] net/smc: Introduce a specific sysctl for TEST_LINK time Wen Gu
2022-09-20  2:57   ` Tony Lu
2022-09-20  4:55   ` dust.li
2022-09-20  6:23     ` Wen Gu
2022-09-20  8:21       ` dust.li
2022-09-20  9:03         ` Wen Gu
2022-09-20  2:54 ` [PATCH net-next 2/2] net/smc: Unbind r/w buffer size from clcsock and make them tunable Wen Gu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).