netdev.vger.kernel.org archive mirror
* [PATCH v2 net-next 0/3] net/smc: Optimizing performance in
@ 2022-01-28 14:44 D. Wythe
  2022-01-28 14:44 ` [PATCH v2 net-next 1/3] net/smc: Make smc_tcp_listen_work() independent D. Wythe
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: D. Wythe @ 2022-01-28 14:44 UTC (permalink / raw)
  To: kgraul
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch set aims to optimize the performance of SMC in short-lived
links scenarios, which is quite unsatisfactory right now.

In our benchmark, we test it with the following command:

./wrk -c 10000 -t 4 -H 'Connection: Close' -d 20 http://smc-server

The current performance figures look like this:

Running 20s test @ http://11.213.45.6
  4 threads and 10000 connections
  4956 requests in 20.06s, 3.24MB read
  Socket errors: connect 0, read 0, write 672, timeout 0
Requests/sec:    247.07
Transfer/sec:    165.28KB

There are many reasons for this phenomenon. This patch set doesn't
solve all of them, but it alleviates the problem considerably.

Patch 1/3  (Make smc_tcp_listen_work() independent) :

Separate smc_tcp_listen_work() from smc_listen_work() and make them
independent of each other, so that a busy SMC handshake can no longer
hold up the acceptance of new TCP connections. This avoids discarding a
large number of TCP connections once the queue backs up, which would
otherwise raise the connection establishment time.

Patch 2/3 (Limits SMC backlog connections):

Since patch 1 separates smc_tcp_listen_work() from smc_listen_work(),
TCP connections are now accepted without restriction. This patch puts a
limit on SMC backlog connections, following the implementation of TCP.

Patch 3/3 (Fallback when SMC handshake workqueue congested):

Considering the current complexity of the SMC handshake, performance in
short-lived links scenarios is still quite poor, even though this may
not be the main scenario for SMC. This patch provides an automatic
fallback to TCP when the SMC handshake workqueue is congested, which in
our opinion is the sign of SMC handshakes stacking up.

Of course, it's optional.

After this patch set, the performance figures look like this:

Running 20s test @ http://11.213.45.6
  4 threads and 10000 connections
  693253 requests in 20.10s, 452.88MB read
Requests/sec:  34488.13
Transfer/sec:     22.53MB

That's quite a substantial performance improvement, about 6 to 7 times
in my environment.

---
changelog:
v2: fix compile warning and invalid dependencies for kconfig
---
D. Wythe (3):
  net/smc: Make smc_tcp_listen_work() independent
  net/smc: Limits backlog connections
  net/smc: Fallback when handshake workqueue congested

 include/linux/tcp.h  |  1 +
 net/ipv4/tcp_input.c |  3 +-
 net/smc/Kconfig      | 12 ++++++++
 net/smc/af_smc.c     | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 net/smc/smc.h        |  5 ++++
 5 files changed, 96 insertions(+), 3 deletions(-)

-- 
1.8.3.1



* [PATCH v2 net-next 1/3] net/smc: Make smc_tcp_listen_work() independent
  2022-01-28 14:44 [PATCH v2 net-next 0/3] net/smc: Optimizing performance in D. Wythe
@ 2022-01-28 14:44 ` D. Wythe
  2022-01-31 12:45   ` Karsten Graul
  2022-01-28 14:44 ` [PATCH v2 net-next 2/3] net/smc: Limits backlog connections D. Wythe
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: D. Wythe @ 2022-01-28 14:44 UTC (permalink / raw)
  To: kgraul
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

In a multithreaded benchmark with 10K connections, the backend TCP
connections are established very slowly, and lots of TCP connections
stay in SYN_SENT state.

Client: smc_run wrk -c 10000 -t 4 http://server

netstat on the server host shows:
    145042 times the listen queue of a socket overflowed
    145042 SYNs to LISTEN sockets dropped

One reason for this issue is that smc_tcp_listen_work() shares the
same workqueue (smc_hs_wq) with smc_listen_work(), while
smc_listen_work() does a blocking wait for the SMC connection to be
established. Once the workqueue becomes congested, it blocks the
accept() of the TCP listen socket.

This patch creates an independent workqueue (smc_tcp_ls_wq) for
smc_tcp_listen_work(), separating it from smc_listen_work(), which is
quite acceptable considering that smc_tcp_listen_work() runs very fast.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/af_smc.c | 13 +++++++++++--
 net/smc/smc.h    |  1 +
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index d5ea62b..1b40304 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -59,6 +59,7 @@
 						 * creation on client
 						 */
 
+struct workqueue_struct	*smc_tcp_ls_wq;	/* wq for tcp listen work */
 struct workqueue_struct	*smc_hs_wq;	/* wq for handshake work */
 struct workqueue_struct	*smc_close_wq;	/* wq for close work */
 
@@ -2124,7 +2125,7 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
 	lsmc->clcsk_data_ready(listen_clcsock);
 	if (lsmc->sk.sk_state == SMC_LISTEN) {
 		sock_hold(&lsmc->sk); /* sock_put in smc_tcp_listen_work() */
-		if (!queue_work(smc_hs_wq, &lsmc->tcp_listen_work))
+		if (!queue_work(smc_tcp_ls_wq, &lsmc->tcp_listen_work))
 			sock_put(&lsmc->sk);
 	}
 }
@@ -2919,9 +2920,14 @@ static int __init smc_init(void)
 		goto out_nl;
 
 	rc = -ENOMEM;
+
+	smc_tcp_ls_wq = alloc_workqueue("smc_tcp_ls_wq", 0, 0);
+	if (!smc_tcp_ls_wq)
+		goto out_pnet;
+
 	smc_hs_wq = alloc_workqueue("smc_hs_wq", 0, 0);
 	if (!smc_hs_wq)
-		goto out_pnet;
+		goto out_alloc_tcp_ls_wq;
 
 	smc_close_wq = alloc_workqueue("smc_close_wq", 0, 0);
 	if (!smc_close_wq)
@@ -2992,6 +2998,8 @@ static int __init smc_init(void)
 	destroy_workqueue(smc_close_wq);
 out_alloc_hs_wq:
 	destroy_workqueue(smc_hs_wq);
+out_alloc_tcp_ls_wq:
+	destroy_workqueue(smc_tcp_ls_wq);
 out_pnet:
 	smc_pnet_exit();
 out_nl:
@@ -3010,6 +3018,7 @@ static void __exit smc_exit(void)
 	smc_core_exit();
 	smc_ib_unregister_client();
 	destroy_workqueue(smc_close_wq);
+	destroy_workqueue(smc_tcp_ls_wq);
 	destroy_workqueue(smc_hs_wq);
 	proto_unregister(&smc_proto6);
 	proto_unregister(&smc_proto);
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 3d0b8e3..bd2f3dc 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -264,6 +264,7 @@ static inline struct smc_sock *smc_sk(const struct sock *sk)
 	return (struct smc_sock *)sk;
 }
 
+extern struct workqueue_struct	*smc_tcp_ls_wq;	/* wq for tcp listen work */
 extern struct workqueue_struct	*smc_hs_wq;	/* wq for handshake work */
 extern struct workqueue_struct	*smc_close_wq;	/* wq for close work */
 
-- 
1.8.3.1



* [PATCH v2 net-next 2/3] net/smc: Limits backlog connections
  2022-01-28 14:44 [PATCH v2 net-next 0/3] net/smc: Optimizing performance in D. Wythe
  2022-01-28 14:44 ` [PATCH v2 net-next 1/3] net/smc: Make smc_tcp_listen_work() independent D. Wythe
@ 2022-01-28 14:44 ` D. Wythe
  2022-01-29  4:37   ` Tony Lu
  2022-01-28 14:44 ` [PATCH v2 net-next 3/3] net/smc: Fallback when handshake workqueue congested D. Wythe
  2022-01-31 12:46 ` [PATCH v2 net-next 0/3] net/smc: Optimizing performance in Karsten Graul
  3 siblings, 1 reply; 14+ messages in thread
From: D. Wythe @ 2022-01-28 14:44 UTC (permalink / raw)
  To: kgraul
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

The current implementation does not handle backlog semantics; one
potential risk is that the server will be flooded by an unlimited
number of connections, even if the client is SMC-incapable.

This patch puts a limit on backlog connections. Referring to the TCP
implementation, we divide SMC connections into two categories:

1. Half SMC connections, which include all connections with TCP
established but SMC not yet established.

2. Full SMC connections, which include all connections with SMC
established.

For half SMC connections, since they all start with TCP establishment,
we can achieve our goal by applying a limit before TCP is established.
Following the implementation of TCP, this limit is based not only on
the half SMC connections but also on the full connections, so it also
constrains full SMC connections.

For full SMC connections, although we know exactly where they start,
it's quite hard to apply a limit before that point. The easiest way
would be to block while waiting to receive the SMC confirm CLC message,
but that path runs under the protection of smc_server_lgr_pending, a
global lock, which would turn this into a per-host limit instead of a
per-listen-socket one. Another way would be to drop the full
connections, but considering the cost of SMC connections, we prefer to
keep full SMC connections.

Even so, a limit on full SMC connections still exists; see the comments
about half SMC connections above.

After this patch, the backlog connection limits look like this:

For SMC:

1. A client with SMC capability can make at most 2 * backlog full SMC
   connections, or 1 * backlog half SMC connections plus 1 * backlog
   full SMC connections.

2. A client without SMC capability can only make 1 * backlog half TCP
   connections plus 1 * backlog full TCP connections.
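
As a worked example of the limits above, assume the server calls
listen(fd, 128), i.e. backlog = 128: an SMC-capable client can then
hold at most 2 * 128 = 256 full SMC connections, or 128 half SMC
connections plus 128 full SMC connections; an SMC-incapable client is
limited to 128 half plus 128 full TCP connections, just as with plain
TCP.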

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
changelog:
v2: fix compile warning
---
 net/smc/af_smc.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 net/smc/smc.h    |  4 ++++
 2 files changed, 47 insertions(+)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 1b40304..66a0e64 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -73,6 +73,34 @@ static void smc_set_keepalive(struct sock *sk, int val)
 	smc->clcsock->sk->sk_prot->keepalive(smc->clcsock->sk, val);
 }
 
+static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
+					  struct request_sock *req,
+					  struct dst_entry *dst,
+					  struct request_sock *req_unhash,
+					  bool *own_req)
+{
+	struct smc_sock *smc;
+
+	smc = (struct smc_sock *)((uintptr_t)sk->sk_user_data & ~SK_USER_DATA_NOCOPY);
+
+	if (READ_ONCE(sk->sk_ack_backlog) + atomic_read(&smc->smc_pendings) >
+				sk->sk_max_ack_backlog)
+		goto drop;
+
+	if (sk_acceptq_is_full(&smc->sk)) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+		goto drop;
+	}
+
+	/* passthrough to origin syn recv sock fct */
+	return smc->ori_af_ops->syn_recv_sock(sk, skb, req, dst, req_unhash, own_req);
+
+drop:
+	dst_release(dst);
+	tcp_listendrop(sk);
+	return NULL;
+}
+
 static struct smc_hashinfo smc_v4_hashinfo = {
 	.lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock),
 };
@@ -1491,6 +1519,9 @@ static void smc_listen_out(struct smc_sock *new_smc)
 	struct smc_sock *lsmc = new_smc->listen_smc;
 	struct sock *newsmcsk = &new_smc->sk;
 
+	if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
+		atomic_dec(&lsmc->smc_pendings);
+
 	if (lsmc->sk.sk_state == SMC_LISTEN) {
 		lock_sock_nested(&lsmc->sk, SINGLE_DEPTH_NESTING);
 		smc_accept_enqueue(&lsmc->sk, newsmcsk);
@@ -2096,6 +2127,9 @@ static void smc_tcp_listen_work(struct work_struct *work)
 		if (!new_smc)
 			continue;
 
+		if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
+			atomic_inc(&lsmc->smc_pendings);
+
 		new_smc->listen_smc = lsmc;
 		new_smc->use_fallback = lsmc->use_fallback;
 		new_smc->fallback_rsn = lsmc->fallback_rsn;
@@ -2163,6 +2197,15 @@ static int smc_listen(struct socket *sock, int backlog)
 	smc->clcsock->sk->sk_data_ready = smc_clcsock_data_ready;
 	smc->clcsock->sk->sk_user_data =
 		(void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
+
+	/* save origin ops */
+	smc->ori_af_ops = inet_csk(smc->clcsock->sk)->icsk_af_ops;
+
+	smc->af_ops = *smc->ori_af_ops;
+	smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock;
+
+	inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
+
 	rc = kernel_listen(smc->clcsock, backlog);
 	if (rc) {
 		smc->clcsock->sk->sk_data_ready = smc->clcsk_data_ready;
diff --git a/net/smc/smc.h b/net/smc/smc.h
index bd2f3dc..0e27113 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -240,6 +240,10 @@ struct smc_sock {				/* smc sock container */
 	bool			use_fallback;	/* fallback to tcp */
 	int			fallback_rsn;	/* reason for fallback */
 	u32			peer_diagnosis; /* decline reason from peer */
+	atomic_t                smc_pendings;   /* pending smc connections */
+	struct inet_connection_sock_af_ops		af_ops;
+	const struct inet_connection_sock_af_ops	*ori_af_ops;
+						/* origin af ops */
 	int			sockopt_defer_accept;
 						/* sockopt TCP_DEFER_ACCEPT
 						 * value
-- 
1.8.3.1



* [PATCH v2 net-next 3/3] net/smc: Fallback when handshake workqueue congested
  2022-01-28 14:44 [PATCH v2 net-next 0/3] net/smc: Optimizing performance in D. Wythe
  2022-01-28 14:44 ` [PATCH v2 net-next 1/3] net/smc: Make smc_tcp_listen_work() independent D. Wythe
  2022-01-28 14:44 ` [PATCH v2 net-next 2/3] net/smc: Limits backlog connections D. Wythe
@ 2022-01-28 14:44 ` D. Wythe
  2022-01-29  4:33   ` Tony Lu
  2022-01-31 12:46 ` [PATCH v2 net-next 0/3] net/smc: Optimizing performance in Karsten Graul
  3 siblings, 1 reply; 14+ messages in thread
From: D. Wythe @ 2022-01-28 14:44 UTC (permalink / raw)
  To: kgraul
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch provides a mechanism that allows automatic fallback to TCP
according to the pressure on the SMC handshake process. At present,
frequent connection attempts cause incoming connections to be
backlogged in the SMC handshake queue, raising the connection
establishment time, which is quite unacceptable for applications that
rely on short-lived connections.

The mechanism should be optional for applications that don't care
about connection establishment time. For now, this patch only provides
the switch at compile time.

There are two ways to implement this mechanism:

1. Fall back after TCP is established.
2. Fall back before TCP is established.

In the first way, we need to wait for and receive the CLC messages
that the client may send, and then actively reply with a decline
message. In a sense, this is itself a sort of SMC handshake and affects
the connection establishment time along the way.

In the second way, the only problem is that we need to inject SMC
logic into TCP when it is about to reply to the incoming SYN. Since we
already do that, this no longer seems to be a problem. And the
advantage is obvious: few additional steps are required to complete the
fallback.

This patch uses the second way.

Link: https://lore.kernel.org/all/1641301961-59331-1-git-send-email-alibuda@linux.alibaba.com/
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
changelog:
v2: fix invalid dependencies for kconfig 
---
 include/linux/tcp.h  |  1 +
 net/ipv4/tcp_input.c |  3 ++-
 net/smc/Kconfig      | 12 ++++++++++++
 net/smc/af_smc.c     | 22 ++++++++++++++++++++++
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 78b91bb..1c4ae5d 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -394,6 +394,7 @@ struct tcp_sock {
 	bool	is_mptcp;
 #endif
 #if IS_ENABLED(CONFIG_SMC)
+	bool	(*smc_in_limited)(const struct sock *sk);
 	bool	syn_smc;	/* SYN includes SMC */
 #endif
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index dc49a3d..9890de9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6701,7 +6701,8 @@ static void tcp_openreq_init(struct request_sock *req,
 	ireq->ir_num = ntohs(tcp_hdr(skb)->dest);
 	ireq->ir_mark = inet_request_mark(sk, skb);
 #if IS_ENABLED(CONFIG_SMC)
-	ireq->smc_ok = rx_opt->smc_ok;
+	ireq->smc_ok = rx_opt->smc_ok && !(tcp_sk(sk)->smc_in_limited &&
+			tcp_sk(sk)->smc_in_limited(sk));
 #endif
 }
 
diff --git a/net/smc/Kconfig b/net/smc/Kconfig
index 1ab3c5a..a4e1713 100644
--- a/net/smc/Kconfig
+++ b/net/smc/Kconfig
@@ -19,3 +19,15 @@ config SMC_DIAG
 	  smcss.
 
 	  if unsure, say Y.
+
+if SMC
+
+config SMC_AUTO_FALLBACK
+	bool "SMC: automatic fallback to TCP"
+	default y
+	help
+	  Allow automatic fallback to TCP according to the pressure of SMC-R
+	  handshake process.
+
+	  If that's not what you expect, or if unsure, say N.
+endif
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 66a0e64..49b8a29 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -101,6 +101,24 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk, struct sk_buff
 	return NULL;
 }
 
+#if IS_ENABLED(CONFIG_SMC_AUTO_FALLBACK)
+static bool smc_is_in_limited(const struct sock *sk)
+{
+	const struct smc_sock *smc;
+
+	smc = (const struct smc_sock *)
+		((uintptr_t)sk->sk_user_data & ~SK_USER_DATA_NOCOPY);
+
+	if (!smc)
+		return true;
+
+	if (workqueue_congested(WORK_CPU_UNBOUND, smc_hs_wq))
+		return true;
+
+	return false;
+}
+#endif
+
 static struct smc_hashinfo smc_v4_hashinfo = {
 	.lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock),
 };
@@ -2206,6 +2224,10 @@ static int smc_listen(struct socket *sock, int backlog)
 
 	inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
 
+#if IS_ENABLED(CONFIG_SMC_AUTO_FALLBACK)
+	tcp_sk(smc->clcsock->sk)->smc_in_limited = smc_is_in_limited;
+#endif
+
 	rc = kernel_listen(smc->clcsock, backlog);
 	if (rc) {
 		smc->clcsock->sk->sk_data_ready = smc->clcsk_data_ready;
-- 
1.8.3.1



* Re: [PATCH v2 net-next 3/3] net/smc: Fallback when handshake workqueue congested
  2022-01-28 14:44 ` [PATCH v2 net-next 3/3] net/smc: Fallback when handshake workqueue congested D. Wythe
@ 2022-01-29  4:33   ` Tony Lu
  2022-02-02 14:04     ` D. Wythe
  2022-02-07  7:13     ` D. Wythe
  0 siblings, 2 replies; 14+ messages in thread
From: Tony Lu @ 2022-01-29  4:33 UTC (permalink / raw)
  To: D. Wythe
  Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts

On Fri, Jan 28, 2022 at 10:44:38PM +0800, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> @@ -19,3 +19,15 @@ config SMC_DIAG

...

> +if SMC
> +
> +config SMC_AUTO_FALLBACK
> +	bool "SMC: automatic fallback to TCP"
> +	default y
> +	help
> +	  Allow automatic fallback to TCP according to the pressure of SMC-R
> +	  handshake process.
> +
> +	  If that's not what you expect, or if unsure, say N.
> +endif

Using a netlink knob to control this behavior together with a static
key should be more flexible, as I mentioned in the previous version of
this patch.
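
For illustration, such a knob might look roughly like this (a minimal
sketch; the key name and the netlink plumbing are hypothetical, not
part of this patch):

	#include <linux/jump_label.h>

	/* hypothetical global switch, flipped from a netlink handler */
	static DEFINE_STATIC_KEY_FALSE(smc_auto_fallback);

	static bool smc_is_in_limited(const struct sock *sk)
	{
		if (!static_branch_unlikely(&smc_auto_fallback))
			return false;	/* knob off: never fall back */
		return workqueue_congested(WORK_CPU_UNBOUND, smc_hs_wq);
	}

	/* in the netlink set handler, something like: */
	if (enable)
		static_branch_enable(&smc_auto_fallback);
	else
		static_branch_disable(&smc_auto_fallback);

The static branch compiles to a no-op until it is flipped, so the fast
path stays nearly free while the feature is disabled.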

Thank you,
Tony Lu


* Re: [PATCH v2 net-next 2/3] net/smc: Limits backlog connections
  2022-01-28 14:44 ` [PATCH v2 net-next 2/3] net/smc: Limits backlog connections D. Wythe
@ 2022-01-29  4:37   ` Tony Lu
  2022-02-02 14:01     ` D. Wythe
  0 siblings, 1 reply; 14+ messages in thread
From: Tony Lu @ 2022-01-29  4:37 UTC (permalink / raw)
  To: D. Wythe
  Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts

On Fri, Jan 28, 2022 at 10:44:37PM +0800, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> The current implementation does not handle backlog semantics; one
> potential risk is that the server will be flooded by an unlimited
> number of connections, even if the client is SMC-incapable.
>
> This patch puts a limit on backlog connections. Referring to the TCP
> implementation, we divide SMC connections into two categories:
>
> 1. Half SMC connections, which include all connections with TCP
> established but SMC not yet established.
>
> 2. Full SMC connections, which include all connections with SMC
> established.
>
> For half SMC connections, since they all start with TCP establishment,
> we can achieve our goal by applying a limit before TCP is established.
> Following the implementation of TCP, this limit is based not only on
> the half SMC connections but also on the full connections, so it also
> constrains full SMC connections.
>
> For full SMC connections, although we know exactly where they start,
> it's quite hard to apply a limit before that point. The easiest way
> would be to block while waiting to receive the SMC confirm CLC message,
> but that path runs under the protection of smc_server_lgr_pending, a
> global lock, which would turn this into a per-host limit instead of a
> per-listen-socket one. Another way would be to drop the full
> connections, but considering the cost of SMC connections, we prefer to
> keep full SMC connections.
>
> Even so, a limit on full SMC connections still exists; see the comments
> about half SMC connections above.
>
> After this patch, the backlog connection limits look like this:
>
> For SMC:
>
> 1. A client with SMC capability can make at most 2 * backlog full SMC
>    connections, or 1 * backlog half SMC connections plus 1 * backlog
>    full SMC connections.
>
> 2. A client without SMC capability can only make 1 * backlog half TCP
>    connections plus 1 * backlog full TCP connections.
> 
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
> changelog:
> v2: fix compile warning
> ---
>  net/smc/af_smc.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>  net/smc/smc.h    |  4 ++++
>  2 files changed, 47 insertions(+)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 1b40304..66a0e64 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -73,6 +73,34 @@ static void smc_set_keepalive(struct sock *sk, int val)
>  	smc->clcsock->sk->sk_prot->keepalive(smc->clcsock->sk, val);
>  }
>  
> +static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
> +					  struct request_sock *req,
> +					  struct dst_entry *dst,
> +					  struct request_sock *req_unhash,
> +					  bool *own_req)
> +{
> +	struct smc_sock *smc;
> +
> +	smc = (struct smc_sock *)((uintptr_t)sk->sk_user_data & ~SK_USER_DATA_NOCOPY);
> +
> +	if (READ_ONCE(sk->sk_ack_backlog) + atomic_read(&smc->smc_pendings) >
> +				sk->sk_max_ack_backlog)
> +		goto drop;
> +
> +	if (sk_acceptq_is_full(&smc->sk)) {
> +		NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
> +		goto drop;
> +	}
> +
> +	/* passthrough to origin syn recv sock fct */
> +	return smc->ori_af_ops->syn_recv_sock(sk, skb, req, dst, req_unhash, own_req);

I am wondering if this would introduce more overhead compared with the
original implementation?

> +
> +drop:
> +	dst_release(dst);
> +	tcp_listendrop(sk);
> +	return NULL;
> +}
> +
>  static struct smc_hashinfo smc_v4_hashinfo = {
>  	.lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock),
>  };
> @@ -1491,6 +1519,9 @@ static void smc_listen_out(struct smc_sock *new_smc)
>  	struct smc_sock *lsmc = new_smc->listen_smc;
>  	struct sock *newsmcsk = &new_smc->sk;
>  
> +	if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
> +		atomic_dec(&lsmc->smc_pendings);
> +
>  	if (lsmc->sk.sk_state == SMC_LISTEN) {
>  		lock_sock_nested(&lsmc->sk, SINGLE_DEPTH_NESTING);
>  		smc_accept_enqueue(&lsmc->sk, newsmcsk);
> @@ -2096,6 +2127,9 @@ static void smc_tcp_listen_work(struct work_struct *work)
>  		if (!new_smc)
>  			continue;
>  
> +		if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
> +			atomic_inc(&lsmc->smc_pendings);
> +
>  		new_smc->listen_smc = lsmc;
>  		new_smc->use_fallback = lsmc->use_fallback;
>  		new_smc->fallback_rsn = lsmc->fallback_rsn;
> @@ -2163,6 +2197,15 @@ static int smc_listen(struct socket *sock, int backlog)
>  	smc->clcsock->sk->sk_data_ready = smc_clcsock_data_ready;
>  	smc->clcsock->sk->sk_user_data =
>  		(void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
> +
> +	/* save origin ops */
> +	smc->ori_af_ops = inet_csk(smc->clcsock->sk)->icsk_af_ops;
> +
> +	smc->af_ops = *smc->ori_af_ops;
> +	smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock;
> +
> +	inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;

Consider saving only the syn_recv_sock field? There seems to be no
need to save the whole ops structure.

Thank you,
Tony Lu


* Re: [PATCH v2 net-next 1/3] net/smc: Make smc_tcp_listen_work() independent
  2022-01-28 14:44 ` [PATCH v2 net-next 1/3] net/smc: Make smc_tcp_listen_work() independent D. Wythe
@ 2022-01-31 12:45   ` Karsten Graul
  2022-02-02 12:53     ` D. Wythe
  0 siblings, 1 reply; 14+ messages in thread
From: Karsten Graul @ 2022-01-31 12:45 UTC (permalink / raw)
  To: D. Wythe; +Cc: kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts

On 28/01/2022 15:44, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> In multithread and 10K connections benchmark, the backend TCP connection
> established very slowly, and lots of TCP connections stay in SYN_SENT
> state.
> 
> Client: smc_run wrk -c 10000 -t 4 http://server
> 
> the netstate of server host shows like:
>     145042 times the listen queue of a socket overflowed
>     145042 SYNs to LISTEN sockets dropped
> 
> One reason of this issue is that, since the smc_tcp_listen_work() shared
> the same workqueue (smc_hs_wq) with smc_listen_work(), while the
> smc_listen_work() do blocking wait for smc connection established. Once
> the workqueue became congested, it's will block the accpet() from TCP
                                                      ^^^
                                                      accept()
> listen.
> 
> This patch creates a independent workqueue(smc_tcp_ls_wq) for
> smc_tcp_listen_work(), separate it from smc_listen_work(), which is
> quite acceptable considering that smc_tcp_listen_work() runs very fast.
> 
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
>  net/smc/af_smc.c | 13 +++++++++++--
>  net/smc/smc.h    |  1 +
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index d5ea62b..1b40304 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -59,6 +59,7 @@
>  						 * creation on client
>  						 */
>  
> +struct workqueue_struct	*smc_tcp_ls_wq;	/* wq for tcp listen work */
>  struct workqueue_struct	*smc_hs_wq;	/* wq for handshake work */
>  struct workqueue_struct	*smc_close_wq;	/* wq for close work */
>  
> @@ -2124,7 +2125,7 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
>  	lsmc->clcsk_data_ready(listen_clcsock);
>  	if (lsmc->sk.sk_state == SMC_LISTEN) {
>  		sock_hold(&lsmc->sk); /* sock_put in smc_tcp_listen_work() */
> -		if (!queue_work(smc_hs_wq, &lsmc->tcp_listen_work))
> +		if (!queue_work(smc_tcp_ls_wq, &lsmc->tcp_listen_work))
>  			sock_put(&lsmc->sk);
>  	}
>  }
> @@ -2919,9 +2920,14 @@ static int __init smc_init(void)
>  		goto out_nl;
>  
>  	rc = -ENOMEM;
> +
> +	smc_tcp_ls_wq = alloc_workqueue("smc_tcp_ls_wq", 0, 0);
> +	if (!smc_tcp_ls_wq)
> +		goto out_pnet;
> +
>  	smc_hs_wq = alloc_workqueue("smc_hs_wq", 0, 0);
>  	if (!smc_hs_wq)
> -		goto out_pnet;
> +		goto out_alloc_tcp_ls_wq;
>  
>  	smc_close_wq = alloc_workqueue("smc_close_wq", 0, 0);
>  	if (!smc_close_wq)
> @@ -2992,6 +2998,8 @@ static int __init smc_init(void)
>  	destroy_workqueue(smc_close_wq);
>  out_alloc_hs_wq:
>  	destroy_workqueue(smc_hs_wq);
> +out_alloc_tcp_ls_wq:
> +	destroy_workqueue(smc_tcp_ls_wq);
>  out_pnet:
>  	smc_pnet_exit();
>  out_nl:
> @@ -3010,6 +3018,7 @@ static void __exit smc_exit(void)
>  	smc_core_exit();
>  	smc_ib_unregister_client();
>  	destroy_workqueue(smc_close_wq);
> +	destroy_workqueue(smc_tcp_ls_wq);
>  	destroy_workqueue(smc_hs_wq);
>  	proto_unregister(&smc_proto6);
>  	proto_unregister(&smc_proto);
> diff --git a/net/smc/smc.h b/net/smc/smc.h
> index 3d0b8e3..bd2f3dc 100644
> --- a/net/smc/smc.h
> +++ b/net/smc/smc.h
> @@ -264,6 +264,7 @@ static inline struct smc_sock *smc_sk(const struct sock *sk)
>  	return (struct smc_sock *)sk;
>  }
>  
> +extern struct workqueue_struct	*smc_tcp_ls_wq;	/* wq for tcp listen work */

I don't think this extern is needed, the work queue is only used within af_smc.c, right?
Even the smc_hs_wq would not need to be extern, but this would be a future cleanup.
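
A minimal sketch of that cleanup, assuming the queue really is only
used within af_smc.c:

	/* net/smc/af_smc.c */
	static struct workqueue_struct *smc_tcp_ls_wq; /* wq for tcp listen work */

with the extern declaration in net/smc/smc.h dropped.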

>  extern struct workqueue_struct	*smc_hs_wq;	/* wq for handshake work */
>  extern struct workqueue_struct	*smc_close_wq;	/* wq for close work */
>  



* Re: [PATCH v2 net-next 0/3] net/smc: Optimizing performance in
  2022-01-28 14:44 [PATCH v2 net-next 0/3] net/smc: Optimizing performance in D. Wythe
                   ` (2 preceding siblings ...)
  2022-01-28 14:44 ` [PATCH v2 net-next 3/3] net/smc: Fallback when handshake workqueue congested D. Wythe
@ 2022-01-31 12:46 ` Karsten Graul
  2022-02-02 13:00   ` D. Wythe
  3 siblings, 1 reply; 14+ messages in thread
From: Karsten Graul @ 2022-01-31 12:46 UTC (permalink / raw)
  To: D. Wythe; +Cc: kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts

On 28/01/2022 15:44, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 

Cover letter subject: "Optimizing performance in" ??


* Re: [PATCH v2 net-next 1/3] net/smc: Make smc_tcp_listen_work() independent
  2022-01-31 12:45   ` Karsten Graul
@ 2022-02-02 12:53     ` D. Wythe
  0 siblings, 0 replies; 14+ messages in thread
From: D. Wythe @ 2022-02-02 12:53 UTC (permalink / raw)
  To: Karsten Graul
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts

That's right, the 'extern' is unnecessary; I'll remove it soon.

Looking forward to more advice.

Thanks.


>>   
>> +extern struct workqueue_struct	*smc_tcp_ls_wq;	/* wq for tcp listen work */
> 
> I don't think this extern is needed, the work queue is only used within af_smc.c, right?
> Even the smc_hs_wq would not need to be extern, but this would be a future cleanup.
> 
>>   extern struct workqueue_struct	*smc_hs_wq;	/* wq for handshake work */
>>   extern struct workqueue_struct	*smc_close_wq;	/* wq for close work */
>>   


* Re: [PATCH v2 net-next 0/3] net/smc: Optimizing performance in
  2022-01-31 12:46 ` [PATCH v2 net-next 0/3] net/smc: Optimizing performance in Karsten Graul
@ 2022-02-02 13:00   ` D. Wythe
  0 siblings, 0 replies; 14+ messages in thread
From: D. Wythe @ 2022-02-02 13:00 UTC (permalink / raw)
  To: Karsten Graul
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts

Optimizing performance in short-lived links scenarios...

Sorry for that, it was cut off automatically. I don't know how to keep
it as it was. Could you give me any suggestions?

Thanks.

On 2022/1/31 8:46 PM, Karsten Graul wrote:
> On 28/01/2022 15:44, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
> 
> Cover letter subject: "Optimizing performance in" ??


* Re: [PATCH v2 net-next 2/3] net/smc: Limits backlog connections
  2022-01-29  4:37   ` Tony Lu
@ 2022-02-02 14:01     ` D. Wythe
  0 siblings, 0 replies; 14+ messages in thread
From: D. Wythe @ 2022-02-02 14:01 UTC (permalink / raw)
  To: Tony Lu
  Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts

The overhead will certainly exist, but compared with the benefits, I
think it should be acceptable. If it is a concern, maybe we can add a
switch to control it.


> I am wondering if this would introduce more overhead compared with the
> original implementation?
> 
>> +
>> +drop:
>> +	dst_release(dst);
>> +	tcp_listendrop(sk);
>> +	return NULL;
>> +}
>> +
>>   static struct smc_hashinfo smc_v4_hashinfo = {
>>   	.lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock),
>>   };
>> @@ -1491,6 +1519,9 @@ static void smc_listen_out(struct smc_sock *new_smc)
>>   	struct smc_sock *lsmc = new_smc->listen_smc;
>>   	struct sock *newsmcsk = &new_smc->sk;
>>   
>> +	if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
>> +		atomic_dec(&lsmc->smc_pendings);
>> +
>>   	if (lsmc->sk.sk_state == SMC_LISTEN) {
>>   		lock_sock_nested(&lsmc->sk, SINGLE_DEPTH_NESTING);
>>   		smc_accept_enqueue(&lsmc->sk, newsmcsk);
>> @@ -2096,6 +2127,9 @@ static void smc_tcp_listen_work(struct work_struct *work)
>>   		if (!new_smc)
>>   			continue;
>>   
>> +		if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
>> +			atomic_inc(&lsmc->smc_pendings);
>> +
>>   		new_smc->listen_smc = lsmc;
>>   		new_smc->use_fallback = lsmc->use_fallback;
>>   		new_smc->fallback_rsn = lsmc->fallback_rsn;
>> @@ -2163,6 +2197,15 @@ static int smc_listen(struct socket *sock, int backlog)
>>   	smc->clcsock->sk->sk_data_ready = smc_clcsock_data_ready;
>>   	smc->clcsock->sk->sk_user_data =
>>   		(void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
>> +
>> +	/* save origin ops */
>> +	smc->ori_af_ops = inet_csk(smc->clcsock->sk)->icsk_af_ops;
>> +
>> +	smc->af_ops = *smc->ori_af_ops;
>> +	smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock;
>> +
>> +	inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
> 


Only save syn_recv_sock? Maybe the comment is confusing; 'Copy the
origin ops' would be better. The origin ops is a pointer to a const
structure, so we must copy the whole structure and re-point
icsk_af_ops to our own copy. The copy/save is therefore necessary.
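
A minimal sketch of the constraint (mirroring the code in this patch;
ipv4_specific is the shared table used for plain TCP over IPv4):

	/* icsk_af_ops points to a shared const ops table (e.g.
	 * ipv4_specific), so a single callback cannot be patched in place:
	 */
	const struct inet_connection_sock_af_ops *ori =
		inet_csk(smc->clcsock->sk)->icsk_af_ops;

	smc->af_ops = *ori;			/* private, writable copy */
	smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock; /* override one hook */
	inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;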

Thanks.

> Consider saving only the syn_recv_sock field? There seems to be no
> need to save the whole ops structure.
> 
> Thank you,
> Tony Lu


* Re: [PATCH v2 net-next 3/3] net/smc: Fallback when handshake workqueue congested
  2022-01-29  4:33   ` Tony Lu
@ 2022-02-02 14:04     ` D. Wythe
  2022-02-07  7:13     ` D. Wythe
  1 sibling, 0 replies; 14+ messages in thread
From: D. Wythe @ 2022-02-02 14:04 UTC (permalink / raw)
  To: Tony Lu
  Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts


Copy that. I'll try it in the next version.

Thanks.

On 2022/1/29 12:33 PM, Tony Lu wrote:

> Using a netlink knob to control this behavior together with a static
> key should be more flexible, as I mentioned in the previous version of
> this patch.
> 
> Thank you,
> Tony Lu


* Re: [PATCH v2 net-next 3/3] net/smc: Fallback when handshake workqueue congested
  2022-01-29  4:33   ` Tony Lu
  2022-02-02 14:04     ` D. Wythe
@ 2022-02-07  7:13     ` D. Wythe
  2022-02-07  9:37       ` Tony Lu
  1 sibling, 1 reply; 14+ messages in thread
From: D. Wythe @ 2022-02-07  7:13 UTC (permalink / raw)
  To: Tony Lu
  Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts


After some trial and thought, I found that the scope of netlink
control is too large; we should limit the scope to the socket. Adding a
socket option may be a better choice. What do you think?
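
As a rough sketch of the idea (the option name and the per-socket field
below are hypothetical, just to show the shape):

	/* hypothetical addition to smc_setsockopt() */
	case SMC_AUTO_FALLBACK:
		if (optlen < sizeof(int))
			return -EINVAL;
		smc->auto_fallback = !!val;	/* per listen socket switch */
		break;

smc_is_in_limited() could then test the listen socket's flag instead of
a compile-time config.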


On 2022/1/29 12:33 PM, Tony Lu wrote:
> On Fri, Jan 28, 2022 at 10:44:38PM +0800, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>> @@ -19,3 +19,15 @@ config SMC_DIAG
> 
> ...
> 
>> +if SMC
>> +
>> +config SMC_AUTO_FALLBACK
>> +	bool "SMC: automatic fallback to TCP"
>> +	default y
>> +	help
>> +	  Allow automatic fallback to TCP according to the pressure of SMC-R
>> +	  handshake process.
>> +
>> +	  If that's not what you expect, or if unsure, say N.
>> +endif
> 
> Using a netlink knob to control this behavior together with a static
> key should be more flexible, as I mentioned in the previous version of
> this patch.
> 
> Thank you,
> Tony Lu


* Re: [PATCH v2 net-next 3/3] net/smc: Fallback when handshake workqueue congested
  2022-02-07  7:13     ` D. Wythe
@ 2022-02-07  9:37       ` Tony Lu
  0 siblings, 0 replies; 14+ messages in thread
From: Tony Lu @ 2022-02-07  9:37 UTC (permalink / raw)
  To: D. Wythe
  Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma, matthieu.baerts

On Mon, Feb 07, 2022 at 03:13:22PM +0800, D. Wythe wrote:
> 
> After some trial and thought, I found that the scope of netlink
> control is too large; we should limit the scope to the socket. Adding
> a socket option may be a better choice. What do you think?
> 
A socket-level config is a good idea. Maybe we could consider the
netlink setting as the default global behaviour.
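
i.e. roughly (a sketch; all names here are hypothetical):

	/* per-socket setting wins when explicitly set, otherwise use the
	 * netlink-controlled global default
	 */
	static bool smc_should_limit_hs(const struct smc_sock *smc)
	{
		if (smc->limit_hs_set)
			return smc->limit_hs;
		return READ_ONCE(smc_limit_hs_default);
	}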

Thanks,
Tony Lu


end of thread

Thread overview: 14+ messages
2022-01-28 14:44 [PATCH v2 net-next 0/3] net/smc: Optimizing performance in D. Wythe
2022-01-28 14:44 ` [PATCH v2 net-next 1/3] net/smc: Make smc_tcp_listen_work() independent D. Wythe
2022-01-31 12:45   ` Karsten Graul
2022-02-02 12:53     ` D. Wythe
2022-01-28 14:44 ` [PATCH v2 net-next 2/3] net/smc: Limits backlog connections D. Wythe
2022-01-29  4:37   ` Tony Lu
2022-02-02 14:01     ` D. Wythe
2022-01-28 14:44 ` [PATCH v2 net-next 3/3] net/smc: Fallback when handshake workqueue congested D. Wythe
2022-01-29  4:33   ` Tony Lu
2022-02-02 14:04     ` D. Wythe
2022-02-07  7:13     ` D. Wythe
2022-02-07  9:37       ` Tony Lu
2022-01-31 12:46 ` [PATCH v2 net-next 0/3] net/smc: Optimizing performance in Karsten Graul
2022-02-02 13:00   ` D. Wythe
