* [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections
From: D.Wythe @ 2022-10-23 12:43 UTC
  To: kgraul, wenjia, jaka; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D.Wythe" <alibuda@linux.alibaba.com>

This patch set attempts to optimize the parallelism of SMC-R connections,
mainly by reducing unnecessary blocking on locks, and by fixing the
exceptions that occur after those optimizations.

According to the off-CPU graph, the SMC workers' off-CPU time breaks
down as follows:

smc_close_passive_work                  (1.09%)
        smcr_buf_unuse                  (1.08%)
                smc_llc_flow_initiate   (1.02%)

smc_listen_work                         (48.17%)
        __mutex_lock.isra.11            (47.96%)


An ideal SMC-R connection should only block on network IO events,
but it is quite clear that SMC-R connections currently spend most of
their time queued on locks.

The goal of this patch set is to reach that ideal: blocking only on
network IO events for the majority of the connection lifetime.

There are three big locks here:

1. smc_client_lgr_pending & smc_server_lgr_pending

2. llc_conf_mutex

3. rmbs_lock & sndbufs_lock

And one implementation issue:

1. confirm/delete rkey messages cannot be sent concurrently, although
the protocol actually allows it.

Unfortunately, the above problems together limit the parallelism of
SMC-R connections. If any one of them is left unsolved, our goal
cannot be achieved.

After this patch set, we get a fairly ideal off-CPU graph, as
follows:

smc_close_passive_work                                  (41.58%)
        smcr_buf_unuse                                  (41.57%)
                smc_llc_do_delete_rkey                  (41.57%)

smc_listen_work                                         (39.10%)
        smc_clc_wait_msg                                (13.18%)
		tcp_recvmsg_locked                      (13.18%)
        smc_listen_find_device                          (25.87%)
                smcr_lgr_reg_rmbs                       (25.87%)
                        smc_llc_do_confirm_rkey         (25.87%)

We can see that most of the waiting time is now spent waiting for
network IO events. This also brings a certain performance improvement
in our short-lived connection wrk/nginx benchmark test:

+--------------+------+------+-------+--------+------+--------+
|conns/qps     |c4    | c8   |  c16  |  c32   | c64  |  c200  |
+--------------+------+------+-------+--------+------+--------+
|SMC-R before  |9.7k  | 10k  |  10k  |  9.9k  | 9.1k |  8.9k  |
+--------------+------+------+-------+--------+------+--------+
|SMC-R now     |13k   | 19k  |  18k  |  16k   | 15k  |  12k   |
+--------------+------+------+-------+--------+------+--------+
|TCP           |15k   | 35k  |  51k  |  80k   | 100k |  162k  |
+--------------+------+------+-------+--------+------+--------+

The reason the benefit is less obvious as the number of connections
grows is the workqueue. If we change the workqueue to UNBOUND, we can
obtain at least a 4-5x performance improvement, reaching up to half of
TCP. However, that is not an elegant solution, and optimizing it will
be much more complicated. In any case, we will submit the relevant
optimization patches as soon as possible.
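
For illustration only: making the handshake workqueue unbound is, in
its most minimal form, a one-flag change (a sketch, assuming the
workqueue is still allocated as in the current smc_init(); this is
not part of this series):

	/* current: bound workqueue, handshake work stays on the CPU
	 * it was queued on
	 */
	smc_hs_wq = alloc_workqueue("smc_hs_wq", 0, 0);

	/* sketch: an unbound workqueue lets the scheduler spread
	 * handshake work across CPUs
	 */
	smc_hs_wq = alloc_workqueue("smc_hs_wq", WQ_UNBOUND, 0);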

Please note that the premise here is that the lock-related problems
must be solved first; otherwise, no matter how we optimize the
workqueue, there won't be much improvement.

Because there are a lot of related changes to the code, if you have
any questions or suggestions, please let me know.

Thanks
D. Wythe

v1 -> v2:

1. Fix panic in SMC-D scenario
2. Fix lnkc-related hashfn calculation error, caused by operator
precedence
3. Only wake up one connection if the lnk is not active
4. Delete obsolete unlock logic in smc_listen_work()
5. PATCH format, do reverse Christmas tree
6. PATCH format, change all xxx_lnk_xxx functions to xxx_link_xxx
7. PATCH format, add correct Fixes tags to the bug-fix patches
8. PATCH format, fix some spelling errors
9. PATCH format, rename slow to do_slow

v2 -> v3:

1. Add SMC-D support, remove the concept of link cluster since SMC-D
has no links at all. Replace it with the lgr decision maker, which
provides suggestions to SMC-D and SMC-R on whether to create a new
link group.

2. Fix the corruption problem described by PATCH 'fix application
data exception' on SMC-D.

v3 -> v4:

1. Fix panic caused by an uninitialized map.

D. Wythe (10):
  net/smc: remove locks smc_client_lgr_pending and
    smc_server_lgr_pending
  net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending
  net/smc: allow confirm/delete rkey response deliver multiplex
  net/smc: make SMC_LLC_FLOW_RKEY run concurrently
  net/smc: llc_conf_mutex refactor, replace it with rw_semaphore
  net/smc: use read semaphores to reduce unnecessary blocking in
    smc_buf_create() & smcr_buf_unuse()
  net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs()
  net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore
  net/smc: Fix potential panic due to unprotected
    smc_llc_srv_add_link()
  net/smc: fix application data exception

 net/smc/af_smc.c   |  70 ++++----
 net/smc/smc_core.c | 478 +++++++++++++++++++++++++++++++++++++++++++++++------
 net/smc/smc_core.h |  36 +++-
 net/smc/smc_llc.c  | 277 ++++++++++++++++++++++---------
 net/smc/smc_llc.h  |   6 +
 net/smc/smc_wr.c   |  10 --
 net/smc/smc_wr.h   |  10 ++
 7 files changed, 712 insertions(+), 175 deletions(-)

-- 
1.8.3.1



* [PATCH net-next v4 01/10] net/smc: remove locks smc_client_lgr_pending and smc_server_lgr_pending
From: D.Wythe @ 2022-10-23 12:43 UTC
  To: kgraul, wenjia, jaka; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch attempts to remove the locks named smc_client_lgr_pending
and smc_server_lgr_pending, which aim to serialize the creation of
link groups. However, once a link group already exists, those locks
are meaningless; worse still, they force incoming connections to be
queued one after another.

Now, the creation of a link group is no longer decided by lock
competition, but through the following strategy (see the sketch
below):

1. Try to find a suitable link group; if successful, the current
connection is considered a non-first-contact connection. Done.

2. Check the number of connections currently waiting for a suitable
link group to be created; if it is not less than the number of link
groups to be created multiplied by (SMC_RMBS_PER_LGR_MAX - 1), then
increase the number of link groups to be created, and the current
connection is considered the first-contact connection. Done.

3. Increase the number of connections currently waiting, and wait
to be woken up.

4. Decrease the number of connections currently waiting, goto 1.

We wake up the connections that were put to sleep in stage 3 through
SMC link state change events. Once a link moves out of the
SMC_LNK_ACTIVATING state, we decrease the number of link groups to
be created, and then wake up at most (SMC_RMBS_PER_LGR_MAX - 1)
connections.
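
A condensed sketch of how smc_conn_create() implements this strategy
(find_usable_lgr() and lgr_created() are illustrative stand-ins, and
all locking is omitted):

	ldm = smc_get_or_create_lgr_decision_maker(ini);
	while (1) {
		/* stage 1: reuse an existing link group if possible */
		if (find_usable_lgr(ini))
			return 0;	/* non-first-contact, done */
		/* stage 2: enough waiters already? then this connection
		 * becomes the first contact and creates a link group
		 */
		if (ldm->conns_pending >= ldm->pending_capability) {
			ldm->pending_capability += SMC_RMBS_PER_LGR_MAX - 1;
			return smc_lgr_create(smc, ini);
		}
		/* stages 3 + 4: sleep until a link state change */
		ldm->conns_pending++;
		wait_event_timeout(ldm->wq, lgr_created(ldm), CLC_WAIT_TIME);
		ldm->conns_pending--;
	}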

In the implementation, we introduce the concept of an lgr decision
maker, which groups links with the same characteristics (see
smcr_lgr_decision_maker_cmpfn() for more details); this makes it
possible to wake up waiters efficiently in N-vs-1 scenarios.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/af_smc.c   |  41 ++----
 net/smc/smc_core.c | 375 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 net/smc/smc_core.h |  10 ++
 3 files changed, 392 insertions(+), 34 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index e44ca70..e369ab6 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -53,13 +53,6 @@
 #include "smc_tracepoint.h"
 #include "smc_sysctl.h"
 
-static DEFINE_MUTEX(smc_server_lgr_pending);	/* serialize link group
-						 * creation on server
-						 */
-static DEFINE_MUTEX(smc_client_lgr_pending);	/* serialize link group
-						 * creation on client
-						 */
-
 static struct workqueue_struct	*smc_tcp_ls_wq;	/* wq for tcp listen work */
 struct workqueue_struct	*smc_hs_wq;	/* wq for handshake work */
 struct workqueue_struct	*smc_close_wq;	/* wq for close work */
@@ -1196,10 +1189,8 @@ static int smc_connect_rdma(struct smc_sock *smc,
 	if (reason_code)
 		return reason_code;
 
-	mutex_lock(&smc_client_lgr_pending);
 	reason_code = smc_conn_create(smc, ini);
 	if (reason_code) {
-		mutex_unlock(&smc_client_lgr_pending);
 		return reason_code;
 	}
 
@@ -1291,7 +1282,6 @@ static int smc_connect_rdma(struct smc_sock *smc,
 		if (reason_code)
 			goto connect_abort;
 	}
-	mutex_unlock(&smc_client_lgr_pending);
 
 	smc_copy_sock_settings_to_clc(smc);
 	smc->connect_nonblock = 0;
@@ -1301,7 +1291,6 @@ static int smc_connect_rdma(struct smc_sock *smc,
 	return 0;
 connect_abort:
 	smc_conn_abort(smc, ini->first_contact_local);
-	mutex_unlock(&smc_client_lgr_pending);
 	smc->connect_nonblock = 0;
 
 	return reason_code;
@@ -1347,11 +1336,8 @@ static int smc_connect_ism(struct smc_sock *smc,
 	}
 	ini->ism_peer_gid[ini->ism_selected] = aclc->d0.gid;
 
-	/* there is only one lgr role for SMC-D; use server lock */
-	mutex_lock(&smc_server_lgr_pending);
 	rc = smc_conn_create(smc, ini);
 	if (rc) {
-		mutex_unlock(&smc_server_lgr_pending);
 		return rc;
 	}
 
@@ -1378,7 +1364,6 @@ static int smc_connect_ism(struct smc_sock *smc,
 				  aclc->hdr.version, eid, NULL);
 	if (rc)
 		goto connect_abort;
-	mutex_unlock(&smc_server_lgr_pending);
 
 	smc_copy_sock_settings_to_clc(smc);
 	smc->connect_nonblock = 0;
@@ -1388,7 +1373,6 @@ static int smc_connect_ism(struct smc_sock *smc,
 	return 0;
 connect_abort:
 	smc_conn_abort(smc, ini->first_contact_local);
-	mutex_unlock(&smc_server_lgr_pending);
 	smc->connect_nonblock = 0;
 
 	return rc;
@@ -1504,6 +1488,9 @@ static int __smc_connect(struct smc_sock *smc)
 
 	SMC_STAT_CLNT_SUCC_INC(sock_net(smc->clcsock->sk), aclc);
 	smc_connect_ism_vlan_cleanup(smc, ini);
+	if (ini->first_contact_local)
+		smc_lgr_decision_maker_on_first_contact_done(ini, true /* success */);
+
 	kfree(buf);
 	kfree(ini);
 	return 0;
@@ -1512,6 +1499,8 @@ static int __smc_connect(struct smc_sock *smc)
 	smc_connect_ism_vlan_cleanup(smc, ini);
 	kfree(buf);
 fallback:
+	if (ini->first_contact_local)
+		smc_lgr_decision_maker_on_first_contact_done(ini, false /* fail */);
 	kfree(ini);
 	return smc_connect_decline_fallback(smc, rc, version);
 }
@@ -2378,7 +2367,6 @@ static void smc_listen_work(struct work_struct *work)
 	if (rc)
 		goto out_decl;
 
-	mutex_lock(&smc_server_lgr_pending);
 	smc_close_init(new_smc);
 	smc_rx_init(new_smc);
 	smc_tx_init(new_smc);
@@ -2386,18 +2374,14 @@ static void smc_listen_work(struct work_struct *work)
 	/* determine ISM or RoCE device used for connection */
 	rc = smc_listen_find_device(new_smc, pclc, ini);
 	if (rc)
-		goto out_unlock;
+		goto out_decl;
 
 	/* send SMC Accept CLC message */
 	accept_version = ini->is_smcd ? ini->smcd_version : ini->smcr_version;
 	rc = smc_clc_send_accept(new_smc, ini->first_contact_local,
 				 accept_version, ini->negotiated_eid);
 	if (rc)
-		goto out_unlock;
-
-	/* SMC-D does not need this lock any more */
-	if (ini->is_smcd)
-		mutex_unlock(&smc_server_lgr_pending);
+		goto out_decl;
 
 	/* receive SMC Confirm CLC message */
 	memset(buf, 0, sizeof(*buf));
@@ -2405,8 +2389,6 @@ static void smc_listen_work(struct work_struct *work)
 	rc = smc_clc_wait_msg(new_smc, cclc, sizeof(*buf),
 			      SMC_CLC_CONFIRM, CLC_WAIT_TIME);
 	if (rc) {
-		if (!ini->is_smcd)
-			goto out_unlock;
 		goto out_decl;
 	}
 
@@ -2415,17 +2397,18 @@ static void smc_listen_work(struct work_struct *work)
 		rc = smc_listen_rdma_finish(new_smc, cclc,
 					    ini->first_contact_local, ini);
 		if (rc)
-			goto out_unlock;
-		mutex_unlock(&smc_server_lgr_pending);
+			goto out_decl;
 	}
 	smc_conn_save_peer_info(new_smc, cclc);
 	smc_listen_out_connected(new_smc);
 	SMC_STAT_SERV_SUCC_INC(sock_net(newclcsock->sk), ini);
+	if (ini->first_contact_local)
+		smc_lgr_decision_maker_on_first_contact_done(ini, true /* success */);
 	goto out_free;
 
-out_unlock:
-	mutex_unlock(&smc_server_lgr_pending);
 out_decl:
+	if (ini && ini->first_contact_local)
+		smc_lgr_decision_maker_on_first_contact_done(ini, false /* fail */);
 	smc_listen_decline(new_smc, rc, ini ? ini->first_contact_local : 0,
 			   proposal_version);
 out_free:
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index e6ee797..fff41c3 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -46,6 +46,282 @@ struct smc_lgr_list smc_lgr_list = {	/* established link groups */
 	.num = 0,
 };
 
+struct smc_lgr_decision_maker {
+	struct rhash_head	rnode;	/* node for rhashtable */
+	struct wait_queue_head	wq;	/* queue for connection that have been
+					 * decided to wait
+					 */
+	spinlock_t		lock;	/* protection for decision maker */
+	refcount_t		ref;	/* refcount for decision maker */
+	int			type;	/* smc type */
+	int			role;	/* smc role */
+	unsigned long		pending_capability;
+					/* maximum pending number of connections that
+					 * need to wait.
+					 */
+	unsigned long		conns_pending;
+					/* the number of pending connections */
+	u32			lacking_first_contact;
+					/* indicate that the connection
+					 * should perform first contact.
+					 */
+};
+
+struct smcr_lgr_decision_maker {
+	struct smc_lgr_decision_maker	ldm;
+	u8	peer_systemid[SMC_SYSTEMID_LEN];
+	u8	peer_mac[ETH_ALEN];	/* = gid[8:10||13:15] */
+	u8	peer_gid[SMC_GID_SIZE];	/* gid of peer*/
+	int	clcqpn;
+};
+
+struct smcd_lgr_decision_maker {
+	struct smc_lgr_decision_maker  ldm;
+	u64	peer_gid;
+	struct	smcd_dev *dev;
+};
+
+static int smcr_lgr_decision_maker_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
+{
+	const struct smc_init_info *ini = arg->key;
+	const struct smcr_lgr_decision_maker *rldm = obj;
+
+	if (ini->role != rldm->ldm.role)
+		return 1;
+
+	if (memcmp(ini->peer_systemid, rldm->peer_systemid, SMC_SYSTEMID_LEN))
+		return 1;
+
+	if (memcmp(ini->peer_gid, rldm->peer_gid, SMC_GID_SIZE))
+		return 1;
+
+	if ((ini->role == SMC_SERV || ini->ib_clcqpn == rldm->clcqpn) &&
+	    (ini->smcr_version == SMC_V2 ||
+	    !memcmp(ini->peer_mac, rldm->peer_mac, ETH_ALEN)))
+		return 0;
+
+	return 1;
+}
+
+static u32 smcr_lgr_decision_maker_hashfn(const void *data, u32 len, u32 seed)
+{
+	const struct smcr_lgr_decision_maker *rldm = data;
+
+	return jhash2((u32 *)rldm->peer_systemid, SMC_SYSTEMID_LEN / sizeof(u32), seed)
+		+ ((rldm->ldm.role == SMC_SERV) ? 0 : rldm->clcqpn);
+}
+
+static u32 smcr_lgr_decision_maker_arg_hashfn(const void *data, u32 len, u32 seed)
+{
+	const struct smc_init_info *ini = data;
+
+	return jhash2((u32 *)ini->peer_systemid, SMC_SYSTEMID_LEN / sizeof(u32), seed)
+		+ ((ini->role == SMC_SERV) ? 0 : ini->ib_clcqpn);
+}
+
+static void smcr_lgr_decision_maker_init(struct smc_lgr_decision_maker *ldm,
+					 struct smc_init_info *ini)
+{
+	struct smcr_lgr_decision_maker *rldm = (struct smcr_lgr_decision_maker *)ldm;
+
+	memcpy(rldm->peer_systemid, ini->peer_systemid, SMC_SYSTEMID_LEN);
+	memcpy(rldm->peer_gid, ini->peer_gid, SMC_GID_SIZE);
+	memcpy(rldm->peer_mac, ini->peer_mac, ETH_ALEN);
+	rldm->clcqpn = ini->ib_clcqpn;
+}
+
+static int smcd_lgr_decision_maker_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
+{
+	const struct smc_init_info *ini = arg->key;
+	const struct smcd_lgr_decision_maker *dldm = obj;
+
+	if (ini->role != dldm->ldm.role)
+		return 1;
+
+	if (ini->ism_peer_gid[ini->ism_selected] != dldm->peer_gid)
+		return 1;
+
+	if (ini->ism_dev[ini->ism_selected] != dldm->dev)
+		return 1;
+
+	return 0;
+}
+
+static u32 smcd_lgr_decision_maker_hashfn(const void *data, u32 len, u32 seed)
+{
+	const struct smcd_lgr_decision_maker *dlcm = data;
+
+	return jhash2((u32 *)&dlcm->peer_gid, sizeof(dlcm->peer_gid) / sizeof(u32), seed);
+}
+
+static u32 smcd_lgr_decision_maker_arg_hashfn(const void *data, u32 len, u32 seed)
+{
+	const struct smc_init_info *ini = data;
+	u64 select_gid;
+
+	select_gid = ini->ism_peer_gid[ini->ism_selected];
+	return jhash2((u32 *)&select_gid, sizeof(select_gid) / sizeof(u32), seed);
+}
+
+static void smcd_lgr_decision_maker_init(struct smc_lgr_decision_maker *ldm,
+					 struct smc_init_info *ini)
+{
+	struct smcd_lgr_decision_maker *dldm = (struct smcd_lgr_decision_maker *)ldm;
+
+	dldm->peer_gid = ini->ism_peer_gid[ini->ism_selected];
+	dldm->dev = ini->ism_dev[ini->ism_selected];
+}
+
+struct smc_lgr_decision_builder {
+	struct rhashtable	map;
+	spinlock_t	map_lock;	/* protect map */
+	struct rhashtable_params	default_params;
+					/* how to search and insert decision maker by ini */
+	void (*init)(struct smc_lgr_decision_maker *ldm, struct smc_init_info *ini);
+					/* init maker by ini */
+	u32	sz;			/* size */
+};
+
+static struct smc_lgr_decision_builder	smc_lgr_decision_set[SMC_TYPE_D + 1] = {
+	/* SMC_TYPE_R = 0 */
+	{
+		.sz		= sizeof(struct smcr_lgr_decision_maker),
+		.init		= smcr_lgr_decision_maker_init,
+		.map_lock	= __SPIN_LOCK_UNLOCKED(smc_lgr_decision_set[SMC_TYPE_R].map_lock),
+		.default_params = {
+			.head_offset = offsetof(struct smc_lgr_decision_maker, rnode),
+			.key_len = sizeof(struct smc_init_info),
+			.obj_cmpfn = smcr_lgr_decision_maker_cmpfn,
+			.obj_hashfn = smcr_lgr_decision_maker_hashfn,
+			.hashfn = smcr_lgr_decision_maker_arg_hashfn,
+			.automatic_shrinking = true,
+		},
+	},
+	/* SMC_TYPE_D = 1 */
+	{
+		.sz		= sizeof(struct smcd_lgr_decision_maker),
+		.init		= smcd_lgr_decision_maker_init,
+		.map_lock	= __SPIN_LOCK_UNLOCKED(smc_lgr_decision_set[SMC_TYPE_D].map_lock),
+		.default_params = {
+			.head_offset = offsetof(struct smc_lgr_decision_maker, rnode),
+			.key_len = sizeof(struct smc_init_info),
+			.obj_cmpfn = smcd_lgr_decision_maker_cmpfn,
+			.obj_hashfn = smcd_lgr_decision_maker_hashfn,
+			.hashfn = smcd_lgr_decision_maker_arg_hashfn,
+			.automatic_shrinking = true,
+		},
+	},
+};
+
+/* hold a reference for smc_lgr_decision_maker */
+static inline void smc_lgr_decision_maker_hold(struct smc_lgr_decision_maker *ldm)
+{
+	if (likely(ldm))
+		refcount_inc(&ldm->ref);
+}
+
+/* release a reference for smc_lgr_decision_maker */
+static inline void smc_lgr_decision_maker_put(struct smc_lgr_decision_maker *ldm)
+{
+	bool do_free = false;
+	int type;
+
+	if (unlikely(!ldm))
+		return;
+
+	if (refcount_dec_not_one(&ldm->ref))
+		return;
+
+	type = ldm->type;
+
+	spin_lock_bh(&smc_lgr_decision_set[type].map_lock);
+	/* last ref */
+	if (refcount_dec_and_test(&ldm->ref)) {
+		do_free = true;
+		rhashtable_remove_fast(&smc_lgr_decision_set[type].map, &ldm->rnode,
+				       smc_lgr_decision_set[type].default_params);
+	}
+	spin_unlock_bh(&smc_lgr_decision_set[type].map_lock);
+	if (do_free)
+		kfree(ldm);
+}
+
+static struct smc_lgr_decision_maker *
+smc_get_or_create_lgr_decision_maker(struct smc_init_info *ini)
+{
+	struct smc_lgr_decision_maker *ldm;
+	int err, type;
+
+	type = ini->is_smcd ? SMC_TYPE_D : SMC_TYPE_R;
+
+	spin_lock_bh(&smc_lgr_decision_set[type].map_lock);
+	ldm = rhashtable_lookup_fast(&smc_lgr_decision_set[type].map, ini,
+				     smc_lgr_decision_set[type].default_params);
+	if (!ldm) {
+		ldm = kzalloc(smc_lgr_decision_set[type].sz, GFP_ATOMIC);
+		if (unlikely(!ldm))
+			goto fail;
+
+		/* common init */
+		spin_lock_init(&ldm->lock);
+		init_waitqueue_head(&ldm->wq);
+		refcount_set(&ldm->ref, 1);
+		ldm->type = type;
+		ldm->role = ini->role;
+
+		/* init */
+		if (smc_lgr_decision_set[type].init)
+			smc_lgr_decision_set[type].init(ldm, ini);
+
+		err = rhashtable_insert_fast(&smc_lgr_decision_set[type].map,
+					     &ldm->rnode,
+					     smc_lgr_decision_set[type].default_params);
+		if (unlikely(err)) {
+			pr_warn_ratelimited("smc: rhashtable_insert_fast failed (%d)\n", err);
+			kfree(ldm);
+			ldm = NULL;
+		}
+	} else {
+		smc_lgr_decision_maker_hold(ldm);
+	}
+fail:
+	spin_unlock_bh(&smc_lgr_decision_set[type].map_lock);
+	return ldm;
+}
+
+void smc_lgr_decision_maker_on_first_contact_done(struct smc_init_info *ini, bool success)
+{
+	struct smc_lgr_decision_maker *ldm;
+	int nr;
+
+	if (unlikely(!ini->first_contact_local))
+		return;
+
+	/* get lgr decision maker */
+	ldm = ini->ldm;
+
+	if (unlikely(!ldm))
+		return;
+
+	spin_lock_bh(&ldm->lock);
+	ldm->pending_capability -= (SMC_RMBS_PER_LGR_MAX - 1);
+	nr = SMC_RMBS_PER_LGR_MAX - 1;
+	if (unlikely(!success) && ldm->role == SMC_SERV) {
+		ldm->lacking_first_contact++;
+		/* only wake up one connection to perform
+		 * first contact on the server side; the client MUST
+		 * wake up all waiters to decline.
+		 */
+		nr = 1;
+	}
+	spin_unlock_bh(&ldm->lock);
+	if (nr)
+		wake_up_nr(&ldm->wq, nr);
+
+	/* hold in smc_lgr_create */
+	smc_lgr_decision_maker_put(ldm);
+}
+
 static atomic_t lgr_cnt = ATOMIC_INIT(0); /* number of existing link groups */
 static DECLARE_WAIT_QUEUE_HEAD(lgrs_deleted);
 
@@ -756,6 +1032,7 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
 	lnk->link_id = smcr_next_link_id(lgr);
 	lnk->lgr = lgr;
 	smc_lgr_hold(lgr); /* lgr_put in smcr_link_clear() */
+	rwlock_init(&lnk->rtokens_lock);
 	lnk->link_idx = link_idx;
 	lnk->wr_rx_id_compl = 0;
 	smc_ibdev_cnt_inc(lnk);
@@ -914,6 +1191,11 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini)
 		atomic_inc(&lgr_cnt);
 	}
 	smc->conn.lgr = lgr;
+
+	lgr->ldm = ini->ldm;
+	/* smc_lgr_decision_maker_put in __smc_lgr_free() */
+	smc_lgr_decision_maker_hold(lgr->ldm);
+
 	spin_lock_bh(lgr_lock);
 	list_add_tail(&lgr->list, lgr_list);
 	spin_unlock_bh(lgr_lock);
@@ -1363,6 +1645,9 @@ static void __smc_lgr_free(struct smc_link_group *lgr)
 		if (!atomic_dec_return(&lgr_cnt))
 			wake_up(&lgrs_deleted);
 	}
+	/* smc_lgr_decision_maker_hold in smc_lgr_create() */
+	if (lgr->ldm)
+		smc_lgr_decision_maker_put(lgr->ldm);
 	kfree(lgr);
 }
 
@@ -1851,11 +2136,13 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
 {
 	struct smc_connection *conn = &smc->conn;
 	struct net *net = sock_net(&smc->sk);
+	DECLARE_WAITQUEUE(wait, current);
+	struct smc_lgr_decision_maker *ldm = NULL;
 	struct list_head *lgr_list;
 	struct smc_link_group *lgr;
 	enum smc_lgr_role role;
 	spinlock_t *lgr_lock;
-	int rc = 0;
+	int rc = 0, timeo = CLC_WAIT_TIME;
 
 	lgr_list = ini->is_smcd ? &ini->ism_dev[ini->ism_selected]->lgr_list :
 				  &smc_lgr_list.list;
@@ -1863,12 +2150,24 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
 				  &smc_lgr_list.lock;
 	ini->first_contact_local = 1;
 	role = smc->listen_smc ? SMC_SERV : SMC_CLNT;
-	if (role == SMC_CLNT && ini->first_contact_peer)
+	ini->role = role;
+
+	ldm = smc_get_or_create_lgr_decision_maker(ini);
+	if (unlikely(!ldm))
+		return SMC_CLC_DECL_INTERR;
+
+	if (role == SMC_CLNT && ini->first_contact_peer) {
+		spin_lock_bh(&ldm->lock);
+		ldm->pending_capability += (SMC_RMBS_PER_LGR_MAX - 1);
+		spin_unlock_bh(&ldm->lock);
 		/* create new link group as well */
 		goto create;
+	}
 
 	/* determine if an existing link group can be reused */
 	spin_lock_bh(lgr_lock);
+	spin_lock(&ldm->lock);
+again:
 	list_for_each_entry(lgr, lgr_list, list) {
 		write_lock_bh(&lgr->conns_lock);
 		if ((ini->is_smcd ?
@@ -1895,9 +2194,42 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
 		}
 		write_unlock_bh(&lgr->conns_lock);
 	}
+	/* make decision */
+	if (ini->first_contact_local) {
+		if (ldm->pending_capability > ldm->conns_pending) {
+			ldm->conns_pending++;
+			add_wait_queue(&ldm->wq, &wait);
+			spin_unlock(&ldm->lock);
+			spin_unlock_bh(lgr_lock);
+			set_current_state(TASK_INTERRUPTIBLE);
+			/* need to wait at least once first contact done */
+			timeo = schedule_timeout(timeo);
+			set_current_state(TASK_RUNNING);
+			remove_wait_queue(&ldm->wq, &wait);
+			spin_lock_bh(lgr_lock);
+			spin_lock(&ldm->lock);
+
+			ldm->conns_pending--;
+			if (likely(timeo && !ldm->lacking_first_contact))
+				goto again;
+
+			/* lnk create failed, only server side can raise
+			 * a new first contact. client side here will
+			 * fallback by SMC_CLC_DECL_SYNCERR.
+			 */
+			if (role == SMC_SERV && ldm->lacking_first_contact)
+				ldm->lacking_first_contact--;
+		}
+		if (role == SMC_SERV) {
+			/* first_contact */
+			ldm->pending_capability += (SMC_RMBS_PER_LGR_MAX - 1);
+		}
+	}
+
+	spin_unlock(&ldm->lock);
 	spin_unlock_bh(lgr_lock);
 	if (rc)
-		return rc;
+		goto out;
 
 	if (role == SMC_CLNT && !ini->first_contact_peer &&
 	    ini->first_contact_local) {
@@ -1905,11 +2237,15 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
 		 * a new one
 		 * send out_of_sync decline, reason synchr. error
 		 */
-		return SMC_CLC_DECL_SYNCERR;
+		rc = SMC_CLC_DECL_SYNCERR;
+		goto out;
 	}
 
 create:
 	if (ini->first_contact_local) {
+		ini->ldm = ldm;
+		/* smc_lgr_decision_maker_put in first_contact_done() */
+		smc_lgr_decision_maker_hold(ldm);
 		rc = smc_lgr_create(smc, ini);
 		if (rc)
 			goto out;
@@ -1942,6 +2278,8 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
 #endif
 
 out:
+	/* smc_lgr_decision_maker_hold in smc_get_or_create_lgr_decision_maker() */
+	smc_lgr_decision_maker_put(ldm);
 	return rc;
 }
 
@@ -2504,19 +2842,24 @@ int smc_rtoken_add(struct smc_link *lnk, __be64 nw_vaddr, __be32 nw_rkey)
 	u32 rkey = ntohl(nw_rkey);
 	int i;
 
+	write_lock_bh(&lnk->rtokens_lock);
 	for (i = 0; i < SMC_RMBS_PER_LGR_MAX; i++) {
 		if (lgr->rtokens[i][lnk->link_idx].rkey == rkey &&
 		    lgr->rtokens[i][lnk->link_idx].dma_addr == dma_addr &&
 		    test_bit(i, lgr->rtokens_used_mask)) {
 			/* already in list */
+			write_unlock_bh(&lnk->rtokens_lock);
 			return i;
 		}
 	}
 	i = smc_rmb_reserve_rtoken_idx(lgr);
-	if (i < 0)
+	if (i < 0) {
+		write_unlock_bh(&lnk->rtokens_lock);
 		return i;
+	}
 	lgr->rtokens[i][lnk->link_idx].rkey = rkey;
 	lgr->rtokens[i][lnk->link_idx].dma_addr = dma_addr;
+	write_unlock_bh(&lnk->rtokens_lock);
 	return i;
 }
 
@@ -2527,6 +2870,7 @@ int smc_rtoken_delete(struct smc_link *lnk, __be32 nw_rkey)
 	u32 rkey = ntohl(nw_rkey);
 	int i, j;
 
+	write_lock_bh(&lnk->rtokens_lock);
 	for (i = 0; i < SMC_RMBS_PER_LGR_MAX; i++) {
 		if (lgr->rtokens[i][lnk->link_idx].rkey == rkey &&
 		    test_bit(i, lgr->rtokens_used_mask)) {
@@ -2535,9 +2879,11 @@ int smc_rtoken_delete(struct smc_link *lnk, __be32 nw_rkey)
 				lgr->rtokens[i][j].dma_addr = 0;
 			}
 			clear_bit(i, lgr->rtokens_used_mask);
+			write_unlock_bh(&lnk->rtokens_lock);
 			return 0;
 		}
 	}
+	write_unlock_bh(&lnk->rtokens_lock);
 	return -ENOENT;
 }
 
@@ -2603,12 +2949,31 @@ static int smc_core_reboot_event(struct notifier_block *this,
 
 int __init smc_core_init(void)
 {
+	int i;
+
+	/* init smc lgr decision maker builder */
+	for (i = 0; i <= SMC_TYPE_D; i++)
+		rhashtable_init(&smc_lgr_decision_set[i].map,
+				&smc_lgr_decision_set[i].default_params);
+
 	return register_reboot_notifier(&smc_reboot_notifier);
 }
 
+static void smc_lgr_decision_maker_free_cb(void *ptr, void *arg)
+{
+	kfree(ptr);
+}
+
 /* Called (from smc_exit) when module is removed */
 void smc_core_exit(void)
 {
+	int i;
+
 	unregister_reboot_notifier(&smc_reboot_notifier);
 	smc_lgrs_shutdown();
+
+	/* destroy smc lgr decision maker builder */
+	for (i = 0; i <= SMC_TYPE_D; i++)
+		rhashtable_free_and_destroy(&smc_lgr_decision_set[i].map,
+					    smc_lgr_decision_maker_free_cb, NULL);
 }
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 285f9bd..ad22751 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -15,6 +15,7 @@
 #include <linux/atomic.h>
 #include <linux/smc.h>
 #include <linux/pci.h>
+#include <linux/rhashtable.h>
 #include <rdma/ib_verbs.h>
 #include <net/genetlink.h>
 
@@ -107,6 +108,7 @@ struct smc_link {
 	u32			wr_tx_cnt;	/* number of WR send buffers */
 	wait_queue_head_t	wr_tx_wait;	/* wait for free WR send buf */
 	atomic_t		wr_tx_refcnt;	/* tx refs to link */
+	rwlock_t		rtokens_lock;
 
 	struct smc_wr_buf	*wr_rx_bufs;	/* WR recv payload buffers */
 	struct ib_recv_wr	*wr_rx_ibs;	/* WR recv meta data */
@@ -244,6 +246,8 @@ struct smc_llc_flow {
 	struct smc_llc_qentry *qentry;
 };
 
+struct smc_lgr_decision_maker;
+
 struct smc_link_group {
 	struct list_head	list;
 	struct rb_root		conns_all;	/* connection tree */
@@ -335,6 +339,8 @@ struct smc_link_group {
 						/* peer triggered shutdownn */
 		};
 	};
+	struct smc_lgr_decision_maker	*ldm;
+						/* who decides to create this lgr */
 };
 
 struct smc_clc_msg_local;
@@ -373,6 +379,7 @@ struct smc_init_info {
 	unsigned short		vlan_id;
 	u32			rc;
 	u8			negotiated_eid[SMC_MAX_EID_LEN];
+	struct smc_lgr_decision_maker *ldm;
 	/* SMC-R */
 	u8			smcr_version;
 	u8			check_smcrv2;
@@ -391,6 +398,7 @@ struct smc_init_info {
 	u8			ism_offered_cnt; /* # of ISM devices offered */
 	u8			ism_selected;    /* index of selected ISM dev*/
 	u8			smcd_version;
+	u8			role;
 };
 
 /* Find the connection associated with the given alert token in the link group.
@@ -559,6 +567,8 @@ struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
 int smcr_nl_get_link(struct sk_buff *skb, struct netlink_callback *cb);
 int smcd_nl_get_lgr(struct sk_buff *skb, struct netlink_callback *cb);
 
+void smc_lgr_decision_maker_on_first_contact_done(struct smc_init_info *ini, bool success);
+
 static inline struct smc_link_group *smc_get_lgr(struct smc_link *link)
 {
 	return link->lgr;
-- 
1.8.3.1



* [PATCH net-next v4 02/10] net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending
From: D.Wythe @ 2022-10-23 12:43 UTC
  To: kgraul, wenjia, jaka; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D. Wythe" <alibuda@linux.alibaba.com>

As commit 4940a1fdf31c ("net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB
error cause by server") mentioned, that fix works only when all
connection creation is completely protected by smc_server_lgr_pending.
Since we have already removed this lock, we need to fix the issue
again.
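
In short, a link group is now reusable only while it still has free
rtoken slots once in-flight registrations are accounted for; the new
check condenses to the following (names as in the hunk below):

	free_slots = SMC_RMBS_PER_LGR_MAX -
		     bitmap_weight(lgr->rtokens_used_mask,
				   SMC_RMBS_PER_LGR_MAX);
	/* a slot must remain for every connection that was assigned
	 * this lgr but has not registered its rtoken yet
	 */
	reusable = free_slots > atomic_read(&lgr->rtoken_pendings);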

Fixes: 4940a1fdf31c ("net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/af_smc.c   |  2 ++
 net/smc/smc_core.c | 11 ++++++++---
 net/smc/smc_core.h | 19 +++++++++++++++++++
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index e369ab6..b55d9ad4 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -2399,6 +2399,7 @@ static void smc_listen_work(struct work_struct *work)
 		if (rc)
 			goto out_decl;
 	}
+	smc_conn_leave_rtoken_pending(new_smc, ini);
 	smc_conn_save_peer_info(new_smc, cclc);
 	smc_listen_out_connected(new_smc);
 	SMC_STAT_SERV_SUCC_INC(sock_net(newclcsock->sk), ini);
@@ -2407,6 +2408,7 @@ static void smc_listen_work(struct work_struct *work)
 	goto out_free;
 
 out_decl:
+	smc_conn_leave_rtoken_pending(new_smc, ini);
 	if (ini && ini->first_contact_local)
 		smc_lgr_decision_maker_on_first_contact_done(ini, false /* fail */);
 	smc_listen_decline(new_smc, rc, ini ? ini->first_contact_local : 0,
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index fff41c3..c76e9a4 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -2182,14 +2182,19 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
 		     lgr->vlan_id == ini->vlan_id) &&
 		    (role == SMC_CLNT || ini->is_smcd ||
 		    (lgr->conns_num < SMC_RMBS_PER_LGR_MAX &&
-		      !bitmap_full(lgr->rtokens_used_mask, SMC_RMBS_PER_LGR_MAX)))) {
+		    (SMC_RMBS_PER_LGR_MAX -
+			bitmap_weight(lgr->rtokens_used_mask, SMC_RMBS_PER_LGR_MAX)
+				> atomic_read(&lgr->rtoken_pendings))))) {
 			/* link group found */
 			ini->first_contact_local = 0;
 			conn->lgr = lgr;
 			rc = smc_lgr_register_conn(conn, false);
 			write_unlock_bh(&lgr->conns_lock);
-			if (!rc && delayed_work_pending(&lgr->free_work))
-				cancel_delayed_work(&lgr->free_work);
+			if (!rc) {
+				smc_conn_enter_rtoken_pending(smc, ini);
+				if (delayed_work_pending(&lgr->free_work))
+					cancel_delayed_work(&lgr->free_work);
+			}
 			break;
 		}
 		write_unlock_bh(&lgr->conns_lock);
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index ad22751..dae2983 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -253,6 +253,9 @@ struct smc_link_group {
 	struct rb_root		conns_all;	/* connection tree */
 	rwlock_t		conns_lock;	/* protects conns_all */
 	unsigned int		conns_num;	/* current # of connections */
+	atomic_t		rtoken_pendings;/* number of connection that
+						 * lgr assigned but no rtoken got yet
+						 */
 	unsigned short		vlan_id;	/* vlan id of link group */
 
 	struct list_head	sndbufs[SMC_RMBE_SIZES];/* tx buffers */
@@ -569,6 +572,22 @@ struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
 
 void smc_lgr_decision_maker_on_first_contact_done(struct smc_init_info *ini, bool success);
 
+static inline void smc_conn_enter_rtoken_pending(struct smc_sock *smc, struct smc_init_info *ini)
+{
+	struct smc_link_group *lgr = smc->conn.lgr;
+
+	if (lgr && !ini->first_contact_local)
+		atomic_inc(&lgr->rtoken_pendings);
+}
+
+static inline void smc_conn_leave_rtoken_pending(struct smc_sock *smc, struct smc_init_info *ini)
+{
+	struct smc_link_group *lgr = smc->conn.lgr;
+
+	if (lgr && !ini->first_contact_local)
+		atomic_dec(&lgr->rtoken_pendings);
+}
+
 static inline struct smc_link_group *smc_get_lgr(struct smc_link *link)
 {
 	return link->lgr;
-- 
1.8.3.1



* [PATCH net-next v4 03/10] net/smc: allow confirm/delete rkey response deliver multiplex
From: D.Wythe @ 2022-10-23 12:43 UTC
  To: kgraul, wenjia, jaka; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D. Wythe" <alibuda@linux.alibaba.com>

We know that all flows except confirm_rkey and delete_rkey are
exclusive; confirm/delete rkey flows can run concurrently (local and
remote).

Although the protocol allows it, all flows are actually mutually
exclusive in the implementation, because waiting for LLC messages is
serialized.

This aggravates the time needed to establish or destroy SMC-R
connections, as connections have to be queued in smc_llc_wait.

We use the rtokens or rkey to correlate a confirm/delete rkey message
with its response.

Before sending a message, we put a context holding the rtokens or
rkey on a wait queue. When a response message is received, we wake up
the context whose rtokens or rkey match those of the response message.
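
This is the standard wait_woken pattern with a custom wake function
acting as the matcher; condensed from the diff below:

	/* sender: key the wait entry with a copy of the request */
	DEFINE_WAIT_FUNC(wait, smc_llc_rkey_response_wake_function);
	qentry.private = wait.private;	/* preserve task for wakeup */
	wait.private = &qentry;		/* qentry holds rtokens/rkey */
	add_wait_queue(&lgr->llc_msg_waiter, &wait);
	rc = smc_wr_tx_send(send_link, priv);
	timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout);

	/* receiver: pass the response as the wake key, so only the
	 * waiter whose rtokens/rkey match is woken up
	 */
	__wake_up(&link->lgr->llc_msg_waiter, TASK_NORMAL, 1, qentry);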

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/smc_llc.c | 174 +++++++++++++++++++++++++++++++++++++++++-------------
 net/smc/smc_wr.c  |  10 ----
 net/smc/smc_wr.h  |  10 ++++
 3 files changed, 143 insertions(+), 51 deletions(-)

diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index 524649d..24f9488 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -200,6 +200,7 @@ struct smc_llc_msg_delete_rkey_v2 {	/* type 0x29 */
 struct smc_llc_qentry {
 	struct list_head list;
 	struct smc_link *link;
+	void *private;
 	union smc_llc_msg msg;
 };
 
@@ -479,19 +480,17 @@ int smc_llc_send_confirm_link(struct smc_link *link,
 	return rc;
 }
 
-/* send LLC confirm rkey request */
-static int smc_llc_send_confirm_rkey(struct smc_link *send_link,
-				     struct smc_buf_desc *rmb_desc)
+/* build LLC confirm rkey request */
+static int smc_llc_build_confirm_rkey_request(struct smc_link *send_link,
+					      struct smc_buf_desc *rmb_desc,
+					      struct smc_wr_tx_pend_priv **priv)
 {
 	struct smc_llc_msg_confirm_rkey *rkeyllc;
-	struct smc_wr_tx_pend_priv *pend;
 	struct smc_wr_buf *wr_buf;
 	struct smc_link *link;
 	int i, rc, rtok_ix;
 
-	if (!smc_wr_tx_link_hold(send_link))
-		return -ENOLINK;
-	rc = smc_llc_add_pending_send(send_link, &wr_buf, &pend);
+	rc = smc_llc_add_pending_send(send_link, &wr_buf, priv);
 	if (rc)
 		goto put_out;
 	rkeyllc = (struct smc_llc_msg_confirm_rkey *)wr_buf;
@@ -521,25 +520,20 @@ static int smc_llc_send_confirm_rkey(struct smc_link *send_link,
 		cpu_to_be64((uintptr_t)rmb_desc->cpu_addr) :
 		cpu_to_be64((u64)sg_dma_address
 			    (rmb_desc->sgt[send_link->link_idx].sgl));
-	/* send llc message */
-	rc = smc_wr_tx_send(send_link, pend);
 put_out:
-	smc_wr_tx_link_put(send_link);
 	return rc;
 }
 
-/* send LLC delete rkey request */
-static int smc_llc_send_delete_rkey(struct smc_link *link,
-				    struct smc_buf_desc *rmb_desc)
+/* build LLC delete rkey request */
+static int smc_llc_build_delete_rkey_request(struct smc_link *link,
+					     struct smc_buf_desc *rmb_desc,
+					     struct smc_wr_tx_pend_priv **priv)
 {
 	struct smc_llc_msg_delete_rkey *rkeyllc;
-	struct smc_wr_tx_pend_priv *pend;
 	struct smc_wr_buf *wr_buf;
 	int rc;
 
-	if (!smc_wr_tx_link_hold(link))
-		return -ENOLINK;
-	rc = smc_llc_add_pending_send(link, &wr_buf, &pend);
+	rc = smc_llc_add_pending_send(link, &wr_buf, priv);
 	if (rc)
 		goto put_out;
 	rkeyllc = (struct smc_llc_msg_delete_rkey *)wr_buf;
@@ -548,10 +542,7 @@ static int smc_llc_send_delete_rkey(struct smc_link *link,
 	smc_llc_init_msg_hdr(&rkeyllc->hd, link->lgr, sizeof(*rkeyllc));
 	rkeyllc->num_rkeys = 1;
 	rkeyllc->rkey[0] = htonl(rmb_desc->mr[link->link_idx]->rkey);
-	/* send llc message */
-	rc = smc_wr_tx_send(link, pend);
 put_out:
-	smc_wr_tx_link_put(link);
 	return rc;
 }
 
@@ -2017,7 +2008,8 @@ static void smc_llc_rx_response(struct smc_link *link,
 	case SMC_LLC_DELETE_RKEY:
 		if (flowtype != SMC_LLC_FLOW_RKEY || flow->qentry)
 			break;	/* drop out-of-flow response */
-		goto assign;
+		__wake_up(&link->lgr->llc_msg_waiter, TASK_NORMAL, 1, qentry);
+		goto free;
 	case SMC_LLC_CONFIRM_RKEY_CONT:
 		/* not used because max links is 3 */
 		break;
@@ -2026,6 +2018,7 @@ static void smc_llc_rx_response(struct smc_link *link,
 					   qentry->msg.raw.hdr.common.type);
 		break;
 	}
+free:
 	kfree(qentry);
 	return;
 assign:
@@ -2184,25 +2177,98 @@ void smc_llc_link_clear(struct smc_link *link, bool log)
 	cancel_delayed_work_sync(&link->llc_testlink_wrk);
 }
 
+static int smc_llc_rkey_response_wake_function(struct wait_queue_entry *wq_entry,
+					       unsigned int mode, int sync, void *key)
+{
+	struct smc_llc_qentry *except, *incoming;
+	u8 except_llc_type;
+
+	/* not a rkey response */
+	if (!key)
+		return 0;
+
+	except = wq_entry->private;
+	incoming = key;
+
+	except_llc_type = except->msg.raw.hdr.common.llc_type;
+
+	/* except LLC MSG TYPE mismatch */
+	if (except_llc_type != incoming->msg.raw.hdr.common.llc_type)
+		return 0;
+
+	switch (except_llc_type) {
+	case SMC_LLC_CONFIRM_RKEY:
+		if (memcmp(except->msg.confirm_rkey.rtoken, incoming->msg.confirm_rkey.rtoken,
+			   sizeof(struct smc_rmb_rtoken) *
+			   except->msg.confirm_rkey.rtoken[0].num_rkeys))
+			return 0;
+		break;
+	case SMC_LLC_DELETE_RKEY:
+		if (memcmp(except->msg.delete_rkey.rkey, incoming->msg.delete_rkey.rkey,
+			   sizeof(__be32) * except->msg.delete_rkey.num_rkeys))
+			return 0;
+		break;
+	default:
+		pr_warn("smc: invalid except llc msg %d.\n", except_llc_type);
+		return 0;
+	}
+
+	/* match, save hdr */
+	memcpy(&except->msg.raw.hdr, &incoming->msg.raw.hdr, sizeof(except->msg.raw.hdr));
+
+	wq_entry->private = except->private;
+	return woken_wake_function(wq_entry, mode, sync, NULL);
+}
+
 /* register a new rtoken at the remote peer (for all links) */
 int smc_llc_do_confirm_rkey(struct smc_link *send_link,
 			    struct smc_buf_desc *rmb_desc)
 {
+	DEFINE_WAIT_FUNC(wait, smc_llc_rkey_response_wake_function);
 	struct smc_link_group *lgr = send_link->lgr;
-	struct smc_llc_qentry *qentry = NULL;
-	int rc = 0;
+	long timeout = SMC_LLC_WAIT_TIME;
+	struct smc_wr_tx_pend_priv *priv;
+	struct smc_llc_qentry qentry;
+	struct smc_wr_tx_pend *pend;
+	int rc = 0, flags;
 
-	rc = smc_llc_send_confirm_rkey(send_link, rmb_desc);
+	if (!smc_wr_tx_link_hold(send_link))
+		return -ENOLINK;
+
+	rc = smc_llc_build_confirm_rkey_request(send_link, rmb_desc, &priv);
 	if (rc)
 		goto out;
-	/* receive CONFIRM RKEY response from server over RoCE fabric */
-	qentry = smc_llc_wait(lgr, send_link, SMC_LLC_WAIT_TIME,
-			      SMC_LLC_CONFIRM_RKEY);
-	if (!qentry || (qentry->msg.raw.hdr.flags & SMC_LLC_FLAG_RKEY_NEG))
+
+	pend = container_of(priv, struct smc_wr_tx_pend, priv);
+	/* make a copy of send msg */
+	memcpy(&qentry.msg.confirm_rkey, send_link->wr_tx_bufs[pend->idx].raw,
+	       sizeof(qentry.msg.confirm_rkey));
+
+	qentry.private = wait.private;
+	wait.private = &qentry;
+
+	add_wait_queue(&lgr->llc_msg_waiter, &wait);
+
+	/* send llc message */
+	rc = smc_wr_tx_send(send_link, priv);
+	smc_wr_tx_link_put(send_link);
+	if (rc) {
+		remove_wait_queue(&lgr->llc_msg_waiter, &wait);
+		goto out;
+	}
+
+	while (!signal_pending(current) && timeout) {
+		timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout);
+		if (qentry.msg.raw.hdr.flags & SMC_LLC_FLAG_RESP)
+			break;
+	}
+
+	remove_wait_queue(&lgr->llc_msg_waiter, &wait);
+	flags = qentry.msg.raw.hdr.flags;
+
+	if (!(flags & SMC_LLC_FLAG_RESP) || flags & SMC_LLC_FLAG_RKEY_NEG)
 		rc = -EFAULT;
 out:
-	if (qentry)
-		smc_llc_flow_qentry_del(&lgr->llc_flow_lcl);
 	return rc;
 }
 
@@ -2210,26 +2276,52 @@ int smc_llc_do_confirm_rkey(struct smc_link *send_link,
 int smc_llc_do_delete_rkey(struct smc_link_group *lgr,
 			   struct smc_buf_desc *rmb_desc)
 {
-	struct smc_llc_qentry *qentry = NULL;
+	DEFINE_WAIT_FUNC(wait, smc_llc_rkey_response_wake_function);
+	long timeout = SMC_LLC_WAIT_TIME;
+	struct smc_wr_tx_pend_priv *priv;
+	struct smc_llc_qentry qentry;
+	struct smc_wr_tx_pend *pend;
 	struct smc_link *send_link;
-	int rc = 0;
+	int rc = 0, flags;
 
 	send_link = smc_llc_usable_link(lgr);
-	if (!send_link)
+	if (!send_link || !smc_wr_tx_link_hold(send_link))
 		return -ENOLINK;
 
-	/* protected by llc_flow control */
-	rc = smc_llc_send_delete_rkey(send_link, rmb_desc);
+	rc = smc_llc_build_delete_rkey_request(send_link, rmb_desc, &priv);
 	if (rc)
 		goto out;
-	/* receive DELETE RKEY response from server over RoCE fabric */
-	qentry = smc_llc_wait(lgr, send_link, SMC_LLC_WAIT_TIME,
-			      SMC_LLC_DELETE_RKEY);
-	if (!qentry || (qentry->msg.raw.hdr.flags & SMC_LLC_FLAG_RKEY_NEG))
+
+	pend = container_of(priv, struct smc_wr_tx_pend, priv);
+	/* make a copy of send msg */
+	memcpy(&qentry.msg.delete_link, send_link->wr_tx_bufs[pend->idx].raw,
+	       sizeof(qentry.msg.delete_link));
+
+	qentry.private = wait.private;
+	wait.private = &qentry;
+
+	add_wait_queue(&lgr->llc_msg_waiter, &wait);
+
+	/* send llc message */
+	rc = smc_wr_tx_send(send_link, priv);
+	smc_wr_tx_link_put(send_link);
+	if (rc) {
+		remove_wait_queue(&lgr->llc_msg_waiter, &wait);
+		goto out;
+	}
+
+	while (!signal_pending(current) && timeout) {
+		timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout);
+		if (qentry.msg.raw.hdr.flags & SMC_LLC_FLAG_RESP)
+			break;
+	}
+
+	remove_wait_queue(&lgr->llc_msg_waiter, &wait);
+	flags = qentry.msg.raw.hdr.flags;
+
+	if (!(flags & SMC_LLC_FLAG_RESP) || flags & SMC_LLC_FLAG_RKEY_NEG)
 		rc = -EFAULT;
 out:
-	if (qentry)
-		smc_llc_flow_qentry_del(&lgr->llc_flow_lcl);
 	return rc;
 }
 
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index b0678a4..797dffa 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -37,16 +37,6 @@
 static DEFINE_HASHTABLE(smc_wr_rx_hash, SMC_WR_RX_HASH_BITS);
 static DEFINE_SPINLOCK(smc_wr_rx_hash_lock);
 
-struct smc_wr_tx_pend {	/* control data for a pending send request */
-	u64			wr_id;		/* work request id sent */
-	smc_wr_tx_handler	handler;
-	enum ib_wc_status	wc_status;	/* CQE status */
-	struct smc_link		*link;
-	u32			idx;
-	struct smc_wr_tx_pend_priv priv;
-	u8			compl_requested;
-};
-
 /******************************** send queue *********************************/
 
 /*------------------------------- completion --------------------------------*/
diff --git a/net/smc/smc_wr.h b/net/smc/smc_wr.h
index 45e9b89..a4ea215 100644
--- a/net/smc/smc_wr.h
+++ b/net/smc/smc_wr.h
@@ -46,6 +46,16 @@ struct smc_wr_rx_handler {
 	u8			type;
 };
 
+struct smc_wr_tx_pend {	/* control data for a pending send request */
+	u64			wr_id;		/* work request id sent */
+	smc_wr_tx_handler	handler;
+	enum ib_wc_status	wc_status;	/* CQE status */
+	struct smc_link		*link;
+	u32			idx;
+	struct smc_wr_tx_pend_priv priv;
+	u8			compl_requested;
+};
+
 /* Only used by RDMA write WRs.
  * All other WRs (CDC/LLC) use smc_wr_tx_send handling WR_ID implicitly
  */
-- 
1.8.3.1



* [PATCH net-next v4 04/10] net/smc: make SMC_LLC_FLOW_RKEY run concurrently
From: D.Wythe @ 2022-10-23 12:43 UTC
  To: kgraul, wenjia, jaka; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D. Wythe" <alibuda@linux.alibaba.com>

Once confirm/delete rkey responses can be delivered multiplexed, we
can allow parallel execution of starting (remote) or initiating
(local) an SMC_LLC_FLOW_RKEY flow.

This patch counts the flows executing in parallel, and only when the
count reaches zero is the current flow type removed.
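
The resulting refcount life cycle, condensed from the hunks below:

	/* initiate: the first user sets the flow type, additional
	 * RKEY users only bump the count
	 */
	if (SMC_IS_NONE_FLOW(flow->type)) {
		flow->type = type;
		refcount_set(&flow->parallel_refcnt, 1);
	} else {
		refcount_inc(&flow->parallel_refcnt);
	}

	/* stop: only the last user clears the flow type */
	if (refcount_dec_and_test(&flow->parallel_refcnt))
		flow->type = SMC_LLC_FLOW_NONE;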

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/smc_core.h |  1 +
 net/smc/smc_llc.c  | 69 +++++++++++++++++++++++++++++++++++++++++-------------
 net/smc/smc_llc.h  |  6 +++++
 3 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index dae2983..234d213 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -244,6 +244,7 @@ enum smc_llc_flowtype {
 struct smc_llc_flow {
 	enum smc_llc_flowtype type;
 	struct smc_llc_qentry *qentry;
+	refcount_t	parallel_refcnt;
 };
 
 struct smc_lgr_decision_maker;
diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index 24f9488..700129c 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -231,10 +231,18 @@ static inline void smc_llc_flow_qentry_set(struct smc_llc_flow *flow,
 	flow->qentry = qentry;
 }
 
-static void smc_llc_flow_parallel(struct smc_link_group *lgr, u8 flow_type,
+static void smc_llc_flow_parallel(struct smc_link_group *lgr, struct smc_llc_flow *flow,
 				  struct smc_llc_qentry *qentry)
 {
 	u8 msg_type = qentry->msg.raw.hdr.common.llc_type;
+	u8 flow_type = flow->type;
+
+	/* SMC_LLC_FLOW_RKEY can be parallel */
+	if (flow_type == SMC_LLC_FLOW_RKEY &&
+	    (msg_type == SMC_LLC_CONFIRM_RKEY || msg_type == SMC_LLC_DELETE_RKEY)) {
+		refcount_inc(&flow->parallel_refcnt);
+		return;
+	}
 
 	if ((msg_type == SMC_LLC_ADD_LINK || msg_type == SMC_LLC_DELETE_LINK) &&
 	    flow_type != msg_type && !lgr->delayed_event) {
@@ -261,7 +269,7 @@ static bool smc_llc_flow_start(struct smc_llc_flow *flow,
 	spin_lock_bh(&lgr->llc_flow_lock);
 	if (flow->type) {
 		/* a flow is already active */
-		smc_llc_flow_parallel(lgr, flow->type, qentry);
+		smc_llc_flow_parallel(lgr, flow, qentry);
 		spin_unlock_bh(&lgr->llc_flow_lock);
 		return false;
 	}
@@ -280,6 +288,7 @@ static bool smc_llc_flow_start(struct smc_llc_flow *flow,
 		flow->type = SMC_LLC_FLOW_NONE;
 	}
 	smc_llc_flow_qentry_set(flow, qentry);
+	refcount_set(&flow->parallel_refcnt, 1);
 	spin_unlock_bh(&lgr->llc_flow_lock);
 	return true;
 }
@@ -289,6 +298,7 @@ int smc_llc_flow_initiate(struct smc_link_group *lgr,
 			  enum smc_llc_flowtype type)
 {
 	enum smc_llc_flowtype allowed_remote = SMC_LLC_FLOW_NONE;
+	bool accept = false;
 	int rc;
 
 	/* all flows except confirm_rkey and delete_rkey are exclusive,
@@ -300,10 +310,39 @@ int smc_llc_flow_initiate(struct smc_link_group *lgr,
 	if (list_empty(&lgr->list))
 		return -ENODEV;
 	spin_lock_bh(&lgr->llc_flow_lock);
-	if (lgr->llc_flow_lcl.type == SMC_LLC_FLOW_NONE &&
-	    (lgr->llc_flow_rmt.type == SMC_LLC_FLOW_NONE ||
-	     lgr->llc_flow_rmt.type == allowed_remote)) {
-		lgr->llc_flow_lcl.type = type;
+
+	/* Flow is initialized only if the following conditions are met:
+	 * incoming flow	local flow		remote flow
+	 * exclusive		NONE			NONE
+	 * SMC_LLC_FLOW_RKEY	SMC_LLC_FLOW_RKEY	SMC_LLC_FLOW_RKEY
+	 * SMC_LLC_FLOW_RKEY	NONE			SMC_LLC_FLOW_RKEY
+	 * SMC_LLC_FLOW_RKEY	SMC_LLC_FLOW_RKEY	NONE
+	 */
+	switch (type) {
+	case SMC_LLC_FLOW_RKEY:
+		if (!SMC_IS_PARALLEL_FLOW(lgr->llc_flow_lcl.type))
+			break;
+		if (!SMC_IS_PARALLEL_FLOW(lgr->llc_flow_rmt.type))
+			break;
+		/* accepted */
+		accept = true;
+		break;
+	default:
+		if (!SMC_IS_NONE_FLOW(lgr->llc_flow_lcl.type))
+			break;
+		if (!SMC_IS_NONE_FLOW(lgr->llc_flow_rmt.type))
+			break;
+		/* accepted */
+		accept = true;
+		break;
+	}
+	if (accept) {
+		if (SMC_IS_NONE_FLOW(lgr->llc_flow_lcl.type)) {
+			lgr->llc_flow_lcl.type = type;
+			refcount_set(&lgr->llc_flow_lcl.parallel_refcnt, 1);
+		} else {
+			refcount_inc(&lgr->llc_flow_lcl.parallel_refcnt);
+		}
 		spin_unlock_bh(&lgr->llc_flow_lock);
 		return 0;
 	}
@@ -322,6 +361,10 @@ int smc_llc_flow_initiate(struct smc_link_group *lgr,
 void smc_llc_flow_stop(struct smc_link_group *lgr, struct smc_llc_flow *flow)
 {
 	spin_lock_bh(&lgr->llc_flow_lock);
+	if (!refcount_dec_and_test(&flow->parallel_refcnt)) {
+		spin_unlock_bh(&lgr->llc_flow_lock);
+		return;
+	}
 	memset(flow, 0, sizeof(*flow));
 	flow->type = SMC_LLC_FLOW_NONE;
 	spin_unlock_bh(&lgr->llc_flow_lock);
@@ -1723,16 +1766,14 @@ static void smc_llc_delete_link_work(struct work_struct *work)
 }
 
 /* process a confirm_rkey request from peer, remote flow */
-static void smc_llc_rmt_conf_rkey(struct smc_link_group *lgr)
+static void smc_llc_rmt_conf_rkey(struct smc_link_group *lgr, struct smc_llc_qentry *qentry)
 {
 	struct smc_llc_msg_confirm_rkey *llc;
-	struct smc_llc_qentry *qentry;
 	struct smc_link *link;
 	int num_entries;
 	int rk_idx;
 	int i;
 
-	qentry = lgr->llc_flow_rmt.qentry;
 	llc = &qentry->msg.confirm_rkey;
 	link = qentry->link;
 
@@ -1759,19 +1800,16 @@ static void smc_llc_rmt_conf_rkey(struct smc_link_group *lgr)
 	llc->hd.flags |= SMC_LLC_FLAG_RESP;
 	smc_llc_init_msg_hdr(&llc->hd, link->lgr, sizeof(*llc));
 	smc_llc_send_message(link, &qentry->msg);
-	smc_llc_flow_qentry_del(&lgr->llc_flow_rmt);
 }
 
 /* process a delete_rkey request from peer, remote flow */
-static void smc_llc_rmt_delete_rkey(struct smc_link_group *lgr)
+static void smc_llc_rmt_delete_rkey(struct smc_link_group *lgr, struct smc_llc_qentry *qentry)
 {
 	struct smc_llc_msg_delete_rkey *llc;
-	struct smc_llc_qentry *qentry;
 	struct smc_link *link;
 	u8 err_mask = 0;
 	int i, max;
 
-	qentry = lgr->llc_flow_rmt.qentry;
 	llc = &qentry->msg.delete_rkey;
 	link = qentry->link;
 
@@ -1809,7 +1847,6 @@ static void smc_llc_rmt_delete_rkey(struct smc_link_group *lgr)
 finish:
 	llc->hd.flags |= SMC_LLC_FLAG_RESP;
 	smc_llc_send_message(link, &qentry->msg);
-	smc_llc_flow_qentry_del(&lgr->llc_flow_rmt);
 }
 
 static void smc_llc_protocol_violation(struct smc_link_group *lgr, u8 type)
@@ -1910,7 +1947,7 @@ static void smc_llc_event_handler(struct smc_llc_qentry *qentry)
 		/* new request from remote, assign to remote flow */
 		if (smc_llc_flow_start(&lgr->llc_flow_rmt, qentry)) {
 			/* process here, does not wait for more llc msgs */
-			smc_llc_rmt_conf_rkey(lgr);
+			smc_llc_rmt_conf_rkey(lgr, qentry);
 			smc_llc_flow_stop(lgr, &lgr->llc_flow_rmt);
 		}
 		return;
@@ -1923,7 +1960,7 @@ static void smc_llc_event_handler(struct smc_llc_qentry *qentry)
 		/* new request from remote, assign to remote flow */
 		if (smc_llc_flow_start(&lgr->llc_flow_rmt, qentry)) {
 			/* process here, does not wait for more llc msgs */
-			smc_llc_rmt_delete_rkey(lgr);
+			smc_llc_rmt_delete_rkey(lgr, qentry);
 			smc_llc_flow_stop(lgr, &lgr->llc_flow_rmt);
 		}
 		return;
diff --git a/net/smc/smc_llc.h b/net/smc/smc_llc.h
index 7e7a316..cb217793 100644
--- a/net/smc/smc_llc.h
+++ b/net/smc/smc_llc.h
@@ -49,6 +49,12 @@ enum smc_llc_msg_type {
 #define smc_link_downing(state) \
 	(cmpxchg(state, SMC_LNK_ACTIVE, SMC_LNK_INACTIVE) == SMC_LNK_ACTIVE)
 
+#define SMC_IS_NONE_FLOW(type)		\
+	((type) == SMC_LLC_FLOW_NONE)
+
+#define SMC_IS_PARALLEL_FLOW(type)	\
+	(((type) == SMC_LLC_FLOW_RKEY) || SMC_IS_NONE_FLOW(type))
+
 /* LLC DELETE LINK Request Reason Codes */
 #define SMC_LLC_DEL_LOST_PATH		0x00010000
 #define SMC_LLC_DEL_OP_INIT_TERM	0x00020000
-- 
1.8.3.1



* [PATCH net-next v4 05/10] net/smc: llc_conf_mutex refactor, replace it with rw_semaphore
From: D.Wythe @ 2022-10-23 12:43 UTC
  To: kgraul, wenjia, jaka; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D. Wythe" <alibuda@linux.alibaba.com>

llc_conf_mutex was used to protect the links and link-related
configurations in the same link group, for example adding or deleting
links. However, in most cases the protected critical section has only
read semantics, with no write semantics at all, such as obtaining a
usable link or an available rmb_desc.

This patch is simple code refactoring: it replaces the mutex with an
rw_semaphore, mutex_lock() with down_write(), and mutex_unlock() with
up_write().

Theoretically, this replacement is equivalent, but after this patch we
can choose lock granularity according to the different semantics of
each critical section.
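
This patch keeps every critical section exclusive; the read-side
relaxation comes with the follow-up patches. A sketch of the intended
end state (smc_llc_usable_link() purely as an example of a read-only
section):

	/* write side: link reconfiguration stays exclusive */
	down_write(&lgr->llc_conf_mutex);
	/* ... add or delete links ... */
	up_write(&lgr->llc_conf_mutex);

	/* read side (later patches): lookups may run in parallel */
	down_read(&lgr->llc_conf_mutex);
	link = smc_llc_usable_link(lgr);
	up_read(&lgr->llc_conf_mutex);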

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/af_smc.c   |  8 ++++----
 net/smc/smc_core.c | 20 ++++++++++----------
 net/smc/smc_core.h |  2 +-
 net/smc/smc_llc.c  | 18 +++++++++---------
 4 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index b55d9ad4..5c12cd7 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -493,7 +493,7 @@ static int smcr_lgr_reg_sndbufs(struct smc_link *link,
 		return -EINVAL;
 
 	/* protect against parallel smcr_link_reg_buf() */
-	mutex_lock(&lgr->llc_conf_mutex);
+	down_write(&lgr->llc_conf_mutex);
 	for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
 		if (!smc_link_active(&lgr->lnk[i]))
 			continue;
@@ -501,7 +501,7 @@ static int smcr_lgr_reg_sndbufs(struct smc_link *link,
 		if (rc)
 			break;
 	}
-	mutex_unlock(&lgr->llc_conf_mutex);
+	up_write(&lgr->llc_conf_mutex);
 	return rc;
 }
 
@@ -518,7 +518,7 @@ static int smcr_lgr_reg_rmbs(struct smc_link *link,
 	/* protect against parallel smc_llc_cli_rkey_exchange() and
 	 * parallel smcr_link_reg_buf()
 	 */
-	mutex_lock(&lgr->llc_conf_mutex);
+	down_write(&lgr->llc_conf_mutex);
 	for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
 		if (!smc_link_active(&lgr->lnk[i]))
 			continue;
@@ -535,7 +535,7 @@ static int smcr_lgr_reg_rmbs(struct smc_link *link,
 	}
 	rmb_desc->is_conf_rkey = true;
 out:
-	mutex_unlock(&lgr->llc_conf_mutex);
+	up_write(&lgr->llc_conf_mutex);
 	smc_llc_flow_stop(lgr, &lgr->llc_flow_lcl);
 	return rc;
 }
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index c76e9a4..b14ad32 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1383,10 +1383,10 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
 		rc = smc_llc_flow_initiate(lgr, SMC_LLC_FLOW_RKEY);
 		if (!rc) {
 			/* protect against smc_llc_cli_rkey_exchange() */
-			mutex_lock(&lgr->llc_conf_mutex);
+			down_write(&lgr->llc_conf_mutex);
 			smc_llc_do_delete_rkey(lgr, buf_desc);
 			buf_desc->is_conf_rkey = false;
-			mutex_unlock(&lgr->llc_conf_mutex);
+			up_write(&lgr->llc_conf_mutex);
 			smc_llc_flow_stop(lgr, &lgr->llc_flow_lcl);
 		}
 	}
@@ -1657,12 +1657,12 @@ static void smc_lgr_free(struct smc_link_group *lgr)
 	int i;
 
 	if (!lgr->is_smcd) {
-		mutex_lock(&lgr->llc_conf_mutex);
+		down_write(&lgr->llc_conf_mutex);
 		for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
 			if (lgr->lnk[i].state != SMC_LNK_UNUSED)
 				smcr_link_clear(&lgr->lnk[i], false);
 		}
-		mutex_unlock(&lgr->llc_conf_mutex);
+		up_write(&lgr->llc_conf_mutex);
 		smc_llc_lgr_clear(lgr);
 	}
 
@@ -1976,12 +1976,12 @@ static void smcr_link_down(struct smc_link *lnk)
 	} else {
 		if (lgr->llc_flow_lcl.type != SMC_LLC_FLOW_NONE) {
 			/* another llc task is ongoing */
-			mutex_unlock(&lgr->llc_conf_mutex);
+			up_write(&lgr->llc_conf_mutex);
 			wait_event_timeout(lgr->llc_flow_waiter,
 				(list_empty(&lgr->list) ||
 				 lgr->llc_flow_lcl.type == SMC_LLC_FLOW_NONE),
 				SMC_LLC_WAIT_TIME);
-			mutex_lock(&lgr->llc_conf_mutex);
+			down_write(&lgr->llc_conf_mutex);
 		}
 		if (!list_empty(&lgr->list)) {
 			smc_llc_send_delete_link(to_lnk, del_link_id,
@@ -2041,9 +2041,9 @@ static void smc_link_down_work(struct work_struct *work)
 	if (list_empty(&lgr->list))
 		return;
 	wake_up_all(&lgr->llc_msg_waiter);
-	mutex_lock(&lgr->llc_conf_mutex);
+	down_write(&lgr->llc_conf_mutex);
 	smcr_link_down(link);
-	mutex_unlock(&lgr->llc_conf_mutex);
+	up_write(&lgr->llc_conf_mutex);
 }
 
 static int smc_vlan_by_tcpsk_walk(struct net_device *lower_dev,
@@ -2585,7 +2585,7 @@ static int smcr_buf_map_usable_links(struct smc_link_group *lgr,
 	int i, rc = 0, cnt = 0;
 
 	/* protect against parallel link reconfiguration */
-	mutex_lock(&lgr->llc_conf_mutex);
+	down_write(&lgr->llc_conf_mutex);
 	for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
 		struct smc_link *lnk = &lgr->lnk[i];
 
@@ -2598,7 +2598,7 @@ static int smcr_buf_map_usable_links(struct smc_link_group *lgr,
 		cnt++;
 	}
 out:
-	mutex_unlock(&lgr->llc_conf_mutex);
+	up_write(&lgr->llc_conf_mutex);
 	if (!rc && !cnt)
 		rc = -EINVAL;
 	return rc;
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 234d213..2a5a51b 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -306,7 +306,7 @@ struct smc_link_group {
 						/* queue for llc events */
 			spinlock_t		llc_event_q_lock;
 						/* protects llc_event_q */
-			struct mutex		llc_conf_mutex;
+			struct rw_semaphore	llc_conf_mutex;
 						/* protects lgr reconfig. */
 			struct work_struct	llc_add_link_work;
 			struct work_struct	llc_del_link_work;
diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index 700129c..edb0cef 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -1236,12 +1236,12 @@ static void smc_llc_process_cli_add_link(struct smc_link_group *lgr)
 
 	qentry = smc_llc_flow_qentry_clr(&lgr->llc_flow_lcl);
 
-	mutex_lock(&lgr->llc_conf_mutex);
+	down_write(&lgr->llc_conf_mutex);
 	if (smc_llc_is_local_add_link(&qentry->msg))
 		smc_llc_cli_add_link_invite(qentry->link, qentry);
 	else
 		smc_llc_cli_add_link(qentry->link, qentry);
-	mutex_unlock(&lgr->llc_conf_mutex);
+	up_write(&lgr->llc_conf_mutex);
 }
 
 static int smc_llc_active_link_count(struct smc_link_group *lgr)
@@ -1543,13 +1543,13 @@ static void smc_llc_process_srv_add_link(struct smc_link_group *lgr)
 
 	qentry = smc_llc_flow_qentry_clr(&lgr->llc_flow_lcl);
 
-	mutex_lock(&lgr->llc_conf_mutex);
+	down_write(&lgr->llc_conf_mutex);
 	rc = smc_llc_srv_add_link(link, qentry);
 	if (!rc && lgr->type == SMC_LGR_SYMMETRIC) {
 		/* delete any asymmetric link */
 		smc_llc_delete_asym_link(lgr);
 	}
-	mutex_unlock(&lgr->llc_conf_mutex);
+	up_write(&lgr->llc_conf_mutex);
 	kfree(qentry);
 }
 
@@ -1616,7 +1616,7 @@ static void smc_llc_process_cli_delete_link(struct smc_link_group *lgr)
 		smc_lgr_terminate_sched(lgr);
 		goto out;
 	}
-	mutex_lock(&lgr->llc_conf_mutex);
+	down_write(&lgr->llc_conf_mutex);
 	/* delete single link */
 	for (lnk_idx = 0; lnk_idx < SMC_LINKS_PER_LGR_MAX; lnk_idx++) {
 		if (lgr->lnk[lnk_idx].link_id != del_llc->link_num)
@@ -1650,7 +1650,7 @@ static void smc_llc_process_cli_delete_link(struct smc_link_group *lgr)
 		smc_lgr_terminate_sched(lgr);
 	}
 out_unlock:
-	mutex_unlock(&lgr->llc_conf_mutex);
+	up_write(&lgr->llc_conf_mutex);
 out:
 	kfree(qentry);
 }
@@ -1686,7 +1686,7 @@ static void smc_llc_process_srv_delete_link(struct smc_link_group *lgr)
 	int active_links;
 	int i;
 
-	mutex_lock(&lgr->llc_conf_mutex);
+	down_write(&lgr->llc_conf_mutex);
 	qentry = smc_llc_flow_qentry_clr(&lgr->llc_flow_lcl);
 	lnk = qentry->link;
 	del_llc = &qentry->msg.delete_link;
@@ -1742,7 +1742,7 @@ static void smc_llc_process_srv_delete_link(struct smc_link_group *lgr)
 		smc_llc_add_link_local(lnk);
 	}
 out:
-	mutex_unlock(&lgr->llc_conf_mutex);
+	up_write(&lgr->llc_conf_mutex);
 	kfree(qentry);
 }
 
@@ -2156,7 +2156,7 @@ void smc_llc_lgr_init(struct smc_link_group *lgr, struct smc_sock *smc)
 	spin_lock_init(&lgr->llc_flow_lock);
 	init_waitqueue_head(&lgr->llc_flow_waiter);
 	init_waitqueue_head(&lgr->llc_msg_waiter);
-	mutex_init(&lgr->llc_conf_mutex);
+	init_rwsem(&lgr->llc_conf_mutex);
 	lgr->llc_testlink_time = READ_ONCE(net->smc.sysctl_smcr_testlink_time);
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v4 06/10] net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & smcr_buf_unuse()
  2022-10-23 12:43 [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections D.Wythe
                   ` (4 preceding siblings ...)
  2022-10-23 12:43 ` [PATCH net-next v4 05/10] net/smc: llc_conf_mutex refactor, replace it with rw_semaphore D.Wythe
@ 2022-10-23 12:43 ` D.Wythe
  2022-10-23 12:43 ` [PATCH net-next v4 07/10] net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs() D.Wythe
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: D.Wythe @ 2022-10-23 12:43 UTC (permalink / raw)
  To: kgraul, wenjia, jaka; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D. Wythe" <alibuda@linux.alibaba.com>

The following is part of the Off-CPU graph during frequent short-lived
SMC-R connection processing:

process_one_work				(51.19%)
smc_close_passive_work			(28.36%)
	smcr_buf_unuse				(28.34%)
	rwsem_down_write_slowpath		(28.22%)

smc_listen_work				(22.83%)
	smc_clc_wait_msg			(1.84%)
	smc_buf_create				(20.45%)
		smcr_buf_map_usable_links
		rwsem_down_write_slowpath	(20.43%)
	smcr_lgr_reg_rmbs			(0.53%)
		rwsem_down_write_slowpath	(0.43%)
		smc_llc_do_confirm_rkey		(0.08%)

We can clearly see that during connection establishment, the
connections' waiting time is spent not on IO, but on llc_conf_mutex.

More importantly, the core critical sections (smcr_buf_unuse() &
smc_buf_create()) only perform reads on the links, so we can easily
take the semaphore in read mode instead.
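
A rough user-space sketch of why read mode helps (pthread analogue,
illustrative names only): many connections can map their buffers at the
same time, while link add/delete still takes the lock in write mode:

#include <pthread.h>

static pthread_rwlock_t llc_conf_lock = PTHREAD_RWLOCK_INITIALIZER;

/* called by many connections concurrently; only reads link state */
static void map_bufs(int nlinks, void (*map_one)(int))
{
	pthread_rwlock_rdlock(&llc_conf_lock);	/* was down_write() */
	for (int i = 0; i < nlinks; i++)
		map_one(i);
	pthread_rwlock_unlock(&llc_conf_lock);
}

/* link reconfiguration keeps exclusive access */
static void reconfig_links(void (*reconfigure)(void))
{
	pthread_rwlock_wrlock(&llc_conf_lock);
	reconfigure();
	pthread_rwlock_unlock(&llc_conf_lock);
}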

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/smc_core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index b14ad32..48ad76b 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1383,10 +1383,10 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
 		rc = smc_llc_flow_initiate(lgr, SMC_LLC_FLOW_RKEY);
 		if (!rc) {
 			/* protect against smc_llc_cli_rkey_exchange() */
-			down_write(&lgr->llc_conf_mutex);
+			down_read(&lgr->llc_conf_mutex);
 			smc_llc_do_delete_rkey(lgr, buf_desc);
 			buf_desc->is_conf_rkey = false;
-			up_write(&lgr->llc_conf_mutex);
+			up_read(&lgr->llc_conf_mutex);
 			smc_llc_flow_stop(lgr, &lgr->llc_flow_lcl);
 		}
 	}
@@ -2585,7 +2585,7 @@ static int smcr_buf_map_usable_links(struct smc_link_group *lgr,
 	int i, rc = 0, cnt = 0;
 
 	/* protect against parallel link reconfiguration */
-	down_write(&lgr->llc_conf_mutex);
+	down_read(&lgr->llc_conf_mutex);
 	for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
 		struct smc_link *lnk = &lgr->lnk[i];
 
@@ -2598,7 +2598,7 @@ static int smcr_buf_map_usable_links(struct smc_link_group *lgr,
 		cnt++;
 	}
 out:
-	up_write(&lgr->llc_conf_mutex);
+	up_read(&lgr->llc_conf_mutex);
 	if (!rc && !cnt)
 		rc = -EINVAL;
 	return rc;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v4 07/10] net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs()
  2022-10-23 12:43 [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections D.Wythe
                   ` (5 preceding siblings ...)
  2022-10-23 12:43 ` [PATCH net-next v4 06/10] net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & smcr_buf_unuse() D.Wythe
@ 2022-10-23 12:43 ` D.Wythe
  2022-10-23 12:44 ` [PATCH net-next v4 08/10] net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore D.Wythe
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: D.Wythe @ 2022-10-23 12:43 UTC (permalink / raw)
  To: kgraul, wenjia, jaka; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D. Wythe" <alibuda@linux.alibaba.com>

Unlike smc_buf_create() and smcr_buf_unuse(), smcr_lgr_reg_rmbs() must
be exclusive when the assigned rmb_desc has not been registered yet,
although it can be executed in parallel when the assigned rmb_desc is
already registered and only reads from it. Hence we cannot simply
replace the lock with a read semaphore.

The idea here is that if the assigned rmb_desc is already registered,
a read semaphore is enough to protect the critical section; if the
assigned rmb_desc is not registered yet, we keep using the write
semaphore to preserve exclusivity.

Thanks to the reusability of rmb_desc, this allows us to execute in
parallel in most cases.
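
The mode selection can be sketched as a check-then-retake pattern
(user-space analogue, illustrative names only). Note there is no
in-place upgrade: the read lock is dropped and the write lock taken
fresh, so the registration step must tolerate a concurrent registration
that happened in the window:

#include <stdbool.h>
#include <pthread.h>

static pthread_rwlock_t llc_conf_lock = PTHREAD_RWLOCK_INITIALIZER;

static void reg_rmb(bool (*is_registered)(void),
		    void (*do_register)(void),
		    void (*confirm_rkey)(void))
{
	pthread_rwlock_rdlock(&llc_conf_lock);
	if (!is_registered()) {
		/* cannot upgrade in place: drop, retake in write mode */
		pthread_rwlock_unlock(&llc_conf_lock);
		pthread_rwlock_wrlock(&llc_conf_lock);
		do_register();	/* exclusive; must re-check per link */
	}
	confirm_rkey();		/* read-only on the links */
	pthread_rwlock_unlock(&llc_conf_lock);
}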

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/af_smc.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 5c12cd7..3bac24e 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -510,11 +510,26 @@ static int smcr_lgr_reg_rmbs(struct smc_link *link,
 			     struct smc_buf_desc *rmb_desc)
 {
 	struct smc_link_group *lgr = link->lgr;
+	bool do_slow = false;
 	int i, rc = 0;
 
 	rc = smc_llc_flow_initiate(lgr, SMC_LLC_FLOW_RKEY);
 	if (rc)
 		return rc;
+
+	down_read(&lgr->llc_conf_mutex);
+	for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
+		if (!smc_link_active(&lgr->lnk[i]))
+			continue;
+		if (!rmb_desc->is_reg_mr[link->link_idx]) {
+			up_read(&lgr->llc_conf_mutex);
+			goto slow_path;
+		}
+	}
+	/* mr register already */
+	goto fast_path;
+slow_path:
+	do_slow = true;
 	/* protect against parallel smc_llc_cli_rkey_exchange() and
 	 * parallel smcr_link_reg_buf()
 	 */
@@ -526,7 +541,7 @@ static int smcr_lgr_reg_rmbs(struct smc_link *link,
 		if (rc)
 			goto out;
 	}
-
+fast_path:
 	/* exchange confirm_rkey msg with peer */
 	rc = smc_llc_do_confirm_rkey(link, rmb_desc);
 	if (rc) {
@@ -535,7 +550,7 @@ static int smcr_lgr_reg_rmbs(struct smc_link *link,
 	}
 	rmb_desc->is_conf_rkey = true;
 out:
-	up_write(&lgr->llc_conf_mutex);
+	do_slow ? up_write(&lgr->llc_conf_mutex) : up_read(&lgr->llc_conf_mutex);
 	smc_llc_flow_stop(lgr, &lgr->llc_flow_lcl);
 	return rc;
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v4 08/10] net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore
  2022-10-23 12:43 [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections D.Wythe
                   ` (6 preceding siblings ...)
  2022-10-23 12:43 ` [PATCH net-next v4 07/10] net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs() D.Wythe
@ 2022-10-23 12:44 ` D.Wythe
  2022-10-23 12:44 ` [PATCH net-next v4 09/10] net/smc: Fix potential panic dues to unprotected smc_llc_srv_add_link() D.Wythe
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: D.Wythe @ 2022-10-23 12:44 UTC (permalink / raw)
  To: kgraul, wenjia, jaka; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D. Wythe" <alibuda@linux.alibaba.com>

It's clear that rmbs_lock and sndbufs_lock aim to protect the rmbs
list and the sndbufs list respectively.

During connection establishment, smc_buf_get_slot() is always invoked,
and it only reads the rmbs list and the sndbufs list.

Based on the above considerations, we replace the mutexes with
rw_semaphores. Only smc_buf_get_slot() uses down_read(), allowing
smc_buf_get_slot() to run concurrently; all other parts use
down_write() to keep exclusive semantics.
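
Why down_read() suffices in smc_buf_get_slot(): the lists are only
traversed, and a slot is claimed with cmpxchg, so two readers racing
for the same buffer cannot both win. A rough user-space sketch with
C11 atomics (illustrative types, not the kernel structures):

#include <stdatomic.h>
#include <stddef.h>
#include <pthread.h>

struct buf_slot {
	atomic_int used;		/* 0 = free, 1 = taken */
	struct buf_slot *next;
};

static pthread_rwlock_t bufs_lock = PTHREAD_RWLOCK_INITIALIZER;

static struct buf_slot *get_slot(struct buf_slot *head)
{
	struct buf_slot *s;
	int expected;

	pthread_rwlock_rdlock(&bufs_lock);	/* list shape is stable */
	for (s = head; s; s = s->next) {
		expected = 0;
		/* atomic claim: concurrent readers cannot both take s */
		if (atomic_compare_exchange_strong(&s->used, &expected, 1)) {
			pthread_rwlock_unlock(&bufs_lock);
			return s;
		}
	}
	pthread_rwlock_unlock(&bufs_lock);
	return NULL;		/* caller allocates a new buffer */
}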

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/smc_core.c | 55 +++++++++++++++++++++++++++---------------------------
 net/smc/smc_core.h |  4 ++--
 net/smc/smc_llc.c  | 16 ++++++++--------
 3 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 48ad76b..b09a3fd 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1128,8 +1128,8 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini)
 	lgr->freeing = 0;
 	lgr->vlan_id = ini->vlan_id;
 	refcount_set(&lgr->refcnt, 1); /* set lgr refcnt to 1 */
-	mutex_init(&lgr->sndbufs_lock);
-	mutex_init(&lgr->rmbs_lock);
+	init_rwsem(&lgr->sndbufs_lock);
+	init_rwsem(&lgr->rmbs_lock);
 	rwlock_init(&lgr->conns_lock);
 	for (i = 0; i < SMC_RMBE_SIZES; i++) {
 		INIT_LIST_HEAD(&lgr->sndbufs[i]);
@@ -1375,7 +1375,7 @@ struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
 static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
 			   struct smc_link_group *lgr)
 {
-	struct mutex *lock;	/* lock buffer list */
+	struct rw_semaphore *lock;	/* lock buffer list */
 	int rc;
 
 	if (is_rmb && buf_desc->is_conf_rkey && !list_empty(&lgr->list)) {
@@ -1395,9 +1395,9 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
 		/* buf registration failed, reuse not possible */
 		lock = is_rmb ? &lgr->rmbs_lock :
 				&lgr->sndbufs_lock;
-		mutex_lock(lock);
+		down_write(lock);
 		list_del(&buf_desc->list);
-		mutex_unlock(lock);
+		up_write(lock);
 
 		smc_buf_free(lgr, is_rmb, buf_desc);
 	} else {
@@ -1501,15 +1501,16 @@ static void smcr_buf_unmap_lgr(struct smc_link *lnk)
 	int i;
 
 	for (i = 0; i < SMC_RMBE_SIZES; i++) {
-		mutex_lock(&lgr->rmbs_lock);
+		down_write(&lgr->rmbs_lock);
 		list_for_each_entry_safe(buf_desc, bf, &lgr->rmbs[i], list)
 			smcr_buf_unmap_link(buf_desc, true, lnk);
-		mutex_unlock(&lgr->rmbs_lock);
-		mutex_lock(&lgr->sndbufs_lock);
+		up_write(&lgr->rmbs_lock);
+
+		down_write(&lgr->sndbufs_lock);
 		list_for_each_entry_safe(buf_desc, bf, &lgr->sndbufs[i],
 					 list)
 			smcr_buf_unmap_link(buf_desc, false, lnk);
-		mutex_unlock(&lgr->sndbufs_lock);
+		up_write(&lgr->sndbufs_lock);
 	}
 }
 
@@ -2328,19 +2329,19 @@ int smc_uncompress_bufsize(u8 compressed)
  * buffer size; if not available, return NULL
  */
 static struct smc_buf_desc *smc_buf_get_slot(int compressed_bufsize,
-					     struct mutex *lock,
+					     struct rw_semaphore *lock,
 					     struct list_head *buf_list)
 {
 	struct smc_buf_desc *buf_slot;
 
-	mutex_lock(lock);
+	down_read(lock);
 	list_for_each_entry(buf_slot, buf_list, list) {
 		if (cmpxchg(&buf_slot->used, 0, 1) == 0) {
-			mutex_unlock(lock);
+			up_read(lock);
 			return buf_slot;
 		}
 	}
-	mutex_unlock(lock);
+	up_read(lock);
 	return NULL;
 }
 
@@ -2449,13 +2450,13 @@ int smcr_link_reg_buf(struct smc_link *link, struct smc_buf_desc *buf_desc)
 	return 0;
 }
 
-static int _smcr_buf_map_lgr(struct smc_link *lnk, struct mutex *lock,
+static int _smcr_buf_map_lgr(struct smc_link *lnk, struct rw_semaphore *lock,
 			     struct list_head *lst, bool is_rmb)
 {
 	struct smc_buf_desc *buf_desc, *bf;
 	int rc = 0;
 
-	mutex_lock(lock);
+	down_write(lock);
 	list_for_each_entry_safe(buf_desc, bf, lst, list) {
 		if (!buf_desc->used)
 			continue;
@@ -2464,7 +2465,7 @@ static int _smcr_buf_map_lgr(struct smc_link *lnk, struct mutex *lock,
 			goto out;
 	}
 out:
-	mutex_unlock(lock);
+	up_write(lock);
 	return rc;
 }
 
@@ -2497,37 +2498,37 @@ int smcr_buf_reg_lgr(struct smc_link *lnk)
 	int i, rc = 0;
 
 	/* reg all RMBs for a new link */
-	mutex_lock(&lgr->rmbs_lock);
+	down_write(&lgr->rmbs_lock);
 	for (i = 0; i < SMC_RMBE_SIZES; i++) {
 		list_for_each_entry_safe(buf_desc, bf, &lgr->rmbs[i], list) {
 			if (!buf_desc->used)
 				continue;
 			rc = smcr_link_reg_buf(lnk, buf_desc);
 			if (rc) {
-				mutex_unlock(&lgr->rmbs_lock);
+				up_write(&lgr->rmbs_lock);
 				return rc;
 			}
 		}
 	}
-	mutex_unlock(&lgr->rmbs_lock);
+	up_write(&lgr->rmbs_lock);
 
 	if (lgr->buf_type == SMCR_PHYS_CONT_BUFS)
 		return rc;
 
 	/* reg all vzalloced sndbufs for a new link */
-	mutex_lock(&lgr->sndbufs_lock);
+	down_write(&lgr->sndbufs_lock);
 	for (i = 0; i < SMC_RMBE_SIZES; i++) {
 		list_for_each_entry_safe(buf_desc, bf, &lgr->sndbufs[i], list) {
 			if (!buf_desc->used || !buf_desc->is_vm)
 				continue;
 			rc = smcr_link_reg_buf(lnk, buf_desc);
 			if (rc) {
-				mutex_unlock(&lgr->sndbufs_lock);
+				up_write(&lgr->sndbufs_lock);
 				return rc;
 			}
 		}
 	}
-	mutex_unlock(&lgr->sndbufs_lock);
+	up_write(&lgr->sndbufs_lock);
 	return rc;
 }
 
@@ -2648,7 +2649,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
 	struct list_head *buf_list;
 	int bufsize, bufsize_short;
 	bool is_dgraded = false;
-	struct mutex *lock;	/* lock buffer list */
+	struct rw_semaphore *lock;	/* lock buffer list */
 	int sk_buf_size;
 
 	if (is_rmb)
@@ -2696,9 +2697,9 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
 		SMC_STAT_RMB_ALLOC(smc, is_smcd, is_rmb);
 		SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, bufsize);
 		buf_desc->used = 1;
-		mutex_lock(lock);
+		down_write(lock);
 		list_add(&buf_desc->list, buf_list);
-		mutex_unlock(lock);
+		up_write(lock);
 		break; /* found */
 	}
 
@@ -2772,9 +2773,9 @@ int smc_buf_create(struct smc_sock *smc, bool is_smcd)
 	/* create rmb */
 	rc = __smc_buf_create(smc, is_smcd, true);
 	if (rc) {
-		mutex_lock(&smc->conn.lgr->sndbufs_lock);
+		down_write(&smc->conn.lgr->sndbufs_lock);
 		list_del(&smc->conn.sndbuf_desc->list);
-		mutex_unlock(&smc->conn.lgr->sndbufs_lock);
+		up_write(&smc->conn.lgr->sndbufs_lock);
 		smc_buf_free(smc->conn.lgr, false, smc->conn.sndbuf_desc);
 		smc->conn.sndbuf_desc = NULL;
 	}
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 2a5a51b..7771462 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -260,9 +260,9 @@ struct smc_link_group {
 	unsigned short		vlan_id;	/* vlan id of link group */
 
 	struct list_head	sndbufs[SMC_RMBE_SIZES];/* tx buffers */
-	struct mutex		sndbufs_lock;	/* protects tx buffers */
+	struct rw_semaphore	sndbufs_lock;	/* protects tx buffers */
 	struct list_head	rmbs[SMC_RMBE_SIZES];	/* rx buffers */
-	struct mutex		rmbs_lock;	/* protects rx buffers */
+	struct rw_semaphore	rmbs_lock;	/* protects rx buffers */
 
 	u8			id[SMC_LGR_ID_SIZE];	/* unique lgr id */
 	struct delayed_work	free_work;	/* delayed freeing of an lgr */
diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index edb0cef..4a010eb 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -642,7 +642,7 @@ static int smc_llc_fill_ext_v2(struct smc_llc_msg_add_link_v2_ext *ext,
 
 	prim_lnk_idx = link->link_idx;
 	lnk_idx = link_new->link_idx;
-	mutex_lock(&lgr->rmbs_lock);
+	down_write(&lgr->rmbs_lock);
 	ext->num_rkeys = lgr->conns_num;
 	if (!ext->num_rkeys)
 		goto out;
@@ -662,7 +662,7 @@ static int smc_llc_fill_ext_v2(struct smc_llc_msg_add_link_v2_ext *ext,
 	}
 	len += i * sizeof(ext->rt[0]);
 out:
-	mutex_unlock(&lgr->rmbs_lock);
+	up_write(&lgr->rmbs_lock);
 	return len;
 }
 
@@ -923,7 +923,7 @@ static int smc_llc_cli_rkey_exchange(struct smc_link *link,
 	int rc = 0;
 	int i;
 
-	mutex_lock(&lgr->rmbs_lock);
+	down_write(&lgr->rmbs_lock);
 	num_rkeys_send = lgr->conns_num;
 	buf_pos = smc_llc_get_first_rmb(lgr, &buf_lst);
 	do {
@@ -950,7 +950,7 @@ static int smc_llc_cli_rkey_exchange(struct smc_link *link,
 			break;
 	} while (num_rkeys_send || num_rkeys_recv);
 
-	mutex_unlock(&lgr->rmbs_lock);
+	up_write(&lgr->rmbs_lock);
 	return rc;
 }
 
@@ -1033,14 +1033,14 @@ static void smc_llc_save_add_link_rkeys(struct smc_link *link,
 	ext = (struct smc_llc_msg_add_link_v2_ext *)((u8 *)lgr->wr_rx_buf_v2 +
 						     SMC_WR_TX_SIZE);
 	max = min_t(u8, ext->num_rkeys, SMC_LLC_RKEYS_PER_MSG_V2);
-	mutex_lock(&lgr->rmbs_lock);
+	down_write(&lgr->rmbs_lock);
 	for (i = 0; i < max; i++) {
 		smc_rtoken_set(lgr, link->link_idx, link_new->link_idx,
 			       ext->rt[i].rmb_key,
 			       ext->rt[i].rmb_vaddr_new,
 			       ext->rt[i].rmb_key_new);
 	}
-	mutex_unlock(&lgr->rmbs_lock);
+	up_write(&lgr->rmbs_lock);
 }
 
 static void smc_llc_save_add_link_info(struct smc_link *link,
@@ -1347,7 +1347,7 @@ static int smc_llc_srv_rkey_exchange(struct smc_link *link,
 	int rc = 0;
 	int i;
 
-	mutex_lock(&lgr->rmbs_lock);
+	down_write(&lgr->rmbs_lock);
 	num_rkeys_send = lgr->conns_num;
 	buf_pos = smc_llc_get_first_rmb(lgr, &buf_lst);
 	do {
@@ -1372,7 +1372,7 @@ static int smc_llc_srv_rkey_exchange(struct smc_link *link,
 		smc_llc_flow_qentry_del(&lgr->llc_flow_lcl);
 	} while (num_rkeys_send || num_rkeys_recv);
 out:
-	mutex_unlock(&lgr->rmbs_lock);
+	up_write(&lgr->rmbs_lock);
 	return rc;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v4 09/10] net/smc: Fix potential panic dues to unprotected smc_llc_srv_add_link()
  2022-10-23 12:43 [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections D.Wythe
                   ` (7 preceding siblings ...)
  2022-10-23 12:44 ` [PATCH net-next v4 08/10] net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore D.Wythe
@ 2022-10-23 12:44 ` D.Wythe
  2022-10-23 12:44 ` [PATCH net-next v4 10/10] net/smc: fix application data exception D.Wythe
  2022-10-24 13:11 ` [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections Jan Karcher
  10 siblings, 0 replies; 20+ messages in thread
From: D.Wythe @ 2022-10-23 12:44 UTC (permalink / raw)
  To: kgraul, wenjia, jaka; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D. Wythe" <alibuda@linux.alibaba.com>

After we optimized the parallelism of SMC-R connection establishment,
there is a certain chance of triggering the following panic:

PID: 5900   TASK: ffff88c1c8af4100  CPU: 1   COMMAND: "kworker/1:48"
 #0 [ffff9456c1cc79a0] machine_kexec at ffffffff870665b7
 #1 [ffff9456c1cc79f0] __crash_kexec at ffffffff871b4c7a
 #2 [ffff9456c1cc7ab0] crash_kexec at ffffffff871b5b60
 #3 [ffff9456c1cc7ac0] oops_end at ffffffff87026ce7
 #4 [ffff9456c1cc7ae0] page_fault_oops at ffffffff87075715
 #5 [ffff9456c1cc7b58] exc_page_fault at ffffffff87ad0654
 #6 [ffff9456c1cc7b80] asm_exc_page_fault at ffffffff87c00b62
    [exception RIP: ib_alloc_mr+19]
    RIP: ffffffffc0c9cce3  RSP: ffff9456c1cc7c38  RFLAGS: 00010202
    RAX: 0000000000000000  RBX: 0000000000000002  RCX: 0000000000000004
    RDX: 0000000000000010  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88c1ea281d00   R8: 000000020a34ffff   R9: ffff88c1350bbb20
    R10: 0000000000000000  R11: 0000000000000001  R12: 0000000000000000
    R13: 0000000000000010  R14: ffff88c1ab040a50  R15: ffff88c1ea281d00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff9456c1cc7c60] smc_ib_get_memory_region at ffffffffc0aff6df [smc]
 #8 [ffff9456c1cc7c88] smcr_buf_map_link at ffffffffc0b0278c [smc]
 #9 [ffff9456c1cc7ce0] __smc_buf_create at ffffffffc0b03586 [smc]

The reason here is that when the server tries to create a second link,
smc_llc_srv_add_link() runs without any protection and may add a new
link to the link group while other flows are active. This breaks the
consistency otherwise guaranteed by llc_conf_mutex.
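
A compressed sketch of the fix (pthread analogue of llc_conf_mutex,
illustrative names only): the first-contact path must hold the lock in
write mode around the link addition, like every other path that
mutates the link array:

#include <pthread.h>

static pthread_rwlock_t llc_conf_lock = PTHREAD_RWLOCK_INITIALIZER;

static void serv_conf_first_link(void (*srv_add_link)(void))
{
	pthread_rwlock_wrlock(&llc_conf_lock);	/* the missing protection */
	srv_add_link();		/* may add a second link to the group */
	pthread_rwlock_unlock(&llc_conf_lock);
}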

Fixes: 2d2209f20189 ("net/smc: first part of add link processing as SMC server")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/af_smc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 3bac24e..8647d5e 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1829,8 +1829,10 @@ static int smcr_serv_conf_first_link(struct smc_sock *smc)
 	smc_llc_link_active(link);
 	smcr_lgr_set_type(link->lgr, SMC_LGR_SINGLE);
 
+	down_write(&link->lgr->llc_conf_mutex);
 	/* initial contact - try to establish second link */
 	smc_llc_srv_add_link(link, NULL);
+	up_write(&link->lgr->llc_conf_mutex);
 	return 0;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v4 10/10] net/smc: fix application data exception
  2022-10-23 12:43 [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections D.Wythe
                   ` (8 preceding siblings ...)
  2022-10-23 12:44 ` [PATCH net-next v4 09/10] net/smc: Fix potential panic dues to unprotected smc_llc_srv_add_link() D.Wythe
@ 2022-10-23 12:44 ` D.Wythe
  2022-10-24 13:11 ` [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections Jan Karcher
  10 siblings, 0 replies; 20+ messages in thread
From: D.Wythe @ 2022-10-23 12:44 UTC (permalink / raw)
  To: kgraul, wenjia, jaka; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D. Wythe" <alibuda@linux.alibaba.com>

After we optimized the parallelism of SMC-R connection establishment,
there is a certain probability that the following exceptions occur in
the wrk benchmark test:

Running 10s test @ http://11.213.45.6:80
  8 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.72ms   13.94ms 245.33ms   94.17%
    Req/Sec     1.96k   713.67     5.41k    75.16%
  155262 requests in 10.10s, 23.10MB read
Non-2xx or 3xx responses: 3

The errors are HTTP 400 responses, which is a serious exception in our
test: it means the application data was corrupted.

Consider the following scenarios:

CPU0                            CPU1

buf_desc->used = 0;
                                cmpxchg(buf_desc->used, 0, 1)
                                deal_with(buf_desc)

memset(buf_desc->cpu_addr,0);

This will cause the data received by a victim connection to be cleared,
thus triggering an HTTP 400 error in the server.

This patch swaps the order of clearing 'used' and the memset, and adds
a barrier to ensure memory consistency.
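
The required ordering can be modelled with C11 atomics (a sketch of
the idea only: memzero_explicit() plus WRITE_ONCE() are approximated
here by a release store, and the kernel's cmpxchg already implies the
matching ordering on the claiming side):

#include <stdatomic.h>
#include <string.h>

struct buf {
	atomic_int used;	/* 0 = free, 1 = taken */
	char data[4096];
};

/* releasing side: zero the payload *before* publishing the buffer as
 * free, so a claimer can never observe its data wiped after the claim */
static void buf_unuse(struct buf *b)
{
	memset(b->data, 0, sizeof(b->data));
	atomic_store_explicit(&b->used, 0, memory_order_release);
}

/* claiming side: acquire pairs with the release above */
static int buf_claim(struct buf *b)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&b->used, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}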

Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/smc_core.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index b09a3fd..448b851 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1401,8 +1401,9 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
 
 		smc_buf_free(lgr, is_rmb, buf_desc);
 	} else {
-		buf_desc->used = 0;
-		memset(buf_desc->cpu_addr, 0, buf_desc->len);
+		/* memzero_explicit provides potential memory barrier semantics */
+		memzero_explicit(buf_desc->cpu_addr, buf_desc->len);
+		WRITE_ONCE(buf_desc->used, 0);
 	}
 }
 
@@ -1413,19 +1414,17 @@ static void smc_buf_unuse(struct smc_connection *conn,
 		if (!lgr->is_smcd && conn->sndbuf_desc->is_vm) {
 			smcr_buf_unuse(conn->sndbuf_desc, false, lgr);
 		} else {
-			conn->sndbuf_desc->used = 0;
-			memset(conn->sndbuf_desc->cpu_addr, 0,
-			       conn->sndbuf_desc->len);
+			memzero_explicit(conn->sndbuf_desc->cpu_addr, conn->sndbuf_desc->len);
+			WRITE_ONCE(conn->sndbuf_desc->used, 0);
 		}
 	}
 	if (conn->rmb_desc) {
 		if (!lgr->is_smcd) {
 			smcr_buf_unuse(conn->rmb_desc, true, lgr);
 		} else {
-			conn->rmb_desc->used = 0;
-			memset(conn->rmb_desc->cpu_addr, 0,
-			       conn->rmb_desc->len +
-			       sizeof(struct smcd_cdc_msg));
+			memzero_explicit(conn->rmb_desc->cpu_addr,
+					 conn->rmb_desc->len + sizeof(struct smcd_cdc_msg));
+			WRITE_ONCE(conn->rmb_desc->used, 0);
 		}
 	}
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections
  2022-10-23 12:43 [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections D.Wythe
                   ` (9 preceding siblings ...)
  2022-10-23 12:44 ` [PATCH net-next v4 10/10] net/smc: fix application data exception D.Wythe
@ 2022-10-24 13:11 ` Jan Karcher
  2022-10-26  7:20   ` D. Wythe
  2022-11-01  7:22   ` D. Wythe
  10 siblings, 2 replies; 20+ messages in thread
From: Jan Karcher @ 2022-10-24 13:11 UTC (permalink / raw)
  To: D.Wythe, kgraul, wenjia; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

Hi D. Wythe,

I re-ran the tests with your fix.
SMC-R works fine now. For SMC-D we still have the following problem. It
is kind of the same as I reported in v2 but even weirder:

smc stats:

t8345011
SMC-D Connections Summary
   Total connections handled          2465
SMC-R Connections Summary
   Total connections handled           232

t8345010
SMC-D Connections Summary
   Total connections handled          2290
SMC-R Connections Summary
   Total connections handled           231


smc linkgroups:

t8345011
[root@t8345011 ~]# smcr linkgroup
LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
00000400 SERV     SYM         0       0  NET25
[root@t8345011 ~]# smcd linkgroup
LG-ID    VLAN  #Conns  PNET-ID
00000300    0      16  NET25

t8345010
[root@t8345010 tela-kernel]# smcr linkgroup
LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
00000400 CLNT     SYM         0       0  NET25
[root@t8345010 tela-kernel]# smcd linkgroup
LG-ID    VLAN  #Conns  PNET-ID
00000300    0       1  NET25


smcss:

t8345011
[root@t8345011 ~]# smcss
State          UID   Inode   Local Address           Peer Address 
     Intf Mode

t8345010
[root@t8345010 tela-kernel]# smcss
State          UID   Inode   Local Address           Peer Address 
     Intf Mode


lsmod:

t8345011
[root@t8345011 ~]# lsmod | grep smc
smc                   225280  18 ism,smc_diag
t8345010
[root@t8345010 tela-kernel]# lsmod | grep smc
smc                   225280  3 ism,smc_diag

Also smc_dbg and netstat do not show any more information on this
problem. We only see in the dmesg that the code seems to build up SMC-R
linkgroups even though we are running the SMC-D tests.
NOTE: we disabled the syncookies for the tests.

dmesg:

t8345011
smc-tests: test_smcapp_torture_test started
kernel: TCP: request_sock_TCP: Possible SYN flooding on port 22465. 
Dropping request.  Check SNMP counters.
kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000401, peerid 
00000401, ibdev mlx5_0, ibport 1
kernel: smc: SMC-R lg 00000400 net 1 state changed: SINGLE, pnetid NET25 

kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000402, peerid 
00000402, ibdev mlx5_1, ibport 1
kernel: smc: SMC-R lg 00000400 net 1 state changed: SYMMETRIC, pnetid NET25

t8345010
smc-tests: test_smcapp_torture_test started
kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000401, peerid 
00000401, ibdev mlx5_0, ibport 1
kernel: smc: SMC-R lg 00000400 net 1 state changed: SINGLE, pnetid NET25 

kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000402, peerid 
00000402, ibdev mlx5_1, ibport 1
kernel: smc: SMC-R lg 00000400 net 1 state changed: SYMMETRIC, pnetid 
NET25

If this output does not help and you want us to look deeper into it,
feel free to let us know and we can debug further.

On 23/10/2022 14:43, D.Wythe wrote:
> From: "D.Wythe" <alibuda@linux.alibaba.com>
> 
> This patch set attempts to optimize the parallelism of SMC-R connections,
> mainly to reduce unnecessary blocking on locks, and to fix exceptions that
> occur after thoses optimization.
> 
> According to Off-CPU graph, SMC worker's off-CPU as that:
> 
> smc_close_passive_work                  (1.09%)
>          smcr_buf_unuse                  (1.08%)
>                  smc_llc_flow_initiate   (1.02%)
> 
> smc_listen_work                         (48.17%)
>          __mutex_lock.isra.11            (47.96%)
> 
> 
> An ideal SMC-R connection process should only block on the IO events
> of the network, but it's quite clear that the SMC-R connection now is
> queued on the lock most of the time.
> 
> The goal of this patchset is to achieve our ideal situation where
> network IO events are blocked for the majority of the connection lifetime.
> 
> There are three big locks here:
> 
> 1. smc_client_lgr_pending & smc_server_lgr_pending
> 
> 2. llc_conf_mutex
> 
> 3. rmbs_lock & sndbufs_lock
> 
> And an implementation issue:
> 
> 1. confirm/delete rkey msg can't be sent concurrently while
> protocol allows indeed.
> 
> Unfortunately,The above problems together affect the parallelism of
> SMC-R connection. If any of them are not solved. our goal cannot
> be achieved.
> 
> After this patch set, we can get a quite ideal off-CPU graph as
> following:
> 
> smc_close_passive_work                                  (41.58%)
>          smcr_buf_unuse                                  (41.57%)
>                  smc_llc_do_delete_rkey                  (41.57%)
> 
> smc_listen_work                                         (39.10%)
>          smc_clc_wait_msg                                (13.18%)
>                  tcp_recvmsg_locked                      (13.18)
>          smc_listen_find_device                          (25.87%)
>                  smcr_lgr_reg_rmbs                       (25.87%)
>                          smc_llc_do_confirm_rkey         (25.87%)
> 
> We can see that most of the waiting times are waiting for network IO
> events. This also has a certain performance improvement on our
> short-lived conenction wrk/nginx benchmark test:
> 
> +--------------+------+------+-------+--------+------+--------+
> |conns/qps     |c4    | c8   |  c16  |  c32   | c64  |  c200  |
> +--------------+------+------+-------+--------+------+--------+
> |SMC-R before  |9.7k  | 10k  |  10k  |  9.9k  | 9.1k |  8.9k  |
> +--------------+------+------+-------+--------+------+--------+
> |SMC-R now     |13k   | 19k  |  18k  |  16k   | 15k  |  12k   |
> +--------------+------+------+-------+--------+------+--------+
> |TCP           |15k   | 35k  |  51k  |  80k   | 100k |  162k  |
> +--------------+------+------+-------+--------+------+--------+
> 
> The reason why the benefit is not obvious after the number of connections
> has increased dues to workqueue. If we try to change workqueue to UNBOUND,
> we can obtain at least 4-5 times performance improvement, reach up to half
> of TCP. However, this is not an elegant solution, the optimization of it
> will be much more complicated. But in any case, we will submit relevant
> optimization patches as soon as possible.
> 
> Please note that the premise here is that the lock related problem
> must be solved first, otherwise, no matter how we optimize the workqueue,
> there won't be much improvement.
> 
> Because there are a lot of related changes to the code, if you have
> any questions or suggestions, please let me know.
> 
> Thanks
> D. Wythe
> 
> v1 -> v2:
> 
> 1. Fix panic in SMC-D scenario
> 2. Fix lnkc related hashfn calculation exception, caused by operator
> priority
> 3. Only wake up one connection if the lnk is not active
> 4. Delete obsolete unlock logic in smc_listen_work()
> 5. PATCH format, do Reverse Christmas tree
> 6. PATCH format, change all xxx_lnk_xxx function to xxx_link_xxx
> 7. PATCH format, add correct fix tag for the patches for fixes.
> 8. PATCH format, fix some spelling error
> 9. PATCH format, rename slow to do_slow
> 
> v2 -> v3:
> 
> 1. add SMC-D support, remove the concept of link cluster since SMC-D has
> no link at all. Replace it by lgr decision maker, who provides suggestions
> to SMC-D and SMC-R on whether to create new link group.
> 
> 2. Fix the corruption problem described by PATCH 'fix application
> data exception' on SMC-D.
> 
> v3 -> v4:
> 
> 1. Fix panic caused by uninitialization map.
> 
> D. Wythe (10):
>    net/smc: remove locks smc_client_lgr_pending and
>      smc_server_lgr_pending
>    net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending
>    net/smc: allow confirm/delete rkey response deliver multiplex
>    net/smc: make SMC_LLC_FLOW_RKEY run concurrently
>    net/smc: llc_conf_mutex refactor, replace it with rw_semaphore
>    net/smc: use read semaphores to reduce unnecessary blocking in
>      smc_buf_create() & smcr_buf_unuse()
>    net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs()
>    net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore
>    net/smc: Fix potential panic dues to unprotected
>      smc_llc_srv_add_link()
>    net/smc: fix application data exception
> 
>   net/smc/af_smc.c   |  70 ++++----
>   net/smc/smc_core.c | 478 +++++++++++++++++++++++++++++++++++++++++++++++------
>   net/smc/smc_core.h |  36 +++-
>   net/smc/smc_llc.c  | 277 ++++++++++++++++++++++---------
>   net/smc/smc_llc.h  |   6 +
>   net/smc/smc_wr.c   |  10 --
>   net/smc/smc_wr.h   |  10 ++
>   7 files changed, 712 insertions(+), 175 deletions(-)
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections
  2022-10-24 13:11 ` [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections Jan Karcher
@ 2022-10-26  7:20   ` D. Wythe
  2022-11-01  7:22   ` D. Wythe
  1 sibling, 0 replies; 20+ messages in thread
From: D. Wythe @ 2022-10-26  7:20 UTC (permalink / raw)
  To: Jan Karcher, kgraul, wenjia; +Cc: kuba, davem, netdev, linux-s390, linux-rdma


Thank you for the information. I'm trying to debug it.
We will let you know as soon as possible.

Best Wishes.
D. Wythe

On 10/24/22 9:11 PM, Jan Karcher wrote:
> Hi D. Wythe,
> 
> I re-run the tests with your fix.
> SMC-R works fine now. For SMC-D we still have the following problem. It is kind of the same as i reported in v2 but even weirder:
> 
> smc stats:
> 
> t8345011
> SMC-D Connections Summary
>    Total connections handled          2465
> SMC-R Connections Summary
>    Total connections handled           232
> 
> t8345010
> SMC-D Connections Summary
>    Total connections handled          2290
> SMC-R Connections Summary
>    Total connections handled           231
> 
> 
> smc linkgroups:
> 
> t8345011
> [root@t8345011 ~]# smcr linkgroup
> LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
> 00000400 SERV     SYM         0       0  NET25
> [root@t8345011 ~]# smcd linkgroup
> LG-ID    VLAN  #Conns  PNET-ID
> 00000300    0      16  NET25
> 
> t8345010
> [root@t8345010 tela-kernel]# smcr linkgroup
> LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
> 00000400 CLNT     SYM         0       0  NET25
> [root@t8345010 tela-kernel]# smcd linkgroup
> LG-ID    VLAN  #Conns  PNET-ID
> 00000300    0       1  NET25
> 
> 
> smcss:
> 
> t8345011
> [root@t8345011 ~]# smcss
> State          UID   Inode   Local Address           Peer Address     Intf Mode
> 
> t8345010
> [root@t8345010 tela-kernel]# smcss
> State          UID   Inode   Local Address           Peer Address     Intf Mode
> 
> 
> lsmod:
> 
> t8345011
> [root@t8345011 ~]# lsmod | grep smc
> smc                   225280  18 ism,smc_diag
> t8345010
> [root@t8345010 tela-kernel]# lsmod | grep smc
> smc                   225280  3 ism,smc_diag
> 
> Also smc_dbg and netstat do not show any more information on this problem. We only see in the dmesg that the code seems to build up SMC-R linkgroups even tho we are running the SMC-D tests.
> NOTE: we disabled the syncookies for the tests.
> 
> dmesg:
> 
> t8345011
> smc-tests: test_smcapp_torture_test started
> kernel: TCP: request_sock_TCP: Possible SYN flooding on port 22465. Dropping request.  Check SNMP counters.
> kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000401, peerid 00000401, ibdev mlx5_0, ibport 1
> kernel: smc: SMC-R lg 00000400 net 1 state changed: SINGLE, pnetid NET25
> kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000402, peerid 00000402, ibdev mlx5_1, ibport 1
> kernel: smc: SMC-R lg 00000400 net 1 state changed: SYMMETRIC, pnetid NET25
> 
> t8345010
> smc-tests: test_smcapp_torture_test started
> kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000401, peerid 00000401, ibdev mlx5_0, ibport 1
> kernel: smc: SMC-R lg 00000400 net 1 state changed: SINGLE, pnetid NET25
> kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000402, peerid 00000402, ibdev mlx5_1, ibport 1
> kernel: smc: SMC-R lg 00000400 net 1 state changed: SYMMETRIC, pnetid NET25
> 
> If this output does not help and if you want us to look deeper into it feel free to let us know and we can debug further.
> 
> On 23/10/2022 14:43, D.Wythe wrote:
>> From: "D.Wythe" <alibuda@linux.alibaba.com>
>>
>> This patch set attempts to optimize the parallelism of SMC-R connections,
>> mainly to reduce unnecessary blocking on locks, and to fix exceptions that
>> occur after thoses optimization.
>>
>> According to Off-CPU graph, SMC worker's off-CPU as that:
>>
>> smc_close_passive_work                  (1.09%)
>>          smcr_buf_unuse                  (1.08%)
>>                  smc_llc_flow_initiate   (1.02%)
>>
>> smc_listen_work                         (48.17%)
>>          __mutex_lock.isra.11            (47.96%)
>>
>>
>> An ideal SMC-R connection process should only block on the IO events
>> of the network, but it's quite clear that the SMC-R connection now is
>> queued on the lock most of the time.
>>
>> The goal of this patchset is to achieve our ideal situation where
>> network IO events are blocked for the majority of the connection lifetime.
>>
>> There are three big locks here:
>>
>> 1. smc_client_lgr_pending & smc_server_lgr_pending
>>
>> 2. llc_conf_mutex
>>
>> 3. rmbs_lock & sndbufs_lock
>>
>> And an implementation issue:
>>
>> 1. confirm/delete rkey msg can't be sent concurrently while
>> protocol allows indeed.
>>
>> Unfortunately,The above problems together affect the parallelism of
>> SMC-R connection. If any of them are not solved. our goal cannot
>> be achieved.
>>
>> After this patch set, we can get a quite ideal off-CPU graph as
>> following:
>>
>> smc_close_passive_work                                  (41.58%)
>>          smcr_buf_unuse                                  (41.57%)
>>                  smc_llc_do_delete_rkey                  (41.57%)
>>
>> smc_listen_work                                         (39.10%)
>>          smc_clc_wait_msg                                (13.18%)
>>                  tcp_recvmsg_locked                      (13.18)
>>          smc_listen_find_device                          (25.87%)
>>                  smcr_lgr_reg_rmbs                       (25.87%)
>>                          smc_llc_do_confirm_rkey         (25.87%)
>>
>> We can see that most of the waiting times are waiting for network IO
>> events. This also has a certain performance improvement on our
>> short-lived conenction wrk/nginx benchmark test:
>>
>> +--------------+------+------+-------+--------+------+--------+
>> |conns/qps     |c4    | c8   |  c16  |  c32   | c64  |  c200  |
>> +--------------+------+------+-------+--------+------+--------+
>> |SMC-R before  |9.7k  | 10k  |  10k  |  9.9k  | 9.1k |  8.9k  |
>> +--------------+------+------+-------+--------+------+--------+
>> |SMC-R now     |13k   | 19k  |  18k  |  16k   | 15k  |  12k   |
>> +--------------+------+------+-------+--------+------+--------+
>> |TCP           |15k   | 35k  |  51k  |  80k   | 100k |  162k  |
>> +--------------+------+------+-------+--------+------+--------+
>>
>> The reason why the benefit is not obvious after the number of connections
>> has increased dues to workqueue. If we try to change workqueue to UNBOUND,
>> we can obtain at least 4-5 times performance improvement, reach up to half
>> of TCP. However, this is not an elegant solution, the optimization of it
>> will be much more complicated. But in any case, we will submit relevant
>> optimization patches as soon as possible.
>>
>> Please note that the premise here is that the lock related problem
>> must be solved first, otherwise, no matter how we optimize the workqueue,
>> there won't be much improvement.
>>
>> Because there are a lot of related changes to the code, if you have
>> any questions or suggestions, please let me know.
>>
>> Thanks
>> D. Wythe
>>
>> v1 -> v2:
>>
>> 1. Fix panic in SMC-D scenario
>> 2. Fix lnkc related hashfn calculation exception, caused by operator
>> priority
>> 3. Only wake up one connection if the lnk is not active
>> 4. Delete obsolete unlock logic in smc_listen_work()
>> 5. PATCH format, do Reverse Christmas tree
>> 6. PATCH format, change all xxx_lnk_xxx function to xxx_link_xxx
>> 7. PATCH format, add correct fix tag for the patches for fixes.
>> 8. PATCH format, fix some spelling error
>> 9. PATCH format, rename slow to do_slow
>>
>> v2 -> v3:
>>
>> 1. add SMC-D support, remove the concept of link cluster since SMC-D has
>> no link at all. Replace it by lgr decision maker, who provides suggestions
>> to SMC-D and SMC-R on whether to create new link group.
>>
>> 2. Fix the corruption problem described by PATCH 'fix application
>> data exception' on SMC-D.
>>
>> v3 -> v4:
>>
>> 1. Fix panic caused by uninitialization map.
>>
>> D. Wythe (10):
>>    net/smc: remove locks smc_client_lgr_pending and
>>      smc_server_lgr_pending
>>    net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending
>>    net/smc: allow confirm/delete rkey response deliver multiplex
>>    net/smc: make SMC_LLC_FLOW_RKEY run concurrently
>>    net/smc: llc_conf_mutex refactor, replace it with rw_semaphore
>>    net/smc: use read semaphores to reduce unnecessary blocking in
>>      smc_buf_create() & smcr_buf_unuse()
>>    net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs()
>>    net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore
>>    net/smc: Fix potential panic dues to unprotected
>>      smc_llc_srv_add_link()
>>    net/smc: fix application data exception
>>
>>   net/smc/af_smc.c   |  70 ++++----
>>   net/smc/smc_core.c | 478 +++++++++++++++++++++++++++++++++++++++++++++++------
>>   net/smc/smc_core.h |  36 +++-
>>   net/smc/smc_llc.c  | 277 ++++++++++++++++++++++---------
>>   net/smc/smc_llc.h  |   6 +
>>   net/smc/smc_wr.c   |  10 --
>>   net/smc/smc_wr.h   |  10 ++
>>   7 files changed, 712 insertions(+), 175 deletions(-)
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections
  2022-10-24 13:11 ` [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections Jan Karcher
  2022-10-26  7:20   ` D. Wythe
@ 2022-11-01  7:22   ` D. Wythe
  2022-11-02 13:55     ` Wenjia Zhang
  1 sibling, 1 reply; 20+ messages in thread
From: D. Wythe @ 2022-11-01  7:22 UTC (permalink / raw)
  To: Jan Karcher, kgraul, wenjia; +Cc: kuba, davem, netdev, linux-s390, linux-rdma


Hi Jan,

Our team conducted some code reviews on this, but unfortunately no obvious problems were found. Hence
we are waiting for Tony Lu's virtual SMC-D device for testing, which is expected to arrive this week. Before that,
I wonder whether your tests were run on each PATCH separately? If so, I would like to ask you to test
the first PATCH and the second PATCH together. I suspect that the problem repaired by the second PATCH
is the cause of this issue.

Best Wishes.
D. Wythe


On 10/24/22 9:11 PM, Jan Karcher wrote:
> Hi D. Wythe,
> 
> I re-run the tests with your fix.
> SMC-R works fine now. For SMC-D we still have the following problem. It is kind of the same as i reported in v2 but even weirder:
> 
> smc stats:
> 
> t8345011
> SMC-D Connections Summary
>    Total connections handled          2465
> SMC-R Connections Summary
>    Total connections handled           232
> 
> t8345010
> SMC-D Connections Summary
>    Total connections handled          2290
> SMC-R Connections Summary
>    Total connections handled           231
> 
> 
> smc linkgroups:
> 
> t8345011
> [root@t8345011 ~]# smcr linkgroup
> LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
> 00000400 SERV     SYM         0       0  NET25
> [root@t8345011 ~]# smcd linkgroup
> LG-ID    VLAN  #Conns  PNET-ID
> 00000300    0      16  NET25
> 
> t8345010
> [root@t8345010 tela-kernel]# smcr linkgroup
> LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
> 00000400 CLNT     SYM         0       0  NET25
> [root@t8345010 tela-kernel]# smcd linkgroup
> LG-ID    VLAN  #Conns  PNET-ID
> 00000300    0       1  NET25
> 
> 
> smcss:
> 
> t8345011
> [root@t8345011 ~]# smcss
> State          UID   Inode   Local Address           Peer Address     Intf Mode
> 
> t8345010
> [root@t8345010 tela-kernel]# smcss
> State          UID   Inode   Local Address           Peer Address     Intf Mode
> 
> 
> lsmod:
> 
> t8345011
> [root@t8345011 ~]# lsmod | grep smc
> smc                   225280  18 ism,smc_diag
> t8345010
> [root@t8345010 tela-kernel]# lsmod | grep smc
> smc                   225280  3 ism,smc_diag
> 
> Also smc_dbg and netstat do not show any more information on this problem. We only see in the dmesg that the code seems to build up SMC-R linkgroups even tho we are running the SMC-D tests.
> NOTE: we disabled the syncookies for the tests.
> 
> dmesg:
> 
> t8345011
> smc-tests: test_smcapp_torture_test started
> kernel: TCP: request_sock_TCP: Possible SYN flooding on port 22465. Dropping request.  Check SNMP counters.
> kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000401, peerid 00000401, ibdev mlx5_0, ibport 1
> kernel: smc: SMC-R lg 00000400 net 1 state changed: SINGLE, pnetid NET25
> kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000402, peerid 00000402, ibdev mlx5_1, ibport 1
> kernel: smc: SMC-R lg 00000400 net 1 state changed: SYMMETRIC, pnetid NET25
> 
> t8345010
> smc-tests: test_smcapp_torture_test started
> kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000401, peerid 00000401, ibdev mlx5_0, ibport 1
> kernel: smc: SMC-R lg 00000400 net 1 state changed: SINGLE, pnetid NET25
> kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000402, peerid 00000402, ibdev mlx5_1, ibport 1
> kernel: smc: SMC-R lg 00000400 net 1 state changed: SYMMETRIC, pnetid NET25
> 
> If this output does not help and if you want us to look deeper into it feel free to let us know and we can debug further.
> 
> On 23/10/2022 14:43, D.Wythe wrote:
>> From: "D.Wythe" <alibuda@linux.alibaba.com>
>>
>> This patch set attempts to optimize the parallelism of SMC-R connections,
>> mainly to reduce unnecessary blocking on locks, and to fix exceptions that
>> occur after thoses optimization.
>>
>> According to Off-CPU graph, SMC worker's off-CPU as that:
>>
>> smc_close_passive_work                  (1.09%)
>>          smcr_buf_unuse                  (1.08%)
>>                  smc_llc_flow_initiate   (1.02%)
>>
>> smc_listen_work                         (48.17%)
>>          __mutex_lock.isra.11            (47.96%)
>>
>>
>> An ideal SMC-R connection process should only block on the IO events
>> of the network, but it's quite clear that the SMC-R connection now is
>> queued on the lock most of the time.
>>
>> The goal of this patchset is to achieve our ideal situation where
>> network IO events are blocked for the majority of the connection lifetime.
>>
>> There are three big locks here:
>>
>> 1. smc_client_lgr_pending & smc_server_lgr_pending
>>
>> 2. llc_conf_mutex
>>
>> 3. rmbs_lock & sndbufs_lock
>>
>> And an implementation issue:
>>
>> 1. confirm/delete rkey msg can't be sent concurrently while
>> protocol allows indeed.
>>
>> Unfortunately,The above problems together affect the parallelism of
>> SMC-R connection. If any of them are not solved. our goal cannot
>> be achieved.
>>
>> After this patch set, we can get a quite ideal off-CPU graph as
>> following:
>>
>> smc_close_passive_work                                  (41.58%)
>>          smcr_buf_unuse                                  (41.57%)
>>                  smc_llc_do_delete_rkey                  (41.57%)
>>
>> smc_listen_work                                         (39.10%)
>>          smc_clc_wait_msg                                (13.18%)
>>                  tcp_recvmsg_locked                      (13.18)
>>          smc_listen_find_device                          (25.87%)
>>                  smcr_lgr_reg_rmbs                       (25.87%)
>>                          smc_llc_do_confirm_rkey         (25.87%)
>>
>> We can see that most of the waiting times are waiting for network IO
>> events. This also has a certain performance improvement on our
>> short-lived conenction wrk/nginx benchmark test:
>>
>> +--------------+------+------+-------+--------+------+--------+
>> |conns/qps     |c4    | c8   |  c16  |  c32   | c64  |  c200  |
>> +--------------+------+------+-------+--------+------+--------+
>> |SMC-R before  |9.7k  | 10k  |  10k  |  9.9k  | 9.1k |  8.9k  |
>> +--------------+------+------+-------+--------+------+--------+
>> |SMC-R now     |13k   | 19k  |  18k  |  16k   | 15k  |  12k   |
>> +--------------+------+------+-------+--------+------+--------+
>> |TCP           |15k   | 35k  |  51k  |  80k   | 100k |  162k  |
>> +--------------+------+------+-------+--------+------+--------+
>>
>> The reason why the benefit is not obvious after the number of connections
>> has increased dues to workqueue. If we try to change workqueue to UNBOUND,
>> we can obtain at least 4-5 times performance improvement, reach up to half
>> of TCP. However, this is not an elegant solution, the optimization of it
>> will be much more complicated. But in any case, we will submit relevant
>> optimization patches as soon as possible.
>>
>> Please note that the premise here is that the lock related problem
>> must be solved first, otherwise, no matter how we optimize the workqueue,
>> there won't be much improvement.
>>
>> Because there are a lot of related changes to the code, if you have
>> any questions or suggestions, please let me know.
>>
>> Thanks
>> D. Wythe
>>
>> v1 -> v2:
>>
>> 1. Fix panic in SMC-D scenario
>> 2. Fix lnkc related hashfn calculation exception, caused by operator
>> priority
>> 3. Only wake up one connection if the lnk is not active
>> 4. Delete obsolete unlock logic in smc_listen_work()
>> 5. PATCH format, do Reverse Christmas tree
>> 6. PATCH format, change all xxx_lnk_xxx function to xxx_link_xxx
>> 7. PATCH format, add correct fix tag for the patches for fixes.
>> 8. PATCH format, fix some spelling error
>> 9. PATCH format, rename slow to do_slow
>>
>> v2 -> v3:
>>
>> 1. add SMC-D support, remove the concept of link cluster since SMC-D has
>> no link at all. Replace it by lgr decision maker, who provides suggestions
>> to SMC-D and SMC-R on whether to create new link group.
>>
>> 2. Fix the corruption problem described by PATCH 'fix application
>> data exception' on SMC-D.
>>
>> v3 -> v4:
>>
>> 1. Fix panic caused by uninitialization map.
>>
>> D. Wythe (10):
>>    net/smc: remove locks smc_client_lgr_pending and
>>      smc_server_lgr_pending
>>    net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending
>>    net/smc: allow confirm/delete rkey response deliver multiplex
>>    net/smc: make SMC_LLC_FLOW_RKEY run concurrently
>>    net/smc: llc_conf_mutex refactor, replace it with rw_semaphore
>>    net/smc: use read semaphores to reduce unnecessary blocking in
>>      smc_buf_create() & smcr_buf_unuse()
>>    net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs()
>>    net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore
>>    net/smc: Fix potential panic dues to unprotected
>>      smc_llc_srv_add_link()
>>    net/smc: fix application data exception
>>
>>   net/smc/af_smc.c   |  70 ++++----
>>   net/smc/smc_core.c | 478 +++++++++++++++++++++++++++++++++++++++++++++++------
>>   net/smc/smc_core.h |  36 +++-
>>   net/smc/smc_llc.c  | 277 ++++++++++++++++++++++---------
>>   net/smc/smc_llc.h  |   6 +
>>   net/smc/smc_wr.c   |  10 --
>>   net/smc/smc_wr.h   |  10 ++
>>   7 files changed, 712 insertions(+), 175 deletions(-)
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections
  2022-11-01  7:22   ` D. Wythe
@ 2022-11-02 13:55     ` Wenjia Zhang
  2022-11-07 11:05       ` D. Wythe
  0 siblings, 1 reply; 20+ messages in thread
From: Wenjia Zhang @ 2022-11-02 13:55 UTC (permalink / raw)
  To: D. Wythe, Jan Karcher, kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 01.11.22 08:22, D. Wythe wrote:
> 
> Hi Jan,
> 
> Our team conducted some code reviews on this, but unfortunately no
> obvious problems were found. Hence we are waiting for Tony Lu's
> virtual SMC-D device to test with, which is expected to arrive this
> week. Before that, I wonder whether your tests were run separately on
> each patch? If so, I would like to ask you to test the first patch and
> the second patch together. I suspect that the problem repaired by the
> second patch is the cause of this issue.
> 
> Best Wishes.
> D. Wythe
> 

Hi D. Wythe,

We did test the series of the patches as a whole. It would be great if
you could use Tony's virtual device to test SMC-D. By the way, I'll put
your patches in our CI; let's see if it can find something.

Best,
Wenjia
> 
> On 10/24/22 9:11 PM, Jan Karcher wrote:
>> Hi D. Wythe,
>>
>> I re-ran the tests with your fix.
>> SMC-R works fine now. For SMC-D we still have the following problem.
>> It is kind of the same as I reported in v2 but even weirder:
>>
>> smc stats:
>>
>> t8345011
>> SMC-D Connections Summary
>>    Total connections handled          2465
>> SMC-R Connections Summary
>>    Total connections handled           232
>>
>> t8345010
>> SMC-D Connections Summary
>>    Total connections handled          2290
>> SMC-R Connections Summary
>>    Total connections handled           231
>>
>>
>> smc linkgroups:
>>
>> t8345011
>> [root@t8345011 ~]# smcr linkgroup
>> LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
>> 00000400 SERV     SYM         0       0  NET25
>> [root@t8345011 ~]# smcd linkgroup
>> LG-ID    VLAN  #Conns  PNET-ID
>> 00000300    0      16  NET25
>>
>> t8345010
>> [root@t8345010 tela-kernel]# smcr linkgroup
>> LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
>> 00000400 CLNT     SYM         0       0  NET25
>> [root@t8345010 tela-kernel]# smcd linkgroup
>> LG-ID    VLAN  #Conns  PNET-ID
>> 00000300    0       1  NET25
>>
>>
>> smcss:
>>
>> t8345011
>> [root@t8345011 ~]# smcss
>> State          UID   Inode   Local Address           Peer Address     
>> Intf Mode
>>
>> t8345010
>> [root@t8345010 tela-kernel]# smcss
>> State          UID   Inode   Local Address           Peer Address     
>> Intf Mode
>>
>>
>> lsmod:
>>
>> t8345011
>> [root@t8345011 ~]# lsmod | grep smc
>> smc                   225280  18 ism,smc_diag
>> t8345010
>> [root@t8345010 tela-kernel]# lsmod | grep smc
>> smc                   225280  3 ism,smc_diag
>>
>> Also, smc_dbg and netstat do not show any more information on this
>> problem. We only see in the dmesg that the code seems to build up
>> SMC-R link groups even though we are running the SMC-D tests.
>> NOTE: we disabled the syncookies for the tests.
>>
>> dmesg:
>>
>> t8345011
>> smc-tests: test_smcapp_torture_test started
>> kernel: TCP: request_sock_TCP: Possible SYN flooding on port 22465. 
>> Dropping request.  Check SNMP counters.
>> kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000401, peerid 
>> 00000401, ibdev mlx5_0, ibport 1
>> kernel: smc: SMC-R lg 00000400 net 1 state changed: SINGLE, pnetid NET25
>> kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000402, peerid 
>> 00000402, ibdev mlx5_1, ibport 1
>> kernel: smc: SMC-R lg 00000400 net 1 state changed: SYMMETRIC, pnetid 
>> NET25
>>
>> t8345010
>> smc-tests: test_smcapp_torture_test started
>> kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000401, peerid 
>> 00000401, ibdev mlx5_0, ibport 1
>> kernel: smc: SMC-R lg 00000400 net 1 state changed: SINGLE, pnetid NET25
>> kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000402, peerid 
>> 00000402, ibdev mlx5_1, ibport 1
>> kernel: smc: SMC-R lg 00000400 net 1 state changed: SYMMETRIC, pnetid 
>> NET25
>>
>> If this output does not help and you want us to look deeper into it,
>> feel free to let us know and we can debug further.
>>
>> On 23/10/2022 14:43, D.Wythe wrote:
>>> From: "D.Wythe" <alibuda@linux.alibaba.com>
>>>
>>> This patch set attempts to optimize the parallelism of SMC-R 
>>> connections,
>>> mainly to reduce unnecessary blocking on locks, and to fix exceptions 
>>> that
>>> occur after thoses optimization.
>>>
>>> According to Off-CPU graph, SMC worker's off-CPU as that:
>>>
>>> smc_close_passive_work                  (1.09%)
>>>          smcr_buf_unuse                  (1.08%)
>>>                  smc_llc_flow_initiate   (1.02%)
>>>
>>> smc_listen_work                         (48.17%)
>>>          __mutex_lock.isra.11            (47.96%)
>>>
>>>
>>> An ideal SMC-R connection process should only block on the IO events
>>> of the network, but it's quite clear that the SMC-R connection now is
>>> queued on the lock most of the time.
>>>
>>> The goal of this patchset is to achieve our ideal situation where
>>> network IO events are blocked for the majority of the connection 
>>> lifetime.
>>>
>>> There are three big locks here:
>>>
>>> 1. smc_client_lgr_pending & smc_server_lgr_pending
>>>
>>> 2. llc_conf_mutex
>>>
>>> 3. rmbs_lock & sndbufs_lock
>>>
>>> And an implementation issue:
>>>
>>> 1. confirm/delete rkey msg can't be sent concurrently while
>>> protocol allows indeed.
>>>
>>> Unfortunately,The above problems together affect the parallelism of
>>> SMC-R connection. If any of them are not solved. our goal cannot
>>> be achieved.
>>>
>>> After this patch set, we can get a quite ideal off-CPU graph as
>>> following:
>>>
>>> smc_close_passive_work                                  (41.58%)
>>>          smcr_buf_unuse                                  (41.57%)
>>>                  smc_llc_do_delete_rkey                  (41.57%)
>>>
>>> smc_listen_work                                         (39.10%)
>>>          smc_clc_wait_msg                                (13.18%)
>>>                  tcp_recvmsg_locked                      (13.18)
>>>          smc_listen_find_device                          (25.87%)
>>>                  smcr_lgr_reg_rmbs                       (25.87%)
>>>                          smc_llc_do_confirm_rkey         (25.87%)
>>>
>>> We can see that most of the waiting times are waiting for network IO
>>> events. This also has a certain performance improvement on our
>>> short-lived conenction wrk/nginx benchmark test:
>>>
>>> +--------------+------+------+-------+--------+------+--------+
>>> |conns/qps     |c4    | c8   |  c16  |  c32   | c64  |  c200  |
>>> +--------------+------+------+-------+--------+------+--------+
>>> |SMC-R before  |9.7k  | 10k  |  10k  |  9.9k  | 9.1k |  8.9k  |
>>> +--------------+------+------+-------+--------+------+--------+
>>> |SMC-R now     |13k   | 19k  |  18k  |  16k   | 15k  |  12k   |
>>> +--------------+------+------+-------+--------+------+--------+
>>> |TCP           |15k   | 35k  |  51k  |  80k   | 100k |  162k  |
>>> +--------------+------+------+-------+--------+------+--------+
>>>
>>> The reason why the benefit is not obvious after the number of 
>>> connections
>>> has increased dues to workqueue. If we try to change workqueue to 
>>> UNBOUND,
>>> we can obtain at least 4-5 times performance improvement, reach up to 
>>> half
>>> of TCP. However, this is not an elegant solution, the optimization of it
>>> will be much more complicated. But in any case, we will submit relevant
>>> optimization patches as soon as possible.
>>>
>>> Please note that the premise here is that the lock related problem
>>> must be solved first, otherwise, no matter how we optimize the 
>>> workqueue,
>>> there won't be much improvement.
>>>
>>> Because there are a lot of related changes to the code, if you have
>>> any questions or suggestions, please let me know.
>>>
>>> Thanks
>>> D. Wythe
>>>
>>> v1 -> v2:
>>>
>>> 1. Fix panic in SMC-D scenario
>>> 2. Fix lnkc related hashfn calculation exception, caused by operator
>>> priority
>>> 3. Only wake up one connection if the lnk is not active
>>> 4. Delete obsolete unlock logic in smc_listen_work()
>>> 5. PATCH format, do Reverse Christmas tree
>>> 6. PATCH format, change all xxx_lnk_xxx function to xxx_link_xxx
>>> 7. PATCH format, add correct fix tag for the patches for fixes.
>>> 8. PATCH format, fix some spelling error
>>> 9. PATCH format, rename slow to do_slow
>>>
>>> v2 -> v3:
>>>
>>> 1. add SMC-D support, remove the concept of link cluster since SMC-D has
>>> no link at all. Replace it by lgr decision maker, who provides 
>>> suggestions
>>> to SMC-D and SMC-R on whether to create new link group.
>>>
>>> 2. Fix the corruption problem described by PATCH 'fix application
>>> data exception' on SMC-D.
>>>
>>> v3 -> v4:
>>>
>>> 1. Fix panic caused by uninitialization map.
>>>
>>> D. Wythe (10):
>>>    net/smc: remove locks smc_client_lgr_pending and
>>>      smc_server_lgr_pending
>>>    net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending
>>>    net/smc: allow confirm/delete rkey response deliver multiplex
>>>    net/smc: make SMC_LLC_FLOW_RKEY run concurrently
>>>    net/smc: llc_conf_mutex refactor, replace it with rw_semaphore
>>>    net/smc: use read semaphores to reduce unnecessary blocking in
>>>      smc_buf_create() & smcr_buf_unuse()
>>>    net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs()
>>>    net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore
>>>    net/smc: Fix potential panic dues to unprotected
>>>      smc_llc_srv_add_link()
>>>    net/smc: fix application data exception
>>>
>>>   net/smc/af_smc.c   |  70 ++++----
>>>   net/smc/smc_core.c | 478 
>>> +++++++++++++++++++++++++++++++++++++++++++++++------
>>>   net/smc/smc_core.h |  36 +++-
>>>   net/smc/smc_llc.c  | 277 ++++++++++++++++++++++---------
>>>   net/smc/smc_llc.h  |   6 +
>>>   net/smc/smc_wr.c   |  10 --
>>>   net/smc/smc_wr.h   |  10 ++
>>>   7 files changed, 712 insertions(+), 175 deletions(-)
>>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections
  2022-11-02 13:55     ` Wenjia Zhang
@ 2022-11-07 11:05       ` D. Wythe
  2022-11-09  9:10         ` D. Wythe
  0 siblings, 1 reply; 20+ messages in thread
From: D. Wythe @ 2022-11-07 11:05 UTC (permalink / raw)
  To: Wenjia Zhang, Jan Karcher, kgraul
  Cc: kuba, davem, netdev, linux-s390, linux-rdma


Hi Wenjia,

Thanks a lot for your information; before this, we thought you had tested the
patches one by one. Now I think I have found the root cause, and I will
release a new version to fix this as soon as possible.

Best Wishes.
D. Wythe

On 11/2/22 9:55 PM, Wenjia Zhang wrote:
> 
> 
> On 01.11.22 08:22, D. Wythe wrote:
>> [...]
> 
> Hi D. Wythe,
> 
> We did test the series of the patches as a whole. It would be great if you could use Tony's virtual device to test SMC-D. By the way, I'll put your patches in our CI; let's see if it can find something.
> 
> Best,
> Wenjia
>>
>> On 10/24/22 9:11 PM, Jan Karcher wrote:
>>> [...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections
  2022-11-07 11:05       ` D. Wythe
@ 2022-11-09  9:10         ` D. Wythe
  2022-11-09 17:31           ` Wenjia Zhang
  0 siblings, 1 reply; 20+ messages in thread
From: D. Wythe @ 2022-11-09  9:10 UTC (permalink / raw)
  To: Wenjia Zhang, Jan Karcher, kgraul
  Cc: kuba, davem, netdev, linux-s390, linux-rdma


Hi Wenjia and Jan,

I'm not sure whether my guess is right, so I need some help from you. My guess
is that the smcd_ops register_dmb() callback is not thread-safe: after I removed
the lock, different connections might get the same sba_idx, which would cause a
connection to be lost in the map (smcd->conn). If so, the CDC message carrying
the close/abort information cannot be dispatched to the correct connection, and
that connection then remains in the link group abnormally.

/* Set a connection using this DMBE. */
void smc_ism_set_conn(struct smc_connection *conn)
{
	unsigned long flags;

	spin_lock_irqsave(&conn->lgr->smcd->lock, flags);
	conn->lgr->smcd->conn[conn->rmb_desc->sba_idx] = conn;
	spin_unlock_irqrestore(&conn->lgr->smcd->lock, flags);
}


struct smcd_ops {

	int (*register_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb);
}
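To make the suspicion concrete, this is the interleaving I have in mind
(hypothetical, just to illustrate the effect of a duplicated sba_idx; N
is an arbitrary index):

	/* conn A: register_dmb()     -> rmb_desc->sba_idx = N
	 * conn B: register_dmb()     -> rmb_desc->sba_idx = N (duplicate)
	 * conn A: smc_ism_set_conn() -> smcd->conn[N] = A
	 * conn B: smc_ism_set_conn() -> smcd->conn[N] = B, A is overwritten
	 *
	 * From now on every inbound CDC message for DMBE N resolves to B,
	 * so the close/abort for A is never delivered and A lingers in
	 * the link group.
	 */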


On 11/7/22 7:05 PM, D. Wythe wrote:
> 
> Hi Wenjia,
> 
> Thanks a lot for your information; before this, we thought you had tested the
> patches one by one. Now I think I have found the root cause, and I will
> release a new version to fix this as soon as possible.
> 
> Best Wishes.
> D. Wythe
> 
> On 11/2/22 9:55 PM, Wenjia Zhang wrote:
>> [...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections
  2022-11-09  9:10         ` D. Wythe
@ 2022-11-09 17:31           ` Wenjia Zhang
  2022-11-10  7:54             ` D. Wythe
  0 siblings, 1 reply; 20+ messages in thread
From: Wenjia Zhang @ 2022-11-09 17:31 UTC (permalink / raw)
  To: D. Wythe, Jan Karcher, kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 09.11.22 10:10, D. Wythe wrote:
> 
> Hi Wenjia and Jan,
> 
> I'm not sure whether my guess is right, so I need some help from you.
> My guess is that the smcd_ops register_dmb() callback is not
> thread-safe: after I removed the lock, different connections might get
> the same sba_idx, which would cause a connection to be lost in the map
> (smcd->conn). If so, the CDC message carrying the close/abort
> information cannot be dispatched to the correct connection, and that
> connection then remains in the link group abnormally.
> 
> /* Set a connection using this DMBE. */
> void smc_ism_set_conn(struct smc_connection *conn)
> {
>      unsigned long flags;
> 
>      spin_lock_irqsave(&conn->lgr->smcd->lock, flags);
>      conn->lgr->smcd->conn[conn->rmb_desc->sba_idx] = conn;
>      spin_unlock_irqrestore(&conn->lgr->smcd->lock, flags);
> }
> 
> 
> struct smcd_ops {
> 
>      int (*register_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb);
> }
> 
> 

Hi D. Wythe,

Very glad if we can help. It does look very questionable. However, I
don't really think it is what triggers the problem. I did some tracing
and saw that something already went wrong during the CLC handshake,
where one connection was decided for SMC-R, not SMC-D. This is one
piece of the snapshot of the trace:

<...>-540539 [000] 306429.068196::|  smc_connect() {
<...>-540539 [000] 306429.068198::|    smc_copy_sock_settings_to_clc();
<...>-540539 [000] 306429.120310::|    smc_ism_is_v2_capable();
<...>-540539 [000] 306429.120316::|    .LASANPC6743();
<...>-540539 [000] 306429.120319::|    smc_find_proposal_devices() {
<...>-540539 [000] 306429.120319::|      smc_pnet_find_ism_resource() {
<...>-540539 [000] 306429.120331::|        smc_pnet_match();
<...>-540539 [000] 306429.120332::|      }
<...>-540539 [000] 306429.120333::|      smc_ism_get_chid();
<...>-540539 [000] 306429.120334::|      smc_pnet_find_roce_resource() {
<...>-540539 [000] 306429.120344::|        smc_pnet_match();
<...>-540539 [000] 306429.120346::|        smc_ib_port_active();
<...>-540539 [000] 306429.120347::|        smc_pnet_determine_gid() {
<...>-540539 [000] 306429.120347::|          smc_ib_determine_gid() {
<...>-540539 [000] 306429.120350::|            smc_ib_determine_gid_rcu();
<...>-540539 [000] 306429.120351::|          }
<...>-540539 [000] 306429.120352::|        }
<...>-540539 [000] 306429.120352::|      }
<...>-540539 [000] 306429.120353::|      smc_ism_is_v2_capable();
<...>-540539 [000] 306429.120355::|      smc_clc_ueid_count();
<...>-540539 [000] 306429.120357::|      smc_pnet_find_roce_resource() {
<...>-540539 [000] 306429.120367::|        smc_pnet_match();
<...>-540539 [000] 306429.120368::|        smc_ib_port_active();
<...>-540539 [000] 306429.120369::|        smc_pnet_determine_gid() {
<...>-540539 [000] 306429.120370::|          smc_ib_determine_gid() {
<...>-540539 [000] 306429.120372::|            smc_ib_determine_gid_rcu();
<...>-540539 [000] 306429.120376::|            smc_ib_determine_gid_rcu();
<...>-540539 [000] 306429.120379::|            smc_ib_determine_gid_rcu();
<...>-540539 [000] 306429.120382::|            smc_ib_determine_gid_rcu();
<...>-540539 [000] 306429.120650::|          }
<...>-540539 [000] 306429.120651::|        }
<...>-540539 [000] 306429.120652::|        smc_pnet_match();
<...>-540539 [000] 306429.120653::|        smc_ib_port_active();
<...>-540539 [000] 306429.120654::|        smc_pnet_determine_gid() {
<...>-540539 [000] 306429.120654::|          smc_ib_determine_gid() {
<...>-540539 [000] 306429.120657::|            smc_ib_determine_gid_rcu();
<...>-540539 [000] 306429.120660::|            smc_ib_determine_gid_rcu();
<...>-540539 [000] 306429.120829::|          }
<...>-540539 [000] 306429.120829::|        }
<...>-540539 [000] 306429.120830::|      }
<...>-540539 [000] 306429.120831::|    }
<...>-540539 [000] 306429.120836::|    .LASANPC6660() {
<...>-540539 [000] 306429.120843::|      .LASANPC6654() {
<...>-540539 [000] 306429.120847::|        smc_clc_prfx_set4_rcu.isra.0();
<...>-540539 [000] 306429.120849::|      }
<...>-540539 [000] 306429.120850::|      smc_ism_get_chid();
<...>-540539 [000] 306429.120851::|      smc_ism_get_system_eid();
<...>-540539 [000] 306429.120889::|    }
<...>-540539 [000] 306429.120890::|    .LASANPC6658() {
<...>-540539 [000] 306429.124906::|      smc_clc_msg_hdr_valid();
<...>-540539 [000] 306429.124908::|    }
<...>-540539 [000] 306429.124908::|    .LASANPC6727() {
<...>-540539 [000] 306429.124909::|      smc_connect_rdma_v2_prepare();
<...>-540539 [000] 306429.124912::|      smc_conn_create() {

> On 11/7/22 7:05 PM, D. Wythe wrote:
>> [...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections
  2022-11-09 17:31           ` Wenjia Zhang
@ 2022-11-10  7:54             ` D. Wythe
  2022-11-10  9:39               ` D. Wythe
  0 siblings, 1 reply; 20+ messages in thread
From: D. Wythe @ 2022-11-10  7:54 UTC (permalink / raw)
  To: Wenjia Zhang, Jan Karcher, kgraul
  Cc: kuba, davem, netdev, linux-s390, linux-rdma


Hi Wenjia,

According to the code in ism_register_dmb():

	if (!dmb->sba_idx) {
		bit = find_next_zero_bit(ism->sba_bitmap, ISM_NR_DMBS,
					 ISM_DMB_BIT_OFFSET);
		if (bit == ISM_NR_DMBS)
			return -ENOSPC;

		dmb->sba_idx = bit;
	}
	if (dmb->sba_idx < ISM_DMB_BIT_OFFSET ||
	    test_and_set_bit(dmb->sba_idx, ism->sba_bitmap))
		return -EINVAL;

We can see that ism_register_dmb() is not thread-safe: invoking this function
concurrently without protection may fail with -EINVAL, because different callers
might get the same bit from find_next_zero_bit() and only one of them can win
the subsequent test_and_set_bit().
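
If this race is really the problem, one conceivable direction (only a
sketch under that assumption, not a tested patch) would be to keep the
allocation lock-free and retry when test_and_set_bit() loses the race,
instead of returning -EINVAL:

	if (!dmb->sba_idx) {
		/* retry the search if another caller grabs the bit
		 * between find_next_zero_bit() and test_and_set_bit()
		 */
		do {
			bit = find_next_zero_bit(ism->sba_bitmap, ISM_NR_DMBS,
						 ISM_DMB_BIT_OFFSET);
			if (bit == ISM_NR_DMBS)
				return -ENOSPC;
		} while (test_and_set_bit(bit, ism->sba_bitmap));
		dmb->sba_idx = bit;
	} else if (dmb->sba_idx < ISM_DMB_BIT_OFFSET ||
		   test_and_set_bit(dmb->sba_idx, ism->sba_bitmap)) {
		return -EINVAL;
	}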

Considering the stack call chain:

smc_listen_work
	smc_listen_find_device
		smc_find_ism_v2_device_serv()
		if (ini->ism_dev[0])
			return
		smc_find_ism_v1_device_serv()
		if (ini->ism_dev[0])
			return
		smc_find_rdma_v2_device_serv()

smc_find_ism_v2_device_serv:

	/* separate - outside the smcd_dev_list.lock */
	smcd_version = ini->smcd_version;
	for (i = 0; i < matches; i++) {
		ini->smcd_version = SMC_V2;
		ini->is_smcd = true;
		ini->ism_selected = i;
		rc = smc_listen_ism_init(new_smc, ini);
		if (rc) {
			smc_find_ism_store_rc(rc, ini);
			/* try next active ISM device */
			continue;
		}
		return; /* matching and usable V2 ISM device found */
	}
	/* no V2 ISM device could be initialized */
	ini->smcd_version = smcd_version;	/* restore original value */
	ini->negotiated_eid[0] = 0;

not_found:
	ini->smcd_version &= ~SMC_V2;
	ini->ism_dev[0] = NULL;
	ini->is_smcd = false;


smc_listen_ism_init
	smc_buf_create
		__smc_buf_create
			smcd_new_buf_create(is_rmb = true)
				smc_ism_register_dmb
					return -EINVAL;


Therefore, a failure of ism_register_dmb() will make smc_find_ism_v2_device_serv()
fail; similarly, smc_find_ism_v1_device_serv() may fail too, so that SMC-R is
finally selected. If my guess is correct, the SMC-D connections should include
both versions, v1 and v2, if v1 and v2 are both supported. (You can see this
with $(smcr -dd stats).)


However, this cannot explain the connection leak problem.
Could you help me dump the status of those connections? I would like to
know which state they stayed in last. (crash utils or $(smcr -a))


Thanks.
D. Wythe


On 11/10/22 1:31 AM, Wenjia Zhang wrote:
> 
> 
> On 09.11.22 10:10, D. Wythe wrote:
>> [...]
> 
> Hi D. Wythe,
> 
> Very glad if we can help. It does look very questionable. However, I don't really think it is what triggers the problem. I did some tracing and saw that something already went wrong during the CLC handshake, where one connection was decided for SMC-R, not SMC-D. This is one piece of the snapshot of the trace:
> 
> [...]
> 
>> On 11/7/22 7:05 PM, D. Wythe wrote:
>>> [...]
>>>>>> On 23/10/2022 14:43, D.Wythe wrote:
>>>>>>> From: "D.Wythe" <alibuda@linux.alibaba.com>
>>>>>>>
>>>>>>> This patch set attempts to optimize the parallelism of SMC-R connections,
>>>>>>> mainly to reduce unnecessary blocking on locks, and to fix exceptions that
>>>>>>> occur after thoses optimization.
>>>>>>>
>>>>>>> According to Off-CPU graph, SMC worker's off-CPU as that:
>>>>>>>
>>>>>>> smc_close_passive_work                  (1.09%)
>>>>>>>          smcr_buf_unuse                  (1.08%)
>>>>>>>                  smc_llc_flow_initiate   (1.02%)
>>>>>>>
>>>>>>> smc_listen_work                         (48.17%)
>>>>>>>          __mutex_lock.isra.11            (47.96%)
>>>>>>>
>>>>>>>
>>>>>>> An ideal SMC-R connection process should only block on the IO events
>>>>>>> of the network, but it's quite clear that the SMC-R connection now is
>>>>>>> queued on the lock most of the time.
>>>>>>>
>>>>>>> The goal of this patchset is to achieve our ideal situation where
>>>>>>> network IO events are blocked for the majority of the connection lifetime.
>>>>>>>
>>>>>>> There are three big locks here:
>>>>>>>
>>>>>>> 1. smc_client_lgr_pending & smc_server_lgr_pending
>>>>>>>
>>>>>>> 2. llc_conf_mutex
>>>>>>>
>>>>>>> 3. rmbs_lock & sndbufs_lock
>>>>>>>
>>>>>>> And an implementation issue:
>>>>>>>
>>>>>>> 1. confirm/delete rkey msg can't be sent concurrently while
>>>>>>> protocol allows indeed.
>>>>>>>
>>>>>>> Unfortunately,The above problems together affect the parallelism of
>>>>>>> SMC-R connection. If any of them are not solved. our goal cannot
>>>>>>> be achieved.
>>>>>>>
>>>>>>> After this patch set, we can get a quite ideal off-CPU graph as
>>>>>>> following:
>>>>>>>
>>>>>>> smc_close_passive_work                                  (41.58%)
>>>>>>>          smcr_buf_unuse                                  (41.57%)
>>>>>>>                  smc_llc_do_delete_rkey                  (41.57%)
>>>>>>>
>>>>>>> smc_listen_work                                         (39.10%)
>>>>>>>          smc_clc_wait_msg                                (13.18%)
>>>>>>>                  tcp_recvmsg_locked                      (13.18)
>>>>>>>          smc_listen_find_device                          (25.87%)
>>>>>>>                  smcr_lgr_reg_rmbs                       (25.87%)
>>>>>>>                          smc_llc_do_confirm_rkey         (25.87%)
>>>>>>>
>>>>>>> We can see that most of the waiting times are waiting for network IO
>>>>>>> events. This also has a certain performance improvement on our
>>>>>>> short-lived conenction wrk/nginx benchmark test:
>>>>>>>
>>>>>>> +--------------+------+------+-------+--------+------+--------+
>>>>>>> |conns/qps     |c4    | c8   |  c16  |  c32   | c64  |  c200  |
>>>>>>> +--------------+------+------+-------+--------+------+--------+
>>>>>>> |SMC-R before  |9.7k  | 10k  |  10k  |  9.9k  | 9.1k |  8.9k  |
>>>>>>> +--------------+------+------+-------+--------+------+--------+
>>>>>>> |SMC-R now     |13k   | 19k  |  18k  |  16k   | 15k  |  12k   |
>>>>>>> +--------------+------+------+-------+--------+------+--------+
>>>>>>> |TCP           |15k   | 35k  |  51k  |  80k   | 100k |  162k  |
>>>>>>> +--------------+------+------+-------+--------+------+--------+
>>>>>>>
>>>>>>> The reason why the benefit is not obvious after the number of connections
>>>>>>> has increased dues to workqueue. If we try to change workqueue to UNBOUND,
>>>>>>> we can obtain at least 4-5 times performance improvement, reach up to half
>>>>>>> of TCP. However, this is not an elegant solution, the optimization of it
>>>>>>> will be much more complicated. But in any case, we will submit relevant
>>>>>>> optimization patches as soon as possible.
>>>>>>>
>>>>>>> Please note that the premise here is that the lock related problem
>>>>>>> must be solved first, otherwise, no matter how we optimize the workqueue,
>>>>>>> there won't be much improvement.
>>>>>>>
>>>>>>> Because there are a lot of related changes to the code, if you have
>>>>>>> any questions or suggestions, please let me know.
>>>>>>>
>>>>>>> Thanks
>>>>>>> D. Wythe
>>>>>>>
>>>>>>> v1 -> v2:
>>>>>>>
>>>>>>> 1. Fix panic in SMC-D scenario
>>>>>>> 2. Fix lnkc related hashfn calculation exception, caused by operator
>>>>>>> priority
>>>>>>> 3. Only wake up one connection if the lnk is not active
>>>>>>> 4. Delete obsolete unlock logic in smc_listen_work()
>>>>>>> 5. PATCH format, do Reverse Christmas tree
>>>>>>> 6. PATCH format, change all xxx_lnk_xxx function to xxx_link_xxx
>>>>>>> 7. PATCH format, add correct fix tag for the patches for fixes.
>>>>>>> 8. PATCH format, fix some spelling error
>>>>>>> 9. PATCH format, rename slow to do_slow
>>>>>>>
>>>>>>> v2 -> v3:
>>>>>>>
>>>>>>> 1. add SMC-D support, remove the concept of link cluster since SMC-D has
>>>>>>> no link at all. Replace it by lgr decision maker, who provides suggestions
>>>>>>> to SMC-D and SMC-R on whether to create new link group.
>>>>>>>
>>>>>>> 2. Fix the corruption problem described by PATCH 'fix application
>>>>>>> data exception' on SMC-D.
>>>>>>>
>>>>>>> v3 -> v4:
>>>>>>>
>>>>>>> 1. Fix panic caused by uninitialization map.
>>>>>>>
>>>>>>> D. Wythe (10):
>>>>>>>    net/smc: remove locks smc_client_lgr_pending and
>>>>>>>      smc_server_lgr_pending
>>>>>>>    net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending
>>>>>>>    net/smc: allow confirm/delete rkey response deliver multiplex
>>>>>>>    net/smc: make SMC_LLC_FLOW_RKEY run concurrently
>>>>>>>    net/smc: llc_conf_mutex refactor, replace it with rw_semaphore
>>>>>>>    net/smc: use read semaphores to reduce unnecessary blocking in
>>>>>>>      smc_buf_create() & smcr_buf_unuse()
>>>>>>>    net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs()
>>>>>>>    net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore
>>>>>>>    net/smc: Fix potential panic dues to unprotected
>>>>>>>      smc_llc_srv_add_link()
>>>>>>>    net/smc: fix application data exception
>>>>>>>
>>>>>>>   net/smc/af_smc.c   |  70 ++++----
>>>>>>>   net/smc/smc_core.c | 478 +++++++++++++++++++++++++++++++++++++++++++++++------
>>>>>>>   net/smc/smc_core.h |  36 +++-
>>>>>>>   net/smc/smc_llc.c  | 277 ++++++++++++++++++++++---------
>>>>>>>   net/smc/smc_llc.h  |   6 +
>>>>>>>   net/smc/smc_wr.c   |  10 --
>>>>>>>   net/smc/smc_wr.h   |  10 ++
>>>>>>>   7 files changed, 712 insertions(+), 175 deletions(-)
>>>>>>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections
  2022-11-10  7:54             ` D. Wythe
@ 2022-11-10  9:39               ` D. Wythe
  0 siblings, 0 replies; 20+ messages in thread
From: D. Wythe @ 2022-11-10  9:39 UTC (permalink / raw)
  To: Wenjia Zhang, Jan Karcher, kgraul
  Cc: kuba, davem, netdev, linux-s390, linux-rdma


Hi Wenjia,

I've found the cause of the connection leak, which is also related to the failure
mentioned in my last email. Furthermore, the failure occurs on the first-contact
connection, and unfortunately our decision maker wrongly allows SMC-D connections to
be attached to the lgr before the first contact completes. Consider the call chain:


smc_buf_create()
    ism_register_dmb() = -EINVAL          /* DMB registration fails */

smc_conn_abort()
    if (local_first && lgr_valid)
        smc_lgr_cleanup_early(lgr);

smc_lgr_cleanup_early()
    __smc_lgr_terminate()
        smc_conn_kill(conn, soft);
            if (conn->lgr->is_smcd) {
                smc_ism_unset_conn(conn);
            }

smc_ism_unset_conn()
    conn->lgr->smcd->conn[conn->rmb_desc->sba_idx] = NULL;

As a result, the mapping of the connections in smcd->conn is lost, and those
connections know nothing about it. They can no longer receive any messages from
the ISM device, including close/abort messages.
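
For reference, the per-DMB event dispatch in smc_ism.c looks roughly like the
sketch below (paraphrased from memory; the argument list and details vary by
kernel version). Once the conn[] slot is NULL, the event is silently dropped:

void smcd_handle_irq(struct smcd_dev *smcd, unsigned int dmbno)
{
	struct smc_connection *conn;
	unsigned long flags;

	spin_lock_irqsave(&smcd->lock, flags);
	conn = smcd->conn[dmbno];	/* NULL after smc_ism_unset_conn() */
	if (conn && !conn->killed)
		tasklet_schedule(&conn->rx_tsklet);
	spin_unlock_irqrestore(&smcd->lock, flags);
}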

I will fix these two problems as soon as possible by:

1. calling ism_register_dmb() under a lock to avoid the unexpected failure
   (see the sketch below).
2. forcing SMC-D connections to be attached to the lgr only after the first
   contact is done.
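
A minimal sketch of fix 1, assuming a hypothetical per-device mutex; the lock
name reg_mutex and the wrapper function are illustrative only, not the actual
patch:

/* Serialize DMB registration so that concurrent connections cannot pick
 * the same free bit from ism->sba_bitmap and then fail with -EINVAL on
 * test_and_set_bit().
 */
static int smc_ism_register_dmb_serialized(struct smcd_dev *smcd,
					   struct smcd_dmb *dmb)
{
	int rc;

	mutex_lock(&smcd->reg_mutex);	/* hypothetical per-device lock */
	rc = smcd->ops->register_dmb(smcd, dmb);
	mutex_unlock(&smcd->reg_mutex);
	return rc;
}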

D. Wythe

On 11/10/22 3:54 PM, D. Wythe wrote:
> 
> Hi Wenjia,
> 
> According to the code in ism_register_dmb():
> 
>      if (!dmb->sba_idx) {
>          bit = find_next_zero_bit(ism->sba_bitmap, ISM_NR_DMBS,
>                       ISM_DMB_BIT_OFFSET);
>          if (bit == ISM_NR_DMBS)
>              return -ENOSPC;
> 
>          dmb->sba_idx = bit;
>      }
>      if (dmb->sba_idx < ISM_DMB_BIT_OFFSET ||
>          test_and_set_bit(dmb->sba_idx, ism->sba_bitmap))
>          return -EINVAL;
> 
> We can see that ism_register_dmb() is not thread-safe: invoking this function concurrently
> without protection may fail with -EINVAL (different callers might get the same bit from
> find_next_zero_bit()).
> 
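> An illustrative interleaving of the race (an assumed timeline, not an actual trace):
> 
>      T1: find_next_zero_bit() -> bit N
>      T2: find_next_zero_bit() -> bit N     (bitmap not yet updated)
>      T1: test_and_set_bit(N)  -> 0, proceeds with sba_idx = N
>      T2: test_and_set_bit(N)  -> 1, fails with -EINVAL
> 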
> Consider the call chain:
> 
> smc_listen_work
> 
> smc_listen_find_device()
>      smc_find_ism_v2_device_serv()
>      if (ini->ism_dev[0])
>          return
>      smc_find_ism_v1_device_serv()
>      if (ini->ism_dev[0])
>          return
>      smc_find_rdma_v2_device_serv()
> 
> smc_find_ism_v2_device_serv:
> 
>      /* separate - outside the smcd_dev_list.lock */
>      smcd_version = ini->smcd_version;
>      for (i = 0; i < matches; i++) {
>          ini->smcd_version = SMC_V2;
>          ini->is_smcd = true;
>          ini->ism_selected = i;
>          rc = smc_listen_ism_init(new_smc, ini);
>          if (rc) {
>              smc_find_ism_store_rc(rc, ini);
>              /* try next active ISM device */
>              continue;
>          }
>          return; /* matching and usable V2 ISM device found */
>      }
>      /* no V2 ISM device could be initialized */
>      ini->smcd_version = smcd_version;    /* restore original value */
>      ini->negotiated_eid[0] = 0;
> 
> not_found:
>      ini->smcd_version &= ~SMC_V2;
>      ini->ism_dev[0] = NULL;
>      ini->is_smcd = false;
> 
> 
> smc_listen_ism_init()
>     smc_buf_create()
>         __smc_buf_create()
>             smcd_new_buf_create(is_rmb = true)
>                 smc_ism_register_dmb()
>                     return -EINVAL;
> 
> 
> Therefore, a failure of ism_register_dmb() makes smc_find_ism_v2_device_serv() fail;
> similarly, smc_find_ism_v1_device_serv() may fail too, and SMC-R is finally selected.
> If my guess is correct, the SMC-D connections should include both v1 and v2 versions if
> v1 & v2 are both supplied (you could see it via $(smcr -dd stats)).
> 
> 
> However, this does not explain the connection leak problem.
> Could you help me dump the status of those connections? I would like to
> know which state they stayed in at the end (crash utils or $(smcr -a)).
> 
> 
> Thanks.
> D. Wythe
> 
> 
> On 11/10/22 1:31 AM, Wenjia Zhang wrote:
>>
>>
>> On 09.11.22 10:10, D. Wythe wrote:
>>>
>>> Hi Wenjia and Jan,
>>>
>>> I'm not sure whether my guess is right, so I need some help from you. I guess the smcd_ops register_dmb()
>>> is not thread-safe: after I removed the lock, different connections might get the same sba_idx, which will cause
>>> a connection to be lost in the map (smcd->conn). If so, the CDC message carrying close/abort information cannot be
>>> delivered to the correct connection, and that connection then remains in the link group abnormally.
>>>
>>> /* Set a connection using this DMBE. */
>>> void smc_ism_set_conn(struct smc_connection *conn)
>>> {
>>>      unsigned long flags;
>>>
>>>      spin_lock_irqsave(&conn->lgr->smcd->lock, flags);
>>>      conn->lgr->smcd->conn[conn->rmb_desc->sba_idx] = conn;
>>>      spin_unlock_irqrestore(&conn->lgr->smcd->lock, flags);
>>> }
>>>
>>>
>>> struct smcd_ops {
>>>
>>>      int (*register_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb);
>>> }
>>>
>>>
>>
>> Hi D. Wythe,
>>
>> We are very glad to help. That code does look questionable. However, I don't really think it is what triggers the problem. I did some tracing and saw that something was already going wrong during the CLC handshake, where a connection is decided for SMC-R instead of SMC-D. This is one piece of the trace snapshot:
>>
>> <...>-540539 [000] 306429.068196::|  smc_connect() {
>> <...>-540539 [000] 306429.068198::|    smc_copy_sock_settings_to_clc();
>> <...>-540539 [000] 306429.120310::|    smc_ism_is_v2_capable();
>> <...>-540539 [000] 306429.120316::|    .LASANPC6743();
>> <...>-540539 [000] 306429.120319::|    smc_find_proposal_devices() {
>> <...>-540539 [000] 306429.120319::|      smc_pnet_find_ism_resource() {
>> <...>-540539 [000] 306429.120331::|        smc_pnet_match();
>> <...>-540539 [000] 306429.120332::|      }
>> <...>-540539 [000] 306429.120333::|      smc_ism_get_chid();
>> <...>-540539 [000] 306429.120334::|      smc_pnet_find_roce_resource() {
>> <...>-540539 [000] 306429.120344::|        smc_pnet_match();
>> <...>-540539 [000] 306429.120346::|        smc_ib_port_active();
>> <...>-540539 [000] 306429.120347::|        smc_pnet_determine_gid() {
>> <...>-540539 [000] 306429.120347::|          smc_ib_determine_gid() {
>> <...>-540539 [000] 306429.120350::|            smc_ib_determine_gid_rcu();
>> <...>-540539 [000] 306429.120351::|          }
>> <...>-540539 [000] 306429.120352::|        }
>> <...>-540539 [000] 306429.120352::|      }
>> <...>-540539 [000] 306429.120353::|      smc_ism_is_v2_capable();
>> <...>-540539 [000] 306429.120355::|      smc_clc_ueid_count();
>> <...>-540539 [000] 306429.120357::|      smc_pnet_find_roce_resource() {
>> <...>-540539 [000] 306429.120367::|        smc_pnet_match();
>> <...>-540539 [000] 306429.120368::|        smc_ib_port_active();
>> <...>-540539 [000] 306429.120369::|        smc_pnet_determine_gid() {
>> <...>-540539 [000] 306429.120370::|          smc_ib_determine_gid() {
>> <...>-540539 [000] 306429.120372::|            smc_ib_determine_gid_rcu();
>> <...>-540539 [000] 306429.120376::|            smc_ib_determine_gid_rcu();
>> <...>-540539 [000] 306429.120379::|            smc_ib_determine_gid_rcu();
>> <...>-540539 [000] 306429.120382::|            smc_ib_determine_gid_rcu();
>> <...>-540539 [000] 306429.120650::|          }
>> <...>-540539 [000] 306429.120651::|        }
>> <...>-540539 [000] 306429.120652::|        smc_pnet_match();
>> <...>-540539 [000] 306429.120653::|        smc_ib_port_active();
>> <...>-540539 [000] 306429.120654::|        smc_pnet_determine_gid() {
>> <...>-540539 [000] 306429.120654::|          smc_ib_determine_gid() {
>> <...>-540539 [000] 306429.120657::|            smc_ib_determine_gid_rcu();
>> <...>-540539 [000] 306429.120660::|            smc_ib_determine_gid_rcu();
>> <...>-540539 [000] 306429.120829::|          }
>> <...>-540539 [000] 306429.120829::|        }
>> <...>-540539 [000] 306429.120830::|      }
>> <...>-540539 [000] 306429.120831::|    }
>> <...>-540539 [000] 306429.120836::|    .LASANPC6660() {
>> <...>-540539 [000] 306429.120843::|      .LASANPC6654() {
>> <...>-540539 [000] 306429.120847::|        smc_clc_prfx_set4_rcu.isra.0();
>> <...>-540539 [000] 306429.120849::|      }
>> <...>-540539 [000] 306429.120850::|      smc_ism_get_chid();
>> <...>-540539 [000] 306429.120851::|      smc_ism_get_system_eid();
>> <...>-540539 [000] 306429.120889::|    }
>> <...>-540539 [000] 306429.120890::|    .LASANPC6658() {
>> <...>-540539 [000] 306429.124906::|      smc_clc_msg_hdr_valid();
>> <...>-540539 [000] 306429.124908::|    }
>> <...>-540539 [000] 306429.124908::|    .LASANPC6727() {
>> <...>-540539 [000] 306429.124909::|      smc_connect_rdma_v2_prepare();
>> <...>-540539 [000] 306429.124912::|      smc_conn_create() {
>>


end of thread, other threads:[~2022-11-10  9:40 UTC | newest]

Thread overview: 20+ messages
2022-10-23 12:43 [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections D.Wythe
2022-10-23 12:43 ` [PATCH net-next v4 01/10] net/smc: remove locks smc_client_lgr_pending and smc_server_lgr_pending D.Wythe
2022-10-23 12:43 ` [PATCH net-next v4 02/10] net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending D.Wythe
2022-10-23 12:43 ` [PATCH net-next v4 03/10] net/smc: allow confirm/delete rkey response deliver multiplex D.Wythe
2022-10-23 12:43 ` [PATCH net-next v4 04/10] net/smc: make SMC_LLC_FLOW_RKEY run concurrently D.Wythe
2022-10-23 12:43 ` [PATCH net-next v4 05/10] net/smc: llc_conf_mutex refactor, replace it with rw_semaphore D.Wythe
2022-10-23 12:43 ` [PATCH net-next v4 06/10] net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & smcr_buf_unuse() D.Wythe
2022-10-23 12:43 ` [PATCH net-next v4 07/10] net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs() D.Wythe
2022-10-23 12:44 ` [PATCH net-next v4 08/10] net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore D.Wythe
2022-10-23 12:44 ` [PATCH net-next v4 09/10] net/smc: Fix potential panic dues to unprotected smc_llc_srv_add_link() D.Wythe
2022-10-23 12:44 ` [PATCH net-next v4 10/10] net/smc: fix application data exception D.Wythe
2022-10-24 13:11 ` [PATCH net-next v4 00/10] optimize the parallelism of SMC-R connections Jan Karcher
2022-10-26  7:20   ` D. Wythe
2022-11-01  7:22   ` D. Wythe
2022-11-02 13:55     ` Wenjia Zhang
2022-11-07 11:05       ` D. Wythe
2022-11-09  9:10         ` D. Wythe
2022-11-09 17:31           ` Wenjia Zhang
2022-11-10  7:54             ` D. Wythe
2022-11-10  9:39               ` D. Wythe
