* [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
@ 2022-02-25 8:11 D. Wythe
2022-02-25 16:45 ` D. Wythe
0 siblings, 1 reply; 7+ messages in thread
From: D. Wythe @ 2022-02-25 8:11 UTC (permalink / raw)
To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
From: "D. Wythe" <alibuda@linux.alibaba.com>
Remove connections from link group is not synchronous with handling
SMC_LLC_DELETE_RKEY, which means that even the number of connections is
less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can
register rtoken successfully later, in other words, the rtoken entry may
have not been released. This will cause an unexpected
SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have
to fallback to TCP.
Therefore, we need to judge according to the number of idle rtoken
entry.
Fixes: cd6851f30386 ("smc: remote memory buffers (RMBs)")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
net/smc/smc_core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 29525d0..24ef0af 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1864,7 +1864,8 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
(ini->smcd_version == SMC_V2 ||
lgr->vlan_id == ini->vlan_id) &&
(role == SMC_CLNT || ini->is_smcd ||
- lgr->conns_num < SMC_RMBS_PER_LGR_MAX)) {
+ lgr->conns_num < SMC_RMBS_PER_LGR_MAX -
+ bitmap_weight(lgr->rtokens_used_mask, SMC_RMBS_PER_LGR_MAX))) {
/* link group found */
ini->first_contact_local = 0;
conn->lgr = lgr;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
2022-02-25 8:11 [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error D. Wythe
@ 2022-02-25 16:45 ` D. Wythe
0 siblings, 0 replies; 7+ messages in thread
From: D. Wythe @ 2022-02-25 16:45 UTC (permalink / raw)
To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
在 2022/2/25 下午4:11, D. Wythe 写道:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> Remove connections from link group is not synchronous with handling
> SMC_LLC_DELETE_RKEY, which means that even the number of connections is
> less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can
> register rtoken successfully later, in other words, the rtoken entry may
> have not been released. This will cause an unexpected
> SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have
> to fallback to TCP.
>
> Therefore, we need to judge according to the number of idle rtoken
> entry.
>
> Fixes: cd6851f30386 ("smc: remote memory buffers (RMBs)")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
> net/smc/smc_core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index 29525d0..24ef0af 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -1864,7 +1864,8 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
> (ini->smcd_version == SMC_V2 ||
> lgr->vlan_id == ini->vlan_id) &&
> (role == SMC_CLNT || ini->is_smcd ||
> - lgr->conns_num < SMC_RMBS_PER_LGR_MAX)) {
> + lgr->conns_num < SMC_RMBS_PER_LGR_MAX -
> + bitmap_weight(lgr->rtokens_used_mask, SMC_RMBS_PER_LGR_MAX))) {
> /* link group found */
> ini->first_contact_local = 0;
> conn->lgr = lgr;
I did a horrible math here, i'll send another fix later.
Best wishes.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
2022-03-01 13:17 D. Wythe
2022-03-02 7:28 ` Wen Gu
@ 2022-03-02 11:44 ` D. Wythe
1 sibling, 0 replies; 7+ messages in thread
From: D. Wythe @ 2022-03-02 11:44 UTC (permalink / raw)
To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
在 2022/3/1 下午9:17, D. Wythe 写道:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> Remove connections from link group is not synchronous with handling
> SMC_LLC_DELETE_RKEY, which means that even the number of connections is
> less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can
> register rtoken successfully later, in other words, the rtoken entry may
> have not been released. This will cause an unexpected
> SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have
> to fallback to TCP.
>
> We found that the main reason for the problem dues to following execution
> sequence:
>
> Server Conn A: Server Conn B: Client Conn B:
>
> smc_lgr_unregister_conn
> smc_lgr_register_conn
> smc_clc_send_accept ->
> smc_rtoken_add
> smcr_buf_unuse
> -> Client Conn A:
> smc_rtoken_delete
>
> smc_lgr_unregister_conn() makes current link available to assigned to new
> incoming connection, while smcr_buf_unuse() has not executed yet, which
> means that smc_rtoken_add may fail because of insufficient rtoken_entry,
> reversing their execution order will avoid this problem.
>
> Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
> net/smc/smc_core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index 2f321d2..c9c3a68 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -1161,8 +1161,8 @@ void smc_conn_free(struct smc_connection *conn)
> cancel_work_sync(&conn->abort_work);
> }
> if (!list_empty(&lgr->list)) {
> - smc_lgr_unregister_conn(conn);
> smc_buf_unuse(conn, lgr); /* allow buffer reuse */
> + smc_lgr_unregister_conn(conn);
> }
>
> if (!lgr->conns_num)
I have two patch for this issue, and i missed one, I'll post it in v2
series.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
2022-03-02 7:51 ` D. Wythe
@ 2022-03-02 9:17 ` Wen Gu
0 siblings, 0 replies; 7+ messages in thread
From: Wen Gu @ 2022-03-02 9:17 UTC (permalink / raw)
To: D. Wythe; +Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma
On 2022/3/2 3:51 pm, D. Wythe wrote:
> We should always be willing to improve the success rate of the SMC
> connection, creating a new group is not a side effect of this patch, it
> actually dues to the state bewteen connections that can not achieve
> clock synchronization. In fact, it can happen in any times.
>
> Thanks.
OK, I understand. Thanks.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
2022-03-02 7:28 ` Wen Gu
@ 2022-03-02 7:51 ` D. Wythe
2022-03-02 9:17 ` Wen Gu
0 siblings, 1 reply; 7+ messages in thread
From: D. Wythe @ 2022-03-02 7:51 UTC (permalink / raw)
To: Wen Gu; +Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma
On Wed, Mar 02, 2022 at 03:28:32PM +0800, Wen Gu wrote:
>
>
> 在 2022/3/1 下午9:17, D. Wythe 写道:
> >From: "D. Wythe" <alibuda@linux.alibaba.com>
> >
> >Remove connections from link group is not synchronous with handling
> >SMC_LLC_DELETE_RKEY, which means that even the number of connections is
> >less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can
> >register rtoken successfully later, in other words, the rtoken entry may
> >have not been released. This will cause an unexpected
> >SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have
> >to fallback to TCP.
> >
>
>
> IMHO, if there are SMC_RMBS_PER_LGR_MAX connections in the link group now,
> one of them is being removed and here comes a new connection at this moment,
> then:
>
> (1) without this patch, the new connection will be added into the old link group
> but fallback if the removing connection has not finished unregistering its rmb.
>
> (2) with this patch, a new link group will be created and the new connection
> will be added into the new link group.
>
> I am wondering if (1) should be considered as a issue, or just a bydesign?
> If it is a issue, I think this patch works, Thanks!
We should always be willing to improve the success rate of the SMC
connection, creating a new group is not a side effect of this patch, it
actually dues to the state bewteen connections that can not achieve
clock synchronization. In fact, it can happen in any times.
Thanks.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
2022-03-01 13:17 D. Wythe
@ 2022-03-02 7:28 ` Wen Gu
2022-03-02 7:51 ` D. Wythe
2022-03-02 11:44 ` D. Wythe
1 sibling, 1 reply; 7+ messages in thread
From: Wen Gu @ 2022-03-02 7:28 UTC (permalink / raw)
To: D. Wythe, kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
在 2022/3/1 下午9:17, D. Wythe 写道:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> Remove connections from link group is not synchronous with handling
> SMC_LLC_DELETE_RKEY, which means that even the number of connections is
> less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can
> register rtoken successfully later, in other words, the rtoken entry may
> have not been released. This will cause an unexpected
> SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have
> to fallback to TCP.
>
IMHO, if there are SMC_RMBS_PER_LGR_MAX connections in the link group now,
one of them is being removed and here comes a new connection at this moment,
then:
(1) without this patch, the new connection will be added into the old link group
but fallback if the removing connection has not finished unregistering its rmb.
(2) with this patch, a new link group will be created and the new connection
will be added into the new link group.
I am wondering if (1) should be considered as a issue, or just a bydesign?
If it is a issue, I think this patch works, Thanks!
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Best Regards,
Wen Gu
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
@ 2022-03-01 13:17 D. Wythe
2022-03-02 7:28 ` Wen Gu
2022-03-02 11:44 ` D. Wythe
0 siblings, 2 replies; 7+ messages in thread
From: D. Wythe @ 2022-03-01 13:17 UTC (permalink / raw)
To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
From: "D. Wythe" <alibuda@linux.alibaba.com>
Remove connections from link group is not synchronous with handling
SMC_LLC_DELETE_RKEY, which means that even the number of connections is
less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can
register rtoken successfully later, in other words, the rtoken entry may
have not been released. This will cause an unexpected
SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have
to fallback to TCP.
We found that the main reason for the problem dues to following execution
sequence:
Server Conn A: Server Conn B: Client Conn B:
smc_lgr_unregister_conn
smc_lgr_register_conn
smc_clc_send_accept ->
smc_rtoken_add
smcr_buf_unuse
-> Client Conn A:
smc_rtoken_delete
smc_lgr_unregister_conn() makes current link available to assigned to new
incoming connection, while smcr_buf_unuse() has not executed yet, which
means that smc_rtoken_add may fail because of insufficient rtoken_entry,
reversing their execution order will avoid this problem.
Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
net/smc/smc_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 2f321d2..c9c3a68 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1161,8 +1161,8 @@ void smc_conn_free(struct smc_connection *conn)
cancel_work_sync(&conn->abort_work);
}
if (!list_empty(&lgr->list)) {
- smc_lgr_unregister_conn(conn);
smc_buf_unuse(conn, lgr); /* allow buffer reuse */
+ smc_lgr_unregister_conn(conn);
}
if (!lgr->conns_num)
--
1.8.3.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-03-02 11:44 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-25 8:11 [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error D. Wythe
2022-02-25 16:45 ` D. Wythe
2022-03-01 13:17 D. Wythe
2022-03-02 7:28 ` Wen Gu
2022-03-02 7:51 ` D. Wythe
2022-03-02 9:17 ` Wen Gu
2022-03-02 11:44 ` D. Wythe
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.