All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
@ 2022-02-25  8:11 D. Wythe
  2022-02-25 16:45 ` D. Wythe
  0 siblings, 1 reply; 7+ messages in thread
From: D. Wythe @ 2022-02-25  8:11 UTC (permalink / raw)
  To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D. Wythe" <alibuda@linux.alibaba.com>

Remove connections from link group is not synchronous with handling
SMC_LLC_DELETE_RKEY, which means that even the number of connections is
less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can
register rtoken successfully later, in other words, the rtoken entry may
have not been released. This will cause an unexpected
SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have
to fallback to TCP.

Therefore, we need to judge according to the number of idle rtoken
entry.

Fixes: cd6851f30386 ("smc: remote memory buffers (RMBs)")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/smc_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 29525d0..24ef0af 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1864,7 +1864,8 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
 		    (ini->smcd_version == SMC_V2 ||
 		     lgr->vlan_id == ini->vlan_id) &&
 		    (role == SMC_CLNT || ini->is_smcd ||
-		     lgr->conns_num < SMC_RMBS_PER_LGR_MAX)) {
+		     lgr->conns_num < SMC_RMBS_PER_LGR_MAX -
+		     bitmap_weight(lgr->rtokens_used_mask, SMC_RMBS_PER_LGR_MAX))) {
 			/* link group found */
 			ini->first_contact_local = 0;
 			conn->lgr = lgr;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
  2022-02-25  8:11 [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error D. Wythe
@ 2022-02-25 16:45 ` D. Wythe
  0 siblings, 0 replies; 7+ messages in thread
From: D. Wythe @ 2022-02-25 16:45 UTC (permalink / raw)
  To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma



在 2022/2/25 下午4:11, D. Wythe 写道:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> Remove connections from link group is not synchronous with handling
> SMC_LLC_DELETE_RKEY, which means that even the number of connections is
> less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can
> register rtoken successfully later, in other words, the rtoken entry may
> have not been released. This will cause an unexpected
> SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have
> to fallback to TCP.
> 
> Therefore, we need to judge according to the number of idle rtoken
> entry.
> 
> Fixes: cd6851f30386 ("smc: remote memory buffers (RMBs)")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
>   net/smc/smc_core.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index 29525d0..24ef0af 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -1864,7 +1864,8 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
>   		    (ini->smcd_version == SMC_V2 ||
>   		     lgr->vlan_id == ini->vlan_id) &&
>   		    (role == SMC_CLNT || ini->is_smcd ||
> -		     lgr->conns_num < SMC_RMBS_PER_LGR_MAX)) {
> +		     lgr->conns_num < SMC_RMBS_PER_LGR_MAX -
> +		     bitmap_weight(lgr->rtokens_used_mask, SMC_RMBS_PER_LGR_MAX))) {
>   			/* link group found */
>   			ini->first_contact_local = 0;
>   			conn->lgr = lgr;


I did a horrible math here, i'll send another fix later.

Best wishes.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
  2022-03-01 13:17 D. Wythe
  2022-03-02  7:28 ` Wen Gu
@ 2022-03-02 11:44 ` D. Wythe
  1 sibling, 0 replies; 7+ messages in thread
From: D. Wythe @ 2022-03-02 11:44 UTC (permalink / raw)
  To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma



在 2022/3/1 下午9:17, D. Wythe 写道:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> Remove connections from link group is not synchronous with handling
> SMC_LLC_DELETE_RKEY, which means that even the number of connections is
> less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can
> register rtoken successfully later, in other words, the rtoken entry may
> have not been released. This will cause an unexpected
> SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have
> to fallback to TCP.
> 
> We found that the main reason for the problem dues to following execution
> sequence:
> 
> Server Conn A:           Server Conn B:			Client Conn B:
> 
> smc_lgr_unregister_conn
>                          smc_lgr_register_conn
>                          smc_clc_send_accept     ->
>                                                          smc_rtoken_add
> smcr_buf_unuse
> 		->		Client Conn A:
> 				smc_rtoken_delete
> 
> smc_lgr_unregister_conn() makes current link available to assigned to new
> incoming connection, while smcr_buf_unuse() has not executed yet, which
> means that smc_rtoken_add may fail because of insufficient rtoken_entry,
> reversing their execution order will avoid this problem.
> 
> Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
>   net/smc/smc_core.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index 2f321d2..c9c3a68 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -1161,8 +1161,8 @@ void smc_conn_free(struct smc_connection *conn)
>   			cancel_work_sync(&conn->abort_work);
>   	}
>   	if (!list_empty(&lgr->list)) {
> -		smc_lgr_unregister_conn(conn);
>   		smc_buf_unuse(conn, lgr); /* allow buffer reuse */
> +		smc_lgr_unregister_conn(conn);
>   	}
>   
>   	if (!lgr->conns_num)

I have two patch for this issue, and i missed one, I'll post it in v2 
series.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
  2022-03-02  7:51   ` D. Wythe
@ 2022-03-02  9:17     ` Wen Gu
  0 siblings, 0 replies; 7+ messages in thread
From: Wen Gu @ 2022-03-02  9:17 UTC (permalink / raw)
  To: D. Wythe; +Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma



On 2022/3/2 3:51 pm, D. Wythe wrote:

> We should always be willing to improve the success rate of the SMC
> connection, creating a new group is not a side effect of this patch, it
> actually dues to the state bewteen connections that can not achieve
> clock synchronization. In fact, it can happen in any times.
> 
> Thanks.

OK, I understand. Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
  2022-03-02  7:28 ` Wen Gu
@ 2022-03-02  7:51   ` D. Wythe
  2022-03-02  9:17     ` Wen Gu
  0 siblings, 1 reply; 7+ messages in thread
From: D. Wythe @ 2022-03-02  7:51 UTC (permalink / raw)
  To: Wen Gu; +Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma

On Wed, Mar 02, 2022 at 03:28:32PM +0800, Wen Gu wrote:
> 
> 
> 在 2022/3/1 下午9:17, D. Wythe 写道:
> >From: "D. Wythe" <alibuda@linux.alibaba.com>
> >
> >Remove connections from link group is not synchronous with handling
> >SMC_LLC_DELETE_RKEY, which means that even the number of connections is
> >less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can
> >register rtoken successfully later, in other words, the rtoken entry may
> >have not been released. This will cause an unexpected
> >SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have
> >to fallback to TCP.
> >
> 
> 
> IMHO, if there are SMC_RMBS_PER_LGR_MAX connections in the link group now,
> one of them is being removed and here comes a new connection at this moment,
> then:
> 
> (1) without this patch, the new connection will be added into the old link group
>     but fallback if the removing connection has not finished unregistering its rmb.
> 
> (2) with this patch, a new link group will be created and the new connection
>     will be added into the new link group.
> 
> I am wondering if (1) should be considered as a issue, or just a bydesign?
> If it is a issue, I think this patch works, Thanks!


We should always be willing to improve the success rate of the SMC 
connection, creating a new group is not a side effect of this patch, it 
actually dues to the state bewteen connections that can not achieve 
clock synchronization. In fact, it can happen in any times.

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
  2022-03-01 13:17 D. Wythe
@ 2022-03-02  7:28 ` Wen Gu
  2022-03-02  7:51   ` D. Wythe
  2022-03-02 11:44 ` D. Wythe
  1 sibling, 1 reply; 7+ messages in thread
From: Wen Gu @ 2022-03-02  7:28 UTC (permalink / raw)
  To: D. Wythe, kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma



在 2022/3/1 下午9:17, D. Wythe 写道:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> Remove connections from link group is not synchronous with handling
> SMC_LLC_DELETE_RKEY, which means that even the number of connections is
> less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can
> register rtoken successfully later, in other words, the rtoken entry may
> have not been released. This will cause an unexpected
> SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have
> to fallback to TCP.
> 


IMHO, if there are SMC_RMBS_PER_LGR_MAX connections in the link group now,
one of them is being removed and here comes a new connection at this moment,
then:

(1) without this patch, the new connection will be added into the old link group
     but fallback if the removing connection has not finished unregistering its rmb.

(2) with this patch, a new link group will be created and the new connection
     will be added into the new link group.

I am wondering if (1) should be considered as a issue, or just a bydesign?
If it is a issue, I think this patch works, Thanks!

Reviewed-by: Wen Gu <guwen@linux.alibaba.com>

Best Regards,
Wen Gu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
@ 2022-03-01 13:17 D. Wythe
  2022-03-02  7:28 ` Wen Gu
  2022-03-02 11:44 ` D. Wythe
  0 siblings, 2 replies; 7+ messages in thread
From: D. Wythe @ 2022-03-01 13:17 UTC (permalink / raw)
  To: kgraul; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D. Wythe" <alibuda@linux.alibaba.com>

Remove connections from link group is not synchronous with handling
SMC_LLC_DELETE_RKEY, which means that even the number of connections is
less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can
register rtoken successfully later, in other words, the rtoken entry may
have not been released. This will cause an unexpected
SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have
to fallback to TCP.

We found that the main reason for the problem dues to following execution
sequence:

Server Conn A:           Server Conn B:			Client Conn B:

smc_lgr_unregister_conn
                        smc_lgr_register_conn
                        smc_clc_send_accept     ->
                                                        smc_rtoken_add
smcr_buf_unuse
		->		Client Conn A:
				smc_rtoken_delete

smc_lgr_unregister_conn() makes current link available to assigned to new
incoming connection, while smcr_buf_unuse() has not executed yet, which
means that smc_rtoken_add may fail because of insufficient rtoken_entry,
reversing their execution order will avoid this problem.

Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/smc_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 2f321d2..c9c3a68 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1161,8 +1161,8 @@ void smc_conn_free(struct smc_connection *conn)
 			cancel_work_sync(&conn->abort_work);
 	}
 	if (!list_empty(&lgr->list)) {
-		smc_lgr_unregister_conn(conn);
 		smc_buf_unuse(conn, lgr); /* allow buffer reuse */
+		smc_lgr_unregister_conn(conn);
 	}
 
 	if (!lgr->conns_num)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-03-02 11:44 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-25  8:11 [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error D. Wythe
2022-02-25 16:45 ` D. Wythe
2022-03-01 13:17 D. Wythe
2022-03-02  7:28 ` Wen Gu
2022-03-02  7:51   ` D. Wythe
2022-03-02  9:17     ` Wen Gu
2022-03-02 11:44 ` D. Wythe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.