linux-rdma.vger.kernel.org archive mirror
* [PATCH net 0/5] net/smc: bugfixes for smc-r
@ 2023-10-11  7:33 D. Wythe
  2023-10-11  7:33 ` [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT D. Wythe
                   ` (5 more replies)
  0 siblings, 6 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-11  7:33 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch set contains the following bug fixes:

1. hung socket state
2. sock leak
3. potential panic

We have been testing these patches for some time;
if you have any questions, please let us know.

Thanks,
D. Wythe

D. Wythe (5):
  net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
  net/smc: fix incorrect barrier usage
  net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc
  net/smc: protect connection state transitions in listen work
  net/smc: put sk reference if close work was canceled

 net/smc/af_smc.c    |  9 +++++++--
 net/smc/smc.h       |  5 +++++
 net/smc/smc_cdc.c   | 11 +++++------
 net/smc/smc_close.c |  5 +++--
 net/smc/smc_core.c  | 21 +++++++++++++--------
 5 files changed, 33 insertions(+), 18 deletions(-)

-- 
1.8.3.1



* [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
  2023-10-11  7:33 [PATCH net 0/5] net/smc: bugfixes for smc-r D. Wythe
@ 2023-10-11  7:33 ` D. Wythe
  2023-10-11 14:00   ` Dust Li
                     ` (2 more replies)
  2023-10-11  7:33 ` [PATCH net 2/5] net/smc: fix incorrect barrier usage D. Wythe
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-11  7:33 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

Consider the following scenario:

__smc_release				smc_cdc_rx_handler_rwwi
smc_close_active()
sock_set_flag				sock_set_flag
__set_bit(DEAD)				__set_bit(DONE)

Due to __set_bit() not being atomic, the DEAD or DONE flag might be
lost. If the DEAD flag is lost, the state SMC_CLOSED will never be
reached in smc_close_passive_work:

if (sock_flag(sk, SOCK_DEAD) &&
	smc_close_sent_any_close(conn)) {
	sk->sk_state = SMC_CLOSED;
} else {
	/* just shutdown, but not yet closed locally */
	sk->sk_state = SMC_APPFINCLOSEWAIT;
}

Replacing the non-atomic __set_bit() behind sock_set_flag() with the
atomic set_bit() fixes this problem.
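
To illustrate the lost update: sk->sk_flags is a plain unsigned long
bitmap and __set_bit() is a plain read-modify-write, roughly like the
sketch below (simplified, not the actual per-arch implementation):

	/* non-atomic flavor: load, OR, store */
	static inline void sketch_set_bit(long nr, unsigned long *addr)
	{
		unsigned long old = *addr;	/* CPU A and CPU B may both
						 * load the same old value */
		*addr = old | BIT(nr);		/* the later store overwrites
						 * the bit set by the other CPU */
	}

set_bit() performs the same read-modify-write as a single atomic
operation (e.g. a lock-prefixed instruction on x86), so neither the
DEAD nor the DONE bit can be lost.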

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/af_smc.c    | 4 ++--
 net/smc/smc.h       | 5 +++++
 net/smc/smc_cdc.c   | 2 +-
 net/smc/smc_close.c | 2 +-
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index bacdd97..5ad2a9f 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -275,7 +275,7 @@ static int __smc_release(struct smc_sock *smc)
 
 	if (!smc->use_fallback) {
 		rc = smc_close_active(smc);
-		sock_set_flag(sk, SOCK_DEAD);
+		smc_sock_set_flag(sk, SOCK_DEAD);
 		sk->sk_shutdown |= SHUTDOWN_MASK;
 	} else {
 		if (sk->sk_state != SMC_CLOSED) {
@@ -1742,7 +1742,7 @@ static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc)
 		if (new_clcsock)
 			sock_release(new_clcsock);
 		new_sk->sk_state = SMC_CLOSED;
-		sock_set_flag(new_sk, SOCK_DEAD);
+		smc_sock_set_flag(new_sk, SOCK_DEAD);
 		sock_put(new_sk); /* final */
 		*new_smc = NULL;
 		goto out;
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 24745fd..e377980 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -377,4 +377,9 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
 int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
 int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
 
+static inline void smc_sock_set_flag(struct sock *sk, enum sock_flags flag)
+{
+	set_bit(flag, &sk->sk_flags);
+}
+
 #endif	/* __SMC_H */
diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index 89105e9..01bdb79 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -385,7 +385,7 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc,
 		smc->sk.sk_shutdown |= RCV_SHUTDOWN;
 		if (smc->clcsock && smc->clcsock->sk)
 			smc->clcsock->sk->sk_shutdown |= RCV_SHUTDOWN;
-		sock_set_flag(&smc->sk, SOCK_DONE);
+		smc_sock_set_flag(&smc->sk, SOCK_DONE);
 		sock_hold(&smc->sk); /* sock_put in close_work */
 		if (!queue_work(smc_close_wq, &conn->close_work))
 			sock_put(&smc->sk);
diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
index dbdf03e..449ef45 100644
--- a/net/smc/smc_close.c
+++ b/net/smc/smc_close.c
@@ -173,7 +173,7 @@ void smc_close_active_abort(struct smc_sock *smc)
 		break;
 	}
 
-	sock_set_flag(sk, SOCK_DEAD);
+	smc_sock_set_flag(sk, SOCK_DEAD);
 	sk->sk_state_change(sk);
 
 	if (release_clcsock) {
-- 
1.8.3.1



* [PATCH net 2/5] net/smc: fix incorrect barrier usage
  2023-10-11  7:33 [PATCH net 0/5] net/smc: bugfixes for smc-r D. Wythe
  2023-10-11  7:33 ` [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT D. Wythe
@ 2023-10-11  7:33 ` D. Wythe
  2023-10-11  8:44   ` Heiko Carstens
  2023-10-11  7:33 ` [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc D. Wythe
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 38+ messages in thread
From: D. Wythe @ 2023-10-11  7:33 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe, Heiko Carstens

From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch adds an explicit CPU barrier to ensure memory
consistency, rather than relying on a compiler barrier.

Besides, atomicity between READ_ONCE() and cmpxchg() cannot
be guaranteed, so we need to use atomic ops. The simple way
is to replace READ_ONCE() with xchg().
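
The intended ordering is that the buffer must be scrubbed before it is
marked free again. A minimal sketch of the pairing, assuming (as
elsewhere in smc_core.c) that the other side claims a free buffer with
a cmpxchg() on ->used:

	/* release side */
	memset(buf_desc->cpu_addr, 0, buf_desc->len);
	/* make sure the memset is not reordered after the clear */
	smp_mb__before_atomic();
	xchg(&buf_desc->used, 0);	/* atomically publish the buffer as free */

	/* claim side, conceptually */
	if (cmpxchg(&buf_desc->used, 0, 1) == 0) {
		/* the buffer is ours and the zeroed contents are visible */
	}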

Fixes: 475f9ff63ee8 ("net/smc: fix application data exception")
Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Link: https://lore.kernel.org/netdev/1b7c95be-d3d9-53c3-3152-cd835314d37c@linux.ibm.com/T/
---
 net/smc/smc_core.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index d520ee6..cc7d72e 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1133,9 +1133,10 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
 
 		smc_buf_free(lgr, is_rmb, buf_desc);
 	} else {
-		/* memzero_explicit provides potential memory barrier semantics */
-		memzero_explicit(buf_desc->cpu_addr, buf_desc->len);
-		WRITE_ONCE(buf_desc->used, 0);
+		memset(buf_desc->cpu_addr, 0, buf_desc->len);
+		/* make sure buf_desc->used not be reordered ahead */
+		smp_mb__before_atomic();
+		xchg(&buf_desc->used, 0);
 	}
 }
 
@@ -1146,17 +1147,21 @@ static void smc_buf_unuse(struct smc_connection *conn,
 		if (!lgr->is_smcd && conn->sndbuf_desc->is_vm) {
 			smcr_buf_unuse(conn->sndbuf_desc, false, lgr);
 		} else {
-			memzero_explicit(conn->sndbuf_desc->cpu_addr, conn->sndbuf_desc->len);
-			WRITE_ONCE(conn->sndbuf_desc->used, 0);
+			memset(conn->sndbuf_desc->cpu_addr, 0, conn->sndbuf_desc->len);
+			/* make sure buf_desc->used not be reordered ahead */
+			smp_mb__before_atomic();
+			xchg(&conn->sndbuf_desc->used, 0);
 		}
 	}
 	if (conn->rmb_desc) {
 		if (!lgr->is_smcd) {
 			smcr_buf_unuse(conn->rmb_desc, true, lgr);
 		} else {
-			memzero_explicit(conn->rmb_desc->cpu_addr,
-					 conn->rmb_desc->len + sizeof(struct smcd_cdc_msg));
-			WRITE_ONCE(conn->rmb_desc->used, 0);
+			memset(conn->rmb_desc->cpu_addr, 0,
+			       conn->rmb_desc->len + sizeof(struct smcd_cdc_msg));
+			/* make sure buf_desc->used not be reordered ahead */
+			smp_mb__before_atomic();
+			xchg(&conn->rmb_desc->used, 0);
 		}
 	}
 }
-- 
1.8.3.1



* [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc
  2023-10-11  7:33 [PATCH net 0/5] net/smc: bugfixes for smc-r D. Wythe
  2023-10-11  7:33 ` [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT D. Wythe
  2023-10-11  7:33 ` [PATCH net 2/5] net/smc: fix incorrect barrier usage D. Wythe
@ 2023-10-11  7:33 ` D. Wythe
  2023-10-11 20:37   ` Wenjia Zhang
  2023-10-11  7:33 ` [PATCH net 4/5] net/smc: protect connection state transitions in listen work D. Wythe
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 38+ messages in thread
From: D. Wythe @ 2023-10-11  7:33 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch re-fixes the issue mentioned by commit 22a825c541d7
("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").

Blocking the message send does solve the issue, but it also prevents
the peer from receiving the final message. Besides, logically,
whether sndbuf_desc is NULL or not has no impact on the processing
of CDC message sending.

Hence, this patch allows the CDC message to be sent, but checks
sndbuf_desc with care in smc_cdc_tx_handler().
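
The reason the handler can now cope with a NULL sndbuf_desc: it reads
conn->sndbuf_desc exactly once into a local variable and only ever
dereferences that snapshot, so no window remains between the NULL
check and the use (a sketch of the pattern, not the full handler):

	struct smc_buf_desc *sndbuf_desc = conn->sndbuf_desc;	/* one read */

	if (!wc_status && sndbuf_desc)	/* check the snapshot ... */
		diff = smc_curs_diff(sndbuf_desc->len,	/* ... and use only it */
				     &cdcpend->conn->tx_curs_fin,
				     &cdcpend->cursor);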

Fixes: 22a825c541d7 ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/smc_cdc.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index 01bdb79..3c06625 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -28,13 +28,15 @@ static void smc_cdc_tx_handler(struct smc_wr_tx_pend_priv *pnd_snd,
 {
 	struct smc_cdc_tx_pend *cdcpend = (struct smc_cdc_tx_pend *)pnd_snd;
 	struct smc_connection *conn = cdcpend->conn;
+	struct smc_buf_desc *sndbuf_desc;
 	struct smc_sock *smc;
 	int diff;
 
+	sndbuf_desc = conn->sndbuf_desc;
 	smc = container_of(conn, struct smc_sock, conn);
 	bh_lock_sock(&smc->sk);
-	if (!wc_status) {
-		diff = smc_curs_diff(cdcpend->conn->sndbuf_desc->len,
+	if (!wc_status && sndbuf_desc) {
+		diff = smc_curs_diff(sndbuf_desc->len,
 				     &cdcpend->conn->tx_curs_fin,
 				     &cdcpend->cursor);
 		/* sndbuf_space is decreased in smc_sendmsg */
@@ -114,9 +116,6 @@ int smc_cdc_msg_send(struct smc_connection *conn,
 	union smc_host_cursor cfed;
 	int rc;
 
-	if (unlikely(!READ_ONCE(conn->sndbuf_desc)))
-		return -ENOBUFS;
-
 	smc_cdc_add_pending_send(conn, pend);
 
 	conn->tx_cdc_seq++;
-- 
1.8.3.1



* [PATCH net 4/5] net/smc: protect connection state transitions in listen work
  2023-10-11  7:33 [PATCH net 0/5] net/smc: bugfixes for smc-r D. Wythe
                   ` (2 preceding siblings ...)
  2023-10-11  7:33 ` [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc D. Wythe
@ 2023-10-11  7:33 ` D. Wythe
  2023-10-12 17:14   ` Wenjia Zhang
  2023-10-11  7:33 ` [PATCH net 5/5] net/smc: put sk reference if close work was canceled D. Wythe
  2023-10-12 13:43 ` [PATCH net 0/5] net/smc: bugfixes for smc-r Alexandra Winter
  5 siblings, 1 reply; 38+ messages in thread
From: D. Wythe @ 2023-10-11  7:33 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

Consider the following scenario:

smc_listen_out_connected		smc_close_passive_work
					lock_sock()
if (state == SMC_INIT)
					if (state == SMC_INIT)
						state = SMC_APPCLOSEWAIT1;
	state = SMC_ACTIVE
					release_sock()

This would cause the state machine of the connection to be corrupted.
Also, this issue can occur in smc_listen_out_err().

To solve this problem, protect the state transitions under the sock
lock to avoid the collision.
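
For context, the other writer already runs under the sock lock, as the
scenario above sketches (simplified from smc_close_passive_work()):

	lock_sock(sk);
	...
	if (sk->sk_state == SMC_INIT)
		sk->sk_state = SMC_APPCLOSEWAIT1;
	...
	release_sock(sk);

Taking the same lock in smc_listen_out_connected() and
smc_listen_out_err() makes the SMC_INIT check and the state write one
atomic step with respect to that work.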

Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/af_smc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 5ad2a9f..3bb8265 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1926,8 +1926,10 @@ static void smc_listen_out_connected(struct smc_sock *new_smc)
 {
 	struct sock *newsmcsk = &new_smc->sk;
 
+	lock_sock(newsmcsk);
 	if (newsmcsk->sk_state == SMC_INIT)
 		newsmcsk->sk_state = SMC_ACTIVE;
+	release_sock(newsmcsk);
 
 	smc_listen_out(new_smc);
 }
@@ -1939,9 +1941,12 @@ static void smc_listen_out_err(struct smc_sock *new_smc)
 	struct net *net = sock_net(newsmcsk);
 
 	this_cpu_inc(net->smc.smc_stats->srv_hshake_err_cnt);
+
+	lock_sock(newsmcsk);
 	if (newsmcsk->sk_state == SMC_INIT)
 		sock_put(&new_smc->sk); /* passive closing */
 	newsmcsk->sk_state = SMC_CLOSED;
+	release_sock(newsmcsk);
 
 	smc_listen_out(new_smc);
 }
-- 
1.8.3.1



* [PATCH net 5/5] net/smc: put sk reference if close work was canceled
  2023-10-11  7:33 [PATCH net 0/5] net/smc: bugfixes for smc-r D. Wythe
                   ` (3 preceding siblings ...)
  2023-10-11  7:33 ` [PATCH net 4/5] net/smc: protect connection state transitions in listen work D. Wythe
@ 2023-10-11  7:33 ` D. Wythe
  2023-10-11 14:54   ` Dust Li
  2023-10-12 19:04   ` Wenjia Zhang
  2023-10-12 13:43 ` [PATCH net 0/5] net/smc: bugfixes for smc-r Alexandra Winter
  5 siblings, 2 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-11  7:33 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe

From: "D. Wythe" <alibuda@linux.alibaba.com>

Note that we always hold a reference to the sock when attempting
to submit close_work. Therefore, if we successfully cancel a
pending close_work, we MUST release that reference to avoid
potential leaks.
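
The reference pairing being fixed, as a simplified sketch (the submit
side below is from smc_cdc_msg_recv_action()):

	/* submit side: take a reference for the queued work */
	sock_hold(&smc->sk);	/* sock_put in close_work */
	if (!queue_work(smc_close_wq, &conn->close_work))
		sock_put(&smc->sk);	/* queueing failed, drop it again */

	/* cancel side: if the work was canceled before it ran, its
	 * sock_put never ran either, so it must be dropped here
	 */
	if (cancel_work_sync(&smc->conn.close_work))
		sock_put(sk);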

Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD link groups")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/smc_close.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
index 449ef45..10219f5 100644
--- a/net/smc/smc_close.c
+++ b/net/smc/smc_close.c
@@ -116,7 +116,8 @@ static void smc_close_cancel_work(struct smc_sock *smc)
 	struct sock *sk = &smc->sk;
 
 	release_sock(sk);
-	cancel_work_sync(&smc->conn.close_work);
+	if (cancel_work_sync(&smc->conn.close_work))
+		sock_put(sk);
 	cancel_delayed_work_sync(&smc->conn.tx_work);
 	lock_sock(sk);
 }
-- 
1.8.3.1



* Re: [PATCH net 2/5] net/smc: fix incorrect barrier usage
  2023-10-11  7:33 ` [PATCH net 2/5] net/smc: fix incorrect barrier usage D. Wythe
@ 2023-10-11  8:44   ` Heiko Carstens
  2023-10-11  8:57     ` D. Wythe
  0 siblings, 1 reply; 38+ messages in thread
From: Heiko Carstens @ 2023-10-11  8:44 UTC (permalink / raw)
  To: D. Wythe
  Cc: kgraul, wenjia, jaka, wintera, kuba, davem, netdev, linux-s390,
	linux-rdma

On Wed, Oct 11, 2023 at 03:33:17PM +0800, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> This patch adds an explicit CPU barrier to ensure memory
> consistency, rather than relying on a compiler barrier.
> 
> Besides, atomicity between READ_ONCE() and cmpxchg() cannot
> be guaranteed, so we need to use atomic ops. The simple way
> is to replace READ_ONCE() with xchg().
> 
> Fixes: 475f9ff63ee8 ("net/smc: fix application data exception")
> Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

^^^
I did not Co-develop this, nor did I provide an explicit Signed-off-by.
Please don't add Signed-off-by statements which have not been explicitly
agreed on.


* Re: [PATCH net 2/5] net/smc: fix incorrect barrier usage
  2023-10-11  8:44   ` Heiko Carstens
@ 2023-10-11  8:57     ` D. Wythe
  0 siblings, 0 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-11  8:57 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: kgraul, wenjia, jaka, wintera, kuba, davem, netdev, linux-s390,
	linux-rdma



On 10/11/23 4:44 PM, Heiko Carstens wrote:
> I did not Co-develop this, nor did I provide an explicit Signed-off-by.
> Please don't add Signed-off-by statements which have not been explicitly
> agreed on.

Sorry for that, I used the wrong tag; Reported-by might be more
appropriate. I will remove this in the next version, sorry to bother
you again.

Best wishes,
D. Wythe


* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
  2023-10-11  7:33 ` [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT D. Wythe
@ 2023-10-11 14:00   ` Dust Li
  2023-10-11 20:31   ` Wenjia Zhang
  2023-10-23 20:53   ` Wenjia Zhang
  2 siblings, 0 replies; 38+ messages in thread
From: Dust Li @ 2023-10-11 14:00 UTC (permalink / raw)
  To: D. Wythe, kgraul, wenjia, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma

On Wed, Oct 11, 2023 at 03:33:16PM +0800, D. Wythe wrote:
>From: "D. Wythe" <alibuda@linux.alibaba.com>
>
>Consider the following scenario:
>
>__smc_release				smc_cdc_rx_handler_rwwi
>smc_close_active()
>sock_set_flag				sock_set_flag
>__set_bit(DEAD)			__set_bit(DONE)

If I understand correctly, both operations should hold the sock lock;
that means they should not race. Have I missed something?

>
>Due to __set_bit() not being atomic, the DEAD or DONE flag might be
>lost. If the DEAD flag is lost, the state SMC_CLOSED will never be
>reached in smc_close_passive_work:
>
>if (sock_flag(sk, SOCK_DEAD) &&
>	smc_close_sent_any_close(conn)) {
>	sk->sk_state = SMC_CLOSED;
>} else {
>	/* just shutdown, but not yet closed locally */
>	sk->sk_state = SMC_APPFINCLOSEWAIT;
>}
>
>Replacing the non-atomic __set_bit() behind sock_set_flag() with the
>atomic set_bit() fixes this problem.
>

You should add a fixes tag.
>Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>


* Re: [PATCH net 5/5] net/smc: put sk reference if close work was canceled
  2023-10-11  7:33 ` [PATCH net 5/5] net/smc: put sk reference if close work was canceled D. Wythe
@ 2023-10-11 14:54   ` Dust Li
  2023-10-12 19:04   ` Wenjia Zhang
  1 sibling, 0 replies; 38+ messages in thread
From: Dust Li @ 2023-10-11 14:54 UTC (permalink / raw)
  To: D. Wythe, kgraul, wenjia, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma

On Wed, Oct 11, 2023 at 03:33:20PM +0800, D. Wythe wrote:
>From: "D. Wythe" <alibuda@linux.alibaba.com>
>
>Note that we always hold a reference to the sock when attempting
>to submit close_work. Therefore, if we successfully cancel a
>pending close_work, we MUST release that reference to avoid
>potential leaks.
>
>Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD link groups")
>Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>

Best regards,
Dust



* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
  2023-10-11  7:33 ` [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT D. Wythe
  2023-10-11 14:00   ` Dust Li
@ 2023-10-11 20:31   ` Wenjia Zhang
  2023-10-12  2:47     ` D. Wythe
       [not found]     ` <f8089b26-bb11-f82d-8070-222b1f8c1db1@linux.alibaba.com>
  2023-10-23 20:53   ` Wenjia Zhang
  2 siblings, 2 replies; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-11 20:31 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 11.10.23 09:33, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> Consider the following scenario:
> 
> __smc_release				smc_cdc_rx_handler_rwwi
> smc_close_active()
> sock_set_flag				sock_set_flag
> __set_bit(DEAD)			__set_bit(DONE)
> 
> Due to __set_bit() not being atomic, the DEAD or DONE flag might be
> lost. If the DEAD flag is lost, the state SMC_CLOSED will never be
> reached in smc_close_passive_work:
> 
> if (sock_flag(sk, SOCK_DEAD) &&
> 	smc_close_sent_any_close(conn)) {
> 	sk->sk_state = SMC_CLOSED;
> } else {
> 	/* just shutdown, but not yet closed locally */
> 	sk->sk_state = SMC_APPFINCLOSEWAIT;
> }
> 
> Replacing the non-atomic __set_bit() behind sock_set_flag() with the
> atomic set_bit() fixes this problem.
> 
I didn't really understand the scenario. What is
smc_cdc_rx_handler_rwwi()? What does it do? Doesn't it get the lock
during the runtime?



* Re: [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc
  2023-10-11  7:33 ` [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc D. Wythe
@ 2023-10-11 20:37   ` Wenjia Zhang
  2023-10-12  2:49     ` D. Wythe
  0 siblings, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-11 20:37 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 11.10.23 09:33, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> This patch re-fixes the issue mentioned by commit 22a825c541d7
> ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").
> 
> Blocking the message send does solve the issue, but it also prevents
> the peer from receiving the final message. Besides, logically,
> whether sndbuf_desc is NULL or not has no impact on the processing
> of CDC message sending.
> 
Agree.

> Hence, this patch allows the CDC message to be sent, but checks
> sndbuf_desc with care in smc_cdc_tx_handler().
> 
> Fixes: 22a825c541d7 ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
>   net/smc/smc_cdc.c | 9 ++++-----
>   1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
> index 01bdb79..3c06625 100644
> --- a/net/smc/smc_cdc.c
> +++ b/net/smc/smc_cdc.c
> @@ -28,13 +28,15 @@ static void smc_cdc_tx_handler(struct smc_wr_tx_pend_priv *pnd_snd,
>   {
>   	struct smc_cdc_tx_pend *cdcpend = (struct smc_cdc_tx_pend *)pnd_snd;
>   	struct smc_connection *conn = cdcpend->conn;
> +	struct smc_buf_desc *sndbuf_desc;
>   	struct smc_sock *smc;
>   	int diff;
>   
> +	sndbuf_desc = conn->sndbuf_desc;
>   	smc = container_of(conn, struct smc_sock, conn);
>   	bh_lock_sock(&smc->sk);
> -	if (!wc_status) {
> -		diff = smc_curs_diff(cdcpend->conn->sndbuf_desc->len,
> +	if (!wc_status && sndbuf_desc) {
> +		diff = smc_curs_diff(sndbuf_desc->len,
How could this guarantee that the sndbuf_desc would not be NULL?

>   				     &cdcpend->conn->tx_curs_fin,
>   				     &cdcpend->cursor);
>   		/* sndbuf_space is decreased in smc_sendmsg */
> @@ -114,9 +116,6 @@ int smc_cdc_msg_send(struct smc_connection *conn,
>   	union smc_host_cursor cfed;
>   	int rc;
>   
> -	if (unlikely(!READ_ONCE(conn->sndbuf_desc)))
> -		return -ENOBUFS;
> -
>   	smc_cdc_add_pending_send(conn, pend);
>   
>   	conn->tx_cdc_seq++;


* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
  2023-10-11 20:31   ` Wenjia Zhang
@ 2023-10-12  2:47     ` D. Wythe
       [not found]     ` <f8089b26-bb11-f82d-8070-222b1f8c1db1@linux.alibaba.com>
  1 sibling, 0 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-12  2:47 UTC (permalink / raw)
  To: Wenjia Zhang, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>
>
> On 11.10.23 09:33, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> Consider the following scenario:
>>
>> __smc_release				smc_cdc_rx_handler_rwwi
>> smc_close_active()
>> sock_set_flag				sock_set_flag
>> __set_bit(DEAD)			__set_bit(DONE)
>>
>> Due to __set_bit() not being atomic, the DEAD or DONE flag might be
>> lost. If the DEAD flag is lost, the state SMC_CLOSED will never be
>> reached in smc_close_passive_work:
>>
>> if (sock_flag(sk, SOCK_DEAD) &&
>>     smc_close_sent_any_close(conn)) {
>>     sk->sk_state = SMC_CLOSED;
>> } else {
>>     /* just shutdown, but not yet closed locally */
>>     sk->sk_state = SMC_APPFINCLOSEWAIT;
>> }
>>
>> Replacing the non-atomic __set_bit() behind sock_set_flag() with the
>> atomic set_bit() fixes this problem.
>>
> I didn't really understand the scenario. What is
> smc_cdc_rx_handler_rwwi()? What does it do? Doesn't it get the lock
> during the runtime?
>

Hi Wenjia,

Sorry for that, it is not smc_cdc_rx_handler_rwwi() but
smc_cdc_rx_handler().

The following is a more specific description of the issue:

lock_sock()
__smc_release			smc_cdc_rx_handler()
				smc_cdc_msg_recv()
				bh_lock_sock()
				smc_cdc_msg_recv_action()
sock_set_flag(DEAD)		sock_set_flag(DONE)
  __set_bit			  __set_bit
				bh_unlock_sock()
release_sock()

Note: bh_lock_sock() and lock_sock() are not mutually exclusive.
They are actually used for different purposes and contexts.





* Re: [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc
  2023-10-11 20:37   ` Wenjia Zhang
@ 2023-10-12  2:49     ` D. Wythe
  2023-10-12 15:15       ` Wenjia Zhang
  0 siblings, 1 reply; 38+ messages in thread
From: D. Wythe @ 2023-10-12  2:49 UTC (permalink / raw)
  To: Wenjia Zhang, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 10/12/23 4:37 AM, Wenjia Zhang wrote:
>
>
> On 11.10.23 09:33, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> This patch re-fixes the issue mentioned by commit 22a825c541d7
>> ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").
>>
>> Blocking the message send does solve the issue, but it also prevents
>> the peer from receiving the final message. Besides, logically,
>> whether sndbuf_desc is NULL or not has no impact on the processing
>> of CDC message sending.
>>
> Agree.
>
>> Hence, this patch allows the CDC message to be sent, but checks
>> sndbuf_desc with care in smc_cdc_tx_handler().
>>
>> Fixes: 22a825c541d7 ("net/smc: fix NULL sndbuf_desc in 
>> smc_cdc_tx_handler()")
>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>> ---
>>   net/smc/smc_cdc.c | 9 ++++-----
>>   1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
>> index 01bdb79..3c06625 100644
>> --- a/net/smc/smc_cdc.c
>> +++ b/net/smc/smc_cdc.c
>> @@ -28,13 +28,15 @@ static void smc_cdc_tx_handler(struct 
>> smc_wr_tx_pend_priv *pnd_snd,
>>   {
>>       struct smc_cdc_tx_pend *cdcpend = (struct smc_cdc_tx_pend 
>> *)pnd_snd;
>>       struct smc_connection *conn = cdcpend->conn;
>> +    struct smc_buf_desc *sndbuf_desc;
>>       struct smc_sock *smc;
>>       int diff;
>>   +    sndbuf_desc = conn->sndbuf_desc;
>>       smc = container_of(conn, struct smc_sock, conn);
>>       bh_lock_sock(&smc->sk);
>> -    if (!wc_status) {
>> -        diff = smc_curs_diff(cdcpend->conn->sndbuf_desc->len,
>> +    if (!wc_status && sndbuf_desc) {
>> +        diff = smc_curs_diff(sndbuf_desc->len,
> How could this guarantee that the sndbuf_desc would not be NULL?
>

It cannot guarantee that sndbuf_desc is not NULL, but it prevents
smc_cdc_tx_handler() from accessing a NULL sndbuf_desc, so we
can avoid the panic described in commit 22a825c541d7
("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").




* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
       [not found]     ` <f8089b26-bb11-f82d-8070-222b1f8c1db1@linux.alibaba.com>
@ 2023-10-12 11:51       ` Wenjia Zhang
  2023-10-13  5:32         ` Dust Li
  0 siblings, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-12 11:51 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 12.10.23 04:37, D. Wythe wrote:
> 
> 
> On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>>
>>
>> On 11.10.23 09:33, D. Wythe wrote:
>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>
>>> Consider the following scenario:
>>>
>>> __smc_release				smc_cdc_rx_handler_rwwi
>>> smc_close_active()
>>> sock_set_flag				sock_set_flag
>>> __set_bit(DEAD)			__set_bit(DONE)
>>>
>>> Due to __set_bit() not being atomic, the DEAD or DONE flag might be
>>> lost. If the DEAD flag is lost, the state SMC_CLOSED will never be
>>> reached in smc_close_passive_work:
>>>
>>> if (sock_flag(sk, SOCK_DEAD) &&
>>>     smc_close_sent_any_close(conn)) {
>>>     sk->sk_state = SMC_CLOSED;
>>> } else {
>>>     /* just shutdown, but not yet closed locally */
>>>     sk->sk_state = SMC_APPFINCLOSEWAIT;
>>> }
>>>
>>> Replacing the non-atomic __set_bit() behind sock_set_flag() with the
>>> atomic set_bit() fixes this problem.
>>>
>> I didn't really understand the scenario. What is
>> smc_cdc_rx_handler_rwwi()? What does it do? Doesn't it get the lock
>> during the runtime?
>>
> 
> Hi Wenjia,
> 
> Sorry for that, it is not smc_cdc_rx_handler_rwwi() but
> smc_cdc_rx_handler().
> 
> The following is a more specific description of the issue:
> 
> lock_sock()
> __smc_release			smc_cdc_rx_handler()
> 				smc_cdc_msg_recv()
> 				bh_lock_sock()
> 				smc_cdc_msg_recv_action()
> sock_set_flag(DEAD)		sock_set_flag(DONE)
>   __set_bit			  __set_bit
> 				bh_unlock_sock()
> release_sock()
> 
> 
> 
> Note: bh_lock_sock() and lock_sock() are not mutually exclusive.
> They are actually used for different purposes and contexts.
> 
> 
ok, that's true that bh_lock_sock() and lock_sock() are not really
mutually exclusive. However, since bh_lock_sock() is used, the scenario
you described above should not happen, because that takes the
sk_lock.slock. Following this scenario, IMO, only the following
situation can happen:

lock_sock()
__smc_release			smc_cdc_rx_handler()
				smc_cdc_msg_recv()
				bh_lock_sock()
				smc_cdc_msg_recv_action()
				sock_set_flag(DONE)
				bh_unlock_sock()
sock_set_flag(DEAD)
release_sock()

> 


* Re: [PATCH net 0/5] net/smc: bugfixes for smc-r
  2023-10-11  7:33 [PATCH net 0/5] net/smc: bugfixes for smc-r D. Wythe
                   ` (4 preceding siblings ...)
  2023-10-11  7:33 ` [PATCH net 5/5] net/smc: put sk reference if close work was canceled D. Wythe
@ 2023-10-12 13:43 ` Alexandra Winter
  2023-10-17  1:56   ` D. Wythe
  5 siblings, 1 reply; 38+ messages in thread
From: Alexandra Winter @ 2023-10-12 13:43 UTC (permalink / raw)
  To: D. Wythe, kgraul, wenjia, jaka
  Cc: kuba, davem, netdev, linux-s390, linux-rdma

The subject of the thread says 'smc-r', but some of the changes affect
smc-d as well, don't they?


On 11.10.23 09:33, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> This patch set contains the following bug fixes:
> 
> 1. hung socket state
> 2. sock leak
> 3. potential panic
> 

It may be helpful for the reviewers when you point out which patch fixes which problem.

Were they all found by code reviews?
Or did some occur in real life? If so, then what were the symptoms?
A description of the symptoms is helpful for somebody who is debugging and wants to check
whether the issue was already fixed upstream.


* Re: [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc
  2023-10-12  2:49     ` D. Wythe
@ 2023-10-12 15:15       ` Wenjia Zhang
  0 siblings, 0 replies; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-12 15:15 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 12.10.23 04:49, D. Wythe wrote:
> 
> 
> On 10/12/23 4:37 AM, Wenjia Zhang wrote:
>>
>>
>> On 11.10.23 09:33, D. Wythe wrote:
>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>
>>> This patch re-fixes the issue mentioned by commit 22a825c541d7
>>> ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").
>>>
>>> Blocking the message send does solve the issue, but it also prevents
>>> the peer from receiving the final message. Besides, logically,
>>> whether sndbuf_desc is NULL or not has no impact on the processing
>>> of CDC message sending.
>>>
>> Agree.
>>
>>> Hence, this patch allows the CDC message to be sent, but checks
>>> sndbuf_desc with care in smc_cdc_tx_handler().
>>>
>>> Fixes: 22a825c541d7 ("net/smc: fix NULL sndbuf_desc in 
>>> smc_cdc_tx_handler()")
>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>> ---
>>>   net/smc/smc_cdc.c | 9 ++++-----
>>>   1 file changed, 4 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
>>> index 01bdb79..3c06625 100644
>>> --- a/net/smc/smc_cdc.c
>>> +++ b/net/smc/smc_cdc.c
>>> @@ -28,13 +28,15 @@ static void smc_cdc_tx_handler(struct 
>>> smc_wr_tx_pend_priv *pnd_snd,
>>>   {
>>>       struct smc_cdc_tx_pend *cdcpend = (struct smc_cdc_tx_pend 
>>> *)pnd_snd;
>>>       struct smc_connection *conn = cdcpend->conn;
>>> +    struct smc_buf_desc *sndbuf_desc;
>>>       struct smc_sock *smc;
>>>       int diff;
>>>   +    sndbuf_desc = conn->sndbuf_desc;
>>>       smc = container_of(conn, struct smc_sock, conn);
>>>       bh_lock_sock(&smc->sk);
>>> -    if (!wc_status) {
>>> -        diff = smc_curs_diff(cdcpend->conn->sndbuf_desc->len,
>>> +    if (!wc_status && sndbuf_desc) {
>>> +        diff = smc_curs_diff(sndbuf_desc->len,
>> How could this guarantee that the sndbuf_desc would not be NULL?
>>
> 
> It cannot guarantee that sndbuf_desc is not NULL, but it prevents
> smc_cdc_tx_handler() from accessing a NULL sndbuf_desc, so we
> can avoid the panic described in commit 22a825c541d7
> ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").
> 
got it, thanks!

Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>



* Re: [PATCH net 4/5] net/smc: protect connection state transitions in listen work
  2023-10-11  7:33 ` [PATCH net 4/5] net/smc: protect connection state transitions in listen work D. Wythe
@ 2023-10-12 17:14   ` Wenjia Zhang
  2023-10-31  3:04     ` D. Wythe
  0 siblings, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-12 17:14 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 11.10.23 09:33, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> Consider the following scenario:
> 
> smc_listen_out_connected		smc_close_passive_work
> 					lock_sock()
> if (state == SMC_INIT)
> 					if (state == SMC_INIT)
> 						state = SMC_APPCLOSEWAIT1;
> 	state = SMC_ACTIVE
> 					release_sock()
> 
> This would cause the state machine of the connection to be corrupted.
> Also, this issue can occur in smc_listen_out_err().
> 
> To solve this problem, protect the state transitions under the sock
> lock to avoid the collision.
> 
To this fix, I have to repeat the question from Alexandra.
Did the scenario occur in real life? Or is it just a kind of potential
problem you found during code review?

If it is the former, could you please show us the corresponding
message, e.g. from dmesg? If it is the latter, I'd like to deal with
it more carefully. Going from this scenario, I noticed that there
could also be other similar places where we need to make sure that no
race happens. Thus, it would make more sense to find a systematic
approach.



* Re: [PATCH net 5/5] net/smc: put sk reference if close work was canceled
  2023-10-11  7:33 ` [PATCH net 5/5] net/smc: put sk reference if close work was canceled D. Wythe
  2023-10-11 14:54   ` Dust Li
@ 2023-10-12 19:04   ` Wenjia Zhang
       [not found]     ` <ee641ca5-104b-d1ec-5b2a-e20237c5378a@linux.alibaba.com>
  1 sibling, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-12 19:04 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 11.10.23 09:33, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> Note that we always hold a reference to the sock when attempting
> to submit close_work.
yes
> Therefore, if we successfully cancel a pending close_work, we MUST
> release that reference to avoid potential leaks.
> 
Isn't the corresponding reference already released inside the 
smc_close_passive_work()?



* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
  2023-10-12 11:51       ` Wenjia Zhang
@ 2023-10-13  5:32         ` Dust Li
  2023-10-13 11:52           ` Wenjia Zhang
  0 siblings, 1 reply; 38+ messages in thread
From: Dust Li @ 2023-10-13  5:32 UTC (permalink / raw)
  To: Wenjia Zhang, D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma

On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>
>
>On 12.10.23 04:37, D. Wythe wrote:
>> 
>> 
>> On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>> > 
>> > 
>> > On 11.10.23 09:33, D. Wythe wrote:
>> > > From: "D. Wythe" <alibuda@linux.alibaba.com>
>> > > 
>> > > Consider the following scenario:
>> > > 
>> > > __smc_release			smc_cdc_rx_handler_rwwi
>> > > smc_close_active()
>> > > sock_set_flag			sock_set_flag
>> > > __set_bit(DEAD)			__set_bit(DONE)
>> > > 
>> > > Due to __set_bit() not being atomic, the DEAD or DONE flag might be
>> > > lost. If the DEAD flag is lost, the state SMC_CLOSED will never be
>> > > reached in smc_close_passive_work:
>> > > 
>> > > if (sock_flag(sk, SOCK_DEAD) &&
>> > >     smc_close_sent_any_close(conn)) {
>> > >     sk->sk_state = SMC_CLOSED;
>> > > } else {
>> > >     /* just shutdown, but not yet closed locally */
>> > >     sk->sk_state = SMC_APPFINCLOSEWAIT;
>> > > }
>> > > 
>> > > Replacing the non-atomic __set_bit() behind sock_set_flag() with the
>> > > atomic set_bit() fixes this problem.
>> > > 
>> > I didn't really understand the scenario. What is
>> > smc_cdc_rx_handler_rwwi()? What does it do? Doesn't it get the lock
>> > during the runtime?
>> > 
>> 
>> Hi Wenjia,
>> 
>> Sorry for that, it is not smc_cdc_rx_handler_rwwi() but
>> smc_cdc_rx_handler().
>> 
>> The following is a more specific description of the issue:
>> 
>> 
>> lock_sock()
>> __smc_release			smc_cdc_rx_handler()
>> 				smc_cdc_msg_recv()
>> 				bh_lock_sock()
>> 				smc_cdc_msg_recv_action()
>> sock_set_flag(DEAD)		sock_set_flag(DONE)
>>   __set_bit			  __set_bit
>> 				bh_unlock_sock()
>> release_sock()
>> 
>> 
>> 
>> Note: bh_lock_sock() and lock_sock() are not mutually exclusive.
>> They are actually used for different purposes and contexts.
>> 
>> 
>ok, that's true that bh_lock_sock() and lock_sock() are not really
>mutually exclusive. However, since bh_lock_sock() is used, the scenario
>you described above should not happen, because that takes the
>sk_lock.slock. Following this scenario, IMO, only the following
>situation can happen:
>
>lock_sock()
>__smc_release			smc_cdc_rx_handler()
>				smc_cdc_msg_recv()
>				bh_lock_sock()
>				smc_cdc_msg_recv_action()
>				sock_set_flag(DONE)
>				bh_unlock_sock()
>sock_set_flag(DEAD)
>release_sock()

Hi wenjia,

I think I know what D. Wythe means now, and I think he is right on this.

IIUC, in process context, lock_sock() won't respect bh_lock_sock() if it
acquires the lock before bh_lock_sock(). This is how the sock lock works.

    PROCESS CONTEXT                                 INTERRUPT CONTEXT
------------------------------------------------------------------------
lock_sock()
    spin_lock_bh(&sk->sk_lock.slock);
    ...
    sk->sk_lock.owned = 1;
    // here the spinlock is released
    spin_unlock_bh(&sk->sk_lock.slock);
__smc_release()
                                                   bh_lock_sock(&smc->sk);
                                                   smc_cdc_msg_recv_action(smc, cdc);
                                                       sock_set_flag(&smc->sk, SOCK_DONE);
                                                   bh_unlock_sock(&smc->sk);

    sock_set_flag(DEAD)  <-- Can be before or after sock_set_flag(DONE)
release_sock()

The bh_lock_sock() only spins on sk->sk_lock.slock, which is already
released when lock_sock() returns. Therefore, there is actually no
mutual exclusion between the code after lock_sock() and before
release_sock() and the code within bh_lock_sock()...bh_unlock_sock().
Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and
might run before or after sock_set_flag(DONE).
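
For reference, lock_sock() looks roughly like this (a simplified
sketch of net/core/sock.c, not the exact current code):

	void lock_sock_nested(struct sock *sk, int subclass)
	{
		spin_lock_bh(&sk->sk_lock.slock);
		if (sk->sk_lock.owned)
			__lock_sock(sk);	/* wait for the owner */
		sk->sk_lock.owned = 1;
		/* the spinlock is dropped again; only 'owned' remains set */
		spin_unlock_bh(&sk->sk_lock.slock);
	}

while bh_lock_sock() is just spin_lock(&sk->sk_lock.slock) and never
looks at 'owned'.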


Actually, in TCP, the interrupt context will check sock_owned_by_user().
If it returns true, the softirq just defers the packet to the backlog,
which is then processed in release_sock(). That avoids the race between
softirq and process context when accessing the 'struct sock':

tcp_v4_rcv()
         bh_lock_sock_nested(sk);
         tcp_segs_in(tcp_sk(sk), skb);
         ret = 0;
         if (!sock_owned_by_user(sk)) {
                 ret = tcp_v4_do_rcv(sk, skb);
         } else {
                 if (tcp_add_backlog(sk, skb, &drop_reason))
                         goto discard_and_relse;
         }
         bh_unlock_sock(sk);


But in SMC we don't have a backlog, which means any field in
'struct sock' might race, and this sock_set_flag() is just one of
the cases.

Best regards,
Dust



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
  2023-10-13  5:32         ` Dust Li
@ 2023-10-13 11:52           ` Wenjia Zhang
  2023-10-13 12:27             ` Dust Li
  0 siblings, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-13 11:52 UTC (permalink / raw)
  To: dust.li, D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 13.10.23 07:32, Dust Li wrote:
> On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>>
>>
>> On 12.10.23 04:37, D. Wythe wrote:
>>>
>>>
>>> On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>>>>
>>>>
>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>
>>>>> Considering scenario:
>>>>>
>>>>>                  smc_cdc_rx_handler_rwwi
>>>>> __smc_release
>>>>>                  sock_set_flag
>>>>> smc_close_active()
>>>>> sock_set_flag
>>>>>
>>>>> __set_bit(DEAD)            __set_bit(DONE)
>>>>>
>>>>> Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>>>>> if the DEAD flag lost, the state SMC_CLOSED  will be never be reached
>>>>> in smc_close_passive_work:
>>>>>
>>>>> if (sock_flag(sk, SOCK_DEAD) &&
>>>>>      smc_close_sent_any_close(conn)) {
>>>>>      sk->sk_state = SMC_CLOSED;
>>>>> } else {
>>>>>      /* just shutdown, but not yet closed locally */
>>>>>      sk->sk_state = SMC_APPFINCLOSEWAIT;
>>>>> }
>>>>>
>>>>> Replace sock_set_flags or __set_bit to set_bit will fix this problem.
>>>>> Since set_bit is atomic.
>>>>>
>>>> I didn't really understand the scenario. What is
>>>> smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
>>>> during the runtime?
>>>>
>>>
>>> Hi Wenjia,
>>>
>>> Sorry for that, It is not smc_cdc_rx_handler_rwwi() but
>>> smc_cdc_rx_handler();
>>>
>>> Following is a more specific description of the issues
>>>
>>>
>>> lock_sock()
>>> __smc_release
>>>
>>> smc_cdc_rx_handler()
>>> smc_cdc_msg_recv()
>>> bh_lock_sock()
>>> smc_cdc_msg_recv_action()
>>> sock_set_flag(DONE) sock_set_flag(DEAD)
>>> __set_bit __set_bit
>>> bh_unlock_sock()
>>> release_sock()
>>>
>>>
>>>
>>> Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive. They are
>>> actually used for different purposes and contexts.
>>>
>>>
>> ok, that's true that |bh_lock_sock|and |lock_sock|are not really mutually
>> exclusive. However, since bh_lock_sock() is used, this scenario you described
>> above should not happen, because that gets the sk_lock.slock. Following this
>> scenarios, IMO, only the following situation can happen.
>>
>> lock_sock()
>> __smc_release
>>
>> smc_cdc_rx_handler()
>> smc_cdc_msg_recv()
>> bh_lock_sock()
>> smc_cdc_msg_recv_action()
>> sock_set_flag(DONE)
>> bh_unlock_sock()
>> sock_set_flag(DEAD)
>> release_sock()
> 
> Hi wenjia,
> 
> I think I know what D. Wythe means now, and I think he is right on this.
> 
> IIUC, in process context, lock_sock() won't respect bh_lock_sock() if it
> acquires the lock before bh_lock_sock(). This is how the sock lock works.
> 
>      PROCESS CONTEXT                                 INTERRUPT CONTEXT
> ------------------------------------------------------------------------
> lock_sock()
>      spin_lock_bh(&sk->sk_lock.slock);
>      ...
>      sk->sk_lock.owned = 1;
>      // here the spinlock is released
>      spin_unlock_bh(&sk->sk_lock.slock);
> __smc_release()
>                                                     bh_lock_sock(&smc->sk);
>                                                     smc_cdc_msg_recv_action(smc, cdc);
>                                                         sock_set_flag(&smc->sk, SOCK_DONE);
>                                                     bh_unlock_sock(&smc->sk);
> 
>      sock_set_flag(DEAD)  <-- Can be before or after sock_set_flag(DONE)
> release_sock()
> 
> The bh_lock_sock() only spins on sk->sk_lock.slock, which is already released
> after lock_sock() return. Therefor, there is actually no lock between
> the code after lock_sock() and before release_sock() with bh_lock_sock()...bh_unlock_sock().
> Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and might be
> before or after sock_set_flag(DONE).
> 
> 
> Actually, in TCP, the interrupt context will check sock_owned_by_user().
> If it returns true, the softirq just defer the process to backlog, and process
> that in release_sock(). Which avoid the race between softirq and process
> when visiting the 'struct sock'.
> 
> tcp_v4_rcv()
>           bh_lock_sock_nested(sk);
>           tcp_segs_in(tcp_sk(sk), skb);
>           ret = 0;
>           if (!sock_owned_by_user(sk)) {
>                   ret = tcp_v4_do_rcv(sk, skb);
>           } else {
>                   if (tcp_add_backlog(sk, skb, &drop_reason))
>                           goto discard_and_relse;
>           }
>           bh_unlock_sock(sk);
> 
> 
> But in SMC we don't have a backlog, that means fields in 'struct sock'
> might all have race, and this sock_set_flag() is just one of the cases.
> 
> Best regards,
> Dust
> 
I agree with your description above.
Sure, the following case 1) can also happen:

case 1)
-------
  lock_sock()
  __smc_release

  sock_set_flag(DEAD)
  bh_lock_sock()
  smc_cdc_msg_recv_action()
  sock_set_flag(DONE)
  bh_unlock_sock()
  release_sock()

case 2)
-------
  lock_sock()
  __smc_release

  bh_lock_sock()
  smc_cdc_msg_recv_action()
  sock_set_flag(DONE) sock_set_flag(DEAD)
  __set_bit __set_bit
  bh_unlock_sock()
  release_sock()

My point here is that case 2) can never happen, i.e. that
sock_set_flag(DONE) and sock_set_flag(DEAD) cannot happen concurrently.
Thus, how could the atomic set help make sure that the DEAD flag would
not be overwritten with DONE?

Maybe I'm the only one who is getting stuck on this problem. I'd
appreciate it if you can help me get out :P

Thanks,
Wenjia
> 
> 
>>
>>>
>>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>>> ---
>>>>>    net/smc/af_smc.c    | 4 ++--
>>>>>    net/smc/smc.h       | 5 +++++
>>>>>    net/smc/smc_cdc.c   | 2 +-
>>>>>    net/smc/smc_close.c | 2 +-
>>>>>    4 files changed, 9 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>>>> index bacdd97..5ad2a9f 100644
>>>>> --- a/net/smc/af_smc.c
>>>>> +++ b/net/smc/af_smc.c
>>>>> @@ -275,7 +275,7 @@ static int __smc_release(struct smc_sock *smc)
>>>>>          if (!smc->use_fallback) {
>>>>>            rc = smc_close_active(smc);
>>>>> -        sock_set_flag(sk, SOCK_DEAD);
>>>>> +        smc_sock_set_flag(sk, SOCK_DEAD);
>>>>>            sk->sk_shutdown |= SHUTDOWN_MASK;
>>>>>        } else {
>>>>>            if (sk->sk_state != SMC_CLOSED) {
>>>>> @@ -1742,7 +1742,7 @@ static int smc_clcsock_accept(struct
>>>>> smc_sock *lsmc, struct smc_sock **new_smc)
>>>>>            if (new_clcsock)
>>>>>                sock_release(new_clcsock);
>>>>>            new_sk->sk_state = SMC_CLOSED;
>>>>> -        sock_set_flag(new_sk, SOCK_DEAD);
>>>>> +        smc_sock_set_flag(new_sk, SOCK_DEAD);
>>>>>            sock_put(new_sk); /* final */
>>>>>            *new_smc = NULL;
>>>>>            goto out;
>>>>> diff --git a/net/smc/smc.h b/net/smc/smc.h
>>>>> index 24745fd..e377980 100644
>>>>> --- a/net/smc/smc.h
>>>>> +++ b/net/smc/smc.h
>>>>> @@ -377,4 +377,9 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
>>>>>    int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct
>>>>> genl_info *info);
>>>>>    int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct
>>>>> genl_info *info);
>>>>>    +static inline void smc_sock_set_flag(struct sock *sk, enum
>>>>> sock_flags flag)
>>>>> +{
>>>>> +    set_bit(flag, &sk->sk_flags);
>>>>> +}
>>>>> +
>>>>>    #endif    /* __SMC_H */
>>>>> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
>>>>> index 89105e9..01bdb79 100644
>>>>> --- a/net/smc/smc_cdc.c
>>>>> +++ b/net/smc/smc_cdc.c
>>>>> @@ -385,7 +385,7 @@ static void smc_cdc_msg_recv_action(struct
>>>>> smc_sock *smc,
>>>>>            smc->sk.sk_shutdown |= RCV_SHUTDOWN;
>>>>>            if (smc->clcsock && smc->clcsock->sk)
>>>>>                smc->clcsock->sk->sk_shutdown |= RCV_SHUTDOWN;
>>>>> -        sock_set_flag(&smc->sk, SOCK_DONE);
>>>>> +        smc_sock_set_flag(&smc->sk, SOCK_DONE);
>>>>>            sock_hold(&smc->sk); /* sock_put in close_work */
>>>>>            if (!queue_work(smc_close_wq, &conn->close_work))
>>>>>                sock_put(&smc->sk);
>>>>> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>>>>> index dbdf03e..449ef45 100644
>>>>> --- a/net/smc/smc_close.c
>>>>> +++ b/net/smc/smc_close.c
>>>>> @@ -173,7 +173,7 @@ void smc_close_active_abort(struct smc_sock *smc)
>>>>>            break;
>>>>>        }
>>>>>    -    sock_set_flag(sk, SOCK_DEAD);
>>>>> +    smc_sock_set_flag(sk, SOCK_DEAD);
>>>>>        sk->sk_state_change(sk);
>>>>>          if (release_clcsock) {
>>>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
  2023-10-13 11:52           ` Wenjia Zhang
@ 2023-10-13 12:27             ` Dust Li
  2023-10-17  2:00               ` D. Wythe
  0 siblings, 1 reply; 38+ messages in thread
From: Dust Li @ 2023-10-13 12:27 UTC (permalink / raw)
  To: Wenjia Zhang, D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma

On Fri, Oct 13, 2023 at 01:52:09PM +0200, Wenjia Zhang wrote:
>
>
>On 13.10.23 07:32, Dust Li wrote:
>> On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>> > 
>> > 
>> > On 12.10.23 04:37, D. Wythe wrote:
>> > > 
>> > > 
>> > > On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>> > > > 
>> > > > 
>> > > > On 11.10.23 09:33, D. Wythe wrote:
>> > > > > From: "D. Wythe" <alibuda@linux.alibaba.com>
>> > > > > 
>> > > > > Considering scenario:
>> > > > > 
>> > > > >                  smc_cdc_rx_handler_rwwi
>> > > > > __smc_release
>> > > > >                  sock_set_flag
>> > > > > smc_close_active()
>> > > > > sock_set_flag
>> > > > > 
>> > > > > __set_bit(DEAD)            __set_bit(DONE)
>> > > > > 
>> > > > > Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>> > > > > if the DEAD flag lost, the state SMC_CLOSED  will be never be reached
>> > > > > in smc_close_passive_work:
>> > > > > 
>> > > > > if (sock_flag(sk, SOCK_DEAD) &&
>> > > > >      smc_close_sent_any_close(conn)) {
>> > > > >      sk->sk_state = SMC_CLOSED;
>> > > > > } else {
>> > > > >      /* just shutdown, but not yet closed locally */
>> > > > >      sk->sk_state = SMC_APPFINCLOSEWAIT;
>> > > > > }
>> > > > > 
>> > > > > Replace sock_set_flags or __set_bit to set_bit will fix this problem.
>> > > > > Since set_bit is atomic.
>> > > > > 
>> > > > I didn't really understand the scenario. What is
>> > > > smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
>> > > > during the runtime?
>> > > > 
>> > > 
>> > > Hi Wenjia,
>> > > 
>> > > Sorry for that, It is not smc_cdc_rx_handler_rwwi() but
>> > > smc_cdc_rx_handler();
>> > > 
>> > > Following is a more specific description of the issues
>> > > 
>> > > 
>> > > lock_sock()
>> > > __smc_release
>> > > 
>> > > smc_cdc_rx_handler()
>> > > smc_cdc_msg_recv()
>> > > bh_lock_sock()
>> > > smc_cdc_msg_recv_action()
>> > > sock_set_flag(DONE) sock_set_flag(DEAD)
>> > > __set_bit __set_bit
>> > > bh_unlock_sock()
>> > > release_sock()
>> > > 
>> > > 
>> > > 
>> > > Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive. They are
>> > > actually used for different purposes and contexts.
>> > > 
>> > > 
>> > ok, that's true that |bh_lock_sock|and |lock_sock|are not really mutually
>> > exclusive. However, since bh_lock_sock() is used, this scenario you described
>> > above should not happen, because that gets the sk_lock.slock. Following this
>> > scenarios, IMO, only the following situation can happen.
>> > 
>> > lock_sock()
>> > __smc_release
>> > 
>> > smc_cdc_rx_handler()
>> > smc_cdc_msg_recv()
>> > bh_lock_sock()
>> > smc_cdc_msg_recv_action()
>> > sock_set_flag(DONE)
>> > bh_unlock_sock()
>> > sock_set_flag(DEAD)
>> > release_sock()
>> 
>> Hi wenjia,
>> 
>> I think I know what D. Wythe means now, and I think he is right on this.
>> 
>> IIUC, in process context, lock_sock() won't respect bh_lock_sock() if it
>> acquires the lock before bh_lock_sock(). This is how the sock lock works.
>> 
>>      PROCESS CONTEXT                                 INTERRUPT CONTEXT
>> ------------------------------------------------------------------------
>> lock_sock()
>>      spin_lock_bh(&sk->sk_lock.slock);
>>      ...
>>      sk->sk_lock.owned = 1;
>>      // here the spinlock is released
>>      spin_unlock_bh(&sk->sk_lock.slock);
>> __smc_release()
>>                                                     bh_lock_sock(&smc->sk);
>>                                                     smc_cdc_msg_recv_action(smc, cdc);
>>                                                         sock_set_flag(&smc->sk, SOCK_DONE);
>>                                                     bh_unlock_sock(&smc->sk);
>> 
>>      sock_set_flag(DEAD)  <-- Can be before or after sock_set_flag(DONE)
>> release_sock()
>> 
>> The bh_lock_sock() only spins on sk->sk_lock.slock, which is already released
>> after lock_sock() return. Therefor, there is actually no lock between
>> the code after lock_sock() and before release_sock() with bh_lock_sock()...bh_unlock_sock().
>> Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and might be
>> before or after sock_set_flag(DONE).
>> 
>> 
>> Actually, in TCP, the interrupt context will check sock_owned_by_user().
>> If it returns true, the softirq just defer the process to backlog, and process
>> that in release_sock(). Which avoid the race between softirq and process
>> when visiting the 'struct sock'.
>> 
>> tcp_v4_rcv()
>>           bh_lock_sock_nested(sk);
>>           tcp_segs_in(tcp_sk(sk), skb);
>>           ret = 0;
>>           if (!sock_owned_by_user(sk)) {
>>                   ret = tcp_v4_do_rcv(sk, skb);
>>           } else {
>>                   if (tcp_add_backlog(sk, skb, &drop_reason))
>>                           goto discard_and_relse;
>>           }
>>           bh_unlock_sock(sk);
>> 
>> 
>> But in SMC we don't have a backlog, that means fields in 'struct sock'
>> might all have race, and this sock_set_flag() is just one of the cases.
>> 
>> Best regards,
>> Dust
>> 
>I agree on your description above.
>Sure, the following case 1) can also happen
>
>case 1)
>-------
> lock_sock()
> __smc_release
>
> sock_set_flag(DEAD)
> bh_lock_sock()
> smc_cdc_msg_recv_action()
> sock_set_flag(DONE)
> bh_unlock_sock()
> release_sock()
>
>case 2)
>-------
> lock_sock()
> __smc_release
>
> bh_lock_sock()
> smc_cdc_msg_recv_action()
> sock_set_flag(DONE) sock_set_flag(DEAD)
> __set_bit __set_bit
> bh_unlock_sock()
> release_sock()
>
>My point here is that case2) can never happen. i.e that sock_set_flag(DONE)
>and sock_set_flag(DEAD) can not happen concurrently. Thus, how could
>the atomic set help make sure that the Dead flag would not be overwritten
>with DONE?

I agree with you on this. I also don't see how using atomics can
solve the problem of overwriting the DEAD flag with DONE.

I think we need some mechanism to ensure that sk_flags and other
'struct sock' related fields are not modified simultaneously.
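
(For reference, a hedged sketch of the TCP-style deferral pattern quoted
earlier in this thread -- purely illustrative; handle_event() and
defer_to_owner() are hypothetical placeholders, and the SMC CDC path has
no skb to queue, so the details would differ:)

    bh_lock_sock(sk);
    if (!sock_owned_by_user(sk)) {
            /* no process context is inside lock_sock()...release_sock(),
             * so it is safe to touch 'struct sock' fields here
             */
            handle_event(sk);
    } else {
            /* a process context owns the sock: queue the event and let
             * release_sock() process it, as tcp_add_backlog() does
             */
            defer_to_owner(sk);
    }
    bh_unlock_sock(sk);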

Best regards,
Dust



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 0/5] net/smc: bugfixs for smc-r
  2023-10-12 13:43 ` [PATCH net 0/5] net/smc: bugfixs for smc-r Alexandra Winter
@ 2023-10-17  1:56   ` D. Wythe
  0 siblings, 0 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-17  1:56 UTC (permalink / raw)
  To: Alexandra Winter, kgraul, wenjia, jaka
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 10/12/23 9:43 PM, Alexandra Winter wrote:
> The subject of the thread says 'smc-r', but some of the changes affect smc-d alike,
> don't they?

Yes, sorry for this mistake; it should be bugfixes for smc.
>
>
> On 11.10.23 09:33, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> This patches contains bugfix following:
>>
>> 1. hung state
>> 2. sock leak
>> 3. potential panic
>>
> I may be helpful for the reviewers, when you point out, which patch fixes which problem.
>
> Were they all found by code reviews?
> Or did some occur in real life? If so, then what were the symptoms?
> A description of the symptoms is helpful for somebody who is debugging and wants to check
> whether the issue was already fixed upstream.

Hi Alexandra,

Except for the issue with the barrier, which was feedback from the
review, all other issues have actually occurred in our environment
and have been verified through internal testing. However, most of
these issues are caused by reference leakage rather than a panic, so
it is difficult to provide a representative symptom. But what you said
is indeed necessary, so I will post some symptoms in the next version,
such as

lsmod | grep smc
or
smcss -a

With those, one can see the leftover references or leftover
connections. Hope it can be helpful to you.

Thanks,
D. Wythe



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
  2023-10-13 12:27             ` Dust Li
@ 2023-10-17  2:00               ` D. Wythe
  2023-10-17  8:39                 ` Dust Li
  2023-10-17 17:03                 ` Wenjia Zhang
  0 siblings, 2 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-17  2:00 UTC (permalink / raw)
  To: dust.li, Wenjia Zhang, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 10/13/23 8:27 PM, Dust Li wrote:
> On Fri, Oct 13, 2023 at 01:52:09PM +0200, Wenjia Zhang wrote:
>>
>> On 13.10.23 07:32, Dust Li wrote:
>>> On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>>>>
>>>> On 12.10.23 04:37, D. Wythe wrote:
>>>>>
>>>>> On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>>>>>>
>>>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>>>
>>>>>>> Considering scenario:
>>>>>>>
>>>>>>>                   smc_cdc_rx_handler_rwwi
>>>>>>> __smc_release
>>>>>>>                   sock_set_flag
>>>>>>> smc_close_active()
>>>>>>> sock_set_flag
>>>>>>>
>>>>>>> __set_bit(DEAD)            __set_bit(DONE)
>>>>>>>
>>>>>>> Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>>>>>>> if the DEAD flag lost, the state SMC_CLOSED  will be never be reached
>>>>>>> in smc_close_passive_work:
>>>>>>>
>>>>>>> if (sock_flag(sk, SOCK_DEAD) &&
>>>>>>>       smc_close_sent_any_close(conn)) {
>>>>>>>       sk->sk_state = SMC_CLOSED;
>>>>>>> } else {
>>>>>>>       /* just shutdown, but not yet closed locally */
>>>>>>>       sk->sk_state = SMC_APPFINCLOSEWAIT;
>>>>>>> }
>>>>>>>
>>>>>>> Replace sock_set_flags or __set_bit to set_bit will fix this problem.
>>>>>>> Since set_bit is atomic.
>>>>>>>
>>>>>> I didn't really understand the scenario. What is
>>>>>> smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
>>>>>> during the runtime?
>>>>>>
>>>>> Hi Wenjia,
>>>>>
>>>>> Sorry for that, It is not smc_cdc_rx_handler_rwwi() but
>>>>> smc_cdc_rx_handler();
>>>>>
>>>>> Following is a more specific description of the issues
>>>>>
>>>>>
>>>>> lock_sock()
>>>>> __smc_release
>>>>>
>>>>> smc_cdc_rx_handler()
>>>>> smc_cdc_msg_recv()
>>>>> bh_lock_sock()
>>>>> smc_cdc_msg_recv_action()
>>>>> sock_set_flag(DONE) sock_set_flag(DEAD)
>>>>> __set_bit __set_bit
>>>>> bh_unlock_sock()
>>>>> release_sock()
>>>>>
>>>>>
>>>>>
>>>>> Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive. They are
>>>>> actually used for different purposes and contexts.
>>>>>
>>>>>
>>>> ok, that's true that |bh_lock_sock|and |lock_sock|are not really mutually
>>>> exclusive. However, since bh_lock_sock() is used, this scenario you described
>>>> above should not happen, because that gets the sk_lock.slock. Following this
>>>> scenarios, IMO, only the following situation can happen.
>>>>
>>>> lock_sock()
>>>> __smc_release
>>>>
>>>> smc_cdc_rx_handler()
>>>> smc_cdc_msg_recv()
>>>> bh_lock_sock()
>>>> smc_cdc_msg_recv_action()
>>>> sock_set_flag(DONE)
>>>> bh_unlock_sock()
>>>> sock_set_flag(DEAD)
>>>> release_sock()
>>> Hi wenjia,
>>>
>>> I think I know what D. Wythe means now, and I think he is right on this.
>>>
>>> IIUC, in process context, lock_sock() won't respect bh_lock_sock() if it
>>> acquires the lock before bh_lock_sock(). This is how the sock lock works.
>>>
>>>       PROCESS CONTEXT                                 INTERRUPT CONTEXT
>>> ------------------------------------------------------------------------
>>> lock_sock()
>>>       spin_lock_bh(&sk->sk_lock.slock);
>>>       ...
>>>       sk->sk_lock.owned = 1;
>>>       // here the spinlock is released
>>>       spin_unlock_bh(&sk->sk_lock.slock);
>>> __smc_release()
>>>                                                      bh_lock_sock(&smc->sk);
>>>                                                      smc_cdc_msg_recv_action(smc, cdc);
>>>                                                          sock_set_flag(&smc->sk, SOCK_DONE);
>>>                                                      bh_unlock_sock(&smc->sk);
>>>
>>>       sock_set_flag(DEAD)  <-- Can be before or after sock_set_flag(DONE)
>>> release_sock()
>>>
>>> The bh_lock_sock() only spins on sk->sk_lock.slock, which is already released
>>> after lock_sock() return. Therefor, there is actually no lock between
>>> the code after lock_sock() and before release_sock() with bh_lock_sock()...bh_unlock_sock().
>>> Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and might be
>>> before or after sock_set_flag(DONE).
>>>
>>>
>>> Actually, in TCP, the interrupt context will check sock_owned_by_user().
>>> If it returns true, the softirq just defer the process to backlog, and process
>>> that in release_sock(). Which avoid the race between softirq and process
>>> when visiting the 'struct sock'.
>>>
>>> tcp_v4_rcv()
>>>            bh_lock_sock_nested(sk);
>>>            tcp_segs_in(tcp_sk(sk), skb);
>>>            ret = 0;
>>>            if (!sock_owned_by_user(sk)) {
>>>                    ret = tcp_v4_do_rcv(sk, skb);
>>>            } else {
>>>                    if (tcp_add_backlog(sk, skb, &drop_reason))
>>>                            goto discard_and_relse;
>>>            }
>>>            bh_unlock_sock(sk);
>>>
>>>
>>> But in SMC we don't have a backlog, that means fields in 'struct sock'
>>> might all have race, and this sock_set_flag() is just one of the cases.
>>>
>>> Best regards,
>>> Dust
>>>
>> I agree on your description above.
>> Sure, the following case 1) can also happen
>>
>> case 1)
>> -------
>> lock_sock()
>> __smc_release
>>
>> sock_set_flag(DEAD)
>> bh_lock_sock()
>> smc_cdc_msg_recv_action()
>> sock_set_flag(DONE)
>> bh_unlock_sock()
>> release_sock()
>>
>> case 2)
>> -------
>> lock_sock()
>> __smc_release
>>
>> bh_lock_sock()
>> smc_cdc_msg_recv_action()
>> sock_set_flag(DONE) sock_set_flag(DEAD)
>> __set_bit __set_bit
>> bh_unlock_sock()
>> release_sock()
>>
>> My point here is that case2) can never happen. i.e that sock_set_flag(DONE)
>> and sock_set_flag(DEAD) can not happen concurrently. Thus, how could
>> the atomic set help make sure that the Dead flag would not be overwritten
>> with DONE?
> I agree with you on this. I also don't see using atomic can
> solve the problem of overwriting the DEAD flag with DONE.
>
> I think we need some mechanisms to ensure that sk_flags and other
> struct sock related fields are not modified simultaneously.
>
> Best regards,
> Dust

It seems that everyone agrees that case 2 is impossible. I'm a bit
confused as to why sock_set_flag(DONE) and sock_set_flag(DEAD) cannot
happen concurrently. What mechanism prevents their parallel execution?

Best wishes,
D. Wythe

>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
  2023-10-17  2:00               ` D. Wythe
@ 2023-10-17  8:39                 ` Dust Li
  2023-10-17 17:03                 ` Wenjia Zhang
  1 sibling, 0 replies; 38+ messages in thread
From: Dust Li @ 2023-10-17  8:39 UTC (permalink / raw)
  To: D. Wythe, Wenjia Zhang, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma

On Tue, Oct 17, 2023 at 10:00:28AM +0800, D. Wythe wrote:
>
>
>On 10/13/23 8:27 PM, Dust Li wrote:
>> On Fri, Oct 13, 2023 at 01:52:09PM +0200, Wenjia Zhang wrote:
>> > 
>> > On 13.10.23 07:32, Dust Li wrote:
>> > > On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>> > > > 
>> > > > On 12.10.23 04:37, D. Wythe wrote:
>> > > > > 
>> > > > > On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>> > > > > > 
>> > > > > > On 11.10.23 09:33, D. Wythe wrote:
>> > > > > > > From: "D. Wythe" <alibuda@linux.alibaba.com>
>> > > > > > > 
>> > > > > > > Considering scenario:
>> > > > > > > 
>> > > > > > >                   smc_cdc_rx_handler_rwwi
>> > > > > > > __smc_release
>> > > > > > >                   sock_set_flag
>> > > > > > > smc_close_active()
>> > > > > > > sock_set_flag
>> > > > > > > 
>> > > > > > > __set_bit(DEAD)            __set_bit(DONE)
>> > > > > > > 
>> > > > > > > Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>> > > > > > > if the DEAD flag lost, the state SMC_CLOSED  will be never be reached
>> > > > > > > in smc_close_passive_work:
>> > > > > > > 
>> > > > > > > if (sock_flag(sk, SOCK_DEAD) &&
>> > > > > > >       smc_close_sent_any_close(conn)) {
>> > > > > > >       sk->sk_state = SMC_CLOSED;
>> > > > > > > } else {
>> > > > > > >       /* just shutdown, but not yet closed locally */
>> > > > > > >       sk->sk_state = SMC_APPFINCLOSEWAIT;
>> > > > > > > }
>> > > > > > > 
>> > > > > > > Replace sock_set_flags or __set_bit to set_bit will fix this problem.
>> > > > > > > Since set_bit is atomic.
>> > > > > > > 
>> > > > > > I didn't really understand the scenario. What is
>> > > > > > smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
>> > > > > > during the runtime?
>> > > > > > 
>> > > > > Hi Wenjia,
>> > > > > 
>> > > > > Sorry for that, It is not smc_cdc_rx_handler_rwwi() but
>> > > > > smc_cdc_rx_handler();
>> > > > > 
>> > > > > Following is a more specific description of the issues
>> > > > > 
>> > > > > 
>> > > > > lock_sock()
>> > > > > __smc_release
>> > > > > 
>> > > > > smc_cdc_rx_handler()
>> > > > > smc_cdc_msg_recv()
>> > > > > bh_lock_sock()
>> > > > > smc_cdc_msg_recv_action()
>> > > > > sock_set_flag(DONE) sock_set_flag(DEAD)
>> > > > > __set_bit __set_bit
>> > > > > bh_unlock_sock()
>> > > > > release_sock()
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive. They are
>> > > > > actually used for different purposes and contexts.
>> > > > > 
>> > > > > 
>> > > > ok, that's true that |bh_lock_sock|and |lock_sock|are not really mutually
>> > > > exclusive. However, since bh_lock_sock() is used, this scenario you described
>> > > > above should not happen, because that gets the sk_lock.slock. Following this
>> > > > scenarios, IMO, only the following situation can happen.
>> > > > 
>> > > > lock_sock()
>> > > > __smc_release
>> > > > 
>> > > > smc_cdc_rx_handler()
>> > > > smc_cdc_msg_recv()
>> > > > bh_lock_sock()
>> > > > smc_cdc_msg_recv_action()
>> > > > sock_set_flag(DONE)
>> > > > bh_unlock_sock()
>> > > > sock_set_flag(DEAD)
>> > > > release_sock()
>> > > Hi wenjia,
>> > > 
>> > > I think I know what D. Wythe means now, and I think he is right on this.
>> > > 
>> > > IIUC, in process context, lock_sock() won't respect bh_lock_sock() if it
>> > > acquires the lock before bh_lock_sock(). This is how the sock lock works.
>> > > 
>> > >       PROCESS CONTEXT                                 INTERRUPT CONTEXT
>> > > ------------------------------------------------------------------------
>> > > lock_sock()
>> > >       spin_lock_bh(&sk->sk_lock.slock);
>> > >       ...
>> > >       sk->sk_lock.owned = 1;
>> > >       // here the spinlock is released
>> > >       spin_unlock_bh(&sk->sk_lock.slock);
>> > > __smc_release()
>> > >                                                      bh_lock_sock(&smc->sk);
>> > >                                                      smc_cdc_msg_recv_action(smc, cdc);
>> > >                                                          sock_set_flag(&smc->sk, SOCK_DONE);
>> > >                                                      bh_unlock_sock(&smc->sk);
>> > > 
>> > >       sock_set_flag(DEAD)  <-- Can be before or after sock_set_flag(DONE)
>> > > release_sock()
>> > > 
>> > > The bh_lock_sock() only spins on sk->sk_lock.slock, which is already released
>> > > after lock_sock() return. Therefor, there is actually no lock between
>> > > the code after lock_sock() and before release_sock() with bh_lock_sock()...bh_unlock_sock().
>> > > Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and might be
>> > > before or after sock_set_flag(DONE).
>> > > 
>> > > 
>> > > Actually, in TCP, the interrupt context will check sock_owned_by_user().
>> > > If it returns true, the softirq just defer the process to backlog, and process
>> > > that in release_sock(). Which avoid the race between softirq and process
>> > > when visiting the 'struct sock'.
>> > > 
>> > > tcp_v4_rcv()
>> > >            bh_lock_sock_nested(sk);
>> > >            tcp_segs_in(tcp_sk(sk), skb);
>> > >            ret = 0;
>> > >            if (!sock_owned_by_user(sk)) {
>> > >                    ret = tcp_v4_do_rcv(sk, skb);
>> > >            } else {
>> > >                    if (tcp_add_backlog(sk, skb, &drop_reason))
>> > >                            goto discard_and_relse;
>> > >            }
>> > >            bh_unlock_sock(sk);
>> > > 
>> > > 
>> > > But in SMC we don't have a backlog, that means fields in 'struct sock'
>> > > might all have race, and this sock_set_flag() is just one of the cases.
>> > > 
>> > > Best regards,
>> > > Dust
>> > > 
>> > I agree on your description above.
>> > Sure, the following case 1) can also happen
>> > 
>> > case 1)
>> > -------
>> > lock_sock()
>> > __smc_release
>> > 
>> > sock_set_flag(DEAD)
>> > bh_lock_sock()
>> > smc_cdc_msg_recv_action()
>> > sock_set_flag(DONE)
>> > bh_unlock_sock()
>> > release_sock()
>> > 
>> > case 2)
>> > -------
>> > lock_sock()
>> > __smc_release
>> > 
>> > bh_lock_sock()
>> > smc_cdc_msg_recv_action()
>> > sock_set_flag(DONE) sock_set_flag(DEAD)
>> > __set_bit __set_bit
>> > bh_unlock_sock()
>> > release_sock()
>> > 
>> > My point here is that case2) can never happen. i.e that sock_set_flag(DONE)
>> > and sock_set_flag(DEAD) can not happen concurrently. Thus, how could
>> > the atomic set help make sure that the Dead flag would not be overwritten
>> > with DONE?
>> I agree with you on this. I also don't see using atomic can
>> solve the problem of overwriting the DEAD flag with DONE.
>> 
>> I think we need some mechanisms to ensure that sk_flags and other
>> struct sock related fields are not modified simultaneously.
>> 
>> Best regards,
>> Dust
>
>It seems that everyone has agrees on that case 2 is impossible. I'm a bit
>confused, why that
>sock_set_flag(DONE) and sock_set_flag(DEAD) can not happen concurrently. What
>mechanism
>prevents their parallel execution?

Upon reviewing the code again, I realize that my previous understanding
was incorrect. I mistakenly believed that the DEAD and DONE flags would
overwrite each other, without realizing that sk_flags is actually used
as a bitmap.

So, I think you are right: using the atomic set_bit() will ensure that
the DEAD flag is always set.
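
Illustrative sketch: with the patch's smc_sock_set_flag() helper (quoted
above), both writers do an atomic read-modify-write on the same sk_flags
word, so no interleaving can drop a bit:

    smc_sock_set_flag(&smc->sk, SOCK_DONE);  /* softirq: atomic set_bit */
    smc_sock_set_flag(sk, SOCK_DEAD);        /* process: atomic set_bit */
    /* either ordering leaves both SOCK_DONE and SOCK_DEAD set */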

Best regards,
Dust


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
  2023-10-17  2:00               ` D. Wythe
  2023-10-17  8:39                 ` Dust Li
@ 2023-10-17 17:03                 ` Wenjia Zhang
       [not found]                   ` <4065e94f-f7ea-7943-e2cc-0c7d3f9c788b@linux.alibaba.com>
  1 sibling, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-17 17:03 UTC (permalink / raw)
  To: D. Wythe, dust.li, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 17.10.23 04:00, D. Wythe wrote:
> 
> 
> On 10/13/23 8:27 PM, Dust Li wrote:
>> On Fri, Oct 13, 2023 at 01:52:09PM +0200, Wenjia Zhang wrote:
>>>
>>> On 13.10.23 07:32, Dust Li wrote:
>>>> On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>>>>>
>>>>> On 12.10.23 04:37, D. Wythe wrote:
>>>>>>
>>>>>> On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>>>>>>>
>>>>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>>>>
>>>>>>>> Considering scenario:
>>>>>>>>
>>>>>>>>                   smc_cdc_rx_handler_rwwi
>>>>>>>> __smc_release
>>>>>>>>                   sock_set_flag
>>>>>>>> smc_close_active()
>>>>>>>> sock_set_flag
>>>>>>>>
>>>>>>>> __set_bit(DEAD)            __set_bit(DONE)
>>>>>>>>
>>>>>>>> Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>>>>>>>> if the DEAD flag lost, the state SMC_CLOSED  will be never be 
>>>>>>>> reached
>>>>>>>> in smc_close_passive_work:
>>>>>>>>
>>>>>>>> if (sock_flag(sk, SOCK_DEAD) &&
>>>>>>>>       smc_close_sent_any_close(conn)) {
>>>>>>>>       sk->sk_state = SMC_CLOSED;
>>>>>>>> } else {
>>>>>>>>       /* just shutdown, but not yet closed locally */
>>>>>>>>       sk->sk_state = SMC_APPFINCLOSEWAIT;
>>>>>>>> }
>>>>>>>>
>>>>>>>> Replace sock_set_flags or __set_bit to set_bit will fix this 
>>>>>>>> problem.
>>>>>>>> Since set_bit is atomic.
>>>>>>>>
>>>>>>> I didn't really understand the scenario. What is
>>>>>>> smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
>>>>>>> during the runtime?
>>>>>>>
>>>>>> Hi Wenjia,
>>>>>>
>>>>>> Sorry for that, It is not smc_cdc_rx_handler_rwwi() but
>>>>>> smc_cdc_rx_handler();
>>>>>>
>>>>>> Following is a more specific description of the issues
>>>>>>
>>>>>>
>>>>>> lock_sock()
>>>>>> __smc_release
>>>>>>
>>>>>> smc_cdc_rx_handler()
>>>>>> smc_cdc_msg_recv()
>>>>>> bh_lock_sock()
>>>>>> smc_cdc_msg_recv_action()
>>>>>> sock_set_flag(DONE) sock_set_flag(DEAD)
>>>>>> __set_bit __set_bit
>>>>>> bh_unlock_sock()
>>>>>> release_sock()
>>>>>>
>>>>>>
>>>>>>
>>>>>> Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive. 
>>>>>> They are
>>>>>> actually used for different purposes and contexts.
>>>>>>
>>>>>>
>>>>> ok, that's true that |bh_lock_sock|and |lock_sock|are not really 
>>>>> mutually
>>>>> exclusive. However, since bh_lock_sock() is used, this scenario you 
>>>>> described
>>>>> above should not happen, because that gets the sk_lock.slock. 
>>>>> Following this
>>>>> scenarios, IMO, only the following situation can happen.
>>>>>
>>>>> lock_sock()
>>>>> __smc_release
>>>>>
>>>>> smc_cdc_rx_handler()
>>>>> smc_cdc_msg_recv()
>>>>> bh_lock_sock()
>>>>> smc_cdc_msg_recv_action()
>>>>> sock_set_flag(DONE)
>>>>> bh_unlock_sock()
>>>>> sock_set_flag(DEAD)
>>>>> release_sock()
>>>> Hi wenjia,
>>>>
>>>> I think I know what D. Wythe means now, and I think he is right on 
>>>> this.
>>>>
>>>> IIUC, in process context, lock_sock() won't respect bh_lock_sock() 
>>>> if it
>>>> acquires the lock before bh_lock_sock(). This is how the sock lock 
>>>> works.
>>>>
>>>>       PROCESS CONTEXT                                 INTERRUPT CONTEXT
>>>> ------------------------------------------------------------------------
>>>> lock_sock()
>>>>       spin_lock_bh(&sk->sk_lock.slock);
>>>>       ...
>>>>       sk->sk_lock.owned = 1;
>>>>       // here the spinlock is released
>>>>       spin_unlock_bh(&sk->sk_lock.slock);
>>>> __smc_release()
>>>>                                                      
>>>> bh_lock_sock(&smc->sk);
>>>>                                                      
>>>> smc_cdc_msg_recv_action(smc, cdc);
>>>>                                                          
>>>> sock_set_flag(&smc->sk, SOCK_DONE);
>>>>                                                      
>>>> bh_unlock_sock(&smc->sk);
>>>>
>>>>       sock_set_flag(DEAD)  <-- Can be before or after 
>>>> sock_set_flag(DONE)
>>>> release_sock()
>>>>
>>>> The bh_lock_sock() only spins on sk->sk_lock.slock, which is already 
>>>> released
>>>> after lock_sock() return. Therefor, there is actually no lock between
>>>> the code after lock_sock() and before release_sock() with 
>>>> bh_lock_sock()...bh_unlock_sock().
>>>> Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and 
>>>> might be
>>>> before or after sock_set_flag(DONE).
>>>>
>>>>
>>>> Actually, in TCP, the interrupt context will check 
>>>> sock_owned_by_user().
>>>> If it returns true, the softirq just defer the process to backlog, 
>>>> and process
>>>> that in release_sock(). Which avoid the race between softirq and 
>>>> process
>>>> when visiting the 'struct sock'.
>>>>
>>>> tcp_v4_rcv()
>>>>            bh_lock_sock_nested(sk);
>>>>            tcp_segs_in(tcp_sk(sk), skb);
>>>>            ret = 0;
>>>>            if (!sock_owned_by_user(sk)) {
>>>>                    ret = tcp_v4_do_rcv(sk, skb);
>>>>            } else {
>>>>                    if (tcp_add_backlog(sk, skb, &drop_reason))
>>>>                            goto discard_and_relse;
>>>>            }
>>>>            bh_unlock_sock(sk);
>>>>
>>>>
>>>> But in SMC we don't have a backlog, that means fields in 'struct sock'
>>>> might all have race, and this sock_set_flag() is just one of the cases.
>>>>
>>>> Best regards,
>>>> Dust
>>>>
>>> I agree on your description above.
>>> Sure, the following case 1) can also happen
>>>
>>> case 1)
>>> -------
>>> lock_sock()
>>> __smc_release
>>>
>>> sock_set_flag(DEAD)
>>> bh_lock_sock()
>>> smc_cdc_msg_recv_action()
>>> sock_set_flag(DONE)
>>> bh_unlock_sock()
>>> release_sock()
>>>
>>> case 2)
>>> -------
>>> lock_sock()
>>> __smc_release
>>>
>>> bh_lock_sock()
>>> smc_cdc_msg_recv_action()
>>> sock_set_flag(DONE) sock_set_flag(DEAD)
>>> __set_bit __set_bit
>>> bh_unlock_sock()
>>> release_sock()
>>>
>>> My point here is that case2) can never happen. i.e that 
>>> sock_set_flag(DONE)
>>> and sock_set_flag(DEAD) can not happen concurrently. Thus, how could
>>> the atomic set help make sure that the Dead flag would not be 
>>> overwritten
>>> with DONE?
>> I agree with you on this. I also don't see using atomic can
>> solve the problem of overwriting the DEAD flag with DONE.
>>
>> I think we need some mechanisms to ensure that sk_flags and other
>> struct sock related fields are not modified simultaneously.
>>
>> Best regards,
>> Dust
> 
> It seems that everyone has agrees on that case 2 is impossible. I'm a 
> bit confused, why that
> sock_set_flag(DONE) and sock_set_flag(DEAD) can not happen concurrently. 
> What mechanism
> prevents their parallel execution?
> 
> Best wishes,
> D. Wythe
> 
>>
> 
In smc_cdc_rx_handler(), if bh_lock_sock() is held, how could the
sock_set_flag(DEAD) in __smc_release() modify the flag concurrently?
As I said, that could just be a lapse in my thinking, but I still
want to get it clarified.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 5/5] net/smc: put sk reference if close work was canceled
       [not found]     ` <ee641ca5-104b-d1ec-5b2a-e20237c5378a@linux.alibaba.com>
@ 2023-10-18 20:26       ` Wenjia Zhang
  2023-10-19  7:33         ` D. Wythe
  0 siblings, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-18 20:26 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 17.10.23 04:06, D. Wythe wrote:
> 
> 
> On 10/13/23 3:04 AM, Wenjia Zhang wrote:
>>
>>
>> On 11.10.23 09:33, D. Wythe wrote:
>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>
>>> Note that we always hold a reference to sock when attempting
>>> to submit close_work. 
>> yes
>> Therefore, if we have successfully
>>> canceled close_work from pending, we MUST release that reference
>>> to avoid potential leaks.
>>>
>> Isn't the corresponding reference already released inside the 
>> smc_close_passive_work()?
>>
> 
> Hi Wenjia,
> 
> If we successfully cancel the close work from the pending state,
> it means that smc_close_passive_work() has never been executed.
> 
> You can find more details here.
> 
> /**
> * cancel_work_sync - cancel a work and wait for it to finish
> * @work:the work to cancel
> *
> * Cancel @work and wait for its execution to finish. This function
> * can be used even if the work re-queues itself or migrates to
> * another workqueue. On return from this function, @work is
> * guaranteed to be not pending or executing on any CPU.
> *
> * cancel_work_sync(&delayed_work->work) must not be used for
> * delayed_work's. Use cancel_delayed_work_sync() instead.
> *
> * The caller must ensure that the workqueue on which @work was last
> * queued can't be destroyed before this function returns.
> *
> * Return:
> * %true if @work was pending, %false otherwise.
> */
> bool cancel_work_sync(struct work_struct *work)
> {
> 	return __cancel_work_timer(work, false);
> }
> 
> Best wishes,
> D. Wythe
As I understand it, queue_work() would wake up the work if the work is
not already on the queue. And the sock_hold() is just prior to the
queue_work(). That means cancel_work_sync() would cancel the work
either before its execution or after. If your fix refers to the former
case, at this moment I don't think the reference can be held, thus it
is unnecessary to put it.
> 
>>> Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD link 
>>> groups")
>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>> ---
>>>   net/smc/smc_close.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>>> index 449ef45..10219f5 100644
>>> --- a/net/smc/smc_close.c
>>> +++ b/net/smc/smc_close.c
>>> @@ -116,7 +116,8 @@ static void smc_close_cancel_work(struct smc_sock 
>>> *smc)
>>>       struct sock *sk = &smc->sk;
>>>         release_sock(sk);
>>> -    cancel_work_sync(&smc->conn.close_work);
>>> +    if (cancel_work_sync(&smc->conn.close_work))
>>> +        sock_put(sk);
>>>       cancel_delayed_work_sync(&smc->conn.tx_work);
>>>       lock_sock(sk);
>>>   }
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 5/5] net/smc: put sk reference if close work was canceled
  2023-10-18 20:26       ` Wenjia Zhang
@ 2023-10-19  7:33         ` D. Wythe
  2023-10-19 17:40           ` Wenjia Zhang
  0 siblings, 1 reply; 38+ messages in thread
From: D. Wythe @ 2023-10-19  7:33 UTC (permalink / raw)
  To: Wenjia Zhang, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 10/19/23 4:26 AM, Wenjia Zhang wrote:
>
>
> On 17.10.23 04:06, D. Wythe wrote:
>>
>>
>> On 10/13/23 3:04 AM, Wenjia Zhang wrote:
>>>
>>>
>>> On 11.10.23 09:33, D. Wythe wrote:
>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>
>>>> Note that we always hold a reference to sock when attempting
>>>> to submit close_work. 
>>> yes
>>> Therefore, if we have successfully
>>>> canceled close_work from pending, we MUST release that reference
>>>> to avoid potential leaks.
>>>>
>>> Isn't the corresponding reference already released inside the 
>>> smc_close_passive_work()?
>>>
>>
>> Hi Wenjia,
>>
>> If we successfully cancel the close work from the pending state,
>> it means that smc_close_passive_work() has never been executed.
>>
>> You can find more details here.
>>
>> /**
>> * cancel_work_sync - cancel a work and wait for it to finish
>> * @work:the work to cancel
>> *
>> * Cancel @work and wait for its execution to finish. This function
>> * can be used even if the work re-queues itself or migrates to
>> * another workqueue. On return from this function, @work is
>> * guaranteed to be not pending or executing on any CPU.
>> *
>> * cancel_work_sync(&delayed_work->work) must not be used for
>> * delayed_work's. Use cancel_delayed_work_sync() instead.
>> *
>> * The caller must ensure that the workqueue on which @work was last
>> * queued can't be destroyed before this function returns.
>> *
>> * Return:
>> * %true if @work was pending, %false otherwise.
>> */
>> bool cancel_work_sync(struct work_struct *work)
>> {
>> 	return __cancel_work_timer(work, false);
>> }
>>
>> Best wishes,
>> D. Wythe
> As I understand, queue_work() would wake up the work if the work is 
> not already on the queue. And the sock_hold() is just prio to the 
> queue_work(). That means, cancel_work_sync() would cancel the work 
> either before its execution or after. If your fix refers to the former 
> case, at this moment, I don't think the reference can be hold, thus it 
> is unnecessary to put it.
>>

I am quite confused about why you think that, when we cancel the work
before its execution, the reference cannot be held?


Perhaps the following diagram can describe the problem in a better way:

smc_close_cancel_work                   smc_cdc_msg_recv_action

                                        sock_hold()
                                        queue_work()

if (cancel_work_sync())   // successfully canceled before execution
    sock_put()            // need to put it, since we already held
                          // a ref before queue_work()
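
For what it's worth, a minimal sketch of the intended reference pairing,
using the names from the patch (control flow heavily simplified):

    /* softirq side: take a ref for close_work before queueing it */
    sock_hold(&smc->sk);              /* sock_put in close_work */
    if (!queue_work(smc_close_wq, &conn->close_work))
            sock_put(&smc->sk);       /* already queued, drop our ref */

    /* that ref is then dropped on exactly one of two paths: */

    /* 1) the work runs and puts the ref itself at the end of
     *    smc_close_passive_work()
     */

    /* 2) the work is canceled while still pending, so it never runs;
     *    the ref taken at queue time must be put here or it leaks
     */
    if (cancel_work_sync(&smc->conn.close_work))
            sock_put(sk);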


>>>> Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD link 
>>>> groups")
>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>> ---
>>>>   net/smc/smc_close.c | 3 ++-
>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>>>> index 449ef45..10219f5 100644
>>>> --- a/net/smc/smc_close.c
>>>> +++ b/net/smc/smc_close.c
>>>> @@ -116,7 +116,8 @@ static void smc_close_cancel_work(struct 
>>>> smc_sock *smc)
>>>>       struct sock *sk = &smc->sk;
>>>>         release_sock(sk);
>>>> -    cancel_work_sync(&smc->conn.close_work);
>>>> +    if (cancel_work_sync(&smc->conn.close_work))
>>>> +        sock_put(sk);
>>>>       cancel_delayed_work_sync(&smc->conn.tx_work);
>>>>       lock_sock(sk);
>>>>   }
>>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
       [not found]                   ` <4065e94f-f7ea-7943-e2cc-0c7d3f9c788b@linux.alibaba.com>
@ 2023-10-19 11:54                     ` Wenjia Zhang
  0 siblings, 0 replies; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-19 11:54 UTC (permalink / raw)
  To: D. Wythe, dust.li, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 19.10.23 10:09, D. Wythe wrote:
> 
> 
> On 10/18/23 1:03 AM, Wenjia Zhang wrote:
>>
>>
>> On 17.10.23 04:00, D. Wythe wrote:
>>>
>>>
>>> On 10/13/23 8:27 PM, Dust Li wrote:
>>>> On Fri, Oct 13, 2023 at 01:52:09PM +0200, Wenjia Zhang wrote:
>>>>>
>>>>> On 13.10.23 07:32, Dust Li wrote:
>>>>>> On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>>>>>>>
>>>>>>> On 12.10.23 04:37, D. Wythe wrote:
>>>>>>>>
>>>>>>>> On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>>>>>>>>>
>>>>>>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>>>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>>>>>>
>>>>>>>>>> Considering scenario:
>>>>>>>>>>
>>>>>>>>>>                   smc_cdc_rx_handler_rwwi
>>>>>>>>>> __smc_release
>>>>>>>>>>                   sock_set_flag
>>>>>>>>>> smc_close_active()
>>>>>>>>>> sock_set_flag
>>>>>>>>>>
>>>>>>>>>> __set_bit(DEAD)            __set_bit(DONE)
>>>>>>>>>>
>>>>>>>>>> Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>>>>>>>>>> if the DEAD flag lost, the state SMC_CLOSED  will be never be 
>>>>>>>>>> reached
>>>>>>>>>> in smc_close_passive_work:
>>>>>>>>>>
>>>>>>>>>> if (sock_flag(sk, SOCK_DEAD) &&
>>>>>>>>>>       smc_close_sent_any_close(conn)) {
>>>>>>>>>>       sk->sk_state = SMC_CLOSED;
>>>>>>>>>> } else {
>>>>>>>>>>       /* just shutdown, but not yet closed locally */
>>>>>>>>>>       sk->sk_state = SMC_APPFINCLOSEWAIT;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Replace sock_set_flags or __set_bit to set_bit will fix this 
>>>>>>>>>> problem.
>>>>>>>>>> Since set_bit is atomic.
>>>>>>>>>>
>>>>>>>>> I didn't really understand the scenario. What is
>>>>>>>>> smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
>>>>>>>>> during the runtime?
>>>>>>>>>
>>>>>>>> Hi Wenjia,
>>>>>>>>
>>>>>>>> Sorry for that, It is not smc_cdc_rx_handler_rwwi() but
>>>>>>>> smc_cdc_rx_handler();
>>>>>>>>
>>>>>>>> Following is a more specific description of the issues
>>>>>>>>
>>>>>>>>
>>>>>>>> lock_sock()
>>>>>>>> __smc_release
>>>>>>>>
>>>>>>>> smc_cdc_rx_handler()
>>>>>>>> smc_cdc_msg_recv()
>>>>>>>> bh_lock_sock()
>>>>>>>> smc_cdc_msg_recv_action()
>>>>>>>> sock_set_flag(DONE) sock_set_flag(DEAD)
>>>>>>>> __set_bit __set_bit
>>>>>>>> bh_unlock_sock()
>>>>>>>> release_sock()
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive. 
>>>>>>>> They are
>>>>>>>> actually used for different purposes and contexts.
>>>>>>>>
>>>>>>>>
>>>>>>> ok, that's true that |bh_lock_sock|and |lock_sock|are not really 
>>>>>>> mutually
>>>>>>> exclusive. However, since bh_lock_sock() is used, this scenario 
>>>>>>> you described
>>>>>>> above should not happen, because that gets the sk_lock.slock. 
>>>>>>> Following this
>>>>>>> scenarios, IMO, only the following situation can happen.
>>>>>>>
>>>>>>> lock_sock()
>>>>>>> __smc_release
>>>>>>>
>>>>>>> smc_cdc_rx_handler()
>>>>>>> smc_cdc_msg_recv()
>>>>>>> bh_lock_sock()
>>>>>>> smc_cdc_msg_recv_action()
>>>>>>> sock_set_flag(DONE)
>>>>>>> bh_unlock_sock()
>>>>>>> sock_set_flag(DEAD)
>>>>>>> release_sock()
>>>>>> Hi wenjia,
>>>>>>
>>>>>> I think I know what D. Wythe means now, and I think he is right on 
>>>>>> this.
>>>>>>
>>>>>> IIUC, in process context, lock_sock() won't respect bh_lock_sock() 
>>>>>> if it
>>>>>> acquires the lock before bh_lock_sock(). This is how the sock lock 
>>>>>> works.
>>>>>>
>>>>>>       PROCESS CONTEXT INTERRUPT CONTEXT
>>>>>> ------------------------------------------------------------------------
>>>>>> lock_sock()
>>>>>>       spin_lock_bh(&sk->sk_lock.slock);
>>>>>>       ...
>>>>>>       sk->sk_lock.owned = 1;
>>>>>>       // here the spinlock is released
>>>>>>       spin_unlock_bh(&sk->sk_lock.slock);
>>>>>> __smc_release()
>>>>>> bh_lock_sock(&smc->sk);
>>>>>> smc_cdc_msg_recv_action(smc, cdc);
>>>>>> sock_set_flag(&smc->sk, SOCK_DONE);
>>>>>> bh_unlock_sock(&smc->sk);
>>>>>>
>>>>>>       sock_set_flag(DEAD)  <-- Can be before or after 
>>>>>> sock_set_flag(DONE)
>>>>>> release_sock()
>>>>>>
>>>>>> The bh_lock_sock() only spins on sk->sk_lock.slock, which is 
>>>>>> already released
>>>>>> after lock_sock() return. Therefor, there is actually no lock between
>>>>>> the code after lock_sock() and before release_sock() with 
>>>>>> bh_lock_sock()...bh_unlock_sock().
>>>>>> Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and 
>>>>>> might be
>>>>>> before or after sock_set_flag(DONE).
>>>>>>
>>>>>>
>>>>>> Actually, in TCP, the interrupt context will check 
>>>>>> sock_owned_by_user().
>>>>>> If it returns true, the softirq just defer the process to backlog, 
>>>>>> and process
>>>>>> that in release_sock(). Which avoid the race between softirq and 
>>>>>> process
>>>>>> when visiting the 'struct sock'.
>>>>>>
>>>>>> tcp_v4_rcv()
>>>>>>            bh_lock_sock_nested(sk);
>>>>>>            tcp_segs_in(tcp_sk(sk), skb);
>>>>>>            ret = 0;
>>>>>>            if (!sock_owned_by_user(sk)) {
>>>>>>                    ret = tcp_v4_do_rcv(sk, skb);
>>>>>>            } else {
>>>>>>                    if (tcp_add_backlog(sk, skb, &drop_reason))
>>>>>>                            goto discard_and_relse;
>>>>>>            }
>>>>>>            bh_unlock_sock(sk);
>>>>>>
>>>>>>
>>>>>> But in SMC we don't have a backlog, that means fields in 'struct 
>>>>>> sock'
>>>>>> might all have race, and this sock_set_flag() is just one of the 
>>>>>> cases.
>>>>>>
>>>>>> Best regards,
>>>>>> Dust
>>>>>>
>>>>> I agree on your description above.
>>>>> Sure, the following case 1) can also happen
>>>>>
>>>>> case 1)
>>>>> -------
>>>>> lock_sock()
>>>>> __smc_release
>>>>>
>>>>> sock_set_flag(DEAD)
>>>>> bh_lock_sock()
>>>>> smc_cdc_msg_recv_action()
>>>>> sock_set_flag(DONE)
>>>>> bh_unlock_sock()
>>>>> release_sock()
>>>>>
>>>>> case 2)
>>>>> -------
>>>>> lock_sock()
>>>>> __smc_release
>>>>>
>>>>> bh_lock_sock()
>>>>> smc_cdc_msg_recv_action()
>>>>> sock_set_flag(DONE) sock_set_flag(DEAD)
>>>>> __set_bit __set_bit
>>>>> bh_unlock_sock()
>>>>> release_sock()
>>>>>
>>>>> My point here is that case2) can never happen. i.e that 
>>>>> sock_set_flag(DONE)
>>>>> and sock_set_flag(DEAD) can not happen concurrently. Thus, how could
>>>>> the atomic set help make sure that the Dead flag would not be 
>>>>> overwritten
>>>>> with DONE?
>>>> I agree with you on this. I also don't see using atomic can
>>>> solve the problem of overwriting the DEAD flag with DONE.
>>>>
>>>> I think we need some mechanisms to ensure that sk_flags and other
>>>> struct sock related fields are not modified simultaneously.
>>>>
>>>> Best regards,
>>>> Dust
>>>
>>> It seems that everyone has agrees on that case 2 is impossible. I'm a 
>>> bit confused, why that
>>> sock_set_flag(DONE) and sock_set_flag(DEAD) can not happen 
>>> concurrently. What mechanism
>>> prevents their parallel execution?
>>>
>>> Best wishes,
>>> D. Wythe
>>>
>>>>
>>>
>> In the smc_cdc_rx_handler(), if bh_lock_sock() is got, how could the 
>> sock_set_flag(DEAD) in the __smc_release() modify the flag 
>> concurrently? As I said, that could be just kind of lapse of my 
>> thought, but I still want to make it clarify.
> 
> #define bh_lock_sock(__sk) spin_lock(&((__sk)->sk_lock.slock))
> 
> static inline void lock_sock(struct sock *sk)
> {
>      lock_sock_nested(sk, 0);
> }
> 
> void lock_sock_nested(struct sock *sk, int subclass)
> {
>      might_sleep();
>      spin_lock_bh(&sk->sk_lock.slock);
>      if (sk->sk_lock.owned)
>          __lock_sock(sk);
>      sk->sk_lock.owned = 1;
> 
>      spin_unlock(&sk->sk_lock.slock);   <-- note: slock released here
>      /*
>       * The sk_lock has mutex_lock() semantics here:
>       */
>      mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
>      local_bh_enable();
> }
> 
> 
> It seems that you believe bh_lock_sock() will block the execution of 
> __smc_release(), indicating that you think the spin on slock will block 
> the execution of __smc_release().
> So, you assume that __smc_release() must also spin on slock, right?
> 
That is exactly what I mean.

> However, lock_sock() releases the slock before returning. You can see it 
> in code above. In other words, __smc_release() will not spin on slock.
> This means that bh_lock_sock() will not block the execution of 
> __smc_release().
> 
Do you mean that the spin_unlock you marked in the code above is to 
release the socket spin lock from __smc_release()?

> Hoping this will helps
> 
> Best wishes,
> D. Wythe
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 5/5] net/smc: put sk reference if close work was canceled
  2023-10-19  7:33         ` D. Wythe
@ 2023-10-19 17:40           ` Wenjia Zhang
  2023-10-20  2:41             ` D. Wythe
  0 siblings, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-19 17:40 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 19.10.23 09:33, D. Wythe wrote:
> 
> 
> On 10/19/23 4:26 AM, Wenjia Zhang wrote:
>>
>>
>> On 17.10.23 04:06, D. Wythe wrote:
>>>
>>>
>>> On 10/13/23 3:04 AM, Wenjia Zhang wrote:
>>>>
>>>>
>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>
>>>>> Note that we always hold a reference to sock when attempting
>>>>> to submit close_work. 
>>>> yes
>>>> Therefore, if we have successfully
>>>>> canceled close_work from pending, we MUST release that reference
>>>>> to avoid potential leaks.
>>>>>
>>>> Isn't the corresponding reference already released inside the 
>>>> smc_close_passive_work()?
>>>>
>>>
>>> Hi Wenjia,
>>>
>>> If we successfully cancel the close work from the pending state,
>>> it means that smc_close_passive_work() has never been executed.
>>>
>>> You can find more details here.
>>>
>>> /**
>>> * cancel_work_sync - cancel a work and wait for it to finish
>>> * @work: the work to cancel
>>> *
>>> * Cancel @work and wait for its execution to finish. This function
>>> * can be used even if the work re-queues itself or migrates to
>>> * another workqueue. On return from this function, @work is
>>> * guaranteed to be not pending or executing on any CPU.
>>> *
>>> * cancel_work_sync(&delayed_work->work) must not be used for
>>> * delayed_work's. Use cancel_delayed_work_sync() instead.
>>> *
>>> * The caller must ensure that the workqueue on which @work was last
>>> * queued can't be destroyed before this function returns.
>>> *
>>> * Return:
>>> * %true if @work was pending, %false otherwise.
>>> */
>>> bool cancel_work_sync(struct work_struct *work)
>>> {
>>> 	return __cancel_work_timer(work, false);
>>> }
>>>
>>> Best wishes,
>>> D. Wythe
>> As I understand, queue_work() would wake up the work if the work is
>> not already on the queue. And the sock_hold() is just prior to the
>> queue_work(). That means, cancel_work_sync() would cancel the work
>> either before its execution or after. If your fix refers to the former
>> case, at this moment, I don't think the reference can be held, thus it
>> is unnecessary to put it.
>>>
> 
> I am quite confused about why you think that when we cancel the work
> before its execution, the reference cannot be held?
> 
> 
> Perhaps the following diagram can describe the problem in a better way:
> 
> smc_close_cancel_work                smc_cdc_msg_recv_action
> 
>                                      sock_hold
>                                      queue_work
> if (cancel_work_sync())      // successfully canceled before execution
>     sock_put()               // need to put it since we already held
>                              // a ref before queue_work()
> 
> 
ha, I already thought you might ask such a question :P

I think two problems need to be clarified here:

1) Do you think the bh_lock_sock/bh_unlock_sock in smc_cdc_msg_recv
does not protect smc_cdc_msg_recv_action() from cancel_work_sync()?
Maybe that would go back to the discussion in the other patch on the
behavior of the locks.

2) If queue_work returns true, as I said in the last mail, the work
should be (or is being) executed. How could cancel_work_sync()
successfully cancel the work before execution?

>>>>> Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD link 
>>>>> groups")
>>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>>> ---
>>>>>   net/smc/smc_close.c | 3 ++-
>>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>>>>> index 449ef45..10219f5 100644
>>>>> --- a/net/smc/smc_close.c
>>>>> +++ b/net/smc/smc_close.c
>>>>> @@ -116,7 +116,8 @@ static void smc_close_cancel_work(struct 
>>>>> smc_sock *smc)
>>>>>       struct sock *sk = &smc->sk;
>>>>>         release_sock(sk);
>>>>> -    cancel_work_sync(&smc->conn.close_work);
>>>>> +    if (cancel_work_sync(&smc->conn.close_work))
>>>>> +        sock_put(sk);
>>>>>       cancel_delayed_work_sync(&smc->conn.tx_work);
>>>>>       lock_sock(sk);
>>>>>   }
>>>
> 
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 5/5] net/smc: put sk reference if close work was canceled
  2023-10-19 17:40           ` Wenjia Zhang
@ 2023-10-20  2:41             ` D. Wythe
  2023-10-23  8:19               ` Wenjia Zhang
  0 siblings, 1 reply; 38+ messages in thread
From: D. Wythe @ 2023-10-20  2:41 UTC (permalink / raw)
  To: Wenjia Zhang, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 10/20/23 1:40 AM, Wenjia Zhang wrote:
>
>
> On 19.10.23 09:33, D. Wythe wrote:
>>
>>
>> On 10/19/23 4:26 AM, Wenjia Zhang wrote:
>>>
>>>
>>> On 17.10.23 04:06, D. Wythe wrote:
>>>>
>>>>
>>>> On 10/13/23 3:04 AM, Wenjia Zhang wrote:
>>>>>
>>>>>
>>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>>
>>>>>> Note that we always hold a reference to sock when attempting
>>>>>> to submit close_work. 
>>>>> yes
>>>>> Therefore, if we have successfully
>>>>>> canceled close_work from pending, we MUST release that reference
>>>>>> to avoid potential leaks.
>>>>>>
>>>>> Isn't the corresponding reference already released inside the 
>>>>> smc_close_passive_work()?
>>>>>
>>>>
>>>> Hi Wenjia,
>>>>
>>>> If we successfully cancel the close work from the pending state,
>>>> it means that smc_close_passive_work() has never been executed.
>>>>
>>>> You can find more details here.
>>>>
>>>> /**
>>>> * cancel_work_sync - cancel a work and wait for it to finish
>>>> * @work: the work to cancel
>>>> *
>>>> * Cancel @work and wait for its execution to finish. This function
>>>> * can be used even if the work re-queues itself or migrates to
>>>> * another workqueue. On return from this function, @work is
>>>> * guaranteed to be not pending or executing on any CPU.
>>>> *
>>>> * cancel_work_sync(&delayed_work->work) must not be used for
>>>> * delayed_work's. Use cancel_delayed_work_sync() instead.
>>>> *
>>>> * The caller must ensure that the workqueue on which @work was last
>>>> * queued can't be destroyed before this function returns.
>>>> *
>>>> * Return:
>>>> * %true if @work was pending, %false otherwise.
>>>> */
>>>> bool cancel_work_sync(struct work_struct *work)
>>>> {
>>>> 	return __cancel_work_timer(work, false);
>>>> }
>>>>
>>>> Best wishes,
>>>> D. Wythe
>>> As I understand, queue_work() would wake up the work if the work is
>>> not already on the queue. And the sock_hold() is just prior to the
>>> queue_work(). That means, cancel_work_sync() would cancel the work
>>> either before its execution or after. If your fix refers to the
>>> former case, at this moment, I don't think the reference can be
>>> held, thus it is unnecessary to put it.
>>>>
>>
>> I am quite confused about why you think that when we cancel the work
>> before its execution, the reference cannot be held?
>>
>>
>> Perhaps the following diagram can describe the problem in a better way:
>>
>> smc_close_cancel_work                smc_cdc_msg_recv_action
>>
>>                                      sock_hold
>>                                      queue_work
>> if (cancel_work_sync())      // successfully canceled before execution
>>     sock_put()               // need to put it since we already held
>>                              // a ref before queue_work()
>>
>>
> ha, I already thought you might ask such a question :P
>
> I think two problems need to be clarified here:
>
> 1) Do you think the bh_lock_sock/bh_unlock_sock in smc_cdc_msg_recv
> does not protect smc_cdc_msg_recv_action() from cancel_work_sync()?
> Maybe that would go back to the discussion in the other patch on the
> behavior of the locks.
>
>

Yes. bh_lock_sock/bh_unlock_sock cannot block code execution protected
by lock_sock/release_sock(). That is to say, they are not mutually
exclusive.

We can use a very simple example to see this: since bh_lock_sock is a
plain spin lock, if bh_lock_sock/bh_unlock_sock could block
lock_sock/release_sock(), then lock_sock/release_sock() could also
block bh_lock_sock/bh_unlock_sock.

If that were true, once the process context had called lock_sock(),
the interrupt context would have to wait for the process to call
release_sock(). Obviously, that would be very unreasonable.
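
To make this concrete, here is a rough sketch (the generic socket
locking pattern, not the actual SMC code) of how the two contexts can
both proceed and then race on sk_flags:

/* rough sketch, not actual SMC code; the two contexts below can run
 * concurrently on different CPUs
 */

/* process context */
lock_sock(sk);                  /* takes slock, sets sk_lock.owned = 1,
                                 * then drops slock again */
sock_set_flag(sk, SOCK_DEAD);   /* non-atomic __set_bit() on sk_flags */
release_sock(sk);

/* softirq context, same socket */
bh_lock_sock(sk);               /* only spins on slock, which the owner
                                 * above has already dropped, so this
                                 * succeeds even while owned == 1 */
sock_set_flag(sk, SOCK_DONE);   /* __set_bit() racing with the one above */
bh_unlock_sock(sk);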


> 2) If queue_work returns true, as I said in the last mail, the work
> should be (or is being) executed. How could cancel_work_sync()
> successfully cancel the work before execution?

No, that's not true. In fact, if queue_work returns true, it simply
means that we have added the work to the queue and may have scheduled
a worker to execute it. It does not guarantee that the work has been
executed or is being executed when it returns true; the work might
still be in the list, waiting for some worker to pick it up.

We can make a simple inference:

1. A known fact is that if no special flag (WORK_UNBOUND) is given,
submitted work will eventually be executed on the CPU where it was
submitted.

2. If queue_work returns true, the work has been or is being executed.

If both of the above were true, then when we invoke queue_work in an
interrupt context, would that mean the submitted work is executed in
the interrupt context?
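
For reference, the reference counting that this patch relies on looks
roughly like this (a sketch following the submit path in
smc_cdc_msg_recv_action() and the cancel path in this patch):

/* submit side (softirq), sketch: */
sock_hold(&smc->sk);                 /* ref owned by the pending close_work */
if (!queue_work(smc_close_wq, &conn->close_work))
	sock_put(&smc->sk);          /* was already queued: drop extra ref */

/* normal path: smc_close_passive_work() eventually runs and does the
 * matching sock_put() itself
 */

/* teardown path: if the work is canceled while still pending,
 * smc_close_passive_work() never runs, so its sock_put() never happens
 * and the ref taken at submit time must be dropped here:
 */
if (cancel_work_sync(&smc->conn.close_work))
	sock_put(&smc->sk);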


Best wishes,
D. Wythe

>
>>>>>> Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD 
>>>>>> link groups")
>>>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>>>> ---
>>>>>>   net/smc/smc_close.c | 3 ++-
>>>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>>>>>> index 449ef45..10219f5 100644
>>>>>> --- a/net/smc/smc_close.c
>>>>>> +++ b/net/smc/smc_close.c
>>>>>> @@ -116,7 +116,8 @@ static void smc_close_cancel_work(struct 
>>>>>> smc_sock *smc)
>>>>>>       struct sock *sk = &smc->sk;
>>>>>>         release_sock(sk);
>>>>>> -    cancel_work_sync(&smc->conn.close_work);
>>>>>> +    if (cancel_work_sync(&smc->conn.close_work))
>>>>>> +        sock_put(sk);
>>>>>> cancel_delayed_work_sync(&smc->conn.tx_work);
>>>>>>       lock_sock(sk);
>>>>>>   }
>>>>
>>
>>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 5/5] net/smc: put sk reference if close work was canceled
  2023-10-20  2:41             ` D. Wythe
@ 2023-10-23  8:19               ` Wenjia Zhang
  2023-10-23  8:52                 ` D. Wythe
  0 siblings, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-23  8:19 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 20.10.23 04:41, D. Wythe wrote:
> 
> 
> On 10/20/23 1:40 AM, Wenjia Zhang wrote:
>>
>>
>> On 19.10.23 09:33, D. Wythe wrote:
>>>
>>>
>>> On 10/19/23 4:26 AM, Wenjia Zhang wrote:
>>>>
>>>>
>>>> On 17.10.23 04:06, D. Wythe wrote:
>>>>>
>>>>>
>>>>> On 10/13/23 3:04 AM, Wenjia Zhang wrote:
>>>>>>
>>>>>>
>>>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>>>
>>>>>>> Note that we always hold a reference to sock when attempting
>>>>>>> to submit close_work. 
>>>>>> yes
>>>>>> Therefore, if we have successfully
>>>>>>> canceled close_work from pending, we MUST release that reference
>>>>>>> to avoid potential leaks.
>>>>>>>
>>>>>> Isn't the corresponding reference already released inside the 
>>>>>> smc_close_passive_work()?
>>>>>>
>>>>>
>>>>> Hi Wenjia,
>>>>>
>>>>> If we successfully cancel the close work from the pending state,
>>>>> it means that smc_close_passive_work() has never been executed.
>>>>>
>>>>> You can find more details here.
>>>>>
>>>>> /**
>>>>> * cancel_work_sync - cancel a work and wait for it to finish
>>>>> * @work: the work to cancel
>>>>> *
>>>>> * Cancel @work and wait for its execution to finish. This function
>>>>> * can be used even if the work re-queues itself or migrates to
>>>>> * another workqueue. On return from this function, @work is
>>>>> * guaranteed to be not pending or executing on any CPU.
>>>>> *
>>>>> * cancel_work_sync(&delayed_work->work) must not be used for
>>>>> * delayed_work's. Use cancel_delayed_work_sync() instead.
>>>>> *
>>>>> * The caller must ensure that the workqueue on which @work was last
>>>>> * queued can't be destroyed before this function returns.
>>>>> *
>>>>> * Return:
>>>>> * %true if @work was pending, %false otherwise.
>>>>> */
>>>>> bool cancel_work_sync(struct work_struct *work)
>>>>> {
>>>>> 	return __cancel_work_timer(work, false);
>>>>> }
>>>>>
>>>>> Best wishes,
>>>>> D. Wythe
>>>> As I understand, queue_work() would wake up the work if the work is
>>>> not already on the queue. And the sock_hold() is just prior to the
>>>> queue_work(). That means, cancel_work_sync() would cancel the work
>>>> either before its execution or after. If your fix refers to the
>>>> former case, at this moment, I don't think the reference can be
>>>> held, thus it is unnecessary to put it.
>>>>>
>>>
>>> I am quite confused about why you think that when we cancel the work
>>> before its execution, the reference cannot be held?
>>>
>>>
>>> Perhaps the following diagram can describe the problem in a better way:
>>>
>>> smc_close_cancel_work                smc_cdc_msg_recv_action
>>>
>>>                                      sock_hold
>>>                                      queue_work
>>> if (cancel_work_sync())      // successfully canceled before execution
>>>     sock_put()               // need to put it since we already held
>>>                              // a ref before queue_work()
>>>
>>>
>> ha, I already thought you might ask such a question :P
>>
>> I think two problems need to be clarified here:
>>
>> 1) Do you think the bh_lock_sock/bh_unlock_sock in smc_cdc_msg_recv
>> does not protect smc_cdc_msg_recv_action() from cancel_work_sync()?
>> Maybe that would go back to the discussion in the other patch on the
>> behavior of the locks.
>>
> 
> Yes. bh_lock_sock/bh_unlock_sock cannot block code execution protected
> by lock_sock/release_sock(). That is to say, they are not mutually
> exclusive.
> 
No, the logic of that inference is very vague to me. My understanding
is completely different, and it is what I read from the kernel code.
They are not *completely* exclusive: while the bottom half context
holds the lock, i.e. bh_lock_sock, the process context cannot get the
lock via lock_sock. (This is actually the main point of my argument
for these fixes, and I didn't see any clarification from you.)
However, while the process context holds the lock via lock_sock, the
bottom half context can still get it via bh_lock_sock; this is just
like what you showed in the lock_sock code. Once it gets the
ownership, it releases the spinlock.

> We can use a very simple example to see this: since bh_lock_sock is a
> plain spin lock, if bh_lock_sock/bh_unlock_sock could block
> lock_sock/release_sock(), then lock_sock/release_sock() could also
> block bh_lock_sock/bh_unlock_sock.
> 
> If that were true, once the process context had called lock_sock(),
> the interrupt context would have to wait for the process to call
> release_sock(). Obviously, that would be very unreasonable.
> 
> 
>> 2) If queue_work returns true, as I said in the last mail, the work
>> should be (or is being) executed. How could cancel_work_sync()
>> successfully cancel the work before execution?
> 
> No, that's not true. In fact, if queue_work returns true, it simply
> means that we have added the work to the queue and may have scheduled
> a worker to execute it. It does not guarantee that the work has been
> executed or is being executed when it returns true; the work might
> still be in the list, waiting for some worker to pick it up.
> 
> We can make a simple inference:
> 
> 1. A known fact is that if no special flag (WORK_UNBOUND) is given,
> submitted work will eventually be executed on the CPU where it was
> submitted.
> 
> 2. If queue_work returns true, the work has been or is being executed.
> 
> If both of the above were true, then when we invoke queue_work in an
> interrupt context, would that mean the submitted work is executed in
> the interrupt context?
> 
> 
> Best wishes,
> D. Wythe
> 
If you say the thread is not guaranteed to be woken up by queue_work
to execute the work, please explain what the kick_pool function does.

However, the spin_lock understanding is still the key problem in these
cases. As I said, if it does not get clarified, we don't really need
to go on discussing this.

>>
>>>>>>> Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD 
>>>>>>> link groups")
>>>>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>>>>> ---
>>>>>>>   net/smc/smc_close.c | 3 ++-
>>>>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>>>>>>> index 449ef45..10219f5 100644
>>>>>>> --- a/net/smc/smc_close.c
>>>>>>> +++ b/net/smc/smc_close.c
>>>>>>> @@ -116,7 +116,8 @@ static void smc_close_cancel_work(struct 
>>>>>>> smc_sock *smc)
>>>>>>>       struct sock *sk = &smc->sk;
>>>>>>>         release_sock(sk);
>>>>>>> -    cancel_work_sync(&smc->conn.close_work);
>>>>>>> +    if (cancel_work_sync(&smc->conn.close_work))
>>>>>>> +        sock_put(sk);
>>>>>>> cancel_delayed_work_sync(&smc->conn.tx_work);
>>>>>>>       lock_sock(sk);
>>>>>>>   }
>>>>>
>>>
>>>
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 5/5] net/smc: put sk reference if close work was canceled
  2023-10-23  8:19               ` Wenjia Zhang
@ 2023-10-23  8:52                 ` D. Wythe
  2023-10-23 10:28                   ` Wenjia Zhang
  0 siblings, 1 reply; 38+ messages in thread
From: D. Wythe @ 2023-10-23  8:52 UTC (permalink / raw)
  To: Wenjia Zhang, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 10/23/23 4:19 PM, Wenjia Zhang wrote:
>
>
> On 20.10.23 04:41, D. Wythe wrote:
>>
>>
>> On 10/20/23 1:40 AM, Wenjia Zhang wrote:
>>>
>>>
>>> On 19.10.23 09:33, D. Wythe wrote:
>>>>
>>>>
>>>> On 10/19/23 4:26 AM, Wenjia Zhang wrote:
>>>>>
>>>>>
>>>>> On 17.10.23 04:06, D. Wythe wrote:
>>>>>>
>>>>>>
>>>>>> On 10/13/23 3:04 AM, Wenjia Zhang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>>>>
>>>>>>>> Note that we always hold a reference to sock when attempting
>>>>>>>> to submit close_work. 
>>>>>>> yes
>>>>>>> Therefore, if we have successfully
>>>>>>>> canceled close_work from pending, we MUST release that reference
>>>>>>>> to avoid potential leaks.
>>>>>>>>
>>>>>>> Isn't the corresponding reference already released inside the 
>>>>>>> smc_close_passive_work()?
>>>>>>>
>>>>>>
>>>>>> Hi Wenjia,
>>>>>>
>>>>>> If we successfully cancel the close work from the pending state,
>>>>>> it means that smc_close_passive_work() has never been executed.
>>>>>>
>>>>>> You can find more details here.
>>>>>>
>>>>>> /**
>>>>>> * cancel_work_sync - cancel a work and wait for it to finish
>>>>>> * @work: the work to cancel
>>>>>> *
>>>>>> * Cancel @work and wait for its execution to finish. This function
>>>>>> * can be used even if the work re-queues itself or migrates to
>>>>>> * another workqueue. On return from this function, @work is
>>>>>> * guaranteed to be not pending or executing on any CPU.
>>>>>> *
>>>>>> * cancel_work_sync(&delayed_work->work) must not be used for
>>>>>> * delayed_work's. Use cancel_delayed_work_sync() instead.
>>>>>> *
>>>>>> * The caller must ensure that the workqueue on which @work was last
>>>>>> * queued can't be destroyed before this function returns.
>>>>>> *
>>>>>> * Return:
>>>>>> * %true if @work was pending, %false otherwise.
>>>>>> */
>>>>>> bool cancel_work_sync(struct work_struct *work)
>>>>>> {
>>>>>> 	return __cancel_work_timer(work, false);
>>>>>> }
>>>>>>
>>>>>> Best wishes,
>>>>>> D. Wythe
>>>>> As I understand, queue_work() would wake up the work if the work
>>>>> is not already on the queue. And the sock_hold() is just prior to
>>>>> the queue_work(). That means, cancel_work_sync() would cancel the
>>>>> work either before its execution or after. If your fix refers to
>>>>> the former case, at this moment, I don't think the reference can
>>>>> be held, thus it is unnecessary to put it.
>>>>>>
>>>>
>>>> I am quite confused about why you think that when we cancel the work
>>>> before its execution, the reference cannot be held?
>>>>
>>>>
>>>> Perhaps the following diagram can describe the problem in a better way:
>>>>
>>>> smc_close_cancel_work                smc_cdc_msg_recv_action
>>>>
>>>>                                      sock_hold
>>>>                                      queue_work
>>>> if (cancel_work_sync())      // successfully canceled before execution
>>>>     sock_put()               // need to put it since we already held
>>>>                              // a ref before queue_work()
>>>>
>>>>
>>> ha, I already thought you might ask such a question :P
>>>
>>> I think two problems need to be clarified here:
>>>
>>> 1) Do you think the bh_lock_sock/bh_unlock_sock in smc_cdc_msg_recv
>>> does not protect smc_cdc_msg_recv_action() from cancel_work_sync()?
>>> Maybe that would go back to the discussion in the other patch on the
>>> behavior of the locks.
>>>
>>
>> Yes. bh_lock_sock/bh_unlock_sock cannot block code execution
>> protected by lock_sock/release_sock(). That is to say, they are not
>> mutually exclusive.
>>
> No, the logic of that inference is very vague to me. My understanding
> is completely different, and it is what I read from the kernel code.
> They are not *completely* exclusive: while the bottom half context
> holds the lock, i.e. bh_lock_sock, the process context cannot get the
> lock via lock_sock. (This is actually the main point of my argument
> for these fixes, and I didn't see any clarification from you.)
> However, while the process context holds the lock via lock_sock, the
> bottom half context can still get it via bh_lock_sock; this is just
> like what you showed in the lock_sock code. Once it gets the
> ownership, it releases the spinlock.
>

“ while the process context holds the lock by lock_sock, the bottom half
context can still get it by bh_lock_sock ”

You already got that, so why can sock_set_flag(DONE) and
sock_set_flag(DEAD) not happen concurrently?


>> We can use a very simple example to see this: since bh_lock_sock is a
>> plain spin lock, if bh_lock_sock/bh_unlock_sock could block
>> lock_sock/release_sock(), then lock_sock/release_sock() could also
>> block bh_lock_sock/bh_unlock_sock.
>>
>> If that were true, once the process context had called lock_sock(),
>> the interrupt context would have to wait for the process to call
>> release_sock(). Obviously, that would be very unreasonable.
>>
>>
>>> 2) If queue_work returns true, as I said in the last mail, the work
>>> should be (or is being) executed. How could cancel_work_sync()
>>> successfully cancel the work before execution?
>>
>> No, that's not true. In fact, if queue_work returns true, it simply
>> means that we have added the work to the queue and may have scheduled
>> a worker to execute it. It does not guarantee that the work has been
>> executed or is being executed when it returns true; the work might
>> still be in the list, waiting for some worker to pick it up.
>>
>> We can make a simple inference:
>>
>> 1. A known fact is that if no special flag (WORK_UNBOUND) is given,
>> submitted work will eventually be executed on the CPU where it was
>> submitted.
>>
>> 2. If queue_work returns true, the work has been or is being executed.
>>
>> If both of the above were true, then when we invoke queue_work in an
>> interrupt context, would that mean the submitted work is executed in
>> the interrupt context?
>>
>>
>> Best wishes,
>> D. Wythe
>>
> If you say the thread is not guaranteed to be woken up by queue_work
> to execute the work, please explain what the kick_pool function does.

I never said that.

>
> However, the spin_lock understanding is still the key problem in these
> cases. As I said, if it does not get clarified, we don't really need
> to go on discussing this.
>
>>>
>>>>>>>> Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD 
>>>>>>>> link groups")
>>>>>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>>>>>> ---
>>>>>>>>   net/smc/smc_close.c | 3 ++-
>>>>>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>>>>>>>> index 449ef45..10219f5 100644
>>>>>>>> --- a/net/smc/smc_close.c
>>>>>>>> +++ b/net/smc/smc_close.c
>>>>>>>> @@ -116,7 +116,8 @@ static void smc_close_cancel_work(struct 
>>>>>>>> smc_sock *smc)
>>>>>>>>       struct sock *sk = &smc->sk;
>>>>>>>>         release_sock(sk);
>>>>>>>> -    cancel_work_sync(&smc->conn.close_work);
>>>>>>>> +    if (cancel_work_sync(&smc->conn.close_work))
>>>>>>>> +        sock_put(sk);
>>>>>>>> cancel_delayed_work_sync(&smc->conn.tx_work);
>>>>>>>>       lock_sock(sk);
>>>>>>>>   }
>>>>>>
>>>>
>>>>
>>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 5/5] net/smc: put sk reference if close work was canceled
  2023-10-23  8:52                 ` D. Wythe
@ 2023-10-23 10:28                   ` Wenjia Zhang
  2023-10-23 11:56                     ` Dust Li
       [not found]                     ` <59c0c75f-e9df-2ef1-ead2-7c5c97f3e750@linux.alibaba.com>
  0 siblings, 2 replies; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-23 10:28 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 23.10.23 10:52, D. Wythe wrote:
> 
> 
> On 10/23/23 4:19 PM, Wenjia Zhang wrote:
>>
>>
>> On 20.10.23 04:41, D. Wythe wrote:
>>>
>>>
>>> On 10/20/23 1:40 AM, Wenjia Zhang wrote:
>>>>
>>>>
>>>> On 19.10.23 09:33, D. Wythe wrote:
>>>>>
>>>>>
>>>>> On 10/19/23 4:26 AM, Wenjia Zhang wrote:
>>>>>>
>>>>>>
>>>>>> On 17.10.23 04:06, D. Wythe wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 10/13/23 3:04 AM, Wenjia Zhang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>>>>>
>>>>>>>>> Note that we always hold a reference to sock when attempting
>>>>>>>>> to submit close_work. 
>>>>>>>> yes
>>>>>>>> Therefore, if we have successfully
>>>>>>>>> canceled close_work from pending, we MUST release that reference
>>>>>>>>> to avoid potential leaks.
>>>>>>>>>
>>>>>>>> Isn't the corresponding reference already released inside the 
>>>>>>>> smc_close_passive_work()?
>>>>>>>>
>>>>>>>
>>>>>>> Hi Wenjia,
>>>>>>>
>>>>>>> If we successfully cancel the close work from the pending state,
>>>>>>> it means that smc_close_passive_work() has never been executed.
>>>>>>>
>>>>>>> You can find more details here.
>>>>>>>
>>>>>>> /**
>>>>>>> * cancel_work_sync - cancel a work and wait for it to finish
>>>>>>> * @work: the work to cancel
>>>>>>> *
>>>>>>> * Cancel @work and wait for its execution to finish. This function
>>>>>>> * can be used even if the work re-queues itself or migrates to
>>>>>>> * another workqueue. On return from this function, @work is
>>>>>>> * guaranteed to be not pending or executing on any CPU.
>>>>>>> *
>>>>>>> * cancel_work_sync(&delayed_work->work) must not be used for
>>>>>>> * delayed_work's. Use cancel_delayed_work_sync() instead.
>>>>>>> *
>>>>>>> * The caller must ensure that the workqueue on which @work was last
>>>>>>> * queued can't be destroyed before this function returns.
>>>>>>> *
>>>>>>> * Return:
>>>>>>> * %true if @work was pending, %false otherwise.
>>>>>>> */
>>>>>>> bool cancel_work_sync(struct work_struct *work)
>>>>>>> {
>>>>>>> 	return __cancel_work_timer(work, false);
>>>>>>> }
>>>>>>>
>>>>>>> Best wishes,
>>>>>>> D. Wythe
>>>>>> As I understand, queue_work() would wake up the work if the work
>>>>>> is not already on the queue. And the sock_hold() is just prior to
>>>>>> the queue_work(). That means, cancel_work_sync() would cancel the
>>>>>> work either before its execution or after. If your fix refers to
>>>>>> the former case, at this moment, I don't think the reference can
>>>>>> be held, thus it is unnecessary to put it.
>>>>>>>
>>>>>
>>>>> I am quite confused about why you think that when we cancel the work
>>>>> before its execution, the reference cannot be held?
>>>>>
>>>>>
>>>>> Perhaps the following diagram can describe the problem in a better way:
>>>>>
>>>>> smc_close_cancel_work                smc_cdc_msg_recv_action
>>>>>
>>>>>                                      sock_hold
>>>>>                                      queue_work
>>>>> if (cancel_work_sync())      // successfully canceled before execution
>>>>>     sock_put()               // need to put it since we already held
>>>>>                              // a ref before queue_work()
>>>>>
>>>>>
>>>> ha, I already thought you might ask such a question :P
>>>>
>>>> I think two problems need to be clarified here:
>>>>
>>>> 1) Do you think the bh_lock_sock/bh_unlock_sock in smc_cdc_msg_recv
>>>> does not protect smc_cdc_msg_recv_action() from cancel_work_sync()?
>>>> Maybe that would go back to the discussion in the other patch on the
>>>> behavior of the locks.
>>>>
>>>
>>> Yes. bh_lock_sock/bh_unlock_sock cannot block code execution
>>> protected by lock_sock/release_sock(). That is to say, they are not
>>> mutually exclusive.
>>>
>> No, the logic of that inference is very vague to me. My understanding
>> is completely different, and it is what I read from the kernel code.
>> They are not *completely* exclusive: while the bottom half context
>> holds the lock, i.e. bh_lock_sock, the process context cannot get the
>> lock via lock_sock. (This is actually the main point of my argument
>> for these fixes, and I didn't see any clarification from you.)
>> However, while the process context holds the lock via lock_sock, the
>> bottom half context can still get it via bh_lock_sock; this is just
>> like what you showed in the lock_sock code. Once it gets the
>> ownership, it releases the spinlock.
>>
> 
> “ while the process context holds the lock by lock_sock, the bottom half
> context can still get it by bh_lock_sock ”
> 
> You already got that, so why can sock_set_flag(DONE) and
> sock_set_flag(DEAD) not happen concurrently?
> 

Then I'd ask how you understand this sentence I wrote: "while the
bottom half context holds the lock, i.e. bh_lock_sock, the process
context cannot get the lock by lock_sock."
> 
>>> We can use a very simple example to see this: since bh_lock_sock is a
>>> plain spin lock, if bh_lock_sock/bh_unlock_sock could block
>>> lock_sock/release_sock(), then lock_sock/release_sock() could also
>>> block bh_lock_sock/bh_unlock_sock.
>>>
>>> If that were true, once the process context had called lock_sock(),
>>> the interrupt context would have to wait for the process to call
>>> release_sock(). Obviously, that would be very unreasonable.
>>>
>>>
>>>> 2) If queue_work returns true, as I said in the last mail, the work
>>>> should be (or is being) executed. How could cancel_work_sync()
>>>> successfully cancel the work before execution?
>>>
>>> No, that's not true. In fact, if queue_work returns true, it simply
>>> means that we have added the work to the queue and may have scheduled
>>> a worker to execute it. It does not guarantee that the work has been
>>> executed or is being executed when it returns true; the work might
>>> still be in the list, waiting for some worker to pick it up.
>>>
>>> We can make a simple inference:
>>>
>>> 1. A known fact is that if no special flag (WORK_UNBOUND) is given,
>>> submitted work will eventually be executed on the CPU where it was
>>> submitted.
>>>
>>> 2. If queue_work returns true, the work has been or is being executed.
>>>
>>> If both of the above were true, then when we invoke queue_work in an
>>> interrupt context, would that mean the submitted work is executed in
>>> the interrupt context?
>>>
>>>
>>> Best wishes,
>>> D. Wythe
>>>
>> If you say the thread is not guaranteed to be woken up by queue_work
>> to execute the work, please explain what the kick_pool function does.
> 
> I never said that.
> 
What is your understanding of kick_pool there?
>>
>> However, the spin_lock understanding is still the key problem in these
>> cases. As I said, if it does not get clarified, we don't really need
>> to go on discussing this.
>>
>>>>
>>>>>>>>> Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD 
>>>>>>>>> link groups")
>>>>>>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>>>>>>> ---
>>>>>>>>>   net/smc/smc_close.c | 3 ++-
>>>>>>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>>>>>>>>> index 449ef45..10219f5 100644
>>>>>>>>> --- a/net/smc/smc_close.c
>>>>>>>>> +++ b/net/smc/smc_close.c
>>>>>>>>> @@ -116,7 +116,8 @@ static void smc_close_cancel_work(struct 
>>>>>>>>> smc_sock *smc)
>>>>>>>>>       struct sock *sk = &smc->sk;
>>>>>>>>>         release_sock(sk);
>>>>>>>>> -    cancel_work_sync(&smc->conn.close_work);
>>>>>>>>> +    if (cancel_work_sync(&smc->conn.close_work))
>>>>>>>>> +        sock_put(sk);
>>>>>>>>> cancel_delayed_work_sync(&smc->conn.tx_work);
>>>>>>>>>       lock_sock(sk);
>>>>>>>>>   }
>>>>>>>
>>>>>
>>>>>
>>>
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 5/5] net/smc: put sk reference if close work was canceled
  2023-10-23 10:28                   ` Wenjia Zhang
@ 2023-10-23 11:56                     ` Dust Li
       [not found]                     ` <59c0c75f-e9df-2ef1-ead2-7c5c97f3e750@linux.alibaba.com>
  1 sibling, 0 replies; 38+ messages in thread
From: Dust Li @ 2023-10-23 11:56 UTC (permalink / raw)
  To: Wenjia Zhang, D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma

On Mon, Oct 23, 2023 at 12:28:16PM +0200, Wenjia Zhang wrote:
>
>
>On 23.10.23 10:52, D. Wythe wrote:
>> 
>> 
>> On 10/23/23 4:19 PM, Wenjia Zhang wrote:
>> > 
>> > 
>> > On 20.10.23 04:41, D. Wythe wrote:
>> > > 
>> > > 
>> > > On 10/20/23 1:40 AM, Wenjia Zhang wrote:
>> > > > 
>> > > > 
>> > > > On 19.10.23 09:33, D. Wythe wrote:
>> > > > > 
>> > > > > 
>> > > > > On 10/19/23 4:26 AM, Wenjia Zhang wrote:
>> > > > > > 
>> > > > > > 
>> > > > > > On 17.10.23 04:06, D. Wythe wrote:
>> > > > > > > 
>> > > > > > > 
>> > > > > > > On 10/13/23 3:04 AM, Wenjia Zhang wrote:
>> > > > > > > > 
>> > > > > > > > 
>> > > > > > > > On 11.10.23 09:33, D. Wythe wrote:
>> > > > > > > > > From: "D. Wythe" <alibuda@linux.alibaba.com>
>> > > > > > > > > 
>> > > > > > > > > Note that we always hold a reference to sock when attempting
>> > > > > > > > > to submit close_work.
>> > > > > > > > yes
>> > > > > > > > Therefore, if we have successfully
>> > > > > > > > > canceled close_work from pending, we MUST release that reference
>> > > > > > > > > to avoid potential leaks.
>> > > > > > > > > 
>> > > > > > > > Isn't the corresponding reference already
>> > > > > > > > released inside the smc_close_passive_work()?
>> > > > > > > > 
>> > > > > > > 
>> > > > > > > Hi Wenjia,
>> > > > > > > 
>> > > > > > > If we successfully cancel the close work from the pending state,
>> > > > > > > it means that smc_close_passive_work() has never been executed.
>> > > > > > > 
>> > > > > > > You can find more details here.
>> > > > > > > 
>> > > > > > > /**
>> > > > > > > * cancel_work_sync - cancel a work and wait for it to finish
>> > > > > > > * @work: the work to cancel
>> > > > > > > *
>> > > > > > > * Cancel @work and wait for its execution to finish. This function
>> > > > > > > * can be used even if the work re-queues itself or migrates to
>> > > > > > > * another workqueue. On return from this function, @work is
>> > > > > > > * guaranteed to be not pending or executing on any CPU.
>> > > > > > > *
>> > > > > > > * cancel_work_sync(&delayed_work->work) must not be used for
>> > > > > > > * delayed_work's. Use cancel_delayed_work_sync() instead.
>> > > > > > > *
>> > > > > > > * The caller must ensure that the workqueue on which @work was last
>> > > > > > > * queued can't be destroyed before this function returns.
>> > > > > > > *
>> > > > > > > * Return:
>> > > > > > > * %true if @work was pending, %false otherwise.
>> > > > > > > */
>> > > > > > > bool cancel_work_sync(struct work_struct *work)
>> > > > > > > {
>> > > > > > > 	return __cancel_work_timer(work, false);
>> > > > > > > }
>> > > > > > > 
>> > > > > > > Best wishes,
>> > > > > > > D. Wythe
>> > > > > > As I understand, queue_work() would wake up the work if the
>> > > > > > work is not already on the queue. And the sock_hold() is just
>> > > > > > prior to the queue_work(). That means, cancel_work_sync()
>> > > > > > would cancel the work either before its execution or after.
>> > > > > > If your fix refers to the former case, at this moment, I
>> > > > > > don't think the reference can be held, thus it is unnecessary
>> > > > > > to put it.
>> > > > > > > 
>> > > > > 
>> > > > > I am quite confused about why you think that when we cancel the
>> > > > > work before its execution, the reference cannot be held?
>> > > > > 
>> > > > > 
>> > > > > Perhaps the following diagram can describe the problem in a better way:
>> > > > > 
>> > > > > smc_close_cancel_work              smc_cdc_msg_recv_action
>> > > > > 
>> > > > >                                    sock_hold
>> > > > >                                    queue_work
>> > > > > if (cancel_work_sync())    // successfully canceled before execution
>> > > > >     sock_put()             // need to put it since we already
>> > > > >                            // held a ref before queue_work()
>> > > > > 
>> > > > > 
>> > > > ha, I already thought you might ask such a question :P
>> > > > 
>> > > > I think two problems need to be clarified here:
>> > > > 
>> > > > 1) Do you think the bh_lock_sock/bh_unlock_sock in smc_cdc_msg_recv
>> > > > does not protect smc_cdc_msg_recv_action() from cancel_work_sync()?
>> > > > Maybe that would go back to the discussion in the other patch
>> > > > on the behavior of the locks.
>> > > > 
>> > > 
>> > > Yes. bh_lock_sock/bh_unlock_sock cannot block code execution
>> > > protected by lock_sock/release_sock(). That is to say, they are
>> > > not mutually exclusive.
>> > > 
>> > No, the logic of that inference is very vague to me. My understanding
>> > is completely different, and it is what I read from the kernel code.
>> > They are not *completely* exclusive: while the bottom half context
>> > holds the lock, i.e. bh_lock_sock, the process context cannot get the
>> > lock via lock_sock. (This is actually the main point of my argument
>> > for these fixes, and I didn't see any clarification from you.)
>> > However, while the process context holds the lock via lock_sock, the
>> > bottom half context can still get it via bh_lock_sock; this is just
>> > like what you showed in the lock_sock code. Once it gets the
>> > ownership, it releases the spinlock.
>> > 
>> 
>> “ while the process context holds the lock by lock_sock, the bottom half
>> context can still get it by bh_lock_sock ”
>> 
>> You already got that, so why can sock_set_flag(DONE) and
>> sock_set_flag(DEAD) not happen concurrently?
>> 
>
>Then I'd ask how you understand this sentence I wrote: "while the bottom
>half context holds the lock, i.e. bh_lock_sock, the process context
>cannot get the lock by lock_sock."

That's correct, but the reverse is not true, i.e. if the process context
holds the lock, the bottom half context can still acquire the lock.
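
To spell out both directions as a timeline (a sketch, assuming the
standard lock_sock()/bh_lock_sock() semantics shown earlier in this
thread):

direction 1: softirq takes slock first -- this direction IS exclusive

  softirq                            process
  bh_lock_sock(sk)
  ...                                lock_sock(sk)   /* spins on slock */
  bh_unlock_sock(sk)
                                     /* now gets slock, sets owned = 1,
                                      * drops slock and proceeds */

direction 2: process already owns the socket -- NOT exclusive

  process                            softirq
  lock_sock(sk)   /* owned = 1,
                   * slock dropped */
  ...                                bh_lock_sock(sk)    /* succeeds */
  ...                                bh_unlock_sock(sk)
  release_sock(sk)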

Best regards,
Dust


>> 
>> > > We can use a very simple example to see this: since bh_lock_sock
>> > > is a plain spin lock, if bh_lock_sock/bh_unlock_sock could block
>> > > lock_sock/release_sock(), then lock_sock/release_sock() could also
>> > > block bh_lock_sock/bh_unlock_sock.
>> > > 
>> > > If that were true, once the process context had called lock_sock(),
>> > > the interrupt context would have to wait for the process to call
>> > > release_sock(). Obviously, that would be very unreasonable.
>> > > 
>> > > 
>> > > > 2) If queue_work returns true, as I said in the last mail, the
>> > > > work should be (or is being) executed. How could
>> > > > cancel_work_sync() successfully cancel the work before execution?
>> > > 
>> > > No, that's not true. In fact, if queue_work returns true, it
>> > > simply means that we have added the work to the queue and may have
>> > > scheduled a worker to execute it. It does not guarantee that the
>> > > work has been executed or is being executed when it returns true;
>> > > the work might still be in the list, waiting for some worker to
>> > > pick it up.
>> > > 
>> > > We can make a simple inference:
>> > > 
>> > > 1. A known fact is that if no special flag (WORK_UNBOUND) is
>> > > given, submitted work will eventually be executed on the CPU
>> > > where it was submitted.
>> > > 
>> > > 2. If queue_work returns true, the work has been or is being
>> > > executed.
>> > > 
>> > > If both of the above were true, then when we invoke queue_work in
>> > > an interrupt context, would that mean the submitted work is
>> > > executed in the interrupt context?
>> > > 
>> > > 
>> > > Best wishes,
>> > > D. Wythe
>> > > 
>> > If you say the thread is not guaranteed to be woken up by queue_work
>> > to execute the work, please explain what the kick_pool function does.
>> 
>> I never said that.
>> 
>What is your understanding of kick_pool there?
>> > 
>> > However, the spin_lock understanding is still the key problem in
>> > these cases. As I said, if it does not get clarified, we don't
>> > really need to go on discussing this.
>> > 
>> > > > 
>> > > > > > > > > Fixes: 42bfba9eaa33 ("net/smc: immediate
>> > > > > > > > > termination for SMCD link groups")
>> > > > > > > > > Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>> > > > > > > > > ---
>> > > > > > > > >   net/smc/smc_close.c | 3 ++-
>> > > > > > > > >   1 file changed, 2 insertions(+), 1 deletion(-)
>> > > > > > > > > 
>> > > > > > > > > diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>> > > > > > > > > index 449ef45..10219f5 100644
>> > > > > > > > > --- a/net/smc/smc_close.c
>> > > > > > > > > +++ b/net/smc/smc_close.c
>> > > > > > > > > @@ -116,7 +116,8 @@ static void
>> > > > > > > > > smc_close_cancel_work(struct smc_sock
>> > > > > > > > > *smc)
>> > > > > > > > >       struct sock *sk = &smc->sk;
>> > > > > > > > >         release_sock(sk);
>> > > > > > > > > -    cancel_work_sync(&smc->conn.close_work);
>> > > > > > > > > +    if (cancel_work_sync(&smc->conn.close_work))
>> > > > > > > > > +        sock_put(sk);
>> > > > > > > > > cancel_delayed_work_sync(&smc->conn.tx_work);
>> > > > > > > > >       lock_sock(sk);
>> > > > > > > > >   }
>> > > > > > > 
>> > > > > 
>> > > > > 
>> > > 
>> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 5/5] net/smc: put sk reference if close work was canceled
       [not found]                     ` <59c0c75f-e9df-2ef1-ead2-7c5c97f3e750@linux.alibaba.com>
@ 2023-10-23 20:52                       ` Wenjia Zhang
  0 siblings, 0 replies; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-23 20:52 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 23.10.23 14:18, D. Wythe wrote:
> 
> 
> On 10/23/23 6:28 PM, Wenjia Zhang wrote:
>>
>>
>> On 23.10.23 10:52, D. Wythe wrote:
>>>
>>>
>>> On 10/23/23 4:19 PM, Wenjia Zhang wrote:
>>>>
>>>>
>>>> On 20.10.23 04:41, D. Wythe wrote:
>>>>>
>>>>>
>>>>> On 10/20/23 1:40 AM, Wenjia Zhang wrote:
>>>>>>
>>>>>>
>>>>>> On 19.10.23 09:33, D. Wythe wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 10/19/23 4:26 AM, Wenjia Zhang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 17.10.23 04:06, D. Wythe wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/13/23 3:04 AM, Wenjia Zhang wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>>>>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>>>>>>>
>>>>>>>>>>> Note that we always hold a reference to sock when attempting
>>>>>>>>>>> to submit close_work. 
>>>>>>>>>> yes
>>>>>>>>>> Therefore, if we have successfully
>>>>>>>>>>> canceled close_work from pending, we MUST release that reference
>>>>>>>>>>> to avoid potential leaks.
>>>>>>>>>>>
>>>>>>>>>> Isn't the corresponding reference already released inside the 
>>>>>>>>>> smc_close_passive_work()?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Wenjia,
>>>>>>>>>
>>>>>>>>> If we successfully cancel the close work from the pending state,
>>>>>>>>> it means that smc_close_passive_work() has never been executed.
>>>>>>>>>
>>>>>>>>> You can find more details here.
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>> * cancel_work_sync - cancel a work and wait for it to finish
>>>>>>>>> * @work: the work to cancel
>>>>>>>>> *
>>>>>>>>> * Cancel @work and wait for its execution to finish. This function
>>>>>>>>> * can be used even if the work re-queues itself or migrates to
>>>>>>>>> * another workqueue. On return from this function, @work is
>>>>>>>>> * guaranteed to be not pending or executing on any CPU.
>>>>>>>>> *
>>>>>>>>> * cancel_work_sync(&delayed_work->work) must not be used for
>>>>>>>>> * delayed_work's. Use cancel_delayed_work_sync() instead.
>>>>>>>>> *
>>>>>>>>> * The caller must ensure that the workqueue on which @work was 
>>>>>>>>> last
>>>>>>>>> * queued can't be destroyed before this function returns.
>>>>>>>>> *
>>>>>>>>> * Return:
>>>>>>>>> * %true if @work was pending, %false otherwise.
>>>>>>>>> */
>>>>>>>>> bool cancel_work_sync(struct work_struct *work)
>>>>>>>>> {
>>>>>>>>> 	return __cancel_work_timer(work, false);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Best wishes,
>>>>>>>>> D. Wythe
>>>>>>>> As I understand, queue_work() would wake up the work if the work
>>>>>>>> is not already on the queue. And the sock_hold() is just prior to
>>>>>>>> the queue_work(). That means, cancel_work_sync() would cancel
>>>>>>>> the work either before its execution or after. If your fix
>>>>>>>> refers to the former case, at this moment, I don't think the
>>>>>>>> reference can be held, thus it is unnecessary to put it.
>>>>>>>>>
>>>>>>>
>>>>>>> I am quite confused about why you think that when we cancel the
>>>>>>> work before its execution, the reference cannot be held?
>>>>>>>
>>>>>>>
>>>>>>> Perhaps the following diagram can describe the problem in a
>>>>>>> better way:
>>>>>>>
>>>>>>> smc_close_cancel_work              smc_cdc_msg_recv_action
>>>>>>>
>>>>>>>                                    sock_hold
>>>>>>>                                    queue_work
>>>>>>> if (cancel_work_sync())    // successfully canceled before execution
>>>>>>>     sock_put()             // need to put it since we already
>>>>>>>                            // held a ref before queue_work()
>>>>>>>
>>>>>>>
>>>>>> ha, I already thought you might ask such a question :P
>>>>>>
>>>>>> I think two problems need to be clarified here:
>>>>>>
>>>>>> 1) Do you think the bh_lock_sock/bh_unlock_sock in smc_cdc_msg_recv
>>>>>> does not protect smc_cdc_msg_recv_action() from
>>>>>> cancel_work_sync()?
>>>>>> Maybe that would go back to the discussion in the other patch on
>>>>>> the behavior of the locks.
>>>>>>
>>>>>
>>>>> Yes. bh_lock_sock/bh_unlock_sock cannot block code execution
>>>>> protected by lock_sock/release_sock(). That is to say, they are
>>>>> not mutually exclusive.
>>>>>
>>>> No, the logic of that inference is very vague to me. My understanding
>>>> is completely different, and it is what I read from the kernel code.
>>>> They are not *completely* exclusive: while the bottom half context
>>>> holds the lock, i.e. bh_lock_sock, the process context cannot get
>>>> the lock via lock_sock. (This is actually the main point of my
>>>> argument for these fixes, and I didn't see any clarification from
>>>> you.) However, while the process context holds the lock via
>>>> lock_sock, the bottom half context can still get it via
>>>> bh_lock_sock; this is just like what you showed in the lock_sock
>>>> code. Once it gets the ownership, it releases the spinlock.
>>>>
>>>
>>> “ while the process context holds the lock by lock_sock, the bottom
>>> half context can still get it by bh_lock_sock ”
>>>
>>> You already got that, so why can sock_set_flag(DONE) and
>>> sock_set_flag(DEAD) not happen concurrently?
>>>
>>
>> Then I'd ask how you understand this sentence I wrote: "while the
>> bottom half context holds the lock, i.e. bh_lock_sock, the process
>> context cannot get the lock by lock_sock."
>>>
> 
> That's also true.  I have no questions on it.  They are asymmetrical.
> 
> But we cannot guarantee that the interrupt context always takes the lock
> before the process context; that's why I think sock_set_flag(DONE) and
> sock_set_flag(DEAD) can run concurrently.
> 
ok, I have to agree with that. I focused too much on this one case :(
So I think the approach of the 1st patch is also appropriate. Thank you
for taking the time to clear this up for me!

>>>>> We can use a very simple example to see this: since bh_lock_sock
>>>>> is a plain spin lock, if bh_lock_sock/bh_unlock_sock could block
>>>>> lock_sock/release_sock(), then lock_sock/release_sock() could also
>>>>> block bh_lock_sock/bh_unlock_sock.
>>>>>
>>>>> If that were true, once the process context had called lock_sock(),
>>>>> the interrupt context would have to wait for the process to call
>>>>> release_sock(). Obviously, that would be very unreasonable.
>>>>>
>>>>>
>>>>>> 2) If queue_work returns true, as I said in the last mail, the work
>>>>>> should be (or is being) executed. How could cancel_work_sync()
>>>>>> successfully cancel the work before execution?
>>>>>
>>>>> No, that's not true. In fact, if queue_work returns true, it simply
>>>>> means that we have added the work to the queue and may have
>>>>> scheduled a worker to execute it. It does not guarantee that the
>>>>> work has been executed or is being executed when it returns true;
>>>>> the work might still be in the list, waiting for some worker to
>>>>> pick it up.
>>>>>
>>>>> We can make a simple inference:
>>>>>
>>>>> 1. A known fact is that if no special flag (WORK_UNBOUND) is given,
>>>>> submitted work will eventually be executed on the CPU where it was
>>>>> submitted.
>>>>>
>>>>> 2. If queue_work returns true, the work has been or is being
>>>>> executed.
>>>>>
>>>>> If both of the above were true, then when we invoke queue_work in
>>>>> an interrupt context, would that mean the submitted work is
>>>>> executed in the interrupt context?
>>>>>
>>>>>
>>>>> Best wishes,
>>>>> D. Wythe
>>>>>
>>>> If you say the thread is not guaranteed to be woken up by queue_work
>>>> to execute the work, please explain what the kick_pool function does.
>>>
>>> I never said that.
>>>
>> What is your understanding of kick_pool there?
> 
> 
> 
> 
> I think this simple logic-code graph can explain my point of view
> clearly.
> 
> My key point is that queue_work cannot guarantee that work_1 has been
> executed or is being executed; work_1 might still be in the list
> (not yet executed).
> 
> kick_pool() might wake up an idle worker from schedule(), and then
> work_1 can be executed soon. But we cannot say that work_1 has already
> been executed or is being executed.
> 
> In fact, we can invoke cancel_work_sync() to delete work_1 from the
> list to prevent it from being executed, as long as the worker has not
> yet removed (popped) work_1.
> 
> Besides, there is an upper limit on the number of idle workers. If the
> number of works currently being executed exceeds this number, work_1
> must wait until idle workers are available. In that case, we cannot
> say that work_1 has already been executed or is being executed either.
> 
I do agree with this explanation. My thought was that cancel_work_sync()
deleting work_1 from the list before it is executed would rarely
happen, as I was focusing on the scenario above. Since we have
agreement on the locks now, I agree that it can happen.
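
For completeness, the pending-cancel case is easy to demonstrate with
the generic workqueue API (a hypothetical demo, not SMC code):

#include <linux/workqueue.h>
#include <linux/printk.h>

/* hypothetical demo: the handler never runs if the work is canceled
 * while still pending
 */
static void demo_fn(struct work_struct *work)
{
	pr_info("demo work executed\n");
}
static DECLARE_WORK(demo_work, demo_fn);

static void demo(void)
{
	queue_work(system_wq, &demo_work);    /* work is now pending */
	if (cancel_work_sync(&demo_work))     /* true: it was still pending */
		pr_info("canceled before execution\n");
}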

Thanks again!
Here you are:
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
  2023-10-11  7:33 ` [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT D. Wythe
  2023-10-11 14:00   ` Dust Li
  2023-10-11 20:31   ` Wenjia Zhang
@ 2023-10-23 20:53   ` Wenjia Zhang
  2 siblings, 0 replies; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-23 20:53 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 11.10.23 09:33, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> Considering scenario:
> 
> __smc_release                         smc_cdc_rx_handler_rwwi
> smc_close_active()
> sock_set_flag                         sock_set_flag
> 
> __set_bit(DEAD)                       __set_bit(DONE)
> 
> Due to __set_bit not being atomic, the DEAD or DONE flag might be lost.
> If the DEAD flag is lost, the state SMC_CLOSED will never be reached
> in smc_close_passive_work:
> 
> if (sock_flag(sk, SOCK_DEAD) &&
> 	smc_close_sent_any_close(conn)) {
> 	sk->sk_state = SMC_CLOSED;
> } else {
> 	/* just shutdown, but not yet closed locally */
> 	sk->sk_state = SMC_APPFINCLOSEWAIT;
> }
> 
> Replacing the __set_bit behind sock_set_flag with set_bit will fix
> this problem, since set_bit is atomic.
> 
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
>   net/smc/af_smc.c    | 4 ++--
>   net/smc/smc.h       | 5 +++++
>   net/smc/smc_cdc.c   | 2 +-
>   net/smc/smc_close.c | 2 +-
>   4 files changed, 9 insertions(+), 4 deletions(-)
> 
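For readers following the thread: the new helper in smc.h is presumably
just a thin atomic wrapper around set_bit, along these lines (a sketch,
not the verbatim patch):

/* sketch of the assumed helper; set_bit is an atomic RMW, so a
 * concurrent set_bit of another flag in sk_flags cannot be lost
 */
static inline void smc_sock_set_flag(struct sock *sk, enum sock_flags flag)
{
	set_bit(flag, &sk->sk_flags);
}

With that in place, the sock_set_flag() calls in the diff above become
smc_sock_set_flag() calls.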

Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 4/5] net/smc: protect connection state transitions in listen work
  2023-10-12 17:14   ` Wenjia Zhang
@ 2023-10-31  3:04     ` D. Wythe
  0 siblings, 0 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-31  3:04 UTC (permalink / raw)
  To: Wenjia Zhang, kgraul, jaka, wintera
  Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 10/13/23 1:14 AM, Wenjia Zhang wrote:
>
>
> On 11.10.23 09:33, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> Consider the following scenario:
>>
>>                 smc_close_passive_work
>> smc_listen_out_connected
>>                 lock_sock()
>> if (state  == SMC_INIT)
>>                 if (state  == SMC_INIT)
>>                     state = SMC_APPCLOSEWAIT1;
>>     state = SMC_ACTIVE
>>                 release_sock()
>>
>> This would cause the state machine of the connection to be corrupted.
>> Also, this issue can occur in smc_listen_out_err().
>>
>> To solve this problem, we can protect the state transitions under
>> the lock of sock to avoid collision.
>>
> On this fix, I have to repeat the question from Alexandra.
> Did the scenario occur in real life? Or is it just a potential
> problem you found during code review?
>

Hi Wenjia,

This is a real issue that occurred in our environment rather than
something found in code review.
Unfortunately, since this issue does not cause a panic, but rather
potential reference leaks, it is difficult for me to provide a very
intuitive error message.

> If it is the former one, could you please show us the corresponding 
> message, e.g. from dmesg? If it is the latter one, I'd like to deal 
> with it more carefully. Going from this scenario, I noticed that there 
> could also be other similar places where we need to make sure that no 
> race happens. Thus, it would make more sense to find a systematic 
> approach.
>

We agree that we should deal with it with more care. In fact, this issue
is very complex and we may spend a lot of time discussing it. Therefore,
I suggest that we temporarily drop it so that the patches we have
already agreed on can be accepted quickly. I will send the remaining
patches separately in the future.

Best Wishes,
D. Wythe

>> Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in 
>> af_smc")
>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>> ---
>>   net/smc/af_smc.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>> index 5ad2a9f..3bb8265 100644
>> --- a/net/smc/af_smc.c
>> +++ b/net/smc/af_smc.c
>> @@ -1926,8 +1926,10 @@ static void smc_listen_out_connected(struct 
>> smc_sock *new_smc)
>>   {
>>       struct sock *newsmcsk = &new_smc->sk;
>>   +    lock_sock(newsmcsk);
>>       if (newsmcsk->sk_state == SMC_INIT)
>>           newsmcsk->sk_state = SMC_ACTIVE;
>> +    release_sock(newsmcsk);
>>         smc_listen_out(new_smc);
>>   }
>> @@ -1939,9 +1941,12 @@ static void smc_listen_out_err(struct smc_sock 
>> *new_smc)
>>       struct net *net = sock_net(newsmcsk);
>> this_cpu_inc(net->smc.smc_stats->srv_hshake_err_cnt);
>> +
>> +    lock_sock(newsmcsk);
>>       if (newsmcsk->sk_state == SMC_INIT)
>>           sock_put(&new_smc->sk); /* passive closing */
>>       newsmcsk->sk_state = SMC_CLOSED;
>> +    release_sock(newsmcsk);
>>         smc_listen_out(new_smc);
>>   }


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2023-10-31  3:04 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-11  7:33 [PATCH net 0/5] net/smc: bugfixs for smc-r D. Wythe
2023-10-11  7:33 ` [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT D. Wythe
2023-10-11 14:00   ` Dust Li
2023-10-11 20:31   ` Wenjia Zhang
2023-10-12  2:47     ` D. Wythe
     [not found]     ` <f8089b26-bb11-f82d-8070-222b1f8c1db1@linux.alibaba.com>
2023-10-12 11:51       ` Wenjia Zhang
2023-10-13  5:32         ` Dust Li
2023-10-13 11:52           ` Wenjia Zhang
2023-10-13 12:27             ` Dust Li
2023-10-17  2:00               ` D. Wythe
2023-10-17  8:39                 ` Dust Li
2023-10-17 17:03                 ` Wenjia Zhang
     [not found]                   ` <4065e94f-f7ea-7943-e2cc-0c7d3f9c788b@linux.alibaba.com>
2023-10-19 11:54                     ` Wenjia Zhang
2023-10-23 20:53   ` Wenjia Zhang
2023-10-11  7:33 ` [PATCH net 2/5] net/smc: fix incorrect barrier usage D. Wythe
2023-10-11  8:44   ` Heiko Carstens
2023-10-11  8:57     ` D. Wythe
2023-10-11  7:33 ` [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc D. Wythe
2023-10-11 20:37   ` Wenjia Zhang
2023-10-12  2:49     ` D. Wythe
2023-10-12 15:15       ` Wenjia Zhang
2023-10-11  7:33 ` [PATCH net 4/5] net/smc: protect connection state transitions in listen work D. Wythe
2023-10-12 17:14   ` Wenjia Zhang
2023-10-31  3:04     ` D. Wythe
2023-10-11  7:33 ` [PATCH net 5/5] net/smc: put sk reference if close work was canceled D. Wythe
2023-10-11 14:54   ` Dust Li
2023-10-12 19:04   ` Wenjia Zhang
     [not found]     ` <ee641ca5-104b-d1ec-5b2a-e20237c5378a@linux.alibaba.com>
2023-10-18 20:26       ` Wenjia Zhang
2023-10-19  7:33         ` D. Wythe
2023-10-19 17:40           ` Wenjia Zhang
2023-10-20  2:41             ` D. Wythe
2023-10-23  8:19               ` Wenjia Zhang
2023-10-23  8:52                 ` D. Wythe
2023-10-23 10:28                   ` Wenjia Zhang
2023-10-23 11:56                     ` Dust Li
     [not found]                     ` <59c0c75f-e9df-2ef1-ead2-7c5c97f3e750@linux.alibaba.com>
2023-10-23 20:52                       ` Wenjia Zhang
2023-10-12 13:43 ` [PATCH net 0/5] net/smc: bugfixs for smc-r Alexandra Winter
2023-10-17  1:56   ` D. Wythe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).