linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] Two RFC patches for the same SMC socket wait queue mismatch issue
@ 2021-11-10 12:50 Wen Gu
  2021-11-10 12:50 ` [RFC PATCH net v2 1/2] net/smc: Fix socket wait queue mismatch issue caused by fallback Wen Gu
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Wen Gu @ 2021-11-10 12:50 UTC (permalink / raw)
  To: kgraul, tonylu
  Cc: davem, kuba, linux-s390, netdev, linux-kernel, dust.li, xuanzhuo, guwen

Hi, Karsten

Thanks for your reply. The previous discussion about the issue of socket
wait queue mismatch in SMC fallback can be referred from:
https://lore.kernel.org/all/db9acf73-abef-209e-6ec2-8ada92e2cfbc@linux.ibm.com/

This set of patches includes two RFC patches, they are both aimed to fix
the same issue, the mismatch of socket wait queue in SMC fallback.

In your last reply, I am suggested to add the complete description about
the intention of initial patch in order that readers can understand the
idea behind it. This has been done in "[RFC PATCH net v2 0/2] net/smc: Fix
socket wait queue mismatch issue caused by fallback" of this mail.

Unfortunately, I found a defect later in the solution of the initial patch
or the v2 patch mentioned above. The defect is about fasync_list and related
to 67f562e3e14 ("net/smc: transfer fasync_list in case of fallback").

When user applications use sock_fasync() to insert entries into fasync_list,
the wait queue they operate is smc socket->wq. But in initial patch or
the v2 patch, I swapped sk->sk_wq of smc socket and clcsocket in smc_create(),
thus the sk_data_ready / sk_write_space.. of smc will wake up clcsocket->wq
finally. So the entries added into smc socket->wq.fasync_list won't be woken
up at all before fallback.

So the solution in initial patch or the v2 patch of this mail by swapping
sk->sk_wq of smc socket and clcsocket seems a bad way to fix this issue.

Therefore, I tried another solution by removing the wait queue entries from
smc socket->wq to clcsocket->wq during the fallback, which is described in the
"[RFC PATCH net 2/2] net/smc: Transfer remaining wait queue entries" of this
mail. In our test environment, this patch can fix the fallback issue well.

I am looking forward to hear your opinions. Thank you.

Cheers,
Wen Gu

Wen Gu (2):
  net/smc: Fix socket wait queue mismatch issue caused by fallback
  net/smc: Transfer remaining wait queue entries


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC PATCH net v2 1/2] net/smc: Fix socket wait queue mismatch issue caused by fallback
  2021-11-10 12:50 [RFC PATCH 0/2] Two RFC patches for the same SMC socket wait queue mismatch issue Wen Gu
@ 2021-11-10 12:50 ` Wen Gu
  2021-11-10 12:50 ` [RFC PATCH net 2/2] net/smc: Transfer remaining wait queue entries during fallback Wen Gu
  2021-11-11 14:21 ` [RFC PATCH 0/2] Two RFC patches for the same SMC socket wait queue mismatch issue Karsten Graul
  2 siblings, 0 replies; 5+ messages in thread
From: Wen Gu @ 2021-11-10 12:50 UTC (permalink / raw)
  To: kgraul, tonylu
  Cc: davem, kuba, linux-s390, netdev, linux-kernel, dust.li, xuanzhuo, guwen

There may be a mismatch of socket wait queue caused by SMC fallback.

When user applications used SMC to replace TCP, they might add
an eppoll_entry into smc socket->wq and expect to be notified if
epoll events like EPOLL_IN/EPOLL_OUT occurred on the smc socket.

However, once a fallback occurred, user applications start to use
clcsocket. So it is clcsocket->wq instead of smc socket->wq which
is woken up when there is data to be processed or sending space
available. Thus the eppoll_entry which was added into smc socket->wq
before fallback does not work anymore.

[if not fallback]
   application
        ^
      (poll)
        |
  smc socket->wq            clcsocket->wq
(has eppoll_entry)                .
        .                         .
        .                         .
smc socket->sk->sk_wq    clcsocket->sk->sk_wq
        ^                         ^
        | (data flow)             |(tcp handshake in rendezvous)
        |                         |
|-------------------------------------------|
|   sk_data_ready / sk_write_space ..       |
|-------------------------------------------|

[if fallback]
   application <------------------|
                        (can't poll anything)
                                  |
  smc socket->wq            clcsocket->wq
(has eppoll_entry)                .
        .                         .
        .                         .
smc socket->sk->sk_wq    clcsocket->sk->sk_wq
                                  ^
                                  |(data flow)
                                  |
|-------------------------------------------|
|   sk_data_ready / sk_write_space ..       |
|-------------------------------------------|

For example, in nginx/wrk benchmark, this issue will cause an
all-zeros test result:

server: nginx -g 'daemon off;'
client: smc_run wrk -c 1 -t 1 -d 5 http://11.200.15.93/index.html

  Running 5s test @ http://11.200.15.93/index.html
     1 threads and 1 connections
     Thread Stats   Avg      Stdev     Max   ± Stdev
     	Latency     0.00us    0.00us   0.00us    -nan%
	Req/Sec     0.00      0.00     0.00      -nan%
	0 requests in 5.00s, 0.00B read
     Requests/sec:      0.00
     Transfer/sec:       0.00B

The main idea of this patch is that since wait queue is switched
from smc socket->wq to clcsocket->wq after fallback, disabling the
eppoll_entry in smc socket->wq, maybe we can try to use clcsocket->wq
from the beginning.

So this patch set smc socket->sk->sk_wq to clcsocket->wq when smc
socket is created and reset clcsocket->sk->sk_wq to clcsocket->wq
once fallback occurrs, making clcsocket->wq the constant wait queue
that user applications use. Thus the user application can be
notified by the eppoll_entry in clcsocket->wq no matter whether
a fallback occurrs.

After this patch:

[if not fallback]
   application <------------------|
                                (poll)
                                  |
  smc socket->wq            clcsocket->wq
        .                 (has eppoll_entry)
            `   .                 .
                    `   . .   `
                  .   `    `   .
               `                  `
smc socket->sk->sk_wq    clcsocket->sk->sk_wq
        ^                         ^
        |(data flow)              |(tcp handshake in rendezvous)
        |                         |
|-------------------------------------------|
|   sk_data_ready / sk_write_space ..       |
|-------------------------------------------|

[if fallback]
   application <------------------|
                                (poll)
                                  |
  smc socket->wq            clcsocket->wq
        .                 (has eppoll_entry)
            `   .                 .
                    `   .         .
                           `   .  .
                                  `
smc socket->sk->sk_wq    clcsocket->sk->sk_wq
                                  ^
                                  |(data flow)
                                  |
|-------------------------------------------|
|   sk_data_ready / sk_write_space ..       |
|-------------------------------------------|

Link: https://lore.kernel.org/all/db9acf73-abef-209e-6ec2-8ada92e2cfbc@linux.ibm.com/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
---
v1 -> v2
- Add the complete description about the intention of initial patch.
---
 net/smc/af_smc.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 0cf7ed2..fe5a2cd 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -566,6 +566,20 @@ static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
 	smc->fallback_rsn = reason_code;
 	smc_stat_fallback(smc);
 	trace_smc_switch_to_fallback(smc, reason_code);
+
+	/* We swapped sk->sk_wq of clcsocket and smc socket in smc_create()
+	 * to prevent mismatch of wait queue caused by fallback.
+	 *
+	 * If fallback really occurred, user application starts to use clcsocket.
+	 * Accordingly, clcsocket->sk->sk_wq should be reset to clcsocket->wq
+	 * in order that user application still uses the same wait queue as what
+	 * was used before fallback.
+	 *
+	 * After fallback, the relation between socket->wq and socket->sk->sk_wq is:
+	 * - clcsocket->sk->sk_wq  -->  clcsocket->wq
+	 * - smcsocket->sk->sk_wq  -->  clcsocket->wq
+	 */
+	rcu_assign_pointer(smc->clcsock->sk->sk_wq, &smc->clcsock->wq);
 	if (smc->sk.sk_socket && smc->sk.sk_socket->file) {
 		smc->clcsock->file = smc->sk.sk_socket->file;
 		smc->clcsock->file->private_data = smc->clcsock;
@@ -2174,6 +2188,12 @@ static int smc_accept(struct socket *sock, struct socket *new_sock,
 	if (rc)
 		goto out;
 
+	/* The new accepted smc sock sets sk->sk_wq to clcsocket->wq like what
+	 * smc_create() did when the fallback does not occur.
+	 */
+	if (!smc_sk(nsk)->use_fallback)
+		rcu_assign_pointer(nsk->sk_wq, &smc_sk(nsk)->clcsock->wq);
+
 	if (lsmc->sockopt_defer_accept && !(flags & O_NONBLOCK)) {
 		/* wait till data arrives on the socket */
 		timeo = msecs_to_jiffies(lsmc->sockopt_defer_accept *
@@ -2310,8 +2330,15 @@ static __poll_t smc_poll(struct file *file, struct socket *sock,
 		mask = smc->clcsock->ops->poll(file, smc->clcsock, wait);
 		sk->sk_err = smc->clcsock->sk->sk_err;
 	} else {
-		if (sk->sk_state != SMC_CLOSED)
-			sock_poll_wait(file, sock, wait);
+		if (sk->sk_state != SMC_CLOSED) {
+			/* SMC always uses clcsocket->wq for the call to
+			 * sock_poll_wait(), which is the same wait queue
+			 * as what smc socket->sk->sk_wq points to before
+			 * fallback or what clcsocket->sk->sk_wq points to
+			 * after fallback.
+			 */
+			sock_poll_wait(file, smc->clcsock, wait);
+		}
 		if (sk->sk_err)
 			mask |= EPOLLERR;
 		if ((sk->sk_shutdown == SHUTDOWN_MASK) ||
@@ -2707,6 +2734,31 @@ static int smc_create(struct net *net, struct socket *sock, int protocol,
 	smc->sk.sk_sndbuf = max(smc->clcsock->sk->sk_sndbuf, SMC_BUF_MIN_SIZE);
 	smc->sk.sk_rcvbuf = max(smc->clcsock->sk->sk_rcvbuf, SMC_BUF_MIN_SIZE);
 
+	/* To ensure that user applications consistently use the same wait queue
+	 * before and after fallback, we set smc socket->sk->sk_wq to clcsocket->wq
+	 * here and reset clcsocket->sk->sk_wq to clcsocket->wq in
+	 * smc_switch_to_fallback() if fallback occurrs, making clcsocket->wq the
+	 * constant wait queue which user applications use.
+	 *
+	 * here:
+	 * - Set smc socket->sk->sk_wq to clcsocket->wq
+	 *   So that the sk_data_ready()/sk_write_space.. will wake up clcsocket->wq
+	 *   and user applications will be notified of epoll events if they added
+	 *   eppoll_entry into clcsocket->wq through smc_poll().
+	 *
+	 * - Set clcsocket->sk->sk_wq to smc socket->wq
+	 *   Since clcsocket->wq is occupied by smcsocket->sk->sk_wq, clcsocket->
+	 *   sk->sk_wq have to use another wait queue (smc socket->wq) during TCP
+	 *   handshake or CLC messages transmission in rendezvous.
+	 *
+	 * So the relation between socket->wq and socket->sk->sk_wq before fallback is:
+	 *
+	 * - smc socket->sk->sk_wq --> clcsocket->wq
+	 * - clcsocket->sk->sk_wq  --> smc socket->wq
+	 */
+	rcu_assign_pointer(sk->sk_wq, &smc->clcsock->wq);
+	rcu_assign_pointer(smc->clcsock->sk->sk_wq, &smc->sk.sk_socket->wq);
+
 out:
 	return rc;
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC PATCH net 2/2] net/smc: Transfer remaining wait queue entries during fallback
  2021-11-10 12:50 [RFC PATCH 0/2] Two RFC patches for the same SMC socket wait queue mismatch issue Wen Gu
  2021-11-10 12:50 ` [RFC PATCH net v2 1/2] net/smc: Fix socket wait queue mismatch issue caused by fallback Wen Gu
@ 2021-11-10 12:50 ` Wen Gu
  2021-11-11 14:21 ` [RFC PATCH 0/2] Two RFC patches for the same SMC socket wait queue mismatch issue Karsten Graul
  2 siblings, 0 replies; 5+ messages in thread
From: Wen Gu @ 2021-11-10 12:50 UTC (permalink / raw)
  To: kgraul, tonylu
  Cc: davem, kuba, linux-s390, netdev, linux-kernel, dust.li, xuanzhuo, guwen

The SMC fallback is incomplete currently. There may be some
wait queue entries remaining in smc socket->wq, which should
be removed to clcsocket->wq during the fallback.

For example, in nginx/wrk benchmark, this issue causes an
all-zeros test result:

server: nginx -g 'daemon off;'
client: smc_run wrk -c 1 -t 1 -d 5 http://11.200.15.93/index.html

  Running 5s test @ http://11.200.15.93/index.html
     1 threads and 1 connections
     Thread Stats   Avg      Stdev     Max   ± Stdev
     	Latency     0.00us    0.00us   0.00us    -nan%
	Req/Sec     0.00      0.00     0.00      -nan%
	0 requests in 5.00s, 0.00B read
     Requests/sec:      0.00
     Transfer/sec:       0.00B

The reason for this all-zeros result is that when wrk used SMC
to replace TCP, it added an eppoll_entry into smc socket->wq
and expected to be notified if epoll events like EPOLL_IN/
EPOLL_OUT occurred on the smc socket.

However, once a fallback occurred, wrk switches to use clcsocket.
Now it is clcsocket->wq instead of smc socket->wq which will
be woken up. The eppoll_entry remaining in smc socket->wq does
not work anymore and wrk stops the test.

This patch fixes this issue by removing remaining wait queue
entries from smc socket->wq to clcsocket->wq during the fallback.

Link: https://lore.kernel.org/all/db9acf73-abef-209e-6ec2-8ada92e2cfbc@linux.ibm.com/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
---
 net/smc/af_smc.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 0cf7ed2..11a966a 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -562,6 +562,10 @@ static void smc_stat_fallback(struct smc_sock *smc)
 
 static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
 {
+	wait_queue_head_t *smc_wait = sk_sleep(&smc->sk);
+	wait_queue_head_t *clc_wait = sk_sleep(smc->clcsock->sk);
+	unsigned long flags;
+
 	smc->use_fallback = true;
 	smc->fallback_rsn = reason_code;
 	smc_stat_fallback(smc);
@@ -571,6 +575,16 @@ static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
 		smc->clcsock->file->private_data = smc->clcsock;
 		smc->clcsock->wq.fasync_list =
 			smc->sk.sk_socket->wq.fasync_list;
+
+		/* There might be some wait queue entries remaining
+		 * in smc socket->wq, which should be removed to
+		 * clcsocket->wq during the fallback.
+		 */
+		spin_lock_irqsave(&smc_wait->lock, flags);
+		spin_lock_irqsave(&clc_wait->lock, flags);
+		list_splice_init(&smc_wait->head, &clc_wait->head);
+		spin_unlock_irqrestore(&clc_wait->lock, flags);
+		spin_unlock_irqrestore(&smc_wait->lock, flags);
 	}
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 0/2] Two RFC patches for the same SMC socket wait queue mismatch issue
  2021-11-10 12:50 [RFC PATCH 0/2] Two RFC patches for the same SMC socket wait queue mismatch issue Wen Gu
  2021-11-10 12:50 ` [RFC PATCH net v2 1/2] net/smc: Fix socket wait queue mismatch issue caused by fallback Wen Gu
  2021-11-10 12:50 ` [RFC PATCH net 2/2] net/smc: Transfer remaining wait queue entries during fallback Wen Gu
@ 2021-11-11 14:21 ` Karsten Graul
  2021-11-12  3:09   ` Wen Gu
  2 siblings, 1 reply; 5+ messages in thread
From: Karsten Graul @ 2021-11-11 14:21 UTC (permalink / raw)
  To: Wen Gu, tonylu
  Cc: davem, kuba, linux-s390, netdev, linux-kernel, dust.li, xuanzhuo

On 10/11/2021 13:50, Wen Gu wrote:
> Hi, Karsten
> 
> Thanks for your reply. The previous discussion about the issue of socket
> wait queue mismatch in SMC fallback can be referred from:
> https://lore.kernel.org/all/db9acf73-abef-209e-6ec2-8ada92e2cfbc@linux.ibm.com/
> 
> This set of patches includes two RFC patches, they are both aimed to fix
> the same issue, the mismatch of socket wait queue in SMC fallback.
> 
> In your last reply, I am suggested to add the complete description about
> the intention of initial patch in order that readers can understand the
> idea behind it. This has been done in "[RFC PATCH net v2 0/2] net/smc: Fix
> socket wait queue mismatch issue caused by fallback" of this mail.
> 
> Unfortunately, I found a defect later in the solution of the initial patch
> or the v2 patch mentioned above. The defect is about fasync_list and related
> to 67f562e3e14 ("net/smc: transfer fasync_list in case of fallback").
> 
> When user applications use sock_fasync() to insert entries into fasync_list,
> the wait queue they operate is smc socket->wq. But in initial patch or
> the v2 patch, I swapped sk->sk_wq of smc socket and clcsocket in smc_create(),
> thus the sk_data_ready / sk_write_space.. of smc will wake up clcsocket->wq
> finally. So the entries added into smc socket->wq.fasync_list won't be woken
> up at all before fallback.
> 
> So the solution in initial patch or the v2 patch of this mail by swapping
> sk->sk_wq of smc socket and clcsocket seems a bad way to fix this issue.
> 
> Therefore, I tried another solution by removing the wait queue entries from
> smc socket->wq to clcsocket->wq during the fallback, which is described in the
> "[RFC PATCH net 2/2] net/smc: Transfer remaining wait queue entries" of this
> mail. In our test environment, this patch can fix the fallback issue well.

Still running final tests but overall its working well here, too.
Until we maybe find a 'cleaner' solution if this I would like to go with your
current fixes. But I would like to improve the wording of the commit message and
the comments a little bit if you are okay with that.

If you send a new series with the 2 patches then I would take them and post them
to the list again with my changes.

What do you think?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 0/2] Two RFC patches for the same SMC socket wait queue mismatch issue
  2021-11-11 14:21 ` [RFC PATCH 0/2] Two RFC patches for the same SMC socket wait queue mismatch issue Karsten Graul
@ 2021-11-12  3:09   ` Wen Gu
  0 siblings, 0 replies; 5+ messages in thread
From: Wen Gu @ 2021-11-12  3:09 UTC (permalink / raw)
  To: Karsten Graul, tonylu
  Cc: davem, kuba, linux-s390, netdev, linux-kernel, dust.li, xuanzhuo



On 2021/11/11 10:21 pm, Karsten Graul wrote:
> On 10/11/2021 13:50, Wen Gu wrote:
>> Hi, Karsten
>>
>> Thanks for your reply. The previous discussion about the issue of socket
>> wait queue mismatch in SMC fallback can be referred from:
>> https://lore.kernel.org/all/db9acf73-abef-209e-6ec2-8ada92e2cfbc@linux.ibm.com/
>>
>> This set of patches includes two RFC patches, they are both aimed to fix
>> the same issue, the mismatch of socket wait queue in SMC fallback.
>>
>> In your last reply, I am suggested to add the complete description about
>> the intention of initial patch in order that readers can understand the
>> idea behind it. This has been done in "[RFC PATCH net v2 0/2] net/smc: Fix
>> socket wait queue mismatch issue caused by fallback" of this mail.
>>
>> Unfortunately, I found a defect later in the solution of the initial patch
>> or the v2 patch mentioned above. The defect is about fasync_list and related
>> to 67f562e3e14 ("net/smc: transfer fasync_list in case of fallback").
>>
>> When user applications use sock_fasync() to insert entries into fasync_list,
>> the wait queue they operate is smc socket->wq. But in initial patch or
>> the v2 patch, I swapped sk->sk_wq of smc socket and clcsocket in smc_create(),
>> thus the sk_data_ready / sk_write_space.. of smc will wake up clcsocket->wq
>> finally. So the entries added into smc socket->wq.fasync_list won't be woken
>> up at all before fallback.
>>
>> So the solution in initial patch or the v2 patch of this mail by swapping
>> sk->sk_wq of smc socket and clcsocket seems a bad way to fix this issue.
>>
>> Therefore, I tried another solution by removing the wait queue entries from
>> smc socket->wq to clcsocket->wq during the fallback, which is described in the
>> "[RFC PATCH net 2/2] net/smc: Transfer remaining wait queue entries" of this
>> mail. In our test environment, this patch can fix the fallback issue well.
> 
> Still running final tests but overall its working well here, too.
> Until we maybe find a 'cleaner' solution if this I would like to go with your
> current fixes. But I would like to improve the wording of the commit message and
> the comments a little bit if you are okay with that.
> 
> If you send a new series with the 2 patches then I would take them and post them
> to the list again with my changes.

Seems just the second patch alone will fix the issue.

> 
> What do you think?
> 

Thanks for your reply. I am glad that the second patch works well.

To avoid there being any misunderstanding between us, I want to explain 
that just the second patch "[RFC PATCH net 2/2] net/smc: Transfer 
remaining wait queue entries" alone will fix the issue well.

Because it transfers the remaining entries in smc socket->wq to 
clcsocket->wq during the fallback, so that the entries added into smc 
socket->wq before fallback will still works after fallback, even though 
user applications start to use clcsocket.


The first patch "[RFC PATCH net v2 0/2] net/smc: Fix socket wait queue 
mismatch issue caused by fallback" should be abandoned.

I sent it only to better explain the defect I found in my initial patch 
or this v2 patch. Hope it didn't bother you. Swapping the sk->sk_wq 
seems a bad way to fix the issue because it can not handle the 
fasync_list well. Unfortunately I found this defect until I almost 
finished it :(

So, I think maybe it is fine that just send the second patch "[RFC PATCH 
net 2/2] net/smc: Transfer remaining wait queue entries" again. I will 
send it later.

And, it is okay for me if you want to improve the commit messages or 
comments.

Thank you.

Cheers,
Wen Gu

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-11-12  3:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-10 12:50 [RFC PATCH 0/2] Two RFC patches for the same SMC socket wait queue mismatch issue Wen Gu
2021-11-10 12:50 ` [RFC PATCH net v2 1/2] net/smc: Fix socket wait queue mismatch issue caused by fallback Wen Gu
2021-11-10 12:50 ` [RFC PATCH net 2/2] net/smc: Transfer remaining wait queue entries during fallback Wen Gu
2021-11-11 14:21 ` [RFC PATCH 0/2] Two RFC patches for the same SMC socket wait queue mismatch issue Karsten Graul
2021-11-12  3:09   ` Wen Gu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).