All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] p9: trans_fd: Fix deadlock when connection cancel
@ 2022-08-31 18:09 Schspa Shi
  2022-08-31 20:42 ` asmadeus
  0 siblings, 1 reply; 5+ messages in thread
From: Schspa Shi @ 2022-08-31 18:09 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus, linux_oss, davem, edumazet, kuba, pabeni
  Cc: v9fs-developer, netdev, linux-kernel, Schspa Shi

There is a deadlock condition when connection canceled.

Please refer to the following scenarios.
           task 0                task1
------------------------------------------------------------------
p9_client_rpc
  req = p9_client_prepare_req(c, type, c->msize, fmt, ap);
  // refcount = 2
  err = c->trans_mod->request(c, req);
  // req was added to unsent_req_list
  wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);

                           p9_read_work
                             // IO error happen
                           error:
                             p9_conn_cancel(m, err);
                              spin_lock(&m->client->lock);
                              // hold client->lock now
                              p9_client_cb
                                req->status = REQ_STATUS_ERROR;
                                wake_up(&req->wq);
                                // task 0 wakeup
                                << preempted >>

reterr:
	p9_req_put(c, req);
    // refcount = 1 now
                                << got scheduled >>
                                p9_req_put
                                  // refcount = 0
                                  p9_tag_remove(c, r);
                                    spin_lock_irqsave(&c->lock, flags);
                           ------------- deadlock -------------

[  651.564169] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  651.564176] rcu:     3-...0: (8 ticks this GP) idle=40b4/1/0x4000000000000000 softirq=1289/1290 fqs=83762
[  651.564185]  (detected by 2, t=420047 jiffies, g=1601, q=992 ncpus=4)
[  651.564190] Sending NMI from CPU 2 to CPUs 3:
[  651.539301] NMI backtrace for cpu 3
[  651.539301] CPU: 3 PID: 46 Comm: kworker/3:1 Not tainted 6.0.0-rc2-rt3-00493-g2af9a9504166 #3
[  651.539301] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  651.539301] Workqueue: events p9_read_work
[  651.539301] RIP: 0010:queued_spin_lock_slowpath+0xfc/0x590
[  651.539301] Code: 00 00 00 65 48 2b 04 25 28 00 00 00 0f 85 a5 04 00 00 48 81 c4 88 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc0
[  651.539301] RSP: 0018:ffff888002987ad8 EFLAGS: 00000002
[  651.539301] RAX: 0000000000000000 RBX: 0000000000000001 RCX: dffffc0000000000
[  651.539301] RDX: 0000000000000003 RSI: 0000000000000004 RDI: ffff888004adf600
[  651.539301] RBP: ffff888004adf600 R08: ffffffff81d341a0 R09: ffff888004adf603
[  651.539301] R10: ffffed100095bec0 R11: 0000000000000001 R12: 0000000000000001
[  651.539301] R13: 1ffff11000530f5c R14: ffff888004adf600 R15: ffff888002987c38
[  651.539301] FS:  0000000000000000(0000) GS:ffff888036580000(0000) knlGS:0000000000000000
[  651.539301] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  651.539301] CR2: 00007fe012d608dc CR3: 000000000bc16000 CR4: 0000000000350ee0
[  651.539301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  651.539301] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  651.539301] Call Trace:
[  651.539301]  <TASK>
[  651.539301]  ? osq_unlock+0x100/0x100
[  651.539301]  ? ret_from_fork+0x1f/0x30
[  651.539301]  do_raw_spin_lock+0x196/0x1a0
[  651.539301]  ? spin_bug+0x90/0x90
[  651.539301]  ? do_raw_spin_lock+0x114/0x1a0
[  651.539301]  _raw_spin_lock_irqsave+0x1c/0x30
[  651.539301]  p9_req_put+0x61/0x130
[  651.539301]  p9_conn_cancel+0x321/0x3b0
[  651.539301]  ? p9_conn_create+0x1f0/0x1f0
[  651.539301]  p9_read_work+0x207/0x7d0
[  651.539301]  ? p9_fd_create+0x1d0/0x1d0
[  651.539301]  ? spin_bug+0x90/0x90
[  651.539301]  ? read_word_at_a_time+0xe/0x20
[  651.539301]  process_one_work+0x420/0x720
[  651.539301]  worker_thread+0x2b9/0x700
[  651.539301]  ? rescuer_thread+0x620/0x620
[  651.539301]  kthread+0x176/0x1b0
[  651.539301]  ? kthread_complete_and_exit+0x20/0x20
[  651.539301]  ret_from_fork+0x1f/0x30

To fix it, we can add extra reference counter to avoid deadlock, and
decrease it after we unlock the client->lock.

Fixes: 67dd8e445ee0 ("9p: roll p9_tag_remove into p9_req_put")

Signed-off-by: Schspa Shi <schspa@gmail.com>
---
 net/9p/trans_fd.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index e758978b44bee..2e4e039b38e3e 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -205,14 +205,19 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
 		list_move(&req->req_list, &cancel_list);
 	}
 
-	list_for_each_entry_safe(req, rtmp, &cancel_list, req_list) {
+	list_for_each_entry(req, &cancel_list, req_list) {
 		p9_debug(P9_DEBUG_ERROR, "call back req %p\n", req);
-		list_del(&req->req_list);
 		if (!req->t_err)
 			req->t_err = err;
+		p9_req_get(req);
 		p9_client_cb(m->client, req, REQ_STATUS_ERROR);
 	}
 	spin_unlock(&m->client->lock);
+
+	list_for_each_entry_safe(req, rtmp, &cancel_list, req_list) {
+		list_del(&req->req_list);
+		p9_req_put(m->client, req);
+	}
 }
 
 static __poll_t
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] p9: trans_fd: Fix deadlock when connection cancel
  2022-08-31 18:09 [PATCH] p9: trans_fd: Fix deadlock when connection cancel Schspa Shi
@ 2022-08-31 20:42 ` asmadeus
  2022-09-01  2:55   ` Schspa Shi
  0 siblings, 1 reply; 5+ messages in thread
From: asmadeus @ 2022-08-31 20:42 UTC (permalink / raw)
  To: Schspa Shi
  Cc: ericvh, lucho, linux_oss, davem, edumazet, kuba, pabeni,
	v9fs-developer, netdev, linux-kernel

Schspa Shi wrote on Thu, Sep 01, 2022 at 02:09:50AM +0800:
> To fix it, we can add extra reference counter to avoid deadlock, and
> decrease it after we unlock the client->lock.

Thanks for the patch!

Unfortunately I already sent a slightly different version to the list,
hidden in another syzbot thread, here:
https://lkml.kernel.org/r/YvyD053bdbGE9xoo@codewreck.org

(yes, sorry, not exactly somewhere I'd expect someone to find it... 9p
hasn't had many contributors recently)


Basically instead of taking an extra lock I just released the client
lock before calling p9_client_cb, so it shouldn't hang anymore.

We don't need the lock to call the cb as in p9_conn_cancel we already
won't accept any new request and by this point the requests are in a
local list that isn't shared anywhere.

If you have a test setup, would you mind testing my patch?
That's the main reason I was delaying pushing it.

Since you went out of your way to make this patch if you agree with my
approach I don't mind adding your sign off or another mark of having
worked on it.

Thank you,
--
Dominique

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] p9: trans_fd: Fix deadlock when connection cancel
  2022-08-31 20:42 ` asmadeus
@ 2022-09-01  2:55   ` Schspa Shi
  2022-09-01 15:27     ` Christian Schoenebeck
  0 siblings, 1 reply; 5+ messages in thread
From: Schspa Shi @ 2022-09-01  2:55 UTC (permalink / raw)
  To: asmadeus
  Cc: ericvh, lucho, linux_oss, davem, edumazet, kuba, pabeni,
	v9fs-developer, netdev, linux-kernel


asmadeus@codewreck.org writes:

> Schspa Shi wrote on Thu, Sep 01, 2022 at 02:09:50AM +0800:
>> To fix it, we can add extra reference counter to avoid deadlock, and
>> decrease it after we unlock the client->lock.
>
> Thanks for the patch!
>
> Unfortunately I already sent a slightly different version to the list,
> hidden in another syzbot thread, here:
> https://lkml.kernel.org/r/YvyD053bdbGE9xoo@codewreck.org
>
> (yes, sorry, not exactly somewhere I'd expect someone to find it... 9p
> hasn't had many contributors recently)
>
>
> Basically instead of taking an extra lock I just released the client
> lock before calling p9_client_cb, so it shouldn't hang anymore.
>
> We don't need the lock to call the cb as in p9_conn_cancel we already
> won't accept any new request and by this point the requests are in a
> local list that isn't shared anywhere.
>

Ok, thank you for pointing that out.

> If you have a test setup, would you mind testing my patch?
> That's the main reason I was delaying pushing it.
>

I have test it with my enviroment, it not hang anymore.

> Since you went out of your way to make this patch if you agree with my
> approach I don't mind adding your sign off or another mark of having
> worked on it.
>
> Thank you,

-- 
BRs
Schspa Shi

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] p9: trans_fd: Fix deadlock when connection cancel
  2022-09-01  2:55   ` Schspa Shi
@ 2022-09-01 15:27     ` Christian Schoenebeck
  2022-09-04  6:42       ` asmadeus
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Schoenebeck @ 2022-09-01 15:27 UTC (permalink / raw)
  To: asmadeus, Schspa Shi
  Cc: ericvh, lucho, davem, edumazet, kuba, pabeni, v9fs-developer,
	netdev, linux-kernel

On Donnerstag, 1. September 2022 04:55:36 CEST Schspa Shi wrote:
> asmadeus@codewreck.org writes:
> > Schspa Shi wrote on Thu, Sep 01, 2022 at 02:09:50AM +0800:
> >> To fix it, we can add extra reference counter to avoid deadlock, and
> >> decrease it after we unlock the client->lock.
> > 
> > Thanks for the patch!
> > 
> > Unfortunately I already sent a slightly different version to the list,
> > hidden in another syzbot thread, here:
> > https://lkml.kernel.org/r/YvyD053bdbGE9xoo@codewreck.org
> > 
> > (yes, sorry, not exactly somewhere I'd expect someone to find it... 9p
> > hasn't had many contributors recently)
> > 
> > 
> > Basically instead of taking an extra lock I just released the client
> > lock before calling p9_client_cb, so it shouldn't hang anymore.
> > 
> > We don't need the lock to call the cb as in p9_conn_cancel we already
> > won't accept any new request and by this point the requests are in a
> > local list that isn't shared anywhere.
> 
> Ok, thank you for pointing that out.
> 
> > If you have a test setup, would you mind testing my patch?
> > That's the main reason I was delaying pushing it.
> 
> I have test it with my enviroment, it not hang anymore.

Are you fine with that Dominique, or do you want me to test your linked patch 
as well?

You can also explicitly tell me if you need something to be reviewed/tested.

> > Since you went out of your way to make this patch if you agree with my
> > approach I don't mind adding your sign off or another mark of having
> > worked on it.
> > 
> > Thank you,





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] p9: trans_fd: Fix deadlock when connection cancel
  2022-09-01 15:27     ` Christian Schoenebeck
@ 2022-09-04  6:42       ` asmadeus
  0 siblings, 0 replies; 5+ messages in thread
From: asmadeus @ 2022-09-04  6:42 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Schspa Shi, ericvh, lucho, davem, edumazet, kuba, pabeni,
	v9fs-developer, netdev, linux-kernel

Christian Schoenebeck wrote on Thu, Sep 01, 2022 at 05:27:53PM +0200:
> > > If you have a test setup, would you mind testing my patch?
> > > That's the main reason I was delaying pushing it.
> > 
> > I have test it with my enviroment, it not hang anymore.
> 
> Are you fine with that Dominique, or do you want me to test your linked patch 
> as well?
> 
> You can also explicitly tell me if you need something to be reviewed/tested.

I've just resent both patches properly; that should be better for
everyone. It can't hurt to get more tests :)

I don't think we'll catch anything with Tetsuo Handa's other two fixes
as we don't really test trans_fd all that much, so I'll give it a spin
with ganesha on my end when I can find time.

Thanks!
--
Dominiquem

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-09-04  6:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-31 18:09 [PATCH] p9: trans_fd: Fix deadlock when connection cancel Schspa Shi
2022-08-31 20:42 ` asmadeus
2022-09-01  2:55   ` Schspa Shi
2022-09-01 15:27     ` Christian Schoenebeck
2022-09-04  6:42       ` asmadeus

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.