* iouring locking issue in io_req_complete_post() /  io_rsrc_node_ref_zero()
From: Nadav Amit @ 2021-08-09  4:36 UTC
  To: Jens Axboe; +Cc: io-uring

Jens, others,

Sorry for bothering again, but I encountered a lockdep assertion failure:

[  106.009878] ------------[ cut here ]------------
[  106.012487] WARNING: CPU: 2 PID: 1777 at kernel/softirq.c:364 __local_bh_enable_ip+0xaa/0xe0
[  106.014524] Modules linked in:
[  106.015174] CPU: 2 PID: 1777 Comm: umem Not tainted 5.13.1+ #161
[  106.016653] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
[  106.018959] RIP: 0010:__local_bh_enable_ip+0xaa/0xe0
[  106.020344] Code: a9 00 ff ff 00 74 38 65 ff 0d a2 21 8c 7a e8 ed 1a 20 00 fb 66 0f 1f 44 00 00 5b 41 5c 5d c3 65 8b 05 e6 2d 8c 7a 85 c0 75 9a <0f> 0b eb 96 e8 2d 1f 20 00 eb a5 4c 89 e7 e8 73 4f 0c 00 eb ae 65
[  106.026258] RSP: 0018:ffff88812e58fcc8 EFLAGS: 00010046
[  106.028143] RAX: 0000000000000000 RBX: 0000000000000201 RCX: dffffc0000000000
[  106.029626] RDX: 0000000000000007 RSI: 0000000000000201 RDI: ffffffff8898c5ac
[  106.031340] RBP: ffff88812e58fcd8 R08: ffffffff8575dbbf R09: ffffed1028ef14f9
[  106.032938] R10: ffff88814778a7c3 R11: ffffed1028ef14f8 R12: ffffffff85c9e9ae
[  106.034363] R13: ffff88814778a000 R14: ffff88814778a7b0 R15: ffff8881086db890
[  106.036115] FS:  00007fbcfee17700(0000) GS:ffff8881e0300000(0000) knlGS:0000000000000000
[  106.037855] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.039010] CR2: 000000c0402a5008 CR3: 000000011c1ac003 CR4: 00000000003706e0
[  106.040453] Call Trace:
[  106.041245]  _raw_spin_unlock_bh+0x31/0x40
[  106.042543]  io_rsrc_node_ref_zero+0x13e/0x190
[  106.043471]  io_dismantle_req+0x215/0x220
[  106.044297]  io_req_complete_post+0x1b8/0x720
[  106.045456]  __io_complete_rw.isra.0+0x16b/0x1f0
[  106.046593]  io_complete_rw+0x10/0x20

[ ... the rest of the call stack is my own code ]


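Apparently, io_req_complete_post() disables IRQs, so io_rsrc_node_ref_zero()
ends up calling spin_unlock_bh() with IRQs still disabled, which trips the
assertion in __local_bh_enable_ip(). The code path seems valid (IOW: I did
not somehow cause this failure). I am not familiar with this code, so some
feedback would be appreciated.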

Thanks,
Nadav
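
[ Editorial sketch, not part of the original message. The nesting in the
call trace can be replayed in a small userspace toy model. This is only an
illustration under stated assumptions: it is not kernel code, every toy_*
name is hypothetical, and the single check in toy_spin_unlock_bh() merely
stands in for the IRQs-enabled assertion that fired in
__local_bh_enable_ip(). ]

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_cpu {
	bool irqs_enabled;  /* stands in for the CPU interrupt flag */
	int bh_disable;     /* stands in for the softirq-disable depth */
	int warnings;       /* number of modelled assertion failures */
};

static void toy_local_irq_disable(struct toy_cpu *cpu)
{
	cpu->irqs_enabled = false;
}

static void toy_local_irq_enable(struct toy_cpu *cpu)
{
	cpu->irqs_enabled = true;
}

/* Models spin_lock_bh(): bottom halves are disabled while the lock is held. */
static void toy_spin_lock_bh(struct toy_cpu *cpu)
{
	cpu->bh_disable++;
}

/*
 * Models spin_unlock_bh(): re-enabling bottom halves expects IRQs to be
 * enabled. If they are not, record a warning, as in the report above.
 */
static void toy_spin_unlock_bh(struct toy_cpu *cpu)
{
	assert(cpu->bh_disable > 0);  /* lock/unlock must be balanced */
	if (!cpu->irqs_enabled)
		cpu->warnings++;
	cpu->bh_disable--;
}

/*
 * Replays the nesting from the call trace: an outer section that may have
 * IRQs disabled, and an inner _bh lock/unlock pair. Returns the number of
 * modelled warnings.
 */
static int toy_replay_trace(bool irqs_off_in_caller)
{
	struct toy_cpu cpu = { .irqs_enabled = true };

	if (irqs_off_in_caller)
		toy_local_irq_disable(&cpu);  /* outer: completion path */

	toy_spin_lock_bh(&cpu);               /* inner: rsrc ref lock */
	toy_spin_unlock_bh(&cpu);

	if (irqs_off_in_caller)
		toy_local_irq_enable(&cpu);

	return cpu.warnings;
}
```

[ In this model, toy_replay_trace(true) records one warning and
toy_replay_trace(false) records none, which matches the observation that
the _bh unlock is only a problem when the caller already has IRQs off. ]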



* Re: iouring locking issue in io_req_complete_post() / io_rsrc_node_ref_zero()
From: Jens Axboe @ 2021-08-09 13:49 UTC
  To: Nadav Amit; +Cc: io-uring

On 8/8/21 10:36 PM, Nadav Amit wrote:
> Jens, others,
> 
> Sorry for bothering again, but I encountered a lockdep assertion failure:
> 
> [ ... lockdep splat snipped ... ]
> 
> Apparently, io_req_complete_post() disables IRQs and this code-path seems
> valid (IOW: I did not somehow cause this failure). I am not familiar with
> this code, so some feedback would be appreciated.

Can you try with this patch?


diff --git a/fs/io_uring.c b/fs/io_uring.c
index ca064486cb41..6a8257233061 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7138,16 +7138,6 @@ static void **io_alloc_page_table(size_t size)
 	return table;
 }
 
-static inline void io_rsrc_ref_lock(struct io_ring_ctx *ctx)
-{
-	spin_lock_bh(&ctx->rsrc_ref_lock);
-}
-
-static inline void io_rsrc_ref_unlock(struct io_ring_ctx *ctx)
-{
-	spin_unlock_bh(&ctx->rsrc_ref_lock);
-}
-
 static void io_rsrc_node_destroy(struct io_rsrc_node *ref_node)
 {
 	percpu_ref_exit(&ref_node->refs);
@@ -7164,9 +7154,9 @@ static void io_rsrc_node_switch(struct io_ring_ctx *ctx,
 		struct io_rsrc_node *rsrc_node = ctx->rsrc_node;
 
 		rsrc_node->rsrc_data = data_to_kill;
-		io_rsrc_ref_lock(ctx);
+		spin_lock_irq(&ctx->rsrc_ref_lock);
 		list_add_tail(&rsrc_node->node, &ctx->rsrc_ref_list);
-		io_rsrc_ref_unlock(ctx);
+		spin_unlock_irq(&ctx->rsrc_ref_lock);
 
 		atomic_inc(&data_to_kill->refs);
 		percpu_ref_kill(&rsrc_node->refs);
@@ -7674,9 +7664,10 @@ static void io_rsrc_node_ref_zero(struct percpu_ref *ref)
 {
 	struct io_rsrc_node *node = container_of(ref, struct io_rsrc_node, refs);
 	struct io_ring_ctx *ctx = node->rsrc_data->ctx;
+	unsigned long flags;
 	bool first_add = false;
 
-	io_rsrc_ref_lock(ctx);
+	spin_lock_irqsave(&ctx->rsrc_ref_lock, flags);
 	node->done = true;
 
 	while (!list_empty(&ctx->rsrc_ref_list)) {
@@ -7688,7 +7679,7 @@ static void io_rsrc_node_ref_zero(struct percpu_ref *ref)
 		list_del(&node->node);
 		first_add |= llist_add(&node->llist, &ctx->rsrc_put_llist);
 	}
-	io_rsrc_ref_unlock(ctx);
+	spin_unlock_irqrestore(&ctx->rsrc_ref_lock, flags);
 
 	if (first_add)
 		mod_delayed_work(system_wq, &ctx->rsrc_put_work, HZ);

-- 
Jens Axboe
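
[ Editorial sketch, not part of the original message. The patch uses two
different lock variants: spin_lock_irq()/spin_unlock_irq() in
io_rsrc_node_switch() and spin_lock_irqsave()/spin_unlock_irqrestore() in
io_rsrc_node_ref_zero(). The difference can be shown with a userspace toy
model. It is not kernel code and all toy_* names are hypothetical; the
assumption that io_rsrc_node_switch() is only reached with IRQs enabled is
an inference from the patch, not something stated in the thread. ]

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_irq_state {
	bool enabled;  /* stands in for the CPU interrupt flag */
};

/* Models spin_lock_irqsave(): remember the caller's state, then disable. */
static bool toy_irqsave(struct toy_irq_state *s)
{
	bool flags = s->enabled;

	s->enabled = false;
	return flags;
}

/* Models spin_unlock_irqrestore(): put back exactly what was saved. */
static void toy_irqrestore(struct toy_irq_state *s, bool flags)
{
	assert(!s->enabled);  /* IRQs stay off for the whole critical section */
	s->enabled = flags;
}

/* Models spin_lock_irq(): disable without remembering anything. */
static void toy_irq_off(struct toy_irq_state *s)
{
	s->enabled = false;
}

/* Models spin_unlock_irq(): enable unconditionally. */
static void toy_irq_on(struct toy_irq_state *s)
{
	s->enabled = true;
}

/* IRQ state the caller sees after an irqsave/irqrestore critical section. */
static bool toy_after_irqsave_section(bool caller_enabled)
{
	struct toy_irq_state s = { .enabled = caller_enabled };
	bool flags = toy_irqsave(&s);

	toy_irqrestore(&s, flags);
	return s.enabled;
}

/* IRQ state the caller sees after an irq/irq critical section. */
static bool toy_after_irq_section(bool caller_enabled)
{
	struct toy_irq_state s = { .enabled = caller_enabled };

	toy_irq_off(&s);
	toy_irq_on(&s);
	return s.enabled;
}
```

[ In this model the irqsave variant returns the caller's state unchanged in
both cases, while the plain irq variant always leaves IRQs enabled. That is
why the irqsave form is the safe choice for a callback such as
io_rsrc_node_ref_zero(), which per the trace can run with IRQs already
disabled. ]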



* Re: iouring locking issue in io_req_complete_post() / io_rsrc_node_ref_zero()
From: Nadav Amit @ 2021-08-09 22:01 UTC
  To: Jens Axboe; +Cc: io-uring


> On Aug 9, 2021, at 6:49 AM, Jens Axboe <axboe@kernel.dk> wrote:
> 
> On 8/8/21 10:36 PM, Nadav Amit wrote:
>> Jens, others,
>> 
>> Sorry for bothering again, but I encountered a lockdep assertion failure:
>> 
>> [ ... lockdep splat snipped ... ]
>> 
>> Apparently, io_req_complete_post() disables IRQs and this code-path seems
>> valid (IOW: I did not somehow cause this failure). I am not familiar with
>> this code, so some feedback would be appreciated.
> 
> Can you try with this patch?

Thanks! I might have hit another issue, but even if it is real, it
appears to be unrelated.

Tested-by: Nadav Amit <nadav.amit@gmail.com>



* Re: iouring locking issue in io_req_complete_post() / io_rsrc_node_ref_zero()
From: Jens Axboe @ 2021-08-09 22:11 UTC
  To: Nadav Amit; +Cc: io-uring

On 8/9/21 4:01 PM, Nadav Amit wrote:
> 
>> On Aug 9, 2021, at 6:49 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 8/8/21 10:36 PM, Nadav Amit wrote:
>>> Jens, others,
>>>
>>> Sorry for bothering again, but I encountered a lockdep assertion failure:
>>>
>>> [ ... lockdep splat snipped ... ]
>>>
>>> Apparently, io_req_complete_post() disables IRQs and this code-path seems
>>> valid (IOW: I did not somehow cause this failure). I am not familiar with
>>> this code, so some feedback would be appreciated.
>>
>> Can you try with this patch?
> 
> Thanks! I might have hit another issue, but apparently even if it is
> real, it is unrelated.
> 
> Tested-by: Nadav Amit <nadav.amit@gmail.com>

Thanks for testing! And regarding another issue, I would expect nothing
less :-). It's always interesting to see new paths being paved, and
inevitably that'll shake out a few issues in code that's been less
exercised than the general part.

-- 
Jens Axboe

