io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] io_uring: leave clean req to be done in flush overflow
@ 2021-01-20  8:11 Joseph Qi
  2021-01-20 12:35 ` Pavel Begunkov
  0 siblings, 1 reply; 6+ messages in thread
From: Joseph Qi @ 2021-01-20  8:11 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Xiaoguang Wang

Abaci reported the following BUG:

[   27.629441] BUG: sleeping function called from invalid context at fs/file.c:402
[   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1012, name: io_wqe_worker-0
[   27.633220] 1 lock held by io_wqe_worker-0/1012:
[   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock){....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
[   27.636487] irq event stamp: 66658
[   27.637302] hardirqs last  enabled at (66657): [<ffffffff8144ba02>] kmem_cache_free+0x1f2/0x3b0
[   27.639211] hardirqs last disabled at (66658): [<ffffffff82003a77>] _raw_spin_lock_irqsave+0x17/0x50
[   27.641196] softirqs last  enabled at (64686): [<ffffffff824003c5>] __do_softirq+0x3c5/0x5aa
[   27.643062] softirqs last disabled at (64681): [<ffffffff8220108f>] asm_call_irq_on_stack+0xf/0x20
[   27.645029] CPU: 1 PID: 1012 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc4+ #68
[   27.646651] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[   27.649249] Call Trace:
[   27.649874]  dump_stack+0xac/0xe3
[   27.650666]  ___might_sleep+0x284/0x2c0
[   27.651566]  put_files_struct+0xb8/0x120
[   27.652481]  __io_clean_op+0x10c/0x2a0
[   27.653362]  __io_cqring_fill_event+0x2c1/0x350
[   27.654399]  __io_req_complete.part.102+0x41/0x70
[   27.655464]  io_openat2+0x151/0x300
[   27.656297]  io_issue_sqe+0x6c/0x14e0
[   27.657170]  ? lock_acquire+0x31a/0x440
[   27.658068]  ? io_worker_handle_work+0x24e/0x8a0
[   27.659119]  ? find_held_lock+0x28/0xb0
[   27.660026]  ? io_wq_submit_work+0x7f/0x240
[   27.660991]  io_wq_submit_work+0x7f/0x240
[   27.661915]  ? trace_hardirqs_on+0x46/0x110
[   27.662890]  io_worker_handle_work+0x501/0x8a0
[   27.663917]  ? io_wqe_worker+0x135/0x520
[   27.664836]  io_wqe_worker+0x158/0x520
[   27.665719]  ? __kthread_parkme+0x96/0xc0
[   27.666663]  ? io_worker_handle_work+0x8a0/0x8a0
[   27.667726]  kthread+0x134/0x180
[   27.668506]  ? kthread_create_worker_on_cpu+0x90/0x90
[   27.669641]  ret_from_fork+0x1f/0x30

It blames we call cond_resched() with completion_lock when clean
request. In fact we will do it during flush overflow and it seems we
have no reason to do it before. So just remove io_clean_op() in
__io_cqring_fill_event() to fix this BUG.

Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
---
 fs/io_uring.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 985a9e3..9b937d1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1860,7 +1860,6 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
 			set_bit(0, &ctx->cq_check_overflow);
 			ctx->rings->sq_flags |= IORING_SQ_CQ_OVERFLOW;
 		}
-		io_clean_op(req);
 		req->result = res;
 		req->compl.cflags = cflags;
 		refcount_inc(&req->refs);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] io_uring: leave clean req to be done in flush overflow
  2021-01-20  8:11 [PATCH] io_uring: leave clean req to be done in flush overflow Joseph Qi
@ 2021-01-20 12:35 ` Pavel Begunkov
  2021-01-21  1:37   ` Joseph Qi
  2021-01-21  1:54   ` Xiaoguang Wang
  0 siblings, 2 replies; 6+ messages in thread
From: Pavel Begunkov @ 2021-01-20 12:35 UTC (permalink / raw)
  To: Joseph Qi, Jens Axboe, io-uring; +Cc: Xiaoguang Wang

On 20/01/2021 08:11, Joseph Qi wrote:
> Abaci reported the following BUG:
> 
> [   27.629441] BUG: sleeping function called from invalid context at fs/file.c:402
> [   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1012, name: io_wqe_worker-0
> [   27.633220] 1 lock held by io_wqe_worker-0/1012:
> [   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock){....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
> [   27.636487] irq event stamp: 66658
> [   27.637302] hardirqs last  enabled at (66657): [<ffffffff8144ba02>] kmem_cache_free+0x1f2/0x3b0
> [   27.639211] hardirqs last disabled at (66658): [<ffffffff82003a77>] _raw_spin_lock_irqsave+0x17/0x50
> [   27.641196] softirqs last  enabled at (64686): [<ffffffff824003c5>] __do_softirq+0x3c5/0x5aa
> [   27.643062] softirqs last disabled at (64681): [<ffffffff8220108f>] asm_call_irq_on_stack+0xf/0x20
> [   27.645029] CPU: 1 PID: 1012 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc4+ #68
> [   27.646651] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> [   27.649249] Call Trace:
> [   27.649874]  dump_stack+0xac/0xe3
> [   27.650666]  ___might_sleep+0x284/0x2c0
> [   27.651566]  put_files_struct+0xb8/0x120
> [   27.652481]  __io_clean_op+0x10c/0x2a0
> [   27.653362]  __io_cqring_fill_event+0x2c1/0x350
> [   27.654399]  __io_req_complete.part.102+0x41/0x70
> [   27.655464]  io_openat2+0x151/0x300
> [   27.656297]  io_issue_sqe+0x6c/0x14e0
> [   27.657170]  ? lock_acquire+0x31a/0x440
> [   27.658068]  ? io_worker_handle_work+0x24e/0x8a0
> [   27.659119]  ? find_held_lock+0x28/0xb0
> [   27.660026]  ? io_wq_submit_work+0x7f/0x240
> [   27.660991]  io_wq_submit_work+0x7f/0x240
> [   27.661915]  ? trace_hardirqs_on+0x46/0x110
> [   27.662890]  io_worker_handle_work+0x501/0x8a0
> [   27.663917]  ? io_wqe_worker+0x135/0x520
> [   27.664836]  io_wqe_worker+0x158/0x520
> [   27.665719]  ? __kthread_parkme+0x96/0xc0
> [   27.666663]  ? io_worker_handle_work+0x8a0/0x8a0
> [   27.667726]  kthread+0x134/0x180
> [   27.668506]  ? kthread_create_worker_on_cpu+0x90/0x90
> [   27.669641]  ret_from_fork+0x1f/0x30
> 
> It blames we call cond_resched() with completion_lock when clean
> request. In fact we will do it during flush overflow and it seems we
> have no reason to do it before. So just remove io_clean_op() in
> __io_cqring_fill_event() to fix this BUG.

Nope, it would be broken. You may override, e.g. iov pointer
that is dynamically allocated, and the function makes sure all
those are deleted and freed. Most probably there will be problems
on flush side as well.

Looks like the problem is that we do spin_lock_irqsave() in
__io_req_complete() and then just spin_lock() for put_files_struct().
Jens, is it a real problem?

At least for 5.12 there is a cleanup as below, moving drop_files()
into io_req_clean_work/io_free_req(), which is out of locks. Depends
on that don't-cancel-by-files patch, but I guess can be for 5.11


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4f702d03d375..3d3087851fed 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -614,7 +614,6 @@ enum {
 	REQ_F_BUFFER_SELECT_BIT	= IOSQE_BUFFER_SELECT_BIT,
 
 	REQ_F_FAIL_LINK_BIT,
-	REQ_F_INFLIGHT_BIT,
 	REQ_F_CUR_POS_BIT,
 	REQ_F_NOWAIT_BIT,
 	REQ_F_LINK_TIMEOUT_BIT,
@@ -647,8 +646,6 @@ enum {
 
 	/* fail rest of links */
 	REQ_F_FAIL_LINK		= BIT(REQ_F_FAIL_LINK_BIT),
-	/* on inflight list */
-	REQ_F_INFLIGHT		= BIT(REQ_F_INFLIGHT_BIT),
 	/* read/write uses file position */
 	REQ_F_CUR_POS		= BIT(REQ_F_CUR_POS_BIT),
 	/* must not punt to workers */
@@ -1057,8 +1054,7 @@ EXPORT_SYMBOL(io_uring_get_socket);
 
 static inline void io_clean_op(struct io_kiocb *req)
 {
-	if (req->flags & (REQ_F_NEED_CLEANUP | REQ_F_BUFFER_SELECTED |
-			  REQ_F_INFLIGHT))
+	if (req->flags & (REQ_F_NEED_CLEANUP | REQ_F_BUFFER_SELECTED))
 		__io_clean_op(req);
 }
 
@@ -1375,6 +1371,11 @@ static void io_req_clean_work(struct io_kiocb *req)
 			free_fs_struct(fs);
 		req->work.flags &= ~IO_WQ_WORK_FS;
 	}
+	if (req->work.flags & IO_WQ_WORK_FILES) {
+		put_files_struct(req->work.identity->files);
+		put_nsproxy(req->work.identity->nsproxy);
+		req->work.flags &= ~IO_WQ_WORK_FILES;
+	}
 
 	io_put_identity(req->task->io_uring, req);
 }
@@ -1483,7 +1484,6 @@ static bool io_grab_identity(struct io_kiocb *req)
 			return false;
 		atomic_inc(&id->files->count);
 		get_nsproxy(id->nsproxy);
-		req->flags |= REQ_F_INFLIGHT;
 		req->work.flags |= IO_WQ_WORK_FILES;
 	}
 	if (!(req->work.flags & IO_WQ_WORK_MM) &&
@@ -6128,18 +6128,6 @@ static int io_req_defer(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return -EIOCBQUEUED;
 }
 
-static void io_req_drop_files(struct io_kiocb *req)
-{
-	struct io_uring_task *tctx = req->task->io_uring;
-
-	put_files_struct(req->work.identity->files);
-	put_nsproxy(req->work.identity->nsproxy);
-	req->flags &= ~REQ_F_INFLIGHT;
-	req->work.flags &= ~IO_WQ_WORK_FILES;
-	if (atomic_read(&tctx->in_idle))
-		wake_up(&tctx->wait);
-}
-
 static void __io_clean_op(struct io_kiocb *req)
 {
 	if (req->flags & REQ_F_BUFFER_SELECTED) {
@@ -6197,9 +6185,6 @@ static void __io_clean_op(struct io_kiocb *req)
 		}
 		req->flags &= ~REQ_F_NEED_CLEANUP;
 	}
-
-	if (req->flags & REQ_F_INFLIGHT)
-		io_req_drop_files(req);
 }
 
 static int io_issue_sqe(struct io_kiocb *req, bool force_nonblock,


-- 
Pavel Begunkov

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] io_uring: leave clean req to be done in flush overflow
  2021-01-20 12:35 ` Pavel Begunkov
@ 2021-01-21  1:37   ` Joseph Qi
  2021-01-21  2:00     ` Pavel Begunkov
  2021-01-21  1:54   ` Xiaoguang Wang
  1 sibling, 1 reply; 6+ messages in thread
From: Joseph Qi @ 2021-01-21  1:37 UTC (permalink / raw)
  To: Pavel Begunkov, Joseph Qi, Jens Axboe, io-uring; +Cc: Xiaoguang Wang



On 1/20/21 8:35 PM, Pavel Begunkov wrote:
> On 20/01/2021 08:11, Joseph Qi wrote:
>> Abaci reported the following BUG:
>>
>> [   27.629441] BUG: sleeping function called from invalid context at fs/file.c:402
>> [   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1012, name: io_wqe_worker-0
>> [   27.633220] 1 lock held by io_wqe_worker-0/1012:
>> [   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock){....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
>> [   27.636487] irq event stamp: 66658
>> [   27.637302] hardirqs last  enabled at (66657): [<ffffffff8144ba02>] kmem_cache_free+0x1f2/0x3b0
>> [   27.639211] hardirqs last disabled at (66658): [<ffffffff82003a77>] _raw_spin_lock_irqsave+0x17/0x50
>> [   27.641196] softirqs last  enabled at (64686): [<ffffffff824003c5>] __do_softirq+0x3c5/0x5aa
>> [   27.643062] softirqs last disabled at (64681): [<ffffffff8220108f>] asm_call_irq_on_stack+0xf/0x20
>> [   27.645029] CPU: 1 PID: 1012 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc4+ #68
>> [   27.646651] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
>> [   27.649249] Call Trace:
>> [   27.649874]  dump_stack+0xac/0xe3
>> [   27.650666]  ___might_sleep+0x284/0x2c0
>> [   27.651566]  put_files_struct+0xb8/0x120
>> [   27.652481]  __io_clean_op+0x10c/0x2a0
>> [   27.653362]  __io_cqring_fill_event+0x2c1/0x350
>> [   27.654399]  __io_req_complete.part.102+0x41/0x70
>> [   27.655464]  io_openat2+0x151/0x300
>> [   27.656297]  io_issue_sqe+0x6c/0x14e0
>> [   27.657170]  ? lock_acquire+0x31a/0x440
>> [   27.658068]  ? io_worker_handle_work+0x24e/0x8a0
>> [   27.659119]  ? find_held_lock+0x28/0xb0
>> [   27.660026]  ? io_wq_submit_work+0x7f/0x240
>> [   27.660991]  io_wq_submit_work+0x7f/0x240
>> [   27.661915]  ? trace_hardirqs_on+0x46/0x110
>> [   27.662890]  io_worker_handle_work+0x501/0x8a0
>> [   27.663917]  ? io_wqe_worker+0x135/0x520
>> [   27.664836]  io_wqe_worker+0x158/0x520
>> [   27.665719]  ? __kthread_parkme+0x96/0xc0
>> [   27.666663]  ? io_worker_handle_work+0x8a0/0x8a0
>> [   27.667726]  kthread+0x134/0x180
>> [   27.668506]  ? kthread_create_worker_on_cpu+0x90/0x90
>> [   27.669641]  ret_from_fork+0x1f/0x30
>>
>> It blames we call cond_resched() with completion_lock when clean
>> request. In fact we will do it during flush overflow and it seems we
>> have no reason to do it before. So just remove io_clean_op() in
>> __io_cqring_fill_event() to fix this BUG.
> 
> Nope, it would be broken. You may override, e.g. iov pointer
> that is dynamically allocated, and the function makes sure all
> those are deleted and freed. Most probably there will be problems
> on flush side as well.
> 
> Looks like the problem is that we do spin_lock_irqsave() in
> __io_req_complete() and then just spin_lock() for put_files_struct().
> Jens, is it a real problem?
> 
From the code, it is because it might sleep in close_files():

...
if (file) {
	filp_close(file, files);
	cond_resched();
}


Thanks,
Joseph

> At least for 5.12 there is a cleanup as below, moving drop_files()
> into io_req_clean_work/io_free_req(), which is out of locks. Depends
> on that don't-cancel-by-files patch, but I guess can be for 5.11

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] io_uring: leave clean req to be done in flush overflow
  2021-01-20 12:35 ` Pavel Begunkov
  2021-01-21  1:37   ` Joseph Qi
@ 2021-01-21  1:54   ` Xiaoguang Wang
  2021-01-21  2:11     ` Pavel Begunkov
  1 sibling, 1 reply; 6+ messages in thread
From: Xiaoguang Wang @ 2021-01-21  1:54 UTC (permalink / raw)
  To: Pavel Begunkov, Joseph Qi, Jens Axboe, io-uring

hi Pavel,

> On 20/01/2021 08:11, Joseph Qi wrote:
>> Abaci reported the following BUG:
>>
>> [   27.629441] BUG: sleeping function called from invalid context at fs/file.c:402
>> [   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1012, name: io_wqe_worker-0
>> [   27.633220] 1 lock held by io_wqe_worker-0/1012:
>> [   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock){....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
>> [   27.636487] irq event stamp: 66658
>> [   27.637302] hardirqs last  enabled at (66657): [<ffffffff8144ba02>] kmem_cache_free+0x1f2/0x3b0
>> [   27.639211] hardirqs last disabled at (66658): [<ffffffff82003a77>] _raw_spin_lock_irqsave+0x17/0x50
>> [   27.641196] softirqs last  enabled at (64686): [<ffffffff824003c5>] __do_softirq+0x3c5/0x5aa
>> [   27.643062] softirqs last disabled at (64681): [<ffffffff8220108f>] asm_call_irq_on_stack+0xf/0x20
>> [   27.645029] CPU: 1 PID: 1012 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc4+ #68
>> [   27.646651] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
>> [   27.649249] Call Trace:
>> [   27.649874]  dump_stack+0xac/0xe3
>> [   27.650666]  ___might_sleep+0x284/0x2c0
>> [   27.651566]  put_files_struct+0xb8/0x120
>> [   27.652481]  __io_clean_op+0x10c/0x2a0
>> [   27.653362]  __io_cqring_fill_event+0x2c1/0x350
>> [   27.654399]  __io_req_complete.part.102+0x41/0x70
>> [   27.655464]  io_openat2+0x151/0x300
>> [   27.656297]  io_issue_sqe+0x6c/0x14e0
>> [   27.657170]  ? lock_acquire+0x31a/0x440
>> [   27.658068]  ? io_worker_handle_work+0x24e/0x8a0
>> [   27.659119]  ? find_held_lock+0x28/0xb0
>> [   27.660026]  ? io_wq_submit_work+0x7f/0x240
>> [   27.660991]  io_wq_submit_work+0x7f/0x240
>> [   27.661915]  ? trace_hardirqs_on+0x46/0x110
>> [   27.662890]  io_worker_handle_work+0x501/0x8a0
>> [   27.663917]  ? io_wqe_worker+0x135/0x520
>> [   27.664836]  io_wqe_worker+0x158/0x520
>> [   27.665719]  ? __kthread_parkme+0x96/0xc0
>> [   27.666663]  ? io_worker_handle_work+0x8a0/0x8a0
>> [   27.667726]  kthread+0x134/0x180
>> [   27.668506]  ? kthread_create_worker_on_cpu+0x90/0x90
>> [   27.669641]  ret_from_fork+0x1f/0x30
>>
>> It blames we call cond_resched() with completion_lock when clean
>> request. In fact we will do it during flush overflow and it seems we
>> have no reason to do it before. So just remove io_clean_op() in
>> __io_cqring_fill_event() to fix this BUG.
> 
> Nope, it would be broken. You may override, e.g. iov pointer
> that is dynamically allocated, and the function makes sure all
> those are deleted and freed. Most probably there will be problems
> on flush side as well.
Could you please explain more why this is a problem?
io_clean_op justs does some clean work, free allocated memory, put file, etc,
and these jobs should can be done in __io_cqring_overflow_flush():
	while (!list_empty(&list)) {
		req = list_first_entry(&list, struct io_kiocb, compl.list);
		list_del(&req->compl.list);
		io_put_req(req); // will call io_clean_op
	}

And calling a single io_clean_op in __io_cqring_fill_event() looks weird.

Regards,
Xiaoguang Wang

> 
> Looks like the problem is that we do spin_lock_irqsave() in
> __io_req_complete() and then just spin_lock() for put_files_struct().
> Jens, is it a real problem?
> 
> At least for 5.12 there is a cleanup as below, moving drop_files()
> into io_req_clean_work/io_free_req(), which is out of locks. Depends
> on that don't-cancel-by-files patch, but I guess can be for 5.11
> 
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 4f702d03d375..3d3087851fed 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -614,7 +614,6 @@ enum {
>   	REQ_F_BUFFER_SELECT_BIT	= IOSQE_BUFFER_SELECT_BIT,
>   
>   	REQ_F_FAIL_LINK_BIT,
> -	REQ_F_INFLIGHT_BIT,
>   	REQ_F_CUR_POS_BIT,
>   	REQ_F_NOWAIT_BIT,
>   	REQ_F_LINK_TIMEOUT_BIT,
> @@ -647,8 +646,6 @@ enum {
>   
>   	/* fail rest of links */
>   	REQ_F_FAIL_LINK		= BIT(REQ_F_FAIL_LINK_BIT),
> -	/* on inflight list */
> -	REQ_F_INFLIGHT		= BIT(REQ_F_INFLIGHT_BIT),
>   	/* read/write uses file position */
>   	REQ_F_CUR_POS		= BIT(REQ_F_CUR_POS_BIT),
>   	/* must not punt to workers */
> @@ -1057,8 +1054,7 @@ EXPORT_SYMBOL(io_uring_get_socket);
>   
>   static inline void io_clean_op(struct io_kiocb *req)
>   {
> -	if (req->flags & (REQ_F_NEED_CLEANUP | REQ_F_BUFFER_SELECTED |
> -			  REQ_F_INFLIGHT))
> +	if (req->flags & (REQ_F_NEED_CLEANUP | REQ_F_BUFFER_SELECTED))
>   		__io_clean_op(req);
>   }
>   
> @@ -1375,6 +1371,11 @@ static void io_req_clean_work(struct io_kiocb *req)
>   			free_fs_struct(fs);
>   		req->work.flags &= ~IO_WQ_WORK_FS;
>   	}
> +	if (req->work.flags & IO_WQ_WORK_FILES) {
> +		put_files_struct(req->work.identity->files);
> +		put_nsproxy(req->work.identity->nsproxy);
> +		req->work.flags &= ~IO_WQ_WORK_FILES;
> +	}
>   
>   	io_put_identity(req->task->io_uring, req);
>   }
> @@ -1483,7 +1484,6 @@ static bool io_grab_identity(struct io_kiocb *req)
>   			return false;
>   		atomic_inc(&id->files->count);
>   		get_nsproxy(id->nsproxy);
> -		req->flags |= REQ_F_INFLIGHT;
>   		req->work.flags |= IO_WQ_WORK_FILES;
>   	}
>   	if (!(req->work.flags & IO_WQ_WORK_MM) &&
> @@ -6128,18 +6128,6 @@ static int io_req_defer(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>   	return -EIOCBQUEUED;
>   }
>   
> -static void io_req_drop_files(struct io_kiocb *req)
> -{
> -	struct io_uring_task *tctx = req->task->io_uring;
> -
> -	put_files_struct(req->work.identity->files);
> -	put_nsproxy(req->work.identity->nsproxy);
> -	req->flags &= ~REQ_F_INFLIGHT;
> -	req->work.flags &= ~IO_WQ_WORK_FILES;
> -	if (atomic_read(&tctx->in_idle))
> -		wake_up(&tctx->wait);
> -}
> -
>   static void __io_clean_op(struct io_kiocb *req)
>   {
>   	if (req->flags & REQ_F_BUFFER_SELECTED) {
> @@ -6197,9 +6185,6 @@ static void __io_clean_op(struct io_kiocb *req)
>   		}
>   		req->flags &= ~REQ_F_NEED_CLEANUP;
>   	}
> -
> -	if (req->flags & REQ_F_INFLIGHT)
> -		io_req_drop_files(req);
>   }
>   
>   static int io_issue_sqe(struct io_kiocb *req, bool force_nonblock,
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] io_uring: leave clean req to be done in flush overflow
  2021-01-21  1:37   ` Joseph Qi
@ 2021-01-21  2:00     ` Pavel Begunkov
  0 siblings, 0 replies; 6+ messages in thread
From: Pavel Begunkov @ 2021-01-21  2:00 UTC (permalink / raw)
  To: Joseph Qi, Joseph Qi, Jens Axboe, io-uring; +Cc: Xiaoguang Wang

On 21/01/2021 01:37, Joseph Qi wrote:
> 
> 
> On 1/20/21 8:35 PM, Pavel Begunkov wrote:
>> On 20/01/2021 08:11, Joseph Qi wrote:
>>> Abaci reported the following BUG:
>>>
>>> [   27.629441] BUG: sleeping function called from invalid context at fs/file.c:402
>>> [   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1012, name: io_wqe_worker-0
>>> [   27.633220] 1 lock held by io_wqe_worker-0/1012:
>>> [   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock){....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
>>> [   27.636487] irq event stamp: 66658
>>> [   27.637302] hardirqs last  enabled at (66657): [<ffffffff8144ba02>] kmem_cache_free+0x1f2/0x3b0
>>> [   27.639211] hardirqs last disabled at (66658): [<ffffffff82003a77>] _raw_spin_lock_irqsave+0x17/0x50
>>> [   27.641196] softirqs last  enabled at (64686): [<ffffffff824003c5>] __do_softirq+0x3c5/0x5aa
>>> [   27.643062] softirqs last disabled at (64681): [<ffffffff8220108f>] asm_call_irq_on_stack+0xf/0x20
>>> [   27.645029] CPU: 1 PID: 1012 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc4+ #68
>>> [   27.646651] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
>>> [   27.649249] Call Trace:
>>> [   27.649874]  dump_stack+0xac/0xe3
>>> [   27.650666]  ___might_sleep+0x284/0x2c0
>>> [   27.651566]  put_files_struct+0xb8/0x120
>>> [   27.652481]  __io_clean_op+0x10c/0x2a0
>>> [   27.653362]  __io_cqring_fill_event+0x2c1/0x350
>>> [   27.654399]  __io_req_complete.part.102+0x41/0x70
>>> [   27.655464]  io_openat2+0x151/0x300
>>> [   27.656297]  io_issue_sqe+0x6c/0x14e0
>>> [   27.657170]  ? lock_acquire+0x31a/0x440
>>> [   27.658068]  ? io_worker_handle_work+0x24e/0x8a0
>>> [   27.659119]  ? find_held_lock+0x28/0xb0
>>> [   27.660026]  ? io_wq_submit_work+0x7f/0x240
>>> [   27.660991]  io_wq_submit_work+0x7f/0x240
>>> [   27.661915]  ? trace_hardirqs_on+0x46/0x110
>>> [   27.662890]  io_worker_handle_work+0x501/0x8a0
>>> [   27.663917]  ? io_wqe_worker+0x135/0x520
>>> [   27.664836]  io_wqe_worker+0x158/0x520
>>> [   27.665719]  ? __kthread_parkme+0x96/0xc0
>>> [   27.666663]  ? io_worker_handle_work+0x8a0/0x8a0
>>> [   27.667726]  kthread+0x134/0x180
>>> [   27.668506]  ? kthread_create_worker_on_cpu+0x90/0x90
>>> [   27.669641]  ret_from_fork+0x1f/0x30
>>>
>>> It blames we call cond_resched() with completion_lock when clean
>>> request. In fact we will do it during flush overflow and it seems we
>>> have no reason to do it before. So just remove io_clean_op() in
>>> __io_cqring_fill_event() to fix this BUG.
>>
>> Nope, it would be broken. You may override, e.g. iov pointer
>> that is dynamically allocated, and the function makes sure all
>> those are deleted and freed. Most probably there will be problems
>> on flush side as well.
>>
>> Looks like the problem is that we do spin_lock_irqsave() in
>> __io_req_complete() and then just spin_lock() for put_files_struct().
>> Jens, is it a real problem?
>>
> From the code, it is because it might sleep in close_files():

Makes sense. The diff should handle it, but let's see if
it would ever be applicable after some other bug fixes.

> 
> ...
> if (file) {
> 	filp_close(file, files);
> 	cond_resched();
> }
> 
> 
> Thanks,
> Joseph
> 
>> At least for 5.12 there is a cleanup as below, moving drop_files()
>> into io_req_clean_work/io_free_req(), which is out of locks. Depends
>> on that don't-cancel-by-files patch, but I guess can be for 5.11

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] io_uring: leave clean req to be done in flush overflow
  2021-01-21  1:54   ` Xiaoguang Wang
@ 2021-01-21  2:11     ` Pavel Begunkov
  0 siblings, 0 replies; 6+ messages in thread
From: Pavel Begunkov @ 2021-01-21  2:11 UTC (permalink / raw)
  To: Xiaoguang Wang, Joseph Qi, Jens Axboe, io-uring

On 21/01/2021 01:54, Xiaoguang Wang wrote:
> hi Pavel,
> 
>> On 20/01/2021 08:11, Joseph Qi wrote:
>>> Abaci reported the following BUG:
>>>
>>> [   27.629441] BUG: sleeping function called from invalid context at fs/file.c:402
>>> [   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1012, name: io_wqe_worker-0
>>> [   27.633220] 1 lock held by io_wqe_worker-0/1012:
>>> [   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock){....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
>>> [   27.636487] irq event stamp: 66658
>>> [   27.637302] hardirqs last  enabled at (66657): [<ffffffff8144ba02>] kmem_cache_free+0x1f2/0x3b0
>>> [   27.639211] hardirqs last disabled at (66658): [<ffffffff82003a77>] _raw_spin_lock_irqsave+0x17/0x50
>>> [   27.641196] softirqs last  enabled at (64686): [<ffffffff824003c5>] __do_softirq+0x3c5/0x5aa
>>> [   27.643062] softirqs last disabled at (64681): [<ffffffff8220108f>] asm_call_irq_on_stack+0xf/0x20
>>> [   27.645029] CPU: 1 PID: 1012 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc4+ #68
>>> [   27.646651] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
>>> [   27.649249] Call Trace:
>>> [   27.649874]  dump_stack+0xac/0xe3
>>> [   27.650666]  ___might_sleep+0x284/0x2c0
>>> [   27.651566]  put_files_struct+0xb8/0x120
>>> [   27.652481]  __io_clean_op+0x10c/0x2a0
>>> [   27.653362]  __io_cqring_fill_event+0x2c1/0x350
>>> [   27.654399]  __io_req_complete.part.102+0x41/0x70
>>> [   27.655464]  io_openat2+0x151/0x300
>>> [   27.656297]  io_issue_sqe+0x6c/0x14e0
>>> [   27.657170]  ? lock_acquire+0x31a/0x440
>>> [   27.658068]  ? io_worker_handle_work+0x24e/0x8a0
>>> [   27.659119]  ? find_held_lock+0x28/0xb0
>>> [   27.660026]  ? io_wq_submit_work+0x7f/0x240
>>> [   27.660991]  io_wq_submit_work+0x7f/0x240
>>> [   27.661915]  ? trace_hardirqs_on+0x46/0x110
>>> [   27.662890]  io_worker_handle_work+0x501/0x8a0
>>> [   27.663917]  ? io_wqe_worker+0x135/0x520
>>> [   27.664836]  io_wqe_worker+0x158/0x520
>>> [   27.665719]  ? __kthread_parkme+0x96/0xc0
>>> [   27.666663]  ? io_worker_handle_work+0x8a0/0x8a0
>>> [   27.667726]  kthread+0x134/0x180
>>> [   27.668506]  ? kthread_create_worker_on_cpu+0x90/0x90
>>> [   27.669641]  ret_from_fork+0x1f/0x30
>>>
>>> It blames we call cond_resched() with completion_lock when clean
>>> request. In fact we will do it during flush overflow and it seems we
>>> have no reason to do it before. So just remove io_clean_op() in
>>> __io_cqring_fill_event() to fix this BUG.
>>
>> Nope, it would be broken. You may override, e.g. iov pointer
>> that is dynamically allocated, and the function makes sure all
>> those are deleted and freed. Most probably there will be problems
>> on flush side as well.
> Could you please explain more why this is a problem?
> io_clean_op justs does some clean work, free allocated memory, put file, etc,
> and these jobs should can be done in __io_cqring_overflow_flush():

struct io_kiocb {
	union {
		struct file		*file;
		struct io_rw		rw;
		...
		/* use only after cleaning per-op data, see io_clean_op() */
		struct io_completion	compl;
	};
};

io_clean_op() cleans everything in first 64B (not only), and that space
is used for overflow lists, etc.

io_clean_op(req);
req->compl.cflags = cflags;
     -----
list_add_tail(&req->compl.list, &ctx->cq_overflow_list);
                    -----

That's the reason why we need to call it. A bit different story is why
it does drop_files(). One time it was in io_req_clean_work(), which is
called without locks held, but there were nasty races with cancellations
of overflowed reqs, so it was much easier to move into io_clean_op(),
so we just don't ever have requests with ->files in overflowed lists.

As we just changed that cancellation scheme, those races are not
existent anymore, and it could be moved back as in the diff. 


>     while (!list_empty(&list)) {
>         req = list_first_entry(&list, struct io_kiocb, compl.list);
>         list_del(&req->compl.list);
>         io_put_req(req); // will call io_clean_op
>     }

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-01-21  3:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-20  8:11 [PATCH] io_uring: leave clean req to be done in flush overflow Joseph Qi
2021-01-20 12:35 ` Pavel Begunkov
2021-01-21  1:37   ` Joseph Qi
2021-01-21  2:00     ` Pavel Begunkov
2021-01-21  1:54   ` Xiaoguang Wang
2021-01-21  2:11     ` Pavel Begunkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).