io-uring.vger.kernel.org archive mirror
* [PATCH 0/2] poll fixes
@ 2021-09-12 16:23 Hao Xu
  2021-09-12 16:23 ` [PATCH 1/2] io_uring: fix tw list mess-up by adding tw while it's already in tw list Hao Xu
  2021-09-12 16:23 ` [PATCH 2/2] io_uring: fix race between poll completion and cancel_hash insertion Hao Xu
  0 siblings, 2 replies; 11+ messages in thread
From: Hao Xu @ 2021-09-12 16:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

Two fixes for poll handling.

Hao Xu (2):
  io_uring: fix tw list mess-up by adding tw while it's already in tw
    list
  io_uring: fix race between poll completion and cancel_hash insertion

 fs/io_uring.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

-- 
2.24.4



* [PATCH 1/2] io_uring: fix tw list mess-up by adding tw while it's already in tw list
  2021-09-12 16:23 [PATCH 0/2] poll fixes Hao Xu
@ 2021-09-12 16:23 ` Hao Xu
  2021-09-15  9:44   ` Pavel Begunkov
  2021-09-12 16:23 ` [PATCH 2/2] io_uring: fix race between poll completion and cancel_hash insertion Hao Xu
  1 sibling, 1 reply; 11+ messages in thread
From: Hao Xu @ 2021-09-12 16:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

For multishot mode, there may be cases like:
io_poll_task_func()
-> add_wait_queue()
                            async_wake()
                            ->io_req_task_work_add()
                            this one messes up the running task_work list
                            since req->io_task_work.node is still in use.

A similar situation exists for req->io_task_work.fallback_node.
Fix it by setting node->next = NULL before we run the tw, so that when we
add the req back to the wait queue in the middle of tw running, we can
safely re-add it to the tw list.
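
In essence, the loop becomes (a simplified sketch only, not the exact
hunks in the diff below):

	while (node) {
		struct llist_node *next = node->next;
		struct io_kiocb *req = container_of(node, struct io_kiocb,
						    io_task_work.fallback_node);

		node->next = NULL;	/* node can now be safely re-queued */
		req->io_task_work.func(req, &locked);	/* may re-add the req */
		node = next;
	}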

Fixes: 7cbf1722d5fc ("io_uring: provide FIFO ordering for task_work")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---

 fs/io_uring.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 30d959416eba..c16f6be3d46b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1216,13 +1216,17 @@ static void io_fallback_req_func(struct work_struct *work)
 	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
 						fallback_work.work);
 	struct llist_node *node = llist_del_all(&ctx->fallback_llist);
-	struct io_kiocb *req, *tmp;
+	struct io_kiocb *req;
 	bool locked = false;
 
 	percpu_ref_get(&ctx->refs);
-	llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
+	req = llist_entry(node, struct io_kiocb, io_task_work.fallback_node);
+	while (member_address_is_nonnull(req, io_task_work.fallback_node)) {
+		node = req->io_task_work.fallback_node.next;
+		req->io_task_work.fallback_node.next = NULL;
 		req->io_task_work.func(req, &locked);
-
+		req = llist_entry(node, struct io_kiocb, io_task_work.fallback_node);
+	}
 	if (locked) {
 		if (ctx->submit_state.compl_nr)
 			io_submit_flush_completions(ctx);
@@ -2126,6 +2130,7 @@ static void tctx_task_work(struct callback_head *cb)
 				locked = mutex_trylock(&ctx->uring_lock);
 				percpu_ref_get(&ctx->refs);
 			}
+			node->next = NULL;
 			req->io_task_work.func(req, &locked);
 			node = next;
 		} while (node);
-- 
2.24.4



* [PATCH 2/2] io_uring: fix race between poll completion and cancel_hash insertion
  2021-09-12 16:23 [PATCH 0/2] poll fixes Hao Xu
  2021-09-12 16:23 ` [PATCH 1/2] io_uring: fix tw list mess-up by adding tw while it's already in tw list Hao Xu
@ 2021-09-12 16:23 ` Hao Xu
  2021-09-15  9:50   ` Pavel Begunkov
  2021-09-15 10:12   ` Pavel Begunkov
  1 sibling, 2 replies; 11+ messages in thread
From: Hao Xu @ 2021-09-12 16:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

If poll arming and poll completion run in parallel, there may be races.
For instance, run io_poll_add in iowq and io_poll_task_func in the
original context, then:
             iowq                          original context
  io_poll_add
    vfs_poll
     (interruption happens
      tw queued to original
      context)                              io_poll_task_func
                                              generate cqe
                                              del from cancel_hash[]
    if !poll.done
      insert to cancel_hash[]

The entry is left in cancel_hash[]; a similar case exists for fast poll.
Fix it by setting poll.done = true when deleting from cancel_hash[].
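
Roughly, the invariant this establishes (a simplified sketch only, not
the exact hunks below):

	/* completion side, with ->completion_lock held */
	hash_del(&req->hash_node);
	req->poll.done = true;		/* entry is no longer hashed */

	/* arming side, also with ->completion_lock held */
	if (!poll->done)
		io_poll_req_insert(req);	/* hash req into ctx->cancel_hash[] */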

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---

Didn't find the exact commit to add Fixes: for..

 fs/io_uring.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index c16f6be3d46b..988679e5063f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5118,10 +5118,8 @@ static bool __io_poll_complete(struct io_kiocb *req, __poll_t mask)
 	}
 	if (req->poll.events & EPOLLONESHOT)
 		flags = 0;
-	if (!io_cqring_fill_event(ctx, req->user_data, error, flags)) {
-		req->poll.done = true;
+	if (!io_cqring_fill_event(ctx, req->user_data, error, flags))
 		flags = 0;
-	}
 	if (flags & IORING_CQE_F_MORE)
 		ctx->cq_extra++;
 
@@ -5152,6 +5150,7 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 		if (done) {
 			io_poll_remove_double(req);
 			hash_del(&req->hash_node);
+			req->poll.done = true;
 		} else {
 			req->result = 0;
 			add_wait_queue(req->poll.head, &req->poll.wait);
@@ -5289,6 +5288,7 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
 
 	hash_del(&req->hash_node);
 	io_poll_remove_double(req);
+	req->poll.done = true;
 	spin_unlock(&ctx->completion_lock);
 
 	if (!READ_ONCE(apoll->poll.canceled))
-- 
2.24.4



* Re: [PATCH 1/2] io_uring: fix tw list mess-up by adding tw while it's already in tw list
  2021-09-12 16:23 ` [PATCH 1/2] io_uring: fix tw list mess-up by adding tw while it's already in tw list Hao Xu
@ 2021-09-15  9:44   ` Pavel Begunkov
  2021-09-15 10:48     ` Hao Xu
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2021-09-15  9:44 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/12/21 5:23 PM, Hao Xu wrote:
> For multishot mode, there may be cases like:
> io_poll_task_func()
> -> add_wait_queue()
>                             async_wake()
>                             ->io_req_task_work_add()
>                             this one messes up the running task_work list
>                             since req->io_task_work.node is still in use.
> 
> A similar situation exists for req->io_task_work.fallback_node.
> Fix it by setting node->next = NULL before we run the tw, so that when we
> add the req back to the wait queue in the middle of tw running, we can
> safely re-add it to the tw list.

It may get screwed before we get to "node->next = NULL;",

-> async_wake()
  -> io_req_task_work_add()
-> async_wake()
  -> io_req_task_work_add()
tctx_task_work()


> Fixes: 7cbf1722d5fc ("io_uring: provide FIFO ordering for task_work")
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
> 
>  fs/io_uring.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 30d959416eba..c16f6be3d46b 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1216,13 +1216,17 @@ static void io_fallback_req_func(struct work_struct *work)
>  	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
>  						fallback_work.work);
>  	struct llist_node *node = llist_del_all(&ctx->fallback_llist);
> -	struct io_kiocb *req, *tmp;
> +	struct io_kiocb *req;
>  	bool locked = false;
>  
>  	percpu_ref_get(&ctx->refs);
> -	llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
> +	req = llist_entry(node, struct io_kiocb, io_task_work.fallback_node);
> +	while (member_address_is_nonnull(req, io_task_work.fallback_node)) {
> +		node = req->io_task_work.fallback_node.next;
> +		req->io_task_work.fallback_node.next = NULL;
>  		req->io_task_work.func(req, &locked);
> -
> +		req = llist_entry(node, struct io_kiocb, io_task_work.fallback_node);
> +	}
>  	if (locked) {
>  		if (ctx->submit_state.compl_nr)
>  			io_submit_flush_completions(ctx);
> @@ -2126,6 +2130,7 @@ static void tctx_task_work(struct callback_head *cb)
>  				locked = mutex_trylock(&ctx->uring_lock);
>  				percpu_ref_get(&ctx->refs);
>  			}
> +			node->next = NULL;
>  			req->io_task_work.func(req, &locked);
>  			node = next;
>  		} while (node);
> 

-- 
Pavel Begunkov


* Re: [PATCH 2/2] io_uring: fix race between poll completion and cancel_hash insertion
  2021-09-12 16:23 ` [PATCH 2/2] io_uring: fix race between poll completion and cancel_hash insertion Hao Xu
@ 2021-09-15  9:50   ` Pavel Begunkov
  2021-09-15 10:49     ` Hao Xu
  2021-09-15 10:12   ` Pavel Begunkov
  1 sibling, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2021-09-15  9:50 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/12/21 5:23 PM, Hao Xu wrote:
> If poll arming and poll completion run in parallel, there may be races.
> For instance, run io_poll_add in iowq and io_poll_task_func in the
> original context, then:
>              iowq                          original context
>   io_poll_add
>     vfs_poll
>      (interruption happens
>       tw queued to original
>       context)                              io_poll_task_func
>                                               generate cqe
>                                               del from cancel_hash[]
>     if !poll.done
>       insert to cancel_hash[]
> 
> The entry is left in cancel_hash[]; a similar case exists for fast poll.
> Fix it by setting poll.done = true when deleting from cancel_hash[].

Sounds like a valid concern. And the code looks good, but one
of two patches crashed the kernel somewhere in io_read().

./232c93d07b74-test

will be retesting


> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
> 
> Didn't find the exact commit to add Fixes: for..

That's ok. Maybe you can find which kernel versions are
affected? So we can add stable and specify where it should
be backported? E.g.

Cc: stable@vger.kernel.org # 5.12+


>  fs/io_uring.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index c16f6be3d46b..988679e5063f 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5118,10 +5118,8 @@ static bool __io_poll_complete(struct io_kiocb *req, __poll_t mask)
>  	}
>  	if (req->poll.events & EPOLLONESHOT)
>  		flags = 0;
> -	if (!io_cqring_fill_event(ctx, req->user_data, error, flags)) {
> -		req->poll.done = true;
> +	if (!io_cqring_fill_event(ctx, req->user_data, error, flags))
>  		flags = 0;
> -	}
>  	if (flags & IORING_CQE_F_MORE)
>  		ctx->cq_extra++;
>  
> @@ -5152,6 +5150,7 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
>  		if (done) {
>  			io_poll_remove_double(req);
>  			hash_del(&req->hash_node);
> +			req->poll.done = true;
>  		} else {
>  			req->result = 0;
>  			add_wait_queue(req->poll.head, &req->poll.wait);
> @@ -5289,6 +5288,7 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>  
>  	hash_del(&req->hash_node);
>  	io_poll_remove_double(req);
> +	req->poll.done = true;
>  	spin_unlock(&ctx->completion_lock);
>  
>  	if (!READ_ONCE(apoll->poll.canceled))
> 

-- 
Pavel Begunkov


* Re: [PATCH 2/2] io_uring: fix race between poll completion and cancel_hash insertion
  2021-09-12 16:23 ` [PATCH 2/2] io_uring: fix race between poll completion and cancel_hash insertion Hao Xu
  2021-09-15  9:50   ` Pavel Begunkov
@ 2021-09-15 10:12   ` Pavel Begunkov
  2021-09-15 10:50     ` Hao Xu
  1 sibling, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2021-09-15 10:12 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/12/21 5:23 PM, Hao Xu wrote:
> If poll arming and poll completion run in parallel, there may be races.
> For instance, run io_poll_add in iowq and io_poll_task_func in the
> original context, then:
>              iowq                          original context
>   io_poll_add
>     vfs_poll
>      (interruption happens
>       tw queued to original
>       context)                              io_poll_task_func
>                                               generate cqe
>                                               del from cancel_hash[]
>     if !poll.done
>       insert to cancel_hash[]
> 
> The entry is left in cancel_hash[]; a similar case exists for fast poll.
> Fix it by setting poll.done = true when deleting from cancel_hash[].
> 
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
> 
> Didn't find the exact commit to add Fixes: for..
> 
>  fs/io_uring.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index c16f6be3d46b..988679e5063f 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5118,10 +5118,8 @@ static bool __io_poll_complete(struct io_kiocb *req, __poll_t mask)
>  	}
>  	if (req->poll.events & EPOLLONESHOT)
>  		flags = 0;
> -	if (!io_cqring_fill_event(ctx, req->user_data, error, flags)) {
> -		req->poll.done = true;
> +	if (!io_cqring_fill_event(ctx, req->user_data, error, flags))
>  		flags = 0;
> -	}
>  	if (flags & IORING_CQE_F_MORE)
>  		ctx->cq_extra++;
>  
> @@ -5152,6 +5150,7 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
>  		if (done) {
>  			io_poll_remove_double(req);
>  			hash_del(&req->hash_node);
> +			req->poll.done = true;
>  		} else {
>  			req->result = 0;
>  			add_wait_queue(req->poll.head, &req->poll.wait);
> @@ -5289,6 +5288,7 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>  
>  	hash_del(&req->hash_node);
>  	io_poll_remove_double(req);
> +	req->poll.done = true;

Only a poll request has req->poll. E.g. this overwrites parts of req->rw.kiocb,
I guess .ki_complete in particular.

struct async_poll *apoll = req->apoll;
apoll->poll.done = true;


>  	spin_unlock(&ctx->completion_lock);
>  
>  	if (!READ_ONCE(apoll->poll.canceled))
> 

-- 
Pavel Begunkov


* Re: [PATCH 1/2] io_uring: fix tw list mess-up by adding tw while it's already in tw list
  2021-09-15  9:44   ` Pavel Begunkov
@ 2021-09-15 10:48     ` Hao Xu
  2021-09-26  9:48       ` Hao Xu
  0 siblings, 1 reply; 11+ messages in thread
From: Hao Xu @ 2021-09-15 10:48 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

On 2021/9/15 5:44 PM, Pavel Begunkov wrote:
> On 9/12/21 5:23 PM, Hao Xu wrote:
>> For multishot mode, there may be cases like:
>> io_poll_task_func()
>> -> add_wait_queue()
>>                              async_wake()
>>                              ->io_req_task_work_add()
>>                              this one messes up the running task_work list
>>                              since req->io_task_work.node is still in use.
>>
>> A similar situation exists for req->io_task_work.fallback_node.
>> Fix it by setting node->next = NULL before we run the tw, so that when we
>> add the req back to the wait queue in the middle of tw running, we can
>> safely re-add it to the tw list.
> 
> It may get screwed before we get to "node->next = NULL;",
> 
> -> async_wake()
>    -> io_req_task_work_add()
> -> async_wake()
>    -> io_req_task_work_add()
> tctx_task_work()
True, this may happen if there is a second poll wait entry.
This patch is for the single wait entry case only.
I'm thinking about the second poll entry issue; it would be in a
separate patch.
> 
> 
>> Fixes: 7cbf1722d5fc ("io_uring: provide FIFO ordering for task_work")
>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>> ---
>>
>>   fs/io_uring.c | 11 ++++++++---
>>   1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 30d959416eba..c16f6be3d46b 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1216,13 +1216,17 @@ static void io_fallback_req_func(struct work_struct *work)
>>   	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
>>   						fallback_work.work);
>>   	struct llist_node *node = llist_del_all(&ctx->fallback_llist);
>> -	struct io_kiocb *req, *tmp;
>> +	struct io_kiocb *req;
>>   	bool locked = false;
>>   
>>   	percpu_ref_get(&ctx->refs);
>> -	llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>> +	req = llist_entry(node, struct io_kiocb, io_task_work.fallback_node);
>> +	while (member_address_is_nonnull(req, io_task_work.fallback_node)) {
>> +		node = req->io_task_work.fallback_node.next;
>> +		req->io_task_work.fallback_node.next = NULL;
>>   		req->io_task_work.func(req, &locked);
>> -
>> +		req = llist_entry(node, struct io_kiocb, io_task_work.fallback_node);
>> +	}
>>   	if (locked) {
>>   		if (ctx->submit_state.compl_nr)
>>   			io_submit_flush_completions(ctx);
>> @@ -2126,6 +2130,7 @@ static void tctx_task_work(struct callback_head *cb)
>>   				locked = mutex_trylock(&ctx->uring_lock);
>>   				percpu_ref_get(&ctx->refs);
>>   			}
>> +			node->next = NULL;
>>   			req->io_task_work.func(req, &locked);
>>   			node = next;
>>   		} while (node);
>>
> 



* Re: [PATCH 2/2] io_uring: fix race between poll completion and cancel_hash insertion
  2021-09-15  9:50   ` Pavel Begunkov
@ 2021-09-15 10:49     ` Hao Xu
  0 siblings, 0 replies; 11+ messages in thread
From: Hao Xu @ 2021-09-15 10:49 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

On 2021/9/15 5:50 PM, Pavel Begunkov wrote:
> On 9/12/21 5:23 PM, Hao Xu wrote:
>> If poll arming and poll completion run in parallel, there may be races.
>> For instance, run io_poll_add in iowq and io_poll_task_func in the
>> original context, then:
>>               iowq                          original context
>>    io_poll_add
>>      vfs_poll
>>       (interruption happens
>>        tw queued to original
>>        context)                              io_poll_task_func
>>                                                generate cqe
>>                                                del from cancel_hash[]
>>      if !poll.done
>>        insert to cancel_hash[]
>>
>> The entry is left in cancel_hash[]; a similar case exists for fast poll.
>> Fix it by setting poll.done = true when deleting from cancel_hash[].
> 
> Sounds like a valid concern. And the code looks good, but one
> of two patches crashed the kernel somewhere in io_read().
> 
> ./232c93d07b74-test
I'll run it too; I didn't hit that when I ran the test.
> 
> will be retesting
> 
> 
>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>> ---
>>
>> Didn't find the exact commit to add Fixes: for..
> 
> That's ok. Maybe you can find which kernel versions are
> affected? So we can add stable and specify where it should
> be backported? E.g.
Sure, will do that.
> 
> Cc: stable@vger.kernel.org # 5.12+
> 
> 
>>   fs/io_uring.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index c16f6be3d46b..988679e5063f 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5118,10 +5118,8 @@ static bool __io_poll_complete(struct io_kiocb *req, __poll_t mask)
>>   	}
>>   	if (req->poll.events & EPOLLONESHOT)
>>   		flags = 0;
>> -	if (!io_cqring_fill_event(ctx, req->user_data, error, flags)) {
>> -		req->poll.done = true;
>> +	if (!io_cqring_fill_event(ctx, req->user_data, error, flags))
>>   		flags = 0;
>> -	}
>>   	if (flags & IORING_CQE_F_MORE)
>>   		ctx->cq_extra++;
>>   
>> @@ -5152,6 +5150,7 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
>>   		if (done) {
>>   			io_poll_remove_double(req);
>>   			hash_del(&req->hash_node);
>> +			req->poll.done = true;
>>   		} else {
>>   			req->result = 0;
>>   			add_wait_queue(req->poll.head, &req->poll.wait);
>> @@ -5289,6 +5288,7 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>>   
>>   	hash_del(&req->hash_node);
>>   	io_poll_remove_double(req);
>> +	req->poll.done = true;
>>   	spin_unlock(&ctx->completion_lock);
>>   
>>   	if (!READ_ONCE(apoll->poll.canceled))
>>
> 



* Re: [PATCH 2/2] io_uring: fix race between poll completion and cancel_hash insertion
  2021-09-15 10:12   ` Pavel Begunkov
@ 2021-09-15 10:50     ` Hao Xu
  0 siblings, 0 replies; 11+ messages in thread
From: Hao Xu @ 2021-09-15 10:50 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

On 2021/9/15 6:12 PM, Pavel Begunkov wrote:
> On 9/12/21 5:23 PM, Hao Xu wrote:
>> If poll arming and poll completion run in parallel, there may be races.
>> For instance, run io_poll_add in iowq and io_poll_task_func in the
>> original context, then:
>>               iowq                          original context
>>    io_poll_add
>>      vfs_poll
>>       (interruption happens
>>        tw queued to original
>>        context)                              io_poll_task_func
>>                                                generate cqe
>>                                                del from cancel_hash[]
>>      if !poll.done
>>        insert to cancel_hash[]
>>
>> The entry is left in cancel_hash[]; a similar case exists for fast poll.
>> Fix it by setting poll.done = true when deleting from cancel_hash[].
>>
>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>> ---
>>
>> Didn't find the exact commit to add Fixes: for..
>>
>>   fs/io_uring.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index c16f6be3d46b..988679e5063f 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5118,10 +5118,8 @@ static bool __io_poll_complete(struct io_kiocb *req, __poll_t mask)
>>   	}
>>   	if (req->poll.events & EPOLLONESHOT)
>>   		flags = 0;
>> -	if (!io_cqring_fill_event(ctx, req->user_data, error, flags)) {
>> -		req->poll.done = true;
>> +	if (!io_cqring_fill_event(ctx, req->user_data, error, flags))
>>   		flags = 0;
>> -	}
>>   	if (flags & IORING_CQE_F_MORE)
>>   		ctx->cq_extra++;
>>   
>> @@ -5152,6 +5150,7 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
>>   		if (done) {
>>   			io_poll_remove_double(req);
>>   			hash_del(&req->hash_node);
>> +			req->poll.done = true;
>>   		} else {
>>   			req->result = 0;
>>   			add_wait_queue(req->poll.head, &req->poll.wait);
>> @@ -5289,6 +5288,7 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>>   
>>   	hash_del(&req->hash_node);
>>   	io_poll_remove_double(req);
>> +	req->poll.done = true;
> 
> Only a poll request has req->poll. E.g. this overwrites parts of req->rw.kiocb,
> I guess .ki_complete in particular.
> 
> struct async_poll *apoll = req->apoll;
> apoll->poll.done = true;
Thanks!
> 
> 
>>   	spin_unlock(&ctx->completion_lock);
>>   
>>   	if (!READ_ONCE(apoll->poll.canceled))
>>
> 



* Re: [PATCH 1/2] io_uring: fix tw list mess-up by adding tw while it's already in tw list
  2021-09-15 10:48     ` Hao Xu
@ 2021-09-26  9:48       ` Hao Xu
  2021-09-29 11:16         ` Pavel Begunkov
  0 siblings, 1 reply; 11+ messages in thread
From: Hao Xu @ 2021-09-26  9:48 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

On 2021/9/15 6:48 PM, Hao Xu wrote:
> On 2021/9/15 5:44 PM, Pavel Begunkov wrote:
>> On 9/12/21 5:23 PM, Hao Xu wrote:
>>> For multishot mode, there may be cases like:
>>> io_poll_task_func()
>>> -> add_wait_queue()
>>>                              async_wake()
>>>                              ->io_req_task_work_add()
>>>                              this one messes up the running task_work list
>>>                              since req->io_task_work.node is still in use.
>>>
>>> A similar situation exists for req->io_task_work.fallback_node.
>>> Fix it by setting node->next = NULL before we run the tw, so that when we
>>> add the req back to the wait queue in the middle of tw running, we can
>>> safely re-add it to the tw list.
>>
>> It may get screwed before we get to "node->next = NULL;",
>>
>> -> async_wake()
>>    -> io_req_task_work_add()
>> -> async_wake()
>>    -> io_req_task_work_add()
>> tctx_task_work()
> True, this may happen if there is second poll wait entry.
> This pacth is for single wait entry only..
> I'm thinking about the second poll entry issue, would be in a separate
> patch.
Hmm, I reviewed this email again and now I think I get what you were
saying. Do you mean the second async_wake() is triggered before we remove
the wait entry in the first async_wake(), like:

async_wake
                           async_wake
->del wait entry

>>
>>
>>> Fixes: 7cbf1722d5fc ("io_uring: provide FIFO ordering for task_work")
>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>> ---
>>>
>>>   fs/io_uring.c | 11 ++++++++---
>>>   1 file changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 30d959416eba..c16f6be3d46b 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -1216,13 +1216,17 @@ static void io_fallback_req_func(struct work_struct *work)
>>>       struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
>>>                           fallback_work.work);
>>>       struct llist_node *node = llist_del_all(&ctx->fallback_llist);
>>> -    struct io_kiocb *req, *tmp;
>>> +    struct io_kiocb *req;
>>>       bool locked = false;
>>>       percpu_ref_get(&ctx->refs);
>>> -    llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>> +    req = llist_entry(node, struct io_kiocb, io_task_work.fallback_node);
>>> +    while (member_address_is_nonnull(req, io_task_work.fallback_node)) {
>>> +        node = req->io_task_work.fallback_node.next;
>>> +        req->io_task_work.fallback_node.next = NULL;
>>>           req->io_task_work.func(req, &locked);
>>> -
>>> +        req = llist_entry(node, struct io_kiocb, io_task_work.fallback_node);
>>> +    }
>>>       if (locked) {
>>>           if (ctx->submit_state.compl_nr)
>>>               io_submit_flush_completions(ctx);
>>> @@ -2126,6 +2130,7 @@ static void tctx_task_work(struct callback_head *cb)
>>>                   locked = mutex_trylock(&ctx->uring_lock);
>>>                   percpu_ref_get(&ctx->refs);
>>>               }
>>> +            node->next = NULL;
>>>               req->io_task_work.func(req, &locked);
>>>               node = next;
>>>           } while (node);
>>>
>>



* Re: [PATCH 1/2] io_uring: fix tw list mess-up by adding tw while it's already in tw list
  2021-09-26  9:48       ` Hao Xu
@ 2021-09-29 11:16         ` Pavel Begunkov
  0 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-09-29 11:16 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/26/21 10:48 AM, Hao Xu wrote:
> On 2021/9/15 6:48 PM, Hao Xu wrote:
>> On 2021/9/15 5:44 PM, Pavel Begunkov wrote:
>>> On 9/12/21 5:23 PM, Hao Xu wrote:
>>>> For multishot mode, there may be cases like:
>>>> io_poll_task_func()
>>>> -> add_wait_queue()
>>>>                              async_wake()
>>>>                              ->io_req_task_work_add()
>>>>                              this one messes up the running task_work list
>>>>                              since req->io_task_work.node is still in use.
>>>>
>>>> A similar situation exists for req->io_task_work.fallback_node.
>>>> Fix it by setting node->next = NULL before we run the tw, so that when we
>>>> add the req back to the wait queue in the middle of tw running, we can
>>>> safely re-add it to the tw list.
>>>
>>> It may get screwed before we get to "node->next = NULL;",
>>>
>>> -> async_wake()
>>>    -> io_req_task_work_add()
>>> -> async_wake()
>>>    -> io_req_task_work_add()
>>> tctx_task_work()
>> True, this may happen if there is a second poll wait entry.
>> This patch is for the single wait entry case only.
>> I'm thinking about the second poll entry issue; it would be in a
>> separate patch.
> Hmm, I reviewed this email again and now I think I get what you were
> saying. Do you mean the second async_wake() is triggered before we remove
> the wait entry in the first async_wake(), like:
> 
> async_wake
>                           async_wake
> ->del wait entry

Looks like we had different problems in mind; let's move the conversation
to the new thread with the resent patches.



>>>> Fixes: 7cbf1722d5fc ("io_uring: provide FIFO ordering for task_work")
>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>> ---
>>>>
>>>>   fs/io_uring.c | 11 ++++++++---
>>>>   1 file changed, 8 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 30d959416eba..c16f6be3d46b 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -1216,13 +1216,17 @@ static void io_fallback_req_func(struct work_struct *work)
>>>>       struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
>>>>                           fallback_work.work);
>>>>       struct llist_node *node = llist_del_all(&ctx->fallback_llist);
>>>> -    struct io_kiocb *req, *tmp;
>>>> +    struct io_kiocb *req;
>>>>       bool locked = false;
>>>>       percpu_ref_get(&ctx->refs);
>>>> -    llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>>> +    req = llist_entry(node, struct io_kiocb, io_task_work.fallback_node);
>>>> +    while (member_address_is_nonnull(req, io_task_work.fallback_node)) {
>>>> +        node = req->io_task_work.fallback_node.next;
>>>> +        req->io_task_work.fallback_node.next = NULL;
>>>>           req->io_task_work.func(req, &locked);
>>>> -
>>>> +        req = llist_entry(node, struct io_kiocb, io_task_work.fallback_node);
>>>> +    }
>>>>       if (locked) {
>>>>           if (ctx->submit_state.compl_nr)
>>>>               io_submit_flush_completions(ctx);
>>>> @@ -2126,6 +2130,7 @@ static void tctx_task_work(struct callback_head *cb)
>>>>                   locked = mutex_trylock(&ctx->uring_lock);
>>>>                   percpu_ref_get(&ctx->refs);
>>>>               }
>>>> +            node->next = NULL;
>>>>               req->io_task_work.func(req, &locked);
>>>>               node = next;
>>>>           } while (node);
>>>>
>>>
> 

-- 
Pavel Begunkov


end of thread

Thread overview: 11+ messages
2021-09-12 16:23 [PATCH 0/2] poll fixes Hao Xu
2021-09-12 16:23 ` [PATCH 1/2] io_uring: fix tw list mess-up by adding tw while it's already in tw list Hao Xu
2021-09-15  9:44   ` Pavel Begunkov
2021-09-15 10:48     ` Hao Xu
2021-09-26  9:48       ` Hao Xu
2021-09-29 11:16         ` Pavel Begunkov
2021-09-12 16:23 ` [PATCH 2/2] io_uring: fix race between poll completion and cancel_hash insertion Hao Xu
2021-09-15  9:50   ` Pavel Begunkov
2021-09-15 10:49     ` Hao Xu
2021-09-15 10:12   ` Pavel Begunkov
2021-09-15 10:50     ` Hao Xu
