io-uring.vger.kernel.org archive mirror
* Polled I/O cannot find completions
@ 2020-03-27  2:57 Bijan Mottahedeh
  2020-03-27 15:36 ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Bijan Mottahedeh @ 2020-03-27  2:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring

I'm seeing poll threads hang as I increase the number of threads in 
polled fio tests.  I think this is because of polling on BLK_QC_T_NONE 
cookie, which will never succeed.

A related problem, however, is that the meaning of BLK_QC_T_NONE seems to 
be ambiguous.

Specifically, the following cases return BLK_QC_T_NONE, which I think 
would be problematic for polled I/O:


generic_make_request()
...
         if (current->bio_list) {
                 bio_list_add(&current->bio_list[0], bio);
                 goto out;
         }

In this case the request is delayed but should get a cookie eventually.  
How does the caller know what the right action is in this case for a 
polled request?  Polling would never succeed.
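
To make the concern concrete, here is a minimal userspace sketch of the deferral path, not kernel code. All names (qc_t, QC_NONE, bio_model, task_model, submit_model) are hypothetical stand-ins; the only thing taken from the snippet above is the shape of the logic: when a submission is already in progress, the bio is appended to a pending list and the caller gets the "none" cookie back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
typedef unsigned int qc_t;
#define QC_NONE ((qc_t)-1U)

struct bio_model {
    struct bio_model *next;
    qc_t issued_cookie;              /* cookie the driver would hand out */
};

struct task_model {
    struct bio_model *pending_head;  /* models current->bio_list[0] */
    struct bio_model **pending_tail;
    int in_submit;                   /* models current->bio_list != NULL */
};

void task_model_init(struct task_model *t)
{
    assert(t != NULL);
    t->pending_head = NULL;
    t->pending_tail = &t->pending_head;
    t->in_submit = 0;
}

/* Returns the cookie the caller sees at submission time. */
qc_t submit_model(struct task_model *t, struct bio_model *bio)
{
    assert(t != NULL && bio != NULL);
    if (t->in_submit) {
        /* Deferred: queue the bio for later and report no cookie. */
        bio->next = NULL;
        *t->pending_tail = bio;
        t->pending_tail = &bio->next;
        return QC_NONE;
    }
    /* Issued right away: the caller gets something it can poll on. */
    return bio->issued_cookie;
}
```

In this model the deferred bio will still be issued later, but the value returned to the submitter never changes, which is the ambiguity being asked about.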


__blk_mq_issue_directly()
...
         case BLK_STS_RESOURCE:
         case BLK_STS_DEV_RESOURCE:
                 blk_mq_update_dispatch_busy(hctx, true);
                 __blk_mq_requeue_request(rq);
                 break;

In this case, cookie is not updated and would keep its default 
BLK_QC_T_NONE value from blk_mq_make_request().  However, this request 
will eventually be reissued, so again, how would the caller poll for the 
completion of this request?
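
The same point as a small userspace sketch, under the assumption that the cookie is only written on the success path. The names (qc_t, QC_NONE, sts_model, rq_model, issue_directly_model) are made up for illustration and do not match kernel signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
typedef unsigned int qc_t;
#define QC_NONE ((qc_t)-1U)

enum sts_model { STS_OK, STS_RESOURCE };

struct rq_model {
    unsigned int tag;   /* used to build the cookie on success */
    int requeued;       /* set when the request is put back for later */
};

/*
 * Models the switch in the snippet above: on success the cookie is
 * filled in, on a resource shortage the request is requeued and the
 * cookie is left at whatever the caller initialised it to.
 */
void issue_directly_model(struct rq_model *rq, enum sts_model driver_ret,
                          qc_t *cookie)
{
    assert(rq != NULL && cookie != NULL);
    switch (driver_ret) {
    case STS_OK:
        *cookie = (qc_t)rq->tag;
        break;
    case STS_RESOURCE:
        rq->requeued = 1;   /* will be reissued later */
        break;              /* cookie deliberately untouched */
    }
}
```

If the caller starts with cookie = QC_NONE, the requeue path hands QC_NONE back even though the request is still live.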

blk_mq_try_issue_directly()
...
         ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true);
         if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
                 blk_mq_request_bypass_insert(rq, false, true);

Am I missing something here?

Incidentally, I don't see BLK_QC_T_EAGAIN used anywhere, should it be?

Thanks.

--bijan






* Re: Polled I/O cannot find completions
  2020-03-27  2:57 Polled I/O cannot find completions Bijan Mottahedeh
@ 2020-03-27 15:36 ` Jens Axboe
  2020-03-27 16:31   ` Bijan Mottahedeh
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2020-03-27 15:36 UTC (permalink / raw)
  To: Bijan Mottahedeh; +Cc: io-uring, linux-block

CC'ing linux-block, this isn't an io_uring issue.



-- 
Jens Axboe



* Re: Polled I/O cannot find completions
  2020-03-27 15:36 ` Jens Axboe
@ 2020-03-27 16:31   ` Bijan Mottahedeh
  2020-03-27 16:35     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Bijan Mottahedeh @ 2020-03-27 16:31 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, linux-block

Does io_uring though have to deal with BLK_QC_T_NONE at all?  Or are you 
saying that it should never receive that result?
That's one of the things I'm not clear about.

--bijan




* Re: Polled I/O cannot find completions
  2020-03-27 16:31   ` Bijan Mottahedeh
@ 2020-03-27 16:35     ` Jens Axboe
  2020-03-31 18:43       ` Bijan Mottahedeh
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2020-03-27 16:35 UTC (permalink / raw)
  To: Bijan Mottahedeh; +Cc: io-uring, linux-block

On 3/27/20 10:31 AM, Bijan Mottahedeh wrote:
> Does io_uring though have to deal with BLK_QC_T_NONE at all?  Or are you 
> saying that it should never receive that result?
> That's one of the things I'm not clear about.

BLK_QC_T_* are block cookies, they are only valid in the block layer.
Only the poll handler called should have to deal with them, inside
their f_op->iopoll() handler. It's simply passed from the queue to
the poll side.

So no, io_uring shouldn't have to deal with them at all.

The problem, as I see it, is if the block layer returns BLK_QC_T_NONE
and the IO was actually queued and requires polling to be found. We'd
end up with IO timeouts for handling those requests, and that's not a
good thing...
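
A rough userspace illustration of that failure mode, assuming the poll side rejects an invalid cookie up front and that a polled queue has no interrupt to fall back on. Every name here (qc_t, QC_NONE, hwq_model, poll_model) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
typedef unsigned int qc_t;
#define QC_NONE ((qc_t)-1U)

/* A polled queue: completions sit here until somebody polls for them. */
struct hwq_model {
    int pending;   /* completions waiting to be reaped */
    int reaped;    /* completions already found by polling */
};

static bool qc_valid(qc_t cookie)
{
    return cookie != QC_NONE;
}

/* Returns the number of completions found by this poll attempt. */
int poll_model(struct hwq_model *q, qc_t cookie)
{
    assert(q != NULL);
    if (!qc_valid(cookie))
        return 0;           /* nothing to poll on, queue is never looked at */
    int found = q->pending;
    q->reaped += found;
    q->pending = 0;
    return found;
}
```

With a QC_NONE cookie the pending completion stays where it is no matter how often the caller polls, so in this model only a timeout would ever surface it.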

>>> Incidentally, I don't see BLK_QC_T_EAGAIN used anywhere, should it be?

Pretty sure that's a leftover from when an attempt was made to pass
back -EAGAIN inline instead of through the bio end_io handler.


-- 
Jens Axboe



* Re: Polled I/O cannot find completions
  2020-03-27 16:35     ` Jens Axboe
@ 2020-03-31 18:43       ` Bijan Mottahedeh
  2020-04-01  1:01         ` Bijan Mottahedeh
  0 siblings, 1 reply; 6+ messages in thread
From: Bijan Mottahedeh @ 2020-03-31 18:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, linux-block


>> Does io_uring though have to deal with BLK_QC_T_NONE at all?  Or are you
>> saying that it should never receive that result?
>> That's one of the things I'm not clear about.
> BLK_QC_T_* are block cookies, they are only valid in the block layer.
> Only the poll handler called should have to deal with them, inside
> their f_op->iopoll() handler. It's simply passed from the queue to
> the poll side.
>
> So no, io_uring shouldn't have to deal with them at all.
>
> The problem, as I see it, is if the block layer returns BLK_QC_T_NONE
> and the IO was actually queued and requires polling to be found. We'd
> end up with IO timeouts for handling those requests, and that's not a
> good thing...

I see requests in io_do_iopoll() on poll_list with req->res == -EAGAIN, 
I think because the completion happened after an issued request was 
added to poll_list in io_iopoll_req_issued().

How should we deal with such a request, reissue unconditionally or 
something else?

--bijan


* Re: Polled I/O cannot find completions
  2020-03-31 18:43       ` Bijan Mottahedeh
@ 2020-04-01  1:01         ` Bijan Mottahedeh
  0 siblings, 0 replies; 6+ messages in thread
From: Bijan Mottahedeh @ 2020-04-01  1:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, linux-block

On 3/31/2020 11:43 AM, Bijan Mottahedeh wrote:
> I see requests in io_do_iopoll() on poll_list with req->res == 
> -EAGAIN, I think because the completion happened after an issued 
> request was added to poll_list in io_iopoll_req_issued().
>
> How should we deal with such a request, reissue unconditionally or 
> something else?
>

I mimicked the done processing code in io_iopoll_complete() for -EAGAIN 
as a test.  I can now get further and don't see polling threads hang; in 
fact, I eventually see I/O timeouts as you noted.

It seems that there might be two separate issues here. Does that make sense?

Thanks.

--bijan

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 62bd410..a3e3a4e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1738,11 +1738,24 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx,
         io_free_req_many(ctx, &rb);
  }

+static void io_iopoll_queue(struct list_head *again)
+{
+       struct io_kiocb *req;
+
+       while (!list_empty(again)) {
+               req = list_first_entry(again, struct io_kiocb, list);
+               list_del(&req->list);
+               refcount_inc(&req->refs);
+               io_queue_async_work(req);
+       }
+}
+
  static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
                         long min)
  {
         struct io_kiocb *req, *tmp;
         LIST_HEAD(done);
+       LIST_HEAD(again);
         bool spin;
         int ret;

@@ -1757,9 +1770,9 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned
                 struct kiocb *kiocb = &req->rw.kiocb;

                 /*
-                * Move completed entries to our local list. If we find a
-                * request that requires polling, break out and complete
-                * the done list first, if we have entries there.
+                * Move completed and retryable entries to our local lists.
+                * If we find a request that requires polling, break out
+                * and complete those lists first, if we have entries there.
                  */
                 if (req->flags & REQ_F_IOPOLL_COMPLETED) {
                         list_move_tail(&req->list, &done);
@@ -1768,6 +1781,13 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned
                 if (!list_empty(&done))
                         break;

+               if (req->result == -EAGAIN) {
+                       list_move_tail(&req->list, &again);
+                       continue;
+               }
+               if (!list_empty(&again))
+                       break;
+
                 ret = kiocb->ki_filp->f_op->iopoll(kiocb, spin);
                 if (ret < 0)
                         break;
@@ -1780,6 +1800,9 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned
         if (!list_empty(&done))
                 io_iopoll_complete(ctx, nr_events, &done);

+       if (!list_empty(&again))
+               io_iopoll_queue(&again);
+
         return ret;
  }
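
For anyone who wants to reason about the loop ordering without building a kernel, the scan in the test patch above can be approximated in userspace roughly as follows. This is only a sketch: req_model, tally and scan_poll_list are invented names, the kernel lists are replaced by a plain array, and the actual ->iopoll() call is reduced to a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a request on the poll list. */
struct req_model {
    int completed;   /* models REQ_F_IOPOLL_COMPLETED being set */
    int result;      /* models req->result */
};

struct tally {
    size_t done;     /* entries that would move to the done list */
    size_t again;    /* entries that would move to the again list */
    size_t polled;   /* entries that would reach ->iopoll() */
};

/*
 * Walks the list in order, applying the same checks as the patched
 * loop: completed entries first, then -EAGAIN entries, and a break as
 * soon as a request needs polling while either local list is non-empty.
 */
struct tally scan_poll_list(const struct req_model *reqs, size_t n)
{
    struct tally t = { 0, 0, 0 };

    assert(reqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].completed) {
            t.done++;
            continue;
        }
        if (t.done)
            break;
        if (reqs[i].result == -EAGAIN) {
            t.again++;
            continue;
        }
        if (t.again)
            break;
        t.polled++;
    }
    return t;
}
```

One thing the model makes visible is that a completed entry sitting behind an -EAGAIN entry is still picked up in the same pass, while an -EAGAIN entry behind a completed one waits for the next pass.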



