All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/2] io_uring: don't use kiocb.private to store buf_index
@ 2020-05-19 21:52 Bijan Mottahedeh
  2020-05-19 21:52 ` [RFC 1/2] " Bijan Mottahedeh
  2020-05-19 21:52 ` [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported Bijan Mottahedeh
  0 siblings, 2 replies; 16+ messages in thread
From: Bijan Mottahedeh @ 2020-05-19 21:52 UTC (permalink / raw)
  To: axboe; +Cc: io-uring

This patch set addresses problems hit when running liburing
500f9fbadef8-test.

- Patch 1 is a suggested fix to overloading of kiocb.private since it
can be written by iomap_dio_rw().

io_import_iovec() can fail a submission as follows:

	/* buffer index only valid with fixed read/write, or buffer select  */
	if (req->rw.kiocb.private && !(req->flags & REQ_F_BUFFER_SELECT))
		return -EINVAL;

so a read request fails with -EINVAL upon retry if iomap_dio_rw() has
written to iocb->private:

       WRITE_ONCE(iocb->private, dio->submit.last_queue);

The suggested fix is use a separate variable to store buf_index.

- Patch 2 reverts c58c1f8343 which had changed the error for
REQ_NOWAIT requests to non-mq queue from -ENOTSUP to -EAGAIN.

	/*
	 * Non-mq queues do not honor REQ_NOWAIT, so complete a bio
	 * with BLK_STS_AGAIN status in order to catch -EAGAIN and
	 * to give a chance to the caller to repeat request gracefully.
	 */
	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q)) {
		status = BLK_STS_AGAIN;
		goto end_io;
	}

I'm not clear what the original reasoning was but io_wq_submit_work()
will call io_issue_sqe() continuously as long as -EAGAIN is returned.
I'm not sure if this could break something else.

I ran fio as specified in c58c1f8343 with this change and don't see any errors.

Bijan Mottahedeh (2):
  io_uring: don't use kiocb.private to store buf_index
  io_uring: mark REQ_NOWAIT for a non-mq queue as unspported

 block/blk-core.c | 10 +++-------
 fs/io_uring.c    | 15 +++++++--------
 2 files changed, 10 insertions(+), 15 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC 1/2] io_uring: don't use kiocb.private to store buf_index
  2020-05-19 21:52 [RFC 0/2] io_uring: don't use kiocb.private to store buf_index Bijan Mottahedeh
@ 2020-05-19 21:52 ` Bijan Mottahedeh
  2020-05-19 22:07   ` Jens Axboe
  2020-05-19 21:52 ` [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported Bijan Mottahedeh
  1 sibling, 1 reply; 16+ messages in thread
From: Bijan Mottahedeh @ 2020-05-19 21:52 UTC (permalink / raw)
  To: axboe; +Cc: io-uring

kiocb.private is used in iomap_dio_rw() so store buf_index separately.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
---
 fs/io_uring.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7a17418..9596859 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -401,6 +401,7 @@ struct io_rw {
 	struct kiocb			kiocb;
 	u64				addr;
 	u64				len;
+	u16				buf_index;
 };
 
 struct io_connect {
@@ -2122,9 +2123,7 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 
 	req->rw.addr = READ_ONCE(sqe->addr);
 	req->rw.len = READ_ONCE(sqe->len);
-	/* we own ->private, reuse it for the buffer index  / buffer ID */
-	req->rw.kiocb.private = (void *) (unsigned long)
-					READ_ONCE(sqe->buf_index);
+	req->rw.buf_index = READ_ONCE(sqe->buf_index);
 	return 0;
 }
 
@@ -2167,7 +2166,7 @@ static ssize_t io_import_fixed(struct io_kiocb *req, int rw,
 	struct io_ring_ctx *ctx = req->ctx;
 	size_t len = req->rw.len;
 	struct io_mapped_ubuf *imu;
-	unsigned index, buf_index;
+	u16 index, buf_index;
 	size_t offset;
 	u64 buf_addr;
 
@@ -2175,7 +2174,7 @@ static ssize_t io_import_fixed(struct io_kiocb *req, int rw,
 	if (unlikely(!ctx->user_bufs))
 		return -EFAULT;
 
-	buf_index = (unsigned long) req->rw.kiocb.private;
+	buf_index = req->rw.buf_index;
 	if (unlikely(buf_index >= ctx->nr_user_bufs))
 		return -EFAULT;
 
@@ -2291,10 +2290,10 @@ static void __user *io_rw_buffer_select(struct io_kiocb *req, size_t *len,
 					bool needs_lock)
 {
 	struct io_buffer *kbuf;
-	int bgid;
+	u16 bgid;
 
 	kbuf = (struct io_buffer *) (unsigned long) req->rw.addr;
-	bgid = (int) (unsigned long) req->rw.kiocb.private;
+	bgid = req->rw.buf_index;
 	kbuf = io_buffer_select(req, len, bgid, kbuf, needs_lock);
 	if (IS_ERR(kbuf))
 		return kbuf;
@@ -2385,7 +2384,7 @@ static ssize_t io_import_iovec(int rw, struct io_kiocb *req,
 	}
 
 	/* buffer index only valid with fixed read/write, or buffer select  */
-	if (req->rw.kiocb.private && !(req->flags & REQ_F_BUFFER_SELECT))
+	if (req->rw.buf_index && !(req->flags & REQ_F_BUFFER_SELECT))
 		return -EINVAL;
 
 	if (opcode == IORING_OP_READ || opcode == IORING_OP_WRITE) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported
  2020-05-19 21:52 [RFC 0/2] io_uring: don't use kiocb.private to store buf_index Bijan Mottahedeh
  2020-05-19 21:52 ` [RFC 1/2] " Bijan Mottahedeh
@ 2020-05-19 21:52 ` Bijan Mottahedeh
  2020-05-19 22:16   ` Fwd: " Jens Axboe
  2020-05-28 18:35   ` Jeff Moyer
  1 sibling, 2 replies; 16+ messages in thread
From: Bijan Mottahedeh @ 2020-05-19 21:52 UTC (permalink / raw)
  To: axboe; +Cc: io-uring

Mark a REQ_NOWAIT request for a non-mq queue as unspported instead of
retryable since otherwise the io_uring layer will keep resubmitting
the request.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
---
 block/blk-core.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5847993..3807140 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -962,14 +962,10 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
 	}
 
 	/*
-	 * Non-mq queues do not honor REQ_NOWAIT, so complete a bio
-	 * with BLK_STS_AGAIN status in order to catch -EAGAIN and
-	 * to give a chance to the caller to repeat request gracefully.
+	 * Non-mq queues do not honor REQ_NOWAIT, return -EOPNOTSUPP.
 	 */
-	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q)) {
-		status = BLK_STS_AGAIN;
-		goto end_io;
-	}
+	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q))
+		goto not_supported;
 
 	if (should_fail_bio(bio))
 		goto end_io;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC 1/2] io_uring: don't use kiocb.private to store buf_index
  2020-05-19 21:52 ` [RFC 1/2] " Bijan Mottahedeh
@ 2020-05-19 22:07   ` Jens Axboe
  2020-05-19 22:20     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2020-05-19 22:07 UTC (permalink / raw)
  To: Bijan Mottahedeh; +Cc: io-uring

On 5/19/20 3:52 PM, Bijan Mottahedeh wrote:
> kiocb.private is used in iomap_dio_rw() so store buf_index separately.

Hmm, that's no good, the owner of the iocb really should own ->private
as well.

The downside of this patch is that io_rw now spills into the next
cacheline, which propagates to io_kiocb as well. iocb has 4 bytes
of padding, but probably cleaner if we can stuff it into io_kiocb
instead. How about adding a u16 after opcode? There's a 2 byte
hole there, so it would not impact the size of io_kiocb.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Fwd: [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported
  2020-05-19 21:52 ` [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported Bijan Mottahedeh
@ 2020-05-19 22:16   ` Jens Axboe
  2020-05-20 17:50     ` Christoph Hellwig
  2020-05-28 18:35   ` Jeff Moyer
  1 sibling, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2020-05-19 22:16 UTC (permalink / raw)
  To: linux-block




-------- Forwarded Message --------
Subject: [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported
Date: Tue, 19 May 2020 14:52:50 -0700
From: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
To: axboe@kernel.dk
CC: io-uring@vger.kernel.org

Mark a REQ_NOWAIT request for a non-mq queue as unspported instead of
retryable since otherwise the io_uring layer will keep resubmitting
the request.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
---
 block/blk-core.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5847993..3807140 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -962,14 +962,10 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
 	}
 
 	/*
-	 * Non-mq queues do not honor REQ_NOWAIT, so complete a bio
-	 * with BLK_STS_AGAIN status in order to catch -EAGAIN and
-	 * to give a chance to the caller to repeat request gracefully.
+	 * Non-mq queues do not honor REQ_NOWAIT, return -EOPNOTSUPP.
 	 */
-	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q)) {
-		status = BLK_STS_AGAIN;
-		goto end_io;
-	}
+	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q))
+		goto not_supported;
 
 	if (should_fail_bio(bio))
 		goto end_io;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC 1/2] io_uring: don't use kiocb.private to store buf_index
  2020-05-19 22:07   ` Jens Axboe
@ 2020-05-19 22:20     ` Jens Axboe
  2020-05-19 22:48       ` Bijan Mottahedeh
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2020-05-19 22:20 UTC (permalink / raw)
  To: Bijan Mottahedeh; +Cc: io-uring

On 5/19/20 4:07 PM, Jens Axboe wrote:
> On 5/19/20 3:52 PM, Bijan Mottahedeh wrote:
>> kiocb.private is used in iomap_dio_rw() so store buf_index separately.
> 
> Hmm, that's no good, the owner of the iocb really should own ->private
> as well.
> 
> The downside of this patch is that io_rw now spills into the next
> cacheline, which propagates to io_kiocb as well. iocb has 4 bytes
> of padding, but probably cleaner if we can stuff it into io_kiocb
> instead. How about adding a u16 after opcode? There's a 2 byte
> hole there, so it would not impact the size of io_kiocb.

I applied your patch, but moved the buf_index to not grow the
structure:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.7&id=4f4eeba87cc731b200bff9372d14a80f5996b277

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 1/2] io_uring: don't use kiocb.private to store buf_index
  2020-05-19 22:20     ` Jens Axboe
@ 2020-05-19 22:48       ` Bijan Mottahedeh
  0 siblings, 0 replies; 16+ messages in thread
From: Bijan Mottahedeh @ 2020-05-19 22:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring

On 5/19/2020 3:20 PM, Jens Axboe wrote:
> On 5/19/20 4:07 PM, Jens Axboe wrote:
>> On 5/19/20 3:52 PM, Bijan Mottahedeh wrote:
>>> kiocb.private is used in iomap_dio_rw() so store buf_index separately.
>> Hmm, that's no good, the owner of the iocb really should own ->private
>> as well.
>>
>> The downside of this patch is that io_rw now spills into the next
>> cacheline, which propagates to io_kiocb as well. iocb has 4 bytes
>> of padding, but probably cleaner if we can stuff it into io_kiocb
>> instead. How about adding a u16 after opcode? There's a 2 byte
>> hole there, so it would not impact the size of io_kiocb.
> I applied your patch, but moved the buf_index to not grow the
> structure:
>
> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.7&id=4f4eeba87cc731b200bff9372d14a80f5996b277
>

That works.

Thanks!

--bijan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Fwd: [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported
  2020-05-19 22:16   ` Fwd: " Jens Axboe
@ 2020-05-20 17:50     ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2020-05-20 17:50 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Tue, May 19, 2020 at 04:16:45PM -0600, Jens Axboe wrote:
> 
> 
> 
> -------- Forwarded Message --------
> Subject: [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported
> Date: Tue, 19 May 2020 14:52:50 -0700
> From: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
> To: axboe@kernel.dk
> CC: io-uring@vger.kernel.org
> 
> Mark a REQ_NOWAIT request for a non-mq queue as unspported instead of
> retryable since otherwise the io_uring layer will keep resubmitting
> the request.

Someone needs to audit if this translates into the right errors message
for the syscalls, as they eventually need to return -EAGAIN
to userspace.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported
  2020-05-19 21:52 ` [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported Bijan Mottahedeh
  2020-05-19 22:16   ` Fwd: " Jens Axboe
@ 2020-05-28 18:35   ` Jeff Moyer
  2020-05-28 19:01     ` Jens Axboe
  1 sibling, 1 reply; 16+ messages in thread
From: Jeff Moyer @ 2020-05-28 18:35 UTC (permalink / raw)
  To: axboe, Bijan Mottahedeh; +Cc: io-uring

Bijan Mottahedeh <bijan.mottahedeh@oracle.com> writes:

> Mark a REQ_NOWAIT request for a non-mq queue as unspported instead of
> retryable since otherwise the io_uring layer will keep resubmitting
> the request.

Getting back to this...

Jens, right now (using your io_uring-5.7 or linus' tree) fio's
t/io_uring will never get io completions when run against a file on a
file system that is backed by lvm.  The system will have one workqueue
per sqe submitted, all spinning, eating up CPU time.

# ./t/io_uring /mnt/test/poo 
Added file /mnt/test/poo
sq_ring ptr = 0x0x7fbed40ae000
sqes ptr    = 0x0x7fbed40ac000
cq_ring ptr = 0x0x7fbed40aa000
polled=1, fixedbufs=1, buffered=0 QD=128, sq_ring=128, cq_ring=256
submitter=3851
IOPS=128, IOS/call=6/0, inflight=128 (128)
IOPS=0, IOS/call=0/0, inflight=128 (128)
IOPS=0, IOS/call=0/0, inflight=128 (128)
IOPS=0, IOS/call=0/0, inflight=128 (128)
IOPS=0, IOS/call=0/0, inflight=128 (128)
IOPS=0, IOS/call=0/0, inflight=128 (128)
...

# ps auxw | grep io_wqe
root      3849 80.1  0.0      0     0 ?        R    14:32   0:40 [io_wqe_worker-0]
root      3850  0.0  0.0      0     0 ?        S    14:32   0:00 [io_wqe_worker-0]
root      3853 72.8  0.0      0     0 ?        R    14:32   0:36 [io_wqe_worker-0]
root      3854 81.4  0.0      0     0 ?        R    14:32   0:40 [io_wqe_worker-1]
root      3855 74.8  0.0      0     0 ?        R    14:32   0:37 [io_wqe_worker-0]
root      3856 74.8  0.0      0     0 ?        R    14:32   0:37 [io_wqe_worker-1]
...

# ps auxw | grep io_wqe | grep -v grep | wc -l
129

With this patch applied, the test program will exit without doing I/O
(which I don't think is the right behavior either, right?):

# t/io_uring /mnt/test/poo
Added file /mnt/test/poo
sq_ring ptr = 0x0x7fdb98f00000
sqes ptr    = 0x0x7fdb98efe000
cq_ring ptr = 0x0x7fdb98efc000
polled=1, fixedbufs=1, buffered=0 QD=128, sq_ring=128, cq_ring=256
submitter=33233
io: unexpected ret=-95
Your filesystem/driver/kernel doesn't support polled IO
IOPS=128, IOS/call=32/0, inflight=128 (127)

/mnt/test is an xfs file system on top of a linear LVM volume on an nvme
device (with 8 poll queues configured).

-Jeff

>
> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
> ---
>  block/blk-core.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 5847993..3807140 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -962,14 +962,10 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
>  	}
>  
>  	/*
> -	 * Non-mq queues do not honor REQ_NOWAIT, so complete a bio
> -	 * with BLK_STS_AGAIN status in order to catch -EAGAIN and
> -	 * to give a chance to the caller to repeat request gracefully.
> +	 * Non-mq queues do not honor REQ_NOWAIT, return -EOPNOTSUPP.
>  	 */
> -	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q)) {
> -		status = BLK_STS_AGAIN;
> -		goto end_io;
> -	}
> +	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q))
> +		goto not_supported;
>  
>  	if (should_fail_bio(bio))
>  		goto end_io;


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported
  2020-05-28 18:35   ` Jeff Moyer
@ 2020-05-28 19:01     ` Jens Axboe
  2020-05-28 19:22       ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2020-05-28 19:01 UTC (permalink / raw)
  To: Jeff Moyer, Bijan Mottahedeh; +Cc: io-uring

On 5/28/20 12:35 PM, Jeff Moyer wrote:
> Bijan Mottahedeh <bijan.mottahedeh@oracle.com> writes:
> 
>> Mark a REQ_NOWAIT request for a non-mq queue as unspported instead of
>> retryable since otherwise the io_uring layer will keep resubmitting
>> the request.
> 
> Getting back to this...
> 
> Jens, right now (using your io_uring-5.7 or linus' tree) fio's
> t/io_uring will never get io completions when run against a file on a
> file system that is backed by lvm.  The system will have one workqueue
> per sqe submitted, all spinning, eating up CPU time.
> 
> # ./t/io_uring /mnt/test/poo 
> Added file /mnt/test/poo
> sq_ring ptr = 0x0x7fbed40ae000
> sqes ptr    = 0x0x7fbed40ac000
> cq_ring ptr = 0x0x7fbed40aa000
> polled=1, fixedbufs=1, buffered=0 QD=128, sq_ring=128, cq_ring=256
> submitter=3851
> IOPS=128, IOS/call=6/0, inflight=128 (128)
> IOPS=0, IOS/call=0/0, inflight=128 (128)
> IOPS=0, IOS/call=0/0, inflight=128 (128)
> IOPS=0, IOS/call=0/0, inflight=128 (128)
> IOPS=0, IOS/call=0/0, inflight=128 (128)
> IOPS=0, IOS/call=0/0, inflight=128 (128)
> ...
> 
> # ps auxw | grep io_wqe
> root      3849 80.1  0.0      0     0 ?        R    14:32   0:40 [io_wqe_worker-0]
> root      3850  0.0  0.0      0     0 ?        S    14:32   0:00 [io_wqe_worker-0]
> root      3853 72.8  0.0      0     0 ?        R    14:32   0:36 [io_wqe_worker-0]
> root      3854 81.4  0.0      0     0 ?        R    14:32   0:40 [io_wqe_worker-1]
> root      3855 74.8  0.0      0     0 ?        R    14:32   0:37 [io_wqe_worker-0]
> root      3856 74.8  0.0      0     0 ?        R    14:32   0:37 [io_wqe_worker-1]
> ...
> 
> # ps auxw | grep io_wqe | grep -v grep | wc -l
> 129
> 
> With this patch applied, the test program will exit without doing I/O
> (which I don't think is the right behavior either, right?):
> 
> # t/io_uring /mnt/test/poo
> Added file /mnt/test/poo
> sq_ring ptr = 0x0x7fdb98f00000
> sqes ptr    = 0x0x7fdb98efe000
> cq_ring ptr = 0x0x7fdb98efc000
> polled=1, fixedbufs=1, buffered=0 QD=128, sq_ring=128, cq_ring=256
> submitter=33233
> io: unexpected ret=-95
> Your filesystem/driver/kernel doesn't support polled IO
> IOPS=128, IOS/call=32/0, inflight=128 (127)
> 
> /mnt/test is an xfs file system on top of a linear LVM volume on an nvme
> device (with 8 poll queues configured).

poll won't work over dm, so that looks correct. What happens if you edit
it and disable poll? Would be curious to see both buffered = 0 and
buffered = 1 runs with that.

I'll try this here too.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported
  2020-05-28 19:01     ` Jens Axboe
@ 2020-05-28 19:22       ` Jens Axboe
  2020-05-28 19:31         ` Jeff Moyer
  2020-05-28 22:12         ` Jeff Moyer
  0 siblings, 2 replies; 16+ messages in thread
From: Jens Axboe @ 2020-05-28 19:22 UTC (permalink / raw)
  To: Jeff Moyer, Bijan Mottahedeh; +Cc: io-uring

On 5/28/20 1:01 PM, Jens Axboe wrote:
> On 5/28/20 12:35 PM, Jeff Moyer wrote:
>> Bijan Mottahedeh <bijan.mottahedeh@oracle.com> writes:
>>
>>> Mark a REQ_NOWAIT request for a non-mq queue as unspported instead of
>>> retryable since otherwise the io_uring layer will keep resubmitting
>>> the request.
>>
>> Getting back to this...
>>
>> Jens, right now (using your io_uring-5.7 or linus' tree) fio's
>> t/io_uring will never get io completions when run against a file on a
>> file system that is backed by lvm.  The system will have one workqueue
>> per sqe submitted, all spinning, eating up CPU time.
>>
>> # ./t/io_uring /mnt/test/poo 
>> Added file /mnt/test/poo
>> sq_ring ptr = 0x0x7fbed40ae000
>> sqes ptr    = 0x0x7fbed40ac000
>> cq_ring ptr = 0x0x7fbed40aa000
>> polled=1, fixedbufs=1, buffered=0 QD=128, sq_ring=128, cq_ring=256
>> submitter=3851
>> IOPS=128, IOS/call=6/0, inflight=128 (128)
>> IOPS=0, IOS/call=0/0, inflight=128 (128)
>> IOPS=0, IOS/call=0/0, inflight=128 (128)
>> IOPS=0, IOS/call=0/0, inflight=128 (128)
>> IOPS=0, IOS/call=0/0, inflight=128 (128)
>> IOPS=0, IOS/call=0/0, inflight=128 (128)
>> ...
>>
>> # ps auxw | grep io_wqe
>> root      3849 80.1  0.0      0     0 ?        R    14:32   0:40 [io_wqe_worker-0]
>> root      3850  0.0  0.0      0     0 ?        S    14:32   0:00 [io_wqe_worker-0]
>> root      3853 72.8  0.0      0     0 ?        R    14:32   0:36 [io_wqe_worker-0]
>> root      3854 81.4  0.0      0     0 ?        R    14:32   0:40 [io_wqe_worker-1]
>> root      3855 74.8  0.0      0     0 ?        R    14:32   0:37 [io_wqe_worker-0]
>> root      3856 74.8  0.0      0     0 ?        R    14:32   0:37 [io_wqe_worker-1]
>> ...
>>
>> # ps auxw | grep io_wqe | grep -v grep | wc -l
>> 129
>>
>> With this patch applied, the test program will exit without doing I/O
>> (which I don't think is the right behavior either, right?):
>>
>> # t/io_uring /mnt/test/poo
>> Added file /mnt/test/poo
>> sq_ring ptr = 0x0x7fdb98f00000
>> sqes ptr    = 0x0x7fdb98efe000
>> cq_ring ptr = 0x0x7fdb98efc000
>> polled=1, fixedbufs=1, buffered=0 QD=128, sq_ring=128, cq_ring=256
>> submitter=33233
>> io: unexpected ret=-95
>> Your filesystem/driver/kernel doesn't support polled IO
>> IOPS=128, IOS/call=32/0, inflight=128 (127)
>>
>> /mnt/test is an xfs file system on top of a linear LVM volume on an nvme
>> device (with 8 poll queues configured).
> 
> poll won't work over dm, so that looks correct. What happens if you edit
> it and disable poll? Would be curious to see both buffered = 0 and
> buffered = 1 runs with that.
> 
> I'll try this here too.

I checked, and with the offending commit reverted, it behaves exactly
like it should - io_uring doesn't hit endless retries, and we still
return -EAGAIN to userspace for preadv2(..., RFW_NOWAIT) if not supported.
I've queued up the revert.

Jeff, the poll test above is supposed to fail as we can't poll on dm.
So that part is expected.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported
  2020-05-28 19:22       ` Jens Axboe
@ 2020-05-28 19:31         ` Jeff Moyer
  2020-05-28 22:12         ` Jeff Moyer
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff Moyer @ 2020-05-28 19:31 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bijan Mottahedeh, io-uring

Jens Axboe <axboe@kernel.dk> writes:

> I checked, and with the offending commit reverted, it behaves exactly
> like it should - io_uring doesn't hit endless retries, and we still
> return -EAGAIN to userspace for preadv2(..., RFW_NOWAIT) if not supported.
> I've queued up the revert.
>
> Jeff, the poll test above is supposed to fail as we can't poll on dm.
> So that part is expected.

OK, works for me.  Thanks for the quick turnaround.

-Jeff


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported
  2020-05-28 19:22       ` Jens Axboe
  2020-05-28 19:31         ` Jeff Moyer
@ 2020-05-28 22:12         ` Jeff Moyer
  2020-05-28 23:03           ` Jens Axboe
  1 sibling, 1 reply; 16+ messages in thread
From: Jeff Moyer @ 2020-05-28 22:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bijan Mottahedeh, io-uring

Jens Axboe <axboe@kernel.dk> writes:

>> poll won't work over dm, so that looks correct. What happens if you edit
>> it and disable poll? Would be curious to see both buffered = 0 and
>> buffered = 1 runs with that.
>> 
>> I'll try this here too.
>
> I checked, and with the offending commit reverted, it behaves exactly
> like it should - io_uring doesn't hit endless retries, and we still
> return -EAGAIN to userspace for preadv2(..., RFW_NOWAIT) if not supported.
> I've queued up the revert.

With that revert, I now see an issue with an xfs file system on top of
an nvme device when running the liburing test suite:

Running test 500f9fbadef8-test
Test 500f9fbadef8-test failed with ret 130

That means the test harness timed out, so we never received a
completion.

-Jeff


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported
  2020-05-28 22:12         ` Jeff Moyer
@ 2020-05-28 23:03           ` Jens Axboe
  2020-05-29 15:02             ` Jeff Moyer
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2020-05-28 23:03 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Bijan Mottahedeh, io-uring

On 5/28/20 4:12 PM, Jeff Moyer wrote:
> Jens Axboe <axboe@kernel.dk> writes:
> 
>>> poll won't work over dm, so that looks correct. What happens if you edit
>>> it and disable poll? Would be curious to see both buffered = 0 and
>>> buffered = 1 runs with that.
>>>
>>> I'll try this here too.
>>
>> I checked, and with the offending commit reverted, it behaves exactly
>> like it should - io_uring doesn't hit endless retries, and we still
>> return -EAGAIN to userspace for preadv2(..., RFW_NOWAIT) if not supported.
>> I've queued up the revert.
> 
> With that revert, I now see an issue with an xfs file system on top of
> an nvme device when running the liburing test suite:
> 
> Running test 500f9fbadef8-test
> Test 500f9fbadef8-test failed with ret 130
> 
> That means the test harness timed out, so we never received a
> completion.

I can't reproduce this. Can you try again, and enable io_uring tracing?

# echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable

run test

send the 'trace' file, or take a look and see what is going on.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported
  2020-05-28 23:03           ` Jens Axboe
@ 2020-05-29 15:02             ` Jeff Moyer
  2020-05-29 16:20               ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Moyer @ 2020-05-29 15:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bijan Mottahedeh, io-uring

Jens Axboe <axboe@kernel.dk> writes:

> On 5/28/20 4:12 PM, Jeff Moyer wrote:
>> Jens Axboe <axboe@kernel.dk> writes:
>> 
>>>> poll won't work over dm, so that looks correct. What happens if you edit
>>>> it and disable poll? Would be curious to see both buffered = 0 and
>>>> buffered = 1 runs with that.
>>>>
>>>> I'll try this here too.
>>>
>>> I checked, and with the offending commit reverted, it behaves exactly
>>> like it should - io_uring doesn't hit endless retries, and we still
>>> return -EAGAIN to userspace for preadv2(..., RFW_NOWAIT) if not supported.
>>> I've queued up the revert.
>> 
>> With that revert, I now see an issue with an xfs file system on top of
>> an nvme device when running the liburing test suite:
>> 
>> Running test 500f9fbadef8-test
>> Test 500f9fbadef8-test failed with ret 130
>> 
>> That means the test harness timed out, so we never received a
>> completion.
>
> I can't reproduce this. Can you try again, and enable io_uring tracing?
>
> # echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable
>
> run test
>
> send the 'trace' file, or take a look and see what is going on.

I took a look, and it appeared as though the issue was not in the
kernel.  My liburing was not uptodate, and after grabbing the latest,
the test runs to completion.

Thanks!
Jeff


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported
  2020-05-29 15:02             ` Jeff Moyer
@ 2020-05-29 16:20               ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2020-05-29 16:20 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Bijan Mottahedeh, io-uring

On 5/29/20 9:02 AM, Jeff Moyer wrote:
> Jens Axboe <axboe@kernel.dk> writes:
> 
>> On 5/28/20 4:12 PM, Jeff Moyer wrote:
>>> Jens Axboe <axboe@kernel.dk> writes:
>>>
>>>>> poll won't work over dm, so that looks correct. What happens if you edit
>>>>> it and disable poll? Would be curious to see both buffered = 0 and
>>>>> buffered = 1 runs with that.
>>>>>
>>>>> I'll try this here too.
>>>>
>>>> I checked, and with the offending commit reverted, it behaves exactly
>>>> like it should - io_uring doesn't hit endless retries, and we still
>>>> return -EAGAIN to userspace for preadv2(..., RFW_NOWAIT) if not supported.
>>>> I've queued up the revert.
>>>
>>> With that revert, I now see an issue with an xfs file system on top of
>>> an nvme device when running the liburing test suite:
>>>
>>> Running test 500f9fbadef8-test
>>> Test 500f9fbadef8-test failed with ret 130
>>>
>>> That means the test harness timed out, so we never received a
>>> completion.
>>
>> I can't reproduce this. Can you try again, and enable io_uring tracing?
>>
>> # echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable
>>
>> run test
>>
>> send the 'trace' file, or take a look and see what is going on.
> 
> I took a look, and it appeared as though the issue was not in the
> kernel.  My liburing was not uptodate, and after grabbing the latest,
> the test runs to completion.

OK good, that's a relief!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-05-29 16:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-19 21:52 [RFC 0/2] io_uring: don't use kiocb.private to store buf_index Bijan Mottahedeh
2020-05-19 21:52 ` [RFC 1/2] " Bijan Mottahedeh
2020-05-19 22:07   ` Jens Axboe
2020-05-19 22:20     ` Jens Axboe
2020-05-19 22:48       ` Bijan Mottahedeh
2020-05-19 21:52 ` [RFC 2/2] io_uring: mark REQ_NOWAIT for a non-mq queue as unspported Bijan Mottahedeh
2020-05-19 22:16   ` Fwd: " Jens Axboe
2020-05-20 17:50     ` Christoph Hellwig
2020-05-28 18:35   ` Jeff Moyer
2020-05-28 19:01     ` Jens Axboe
2020-05-28 19:22       ` Jens Axboe
2020-05-28 19:31         ` Jeff Moyer
2020-05-28 22:12         ` Jeff Moyer
2020-05-28 23:03           ` Jens Axboe
2020-05-29 15:02             ` Jeff Moyer
2020-05-29 16:20               ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.