All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 5.13 v2] io_uring: maintain drain requests' logic
@ 2021-04-07 11:23 Hao Xu
  2021-04-07 11:23 ` [PATCH 1/3] io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests Hao Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Hao Xu @ 2021-04-07 11:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

more tests comming, send this out first for comments.

Hao Xu (3):
  io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests
  io_uring: maintain drain logic for multishot requests
  io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD

 fs/io_uring.c                 | 34 +++++++++++++++++++++++++++++-----
 include/uapi/linux/io_uring.h |  8 +++-----
 2 files changed, 32 insertions(+), 10 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests
  2021-04-07 11:23 [PATCH 5.13 v2] io_uring: maintain drain requests' logic Hao Xu
@ 2021-04-07 11:23 ` Hao Xu
  2021-04-07 11:38   ` Pavel Begunkov
  2021-04-07 11:23 ` [PATCH 2/3] io_uring: maintain drain logic " Hao Xu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Hao Xu @ 2021-04-07 11:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

Since we now have requests that may generate multiple cqes, we need a
new flag to mark them, so that we can maintain features like drain io
easily for them.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---
 fs/io_uring.c                 | 5 ++++-
 include/uapi/linux/io_uring.h | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 81e5d156af1c..192463bb977a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -102,7 +102,7 @@
 
 #define SQE_VALID_FLAGS	(IOSQE_FIXED_FILE|IOSQE_IO_DRAIN|IOSQE_IO_LINK|	\
 				IOSQE_IO_HARDLINK | IOSQE_ASYNC | \
-				IOSQE_BUFFER_SELECT)
+				IOSQE_BUFFER_SELECT | IOSQE_MULTI_CQES)
 
 struct io_uring {
 	u32 head ____cacheline_aligned_in_smp;
@@ -700,6 +700,7 @@ enum {
 	REQ_F_HARDLINK_BIT	= IOSQE_IO_HARDLINK_BIT,
 	REQ_F_FORCE_ASYNC_BIT	= IOSQE_ASYNC_BIT,
 	REQ_F_BUFFER_SELECT_BIT	= IOSQE_BUFFER_SELECT_BIT,
+	REQ_F_MULTI_CQES_BIT	= IOSQE_MULTI_CQES_BIT,
 
 	REQ_F_FAIL_LINK_BIT,
 	REQ_F_INFLIGHT_BIT,
@@ -766,6 +767,8 @@ enum {
 	REQ_F_ASYNC_WRITE	= BIT(REQ_F_ASYNC_WRITE_BIT),
 	/* regular file */
 	REQ_F_ISREG		= BIT(REQ_F_ISREG_BIT),
+	/* a request can generate multiple cqes */
+	REQ_F_MULTI_CQES	= BIT(REQ_F_MULTI_CQES_BIT),
 };
 
 struct async_poll {
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 5beaa6bbc6db..303ac8005572 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -70,6 +70,7 @@ enum {
 	IOSQE_IO_HARDLINK_BIT,
 	IOSQE_ASYNC_BIT,
 	IOSQE_BUFFER_SELECT_BIT,
+	IOSQE_MULTI_CQES_BIT,
 };
 
 /*
@@ -87,6 +88,8 @@ enum {
 #define IOSQE_ASYNC		(1U << IOSQE_ASYNC_BIT)
 /* select buffer from sqe->buf_group */
 #define IOSQE_BUFFER_SELECT	(1U << IOSQE_BUFFER_SELECT_BIT)
+/* may generate multiple cqes */
+#define IOSQE_MULTI_CQES	(1U << IOSQE_MULTI_CQES_BIT)
 
 /*
  * io_uring_setup() flags
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] io_uring: maintain drain logic for multishot requests
  2021-04-07 11:23 [PATCH 5.13 v2] io_uring: maintain drain requests' logic Hao Xu
  2021-04-07 11:23 ` [PATCH 1/3] io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests Hao Xu
@ 2021-04-07 11:23 ` Hao Xu
  2021-04-07 11:41   ` Pavel Begunkov
  2021-04-07 11:23 ` [PATCH 3/3] io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD Hao Xu
  2021-04-07 15:49 ` [PATCH 5.13 v2] io_uring: maintain drain requests' logic Jens Axboe
  3 siblings, 1 reply; 18+ messages in thread
From: Hao Xu @ 2021-04-07 11:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

Now that we have multishot poll requests, one sqe can emit multiple
cqes. given below example:
    sqe0(multishot poll)-->sqe1-->sqe2(drain req)
sqe2 is designed to issue after sqe0 and sqe1 completed, but since sqe0
is a multishot poll request, sqe2 may be issued after sqe0's event
triggered twice before sqe1 completed. This isn't what users leverage
drain requests for.
Here a simple solution is to ignore all multishot poll cqes, which means
drain requests won't wait those request to be done.
To achieve this, we should reconsider the req_need_defer equation, the
original one is:

    all_sqes(excluding dropped ones) == all_cqes(including dropped ones)

this means we issue a drain request when all the previous submitted
sqes have generated their cqes.
Now we should ignore multishot requests, so:
    all_sqes - multishot_sqes == all_cqes - multishot_cqes ==>
    all_sqes + multishot_cqes - multishot_cqes == all_cqes

Thus we have to track the submittion of a multishot request and the cqes
generation of it, including the ECANCELLED cqes. Here we introduce
cq_extra = multishot_cqes - multishot_cqes for it.

There are other solutions like:
  - just track multishot (non-ECNCELLED)cqes, don't track multishot sqes.
      this way we include multishot sqes in the left end of the equation
      this means we have to see multishot sqes as normal ones, then we
      have to keep right one cqe for each multishot sqe. It's hard to do
      this since there may be some multishot sqes which triggered
      several events and then was cancelled, meanwhile other multishot
      sqes just triggered events but wasn't cancelled. We still need to
      track number of multishot sqes that haven't been cancelled, which
      make things complicated

For implementations, just do the submittion tracking in
io_submit_sqe() --> io_init_req() to make things simple. Otherwise if
we do it in per opcode issue place, then we need to carefully consider
each caller of io_req_complete_failed() because trick cases like cancel
multishot reqs in link.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---
 fs/io_uring.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 192463bb977a..a7bd223ce2cc 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -423,6 +423,7 @@ struct io_ring_ctx {
 		unsigned		cq_mask;
 		atomic_t		cq_timeouts;
 		unsigned		cq_last_tm_flush;
+		unsigned		cq_extra;
 		unsigned long		cq_check_overflow;
 		struct wait_queue_head	cq_wait;
 		struct fasync_struct	*cq_fasync;
@@ -879,6 +880,8 @@ struct io_op_def {
 	unsigned		needs_async_setup : 1;
 	/* should block plug */
 	unsigned		plug : 1;
+	/* set if opcode may generate multiple cqes */
+	unsigned		multi_cqes : 1;
 	/* size of async data needed, if any */
 	unsigned short		async_size;
 };
@@ -924,6 +927,7 @@ struct io_op_def {
 	[IORING_OP_POLL_ADD] = {
 		.needs_file		= 1,
 		.unbound_nonreg_file	= 1,
+		.multi_cqes		= 1,
 	},
 	[IORING_OP_POLL_REMOVE] = {},
 	[IORING_OP_SYNC_FILE_RANGE] = {
@@ -1186,7 +1190,7 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
 	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
 		struct io_ring_ctx *ctx = req->ctx;
 
-		return seq != ctx->cached_cq_tail
+		return seq + ctx->cq_extra != ctx->cached_cq_tail
 				+ READ_ONCE(ctx->cached_cq_overflow);
 	}
 
@@ -1516,6 +1520,9 @@ static bool __io_cqring_fill_event(struct io_kiocb *req, long res,
 
 	trace_io_uring_complete(ctx, req->user_data, res, cflags);
 
+	if (req->flags & REQ_F_MULTI_CQES)
+		req->ctx->cq_extra++;
+
 	/*
 	 * If we can't get a cq entry, userspace overflowed the
 	 * submission (by quite a lot). Increment the overflow count in
@@ -6504,6 +6511,13 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	req->result = 0;
 	req->work.creds = NULL;
 
+	if (sqe_flags & IOSQE_MULTI_CQES) {
+		ctx->cq_extra--;
+		if (!io_op_defs[req->opcode].multi_cqes) {
+			return -EOPNOTSUPP;
+		}
+	}
+
 	/* enforce forwards compatibility on users */
 	if (unlikely(sqe_flags & ~SQE_VALID_FLAGS)) {
 		req->flags = 0;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD
  2021-04-07 11:23 [PATCH 5.13 v2] io_uring: maintain drain requests' logic Hao Xu
  2021-04-07 11:23 ` [PATCH 1/3] io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests Hao Xu
  2021-04-07 11:23 ` [PATCH 2/3] io_uring: maintain drain logic " Hao Xu
@ 2021-04-07 11:23 ` Hao Xu
  2021-04-07 15:49 ` [PATCH 5.13 v2] io_uring: maintain drain requests' logic Jens Axboe
  3 siblings, 0 replies; 18+ messages in thread
From: Hao Xu @ 2021-04-07 11:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

leverage REQ_F_MULTI_CQES to suppoort IORING_OP_ADD multishot.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---
 fs/io_uring.c                 | 13 ++++++++++---
 include/uapi/linux/io_uring.h |  5 -----
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a7bd223ce2cc..952ad0ddb2db 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5361,20 +5361,27 @@ static int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
 {
 	struct io_poll_iocb *poll = &req->poll;
 	u32 events, flags;
+	bool multishot = req->flags & REQ_F_MULTI_CQES;
+	u32 update_bits = IORING_POLL_UPDATE_EVENTS |
+		IORING_POLL_UPDATE_USER_DATA;
 
 	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
 		return -EINVAL;
 	if (sqe->ioprio || sqe->buf_index)
 		return -EINVAL;
 	flags = READ_ONCE(sqe->len);
-	if (flags & ~(IORING_POLL_ADD_MULTI | IORING_POLL_UPDATE_EVENTS |
-			IORING_POLL_UPDATE_USER_DATA))
+	/*
+	 * can't set REQ_F_MULTI_CQES with UPDATE flags, otherwise we count
+	 * IO_POLL_ADD(IORING_POLL_UPDATE_*)'s cqe to ctx->cq_extra, which
+	 * we shouldn't
+	 */
+	if ((flags & ~update_bits) || (multishot && (flags & update_bits)))
 		return -EINVAL;
 	events = READ_ONCE(sqe->poll32_events);
 #ifdef __BIG_ENDIAN
 	events = swahw32(events);
 #endif
-	if (!(flags & IORING_POLL_ADD_MULTI))
+	if (!multishot)
 		events |= EPOLLONESHOT;
 	poll->update_events = poll->update_user_data = false;
 	if (flags & IORING_POLL_UPDATE_EVENTS) {
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 303ac8005572..a3cd943b228e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -166,14 +166,9 @@ enum {
  * POLL_ADD flags. Note that since sqe->poll_events is the flag space, the
  * command flags for POLL_ADD are stored in sqe->len.
  *
- * IORING_POLL_ADD_MULTI	Multishot poll. Sets IORING_CQE_F_MORE if
- *				the poll handler will continue to report
- *				CQEs on behalf of the same SQE.
- *
  * IORING_POLL_UPDATE		Update existing poll request, matching
  *				sqe->addr as the old user_data field.
  */
-#define IORING_POLL_ADD_MULTI	(1U << 0)
 #define IORING_POLL_UPDATE_EVENTS	(1U << 1)
 #define IORING_POLL_UPDATE_USER_DATA	(1U << 2)
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests
  2021-04-07 11:23 ` [PATCH 1/3] io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests Hao Xu
@ 2021-04-07 11:38   ` Pavel Begunkov
  0 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2021-04-07 11:38 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 07/04/2021 12:23, Hao Xu wrote:
> Since we now have requests that may generate multiple cqes, we need a
> new flag to mark them, so that we can maintain features like drain io
> easily for them.
> 
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
>  fs/io_uring.c                 | 5 ++++-
>  include/uapi/linux/io_uring.h | 3 +++
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 81e5d156af1c..192463bb977a 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -102,7 +102,7 @@
>  
>  #define SQE_VALID_FLAGS	(IOSQE_FIXED_FILE|IOSQE_IO_DRAIN|IOSQE_IO_LINK|	\
>  				IOSQE_IO_HARDLINK | IOSQE_ASYNC | \
> -				IOSQE_BUFFER_SELECT)
> +				IOSQE_BUFFER_SELECT | IOSQE_MULTI_CQES)
>  
>  struct io_uring {
>  	u32 head ____cacheline_aligned_in_smp;
> @@ -700,6 +700,7 @@ enum {
>  	REQ_F_HARDLINK_BIT	= IOSQE_IO_HARDLINK_BIT,
>  	REQ_F_FORCE_ASYNC_BIT	= IOSQE_ASYNC_BIT,
>  	REQ_F_BUFFER_SELECT_BIT	= IOSQE_BUFFER_SELECT_BIT,
> +	REQ_F_MULTI_CQES_BIT	= IOSQE_MULTI_CQES_BIT,
>  
>  	REQ_F_FAIL_LINK_BIT,
>  	REQ_F_INFLIGHT_BIT,
> @@ -766,6 +767,8 @@ enum {
>  	REQ_F_ASYNC_WRITE	= BIT(REQ_F_ASYNC_WRITE_BIT),
>  	/* regular file */
>  	REQ_F_ISREG		= BIT(REQ_F_ISREG_BIT),
> +	/* a request can generate multiple cqes */
> +	REQ_F_MULTI_CQES	= BIT(REQ_F_MULTI_CQES_BIT),
>  };
>  
>  struct async_poll {
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 5beaa6bbc6db..303ac8005572 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -70,6 +70,7 @@ enum {
>  	IOSQE_IO_HARDLINK_BIT,
>  	IOSQE_ASYNC_BIT,
>  	IOSQE_BUFFER_SELECT_BIT,
> +	IOSQE_MULTI_CQES_BIT,
>  };

It's user API flags, I doubt you want to touch them.
1) there are not much of bits left
2) it almost always means you add overhead in generic path,
even for those who don't care about multishots or drains.

Let's keep out of it

>  
>  /*
> @@ -87,6 +88,8 @@ enum {
>  #define IOSQE_ASYNC		(1U << IOSQE_ASYNC_BIT)
>  /* select buffer from sqe->buf_group */
>  #define IOSQE_BUFFER_SELECT	(1U << IOSQE_BUFFER_SELECT_BIT)
> +/* may generate multiple cqes */
> +#define IOSQE_MULTI_CQES	(1U << IOSQE_MULTI_CQES_BIT)
>  
>  /*
>   * io_uring_setup() flags
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] io_uring: maintain drain logic for multishot requests
  2021-04-07 11:23 ` [PATCH 2/3] io_uring: maintain drain logic " Hao Xu
@ 2021-04-07 11:41   ` Pavel Begunkov
  0 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2021-04-07 11:41 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 07/04/2021 12:23, Hao Xu wrote:
> Now that we have multishot poll requests, one sqe can emit multiple
> cqes. given below example:
>     sqe0(multishot poll)-->sqe1-->sqe2(drain req)
> sqe2 is designed to issue after sqe0 and sqe1 completed, but since sqe0
> is a multishot poll request, sqe2 may be issued after sqe0's event
> triggered twice before sqe1 completed. This isn't what users leverage
> drain requests for.
> Here a simple solution is to ignore all multishot poll cqes, which means
> drain requests won't wait those request to be done.
> To achieve this, we should reconsider the req_need_defer equation, the
> original one is:
> 
>     all_sqes(excluding dropped ones) == all_cqes(including dropped ones)
> 
> this means we issue a drain request when all the previous submitted
> sqes have generated their cqes.
> Now we should ignore multishot requests, so:
>     all_sqes - multishot_sqes == all_cqes - multishot_cqes ==>
>     all_sqes + multishot_cqes - multishot_cqes == all_cqes
> 
> Thus we have to track the submittion of a multishot request and the cqes
> generation of it, including the ECANCELLED cqes. Here we introduce
> cq_extra = multishot_cqes - multishot_cqes for it.
> 
> There are other solutions like:
>   - just track multishot (non-ECNCELLED)cqes, don't track multishot sqes.
>       this way we include multishot sqes in the left end of the equation
>       this means we have to see multishot sqes as normal ones, then we
>       have to keep right one cqe for each multishot sqe. It's hard to do
>       this since there may be some multishot sqes which triggered
>       several events and then was cancelled, meanwhile other multishot
>       sqes just triggered events but wasn't cancelled. We still need to
>       track number of multishot sqes that haven't been cancelled, which
>       make things complicated
> 
> For implementations, just do the submittion tracking in
> io_submit_sqe() --> io_init_req() to make things simple. Otherwise if
> we do it in per opcode issue place, then we need to carefully consider
> each caller of io_req_complete_failed() because trick cases like cancel
> multishot reqs in link.
> 
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
>  fs/io_uring.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 192463bb977a..a7bd223ce2cc 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -423,6 +423,7 @@ struct io_ring_ctx {
>  		unsigned		cq_mask;
>  		atomic_t		cq_timeouts;
>  		unsigned		cq_last_tm_flush;
> +		unsigned		cq_extra;
>  		unsigned long		cq_check_overflow;
>  		struct wait_queue_head	cq_wait;
>  		struct fasync_struct	*cq_fasync;
> @@ -879,6 +880,8 @@ struct io_op_def {
>  	unsigned		needs_async_setup : 1;
>  	/* should block plug */
>  	unsigned		plug : 1;
> +	/* set if opcode may generate multiple cqes */
> +	unsigned		multi_cqes : 1;
>  	/* size of async data needed, if any */
>  	unsigned short		async_size;
>  };
> @@ -924,6 +927,7 @@ struct io_op_def {
>  	[IORING_OP_POLL_ADD] = {
>  		.needs_file		= 1,
>  		.unbound_nonreg_file	= 1,
> +		.multi_cqes		= 1,
>  	},
>  	[IORING_OP_POLL_REMOVE] = {},
>  	[IORING_OP_SYNC_FILE_RANGE] = {
> @@ -1186,7 +1190,7 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
>  	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
>  		struct io_ring_ctx *ctx = req->ctx;
>  
> -		return seq != ctx->cached_cq_tail
> +		return seq + ctx->cq_extra != ctx->cached_cq_tail
>  				+ READ_ONCE(ctx->cached_cq_overflow);
>  	}
>  
> @@ -1516,6 +1520,9 @@ static bool __io_cqring_fill_event(struct io_kiocb *req, long res,
>  
>  	trace_io_uring_complete(ctx, req->user_data, res, cflags);
>  
> +	if (req->flags & REQ_F_MULTI_CQES)
> +		req->ctx->cq_extra++;
> +


Here we go, additional overhead burdening everyone but used for
a little new feature. All that can be done in poll or in *_prep()
on opcode by opcode basis.

>  	/*
>  	 * If we can't get a cq entry, userspace overflowed the
>  	 * submission (by quite a lot). Increment the overflow count in
> @@ -6504,6 +6511,13 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
>  	req->result = 0;
>  	req->work.creds = NULL;
>  
> +	if (sqe_flags & IOSQE_MULTI_CQES) {
> +		ctx->cq_extra--;
> +		if (!io_op_defs[req->opcode].multi_cqes) {
> +			return -EOPNOTSUPP;
> +		}
> +	}
> +

see above

>  	/* enforce forwards compatibility on users */
>  	if (unlikely(sqe_flags & ~SQE_VALID_FLAGS)) {
>  		req->flags = 0;
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5.13 v2] io_uring: maintain drain requests' logic
  2021-04-07 11:23 [PATCH 5.13 v2] io_uring: maintain drain requests' logic Hao Xu
                   ` (2 preceding siblings ...)
  2021-04-07 11:23 ` [PATCH 3/3] io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD Hao Xu
@ 2021-04-07 15:49 ` Jens Axboe
  2021-04-08 10:16   ` Hao Xu
  3 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2021-04-07 15:49 UTC (permalink / raw)
  To: Hao Xu; +Cc: io-uring, Pavel Begunkov, Joseph Qi

On 4/7/21 5:23 AM, Hao Xu wrote:
> more tests comming, send this out first for comments.
> 
> Hao Xu (3):
>   io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests
>   io_uring: maintain drain logic for multishot requests
>   io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD
> 
>  fs/io_uring.c                 | 34 +++++++++++++++++++++++++++++-----
>  include/uapi/linux/io_uring.h |  8 +++-----
>  2 files changed, 32 insertions(+), 10 deletions(-)

Let's do the simple cq_extra first. I don't see a huge need to add an
IOSQE flag for this, probably best to just keep this on a per opcode
basis for now, which also then limits the code path to just touching
poll for now, as nothing else supports multishot CQEs at this point.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5.13 v2] io_uring: maintain drain requests' logic
  2021-04-07 15:49 ` [PATCH 5.13 v2] io_uring: maintain drain requests' logic Jens Axboe
@ 2021-04-08 10:16   ` Hao Xu
  2021-04-08 11:43     ` Hao Xu
  0 siblings, 1 reply; 18+ messages in thread
From: Hao Xu @ 2021-04-08 10:16 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

在 2021/4/7 下午11:49, Jens Axboe 写道:
> On 4/7/21 5:23 AM, Hao Xu wrote:
>> more tests comming, send this out first for comments.
>>
>> Hao Xu (3):
>>    io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests
>>    io_uring: maintain drain logic for multishot requests
>>    io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD
>>
>>   fs/io_uring.c                 | 34 +++++++++++++++++++++++++++++-----
>>   include/uapi/linux/io_uring.h |  8 +++-----
>>   2 files changed, 32 insertions(+), 10 deletions(-)
> 
> Let's do the simple cq_extra first. I don't see a huge need to add an
> IOSQE flag for this, probably best to just keep this on a per opcode
> basis for now, which also then limits the code path to just touching
> poll for now, as nothing else supports multishot CQEs at this point.
> 
gotcha.
a small issue here:
   sqe-->sqe(link)-->sqe(link)-->sqe(link, multishot)-->sqe(drain)

in the above case, assume the first 3 single-shot reqs have completed.
then I think the drian request won't be issued now unless the multishot 
request in the linkchain has been issued. The trick is: a multishot req
in a linkchain consumes cached_sq_head when io_get_sqe(), which means it
is counted in seq, but we will deduct the sqe when it is issued if we
want to do the job per opcode not in the main code path.
before the multishot req issued:
       all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
after the multishot req issued:
       all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5.13 v2] io_uring: maintain drain requests' logic
  2021-04-08 10:16   ` Hao Xu
@ 2021-04-08 11:43     ` Hao Xu
  2021-04-08 12:22       ` Pavel Begunkov
  0 siblings, 1 reply; 18+ messages in thread
From: Hao Xu @ 2021-04-08 11:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

在 2021/4/8 下午6:16, Hao Xu 写道:
> 在 2021/4/7 下午11:49, Jens Axboe 写道:
>> On 4/7/21 5:23 AM, Hao Xu wrote:
>>> more tests comming, send this out first for comments.
>>>
>>> Hao Xu (3):
>>>    io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot 
>>> requests
>>>    io_uring: maintain drain logic for multishot requests
>>>    io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD
>>>
>>>   fs/io_uring.c                 | 34 +++++++++++++++++++++++++++++-----
>>>   include/uapi/linux/io_uring.h |  8 +++-----
>>>   2 files changed, 32 insertions(+), 10 deletions(-)
>>
>> Let's do the simple cq_extra first. I don't see a huge need to add an
>> IOSQE flag for this, probably best to just keep this on a per opcode
>> basis for now, which also then limits the code path to just touching
>> poll for now, as nothing else supports multishot CQEs at this point.
>>
> gotcha.
> a small issue here:
>   sqe-->sqe(link)-->sqe(link)-->sqe(link, multishot)-->sqe(drain)
> 
> in the above case, assume the first 3 single-shot reqs have completed.
> then I think the drian request won't be issued now unless the multishot 
> request in the linkchain has been issued. The trick is: a multishot req
> in a linkchain consumes cached_sq_head when io_get_sqe(), which means it
> is counted in seq, but we will deduct the sqe when it is issued if we
> want to do the job per opcode not in the main code path.
> before the multishot req issued:
>       all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
> after the multishot req issued:
>       all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)

Sorry, my statement is wrong. It's not "won't be issued now unless the
multishot request in the linkchain has been issued". Actually I now
think the drain req won't be issued unless the multishot request in the
linkchain has completed. Because we may first check req_need_defer()
then issue(req->link), so:
    sqe0-->sqe1(link)-->sqe2(link)-->sqe3(link, multishot)-->sqe4(drain)

   sqe2 is completed:
     call req_need_defer:
     all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
   sqe3 is issued:
     all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
   sqe3 is completed:
     call req_need_defer:
     all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)

sqe4 shouldn't wait sqe3.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5.13 v2] io_uring: maintain drain requests' logic
  2021-04-08 11:43     ` Hao Xu
@ 2021-04-08 12:22       ` Pavel Begunkov
  2021-04-08 16:18         ` Jens Axboe
  2021-04-09  3:12         ` Hao Xu
  0 siblings, 2 replies; 18+ messages in thread
From: Pavel Begunkov @ 2021-04-08 12:22 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 08/04/2021 12:43, Hao Xu wrote:
> 在 2021/4/8 下午6:16, Hao Xu 写道:
>> 在 2021/4/7 下午11:49, Jens Axboe 写道:
>>> On 4/7/21 5:23 AM, Hao Xu wrote:
>>>> more tests comming, send this out first for comments.
>>>>
>>>> Hao Xu (3):
>>>>    io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests
>>>>    io_uring: maintain drain logic for multishot requests
>>>>    io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD
>>>>
>>>>   fs/io_uring.c                 | 34 +++++++++++++++++++++++++++++-----
>>>>   include/uapi/linux/io_uring.h |  8 +++-----
>>>>   2 files changed, 32 insertions(+), 10 deletions(-)
>>>
>>> Let's do the simple cq_extra first. I don't see a huge need to add an
>>> IOSQE flag for this, probably best to just keep this on a per opcode
>>> basis for now, which also then limits the code path to just touching
>>> poll for now, as nothing else supports multishot CQEs at this point.
>>>
>> gotcha.
>> a small issue here:
>>   sqe-->sqe(link)-->sqe(link)-->sqe(link, multishot)-->sqe(drain)
>>
>> in the above case, assume the first 3 single-shot reqs have completed.
>> then I think the drian request won't be issued now unless the multishot request in the linkchain has been issued. The trick is: a multishot req
>> in a linkchain consumes cached_sq_head when io_get_sqe(), which means it
>> is counted in seq, but we will deduct the sqe when it is issued if we
>> want to do the job per opcode not in the main code path.
>> before the multishot req issued:
>>       all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>> after the multishot req issued:
>>       all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
> 
> Sorry, my statement is wrong. It's not "won't be issued now unless the
> multishot request in the linkchain has been issued". Actually I now
> think the drain req won't be issued unless the multishot request in the
> linkchain has completed. Because we may first check req_need_defer()
> then issue(req->link), so:
>    sqe0-->sqe1(link)-->sqe2(link)-->sqe3(link, multishot)-->sqe4(drain)
> 
>   sqe2 is completed:
>     call req_need_defer:
>     all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>   sqe3 is issued:
>     all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>   sqe3 is completed:
>     call req_need_defer:
>     all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
> 
> sqe4 shouldn't wait sqe3.

Do you mean it wouldn't if the patch is applied? Because any drain
request must wait for all requests submitted before to complete. And
so before issuing sqe4 it must wait for sqe3 __request__ to die, and
so for all sqe3's CQEs.

previously 


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5.13 v2] io_uring: maintain drain requests' logic
  2021-04-08 12:22       ` Pavel Begunkov
@ 2021-04-08 16:18         ` Jens Axboe
  2021-04-09  6:15           ` Hao Xu
  2021-04-09  3:12         ` Hao Xu
  1 sibling, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2021-04-08 16:18 UTC (permalink / raw)
  To: Pavel Begunkov, Hao Xu; +Cc: io-uring, Joseph Qi

On 4/8/21 6:22 AM, Pavel Begunkov wrote:
> On 08/04/2021 12:43, Hao Xu wrote:
>> 在 2021/4/8 下午6:16, Hao Xu 写道:
>>> 在 2021/4/7 下午11:49, Jens Axboe 写道:
>>>> On 4/7/21 5:23 AM, Hao Xu wrote:
>>>>> more tests comming, send this out first for comments.
>>>>>
>>>>> Hao Xu (3):
>>>>>    io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests
>>>>>    io_uring: maintain drain logic for multishot requests
>>>>>    io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD
>>>>>
>>>>>   fs/io_uring.c                 | 34 +++++++++++++++++++++++++++++-----
>>>>>   include/uapi/linux/io_uring.h |  8 +++-----
>>>>>   2 files changed, 32 insertions(+), 10 deletions(-)
>>>>
>>>> Let's do the simple cq_extra first. I don't see a huge need to add an
>>>> IOSQE flag for this, probably best to just keep this on a per opcode
>>>> basis for now, which also then limits the code path to just touching
>>>> poll for now, as nothing else supports multishot CQEs at this point.
>>>>
>>> gotcha.
>>> a small issue here:
>>>   sqe-->sqe(link)-->sqe(link)-->sqe(link, multishot)-->sqe(drain)
>>>
>>> in the above case, assume the first 3 single-shot reqs have completed.
>>> then I think the drian request won't be issued now unless the multishot request in the linkchain has been issued. The trick is: a multishot req
>>> in a linkchain consumes cached_sq_head when io_get_sqe(), which means it
>>> is counted in seq, but we will deduct the sqe when it is issued if we
>>> want to do the job per opcode not in the main code path.
>>> before the multishot req issued:
>>>       all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>>> after the multishot req issued:
>>>       all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>
>> Sorry, my statement is wrong. It's not "won't be issued now unless the
>> multishot request in the linkchain has been issued". Actually I now
>> think the drain req won't be issued unless the multishot request in the
>> linkchain has completed. Because we may first check req_need_defer()
>> then issue(req->link), so:
>>    sqe0-->sqe1(link)-->sqe2(link)-->sqe3(link, multishot)-->sqe4(drain)
>>
>>   sqe2 is completed:
>>     call req_need_defer:
>>     all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>>   sqe3 is issued:
>>     all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>   sqe3 is completed:
>>     call req_need_defer:
>>     all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>
>> sqe4 shouldn't wait sqe3.
> 
> Do you mean it wouldn't if the patch is applied? Because any drain
> request must wait for all requests submitted before to complete. And
> so before issuing sqe4 it must wait for sqe3 __request__ to die, and
> so for all sqe3's CQEs.
> 
> previously 

I think we need to agree on what multishot means for dependencies. Does
it mean it just needs to trigger once? Or does it mean that it needs to
be totally finished. The latter may obviously never happen, depending on
the use case. Or it may be an expected condition because the caller will
cancel it at some point.

The most logical view imho is that multishot changes nothing wrt drain.
If you ask for drain before something executes and you are using
multishot, then you need to understand that the multishot request needs
to fully complete before that condition is true and your dependency can
execute.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5.13 v2] io_uring: maintain drain requests' logic
  2021-04-08 12:22       ` Pavel Begunkov
  2021-04-08 16:18         ` Jens Axboe
@ 2021-04-09  3:12         ` Hao Xu
  2021-04-09  3:43           ` Hao Xu
  1 sibling, 1 reply; 18+ messages in thread
From: Hao Xu @ 2021-04-09  3:12 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/4/8 下午8:22, Pavel Begunkov 写道:
> On 08/04/2021 12:43, Hao Xu wrote:
>> 在 2021/4/8 下午6:16, Hao Xu 写道:
>>> 在 2021/4/7 下午11:49, Jens Axboe 写道:
>>>> On 4/7/21 5:23 AM, Hao Xu wrote:
>>>>> more tests comming, send this out first for comments.
>>>>>
>>>>> Hao Xu (3):
>>>>>     io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests
>>>>>     io_uring: maintain drain logic for multishot requests
>>>>>     io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD
>>>>>
>>>>>    fs/io_uring.c                 | 34 +++++++++++++++++++++++++++++-----
>>>>>    include/uapi/linux/io_uring.h |  8 +++-----
>>>>>    2 files changed, 32 insertions(+), 10 deletions(-)
>>>>
>>>> Let's do the simple cq_extra first. I don't see a huge need to add an
>>>> IOSQE flag for this, probably best to just keep this on a per opcode
>>>> basis for now, which also then limits the code path to just touching
>>>> poll for now, as nothing else supports multishot CQEs at this point.
>>>>
>>> gotcha.
>>> a small issue here:
>>>    sqe-->sqe(link)-->sqe(link)-->sqe(link, multishot)-->sqe(drain)
>>>
>>> in the above case, assume the first 3 single-shot reqs have completed.
>>> then I think the drian request won't be issued now unless the multishot request in the linkchain has been issued. The trick is: a multishot req
>>> in a linkchain consumes cached_sq_head when io_get_sqe(), which means it
>>> is counted in seq, but we will deduct the sqe when it is issued if we
>>> want to do the job per opcode not in the main code path.
>>> before the multishot req issued:
>>>        all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>>> after the multishot req issued:
>>>        all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>
>> Sorry, my statement is wrong. It's not "won't be issued now unless the
>> multishot request in the linkchain has been issued". Actually I now
>> think the drain req won't be issued unless the multishot request in the
>> linkchain has completed. Because we may first check req_need_defer()
>> then issue(req->link), so:
>>     sqe0-->sqe1(link)-->sqe2(link)-->sqe3(link, multishot)-->sqe4(drain)
>>
>>    sqe2 is completed:
>>      call req_need_defer:
>>      all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>>    sqe3 is issued:
>>      all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>    sqe3 is completed:
>>      call req_need_defer:
>>      all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>
>> sqe4 shouldn't wait sqe3.
> 
> Do you mean it wouldn't if the patch is applied? Because any drain
> request must wait for all requests submitted before to complete. And
> so before issuing sqe4 it must wait for sqe3 __request__ to die, and
> so for all sqe3's CQEs.
> 
> previously
> 
Hi Pavel, the issue is what will happen after the patch being applied. 
The patch is to ignore all the multishot sqes and cqes. So by design,
sqe4 should wait for sqe0,1,2's completion, not sqe3's. But since we
implement it in per opcode place and don't touch the main code path, we
deduct a multishot sqe when issusing it(eg. call io_poll_add()).
So only when we issue sqe3, the equation is true:
     all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
But at this point, we already missed
io_commit_cqring()-->__io_queue_deferred(), the next time 
__io_queue_deferred() being called is when sqe3 completed, so now sqe4
has waited for sqe3, this is not by design.

Regards,
Hao


> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5.13 v2] io_uring: maintain drain requests' logic
  2021-04-09  3:12         ` Hao Xu
@ 2021-04-09  3:43           ` Hao Xu
  0 siblings, 0 replies; 18+ messages in thread
From: Hao Xu @ 2021-04-09  3:43 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/4/9 上午11:12, Hao Xu 写道:
> 在 2021/4/8 下午8:22, Pavel Begunkov 写道:
>> On 08/04/2021 12:43, Hao Xu wrote:
>>> 在 2021/4/8 下午6:16, Hao Xu 写道:
>>>> 在 2021/4/7 下午11:49, Jens Axboe 写道:
>>>>> On 4/7/21 5:23 AM, Hao Xu wrote:
>>>>>> more tests comming, send this out first for comments.
>>>>>>
>>>>>> Hao Xu (3):
>>>>>>     io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot 
>>>>>> requests
>>>>>>     io_uring: maintain drain logic for multishot requests
>>>>>>     io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD
>>>>>>
>>>>>>    fs/io_uring.c                 | 34 
>>>>>> +++++++++++++++++++++++++++++-----
>>>>>>    include/uapi/linux/io_uring.h |  8 +++-----
>>>>>>    2 files changed, 32 insertions(+), 10 deletions(-)
>>>>>
>>>>> Let's do the simple cq_extra first. I don't see a huge need to add an
>>>>> IOSQE flag for this, probably best to just keep this on a per opcode
>>>>> basis for now, which also then limits the code path to just touching
>>>>> poll for now, as nothing else supports multishot CQEs at this point.
>>>>>
>>>> gotcha.
>>>> a small issue here:
>>>>    sqe-->sqe(link)-->sqe(link)-->sqe(link, multishot)-->sqe(drain)
>>>>
>>>> in the above case, assume the first 3 single-shot reqs have completed.
>>>> then I think the drian request won't be issued now unless the 
>>>> multishot request in the linkchain has been issued. The trick is: a 
>>>> multishot req
>>>> in a linkchain consumes cached_sq_head when io_get_sqe(), which 
>>>> means it
>>>> is counted in seq, but we will deduct the sqe when it is issued if we
>>>> want to do the job per opcode not in the main code path.
>>>> before the multishot req issued:
>>>>        all_sqes(4) - multishot_sqes(0) == all_cqes(3) - 
>>>> multishot_cqes(0)
>>>> after the multishot req issued:
>>>>        all_sqes(4) - multishot_sqes(1) == all_cqes(3) - 
>>>> multishot_cqes(0)
>>>
>>> Sorry, my statement is wrong. It's not "won't be issued now unless the
>>> multishot request in the linkchain has been issued". Actually I now
>>> think the drain req won't be issued unless the multishot request in the
>>> linkchain has completed. Because we may first check req_need_defer()
>>> then issue(req->link), so:
>>>     sqe0-->sqe1(link)-->sqe2(link)-->sqe3(link, multishot)-->sqe4(drain)
>>>
>>>    sqe2 is completed:
>>>      call req_need_defer:
>>>      all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>>>    sqe3 is issued:
>>>      all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>>    sqe3 is completed:
>>>      call req_need_defer:
>>>      all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>>
>>> sqe4 shouldn't wait sqe3.
>>
>> Do you mean it wouldn't if the patch is applied? Because any drain
>> request must wait for all requests submitted before to complete. And
>> so before issuing sqe4 it must wait for sqe3 __request__ to die, and
>> so for all sqe3's CQEs.
>>
>> previously
>>
> Hi Pavel, the issue is what will happen after the patch being applied. 
> The patch is to ignore all the multishot sqes and cqes. So by design,
> sqe4 should wait for sqe0,1,2's completion, not sqe3's. But since we
> implement it in per opcode place and don't touch the main code path, we
> deduct a multishot sqe when issusing it(eg. call io_poll_add()).
> So only when we issue sqe3, the equation is true:
>     all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
> But at this point, we already missed
> io_commit_cqring()-->__io_queue_deferred(), the next time 
> __io_queue_deferred() being called is when sqe3 completed, so now sqe4
> has waited for sqe3, this is not by design.
> 
> Regards,
> Hao
> 
Moreover, I found when sqe3 completed, the equation is:
    all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(1)
thus sqe4 will never be issued.
> 
>>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5.13 v2] io_uring: maintain drain requests' logic
  2021-04-08 16:18         ` Jens Axboe
@ 2021-04-09  6:15           ` Hao Xu
  2021-04-09  7:05             ` Hao Xu
  0 siblings, 1 reply; 18+ messages in thread
From: Hao Xu @ 2021-04-09  6:15 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov; +Cc: io-uring, Joseph Qi

在 2021/4/9 上午12:18, Jens Axboe 写道:
> On 4/8/21 6:22 AM, Pavel Begunkov wrote:
>> On 08/04/2021 12:43, Hao Xu wrote:
>>> 在 2021/4/8 下午6:16, Hao Xu 写道:
>>>> 在 2021/4/7 下午11:49, Jens Axboe 写道:
>>>>> On 4/7/21 5:23 AM, Hao Xu wrote:
>>>>>> more tests comming, send this out first for comments.
>>>>>>
>>>>>> Hao Xu (3):
>>>>>>     io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests
>>>>>>     io_uring: maintain drain logic for multishot requests
>>>>>>     io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD
>>>>>>
>>>>>>    fs/io_uring.c                 | 34 +++++++++++++++++++++++++++++-----
>>>>>>    include/uapi/linux/io_uring.h |  8 +++-----
>>>>>>    2 files changed, 32 insertions(+), 10 deletions(-)
>>>>>
>>>>> Let's do the simple cq_extra first. I don't see a huge need to add an
>>>>> IOSQE flag for this, probably best to just keep this on a per opcode
>>>>> basis for now, which also then limits the code path to just touching
>>>>> poll for now, as nothing else supports multishot CQEs at this point.
>>>>>
>>>> gotcha.
>>>> a small issue here:
>>>>    sqe-->sqe(link)-->sqe(link)-->sqe(link, multishot)-->sqe(drain)
>>>>
>>>> in the above case, assume the first 3 single-shot reqs have completed.
>>>> then I think the drian request won't be issued now unless the multishot request in the linkchain has been issued. The trick is: a multishot req
>>>> in a linkchain consumes cached_sq_head when io_get_sqe(), which means it
>>>> is counted in seq, but we will deduct the sqe when it is issued if we
>>>> want to do the job per opcode not in the main code path.
>>>> before the multishot req issued:
>>>>        all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>>>> after the multishot req issued:
>>>>        all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>>
>>> Sorry, my statement is wrong. It's not "won't be issued now unless the
>>> multishot request in the linkchain has been issued". Actually I now
>>> think the drain req won't be issued unless the multishot request in the
>>> linkchain has completed. Because we may first check req_need_defer()
>>> then issue(req->link), so:
>>>     sqe0-->sqe1(link)-->sqe2(link)-->sqe3(link, multishot)-->sqe4(drain)
>>>
>>>    sqe2 is completed:
>>>      call req_need_defer:
>>>      all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>>>    sqe3 is issued:
>>>      all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>>    sqe3 is completed:
>>>      call req_need_defer:
>>>      all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>>
>>> sqe4 shouldn't wait sqe3.
>>
>> Do you mean it wouldn't if the patch is applied? Because any drain
>> request must wait for all requests submitted before to complete. And
>> so before issuing sqe4 it must wait for sqe3 __request__ to die, and
>> so for all sqe3's CQEs.
>>
>> previously
> 
> I think we need to agree on what multishot means for dependencies. Does
> it mean it just needs to trigger once? Or does it mean that it needs to
> be totally finished. The latter may obviously never happen, depending on
> the use case. Or it may be an expected condition because the caller will
> cancel it at some point.
> 
> The most logical view imho is that multishot changes nothing wrt drain.
> If you ask for drain before something executes and you are using
> multishot, then you need to understand that the multishot request needs
> to fully complete before that condition is true and your dependency can
> execute.
This makes sense, and the implementation would be quite simpler. but we
really need to document it somewhere so that users easily get to know
that they cannot put a drain req after some multishot reqs if they don't
want it to wait for them. Otherwise I worry about wrong use of it since
the meaning of 'put a drain req after some multishot reqs' isn't so
obvious:
     - does it waits for those multishot reqs to complete once
     - or does it waits for those ones to fully complete
     - or does it ignore those ones at all



> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5.13 v2] io_uring: maintain drain requests' logic
  2021-04-09  6:15           ` Hao Xu
@ 2021-04-09  7:05             ` Hao Xu
  2021-04-09  7:50               ` Pavel Begunkov
  0 siblings, 1 reply; 18+ messages in thread
From: Hao Xu @ 2021-04-09  7:05 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov; +Cc: io-uring, Joseph Qi

在 2021/4/9 下午2:15, Hao Xu 写道:
> 在 2021/4/9 上午12:18, Jens Axboe 写道:
>> On 4/8/21 6:22 AM, Pavel Begunkov wrote:
>>> On 08/04/2021 12:43, Hao Xu wrote:
>>>> 在 2021/4/8 下午6:16, Hao Xu 写道:
>>>>> 在 2021/4/7 下午11:49, Jens Axboe 写道:
>>>>>> On 4/7/21 5:23 AM, Hao Xu wrote:
>>>>>>> more tests comming, send this out first for comments.
>>>>>>>
>>>>>>> Hao Xu (3):
>>>>>>>     io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot 
>>>>>>> requests
>>>>>>>     io_uring: maintain drain logic for multishot requests
>>>>>>>     io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD
>>>>>>>
>>>>>>>    fs/io_uring.c                 | 34 
>>>>>>> +++++++++++++++++++++++++++++-----
>>>>>>>    include/uapi/linux/io_uring.h |  8 +++-----
>>>>>>>    2 files changed, 32 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> Let's do the simple cq_extra first. I don't see a huge need to add an
>>>>>> IOSQE flag for this, probably best to just keep this on a per opcode
>>>>>> basis for now, which also then limits the code path to just touching
>>>>>> poll for now, as nothing else supports multishot CQEs at this point.
>>>>>>
>>>>> gotcha.
>>>>> a small issue here:
>>>>>    sqe-->sqe(link)-->sqe(link)-->sqe(link, multishot)-->sqe(drain)
>>>>>
>>>>> in the above case, assume the first 3 single-shot reqs have completed.
>>>>> then I think the drian request won't be issued now unless the 
>>>>> multishot request in the linkchain has been issued. The trick is: a 
>>>>> multishot req
>>>>> in a linkchain consumes cached_sq_head when io_get_sqe(), which 
>>>>> means it
>>>>> is counted in seq, but we will deduct the sqe when it is issued if we
>>>>> want to do the job per opcode not in the main code path.
>>>>> before the multishot req issued:
>>>>>        all_sqes(4) - multishot_sqes(0) == all_cqes(3) - 
>>>>> multishot_cqes(0)
>>>>> after the multishot req issued:
>>>>>        all_sqes(4) - multishot_sqes(1) == all_cqes(3) - 
>>>>> multishot_cqes(0)
>>>>
>>>> Sorry, my statement is wrong. It's not "won't be issued now unless the
>>>> multishot request in the linkchain has been issued". Actually I now
>>>> think the drain req won't be issued unless the multishot request in the
>>>> linkchain has completed. Because we may first check req_need_defer()
>>>> then issue(req->link), so:
>>>>     sqe0-->sqe1(link)-->sqe2(link)-->sqe3(link, 
>>>> multishot)-->sqe4(drain)
>>>>
>>>>    sqe2 is completed:
>>>>      call req_need_defer:
>>>>      all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>>>>    sqe3 is issued:
>>>>      all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>>>    sqe3 is completed:
>>>>      call req_need_defer:
>>>>      all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>>>
>>>> sqe4 shouldn't wait sqe3.
>>>
>>> Do you mean it wouldn't if the patch is applied? Because any drain
>>> request must wait for all requests submitted before to complete. And
>>> so before issuing sqe4 it must wait for sqe3 __request__ to die, and
>>> so for all sqe3's CQEs.
>>>
>>> previously
>>
>> I think we need to agree on what multishot means for dependencies. Does
>> it mean it just needs to trigger once? Or does it mean that it needs to
>> be totally finished. The latter may obviously never happen, depending on
>> the use case. Or it may be an expected condition because the caller will
>> cancel it at some point.
>>
>> The most logical view imho is that multishot changes nothing wrt drain.
>> If you ask for drain before something executes and you are using
>> multishot, then you need to understand that the multishot request needs
>> to fully complete before that condition is true and your dependency can
>> execute.
> This makes sense, and the implementation would be quite simpler. but we
> really need to document it somewhere so that users easily get to know
> that they cannot put a drain req after some multishot reqs if they don't
> want it to wait for them. Otherwise I worry about wrong use of it since
> the meaning of 'put a drain req after some multishot reqs' isn't so
> obvious:
>     - does it waits for those multishot reqs to complete once
>     - or does it waits for those ones to fully complete
>     - or does it ignore those ones at all
> 
I realised that if a drain req has to wait for multishot reqs' fully
  completion, then users have to explicitly cancel all the previous
multishot reqs, otherwise it won't execute forever:
     sqe0(multishot)-->sqe1(drain)-->sqe2(cancel multishot)    stuck
> 
> 
>>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5.13 v2] io_uring: maintain drain requests' logic
  2021-04-09  7:05             ` Hao Xu
@ 2021-04-09  7:50               ` Pavel Begunkov
  2021-04-12 15:07                 ` Hao Xu
  0 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2021-04-09  7:50 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 09/04/2021 08:05, Hao Xu wrote:
> 在 2021/4/9 下午2:15, Hao Xu 写道:
>> 在 2021/4/9 上午12:18, Jens Axboe 写道:
>>> On 4/8/21 6:22 AM, Pavel Begunkov wrote:
>>>> On 08/04/2021 12:43, Hao Xu wrote:
>>>>> 在 2021/4/8 下午6:16, Hao Xu 写道:
>>>>>> 在 2021/4/7 下午11:49, Jens Axboe 写道:
>>>>>>> On 4/7/21 5:23 AM, Hao Xu wrote:
>>>>>>>> more tests comming, send this out first for comments.
>>>>>>>>
>>>>>>>> Hao Xu (3):
>>>>>>>>     io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests
>>>>>>>>     io_uring: maintain drain logic for multishot requests
>>>>>>>>     io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD
>>>>>>>>
>>>>>>>>    fs/io_uring.c                 | 34 +++++++++++++++++++++++++++++-----
>>>>>>>>    include/uapi/linux/io_uring.h |  8 +++-----
>>>>>>>>    2 files changed, 32 insertions(+), 10 deletions(-)
>>>>>>>
>>>>>>> Let's do the simple cq_extra first. I don't see a huge need to add an
>>>>>>> IOSQE flag for this, probably best to just keep this on a per opcode
>>>>>>> basis for now, which also then limits the code path to just touching
>>>>>>> poll for now, as nothing else supports multishot CQEs at this point.
>>>>>>>
>>>>>> gotcha.
>>>>>> a small issue here:
>>>>>>    sqe-->sqe(link)-->sqe(link)-->sqe(link, multishot)-->sqe(drain)
>>>>>>
>>>>>> in the above case, assume the first 3 single-shot reqs have completed.
>>>>>> then I think the drian request won't be issued now unless the multishot request in the linkchain has been issued. The trick is: a multishot req
>>>>>> in a linkchain consumes cached_sq_head when io_get_sqe(), which means it
>>>>>> is counted in seq, but we will deduct the sqe when it is issued if we
>>>>>> want to do the job per opcode not in the main code path.
>>>>>> before the multishot req issued:
>>>>>>        all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>>>>>> after the multishot req issued:
>>>>>>        all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>>>>
>>>>> Sorry, my statement is wrong. It's not "won't be issued now unless the
>>>>> multishot request in the linkchain has been issued". Actually I now
>>>>> think the drain req won't be issued unless the multishot request in the
>>>>> linkchain has completed. Because we may first check req_need_defer()
>>>>> then issue(req->link), so:
>>>>>     sqe0-->sqe1(link)-->sqe2(link)-->sqe3(link, multishot)-->sqe4(drain)
>>>>>
>>>>>    sqe2 is completed:
>>>>>      call req_need_defer:
>>>>>      all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>>>>>    sqe3 is issued:
>>>>>      all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>>>>    sqe3 is completed:
>>>>>      call req_need_defer:
>>>>>      all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>>>>
>>>>> sqe4 shouldn't wait sqe3.
>>>>
>>>> Do you mean it wouldn't if the patch is applied? Because any drain
>>>> request must wait for all requests submitted before to complete. And
>>>> so before issuing sqe4 it must wait for sqe3 __request__ to die, and
>>>> so for all sqe3's CQEs.
>>>>
>>>> previously
>>>
>>> I think we need to agree on what multishot means for dependencies. Does
>>> it mean it just needs to trigger once? Or does it mean that it needs to
>>> be totally finished. The latter may obviously never happen, depending on
>>> the use case. Or it may be an expected condition because the caller will
>>> cancel it at some point.
>>>
>>> The most logical view imho is that multishot changes nothing wrt drain.
>>> If you ask for drain before something executes and you are using
>>> multishot, then you need to understand that the multishot request needs
>>> to fully complete before that condition is true and your dependency can
>>> execute.
>> This makes sense, and the implementation would be quite simpler. but we
>> really need to document it somewhere so that users easily get to know
>> that they cannot put a drain req after some multishot reqs if they don't
>> want it to wait for them. Otherwise I worry about wrong use of it since
>> the meaning of 'put a drain req after some multishot reqs' isn't so
>> obvious:
>>     - does it waits for those multishot reqs to complete once
>>     - or does it waits for those ones to fully complete
>>     - or does it ignore those ones at all
>>
> I realised that if a drain req has to wait for multishot reqs' fully
>  completion, then users have to explicitly cancel all the previous
> multishot reqs, otherwise it won't execute forever:
>     sqe0(multishot)-->sqe1(drain)-->sqe2(cancel multishot)    stuck

And it's not a new behaviour, e.g. read(pipe); drain(); where nobody
writes to the pipe will stuck as well.

I like that it currently provides a full barrier between requests, are
there other patterns used by someone?

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5.13 v2] io_uring: maintain drain requests' logic
  2021-04-09  7:50               ` Pavel Begunkov
@ 2021-04-12 15:07                 ` Hao Xu
  2021-04-12 15:29                   ` Hao Xu
  0 siblings, 1 reply; 18+ messages in thread
From: Hao Xu @ 2021-04-12 15:07 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/4/9 下午3:50, Pavel Begunkov 写道:
> On 09/04/2021 08:05, Hao Xu wrote:
>> 在 2021/4/9 下午2:15, Hao Xu 写道:
>>> 在 2021/4/9 上午12:18, Jens Axboe 写道:
>>>> On 4/8/21 6:22 AM, Pavel Begunkov wrote:
>>>>> On 08/04/2021 12:43, Hao Xu wrote:
>>>>>> 在 2021/4/8 下午6:16, Hao Xu 写道:
>>>>>>> 在 2021/4/7 下午11:49, Jens Axboe 写道:
>>>>>>>> On 4/7/21 5:23 AM, Hao Xu wrote:
>>>>>>>>> more tests comming, send this out first for comments.
>>>>>>>>>
>>>>>>>>> Hao Xu (3):
>>>>>>>>>      io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests
>>>>>>>>>      io_uring: maintain drain logic for multishot requests
>>>>>>>>>      io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD
>>>>>>>>>
>>>>>>>>>     fs/io_uring.c                 | 34 +++++++++++++++++++++++++++++-----
>>>>>>>>>     include/uapi/linux/io_uring.h |  8 +++-----
>>>>>>>>>     2 files changed, 32 insertions(+), 10 deletions(-)
>>>>>>>>
>>>>>>>> Let's do the simple cq_extra first. I don't see a huge need to add an
>>>>>>>> IOSQE flag for this, probably best to just keep this on a per opcode
>>>>>>>> basis for now, which also then limits the code path to just touching
>>>>>>>> poll for now, as nothing else supports multishot CQEs at this point.
>>>>>>>>
>>>>>>> gotcha.
>>>>>>> a small issue here:
>>>>>>>     sqe-->sqe(link)-->sqe(link)-->sqe(link, multishot)-->sqe(drain)
>>>>>>>
>>>>>>> in the above case, assume the first 3 single-shot reqs have completed.
>>>>>>> then I think the drian request won't be issued now unless the multishot request in the linkchain has been issued. The trick is: a multishot req
>>>>>>> in a linkchain consumes cached_sq_head when io_get_sqe(), which means it
>>>>>>> is counted in seq, but we will deduct the sqe when it is issued if we
>>>>>>> want to do the job per opcode not in the main code path.
>>>>>>> before the multishot req issued:
>>>>>>>         all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>>>>>>> after the multishot req issued:
>>>>>>>         all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>>>>>
>>>>>> Sorry, my statement is wrong. It's not "won't be issued now unless the
>>>>>> multishot request in the linkchain has been issued". Actually I now
>>>>>> think the drain req won't be issued unless the multishot request in the
>>>>>> linkchain has completed. Because we may first check req_need_defer()
>>>>>> then issue(req->link), so:
>>>>>>      sqe0-->sqe1(link)-->sqe2(link)-->sqe3(link, multishot)-->sqe4(drain)
>>>>>>
>>>>>>     sqe2 is completed:
>>>>>>       call req_need_defer:
>>>>>>       all_sqes(4) - multishot_sqes(0) == all_cqes(3) - multishot_cqes(0)
>>>>>>     sqe3 is issued:
>>>>>>       all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>>>>>     sqe3 is completed:
>>>>>>       call req_need_defer:
>>>>>>       all_sqes(4) - multishot_sqes(1) == all_cqes(3) - multishot_cqes(0)
>>>>>>
>>>>>> sqe4 shouldn't wait sqe3.
>>>>>
>>>>> Do you mean it wouldn't if the patch is applied? Because any drain
>>>>> request must wait for all requests submitted before to complete. And
>>>>> so before issuing sqe4 it must wait for sqe3 __request__ to die, and
>>>>> so for all sqe3's CQEs.
>>>>>
>>>>> previously
>>>>
>>>> I think we need to agree on what multishot means for dependencies. Does
>>>> it mean it just needs to trigger once? Or does it mean that it needs to
>>>> be totally finished. The latter may obviously never happen, depending on
>>>> the use case. Or it may be an expected condition because the caller will
>>>> cancel it at some point.
>>>>
>>>> The most logical view imho is that multishot changes nothing wrt drain.
>>>> If you ask for drain before something executes and you are using
>>>> multishot, then you need to understand that the multishot request needs
>>>> to fully complete before that condition is true and your dependency can
>>>> execute.
>>> This makes sense, and the implementation would be quite simpler. but we
>>> really need to document it somewhere so that users easily get to know
>>> that they cannot put a drain req after some multishot reqs if they don't
>>> want it to wait for them. Otherwise I worry about wrong use of it since
>>> the meaning of 'put a drain req after some multishot reqs' isn't so
>>> obvious:
>>>      - does it waits for those multishot reqs to complete once
>>>      - or does it waits for those ones to fully complete
>>>      - or does it ignore those ones at all
>>>
>> I realised that if a drain req has to wait for multishot reqs' fully
>>   completion, then users have to explicitly cancel all the previous
>> multishot reqs, otherwise it won't execute forever:
>>      sqe0(multishot)-->sqe1(drain)-->sqe2(cancel multishot)    stuck
> 
> And it's not a new behaviour, e.g. read(pipe); drain(); where nobody
> writes to the pipe will stuck as well.
> 
> I like that it currently provides a full barrier between requests, are
> there other patterns used by someone?
> 
As I'm writing a test for it, I found there is something different.
we can break the stuck case above(read(pipe); drain();) easily since 
writing something to the pipe is independant to the sqring itself.
But for a multishot req, there are many restrictions for the cancel req.
   1. we cannot mark a cancel as LINK or DRAIN:
         (1)sqe(multishot)->sqe(link, cancel)->sqe(link, drain)
         (2)sqe(multishot)->sqe(cancel, drain)
         (3)the linkchain fails at some member, which leads to
            cancellation of the cancel req. and users have to retry.

   2. we have to be careful when marking a multishot req with LINK or
      DRAIN
       (1)sqe0(nop, link)->sqe1(multishot, link)->sqe2(nop)->sqe3(cancel)
         *  sqe3 may execute before sqe1, and cancels nothing
         in other words, we have to carefully arrange them to make sure
         the cancel req works.
       (2) sqe(multshot, drain)

There may be other cases. I feel it not easy for users to jump over
  these traps.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5.13 v2] io_uring: maintain drain requests' logic
  2021-04-12 15:07                 ` Hao Xu
@ 2021-04-12 15:29                   ` Hao Xu
  0 siblings, 0 replies; 18+ messages in thread
From: Hao Xu @ 2021-04-12 15:29 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/4/12 下午11:07, Hao Xu 写道:
> 在 2021/4/9 下午3:50, Pavel Begunkov 写道:
>> On 09/04/2021 08:05, Hao Xu wrote:
>>> 在 2021/4/9 下午2:15, Hao Xu 写道:
>>>> 在 2021/4/9 上午12:18, Jens Axboe 写道:
>>>>> On 4/8/21 6:22 AM, Pavel Begunkov wrote:
>>>>>> On 08/04/2021 12:43, Hao Xu wrote:
>>>>>>> 在 2021/4/8 下午6:16, Hao Xu 写道:
>>>>>>>> 在 2021/4/7 下午11:49, Jens Axboe 写道:
>>>>>>>>> On 4/7/21 5:23 AM, Hao Xu wrote:
>>>>>>>>>> more tests comming, send this out first for comments.
>>>>>>>>>>
>>>>>>>>>> Hao Xu (3):
>>>>>>>>>>      io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for 
>>>>>>>>>> multishot requests
>>>>>>>>>>      io_uring: maintain drain logic for multishot requests
>>>>>>>>>>      io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD
>>>>>>>>>>
>>>>>>>>>>     fs/io_uring.c                 | 34 
>>>>>>>>>> +++++++++++++++++++++++++++++-----
>>>>>>>>>>     include/uapi/linux/io_uring.h |  8 +++-----
>>>>>>>>>>     2 files changed, 32 insertions(+), 10 deletions(-)
>>>>>>>>>
>>>>>>>>> Let's do the simple cq_extra first. I don't see a huge need to 
>>>>>>>>> add an
>>>>>>>>> IOSQE flag for this, probably best to just keep this on a per 
>>>>>>>>> opcode
>>>>>>>>> basis for now, which also then limits the code path to just 
>>>>>>>>> touching
>>>>>>>>> poll for now, as nothing else supports multishot CQEs at this 
>>>>>>>>> point.
>>>>>>>>>
>>>>>>>> gotcha.
>>>>>>>> a small issue here:
>>>>>>>>     sqe-->sqe(link)-->sqe(link)-->sqe(link, multishot)-->sqe(drain)
>>>>>>>>
>>>>>>>> in the above case, assume the first 3 single-shot reqs have 
>>>>>>>> completed.
>>>>>>>> then I think the drian request won't be issued now unless the 
>>>>>>>> multishot request in the linkchain has been issued. The trick 
>>>>>>>> is: a multishot req
>>>>>>>> in a linkchain consumes cached_sq_head when io_get_sqe(), which 
>>>>>>>> means it
>>>>>>>> is counted in seq, but we will deduct the sqe when it is issued 
>>>>>>>> if we
>>>>>>>> want to do the job per opcode not in the main code path.
>>>>>>>> before the multishot req issued:
>>>>>>>>         all_sqes(4) - multishot_sqes(0) == all_cqes(3) - 
>>>>>>>> multishot_cqes(0)
>>>>>>>> after the multishot req issued:
>>>>>>>>         all_sqes(4) - multishot_sqes(1) == all_cqes(3) - 
>>>>>>>> multishot_cqes(0)
>>>>>>>
>>>>>>> Sorry, my statement is wrong. It's not "won't be issued now 
>>>>>>> unless the
>>>>>>> multishot request in the linkchain has been issued". Actually I now
>>>>>>> think the drain req won't be issued unless the multishot request 
>>>>>>> in the
>>>>>>> linkchain has completed. Because we may first check req_need_defer()
>>>>>>> then issue(req->link), so:
>>>>>>>      sqe0-->sqe1(link)-->sqe2(link)-->sqe3(link, 
>>>>>>> multishot)-->sqe4(drain)
>>>>>>>
>>>>>>>     sqe2 is completed:
>>>>>>>       call req_need_defer:
>>>>>>>       all_sqes(4) - multishot_sqes(0) == all_cqes(3) - 
>>>>>>> multishot_cqes(0)
>>>>>>>     sqe3 is issued:
>>>>>>>       all_sqes(4) - multishot_sqes(1) == all_cqes(3) - 
>>>>>>> multishot_cqes(0)
>>>>>>>     sqe3 is completed:
>>>>>>>       call req_need_defer:
>>>>>>>       all_sqes(4) - multishot_sqes(1) == all_cqes(3) - 
>>>>>>> multishot_cqes(0)
>>>>>>>
>>>>>>> sqe4 shouldn't wait sqe3.
>>>>>>
>>>>>> Do you mean it wouldn't if the patch is applied? Because any drain
>>>>>> request must wait for all requests submitted before to complete. And
>>>>>> so before issuing sqe4 it must wait for sqe3 __request__ to die, and
>>>>>> so for all sqe3's CQEs.
>>>>>>
>>>>>> previously
>>>>>
>>>>> I think we need to agree on what multishot means for dependencies. 
>>>>> Does
>>>>> it mean it just needs to trigger once? Or does it mean that it 
>>>>> needs to
>>>>> be totally finished. The latter may obviously never happen, 
>>>>> depending on
>>>>> the use case. Or it may be an expected condition because the caller 
>>>>> will
>>>>> cancel it at some point.
>>>>>
>>>>> The most logical view imho is that multishot changes nothing wrt 
>>>>> drain.
>>>>> If you ask for drain before something executes and you are using
>>>>> multishot, then you need to understand that the multishot request 
>>>>> needs
>>>>> to fully complete before that condition is true and your dependency 
>>>>> can
>>>>> execute.
>>>> This makes sense, and the implementation would be quite simpler. but we
>>>> really need to document it somewhere so that users easily get to know
>>>> that they cannot put a drain req after some multishot reqs if they 
>>>> don't
>>>> want it to wait for them. Otherwise I worry about wrong use of it since
>>>> the meaning of 'put a drain req after some multishot reqs' isn't so
>>>> obvious:
>>>>      - does it waits for those multishot reqs to complete once
>>>>      - or does it waits for those ones to fully complete
>>>>      - or does it ignore those ones at all
>>>>
>>> I realised that if a drain req has to wait for multishot reqs' fully
>>>   completion, then users have to explicitly cancel all the previous
>>> multishot reqs, otherwise it won't execute forever:
>>>      sqe0(multishot)-->sqe1(drain)-->sqe2(cancel multishot)    stuck
>>
>> And it's not a new behaviour, e.g. read(pipe); drain(); where nobody
>> writes to the pipe will stuck as well.
>>
>> I like that it currently provides a full barrier between requests, are
>> there other patterns used by someone?
>>
> As I'm writing a test for it, I found there is something different.
> we can break the stuck case above(read(pipe); drain();) easily since 
> writing something to the pipe is independant to the sqring itself.
> But for a multishot req, there are many restrictions for the cancel req.
>   1. we cannot mark a cancel as LINK or DRAIN:
>         (1)sqe(multishot)->sqe(link, cancel)->sqe(link, drain)
>         (2)sqe(multishot)->sqe(cancel, drain)
>         (3)the linkchain fails at some member, which leads to
>            cancellation of the cancel req. and users have to retry.
> 
>   2. we have to be careful when marking a multishot req with LINK or
>      DRAIN
>       (1)sqe0(nop, link)->sqe1(multishot, link)->sqe2(nop)->sqe3(cancel)
>         *  sqe3 may execute before sqe1, and cancels nothing
>         in other words, we have to carefully arrange them to make sure
>         the cancel req works.
>       (2) sqe(multshot, drain)
> 
> There may be other cases. I feel it not easy for users to jump over
> these traps.
> 
sorry, correct something above:
  1. we cannot mark a cancel as LINK or DRAIN:
         (1)the linkchain fails at some member, which leads to
             cancellation of the cancel req. and users have to retry.
         (2)sqe(multishot)->sqe(cancel, drain)

  2. we have to be careful when marking a multishot req with LINK
    (1)sqe0(nop, link)->sqe1(multishot,    link)->sqe2(nop)->sqe3(cancel)
          *  sqe3 may execute before sqe1, and cancels nothing
          in other words, we have to carefully arrange them to make sure
          the cancel req works.
     (2)sqe0(link, drain)->sqe1(link, multishot)->sqe2(nop)->sqe3(cancel)
        though sqe2 is a normal sqe, but it will be marked as drain since
        there is a drain req in linkchain.
     * here the word 'link' means being logically in the linkchain, I
       know that the first req without IOSQE_IO_LINK after the
       linkchain is still in the linkchain


> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-04-12 15:29 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-07 11:23 [PATCH 5.13 v2] io_uring: maintain drain requests' logic Hao Xu
2021-04-07 11:23 ` [PATCH 1/3] io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests Hao Xu
2021-04-07 11:38   ` Pavel Begunkov
2021-04-07 11:23 ` [PATCH 2/3] io_uring: maintain drain logic " Hao Xu
2021-04-07 11:41   ` Pavel Begunkov
2021-04-07 11:23 ` [PATCH 3/3] io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD Hao Xu
2021-04-07 15:49 ` [PATCH 5.13 v2] io_uring: maintain drain requests' logic Jens Axboe
2021-04-08 10:16   ` Hao Xu
2021-04-08 11:43     ` Hao Xu
2021-04-08 12:22       ` Pavel Begunkov
2021-04-08 16:18         ` Jens Axboe
2021-04-09  6:15           ` Hao Xu
2021-04-09  7:05             ` Hao Xu
2021-04-09  7:50               ` Pavel Begunkov
2021-04-12 15:07                 ` Hao Xu
2021-04-12 15:29                   ` Hao Xu
2021-04-09  3:12         ` Hao Xu
2021-04-09  3:43           ` Hao Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.