From: Pavel Begunkov <asml.silence@gmail.com>
To: Hao Xu <haoxu@linux.alibaba.com>, Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org, Joseph Qi <joseph.qi@linux.alibaba.com>
Subject: Re: [PATCH 2/3] io_uring: maintain drain logic for multishot requests
Date: Wed, 7 Apr 2021 12:41:46 +0100
Message-ID: <4d6f9688-4a8b-5fc6-f965-4903b5b82074@gmail.com>
In-Reply-To: <1617794605-35748-3-git-send-email-haoxu@linux.alibaba.com>

On 07/04/2021 12:23, Hao Xu wrote:
> Now that we have multishot poll requests, one sqe can emit multiple
> cqes. Given the example below:
>     sqe0(multishot poll)-->sqe1-->sqe2(drain req)
> sqe2 is designed to be issued after sqe0 and sqe1 have completed, but
> since sqe0 is a multishot poll request, sqe2 may be issued after sqe0's
> event has triggered twice, before sqe1 has completed. This isn't what
> users leverage drain requests for.
> Here a simple solution is to ignore all multishot poll cqes, which means
> drain requests won't wait for those requests to be done.
> To achieve this, we should reconsider the req_need_defer equation. The
> original one is:
> 
>     all_sqes(excluding dropped ones) == all_cqes(including dropped ones)
> 
> This means we issue a drain request when all the previously submitted
> sqes have generated their cqes.
> Now we should ignore multishot requests, so:
>     all_sqes - multishot_sqes == all_cqes - multishot_cqes ==>
>     all_sqes + multishot_cqes - multishot_sqes == all_cqes
> 
> Thus we have to track the submission of a multishot request and the cqes
> it generates, including the ECANCELED cqes. Here we introduce
> cq_extra = multishot_cqes - multishot_sqes for it.
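
The accounting described above can be sketched as a small userspace
model. This is only an illustration under stated assumptions, not kernel
code: struct ring_model and the model_*/need_defer_* helpers are
hypothetical names, and cq_extra is taken to be multishot cqes minus
multishot sqes, matching the increments and decrements in the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the counters; not the kernel's io_ring_ctx. */
struct ring_model {
	uint32_t sq_seq;	/* sqes submitted so far */
	uint32_t cq_tail;	/* cqes posted so far */
	uint32_t cq_overflow;	/* cqes dropped on overflow */
	uint32_t cq_extra;	/* multishot cqes - multishot sqes, wraps mod 2^32 */
};

/* Account for one submitted sqe. */
static void model_submit(struct ring_model *r, bool multishot)
{
	r->sq_seq++;
	if (multishot)
		r->cq_extra--;
}

/* Account for one posted cqe. */
static void model_complete(struct ring_model *r, bool multishot)
{
	r->cq_tail++;
	if (multishot)
		r->cq_extra++;
}

/* Original check: all_sqes == all_cqes. seq is the number of sqes
 * submitted before the drain request. */
static bool need_defer_old(const struct ring_model *r, uint32_t seq)
{
	return seq != (uint32_t)(r->cq_tail + r->cq_overflow);
}

/* Adjusted check: all_sqes + cq_extra == all_cqes. */
static bool need_defer_new(const struct ring_model *r, uint32_t seq)
{
	return (uint32_t)(seq + r->cq_extra) !=
	       (uint32_t)(r->cq_tail + r->cq_overflow);
}
```

Walking through sqe0(multishot poll)-->sqe1-->sqe2(drain req) in this
model: after sqe0 and sqe1 are submitted, seq is 2. Once sqe0 has posted
two cqes, the old check compares 2 with 2 and stops deferring the drain
request even though sqe1 is still pending, while the adjusted check
compares 3 with 2 and keeps deferring. When sqe1 then posts its cqe, both
sides are 3 and the drain request may be issued.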
> 
> There are other solutions like:
>   - just track multishot (non-ECANCELED) cqes, don't track multishot sqes.
>       This way we include multishot sqes in the left side of the equation,
>       which means we have to treat multishot sqes as normal ones, and then
>       we have to keep exactly one cqe for each multishot sqe. It's hard to
>       do this since there may be some multishot sqes which triggered
>       several events and then were cancelled, while other multishot
>       sqes just triggered events but weren't cancelled. We still need to
>       track the number of multishot sqes that haven't been cancelled, which
>       makes things complicated.
> 
> For the implementation, just do the submission tracking in
> io_submit_sqe() --> io_init_req() to make things simple. Otherwise, if
> we do it in the per-opcode issue path, we need to carefully consider
> each caller of io_req_complete_failed() because of tricky cases like
> cancelling multishot reqs in a link.
> 
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
>  fs/io_uring.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 192463bb977a..a7bd223ce2cc 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -423,6 +423,7 @@ struct io_ring_ctx {
>  		unsigned		cq_mask;
>  		atomic_t		cq_timeouts;
>  		unsigned		cq_last_tm_flush;
> +		unsigned		cq_extra;
>  		unsigned long		cq_check_overflow;
>  		struct wait_queue_head	cq_wait;
>  		struct fasync_struct	*cq_fasync;
> @@ -879,6 +880,8 @@ struct io_op_def {
>  	unsigned		needs_async_setup : 1;
>  	/* should block plug */
>  	unsigned		plug : 1;
> +	/* set if opcode may generate multiple cqes */
> +	unsigned		multi_cqes : 1;
>  	/* size of async data needed, if any */
>  	unsigned short		async_size;
>  };
> @@ -924,6 +927,7 @@ struct io_op_def {
>  	[IORING_OP_POLL_ADD] = {
>  		.needs_file		= 1,
>  		.unbound_nonreg_file	= 1,
> +		.multi_cqes		= 1,
>  	},
>  	[IORING_OP_POLL_REMOVE] = {},
>  	[IORING_OP_SYNC_FILE_RANGE] = {
> @@ -1186,7 +1190,7 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
>  	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
>  		struct io_ring_ctx *ctx = req->ctx;
>  
> -		return seq != ctx->cached_cq_tail
> +		return seq + ctx->cq_extra != ctx->cached_cq_tail
>  				+ READ_ONCE(ctx->cached_cq_overflow);
>  	}
>  
> @@ -1516,6 +1520,9 @@ static bool __io_cqring_fill_event(struct io_kiocb *req, long res,
>  
>  	trace_io_uring_complete(ctx, req->user_data, res, cflags);
>  
> +	if (req->flags & REQ_F_MULTI_CQES)
> +		req->ctx->cq_extra++;
> +


Here we go: additional overhead that burdens everyone but is used only
by a small new feature. All of that can be done in poll or in *_prep(),
on an opcode-by-opcode basis.

>  	/*
>  	 * If we can't get a cq entry, userspace overflowed the
>  	 * submission (by quite a lot). Increment the overflow count in
> @@ -6504,6 +6511,13 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
>  	req->result = 0;
>  	req->work.creds = NULL;
>  
> +	if (sqe_flags & IOSQE_MULTI_CQES) {
> +		ctx->cq_extra--;
> +		if (!io_op_defs[req->opcode].multi_cqes) {
> +			return -EOPNOTSUPP;
> +		}
> +	}
> +

see above

>  	/* enforce forwards compatibility on users */
>  	if (unlikely(sqe_flags & ~SQE_VALID_FLAGS)) {
>  		req->flags = 0;
> 

-- 
Pavel Begunkov

Thread overview: 18+ messages
2021-04-07 11:23 [PATCH 5.13 v2] io_uring: maintain drain requests' logic Hao Xu
2021-04-07 11:23 ` [PATCH 1/3] io_uring: add IOSQE_MULTI_CQES/REQ_F_MULTI_CQES for multishot requests Hao Xu
2021-04-07 11:38   ` Pavel Begunkov
2021-04-07 11:23 ` [PATCH 2/3] io_uring: maintain drain logic " Hao Xu
2021-04-07 11:41   ` Pavel Begunkov [this message]
2021-04-07 11:23 ` [PATCH 3/3] io_uring: use REQ_F_MULTI_CQES for multipoll IORING_OP_ADD Hao Xu
2021-04-07 15:49 ` [PATCH 5.13 v2] io_uring: maintain drain requests' logic Jens Axboe
2021-04-08 10:16   ` Hao Xu
2021-04-08 11:43     ` Hao Xu
2021-04-08 12:22       ` Pavel Begunkov
2021-04-08 16:18         ` Jens Axboe
2021-04-09  6:15           ` Hao Xu
2021-04-09  7:05             ` Hao Xu
2021-04-09  7:50               ` Pavel Begunkov
2021-04-12 15:07                 ` Hao Xu
2021-04-12 15:29                   ` Hao Xu
2021-04-09  3:12         ` Hao Xu
2021-04-09  3:43           ` Hao Xu
