On 08/11/2019 18:51, Jens Axboe wrote:
> It's useful for the application to know if the kernel had to dip into
> using the backlog to prevent overflows. Let's keep on accounting any
> overflow in cq_ring->overflow, even if we handled it correctly. As it's
> impossible to get dropped events with IORING_SETUP_CQ_NODROP, overflow
> with CQ_NODROP enabled simply provides a hint to the application that it
> may reconsider using a bigger ring.
>
> Signed-off-by: Jens Axboe
>
> ---
>
> Since this hasn't been released yet, we can tweak the behavior a bit. I
> think it makes sense to still account the overflows, even if we handled
> them correctly. If the application doesn't care, it simply doesn't need
> to look at cq_ring->overflow if it is using CQ_NODROP. But it may care,
> as it is less efficient than a suitably sized ring.
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 94ec44caac00..aa3b6149dfe9 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -666,10 +666,10 @@ static void io_cqring_overflow(struct io_ring_ctx *ctx, struct io_kiocb *req,
>  			       long res)
>  	__must_hold(&ctx->completion_lock)
>  {
> -	if (!(ctx->flags & IORING_SETUP_CQ_NODROP)) {
> -		WRITE_ONCE(ctx->rings->cq_overflow,
> -			   atomic_inc_return(&ctx->cached_cq_overflow));
> -	} else {
> +	WRITE_ONCE(ctx->rings->cq_overflow,
> +		   atomic_inc_return(&ctx->cached_cq_overflow));
> +
> +	if (ctx->flags & IORING_SETUP_CQ_NODROP) {

We used cq_overflow to fix __io_sequence_defer(). This breaks the
assumption:

    cached_cq_tail + cached_cq_overflow == total number of handled completions

First we account the overflow, and then later add it to the CQ ring
(i.e. cached_cq_tail++) in io_cqring_overflow_flush(), so the same
completion ends up counted twice.

>  		refcount_inc(&req->refs);
>  		req->result = res;
>  		list_add_tail(&req->list, &ctx->cq_overflow_list);

-- 
Pavel Begunkov
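
To make the broken invariant concrete, here is a minimal userspace model
of the accounting described above. This is an illustrative sketch, not
the kernel code: the cq_model struct and the overflow_cqe()/
flush_overflow() helpers are hypothetical stand-ins for
ctx->cached_cq_tail, ctx->cached_cq_overflow, io_cqring_overflow(), and
io_cqring_overflow_flush().

/*
 * Userspace model of the CQ_NODROP overflow accounting (sketch only;
 * all names here are stand-ins, not the in-tree io_uring symbols).
 */
#include <stdio.h>

struct cq_model {
	unsigned int tail;	/* models ctx->cached_cq_tail */
	unsigned int overflow;	/* models ctx->cached_cq_overflow */
	unsigned int handled;	/* ground truth: CQEs delivered to the app */
};

/* CQ ring full at completion time: with the patch above, overflow is
 * accounted even under CQ_NODROP, and the request is parked on the
 * backlog (cq_overflow_list) rather than dropped. */
static void overflow_cqe(struct cq_model *cq)
{
	cq->overflow++;		/* io_cqring_overflow() accounting */
}

/* Later the backlog is flushed into the CQ ring, as in
 * io_cqring_overflow_flush(), and the app finally sees the CQE. */
static void flush_overflow(struct cq_model *cq)
{
	cq->tail++;		/* cached_cq_tail++ */
	cq->handled++;
}

int main(void)
{
	struct cq_model cq = { 0, 0, 0 };

	overflow_cqe(&cq);	/* CQ was full when the request completed */
	flush_overflow(&cq);	/* backlog flushed on the next wait */

	/* __io_sequence_defer() relies on:
	 *	cached_cq_tail + cached_cq_overflow == handled completions
	 * but the flushed CQE was counted twice: */
	printf("tail + overflow = %u, handled = %u\n",
	       cq.tail + cq.overflow, cq.handled);	/* prints 2, 1 */
	return 0;
}

The mismatch is exactly the double count described above: a single
completion bumps cached_cq_overflow when it is parked on the backlog and
bumps cached_cq_tail again when it is flushed into the ring.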