From: Jann Horn <jannh@google.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [RFC] io_uring CQ ring backpressure
Date: Wed, 6 Nov 2019 20:51:48 +0100
Message-ID: <CAG48ez1_91Lk73sdpp1SiufOQShdP2zX6g9gMLW46gAvMioKOA@mail.gmail.com>
In-Reply-To: <37d8ba3d-27c7-7636-0343-23ec56e4bee7@kernel.dk>

On Wed, Nov 6, 2019 at 5:23 PM Jens Axboe <axboe@kernel.dk> wrote:
> Currently we drop completion events, if the CQ ring is full. That's fine
> for requests with bounded completion times, but it may make it harder to
> use io_uring with networked IO where request completion times are
> generally unbounded. Or with POLL, for example, which is also unbounded.
>
> This patch adds IORING_SETUP_CQ_NODROP, which changes the behavior a bit
> for CQ ring overflows. First of all, it doesn't overflow the ring, it
> simply stores a backlog of completions that we weren't able to put into
> the CQ ring. To prevent the backlog from growing indefinitely, if the
> backlog is non-empty, we apply back pressure on IO submissions. Any
> attempt to submit new IO with a non-empty backlog will get an -EBUSY
> return from the kernel.
>
> I think that makes for a pretty sane API in terms of how the application
> can handle it. With CQ_NODROP enabled, we'll never drop a completion
> event (well, unless we're totally out of memory...), but we'll also not
> allow submissions with a completion backlog.
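
Just to check my understanding of the userspace contract: the
application-side submit loop would look roughly like the sketch below,
assuming the kernel flushes its backlog into the CQ ring once space
frees up there. sys_io_uring_enter() and reap_cqes() are hypothetical
helpers (glibc has no io_uring_enter() wrapper), not anything from the
patch or from liburing:

#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

void reap_cqes(int ring_fd);   /* hypothetical: consume all visible CQEs */

/* Thin wrapper around the raw syscall; glibc doesn't provide one. */
static int sys_io_uring_enter(int fd, unsigned to_submit,
                              unsigned min_complete, unsigned flags)
{
        return syscall(__NR_io_uring_enter, fd, to_submit,
                       min_complete, flags, NULL, 0);
}

static int submit_with_backpressure(int ring_fd, unsigned to_submit)
{
        for (;;) {
                int ret = sys_io_uring_enter(ring_fd, to_submit, 0, 0);

                if (ret >= 0)
                        return ret;        /* SQEs consumed */
                if (errno != EBUSY)
                        return -errno;
                /*
                 * Non-empty completion backlog: make room in the CQ
                 * ring so the kernel can flush it, then retry.
                 */
                reap_cqes(ring_fd);
        }
}
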
[...]
> +static void io_cqring_overflow(struct io_ring_ctx *ctx, u64 ki_user_data,
> +                              long res)
> +       __must_hold(&ctx->completion_lock)
> +{
> +       struct cqe_drop *drop;
> +
> +       if (!(ctx->flags & IORING_SETUP_CQ_NODROP)) {
> +log_overflow:
> +               WRITE_ONCE(ctx->rings->cq_overflow,
> +                               atomic_inc_return(&ctx->cached_cq_overflow));
> +               return;
> +       }
> +
> +       drop = kmalloc(sizeof(*drop), GFP_ATOMIC);
> +       if (!drop)
> +               goto log_overflow;
> +
> +       drop->user_data = ki_user_data;
> +       drop->res = res;
> +       list_add_tail(&drop->list, &ctx->cq_overflow_list);
> +}

This could potentially consume moderately large amounts of GFP_ATOMIC
memory quickly, without any guarantee that the memory will be freed
anytime soon, right? That seems moderately bad. Is there no way to
e.g. pre-reserve memory for completion events, or something like that?
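
To make that concrete, one pre-reservation scheme would be to embed the
overflow entry in the request itself, so the overflow path never has to
allocate. A sketch, with hypothetical struct layouts loosely based on
the patch's cqe_drop (the real io_kiocb obviously has more fields):

struct cqe_drop {
        struct list_head        list;
        u64                     user_data;
        long                    res;
};

struct io_kiocb {                       /* hypothetical, heavily trimmed */
        u64                     user_data;
        struct cqe_drop         overflow;    /* reserved at request alloc */
        /* ... */
};

static void io_cqring_overflow(struct io_ring_ctx *ctx,
                               struct io_kiocb *req, long res)
        __must_hold(&ctx->completion_lock)
{
        struct cqe_drop *drop = &req->overflow;

        drop->user_data = req->user_data;
        drop->res = res;
        /*
         * No GFP_ATOMIC allocation: the memory was reserved when the
         * request was set up, so this path can never fail and the
         * event can never be dropped.
         */
        list_add_tail(&drop->list, &ctx->cq_overflow_list);
}

The obvious cost is that the entry has to stay alive until the backlog
is flushed, so the request (or at least its embedded entry) couldn't be
recycled before then.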


Thread overview: 9+ messages
2019-11-06 16:21 [RFC] io_uring CQ ring backpressure Jens Axboe
2019-11-06 19:12 ` Pavel Begunkov
2019-11-06 19:43   ` Jens Axboe
2019-11-06 19:51 ` Jann Horn [this message]
2019-11-06 20:08   ` Jens Axboe
2019-11-06 21:31     ` Jens Axboe
2019-11-06 21:54       ` Pavel Begunkov
2019-11-06 21:56         ` Jens Axboe
2019-11-06 22:42       ` Jens Axboe
