From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>, io-uring@vger.kernel.org
Subject: Re: [PATCH RFC] io_uring: limit inflight IO
Date: Sat, 9 Nov 2019 08:15:21 -0700
Message-ID: <38f51d0c-cd27-6631-c4d3-06fbb26a5c1e@kernel.dk>
In-Reply-To: <d8002007-7641-3e9d-0560-123358300e66@kernel.dk>

On 11/9/19 7:23 AM, Jens Axboe wrote:
> On 11/9/19 4:16 AM, Pavel Begunkov wrote:
>>> I've been struggling a bit with how to make this reliable, and I'm not
>>> so sure there's a way to do that. Let's say an application sets up a
>>> ring with 8 sq entries, which would then default to 16 cq entries. With
>>> this patch, we'd allow 16 ios inflight. But what if the application does
>>>
>>> for (i = 0; i < 32; i++) {
>>> 	sqe = get_sqe();
>>> 	prep_sqe();
>>> 	submit_sqe();
>>> }
>>>
>>> And then directly proceeds to:
>>>
>>> do {
>>> 	get_completions();
>>> } while (has_completions);
>>>
>>> As long as fewer than 16 requests complete before we start reaping,
>>> we don't lose any events. Hence there's a risk of breaking existing
>>> setups with this, even though I don't think that's a high risk.
>>>
>>
>> I think this should be considered an erroneous usage of the API.
>> It's better to fail ASAP than to be surprised in a production
>> system by the non-deterministic nature of such code. Debugging
>> that kind of failure is even worse.
>>
>> As for me, cases like the one below are too far-fetched:
>>
>> for (i = 0; i < n; i++)
>> 	submit_read_sqe()
>> for (i = 0; i < n; i++) {
>> 	device_allow_next_read()
>> 	get_single_cqe()
>> }
> 
> I can't really disagree with that; it's a use case that's bound to fail
> every now and then...
> 
> But if we agree that's the case, then we should be able to just limit
> based on the cq ring size in question.
> 
> Do we make it different for CQ_NODROP and !CQ_NODROP or not? Because the
> above case would work with CQ_NODROP, reliably. At least CQ_NODROP is
> new, so we get to set the rules for that one; they just have to make
> sense.

Just tossing this one out there; it's an incremental on top of v2 of the patch.

- Check upfront if we're going over the limit, using the same kind of
  cost amortization logic, but adapted to run once per batch.

- Fold it in with the backpressure -EBUSY logic.

This avoids breaking up chains, for example, and also means we don't
have to run these checks for every request.

The limit kicks in once inflight exceeds 2 * cq_entries (e.g. with the
default 16 CQ entries for an 8-entry SQ ring, submission starts returning
-EBUSY roughly once more than 32 requests are in flight). I think that's
liberal enough not to cause issues, while still bearing a relation to the
sq/cq ring sizes, which I like.
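
For what it's worth, handling that backpressure from userspace could look
roughly like the below. This is just an illustrative sketch using
liburing-style calls (submit_with_backpressure is a made-up helper name),
not part of the patch: on -EBUSY, reap a completion to drop back under the
limit, then retry the submit.

/*
 * Sketch only: treat -EBUSY from submit as transient backpressure.
 * Error handling is trimmed down for brevity.
 */
#include <liburing.h>

static int submit_with_backpressure(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;
	int ret;

	for (;;) {
		ret = io_uring_submit(ring);
		if (ret != -EBUSY)
			return ret;

		/*
		 * Over the inflight limit (or an overflow is pending with
		 * CQ_NODROP): reap one completion to make room, then retry.
		 */
		ret = io_uring_wait_cqe(ring, &cqe);
		if (ret < 0)
			return ret;
		io_uring_cqe_seen(ring, cqe);
	}
}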


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 18711d45b994..53ccd4e1dee2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -737,25 +737,6 @@ static struct io_kiocb *io_get_fallback_req(struct io_ring_ctx *ctx)
 	return NULL;
 }
 
-static bool io_req_over_limit(struct io_ring_ctx *ctx)
-{
-	unsigned inflight;
-
-	/*
-	 * This doesn't need to be super precise, so only check every once
-	 * in a while.
-	 */
-	if (ctx->cached_sq_head & ctx->sq_mask)
-		return false;
-
-	/*
-	 * Use 2x the max CQ ring size
-	 */
-	inflight = ctx->cached_sq_head -
-		  (ctx->cached_cq_tail + atomic_read(&ctx->cached_cq_overflow));
-	return inflight >= 2 * IORING_MAX_CQ_ENTRIES;
-}
-
 static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
 				   struct io_submit_state *state)
 {
@@ -766,8 +747,6 @@ static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
 		return ERR_PTR(-ENXIO);
 
 	if (!state) {
-		if (unlikely(io_req_over_limit(ctx)))
-			goto out_limit;
 		req = kmem_cache_alloc(req_cachep, gfp);
 		if (unlikely(!req))
 			goto fallback;
@@ -775,8 +754,6 @@ static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
 		size_t sz;
 		int ret;
 
-		if (unlikely(io_req_over_limit(ctx)))
-			goto out_limit;
 		sz = min_t(size_t, state->ios_left, ARRAY_SIZE(state->reqs));
 		ret = kmem_cache_alloc_bulk(req_cachep, gfp, sz, state->reqs);
 
@@ -812,7 +789,6 @@ static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
 	req = io_get_fallback_req(ctx);
 	if (req)
 		goto got_it;
-out_limit:
 	percpu_ref_put(&ctx->refs);
 	return ERR_PTR(-EBUSY);
 }
@@ -3021,6 +2997,30 @@ static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s)
 	return false;
 }
 
+static bool io_sq_over_limit(struct io_ring_ctx *ctx, unsigned to_submit)
+{
+	unsigned inflight;
+
+	if ((ctx->flags & IORING_SETUP_CQ_NODROP) &&
+	    !list_empty(&ctx->cq_overflow_list))
+		return true;
+
+	/*
+	 * This doesn't need to be super precise, so only check every once
+	 * in a while.
+	 */
+	if ((ctx->cached_sq_head & ctx->sq_mask) !=
+	    ((ctx->cached_sq_head + to_submit) & ctx->sq_mask))
+		return false;
+
+	/*
+	 * Limit us to 2x the CQ ring size
+	 */
+	inflight = ctx->cached_sq_head -
+		  (ctx->cached_cq_tail + atomic_read(&ctx->cached_cq_overflow));
+	return inflight > 2 * ctx->cq_entries;
+}
+
 static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
 			  struct file *ring_file, int ring_fd,
 			  struct mm_struct **mm, bool async)
@@ -3031,8 +3031,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
 	int i, submitted = 0;
 	bool mm_fault = false;
 
-	if ((ctx->flags & IORING_SETUP_CQ_NODROP) &&
-	    !list_empty(&ctx->cq_overflow_list))
+	if (unlikely(io_sq_over_limit(ctx, nr)))
 		return -EBUSY;
 
 	if (nr > IO_PLUG_THRESHOLD) {

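As an aside, opting into the no-drop behaviour that the io_sq_over_limit()
check cooperates with would look something like the below from userspace.
IORING_SETUP_CQ_NODROP is the flag proposed earlier in this series, so it
only exists with those patches (and the matching uapi header) applied;
setup_nodrop_ring is just an illustrative name.

/*
 * Sketch only: set up a ring with the proposed CQ_NODROP flag so
 * completions are never dropped on CQ ring overflow.
 */
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static int setup_nodrop_ring(unsigned entries)
{
	struct io_uring_params p;

	memset(&p, 0, sizeof(p));
	p.flags = IORING_SETUP_CQ_NODROP;	/* from this patch series */

	/* returns the ring fd on success, -1 with errno set on failure */
	return syscall(__NR_io_uring_setup, entries, &p);
}
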
-- 
Jens Axboe

