From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH RFC] io_uring: limit inflight IO
From: Jens Axboe
To: io-uring@vger.kernel.org
Message-ID: <33958a03-6714-ce51-9856-1dcbf53f67d5@kernel.dk>
Date: Thu, 7 Nov 2019 17:19:58 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: io-uring-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

On 11/7/19 4:21 PM, Jens Axboe wrote:
> I'd like some feedback on this one. Even with the overflow backpressure
> patch, we still have a potentially large gap where applications can
> submit IO before we get any dropped events in the CQ ring. This is
> especially true if the execution time of those requests is long
> (unbounded).
>
> This adds IORING_SETUP_INFLIGHT, which, if set, will return -EBUSY if we
> have more IO pending than we can feasibly support. This is normally the
> CQ ring size, but if IORING_SETUP_CQ_NODROP is enabled, then it's twice
> the CQ ring size.
>
> This helps manage the pending queue size instead of letting it grow
> indefinitely.
>
> Note that we could potentially just make this the default behavior -
> applications need to handle -EAGAIN returns already, in case we run out
> of memory, and if we change this to return -EAGAIN as well, then it
> doesn't introduce any new failure cases. I'm tempted to do that...
>
> Anyway, comments solicited!

After a little deliberation, I think we should go with the variant that
doesn't require users to opt in. As mentioned, let's return -EAGAIN
rather than introduce a new errno for this; the two essentially mean
the same thing anyway.
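For illustration, here's a rough userspace sketch of the retry loop an
application would need; it is not part of the patch, and it assumes a
liburing-style API (io_uring_queue_init(), io_uring_get_sqe(),
io_uring_prep_nop(), io_uring_submit(), io_uring_wait_cqe(),
io_uring_cqe_seen()). The idea is simply: if submission fails with
-EAGAIN (or -EBUSY with the opt-in flag), reap at least one completion
and try again.

/*
 * Hypothetical example, not part of the patch: queue a batch of NOP
 * requests and, whenever submission fails because too much IO is
 * already inflight, reap one completion before retrying.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <liburing.h>

static int submit_with_backoff(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;
	int ret;

	for (;;) {
		ret = io_uring_submit(ring);
		if (ret >= 0)
			return ret;
		if (ret != -EAGAIN && ret != -EBUSY)
			return ret;
		/* Too much inflight IO: wait for a completion, then retry. */
		ret = io_uring_wait_cqe(ring, &cqe);
		if (ret < 0)
			return ret;
		io_uring_cqe_seen(ring, cqe);
	}
}

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	int i, ret;

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %s\n", strerror(-ret));
		return 1;
	}

	/* NOPs stand in for real IO; any request type behaves the same. */
	for (i = 0; i < 8; i++) {
		sqe = io_uring_get_sqe(&ring);
		if (!sqe)
			break;
		io_uring_prep_nop(sqe);
	}

	ret = submit_with_backoff(&ring);
	if (ret < 0)
		fprintf(stderr, "submit: %s\n", strerror(-ret));

	io_uring_queue_exit(&ring);
	return ret < 0;
}

With the -EAGAIN change below, this needs no new setup flag; applications
that already handle -EAGAIN from running out of memory keep working
unchanged.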
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f8344f95817e..4c488bf6e889 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -203,6 +203,7 @@ struct io_ring_ctx {
 		unsigned		sq_mask;
 		unsigned		sq_thread_idle;
 		unsigned		cached_sq_dropped;
+		atomic_t		cached_cq_overflow;
 		struct io_uring_sqe	*sq_sqes;
 
 		struct list_head	defer_list;
@@ -221,13 +222,12 @@ struct io_ring_ctx {
 
 	struct {
 		unsigned		cached_cq_tail;
-		atomic_t		cached_cq_overflow;
 		unsigned		cq_entries;
 		unsigned		cq_mask;
+		atomic_t		cq_timeouts;
 		struct wait_queue_head	cq_wait;
 		struct fasync_struct	*cq_fasync;
 		struct eventfd_ctx	*cq_ev_fd;
-		atomic_t		cq_timeouts;
 	} ____cacheline_aligned_in_smp;
 
 	struct io_rings	*rings;
@@ -705,16 +705,39 @@ static void io_cqring_add_event(struct io_kiocb *req, long res)
 	io_cqring_ev_posted(ctx);
 }
 
+static bool io_req_over_limit(struct io_ring_ctx *ctx)
+{
+	unsigned limit, inflight;
+
+	/*
+	 * This doesn't need to be super precise, so only check every once
+	 * in a while.
+	 */
+	if (ctx->cached_sq_head & ctx->sq_mask)
+		return false;
+
+	if (ctx->flags & IORING_SETUP_CQ_NODROP)
+		limit = 2 * ctx->cq_entries;
+	else
+		limit = ctx->cq_entries;
+
+	inflight = ctx->cached_sq_head -
+			(ctx->cached_cq_tail + atomic_read(&ctx->cached_cq_overflow));
+	return inflight >= limit;
+}
+
 static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
-				   struct io_submit_state *state)
+				   struct io_submit_state *state, bool force)
 {
 	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
 	struct io_kiocb *req;
 
 	if (!percpu_ref_tryget(&ctx->refs))
-		return NULL;
+		return ERR_PTR(-ENXIO);
 
 	if (!state) {
+		if (unlikely(!force && io_req_over_limit(ctx)))
+			goto out;
 		req = kmem_cache_alloc(req_cachep, gfp);
 		if (unlikely(!req))
 			goto out;
@@ -722,6 +745,8 @@ static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
 		size_t sz;
 		int ret;
 
+		if (unlikely(!force && io_req_over_limit(ctx)))
+			goto out;
 		sz = min_t(size_t, state->ios_left, ARRAY_SIZE(state->reqs));
 		ret = kmem_cache_alloc_bulk(req_cachep, gfp, sz, state->reqs);
@@ -754,7 +779,7 @@ static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
 	return req;
 out:
 	percpu_ref_put(&ctx->refs);
-	return NULL;
+	return ERR_PTR(-EAGAIN);
 }
 
 static void io_free_req_many(struct io_ring_ctx *ctx, void **reqs, int *nr)
@@ -2963,10 +2988,11 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
 		struct io_kiocb *req;
 		unsigned int sqe_flags;
 
-		req = io_get_req(ctx, statep);
-		if (unlikely(!req)) {
+		req = io_get_req(ctx, statep, false);
+		if (unlikely(IS_ERR(req))) {
 			if (!submitted)
-				submitted = -EAGAIN;
+				submitted = PTR_ERR(req);
+			req = NULL;
 			break;
 		}
 		if (!io_get_sqring(ctx, &req->submit)) {
@@ -2986,9 +3012,11 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
 
 		if (link && (sqe_flags & IOSQE_IO_DRAIN)) {
 			if (!shadow_req) {
-				shadow_req = io_get_req(ctx, NULL);
-				if (unlikely(!shadow_req))
+				shadow_req = io_get_req(ctx, NULL, true);
+				if (unlikely(IS_ERR(shadow_req))) {
+					shadow_req = NULL;
 					goto out;
+				}
 				shadow_req->flags |= (REQ_F_IO_DRAIN | REQ_F_SHADOW_DRAIN);
 				refcount_dec(&shadow_req->refs);
 			}

-- 
Jens Axboe