Subject: Re: [PATCH 3/3] io_uring: add support for backlogged CQ ring
To: Pavel Begunkov, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, jannh@google.com
References: <20191107160043.31725-1-axboe@kernel.dk> <20191107160043.31725-4-axboe@kernel.dk>
From: Jens Axboe
Message-ID: <39b387c8-9440-124a-e491-5847f1d68d2c@kernel.dk>
Date: Sat, 9 Nov 2019 07:14:23 -0700
On 11/9/19 5:33 AM, Pavel Begunkov wrote:
> On 11/9/2019 3:25 PM, Pavel Begunkov wrote:
>> On 11/7/2019 7:00 PM, Jens Axboe wrote:
>>> Currently we drop completion events, if the CQ ring is full. That's fine
>>> for requests with bounded completion times, but it may make it harder to
>>> use io_uring with networked IO where request completion times are
>>> generally unbounded. Or with POLL, for example, which is also unbounded.
>>>
>>> This patch adds IORING_SETUP_CQ_NODROP, which changes the behavior a bit
>>> for CQ ring overflows. First of all, it doesn't overflow the ring, it
>>> simply stores a backlog of completions that we weren't able to put into
>>> the CQ ring. To prevent the backlog from growing indefinitely, if the
>>> backlog is non-empty, we apply back pressure on IO submissions. Any
>>> attempt to submit new IO with a non-empty backlog will get an -EBUSY
>>> return from the kernel. This is a signal to the application that it has
>>> backlogged CQ events, and that it must reap those before being allowed
>>> to submit more IO.
>>>
>>> Signed-off-by: Jens Axboe
>>> ---
>>>  fs/io_uring.c                 | 103 ++++++++++++++++++++++++++++------
>>>  include/uapi/linux/io_uring.h |   1 +
>>>  2 files changed, 87 insertions(+), 17 deletions(-)
>>>
>>> +static void io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
>>> +{
>>> +	struct io_rings *rings = ctx->rings;
>>> +	struct io_uring_cqe *cqe;
>>> +	struct io_kiocb *req;
>>> +	unsigned long flags;
>>> +	LIST_HEAD(list);
>>> +
>>> +	if (list_empty_careful(&ctx->cq_overflow_list))
>>> +		return;
>>> +	if (ctx->cached_cq_tail - READ_ONCE(rings->cq.head) ==
>>> +	    rings->cq_ring_entries)
>>> +		return;
>>> +
>>> +	spin_lock_irqsave(&ctx->completion_lock, flags);
>>> +
>>> +	while (!list_empty(&ctx->cq_overflow_list)) {
>>> +		cqe = io_get_cqring(ctx);
>>> +		if (!cqe && !force)
>>> +			break;
>>> +
>>> +		req = list_first_entry(&ctx->cq_overflow_list, struct io_kiocb,
>>> +						list);
>>> +		list_move(&req->list, &list);
>>> +		if (cqe) {
>>> +			WRITE_ONCE(cqe->user_data, req->user_data);
>>> +			WRITE_ONCE(cqe->res, req->result);
>>> +			WRITE_ONCE(cqe->flags, 0);
>>> +		}
>>
>> Hmm, second thought. We should account overflow here.
>>
> Clarification: We should account overflow in case of (!cqe).
>
> i.e.
> if (!cqe) { // else
> 	WRITE_ONCE(ctx->rings->cq_overflow,
> 		atomic_inc_return(&ctx->cached_cq_overflow));
> }

Ah yes, we should, even if this is only the flush path. I'll send out a
patch for that, unless you beat me to it.

-- 
Jens Axboe
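
For reference, a rough sketch of what the flush loop could look like with
that accounting folded in. This only illustrates the point made above,
reusing the names from the quoted patch (rings->cq_overflow,
ctx->cached_cq_overflow); the eventual follow-up patch may structure it
differently, and the locking around the loop is unchanged from the quoted
function.

	while (!list_empty(&ctx->cq_overflow_list)) {
		cqe = io_get_cqring(ctx);
		if (!cqe && !force)
			break;

		req = list_first_entry(&ctx->cq_overflow_list, struct io_kiocb,
						list);
		list_move(&req->list, &list);
		if (cqe) {
			WRITE_ONCE(cqe->user_data, req->user_data);
			WRITE_ONCE(cqe->res, req->result);
			WRITE_ONCE(cqe->flags, 0);
		} else {
			/*
			 * Forced flush with the CQ ring still full: the event
			 * is dropped after all, so account it in the overflow
			 * counter that userspace can see.
			 */
			WRITE_ONCE(ctx->rings->cq_overflow,
				   atomic_inc_return(&ctx->cached_cq_overflow));
		}
	}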
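
And for completeness, a hypothetical userspace sketch of the back pressure
contract described in the patch: a submit attempt while the kernel holds a
CQ backlog fails with -EBUSY, and the application reaps completions before
retrying. It assumes liburing plus the IORING_SETUP_CQ_NODROP flag from
this series; the flag availability and the trivial reap strategy are
assumptions for illustration, not a tested program.

/* cc -o nodrop nodrop.c -luring */
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <liburing.h>

static void reap_cqes(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;

	/* Drain whatever is currently visible in the CQ ring */
	while (io_uring_peek_cqe(ring, &cqe) == 0) {
		/* process cqe->user_data / cqe->res here */
		io_uring_cqe_seen(ring, cqe);
	}
}

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	int ret;

	/* CQ_NODROP: overflowed completions are backlogged, not dropped */
	ret = io_uring_queue_init(64, &ring, IORING_SETUP_CQ_NODROP);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %d\n", ret);
		return 1;
	}

	sqe = io_uring_get_sqe(&ring);
	if (!sqe)
		return 1;
	/* poll stdin: a request with an unbounded completion time */
	io_uring_prep_poll_add(sqe, 0, POLLIN);

	do {
		ret = io_uring_submit(&ring);
		if (ret == -EBUSY) {
			/*
			 * The kernel has backlogged CQ events: reap what is in
			 * the CQ ring so the backlog can be flushed, then retry
			 * the submission.
			 */
			reap_cqes(&ring);
		}
	} while (ret == -EBUSY);

	reap_cqes(&ring);
	io_uring_queue_exit(&ring);
	return ret < 0 ? 1 : 0;
}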