Subject: Re: [PATCH v2 0/2] Optimise io_uring completion waiting
To: Pavel Begunkov, Ingo Molnar
Cc: Ingo Molnar, Peter Zijlstra, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
From: Jens Axboe
Date: Tue, 24 Sep 2019 12:11:43 +0200

On 9/24/19 3:33 AM, Pavel Begunkov wrote:
>
>
> On 24/09/2019 11:36, Jens Axboe wrote:
>> On 9/24/19 2:27 AM, Jens Axboe wrote:
>>> On 9/24/19 2:02 AM, Jens Axboe wrote:
>>>> On 9/24/19 1:06 AM, Pavel Begunkov wrote:
>>>>> On 24/09/2019 02:00, Jens Axboe wrote:
>>>>>>> I think we can do the same thing, just wrapping the waitqueue in a
>>>>>>> structure with a count in it, on the stack. Got some flight time
>>>>>>> coming up later today, let me try and cook up a patch.
>>>>>>
>>>>>> Totally untested, and sent out 5 min before departure... But something
>>>>>> like this.
>>>>> Hmm, reminds me my first version. Basically that's the same thing but
>>>>> with macroses inlined. I wanted to make it reusable and self-contained,
>>>>> though.
>>>>>
>>>>> If you don't think it could be useful in other places, sure, we could do
>>>>> something like that. Is that so?
>>>>
>>>> I totally agree it could be useful in other places. Maybe formalized and
>>>> used with wake_up_nr() instead of adding a new primitive? Haven't looked
>>>> into that, I may be talking nonsense.
>>>>
>>>> In any case, I did get a chance to test it and it works for me. Here's
>>>> the "finished" version, slightly cleaned up and with a comment added
>>>> for good measure.
>>>
>>> Notes:
>>>
>>> This version gets the ordering right, you need exclusive waits to get
>>> fifo ordering on the waitqueue.
>>>
>>> Both versions (yours and mine) suffer from the problem of potentially
>>> waking too many. I don't think this is a real issue, as generally we
>>> don't do threaded access to the io_urings. But if you had the following
>>> tasks wait on the cqring:
>>>
>>> [min_events = 32], [min_events = 8], [min_events = 8]
>>>
>>> and we reach the io_cqring_events() == threshold, we'll wake all three.
>>> I don't see a good solution to this, so I suspect we just live with
>>> until proven an issue. Both versions are much better than what we have
>>> now.
>>
>> Forgot an issue around signal handling, version below adds the
>> right check for that too.
>
> It seems to be a good reason to not keep reimplementing
> "prepare_to_wait*() + wait loop" every time, but keep it in sched :)

I think if we do the ->private cleanup that Peter mentioned, then
there's not much left in terms of consolidation. Not convinced the case
is interesting enough to warrant a special helper. If others show up,
it's easy enough to consolidate the use cases and unify them.

If you look at wake_up_nr(), I would have thought that would be more
widespread. But it really isn't.

>> Curious what your test case was for this?
> You mean a performance test case? It's briefly described in a comment
> for the second patch. That's just rewritten io_uring-bench, with
> 1. a thread generating 1 request per call in a loop
> 2. and the second thread waiting for ~128 events.
> Both are pinned to the same core.

Gotcha, thanks.

-- 
Jens Axboe
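
For illustration only, the test case described above (one thread posting a single completion per call, a second thread waiting for about 128 events) can be modelled with a toy counter. This is a sketch and not code from either patch. It assumes that a plain wait loop wakes the waiter on every posted completion, while a threshold-style wait wakes it only once enough events are available. The names `count_wakeups` and `use_threshold` are hypothetical.

```python
def count_wakeups(total_events: int, min_events: int, use_threshold: bool) -> int:
    """Count how often a single waiter is woken while completions arrive one at a time.

    Toy model only: no real threads, wait queues, or io_uring calls are involved.
    """
    wakeups = 0
    completed = 0
    for _ in range(total_events):
        completed += 1  # the producer posts one completion per call
        if use_threshold:
            # Threshold-style wait: wake only once enough events are available.
            if completed >= min_events:
                wakeups += 1
                break  # the waiter is satisfied and stops waiting
        else:
            # Plain wait loop (assumed behaviour): every completion wakes the
            # waiter, which rechecks its condition and sleeps again if unmet.
            wakeups += 1
            if completed >= min_events:
                break
    return wakeups


if __name__ == "__main__":
    print("plain wait loop:", count_wakeups(128, 128, use_threshold=False))
    print("threshold wait: ", count_wakeups(128, 128, use_threshold=True))
```

In this model the plain loop wakes the waiter 128 times and the threshold variant once. The model does not measure real cost, since context switches and same-core pinning are not represented.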