From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17312C433E0 for ; Mon, 3 Aug 2020 23:34:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 24E092076E for ; Mon, 3 Aug 2020 23:34:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1596497672; bh=OkPvGD7iZRzVLsK6v5U9kSphpv+Tnf8gG598ArmXRMQ=; h=References:In-Reply-To:From:Date:Subject:To:Cc:List-ID:From; b=AMmM6ihOC+v6rp9BP3KzmR91P6N+LnCQ9GolcC68ogIu2uUghEZ/1hgSt5rN/iIf/ +u3GqilfLhVU3NiYNBiHi4aXvuffHXdHUrNtOOA/D8FgvTkMhJcPekFmqNUmEd3dU0 dKZkRi+vLimeXD+U5rS7DxVxNZQXrCVWUVyR2YT8= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727959AbgHCXe3 (ORCPT ); Mon, 3 Aug 2020 19:34:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60736 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726757AbgHCXe3 (ORCPT ); Mon, 3 Aug 2020 19:34:29 -0400 Received: from mail-lj1-x241.google.com (mail-lj1-x241.google.com [IPv6:2a00:1450:4864:20::241]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C23F7C061756 for ; Mon, 3 Aug 2020 16:34:28 -0700 (PDT) Received: by mail-lj1-x241.google.com with SMTP id i10so6764583ljn.2 for ; Mon, 03 Aug 2020 16:34:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=v8TTFTWbfKbzJ4mcB7seLc7/y2CHIUC97FgV8OkOono=; b=F+DIMYJhv5KPMlGDUTa9T/vYKNhd4CXOo71c2kks4dabj/+B1XBvF+aGTQ/nTePvid pK1HKnl+/vF74yKZh3I7XgnZjUrsPJ0JL1paxKh7jyAk1/Nb/kpbRlC6gRQdGMLRxJrW iTMkXCCLr3T4gE/XyW10n4kVc9Vb0XMmu7ocs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=v8TTFTWbfKbzJ4mcB7seLc7/y2CHIUC97FgV8OkOono=; b=iEda7v+m2p+zaBk+qNPzyyW/tBZQR/4nEKCbVKensXBB//mqytKml4D7p9Al6jVSDn zDmL1PN7MaKu//DXEgro9QUjWxhWcKd8uvzMte7jbOP/HyOxxHXorC8F/gRzUYzXnNwX jmdZFElhfOp/Wt7xo2OyVuCuDakxvw0wS703HlklpBih8818fxl8RQeTZkCxC6TlUceF tzW0FtJ64nmKlhuDLiEStBHFf4cW/oJAkOasSO1MiwSq5KebofreYuJ6qScuIqS0JUC8 Oi7fOMhmLpRdqU/X/SAXF5tEG8V7UZom82tyRBy3UrQZP9OCzrJAOut5dOcaXXRXuWRM X6wQ== X-Gm-Message-State: AOAM5329U14xNlQ0HgrpcrwtQBlBavKJlcpIfW4xU9xkeEXzy9NyujK6 6vneMdwvZrwZEM3k2hVMHTGdPjMc4XM= X-Google-Smtp-Source: ABdhPJyNAVjUCP6GcK2LozzrvGGTBBuzqECkg2FTf7N+CmymkOvNwLJorXQgeW6jC0/3p6TQIV5DlQ== X-Received: by 2002:a05:651c:155:: with SMTP id c21mr9368530ljd.453.1596497666699; Mon, 03 Aug 2020 16:34:26 -0700 (PDT) Received: from mail-lf1-f46.google.com (mail-lf1-f46.google.com. [209.85.167.46]) by smtp.gmail.com with ESMTPSA id q8sm5401693lfm.92.2020.08.03.16.34.25 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 03 Aug 2020 16:34:26 -0700 (PDT) Received: by mail-lf1-f46.google.com with SMTP id i80so21350248lfi.13 for ; Mon, 03 Aug 2020 16:34:25 -0700 (PDT) X-Received: by 2002:a05:6512:3b7:: with SMTP id v23mr9734948lfp.10.1596497665541; Mon, 03 Aug 2020 16:34:25 -0700 (PDT) MIME-Version: 1.0 References: <50466810-9148-e245-7c1e-e7435b753582@kernel.dk> <56cb11b1-7943-086e-fb31-6564f4d4d089@kernel.dk> In-Reply-To: <56cb11b1-7943-086e-fb31-6564f4d4d089@kernel.dk> From: Linus Torvalds Date: Mon, 3 Aug 2020 16:34:09 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [GIT PULL] io_uring changes for 5.9-rc1 To: Jens Axboe Cc: io-uring , "linux-kernel@vger.kernel.org" Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Aug 3, 2020 at 4:18 PM Jens Axboe wrote: > > > I took a look at the rewrite you queued up, and made a matching change > on the io_uring side: Oh, no, you made it worse. Now you're tying your odd wakeup routine to entirely irrelevant things that can't even happen to you. That io_async_buf_func() will never be called for any entry that isn't your own, so testing wait->flags & WQ_FLAG_EXCLUSIVE is completely pointless, because you never set that flag. And similarly, for you to then do wait->flags |= WQ_FLAG_WOKEN; is equally pointless, because the only thing that cares and looks at that wait entry is you, and you don't care about the WOKEN flag. So that patch shows a fundamental misunderstanding of how the waitqueues actually work. Which is kind of my _point_. The io_uring code that hooked into the page wait queues really looks like complete cut-and-paste voodoo programming. It needs comments. It's hard to follow. Even somebody like me, who actually knows how the page wait queues really work, have a really hard time following how io_uring initializing a wait-queue entry and pointing to it in the io ctx then interacts with the (later) generic file reading path, and how it then calls back at unlock time to the io_uring callback _if_ the page was locked. And that patch you point to makes me 100% sure you don't quite understand the code either. So when you do /* * Only test the bit if it's an exclusive wait, as we know the * bit is cleared for non-exclusive waits. Also see mm/filemap.c */ if ((wait->flags & WQ_FLAG_EXCLUSIVE) && test_and_set_bit(key->bit_nr, &key->page->flags)) return -1; the first test guarantees that the second test is never done, which is good, because if it *had* been done, you'd have taken the lock and nothing you have actually expects that. So the fix is to just remove those lines entirely. If somebody unlocked the page you care about, and did a wakeup on that page and bit, then you know you should start the async worker. Noi amount of testing bits matters at all. And similarly, the wait->flags |= WQ_FLAG_WOKEN; is a no-op because nothing tests that WQ_FLAG_WOKEN bit. That wait entry is _your_ wait entry. It's not the wait entry of some normal page locker - those use wake_page_function(). Now *if* you had workers that actually expected to be woken up with the page lock already held, and owning it, then that kind of WQ_FLAG_EXCLUSIVE and WQ_FLAG_WOKEN logic would be a good idea. But that's not what you have. > and also queued a documentation patch for the retry logic and the > callback handler: > > https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=9541a9d4791c2d31ba74b92666edd3f1efd936a8 Better. Although I find the first comment a bit misleading. You say /* Invoked from our "page is now unlocked" handler when someone .. but that's not really the case. The function gets called by whoever unlocks the page after you've registered that page wait entry through lock_page_async(). So there's no "our handler" anywhere, which I find misleading and confusing in the comment. Linus