From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F33CEC433E1 for ; Mon, 3 Aug 2020 23:43:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0B38C206F6 for ; Mon, 3 Aug 2020 23:43:46 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="JZwkyV/3" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729309AbgHCXno (ORCPT ); Mon, 3 Aug 2020 19:43:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34008 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728941AbgHCXno (ORCPT ); Mon, 3 Aug 2020 19:43:44 -0400 Received: from mail-pl1-x641.google.com (mail-pl1-x641.google.com [IPv6:2607:f8b0:4864:20::641]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 65735C061756 for ; Mon, 3 Aug 2020 16:43:44 -0700 (PDT) Received: by mail-pl1-x641.google.com with SMTP id g19so9283683plq.0 for ; Mon, 03 Aug 2020 16:43:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=wTUV0IlBas5oY3OHR4PGqhPRpHlIYrzKntHQwyyC4pg=; b=JZwkyV/3Eehf9b+PMt5bu/laXByrKXwimmNuxX/xGhhZtg2tbSGS/YWaeMibIfmacK URdiACIdYLA/D54h+t5KqvoiEWtQcmbxfm8jJT1JRADgZD0H9VsIE5deUKlFwT8C+i/k ZodCKD6CU1WqIs12YAbRP2tcheRreCm6H5PaeCKcI5gF0l//JlTaZNm3Muwbgx81uxH2 eXW/XmFLc6sT2LgYKuChxNaNQp9JkiuN/dfDvigt5NDHG0hJLDNHcN6mfp2CkyoThb48 d4UO/4dNIxh2E+BKVuSBMURBDTC9aRwpgd2KtHdZ1/tbzXLoLwgJ8VqK9N7Q9PsRIvF3 QQfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=wTUV0IlBas5oY3OHR4PGqhPRpHlIYrzKntHQwyyC4pg=; b=HOFCzUJKQYVbRpaetih9KlVc0NoKJehi+Uz3ngfcg2O1T+kaf2gZqnx0zWFZKirdtp c59+R+JsclVqTj33mypPi1UjiXHt6zBJoAvTyG5lwSoweodBwQsgUicCBfvyk6ndoMTm Voc6dvMK+yGygNNG5RrldkHwUsahwId+rDitJWWZZgcB7EdDRzrYRJCuCNpI5WJ+h41X GxIyTICAP9AMK1IcBX93zSgmm7xLAdf09sDay10MC/c32O3maEs4S47AZr+w3DCP8PyP aDgXidmRM37vgslYLDxmiKRzzUkdncMc6/izLV3XcrCT0uyOBEr50Rtss3fzaVWTUMC2 9RxQ== X-Gm-Message-State: AOAM530L6EjKkM5WkAQ2mIiEmFqbjRck84Dvso7xBJ8nue+zdolWIC3Y 8MZbHkq4SbVs7uCZE+91YeXZaL7EOjM= X-Google-Smtp-Source: ABdhPJz4TXBq0r8YZ1GaCInBFqCj9mQJXkGirDbrQ/Gxaf7fypSbh8x+8m4vz5SckNI47yunXJSYTA== X-Received: by 2002:a17:902:b40f:: with SMTP id x15mr16327490plr.329.1596498223585; Mon, 03 Aug 2020 16:43:43 -0700 (PDT) Received: from [192.168.1.182] ([66.219.217.173]) by smtp.gmail.com with ESMTPSA id n13sm536262pjb.20.2020.08.03.16.43.42 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 03 Aug 2020 16:43:43 -0700 (PDT) Subject: Re: [GIT PULL] io_uring changes for 5.9-rc1 To: Linus Torvalds Cc: io-uring , "linux-kernel@vger.kernel.org" References: <50466810-9148-e245-7c1e-e7435b753582@kernel.dk> <56cb11b1-7943-086e-fb31-6564f4d4d089@kernel.dk> From: Jens Axboe Message-ID: Date: Mon, 3 Aug 2020 17:43:42 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 8/3/20 5:34 PM, Linus Torvalds wrote: > On Mon, Aug 3, 2020 at 4:18 PM Jens Axboe wrote: >> >> >> I took a look at the rewrite you queued up, and made a matching change >> on the io_uring side: > > Oh, no, you made it worse. > > Now you're tying your odd wakeup routine to entirely irrelevant things > that can't even happen to you. > > That io_async_buf_func() will never be called for any entry that isn't > your own, so testing > > wait->flags & WQ_FLAG_EXCLUSIVE > > is completely pointless, because you never set that flag. And > similarly, for you to then do > > wait->flags |= WQ_FLAG_WOKEN; > > is equally pointless, because the only thing that cares and looks at > that wait entry is you, and you don't care about the WOKEN flag. > > So that patch shows a fundamental misunderstanding of how the > waitqueues actually work. > > Which is kind of my _point_. The io_uring code that hooked into the > page wait queues really looks like complete cut-and-paste voodoo > programming. > > It needs comments. It's hard to follow. Even somebody like me, who > actually knows how the page wait queues really work, have a really > hard time following how io_uring initializing a wait-queue entry and > pointing to it in the io ctx then interacts with the (later) generic > file reading path, and how it then calls back at unlock time to the > io_uring callback _if_ the page was locked. > > And that patch you point to makes me 100% sure you don't quite > understand the code either. > > So when you do > > /* > * Only test the bit if it's an exclusive wait, as we know the > * bit is cleared for non-exclusive waits. Also see mm/filemap.c > */ > if ((wait->flags & WQ_FLAG_EXCLUSIVE) && > test_and_set_bit(key->bit_nr, &key->page->flags)) > return -1; > > the first test guarantees that the second test is never done, which is > good, because if it *had* been done, you'd have taken the lock and > nothing you have actually expects that. > > So the fix is to just remove those lines entirely. If somebody > unlocked the page you care about, and did a wakeup on that page and > bit, then you know you should start the async worker. Noi amount of > testing bits matters at all. > > And similarly, the > > wait->flags |= WQ_FLAG_WOKEN; > > is a no-op because nothing tests that WQ_FLAG_WOKEN bit. That wait > entry is _your_ wait entry. It's not the wait entry of some normal > page locker - those use wake_page_function(). > > Now *if* you had workers that actually expected to be woken up with > the page lock already held, and owning it, then that kind of > WQ_FLAG_EXCLUSIVE and WQ_FLAG_WOKEN logic would be a good idea. But > that's not what you have. Yes, looks like I was a bit too trigger happy without grokking the whole thing, and got it mixed up with the broader more generic waitqueue cases. Thanks for clueing me in, I've updated the patch so the use case is inline with only what io_uring is doing here. >> and also queued a documentation patch for the retry logic and the >> callback handler: >> >> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=9541a9d4791c2d31ba74b92666edd3f1efd936a8 > > Better. Although I find the first comment a bit misleading. > > You say > > /* Invoked from our "page is now unlocked" handler when someone .. > > but that's not really the case. The function gets called by whoever > unlocks the page after you've registered that page wait entry through > lock_page_async(). > > So there's no "our handler" anywhere, which I find misleading and > confusing in the comment. The 'handler' refers to the io_uring waitqueue callback, but I should probably spell that out. I'll adjust it. -- Jens Axboe