From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B634EC433F5 for ; Fri, 1 Apr 2022 08:40:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243483AbiDAIml (ORCPT ); Fri, 1 Apr 2022 04:42:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58336 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230512AbiDAImk (ORCPT ); Fri, 1 Apr 2022 04:42:40 -0400 Received: from mail-ed1-x533.google.com (mail-ed1-x533.google.com [IPv6:2a00:1450:4864:20::533]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6448025A4BD for ; Fri, 1 Apr 2022 01:40:48 -0700 (PDT) Received: by mail-ed1-x533.google.com with SMTP id h4so2119305edr.3 for ; Fri, 01 Apr 2022 01:40:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=szeredi.hu; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=/ODp466/ZjqwhHwnyoWxPttVU2UXfEYkQNMY70RST0c=; b=bzHaC5XoP+bPSKB6EmcWOIIiQKbqWRnKIP3NtSCb0hffxjaj8dod+/JQuoe2UVojBr Ihsn8Tl2VeHEG+LjPjB3z3c0Ke0h53uf8LJ1Z4Kw8Y4/OHKVc0OWp7u69ukbKK1cKtuM otCDLsxAkmtfmkVcP8MmyMBYs4SVq+MhdC2A4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=/ODp466/ZjqwhHwnyoWxPttVU2UXfEYkQNMY70RST0c=; b=TOe57BM4MwnXOlQoQLzglcJH3nPR7za7Fne7vCf8qpsHcUP6UeYUpyh2dZgYuVGYB4 HepmyHj2dgLwaIGGgdi5CFkOELS2Khq676vDuFU1E0b47slf+q8v7u4PzzEPuIecPhiu udVWnAxIBcny3dby97/MMEknMOPZZ9kVgHakSozEQASN7VAZ7TbBP5brBpSZdUWNMMjh 686fFw0UA6YBgWe46DU3MmYRA3Ax/oUNSXvjqEavhJReR20dsJf5P9h73UjIm2cYR6dV wNjlg7j4Z1PnmVYPAgRQSuTUI/37OvsoSqY3fjXW1+X4JHCbprDoqprS7AX9VXO3KzF7 3Zrg== X-Gm-Message-State: AOAM531YyPgkDjOnQW8lXzxkDHhg+eLoGKTPpo0i4Lp1oEhQ9bvqbC4U ln3777cREvWBKjepWVGqoLMxv9dUl6yFw17KcuGHuA== X-Google-Smtp-Source: ABdhPJydVxp7y2TOLY4NvudX+gcCH3h2oZaD5I2ZL5nfQtgFSdLNpRA0MrumqZZ1DX007ALSkZ1LvP9R79NLe+FqkkA= X-Received: by 2002:a50:d711:0:b0:410:a51a:77c5 with SMTP id t17-20020a50d711000000b00410a51a77c5mr19959999edi.154.1648802446994; Fri, 01 Apr 2022 01:40:46 -0700 (PDT) MIME-Version: 1.0 References: <115fc7d1-9b9c-712b-e75d-39b2041df437@kernel.dk> <89322bd1-5e6f-bcc6-7974-ffd22363a165@kernel.dk> <0c5745ab-5d3d-52c1-6a1d-e5e33d4078b5@kernel.dk> <52dca413-61b3-8ded-c4cc-dd6c8e8de1ed@kernel.dk> <23b62cca-8ec5-f250-e5a3-7e9ed983e190@kernel.dk> <77229971-72cd-7d78-d790-3ef4789acc9e@kernel.dk> <61c2336f-0315-5f76-3022-18c80f79e0b5@kernel.dk> <38436a44-5048-2062-c339-66679ae1e282@kernel.dk> In-Reply-To: <38436a44-5048-2062-c339-66679ae1e282@kernel.dk> From: Miklos Szeredi Date: Fri, 1 Apr 2022 10:40:35 +0200 Message-ID: Subject: Re: io_uring_prep_openat_direct() and link/drain To: Jens Axboe Cc: io-uring@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org On Wed, 30 Mar 2022 at 19:49, Jens Axboe wrote: > > On 3/30/22 9:53 AM, Jens Axboe wrote: > > On 3/30/22 9:17 AM, Jens Axboe wrote: > >> On 3/30/22 9:12 AM, Miklos Szeredi wrote: > >>> On Wed, 30 Mar 2022 at 17:05, Jens Axboe wrote: > >>>> > >>>> On 3/30/22 8:58 AM, Miklos Szeredi wrote: > >>>>> Next issue: seems like file slot reuse is not working correctly. > >>>>> Attached program compares reads using io_uring with plain reads of > >>>>> proc files. > >>>>> > >>>>> In the below example it is using two slots alternately but the number > >>>>> of slots does not seem to matter, read is apparently always using a > >>>>> stale file (the prior one to the most recent open on that slot). See > >>>>> how the sizes of the files lag by two lines: > >>>>> > >>>>> root@kvm:~# ./procreads > >>>>> procreads: /proc/1/stat: ok (313) > >>>>> procreads: /proc/2/stat: ok (149) > >>>>> procreads: /proc/3/stat: read size mismatch 313/150 > >>>>> procreads: /proc/4/stat: read size mismatch 149/154 > >>>>> procreads: /proc/5/stat: read size mismatch 150/161 > >>>>> procreads: /proc/6/stat: read size mismatch 154/171 > >>>>> ... > >>>>> > >>>>> Any ideas? > >>>> > >>>> Didn't look at your code yet, but with the current tree, this is the > >>>> behavior when a fixed file is used: > >>>> > >>>> At prep time, if the slot is valid it is used. If it isn't valid, > >>>> assignment is deferred until the request is issued. > >>>> > >>>> Which granted is a bit weird. It means that if you do: > >>>> > >>>> > >>>> > >>>> the read will read from fileA. But for: > >>>> > >>>> > >>>> > >>>> since slot 1 is already valid at prep time for the read, the read will > >>>> be from fileA again. > >>>> > >>>> Is this what you are seeing? It's definitely a bit confusing, and the > >>>> only reason why I didn't change it is because it could potentially break > >>>> applications. Don't think there's a high risk of that, however, so may > >>>> indeed be worth it to just bite the bullet and the assignment is > >>>> consistent (eg always done from the perspective of the previous > >>>> dependent request having completed). > >>>> > >>>> Is this what you are seeing? > >>> > >>> Right, this explains it. Then the only workaround would be to wait > >>> for the open to finish before submitting the read, but that would > >>> defeat the whole point of using io_uring for this purpose. > >> > >> Honestly, I think we should just change it during this round, making it > >> consistent with the "slot is unused" use case. The old use case is more > >> more of a "it happened to work" vs the newer consistent behavior of "we > >> always assign the file when execution starts on the request". > >> > >> Let me spin a patch, would be great if you could test. > > > > Something like this on top of the current tree should work. Can you > > test? > > You can also just re-pull for-5.18/io_uring, it has been updated. A last > minute edit make a 0 return from io_assign_file() which should've been > 'true'... Yep, this works now. Next issue: will get ENFILE even though there are just 40 slots. When running as root, then it will get as far as invoking the OOM killer, which is really bad. There's no leak, this apparently only happens when the worker doing the fputs can't keep up. Simple solution: do the fput() of the previous file synchronously with the open_direct operation; fput shouldn't be expensive... Is there a reason why this wouldn't work? Thanks, Miklos