References: <1bb71cbf-0a10-34c7-409d-914058e102f6@virtuozzo.com>
 <20200922210445.GG57620@redhat.com>
From: Amir Goldstein
Date: Sat, 29 May 2021 19:05:03 +0300
Subject: Re: virtiofs uuid and file handles
To: Miklos Szeredi
Cc: Vivek Goyal, overlayfs, linux-fsdevel, Max Reitz
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi wrote:
>
> On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein wrote:
> >
> > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi wrote:
> > >
> > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein wrote:
> > >
> > > > I think that the proper way to implement reliable persistent file
> > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to
> > > > FUSE protocol and allow the server to handle this.
> > >
> > > Max Reitz (Cc-d) is currently looking into this.
> > >
> > > One proposal was to add LOOKUP_HANDLE operation that is similar to
> > > LOOKUP except it takes a {variable length handle, name} as input and
> > > returns a variable length handle *and* a u64 node_id that can be used
> > > normally for all other operations.
> > >

Miklos, Max,

Any updates on LOOKUP_HANDLE work?

> > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > based fs) would be that userspace need not keep a refcounted object
> > > around until the kernel sends a FORGET, but can prune its node ID
> > > based cache at any time. If that happens and a request from the
> > > client (kernel) comes in with a stale node ID, the server will return
> > > -ESTALE and the client can ask for a new node ID with a special
> > > lookup_handle(fh, NULL).
> > >
> > > Disadvantages being:
> > >
> > >  - cost of generating a file handle on all lookups
> >
> > I never ran into a local fs implementation where this was expensive.
> >
> > >  - cost of storing file handle in kernel icache
> > >
> > > I don't think either of those are problematic in the virtiofs case.
> > > The cost of having to keep fds open while the client has them in its
> > > cache is much higher.
> > >
> >
> > Sounds good.
> > I suppose flock() does need to keep the open fd on server.
>
> Open files are a separate issue and do need an active object in the server.
>
> The issue this solves is synchronizing "released" and "evicted"
> states of objects between server and client. I.e. when a file is
> closed (and no more open files exist referencing the same object) the
> dentry refcount goes to zero but it remains in the cache. In this
> state the server could really evict its own cached object, but can't
> because the client can gain an active reference at any time via
> cached path lookup.
>
> One other solution would be for the server to send a notification
> (NOTIFY_EVICT) that would try to clean out the object from the server
> cache and respond with a FORGET if successful. But I sort of like
> the file handle one better, since it solves multiple problems.
>

Even with LOOKUP_HANDLE, I am struggling to understand how we intend to
invalidate all fuse dentries referring to ino X in case the server replies
with reused ino X with a different generation than the one stored in
fuse inode cache.

This is an issue that I encountered when running the passthrough_hp test
on my filesystem. In tst_readdir_big() for example, underlying files are
being unlinked and new files created reusing the old inode numbers.

This creates a situation where the server gets a lookup request
for file B that uses the reused inode number X, while old file A is
still in fuse dentry cache using the older generation of real inode
number X which is still in fuse inode cache.

Now the server knows that the real inode has been reused, because
the server caches the old generation value, but it cannot reply to
the lookup request before the old fuse inode has been invalidated.
IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?).
We would also need to change fuse_dentry_revalidate() to
detect the case of reused/invalidated inode.

The straightforward way I can think of is to store inode generation
in fuse_dentry. It won't even grow the size of the struct.

Am I overcomplicating this?

Thanks,
Amir.
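
To make the generation idea above concrete, here is a minimal userspace
sketch, not kernel code. It assumes one possible reading of the proposal:
each dentry snapshots the inode generation it was instantiated with, and
revalidation fails once the cached inode has been marked as reused with a
different generation. All names (toy_inode, toy_dentry and the toy_*
helpers) are hypothetical stand-ins, not the real fuse_dentry or
fuse_dentry_revalidate().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached fuse inode: the real inode number
 * plus the generation last reported by the server. */
struct toy_inode {
	uint64_t ino;
	uint32_t generation;
};

/* Hypothetical stand-in for fuse_dentry, extended with a snapshot of the
 * inode generation seen when the dentry was instantiated. */
struct toy_dentry {
	const struct toy_inode *inode;
	uint32_t generation;
};

/* Bind a dentry to an inode and remember the generation seen at lookup. */
static inline void toy_dentry_instantiate(struct toy_dentry *d,
					  const struct toy_inode *inode)
{
	assert(d != NULL && inode != NULL);
	d->inode = inode;
	d->generation = inode->generation;
}

/* Model the server reporting that the inode number was reused: the cached
 * inode now carries the new generation (assumed invalidation behaviour). */
static inline void toy_inode_mark_reused(struct toy_inode *inode,
					 uint32_t new_generation)
{
	assert(inode != NULL);
	inode->generation = new_generation;
}

/* Return true if the dentry still refers to the generation it was created
 * for, false if it must be dropped and looked up again. */
static inline bool toy_dentry_revalidate(const struct toy_dentry *d)
{
	assert(d != NULL);
	return d->inode != NULL && d->generation == d->inode->generation;
}
```

In this model, a dentry for file A instantiated while ino X is at
generation 1 passes revalidation until X is marked as reused with
generation 2. After that it fails, while a dentry for file B instantiated
against the updated inode passes. Whether the real check belongs in
fuse_dentry_revalidate() in this form is exactly the open question above.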