From: Vivek Goyal
To: Amir Goldstein
Cc: Miklos Szeredi, linux-fsdevel, Hanna Reitz
Subject: Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles)
Date: Mon, 12 Sep 2022 15:56:47 -0400
References: <20200922210445.GG57620@redhat.com>

On Mon, Sep 12, 2022 at 06:07:42PM +0300, Amir Goldstein wrote:
> On Mon, Sep 12, 2022 at 5:35 PM Vivek Goyal wrote:
> >
> > On Mon, Sep 12, 2022 at 04:38:48PM +0300, Amir Goldstein wrote:
> > > On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal wrote:
> > > >
> > > > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote:
> > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi wrote:
> > > > > >
> > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to
> > > > > > LOOKUP except it takes a {variable length handle, name} as input and
> > > > > > returns a variable length handle *and* a u64 node_id that can be used
> > > > > > normally for all other operations.
> > > > > >
> > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > > > > based fs) would be that userspace need not keep a refcounted object
> > > > > > around until the kernel sends a FORGET, but can prune its node ID
> > > > > > based cache at any time. If that happens and a request from the
> > > > > > client (kernel) comes in with a stale node ID, the server will return
> > > > > > -ESTALE and the client can ask for a new node ID with a special
> > > > > > lookup_handle(fh, NULL).
> > > > > >
> > > > > > Disadvantages being:
> > > > > >
> > > > > > - cost of generating a file handle on all lookups
> > > > > > - cost of storing file handle in kernel icache
> > > > > >
> > > > > > I don't think either of those are problematic in the virtiofs case.
> > > > > > The cost of having to keep fds open while the client has them in its
> > > > > > cache is much higher.
> > > > > >
> > > > >
> > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic
> > > > > implementation of persistent file handles for FUSE.
> > > >
> > > > Hi Amir,
> > > >
> > > > I was going through the proposal above for LOOKUP_HANDLE and I was
> > > > wondering how nodeid reuse is handled.
> > >
> > > LOOKUP_HANDLE extends the 64bit node id to be a variable size id.
> >
> > Ok. So this variable size id is basically the file handle returned by
> > the host?
> >
> > So this looks a little different from what Miklos had suggested. IIUC,
> > he wanted LOOKUP_HANDLE to return both a file handle as well as a *node id*.
> >
> > *********************************
> > One proposal was to add LOOKUP_HANDLE operation that is similar to
> > LOOKUP except it takes a {variable length handle, name} as input and
> > returns a variable length handle *and* a u64 node_id that can be used
> > normally for all other operations.
> > ***************************************
> >
>
> Ha! Thanks for reminding me about that.
> It's been a while since I looked at what actually needs to be done.
> That means that evicting server inodes from cache may not be as
> easy as I had imagined.
>
> > > A server that declares support for LOOKUP_HANDLE must never
> > > reuse a handle.
> > >
> > > That's the basic idea. Just as a filesystem that declares to support
> > > exportfs must never reuse a file handle.
> > >
> > > > IOW, if server decides to drop
> > > > nodeid from its cache and reuse it for some other file, how will we
> > > > differentiate between the two? Some sort of generation id encoded in
> > > > the nodeid?
> > > >
> > >
> > > That's usually the way that file handles are implemented in
> > > local fs. The inode number is the internal lookup index and the
> > > generation part is advanced on reuse.
> > >
> > > But for a passthrough fs like virtiofsd, LOOKUP_HANDLE will
> > > just use the native fs file handles, so virtiofsd can evict the inode
> > > entry from its cache completely, not only close the open fds.
> >
> > Ok, got it. It will be interesting to see how the kernel fuse changes look
> > to accommodate this variable sized nodeid.
> >
>
> It may make sense to have a FUSE protocol dialect where nodeid
> is variable size for all commands, but it probably won't be part of
> the initial LOOKUP_HANDLE work.
>
> > > That is what my libfuse_passthough POC does.
> >
> > Where have you hosted corresponding kernel changes?
> >
>
> There are no kernel changes.
>
> For xfs and ext4 I know how to implement open_by_ino()
> and I know how to parse the opaque fs file handle to extract
> ino+generation from it and return them in FUSE_LOOKUP
> response.

Aha, interesting. So this is filesystem specific. It works on xfs/ext4 but
not necessarily on other filesystems like NFS (because they have their own
way of encoding things in the file handle).

>
> So I could manage to implement persistent NFS file handles
> over the existing FUSE protocol with 64bit node id.

And that explains why you did not have to make kernel changes.

But this does not allow the server to close the fd associated with the
nodeid? Or is there a way for the server to generate a file handle and
then call open_by_handle_at()? (Trying to sketch what I mean in the
P.S. below.)

Thanks
Vivek
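
P.S. Trying to make sure I am reading the ext4 part right. Is the idea
roughly the following? Untested sketch on my side; FILEID_INO32_GEN is
copied from the kernel's exportfs.h since it is not exported to
userspace, and the helper name is made up.

#define _GNU_SOURCE
#include <fcntl.h>      /* struct file_handle */
#include <stdint.h>
#include <string.h>

#define FILEID_INO32_GEN 1      /* mirrors include/linux/exportfs.h */

/*
 * For filesystems that use the generic FILEID_INO32_GEN encoding (ext4
 * does), the opaque handle body is just { u32 ino; u32 gen; }, so the
 * server can recover both without keeping an fd around.  xfs has its
 * own handle types, so it would need different parsing there.
 */
static int handle_to_ino_gen(const struct file_handle *fh,
                             uint32_t *ino, uint32_t *gen)
{
        uint32_t raw[2];

        if (fh->handle_type != FILEID_INO32_GEN ||
            fh->handle_bytes < sizeof(raw))
                return -1;      /* not the simple ino32+gen layout */

        memcpy(raw, fh->f_handle, sizeof(raw));
        *ino = raw[0];
        *gen = raw[1];
        return 0;
}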
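
And for the fd question itself, on the server side I was imagining
something along these lines (again untested; "lo_inode" is a made up
stand-in for the server's inode object, mount_fd is any fd on the
exported filesystem, and open_by_handle_at() needs CAP_DAC_READ_SEARCH,
which virtiofsd should already have):

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

struct lo_inode {                       /* hypothetical server inode */
        struct file_handle *fh;         /* stored instead of an open fd */
};

/* At FUSE_LOOKUP time: encode a handle, keep no fd for the inode. */
static struct file_handle *encode_handle(int dirfd, const char *name)
{
        struct file_handle probe = { .handle_bytes = 0 };
        struct file_handle *fh;
        int mount_id;

        /* First call only reports the required size via EOVERFLOW. */
        if (name_to_handle_at(dirfd, name, &probe, &mount_id, 0) != -1 ||
            errno != EOVERFLOW)
                return NULL;

        fh = malloc(sizeof(*fh) + probe.handle_bytes);
        if (!fh)
                return NULL;
        fh->handle_bytes = probe.handle_bytes;
        if (name_to_handle_at(dirfd, name, fh, &mount_id, 0) == -1) {
                free(fh);
                return NULL;
        }
        return fh;
}

/* On a later request: get a usable fd back only when it is needed. */
static int reopen_inode(int mount_fd, const struct lo_inode *inode)
{
        /* Fails with ESTALE if the inode is gone, which seems to map
         * naturally to the -ESTALE the client expects. */
        return open_by_handle_at(mount_fd, inode->fh, O_RDONLY);
}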