From: Amir Goldstein <amir73il@gmail.com>
To: David Howells <dhowells@redhat.com>
Cc: lsf-pc@lists.linux-foundation.org,
Trond Myklebust <trond.myklebust@hammerspace.com>,
Anna Schumaker <anna.schumaker@netapp.com>,
Steve French <sfrench@samba.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Jeff Layton <jlayton@redhat.com>,
Miklos Szeredi <miklos@szeredi.hu>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] How to make disconnected operation work?
Date: Mon, 27 Jan 2020 21:18:26 +0200
Message-ID: <CAOQ4uxgQzOsa-34aTcL7Tm68uMxq1anyC7qL+dHq3+yXAceeEA@mail.gmail.com>
In-Reply-To: <1477632.1580142761@warthog.procyon.org.uk>
On Mon, Jan 27, 2020 at 6:32 PM David Howells <dhowells@redhat.com> wrote:
>
> Amir Goldstein <amir73il@gmail.com> wrote:
>
> > My thinking is: Can't we implement a stackable cachefs which interfaces
> > with fscache and whose API to the netfs is pure vfs APIs, just like
> > overlayfs interfaces with lower fs?
>
> In short, no - doing it purely with the VFS APIs that we have is not that simple
> (yes, Solaris does it with a stacking filesystem, and I don't know anything
> about the API details, but there must be an auxiliary API). You need to
[...]
>
> > As long as the netfs supports direct_IO() (all except afs do), the active
> > page cache could be that of the stackable cachefs, and network IO would
> > always be direct from/to cachefs pages.
>
> What about objects that don't support DIO? Directories, symbolic links and
> automount points? All of these things are cacheable objects with AFS.
>
direct_IO is about not duplicating the page cache, so it is not relevant for
those objects. I guess that for directories, symlinks and automount points
the invalidation callbacks are what matters.
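To make the data-path idea concrete, here is a minimal sketch of what the
read side might look like - only the stacked cachefs populates a page cache,
and the netfs is read via direct IO. cachefs_readpage() and
cachefs_lower_file() are made-up names, loosely modeled on how overlayfs
forwards IO to its lower file:

static int cachefs_readpage(struct file *file, struct page *page)
{
	struct bio_vec bv = { .bv_page = page, .bv_len = PAGE_SIZE };
	struct iov_iter iter;
	loff_t pos = page_offset(page);
	ssize_t ret;

	iov_iter_bvec(&iter, READ, &bv, 1, PAGE_SIZE);

	/*
	 * cachefs_lower_file() is hypothetical: it returns the netfs
	 * file, which was opened O_DIRECT so the netfs never builds a
	 * second copy of the page cache.
	 */
	ret = vfs_iter_read(cachefs_lower_file(file), &iter, &pos, 0);
	if (ret < 0) {
		unlock_page(page);
		return ret;
	}

	/* Short reads / zero-fill past EOF elided for brevity. */
	SetPageUptodate(page);
	unlock_page(page);
	return 0;
}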
> And speaking of automount points - how would you deal with those beyond simply
> caching the contents? Create a new stacked instance over it? How do you see
> the automount point itself?
>
I haven't gotten that far yet ;-)
> I see that the NFS FH encoder doesn't handle automount points.
>
> > If netfs supports export_operations (all except afs do), then indexing
> > the cache objects could be done in a generic manner using fsid and
> > file handle, just like overlayfs index feature works today.
>
> FSID isn't unique and doesn't exist for all filesystems. Two NFS servers, for
> example, can give you the same FSID, but referring to different things. AFS
> has a textual cell name and a volume ID that you need to combine; it doesn't
> have an FSID.
>
> This may work for overlayfs as the FSID can be confined to a particular
> overlay. However, that's not what we're dealing with. We would be talking
> about an index that potentially covers *all* the mounted netfs.
>
> Also, from your description, that sounds like a bug in overlayfs. If the
> overlain NFS tree does a referral to a different server, you no longer have a
> unique FSID or a unique FH within that FSID, so your index is broken.
>
I misspoke. Overlayfs uses s_uuid for the index, not fsid.
If s_uuid is null or the fs has no export ops, then the index cannot be used.
So yes, auto-indexing the netfs's objects is a challenge.
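For reference, a hedged sketch of the kind of key overlayfs's index feature
effectively builds - struct cachefs_key and cachefs_make_key() are made-up
names; exportfs_encode_fh() is the real helper, and it only works if the
netfs implements export_operations:

struct cachefs_key {			/* hypothetical */
	uuid_t	uuid;			/* lower sb->s_uuid */
	int	fh_type;
	int	fh_len;
	u32	fh[MAX_HANDLE_SZ / 4];
};

static int cachefs_make_key(struct dentry *lower, struct cachefs_key *key)
{
	int dwords = ARRAY_SIZE(key->fh);
	int fh_type;

	/* A null uuid would make the key ambiguous across filesystems. */
	if (uuid_is_null(&lower->d_sb->s_uuid))
		return -ENOKEY;

	fh_type = exportfs_encode_fh(lower, (struct fid *)key->fh,
				     &dwords, 0);
	if (fh_type < 0)
		return fh_type;
	if (fh_type == FILEID_INVALID)
		return -EOVERFLOW;

	key->uuid = lower->d_sb->s_uuid;
	key->fh_type = fh_type;
	key->fh_len = dwords << 2;
	return 0;
}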
> > Would it not be a maintenance win if all (or most of) the fscache logic
> > was yanked out of all the specific netfs's?
>
> Actually, it may not help enormously with disconnected operation. A certain
> amount of the logic probably has to be implemented in the netfs as each netfs
> provides different facilities for managing this.
>
> Yes, it gets some of the I/O stuff out - but I want to move some of that down
> into the VM if I can and librarifying the rest should take care of that.
>
> > Can you think of reasons why the stackable cachefs model cannot work
> > or why it is inferior to the current fscache integration model with netfs's?
>
> Yes. It's a lot more operationally expensive and it's harder to use. The
> cache driver would also have to get a lot bigger, but that would be
> reasonable.
>
> Firstly, the expense: you have to double up all the inodes and dentries that
> are in use - and that's not counting the resources used inside the cache
> itself.
Good point.
>
> Secondly, the administration: I'm assuming you're suggesting the way I think
> Solaris does it, where you have to make two mounts: first you mount the
> netfs and then you mount the cache over it. It's much simpler if you just
> need to make the netfs mount and then that goes and uses the cache if it's
> available - it's also simple to bring the cache online after the fact,
> meaning you can even have caching applied retroactively to a root filesystem.
>
All of the above holds if you mount the stacked cachefs to begin with;
you can still add/remove the caches later.
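i.e. the admin model would be roughly the two-step mount below. Only the nfs
mount is real; the "cachefs" type and its options are of course made up here:

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
	/* Step 1 (real): mount the netfs. */
	if (mount("server:/export", "/mnt/work", "nfs", 0, "vers=4.2"))
		perror("mount nfs");

	/*
	 * Step 2 (hypothetical): stack the cache over the same mount
	 * point, Solaris cachefs style.
	 */
	if (mount("/mnt/work", "/mnt/work", "cachefs", 0,
		  "cachedir=/var/cache/cachefs"))
		perror("mount cachefs");
	return 0;
}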
> You also have the issue of what happens if someone bind-mounts the netfs mount
> and mounts the cache over only one of the views. Now you have a coherency
> management problem that the cache cannot see. It's only visible to the netfs,
> but the netfs doesn't know about the cache.
>
The shotgun for shooting yourself in the foot, you mean - yep.
> There's also file locking. Overlayfs doesn't support file locking that I can
> see, but NFS, AFS and CIFS all do.
>
Not sure which locks you mean - flock and leases do work on overlayfs AFAIK.
But yes, every one of those things is a challenge with a stacked fs, though
overlayfs has already made a lot of progress.
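AFAIK this trivial userspace check would pass on an overlay mount (the path
is a placeholder for any file on an overlayfs mount):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
	/* Placeholder path: any file on an overlayfs mount. */
	int fd = open("/mnt/overlay/foo", O_RDWR | O_CREAT, 0644);

	if (fd < 0 || flock(fd, LOCK_EX))
		perror("flock");
	/* A write lease needs no other opens of the file to succeed. */
	else if (fcntl(fd, F_SETLEASE, F_WRLCK))
		perror("F_SETLEASE");

	if (fd >= 0)
		close(fd);
	return 0;
}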
>
> Anyway, you might be able to guess that I'm really against using stackable
> filesystems for things like this and like UID shifting. I think it adds more
> expense and complexity than it's necessarily worth.
>
Yes, I figured as much :)
> I was more inclined to go with unionfs than overlayfs and do the filesystem
> union in the VFS as it ought to be cheaper if you're using it (whereas
> overlayfs is cheaper if you're not).
>
I guess competition is good.
Anyway, I am brewing a topic about filesystem APIs for Hierarchical
Storage Managers, such as https://vfsforgit.org/.
There are similarities between the requirements for HSM and for
disconnected operation of a netfs - you might even say they are not
two different things. So we may want to bring them up together in the
same session or in two adjacent sessions - we'll see.
Thanks,
Amir.