linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [LSF/MM/BPF TOPIC] How to make disconnected operation work?
@ 2019-12-09 14:46 David Howells
  2019-12-09 17:33 ` [Lsf-pc] " Amir Goldstein
  2019-12-09 23:14 ` Jeff Layton
  0 siblings, 2 replies; 7+ messages in thread
From: David Howells @ 2019-12-09 14:46 UTC (permalink / raw)
  To: lsf-pc, Trond Myklebust, Anna Schumaker, Steve French
  Cc: dhowells, jlayton, linux-fsdevel

I've been rewriting fscache and cachefiles to massively simplify it and make
use of the kiocb interface to do direct-I/O to/from the netfs's pages which
didn't exist when I first did this.

	https://lore.kernel.org/lkml/24942.1573667720@warthog.procyon.org.uk/
	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter

I'm getting towards the point where it's working and able to do basic caching
once again.  So now I've been thinking about what it'd take to support
disconnected operation.  Here's a list of things that I think need to be
considered or dealt with:

 (1) Making sure the working set is present in the cache.

     - Userspace (find/cat/tar)
     - Splice netfs -> cache
     - Metadata storage (e.g. directories)
     - Permissions caching

 (2) Making sure the working set doesn't get culled.

     - Pinning API (cachectl() syscall?)
     - Allow culling to be disabled entirely on a cache
     - Per-fs/per-dir config

 (3) Switching into/out of disconnected mode.

     - Manual, automatic
     - On what granularity?
       - Entirety of fs (eg. all nfs)
       - By logical unit (server, volume, cell, share)

 (4) Local changes in disconnected mode.

     - Journal
     - File identifier allocation
     - statx flag to indicate provisional nature of info
     - New error codes
	- EDISCONNECTED - Op not available in disconnected mode
	- EDISCONDATA - Data not available in disconnected mode
	- EDISCONPERM - Permission cannot be checked in disconnected mode
	- EDISCONFULL - Disconnected mode cache full
     - SIGIO support?

 (5) Reconnection.

     - Proactive or JIT synchronisation
       - Authentication
     - Conflict detection and resolution
	 - ECONFLICTED - Disconnected mode resolution failed
     - Journal replay
     - Directory 'diffing' to find remote deletions
     - Symlink and other non-regular file comparison

 (6) Conflict resolution.

     - Automatic where possible
       - Just create/remove new non-regular files if possible
       - How to handle permission differences?
     - How to let userspace access conflicts?
       - Move local copy to 'lost+found'-like directory
       	 - Might not have been completely downloaded
       - New open() flags?
       	 - O_SERVER_VARIANT, O_CLIENT_VARIANT, O_RESOLVED_VARIANT
       - fcntl() to switch variants?

 (7) GUI integration.

     - Entering/exiting disconnected mode notification/switches.
     - Resolution required notification.
     - Cache getting full notification.

Can anyone think of any more considerations?  What do you think of the
proposed error codes and open flags?  Is that the best way to do this?

David


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] How to make disconnected operation work?
  2019-12-09 14:46 [LSF/MM/BPF TOPIC] How to make disconnected operation work? David Howells
@ 2019-12-09 17:33 ` Amir Goldstein
  2020-01-24 14:13   ` Amir Goldstein
  2020-01-27 16:32   ` David Howells
  2019-12-09 23:14 ` Jeff Layton
  1 sibling, 2 replies; 7+ messages in thread
From: Amir Goldstein @ 2019-12-09 17:33 UTC (permalink / raw)
  To: David Howells
  Cc: lsf-pc, Trond Myklebust, Anna Schumaker, Steve French,
	linux-fsdevel, Jeff Layton, Miklos Szeredi

On Mon, Dec 9, 2019 at 4:47 PM David Howells <dhowells@redhat.com> wrote:
>
> I've been rewriting fscache and cachefiles to massively simplify it and make
> use of the kiocb interface to do direct-I/O to/from the netfs's pages which
> didn't exist when I first did this.
>
>         https://lore.kernel.org/lkml/24942.1573667720@warthog.procyon.org.uk/
>         https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
>
> I'm getting towards the point where it's working and able to do basic caching
> once again.  So now I've been thinking about what it'd take to support
> disconnected operation.  Here's a list of things that I think need to be
> considered or dealt with:
>
>  (1) Making sure the working set is present in the cache.
>
>      - Userspace (find/cat/tar)
>      - Splice netfs -> cache
>      - Metadata storage (e.g. directories)
>      - Permissions caching
>
>  (2) Making sure the working set doesn't get culled.
>
>      - Pinning API (cachectl() syscall?)
>      - Allow culling to be disabled entirely on a cache
>      - Per-fs/per-dir config
>
>  (3) Switching into/out of disconnected mode.
>
>      - Manual, automatic
>      - On what granularity?
>        - Entirety of fs (eg. all nfs)
>        - By logical unit (server, volume, cell, share)
>
>  (4) Local changes in disconnected mode.
>
>      - Journal
>      - File identifier allocation
>      - statx flag to indicate provisional nature of info
>      - New error codes
>         - EDISCONNECTED - Op not available in disconnected mode
>         - EDISCONDATA - Data not available in disconnected mode
>         - EDISCONPERM - Permission cannot be checked in disconnected mode
>         - EDISCONFULL - Disconnected mode cache full
>      - SIGIO support?
>
>  (5) Reconnection.
>
>      - Proactive or JIT synchronisation
>        - Authentication
>      - Conflict detection and resolution
>          - ECONFLICTED - Disconnected mode resolution failed
>      - Journal replay
>      - Directory 'diffing' to find remote deletions
>      - Symlink and other non-regular file comparison
>
>  (6) Conflict resolution.
>
>      - Automatic where possible
>        - Just create/remove new non-regular files if possible
>        - How to handle permission differences?
>      - How to let userspace access conflicts?
>        - Move local copy to 'lost+found'-like directory
>          - Might not have been completely downloaded
>        - New open() flags?
>          - O_SERVER_VARIANT, O_CLIENT_VARIANT, O_RESOLVED_VARIANT
>        - fcntl() to switch variants?
>
>  (7) GUI integration.
>
>      - Entering/exiting disconnected mode notification/switches.
>      - Resolution required notification.
>      - Cache getting full notification.
>
> Can anyone think of any more considerations?  What do you think of the
> proposed error codes and open flags?  Is that the best way to do this?
>

Hi David,

I am very interested in this topic.
I can share (some) information from experience with a "Caching Gateway"
implementation in userspace shipped in products of my employer, CTERA.

I have come across several attempts to implement a network fs cache
using overlayfs. I don't remember by whom, but they were asking
questions on overlayfs list about online modification to lower layer.

It is not so far fetched, as you get many of the requirements for metadata
caching out-of-the-box, especially with recent addition of metacopy feature.
Also, if you consider the plans to implement overlayfs page cache [1][2],
then at least the read side of fscache sounds like it has some things in
common with overlayfs.

Anyway, you should know plenty about overlayfs to say if you think
there is any room for collaboration between the two projects.

Thanks,
Amir.

[1] https://marc.info/?l=linux-unionfs&m=154995746503505&w=2
[2] https://github.com/amir73il/linux/commits/ovl-aops-wip

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM/BPF TOPIC] How to make disconnected operation work?
  2019-12-09 14:46 [LSF/MM/BPF TOPIC] How to make disconnected operation work? David Howells
  2019-12-09 17:33 ` [Lsf-pc] " Amir Goldstein
@ 2019-12-09 23:14 ` Jeff Layton
  2020-03-06  7:11   ` Steven French
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2019-12-09 23:14 UTC (permalink / raw)
  To: David Howells, lsf-pc, Trond Myklebust, Anna Schumaker, Steve French
  Cc: linux-fsdevel

On Mon, 2019-12-09 at 14:46 +0000, David Howells wrote:
> I've been rewriting fscache and cachefiles to massively simplify it and make
> use of the kiocb interface to do direct-I/O to/from the netfs's pages which
> didn't exist when I first did this.
> 
> 	https://lore.kernel.org/lkml/24942.1573667720@warthog.procyon.org.uk/
> 	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
> 
> I'm getting towards the point where it's working and able to do basic caching
> once again.  So now I've been thinking about what it'd take to support
> disconnected operation.  Here's a list of things that I think need to be
> considered or dealt with:
> 

I'm quite interested in this too. I see that you've already given a lot
of thought to potential interfaces here. I think we'll end up having to
add a fair number of new interfaces to make something like this work.

>  (1) Making sure the working set is present in the cache.
> 
>      - Userspace (find/cat/tar)
>      - Splice netfs -> cache
>      - Metadata storage (e.g. directories)
>      - Permissions caching
> 
>  (2) Making sure the working set doesn't get culled.
> 
>      - Pinning API (cachectl() syscall?)
>      - Allow culling to be disabled entirely on a cache
>      - Per-fs/per-dir config
> 
>  (3) Switching into/out of disconnected mode.
> 
>      - Manual, automatic
>      - On what granularity?
>        - Entirety of fs (eg. all nfs)
>        - By logical unit (server, volume, cell, share)
>
>  (4) Local changes in disconnected mode.
> 
>      - Journal
>      - File identifier allocation

Yep, necessary if you want to allow disconnected creates. By coincidence
I'm working an (experimental) patchset now to add async create support
to kcephfs, and part of that involves delegating out ranges of inode
numbers. I may have some experience to report with it by the time LSF
rolls around.

>      - statx flag to indicate provisional nature of info
>      - New error codes
> 	- EDISCONNECTED - Op not available in disconnected mode
> 	- EDISCONDATA - Data not available in disconnected mode
> 	- EDISCONPERM - Permission cannot be checked in disconnected mode
> 	- EDISCONFULL - Disconnected mode cache full
>      - SIGIO support?
> 
>  (5) Reconnection.
> 
>      - Proactive or JIT synchronisation
>        - Authentication
>      - Conflict detection and resolution
> 	 - ECONFLICTED - Disconnected mode resolution failed

ECONFLICTED sort of implies that reconnection will be manual. If it
happens automagically in the background you'll have no way to report
such errors.

Also, you'll need some mechanism to know what inodes are conflicted.
This is the real difficult part of this problem, IMO.


>      - Journal replay
>      - Directory 'diffing' to find remote deletions
>      - Symlink and other non-regular file comparison
> 
>  (6) Conflict resolution.
> 
>      - Automatic where possible
>        - Just create/remove new non-regular files if possible
>        - How to handle permission differences?
>      - How to let userspace access conflicts?
>        - Move local copy to 'lost+found'-like directory
>        	 - Might not have been completely downloaded
>        - New open() flags?
>        	 - O_SERVER_VARIANT, O_CLIENT_VARIANT, O_RESOLVED_VARIANT
>        - fcntl() to switch variants?
> 

Again, conflict resolution is the difficult part. Maybe the right
solution is to look at snapshotting-style interfaces -- i.e., handle a
disconnected mount sort of like you would a writable snapshot. Do any
(local) fs' currently offer writable snapshots, btw?

>  (7) GUI integration.
> 
>      - Entering/exiting disconnected mode notification/switches.
>      - Resolution required notification.
>      - Cache getting full notification.
> 
> Can anyone think of any more considerations?  What do you think of the
> proposed error codes and open flags?  Is that the best way to do this?
> 
> David
> 

-- 
Jeff Layton <jlayton@redhat.com>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] How to make disconnected operation work?
  2019-12-09 17:33 ` [Lsf-pc] " Amir Goldstein
@ 2020-01-24 14:13   ` Amir Goldstein
  2020-01-27 16:32   ` David Howells
  1 sibling, 0 replies; 7+ messages in thread
From: Amir Goldstein @ 2020-01-24 14:13 UTC (permalink / raw)
  To: David Howells
  Cc: lsf-pc, Trond Myklebust, Anna Schumaker, Steve French,
	linux-fsdevel, Jeff Layton, Miklos Szeredi,
	Linux NFS Mailing List

On Mon, Dec 9, 2019 at 7:33 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Mon, Dec 9, 2019 at 4:47 PM David Howells <dhowells@redhat.com> wrote:
> >
> > I've been rewriting fscache and cachefiles to massively simplify it and make
> > use of the kiocb interface to do direct-I/O to/from the netfs's pages which
> > didn't exist when I first did this.
> >
> >         https://lore.kernel.org/lkml/24942.1573667720@warthog.procyon.org.uk/
> >         https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
> >
> > I'm getting towards the point where it's working and able to do basic caching
> > once again.  So now I've been thinking about what it'd take to support
> > disconnected operation.  Here's a list of things that I think need to be
> > considered or dealt with:
> >
> >  (1) Making sure the working set is present in the cache.
> >
> >      - Userspace (find/cat/tar)
> >      - Splice netfs -> cache
> >      - Metadata storage (e.g. directories)
> >      - Permissions caching
> >
> >  (2) Making sure the working set doesn't get culled.
> >
> >      - Pinning API (cachectl() syscall?)
> >      - Allow culling to be disabled entirely on a cache
> >      - Per-fs/per-dir config
> >
> >  (3) Switching into/out of disconnected mode.
> >
> >      - Manual, automatic
> >      - On what granularity?
> >        - Entirety of fs (eg. all nfs)
> >        - By logical unit (server, volume, cell, share)
> >
> >  (4) Local changes in disconnected mode.
> >
> >      - Journal
> >      - File identifier allocation
> >      - statx flag to indicate provisional nature of info
> >      - New error codes
> >         - EDISCONNECTED - Op not available in disconnected mode
> >         - EDISCONDATA - Data not available in disconnected mode
> >         - EDISCONPERM - Permission cannot be checked in disconnected mode
> >         - EDISCONFULL - Disconnected mode cache full
> >      - SIGIO support?
> >
> >  (5) Reconnection.
> >
> >      - Proactive or JIT synchronisation
> >        - Authentication
> >      - Conflict detection and resolution
> >          - ECONFLICTED - Disconnected mode resolution failed
> >      - Journal replay
> >      - Directory 'diffing' to find remote deletions
> >      - Symlink and other non-regular file comparison
> >
> >  (6) Conflict resolution.
> >
> >      - Automatic where possible
> >        - Just create/remove new non-regular files if possible
> >        - How to handle permission differences?
> >      - How to let userspace access conflicts?
> >        - Move local copy to 'lost+found'-like directory
> >          - Might not have been completely downloaded
> >        - New open() flags?
> >          - O_SERVER_VARIANT, O_CLIENT_VARIANT, O_RESOLVED_VARIANT
> >        - fcntl() to switch variants?
> >
> >  (7) GUI integration.
> >
> >      - Entering/exiting disconnected mode notification/switches.
> >      - Resolution required notification.
> >      - Cache getting full notification.
> >
> > Can anyone think of any more considerations?  What do you think of the
> > proposed error codes and open flags?  Is that the best way to do this?
> >
>
> Hi David,
>
> I am very interested in this topic.
> I can share (some) information from experience with a "Caching Gateway"
> implementation in userspace shipped in products of my employer, CTERA.
>
> I have come across several attempts to implement a network fs cache
> using overlayfs. I don't remember by whom, but they were asking
> questions on overlayfs list about online modification to lower layer.
>
> It is not so far fetched, as you get many of the requirements for metadata
> caching out-of-the-box, especially with recent addition of metacopy feature.
> Also, if you consider the plans to implement overlayfs page cache [1][2],
> then at least the read side of fscache sounds like it has some things in
> common with overlayfs.
>
> Anyway, you should know plenty about overlayfs to say if you think
> there is any room for collaboration between the two projects.
>
>
> [1] https://marc.info/?l=linux-unionfs&m=154995746503505&w=2
> [2] https://github.com/amir73il/linux/commits/ovl-aops-wip

David,

I have been reading through the fscache APIs and tried to answer this
(maybe stupid) question:

Why does every netfs need to implement fscache support on its own?
fscache support as it is today is extremely intrusive to filesystem code
and your re-write doesn't make it any less intrusive.

My thinking is: Can't we implement a stackable cachefs which interfaces
with fscache and whose API to the netfs is pure vfs APIs, just like
overlayfs interfaces with lower fs?

The only fscache API I could find that really needs to be called from
netfs code is fscache_invalidate() and many of those calls are invoked
from vfs ops anyway, so maybe they could also be hoisted to this cachefs.

As long as netfs supports direct_IO() (all except afs do) then the active page
cache could be that of the stackable cachefs and network IO is always
direct from/to cachefs pages.

If netfs supports export_operations (all except afs do), then indexing
the cache objects could be done in a generic manner using fsid and
file handle, just like overlayfs index feature works today.

Would it not be a maintenance win if all (or most of) the fscache logic
was yanked out of all the specific netfs's?

Can you think of reasons why the stackable cachefs model cannot work
or why it is inferior to the current fscache integration model with netfs's?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] How to make disconnected operation work?
  2019-12-09 17:33 ` [Lsf-pc] " Amir Goldstein
  2020-01-24 14:13   ` Amir Goldstein
@ 2020-01-27 16:32   ` David Howells
  2020-01-27 19:18     ` Amir Goldstein
  1 sibling, 1 reply; 7+ messages in thread
From: David Howells @ 2020-01-27 16:32 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: dhowells, lsf-pc, Trond Myklebust, Anna Schumaker, Steve French,
	linux-fsdevel, Jeff Layton, Miklos Szeredi,
	Linux NFS Mailing List

Amir Goldstein <amir73il@gmail.com> wrote:

> My thinking is: Can't we implement a stackable cachefs which interfaces
> with fscache and whose API to the netfs is pure vfs APIs, just like
> overlayfs interfaces with lower fs?

In short, no - doing it with pure the VFS APIs that we have is not that simple
(yes, Solaris does it with a stacking filesystem, and I don't know anything
about the API details, but there must be an auxiliary API).  You need to
handle:

 (1) Remote invalidation.  The netfs needs to tell the cache layer
     asynchronously about remote modifications - where the modification can
     modify not just file content but also directory structure, and even file
     data invalidation may be partial.

 (2) Unique file group matching.  The info required to match a group of files
     (e.g. an NFS server, an AFS volume, a CIFS share) is not necessarily
     available through the VFS API - I'm not sure even the export API makes
     this available since it's built on the assumption that it's exporting
     local files.

 (3) File matching.  The info required to match a file to the cache is not
     necessarily available through the VFS API.  NFS has file handles, for
     example; the YFS variant of AFS has 96-bit 'inode numbers'.  (This might
     be done with the export API - it that's counted so).  Further, the file
     identifier may not be unique outside the file group.

 (4) Coherency management.  The netfs must tell the cache whether or not the
     data contained in the cache is valid.  This information is not
     necessarily available through the VFS APIs (NFS change IDs, AFS data
     version, AFS volume sync info).  It's also highly filesystem specific.

It might also have security implications for netfs's that handle their own
security (such as AFS does), but that might fall out naturally.

> As long as netfs supports direct_IO() (all except afs do) then the active page
> cache could be that of the stackable cachefs and network IO is always
> direct from/to cachefs pages.

What about objects that don't support DIO?  Directories, symbolic links and
automount points?  All of these things are cacheable objects with AFS.

And speaking of automount points - how would you deal with those beyond simply
caching the contents?  Create a new stacked instance over it?  How do you see
the automount point itself?

I see that the NFS FH encoder doesn't handle automount points.

> If netfs supports export_operations (all except afs do), then indexing
> the cache objects could be done in a generic manner using fsid and
> file handle, just like overlayfs index feature works today.

FSID isn't unique and doesn't exist for all filesystems.  Two NFS servers, for
example, can give you the same FSID, but referring to different things.  AFS
has a textual cell name and a volume ID that you need to combine; it doesn't
have an FSID.

This may work for overlayfs as the FSID can be confined to a particular
overlay.  However, that's not what we're dealing with.  We would be talking
about an index that potentially covers *all* the mounted netfs.

Also, from your description that sounds like a bug in overlayfs.  If the
overlain NFS tree does a referral to a different server, you no longer have a
unique FSID or a unique FH within that FSID so your index is broken.

> Would it not be a maintenance win if all (or most of) the fscache logic
> was yanked out of all the specific netfs's?

Actually, it may not help enormously with disconnected operation.  A certain
amount of the logic probably has to be implemented in the netfs as each netfs
provides different facilities for managing this.

Yes, it gets some of the I/O stuff out - but I want to move some of that down
into the VM if I can and librarifying the rest should take care of that.

> Can you think of reasons why the stackable cachefs model cannot work
> or why it is inferior to the current fscache integration model with netfs's?

Yes.  It's a lot more operationally expensive and it's harder to use.  The
cache driver would also have to get a lot bigger, but that would be
reasonable.

Firstly, the expense: you have to double up all the inodes and dentries that
are in use - and that's not counting the resources used inside the cache
itself.

Secondly, the administration: I'm assuming you're suggesting the way I think
Solaris does it and that you have to make two mounts: firstly you mount the
netfs and then you mount the cache over it.  It's much simpler if you just
need make the netfs mount only and then that goes and uses the cache if it's
available - it's also simple to bring the cache online after the fact meaning
you can even cache applied retroactively to a root filesystem.

You also have the issue of what happens if someone bind-mounts the netfs mount
and mounts the cache over only one of the views.  Now you have a coherency
management problem that the cache cannot see.  It's only visible to the netfs,
but the netfs doesn't know about the cache.

There's also file locking.  Overlayfs doesn't support file locking that I can
see, but NFS, AFS and CIFS all do.


Anyway, you might be able to guess that I'm really against using stackable
filesystems for things like this and like UID shifting.  I think it adds more
expense and complexity than it's necessarily worth.

I was more inclined to go with unionfs than overlayfs and do the filesystem
union in the VFS as it ought to be cheaper if you're using it (whereas
overlayfs is cheaper if you're not).

One final thing - even if we did want to switch to an stacked approach, we
might still have to maintain the current way as people use it.

David


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] How to make disconnected operation work?
  2020-01-27 16:32   ` David Howells
@ 2020-01-27 19:18     ` Amir Goldstein
  0 siblings, 0 replies; 7+ messages in thread
From: Amir Goldstein @ 2020-01-27 19:18 UTC (permalink / raw)
  To: David Howells
  Cc: lsf-pc, Trond Myklebust, Anna Schumaker, Steve French,
	linux-fsdevel, Jeff Layton, Miklos Szeredi,
	Linux NFS Mailing List

On Mon, Jan 27, 2020 at 6:32 PM David Howells <dhowells@redhat.com> wrote:
>
> Amir Goldstein <amir73il@gmail.com> wrote:
>
> > My thinking is: Can't we implement a stackable cachefs which interfaces
> > with fscache and whose API to the netfs is pure vfs APIs, just like
> > overlayfs interfaces with lower fs?
>
> In short, no - doing it with pure the VFS APIs that we have is not that simple
> (yes, Solaris does it with a stacking filesystem, and I don't know anything
> about the API details, but there must be an auxiliary API).  You need to
[...]
>
> > As long as netfs supports direct_IO() (all except afs do) then the active page
> > cache could be that of the stackable cachefs and network IO is always
> > direct from/to cachefs pages.
>
> What about objects that don't support DIO?  Directories, symbolic links and
> automount points?  All of these things are cacheable objects with AFS.
>

direct_IO is for not duplicating page cache.
Not relevant for those objects.
I guess that for those objects the invalidation callbacks is what matters.

> And speaking of automount points - how would you deal with those beyond simply
> caching the contents?  Create a new stacked instance over it?  How do you see
> the automount point itself?
>

I didn't get this far ;-)

> I see that the NFS FH encoder doesn't handle automount points.
>
> > If netfs supports export_operations (all except afs do), then indexing
> > the cache objects could be done in a generic manner using fsid and
> > file handle, just like overlayfs index feature works today.
>
> FSID isn't unique and doesn't exist for all filesystems.  Two NFS servers, for
> example, can give you the same FSID, but referring to different things.  AFS
> has a textual cell name and a volume ID that you need to combine; it doesn't
> have an FSID.
>
> This may work for overlayfs as the FSID can be confined to a particular
> overlay.  However, that's not what we're dealing with.  We would be talking
> about an index that potentially covers *all* the mounted netfs.
>
> Also, from your description that sounds like a bug in overlayfs.  If the
> overlain NFS tree does a referral to a different server, you no longer have a
> unique FSID or a unique FH within that FSID so your index is broken.
>

I misspoke. Overlayfs uses s_uuid for index not fsid.
If s_uuid is null or no export ops, then index cannot be used.
So yeh, it's a challenge to auto index the netfs' objects.

> > Would it not be a maintenance win if all (or most of) the fscache logic
> > was yanked out of all the specific netfs's?
>
> Actually, it may not help enormously with disconnected operation.  A certain
> amount of the logic probably has to be implemented in the netfs as each netfs
> provides different facilities for managing this.
>
> Yes, it gets some of the I/O stuff out - but I want to move some of that down
> into the VM if I can and librarifying the rest should take care of that.
>
> > Can you think of reasons why the stackable cachefs model cannot work
> > or why it is inferior to the current fscache integration model with netfs's?
>
> Yes.  It's a lot more operationally expensive and it's harder to use.  The
> cache driver would also have to get a lot bigger, but that would be
> reasonable.
>
> Firstly, the expense: you have to double up all the inodes and dentries that
> are in use - and that's not counting the resources used inside the cache
> itself.

Good point.

>
> Secondly, the administration: I'm assuming you're suggesting the way I think
> Solaris does it and that you have to make two mounts: firstly you mount the
> netfs and then you mount the cache over it.  It's much simpler if you just
> need make the netfs mount only and then that goes and uses the cache if it's
> available - it's also simple to bring the cache online after the fact meaning
> you can even cache applied retroactively to a root filesystem.
>

All of the above is true is you mount the stacked cachefs to begin with.
You can add/remove the caches later.

> You also have the issue of what happens if someone bind-mounts the netfs mount
> and mounts the cache over only one of the views.  Now you have a coherency
> management problem that the cache cannot see.  It's only visible to the netfs,
> but the netfs doesn't know about the cache.
>

The shotgun to shoot the foot you mean - yap.

> There's also file locking.  Overlayfs doesn't support file locking that I can
> see, but NFS, AFS and CIFS all do.
>

Not sure which locks you mean. flock and leases do work on overlayfs AFAIK.
But yes, every one of those things is a challenge with stacked fs, but overlayfs
has already made a lot of progress.

>
> Anyway, you might be able to guess that I'm really against using stackable
> filesystems for things like this and like UID shifting.  I think it adds more
> expense and complexity than it's necessarily worth.
>

Yes, I figured as much :)

> I was more inclined to go with unionfs than overlayfs and do the filesystem
> union in the VFS as it ought to be cheaper if you're using it (whereas
> overlayfs is cheaper if you're not).
>

I guess competition is good.

Anyway, I am brewing a topic about filesystem APIs for
Hierarchic Storage Managers, such as https://vfsforgit.org/.
There are similarities between the requirements for HSM and for
disconnected operations for netfs - you might even say they are
not two different things. So we may want to bring them up together
in the same session or two adjacent sessions - we'll see.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM/BPF TOPIC] How to make disconnected operation work?
  2019-12-09 23:14 ` Jeff Layton
@ 2020-03-06  7:11   ` Steven French
  0 siblings, 0 replies; 7+ messages in thread
From: Steven French @ 2020-03-06  7:11 UTC (permalink / raw)
  To: Jeff Layton, David Howells, lsf-pc, Trond Myklebust, Anna Schumaker
  Cc: linux-fsdevel

As discussed in hallway discussions at Linux Storage Conference - this 
would make a good topic for LSF/MM

On 12/9/19 5:14 PM, Jeff Layton wrote:
> On Mon, 2019-12-09 at 14:46 +0000, David Howells wrote:
>> I've been rewriting fscache and cachefiles to massively simplify it and make
>> use of the kiocb interface to do direct-I/O to/from the netfs's pages which
>> didn't exist when I first did this.
>>
>> 	https://lore.kernel.org/lkml/24942.1573667720@warthog.procyon.org.uk/
>> 	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
>>
>> I'm getting towards the point where it's working and able to do basic caching
>> once again.  So now I've been thinking about what it'd take to support
>> disconnected operation.  Here's a list of things that I think need to be
>> considered or dealt with:
>>
> I'm quite interested in this too. I see that you've already given a lot
> of thought to potential interfaces here. I think we'll end up having to
> add a fair number of new interfaces to make something like this work.
>
>>   (1) Making sure the working set is present in the cache.
>>
>>       - Userspace (find/cat/tar)
>>       - Splice netfs -> cache
>>       - Metadata storage (e.g. directories)
>>       - Permissions caching
>>
>>   (2) Making sure the working set doesn't get culled.
>>
>>       - Pinning API (cachectl() syscall?)
>>       - Allow culling to be disabled entirely on a cache
>>       - Per-fs/per-dir config
>>
>>   (3) Switching into/out of disconnected mode.
>>
>>       - Manual, automatic
>>       - On what granularity?
>>         - Entirety of fs (eg. all nfs)
>>         - By logical unit (server, volume, cell, share)
>>
>>   (4) Local changes in disconnected mode.
>>
>>       - Journal
>>       - File identifier allocation
> Yep, necessary if you want to allow disconnected creates. By coincidence
> I'm working an (experimental) patchset now to add async create support
> to kcephfs, and part of that involves delegating out ranges of inode
> numbers. I may have some experience to report with it by the time LSF
> rolls around.
>
>>       - statx flag to indicate provisional nature of info
>>       - New error codes
>> 	- EDISCONNECTED - Op not available in disconnected mode
>> 	- EDISCONDATA - Data not available in disconnected mode
>> 	- EDISCONPERM - Permission cannot be checked in disconnected mode
>> 	- EDISCONFULL - Disconnected mode cache full
>>       - SIGIO support?
>>
>>   (5) Reconnection.
>>
>>       - Proactive or JIT synchronisation
>>         - Authentication
>>       - Conflict detection and resolution
>> 	 - ECONFLICTED - Disconnected mode resolution failed
> ECONFLICTED sort of implies that reconnection will be manual. If it
> happens automagically in the background you'll have no way to report
> such errors.
>
> Also, you'll need some mechanism to know what inodes are conflicted.
> This is the real difficult part of this problem, IMO.
>
>
>>       - Journal replay
>>       - Directory 'diffing' to find remote deletions
>>       - Symlink and other non-regular file comparison
>>
>>   (6) Conflict resolution.
>>
>>       - Automatic where possible
>>         - Just create/remove new non-regular files if possible
>>         - How to handle permission differences?
>>       - How to let userspace access conflicts?
>>         - Move local copy to 'lost+found'-like directory
>>         	 - Might not have been completely downloaded
>>         - New open() flags?
>>         	 - O_SERVER_VARIANT, O_CLIENT_VARIANT, O_RESOLVED_VARIANT
>>         - fcntl() to switch variants?
>>
> Again, conflict resolution is the difficult part. Maybe the right
> solution is to look at snapshotting-style interfaces -- i.e., handle a
> disconnected mount sort of like you would a writable snapshot. Do any
> (local) fs' currently offer writable snapshots, btw?
>
>>   (7) GUI integration.
>>
>>       - Entering/exiting disconnected mode notification/switches.
>>       - Resolution required notification.
>>       - Cache getting full notification.
>>
>> Can anyone think of any more considerations?  What do you think of the
>> proposed error codes and open flags?  Is that the best way to do this?
>>
>> David
>>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-03-06  7:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-09 14:46 [LSF/MM/BPF TOPIC] How to make disconnected operation work? David Howells
2019-12-09 17:33 ` [Lsf-pc] " Amir Goldstein
2020-01-24 14:13   ` Amir Goldstein
2020-01-27 16:32   ` David Howells
2020-01-27 19:18     ` Amir Goldstein
2019-12-09 23:14 ` Jeff Layton
2020-03-06  7:11   ` Steven French

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).