* [LSF/MM/BPF TOPIC] How to make disconnected operation work?

From: David Howells
Date: 2019-12-09 14:46 UTC
To: lsf-pc, Trond Myklebust, Anna Schumaker, Steve French
Cc: dhowells, jlayton, linux-fsdevel

I've been rewriting fscache and cachefiles to massively simplify them and
make use of the kiocb interface (which didn't exist when I first did this)
to do direct I/O to/from the netfs's pages.

  https://lore.kernel.org/lkml/24942.1573667720@warthog.procyon.org.uk/
  https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter

I'm getting towards the point where it's working and able to do basic
caching once again.  So now I've been thinking about what it'd take to
support disconnected operation.  Here's a list of things that I think need
to be considered or dealt with:

 (1) Making sure the working set is present in the cache.

     - Userspace (find/cat/tar)
     - Splice netfs -> cache
     - Metadata storage (e.g. directories)
     - Permissions caching

 (2) Making sure the working set doesn't get culled.

     - Pinning API (cachectl() syscall?)
     - Allow culling to be disabled entirely on a cache
     - Per-fs/per-dir config

 (3) Switching into/out of disconnected mode.

     - Manual, automatic
     - On what granularity?
       - Entirety of fs (e.g. all nfs)
       - By logical unit (server, volume, cell, share)

 (4) Local changes in disconnected mode.

     - Journal
     - File identifier allocation
     - statx flag to indicate provisional nature of info
     - New error codes
       - EDISCONNECTED - Op not available in disconnected mode
       - EDISCONDATA - Data not available in disconnected mode
       - EDISCONPERM - Permission cannot be checked in disconnected mode
       - EDISCONFULL - Disconnected mode cache full
     - SIGIO support?

 (5) Reconnection.

     - Proactive or JIT synchronisation
     - Authentication
     - Conflict detection and resolution
       - ECONFLICTED - Disconnected mode resolution failed
     - Journal replay
     - Directory 'diffing' to find remote deletions
     - Symlink and other non-regular file comparison

 (6) Conflict resolution.

     - Automatic where possible
       - Just create/remove new non-regular files if possible
     - How to handle permission differences?
     - How to let userspace access conflicts?
       - Move local copy to 'lost+found'-like directory
         - Might not have been completely downloaded
       - New open() flags?
         - O_SERVER_VARIANT, O_CLIENT_VARIANT, O_RESOLVED_VARIANT
       - fcntl() to switch variants?

 (7) GUI integration.

     - Entering/exiting disconnected mode notification/switches.
     - Resolution required notification.
     - Cache getting full notification.

Can anyone think of any more considerations?  What do you think of the
proposed error codes and open flags?  Is that the best way to do this?

David
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] How to make disconnected operation work?

From: Amir Goldstein
Date: 2019-12-09 17:33 UTC
To: David Howells
Cc: lsf-pc, Trond Myklebust, Anna Schumaker, Steve French, linux-fsdevel,
    Jeff Layton, Miklos Szeredi

On Mon, Dec 9, 2019 at 4:47 PM David Howells <dhowells@redhat.com> wrote:
>
> I've been rewriting fscache and cachefiles to massively simplify it and make
> use of the kiocb interface to do direct-I/O to/from the netfs's pages which
> didn't exist when I first did this.
>
[...]
>
> Can anyone think of any more considerations?  What do you think of the
> proposed error codes and open flags?  Is that the best way to do this?
>

Hi David,

I am very interested in this topic.  I can share (some) information from
experience with a "Caching Gateway" implementation in userspace shipped in
products of my employer, CTERA.

I have come across several attempts to implement a network fs cache using
overlayfs.  I don't remember by whom, but they were asking questions on
the overlayfs list about online modification of the lower layer.

It is not so far fetched, as you get many of the requirements for metadata
caching out-of-the-box, especially with the recent addition of the
metacopy feature.  Also, if you consider the plans to implement an
overlayfs page cache [1][2], then at least the read side of fscache sounds
like it has some things in common with overlayfs.

Anyway, you know plenty about overlayfs, so you can say whether you think
there is any room for collaboration between the two projects.

Thanks,
Amir.

[1] https://marc.info/?l=linux-unionfs&m=154995746503505&w=2
[2] https://github.com/amir73il/linux/commits/ovl-aops-wip
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] How to make disconnected operation work?

From: Amir Goldstein
Date: 2020-01-24 14:13 UTC
To: David Howells
Cc: lsf-pc, Trond Myklebust, Anna Schumaker, Steve French, linux-fsdevel,
    Jeff Layton, Miklos Szeredi, Linux NFS Mailing List

On Mon, Dec 9, 2019 at 7:33 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Mon, Dec 9, 2019 at 4:47 PM David Howells <dhowells@redhat.com> wrote:
> >
> > I've been rewriting fscache and cachefiles to massively simplify it ...
> >
[...]

David,

I have been reading through the fscache APIs and tried to answer this
(maybe stupid) question: Why does every netfs need to implement fscache
support on its own?

fscache support as it is today is extremely intrusive to filesystem code,
and your rewrite doesn't make it any less intrusive.

My thinking is: Can't we implement a stackable cachefs which interfaces
with fscache and whose API to the netfs is pure vfs APIs, just like
overlayfs interfaces with the lower fs?

The only fscache API I could find that really needs to be called from
netfs code is fscache_invalidate(), and many of those calls are invoked
from vfs ops anyway, so maybe they could also be hoisted into this
cachefs.

As long as the netfs supports direct_IO() (all except afs do), the active
page cache could be that of the stackable cachefs, and network IO would
always be direct from/to cachefs pages.

If the netfs supports export_operations (all except afs do), then indexing
the cache objects could be done in a generic manner using fsid and file
handle, just like the overlayfs index feature works today.

Would it not be a maintenance win if all (or most of) the fscache logic
was yanked out of all the specific netfs's?

Can you think of reasons why the stackable cachefs model cannot work, or
why it is inferior to the current fscache integration model with netfs's?

Thanks,
Amir.
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] How to make disconnected operation work?

From: David Howells
Date: 2020-01-27 16:32 UTC
To: Amir Goldstein
Cc: dhowells, lsf-pc, Trond Myklebust, Anna Schumaker, Steve French,
    linux-fsdevel, Jeff Layton, Miklos Szeredi, Linux NFS Mailing List

Amir Goldstein <amir73il@gmail.com> wrote:

> My thinking is: Can't we implement a stackable cachefs which interfaces
> with fscache and whose API to the netfs is pure vfs APIs, just like
> overlayfs interfaces with lower fs?

In short, no - doing it with purely the VFS APIs that we have is not that
simple (yes, Solaris does it with a stacking filesystem, and I don't know
anything about the API details, but there must be an auxiliary API).  You
need to handle:

 (1) Remote invalidation.  The netfs needs to tell the cache layer
     asynchronously about remote modifications - where the modification
     can affect not just file content but also directory structure, and
     even file data invalidation may be partial.

 (2) Unique file group matching.  The info required to match a group of
     files (e.g. an NFS server, an AFS volume, a CIFS share) is not
     necessarily available through the VFS API - I'm not sure even the
     export API makes this available, since it's built on the assumption
     that it's exporting local files.

 (3) File matching.  The info required to match a file to the cache is not
     necessarily available through the VFS API.  NFS has file handles, for
     example; the YFS variant of AFS has 96-bit 'inode numbers'.  (This
     might be done with the export API, if that counts.)  Further, the
     file identifier may not be unique outside the file group.

 (4) Coherency management.  The netfs must tell the cache whether or not
     the data contained in the cache is valid.  This information is not
     necessarily available through the VFS APIs (NFS change IDs, AFS data
     version, AFS volume sync info).  It's also highly filesystem
     specific.

It might also have security implications for netfs's that handle their own
security (such as AFS does), but that might fall out naturally.

> As long as netfs supports direct_IO() (all except afs do) then the
> active page cache could be that of the stackable cachefs and network IO
> is always direct from/to cachefs pages.

What about objects that don't support DIO?  Directories, symbolic links
and automount points?  All of these things are cacheable objects with AFS.

And speaking of automount points - how would you deal with those beyond
simply caching the contents?  Create a new stacked instance over it?  How
do you see the automount point itself?

I see that the NFS FH encoder doesn't handle automount points.

> If netfs supports export_operations (all except afs do), then indexing
> the cache objects could be done in a generic manner using fsid and
> file handle, just like overlayfs index feature works today.

FSID isn't unique and doesn't exist for all filesystems.  Two NFS servers,
for example, can give you the same FSID, but referring to different
things.  AFS has a textual cell name and a volume ID that you need to
combine; it doesn't have an FSID.

This may work for overlayfs, as the FSID can be confined to a particular
overlay.  However, that's not what we're dealing with.  We would be
talking about an index that potentially covers *all* the mounted netfs.

Also, from your description that sounds like a bug in overlayfs.  If the
overlain NFS tree does a referral to a different server, you no longer
have a unique FSID or a unique FH within that FSID, so your index is
broken.

> Would it not be a maintenance win if all (or most of) the fscache logic
> was yanked out of all the specific netfs's?

Actually, it may not help enormously with disconnected operation.  A
certain amount of the logic probably has to be implemented in the netfs,
as each netfs provides different facilities for managing this.

Yes, it gets some of the I/O stuff out - but I want to move some of that
down into the VM if I can, and librarifying the rest should take care of
it.

> Can you think of reasons why the stackable cachefs model cannot work
> or why it is inferior to the current fscache integration model with
> netfs's?

Yes.  It's a lot more operationally expensive and it's harder to use.  The
cache driver would also have to get a lot bigger, but that would be
reasonable.

Firstly, the expense: you have to double up all the inodes and dentries
that are in use - and that's not counting the resources used inside the
cache itself.

Secondly, the administration: I'm assuming you're suggesting the way I
think Solaris does it and that you have to make two mounts: first you
mount the netfs and then you mount the cache over it.  It's much simpler
if you just need to make the netfs mount only, and then that goes and uses
the cache if it's available - it's also simple to bring the cache online
after the fact, meaning you can even have caching applied retroactively to
a root filesystem.

You also have the issue of what happens if someone bind-mounts the netfs
mount and mounts the cache over only one of the views.  Now you have a
coherency management problem that the cache cannot see.  It's only visible
to the netfs, but the netfs doesn't know about the cache.

There's also file locking.  Overlayfs doesn't support file locking that I
can see, but NFS, AFS and CIFS all do.

Anyway, you might be able to guess that I'm really against using stackable
filesystems for things like this and for UID shifting.  I think it adds
more expense and complexity than it's necessarily worth.  I was more
inclined to go with unionfs than overlayfs and do the filesystem union in
the VFS, as it ought to be cheaper if you're using it (whereas overlayfs
is cheaper if you're not).

One final thing - even if we did want to switch to a stacked approach, we
might still have to maintain the current way, as people use it.

David
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] How to make disconnected operation work?

From: Amir Goldstein
Date: 2020-01-27 19:18 UTC
To: David Howells
Cc: lsf-pc, Trond Myklebust, Anna Schumaker, Steve French, linux-fsdevel,
    Jeff Layton, Miklos Szeredi, Linux NFS Mailing List

On Mon, Jan 27, 2020 at 6:32 PM David Howells <dhowells@redhat.com> wrote:
>
> Amir Goldstein <amir73il@gmail.com> wrote:
>
> > My thinking is: Can't we implement a stackable cachefs which interfaces
> > with fscache and whose API to the netfs is pure vfs APIs, just like
> > overlayfs interfaces with lower fs?
>
> In short, no - doing it with purely the VFS APIs that we have is not that
> simple (yes, Solaris does it with a stacking filesystem, and I don't know
> anything about the API details, but there must be an auxiliary API).  You
> need to handle:
[...]
>
> What about objects that don't support DIO?  Directories, symbolic links
> and automount points?  All of these things are cacheable objects with
> AFS.

direct_IO is about not duplicating the page cache, so it is not relevant
for those objects.  I guess that for those objects the invalidation
callbacks are what matters.

> And speaking of automount points - how would you deal with those beyond
> simply caching the contents?  Create a new stacked instance over it?
> How do you see the automount point itself?

I didn't get this far ;-)

> FSID isn't unique and doesn't exist for all filesystems.
[...]
> Also, from your description that sounds like a bug in overlayfs.  If the
> overlain NFS tree does a referral to a different server, you no longer
> have a unique FSID or a unique FH within that FSID, so your index is
> broken.

I misspoke.  Overlayfs uses s_uuid for the index, not fsid.  If s_uuid is
null or there are no export ops, then the index cannot be used.  So yes,
it's a challenge to auto-index the netfs's objects.

> Firstly, the expense: you have to double up all the inodes and dentries
> that are in use - and that's not counting the resources used inside the
> cache itself.

Good point.

> Secondly, the administration: I'm assuming you're suggesting the way I
> think Solaris does it and that you have to make two mounts ...
[...]

All of the above is only true if you mount the stacked cachefs to begin
with.  You can add/remove the caches later.

> You also have the issue of what happens if someone bind-mounts the netfs
> mount and mounts the cache over only one of the views.  Now you have a
> coherency management problem that the cache cannot see.  It's only
> visible to the netfs, but the netfs doesn't know about the cache.

The shotgun to shoot the foot with, you mean - yap.

> There's also file locking.  Overlayfs doesn't support file locking that
> I can see, but NFS, AFS and CIFS all do.

Not sure which locks you mean.  flock and leases do work on overlayfs
AFAIK.  But yes, every one of those things is a challenge with a stacked
fs; overlayfs has already made a lot of progress, though.

> Anyway, you might be able to guess that I'm really against using
> stackable filesystems for things like this and for UID shifting.  I
> think it adds more expense and complexity than it's necessarily worth.

Yes, I figured as much :)

> I was more inclined to go with unionfs than overlayfs and do the
> filesystem union in the VFS as it ought to be cheaper if you're using it
> (whereas overlayfs is cheaper if you're not).

I guess competition is good.

Anyway, I am brewing a topic about filesystem APIs for Hierarchical
Storage Managers, such as https://vfsforgit.org/.  There are similarities
between the requirements for HSM and for disconnected operation of a
netfs - you might even say they are not two different things.  So we may
want to bring them up together in the same session, or in two adjacent
sessions - we'll see.

Thanks,
Amir.
* Re: [LSF/MM/BPF TOPIC] How to make disconnected operation work?

From: Jeff Layton
Date: 2019-12-09 23:14 UTC
To: David Howells, lsf-pc, Trond Myklebust, Anna Schumaker, Steve French
Cc: linux-fsdevel

On Mon, 2019-12-09 at 14:46 +0000, David Howells wrote:
> I've been rewriting fscache and cachefiles to massively simplify it and
> make use of the kiocb interface to do direct-I/O to/from the netfs's
> pages which didn't exist when I first did this.
>
[...]
>
> I'm getting towards the point where it's working and able to do basic
> caching once again.  So now I've been thinking about what it'd take to
> support disconnected operation.  Here's a list of things that I think
> need to be considered or dealt with:

I'm quite interested in this too.  I see that you've already given a lot
of thought to potential interfaces here.  I think we'll end up having to
add a fair number of new interfaces to make something like this work.

[...]
> (4) Local changes in disconnected mode.
>
>     - Journal
>     - File identifier allocation

Yep, necessary if you want to allow disconnected creates.  By coincidence,
I'm working on an (experimental) patchset now to add async create support
to kcephfs, and part of that involves delegating out ranges of inode
numbers.  I may have some experience to report with it by the time LSF
rolls around.

>     - statx flag to indicate provisional nature of info
>     - New error codes
>       - EDISCONNECTED - Op not available in disconnected mode
>       - EDISCONDATA - Data not available in disconnected mode
>       - EDISCONPERM - Permission cannot be checked in disconnected mode
>       - EDISCONFULL - Disconnected mode cache full
>     - SIGIO support?
>
> (5) Reconnection.
>
>     - Proactive or JIT synchronisation
>     - Authentication
>     - Conflict detection and resolution
>       - ECONFLICTED - Disconnected mode resolution failed

ECONFLICTED sort of implies that reconnection will be manual.  If it
happens automagically in the background, you'll have no way to report such
errors.

Also, you'll need some mechanism to know which inodes are conflicted.
This is the really difficult part of this problem, IMO.

>     - Journal replay
>     - Directory 'diffing' to find remote deletions
>     - Symlink and other non-regular file comparison
>
> (6) Conflict resolution.
>
>     - Automatic where possible
>       - Just create/remove new non-regular files if possible
>     - How to handle permission differences?
>     - How to let userspace access conflicts?
>       - Move local copy to 'lost+found'-like directory
>         - Might not have been completely downloaded
>       - New open() flags?
>         - O_SERVER_VARIANT, O_CLIENT_VARIANT, O_RESOLVED_VARIANT
>       - fcntl() to switch variants?

Again, conflict resolution is the difficult part.  Maybe the right
solution is to look at snapshotting-style interfaces - i.e., handle a
disconnected mount sort of like you would a writable snapshot.  Do any
(local) fs' currently offer writable snapshots, btw?

> (7) GUI integration.
>
>     - Entering/exiting disconnected mode notification/switches.
>     - Resolution required notification.
>     - Cache getting full notification.
>
> Can anyone think of any more considerations?  What do you think of the
> proposed error codes and open flags?  Is that the best way to do this?

-- 
Jeff Layton <jlayton@redhat.com>
* Re: [LSF/MM/BPF TOPIC] How to make disconnected operation work?

From: Steven French
Date: 2020-03-06 7:11 UTC
To: Jeff Layton, David Howells, lsf-pc, Trond Myklebust, Anna Schumaker
Cc: linux-fsdevel

As discussed in hallway discussions at the Linux Storage Conference, this
would make a good topic for LSF/MM.

On 12/9/19 5:14 PM, Jeff Layton wrote:
> On Mon, 2019-12-09 at 14:46 +0000, David Howells wrote:
>> I've been rewriting fscache and cachefiles to massively simplify it and
>> make use of the kiocb interface to do direct-I/O to/from the netfs's
>> pages which didn't exist when I first did this.
>>
[...]