* [LSF/MM TOPIC] Network filesystem cache management system call
@ 2017-01-06 22:40 David Howells
  2017-01-06 23:18 ` Andreas Dilger
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: David Howells @ 2017-01-06 22:40 UTC (permalink / raw)
  To: lsf-pc; +Cc: dhowells, linux-fsdevel

AFS filesystem implementations like OpenAFS have a number of management calls
usually implemented through the pioctl() and afs() system calls.  However,
there is general revulsion to the idea of allowing these in Linux as they are
very much abused.

Now, I need to come up with replacements for these.  For some I can use
getxattr()/setxattr(), for some keyrings and for others procfs/sysfs, but for
some I can't.

One of these is the call to manage local caching on a file or volume.  This,
however, doesn't really need to be limited to AFS, but could also be
applicable to NFS, CIFS, etc. - and possibly even to local filesystems.

A brainstorming session on this would be useful.

David


* Re: [LSF/MM TOPIC] Network filesystem cache management system call
  2017-01-06 22:40 [LSF/MM TOPIC] Network filesystem cache management system call David Howells
@ 2017-01-06 23:18 ` Andreas Dilger
  2017-01-07 14:27   ` [Lsf-pc] " Jeff Layton
  2017-01-13 17:16 ` J. Bruce Fields
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Andreas Dilger @ 2017-01-06 23:18 UTC (permalink / raw)
  To: David Howells; +Cc: lsf-pc, linux-fsdevel


On Jan 6, 2017, at 3:40 PM, David Howells <dhowells@redhat.com> wrote:
> 
> AFS filesystem implementations like OpenAFS have a number of management calls
> usually implemented through the pioctl() and afs() system calls.  However,
> there is general revulsion to the idea of allowing these in Linux as they are
> very much abused.
> 
> Now, I need to come up with replacements for these.  For some I can use
> getxattr()/setxattr(), for some keyrings and for others procfs/sysfs, but for
> some I can't.
> 
> One of these is the call to manage local caching on a file or volume.  This,
> however, doesn't really need to be limited to AFS, but could also be
> applicable to NFS, CIFS, etc. - and possibly even to local filesystems.
> 
> A brainstorming session on this would be useful.

This is definitely something I'd like to discuss as well.  The posix_fadvise()
call doesn't interact with the filesystem at all, so it isn't a useful source
of cache management hints.  We've implemented a Lustre-specific ladvise hint
using an ioctl to advise client and server caches, but having a generic hint
API would be better.

Cheers, Andreas


* Re: [Lsf-pc] [LSF/MM TOPIC] Network filesystem cache management system call
  2017-01-06 23:18 ` Andreas Dilger
@ 2017-01-07 14:27   ` Jeff Layton
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2017-01-07 14:27 UTC (permalink / raw)
  To: Andreas Dilger, David Howells; +Cc: linux-fsdevel, lsf-pc, Trond Myklebust

On Fri, 2017-01-06 at 16:18 -0700, Andreas Dilger wrote:
> On Jan 6, 2017, at 3:40 PM, David Howells <dhowells@redhat.com> wrote:
> > 
> > 
> > AFS filesystem implementations like OpenAFS have a number of management calls
> > usually implemented through the pioctl() and afs() system calls.  However,
> > there is general revulsion to the idea of allowing these in Linux as they are
> > very much abused.
> > 
> > Now, I need to come up with replacements for these.  For some I can use
> > getxattr()/setxattr(), for some keyrings and for others procfs/sysfs, but for
> > some I can't.
> > 
> > One of these is the call to manage local caching on a file or volume.  This,
> > however, doesn't really need to be limited to AFS, but could also be
> > applicable to NFS, CIFS, etc. - and possibly even to local filesystems.
> > 
> > A brainstorming session on this would be useful.
> 
> This is definitely something I'd like to discuss as well.  The posix_fadvise()
> call doesn't interact with the filesystem at all, so it isn't a useful source
> of cache management hints. 

That could be remedied though. Maybe a new file_operations member that
posix_fadvise could call when it's defined by the fs? I think it might
be useful to explore that avenue before proposing new interfaces.
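To make the shape of that suggestion concrete, here's a userspace toy model of the dispatch (the member name and signature are purely illustrative; nothing like this exists in struct file_operations at present):

```c
#include <stddef.h>

/* Toy model: posix_fadvise() would call an optional per-filesystem hook
 * when the filesystem defines one, falling back to the generic
 * page-cache handling otherwise.  All names are illustrative only. */

struct file;    /* stand-in for the kernel's struct file */

struct file_operations {
	int (*fadvise)(struct file *filp, long long offset, long long len,
		       int advice);
};

/* What fadvise does today: generic handling only, fs never consulted. */
static int generic_fadvise(struct file *filp, long long offset,
			   long long len, int advice)
{
	(void)filp; (void)offset; (void)len; (void)advice;
	return 0;
}

/* Dispatch: prefer the filesystem's hook when it defines one. */
int vfs_fadvise(const struct file_operations *fops, struct file *filp,
		long long offset, long long len, int advice)
{
	if (fops && fops->fadvise)
		return fops->fadvise(filp, offset, len, advice);
	return generic_fadvise(filp, offset, len, advice);
}

/* A network filesystem that wants to see hints, e.g. to drop or prime
 * its local cache; it returns 1 here only so the effect is observable. */
static int netfs_fadvise(struct file *filp, long long offset,
			 long long len, int advice)
{
	(void)filp; (void)offset; (void)len; (void)advice;
	return 1;
}

const struct file_operations netfs_fops  = { .fadvise = netfs_fadvise };
const struct file_operations simple_fops = { .fadvise = NULL };
```

A filesystem that leaves the hook NULL quietly gets today's behaviour, so no existing filesystem would need changing.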

> We've implemented a Lustre-specific ladvise hint
> using an ioctl to advise client and server caches, but having a generic hint
> API would be better.
> 

Agreed.

Ceph also has a private ioctl to toggle lazy write caching. Also, ISTR
that Trond had some patches at one point to let you micromanage the NFS
client caches. All of those are potentially useful, but without a
standard way to access them, it's hard to write applications that are
fs-agnostic.

For that reason, a hinting mechanism like posix_fadvise would seem to be
the best approach, IMO. The kernel could be free to ignore any of those
calls on filesystems where it's not implemented or not applicable.
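The existing call already has exactly that shape from userspace; a minimal sketch of driving it (the kernel may treat both hints as no-ops):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Exercise the existing hint interface.  Today these reach only the
 * generic page cache; the discussion here is about also letting the
 * filesystem driver see them.  An offset/len of 0,0 means "the whole
 * file", and the kernel is free to ignore either call entirely. */
static int drop_then_prefetch(int fd)
{
	int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED); /* drop pages */
	if (err)
		return err;
	return posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);    /* readahead */
}
```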

-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: [LSF/MM TOPIC] Network filesystem cache management system call
  2017-01-06 22:40 [LSF/MM TOPIC] Network filesystem cache management system call David Howells
  2017-01-06 23:18 ` Andreas Dilger
@ 2017-01-13 17:16 ` J. Bruce Fields
  2017-01-15 23:36 ` Oleg Drokin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: J. Bruce Fields @ 2017-01-13 17:16 UTC (permalink / raw)
  To: David Howells; +Cc: lsf-pc, linux-fsdevel

On Fri, Jan 06, 2017 at 10:40:46PM +0000, David Howells wrote:
> AFS filesystem implementations like OpenAFS have a number of management calls
> usually implemented through the pioctl() and afs() system calls.  However,
> there is general revulsion to the idea of allowing these in Linux as they are
> very much abused.
> 
> Now, I need to come up with replacements for these.  For some I can use
> getxattr()/setxattr(), for some keyrings and for others procfs/sysfs, but for
> some I can't.
> 
> One of these is the call to manage local caching on a file or volume.  This,
> however, doesn't really need to be limited to AFS, but could also be
> applicable to NFS, CIFS, etc. - and possibly even to local filesystems.

Do you have a summary of the AFS interface to give an idea what's
needed?

--b.


* Re: [LSF/MM TOPIC] Network filesystem cache management system call
  2017-01-06 22:40 [LSF/MM TOPIC] Network filesystem cache management system call David Howells
  2017-01-06 23:18 ` Andreas Dilger
  2017-01-13 17:16 ` J. Bruce Fields
@ 2017-01-15 23:36 ` Oleg Drokin
  2017-01-17 16:42 ` David Howells
  2017-01-17 16:49 ` David Howells
  4 siblings, 0 replies; 13+ messages in thread
From: Oleg Drokin @ 2017-01-15 23:36 UTC (permalink / raw)
  To: David Howells; +Cc: lsf-pc, linux-fsdevel


On Jan 6, 2017, at 5:40 PM, David Howells wrote:

> AFS filesystem implementations like OpenAFS have a number of management calls
> usually implemented through the pioctl() and afs() system calls.  However,
> there is general revulsion to the idea of allowing these in Linux as they are
> very much abused.
> 
> Now, I need to come up with replacements for these.  For some I can use
> getxattr()/setxattr(), for some keyrings and for others procfs/sysfs, but for
> some I can't.
> 
> One of these is the call to manage local caching on a file or volume.  This,
> however, doesn't really need to be limited to AFS, but could also be
> applicable to NFS, CIFS, etc. - and possibly even to local filesystems.
> 
> A brainstorming session on this would be useful.

I guess this would be very interesting for Lustre as well.

Bye,
    Oleg


* Re: [LSF/MM TOPIC] Network filesystem cache management system call
  2017-01-06 22:40 [LSF/MM TOPIC] Network filesystem cache management system call David Howells
                   ` (2 preceding siblings ...)
  2017-01-15 23:36 ` Oleg Drokin
@ 2017-01-17 16:42 ` David Howells
  2017-01-20 16:53   ` [Lsf-pc] " Jeff Layton
  2017-01-20 17:45   ` David Howells
  2017-01-17 16:49 ` David Howells
  4 siblings, 2 replies; 13+ messages in thread
From: David Howells @ 2017-01-17 16:42 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: dhowells, lsf-pc, linux-fsdevel, jaltman

J. Bruce Fields <bfields@fieldses.org> wrote:

> > One of these is the call to manage local caching on a file or volume.
> > This, however, doesn't really need to be limited to AFS, but could also be
> > applicable to NFS, CIFS, etc. - and possibly even to local filesystems.
> 
> Do you have a summary of the AFS interface to give an idea what's
> needed?

I have the pioctls listed here that I need to emulate:

	https://www.infradead.org/~dhowells/kafs/user_interface.html

along with my thoughts on how to do that.

For cache wangling, I was thinking of something like:

	fcachectl(int dirfd,
		  const char *pathname,
		  unsigned atflags,
		  const char *cmd,
		  char *result,
		  size_t *result_len);

The relevant pioctls are:

 (*) VIOCGETCACHEPARMS

     Get the size of the cache.

 (*) VIOCSETCACHESIZE

     Set the cache size.

 (*) VIOCFLUSH

     Invalidate the cached information for an object, both the inode/dentry
     structs and anything in the local cache.

 (*) VIOCFLUSHCB

     Invalidate any callbacks/leases outstanding on an object.  This might
     make more sense to be done via the same mechanism as lease/lock
     management.

 (*) VIOC_FLUSHVOLUME

     Flush all cached state for a volume, both from RAM and local disk cache
     as far as possible.  Files that are open aren't necessarily affected.

 (*) VIOC_FLUSHALL

     Flush all cached state for all volumes.

 (*) VIOCPREFETCH

     Prefetch a file into the cache.

So, maybe:

	fcachectl(AT_FDCWD,
		  "/afs/user/dhowells",
		  0,
		  "flush volume",
		  NULL, NULL);

to flush an AFS volume containing my home directory.

Note that doing this by fcntl() or ioctl() has potential difficulties as it
would have to work on non-file objects such as device files or symlinks.

Other functions that this could be used for are cache pinning and
fixup/integration should we ever want disconnected operation.

David


* Re: [LSF/MM TOPIC] Network filesystem cache management system call
  2017-01-06 22:40 [LSF/MM TOPIC] Network filesystem cache management system call David Howells
                   ` (3 preceding siblings ...)
  2017-01-17 16:42 ` David Howells
@ 2017-01-17 16:49 ` David Howells
  2017-01-18 20:12   ` Jeffrey Altman
  2017-01-19 14:48   ` David Howells
  4 siblings, 2 replies; 13+ messages in thread
From: David Howells @ 2017-01-17 16:49 UTC (permalink / raw)
  To: lsf-pc; +Cc: dhowells, linux-fsdevel, jaltman

David Howells <dhowells@redhat.com> wrote:

> AFS filesystem implementations like OpenAFS have a number of management calls
> usually implemented through the pioctl() and afs() system calls.  However,
> there is general revulsion to the idea of allowing these in Linux as they are
> very much abused.
> 
> Now, I need to come up with replacements for these.  For some I can use
> getxattr()/setxattr(), for some keyrings and for others procfs/sysfs, but for
> some I can't.
> 
> One of these is the call to manage local caching on a file or volume.  This,
> however, doesn't really need to be limited to AFS, but could also be
> applicable to NFS, CIFS, etc. - and possibly even to local filesystems.
> 
> A brainstorming session on this would be useful.

Another thing that I would like to work out is how to do mountpoint management
commands for kAFS.  This includes doing the following operations:

 (*) Creating a mountpoint.

 (*) Deleting a mountpoint

 (*) Reading a mountpoint.

 (*) Invalidating cached mountpoint information.

The problem here is that mountpoints are automatically mounted when you touch
them unless you specify the AT_NO_AUTOMOUNT flag - and even then, that's
ignored if the mountpoint is already mounted upon.
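The flag behaviour can be seen with the existing fstatat() interface (this only demonstrates the current flags, not any proposed command):

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* AT_FDCWD, AT_NO_AUTOMOUNT, AT_SYMLINK_NOFOLLOW */
#include <sys/stat.h>

/* Stat a path without triggering an automount and without following a
 * trailing symlink.  As noted above, AT_NO_AUTOMOUNT only prevents the
 * automount from being triggered: if something is already mounted on
 * the point, you still see the mounted-over object. */
static int stat_no_automount(const char *path, struct stat *st)
{
	return fstatat(AT_FDCWD, path, st,
		       AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW);
}
```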

I could do this by saying that you have to open the parent directory of the
mountpoint and do an ioctl() on it, but would it be better to have one or more
syscalls for this purpose?

Further, would this be of use to any other filesystems?

David


* Re: [LSF/MM TOPIC] Network filesystem cache management system call
  2017-01-17 16:49 ` David Howells
@ 2017-01-18 20:12   ` Jeffrey Altman
  2017-01-19 14:48   ` David Howells
  1 sibling, 0 replies; 13+ messages in thread
From: Jeffrey Altman @ 2017-01-18 20:12 UTC (permalink / raw)
  To: David Howells, lsf-pc; +Cc: linux-fsdevel



On 1/17/2017 11:49 AM, David Howells wrote:
> David Howells <dhowells@redhat.com> wrote:
> 
>> AFS filesystem implementations like OpenAFS have a number of management calls
>> usually implemented through the pioctl() and afs() system calls.  However,
>> there is general revulsion to the idea of allowing these in Linux as they are
>> very much abused.
>>
>> Now, I need to come up with replacements for these.  For some I can use
>> getxattr()/setxattr(), for some keyrings and for others procfs/sysfs, but for
>> some I can't.
>>
>> One of these is the call to manage local caching on a file or volume.  This,
>> however, doesn't really need to be limited to AFS, but could also be
>> applicable to NFS, CIFS, etc. - and possibly even to local filesystems.
>>
>> A brainstorming session on this would be useful.

I would be happy to participate.


> Another thing that I would like to work out is how to do mountpoint management
> commands for kAFS.  This includes doing the following operations:
> 
>  (*) Creating a mountpoint.
> 
>  (*) Deleting a mountpoint
> 
>  (*) Reading a mountpoint.
> 
>  (*) Invalidating cached mountpoint information.
> 
> The problem here is that mountpoints are automatically mounted when you touch
> them unless you specify the AT_NO_AUTOMOUNT flag - and even then, that's
> ignored if the mountpoint is already mounted upon.
> 
> I could do this by saying that you have to open the parent directory of the
> mountpoint and do an ioctl() on it, but would it be better to have one or more
> syscalls for this purpose?

I am uncomfortable with opening the parent directory object because it
introduces object reference confusion when the path to the parent
directory is itself a symlink or a mount point.

Granted, the AFS3 protocol does not make this easy.

1. An AFS3 mount point is a special type of symlink

2. AFS3 does not provide a method by which a symlink's target can be
   altered or its type changed after it is created.

3. Legacy AFS3 clients have always exposed mount points to applications
   as a directory not a symlink.

If I had my preference, I would implement creating a mount point as:

1. create an empty file or symlink

2. open the resulting object and issue an ioctl against that object
   to set a mount point target which results in it becoming a mount
   point.

Deleting, Reading, Modifying, and Invalidating all are implemented as

1. open mount point

2. issue ioctl against object

Unfortunately, AFS3 simply doesn't provide the functionality I desire.
The best that could be done would be to implement a createMountPoint
ioctl which operates by deleting the open object and creating a new
mount point object in its place, while dealing with the fact that the
AFS FID for the object will change as a result.
> 
> Further, would this be of use to any other filesystems?

Could SMB use this as a method of defining referrals?

Jeffrey Altman





* Re: [LSF/MM TOPIC] Network filesystem cache management system call
  2017-01-17 16:49 ` David Howells
  2017-01-18 20:12   ` Jeffrey Altman
@ 2017-01-19 14:48   ` David Howells
  2017-01-20  4:32     ` Jeffrey Altman
  1 sibling, 1 reply; 13+ messages in thread
From: David Howells @ 2017-01-19 14:48 UTC (permalink / raw)
  To: Jeffrey Altman; +Cc: dhowells, lsf-pc, linux-fsdevel

Jeffrey Altman <jaltman@auristor.com> wrote:

> > I could do this by saying that you have to open the parent directory of
> > the mountpoint and do an ioctl() on it, but would it be better to have one
> > or more syscalls for this purpose?
> 
> I am uncomfortable with opening the parent directory object because it
> introduces object reference confusion when the path to the parent
> directory is itself a symlink or a mount point.
> 
> Granted, the AFS3 protocol does not make this easy.
> 
> 1. An AFS3 mount point is a special type of symlink

Whilst this is true in the protocol, kAFS maps that into a directory that has
a ->d_automount() op.

> 2. AFS3 does not provide a method by which a symlink's target can be
>    altered or its type changed after it is created.

Can the mountpoint object be renamed and/or moved between directories?

> 3. Legacy AFS3 clients have always exposed mount points to applications
>    as a directory not a symlink.

It should be noted here that legacy AFS3 clients have a single superblock that
covers all the volumes on all the cells that they mount, so mountpoints aren't
actually visible to the VFS.

In contrast, with kAFS, each volume is kept as a separate superblock and that
is mounted directly on the mountpoint.

A further note: I believe that a volume must have a directory at the base and
cannot have a mountpoint there - so you shouldn't get stacked mountpoints
without interference by the local sysadmin manually issuing mount commands.

> If I had my preference, I would implement creating a mount point as:
> 
> 1. create an empty file or symlink
> 
> 2. open the resulting object and issue an ioctl against that object
>    to set a mount point target which results in it becoming a mount
>    point.

Opening the mountpoint object, if it's a dangling symlink, could be tricky.
There's an O_PATH that you can open with, but I'm not sure that's of use;
further, if the automountpoint is actually mounted over, you can't get to the
mountpoint without stepping over it.  Even O_NOFOLLOW and AT_NO_AUTOMOUNT
don't help.
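For reference, the O_PATH case mentioned does work on a dangling symlink today (a minimal sketch; it doesn't address the mounted-over problem):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Pin a symlink itself -- even a dangling one -- without following it.
 * There is little you can then do with the resulting O_PATH fd beyond
 * passing it to the *at() calls, and none of this helps once the
 * object has been mounted over. */
static int open_symlink_itself(const char *path)
{
	return open(path, O_PATH | O_NOFOLLOW);
}
```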

David


* Re: [LSF/MM TOPIC] Network filesystem cache management system call
  2017-01-19 14:48   ` David Howells
@ 2017-01-20  4:32     ` Jeffrey Altman
  0 siblings, 0 replies; 13+ messages in thread
From: Jeffrey Altman @ 2017-01-20  4:32 UTC (permalink / raw)
  To: David Howells; +Cc: lsf-pc, linux-fsdevel



On 1/19/2017 9:48 AM, David Howells wrote:
> Jeffrey Altman <jaltman@auristor.com> wrote:
> 
>>> I could do this by saying that you have to open the parent directory of
>>> the mountpoint and do an ioctl() on it, but would it be better to have one
>>> or more syscalls for this purpose?
>>
>> I am uncomfortable with opening the parent directory object because it
>> introduces object reference confusion when the path to the parent
>> directory is itself a symlink or a mount point.
>>
>> Granted, the AFS3 protocol does not make this easy.
>>
>> 1. An AFS3 mount point is a special type of symlink
> 
> Whilst this is true in the protocol, kAFS maps that into a directory that has
> a ->d_automount() op.

that is a fine representation.

>> 2. AFS3 does not provide a method by which a symlink's target can be
>>    altered or its type changed after it is created.
> 
> Can the mountpoint object be renamed and/or moved between directories?

yes, and it is theoretically possible for its target to be modified.  If
that happens, the data version of the object will change.

>> 3. Legacy AFS3 clients have always exposed mount points to applications
>>    as a directory not a symlink.
> 
> It should be noted here that legacy AFS3 clients have a single superblock that
> covers all the volumes on all the cells that they mount, so mountpoints aren't
> actually visible to the VFS.

correct, and this does cause a variety of problems.  there is no method
of reporting the size of the volume or the amount of free space in a
volume.  this is a side effect of /afs being exposed to the VFS as a
single device that includes all AFS3 storage everywhere in the world.
> 
> In contrast, with kAFS, each volume is kept as a separate superblock and that
> is mounted directly on the mountpoint.
> 
> A further note: I believe that a volume must have a directory at the base and
> cannot have a mountpoint there - so you shouldn't get stacked mountpoints
> without interference by the local sysadmin manually issuing mount commands.

A volume by definition is a rooted directory tree where the volume root
directory vnode id is 1 and the root directory has no parent.

>> If I had my preference, I would implement creating a mount point as:
>>
>> 1. create an empty file or symlink
>>
>> 2. open the resulting object and issue an ioctl against that object
>>    to set a mount point target which results in it becoming a mount
>>    point.
> 
> Opening the mountpoint object, if it's a dangling symlink, could be tricky.
> There's an O_PATH that you can open with, but I'm not sure that's of use;
> further, if the automountpoint is actually mounted over, you can't get to the
> mountpoint without stepping over it.  Even O_NOFOLLOW and AT_NO_AUTOMOUNT
> don't help.

David and I spoke offline.  Create and Unlink are operations on the
directory and should remain so.  Read is an operation best performed on
the mount point itself.

Jeffrey Altman





* Re: [Lsf-pc] [LSF/MM TOPIC] Network filesystem cache management system call
  2017-01-17 16:42 ` David Howells
@ 2017-01-20 16:53   ` Jeff Layton
  2017-01-20 17:45   ` David Howells
  1 sibling, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2017-01-20 16:53 UTC (permalink / raw)
  To: David Howells, J. Bruce Fields; +Cc: linux-fsdevel, lsf-pc, jaltman

On Tue, 2017-01-17 at 16:42 +0000, David Howells wrote:
> J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> > > One of these is the call to manage local caching on a file or volume.
> > > This, however, doesn't really need to be limited to AFS, but could also be
> > > applicable to NFS, CIFS, etc. - and possibly even to local filesystems.
> > 
> > Do you have a summary of the AFS interface to give an idea what's
> > needed?
> 
> I have the pioctls listed here that I need to emulate:
> 
> 	https://www.infradead.org/~dhowells/kafs/user_interface.html
> 
> along with my thoughts on how to do that.
> 
> For cache wangling, I was thinking of something like:
> 
> 	fcachectl(int dirfd,
> 		  const char *pathname,
> 		  unsigned atflags,
> 		  const char *cmd,
> 		  char *result,
> 		  size_t *result_len);
> 

I think it might be more useful to wire posix_fadvise into the
filesystem drivers somehow. A hinting interface really seems like the
right approach here, given the differences between different
filesystems.
 
> The relevant pioctls are:
> 
>  (*) VIOCGETCACHEPARMS
> 
>      Get the size of the cache.
> 

Global or per-inode cache?

>  (*) VIOCSETCACHESIZE
> 
>      Set the cache size.
> 
>  (*) VIOCFLUSH
> 
>      Invalidate the cached information for an object, both the inode/dentry
>      structs and anything in the local cache.
> 

Maybe POSIX_FADV_DONTNEED ?

>  (*) VIOCFLUSHCB
> 
>      Invalidate any callbacks/leases outstanding on an object.  This might
>      make more sense to be done via the same mechanism as lease/lock
>      management.
> 

Well...just because we have a delegation or layout on NFS, that doesn't
mean we'll have any sort of client VFS-layer lease.

I guess you could use this on NFS to force the client to drop a
delegation or layout? That could be useful.


>  (*) VIOC_FLUSHVOLUME
> 
>      Flush all cached state for a volume, both from RAM and local disk cache
>      as far as possible.  Files that are open aren't necessarily affected.
> 

Maybe POSIX_FADV_DONTNEED on the mountpoint?

>  (*) VIOC_FLUSHALL
> 
> >      Flush all cached state for all volumes.
> 

How would you implement that in a generic way? Suppose I have a mix of
AFS and NFS mountpoints and issue this via some mechanism. Is
everything going to drop their caches?

Might want to punt on this one or do it with a private, AFS-only ioctl.


>  (*) VIOCPREFETCH
> 
>      Prefetch a file into the cache.
> 

POSIX_FADV_WILLNEED ?

> So, maybe:
> 
> 	fcachectl(AT_FDCWD,
> 		  "/afs/user/dhowells",
> 		  0,
> 		  "flush volume",
> 		  NULL, NULL);
> 
> to flush an AFS volume containing my home directory.
> 
> Note that doing this by fcntl() or ioctl() has potential difficulties as it
> would have to work on non-file objects such as device files or symlinks.
> 

Does AFS allow remote access to devices à la CIFS?

Could we allow posix_fadvise on O_PATH opens? For symlinks there is
always O_NOFOLLOW.

> Other functions that this could be used for are cache pinning and
> fixup/integration should we ever want disconnected operation.
> 

Yeah, a lot of possibilities there.

-- 
Jeff Layton <jlayton@redhat.com>


* Re: [Lsf-pc] [LSF/MM TOPIC] Network filesystem cache management system call
  2017-01-17 16:42 ` David Howells
  2017-01-20 16:53   ` [Lsf-pc] " Jeff Layton
@ 2017-01-20 17:45   ` David Howells
  2017-01-20 18:08     ` Jeff Layton
  1 sibling, 1 reply; 13+ messages in thread
From: David Howells @ 2017-01-20 17:45 UTC (permalink / raw)
  To: Jeff Layton; +Cc: dhowells, J. Bruce Fields, linux-fsdevel, lsf-pc, jaltman

Jeff Layton <jlayton@redhat.com> wrote:

> I think it might be more useful to wire posix_fadvise into the
> filesystem drivers somehow. A hinting interface really seems like the
> right approach here, given the differences between different
> filesystems.

The main reason I'm against using an fd-taking interface is that the object to
be affected might not be a regular file and could even be mounted over.

> >  (*) VIOCGETCACHEPARMS
> > 
> >      Get the size of the cache.
> > 
> 
> Global or per-inode cache?

I think this would have to be whatever cache the target inode is lurking
within.  fscache permits multiple caches on a system.

> >  (*) VIOC_FLUSHVOLUME
> > 
> >      Flush all cached state for a volume, both from RAM and local disk cache
> >      as far as possible.  Files that are open aren't necessarily affected.
> > 
> 
> Maybe POSIX_FADV_DONTNEED on the mountpoint?

Ugh.  No.  How would you differentiate flushing just the mountpoint or the
root dir from flushing the volume?  Also, you cannot open the mountpoint
object if it is mounted over.

Also POSIX_FADV_DONTNEED is a hint that an application no longer needs the
data and is not a specifically a command to flush that data.

> >  (*) VIOC_FLUSHALL
> > 
> >      FLush all cached state for all volumes.
> > 
> 
> How would you implement that in a generic way? Suppose I have a mix of
> AFS and NFS mountpoints and issue this via some mechanism. Is
> everything going to drop their caches?
> 
> Might want to punt on this one or do it with a private, AFS-only ioctl.

Might be worth making it AFS-only.  Possibly it would make sense to implement
it in userspace using VIOC_FLUSHVOLUME and iterating over /proc/mounts, but
that then raises the question of whether this should be affected by namespaces.

> POSIX_FADV_WILLNEED ?

Perhaps.

> Does AFS allow remote access to devices à la CIFS?

No. :-)

> Could we allow posix_fadvise on O_PATH opens?  For symlinks there is always
> O_NOFOLLOW.

Maybe.  Al?

This doesn't work for mounted-over mountpoints, however.  I guess we could add
AT_NO_FOLLOW_MOUNTS to get the basalmost mountpoint.

David


* Re: [Lsf-pc] [LSF/MM TOPIC] Network filesystem cache management system call
  2017-01-20 17:45   ` David Howells
@ 2017-01-20 18:08     ` Jeff Layton
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2017-01-20 18:08 UTC (permalink / raw)
  To: David Howells; +Cc: J. Bruce Fields, linux-fsdevel, lsf-pc, jaltman

On Fri, 2017-01-20 at 17:45 +0000, David Howells wrote:
> Jeff Layton <jlayton@redhat.com> wrote:
> 
> > I think it might be more useful to wire posix_fadvise into the
> > filesystem drivers somehow. A hinting interface really seems like the
> > right approach here, given the differences between different
> > filesystems.
> 
> The main reason I'm against using an fd-taking interface is that the object to
> be affected might not be a regular file and could even be mounted over.
> 

How would you disambiguate the mounted-over case with a path-based
interface?

> > >  (*) VIOCGETCACHEPARMS
> > > 
> > >      Get the size of the cache.
> > > 
> > 
> > Global or per-inode cache?
> 
> I think this would have to be whatever cache the target inode is lurking
> within.  fscache permits multiple caches on a system.
> 

Ok, but does this tell you "how big is this entire cache?" or "how much
cache does this inode currently consume"?  Both could be useful...

> > >  (*) VIOC_FLUSHVOLUME
> > > 
> > >      Flush all cached state for a volume, both from RAM and local disk cache
> > >      as far as possible.  Files that are open aren't necessarily affected.
> > > 
> > 
> > Maybe POSIX_FADV_DONTNEED on the mountpoint?
> 
> Ugh.  No.  How would you differentiate flushing just the mountpoint or the
> root dir from flushing the volume?  Also, you cannot open the mountpoint
> object if it is mounted over.
> 

Good point.

I don't know...this kind of thing might be better suited to a sysfs-
style interface, honestly. Anything where you're dealing at the level
of an entire fs doesn't benefit much from a per-inode syscall
interface. That said, that could get messy once you start dealing with
namespaces and such.

> Also POSIX_FADV_DONTNEED is a hint that an application no longer needs the
> data and is not a specifically a command to flush that data.
> 
> > >  (*) VIOC_FLUSHALL
> > > 
> > >      Flush all cached state for all volumes.
> > > 
> > 
> > How would you implement that in a generic way? Suppose I have a mix of
> > AFS and NFS mountpoints and issue this via some mechanism. Is
> > everything going to drop their caches?
> > 
> > Might want to punt on this one or do it with a private, AFS-only ioctl.
> 
> Might be worth making it AFS-only.  Possibly it would make sense to implement
> it in userspace using VIOC_FLUSHVOLUME and iterating over /proc/mounts, but
> that then begs the question of whether this should be affected by namespaces.
> 
> > POSIX_FADV_WILLNEED ?
> 
> Perhaps.
> 
> > Does AFS allow remote access to devices à la CIFS?
> 
> No. :-)
> 

I'm not sure I get why it's terribly useful to manipulate the cache on
a symlink or device file itself. There's generally not much cached for
those anyway, right? Rarely more than a single page.

> > Could we allow posix_fadvise on O_PATH opens?  For symlinks there is always
> > O_NOFOLLOW.
> 
> Maybe.  Al?
> 
> This doesn't work for mounted-over mountpoints, however.  I guess we could add
> AT_NO_FOLLOW_MOUNTS to get the basalmost mountpoint.
> 

Yeah, perhaps.
-- 
Jeff Layton <jlayton@redhat.com>


Thread overview: 13+ messages
2017-01-06 22:40 [LSF/MM TOPIC] Network filesystem cache management system call David Howells
2017-01-06 23:18 ` Andreas Dilger
2017-01-07 14:27   ` [Lsf-pc] " Jeff Layton
2017-01-13 17:16 ` J. Bruce Fields
2017-01-15 23:36 ` Oleg Drokin
2017-01-17 16:42 ` David Howells
2017-01-20 16:53   ` [Lsf-pc] " Jeff Layton
2017-01-20 17:45   ` David Howells
2017-01-20 18:08     ` Jeff Layton
2017-01-17 16:49 ` David Howells
2017-01-18 20:12   ` Jeffrey Altman
2017-01-19 14:48   ` David Howells
2017-01-20  4:32     ` Jeffrey Altman
