linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/6] ovl: respect FIEMAP_FLAG_SYNC flag
       [not found] <1535374564-8257-1-git-send-email-amir73il@gmail.com>
@ 2018-08-27 12:55 ` Amir Goldstein
       [not found] ` <1535374564-8257-5-git-send-email-amir73il@gmail.com>
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Amir Goldstein @ 2018-08-27 12:55 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, Dave Chinner, linux-unionfs, linux-fsdevel

Stacked overlayfs fiemap operation broke xfstests that test delayed
allocation (with "_test_generic_punch -d"), because ovl_fiemap()
failed to write dirty pages when requested.

Fixes: 9e142c4102db ("ovl: add ovl_fiemap()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/inode.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index e0bb217c01e2..5014749fd4b4 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -467,6 +467,10 @@ static int ovl_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		return -EOPNOTSUPP;
 
 	old_cred = ovl_override_creds(inode->i_sb);
+
+	if (fieinfo->fi_flags & FIEMAP_FLAG_SYNC)
+		filemap_write_and_wait(realinode->i_mapping);
+
 	err = realinode->i_op->fiemap(realinode, fieinfo, start, len);
 	revert_creds(old_cred);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 4/6] vfs: add the fadvise() file operation
       [not found] ` <1535374564-8257-5-git-send-email-amir73il@gmail.com>
@ 2018-08-27 13:56   ` Steve French
  0 siblings, 0 replies; 8+ messages in thread
From: Steve French @ 2018-08-27 13:56 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Al Viro, Dave Chinner, linux-unionfs, linux-fsdevel

This reminds me of the relatively recent addition of two flags on
write (and one on read) to Windows.  Any thoughts on whether this
could be used to solve one of the problems virtualization engines
(apparently) ran into running over the network and solved on other
platforms by adding per-write flags to change whether the write is
write-through or not and whether the write should be unbuffered (also
allowed for read requests).  See MS-SMB2 section 2.2.21.
This was added to the network file system protocol fairly recently
(dialect 3.02) but I can see how it could be useful for a few
application use case and seems related to the fadvise api, although
perhaps easier to use.  Maybe fadvise could be used by cifs.ko to
properly set these flags on the write (and read) protocol ops.
Thoughts?

Any obvious ways that they could be mapped together.
On Mon, Aug 27, 2018 at 7:55 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> This is going to be used by overlayfs and possibly useful
> for other filesystems.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  Documentation/filesystems/vfs.txt |  3 ++
>  include/linux/fs.h                |  5 +++
>  mm/fadvise.c                      | 78 ++++++++++++++++++++++-----------------
>  3 files changed, 53 insertions(+), 33 deletions(-)
>
> diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> index e0ace944a0e1..4868fa9c0758 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -885,6 +885,7 @@ struct file_operations {
>         ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
>         int (*clone_file_range)(struct file *, loff_t, struct file *, loff_t, u64);
>         int (*dedupe_file_range)(struct file *, loff_t, struct file *, loff_t, u64);
> +       int (*fadvise)(struct file *, loff_t, loff_t, int);
>  };
>
>  Again, all methods are called without any locks being held, unless
> @@ -965,6 +966,8 @@ otherwise noted.
>    dedupe_file_range: called by the ioctl(2) system call for FIDEDUPERANGE
>         command.
>
> +  fadvise: possibly called by the fadvise64() system call.
> +
>  Note that the file operations are implemented by the specific
>  filesystem in which the inode resides. When opening a device node
>  (character or block special) most filesystems will call special
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 33322702c910..6c0b4a1c22ff 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1763,6 +1763,7 @@ struct file_operations {
>                         u64);
>         int (*dedupe_file_range)(struct file *, loff_t, struct file *, loff_t,
>                         u64);
> +       int (*fadvise)(struct file *, loff_t, loff_t, int);
>  } __randomize_layout;
>
>  struct inode_operations {
> @@ -3459,4 +3460,8 @@ static inline bool dir_relax_shared(struct inode *inode)
>  extern bool path_noexec(const struct path *path);
>  extern void inode_nohighmem(struct inode *inode);
>
> +/* mm/fadvise.c */
> +extern int vfs_fadvise(struct file *file, loff_t offset, loff_t len,
> +                      int advice);
> +
>  #endif /* _LINUX_FS_H */
> diff --git a/mm/fadvise.c b/mm/fadvise.c
> index 2d8376e3c640..2f59bac1cb77 100644
> --- a/mm/fadvise.c
> +++ b/mm/fadvise.c
> @@ -27,9 +27,9 @@
>   * deactivate the pages and clear PG_Referenced.
>   */
>
> -int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
> +static int generic_fadvise(struct file *file, loff_t offset, loff_t len,
> +                          int advice)
>  {
> -       struct fd f = fdget(fd);
>         struct inode *inode;
>         struct address_space *mapping;
>         struct backing_dev_info *bdi;
> @@ -37,22 +37,14 @@ int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
>         pgoff_t start_index;
>         pgoff_t end_index;
>         unsigned long nrpages;
> -       int ret = 0;
>
> -       if (!f.file)
> -               return -EBADF;
> +       inode = file_inode(file);
> +       if (S_ISFIFO(inode->i_mode))
> +               return -ESPIPE;
>
> -       inode = file_inode(f.file);
> -       if (S_ISFIFO(inode->i_mode)) {
> -               ret = -ESPIPE;
> -               goto out;
> -       }
> -
> -       mapping = f.file->f_mapping;
> -       if (!mapping || len < 0) {
> -               ret = -EINVAL;
> -               goto out;
> -       }
> +       mapping = file->f_mapping;
> +       if (!mapping || len < 0)
> +               return -EINVAL;
>
>         bdi = inode_to_bdi(mapping->host);
>
> @@ -67,9 +59,9 @@ int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
>                         /* no bad return value, but ignore advice */
>                         break;
>                 default:
> -                       ret = -EINVAL;
> +                       return -EINVAL;
>                 }
> -               goto out;
> +               return 0;
>         }
>
>         /*
> @@ -85,21 +77,21 @@ int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
>
>         switch (advice) {
>         case POSIX_FADV_NORMAL:
> -               f.file->f_ra.ra_pages = bdi->ra_pages;
> -               spin_lock(&f.file->f_lock);
> -               f.file->f_mode &= ~FMODE_RANDOM;
> -               spin_unlock(&f.file->f_lock);
> +               file->f_ra.ra_pages = bdi->ra_pages;
> +               spin_lock(&file->f_lock);
> +               file->f_mode &= ~FMODE_RANDOM;
> +               spin_unlock(&file->f_lock);
>                 break;
>         case POSIX_FADV_RANDOM:
> -               spin_lock(&f.file->f_lock);
> -               f.file->f_mode |= FMODE_RANDOM;
> -               spin_unlock(&f.file->f_lock);
> +               spin_lock(&file->f_lock);
> +               file->f_mode |= FMODE_RANDOM;
> +               spin_unlock(&file->f_lock);
>                 break;
>         case POSIX_FADV_SEQUENTIAL:
> -               f.file->f_ra.ra_pages = bdi->ra_pages * 2;
> -               spin_lock(&f.file->f_lock);
> -               f.file->f_mode &= ~FMODE_RANDOM;
> -               spin_unlock(&f.file->f_lock);
> +               file->f_ra.ra_pages = bdi->ra_pages * 2;
> +               spin_lock(&file->f_lock);
> +               file->f_mode &= ~FMODE_RANDOM;
> +               spin_unlock(&file->f_lock);
>                 break;
>         case POSIX_FADV_WILLNEED:
>                 /* First and last PARTIAL page! */
> @@ -115,8 +107,7 @@ int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
>                  * Ignore return value because fadvise() shall return
>                  * success even if filesystem can't retrieve a hint,
>                  */
> -               force_page_cache_readahead(mapping, f.file, start_index,
> -                                          nrpages);
> +               force_page_cache_readahead(mapping, file, start_index, nrpages);
>                 break;
>         case POSIX_FADV_NOREUSE:
>                 break;
> @@ -183,9 +174,30 @@ int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
>                 }
>                 break;
>         default:
> -               ret = -EINVAL;
> +               return -EINVAL;
>         }
> -out:
> +       return 0;
> +}
> +
> +int vfs_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
> +{
> +       if (file->f_op->fadvise)
> +               return file->f_op->fadvise(file, offset, len, advice);
> +
> +       return generic_fadvise(file, offset, len, advice);
> +}
> +EXPORT_SYMBOL(vfs_fadvise);
> +
> +int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
> +{
> +       struct fd f = fdget(fd);
> +       int ret;
> +
> +       if (!f.file)
> +               return -EBADF;
> +
> +       ret = vfs_fadvise(f.file, offset, len, advice);
> +
>         fdput(f);
>         return ret;
>  }
> --
> 2.7.4
>


-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 6/6] ovl: add ovl_fadvise()
       [not found] ` <1535374564-8257-7-git-send-email-amir73il@gmail.com>
@ 2018-08-27 18:52   ` Vivek Goyal
  2018-08-27 19:05     ` Amir Goldstein
  2019-12-28  5:49   ` Andreas Dilger
  1 sibling, 1 reply; 8+ messages in thread
From: Vivek Goyal @ 2018-08-27 18:52 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Al Viro, Dave Chinner, linux-unionfs, linux-fsdevel

On Mon, Aug 27, 2018 at 03:56:04PM +0300, Amir Goldstein wrote:
> Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2)
> on an overlayfs file.
> 
> Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
> Fixes: d1d04ef8572b ("ovl: stack file ops")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/file.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index a4acd84591d4..42d2d034d85c 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -331,6 +331,23 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
>  	return ret;
>  }
>  
> +int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
> +{
> +	struct fd real;
> +	int ret;
> +
> +	ret = ovl_real_fdget(file, &real);
> +	if (ret)
> +		return ret;
> +
> +	/* XXX: do we need mounter credentials? */

Given we are switching creds to mounter for rest of the file operations,
so I would think we need to do it here as well to be consistent with
this security model.

Thanks
Vivek
> +	ret = vfs_fadvise(real.file, offset, len, advice);
> +
> +	fdput(real);
> +
> +	return ret;
> +}
> +
>  static long ovl_real_ioctl(struct file *file, unsigned int cmd,
>  			   unsigned long arg)
>  {
> @@ -499,6 +516,7 @@ const struct file_operations ovl_file_operations = {
>  	.fsync		= ovl_fsync,
>  	.mmap		= ovl_mmap,
>  	.fallocate	= ovl_fallocate,
> +	.fadvise	= ovl_fadvise,
>  	.unlocked_ioctl	= ovl_ioctl,
>  	.compat_ioctl	= ovl_compat_ioctl,
>  
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 6/6] ovl: add ovl_fadvise()
  2018-08-27 18:52   ` [PATCH v3 6/6] ovl: add ovl_fadvise() Vivek Goyal
@ 2018-08-27 19:05     ` Amir Goldstein
  2018-08-27 19:29       ` Miklos Szeredi
  0 siblings, 1 reply; 8+ messages in thread
From: Amir Goldstein @ 2018-08-27 19:05 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Al Viro, Dave Chinner, overlayfs, linux-fsdevel

On Mon, Aug 27, 2018 at 9:52 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, Aug 27, 2018 at 03:56:04PM +0300, Amir Goldstein wrote:
> > Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2)
> > on an overlayfs file.
> >
> > Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
> > Fixes: d1d04ef8572b ("ovl: stack file ops")
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/overlayfs/file.c | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > index a4acd84591d4..42d2d034d85c 100644
> > --- a/fs/overlayfs/file.c
> > +++ b/fs/overlayfs/file.c
> > @@ -331,6 +331,23 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
> >       return ret;
> >  }
> >
> > +int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
> > +{
> > +     struct fd real;
> > +     int ret;
> > +
> > +     ret = ovl_real_fdget(file, &real);
> > +     if (ret)
> > +             return ret;
> > +
> > +     /* XXX: do we need mounter credentials? */
>
> Given we are switching creds to mounter for rest of the file operations,
> so I would think we need to do it here as well to be consistent with
> this security model.
>

Yeh, I guess so, although I did not see any security checks in
fadvise64(2) syscall. readahead(2) at least checks for FMODE_READ
on the open file fadvise doesn't even bother with that...

Miklos, let me know if you want me to add override_creds or will
you add it yourself.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 6/6] ovl: add ovl_fadvise()
  2018-08-27 19:05     ` Amir Goldstein
@ 2018-08-27 19:29       ` Miklos Szeredi
  0 siblings, 0 replies; 8+ messages in thread
From: Miklos Szeredi @ 2018-08-27 19:29 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Vivek Goyal, Al Viro, Dave Chinner, overlayfs, linux-fsdevel

On Mon, Aug 27, 2018 at 9:05 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Aug 27, 2018 at 9:52 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>>
>> On Mon, Aug 27, 2018 at 03:56:04PM +0300, Amir Goldstein wrote:
>> > Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2)
>> > on an overlayfs file.
>> >
>> > Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
>> > Fixes: d1d04ef8572b ("ovl: stack file ops")
>> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> > ---
>> >  fs/overlayfs/file.c | 18 ++++++++++++++++++
>> >  1 file changed, 18 insertions(+)
>> >
>> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
>> > index a4acd84591d4..42d2d034d85c 100644
>> > --- a/fs/overlayfs/file.c
>> > +++ b/fs/overlayfs/file.c
>> > @@ -331,6 +331,23 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
>> >       return ret;
>> >  }
>> >
>> > +int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
>> > +{
>> > +     struct fd real;
>> > +     int ret;
>> > +
>> > +     ret = ovl_real_fdget(file, &real);
>> > +     if (ret)
>> > +             return ret;
>> > +
>> > +     /* XXX: do we need mounter credentials? */
>>
>> Given we are switching creds to mounter for rest of the file operations,
>> so I would think we need to do it here as well to be consistent with
>> this security model.
>>
>
> Yeh, I guess so, although I did not see any security checks in
> fadvise64(2) syscall. readahead(2) at least checks for FMODE_READ
> on the open file fadvise doesn't even bother with that...
>
> Miklos, let me know if you want me to add override_creds or will
> you add it yourself.

I'll add it.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 5/6] vfs: implement readahead(2) using POSIX_FADV_WILLNEED
       [not found] ` <1535374564-8257-6-git-send-email-amir73il@gmail.com>
@ 2018-08-27 23:30   ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2018-08-27 23:30 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Aug 27, 2018 at 03:56:03PM +0300, Amir Goldstein wrote:
> The implementation of readahead(2) syscall is identical to that of
> fadvise64(POSIX_FADV_WILLNEED) with a few exceptions:
> 1. readahead(2) returns -EINVAL for !mapping->a_ops and fadvise64()
>    ignores the request and returns 0.
> 2. fadvise64() checks for integer overflow corner case
> 3. fadvise64() calls the optional filesystem fadvice() file operation
> 
> Unite the two implementations by calling vfs_fadvice() from readahead(2)
> syscall. Check the !mapping->a_ops in readahead(2) syscall to preserve
> legacy behavior.
> 
> Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
> Fixes: d1d04ef8572b ("ovl: stack file ops")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  mm/readahead.c | 28 +++++++++++-----------------
>  1 file changed, 11 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/readahead.c b/mm/readahead.c
> index a59ea70527b9..867a5cc3a62e 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -20,6 +20,7 @@
>  #include <linux/file.h>
>  #include <linux/mm_inline.h>
>  #include <linux/blk-cgroup.h>
> +#include <linux/fadvise.h>
>  
>  #include "internal.h"
>  
> @@ -576,21 +577,19 @@ page_cache_async_readahead(struct address_space *mapping,
>  EXPORT_SYMBOL_GPL(page_cache_async_readahead);
>  
>  static ssize_t
> -do_readahead(struct address_space *mapping, struct file *filp,
> -	     pgoff_t index, unsigned long nr)
> +do_readahead(struct file *file, loff_t offset, size_t count)
>  {
> -	if (!mapping || !mapping->a_ops)
> -		return -EINVAL;
> +	struct address_space *mapping = file->f_mapping;
>  
>  	/*
> -	 * Readahead doesn't make sense for DAX inodes, but we don't want it
> -	 * to report a failure either.  Instead, we just return success and
> -	 * don't do any work.
> +	 * fadvise() silently ignores an advice for a file with !a_ops and
> +	 * returns -EPIPE for a pipe. Keep this check here to comply with legacy
> +	 * -EINVAL behavior of readahead(2).
>  	 */
> -	if (dax_mapping(mapping))
> -		return 0;
> +	if (!mapping || !mapping->a_ops || !S_ISREG(file_inode(file)->i_mode))
> +		return -EINVAL;
> -	return force_page_cache_readahead(mapping, filp, index, nr);
> +	return vfs_fadvise(file, offset, count, POSIX_FADV_WILLNEED);
>  }
>  
>  ssize_t ksys_readahead(int fd, loff_t offset, size_t count)
> @@ -601,13 +600,8 @@ ssize_t ksys_readahead(int fd, loff_t offset, size_t count)
>  	ret = -EBADF;
>  	f = fdget(fd);
>  	if (f.file) {
> -		if (f.file->f_mode & FMODE_READ) {
> -			struct address_space *mapping = f.file->f_mapping;
> -			pgoff_t start = offset >> PAGE_SHIFT;
> -			pgoff_t end = (offset + count - 1) >> PAGE_SHIFT;
> -			unsigned long len = end - start + 1;
> -			ret = do_readahead(mapping, f.file, start, len);
> -		}
> +		if (f.file->f_mode & FMODE_READ)
> +			ret = do_readahead(f.file, offset, count);
>  		fdput(f);

Can we just get rid of do_readahead helper and call vfs_fadvise()
directly? This code is not shared with anyone and the error
semantics are specific to the readahead syscall, so IMO those three
lines of code should just be in line....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 6/6] ovl: add ovl_fadvise()
       [not found] ` <1535374564-8257-7-git-send-email-amir73il@gmail.com>
  2018-08-27 18:52   ` [PATCH v3 6/6] ovl: add ovl_fadvise() Vivek Goyal
@ 2019-12-28  5:49   ` Andreas Dilger
  2019-12-28 10:10     ` Amir Goldstein
  1 sibling, 1 reply; 8+ messages in thread
From: Andreas Dilger @ 2019-12-28  5:49 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Al Viro, Dave Chinner, linux-unionfs, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1785 bytes --]

On Aug 27, 2018, at 6:56 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> 
> Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2)
> on an overlayfs file.

I was just looking into the existence of the "new" fadvise() method in
the VFS being able to communicate application hints directly to the
filesystem to see if it could be used to address the word size issue in
https://bugzilla.kernel.org/show_bug.cgi?id=205957 without adding a new
syscall, and came across this patch and the 4/6 patch that adds the
vfs_fadvise() function itself (copied below for clarity).

It seems to me that this implementation is broken?  Only vfs_fadvise()
is called from the fadvise64() syscall, and it will call f_op->fadvise()
if the filesystem provides this method.  Only overlayfs provides the
.fadvise method today.  However, it looks that ovl_fadvise() calls back
into vfs_fadvise() again, in a seemingly endless loop?

It seems like generic_fadvise() should be EXPORT_SYMBOL() so that any
filesystem that implements its own .fadvise method can do its own thing,
and then call generic_fadvise() to handle the remaining MM-specific work.

Thoughts?

> +int vfs_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
> +{
> +	if (file->f_op->fadvise)
> +		return file->f_op->fadvise(file, offset, len, advice);
> +
> +	return generic_fadvise(file, offset, len, advice);
> +}
> +EXPORT_SYMBOL(vfs_fadvise);
> 
> +int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
> +{
> +	struct fd real;
> +	int ret;
> +
> +	ret = ovl_real_fdget(file, &real);
> +	if (ret)
> +		return ret;
> +
> +	/* XXX: do we need mounter credentials? */
> +	ret = vfs_fadvise(real.file, offset, len, advice);
> +
> +	fdput(real);
> +
> +	return ret;
> +}

Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 6/6] ovl: add ovl_fadvise()
  2019-12-28  5:49   ` Andreas Dilger
@ 2019-12-28 10:10     ` Amir Goldstein
  0 siblings, 0 replies; 8+ messages in thread
From: Amir Goldstein @ 2019-12-28 10:10 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Miklos Szeredi, Al Viro, Dave Chinner, overlayfs, linux-fsdevel

On Sat, Dec 28, 2019 at 7:49 AM Andreas Dilger <adilger@dilger.ca> wrote:
>
> On Aug 27, 2018, at 6:56 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2)
> > on an overlayfs file.
>
> I was just looking into the existence of the "new" fadvise() method in
> the VFS being able to communicate application hints directly to the
> filesystem to see if it could be used to address the word size issue in
> https://bugzilla.kernel.org/show_bug.cgi?id=205957 without adding a new
> syscall, and came across this patch and the 4/6 patch that adds the
> vfs_fadvise() function itself (copied below for clarity).
>
> It seems to me that this implementation is broken?  Only vfs_fadvise()
> is called from the fadvise64() syscall, and it will call f_op->fadvise()
> if the filesystem provides this method.  Only overlayfs provides the
> .fadvise method today.  However, it looks that ovl_fadvise() calls back
> into vfs_fadvise() again, in a seemingly endless loop?
>

You are confusing endless loop with recursion that has a stop condition.
The entire concept of stacked filesystem is recursion back into vfs.
This is essentially what most of the ovl file operations do, but they recurse
on the "real.file", which is supposed to be on a filesystem with lower
sb->s_stack_depth (FILESYSTEM_MAX_STACK_DEPTH is 2).

> It seems like generic_fadvise() should be EXPORT_SYMBOL() so that any
> filesystem that implements its own .fadvise method can do its own thing,
> and then call generic_fadvise() to handle the remaining MM-specific work.
>
> Thoughts?

Sure makes sense.
Overlayfs just doesn't need to call the generic helper.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-12-28 10:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1535374564-8257-1-git-send-email-amir73il@gmail.com>
2018-08-27 12:55 ` [PATCH v3 1/6] ovl: respect FIEMAP_FLAG_SYNC flag Amir Goldstein
     [not found] ` <1535374564-8257-5-git-send-email-amir73il@gmail.com>
2018-08-27 13:56   ` [PATCH v3 4/6] vfs: add the fadvise() file operation Steve French
     [not found] ` <1535374564-8257-7-git-send-email-amir73il@gmail.com>
2018-08-27 18:52   ` [PATCH v3 6/6] ovl: add ovl_fadvise() Vivek Goyal
2018-08-27 19:05     ` Amir Goldstein
2018-08-27 19:29       ` Miklos Szeredi
2019-12-28  5:49   ` Andreas Dilger
2019-12-28 10:10     ` Amir Goldstein
     [not found] ` <1535374564-8257-6-git-send-email-amir73il@gmail.com>
2018-08-27 23:30   ` [PATCH v3 5/6] vfs: implement readahead(2) using POSIX_FADV_WILLNEED Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).