linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steve French <smfrench@gmail.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Dave Chinner <david@fromorbit.com>,
	linux-unionfs@vger.kernel.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v3 4/6] vfs: add the fadvise() file operation
Date: Mon, 27 Aug 2018 08:56:55 -0500	[thread overview]
Message-ID: <CAH2r5mtsEZ_RiuuCFFYcvQ8UxEW31Vmip8no0GnoNWC9xuo+Lg@mail.gmail.com> (raw)
In-Reply-To: <1535374564-8257-5-git-send-email-amir73il@gmail.com>

This reminds me of the relatively recent addition of two flags on
write (and one on read) to Windows.  Any thoughts on whether this
could be used to solve one of the problems virtualization engines
(apparently) ran into running over the network and solved on other
platforms by adding per-write flags to change whether the write is
write-through or not and whether the write should be unbuffered (also
allowed for read requests).  See MS-SMB2 section 2.2.21.
This was added to the network file system protocol fairly recently
(dialect 3.02) but I can see how it could be useful for a few
application use case and seems related to the fadvise api, although
perhaps easier to use.  Maybe fadvise could be used by cifs.ko to
properly set these flags on the write (and read) protocol ops.
Thoughts?

Any obvious ways that they could be mapped together.
On Mon, Aug 27, 2018 at 7:55 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> This is going to be used by overlayfs and possibly useful
> for other filesystems.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  Documentation/filesystems/vfs.txt |  3 ++
>  include/linux/fs.h                |  5 +++
>  mm/fadvise.c                      | 78 ++++++++++++++++++++++-----------------
>  3 files changed, 53 insertions(+), 33 deletions(-)
>
> diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> index e0ace944a0e1..4868fa9c0758 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -885,6 +885,7 @@ struct file_operations {
>         ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
>         int (*clone_file_range)(struct file *, loff_t, struct file *, loff_t, u64);
>         int (*dedupe_file_range)(struct file *, loff_t, struct file *, loff_t, u64);
> +       int (*fadvise)(struct file *, loff_t, loff_t, int);
>  };
>
>  Again, all methods are called without any locks being held, unless
> @@ -965,6 +966,8 @@ otherwise noted.
>    dedupe_file_range: called by the ioctl(2) system call for FIDEDUPERANGE
>         command.
>
> +  fadvise: possibly called by the fadvise64() system call.
> +
>  Note that the file operations are implemented by the specific
>  filesystem in which the inode resides. When opening a device node
>  (character or block special) most filesystems will call special
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 33322702c910..6c0b4a1c22ff 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1763,6 +1763,7 @@ struct file_operations {
>                         u64);
>         int (*dedupe_file_range)(struct file *, loff_t, struct file *, loff_t,
>                         u64);
> +       int (*fadvise)(struct file *, loff_t, loff_t, int);
>  } __randomize_layout;
>
>  struct inode_operations {
> @@ -3459,4 +3460,8 @@ static inline bool dir_relax_shared(struct inode *inode)
>  extern bool path_noexec(const struct path *path);
>  extern void inode_nohighmem(struct inode *inode);
>
> +/* mm/fadvise.c */
> +extern int vfs_fadvise(struct file *file, loff_t offset, loff_t len,
> +                      int advice);
> +
>  #endif /* _LINUX_FS_H */
> diff --git a/mm/fadvise.c b/mm/fadvise.c
> index 2d8376e3c640..2f59bac1cb77 100644
> --- a/mm/fadvise.c
> +++ b/mm/fadvise.c
> @@ -27,9 +27,9 @@
>   * deactivate the pages and clear PG_Referenced.
>   */
>
> -int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
> +static int generic_fadvise(struct file *file, loff_t offset, loff_t len,
> +                          int advice)
>  {
> -       struct fd f = fdget(fd);
>         struct inode *inode;
>         struct address_space *mapping;
>         struct backing_dev_info *bdi;
> @@ -37,22 +37,14 @@ int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
>         pgoff_t start_index;
>         pgoff_t end_index;
>         unsigned long nrpages;
> -       int ret = 0;
>
> -       if (!f.file)
> -               return -EBADF;
> +       inode = file_inode(file);
> +       if (S_ISFIFO(inode->i_mode))
> +               return -ESPIPE;
>
> -       inode = file_inode(f.file);
> -       if (S_ISFIFO(inode->i_mode)) {
> -               ret = -ESPIPE;
> -               goto out;
> -       }
> -
> -       mapping = f.file->f_mapping;
> -       if (!mapping || len < 0) {
> -               ret = -EINVAL;
> -               goto out;
> -       }
> +       mapping = file->f_mapping;
> +       if (!mapping || len < 0)
> +               return -EINVAL;
>
>         bdi = inode_to_bdi(mapping->host);
>
> @@ -67,9 +59,9 @@ int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
>                         /* no bad return value, but ignore advice */
>                         break;
>                 default:
> -                       ret = -EINVAL;
> +                       return -EINVAL;
>                 }
> -               goto out;
> +               return 0;
>         }
>
>         /*
> @@ -85,21 +77,21 @@ int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
>
>         switch (advice) {
>         case POSIX_FADV_NORMAL:
> -               f.file->f_ra.ra_pages = bdi->ra_pages;
> -               spin_lock(&f.file->f_lock);
> -               f.file->f_mode &= ~FMODE_RANDOM;
> -               spin_unlock(&f.file->f_lock);
> +               file->f_ra.ra_pages = bdi->ra_pages;
> +               spin_lock(&file->f_lock);
> +               file->f_mode &= ~FMODE_RANDOM;
> +               spin_unlock(&file->f_lock);
>                 break;
>         case POSIX_FADV_RANDOM:
> -               spin_lock(&f.file->f_lock);
> -               f.file->f_mode |= FMODE_RANDOM;
> -               spin_unlock(&f.file->f_lock);
> +               spin_lock(&file->f_lock);
> +               file->f_mode |= FMODE_RANDOM;
> +               spin_unlock(&file->f_lock);
>                 break;
>         case POSIX_FADV_SEQUENTIAL:
> -               f.file->f_ra.ra_pages = bdi->ra_pages * 2;
> -               spin_lock(&f.file->f_lock);
> -               f.file->f_mode &= ~FMODE_RANDOM;
> -               spin_unlock(&f.file->f_lock);
> +               file->f_ra.ra_pages = bdi->ra_pages * 2;
> +               spin_lock(&file->f_lock);
> +               file->f_mode &= ~FMODE_RANDOM;
> +               spin_unlock(&file->f_lock);
>                 break;
>         case POSIX_FADV_WILLNEED:
>                 /* First and last PARTIAL page! */
> @@ -115,8 +107,7 @@ int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
>                  * Ignore return value because fadvise() shall return
>                  * success even if filesystem can't retrieve a hint,
>                  */
> -               force_page_cache_readahead(mapping, f.file, start_index,
> -                                          nrpages);
> +               force_page_cache_readahead(mapping, file, start_index, nrpages);
>                 break;
>         case POSIX_FADV_NOREUSE:
>                 break;
> @@ -183,9 +174,30 @@ int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
>                 }
>                 break;
>         default:
> -               ret = -EINVAL;
> +               return -EINVAL;
>         }
> -out:
> +       return 0;
> +}
> +
> +int vfs_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
> +{
> +       if (file->f_op->fadvise)
> +               return file->f_op->fadvise(file, offset, len, advice);
> +
> +       return generic_fadvise(file, offset, len, advice);
> +}
> +EXPORT_SYMBOL(vfs_fadvise);
> +
> +int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
> +{
> +       struct fd f = fdget(fd);
> +       int ret;
> +
> +       if (!f.file)
> +               return -EBADF;
> +
> +       ret = vfs_fadvise(f.file, offset, len, advice);
> +
>         fdput(f);
>         return ret;
>  }
> --
> 2.7.4
>


-- 
Thanks,

Steve

  reply	other threads:[~2018-08-27 13:56 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-27 12:55 [PATCH v3 0/6] Overlayfs stacked f_op fixes Amir Goldstein
2018-08-27 12:55 ` [PATCH v3 1/6] ovl: respect FIEMAP_FLAG_SYNC flag Amir Goldstein
2018-08-27 12:56 ` [PATCH v3 2/6] ovl: fix GPF in swapfile_activate of file from overlayfs over xfs Amir Goldstein
2018-08-27 12:56 ` [PATCH v3 3/6] Documentation/filesystems: update documentation of file_operations to kernel 4.18 Amir Goldstein
2018-08-27 12:56 ` [PATCH v3 4/6] vfs: add the fadvise() file operation Amir Goldstein
2018-08-27 13:56   ` Steve French [this message]
2018-08-27 12:56 ` [PATCH v3 5/6] vfs: implement readahead(2) using POSIX_FADV_WILLNEED Amir Goldstein
2018-08-27 23:30   ` Dave Chinner
2018-08-27 12:56 ` [PATCH v3 6/6] ovl: add ovl_fadvise() Amir Goldstein
2018-08-27 18:52   ` Vivek Goyal
2018-08-27 19:05     ` Amir Goldstein
2018-08-27 19:29       ` Miklos Szeredi
2019-12-28  5:49   ` Andreas Dilger
2019-12-28 10:10     ` Amir Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAH2r5mtsEZ_RiuuCFFYcvQ8UxEW31Vmip8no0GnoNWC9xuo+Lg@mail.gmail.com \
    --to=smfrench@gmail.com \
    --cc=amir73il@gmail.com \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).