linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Btrfs <linux-btrfs@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>, Jann Horn <jannh@google.com>,
	Aleksa Sarai <cyphar@cyphar.com>,
	Linux API <linux-api@vger.kernel.org>,
	kernel-team@fb.com
Subject: Re: [PATCH v4 3/9] fs: add RWF_ENCODED for reading/writing compressed data
Date: Sat, 29 Feb 2020 12:40:58 +0200	[thread overview]
Message-ID: <CAOQ4uxi_KRZFiEsDj_yn0f+Zo4tgAkKKcuAp3jiAmB4r7xjiEA@mail.gmail.com> (raw)
In-Reply-To: <4f8b9a66f5f6efdb9cab566581acb292f0b5b528.1582930832.git.osandov@fb.com>

On Sat, Feb 29, 2020 at 1:14 AM Omar Sandoval <osandov@osandov.com> wrote:
>
> From: Omar Sandoval <osandov@fb.com>
>
> Btrfs supports transparent compression: data written by the user can be
> compressed when written to disk and decompressed when read back.
> However, we'd like to add an interface to write pre-compressed data
> directly to the filesystem, and the matching interface to read
> compressed data without decompressing it. This adds support for
> so-called "encoded I/O" via preadv2() and pwritev2().
>
> A new RWF_ENCODED flags indicates that a read or write is "encoded". If
> this flag is set, iov[0].iov_base points to a struct encoded_iov which
> is used for metadata: namely, the compression algorithm, unencoded
> (i.e., decompressed) length, and what subrange of the unencoded data
> should be used (needed for truncated or hole-punched extents and when
> reading in the middle of an extent). For reads, the filesystem returns
> this information; for writes, the caller provides it to the filesystem.
> iov[0].iov_len must be set to sizeof(struct encoded_iov), which can be
> used to extend the interface in the future a la copy_struct_from_user().
> The remaining iovecs contain the encoded extent.
>
> This adds the VFS helpers for supporting encoded I/O and documentation
> for filesystem support.
>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  Documentation/filesystems/encoded_io.rst |  74 ++++++++++
>  Documentation/filesystems/index.rst      |   1 +
>  include/linux/fs.h                       |  16 +++
>  include/uapi/linux/fs.h                  |  33 ++++-
>  mm/filemap.c                             | 166 +++++++++++++++++++++--
>  5 files changed, 276 insertions(+), 14 deletions(-)
>  create mode 100644 Documentation/filesystems/encoded_io.rst
>
> diff --git a/Documentation/filesystems/encoded_io.rst b/Documentation/filesystems/encoded_io.rst
> new file mode 100644
> index 000000000000..50405276d866
> --- /dev/null
> +++ b/Documentation/filesystems/encoded_io.rst
> @@ -0,0 +1,74 @@
> +===========
> +Encoded I/O
> +===========
> +
> +Encoded I/O is a mechanism for reading and writing encoded (e.g., compressed
> +and/or encrypted) data directly from/to the filesystem. The userspace interface
> +is thoroughly described in the :manpage:`encoded_io(7)` man page; this document
> +describes the requirements for filesystem support.
> +
> +First of all, a filesystem supporting encoded I/O must indicate this by setting
> +the ``FMODE_ENCODED_IO`` flag in its ``file_open`` file operation::
> +
> +    static int foo_file_open(struct inode *inode, struct file *filp)
> +    {
> +            ...
> +            filep->f_mode |= FMODE_ENCODED_IO;
> +            ...
> +    }
> +
> +Encoded I/O goes through ``read_iter`` and ``write_iter``, designated by the
> +``IOCB_ENCODED`` flag in ``kiocb->ki_flags``.
> +
> +Reads
> +=====
> +
> +Encoded ``read_iter`` should:
> +
> +1. Call ``generic_encoded_read_checks()`` to validate the file and buffers
> +   provided by userspace.
> +2. Initialize the ``encoded_iov`` appropriately.
> +3. Copy it to the user with ``copy_encoded_iov_to_iter()``.
> +4. Copy the encoded data to the user.
> +5. Advance ``kiocb->ki_pos`` by ``encoded_iov->len``.
> +6. Return the size of the encoded data read, not including the ``encoded_iov``.
> +
> +There are a few details to be aware of:
> +
> +* Encoded ``read_iter`` should support reading unencoded data if the extent is
> +  not encoded.
> +* If the buffers provided by the user are not large enough to contain an entire
> +  encoded extent, then ``read_iter`` should return ``-ENOBUFS``. This is to
> +  avoid confusing userspace with truncated data that cannot be properly
> +  decoded.
> +* Reads in the middle of an encoded extent can be returned by setting
> +  ``encoded_iov->unencoded_offset`` to non-zero.
> +* Truncated unencoded data (e.g., because the file does not end on a block
> +  boundary) may be returned by setting ``encoded_iov->len`` to a value smaller
> +  value than ``encoded_iov->unencoded_len - encoded_iov->unencoded_offset``.
> +
> +Writes
> +======
> +
> +Encoded ``write_iter`` should (in addition to the usual accounting/checks done
> +by ``write_iter``):
> +
> +1. Call ``copy_encoded_iov_from_iter()`` to get and validate the
> +   ``encoded_iov``.
> +2. Call ``generic_encoded_write_checks()`` instead of
> +   ``generic_write_checks()``.
> +3. Check that the provided encoding in ``encoded_iov`` is supported.
> +4. Advance ``kiocb->ki_pos`` by ``encoded_iov->len``.
> +5. Return the size of the encoded data written.
> +
> +Again, there are a few details:
> +
> +* Encoded ``write_iter`` doesn't need to support writing unencoded data.
> +* ``write_iter`` should either write all of the encoded data or none of it; it
> +  must not do partial writes.
> +* ``write_iter`` doesn't need to validate the encoded data; a subsequent read
> +  may return, e.g., ``-EIO`` if the data is not valid.
> +* The user may lie about the unencoded size of the data; a subsequent read
> +  should truncate or zero-extend the unencoded data rather than returning an
> +  error.
> +* Be careful of page cache coherency.
> diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
> index 386eaad008b2..e074a3f1f856 100644
> --- a/Documentation/filesystems/index.rst
> +++ b/Documentation/filesystems/index.rst
> @@ -37,6 +37,7 @@ filesystem implementations.
>     journalling
>     fscrypt
>     fsverity
> +   encoded_io
>
>  Filesystems
>  ===========
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3cd4fe6b845e..aa7efd3430d1 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -175,6 +175,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
>  /* File does not contribute to nr_files count */
>  #define FMODE_NOACCOUNT                ((__force fmode_t)0x20000000)
>
> +/* File supports encoded IO */
> +#define FMODE_ENCODED_IO       ((__force fmode_t)0x40000000)
> +
>  /*
>   * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
>   * that indicates that they should check the contents of the iovec are
> @@ -314,6 +317,7 @@ enum rw_hint {
>  #define IOCB_SYNC              (1 << 5)
>  #define IOCB_WRITE             (1 << 6)
>  #define IOCB_NOWAIT            (1 << 7)
> +#define IOCB_ENCODED           (1 << 8)
>
>  struct kiocb {
>         struct file             *ki_filp;
> @@ -3109,6 +3113,13 @@ extern int sb_min_blocksize(struct super_block *, int);
>  extern int generic_file_mmap(struct file *, struct vm_area_struct *);
>  extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
>  extern ssize_t generic_write_checks(struct kiocb *, struct iov_iter *);
> +struct encoded_iov;
> +extern int generic_encoded_write_checks(struct kiocb *,
> +                                       const struct encoded_iov *);
> +extern int copy_encoded_iov_from_iter(struct encoded_iov *, struct iov_iter *);
> +extern ssize_t generic_encoded_read_checks(struct kiocb *, struct iov_iter *);
> +extern int copy_encoded_iov_to_iter(const struct encoded_iov *,
> +                                   struct iov_iter *);
>  extern int generic_remap_checks(struct file *file_in, loff_t pos_in,
>                                 struct file *file_out, loff_t pos_out,
>                                 loff_t *count, unsigned int remap_flags);
> @@ -3434,6 +3445,11 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
>                         return -EOPNOTSUPP;
>                 ki->ki_flags |= IOCB_NOWAIT;
>         }
> +       if (flags & RWF_ENCODED) {
> +               if (!(ki->ki_filp->f_mode & FMODE_ENCODED_IO))
> +                       return -EOPNOTSUPP;
> +               ki->ki_flags |= IOCB_ENCODED;
> +       }
>         if (flags & RWF_HIPRI)
>                 ki->ki_flags |= IOCB_HIPRI;
>         if (flags & RWF_DSYNC)
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 379a612f8f1d..f8c6c1e08def 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -278,6 +278,34 @@ struct fsxattr {
>                                          SYNC_FILE_RANGE_WAIT_BEFORE | \
>                                          SYNC_FILE_RANGE_WAIT_AFTER)
>
> +enum {
> +       ENCODED_IOV_COMPRESSION_NONE,
> +#define ENCODED_IOV_COMPRESSION_NONE ENCODED_IOV_COMPRESSION_NONE
> +       ENCODED_IOV_COMPRESSION_ZLIB,
> +#define ENCODED_IOV_COMPRESSION_ZLIB ENCODED_IOV_COMPRESSION_ZLIB
> +       ENCODED_IOV_COMPRESSION_LZO,
> +#define ENCODED_IOV_COMPRESSION_LZO ENCODED_IOV_COMPRESSION_LZO
> +       ENCODED_IOV_COMPRESSION_ZSTD,
> +#define ENCODED_IOV_COMPRESSION_ZSTD ENCODED_IOV_COMPRESSION_ZSTD
> +       ENCODED_IOV_COMPRESSION_TYPES = ENCODED_IOV_COMPRESSION_ZSTD,
> +};
> +
> +enum {
> +       ENCODED_IOV_ENCRYPTION_NONE,
> +#define ENCODED_IOV_ENCRYPTION_NONE ENCODED_IOV_ENCRYPTION_NONE
> +       ENCODED_IOV_ENCRYPTION_TYPES = ENCODED_IOV_ENCRYPTION_NONE,
> +};
> +

What are those defines???

> +struct encoded_iov {
> +       __aligned_u64 len;
> +       __aligned_u64 unencoded_len;
> +       __aligned_u64 unencoded_offset;
> +       __u32 compression;
> +       __u32 encryption;
> +};
> +
[...]

> +/**
> + * copy_encoded_iov_from_iter() - copy a &struct encoded_iov from userspace
> + * @encoded: Returned encoding metadata.
> + * @from: Source iterator.
> + *
> + * This copies in the &struct encoded_iov and does some basic sanity checks.
> + * This should always be used rather than a plain copy_from_iter(), as it does
> + * the proper handling for backward- and forward-compatibility.
> + *
> + * Return: 0 on success, -EFAULT if access to userspace failed, -E2BIG if the
> + *         copied structure contained non-zero fields that this kernel doesn't
> + *         support, -EINVAL if the copied structure was invalid.
> + */
> +int copy_encoded_iov_from_iter(struct encoded_iov *encoded,
> +                              struct iov_iter *from)
> +{
> +       size_t usize;
> +       int ret;
> +
> +       usize = iov_iter_single_seg_count(from);
> +       if (usize > PAGE_SIZE)
> +               return -E2BIG;
> +       if (usize < ENCODED_IOV_SIZE_VER0)
> +               return -EINVAL;
> +       ret = copy_struct_from_iter(encoded, sizeof(*encoded), from, usize);
> +       if (ret)
> +               return ret;
> +
> +       if (encoded->compression == ENCODED_IOV_COMPRESSION_NONE &&
> +           encoded->encryption == ENCODED_IOV_ENCRYPTION_NONE)
> +               return -EINVAL;
> +       if (encoded->compression > ENCODED_IOV_COMPRESSION_TYPES ||
> +           encoded->encryption > ENCODED_IOV_ENCRYPTION_TYPES)
> +               return -EINVAL;
> +       if (encoded->unencoded_offset > encoded->unencoded_len)
> +               return -EINVAL;
> +       if (encoded->len > encoded->unencoded_len - encoded->unencoded_offset)
> +               return -EINVAL;
> +       return 0;
> +}

Repeating my comment from man page review:
It would be nice if a more granular error ENCODED_IOV_ERR_XXX code could be
set in the  encoded_iov struct.

Thanks,
Amir.

  reply	other threads:[~2020-02-29 10:41 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-28 23:13 [PATCH v4 0/9] fs: interface for directly reading/writing compressed data Omar Sandoval
2020-02-28 23:13 ` [PATCH man-pages v4] Document encoded I/O Omar Sandoval
2020-02-29 10:28   ` Amir Goldstein
2020-02-29 18:03     ` Omar Sandoval
2020-03-01  7:26       ` Amir Goldstein
2020-03-11  8:47         ` Omar Sandoval
2020-04-16 12:26   ` Michael Kerrisk (man-pages)
2020-04-16 17:02     ` Omar Sandoval
2020-04-16 20:39       ` Michael Kerrisk (man-pages)
2020-02-28 23:13 ` [PATCH v4 1/9] iov_iter: add copy_struct_from_iter() Omar Sandoval
2020-02-28 23:13 ` [PATCH v4 2/9] fs: add O_ALLOW_ENCODED open flag Omar Sandoval
2020-02-29 10:44   ` Amir Goldstein
2020-02-28 23:13 ` [PATCH v4 3/9] fs: add RWF_ENCODED for reading/writing compressed data Omar Sandoval
2020-02-29 10:40   ` Amir Goldstein [this message]
2020-02-29 18:10     ` Omar Sandoval
2020-02-28 23:13 ` [PATCH v4 4/9] btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio() Omar Sandoval
2020-02-28 23:13 ` [PATCH v4 5/9] btrfs: add ram_bytes and offset to btrfs_ordered_extent Omar Sandoval
2020-02-28 23:13 ` [PATCH v4 6/9] btrfs: support different disk extent size for delalloc Omar Sandoval
2020-02-28 23:13 ` [PATCH v4 7/9] btrfs: optionally extend i_size in cow_file_range_inline() Omar Sandoval
2020-02-28 23:14 ` [PATCH v4 8/9] btrfs: implement RWF_ENCODED reads Omar Sandoval
2020-02-28 23:14 ` [PATCH v4 9/9] btrfs: implement RWF_ENCODED writes Omar Sandoval

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOQ4uxi_KRZFiEsDj_yn0f+Zo4tgAkKKcuAp3jiAmB4r7xjiEA@mail.gmail.com \
    --to=amir73il@gmail.com \
    --cc=cyphar@cyphar.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jannh@google.com \
    --cc=kernel-team@fb.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=osandov@osandov.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).