All of lore.kernel.org
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Btrfs <linux-btrfs@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>, Jann Horn <jannh@google.com>,
	Aleksa Sarai <cyphar@cyphar.com>,
	Linux API <linux-api@vger.kernel.org>,
	kernel-team@fb.com
Subject: Re: [PATCH v4 3/9] fs: add RWF_ENCODED for reading/writing compressed data
Date: Sat, 29 Feb 2020 12:40:58 +0200	[thread overview]
Message-ID: <CAOQ4uxi_KRZFiEsDj_yn0f+Zo4tgAkKKcuAp3jiAmB4r7xjiEA@mail.gmail.com> (raw)
In-Reply-To: <4f8b9a66f5f6efdb9cab566581acb292f0b5b528.1582930832.git.osandov@fb.com>

On Sat, Feb 29, 2020 at 1:14 AM Omar Sandoval <osandov@osandov.com> wrote:
>
> From: Omar Sandoval <osandov@fb.com>
>
> Btrfs supports transparent compression: data written by the user can be
> compressed when written to disk and decompressed when read back.
> However, we'd like to add an interface to write pre-compressed data
> directly to the filesystem, and the matching interface to read
> compressed data without decompressing it. This adds support for
> so-called "encoded I/O" via preadv2() and pwritev2().
>
> A new RWF_ENCODED flags indicates that a read or write is "encoded". If
> this flag is set, iov[0].iov_base points to a struct encoded_iov which
> is used for metadata: namely, the compression algorithm, unencoded
> (i.e., decompressed) length, and what subrange of the unencoded data
> should be used (needed for truncated or hole-punched extents and when
> reading in the middle of an extent). For reads, the filesystem returns
> this information; for writes, the caller provides it to the filesystem.
> iov[0].iov_len must be set to sizeof(struct encoded_iov), which can be
> used to extend the interface in the future a la copy_struct_from_user().
> The remaining iovecs contain the encoded extent.
>
> This adds the VFS helpers for supporting encoded I/O and documentation
> for filesystem support.
>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  Documentation/filesystems/encoded_io.rst |  74 ++++++++++
>  Documentation/filesystems/index.rst      |   1 +
>  include/linux/fs.h                       |  16 +++
>  include/uapi/linux/fs.h                  |  33 ++++-
>  mm/filemap.c                             | 166 +++++++++++++++++++++--
>  5 files changed, 276 insertions(+), 14 deletions(-)
>  create mode 100644 Documentation/filesystems/encoded_io.rst
>
> diff --git a/Documentation/filesystems/encoded_io.rst b/Documentation/filesystems/encoded_io.rst
> new file mode 100644
> index 000000000000..50405276d866
> --- /dev/null
> +++ b/Documentation/filesystems/encoded_io.rst
> @@ -0,0 +1,74 @@
> +===========
> +Encoded I/O
> +===========
> +
> +Encoded I/O is a mechanism for reading and writing encoded (e.g., compressed
> +and/or encrypted) data directly from/to the filesystem. The userspace interface
> +is thoroughly described in the :manpage:`encoded_io(7)` man page; this document
> +describes the requirements for filesystem support.
> +
> +First of all, a filesystem supporting encoded I/O must indicate this by setting
> +the ``FMODE_ENCODED_IO`` flag in its ``file_open`` file operation::
> +
> +    static int foo_file_open(struct inode *inode, struct file *filp)
> +    {
> +            ...
> +            filep->f_mode |= FMODE_ENCODED_IO;
> +            ...
> +    }
> +
> +Encoded I/O goes through ``read_iter`` and ``write_iter``, designated by the
> +``IOCB_ENCODED`` flag in ``kiocb->ki_flags``.
> +
> +Reads
> +=====
> +
> +Encoded ``read_iter`` should:
> +
> +1. Call ``generic_encoded_read_checks()`` to validate the file and buffers
> +   provided by userspace.
> +2. Initialize the ``encoded_iov`` appropriately.
> +3. Copy it to the user with ``copy_encoded_iov_to_iter()``.
> +4. Copy the encoded data to the user.
> +5. Advance ``kiocb->ki_pos`` by ``encoded_iov->len``.
> +6. Return the size of the encoded data read, not including the ``encoded_iov``.
> +
> +There are a few details to be aware of:
> +
> +* Encoded ``read_iter`` should support reading unencoded data if the extent is
> +  not encoded.
> +* If the buffers provided by the user are not large enough to contain an entire
> +  encoded extent, then ``read_iter`` should return ``-ENOBUFS``. This is to
> +  avoid confusing userspace with truncated data that cannot be properly
> +  decoded.
> +* Reads in the middle of an encoded extent can be returned by setting
> +  ``encoded_iov->unencoded_offset`` to non-zero.
> +* Truncated unencoded data (e.g., because the file does not end on a block
> +  boundary) may be returned by setting ``encoded_iov->len`` to a value smaller
> +  value than ``encoded_iov->unencoded_len - encoded_iov->unencoded_offset``.
> +
> +Writes
> +======
> +
> +Encoded ``write_iter`` should (in addition to the usual accounting/checks done
> +by ``write_iter``):
> +
> +1. Call ``copy_encoded_iov_from_iter()`` to get and validate the
> +   ``encoded_iov``.
> +2. Call ``generic_encoded_write_checks()`` instead of
> +   ``generic_write_checks()``.
> +3. Check that the provided encoding in ``encoded_iov`` is supported.
> +4. Advance ``kiocb->ki_pos`` by ``encoded_iov->len``.
> +5. Return the size of the encoded data written.
> +
> +Again, there are a few details:
> +
> +* Encoded ``write_iter`` doesn't need to support writing unencoded data.
> +* ``write_iter`` should either write all of the encoded data or none of it; it
> +  must not do partial writes.
> +* ``write_iter`` doesn't need to validate the encoded data; a subsequent read
> +  may return, e.g., ``-EIO`` if the data is not valid.
> +* The user may lie about the unencoded size of the data; a subsequent read
> +  should truncate or zero-extend the unencoded data rather than returning an
> +  error.
> +* Be careful of page cache coherency.
> diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
> index 386eaad008b2..e074a3f1f856 100644
> --- a/Documentation/filesystems/index.rst
> +++ b/Documentation/filesystems/index.rst
> @@ -37,6 +37,7 @@ filesystem implementations.
>     journalling
>     fscrypt
>     fsverity
> +   encoded_io
>
>  Filesystems
>  ===========
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3cd4fe6b845e..aa7efd3430d1 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -175,6 +175,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
>  /* File does not contribute to nr_files count */
>  #define FMODE_NOACCOUNT                ((__force fmode_t)0x20000000)
>
> +/* File supports encoded IO */
> +#define FMODE_ENCODED_IO       ((__force fmode_t)0x40000000)
> +
>  /*
>   * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
>   * that indicates that they should check the contents of the iovec are
> @@ -314,6 +317,7 @@ enum rw_hint {
>  #define IOCB_SYNC              (1 << 5)
>  #define IOCB_WRITE             (1 << 6)
>  #define IOCB_NOWAIT            (1 << 7)
> +#define IOCB_ENCODED           (1 << 8)
>
>  struct kiocb {
>         struct file             *ki_filp;
> @@ -3109,6 +3113,13 @@ extern int sb_min_blocksize(struct super_block *, int);
>  extern int generic_file_mmap(struct file *, struct vm_area_struct *);
>  extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
>  extern ssize_t generic_write_checks(struct kiocb *, struct iov_iter *);
> +struct encoded_iov;
> +extern int generic_encoded_write_checks(struct kiocb *,
> +                                       const struct encoded_iov *);
> +extern int copy_encoded_iov_from_iter(struct encoded_iov *, struct iov_iter *);
> +extern ssize_t generic_encoded_read_checks(struct kiocb *, struct iov_iter *);
> +extern int copy_encoded_iov_to_iter(const struct encoded_iov *,
> +                                   struct iov_iter *);
>  extern int generic_remap_checks(struct file *file_in, loff_t pos_in,
>                                 struct file *file_out, loff_t pos_out,
>                                 loff_t *count, unsigned int remap_flags);
> @@ -3434,6 +3445,11 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
>                         return -EOPNOTSUPP;
>                 ki->ki_flags |= IOCB_NOWAIT;
>         }
> +       if (flags & RWF_ENCODED) {
> +               if (!(ki->ki_filp->f_mode & FMODE_ENCODED_IO))
> +                       return -EOPNOTSUPP;
> +               ki->ki_flags |= IOCB_ENCODED;
> +       }
>         if (flags & RWF_HIPRI)
>                 ki->ki_flags |= IOCB_HIPRI;
>         if (flags & RWF_DSYNC)
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 379a612f8f1d..f8c6c1e08def 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -278,6 +278,34 @@ struct fsxattr {
>                                          SYNC_FILE_RANGE_WAIT_BEFORE | \
>                                          SYNC_FILE_RANGE_WAIT_AFTER)
>
> +enum {
> +       ENCODED_IOV_COMPRESSION_NONE,
> +#define ENCODED_IOV_COMPRESSION_NONE ENCODED_IOV_COMPRESSION_NONE
> +       ENCODED_IOV_COMPRESSION_ZLIB,
> +#define ENCODED_IOV_COMPRESSION_ZLIB ENCODED_IOV_COMPRESSION_ZLIB
> +       ENCODED_IOV_COMPRESSION_LZO,
> +#define ENCODED_IOV_COMPRESSION_LZO ENCODED_IOV_COMPRESSION_LZO
> +       ENCODED_IOV_COMPRESSION_ZSTD,
> +#define ENCODED_IOV_COMPRESSION_ZSTD ENCODED_IOV_COMPRESSION_ZSTD
> +       ENCODED_IOV_COMPRESSION_TYPES = ENCODED_IOV_COMPRESSION_ZSTD,
> +};
> +
> +enum {
> +       ENCODED_IOV_ENCRYPTION_NONE,
> +#define ENCODED_IOV_ENCRYPTION_NONE ENCODED_IOV_ENCRYPTION_NONE
> +       ENCODED_IOV_ENCRYPTION_TYPES = ENCODED_IOV_ENCRYPTION_NONE,
> +};
> +

What are those defines???

> +struct encoded_iov {
> +       __aligned_u64 len;
> +       __aligned_u64 unencoded_len;
> +       __aligned_u64 unencoded_offset;
> +       __u32 compression;
> +       __u32 encryption;
> +};
> +
[...]

> +/**
> + * copy_encoded_iov_from_iter() - copy a &struct encoded_iov from userspace
> + * @encoded: Returned encoding metadata.
> + * @from: Source iterator.
> + *
> + * This copies in the &struct encoded_iov and does some basic sanity checks.
> + * This should always be used rather than a plain copy_from_iter(), as it does
> + * the proper handling for backward- and forward-compatibility.
> + *
> + * Return: 0 on success, -EFAULT if access to userspace failed, -E2BIG if the
> + *         copied structure contained non-zero fields that this kernel doesn't
> + *         support, -EINVAL if the copied structure was invalid.
> + */
> +int copy_encoded_iov_from_iter(struct encoded_iov *encoded,
> +                              struct iov_iter *from)
> +{
> +       size_t usize;
> +       int ret;
> +
> +       usize = iov_iter_single_seg_count(from);
> +       if (usize > PAGE_SIZE)
> +               return -E2BIG;
> +       if (usize < ENCODED_IOV_SIZE_VER0)
> +               return -EINVAL;
> +       ret = copy_struct_from_iter(encoded, sizeof(*encoded), from, usize);
> +       if (ret)
> +               return ret;
> +
> +       if (encoded->compression == ENCODED_IOV_COMPRESSION_NONE &&
> +           encoded->encryption == ENCODED_IOV_ENCRYPTION_NONE)
> +               return -EINVAL;
> +       if (encoded->compression > ENCODED_IOV_COMPRESSION_TYPES ||
> +           encoded->encryption > ENCODED_IOV_ENCRYPTION_TYPES)
> +               return -EINVAL;
> +       if (encoded->unencoded_offset > encoded->unencoded_len)
> +               return -EINVAL;
> +       if (encoded->len > encoded->unencoded_len - encoded->unencoded_offset)
> +               return -EINVAL;
> +       return 0;
> +}

Repeating my comment from man page review:
It would be nice if a more granular error ENCODED_IOV_ERR_XXX code could be
set in the  encoded_iov struct.

Thanks,
Amir.

  reply	other threads:[~2020-02-29 10:41 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-28 23:13 [PATCH v4 0/9] fs: interface for directly reading/writing compressed data Omar Sandoval
2020-02-28 23:13 ` [PATCH man-pages v4] Document encoded I/O Omar Sandoval
2020-02-29 10:28   ` Amir Goldstein
2020-02-29 18:03     ` Omar Sandoval
2020-03-01  7:26       ` Amir Goldstein
2020-03-11  8:47         ` Omar Sandoval
2020-04-16 12:26   ` Michael Kerrisk (man-pages)
2020-04-16 17:02     ` Omar Sandoval
2020-04-16 20:39       ` Michael Kerrisk (man-pages)
2020-02-28 23:13 ` [PATCH v4 1/9] iov_iter: add copy_struct_from_iter() Omar Sandoval
2020-02-28 23:13 ` [PATCH v4 2/9] fs: add O_ALLOW_ENCODED open flag Omar Sandoval
2020-02-29 10:44   ` Amir Goldstein
2020-02-28 23:13 ` [PATCH v4 3/9] fs: add RWF_ENCODED for reading/writing compressed data Omar Sandoval
2020-02-29 10:40   ` Amir Goldstein [this message]
2020-02-29 18:10     ` Omar Sandoval
2020-02-28 23:13 ` [PATCH v4 4/9] btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio() Omar Sandoval
2020-02-28 23:13 ` [PATCH v4 5/9] btrfs: add ram_bytes and offset to btrfs_ordered_extent Omar Sandoval
2020-02-28 23:13 ` [PATCH v4 6/9] btrfs: support different disk extent size for delalloc Omar Sandoval
2020-02-28 23:13 ` [PATCH v4 7/9] btrfs: optionally extend i_size in cow_file_range_inline() Omar Sandoval
2020-02-28 23:14 ` [PATCH v4 8/9] btrfs: implement RWF_ENCODED reads Omar Sandoval
2020-02-28 23:14 ` [PATCH v4 9/9] btrfs: implement RWF_ENCODED writes Omar Sandoval

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOQ4uxi_KRZFiEsDj_yn0f+Zo4tgAkKKcuAp3jiAmB4r7xjiEA@mail.gmail.com \
    --to=amir73il@gmail.com \
    --cc=cyphar@cyphar.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jannh@google.com \
    --cc=kernel-team@fb.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=osandov@osandov.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.