linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Omar Sandoval <osandov@osandov.com>
To: Jann Horn <jannh@google.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>, Jens Axboe <axboe@kernel.dk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-btrfs@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
	Linux API <linux-api@vger.kernel.org>,
	Kernel Team <kernel-team@fb.com>,
	Andy Lutomirski <luto@kernel.org>
Subject: Re: [RFC PATCH 2/3] fs: add RWF_ENCODED for writing compressed data
Date: Tue, 24 Sep 2019 13:38:27 -0700	[thread overview]
Message-ID: <20190924203827.GA46556@vader> (raw)
In-Reply-To: <CAG48ez1NQBNR1XeVQYGoopEk=g_KedUr+7jxLQTaO+V8JCeweQ@mail.gmail.com>

On Tue, Sep 24, 2019 at 10:01:41PM +0200, Jann Horn wrote:
> On Tue, Sep 24, 2019 at 9:35 PM Omar Sandoval <osandov@osandov.com> wrote:
> > On Tue, Sep 24, 2019 at 10:15:13AM -0700, Omar Sandoval wrote:
> > > On Thu, Sep 19, 2019 at 05:44:12PM +0200, Jann Horn wrote:
> > > > On Thu, Sep 19, 2019 at 8:54 AM Omar Sandoval <osandov@osandov.com> wrote:
> > > > > Btrfs can transparently compress data written by the user. However, we'd
> > > > > like to add an interface to write pre-compressed data directly to the
> > > > > filesystem. This adds support for so-called "encoded writes" via
> > > > > pwritev2().
> > > > >
> > > > > A new RWF_ENCODED flags indicates that a write is "encoded". If this
> > > > > flag is set, iov[0].iov_base points to a struct encoded_iov which
> > > > > contains metadata about the write: namely, the compression algorithm and
> > > > > the unencoded (i.e., decompressed) length of the extent. iov[0].iov_len
> > > > > must be set to sizeof(struct encoded_iov), which can be used to extend
> > > > > the interface in the future. The remaining iovecs contain the encoded
> > > > > extent.
> > > > >
> > > > > A similar interface for reading encoded data can be added to preadv2()
> > > > > in the future.
> > > > >
> > > > > Filesystems must indicate that they support encoded writes by setting
> > > > > FMODE_ENCODED_IO in ->file_open().
> > > > [...]
> > > > > +int import_encoded_write(struct kiocb *iocb, struct encoded_iov *encoded,
> > > > > +                        struct iov_iter *from)
> > > > > +{
> > > > > +       if (iov_iter_single_seg_count(from) != sizeof(*encoded))
> > > > > +               return -EINVAL;
> > > > > +       if (copy_from_iter(encoded, sizeof(*encoded), from) != sizeof(*encoded))
> > > > > +               return -EFAULT;
> > > > > +       if (encoded->compression == ENCODED_IOV_COMPRESSION_NONE &&
> > > > > +           encoded->encryption == ENCODED_IOV_ENCRYPTION_NONE) {
> > > > > +               iocb->ki_flags &= ~IOCB_ENCODED;
> > > > > +               return 0;
> > > > > +       }
> > > > > +       if (encoded->compression > ENCODED_IOV_COMPRESSION_TYPES ||
> > > > > +           encoded->encryption > ENCODED_IOV_ENCRYPTION_TYPES)
> > > > > +               return -EINVAL;
> > > > > +       if (!capable(CAP_SYS_ADMIN))
> > > > > +               return -EPERM;
> > > >
> > > > How does this capable() check interact with io_uring? Without having
> > > > looked at this in detail, I suspect that when an encoded write is
> > > > requested through io_uring, the capable() check might be executed on
> > > > something like a workqueue worker thread, which is probably running
> > > > with a full capability set.
> > >
> > > I discussed this more with Jens. You're right, per-IO permission checks
> > > aren't going to work. In fully-polled mode, we never get an opportunity
> > > to check capabilities in right context. So, this will probably require a
> > > new open flag.
> >
> > Actually, file_ns_capable() accomplishes the same thing without a new
> > open flag. Changing the capable() check to file_ns_capable() in
> > init_user_ns should be enough.
> 
> +Aleksa for openat2() and open() space
> 
> Mmh... but if the file descriptor has been passed through a privilege
> boundary, it isn't really clear whether the original opener of the
> file intended for this to be possible. For example, if (as a
> hypothetical example) the init process opens a service's logfile with
> root privileges, then passes the file descriptor to that logfile to
> the service on execve(), that doesn't mean that the service should be
> able to perform compressed writes into that file, I think.

Ahh, you're right.

> I think that an open flag (as you already suggested) or an fcntl()
> operation would do the job; but AFAIK the open() flag space has run
> out, so if you hook it up that way, I think you might have to wait for
> Aleksa Sarai to get something like his sys_openat2() suggestion
> (https://lore.kernel.org/lkml/20190904201933.10736-12-cyphar@cyphar.com/)
> merged?

If I counted correctly, there's still space for a new O_ flag. One of
the problems that Aleksa is solving is that unknown O_ flags are
silently ignored, which isn't an issue for an O_ENCODED flag. If the
kernel doesn't support it, it won't support RWF_ENCODED, either, so
you'll get EOPNOTSUPP from pwritev2(). So, open flag it is...

  parent reply	other threads:[~2019-09-24 20:38 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-19  6:53 [RFC PATCH 0/3] fs: interface for directly writing encoded (e.g., compressed) data Omar Sandoval
2019-09-19  6:53 ` [RFC PATCH 1/3] fs: pass READ/WRITE to kiocb_set_rw_flags() Omar Sandoval
2019-09-20 14:38   ` Jan Kara
2019-09-19  6:53 ` [PATCH] readv.2: Document new RWF_ENCODED flag to pwritev2() Omar Sandoval
2019-09-19  6:53 ` [RFC PATCH 2/3] fs: add RWF_ENCODED for writing compressed data Omar Sandoval
2019-09-19 15:44   ` Jann Horn
2019-09-20 16:25     ` Jens Axboe
2019-09-24 17:15     ` Omar Sandoval
2019-09-24 19:35       ` Omar Sandoval
2019-09-24 20:01         ` Jann Horn
2019-09-24 20:22           ` Christian Brauner
2019-09-24 20:50             ` Matthew Wilcox
2019-09-24 20:38           ` Omar Sandoval [this message]
2019-09-25  7:11           ` Dave Chinner
2019-09-25 12:07             ` Colin Walters
2019-09-25 14:56               ` [RFC PATCH 2/3] " Chris Mason
2019-09-26 12:17                 ` Colin Walters
2019-09-26 17:46                   ` Omar Sandoval
2019-09-25 15:08               ` [RFC PATCH 2/3] fs: " Theodore Y. Ts'o
2019-09-25 22:52               ` Dave Chinner
2019-09-26  0:36             ` Omar Sandoval
2019-09-19  6:53 ` [RFC PATCH 3/3] btrfs: implement encoded (compressed) writes Omar Sandoval

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190924203827.GA46556@vader \
    --to=osandov@osandov.com \
    --cc=axboe@kernel.dk \
    --cc=cyphar@cyphar.com \
    --cc=david@fromorbit.com \
    --cc=jannh@google.com \
    --cc=kernel-team@fb.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=luto@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).