linux-btrfs.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Kernel Team <kernel-team@fb.com>,
	Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH RESEND x3 v9 1/9] iov_iter: add copy_struct_from_iter()
Date: Wed, 23 Jun 2021 08:06:39 +1000
Message-ID: <20210622220639.GH2419729@dread.disaster.area>
In-Reply-To: <YND8p7ioQRfoWTOU@relinquished.localdomain>

On Mon, Jun 21, 2021 at 01:55:03PM -0700, Omar Sandoval wrote:
> On Mon, Jun 21, 2021 at 01:46:04PM -0700, Omar Sandoval wrote:
> > On Mon, Jun 21, 2021 at 12:33:17PM -0700, Linus Torvalds wrote:
> > > On Mon, Jun 21, 2021 at 11:46 AM Omar Sandoval <osandov@osandov.com> wrote:
> > > >
> > > > How do we get the userspace size with the encoded_iov.size approach?
> > > > We'd have to read the size from the iov_iter before writing to the rest
> > > > of the iov_iter. Is it okay to mix the iov_iter as a source and
> > > > destination like this? From what I can tell, it's not intended to be
> > > > used like this.
> > > 
> > > I guess it could work that way, but yes, it's ugly as hell. And I
> > > really don't want a readv() system call - that should write to the
> > > result buffer - to first have to read from it.
> > > 
> > > So I think the original "just make it be the first iov entry" is the
> > > better approach, even if Al hates it.
> > > 
> > > Although I still get the feeling that using an ioctl is the *really*
> > > correct way to go. That was my first reaction to the series
> > > originally, and I still don't see why we'd have encoded data in a
> > > regular read/write path.
> > > 
> > > What was the argument against ioctl's, again?
> > 
> > The suggestion came from Dave Chinner here:
> > https://lore.kernel.org/linux-fsdevel/20190905021012.GL7777@dread.disaster.area/
> > 
> > His objection to an ioctl was two-fold:
> > 
> > 1. This interface looks really similar to normal read/write, so we
> >    should try to use the normal read/write interface for it. Perhaps
> >    this trouble with iov_iter has refuted that.
> > 2. The last time we had Btrfs-specific ioctls that eventually became
> >    generic (FIDEDUPERANGE and FICLONE{,RANGE}), the generalization was
> >    painful. Part of the problem with clone/dedupe was that the Btrfs
> >    ioctls were underspecified. I think I've done a better job of
> >    documenting all of the semantics and corner cases for the encoded I/O
> >    interface (and if not, I can address this). The other part of the
> >    problem is that there were various sanity checks in the normal
> >    read/write paths that were missed or drifted out of sync in the
> >    ioctls. That requires some vigilance going forward. Maybe starting
> >    this off as a generic (not Btrfs-specific) ioctl right off the bat
> >    will help.
> > 
> > If we do go the ioctl route, then we also have to decide how much of
> > preadv2/pwritev2 it should emulate. Should it use the fd offset, or
> > should that be an ioctl argument? Some of the RWF_ flags would be useful
> > for encoded I/O, too (RWF_DSYNC, RWF_SYNC, RWF_APPEND), should it
> > support those? These bring us back to Dave's first point.
> 
> Oops, I dropped Dave from the Cc list at some point. Adding him back
> now.

Fair summary. The only other thing that I'd add is that this is an IO
interface that requires issuing physical IO. So if someone wants
high throughput for encoded IO, we really need AIO and/or io_uring
support, and we get that for free if we use the preadv2/pwritev2
interfaces.

Yes, it could be an ioctl() interface, but I think that this sort of
functionality is exactly what extensible syscalls like
preadv2/pwritev2 should be used for. It's a slight variant on normal
IO, and that's exactly what the RWF_* flags are intended to be used
for - allowing interesting per-IO variant behaviour without having
to completely re-implement the IO path via custom ioctls every time
we want slightly different functionality...
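
For illustration only, here is a minimal userspace sketch of the
per-IO flag mechanism described above. It uses flags that already
exist in mainline (RWF_DSYNC, or 0 for plain IO), not the RWF_ENCODED
flag proposed in this series. The helper name rwf_roundtrip() and the
temporary file path are assumptions made up for the example.

```c
#define _GNU_SOURCE		/* preadv2(), pwritev2(), RWF_* in glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write msg at offset 0 of a scratch file using
 * pwritev2() with the given per-IO flags, then read it back with
 * preadv2() and no flags. Returns the number of bytes read back, or -1
 * on error.
 */
ssize_t rwf_roundtrip(const char *msg, char *out, size_t outlen, int flags)
{
	char path[] = "/tmp/rwf-demo-XXXXXX";	/* assumed scratch location */
	struct iovec wvec, rvec;
	ssize_t ret = -1;
	int fd;

	assert(msg != NULL && out != NULL);

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* file disappears once fd is closed */

	wvec.iov_base = (void *)msg;
	wvec.iov_len = strlen(msg);

	/* Same syscall as a normal vectored write; only the flags differ. */
	if (pwritev2(fd, &wvec, 1, 0, flags) != (ssize_t)wvec.iov_len)
		goto out;

	rvec.iov_base = out;
	rvec.iov_len = outlen;
	ret = preadv2(fd, &rvec, 1, 0, 0);
out:
	close(fd);
	return ret;
}
```

A caller that wants this one write to be durable before it returns
would pass RWF_DSYNC, e.g. rwf_roundtrip("hello", buf, sizeof(buf),
RWF_DSYNC), and pass 0 for ordinary behaviour. The IO path is the same
in both cases, which is the property being argued for here.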

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
