All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: Josef Bacik <josef@toxicpanda.com>,
	Nikolay Borisov <nborisov@suse.com>,
	kernel-team@fb.com, linux-btrfs@vger.kernel.org
Subject: Re: [RFC PATCH 5/5] Btrfs: add ioctl for directly writing compressed data
Date: Tue, 27 Aug 2019 14:28:36 -0400	[thread overview]
Message-ID: <20190827182834.p3jk7i2krgfzwuo6@macbook-pro-91.dhcp.thefacebook.com> (raw)
In-Reply-To: <20190827182242.GA23051@vader>

On Tue, Aug 27, 2019 at 11:22:42AM -0700, Omar Sandoval wrote:
> On Tue, Aug 27, 2019 at 11:06:23AM -0700, Omar Sandoval wrote:
> > On Tue, Aug 27, 2019 at 07:57:41AM -0400, Josef Bacik wrote:
> > > On Tue, Aug 27, 2019 at 09:26:21AM +0300, Nikolay Borisov wrote:
> > > > 
> > > > 
> > > > On 27.08.19 г. 0:36 ч., Josef Bacik wrote:
> > > > > On Thu, Aug 15, 2019 at 02:04:06PM -0700, Omar Sandoval wrote:
> > > > >> From: Omar Sandoval <osandov@fb.com>
> > > > >>
> > > > >> This adds an API for writing compressed data directly to the filesystem.
> > > > >> The use case that I have in mind is send/receive: currently, when
> > > > >> sending data from one compressed filesystem to another, the sending side
> > > > >> decompresses the data and the receiving side recompresses it before
> > > > >> writing it out. This is wasteful and can be avoided if we can just send
> > > > >> and write compressed extents. The send part will be implemented in a
> > > > >> separate series, as this ioctl can stand alone.
> > > > >>
> > > > >> The interface is essentially pwrite(2) with some extra information:
> > > > >>
> > > > >> - The input buffer contains the compressed data.
> > > > >> - Both the compressed and decompressed sizes of the data are given.
> > > > >> - The compression type (zlib, lzo, or zstd) is given.
> > > > >>
> > > > >> A more detailed description of the interface, including restrictions and
> > > > >> edge cases, is included in include/uapi/linux/btrfs.h.
> > > > >>
> > > > >> The implementation is similar to direct I/O: we have to flush any
> > > > >> ordered extents, invalidate the page cache, and do the io
> > > > >> tree/delalloc/extent map/ordered extent dance. From there, we can reuse
> > > > >> the compression code with a minor modification to distinguish the new
> > > > >> ioctl from writeback.
> > > > >>
> > > > > 
> > > > > I've looked at this a few times, the locking and space reservation stuff look
> > > > > right.  What about encrypted send/recieve?  Are we going to want to use this to
> > > > > just blind copy encrypted data without having to decrypt/re-encrypt?  Should
> > > > > this be taken into consideration for this interface?  I'll think more about it,
> > > > > but I can't really see any better option than this.  Thanks,
> > > > 
> > > > The main problem is we don't have encryption implemented. And one of the
> > > > larger aspects of the encryption support is going to be how we are
> > > > storing the encryption keys. E.g. should they be part of the send
> > > > format? Or are we going to limit send/receive based on whether the
> > > > source/dest have transferred encryption keys out of line?
> > > > 
> > > 
> > > Subvolume encryption will be coming soon, but I'm less worried about the
> > > mechanics of how that will be used and more worried about making this interface
> > > work for that eventual future.  I assume we'll want to be able to just blind
> > > copy the encrypted data instead of decrypting into the send stream and then
> > > re-encrypting on the other side.  Which means we'll have two uses for this
> > > interface, and I want to make sure we're happy with it before it gets merged.
> > > Thanks,
> > > 
> > > Josef
> > 
> > Right, I think the only way to do this would be to blindly send
> > encrypted data, and leave the key management to a higher layer.
> > 
> > Looking at the ioctl definition:
> > 
> > struct btrfs_ioctl_compressed_pwrite_args {
> >         __u64 offset;           /* in */
> >         __u32 orig_len;         /* in */
> >         __u32 compressed_len;   /* in */
> >         __u32 compress_type;    /* in */
> >         __u32 reserved[9];
> >         void __user *buf;       /* in */
> > } __attribute__ ((__packed__));
> > 
> > I think there are enough reserved fields in there for, e.g., encryption
> > type, any key management-related things we might need to stuff in, etc.
> > But the naming would be pretty bad if we extended it this way. Maybe
> > compressed write -> raw write, orig_len -> num_bytes, compressed_len ->
> > disk_num_bytes?
> > 
> > struct btrfs_ioctl_raw_pwrite_args {
> >         __u64 offset;           /* in */
> >         __u32 num_bytes;        /* in */
> >         __u32 disk_num_bytes;   /* in */
> >         __u32 compress_type;    /* in */
> >         __u32 reserved[9];
> >         void __user *buf;       /* in */
> > } __attribute__ ((__packed__));
> > 
> > Besides the naming, I don't think anything else would need to change for
> > now. And if we decide that we don't want encrypted send/receive, then
> > fine, this naming is still okay.
> 
> Oh, and at this again, compression and encryption are only u8 in the
> extent item, and we have an extra u16 for "other_encoding", so it'd
> probably be safe to make it:
> 
> struct btrfs_ioctl_raw_pwrite_args {
>         __u64 offset;           /* in */
>         __u32 num_bytes;        /* in */
>         __u32 disk_num_bytes;   /* in */
>         __u8 compression;       /* in */
>         __u8 encryption;        /* in */
> 	__u16 other_encoding;   /* in */
>         __u32 reserved[9];
>         void __user *buf;       /* in */
> } __attribute__ ((__packed__));

I like this, then just adjust the patches to utilize the generic naming
convention instead of "compression" and I think it's good to go.  Thanks,

Josef

  reply	other threads:[~2019-08-27 18:28 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-15 21:04 [RFC PATCH 0/5] Btrfs: add interface for writing compressed extent directly Omar Sandoval
2019-08-15 21:04 ` [PATCH 1/5] Btrfs: use correct count in btrfs_file_write_iter() Omar Sandoval
2019-08-16 16:56   ` Josef Bacik
2019-08-15 21:04 ` [PATCH 2/5] Btrfs: treat RWF_{,D}SYNC writes as sync for CRCs Omar Sandoval
2019-08-16 16:59   ` Josef Bacik
2019-08-27 12:35   ` David Sterba
2019-08-27 17:44     ` Omar Sandoval
2019-08-27 18:16       ` David Sterba
2019-08-15 21:04 ` [PATCH 3/5] Btrfs: stop clearing EXTENT_DIRTY in inode I/O tree Omar Sandoval
2019-08-16 16:59   ` Josef Bacik
2019-08-15 21:04 ` [RFC PATCH 4/5] fs: export rw_verify_area() Omar Sandoval
2019-08-16 17:02   ` Josef Bacik
2019-08-15 21:04 ` [RFC PATCH 5/5] Btrfs: add ioctl for directly writing compressed data Omar Sandoval
2019-08-26 21:36   ` Josef Bacik
2019-08-27  6:26     ` Nikolay Borisov
2019-08-27 11:57       ` Josef Bacik
2019-08-27 18:06         ` Omar Sandoval
2019-08-27 18:22           ` Omar Sandoval
2019-08-27 18:28             ` Josef Bacik [this message]
2019-08-28 12:06   ` David Sterba
2019-09-03 17:14     ` Omar Sandoval
2019-08-15 21:14 ` [RFC PATCH 0/5] Btrfs: add interface for writing compressed extent directly Omar Sandoval
2019-08-27 18:31 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190827182834.p3jk7i2krgfzwuo6@macbook-pro-91.dhcp.thefacebook.com \
    --to=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=nborisov@suse.com \
    --cc=osandov@osandov.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.