All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Thumshirn <jthumshirn@suse.de>
To: Dave Chinner <david@fromorbit.com>, Omar Sandoval <osandov@osandov.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 2/2] btrfs: add ioctl for directly writing compressed data
Date: Thu, 5 Sep 2019 14:51:22 +0200	[thread overview]
Message-ID: <026f0c89-2a31-6a87-12d6-884818164a63@suse.de> (raw)
In-Reply-To: <8acbff04-aee0-9f88-b2cd-a85bb7b94df8@suse.de>

On 05/09/2019 14:16, Johannes Thumshirn wrote:
> On 05/09/2019 04:10, Dave Chinner wrote:
>> On Wed, Sep 04, 2019 at 12:13:26PM -0700, Omar Sandoval wrote:
>>> From: Omar Sandoval <osandov@fb.com>
>>>
>>> This adds an API for writing compressed data directly to the filesystem.
>>> The use case that I have in mind is send/receive: currently, when
>>> sending data from one compressed filesystem to another, the sending side
>>> decompresses the data and the receiving side recompresses it before
>>> writing it out. This is wasteful and can be avoided if we can just send
>>> and write compressed extents. The send part will be implemented in a
>>> separate series, as this ioctl can stand alone.
>>>
>>> The interface is essentially pwrite(2) with some extra information:
>>>
>>> - The input buffer contains the compressed data.
>>> - Both the compressed and decompressed sizes of the data are given.
>>> - The compression type (zlib, lzo, or zstd) is given.
>>
>> So why can't you do this with pwritev2()? Heaps of flags, and
>> use a second iovec to hold the decompressed size of the previous
>> iovec. i.e.
>>
>> 	iov[0].iov_base = compressed_data;
>> 	iov[0].iov_len = compressed_size;
>> 	iov[1].iov_base = NULL;
>> 	iov[1].iov_len = uncompressed_size;
>> 	pwritev2(fd, iov, 2, offset, RWF_COMPRESSED_ZLIB);
>>
>> And you don't need to reinvent pwritev() with some whacky ioctl that
>> is bound to be completely screwed up is ways not noticed until
>> someone else tries to use it...
>>
>> I'd also suggest atht if we are going to be able to write compressed
>> data directly, then we should be able to read them as well directly
>> via preadv2()....
> 
> 
> While I'm with you on this from a design PoV, one question remains:
> What to do with the file systems that do not support compression?
> 
> Currently there's only a kernel global check for known RWF_* flags in
> kiocb_set_rw_flags().
> 
> So we need a way for the individual file systems to opt into the new
> RWF_COMPRESSED_* flags and fail early if they're not supported, that
> will cause a lot of code churn if we cannot do it in the vfs layer.
> 
> From the 52 ->write_iter callbacks in fs/ 32 are not using
> generic_file_write_iter(). So we'd have to patch 33 functions (+/- 1-2
> because my grep | wc fu isn't the best).
> 

This (from Anthony Iliopoulos) should be sufficient:

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 58a18ed11546..86f7ff0387d7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3299,7 +3299,7 @@ static loff_t btrfs_file_llseek(struct file *file,

 static int btrfs_file_open(struct inode *inode, struct file *filp)
 {
-       filp->f_mode |= FMODE_NOWAIT;
+       filp->f_mode |= (FMODE_NOWAIT|FMODE_CAN_COMPRESS);
        return generic_file_open(inode, filp);
 }

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 997a530ff4e9..1b59e795f448 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3357,6 +3357,11 @@ static inline int kiocb_set_rw_flags(struct kiocb
                ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
        if (flags & RWF_APPEND)
                ki->ki_flags |= IOCB_APPEND;
+       if (flags & RWF_COMPRESSED) {
+               if (!(ki->ki_filp->fmode & FMODE_CAN_COMPRESS))
+                       return -EOPNOTSUPP;
+               ki->ki_flags |= IOCB_COMPRESSED;
+       }
        return 0;
 }


-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
(HRB 247165, AG München)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

  reply	other threads:[~2019-09-05 12:51 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-04 19:13 [PATCH 0/2] Btrfs: add interface for writing compressed extents directly Omar Sandoval
2019-09-04 19:13 ` [PATCH 1/2] fs: export rw_verify_area() Omar Sandoval
2019-09-04 19:13 ` [PATCH 2/2] btrfs: add ioctl for directly writing compressed data Omar Sandoval
2019-09-05  2:10   ` Dave Chinner
2019-09-05 12:16     ` Johannes Thumshirn
2019-09-05 12:51       ` Johannes Thumshirn [this message]
2019-09-06  7:46       ` Dave Chinner
2019-09-06 18:19     ` Omar Sandoval
2019-09-06 21:07       ` Dave Chinner
2019-09-06 21:27         ` Omar Sandoval
2019-09-05 10:33   ` Nikolay Borisov
2019-09-19  6:14     ` Omar Sandoval
2019-09-19  7:46       ` Nikolay Borisov
2019-09-19  7:59         ` Omar Sandoval
2019-09-24 10:29           ` Nikolay Borisov
2019-09-10 11:39   ` Filipe Manana
2019-09-19  6:23     ` Omar Sandoval

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=026f0c89-2a31-6a87-12d6-884818164a63@suse.de \
    --to=jthumshirn@suse.de \
    --cc=david@fromorbit.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=osandov@osandov.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.