From: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
To: "dsterba@suse.cz" <dsterba@suse.cz>
Cc: David Sterba <dsterba@suse.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 2/2] btrfs: zoned: fix compressed writes
Date: Mon, 17 May 2021 07:07:04 +0000
Message-ID: <PH0PR04MB741608A91B3171D58DA902EF9B2D9@PH0PR04MB7416.namprd04.prod.outlook.com>
In-Reply-To: 20210512144213.GS7604@twin.jikos.cz

On 12/05/2021 16:44, David Sterba wrote:
> On Wed, May 12, 2021 at 11:01:40PM +0900, Johannes Thumshirn wrote:
>> When multiple processes write data to the same block group on a compressed
>> zoned filesystem, the underlying device could report I/O errors and data
>> corruption is possible.
>>
>> This happens because on a zoned file system, compressed data writes were
>> sent to the device via a REQ_OP_WRITE instead of a REQ_OP_ZONE_APPEND
>> operation. But with REQ_OP_WRITE and parallel submission it cannot be
>> guaranteed that the data is always submitted aligned to the underlying
>> zone's write pointer.
>>
>> The change to using REQ_OP_ZONE_APPEND instead of REQ_OP_WRITE on a zoned
>> filesystem is non-intrusive on a regular file system or when submitting to
>> a conventional zone on a zoned filesystem, as it is guarded by
>> btrfs_use_zone_append.
>>
>> Reported-by: David Sterba <dsterba@suse.com>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> ---
>>  fs/btrfs/compression.c | 44 ++++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 40 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
>> index 2bea01d23a5b..d27205791483 100644
>> --- a/fs/btrfs/compression.c
>> +++ b/fs/btrfs/compression.c
>> @@ -28,6 +28,7 @@
>>  #include "compression.h"
>>  #include "extent_io.h"
>>  #include "extent_map.h"
>> +#include "zoned.h"
>>  
>>  static const char* const btrfs_compress_types[] = { "", "zlib", "lzo", "zstd" };
>>  
>> @@ -349,6 +350,7 @@ static void end_compressed_bio_write(struct bio *bio)
>>  	 */
>>  	inode = cb->inode;
>>  	cb->compressed_pages[0]->mapping = cb->inode->i_mapping;
>> +	btrfs_record_physical_zoned(inode, cb->start, bio);
>>  	btrfs_writepage_endio_finish_ordered(cb->compressed_pages[0],
>>  			cb->start, cb->start + cb->len - 1,
>>  			bio->bi_status == BLK_STS_OK);
>> @@ -401,6 +403,10 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
>>  	u64 first_byte = disk_start;
>>  	blk_status_t ret;
>>  	int skip_sum = inode->flags & BTRFS_INODE_NODATASUM;
>> +	struct block_device *bdev;
>> +	const bool use_append = btrfs_use_zone_append(inode, disk_start);
>> +	const unsigned int bio_op =
>> +		use_append ? REQ_OP_ZONE_APPEND : REQ_OP_WRITE;
>>  
>>  	WARN_ON(!PAGE_ALIGNED(start));
>>  	cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS);
>> @@ -418,10 +424,31 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
>>  	cb->nr_pages = nr_pages;
>>  
>>  	bio = btrfs_bio_alloc(first_byte);
>> -	bio->bi_opf = REQ_OP_WRITE | write_flags;
>> +	bio->bi_opf = bio_op | write_flags;
>>  	bio->bi_private = cb;
>>  	bio->bi_end_io = end_compressed_bio_write;
>>  
>> +	if (use_append) {
>> +		struct extent_map *em;
>> +		struct map_lookup *map;
>> +
>> +		em = btrfs_get_chunk_map(fs_info, disk_start, PAGE_SIZE);
> 
> The caller already does the em lookup, so this is duplicate, allocating
> memory, taking locks and doing a tree lookup. All happening on write out
> path so this seems heavy.

Right, I did not check this, sorry. Is it OK to add a preparatory patch that
takes some of the parameters of btrfs_submit_compressed_write() from the em
instead? Otherwise btrfs_submit_compressed_write() will have 10 parameters,
which sounds awful.

> 
>> +		if (IS_ERR(em)) {
>> +			kfree(cb);
>> +			bio_put(bio);
>> +			return BLK_STS_NOTSUPP;
>> +		}
>> +
>> +		map = em->map_lookup;
>> +		/* We only support single profile for now */
>> +		ASSERT(map->num_stripes == 1);
>> +		bdev = map->stripes[0].dev->bdev;
>> +
>> +		free_extent_map(em);
>> +
>> +		bio_set_dev(bio, bdev);
> 
> bdev seems to be used just to set it for the bio, so it does not need to
> be declared in the function scope (or for one-time use at all)

Oops, that's a leftover from an earlier version.

> The same sequence of calls is done in submit_extent_page so this should
> be in a helper.

Sure.
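
For reference, the write-pointer problem described in the commit message can
be modelled in userspace. The sketch below is only an illustration under
simplifying assumptions, not kernel code: struct zone_model, zone_write(),
zone_append() and the two demo functions are hypothetical names that do not
exist in btrfs or the block layer. zone_write() stands in for REQ_OP_WRITE
(the submitter picks the offset) and zone_append() for REQ_OP_ZONE_APPEND
(the device picks the offset and reports it back).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a sequential-write-required zone (hypothetical type). */
struct zone_model {
	uint64_t start;	/* first byte of the zone */
	uint64_t wp;	/* write pointer: the only offset that may be written next */
	uint64_t end;	/* one past the last byte of the zone */
};

/*
 * Models REQ_OP_WRITE: the submitter chooses the offset up front.
 * The request is rejected unless it lands exactly on the write pointer.
 */
static bool zone_write(struct zone_model *z, uint64_t pos, uint64_t len)
{
	assert(z->start <= z->wp && z->wp <= z->end);
	if (pos != z->wp || len > z->end - z->wp)
		return false;
	z->wp += len;
	return true;
}

/*
 * Models REQ_OP_ZONE_APPEND: the submitter only names the zone.
 * The device picks the location and reports it back in *written_at.
 */
static bool zone_append(struct zone_model *z, uint64_t len,
			uint64_t *written_at)
{
	assert(z->start <= z->wp && z->wp <= z->end);
	if (len > z->end - z->wp)
		return false;
	*written_at = z->wp;
	z->wp += len;
	return true;
}

/*
 * Writers A and B reserve [0, 4096) and [4096, 8192) in that order, but
 * B's request reaches the device first. Returns the number of failures.
 */
static int demo_plain_writes(void)
{
	struct zone_model z = { .start = 0, .wp = 0, .end = 1 << 20 };
	int failed = 0;

	if (!zone_write(&z, 4096, 4096))	/* B first: pos != wp, rejected */
		failed++;
	if (!zone_write(&z, 0, 4096))		/* A second: lands on wp */
		failed++;
	return failed;
}

/* Same arrival order with appends: both succeed, locations are swapped. */
static int demo_appends(uint64_t *a_pos, uint64_t *b_pos)
{
	struct zone_model z = { .start = 0, .wp = 0, .end = 1 << 20 };
	int failed = 0;

	if (!zone_append(&z, 4096, b_pos))	/* B first: placed at the wp */
		failed++;
	if (!zone_append(&z, 4096, a_pos))	/* A second: placed after B */
		failed++;
	return failed;
}
```

In this model the reordered plain writes lose one request, while both appends
succeed but end up at locations the submitter did not choose. That is
presumably why the patch also records the final location at completion time
via btrfs_record_physical_zoned().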
 
