All of lore.kernel.org
 help / color / mirror / Atom feed
* fix nodatasum I/O for zone devices
@ 2023-06-05  8:45 Christoph Hellwig
  2023-06-05  8:45 ` [PATCH 1/2] btrfs: factor out a btrfs_inode_is_nodatasum helper Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Christoph Hellwig @ 2023-06-05  8:45 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Hi all,

my recent series to optimize the ordered_extent splitting for zoned
devices broke nodatasum I/O - this was hidden by the fact that a normal
xfstests run appears to only exercise nodatasum in combination with
dev-replace, which was also broken..

Note that there is a very minor context-only conflict with work in
for-next.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] btrfs: factor out a btrfs_inode_is_nodatasum helper
  2023-06-05  8:45 fix nodatasum I/O for zone devices Christoph Hellwig
@ 2023-06-05  8:45 ` Christoph Hellwig
  2023-06-06 17:08   ` David Sterba
  2023-06-05  8:45 ` [PATCH 2/2] btrfs: allocate dummy ordereded_sums objects for nocsum I/O on zoned file systems Christoph Hellwig
  2023-06-05 10:05 ` fix nodatasum I/O for zone devices Johannes Thumshirn
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2023-06-05  8:45 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Split out a helper to check if an inode needs nodatasum treatment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.c         | 3 +--
 fs/btrfs/btrfs_inode.h | 7 +++++++
 fs/btrfs/file-item.c   | 3 +--
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index c9b4aa339b4b07..627d06fbb4c425 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -659,8 +659,7 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
 		 * Csum items for reloc roots have already been cloned at this
 		 * point, so they are handled as part of the no-checksum case.
 		 */
-		if (inode && !(inode->flags & BTRFS_INODE_NODATASUM) &&
-		    !test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state) &&
+		if (inode && !btrfs_inode_is_nodatasum(inode) &&
 		    !btrfs_is_data_reloc_root(inode->root)) {
 			if (should_async_write(bbio) &&
 			    btrfs_wq_submit_bio(bbio, bioc, &smap, mirror_num))
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 8abf96cfea8fae..713439d62adda3 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -535,4 +535,11 @@ void btrfs_update_inode_bytes(struct btrfs_inode *inode, const u64 add_bytes,
 			      const u64 del_bytes);
 void btrfs_assert_inode_range_clean(struct btrfs_inode *inode, u64 start, u64 end);
 
+static inline bool btrfs_inode_is_nodatasum(struct btrfs_inode *inode)
+{
+	return (inode->flags & BTRFS_INODE_NODATASUM) ||
+		test_bit(BTRFS_FS_STATE_NO_CSUMS,
+			 &inode->root->fs_info->fs_state);
+}
+
 #endif
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 2e7d5ec6c9a68c..5e6603e76e5ac0 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -355,8 +355,7 @@ blk_status_t btrfs_lookup_bio_sums(struct btrfs_bio *bbio)
 	blk_status_t ret = BLK_STS_OK;
 	u32 bio_offset = 0;
 
-	if ((inode->flags & BTRFS_INODE_NODATASUM) ||
-	    test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state))
+	if (btrfs_inode_is_nodatasum(inode))
 		return BLK_STS_OK;
 
 	/*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] btrfs: allocate dummy ordereded_sums objects for nocsum I/O on zoned file systems
  2023-06-05  8:45 fix nodatasum I/O for zone devices Christoph Hellwig
  2023-06-05  8:45 ` [PATCH 1/2] btrfs: factor out a btrfs_inode_is_nodatasum helper Christoph Hellwig
@ 2023-06-05  8:45 ` Christoph Hellwig
  2023-06-06 17:11   ` David Sterba
  2023-06-05 10:05 ` fix nodatasum I/O for zone devices Johannes Thumshirn
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2023-06-05  8:45 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Zoned file systems now need the ordereded_sums structure to record the
actual write location returned by zone append, so allocate dummy
structures without the csum array for them when the I/O doesn't use
checksums, and free them when completing the ordered_extent.

Fixes: 5a1b7f2b6306 ("btrfs: optimize the logical to physical mapping for zoned writes")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.c       |  4 ++++
 fs/btrfs/file-item.c | 12 ++++++++++--
 fs/btrfs/zoned.c     | 11 ++++++++++-
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 627d06fbb4c425..2482739be49163 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -668,6 +668,10 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
 			ret = btrfs_bio_csum(bbio);
 			if (ret)
 				goto fail_put_bio;
+		} else if (use_append) {
+			ret = btrfs_csum_one_bio(bbio);
+			if (ret)
+				goto fail_put_bio;
 		}
 	}
 
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 5e6603e76e5ac0..1308c369b1ebd8 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -751,8 +751,16 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_bio *bbio)
 	sums->logical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
 	index = 0;
 
-	shash->tfm = fs_info->csum_shash;
+	/*
+	 * If we are called for a nodatasum inode, this means we are on a zoned
+	 * file system.  In this case a ordered_csum structure needs to be
+	 * allocated to track the logical address actually written, but it does
+	 * notactually carry any checksums.
+	 */
+	if (btrfs_inode_is_nodatasum(inode))
+		goto done;
 
+	shash->tfm = fs_info->csum_shash;
 	bio_for_each_segment(bvec, bio, iter) {
 		if (!ordered) {
 			ordered = btrfs_lookup_ordered_extent(inode, offset);
@@ -817,7 +825,7 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_bio *bbio)
 
 	}
 	this_sum_bytes = 0;
-
+done:
 	/*
 	 * The ->sums assignment is for zoned writes, where a bio never spans
 	 * ordered extents and is only done unconditionally because that's cheaper
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index bbde4ddd475492..1f5497b9b2695c 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1717,7 +1717,7 @@ void btrfs_finish_ordered_zoned(struct btrfs_ordered_extent *ordered)
 		if (!btrfs_zoned_split_ordered(ordered, logical, len)) {
 			set_bit(BTRFS_ORDERED_IOERR, &ordered->flags);
 			btrfs_err(fs_info, "failed to split ordered extent\n");
-			return;
+			goto out;
 		}
 		logical = sum->logical;
 		len = sum->len;
@@ -1725,6 +1725,15 @@ void btrfs_finish_ordered_zoned(struct btrfs_ordered_extent *ordered)
 
 	if (ordered->disk_bytenr != logical)
 		btrfs_rewrite_logical_zoned(ordered, logical);
+
+out:
+	if (btrfs_inode_is_nodatasum(BTRFS_I(ordered->inode))) {
+		while ((sum = list_first_entry_or_null(&ordered->list,
+						       typeof(*sum), list))) {
+			list_del(&sum->list);
+			kfree(sum);
+		}
+	}
 }
 
 bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: fix nodatasum I/O for zone devices
  2023-06-05  8:45 fix nodatasum I/O for zone devices Christoph Hellwig
  2023-06-05  8:45 ` [PATCH 1/2] btrfs: factor out a btrfs_inode_is_nodatasum helper Christoph Hellwig
  2023-06-05  8:45 ` [PATCH 2/2] btrfs: allocate dummy ordereded_sums objects for nocsum I/O on zoned file systems Christoph Hellwig
@ 2023-06-05 10:05 ` Johannes Thumshirn
  2 siblings, 0 replies; 7+ messages in thread
From: Johannes Thumshirn @ 2023-06-05 10:05 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: factor out a btrfs_inode_is_nodatasum helper
  2023-06-05  8:45 ` [PATCH 1/2] btrfs: factor out a btrfs_inode_is_nodatasum helper Christoph Hellwig
@ 2023-06-06 17:08   ` David Sterba
  0 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2023-06-06 17:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Jun 05, 2023 at 10:45:18AM +0200, Christoph Hellwig wrote:
> Split out a helper to check if an inode needs nodatasum treatment.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/btrfs/bio.c         | 3 +--
>  fs/btrfs/btrfs_inode.h | 7 +++++++
>  fs/btrfs/file-item.c   | 3 +--
>  3 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index c9b4aa339b4b07..627d06fbb4c425 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -659,8 +659,7 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
>  		 * Csum items for reloc roots have already been cloned at this
>  		 * point, so they are handled as part of the no-checksum case.
>  		 */
> -		if (inode && !(inode->flags & BTRFS_INODE_NODATASUM) &&
> -		    !test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state) &&
> +		if (inode && !btrfs_inode_is_nodatasum(inode) &&
>  		    !btrfs_is_data_reloc_root(inode->root)) {
>  			if (should_async_write(bbio) &&
>  			    btrfs_wq_submit_bio(bbio, bioc, &smap, mirror_num))
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 8abf96cfea8fae..713439d62adda3 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -535,4 +535,11 @@ void btrfs_update_inode_bytes(struct btrfs_inode *inode, const u64 add_bytes,
>  			      const u64 del_bytes);
>  void btrfs_assert_inode_range_clean(struct btrfs_inode *inode, u64 start, u64 end);
>  
> +static inline bool btrfs_inode_is_nodatasum(struct btrfs_inode *inode)
> +{
> +	return (inode->flags & BTRFS_INODE_NODATASUM) ||
> +		test_bit(BTRFS_FS_STATE_NO_CSUMS,
> +			 &inode->root->fs_info->fs_state);

This mixes inode and filesystem but the function only talks about an
inode. Also BTRFS_FS_STATE_NO_CSUMS is an exceptional state, more like a
rescue state to allow accessing filesytem with broken checksums and it
is supposed to stand out in the conditions, not be hidden in a helper.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] btrfs: allocate dummy ordereded_sums objects for nocsum I/O on zoned file systems
  2023-06-05  8:45 ` [PATCH 2/2] btrfs: allocate dummy ordereded_sums objects for nocsum I/O on zoned file systems Christoph Hellwig
@ 2023-06-06 17:11   ` David Sterba
  2023-06-07  5:30     ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: David Sterba @ 2023-06-06 17:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Jun 05, 2023 at 10:45:19AM +0200, Christoph Hellwig wrote:
> Zoned file systems now need the ordereded_sums structure to record the
> actual write location returned by zone append, so allocate dummy
> structures without the csum array for them when the I/O doesn't use
> checksums, and free them when completing the ordered_extent.
> 
> Fixes: 5a1b7f2b6306 ("btrfs: optimize the logical to physical mapping for zoned writes")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/btrfs/bio.c       |  4 ++++
>  fs/btrfs/file-item.c | 12 ++++++++++--
>  fs/btrfs/zoned.c     | 11 ++++++++++-
>  3 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 627d06fbb4c425..2482739be49163 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -668,6 +668,10 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
>  			ret = btrfs_bio_csum(bbio);
>  			if (ret)
>  				goto fail_put_bio;
> +		} else if (use_append) {
> +			ret = btrfs_csum_one_bio(bbio);
> +			if (ret)
> +				goto fail_put_bio;
>  		}
>  	}
>  
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index 5e6603e76e5ac0..1308c369b1ebd8 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -751,8 +751,16 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_bio *bbio)
>  	sums->logical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
>  	index = 0;
>  
> -	shash->tfm = fs_info->csum_shash;
> +	/*
> +	 * If we are called for a nodatasum inode, this means we are on a zoned
> +	 * file system.  In this case a ordered_csum structure needs to be
> +	 * allocated to track the logical address actually written, but it does
> +	 * notactually carry any checksums.
> +	 */
> +	if (btrfs_inode_is_nodatasum(inode))

So nodatasum in checksum bio implies zoned mode, this looks fragile. Why
can't you check btrfs_is_zoned() instead? The comment is needed but I
don't agree the condition should omit zoned mode itself.

> +		goto done;
>  
> +	shash->tfm = fs_info->csum_shash;
>  	bio_for_each_segment(bvec, bio, iter) {
>  		if (!ordered) {
>  			ordered = btrfs_lookup_ordered_extent(inode, offset);
> @@ -817,7 +825,7 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_bio *bbio)
>  
>  	}
>  	this_sum_bytes = 0;
> -
> +done:
>  	/*
>  	 * The ->sums assignment is for zoned writes, where a bio never spans
>  	 * ordered extents and is only done unconditionally because that's cheaper
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index bbde4ddd475492..1f5497b9b2695c 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -1717,7 +1717,7 @@ void btrfs_finish_ordered_zoned(struct btrfs_ordered_extent *ordered)
>  		if (!btrfs_zoned_split_ordered(ordered, logical, len)) {
>  			set_bit(BTRFS_ORDERED_IOERR, &ordered->flags);
>  			btrfs_err(fs_info, "failed to split ordered extent\n");
> -			return;
> +			goto out;
>  		}
>  		logical = sum->logical;
>  		len = sum->len;
> @@ -1725,6 +1725,15 @@ void btrfs_finish_ordered_zoned(struct btrfs_ordered_extent *ordered)
>  
>  	if (ordered->disk_bytenr != logical)
>  		btrfs_rewrite_logical_zoned(ordered, logical);
> +
> +out:
> +	if (btrfs_inode_is_nodatasum(BTRFS_I(ordered->inode))) {

The zoned mode here is implied by the calling function, open coding the
condtions should not be a big deal, i.e. not needing the helper.

> +		while ((sum = list_first_entry_or_null(&ordered->list,
> +						       typeof(*sum), list))) {
> +			list_del(&sum->list);
> +			kfree(sum);
> +		}
> +	}
>  }
>  
>  bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
> -- 
> 2.39.2

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] btrfs: allocate dummy ordereded_sums objects for nocsum I/O on zoned file systems
  2023-06-06 17:11   ` David Sterba
@ 2023-06-07  5:30     ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2023-06-07  5:30 UTC (permalink / raw)
  To: David Sterba
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Tue, Jun 06, 2023 at 07:11:43PM +0200, David Sterba wrote:
> > +	/*
> > +	 * If we are called for a nodatasum inode, this means we are on a zoned
> > +	 * file system.  In this case a ordered_csum structure needs to be
> > +	 * allocated to track the logical address actually written, but it does
> > +	 * notactually carry any checksums.
> > +	 */
> > +	if (btrfs_inode_is_nodatasum(inode))
> 
> So nodatasum in checksum bio implies zoned mode, this looks fragile. Why
> can't you check btrfs_is_zoned() instead? The comment is needed but I
> don't agree the condition should omit zoned mode itself.

We can't check that instead, as for zoned mode with checksums we should
not take this branch.  If we want to check btrfs_is_zoned() here it
would have to be an assert inside this branch to make things more clear.

Alternatively we could be a "bool skip_csum" argument, which is what
I did initially, but which felt more clumsy to me.

> >  
> >  	if (ordered->disk_bytenr != logical)
> >  		btrfs_rewrite_logical_zoned(ordered, logical);
> > +
> > +out:
> > +	if (btrfs_inode_is_nodatasum(BTRFS_I(ordered->inode))) {
> 
> The zoned mode here is implied by the calling function, open coding the
> condtions should not be a big deal, i.e. not needing the helper.

The zoned mode is implied.  But that's not what the helper checks for,
it checks for the per-inode or per-fs nodatasum bits.  But I can open
code it if you prefer that.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-06-07  5:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-05  8:45 fix nodatasum I/O for zone devices Christoph Hellwig
2023-06-05  8:45 ` [PATCH 1/2] btrfs: factor out a btrfs_inode_is_nodatasum helper Christoph Hellwig
2023-06-06 17:08   ` David Sterba
2023-06-05  8:45 ` [PATCH 2/2] btrfs: allocate dummy ordereded_sums objects for nocsum I/O on zoned file systems Christoph Hellwig
2023-06-06 17:11   ` David Sterba
2023-06-07  5:30     ` Christoph Hellwig
2023-06-05 10:05 ` fix nodatasum I/O for zone devices Johannes Thumshirn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.