linux-btrfs.vger.kernel.org archive mirror
* btrfs write-bandwidth performance regression of 6.5-rc4/rc3
@ 2023-07-31  7:22 Wang Yugui
  2023-08-01  2:22 ` Wang Yugui
  2023-08-02  8:45 ` Linux regression tracking #adding (Thorsten Leemhuis)
  0 siblings, 2 replies; 29+ messages in thread
From: Wang Yugui @ 2023-07-31  7:22 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I noticed a btrfs write-bandwidth performance regression in 6.5-rc4/rc3
compared to btrfs 6.4.0

test case:
  disk: NVMe PCIe3 SSD *4 
  btrfs: -m raid1 -d raid0
  fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30
   -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 -numjobs=4
   -directory=/mnt/test
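
For readers without fio at hand, each of the 4 jobs is essentially doing
this (a simplified C sketch; the file name is illustrative and the
-runtime=30 cap is omitted):

/*
 * One fio job: sequential buffered 1 MiB writes followed by a final
 * fsync (-bs=1024Ki -ioengine sync -direct=0 -zero_buffers=1
 * -end_fsync=1).  Four such processes run concurrently.
 */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	const size_t bs = 1024 * 1024;			/* -bs=1024Ki */
	const long long size = 32LL << 30;		/* -size=32Gi */
	char *buf = calloc(1, bs);			/* -zero_buffers=1 */
	int fd = open("/mnt/test/write-bandwidth.0",	/* illustrative path */
		      O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0 || !buf)
		return 1;
	for (long long off = 0; off < size; off += bs)
		if (write(fd, buf, bs) != (ssize_t)bs)	/* buffered: -direct=0 */
			return 1;
	fsync(fd);					/* -end_fsync=1 */
	close(fd);
	free(buf);
	return 0;
}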

6.5-rc4/rc3
	fio(-numjobs=4) WRITE: bw=1512MiB/s (1586MB/s)
	but ‘dd conv=fsync’ works well, 2.1 GiB/s.

6.4.0
	fio(-numjobs=4) WRITE: bw=4147MiB/s (4349MB/s)

'git bisect' between 6.4 and 6.5 is not yet done.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/07/31



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-07-31  7:22 btrfs write-bandwidth performance regression of 6.5-rc4/rc3 Wang Yugui
@ 2023-08-01  2:22 ` Wang Yugui
  2023-08-01  8:35   ` Christoph Hellwig
  2023-08-02  8:45 ` Linux regression tracking #adding (Thorsten Leemhuis)
  1 sibling, 1 reply; 29+ messages in thread
From: Wang Yugui @ 2023-08-01  2:22 UTC (permalink / raw)
  To: Christoph Hellwig, linux-btrfs

Hi,

> I noticed a btrfs write-bandwidth performance regression in 6.5-rc4/rc3
> compared to btrfs 6.4.0
> 
> test case:
>   disk: NVMe PCIe3 SSD *4 
>   btrfs: -m raid1 -d raid0
>   fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30
>    -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 -numjobs=4
>    -directory=/mnt/test
> 
> 6.5-rc4/rc3
> 	fio(-numjobs=4) WRITE: bw=1512MiB/s (1586MB/s)
> 	but ‘dd conv=fsync’ works well, 2.1 GiB/s.
> 
> 6.4.0
> 	fio(-numjobs=4) WRITE: bw=4147MiB/s (4349MB/s)
> 
> 'git bisect' between 6.4 and 6.5 is not yet done.

'git bisect' shows that this patch is the root cause of the performance
regression:

e917ff56c8e7 ("btrfs: determine synchronous writers from bio or
writeback control") by Christoph Hellwig

The performance is still good at
da023618076a ("btrfs: submit IO synchronously for fast checksum implementations") by Christoph Hellwig

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/08/01


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-01  2:22 ` Wang Yugui
@ 2023-08-01  8:35   ` Christoph Hellwig
  2023-08-01  8:56     ` Wang Yugui
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2023-08-01  8:35 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Christoph Hellwig, linux-btrfs

On Tue, Aug 01, 2023 at 10:22:58AM +0800, Wang Yugui wrote:
> 'git bisect' shows that this patch is the root cause of the performance
> regression:
> 
> e917ff56c8e7 ("btrfs: determine synchronous writers from bio or
> writeback control") by Christoph Hellwig

Odd.  What CPU are you using to test?  It seems like it doesn't set
BTRFS_FS_CSUM_IMPL_FAST as that is the only way to even hit a potential
difference.  Or are you using a non-standard checksum type?
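
For reference, should_async_write() in 6.5-rc makes the offload decision
roughly like this (a sketch reconstructed from the diffs quoted later in
this thread, not a verbatim copy of fs/btrfs/bio.c):

static bool should_async_write(struct btrfs_bio *bbio)
{
	/* Submit synchronously if the checksum implementation is fast. */
	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
		return false;

	/*
	 * Try to defer the submission to a workqueue to parallelize the
	 * checksum calculation unless the I/O is issued synchronously.
	 */
	if (op_is_sync(bbio->bio.bi_opf))
		return false;

	/* Zoned devices require I/O to be submitted in order. */
	if (op_is_write(bbio->bio.bi_opf) && btrfs_is_zoned(bbio->fs_info))
		return false;

	return true;
}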


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-01  8:35   ` Christoph Hellwig
@ 2023-08-01  8:56     ` Wang Yugui
  2023-08-01  9:03       ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Wang Yugui @ 2023-08-01  8:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs

Hi,

> On Tue, Aug 01, 2023 at 10:22:58AM +0800, Wang Yugui wrote:
> > 'git bisect' shows that this patch is the root cause of the performance
> > regression:
> > 
> > e917ff56c8e7 ("btrfs: determine synchronous writers from bio or
> > writeback control") by Christoph Hellwig
> 
> Odd.  What CPU are you using to test?  It seems like it doesn't set
> BTRFS_FS_CSUM_IMPL_FAST as that is the only way to even hit a potential
> difference.  Or are you using a non-standard checksum type?

The CPU is an E5-2680 v2,
and I used the default checksum type of mkfs.btrfs v6.1.

dd (conv=fsync, single thread) works well.
fio (multiple jobs) does NOT work well.

So maybe a locking problem?

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/08/01



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-01  8:56     ` Wang Yugui
@ 2023-08-01  9:03       ` Christoph Hellwig
  2023-08-01  9:32         ` Wang Yugui
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2023-08-01  9:03 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Christoph Hellwig, linux-btrfs

On Tue, Aug 01, 2023 at 04:56:58PM +0800, Wang Yugui wrote:
> > Odd.  What CPU are you using to test?  It seems like it doesn't set
> > BTRFS_FS_CSUM_IMPL_FAST as that is the only way to even hit a potential
> > difference.  Or are you using a non-standard checksum type?
> 
> The CPU is E5 2680 v2.

Is this in a VM and not passing through cpu flags?  What does dmesg
say when mounting?  Normally it should say something like:

[   23.461448] BTRFS info (device vdb): using crc32c (crc32c-intel) checksum algorithm

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-01  9:03       ` Christoph Hellwig
@ 2023-08-01  9:32         ` Wang Yugui
  2023-08-01 10:00           ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Wang Yugui @ 2023-08-01  9:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs

Hi,

> On Tue, Aug 01, 2023 at 04:56:58PM +0800, Wang Yugui wrote:
> > > Odd.  What CPU are you using to test?  It seems like it doesn't set
> > > BTRFS_FS_CSUM_IMPL_FAST as that is the only way to even hit a potential
> > > difference.  Or are you using a non-standard checksum type?
> > 
> > The CPU is E5 2680 v2.
> 
> Is this in a VM and not passing through cpu flags?  What does dmesg
> say when mounting?  Normally it should say something like:
> 
> [   23.461448] BTRFS info (device vdb): using crc32c (crc32c-intel) checksum algorithm

This is NOT a VM.

dmesg output:
[  250.596544] raid6: skipped pq benchmark and selected sse2x4
[  250.602836] raid6: using ssse3x2 recovery algorithm
[  250.612812] xor: automatically using best checksumming function   avx       
[  250.895573] Btrfs loaded, assert=on, zoned=yes, fsverity=no
[  250.905249] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 3 transid 6 /dev/nvme3n1 scanned by systemd-udevd (1726)
[  250.922155] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 4 transid 6 /dev/nvme0n1 scanned by systemd-udevd (1729)
[  250.935965] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 1 transid 6 /dev/nvme1n1 scanned by systemd-udevd (1724)
[  250.968268] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 2 transid 6 /dev/nvme2n1 scanned by systemd-udevd (1723)
[  251.070139] BTRFS info (device nvme1n1): using crc32c (crc32c-intel) checksum algorithm
[  251.079164] BTRFS info (device nvme1n1): using free space tree
[  251.089871] BTRFS info (device nvme1n1): enabling ssd optimizations
[  251.096921] BTRFS info (device nvme1n1): auto enabling async discard
[  251.104417] BTRFS info (device nvme1n1): checking UUID tree

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/08/01


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-01  9:32         ` Wang Yugui
@ 2023-08-01 10:00           ` Christoph Hellwig
  2023-08-01 13:04             ` Wang Yugui
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2023-08-01 10:00 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Christoph Hellwig, linux-btrfs

On Tue, Aug 01, 2023 at 05:32:13PM +0800, Wang Yugui wrote:
> dmesg output:
> [  250.596544] raid6: skipped pq benchmark and selected sse2x4
> [  250.602836] raid6: using ssse3x2 recovery algorithm
> [  250.612812] xor: automatically using best checksumming function   avx       
> [  250.895573] Btrfs loaded, assert=on, zoned=yes, fsverity=no
> [  250.905249] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 3 transid 6 /dev/nvme3n1 scanned by systemd-udevd (1726)
> [  250.922155] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 4 transid 6 /dev/nvme0n1 scanned by systemd-udevd (1729)
> [  250.935965] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 1 transid 6 /dev/nvme1n1 scanned by systemd-udevd (1724)
> [  250.968268] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 2 transid 6 /dev/nvme2n1 scanned by systemd-udevd (1723)
> [  251.070139] BTRFS info (device nvme1n1): using crc32c (crc32c-intel) checksum algorithm

So this is using the normal accelerated crc32c algorithm that sets
BTRFS_FS_CSUM_IMPL_FAST.  Which means the commit doesn't change
behavior in should_async_write, which is the only place that checks
the sync_writers flag.  Can you retry the bisection or apply the patch
below for a revert on top of latest mainline?

---
From 9bdae7bbe4144b9bb49a28a4ee1de5c0f81f9b81 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 1 Aug 2023 10:27:25 +0200
Subject: Revert "btrfs: determine synchronous writers from bio or writeback
 control"

This reverts commit e917ff56c8e7b117b590632fa40a08e36577d31f.
---
 fs/btrfs/bio.c         | 7 ++++---
 fs/btrfs/btrfs_inode.h | 3 +++
 fs/btrfs/file.c        | 8 ++++++++
 fs/btrfs/inode.c       | 1 +
 fs/btrfs/transaction.c | 2 ++
 5 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 12b12443efaabb..8fecf4e84da2bf 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -602,10 +602,11 @@ static bool should_async_write(struct btrfs_bio *bbio)
 		return false;
 
 	/*
-	 * Try to defer the submission to a workqueue to parallelize the
-	 * checksum calculation unless the I/O is issued synchronously.
+	 * If the I/O is not issued by fsync and friends, (->sync_writers != 0),
+	 * then try to defer the submission to a workqueue to parallelize the
+	 * checksum calculation.
 	 */
-	if (op_is_sync(bbio->bio.bi_opf))
+	if (atomic_read(&bbio->inode->sync_writers))
 		return false;
 
 	/* Zoned devices require I/O to be submitted in order. */
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index d47a927b3504d6..4efe895359dcf8 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -116,6 +116,9 @@ struct btrfs_inode {
 
 	unsigned long runtime_flags;
 
+	/* Keep track of who's O_SYNC/fsyncing currently */
+	atomic_t sync_writers;
+
 	/* full 64 bit generation number, struct vfs_inode doesn't have a big
 	 * enough field for this.
 	 */
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fd03e689a6bedc..3e37a62a6b5db7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1648,6 +1648,7 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 	struct file *file = iocb->ki_filp;
 	struct btrfs_inode *inode = BTRFS_I(file_inode(file));
 	ssize_t num_written, num_sync;
+	const bool sync = iocb_is_dsync(iocb);
 
 	/*
 	 * If the fs flips readonly due to some impossible error, although we
@@ -1660,6 +1661,9 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 	if (encoded && (iocb->ki_flags & IOCB_NOWAIT))
 		return -EOPNOTSUPP;
 
+	if (sync)
+		atomic_inc(&inode->sync_writers);
+
 	if (encoded) {
 		num_written = btrfs_encoded_write(iocb, from, encoded);
 		num_sync = encoded->len;
@@ -1679,6 +1683,8 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 			num_written = num_sync;
 	}
 
+	if (sync)
+		atomic_dec(&inode->sync_writers);
 	return num_written;
 }
 
@@ -1722,7 +1728,9 @@ static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
 	 * several segments of stripe length (currently 64K).
 	 */
 	blk_start_plug(&plug);
+	atomic_inc(&BTRFS_I(inode)->sync_writers);
 	ret = btrfs_fdatawrite_range(inode, start, end);
+	atomic_dec(&BTRFS_I(inode)->sync_writers);
 	blk_finish_plug(&plug);
 
 	return ret;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 49cef61f6a39f5..b9bad13ab75d19 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8618,6 +8618,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->io_tree.inode = ei;
 	extent_io_tree_init(fs_info, &ei->file_extent_tree,
 			    IO_TREE_INODE_FILE_EXTENT);
+	atomic_set(&ei->sync_writers, 0);
 	mutex_init(&ei->log_mutex);
 	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
 	INIT_LIST_HEAD(&ei->delalloc_inodes);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 91b6c2fdc420e7..cda2b86de18814 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1060,6 +1060,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
 	u64 start = 0;
 	u64 end;
 
+	atomic_inc(&BTRFS_I(fs_info->btree_inode)->sync_writers);
 	while (!find_first_extent_bit(dirty_pages, start, &start, &end,
 				      mark, &cached_state)) {
 		bool wait_writeback = false;
@@ -1095,6 +1096,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
 		cond_resched();
 		start = end + 1;
 	}
+	atomic_dec(&BTRFS_I(fs_info->btree_inode)->sync_writers);
 	return werr;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-01 10:00           ` Christoph Hellwig
@ 2023-08-01 13:04             ` Wang Yugui
  2023-08-01 14:59               ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Wang Yugui @ 2023-08-01 13:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs

Hi,

> On Tue, Aug 01, 2023 at 05:32:13PM +0800, Wang Yugui wrote:
> > dmesg output:
> > [  250.596544] raid6: skipped pq benchmark and selected sse2x4
> > [  250.602836] raid6: using ssse3x2 recovery algorithm
> > [  250.612812] xor: automatically using best checksumming function   avx       
> > [  250.895573] Btrfs loaded, assert=on, zoned=yes, fsverity=no
> > [  250.905249] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 3 transid 6 /dev/nvme3n1 scanned by systemd-udevd (1726)
> > [  250.922155] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 4 transid 6 /dev/nvme0n1 scanned by systemd-udevd (1729)
> > [  250.935965] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 1 transid 6 /dev/nvme1n1 scanned by systemd-udevd (1724)
> > [  250.968268] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 2 transid 6 /dev/nvme2n1 scanned by systemd-udevd (1723)
> > [  251.070139] BTRFS info (device nvme1n1): using crc32c (crc32c-intel) checksum algorithm
> 
> So this is using the normal accelerated crc32c algorithm that sets
> BTRFS_FS_CSUM_IMPL_FAST.  Which means the commit doesn't change
> behavior in should_async_write, which is the only place that checks
> the sync_writers flag.  Can you retry the bisection or apply the patch
> below for a revert on top of latest mainline?

bad performance:
	6.5.0-rc4 with the revert of e917ff56c8e7b117b590632fa40a08e36577d31f

so I redid the bisection.

bad performance (same as previous report):
	6.4.0 + patches up to and including e917ff56c8e7b117b590632fa40a08e36577d31f

bad performance (good performance in previous report):
	6.4.0 + patches before e917ff56c8e7b117b590632fa40a08e36577d31f

good performance:
	drop 'btrfs: submit IO synchronously for fast checksum implementations' too
	6.4.0 + patches up to 'btrfs: use SECTOR_SHIFT to convert LBA to physical offset'

but I have tested 6.1.y with a patch almost the same as
'btrfs: submit IO synchronously for fast checksum implementations'
over 20+ runs; no performance regression found.

 static bool should_async_write(struct btrfs_fs_info *fs_info,
 			     struct btrfs_inode *bi)
 {
+	/* should_async_write() is only called by btrfs_submit_metadata_bio(), i.e. REQ_META */
+	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
+		return false;
 	if (btrfs_is_zoned(fs_info))
 		return false;
 	if (atomic_read(&bi->sync_writers))
 		return false;
-	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
-		return false;
 	return true;
 }
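
For clarity, the resulting 6.1.y helper after the hunk above would look
roughly like this (a sketch assembled from the hunk; the surrounding 6.1
code is assumed):

static bool should_async_write(struct btrfs_fs_info *fs_info,
			       struct btrfs_inode *bi)
{
	/*
	 * Per the comment in the hunk, this helper is only reached from
	 * btrfs_submit_metadata_bio() in 6.1, so the fast-checksum short
	 * cut only ever affects REQ_META I/O; data writes keep being
	 * offloaded, which matches the "no regression on 6.1.y" result.
	 */
	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
		return false;
	if (btrfs_is_zoned(fs_info))
		return false;
	if (atomic_read(&bi->sync_writers))
		return false;
	return true;
}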


BTW, I checked the memory ECC status; no errors are reported.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/08/01


> ---
> From 9bdae7bbe4144b9bb49a28a4ee1de5c0f81f9b81 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Tue, 1 Aug 2023 10:27:25 +0200
> Subject: Revert "btrfs: determine synchronous writers from bio or writeback
>  control"
> 
> This reverts commit e917ff56c8e7b117b590632fa40a08e36577d31f.
> ---
>  fs/btrfs/bio.c         | 7 ++++---
>  fs/btrfs/btrfs_inode.h | 3 +++
>  fs/btrfs/file.c        | 8 ++++++++
>  fs/btrfs/inode.c       | 1 +
>  fs/btrfs/transaction.c | 2 ++
>  5 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 12b12443efaabb..8fecf4e84da2bf 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -602,10 +602,11 @@ static bool should_async_write(struct btrfs_bio *bbio)
>  		return false;
>  
>  	/*
> -	 * Try to defer the submission to a workqueue to parallelize the
> -	 * checksum calculation unless the I/O is issued synchronously.
> +	 * If the I/O is not issued by fsync and friends, (->sync_writers != 0),
> +	 * then try to defer the submission to a workqueue to parallelize the
> +	 * checksum calculation.
>  	 */
> -	if (op_is_sync(bbio->bio.bi_opf))
> +	if (atomic_read(&bbio->inode->sync_writers))
>  		return false;
>  
>  	/* Zoned devices require I/O to be submitted in order. */
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index d47a927b3504d6..4efe895359dcf8 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -116,6 +116,9 @@ struct btrfs_inode {
>  
>  	unsigned long runtime_flags;
>  
> +	/* Keep track of who's O_SYNC/fsyncing currently */
> +	atomic_t sync_writers;
> +
>  	/* full 64 bit generation number, struct vfs_inode doesn't have a big
>  	 * enough field for this.
>  	 */
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index fd03e689a6bedc..3e37a62a6b5db7 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1648,6 +1648,7 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
>  	struct file *file = iocb->ki_filp;
>  	struct btrfs_inode *inode = BTRFS_I(file_inode(file));
>  	ssize_t num_written, num_sync;
> +	const bool sync = iocb_is_dsync(iocb);
>  
>  	/*
>  	 * If the fs flips readonly due to some impossible error, although we
> @@ -1660,6 +1661,9 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
>  	if (encoded && (iocb->ki_flags & IOCB_NOWAIT))
>  		return -EOPNOTSUPP;
>  
> +	if (sync)
> +		atomic_inc(&inode->sync_writers);
> +
>  	if (encoded) {
>  		num_written = btrfs_encoded_write(iocb, from, encoded);
>  		num_sync = encoded->len;
> @@ -1679,6 +1683,8 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
>  			num_written = num_sync;
>  	}
>  
> +	if (sync)
> +		atomic_dec(&inode->sync_writers);
>  	return num_written;
>  }
>  
> @@ -1722,7 +1728,9 @@ static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
>  	 * several segments of stripe length (currently 64K).
>  	 */
>  	blk_start_plug(&plug);
> +	atomic_inc(&BTRFS_I(inode)->sync_writers);
>  	ret = btrfs_fdatawrite_range(inode, start, end);
> +	atomic_dec(&BTRFS_I(inode)->sync_writers);
>  	blk_finish_plug(&plug);
>  
>  	return ret;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 49cef61f6a39f5..b9bad13ab75d19 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8618,6 +8618,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
>  	ei->io_tree.inode = ei;
>  	extent_io_tree_init(fs_info, &ei->file_extent_tree,
>  			    IO_TREE_INODE_FILE_EXTENT);
> +	atomic_set(&ei->sync_writers, 0);
>  	mutex_init(&ei->log_mutex);
>  	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
>  	INIT_LIST_HEAD(&ei->delalloc_inodes);
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 91b6c2fdc420e7..cda2b86de18814 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1060,6 +1060,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
>  	u64 start = 0;
>  	u64 end;
>  
> +	atomic_inc(&BTRFS_I(fs_info->btree_inode)->sync_writers);
>  	while (!find_first_extent_bit(dirty_pages, start, &start, &end,
>  				      mark, &cached_state)) {
>  		bool wait_writeback = false;
> @@ -1095,6 +1096,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
>  		cond_resched();
>  		start = end + 1;
>  	}
> +	atomic_dec(&BTRFS_I(fs_info->btree_inode)->sync_writers);
>  	return werr;
>  }
>  
> -- 
> 2.39.2



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-01 13:04             ` Wang Yugui
@ 2023-08-01 14:59               ` Christoph Hellwig
  2023-08-01 15:51                 ` Wang Yugui
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2023-08-01 14:59 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Christoph Hellwig, linux-btrfs

On Tue, Aug 01, 2023 at 09:04:05PM +0800, Wang Yugui wrote:
> good performance
> 	drop 'btrfs: submit IO synchronously for fast checksum implementations' too
> 	6.4.0 + patches until ' btrfs: use SECTOR_SHIFT to convert LBA to physical offset'
> 
> but I have tested 6.1.y with  a patch almost same as 
> 'btrfs: submit IO synchronously for fast checksum implementations'
> for over 20+ times, no performance regression found.

Can you try a git-revert of 140fb1f734736a on the latest tree (which
should work cleanly) for an additional data point?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-01 14:59               ` Christoph Hellwig
@ 2023-08-01 15:51                 ` Wang Yugui
  2023-08-01 15:56                   ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Wang Yugui @ 2023-08-01 15:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs

Hi,

> On Tue, Aug 01, 2023 at 09:04:05PM +0800, Wang Yugui wrote:
> > good performance
> > 	drop 'btrfs: submit IO synchronously for fast checksum implementations' too
> > 	6.4.0 + patches until ' btrfs: use SECTOR_SHIFT to convert LBA to physical offset'
> > 
> > but I have tested 6.1.y with  a patch almost same as 
> > 'btrfs: submit IO synchronously for fast checksum implementations'
> > for over 20+ times, no performance regression found.
> 
> Can you try a git-revert of 140fb1f734736a on the latest tree (which
> should work cleanly) for an additional data point?

GOOD performance when btrfs 6.5-rc4 is built with
	Revert "btrfs: determine synchronous writers from bio or writeback control"
	Revert "btrfs: submit IO synchronously for fast checksum implementations"

GOOD performance when btrfs 6.5-rc4 is built with this fix, too:
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 1f3e06ec6924..1b7344e673db 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -598,7 +598,7 @@ static void run_one_async_free(struct btrfs_work *work)
 static bool should_async_write(struct btrfs_bio *bbio)
 {
        /* Submit synchronously if the checksum implementation is fast. */
-       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
+       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
                return false;

        /*

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/08/01



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-01 15:51                 ` Wang Yugui
@ 2023-08-01 15:56                   ` Christoph Hellwig
  2023-08-01 15:57                     ` Christoph Hellwig
  2023-08-02  0:04                     ` Wang Yugui
  0 siblings, 2 replies; 29+ messages in thread
From: Christoph Hellwig @ 2023-08-01 15:56 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Christoph Hellwig, linux-btrfs

On Tue, Aug 01, 2023 at 11:51:28PM +0800, Wang Yugui wrote:
> > Can you try a git-revert of 140fb1f734736a on the latest tree (which
> > should work cleanly) for an additional data point?
> 
> GOOD performance when btrfs 6.5-rc4 is built with
> 	Revert "btrfs: determine synchronous writers from bio or writeback control"
> 	Revert "btrfs: submit IO synchronously for fast checksum implementations"

And with only a revert of

"btrfs: submit IO synchronously for fast checksum implementations"?

> 
> GOOD performance when btrfs 6.5-rc4 is built with this fix, too:
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 1f3e06ec6924..1b7344e673db 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -598,7 +598,7 @@ static void run_one_async_free(struct btrfs_work *work)
>  static bool should_async_write(struct btrfs_bio *bbio)
>  {
>         /* Submit synchronously if the checksum implementation is fast. */
> -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
> +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>                 return false;

This disables synchronous checksum calculation entirely for data I/O.
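
In other words the effective decision becomes this (an illustrative
sketch only; the helper and its bool parameters are made up to show the
behavior and exist in no tree):

static bool effective_should_async_write(bool csum_fast, bool is_meta,
					 bool is_sync_io, bool is_zoned_write)
{
	if (is_meta && csum_fast)	/* only metadata keeps the inline path */
		return false;
	if (is_sync_io)			/* O_SYNC/fsync-style I/O stays inline */
		return false;
	if (is_zoned_write)		/* zoned devices need in-order submission */
		return false;
	return true;			/* all other data writes: offload checksumming */
}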

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-01 15:56                   ` Christoph Hellwig
@ 2023-08-01 15:57                     ` Christoph Hellwig
  2023-08-02  0:04                     ` Wang Yugui
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2023-08-01 15:57 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Christoph Hellwig, linux-btrfs

Also I'm curious if you see any difference for a non-RAID0 (i.e.
single profile) workload.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-01 15:56                   ` Christoph Hellwig
  2023-08-01 15:57                     ` Christoph Hellwig
@ 2023-08-02  0:04                     ` Wang Yugui
  2023-08-02  9:26                       ` Christoph Hellwig
  1 sibling, 1 reply; 29+ messages in thread
From: Wang Yugui @ 2023-08-02  0:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs

Hi,

> On Tue, Aug 01, 2023 at 11:51:28PM +0800, Wang Yugui wrote:
> > > Can you try a git-revert of 140fb1f734736a on the latest tree (which
> > > should work cleanly) for an additional data point?
> > 
> > GOOD performance when btrfs 6.5-rc4 is built with
> > 	Revert "btrfs: determine synchronous writers from bio or writeback control"
> > 	Revert "btrfs: submit IO synchronously for fast checksum implementations"
> 
> And with only a revert of
> 
> "btrfs: submit IO synchronously for fast checksum implementations"?

GOOD performance when only reverting "btrfs: submit IO synchronously for fast
checksum implementations".

> > 
> > GOOD performance when btrfs 6.5-rc4 is built with this fix, too:
> > diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> > index 1f3e06ec6924..1b7344e673db 100644
> > --- a/fs/btrfs/bio.c
> > +++ b/fs/btrfs/bio.c
> > @@ -598,7 +598,7 @@ static void run_one_async_free(struct btrfs_work *work)
> >  static bool should_async_write(struct btrfs_bio *bbio)
> >  {
> >         /* Submit synchronously if the checksum implementation is fast. */
> > -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
> > +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
> >                 return false;
> 
> This disables synchronous checksum calculation entirely for data I/O.

Without this fix, is the data I/O checksum always calculated synchronously?
Is this a feature change of "btrfs: submit IO synchronously for fast checksum implementations"?


> Also I'm curious if you see any difference for a non-RAID0 (i.e.
> single profile) workload.

'-m single -d single' is about 10% slower than '-m raid1 -d raid0' in this test
case.


Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/08/02


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-07-31  7:22 btrfs write-bandwidth performance regression of 6.5-rc4/rc3 Wang Yugui
  2023-08-01  2:22 ` Wang Yugui
@ 2023-08-02  8:45 ` Linux regression tracking #adding (Thorsten Leemhuis)
  1 sibling, 0 replies; 29+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-08-02  8:45 UTC (permalink / raw)
  To: Wang Yugui, linux-btrfs, Linux kernel regressions list

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 31.07.23 09:22, Wang Yugui wrote:
> 
> I noticed a btrfs write-bandwidth performance regression in 6.5-rc4/rc3
> compared to btrfs 6.4.0
> 
> test case:
>   disk: NVMe PCIe3 SSD *4 
>   btrfs: -m raid1 -d raid0
>   fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30
>    -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 -numjobs=4
>    -directory=/mnt/test
> 
> 6.5-rc4/rc3
> 	fio(-numjobs=4) WRITE: bw=1512MiB/s (1586MB/s)
> 	but ‘dd conv=fsync’ works well, 2.1 GiB/s.
> 
> 6.4.0
> 	fio(-numjobs=4) WRITE: bw=4147MiB/s (4349MB/s)
> 
> 'git bisect' between 6.4 and 6.5 is not yet done.
> 

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced da023618076
#regzbot title btrfs: write-bandwidth performance regression with NVMe raid0
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it is already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-02  0:04                     ` Wang Yugui
@ 2023-08-02  9:26                       ` Christoph Hellwig
  2023-08-11  8:58                         ` Linux regression tracking (Thorsten Leemhuis)
  2023-08-11 14:23                         ` Wang Yugui
  0 siblings, 2 replies; 29+ messages in thread
From: Christoph Hellwig @ 2023-08-02  9:26 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Christoph Hellwig, linux-btrfs, Chris Mason

On Wed, Aug 02, 2023 at 08:04:57AM +0800, Wang Yugui wrote:
> > And with only a revert of
> > 
> > "btrfs: submit IO synchronously for fast checksum implementations"?
> 
> GOOD performance when only reverting "btrfs: submit IO synchronously for fast
> checksum implementations".

Ok, so you have a case where the offload for the checksumming generation
actually helps (by a lot).  Adding Chris to the Cc list as he was
involved with this.

> > > -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
> > > +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
> > >                 return false;
> > 
> > This disables synchronous checksum calculation entirely for data I/O.
> 
> Without this fix, is the data I/O checksum always calculated synchronously?
> Is this a feature change of "btrfs: submit IO synchronously for fast checksum implementations"?

It is never synchronous with the above patch.

> 
> > Also I'm curious if you see any difference for a non-RAID0 (i.e.
> > single profile) workload.
> 
> '-m single -d single' is about 10% slower than '-m raid1 -d raid0' in this test
> case.

How does it compare with and without the revert?  Can you add the numbers?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-02  9:26                       ` Christoph Hellwig
@ 2023-08-11  8:58                         ` Linux regression tracking (Thorsten Leemhuis)
  2023-08-11 10:31                           ` Christoph Hellwig
  2023-08-11 14:23                         ` Wang Yugui
  1 sibling, 1 reply; 29+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-08-11  8:58 UTC (permalink / raw)
  To: Christoph Hellwig, Wang Yugui
  Cc: linux-btrfs, Chris Mason, David Sterba, Josef Bacik,
	Linux kernel regressions list

On 02.08.23 11:26, Christoph Hellwig wrote:
> On Wed, Aug 02, 2023 at 08:04:57AM +0800, Wang Yugui wrote:
>>> And with only a revert of
>>>
>>> "btrfs: submit IO synchronously for fast checksum implementations"?
>>
>> GOOD performance when only reverting "btrfs: submit IO synchronously for fast
>> checksum implementations".
> 
> Ok, so you have a case where the offload for the checksumming generation
> actually helps (by a lot).  Adding Chris to the Cc list as he was
> involved with this.

Radio silence from Chris here and on lore in general afaics. Also
nothing new in this thread for more than a week now.

CCing David and Josef, maybe they have an idea what's up here and if
Chris might be afk for longer -- and maybe this can still be fixed
before the 6.5 release.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

>>>> -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>> +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>>                 return false;
>>>
>>> This disables synchronous checksum calculation entirely for data I/O.
>>
>> Without this fix, is the data I/O checksum always calculated synchronously?
>> Is this a feature change of "btrfs: submit IO synchronously for fast checksum implementations"?
> 
> It is never synchronous with the above patch.
> 
>>
>>> Also I'm curious if you see any difference for a non-RAID0 (i.e.
>>> single profile) workload.
>>
>> '-m single -d single' is about 10% slower than '-m raid1 -d raid0' in this test
>> case.
> 
> How does it compare with and without the revert?  Can you add the numbers?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-11  8:58                         ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-08-11 10:31                           ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2023-08-11 10:31 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Christoph Hellwig, Wang Yugui, linux-btrfs, Chris Mason,
	David Sterba, Josef Bacik

On Fri, Aug 11, 2023 at 10:58:27AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> > Ok, so you have a case where the offload for the checksumming generation
> > actually helps (by a lot).  Adding Chris to the Cc list as he was
> > involved with this.
> 
> Radio silence from Chris here and on lore in general afaics. Also
> nothing new in this thread for more than a week now.

We're also waiting for additional data from Wang.

I'll be off for a two-week vacation tomorrow with rather limited
internet connectivity, so if we want to make progress it had better be
today or tomorrow morning.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-02  9:26                       ` Christoph Hellwig
  2023-08-11  8:58                         ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-08-11 14:23                         ` Wang Yugui
  2023-08-11 14:52                           ` Chris Mason
  1 sibling, 1 reply; 29+ messages in thread
From: Wang Yugui @ 2023-08-11 14:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs, Chris Mason

Hi,


> On Wed, Aug 02, 2023 at 08:04:57AM +0800, Wang Yugui wrote:
> > > And with only a revert of
> > > 
> > > "btrfs: submit IO synchronously for fast checksum implementations"?
> > 
> > GOOD performance when only reverting "btrfs: submit IO synchronously for fast
> > checksum implementations".
> 
> Ok, so you have a case where the offload for the checksumming generation
> actually helps (by a lot).  Adding Chris to the Cc list as he was
> involved with this.
> 
> > > > -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
> > > > +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
> > > >                 return false;
> > > 
> > > This disables synchronous checksum calculation entirely for data I/O.
> > 
> > Without this fix, is the data I/O checksum always calculated synchronously?
> > Is this a feature change of "btrfs: submit IO synchronously for fast checksum implementations"?
> 
> It is never synchronous with the above patch.
> 
> > 
> > > Also I'm curious if you see any difference for a non-RAID0 (i.e.
> > > single profile) workload.
> > 
> > '-m single -d single' is about 10% slower than '-m raid1 -d raid0' in this test
> > case.
> 
> How does it compare with and without the revert?  Can you add the numbers?

The patch from Wang Yugui:
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 12b12443efaa..8ef968f0957d 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -598,7 +598,7 @@ static void run_one_async_free(struct btrfs_work *work)
 static bool should_async_write(struct btrfs_bio *bbio)
 {
 	/* Submit synchronously if the checksum implementation is fast. */
-	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
+	if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
 		return false;
 
 	/*

btrfs device: NVMe SSD PCIe3 x4

fio (-numjobs=4) on linux 6.5-rc5 with the patch, btrfs '-m raid1 -d raid0'
  WRITE: bw=3858MiB/s (4045MB/s)
  WRITE: bw=3781MiB/s (3965MB/s)

fio (-numjobs=4) on linux 6.5-rc5 with the patch, btrfs '-m single -d single'
  WRITE: bw=3004MiB/s (3149MB/s)
  WRITE: bw=2851MiB/s (2990MB/s)

fio (-numjobs=4) on linux 6.5-rc5 without the patch, btrfs '-m raid1 -d raid0'
  WRITE: bw=1311MiB/s (1375MB/s)
  WRITE: bw=1435MiB/s (1504MB/s)

fio (-numjobs=4) on linux 6.5-rc5 without the patch, btrfs '-m single -d single'
  WRITE: bw=1337MiB/s (1402MB/s)
  WRITE: bw=1413MiB/s (1481MB/s)


Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/08/11




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-11 14:23                         ` Wang Yugui
@ 2023-08-11 14:52                           ` Chris Mason
  2023-08-13  9:50                             ` Wang Yugui
  0 siblings, 1 reply; 29+ messages in thread
From: Chris Mason @ 2023-08-11 14:52 UTC (permalink / raw)
  To: Wang Yugui, Christoph Hellwig; +Cc: linux-btrfs, Chris Mason

On 8/11/23 10:23 AM, Wang Yugui wrote:
> Hi,
> 
> 
>> On Wed, Aug 02, 2023 at 08:04:57AM +0800, Wang Yugui wrote:
>>>> And with only a revert of
>>>>
>>>> "btrfs: submit IO synchronously for fast checksum implementations"?
>>>
>>> GOOD performance when only reverting "btrfs: submit IO synchronously for fast
>>> checksum implementations".
>>
>> Ok, so you have a case where the offload for the checksumming generation
>> actually helps (by a lot).  Adding Chris to the Cc list as he was
>> involved with this.
>>
>>>>> -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>>> +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>>>                 return false;
>>>>
>>>> This disables synchronous checksum calculation entirely for data I/O.
>>>
>>> Without this fix, is the data I/O checksum always calculated synchronously?
>>> Is this a feature change of "btrfs: submit IO synchronously for fast checksum implementations"?
>>
>> It is never synchronous with the above patch.
>>
>>>
>>>> Also I'm curious if you see any difference for a non-RAID0 (i.e.
>>>> single profile) workload.
>>>
>>> '-m single -d single' is about 10% slower than '-m raid1 -d raid0' in this test
>>> case.
>>
>> How does it compare with and without the revert?  Can you add the numbers?
> 

Looking through the thread, you're comparing -m single -d single, but
btrfs is still doing the raid.

Sorry to keep asking for more runs, but these numbers are a surprise,
and I probably won't have time today to reproduce before vacation next
week (sadly, Christoph and I aren't going together).

Can you please do a run where lvm or MD raid are providing the raid0?

It doesn't look like you're using compression, but I wanted to double check.

How much ram do you have?

Your fio run has 4 jobs going, can I please see the full fio output for
a fast run and a slow run?

Thanks!

-chris


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-11 14:52                           ` Chris Mason
@ 2023-08-13  9:50                             ` Wang Yugui
  2023-08-29  9:45                               ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 29+ messages in thread
From: Wang Yugui @ 2023-08-13  9:50 UTC (permalink / raw)
  To: Chris Mason; +Cc: Christoph Hellwig, linux-btrfs, Chris Mason

[-- Attachment #1: Type: text/plain, Size: 2322 bytes --]

Hi,

> On 8/11/23 10:23 AM, Wang Yugui wrote:
> > Hi,
> > 
> > 
> >> On Wed, Aug 02, 2023 at 08:04:57AM +0800, Wang Yugui wrote:
> >>>> And with only a revert of
> >>>>
> >>>> "btrfs: submit IO synchronously for fast checksum implementations"?
> >>>
> >>> GOOD performance when only reverting "btrfs: submit IO synchronously for fast
> >>> checksum implementations".
> >>
> >> Ok, so you have a case where the offload for the checksumming generation
> >> actually helps (by a lot).  Adding Chris to the Cc list as he was
> >> involved with this.
> >>
> >>>>> -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
> >>>>> +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
> >>>>>                 return false;
> >>>>
> >>>> This disables synchronous checksum calculation entirely for data I/O.
> >>>
> >>> Without this fix, is the data I/O checksum always calculated synchronously?
> >>> Is this a feature change of "btrfs: submit IO synchronously for fast checksum implementations"?
> >>
> >> It is never synchronous with the above patch.
> >>
> >>>
> >>>> Also I'm curious if you see any difference for a non-RAID0 (i.e.
> >>>> single profile) workload.
> >>>
> >>> '-m single -d single' is about 10% slower than '-m raid1 -d raid0' in this test
> >>> case.
> >>
> >> How does it compare with and without the revert?  Can you add the numbers?
> > 
> 
> Looking through the thread, you're comparing -m single -d single, but
> btrfs is still doing the raid.
> 
> Sorry to keep asking for more runs, but these numbers are a surprise,
> and I probably won't have time today to reproduce before vacation next
> week (sadly, Christoph and I aren't going together).
> 
> Can you please do a run where lvm or MD raid are providing the raid0?

No LVM/MD is used here.

> It doesn't look like you're using compression, but I wanted to double check.

Yes, '-m xx -d yy' with otherwise default mkfs.btrfs options, so no compression.

> How much ram do you have?

192G ECC memory.

Two CPU NUMA nodes, but all PCIe3 NVMe SSDs are connected to one NVMe HBA /
one NUMA node.


> Your fio run has 4 jobs going, can I please see the full fio output for
> a fast run and a slow run?

fio results are saved in the attached files (fast.txt & slow.txt).

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/08/13

[-- Attachment #2: fast.txt --]
[-- Type: application/octet-stream, Size: 6215 bytes --]

+ fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30 -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 -numjobs=4 -directory=/mnt/test
write-bandwidth: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=sync, iodepth=1
...
fio-3.19
Starting 4 processes
write-bandwidth: Laying out IO file (1 file / 32768MiB)
write-bandwidth: Laying out IO file (1 file / 32768MiB)
write-bandwidth: Laying out IO file (1 file / 32768MiB)
write-bandwidth: Laying out IO file (1 file / 32768MiB)
Jobs: 1 (f=1): [_(3),F(1)][100.0%][eta 00m:00s]
write-bandwidth: (groupid=0, jobs=1): err= 0: pid=2570: Sun Aug 13 17:47:02 2023
  write: IOPS=697, BW=697MiB/s (731MB/s)(21.7GiB/31883msec); 0 zone resets
    clat (usec): min=325, max=23003, avg=1345.13, stdev=3158.16
     lat (usec): min=326, max=23003, avg=1345.57, stdev=3158.16
    clat percentiles (usec):
     |  1.00th=[  355],  5.00th=[  388], 10.00th=[  408], 20.00th=[  553],
     | 30.00th=[  734], 40.00th=[  832], 50.00th=[  865], 60.00th=[  889],
     | 70.00th=[  914], 80.00th=[ 1020], 90.00th=[ 1418], 95.00th=[ 1696],
     | 99.00th=[21627], 99.50th=[21890], 99.90th=[22152], 99.95th=[22152],
     | 99.99th=[22676]
   bw (  KiB/s): min=505856, max=1527017, per=26.13%, avg=755313.46, stdev=181122.74, samples=59
   iops        : min=  494, max= 1491, avg=737.59, stdev=176.88, samples=59
  lat (usec)   : 500=17.84%, 750=12.78%, 1000=48.48%
  lat (msec)   : 2=17.07%, 4=1.44%, 50=2.39%
  cpu          : usr=0.42%, sys=59.17%, ctx=7412, majf=0, minf=282
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,22238,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
write-bandwidth: (groupid=0, jobs=1): err= 0: pid=2571: Sun Aug 13 17:47:02 2023
  write: IOPS=718, BW=719MiB/s (754MB/s)(21.8GiB/31099msec); 0 zone resets
    clat (usec): min=326, max=23115, avg=1338.32, stdev=3160.69
     lat (usec): min=326, max=23115, avg=1338.75, stdev=3160.69
    clat percentiles (usec):
     |  1.00th=[  351],  5.00th=[  383], 10.00th=[  404], 20.00th=[  478],
     | 30.00th=[  627], 40.00th=[  824], 50.00th=[  865], 60.00th=[  889],
     | 70.00th=[  914], 80.00th=[ 1074], 90.00th=[ 1418], 95.00th=[ 1696],
     | 99.00th=[21627], 99.50th=[21890], 99.90th=[22152], 99.95th=[22676],
     | 99.99th=[22938]
   bw (  KiB/s): min=501760, max=1881416, per=26.22%, avg=757748.20, stdev=206407.89, samples=59
   iops        : min=  490, max= 1837, avg=739.98, stdev=201.54, samples=59
  lat (usec)   : 500=21.56%, 750=12.61%, 1000=44.20%
  lat (msec)   : 2=17.72%, 4=1.53%, 20=0.01%, 50=2.38%
  cpu          : usr=0.38%, sys=61.88%, ctx=17364, majf=0, minf=284
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,22353,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
write-bandwidth: (groupid=0, jobs=1): err= 0: pid=2572: Sun Aug 13 17:47:02 2023
  write: IOPS=720, BW=721MiB/s (756MB/s)(22.4GiB/31760msec); 0 zone resets
    clat (usec): min=341, max=22932, avg=1307.11, stdev=3226.40
     lat (usec): min=341, max=22932, avg=1307.52, stdev=3226.41
    clat percentiles (usec):
     |  1.00th=[  367],  5.00th=[  392], 10.00th=[  404], 20.00th=[  437],
     | 30.00th=[  553], 40.00th=[  709], 50.00th=[  832], 60.00th=[  873],
     | 70.00th=[  914], 80.00th=[ 1037], 90.00th=[ 1270], 95.00th=[ 1565],
     | 99.00th=[21365], 99.50th=[21890], 99.90th=[22152], 99.95th=[22152],
     | 99.99th=[22676]
   bw (  KiB/s): min=516047, max=2278277, per=26.81%, avg=774783.25, stdev=257993.39, samples=59
   iops        : min=  503, max= 2224, avg=756.59, stdev=251.87, samples=59
  lat (usec)   : 500=25.91%, 750=16.60%, 1000=33.29%
  lat (msec)   : 2=20.14%, 4=1.55%, 50=2.50%
  cpu          : usr=0.45%, sys=56.77%, ctx=13204, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,22894,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
write-bandwidth: (groupid=0, jobs=1): err= 0: pid=2573: Sun Aug 13 17:47:02 2023
  write: IOPS=713, BW=713MiB/s (748MB/s)(22.3GiB/31998msec); 0 zone resets
    clat (usec): min=342, max=23050, avg=1310.51, stdev=3306.47
     lat (usec): min=343, max=23050, avg=1310.95, stdev=3306.48
    clat percentiles (usec):
     |  1.00th=[  371],  5.00th=[  388], 10.00th=[  404], 20.00th=[  433],
     | 30.00th=[  537], 40.00th=[  685], 50.00th=[  848], 60.00th=[  881],
     | 70.00th=[  914], 80.00th=[ 1029], 90.00th=[ 1188], 95.00th=[ 1401],
     | 99.00th=[21365], 99.50th=[21890], 99.90th=[22152], 99.95th=[22152],
     | 99.99th=[22414]
   bw (  KiB/s): min=509952, max=2291342, per=26.73%, avg=772539.32, stdev=258105.99, samples=59
   iops        : min=  498, max= 2237, avg=754.41, stdev=252.01, samples=59
  lat (usec)   : 500=27.87%, 750=14.51%, 1000=34.68%
  lat (msec)   : 2=20.30%, 50=2.64%
  cpu          : usr=0.40%, sys=54.73%, ctx=16519, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,22826,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=2822MiB/s (2959MB/s), 697MiB/s-721MiB/s (731MB/s-756MB/s), io=88.2GiB (94.7GB), run=31099-31998msec

[-- Attachment #3: slow.txt --]
[-- Type: application/octet-stream, Size: 6259 bytes --]

+ fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30 -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 -numjobs=4 -directory=/mnt/test
write-bandwidth: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=sync, iodepth=1
...
fio-3.19
Starting 4 processes
write-bandwidth: Laying out IO file (1 file / 32768MiB)
write-bandwidth: Laying out IO file (1 file / 32768MiB)
write-bandwidth: Laying out IO file (1 file / 32768MiB)
write-bandwidth: Laying out IO file (1 file / 32768MiB)
Jobs: 4 (f=4): [F(4)][100.0%][w=89.9MiB/s][w=89 IOPS][eta 00m:00s]
write-bandwidth: (groupid=0, jobs=1): err= 0: pid=1795: Sun Aug 13 17:37:22 2023
  write: IOPS=325, BW=326MiB/s (342MB/s)(10.1GiB/31758msec); 0 zone resets
    clat (usec): min=335, max=28799, avg=2894.34, stdev=5373.81
     lat (usec): min=335, max=28799, avg=2894.92, stdev=5373.81
    clat percentiles (usec):
     |  1.00th=[  359],  5.00th=[  379], 10.00th=[  404], 20.00th=[  832],
     | 30.00th=[  857], 40.00th=[  873], 50.00th=[  881], 60.00th=[  898],
     | 70.00th=[  963], 80.00th=[ 1336], 90.00th=[16319], 95.00th=[16909],
     | 99.00th=[17957], 99.50th=[18744], 99.90th=[21627], 99.95th=[21627],
     | 99.99th=[21890]
   bw (  KiB/s): min=256000, max=1909180, per=26.26%, avg=347924.54, stdev=223553.56, samples=59
   iops        : min=  250, max= 1864, avg=339.76, stdev=218.26, samples=59
  lat (usec)   : 500=12.11%, 750=4.66%, 1000=57.22%
  lat (msec)   : 2=13.31%, 4=0.16%, 20=12.19%, 50=0.36%
  cpu          : usr=0.18%, sys=31.60%, ctx=1435, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,10348,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
write-bandwidth: (groupid=0, jobs=1): err= 0: pid=1796: Sun Aug 13 17:37:22 2023
  write: IOPS=327, BW=328MiB/s (344MB/s)(10.1GiB/31576msec); 0 zone resets
    clat (usec): min=337, max=28887, avg=2894.97, stdev=5453.77
     lat (usec): min=337, max=28888, avg=2895.57, stdev=5453.76
    clat percentiles (usec):
     |  1.00th=[  359],  5.00th=[  383], 10.00th=[  404], 20.00th=[  775],
     | 30.00th=[  832], 40.00th=[  840], 50.00th=[  857], 60.00th=[  865],
     | 70.00th=[  938], 80.00th=[ 1020], 90.00th=[16450], 95.00th=[16909],
     | 99.00th=[18220], 99.50th=[19006], 99.90th=[21627], 99.95th=[21890],
     | 99.99th=[22152]
   bw (  KiB/s): min=260096, max=1901014, per=26.25%, avg=347786.14, stdev=222061.26, samples=59
   iops        : min=  254, max= 1856, avg=339.63, stdev=216.80, samples=59
  lat (usec)   : 500=12.40%, 750=6.49%, 1000=58.10%
  lat (msec)   : 2=10.15%, 20=12.57%, 50=0.30%
  cpu          : usr=0.20%, sys=29.40%, ctx=1441, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,10346,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
write-bandwidth: (groupid=0, jobs=1): err= 0: pid=1797: Sun Aug 13 17:37:22 2023
  write: IOPS=317, BW=317MiB/s (332MB/s)(9.82GiB/31715msec); 0 zone resets
    clat (usec): min=327, max=22557, avg=2978.67, stdev=5434.74
     lat (usec): min=327, max=22557, avg=2979.27, stdev=5434.73
    clat percentiles (usec):
     |  1.00th=[  347],  5.00th=[  392], 10.00th=[  848], 20.00th=[  865],
     | 30.00th=[  865], 40.00th=[  873], 50.00th=[  881], 60.00th=[  898],
     | 70.00th=[  914], 80.00th=[ 1401], 90.00th=[16450], 95.00th=[17695],
     | 99.00th=[17957], 99.50th=[18482], 99.90th=[21890], 99.95th=[22152],
     | 99.99th=[22414]
   bw (  KiB/s): min=258048, max=1419228, per=25.64%, avg=339759.12, stdev=166043.31, samples=59
   iops        : min=  252, max= 1385, avg=331.78, stdev=162.04, samples=59
  lat (usec)   : 500=7.23%, 750=0.51%, 1000=67.28%
  lat (msec)   : 2=12.24%, 4=0.03%, 20=12.41%, 50=0.30%
  cpu          : usr=0.10%, sys=30.82%, ctx=1334, majf=0, minf=285
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,10055,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
write-bandwidth: (groupid=0, jobs=1): err= 0: pid=1798: Sun Aug 13 17:37:22 2023
  write: IOPS=325, BW=326MiB/s (342MB/s)(10.1GiB/31751msec); 0 zone resets
    clat (usec): min=327, max=22259, avg=2895.95, stdev=5395.88
     lat (usec): min=327, max=22259, avg=2896.53, stdev=5395.87
    clat percentiles (usec):
     |  1.00th=[  355],  5.00th=[  388], 10.00th=[  506], 20.00th=[  857],
     | 30.00th=[  865], 40.00th=[  873], 50.00th=[  881], 60.00th=[  889],
     | 70.00th=[  906], 80.00th=[ 1254], 90.00th=[16450], 95.00th=[17695],
     | 99.00th=[17957], 99.50th=[18482], 99.90th=[21627], 99.95th=[21890],
     | 99.99th=[22152]
   bw (  KiB/s): min=260096, max=1825888, per=26.27%, avg=348074.85, stdev=215896.47, samples=59
   iops        : min=  254, max= 1783, avg=339.92, stdev=210.83, samples=59
  lat (usec)   : 500=9.79%, 750=1.76%, 1000=64.38%
  lat (msec)   : 2=11.60%, 4=0.07%, 20=12.09%, 50=0.32%
  cpu          : usr=0.15%, sys=31.31%, ctx=2367, majf=0, minf=284
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,10342,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1294MiB/s (1357MB/s), 317MiB/s-328MiB/s (332MB/s-344MB/s), io=40.1GiB (43.1GB), run=31576-31758msec

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-13  9:50                             ` Wang Yugui
@ 2023-08-29  9:45                               ` Linux regression tracking (Thorsten Leemhuis)
  2023-09-11  7:02                                 ` Thorsten Leemhuis
  2023-12-06 14:22                                 ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 2 replies; 29+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-08-29  9:45 UTC (permalink / raw)
  To: Wang Yugui, Chris Mason
  Cc: Christoph Hellwig, linux-btrfs, Chris Mason,
	Linux kernel regressions list

On 13.08.23 11:50, Wang Yugui wrote:
>> On 8/11/23 10:23 AM, Wang Yugui wrote:
>>>> On Wed, Aug 02, 2023 at 08:04:57AM +0800, Wang Yugui wrote:
>>>>>> And with only a revert of
>>>>>>
>>>>>> "btrfs: submit IO synchronously for fast checksum implementations"?
>>>>> GOOD performance when only (Revert "btrfs: submit IO synchronously for fast
>>>>> checksum implementations") 
>>>> Ok, so you have a case where the offload for the checksumming generation
>>>> actually helps (by a lot).  Adding Chris to the Cc list as he was
>>>> involved with this.
>>>>
>>>>>>> -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>>>>> +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>>>>>                 return false;
>>>>>> This disables synchronous checksum calculation entirely for data I/O.
>>>>> without this fix, data I/O checksum is always synchronous?
>>>>> this is a feature change of "btrfs: submit IO synchronously for fast checksum implementations"?
>>>> It is never with the above patch.
>>>>
>>>>>> Also I'm curious if you see any differents for a non-RAID0 (i.e.
>>>>>> single profile) workload.
>>>>> '-m single -d single' is about 10% slow that '-m raid1 -d raid0' in this test
>>>>> case.
>>>> How does it compare with and without the revert?  Can you add the numbers?
>>
>> Looking through the thread, you're comparing -m single -d single, but
>> btrfs is still doing the raid.
>>
>> Sorry to keep asking for more runs, but these numbers are a surprise,
>> and I probably won't have time today to reproduce before vacation next
>> week (sadly, Christoph and I aren't going together).

Sadly I also did not run into either you or Christoph during my own
vacation during the last two weeks. But I'm back from it now, which got
me wondering:

What happened to this regression? Was any progress made to resolve this
in one way or another?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

>> Can you please do a run where lvm or MD raid are providing the raid0?
> no LVM/MD used here.
> 
>> It doesn't look like you're using compression, but I wanted to double check.
> 
> Yes. '-m xx -d yy' with other default mkfs.btrfs option, so no compression.
> 
>> How much ram do you have?
> 
> 192G ECC memory.
> 
> two CPU numa nodes, but all PCIe3 NVMe SSD are connected to one NVMe HBA/
> one numa node.
> 
>> Your fio run has 4 jobs going, can I please see the full fio output for
>> a fast run and a slow run?
> 
> fio results are saved into attachment files (fast.text & slow.txt)
> 
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2023/08/13

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-29  9:45                               ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-09-11  7:02                                 ` Thorsten Leemhuis
  2023-09-11 23:20                                   ` Wang Yugui
  2023-12-06 14:22                                 ` Linux regression tracking (Thorsten Leemhuis)
  1 sibling, 1 reply; 29+ messages in thread
From: Thorsten Leemhuis @ 2023-09-11  7:02 UTC (permalink / raw)
  To: Wang Yugui, Chris Mason
  Cc: Christoph Hellwig, linux-btrfs, Chris Mason,
	Linux kernel regressions list

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Hmmm, nothing happened wrt this regression during the past two weeks
afaics. Wonder if it fell through the cracks due to the merge window or
if there is a good reason. Was there progress and I just missed it? To
rule out the latter:

Wang Yugui, does the problem still happen with 6.6-rc1?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 29.08.23 11:45, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 13.08.23 11:50, Wang Yugui wrote:
>>> On 8/11/23 10:23 AM, Wang Yugui wrote:
>>>>> On Wed, Aug 02, 2023 at 08:04:57AM +0800, Wang Yugui wrote:
>>>>>>> And with only a revert of
>>>>>>>
>>>>>>> "btrfs: submit IO synchronously for fast checksum implementations"?
>>>>>> GOOD performance when only (Revert "btrfs: submit IO synchronously for fast
>>>>>> checksum implementations") 
>>>>> Ok, so you have a case where the offload for the checksumming generation
>>>>> actually helps (by a lot).  Adding Chris to the Cc list as he was
>>>>> involved with this.
>>>>>
>>>>>>>> -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>>>>>> +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>>>>>>                 return false;
>>>>>>> This disables synchronous checksum calculation entirely for data I/O.
>>>>>> without this fix, data I/O checksum is always synchronous?
>>>>>> this is a feature change of "btrfs: submit IO synchronously for fast checksum implementations"?
>>>>> It is never with the above patch.
>>>>>
>>>>>>> Also I'm curious if you see any differents for a non-RAID0 (i.e.
>>>>>>> single profile) workload.
>>>>>> '-m single -d single' is about 10% slow that '-m raid1 -d raid0' in this test
>>>>>> case.
>>>>> How does it compare with and without the revert?  Can you add the numbers?
>>>
>>> Looking through the thread, you're comparing -m single -d single, but
>>> btrfs is still doing the raid.
>>>
>>> Sorry to keep asking for more runs, but these numbers are a surprise,
>>> and I probably won't have time today to reproduce before vacation next
>>> week (sadly, Christoph and I aren't going together).
> 
> Sadly I also did not run into either you or Christoph during my own
> vacation during the last two weeks. But I'm back from it now, which got
> me wondering:
> 
> What happened to this regression? Was any progress made to resolve this
> in one way or another?
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
> 
> #regzbot poke
> 
>>> Can you please do a run where lvm or MD raid are providing the raid0?
>> no LVM/MD used here.
>>
>>> It doesn't look like you're using compression, but I wanted to double check.
>>
>> Yes. '-m xx -d yy' with other default mkfs.btrfs option, so no compression.
>>
>>> How much ram do you have?
>>
>> 192G ECC memory.
>>
>> two CPU numa nodes, but all PCIe3 NVMe SSD are connected to one NVMe HBA/
>> one numa node.
>>
>>> Your fio run has 4 jobs going, can I please see the full fio output for
>>> a fast run and a slow run?
>>
>> fio results are saved into attachment files (fast.text & slow.txt)
>>
>> Best Regards
>> Wang Yugui (wangyugui@e16-tech.com)
>> 2023/08/13

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-09-11  7:02                                 ` Thorsten Leemhuis
@ 2023-09-11 23:20                                   ` Wang Yugui
  2023-09-12  7:58                                     ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 29+ messages in thread
From: Wang Yugui @ 2023-09-11 23:20 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Chris Mason, Christoph Hellwig, linux-btrfs, Chris Mason

Hi,

> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
> 
> Hmmm, nothing happened wrt this regression during the past two weeks
> afaics. Wonder if it fell through the cracks due to the merge window or
> if there is a good reason. Was there progress and I just missed it? To
> rule out the latter:
> 
> Wang Yugui, does the problem still happen with 6.6-rc1?

The problem still happens with 6.6-rc1.
No related patch has been released for this problem yet.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/09/12


> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
> 
> #regzbot poke
> 
> On 29.08.23 11:45, Linux regression tracking (Thorsten Leemhuis) wrote:
> > On 13.08.23 11:50, Wang Yugui wrote:
> >>> On 8/11/23 10:23 AM, Wang Yugui wrote:
> >>>>> On Wed, Aug 02, 2023 at 08:04:57AM +0800, Wang Yugui wrote:
> >>>>>>> And with only a revert of
> >>>>>>>
> >>>>>>> "btrfs: submit IO synchronously for fast checksum implementations"?
> >>>>>> GOOD performance when only (Revert "btrfs: submit IO synchronously for fast
> >>>>>> checksum implementations") 
> >>>>> Ok, so you have a case where the offload for the checksumming generation
> >>>>> actually helps (by a lot).  Adding Chris to the Cc list as he was
> >>>>> involved with this.
> >>>>>
> >>>>>>>> -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
> >>>>>>>> +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
> >>>>>>>>                 return false;
> >>>>>>> This disables synchronous checksum calculation entirely for data I/O.
> >>>>>> without this fix, data I/O checksum is always synchronous?
> >>>>>> this is a feature change of "btrfs: submit IO synchronously for fast checksum implementations"?
> >>>>> It is never with the above patch.
> >>>>>
> >>>>>>> Also I'm curious if you see any differents for a non-RAID0 (i.e.
> >>>>>>> single profile) workload.
> >>>>>> '-m single -d single' is about 10% slow that '-m raid1 -d raid0' in this test
> >>>>>> case.
> >>>>> How does it compare with and without the revert?  Can you add the numbers?
> >>>
> >>> Looking through the thread, you're comparing -m single -d single, but
> >>> btrfs is still doing the raid.
> >>>
> >>> Sorry to keep asking for more runs, but these numbers are a surprise,
> >>> and I probably won't have time today to reproduce before vacation next
> >>> week (sadly, Christoph and I aren't going together).
> > 
> > Sadly I also did not run into either you or Christoph during my own
> > vacation during the last two weeks. But I'm back from it now, which got
> > me wondering:
> > 
> > What happened to this regression? Was any progress made to resolve this
> > in one way or another?
> > 
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > --
> > Everything you wanna know about Linux kernel regression tracking:
> > https://linux-regtracking.leemhuis.info/about/#tldr
> > If I did something stupid, please tell me, as explained on that page.
> > 
> > #regzbot poke
> > 
> >>> Can you please do a run where lvm or MD raid are providing the raid0?
> >> no LVM/MD used here.
> >>
> >>> It doesn't look like you're using compression, but I wanted to double check.
> >>
> >> Yes. '-m xx -d yy' with other default mkfs.btrfs option, so no compression.
> >>
> >>> How much ram do you have?
> >>
> >> 192G ECC memory.
> >>
> >> two CPU numa nodes, but all PCIe3 NVMe SSD are connected to one NVMe HBA/
> >> one numa node.
> >>
> >>> Your fio run has 4 jobs going, can I please see the full fio output for
> >>> a fast run and a slow run?
> >>
> >> fio results are saved into attachment files (fast.text & slow.txt)
> >>
> >> Best Regards
> >> Wang Yugui (wangyugui@e16-tech.com)
> >> 2023/08/13



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-09-11 23:20                                   ` Wang Yugui
@ 2023-09-12  7:58                                     ` Linux regression tracking (Thorsten Leemhuis)
  2023-09-26 10:55                                       ` Thorsten Leemhuis
  0 siblings, 1 reply; 29+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-09-12  7:58 UTC (permalink / raw)
  To: Wang Yugui, Linux regressions mailing list
  Cc: Chris Mason, Christoph Hellwig, linux-btrfs, Chris Mason,
	Josef Bacik, David Sterba

On 12.09.23 01:20, Wang Yugui wrote:
> 
>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>> for once, to make this easily accessible to everyone.
>>
>> Hmmm, nothing happened wrt this regression during the past two weeks
>> afaics. Wonder if it fell through the cracks due to the merge window or
>> if there is a good reason. Was there progress and I just missed it? To
>> rule out the latter:
>>
>> Wang Yugui, does the problem still happen with 6.6-rc1?
> 
> The problem still happens with 6.6-rc1.
> No related patch has been released for this problem yet.

Thx. Adding Josef and David to the list of recipients, maybe they might
have an idea why that commit of Christoph (da023618076a13 ("btrfs:
submit IO synchronously for fast checksum implementations") [v6.5-rc1])
seems to slow down things for Wang Yugui's Btrfs setup (hope I
understood things correctly). FWIW, the start of the thread is here:
https://lore.kernel.org/linux-btrfs/20230731152223.4EFB.409509F4@e16-tech.com/

Christoph later wrote "so you have a case where the offload for the
checksumming generation actually helps (by a lot)" and asked for advice
from the Btrfs side:
https://lore.kernel.org/linux-btrfs/20230802092631.GA27963@lst.de/

/me hopes that will get things rolling again.

/me meanwhile wonders if there were other reports like this, or if Wang
Yugui's system is somehow special.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

>> On 29.08.23 11:45, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> On 13.08.23 11:50, Wang Yugui wrote:
>>>>> On 8/11/23 10:23 AM, Wang Yugui wrote:
>>>>>>> On Wed, Aug 02, 2023 at 08:04:57AM +0800, Wang Yugui wrote:
>>>>>>>>> And with only a revert of
>>>>>>>>>
>>>>>>>>> "btrfs: submit IO synchronously for fast checksum implementations"?
>>>>>>>> GOOD performance when only (Revert "btrfs: submit IO synchronously for fast
>>>>>>>> checksum implementations") 
>>>>>>> Ok, so you have a case where the offload for the checksumming generation
>>>>>>> actually helps (by a lot).  Adding Chris to the Cc list as he was
>>>>>>> involved with this.
>>>>>>>
>>>>>>>>>> -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>>>>>>>> +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>>>>>>>>                 return false;
>>>>>>>>> This disables synchronous checksum calculation entirely for data I/O.
>>>>>>>> without this fix, data I/O checksum is always synchronous?
>>>>>>>> this is a feature change of "btrfs: submit IO synchronously for fast checksum implementations"?
>>>>>>> It is never with the above patch.
>>>>>>>
>>>>>>>>> Also I'm curious if you see any differents for a non-RAID0 (i.e.
>>>>>>>>> single profile) workload.
>>>>>>>> '-m single -d single' is about 10% slow that '-m raid1 -d raid0' in this test
>>>>>>>> case.
>>>>>>> How does it compare with and without the revert?  Can you add the numbers?
>>>>>
>>>>> Looking through the thread, you're comparing -m single -d single, but
>>>>> btrfs is still doing the raid.
>>>>>
>>>>> Sorry to keep asking for more runs, but these numbers are a surprise,
>>>>> and I probably won't have time today to reproduce before vacation next
>>>>> week (sadly, Christoph and I aren't going together).
>>>
>>> Sadly I also did not run into either you or Christoph during my own
>>> vacation during the last two weeks. But I'm back from it now, which got
>>> me wondering:
>>>
>>> What happened to this regression? Was any progress made to resolve this
>>> in one way or another?
>>>
>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>> --
>>> Everything you wanna know about Linux kernel regression tracking:
>>> https://linux-regtracking.leemhuis.info/about/#tldr
>>> If I did something stupid, please tell me, as explained on that page.
>>>
>>> #regzbot poke
>>>
>>>>> Can you please do a run where lvm or MD raid are providing the raid0?
>>>> no LVM/MD used here.
>>>>
>>>>> It doesn't look like you're using compression, but I wanted to double check.
>>>>
>>>> Yes. '-m xx -d yy' with other default mkfs.btrfs option, so no compression.
>>>>
>>>>> How much ram do you have?
>>>>
>>>> 192G ECC memory.
>>>>
>>>> two CPU numa nodes, but all PCIe3 NVMe SSD are connected to one NVMe HBA/
>>>> one numa node.
>>>>
>>>>> Your fio run has 4 jobs going, can I please see the full fio output for
>>>>> a fast run and a slow run?
>>>>
>>>> fio results are saved into attachment files (fast.text & slow.txt)
>>>>
>>>> Best Regards
>>>> Wang Yugui (wangyugui@e16-tech.com)
>>>> 2023/08/13
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-09-12  7:58                                     ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-09-26 10:55                                       ` Thorsten Leemhuis
  2023-09-26 17:18                                         ` Chris Mason
  0 siblings, 1 reply; 29+ messages in thread
From: Thorsten Leemhuis @ 2023-09-26 10:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, linux-btrfs, Chris Mason, Josef Bacik, David Sterba,
	Wang Yugui, Linux regressions mailing list

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Christoph, I'm sorry to annoy you, but things look stalled here: it
seems the Btrfs developers don't care or simply have no idea what might
be causing this. So how do we get this running again? Or was some
progress made and I just missed it?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 12.09.23 09:58, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 12.09.23 01:20, Wang Yugui wrote:
>>
>>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>>> for once, to make this easily accessible to everyone.
>>>
>>> Hmmm, nothing happened wrt this regression during the past two weeks
>>> afaics. Wonder if it fell through the cracks due to the merge window or
>>> if there is a good reason. Was there progress and I just missed it? To
>>> rule out the latter:
>>>
>>> Wang Yugui, does the problem still happen with 6.6-rc1?
>>
>> The problem still happens with 6.6-rc1.
>> No related patch has been released for this problem yet.
> 
> Thx. Adding Josef and David to the list of recipients, maybe they might
> have an idea why that commit of Christoph (da023618076a13 ("btrfs:
> submit IO synchronously for fast checksum implementations") [v6.5-rc1])
> seems to slow down things for Wang Yugui's Btrfs setup (hope I
> understood things correctly). FWIW, the start of the thread is here:
> https://lore.kernel.org/linux-btrfs/20230731152223.4EFB.409509F4@e16-tech.com/
> 
> Christoph later wrote "so you have a case where the offload for the
> checksumming generation actually helps (by a lot)" and asked for advice
> from the Btrfs side:
> https://lore.kernel.org/linux-btrfs/20230802092631.GA27963@lst.de/
> 
> /me hopes that will get things rolling again.
> 
> /me meanwhile wonders if there were other reports like this, or if Wang
> Yugui's system is somehow special.
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
> 
>>> On 29.08.23 11:45, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>> On 13.08.23 11:50, Wang Yugui wrote:
>>>>>> On 8/11/23 10:23 AM, Wang Yugui wrote:
>>>>>>>> On Wed, Aug 02, 2023 at 08:04:57AM +0800, Wang Yugui wrote:
>>>>>>>>>> And with only a revert of
>>>>>>>>>>
>>>>>>>>>> "btrfs: submit IO synchronously for fast checksum implementations"?
>>>>>>>>> GOOD performance when only (Revert "btrfs: submit IO synchronously for fast
>>>>>>>>> checksum implementations") 
>>>>>>>> Ok, so you have a case where the offload for the checksumming generation
>>>>>>>> actually helps (by a lot).  Adding Chris to the Cc list as he was
>>>>>>>> involved with this.
>>>>>>>>
>>>>>>>>>>> -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>>>>>>>>> +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>>>>>>>>>                 return false;
>>>>>>>>>> This disables synchronous checksum calculation entirely for data I/O.
>>>>>>>>> without this fix, data I/O checksum is always synchronous?
>>>>>>>>> this is a feature change of "btrfs: submit IO synchronously for fast checksum implementations"?
>>>>>>>> It is never with the above patch.
>>>>>>>>
>>>>>>>>>> Also I'm curious if you see any differents for a non-RAID0 (i.e.
>>>>>>>>>> single profile) workload.
>>>>>>>>> '-m single -d single' is about 10% slow that '-m raid1 -d raid0' in this test
>>>>>>>>> case.
>>>>>>>> How does it compare with and without the revert?  Can you add the numbers?
>>>>>>
>>>>>> Looking through the thread, you're comparing -m single -d single, but
>>>>>> btrfs is still doing the raid.
>>>>>>
>>>>>> Sorry to keep asking for more runs, but these numbers are a surprise,
>>>>>> and I probably won't have time today to reproduce before vacation next
>>>>>> week (sadly, Christoph and I aren't going together).
>>>>
>>>> Sadly I also did not run into either you or Christoph during my own
>>>> vacation during the last two weeks. But I'm back from it now, which got
>>>> me wondering:
>>>>
>>>> What happened to this regression? Was any progress made to resolve this
>>>> in one way or another?
>>>>
>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>> --
>>>> Everything you wanna know about Linux kernel regression tracking:
>>>> https://linux-regtracking.leemhuis.info/about/#tldr
>>>> If I did something stupid, please tell me, as explained on that page.
>>>>
>>>> #regzbot poke
>>>>
>>>>>> Can you please do a run where lvm or MD raid are providing the raid0?
>>>>> no LVM/MD used here.
>>>>>
>>>>>> It doesn't look like you're using compression, but I wanted to double check.
>>>>>
>>>>> Yes. '-m xx -d yy' with other default mkfs.btrfs option, so no compression.
>>>>>
>>>>>> How much ram do you have?
>>>>>
>>>>> 192G ECC memory.
>>>>>
>>>>> two CPU numa nodes, but all PCIe3 NVMe SSD are connected to one NVMe HBA/
>>>>> one numa node.
>>>>>
>>>>>> Your fio run has 4 jobs going, can I please see the full fio output for
>>>>>> a fast run and a slow run?
>>>>>
>>>>> fio results are saved into attachment files (fast.text & slow.txt)
>>>>>
>>>>> Best Regards
>>>>> Wang Yugui (wangyugui@e16-tech.com)
>>>>> 2023/08/13
>>
>>
>>
>>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-09-26 10:55                                       ` Thorsten Leemhuis
@ 2023-09-26 17:18                                         ` Chris Mason
  2023-09-27 11:30                                           ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 29+ messages in thread
From: Chris Mason @ 2023-09-26 17:18 UTC (permalink / raw)
  To: Linux regressions mailing list, Christoph Hellwig
  Cc: linux-btrfs, Chris Mason, Josef Bacik, David Sterba, Wang Yugui

On 9/26/23 6:55 AM, Thorsten Leemhuis wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
> 
> Christoph, I'm sorry to annoy you, but things look stalled here: it
> seems the Btrfs developer don't care or simply have no idea what might
> be causing this. So how do we get this running again? Or was some
> progress made and I just missed it?

I've been trying to reproduce with the hardware I have on hand, but
unfortunately it all loves the change.  I'll take another pass.

-chris


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-09-26 17:18                                         ` Chris Mason
@ 2023-09-27 11:30                                           ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 0 replies; 29+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-09-27 11:30 UTC (permalink / raw)
  To: Chris Mason, Linux regressions mailing list, Christoph Hellwig
  Cc: linux-btrfs, Chris Mason, Josef Bacik, David Sterba, Wang Yugui

On 26.09.23 19:18, Chris Mason wrote:
> On 9/26/23 6:55 AM, Thorsten Leemhuis wrote:
>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>> for once, to make this easily accessible to everyone.
>>
>> Christoph, I'm sorry to annoy you, but things look stalled here: it
>> seems the Btrfs developers don't care or simply have no idea what might
>> be causing this. So how do we get this running again? Or was some
>> progress made and I just missed it?
> 
> I've been trying to reproduce with the hardware I have on hand, but
> unfortunately it all loves the change.  I'll take another pass.

Many thx! Ciao, Thorsten

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-08-29  9:45                               ` Linux regression tracking (Thorsten Leemhuis)
  2023-09-11  7:02                                 ` Thorsten Leemhuis
@ 2023-12-06 14:22                                 ` Linux regression tracking (Thorsten Leemhuis)
  2023-12-13 15:57                                   ` Naohiro Aota
  1 sibling, 1 reply; 29+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-12-06 14:22 UTC (permalink / raw)
  To: Wang Yugui, Chris Mason
  Cc: Christoph Hellwig, linux-btrfs, Chris Mason,
	Linux kernel regressions list

On 29.08.23 11:45, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 13.08.23 11:50, Wang Yugui wrote:
>>> On 8/11/23 10:23 AM, Wang Yugui wrote:
>>>>> On Wed, Aug 02, 2023 at 08:04:57AM +0800, Wang Yugui wrote:
>>>>>>> And with only a revert of
>>>>>>>
>>>>>>> "btrfs: submit IO synchronously for fast checksum implementations"?
>>>>>> GOOD performance when only (Revert "btrfs: submit IO synchronously for fast
>>>>>> checksum implementations") 
>>>>> Ok, so you have a case where the offload for the checksumming generation
>>>>> actually helps (by a lot).  Adding Chris to the Cc list as he was
>>>>> involved with this.
>>>>>
>>>>>>>> -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>>>>>> +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
>>>>>>>>                 return false;
>>>>>>> This disables synchronous checksum calculation entirely for data I/O.
>>>>>> without this fix, data I/O checksum is always synchronous?
>>>>>> this is a feature change of "btrfs: submit IO synchronously for fast checksum implementations"?
>>>>> It is never with the above patch.
>>>>>
>>>>>>> Also I'm curious if you see any differents for a non-RAID0 (i.e.
>>>>>>> single profile) workload.
>>>>>> '-m single -d single' is about 10% slow that '-m raid1 -d raid0' in this test
>>>>>> case.
>>>>> How does it compare with and without the revert?  Can you add the numbers?
>>>
>>> Looking through the thread, you're comparing -m single -d single, but
>>> btrfs is still doing the raid.
>>>
>>> Sorry to keep asking for more runs, but these numbers are a surprise,
>>> and I probably won't have time today to reproduce before vacation next
>>> week (sadly, Christoph and I aren't going together).

FWIW, it seems this thread died down and the underlying reason for the
regression, despite quite a bit of effort from the developers, wasn't
found. I haven't noticed any similar reports either. Reverting apparently
is not an option. So in the end this afaics remains unfixed. In an ideal
"no regressions" world this shouldn't happen, but well, we sadly don't
live in one. So I'll stop tracking this issue, it's not worth the effort:

#regzbot inconclusive: despite some efforts from the developers could
not be fixed
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
  2023-12-06 14:22                                 ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-12-13 15:57                                   ` Naohiro Aota
  0 siblings, 0 replies; 29+ messages in thread
From: Naohiro Aota @ 2023-12-13 15:57 UTC (permalink / raw)
  To: linux-btrfs, Linux regressions mailing list
  Cc: Wang Yugui, Chris Mason, Christoph Hellwig, Chris Mason

[-- Attachment #1: Type: text/plain, Size: 8011 bytes --]

Recently, I came across a similar issue to this one. I was running a fio
command doing buffered IO on RAID0 btrfs on SMR (zoned) devices and found
that ignoring BTRFS_FS_CSUM_IMPL_FAST in should_async_write() improves
the performance.

I ran the same experiment on regular SSDs and on SMR HDDs, varying the
number of devices. In both cases, increasing the number of devices results
in better performance when ignoring BTRFS_FS_CSUM_IMPL_FAST (disabling
inline csum).
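
For reference, "disabling inline csum" here is essentially the following
change to should_async_write() in fs/btrfs/bio.c. This is a minimal
sketch reconstructed from the 6.5-era code and the one-line hunk quoted
earlier in this thread, not the exact patch I ran, so treat the details
as illustrative:

/* fs/btrfs/bio.c (sketch, not the exact patch tested) */
static bool should_async_write(struct btrfs_bio *bbio)
{
	/*
	 * Baseline (misc-next): submit synchronously when the checksum
	 * implementation is fast.  "Without inline checksum" in the
	 * numbers below means skipping this shortcut, so checksumming
	 * is always offloaded to the workqueue:
	 *
	 *	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
	 *		return false;
	 */

	/* Honor submitters that asked for synchronous I/O. */
	if (op_is_sync(bbio->bio.bi_opf))
		return false;

	/* Zoned devices need metadata bios submitted in order. */
	if ((bbio->bio.bi_opf & REQ_META) && btrfs_is_zoned(bbio->fs_info))
		return false;

	/* Otherwise defer submission to parallelize checksum calculation. */
	return true;
}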

Here are the details of what I did.

Environment:
  CPU: 96 CPUs, 2 NUMA nodes
  mem: 1024GB

mkfs: mkfs.btrfs -d raid0 -m raid0 ${devs}

fio command:
fio --group_reporting --eta=always --eta-interval=10s --eta-newline=10s \
     --rw=write --fallocate=none \
     --direct=0 --ioengine=libaio --iodepth=32 --end_fsync=1 \
     --filesize=200G --bs=$((64 * ${njobs}))k \
     --time_based --runtime=300s \
     --directory=/mnt --name=writer --numjobs=${njobs}

* Results on SSDs

njobs=6 for this setup.

- baseline: with inline checksum = misc-next
  - numdevs 1: WRITE: bw=442MiB/s (464MB/s), 442MiB/s-442MiB/s (464MB/s-464MB/s), io=302GiB (324GB), run=698553-698553msec
  - numdevs 2: WRITE: bw=873MiB/s (915MB/s), 873MiB/s-873MiB/s (915MB/s-915MB/s), io=425GiB (457GB), run=499052-499052msec
  - numdevs 3: WRITE: bw=1162MiB/s (1218MB/s), 1162MiB/s-1162MiB/s (1218MB/s-1218MB/s), io=491GiB (527GB), run=432872-432872msec
  - numdevs 4: WRITE: bw=1261MiB/s (1322MB/s), 1261MiB/s-1261MiB/s (1322MB/s-1322MB/s), io=493GiB (530GB), run=400496-400496msec
  - numdevs 5: WRITE: bw=1331MiB/s (1395MB/s), 1331MiB/s-1331MiB/s (1395MB/s-1395MB/s), io=494GiB (531GB), run=380279-380279msec
  - numdevs 6: WRITE: bw=1391MiB/s (1458MB/s), 1391MiB/s-1391MiB/s (1458MB/s-1458MB/s), io=499GiB (536GB), run=367312-367312msec

- without inline checksum
  - numdevs 1: WRITE: bw=437MiB/s (459MB/s), 437MiB/s-437MiB/s (459MB/s-459MB/s), io=299GiB (321GB), run=699787-699787msec
  - numdevs 2: WRITE: bw=850MiB/s (892MB/s), 850MiB/s-850MiB/s (892MB/s-892MB/s), io=419GiB (450GB), run=504463-504463msec
  - numdevs 3: WRITE: bw=1259MiB/s (1320MB/s), 1259MiB/s-1259MiB/s (1320MB/s-1320MB/s), io=539GiB (579GB), run=438666-438666msec
  - numdevs 4: WRITE: bw=1597MiB/s (1675MB/s), 1597MiB/s-1597MiB/s (1675MB/s-1675MB/s), io=636GiB (683GB), run=408050-408050msec
  - numdevs 5: WRITE: bw=2021MiB/s (2119MB/s), 2021MiB/s-2021MiB/s (2119MB/s-2119MB/s), io=759GiB (815GB), run=384534-384534msec
  - numdevs 6: WRITE: bw=2106MiB/s (2208MB/s), 2106MiB/s-2106MiB/s (2208MB/s-2208MB/s), io=760GiB (816GB), run=369705-369705msec

* Results on SMR HDDs (zoned mode)

njobs=30 for this setup.

- baseline: with inline checksum = misc-next
  - numdevs  1: WRITE: bw=194MiB/s (204MB/s), 194MiB/s-194MiB/s (204MB/s-204MB/s), io=232GiB (249GB), run=1219953-1219953msec
  - numdevs  2: WRITE: bw=393MiB/s (412MB/s), 393MiB/s-393MiB/s (412MB/s-412MB/s), io=279GiB (299GB), run=725862-725862msec
  - numdevs  4: WRITE: bw=763MiB/s (800MB/s), 763MiB/s-763MiB/s (800MB/s-800MB/s), io=407GiB (437GB), run=545920-545920msec
  - numdevs  8: WRITE: bw=995MiB/s (1044MB/s), 995MiB/s-995MiB/s (1044MB/s-1044MB/s), io=448GiB (481GB), run=460426-460426msec
  - numdevs 16: WRITE: bw=1196MiB/s (1254MB/s), 1196MiB/s-1196MiB/s (1254MB/s-1254MB/s), io=447GiB (480GB), run=382588-382588msec
  - numdevs 20: WRITE: bw=1247MiB/s (1307MB/s), 1247MiB/s-1247MiB/s (1307MB/s-1307MB/s), io=445GiB (478GB), run=365526-365526msec
  - numdevs 25: WRITE: bw=1286MiB/s (1349MB/s), 1286MiB/s-1286MiB/s (1349MB/s-1349MB/s), io=443GiB (475GB), run=352474-352474msec
  - numdevs 30: WRITE: bw=1299MiB/s (1362MB/s), 1299MiB/s-1299MiB/s (1362MB/s-1362MB/s), io=437GiB (470GB), run=344890-344890msec

- without inline checksum
  - numdevs  1: WRITE: bw=219MiB/s (229MB/s), 219MiB/s-219MiB/s (229MB/s-229MB/s), io=236GiB (253GB), run=1103041-1103041msec
  - numdevs  2: WRITE: bw=426MiB/s (447MB/s), 426MiB/s-426MiB/s (447MB/s-447MB/s), io=297GiB (319GB), run=712921-712921msec
  - numdevs  4: WRITE: bw=734MiB/s (770MB/s), 734MiB/s-734MiB/s (770MB/s-770MB/s), io=383GiB (411GB), run=534348-534348msec
  - numdevs  8: WRITE: bw=1124MiB/s (1178MB/s), 1124MiB/s-1124MiB/s (1178MB/s-1178MB/s), io=497GiB (534GB), run=452973-452973msec
  - numdevs 16: WRITE: bw=1603MiB/s (1681MB/s), 1603MiB/s-1603MiB/s (1681MB/s-1681MB/s), io=652GiB (700GB), run=416390-416390msec
  - numdevs 20: WRITE: bw=1804MiB/s (1892MB/s), 1804MiB/s-1804MiB/s (1892MB/s-1892MB/s), io=705GiB (757GB), run=400029-400029msec
  - numdevs 25: WRITE: bw=1913MiB/s (2006MB/s), 1913MiB/s-1913MiB/s (2006MB/s-2006MB/s), io=660GiB (709GB), run=353584-353584msec
  - numdevs 30: WRITE: bw=1953MiB/s (2048MB/s), 1953MiB/s-1953MiB/s (2048MB/s-2048MB/s), io=658GiB (707GB), run=345082-345082msec

Also, I attached plots of the ratio ("MiB/s without inline checksum" /
"MiB/s with inline checksum = misc-next") for these cases.

The plots show that disabling inline checksum can beat inline checksum
by up to roughly 50% when there are many devices (e.g., 1953 MiB/s vs.
1299 MiB/s with 30 SMR HDDs).

The results will vary depending on the environment, but the fast checksum
might not be fast enough if btrfs is served by RAID0 on multiple devices.

On Wed, Dec 06, 2023 at 03:22:19PM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 29.08.23 11:45, Linux regression tracking (Thorsten Leemhuis) wrote:
> > On 13.08.23 11:50, Wang Yugui wrote:
> >>> On 8/11/23 10:23 AM, Wang Yugui wrote:
> >>>>> On Wed, Aug 02, 2023 at 08:04:57AM +0800, Wang Yugui wrote:
> >>>>>>> And with only a revert of
> >>>>>>>
> >>>>>>> "btrfs: submit IO synchronously for fast checksum implementations"?
> >>>>>> GOOD performance when only (Revert "btrfs: submit IO synchronously for fast
> >>>>>> checksum implementations") 
> >>>>> Ok, so you have a case where the offload for the checksumming generation
> >>>>> actually helps (by a lot).  Adding Chris to the Cc list as he was
> >>>>> involved with this.
> >>>>>
> >>>>>>>> -       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
> >>>>>>>> +       if ((bbio->bio.bi_opf & REQ_META) && test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
> >>>>>>>>                 return false;
> >>>>>>> This disables synchronous checksum calculation entirely for data I/O.
> >>>>>> without this fix, data I/O checksum is always synchronous?
> >>>>>> this is a feature change of "btrfs: submit IO synchronously for fast checksum implementations"?
> >>>>> It is never with the above patch.
> >>>>>
> >>>>>>> Also I'm curious if you see any differents for a non-RAID0 (i.e.
> >>>>>>> single profile) workload.
> >>>>>> '-m single -d single' is about 10% slow that '-m raid1 -d raid0' in this test
> >>>>>> case.
> >>>>> How does it compare with and without the revert?  Can you add the numbers?
> >>>
> >>> Looking through the thread, you're comparing -m single -d single, but
> >>> btrfs is still doing the raid.
> >>>
> >>> Sorry to keep asking for more runs, but these numbers are a surprise,
> >>> and I probably won't have time today to reproduce before vacation next
> >>> week (sadly, Christoph and I aren't going together).
> 
> FWIW, it seems this thread died down and the underlying reason for the
> regression, despite quite a bit of effort from the developers, wasn't
> found. I haven't noticed any similar reports either. Reverting apparently
> is not an option. So in the end this afaics remains unfixed. In an ideal
> "no regressions" world this shouldn't happen, but well, we sadly don't
> live in one. So I'll stop tracking this issue, it's not worth the effort:
> 
> #regzbot inconclusive: despite some efforts from the developers could
> not be fixed
> #regzbot ignore-activity
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> That page also explains what to do if mails like this annoy you.

[-- Attachment #2: regular-raid0.png --]
[-- Type: image/png, Size: 13934 bytes --]

[-- Attachment #3: zoned-raid0.png --]
[-- Type: image/png, Size: 14942 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-12-13 15:57 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-31  7:22 btrfs write-bandwidth performance regression of 6.5-rc4/rc3 Wang Yugui
2023-08-01  2:22 ` Wang Yugui
2023-08-01  8:35   ` Christoph Hellwig
2023-08-01  8:56     ` Wang Yugui
2023-08-01  9:03       ` Christoph Hellwig
2023-08-01  9:32         ` Wang Yugui
2023-08-01 10:00           ` Christoph Hellwig
2023-08-01 13:04             ` Wang Yugui
2023-08-01 14:59               ` Christoph Hellwig
2023-08-01 15:51                 ` Wang Yugui
2023-08-01 15:56                   ` Christoph Hellwig
2023-08-01 15:57                     ` Christoph Hellwig
2023-08-02  0:04                     ` Wang Yugui
2023-08-02  9:26                       ` Christoph Hellwig
2023-08-11  8:58                         ` Linux regression tracking (Thorsten Leemhuis)
2023-08-11 10:31                           ` Christoph Hellwig
2023-08-11 14:23                         ` Wang Yugui
2023-08-11 14:52                           ` Chris Mason
2023-08-13  9:50                             ` Wang Yugui
2023-08-29  9:45                               ` Linux regression tracking (Thorsten Leemhuis)
2023-09-11  7:02                                 ` Thorsten Leemhuis
2023-09-11 23:20                                   ` Wang Yugui
2023-09-12  7:58                                     ` Linux regression tracking (Thorsten Leemhuis)
2023-09-26 10:55                                       ` Thorsten Leemhuis
2023-09-26 17:18                                         ` Chris Mason
2023-09-27 11:30                                           ` Linux regression tracking (Thorsten Leemhuis)
2023-12-06 14:22                                 ` Linux regression tracking (Thorsten Leemhuis)
2023-12-13 15:57                                   ` Naohiro Aota
2023-08-02  8:45 ` Linux regression tracking #adding (Thorsten Leemhuis)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).