All of lore.kernel.org
 help / color / mirror / Atom feed
* don't offload CRCs generation to helpers if it is fast
@ 2023-03-29  0:13 Christoph Hellwig
  2023-03-29  0:13 ` [PATCH 1/4] btrfs: fix fast csum detection Christoph Hellwig
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Christoph Hellwig @ 2023-03-29  0:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Hi all,

based on various observations from me and Chris, we really should never
offload CRCs generation to helpers if it is hardware accelerated.

This series implements that and also tidies up various lose ends around
that.


Diffstat:
 bio.c         |   32 ++++++++++++++------------------
 btrfs_inode.h |    3 ---
 disk-io.c     |   24 ++++++++++++++++++------
 file.c        |    9 ---------
 fs.h          |    1 -
 inode.c       |    1 -
 super.c       |    3 ---
 transaction.c |    2 --
 8 files changed, 32 insertions(+), 43 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] btrfs: fix fast csum detection
  2023-03-29  0:13 don't offload CRCs generation to helpers if it is fast Christoph Hellwig
@ 2023-03-29  0:13 ` Christoph Hellwig
  2023-04-03 18:35   ` David Sterba
  2023-03-29  0:13 ` [PATCH 2/4] btrfs: remove the sync_writers field in struct btrfs_inode Christoph Hellwig
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2023-03-29  0:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

The BTRFS_FS_CSUM_IMPL_FAST flag is current set whenever a non-generic
crc32c is detected, which is the incorrect check if the file system uses
a different checksumming algorithm.  Refactor the code to only checks
this if crc32 is actually used.  Note that in an ideal world the
information if an algorithm is hardware accelerated or not should be
provided by the crypto API instead, but that's left for another day.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/disk-io.c | 18 +++++++++++++++++-
 fs/btrfs/super.c   |  2 --
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3f57c41f41bf5f..ec765d6bc53673 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -154,6 +154,21 @@ static bool btrfs_supported_super_csum(u16 csum_type)
 	}
 }
 
+/*
+ * Check if the CSUM implementation is a fast accelerated one.
+ * As-is this is a bit of a hack and should be replaced once the
+ * csum implementations provide that information themselves.
+ */
+static bool btrfs_csum_is_fast(u16 csum_type)
+{
+	switch (csum_type) {
+	case BTRFS_CSUM_TYPE_CRC32:
+		return !strstr(crc32c_impl(), "generic");
+	default:
+		return false;
+	}
+}
+
 /*
  * Return 0 if the superblock checksum type matches the checksum value of that
  * algorithm. Pass the raw disk superblock data.
@@ -3373,7 +3388,8 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		btrfs_release_disk_super(disk_super);
 		goto fail_alloc;
 	}
-
+	if (btrfs_csum_is_fast(csum_type))
+		set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags);
 	fs_info->csum_size = btrfs_super_csum_size(disk_super);
 
 	ret = btrfs_init_csum_hash(fs_info, csum_type);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d8885966e801cd..e94a4cd06607e1 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1517,8 +1517,6 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
 		shrinker_debugfs_rename(&s->s_shrink, "sb-%s:%s", fs_type->name,
 					s->s_id);
 		btrfs_sb(s)->bdev_holder = fs_type;
-		if (!strstr(crc32c_impl(), "generic"))
-			set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags);
 		error = btrfs_fill_super(s, fs_devices, data);
 	}
 	if (!error)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] btrfs: remove the sync_writers field in struct btrfs_inode
  2023-03-29  0:13 don't offload CRCs generation to helpers if it is fast Christoph Hellwig
  2023-03-29  0:13 ` [PATCH 1/4] btrfs: fix fast csum detection Christoph Hellwig
@ 2023-03-29  0:13 ` Christoph Hellwig
  2023-04-05 13:50   ` David Sterba
  2023-03-29  0:13 ` [PATCH 3/4] btrfs: never defer I/O submission for fast CRC implementations Christoph Hellwig
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2023-03-29  0:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

The sync_writers field communicates if an inode has outstanding
synchronous writeback.  The better way is to look at the REQ_SYNC
flag in the bio, which is set by the writeback code for WB_SYNC_ALL
writeback and can communicate the need for sync writeback without
an inode field.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.c         | 7 +++----
 fs/btrfs/btrfs_inode.h | 3 ---
 fs/btrfs/file.c        | 9 ---------
 fs/btrfs/inode.c       | 1 -
 fs/btrfs/transaction.c | 2 --
 5 files changed, 3 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index cf09c6271edbee..c851a3526911f6 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -552,11 +552,10 @@ static void run_one_async_free(struct btrfs_work *work)
 static bool should_async_write(struct btrfs_bio *bbio)
 {
 	/*
-	 * If the I/O is not issued by fsync and friends, (->sync_writers != 0),
-	 * then try to defer the submission to a workqueue to parallelize the
-	 * checksum calculation.
+	 * Try to defer the submission to a workqueue to parallelize the
+	 * checksum calculation unless the I/O is issued synchronously.
 	 */
-	if (atomic_read(&bbio->inode->sync_writers))
+	if (op_is_sync(bbio->bio.bi_opf))
 		return false;
 
 	/*
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 9dc21622806ef4..8666ace8879302 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -116,9 +116,6 @@ struct btrfs_inode {
 
 	unsigned long runtime_flags;
 
-	/* Keep track of who's O_SYNC/fsyncing currently */
-	atomic_t sync_writers;
-
 	/* full 64 bit generation number, struct vfs_inode doesn't have a big
 	 * enough field for this.
 	 */
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5cc5a1faaef5b5..4b9433b96f4b8e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1651,7 +1651,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 	struct file *file = iocb->ki_filp;
 	struct btrfs_inode *inode = BTRFS_I(file_inode(file));
 	ssize_t num_written, num_sync;
-	const bool sync = iocb_is_dsync(iocb);
 
 	/*
 	 * If the fs flips readonly due to some impossible error, although we
@@ -1664,9 +1663,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 	if (encoded && (iocb->ki_flags & IOCB_NOWAIT))
 		return -EOPNOTSUPP;
 
-	if (sync)
-		atomic_inc(&inode->sync_writers);
-
 	if (encoded) {
 		num_written = btrfs_encoded_write(iocb, from, encoded);
 		num_sync = encoded->len;
@@ -1686,9 +1682,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 			num_written = num_sync;
 	}
 
-	if (sync)
-		atomic_dec(&inode->sync_writers);
-
 	current->backing_dev_info = NULL;
 	return num_written;
 }
@@ -1733,9 +1726,7 @@ static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
 	 * several segments of stripe length (currently 64K).
 	 */
 	blk_start_plug(&plug);
-	atomic_inc(&BTRFS_I(inode)->sync_writers);
 	ret = btrfs_fdatawrite_range(inode, start, end);
-	atomic_dec(&BTRFS_I(inode)->sync_writers);
 	blk_finish_plug(&plug);
 
 	return ret;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 865d56ff2ce150..b505b205c1b142 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8479,7 +8479,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->io_tree.inode = ei;
 	extent_io_tree_init(fs_info, &ei->file_extent_tree,
 			    IO_TREE_INODE_FILE_EXTENT);
-	atomic_set(&ei->sync_writers, 0);
 	mutex_init(&ei->log_mutex);
 	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
 	INIT_LIST_HEAD(&ei->delalloc_inodes);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c8e503e5db4cdb..6587a13de0aedb 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1055,7 +1055,6 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
 	u64 start = 0;
 	u64 end;
 
-	atomic_inc(&BTRFS_I(fs_info->btree_inode)->sync_writers);
 	while (!find_first_extent_bit(dirty_pages, start, &start, &end,
 				      mark, &cached_state)) {
 		bool wait_writeback = false;
@@ -1091,7 +1090,6 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
 		cond_resched();
 		start = end + 1;
 	}
-	atomic_dec(&BTRFS_I(fs_info->btree_inode)->sync_writers);
 	return werr;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] btrfs: never defer I/O submission for fast CRC implementations
  2023-03-29  0:13 don't offload CRCs generation to helpers if it is fast Christoph Hellwig
  2023-03-29  0:13 ` [PATCH 1/4] btrfs: fix fast csum detection Christoph Hellwig
  2023-03-29  0:13 ` [PATCH 2/4] btrfs: remove the sync_writers field in struct btrfs_inode Christoph Hellwig
@ 2023-03-29  0:13 ` Christoph Hellwig
  2023-03-29  0:13 ` [PATCH 4/4] btrfs: remove hipri_workers workqueue Christoph Hellwig
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2023-03-29  0:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Most modern hardware supports very fast accelerated crc32c calculation.
If that is supported the CPU overhead of the checksum calculation is
very limited, and offloading the calculation to special worker threads
has a lot of overhead for no gain.

E.g. on an Intel Optane device is actually very much slows down even
1M buffered writes with fio:

Unpatched:

write: IOPS=3316, BW=3316MiB/s (3477MB/s)(200GiB/61757msec); 0 zone resets

With synchronous CRCs:

write: IOPS=4882, BW=4882MiB/s (5119MB/s)(200GiB/41948msec); 0 zone resets

With a lot of variation during the unpatch run going down as low as
1100MB/s, while the synchronous CRC version has about the same peak write
speed but much lower dips, and fewer kworkers churning around.
Both tests had fio saturated at 100% CPU.

(thanks to Jens Axboe via Chris Mason for the benchmarking)

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index c851a3526911f6..57c35e920269f4 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -551,26 +551,26 @@ static void run_one_async_free(struct btrfs_work *work)
 
 static bool should_async_write(struct btrfs_bio *bbio)
 {
+	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
+
+	/*
+	 * Submit synchronously if the checksum implementation is fast.
+	 */
+	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
+		return false;
+
 	/*
 	 * Try to defer the submission to a workqueue to parallelize the
-	 * checksum calculation unless the I/O is issued synchronously.
+	 * checksum calculation only if the I/O is issued asynchronously.
 	 */
 	if (op_is_sync(bbio->bio.bi_opf))
 		return false;
 
 	/*
-	 * Submit metadata writes synchronously if the checksum implementation
-	 * is fast, or we are on a zoned device that wants I/O to be submitted
-	 * in order.
+	 * Zoned devices require I/O to be submitted in order.
 	 */
-	if (bbio->bio.bi_opf & REQ_META) {
-		struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
-
-		if (btrfs_is_zoned(fs_info))
-			return false;
-		if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
-			return false;
-	}
+	if ((bbio->bio.bi_opf & REQ_META) && btrfs_is_zoned(fs_info))
+		return false;
 
 	return true;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] btrfs: remove hipri_workers workqueue
  2023-03-29  0:13 don't offload CRCs generation to helpers if it is fast Christoph Hellwig
                   ` (2 preceding siblings ...)
  2023-03-29  0:13 ` [PATCH 3/4] btrfs: never defer I/O submission for fast CRC implementations Christoph Hellwig
@ 2023-03-29  0:13 ` Christoph Hellwig
  2023-03-30 15:42 ` don't offload CRCs generation to helpers if it is fast Chris Mason
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2023-03-29  0:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Now that btrfs_wq_submit_bio is never called for synchronous I/O,
the hipri_workers workqueue is dead code and can be removeḋ.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.c     | 5 +----
 fs/btrfs/disk-io.c | 6 +-----
 fs/btrfs/fs.h      | 1 -
 fs/btrfs/super.c   | 1 -
 4 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 57c35e920269f4..f1ff33d6726404 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -598,10 +598,7 @@ static bool btrfs_wq_submit_bio(struct btrfs_bio *bbio,
 
 	btrfs_init_work(&async->work, run_one_async_start, run_one_async_done,
 			run_one_async_free);
-	if (op_is_sync(bbio->bio.bi_opf))
-		btrfs_queue_work(fs_info->hipri_workers, &async->work);
-	else
-		btrfs_queue_work(fs_info->workers, &async->work);
+	btrfs_queue_work(fs_info->workers, &async->work);
 	return true;
 }
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ec765d6bc53673..7740bb1b152445 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2000,7 +2000,6 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 {
 	btrfs_destroy_workqueue(fs_info->fixup_workers);
 	btrfs_destroy_workqueue(fs_info->delalloc_workers);
-	btrfs_destroy_workqueue(fs_info->hipri_workers);
 	btrfs_destroy_workqueue(fs_info->workers);
 	if (fs_info->endio_workers)
 		destroy_workqueue(fs_info->endio_workers);
@@ -2195,9 +2194,6 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
 
 	fs_info->workers =
 		btrfs_alloc_workqueue(fs_info, "worker", flags, max_active, 16);
-	fs_info->hipri_workers =
-		btrfs_alloc_workqueue(fs_info, "worker-high",
-				      flags | WQ_HIGHPRI, max_active, 16);
 
 	fs_info->delalloc_workers =
 		btrfs_alloc_workqueue(fs_info, "delalloc",
@@ -2234,7 +2230,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
 	fs_info->discard_ctl.discard_workers =
 		alloc_workqueue("btrfs_discard", WQ_UNBOUND | WQ_FREEZABLE, 1);
 
-	if (!(fs_info->workers && fs_info->hipri_workers &&
+	if (!(fs_info->workers &&
 	      fs_info->delalloc_workers && fs_info->flush_workers &&
 	      fs_info->endio_workers && fs_info->endio_meta_workers &&
 	      fs_info->compressed_write_workers &&
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index ca17a7fc3ac35c..25faa9de4b5385 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -543,7 +543,6 @@ struct btrfs_fs_info {
 	 * A third pool does submit_bio to avoid deadlocking with the other two.
 	 */
 	struct btrfs_workqueue *workers;
-	struct btrfs_workqueue *hipri_workers;
 	struct btrfs_workqueue *delalloc_workers;
 	struct btrfs_workqueue *flush_workers;
 	struct workqueue_struct *endio_workers;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index e94a4cd06607e1..f2d59be4477cf1 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1627,7 +1627,6 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
 	       old_pool_size, new_pool_size);
 
 	btrfs_workqueue_set_max(fs_info->workers, new_pool_size);
-	btrfs_workqueue_set_max(fs_info->hipri_workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->caching_workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->endio_write_workers, new_pool_size);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: don't offload CRCs generation to helpers if it is fast
  2023-03-29  0:13 don't offload CRCs generation to helpers if it is fast Christoph Hellwig
                   ` (3 preceding siblings ...)
  2023-03-29  0:13 ` [PATCH 4/4] btrfs: remove hipri_workers workqueue Christoph Hellwig
@ 2023-03-30 15:42 ` Chris Mason
  2023-03-30 21:06   ` Christoph Hellwig
  2023-03-31 17:31 ` David Sterba
  2023-04-05 23:14 ` David Sterba
  6 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2023-03-30 15:42 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

On 3/28/23 8:13 PM, Christoph Hellwig wrote:
> Hi all,
> 
> based on various observations from me and Chris, we really should never
> offload CRCs generation to helpers if it is hardware accelerated.
> 
> This series implements that and also tidies up various lose ends around
> that.
> 

Thanks for doing this, and for fixing up the fast crc detection.  For
the whole series:

Reviewed-by: Chris Mason <clm@fb.com>

I spent a while convincing myself we really don't need sync_writers, and
it's great to get rid of that.  Tangent-man noticed that we're really
close to being able to drop struct btrfs_bio_ctrl->sync_io.

As far as I can tell, it's only mirroring wbc->sync_mode, and the only
place that still cares is lock_extent_buffer_for_io(), and we could just
pass down the wbc.

-chris




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: don't offload CRCs generation to helpers if it is fast
  2023-03-30 15:42 ` don't offload CRCs generation to helpers if it is fast Chris Mason
@ 2023-03-30 21:06   ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2023-03-30 21:06 UTC (permalink / raw)
  To: Chris Mason
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Thu, Mar 30, 2023 at 11:42:11AM -0400, Chris Mason wrote:
> I spent a while convincing myself we really don't need sync_writers, and
> it's great to get rid of that.  Tangent-man noticed that we're really
> close to being able to drop struct btrfs_bio_ctrl->sync_io.
> 
> As far as I can tell, it's only mirroring wbc->sync_mode, and the only
> place that still cares is lock_extent_buffer_for_io(), and we could just
> pass down the wbc.

btrfs_bio_ctrl->sync_io is already gone in misc-next and replaced with
a check for wbc->sync_mode.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: don't offload CRCs generation to helpers if it is fast
  2023-03-29  0:13 don't offload CRCs generation to helpers if it is fast Christoph Hellwig
                   ` (4 preceding siblings ...)
  2023-03-30 15:42 ` don't offload CRCs generation to helpers if it is fast Chris Mason
@ 2023-03-31 17:31 ` David Sterba
  2023-04-05 23:14 ` David Sterba
  6 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2023-03-31 17:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Wed, Mar 29, 2023 at 09:13:04AM +0900, Christoph Hellwig wrote:
> Hi all,
> 
> based on various observations from me and Chris, we really should never
> offload CRCs generation to helpers if it is hardware accelerated.
> 
> This series implements that and also tidies up various lose ends around
> that.

Added to for-next, thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] btrfs: fix fast csum detection
  2023-03-29  0:13 ` [PATCH 1/4] btrfs: fix fast csum detection Christoph Hellwig
@ 2023-04-03 18:35   ` David Sterba
  2023-04-04  5:06     ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2023-04-03 18:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Wed, Mar 29, 2023 at 09:13:05AM +0900, Christoph Hellwig wrote:
> The BTRFS_FS_CSUM_IMPL_FAST flag is current set whenever a non-generic
> crc32c is detected, which is the incorrect check if the file system uses
> a different checksumming algorithm.  Refactor the code to only checks
> this if crc32 is actually used.  Note that in an ideal world the
> information if an algorithm is hardware accelerated or not should be
> provided by the crypto API instead, but that's left for another day.

https://lore.kernel.org/linux-crypto/20190514213409.GA115510@gmail.com/
I got pointed to the driver name, the priority that would say if the
implementation is accelerated is not exported to the API. This would be
cleaner but for a simple 'is/is-not' I think it's sufficient.

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/btrfs/disk-io.c | 18 +++++++++++++++++-
>  fs/btrfs/super.c   |  2 --
>  2 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 3f57c41f41bf5f..ec765d6bc53673 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -154,6 +154,21 @@ static bool btrfs_supported_super_csum(u16 csum_type)
>  	}
>  }
>  
> +/*
> + * Check if the CSUM implementation is a fast accelerated one.
> + * As-is this is a bit of a hack and should be replaced once the
> + * csum implementations provide that information themselves.
> + */
> +static bool btrfs_csum_is_fast(u16 csum_type)
> +{
> +	switch (csum_type) {
> +	case BTRFS_CSUM_TYPE_CRC32:
> +		return !strstr(crc32c_impl(), "generic");

This would check the internal shash (lib/libcrc32c.c) not the one we
allocate for btrfs in btrfs_init_csum_hash. Though they both should be
equivalent as libcrc32c does some tricks to lookup the fastest
implementation but theoretically may not find the fast one, while mount
could.

> +	default:
> +		return false;
> +	}
> +}
> +
>  /*
>   * Return 0 if the superblock checksum type matches the checksum value of that
>   * algorithm. Pass the raw disk superblock data.
> @@ -3373,7 +3388,8 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>  		btrfs_release_disk_super(disk_super);
>  		goto fail_alloc;
>  	}
> -
> +	if (btrfs_csum_is_fast(csum_type))
> +		set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags);

This ^^^

>  	fs_info->csum_size = btrfs_super_csum_size(disk_super);
>  
>  	ret = btrfs_init_csum_hash(fs_info, csum_type);

should be moved after the initialization btrfs_init_csum_hash so it
would also detect accelerated implementation of other hashes.

> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index d8885966e801cd..e94a4cd06607e1 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1517,8 +1517,6 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
>  		shrinker_debugfs_rename(&s->s_shrink, "sb-%s:%s", fs_type->name,
>  					s->s_id);
>  		btrfs_sb(s)->bdev_holder = fs_type;
> -		if (!strstr(crc32c_impl(), "generic"))
> -			set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags);
>  		error = btrfs_fill_super(s, fs_devices, data);
>  	}
>  	if (!error)
> -- 
> 2.39.2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] btrfs: fix fast csum detection
  2023-04-03 18:35   ` David Sterba
@ 2023-04-04  5:06     ` Christoph Hellwig
  2023-04-04 17:13       ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2023-04-04  5:06 UTC (permalink / raw)
  To: David Sterba; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Apr 03, 2023 at 08:35:26PM +0200, David Sterba wrote:
> > a different checksumming algorithm.  Refactor the code to only checks
> > this if crc32 is actually used.  Note that in an ideal world the
> > information if an algorithm is hardware accelerated or not should be
> > provided by the crypto API instead, but that's left for another day.
> 
> https://lore.kernel.org/linux-crypto/20190514213409.GA115510@gmail.com/
> I got pointed to the driver name, the priority that would say if the
> implementation is accelerated is not exported to the API. This would be
> cleaner but for a simple 'is/is-not' I think it's sufficient.

Except that it diesn't really scale to multiple algorithms very well.
I guess the priority might be the logically best thing to do, so
I'll try to find some time to look into it.

> > +/*
> > + * Check if the CSUM implementation is a fast accelerated one.
> > + * As-is this is a bit of a hack and should be replaced once the
> > + * csum implementations provide that information themselves.
> > + */
> > +static bool btrfs_csum_is_fast(u16 csum_type)
> > +{
> > +	switch (csum_type) {
> > +	case BTRFS_CSUM_TYPE_CRC32:
> > +		return !strstr(crc32c_impl(), "generic");
> 
> This would check the internal shash (lib/libcrc32c.c) not the one we
> allocate for btrfs in btrfs_init_csum_hash. Though they both should be
> equivalent as libcrc32c does some tricks to lookup the fastest
> implementation but theoretically may not find the fast one, while mount
> could.

Yeah.

> > +	if (btrfs_csum_is_fast(csum_type))
> > +		set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags);
> 
> This ^^^
> 
> >  	fs_info->csum_size = btrfs_super_csum_size(disk_super);
> >  
> >  	ret = btrfs_init_csum_hash(fs_info, csum_type);
> 
> should be moved after the initialization btrfs_init_csum_hash so it
> would also detect accelerated implementation of other hashes.

Sure.  Something like this incremental fix.  Do you want to fold it in
or should I resend the series?

---
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7740bb1b152445..eeefa5105c91d5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -154,21 +154,6 @@ static bool btrfs_supported_super_csum(u16 csum_type)
 	}
 }
 
-/*
- * Check if the CSUM implementation is a fast accelerated one.
- * As-is this is a bit of a hack and should be replaced once the
- * csum implementations provide that information themselves.
- */
-static bool btrfs_csum_is_fast(u16 csum_type)
-{
-	switch (csum_type) {
-	case BTRFS_CSUM_TYPE_CRC32:
-		return !strstr(crc32c_impl(), "generic");
-	default:
-		return false;
-	}
-}
-
 /*
  * Return 0 if the superblock checksum type matches the checksum value of that
  * algorithm. Pass the raw disk superblock data.
@@ -2260,6 +2245,20 @@ static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
 
 	fs_info->csum_shash = csum_shash;
 
+	/*
+	 * Check if the CSUM implementation is a fast accelerated one.
+	 * As-is this is a bit of a hack and should be replaced once the csum
+	 * implementations provide that information themselves.
+	 */
+	switch (csum_type) {
+	case BTRFS_CSUM_TYPE_CRC32:
+		if (!strstr(crypto_shash_driver_name(csum_shash), "generic"))
+			set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags);
+		break;
+	default:
+		break;
+	}
+
 	btrfs_info(fs_info, "using %s (%s) checksum algorithm",
 			btrfs_super_csum_name(csum_type),
 			crypto_shash_driver_name(csum_shash));
@@ -3384,8 +3383,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		btrfs_release_disk_super(disk_super);
 		goto fail_alloc;
 	}
-	if (btrfs_csum_is_fast(csum_type))
-		set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags);
 	fs_info->csum_size = btrfs_super_csum_size(disk_super);
 
 	ret = btrfs_init_csum_hash(fs_info, csum_type);

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] btrfs: fix fast csum detection
  2023-04-04  5:06     ` Christoph Hellwig
@ 2023-04-04 17:13       ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2023-04-04 17:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Apr 03, 2023 at 10:06:16PM -0700, Christoph Hellwig wrote:
> > > +	if (btrfs_csum_is_fast(csum_type))
> > > +		set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags);
> > 
> > This ^^^
> > 
> > >  	fs_info->csum_size = btrfs_super_csum_size(disk_super);
> > >  
> > >  	ret = btrfs_init_csum_hash(fs_info, csum_type);
> > 
> > should be moved after the initialization btrfs_init_csum_hash so it
> > would also detect accelerated implementation of other hashes.
> 
> Sure.  Something like this incremental fix.  Do you want to fold it in
> or should I resend the series?

I ended up with the same diff when reviewing the patch so I can fold it,
no need to resend. I'll send a separate patch to add xxhash as a fast
implementation, with some numbers.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] btrfs: remove the sync_writers field in struct btrfs_inode
  2023-03-29  0:13 ` [PATCH 2/4] btrfs: remove the sync_writers field in struct btrfs_inode Christoph Hellwig
@ 2023-04-05 13:50   ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2023-04-05 13:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Wed, Mar 29, 2023 at 09:13:06AM +0900, Christoph Hellwig wrote:
> The sync_writers field communicates if an inode has outstanding
> synchronous writeback.  The better way is to look at the REQ_SYNC
> flag in the bio, which is set by the writeback code for WB_SYNC_ALL
> writeback and can communicate the need for sync writeback without
> an inode field.

This does not go into much detail why it is 'better'. Tracking the
status from the outside as sync_writers applies to the whole inode and
not just the submission context of the individual bios so there's a
difference.

Also the subject should say something about changing the decision logic,
removal of sync_writes is just a side effect.

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/btrfs/bio.c         | 7 +++----
>  fs/btrfs/btrfs_inode.h | 3 ---
>  fs/btrfs/file.c        | 9 ---------
>  fs/btrfs/inode.c       | 1 -
>  fs/btrfs/transaction.c | 2 --
>  5 files changed, 3 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index cf09c6271edbee..c851a3526911f6 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -552,11 +552,10 @@ static void run_one_async_free(struct btrfs_work *work)
>  static bool should_async_write(struct btrfs_bio *bbio)
>  {
>  	/*
> -	 * If the I/O is not issued by fsync and friends, (->sync_writers != 0),
> -	 * then try to defer the submission to a workqueue to parallelize the
> -	 * checksum calculation.
> +	 * Try to defer the submission to a workqueue to parallelize the
> +	 * checksum calculation unless the I/O is issued synchronously.
>  	 */
> -	if (atomic_read(&bbio->inode->sync_writers))
> +	if (op_is_sync(bbio->bio.bi_opf))

So if there's sync_writers set from any of

- btrfs_do_write_iter
- start_ordered_ops
- btrfs_write_marked_extents

then should_async_write would always return false. The change in
behaviour looks like "write everything for this inode synchronously" vs
"write some bios synchronously". I don't see the 1:1 correspondence.

>  		return false;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: don't offload CRCs generation to helpers if it is fast
  2023-03-29  0:13 don't offload CRCs generation to helpers if it is fast Christoph Hellwig
                   ` (5 preceding siblings ...)
  2023-03-31 17:31 ` David Sterba
@ 2023-04-05 23:14 ` David Sterba
  2023-04-06  5:23   ` Christoph Hellwig
  6 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2023-04-05 23:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Wed, Mar 29, 2023 at 09:13:04AM +0900, Christoph Hellwig wrote:
> Hi all,
> 
> based on various observations from me and Chris, we really should never
> offload CRCs generation to helpers if it is hardware accelerated.
> 
> This series implements that and also tidies up various lose ends around
> that.

I've picked patch 1 as it's a standalone fix and should go to stable
trees, the rest is in for-next and still queued for 6.4.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: don't offload CRCs generation to helpers if it is fast
  2023-04-05 23:14 ` David Sterba
@ 2023-04-06  5:23   ` Christoph Hellwig
  2023-04-10 17:17     ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2023-04-06  5:23 UTC (permalink / raw)
  To: David Sterba
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Thu, Apr 06, 2023 at 01:14:55AM +0200, David Sterba wrote:
> > This series implements that and also tidies up various lose ends around
> > that.
> 
> I've picked patch 1 as it's a standalone fix and should go to stable
> trees, the rest is in for-next and still queued for 6.4.

So the usual question:

 - should I resend the series with a commit log that you're prefer?
 - or just deliver a new commit log?
 - or are you just going to fix it up?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: don't offload CRCs generation to helpers if it is fast
  2023-04-06  5:23   ` Christoph Hellwig
@ 2023-04-10 17:17     ` Christoph Hellwig
  2023-04-11 19:09       ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2023-04-10 17:17 UTC (permalink / raw)
  To: David Sterba
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Wed, Apr 05, 2023 at 10:23:10PM -0700, Christoph Hellwig wrote:
> On Thu, Apr 06, 2023 at 01:14:55AM +0200, David Sterba wrote:
> > > This series implements that and also tidies up various lose ends around
> > > that.
> > 
> > I've picked patch 1 as it's a standalone fix and should go to stable
> > trees, the rest is in for-next and still queued for 6.4.
> 
> So the usual question:
> 
>  - should I resend the series with a commit log that you're prefer?
>  - or just deliver a new commit log?
>  - or are you just going to fix it up?

Dave, can you help me on how you want me to proceed here?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: don't offload CRCs generation to helpers if it is fast
  2023-04-10 17:17     ` Christoph Hellwig
@ 2023-04-11 19:09       ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2023-04-11 19:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Sterba, Christoph Hellwig, Chris Mason, Josef Bacik,
	David Sterba, linux-btrfs

On Mon, Apr 10, 2023 at 10:17:49AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 05, 2023 at 10:23:10PM -0700, Christoph Hellwig wrote:
> > On Thu, Apr 06, 2023 at 01:14:55AM +0200, David Sterba wrote:
> > > > This series implements that and also tidies up various lose ends around
> > > > that.
> > > 
> > > I've picked patch 1 as it's a standalone fix and should go to stable
> > > trees, the rest is in for-next and still queued for 6.4.
> > 
> > So the usual question:
> > 
> >  - should I resend the series with a commit log that you're prefer?
> >  - or just deliver a new commit log?
> >  - or are you just going to fix it up?
> 
> Dave, can you help me on how you want me to proceed here?

Please update the changelog of patch 2. I fix trivial things or do minor
rewording/spelling fixes, when the changelog does not match the code I'd
rather let the original author to write it and me take the second look.

It's getting close to the merge window and there was Easter holiday here
so adding the rest of the patches will be moved to 6.5, also it might be
actually better to group all the workqueue changes together.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-04-11 19:09 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-29  0:13 don't offload CRCs generation to helpers if it is fast Christoph Hellwig
2023-03-29  0:13 ` [PATCH 1/4] btrfs: fix fast csum detection Christoph Hellwig
2023-04-03 18:35   ` David Sterba
2023-04-04  5:06     ` Christoph Hellwig
2023-04-04 17:13       ` David Sterba
2023-03-29  0:13 ` [PATCH 2/4] btrfs: remove the sync_writers field in struct btrfs_inode Christoph Hellwig
2023-04-05 13:50   ` David Sterba
2023-03-29  0:13 ` [PATCH 3/4] btrfs: never defer I/O submission for fast CRC implementations Christoph Hellwig
2023-03-29  0:13 ` [PATCH 4/4] btrfs: remove hipri_workers workqueue Christoph Hellwig
2023-03-30 15:42 ` don't offload CRCs generation to helpers if it is fast Chris Mason
2023-03-30 21:06   ` Christoph Hellwig
2023-03-31 17:31 ` David Sterba
2023-04-05 23:14 ` David Sterba
2023-04-06  5:23   ` Christoph Hellwig
2023-04-10 17:17     ` Christoph Hellwig
2023-04-11 19:09       ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.