All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: don't limit direct reads to a single sector
@ 2022-06-16  8:02 Christoph Hellwig
  2022-06-17 14:31 ` David Sterba
  2022-06-17 15:01 ` Nikolay Borisov
  0 siblings, 2 replies; 7+ messages in thread
From: Christoph Hellwig @ 2022-06-16  8:02 UTC (permalink / raw)
  To: clm, josef, dsterba; +Cc: linux-btrfs

Btrfs currently limits direct I/O reads to a single sector, which goes
back to commit c329861da406 ("Btrfs: don't allocate a separate csums
array for direct reads") from Josef.  That commit changes the direct I/O
code to ".. use the private part of the io_tree for our csums.", but ten
years later that isn't how checksums for direct reads work, instead they
use a csums allocation on a per-btrfs_dio_private basis (which have their
own performane problem for small I/O, but that will be addressed later).

Lift this limit to improve performance for large direct reads.  For
example a fio run doing 1 MiB aio reads with a queue depth of 1 roughly
tripples the throughput:

Baseline:

READ: bw=65.3MiB/s (68.5MB/s), 65.3MiB/s-65.3MiB/s (68.5MB/s-68.5MB/s), io=19.1GiB (20.6GB), run=300013-300013msec

With this patch:

READ: bw=196MiB/s (206MB/s), 196MiB/s-196MiB/s (206MB/s-206MB/s), io=57.5GiB (61.7GB), run=300006-300006msc

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6f665bf59f620..deb79e80e4e95 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7585,19 +7585,14 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 	struct extent_map *em;
 	struct extent_state *cached_state = NULL;
 	struct btrfs_dio_data *dio_data = iter->private;
-	u64 lockstart, lockend;
+	u64 lockstart = start;
+	u64 lockend = start + length - 1;
 	const bool write = !!(flags & IOMAP_WRITE);
 	int ret = 0;
 	u64 len = length;
 	const u64 data_alloc_len = length;
 	bool unlock_extents = false;
 
-	if (!write)
-		len = min_t(u64, len, fs_info->sectorsize);
-
-	lockstart = start;
-	lockend = start + len - 1;
-
 	/*
 	 * iomap_dio_rw() only does filemap_write_and_wait_range(), which isn't
 	 * enough if we've written compressed pages to this area, so we need to
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: don't limit direct reads to a single sector
  2022-06-16  8:02 [PATCH] btrfs: don't limit direct reads to a single sector Christoph Hellwig
@ 2022-06-17 14:31 ` David Sterba
  2022-06-17 14:46   ` Nikolay Borisov
  2022-06-17 15:01 ` Nikolay Borisov
  1 sibling, 1 reply; 7+ messages in thread
From: David Sterba @ 2022-06-17 14:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: clm, josef, dsterba, linux-btrfs, nborisov

On Thu, Jun 16, 2022 at 10:02:24AM +0200, Christoph Hellwig wrote:
> Btrfs currently limits direct I/O reads to a single sector, which goes
> back to commit c329861da406 ("Btrfs: don't allocate a separate csums
> array for direct reads") from Josef.  That commit changes the direct I/O
> code to ".. use the private part of the io_tree for our csums.", but ten
> years later that isn't how checksums for direct reads work, instead they
> use a csums allocation on a per-btrfs_dio_private basis (which have their
> own performane problem for small I/O, but that will be addressed later).
> 
> Lift this limit to improve performance for large direct reads.  For
> example a fio run doing 1 MiB aio reads with a queue depth of 1 roughly
> tripples the throughput:
> 
> Baseline:
> 
> READ: bw=65.3MiB/s (68.5MB/s), 65.3MiB/s-65.3MiB/s (68.5MB/s-68.5MB/s), io=19.1GiB (20.6GB), run=300013-300013msec
> 
> With this patch:
> 
> READ: bw=196MiB/s (206MB/s), 196MiB/s-196MiB/s (206MB/s-206MB/s), io=57.5GiB (61.7GB), run=300006-300006msc

Nice catch, thanks.

Nikolay, you did some experiments with larger dio, but IIRC it was on
the bio layer. This looks like a fix for the same issue but I'm not
sure.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: don't limit direct reads to a single sector
  2022-06-17 14:31 ` David Sterba
@ 2022-06-17 14:46   ` Nikolay Borisov
  0 siblings, 0 replies; 7+ messages in thread
From: Nikolay Borisov @ 2022-06-17 14:46 UTC (permalink / raw)
  To: dsterba, Christoph Hellwig, clm, josef, dsterba, linux-btrfs



On 17.06.22 г. 17:31 ч., David Sterba wrote:
> Nikolay, you did some experiments with larger dio, but IIRC it was on
> the bio layer. This looks like a fix for the same issue but I'm not
> sure.

AFAIR I did create some patches to limit the size of the bios due to the 
memory we needed in order to save the csums for those as we were having 
some OOM situation. Still what I remember is we eventually settled to 
using vmalloc and those patches never really materialized upstream.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: don't limit direct reads to a single sector
  2022-06-16  8:02 [PATCH] btrfs: don't limit direct reads to a single sector Christoph Hellwig
  2022-06-17 14:31 ` David Sterba
@ 2022-06-17 15:01 ` Nikolay Borisov
  2022-06-17 15:58   ` Christoph Hellwig
  1 sibling, 1 reply; 7+ messages in thread
From: Nikolay Borisov @ 2022-06-17 15:01 UTC (permalink / raw)
  To: Christoph Hellwig, clm, josef, dsterba; +Cc: linux-btrfs



On 16.06.22 г. 11:02 ч., Christoph Hellwig wrote:
> Btrfs currently limits direct I/O reads to a single sector, which goes
> back to commit c329861da406 ("Btrfs: don't allocate a separate csums
> array for direct reads") from Josef.  That commit changes the direct I/O
> code to ".. use the private part of the io_tree for our csums.", but ten
> years later that isn't how checksums for direct reads work, instead they
> use a csums allocation on a per-btrfs_dio_private basis (which have their
> own performane problem for small I/O, but that will be addressed later).
> 
> Lift this limit to improve performance for large direct reads.  For
> example a fio run doing 1 MiB aio reads with a queue depth of 1 roughly
> tripples the throughput:
> 
> Baseline:
> 
> READ: bw=65.3MiB/s (68.5MB/s), 65.3MiB/s-65.3MiB/s (68.5MB/s-68.5MB/s), io=19.1GiB (20.6GB), run=300013-300013msec
> 
> With this patch:
> 
> READ: bw=196MiB/s (206MB/s), 196MiB/s-196MiB/s (206MB/s-206MB/s), io=57.5GiB (61.7GB), run=300006-300006msc
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/inode.c | 9 ++-------
>   1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 6f665bf59f620..deb79e80e4e95 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7585,19 +7585,14 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
>   	struct extent_map *em;
>   	struct extent_state *cached_state = NULL;
>   	struct btrfs_dio_data *dio_data = iter->private;
> -	u64 lockstart, lockend;
> +	u64 lockstart = start;
> +	u64 lockend = start + length - 1;
>   	const bool write = !!(flags & IOMAP_WRITE);
>   	int ret = 0;
>   	u64 len = length;
>   	const u64 data_alloc_len = length;
>   	bool unlock_extents = false;
>   
> -	if (!write)
> -		len = min_t(u64, len, fs_info->sectorsize);
> -
> -	lockstart = start;
> -	lockend = start + len - 1;
> -

The limit here has direct repercussion on how much space we could 
potentially allocated in btrfs_submit_direct for csums. For 1m read this 
means we end up with 256 entries array (provided a blocksize of 4k). For 
crc32c that's 1k of data, for blake2/sha256 that'd be 8k of data. It 
would be good to put this into the change log to demonstrated that at 
least some consideration has been given to this aspect.


In any case perhaps putting some sane upper limit might make sense?

>   	/*
>   	 * iomap_dio_rw() only does filemap_write_and_wait_range(), which isn't
>   	 * enough if we've written compressed pages to this area, so we need to

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: don't limit direct reads to a single sector
  2022-06-17 15:01 ` Nikolay Borisov
@ 2022-06-17 15:58   ` Christoph Hellwig
  2022-06-20  7:52     ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2022-06-17 15:58 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Christoph Hellwig, clm, josef, dsterba, linux-btrfs

On Fri, Jun 17, 2022 at 06:01:07PM +0300, Nikolay Borisov wrote:
> The limit here has direct repercussion on how much space we could 
> potentially allocated in btrfs_submit_direct for csums. For 1m read this 
> means we end up with 256 entries array (provided a blocksize of 4k). For 
> crc32c that's 1k of data, for blake2/sha256 that'd be 8k of data. It would 
> be good to put this into the change log to demonstrated that at least some 
> consideration has been given to this aspect.
> In any case perhaps putting some sane upper limit might make sense?

Now that you mentioned an upper limit around 1MB might make sense. 
Below that this new code isn't any different from how the bbio->csum is
allocated for buffered I/O, and in one of the next round of patches I do
plan to switch it over to using that, including the optimization with
the inline csums to help small I/O.

But with huge pages we can go above the 1MB for direct I/O, while for
buffered I/O we'd need large folio support first, which isn't supported
for btrfs.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: don't limit direct reads to a single sector
  2022-06-17 15:58   ` Christoph Hellwig
@ 2022-06-20  7:52     ` Christoph Hellwig
  2022-06-20 15:00       ` David Sterba
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2022-06-20  7:52 UTC (permalink / raw)
  To: dsterba; +Cc: Nikolay Borisov, clm, josef, linux-btrfs

David, I saw this patch is already in misc-next.  Do you want me to
send an incremental patch or a new revision with the limit?  Incremental
would look kinda silly, so I'd prefer the updated version.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: don't limit direct reads to a single sector
  2022-06-20  7:52     ` Christoph Hellwig
@ 2022-06-20 15:00       ` David Sterba
  0 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2022-06-20 15:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: dsterba, Nikolay Borisov, clm, josef, linux-btrfs

On Mon, Jun 20, 2022 at 09:52:16AM +0200, Christoph Hellwig wrote:
> David, I saw this patch is already in misc-next.  Do you want me to
> send an incremental patch or a new revision with the limit?  Incremental
> would look kinda silly, so I'd prefer the updated version.

Please send a v2 so we can see the limit code in context. Otherwise,
it also works to send an incremental in case it's something obvious.
Patches in misc-next are meant to be updated or replaced if needed, we
want to have self contained commits instead of fixes-on-fixes.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-06-20 15:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-16  8:02 [PATCH] btrfs: don't limit direct reads to a single sector Christoph Hellwig
2022-06-17 14:31 ` David Sterba
2022-06-17 14:46   ` Nikolay Borisov
2022-06-17 15:01 ` Nikolay Borisov
2022-06-17 15:58   ` Christoph Hellwig
2022-06-20  7:52     ` Christoph Hellwig
2022-06-20 15:00       ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.