All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] bcachefs: checksum merge and writeback fix
@ 2023-09-27 11:23 Brian Foster
  2023-09-27 11:23 ` [PATCH 1/2] bcachefs: fix crc32c checksum merge byte order problem Brian Foster
  2023-09-27 11:23 ` [PATCH 2/2] bcachefs: remove writeback bio size limit Brian Foster
  0 siblings, 2 replies; 9+ messages in thread
From: Brian Foster @ 2023-09-27 11:23 UTC (permalink / raw)
  To: linux-bcachefs

Hi all,

A couple more small fixes. The first patch fixes a byte order problem in
the boundary code between bcachefs and the crc32c library related to
extent (and thus checksum) merges. The second patch addresses a small
logic wart that leads to more split writeback behavior than necessary,
which is how the problem addressed by patch 1 was observed. Patch 2 was
more of a "I'm not clear on why this size check is here, so let's try to
remove it and see what happens" patch. ;)

From chatting a bit in the meeting yesterday, I think we've already
established this isn't exactly what we want to do here. Instead, we
should still have some sort of bound on the size of a writeback op.
This is discussed a bit in the commit log description, but the consensus
seemed to be that this should just be replaced with a moderately less
conservative size check, such as 16MB or so. That's notably better than
the current limit of 1MB, so unless discussion on this patch leads in a
different direction, I'll probably follow up with a v2 to do that once
I've had a bit more chance to test it. Therefore, patch 2 is included
here mainly for posterity.

Thoughts, reviews, flames appreciated.

Brian

Brian Foster (2):
  bcachefs: fix crc32c checksum merge byte order problem
  bcachefs: remove writeback bio size limit

 fs/bcachefs/checksum.c       | 4 ++--
 fs/bcachefs/fs-io-buffered.c | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] bcachefs: fix crc32c checksum merge byte order problem
  2023-09-27 11:23 [PATCH 0/2] bcachefs: checksum merge and writeback fix Brian Foster
@ 2023-09-27 11:23 ` Brian Foster
  2023-09-27 22:08   ` Kent Overstreet
  2023-09-27 11:23 ` [PATCH 2/2] bcachefs: remove writeback bio size limit Brian Foster
  1 sibling, 1 reply; 9+ messages in thread
From: Brian Foster @ 2023-09-27 11:23 UTC (permalink / raw)
  To: linux-bcachefs

An fsstress task on a big endian system (s390x) quickly produces a
bunch of CRC errors in the system logs. Most of these are related to
the narrow CRCs path, but the fundamental problem can be reduced to
a single write and re-read (after dropping caches) of a previously
merged extent.

The key merge path that handles extent merges eventually calls into
bch2_checksum_merge() to combine the CRCs of the associated extents.
This code attempts to avoid a byte order swap by feeding the le64
values into the crc32c code, but the latter casts the resulting u64
value down to a u32, which truncates the high bytes where the actual
crc value ends up. This results in a CRC value that does not change
(since it is merged with a CRC of 0), and checksum failures ensue.

Fix the checksum merge code to swap to cpu byte order on the
boundaries to the external crc code such that any value casting is
handled properly.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/bcachefs/checksum.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/bcachefs/checksum.c b/fs/bcachefs/checksum.c
index 1948119edbf4..4cc0196ec9a5 100644
--- a/fs/bcachefs/checksum.c
+++ b/fs/bcachefs/checksum.c
@@ -362,7 +362,7 @@ struct bch_csum bch2_checksum_merge(unsigned type, struct bch_csum a,
 
 	state.type = type;
 	bch2_checksum_init(&state);
-	state.seed = (u64 __force) a.lo;
+	state.seed = le64_to_cpu(a.lo);
 
 	BUG_ON(!bch2_checksum_mergeable(type));
 
@@ -373,7 +373,7 @@ struct bch_csum bch2_checksum_merge(unsigned type, struct bch_csum a,
 				page_address(ZERO_PAGE(0)), page_len);
 		b_len -= page_len;
 	}
-	a.lo = (__le64 __force) bch2_checksum_final(&state);
+	a.lo = cpu_to_le64(bch2_checksum_final(&state));
 	a.lo ^= b.lo;
 	a.hi ^= b.hi;
 	return a;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] bcachefs: remove writeback bio size limit
  2023-09-27 11:23 [PATCH 0/2] bcachefs: checksum merge and writeback fix Brian Foster
  2023-09-27 11:23 ` [PATCH 1/2] bcachefs: fix crc32c checksum merge byte order problem Brian Foster
@ 2023-09-27 11:23 ` Brian Foster
  2023-09-27 22:03   ` Kent Overstreet
  1 sibling, 1 reply; 9+ messages in thread
From: Brian Foster @ 2023-09-27 11:23 UTC (permalink / raw)
  To: linux-bcachefs

The bcachefs folio writeback code includes a bio full check as well
as a fixed size check when it determines whether to submit the
current write op or continue to add to the current bio. The current
code submits prematurely when the current folio fits exactly in the
remaining space allowed in the current bio, which typically results
in an extent merge that would have otherwise been unnecessary. This
can be observed with a buffered write sized exactly to the current
maximum value (1MB) and with key_merging_disabled=1. The latter
prevents the merge from the second write such that a subsequent
check of the extent list shows a 1020k extent followed by a
contiguous 4k extent.

It's not totally clear why the fixed write size check exists.
bio_full() already checks that the bio can accommodate the current
dirty range being processed, so the only other concern is write
latency. Even then, a 1MB cap seems rather small. For reference,
iomap includes a folio batch size (of 4k) to mitigate latency
associated with writeback completion folio processing, but that
restricts writeback bios to somewhere in the range of 16MB-256MB
depending on folio size (i.e. considering 4k to 64k pages). Unless
there is some known reason for it, remove the size limit and rely on
bio_full() to cap the size of the bio.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/bcachefs/fs-io-buffered.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/bcachefs/fs-io-buffered.c b/fs/bcachefs/fs-io-buffered.c
index 58ccc7b91ac7..d438b93a3a30 100644
--- a/fs/bcachefs/fs-io-buffered.c
+++ b/fs/bcachefs/fs-io-buffered.c
@@ -607,8 +607,6 @@ static int __bch2_writepage(struct folio *folio,
 		if (w->io &&
 		    (w->io->op.res.nr_replicas != nr_replicas_this_write ||
 		     bio_full(&w->io->op.wbio.bio, sectors << 9) ||
-		     w->io->op.wbio.bio.bi_iter.bi_size + (sectors << 9) >=
-		     (BIO_MAX_VECS * PAGE_SIZE) ||
 		     bio_end_sector(&w->io->op.wbio.bio) != sector))
 			bch2_writepage_do_io(w);
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] bcachefs: remove writeback bio size limit
  2023-09-27 11:23 ` [PATCH 2/2] bcachefs: remove writeback bio size limit Brian Foster
@ 2023-09-27 22:03   ` Kent Overstreet
  2023-09-29 14:31     ` Brian Foster
  0 siblings, 1 reply; 9+ messages in thread
From: Kent Overstreet @ 2023-09-27 22:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-bcachefs, linux-fsdevel, hch, djwong

On Wed, Sep 27, 2023 at 07:23:38AM -0400, Brian Foster wrote:
> The bcachefs folio writeback code includes a bio full check as well
> as a fixed size check when it determines whether to submit the
> current write op or continue to add to the current bio. The current
> code submits prematurely when the current folio fits exactly in the
> remaining space allowed in the current bio, which typically results
> in an extent merge that would have otherwise been unnecessary. This
> can be observed with a buffered write sized exactly to the current
> maximum value (1MB) and with key_merging_disabled=1. The latter
> prevents the merge from the second write such that a subsequent
> check of the extent list shows a 1020k extent followed by a
> contiguous 4k extent.
> 
> It's not totally clear why the fixed write size check exists.
> bio_full() already checks that the bio can accommodate the current
> dirty range being processed, so the only other concern is write
> latency. Even then, a 1MB cap seems rather small. For reference,
> iomap includes a folio batch size (of 4k) to mitigate latency
> associated with writeback completion folio processing, but that
> restricts writeback bios to somewhere in the range of 16MB-256MB
> depending on folio size (i.e. considering 4k to 64k pages). Unless
> there is some known reason for it, remove the size limit and rely on
> bio_full() to cap the size of the bio.

We definitely need some sort of a cap; otherwise, there's nothing
preventing us from building up gigabyte+ bios (multipage bvecs, large
folios), and we don't want that.

This probably needs to be a sysctl - better would be a hint provided by
the filesystem based on the performance characteristics of the
device(s), but the writeback code doesn't know which device we're
writing to so that'll be tricky to plumb.

iomap has IOEND_BATCH_SIZE, which is another hard coded limit; perhaps
iomap could use the new sysctl as well.

> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/bcachefs/fs-io-buffered.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/bcachefs/fs-io-buffered.c b/fs/bcachefs/fs-io-buffered.c
> index 58ccc7b91ac7..d438b93a3a30 100644
> --- a/fs/bcachefs/fs-io-buffered.c
> +++ b/fs/bcachefs/fs-io-buffered.c
> @@ -607,8 +607,6 @@ static int __bch2_writepage(struct folio *folio,
>  		if (w->io &&
>  		    (w->io->op.res.nr_replicas != nr_replicas_this_write ||
>  		     bio_full(&w->io->op.wbio.bio, sectors << 9) ||
> -		     w->io->op.wbio.bio.bi_iter.bi_size + (sectors << 9) >=
> -		     (BIO_MAX_VECS * PAGE_SIZE) ||
>  		     bio_end_sector(&w->io->op.wbio.bio) != sector))
>  			bch2_writepage_do_io(w);
>  
> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] bcachefs: fix crc32c checksum merge byte order problem
  2023-09-27 11:23 ` [PATCH 1/2] bcachefs: fix crc32c checksum merge byte order problem Brian Foster
@ 2023-09-27 22:08   ` Kent Overstreet
  2023-09-28 14:12     ` Brian Foster
  0 siblings, 1 reply; 9+ messages in thread
From: Kent Overstreet @ 2023-09-27 22:08 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-bcachefs

On Wed, Sep 27, 2023 at 07:23:37AM -0400, Brian Foster wrote:
> An fsstress task on a big endian system (s390x) quickly produces a
> bunch of CRC errors in the system logs. Most of these are related to
> the narrow CRCs path, but the fundamental problem can be reduced to
> a single write and re-read (after dropping caches) of a previously
> merged extent.
> 
> The key merge path that handles extent merges eventually calls into
> bch2_checksum_merge() to combine the CRCs of the associated extents.
> This code attempts to avoid a byte order swap by feeding the le64
> values into the crc32c code, but the latter casts the resulting u64
> value down to a u32, which truncates the high bytes where the actual
> crc value ends up. This results in a CRC value that does not change
> (since it is merged with a CRC of 0), and checksum failures ensue.
> 
> Fix the checksum merge code to swap to cpu byte order on the
> boundaries to the external crc code such that any value casting is
> handled properly.

Thanks! Applied.

We really need to test creating a filesystem and then reading from it on
an opposite endianness machine, have you gotten a chance to do that?

Otherwise, there's the big endian support in ktest to start looking at
again.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] bcachefs: fix crc32c checksum merge byte order problem
  2023-09-27 22:08   ` Kent Overstreet
@ 2023-09-28 14:12     ` Brian Foster
  2023-09-28 16:57       ` Kent Overstreet
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Foster @ 2023-09-28 14:12 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

On Wed, Sep 27, 2023 at 06:08:21PM -0400, Kent Overstreet wrote:
> On Wed, Sep 27, 2023 at 07:23:37AM -0400, Brian Foster wrote:
> > An fsstress task on a big endian system (s390x) quickly produces a
> > bunch of CRC errors in the system logs. Most of these are related to
> > the narrow CRCs path, but the fundamental problem can be reduced to
> > a single write and re-read (after dropping caches) of a previously
> > merged extent.
> > 
> > The key merge path that handles extent merges eventually calls into
> > bch2_checksum_merge() to combine the CRCs of the associated extents.
> > This code attempts to avoid a byte order swap by feeding the le64
> > values into the crc32c code, but the latter casts the resulting u64
> > value down to a u32, which truncates the high bytes where the actual
> > crc value ends up. This results in a CRC value that does not change
> > (since it is merged with a CRC of 0), and checksum failures ensue.
> > 
> > Fix the checksum merge code to swap to cpu byte order on the
> > boundaries to the external crc code such that any value casting is
> > handled properly.
> 
> Thanks! Applied.
> 
> We really need to test creating a filesystem and then reading from it on
> an opposite endianness machine, have you gotten a chance to do that?
> 

I gave it a quick test by just dd'ing the disk image off my fstests
TEST_DEV from the BE box I've been playing with and mounting it on a LE
system. The fs mounts, but eventually complains about a backpointer
issue after some stress I/O:

 bcachefs (loop0): error validating btree node at btree backpointers level 0/1
   u64s 11 type btree_ptr_v2 0:5342578688:0 len 0 ver 0: seq 8574dcb72b17e918 written 486 min_key 0:3338403840:1 durability: 1 ptr: 0:10388:0 gen 6
   node offset 486 bset u64s 1300: invalid bkey: backpointer at wrong pos
   u64s 9 type backpointer 0:3339255808:0 len 0 ver 0: bucket=0:6369:0 btree=extents l=0 offset=0:256 len=64 pos=536913736:256:U32_MAX, shutting down
 bcachefs (loop0): inconsistency detected - emergency read only
 bcachefs (loop0): __bch2_btree_write_buffer_flush: insert error EIO
 bcachefs (loop0 inum 201326618 offset 246272): write error while doing btree update: EIO

... and fsck similarly complains about a bunch more bp and lru related
inconsistencies. Write buffer issue, perhaps? At a glance, that seq
value looks kind of bogus, but I haven't had a chance to dig into the
details yet. Everything seems in order with the same image file on the
BE box, FWIW.

Brian

> Otherwise, there's the big endian support in ktest to start looking at
> again.
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] bcachefs: fix crc32c checksum merge byte order problem
  2023-09-28 14:12     ` Brian Foster
@ 2023-09-28 16:57       ` Kent Overstreet
  2023-09-29  8:13         ` Janpieter Sollie
  0 siblings, 1 reply; 9+ messages in thread
From: Kent Overstreet @ 2023-09-28 16:57 UTC (permalink / raw)
  To: Brian Foster, janpieter.sollie; +Cc: linux-bcachefs

On Thu, Sep 28, 2023 at 10:12:18AM -0400, Brian Foster wrote:
> On Wed, Sep 27, 2023 at 06:08:21PM -0400, Kent Overstreet wrote:
> > On Wed, Sep 27, 2023 at 07:23:37AM -0400, Brian Foster wrote:
> > > An fsstress task on a big endian system (s390x) quickly produces a
> > > bunch of CRC errors in the system logs. Most of these are related to
> > > the narrow CRCs path, but the fundamental problem can be reduced to
> > > a single write and re-read (after dropping caches) of a previously
> > > merged extent.
> > > 
> > > The key merge path that handles extent merges eventually calls into
> > > bch2_checksum_merge() to combine the CRCs of the associated extents.
> > > This code attempts to avoid a byte order swap by feeding the le64
> > > values into the crc32c code, but the latter casts the resulting u64
> > > value down to a u32, which truncates the high bytes where the actual
> > > crc value ends up. This results in a CRC value that does not change
> > > (since it is merged with a CRC of 0), and checksum failures ensue.
> > > 
> > > Fix the checksum merge code to swap to cpu byte order on the
> > > boundaries to the external crc code such that any value casting is
> > > handled properly.
> > 
> > Thanks! Applied.
> > 
> > We really need to test creating a filesystem and then reading from it on
> > an opposite endianness machine, have you gotten a chance to do that?
> > 
> 
> I gave it a quick test by just dd'ing the disk image off my fstests
> TEST_DEV from the BE box I've been playing with and mounting it on a LE
> system. The fs mounts, but eventually complains about a backpointer
> issue after some stress I/O:
> 
>  bcachefs (loop0): error validating btree node at btree backpointers level 0/1
>    u64s 11 type btree_ptr_v2 0:5342578688:0 len 0 ver 0: seq 8574dcb72b17e918 written 486 min_key 0:3338403840:1 durability: 1 ptr: 0:10388:0 gen 6
>    node offset 486 bset u64s 1300: invalid bkey: backpointer at wrong pos
>    u64s 9 type backpointer 0:3339255808:0 len 0 ver 0: bucket=0:6369:0 btree=extents l=0 offset=0:256 len=64 pos=536913736:256:U32_MAX, shutting down
>  bcachefs (loop0): inconsistency detected - emergency read only
>  bcachefs (loop0): __bch2_btree_write_buffer_flush: insert error EIO
>  bcachefs (loop0 inum 201326618 offset 246272): write error while doing btree update: EIO
> 
> ... and fsck similarly complains about a bunch more bp and lru related
> inconsistencies. Write buffer issue, perhaps? At a glance, that seq
> value looks kind of bogus, but I haven't had a chance to dig into the
> details yet. Everything seems in order with the same image file on the
> BE box, FWIW.

bch_backpointer looks highly suspect re: endianness, if it's not fixable
we'll have to do a bch_backpointer_v2. I expect it will be fixable
though, just tricky.

The LRU btree is just a bitset btree now, so that shouldn't have
endianness issues.

So yeah, we definitely need to get automated foreign endianness testing
going - if there's going to be more of this I don't want us to be doing
this by hand, and we need to make sure issues like this get caught in
the future.

jpsollie was looking at ktest support for big endian architectures
recently and had some patches for that, I just haven't had time to look
at them - jpsollie, do you think you can post those patches to the list?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] bcachefs: fix crc32c checksum merge byte order problem
  2023-09-28 16:57       ` Kent Overstreet
@ 2023-09-29  8:13         ` Janpieter Sollie
  0 siblings, 0 replies; 9+ messages in thread
From: Janpieter Sollie @ 2023-09-29  8:13 UTC (permalink / raw)
  To: Kent Overstreet, Brian Foster; +Cc: linux-bcachefs



Op 28/09/2023 om 18:57 schreef Kent Overstreet:
> On Thu, Sep 28, 2023 at 10:12:18AM -0400, Brian Foster wrote:
>> On Wed, Sep 27, 2023 at 06:08:21PM -0400, Kent Overstreet wrote:
>>> On Wed, Sep 27, 2023 at 07:23:37AM -0400, Brian Foster wrote:
>>>> An fsstress task on a big endian system (s390x) quickly produces a
>>>> bunch of CRC errors in the system logs. Most of these are related to
>>>> the narrow CRCs path, but the fundamental problem can be reduced to
>>>> a single write and re-read (after dropping caches) of a previously
>>>> merged extent.
>>>>
>>>> The key merge path that handles extent merges eventually calls into
>>>> bch2_checksum_merge() to combine the CRCs of the associated extents.
>>>> This code attempts to avoid a byte order swap by feeding the le64
>>>> values into the crc32c code, but the latter casts the resulting u64
>>>> value down to a u32, which truncates the high bytes where the actual
>>>> crc value ends up. This results in a CRC value that does not change
>>>> (since it is merged with a CRC of 0), and checksum failures ensue.
>>>>
>>>> Fix the checksum merge code to swap to cpu byte order on the
>>>> boundaries to the external crc code such that any value casting is
>>>> handled properly.
>>> Thanks! Applied.
>>>
>>> We really need to test creating a filesystem and then reading from it on
>>> an opposite endianness machine, have you gotten a chance to do that?
>>>
>> I gave it a quick test by just dd'ing the disk image off my fstests
>> TEST_DEV from the BE box I've been playing with and mounting it on a LE
>> system. The fs mounts, but eventually complains about a backpointer
>> issue after some stress I/O:
>>
>>   bcachefs (loop0): error validating btree node at btree backpointers level 0/1
>>     u64s 11 type btree_ptr_v2 0:5342578688:0 len 0 ver 0: seq 8574dcb72b17e918 written 486 min_key 0:3338403840:1 durability: 1 ptr: 0:10388:0 gen 6
>>     node offset 486 bset u64s 1300: invalid bkey: backpointer at wrong pos
>>     u64s 9 type backpointer 0:3339255808:0 len 0 ver 0: bucket=0:6369:0 btree=extents l=0 offset=0:256 len=64 pos=536913736:256:U32_MAX, shutting down
>>   bcachefs (loop0): inconsistency detected - emergency read only
>>   bcachefs (loop0): __bch2_btree_write_buffer_flush: insert error EIO
>>   bcachefs (loop0 inum 201326618 offset 246272): write error while doing btree update: EIO
>>
>> ... and fsck similarly complains about a bunch more bp and lru related
>> inconsistencies. Write buffer issue, perhaps? At a glance, that seq
>> value looks kind of bogus, but I haven't had a chance to dig into the
>> details yet. Everything seems in order with the same image file on the
>> BE box, FWIW.
> bch_backpointer looks highly suspect re: endianness, if it's not fixable
> we'll have to do a bch_backpointer_v2. I expect it will be fixable
> though, just tricky.
>
> The LRU btree is just a bitset btree now, so that shouldn't have
> endianness issues.
>
> So yeah, we definitely need to get automated foreign endianness testing
> going - if there's going to be more of this I don't want us to be doing
> this by hand, and we need to make sure issues like this get caught in
> the future.
>
> jpsollie was looking at ktest support for big endian architectures
> recently and had some patches for that, I just haven't had time to look
> at them - jpsollie, do you think you can post those patches to the list?


Currently finalizing them,

The issue is - mostly:
all BE architectures for debian are unofficial, and when you want to use a unofficial port,
you'll need to use debian sid, which is a trial-and-error when talking about dependencies

I'll create a patch of the commits in the PR.

Janpieter Sollie


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] bcachefs: remove writeback bio size limit
  2023-09-27 22:03   ` Kent Overstreet
@ 2023-09-29 14:31     ` Brian Foster
  0 siblings, 0 replies; 9+ messages in thread
From: Brian Foster @ 2023-09-29 14:31 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs, linux-fsdevel, hch, djwong

On Wed, Sep 27, 2023 at 06:03:26PM -0400, Kent Overstreet wrote:
> On Wed, Sep 27, 2023 at 07:23:38AM -0400, Brian Foster wrote:
> > The bcachefs folio writeback code includes a bio full check as well
> > as a fixed size check when it determines whether to submit the
> > current write op or continue to add to the current bio. The current
> > code submits prematurely when the current folio fits exactly in the
> > remaining space allowed in the current bio, which typically results
> > in an extent merge that would have otherwise been unnecessary. This
> > can be observed with a buffered write sized exactly to the current
> > maximum value (1MB) and with key_merging_disabled=1. The latter
> > prevents the merge from the second write such that a subsequent
> > check of the extent list shows a 1020k extent followed by a
> > contiguous 4k extent.
> > 
> > It's not totally clear why the fixed write size check exists.
> > bio_full() already checks that the bio can accommodate the current
> > dirty range being processed, so the only other concern is write
> > latency. Even then, a 1MB cap seems rather small. For reference,
> > iomap includes a folio batch size (of 4k) to mitigate latency
> > associated with writeback completion folio processing, but that
> > restricts writeback bios to somewhere in the range of 16MB-256MB
> > depending on folio size (i.e. considering 4k to 64k pages). Unless
> > there is some known reason for it, remove the size limit and rely on
> > bio_full() to cap the size of the bio.
> 
> We definitely need some sort of a cap; otherwise, there's nothing
> preventing us from building up gigabyte+ bios (multipage bvecs, large
> folios), and we don't want that.
> 

Yeah, I forgot about the multipage bvecs case when I was first reading
through this code.

> This probably needs to be a sysctl - better would be a hint provided by
> the filesystem based on the performance characteristics of the
> device(s), but the writeback code doesn't know which device we're
> writing to so that'll be tricky to plumb.
> 

Agree, but IIUC it looks like there's more work to do (related to the
write bounce code) before we can change this limit. I've got a v2 of
this patch, which I'll post shortly, that retains the limit, fixes up
the logic wart that originally brought attention to this code, and adds
some comments. Thanks for the additional context.

Brian

> iomap has IOEND_BATCH_SIZE, which is another hard coded limit; perhaps
> iomap could use the new sysctl as well.
> 
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/bcachefs/fs-io-buffered.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/fs/bcachefs/fs-io-buffered.c b/fs/bcachefs/fs-io-buffered.c
> > index 58ccc7b91ac7..d438b93a3a30 100644
> > --- a/fs/bcachefs/fs-io-buffered.c
> > +++ b/fs/bcachefs/fs-io-buffered.c
> > @@ -607,8 +607,6 @@ static int __bch2_writepage(struct folio *folio,
> >  		if (w->io &&
> >  		    (w->io->op.res.nr_replicas != nr_replicas_this_write ||
> >  		     bio_full(&w->io->op.wbio.bio, sectors << 9) ||
> > -		     w->io->op.wbio.bio.bi_iter.bi_size + (sectors << 9) >=
> > -		     (BIO_MAX_VECS * PAGE_SIZE) ||
> >  		     bio_end_sector(&w->io->op.wbio.bio) != sector))
> >  			bch2_writepage_do_io(w);
> >  
> > -- 
> > 2.41.0
> > 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-09-29 14:33 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-27 11:23 [PATCH 0/2] bcachefs: checksum merge and writeback fix Brian Foster
2023-09-27 11:23 ` [PATCH 1/2] bcachefs: fix crc32c checksum merge byte order problem Brian Foster
2023-09-27 22:08   ` Kent Overstreet
2023-09-28 14:12     ` Brian Foster
2023-09-28 16:57       ` Kent Overstreet
2023-09-29  8:13         ` Janpieter Sollie
2023-09-27 11:23 ` [PATCH 2/2] bcachefs: remove writeback bio size limit Brian Foster
2023-09-27 22:03   ` Kent Overstreet
2023-09-29 14:31     ` Brian Foster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.