All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH
@ 2023-12-21  1:27 Song Liu
  2023-12-21  1:27 ` [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush() Song Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Song Liu @ 2023-12-21  1:27 UTC (permalink / raw)
  To: linux-block, linux-raid
  Cc: axboe, kent.overstreet, janpieter.sollie, colyli, bagasdotme, Song Liu

A recent bug report [1] shows md is handling a flush from bcachefs as read:

bch2_journal_write=>
  submit_bio=>
    ...
    md_handle_request =>
      raid5_make_request =>
        chunk_aligned_read =>
          raid5_read_one_chunk =>
	    ...

It appears md code only checks REQ_PREFLUSH for flush requests, which
doesn't cover all cases. OTOH, op_is_flush() doesn't check REQ_OP_FLUSH
either.

Fix this by:
1) Check REQ_PREFLUSH in op_is_flush();
2) Use op_is_flush() in md code.

Thanks,
Song

[1] https://bugzilla.kernel.org/show_bug.cgi?id=218184

Song Liu (2):
  block: Check REQ_OP_FLUSH in op_is_flush()
  md: Use op_is_flush() to check flush bio

 drivers/md/raid0.c        | 2 +-
 drivers/md/raid1.c        | 2 +-
 drivers/md/raid10.c       | 2 +-
 drivers/md/raid5.c        | 2 +-
 include/linux/blk_types.h | 3 ++-
 5 files changed, 6 insertions(+), 5 deletions(-)

--
2.34.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush()
  2023-12-21  1:27 [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Song Liu
@ 2023-12-21  1:27 ` Song Liu
  2023-12-21  6:12   ` Christoph Hellwig
  2023-12-21  1:27 ` [PATCH 2/2] md: Use op_is_flush() to check flush bio Song Liu
  2023-12-21  3:36 ` [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Ed Tsai (蔡宗軒)
  2 siblings, 1 reply; 9+ messages in thread
From: Song Liu @ 2023-12-21  1:27 UTC (permalink / raw)
  To: linux-block, linux-raid
  Cc: axboe, kent.overstreet, janpieter.sollie, colyli, bagasdotme, Song Liu

Upper layer (fs, etc.) may issue flush with something like:

  bio_reset(bio, bdev, REQ_OP_FLUSH);
  bio->bi_end_io = xxx;
  submit_bio(bio);

op_is_flush(bio->bi_opf) should return true for this bio.

Signed-off-by: Song Liu <song@kernel.org>
---
 include/linux/blk_types.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d5c5e59ddbd2..338423da84ca 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -487,7 +487,8 @@ static inline bool op_is_write(blk_opf_t op)
  */
 static inline bool op_is_flush(blk_opf_t op)
 {
-	return op & (REQ_FUA | REQ_PREFLUSH);
+	return op & (REQ_FUA | REQ_PREFLUSH) ||
+		(op & REQ_OP_MASK) == REQ_OP_FLUSH;
 }
 
 /*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] md: Use op_is_flush() to check flush bio
  2023-12-21  1:27 [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Song Liu
  2023-12-21  1:27 ` [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush() Song Liu
@ 2023-12-21  1:27 ` Song Liu
  2023-12-21  3:36 ` [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Ed Tsai (蔡宗軒)
  2 siblings, 0 replies; 9+ messages in thread
From: Song Liu @ 2023-12-21  1:27 UTC (permalink / raw)
  To: linux-block, linux-raid
  Cc: axboe, kent.overstreet, janpieter.sollie, colyli, bagasdotme, Song Liu

op_is_flush() covers different ways to request flush. Use it instead of
simply checking against REQ_PREFLUSH.

Signed-off-by: Song Liu <song@kernel.org>
---
 drivers/md/raid0.c  | 2 +-
 drivers/md/raid1.c  | 2 +-
 drivers/md/raid10.c | 2 +-
 drivers/md/raid5.c  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index c50a7abda744..20283dc5208a 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -592,7 +592,7 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
 	unsigned chunk_sects;
 	unsigned sectors;
 
-	if (unlikely(bio->bi_opf & REQ_PREFLUSH)
+	if (unlikely(op_is_flush(bio->bi_opf))
 	    && md_flush_request(mddev, bio))
 		return true;
 
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index aaa434f0c175..5c1dadd7fbb6 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1581,7 +1581,7 @@ static bool raid1_make_request(struct mddev *mddev, struct bio *bio)
 {
 	sector_t sectors;
 
-	if (unlikely(bio->bi_opf & REQ_PREFLUSH)
+	if (unlikely(op_is_flush(bio->bi_opf))
 	    && md_flush_request(mddev, bio))
 		return true;
 
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 7412066ea22c..5c6e0a8635f2 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1857,7 +1857,7 @@ static bool raid10_make_request(struct mddev *mddev, struct bio *bio)
 	int chunk_sects = chunk_mask + 1;
 	int sectors = bio_sectors(bio);
 
-	if (unlikely(bio->bi_opf & REQ_PREFLUSH)
+	if (unlikely(op_is_flush(bio->bi_opf))
 	    && md_flush_request(mddev, bio))
 		return true;
 
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e57deb1c6138..1bcf96b490a7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6070,7 +6070,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	enum stripe_result res;
 	int s, stripe_cnt;
 
-	if (unlikely(bi->bi_opf & REQ_PREFLUSH)) {
+	if (unlikely(op_is_flush(bi->bi_opf))) {
 		int ret = log_handle_flush_request(conf, bi);
 
 		if (ret == 0)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH
  2023-12-21  1:27 [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Song Liu
  2023-12-21  1:27 ` [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush() Song Liu
  2023-12-21  1:27 ` [PATCH 2/2] md: Use op_is_flush() to check flush bio Song Liu
@ 2023-12-21  3:36 ` Ed Tsai (蔡宗軒)
  2023-12-21  5:30   ` Kent Overstreet
  2 siblings, 1 reply; 9+ messages in thread
From: Ed Tsai (蔡宗軒) @ 2023-12-21  3:36 UTC (permalink / raw)
  To: song
  Cc: bagasdotme, colyli, linux-raid, linux-block, janpieter.sollie,
	axboe, kent.overstreet

On Wed, 2023-12-20 at 17:27 -0800, Song Liu wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  A recent bug report [1] shows md is handling a flush from bcachefs
> as read:
> 
> bch2_journal_write=>
>   submit_bio=>
>     ...
>     md_handle_request =>
>       raid5_make_request =>
>         chunk_aligned_read =>
>           raid5_read_one_chunk =>
> 	    ...
> 
> It appears md code only checks REQ_PREFLUSH for flush requests, which
> doesn't cover all cases. OTOH, op_is_flush() doesn't check
> REQ_OP_FLUSH
> either.
> 
> Fix this by:
> 1) Check REQ_PREFLUSH in op_is_flush();
> 2) Use op_is_flush() in md code.
> 
> Thanks,
> Song
> 
> [1] 
> https://urldefense.com/v3/__https://bugzilla.kernel.org/show_bug.cgi?id=218184__;!!CTRNKA9wMg0ARbw!gQbjtS_f5d3Du2prpIT8zUM4mkZf7qDleyaAuEfG8j5tMrDvw7cfJUB04VWl0uVAL4BJ4YWbVopp$
> 

REQ_OP_FLUSH is only used by the block layer's flush code, and the
filesystem should use REQ_PREFLUSH with an empty write bio.

If we want upper layer to be able to directly send REQ_OP_FLUSH bio,
then we should retrieve all REQ_PREFLUSH to confirm. At least for now,
it seems that REQ_OP_FLUSH without REQ_PREFLUSH in `blk_flush_policy`
will directly return 0 and no flush operation will be sent to the
driver.

>  
> 
> Song Liu (2):
>   block: Check REQ_OP_FLUSH in op_is_flush()
>   md: Use op_is_flush() to check flush bio
> 
>  drivers/md/raid0.c        | 2 +-
>  drivers/md/raid1.c        | 2 +-
>  drivers/md/raid10.c       | 2 +-
>  drivers/md/raid5.c        | 2 +-
>  include/linux/blk_types.h | 3 ++-
>  5 files changed, 6 insertions(+), 5 deletions(-)
> 
> --
> 2.34.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH
  2023-12-21  3:36 ` [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Ed Tsai (蔡宗軒)
@ 2023-12-21  5:30   ` Kent Overstreet
  2023-12-21  7:56     ` Ed Tsai (蔡宗軒)
  0 siblings, 1 reply; 9+ messages in thread
From: Kent Overstreet @ 2023-12-21  5:30 UTC (permalink / raw)
  To: Ed Tsai (蔡宗軒)
  Cc: song, bagasdotme, colyli, linux-raid, linux-block,
	janpieter.sollie, axboe

On Thu, Dec 21, 2023 at 03:36:40AM +0000, Ed Tsai (蔡宗軒) wrote:
> On Wed, 2023-12-20 at 17:27 -0800, Song Liu wrote:
> >  	 
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  A recent bug report [1] shows md is handling a flush from bcachefs
> > as read:
> > 
> > bch2_journal_write=>
> >   submit_bio=>
> >     ...
> >     md_handle_request =>
> >       raid5_make_request =>
> >         chunk_aligned_read =>
> >           raid5_read_one_chunk =>
> > 	    ...
> > 
> > It appears md code only checks REQ_PREFLUSH for flush requests, which
> > doesn't cover all cases. OTOH, op_is_flush() doesn't check
> > REQ_OP_FLUSH
> > either.
> > 
> > Fix this by:
> > 1) Check REQ_PREFLUSH in op_is_flush();
> > 2) Use op_is_flush() in md code.
> > 
> > Thanks,
> > Song
> > 
> > [1] 
> > https://urldefense.com/v3/__https://bugzilla.kernel.org/show_bug.cgi?id=218184__;!!CTRNKA9wMg0ARbw!gQbjtS_f5d3Du2prpIT8zUM4mkZf7qDleyaAuEfG8j5tMrDvw7cfJUB04VWl0uVAL4BJ4YWbVopp$
> > 
> 
> REQ_OP_FLUSH is only used by the block layer's flush code, and the
> filesystem should use REQ_PREFLUSH with an empty write bio.
> 
> If we want upper layer to be able to directly send REQ_OP_FLUSH bio,
> then we should retrieve all REQ_PREFLUSH to confirm. At least for now,
> it seems that REQ_OP_FLUSH without REQ_PREFLUSH in `blk_flush_policy`
> will directly return 0 and no flush operation will be sent to the
> driver.

If that's the case, then it should be documented and there should be a
WARN_ON() in generic_make_request().

Also, glancing at blk_types.h, we have the req_op and req_flag_bits both
using (__force blk_opf_t), but using the same bit range - what the hell?
That's seriously broken...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush()
  2023-12-21  1:27 ` [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush() Song Liu
@ 2023-12-21  6:12   ` Christoph Hellwig
  2023-12-21  7:53     ` Song Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2023-12-21  6:12 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-block, linux-raid, axboe, kent.overstreet,
	janpieter.sollie, colyli, bagasdotme

On Wed, Dec 20, 2023 at 05:27:14PM -0800, Song Liu wrote:
> Upper layer (fs, etc.) may issue flush with something like:
> 
>   bio_reset(bio, bdev, REQ_OP_FLUSH);
>   bio->bi_end_io = xxx;
>   submit_bio(bio);

No, they can't.  REQ_OP_FLUSH is currently only used by request
based drivers and only generated by the flush state machine.

(Not that I particularly like this wart, I'd much prefer file systems
to submit a REQ_OP_FLUSH than an empty write with a preflush flag,
but that's not a trivial change).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush()
  2023-12-21  6:12   ` Christoph Hellwig
@ 2023-12-21  7:53     ` Song Liu
  0 siblings, 0 replies; 9+ messages in thread
From: Song Liu @ 2023-12-21  7:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-block, linux-raid, axboe, kent.overstreet,
	janpieter.sollie, colyli, bagasdotme

On Wed, Dec 20, 2023 at 10:12 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Dec 20, 2023 at 05:27:14PM -0800, Song Liu wrote:
> > Upper layer (fs, etc.) may issue flush with something like:
> >
> >   bio_reset(bio, bdev, REQ_OP_FLUSH);
> >   bio->bi_end_io = xxx;
> >   submit_bio(bio);
>
> No, they can't.  REQ_OP_FLUSH is currently only used by request
> based drivers and only generated by the flush state machine.

Hmm... Then this call trace from bcachefs is the exception here:

bch2_journal_write=>
  submit_bio=>
    ...
    md_handle_request =>
      raid5_make_request =>
        chunk_aligned_read =>
          raid5_read_one_chunk =>

> (Not that I particularly like this wart, I'd much prefer file systems
> to submit a REQ_OP_FLUSH than an empty write with a preflush flag,
> but that's not a trivial change).

 Kent, I think we need to update how bcachefs issue flushes.

Thanks,
Song

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH
  2023-12-21  5:30   ` Kent Overstreet
@ 2023-12-21  7:56     ` Ed Tsai (蔡宗軒)
  2023-12-21 19:19       ` Kent Overstreet
  0 siblings, 1 reply; 9+ messages in thread
From: Ed Tsai (蔡宗軒) @ 2023-12-21  7:56 UTC (permalink / raw)
  To: kent.overstreet
  Cc: linux-raid, colyli, song, linux-block, bagasdotme,
	janpieter.sollie, axboe

On Thu, 2023-12-21 at 00:30 -0500, Kent Overstreet wrote:
>  On Thu, Dec 21, 2023 at 03:36:40AM +0000, Ed Tsai (蔡宗軒) wrote:
> > On Wed, 2023-12-20 at 17:27 -0800, Song Liu wrote:
> > > you have verified the sender or the content.
> > >  A recent bug report [1] shows md is handling a flush from
> bcachefs
> > > as read:
> > > 
> > > bch2_journal_write=>
> > >   submit_bio=>
> > >     ...
> > >     md_handle_request =>
> > >       raid5_make_request =>
> > >         chunk_aligned_read =>
> > >           raid5_read_one_chunk =>
> > >     ...
> > > 
> > > It appears md code only checks REQ_PREFLUSH for flush requests,
> which
> > > doesn't cover all cases. OTOH, op_is_flush() doesn't check
> > > REQ_OP_FLUSH
> > > either.
> > > 
> > > Fix this by:
> > > 1) Check REQ_PREFLUSH in op_is_flush();
> > > 2) Use op_is_flush() in md code.
> > > 
> > > Thanks,
> > > Song
> > > 
> > > [1] 
> > > 
> https://urldefense.com/v3/__https://bugzilla.kernel.org/show_bug.cgi?id=218184__;!!CTRNKA9wMg0ARbw!gQbjtS_f5d3Du2prpIT8zUM4mkZf7qDleyaAuEfG8j5tMrDvw7cfJUB04VWl0uVAL4BJ4YWbVopp$
> > > 
> > 
> > REQ_OP_FLUSH is only used by the block layer's flush code, and the
> > filesystem should use REQ_PREFLUSH with an empty write bio.
> > 
> > If we want upper layer to be able to directly send REQ_OP_FLUSH
> bio,
> > then we should retrieve all REQ_PREFLUSH to confirm. At least for
> now,
> > it seems that REQ_OP_FLUSH without REQ_PREFLUSH in
> `blk_flush_policy`
> > will directly return 0 and no flush operation will be sent to the
> > driver.
> 
> If that's the case, then it should be documented and there should be
> a
> WARN_ON() in generic_make_request().

Please refer to the writeback_cache_control.rst. Use an empty write bio
with the REQ_PREFLUSH flag for an explicit flush, or as commonly
practiced by most filesystems, use blkdev_issue_flush for a pure flush.

> 
> Also, glancing at blk_types.h, we have the req_op and req_flag_bits
> both
> using (__force blk_opf_t), but using the same bit range - what the
> hell?
> That's seriously broken...

No, read the comment before req_op. We do not need to use the entire 32
bits to represent OP; only 8 bits for OP, while the remaning 24 bits is
used for FLAG.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH
  2023-12-21  7:56     ` Ed Tsai (蔡宗軒)
@ 2023-12-21 19:19       ` Kent Overstreet
  0 siblings, 0 replies; 9+ messages in thread
From: Kent Overstreet @ 2023-12-21 19:19 UTC (permalink / raw)
  To: Ed Tsai (蔡宗軒)
  Cc: linux-raid, colyli, song, linux-block, bagasdotme,
	janpieter.sollie, axboe

On Thu, Dec 21, 2023 at 07:56:45AM +0000, Ed Tsai (蔡宗軒) wrote:
> On Thu, 2023-12-21 at 00:30 -0500, Kent Overstreet wrote:
> >  On Thu, Dec 21, 2023 at 03:36:40AM +0000, Ed Tsai (蔡宗軒) wrote:
> > > On Wed, 2023-12-20 at 17:27 -0800, Song Liu wrote:
> > > > you have verified the sender or the content.
> > > >  A recent bug report [1] shows md is handling a flush from
> > bcachefs
> > > > as read:
> > > > 
> > > > bch2_journal_write=>
> > > >   submit_bio=>
> > > >     ...
> > > >     md_handle_request =>
> > > >       raid5_make_request =>
> > > >         chunk_aligned_read =>
> > > >           raid5_read_one_chunk =>
> > > >     ...
> > > > 
> > > > It appears md code only checks REQ_PREFLUSH for flush requests,
> > which
> > > > doesn't cover all cases. OTOH, op_is_flush() doesn't check
> > > > REQ_OP_FLUSH
> > > > either.
> > > > 
> > > > Fix this by:
> > > > 1) Check REQ_PREFLUSH in op_is_flush();
> > > > 2) Use op_is_flush() in md code.
> > > > 
> > > > Thanks,
> > > > Song
> > > > 
> > > > [1] 
> > > > 
> > https://urldefense.com/v3/__https://bugzilla.kernel.org/show_bug.cgi?id=218184__;!!CTRNKA9wMg0ARbw!gQbjtS_f5d3Du2prpIT8zUM4mkZf7qDleyaAuEfG8j5tMrDvw7cfJUB04VWl0uVAL4BJ4YWbVopp$
> > > > 
> > > 
> > > REQ_OP_FLUSH is only used by the block layer's flush code, and the
> > > filesystem should use REQ_PREFLUSH with an empty write bio.
> > > 
> > > If we want upper layer to be able to directly send REQ_OP_FLUSH
> > bio,
> > > then we should retrieve all REQ_PREFLUSH to confirm. At least for
> > now,
> > > it seems that REQ_OP_FLUSH without REQ_PREFLUSH in
> > `blk_flush_policy`
> > > will directly return 0 and no flush operation will be sent to the
> > > driver.
> > 
> > If that's the case, then it should be documented and there should be
> > a
> > WARN_ON() in generic_make_request().
> 
> Please refer to the writeback_cache_control.rst. Use an empty write bio
> with the REQ_PREFLUSH flag for an explicit flush, or as commonly
> practiced by most filesystems, use blkdev_issue_flush for a pure flush.

That's not a substitute for a proper comment in the code.

> 
> > 
> > Also, glancing at blk_types.h, we have the req_op and req_flag_bits
> > both
> > using (__force blk_opf_t), but using the same bit range - what the
> > hell?
> > That's seriously broken...
> 
> No, read the comment before req_op. We do not need to use the entire 32
> bits to represent OP; only 8 bits for OP, while the remaning 24 bits is
> used for FLAG.

No, this is just broken; it's using the same bitwise enum for two
different enums.

bitwise exists for a reason - C enums are not natively type safe, and
mixing up enums/bitflags and using them in the wrong context is a
serious source of bugs. If it would be incorrect to or the two different
flags together, you can't use the same bitwise type.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-12-21 19:19 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-21  1:27 [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Song Liu
2023-12-21  1:27 ` [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush() Song Liu
2023-12-21  6:12   ` Christoph Hellwig
2023-12-21  7:53     ` Song Liu
2023-12-21  1:27 ` [PATCH 2/2] md: Use op_is_flush() to check flush bio Song Liu
2023-12-21  3:36 ` [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Ed Tsai (蔡宗軒)
2023-12-21  5:30   ` Kent Overstreet
2023-12-21  7:56     ` Ed Tsai (蔡宗軒)
2023-12-21 19:19       ` Kent Overstreet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.