* [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH
@ 2023-12-21 1:27 Song Liu
2023-12-21 1:27 ` [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush() Song Liu
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Song Liu @ 2023-12-21 1:27 UTC (permalink / raw)
To: linux-block, linux-raid
Cc: axboe, kent.overstreet, janpieter.sollie, colyli, bagasdotme, Song Liu
A recent bug report [1] shows md is handling a flush from bcachefs as read:
bch2_journal_write=>
submit_bio=>
...
md_handle_request =>
raid5_make_request =>
chunk_aligned_read =>
raid5_read_one_chunk =>
...
It appears md code only checks REQ_PREFLUSH for flush requests, which
doesn't cover all cases. OTOH, op_is_flush() doesn't check REQ_OP_FLUSH
either.
Fix this by:
1) Check REQ_PREFLUSH in op_is_flush();
2) Use op_is_flush() in md code.
Thanks,
Song
[1] https://bugzilla.kernel.org/show_bug.cgi?id=218184
Song Liu (2):
block: Check REQ_OP_FLUSH in op_is_flush()
md: Use op_is_flush() to check flush bio
drivers/md/raid0.c | 2 +-
drivers/md/raid1.c | 2 +-
drivers/md/raid10.c | 2 +-
drivers/md/raid5.c | 2 +-
include/linux/blk_types.h | 3 ++-
5 files changed, 6 insertions(+), 5 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush()
2023-12-21 1:27 [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Song Liu
@ 2023-12-21 1:27 ` Song Liu
2023-12-21 6:12 ` Christoph Hellwig
2023-12-21 1:27 ` [PATCH 2/2] md: Use op_is_flush() to check flush bio Song Liu
2023-12-21 3:36 ` [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Ed Tsai (蔡宗軒)
2 siblings, 1 reply; 9+ messages in thread
From: Song Liu @ 2023-12-21 1:27 UTC (permalink / raw)
To: linux-block, linux-raid
Cc: axboe, kent.overstreet, janpieter.sollie, colyli, bagasdotme, Song Liu
Upper layer (fs, etc.) may issue flush with something like:
bio_reset(bio, bdev, REQ_OP_FLUSH);
bio->bi_end_io = xxx;
submit_bio(bio);
op_is_flush(bio->bi_opf) should return true for this bio.
Signed-off-by: Song Liu <song@kernel.org>
---
include/linux/blk_types.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d5c5e59ddbd2..338423da84ca 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -487,7 +487,8 @@ static inline bool op_is_write(blk_opf_t op)
*/
static inline bool op_is_flush(blk_opf_t op)
{
- return op & (REQ_FUA | REQ_PREFLUSH);
+ return op & (REQ_FUA | REQ_PREFLUSH) ||
+ (op & REQ_OP_MASK) == REQ_OP_FLUSH;
}
/*
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] md: Use op_is_flush() to check flush bio
2023-12-21 1:27 [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Song Liu
2023-12-21 1:27 ` [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush() Song Liu
@ 2023-12-21 1:27 ` Song Liu
2023-12-21 3:36 ` [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Ed Tsai (蔡宗軒)
2 siblings, 0 replies; 9+ messages in thread
From: Song Liu @ 2023-12-21 1:27 UTC (permalink / raw)
To: linux-block, linux-raid
Cc: axboe, kent.overstreet, janpieter.sollie, colyli, bagasdotme, Song Liu
op_is_flush() covers different ways to request flush. Use it instead of
simply checking against REQ_PREFLUSH.
Signed-off-by: Song Liu <song@kernel.org>
---
drivers/md/raid0.c | 2 +-
drivers/md/raid1.c | 2 +-
drivers/md/raid10.c | 2 +-
drivers/md/raid5.c | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index c50a7abda744..20283dc5208a 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -592,7 +592,7 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
unsigned chunk_sects;
unsigned sectors;
- if (unlikely(bio->bi_opf & REQ_PREFLUSH)
+ if (unlikely(op_is_flush(bio->bi_opf))
&& md_flush_request(mddev, bio))
return true;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index aaa434f0c175..5c1dadd7fbb6 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1581,7 +1581,7 @@ static bool raid1_make_request(struct mddev *mddev, struct bio *bio)
{
sector_t sectors;
- if (unlikely(bio->bi_opf & REQ_PREFLUSH)
+ if (unlikely(op_is_flush(bio->bi_opf))
&& md_flush_request(mddev, bio))
return true;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 7412066ea22c..5c6e0a8635f2 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1857,7 +1857,7 @@ static bool raid10_make_request(struct mddev *mddev, struct bio *bio)
int chunk_sects = chunk_mask + 1;
int sectors = bio_sectors(bio);
- if (unlikely(bio->bi_opf & REQ_PREFLUSH)
+ if (unlikely(op_is_flush(bio->bi_opf))
&& md_flush_request(mddev, bio))
return true;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e57deb1c6138..1bcf96b490a7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6070,7 +6070,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
enum stripe_result res;
int s, stripe_cnt;
- if (unlikely(bi->bi_opf & REQ_PREFLUSH)) {
+ if (unlikely(op_is_flush(bi->bi_opf))) {
int ret = log_handle_flush_request(conf, bi);
if (ret == 0)
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH
2023-12-21 1:27 [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Song Liu
2023-12-21 1:27 ` [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush() Song Liu
2023-12-21 1:27 ` [PATCH 2/2] md: Use op_is_flush() to check flush bio Song Liu
@ 2023-12-21 3:36 ` Ed Tsai (蔡宗軒)
2023-12-21 5:30 ` Kent Overstreet
2 siblings, 1 reply; 9+ messages in thread
From: Ed Tsai (蔡宗軒) @ 2023-12-21 3:36 UTC (permalink / raw)
To: song
Cc: bagasdotme, colyli, linux-raid, linux-block, janpieter.sollie,
axboe, kent.overstreet
On Wed, 2023-12-20 at 17:27 -0800, Song Liu wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> A recent bug report [1] shows md is handling a flush from bcachefs
> as read:
>
> bch2_journal_write=>
> submit_bio=>
> ...
> md_handle_request =>
> raid5_make_request =>
> chunk_aligned_read =>
> raid5_read_one_chunk =>
> ...
>
> It appears md code only checks REQ_PREFLUSH for flush requests, which
> doesn't cover all cases. OTOH, op_is_flush() doesn't check
> REQ_OP_FLUSH
> either.
>
> Fix this by:
> 1) Check REQ_PREFLUSH in op_is_flush();
> 2) Use op_is_flush() in md code.
>
> Thanks,
> Song
>
> [1]
> https://urldefense.com/v3/__https://bugzilla.kernel.org/show_bug.cgi?id=218184__;!!CTRNKA9wMg0ARbw!gQbjtS_f5d3Du2prpIT8zUM4mkZf7qDleyaAuEfG8j5tMrDvw7cfJUB04VWl0uVAL4BJ4YWbVopp$
>
REQ_OP_FLUSH is only used by the block layer's flush code, and the
filesystem should use REQ_PREFLUSH with an empty write bio.
If we want upper layer to be able to directly send REQ_OP_FLUSH bio,
then we should retrieve all REQ_PREFLUSH to confirm. At least for now,
it seems that REQ_OP_FLUSH without REQ_PREFLUSH in `blk_flush_policy`
will directly return 0 and no flush operation will be sent to the
driver.
>
>
> Song Liu (2):
> block: Check REQ_OP_FLUSH in op_is_flush()
> md: Use op_is_flush() to check flush bio
>
> drivers/md/raid0.c | 2 +-
> drivers/md/raid1.c | 2 +-
> drivers/md/raid10.c | 2 +-
> drivers/md/raid5.c | 2 +-
> include/linux/blk_types.h | 3 ++-
> 5 files changed, 6 insertions(+), 5 deletions(-)
>
> --
> 2.34.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH
2023-12-21 3:36 ` [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Ed Tsai (蔡宗軒)
@ 2023-12-21 5:30 ` Kent Overstreet
2023-12-21 7:56 ` Ed Tsai (蔡宗軒)
0 siblings, 1 reply; 9+ messages in thread
From: Kent Overstreet @ 2023-12-21 5:30 UTC (permalink / raw)
To: Ed Tsai (蔡宗軒)
Cc: song, bagasdotme, colyli, linux-raid, linux-block,
janpieter.sollie, axboe
On Thu, Dec 21, 2023 at 03:36:40AM +0000, Ed Tsai (蔡宗軒) wrote:
> On Wed, 2023-12-20 at 17:27 -0800, Song Liu wrote:
> >
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> > A recent bug report [1] shows md is handling a flush from bcachefs
> > as read:
> >
> > bch2_journal_write=>
> > submit_bio=>
> > ...
> > md_handle_request =>
> > raid5_make_request =>
> > chunk_aligned_read =>
> > raid5_read_one_chunk =>
> > ...
> >
> > It appears md code only checks REQ_PREFLUSH for flush requests, which
> > doesn't cover all cases. OTOH, op_is_flush() doesn't check
> > REQ_OP_FLUSH
> > either.
> >
> > Fix this by:
> > 1) Check REQ_PREFLUSH in op_is_flush();
> > 2) Use op_is_flush() in md code.
> >
> > Thanks,
> > Song
> >
> > [1]
> > https://urldefense.com/v3/__https://bugzilla.kernel.org/show_bug.cgi?id=218184__;!!CTRNKA9wMg0ARbw!gQbjtS_f5d3Du2prpIT8zUM4mkZf7qDleyaAuEfG8j5tMrDvw7cfJUB04VWl0uVAL4BJ4YWbVopp$
> >
>
> REQ_OP_FLUSH is only used by the block layer's flush code, and the
> filesystem should use REQ_PREFLUSH with an empty write bio.
>
> If we want upper layer to be able to directly send REQ_OP_FLUSH bio,
> then we should retrieve all REQ_PREFLUSH to confirm. At least for now,
> it seems that REQ_OP_FLUSH without REQ_PREFLUSH in `blk_flush_policy`
> will directly return 0 and no flush operation will be sent to the
> driver.
If that's the case, then it should be documented and there should be a
WARN_ON() in generic_make_request().
Also, glancing at blk_types.h, we have the req_op and req_flag_bits both
using (__force blk_opf_t), but using the same bit range - what the hell?
That's seriously broken...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush()
2023-12-21 1:27 ` [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush() Song Liu
@ 2023-12-21 6:12 ` Christoph Hellwig
2023-12-21 7:53 ` Song Liu
0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2023-12-21 6:12 UTC (permalink / raw)
To: Song Liu
Cc: linux-block, linux-raid, axboe, kent.overstreet,
janpieter.sollie, colyli, bagasdotme
On Wed, Dec 20, 2023 at 05:27:14PM -0800, Song Liu wrote:
> Upper layer (fs, etc.) may issue flush with something like:
>
> bio_reset(bio, bdev, REQ_OP_FLUSH);
> bio->bi_end_io = xxx;
> submit_bio(bio);
No, they can't. REQ_OP_FLUSH is currently only used by request
based drivers and only generated by the flush state machine.
(Not that I particularly like this wart, I'd much prefer file systems
to submit a REQ_OP_FLUSH than an empty write with a preflush flag,
but that's not a trivial change).
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush()
2023-12-21 6:12 ` Christoph Hellwig
@ 2023-12-21 7:53 ` Song Liu
0 siblings, 0 replies; 9+ messages in thread
From: Song Liu @ 2023-12-21 7:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-block, linux-raid, axboe, kent.overstreet,
janpieter.sollie, colyli, bagasdotme
On Wed, Dec 20, 2023 at 10:12 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Dec 20, 2023 at 05:27:14PM -0800, Song Liu wrote:
> > Upper layer (fs, etc.) may issue flush with something like:
> >
> > bio_reset(bio, bdev, REQ_OP_FLUSH);
> > bio->bi_end_io = xxx;
> > submit_bio(bio);
>
> No, they can't. REQ_OP_FLUSH is currently only used by request
> based drivers and only generated by the flush state machine.
Hmm... Then this call trace from bcachefs is the exception here:
bch2_journal_write=>
submit_bio=>
...
md_handle_request =>
raid5_make_request =>
chunk_aligned_read =>
raid5_read_one_chunk =>
> (Not that I particularly like this wart, I'd much prefer file systems
> to submit a REQ_OP_FLUSH than an empty write with a preflush flag,
> but that's not a trivial change).
Kent, I think we need to update how bcachefs issue flushes.
Thanks,
Song
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH
2023-12-21 5:30 ` Kent Overstreet
@ 2023-12-21 7:56 ` Ed Tsai (蔡宗軒)
2023-12-21 19:19 ` Kent Overstreet
0 siblings, 1 reply; 9+ messages in thread
From: Ed Tsai (蔡宗軒) @ 2023-12-21 7:56 UTC (permalink / raw)
To: kent.overstreet
Cc: linux-raid, colyli, song, linux-block, bagasdotme,
janpieter.sollie, axboe
On Thu, 2023-12-21 at 00:30 -0500, Kent Overstreet wrote:
> On Thu, Dec 21, 2023 at 03:36:40AM +0000, Ed Tsai (蔡宗軒) wrote:
> > On Wed, 2023-12-20 at 17:27 -0800, Song Liu wrote:
> > > you have verified the sender or the content.
> > > A recent bug report [1] shows md is handling a flush from
> bcachefs
> > > as read:
> > >
> > > bch2_journal_write=>
> > > submit_bio=>
> > > ...
> > > md_handle_request =>
> > > raid5_make_request =>
> > > chunk_aligned_read =>
> > > raid5_read_one_chunk =>
> > > ...
> > >
> > > It appears md code only checks REQ_PREFLUSH for flush requests,
> which
> > > doesn't cover all cases. OTOH, op_is_flush() doesn't check
> > > REQ_OP_FLUSH
> > > either.
> > >
> > > Fix this by:
> > > 1) Check REQ_PREFLUSH in op_is_flush();
> > > 2) Use op_is_flush() in md code.
> > >
> > > Thanks,
> > > Song
> > >
> > > [1]
> > >
> https://urldefense.com/v3/__https://bugzilla.kernel.org/show_bug.cgi?id=218184__;!!CTRNKA9wMg0ARbw!gQbjtS_f5d3Du2prpIT8zUM4mkZf7qDleyaAuEfG8j5tMrDvw7cfJUB04VWl0uVAL4BJ4YWbVopp$
> > >
> >
> > REQ_OP_FLUSH is only used by the block layer's flush code, and the
> > filesystem should use REQ_PREFLUSH with an empty write bio.
> >
> > If we want upper layer to be able to directly send REQ_OP_FLUSH
> bio,
> > then we should retrieve all REQ_PREFLUSH to confirm. At least for
> now,
> > it seems that REQ_OP_FLUSH without REQ_PREFLUSH in
> `blk_flush_policy`
> > will directly return 0 and no flush operation will be sent to the
> > driver.
>
> If that's the case, then it should be documented and there should be
> a
> WARN_ON() in generic_make_request().
Please refer to the writeback_cache_control.rst. Use an empty write bio
with the REQ_PREFLUSH flag for an explicit flush, or as commonly
practiced by most filesystems, use blkdev_issue_flush for a pure flush.
>
> Also, glancing at blk_types.h, we have the req_op and req_flag_bits
> both
> using (__force blk_opf_t), but using the same bit range - what the
> hell?
> That's seriously broken...
No, read the comment before req_op. We do not need to use the entire 32
bits to represent OP; only 8 bits for OP, while the remaning 24 bits is
used for FLAG.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH
2023-12-21 7:56 ` Ed Tsai (蔡宗軒)
@ 2023-12-21 19:19 ` Kent Overstreet
0 siblings, 0 replies; 9+ messages in thread
From: Kent Overstreet @ 2023-12-21 19:19 UTC (permalink / raw)
To: Ed Tsai (蔡宗軒)
Cc: linux-raid, colyli, song, linux-block, bagasdotme,
janpieter.sollie, axboe
On Thu, Dec 21, 2023 at 07:56:45AM +0000, Ed Tsai (蔡宗軒) wrote:
> On Thu, 2023-12-21 at 00:30 -0500, Kent Overstreet wrote:
> > On Thu, Dec 21, 2023 at 03:36:40AM +0000, Ed Tsai (蔡宗軒) wrote:
> > > On Wed, 2023-12-20 at 17:27 -0800, Song Liu wrote:
> > > > you have verified the sender or the content.
> > > > A recent bug report [1] shows md is handling a flush from
> > bcachefs
> > > > as read:
> > > >
> > > > bch2_journal_write=>
> > > > submit_bio=>
> > > > ...
> > > > md_handle_request =>
> > > > raid5_make_request =>
> > > > chunk_aligned_read =>
> > > > raid5_read_one_chunk =>
> > > > ...
> > > >
> > > > It appears md code only checks REQ_PREFLUSH for flush requests,
> > which
> > > > doesn't cover all cases. OTOH, op_is_flush() doesn't check
> > > > REQ_OP_FLUSH
> > > > either.
> > > >
> > > > Fix this by:
> > > > 1) Check REQ_PREFLUSH in op_is_flush();
> > > > 2) Use op_is_flush() in md code.
> > > >
> > > > Thanks,
> > > > Song
> > > >
> > > > [1]
> > > >
> > https://urldefense.com/v3/__https://bugzilla.kernel.org/show_bug.cgi?id=218184__;!!CTRNKA9wMg0ARbw!gQbjtS_f5d3Du2prpIT8zUM4mkZf7qDleyaAuEfG8j5tMrDvw7cfJUB04VWl0uVAL4BJ4YWbVopp$
> > > >
> > >
> > > REQ_OP_FLUSH is only used by the block layer's flush code, and the
> > > filesystem should use REQ_PREFLUSH with an empty write bio.
> > >
> > > If we want upper layer to be able to directly send REQ_OP_FLUSH
> > bio,
> > > then we should retrieve all REQ_PREFLUSH to confirm. At least for
> > now,
> > > it seems that REQ_OP_FLUSH without REQ_PREFLUSH in
> > `blk_flush_policy`
> > > will directly return 0 and no flush operation will be sent to the
> > > driver.
> >
> > If that's the case, then it should be documented and there should be
> > a
> > WARN_ON() in generic_make_request().
>
> Please refer to the writeback_cache_control.rst. Use an empty write bio
> with the REQ_PREFLUSH flag for an explicit flush, or as commonly
> practiced by most filesystems, use blkdev_issue_flush for a pure flush.
That's not a substitute for a proper comment in the code.
>
> >
> > Also, glancing at blk_types.h, we have the req_op and req_flag_bits
> > both
> > using (__force blk_opf_t), but using the same bit range - what the
> > hell?
> > That's seriously broken...
>
> No, read the comment before req_op. We do not need to use the entire 32
> bits to represent OP; only 8 bits for OP, while the remaning 24 bits is
> used for FLAG.
No, this is just broken; it's using the same bitwise enum for two
different enums.
bitwise exists for a reason - C enums are not natively type safe, and
mixing up enums/bitflags and using them in the wrong context is a
serious source of bugs. If it would be incorrect to or the two different
flags together, you can't use the same bitwise type.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2023-12-21 19:19 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-21 1:27 [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Song Liu
2023-12-21 1:27 ` [PATCH 1/2] block: Check REQ_OP_FLUSH in op_is_flush() Song Liu
2023-12-21 6:12 ` Christoph Hellwig
2023-12-21 7:53 ` Song Liu
2023-12-21 1:27 ` [PATCH 2/2] md: Use op_is_flush() to check flush bio Song Liu
2023-12-21 3:36 ` [PATCH 0/2] block, md: Better handle REQ_OP_FLUSH Ed Tsai (蔡宗軒)
2023-12-21 5:30 ` Kent Overstreet
2023-12-21 7:56 ` Ed Tsai (蔡宗軒)
2023-12-21 19:19 ` Kent Overstreet
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.