linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] md: don't account io stat for split bio
@ 2021-05-08  3:48 Guoqing Jiang
  2021-05-10  6:00 ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Guoqing Jiang @ 2021-05-08  3:48 UTC (permalink / raw)
  To: song; +Cc: linux-raid, linux-block, pawel.wiejacha, Guoqing Jiang

From: Guoqing Jiang <jiangguoqing@kylinos.cn>

We have used generic io accounting functions to manage md io stats,
then for split bio, md also change it's bi_private and bi_end_io,
which could trigger double fault problem.

1146  <0>[33685.629591] traps: PANIC: double fault, error_code: 0x0
1147  <4>[33685.629593] double fault: 0000 [#1] SMP NOPTI
1148  <4>[33685.629594] CPU: 10 PID: 2118287 Comm: kworker/10:0
Tainted: P           OE     5.11.8-051108-generic #202103200636
1149  <4>[33685.629595] Hardware name: ASUSTeK COMPUTER INC. KRPG-U8
Series/KRPG-U8 Series, BIOS 4201 09/25/2020
1150  <4>[33685.629595] Workqueue: xfs-conv/md12 xfs_end_io [xfs]
1151  <4>[33685.629596] RIP: 0010:__slab_free+0x23/0x340
1152  <4>[33685.629597] Code: 4c fe ff ff 0f 1f 00 0f 1f 44 00 00 55
48 89 e5 41 57 49 89 cf 41 56 49 89 fe 41 55 41 54 49 89 f4 53 48 83
e4 f0 48 83 ec 70 <48> 89 54 24 28 0f 1f 44 00 00 41 8b 46 28 4d 8b 6c
24 20 49 8b 5c
1153  <4>[33685.629598] RSP: 0018:ffffa9bc00848fa0 EFLAGS: 00010086
1154  <4>[33685.629599] RAX: ffff94c04d8b10a0 RBX: ffff94437a34a880
RCX: ffff94437a34a880
1155  <4>[33685.629599] RDX: ffff94437a34a880 RSI: ffffcec745e8d280
RDI: ffff944300043b00
1156  <4>[33685.629599] RBP: ffffa9bc00849040 R08: 0000000000000001
R09: ffffffff82a5d6de
1157  <4>[33685.629600] R10: 0000000000000001 R11: 000000009c109000
R12: ffffcec745e8d280
1158  <4>[33685.629600] R13: ffff944300043b00 R14: ffff944300043b00
R15: ffff94437a34a880
1159  <4>[33685.629601] FS:  0000000000000000(0000)
GS:ffff94c04d880000(0000) knlGS:0000000000000000
1160  <4>[33685.629601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
1161  <4>[33685.629602] CR2: ffffa9bc00848f98 CR3: 000000014d04e000
CR4: 0000000000350ee0
1162  <4>[33685.629602] Call Trace:
1163  <4>[33685.629603]  <IRQ>
1164  <4>[33685.629603]  ? kfree+0x3bc/0x3e0
1165  <4>[33685.629603]  ? mempool_kfree+0xe/0x10
1166  <4>[33685.629603]  ? mempool_kfree+0xe/0x10
1167  <4>[33685.629604]  ? mempool_free+0x2f/0x80
1168  <4>[33685.629604]  ? md_end_io+0x4a/0x70
1169  <4>[33685.629604]  ? bio_endio+0xdc/0x130
1170  <4>[33685.629605]  ? bio_chain_endio+0x2d/0x40
1171  <4>[33685.629605]  ? md_end_io+0x5c/0x70
1172  <4>[33685.629605]  ? bio_endio+0xdc/0x130
1173  <4>[33685.629605]  ? bio_chain_endio+0x2d/0x40
1174  <4>[33685.629606]  ? md_end_io+0x5c/0x70
1175  <4>[33685.629606]  ? bio_endio+0xdc/0x130
1176  <4>[33685.629606]  ? bio_chain_endio+0x2d/0x40
1177  <4>[33685.629607]  ? md_end_io+0x5c/0x70
... repeated ...
1436  <4>[33685.629677]  ? bio_endio+0xdc/0x130
1437  <4>[33685.629677]  ? bio_chain_endio+0x2d/0x40
1438  <4>[33685.629677]  ? md_end_io+0x5c/0x70
1439  <4>[33685.629677]  ? bio_endio+0xdc/0x130
1440  <4>[33685.629678]  ? bio_chain_endio+0x2d/0x40
1441  <4>[33685.629678]  ? md_
1442  <4>[33685.629679] Lost 357 message(s)!

It looks like stack overflow happened for split bio, to fix this,
let's keep split bio untouched in md_submit_bio.

As a side effect, we need to export bio_chain_endio.

[1]. https://lore.kernel.org/linux-raid/3bf04253-3fad-434a-63a7-20214e38cf26@gmail.com/T/#t

Reported-by: Paweł Wiejacha <pawel.wiejacha@rtbhouse.com>
Tested-by: Paweł Wiejacha <pawel.wiejacha@rtbhouse.com>
Fixes: 41d2d848e5c0 ("md: improve io stats accounting")
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
---
 block/bio.c         | 3 ++-
 drivers/md/md.c     | 2 +-
 include/linux/bio.h | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 44205dfb6b60..759da1f6ab61 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -283,10 +283,11 @@ static struct bio *__bio_chain_endio(struct bio *bio)
 	return parent;
 }
 
-static void bio_chain_endio(struct bio *bio)
+void bio_chain_endio(struct bio *bio)
 {
 	bio_endio(__bio_chain_endio(bio));
 }
+EXPORT_SYMBOL(bio_chain_endio);
 
 /**
  * bio_chain - chain bio completions
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 49f897fbb89b..02fd272ff6f7 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -489,7 +489,7 @@ static blk_qc_t md_submit_bio(struct bio *bio)
 		return BLK_QC_T_NONE;
 	}
 
-	if (bio->bi_end_io != md_end_io) {
+	if (bio->bi_end_io != md_end_io && bio->bi_end_io != bio_chain_endio) {
 		struct md_io *md_io;
 
 		md_io = mempool_alloc(&mddev->md_io_pool, GFP_NOIO);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a0b4cfdf62a4..6ea48fa1ad64 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -465,6 +465,7 @@ extern void bio_init(struct bio *bio, struct bio_vec *table,
 extern void bio_uninit(struct bio *);
 extern void bio_reset(struct bio *);
 void bio_chain(struct bio *, struct bio *);
+extern void bio_chain_endio(struct bio *bio);
 
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] md: don't account io stat for split bio
  2021-05-08  3:48 [PATCH] md: don't account io stat for split bio Guoqing Jiang
@ 2021-05-10  6:00 ` Christoph Hellwig
  2021-05-10  7:46   ` Guoqing Jiang
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2021-05-10  6:00 UTC (permalink / raw)
  To: Guoqing Jiang
  Cc: song, linux-raid, linux-block, pawel.wiejacha, Guoqing Jiang

On Sat, May 08, 2021 at 11:48:15AM +0800, Guoqing Jiang wrote:
> It looks like stack overflow happened for split bio, to fix this,
> let's keep split bio untouched in md_submit_bio.
> 
> As a side effect, we need to export bio_chain_endio.

Err, no.  The right answer is to not change ->bi_end_io of bios that
you do not own instead of using a horrible hack to skip accounting for
bios that have no more or less reason to be accounted than others bios.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] md: don't account io stat for split bio
  2021-05-10  6:00 ` Christoph Hellwig
@ 2021-05-10  7:46   ` Guoqing Jiang
  2021-05-10 19:49     ` Artur Paszkiewicz
  0 siblings, 1 reply; 11+ messages in thread
From: Guoqing Jiang @ 2021-05-10  7:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: song, linux-raid, linux-block, pawel.wiejacha, Artur Paszkiewicz



On 5/10/21 2:00 PM, Christoph Hellwig wrote:
> On Sat, May 08, 2021 at 11:48:15AM +0800, Guoqing Jiang wrote:
>> It looks like stack overflow happened for split bio, to fix this,
>> let's keep split bio untouched in md_submit_bio.
>>
>> As a side effect, we need to export bio_chain_endio.
> Err, no.  The right answer is to not change ->bi_end_io of bios that
> you do not own instead of using a horrible hack to skip accounting for
> bios that have no more or less reason to be accounted than others bios.

Thanks for the reply. I suppose that md needs to revert current
implementation of accounting io stats, then re-implement it.

Song and Artur, what are your opinion?

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] md: don't account io stat for split bio
  2021-05-10  7:46   ` Guoqing Jiang
@ 2021-05-10 19:49     ` Artur Paszkiewicz
  2021-05-11  2:13       ` Guoqing Jiang
  0 siblings, 1 reply; 11+ messages in thread
From: Artur Paszkiewicz @ 2021-05-10 19:49 UTC (permalink / raw)
  To: Guoqing Jiang, Christoph Hellwig
  Cc: song, linux-raid, linux-block, pawel.wiejacha

On 5/10/21 9:46 AM, Guoqing Jiang wrote:
> On 5/10/21 2:00 PM, Christoph Hellwig wrote:
>> On Sat, May 08, 2021 at 11:48:15AM +0800, Guoqing Jiang wrote:
>>> It looks like stack overflow happened for split bio, to fix this,
>>> let's keep split bio untouched in md_submit_bio.
>>>
>>> As a side effect, we need to export bio_chain_endio.
>> Err, no.  The right answer is to not change ->bi_end_io of bios that
>> you do not own instead of using a horrible hack to skip accounting for
>> bios that have no more or less reason to be accounted than others bios.
> 
> Thanks for the reply. I suppose that md needs to revert current
> implementation of accounting io stats, then re-implement it.
> 
> Song and Artur, what are your opinion?

In the initial version of the io accounting patch the bio was cloned instead
of just overriding bi_end_io and bi_private. Would this be the right approach?

https://lore.kernel.org/linux-raid/20200601161256.27718-1-artur.paszkiewicz@intel.com/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] md: don't account io stat for split bio
  2021-05-10 19:49     ` Artur Paszkiewicz
@ 2021-05-11  2:13       ` Guoqing Jiang
  2021-05-11  4:38         ` Christoph Hellwig
  2021-05-11  6:58         ` Song Liu
  0 siblings, 2 replies; 11+ messages in thread
From: Guoqing Jiang @ 2021-05-11  2:13 UTC (permalink / raw)
  To: Artur Paszkiewicz, Christoph Hellwig
  Cc: song, linux-raid, linux-block, pawel.wiejacha



On 5/11/21 3:49 AM, Artur Paszkiewicz wrote:
> On 5/10/21 9:46 AM, Guoqing Jiang wrote:
>> On 5/10/21 2:00 PM, Christoph Hellwig wrote:
>>> On Sat, May 08, 2021 at 11:48:15AM +0800, Guoqing Jiang wrote:
>>>> It looks like stack overflow happened for split bio, to fix this,
>>>> let's keep split bio untouched in md_submit_bio.
>>>>
>>>> As a side effect, we need to export bio_chain_endio.
>>> Err, no.  The right answer is to not change ->bi_end_io of bios that
>>> you do not own instead of using a horrible hack to skip accounting for
>>> bios that have no more or less reason to be accounted than others bios.
>> Thanks for the reply. I suppose that md needs to revert current
>> implementation of accounting io stats, then re-implement it.
>>
>> Song and Artur, what are your opinion?
> In the initial version of the io accounting patch the bio was cloned instead
> of just overriding bi_end_io and bi_private. Would this be the right approach?
>
> https://lore.kernel.org/linux-raid/20200601161256.27718-1-artur.paszkiewicz@intel.com/

Maybe we can have different approach for different personality layers.

1. raid1 and raid10 can do the accounting in their own layer since they 
already
     clone bio here.
2. make the initial version handles other personality such as raid0 and 
raid5
     in the md layer.

Also a sysfs node which can enable/disable the accounting could be helpful.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] md: don't account io stat for split bio
  2021-05-11  2:13       ` Guoqing Jiang
@ 2021-05-11  4:38         ` Christoph Hellwig
  2021-05-11  6:59           ` Song Liu
  2021-05-11  6:58         ` Song Liu
  1 sibling, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2021-05-11  4:38 UTC (permalink / raw)
  To: Guoqing Jiang
  Cc: Artur Paszkiewicz, Christoph Hellwig, song, linux-raid,
	linux-block, pawel.wiejacha

On Tue, May 11, 2021 at 10:13:41AM +0800, Guoqing Jiang wrote:
> > > Song and Artur, what are your opinion?
> > In the initial version of the io accounting patch the bio was cloned instead
> > of just overriding bi_end_io and bi_private. Would this be the right approach?
> > 
> > https://lore.kernel.org/linux-raid/20200601161256.27718-1-artur.paszkiewicz@intel.com/
> 
> Maybe we can have different approach for different personality layers.
> 
> 1. raid1 and raid10 can do the accounting in their own layer since they
> already
> ?????? clone bio here.
> 2. make the initial version handles other personality such as raid0 and
> raid5
> ?????? in the md layer.
> 
> Also a sysfs node which can enable/disable the accounting could be helpful.

Yes.  Also if the original bi_end_io is restore before completing the
bio you can still override it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] md: don't account io stat for split bio
  2021-05-11  2:13       ` Guoqing Jiang
  2021-05-11  4:38         ` Christoph Hellwig
@ 2021-05-11  6:58         ` Song Liu
  2021-05-11  7:10           ` Christoph Hellwig
  1 sibling, 1 reply; 11+ messages in thread
From: Song Liu @ 2021-05-11  6:58 UTC (permalink / raw)
  To: Guoqing Jiang
  Cc: Artur Paszkiewicz, Christoph Hellwig, linux-raid, linux-block,
	Paweł Wiejacha

On Mon, May 10, 2021 at 7:13 PM Guoqing Jiang <jgq516@gmail.com> wrote:
>
>
>
> On 5/11/21 3:49 AM, Artur Paszkiewicz wrote:
> > On 5/10/21 9:46 AM, Guoqing Jiang wrote:
> >> On 5/10/21 2:00 PM, Christoph Hellwig wrote:
> >>> On Sat, May 08, 2021 at 11:48:15AM +0800, Guoqing Jiang wrote:
> >>>> It looks like stack overflow happened for split bio, to fix this,
> >>>> let's keep split bio untouched in md_submit_bio.
> >>>>
> >>>> As a side effect, we need to export bio_chain_endio.
> >>> Err, no.  The right answer is to not change ->bi_end_io of bios that
> >>> you do not own instead of using a horrible hack to skip accounting for
> >>> bios that have no more or less reason to be accounted than others bios.
> >> Thanks for the reply. I suppose that md needs to revert current
> >> implementation of accounting io stats, then re-implement it.
> >>
> >> Song and Artur, what are your opinion?
> > In the initial version of the io accounting patch the bio was cloned instead
> > of just overriding bi_end_io and bi_private. Would this be the right approach?
> >
> > https://lore.kernel.org/linux-raid/20200601161256.27718-1-artur.paszkiewicz@intel.com/
>
> Maybe we can have different approach for different personality layers.
>
> 1. raid1 and raid10 can do the accounting in their own layer since they
> already
>      clone bio here.
> 2. make the initial version handles other personality such as raid0 and
> raid5
>      in the md layer.
>
> Also a sysfs node which can enable/disable the accounting could be helpful.

IIUC, the sysfs node is needed to get better performance (by disabling
accounting)?
And splitting 1 and 2 above is also for better performance? If we add
the sysfs node,
I would prefer we use the same approach for all personalities (clone
in md.c). This
should simplify the code. If the user do not need extreme performance, we should
keep the stats on (default). If the user do need extreme performance, s/he could
disable stats via sysfs.

Thoughts?

Thanks,
Song

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] md: don't account io stat for split bio
  2021-05-11  4:38         ` Christoph Hellwig
@ 2021-05-11  6:59           ` Song Liu
  2021-05-11  7:11             ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Song Liu @ 2021-05-11  6:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Guoqing Jiang, Artur Paszkiewicz, linux-raid, linux-block,
	Paweł Wiejacha

On Mon, May 10, 2021 at 9:39 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, May 11, 2021 at 10:13:41AM +0800, Guoqing Jiang wrote:
> > > > Song and Artur, what are your opinion?
> > > In the initial version of the io accounting patch the bio was cloned instead
> > > of just overriding bi_end_io and bi_private. Would this be the right approach?
> > >
> > > https://lore.kernel.org/linux-raid/20200601161256.27718-1-artur.paszkiewicz@intel.com/
> >
> > Maybe we can have different approach for different personality layers.
> >
> > 1. raid1 and raid10 can do the accounting in their own layer since they
> > already
> > ?????? clone bio here.
> > 2. make the initial version handles other personality such as raid0 and
> > raid5
> > ?????? in the md layer.
> >
> > Also a sysfs node which can enable/disable the accounting could be helpful.
>
> Yes.  Also if the original bi_end_io is restore before completing the
> bio you can still override it.

Do you mean we can somehow avoid cloning the bio?

Thanks,
Song

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] md: don't account io stat for split bio
  2021-05-11  6:58         ` Song Liu
@ 2021-05-11  7:10           ` Christoph Hellwig
  2021-05-13  7:24             ` Guoqing Jiang
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2021-05-11  7:10 UTC (permalink / raw)
  To: Song Liu
  Cc: Guoqing Jiang, Artur Paszkiewicz, Christoph Hellwig, linux-raid,
	linux-block, Pawe?? Wiejacha

On Mon, May 10, 2021 at 11:58:32PM -0700, Song Liu wrote:
> IIUC, the sysfs node is needed to get better performance (by disabling
> accounting)?

FYI, we already have that sysfs file in the block layer
("queue/iostats"), please just observe QUEUE_FLAG_IO_STAT flag.

> in md.c). This
> should simplify the code. If the user do not need extreme performance, we should
> keep the stats on (default). If the user do need extreme performance, s/he could
> disable stats via sysfs.

Given that the code already does a memory allocation it might as well
always clone and do away with the separate md_io allocation entirely.
Memory usage grows a bit, but it can reuse all the bio cloning
infrastructure.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] md: don't account io stat for split bio
  2021-05-11  6:59           ` Song Liu
@ 2021-05-11  7:11             ` Christoph Hellwig
  0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2021-05-11  7:11 UTC (permalink / raw)
  To: Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Artur Paszkiewicz, linux-raid,
	linux-block, Pawe?? Wiejacha

On Mon, May 10, 2021 at 11:59:11PM -0700, Song Liu wrote:
> > Yes.  Also if the original bi_end_io is restore before completing the
> > bio you can still override it.
> 
> Do you mean we can somehow avoid cloning the bio?

If you restore the original bi_end_io you might be able to avoid it.
That being said I suspect that always cloning will be more robust.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] md: don't account io stat for split bio
  2021-05-11  7:10           ` Christoph Hellwig
@ 2021-05-13  7:24             ` Guoqing Jiang
  0 siblings, 0 replies; 11+ messages in thread
From: Guoqing Jiang @ 2021-05-13  7:24 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu
  Cc: Artur Paszkiewicz, linux-raid, linux-block, Pawe?? Wiejacha



On 5/11/21 3:10 PM, Christoph Hellwig wrote:
> On Mon, May 10, 2021 at 11:58:32PM -0700, Song Liu wrote:
>> IIUC, the sysfs node is needed to get better performance (by disabling
>> accounting)?
> FYI, we already have that sysfs file in the block layer
> ("queue/iostats"), please just observe QUEUE_FLAG_IO_STAT flag.

Seems only nvdimm observe the flag before call bio_{start,end}_io_acct.
Does it make sense to make the checking mandatory? Something like.

--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1315,6 +1315,9 @@ static unsigned long __part_start_io_acct(struct 
block_device *part,
         const int sgrp = op_stat_group(op);
         unsigned long now = READ_ONCE(jiffies);

+       if (!blk_queue_io_stat(part->bd_disk->queue))
+               return 0;
+
         part_stat_lock();
         update_io_ticks(part, now, false);
         part_stat_inc(part, ios[sgrp]);
@@ -1351,6 +1354,9 @@ static void __part_end_io_acct(struct block_device 
*part, unsigned int op,
         unsigned long now = READ_ONCE(jiffies);
         unsigned long duration = now - start_time;

+       if (!blk_queue_io_stat(part->bd_disk->queue))
+               return;
+

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-05-13  7:24 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-08  3:48 [PATCH] md: don't account io stat for split bio Guoqing Jiang
2021-05-10  6:00 ` Christoph Hellwig
2021-05-10  7:46   ` Guoqing Jiang
2021-05-10 19:49     ` Artur Paszkiewicz
2021-05-11  2:13       ` Guoqing Jiang
2021-05-11  4:38         ` Christoph Hellwig
2021-05-11  6:59           ` Song Liu
2021-05-11  7:11             ` Christoph Hellwig
2021-05-11  6:58         ` Song Liu
2021-05-11  7:10           ` Christoph Hellwig
2021-05-13  7:24             ` Guoqing Jiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).