* [PATCH 0/2] Couple more bug fixes from recent commits
@ 2022-09-02 17:16 Logan Gunthorpe
2022-09-02 17:16 ` [PATCH 1/2] md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk() Logan Gunthorpe
2022-09-02 17:16 ` [PATCH 2/2] md: Remove extra mddev_get() in md_seq_start() Logan Gunthorpe
0 siblings, 2 replies; 9+ messages in thread
From: Logan Gunthorpe @ 2022-09-02 17:16 UTC (permalink / raw)
To: linux-kernel, linux-raid, Song Liu
Cc: Christoph Hellwig, Guoqing Jiang, Stephen Bates, Martin Oliveira,
David Sloan, Logan Gunthorpe
Hey,
We've managed to track down some hard to hit bugs that were introduced
in two relatively recent commits. The fixes are both one liners.
Thanks,
Logan
--
David Sloan (1):
md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk()
Logan Gunthorpe (1):
md: Remove extra mddev_get() in md_seq_start()
drivers/md/md.c | 1 -
drivers/md/raid5.c | 1 -
2 files changed, 2 deletions(-)
base-commit: 526bd69b9d330eed1db59b2cf6a7d18caf866847
--
2.30.2
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk()
2022-09-02 17:16 [PATCH 0/2] Couple more bug fixes from recent commits Logan Gunthorpe
@ 2022-09-02 17:16 ` Logan Gunthorpe
2022-09-05 5:51 ` Christoph Hellwig
` (2 more replies)
2022-09-02 17:16 ` [PATCH 2/2] md: Remove extra mddev_get() in md_seq_start() Logan Gunthorpe
1 sibling, 3 replies; 9+ messages in thread
From: Logan Gunthorpe @ 2022-09-02 17:16 UTC (permalink / raw)
To: linux-kernel, linux-raid, Song Liu
Cc: Christoph Hellwig, Guoqing Jiang, Stephen Bates, Martin Oliveira,
David Sloan, David Sloan, Logan Gunthorpe
From: David Sloan <david.sloan@eideticom.com>
When running chunk-sized reads on disks with badblocks duplicate bio
free/puts are observed:
=============================================================================
BUG bio-200 (Not tainted): Object already free
-----------------------------------------------------------------------------
Allocated in mempool_alloc_slab+0x17/0x20 age=3 cpu=2 pid=7504
__slab_alloc.constprop.0+0x5a/0xb0
kmem_cache_alloc+0x31e/0x330
mempool_alloc_slab+0x17/0x20
mempool_alloc+0x100/0x2b0
bio_alloc_bioset+0x181/0x460
do_mpage_readpage+0x776/0xd00
mpage_readahead+0x166/0x320
blkdev_readahead+0x15/0x20
read_pages+0x13f/0x5f0
page_cache_ra_unbounded+0x18d/0x220
force_page_cache_ra+0x181/0x1c0
page_cache_sync_ra+0x65/0xb0
filemap_get_pages+0x1df/0xaf0
filemap_read+0x1e1/0x700
blkdev_read_iter+0x1e5/0x330
vfs_read+0x42a/0x570
Freed in mempool_free_slab+0x17/0x20 age=3 cpu=2 pid=7504
kmem_cache_free+0x46d/0x490
mempool_free_slab+0x17/0x20
mempool_free+0x66/0x190
bio_free+0x78/0x90
bio_put+0x100/0x1a0
raid5_make_request+0x2259/0x2450
md_handle_request+0x402/0x600
md_submit_bio+0xd9/0x120
__submit_bio+0x11f/0x1b0
submit_bio_noacct_nocheck+0x204/0x480
submit_bio_noacct+0x32e/0xc70
submit_bio+0x98/0x1a0
mpage_readahead+0x250/0x320
blkdev_readahead+0x15/0x20
read_pages+0x13f/0x5f0
page_cache_ra_unbounded+0x18d/0x220
Slab 0xffffea000481b600 objects=21 used=0 fp=0xffff8881206d8940 flags=0x17ffffc0010201(locked|slab|head|node=0|zone=2|lastcpupid=0x1fffff)
CPU: 0 PID: 34525 Comm: kworker/u24:2 Not tainted 6.0.0-rc2-localyes-265166-gf11c5343fa3f #143
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Workqueue: raid5wq raid5_do_work
Call Trace:
<TASK>
dump_stack_lvl+0x5a/0x78
dump_stack+0x10/0x16
print_trailer+0x158/0x165
object_err+0x35/0x50
free_debug_processing.cold+0xb7/0xbe
__slab_free+0x1ae/0x330
kmem_cache_free+0x46d/0x490
mempool_free_slab+0x17/0x20
mempool_free+0x66/0x190
bio_free+0x78/0x90
bio_put+0x100/0x1a0
mpage_end_io+0x36/0x150
bio_endio+0x2fd/0x360
md_end_io_acct+0x7e/0x90
bio_endio+0x2fd/0x360
handle_failed_stripe+0x960/0xb80
handle_stripe+0x1348/0x3760
handle_active_stripes.constprop.0+0x72a/0xaf0
raid5_do_work+0x177/0x330
process_one_work+0x616/0xb20
worker_thread+0x2bd/0x6f0
kthread+0x179/0x1b0
ret_from_fork+0x22/0x30
</TASK>
The double free is caused by an unnecessary bio_put() in the
if(is_badblock(...)) error path in raid5_read_one_chunk().
The error path was moved ahead of bio_alloc_clone() in c82aa1b76787c
("md/raid5: move checking badblock before clone bio in
raid5_read_one_chunk"). The previous code checked and freed align_bio
which required a bio_put. After he move that is no longer needed as
raid_bio is returned to the control of the common io path which
performs its own endio resulting in a double free on bad device blocks.
Fixes: c82aa1b76787c ("md/raid5: move checking badblock before clone bio in raid5_read_one_chunk")
Signed-off-by: David Sloan <david.sloan@eideticom.com>
[logang@deltatee.com: minor rewording of commit message]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
drivers/md/raid5.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4e6d865a6456..734f92e75f85 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5538,7 +5538,6 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad,
&bad_sectors)) {
- bio_put(raid_bio);
rdev_dec_pending(rdev, mddev);
return 0;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] md: Remove extra mddev_get() in md_seq_start()
2022-09-02 17:16 [PATCH 0/2] Couple more bug fixes from recent commits Logan Gunthorpe
2022-09-02 17:16 ` [PATCH 1/2] md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk() Logan Gunthorpe
@ 2022-09-02 17:16 ` Logan Gunthorpe
2022-09-05 5:51 ` Christoph Hellwig
2022-09-06 6:06 ` Guoqing Jiang
1 sibling, 2 replies; 9+ messages in thread
From: Logan Gunthorpe @ 2022-09-02 17:16 UTC (permalink / raw)
To: linux-kernel, linux-raid, Song Liu
Cc: Christoph Hellwig, Guoqing Jiang, Stephen Bates, Martin Oliveira,
David Sloan, Logan Gunthorpe
A regression is seen where mddev devices stay permanently after they
are stopped due to an elevated reference count.
This was tracked down to an extra mddev_get() in md_seq_start().
It only happened rarely because most of the time the md_seq_start()
is called with a zero offset. The path with an extra mddev_get() only
happens when it starts with a non-zero offset.
The commit noted below changed an mddev_get() to check its success
but inadevrtantly left the original call in. Remove the extra call.
Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the disk is freed")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
drivers/md/md.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index afaf36b2f6ab..9dc0175280b4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8154,7 +8154,6 @@ static void *md_seq_start(struct seq_file *seq, loff_t *pos)
list_for_each(tmp,&all_mddevs)
if (!l--) {
mddev = list_entry(tmp, struct mddev, all_mddevs);
- mddev_get(mddev);
if (!mddev_get(mddev))
continue;
spin_unlock(&all_mddevs_lock);
--
2.30.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk()
2022-09-02 17:16 ` [PATCH 1/2] md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk() Logan Gunthorpe
@ 2022-09-05 5:51 ` Christoph Hellwig
2022-09-05 6:07 ` Paul Menzel
2022-09-06 6:04 ` Guoqing Jiang
2 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2022-09-05 5:51 UTC (permalink / raw)
To: Logan Gunthorpe
Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
Guoqing Jiang, Stephen Bates, Martin Oliveira, David Sloan
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] md: Remove extra mddev_get() in md_seq_start()
2022-09-02 17:16 ` [PATCH 2/2] md: Remove extra mddev_get() in md_seq_start() Logan Gunthorpe
@ 2022-09-05 5:51 ` Christoph Hellwig
2022-09-06 6:06 ` Guoqing Jiang
1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2022-09-05 5:51 UTC (permalink / raw)
To: Logan Gunthorpe
Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
Guoqing Jiang, Stephen Bates, Martin Oliveira, David Sloan
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk()
2022-09-02 17:16 ` [PATCH 1/2] md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk() Logan Gunthorpe
2022-09-05 5:51 ` Christoph Hellwig
@ 2022-09-05 6:07 ` Paul Menzel
2022-09-07 16:52 ` Logan Gunthorpe
2022-09-06 6:04 ` Guoqing Jiang
2 siblings, 1 reply; 9+ messages in thread
From: Paul Menzel @ 2022-09-05 6:07 UTC (permalink / raw)
To: Logan Gunthorpe
Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
Guoqing Jiang, Stephen Bates, Martin Oliveira, David Sloan
Dear Logan, dear David,
Thank you for the patch.
Am 02.09.22 um 19:16 schrieb Logan Gunthorpe:
> From: David Sloan <david.sloan@eideticom.com>
>
> When running chunk-sized reads on disks with badblocks duplicate bio
> free/puts are observed:
>
> =============================================================================
> BUG bio-200 (Not tainted): Object already free
> -----------------------------------------------------------------------------
> Allocated in mempool_alloc_slab+0x17/0x20 age=3 cpu=2 pid=7504
> __slab_alloc.constprop.0+0x5a/0xb0
[…]
> The double free is caused by an unnecessary bio_put() in the
> if(is_badblock(...)) error path in raid5_read_one_chunk().
>
> The error path was moved ahead of bio_alloc_clone() in c82aa1b76787c
> ("md/raid5: move checking badblock before clone bio in
> raid5_read_one_chunk"). The previous code checked and freed align_bio
> which required a bio_put. After he move that is no longer needed as
*t*he move
> raid_bio is returned to the control of the common io path which
> performs its own endio resulting in a double free on bad device blocks.
>
> Fixes: c82aa1b76787c ("md/raid5: move checking badblock before clone bio in raid5_read_one_chunk")
This commit was added to Linux in v5.14-rc1, so it’s not a recent commit
– or I misunderstood the cover letter.
Kind regards,
Paul
> Signed-off-by: David Sloan <david.sloan@eideticom.com>
> [logang@deltatee.com: minor rewording of commit message]
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
> drivers/md/raid5.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 4e6d865a6456..734f92e75f85 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5538,7 +5538,6 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
>
> if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad,
> &bad_sectors)) {
> - bio_put(raid_bio);
> rdev_dec_pending(rdev, mddev);
> return 0;
> }
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk()
2022-09-02 17:16 ` [PATCH 1/2] md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk() Logan Gunthorpe
2022-09-05 5:51 ` Christoph Hellwig
2022-09-05 6:07 ` Paul Menzel
@ 2022-09-06 6:04 ` Guoqing Jiang
2 siblings, 0 replies; 9+ messages in thread
From: Guoqing Jiang @ 2022-09-06 6:04 UTC (permalink / raw)
To: Logan Gunthorpe, linux-kernel, linux-raid, Song Liu
Cc: Christoph Hellwig, Stephen Bates, Martin Oliveira, David Sloan
On 9/3/22 1:16 AM, Logan Gunthorpe wrote:
> From: David Sloan <david.sloan@eideticom.com>
>
> When running chunk-sized reads on disks with badblocks duplicate bio
> free/puts are observed:
>
> =============================================================================
> BUG bio-200 (Not tainted): Object already free
> -----------------------------------------------------------------------------
> Allocated in mempool_alloc_slab+0x17/0x20 age=3 cpu=2 pid=7504
> __slab_alloc.constprop.0+0x5a/0xb0
> kmem_cache_alloc+0x31e/0x330
> mempool_alloc_slab+0x17/0x20
> mempool_alloc+0x100/0x2b0
> bio_alloc_bioset+0x181/0x460
> do_mpage_readpage+0x776/0xd00
> mpage_readahead+0x166/0x320
> blkdev_readahead+0x15/0x20
> read_pages+0x13f/0x5f0
> page_cache_ra_unbounded+0x18d/0x220
> force_page_cache_ra+0x181/0x1c0
> page_cache_sync_ra+0x65/0xb0
> filemap_get_pages+0x1df/0xaf0
> filemap_read+0x1e1/0x700
> blkdev_read_iter+0x1e5/0x330
> vfs_read+0x42a/0x570
> Freed in mempool_free_slab+0x17/0x20 age=3 cpu=2 pid=7504
> kmem_cache_free+0x46d/0x490
> mempool_free_slab+0x17/0x20
> mempool_free+0x66/0x190
> bio_free+0x78/0x90
> bio_put+0x100/0x1a0
> raid5_make_request+0x2259/0x2450
> md_handle_request+0x402/0x600
> md_submit_bio+0xd9/0x120
> __submit_bio+0x11f/0x1b0
> submit_bio_noacct_nocheck+0x204/0x480
> submit_bio_noacct+0x32e/0xc70
> submit_bio+0x98/0x1a0
> mpage_readahead+0x250/0x320
> blkdev_readahead+0x15/0x20
> read_pages+0x13f/0x5f0
> page_cache_ra_unbounded+0x18d/0x220
> Slab 0xffffea000481b600 objects=21 used=0 fp=0xffff8881206d8940 flags=0x17ffffc0010201(locked|slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> CPU: 0 PID: 34525 Comm: kworker/u24:2 Not tainted 6.0.0-rc2-localyes-265166-gf11c5343fa3f #143
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> Workqueue: raid5wq raid5_do_work
> Call Trace:
> <TASK>
> dump_stack_lvl+0x5a/0x78
> dump_stack+0x10/0x16
> print_trailer+0x158/0x165
> object_err+0x35/0x50
> free_debug_processing.cold+0xb7/0xbe
> __slab_free+0x1ae/0x330
> kmem_cache_free+0x46d/0x490
> mempool_free_slab+0x17/0x20
> mempool_free+0x66/0x190
> bio_free+0x78/0x90
> bio_put+0x100/0x1a0
> mpage_end_io+0x36/0x150
> bio_endio+0x2fd/0x360
> md_end_io_acct+0x7e/0x90
> bio_endio+0x2fd/0x360
> handle_failed_stripe+0x960/0xb80
> handle_stripe+0x1348/0x3760
> handle_active_stripes.constprop.0+0x72a/0xaf0
> raid5_do_work+0x177/0x330
> process_one_work+0x616/0xb20
> worker_thread+0x2bd/0x6f0
> kthread+0x179/0x1b0
> ret_from_fork+0x22/0x30
> </TASK>
>
> The double free is caused by an unnecessary bio_put() in the
> if(is_badblock(...)) error path in raid5_read_one_chunk().
>
> The error path was moved ahead of bio_alloc_clone() in c82aa1b76787c
> ("md/raid5: move checking badblock before clone bio in
> raid5_read_one_chunk"). The previous code checked and freed align_bio
> which required a bio_put. After he move that is no longer needed as
> raid_bio is returned to the control of the common io path which
> performs its own endio resulting in a double free on bad device blocks.
>
> Fixes: c82aa1b76787c ("md/raid5: move checking badblock before clone bio in raid5_read_one_chunk")
> Signed-off-by: David Sloan <david.sloan@eideticom.com>
> [logang@deltatee.com: minor rewording of commit message]
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
> drivers/md/raid5.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 4e6d865a6456..734f92e75f85 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5538,7 +5538,6 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
>
> if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad,
> &bad_sectors)) {
> - bio_put(raid_bio);
> rdev_dec_pending(rdev, mddev);
> return 0;
> }
Acked-by: Guoqing Jiang <Guoqing.jiang@linux.dev>
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] md: Remove extra mddev_get() in md_seq_start()
2022-09-02 17:16 ` [PATCH 2/2] md: Remove extra mddev_get() in md_seq_start() Logan Gunthorpe
2022-09-05 5:51 ` Christoph Hellwig
@ 2022-09-06 6:06 ` Guoqing Jiang
1 sibling, 0 replies; 9+ messages in thread
From: Guoqing Jiang @ 2022-09-06 6:06 UTC (permalink / raw)
To: Logan Gunthorpe, linux-kernel, linux-raid, Song Liu
Cc: Christoph Hellwig, Stephen Bates, Martin Oliveira, David Sloan
On 9/3/22 1:16 AM, Logan Gunthorpe wrote:
> A regression is seen where mddev devices stay permanently after they
> are stopped due to an elevated reference count.
>
> This was tracked down to an extra mddev_get() in md_seq_start().
>
> It only happened rarely because most of the time the md_seq_start()
> is called with a zero offset. The path with an extra mddev_get() only
> happens when it starts with a non-zero offset.
>
> The commit noted below changed an mddev_get() to check its success
> but inadevrtantly left the original call in. Remove the extra call.
>
> Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the disk is freed")
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
> drivers/md/md.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index afaf36b2f6ab..9dc0175280b4 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8154,7 +8154,6 @@ static void *md_seq_start(struct seq_file *seq, loff_t *pos)
> list_for_each(tmp,&all_mddevs)
> if (!l--) {
> mddev = list_entry(tmp, struct mddev, all_mddevs);
> - mddev_get(mddev);
> if (!mddev_get(mddev))
> continue;
> spin_unlock(&all_mddevs_lock);
Acked-by: Guoqing Jiang <Guoqing.jiang@linux.dev>
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk()
2022-09-05 6:07 ` Paul Menzel
@ 2022-09-07 16:52 ` Logan Gunthorpe
0 siblings, 0 replies; 9+ messages in thread
From: Logan Gunthorpe @ 2022-09-07 16:52 UTC (permalink / raw)
To: Paul Menzel
Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
Guoqing Jiang, Stephen Bates, Martin Oliveira, David Sloan
On 2022-09-05 00:07, Paul Menzel wrote:
> This commit was added to Linux in v5.14-rc1, so it’s not a recent commit
> – or I misunderstood the cover letter.
Ah, yes, sorry. The commit it fixes is a bit more than a year old. I
guess it depends on how you define "recent". ;)
Logan
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-09-07 16:52 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-02 17:16 [PATCH 0/2] Couple more bug fixes from recent commits Logan Gunthorpe
2022-09-02 17:16 ` [PATCH 1/2] md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk() Logan Gunthorpe
2022-09-05 5:51 ` Christoph Hellwig
2022-09-05 6:07 ` Paul Menzel
2022-09-07 16:52 ` Logan Gunthorpe
2022-09-06 6:04 ` Guoqing Jiang
2022-09-02 17:16 ` [PATCH 2/2] md: Remove extra mddev_get() in md_seq_start() Logan Gunthorpe
2022-09-05 5:51 ` Christoph Hellwig
2022-09-06 6:06 ` Guoqing Jiang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).