* [PATCH] blk-cgroup: delay calling blkcg_exit_disk until disk_release
From: Christoph Hellwig @ 2023-02-08  6:35 UTC
  To: axboe; +Cc: linux-block, Ming Lei

While del_gendisk ensures there is no outstanding I/O on the queue,
it can't prevent block layer users from building new I/O.

This leads to a NULL ->root_blkg reference in bio_associate_blkg when
allocating a new bio on a shut down file system.  Delay freeing the
blk-cgroup subsystems from del_gendisk until disk_release to make
sure the blkg and throttle information is still available for bio
submitters, even if those bios will immediately fail.

This now can cause a case where disk_release is called on a disk
that hasn't been added.  That's mostly harmless, except for a case
in blk_throtl_exit that now needs to check for a NULL ->td pointer.

Fixes: 178fa7d49815 ("blk-cgroup: delay blk-cgroup initialization until add_disk")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-throttle.c | 3 ++-
 block/genhd.c        | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 902203bdddb4b4..e7bd7050d68402 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2411,7 +2411,8 @@ void blk_throtl_exit(struct gendisk *disk)
 {
 	struct request_queue *q = disk->queue;
 
-	BUG_ON(!q->td);
+	if (!q->td)
+		return;
 	del_timer_sync(&q->td->service_queue.pending_timer);
 	throtl_shutdown_wq(q);
 	blkcg_deactivate_policy(disk, &blkcg_policy_throtl);
diff --git a/block/genhd.c b/block/genhd.c
index 7e031559bf514c..65373738c70b02 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -668,8 +668,6 @@ void del_gendisk(struct gendisk *disk)
 	rq_qos_exit(q);
 	blk_mq_unquiesce_queue(q);
 
-	blkcg_exit_disk(disk);
-
 	/*
 	 * If the disk does not own the queue, allow using passthrough requests
 	 * again.  Else leave the queue frozen to fail all I/O.
@@ -1166,6 +1164,8 @@ static void disk_release(struct device *dev)
 	might_sleep();
 	WARN_ON_ONCE(disk_live(disk));
 
+	blkcg_exit_disk(disk);
+
 	/*
 	 * To undo the all initialization from blk_mq_init_allocated_queue in
 	 * case of a probe failure where add_disk is never called we have to
-- 
2.39.1
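
The blk-throttle hunk above replaces a BUG_ON with an early return so that
teardown tolerates a disk that was never added.  A minimal userspace sketch
of that pattern follows; struct mock_td, struct mock_queue,
mock_throtl_init() and mock_throtl_exit() are hypothetical stand-ins, not
kernel APIs, and the real function does more work than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-queue throttle data. */
struct mock_td {
	bool timer_armed;
};

/* Hypothetical stand-in for the request queue. */
struct mock_queue {
	struct mock_td *td;	/* stays NULL until mock_throtl_init() runs */
};

/* Mock of the init step, which only runs when the disk is added. */
int mock_throtl_init(struct mock_queue *q)
{
	q->td = calloc(1, sizeof(*q->td));
	if (!q->td)
		return -1;
	q->td->timer_armed = true;
	return 0;
}

/*
 * Mock of the exit step.  Because it may now run for a disk that was
 * never added, a NULL ->td means "nothing to undo" rather than a bug.
 * Returns true if there was state to tear down.
 */
bool mock_throtl_exit(struct mock_queue *q)
{
	if (!q->td)
		return false;

	q->td->timer_armed = false;
	free(q->td);
	q->td = NULL;
	return true;
}
```

Calling mock_throtl_exit() on a zero-initialized struct mock_queue is a
no-op, while calling it after mock_throtl_init() frees the state once.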



* Re: [PATCH] blk-cgroup: delay calling blkcg_exit_disk until disk_release
From: Ming Lei @ 2023-02-08  7:38 UTC
  To: Christoph Hellwig; +Cc: axboe, linux-block, ming.lei

On Wed, Feb 08, 2023 at 07:35:14AM +0100, Christoph Hellwig wrote:
> While del_gendisk ensures there is no outstanding I/O on the queue,
> it can't prevent block layer users from building new I/O.
> 
> This leads to a NULL ->root_blkg reference in bio_associate_blkg when
> allocating a new bio on a shut down file system.  Delay freeing the
> blk-cgroup subsystems from del_gendisk until disk_release to make
> sure the blkg and throttle information is still available for bio
> submitters, even if those bios will immediately fail.
> 
> This now can cause a case where disk_release is called on a disk
> that hasn't been added.  That's mostly harmless, except for a case
> in blk_throtl_exit that now needs to check for a NULL ->td pointer.

With this approach, blkcg_init_disk() could be called before
q->root_blkg is released in the disk unbind & rebind use case.
Wouldn't that leak memory?

Thanks,
Ming



* Re: [PATCH] blk-cgroup: delay calling blkcg_exit_disk until disk_release
From: Christoph Hellwig @ 2023-02-08 15:12 UTC
  To: Ming Lei; +Cc: Christoph Hellwig, axboe, linux-block

On Wed, Feb 08, 2023 at 03:38:40PM +0800, Ming Lei wrote:
> > This now can cause a case where disk_release is called on a disk
> > that hasn't been added.  That's mostly harmless, except for a case
> > in blk_throtl_exit that now needs to check for a NULL ->td pointer.
> 
> With this approach, blkcg_init_disk() could be called before
> q->root_blkg is released in the disk unbind & rebind use case.
> Wouldn't that leak memory?

q->root_blkg is now disk->root_blkg.  So in an unbind and rebind case
a different disk will be involved.



* Re: [PATCH] blk-cgroup: delay calling blkcg_exit_disk until disk_release
From: Ming Lei @ 2023-02-08 23:47 UTC
  To: Christoph Hellwig; +Cc: axboe, linux-block

On Wed, Feb 08, 2023 at 04:12:31PM +0100, Christoph Hellwig wrote:
> On Wed, Feb 08, 2023 at 03:38:40PM +0800, Ming Lei wrote:
> > > This now can cause a case where disk_release is called on a disk
> > > that hasn't been added.  That's mostly harmless, except for a case
> > > in blk_throtl_exit that now needs to check for a NULL ->td pointer.
> > 
> > With this approach, blkcg_init_disk() could be called before
> > q->root_blkg is released in the disk unbind & rebind use case.
> > Wouldn't that leak memory?
> 
> q->root_blkg is now disk->root_blkg.  So in an unbind and rebind case
> a different disk will be involved.

OK.

Another thing is that blkcg_init_disk() and blkcg_exit_disk() become
asymmetrical with this patch, so alloc_disk() & add_disk() and
del_gendisk() & put_disk() have to be done together.  If that can be
documented, this patch looks fine.


Thanks, 
Ming



* Re: [PATCH] blk-cgroup: delay calling blkcg_exit_disk until disk_release
From: Christoph Hellwig @ 2023-02-09  5:04 UTC
  To: Ming Lei; +Cc: Christoph Hellwig, axboe, linux-block

On Thu, Feb 09, 2023 at 07:47:38AM +0800, Ming Lei wrote:
> Another thing is that blkcg_init_disk() and blkcg_exit_disk() become
> asymmetrical with this patch, so alloc_disk() & add_disk() and
> del_gendisk() & put_disk() have to be done together.  If that can be
> documented, this patch looks fine.

As in no add_disk is allowed after del_gendisk on the same disk?
It's been like that since basically forever.  I agree that documenting
it probably doesn't hurt though - but that's separate from this fix.
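
The lifecycle rule discussed here (no add_disk after del_gendisk on the
same disk, so a rebind has to use a freshly allocated disk) can be modelled
as a small state machine.  This is only a sketch: enum mock_disk_state,
struct mock_disk and both functions are hypothetical, and the -EINVAL
return is an assumption of the model, not a statement about what the
kernel does.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lifecycle states for a mock disk object. */
enum mock_disk_state {
	MOCK_DISK_ALLOCATED,	/* allocated, never added */
	MOCK_DISK_ADDED,	/* added and live */
	MOCK_DISK_DELETED,	/* deleted, waiting for the final put */
};

struct mock_disk {
	enum mock_disk_state state;
};

/* Adding is only valid once, straight after allocation. */
int mock_add_disk(struct mock_disk *disk)
{
	if (disk->state != MOCK_DISK_ALLOCATED)
		return -EINVAL;	/* assumption: the model rejects re-adding */
	disk->state = MOCK_DISK_ADDED;
	return 0;
}

/* Deleting is a one-way transition; the disk cannot become live again. */
void mock_del_gendisk(struct mock_disk *disk)
{
	assert(disk->state == MOCK_DISK_ADDED);
	disk->state = MOCK_DISK_DELETED;
}
```

In this model an unbind and rebind sequence is add, delete, then add on a
second struct mock_disk; a second mock_add_disk() on the deleted object
fails.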


* Re: [PATCH] blk-cgroup: delay calling blkcg_exit_disk until disk_release
From: Ming Lei @ 2023-02-09  8:56 UTC
  To: Christoph Hellwig; +Cc: axboe, linux-block

On Thu, Feb 09, 2023 at 06:04:01AM +0100, Christoph Hellwig wrote:
> On Thu, Feb 09, 2023 at 07:47:38AM +0800, Ming Lei wrote:
> > Another thing is that blkcg_init_disk() and blkcg_exit_disk() become
> > asymmetrical with this patch, so alloc_disk() & add_disk() and
> > del_gendisk() & put_disk() have to be done together.  If that can be
> > documented, this patch looks fine.
> 
> As in no add_disk is allowed after del_gendisk on the same disk?
> It's been like that since basically forever.  I agree that documenting
> it probably doesn't hurt though - but that's separate from this fix.

OK, fair enough,

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming



* Re: [PATCH] blk-cgroup: delay calling blkcg_exit_disk until disk_release
From: Jens Axboe @ 2023-02-09 15:12 UTC
  To: Christoph Hellwig; +Cc: linux-block, Ming Lei


On Wed, 08 Feb 2023 07:35:14 +0100, Christoph Hellwig wrote:
> While del_gendisk ensures there is no outstanding I/O on the queue,
> it can't prevent block layer users from building new I/O.
> 
> This leads to a NULL ->root_blkg reference in bio_associate_blkg when
> allocating a new bio on a shut down file system.  Delay freeing the
> blk-cgroup subsystems from del_gendisk until disk_release to make
> sure the blkg and throttle information is still available for bio
> submitters, even if those bios will immediately fail.
> 
> [...]

Applied, thanks!

[1/1] blk-cgroup: delay calling blkcg_exit_disk until disk_release
      commit: c43332fe028c252a2a28e46be70a530f64fc3c9d

Best regards,
-- 
Jens Axboe





* Re: [PATCH] blk-cgroup: delay calling blkcg_exit_disk until disk_release
From: Ming Lei @ 2023-02-10  5:05 UTC
  To: Christoph Hellwig; +Cc: axboe, linux-block, ming.lei

On Wed, Feb 08, 2023 at 07:35:14AM +0100, Christoph Hellwig wrote:
> While del_gendisk ensures there is no outstanding I/O on the queue,
> it can't prevent block layer users from building new I/O.
> 
> This leads to a NULL ->root_blkg reference in bio_associate_blkg when
> allocating a new bio on a shut down file system.  Delay freeing the
> blk-cgroup subsystems from del_gendisk until disk_release to make
> sure the blkg and throttle information is still available for bio
> submitters, even if those bios will immediately fail.
> 
> This now can cause a case where disk_release is called on a disk
> that hasn't been added.  That's mostly harmless, except for a case
> in blk_throtl_exit that now needs to check for a NULL ->td pointer.
> 
> Fixes: 178fa7d49815 ("blk-cgroup: delay blk-cgroup initialization until add_disk")
> Reported-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Hmm, this patch actually causes bigger trouble.

After commit 84d7d462b16d ("blk-cgroup: pin the gendisk in struct blkcg_gq"),
each blkcg_gq instance holds a reference on the disk, so moving
blkcg_exit_disk into disk_release() creates a circular reference
dependency, and both are leaked.

Thanks,
Ming
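
The circular dependency described above can be reproduced with a toy
refcount model.  This is a simplified, single-threaded sketch with one blkg
per disk; struct mock_disk and all mock_* functions are hypothetical names
for illustration and do not mirror the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical disk with a plain integer refcount. */
struct mock_disk {
	int refcount;
	bool has_blkg;		/* a blkg exists and pins this disk */
	bool released;		/* the release callback has run */
	bool exit_in_release;	/* model the patched teardown ordering */
};

void mock_put_disk(struct mock_disk *disk);

/* Creating the blkg takes a reference on the disk (the "pin"). */
void mock_blkg_create(struct mock_disk *disk)
{
	disk->has_blkg = true;
	disk->refcount++;
}

/* Mock of the blkcg exit step: destroys the blkg and drops its pin. */
void mock_blkcg_exit_disk(struct mock_disk *disk)
{
	if (!disk->has_blkg)
		return;
	disk->has_blkg = false;
	mock_put_disk(disk);
}

/* Drop one reference; the release path runs only when the count hits 0. */
void mock_put_disk(struct mock_disk *disk)
{
	assert(disk->refcount > 0);
	if (--disk->refcount > 0)
		return;
	if (disk->exit_in_release)
		mock_blkcg_exit_disk(disk);
	disk->released = true;
}

/* Returns true if the disk ends up released, false if it is leaked. */
bool mock_disk_lifecycle(bool exit_in_release)
{
	struct mock_disk disk = {
		.refcount = 1,		/* the owner's reference */
		.exit_in_release = exit_in_release,
	};

	mock_blkg_create(&disk);		/* add time: count is 2 */
	if (!exit_in_release)
		mock_blkcg_exit_disk(&disk);	/* delete time: back to 1 */
	mock_put_disk(&disk);			/* owner drops its reference */
	return disk.released;
}
```

With exit_in_release set, the owner's put leaves the count at 1 because the
blkg still pins the disk, and the only code that would drop that pin sits
in the release path that can never run, so mock_disk_lifecycle(true)
reports a leak while mock_disk_lifecycle(false) does not.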



* Re: [PATCH] blk-cgroup: delay calling blkcg_exit_disk until disk_release
From: Ming Lei @ 2023-02-13  8:37 UTC
  To: Christoph Hellwig; +Cc: axboe, linux-block, ming.lei

On Fri, Feb 10, 2023 at 01:05:11PM +0800, Ming Lei wrote:
> On Wed, Feb 08, 2023 at 07:35:14AM +0100, Christoph Hellwig wrote:
> > While del_gendisk ensures there is no outstanding I/O on the queue,
> > it can't prevent block layer users from building new I/O.
> > 
> > This leads to a NULL ->root_blkg reference in bio_associate_blkg when
> > allocating a new bio on a shut down file system.  Delay freeing the
> > blk-cgroup subsystems from del_gendisk until disk_release to make
> > sure the blkg and throttle information is still available for bio
> > submitters, even if those bios will immediately fail.
> > 
> > This now can cause a case where disk_release is called on a disk
> > that hasn't been added.  That's mostly harmless, except for a case
> > in blk_throtl_exit that now needs to check for a NULL ->td pointer.
> > 
> > Fixes: 178fa7d49815 ("blk-cgroup: delay blk-cgroup initialization until add_disk")
> > Reported-by: Ming Lei <ming.lei@redhat.com>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Hmm, this patch actually causes bigger trouble.
> 
> After commit 84d7d462b16d ("blk-cgroup: pin the gendisk in struct blkcg_gq"),
> each blkcg_gq instance holds a reference on the disk, so moving
> blkcg_exit_disk into disk_release() creates a circular reference
> dependency, and both are leaked.

Hi Christoph & Jens,

This issue is fairly serious: the blkg, disk, and request_queue are all leaked by
commit c43332fe028c ("blk-cgroup: delay calling blkcg_exit_disk until disk_release").

Can we solve it before merging for-6.3/block into v6.3-rc1?

Thanks,
Ming



* Re: [PATCH] blk-cgroup: delay calling blkcg_exit_disk until disk_release
From: Christoph Hellwig @ 2023-02-13  8:42 UTC
  To: Ming Lei; +Cc: Christoph Hellwig, axboe, linux-block

On Mon, Feb 13, 2023 at 04:37:09PM +0800, Ming Lei wrote:
> This issue is fairly serious: the blkg, disk, and request_queue are all leaked by
> commit c43332fe028c ("blk-cgroup: delay calling blkcg_exit_disk until disk_release").
> 
> Can we solve it before merging for-6.3/block into v6.3-rc1?

Yes.  I'm testing a fix at the moment.

