All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] block: Revert "let blkcg_gq grab request queue's refcnt"
@ 2023-01-30 23:22 Bart Van Assche
  2023-01-31  1:52 ` Ming Lei
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Van Assche @ 2023-01-30 23:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Bart Van Assche, Ming Lei, Christoph Hellwig

Since commit 0a9a25ca7843 ("block: let blkcg_gq grab request queue's
refcnt") for many request queues the reference count drops to 1 when
the request queue is destroyed instead of to 0. In other words, the
request queue is leaked. Fix this by reverting that commit.

This leak was discovered by running the following shell command before
and after the sub-page block layer tests:

cat /sys/kernel/debug/block/sub_page_limit_queues

Without this patch, the above debugfs attribute is increased by one
after each such sub-page block layer test. With this revert applied,
that debugfs attribute drops to zero after each sub-page block layer
test.

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-cgroup.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index cb110fc51940..2e531268f725 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -141,7 +141,6 @@ static void blkg_free_workfn(struct work_struct *work)
 	if (q) {
 		list_del_init(&blkg->q_node);
 		mutex_unlock(&q->blkcg_mutex);
-		blk_put_queue(q);
 	}
 
 	free_percpu(blkg->iostat_cpu);
@@ -273,9 +272,6 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct gendisk *disk,
 	if (!blkg->iostat_cpu)
 		goto err_free;
 
-	if (!blk_get_queue(disk->queue))
-		goto err_free;
-
 	blkg->q = disk->queue;
 	INIT_LIST_HEAD(&blkg->q_node);
 	spin_lock_init(&blkg->async_bio_lock);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] block: Revert "let blkcg_gq grab request queue's refcnt"
  2023-01-30 23:22 [PATCH] block: Revert "let blkcg_gq grab request queue's refcnt" Bart Van Assche
@ 2023-01-31  1:52 ` Ming Lei
  2023-01-31 17:31   ` Bart Van Assche
  0 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2023-01-31  1:52 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jens Axboe, linux-block, Christoph Hellwig, ming.lei

Hi Bart,

On Mon, Jan 30, 2023 at 03:22:57PM -0800, Bart Van Assche wrote:
> Since commit 0a9a25ca7843 ("block: let blkcg_gq grab request queue's
> refcnt") for many request queues the reference count drops to 1 when
> the request queue is destroyed instead of to 0. In other words, the
> request queue is leaked. Fix this by reverting that commit.

When/where you observe that the reference count drops to 1 instead of 0?

Do you have kmem leak log?

Probably, the last drop is in blkg_free_workfn().


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] block: Revert "let blkcg_gq grab request queue's refcnt"
  2023-01-31  1:52 ` Ming Lei
@ 2023-01-31 17:31   ` Bart Van Assche
  2023-02-01  1:56     ` Ming Lei
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Van Assche @ 2023-01-31 17:31 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Christoph Hellwig

On 1/30/23 17:52, Ming Lei wrote:
> Hi Bart,
> 
> On Mon, Jan 30, 2023 at 03:22:57PM -0800, Bart Van Assche wrote:
>> Since commit 0a9a25ca7843 ("block: let blkcg_gq grab request queue's
>> refcnt") for many request queues the reference count drops to 1 when
>> the request queue is destroyed instead of to 0. In other words, the
>> request queue is leaked. Fix this by reverting that commit.
> 
> When/where you observe that the reference count drops to 1 instead of 0?
> 
> Do you have kmem leak log?
> 
> Probably, the last drop is in blkg_free_workfn().

Hi Ming,

The reference count leak was discovered while I was testing my patch 
series that adds support for sub-page limits 
(https://lore.kernel.org/linux-block/20230130212656.876311-1-bvanassche@acm.org/T/#t). 
The second patch in that series adds a counter that tracks the number of 
queues that need support for limits below the page size 
(sub_page_limit_queues). I noticed that without this patch that counter 
increases but never decreases. With this patch applied, that counter 
drops back to zero after having run a test that needs support for 
sub-page limits.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] block: Revert "let blkcg_gq grab request queue's refcnt"
  2023-01-31 17:31   ` Bart Van Assche
@ 2023-02-01  1:56     ` Ming Lei
  2023-02-01  4:29       ` Ming Lei
  0 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2023-02-01  1:56 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jens Axboe, linux-block, Christoph Hellwig, ming.lei

On Tue, Jan 31, 2023 at 09:31:36AM -0800, Bart Van Assche wrote:
> On 1/30/23 17:52, Ming Lei wrote:
> > Hi Bart,
> > 
> > On Mon, Jan 30, 2023 at 03:22:57PM -0800, Bart Van Assche wrote:
> > > Since commit 0a9a25ca7843 ("block: let blkcg_gq grab request queue's
> > > refcnt") for many request queues the reference count drops to 1 when
> > > the request queue is destroyed instead of to 0. In other words, the
> > > request queue is leaked. Fix this by reverting that commit.
> > 
> > When/where you observe that the reference count drops to 1 instead of 0?
> > 
> > Do you have kmem leak log?
> > 
> > Probably, the last drop is in blkg_free_workfn().
> 
> Hi Ming,
> 
> The reference count leak was discovered while I was testing my patch series
> that adds support for sub-page limits (https://lore.kernel.org/linux-block/20230130212656.876311-1-bvanassche@acm.org/T/#t).
> The second patch in that series adds a counter that tracks the number of
> queues that need support for limits below the page size
> (sub_page_limit_queues). I noticed that without this patch that counter
> increases but never decreases. With this patch applied, that counter drops
> back to zero after having run a test that needs support for sub-page limits.

I can reproduce the issue by scsi_debug now, but blkg_release() isn't called,
so looks like one blkcg_gq lifetime issue since blkcg_exit_disk() is really
run.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] block: Revert "let blkcg_gq grab request queue's refcnt"
  2023-02-01  1:56     ` Ming Lei
@ 2023-02-01  4:29       ` Ming Lei
  2023-02-01 15:55         ` Ming Lei
  0 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2023-02-01  4:29 UTC (permalink / raw)
  To: Bart Van Assche, Waiman Long; +Cc: Jens Axboe, linux-block, Christoph Hellwig

On Wed, Feb 01, 2023 at 09:56:00AM +0800, Ming Lei wrote:
> On Tue, Jan 31, 2023 at 09:31:36AM -0800, Bart Van Assche wrote:
> > On 1/30/23 17:52, Ming Lei wrote:
> > > Hi Bart,
> > > 
> > > On Mon, Jan 30, 2023 at 03:22:57PM -0800, Bart Van Assche wrote:
> > > > Since commit 0a9a25ca7843 ("block: let blkcg_gq grab request queue's
> > > > refcnt") for many request queues the reference count drops to 1 when
> > > > the request queue is destroyed instead of to 0. In other words, the
> > > > request queue is leaked. Fix this by reverting that commit.
> > > 
> > > When/where you observe that the reference count drops to 1 instead of 0?
> > > 
> > > Do you have kmem leak log?
> > > 
> > > Probably, the last drop is in blkg_free_workfn().
> > 
> > Hi Ming,
> > 
> > The reference count leak was discovered while I was testing my patch series
> > that adds support for sub-page limits (https://lore.kernel.org/linux-block/20230130212656.876311-1-bvanassche@acm.org/T/#t).
> > The second patch in that series adds a counter that tracks the number of
> > queues that need support for limits below the page size
> > (sub_page_limit_queues). I noticed that without this patch that counter
> > increases but never decreases. With this patch applied, that counter drops
> > back to zero after having run a test that needs support for sub-page limits.
> 
> I can reproduce the issue by scsi_debug now, but blkg_release() isn't called,
> so looks like one blkcg_gq lifetime issue since blkcg_exit_disk() is really
> run.

The problem is caused by 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()").

This commit will hold blkg instance until blkcg_rstat_flush() is called,
and which may be delayed to css_release_work_fn().

Thanks,
Ming


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] block: Revert "let blkcg_gq grab request queue's refcnt"
  2023-02-01  4:29       ` Ming Lei
@ 2023-02-01 15:55         ` Ming Lei
  2023-02-01 20:21           ` Bart Van Assche
  0 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2023-02-01 15:55 UTC (permalink / raw)
  To: Bart Van Assche, Waiman Long, Tejun Heo
  Cc: Jens Axboe, linux-block, Christoph Hellwig

On Wed, Feb 01, 2023 at 12:29:41PM +0800, Ming Lei wrote:
> On Wed, Feb 01, 2023 at 09:56:00AM +0800, Ming Lei wrote:
> > On Tue, Jan 31, 2023 at 09:31:36AM -0800, Bart Van Assche wrote:
> > > On 1/30/23 17:52, Ming Lei wrote:
> > > > Hi Bart,
> > > > 
> > > > On Mon, Jan 30, 2023 at 03:22:57PM -0800, Bart Van Assche wrote:
> > > > > Since commit 0a9a25ca7843 ("block: let blkcg_gq grab request queue's
> > > > > refcnt") for many request queues the reference count drops to 1 when
> > > > > the request queue is destroyed instead of to 0. In other words, the
> > > > > request queue is leaked. Fix this by reverting that commit.
> > > > 
> > > > When/where you observe that the reference count drops to 1 instead of 0?
> > > > 
> > > > Do you have kmem leak log?
> > > > 
> > > > Probably, the last drop is in blkg_free_workfn().
> > > 
> > > Hi Ming,
> > > 
> > > The reference count leak was discovered while I was testing my patch series
> > > that adds support for sub-page limits (https://lore.kernel.org/linux-block/20230130212656.876311-1-bvanassche@acm.org/T/#t).
> > > The second patch in that series adds a counter that tracks the number of
> > > queues that need support for limits below the page size
> > > (sub_page_limit_queues). I noticed that without this patch that counter
> > > increases but never decreases. With this patch applied, that counter drops
> > > back to zero after having run a test that needs support for sub-page limits.
> > 
> > I can reproduce the issue by scsi_debug now, but blkg_release() isn't called,
> > so looks like one blkcg_gq lifetime issue since blkcg_exit_disk() is really
> > run.
> 
> The problem is caused by 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()").
> 
> This commit will hold blkg instance until blkcg_rstat_flush() is called,
> and which may be delayed to css_release_work_fn().

The following patch can address the blkg leak issue:

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index cb110fc51940..78f855c34746 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -2034,6 +2034,10 @@ void blk_cgroup_bio_start(struct bio *bio)
 	struct blkg_iostat_set *bis;
 	unsigned long flags;
 
+	/* Root-level stats are sourced from system-wide IO stats */
+	if (!cgroup_parent(blkcg->css.cgroup))
+		return;
+
 	cpu = get_cpu();
 	bis = per_cpu_ptr(bio->bi_blkg->iostat_cpu, cpu);
 	flags = u64_stats_update_begin_irqsave(&bis->sync);

Thanks, 
Ming


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] block: Revert "let blkcg_gq grab request queue's refcnt"
  2023-02-01 15:55         ` Ming Lei
@ 2023-02-01 20:21           ` Bart Van Assche
  0 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2023-02-01 20:21 UTC (permalink / raw)
  To: Ming Lei, Waiman Long, Tejun Heo
  Cc: Jens Axboe, linux-block, Christoph Hellwig

On 2/1/23 07:55, Ming Lei wrote:
> The following patch can address the blkg leak issue:
> 
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index cb110fc51940..78f855c34746 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -2034,6 +2034,10 @@ void blk_cgroup_bio_start(struct bio *bio)
>   	struct blkg_iostat_set *bis;
>   	unsigned long flags;
>   
> +	/* Root-level stats are sourced from system-wide IO stats */
> +	if (!cgroup_parent(blkcg->css.cgroup))
> +		return;
> +
>   	cpu = get_cpu();
>   	bis = per_cpu_ptr(bio->bi_blkg->iostat_cpu, cpu);
>   	flags = u64_stats_update_begin_irqsave(&bis->sync);

The above patch passes my tests. Feel free to add:

Tested-by: Bart van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-02-01 20:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-30 23:22 [PATCH] block: Revert "let blkcg_gq grab request queue's refcnt" Bart Van Assche
2023-01-31  1:52 ` Ming Lei
2023-01-31 17:31   ` Bart Van Assche
2023-02-01  1:56     ` Ming Lei
2023-02-01  4:29       ` Ming Lei
2023-02-01 15:55         ` Ming Lei
2023-02-01 20:21           ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.