All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2 block/for-6.2] blk-iolatency: Fix memory leak on add_disk() failures
@ 2022-12-10 18:33 Tejun Heo
  2022-12-10 18:34 ` [PATCH 2/2 block/for-6.2] blk-iolatency: Make initialization lazy Tejun Heo
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Tejun Heo @ 2022-12-10 18:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, darklight2357, Josef Bacik,
	Linus Torvalds, cgroups

When a gendisk is successfully initialized but add_disk() fails such as when
a loop device has invalid number of minor device numbers specified,
blkcg_init_disk() is called during init and then blkcg_exit_disk() during
error handling. Unfortunately, iolatency gets initialized in the former but
doesn't get cleaned up in the latter.

This is because, in non-error cases, the cleanup is performed by
del_gendisk() calling rq_qos_exit(), the assumption being that rq_qos
policies, iolatency being one of them, can only be activated once the disk
is fully registered and visible. That assumption is true for wbt and iocost,
but not so for iolatency as it gets initialized before add_disk() is called.

It is desirable to lazy-init rq_qos policies because they are optional
features and add to hot path overhead once initialized - each IO has to walk
all the registered rq_qos policies. So, we want to switch iolatency to lazy
init too. However, that's a bigger change. As a fix for the immediate
problem, let's just add an extra call to rq_qos_exit() in blkcg_exit_disk().
This is safe because duplicate calls to rq_qos_exit() become noop's.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: darklight2357@icloud.com
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: d70675121546 ("block: introduce blk-iolatency io controller")
Cc: stable@vger.kernel.org # v4.19+
---
Hello,

I'm posting two patches for the iolatency memory leak issue after add_disk()
failure. This one is the immediate fix and should be really safe. However,
any change has risks and given that the bug being address is not critical at
all, I still think it'd make sense to route it through 6.2-rc1 rather than
applying directly to master for 6.1 release. So, it's tagged for the 6.2
merge window.

Thanks.

 block/blk-cgroup.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -33,6 +33,7 @@
 #include "blk-cgroup.h"
 #include "blk-ioprio.h"
 #include "blk-throttle.h"
+#include "blk-rq-qos.h"
 
 /*
  * blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation.
@@ -1322,6 +1323,7 @@ err_unlock:
 void blkcg_exit_disk(struct gendisk *disk)
 {
 	blkg_destroy_all(disk);
+	rq_qos_exit(disk->queue);
 	blk_throtl_exit(disk);
 }
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2 block/for-6.2] blk-iolatency: Make initialization lazy
  2022-12-10 18:33 [PATCH 1/2 block/for-6.2] blk-iolatency: Fix memory leak on add_disk() failures Tejun Heo
@ 2022-12-10 18:34 ` Tejun Heo
  2022-12-12  8:53   ` Christoph Hellwig
  2022-12-10 18:54 ` [PATCH 1/2 block/for-6.2] blk-iolatency: Fix memory leak on add_disk() failures Linus Torvalds
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2022-12-10 18:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, darklight2357, Josef Bacik,
	Linus Torvalds, cgroups

Other rq_qos policies such as wbt and iocost are lazy-initialized when they
are configured for the first time for the device but iolatency is
initialized unconditionally from blkcg_init_disk() during gendisk init. Lazy
init is beneficial because rq_qos policies add runtime overhead when
initialized as every IO has to walk all registered rq_qos callbacks.

This patch switches iolatency to lazy initialization too so that it only
registered its rq_qos policy when it is first configured.

Note that there is a known race condition between blkcg config file writes
and del_gendisk() and this patch makes iolatency susceptible to it by
exposing the init path to race against the deletion path. However, that
problem already exists in iocost and is being worked on.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
---
Hello,

Tagged for-6.2 but this can easily go for the next merge window too.

Thanks.

 block/blk-cgroup.c    |    8 --------
 block/blk-iolatency.c |   36 ++++++++++++++++++++++++++++++++++--
 block/blk-rq-qos.h    |    2 +-
 block/blk.h           |    6 ------
 4 files changed, 35 insertions(+), 17 deletions(-)

--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -33,7 +33,6 @@
 #include "blk-cgroup.h"
 #include "blk-ioprio.h"
 #include "blk-throttle.h"
-#include "blk-rq-qos.h"
 
 /*
  * blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation.
@@ -1300,14 +1299,8 @@ int blkcg_init_disk(struct gendisk *disk
 	if (ret)
 		goto err_ioprio_exit;
 
-	ret = blk_iolatency_init(disk);
-	if (ret)
-		goto err_throtl_exit;
-
 	return 0;
 
-err_throtl_exit:
-	blk_throtl_exit(disk);
 err_ioprio_exit:
 	blk_ioprio_exit(disk);
 err_destroy_all:
@@ -1323,7 +1316,6 @@ err_unlock:
 void blkcg_exit_disk(struct gendisk *disk)
 {
 	blkg_destroy_all(disk);
-	rq_qos_exit(disk->queue);
 	blk_throtl_exit(disk);
 }
 
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -755,7 +755,7 @@ static void blkiolatency_enable_work_fn(
 	}
 }
 
-int blk_iolatency_init(struct gendisk *disk)
+static int blk_iolatency_init(struct gendisk *disk)
 {
 	struct request_queue *q = disk->queue;
 	struct blk_iolatency *blkiolat;
@@ -830,6 +830,31 @@ static void iolatency_clear_scaling(stru
 	}
 }
 
+static int blk_iolatency_try_init(char *input)
+{
+	static DEFINE_MUTEX(init_mutex);
+	struct block_device *bdev;
+	int ret = 0;
+
+	bdev = blkcg_conf_open_bdev(&input);
+	if (IS_ERR(bdev))
+		return PTR_ERR(bdev);
+
+	/*
+	 * blk_iolatency_init() may fail after rq_qos_add() succeeds which can
+	 * confuse iolat_rq_qos() test. Make the test and init atomic.
+	 */
+	mutex_lock(&init_mutex);
+
+	if (!iolat_rq_qos(bdev->bd_queue))
+		ret = blk_iolatency_init(bdev->bd_disk);
+
+	mutex_unlock(&init_mutex);
+	blkdev_put_no_open(bdev);
+
+	return ret;
+}
+
 static ssize_t iolatency_set_limit(struct kernfs_open_file *of, char *buf,
 			     size_t nbytes, loff_t off)
 {
@@ -842,7 +867,14 @@ static ssize_t iolatency_set_limit(struc
 	u64 oldval;
 	int ret;
 
+retry:
 	ret = blkg_conf_prep(blkcg, &blkcg_policy_iolatency, buf, &ctx);
+	if (ret == -EOPNOTSUPP) {
+		ret = blk_iolatency_try_init(buf);
+		if (ret)
+			return ret;
+		goto retry;
+	}
 	if (ret)
 		return ret;
 
@@ -974,7 +1006,7 @@ static void iolatency_pd_init(struct blk
 {
 	struct iolatency_grp *iolat = pd_to_lat(pd);
 	struct blkcg_gq *blkg = lat_to_blkg(iolat);
-	struct rq_qos *rqos = blkcg_rq_qos(blkg->q);
+	struct rq_qos *rqos = iolat_rq_qos(blkg->q);
 	struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos);
 	u64 now = ktime_to_ns(ktime_get());
 	int cpu;
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -74,7 +74,7 @@ static inline struct rq_qos *wbt_rq_qos(
 	return rq_qos_id(q, RQ_QOS_WBT);
 }
 
-static inline struct rq_qos *blkcg_rq_qos(struct request_queue *q)
+static inline struct rq_qos *iolat_rq_qos(struct request_queue *q)
 {
 	return rq_qos_id(q, RQ_QOS_LATENCY);
 }
--- a/block/blk.h
+++ b/block/blk.h
@@ -392,12 +392,6 @@ static inline struct bio *blk_queue_boun
 	return bio;
 }
 
-#ifdef CONFIG_BLK_CGROUP_IOLATENCY
-int blk_iolatency_init(struct gendisk *disk);
-#else
-static inline int blk_iolatency_init(struct gendisk *disk) { return 0; };
-#endif
-
 #ifdef CONFIG_BLK_DEV_ZONED
 void disk_free_zone_bitmaps(struct gendisk *disk);
 void disk_clear_zone_settings(struct gendisk *disk);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2 block/for-6.2] blk-iolatency: Fix memory leak on add_disk() failures
  2022-12-10 18:33 [PATCH 1/2 block/for-6.2] blk-iolatency: Fix memory leak on add_disk() failures Tejun Heo
  2022-12-10 18:34 ` [PATCH 2/2 block/for-6.2] blk-iolatency: Make initialization lazy Tejun Heo
@ 2022-12-10 18:54 ` Linus Torvalds
  2022-12-12  8:47   ` Christoph Hellwig
  2022-12-14 19:43   ` Jens Axboe
  3 siblings, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2022-12-10 18:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, linux-block, linux-kernel, darklight2357,
	Josef Bacik, cgroups

On Sat, Dec 10, 2022 at 10:33 AM Tejun Heo <tj@kernel.org> wrote:
>
> I'm posting two patches for the iolatency memory leak issue after add_disk()
> failure. This one is the immediate fix and should be really safe. However,
> any change has risks and given that the bug being address is not critical at
> all, I still think it'd make sense to route it through 6.2-rc1 rather than
> applying directly to master for 6.1 release. So, it's tagged for the 6.2
> merge window.

Ack. I'm archiving these patches, and expect I'll be getting them the
usual ways (ie pull request).

If people expect something else (like me applying them during the
merge window as patches), holler.

            Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2 block/for-6.2] blk-iolatency: Fix memory leak on add_disk() failures
@ 2022-12-12  8:47   ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2022-12-12  8:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, linux-block, linux-kernel, darklight2357,
	Josef Bacik, Linus Torvalds, cgroups

On Sat, Dec 10, 2022 at 08:33:10AM -1000, Tejun Heo wrote:
> When a gendisk is successfully initialized but add_disk() fails such as when
> a loop device has invalid number of minor device numbers specified,
> blkcg_init_disk() is called during init and then blkcg_exit_disk() during
> error handling. Unfortunately, iolatency gets initialized in the former but
> doesn't get cleaned up in the latter.
> 
> This is because, in non-error cases, the cleanup is performed by
> del_gendisk() calling rq_qos_exit(), the assumption being that rq_qos
> policies, iolatency being one of them, can only be activated once the disk
> is fully registered and visible. That assumption is true for wbt and iocost,
> but not so for iolatency as it gets initialized before add_disk() is called.
> 
> It is desirable to lazy-init rq_qos policies because they are optional
> features and add to hot path overhead once initialized - each IO has to walk
> all the registered rq_qos policies. So, we want to switch iolatency to lazy
> init too. However, that's a bigger change. As a fix for the immediate
> problem, let's just add an extra call to rq_qos_exit() in blkcg_exit_disk().
> This is safe because duplicate calls to rq_qos_exit() become noop's.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: darklight2357@icloud.com

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2 block/for-6.2] blk-iolatency: Fix memory leak on add_disk() failures
@ 2022-12-12  8:47   ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2022-12-12  8:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	darklight2357-mVuRI66OGLPQT0dZR+AlfA, Josef Bacik,
	Linus Torvalds, cgroups-u79uwXL29TY76Z2rM5mHXA

On Sat, Dec 10, 2022 at 08:33:10AM -1000, Tejun Heo wrote:
> When a gendisk is successfully initialized but add_disk() fails such as when
> a loop device has invalid number of minor device numbers specified,
> blkcg_init_disk() is called during init and then blkcg_exit_disk() during
> error handling. Unfortunately, iolatency gets initialized in the former but
> doesn't get cleaned up in the latter.
> 
> This is because, in non-error cases, the cleanup is performed by
> del_gendisk() calling rq_qos_exit(), the assumption being that rq_qos
> policies, iolatency being one of them, can only be activated once the disk
> is fully registered and visible. That assumption is true for wbt and iocost,
> but not so for iolatency as it gets initialized before add_disk() is called.
> 
> It is desirable to lazy-init rq_qos policies because they are optional
> features and add to hot path overhead once initialized - each IO has to walk
> all the registered rq_qos policies. So, we want to switch iolatency to lazy
> init too. However, that's a bigger change. As a fix for the immediate
> problem, let's just add an extra call to rq_qos_exit() in blkcg_exit_disk().
> This is safe because duplicate calls to rq_qos_exit() become noop's.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Reported-by: darklight2357-mVuRI66OGLPQT0dZR+AlfA@public.gmane.org

Looks good:

Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2 block/for-6.2] blk-iolatency: Make initialization lazy
  2022-12-10 18:34 ` [PATCH 2/2 block/for-6.2] blk-iolatency: Make initialization lazy Tejun Heo
@ 2022-12-12  8:53   ` Christoph Hellwig
  2022-12-12 22:44       ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2022-12-12  8:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, linux-block, linux-kernel, darklight2357,
	Josef Bacik, Linus Torvalds, cgroups

> +static int blk_iolatency_try_init(char *input)
> +{
> +	static DEFINE_MUTEX(init_mutex);
> +	struct block_device *bdev;
> +	int ret = 0;
> +
> +	bdev = blkcg_conf_open_bdev(&input);
> +	if (IS_ERR(bdev))
> +		return PTR_ERR(bdev);

> +retry:
>  	ret = blkg_conf_prep(blkcg, &blkcg_policy_iolatency, buf, &ctx);
> +	if (ret == -EOPNOTSUPP) {
> +		ret = blk_iolatency_try_init(buf);

It's a little sad to do two block device lookups here (even if it
obviously doesn't matter for performance).  I wonder if it would
make sense to explicitly support the lazy init pattern
in blkg_conf_prep somehow.

Otherwise I'm all for the lazy init.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2 block/for-6.2] blk-iolatency: Make initialization lazy
@ 2022-12-12 22:44       ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2022-12-12 22:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, linux-kernel, darklight2357,
	Josef Bacik, Linus Torvalds, cgroups

Hello, Christoph.

On Mon, Dec 12, 2022 at 12:53:26AM -0800, Christoph Hellwig wrote:
> > +static int blk_iolatency_try_init(char *input)
> > +{
> > +	static DEFINE_MUTEX(init_mutex);
> > +	struct block_device *bdev;
> > +	int ret = 0;
> > +
> > +	bdev = blkcg_conf_open_bdev(&input);
> > +	if (IS_ERR(bdev))
> > +		return PTR_ERR(bdev);
> 
> > +retry:
> >  	ret = blkg_conf_prep(blkcg, &blkcg_policy_iolatency, buf, &ctx);
> > +	if (ret == -EOPNOTSUPP) {
> > +		ret = blk_iolatency_try_init(buf);
> 
> It's a little sad to do two block device lookups here (even if it
> obviously doesn't matter for performance).  I wonder if it would
> make sense to explicitly support the lazy init pattern
> in blkg_conf_prep somehow.
> 
> Otherwise I'm all for the lazy init.

Yeah, I thought about separating out open_bdev from blkg_conf_prep() but the
added complexity didn't feel very attractive given the usage pattern. Lemme
take a stab at it. Maybe it won't look too bad.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2 block/for-6.2] blk-iolatency: Make initialization lazy
@ 2022-12-12 22:44       ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2022-12-12 22:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	darklight2357-mVuRI66OGLPQT0dZR+AlfA, Josef Bacik,
	Linus Torvalds, cgroups-u79uwXL29TY76Z2rM5mHXA

Hello, Christoph.

On Mon, Dec 12, 2022 at 12:53:26AM -0800, Christoph Hellwig wrote:
> > +static int blk_iolatency_try_init(char *input)
> > +{
> > +	static DEFINE_MUTEX(init_mutex);
> > +	struct block_device *bdev;
> > +	int ret = 0;
> > +
> > +	bdev = blkcg_conf_open_bdev(&input);
> > +	if (IS_ERR(bdev))
> > +		return PTR_ERR(bdev);
> 
> > +retry:
> >  	ret = blkg_conf_prep(blkcg, &blkcg_policy_iolatency, buf, &ctx);
> > +	if (ret == -EOPNOTSUPP) {
> > +		ret = blk_iolatency_try_init(buf);
> 
> It's a little sad to do two block device lookups here (even if it
> obviously doesn't matter for performance).  I wonder if it would
> make sense to explicitly support the lazy init pattern
> in blkg_conf_prep somehow.
> 
> Otherwise I'm all for the lazy init.

Yeah, I thought about separating out open_bdev from blkg_conf_prep() but the
added complexity didn't feel very attractive given the usage pattern. Lemme
take a stab at it. Maybe it won't look too bad.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: (subset) [PATCH 1/2 block/for-6.2] blk-iolatency: Fix memory leak on add_disk() failures
@ 2022-12-14 19:43   ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2022-12-14 19:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-block, linux-kernel, darklight2357, Josef Bacik,
	Linus Torvalds, cgroups


On Sat, 10 Dec 2022 08:33:10 -1000, Tejun Heo wrote:
> When a gendisk is successfully initialized but add_disk() fails such as when
> a loop device has invalid number of minor device numbers specified,
> blkcg_init_disk() is called during init and then blkcg_exit_disk() during
> error handling. Unfortunately, iolatency gets initialized in the former but
> doesn't get cleaned up in the latter.
> 
> This is because, in non-error cases, the cleanup is performed by
> del_gendisk() calling rq_qos_exit(), the assumption being that rq_qos
> policies, iolatency being one of them, can only be activated once the disk
> is fully registered and visible. That assumption is true for wbt and iocost,
> but not so for iolatency as it gets initialized before add_disk() is called.
> 
> [...]

Applied, thanks!

[1/2] blk-iolatency: Fix memory leak on add_disk() failures
      commit: 813e693023ba10da9e75067780f8378465bf27cc

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: (subset) [PATCH 1/2 block/for-6.2] blk-iolatency: Fix memory leak on add_disk() failures
@ 2022-12-14 19:43   ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2022-12-14 19:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	darklight2357-mVuRI66OGLPQT0dZR+AlfA, Josef Bacik,
	Linus Torvalds, cgroups-u79uwXL29TY76Z2rM5mHXA


On Sat, 10 Dec 2022 08:33:10 -1000, Tejun Heo wrote:
> When a gendisk is successfully initialized but add_disk() fails such as when
> a loop device has invalid number of minor device numbers specified,
> blkcg_init_disk() is called during init and then blkcg_exit_disk() during
> error handling. Unfortunately, iolatency gets initialized in the former but
> doesn't get cleaned up in the latter.
> 
> This is because, in non-error cases, the cleanup is performed by
> del_gendisk() calling rq_qos_exit(), the assumption being that rq_qos
> policies, iolatency being one of them, can only be activated once the disk
> is fully registered and visible. That assumption is true for wbt and iocost,
> but not so for iolatency as it gets initialized before add_disk() is called.
> 
> [...]

Applied, thanks!

[1/2] blk-iolatency: Fix memory leak on add_disk() failures
      commit: 813e693023ba10da9e75067780f8378465bf27cc

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-12-14 19:43 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-10 18:33 [PATCH 1/2 block/for-6.2] blk-iolatency: Fix memory leak on add_disk() failures Tejun Heo
2022-12-10 18:34 ` [PATCH 2/2 block/for-6.2] blk-iolatency: Make initialization lazy Tejun Heo
2022-12-12  8:53   ` Christoph Hellwig
2022-12-12 22:44     ` Tejun Heo
2022-12-12 22:44       ` Tejun Heo
2022-12-10 18:54 ` [PATCH 1/2 block/for-6.2] blk-iolatency: Fix memory leak on add_disk() failures Linus Torvalds
2022-12-12  8:47 ` Christoph Hellwig
2022-12-12  8:47   ` Christoph Hellwig
2022-12-14 19:43 ` (subset) " Jens Axboe
2022-12-14 19:43   ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.