All of lore.kernel.org
 help / color / mirror / Atom feed
* untangle the request_queue refcounting from the queue kobject v2
@ 2022-11-14  4:26 Christoph Hellwig
  2022-11-14  4:26 ` [PATCH 1/5] blk-crypto: pass a gendisk to blk_crypto_sysfs_{,un}register Christoph Hellwig
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Christoph Hellwig @ 2022-11-14  4:26 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Eric Biggers, linux-block

Hi Jens,

this series cleans up the registration of the "queue/" kobject, and given
untangles it from the request_queue refcounting.

Changes since v1:
 - also change the blk_crypto_sysfs_unregister prototype
 - add two patches to fix the error handling in blk_register_queue

Diffstat:
 block/blk-core.c            |   44 +++++++++++----
 block/blk-crypto-internal.h |   10 ++-
 block/blk-crypto-sysfs.c    |   11 ++-
 block/blk-ia-ranges.c       |    3 -
 block/blk-sysfs.c           |  129 +++++++++++++++++---------------------------
 block/blk.h                 |    4 -
 block/bsg.c                 |   11 ++-
 block/elevator.c            |    2 
 include/linux/blkdev.h      |    6 --
 9 files changed, 108 insertions(+), 112 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/5] blk-crypto: pass a gendisk to blk_crypto_sysfs_{,un}register
  2022-11-14  4:26 untangle the request_queue refcounting from the queue kobject v2 Christoph Hellwig
@ 2022-11-14  4:26 ` Christoph Hellwig
  2022-11-18  2:59   ` Eric Biggers
  2022-11-14  4:26 ` [PATCH 2/5] block: factor out a blk_debugfs_remove helper Christoph Hellwig
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2022-11-14  4:26 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Eric Biggers, linux-block

Prepare for changes to the block layer sysfs handling by passing the
readily available gendisk to blk_crypto_sysfs_{,un}register.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-crypto-internal.h | 10 ++++++----
 block/blk-crypto-sysfs.c    |  7 ++++---
 block/blk-sysfs.c           |  4 ++--
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/block/blk-crypto-internal.h b/block/blk-crypto-internal.h
index e6818ffaddbf8..b8a00847171f6 100644
--- a/block/blk-crypto-internal.h
+++ b/block/blk-crypto-internal.h
@@ -21,9 +21,9 @@ extern const struct blk_crypto_mode blk_crypto_modes[];
 
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
 
-int blk_crypto_sysfs_register(struct request_queue *q);
+int blk_crypto_sysfs_register(struct gendisk *disk);
 
-void blk_crypto_sysfs_unregister(struct request_queue *q);
+void blk_crypto_sysfs_unregister(struct gendisk *disk);
 
 void bio_crypt_dun_increment(u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
 			     unsigned int inc);
@@ -67,12 +67,14 @@ static inline bool blk_crypto_rq_is_encrypted(struct request *rq)
 
 #else /* CONFIG_BLK_INLINE_ENCRYPTION */
 
-static inline int blk_crypto_sysfs_register(struct request_queue *q)
+static inline int blk_crypto_sysfs_register(struct gendisk *disk)
 {
 	return 0;
 }
 
-static inline void blk_crypto_sysfs_unregister(struct request_queue *q) { }
+static inline void blk_crypto_sysfs_unregister(struct gendisk *disk)
+{
+}
 
 static inline bool bio_crypt_rq_ctx_compatible(struct request *rq,
 					       struct bio *bio)
diff --git a/block/blk-crypto-sysfs.c b/block/blk-crypto-sysfs.c
index fd93bd2f33b75..e05f145cd797f 100644
--- a/block/blk-crypto-sysfs.c
+++ b/block/blk-crypto-sysfs.c
@@ -126,8 +126,9 @@ static struct kobj_type blk_crypto_ktype = {
  * If the request_queue has a blk_crypto_profile, create the "crypto"
  * subdirectory in sysfs (/sys/block/$disk/queue/crypto/).
  */
-int blk_crypto_sysfs_register(struct request_queue *q)
+int blk_crypto_sysfs_register(struct gendisk *disk)
 {
+	struct request_queue *q = disk->queue;
 	struct blk_crypto_kobj *obj;
 	int err;
 
@@ -149,9 +150,9 @@ int blk_crypto_sysfs_register(struct request_queue *q)
 	return 0;
 }
 
-void blk_crypto_sysfs_unregister(struct request_queue *q)
+void blk_crypto_sysfs_unregister(struct gendisk *disk)
 {
-	kobject_put(q->crypto_kobject);
+	kobject_put(disk->queue->crypto_kobject);
 }
 
 static int __init blk_crypto_sysfs_init(void)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 02e94c4beff17..bd223a3bef47d 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -836,7 +836,7 @@ int blk_register_queue(struct gendisk *disk)
 			goto put_dev;
 	}
 
-	ret = blk_crypto_sysfs_register(q);
+	ret = blk_crypto_sysfs_register(disk);
 	if (ret)
 		goto put_dev;
 
@@ -913,7 +913,7 @@ void blk_unregister_queue(struct gendisk *disk)
 	 */
 	if (queue_is_mq(q))
 		blk_mq_sysfs_unregister(disk);
-	blk_crypto_sysfs_unregister(q);
+	blk_crypto_sysfs_unregister(disk);
 
 	mutex_lock(&q->sysfs_lock);
 	elv_unregister_queue(q);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/5] block: factor out a blk_debugfs_remove helper
  2022-11-14  4:26 untangle the request_queue refcounting from the queue kobject v2 Christoph Hellwig
  2022-11-14  4:26 ` [PATCH 1/5] blk-crypto: pass a gendisk to blk_crypto_sysfs_{,un}register Christoph Hellwig
@ 2022-11-14  4:26 ` Christoph Hellwig
  2022-11-14  4:26 ` [PATCH 3/5] block: fix error unwinding in blk_register_queue Christoph Hellwig
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2022-11-14  4:26 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Eric Biggers, linux-block

Split the debugfs removal from blk_unregister_queue into a helper so that
the it can be reused for blk_register_queue error handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-sysfs.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index bd223a3bef47d..197646d479b4a 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -800,6 +800,19 @@ struct kobj_type blk_queue_ktype = {
 	.release	= blk_release_queue,
 };
 
+static void blk_debugfs_remove(struct gendisk *disk)
+{
+	struct request_queue *q = disk->queue;
+
+	mutex_lock(&q->debugfs_mutex);
+	blk_trace_shutdown(q);
+	debugfs_remove_recursive(q->debugfs_dir);
+	q->debugfs_dir = NULL;
+	q->sched_debugfs_dir = NULL;
+	q->rqos_debugfs_dir = NULL;
+	mutex_unlock(&q->debugfs_mutex);
+}
+
 /**
  * blk_register_queue - register a block layer queue with sysfs
  * @disk: Disk of which the request queue should be registered with sysfs.
@@ -925,11 +938,5 @@ void blk_unregister_queue(struct gendisk *disk)
 	kobject_del(&q->kobj);
 	mutex_unlock(&q->sysfs_dir_lock);
 
-	mutex_lock(&q->debugfs_mutex);
-	blk_trace_shutdown(q);
-	debugfs_remove_recursive(q->debugfs_dir);
-	q->debugfs_dir = NULL;
-	q->sched_debugfs_dir = NULL;
-	q->rqos_debugfs_dir = NULL;
-	mutex_unlock(&q->debugfs_mutex);
+	blk_debugfs_remove(disk);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/5] block: fix error unwinding in blk_register_queue
  2022-11-14  4:26 untangle the request_queue refcounting from the queue kobject v2 Christoph Hellwig
  2022-11-14  4:26 ` [PATCH 1/5] blk-crypto: pass a gendisk to blk_crypto_sysfs_{,un}register Christoph Hellwig
  2022-11-14  4:26 ` [PATCH 2/5] block: factor out a blk_debugfs_remove helper Christoph Hellwig
@ 2022-11-14  4:26 ` Christoph Hellwig
  2022-11-14  4:26 ` [PATCH 4/5] block: untangle request_queue refcounting from sysfs Christoph Hellwig
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2022-11-14  4:26 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Eric Biggers, linux-block

blk_register_queue fails to handle errors from blk_mq_sysfs_register,
leaks various resources on errors and accidentally sets queue refs percpu
refcount to percpu mode on kobject_add failure.  Fix all that by
properly unwinding on errors.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-sysfs.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 197646d479b4a..abd1784ff05e3 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -823,13 +823,15 @@ int blk_register_queue(struct gendisk *disk)
 	int ret;
 
 	mutex_lock(&q->sysfs_dir_lock);
-
 	ret = kobject_add(&q->kobj, &disk_to_dev(disk)->kobj, "queue");
 	if (ret < 0)
-		goto unlock;
+		goto out_unlock_dir;
 
-	if (queue_is_mq(q))
-		blk_mq_sysfs_register(disk);
+	if (queue_is_mq(q)) {
+		ret = blk_mq_sysfs_register(disk);
+		if (ret)
+			goto out_del_queue_kobj;
+	}
 	mutex_lock(&q->sysfs_lock);
 
 	mutex_lock(&q->debugfs_mutex);
@@ -841,17 +843,17 @@ int blk_register_queue(struct gendisk *disk)
 
 	ret = disk_register_independent_access_ranges(disk);
 	if (ret)
-		goto put_dev;
+		goto out_debugfs_remove;
 
 	if (q->elevator) {
 		ret = elv_register_queue(q, false);
 		if (ret)
-			goto put_dev;
+			goto out_unregister_ia_ranges;
 	}
 
 	ret = blk_crypto_sysfs_register(disk);
 	if (ret)
-		goto put_dev;
+		goto out_elv_unregister;
 
 	blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
 	wbt_enable_default(q);
@@ -862,8 +864,6 @@ int blk_register_queue(struct gendisk *disk)
 	if (q->elevator)
 		kobject_uevent(&q->elevator->kobj, KOBJ_ADD);
 	mutex_unlock(&q->sysfs_lock);
-
-unlock:
 	mutex_unlock(&q->sysfs_dir_lock);
 
 	/*
@@ -882,13 +882,17 @@ int blk_register_queue(struct gendisk *disk)
 
 	return ret;
 
-put_dev:
+out_elv_unregister:
 	elv_unregister_queue(q);
+out_unregister_ia_ranges:
 	disk_unregister_independent_access_ranges(disk);
+out_debugfs_remove:
+	blk_debugfs_remove(disk);
 	mutex_unlock(&q->sysfs_lock);
-	mutex_unlock(&q->sysfs_dir_lock);
+out_del_queue_kobj:
 	kobject_del(&q->kobj);
-
+out_unlock_dir:
+	mutex_unlock(&q->sysfs_dir_lock);
 	return ret;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/5] block: untangle request_queue refcounting from sysfs
  2022-11-14  4:26 untangle the request_queue refcounting from the queue kobject v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2022-11-14  4:26 ` [PATCH 3/5] block: fix error unwinding in blk_register_queue Christoph Hellwig
@ 2022-11-14  4:26 ` Christoph Hellwig
  2022-11-14  4:26 ` [PATCH 5/5] block: mark blk_put_queue as potentially blocking Christoph Hellwig
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2022-11-14  4:26 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Eric Biggers, linux-block

The kobject embedded into the request_queue is used for the queue
directory in sysfs, but that is a child of the gendisks directory and is
intimately tied to it.  Move this kobject to the gendisk and use a
refcount_t in the request_queue for the actual request_queue refcounting
that is completely unrelated to the device model.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c         | 42 ++++++++++++++++----
 block/blk-crypto-sysfs.c |  4 +-
 block/blk-ia-ranges.c    |  3 +-
 block/blk-sysfs.c        | 86 +++++++++++-----------------------------
 block/blk.h              |  4 --
 block/bsg.c              | 11 +++--
 block/elevator.c         |  2 +-
 include/linux/blkdev.h   |  6 +--
 8 files changed, 71 insertions(+), 87 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e9e2bf15cd909..d14317bfdf654 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -59,12 +59,12 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(block_split);
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_unplug);
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_insert);
 
-DEFINE_IDA(blk_queue_ida);
+static DEFINE_IDA(blk_queue_ida);
 
 /*
  * For queue allocation
  */
-struct kmem_cache *blk_requestq_cachep;
+static struct kmem_cache *blk_requestq_cachep;
 
 /*
  * Controlling structure to kblockd
@@ -252,19 +252,46 @@ void blk_clear_pm_only(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_clear_pm_only);
 
+static void blk_free_queue_rcu(struct rcu_head *rcu_head)
+{
+	kmem_cache_free(blk_requestq_cachep,
+			container_of(rcu_head, struct request_queue, rcu_head));
+}
+
+static void blk_free_queue(struct request_queue *q)
+{
+	might_sleep();
+
+	percpu_ref_exit(&q->q_usage_counter);
+
+	if (q->poll_stat)
+		blk_stat_remove_callback(q, q->poll_cb);
+	blk_stat_free_callback(q->poll_cb);
+
+	blk_free_queue_stats(q->stats);
+	kfree(q->poll_stat);
+
+	if (queue_is_mq(q))
+		blk_mq_release(q);
+
+	ida_free(&blk_queue_ida, q->id);
+	call_rcu(&q->rcu_head, blk_free_queue_rcu);
+}
+
 /**
  * blk_put_queue - decrement the request_queue refcount
  * @q: the request_queue structure to decrement the refcount for
  *
- * Decrements the refcount of the request_queue kobject. When this reaches 0
- * we'll have blk_release_queue() called.
+ * Decrements the refcount of the request_queue and free it when the refcount
+ * reaches 0.
  *
  * Context: Any context, but the last reference must not be dropped from
  *          atomic context.
  */
 void blk_put_queue(struct request_queue *q)
 {
-	kobject_put(&q->kobj);
+	if (refcount_dec_and_test(&q->refs))
+		blk_free_queue(q);
 }
 EXPORT_SYMBOL(blk_put_queue);
 
@@ -399,8 +426,7 @@ struct request_queue *blk_alloc_queue(int node_id)
 	INIT_WORK(&q->timeout_work, blk_timeout_work);
 	INIT_LIST_HEAD(&q->icq_list);
 
-	kobject_init(&q->kobj, &blk_queue_ktype);
-
+	refcount_set(&q->refs, 1);
 	mutex_init(&q->debugfs_mutex);
 	mutex_init(&q->sysfs_lock);
 	mutex_init(&q->sysfs_dir_lock);
@@ -445,7 +471,7 @@ bool blk_get_queue(struct request_queue *q)
 {
 	if (unlikely(blk_queue_dying(q)))
 		return false;
-	kobject_get(&q->kobj);
+	refcount_inc(&q->refs);
 	return true;
 }
 EXPORT_SYMBOL(blk_get_queue);
diff --git a/block/blk-crypto-sysfs.c b/block/blk-crypto-sysfs.c
index e05f145cd797f..55268edc06253 100644
--- a/block/blk-crypto-sysfs.c
+++ b/block/blk-crypto-sysfs.c
@@ -140,8 +140,8 @@ int blk_crypto_sysfs_register(struct gendisk *disk)
 		return -ENOMEM;
 	obj->profile = q->crypto_profile;
 
-	err = kobject_init_and_add(&obj->kobj, &blk_crypto_ktype, &q->kobj,
-				   "crypto");
+	err = kobject_init_and_add(&obj->kobj, &blk_crypto_ktype,
+				   &disk->queue_kobj, "crypto");
 	if (err) {
 		kobject_put(&obj->kobj);
 		return err;
diff --git a/block/blk-ia-ranges.c b/block/blk-ia-ranges.c
index 2bd1d311033b5..2141931ddd37e 100644
--- a/block/blk-ia-ranges.c
+++ b/block/blk-ia-ranges.c
@@ -123,7 +123,8 @@ int disk_register_independent_access_ranges(struct gendisk *disk)
 	 */
 	WARN_ON(iars->sysfs_registered);
 	ret = kobject_init_and_add(&iars->kobj, &blk_ia_ranges_ktype,
-				   &q->kobj, "%s", "independent_access_ranges");
+				   &disk->queue_kobj, "%s",
+				   "independent_access_ranges");
 	if (ret) {
 		disk->ia_ranges = NULL;
 		kobject_put(&iars->kobj);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index abd1784ff05e3..93d9e9c9a6ea8 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -683,8 +683,8 @@ static struct attribute *queue_attrs[] = {
 static umode_t queue_attr_visible(struct kobject *kobj, struct attribute *attr,
 				int n)
 {
-	struct request_queue *q =
-		container_of(kobj, struct request_queue, kobj);
+	struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj);
+	struct request_queue *q = disk->queue;
 
 	if (attr == &queue_io_timeout_entry.attr &&
 		(!q->mq_ops || !q->mq_ops->timeout))
@@ -710,8 +710,8 @@ static ssize_t
 queue_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
 {
 	struct queue_sysfs_entry *entry = to_queue(attr);
-	struct request_queue *q =
-		container_of(kobj, struct request_queue, kobj);
+	struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj);
+	struct request_queue *q = disk->queue;
 	ssize_t res;
 
 	if (!entry->show)
@@ -727,63 +727,19 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
 		    const char *page, size_t length)
 {
 	struct queue_sysfs_entry *entry = to_queue(attr);
-	struct request_queue *q;
+	struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj);
+	struct request_queue *q = disk->queue;
 	ssize_t res;
 
 	if (!entry->store)
 		return -EIO;
 
-	q = container_of(kobj, struct request_queue, kobj);
 	mutex_lock(&q->sysfs_lock);
 	res = entry->store(q, page, length);
 	mutex_unlock(&q->sysfs_lock);
 	return res;
 }
 
-static void blk_free_queue_rcu(struct rcu_head *rcu_head)
-{
-	kmem_cache_free(blk_requestq_cachep,
-			container_of(rcu_head, struct request_queue, rcu_head));
-}
-
-/**
- * blk_release_queue - releases all allocated resources of the request_queue
- * @kobj: pointer to a kobject, whose container is a request_queue
- *
- * This function releases all allocated resources of the request queue.
- *
- * The struct request_queue refcount is incremented with blk_get_queue() and
- * decremented with blk_put_queue(). Once the refcount reaches 0 this function
- * is called.
- *
- * Drivers exist which depend on the release of the request_queue to be
- * synchronous, it should not be deferred.
- *
- * Context: can sleep
- */
-static void blk_release_queue(struct kobject *kobj)
-{
-	struct request_queue *q =
-		container_of(kobj, struct request_queue, kobj);
-
-	might_sleep();
-
-	percpu_ref_exit(&q->q_usage_counter);
-
-	if (q->poll_stat)
-		blk_stat_remove_callback(q, q->poll_cb);
-	blk_stat_free_callback(q->poll_cb);
-
-	blk_free_queue_stats(q->stats);
-	kfree(q->poll_stat);
-
-	if (queue_is_mq(q))
-		blk_mq_release(q);
-
-	ida_free(&blk_queue_ida, q->id);
-	call_rcu(&q->rcu_head, blk_free_queue_rcu);
-}
-
 static const struct sysfs_ops queue_sysfs_ops = {
 	.show	= queue_attr_show,
 	.store	= queue_attr_store,
@@ -794,10 +750,15 @@ static const struct attribute_group *blk_queue_attr_groups[] = {
 	NULL
 };
 
-struct kobj_type blk_queue_ktype = {
+static void blk_queue_release(struct kobject *kobj)
+{
+	/* nothing to do here, all data is associated with the parent gendisk */
+}
+
+static struct kobj_type blk_queue_ktype = {
 	.default_groups = blk_queue_attr_groups,
 	.sysfs_ops	= &queue_sysfs_ops,
-	.release	= blk_release_queue,
+	.release	= blk_queue_release,
 };
 
 static void blk_debugfs_remove(struct gendisk *disk)
@@ -823,20 +784,20 @@ int blk_register_queue(struct gendisk *disk)
 	int ret;
 
 	mutex_lock(&q->sysfs_dir_lock);
-	ret = kobject_add(&q->kobj, &disk_to_dev(disk)->kobj, "queue");
+	kobject_init(&disk->queue_kobj, &blk_queue_ktype);
+	ret = kobject_add(&disk->queue_kobj, &disk_to_dev(disk)->kobj, "queue");
 	if (ret < 0)
-		goto out_unlock_dir;
+		goto out_put_queue_kobj;
 
 	if (queue_is_mq(q)) {
 		ret = blk_mq_sysfs_register(disk);
 		if (ret)
-			goto out_del_queue_kobj;
+			goto out_put_queue_kobj;
 	}
 	mutex_lock(&q->sysfs_lock);
 
 	mutex_lock(&q->debugfs_mutex);
-	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
-					    blk_debugfs_root);
+	q->debugfs_dir = debugfs_create_dir(disk->disk_name, blk_debugfs_root);
 	if (queue_is_mq(q))
 		blk_mq_debugfs_register(q);
 	mutex_unlock(&q->debugfs_mutex);
@@ -860,7 +821,7 @@ int blk_register_queue(struct gendisk *disk)
 	blk_throtl_register(disk);
 
 	/* Now everything is ready and send out KOBJ_ADD uevent */
-	kobject_uevent(&q->kobj, KOBJ_ADD);
+	kobject_uevent(&disk->queue_kobj, KOBJ_ADD);
 	if (q->elevator)
 		kobject_uevent(&q->elevator->kobj, KOBJ_ADD);
 	mutex_unlock(&q->sysfs_lock);
@@ -889,9 +850,8 @@ int blk_register_queue(struct gendisk *disk)
 out_debugfs_remove:
 	blk_debugfs_remove(disk);
 	mutex_unlock(&q->sysfs_lock);
-out_del_queue_kobj:
-	kobject_del(&q->kobj);
-out_unlock_dir:
+out_put_queue_kobj:
+	kobject_put(&disk->queue_kobj);
 	mutex_unlock(&q->sysfs_dir_lock);
 	return ret;
 }
@@ -938,8 +898,8 @@ void blk_unregister_queue(struct gendisk *disk)
 	mutex_unlock(&q->sysfs_lock);
 
 	/* Now that we've deleted all child objects, we can delete the queue. */
-	kobject_uevent(&q->kobj, KOBJ_REMOVE);
-	kobject_del(&q->kobj);
+	kobject_uevent(&disk->queue_kobj, KOBJ_REMOVE);
+	kobject_del(&disk->queue_kobj);
 	mutex_unlock(&q->sysfs_dir_lock);
 
 	blk_debugfs_remove(disk);
diff --git a/block/blk.h b/block/blk.h
index e85703ae81dd1..a8ac9803fcb36 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -26,10 +26,6 @@ struct blk_flush_queue {
 	spinlock_t		mq_flush_lock;
 };
 
-extern struct kmem_cache *blk_requestq_cachep;
-extern struct kobj_type blk_queue_ktype;
-extern struct ida blk_queue_ida;
-
 bool is_flush_rq(struct request *req);
 
 struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
diff --git a/block/bsg.c b/block/bsg.c
index 2ab1351eb0823..8eba57b9bb461 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -175,8 +175,10 @@ static void bsg_device_release(struct device *dev)
 
 void bsg_unregister_queue(struct bsg_device *bd)
 {
-	if (bd->queue->kobj.sd)
-		sysfs_remove_link(&bd->queue->kobj, "bsg");
+	struct gendisk *disk = bd->queue->disk;
+
+	if (disk && disk->queue_kobj.sd)
+		sysfs_remove_link(&disk->queue_kobj, "bsg");
 	cdev_device_del(&bd->cdev, &bd->device);
 	put_device(&bd->device);
 }
@@ -216,8 +218,9 @@ struct bsg_device *bsg_register_queue(struct request_queue *q,
 	if (ret)
 		goto out_put_device;
 
-	if (q->kobj.sd) {
-		ret = sysfs_create_link(&q->kobj, &bd->device.kobj, "bsg");
+	if (q->disk && q->disk->queue_kobj.sd) {
+		ret = sysfs_create_link(&q->disk->queue_kobj, &bd->device.kobj,
+					"bsg");
 		if (ret)
 			goto out_device_del;
 	}
diff --git a/block/elevator.c b/block/elevator.c
index a5bdc3b1e7e5b..e51e4dc96d2e4 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -468,7 +468,7 @@ int elv_register_queue(struct request_queue *q, bool uevent)
 
 	lockdep_assert_held(&q->sysfs_lock);
 
-	error = kobject_add(&e->kobj, &q->kobj, "%s", "iosched");
+	error = kobject_add(&e->kobj, &q->disk->queue_kobj, "iosched");
 	if (!error) {
 		struct elv_fs_entry *attr = e->type->elevator_attrs;
 		if (attr) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9188aa3f62595..6e6d172309880 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -155,6 +155,7 @@ struct gendisk {
 	unsigned open_partitions;	/* number of open partitions */
 
 	struct backing_dev_info	*bdi;
+	struct kobject queue_kobj;	/* the queue/ directory */
 	struct kobject *slave_dir;
 #ifdef CONFIG_BLOCK_HOLDER_DEPRECATED
 	struct list_head slave_bdevs;
@@ -430,10 +431,7 @@ struct request_queue {
 
 	struct gendisk		*disk;
 
-	/*
-	 * queue kobject
-	 */
-	struct kobject kobj;
+	refcount_t		refs;
 
 	/*
 	 * mq queue kobject
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/5] block: mark blk_put_queue as potentially blocking
  2022-11-14  4:26 untangle the request_queue refcounting from the queue kobject v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2022-11-14  4:26 ` [PATCH 4/5] block: untangle request_queue refcounting from sysfs Christoph Hellwig
@ 2022-11-14  4:26 ` Christoph Hellwig
  2022-11-19  2:19 ` untangle the request_queue refcounting from the queue kobject v2 Al Viro
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2022-11-14  4:26 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Eric Biggers, linux-block

We can't just say that the last reference release may block, as any
reference dropped could be the last one.  So move the might_sleep() from
blk_free_queue to blk_put_queue and update the documentation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d14317bfdf654..8ab21dd01cd1c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -260,8 +260,6 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head)
 
 static void blk_free_queue(struct request_queue *q)
 {
-	might_sleep();
-
 	percpu_ref_exit(&q->q_usage_counter);
 
 	if (q->poll_stat)
@@ -285,11 +283,11 @@ static void blk_free_queue(struct request_queue *q)
  * Decrements the refcount of the request_queue and free it when the refcount
  * reaches 0.
  *
- * Context: Any context, but the last reference must not be dropped from
- *          atomic context.
+ * Context: Can sleep.
  */
 void blk_put_queue(struct request_queue *q)
 {
+	might_sleep();
 	if (refcount_dec_and_test(&q->refs))
 		blk_free_queue(q);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] blk-crypto: pass a gendisk to blk_crypto_sysfs_{,un}register
  2022-11-14  4:26 ` [PATCH 1/5] blk-crypto: pass a gendisk to blk_crypto_sysfs_{,un}register Christoph Hellwig
@ 2022-11-18  2:59   ` Eric Biggers
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Biggers @ 2022-11-18  2:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block

On Mon, Nov 14, 2022 at 05:26:33AM +0100, Christoph Hellwig wrote:
> Prepare for changes to the block layer sysfs handling by passing the
> readily available gendisk to blk_crypto_sysfs_{,un}register.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-crypto-internal.h | 10 ++++++----
>  block/blk-crypto-sysfs.c    |  7 ++++---
>  block/blk-sysfs.c           |  4 ++--
>  3 files changed, 12 insertions(+), 9 deletions(-)

Reviewed-by: Eric Biggers <ebiggers@google.com>

- Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: untangle the request_queue refcounting from the queue kobject v2
  2022-11-14  4:26 untangle the request_queue refcounting from the queue kobject v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2022-11-14  4:26 ` [PATCH 5/5] block: mark blk_put_queue as potentially blocking Christoph Hellwig
@ 2022-11-19  2:19 ` Al Viro
  2022-11-19  3:00   ` Al Viro
  2022-11-21  7:02   ` Christoph Hellwig
  2022-11-21  8:27 ` Christoph Hellwig
  2022-11-30 18:09 ` Jens Axboe
  7 siblings, 2 replies; 13+ messages in thread
From: Al Viro @ 2022-11-19  2:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Eric Biggers, linux-block

On Mon, Nov 14, 2022 at 05:26:32AM +0100, Christoph Hellwig wrote:
> Hi Jens,
> 
> this series cleans up the registration of the "queue/" kobject, and given
> untangles it from the request_queue refcounting.
> 
> Changes since v1:
>  - also change the blk_crypto_sysfs_unregister prototype
>  - add two patches to fix the error handling in blk_register_queue

Umm...  Do we ever want access to queue parameters of the stuff that has
a queue, but no associated gendisk?  SCSI tape, for example...

	Re refcounting: AFAICS, blk_mq_alloc_disk_for_queue() is broken.
__alloc_disk_node() consumes queue reference (and stuffs it into gendisk->queue)
on success; on failure it leaves the reference alone.  E.g. this

struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
		struct lock_class_key *lkclass)
{
	struct request_queue *q;
	struct gendisk *disk;

	q = blk_mq_init_queue_data(set, queuedata);
	if (IS_ERR(q))
		return ERR_CAST(q);

// we hold the initial reference to q
	disk = __alloc_disk_node(q, set->numa_node, lkclass);
	if (!disk) {
// __alloc_disk_node() failed, we are still holding q
		blk_mq_destroy_queue(q);
		blk_put_queue(q);
// reference dropped
		return ERR_PTR(-ENOMEM);
	}
	set_bit(GD_OWNS_QUEUE, &disk->state);
	return disk;
// ... and on success, the reference is consumed by disk, which is returned to caller
}

is fine; however, this
struct gendisk *blk_mq_alloc_disk_for_queue(struct request_queue *q,
		struct lock_class_key *lkclass)
{
	if (!blk_get_queue(q))
		return NULL;
	return __alloc_disk_node(q, NUMA_NO_NODE, lkclass);
}
can't be right - we might fail in blk_get_queue(), returning NULL with
unchanged refcount, we might succeed and return the new gendisk that
has consumed the extra reference grabbed by blk_get_queue() *OR*
we might grab an extra reference, fail in __alloc_disk_node() and
return NULL with refcount on q bumped.  No way for caller to tell these
failure modes from each other...  The callers (both sd and sr) treat
both as "no reference grabbed", i.e. leak the queue refcount if they
fail past grabbing the queue.

Looks like we should drop the queue if __alloc_disk_node() fails.  As in

struct gendisk *blk_mq_alloc_disk_for_queue(struct request_queue *q,
		struct lock_class_key *lkclass)
{
	struct gendisk *disk;

	if (!blk_get_queue(q))
		return NULL;
	disk = __alloc_disk_node(q, NUMA_NO_NODE, lkclass);
	if (!disk)
		blk_put_queue(q);
	return disk;
}

Objections?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: untangle the request_queue refcounting from the queue kobject v2
  2022-11-19  2:19 ` untangle the request_queue refcounting from the queue kobject v2 Al Viro
@ 2022-11-19  3:00   ` Al Viro
  2022-11-21  7:03     ` Christoph Hellwig
  2022-11-21  7:02   ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Al Viro @ 2022-11-19  3:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Eric Biggers, linux-block

On Sat, Nov 19, 2022 at 02:19:43AM +0000, Al Viro wrote:
> On Mon, Nov 14, 2022 at 05:26:32AM +0100, Christoph Hellwig wrote:
> > Hi Jens,
> > 
> > this series cleans up the registration of the "queue/" kobject, and given
> > untangles it from the request_queue refcounting.
> > 
> > Changes since v1:
> >  - also change the blk_crypto_sysfs_unregister prototype
> >  - add two patches to fix the error handling in blk_register_queue
> 
> Umm...  Do we ever want access to queue parameters of the stuff that has
> a queue, but no associated gendisk?  SCSI tape, for example...
> 
> 	Re refcounting: AFAICS, blk_mq_alloc_disk_for_queue() is broken.

[snip]

> can't be right - we might fail in blk_get_queue(), returning NULL with
> unchanged refcount, we might succeed and return the new gendisk that
> has consumed the extra reference grabbed by blk_get_queue() *OR*
> we might grab an extra reference, fail in __alloc_disk_node() and
> return NULL with refcount on q bumped.  No way for caller to tell these
> failure modes from each other...  The callers (both sd and sr) treat
> both as "no reference grabbed", i.e. leak the queue refcount if they
> fail past grabbing the queue.

Speaking of leaks, how can this
	q = blk_mq_init_queue(&sdev->host->tag_set);
	if (IS_ERR(q)) {
		/* release fn is set up in scsi_sysfs_device_initialise, so
		 * have to free and put manually here */
		put_device(&starget->dev);
		kfree(sdev);
		goto out;
	}
	kref_get(&sdev->host->tagset_refcnt);
	sdev->request_queue = q;
	q->queuedata = sdev;
	__scsi_init_queue(sdev->host, q);

	depth = sdev->host->cmd_per_lun ?: 1;

	/*
	 * Use .can_queue as budget map's depth because we have to
	 * support adjusting queue depth from sysfs. Meantime use
	 * default device queue depth to figure out sbitmap shift
	 * since we use this queue depth most of times.
	 */
	if (scsi_realloc_sdev_budget_map(sdev, depth)) {
		put_device(&starget->dev);
		kfree(sdev);
		goto out;
	}
	...
out:
        if (display_failure_msg)
                printk(ALLOC_FAILURE_MSG, __func__);
        return NULL;


in scsi_alloc_sdev() possibly avoid leaking sdev->request_queue on the
second failure exit?  AFAICS scsi_realloc_sdev_budget_map() will see
NULL in sdev->budget_map.map, attempt
        ret = sbitmap_init_node(&sdev->budget_map,
                                scsi_device_max_queue_depth(sdev),
                                new_shift, GFP_KERNEL,
                                sdev->request_queue->node, false, true);
and if that fails - return without having even looked at sdev->request_queue.
Then we drop startget->dev (which has no way to observe sdev or anything in
it) and kfree sdev, which leaves q the only place where we have the address
of queue.  And we don't look at q after that point...

Shouldn't we do blk_mq_destroy_queue()/blk_put_queue() on that failure
exit?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: untangle the request_queue refcounting from the queue kobject v2
  2022-11-19  2:19 ` untangle the request_queue refcounting from the queue kobject v2 Al Viro
  2022-11-19  3:00   ` Al Viro
@ 2022-11-21  7:02   ` Christoph Hellwig
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2022-11-21  7:02 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, Jens Axboe, Eric Biggers, linux-block

On Sat, Nov 19, 2022 at 02:19:43AM +0000, Al Viro wrote:
> > Changes since v1:
> >  - also change the blk_crypto_sysfs_unregister prototype
> >  - add two patches to fix the error handling in blk_register_queue
> 
> Umm...  Do we ever want access to queue parameters of the stuff that has
> a queue, but no associated gendisk?  SCSI tape, for example...

What do you mean with "access queue parameters"?  The sysfs access can
since day one only happen through the queue.

> 	Re refcounting: AFAICS, blk_mq_alloc_disk_for_queue() is broken.
> __alloc_disk_node() consumes queue reference (and stuffs it into gendisk->queue)
> on success; on failure it leaves the reference alone.  E.g. this

Yes, it needs a put_queue for the error handler. 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: untangle the request_queue refcounting from the queue kobject v2
  2022-11-19  3:00   ` Al Viro
@ 2022-11-21  7:03     ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2022-11-21  7:03 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, Jens Axboe, Eric Biggers, linux-block

On Sat, Nov 19, 2022 at 03:00:39AM +0000, Al Viro wrote:
> in scsi_alloc_sdev() possibly avoid leaking sdev->request_queue on the
> second failure exit?

It can't.

> Shouldn't we do blk_mq_destroy_queue()/blk_put_queue() on that failure
> exit?

As far as I can tell: yes.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: untangle the request_queue refcounting from the queue kobject v2
  2022-11-14  4:26 untangle the request_queue refcounting from the queue kobject v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2022-11-19  2:19 ` untangle the request_queue refcounting from the queue kobject v2 Al Viro
@ 2022-11-21  8:27 ` Christoph Hellwig
  2022-11-30 18:09 ` Jens Axboe
  7 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2022-11-21  8:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block

Any comments on the actual series?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: untangle the request_queue refcounting from the queue kobject v2
  2022-11-14  4:26 untangle the request_queue refcounting from the queue kobject v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2022-11-21  8:27 ` Christoph Hellwig
@ 2022-11-30 18:09 ` Jens Axboe
  7 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2022-11-30 18:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Eric Biggers, linux-block

On Mon, 14 Nov 2022 05:26:32 +0100, Christoph Hellwig wrote:
> this series cleans up the registration of the "queue/" kobject, and given
> untangles it from the request_queue refcounting.
> 
> Changes since v1:
>  - also change the blk_crypto_sysfs_unregister prototype
>  - add two patches to fix the error handling in blk_register_queue
> 
> [...]

Applied, thanks!

[1/5] blk-crypto: pass a gendisk to blk_crypto_sysfs_{,un}register
      commit: 450deb93df7d457cdd93594a1987f9650c749b96
[2/5] block: factor out a blk_debugfs_remove helper
      commit: 6fc75f309d291d328b4ea2f91bef0ff56e4bc7c2
[3/5] block: fix error unwinding in blk_register_queue
      commit: 40602997be26887bdfa3d58659c3acb4579099e9
[4/5] block: untangle request_queue refcounting from sysfs
      commit: 2bd85221a625b316114bafaab527770b607095d3
[5/5] block: mark blk_put_queue as potentially blocking
      commit: 63f93fd6fa5717769a78d6d7bea6f7f9a1ccca8e

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-11-30 18:09 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-14  4:26 untangle the request_queue refcounting from the queue kobject v2 Christoph Hellwig
2022-11-14  4:26 ` [PATCH 1/5] blk-crypto: pass a gendisk to blk_crypto_sysfs_{,un}register Christoph Hellwig
2022-11-18  2:59   ` Eric Biggers
2022-11-14  4:26 ` [PATCH 2/5] block: factor out a blk_debugfs_remove helper Christoph Hellwig
2022-11-14  4:26 ` [PATCH 3/5] block: fix error unwinding in blk_register_queue Christoph Hellwig
2022-11-14  4:26 ` [PATCH 4/5] block: untangle request_queue refcounting from sysfs Christoph Hellwig
2022-11-14  4:26 ` [PATCH 5/5] block: mark blk_put_queue as potentially blocking Christoph Hellwig
2022-11-19  2:19 ` untangle the request_queue refcounting from the queue kobject v2 Al Viro
2022-11-19  3:00   ` Al Viro
2022-11-21  7:03     ` Christoph Hellwig
2022-11-21  7:02   ` Christoph Hellwig
2022-11-21  8:27 ` Christoph Hellwig
2022-11-30 18:09 ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.