* [PATCH V4 0/5] block: don't acquire .sysfs_lock before removing mq & iosched kobjects
From: Ming Lei @ 2019-08-27 11:01 UTC
To: Jens Axboe
Cc: linux-block, Ming Lei, Christoph Hellwig, Hannes Reinecke,
Greg KH, Mike Snitzer, Bart Van Assche, Damien Le Moal
Hi,
The first three patches clean up the current uses of q->sysfs_lock.
The 4th patch adds a helper for checking whether the queue is registered.
The last patch splits .sysfs_lock into two locks: one only synchronizes
.store/.show from sysfs, the other protects registering/unregistering
kobjects. It also avoids acquiring .sysfs_lock when removing the mq &
iosched kobjects, which fixes the reported deadlock.
V4:
- address comments from Bart
- update comments, add comments about releasing sysfs_lock in elevator_switch_mq
- fix a race in blk_register_queue by holding sysfs_lock for
emitting KOBJ_ADD
- only the 5th patch is updated
V3:
- drop the 4th patch in V2, which is wrong and also not necessary
for fixing this deadlock
- replace comment with one WARN_ON_ONCE() in patch 2
- add reviewed-by tag
V2:
- remove several uses of .sysfs_lock
- Remove blk_mq_register_dev()
- add one helper for checking queue registered
- split .sysfs_lock into two locks
Bart Van Assche (1):
block: Remove blk_mq_register_dev()
Ming Lei (4):
block: don't hold q->sysfs_lock in elevator_init_mq
blk-mq: don't hold q->sysfs_lock in blk_mq_map_swqueue
block: add helper for checking if queue is registered
block: split .sysfs_lock into two locks
block/blk-core.c | 1 +
block/blk-mq-sysfs.c | 23 ++++----------
block/blk-mq.c | 7 -----
block/blk-sysfs.c | 50 +++++++++++++++++------------
block/blk-wbt.c | 2 +-
block/blk.h | 2 +-
block/elevator.c | 71 +++++++++++++++++++++++++++++++-----------
include/linux/blk-mq.h | 1 -
include/linux/blkdev.h | 2 ++
9 files changed, 94 insertions(+), 65 deletions(-)
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
--
2.20.1
* [PATCH V4 1/5] block: Remove blk_mq_register_dev()
From: Ming Lei @ 2019-08-27 11:01 UTC
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Christoph Hellwig, Ming Lei,
Hannes Reinecke
From: Bart Van Assche <bvanassche@acm.org>
This function has no callers. Hence remove it.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
block/blk-mq-sysfs.c | 11 -----------
include/linux/blk-mq.h | 1 -
2 files changed, 12 deletions(-)
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index d6e1a9bd7131..6ddde3774ebe 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -349,17 +349,6 @@ int __blk_mq_register_dev(struct device *dev, struct request_queue *q)
return ret;
}
-int blk_mq_register_dev(struct device *dev, struct request_queue *q)
-{
- int ret;
-
- mutex_lock(&q->sysfs_lock);
- ret = __blk_mq_register_dev(dev, q);
- mutex_unlock(&q->sysfs_lock);
-
- return ret;
-}
-
void blk_mq_sysfs_unregister(struct request_queue *q)
{
struct blk_mq_hw_ctx *hctx;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 21cebe901ac0..62a3bb715899 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -253,7 +253,6 @@ struct request_queue *blk_mq_init_sq_queue(struct blk_mq_tag_set *set,
const struct blk_mq_ops *ops,
unsigned int queue_depth,
unsigned int set_flags);
-int blk_mq_register_dev(struct device *, struct request_queue *);
void blk_mq_unregister_dev(struct device *, struct request_queue *);
int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set);
--
2.20.1
* [PATCH V4 2/5] block: don't hold q->sysfs_lock in elevator_init_mq
From: Ming Lei @ 2019-08-27 11:01 UTC
To: Jens Axboe
Cc: linux-block, Ming Lei, Christoph Hellwig, Hannes Reinecke,
Greg KH, Mike Snitzer, Bart Van Assche, Damien Le Moal
The original comment says:
q->sysfs_lock must be held to provide mutual exclusion between
elevator_switch() and here.
This is simply wrong. elevator_init_mq() is only called from
blk_mq_init_allocated_queue(), which is always called before the request
queue is registered via blk_register_queue(), for both dm-rq and normal
rq-based drivers. The queue's kobject is only exposed and added to sysfs
in blk_register_queue(), so there is no such race between elevator_switch()
and elevator_init_mq().
So don't hold q->sysfs_lock in elevator_init_mq().
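The argument above can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code: struct init_queue_sketch and elevator_init_sketch() are hypothetical stand-ins, and assert() plays the role that WARN_ON_ONCE() plays in the patch. The point is that setup which runs before the object is published needs no lock, only a check that publication has not happened yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of a request queue used here. */
struct init_queue_sketch {
	bool registered;           /* true once the queue is visible to userspace */
	const char *elevator_name; /* NULL while no scheduler is attached */
};

/*
 * Runs before registration, so nothing else can reach the queue yet.
 * No lock is taken; the assertion documents and enforces the ordering.
 */
int elevator_init_sketch(struct init_queue_sketch *q)
{
	assert(!q->registered);

	if (q->elevator_name)
		return 0; /* a scheduler is already attached, nothing to do */

	q->elevator_name = "mq-deadline";
	return 0;
}
```

Calling elevator_init_sketch() a second time on the same unregistered queue leaves the attached scheduler unchanged.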
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/elevator.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/block/elevator.c b/block/elevator.c
index 2f17d66d0e61..33c15fb54ed1 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -607,23 +607,19 @@ int elevator_init_mq(struct request_queue *q)
if (q->nr_hw_queues != 1)
return 0;
- /*
- * q->sysfs_lock must be held to provide mutual exclusion between
- * elevator_switch() and here.
- */
- mutex_lock(&q->sysfs_lock);
+ WARN_ON_ONCE(test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags));
+
if (unlikely(q->elevator))
- goto out_unlock;
+ goto out;
e = elevator_get(q, "mq-deadline", false);
if (!e)
- goto out_unlock;
+ goto out;
err = blk_mq_init_sched(q, e);
if (err)
elevator_put(e);
-out_unlock:
- mutex_unlock(&q->sysfs_lock);
+out:
return err;
}
--
2.20.1
* [PATCH V4 3/5] blk-mq: don't hold q->sysfs_lock in blk_mq_map_swqueue
From: Ming Lei @ 2019-08-27 11:01 UTC
To: Jens Axboe
Cc: linux-block, Ming Lei, Christoph Hellwig, Hannes Reinecke,
Greg KH, Mike Snitzer, Bart Van Assche
blk_mq_map_swqueue() is called from blk_mq_init_allocated_queue()
and blk_mq_update_nr_hw_queues(). For the former caller, the kobject
isn't exposed to userspace yet. For the latter caller, the hctx sysfs
entries and debugfs are unregistered before nr_hw_queues is updated.
On the other hand, commit 2f8f1336a48b ("blk-mq: always free hctx after
request queue is freed") moves freeing of the hctx into the queue's release
handler, so there is no race with the queue release path either.
So don't hold q->sysfs_lock in blk_mq_map_swqueue().
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6968de9d7402..b0ee0cac737f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2456,11 +2456,6 @@ static void blk_mq_map_swqueue(struct request_queue *q)
struct blk_mq_ctx *ctx;
struct blk_mq_tag_set *set = q->tag_set;
- /*
- * Avoid others reading imcomplete hctx->cpumask through sysfs
- */
- mutex_lock(&q->sysfs_lock);
-
queue_for_each_hw_ctx(q, hctx, i) {
cpumask_clear(hctx->cpumask);
hctx->nr_ctx = 0;
@@ -2521,8 +2516,6 @@ static void blk_mq_map_swqueue(struct request_queue *q)
HCTX_TYPE_DEFAULT, i);
}
- mutex_unlock(&q->sysfs_lock);
-
queue_for_each_hw_ctx(q, hctx, i) {
/*
* If no software queues are mapped to this hardware queue,
--
2.20.1
* [PATCH V4 4/5] block: add helper for checking if queue is registered
From: Ming Lei @ 2019-08-27 11:01 UTC
To: Jens Axboe
Cc: linux-block, Ming Lei, Christoph Hellwig, Hannes Reinecke,
Greg KH, Mike Snitzer, Bart Van Assche
Four call sites check whether the queue is registered, so add a helper
for that check.
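A minimal user-space sketch of such a helper, assuming a plain unsigned long flag word. The names test_bit_sketch(), queue_registered_sketch(), change_scheduler_sketch() and the bit position are hypothetical stand-ins chosen for illustration; the kernel's test_bit() and QUEUE_FLAG_REGISTERED are defined elsewhere and may differ.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical bit position, chosen only for this sketch. */
#define QUEUE_FLAG_REGISTERED_SKETCH 3

struct flag_queue_sketch {
	unsigned long queue_flags;
};

/* Simplified, non-atomic stand-in for a bit test on one flag word. */
static inline bool test_bit_sketch(unsigned int nr, const unsigned long *addr)
{
	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	return (*addr >> nr) & 1UL;
}

/* One named helper instead of repeating the open-coded bit test. */
#define queue_registered_sketch(q) \
	test_bit_sketch(QUEUE_FLAG_REGISTERED_SKETCH, &(q)->queue_flags)

/* Example caller: refuse to act on a queue that is not registered. */
int change_scheduler_sketch(struct flag_queue_sketch *q)
{
	if (!queue_registered_sketch(q))
		return -ENOENT;
	return 0;
}
```

The benefit is mostly readability: every caller states the intent ("is the queue registered?") rather than the mechanism.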
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-sysfs.c | 4 ++--
block/blk-wbt.c | 2 +-
block/elevator.c | 2 +-
include/linux/blkdev.h | 1 +
4 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 977c659dcd18..5b0b5224cfd4 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -942,7 +942,7 @@ int blk_register_queue(struct gendisk *disk)
if (WARN_ON(!q))
return -ENXIO;
- WARN_ONCE(test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags),
+ WARN_ONCE(blk_queue_registered(q),
"%s is registering an already registered queue\n",
kobject_name(&dev->kobj));
blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
@@ -1026,7 +1026,7 @@ void blk_unregister_queue(struct gendisk *disk)
return;
/* Return early if disk->queue was never registered. */
- if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags))
+ if (!blk_queue_registered(q))
return;
/*
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 313f45a37e9d..c4d3089e47f7 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -656,7 +656,7 @@ void wbt_enable_default(struct request_queue *q)
return;
/* Queue not registered? Maybe shutting down... */
- if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags))
+ if (!blk_queue_registered(q))
return;
if (queue_is_mq(q) && IS_ENABLED(CONFIG_BLK_WBT_MQ))
diff --git a/block/elevator.c b/block/elevator.c
index 33c15fb54ed1..03d923196569 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -656,7 +656,7 @@ static int __elevator_change(struct request_queue *q, const char *name)
struct elevator_type *e;
/* Make sure queue is not in the middle of being removed */
- if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags))
+ if (!blk_queue_registered(q))
return -ENOENT;
/*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 167bf879f072..6041755984f4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -647,6 +647,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
#define blk_queue_quiesced(q) test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
#define blk_queue_pm_only(q) atomic_read(&(q)->pm_only)
#define blk_queue_fua(q) test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)
+#define blk_queue_registered(q) test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
extern void blk_set_pm_only(struct request_queue *q);
extern void blk_clear_pm_only(struct request_queue *q);
--
2.20.1
* [PATCH V4 5/5] block: split .sysfs_lock into two locks
From: Ming Lei @ 2019-08-27 11:01 UTC
To: Jens Axboe
Cc: linux-block, Ming Lei, Christoph Hellwig, Hannes Reinecke,
Greg KH, Mike Snitzer, Bart Van Assche
The kernfs built-in lock 'kn->count' is held in the sysfs .show/.store
path. Meanwhile, q->sysfs_lock is required inside the block layer's
.show/.store callbacks.
However, q->sysfs_lock is also held when the mq & iosched kobjects are
removed via blk_mq_unregister_dev() & elv_unregister_queue(). This causes
an AB-BA deadlock because the kernfs built-in lock 'kn->count' is
required inside kobject_del() too; see the lockdep warning[1].
On the other hand, it isn't necessary to acquire q->sysfs_lock for
either blk_mq_unregister_dev() or elv_unregister_queue(), because
clearing the REGISTERED flag prevents stores to 'queue/scheduler'
from happening. Also, a sysfs write (store) is exclusive, so there is
no need to hold the lock for elv_unregister_queue() when it is
called in the elevator switch path.
So split .sysfs_lock into two: one is still named .sysfs_lock and
covers synchronization of .store; the other is named .sysfs_dir_lock
and covers kobjects and related status changes.
sysfs itself can handle the race between adding/removing kobjects and
showing/storing attributes under those kobjects. For switching the
scheduler by storing to 'queue/scheduler', we use the queue flag
QUEUE_FLAG_REGISTERED together with .sysfs_lock to avoid the race, so
we can avoid holding .sysfs_lock while removing/adding kobjects.
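The locking scheme described above can be sketched in user space with two pthread mutexes. This is an illustration under stated assumptions rather than the kernel implementation: struct lock_split_sketch and the lock_split_*() functions are hypothetical, the boolean entries_present stands in for the kobjects, and scheduler_id stands in for the elevator. What it shows is that the store path checks the registered flag under sysfs_lock, while entry removal runs under sysfs_dir_lock only.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the queue state touched by the two locks. */
struct lock_split_sketch {
	pthread_mutex_t sysfs_lock;     /* attribute stores and the registered flag */
	pthread_mutex_t sysfs_dir_lock; /* adding/removing directory entries */
	bool registered;
	bool entries_present;
	int scheduler_id;
};

void lock_split_init(struct lock_split_sketch *q)
{
	int rc = pthread_mutex_init(&q->sysfs_lock, NULL);

	assert(rc == 0);
	rc = pthread_mutex_init(&q->sysfs_dir_lock, NULL);
	assert(rc == 0);
	(void)rc;
	q->registered = false;
	q->entries_present = false;
	q->scheduler_id = 0;
}

/* Entries are added first; the flag is set last, under sysfs_lock. */
void lock_split_register(struct lock_split_sketch *q)
{
	pthread_mutex_lock(&q->sysfs_dir_lock);
	q->entries_present = true;
	pthread_mutex_lock(&q->sysfs_lock);
	q->registered = true;
	pthread_mutex_unlock(&q->sysfs_lock);
	pthread_mutex_unlock(&q->sysfs_dir_lock);
}

/* Store path: only sysfs_lock is taken, and the flag gates the change. */
int lock_split_store_scheduler(struct lock_split_sketch *q, int id)
{
	int ret = 0;

	pthread_mutex_lock(&q->sysfs_lock);
	if (!q->registered)
		ret = -ENOENT;
	else
		q->scheduler_id = id;
	pthread_mutex_unlock(&q->sysfs_lock);
	return ret;
}

/*
 * Unregister: clear the flag under sysfs_lock, release it, and only
 * then remove entries under sysfs_dir_lock. Removal never runs with
 * sysfs_lock held, which is what breaks the AB-BA cycle.
 */
void lock_split_unregister(struct lock_split_sketch *q)
{
	pthread_mutex_lock(&q->sysfs_lock);
	q->registered = false;
	pthread_mutex_unlock(&q->sysfs_lock);

	pthread_mutex_lock(&q->sysfs_dir_lock);
	q->entries_present = false;
	pthread_mutex_unlock(&q->sysfs_dir_lock);
}
```

In this sketch a store issued after lock_split_unregister() has cleared the flag fails with -ENOENT and leaves scheduler_id untouched, mirroring how clearing the REGISTERED flag fences off further scheduler switches.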
[1] lockdep warning
======================================================
WARNING: possible circular locking dependency detected
5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted
------------------------------------------------------
rmmod/777 is trying to acquire lock:
00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72
but task is already holding lock:
00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&q->sysfs_lock){+.+.}:
__lock_acquire+0x95f/0xa2f
lock_acquire+0x1b4/0x1e8
__mutex_lock+0x14a/0xa9b
blk_mq_hw_sysfs_show+0x63/0xb6
sysfs_kf_seq_show+0x11f/0x196
seq_read+0x2cd/0x5f2
vfs_read+0xc7/0x18c
ksys_read+0xc4/0x13e
do_syscall_64+0xa7/0x295
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (kn->count#202){++++}:
check_prev_add+0x5d2/0xc45
validate_chain+0xed3/0xf94
__lock_acquire+0x95f/0xa2f
lock_acquire+0x1b4/0x1e8
__kernfs_remove+0x237/0x40b
kernfs_remove_by_name_ns+0x59/0x72
remove_files+0x61/0x96
sysfs_remove_group+0x81/0xa4
sysfs_remove_groups+0x3b/0x44
kobject_del+0x44/0x94
blk_mq_unregister_dev+0x83/0xdd
blk_unregister_queue+0xa0/0x10b
del_gendisk+0x259/0x3fa
null_del_dev+0x8b/0x1c3 [null_blk]
null_exit+0x5c/0x95 [null_blk]
__se_sys_delete_module+0x204/0x337
do_syscall_64+0xa7/0x295
entry_SYSCALL_64_after_hwframe+0x49/0xbe
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&q->sysfs_lock);
lock(kn->count#202);
lock(&q->sysfs_lock);
lock(kn->count#202);
*** DEADLOCK ***
2 locks held by rmmod/777:
#0: 00000000e69bd9de (&lock){+.+.}, at: null_exit+0x2e/0x95 [null_blk]
#1: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b
stack backtrace:
CPU: 0 PID: 777 Comm: rmmod Not tainted 5.3.0-rc3-00044-g73277fc75ea0 #1380
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
Call Trace:
dump_stack+0x9a/0xe6
check_noncircular+0x207/0x251
? print_circular_bug+0x32a/0x32a
? find_usage_backwards+0x84/0xb0
check_prev_add+0x5d2/0xc45
validate_chain+0xed3/0xf94
? check_prev_add+0xc45/0xc45
? mark_lock+0x11b/0x804
? check_usage_forwards+0x1ca/0x1ca
__lock_acquire+0x95f/0xa2f
lock_acquire+0x1b4/0x1e8
? kernfs_remove_by_name_ns+0x59/0x72
__kernfs_remove+0x237/0x40b
? kernfs_remove_by_name_ns+0x59/0x72
? kernfs_next_descendant_post+0x7d/0x7d
? strlen+0x10/0x23
? strcmp+0x22/0x44
kernfs_remove_by_name_ns+0x59/0x72
remove_files+0x61/0x96
sysfs_remove_group+0x81/0xa4
sysfs_remove_groups+0x3b/0x44
kobject_del+0x44/0x94
blk_mq_unregister_dev+0x83/0xdd
blk_unregister_queue+0xa0/0x10b
del_gendisk+0x259/0x3fa
? disk_events_poll_msecs_store+0x12b/0x12b
? check_flags+0x1ea/0x204
? mark_held_locks+0x1f/0x7a
null_del_dev+0x8b/0x1c3 [null_blk]
null_exit+0x5c/0x95 [null_blk]
__se_sys_delete_module+0x204/0x337
? free_module+0x39f/0x39f
? blkcg_maybe_throttle_current+0x8a/0x718
? rwlock_bug+0x62/0x62
? __blkcg_punt_bio_submit+0xd0/0xd0
? trace_hardirqs_on_thunk+0x1a/0x20
? mark_held_locks+0x1f/0x7a
? do_syscall_64+0x4c/0x295
do_syscall_64+0xa7/0x295
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fb696cdbe6b
Code: 73 01 c3 48 8b 0d 1d 20 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 008
RSP: 002b:00007ffec9588788 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000559e589137c0 RCX: 00007fb696cdbe6b
RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559e58913828
RBP: 0000000000000000 R08: 00007ffec9587701 R09: 0000000000000000
R10: 00007fb696d4eae0 R11: 0000000000000206 R12: 00007ffec95889b0
R13: 00007ffec95896b3 R14: 0000559e58913260 R15: 0000559e589137c0
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-core.c | 1 +
block/blk-mq-sysfs.c | 12 ++++-----
block/blk-sysfs.c | 46 +++++++++++++++++++++--------------
block/blk.h | 2 +-
block/elevator.c | 55 ++++++++++++++++++++++++++++++++++++------
include/linux/blkdev.h | 1 +
6 files changed, 84 insertions(+), 33 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 919629ce4015..2792f7cf7bef 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -520,6 +520,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
mutex_init(&q->blk_trace_mutex);
#endif
mutex_init(&q->sysfs_lock);
+ mutex_init(&q->sysfs_dir_lock);
spin_lock_init(&q->queue_lock);
init_waitqueue_head(&q->mq_freeze_wq);
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 6ddde3774ebe..a0d3ce30fa08 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -270,7 +270,7 @@ void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
struct blk_mq_hw_ctx *hctx;
int i;
- lockdep_assert_held(&q->sysfs_lock);
+ lockdep_assert_held(&q->sysfs_dir_lock);
queue_for_each_hw_ctx(q, hctx, i)
blk_mq_unregister_hctx(hctx);
@@ -320,7 +320,7 @@ int __blk_mq_register_dev(struct device *dev, struct request_queue *q)
int ret, i;
WARN_ON_ONCE(!q->kobj.parent);
- lockdep_assert_held(&q->sysfs_lock);
+ lockdep_assert_held(&q->sysfs_dir_lock);
ret = kobject_add(q->mq_kobj, kobject_get(&dev->kobj), "%s", "mq");
if (ret < 0)
@@ -354,7 +354,7 @@ void blk_mq_sysfs_unregister(struct request_queue *q)
struct blk_mq_hw_ctx *hctx;
int i;
- mutex_lock(&q->sysfs_lock);
+ mutex_lock(&q->sysfs_dir_lock);
if (!q->mq_sysfs_init_done)
goto unlock;
@@ -362,7 +362,7 @@ void blk_mq_sysfs_unregister(struct request_queue *q)
blk_mq_unregister_hctx(hctx);
unlock:
- mutex_unlock(&q->sysfs_lock);
+ mutex_unlock(&q->sysfs_dir_lock);
}
int blk_mq_sysfs_register(struct request_queue *q)
@@ -370,7 +370,7 @@ int blk_mq_sysfs_register(struct request_queue *q)
struct blk_mq_hw_ctx *hctx;
int i, ret = 0;
- mutex_lock(&q->sysfs_lock);
+ mutex_lock(&q->sysfs_dir_lock);
if (!q->mq_sysfs_init_done)
goto unlock;
@@ -381,7 +381,7 @@ int blk_mq_sysfs_register(struct request_queue *q)
}
unlock:
- mutex_unlock(&q->sysfs_lock);
+ mutex_unlock(&q->sysfs_dir_lock);
return ret;
}
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 5b0b5224cfd4..107513495220 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -938,6 +938,7 @@ int blk_register_queue(struct gendisk *disk)
int ret;
struct device *dev = disk_to_dev(disk);
struct request_queue *q = disk->queue;
+ bool has_elevator = false;
if (WARN_ON(!q))
return -ENXIO;
@@ -945,7 +946,6 @@ int blk_register_queue(struct gendisk *disk)
WARN_ONCE(blk_queue_registered(q),
"%s is registering an already registered queue\n",
kobject_name(&dev->kobj));
- blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
/*
* SCSI probing may synchronously create and destroy a lot of
@@ -965,8 +965,7 @@ int blk_register_queue(struct gendisk *disk)
if (ret)
return ret;
- /* Prevent changes through sysfs until registration is completed. */
- mutex_lock(&q->sysfs_lock);
+ mutex_lock(&q->sysfs_dir_lock);
ret = kobject_add(&q->kobj, kobject_get(&dev->kobj), "%s", "queue");
if (ret < 0) {
@@ -987,26 +986,36 @@ int blk_register_queue(struct gendisk *disk)
blk_mq_debugfs_register(q);
}
- kobject_uevent(&q->kobj, KOBJ_ADD);
-
- wbt_enable_default(q);
-
- blk_throtl_register_queue(q);
-
+ /*
+ * The flag of QUEUE_FLAG_REGISTERED isn't set yet, so elevator
+ * switch won't happen at all.
+ */
if (q->elevator) {
- ret = elv_register_queue(q);
+ ret = elv_register_queue(q, false);
if (ret) {
- mutex_unlock(&q->sysfs_lock);
- kobject_uevent(&q->kobj, KOBJ_REMOVE);
+ mutex_unlock(&q->sysfs_dir_lock);
kobject_del(&q->kobj);
blk_trace_remove_sysfs(dev);
kobject_put(&dev->kobj);
return ret;
}
+ has_elevator = true;
}
+
+ mutex_lock(&q->sysfs_lock);
+ blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
+ wbt_enable_default(q);
+ blk_throtl_register_queue(q);
+
+ /* Now everything is ready and send out KOBJ_ADD uevent */
+ kobject_uevent(&q->kobj, KOBJ_ADD);
+ if (has_elevator)
+ kobject_uevent(&q->elevator->kobj, KOBJ_ADD);
+ mutex_unlock(&q->sysfs_lock);
+
ret = 0;
unlock:
- mutex_unlock(&q->sysfs_lock);
+ mutex_unlock(&q->sysfs_dir_lock);
return ret;
}
EXPORT_SYMBOL_GPL(blk_register_queue);
@@ -1021,6 +1030,7 @@ EXPORT_SYMBOL_GPL(blk_register_queue);
void blk_unregister_queue(struct gendisk *disk)
{
struct request_queue *q = disk->queue;
+ bool has_elevator;
if (WARN_ON(!q))
return;
@@ -1035,25 +1045,25 @@ void blk_unregister_queue(struct gendisk *disk)
* concurrent elv_iosched_store() calls.
*/
mutex_lock(&q->sysfs_lock);
-
blk_queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
+ has_elevator = !!q->elevator;
+ mutex_unlock(&q->sysfs_lock);
+ mutex_lock(&q->sysfs_dir_lock);
/*
* Remove the sysfs attributes before unregistering the queue data
* structures that can be modified through sysfs.
*/
if (queue_is_mq(q))
blk_mq_unregister_dev(disk_to_dev(disk), q);
- mutex_unlock(&q->sysfs_lock);
kobject_uevent(&q->kobj, KOBJ_REMOVE);
kobject_del(&q->kobj);
blk_trace_remove_sysfs(disk_to_dev(disk));
- mutex_lock(&q->sysfs_lock);
- if (q->elevator)
+ if (has_elevator)
elv_unregister_queue(q);
- mutex_unlock(&q->sysfs_lock);
+ mutex_unlock(&q->sysfs_dir_lock);
kobject_put(&disk_to_dev(disk)->kobj);
}
diff --git a/block/blk.h b/block/blk.h
index de6b2e146d6e..e4619fc5c99a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -188,7 +188,7 @@ int elevator_init_mq(struct request_queue *q);
int elevator_switch_mq(struct request_queue *q,
struct elevator_type *new_e);
void __elevator_exit(struct request_queue *, struct elevator_queue *);
-int elv_register_queue(struct request_queue *q);
+int elv_register_queue(struct request_queue *q, bool uevent);
void elv_unregister_queue(struct request_queue *q);
static inline void elevator_exit(struct request_queue *q,
diff --git a/block/elevator.c b/block/elevator.c
index 03d923196569..4781c4205a5d 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -470,13 +470,16 @@ static struct kobj_type elv_ktype = {
.release = elevator_release,
};
-int elv_register_queue(struct request_queue *q)
+/*
+ * elv_register_queue is called from either blk_register_queue or
+ * elevator_switch; elevator switch is prevented from happening
+ * in both paths, so it is safe to not hold q->sysfs_lock.
+ */
+int elv_register_queue(struct request_queue *q, bool uevent)
{
struct elevator_queue *e = q->elevator;
int error;
- lockdep_assert_held(&q->sysfs_lock);
-
error = kobject_add(&e->kobj, &q->kobj, "%s", "iosched");
if (!error) {
struct elv_fs_entry *attr = e->type->elevator_attrs;
@@ -487,24 +490,34 @@ int elv_register_queue(struct request_queue *q)
attr++;
}
}
- kobject_uevent(&e->kobj, KOBJ_ADD);
+ if (uevent)
+ kobject_uevent(&e->kobj, KOBJ_ADD);
+
+ mutex_lock(&q->sysfs_lock);
e->registered = 1;
+ mutex_unlock(&q->sysfs_lock);
}
return error;
}
+/*
+ * elv_unregister_queue is called from either blk_unregister_queue or
+ * elevator_switch; elevator switch is prevented from happening
+ * in both paths, so it is safe to not hold q->sysfs_lock.
+ */
void elv_unregister_queue(struct request_queue *q)
{
- lockdep_assert_held(&q->sysfs_lock);
-
if (q) {
struct elevator_queue *e = q->elevator;
kobject_uevent(&e->kobj, KOBJ_REMOVE);
kobject_del(&e->kobj);
+
+ mutex_lock(&q->sysfs_lock);
e->registered = 0;
/* Re-enable throttling in case elevator disabled it */
wbt_enable_default(q);
+ mutex_unlock(&q->sysfs_lock);
}
}
@@ -567,10 +580,32 @@ int elevator_switch_mq(struct request_queue *q,
lockdep_assert_held(&q->sysfs_lock);
if (q->elevator) {
- if (q->elevator->registered)
+ if (q->elevator->registered) {
+ mutex_unlock(&q->sysfs_lock);
+
+ /*
+ * Concurrent elevator switch can't happen because
+ * sysfs write is always exclusive on the same file.
+ *
+ * Also the elevator queue won't be freed after
+ * sysfs_lock is released because kobject_del() in
+ * blk_unregister_queue() waits for completion of
+ * .store & .show on its attributes.
+ */
elv_unregister_queue(q);
+
+ mutex_lock(&q->sysfs_lock);
+ }
ioc_clear_queue(q);
elevator_exit(q, q->elevator);
+
+ /*
+ * sysfs_lock may be dropped, so re-check if queue is
+ * unregistered. If yes, don't switch to new elevator
+ * any more
+ */
+ if (!blk_queue_registered(q))
+ return 0;
}
ret = blk_mq_init_sched(q, new_e);
@@ -578,7 +613,11 @@ int elevator_switch_mq(struct request_queue *q,
goto out;
if (new_e) {
- ret = elv_register_queue(q);
+ mutex_unlock(&q->sysfs_lock);
+
+ ret = elv_register_queue(q, true);
+
+ mutex_lock(&q->sysfs_lock);
if (ret) {
elevator_exit(q, q->elevator);
goto out;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6041755984f4..e271c3a176fa 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -539,6 +539,7 @@ struct request_queue {
struct delayed_work requeue_work;
struct mutex sysfs_lock;
+ struct mutex sysfs_dir_lock;
/*
* for reusing dead hctx instance in case of updating
--
2.20.1
* Re: [PATCH V4 5/5] block: split .sysfs_lock into two locks
From: Bart Van Assche @ 2019-08-27 16:37 UTC
To: Ming Lei, Jens Axboe
Cc: linux-block, Christoph Hellwig, Hannes Reinecke, Greg KH, Mike Snitzer
On 8/27/19 4:01 AM, Ming Lei wrote:
> The kernfs built-in lock of 'kn->count' is held in sysfs .show/.store
> path. Meantime, inside block's .show/.store callback, q->sysfs_lock is
> required.
> [ ... ]
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* Re: [PATCH V4 0/5] block: don't acquire .sysfs_lock before removing mq & iosched kobjects
From: Jens Axboe @ 2019-08-27 16:40 UTC
To: Ming Lei
Cc: linux-block, Christoph Hellwig, Hannes Reinecke, Greg KH,
Mike Snitzer, Bart Van Assche, Damien Le Moal
On 8/27/19 5:01 AM, Ming Lei wrote:
> Hi,
>
> The first three patches clean up the current uses of q->sysfs_lock.
>
> The 4th patch adds a helper for checking whether the queue is registered.
>
> The last patch splits .sysfs_lock into two locks: one only synchronizes
> .store/.show from sysfs, the other protects registering/unregistering
> kobjects. It also avoids acquiring .sysfs_lock when removing the mq &
> iosched kobjects, which fixes the reported deadlock.
Thanks Ming, and Bart for diligent reviews. Applied for 5.4.
--
Jens Axboe