linux-block.vger.kernel.org archive mirror
* [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
@ 2017-02-10 18:32 Omar Sandoval
  2017-02-10 18:35 ` Jens Axboe
  2017-02-15 17:24 ` Paolo Valente
  0 siblings, 2 replies; 18+ messages in thread
From: Omar Sandoval @ 2017-02-10 18:32 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Paolo Valente, kernel-team

From: Omar Sandoval <osandov@fb.com>

None of the other blk-mq elevator hooks are called with this lock held.
Additionally, it can lead to circular locking dependencies between
queue_lock and the private scheduler lock.

Reported-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 block/blk-ioc.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index fe186a9eade9..b12f9c87b4c3 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -35,7 +35,10 @@ static void icq_free_icq_rcu(struct rcu_head *head)
 	kmem_cache_free(icq->__rcu_icq_cache, icq);
 }
 
-/* Exit an icq. Called with both ioc and q locked. */
+/*
+ * Exit an icq. Called with both ioc and q locked for sq, only ioc locked for
+ * mq.
+ */
 static void ioc_exit_icq(struct io_cq *icq)
 {
 	struct elevator_type *et = icq->q->elevator->type;
@@ -166,6 +169,7 @@ EXPORT_SYMBOL(put_io_context);
  */
 void put_io_context_active(struct io_context *ioc)
 {
+	struct elevator_type *et;
 	unsigned long flags;
 	struct io_cq *icq;
 
@@ -184,13 +188,19 @@ void put_io_context_active(struct io_context *ioc)
 	hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) {
 		if (icq->flags & ICQ_EXITED)
 			continue;
-		if (spin_trylock(icq->q->queue_lock)) {
+
+		et = icq->q->elevator->type;
+		if (et->uses_mq) {
 			ioc_exit_icq(icq);
-			spin_unlock(icq->q->queue_lock);
 		} else {
-			spin_unlock_irqrestore(&ioc->lock, flags);
-			cpu_relax();
-			goto retry;
+			if (spin_trylock(icq->q->queue_lock)) {
+				ioc_exit_icq(icq);
+				spin_unlock(icq->q->queue_lock);
+			} else {
+				spin_unlock_irqrestore(&ioc->lock, flags);
+				cpu_relax();
+				goto retry;
+			}
 		}
 	}
 	spin_unlock_irqrestore(&ioc->lock, flags);
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-02-10 18:32 [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq Omar Sandoval
@ 2017-02-10 18:35 ` Jens Axboe
  2017-02-15 17:24 ` Paolo Valente
  1 sibling, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2017-02-10 18:35 UTC (permalink / raw)
  To: Omar Sandoval, linux-block; +Cc: Paolo Valente, kernel-team

On 02/10/2017 11:32 AM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> None of the other blk-mq elevator hooks are called with this lock held.
> Additionally, it can lead to circular locking dependencies between
> queue_lock and the private scheduler lock.

Applied, thanks Omar.

-- 
Jens Axboe


* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-02-10 18:32 [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq Omar Sandoval
  2017-02-10 18:35 ` Jens Axboe
@ 2017-02-15 17:24 ` Paolo Valente
  2017-02-15 17:58   ` Jens Axboe
  1 sibling, 1 reply; 18+ messages in thread
From: Paolo Valente @ 2017-02-15 17:24 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: Jens Axboe, linux-block, kernel-team


> On 10 Feb 2017, at 19:32, Omar Sandoval <osandov@osandov.com> wrote:
> 
> From: Omar Sandoval <osandov@fb.com>
> 
> None of the other blk-mq elevator hooks are called with this lock held.
> Additionally, it can lead to circular locking dependencies between
> queue_lock and the private scheduler lock.
> 

Hi Omar,
I'm sorry but it seems that a new potential deadlock has shown up.
See lockdep splat below.

I've tried to think about different solutions than turning back to
deferring the body of exit_icq, but to no avail.

Thanks,
Paolo

[  138.167021] =======================================================
[  138.182073] [ INFO: possible circular locking dependency detected ]
[  138.185012] 4.10.0-rc5-bfq-mq+ #42 Not tainted
[  138.219651] -------------------------------------------------------
[  138.261149] aggthr-with-gre/2170 is trying to acquire lock:
[  138.313137]  (&(&bfqd->lock)->rlock){-.-...}, at: [<ffffffff84484435>] bfq_exit_icq_bfqq+0x55/0x140
[  138.364973]
[  138.364973] but task is already holding lock:
[  138.416588]  (&(&ioc->lock)->rlock){-.....}, at: [<ffffffff84447828>] ioc_clear_queue+0x48/0xa0
[  138.484352]
[  138.484352] which lock already depends on the new lock.
[  138.484352]
[  138.536768]
[  138.536768] the existing dependency chain (in reverse order) is:
[  138.599738]
[  138.599738] -> #2 (&(&ioc->lock)->rlock){-.....}:
[  138.650981]        [<ffffffff840ee6bb>] lock_acquire+0x11b/0x220
[  138.711661]        [<ffffffff8494324d>] _raw_spin_lock+0x3d/0x80
[  138.763930]        [<ffffffff84447af2>] ioc_create_icq+0x92/0x1a0
[  138.819465]        [<ffffffff84454619>] blk_mq_sched_get_request+0x359/0x370
[  138.875824]        [<ffffffff8444e950>] blk_sq_make_request+0x130/0xa50
[  138.923688]        [<ffffffff844408e6>] generic_make_request+0xf6/0x2b0
[  138.991611]        [<ffffffff84440b13>] submit_bio+0x73/0x150
[  139.041094]        [<ffffffff842b2aec>] submit_bh_wbc+0x14c/0x180
[  139.083613]        [<ffffffff842b35ff>] block_read_full_page+0x27f/0x370
[  139.124932]        [<ffffffff842b68b8>] blkdev_readpage+0x18/0x20
[  139.176352]        [<ffffffff841da963>] do_read_cache_page+0x313/0x590
[  139.177484]        [<ffffffff841dabf5>] read_cache_page+0x15/0x20
[  139.178552]        [<ffffffff84459285>] read_dev_sector+0x75/0xd0
[  139.179652]        [<ffffffff84461b4b>] read_lba+0x17b/0x260
[  139.180655]        [<ffffffff84462412>] efi_partition+0xf2/0x7c0
[  139.181709]        [<ffffffff8445bb4d>] check_partition+0x13d/0x220
[  139.182794]        [<ffffffff84459b60>] rescan_partitions+0xc0/0x380
[  139.183914]        [<ffffffff842b7cce>] __blkdev_get+0x3ae/0x4f0
[  139.184972]        [<ffffffff842b80fc>] blkdev_get+0x14c/0x3b0
[  139.186018]        [<ffffffff8445823f>] device_add_disk+0x45f/0x4f0
[  139.212238]        [<ffffffff8469da20>] sd_probe_async+0x110/0x1c0
[  139.223165]        [<ffffffff840bc327>] async_run_entry_fn+0x37/0x150
[  139.224747]        [<ffffffff840b1087>] process_one_work+0x207/0x750
[  139.235809]        [<ffffffff840b161b>] worker_thread+0x4b/0x4f0
[  139.236880]        [<ffffffff840b863f>] kthread+0x10f/0x150
[  139.237881]        [<ffffffff84944271>] ret_from_fork+0x31/0x40
[  139.238938]
[  139.238938] -> #1 (&(&q->__queue_lock)->rlock){-.....}:
[  139.239885]        [<ffffffff840ee6bb>] lock_acquire+0x11b/0x220
[  139.240948]        [<ffffffff84943e66>] _raw_spin_lock_irqsave+0x56/0x90
[  139.242120]        [<ffffffff84482645>] bfq_bic_lookup.isra.112+0x25/0x60
[  139.243338]        [<ffffffff844827bd>] bfq_request_merge+0x3d/0xe0
[  139.244430]        [<ffffffff84439b8f>] elv_merge+0xcf/0xe0
[  139.245419]        [<ffffffff84453ff6>] blk_mq_sched_try_merge+0x36/0x150
[  139.246602]        [<ffffffff8447feea>] bfq_bio_merge+0x5a/0xa0
[  139.247665]        [<ffffffff844548b0>] __blk_mq_sched_bio_merge+0x60/0x70
[  139.248841]        [<ffffffff8444ea94>] blk_sq_make_request+0x274/0xa50
[  139.250011]        [<ffffffff844408e6>] generic_make_request+0xf6/0x2b0
[  139.251159]        [<ffffffff84440b13>] submit_bio+0x73/0x150
[  139.252234]        [<ffffffff842b2aec>] submit_bh_wbc+0x14c/0x180
[  139.253310]        [<ffffffff842b35ff>] block_read_full_page+0x27f/0x370
[  139.254467]        [<ffffffff842b68b8>] blkdev_readpage+0x18/0x20
[  139.279880]        [<ffffffff841da963>] do_read_cache_page+0x313/0x590
[  139.281036]        [<ffffffff841dabf5>] read_cache_page+0x15/0x20
[  139.282113]        [<ffffffff84459285>] read_dev_sector+0x75/0xd0
[  139.283201]        [<ffffffff84461b4b>] read_lba+0x17b/0x260
[  139.284228]        [<ffffffff84462412>] efi_partition+0xf2/0x7c0
[  139.285339]        [<ffffffff8445bb4d>] check_partition+0x13d/0x220
[  139.286439]        [<ffffffff84459b60>] rescan_partitions+0xc0/0x380
[  139.287568]        [<ffffffff842b7cce>] __blkdev_get+0x3ae/0x4f0
[  139.288635]        [<ffffffff842b80fc>] blkdev_get+0x14c/0x3b0
[  139.289672]        [<ffffffff8445823f>] device_add_disk+0x45f/0x4f0
[  139.291919]        [<ffffffff8469da20>] sd_probe_async+0x110/0x1c0
[  139.293005]        [<ffffffff840bc327>] async_run_entry_fn+0x37/0x150
[  139.294128]        [<ffffffff840b1087>] process_one_work+0x207/0x750
[  139.295247]        [<ffffffff840b161b>] worker_thread+0x4b/0x4f0
[  139.296319]        [<ffffffff840b863f>] kthread+0x10f/0x150
[  139.301539]        [<ffffffff84944271>] ret_from_fork+0x31/0x40
[  139.316757]
[  139.316757] -> #0 (&(&bfqd->lock)->rlock){-.-...}:
[  139.317638]        [<ffffffff840edd24>] __lock_acquire+0x15e4/0x1890
[  139.318747]        [<ffffffff840ee6bb>] lock_acquire+0x11b/0x220
[  139.319817]        [<ffffffff849434da>] _raw_spin_lock_irq+0x4a/0x80
[  139.320919]        [<ffffffff84484435>] bfq_exit_icq_bfqq+0x55/0x140
[  139.332522]        [<ffffffff8448453c>] bfq_exit_icq+0x1c/0x30
[  139.333554]        [<ffffffff844471f8>] ioc_exit_icq+0x38/0x50
[  139.334582]        [<ffffffff844472c9>] ioc_destroy_icq+0x99/0x140
[  139.335703]        [<ffffffff84447832>] ioc_clear_queue+0x52/0xa0
[  139.336760]        [<ffffffff8443922c>] elevator_switch+0x7c/0x240
[  139.337830]        [<ffffffff844394f8>] __elevator_change+0x108/0x140
[  139.338996]        [<ffffffff8443a4d6>] elv_iosched_store+0x26/0x60
[  139.340114]        [<ffffffff844447a9>] queue_attr_store+0x59/0x90
[  139.341198]        [<ffffffff84308585>] sysfs_kf_write+0x45/0x60
[  139.342252]        [<ffffffff84307815>] kernfs_fop_write+0x135/0x1c0
[  139.343358]        [<ffffffff8426d5a7>] __vfs_write+0x37/0x160
[  139.351602]        [<ffffffff8426efee>] vfs_write+0xce/0x1f0
[  139.352610]        [<ffffffff84270518>] SyS_write+0x58/0xc0
[  139.353618]        [<ffffffff84943fc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
[  139.354799]
[  139.354799] other info that might help us debug this:
[  139.354799]
[  139.355925] Chain exists of:
[  139.355925]   &(&bfqd->lock)->rlock --> &(&q->__queue_lock)->rlock --> &(&ioc->lock)->rlock
[  139.355925]
[  139.357666]  Possible unsafe locking scenario:
[  139.357666]
[  139.368477]        CPU0                    CPU1
[  139.369129]        ----                    ----
[  139.369774]   lock(&(&ioc->lock)->rlock);
[  139.370339]                                lock(&(&q->__queue_lock)->rlock);
[  139.390579]                                lock(&(&ioc->lock)->rlock);
[  139.391522]   lock(&(&bfqd->lock)->rlock);
[  139.392094]
[  139.392094]  *** DEADLOCK ***
[  139.392094]
[  139.392912] 6 locks held by aggthr-with-gre/2170:
[  139.393562]  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffff8426f0cf>] vfs_write+0x1af/0x1f0
[  139.396183]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff843077e1>] kernfs_fop_write+0x101/0x1c0
[  139.397363]  #2:  (s_active#198){.+.+.+}, at: [<ffffffff843077e9>] kernfs_fop_write+0x109/0x1c0
[  139.437158]  #3:  (&q->sysfs_lock){+.+.+.}, at: [<ffffffff84444792>] queue_attr_store+0x42/0x90
[  139.438368]  #4:  (&(&q->__queue_lock)->rlock){-.....}, at: [<ffffffff84439224>] elevator_switch+0x74/0x240
[  139.439741]  #5:  (&(&ioc->lock)->rlock){-.....}, at: [<ffffffff84447828>] ioc_clear_queue+0x48/0xa0
[  139.441008]
[  139.441008] stack backtrace:
[  139.441624] CPU: 0 PID: 2170 Comm: aggthr-with-gre Not tainted 4.10.0-rc5-bfq-mq+ #42
[  139.442704] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  139.443918] Call Trace:
[  139.444270]  dump_stack+0x85/0xc2
[  139.444734]  print_circular_bug+0x1e3/0x250
[  139.445336]  __lock_acquire+0x15e4/0x1890
[  139.445895]  lock_acquire+0x11b/0x220
[  139.446406]  ? bfq_exit_icq_bfqq+0x55/0x140
[  139.446989]  _raw_spin_lock_irq+0x4a/0x80
[  139.451750]  ? bfq_exit_icq_bfqq+0x55/0x140
[  139.452336]  bfq_exit_icq_bfqq+0x55/0x140
[  139.452898]  bfq_exit_icq+0x1c/0x30
[  139.453386]  ioc_exit_icq+0x38/0x50
[  139.453874]  ioc_destroy_icq+0x99/0x140
[  139.454408]  ioc_clear_queue+0x52/0xa0
[  139.454930]  elevator_switch+0x7c/0x240
[  139.455489]  ? kernfs_fop_write+0x101/0x1c0
[  139.456089]  __elevator_change+0x108/0x140
[  139.456657]  elv_iosched_store+0x26/0x60
[  139.457200]  queue_attr_store+0x59/0x90
[  139.489285]  sysfs_kf_write+0x45/0x60
[  139.505169]  kernfs_fop_write+0x135/0x1c0
[  139.543726]  __vfs_write+0x37/0x160
[  139.563645]  ? rcu_read_lock_sched_held+0x72/0x80
[  139.611773]  ? rcu_sync_lockdep_assert+0x2f/0x60
[  139.651981]  ? __sb_start_write+0xde/0x1e0
[  139.683576]  ? vfs_write+0x1af/0x1f0
[  139.703898]  vfs_write+0xce/0x1f0
[  139.735970]  SyS_write+0x58/0xc0
[  139.747901]  entry_SYSCALL_64_fastpath+0x23/0xc6
[  139.748543] RIP: 0033:0x7f862b9796e0
[  139.749039] RSP: 002b:00007ffdacc878b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  139.750075] RAX: ffffffffffffffda RBX: 00007f862bc47620 RCX: 00007f862b9796e0
[  139.761064] RDX: 0000000000000005 RSI: 0000000001560008 RDI: 0000000000000001
[  139.811601] RBP: 0000000000000004 R08: 00007f862bc48780 R09: 00007f862c27f700
[  139.867906] R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000004
[  139.881848] R13: 0000000001587b28 R14: 0000000000000000 R15: 00000000004d09d9
> Reported-by: Paolo Valente <paolo.valente@linaro.org>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
> block/blk-ioc.c | 22 ++++++++++++++++------
> 1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index fe186a9eade9..b12f9c87b4c3 100644
> --- a/block/blk-ioc.c
> +++ b/block/blk-ioc.c
> @@ -35,7 +35,10 @@ static void icq_free_icq_rcu(struct rcu_head *head)
> 	kmem_cache_free(icq->__rcu_icq_cache, icq);
> }
> 
> -/* Exit an icq. Called with both ioc and q locked. */
> +/*
> + * Exit an icq. Called with both ioc and q locked for sq, only ioc locked for
> + * mq.
> + */
> static void ioc_exit_icq(struct io_cq *icq)
> {
> 	struct elevator_type *et = icq->q->elevator->type;
> @@ -166,6 +169,7 @@ EXPORT_SYMBOL(put_io_context);
>  */
> void put_io_context_active(struct io_context *ioc)
> {
> +	struct elevator_type *et;
> 	unsigned long flags;
> 	struct io_cq *icq;
> 
> @@ -184,13 +188,19 @@ void put_io_context_active(struct io_context *ioc)
> 	hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) {
> 		if (icq->flags & ICQ_EXITED)
> 			continue;
> -		if (spin_trylock(icq->q->queue_lock)) {
> +
> +		et = icq->q->elevator->type;
> +		if (et->uses_mq) {
> 			ioc_exit_icq(icq);
> -			spin_unlock(icq->q->queue_lock);
> 		} else {
> -			spin_unlock_irqrestore(&ioc->lock, flags);
> -			cpu_relax();
> -			goto retry;
> +			if (spin_trylock(icq->q->queue_lock)) {
> +				ioc_exit_icq(icq);
> +				spin_unlock(icq->q->queue_lock);
> +			} else {
> +				spin_unlock_irqrestore(&ioc->lock, flags);
> +				cpu_relax();
> +				goto retry;
> +			}
> 		}
> 	}
> 	spin_unlock_irqrestore(&ioc->lock, flags);
> -- 
> 2.11.1
> 


* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-02-15 17:24 ` Paolo Valente
@ 2017-02-15 17:58   ` Jens Axboe
  2017-02-15 18:04     ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2017-02-15 17:58 UTC (permalink / raw)
  To: Paolo Valente, Omar Sandoval; +Cc: linux-block, kernel-team

On 02/15/2017 10:24 AM, Paolo Valente wrote:
> 
>> Il giorno 10 feb 2017, alle ore 19:32, Omar Sandoval <osandov@osandov.com> ha scritto:
>>
>> From: Omar Sandoval <osandov@fb.com>
>>
>> None of the other blk-mq elevator hooks are called with this lock held.
>> Additionally, it can lead to circular locking dependencies between
>> queue_lock and the private scheduler lock.
>>
> 
> Hi Omar,
> I'm sorry but it seems that a new potential deadlock has shown up.
> See lockdep splat below.
> 
> I've tried to think about different solutions than turning back to
> deferring the body of exit_icq, but to no avail.

Looks like an interaction between bfqd->lock and q->queue_lock. Since the
core has no notion of your bfqd->lock, the natural dependency here
would be to nest bfqd->lock inside q->queue_lock. Is that possible for
you?

Looking at the code a bit, maybe it'd just be simpler to get rid of
holding the queue lock for that spot. For the mq scheduler, we really
don't want places where we invoke with that held already. Does the below
work for you?


diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index b12f9c87b4c3..0f50b1296cf9 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -222,25 +222,34 @@ void exit_io_context(struct task_struct *task)
 	put_io_context_active(ioc);
 }
 
+static void __ioc_clear_queue(struct list_head *icq_list)
+{
+	while (!list_empty(icq_list)) {
+		struct io_cq *icq = list_entry(icq_list->next,
+					       struct io_cq, q_node);
+		struct io_context *ioc = icq->ioc;
+
+		spin_lock_irq(&ioc->lock);
+		ioc_destroy_icq(icq);
+		spin_unlock_irq(&ioc->lock);
+	}
+}
+
 /**
  * ioc_clear_queue - break any ioc association with the specified queue
  * @q: request_queue being cleared
  *
- * Walk @q->icq_list and exit all io_cq's.  Must be called with @q locked.
+ * Walk @q->icq_list and exit all io_cq's.
  */
 void ioc_clear_queue(struct request_queue *q)
 {
-	lockdep_assert_held(q->queue_lock);
+	LIST_HEAD(icq_list);
 
-	while (!list_empty(&q->icq_list)) {
-		struct io_cq *icq = list_entry(q->icq_list.next,
-					       struct io_cq, q_node);
-		struct io_context *ioc = icq->ioc;
+	spin_lock_irq(q->queue_lock);
+	list_splice_init(&q->icq_list, &icq_list);
+	spin_unlock_irq(q->queue_lock);
 
-		spin_lock(&ioc->lock);
-		ioc_destroy_icq(icq);
-		spin_unlock(&ioc->lock);
-	}
+	__ioc_clear_queue(&icq_list);
 }
 
 int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 070d81bae1d5..1944aa1cb899 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -815,9 +815,7 @@ static void blk_release_queue(struct kobject *kobj)
 	blkcg_exit_queue(q);
 
 	if (q->elevator) {
-		spin_lock_irq(q->queue_lock);
 		ioc_clear_queue(q);
-		spin_unlock_irq(q->queue_lock);
 		elevator_exit(q->elevator);
 	}
 
diff --git a/block/elevator.c b/block/elevator.c
index a25bdd90b270..aaa1e9836512 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -985,9 +985,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 		if (old_registered)
 			elv_unregister_queue(q);
 
-		spin_lock_irq(q->queue_lock);
 		ioc_clear_queue(q);
-		spin_unlock_irq(q->queue_lock);
 	}
 
 	/* allocate, init and register new elevator */

-- 
Jens Axboe


* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-02-15 17:58   ` Jens Axboe
@ 2017-02-15 18:04     ` Jens Axboe
  2017-02-16 10:31       ` Paolo Valente
  2017-03-02 10:28       ` Paolo Valente
  0 siblings, 2 replies; 18+ messages in thread
From: Jens Axboe @ 2017-02-15 18:04 UTC (permalink / raw)
  To: Paolo Valente, Omar Sandoval; +Cc: linux-block, kernel-team

On 02/15/2017 10:58 AM, Jens Axboe wrote:
> On 02/15/2017 10:24 AM, Paolo Valente wrote:
>>
>>> Il giorno 10 feb 2017, alle ore 19:32, Omar Sandoval <osandov@osandov.com> ha scritto:
>>>
>>> From: Omar Sandoval <osandov@fb.com>
>>>
>>> None of the other blk-mq elevator hooks are called with this lock held.
>>> Additionally, it can lead to circular locking dependencies between
>>> queue_lock and the private scheduler lock.
>>>
>>
>> Hi Omar,
>> I'm sorry but it seems that a new potential deadlock has shown up.
>> See lockdep splat below.
>>
>> I've tried to think about different solutions than turning back to
>> deferring the body of exit_icq, but to no avail.
> 
> Looks like an interaction between bfqd->lock and q->queue_lock. Since the
> core has no notion of your bfqd->lock, the natural dependency here
> would be to nest bfqd->lock inside q->queue_lock. Is that possible for
> you?
> 
> Looking at the code a bit, maybe it'd just be simpler to get rid of
> holding the queue lock for that spot. For the mq scheduler, we really
> don't want places where we invoke with that held already. Does the below
> work for you?

Would need to remove one more lockdep assert. And only test this for
the mq parts, we'd need to spread a bit of love on the classic
scheduling icq exit path for this to work on that side.

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index b12f9c87b4c3..546ff8f81ede 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -54,7 +54,7 @@ static void ioc_exit_icq(struct io_cq *icq)
 	icq->flags |= ICQ_EXITED;
 }
 
-/* Release an icq.  Called with both ioc and q locked. */
+/* Release an icq.  Called with ioc locked. */
 static void ioc_destroy_icq(struct io_cq *icq)
 {
 	struct io_context *ioc = icq->ioc;
@@ -62,7 +62,6 @@ static void ioc_destroy_icq(struct io_cq *icq)
 	struct elevator_type *et = q->elevator->type;
 
 	lockdep_assert_held(&ioc->lock);
-	lockdep_assert_held(q->queue_lock);
 
 	radix_tree_delete(&ioc->icq_tree, icq->q->id);
 	hlist_del_init(&icq->ioc_node);
@@ -222,25 +221,34 @@ void exit_io_context(struct task_struct *task)
 	put_io_context_active(ioc);
 }
 
+static void __ioc_clear_queue(struct list_head *icq_list)
+{
+	while (!list_empty(icq_list)) {
+		struct io_cq *icq = list_entry(icq_list->next,
+					       struct io_cq, q_node);
+		struct io_context *ioc = icq->ioc;
+
+		spin_lock_irq(&ioc->lock);
+		ioc_destroy_icq(icq);
+		spin_unlock_irq(&ioc->lock);
+	}
+}
+
 /**
  * ioc_clear_queue - break any ioc association with the specified queue
  * @q: request_queue being cleared
  *
- * Walk @q->icq_list and exit all io_cq's.  Must be called with @q locked.
+ * Walk @q->icq_list and exit all io_cq's.
  */
 void ioc_clear_queue(struct request_queue *q)
 {
-	lockdep_assert_held(q->queue_lock);
+	LIST_HEAD(icq_list);
 
-	while (!list_empty(&q->icq_list)) {
-		struct io_cq *icq = list_entry(q->icq_list.next,
-					       struct io_cq, q_node);
-		struct io_context *ioc = icq->ioc;
+	spin_lock_irq(q->queue_lock);
+	list_splice_init(&q->icq_list, &icq_list);
+	spin_unlock_irq(q->queue_lock);
 
-		spin_lock(&ioc->lock);
-		ioc_destroy_icq(icq);
-		spin_unlock(&ioc->lock);
-	}
+	__ioc_clear_queue(&icq_list);
 }
 
 int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 070d81bae1d5..1944aa1cb899 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -815,9 +815,7 @@ static void blk_release_queue(struct kobject *kobj)
 	blkcg_exit_queue(q);
 
 	if (q->elevator) {
-		spin_lock_irq(q->queue_lock);
 		ioc_clear_queue(q);
-		spin_unlock_irq(q->queue_lock);
 		elevator_exit(q->elevator);
 	}
 
diff --git a/block/elevator.c b/block/elevator.c
index a25bdd90b270..aaa1e9836512 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -985,9 +985,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 		if (old_registered)
 			elv_unregister_queue(q);
 
-		spin_lock_irq(q->queue_lock);
 		ioc_clear_queue(q);
-		spin_unlock_irq(q->queue_lock);
 	}
 
 	/* allocate, init and register new elevator */

-- 
Jens Axboe


* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-02-15 18:04     ` Jens Axboe
@ 2017-02-16 10:31       ` Paolo Valente
  2017-02-17 10:30         ` Paolo Valente
  2017-03-02 10:28       ` Paolo Valente
  1 sibling, 1 reply; 18+ messages in thread
From: Paolo Valente @ 2017-02-16 10:31 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Omar Sandoval, linux-block, kernel-team


> On 15 Feb 2017, at 19:04, Jens Axboe <axboe@fb.com> wrote:
> 
> On 02/15/2017 10:58 AM, Jens Axboe wrote:
>> On 02/15/2017 10:24 AM, Paolo Valente wrote:
>>> 
>>>> On 10 Feb 2017, at 19:32, Omar Sandoval <osandov@osandov.com> wrote:
>>>> 
>>>> From: Omar Sandoval <osandov@fb.com>
>>>> 
>>>> None of the other blk-mq elevator hooks are called with this lock held.
>>>> Additionally, it can lead to circular locking dependencies between
>>>> queue_lock and the private scheduler lock.
>>>> 
>>> 
>>> Hi Omar,
>>> I'm sorry but it seems that a new potential deadlock has shown up.
>>> See lockdep splat below.
>>> 
>>> I've tried to think about different solutions than turning back to
>>> deferring the body of exit_icq, but to no avail.
>> 
>> Looks like an interaction between bfqd->lock and q->queue_lock. Since the
>> core has no notion of your bfqd->lock, the natural dependency here
>> would be to nest bfqd->lock inside q->queue_lock. Is that possible for
>> you?
>> 
>> Looking at the code a bit, maybe it'd just be simpler to get rid of
>> holding the queue lock for that spot. For the mq scheduler, we really
>> don't want places where we invoke with that held already. Does the below
>> work for you?
> 
> Would need to remove one more lockdep assert. And only test this for
> the mq parts, we'd need to spread a bit of love on the classic
> scheduling icq exit path for this to work on that side.
> 

Sorry Jens, same splat.  What confuses me is the second column
in the possible scenario:

[  139.368477]        CPU0                    CPU1
[  139.369129]        ----                    ----
[  139.369774]   lock(&(&ioc->lock)->rlock);
[  139.370339]                                lock(&(&q->__queue_lock)->rlock);
[  139.390579]                                lock(&(&ioc->lock)->rlock);
[  139.391522]   lock(&(&bfqd->lock)->rlock);

I could not find any code path, related to the reported call traces,
that takes first q->queue_lock and then ioc->lock.

Any suggestion on how to go on, and hopefully help with this problem is
welcome.

Thanks,
Paolo

> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index b12f9c87b4c3..546ff8f81ede 100644
> --- a/block/blk-ioc.c
> +++ b/block/blk-ioc.c
> @@ -54,7 +54,7 @@ static void ioc_exit_icq(struct io_cq *icq)
> 	icq->flags |= ICQ_EXITED;
> }
> 
> -/* Release an icq.  Called with both ioc and q locked. */
> +/* Release an icq.  Called with ioc locked. */
> static void ioc_destroy_icq(struct io_cq *icq)
> {
> 	struct io_context *ioc = icq->ioc;
> @@ -62,7 +62,6 @@ static void ioc_destroy_icq(struct io_cq *icq)
> 	struct elevator_type *et = q->elevator->type;
> 
> 	lockdep_assert_held(&ioc->lock);
> -	lockdep_assert_held(q->queue_lock);
> 
> 	radix_tree_delete(&ioc->icq_tree, icq->q->id);
> 	hlist_del_init(&icq->ioc_node);
> @@ -222,25 +221,34 @@ void exit_io_context(struct task_struct *task)
> 	put_io_context_active(ioc);
> }
> 
> +static void __ioc_clear_queue(struct list_head *icq_list)
> +{
> +	while (!list_empty(icq_list)) {
> +		struct io_cq *icq = list_entry(icq_list->next,
> +					       struct io_cq, q_node);
> +		struct io_context *ioc = icq->ioc;
> +
> +		spin_lock_irq(&ioc->lock);
> +		ioc_destroy_icq(icq);
> +		spin_unlock_irq(&ioc->lock);
> +	}
> +}
> +
> /**
>  * ioc_clear_queue - break any ioc association with the specified queue
>  * @q: request_queue being cleared
>  *
> - * Walk @q->icq_list and exit all io_cq's.  Must be called with @q locked.
> + * Walk @q->icq_list and exit all io_cq's.
>  */
> void ioc_clear_queue(struct request_queue *q)
> {
> -	lockdep_assert_held(q->queue_lock);
> +	LIST_HEAD(icq_list);
> 
> -	while (!list_empty(&q->icq_list)) {
> -		struct io_cq *icq = list_entry(q->icq_list.next,
> -					       struct io_cq, q_node);
> -		struct io_context *ioc = icq->ioc;
> +	spin_lock_irq(q->queue_lock);
> +	list_splice_init(&q->icq_list, &icq_list);
> +	spin_unlock_irq(q->queue_lock);
> 
> -		spin_lock(&ioc->lock);
> -		ioc_destroy_icq(icq);
> -		spin_unlock(&ioc->lock);
> -	}
> +	__ioc_clear_queue(&icq_list);
> }
> 
> int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 070d81bae1d5..1944aa1cb899 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -815,9 +815,7 @@ static void blk_release_queue(struct kobject *kobj)
> 	blkcg_exit_queue(q);
> 
> 	if (q->elevator) {
> -		spin_lock_irq(q->queue_lock);
> 		ioc_clear_queue(q);
> -		spin_unlock_irq(q->queue_lock);
> 		elevator_exit(q->elevator);
> 	}
> 
> diff --git a/block/elevator.c b/block/elevator.c
> index a25bdd90b270..aaa1e9836512 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -985,9 +985,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
> 		if (old_registered)
> 			elv_unregister_queue(q);
> 
> -		spin_lock_irq(q->queue_lock);
> 		ioc_clear_queue(q);
> -		spin_unlock_irq(q->queue_lock);
> 	}
> 
> 	/* allocate, init and register new elevator */
> 
> -- 
> Jens Axboe

* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-02-16 10:31       ` Paolo Valente
@ 2017-02-17 10:30         ` Paolo Valente
  2017-02-22 21:21           ` Paolo Valente
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Valente @ 2017-02-17 10:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Omar Sandoval, linux-block, kernel-team


> On 16 Feb 2017, at 11:31, Paolo Valente <paolo.valente@linaro.org> wrote:
> 
>> 
>> On 15 Feb 2017, at 19:04, Jens Axboe <axboe@fb.com> wrote:
>> 
>> On 02/15/2017 10:58 AM, Jens Axboe wrote:
>>> On 02/15/2017 10:24 AM, Paolo Valente wrote:
>>>> 
>>>>> On 10 Feb 2017, at 19:32, Omar Sandoval <osandov@osandov.com> wrote:
>>>>> 
>>>>> From: Omar Sandoval <osandov@fb.com>
>>>>> 
>>>>> None of the other blk-mq elevator hooks are called with this lock held.
>>>>> Additionally, it can lead to circular locking dependencies between
>>>>> queue_lock and the private scheduler lock.
>>>>> 
>>>> 
>>>> Hi Omar,
>>>> I'm sorry but it seems that a new potential deadlock has shown up.
>>>> See lockdep splat below.
>>>> 
>>>> I've tried to think about different solutions than turning back to
>>>> deferring the body of exit_icq, but to no avail.
>>> 
>>> Looks like an interaction between bfqd->lock and q->queue_lock. Since the
>>> core has no notion of your bfqd->lock, the natural dependency here
>>> would be to nest bfqd->lock inside q->queue_lock. Is that possible for
>>> you?
>>> 
>>> Looking at the code a bit, maybe it'd just be simpler to get rid of
>>> holding the queue lock for that spot. For the mq scheduler, we really
>>> don't want places where we invoke with that held already. Does the below
>>> work for you?
>> 
>> Would need to remove one more lockdep assert. And only test this for
>> the mq parts, we'd need to spread a bit of love on the classic
>> scheduling icq exit path for this to work on that side.
>> 
> 
> Sorry Jens, same splat.  What confuses me is the second column
> in the possible scenario:
> 
> [  139.368477]        CPU0                    CPU1
> [  139.369129]        ----                    ----
> [  139.369774]   lock(&(&ioc->lock)->rlock);
> [  139.370339]                                lock(&(&q->__queue_lock)->rlock);
> [  139.390579]                                lock(&(&ioc->lock)->rlock);
> [  139.391522]   lock(&(&bfqd->lock)->rlock);
> 
> I could not find any code path, related to the reported call traces,
> that takes first q->queue_lock and then ioc->lock.
> 
> Any suggestion on how to go on, and hopefully help with this problem
> is welcome.
> 

Jens,
this is just to tell you that I have found the link that still causes
the circular dependency: an ioc->lock nested into a queue_lock in
ioc_create_icq.  I'll try to come back with a solution proposal.

Thanks,
Paolo

> Thanks,
> Paolo
>
>> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
>> index b12f9c87b4c3..546ff8f81ede 100644
>> --- a/block/blk-ioc.c
>> +++ b/block/blk-ioc.c
>> @@ -54,7 +54,7 @@ static void ioc_exit_icq(struct io_cq *icq)
>> 	icq->flags |= ICQ_EXITED;
>> }
>>
>> -/* Release an icq.  Called with both ioc and q locked. */
>> +/* Release an icq.  Called with ioc locked. */
>> static void ioc_destroy_icq(struct io_cq *icq)
>> {
>> 	struct io_context *ioc = icq->ioc;
>> @@ -62,7 +62,6 @@ static void ioc_destroy_icq(struct io_cq *icq)
>> 	struct elevator_type *et = q->elevator->type;
>>
>> 	lockdep_assert_held(&ioc->lock);
>> -	lockdep_assert_held(q->queue_lock);
>>
>> 	radix_tree_delete(&ioc->icq_tree, icq->q->id);
>> 	hlist_del_init(&icq->ioc_node);
>> @@ -222,25 +221,34 @@ void exit_io_context(struct task_struct *task)
>> 	put_io_context_active(ioc);
>> }
>>
>> +static void __ioc_clear_queue(struct list_head *icq_list)
>> +{
>> +	while (!list_empty(icq_list)) {
>> +		struct io_cq *icq = list_entry(icq_list->next,
>> +					       struct io_cq, q_node);
>> +		struct io_context *ioc = icq->ioc;
>> +
>> +		spin_lock_irq(&ioc->lock);
>> +		ioc_destroy_icq(icq);
>> +		spin_unlock_irq(&ioc->lock);
>> +	}
>> +}
>> +
>> /**
>> * ioc_clear_queue - break any ioc association with the specified queue
>> * @q: request_queue being cleared
>> *
>> - * Walk @q->icq_list and exit all io_cq's.  Must be called with @q locked.
>> + * Walk @q->icq_list and exit all io_cq's.
>> */
>> void ioc_clear_queue(struct request_queue *q)
>> {
>> -	lockdep_assert_held(q->queue_lock);
>> +	LIST_HEAD(icq_list);
>>
>> -	while (!list_empty(&q->icq_list)) {
>> -		struct io_cq *icq = list_entry(q->icq_list.next,
>> -					       struct io_cq, q_node);
>> -		struct io_context *ioc = icq->ioc;
>> +	spin_lock_irq(q->queue_lock);
>> +	list_splice_init(&q->icq_list, &icq_list);
>> +	spin_unlock_irq(q->queue_lock);
>>
>> -		spin_lock(&ioc->lock);
>> -		ioc_destroy_icq(icq);
>> -		spin_unlock(&ioc->lock);
>> -	}
>> +	__ioc_clear_queue(&icq_list);
>> }
>>
>> int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index 070d81bae1d5..1944aa1cb899 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -815,9 +815,7 @@ static void blk_release_queue(struct kobject *kobj)
>> 	blkcg_exit_queue(q);
>>
>> 	if (q->elevator) {
>> -		spin_lock_irq(q->queue_lock);
>> 		ioc_clear_queue(q);
>> -		spin_unlock_irq(q->queue_lock);
>> 		elevator_exit(q->elevator);
>> 	}
>>
>> diff --git a/block/elevator.c b/block/elevator.c
>> index a25bdd90b270..aaa1e9836512 100644
>> --- a/block/elevator.c
>> +++ b/block/elevator.c
>> @@ -985,9 +985,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
>> 		if (old_registered)
>> 			elv_unregister_queue(q);
>>
>> -		spin_lock_irq(q->queue_lock);
>> 		ioc_clear_queue(q);
>> -		spin_unlock_irq(q->queue_lock);
>> 	}
>>
>> 	/* allocate, init and register new elevator */
>>
>> --
>> Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-02-17 10:30         ` Paolo Valente
@ 2017-02-22 21:21           ` Paolo Valente
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Valente @ 2017-02-22 21:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Omar Sandoval, linux-block, kernel-team


> On 17 Feb 2017, at 11:30, Paolo Valente <paolo.valente@linaro.org> wrote:
>
>
>> On 16 Feb 2017, at 11:31, Paolo Valente <paolo.valente@linaro.org> wrote:
>>
>>>
>>> On 15 Feb 2017, at 19:04, Jens Axboe <axboe@fb.com> wrote:
>>>
>>> On 02/15/2017 10:58 AM, Jens Axboe wrote:
>>>> On 02/15/2017 10:24 AM, Paolo Valente wrote:
>>>>>
>>>>>> On 10 Feb 2017, at 19:32, Omar Sandoval <osandov@osandov.com> wrote:
>>>>>>
>>>>>> From: Omar Sandoval <osandov@fb.com>
>>>>>>
>>>>>> None of the other blk-mq elevator hooks are called with this lock held.
>>>>>> Additionally, it can lead to circular locking dependencies between
>>>>>> queue_lock and the private scheduler lock.
>>>>>>
>>>>>
>>>>> Hi Omar,
>>>>> I'm sorry, but it seems that a new potential deadlock has shown up.
>>>>> See lockdep splat below.
>>>>>
>>>>> I've tried to think of solutions other than going back to
>>>>> deferring the body of exit_icq, but to no avail.
>>>>
>>>> Looks like an interaction between bfqd->lock and q->queue_lock. Since the
>>>> core has no notion of your bfqd->lock, the natural dependency here
>>>> would be to nest bfqd->lock inside q->queue_lock. Is that possible for
>>>> you?
>>>>
>>>> Looking at the code a bit, maybe it'd just be simpler to get rid of
>>>> holding the queue lock for that spot. For the mq scheduler, we really
>>>> don't want places where we invoke with that held already. Does the below
>>>> work for you?
>>>
>>> Would need to remove one more lockdep assert. And only test this for
>>> the mq parts; we'd need to spread a bit of love on the classic
>>> scheduling icq exit path for this to work on that side.
>>>
>>
>> Sorry Jens, same splat.  What confuses me is the second column
>> in the possible scenario:
>>
>> [  139.368477]        CPU0                    	CPU1
>> [  139.369129]        ----                    	----
>> [  139.369774]   lock(&(&ioc->lock)->rlock);
>> [  139.370339]                                		lock(&(&q->__queue_lock)->rlock);
>> [  139.390579]                                		lock(&(&ioc->lock)->rlock);
>> [  139.391522]   lock(&(&bfqd->lock)->rlock);
>>
>> I could not find any code path, related to the reported call traces,
>> that takes first q->queue_lock and then ioc->lock.
>>
>> Any suggestion on how to go on, and hopefully help with this problem, is
>> welcome.
>>
>
> Jens,
> this is just to tell you that I have found the link that still causes
> the circular dependency: an ioc->lock nested inside a queue_lock in
> ioc_create_icq.  I'll try to come back with a solution proposal.
>

Solution implemented and apparently working.  If anyone wants to have
a look at it or test it, it is in the usual branch [1].  I have found
other similar circular dependencies in the meantime, and this
solution covers them too.  I'm pasting the message of the last commit
below.  The branch now also contains a fix for another minor bug (for
the record: to avoid wasting further time preparing several
micro-fixes, I have just merged some minor fixes into existing
commits).  I'm starting to work on cgroups support.

    Unnest request-queue and ioc locks from scheduler locks

    In some bio-merging functions, the request-queue lock needs to be
    taken, to look up the bic associated with the process that issued
    the bio that may need to be merged. In addition, put_io_context must
    be invoked in some other functions, and put_io_context may cause the
    lock of the involved ioc to be taken. In both cases, these extra
    request-queue or ioc locks are taken, or might be taken, while the
    scheduler lock is being held. There are also other code paths, in
    part external to bfq-mq, in which the same locks are taken (nested)
    in the opposite order, i.e., the scheduler lock is taken while the
    request-queue or the ioc lock is being held.  This leads to circular
    deadlocks.

    This commit addresses the issue by modifying the logic of the above
    functions, so that the lookup and put_io_context are performed, and
    thus the extra locks are taken, outside the critical sections
    protected by the scheduler lock.


Thanks,
Paolo

[1] https://github.com/Algodev-github/bfq-mq

> Thanks,
> Paolo
>
>> Thanks,
>> Paolo
>>
>>> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
>>> index b12f9c87b4c3..546ff8f81ede 100644
>>> --- a/block/blk-ioc.c
>>> +++ b/block/blk-ioc.c
>>> @@ -54,7 +54,7 @@ static void ioc_exit_icq(struct io_cq *icq)
>>> 	icq->flags |= ICQ_EXITED;
>>> }
>>>
>>> -/* Release an icq.  Called with both ioc and q locked. */
>>> +/* Release an icq.  Called with ioc locked. */
>>> static void ioc_destroy_icq(struct io_cq *icq)
>>> {
>>> 	struct io_context *ioc = icq->ioc;
>>> @@ -62,7 +62,6 @@ static void ioc_destroy_icq(struct io_cq *icq)
>>> 	struct elevator_type *et = q->elevator->type;
>>>
>>> 	lockdep_assert_held(&ioc->lock);
>>> -	lockdep_assert_held(q->queue_lock);
>>>
>>> 	radix_tree_delete(&ioc->icq_tree, icq->q->id);
>>> 	hlist_del_init(&icq->ioc_node);
>>> @@ -222,25 +221,34 @@ void exit_io_context(struct task_struct *task)
>>> 	put_io_context_active(ioc);
>>> }
>>>
>>> +static void __ioc_clear_queue(struct list_head *icq_list)
>>> +{
>>> +	while (!list_empty(icq_list)) {
>>> +		struct io_cq *icq = list_entry(icq_list->next,
>>> +					       struct io_cq, q_node);
>>> +		struct io_context *ioc = icq->ioc;
>>> +
>>> +		spin_lock_irq(&ioc->lock);
>>> +		ioc_destroy_icq(icq);
>>> +		spin_unlock_irq(&ioc->lock);
>>> +	}
>>> +}
>>> +
>>> /**
>>> * ioc_clear_queue - break any ioc association with the specified queue
>>> * @q: request_queue being cleared
>>> *
>>> - * Walk @q->icq_list and exit all io_cq's.  Must be called with @q locked.
>>> + * Walk @q->icq_list and exit all io_cq's.
>>> */
>>> void ioc_clear_queue(struct request_queue *q)
>>> {
>>> -	lockdep_assert_held(q->queue_lock);
>>> +	LIST_HEAD(icq_list);
>>>
>>> -	while (!list_empty(&q->icq_list)) {
>>> -		struct io_cq *icq = list_entry(q->icq_list.next,
>>> -					       struct io_cq, q_node);
>>> -		struct io_context *ioc = icq->ioc;
>>> +	spin_lock_irq(q->queue_lock);
>>> +	list_splice_init(&q->icq_list, &icq_list);
>>> +	spin_unlock_irq(q->queue_lock);
>>>
>>> -		spin_lock(&ioc->lock);
>>> -		ioc_destroy_icq(icq);
>>> -		spin_unlock(&ioc->lock);
>>> -	}
>>> +	__ioc_clear_queue(&icq_list);
>>> }
>>>
>>> int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
>>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>>> index 070d81bae1d5..1944aa1cb899 100644
>>> --- a/block/blk-sysfs.c
>>> +++ b/block/blk-sysfs.c
>>> @@ -815,9 +815,7 @@ static void blk_release_queue(struct kobject *kobj)
>>> 	blkcg_exit_queue(q);
>>>
>>> 	if (q->elevator) {
>>> -		spin_lock_irq(q->queue_lock);
>>> 		ioc_clear_queue(q);
>>> -		spin_unlock_irq(q->queue_lock);
>>> 		elevator_exit(q->elevator);
>>> 	}
>>>
>>> diff --git a/block/elevator.c b/block/elevator.c
>>> index a25bdd90b270..aaa1e9836512 100644
>>> --- a/block/elevator.c
>>> +++ b/block/elevator.c
>>> @@ -985,9 +985,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
>>> 		if (old_registered)
>>> 			elv_unregister_queue(q);
>>>
>>> -		spin_lock_irq(q->queue_lock);
>>> 		ioc_clear_queue(q);
>>> -		spin_unlock_irq(q->queue_lock);
>>> 	}
>>>
>>> 	/* allocate, init and register new elevator */
>>>
>>> --
>>> Jens Axboe
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-02-15 18:04     ` Jens Axboe
  2017-02-16 10:31       ` Paolo Valente
@ 2017-03-02 10:28       ` Paolo Valente
  2017-03-02 15:00         ` Jens Axboe
  1 sibling, 1 reply; 18+ messages in thread
From: Paolo Valente @ 2017-03-02 10:28 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Omar Sandoval, linux-block, kernel-team


> On 15 Feb 2017, at 19:04, Jens Axboe <axboe@fb.com> wrote:
>
> On 02/15/2017 10:58 AM, Jens Axboe wrote:
>> On 02/15/2017 10:24 AM, Paolo Valente wrote:
>>>
>>>> On 10 Feb 2017, at 19:32, Omar Sandoval <osandov@osandov.com> wrote:
>>>>
>>>> From: Omar Sandoval <osandov@fb.com>
>>>>
>>>> None of the other blk-mq elevator hooks are called with this lock held.
>>>> Additionally, it can lead to circular locking dependencies between
>>>> queue_lock and the private scheduler lock.
>>>>
>>>
>>> Hi Omar,
>>> I'm sorry, but it seems that a new potential deadlock has shown up.
>>> See lockdep splat below.
>>>
>>> I've tried to think of solutions other than going back to
>>> deferring the body of exit_icq, but to no avail.
>>
>> Looks like an interaction between bfqd->lock and q->queue_lock. Since the
>> core has no notion of your bfqd->lock, the natural dependency here
>> would be to nest bfqd->lock inside q->queue_lock. Is that possible for
>> you?
>>
>> Looking at the code a bit, maybe it'd just be simpler to get rid of
>> holding the queue lock for that spot. For the mq scheduler, we really
>> don't want places where we invoke with that held already. Does the below
>> work for you?
>
> Would need to remove one more lockdep assert. And only test this for
> the mq parts; we'd need to spread a bit of love on the classic
> scheduling icq exit path for this to work on that side.
>


Jens,
here is the reply I anticipated in my previous email: after rebasing
against master, I'm getting again the deadlock that this patch of
yours solved (together with some changes in bfq-mq).  I thought you
added a sort of equivalent commit (now) to the mainline branch.  Am I
wrong?

Thanks,
Paolo

> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index b12f9c87b4c3..546ff8f81ede 100644
> --- a/block/blk-ioc.c
> +++ b/block/blk-ioc.c
> @@ -54,7 +54,7 @@ static void ioc_exit_icq(struct io_cq *icq)
> 	icq->flags |= ICQ_EXITED;
> }
>
> -/* Release an icq.  Called with both ioc and q locked. */
> +/* Release an icq.  Called with ioc locked. */
> static void ioc_destroy_icq(struct io_cq *icq)
> {
> 	struct io_context *ioc = icq->ioc;
> @@ -62,7 +62,6 @@ static void ioc_destroy_icq(struct io_cq *icq)
> 	struct elevator_type *et = q->elevator->type;
>
> 	lockdep_assert_held(&ioc->lock);
> -	lockdep_assert_held(q->queue_lock);
>
> 	radix_tree_delete(&ioc->icq_tree, icq->q->id);
> 	hlist_del_init(&icq->ioc_node);
> @@ -222,25 +221,34 @@ void exit_io_context(struct task_struct *task)
> 	put_io_context_active(ioc);
> }
>
> +static void __ioc_clear_queue(struct list_head *icq_list)
> +{
> +	while (!list_empty(icq_list)) {
> +		struct io_cq *icq = list_entry(icq_list->next,
> +					       struct io_cq, q_node);
> +		struct io_context *ioc = icq->ioc;
> +
> +		spin_lock_irq(&ioc->lock);
> +		ioc_destroy_icq(icq);
> +		spin_unlock_irq(&ioc->lock);
> +	}
> +}
> +
> /**
>  * ioc_clear_queue - break any ioc association with the specified queue
>  * @q: request_queue being cleared
>  *
> - * Walk @q->icq_list and exit all io_cq's.  Must be called with @q locked.
> + * Walk @q->icq_list and exit all io_cq's.
>  */
> void ioc_clear_queue(struct request_queue *q)
> {
> -	lockdep_assert_held(q->queue_lock);
> +	LIST_HEAD(icq_list);
>
> -	while (!list_empty(&q->icq_list)) {
> -		struct io_cq *icq = list_entry(q->icq_list.next,
> -					       struct io_cq, q_node);
> -		struct io_context *ioc = icq->ioc;
> +	spin_lock_irq(q->queue_lock);
> +	list_splice_init(&q->icq_list, &icq_list);
> +	spin_unlock_irq(q->queue_lock);
>
> -		spin_lock(&ioc->lock);
> -		ioc_destroy_icq(icq);
> -		spin_unlock(&ioc->lock);
> -	}
> +	__ioc_clear_queue(&icq_list);
> }
>
> int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 070d81bae1d5..1944aa1cb899 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -815,9 +815,7 @@ static void blk_release_queue(struct kobject *kobj)
> 	blkcg_exit_queue(q);
>
> 	if (q->elevator) {
> -		spin_lock_irq(q->queue_lock);
> 		ioc_clear_queue(q);
> -		spin_unlock_irq(q->queue_lock);
> 		elevator_exit(q->elevator);
> 	}
>
> diff --git a/block/elevator.c b/block/elevator.c
> index a25bdd90b270..aaa1e9836512 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -985,9 +985,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
> 		if (old_registered)
> 			elv_unregister_queue(q);
>
> -		spin_lock_irq(q->queue_lock);
> 		ioc_clear_queue(q);
> -		spin_unlock_irq(q->queue_lock);
> 	}
>
> 	/* allocate, init and register new elevator */
>
> --
> Jens Axboe


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-03-02 10:28       ` Paolo Valente
@ 2017-03-02 15:00         ` Jens Axboe
  2017-03-02 15:13           ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2017-03-02 15:00 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Omar Sandoval, linux-block, kernel-team

On 03/02/2017 03:28 AM, Paolo Valente wrote:
> 
>> On 15 Feb 2017, at 19:04, Jens Axboe <axboe@fb.com> wrote:
>>
>> On 02/15/2017 10:58 AM, Jens Axboe wrote:
>>> On 02/15/2017 10:24 AM, Paolo Valente wrote:
>>>>
>>>>> On 10 Feb 2017, at 19:32, Omar Sandoval <osandov@osandov.com> wrote:
>>>>>
>>>>> From: Omar Sandoval <osandov@fb.com>
>>>>>
>>>>> None of the other blk-mq elevator hooks are called with this lock held.
>>>>> Additionally, it can lead to circular locking dependencies between
>>>>> queue_lock and the private scheduler lock.
>>>>>
>>>>
>>>> Hi Omar,
>>>> I'm sorry, but it seems that a new potential deadlock has shown up.
>>>> See lockdep splat below.
>>>>
>>>> I've tried to think of solutions other than going back to
>>>> deferring the body of exit_icq, but to no avail.
>>>
>>> Looks like an interaction between bfqd->lock and q->queue_lock. Since the
>>> core has no notion of your bfqd->lock, the natural dependency here
>>> would be to nest bfqd->lock inside q->queue_lock. Is that possible for
>>> you?
>>>
>>> Looking at the code a bit, maybe it'd just be simpler to get rid of
>>> holding the queue lock for that spot. For the mq scheduler, we really
>>> don't want places where we invoke with that held already. Does the below
>>> work for you?
>>
>> Would need to remove one more lockdep assert. And only test this for
>> the mq parts, we'd need to spread a bit of love on the classic
>> scheduling icq exit path for this to work on that side.
>>
> 
> 
> Jens,
> here is the reply I anticipated in my previous email: after rebasing
> against master, I'm getting again the deadlock that this patch of
> yours solved (together with some changes in bfq-mq).  I thought you added a
> sort of equivalent commit (now) to the mainline branch.  Am I wrong?

The patch I posted was never pulled to completion; it wasn't clear
to me whether it fixed your issue or not. Maybe I missed a reply on
that?

Let me take another stab at it today, I'll send you a version to test
on top of my for-linus branch.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-03-02 15:00         ` Jens Axboe
@ 2017-03-02 15:13           ` Jens Axboe
  2017-03-02 16:07             ` Paolo Valente
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2017-03-02 15:13 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Omar Sandoval, linux-block, kernel-team

On 03/02/2017 08:00 AM, Jens Axboe wrote:
> On 03/02/2017 03:28 AM, Paolo Valente wrote:
>>
>>> On 15 Feb 2017, at 19:04, Jens Axboe <axboe@fb.com> wrote:
>>>
>>> On 02/15/2017 10:58 AM, Jens Axboe wrote:
>>>> On 02/15/2017 10:24 AM, Paolo Valente wrote:
>>>>>
>>>>>> On 10 Feb 2017, at 19:32, Omar Sandoval <osandov@osandov.com> wrote:
>>>>>>
>>>>>> From: Omar Sandoval <osandov@fb.com>
>>>>>>
>>>>>> None of the other blk-mq elevator hooks are called with this lock held.
>>>>>> Additionally, it can lead to circular locking dependencies between
>>>>>> queue_lock and the private scheduler lock.
>>>>>>
>>>>>
>>>>> Hi Omar,
>>>>> I'm sorry, but it seems that a new potential deadlock has shown up.
>>>>> See lockdep splat below.
>>>>>
>>>>> I've tried to think of solutions other than going back to
>>>>> deferring the body of exit_icq, but to no avail.
>>>>
>>>> Looks like an interaction between bfqd->lock and q->queue_lock. Since the
>>>> core has no notion of your bfqd->lock, the natural dependency here
>>>> would be to nest bfqd->lock inside q->queue_lock. Is that possible for
>>>> you?
>>>>
>>>> Looking at the code a bit, maybe it'd just be simpler to get rid of
>>>> holding the queue lock for that spot. For the mq scheduler, we really
>>>> don't want places where we invoke with that held already. Does the below
>>>> work for you?
>>>
>>> Would need to remove one more lockdep assert. And only test this for
>>> the mq parts, we'd need to spread a bit of love on the classic
>>> scheduling icq exit path for this to work on that side.
>>>
>>
>>
>> Jens,
>> here is the reply I anticipated in my previous email: after rebasing
>> against master, I'm getting again the deadlock that this patch of
>> yours solved (together with some changes in bfq-mq).  I thought you added a
>> sort of equivalent commit (now) to the mainline branch.  Am I wrong?
> 
> The patch I posted was never pulled to completion, it wasn't clear
> to me if it fixed your issue or not. Maybe I missed a reply on
> that?
> 
> Let me take another stab at it today, I'll send you a version to test
> on top of my for-linus branch.

I took a look at my last posting, and I think it was only missing a
lock grab in cfq, since we don't hold that for icq exit anymore.  Paolo,
it'd be great if you could verify that this works. Not for bfq-mq (we
already know it does), but for CFQ. Run with lockdep enabled and see if
it spits out anything. We'll hit this exit path very quickly, so the
test doesn't need to be anything exhaustive.


diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index b12f9c87b4c3..546ff8f81ede 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -54,7 +54,7 @@ static void ioc_exit_icq(struct io_cq *icq)
 	icq->flags |= ICQ_EXITED;
 }
 
-/* Release an icq.  Called with both ioc and q locked. */
+/* Release an icq.  Called with ioc locked. */
 static void ioc_destroy_icq(struct io_cq *icq)
 {
 	struct io_context *ioc = icq->ioc;
@@ -62,7 +62,6 @@ static void ioc_destroy_icq(struct io_cq *icq)
 	struct elevator_type *et = q->elevator->type;
 
 	lockdep_assert_held(&ioc->lock);
-	lockdep_assert_held(q->queue_lock);
 
 	radix_tree_delete(&ioc->icq_tree, icq->q->id);
 	hlist_del_init(&icq->ioc_node);
@@ -222,25 +221,34 @@ void exit_io_context(struct task_struct *task)
 	put_io_context_active(ioc);
 }
 
+static void __ioc_clear_queue(struct list_head *icq_list)
+{
+	while (!list_empty(icq_list)) {
+		struct io_cq *icq = list_entry(icq_list->next,
+					       struct io_cq, q_node);
+		struct io_context *ioc = icq->ioc;
+
+		spin_lock_irq(&ioc->lock);
+		ioc_destroy_icq(icq);
+		spin_unlock_irq(&ioc->lock);
+	}
+}
+
 /**
  * ioc_clear_queue - break any ioc association with the specified queue
  * @q: request_queue being cleared
  *
- * Walk @q->icq_list and exit all io_cq's.  Must be called with @q locked.
+ * Walk @q->icq_list and exit all io_cq's.
  */
 void ioc_clear_queue(struct request_queue *q)
 {
-	lockdep_assert_held(q->queue_lock);
+	LIST_HEAD(icq_list);
 
-	while (!list_empty(&q->icq_list)) {
-		struct io_cq *icq = list_entry(q->icq_list.next,
-					       struct io_cq, q_node);
-		struct io_context *ioc = icq->ioc;
+	spin_lock_irq(q->queue_lock);
+	list_splice_init(&q->icq_list, &icq_list);
+	spin_unlock_irq(q->queue_lock);
 
-		spin_lock(&ioc->lock);
-		ioc_destroy_icq(icq);
-		spin_unlock(&ioc->lock);
-	}
+	__ioc_clear_queue(&icq_list);
 }
 
 int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 002af836aa87..c44b321335f3 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -815,9 +815,7 @@ static void blk_release_queue(struct kobject *kobj)
 	blkcg_exit_queue(q);
 
 	if (q->elevator) {
-		spin_lock_irq(q->queue_lock);
 		ioc_clear_queue(q);
-		spin_unlock_irq(q->queue_lock);
 		elevator_exit(q->elevator);
 	}
 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 137944777859..4a71c79de037 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3658,6 +3658,8 @@ static void cfq_exit_icq(struct io_cq *icq)
 	struct cfq_io_cq *cic = icq_to_cic(icq);
 	struct cfq_data *cfqd = cic_to_cfqd(cic);
 
+	spin_lock(cfqd->queue->queue_lock);
+
 	if (cic_to_cfqq(cic, false)) {
 		cfq_exit_cfqq(cfqd, cic_to_cfqq(cic, false));
 		cic_set_cfqq(cic, NULL, false);
@@ -3667,6 +3669,8 @@ static void cfq_exit_icq(struct io_cq *icq)
 		cfq_exit_cfqq(cfqd, cic_to_cfqq(cic, true));
 		cic_set_cfqq(cic, NULL, true);
 	}
+
+	spin_unlock(cfqd->queue->queue_lock);
 }
 
 static void cfq_init_prio_data(struct cfq_queue *cfqq, struct cfq_io_cq *cic)
diff --git a/block/elevator.c b/block/elevator.c
index ac1c9f481a98..01139f549b5b 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -983,9 +983,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 		if (old_registered)
 			elv_unregister_queue(q);
 
-		spin_lock_irq(q->queue_lock);
 		ioc_clear_queue(q);
-		spin_unlock_irq(q->queue_lock);
 	}
 
 	/* allocate, init and register new elevator */

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-03-02 15:13           ` Jens Axboe
@ 2017-03-02 16:07             ` Paolo Valente
  2017-03-02 16:13               ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Valente @ 2017-03-02 16:07 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Omar Sandoval, linux-block, kernel-team


> On 2 Mar 2017, at 16:13, Jens Axboe <axboe@fb.com> wrote:
>
> On 03/02/2017 08:00 AM, Jens Axboe wrote:
>> On 03/02/2017 03:28 AM, Paolo Valente wrote:
>>>
>>>> On 15 Feb 2017, at 19:04, Jens Axboe <axboe@fb.com> wrote:
>>>>
>>>> On 02/15/2017 10:58 AM, Jens Axboe wrote:
>>>>> On 02/15/2017 10:24 AM, Paolo Valente wrote:
>>>>>>
>>>>>>> On 10 Feb 2017, at 19:32, Omar Sandoval <osandov@osandov.com> wrote:
>>>>>>>
>>>>>>> From: Omar Sandoval <osandov@fb.com>
>>>>>>>
>>>>>>> None of the other blk-mq elevator hooks are called with this lock held.
>>>>>>> Additionally, it can lead to circular locking dependencies between
>>>>>>> queue_lock and the private scheduler lock.
>>>>>>>
>>>>>>
>>>>>> Hi Omar,
>>>>>> I'm sorry, but it seems that a new potential deadlock has shown up.
>>>>>> See lockdep splat below.
>>>>>>
>>>>>> I've tried to think of solutions other than going back to
>>>>>> deferring the body of exit_icq, but to no avail.
>>>>>
>>>>> Looks like an interaction between bfqd->lock and q->queue_lock. Since the
>>>>> core has no notion of your bfqd->lock, the natural dependency here
>>>>> would be to nest bfqd->lock inside q->queue_lock. Is that possible for
>>>>> you?
>>>>>
>>>>> Looking at the code a bit, maybe it'd just be simpler to get rid of
>>>>> holding the queue lock for that spot. For the mq scheduler, we really
>>>>> don't want places where we invoke with that held already. Does the below
>>>>> work for you?
>>>>
>>>> Would need to remove one more lockdep assert. And only test this for
>>>> the mq parts; we'd need to spread a bit of love on the classic
>>>> scheduling icq exit path for this to work on that side.
>>>>
>>>
>>>
>>> Jens,
>>> here is the reply I anticipated in my previous email: after rebasing
>>> against master, I'm getting again the deadlock that this patch of
>>> yours solved (together with some changes in bfq-mq).  I thought you added a
>>> sort of equivalent commit (now) to the mainline branch.  Am I wrong?
>>
>> The patch I posted was never pulled to completion; it wasn't clear
>> to me whether it fixed your issue or not. Maybe I missed a reply on
>> that?
>>
>> Let me take another stab at it today, I'll send you a version to test
>> on top of my for-linus branch.
>
> I took a look at my last posting, and I think it was only missing a
> lock grab in cfq, since we don't hold that for icq exit anymore.  Paolo,
> it'd be great if you could verify that this works. Not for bfq-mq (we
> already know it does), but for CFQ.

No luck, sorry :(

It looks like the fix is simply not to take q->queue_lock in cfq_exit_icq.

[    9.329501] =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=

[    9.336132] [ INFO: possible recursive locking detected ]
[    9.343801] 4.10.0-bfq-mq+ #56 Not tainted
[    9.353359] ---------------------------------------------
[    9.354101] ata_id/160 is trying to acquire lock:
[    9.354750]  (&(&q->__queue_lock)->rlock){..-...}, at: =
[<ffffffffaf48501a>] cfq_exit_icq+0x2a/0x80
[    9.355986]=20
[    9.355986] but task is already holding lock:
[    9.364221]  (&(&q->__queue_lock)->rlock){..-...}, at: =
[<ffffffffaf45a686>] put_io_context_active+0x86/0xe0
[    9.382980]=20
[    9.382980] other info that might help us debug this:
[    9.386460]  Possible unsafe locking scenario:
[    9.386460]=20
[    9.397070]        CPU0
[    9.397468]        ----
[    9.397811]   lock(&(&q->__queue_lock)->rlock);
[    9.398440]   lock(&(&q->__queue_lock)->rlock);
[    9.406265]=20
[    9.406265]  *** DEADLOCK ***
[    9.406265]=20
[    9.409589]  May be due to missing lock nesting notation
[    9.409589]=20
[    9.413040] 2 locks held by ata_id/160:
[    9.413576]  #0:  (&(&ioc->lock)->rlock/1){......}, at: =
[<ffffffffaf45a638>] put_io_context_active+0x38/0xe0
[    9.418069]  #1:  (&(&q->__queue_lock)->rlock){..-...}, at: =
[<ffffffffaf45a686>] put_io_context_active+0x86/0xe0
[    9.441118]=20
[    9.441118] stack backtrace:
[    9.445113] CPU: 0 PID: 160 Comm: ata_id Not tainted 4.10.0-bfq-mq+ =
#56
[    9.446210] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS =
VirtualBox 12/01/2006
[    9.457223] Call Trace:
[    9.457636]  dump_stack+0x85/0xc2
[    9.458183]  __lock_acquire+0x1834/0x1890
[    9.458839]  ? __lock_acquire+0x4af/0x1890
[    9.464206]  lock_acquire+0x11b/0x220
[    9.471703]  ? cfq_exit_icq+0x2a/0x80
[    9.472225]  _raw_spin_lock+0x3d/0x80
[    9.472784]  ? cfq_exit_icq+0x2a/0x80
[    9.476336]  cfq_exit_icq+0x2a/0x80
[    9.486750]  ioc_exit_icq+0x38/0x50
[    9.487245]  put_io_context_active+0x92/0xe0
[    9.488049]  exit_io_context+0x48/0x50
[    9.488423]  do_exit+0x7a8/0xc80
[    9.488797]  do_group_exit+0x50/0xd0
[    9.496655]  SyS_exit_group+0x14/0x20
[    9.497170]  entry_SYSCALL_64_fastpath+0x23/0xc6
[    9.497808] RIP: 0033:0x7f224d1c1b98
[    9.498302] RSP: 002b:00007ffd342666f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[    9.499360] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f224d1c1b98
[    9.511706] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[    9.516390] RBP: 00007f224db02690 R08: 00000000000000e7 R09: ffffffffffffff90
[    9.523044] R10: 00007f224d6d7250 R11: 0000000000000246 R12: 0000000000000016
[    9.524015] R13: 0000000000000001 R14: 000055938a1c3010 R15: 00007ffd34266170

Thanks,
Paolo

>  Run with lockdep enabled and see if
> it spits out anything. We'll hit this exit path very quickly, so the
> test doesn't need to be anything exhaustive.
> 
> 
> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index b12f9c87b4c3..546ff8f81ede 100644
> --- a/block/blk-ioc.c
> +++ b/block/blk-ioc.c
> @@ -54,7 +54,7 @@ static void ioc_exit_icq(struct io_cq *icq)
> 	icq->flags |= ICQ_EXITED;
> }
> 
> -/* Release an icq.  Called with both ioc and q locked. */
> +/* Release an icq.  Called with ioc locked. */
> static void ioc_destroy_icq(struct io_cq *icq)
> {
> 	struct io_context *ioc = icq->ioc;
> @@ -62,7 +62,6 @@ static void ioc_destroy_icq(struct io_cq *icq)
> 	struct elevator_type *et = q->elevator->type;
> 
> 	lockdep_assert_held(&ioc->lock);
> -	lockdep_assert_held(q->queue_lock);
> 
> 	radix_tree_delete(&ioc->icq_tree, icq->q->id);
> 	hlist_del_init(&icq->ioc_node);
> @@ -222,25 +221,34 @@ void exit_io_context(struct task_struct *task)
> 	put_io_context_active(ioc);
> }
> 
> +static void __ioc_clear_queue(struct list_head *icq_list)
> +{
> +	while (!list_empty(icq_list)) {
> +		struct io_cq *icq = list_entry(icq_list->next,
> +					       struct io_cq, q_node);
> +		struct io_context *ioc = icq->ioc;
> +
> +		spin_lock_irq(&ioc->lock);
> +		ioc_destroy_icq(icq);
> +		spin_unlock_irq(&ioc->lock);
> +	}
> +}
> +
> /**
>  * ioc_clear_queue - break any ioc association with the specified queue
>  * @q: request_queue being cleared
>  *
> - * Walk @q->icq_list and exit all io_cq's.  Must be called with @q locked.
> + * Walk @q->icq_list and exit all io_cq's.
>  */
> void ioc_clear_queue(struct request_queue *q)
> {
> -	lockdep_assert_held(q->queue_lock);
> +	LIST_HEAD(icq_list);
> 
> -	while (!list_empty(&q->icq_list)) {
> -		struct io_cq *icq = list_entry(q->icq_list.next,
> -					       struct io_cq, q_node);
> -		struct io_context *ioc = icq->ioc;
> +	spin_lock_irq(q->queue_lock);
> +	list_splice_init(&q->icq_list, &icq_list);
> +	spin_unlock_irq(q->queue_lock);
> 
> -		spin_lock(&ioc->lock);
> -		ioc_destroy_icq(icq);
> -		spin_unlock(&ioc->lock);
> -	}
> +	__ioc_clear_queue(&icq_list);
> }
> 
> int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 002af836aa87..c44b321335f3 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -815,9 +815,7 @@ static void blk_release_queue(struct kobject *kobj)
> 	blkcg_exit_queue(q);
> 
> 	if (q->elevator) {
> -		spin_lock_irq(q->queue_lock);
> 		ioc_clear_queue(q);
> -		spin_unlock_irq(q->queue_lock);
> 		elevator_exit(q->elevator);
> 	}
> 
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 137944777859..4a71c79de037 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -3658,6 +3658,8 @@ static void cfq_exit_icq(struct io_cq *icq)
> 	struct cfq_io_cq *cic = icq_to_cic(icq);
> 	struct cfq_data *cfqd = cic_to_cfqd(cic);
> 
> +	spin_lock(cfqd->queue->queue_lock);
> +
> 	if (cic_to_cfqq(cic, false)) {
> 		cfq_exit_cfqq(cfqd, cic_to_cfqq(cic, false));
> 		cic_set_cfqq(cic, NULL, false);
> @@ -3667,6 +3669,8 @@ static void cfq_exit_icq(struct io_cq *icq)
> 		cfq_exit_cfqq(cfqd, cic_to_cfqq(cic, true));
> 		cic_set_cfqq(cic, NULL, true);
> 	}
> +
> +	spin_unlock(cfqd->queue->queue_lock);
> }
> 
> static void cfq_init_prio_data(struct cfq_queue *cfqq, struct cfq_io_cq *cic)
> diff --git a/block/elevator.c b/block/elevator.c
> index ac1c9f481a98..01139f549b5b 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -983,9 +983,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
> 		if (old_registered)
> 			elv_unregister_queue(q);
> 
> -		spin_lock_irq(q->queue_lock);
> 		ioc_clear_queue(q);
> -		spin_unlock_irq(q->queue_lock);
> 	}
> 
> 	/* allocate, init and register new elevator */
> 
> --
> Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-03-02 16:07             ` Paolo Valente
@ 2017-03-02 16:13               ` Jens Axboe
  2017-03-02 18:07                 ` Paolo Valente
  2017-03-02 20:53                 ` Omar Sandoval
  0 siblings, 2 replies; 18+ messages in thread
From: Jens Axboe @ 2017-03-02 16:13 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Omar Sandoval, linux-block, kernel-team

On 03/02/2017 09:07 AM, Paolo Valente wrote:
> 
>> Il giorno 02 mar 2017, alle ore 16:13, Jens Axboe <axboe@fb.com> ha scritto:
>>
>> On 03/02/2017 08:00 AM, Jens Axboe wrote:
>>> On 03/02/2017 03:28 AM, Paolo Valente wrote:
>>>>
>>>>> Il giorno 15 feb 2017, alle ore 19:04, Jens Axboe <axboe@fb.com> ha scritto:
>>>>>
>>>>> On 02/15/2017 10:58 AM, Jens Axboe wrote:
>>>>>> On 02/15/2017 10:24 AM, Paolo Valente wrote:
>>>>>>>
>>>>>>>> Il giorno 10 feb 2017, alle ore 19:32, Omar Sandoval <osandov@osandov.com> ha scritto:
>>>>>>>>
>>>>>>>> From: Omar Sandoval <osandov@fb.com>
>>>>>>>>
>>>>>>>> None of the other blk-mq elevator hooks are called with this lock held.
>>>>>>>> Additionally, it can lead to circular locking dependencies between
>>>>>>>> queue_lock and the private scheduler lock.
>>>>>>>>
>>>>>>>
>>>>>>> Hi Omar,
>>>>>>> I'm sorry but it seems that a new potential deadlock has showed up.
>>>>>>> See lockdep splat below.
>>>>>>>
>>>>>>> I've tried to think about different solutions than turning back to
>>>>>>> deferring the body of exit_icq, but at no avail.
>>>>>>
>>>>>> Looks like a interaction between bfqd->lock and q->queue_lock. Since the
>>>>>> core has no notion of you bfqd->lock, the naturally dependency here
>>>>>> would be to nest bfqd->lock inside q->queue_lock. Is that possible for
>>>>>> you?
>>>>>>
>>>>>> Looking at the code a bit, maybe it'd just be simpler to get rid of
>>>>>> holding the queue lock for that spot. For the mq scheduler, we really
>>>>>> don't want places where we invoke with that held already. Does the below
>>>>>> work for you?
>>>>>
>>>>> Would need to remove one more lockdep assert. And only test this for
>>>>> the mq parts, we'd need to spread a bit of love on the classic
>>>>> scheduling icq exit path for this to work on that side.
>>>>>
>>>>
>>>>
>>>> Jens,
>>>> here is the reply I anticipated in my previous email: after rebasing
>>>> against master, I'm getting again the deadlock that this patch of
>>>> yours solved (together with some changes in bfq-mq).  I thought you added a
>>>> sort of equivalent commit (now) to the mainline branch.  Am I wrong?
>>>
>>> The patch I posted was never pulled to completion, it wasn't clear
>>> to me if it fixed your issue or not. Maybe I missed a reply on
>>> that?
>>>
>>> Let me take another stab at it today, I'll send you a version to test
>>> on top of my for-linus branch.
>>
>> I took at look at my last posting, and I think it was only missing a
>> lock grab in cfq, since we don't hold that for icq exit anymore.  Paolo,
>> it'd be great if you could verify that this works. Not for bfq-mq (we
>> already know it does), but for CFQ.
> 
> No luck, sorry :(
> 
> It looks like to have just not to take q->queue_lock in cfq_exit_icq.

I was worried about that. How about the below? We need to grab the lock
at some point for legacy scheduling, but the ordering should be correct.


diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index b12f9c87b4c3..6fd633b5d567 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -54,7 +54,7 @@ static void ioc_exit_icq(struct io_cq *icq)
 	icq->flags |= ICQ_EXITED;
 }
 
-/* Release an icq.  Called with both ioc and q locked. */
+/* Release an icq.  Called with ioc locked. */
 static void ioc_destroy_icq(struct io_cq *icq)
 {
 	struct io_context *ioc = icq->ioc;
@@ -62,7 +62,6 @@ static void ioc_destroy_icq(struct io_cq *icq)
 	struct elevator_type *et = q->elevator->type;
 
 	lockdep_assert_held(&ioc->lock);
-	lockdep_assert_held(q->queue_lock);
 
 	radix_tree_delete(&ioc->icq_tree, icq->q->id);
 	hlist_del_init(&icq->ioc_node);
@@ -222,24 +221,40 @@ void exit_io_context(struct task_struct *task)
 	put_io_context_active(ioc);
 }
 
+static void __ioc_clear_queue(struct list_head *icq_list)
+{
+	unsigned long flags;
+
+	while (!list_empty(icq_list)) {
+		struct io_cq *icq = list_entry(icq_list->next,
+					       struct io_cq, q_node);
+		struct io_context *ioc = icq->ioc;
+
+		spin_lock_irqsave(&ioc->lock, flags);
+		ioc_destroy_icq(icq);
+		spin_unlock_irqrestore(&ioc->lock, flags);
+	}
+}
+
 /**
  * ioc_clear_queue - break any ioc association with the specified queue
  * @q: request_queue being cleared
  *
- * Walk @q->icq_list and exit all io_cq's.  Must be called with @q locked.
+ * Walk @q->icq_list and exit all io_cq's.
  */
 void ioc_clear_queue(struct request_queue *q)
 {
-	lockdep_assert_held(q->queue_lock);
+	LIST_HEAD(icq_list);
 
-	while (!list_empty(&q->icq_list)) {
-		struct io_cq *icq = list_entry(q->icq_list.next,
-					       struct io_cq, q_node);
-		struct io_context *ioc = icq->ioc;
+	spin_lock_irq(q->queue_lock);
+	list_splice_init(&q->icq_list, &icq_list);
 
-		spin_lock(&ioc->lock);
-		ioc_destroy_icq(icq);
-		spin_unlock(&ioc->lock);
+	if (q->mq_ops) {
+		spin_unlock_irq(q->queue_lock);
+		__ioc_clear_queue(&icq_list);
+	} else {
+		__ioc_clear_queue(&icq_list);
+		spin_unlock_irq(q->queue_lock);
 	}
 }
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 002af836aa87..c44b321335f3 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -815,9 +815,7 @@ static void blk_release_queue(struct kobject *kobj)
 	blkcg_exit_queue(q);
 
 	if (q->elevator) {
-		spin_lock_irq(q->queue_lock);
 		ioc_clear_queue(q);
-		spin_unlock_irq(q->queue_lock);
 		elevator_exit(q->elevator);
 	}
 
diff --git a/block/elevator.c b/block/elevator.c
index ac1c9f481a98..01139f549b5b 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -983,9 +983,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 		if (old_registered)
 			elv_unregister_queue(q);
 
-		spin_lock_irq(q->queue_lock);
 		ioc_clear_queue(q);
-		spin_unlock_irq(q->queue_lock);
 	}
 
 	/* allocate, init and register new elevator */

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-03-02 16:13               ` Jens Axboe
@ 2017-03-02 18:07                 ` Paolo Valente
  2017-03-02 18:25                   ` Jens Axboe
  2017-03-02 20:53                 ` Omar Sandoval
  1 sibling, 1 reply; 18+ messages in thread
From: Paolo Valente @ 2017-03-02 18:07 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Omar Sandoval, linux-block, kernel-team


> Il giorno 02 mar 2017, alle ore 17:13, Jens Axboe <axboe@fb.com> ha scritto:
> 
> On 03/02/2017 09:07 AM, Paolo Valente wrote:
>> 
>>> Il giorno 02 mar 2017, alle ore 16:13, Jens Axboe <axboe@fb.com> ha scritto:
>>> 
>>> On 03/02/2017 08:00 AM, Jens Axboe wrote:
>>>> On 03/02/2017 03:28 AM, Paolo Valente wrote:
>>>>> 
>>>>>> Il giorno 15 feb 2017, alle ore 19:04, Jens Axboe <axboe@fb.com> ha scritto:
>>>>>> 
>>>>>> On 02/15/2017 10:58 AM, Jens Axboe wrote:
>>>>>>> On 02/15/2017 10:24 AM, Paolo Valente wrote:
>>>>>>>> 
>>>>>>>>> Il giorno 10 feb 2017, alle ore 19:32, Omar Sandoval <osandov@osandov.com> ha scritto:
>>>>>>>>> 
>>>>>>>>> From: Omar Sandoval <osandov@fb.com>
>>>>>>>>> 
>>>>>>>>> None of the other blk-mq elevator hooks are called with this lock held.
>>>>>>>>> Additionally, it can lead to circular locking dependencies between
>>>>>>>>> queue_lock and the private scheduler lock.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hi Omar,
>>>>>>>> I'm sorry but it seems that a new potential deadlock has showed up.
>>>>>>>> See lockdep splat below.
>>>>>>>> 
>>>>>>>> I've tried to think about different solutions than turning back to
>>>>>>>> deferring the body of exit_icq, but at no avail.
>>>>>>> 
>>>>>>> Looks like a interaction between bfqd->lock and q->queue_lock. Since the
>>>>>>> core has no notion of you bfqd->lock, the naturally dependency here
>>>>>>> would be to nest bfqd->lock inside q->queue_lock. Is that possible for
>>>>>>> you?
>>>>>>> 
>>>>>>> Looking at the code a bit, maybe it'd just be simpler to get rid of
>>>>>>> holding the queue lock for that spot. For the mq scheduler, we really
>>>>>>> don't want places where we invoke with that held already. Does the below
>>>>>>> work for you?
>>>>>> 
>>>>>> Would need to remove one more lockdep assert. And only test this for
>>>>>> the mq parts, we'd need to spread a bit of love on the classic
>>>>>> scheduling icq exit path for this to work on that side.
>>>>>> 
>>>>> 
>>>>> 
>>>>> Jens,
>>>>> here is the reply I anticipated in my previous email: after rebasing
>>>>> against master, I'm getting again the deadlock that this patch of
>>>>> yours solved (together with some changes in bfq-mq).  I thought you added a
>>>>> sort of equivalent commit (now) to the mainline branch.  Am I wrong?
>>>> 
>>>> The patch I posted was never pulled to completion, it wasn't clear
>>>> to me if it fixed your issue or not. Maybe I missed a reply on
>>>> that?
>>>> 
>>>> Let me take another stab at it today, I'll send you a version to test
>>>> on top of my for-linus branch.
>>> 
>>> I took at look at my last posting, and I think it was only missing a
>>> lock grab in cfq, since we don't hold that for icq exit anymore.  Paolo,
>>> it'd be great if you could verify that this works. Not for bfq-mq (we
>>> already know it does), but for CFQ.
>> 
>> No luck, sorry :(
>> 
>> It looks like to have just not to take q->queue_lock in cfq_exit_icq.
> 
> I was worried about that. How about the below? We need to grab the lock
> at some point for legacy scheduling, but the ordering should be correct.
> 
> 

It seems to be: no more crashes or lockdep complaints after several tests, boots and shutdowns :)

Thanks,
Paolo


> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index b12f9c87b4c3..6fd633b5d567 100644
> --- a/block/blk-ioc.c
> +++ b/block/blk-ioc.c
> @@ -54,7 +54,7 @@ static void ioc_exit_icq(struct io_cq *icq)
> 	icq->flags |= ICQ_EXITED;
> }
> 
> -/* Release an icq.  Called with both ioc and q locked. */
> +/* Release an icq.  Called with ioc locked. */
> static void ioc_destroy_icq(struct io_cq *icq)
> {
> 	struct io_context *ioc = icq->ioc;
> @@ -62,7 +62,6 @@ static void ioc_destroy_icq(struct io_cq *icq)
> 	struct elevator_type *et = q->elevator->type;
> 
> 	lockdep_assert_held(&ioc->lock);
> -	lockdep_assert_held(q->queue_lock);
> 
> 	radix_tree_delete(&ioc->icq_tree, icq->q->id);
> 	hlist_del_init(&icq->ioc_node);
> @@ -222,24 +221,40 @@ void exit_io_context(struct task_struct *task)
> 	put_io_context_active(ioc);
> }
> 
> +static void __ioc_clear_queue(struct list_head *icq_list)
> +{
> +	unsigned long flags;
> +
> +	while (!list_empty(icq_list)) {
> +		struct io_cq *icq = list_entry(icq_list->next,
> +					       struct io_cq, q_node);
> +		struct io_context *ioc = icq->ioc;
> +
> +		spin_lock_irqsave(&ioc->lock, flags);
> +		ioc_destroy_icq(icq);
> +		spin_unlock_irqrestore(&ioc->lock, flags);
> +	}
> +}
> +
> /**
>  * ioc_clear_queue - break any ioc association with the specified queue
>  * @q: request_queue being cleared
>  *
> - * Walk @q->icq_list and exit all io_cq's.  Must be called with @q locked.
> + * Walk @q->icq_list and exit all io_cq's.
>  */
> void ioc_clear_queue(struct request_queue *q)
> {
> -	lockdep_assert_held(q->queue_lock);
> +	LIST_HEAD(icq_list);
> 
> -	while (!list_empty(&q->icq_list)) {
> -		struct io_cq *icq = list_entry(q->icq_list.next,
> -					       struct io_cq, q_node);
> -		struct io_context *ioc = icq->ioc;
> +	spin_lock_irq(q->queue_lock);
> +	list_splice_init(&q->icq_list, &icq_list);
> 
> -		spin_lock(&ioc->lock);
> -		ioc_destroy_icq(icq);
> -		spin_unlock(&ioc->lock);
> +	if (q->mq_ops) {
> +		spin_unlock_irq(q->queue_lock);
> +		__ioc_clear_queue(&icq_list);
> +	} else {
> +		__ioc_clear_queue(&icq_list);
> +		spin_unlock_irq(q->queue_lock);
> 	}
> }
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 002af836aa87..c44b321335f3 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -815,9 +815,7 @@ static void blk_release_queue(struct kobject *kobj)
> 	blkcg_exit_queue(q);
> 
> 	if (q->elevator) {
> -		spin_lock_irq(q->queue_lock);
> 		ioc_clear_queue(q);
> -		spin_unlock_irq(q->queue_lock);
> 		elevator_exit(q->elevator);
> 	}
> 
> diff --git a/block/elevator.c b/block/elevator.c
> index ac1c9f481a98..01139f549b5b 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -983,9 +983,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
> 		if (old_registered)
> 			elv_unregister_queue(q);
> 
> -		spin_lock_irq(q->queue_lock);
> 		ioc_clear_queue(q);
> -		spin_unlock_irq(q->queue_lock);
> 	}
> 
> 	/* allocate, init and register new elevator */
> 
> --
> Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-03-02 18:07                 ` Paolo Valente
@ 2017-03-02 18:25                   ` Jens Axboe
  2017-03-02 20:31                     ` Paolo Valente
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2017-03-02 18:25 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Omar Sandoval, linux-block, kernel-team

On 03/02/2017 11:07 AM, Paolo Valente wrote:
> 
>> Il giorno 02 mar 2017, alle ore 17:13, Jens Axboe <axboe@fb.com> ha scritto:
>>
>> On 03/02/2017 09:07 AM, Paolo Valente wrote:
>>>
>>>> Il giorno 02 mar 2017, alle ore 16:13, Jens Axboe <axboe@fb.com> ha scritto:
>>>>
>>>> On 03/02/2017 08:00 AM, Jens Axboe wrote:
>>>>> On 03/02/2017 03:28 AM, Paolo Valente wrote:
>>>>>>
>>>>>>> Il giorno 15 feb 2017, alle ore 19:04, Jens Axboe <axboe@fb.com> ha scritto:
>>>>>>>
>>>>>>> On 02/15/2017 10:58 AM, Jens Axboe wrote:
>>>>>>>> On 02/15/2017 10:24 AM, Paolo Valente wrote:
>>>>>>>>>
>>>>>>>>>> Il giorno 10 feb 2017, alle ore 19:32, Omar Sandoval <osandov@osandov.com> ha scritto:
>>>>>>>>>>
>>>>>>>>>> From: Omar Sandoval <osandov@fb.com>
>>>>>>>>>>
>>>>>>>>>> None of the other blk-mq elevator hooks are called with this lock held.
>>>>>>>>>> Additionally, it can lead to circular locking dependencies between
>>>>>>>>>> queue_lock and the private scheduler lock.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Omar,
>>>>>>>>> I'm sorry but it seems that a new potential deadlock has showed up.
>>>>>>>>> See lockdep splat below.
>>>>>>>>>
>>>>>>>>> I've tried to think about different solutions than turning back to
>>>>>>>>> deferring the body of exit_icq, but at no avail.
>>>>>>>>
>>>>>>>> Looks like a interaction between bfqd->lock and q->queue_lock. Since the
>>>>>>>> core has no notion of you bfqd->lock, the naturally dependency here
>>>>>>>> would be to nest bfqd->lock inside q->queue_lock. Is that possible for
>>>>>>>> you?
>>>>>>>>
>>>>>>>> Looking at the code a bit, maybe it'd just be simpler to get rid of
>>>>>>>> holding the queue lock for that spot. For the mq scheduler, we really
>>>>>>>> don't want places where we invoke with that held already. Does the below
>>>>>>>> work for you?
>>>>>>>
>>>>>>> Would need to remove one more lockdep assert. And only test this for
>>>>>>> the mq parts, we'd need to spread a bit of love on the classic
>>>>>>> scheduling icq exit path for this to work on that side.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Jens,
>>>>>> here is the reply I anticipated in my previous email: after rebasing
>>>>>> against master, I'm getting again the deadlock that this patch of
>>>>>> yours solved (together with some changes in bfq-mq).  I thought you added a
>>>>>> sort of equivalent commit (now) to the mainline branch.  Am I wrong?
>>>>>
>>>>> The patch I posted was never pulled to completion, it wasn't clear
>>>>> to me if it fixed your issue or not. Maybe I missed a reply on
>>>>> that?
>>>>>
>>>>> Let me take another stab at it today, I'll send you a version to test
>>>>> on top of my for-linus branch.
>>>>
>>>> I took at look at my last posting, and I think it was only missing a
>>>> lock grab in cfq, since we don't hold that for icq exit anymore.  Paolo,
>>>> it'd be great if you could verify that this works. Not for bfq-mq (we
>>>> already know it does), but for CFQ.
>>>
>>> No luck, sorry :(
>>>
>>> It looks like to have just not to take q->queue_lock in cfq_exit_icq.
>>
>> I was worried about that. How about the below? We need to grab the lock
>> at some point for legacy scheduling, but the ordering should be correct.
>>
>>
> 
> It seems to be: no more crashes or lockdep complains after several tests, boots and shutdowns :)

Great! Can I add your Tested-by?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-03-02 18:25                   ` Jens Axboe
@ 2017-03-02 20:31                     ` Paolo Valente
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Valente @ 2017-03-02 20:31 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Omar Sandoval, linux-block, kernel-team


> Il giorno 02 mar 2017, alle ore 19:25, Jens Axboe <axboe@fb.com> ha scritto:
> 
> On 03/02/2017 11:07 AM, Paolo Valente wrote:
>> 
>>> Il giorno 02 mar 2017, alle ore 17:13, Jens Axboe <axboe@fb.com> ha scritto:
>>> 
>>> On 03/02/2017 09:07 AM, Paolo Valente wrote:
>>>> 
>>>>> Il giorno 02 mar 2017, alle ore 16:13, Jens Axboe <axboe@fb.com> ha scritto:
>>>>> 
>>>>> On 03/02/2017 08:00 AM, Jens Axboe wrote:
>>>>>> On 03/02/2017 03:28 AM, Paolo Valente wrote:
>>>>>>> 
>>>>>>>> Il giorno 15 feb 2017, alle ore 19:04, Jens Axboe <axboe@fb.com> ha scritto:
>>>>>>>> 
>>>>>>>> On 02/15/2017 10:58 AM, Jens Axboe wrote:
>>>>>>>>> On 02/15/2017 10:24 AM, Paolo Valente wrote:
>>>>>>>>>> 
>>>>>>>>>>> Il giorno 10 feb 2017, alle ore 19:32, Omar Sandoval <osandov@osandov.com> ha scritto:
>>>>>>>>>>> 
>>>>>>>>>>> From: Omar Sandoval <osandov@fb.com>
>>>>>>>>>>> 
>>>>>>>>>>> None of the other blk-mq elevator hooks are called with this lock held.
>>>>>>>>>>> Additionally, it can lead to circular locking dependencies between
>>>>>>>>>>> queue_lock and the private scheduler lock.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Hi Omar,
>>>>>>>>>> I'm sorry but it seems that a new potential deadlock has showed up.
>>>>>>>>>> See lockdep splat below.
>>>>>>>>>> 
>>>>>>>>>> I've tried to think about different solutions than turning back to
>>>>>>>>>> deferring the body of exit_icq, but at no avail.
>>>>>>>>> 
>>>>>>>>> Looks like a interaction between bfqd->lock and q->queue_lock. Since the
>>>>>>>>> core has no notion of you bfqd->lock, the naturally dependency here
>>>>>>>>> would be to nest bfqd->lock inside q->queue_lock. Is that possible for
>>>>>>>>> you?
>>>>>>>>> 
>>>>>>>>> Looking at the code a bit, maybe it'd just be simpler to get rid of
>>>>>>>>> holding the queue lock for that spot. For the mq scheduler, we really
>>>>>>>>> don't want places where we invoke with that held already. Does the below
>>>>>>>>> work for you?
>>>>>>>> 
>>>>>>>> Would need to remove one more lockdep assert. And only test this for
>>>>>>>> the mq parts, we'd need to spread a bit of love on the classic
>>>>>>>> scheduling icq exit path for this to work on that side.
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Jens,
>>>>>>> here is the reply I anticipated in my previous email: after rebasing
>>>>>>> against master, I'm getting again the deadlock that this patch of
>>>>>>> yours solved (together with some changes in bfq-mq).  I thought you added a
>>>>>>> sort of equivalent commit (now) to the mainline branch.  Am I wrong?
>>>>>> 
>>>>>> The patch I posted was never pulled to completion, it wasn't clear
>>>>>> to me if it fixed your issue or not. Maybe I missed a reply on
>>>>>> that?
>>>>>> 
>>>>>> Let me take another stab at it today, I'll send you a version to test
>>>>>> on top of my for-linus branch.
>>>>> 
>>>>> I took at look at my last posting, and I think it was only missing a
>>>>> lock grab in cfq, since we don't hold that for icq exit anymore.  Paolo,
>>>>> it'd be great if you could verify that this works. Not for bfq-mq (we
>>>>> already know it does), but for CFQ.
>>>> 
>>>> No luck, sorry :(
>>>> 
>>>> It looks like to have just not to take q->queue_lock in cfq_exit_icq.
>>> 
>>> I was worried about that. How about the below? We need to grab the lock
>>> at some point for legacy scheduling, but the ordering should be correct.
>>> 
>>> 
>> 
>> It seems to be: no more crashes or lockdep complains after several tests, boots and shutdowns :)
> 
> Great! Can I add your Tested-by?

Sure!

Thanks,
Paolo

> 
> --
> Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-03-02 16:13               ` Jens Axboe
  2017-03-02 18:07                 ` Paolo Valente
@ 2017-03-02 20:53                 ` Omar Sandoval
  2017-03-02 20:59                   ` Jens Axboe
  1 sibling, 1 reply; 18+ messages in thread
From: Omar Sandoval @ 2017-03-02 20:53 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Paolo Valente, linux-block, kernel-team

On Thu, Mar 02, 2017 at 09:13:30AM -0700, Jens Axboe wrote:
> I was worried about that. How about the below? We need to grab the lock
> at some point for legacy scheduling, but the ordering should be correct.

Makes sense, now the locking is consistent with the other place we call
ioc_exit_icq(). One nit below, and you can add

Reviewed-by: Omar Sandoval <osandov@fb.com>

> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index b12f9c87b4c3..6fd633b5d567 100644
> --- a/block/blk-ioc.c
> +++ b/block/blk-ioc.c
> @@ -54,7 +54,7 @@ static void ioc_exit_icq(struct io_cq *icq)
>  	icq->flags |= ICQ_EXITED;
>  }
>  
> -/* Release an icq.  Called with both ioc and q locked. */
> +/* Release an icq.  Called with ioc locked. */

For ioc_exit_icq(), we have the more explicit comment

/*
 * Exit an icq. Called with both ioc and q locked for sq, only ioc locked for
 * mq.
 */

Could you document that here, too?

>  static void ioc_destroy_icq(struct io_cq *icq)
>  {
>  	struct io_context *ioc = icq->ioc;
> @@ -62,7 +62,6 @@ static void ioc_destroy_icq(struct io_cq *icq)
>  	struct elevator_type *et = q->elevator->type;
>  
>  	lockdep_assert_held(&ioc->lock);
> -	lockdep_assert_held(q->queue_lock);
>  
>  	radix_tree_delete(&ioc->icq_tree, icq->q->id);
>  	hlist_del_init(&icq->ioc_node);
> @@ -222,24 +221,40 @@ void exit_io_context(struct task_struct *task)
>  	put_io_context_active(ioc);
>  }

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq
  2017-03-02 20:53                 ` Omar Sandoval
@ 2017-03-02 20:59                   ` Jens Axboe
  0 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2017-03-02 20:59 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: Paolo Valente, linux-block, kernel-team

On 03/02/2017 01:53 PM, Omar Sandoval wrote:
> On Thu, Mar 02, 2017 at 09:13:30AM -0700, Jens Axboe wrote:
>> I was worried about that. How about the below? We need to grab the lock
>> at some point for legacy scheduling, but the ordering should be correct.
> 
> Makes sense, now the locking is consistent with the other place we call
> ioc_exit_icq(). One nit below, and you can add
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>
> 
>> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
>> index b12f9c87b4c3..6fd633b5d567 100644
>> --- a/block/blk-ioc.c
>> +++ b/block/blk-ioc.c
>> @@ -54,7 +54,7 @@ static void ioc_exit_icq(struct io_cq *icq)
>>  	icq->flags |= ICQ_EXITED;
>>  }
>>  
>> -/* Release an icq.  Called with both ioc and q locked. */
>> +/* Release an icq.  Called with ioc locked. */
> 
> For ioc_exit_icq(), we have the more explicit comment
> 
> /*
>  * Exit an icq. Called with both ioc and q locked for sq, only ioc locked for
>  * mq.
>  */
> 
> Could you document that here, too?

Done, I've synced the two comments now. Thanks for the review!

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-03-02 21:35 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-10 18:32 [PATCH] blk-mq-sched: don't hold queue_lock when calling exit_icq Omar Sandoval
2017-02-10 18:35 ` Jens Axboe
2017-02-15 17:24 ` Paolo Valente
2017-02-15 17:58   ` Jens Axboe
2017-02-15 18:04     ` Jens Axboe
2017-02-16 10:31       ` Paolo Valente
2017-02-17 10:30         ` Paolo Valente
2017-02-22 21:21           ` Paolo Valente
2017-03-02 10:28       ` Paolo Valente
2017-03-02 15:00         ` Jens Axboe
2017-03-02 15:13           ` Jens Axboe
2017-03-02 16:07             ` Paolo Valente
2017-03-02 16:13               ` Jens Axboe
2017-03-02 18:07                 ` Paolo Valente
2017-03-02 18:25                   ` Jens Axboe
2017-03-02 20:31                     ` Paolo Valente
2017-03-02 20:53                 ` Omar Sandoval
2017-03-02 20:59                   ` Jens Axboe
