* [PATCH 0/4] Fixes for potential deadlock during reboot
@ 2019-06-12 16:19 Coly Li
  2019-06-12 16:19 ` [PATCH 1/4] bcache: avoid a deadlock in bcache_reboot() Coly Li
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Coly Li @ 2019-06-12 16:19 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

Hi folks,

Recently during my for-5.3 testing, I observed some lockdep warnings
about potential deadlocks while rebooting the kernel. Such potential
deadlocks are very hard to trigger in real life, but fixing them and
making lockdep happy is still worthwhile IMHO.

This series is an effort to fix these potential deadlock issues. Since
they are not critical stability fixes, they won't be CCed to the stable
mailing list.

Thanks in advance for any comment or review.

Coly Li
---

Coly Li (4):
  bcache: avoid a deadlock in bcache_reboot()
  bcache: acquire bch_register_lock later in cached_dev_detach_finish()
  bcache: acquire bch_register_lock later in cached_dev_free()
  bcache: fix potential deadlock in cached_dev_free()

 drivers/md/bcache/super.c     | 51 +++++++++++++++++++++++++++++++++++++------
 drivers/md/bcache/sysfs.c     | 26 +++++++++++++++++++++++
 drivers/md/bcache/writeback.c |  4 ++++
 3 files changed, 75 insertions(+), 6 deletions(-)

-- 
2.16.4



* [PATCH 1/4] bcache: avoid a deadlock in bcache_reboot()
  2019-06-12 16:19 [PATCH 0/4] Fixes for potential deadlock during reboot Coly Li
@ 2019-06-12 16:19 ` Coly Li
  2019-06-12 16:19 ` [PATCH 2/4] bcache: acquire bch_register_lock later in cached_dev_detach_finish() Coly Li
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Coly Li @ 2019-06-12 16:19 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

A deadlock in bcache_reboot() is observed quite frequently, and it
hangs the system reboot process. The reason is that bcache_reboot()
holds the mutex bch_register_lock while calling bch_cache_set_stop()
and bcache_device_stop(), but the code paths that stop the cache set
and the bcache device acquire bch_register_lock again. Since the mutex
is already held, a deadlock happens inside the stopping process, and
the aftermath is that the whole system reboot hangs.
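
To make the lock pattern concrete, here is a minimal userspace C
sketch of the same A-A scenario (not bcache code; the names are
hypothetical stand-ins). With a default, non-recursive mutex the
second lock attempt on the same call chain blocks forever:

	#include <pthread.h>
	#include <stdio.h>

	static pthread_mutex_t register_lock = PTHREAD_MUTEX_INITIALIZER;

	/* Stands in for bch_cache_set_stop()/bcache_device_stop(),
	 * which take the registration lock again further down. */
	static void device_stop(void)
	{
		pthread_mutex_lock(&register_lock);  /* 2nd lock: hangs */
		pthread_mutex_unlock(&register_lock);
	}

	int main(void)
	{
		pthread_mutex_lock(&register_lock);  /* as bcache_reboot() did */
		device_stop();                       /* never returns */
		pthread_mutex_unlock(&register_lock);
		printf("unreachable\n");
		return 0;
	}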

The fix is to avoid holding bch_register_lock around the following
loops in bcache_reboot():
       list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
                bch_cache_set_stop(c);

        list_for_each_entry_safe(dc, tdc, &uncached_devices, list)
                bcache_device_stop(&dc->disk);

A module-scope variable 'bcache_is_reboot' is added, and it is set to
true in bcache_reboot(). In register_bcache(), if bcache_is_reboot is
true, the registration is rejected by returning -EBUSY immediately,
after dropping the module reference taken at entry.
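
As a compilable userspace analog of that flag-plus-barrier pairing
(C11 seq_cst atomics standing in for the kernel's smp_mb(); the names
are hypothetical, not the bcache API):

	#include <errno.h>
	#include <stdatomic.h>
	#include <stdbool.h>

	static atomic_bool is_reboot;            /* cf. bcache_is_reboot */

	/* reboot path, cf. bcache_reboot() */
	static void reboot_notifier(void)
	{
		/* seq_cst store: visible before devices are stopped */
		atomic_store(&is_reboot, true);
		/* ... stop cache sets and devices, lock not held ... */
	}

	/* registration path, cf. register_bcache() */
	static int register_device(void)
	{
		if (atomic_load(&is_reboot))     /* pairs with the store */
			return -EBUSY;
		/* ... proceed with registration ... */
		return 0;
	}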

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/super.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 drivers/md/bcache/sysfs.c | 26 ++++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 16ed617183f7..2928adfea743 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -40,6 +40,7 @@ static const char invalid_uuid[] = {
 
 static struct kobject *bcache_kobj;
 struct mutex bch_register_lock;
+bool bcache_is_reboot;
 LIST_HEAD(bch_cache_sets);
 static LIST_HEAD(uncached_devices);
 
@@ -49,6 +50,7 @@ static wait_queue_head_t unregister_wait;
 struct workqueue_struct *bcache_wq;
 struct workqueue_struct *bch_journal_wq;
 
+
 #define BTREE_MAX_PAGES		(256 * 1024 / PAGE_SIZE)
 /* limitation of partitions number on single bcache device */
 #define BCACHE_MINORS		128
@@ -2345,6 +2347,13 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 	if (!try_module_get(THIS_MODULE))
 		return -EBUSY;
 
+	/* For latest state of bcache_is_reboot */
+	smp_mb();
+	if (bcache_is_reboot) {
+		module_put(THIS_MODULE);
+		return -EBUSY;
+	}
+
 	path = kstrndup(buffer, size, GFP_KERNEL);
 	if (!path)
 		goto err;
@@ -2474,6 +2481,9 @@ static ssize_t bch_pending_bdevs_cleanup(struct kobject *k,
 
 static int bcache_reboot(struct notifier_block *n, unsigned long code, void *x)
 {
+	if (bcache_is_reboot)
+		return NOTIFY_DONE;
+
 	if (code == SYS_DOWN ||
 	    code == SYS_HALT ||
 	    code == SYS_POWER_OFF) {
@@ -2486,19 +2496,45 @@ static int bcache_reboot(struct notifier_block *n, unsigned long code, void *x)
 
 		mutex_lock(&bch_register_lock);
 
+		if (bcache_is_reboot)
+			goto out;
+
+		/* New registration is rejected since now */
+		bcache_is_reboot = true;
+		/*
+		 * Make registering caller (if there is) on other CPU
+		 * core know bcache_is_reboot set to true earlier
+		 */
+		smp_mb();
+
 		if (list_empty(&bch_cache_sets) &&
 		    list_empty(&uncached_devices))
 			goto out;
 
+		mutex_unlock(&bch_register_lock);
+
 		pr_info("Stopping all devices:");
 
+		/*
+		 * The reason bch_register_lock is not held to call
+		 * bch_cache_set_stop() and bcache_device_stop() is to
+		 * avoid potential deadlock during reboot, because cache
+		 * set or bcache device stopping process will acquire
+		 * bch_register_lock too.
+		 *
+		 * We are safe here because bcache_is_reboot is already
+		 * set to true, so register_bcache() will reject new
+		 * registrations now. bcache_is_reboot also makes sure
+		 * bcache_reboot() won't be re-entered by another thread,
+		 * so there is no race in the following list iteration
+		 * by list_for_each_entry_safe().
+		 */
 		list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
 			bch_cache_set_stop(c);
 
 		list_for_each_entry_safe(dc, tdc, &uncached_devices, list)
 			bcache_device_stop(&dc->disk);
 
-		mutex_unlock(&bch_register_lock);
 
 		/*
 		 * Give an early chance for other kthreads and
@@ -2626,6 +2662,8 @@ static int __init bcache_init(void)
 	bch_debug_init();
 	closure_debug_init();
 
+	bcache_is_reboot = false;
+
 	return 0;
 err:
 	bcache_exit();
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 82a2fe8ac86b..4ab15442cab5 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -16,6 +16,8 @@
 #include <linux/sort.h>
 #include <linux/sched/clock.h>
 
+extern bool bcache_is_reboot;
+
 /* Default is 0 ("writethrough") */
 static const char * const bch_cache_modes[] = {
 	"writethrough",
@@ -267,6 +269,10 @@ STORE(__cached_dev)
 	struct cache_set *c;
 	struct kobj_uevent_env *env;
 
+	/* no user space access if system is rebooting */
+	if (bcache_is_reboot)
+		return -EBUSY;
+
 #define d_strtoul(var)		sysfs_strtoul(var, dc->var)
 #define d_strtoul_nonzero(var)	sysfs_strtoul_clamp(var, dc->var, 1, INT_MAX)
 #define d_strtoi_h(var)		sysfs_hatoi(var, dc->var)
@@ -407,6 +413,10 @@ STORE(bch_cached_dev)
 	struct cached_dev *dc = container_of(kobj, struct cached_dev,
 					     disk.kobj);
 
+	/* no user space access if system is rebooting */
+	if (bcache_is_reboot)
+		return -EBUSY;
+
 	mutex_lock(&bch_register_lock);
 	size = __cached_dev_store(kobj, attr, buf, size);
 
@@ -505,6 +515,10 @@ STORE(__bch_flash_dev)
 					       kobj);
 	struct uuid_entry *u = &d->c->uuids[d->id];
 
+	/* no user space access if system is rebooting */
+	if (bcache_is_reboot)
+		return -EBUSY;
+
 	sysfs_strtoul(data_csum,	d->data_csum);
 
 	if (attr == &sysfs_size) {
@@ -740,6 +754,10 @@ STORE(__bch_cache_set)
 	struct cache_set *c = container_of(kobj, struct cache_set, kobj);
 	ssize_t v;
 
+	/* no user space access if system is rebooting */
+	if (bcache_is_reboot)
+		return -EBUSY;
+
 	if (attr == &sysfs_unregister)
 		bch_cache_set_unregister(c);
 
@@ -859,6 +877,10 @@ STORE(bch_cache_set_internal)
 {
 	struct cache_set *c = container_of(kobj, struct cache_set, internal);
 
+	/* no user space access if system is rebooting */
+	if (bcache_is_reboot)
+		return -EBUSY;
+
 	return bch_cache_set_store(&c->kobj, attr, buf, size);
 }
 
@@ -1044,6 +1066,10 @@ STORE(__bch_cache)
 	struct cache *ca = container_of(kobj, struct cache, kobj);
 	ssize_t v;
 
+	/* no user space access if system is rebooting */
+	if (bcache_is_reboot)
+		return -EBUSY;
+
 	if (attr == &sysfs_discard) {
 		bool v = strtoul_or_return(buf);
 
-- 
2.16.4



* [PATCH 2/4] bcache: acquire bch_register_lock later in cached_dev_detach_finish()
  2019-06-12 16:19 [PATCH 0/4] Fixes for potential deadlock during reboot Coly Li
  2019-06-12 16:19 ` [PATCH 1/4] bcache: avoid a deadlock in bcache_reboot() Coly Li
@ 2019-06-12 16:19 ` Coly Li
  2019-06-12 16:19 ` [PATCH 3/4] bcache: acquire bch_register_lock later in cached_dev_free() Coly Li
  2019-06-12 16:19 ` [PATCH 4/4] bcache: fix potential deadlock in cached_dev_free() Coly Li
  3 siblings, 0 replies; 5+ messages in thread
From: Coly Li @ 2019-06-12 16:19 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

Now that the variable bcache_is_reboot prevents device registration
and unregistration during reboot, it is unnecessary to hold the mutex
bch_register_lock before stopping the writeback_rate_update kworker
and the writeback kthread. And if the kworker or kthread being stopped
acquires bch_register_lock inside its own routine (we used to have
such a problem in the writeback thread; thanks to Junhui Wang for
fixing it), it is very easy to introduce a deadlock during the
reboot/shutdown procedure.

Therefore in this patch, the acquisition of bch_register_lock is moved
to just before the call to calc_cached_dev_sectors(), which is later
than its original location in cached_dev_detach_finish().
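
As a runnable userspace sketch of that ordering (plain pthreads,
hypothetical names, not the bcache code): stop the helper first with
no lock held, then take the lock only around the shared-state update.

	#include <pthread.h>

	static pthread_mutex_t register_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_t writeback_thread;

	/* The helper's exit path takes register_lock itself, so the
	 * teardown path must not hold the lock while waiting for it. */
	static void *writeback_fn(void *arg)
	{
		pthread_mutex_lock(&register_lock);
		/* ... final writeback bookkeeping ... */
		pthread_mutex_unlock(&register_lock);
		return arg;
	}

	static void detach_finish(void)
	{
		pthread_join(writeback_thread, NULL); /* lock not held */
		pthread_mutex_lock(&register_lock);   /* short critical section */
		/* ... calc_cached_dev_sectors()-style updates ... */
		pthread_mutex_unlock(&register_lock);
	}

	int main(void)
	{
		pthread_create(&writeback_thread, NULL, writeback_fn, NULL);
		detach_finish();
		return 0;
	}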

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/super.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 2928adfea743..15916b3ec8bf 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1018,7 +1018,6 @@ static void cached_dev_detach_finish(struct work_struct *w)
 	BUG_ON(!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags));
 	BUG_ON(refcount_read(&dc->count));
 
-	mutex_lock(&bch_register_lock);
 
 	if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
 		cancel_writeback_rate_update_dwork(dc);
@@ -1034,6 +1033,8 @@ static void cached_dev_detach_finish(struct work_struct *w)
 	bch_write_bdev_super(dc, &cl);
 	closure_sync(&cl);
 
+	mutex_lock(&bch_register_lock);
+
 	calc_cached_dev_sectors(dc->disk.c);
 	bcache_device_detach(&dc->disk);
 	list_move(&dc->list, &uncached_devices);
-- 
2.16.4



* [PATCH 3/4] bcache: acquire bch_register_lock later in cached_dev_free()
  2019-06-12 16:19 [PATCH 0/4] Fixes for potential deadlock during reboot Coly Li
  2019-06-12 16:19 ` [PATCH 1/4] bcache: avoid a deadlock in bcache_reboot() Coly Li
  2019-06-12 16:19 ` [PATCH 2/4] bcache: acquire bch_register_lock later in cached_dev_detach_finish() Coly Li
@ 2019-06-12 16:19 ` Coly Li
  2019-06-12 16:19 ` [PATCH 4/4] bcache: fix potential deadlock in cached_dev_free() Coly Li
  3 siblings, 0 replies; 5+ messages in thread
From: Coly Li @ 2019-06-12 16:19 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

With the lockdep engine enabled, a lockdep warning can be observed
when rebooting or shutting down the system:

[ 3142.764557][    T1] bcache: bcache_reboot() Stopping all devices:
[ 3142.776265][ T2649]
[ 3142.777159][ T2649] ======================================================
[ 3142.780039][ T2649] WARNING: possible circular locking dependency detected
[ 3142.782869][ T2649] 5.2.0-rc4-lp151.20-default+ #1 Tainted: G        W
[ 3142.785684][ T2649] ------------------------------------------------------
[ 3142.788479][ T2649] kworker/3:67/2649 is trying to acquire lock:
[ 3142.790738][ T2649] 00000000aaf02291 ((wq_completion)bcache_writeback_wq){+.+.}, at: flush_workqueue+0x87/0x4c0
[ 3142.794678][ T2649]
[ 3142.794678][ T2649] but task is already holding lock:
[ 3142.797402][ T2649] 000000004fcf89c5 (&bch_register_lock){+.+.}, at: cached_dev_free+0x17/0x120 [bcache]
[ 3142.801462][ T2649]
[ 3142.801462][ T2649] which lock already depends on the new lock.
[ 3142.801462][ T2649]
[ 3142.805277][ T2649]
[ 3142.805277][ T2649] the existing dependency chain (in reverse order) is:
[ 3142.808902][ T2649]
[ 3142.808902][ T2649] -> #2 (&bch_register_lock){+.+.}:
[ 3142.812396][ T2649]        __mutex_lock+0x7a/0x9d0
[ 3142.814184][ T2649]        cached_dev_free+0x17/0x120 [bcache]
[ 3142.816415][ T2649]        process_one_work+0x2a4/0x640
[ 3142.818413][ T2649]        worker_thread+0x39/0x3f0
[ 3142.820276][ T2649]        kthread+0x125/0x140
[ 3142.822061][ T2649]        ret_from_fork+0x3a/0x50
[ 3142.823965][ T2649]
[ 3142.823965][ T2649] -> #1 ((work_completion)(&cl->work)#2){+.+.}:
[ 3142.827244][ T2649]        process_one_work+0x277/0x640
[ 3142.829160][ T2649]        worker_thread+0x39/0x3f0
[ 3142.830958][ T2649]        kthread+0x125/0x140
[ 3142.832674][ T2649]        ret_from_fork+0x3a/0x50
[ 3142.834915][ T2649]
[ 3142.834915][ T2649] -> #0 ((wq_completion)bcache_writeback_wq){+.+.}:
[ 3142.838121][ T2649]        lock_acquire+0xb4/0x1c0
[ 3142.840025][ T2649]        flush_workqueue+0xae/0x4c0
[ 3142.842035][ T2649]        drain_workqueue+0xa9/0x180
[ 3142.844042][ T2649]        destroy_workqueue+0x17/0x250
[ 3142.846142][ T2649]        cached_dev_free+0x52/0x120 [bcache]
[ 3142.848530][ T2649]        process_one_work+0x2a4/0x640
[ 3142.850663][ T2649]        worker_thread+0x39/0x3f0
[ 3142.852464][ T2649]        kthread+0x125/0x140
[ 3142.854106][ T2649]        ret_from_fork+0x3a/0x50
[ 3142.855880][ T2649]
[ 3142.855880][ T2649] other info that might help us debug this:
[ 3142.855880][ T2649]
[ 3142.859663][ T2649] Chain exists of:
[ 3142.859663][ T2649]   (wq_completion)bcache_writeback_wq --> (work_completion)(&cl->work)#2 --> &bch_register_lock
[ 3142.859663][ T2649]
[ 3142.865424][ T2649]  Possible unsafe locking scenario:
[ 3142.865424][ T2649]
[ 3142.868022][ T2649]        CPU0                    CPU1
[ 3142.869885][ T2649]        ----                    ----
[ 3142.871751][ T2649]   lock(&bch_register_lock);
[ 3142.873379][ T2649]                                lock((work_completion)(&cl->work)#2);
[ 3142.876399][ T2649]                                lock(&bch_register_lock);
[ 3142.879727][ T2649]   lock((wq_completion)bcache_writeback_wq);
[ 3142.882064][ T2649]
[ 3142.882064][ T2649]  *** DEADLOCK ***
[ 3142.882064][ T2649]
[ 3142.885060][ T2649] 3 locks held by kworker/3:67/2649:
[ 3142.887245][ T2649]  #0: 00000000e774cdd0 ((wq_completion)events){+.+.}, at: process_one_work+0x21e/0x640
[ 3142.890815][ T2649]  #1: 00000000f7df89da ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
[ 3142.894884][ T2649]  #2: 000000004fcf89c5 (&bch_register_lock){+.+.}, at: cached_dev_free+0x17/0x120 [bcache]
[ 3142.898797][ T2649]
[ 3142.898797][ T2649] stack backtrace:
[ 3142.900961][ T2649] CPU: 3 PID: 2649 Comm: kworker/3:67 Tainted: G        W         5.2.0-rc4-lp151.20-default+ #1
[ 3142.904789][ T2649] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 3142.909168][ T2649] Workqueue: events cached_dev_free [bcache]
[ 3142.911422][ T2649] Call Trace:
[ 3142.912656][ T2649]  dump_stack+0x85/0xcb
[ 3142.914181][ T2649]  print_circular_bug+0x19a/0x1f0
[ 3142.916193][ T2649]  __lock_acquire+0x16cd/0x1850
[ 3142.917936][ T2649]  ? __lock_acquire+0x6a8/0x1850
[ 3142.919704][ T2649]  ? lock_acquire+0xb4/0x1c0
[ 3142.921335][ T2649]  ? find_held_lock+0x34/0xa0
[ 3142.923052][ T2649]  lock_acquire+0xb4/0x1c0
[ 3142.924635][ T2649]  ? flush_workqueue+0x87/0x4c0
[ 3142.926375][ T2649]  flush_workqueue+0xae/0x4c0
[ 3142.928047][ T2649]  ? flush_workqueue+0x87/0x4c0
[ 3142.929824][ T2649]  ? drain_workqueue+0xa9/0x180
[ 3142.931686][ T2649]  drain_workqueue+0xa9/0x180
[ 3142.933534][ T2649]  destroy_workqueue+0x17/0x250
[ 3142.935787][ T2649]  cached_dev_free+0x52/0x120 [bcache]
[ 3142.937795][ T2649]  process_one_work+0x2a4/0x640
[ 3142.939803][ T2649]  worker_thread+0x39/0x3f0
[ 3142.941487][ T2649]  ? process_one_work+0x640/0x640
[ 3142.943389][ T2649]  kthread+0x125/0x140
[ 3142.944894][ T2649]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 3142.947744][ T2649]  ret_from_fork+0x3a/0x50
[ 3142.970358][ T2649] bcache: bcache_device_free() bcache0 stopped

Here is how the deadlock happens.
1) bcache_reboot() calls bcache_device_stop(), then inside
   bcache_device_stop() BCACHE_DEV_CLOSING bit is set on d->flags.
   Then closure_queue(&d->cl) is called to invoke cached_dev_flush().
2) In cached_dev_flush(), cached_dev_free() is called by continue_at().
3) In cached_dev_free(), when stopping the writeback kthread of the
   cached device by kthread_stop(), dc->writeback_thread is woken up
   to quit the kthread's while-loop, then cached_dev_put() is called
   in bch_writeback_thread().
4) Calling cached_dev_put() in writeback kthread may drop dc->count to
   0, then dc->detach kworker is scheduled, which is initialized as
   cached_dev_detach_finish().
5) Inside cached_dev_detach_finish(), the last line of code calls
   closure_put(&dc->disk.cl), which drops the last reference counter
   of the closure dc->disk.cl, so the callback cached_dev_flush()
   gets called again.
Now cached_dev_flush() is called for the second time on this code
path (the first time was in step 2), bch_register_lock is acquired
again, and an A-A lock (in lockdep terminology) happens.
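
Compressed into a runnable userspace sketch, steps 2) to 5) amount to
a refcounted callback that re-enters itself while the lock from its
first invocation is still held (hypothetical names, default
non-recursive mutex, not the bcache code):

	#include <pthread.h>

	static pthread_mutex_t register_lock = PTHREAD_MUTEX_INITIALIZER;
	static int disk_refs = 1;

	static void flush_cb(void);     /* stands in for cached_dev_flush() */

	static void put_disk_ref(void)  /* cf. closure_put(&dc->disk.cl) */
	{
		if (--disk_refs == 0)
			flush_cb();     /* last put re-invokes the callback */
	}

	static void flush_cb(void)
	{
		pthread_mutex_lock(&register_lock); /* 2nd entry: A-A, hangs */
		put_disk_ref();  /* teardown drops the final reference */
		pthread_mutex_unlock(&register_lock);
	}

	int main(void)
	{
		flush_cb();      /* 1st invocation, as in steps 1) and 2) */
		return 0;
	}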

The root cause of the above A-A lock is that in cached_dev_free() the
mutex bch_register_lock is held before stopping the writeback kthread
and other kworkers. Fortunately we now have the variable
'bcache_is_reboot', which prevents device registration and
unregistration during reboot/shutdown, so it is unnecessary to hold
bch_register_lock that early anymore.

This is how this patch fixes the reboot/shutdown time A-A lock issue:
mutex_lock(&bch_register_lock) is moved to a later location, just
before atomic_read(&dc->running) in cached_dev_free(), which solves
the A-A lock problem without any reboot-time registration race.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/super.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 15916b3ec8bf..f376ba7e4d3f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1223,8 +1223,6 @@ static void cached_dev_free(struct closure *cl)
 {
 	struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl);
 
-	mutex_lock(&bch_register_lock);
-
 	if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
 		cancel_writeback_rate_update_dwork(dc);
 
@@ -1235,6 +1233,8 @@ static void cached_dev_free(struct closure *cl)
 	if (!IS_ERR_OR_NULL(dc->status_update_thread))
 		kthread_stop(dc->status_update_thread);
 
+	mutex_lock(&bch_register_lock);
+
 	if (atomic_read(&dc->running))
 		bd_unlink_disk_holder(dc->bdev, dc->disk.disk);
 	bcache_device_free(&dc->disk);
-- 
2.16.4



* [PATCH 4/4] bcache: fix potential deadlock in cached_dev_free()
  2019-06-12 16:19 [PATCH 0/4] Fixes for potential deadlock during reboot Coly Li
                   ` (2 preceding siblings ...)
  2019-06-12 16:19 ` [PATCH 3/4] bcache: acquire bch_register_lock later in cached_dev_free() Coly Li
@ 2019-06-12 16:19 ` Coly Li
  3 siblings, 0 replies; 5+ messages in thread
From: Coly Li @ 2019-06-12 16:19 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

With lockdep enabled, when rebooting a system with a writeback mode
bcache device, the following potential deadlock warning is reported
by the lockdep engine.

[  101.536569][  T401] kworker/2:2/401 is trying to acquire lock:
[  101.538575][  T401] 00000000bbf6e6c7 ((wq_completion)bcache_writeback_wq){+.+.}, at: flush_workqueue+0x87/0x4c0
[  101.542054][  T401]
[  101.542054][  T401] but task is already holding lock:
[  101.544587][  T401] 00000000f5f305b3 ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
[  101.548386][  T401]
[  101.548386][  T401] which lock already depends on the new lock.
[  101.548386][  T401]
[  101.551874][  T401]
[  101.551874][  T401] the existing dependency chain (in reverse order) is:
[  101.555000][  T401]
[  101.555000][  T401] -> #1 ((work_completion)(&cl->work)#2){+.+.}:
[  101.557860][  T401]        process_one_work+0x277/0x640
[  101.559661][  T401]        worker_thread+0x39/0x3f0
[  101.561340][  T401]        kthread+0x125/0x140
[  101.562963][  T401]        ret_from_fork+0x3a/0x50
[  101.564718][  T401]
[  101.564718][  T401] -> #0 ((wq_completion)bcache_writeback_wq){+.+.}:
[  101.567701][  T401]        lock_acquire+0xb4/0x1c0
[  101.569651][  T401]        flush_workqueue+0xae/0x4c0
[  101.571494][  T401]        drain_workqueue+0xa9/0x180
[  101.573234][  T401]        destroy_workqueue+0x17/0x250
[  101.575109][  T401]        cached_dev_free+0x44/0x120 [bcache]
[  101.577304][  T401]        process_one_work+0x2a4/0x640
[  101.579357][  T401]        worker_thread+0x39/0x3f0
[  101.581055][  T401]        kthread+0x125/0x140
[  101.582709][  T401]        ret_from_fork+0x3a/0x50
[  101.584592][  T401]
[  101.584592][  T401] other info that might help us debug this:
[  101.584592][  T401]
[  101.588355][  T401]  Possible unsafe locking scenario:
[  101.588355][  T401]
[  101.590974][  T401]        CPU0                    CPU1
[  101.592889][  T401]        ----                    ----
[  101.594743][  T401]   lock((work_completion)(&cl->work)#2);
[  101.596785][  T401]                                lock((wq_completion)bcache_writeback_wq);
[  101.600072][  T401]                                lock((work_completion)(&cl->work)#2);
[  101.602971][  T401]   lock((wq_completion)bcache_writeback_wq);
[  101.605255][  T401]
[  101.605255][  T401]  *** DEADLOCK ***
[  101.605255][  T401]
[  101.608310][  T401] 2 locks held by kworker/2:2/401:
[  101.610208][  T401]  #0: 00000000cf2c7d17 ((wq_completion)events){+.+.}, at: process_one_work+0x21e/0x640
[  101.613709][  T401]  #1: 00000000f5f305b3 ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
[  101.617480][  T401]
[  101.617480][  T401] stack backtrace:
[  101.619539][  T401] CPU: 2 PID: 401 Comm: kworker/2:2 Tainted: G        W         5.2.0-rc4-lp151.20-default+ #1
[  101.623225][  T401] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[  101.627210][  T401] Workqueue: events cached_dev_free [bcache]
[  101.629239][  T401] Call Trace:
[  101.630360][  T401]  dump_stack+0x85/0xcb
[  101.631777][  T401]  print_circular_bug+0x19a/0x1f0
[  101.633485][  T401]  __lock_acquire+0x16cd/0x1850
[  101.635184][  T401]  ? __lock_acquire+0x6a8/0x1850
[  101.636863][  T401]  ? lock_acquire+0xb4/0x1c0
[  101.638421][  T401]  ? find_held_lock+0x34/0xa0
[  101.640015][  T401]  lock_acquire+0xb4/0x1c0
[  101.641513][  T401]  ? flush_workqueue+0x87/0x4c0
[  101.643248][  T401]  flush_workqueue+0xae/0x4c0
[  101.644832][  T401]  ? flush_workqueue+0x87/0x4c0
[  101.646476][  T401]  ? drain_workqueue+0xa9/0x180
[  101.648303][  T401]  drain_workqueue+0xa9/0x180
[  101.649867][  T401]  destroy_workqueue+0x17/0x250
[  101.651503][  T401]  cached_dev_free+0x44/0x120 [bcache]
[  101.653328][  T401]  process_one_work+0x2a4/0x640
[  101.655029][  T401]  worker_thread+0x39/0x3f0
[  101.656693][  T401]  ? process_one_work+0x640/0x640
[  101.658501][  T401]  kthread+0x125/0x140
[  101.660012][  T401]  ? kthread_create_worker_on_cpu+0x70/0x70
[  101.661985][  T401]  ret_from_fork+0x3a/0x50
[  101.691318][  T401] bcache: bcache_device_free() bcache0 stopped

Here is how the above potential deadlock may happen in the
reboot/shutdown code path:
1) bcache_reboot() is called firstly in the reboot/shutdown code path,
   then in bcache_reboot(), bcache_device_stop() is called.
2) bcache_device_stop() sets BCACHE_DEV_CLOSING on d->flags, then
   calls closure_queue(&d->cl) to invoke cached_dev_flush(). In turn,
   cached_dev_flush() calls cached_dev_free() via continue_at().
3) In cached_dev_free(), after the writeback kthread
   dc->writeback_thread is stopped, the workqueue
   dc->writeback_write_wq is stopped by destroy_workqueue().
4) Inside destroy_workqueue(), drain_workqueue() is called. Inside
   drain_workqueue(), flush_workqueue() is called. Then wq->lockdep_map
   is acquired by lock_map_acquire() in flush_workqueue(). After the
   lock is acquired, the rest of flush_workqueue() just waits for the
   workqueue to complete.
5) Now we look back at the writeback thread routine
   bch_writeback_thread(): in the main while-loop, write_dirty() is
   called via continue_at() in read_dirty_submit(), which is itself
   called via continue_at() in read_dirty(), called from the
   while-loop. Inside write_dirty() it may re-queue itself on the
   workqueue dc->writeback_write_wq via continue_at(). This means
   that when the writeback kthread is stopped in cached_dev_free(),
   there might still be one kworker queued on dc->writeback_write_wq
   to execute write_dirty() again (sketched after this list).
6) Now this kworker is scheduled on dc->writeback_write_wq to be run
   by process_one_work() (which is called by worker_thread()). Before
   the work routine is called, wq->lockdep_map is acquired.
7) But wq->lockdep_map was already acquired in step 4), so an A-A
   lock (lockdep terminology) scenario happens.
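
Step 5) above is the crux: a work item that can requeue itself. A
kernel-style sketch of that shape (illustrative only, not the bcache
code; more_dirty_data() is a hypothetical predicate):

	#include <linux/workqueue.h>

	static struct workqueue_struct *writeback_write_wq;
	static struct work_struct dirty_work;

	static bool more_dirty_data(void)  /* hypothetical: work left? */
	{
		return false;
	}

	/* cf. write_dirty(): the handler may queue itself again, so
	 * when the writeback kthread is stopped, one more instance can
	 * still be pending on writeback_write_wq; destroying the
	 * workqueue at that point is what lockdep complains about. */
	static void dirty_fn(struct work_struct *w)
	{
		if (more_dirty_data())
			queue_work(writeback_write_wq, &dirty_work);
	}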

Indeed on multi-core systems the above deadlock is very rare to
happen, just as the code comment in process_one_work() says:

     * AFAICT there is no possible deadlock scenario between the
     * flush_work() and complete() primitives (except for
     * single-threaded workqueues), so hiding them isn't a problem.

But it is still good to fix such a lockdep warning, even if no one
runs bcache on a single-core system.

The fix is simple. This patch solves the above potential deadlock by:
- Not destroying the workqueue dc->writeback_write_wq in
  cached_dev_free().
- Flushing and destroying dc->writeback_write_wq in the writeback
  kthread routine bch_writeback_thread(), after the thread's main
  while-loop quits and before cached_dev_put() is called.

With this fix, dc->writeback_write_wq is flushed and destroyed before
the writeback kthread stops, so the chance of an A-A lock on
wq->lockdep_map disappears, and such an A-A deadlock won't happen any
more.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/super.c     | 2 --
 drivers/md/bcache/writeback.c | 4 ++++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index f376ba7e4d3f..06b4cc0cecce 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1228,8 +1228,6 @@ static void cached_dev_free(struct closure *cl)
 
 	if (!IS_ERR_OR_NULL(dc->writeback_thread))
 		kthread_stop(dc->writeback_thread);
-	if (dc->writeback_write_wq)
-		destroy_workqueue(dc->writeback_write_wq);
 	if (!IS_ERR_OR_NULL(dc->status_update_thread))
 		kthread_stop(dc->status_update_thread);
 
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 73f0efac2b9f..df0f4e5a051a 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -735,6 +735,10 @@ static int bch_writeback_thread(void *arg)
 		}
 	}
 
+	if (dc->writeback_write_wq) {
+		flush_workqueue(dc->writeback_write_wq);
+		destroy_workqueue(dc->writeback_write_wq);
+	}
 	cached_dev_put(dc);
 	wait_for_kthread_stop();
 
-- 
2.16.4

