All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] bcache fixes for device failure handling
@ 2018-04-24 12:14 Coly Li
  2018-04-24 12:14 ` [PATCH 1/6] bcache: store disk name in struct cache and struct cached_dev Coly Li
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Coly Li @ 2018-04-24 12:14 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

Hi folks,

I receive bug reports from partners for the bcache cache device failure
handling patch set (which is just merged into 4.17). Fortunately we are
still in 4.17 merge window, so I ask helps for offering code review
before I ask Jens to pick these fixes for 4.17-rc3.

The patches are well commneted IMHO and passes regression test by myself.
Thanks for your help in advance.

Coly Li

---
Coly Li (6):
  bcache: store disk name in struct cache and struct cached_dev
  bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()
  bcache: count backing device I/O error for writeback I/O
  bcache: add wait_for_kthread_stop() in bch_allocator_thread()
  bcache: set dc->io_disable to true in conditional_stop_bcache_device()
  bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set

 drivers/md/bcache/alloc.c     |  1 +
 drivers/md/bcache/bcache.h    |  4 +++
 drivers/md/bcache/debug.c     |  3 +-
 drivers/md/bcache/io.c        |  8 ++---
 drivers/md/bcache/request.c   |  5 +--
 drivers/md/bcache/super.c     | 73 ++++++++++++++++++++++++++++++-------------
 drivers/md/bcache/writeback.c |  4 ++-
 7 files changed, 64 insertions(+), 34 deletions(-)

-- 
2.16.2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/6] bcache: store disk name in struct cache and struct cached_dev
  2018-04-24 12:14 [PATCH 0/6] bcache fixes for device failure handling Coly Li
@ 2018-04-24 12:14 ` Coly Li
  2018-04-24 12:14 ` [PATCH 2/6] bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error() Coly Li
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Coly Li @ 2018-04-24 12:14 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

Current code uses bdevname() or bio_devname() to reference gendisk
disk name when bcache needs to display the disk names in kernel message.
It was safe before bcache device failure handling patch set merged in,
because when devices are failed, there was deadlock to prevent bcache
printing error messages with gendisk disk name. But after the failure
handling patch set merged, the deadlock is fixed, so it is possible
that the gendisk structure bdev->hd_disk is released when bdevname() is
called to reference bdev->bd_disk->disk_name[]. This is why I receive
bug report of NULL pointers deference panic.

This patch stores gendisk disk name in a buffer inside struct cache and
struct cached_dev, then print out the offline device name won't reference
bdev->hd_disk anymore. And this patch also avoids extra function calls
of bdevname() and bio_devnmae().

Fixes: c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev")
Fixes: 5138ac6748e38 ("bcache: fix misleading error message in bch_count_io_errors()")
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/bcache.h  |  4 ++++
 drivers/md/bcache/debug.c   |  3 +--
 drivers/md/bcache/io.c      |  8 +++-----
 drivers/md/bcache/request.c |  5 +----
 drivers/md/bcache/super.c   | 42 ++++++++++++++++++++----------------------
 5 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index d338b7086013..3a0cfb237af9 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -392,6 +392,8 @@ struct cached_dev {
 #define DEFAULT_CACHED_DEV_ERROR_LIMIT	64
 	atomic_t		io_errors;
 	unsigned		error_limit;
+
+	char			backing_dev_name[BDEVNAME_SIZE];
 };
 
 enum alloc_reserve {
@@ -464,6 +466,8 @@ struct cache {
 	atomic_long_t		meta_sectors_written;
 	atomic_long_t		btree_sectors_written;
 	atomic_long_t		sectors_written;
+
+	char			cache_dev_name[BDEVNAME_SIZE];
 };
 
 struct gc_stat {
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 028f7b386e01..4e63c6f6c04d 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -106,7 +106,6 @@ void bch_btree_verify(struct btree *b)
 
 void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 {
-	char name[BDEVNAME_SIZE];
 	struct bio *check;
 	struct bio_vec bv, cbv;
 	struct bvec_iter iter, citer = { 0 };
@@ -134,7 +133,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 					bv.bv_len),
 				 dc->disk.c,
 				 "verify failed at dev %s sector %llu",
-				 bdevname(dc->bdev, name),
+				 dc->backing_dev_name,
 				 (uint64_t) bio->bi_iter.bi_sector);
 
 		kunmap_atomic(p1);
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 7fac97ae036e..2ddf8515e6a5 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -52,7 +52,6 @@ void bch_submit_bbio(struct bio *bio, struct cache_set *c,
 /* IO errors */
 void bch_count_backing_io_errors(struct cached_dev *dc, struct bio *bio)
 {
-	char buf[BDEVNAME_SIZE];
 	unsigned errors;
 
 	WARN_ONCE(!dc, "NULL pointer of struct cached_dev");
@@ -60,7 +59,7 @@ void bch_count_backing_io_errors(struct cached_dev *dc, struct bio *bio)
 	errors = atomic_add_return(1, &dc->io_errors);
 	if (errors < dc->error_limit)
 		pr_err("%s: IO error on backing device, unrecoverable",
-			bio_devname(bio, buf));
+			dc->backing_dev_name);
 	else
 		bch_cached_dev_error(dc);
 }
@@ -105,19 +104,18 @@ void bch_count_io_errors(struct cache *ca,
 	}
 
 	if (error) {
-		char buf[BDEVNAME_SIZE];
 		unsigned errors = atomic_add_return(1 << IO_ERROR_SHIFT,
 						    &ca->io_errors);
 		errors >>= IO_ERROR_SHIFT;
 
 		if (errors < ca->set->error_limit)
 			pr_err("%s: IO error on %s%s",
-			       bdevname(ca->bdev, buf), m,
+			       ca->cache_dev_name, m,
 			       is_read ? ", recovering." : ".");
 		else
 			bch_cache_set_error(ca->set,
 					    "%s: too many IO errors %s",
-					    bdevname(ca->bdev, buf), m);
+					    ca->cache_dev_name, m);
 	}
 }
 
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index a65e3365eeb9..8e3e8655ed63 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -649,11 +649,8 @@ static void backing_request_endio(struct bio *bio)
 		 */
 		if (unlikely(s->iop.writeback &&
 			     bio->bi_opf & REQ_PREFLUSH)) {
-			char buf[BDEVNAME_SIZE];
-
-			bio_devname(bio, buf);
 			pr_err("Can't flush %s: returned bi_status %i",
-				buf, bio->bi_status);
+				dc->backing_dev_name, bio->bi_status);
 		} else {
 			/* set to orig_bio->bi_status in bio_complete() */
 			s->iop.status = bio->bi_status;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index d90d9e59ca00..06433fab8ece 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -936,7 +936,6 @@ static void cancel_writeback_rate_update_dwork(struct cached_dev *dc)
 static void cached_dev_detach_finish(struct work_struct *w)
 {
 	struct cached_dev *dc = container_of(w, struct cached_dev, detach);
-	char buf[BDEVNAME_SIZE];
 	struct closure cl;
 	closure_init_stack(&cl);
 
@@ -967,7 +966,7 @@ static void cached_dev_detach_finish(struct work_struct *w)
 
 	mutex_unlock(&bch_register_lock);
 
-	pr_info("Caching disabled for %s", bdevname(dc->bdev, buf));
+	pr_info("Caching disabled for %s", dc->backing_dev_name);
 
 	/* Drop ref we took in cached_dev_detach() */
 	closure_put(&dc->disk.cl);
@@ -999,29 +998,28 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
 {
 	uint32_t rtime = cpu_to_le32(get_seconds());
 	struct uuid_entry *u;
-	char buf[BDEVNAME_SIZE];
 	struct cached_dev *exist_dc, *t;
 
-	bdevname(dc->bdev, buf);
-
 	if ((set_uuid && memcmp(set_uuid, c->sb.set_uuid, 16)) ||
 	    (!set_uuid && memcmp(dc->sb.set_uuid, c->sb.set_uuid, 16)))
 		return -ENOENT;
 
 	if (dc->disk.c) {
-		pr_err("Can't attach %s: already attached", buf);
+		pr_err("Can't attach %s: already attached",
+		       dc->backing_dev_name);
 		return -EINVAL;
 	}
 
 	if (test_bit(CACHE_SET_STOPPING, &c->flags)) {
-		pr_err("Can't attach %s: shutting down", buf);
+		pr_err("Can't attach %s: shutting down",
+		       dc->backing_dev_name);
 		return -EINVAL;
 	}
 
 	if (dc->sb.block_size < c->sb.block_size) {
 		/* Will die */
 		pr_err("Couldn't attach %s: block size less than set's block size",
-		       buf);
+		       dc->backing_dev_name);
 		return -EINVAL;
 	}
 
@@ -1029,7 +1027,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
 	list_for_each_entry_safe(exist_dc, t, &c->cached_devs, list) {
 		if (!memcmp(dc->sb.uuid, exist_dc->sb.uuid, 16)) {
 			pr_err("Tried to attach %s but duplicate UUID already attached",
-				buf);
+				dc->backing_dev_name);
 
 			return -EINVAL;
 		}
@@ -1047,13 +1045,15 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
 
 	if (!u) {
 		if (BDEV_STATE(&dc->sb) == BDEV_STATE_DIRTY) {
-			pr_err("Couldn't find uuid for %s in set", buf);
+			pr_err("Couldn't find uuid for %s in set",
+			       dc->backing_dev_name);
 			return -ENOENT;
 		}
 
 		u = uuid_find_empty(c);
 		if (!u) {
-			pr_err("Not caching %s, no room for UUID", buf);
+			pr_err("Not caching %s, no room for UUID",
+			       dc->backing_dev_name);
 			return -EINVAL;
 		}
 	}
@@ -1112,7 +1112,8 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
 	up_write(&dc->writeback_lock);
 
 	pr_info("Caching %s as %s on set %pU",
-		bdevname(dc->bdev, buf), dc->disk.disk->disk_name,
+		dc->backing_dev_name,
+		dc->disk.disk->disk_name,
 		dc->disk.c->sb.set_uuid);
 	return 0;
 }
@@ -1225,7 +1226,6 @@ static void register_bdev(struct cache_sb *sb, struct page *sb_page,
 				 struct block_device *bdev,
 				 struct cached_dev *dc)
 {
-	char name[BDEVNAME_SIZE];
 	const char *err = "cannot allocate memory";
 	struct cache_set *c;
 
@@ -1247,7 +1247,8 @@ static void register_bdev(struct cache_sb *sb, struct page *sb_page,
 	if (bch_cache_accounting_add_kobjs(&dc->accounting, &dc->disk.kobj))
 		goto err;
 
-	pr_info("registered backing device %s", bdevname(bdev, name));
+	bdevname(bdev, dc->backing_dev_name);
+	pr_info("registered backing device %s", dc->backing_dev_name);
 
 	list_add(&dc->list, &uncached_devices);
 	list_for_each_entry(c, &bch_cache_sets, list)
@@ -1259,7 +1260,7 @@ static void register_bdev(struct cache_sb *sb, struct page *sb_page,
 
 	return;
 err:
-	pr_notice("error %s: %s", bdevname(bdev, name), err);
+	pr_notice("error %s: %s", dc->backing_dev_name, err);
 	bcache_device_stop(&dc->disk);
 }
 
@@ -1367,8 +1368,6 @@ int bch_flash_dev_create(struct cache_set *c, uint64_t size)
 
 bool bch_cached_dev_error(struct cached_dev *dc)
 {
-	char name[BDEVNAME_SIZE];
-
 	if (!dc || test_bit(BCACHE_DEV_CLOSING, &dc->disk.flags))
 		return false;
 
@@ -1377,7 +1376,7 @@ bool bch_cached_dev_error(struct cached_dev *dc)
 	smp_mb();
 
 	pr_err("stop %s: too many IO errors on backing device %s\n",
-		dc->disk.disk->disk_name, bdevname(dc->bdev, name));
+		dc->disk.disk->disk_name, dc->backing_dev_name);
 
 	bcache_device_stop(&dc->disk);
 	return true;
@@ -2003,11 +2002,10 @@ static int cache_alloc(struct cache *ca)
 static int register_cache(struct cache_sb *sb, struct page *sb_page,
 				struct block_device *bdev, struct cache *ca)
 {
-	char name[BDEVNAME_SIZE];
 	const char *err = NULL; /* must be set for any error case */
 	int ret = 0;
 
-	bdevname(bdev, name);
+	bdevname(bdev, ca->cache_dev_name);
 
 	memcpy(&ca->sb, sb, sizeof(struct cache_sb));
 	ca->bdev = bdev;
@@ -2045,14 +2043,14 @@ static int register_cache(struct cache_sb *sb, struct page *sb_page,
 		goto out;
 	}
 
-	pr_info("registered cache device %s", name);
+	pr_info("registered cache device %s", ca->cache_dev_name);
 
 out:
 	kobject_put(&ca->kobj);
 
 err:
 	if (err)
-		pr_notice("error %s: %s", name, err);
+		pr_notice("error %s: %s", ca->cache_dev_name, err);
 
 	return ret;
 }
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/6] bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()
  2018-04-24 12:14 [PATCH 0/6] bcache fixes for device failure handling Coly Li
  2018-04-24 12:14 ` [PATCH 1/6] bcache: store disk name in struct cache and struct cached_dev Coly Li
@ 2018-04-24 12:14 ` Coly Li
  2018-04-24 12:14 ` [PATCH 3/6] bcache: count backing device I/O error for writeback I/O Coly Li
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Coly Li @ 2018-04-24 12:14 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

Commit c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev") tries
to stop bcache device by calling bcache_device_stop() when too many I/O
errors happened on backing device. But if there is internal I/O happening
on cache device (writeback scan, garbage collection, etc), a regular I/O
request triggers the internal I/Os may still holds a refcount of dc->count,
and the refcount may only be dropped after the internal I/O stopped.

By this patch, bch_cached_dev_error() will check if the backing device is
attached to a cache set, if yes that CACHE_SET_IO_DISABLE will be set to
flags of this cache set. Then internal I/Os on cache device will be
rejected and stopped immediately, and the bcache device can be stopped.

For people who are not familiar with the interesting refcount dependance,
let me explain a bit more how the fix works. Example the writeback thread
will scan cache device for dirty data writeback purpose. Before it stopps,
it holds a refcount of dc->count. When CACHE_SET_IO_DISABLE bit is set,
the internal I/O will stopped and the while-loop in bch_writeback_thread()
quits and calls cached_dev_put() to drop dc->count. If this is the last
refcount to drop, then cached_dev_detach_finish() will be called. In this
call back function, in turn closure_put(dc->disk.cl) is called to drop a
refcount of closure dc->disk.cl. If this is the last refcount of this
closure to drop, then cached_dev_flush() will be called. Then the cached
device is freed. So if CACHE_SET_IO_DISABLE is not set, the bache device
can not be stopped until all inernal cache device I/O stopped. For large
size cache device, and writeback thread competes locks with gc thread,
there might be a quite long time to wait.

Fixes: c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev")
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/super.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 06433fab8ece..85b22a7a2366 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1368,6 +1368,8 @@ int bch_flash_dev_create(struct cache_set *c, uint64_t size)
 
 bool bch_cached_dev_error(struct cached_dev *dc)
 {
+	struct cache_set *c;
+
 	if (!dc || test_bit(BCACHE_DEV_CLOSING, &dc->disk.flags))
 		return false;
 
@@ -1378,6 +1380,21 @@ bool bch_cached_dev_error(struct cached_dev *dc)
 	pr_err("stop %s: too many IO errors on backing device %s\n",
 		dc->disk.disk->disk_name, dc->backing_dev_name);
 
+	/*
+	 * If the cached device is still attached to a cache set,
+	 * even dc->io_disable is true and no more I/O requests
+	 * accepted, cache device internal I/O (writeback scan or
+	 * garbage collection) may still prevent bcache device from
+	 * being stopped. So here CACHE_SET_IO_DISABLE should be
+	 * set to c->flags too, to make the internal I/O to cache
+	 * device rejected and stopped immediately.
+	 * If c is NULL, that means the bcache device is not attached
+	 * to any cache set, then no CACHE_SET_IO_DISABLE bit to set.
+	 */
+	c = dc->disk.c;
+	if (c && test_and_set_bit(CACHE_SET_IO_DISABLE, &c->flags))
+		pr_warn("CACHE_SET_IO_DISABLE already set");
+
 	bcache_device_stop(&dc->disk);
 	return true;
 }
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/6] bcache: count backing device I/O error for writeback I/O
  2018-04-24 12:14 [PATCH 0/6] bcache fixes for device failure handling Coly Li
  2018-04-24 12:14 ` [PATCH 1/6] bcache: store disk name in struct cache and struct cached_dev Coly Li
  2018-04-24 12:14 ` [PATCH 2/6] bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error() Coly Li
@ 2018-04-24 12:14 ` Coly Li
  2018-04-24 12:14 ` [PATCH 4/6] bcache: add wait_for_kthread_stop() in bch_allocator_thread() Coly Li
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Coly Li @ 2018-04-24 12:14 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

Commit c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev")
counts backing device I/O requets and set dc->io_disable to true if error
counters exceeds dc->io_error_limit. But it only counts I/O errors for
regular I/O request, neglects errors of write back I/Os when backing device
is offline.

This patch counts the errors of writeback I/Os, in dirty_endio() if
bio->bi_status is  not 0, it means error happens when writing dirty keys
to backing device, then bch_count_backing_io_errors() is called.

By this fix, even there is no reqular I/O request coming, if writeback I/O
errors exceed dc->io_error_limit, the bcache device may still be stopped
for the broken backing device.

Fixes: c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev")
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/writeback.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 4a9547cdcdc5..ad45ebe1a74b 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -244,8 +244,10 @@ static void dirty_endio(struct bio *bio)
 	struct keybuf_key *w = bio->bi_private;
 	struct dirty_io *io = w->private;
 
-	if (bio->bi_status)
+	if (bio->bi_status) {
 		SET_KEY_DIRTY(&w->key, false);
+		bch_count_backing_io_errors(io->dc, bio);
+	}
 
 	closure_put(&io->cl);
 }
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/6] bcache: add wait_for_kthread_stop() in bch_allocator_thread()
  2018-04-24 12:14 [PATCH 0/6] bcache fixes for device failure handling Coly Li
                   ` (2 preceding siblings ...)
  2018-04-24 12:14 ` [PATCH 3/6] bcache: count backing device I/O error for writeback I/O Coly Li
@ 2018-04-24 12:14 ` Coly Li
  2018-04-24 12:14 ` [PATCH 5/6] bcache: set dc->io_disable to true in conditional_stop_bcache_device() Coly Li
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Coly Li @ 2018-04-24 12:14 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

When CACHE_SET_IO_DISABLE is set on cache set flags, bcache allocator
thread routine bch_allocator_thread() may stop the while-loops and
exit. Then it is possible to observe the following kernel oops message,

[  631.068366] bcache: bch_btree_insert() error -5
[  631.069115] bcache: cached_dev_detach_finish() Caching disabled for sdf
[  631.070220] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  631.070250] PGD 0 P4D 0
[  631.070261] Oops: 0002 [#1] SMP PTI
[snipped]
[  631.070578] Workqueue: events cache_set_flush [bcache]
[  631.070597] RIP: 0010:exit_creds+0x1b/0x50
[  631.070610] RSP: 0018:ffffc9000705fe08 EFLAGS: 00010246
[  631.070626] RAX: 0000000000000001 RBX: ffff880a622ad300 RCX: 000000000000000b
[  631.070645] RDX: 0000000000000601 RSI: 000000000000000c RDI: 0000000000000000
[  631.070663] RBP: ffff880a622ad300 R08: ffffea00190c66e0 R09: 0000000000000200
[  631.070682] R10: ffff880a48123000 R11: ffff880000000000 R12: 0000000000000000
[  631.070700] R13: ffff880a4b160e40 R14: ffff880a4b160000 R15: 0ffff880667e2530
[  631.070719] FS:  0000000000000000(0000) GS:ffff880667e00000(0000) knlGS:0000000000000000
[  631.070740] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  631.070755] CR2: 0000000000000000 CR3: 000000000200a001 CR4: 00000000003606e0
[  631.070774] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  631.070793] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  631.070811] Call Trace:
[  631.070828]  __put_task_struct+0x55/0x160
[  631.070845]  kthread_stop+0xee/0x100
[  631.070863]  cache_set_flush+0x11d/0x1a0 [bcache]
[  631.070879]  process_one_work+0x146/0x340
[  631.070892]  worker_thread+0x47/0x3e0
[  631.070906]  kthread+0xf5/0x130
[  631.070917]  ? max_active_store+0x60/0x60
[  631.070930]  ? kthread_bind+0x10/0x10
[  631.070945]  ret_from_fork+0x35/0x40
[snipped]
[  631.071017] RIP: exit_creds+0x1b/0x50 RSP: ffffc9000705fe08
[  631.071033] CR2: 0000000000000000
[  631.071045] ---[ end trace 011c63a24b22c927 ]---
[  631.071085] bcache: bcache_device_free() bcache0 stopped

The reason is when cache_set_flush() tries to call kthread_stop() to stop
allocator thread, but it exits already due to cache device I/O errors.

This patch adds wait_for_kthread_stop() at tail of bch_allocator_thread(),
to prevent the thread routine exiting directly. Then the allocator thread
can be blocked at wait_for_kthread_stop() and wait for cache_set_flush()
to stop it by calling kthread_stop().

Fixes: 771f393e8ffc ("bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags")
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/alloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
index 004cc3cc6123..dcf51db0971f 100644
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -378,6 +378,7 @@ static int bch_allocator_thread(void *arg)
 			bch_prio_write(ca);
 		}
 	}
+	wait_for_kthread_stop();
 }
 
 /* Allocation */
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 5/6] bcache: set dc->io_disable to true in conditional_stop_bcache_device()
  2018-04-24 12:14 [PATCH 0/6] bcache fixes for device failure handling Coly Li
                   ` (3 preceding siblings ...)
  2018-04-24 12:14 ` [PATCH 4/6] bcache: add wait_for_kthread_stop() in bch_allocator_thread() Coly Li
@ 2018-04-24 12:14 ` Coly Li
  2018-04-24 12:14 ` [PATCH 6/6] bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set Coly Li
  2018-04-30  6:24 ` [PATCH 0/6] bcache fixes for device failure handling Coly Li
  6 siblings, 0 replies; 8+ messages in thread
From: Coly Li @ 2018-04-24 12:14 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

Commit 7e027ca4b534b ("bcache: add stop_when_cache_set_failed option to
backing device") adds stop_when_cache_set_failed option and stops bcache
device if stop_when_cache_set_failed is auto and there is dirty data on
broken cache device. There might exists a small time gap that the cache
set is released and set to NULL but bcache device is not released yet
(because they are released in parallel). During this time gap, dc->c is
NULL so CACHE_SET_IO_DISABLE won't be checked, and dc->io_disable is still
false, so new coming I/O requests will be accepted and directly go into
backing device as no cache set attached to. If there is dirty data on
cache device, this behavior may introduce potential inconsistent data.

This patch sets dc->io_disable to true before calling bcache_device_stop()
to make sure the backing device will reject new coming I/O request as
well, so even in the small time gap no I/O will directly go into backing
device to corrupt data consistency.

Fixes: 7e027ca4b534b ("bcache: add stop_when_cache_set_failed option to backing device")
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/super.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 85b22a7a2366..487eb68b598f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1555,6 +1555,20 @@ static void conditional_stop_bcache_device(struct cache_set *c,
 		 */
 		pr_warn("stop_when_cache_set_failed of %s is \"auto\" and cache is dirty, stop it to avoid potential data corruption.",
 			d->disk->disk_name);
+			/*
+			 * There might be a small time gap that cache set is
+			 * released but bcache device is not. Inside this time
+			 * gap, regular I/O requests will directly go into
+			 * backing device as no cache set attached to. This
+			 * behavior may also introduce potential inconsistence
+			 * data in writeback mode while cache is dirty.
+			 * Therefore before calling bcache_device_stop() due
+			 * to a broken cache device, dc->io_disable should be
+			 * explicitly set to true.
+			 */
+			dc->io_disable = true;
+			/* make others know io_disable is true earlier */
+			smp_mb();
 			bcache_device_stop(d);
 	} else {
 		/*
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 6/6] bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set
  2018-04-24 12:14 [PATCH 0/6] bcache fixes for device failure handling Coly Li
                   ` (4 preceding siblings ...)
  2018-04-24 12:14 ` [PATCH 5/6] bcache: set dc->io_disable to true in conditional_stop_bcache_device() Coly Li
@ 2018-04-24 12:14 ` Coly Li
  2018-04-30  6:24 ` [PATCH 0/6] bcache fixes for device failure handling Coly Li
  6 siblings, 0 replies; 8+ messages in thread
From: Coly Li @ 2018-04-24 12:14 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

It is possible that multiple I/O requests hits on failed cache device or
backing device, therefore it is quite common that CACHE_SET_IO_DISABLE is
set already when a task tries to set the bit from bch_cached_dev_error()
or bch_cache_set_error(). Currently the message "CACHE_SET_IO_DISABLE
already set" is printed by pr_warn(), which might mislead users to think
a serious fault happens in source code.

This patch uses pr_info() to print the information in such situations,
avoid extra worries. This information is helpful to understand bcache
behavior in cache device failures, so I still keep them in source code.

Fixes: a705642fd4ded ("bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()")
Fixes: 771f393e8ffc9 ("bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags")
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/super.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 487eb68b598f..158064e5a861 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1393,7 +1393,7 @@ bool bch_cached_dev_error(struct cached_dev *dc)
 	 */
 	c = dc->disk.c;
 	if (c && test_and_set_bit(CACHE_SET_IO_DISABLE, &c->flags))
-		pr_warn("CACHE_SET_IO_DISABLE already set");
+		pr_info("CACHE_SET_IO_DISABLE already set");
 
 	bcache_device_stop(&dc->disk);
 	return true;
@@ -1411,7 +1411,7 @@ bool bch_cache_set_error(struct cache_set *c, const char *fmt, ...)
 		return false;
 
 	if (test_and_set_bit(CACHE_SET_IO_DISABLE, &c->flags))
-		pr_warn("CACHE_SET_IO_DISABLE already set");
+		pr_info("CACHE_SET_IO_DISABLE already set");
 
 	/* XXX: we can be called from atomic context
 	acquire_console_sem();
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/6] bcache fixes for device failure handling
  2018-04-24 12:14 [PATCH 0/6] bcache fixes for device failure handling Coly Li
                   ` (5 preceding siblings ...)
  2018-04-24 12:14 ` [PATCH 6/6] bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set Coly Li
@ 2018-04-30  6:24 ` Coly Li
  6 siblings, 0 replies; 8+ messages in thread
From: Coly Li @ 2018-04-30  6:24 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block

On 2018/4/24 8:14 PM, Coly Li wrote:
> Hi folks,
> 
> I receive bug reports from partners for the bcache cache device failure
> handling patch set (which is just merged into 4.17). Fortunately we are
> still in 4.17 merge window, so I ask helps for offering code review
> before I ask Jens to pick these fixes for 4.17-rc3.
> 
> The patches are well commneted IMHO and passes regression test by myself.
> Thanks for your help in advance.

Hi folks,

Can anybody help to offer Reivew-by: to these fixes ? Now it is already
4.17-rc3, we need these fixes go into 4.17 kernel with the patches to be
fixed. Even Acked-by: is helpful.

Thanks in advance.

Coly Li

> ---
> Coly Li (6):
>   bcache: store disk name in struct cache and struct cached_dev
>   bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()
>   bcache: count backing device I/O error for writeback I/O
>   bcache: add wait_for_kthread_stop() in bch_allocator_thread()
>   bcache: set dc->io_disable to true in conditional_stop_bcache_device()
>   bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set
> 
>  drivers/md/bcache/alloc.c     |  1 +
>  drivers/md/bcache/bcache.h    |  4 +++
>  drivers/md/bcache/debug.c     |  3 +-
>  drivers/md/bcache/io.c        |  8 ++---
>  drivers/md/bcache/request.c   |  5 +--
>  drivers/md/bcache/super.c     | 73 ++++++++++++++++++++++++++++++-------------
>  drivers/md/bcache/writeback.c |  4 ++-
>  7 files changed, 64 insertions(+), 34 deletions(-)
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-04-30  6:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-24 12:14 [PATCH 0/6] bcache fixes for device failure handling Coly Li
2018-04-24 12:14 ` [PATCH 1/6] bcache: store disk name in struct cache and struct cached_dev Coly Li
2018-04-24 12:14 ` [PATCH 2/6] bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error() Coly Li
2018-04-24 12:14 ` [PATCH 3/6] bcache: count backing device I/O error for writeback I/O Coly Li
2018-04-24 12:14 ` [PATCH 4/6] bcache: add wait_for_kthread_stop() in bch_allocator_thread() Coly Li
2018-04-24 12:14 ` [PATCH 5/6] bcache: set dc->io_disable to true in conditional_stop_bcache_device() Coly Li
2018-04-24 12:14 ` [PATCH 6/6] bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set Coly Li
2018-04-30  6:24 ` [PATCH 0/6] bcache fixes for device failure handling Coly Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.