From: Coly Li <colyli@suse.de>
To: linux-bcache@vger.kernel.org, axboe@kernel.dk
Cc: linux-block@vger.kernel.org, Coly Li <colyli@suse.de>
Subject: [PATCH 2/6] bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()
Date: Wed,  2 May 2018 22:46:55 +0800
Message-ID: <20180502144659.118628-3-colyli@suse.de>
In-Reply-To: <20180502144659.118628-1-colyli@suse.de>

Commit c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev") tries
to stop the bcache device by calling bcache_device_stop() when too many
I/O errors have happened on the backing device. But if internal I/O is in
progress on the cache device (writeback scan, garbage collection, etc.), a
regular I/O request that triggered the internal I/O may still hold a
refcount of dc->count, and that refcount may only be dropped after the
internal I/O has stopped.

With this patch, bch_cached_dev_error() checks whether the backing device
is attached to a cache set. If it is, CACHE_SET_IO_DISABLE is set in the
flags of that cache set. Internal I/O on the cache device is then rejected
and stopped immediately, and the bcache device can be stopped.
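
As a rough illustration of the check described above, here is a minimal
userspace sketch in C. It is not kernel code: the *_model types, the
non-atomic model_test_and_set_bit() helper, the stop_requested field and
the bit number are all hypothetical stand-ins chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit number; the real value lives in the bcache headers. */
enum { CACHE_SET_IO_DISABLE = 3 };

/* Userspace stand-ins for struct cache_set and struct cached_dev. */
struct cache_set_model {
	unsigned long flags;
};

struct cached_dev_model {
	struct cache_set_model *c;	/* NULL when not attached to a cache set */
	bool stop_requested;		/* stands in for bcache_device_stop() */
};

/* Non-atomic model of test_and_set_bit(): set the bit, return its old value. */
static bool model_test_and_set_bit(int nr, unsigned long *addr)
{
	unsigned long mask = 1UL << nr;
	bool old = (*addr & mask) != 0;

	*addr |= mask;
	return old;
}

/*
 * Model of the new logic: if the device is attached, disable I/O on the
 * cache set, then request the stop. Returns true when the bit was already
 * set, which is the case the patch reports with pr_warn().
 */
static bool model_cached_dev_error(struct cached_dev_model *dc)
{
	bool already_set = false;

	if (dc->c)
		already_set = model_test_and_set_bit(CACHE_SET_IO_DISABLE,
						     &dc->c->flags);
	dc->stop_requested = true;
	return already_set;
}
```

In this sketch, calling model_cached_dev_error() twice on the same
attached device returns false the first time and true the second time,
and a device with c == NULL only has its stop requested.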

For people who are not familiar with this interesting refcount dependence,
let me explain in more detail how the fix works. For example, the
writeback thread scans the cache device for dirty data to write back.
Until it stops, it holds a refcount of dc->count. When the
CACHE_SET_IO_DISABLE bit is set, the internal I/O is stopped, the
while-loop in bch_writeback_thread() quits, and cached_dev_put() is called
to drop dc->count. If this is the last refcount to drop,
cached_dev_detach_finish() is called. That callback in turn calls
closure_put(dc->disk.cl) to drop a refcount of the closure dc->disk.cl. If
this is the last refcount of the closure, cached_dev_flush() is called and
the cached device is freed. So if CACHE_SET_IO_DISABLE is not set, the
bcache device cannot be stopped until all internal cache device I/O has
stopped. For a large cache device, with the writeback thread competing for
locks with the gc thread, that wait can be quite long.
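
The refcount chain above can be modelled with a small userspace sketch in
C. This is a simplification under stated assumptions, not the bcache
implementation: struct dev_model, its fields and the model_*() functions
are hypothetical, plain integers replace the real refcount and closure
types, and the loop counts down a dirty_left counter instead of scanning a
real cache device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state involved in stopping a cached device. */
struct dev_model {
	int dc_count;		/* models dc->count */
	int cl_remaining;	/* models the refcount of closure dc->disk.cl */
	bool io_disable;	/* models CACHE_SET_IO_DISABLE in c->flags */
	bool detach_finished;	/* cached_dev_detach_finish() would have run */
	bool flushed;		/* cached_dev_flush() would have run */
	long dirty_left;	/* writeback work still to be scanned */
};

/* Drop one closure refcount; the last drop triggers the flush step. */
static void model_closure_put(struct dev_model *d)
{
	if (--d->cl_remaining == 0)
		d->flushed = true;
}

/* Drop one dc->count refcount; the last drop runs the detach-finish step. */
static void model_cached_dev_put(struct dev_model *d)
{
	if (--d->dc_count == 0) {
		d->detach_finished = true;
		model_closure_put(d);
	}
}

/*
 * Model of the writeback loop: it keeps its dc->count refcount until the
 * loop quits, either because I/O is disabled or because no work is left.
 * Returns the number of iterations spent before the refcount was dropped.
 */
static long model_writeback_thread(struct dev_model *d)
{
	long iterations = 0;

	while (!d->io_disable && d->dirty_left > 0) {
		d->dirty_left--;
		iterations++;
	}
	model_cached_dev_put(d);
	return iterations;
}
```

With io_disable set, model_writeback_thread() drops its refcount after
zero iterations. With it clear, the drop only happens after all of
dirty_left has been processed, which is the wait this patch avoids.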

Fixes: c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev")
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/super.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 8196b19fada2..a0d5a3ccc7d0 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1369,6 +1369,8 @@ int bch_flash_dev_create(struct cache_set *c, uint64_t size)
 
 bool bch_cached_dev_error(struct cached_dev *dc)
 {
+	struct cache_set *c;
+
 	if (!dc || test_bit(BCACHE_DEV_CLOSING, &dc->disk.flags))
 		return false;
 
@@ -1379,6 +1381,21 @@ bool bch_cached_dev_error(struct cached_dev *dc)
 	pr_err("stop %s: too many IO errors on backing device %s\n",
 		dc->disk.disk->disk_name, dc->backing_dev_name);
 
+	/*
+	 * If the cached device is still attached to a cache set,
+	 * then even if dc->io_disable is true and no more I/O requests
+	 * are accepted, cache device internal I/O (writeback scan or
+	 * garbage collection) may still prevent the bcache device from
+	 * being stopped. So CACHE_SET_IO_DISABLE should be set in
+	 * c->flags too, so that internal I/O to the cache device is
+	 * rejected and stopped immediately.
+	 * If c is NULL, the bcache device is not attached to any cache
+	 * set, so there is no CACHE_SET_IO_DISABLE bit to set.
+	 */
+	c = dc->disk.c;
+	if (c && test_and_set_bit(CACHE_SET_IO_DISABLE, &c->flags))
+		pr_warn("CACHE_SET_IO_DISABLE already set");
+
 	bcache_device_stop(&dc->disk);
 	return true;
 }
-- 
2.16.3
