From: Coly Li <colyli@suse.de>
To: linux-bcache@vger.kernel.org
Cc: linux-block@vger.kernel.org, mlyle@lyle.org,
	tang.junhui@zte.com.cn, Coly Li <colyli@suse.de>
Subject: [PATCH v1 10/10] bcache: stop all attached bcache devices for a retired cache set
Date: Wed,  3 Jan 2018 22:03:25 +0800
Message-ID: <20180103140325.63175-11-colyli@suse.de>
In-Reply-To: <20180103140325.63175-1-colyli@suse.de>

When there are too many I/O errors on the cache device, the current bcache
code retires the whole cache set and detaches all bcache devices. But the
detached bcache devices are not stopped, which is problematic when bcache
is in writeback mode.

If the retired cache set holds dirty data for backing devices, subsequent
writes to the bcache device go directly to the backing device. If the LBA
of such a write request has a dirty version cached on the cache device,
then the next time the cache device is re-registered and the backing
device is re-attached to it, the stale dirty data on the cache device is
written back to the backing device and overwrites the newer, directly
written data. This causes silent data corruption.
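
The sequence above can be illustrated with a toy model. This is only a
sketch for reasoning about the ordering of events, not bcache code: the
struct toy_lba type and all toy_*() helpers are hypothetical names, and a
single integer stands in for the data stored at one LBA.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: one LBA on a backing device plus its cached copy. */
struct toy_lba {
	int backing;	/* data currently on the backing device */
	int cached;	/* data held on the cache device */
	bool dirty;	/* cached copy has not been written back yet */
	bool attached;	/* backing device is attached to the cache set */
};

/* Writeback-mode write: lands only in the cache while attached. */
static void toy_write(struct toy_lba *l, int data)
{
	if (l->attached) {
		l->cached = data;
		l->dirty = true;
	} else {
		/* Detached but not stopped: the write bypasses the cache. */
		l->backing = data;
	}
}

/* Cache set retired: device is detached, dirty data stays on the cache. */
static void toy_retire(struct toy_lba *l)
{
	l->attached = false;
}

/* Re-register and re-attach: pending dirty data is written back. */
static void toy_reattach(struct toy_lba *l)
{
	l->attached = true;
	if (l->dirty) {
		l->backing = l->cached;
		l->dirty = false;
	}
}

/* Runs the sequence and returns what ends up on the backing device. */
static int toy_scenario(void)
{
	struct toy_lba l = { .backing = 0, .cached = 0,
			     .dirty = false, .attached = true };

	toy_write(&l, 1);	/* version 1 is dirty in the cache */
	assert(l.dirty && l.backing == 0);
	toy_retire(&l);		/* too many I/O errors on the cache */
	toy_write(&l, 2);	/* version 2 goes straight to backing */
	toy_reattach(&l);	/* stale version 1 overwrites version 2 */
	return l.backing;
}
```

In this model toy_scenario() leaves version 1 on the backing device even
though version 2 was written later, which is the overwrite described
above. If the device had been stopped at retire time, the second write
would have failed instead of silently bypassing the cache.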

This patch checks whether cache_set->io_disable is true in
__cache_set_unregister(). If cache_set->io_disable is true, the cache set
is being unregistered because of too many I/O errors, so all attached
bcache devices are stopped as well. If cache_set->io_disable is not true,
__cache_set_unregister() was triggered by writing 1 to the sysfs file
/sys/fs/bcache/<UUID>/bcache/stop. This case is an exception: because
users do it explicitly, this patch keeps the existing behavior and does
not stop any bcache device.
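
The resulting per-device decision can be summarized in a small standalone
sketch. It is not the kernel code: enum toy_action and
toy_unregister_action() are hypothetical names, and the detach_branch
parameter stands in for the existing if-condition in
__cache_set_unregister(), which this patch does not change.

```c
#include <stdbool.h>

/* Hypothetical outcome for one device of the cache set. */
enum toy_action {
	TOY_STOP_ONLY,		/* existing else branch: stop the device */
	TOY_DETACH_ONLY,	/* explicit unregister: detach, keep device */
	TOY_DETACH_AND_STOP,	/* retired by I/O errors: detach, then stop */
};

/*
 * detach_branch: assumed stand-in for the unchanged if-condition that
 *                selects bch_cached_dev_detach() for this device.
 * io_disable:    mirrors c->io_disable as described in this patch.
 */
static enum toy_action toy_unregister_action(bool detach_branch,
					     bool io_disable)
{
	if (!detach_branch)
		return TOY_STOP_ONLY;
	if (io_disable)
		return TOY_DETACH_AND_STOP;
	return TOY_DETACH_ONLY;
}
```

Only the combination of the detach branch with io_disable set is new; the
other two outcomes match the behavior before this patch.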

Even if the failed cache device has no dirty data, stopping the bcache
device is still the behavior desired by many Ceph and database users.
Their applications will then report I/O errors because the bcache device
has disappeared, and operations staff will know that the cache device is
broken or disconnected.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/super.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 49d6fedf89c3..20a7a6959506 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1458,6 +1458,14 @@ static void __cache_set_unregister(struct closure *cl)
 				dc = container_of(c->devices[i],
 						  struct cached_dev, disk);
 				bch_cached_dev_detach(dc);
+				/*
+				 * If we come here by too many I/O errors,
+				 * bcache device should be stopped too, to
+				 * keep data consistency on cache and
+				 * backing devices.
+				 */
+				if (c->io_disable)
+					bcache_device_stop(c->devices[i]);
 			} else {
 				bcache_device_stop(c->devices[i]);
 			}
-- 
2.15.1
