All of lore.kernel.org
 help / color / mirror / Atom feed
* Commits for 4.15-- bcache
@ 2017-11-24 23:14 Michael Lyle
  2017-11-24 23:14 ` [PATCH 1/4] bcache: add a comment in journal bucket reading Michael Lyle
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Michael Lyle @ 2017-11-24 23:14 UTC (permalink / raw)
  To: linux-bcache, linux-block; +Cc: axboe

Hi Jens,

We have a few more small things:

Huacai Chen (1):
      bcache: Fix building error on MIPS

Michael Lyle (1):
      bcache: check return value of register_shrinker

Rui Hua (1):
      bcache: recover data from backing when data is clean

Tang Junhui (1):
      bcache: add a comment in journal bucket reading

Most important is Rui Hua's change, which prevents EINTR from being
returned to users unexpectedly (and is cc'd stable).  All changes are
reviewed and have had a moderate degree of test.

Thanks,

Mike

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] bcache: add a comment in journal bucket reading
  2017-11-24 23:14 Commits for 4.15-- bcache Michael Lyle
@ 2017-11-24 23:14 ` Michael Lyle
  2017-11-24 23:14 ` [PATCH 2/4] bcache: Fix building error on MIPS Michael Lyle
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Michael Lyle @ 2017-11-24 23:14 UTC (permalink / raw)
  To: linux-bcache, linux-block; +Cc: axboe, Tang Junhui, Michael Lyle

From: Tang Junhui <tang.junhui@zte.com.cn>

Journal bucket is a circular buffer, the bucket
can be like YYYNNNYY, which means the first valid journal in
the 7th bucket, and the latest valid journal in third bucket, in
this case, if we do not try we the zero index first, We
may get a valid journal in the 7th bucket, then we call
find_next_bit(bitmap,ca->sb.njournal_buckets, l + 1) to get the
first invalid bucket after the 7th bucket, because all these
buckets is valid, so no bit 1 in bitmap, thus find_next_bit()
function would return with ca->sb.njournal_buckets (8). So, after
that, bcache only read journal in 7th and 8the bucket,
the first to the third buckets are lost.

So, it is important to let developer know that, we need to try
the zero index at first in the hash-search, and avoid any breaks
in future's code modification.

[ML: Fixed whitespace & formatting & file permissions]

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
---
 drivers/md/bcache/journal.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 02a98ddb592d..5018c56ebb67 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -170,6 +170,11 @@ int bch_journal_read(struct cache_set *c, struct list_head *list)
 		 * find a sequence of buckets with valid journal entries
 		 */
 		for (i = 0; i < ca->sb.njournal_buckets; i++) {
+			/*
+			 * We must try the index l with ZERO first for
+			 * correctness due to the scenario that the journal
+			 * bucket is circular buffer which might have wrapped
+			 */
 			l = (i * 2654435769U) % ca->sb.njournal_buckets;
 
 			if (test_bit(l, bitmap))
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/4] bcache: Fix building error on MIPS
  2017-11-24 23:14 Commits for 4.15-- bcache Michael Lyle
  2017-11-24 23:14 ` [PATCH 1/4] bcache: add a comment in journal bucket reading Michael Lyle
@ 2017-11-24 23:14 ` Michael Lyle
  2017-11-28 13:42   ` Christoph Hellwig
  2017-11-24 23:14 ` [PATCH 3/4] bcache: recover data from backing when data is clean Michael Lyle
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 7+ messages in thread
From: Michael Lyle @ 2017-11-24 23:14 UTC (permalink / raw)
  To: linux-bcache, linux-block; +Cc: axboe, Huacai Chen, stable, Michael Lyle

From: Huacai Chen <chenhc@lemote.com>

This patch try to fix the building error on MIPS. The reason is MIPS
has already defined the PTR macro, which conflicts with the PTR macro
in include/uapi/linux/bcache.h.

[fixed by mlyle: corrected a line-length issue]

Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Michael Lyle <mlyle@lyle.org>
---
 drivers/md/bcache/alloc.c   | 2 +-
 drivers/md/bcache/extents.c | 2 +-
 drivers/md/bcache/journal.c | 2 +-
 include/uapi/linux/bcache.h | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
index a27d85232ce1..a0cc1bc6d884 100644
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -490,7 +490,7 @@ int __bch_bucket_alloc_set(struct cache_set *c, unsigned reserve,
 		if (b == -1)
 			goto err;
 
-		k->ptr[i] = PTR(ca->buckets[b].gen,
+		k->ptr[i] = MAKE_PTR(ca->buckets[b].gen,
 				bucket_to_sector(c, b),
 				ca->sb.nr_this_dev);
 
diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c
index 41c238fc3733..f9d391711595 100644
--- a/drivers/md/bcache/extents.c
+++ b/drivers/md/bcache/extents.c
@@ -585,7 +585,7 @@ static bool bch_extent_merge(struct btree_keys *bk, struct bkey *l, struct bkey
 		return false;
 
 	for (i = 0; i < KEY_PTRS(l); i++)
-		if (l->ptr[i] + PTR(0, KEY_SIZE(l), 0) != r->ptr[i] ||
+		if (l->ptr[i] + MAKE_PTR(0, KEY_SIZE(l), 0) != r->ptr[i] ||
 		    PTR_BUCKET_NR(b->c, l, i) != PTR_BUCKET_NR(b->c, r, i))
 			return false;
 
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 5018c56ebb67..a87165c1d8e5 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -512,7 +512,7 @@ static void journal_reclaim(struct cache_set *c)
 			continue;
 
 		ja->cur_idx = next;
-		k->ptr[n++] = PTR(0,
+		k->ptr[n++] = MAKE_PTR(0,
 				  bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
 				  ca->sb.nr_this_dev);
 	}
diff --git a/include/uapi/linux/bcache.h b/include/uapi/linux/bcache.h
index 90fc490f973f..821f71a2e48f 100644
--- a/include/uapi/linux/bcache.h
+++ b/include/uapi/linux/bcache.h
@@ -91,7 +91,7 @@ PTR_FIELD(PTR_GEN,			0,  8)
 
 #define PTR_CHECK_DEV			((1 << PTR_DEV_BITS) - 1)
 
-#define PTR(gen, offset, dev)						\
+#define MAKE_PTR(gen, offset, dev)					\
 	((((__u64) dev) << 51) | ((__u64) offset) << 8 | gen)
 
 /* Bkey utility code */
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/4] bcache: recover data from backing when data is clean
  2017-11-24 23:14 Commits for 4.15-- bcache Michael Lyle
  2017-11-24 23:14 ` [PATCH 1/4] bcache: add a comment in journal bucket reading Michael Lyle
  2017-11-24 23:14 ` [PATCH 2/4] bcache: Fix building error on MIPS Michael Lyle
@ 2017-11-24 23:14 ` Michael Lyle
  2017-11-24 23:14 ` [PATCH 4/4] bcache: check return value of register_shrinker Michael Lyle
  2017-11-24 23:23 ` Commits for 4.15-- bcache Jens Axboe
  4 siblings, 0 replies; 7+ messages in thread
From: Michael Lyle @ 2017-11-24 23:14 UTC (permalink / raw)
  To: linux-bcache, linux-block; +Cc: axboe, Rui Hua, stable, Michael Lyle

From: Rui Hua <huarui.dev@gmail.com>

When we send a read request and hit the clean data in cache device, there
is a situation called cache read race in bcache(see the commit in the tail
of cache_look_up(), the following explaination just copy from there):
The bucket we're reading from might be reused while our bio is in flight,
and we could then end up reading the wrong data. We guard against this
by checking (in bch_cache_read_endio()) if the pointer is stale again;
if so, we treat it as an error (s->iop.error = -EINTR) and reread from
the backing device (but we don't pass that error up anywhere)

It should be noted that cache read race happened under normal
circumstances, not the circumstance when SSD failed, it was counted
and shown in  /sys/fs/bcache/XXX/internal/cache_read_races.

Without this patch, when we use writeback mode, we will never reread from
the backing device when cache read race happened, until the whole cache
device is clean, because the condition
(s->recoverable && (dc && !atomic_read(&dc->has_dirty))) is false in
cached_dev_read_error(). In this situation, the s->iop.error(= -EINTR)
will be passed up, at last, user will receive -EINTR when it's bio end,
this is not suitable, and wield to up-application.

In this patch, we use s->read_dirty_data to judge whether the read
request hit dirty data in cache device, it is safe to reread data from
the backing device when the read request hit clean data. This can not
only handle cache read race, but also recover data when failed read
request from cache device.

[edited by mlyle to fix up whitespace, commit log title, comment
spelling]

Fixes: d59b23795933 ("bcache: only permit to recovery read error when cache device is clean")
Cc: <stable@vger.kernel.org> # 4.14
Signed-off-by: Hua Rui <huarui.dev@gmail.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Michael Lyle <mlyle@lyle.org>
---
 drivers/md/bcache/request.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 3a7aed7282b2..643c3021624f 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -708,16 +708,15 @@ static void cached_dev_read_error(struct closure *cl)
 {
 	struct search *s = container_of(cl, struct search, cl);
 	struct bio *bio = &s->bio.bio;
-	struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
 
 	/*
-	 * If cache device is dirty (dc->has_dirty is non-zero), then
-	 * recovery a failed read request from cached device may get a
-	 * stale data back. So read failure recovery is only permitted
-	 * when cache device is clean.
+	 * If read request hit dirty data (s->read_dirty_data is true),
+	 * then recovery a failed read request from cached device may
+	 * get a stale data back. So read failure recovery is only
+	 * permitted when read request hit clean data in cache device,
+	 * or when cache read race happened.
 	 */
-	if (s->recoverable &&
-	    (dc && !atomic_read(&dc->has_dirty))) {
+	if (s->recoverable && !s->read_dirty_data) {
 		/* Retry from the backing device: */
 		trace_bcache_read_retry(s->orig_bio);
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/4] bcache: check return value of register_shrinker
  2017-11-24 23:14 Commits for 4.15-- bcache Michael Lyle
                   ` (2 preceding siblings ...)
  2017-11-24 23:14 ` [PATCH 3/4] bcache: recover data from backing when data is clean Michael Lyle
@ 2017-11-24 23:14 ` Michael Lyle
  2017-11-24 23:23 ` Commits for 4.15-- bcache Jens Axboe
  4 siblings, 0 replies; 7+ messages in thread
From: Michael Lyle @ 2017-11-24 23:14 UTC (permalink / raw)
  To: linux-bcache, linux-block; +Cc: axboe, Michael Lyle

register_shrinker is now __must_check, so check it to kill a warning.
Caller of bch_btree_cache_alloc in super.c appropriately checks return
value so this is fully plumbed through.

This V2 fixes checkpatch warnings and improves the commit description,
as I was too hasty getting the previous version out.

Signed-off-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Vojtech Pavlik <vojtech@suse.com>
---
 drivers/md/bcache/btree.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 11c5503d31dc..81e8dc3dbe5e 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -807,7 +807,10 @@ int bch_btree_cache_alloc(struct cache_set *c)
 	c->shrink.scan_objects = bch_mca_scan;
 	c->shrink.seeks = 4;
 	c->shrink.batch = c->btree_pages * 2;
-	register_shrinker(&c->shrink);
+
+	if (register_shrinker(&c->shrink))
+		pr_warn("bcache: %s: could not register shrinker",
+				__func__);
 
 	return 0;
 }
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Commits for 4.15-- bcache
  2017-11-24 23:14 Commits for 4.15-- bcache Michael Lyle
                   ` (3 preceding siblings ...)
  2017-11-24 23:14 ` [PATCH 4/4] bcache: check return value of register_shrinker Michael Lyle
@ 2017-11-24 23:23 ` Jens Axboe
  4 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2017-11-24 23:23 UTC (permalink / raw)
  To: Michael Lyle, linux-bcache, linux-block

On 11/24/2017 04:14 PM, Michael Lyle wrote:
> Hi Jens,
> 
> We have a few more small things:
> 
> Huacai Chen (1):
>       bcache: Fix building error on MIPS
> 
> Michael Lyle (1):
>       bcache: check return value of register_shrinker
> 
> Rui Hua (1):
>       bcache: recover data from backing when data is clean
> 
> Tang Junhui (1):
>       bcache: add a comment in journal bucket reading
> 
> Most important is Rui Hua's change, which prevents EINTR from being
> returned to users unexpectedly (and is cc'd stable).  All changes are
> reviewed and have had a moderate degree of test.

Thanks, all look fine for this series, applied.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/4] bcache: Fix building error on MIPS
  2017-11-24 23:14 ` [PATCH 2/4] bcache: Fix building error on MIPS Michael Lyle
@ 2017-11-28 13:42   ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2017-11-28 13:42 UTC (permalink / raw)
  To: Michael Lyle; +Cc: linux-bcache, linux-block, axboe, Huacai Chen, stable

> diff --git a/include/uapi/linux/bcache.h b/include/uapi/linux/bcache.h
> index 90fc490f973f..821f71a2e48f 100644
> --- a/include/uapi/linux/bcache.h
> +++ b/include/uapi/linux/bcache.h
> @@ -91,7 +91,7 @@ PTR_FIELD(PTR_GEN,			0,  8)
>  
>  #define PTR_CHECK_DEV			((1 << PTR_DEV_BITS) - 1)
>  
> -#define PTR(gen, offset, dev)						\
> +#define MAKE_PTR(gen, offset, dev)					\
>  	((((__u64) dev) << 51) | ((__u64) offset) << 8 | gen)

This really should have a BCACHE_ prefix, and probably a lower case
name.  And last but not least it really shouldn't be in a uapi header.

Similar things apply to various other things around it.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-11-28 13:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-24 23:14 Commits for 4.15-- bcache Michael Lyle
2017-11-24 23:14 ` [PATCH 1/4] bcache: add a comment in journal bucket reading Michael Lyle
2017-11-24 23:14 ` [PATCH 2/4] bcache: Fix building error on MIPS Michael Lyle
2017-11-28 13:42   ` Christoph Hellwig
2017-11-24 23:14 ` [PATCH 3/4] bcache: recover data from backing when data is clean Michael Lyle
2017-11-24 23:14 ` [PATCH 4/4] bcache: check return value of register_shrinker Michael Lyle
2017-11-24 23:23 ` Commits for 4.15-- bcache Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.