From: Kent Overstreet <kent.overstreet@gmail.com>
To: Junhui Tang <tang.junhui.linux@gmail.com>
Cc: colyli@suse.de, linux-bcache@vger.kernel.org,
linux-block@vger.kernel.org
Subject: Re: [PATCH] bcache: treat stale && dirty keys as bad keys
Date: Tue, 18 Dec 2018 09:01:57 -0500 [thread overview]
Message-ID: <20181218140157.GB7144@kmo-pixel> (raw)
In-Reply-To: <CA+NNRhE8438XuoZ25kdCRjYvZv6nHJU+Yoyg6hacWuptjWWDfA@mail.gmail.com>
On Tue, Dec 18, 2018 at 02:37:14PM +0800, Junhui Tang wrote:
> From 8ca52771f1d3b83ff55dc80ba633901a081963bf Mon Sep 17 00:00:00 2001
> From: Tang Junhui <tang.junhui.linux@gmail.com>
> Date: Tue, 18 Dec 2018 10:38:26 +0800
> Subject: [PATCH] bcache: treat stale && dirty keys as bad keys
>
> Stale && dirty keys can be produced in the follow way:
> After writeback in write_dirty_finish(), dirty keys k1 will
> replace by clean keys k2
> ==>ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key);
> ==>btree_insert_fn(struct btree_op *b_op, struct btree *b)
> ==>static int bch_btree_insert_node(struct btree *b,
> struct btree_op *op,
> struct keylist *insert_keys,
> atomic_t *journal_ref,
> Then two steps:
> A) update k1 to k2 in btree node memory;
> bch_btree_insert_keys(b, op, insert_keys, replace_key)
> B) Write the bset(contains k2) to cache disk by a 30s delay work
> bch_btree_leaf_dirty(b, journal_ref).
> But before the 30s delay work write the bset to cache device,
> these things happend:
> A) GC works, and reclaim the bucket k2 point to;
> B) Allocator works, and invalidate the bucket k2 point to,
> and increase the gen of the bucket, and place it into free_inc
> fifo;
> C) Until now, the 30s delay work still does not finish work,
> so in the disk, the key still is k1, it is dirty and stale
> (its gen is smaller than the gen of the bucket). and then the
> machine power off suddenly happens;
> D) When the machine power on again, after the btree reconstruction,
> the stale dirty key appear.
Only prior to journal replay, right? Or did you uncover something more severe?
> In bch_extent_bad(), when expensive_debug_checks is off, it would
> treat the dirty key as good even it is stale keys, and it would
> cause bellow probelms:
> A) In read_dirty() it would cause machine crash:
> BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> B) It could be worse when reads hits stale dirty keys, it would
> read old incorrect data.
Neither of these can happen until after journal replay is finished. Prior to
journal replay we expect to find stale dirty keys - if we find any after journal
replay then it's indicative of a real bug.
>
> This patch tolerate the existence of these stale && dirty keys,
> and treat them as bad key in bch_extent_bad().
>
> Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com>
> ---
> drivers/md/bcache/extents.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c
> index 1d09674..f9eb03c 100644
> --- a/drivers/md/bcache/extents.c
> +++ b/drivers/md/bcache/extents.c
> @@ -535,6 +535,7 @@ static bool bch_extent_bad(struct btree_keys *bk,
> const struct bkey *k)
> {
> struct btree *b = container_of(bk, struct btree, keys);
> unsigned i, stale;
> + char buf[80];
>
> if (!KEY_PTRS(k) ||
> bch_extent_invalid(bk, k))
> @@ -544,19 +545,20 @@ static bool bch_extent_bad(struct btree_keys
> *bk, const struct bkey *k)
> if (!ptr_available(b->c, k, i))
> return true;
>
> - if (!expensive_debug_checks(b->c) && KEY_DIRTY(k))
> - return false;
> -
> for (i = 0; i < KEY_PTRS(k); i++) {
> stale = ptr_stale(b->c, k, i);
>
> + if (stale && KEY_DIRTY(k)) {
> + bch_extent_to_text(buf, sizeof(buf), k);
> + pr_info("stale dirty pointer, stale %u, key: %s",
> + stale,
> + buf);
> + }
> +
> btree_bug_on(stale > 96, b,
> "key too stale: %i, need_gc %u",
> stale, b->c->need_gc);
>
> - btree_bug_on(stale && KEY_DIRTY(k) && KEY_SIZE(k),
> - b, "stale dirty pointer");
> -
> if (stale)
> return true;
>
> --
> 1.8.3.1
next prev parent reply other threads:[~2018-12-18 14:02 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-18 6:37 [PATCH] bcache: treat stale && dirty keys as bad keys Junhui Tang
2018-12-18 14:01 ` Kent Overstreet [this message]
2018-12-19 1:30 ` Junhui Tang
2018-12-19 1:32 ` Junhui Tang
2018-12-19 11:32 ` Kent Overstreet
2018-12-20 8:40 ` Junhui Tang
2018-12-22 13:02 ` Coly Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181218140157.GB7144@kmo-pixel \
--to=kent.overstreet@gmail.com \
--cc=colyli@suse.de \
--cc=linux-bcache@vger.kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=tang.junhui.linux@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).