All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] bcache: fix insane -ESRCH error in bch_journal_replay
@ 2022-02-15  8:28 Minlan Wang
  2022-05-13 15:22 ` Coly Li
  0 siblings, 1 reply; 2+ messages in thread
From: Minlan Wang @ 2022-02-15  8:28 UTC (permalink / raw)
  To: colyli, linux-bcache; +Cc: wangminlan

bch_keylist_empty is checked to make sure key insert is successful

Signed-off-by: Minlan Wang <wangminlan@szsandstone.com>
---
We've been using a pretty old version of bcache (from
linux-4.18-rc8) and experiencing occasional data corruption
after a new register session of bcache.  It turns out the
data corruption is caused by this code in our old version of
bcache:

in bch_journal_replay:
		 ...
                 for (k = i->j.start;
                      k < bset_bkey_last(&i->j);
                      k = bkey_next(k)) {
                         trace_bcache_journal_replay_key(k);

                         bch_keylist_init_single(&keylist, k);

                         ret = bch_btree_insert(s, &keylist, i->pin, NULL);
                         if (ret)
                                 goto err;
		 ...
bch_journal_replay returns -ESRCH, then in run_cache_set:
		 ...
                 if (j->version < BCACHE_JSET_VERSION_UUID)
                         __uuid_write(c);

                 bch_journal_replay(c, &journal);
		 ...
There's no message warning about this key insert error in
journal replay, nor does run_cache_set check for return
value of bch_journal_replay either, so later keys in jset
got lost silently and cause data corruption.

We checked the key which caused this failure in
bch_journal_replay, it is a key mapped to 2 btree node, and
the first btree node is full.
The code path causing this problem is:
1. k1 mapped to btree node n1, n2
2. When mapping into n1, bch_btree_insert_node goes into
   split case, top half of k1, say k1.1, is inserted into
   splitted n1: say n1.1, n1.2. Since the bottom half of k1,
   say k1.2, is left in insert_keys, so bch_btree_insert_node
   returns -EINTR.
   This happens in this code:
   in bch_btree_insert_node:
        ...
        } else {
                /* Invalidated all iterators */
                int ret = btree_split(b, op, insert_keys, replace_key);

                if (bch_keylist_empty(insert_keys))
                        return 0;
                else if (!ret)
                        return -EINTR;
                return ret;
        }
        ...
3. For return value -EINTR, another
   bch_btree_map_nodes_recurse in bcache_btree_root is
   triggered, with the wrong "from" key: which is
   START_KEY(&k1), instead of START_KEY(&k1.2)
4. n1.2 is revisted, since it has no overlapping range for
   k1.2, bch_btree_insert_keys sets
   op->insert_collision = true
   in the following code path:
   btree_insert_fn
   --> bch_btree_insert_node
     --> bch_btree_insert_keys
5. n2 is visisted and k1.2 is inserted here
6. bch_btree_insert detects op.op.insert_collision, and
   returns -ESRCH

In this case, the key is successfully inserted, though
bch_btree_insert returns -ESRCH.

We tried the latest version of bcache in linux mainline,
which has commit ce3e4cfb59cb (bcache: add failure check to
run_cache_set() for journal replay), the cache disk register
failed with this message:
bcache: replay journal failed, disabling caching

This is not what is expected, since the data in disk in
intact, journal replay should success.
I'm wondering if the checking of bch_btree_insert's return
value is really neccessary in bch_journal_replay, could we
just check bch_keylist_empty(&keylist) and make the replay
continue if that is empty?

We are using this patch for a work around for this issue now.

 drivers/md/bcache/journal.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 61bd79babf7a..35b8cbcd73c5 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -380,6 +380,8 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list)
 			bch_keylist_init_single(&keylist, k);
 
 			ret = bch_btree_insert(s, &keylist, i->pin, NULL);
+			if ((ret == -ESRCH) && (bch_keylist_empty(&keylist)))
+				ret = 0;
 			if (ret)
 				goto err;
 
-- 
2.18.4




^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] bcache: fix insane -ESRCH error in bch_journal_replay
  2022-02-15  8:28 [PATCH] bcache: fix insane -ESRCH error in bch_journal_replay Minlan Wang
@ 2022-05-13 15:22 ` Coly Li
  0 siblings, 0 replies; 2+ messages in thread
From: Coly Li @ 2022-05-13 15:22 UTC (permalink / raw)
  To: Minlan Wang; +Cc: linux-bcache

On 2/15/22 4:28 PM, Minlan Wang wrote:
> bch_keylist_empty is checked to make sure key insert is successful
>
> Signed-off-by: Minlan Wang <wangminlan@szsandstone.com>
> ---
> We've been using a pretty old version of bcache (from
> linux-4.18-rc8) and experiencing occasional data corruption
> after a new register session of bcache.  It turns out the
> data corruption is caused by this code in our old version of
> bcache:
>
> in bch_journal_replay:
> 		 ...
>                   for (k = i->j.start;
>                        k < bset_bkey_last(&i->j);
>                        k = bkey_next(k)) {
>                           trace_bcache_journal_replay_key(k);
>
>                           bch_keylist_init_single(&keylist, k);
>
>                           ret = bch_btree_insert(s, &keylist, i->pin, NULL);
>                           if (ret)
>                                   goto err;
> 		 ...
> bch_journal_replay returns -ESRCH, then in run_cache_set:
> 		 ...
>                   if (j->version < BCACHE_JSET_VERSION_UUID)
>                           __uuid_write(c);
>
>                   bch_journal_replay(c, &journal);
> 		 ...
> There's no message warning about this key insert error in
> journal replay, nor does run_cache_set check for return
> value of bch_journal_replay either, so later keys in jset
> got lost silently and cause data corruption.
>
> We checked the key which caused this failure in
> bch_journal_replay, it is a key mapped to 2 btree node, and
> the first btree node is full.
> The code path causing this problem is:
> 1. k1 mapped to btree node n1, n2


Hi Minlan,


Could you please provide more details about how the key 'k1' mapped to 
btree node n1 and n2 both?

Thanks.


Coly Li



> 2. When mapping into n1, bch_btree_insert_node goes into
>     split case, top half of k1, say k1.1, is inserted into
>     splitted n1: say n1.1, n1.2. Since the bottom half of k1,
>     say k1.2, is left in insert_keys, so bch_btree_insert_node
>     returns -EINTR.
>     This happens in this code:
>     in bch_btree_insert_node:
>          ...
>          } else {
>                  /* Invalidated all iterators */
>                  int ret = btree_split(b, op, insert_keys, replace_key);
>
>                  if (bch_keylist_empty(insert_keys))
>                          return 0;
>                  else if (!ret)
>                          return -EINTR;
>                  return ret;
>          }
>          ...
> 3. For return value -EINTR, another
>     bch_btree_map_nodes_recurse in bcache_btree_root is
>     triggered, with the wrong "from" key: which is
>     START_KEY(&k1), instead of START_KEY(&k1.2)
> 4. n1.2 is revisted, since it has no overlapping range for
>     k1.2, bch_btree_insert_keys sets
>     op->insert_collision = true
>     in the following code path:
>     btree_insert_fn
>     --> bch_btree_insert_node
>       --> bch_btree_insert_keys
> 5. n2 is visisted and k1.2 is inserted here
> 6. bch_btree_insert detects op.op.insert_collision, and
>     returns -ESRCH
>
> In this case, the key is successfully inserted, though
> bch_btree_insert returns -ESRCH.
>
> We tried the latest version of bcache in linux mainline,
> which has commit ce3e4cfb59cb (bcache: add failure check to
> run_cache_set() for journal replay), the cache disk register
> failed with this message:
> bcache: replay journal failed, disabling caching
>
> This is not what is expected, since the data in disk in
> intact, journal replay should success.
> I'm wondering if the checking of bch_btree_insert's return
> value is really neccessary in bch_journal_replay, could we
> just check bch_keylist_empty(&keylist) and make the replay
> continue if that is empty?
>
> We are using this patch for a work around for this issue now.
>
>   drivers/md/bcache/journal.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
> index 61bd79babf7a..35b8cbcd73c5 100644
> --- a/drivers/md/bcache/journal.c
> +++ b/drivers/md/bcache/journal.c
> @@ -380,6 +380,8 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list)
>   			bch_keylist_init_single(&keylist, k);
>   
>   			ret = bch_btree_insert(s, &keylist, i->pin, NULL);
> +			if ((ret == -ESRCH) && (bch_keylist_empty(&keylist)))
> +				ret = 0;
>   			if (ret)
>   				goto err;
>   



^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2022-05-13 15:22 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-15  8:28 [PATCH] bcache: fix insane -ESRCH error in bch_journal_replay Minlan Wang
2022-05-13 15:22 ` Coly Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.