* Fw:About bcache-check
From: 杨东升 @ 2020-09-16  6:19 UTC
  To: colyli; +Cc: linux-bcache

Resending with no HTML format  ... ...


Hi Coly and all,
     I found the following error message in our testing:


Sep 27 17:43:00 node-1 kernel: bcache: error on 
c2914b7e-d665-4ec1-80e1-272755de19ef: unsupported bset version at bucket
 58290, block 0, 40818810 keys, disabling caching


I checked the code in bch_btree_node_read_done() around this message:

        for (;
             b->written < btree_blocks(b) && i->seq == b->keys.set[0].data->seq;
             i = write_block(b)) {
                err = "unsupported bset version";
                if (i->version > BCACHE_BSET_VERSION)
                        goto err;

The problem is that i->seq is what we expect for this btree node, but i->version is larger than BCACHE_BSET_VERSION (1), so the check fails.
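
The two conditions in that loop can be modelled in user space. The sketch below is an illustration only: `struct mock_bset`, `MOCK_BSET_VERSION` and `check_block()` are hypothetical stand-ins, not the kernel's definitions, and the struct does not reproduce the on-disk layout. It shows how a block whose seq matches the node but whose version field holds garbage ends up on the error path, while a block with a different seq simply ends the walk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a bset header; NOT the real on-disk layout. */
struct mock_bset {
	uint64_t seq;
	uint32_t version;
	uint32_t keys;
};

/* Mirrors the value BCACHE_BSET_VERSION (1) quoted in the mail. */
#define MOCK_BSET_VERSION 1u

enum block_status {
	BLOCK_END,         /* seq differs: block is not part of this node */
	BLOCK_OK,          /* seq matches and the version is supported */
	BLOCK_BAD_VERSION  /* seq matches but the version is too new */
};

/* Apply the loop condition and the version check to a single block. */
static enum block_status check_block(const struct mock_bset *i,
				     uint64_t node_seq)
{
	if (i->seq != node_seq)
		return BLOCK_END;
	if (i->version > MOCK_BSET_VERSION)
		return BLOCK_BAD_VERSION;
	return BLOCK_OK;
}
```

With a node seq of 42, a block with seq 42 and version 1 is accepted, a block with seq 42 and a garbage version is rejected, and a block with any other seq ends the walk without an error.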



I think there are two possible causes for this message:
(1) Cache discard is not enabled.
      When we allocate a bucket without discard enabled, the bucket may still contain outdated data, and the value at the location of i->seq might happen to equal what we expect even though it is not really a bset at all. In that case version, magic and bset_csum are all unexpected, and currently we goto err and stop the cache_set.


(2) Power cut.
       If a power cut happens while we are doing btree_node_write, a btree node could be only partially written.
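
As a rough illustration of the power-cut case, the sketch below simulates a multi-sector write that stops part way and shows that a checksum over the whole node no longer matches. Everything in it is an assumption made for the example: the sector and node sizes, the sequential sector order, and `toy_csum()`, which is a simple position-weighted sum and not the checksum bcache actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  512u
#define NODE_SECTORS 4u
#define NODE_BYTES   (SECTOR_SIZE * NODE_SECTORS)

/* Toy position-weighted sum; a stand-in, NOT the checksum bcache uses. */
static uint64_t toy_csum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += (uint64_t)(i + 1) * buf[i];
	return sum;
}

/*
 * Simulate a write cut short by power loss: only the first sectors_done
 * sectors reach the medium, the rest keeps its old contents.
 */
static void torn_write(uint8_t *disk, const uint8_t *src,
		       unsigned int sectors_done)
{
	assert(sectors_done <= NODE_SECTORS);
	memcpy(disk, src, (size_t)sectors_done * SECTOR_SIZE);
}

/* Nonzero if the image on "disk" matches the checksum of the intended write. */
static int write_completed(const uint8_t *disk, uint64_t expected_csum)
{
	return toy_csum(disk, NODE_BYTES) == expected_csum;
}
```

If the stale medium holds 0xAA bytes and the new node holds 0x55 bytes, stopping after two of the four sectors leaves an image that fails `write_completed()`, while a full write passes.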
 

But when we hit this kind of problem, the cache device can no longer be used, and there is no tool to recover from it.

I think I can cook up a bcache-check in bcache-tools, something like fsck, to check for this kind of problem and allow the user to repair it, with a warning that a forced repair is risky.
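
To make the idea of a check pass a little more concrete, here is a report-only sketch of what scanning one node might look like. It is purely hypothetical: `struct mock_bset`, `MOCK_BSET_VERSION` and `first_bad_block()` are invented for this example, it assumes one header per block (real bsets can span several blocks), and it performs no repair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified header: one per block, NOT the on-disk layout. */
struct mock_bset {
	uint64_t seq;
	uint32_t version;
};

/* Mirrors the value BCACHE_BSET_VERSION (1) quoted in the mail. */
#define MOCK_BSET_VERSION 1u

/*
 * Walk the blocks of one node image in order. Stop at the first block whose
 * seq differs from the node's seq, as the read path does. Return the index
 * of the first block that matches the seq but carries an unsupported
 * version, or -1 if the walk ends without finding one.
 */
static long first_bad_block(const struct mock_bset *blocks, size_t nblocks,
			    uint64_t node_seq)
{
	for (size_t n = 0; n < nblocks; n++) {
		if (blocks[n].seq != node_seq)
			break;
		if (blocks[n].version > MOCK_BSET_VERSION)
			return (long)n;
	}
	return -1;
}
```

A tool built along these lines could print the bucket and block of the first bad entry and leave the decision about any repair to the user.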



Please point out anything I am missing.



Thanks,
Dongsheng





* Re:Fw:About bcache-check
From: 杨东升 @ 2020-09-16  6:41 UTC
  To: 杨东升; +Cc: colyli, linux-bcache

Hi Coly,

    Hmm, on second thought, this problem cannot be caused by discard being disabled, because the seq is a random number:

get_random_bytes(&i->seq, sizeof(uint64_t));

So it is practically impossible for stale data in a previously invalidated bucket to carry the same random seq as the new node.
But what about the power-cut case?
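
The argument about the random seq can be put into numbers. The sketch below assumes that the stale value found at the seq offset is uniformly random and independent of the new node's seq; `seq_collision_bound()` is a hypothetical helper for this example. By the union bound, the chance that any of n stale blocks matches one fixed 64-bit seq is at most n / 2^64.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound (union bound) on the probability that at least one of
 * stale_blocks uniformly random 64-bit values equals one fixed seq.
 */
static double seq_collision_bound(uint64_t stale_blocks)
{
	return (double)stale_blocks * 0x1p-64;
}
```

For a single stale block the bound is 2^-64, about 5.4e-20. Even across 2^20 stale blocks it is only 2^-44, about 5.7e-14.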

Yang




From: "杨东升" <dongsheng.yang@easystack.cn>
Date: 2020-09-16 14:19:46
To: colyli <colyli@suse.de>
Cc: linux-bcache <linux-bcache@vger.kernel.org>
Subject: Fw:About bcache-check
>Resending with no HTML format  ... ...

