* Fw:About bcache-check
@ 2020-09-16 6:19 杨东升
2020-09-16 6:41 ` bcache-check 杨东升
0 siblings, 1 reply; 2+ messages in thread
From: 杨东升 @ 2020-09-16 6:19 UTC
To: colyli; +Cc: linux-bcache
Resending with no HTML format ... ...
Hi Coly and all,
I found the following error message during our testing:
Sep 27 17:43:00 node-1 kernel: bcache: error on
c2914b7e-d665-4ec1-80e1-272755de19ef: unsupported bset version at bucket
58290, block 0, 40818810 keys, disabling caching
I checked the code in bch_btree_node_read_done() around this message:
	for (;
	     b->written < btree_blocks(b) && i->seq == b->keys.set[0].data->seq;
	     i = write_block(b)) {
		err = "unsupported bset version";
		if (i->version > BCACHE_BSET_VERSION)
			goto err;

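The problem is that i->seq matches what we expect for this btree node, but the version is not BCACHE_BSET_VERSION (1). Since the check only fails when i->version > BCACHE_BSET_VERSION, the value that was read must be larger than 1.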
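To make the control flow above concrete, here is a minimal userspace sketch of the same two tests. Everything in it is an assumption for illustration: struct sketch_bset is a simplified stand-in for the on-disk bset header, and sketch_check_version() and SKETCH_BSET_VERSION are hypothetical names, not bcache code. It only shows that a header whose seq matches the node but whose version field holds garbage ends up on the "unsupported bset version" path, while a header with a different seq simply ends the loop without an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BSET_VERSION 1u	/* stands in for BCACHE_BSET_VERSION */

/* Simplified stand-in for the on-disk bset header (assumption). */
struct sketch_bset {
	uint64_t csum;
	uint64_t magic;
	uint64_t seq;
	uint32_t version;
	uint32_t keys;
};

enum sketch_result {
	SKETCH_NOT_IN_NODE,		/* seq differs: the loop ends quietly */
	SKETCH_OK,			/* version accepted, later checks would run */
	SKETCH_UNSUPPORTED_VERSION	/* the error path quoted above */
};

/* Mirrors the loop condition (seq) and the first check (version) only. */
static enum sketch_result sketch_check_version(const struct sketch_bset *i,
					       uint64_t node_seq)
{
	assert(i != NULL);

	if (i->seq != node_seq)
		return SKETCH_NOT_IN_NODE;
	if (i->version > SKETCH_BSET_VERSION)
		return SKETCH_UNSUPPORTED_VERSION;
	return SKETCH_OK;
}
```

The point of the sketch is the ordering: the seq comparison is the only gate in front of the version check, so any header that passes it is treated as a real bset of this node.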
I think there could be two causes for this message:
(1) Cache discard is not enabled.
When we allocate a bucket without discard enabled, the bucket may still contain outdated data,
and it is possible that the bytes at the location of i->seq happen to equal the value we expect,
even though the data there is not a bset at all. In that case version, magic and bset_csum are all unexpected,
and currently we goto err and stop the cache_set.
(2) Power cut.
If a power cut happens while we are doing btree_node_write, we could write a partial btree node.
When we hit this kind of problem, we can no longer use the cache device, and there is no tool to recover from it.
I think I could write a bcache-check tool in bcache-tools, something like fsck, to check for this kind of problem
and allow the user to repair it, with a warning that a forced repair is risky.
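As a rough illustration of what the read-only part of such a check could look like, here is a sketch that scans an in-memory image of one btree node and reports where validation stops. All names (chk_scan_node, struct chk_report, CHK_BLOCK_BYTES and so on) are hypothetical, and it makes simplifying assumptions that do not hold for real bcache: one bset per block, a fixed 512-byte block, the expected magic passed in by the caller, and no checksum verification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHK_BLOCK_BYTES  512u	/* assumed block size for this sketch */
#define CHK_BSET_VERSION 1u	/* stands in for BCACHE_BSET_VERSION */

/* Simplified stand-in for the on-disk bset header (assumption). */
struct chk_bset_hdr {
	uint64_t csum, magic, seq;
	uint32_t version, keys;
};

enum chk_status { CHK_CLEAN, CHK_BAD_VERSION, CHK_BAD_MAGIC };

struct chk_report {
	enum chk_status status;
	size_t good_bsets;	/* headers that passed before the scan stopped */
	size_t bad_block;	/* block index of the first failure, if any */
};

/* Walk the node block by block; never modifies the image. */
static struct chk_report chk_scan_node(const uint8_t *node, size_t nblocks,
				       uint64_t magic)
{
	struct chk_report r = { CHK_CLEAN, 0, 0 };
	struct chk_bset_hdr first, h;
	size_t blk;

	assert(node != NULL && nblocks > 0);
	memcpy(&first, node, sizeof(first));

	for (blk = 0; blk < nblocks; blk++) {
		memcpy(&h, node + blk * CHK_BLOCK_BYTES, sizeof(h));
		if (h.seq != first.seq)
			break;	/* no longer part of this node */
		if (h.version > CHK_BSET_VERSION) {
			r.status = CHK_BAD_VERSION;
			r.bad_block = blk;
			break;
		}
		if (h.magic != magic) {
			r.status = CHK_BAD_MAGIC;
			r.bad_block = blk;
			break;
		}
		r.good_bsets++;
	}
	return r;
}
```

A report like this would let the user see how many bsets in the node are still intact before deciding whether a forced repair is worth the risk. What the repair itself should do is deliberately left out of the sketch.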
Please point out anything I am missing.
Thanks
Dongsheng
* Re:Fw:About bcache-check
2020-09-16 6:19 Fw:About bcache-check 杨东升
@ 2020-09-16 6:41 ` 杨东升
0 siblings, 0 replies; 2+ messages in thread
From: 杨东升 @ 2020-09-16 6:41 UTC
To: 杨东升; +Cc: colyli, linux-bcache
Hi Coly,
Hmm, on second thought, this problem cannot really happen in the discard-disabled case,
because the seq is a random number:
get_random_bytes(&i->seq, sizeof(uint64_t));
So it is practically impossible for the last invalidated bucket and the new bucket to end up with the same random seq.
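To put a number on that, here is a small sketch of the chance that stale bytes match a fresh 64-bit seq. It assumes the stale contents are independent of the new seq and uniformly distributed, which is an idealization. The function name is hypothetical, and the result is a simple union bound, not an exact probability.

```c
#include <assert.h>

/*
 * Upper bound on the probability that at least one of `stale_slots`
 * leftover 64-bit values equals a freshly generated random seq.
 * Each slot matches with probability 2^-64 under the stated assumptions.
 */
static double seq_match_upper_bound(double stale_slots)
{
	const double per_slot = 0x1p-64;	/* 2^-64, about 5.4e-20 */
	double bound;

	assert(stale_slots >= 0.0);
	bound = stale_slots * per_slot;
	return bound > 1.0 ? 1.0 : bound;
}
```

Even with a million candidate slots the bound stays below 1e-13, which supports ruling out the stale-data explanation. It says nothing about the power-cut case.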
But what about the power-cut case?
Yang
From: "杨东升" <dongsheng.yang@easystack.cn>
Sent: 2020-09-16 14:19:46
To: colyli <colyli@suse.de>
Cc: linux-bcache <linux-bcache@vger.kernel.org>
Subject: Fw:About bcache-check