* Bcache is not caching anything. cache state=inconsistent, how to clear?
@ 2021-11-23 14:48 Tobiasz Karoń
  2021-11-23 17:37 ` Kai Krakow
  2021-11-23 22:24 ` Tobiasz Karoń
  0 siblings, 2 replies; 9+ messages in thread
From: Tobiasz Karoń @ 2021-11-23 14:48 UTC
  To: linux-bcache

Hi!

TL;DR

My cache is inconsistent, and that's probably preventing Bcache from
using it (all I/O goes to the backing device). How can I clear that?

Details:

I've been using Bcache for the past few months on my root Btrfs
filesystem with success.
Then one day out of the blue Bcache failed and took my Btrfs
filesystem with it (details:
https://www.youtube.com/watch?v=Hf3zr6CxvmI, looks similar to this:
https://stackoverflow.com/questions/22820492/how-to-revert-bcache-device-to-regular-device).
That's not the topic of my message though.
I've done a clean Arch Linux installation on Bcache + Btrfs once again
using an SSD partition for cache and an HDD as the backing device.

However, this time it doesn't do anything...
I was unable to find any information online to solve this.

My Bcache device works fine, the system boots off of it. However all
I/O goes straight to the backing HDD, and the SSD is unused. Needless
to say this means the performance is not what I got used to when
Bcache was working fine.

Here's what a 3rd party bcache-status script says (it'd be great if
bcache-tools would provide something like this, BTW):

❯ bcache-status
--- bcache ---
Device                      ? (?)
UUID                        c9cd8259-3cee-42ff-a8ec-e11193c09b7e
Block Size                  0.50KiB
Bucket Size                 512.00KiB
Congested?                  False
Read Congestion             2.0ms
Write Congestion            20.0ms
Total Cache Size            173.97GiB
Total Cache Used            8.70GiB     (5%)
Total Cache Unused          165.27GiB   (95%)
Dirty Data                  0.50KiB     (0%)
Evictable Cache             173.97GiB   (100%)
Replacement Policy          [lru] fifo random
Cache Mode                  (Unknown)
Total Hits                  0
Total Misses                0
Total Bypass Hits           0
Total Bypass Misses         0
Total Bypassed              0B

The Total Cache Used value has not changed since I did my initial
Arch Linux installation. It seems that Bcache "turned off" at that
point.

Here are the bcache superblocks for the backing device and the cache:

❯ bcache-super-show /dev/sda
sb.magic                ok
sb.first_sector         8 [match]
sb.csum                 4E6EACCA74AB0AE5 [match]
sb.version              1 [backing device]

dev.label               unfa-desktop%20root
dev.uuid                49202fdf-fbe5-48fd-bdd8-df5414da817c
dev.sectors_per_block   8
dev.sectors_per_bucket  1024
dev.data.first_sector   16
dev.data.cache_mode     0 [writethrough]
dev.data.cache_state    3 [inconsistent]

cset.uuid               9572380e-8e6f-4ce4-8323-80b98a85eeed

❯ bcache-super-show /dev/sdd3
sb.magic                ok
sb.first_sector         8 [match]
sb.csum                 259C90FD74B4D4BE [match]
sb.version              3 [cache device]

dev.label               (empty)
dev.uuid                95c6449a-03b5-40f2-a8cc-80b1b61c5ef0
dev.sectors_per_block   1
dev.sectors_per_bucket  1024
dev.cache.first_sector  1024
dev.cache.cache_sectors 364833792
dev.cache.total_sectors 364834816
dev.cache.ordered       yes
dev.cache.discard       no
dev.cache.pos           0
dev.cache.replacement   0 [lru]

cset.uuid               c9cd8259-3cee-42ff-a8ec-e11193c09b7e
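
In the two dumps above, the backing device reports cset.uuid
9572380e-... while the cache device reports c9cd8259-..., so the backing
device does not appear to be attached to the cache set that currently
exists on the SSD. A minimal sketch of that comparison, assuming the
`bcache-super-show` output format shown above; the helper names
`cset_uuid` and `same_cache_set` are hypothetical:

```shell
# Print the cset.uuid field from `bcache-super-show` output read on stdin.
cset_uuid() {
    awk '$1 == "cset.uuid" { print $2 }'
}

# Succeeds only when both dumps name the same cache set.
# $1 = super-show output of the backing device, $2 = of the cache device.
same_cache_set() {
    backing_uuid=$(printf '%s\n' "$1" | cset_uuid)
    cache_uuid=$(printf '%s\n' "$2" | cset_uuid)
    [ -n "$backing_uuid" ] && [ "$backing_uuid" = "$cache_uuid" ]
}

# Usage on a real system (device names taken from the dumps above):
#   same_cache_set "$(bcache-super-show /dev/sda)" \
#                  "$(bcache-super-show /dev/sdd3)" \
#       || echo "backing device points at a different cache set"
```

A mismatch here does not say how the devices got out of sync; it only
shows that attaching has to be redone.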

BTW - I've now realized I've set a label for the backing device but
not the cache. Maybe this is the reason? I don't think it should work
this way, but I've cleared the label on my backing device just to be
sure.

Hmm. The cache is inconsistent. I had this before I reinstalled my OS.
I have recreated the bcache cache on the SSD and was hoping that would
solve it.
I don't know what I should do with this. Is this the reason why it's
not working?

I was wondering if wiping the partition and recreating the cache
would help, but I don't want to needlessly wear down the SSD if that
won't help.

Needless to say I would really like to avoid data loss when using
Bcache - it's awesome, and the developer says it's perfectly stable
and safe, but I've had a sudden failure and others had such as well
(without seeing any hardware issues that could be causing that). Maybe
I should quit using Bcache altogether? Maybe it's not
production-ready? I was wondering about maybe using Bcachefs, though
the need to compile a custom kernel for it is quite a deterrent. I
tried it briefly, but the bcachefs-tools stopped working at some point
without a visible reason. I know Btrfs is flawed, though it seems to
be the best so far.

Thank you for your work,
- unfa

-- 
- Tobiasz 'unfa' Karoń

www.youtube.com/unfa000


* Re: Bcache is not caching anything. cache state=inconsistent, how to clear?
  2021-11-23 14:48 Bcache is not caching anything. cache state=inconsistent, how to clear? Tobiasz Karoń
@ 2021-11-23 17:37 ` Kai Krakow
  2021-11-23 17:40   ` Kai Krakow
  2022-01-06  3:00   ` Eric Wheeler
  2021-11-23 22:24 ` Tobiasz Karoń
  1 sibling, 2 replies; 9+ messages in thread
From: Kai Krakow @ 2021-11-23 17:37 UTC
  To: Tobiasz Karoń; +Cc: linux-bcache

Hello Tobiasz!

On Tue, 23 Nov 2021 at 15:48, Tobiasz Karoń <unfa00@gmail.com> wrote:
>
> Hi!
>
> TL;DR
>
> My cache is inconsistent, and that's probably preventing Bcache from
> using it (all I/O goes to the backing device). How can I clear that?

I've had a similar problem after bcache crashed due to a bug in the
latest kernel.

I resolved it with the following steps (I think you can figure out
what the PLACEHOLDERS mean):

For each backing device, set the cache_mode to none and detach it:

# echo none >/sys/block/BDEV/BPART/bcache/cache_mode
# echo 1 >/sys/block/BDEV/BPART/bcache/detach

Unregister the cache and re-create it (a block size of 4096 works
around the kernel bug; the cache is also potentially broken, so
re-create it):

# echo 1 >/sys/fs/bcache/CSETUUID/unregister
# bcache make -C -w 4096 -l LABEL --force /dev/BPART

Re-attach the devices and set cache mode:

# echo NEW_CSETUUID >/sys/block/BDEV/BPART/bcache/attach
# echo writearound >/sys/block/BDEV/BPART/bcache/cache_mode
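
The sysfs writes in these steps can be wrapped in small functions so
that each value lands in the right file. This is only a sketch under
assumptions, not a tested recovery tool: the function names and the
BCACHE_SYSFS override (useful for a dry run against a scratch
directory) are hypothetical, and the paths are the ones used in the
commands above.

```shell
# Root of sysfs; override only for a dry run against a scratch directory.
BCACHE_SYSFS="${BCACHE_SYSFS:-/sys}"

# Step 1: disable caching, then detach the backing device.
# $1 is the path below /sys/block, e.g. "sda/sda1" (BDEV/BPART above).
detach_backing() {
    bdev_dir="$BCACHE_SYSFS/block/$1/bcache"
    echo none > "$bdev_dir/cache_mode" || return 1
    echo 1 > "$bdev_dir/detach"
}

# Step 2: unregister the old cache set ($1 is its cset UUID).
# Re-creating the cache itself stays a manual `bcache make -C` step,
# run against the cache partition.
unregister_cset() {
    echo 1 > "$BCACHE_SYSFS/fs/bcache/$1/unregister"
}

# Step 3: attach to the new cache set and choose the cache mode.
# $1 = block path, $2 = new cset UUID, $3 = mode (e.g. writearound).
attach_backing() {
    bdev_dir="$BCACHE_SYSFS/block/$1/bcache"
    echo "$2" > "$bdev_dir/attach" || return 1
    echo "$3" > "$bdev_dir/cache_mode"
}
```

Keeping the re-create step out of the functions is deliberate: it is
the one command that overwrites a partition, so it is worth typing by
hand.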

I'm explicitly using writearound for btrfs because:

* writethrough would write data potentially relocated by COW
* writeback potentially destroys btrfs on unexpected bcache failures
* the performance difference between writeback and writearound for
btrfs is virtually non-existent

However, writearound caches only reads, which means boot-time
improvements lag one boot behind: during the first boot, bcache reads
from btrfs and caches those reads; on the next boot, it serves the
cached data. Using writethrough could work around that, but that's
not really useful with a COW filesystem because btrfs relocates
extents on each and every tiny write, making any cached data stale so
that it occupies bcache space for no reason. It also amplifies
writes to the SSD for no real reason.

YouTube:

The problem you saw and documented is exactly what happened to me (but
on Gentoo: system froze, reboot hung, rescue disk said: cache disabled
with a similar message), and you can work around it by using block
size 4096. In case it still happens: do NOT use writeback caching; use
writearound as mentioned above. Then at least it won't destroy btrfs,
and it's only a matter of re-creating the cache as outlined above.

HTH
Kai



* Re: Bcache is not caching anything. cache state=inconsistent, how to clear?
  2021-11-23 17:37 ` Kai Krakow
@ 2021-11-23 17:40   ` Kai Krakow
  2021-11-23 22:34     ` Tobiasz Karoń
  2022-01-06  3:00   ` Eric Wheeler
  1 sibling, 1 reply; 9+ messages in thread
From: Kai Krakow @ 2021-11-23 17:40 UTC
  To: Tobiasz Karoń; +Cc: linux-bcache

Oops:

> # echo 1 >/sys/fs/bcache/CSETUUID/unregister
> # bcache make -C -w 4096 -l LABEL --force /dev/BPART

CPART of course!

# bcache make -C -w 4096 -l LABEL --force /dev/CPART

Bye
Kai


* Re: Bcache is not caching anything. cache state=inconsistent, how to clear?
  2021-11-23 14:48 Bcache is not caching anything. cache state=inconsistent, how to clear? Tobiasz Karoń
  2021-11-23 17:37 ` Kai Krakow
@ 2021-11-23 22:24 ` Tobiasz Karoń
  1 sibling, 0 replies; 9+ messages in thread
From: Tobiasz Karoń @ 2021-11-23 22:24 UTC
  To: linux-bcache

I think I have solved it after reading up on
https://www.kernel.org/doc/html/latest/admin-guide/bcache.html.

1. I've set caching to none.
2. I've detached the caching device
3. I've unregistered it
4. I've run wipefs on it
5. I've recreated the bcache caching device (also used --cset-uuid to
put it straight into the right bcache set)
6. I've registered and reattached the cache to the backing device
7. Now my backing device shows the status as clean again.
8. I've enabled writearound caching for now (will enable writeback if
all goes well)

It seems the cache is working again:

❯ bcache-status
--- bcache ---
Device                      /dev/sda (8:0)
UUID                        c9cd8259-3cee-42ff-a8ec-e11193c09b7e
Block Size                  0.50KiB
Bucket Size                 512.00KiB
Congested?                  False
Read Congestion             2.0ms
Write Congestion            20.0ms
Total Cache Size            173.97GiB
Total Cache Used            1.74GiB     (1%)
Total Cache Unused          172.23GiB   (99%)
Dirty Data                  0.50KiB     (0%)
Evictable Cache             173.97GiB   (100%)
Replacement Policy          [lru] fifo random
Cache Mode                  [writethrough] writeback writearound none
Total Hits                  2   (0%)
Total Misses                1506
Total Bypass Hits           0   (0%)
Total Bypass Misses         9138
Total Bypassed              183.70MiB
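
The backing device state mentioned in step 7 can also be read straight
from sysfs. A small sketch, assuming the `state` attribute described in
the kernel's bcache admin guide (values `no cache`, `clean`, `dirty`
and `inconsistent`); the function name and the BCACHE_SYSFS override
are hypothetical:

```shell
# Root of sysfs; override only for a dry run against a scratch directory.
BCACHE_SYSFS="${BCACHE_SYSFS:-/sys}"

# Print a one-line verdict for a backing device and return non-zero
# when it needs attention. $1 is the path below /sys/block, e.g.
# "bcache0" or "sda".
backing_state_verdict() {
    state=$(cat "$BCACHE_SYSFS/block/$1/bcache/state") || return 2
    case "$state" in
        clean)        echo "attached, no dirty data" ;;
        dirty)        echo "attached, dirty data still in the cache" ;;
        "no cache")   echo "not attached to any cache set" ;;
        inconsistent) echo "was run without its cache; needs attention"
                      return 1 ;;
        *)            echo "unknown state: $state"
                      return 2 ;;
    esac
}

# Usage: backing_state_verdict bcache0
```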

### Part 2: It's not over!

Soon after I've done this and all seemed to be well, bcache has
imploded once again, this time thankfully not taking down my root
filesystem. Probably because it was not in writeback mode.
My OS didn't boot, and I got another checksum error at some bucket and
"disabled caching" message.

I suspect it was due to my mistake - I deleted and recreated the
bcache cache without rebooting in between; maybe something went
wrong because of that. I rebooted into a live system and deleted the
cache again (my backing device was clean).

I've written all zeros to the partition before recreating the cache
this time though.
I suspect maybe bcache found old data there and got confused? Wipefs
only deletes superblocks.

Before doing anything, though, I mounted my backing filesystem directly
with `mount -o offset=8192` and backed it up using a btrfs-clone Python
script.
After I've verified my backup was working I've unmounted the backup
medium and proceeded to recreate the cache and reattach it.
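
The 8192 in that mount command presumably corresponds to the byte
offset of the data area on the backing device: the superblock dump
earlier in the thread lists `dev.data.first_sector 16`, and 16 sectors
of 512 bytes is 8192 bytes. A sketch of the arithmetic, assuming that
output format; the function name is hypothetical, and the `losetup`
lines show one possible way to expose the filesystem read-only, not
necessarily what was done here:

```shell
# Read `bcache-super-show` output on stdin and print the byte offset at
# which the backing device's data (the filesystem) begins.
data_offset_bytes() {
    awk '$1 == "dev.data.first_sector" { print $2 * 512 }'
}

# Example with a real device (as root; /dev/sda is the backing device
# from this thread, /mnt is an arbitrary mount point):
#   offset=$(bcache-super-show /dev/sda | data_offset_bytes)
#   loopdev=$(losetup --find --show --read-only --offset "$offset" /dev/sda)
#   mount -o ro "$loopdev" /mnt
```

Mounting read-only matters here: writing to the backing device behind
bcache's back would leave any cached data stale.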

I've also found that `running` was `0` for my bcache set, so I have
turned it on.

After a reboot everything was back to normal.

I *hope* this will keep working. Last time Bcache broke and took my
filesystem with it without anything significant happening. I'd love to
know if it's considered stable or what could be causing spontaneous
failures.

- unfa




* Re: Bcache is not caching anything. cache state=inconsistent, how to clear?
  2021-11-23 17:40   ` Kai Krakow
@ 2021-11-23 22:34     ` Tobiasz Karoń
  2021-11-24  5:35       ` Kai Krakow
  0 siblings, 1 reply; 9+ messages in thread
From: Tobiasz Karoń @ 2021-11-23 22:34 UTC
  To: Kai Krakow; +Cc: linux-bcache

Thank you for your detailed reply and sharing your experience and solution.

So it seems Bcache and Btrfs are fundamentally incompatible when it
comes to caching writes? It has worked fine for 2 months, and then it
just imploded. I'll stay in writearound mode to be safe.

I've checked and my cache device has a block size of 512 bytes. That's
a strange value, as the backing device is an AF HDD (like all of them
in the past decade or more), so the block size should be 4 KiB.
I guess this also works until it doesn't.
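
Both block sizes can be read off the superblock dumps at the start of
the thread: `dev.sectors_per_block` is 1 for the cache device (512
bytes) and 8 for the backing device (4096 bytes). A sketch of that
conversion, assuming the 512-byte sector unit that `bcache-super-show`
reports in; the function name is hypothetical:

```shell
# Read `bcache-super-show` output on stdin and print the bcache block
# size of that device in bytes.
block_size_bytes() {
    awk '$1 == "dev.sectors_per_block" { print $2 * 512 }'
}

# Usage: bcache-super-show /dev/sdd3 | block_size_bytes
```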

Can I destroy and recreate the cache device on a live system (my root
filesystem is on this bcache set)?
I guess I can't. This is probably what I've done wrong today - I did
not unregister the whole cset before attempting to recreate the cache
device.

I am honestly a little afraid to touch it, after what happened.

I hope Bcachefs will eliminate these problems and provide a stable
unified solution.

Take care
- unfa


* Re: Bcache is not caching anything. cache state=inconsistent, how to clear?
  2021-11-23 22:34     ` Tobiasz Karoń
@ 2021-11-24  5:35       ` Kai Krakow
  2021-11-24 12:41         ` Tobiasz Karoń
  0 siblings, 1 reply; 9+ messages in thread
From: Kai Krakow @ 2021-11-24  5:35 UTC
  To: Tobiasz Karoń; +Cc: linux-bcache

Hello!

On Tue, 23 Nov 2021 at 23:34, Tobiasz Karoń <unfa00@gmail.com> wrote:
>
> Thank you for your detailed reply and sharing your experience and solution.
>
> So it seems Bcache and Btrfs are fundamentally incompatible when it
> comes to caching writes? It has worked fine for 2 months, and then it
> just imploded. I'll stay in writearound mode to be safe.

No, they are not fundamentally incompatible, but losing writeback data
on btrfs is a much more visible catastrophic event than on other
filesystems (which write data in place, whereas btrfs writes COW).

With other filesystems, too, bcache destroying itself in writeback
mode would severely damage your filesystem (on a classical filesystem,
you usually end up with garbled files containing partially old and
partially new data, and maybe some fixable metadata errors) - BUT: it
is still a catastrophic event, maybe even more so because the data
loss can go unnoticed and end up in your backups, only for you to find
out later that you're missing data that has already been rotated out
of the backup.

Don't use writeback if you cannot afford to recover from backup when
writeback fails. That's a property of how caching works, not a
property of btrfs or bcache. It's the same for any writeback cache you
might be using: RAID controllers come with writeback caches and
sometimes decide to throw their contents away, leaving you with
destroyed filesystems (so you usually turn that off unless your
workload requires it and you can afford to throw lost data away). That
doesn't make them fundamentally incompatible with filesystems, right?
Your HDD comes with a write cache which may destroy your filesystem,
too, on power loss. You might want to turn that off, especially when
using btrfs (but also for better write latency behavior, and the
kernel has better IO scheduling anyway than the really small write
caches of HDDs): `hdparm -W0 /dev/HDDDEV`. HDD write caches are only
useful for operating systems that do not do proper write
ordering/merging (usually DOS, and maybe Windows), and for HDDs whose
buggy firmware cannot use async queueing, in which case write caches
may improve performance a lot. But usually, you want to keep that
setting off. That becomes even more important when you use bcache in
writeback mode (because HDD write caching may then break assumptions
of bcache).
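
Before changing the setting, it can help to see what the kernel
currently believes about a drive's write cache. A minimal sketch,
assuming the `queue/write_cache` sysfs attribute (which reports
`write back` or `write through`); the function name and the SYSFS_ROOT
override are hypothetical, and reading this file changes nothing on
the drive:

```shell
# Root of sysfs; override only for a dry run against a scratch directory.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Succeeds if the kernel sees a volatile (write-back) cache on block
# device $1 (e.g. "sda"); fails if it reports write-through.
has_volatile_write_cache() {
    mode=$(cat "$SYSFS_ROOT/block/$1/queue/write_cache") || return 2
    [ "$mode" = "write back" ]
}

# Usage: list drives that would be candidates for `hdparm -W0`:
#   for d in sda sdb; do
#       has_volatile_write_cache "$d" && echo "/dev/$d: write cache on"
#   done
```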

> I've checked and my cache device has a block size of 512 bytes.

Yep, all my bcache systems using 512 bytes are affected by that 5.15.2
kernel bug. Use 4k and you should be okay. The problem seems to come
from page-unaligned writes - and using 4k (the page size of your CPU)
seems to work around that. Kernel 5.15.3 has most of the fix;
another fix is queued for one of the next releases. Another lesson
learned: Don't use a new kernel until it's in its x.y.{4,5,6}
releases. This is not the first time I had catastrophic events with
kernels in their infancy. That's why I usually avoid .0 and .1
kernels. Seems I should add .2 and .3 kernels to that list, too. Never
do a major kernel upgrade without creating a full backup first. Kernel
components like bcache are much less well-tested than other
components, so they likely break on early kernel releases for some
exotic use-cases (exotic because nobody who cares about their data
uses writeback).
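
The "wait for x.y.4" rule of thumb can be checked mechanically. A
sketch only: the threshold of 4 is taken from the paragraph above, the
function names are hypothetical, and the parsing assumes a release
string of the usual `major.minor.patch-suffix` shape as printed by
`uname -r`:

```shell
# Print the patch level (third number) of a kernel release string such
# as "5.15.2-arch1-1"; prints 0 when there is none (e.g. "5.15").
kernel_patch_level() {
    patch=$(printf '%s\n' "$1" |
        sed -n 's/^[0-9]*\.[0-9]*\.\([0-9]*\).*/\1/p')
    echo "${patch:-0}"
}

# Succeeds when the release is at least x.y.4.
kernel_is_settled() {
    [ "$(kernel_patch_level "$1")" -ge 4 ]
}

# Usage:
#   kernel_is_settled "$(uname -r)" || echo "early point release, back up first"
```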

> That's
> a strange value, as the backing device is a AF HDD (like all of them
> in the past decade or more), so the block size should be 4Kb.
> I guess this also works until it doesn't.

You won't have catastrophic events with writearound - and that's as
good as writeback on btrfs (and even better because it won't destroy
the filesystem in case of a cache hiccup). Bcache can break for any
reason, due to bugs, like any other kernel component. And bcache in
writeback mode usually means catastrophic results for ANY file system
attached to it - where btrfs is just much more likely to detect those
events. Even if you COULD repair the file system logical structure, it
still means some data wasn't written - btrfs just has a much better
understanding about what should be on the disk while other filesystems
silently accept the data loss after recovering from structural errors.
BTW: 4k should be safe, there's another problem in bcache unrelated to
this which still needs fixing.

> Can I destroy and recreate the cache device on a live system (my root
> filesystem is on this bcache set). I guess I can't.

Yes, you can. Detaching the cache makes the backing devices pass
through, they are still available as /dev/bcache* even with no caching
device.

> This is probably what I've done wrong today - I did
> not unregister the whole cset before attempting to recreate the cache
> device.

Okay, unregistering is essential, but you don't need to
reboot. Also, I recommend using a new cset UUID so it cannot conflict
with any stale data that MAY be stored in the cache.

> I am honestly a little afraid to touch it, after what happened.

Well, the cache backend is stopped or detached - it doesn't matter
anyways. Just don't use writeback for the next couple of kernel
releases (or maybe rather avoid it for the future completely).
Writeback really doesn't gain you a lot on btrfs because, due to COW,
btrfs is already quite good at writing (writes are usually going
to be sequential anyway), and it has become a lot better during the
last few kernel release cycles. I've been using writeback for a long
time now, but this is just another reason why I should have been
using writearound instead of writeback (the other one being that
sometimes on boot, my SSD detaches from the bus, making bcache throw
away all writeback data and leaving me with a destroyed filesystem).

> I hope Bcachefs will eliminate these problems and provide a stable
> unified solution.

You'd be swapping one "experimental" FS (btrfs), which has matured a
great deal over at least the last 5 years, for another experimental
filesystem which is not yet battle-tested and performance-tuned.
bcachefs and bcache are two completely distinct products with
different use cases; they only share a similar name because the
fundamental inner structures are based on the same code and idea (and
probably because the author thought it's cool).

I'm not sure if you use device pooling with btrfs (multiple disks), but
on my system it proved useful NOT to use RAID-0 for btrfs data: it's
actually slower in normal desktop use, given the way btrfs internally
distributes data access across devices. I found that using single-data
mode even with multiple disks has better write behavior and better read
latency, and it makes better use of bcache. So maybe it's worth a try
if you fear that using writearound mode could degrade your system
responsiveness too much.

> Take care
> - unfa

Good luck
Kai



* Re: Bcache is not caching anything. cache state=inconsistent, how to clear?
  2021-11-24  5:35       ` Kai Krakow
@ 2021-11-24 12:41         ` Tobiasz Karoń
  2021-11-24 13:24           ` Kai Krakow
  0 siblings, 1 reply; 9+ messages in thread
From: Tobiasz Karoń @ 2021-11-24 12:41 UTC
  To: Kai Krakow; +Cc: linux-bcache

On Wed, 24 Nov 2021 at 06:36, Kai Krakow <kai@kaishome.de> wrote:
>
> Hello!
>
> On Tue, 23 Nov 2021 at 23:34, Tobiasz Karoń <unfa00@gmail.com> wrote:
> >
> > Thank you for your detailed reply and sharing your experience and solution.
> >
> > So it seems Bcache and Btrfs are fundamentally incompatible when it
> > comes to caching writes? It has worked fine for 2 months, and then it
> > just imploded. I'll stay in writearound mode to be safe.
>
> No, they are not fundamentally incompatible, but losing writeback data
> on btrfs is a much more visible catastrophic event than on other
> filesystems (which write data in place, whereas btrfs writes COW).

My issue with Btrfs is that it seems to become trashed very easily. I
would expect a COW filesystem to be much more resilient to various
errors. It seems to me that sometimes a single bad sector can make the
filesystem unmountable and unrecoverable. Maybe I am just not handling
such events properly. I've definitely made mistakes in the past
(sometimes due to not having enough spares to take images before
messing around - not gonna do that again).
>
> With other filesystems, too, bcache destroying itself in writeback
> mode would severely damage your filesystem (on a classical filesystem,
> you usually end up with garbled files containing partially old and
> partially new data, and maybe some fixable metadata errors) - BUT: it
> is still a catastrophic event, maybe even more so because the data
> loss can go unnoticed and end up in your backups, only for you to find
> out later that you're missing data that has already been rotated out
> of the backup.
>
> Don't use writeback if you cannot afford to recover from backup when
> writeback fails. That's a property of how caching works, not a
> property of btrfs or bcache. It's the same for any writeback cache you
> might be using: RAID controllers come with writeback caches and
> sometimes decide to throw their contents away, leaving you with
> destroyed filesystems (so you usually turn that off unless your
> workload requires it and you can afford to throw lost data away). That
> doesn't make them fundamentally incompatible with filesystems, right?
> Your HDD comes with a write cache which may destroy your filesystem,
> too, on power loss. You might want to turn that off, especially when
> using btrfs (but also for better write latency behavior, and the
> kernel has better IO scheduling anyway than the really small write
> caches of HDDs): `hdparm -W0 /dev/HDDDEV`. HDD write caches are only
> useful for operating systems that do not do proper write
> ordering/merging (usually DOS, and maybe Windows), and for HDDs whose
> buggy firmware cannot use async queueing, in which case write caches
> may improve performance a lot. But usually, you want to keep that
> setting off. That becomes even more important when you use bcache in
> writeback mode (because HDD write caching may then break assumptions
> of bcache).

I've found out that hard drives I am using have a firmware bug that
can corrupt data when using write cache:
https://www.reddit.com/r/linux/comments/c59nry/btrfs_vs_write_caching_firmware_bugs_tldr_some/es1krq2/

I'm going to disable write cache on all of these drives. This could
explain some spontaneous collapses of Btrfs and Bcache on my system in
the past. But again: I'd expect a COW filesystem to be able to recover
from incomplete writes. I've been using Btrfs for about 3-4 years now.
Maybe I just don't know how to handle issues...
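
A minimal sketch of how the kernel's view of each drive's write cache
could be checked before and after running `hdparm -W0`. It assumes the
kernel exposes the `queue/write_cache` sysfs attribute (which reports
`write back` or `write through`). The function name and the SYSFS_ROOT
override are made up for illustration and are not part of any tool
mentioned in this thread.

```shell
# Print the kernel's view of the write cache for each given disk.
# Assumption: /sys/block/<dev>/queue/write_cache exists on this kernel.
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a fake directory tree instead of the real /sys.
show_write_cache() {
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$@"; do
        f="$root/block/$dev/queue/write_cache"
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$dev" "$(cat "$f")"
        else
            printf '%s: unknown\n' "$dev"
        fi
    done
}

# Example: show_write_cache sda sdb
# To switch the drive's cache off, the thread uses: hdparm -W0 /dev/HDDDEV
```

The function only reads state, so it is safe to run on a live system.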

I wonder if there's an option for me to update the firmware on my
existing drives without booting into Windows.
It seems that *some* HDD manufacturers have easy tools for Linux to do
that, but I don't know what they are, as that was redacted:
https://forum.corsair.com/forums/topic/77369-flashing-firmware-with-linux-hdparm-command/

I see that hdparm has an option called --fwdownload, though I'd
certainly not try that without being absolutely sure it'll work.

>
> > I've checked and my cache device has a block size of 512 bytes.
>
> Yep, all my bcache systems using 512 bytes are affected by that 5.15.2
> kernel bug. Use 4k and you should be okay. The problem seems to come
> from page-unaligned writes - and using 4k (the page size of your CPU)
> seems to work around that. Kernel 5.15.3 has the most part of the fix,
> another fix is queued for one of the next releases. Another lesson
> learned: Don't use a new kernel until it's in its x.y.{4,5,6}
> releases. This is not the first time I had catastrophic events with
> kernels in their infancy. That's why I usually avoid .0 and .1
> kernels. Seems I should add .2 and .3 kernels to that list, too. Never
> do a major kernel upgrade without creating a full backup first. Kernel
> components like bcache are much less well-tested than other
> components, so they likely break on early kernel releases for some
> exotic use-cases (exotic because nobody who cares about their data
> uses writeback).
I'm at kernel 5.15.3 right now. I think Arch Linux ships kernel
updates after they reach .3. The 5.15 came out like 2 weeks ago.

>
> > That's
> > a strange value, as the backing device is a AF HDD (like all of them
> > in the past decade or more), so the block size should be 4Kb.
> > I guess this also works until it doesn't.
>
> You won't have catastrophic events with writearound - and that's as
> good as writeback on btrfs (and even better because it won't destroy
> the filesystem in case of a cache hiccup). Bcache can break for any
> reason, due to bugs, like any other kernel component. And bcache in
> writeback mode usually means catastrophic results for ANY file system
> attached to it - where btrfs is just much more likely to detect those
> events. Even if you COULD repair the file system logical structure, it
> still means some data wasn't written - btrfs just has a much better
> understanding about what should be on the disk while other filesystems
> silently accept the data loss after recovering from structural errors.
> BTW: 4k should be safe, there's another problem in bcache unrelated to
> this which still needs fixing.
>
> > Can I destroy and recreate the cache device on a live system (my root
> > filesystem is on this bcache set). I guess I can't.
>
> Yes, you can. Detaching the cache makes the backing devices pass
> through, they are still available as /dev/bcache* even with no caching
> device.
>
> > This is probably what I've done wrong today - I did
> > not unregister the whole cset before attempting to recreate the cache
> > device.
>
> Okay, unregistering should be quite essential but you don't need to
> reboot. Also, I recommend using a new cset UUID so it cannot conflict
> with any stale data that MAY be stored in the cache.
Yeah, I reused the existing cset UUID. That has probably caused bcache
to write garbage and corrupt the cache...
>
> > I am honestly a little afraid to touch it, after what happened.
>
> Well, the cache backend is stopped or detached - it doesn't matter
> anyways. Just don't use writeback for the next couple of kernel
> releases (or maybe rather avoid it for the future completely).
> Writeback really doesn't gain you a lot on btrfs because due to COW,
> btrfs is already quite good writing (because writes are usually going
> to be sequential anyways), and it has become a lot better during the
> last few kernel release cycles. I've been using writeback for a long
> time now but this is just another occasion why I should not have been
> using writeback but writearound instead (the other one being that
> sometimes on boot, my SSD detaches from the bus, making bcache throw
> away all writeback data and leaving me with a destroyed filesystem).

Ok, I've booted into a live ISO and recreated the cache with 4K
blocks. I hope it's gonna spare me some adventures in the future.

>
> > I hope Bcachefs will eliminate these problems and provide a stable
> > unified solution.
>
> You're swapping one "experimental" FS (btrfs) which has matured great
> ways during at least the last 5 years with another experimental
> filesystem which is not yet battle-tested and performance-tuned.
> bcachefs and bcache are two completely distinctive products with
> different use-cases, they only share a similar name because the
> fundamental inner structures are based on the same code and idea (and
> probably because the author thought it's cool).
Yeah, honestly I wish he renamed Bcachefs to something shorter.
Anyway - I'm not gonna use it until it reaches mainline kernel, and
then still only for experiments, not for production.

>
> I'm not sure if you use device pooling with btrfs (multiple disks) but
> for my system, it showed useful to NOT use RAID-0 for btrfs data, it's
> actually slower in normal desktop use and the way how btrfs internally
> distributes data access across devices. I found that using single-data
> mode even with multidisk has better write behavior and better read
> latency, and it makes better use of bcache. So maybe its worth a try
> if you fear that using writearound mode could degrade your system
> responsiveness too much.
I am not using multiple devices in a single Btrfs filesystem at the moment.
I assumed using 2 drives in RAID1 would double the read speed (on
large files) since the extents can be read from two disks at once.
It's strange that it doesn't work like that...

>
> > Take care
> > - unfa
>
> Good luck
> Kai

Thank you so much for your insight!
That's all invaluable information you're sharing.

I hope these messages are going to be available publicly in some
mailing list archive for future reference when I inevitably encounter
the same problems in 5 years after I forgot what it was all about...

Thank you!
- unfa

>
>
> > On Tue, 23 Nov 2021 at 18:40, Kai Krakow <kai@kaishome.de> wrote:
> > >
> > > Oops:
> > >
> > > > # echo 1 >/sys/fs/bcache/CSETUUID/unregister
> > > > # bcache make -C -w 4096 -l LABEL --force /dev/BPART
> > >
> > > CPART of course!
> > >
> > > # bcache make -C -w 4096 -l LABEL --force /dev/CPART
> > >
> > > Bye
> > > Kai
> >
> >
> >
> > --
> > - Tobiasz 'unfa' Karoń
> >
> > www.youtube.com/unfa000



-- 
- Tobiasz 'unfa' Karoń

www.youtube.com/unfa000

* Re: Bcache is not caching anything. cache state=inconsistent, how to clear?
  2021-11-24 12:41         ` Tobiasz Karoń
@ 2021-11-24 13:24           ` Kai Krakow
  0 siblings, 0 replies; 9+ messages in thread
From: Kai Krakow @ 2021-11-24 13:24 UTC (permalink / raw)
  To: Tobiasz Karoń; +Cc: linux-bcache

Hey there!

On Wed, 24 Nov 2021 at 13:41, Tobiasz Karoń <unfa00@gmail.com> wrote:
>
> On Wed, 24 Nov 2021 at 06:36, Kai Krakow <kai@kaishome.de> wrote:
> >
> > Hello!
> >
> > On Tue, 23 Nov 2021 at 23:34, Tobiasz Karoń <unfa00@gmail.com> wrote:
> > >
> > > Thank you for your detailed reply and sharing your experience and solution.
> > >
> > > So it seems Bcache and Btrfs are fundamentally incompatible when it
> > > comes to caching writes? It has worked fine for 2 months, and then it
> > > just imploded. I'll stay in writearound mode to be safe.
> >
> > No, they are not fundamentally incompatible but losing writeback data
> > on btrfs is much more a visible catastrophic event than to other file
> > systems (which write data in-place when btrfs writes cow).
> My issue with Btrfs is - it seems to become trashed very easily. I
> would expect a COW filesystem to be much more resilient to various
> errors. It seems to me that sometimes a single bad sector can make the
> filesystem unmountable and unrecoverable.

This should not happen, at least not when you're using dup or raid-1
for metadata. You should really use dup metadata on a single drive
(and probably you do because it's default).

OTOH, when your HDD has write caching enabled it MAY decouple reporting
of IO errors from actual write errors (write-behind mode). That is, if
the drive reports data as written as soon as it hits the cache, and the
data later cannot be written, the filesystem is already broken before
it could take notice. That's one failure mode, but it should not
exist with modern drives that support write barriers (i.e., the
filesystem waits for cached writes to complete successfully). But
firmware bugs may apply here and break that assumption.

> Maybe I am just not handling
> such events properly I've definitely made mistakes in the past
> (sometimes due to not enough spares to do images before messing around
> - not gonna do that again).
> >
> > Even with other filesystems and bcache destroying itself in writeback
> > mode would cause severe damage of your filesystem (on classical
> > filesystem, usually you end up with garbled files having partially old
> > and new data, maybe some fixable metadata errors) - BUT: it is still a
> > catastrophic event, maybe even more so because data loss could go
> > silent, ending up in your backups, only to find later that you're
> > missing data that has already been rotated out of the backup.
> >
> > Don't use writeback if you cannot afford to recover from backup when
> > writeback fails. That's a property of how caching works, not a
> > property of btrfs or bcache. It's the same for any writeback cache you
> > might be using: RAID-controllers come with writeback caches, and
> > decide to throw it away sometimes, leaving you with destroyed
> > filesystems, so you usually turn that off unless your workload
> > requires it and you can afford to throw lost data away). That doesn't
> > make them fundamentally incompatible with filesystems, right? Your HDD
> > comes with write caches which may destroy your filesystem, too, on
> > power-loss. You might want to turn that off, especially when using
> > btrfs (but also for better write latency behavior, and the kernel has
> > better IO scheduling anyways than the really small writecaches of
> > HDDs): `hdparm -W0 /dev/HDDDEV`. HDD write caches are only useful for
> > operating systems that do no proper write ordering/merging (usually
> > DOS, and maybe Windows), and sometimes HDD firmwares are buggy and
> > cannot use async queueing, when write caches may improve performance a
> > lot. But usually, you want to keep that setting off. That becomes even
> > more important when you use bcache in writeback mode (because HDD
> > write caching may then break assumptions of bcache).
>
> I've found out that hard drives I am using have a firmware bug that
> can corrupt data when using write cache:
> https://www.reddit.com/r/linux/comments/c59nry/btrfs_vs_write_caching_firmware_bugs_tldr_some/es1krq2/
>
> I'm going to disable write cache on all of these drives.

You probably should do that anyways if your drive properly supports
NCQ with a queue depth greater than maybe 30? The thing is, there are
drives with a queue depth of only 1 or 2, and those will be slow
without write caching.
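
A rough sketch of how that queue depth could be checked. It assumes a
SATA/SCSI disk that exposes `device/queue_depth` in sysfs. The function
names, the SYSFS_ROOT override and the default threshold of 30 (taken
from the "maybe 30?" above) are illustrative only.

```shell
# Print the queue depth the kernel reports for a disk, e.g. "32".
# Assumption: /sys/block/<dev>/device/queue_depth exists for this disk.
# SYSFS_ROOT is a hypothetical override for trying this on a fake tree.
queue_depth() {
    root="${SYSFS_ROOT:-/sys}"
    f="$root/block/$1/device/queue_depth"
    [ -r "$f" ] && cat "$f"
}

# Succeed if the disk's queue depth is above the threshold (default 30).
# Returns 2 if the depth could not be read at all.
ncq_ok() {
    depth=$(queue_depth "$1") || return 2
    [ "$depth" -gt "${2:-30}" ]
}

# Example: ncq_ok sda && echo "sda: deep queue, write cache likely not needed"
```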

Write caching in the drive without BBU isn't a stable operation mode:
When you lose power, you've lost data.

> This could
> explain some spontaneous collapses of Btrfs and Bcache on my system in
> the past.

Yes, definitely!

> But again: I'd expect a COW filesystem to be able to recover
> from incomplete writes. I've been using Btrfs for about 3-4 years now.
> Maybe I just don't know how to handle issues...

No, this is probably an incomplete understanding of how btrfs
operates: btrfs CAN and DOES fix single write errors with no problem at
all, that is, it detects bad writes. It can also handle
incomplete writes AS LONG AS the missing writes are at the end of the
transaction. BUT for btrfs to provide that stability it REALLY needs
the storage to preserve the order of transactions.

bcache does not do this: With writeback caching, it completely ignores
the order of transactions during writeback. It still upholds ordering
ABOVE it (where btrfs lives): if it reports data as persisted, the data
is persisted in the cache, and it guarantees the data will eventually be
written back to the backend storage. But BELOW it (where the backend
storage lives) it writes dirty data completely out of transactional
order; instead it reorders written data by head position.
This is where the speed gain comes from.

But this also means things break in a horrible way if dirty writeback
data is lost: Suddenly, the transactional order ABOVE bcache is broken
because potentially much older data hasn't been persisted yet while
newer data has been (like travelling back in time and reverting
changed data as if it was never written; if you watched
"Back to the Future" you can imagine the consequences). This
completely destroys the internal block trees of btrfs, which depend on
older data being written before newer data.
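
To get a feeling for how much is at stake at any moment, the dirty data
per backing device can be read from sysfs. This is only a sketch: it
assumes the `state`, `cache_mode` and `dirty_data` attributes under
/sys/block/bcache*/bcache, and the function name and SYSFS_ROOT override
are invented for illustration.

```shell
# Print state, active cache mode and dirty data for each bcache device.
# Assumption: attributes live under /sys/block/bcacheN/bcache/.
# SYSFS_ROOT is a hypothetical override for trying this on a fake tree.
bcache_dirty_report() {
    root="${SYSFS_ROOT:-/sys}"
    for d in "$root"/block/bcache*/bcache; do
        [ -d "$d" ] || continue
        name=$(basename "$(dirname "$d")")
        # cache_mode lists all modes and marks the active one in [brackets]
        mode=$(sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$d/cache_mode")
        printf '%s state=%s mode=%s dirty=%s\n' \
            "$name" "$(cat "$d/state")" "$mode" "$(cat "$d/dirty_data")"
    done
}

# Example: bcache_dirty_report
```

Anything reported as dirty exists only on the cache device, so that is
the data that would go missing out of order if the cache were lost.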

But yes, as a btree filesystem using COW, btrfs is especially
sensitive to such problems, while other filesystems would simply see
overwritten data partially coming back from time travel mode, or some
broken metadata which eventually resolves to blocks that still
exist, usually resulting in cross-linked files. Often, this is easy
to repair but you end up with "repaired" files having either no
content at all (xfs does that), or the same content (fat does this),
or old content (ext4 may do this due to how its journal works). But
however the content changes, it changes partially, thus usually
breaking applications that stored the data (e.g., databases).
Essentially, it's the same catastrophic problem, you just don't notice
it then.

> I wonder if there's an option fro me to update the firmware on my
> existing drives without booting into Windows.
> it seems that *some* HDD manufacturers have easy tools for Linux to do
> that, but I don't know what they are, as that was redacted:
> https://forum.corsair.com/forums/topic/77369-flashing-firmware-with-linux-hdparm-command/

Most such vendors usually have a Windows-only solution, or a boot CD
(which is usually based on FreeDOS or Linux). You may look into the
enterprise downloads, those usually have Linux tools.

> I see that hdparm has an option called --fwdownload, thought  I'd
> certainly not try that without being absolutely sure it'll work.

I wouldn't... Last time I updated my Samsung or Crucial disk, I
extracted the update tool from the boot CD ISO which I could download
from the firmware update website and used that.

> > > I've checked and my cache device has a block size of 512 bytes.
> >
> > Yep, all my bcache systems using 512 bytes are affected by that 5.15.2
> > kernel bug. Use 4k and you should be okay. The problem seems to come
> > from page-unaligned writes - and using 4k (the page size of your CPU)
> > seems to work around that. Kernel 5.15.3 has the most part of the fix,
> > another fix is queued for one of the next releases. Another lesson
> > learned: Don't use a new kernel until it's in its x.y.{4,5,6}
> > releases. This is not the first time I had catastrophic events with
> > kernels in their infancy. That's why I usually avoid .0 and .1
> > kernels. Seems I should add .2 and .3 kernels to that list, too. Never
> > do a major kernel upgrade without creating a full backup first. Kernel
> > components like bcache are much less well-tested than other
> > components, so they likely break on early kernel releases for some
> > exotic use-cases (exotic because nobody who cares about their data
> > uses writeback).
> I'm at kernel 5.15.3 right now. I think Arch Linux ships kernel
> updates after they reach .3. The 5.15 came out like 2 weeks ago.

In your YT video, it shows 5.15.2...
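
A quick way to compare the running kernel against 5.15.3, as a sketch
only. It relies on `sort -V` for version ordering, and the helper name
is made up. Passing the check says nothing more than "at least 5.15.3",
which per the above carries most of the fix, not all of it.

```shell
# Succeed if version $1 is at least version $2 (uses sort -V ordering).
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example: strip the distro suffix from uname -r, then compare.
# version_ge "$(uname -r | cut -d- -f1)" 5.15.3 && echo "5.15.3 or newer"
```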

> > > That's
> > > a strange value, as the backing device is a AF HDD (like all of them
> > > in the past decade or more), so the block size should be 4Kb.
> > > I guess this also works until it doesn't.
> >
> > You won't have catastrophic events with writearound - and that's as
> > good as writeback on btrfs (and even better because it won't destroy
> > the filesystem in case of a cache hiccup). Bcache can break for any
> > reason, due to bugs, like any other kernel component. And bcache in
> > writeback mode usually means catastrophic results for ANY file system
> > attached to it - where btrfs is just much more likely to detect those
> > events. Even if you COULD repair the file system logical structure, it
> > still means some data wasn't written - btrfs just has a much better
> > understanding about what should be on the disk while other filesystems
> > silently accept the data loss after recovering from structural errors.
> > BTW: 4k should be safe, there's another problem in bcache unrelated to
> > this which still needs fixing.
> >
> > > Can I destroy and recreate the cache device on a live system (my root
> > > filesystem is on this bcache set). I guess I can't.
> >
> > Yes, you can. Detaching the cache makes the backing devices pass
> > through, they are still available as /dev/bcache* even with no caching
> > device.
> >
> > > This is probably what I've done wrong today - I did
> > > not unregister the whole cset before attempting to recreate the cache
> > > device.
> >
> > Okay, unregistering should be quite essential but you don't need to
> > reboot. Also, I recommend using a new cset UUID so it cannot conflict
> > with any stale data that MAY be stored in the cache.
> Yeah, I used existing cset UUID. That has probably caused bcache to
> write garbage and corrupt the cache...

Probably not but you never know. Coly may have some insight here. It
usually should be safe to reuse the same UUID after re-formatting the
cache - and it should never ever reach a point to use some stale data.
But if something triggers a bug, there's extra safety now that UUIDs
don't become confused between new and old data.

> > > I am honestly a little afraid to touch it, after what happened.
> >
> > Well, the cache backend is stopped or detached - it doesn't matter
> > anyways. Just don't use writeback for the next couple of kernel
> > releases (or maybe rather avoid it for the future completely).
> > Writeback really doesn't gain you a lot on btrfs because due to COW,
> > btrfs is already quite good writing (because writes are usually going
> > to be sequential anyways), and it has become a lot better during the
> > last few kernel release cycles. I've been using writeback for a long
> > time now but this is just another occasion why I should not have been
> > using writeback but writearound instead (the other one being that
> > sometimes on boot, my SSD detaches from the bus, making bcache throw
> > away all writeback data and leaving me with a destroyed filesystem).
>
> Ok, I've booted into a live ISO and recreated the cache with 4K
> blocks. I hope it's gonna spare me some adventures in the future.

It has been working for me for a few days now under some heavy
workloads - but in writearound mode... I won't try writeback for a
while now.

> > > I hope Bcachefs will eliminate these problems and provide a stable
> > > unified solution.
> >
> > You're swapping one "experimental" FS (btrfs) which has matured great
> > ways during at least the last 5 years with another experimental
> > filesystem which is not yet battle-tested and performance-tuned.
> > bcachefs and bcache are two completely distinctive products with
> > different use-cases, they only share a similar name because the
> > fundamental inner structures are based on the same code and idea (and
> > probably because the author thought it's cool).
> Yeah, honestly I wish he renamed Bcachefs to something shorter.
> Anyway - I'm not gonna use it until it reaches mainline kernel, and
> then still only for experiments, not for production.

Rule of thumb: It usually takes a new filesystem around 10 years to
reach fully stable operation even for corner cases after being
released to the wild. Btrfs is older than that by now, meaning it
should be rock-solid. Bcachefs will still have to walk that route. I
think you can give it a first production ride after 3 years (not
without backups, tho).

> > I'm not sure if you use device pooling with btrfs (multiple disks) but
> > for my system, it showed useful to NOT use RAID-0 for btrfs data, it's
> > actually slower in normal desktop use and the way how btrfs internally
> > distributes data access across devices. I found that using single-data
> > mode even with multidisk has better write behavior and better read
> > latency, and it makes better use of bcache. So maybe its worth a try
> > if you fear that using writearound mode could degrade your system
> > responsiveness too much.
> I am not using multiple devices in a single Btrfs filesystem at the moment.
> I assumed using 2 drives in RAID1 would double the read speed (on
> large files) since the extents can be read from two disks at once.
> It's strange that it doesn't work like that...

That's due to how btrfs spreads RAID-1 reads across devices: It's
PID-based: every even PID reads one mirror, every odd PID reads the
other - that simple. It usually works well enough and avoids some
pitfalls with alternating between devices too often. Even real
hardware RAID has a stripe size, and you only read from one stripe at
a time, you'll never read from stripes in parallel. Reads become
parallel, when either enough readers read the disks at alternating
stripe sets, or the reads become big enough to span stripe sets.
Something similar will happen for btrfs in the first case but probably
not the latter. In any case, there's never ever a doubling of read
speed; at most you can see a doubling of readers without compromising
speed. That also means that RAID usually makes no sense for desktop
use (there's almost always just one application accessing data at a
time, which makes it difficult to engage all spindles in parallel).
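
The even/odd PID rule can be illustrated with a tiny sketch. This is
not btrfs code, just the arithmetic described above, and the function
name is invented for the example.

```shell
# Which mirror index a process would read from under a PID-modulo rule.
# $1 = PID, $2 = number of mirrors (defaults to 2 for RAID-1).
mirror_for_pid() {
    echo $(( $1 % ${2:-2} ))
}

# Example: mirror_for_pid $$    # mirror used by the current shell
```

A single reader keeps the same PID for its whole lifetime, so under this
rule it always lands on the same mirror, which is why one big sequential
read does not get faster.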

Writes have to go to all spindles in parallel anyways. That's why
single-spindle data for my multi-disk btrfs is faster for me: It has a
higher chance of decoupling readers from writers.

It's not very different from how hardware RAID works, it has about the
same properties. But hardware RAID may do more intelligent things than
PID-based stripe rotation (similar to how kernel mdraid does it with
rotation based on queue length and avg device response time).


> > > Take care
> > > - unfa
> >
> > Good luck
> > Kai
>
> Thank you so much for your insight!
> That's all invaluable information you're sharing.
>
> I hope these messages are going to be available publicly in some
> mailing list archive for future reference when I inevitably encounter
> the same problems in 5 years after I forgot what it was all about...
>
> Thank you!
> - unfa
>
> >
> >
> > > On Tue, 23 Nov 2021 at 18:40, Kai Krakow <kai@kaishome.de> wrote:
> > > >
> > > > Oops:
> > > >
> > > > > # echo 1 >/sys/fs/bcache/CSETUUID/unregister
> > > > > # bcache make -C -w 4096 -l LABEL --force /dev/BPART
> > > >
> > > > CPART of course!
> > > >
> > > > # bcache make -C -w 4096 -l LABEL --force /dev/CPART
> > > >
> > > > Bye
> > > > Kai
> > >
> > >
> > >
> > > --
> > > - Tobiasz 'unfa' Karoń
> > >
> > > www.youtube.com/unfa000
>
>
>
> --
> - Tobiasz 'unfa' Karoń
>
> www.youtube.com/unfa000

* Re: Bcache is not caching anything. cache state=inconsistent, how to clear?
  2021-11-23 17:37 ` Kai Krakow
  2021-11-23 17:40   ` Kai Krakow
@ 2022-01-06  3:00   ` Eric Wheeler
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Wheeler @ 2022-01-06  3:00 UTC (permalink / raw)
  To: Kai Krakow; +Cc: Tobiasz Karoń, linux-bcache

On Tue, 23 Nov 2021, Kai Krakow wrote:

> Hello Tobiasz!
> 
> On Tue, 23 Nov 2021 at 15:48, Tobiasz Karoń <unfa00@gmail.com> wrote:
> >
> > Hi!
> >
> > TL;DR
> >
> > My cache is inconsistent, and that's probably preventing Bcache for m
> > using it (all I/O goes to the backing device). How can I clear that?
> 
> I've had a similar problem after bcache crashed due to a bug in the
> latest kernel.
> 
> I could resolve it by the following steps (I think you figure out what
> the PLACEHOLDERS mean):
> 
> For each backend device, set the cache_mode to none and detach it:
> 
> # echo none >/sys/block/BDEV/BPART/bcache/cache_mode
> # echo 1 >/sys/block/BDEV/BPART/bcache/detach
> 
> Unregister the cache and re-create it (4096 works around the kernel
> bug, also, it's potentially broken, so re-create):
> 
> # echo 1 >/sys/fs/bcache/CSETUUID/unregister
> # bcache make -C -w 4096 -l LABEL --force /dev/CPART

I didn't know the cache could be formatted with -w 4096.  Isn't that for
the bdev?  If not, then beware of the 4Kn bcache bug that is floating
around.  Not sure if -w 4096 on a cache device would hit that or not...
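
For checking which block size a device was actually formatted with, the
`dev.sectors_per_block` line from `bcache-super-show` (quoted further
down) can be converted to bytes. A small sketch, assuming 512-byte
sectors, which matches the 0.50KiB block size reported for
sectors_per_block 1 in this thread; the helper name is made up.

```shell
# Read bcache-super-show output on stdin, print the block size in bytes.
# Assumption: one sector is 512 bytes.
sb_block_bytes() {
    awk '$1 == "dev.sectors_per_block" { print $2 * 512 }'
}

# Example: bcache-super-show /dev/sdd3 | sb_block_bytes
```

With the superblocks quoted below, the cache device (sectors_per_block
1) comes out at 512 bytes and the backing device (sectors_per_block 8)
at 4096 bytes.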

-Eric

> 
> Re-attach the devices and set cache mode:
> 
> # echo NEW_CSETUUID >/sys/block/BDEV/BPART/bcache/attach
> # echo writearound >/sys/block/BDEV/BPART/bcache/cache_mode
> 
> I'm explicitly using writearound for btrfs because:
> 
> * writethrough would write data potentially relocated by COW
> * writeback potentially destroys btrfs on unexpected bcache failures
> * the performance difference between writeback and writearound for
> btrfs is virtually non-existent
> 
> However, writearound will cache only reads, that means boot-time
> improvements will lag one boot behind: During the first boot, bcache
> will read btrfs and cache the reads, on the next boot, it will read
> the cached data. Using writethrough could work around that but that's
> not really useful with a COW filesystem because btrfs relocated
> extents on each and every tiny write - making any cached data stale
> and thus occupy bcache space for no reason. So it will also amplify
> writes to the SSD for no real reason.
> 
> Youtube:
> 
> The problem you see and documented is exactly what happened to me (but
> on Gentoo: system froze, reboot hung, rescue disk said: cache disabled
> with a similar message), and you can work around it by using blocksize
> 4096 - and in any case it still happens: Do NOT use writeback caching,
> use writearound as mentioned above, then at least it won't destroy
> btrfs and it's a matter of re-creating the cache as outlined above.
> 
> HTH
> Kai
> 
> 
> > Details:
> >
> > I've been using Bcache for the past few months on my root Btrfs
> > filesystem with success.
> > Then one day out of the blue Bcache failed and took my Btrfs
> > filesystem with it (details:
> > https://www.youtube.com/watch?v=Hf3zr6CxvmI, looks similar to this:
> > https://stackoverflow.com/questions/22820492/how-to-revert-bcache-device-to-regular-device).
> > That's not the topic of my message though.
> > I've done a clean Arch Linux installation on Bcache + Btrfs once again
> > using an SSD partition for cache and an HDD as the backing device.
> >
> > However, this time it doesn't do anything...
> > I was unable to find any information online to solve this.
> >
> > My Bcache device works fine, the system boots off of it. However all
> > I/O goes straight to the backing HDD, and the SSD is unused. Needless
> > to say this means the performance is not what I got used to when
> > Bcache was working fine.
> >
> > Here's what a 3rd party bcache-status script says (it'd be great if
> > bcache-tools would provide something like this, BTW):
> >
> > ❯ bcache-status
> > --- bcache ---
> > Device                      ? (?)
> > UUID                        c9cd8259-3cee-42ff-a8ec-e11193c09b7e
> > Block Size                  0.50KiB
> > Bucket Size                 512.00KiB
> > Congested?                  False
> > Read Congestion             2.0ms
> > Write Congestion            20.0ms
> > Total Cache Size            173.97GiB
> > Total Cache Used            8.70GiB     (5%)
> > Total Cache Unused          165.27GiB   (95%)
> > Dirty Data                  0.50KiB     (0%)
> > Evictable Cache             173.97GiB   (100%)
> > Replacement Policy          [lru] fifo random
> > Cache Mode                  (Unknown)
> > Total Hits                  0
> > Total Misses                0
> > Total Bypass Hits           0
> > Total Bypass Misses         0
> > Total Bypassed              0B
> >
> > The Total Cache Used value has not changed since I've done my initial
> > Arch Linux installation. It seems that Bcache has "turned off" by that
> > point.
> >
> > Here's the bcache supers fro the backing device and cache
> >
> > ❯ bcache-super-show /dev/sda
> > sb.magic                ok
> > sb.first_sector         8 [match]
> > sb.csum                 4E6EACCA74AB0AE5 [match]
> > sb.version              1 [backing device]
> >
> > dev.label               unfa-desktop%20root
> > dev.uuid                49202fdf-fbe5-48fd-bdd8-df5414da817c
> > dev.sectors_per_block   8
> > dev.sectors_per_bucket  1024
> > dev.data.first_sector   16
> > dev.data.cache_mode     0 [writethrough]
> > dev.data.cache_state    3 [inconsistent]
> >
> > cset.uuid               9572380e-8e6f-4ce4-8323-80b98a85eeed
> >
> > ❯ bcache-super-show /dev/sdd3
> > sb.magic                ok
> > sb.first_sector         8 [match]
> > sb.csum                 259C90FD74B4D4BE [match]
> > sb.version              3 [cache device]
> >
> > dev.label               (empty)
> > dev.uuid                95c6449a-03b5-40f2-a8cc-80b1b61c5ef0
> > dev.sectors_per_block   1
> > dev.sectors_per_bucket  1024
> > dev.cache.first_sector  1024
> > dev.cache.cache_sectors 364833792
> > dev.cache.total_sectors 364834816
> > dev.cache.ordered       yes
> > dev.cache.discard       no
> > dev.cache.pos           0
> > dev.cache.replacement   0 [lru]
> >
> > cset.uuid               c9cd8259-3cee-42ff-a8ec-e11193c09b7e
> >
> > BTW - I've now realized I've set a label for the backing device but
> > not for the cache. Maybe this is the reason? I don't think it should
> > work this way, but I've cleared the label on my backing device just to
> > be sure.
> >
> > Hmm. The cache is inconsistent. I had this before I reinstalled my OS.
> > I have recreated the bcache cache on the SSD and was hoping that would
> > solve it.
> > I don't know what I should do about this. Is this the reason why it's
> > not working?
> >
> > I was wondering if wiping the partition and recreating the cache
> > would help, but I don't want to needlessly wear down the SSD if that
> > won't help.
> >
> > Needless to say, I would really like to avoid data loss when using
> > Bcache - it's awesome, and the developer says it's perfectly stable
> > and safe, but I've had a sudden failure and others have had one as well
> > (without seeing any hardware issues that could be causing that). Maybe
> > I should quit using Bcache altogether? Maybe it's not
> > production-ready? I was wondering about maybe using Bcachefs, though
> > the need to compile a custom kernel for it is quite a deterrent. I
> > tried it briefly, but the bcachefs-tools stopped working at some point
> > without a visible reason. I know Btrfs is flawed, though it seems to
> > be the best so far.
> >
> > Thank you for your work,
> > - unfa
> >
> > --
> > - Tobiasz 'unfa' Karoń
> >
> > www.youtube.com/unfa000
> 
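
One detail in the two superblock dumps quoted above is easy to miss by eye:
the cset.uuid recorded on the backing device is not the same as the one on
the cache device. The short Python sketch below only compares the values
copied from the quoted output and recomputes the cache size. It assumes
512-byte sectors, and parse_super is a hypothetical helper written for this
note, not part of bcache-tools. It does not by itself establish why caching
is inactive.

```python
def parse_super(text):
    """Parse 'key value' lines (as printed by bcache-super-show) into a dict."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # key, then the rest of the line
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


# Values copied from the quoted output for /dev/sda (backing device).
BACKING = """\
sb.version              1 [backing device]
dev.data.cache_state    3 [inconsistent]
cset.uuid               9572380e-8e6f-4ce4-8323-80b98a85eeed
"""

# Values copied from the quoted output for /dev/sdd3 (cache device).
CACHE = """\
sb.version              3 [cache device]
dev.cache.cache_sectors 364833792
cset.uuid               c9cd8259-3cee-42ff-a8ec-e11193c09b7e
"""

backing = parse_super(BACKING)
cache = parse_super(CACHE)

# Do both superblocks name the same cache set?
same_set = backing["cset.uuid"] == cache["cset.uuid"]

# Assumption: one sector is 512 bytes.
cache_gib = int(cache["dev.cache.cache_sectors"]) * 512 / 2**30

print("backing cset.uuid:", backing["cset.uuid"])
print("cache   cset.uuid:", cache["cset.uuid"])
print("same cache set:", same_set)
print("cache size: %.2f GiB" % cache_gib)
```

With the quoted values the comparison reports that the UUIDs differ, and
the computed size agrees with the 173.97GiB shown by bcache-status.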

end of thread, other threads:[~2022-01-06  3:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
2021-11-23 14:48 Bcache is not caching anything. cache state=inconsistent, how to clear? Tobiasz Karoń
2021-11-23 17:37 ` Kai Krakow
2021-11-23 17:40   ` Kai Krakow
2021-11-23 22:34     ` Tobiasz Karoń
2021-11-24  5:35       ` Kai Krakow
2021-11-24 12:41         ` Tobiasz Karoń
2021-11-24 13:24           ` Kai Krakow
2022-01-06  3:00   ` Eric Wheeler
2021-11-23 22:24 ` Tobiasz Karoń
