linux-bcache.vger.kernel.org archive mirror
* I/O error on cache device can cause user observable errors
@ 2024-02-01 22:25 Arnaldo Montagner
From: Arnaldo Montagner @ 2024-02-01 22:25 UTC
  To: linux-bcache

The bcache documentation says that errors on the cache device are
handled transparently.

I'm seeing a case where the cache device is unregistered in response
to repeated write errors (expected), but that results in a read error
on the bcache device (unexpected).

Here's how I'm reproducing the problem:
1. Create a device with dm-error to simulate I/O errors. The device is
1G in size and it will fail I/Os in a 4M extent starting at offset
128M:
    $ dmsetup create cache_disk << EOF
      0      262144    linear /dev/sdb 0
      262144 8192      error
      270336 1826816   linear /dev/sdb 270336
    EOF

2. Set up bcache in writethrough mode. The backing device is 1000G in length:
    $ make-bcache --cache /dev/mapper/cache_disk --bdev /dev/sdc --wipe-bcache --bucket 256k
    $ echo writethrough > /sys/block/bcache0/bcache/cache_mode
    $ echo 0 > /sys/block/bcache0/bcache/cache/synchronous

    $ lsblk
    NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    ...
    sdb            8:16   0    10G  0 disk
    └─cache_disk 253:0    0     1G  0 dm
      └─bcache0  252:0    0  1000G  0 disk
    sdc            8:32   0  1000G  0 disk
    └─bcache0    252:0    0  1000G  0 disk

3. Start a random read workload on the bcache device (using fio):
    $ fio --name=basic --filename=/dev/bcache0 --size=1000G --rw=randread --blocksize=256k --blockalign=256k

4. After a while I see that the cache device gets unregistered.
However, the application output indicates it saw an I/O error on a
read request:
    fio: io_u error on file /dev/bcache0: Input/output error: read offset=592264298496, buflen=262144
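For reference, a sketch of the arithmetic behind the setup, with sizes taken from the steps above (the variable names are illustrative only). dm tables count 512-byte sectors, so the 128 MiB / 4 MiB / 892 MiB split of the 1 GiB cache device becomes 262144 / 8192 / 1826816 sectors. The second half shows why a pure read workload ends up writing to the cache device: with 256 KiB random reads over 1000 GiB and only 1 GiB of cache, nearly every read is a miss, and as far as I understand bcache, read misses are inserted into the cache in writethrough mode as well (unless bypassed, e.g. by the sequential cutoff). That would keep bcache allocating buckets until one lands in the failing extent.

```shell
#!/bin/sh
# Setup arithmetic for steps 1-3 (variable names are illustrative).
KIB=1024
MIB=$((1024 * KIB))
GIB=$((1024 * MIB))
SECTOR=512                    # dm tables are expressed in 512-byte sectors

DEV=/dev/sdb
cache=$((1 * GIB))            # size of /dev/mapper/cache_disk
err_off=$((128 * MIB))        # failing extent starts at 128 MiB
err_size=$((4 * MIB))         # and is 4 MiB long

# Step 1: the dm-error table, converted to sectors.
err_start=$((err_off / SECTOR))
err_len=$((err_size / SECTOR))
tail_start=$((err_start + err_len))
tail_len=$((cache / SECTOR - tail_start))
TABLE=$(printf '%s\n' \
    "0 $err_start linear $DEV 0" \
    "$err_start $err_len error" \
    "$tail_start $tail_len linear $DEV $tail_start")
printf '%s\n' "$TABLE"
# Applying it needs root:  printf '%s\n' "$TABLE" | dmsetup create cache_disk

# Steps 2-3: 256 KiB buckets and 256 KiB-aligned random reads.
block=$((256 * KIB))
backing=$((1000 * GIB))
read_slots=$((backing / block))    # distinct aligned offsets fio can choose
cache_chunks=$((cache / block))    # 256 KiB chunks on the raw cache device
bad_chunks=$((err_size / block))   # 256 KiB chunks inside the error extent
echo "read offsets: $read_slots, cache chunks: $cache_chunks, failing chunks: $bad_chunks"
```

The printed table matches the one passed to dmsetup in step 1, and the raw cache device has room for only 4096 chunks against 4096000 possible read offsets.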

I can see in the syslogs that bcache unregistered the cache. The logs
also show that there was an I/O error on the bcache device:
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.176867] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.186494] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.195743] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.204869] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.234722] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.246102] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.274013] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.289128] bcache: bch_cache_set_error() error on 427201f5-5c86-4890-9866-f9860e518041: dm-0: too many IO errors writing data to cache
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.289128] , disabling caching
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.306212] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache0 is "auto" and cache is clean, keep it alive.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.306543] Buffer I/O error on dev bcache0, logical block 144595776, async page read
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.316119] bcache: cached_dev_detach_finish() Caching disabled for sdc
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.316398] bcache: cache_set_free() Cache set 427201f5-5c86-4890-9866-f9860e518041 unregistered
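The fio error and the kernel's "Buffer I/O error" line appear to describe the same failed read: dividing fio's byte offset by 4096 gives exactly logical block 144595776. A small sketch of that check, assuming the page-cache read counted 4 KiB blocks (the variable names are illustrative):

```shell
#!/bin/sh
# Values copied from the fio error and the kernel log above.
fio_offset=592264298496   # fio: "read offset=592264298496"
fio_buflen=262144         # fio: "buflen=262144"
kernel_block=144595776    # kernel: "logical block 144595776"
block_size=4096           # assumption: the page-cache read used 4 KiB blocks

# Convert fio's byte offset into a logical block number.
block=$((fio_offset / block_size))
echo "fio offset in 4 KiB blocks: $block"
if [ "$block" -eq "$kernel_block" ]; then
    echo "same location as the Buffer I/O error"
fi

# A remainder of 0 means the failed read is one whole 256 KiB fio block.
echo "offset modulo buflen: $((fio_offset % fio_buflen))"
```

The log also shows seven "IO error on writing data to cache" messages before bch_cache_set_error() disabled caching. According to the bcache admin guide, how many errors are tolerated is governed by the cache set's io_error_limit and io_error_halflife sysfs attributes (reachable here through /sys/block/bcache0/bcache/cache/).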

The steps above reproduce the problem most of the time, but not
always. In a few of the attempts, the cache was unregistered without
resulting in observable I/O errors.

Is this expected?

I'm running the Linux kernel version 6.5.0.

Thread overview: 4+ messages
2024-02-01 22:25 I/O error on cache device can cause user observable errors Arnaldo Montagner
2024-02-02  7:00 ` Coly Li
2024-02-02 17:48   ` Arnaldo Montagner
2024-02-03  3:43     ` Coly Li