linux-bcache.vger.kernel.org archive mirror
From: Arnaldo Montagner <armont@google.com>
To: linux-bcache@vger.kernel.org
Subject: I/O error on cache device can cause user observable errors
Date: Thu, 1 Feb 2024 14:25:40 -0800
Message-ID: <CANF=pgrX7h26TjA9bPUm9umRA-9KvELb9z3-bJsHm+t6SYbE1w@mail.gmail.com>

The bcache documentation says that errors on the cache device are
handled transparently.

I'm seeing a case where the cache device is unregistered in response
to repeated write errors (expected), but that results in a read error
on the bcache device (unexpected).

Here's how I'm reproducing the problem:
1. Create a device with dm-error to simulate I/O errors. The device is
1G in size and it will fail I/Os in a 4M extent starting at offset
128M:
    $ dmsetup create cache_disk << EOF
      0      262144    linear /dev/sdb 0
      262144 8192      error
      270336 1826816   linear /dev/sdb 270336
    EOF

2. Set up bcache in writethrough mode. The backing device is 1000G in length:
    $ make-bcache --cache /dev/mapper/cache_disk --bdev /dev/sdc \
        --wipe-bcache --bucket 256k
    $ echo writethrough > /sys/block/bcache0/bcache/cache_mode
    $ echo 0 > /sys/block/bcache0/bcache/cache/synchronous

    $ lsblk
    NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    ...
    sdb            8:16   0    10G  0 disk
    └─cache_disk 253:0    0     1G  0 dm
      └─bcache0  252:0    0  1000G  0 disk
    sdc            8:32   0  1000G  0 disk
    └─bcache0    252:0    0  1000G  0 disk

3. Start a random read workload on the bcache device (using fio):
    $ fio --name=basic --filename=/dev/bcache0 --size=1000G \
        --rw=randread --blocksize=256k --blockalign=256k

4. After a while I see that the cache device gets unregistered.
However, the application output indicates it saw an I/O error on a
read request:
    fio: io_u error on file /dev/bcache0: Input/output error: read offset=592264298496, buflen=262144
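
As a cross-check of the dm table in step 1, the sector numbers can be derived from the byte sizes given there (1G device, 4M error extent at offset 128M). This is only a sketch of the arithmetic; the variable names are made up for illustration, and it assumes the 512-byte sectors that device-mapper tables use:

```shell
# Sizes in bytes; SECTOR is the 512-byte unit used in device-mapper tables.
SECTOR=512
dev_size=$((1024 * 1024 * 1024))   # 1G device
err_off=$((128 * 1024 * 1024))     # error extent starts at offset 128M
err_len=$((4 * 1024 * 1024))       # error extent is 4M long

err_start=$((err_off / SECTOR))
err_sectors=$((err_len / SECTOR))
tail_start=$((err_start + err_sectors))
tail_sectors=$((dev_size / SECTOR - tail_start))

# Print the three table lines: linear head, error extent, linear tail.
printf '0 %d linear /dev/sdb 0\n' "$err_start"
printf '%d %d error\n' "$err_start" "$err_sectors"
printf '%d %d linear /dev/sdb %d\n' "$tail_start" "$tail_sectors" "$tail_start"
```

The printed values match the table passed to dmsetup (262144, 8192, 270336, 1826816).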

I can see in the syslogs that bcache unregistered the cache. The logs
also show that there was an I/O error on the bcache device:
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.176867] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.186494] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.195743] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.204869] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.234722] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.246102] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.274013] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.289128] bcache: bch_cache_set_error() error on 427201f5-5c86-4890-9866-f9860e518041: dm-0: too many IO errors writing data to cache
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.289128] , disabling caching
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.306212] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache0 is "auto" and cache is clean, keep it alive.
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.306543] Buffer I/O error on dev bcache0, logical block 144595776, async page read
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.316119] bcache: cached_dev_detach_finish() Caching disabled for sdc
    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.316398] bcache: cache_set_free() Cache set 427201f5-5c86-4890-9866-f9860e518041 unregistered
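
The "Buffer I/O error" line and the fio error appear to refer to the same read. A quick check, assuming the logical block number in the kernel message is counted in 4096-byte blocks (that block size is an assumption, not something the log states):

```shell
block=144595776    # logical block from the "Buffer I/O error" line
block_size=4096    # assumption: 4 KiB blocks on bcache0
offset=$((block * block_size))
echo "$offset"     # prints 592264298496
```

That is exactly the offset fio reported for the failed read (offset=592264298496).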

The steps above reproduce the problem most of the time, but not
always. In a few of the attempts, the cache was unregistered without
resulting in observable I/O errors.
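
For anyone trying to reproduce this, a sketch of how the relevant state could be dumped before and after the failure. The attribute paths are taken from the kernel's bcache documentation and are assumptions about this setup; they may differ between kernel versions, and the cache set entries disappear once the set is unregistered. The show_attr helper is made up for illustration:

```shell
# Print one sysfs attribute, or note that it is absent (for example after
# the cache set has been unregistered).
show_attr() {
    if [ -r "$1" ]; then
        printf '%s: %s\n' "$1" "$(cat "$1")"
    else
        printf '%s: (not present)\n' "$1"
    fi
}

cset_uuid=427201f5-5c86-4890-9866-f9860e518041   # cache set UUID from the logs above

show_attr /sys/block/bcache0/bcache/state
show_attr /sys/block/bcache0/bcache/stop_when_cache_set_failed
show_attr "/sys/fs/bcache/$cset_uuid/io_error_limit"
show_attr "/sys/fs/bcache/$cset_uuid/cache0/io_errors"
```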

Is this expected?

I'm running Linux kernel version 6.5.0.

