linux-bcache.vger.kernel.org archive mirror
From: Kai Krakow <kai@kaishome.de>
To: Coly Li <colyli@suse.de>
Cc: linux-bcache@vger.kernel.org
Subject: Re: Consistent failure of bcache upgrading from 5.10 to 5.15.2
Date: Thu, 18 Nov 2021 11:27:29 +0100
Message-ID: <CAC2ZOYsoZJ2_73ZBfN13txs0=zqMVcjqDMMjmiWCq=kE8sprcw@mail.gmail.com>
In-Reply-To: <7485d9b0-80f4-4fff-5a0c-6dd0c35ff91b@suse.de>

Hi Coly!

Reading the commit logs, it seems to come from using a non-default
block size, 512 in my case (although I'm pretty sure that *is* the
default on the affected system). I've checked:
```
dev.sectors_per_block   1
dev.sectors_per_bucket  1024
```

The non-affected machines use 4k blocks (sectors per block = 8).
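
For reference, the sector counts above translate to bytes as follows.
This is only a minimal sketch: it assumes the `dev.sectors_per_*` fields
count 512-byte sectors, and the helper name `sectors_to_bytes` is made up
for illustration.

```python
SECTOR_SIZE = 512  # bytes; assumed unit of the dev.sectors_per_* fields


def sectors_to_bytes(sectors: int) -> int:
    """Convert a count of 512-byte sectors to bytes."""
    return sectors * SECTOR_SIZE


# Affected machine (values taken from the output above)
print(sectors_to_bytes(1))     # block size:  512 bytes
print(sectors_to_bytes(1024))  # bucket size: 524288 bytes (512 KiB)

# Non-affected machines use 8 sectors per block
print(sectors_to_bytes(8))     # block size:  4096 bytes (4k)
```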

Can this value be changed "on the fly"? I seem to remember that the
block size in the bdev super block must match the one in the cdev
super block, although that doesn't make much sense to me.

By "on the fly" I mean: re-create the cdev super block, then just
attach the bdev. In this case, the sectors per block should not
matter because this is a brand new cdev with no existing cache data.
But I think it will refuse to attach the devices because of the
non-matching block size (at least this was the case in the past). I
don't see the point of storing a block size in both super blocks at all
if the only block size that matters lives in the cdev super block.
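
To check up front whether the two super blocks agree, something like the
following could compare the field shown above. It is only a sketch: it
assumes both super block dumps use the same "key value" layout as the
output I pasted, the function names are made up, and the bdev sample text
is hypothetical. Whether the kernel really requires an exact match (or
something looser) is an assumption I have not verified.

```python
def parse_super_dump(text: str) -> dict:
    """Parse 'key   value' lines of a super block dump into a dict."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


def block_sizes_match(bdev_dump: str, cdev_dump: str) -> bool:
    """Return True if both dumps report the same sectors per block."""
    key = "dev.sectors_per_block"
    bdev = int(parse_super_dump(bdev_dump)[key])
    cdev = int(parse_super_dump(cdev_dump)[key])
    return bdev == cdev


# cdev values as pasted above; the bdev values are a hypothetical example
CDEV_DUMP = "dev.sectors_per_block   1\ndev.sectors_per_bucket  1024\n"
BDEV_DUMP = "dev.sectors_per_block   8\ndev.sectors_per_bucket  1024\n"

print(block_sizes_match(BDEV_DUMP, CDEV_DUMP))
```

With the sample values (8 vs. 1 sectors per block) the check reports a
mismatch, which is the situation where I would expect the attach to be
refused.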

Thanks
Kai

On Tue, 16 Nov 2021 at 12:02, Coly Li <colyli@suse.de> wrote:
>
> On 11/16/21 6:10 PM, Kai Krakow wrote:
> > Hello Coly!
> >
> > I think I can consistently reproduce a failure mode of bcache when
> > going from 5.10 LTS to 5.15.2 - on one single system (my other systems
> > do just fine).
> >
> > In 5.10, bcache is stable, no problems at all. After booting to
> > 5.15.2, btrfs would complain about broken btree generation numbers,
> > then freeze completely. Going back to 5.10, bcache complains about
> > being broken and cannot start the cache set.
> >
> > I was able to reproduce the following behavior after the problem
> > struck me twice in a row:
> >
> > 1. Boot into SysRescueCD
> > 2. modprobe bcache
> > 3. Manually detach the btrfs disks from bcache, set cache mode to
> > none, force running
> > 4. Reboot into 5.15.2 (now works)
> > 5. See this error in dmesg:
> >
> > [   27.334306] bcache: bch_cache_set_error() error on
> > 04af889c-4ccb-401b-b525-fb9613a81b69: empty set at bucket 1213, block
> > 1, 0 keys, disabling caching
> > [   27.334453] bcache: cache_set_free() Cache set
> > 04af889c-4ccb-401b-b525-fb9613a81b69 unregistered
> > [   27.334510] bcache: register_cache() error sda3: failed to run cache set
> > [   27.334512] bcache: register_bcache() error : failed to register device
> >
> > 6. wipefs the failed bcache cache
> > 7. bcache make -C -w 512 /dev/sda3 -l bcache-cdev0 --force
> > 8. re-attach the btrfs disks in writearound mode
> > 9. btrfs immediately fails, freezing the system (with transaction IDs way off)
> > 10. reboot loops to 5, unable to mount
> > 11. escape the situation by starting at 1, and not make a new bcache
> >
> > Is this a known error? Why does it only hit this machine?
> >
> > SSD Model: Samsung SSD 850 EVO 250GB
>
> This is already known, there are 3 locations to fix,
>
> 1, Revert commit 2fd3e5efe791946be0957c8e1eed9560b541fe46
> 2, Revert commit  f8b679a070c536600c64a78c83b96aa617f8fa71
> 3, Do the following change in drivers/md/bcache.c,
> @@ -885,9 +885,9 @@ static void bcache_device_free(struct bcache_device *d)
>
>                 bcache_device_detach(d);
>
>         if (disk) {
> -               blk_cleanup_disk(disk);
>                 ida_simple_remove(&bcache_device_idx,
>                                   first_minor_to_idx(disk->first_minor));
> +               blk_cleanup_disk(disk);
>         }
>
> Fixes 1) and 3) are on their way to the stable kernel IMHO; fix 2) is only my workaround, and I don't see an upstream fix yet.
>
> Just FYI.
>
> Coly Li
>
