* bcache: bad block header
@ 2018-04-03 19:01 Nikolaus Rath
  2018-04-03 22:38 ` Jens Axboe
From: Nikolaus Rath @ 2018-04-03 19:01 UTC
  To: linux-bcache, linux-block

[ Re-send to both linux-block and linux-bcache ]

Hi,

A few days ago, my system refused to boot because it couldn't find the root filesystem anymore. The root filesystem is ext4 on LVM on dm-crypt on bcache, using kernel 4.9.92 (from Debian stretch). Booting from a recovery medium with Kernel 4.16, I got:

[   84.551715] bcache: register_bcache() error /dev/sda4: device already registered
[   84.553188] bcache: register_bcache() error /dev/sdc2: device already registered
[   84.616438] bcache: error on 1330b5f6-0c13-43ec-b925-2ee2734b135f:
[   84.616440] bad btree header at bucket 85065, block 0, 0 keys
[   84.616442] , disabling caching
[   84.616445] bcache: register_cache() registered cache device sdb2
[   84.616597] bcache: cache_set_free() Cache set 1330b5f6-0c13-43ec-b925-2ee2734b135f unregistered
[   85.375933]  sdb: sdb1 sdb2 sdb4 < sdb5 >
[   85.416610] bcache: error on 1330b5f6-0c13-43ec-b925-2ee2734b135f:
[   85.416612] bad btree header at bucket 85065, block 0, 0 keys
[   85.416614] , disabling caching
[   85.416618] bcache: register_cache() registered cache device sdb2
[   85.416624] bcache: register_bcache() error /dev/sdc2: device already registered
[   85.416626] bcache: register_bcache() error /dev/sda4: device already registered
[   85.416796] bcache: cache_set_free() Cache set 1330b5f6-0c13-43ec-b925-2ee2734b135f unregistered
[   85.488246] bcache: error on 1330b5f6-0c13-43ec-b925-2ee2734b135f:
[   85.488249] bad btree header at bucket 85065, block 0, 0 keys
[   85.488251] , disabling caching
[   85.488254] bcache: register_cache() registered cache device sdb2
[   85.488429] bcache: cache_set_free() Cache set 1330b5f6-0c13-43ec-b925-2ee2734b135f unregistered
[   85.560003] bcache: error on 1330b5f6-0c13-43ec-b925-2ee2734b135f:
[   85.560006] bad btree header at bucket 85065, block 0, 0 keys
[   85.560008] , disabling caching
[   85.560013] bcache: register_cache() registered cache device sdb2
[   85.560017] bcache: register_bcache() error /dev/sda4: device already registered
[   85.560217] bcache: cache_set_free() Cache set 1330b5f6-0c13-43ec-b925-2ee2734b135f unregistered
[   85.571950] bcache: register_bcache() error /dev/sdc2: device already registered
[   85.580628] bcache: register_bcache() error /dev/sdc2: device already registered
[   85.761969] bcache: register_bcache() error /dev/sda4: device already registered
[   85.792749] bcache: register_bcache() error /dev/sda4: device already registered
[   85.952931] bcache: register_bcache() error /dev/sda4: device already registered
[   85.955640] bcache: register_bcache() error /dev/sda4: device already registered
[...]

These are the first messages that mention bcache. Note that the first message says the device is already registered - is that normal?

smartctl does not report any errors on the backing or caching disks, and the system was shut down cleanly.

The only possibly related thing that comes to mind is that a few days ago I hibernated and resumed the system (this is something I normally don't do). Resume worked fine as far as I could tell, though, and there have been no unclean shutdowns.

Is there a way to narrow down what may have caused this corruption?

And is there a way to gracefully recover from this situation without wiping everything? Since the message mentions problems with only one block, can I maybe tell bcache to just ignore/drop this specific block?

Thanks!
-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


* Re: bcache: bad block header
  2018-04-03 19:01 bcache: bad block header Nikolaus Rath
@ 2018-04-03 22:38 ` Jens Axboe
  2018-04-05  8:51   ` bcache and hibernation (was: bcache: bad block header) Nikolaus Rath
From: Jens Axboe @ 2018-04-03 22:38 UTC
  To: Nikolaus Rath, linux-bcache, linux-block; +Cc: Michael Lyle

CC'ing Mike


-- 
Jens Axboe


* bcache and hibernation (was: bcache: bad block header)
  2018-04-03 22:38 ` Jens Axboe
@ 2018-04-05  8:51   ` Nikolaus Rath
  2018-04-05 18:13     ` bcache and hibernation Michael Lyle
From: Nikolaus Rath @ 2018-04-05  8:51 UTC
  To: Jens Axboe, linux-bcache, linux-block; +Cc: Michael Lyle

Hi,

I have a hypothesis of what happened. My swap volume is also on LVM, and thus also ultimately backed by bcache. Hibernation and resume work fine. But when the hibernation image is read during resume, the contents of the cache device change, because with bcache, reading is no longer a read-only operation. When the hibernation image is loaded, the kernel loses track of these changes, so what's on the cache disk no longer matches the structures in the kernel. Therefore, on the first boot after the successful resume, havoc ensues.
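The hypothesis can be illustrated with a deliberately simplified Python model. It is a toy and not bcache's real btree/journal code. `ToyCache`, `disk_index`, `mem_index` and `resume_from_image` are hypothetical names. The model assumes only two things. First, a cache miss on read inserts the block into the cache, both on disk and in memory. Second, resuming replaces kernel memory with the pre-hibernation snapshot.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class ToyCache:
    """Hypothetical read-caching layer; NOT bcache's real data structures."""
    backing: dict                                   # block number -> data on the slow device
    disk_index: dict = field(default_factory=dict)  # cache metadata persisted on the SSD
    mem_index: dict = field(default_factory=dict)   # the kernel's in-memory copy of it

    def read(self, block):
        if block in self.mem_index:
            return self.mem_index[block]
        data = self.backing[block]
        # A cache miss promotes the block, so this "read" writes to the cache device.
        self.disk_index[block] = data
        self.mem_index[block] = data
        return data

def resume_from_image(cache, image_blocks, saved_mem_index):
    """Read the image through the cache, then restore pre-hibernation memory."""
    for b in image_blocks:
        cache.read(b)                                  # mutates disk_index
    cache.mem_index = copy.deepcopy(saved_mem_index)   # resumed kernel forgets that

backing = {n: f"block-{n}" for n in range(10)}
cache = ToyCache(backing)
cache.read(0)                                 # normal use before hibernating
snapshot = copy.deepcopy(cache.mem_index)     # memory state captured in the image
resume_from_image(cache, image_blocks=[5, 6, 7], saved_mem_index=snapshot)

# Blocks recorded on the cache device that the restored kernel knows nothing about.
stale = set(cache.disk_index) - set(cache.mem_index)
print(sorted(stale))  # [5, 6, 7]
```

Any block promoted while the image was being read ends up recorded on the cache device but missing from the restored in-memory index. That is the kind of mismatch the hypothesis describes.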

I needed the system running again, so I've now detached the backing volumes, re-initialized the cache volume and re-attached the backing volumes. Unfortunately there was too much filesystem damage, so I restored everything from backup.

Is there a way to prevent this from happening? Could, e.g., the kernel detect that the swap device is (indirectly) on bcache and refuse to hibernate? Or is there a way to do a "true" read-only mount of a bcache volume so that one can safely resume from it?

Best,
-Nikolaus

--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


* Re: bcache and hibernation
  2018-04-05  8:51   ` bcache and hibernation (was: bcache: bad block header) Nikolaus Rath
@ 2018-04-05 18:13     ` Michael Lyle
  2018-04-05 19:51       ` Nikolaus Rath
From: Michael Lyle @ 2018-04-05 18:13 UTC
  To: Nikolaus Rath, Jens Axboe, linux-bcache, linux-block

Hi Nikolaus (and everyone else),

Sorry I've been slow in responding.  I probably need to step down as
bcache maintainer because so many other things have competed for my time
lately and I've fallen behind on both patches and mailing list.

On 04/05/2018 01:51 AM, Nikolaus Rath wrote:
> Is there a way to prevent this from happening? Could, e.g., the kernel detect that the swap device is (indirectly) on bcache and refuse to hibernate? Or is there a way to do a "true" read-only mount of a bcache volume so that one can safely resume from it?

I think you're correct.  If you're using bcache in writeback mode, it is
not safe to put the hibernation image there, because some of the blocks
involved in the resume can end up in cache (and there are dependency
issues, like you mention).  There are similar cautions/problems with btrfs.

I am unaware of a mechanism to prohibit this in the kernel, i.e. to say
that a given type of block provider can't be involved in a resume operation.
Most documentation for hibernation explicitly cautions about the btrfs
situation, but use of bcache is less common and as a result generally
isn't covered.
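Without a kernel-side check, a userspace sketch like the following could warn before hibernating. It is not a kernel mechanism and not part of bcache or the PM tools. The function names are hypothetical, and the sketch relies on these assumptions:
- Stacked devices are discoverable through the standard `/sys/class/block/<dev>/slaves/` links that device-mapper maintains.
- Partitions are recognised by their `partition` attribute.
- bcache devices are named `bcacheN`.

```python
import os

SYS_BLOCK = "/sys/class/block"

def sysfs_parents(dev):
    """Kernel names of the devices `dev` is directly stacked on (read from sysfs)."""
    path = os.path.join(SYS_BLOCK, dev)
    parents = []
    # A partition (e.g. bcache0p1) sits on its whole disk (e.g. bcache0).
    if os.path.exists(os.path.join(path, "partition")):
        parents.append(os.path.basename(os.path.dirname(os.path.realpath(path))))
    slaves = os.path.join(path, "slaves")
    if os.path.isdir(slaves):
        parents.extend(os.listdir(slaves))
    return parents

def on_bcache(dev, parents=sysfs_parents):
    """True if `dev` is a bcacheN device or is (indirectly) stacked on one."""
    seen, stack = set(), [dev]
    while stack:
        d = stack.pop()
        if d in seen:
            continue
        seen.add(d)
        if d.startswith("bcache"):  # assumption: bcache devices are named bcacheN
            return True
        stack.extend(parents(d))
    return False

def swap_devices(proc_swaps="/proc/swaps"):
    """Kernel names of active swap block devices (swap files are skipped)."""
    names = []
    with open(proc_swaps) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[1] == "partition":
                # /dev/mapper/* entries are symlinks to /dev/dm-N.
                names.append(os.path.basename(os.path.realpath(fields[0])))
    return names

if __name__ == "__main__":
    for dev in swap_devices():
        if on_bcache(dev):
            print(f"warning: swap device {dev} is on bcache; do not hibernate to it")
```

`parents` is injectable, so the graph walk can be tried without real hardware. For a layout like the one in this thread (swap LV on LVM on dm-crypt on bcache), the walk would presumably go from the swap LV's `dm-N` device through the dm-crypt device to `bcache0`.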

> Best,
> -Nikolaus
Mike


* Re: bcache and hibernation
  2018-04-05 18:13     ` bcache and hibernation Michael Lyle
@ 2018-04-05 19:51       ` Nikolaus Rath
  2018-04-06  0:21         ` Michael Lyle
From: Nikolaus Rath @ 2018-04-05 19:51 UTC
  To: Michael Lyle; +Cc: Jens Axboe, linux-bcache, linux-block

Hi Michael,

On Apr 05 2018, Michael Lyle <mlyle@lyle.org> wrote:
> On 04/05/2018 01:51 AM, Nikolaus Rath wrote:
>> Is there a way to prevent this from happening? Could, e.g., the kernel
>> detect that the swap device is (indirectly) on bcache and refuse to
>> hibernate? Or is there a way to do a "true" read-only mount of a
>> bcache volume so that one can safely resume from it?
>
> I think you're correct.  If you're using bcache in writeback mode, it is
> not safe to hibernate there, because some of the blocks involved in the
> resume can end up in cache (and dependency issues, like you mention).

Could you explain why this isn't a problem with writethrough? It seems
to me that the trouble happens when the hibernation image is *read*, so
why does it matter what kind of write caching is used?

> I am unaware of a mechanism to prohibit this in the kernel-- to say that
> a given type of block provider can't be involved in a resume operation.
> Most documentation for hibernation explicitly cautions about the btrfs
> situation, but use of bcache is less common and as a result generally
> isn't covered.

Could you maybe add a warning to Documentation/bcache.txt? I think this
would have saved me.

Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


* Re: bcache and hibernation
  2018-04-05 19:51       ` Nikolaus Rath
@ 2018-04-06  0:21         ` Michael Lyle
From: Michael Lyle @ 2018-04-06  0:21 UTC
  To: Michael Lyle, Jens Axboe, linux-bcache, linux-block

On Thu, Apr 5, 2018 at 12:51 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
> Hi Michael,
>
> Could you explain why this isn't a problem with writethrough? It seems
> to me that the trouble happens when the hibernation image is *read*, so
> why does it matter what kind of write caching is used?

With writethrough you can set up your loader to read the image directly
from the backing device, i.e. you don't need the cache, and there are at
least some valid configurations; with writeback, some of the extents may
exist only on the cache device, so...
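The distinction can be made concrete with a toy Python model. The names are hypothetical and this is not bcache's implementation. In the model, a writethrough write completes only once the backing device has the data. A writeback write may live only in the cache until it is flushed, so a loader that bypasses the cache sees stale data. The model covers only writes. Per the hypothesis earlier in the thread, reading through bcache can still modify the cache device even in writethrough mode, which is why reading directly from the backing device matters.

```python
from dataclasses import dataclass, field

@dataclass
class ToyCachedDev:
    """Hypothetical model of a cached block device; not bcache's real code."""
    mode: str                                    # "writethrough" or "writeback"
    backing: dict = field(default_factory=dict)  # slow device contents
    cache: dict = field(default_factory=dict)    # SSD contents
    dirty: set = field(default_factory=set)      # blocks newer in cache than on backing

    def write(self, block, data):
        self.cache[block] = data
        if self.mode == "writethrough":
            self.backing[block] = data   # write completes only once backing has it
        else:
            self.dirty.add(block)        # flushed to backing later, in the background

def readable_from_backing_alone(dev, blocks, expected):
    """Would a loader that bypasses the cache see the latest data?"""
    return all(dev.backing.get(b) == expected[b] for b in blocks)

image = {n: f"page-{n}" for n in range(4)}  # stand-in for a hibernation image
for mode in ("writethrough", "writeback"):
    dev = ToyCachedDev(mode)
    for b, data in image.items():
        dev.write(b, data)
    print(mode, readable_from_backing_alone(dev, image, image))
# writethrough True
# writeback False
```
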

That said, it's not really great to put swap/hibernate on a cache
device... the workloads don't usually benefit much from tiering (since
they tend to be write-once-read-never or write-once-read-once).

>> I am unaware of a mechanism to prohibit this in the kernel-- to say that
>> a given type of block provider can't be involved in a resume operation.
>> Most documentation for hibernation explicitly cautions about the btrfs
>> situation, but use of bcache is less common and as a result generally
>> isn't covered.
>
> Could you maybe add a warning to Documentation/bcache.txt? I think this
> would have saved me.

Yah, I can look at that.

>
> Best,
> -Nikolaus

Mike
