* bcache crashes at boot time
@ 2017-06-05 12:38 Igor Pavlenko
  2017-06-20 22:04 ` Eric Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Igor Pavlenko @ 2017-06-05 12:38 UTC
  To: linux-bcache

Hi

Bcache crashes at boot time, making the root fs unavailable. The same
error occurs when I try to add the caching and backing devices manually
through /sys/fs/bcache/register. I reproduced the error on Ubuntu 17.04
(kernel 4.10.0) and Fedora (4.4.67).

The cache was in writeback mode.

Is it possible to recover my data?

dmesg output:

[   45.941155] bcache: register_bdev() registered backing device sda3
[   46.138238] bcache: error on 471776aa-1b34-4eff-ba7d-509a9554b43d: inconsistent ptrs: mark = 3, level = 0, disabling caching
[   46.139868] CPU: 0 PID: 1495 Comm: bcache-register Tainted: G     U
[   46.154267]  [<ffffffffc090ba34>] __bch_btree_mark_key.part.23+0x244/0x270 [bcache]
[   46.156051]  [<ffffffffc090e3df>] bch_initial_mark_key+0xdf/0xf0 [bcache]
[   46.157744]  [<ffffffffc090e44c>] bch_btree_check_recurse+0x5c/0x230 [bcache]
[   46.159432]  [<ffffffffc090db2a>] ? bch_btree_node_get+0xba/0x280 [bcache]
[   46.160994]  [<ffffffffc090e51f>] bch_btree_check_recurse+0x12f/0x230 [bcache]
[   46.162428]  [<ffffffffc0913738>] ? __bch_submit_bbio+0x58/0x60 [bcache]
[   46.163894]  [<ffffffffc090e800>] bch_btree_check+0x180/0x1d0 [bcache]
[   46.166862]  [<ffffffffc091d5d5>] run_cache_set+0x3b5/0x8e0 [bcache]
[   46.168399]  [<ffffffffc091ee47>] register_bcache+0xe07/0x11b0 [bcache]
[   47.182133] bcache: bch_journal_replay() journal replay done, 237858 keys in 207 entries, seq 3687554
[   47.184437] bcache: bch_cached_dev_attach() Can't attach sda3: shutting down
[   47.186168] bcache: register_cache() registered cache device sdb1
[   47.187538] bcache: cache_set_free() Cache set 471776aa-1b34-4eff-ba7d-509a9554b43d unregistered


-- 
Igor Pavlenko
ipavlenko.mail@gmail.com


* Re: bcache crashes at boot time
  2017-06-05 12:38 bcache crashes at boot time Igor Pavlenko
@ 2017-06-20 22:04 ` Eric Wheeler
  2017-06-21 11:05   ` Nix
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wheeler @ 2017-06-20 22:04 UTC
  To: Igor Pavlenko; +Cc: linux-bcache

On Mon, 5 Jun 2017, Igor Pavlenko wrote:

> Hi
> 
> Bcache crashes at boot time, making the root fs unavailable. The same
> error occurs when I try to add the caching and backing devices manually
> through /sys/fs/bcache/register. I reproduced the error on Ubuntu 17.04
> (kernel 4.10.0) and Fedora (4.4.67).
> 
> The cache was in writeback mode.
> 
> Is it possible to recover my data?
> 
> dmesg output:
> 
> [   45.941155] bcache: register_bdev() registered backing device sda3
> [   46.138238] bcache: error on 471776aa-1b34-4eff-ba7d-509a9554b43d: inconsistent ptrs: mark = 3, level = 0, disabling caching

^^^

This is the reason: something was inconsistent enough that it couldn't
continue.

Disclaimer: No promises here about data integrity, so back up the full
block device images first. This could cause more issues, but it might let
the device come up far enough to access your data:

Near bcache/super.c:1315, you could try commenting out
`bch_cache_set_unregister(c);` at the bottom of `bch_cache_set_error()`.

This won't fix the problem, but it would allow it to continue past the
error, or it might just error again somewhere else.
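For the backup step, dd is the usual tool (e.g. `dd if=/dev/sdb1 of=/mnt/elsewhere/sdb1.img bs=4M`). As a minimal illustration of what that involves, here is a Python sketch that copies a device into an image file in fixed-size chunks and fsyncs the result. Because the cache was in writeback mode, dirty data may exist only on the cache device, so both the cache (sdb1) and the backing device (sda3) would need imaging. The function name, the 4 MiB chunk size and the destination paths are illustrative assumptions, not anything bcache requires.

```python
import os

CHUNK = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary, illustrative size


def image_device(src_path: str, dst_path: str, chunk: int = CHUNK) -> int:
    """Copy src_path (e.g. a block device) byte-for-byte into dst_path.

    Returns the number of bytes copied. Hypothetical helper for
    illustration; dst_path should be on a disk outside the cache set.
    """
    copied = 0
    # buffering=0 gives raw reads; read() returns b"" only at end of device.
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        while True:
            buf = src.read(chunk)
            if not buf:
                break
            dst.write(buf)
            copied += len(buf)
        dst.flush()
        os.fsync(dst.fileno())  # push the image to stable storage
    return copied
```

Run it as root with neither device registered or mounted, writing to a disk that is not part of the cache set, e.g. `image_device("/dev/sda3", "/mnt/elsewhere/sda3.img")`, then repeat for `/dev/sdb1`.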

-Eric


> [   46.139868] CPU: 0 PID: 1495 Comm: bcache-register Tainted: G     U
> [   46.154267]  [<ffffffffc090ba34>] __bch_btree_mark_key.part.23+0x244/0x270 [bcache]
> [   46.156051]  [<ffffffffc090e3df>] bch_initial_mark_key+0xdf/0xf0 [bcache]
> [   46.157744]  [<ffffffffc090e44c>] bch_btree_check_recurse+0x5c/0x230 [bcache]
> [   46.159432]  [<ffffffffc090db2a>] ? bch_btree_node_get+0xba/0x280 [bcache]
> [   46.160994]  [<ffffffffc090e51f>] bch_btree_check_recurse+0x12f/0x230 [bcache]
> [   46.162428]  [<ffffffffc0913738>] ? __bch_submit_bbio+0x58/0x60 [bcache]
> [   46.163894]  [<ffffffffc090e800>] bch_btree_check+0x180/0x1d0 [bcache]
> [   46.166862]  [<ffffffffc091d5d5>] run_cache_set+0x3b5/0x8e0 [bcache]
> [   46.168399]  [<ffffffffc091ee47>] register_bcache+0xe07/0x11b0 [bcache]
> [   47.182133] bcache: bch_journal_replay() journal replay done, 237858 keys in 207 entries, seq 3687554
> [   47.184437] bcache: bch_cached_dev_attach() Can't attach sda3: shutting down
> [   47.186168] bcache: register_cache() registered cache device sdb1
> [   47.187538] bcache: cache_set_free() Cache set 471776aa-1b34-4eff-ba7d-509a9554b43d unregistered
> 
> 
> -- 
> Igor Pavlenko
> ipavlenko.mail@gmail.com


* Re: bcache crashes at boot time
  2017-06-20 22:04 ` Eric Wheeler
@ 2017-06-21 11:05   ` Nix
  2017-06-21 17:10     ` Kai Krakow
  2017-06-22  0:47     ` Eric Wheeler
  0 siblings, 2 replies; 7+ messages in thread
From: Nix @ 2017-06-21 11:05 UTC
  To: Eric Wheeler; +Cc: Igor Pavlenko, linux-bcache

On 20 Jun 2017, Eric Wheeler uttered the following:

> On Mon, 5 Jun 2017, Igor Pavlenko wrote:
>
>> [   45.941155] bcache: register_bdev() registered backing device sda3
>> [   46.138238] bcache: error on 471776aa-1b34-4eff-ba7d-509a9554b43d: inconsistent ptrs: mark = 3, level = 0, disabling caching
>
> ^^^
>
> This is the reason: something was inconsistent enough that it couldn't
> continue.

I'm rather worried about this sort of thing, given that I got it after
my first cache population and less than a week of using bcache in anger.

The system in question is an oldie that restarts by remounting as much as
it can read-only and then doing a reboot (when bcache would routinely
whine about timeouts): the things it manages to remount read-only do
not usually include /. Is bcache usually unhappy about this sort of
thing? Would it prefer it if I was able to unmount it properly? If so,
it's time to upgrade that machine... it's only that I've had an oops
before now when stopping bcaches (early in test, not preserved because I
thought I'd made a mistake), so I thought it might be unhappy about
*that*, as well.

-- 
NULL && (void)


* Re: bcache crashes at boot time
  2017-06-21 11:05   ` Nix
@ 2017-06-21 17:10     ` Kai Krakow
  2017-06-21 21:57       ` Nix
  2017-06-22  0:47     ` Eric Wheeler
  1 sibling, 1 reply; 7+ messages in thread
From: Kai Krakow @ 2017-06-21 17:10 UTC
  To: linux-bcache

On Wed, 21 Jun 2017 12:05:26 +0100,
Nix <nix@esperi.org.uk> wrote:

> On 20 Jun 2017, Eric Wheeler uttered the following:
> 
> > On Mon, 5 Jun 2017, Igor Pavlenko wrote:
> >  
> >> [   45.941155] bcache: register_bdev() registered backing device sda3
> >> [   46.138238] bcache: error on 471776aa-1b34-4eff-ba7d-509a9554b43d: inconsistent ptrs: mark = 3, level = 0, disabling caching
> >
> > ^^^
> >
> > This is the reason: something was inconsistent enough that it
> > couldn't continue.
> 
> I'm rather worried about this sort of thing, given that I got it after
> my first cache population and less than a week of using bcache in anger.
> 
> The system in question is an oldie that restarts by remounting as much
> as it can read-only and then doing a reboot (when bcache would
> routinely whine about timeouts): the things it manages to remount
> read-only do not usually include /. Is bcache usually unhappy about
> this sort of thing? Would it prefer it if I was able to unmount it
> properly? If so, it's time to upgrade that machine... it's only that
> I've had an oops before now when stopping bcaches (early in test, not
> preserved because I thought I'd made a mistake), so I thought it
> might be unhappy about *that*, as well.

Does your SSD support power loss protection (PLP)?

My SSD went offline (it disconnected from sda for an unknown reason; a
cold reboot fixed it) without bcache having any issue (nor did the
cached backing device)...

Having no PLP, or enabling device write caching (e.g. by means of
hdparm), could result in data corruption...
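To see how the kernel currently treats a disk's write cache, recent kernels expose `/sys/block/<disk>/queue/write_cache`, which reads "write back" (the kernel treats the cache as volatile and sends cache flushes) or "write through"; `hdparm -W /dev/sdX` queries the drive's own setting. The minimal Python sketch below reads only the sysfs attribute and takes whole-disk names (sdb, not sdb1); its function names are hypothetical. It says nothing about whether the drive actually has working PLP.

```python
from pathlib import Path


def write_cache_mode(disk: str, sysfs_root: str = "/sys/block") -> str:
    """Return the kernel's write-cache mode for a whole disk (e.g. "sdb").

    queue/write_cache reads "write back" when the kernel treats the disk's
    cache as volatile, or "write through" otherwise. Hypothetical helper.
    """
    attr = Path(sysfs_root) / disk / "queue" / "write_cache"
    return attr.read_text().strip()


def cache_is_volatile(disk: str, sysfs_root: str = "/sys/block") -> bool:
    # True means the kernel issues cache flushes to this disk; it does not
    # tell you whether the drive has working power loss protection.
    return write_cache_mode(disk, sysfs_root) == "write back"
```

The `sysfs_root` parameter exists only so the functions can be pointed at a copy of the sysfs tree; on a live system the default is what you want.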


-- 
Regards,
Kai

Replies to list-only preferred.


* Re: bcache crashes at boot time
  2017-06-21 17:10     ` Kai Krakow
@ 2017-06-21 21:57       ` Nix
  0 siblings, 0 replies; 7+ messages in thread
From: Nix @ 2017-06-21 21:57 UTC
  To: Kai Krakow; +Cc: linux-bcache

On 21 Jun 2017, Kai Krakow outgrape:

> On Wed, 21 Jun 2017 12:05:26 +0100,
> Nix <nix@esperi.org.uk> wrote:
>> The system in question is an oldie that restarts by remounting as much
>> as it can read-only and then doing a reboot (when bcache would
>> routinely whine about timeouts): the things it manages to remount
>> read-only do not usually include /. Is bcache usually unhappy about
>> this sort of thing? Would it prefer it if I was able to unmount it
>> properly? If so, it's time to upgrade that machine... it's only that
>> I've had an oops before now when stopping bcaches (early in test, not
>> preserved because I thought I'd made a mistake), so I thought it
>> might be unhappy about *that*, as well.
>
> Does your SSD support power loss protection (PLP)?

It's an Intel DC3510, one of the few SSDs that verifiably supports PLP
and has been shown to have the damn thing actually work: it's quite new,
and its SMART logs report that the capacitors used to implement PLP work
fine. I'm also keeping my XFS logs (journals) on there, and they were
fine as well. (XFS routinely keeps stuff only in the log on a normal
shutdown, on the assumption that it'll be able to replay them on remount
without unnecessarily delaying the umount for a potentially huge
metadata update: if the logs were corrupted, it would have told me so
very loudly at startup.)

I think we can rule out the SSD itself as a cause.

> Having no PLP, or enabling device write caching (e.g. by means of
> hdparm), could result in data corruption...

This wasn't a powerdown, just a reboot. SMART reported no reset of the
SSD across the reboot.

-- 
NULL && (void)


* Re: bcache crashes at boot time
  2017-06-21 11:05   ` Nix
  2017-06-21 17:10     ` Kai Krakow
@ 2017-06-22  0:47     ` Eric Wheeler
  2017-06-22 11:24       ` Nix
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Wheeler @ 2017-06-22  0:47 UTC
  To: kent.overstreet; +Cc: Igor Pavlenko, Nix, linux-bcache

On Wed, 21 Jun 2017, Nix wrote:

> On 20 Jun 2017, Eric Wheeler uttered the following:
> 
> > On Mon, 5 Jun 2017, Igor Pavlenko wrote:
> >
> >> [   45.941155] bcache: register_bdev() registered backing device sda3
> >> [   46.138238] bcache: error on 471776aa-1b34-4eff-ba7d-509a9554b43d: inconsistent ptrs: mark = 3, level = 0, disabling caching
> >
> > ^^^
> >
> > This is the reason: something was inconsistent enough that it couldn't
> > continue.

Kent, do you know what this means in detail, and can you guess what might 
have caused it?

> I'm rather worried about this sort of thing, given that I got it after
> my first cache population and less than a week of using bcache in anger.

Is there a difference between `blockdev --getpbsz /dev/cache` and
`blockdev --getpbsz /dev/bdev` (or whatever your devices are named)?

The backing device's physical block size must be the same as the
cache device's, IIRC. If it is 4k, use -w 4k in make-bcache. There's a
thread somewhere discussing those details.
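Besides `blockdev --getpbsz`, the same value should be visible in sysfs as `queue/physical_block_size`. Below is a minimal Python sketch comparing the two devices, assuming the standard sysfs layout in which `/sys/class/block/<dev>` is a symlink and a partition such as sdb1 has no `queue/` directory of its own, so its parent disk's value is used. The function names are hypothetical, and the sketch only reports the sizes; it does not run make-bcache.

```python
import os
from pathlib import Path


def physical_block_size(dev: str, sysfs_root: str = "/sys/class/block") -> int:
    """Physical block size in bytes, as `blockdev --getpbsz` would report.

    Assumes the standard sysfs layout: /sys/class/block/<dev> is a symlink,
    and a partition (sdb1) has no queue/ of its own, so we fall back to
    its parent disk (sdb). Hypothetical helper for illustration.
    """
    node = Path(os.path.realpath(Path(sysfs_root) / dev))
    for candidate in (node, node.parent):
        attr = candidate / "queue" / "physical_block_size"
        if attr.is_file():
            return int(attr.read_text())
    raise FileNotFoundError(f"no queue/physical_block_size for {dev}")


def block_sizes_match(cache_dev: str, backing_dev: str,
                      sysfs_root: str = "/sys/class/block") -> tuple:
    """Return (cache size, backing size, whether they are equal)."""
    cache = physical_block_size(cache_dev, sysfs_root)
    backing = physical_block_size(backing_dev, sysfs_root)
    return cache, backing, cache == backing
```

For Igor's setup this would be `block_sizes_match("sdb1", "sda3")`; if both report 4096, that is the case where `-w 4k` applies.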


--
Eric Wheeler


> 
> The system in question is an oldie that restarts by mounting as much as
> it can readonly and then doing a reboot (when bcache would routinely
> whine about timeouts): the things it manages to remount readonly does
> not usually include / is bcache usually unhappy about this sort of
> thing? Would it prefer it if I was able to unmount it properly? If so,
> it's time to upgrade that machine... it's only that I've had an oops
> before now when stopping bcaches (early in test, not preserved because I
> thought I'd made a mistake), so I thought it might be unhappy about
> *that*, as well.
> 
> -- 
> NULL && (void)
> 


* Re: bcache crashes at boot time
  2017-06-22  0:47     ` Eric Wheeler
@ 2017-06-22 11:24       ` Nix
  0 siblings, 0 replies; 7+ messages in thread
From: Nix @ 2017-06-22 11:24 UTC
  To: Eric Wheeler; +Cc: kent.overstreet, Igor Pavlenko, linux-bcache

On 22 Jun 2017, Eric Wheeler verbalised:
> The backing device's physical block size must be the same as the
> cache device's, IIRC. If it is 4k, use -w 4k in make-bcache. There's a
> thread somewhere discussing those details.

They are both 4096, and my make_bcache line was

make_bcache -B /dev/md/fast -C /dev/ssd2 --block 4096 --bucket 2M --data-offset $((2*30720))

which should be appropriate, I think? (This was my second attempt: I
came out with a number of unprintable words when I found that
make_bcache doesn't try to figure out an appropriate data offset when
used atop a RAID array, so the default options misalign the entire
thing.)
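The offset arithmetic is easy to check. Assuming, as in bcache-tools, that the data offset is counted in 512-byte sectors and defaults to 16 sectors, `$((2*30720))` = 61440 sectors = 31,457,280 bytes = 30 MiB, a multiple of the 4096-byte block size. Whether it matches the RAID geometry depends on the array's alignment unit, which the message does not state; the sketch below treats 30720 sectors as that unit purely for illustration, and `offset_report` is a hypothetical helper.

```python
SECTOR_BYTES = 512  # assumed unit of make-bcache's data offset (sectors)


def offset_report(data_offset_sectors: int, block_bytes: int,
                  align_sectors: int) -> dict:
    """Check a backing-device data offset against the cache block size and
    a caller-supplied RAID alignment unit (in sectors)."""
    offset_bytes = data_offset_sectors * SECTOR_BYTES
    return {
        "offset_bytes": offset_bytes,
        "offset_mib": offset_bytes / 2**20,
        "block_aligned": offset_bytes % block_bytes == 0,
        "raid_aligned": data_offset_sectors % align_sectors == 0,
    }


# Nix's offset, treating 30720 sectors as the (hypothetical) RAID unit:
nix = offset_report(2 * 30720, block_bytes=4096, align_sectors=30720)
# The assumed 16-sector default, against the same unit:
default = offset_report(16, block_bytes=4096, align_sectors=30720)
```

With these inputs, `nix` gives 31457280 bytes (30.0 MiB), aligned on both counts, while `default` gives 8192 bytes, block-aligned but not aligned to the assumed RAID unit. That is consistent with the complaint that the default options misalign a RAID-backed device.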

(Something I didn't make clear enough is that there were several
successful restarts and remounts before the failed one.)

This is a writethrough cache, so it can't be a writeback problem (thank
goodness).

-- 
NULL && (void)

