From: Stefan Priebe <s.priebe@profihost.ag>
To: Eric Wheeler <bcache@lists.ewheeler.net>,
	Dan Merillat <dan.merillat@gmail.com>
Cc: linux-bcache@vger.kernel.org
Subject: Re: bcache fails after reboot if discard is enabled
Date: Wed, 08 Apr 2015 20:27:15 +0200	[thread overview]
Message-ID: <55257303.8020008@profihost.ag> (raw)
In-Reply-To: <alpine.DEB.2.02.1504081115550.2587@ware.dreamhost.com>


Am 08.04.2015 um 20:17 schrieb Eric Wheeler:
> Intentional top post:
>
> Anecdotally, I seem to remember someone else on the list having trouble
> using bcache when the backing device(s?) have TRIM enabled.

That was me, and I wasn't able to fix it. With TRIM enabled, rebooting
results in complete data loss with bcache.
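For anyone hitting this: bcache has a per-cache-device `discard` switch in sysfs, which the bcache documentation says defaults to off. Given the reports in this thread, a cautious option is to check that it is still off. Below is a minimal shell sketch. It assumes each registered cache set exposes `cache0`, `cache1`, ... links under `/sys/fs/bcache/<cset-uuid>/`, and that each link contains that `discard` file. `SYSFS_ROOT` and both function names are hypothetical; they exist only so the sketch can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch: inspect / disable bcache's per-cache-device discard switch.
# Assumes each registered cache set exposes cache0, cache1, ... links under
# /sys/fs/bcache/<cset-uuid>/, each containing a "discard" file (0 or 1).
# SYSFS_ROOT is a hypothetical override (defaults to /sys) for testing.

bcache_discard_report() {
    _bd_root="${SYSFS_ROOT:-/sys}"
    for _bd_f in "$_bd_root"/fs/bcache/*/cache[0-9]*/discard; do
        [ -e "$_bd_f" ] || continue            # glob unmatched: no cache sets
        printf '%s %s\n' "$_bd_f" "$(cat "$_bd_f")"
    done
}

bcache_discard_off() {
    _bd_root="${SYSFS_ROOT:-/sys}"
    for _bd_f in "$_bd_root"/fs/bcache/*/cache[0-9]*/discard; do
        [ -e "$_bd_f" ] || continue
        echo 0 > "$_bd_f"                      # requires root on a real system
    done
}
```

`bcache_discard_report` prints one `path value` line per cache device, and `bcache_discard_off` writes `0` to each one. Neither function repairs a cache that has already been damaged.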

Stefan

>
> -Eric
>
> --
> Eric Wheeler, President           eWheeler, Inc. dba Global Linux Security
> 888-LINUX26 (888-546-8926)        Fax: 503-716-3878           PO Box 25107
> www.GlobalLinuxSecurity.pro       Linux since 1996!     Portland, OR 97298
>
> On Tue, 7 Apr 2015, Dan Merillat wrote:
>
>>> It works perfectly fine here with the latest 3.18. My setup is backing a
>>> btrfs filesystem in write-back mode. I can reboot cleanly or hard-reset
>>> upon freezes, and so far I have had no issues and no data loss. Even after
>>> a hard-reset the kernel logs of both bcache and btrfs were clean and the
>>> filesystem was clean, apart from the usual btrfs recovery messages after an
>>> unclean shutdown.
>>>
>>> I wonder if the SSD and/or the block layer in use may be part of the
>>> problem:
>>>
>>>    * if putting bcache on LVM, discards may not be handled well
>>>    * if putting bcache or the backing fs on LVM, barriers may not be handled
>>>      well (bcache relies on perfectly working barriers)
>>>    * does the SSD support powerloss protection? (IOW, use capacitors)
>>>    * latest firmware applied? read the changelogs of it?
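You can check from sysfs whether discards actually reach the SSD through LVM/dm, which is the first point above. Each whole-disk or dm device has a `queue/discard_max_bytes` file, and a value of 0 means the block layer will not send discards to that device. `lsblk --discard` shows the same limit in its DISC-MAX column. Below is a rough sketch. `discard_capable` and `SYSFS_ROOT` are hypothetical names. It does not cover barriers or power-loss protection.

```shell
#!/bin/sh
# Sketch: report whether each whole disk / dm device accepts discards.
# Reads queue/discard_max_bytes; "0" means no discards are passed down.
# SYSFS_ROOT is a hypothetical override (defaults to /sys) for testing.

discard_capable() {
    _dc_root="${SYSFS_ROOT:-/sys}"
    for _dc_f in "$_dc_root"/block/*/queue/discard_max_bytes; do
        [ -e "$_dc_f" ] || continue            # glob unmatched: nothing to report
        _dc_dev=${_dc_f#"$_dc_root"/block/}    # strip the leading sysfs path
        _dc_dev=${_dc_dev%%/*}                 # keep only the device name
        if [ "$(cat "$_dc_f")" != 0 ]; then
            echo "$_dc_dev yes"
        else
            echo "$_dc_dev no"
        fi
    done
}
```

Suppose the SSD reports `yes` but the LVM volume stacked on it reports `no`. That would suggest discards issued above the volume are being dropped rather than passed through.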
>>>
>>> I'd try to first figure out these differences before looking further into
>>> debugging. I guess that most consumer-grade drives at least lack a few of
>>> the important features to use write-back mode, or use bcache at all.
>>>
>>> So, to start the list: My SSD is a Crucial MX100 128GB with discards enabled
>>> (for both bcache and btrfs), using plain raw devices (no LVM or MD
>>> involved). It supports TRIM (as my chipset does), and it supports powerloss-
>>> protection and maybe even some internal RAID-like data protection layer
>>> (whatever that is, it's in the papers).
>>>
>>> I'm not sure what a hard-reset technically means to the SSD but I guess it
>>> is handled as some sort of short powerloss. Reading through different SSD
>>> firmware update descriptions, I also see a lot of mentions of power-off
>>> and reset problems being fixed that could otherwise lead to data loss. That
>>> could be pretty fatal to bcache as it considers its storage as always unclean
>>> (probably even in write-through mode). Having damaged data blocks out of
>>> expected write order (barriers!) could be pretty bad when bcache recovers
>>> from last shutdown and replays logs.
>>
>> Samsung 840-EVO 256GB here, running 4.0-rc7 (was 3.18)
>>
>> There are no known issues with TRIM on an 840-EVO, and no powerloss or
>> anything of the sort occurred.  I was seeing excessive write
>> amplification on my SSD, and enabled discard - then my machine
>> promptly started lagging, eventually disk access locked up and after a
>> reboot I was confronted with:
>>
>> [  276.558692] bcache: journal_read_bucket() 157: too big, 552 bytes, offset 2047
>> [  276.571448] bcache: prio_read() bad csum reading priorities
>> [  276.571528] bcache: prio_read() bad magic reading priorities
>> [  276.576807] bcache: error on 804d6906-fa80-40ac-9081-a71a4d595378: bad btree header at bucket 65638, block 0, 0 keys, disabling caching
>> [  276.577457] bcache: register_cache() registered cache device sda4
>> [  276.577632] bcache: cache_set_free() Cache set 804d6906-fa80-40ac-9081-a71a4d595378 unregistered
>>
>> Attempting to check the backingstore (echo 1 > bcache/running):
>>
>> [  687.912987] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
>> [  687.913192] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
>> [  687.913231] BTRFS: failed to read tree root on bcache0
>> [  687.936073] BTRFS: open_ctree failed
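These transid messages fit a write-back backing device being run without its cache. btrfs wanted generation 613690 but found the older 613681. That is likely what you would see if the newest metadata was still dirty in the unreadable cache. The bcache documentation describes a backing-device `state` file with the values `no cache`, `clean`, `dirty` and `inconsistent`. It says `inconsistent` means the device was forcibly run while dirty data was cached, so its data has likely been corrupted.

Below is a hedged sketch that reads `state` before writing to `running`. `force_run_backing`, `SYSFS_ROOT` and the `--accept-data-loss` argument are all hypothetical. The sketch also assumes the `/sys/class/block/<dev>/bcache` path, so that partitions work as well as whole disks.

```shell
#!/bin/sh
# Sketch: force-start a bcache backing device, refusing by default when the
# (missing) cache still holds dirty write-back data for it.
# Assumes /sys/class/block/<dev>/bcache/{state,running} as described in the
# bcache docs. SYSFS_ROOT and --accept-data-loss are hypothetical.

force_run_backing() {
    _fr_dir="${SYSFS_ROOT:-/sys}/class/block/$1/bcache"
    _fr_state=$(cat "$_fr_dir/state") || return 1
    if [ "$_fr_state" = dirty ] && [ "$2" != "--accept-data-loss" ]; then
        echo "refusing: $1 has dirty data in the missing cache" >&2
        return 2
    fi
    echo 1 > "$_fr_dir/running"                # requires root on a real system
}
```

For example, `force_run_backing sdb` would refuse to start a device in the `dirty` state. Taking a read-only image of the backing device before forcing it is probably wise in that case.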
>>
>> The cache device is not going through LVM or anything of the sort, so
>> this is a direct failure of bcache.  Perhaps due to eraseblock
>> alignment and assumptions about sizes?  Either way, I've got a ton of
>> data to recover/restore now and I'm unhappy about it.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
