* bcache fails after reboot if discard is enabled
@ 2015-01-02  9:47 Stefan Priebe - Profihost AG
  2015-01-02 10:00 ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Priebe - Profihost AG @ 2015-01-02  9:47 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kent Overstreet

Hi,

while running a 3.10 or 3.18 kernel I have problems with discard enabled.
Strangely, the failures only appear after a reboot or a crash; the same
situations work fine without discard.

bcache fails completely on a reboot or crash when discard is enabled,
yet it works fine while "running".
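
(For reference, I enable discard on the cache devices through sysfs after
registering them, roughly like this; the exact path may differ by kernel
version and depends on whether the cache is a partition:

  echo 1 > /sys/block/sda/sda5/bcache/discard   # turn discard on
  cat /sys/block/sda/sda5/bcache/discard        # verify the setting
)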

After a reboot dmesg looks like this (for all 3 cache and all backing
devices):
bcache: prio_read() bad csum reading priorities
bcache: prio_read() bad magic reading priorities
bcache: prio_read() bad csum reading priorities
bcache: prio_read() bad magic reading priorities
bcache: prio_read() bad csum reading priorities
bcache: prio_read() bad magic reading priorities
WARNING: at drivers/md/bcache/alloc.c:79 bch_inc_gen+0x5f/0x70 [bcache]()
Modules linked in: bcache sg sd_mod crc32_pclmul ghash_clmulni_intel
isci(+) libsas ahci scsi_transport_sas libahci igb i2c_algo_bit i2c_core
ixgbe(O) ptp pps_core
CPU: 0 PID: 438 Comm: bcache-register Tainted: G           O 3.18.1 #1
 [<ffffffffa007305f>] bch_inc_gen+0x5f/0x70 [bcache]
 [<ffffffffa0073234>] __bch_invalidate_one_bucket+0x44/0xe0 [bcache]
 [<ffffffffa007ba06>] bch_initial_gc_finish+0xe6/0x190 [bcache]
 [<ffffffffa0093747>] ? bch_crc64+0x37/0x50 [bcache]
 [<ffffffffa008bb38>] run_cache_set+0x3c8/0x900 [bcache]
 [<ffffffffa008d517>] register_bcache+0xd37/0x13c0 [bcache]
bcache: error on d85a7b6f-50cf-4293-8f20-cdd16d5d16e0: key too stale:
97, need_gc 128, disabling caching
CPU: 1 PID: 438 Comm: bcache-register Tainted: G        W  O 3.18.1 #1
 [<ffffffffa00805e5>] bch_extent_bad+0x1b5/0x1c0 [bcache]
 [<ffffffffa0074c2a>] bch_ptr_bad+0xa/0x10 [bcache]
 [<ffffffffa00750e1>] btree_mergesort+0x2d1/0x560 [bcache]
 [<ffffffffa0074c20>] ? bch_ptr_invalid+0x10/0x10 [bcache]
 [<ffffffffa007571e>] ? bch_bset_init_next+0x8e/0xf0 [bcache]
 [<ffffffffa007712c>] ? bch_btree_iter_init+0x7c/0xc0 [bcache]
 [<ffffffffa0077705>] bch_btree_sort_into+0x55/0x80 [bcache]
 [<ffffffffa007b421>] btree_node_alloc_replacement+0x81/0xc0 [bcache]
 [<ffffffffa007bd1c>] btree_split+0xbc/0x6d0 [bcache]
 [<ffffffffa007c5ea>] bch_btree_insert_node+0x2ba/0x3a0 [bcache]
 [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
 [<ffffffffa007c6f8>] btree_insert_fn+0x28/0x50 [bcache]
 [<ffffffffa007b098>] bch_btree_map_nodes_recurse+0x38/0x160 [bcache]
 [<ffffffffa00762b7>] ? __bch_bset_search+0x187/0x4a0 [bcache]
 [<ffffffffa0080372>] ? bch_btree_ptr_invalid+0x12/0x20 [bcache]
 [<ffffffffa007acb8>] ? bch_btree_node_get+0x78/0x290 [bcache]
 [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
 [<ffffffffa007b133>] bch_btree_map_nodes_recurse+0xd3/0x160 [bcache]
 [<ffffffffa007ddf4>] __bch_btree_map_nodes+0x104/0x120 [bcache]
 [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
 [<ffffffffa007def1>] bch_btree_insert+0xe1/0x150 [bcache]
 [<ffffffffa008264a>] bch_journal_replay+0x12a/0x250 [bcache]
 [<ffffffffa0093747>] ? bch_crc64+0x37/0x50 [bcache]
 [<ffffffffa008bcdf>] run_cache_set+0x56f/0x900 [bcache]
 [<ffffffffa008d517>] register_bcache+0xd37/0x13c0 [bcache]
bcache: bch_journal_replay() journal replay done, 4390 keys in 57
entries, seq 2406219
bcache: register_cache() registered cache device sda5
bcache: prio_read() bad csum reading priorities
bcache: prio_read() bad magic reading priorities
bcache: prio_read() bad csum reading priorities
bcache: prio_read() bad magic reading priorities
bcache: prio_read() bad csum reading priorities
bcache: prio_read() bad magic reading priorities
bcache: error on 157e5dc9-4017-410b-b1f6-450385345128: key too stale:
107, need_gc 128, disabling caching
CPU: 1 PID: 447 Comm: bcache-register Tainted: G        W  O 3.18.1 #1
 [<ffffffffa00805e5>] bch_extent_bad+0x1b5/0x1c0 [bcache]
 [<ffffffffa0074c2a>] bch_ptr_bad+0xa/0x10 [bcache]
 [<ffffffffa00750e1>] btree_mergesort+0x2d1/0x560 [bcache]
 [<ffffffffa0074c20>] ? bch_ptr_invalid+0x10/0x10 [bcache]
 [<ffffffffa007571e>] ? bch_bset_init_next+0x8e/0xf0 [bcache]
 [<ffffffffa007712c>] ? bch_btree_iter_init+0x7c/0xc0 [bcache]
 [<ffffffffa0077705>] bch_btree_sort_into+0x55/0x80 [bcache]
 [<ffffffffa007b421>] btree_node_alloc_replacement+0x81/0xc0 [bcache]
 [<ffffffffa007bd1c>] btree_split+0xbc/0x6d0 [bcache]
 [<ffffffffa007c5ea>] bch_btree_insert_node+0x2ba/0x3a0 [bcache]
 [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
 [<ffffffffa007c6f8>] btree_insert_fn+0x28/0x50 [bcache]
 [<ffffffffa007b098>] bch_btree_map_nodes_recurse+0x38/0x160 [bcache]
 [<ffffffffa00762b7>] ? __bch_bset_search+0x187/0x4a0 [bcache]
 [<ffffffffa0080372>] ? bch_btree_ptr_invalid+0x12/0x20 [bcache]
 [<ffffffffa007acb8>] ? bch_btree_node_get+0x78/0x290 [bcache]
 [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
 [<ffffffffa007b133>] bch_btree_map_nodes_recurse+0xd3/0x160 [bcache]
 [<ffffffffa007ddf4>] __bch_btree_map_nodes+0x104/0x120 [bcache]
 [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
 [<ffffffffa007def1>] bch_btree_insert+0xe1/0x150 [bcache]
 [<ffffffffa008264a>] bch_journal_replay+0x12a/0x250 [bcache]
 [<ffffffffa009374f>] ? bch_crc64+0x3f/0x50 [bcache]
 [<ffffffffa008bcdf>] run_cache_set+0x56f/0x900 [bcache]
 [<ffffffffa008d517>] register_bcache+0xd37/0x13c0 [bcache]
bcache: bch_journal_replay() journal replay done, 4355 keys in 56
entries, seq 435045
bcache: register_cache() registered cache device sdb3
bcache: register_bdev() registered backing device sdd1
bcache: bch_cached_dev_attach() Can't attach sdd1: shutting down
bcache: register_bdev() registered backing device sdc1
bcache: register_bdev() registered backing device sde1
bcache: bch_cached_dev_attach() Can't attach sde1: shutting down
bcache: cache_set_free() Cache set d85a7b6f-50cf-4293-8f20-cdd16d5d16e0
unregistered
bcache: prio_read() bad csum reading priorities
bcache: prio_read() bad magic reading priorities
bcache: error on b755d45b-9fa1-490f-9eca-6b739618aaf1: accessing
priorities, disabling caching
bcache: prio_read() bad csum reading priorities
bcache: prio_read() bad magic reading priorities
bcache: prio_read() bad csum reading priorities
bcache: prio_read() bad magic reading priorities
bcache: register_cache() registered cache device sdi5
bcache: cache_set_free() Cache set b755d45b-9fa1-490f-9eca-6b739618aaf1
unregistered
bcache: register_bdev() registered backing device sdf1
bcache: register_bdev() registered backing device sdh1
bcache: bch_cached_dev_attach() Can't attach sdh1: shutting down
bcache: register_bdev() registered backing device sdg1
bcache: bch_cached_dev_attach() Can't attach sdg1: shutting down
bcache: error on d85a7b6f-50cf-4293-8f20-cdd16d5d16e0: key too stale:
105, need_gc 128, disabling caching
CPU: 1 PID: 1184 Comm: bcache-register Tainted: G        W  O 3.18.1 #1
 [<ffffffffa00805e5>] bch_extent_bad+0x1b5/0x1c0 [bcache]
 [<ffffffffa0074c2a>] bch_ptr_bad+0xa/0x10 [bcache]
 [<ffffffffa00750e1>] btree_mergesort+0x2d1/0x560 [bcache]
 [<ffffffffa0074c20>] ? bch_ptr_invalid+0x10/0x10 [bcache]
 [<ffffffffa007571e>] ? bch_bset_init_next+0x8e/0xf0 [bcache]
 [<ffffffffa007712c>] ? bch_btree_iter_init+0x7c/0xc0 [bcache]
 [<ffffffffa0077705>] bch_btree_sort_into+0x55/0x80 [bcache]
 [<ffffffffa007b421>] btree_node_alloc_replacement+0x81/0xc0 [bcache]
 [<ffffffffa007bd1c>] btree_split+0xbc/0x6d0 [bcache]
 [<ffffffffa007c5ea>] bch_btree_insert_node+0x2ba/0x3a0 [bcache]
 [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
 [<ffffffffa007c6f8>] btree_insert_fn+0x28/0x50 [bcache]
 [<ffffffffa007b098>] bch_btree_map_nodes_recurse+0x38/0x160 [bcache]
 [<ffffffffa00762b7>] ? __bch_bset_search+0x187/0x4a0 [bcache]
 [<ffffffffa0080372>] ? bch_btree_ptr_invalid+0x12/0x20 [bcache]
 [<ffffffffa007acb8>] ? bch_btree_node_get+0x78/0x290 [bcache]
 [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
 [<ffffffffa007b133>] bch_btree_map_nodes_recurse+0xd3/0x160 [bcache]
 [<ffffffffa007ddf4>] __bch_btree_map_nodes+0x104/0x120 [bcache]
 [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
 [<ffffffffa007def1>] bch_btree_insert+0xe1/0x150 [bcache]
 [<ffffffffa008264a>] bch_journal_replay+0x12a/0x250 [bcache]
 [<ffffffffa008bcdf>] run_cache_set+0x56f/0x900 [bcache]
 [<ffffffffa008d517>] register_bcache+0xd37/0x13c0 [bcache]
bcache: bch_journal_replay() journal replay done, 4390 keys in 58
entries, seq 2406220
bcache: bch_cached_dev_attach() Can't attach sde1: shutting down
bcache: bch_cached_dev_attach() Can't attach sdd1: shutting down
bcache: register_cache() registered cache device sda5
bcache: prio_read() bad csum reading priorities
bcache: prio_read() bad magic reading priorities
bcache: error on b755d45b-9fa1-490f-9eca-6b739618aaf1: accessing
priorities, disabling caching
bcache: prio_read() bad csum reading priorities
bcache: prio_read() bad magic reading priorities
bcache: prio_read() bad csum reading priorities
bcache: prio_read() bad magic reading priorities
bcache: register_cache() registered cache device sdi5
bcache: cache_set_free() Cache set b755d45b-9fa1-490f-9eca-6b739618aaf1
unregistered
bcache: cache_set_free() Cache set d85a7b6f-50cf-4293-8f20-cdd16d5d16e0
unregistered
bcache: cache_set_free() Cache set 157e5dc9-4017-410b-b1f6-450385345128
unregistered

Stefan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-01-02  9:47 bcache fails after reboot if discard is enabled Stefan Priebe - Profihost AG
@ 2015-01-02 10:00 ` Stefan Priebe - Profihost AG
  2015-01-03 16:32   ` Rolf Fokkens
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Priebe - Profihost AG @ 2015-01-02 10:00 UTC (permalink / raw)
  To: linux-bcache; +Cc: Kent Overstreet

I'm sorry the backtraces were incomplete.

Here is a complete one:
[    8.191781] CPU: 1 PID: 1184 Comm: bcache-register Tainted: G
W  O 3.10.63+96-ph #1
[    8.191783] Hardware name: Supermicro
X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0c
10/08/2012
[    8.191784]  ffffc90017cb0000 ffff880c315a1608 ffffffff8154fad2
ffff880c315a1638
[    8.191786]  ffffffffa00805e5 ffff880c315a165e ffff880c3cffa8c8
ffff880c31340680
[    8.191788]  ffff880c3a440530 ffff880c315a1648 ffffffffa0074c2a
ffff880c315a1708
[    8.191790] Call Trace:
[    8.191795]  [<ffffffff8154fad2>] dump_stack+0x19/0x1b
[    8.191802]  [<ffffffffa00805e5>] bch_extent_bad+0x1b5/0x1c0 [bcache]
[    8.191806]  [<ffffffffa0074c2a>] bch_ptr_bad+0xa/0x10 [bcache]
[    8.191809]  [<ffffffffa00750e1>] btree_mergesort+0x2d1/0x560 [bcache]
[    8.191813]  [<ffffffffa0074c20>] ? bch_ptr_invalid+0x10/0x10 [bcache]
[    8.191816]  [<ffffffff8137e800>] ? get_random_bytes+0x20/0x30
[    8.191820]  [<ffffffffa007571e>] ? bch_bset_init_next+0x8e/0xf0 [bcache]
[    8.191823]  [<ffffffffa007712c>] ? bch_btree_iter_init+0x7c/0xc0
[bcache]
[    8.191827]  [<ffffffffa0077705>] bch_btree_sort_into+0x55/0x80 [bcache]
[    8.191830]  [<ffffffff810706ab>] ? prepare_to_wait+0x5b/0x90
[    8.191833]  [<ffffffffa007b421>]
btree_node_alloc_replacement+0x81/0xc0 [bcache]
[    8.191837]  [<ffffffffa007bd1c>] btree_split+0xbc/0x6d0 [bcache]
[    8.191840]  [<ffffffff81083f16>] ? find_busiest_group+0x36/0x4a0
[    8.191843]  [<ffffffffa007c5ea>] bch_btree_insert_node+0x2ba/0x3a0
[bcache]
[    8.191847]  [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0
[bcache]
[    8.191850]  [<ffffffffa007c6f8>] btree_insert_fn+0x28/0x50 [bcache]
[    8.191853]  [<ffffffffa007b098>]
bch_btree_map_nodes_recurse+0x38/0x160 [bcache]
[    8.191857]  [<ffffffffa00762b7>] ? __bch_bset_search+0x187/0x4a0
[bcache]
[    8.191861]  [<ffffffffa0080372>] ? bch_btree_ptr_invalid+0x12/0x20
[bcache]
[    8.191864]  [<ffffffffa007acb8>] ? bch_btree_node_get+0x78/0x290
[bcache]
[    8.191868]  [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0
[bcache]
[    8.191871]  [<ffffffffa007b133>]
bch_btree_map_nodes_recurse+0xd3/0x160 [bcache]
[    8.191875]  [<ffffffffa007ddf4>] __bch_btree_map_nodes+0x104/0x120
[bcache]
[    8.191878]  [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0
[bcache]
[    8.191881]  [<ffffffffa007def1>] bch_btree_insert+0xe1/0x150 [bcache]
[    8.191883]  [<ffffffff81070430>] ? finish_wait+0x80/0x80
[    8.191887]  [<ffffffffa008264a>] bch_journal_replay+0x12a/0x250 [bcache]
[    8.191889]  [<ffffffff8107c29d>] ? ttwu_do_wakeup+0x1d/0xe0
[    8.191891]  [<ffffffff8107e8fc>] ? try_to_wake_up+0x20c/0x2e0
[    8.191893]  [<ffffffff8107ea37>] ? wake_up_process+0x27/0x50
[    8.191898]  [<ffffffffa008bcdf>] run_cache_set+0x56f/0x900 [bcache]
[    8.191902]  [<ffffffffa008d517>] register_bcache+0xd37/0x13c0 [bcache]
[    8.191904]  [<ffffffff8111510f>] ? handle_mm_fault+0x2cf/0x400
[    8.191907]  [<ffffffff812b83df>] kobj_attr_store+0xf/0x20
[    8.191909]  [<ffffffff811bf480>] sysfs_write_file+0xd0/0x150
[    8.191911]  [<ffffffff81151fc5>] vfs_write+0xc5/0x1f0
[    8.191913]  [<ffffffff811524b2>] SyS_write+0x52/0xa0
[    8.191915]  [<ffffffff81032ece>] ? do_page_fault+0xe/0x10
[    8.191917]  [<ffffffff81555a12>] system_call_fastpath+0x16/0x1b
Am 02.01.2015 um 10:47 schrieb Stefan Priebe - Profihost AG:
> Hi,
> 
> while running 3.10 or 3.18 kernel i've problems enabling discard.
> Strangely this only appears on reboot or crash. While these situations
> work fine without discard.
> 
> bcache completely fails when discard is enabled for reboot or crash.
> Strangely it works fine while "running".
> 
> After a reboot dmesg looks like this (for all 3 cache and all backing
> devices):
> bcache: prio_read() bad csum reading priorities
> bcache: prio_read() bad magic reading priorities
> bcache: prio_read() bad csum reading priorities
> bcache: prio_read() bad magic reading priorities
> bcache: prio_read() bad csum reading priorities
> bcache: prio_read() bad magic reading priorities
> WARNING: at drivers/md/bcache/alloc.c:79 bch_inc_gen+0x5f/0x70 [bcache]()
> Modules linked in: bcache sg sd_mod crc32_pclmul ghash_clmulni_intel
> isci(+) libsas ahci scsi_transport_sas libahci igb i2c_algo_bit i2c_core
> ixgbe(O) ptp pps_core
> CPU: 0 PID: 438 Comm: bcache-register Tainted: G           O 3.18.1 #1
>  [<ffffffffa007305f>] bch_inc_gen+0x5f/0x70 [bcache]
>  [<ffffffffa0073234>] __bch_invalidate_one_bucket+0x44/0xe0 [bcache]
>  [<ffffffffa007ba06>] bch_initial_gc_finish+0xe6/0x190 [bcache]
>  [<ffffffffa0093747>] ? bch_crc64+0x37/0x50 [bcache]
>  [<ffffffffa008bb38>] run_cache_set+0x3c8/0x900 [bcache]
>  [<ffffffffa008d517>] register_bcache+0xd37/0x13c0 [bcache]
> bcache: error on d85a7b6f-50cf-4293-8f20-cdd16d5d16e0: key too stale:
> 97, need_gc 128, disabling caching
> CPU: 1 PID: 438 Comm: bcache-register Tainted: G        W  O 3.18.1 #1
>  [<ffffffffa00805e5>] bch_extent_bad+0x1b5/0x1c0 [bcache]
>  [<ffffffffa0074c2a>] bch_ptr_bad+0xa/0x10 [bcache]
>  [<ffffffffa00750e1>] btree_mergesort+0x2d1/0x560 [bcache]
>  [<ffffffffa0074c20>] ? bch_ptr_invalid+0x10/0x10 [bcache]
>  [<ffffffffa007571e>] ? bch_bset_init_next+0x8e/0xf0 [bcache]
>  [<ffffffffa007712c>] ? bch_btree_iter_init+0x7c/0xc0 [bcache]
>  [<ffffffffa0077705>] bch_btree_sort_into+0x55/0x80 [bcache]
>  [<ffffffffa007b421>] btree_node_alloc_replacement+0x81/0xc0 [bcache]
>  [<ffffffffa007bd1c>] btree_split+0xbc/0x6d0 [bcache]
>  [<ffffffffa007c5ea>] bch_btree_insert_node+0x2ba/0x3a0 [bcache]
>  [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
>  [<ffffffffa007c6f8>] btree_insert_fn+0x28/0x50 [bcache]
>  [<ffffffffa007b098>] bch_btree_map_nodes_recurse+0x38/0x160 [bcache]
>  [<ffffffffa00762b7>] ? __bch_bset_search+0x187/0x4a0 [bcache]
>  [<ffffffffa0080372>] ? bch_btree_ptr_invalid+0x12/0x20 [bcache]
>  [<ffffffffa007acb8>] ? bch_btree_node_get+0x78/0x290 [bcache]
>  [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
>  [<ffffffffa007b133>] bch_btree_map_nodes_recurse+0xd3/0x160 [bcache]
>  [<ffffffffa007ddf4>] __bch_btree_map_nodes+0x104/0x120 [bcache]
>  [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
>  [<ffffffffa007def1>] bch_btree_insert+0xe1/0x150 [bcache]
>  [<ffffffffa008264a>] bch_journal_replay+0x12a/0x250 [bcache]
>  [<ffffffffa0093747>] ? bch_crc64+0x37/0x50 [bcache]
>  [<ffffffffa008bcdf>] run_cache_set+0x56f/0x900 [bcache]
>  [<ffffffffa008d517>] register_bcache+0xd37/0x13c0 [bcache]
> bcache: bch_journal_replay() journal replay done, 4390 keys in 57
> entries, seq 2406219
> bcache: register_cache() registered cache device sda5
> bcache: prio_read() bad csum reading priorities
> bcache: prio_read() bad magic reading priorities
> bcache: prio_read() bad csum reading priorities
> bcache: prio_read() bad magic reading priorities
> bcache: prio_read() bad csum reading priorities
> bcache: prio_read() bad magic reading priorities
> bcache: error on 157e5dc9-4017-410b-b1f6-450385345128: key too stale:
> 107, need_gc 128, disabling caching
> CPU: 1 PID: 447 Comm: bcache-register Tainted: G        W  O 3.18.1 #1
>  [<ffffffffa00805e5>] bch_extent_bad+0x1b5/0x1c0 [bcache]
>  [<ffffffffa0074c2a>] bch_ptr_bad+0xa/0x10 [bcache]
>  [<ffffffffa00750e1>] btree_mergesort+0x2d1/0x560 [bcache]
>  [<ffffffffa0074c20>] ? bch_ptr_invalid+0x10/0x10 [bcache]
>  [<ffffffffa007571e>] ? bch_bset_init_next+0x8e/0xf0 [bcache]
>  [<ffffffffa007712c>] ? bch_btree_iter_init+0x7c/0xc0 [bcache]
>  [<ffffffffa0077705>] bch_btree_sort_into+0x55/0x80 [bcache]
>  [<ffffffffa007b421>] btree_node_alloc_replacement+0x81/0xc0 [bcache]
>  [<ffffffffa007bd1c>] btree_split+0xbc/0x6d0 [bcache]
>  [<ffffffffa007c5ea>] bch_btree_insert_node+0x2ba/0x3a0 [bcache]
>  [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
>  [<ffffffffa007c6f8>] btree_insert_fn+0x28/0x50 [bcache]
>  [<ffffffffa007b098>] bch_btree_map_nodes_recurse+0x38/0x160 [bcache]
>  [<ffffffffa00762b7>] ? __bch_bset_search+0x187/0x4a0 [bcache]
>  [<ffffffffa0080372>] ? bch_btree_ptr_invalid+0x12/0x20 [bcache]
>  [<ffffffffa007acb8>] ? bch_btree_node_get+0x78/0x290 [bcache]
>  [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
>  [<ffffffffa007b133>] bch_btree_map_nodes_recurse+0xd3/0x160 [bcache]
>  [<ffffffffa007ddf4>] __bch_btree_map_nodes+0x104/0x120 [bcache]
>  [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
>  [<ffffffffa007def1>] bch_btree_insert+0xe1/0x150 [bcache]
>  [<ffffffffa008264a>] bch_journal_replay+0x12a/0x250 [bcache]
>  [<ffffffffa009374f>] ? bch_crc64+0x3f/0x50 [bcache]
>  [<ffffffffa008bcdf>] run_cache_set+0x56f/0x900 [bcache]
>  [<ffffffffa008d517>] register_bcache+0xd37/0x13c0 [bcache]
> bcache: bch_journal_replay() journal replay done, 4355 keys in 56
> entries, seq 435045
> bcache: register_cache() registered cache device sdb3
> bcache: register_bdev() registered backing device sdd1
> bcache: bch_cached_dev_attach() Can't attach sdd1: shutting down
> bcache: register_bdev() registered backing device sdc1
> bcache: register_bdev() registered backing device sde1
> bcache: bch_cached_dev_attach() Can't attach sde1: shutting down
> bcache: cache_set_free() Cache set d85a7b6f-50cf-4293-8f20-cdd16d5d16e0
> unregistered
> bcache: prio_read() bad csum reading priorities
> bcache: prio_read() bad magic reading priorities
> bcache: error on b755d45b-9fa1-490f-9eca-6b739618aaf1: accessing
> priorities, disabling caching
> bcache: prio_read() bad csum reading priorities
> bcache: prio_read() bad magic reading priorities
> bcache: prio_read() bad csum reading priorities
> bcache: prio_read() bad magic reading priorities
> bcache: register_cache() registered cache device sdi5
> bcache: cache_set_free() Cache set b755d45b-9fa1-490f-9eca-6b739618aaf1
> unregistered
> bcache: register_bdev() registered backing device sdf1
> bcache: register_bdev() registered backing device sdh1
> bcache: bch_cached_dev_attach() Can't attach sdh1: shutting down
> bcache: register_bdev() registered backing device sdg1
> bcache: bch_cached_dev_attach() Can't attach sdg1: shutting down
> bcache: error on d85a7b6f-50cf-4293-8f20-cdd16d5d16e0: key too stale:
> 105, need_gc 128, disabling caching
> CPU: 1 PID: 1184 Comm: bcache-register Tainted: G        W  O 3.18.1 #1
>  [<ffffffffa00805e5>] bch_extent_bad+0x1b5/0x1c0 [bcache]
>  [<ffffffffa0074c2a>] bch_ptr_bad+0xa/0x10 [bcache]
>  [<ffffffffa00750e1>] btree_mergesort+0x2d1/0x560 [bcache]
>  [<ffffffffa0074c20>] ? bch_ptr_invalid+0x10/0x10 [bcache]
>  [<ffffffffa007571e>] ? bch_bset_init_next+0x8e/0xf0 [bcache]
>  [<ffffffffa007712c>] ? bch_btree_iter_init+0x7c/0xc0 [bcache]
>  [<ffffffffa0077705>] bch_btree_sort_into+0x55/0x80 [bcache]
>  [<ffffffffa007b421>] btree_node_alloc_replacement+0x81/0xc0 [bcache]
>  [<ffffffffa007bd1c>] btree_split+0xbc/0x6d0 [bcache]
>  [<ffffffffa007c5ea>] bch_btree_insert_node+0x2ba/0x3a0 [bcache]
>  [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
>  [<ffffffffa007c6f8>] btree_insert_fn+0x28/0x50 [bcache]
>  [<ffffffffa007b098>] bch_btree_map_nodes_recurse+0x38/0x160 [bcache]
>  [<ffffffffa00762b7>] ? __bch_bset_search+0x187/0x4a0 [bcache]
>  [<ffffffffa0080372>] ? bch_btree_ptr_invalid+0x12/0x20 [bcache]
>  [<ffffffffa007acb8>] ? bch_btree_node_get+0x78/0x290 [bcache]
>  [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
>  [<ffffffffa007b133>] bch_btree_map_nodes_recurse+0xd3/0x160 [bcache]
>  [<ffffffffa007ddf4>] __bch_btree_map_nodes+0x104/0x120 [bcache]
>  [<ffffffffa007c6d0>] ? bch_btree_insert_node+0x3a0/0x3a0 [bcache]
>  [<ffffffffa007def1>] bch_btree_insert+0xe1/0x150 [bcache]
>  [<ffffffffa008264a>] bch_journal_replay+0x12a/0x250 [bcache]
>  [<ffffffffa008bcdf>] run_cache_set+0x56f/0x900 [bcache]
>  [<ffffffffa008d517>] register_bcache+0xd37/0x13c0 [bcache]
> bcache: bch_journal_replay() journal replay done, 4390 keys in 58
> entries, seq 2406220
> bcache: bch_cached_dev_attach() Can't attach sde1: shutting down
> bcache: bch_cached_dev_attach() Can't attach sdd1: shutting down
> bcache: register_cache() registered cache device sda5
> bcache: prio_read() bad csum reading priorities
> bcache: prio_read() bad magic reading priorities
> bcache: error on b755d45b-9fa1-490f-9eca-6b739618aaf1: accessing
> priorities, disabling caching
> bcache: prio_read() bad csum reading priorities
> bcache: prio_read() bad magic reading priorities
> bcache: prio_read() bad csum reading priorities
> bcache: prio_read() bad magic reading priorities
> bcache: register_cache() registered cache device sdi5
> bcache: cache_set_free() Cache set b755d45b-9fa1-490f-9eca-6b739618aaf1
> unregistered
> bcache: cache_set_free() Cache set d85a7b6f-50cf-4293-8f20-cdd16d5d16e0
> unregistered
> bcache: cache_set_free() Cache set 157e5dc9-4017-410b-b1f6-450385345128
> unregistered
> 
> Stefan
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-01-02 10:00 ` Stefan Priebe - Profihost AG
@ 2015-01-03 16:32   ` Rolf Fokkens
  2015-01-03 19:32     ` Stefan Priebe
  0 siblings, 1 reply; 25+ messages in thread
From: Rolf Fokkens @ 2015-01-03 16:32 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, linux-bcache; +Cc: Kent Overstreet

I've been using discard for a while, but I ran into serious FS
corruption a few times. After disabling discard, bcache was stable again.

So far I attributed the corruption to a low-cost SSD which probably
didn't handle discard very well, but that was only an assumption.

I didn't experience specific reboot problems like you describe.

On 01/02/2015 11:00 AM, Stefan Priebe - Profihost AG wrote:
> I'm sorry the backtraces were incomplete.
>
> Am 02.01.2015 um 10:47 schrieb Stefan Priebe - Profihost AG:
>> Hi,
>>
>> while running 3.10 or 3.18 kernel i've problems enabling discard.
>> Strangely this only appears on reboot or crash. While these situations
>> work fine without discard.
>>
>> bcache completely fails when discard is enabled for reboot or crash.
>> Strangely it works fine while "running".

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-01-03 16:32   ` Rolf Fokkens
@ 2015-01-03 19:32     ` Stefan Priebe
  2015-01-05  0:06       ` Michael Goertz
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Priebe @ 2015-01-03 19:32 UTC (permalink / raw)
  To: Rolf Fokkens, linux-bcache, Kent Overstreet

Hi Rolf,
Am 03.01.2015 um 17:32 schrieb Rolf Fokkens:
> I've been using discard for a while, but I ran a few times in serious FS
> corruptions. After disabling discard bcache was stable again.
>
> So far I tributed the corruptions to a low-cost SSD which probably
> didn't handle discard very well. But this was only an assumptions.
>
> I didn't experience specific reboot problems like you describe.

A reboot just triggers it faster; filesystem crashes can occur as well. The
errors above are just examples.

I've now disabled discard in my kernel build.

Kent, could you have a look at the 3.18 kernel code regarding discard?

Greets,
Stefan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-01-03 19:32     ` Stefan Priebe
@ 2015-01-05  0:06       ` Michael Goertz
  2015-02-09 19:46         ` Kai Krakow
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Goertz @ 2015-01-05  0:06 UTC (permalink / raw)
  To: linux-bcache

Stefan Priebe <s.priebe <at> profihost.ag> writes:

> 
> Hi Rolf,
> Am 03.01.2015 um 17:32 schrieb Rolf Fokkens:
> > I've been using discard for a while, but I ran a few times in serious FS
> > corruptions. After disabling discard bcache was stable again.
> >
> > So far I tributed the corruptions to a low-cost SSD which probably
> > didn't handle discard very well. But this was only an assumptions.
> >
> > I didn't experience specific reboot problems like you describe.
> 
> Reboot just triggers it faster and even fs crashes can occur the errors 
> are just examples.
> 
> I've now disabled discards in my kernel code.
> 
> Kent may you have a look at the 3.18 kernel code regading discards?
> 
> Greets,
> Stefan
> 

I just started using bcache and ran into this same issue after a reboot.  I 
was running in writeback mode at the time and suffered some FS loss as a 
result.  I don't have backtraces since my machine wouldn't boot.  I 
recovered by removing the cache device and forcing the backing device to run 
without it.
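
(Roughly the steps involved, from memory; device names and the cache set
UUID are placeholders, and the sysfs paths may differ by kernel version:

  echo 1 > /sys/fs/bcache/<cset-uuid>/unregister   # drop the broken cache set
  echo 1 > /sys/block/sdb/bcache/running           # force-run the backing device
  # /dev/bcache0 then comes up uncached and can be fsck'd and mounted
)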

I am running Ubuntu 14.04 with the Utopic kernel.  I can provide more 
details of my setup and hardware if that's helpful.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-01-05  0:06       ` Michael Goertz
@ 2015-02-09 19:46         ` Kai Krakow
  2015-04-08  0:06           ` Dan Merillat
  0 siblings, 1 reply; 25+ messages in thread
From: Kai Krakow @ 2015-02-09 19:46 UTC (permalink / raw)
  To: linux-bcache

Michael Goertz <goertzm@gmail.com> schrieb:

> Stefan Priebe <s.priebe <at> profihost.ag> writes:
> 
>> 
>> Hi Rolf,
>> Am 03.01.2015 um 17:32 schrieb Rolf Fokkens:
>> > I've been using discard for a while, but I ran a few times in serious
>> > FS corruptions. After disabling discard bcache was stable again.
>> >
>> > So far I tributed the corruptions to a low-cost SSD which probably
>> > didn't handle discard very well. But this was only an assumptions.
>> >
>> > I didn't experience specific reboot problems like you describe.
>> 
>> Reboot just triggers it faster and even fs crashes can occur the errors
>> are just examples.
>> 
>> I've now disabled discards in my kernel code.
>> 
>> Kent may you have a look at the 3.18 kernel code regading discards?
>> 
>> Greets,
>> Stefan
>> 
> 
> I just started using bcache and run into this same issue after a reboot. 
> I was running in writeback mode at the time and run into some FS loss as a
> result.  I don't have back traces since my machine wouldn't boot.  I
> recovered by removing the cache device and forcing the backing device to
> run without it.
> 
> I am running Ubuntu 14.04 with the Utopic kernel.  I can provide more
> details of my setup and hardware if that's helpful.

It works perfectly fine here with the latest 3.18. My setup is backing a 
btrfs filesystem in write-back mode. I can reboot cleanly and hard-reset 
upon freezes; I have had no issues and no data loss yet. Even after a hard 
reset the kernel logs of both bcache and btrfs were clean, the filesystem 
was clean, just the usual btrfs recovery messages after an unclean shutdown.

I wonder if the SSD and/or the block layer in use may be part of the 
problem:

  * if putting bcache on LVM, discards may not be handled well
  * if putting bcache or the backing fs on LVM, barriers may not be handled
    well (bcache relies on perfectly working barriers)
  * does the SSD support powerloss protection? (IOW, use capacitors)
  * latest firmware applied? read the changelogs of it?

I'd first try to figure out these differences before looking further into 
debugging. I guess that most consumer-grade drives lack at least a few of 
the features that are important for write-back mode, or for using bcache at 
all.

So, to start the list: My SSD is a Crucial MX100 128GB with discards enabled 
(for both bcache and btrfs), using plain raw devices (no LVM or MD 
involved). It supports TRIM (as my chipset does), and it supports powerloss-
protection and maybe even some internal RAID-like data protection layer 
(whatever that is, it's in the papers).
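
(For comparison, this is roughly how those last two points can be checked;
output formats vary by drive and tool version:

  hdparm -I /dev/sda | grep -i trim        # does the drive advertise TRIM?
  smartctl -i /dev/sda | grep -i firmware  # firmware revision vs. vendor changelog
)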

I'm not sure what a hard reset technically means to the SSD, but I guess it 
is handled as some sort of short power loss. Reading through various SSD 
firmware update descriptions, I also see a lot of words about power-off and 
reset problems being fixed that could otherwise lead to data loss. That 
could be pretty fatal to bcache, as it treats its storage as always unclean 
(probably even in write-through mode). Having data blocks damaged out of 
the expected write order (barriers!) could be pretty bad when bcache 
recovers from the last shutdown and replays its journal.

-- 
Replies to list only preferred.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-02-09 19:46         ` Kai Krakow
@ 2015-04-08  0:06           ` Dan Merillat
  2015-04-08 18:17             ` Eric Wheeler
                               ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Dan Merillat @ 2015-04-08  0:06 UTC (permalink / raw)
  To: linux-bcache

> It works perfectly fine here with latest 3.18. My setup is backing a btrfs
> filesystem in write-back mode. I can reboot cleanly, hard-reset upon
> freezes, I had no issues yet and no data loss. Even after hard-reset the
> kernel logs of both bcache and btrfs were clean, the filesystem was clean,
> just the usual btrfs recovery messages after an unclean shutdown.
>
> I wonder if the SSD and/or the block layer in use may be part of the
> problem:
>
>   * if putting bcache on LVM, discards may not be handled well
>   * if putting bcache or the backing fs on LVM, barriers may not be handled
>     well (bcache relies on perfectly working barriers)
>   * does the SSD support powerloss protection? (IOW, use capacitors)
>   * latest firmware applied? read the changelogs of it?
>
> I'd try to first figure out these differences before looking further into
> debugging. I guess that most consumer-grade drives at least lack a few of
> the important features to use write-back mode, or use bcache at all.
>
> So, to start the list: My SSD is a Crucial MX100 128GB with discards enabled
> (for both bcache and btrfs), using plain raw devices (no LVM or MD
> involved). It supports TRIM (as my chipset does), and it supports powerloss-
> protection and maybe even some internal RAID-like data protection layer
> (whatever that is, it's in the papers).
>
> I'm not sure what a hard-reset technically means to the SSD but I guess it
> is handled as some sort of short powerloss. Reading through different SSD
> firmware update descriptions, I also see a lot words around power-off and
> reset problems being fixed that could lead to data-loss otherwise. That
> could be pretty fatal to bcache as it considers it storage as always unclean
> (probably even in write-through mode). Having damaged data blocks out of
> expected write order (barriers!) could be pretty bad when bcache recovers
> from last shutdown and replays logs.

Samsung 840-EVO 256GB here, running 4.0-rc7 (was 3.18)

There are no known issues with TRIM on an 840-EVO, and no power loss or
anything of the sort occurred.  I was seeing excessive write
amplification on my SSD, so I enabled discard; then my machine
promptly started lagging, eventually disk access locked up, and after a
reboot I was confronted with:

[  276.558692] bcache: journal_read_bucket() 157: too big, 552 bytes,
offset 2047
[  276.571448] bcache: prio_read() bad csum reading priorities
[  276.571528] bcache: prio_read() bad magic reading priorities
[  276.576807] bcache: error on 804d6906-fa80-40ac-9081-a71a4d595378:
bad btree header at bucket 65638, block 0, 0 keys, disabling caching
[  276.577457] bcache: register_cache() registered cache device sda4
[  276.577632] bcache: cache_set_free() Cache set
804d6906-fa80-40ac-9081-a71a4d595378 unregistered

Attempting to check the backingstore (echo 1 > bcache/running):

[  687.912987] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  687.913192] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  687.913231] BTRFS: failed to read tree root on bcache0
[  687.936073] BTRFS: open_ctree failed

The cache device is not going through LVM or anything of the sort, so
this is a direct failure of bcache.  Perhaps due to erase-block
alignment and assumptions about sizes?  Either way, I've got a ton of
data to recover/restore now and I'm unhappy about it.
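
(In case it's useful to anyone else hitting this: data can often still be
copied read-only off the force-running backing device with btrfs-progs; the
paths are examples and -t takes a tree root bytenr found via btrfs-find-root:

  btrfs-find-root /dev/bcache0                            # list older usable tree roots
  btrfs restore -v -t <bytenr> /dev/bcache0 /mnt/rescue   # copy files out to another disk
)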

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-08  0:06           ` Dan Merillat
@ 2015-04-08 18:17             ` Eric Wheeler
  2015-04-08 18:27               ` Stefan Priebe
  2015-04-08 18:46             ` Kai Krakow
  2015-06-05  5:11             ` Kai Krakow
  2 siblings, 1 reply; 25+ messages in thread
From: Eric Wheeler @ 2015-04-08 18:17 UTC (permalink / raw)
  To: Dan Merillat; +Cc: linux-bcache

Intentional top post:

Anecdotally, I seem to remember someone else on the list having trouble 
using bcache when the backing device(s?) have TRIM enabled.

-Eric

--
Eric Wheeler, President           eWheeler, Inc. dba Global Linux Security
888-LINUX26 (888-546-8926)        Fax: 503-716-3878           PO Box 25107
www.GlobalLinuxSecurity.pro       Linux since 1996!     Portland, OR 97298

On Tue, 7 Apr 2015, Dan Merillat wrote:

> > It works perfectly fine here with latest 3.18. My setup is backing a btrfs
> > filesystem in write-back mode. I can reboot cleanly, hard-reset upon
> > freezes, I had no issues yet and no data loss. Even after hard-reset the
> > kernel logs of both bcache and btrfs were clean, the filesystem was clean,
> > just the usual btrfs recovery messages after an unclean shutdown.
> >
> > I wonder if the SSD and/or the block layer in use may be part of the
> > problem:
> >
> >   * if putting bcache on LVM, discards may not be handled well
> >   * if putting bcache or the backing fs on LVM, barriers may not be handled
> >     well (bcache relies on perfectly working barriers)
> >   * does the SSD support powerloss protection? (IOW, use capacitors)
> >   * latest firmware applied? read the changelogs of it?
> >
> > I'd try to first figure out these differences before looking further into
> > debugging. I guess that most consumer-grade drives at least lack a few of
> > the important features to use write-back mode, or use bcache at all.
> >
> > So, to start the list: My SSD is a Crucial MX100 128GB with discards enabled
> > (for both bcache and btrfs), using plain raw devices (no LVM or MD
> > involved). It supports TRIM (as my chipset does), and it supports powerloss-
> > protection and maybe even some internal RAID-like data protection layer
> > (whatever that is, it's in the papers).
> >
> > I'm not sure what a hard-reset technically means to the SSD but I guess it
> > is handled as some sort of short powerloss. Reading through different SSD
> > firmware update descriptions, I also see a lot words around power-off and
> > reset problems being fixed that could lead to data-loss otherwise. That
> > could be pretty fatal to bcache as it considers it storage as always unclean
> > (probably even in write-through mode). Having damaged data blocks out of
> > expected write order (barriers!) could be pretty bad when bcache recovers
> > from last shutdown and replays logs.
> 
> Samsung 840-EVO 256GB here, running 4.0-rc7 (was 3.18)
> 
> There's no known issues with TRIM on an 840-EVO, and no powerloss or
> anything of the sort occurred.  I was seeing excessive write
> amplification on my SSD, and enabled discard - then my machine
> promptly started lagging, eventually disk access locked up and after a
> reboot I was confronted with:
> 
> [  276.558692] bcache: journal_read_bucket() 157: too big, 552 bytes,
> offset 2047
> [  276.571448] bcache: prio_read() bad csum reading priorities
> [  276.571528] bcache: prio_read() bad magic reading priorities
> [  276.576807] bcache: error on 804d6906-fa80-40ac-9081-a71a4d595378:
> bad btree header at bucket 65638, block 0, 0 keys, disabling caching
> [  276.577457] bcache: register_cache() registered cache device sda4
> [  276.577632] bcache: cache_set_free() Cache set
> 804d6906-fa80-40ac-9081-a71a4d595378 unregistered
> 
> Attempting to check the backingstore (echo 1 > bcache/running):
> 
> [  687.912987] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  687.913192] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  687.913231] BTRFS: failed to read tree root on bcache0
> [  687.936073] BTRFS: open_ctree failed
> 
> The cache device is not going through LVM or anything of the sort, so
> this is a direct failure of bcache.  Perhaps due to eraseblock
> alignment and assumptions about sizes?  Either way, I've got a ton of
> data to recover/restore now and I'm unhappy about it.
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-08 18:17             ` Eric Wheeler
@ 2015-04-08 18:27               ` Stefan Priebe
  2015-04-08 19:31                 ` Eric Wheeler
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Priebe @ 2015-04-08 18:27 UTC (permalink / raw)
  To: Eric Wheeler, Dan Merillat; +Cc: linux-bcache


Am 08.04.2015 um 20:17 schrieb Eric Wheeler:
> Intentional top post:
>
> Anecdotally, I seem to remember someone else on the list having trouble
> using bcache when the backing device(s?) have TRIM enabled.

Me. I wasn't able to fix it. TRIM just results in complete data loss with 
bcache if you reboot.

Stefan

>
> -Eric
>
> --
> Eric Wheeler, President           eWheeler, Inc. dba Global Linux Security
> 888-LINUX26 (888-546-8926)        Fax: 503-716-3878           PO Box 25107
> www.GlobalLinuxSecurity.pro       Linux since 1996!     Portland, OR 97298
>
> On Tue, 7 Apr 2015, Dan Merillat wrote:
>
>>> It works perfectly fine here with latest 3.18. My setup is backing a btrfs
>>> filesystem in write-back mode. I can reboot cleanly, hard-reset upon
>>> freezes, I had no issues yet and no data loss. Even after hard-reset the
>>> kernel logs of both bcache and btrfs were clean, the filesystem was clean,
>>> just the usual btrfs recovery messages after an unclean shutdown.
>>>
>>> I wonder if the SSD and/or the block layer in use may be part of the
>>> problem:
>>>
>>>    * if putting bcache on LVM, discards may not be handled well
>>>    * if putting bcache or the backing fs on LVM, barriers may not be handled
>>>      well (bcache relies on perfectly working barriers)
>>>    * does the SSD support powerloss protection? (IOW, use capacitors)
>>>    * latest firmware applied? read the changelogs of it?
>>>
>>> I'd try to first figure out these differences before looking further into
>>> debugging. I guess that most consumer-grade drives at least lack a few of
>>> the important features to use write-back mode, or use bcache at all.
>>>
>>> So, to start the list: My SSD is a Crucial MX100 128GB with discards enabled
>>> (for both bcache and btrfs), using plain raw devices (no LVM or MD
>>> involved). It supports TRIM (as my chipset does), and it supports powerloss-
>>> protection and maybe even some internal RAID-like data protection layer
>>> (whatever that is, it's in the papers).
>>>
>>> I'm not sure what a hard-reset technically means to the SSD but I guess it
>>> is handled as some sort of short powerloss. Reading through different SSD
>>> firmware update descriptions, I also see a lot words around power-off and
>>> reset problems being fixed that could lead to data-loss otherwise. That
>>> could be pretty fatal to bcache as it considers it storage as always unclean
>>> (probably even in write-through mode). Having damaged data blocks out of
>>> expected write order (barriers!) could be pretty bad when bcache recovers
>>> from last shutdown and replays logs.
>>
>> Samsung 840-EVO 256GB here, running 4.0-rc7 (was 3.18)
>>
>> There's no known issues with TRIM on an 840-EVO, and no powerloss or
>> anything of the sort occurred.  I was seeing excessive write
>> amplification on my SSD, and enabled discard - then my machine
>> promptly started lagging, eventually disk access locked up and after a
>> reboot I was confronted with:
>>
>> [  276.558692] bcache: journal_read_bucket() 157: too big, 552 bytes,
>> offset 2047
>> [  276.571448] bcache: prio_read() bad csum reading priorities
>> [  276.571528] bcache: prio_read() bad magic reading priorities
>> [  276.576807] bcache: error on 804d6906-fa80-40ac-9081-a71a4d595378:
>> bad btree header at bucket 65638, block 0, 0 keys, disabling caching
>> [  276.577457] bcache: register_cache() registered cache device sda4
>> [  276.577632] bcache: cache_set_free() Cache set
>> 804d6906-fa80-40ac-9081-a71a4d595378 unregistered
>>
>> Attempting to check the backingstore (echo 1 > bcache/running):
>>
>> [  687.912987] BTRFS (device bcache0): parent transid verify failed on
>> 7567956930560 wanted 613690 found 613681
>> [  687.913192] BTRFS (device bcache0): parent transid verify failed on
>> 7567956930560 wanted 613690 found 613681
>> [  687.913231] BTRFS: failed to read tree root on bcache0
>> [  687.936073] BTRFS: open_ctree failed
>>
>> The cache device is not going through LVM or anything of the sort, so
>> this is a direct failure of bcache.  Perhaps due to eraseblock
>> alignment and assumptions about sizes?  Either way, I've got a ton of
>> data to recover/restore now and I'm unhappy about it.
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-08  0:06           ` Dan Merillat
  2015-04-08 18:17             ` Eric Wheeler
@ 2015-04-08 18:46             ` Kai Krakow
  2015-06-05  5:11             ` Kai Krakow
  2 siblings, 0 replies; 25+ messages in thread
From: Kai Krakow @ 2015-04-08 18:46 UTC (permalink / raw)
  To: linux-bcache

Dan Merillat <dan.merillat@gmail.com> schrieb:

>> It works perfectly fine here with latest 3.18. My setup is backing a
>> btrfs filesystem in write-back mode. I can reboot cleanly, hard-reset
>> upon freezes, I had no issues yet and no data loss. Even after hard-reset
>> the kernel logs of both bcache and btrfs were clean, the filesystem was
>> clean, just the usual btrfs recovery messages after an unclean shutdown.
>>
>> I wonder if the SSD and/or the block layer in use may be part of the
>> problem:
>>
>>   * if putting bcache on LVM, discards may not be handled well
>>   * if putting bcache or the backing fs on LVM, barriers may not be
>>   handled
>>     well (bcache relies on perfectly working barriers)
>>   * does the SSD support powerloss protection? (IOW, use capacitors)
>>   * latest firmware applied? read the changelogs of it?
>>
>> I'd try to first figure out these differences before looking further into
>> debugging. I guess that most consumer-grade drives at least lack a few of
>> the important features to use write-back mode, or use bcache at all.
>>
>> So, to start the list: My SSD is a Crucial MX100 128GB with discards
>> enabled (for both bcache and btrfs), using plain raw devices (no LVM or
>> MD involved). It supports TRIM (as my chipset does), and it supports
>> powerloss- protection and maybe even some internal RAID-like data
>> protection layer (whatever that is, it's in the papers).
>>
>> I'm not sure what a hard-reset technically means to the SSD but I guess
>> it is handled as some sort of short powerloss. Reading through different
>> SSD firmware update descriptions, I also see a lot words around power-off
>> and reset problems being fixed that could lead to data-loss otherwise.
>> That could be pretty fatal to bcache as it considers it storage as always
>> unclean (probably even in write-through mode). Having damaged data blocks
>> out of expected write order (barriers!) could be pretty bad when bcache
>> recovers from last shutdown and replays logs.
> 
> Samsung 840-EVO 256GB here, running 4.0-rc7 (was 3.18)
> 
> There's no known issues with TRIM on an 840-EVO, and no powerloss or
> anything of the sort occurred.  I was seeing excessive write
> amplification on my SSD, and enabled discard - then my machine
> promptly started lagging, eventually disk access locked up and after a
> reboot I was confronted with:
> 
> [  276.558692] bcache: journal_read_bucket() 157: too big, 552 bytes,
> offset 2047
> [  276.571448] bcache: prio_read() bad csum reading priorities
> [  276.571528] bcache: prio_read() bad magic reading priorities
> [  276.576807] bcache: error on 804d6906-fa80-40ac-9081-a71a4d595378:
> bad btree header at bucket 65638, block 0, 0 keys, disabling caching
> [  276.577457] bcache: register_cache() registered cache device sda4
> [  276.577632] bcache: cache_set_free() Cache set
> 804d6906-fa80-40ac-9081-a71a4d595378 unregistered
> 
> Attempting to check the backingstore (echo 1 > bcache/running):
> 
> [  687.912987] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  687.913192] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  687.913231] BTRFS: failed to read tree root on bcache0
> [  687.936073] BTRFS: open_ctree failed

Uncool... :-(

> The cache device is not going through LVM or anything of the sort, so
> this is a direct failure of bcache.  Perhaps due to eraseblock
> alignment and assumptions about sizes?  Either way, I've got a ton of
> data to recover/restore now and I'm unhappy about it.

I think the bucket size in bcache defaults to something other than 2MB 
which, to my knowledge, is what most SSDs use as their erase block size, and 
thus important for issuing discards correctly (aligned).

Next: I think the native sector size of the SSD is assumed by bcache to be 
2k. I'd recommend setting it to 4k.
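
(For illustration, both sizes can be given to make-bcache when the cache
device is created; the device name is just an example, and the values are
the ones discussed here, so adjust them to your drive's erase block size:

  make-bcache --block 4k --bucket 2M -C /dev/sda5
)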

Third - partition alignment: Which partitioning tool did you use? On which 
boundary did it start the first partition? For an SSD it should be 2M, not 
sector 63 (a really bad idea) and not 1M (which is the default of fdisk I 
think, while gdisk defaults to 2M). I suggest using cgdisk to prepare the 
drive, though it will eventually create GPT-only partitions; check your 
kernel support for it.
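
(A quick way to sanity-check what the kernel and the drive report, plus the
partition start, independent of the partitioning tool; these are standard
block-layer sysfs attributes, and sda/sda4 are just example names:

  cat /sys/block/sda/queue/discard_granularity   # reported discard granularity
  cat /sys/block/sda/queue/discard_max_bytes     # max discard per request
  cat /sys/block/sda/sda4/start                  # partition start, in 512-byte sectors
)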

Fourth - wear-levelling reservation: Depending on your BIOS, the kernel may 
see parts of your drive which should usually be hidden (the host protected 
area, HPA). If the HPA is visible, you should take that into account. 128GB 
SSDs usually have an HPA of 8GB, accounting for 120GB. Depending on the 
manufacturer, they are announced with a size of 120 or 128GB. 
Recommendation: use only 120GB, and better yet leave some extra spare space. 
It helps the performance and lifetime of the drive, especially under 
write-heavy applications. The general recommendation is to use only about 
80% of the drive, calculated from the native size (read: including the HPA). 
256GB drives are usually sold as 240GB but they are 256GB native; 512 is 
500, and so on. There may be "strange" sizes like 480 which are simply 
multiples of the lower variants (because they are RAID-striped internally 
for better performance, like 4x 120, so calculate 4x 128GB natively). This 
is only a general abstraction, don't take it as law. Manufacturers may 
follow different strategies. But it is generally not a bad idea to take this 
formula into account.

Take note that when you reformat according to this recommendation, you have 
to trim your drive to take advantage of it. You can use "blkdiscard" to 
selectively or completely trim the drive. Proceed with care and take backups 
first; if used wrongly, it will eat your data or even kill your kitty.
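
(A minimal sketch of both uses, with example device names; double-check the
target first, this irreversibly destroys data:

  blkdiscard /dev/sda4                                    # trim one partition completely
  blkdiscard --offset $((120*1024*1024*1024)) /dev/sda    # trim everything past 120GiB
)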

Apart from that, I've heard about discard problems with the Evo series from 
different sources. Samsung recently updated some firmwares, with the caveat 
that for some users this bricked their drives. Samsung will replace those 
drives, but at least it told me not to trust Samsung too much. In my job we 
also had a lot of problems with those drives in the past: SATA issues, 
performance problems (also in Windows), computer freezes, bluescreens. Most 
of them were fixed by BIOS and firmware updates, some others by using a 
high-quality SATA cable. So from my experience I'd check for such issues, 
too (especially the cable issue, since SSDs can use higher SATA rates). We 
have had good experience with SanDisk so far (read: no problems yet). I 
cannot say anything about other manufacturers.

I myself am using a Crucial MX100 128GB and am generally happy with it, 
except that writing is a bit slower compared to similar-sized drives from 
other manufacturers. But writing is not my primary target, and with bcache 
write-back it is still a lot faster than the HDD natively. I'm perfectly 
happy with its stability and reliability. And the fact that this drive 
hasn't needed a single firmware update yet, since it is based on a (for 
Crucial) well-known and established controller, speaks for its stability. 
You cannot say that about Samsung, although they offer great drives 
performance-wise. Personally I can recommend Crucial, though on a broader 
range I don't have much experience with them with regard to reliability.

Anyway: your warning regarding bcache and caching strategies is well placed 
here. One should take it into account.

PS: I've used a block size of 4k and bucket size of 2M for my bcache setup. 
Probably it makes a difference. Other people here may give a deeper insight 
and maybe even explain why bcache defaults to 2k and 1M.

-- 
Replies to list only preferred.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-08 18:27               ` Stefan Priebe
@ 2015-04-08 19:31                 ` Eric Wheeler
  2015-04-08 19:54                   ` Kai Krakow
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Wheeler @ 2015-04-08 19:31 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: Dan Merillat, linux-bcache

> Am 08.04.2015 um 20:17 schrieb Eric Wheeler:
> > Anecdotally, I seem to remember someone else on the list having trouble
> > using bcache when the backing device(s?) have TRIM enabled.
> 
> Me. Wasn't able to fix it. Trim just results in complete data loss with bcache
> if you reboot.
> 
> Stefan

Should bcache TRIM handling be disabled by default?

Kai reports success with TRIM on his Crucial SSD, so perhaps this is not a 
problem for everyone, but data integrity should be a priority and TRIM 
should only be enabled by those who understand the risks and wish to test.

Of course if the underlying problem could be found and fixed in code, that 
would be even better.

-Eric

 
> > 
> > -Eric
> > 
> > --
> > Eric Wheeler, President           eWheeler, Inc. dba Global Linux Security
> > 888-LINUX26 (888-546-8926)        Fax: 503-716-3878           PO Box 25107
> > www.GlobalLinuxSecurity.pro       Linux since 1996!     Portland, OR 97298
> > 
> > On Tue, 7 Apr 2015, Dan Merillat wrote:
> > 
> > > > It works perfectly fine here with latest 3.18. My setup is backing a
> > > > btrfs
> > > > filesystem in write-back mode. I can reboot cleanly, hard-reset upon
> > > > freezes, I had no issues yet and no data loss. Even after hard-reset the
> > > > kernel logs of both bcache and btrfs were clean, the filesystem was
> > > > clean,
> > > > just the usual btrfs recovery messages after an unclean shutdown.
> > > > 
> > > > I wonder if the SSD and/or the block layer in use may be part of the
> > > > problem:
> > > > 
> > > >    * if putting bcache on LVM, discards may not be handled well
> > > >    * if putting bcache or the backing fs on LVM, barriers may not be
> > > > handled
> > > >      well (bcache relies on perfectly working barriers)
> > > >    * does the SSD support powerloss protection? (IOW, use capacitors)
> > > >    * latest firmware applied? read the changelogs of it?
> > > > 
> > > > I'd try to first figure out these differences before looking further
> > > > into
> > > > debugging. I guess that most consumer-grade drives at least lack a few
> > > > of
> > > > the important features to use write-back mode, or use bcache at all.
> > > > 
> > > > So, to start the list: My SSD is a Crucial MX100 128GB with discards
> > > > enabled
> > > > (for both bcache and btrfs), using plain raw devices (no LVM or MD
> > > > involved). It supports TRIM (as my chipset does), and it supports
> > > > powerloss-
> > > > protection and maybe even some internal RAID-like data protection layer
> > > > (whatever that is, it's in the papers).
> > > > 
> > > > I'm not sure what a hard-reset technically means to the SSD but I guess
> > > > it
> > > > is handled as some sort of short powerloss. Reading through different
> > > > SSD
> > > > firmware update descriptions, I also see a lot words around power-off
> > > > and
> > > > reset problems being fixed that could lead to data-loss otherwise. That
> > > > could be pretty fatal to bcache as it considers it storage as always
> > > > unclean
> > > > (probably even in write-through mode). Having damaged data blocks out of
> > > > expected write order (barriers!) could be pretty bad when bcache
> > > > recovers
> > > > from last shutdown and replays logs.
> > > 
> > > Samsung 840-EVO 256GB here, running 4.0-rc7 (was 3.18)
> > > 
> > > There's no known issues with TRIM on an 840-EVO, and no powerloss or
> > > anything of the sort occurred.  I was seeing excessive write
> > > amplification on my SSD, and enabled discard - then my machine
> > > promptly started lagging, eventually disk access locked up and after a
> > > reboot I was confronted with:
> > > 
> > > [  276.558692] bcache: journal_read_bucket() 157: too big, 552 bytes,
> > > offset 2047
> > > [  276.571448] bcache: prio_read() bad csum reading priorities
> > > [  276.571528] bcache: prio_read() bad magic reading priorities
> > > [  276.576807] bcache: error on 804d6906-fa80-40ac-9081-a71a4d595378:
> > > bad btree header at bucket 65638, block 0, 0 keys, disabling caching
> > > [  276.577457] bcache: register_cache() registered cache device sda4
> > > [  276.577632] bcache: cache_set_free() Cache set
> > > 804d6906-fa80-40ac-9081-a71a4d595378 unregistered
> > > 
> > > Attempting to check the backingstore (echo 1 > bcache/running):
> > > 
> > > [  687.912987] BTRFS (device bcache0): parent transid verify failed on
> > > 7567956930560 wanted 613690 found 613681
> > > [  687.913192] BTRFS (device bcache0): parent transid verify failed on
> > > 7567956930560 wanted 613690 found 613681
> > > [  687.913231] BTRFS: failed to read tree root on bcache0
> > > [  687.936073] BTRFS: open_ctree failed
> > > 
> > > The cache device is not going through LVM or anything of the sort, so
> > > this is a direct failure of bcache.  Perhaps due to eraseblock
> > > alignment and assumptions about sizes?  Either way, I've got a ton of
> > > data to recover/restore now and I'm unhappy about it.
> > 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-08 19:31                 ` Eric Wheeler
@ 2015-04-08 19:54                   ` Kai Krakow
  2015-04-08 22:02                     ` Dan Merillat
  0 siblings, 1 reply; 25+ messages in thread
From: Kai Krakow @ 2015-04-08 19:54 UTC (permalink / raw)
  To: linux-bcache

Eric Wheeler <bcache@lists.ewheeler.net> schrieb:

>> Am 08.04.2015 um 20:17 schrieb Eric Wheeler:
>> > Anecdotally, I seem to remember someone else on the list having trouble
>> > using bcache when the backing device(s?) have TRIM enabled.
>> 
>> Me. Wasn't able to fix it. Trim just results in complete data loss with
>> bcache if you reboot.
>> 
>> Stefan
> 
> Should bcache TRIM handling be disabled by default?
> 
> Kai reports success with TRIM on his Crucial SSD, so perhaps this is not a
> problem for everyone---but data integrity should be a priority and TRIM
> should only be enabled by those who understand the risks and wish to test.
> 
> Of course if the underlying problem could be found and fixed in code, that
> would be even better.

I haven't had a problem with it yet. The bcache/btrfs combo even survived
power outages here during writes - with discard enabled for both btrfs and
bcache. There's also no thrashing and no unexpected performance drops.

But I always recommend learning the correct erase block size of your drive.
I just got a comment from Josep (who didn't reply here) that TLC drives may
use "strange" (unexpected) erase block sizes, read: 3x the native erase block
size; in the case of the TLC Evo that is 3x 512kB = 1536kB.

For bcache, you should set the bucket size to the erase block size. I cannot
say, however, whether that plays into the trimming problem on reboots.
Another factor may be that the mainboard resets the drive while it is still
trimming during reboot/shutdown. It's probably a firmware bug, but it could
also be a problem with missing or non-working power-loss protection. At the
least it should play into the performance problem when using trimming.

So the lesson here is (apart from "discard" being buggy in some firmwares):
the erase block size heavily depends on the SSD's internal structure (memory
cell layout, memory cell layers, memory cell striping). The most common value
is probably 2M (it should fit most combinations in even multiples for MLC and
SLC drives, though not for TLC).
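
For illustration, this is roughly how I would format a cache device with a
matching bucket size - the device name and the 2M value are just examples,
assuming bcache-tools' make-bcache with its --bucket/--discard options:

  # wipe old signatures, then format the cache with a bucket size matching
  # the assumed 2MB erase block; --discard enables bcache's own TRIM
  wipefs -a /dev/sdX4
  make-bcache -C --bucket 2M --discard /dev/sdX4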

> 
> -Eric
> 
>  
>> > 
>> > -Eric
>> > 
>> > --
>> > Eric Wheeler, President           eWheeler, Inc. dba Global Linux
>> > Security
>> > 888-LINUX26 (888-546-8926)        Fax: 503-716-3878           PO Box
>> > 25107
>> > www.GlobalLinuxSecurity.pro       Linux since 1996!     Portland, OR
>> > 97298
>> > 
>> > On Tue, 7 Apr 2015, Dan Merillat wrote:
>> > 
>> > > > It works perfectly fine here with latest 3.18. My setup is backing
>> > > > a btrfs
>> > > > filesystem in write-back mode. I can reboot cleanly, hard-reset
>> > > > upon freezes, I had no issues yet and no data loss. Even after
>> > > > hard-reset the kernel logs of both bcache and btrfs were clean, the
>> > > > filesystem was clean,
>> > > > just the usual btrfs recovery messages after an unclean shutdown.
>> > > > 
>> > > > I wonder if the SSD and/or the block layer in use may be part of
>> > > > the problem:
>> > > > 
>> > > >    * if putting bcache on LVM, discards may not be handled well
>> > > >    * if putting bcache or the backing fs on LVM, barriers may not
>> > > >    be
>> > > > handled
>> > > >      well (bcache relies on perfectly working barriers)
>> > > >    * does the SSD support powerloss protection? (IOW, use
>> > > >    capacitors) * latest firmware applied? read the changelogs of
>> > > >    it?
>> > > > 
>> > > > I'd try to first figure out these differences before looking
>> > > > further into
>> > > > debugging. I guess that most consumer-grade drives at least lack a
>> > > > few of
>> > > > the important features to use write-back mode, or use bcache at
>> > > > all.
>> > > > 
>> > > > So, to start the list: My SSD is a Crucial MX100 128GB with
>> > > > discards enabled
>> > > > (for both bcache and btrfs), using plain raw devices (no LVM or MD
>> > > > involved). It supports TRIM (as my chipset does), and it supports
>> > > > powerloss-
>> > > > protection and maybe even some internal RAID-like data protection
>> > > > layer (whatever that is, it's in the papers).
>> > > > 
>> > > > I'm not sure what a hard-reset technically means to the SSD but I
>> > > > guess it
>> > > > is handled as some sort of short powerloss. Reading through
>> > > > different SSD
>> > > > firmware update descriptions, I also see a lot words around
>> > > > power-off and
>> > > > reset problems being fixed that could lead to data-loss otherwise.
>> > > > That could be pretty fatal to bcache as it considers it storage as
>> > > > always unclean
>> > > > (probably even in write-through mode). Having damaged data blocks
>> > > > out of expected write order (barriers!) could be pretty bad when
>> > > > bcache recovers
>> > > > from last shutdown and replays logs.
>> > > 
>> > > Samsung 840-EVO 256GB here, running 4.0-rc7 (was 3.18)
>> > > 
>> > > There's no known issues with TRIM on an 840-EVO, and no powerloss or
>> > > anything of the sort occurred.  I was seeing excessive write
>> > > amplification on my SSD, and enabled discard - then my machine
>> > > promptly started lagging, eventually disk access locked up and after
>> > > a reboot I was confronted with:
>> > > 
>> > > [  276.558692] bcache: journal_read_bucket() 157: too big, 552 bytes,
>> > > offset 2047
>> > > [  276.571448] bcache: prio_read() bad csum reading priorities
>> > > [  276.571528] bcache: prio_read() bad magic reading priorities
>> > > [  276.576807] bcache: error on 804d6906-fa80-40ac-9081-a71a4d595378:
>> > > bad btree header at bucket 65638, block 0, 0 keys, disabling caching
>> > > [  276.577457] bcache: register_cache() registered cache device sda4
>> > > [  276.577632] bcache: cache_set_free() Cache set
>> > > 804d6906-fa80-40ac-9081-a71a4d595378 unregistered
>> > > 
>> > > Attempting to check the backingstore (echo 1 > bcache/running):
>> > > 
>> > > [  687.912987] BTRFS (device bcache0): parent transid verify failed on
>> > > 7567956930560 wanted 613690 found 613681
>> > > [  687.913192] BTRFS (device bcache0): parent transid verify failed on
>> > > 7567956930560 wanted 613690 found 613681
>> > > [  687.913231] BTRFS: failed to read tree root on bcache0
>> > > [  687.936073] BTRFS: open_ctree failed
>> > > 
>> > > The cache device is not going through LVM or anything of the sort, so
>> > > this is a direct failure of bcache.  Perhaps due to eraseblock
>> > > alignment and assumptions about sizes?  Either way, I've got a ton of
>> > > data to recover/restore now and I'm unhappy about it.

-- 
Replies to list only preferred.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-08 19:54                   ` Kai Krakow
@ 2015-04-08 22:02                     ` Dan Merillat
  2015-04-10 23:00                       ` Kai Krakow
  0 siblings, 1 reply; 25+ messages in thread
From: Dan Merillat @ 2015-04-08 22:02 UTC (permalink / raw)
  To: Kai Krakow; +Cc: linux-bcache

You can't always use the correct eraseblock size with bcache, since it
doesn't (or didn't, at least at the time I created my cache) support the
non-power-of-two sizes that TLC drives use.  That said, TRIM is not
supposed to blow away entire eraseblocks, just let the drive know the
mapping between presented LBA and internal address is no longer
needed, allowing it to do what it wishes with that knowledge
(generally reclaim multiple partial blocks to create fully empty
blocks).

I can't find any reports of errors with TRIM support in the 840-EVO
series.  They had/may still have a problem reading old data that was a
big deal in the fall, and there was an 850 firmware that bricked some
drives.  Nothing about TRIM erasing unintended data, though.

There were no problems with bcache at all in the year+ I've used it,
until I enabled bcache discard. Before that, I put on over 100
terabytes of writes to the bcache partition with no interface errors.
 I've also never seen a TRIM failure in other filesystems using the
same model in my other systems.  There was no powerloss, the system
went through a software reboot cycle before the failure.  I'm
therefore *extremely* hesitant about allowing this to be written off
as a hardware failure.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-08 22:02                     ` Dan Merillat
@ 2015-04-10 23:00                       ` Kai Krakow
  2015-04-11  0:14                         ` Kai Krakow
  0 siblings, 1 reply; 25+ messages in thread
From: Kai Krakow @ 2015-04-10 23:00 UTC (permalink / raw)
  To: linux-bcache

Dan Merillat <dan.merillat@gmail.com> schrieb:

> You can't always use the correct eraseblock size with BCache, since it
> doesn't (didn't, at least at the time I created my cache) support
> non-powers-of-two that TLC drives use.  That said, TRIM is not
> supposed to blow away entire eraseblocks, just let the drive know the
> mapping between presented LBA and internal address is no longer
> needed, allowing it to do what it wishes with that knowledge
> (generally reclaim multiple partial blocks to create fully empty
> blocks).

Yes, I know that TRIM doesn't simply blow away blocks. It just marks them as
unused. My recommendation was more or less about making it efficient;
otherwise you may experience write-amplification problems on the SSD, which
turn into peaks of bad performance from time to time.

One simply has to take into account that an SSD is a completely different
technology than an HDD. A logical sector here is not the native block size of
the drive's internal organization. The drive is made of flash memory blocks
which are a lot larger than a single sector. Each of these blocks may be
organized into "chunks" or "stripes" (in RAID terms), so what makes up a
complete logical block depends on the internal organization and layout of the
flash chips.

With this knowledge one has to keep in mind that flash memory cannot be
overwritten or modified in place the way traditional disk sectors can.
Essentially, flash memory is write-once-read-many in this regard. For a block
of flash memory to be reused, it has to be erased. That operation is not
fast, it takes some time, and it can only be applied to the complete
organizational unit, read: the erase block.

So, to be on the safe side performance-wise, you should tell your system (if
applicable) at least an integer multiple of this native erase block size. My
recommendation of 2MB should be safe for SLC and MLC drives, no matter
whether they are striped internally across 1, 2, or 4 flash memory blocks
(usually 512k each, read 1x, 2x, or 4x 512k, all of which divide 2MB evenly).
As I learned, this is probably not true for TLC drives. For such drives, you
may want to _not_ use discard in bcache and instead leave a space reservation
to let the firmware do performant wear-levelling in the background. Thus I
recommend partitioning only 80% of the drive and leaving the rest of it
pre-trimmed.
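
As a rough sketch of what I mean by pre-trimming a spare area (device name
and percentage are only examples):

  # trim the whole SSD once while it is still empty ...
  blkdiscard /dev/sdX
  # ... then hand only ~80% of it to bcache/the filesystem; the
  # unpartitioned rest acts as an over-provisioning reserve
  parted -s /dev/sdX mklabel gpt mkpart bcache 1MiB 80%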

> I can't find any reports of errors with TRIM support in the 840-EVO
> series.  They had/may still have a problem reading old data that was a
> big deal in the fall, and there was an 850 firmware that bricked some
> drives.  Nothing about TRIM erasing unintended data, though.

I don't remember where, but I read about problems with TRIM and data loss
with Samsung firmware in different (but rare) scenarios. Even Samsung's
performance restoration tool could accidentally destroy data because it
trimmed the drive. I cannot say which of the series this applied to. I used
this tool multiple times myself, had good results with it, and could not
confirm those reports. But I'd put my safety guards in place first anyway:
use backups, and test the setup. Of course you should always do that, but for
those drives I'm especially picky about it.

> There were no problems with bcache at all in the year+ I've used it,
> until I enabled bcache discard. Before that, I put on over 100
> terabytes of writes to the bcache partition with no interface errors.

There are reports about endurance tests that say you can write petabytes of
data to an SSD before it dies. Samsung's drives are among the best performers
here, with one downside: when they died in those tests, they took all your
data with them, without warning. Most other drives went into read-only mode
first so you could at least get your data off them, but after a reboot those
drives were dead, too.

http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead

From those reports, I conclude: If your drive suddenly slows down, it's a 
good idea to order a replacement and check the SMART stats (if you didn't do 
that before).

>  I've also never seen a TRIM failure in other filesystems using the
> same model in my other systems.  There was no powerloss, the system
> went through a software reboot cycle before the failure.  I'm
> therefore *extremely* hesitant about allowing this to be written off
> as a hardware failure.

I'm also not sure we should call it a general bug or problem in bcache
instead. The TRIM implementation seems to be correct; at least it doesn't
show problems for me. I have TRIM enabled for btrfs and bcache, and the
kernel claims it is supported. So I'd rather call it an incompatibility or
firmware flaw which needs to be worked around.

I think one has to keep in mind that most consumer-grade drives are tested
by the manufacturers only for Windows. If they pass all tests there, they are
good enough. That's sadly a fact. Linux may expose hardware/firmware bugs
that are otherwise not visible.

-- 
Replies to list only preferred.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-10 23:00                       ` Kai Krakow
@ 2015-04-11  0:14                         ` Kai Krakow
  2015-04-11  6:31                           ` Dan Merillat
  0 siblings, 1 reply; 25+ messages in thread
From: Kai Krakow @ 2015-04-11  0:14 UTC (permalink / raw)
  To: linux-bcache

Kai Krakow <hurikhan77@gmail.com> schrieb:

> Dan Merillat <dan.merillat@gmail.com> schrieb:
> 
>> You can't always use the correct eraseblock size with BCache, since it
>> doesn't (didn't, at least at the time I created my cache) support
>> non-powers-of-two that TLC drives use.  That said, TRIM is not
>> supposed to blow away entire eraseblocks, just let the drive know the
>> mapping between presented LBA and internal address is no longer
>> needed, allowing it to do what it wishes with that knowledge
>> (generally reclaim multiple partial blocks to create fully empty
>> blocks).
> 
> Yes, I know that TRIM doesn't simply blow away blocks. It just marks them
> as unused. My recommendation was more or less for it to be efficient,
> otherwise you may experience write amplification problems on SSD which
> turns into peaks of bad performance from time to time.
> 
> One has simply take into account that SSD is a completely different
> technology than HDD. A logical sector here is not the native block size of
> the inner organization of the drive. It is made of flash memory blocks
> which are a lot larger than a single sector. Each of these blocks may be
> organized into "chunks" or "stripes" (in terms of RAID), so what makes up
> a complete logical block depends on the internal organization and layout
> of the flash chips.
> 
> With this knowledge one has to think about the fact that flash memory
> cannot be overridden or modified in a traditional aspect. Essentially,
> flash memory is write-once-read-multiple in this regard. For a block of
> flash memory to be reused, it has to be erased. That operation is not
> fast, it takes some time, and it can only applied to the complete
> organizational unit, read: the erase block size.
> 
> So, to be on the safe side performance-wise, you should tell your system
> (if applicable) at least an integer multiple of this native erase block
> size. My recommendation of 2MB should be safe for SLC and MLC drives, no
> matter if they are striped internally of 1, 2, or 4 flash memory blocks
> (usually 512k, read 1x, 2x, or 4x 512k, which is 2MB). As I learned, this
> is probably not true for TLC drives. For such drives, you probably may
> want to _not_ use discard in bcache and instead leave a space reservation
> to let the firmware do performant wear-levelling in the background. Thus I
> recommend to only partition 80% of the drive and leave the rest of it
> pre-trimmed.
> 
>> I can't find any reports of errors with TRIM support in the 840-EVO
>> series.  They had/may still have a problem reading old data that was a
>> big deal in the fall, and there was an 850 firmware that bricked some
>> drives.  Nothing about TRIM erasing unintended data, though.
> 
> I don't remember where but I read about problems with TRIM and data loss
> with Samsung firmware in different (but rare) scenarios. Even the
> Samsung's performance restoration tool could accidently destroy data
> because it trimmed the drive. I cannot say which of the series this
> applied to. I used this tool multiple times myself and has good results
> with it, and could not confirm those reports. But I'd take my safety
> guards first, anyways, and use backups, and test my setup. Of course, you
> should always to it, but for those drives I'm especially picky about it.
> 
>> There were no problems with bcache at all in the year+ I've used it,
>> until I enabled bcache discard. Before that, I put on over 100
>> terabytes of writes to the bcache partition with no interface errors.
> 
> There are reports about endurance tests that say you can write petabytes
> of data to SSD before they die. Samsung's drives belong to the best
> performers here with one downside: If they die, in those tests they took
> all your data with them and without warning. Most other drives went into
> read-only mode first so you could at least get your data off those drives,
> but after a reboot those drives were dead, too.
> 
> http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead
> 
> From those reports, I conclude: If your drive suddenly slows down, it's a
> good idea to order a replacement and check the SMART stats (if you didn't
> do that before).
> 
>>  I've also never seen a TRIM failure in other filesystems using the
>> same model in my other systems.  There was no powerloss, the system
>> went through a software reboot cycle before the failure.  I'm
>> therefore *extremely* hesitant about allowing this to be written off
>> as a hardware failure.
> 
> I'm also not sure to instead call it a general bug or problem of bcache.
> The TRIM implementation seems to be correct, at least it doesn't show
> problems for me. I have TRIM enabled for btrfs, bcache, and the kernel
> claims it to be supported. So I'd rather call it an incompatibility or
> firmware flaw which needs to be worked around.
> 
> I think one has to keep in mind, that most consumer grade drives are
> tested by the manufacturers only for Windows. If they pass all tests
> there, they are good enough. That's sadly fact. Linux may expose bugs of
> hardware/firmware that are otherwise not visible.

I'd like to amend: because Samsung's TLC drives (at least from the 21nm
production) are unaligned for most OS operations (the block size is no power
of two), the firmware has to be more complex. This implies it is more prone
to bugs. So this is not a question of Samsung or not, it is a question of TLC
or not. But Samsung is one of the first to implement TLC on a broad basis.
They fixed this problem with the new 19nm production by using alignable block
sizes. See table one here:

http://www.anandtech.com/show/7173/samsung-ssd-840-evo-review-120gb-250gb-500gb-750gb-1tb-models-tested

It may be better to use a 512kB bucket size in bcache when wanting to try
discard, because this gives the firmware a chance to do wear-levelling for 3
blocks at once and then throw that job away, instead of accumulating maybe
hundreds of "half-sized" discard jobs and having to wait and manage them
until they can be merged into one erase job. If you instruct the drive to
discard 2M, it can immediately discard 1.5M but has to store information
about discarding the remaining 512k sometime later. This triggers a more
complex code path, and more complex means a higher probability of bugs. Of
course every firmware has to implement that code path because the OS may send
unaligned discards. But as filesystems usually operate on aligned boundaries,
that code path is usually not triggered. Still, that path can expose more
bugs in every manufacturer's firmware.

Concluding: Of course, this only applies if you use such a "strange" Samsung 
drive. Most of Samsung's drives use "normal" block sizes. And as mentioned 
earlier, this is probably not specific to Samsung but to all manufacturers 
that use TLC and non-power-of-two block sizes.

So, if I were to hit this problem and still wanted to use discard, I'd try
to experiment with smaller bucket sizes in bcache that divide the erase block
size evenly. If that helps, there's probably an easy way to work around this
quirk in bcache's kernel code.
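
If someone wants to try that, a minimal sketch (device name is only an
example; 512kB divides the 1536kB TLC erase block evenly):

  # recreate the cache with a smaller bucket size before re-attaching it
  make-bcache -C --bucket 512k --discard /dev/sdX4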

-- 
Replies to list only preferred.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-11  0:14                         ` Kai Krakow
@ 2015-04-11  6:31                           ` Dan Merillat
  2015-04-11  6:54                             ` Dan Merillat
  0 siblings, 1 reply; 25+ messages in thread
From: Dan Merillat @ 2015-04-11  6:31 UTC (permalink / raw)
  To: Kai Krakow; +Cc: linux-bcache

On Fri, Apr 10, 2015 at 8:14 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
>> There are reports about endurance tests that say you can write petabytes
>> of data to SSD before they die. Samsung's drives belong to the best
>> performers here with one downside: If they die, in those tests they took
>> all your data with them and without warning. Most other drives went into
>> read-only mode first so you could at least get your data off those drives,
>> but after a reboot those drives were dead, too.
>>
>> http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead

This has what to do with bcache eating my data, again?

>> From those reports, I conclude: If your drive suddenly slows down, it's a
>> good idea to order a replacement and check the SMART stats (if you didn't
>> do that before).

This as well.  The drive didn't die, all sectors are still readable.
No errors at all on SMART.

>> I'm also not sure to instead call it a general bug or problem of bcache.
>> The TRIM implementation seems to be correct, at least it doesn't show
>> problems for me. I have TRIM enabled for btrfs, bcache, and the kernel
>> claims it to be supported. So I'd rather call it an incompatibility or
>> firmware flaw which needs to be worked around.

Please explain why no other filesystem, windows OR linux, has errors
with TRIM on this drive.

The other part I don't understand is that nothing should have been discarded
yet - I took the time to flush the cache to disk (echo none > cache_mode),
waited for writeback to complete, detached the cdev, waited for the detach to
complete, recreated it with the correct blocksize and discard enabled, then
re-attached.  I ran for maybe 15 minutes like this before rebooting, with
perhaps a dozen GB written out of the 200GB cache partition.
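
For the record, the rough sequence was along these lines (sysfs paths as in
current kernels; the UUID, device name and size values are placeholders, not
my exact settings):

  echo none > /sys/block/bcache0/bcache/cache_mode   # stop caching new writes
  cat /sys/block/bcache0/bcache/dirty_data           # wait until this reads 0
  echo 1 > /sys/block/bcache0/bcache/detach          # detach the old cache
  cat /sys/block/bcache0/bcache/state                # wait for "no cache"
  wipefs -a /dev/sda4
  make-bcache -C --block 4k --bucket 512k --discard /dev/sda4
  echo <cset-uuid> > /sys/block/bcache0/bcache/attach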

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-11  6:31                           ` Dan Merillat
@ 2015-04-11  6:54                             ` Dan Merillat
  2015-04-11  7:52                               ` Kai Krakow
  0 siblings, 1 reply; 25+ messages in thread
From: Dan Merillat @ 2015-04-11  6:54 UTC (permalink / raw)
  To: Kai Krakow; +Cc: linux-bcache

Looking through the kernel log, this may be related: I booted into
4.0-rc7, and attempted to run it there at first:
Apr  7 12:54:08 fileserver kernel: [ 2028.533893] bcache-register:
page allocation failure: order:8, mode:0x
... memory dump
Apr  7 12:54:08 fileserver kernel: [ 2028.541396] bcache:
register_cache() error opening sda4: cannot allocate memory


Apr  7 12:55:08 fileserver kernel: [ 2088.639190] bcache:
__cached_dev_store() Can't attach 804d6906-fa80-40ac-9081-a71a4d595378
Apr  7 12:55:08 fileserver kernel: [ 2088.639190] : cache set not found

I poked the vm.min_free_kbytes and retried, and got the following:

Apr  7 12:55:29 fileserver kernel: [ 2109.303315] bcache:
run_cache_set() invalidating existing data
Apr  7 12:55:29 fileserver kernel: [ 2109.408255] bcache:
bch_cached_dev_attach() Caching md127 as bcache0 on set
804d6906-fa80-40ac-9081-a71a4d595378
Apr  7 12:55:29 fileserver kernel: [ 2109.408443] bcache:
register_cache() registered cache device sda4
Apr  7 12:55:33 fileserver kernel: [ 2113.307687] bcache:
bch_cached_dev_attach() Can't attach md127: already attached
Apr  7 12:55:33 fileserver kernel: [ 2113.307747] bcache:
__cached_dev_store() Can't attach 804d6906-fa80-40ac-9081-a71a4d595378
Apr  7 12:55:33 fileserver kernel: [ 2113.307747] : cache set not found

A few hours later, I was getting stalls:
Apr  7 18:00:20 fileserver kernel: [20400.288049] INFO: task java:3610
blocked for more than 120 seconds.
Apr  7 18:00:20 fileserver kernel: [20400.288069]       Not tainted 4.0.0-rc7 #1
Apr  7 18:00:20 fileserver kernel: [20400.288085] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this
 message.
Apr  7 18:00:20 fileserver kernel: [20400.293521] INFO: task
nmbd:22692 blocked for more than 120 seconds.
Apr  7 18:00:20 fileserver kernel: [20400.293532]       Not tainted 4.0.0-rc7 #1
Apr  7 18:00:20 fileserver kernel: [20400.293545] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this
 message.

So I rebooted to 4.0-rc7 again:
Apr  7 19:36:23 fileserver kernel: [    2.145004] bcache:
journal_read_bucket() 157: too big, 552 bytes, offset 2047
Apr  7 19:36:23 fileserver kernel: [    2.154586] bcache: prio_read()
bad csum reading priorities
Apr  7 19:36:23 fileserver kernel: [    2.154643] bcache: prio_read()
bad magic reading priorities
Apr  7 19:36:23 fileserver kernel: [    2.158008] bcache: error on
804d6906-fa80-40ac-9081-a71a4d595378: bad btree header at bucket
65638, block 0, 0 keys, disabling caching
Apr  7 19:36:23 fileserver kernel: [    2.158408] bcache:
cache_set_free() Cache set 804d6906-fa80-40ac-9081-a71a4d595378
unregistered
Apr  7 19:36:23 fileserver kernel: [    2.158468] bcache:
register_cache() registered cache device sda4

Apr  7 19:36:23 fileserver kernel: [    2.226581] md127: detected
capacity change from 0 to 12001954234368
Apr  7 19:36:23 fileserver kernel: [    2.265347] bcache:
register_bdev() registered backing device md127

Apr  7 19:36:23 fileserver kernel: [   21.423819] bcache:
journal_read_bucket() 157: too big, 552 bytes, offset 2047
Apr  7 19:36:23 fileserver kernel: [   21.432091] bcache: prio_read()
bad csum reading priorities
Apr  7 19:36:23 fileserver kernel: [   21.432138] bcache: prio_read()
bad magic reading priorities
Apr  7 19:36:23 fileserver kernel: [   21.435613] bcache: error on
804d6906-fa80-40ac-9081-a71a4d595378: bad btree header at bucket
65638, block 0, 0 keys, disabling caching
Apr  7 19:36:23 fileserver kernel: [   21.436225] bcache:
cache_set_free() Cache set 804d6906-fa80-40ac-9081-a71a4d595378
unregistered
Apr  7 19:36:23 fileserver kernel: [   21.436273] bcache:
register_cache() registered cache device sda4

At this point, everything is gone, and that's where I'm at right now.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-11  6:54                             ` Dan Merillat
@ 2015-04-11  7:52                               ` Kai Krakow
  2015-04-11 18:53                                 ` Dan Merillat
       [not found]                                 ` <CAPL5yKfpk8+6VwcUVcwJ9QxAZJQmqaa98spCyT7+LekkRvkeAw@mail.gmail.com>
  0 siblings, 2 replies; 25+ messages in thread
From: Kai Krakow @ 2015-04-11  7:52 UTC (permalink / raw)
  To: linux-bcache

Dan Merillat <dan.merillat@gmail.com> schrieb:

> Looking through the kernel log, this may be related: I booted into
> 4.0-rc7, and attempted to run it there at first:
> Apr  7 12:54:08 fileserver kernel: [ 2028.533893] bcache-register:
> page allocation failure: order:8, mode:0x
> ... memory dump
> Apr  7 12:54:08 fileserver kernel: [ 2028.541396] bcache:
> register_cache() error opening sda4: cannot allocate memory

Is your system under memory stress? Are you maybe using the huge memory page
allocation policy in your kernel? If yes, could you retry without it, or at
least set it to madvise mode?
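
In case it helps, this is the knob I mean (the usual sysfs path for
transparent hugepages):

  cat /sys/kernel/mm/transparent_hugepage/enabled    # [brackets] mark the active policy
  echo madvise > /sys/kernel/mm/transparent_hugepage/enabled   # until the next reboot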

> Apr  7 12:55:29 fileserver kernel: [ 2109.303315] bcache:
> run_cache_set() invalidating existing data
> Apr  7 12:55:29 fileserver kernel: [ 2109.408255] bcache:
> bch_cached_dev_attach() Caching md127 as bcache0 on set
> 804d6906-fa80-40ac-9081-a71a4d595378

Why is it on md? I thought you were not using intermediate layers like LVM...

> Apr  7 12:55:29 fileserver kernel: [ 2109.408443] bcache:
> register_cache() registered cache device sda4
> Apr  7 12:55:33 fileserver kernel: [ 2113.307687] bcache:
> bch_cached_dev_attach() Can't attach md127: already attached

And why is it done twice? Something looks strange here... What is your 
device layout?

> Apr  7 12:55:33 fileserver kernel: [ 2113.307747] bcache:
> __cached_dev_store() Can't attach 804d6906-fa80-40ac-9081-a71a4d595378
> Apr  7 12:55:33 fileserver kernel: [ 2113.307747] : cache set not found

My first guess would be that two different caches overlap and try to share 
the same device space. I had a similar problem after repartitioning because 
I did not "wipefs" the device first.

> A few hours later, I was getting stalls:
> Apr  7 18:00:20 fileserver kernel: [20400.288049] INFO: task java:3610
> blocked for more than 120 seconds.
> Apr  7 18:00:20 fileserver kernel: [20400.288069]       Not tainted
> 4.0.0-rc7 #1
> Apr  7 18:00:20 fileserver kernel: [20400.288085] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this
>  message.
> Apr  7 18:00:20 fileserver kernel: [20400.293521] INFO: task
> nmbd:22692 blocked for more than 120 seconds.
> Apr  7 18:00:20 fileserver kernel: [20400.293532]       Not tainted
> 4.0.0-rc7 #1
> Apr  7 18:00:20 fileserver kernel: [20400.293545] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this
>  message.

If you are using huge memory this may be an artifact of your initial 
finding.

> So I rebooted to 4.0-rc7 again:
> Apr  7 19:36:23 fileserver kernel: [    2.145004] bcache:
> journal_read_bucket() 157: too big, 552 bytes, offset 2047
> Apr  7 19:36:23 fileserver kernel: [    2.154586] bcache: prio_read()
> bad csum reading priorities
> Apr  7 19:36:23 fileserver kernel: [    2.154643] bcache: prio_read()
> bad magic reading priorities
> Apr  7 19:36:23 fileserver kernel: [    2.158008] bcache: error on
> 804d6906-fa80-40ac-9081-a71a4d595378: bad btree header at bucket
> 65638, block 0, 0 keys, disabling caching

Same here: If somehow two different caches overwrite each other, this could 
explain the problem.

> Apr  7 19:36:23 fileserver kernel: [    2.158408] bcache:
> cache_set_free() Cache set 804d6906-fa80-40ac-9081-a71a4d595378
> unregistered
> Apr  7 19:36:23 fileserver kernel: [    2.158468] bcache:
> register_cache() registered cache device sda4
> 
> Apr  7 19:36:23 fileserver kernel: [    2.226581] md127: detected
> capacity change from 0 to 12001954234368

I wonder where md127 comes from... Maybe bcache probing is running too early 
and should run after md setup.

> Apr  7 19:36:23 fileserver kernel: [    2.265347] bcache:
> register_bdev() registered backing device md127
> 
> Apr  7 19:36:23 fileserver kernel: [   21.423819] bcache:
> journal_read_bucket() 157: too big, 552 bytes, offset 2047
> Apr  7 19:36:23 fileserver kernel: [   21.432091] bcache: prio_read()
> bad csum reading priorities
> Apr  7 19:36:23 fileserver kernel: [   21.432138] bcache: prio_read()
> bad magic reading priorities
> Apr  7 19:36:23 fileserver kernel: [   21.435613] bcache: error on
> 804d6906-fa80-40ac-9081-a71a4d595378: bad btree header at bucket
> 65638, block 0, 0 keys, disabling caching
> Apr  7 19:36:23 fileserver kernel: [   21.436225] bcache:
> cache_set_free() Cache set 804d6906-fa80-40ac-9081-a71a4d595378
> unregistered
> Apr  7 19:36:23 fileserver kernel: [   21.436273] bcache:
> register_cache() registered cache device sda4
> 
> At this point, everything is gone, and that's where I'm at right now.

-- 
Replies to list only preferred.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-11  7:52                               ` Kai Krakow
@ 2015-04-11 18:53                                 ` Dan Merillat
       [not found]                                 ` <CAPL5yKfpk8+6VwcUVcwJ9QxAZJQmqaa98spCyT7+LekkRvkeAw@mail.gmail.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Dan Merillat @ 2015-04-11 18:53 UTC (permalink / raw)
  To: Kai Krakow; +Cc: linux-bcache

On Sat, Apr 11, 2015 at 3:52 AM, Kai Krakow <hurikhan77@gmail.com> wrote:
> Dan Merillat <dan.merillat@gmail.com> schrieb:
>
>> Looking through the kernel log, this may be related: I booted into
>> 4.0-rc7, and attempted to run it there at first:
>> Apr  7 12:54:08 fileserver kernel: [ 2028.533893] bcache-register:
>> page allocation failure: order:8, mode:0x
>> ... memory dump
>> Apr  7 12:54:08 fileserver kernel: [ 2028.541396] bcache:
>> register_cache() error opening sda4: cannot allocate memory
>
> Is your system under memory stress? Are you maybe using the huge memory page
> allocation policy in your kernel? If yes, could you retry without or at
> least set it to madvice mode?

No, it's right after bootup, nothing heavy running yet.  No idea why memory
is already so fragmented - it's something to do with 4.0-rc7, since it never
had that problem on 3.18.

>> Apr  7 12:55:29 fileserver kernel: [ 2109.303315] bcache:
>> run_cache_set() invalidating existing data
>> Apr  7 12:55:29 fileserver kernel: [ 2109.408255] bcache:
>> bch_cached_dev_attach() Caching md127 as bcache0 on set
>> 804d6906-fa80-40ac-9081-a71a4d595378
>
> Why is it on md? I thought you are not using intermediate layers like LVM...

The backing device is MD, the cdev is directly on sda4

>> Apr  7 12:55:29 fileserver kernel: [ 2109.408443] bcache:
>> register_cache() registered cache device sda4
>> Apr  7 12:55:33 fileserver kernel: [ 2113.307687] bcache:
>> bch_cached_dev_attach() Can't attach md127: already attached
>
> And why is it done twice? Something looks strange here... What is your
> device layout?

2100 seconds after boot?  That's me doing it manually to try to figure
out why I can't access my filesystem.

>
>> Apr  7 12:55:33 fileserver kernel: [ 2113.307747] bcache:
>> __cached_dev_store() Can't attach 804d6906-fa80-40ac-9081-a71a4d595378
>> Apr  7 12:55:33 fileserver kernel: [ 2113.307747] : cache set not found
>
> My first guess would be that two different caches overlap and try to share
> the same device space. I had a similar problem after repartitioning because
> I did not "wipefs" the device first.

I had to wipefs, it wouldn't let me create the bcache super until I did.

> If you are using huge memory this may be an artifact of your initial
> finding.

I'm not using it for anything, but it's configured.  It never gave this
problem on 3.18, so something changed in 4.0.

>
>> So I rebooted to 4.0-rc7 again:
>> Apr  7 19:36:23 fileserver kernel: [    2.145004] bcache:
>> journal_read_bucket() 157: too big, 552 bytes, offset 2047
>> Apr  7 19:36:23 fileserver kernel: [    2.154586] bcache: prio_read()
>> bad csum reading priorities
>> Apr  7 19:36:23 fileserver kernel: [    2.154643] bcache: prio_read()
>> bad magic reading priorities
>> Apr  7 19:36:23 fileserver kernel: [    2.158008] bcache: error on
>> 804d6906-fa80-40ac-9081-a71a4d595378: bad btree header at bucket
>> 65638, block 0, 0 keys, disabling caching
>
> Same here: If somehow two different caches overwrite each other, this could
> explain the problem.

Possibly!  So wipefs wasn't good enough, I should have done a discard
on the entire cdev
to make sure?
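
Something like this before re-running make-bcache, I assume (device name as
in my setup):

  # discard every sector of the old cache partition so no stale bcache
  # metadata survives into the new cache set
  blkdiscard /dev/sda4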

>
>> Apr  7 19:36:23 fileserver kernel: [    2.158408] bcache:
>> cache_set_free() Cache set 804d6906-fa80-40ac-9081-a71a4d595378
>> unregistered
>> Apr  7 19:36:23 fileserver kernel: [    2.158468] bcache:
>> register_cache() registered cache device sda4
>>
>> Apr  7 19:36:23 fileserver kernel: [    2.226581] md127: detected
>> capacity change from 0 to 12001954234368
>
> I wonder where md127 comes from... Maybe bcache probing is running too early
> and should run after md setup.

No, that's how udev works, it registers things as it finds them.  So
on raw disks it finds
the bcache cdev, and registers it.  Then it finds the raid signature
and sets it up.  When the new md127 shows up, it finds the bdev
signature and registers that.   Bog-standard setup, most people never
look this closely at the startup.  I'd hope bcache wouldn't screw up
if its pieces get registered in a different order.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
       [not found]                                 ` <CAPL5yKfpk8+6VwcUVcwJ9QxAZJQmqaa98spCyT7+LekkRvkeAw@mail.gmail.com>
@ 2015-04-11 20:09                                   ` Kai Krakow
  2015-04-12  5:56                                     ` Dan Merillat
  0 siblings, 1 reply; 25+ messages in thread
From: Kai Krakow @ 2015-04-11 20:09 UTC (permalink / raw)
  To: linux-bcache

Dan Merillat <dan.merillat@gmail.com> schrieb:

> On Sat, Apr 11, 2015 at 3:52 AM, Kai Krakow <hurikhan77@gmail.com> wrote:
>> Dan Merillat <dan.merillat@gmail.com> schrieb:
>>
>>> Looking through the kernel log, this may be related: I booted into
>>> 4.0-rc7, and attempted to run it there at first:
>>> Apr  7 12:54:08 fileserver kernel: [ 2028.533893] bcache-register:
>>> page allocation failure: order:8, mode:0x
>>> ... memory dump
>>> Apr  7 12:54:08 fileserver kernel: [ 2028.541396] bcache:
>>> register_cache() error opening sda4: cannot allocate memory
>>
>> Is your system under memory stress? Are you maybe using the huge memory
>> page allocation policy in your kernel? If yes, could you retry without or
>> at least set it to madvice mode?
> 
> No, it's right after bootup, nothing heavy running yet.  No idea why
> memory is already so fragmented - it's something to do with 4.0-rc7,
> since it never has had that problem on 3.18.

I'm thinking the same: something new in 4.0-*. And it is early boot, so huge
pages should make no difference. Yet I had problems with it when it was
configured - memory became heavily fragmented and swapping became a very slow
process. So I set it to the madvise policy and have had no problems since.
Maybe worth a try, though I still can't imagine why this would become a
problem in early boot.

>>> Apr  7 12:55:29 fileserver kernel: [ 2109.303315] bcache:
>>> run_cache_set() invalidating existing data
>>> Apr  7 12:55:29 fileserver kernel: [ 2109.408255] bcache:
>>> bch_cached_dev_attach() Caching md127 as bcache0 on set
>>> 804d6906-fa80-40ac-9081-a71a4d595378
>>
>> Why is it on md? I thought you are not using intermediate layers like
>> LVM...
> 
> The backing device is MD, the cdev is directly on sda4

Ah okay...

>>> Apr  7 12:55:29 fileserver kernel: [ 2109.408443] bcache:
>>> register_cache() registered cache device sda4
>>> Apr  7 12:55:33 fileserver kernel: [ 2113.307687] bcache:
>>> bch_cached_dev_attach() Can't attach md127: already attached
>>
>> And why is it done twice? Something looks strange here... What is your
>> device layout?
> 
> 2100 seconds after boot?  That's me doing it manually to try to figure
> out why I can't access my filesystem.

Oh, I didn't get the time difference here.

>>> Apr  7 12:55:33 fileserver kernel: [ 2113.307747] bcache:
>>> __cached_dev_store() Can't attach 804d6906-fa80-40ac-9081-a71a4d595378
>>> Apr  7 12:55:33 fileserver kernel: [ 2113.307747] : cache set not found
>>
>> My first guess would be that two different caches overlap and try to
>> share the same device space. I had a similar problem after repartitioning
>> because I did not "wipefs" the device first.
> 
> I had to wipefs, it wouldn't let me create the bcache super until I did.

Yes, same here. But that is not what I meant. Some years back I created
btrfs on the raw device, then decided I preferred partitioning, did that, and
created btrfs inside the partition. The kernel (or udev) then saw two btrfs
filesystems. Another user here reported the same problem after creating
bcache first on the raw device, then in a partition. The kernel saw two
bcache devices, with one being broken. The fix was to kill the superblock
signature that was still lingering around on the raw device. But it would
have been easier to just wipefs in the first place, before partitioning.
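
In other words, roughly (device name is only an example):

  # clear filesystem/bcache/raid signatures from the raw device *before*
  # partitioning it, so udev cannot find two copies of the superblock later
  wipefs -a /dev/sdX
  # if a stale signature already hides behind a new partition table, wipefs
  # should also be able to remove just that type from the parent device:
  wipefs -a -t bcache /dev/sdX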

>> If you are using huge memory this may be an artifact of your initial
>> finding.
> 
> I'm not using it for anything, but it's configured.  It's never given
> this problem in 3.18, so something changed in 4.0.

Maybe you want to try without the hugepage option? Just to sort things out?

>>> So I rebooted to 4.0-rc7 again:
>>> Apr  7 19:36:23 fileserver kernel: [    2.145004] bcache:
>>> journal_read_bucket() 157: too big, 552 bytes, offset 2047
>>> Apr  7 19:36:23 fileserver kernel: [    2.154586] bcache: prio_read()
>>> bad csum reading priorities
>>> Apr  7 19:36:23 fileserver kernel: [    2.154643] bcache: prio_read()
>>> bad magic reading priorities
>>> Apr  7 19:36:23 fileserver kernel: [    2.158008] bcache: error on
>>> 804d6906-fa80-40ac-9081-a71a4d595378: bad btree header at bucket
>>> 65638, block 0, 0 keys, disabling caching
>>
>> Same here: If somehow two different caches overwrite each other, this
>> could explain the problem.
> 
> Possibly!  So wipefs wasn't good enough, I should have done a discard
> on the entire cdev
> to make sure?

See above... And I have another idea below:

>>> Apr  7 19:36:23 fileserver kernel: [    2.158408] bcache:
>>> cache_set_free() Cache set 804d6906-fa80-40ac-9081-a71a4d595378
>>> unregistered
>>> Apr  7 19:36:23 fileserver kernel: [    2.158468] bcache:
>>> register_cache() registered cache device sda4
>>>
>>> Apr  7 19:36:23 fileserver kernel: [    2.226581] md127: detected
>>> capacity change from 0 to 12001954234368
>>
>> I wonder where md127 comes from... Maybe bcache probing is running too
>> early and should run after md setup.
> 
> No, that's how udev works, it registers things as it finds them.  So
> on raw disks it finds
> the bcache cdev, and registers it.  Then it finds the raid signature
> and sets it up.  When the new md127 shows up, it finds the bdev
> signature and registers that.   Bog-standard setup, most people never
> look this closely at the startup.  I'd hope bcache wouldn't screw up
> if its pieces get registered in a different order.

I don't think that order is a problem. But I remember my md times back in
kernel 2.2 when I used it to mirror two hard disks. The problem with md (and
that's why I never used it again and avoided it), at least at that time, was:
any software would see the same data through two devices, the md device and
the underlying raw device. MD didn't hide it away the way bcache or lvm do
(by using a private superblock); it simply depends on a configuration file
and some auto-detection through a partition signature. This is an artifact of
the fact that you could easily migrate from a single device to an md raid
device without backup/restore, as outlined here:

http://tldp.org/HOWTO/Software-RAID-HOWTO-7.html

With this knowledge, I guess that bcache could probably detect its backing
device signature twice - once through the underlying raw device and once
through the md device. From your logs I'm not sure whether they were complete
enough to see that case. But to be sure I'd modify the udev rules to exclude
the raw md member devices from being run through probe-bcache. Otherwise all
sorts of strange things may happen (like one process accessing the backing
device through md while bcache accesses it through the member device -
probably even on different mirror stripes).
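
Untested, but what I have in mind is a single extra rule near the top of
bcache-tools' 69-bcache.rules (assuming your distribution installs it with
the usual bcache_end label):

  # don't run probe-bcache on raw md members; only the assembled array
  # should be probed for a bcache backing-device signature
  ENV{ID_FS_TYPE}=="linux_raid_member", GOTO="bcache_end"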

It's your setup, but personally I'd avoid MD for that reason and go with
lvm. MD is just not modern, nor appropriate for modern system setups. It
should really only be there for legacy setups and migration paths.

I'm also not sure whether MD is able to pass write barriers through
correctly, which is needed to keep filesystems consistent across
crashes/reboots. Even LVM/device-mapper ignored them some kernel versions
back, and I am not sure whether they are respected for every target type as
of now - which is why I always recommend avoiding these layers if not needed.

I could also imagine that ignoring write barriers (not passing them down to
the hardware while the filesystem driver expects them to work) combined with
discard may lead to filesystem corruption upon reboots or crashes.

-- 
Replies to list only preferred.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-11 20:09                                   ` Kai Krakow
@ 2015-04-12  5:56                                     ` Dan Merillat
  2015-04-29 17:48                                       ` Dan Merillat
  0 siblings, 1 reply; 25+ messages in thread
From: Dan Merillat @ 2015-04-12  5:56 UTC (permalink / raw)
  To: Kai Krakow; +Cc: linux-bcache

On Sat, Apr 11, 2015 at 4:09 PM, Kai Krakow <hurikhan77@gmail.com> wrote:

> With this knowledge, I guess that bcache could probably detect its backing
> device signature twice - once through the underlying raw device and once
> through the md device. From your logs I'm not sure if they were complete

It doesn't, the system is smarter than you think it is.

> enough to see that case. But to be sure I'd modify the udev rules to exclude
> the md parent devices from being run through probe-bcache. Otherwise all
> sorts of strange things may happen (like one process accessing the backing
> device through md, while bcache access it through the parent device -
> probably even on different mirror stripes).

This didn't occur, I copied all the lines pertaining to bcache but
skipped the superfluous ones.

> It's your setup, but personally I'd avoid MD for that reason and go with
> lvm. MD is just not modern, neither appropriate for modern system setups. It
> should really be just there for legacy setups and migration paths.

Not related to bcache at all.  Perhaps complain about MD on the
appropriate list?  I'm not seeing any evidence that MD had anything to
do with this, especially since the issues with bcache are entirely
confined to the direct SATA access to /dev/sda4.

In that vein, I'm reading the on-disk format of bcache and seeing
exactly what's still valid on my system.  It looks like I've got
65,000 good buckets before the first bad one.  My idea is to go
through, look for valid data in the buckets and use a COW in
user-mode-linux to write that data back to the (copy-on-write version
of) the backing device.  Basically, anything that passes checksum and
is still 'dirty', force-write-it-out.  Then see what the status of my
backing-store is.  If it works, do it outside UML to the real backing
store.
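
For the COW part, a throwaway device-mapper snapshot on top of the real
backing device might do instead of UML; a rough sketch (size and paths are
only examples):

  truncate -s 8G /tmp/cow.img                # sparse file for the COW exceptions
  losetup /dev/loop0 /tmp/cow.img
  # non-persistent snapshot, 8-sector chunks: experimental writes land in
  # the COW file, the real md127 stays untouched
  echo "0 $(blockdev --getsz /dev/md127) snapshot /dev/md127 /dev/loop0 N 8" \
      | dmsetup create md127-cow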

Are there any diagnostic tools outside the bcache-tools repo? There's not
much there other than showing the superblock info.  Otherwise I'll just
finish writing it myself.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-12  5:56                                     ` Dan Merillat
@ 2015-04-29 17:48                                       ` Dan Merillat
  2015-04-29 18:00                                         ` Ming Lin
  2015-04-29 19:57                                         ` Kai Krakow
  0 siblings, 2 replies; 25+ messages in thread
From: Dan Merillat @ 2015-04-29 17:48 UTC (permalink / raw)
  To: Kai Krakow; +Cc: linux-bcache

Killed it again - enabled bcache discard, copied a few TB of data from the
backup to the drive, rebooted, and got a different error:
"bcache: bch_cached_dev_attach() Couldn't find uuid for <REDACTED> in set"

The exciting failure that required reboot this time was an infinite
spin in bcache_writeback.

I'll give it another shot at narrowing down exactly what causes the
failure before I give up on bcache entirely.

On Sun, Apr 12, 2015 at 1:56 AM, Dan Merillat <dan.merillat@gmail.com> wrote:
> On Sat, Apr 11, 2015 at 4:09 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
>
>> With this knowledge, I guess that bcache could probably detect its backing
>> device signature twice - once through the underlying raw device and once
>> through the md device. From your logs I'm not sure if they were complete
>
> It doesn't, the system is smarter than you think it is.
>
>> enough to see that case. But to be sure I'd modify the udev rules to exclude
>> the md parent devices from being run through probe-bcache. Otherwise all
>> sorts of strange things may happen (like one process accessing the backing
>> device through md, while bcache access it through the parent device -
>> probably even on different mirror stripes).
>
> This didn't occur, I copied all the lines pertaining to bcache but
> skipped the superfluous ones.
>
>> It's your setup, but personally I'd avoid MD for that reason and go with
>> lvm. MD is just not modern, neither appropriate for modern system setups. It
>> should really be just there for legacy setups and migration paths.
>
> Not related to bcache at all.  Perhaps complain about MD on the
> appropriate list?  I'm not seeing any evidence that MD had anything to
> do with this, especially since the issues with bcache are entirely
> confined to the direct SATA access to /dev/sda4.
>
> In that vein, I'm reading the on-disk format of bcache and seeing
> exactly what's still valid on my system.  It looks like I've got
> 65,000 good buckets before the first bad one.  My idea is to go
> through, look for valid data in the buckets and use a COW in
> user-mode-linux to write that data back to the (copy-on-write version
> of) the backing device.  Basically, anything that passes checksum and
> is still 'dirty', force-write-it-out.  Then see what the status of my
> backing-store is.  If it works, do it outside UML to the real backing
> store.
>
> Are there any diagnostic tools outside the bcache-tools repo? Not much
> there other than show the superblock info.  Otherwise I'll just finish
> writing it myself.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-29 17:48                                       ` Dan Merillat
@ 2015-04-29 18:00                                         ` Ming Lin
  2015-04-29 19:57                                         ` Kai Krakow
  1 sibling, 0 replies; 25+ messages in thread
From: Ming Lin @ 2015-04-29 18:00 UTC (permalink / raw)
  To: Dan Merillat; +Cc: Kai Krakow, linux-bcache

On Wed, Apr 29, 2015 at 10:48 AM, Dan Merillat <dan.merillat@gmail.com> wrote:
> Killed it again - enabled bcache discard, copied a few TB of data from
> the backup the the drive, rebooted, different error
> "bcache: bch_cached_dev_attach() Couldn't find uuid for <REDACTED> in set"
>
> The exciting failure that required reboot this time was an infinite
> spin in bcache_writeback.
>
> I'll give it another shot at narrowing down exactly what causes the
> failure before I give up on bcache entirely.

Don't know if it helps or not. Maybe you can try these patches:
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=bcache

Ming

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-29 17:48                                       ` Dan Merillat
  2015-04-29 18:00                                         ` Ming Lin
@ 2015-04-29 19:57                                         ` Kai Krakow
  1 sibling, 0 replies; 25+ messages in thread
From: Kai Krakow @ 2015-04-29 19:57 UTC (permalink / raw)
  To: linux-bcache

Dan Merillat <dan.merillat@gmail.com> schrieb:

> Killed it again - enabled bcache discard, copied a few TB of data from
> the backup the the drive, rebooted, different error
> "bcache: bch_cached_dev_attach() Couldn't find uuid for <REDACTED> in set"
> 
> The exciting failure that required reboot this time was an infinite
> spin in bcache_writeback.
> 
> I'll give it another shot at narrowing down exactly what causes the
> failure before I give up on bcache entirely.

I wonder what is "wrong" with your setup... Using bcache with online discard 
works rock solid for me. So your access patterns either trigger a bug in 
your storage software stack (driver/md/bcache/fs) or in your hardware's 
firmware (bcache probably exposes very different access patterns from normal 
filesystem access).

I think the frustration level is already pretty high, but given that taking
either discard or bcache out of the stack makes it work, I wonder what
happens if you instead take md out of the stack.

I also wonder if you could trigger the problem by enabling online discard on
the fs only while using bcache. I have discard enabled for both bcache and
the fs. I don't know how it passes from the fs down through the storage
layers, but at least I could enable it: discard is announced as supported by
the virtual bcache block device.
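
To see whether discards would actually propagate, something like this should
be enough (mount point only as an example):

  lsblk -D /dev/bcache0               # non-zero DISC-GRAN/DISC-MAX = discard advertised
  mount -o remount,discard /mnt/data  # online discard on the fs only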

Then I'd also take the chance to try completely different SSD hardware which
has proven to work, use it for the same setup, and see whether it works then,
to rule the firmware out.

For the last part, I can say that a Crucial MX100 128GB works for me, though
I don't use md. I applied a firmware update lately (MU02) which, according to
the changelog, fixed NCQ TRIM commands (queued discards - the kernel
blacklisted queued discards for my model anyway) and improved cable signal
issues. I wonder whether the kernel enabled NCQ TRIM for your drive; you
could maybe blacklist your drive manually in the kernel source and see if the
"normal" TRIM command works.

Could you maybe try libata.force=noncq or libata.force=X.YY:noncq? Since 
bcache is a huge, block sorting elevator, it shouldn't hurt too much.
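
For completeness, the parameter goes on the kernel command line, e.g. via
grub (the 1.00 link number is only an example - check dmesg for your port):

  # /etc/default/grub
  GRUB_CMDLINE_LINUX_DEFAULT="... libata.force=1.00:noncq"
  # then regenerate grub.cfg, e.g.: grub-mkconfig -o /boot/grub/grub.cfg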
 
> On Sun, Apr 12, 2015 at 1:56 AM, Dan Merillat <dan.merillat@gmail.com>
> wrote:
>> On Sat, Apr 11, 2015 at 4:09 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
>>
>>> With this knowledge, I guess that bcache could probably detect its
>>> backing device signature twice - once through the underlying raw device
>>> and once through the md device. From your logs I'm not sure if they were
>>> complete
>>
>> It doesn't, the system is smarter than you think it is.
>>
>>> enough to see that case. But to be sure I'd modify the udev rules to
>>> exclude the md parent devices from being run through probe-bcache.
>>> Otherwise all sorts of strange things may happen (like one process
>>> accessing the backing device through md, while bcache access it through
>>> the parent device - probably even on different mirror stripes).
>>
>> This didn't occur, I copied all the lines pertaining to bcache but
>> skipped the superfluous ones.
>>
>>> It's your setup, but personally I'd avoid MD for that reason and go with
>>> LVM. MD is just not modern, nor appropriate for modern system
>>> setups. It should really only be there for legacy setups and migration
>>> paths.
>>
>> Not related to bcache at all.  Perhaps complain about MD on the
>> appropriate list?  I'm not seeing any evidence that MD had anything to
>> do with this, especially since the issues with bcache are entirely
>> confined to the direct SATA access to /dev/sda4.
>>
>> In that vein, I'm reading the on-disk format of bcache and seeing
>> exactly what's still valid on my system.  It looks like I've got
>> 65,000 good buckets before the first bad one.  My idea is to go
>> through, look for valid data in the buckets, and use a COW in
>> user-mode-linux to write that data back to a (copy-on-write version
>> of the) backing device.  Basically, anything that passes its checksum
>> and is still 'dirty' gets force-written out.  Then see what the status
>> of my backing store is.  If that works, do it outside UML to the real
>> backing store.
>>
>> Are there any diagnostic tools outside the bcache-tools repo? There's
>> not much there beyond showing the superblock info.  Otherwise I'll just
>> finish writing one myself.
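
As a lighter-weight alternative to UML for the copy-on-write experiment
above, a throwaway device-mapper snapshot over the backing device might
work too; a rough sketch, with placeholder paths and an arbitrary COW
size (nothing bcache-specific in it):

  BACKING=/dev/sdb3                        # real backing device
  SECTORS=$(blockdev --getsz $BACKING)     # size in 512-byte sectors

  truncate -s 10G /tmp/cow.img             # sparse file for the COW data
  COW=$(losetup --find --show /tmp/cow.img)

  # transient (N) snapshot, 8-sector chunks; all writes go to the COW file
  echo "0 $SECTORS snapshot $BACKING $COW N 8" | dmsetup create bdev-overlay

  # experiment against /dev/mapper/bdev-overlay, then throw it away:
  #   dmsetup remove bdev-overlay && losetup -d $COW && rm /tmp/cow.img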
-- 
Replies to list only preferred.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: bcache fails after reboot if discard is enabled
  2015-04-08  0:06           ` Dan Merillat
  2015-04-08 18:17             ` Eric Wheeler
  2015-04-08 18:46             ` Kai Krakow
@ 2015-06-05  5:11             ` Kai Krakow
  2 siblings, 0 replies; 25+ messages in thread
From: Kai Krakow @ 2015-06-05  5:11 UTC (permalink / raw)
  To: linux-bcache

Dan Merillat <dan.merillat@gmail.com> schrieb:

>> It works perfectly fine here with the latest 3.18. My setup is backing a
>> btrfs filesystem in write-back mode. I can reboot cleanly or hard-reset
>> upon freezes; I have had no issues yet and no data loss. Even after a
>> hard-reset the kernel logs of both bcache and btrfs were clean, the
>> filesystem was clean - just the usual btrfs recovery messages after an
>> unclean shutdown.
>>
>> I wonder if the SSD and/or the block layer in use may be part of the
>> problem:
>>
>>   * if putting bcache on LVM, discards may not be handled well
>>   * if putting bcache or the backing fs on LVM, barriers may not be
>>     handled well (bcache relies on perfectly working barriers)
>>   * does the SSD support powerloss protection? (IOW, use capacitors)
>>   * latest firmware applied? read the changelogs of it?
>>
>> I'd try to first figure out these differences before looking further into
>> debugging. I guess that most consumer-grade drives lack at least a few of
>> the important features needed to use write-back mode, or to use bcache at
>> all.
>>
>> So, to start the list: my SSD is a Crucial MX100 128GB with discards
>> enabled (for both bcache and btrfs), using plain raw devices (no LVM or
>> MD involved). It supports TRIM (as does my chipset), and it supports
>> powerloss protection and maybe even some internal RAID-like data
>> protection layer (whatever that is, it's in the spec sheets).
>>
>> I'm not sure what a hard-reset technically means to the SSD, but I guess
>> it is handled as some sort of short powerloss. Reading through different
>> SSD firmware update descriptions, I also see a lot of words about power-off
>> and reset problems being fixed that could otherwise lead to data loss.
>> That could be pretty fatal to bcache, as it considers its storage as always
>> unclean (probably even in write-through mode). Having damaged data blocks
>> out of the expected write order (barriers!) could be pretty bad when bcache
>> recovers from the last shutdown and replays its logs.
> 
> Samsung 840-EVO 256GB here, running 4.0-rc7 (was 3.18)
> 
> There are no known issues with TRIM on an 840-EVO, and no powerloss or
> anything of the sort occurred.  I was seeing excessive write
> amplification on my SSD and enabled discard - then my machine
> promptly started lagging, eventually disk access locked up, and after a
> reboot I was confronted with:

I've tried with a Samsung 850 EVO 256GB now, and I don't see those errors on 
kernel 4.0.4. Discard is enabled, write-back is enabled, reboots work just 
fine.

> [  276.558692] bcache: journal_read_bucket() 157: too big, 552 bytes,
> offset 2047
> [  276.571448] bcache: prio_read() bad csum reading priorities
> [  276.571528] bcache: prio_read() bad magic reading priorities
> [  276.576807] bcache: error on 804d6906-fa80-40ac-9081-a71a4d595378:
> bad btree header at bucket 65638, block 0, 0 keys, disabling caching
> [  276.577457] bcache: register_cache() registered cache device sda4
> [  276.577632] bcache: cache_set_free() Cache set
> 804d6906-fa80-40ac-9081-a71a4d595378 unregistered
> 
> Attempting to check the backing store (echo 1 > bcache/running):
> 
> [  687.912987] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  687.913192] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  687.913231] BTRFS: failed to read tree root on bcache0
> [  687.936073] BTRFS: open_ctree failed
> 
> The cache device is not going through LVM or anything of the sort, so
> this is a direct failure of bcache.  Perhaps due to eraseblock
> alignment and assumptions about sizes?  Either way, I've got a ton of
> data to recover/restore now and I'm unhappy about it.
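
As a side note on the sysfs knob quoted above, the full path sits under
the backing device's sysfs directory; for example, if the backing
partition were sdb3 (device name here is just an example):

  echo 1 > /sys/block/sdb/sdb3/bcache/running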

As said, not here. The difference is 840 EVO vs. 850 EVO. AFAIK, bcache has
not seen any updates during the 4.0 phase, so it may still be a problem in
Samsung's 840 firmware, or maybe something with your SATA chipset or its
interaction with the kernel, or with the way your shutdown process works.

Regarding your idea about eraseblock alignment or sizes: I've used 2M bucket
size and 4k block size again this time. While I didn't find any specs
supporting my settings, they seemed more appropriate when looking at the
specs of other Samsung TLC drives. Bcache ships with defaults of 1M and 2k.
Given that difference between our setups, it may play into the problem. But
I'd argue that, if it makes a difference, it is something the firmware
should handle better. It would also make me feel uncomfortable, because it
tells me that using other settings probably only hides a bug that may still
occur, just much less frequently.
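
For reference, this is roughly how those sizes are chosen at format time
(make-bcache from bcache-tools; the defaults apply when the options are
left out). Re-formatting the cache device wipes it, so only the -C side
would need redoing - the device name is a placeholder:

  # cache device with 2M buckets and 4k blocks
  make-bcache -C --bucket 2M --block 4k /dev/sdX4

  # check what actually landed in the superblock
  bcache-super-show /dev/sdX4 | grep -Ei 'bucket|block'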

-- 
Replies to list only preferred.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-06-05  5:11 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-02  9:47 bcache fails after reboot if discard is enabled Stefan Priebe - Profihost AG
2015-01-02 10:00 ` Stefan Priebe - Profihost AG
2015-01-03 16:32   ` Rolf Fokkens
2015-01-03 19:32     ` Stefan Priebe
2015-01-05  0:06       ` Michael Goertz
2015-02-09 19:46         ` Kai Krakow
2015-04-08  0:06           ` Dan Merillat
2015-04-08 18:17             ` Eric Wheeler
2015-04-08 18:27               ` Stefan Priebe
2015-04-08 19:31                 ` Eric Wheeler
2015-04-08 19:54                   ` Kai Krakow
2015-04-08 22:02                     ` Dan Merillat
2015-04-10 23:00                       ` Kai Krakow
2015-04-11  0:14                         ` Kai Krakow
2015-04-11  6:31                           ` Dan Merillat
2015-04-11  6:54                             ` Dan Merillat
2015-04-11  7:52                               ` Kai Krakow
2015-04-11 18:53                                 ` Dan Merillat
     [not found]                                 ` <CAPL5yKfpk8+6VwcUVcwJ9QxAZJQmqaa98spCyT7+LekkRvkeAw@mail.gmail.com>
2015-04-11 20:09                                   ` Kai Krakow
2015-04-12  5:56                                     ` Dan Merillat
2015-04-29 17:48                                       ` Dan Merillat
2015-04-29 18:00                                         ` Ming Lin
2015-04-29 19:57                                         ` Kai Krakow
2015-04-08 18:46             ` Kai Krakow
2015-06-05  5:11             ` Kai Krakow

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.