All of lore.kernel.org
 help / color / mirror / Atom feed
* Filesystem corruption caused by disk from backing device kicked out of RAID1 array
@ 2013-07-21 16:20 Cyril B.
  2013-07-21 18:17 ` [PATCH] Don't register if a non-bcache superblock is found as well Gabriel
  0 siblings, 1 reply; 6+ messages in thread
From: Cyril B. @ 2013-07-21 16:20 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hello,

I've unfortunately experienced a filesystem corruption with bcache in 
writeback mode. The good news is I can reproduce it.

Short version: this is on stock kernel 3.10.0, bcache-tools from 
20130626. Both my backing and cache devices are in a RAID1 mdadm array. 
Writeback caching is enabled. One disk from the backing RAID1 suddenly 
has an issue and gets kicked out of the RAID. Everything continues 
smoothly, as expected.

The server is rebooted (several hours later) in order to 'reset' the 
failed disk and check if it's still functioning or needs to be replaced 
(this is a non-hot swap chassis). The disk is fine, but bcache is now 
very confused as it sees 2 devices with the same UUID. Boom, the 
filesystem is now corrupted.

Longer version: here's how to reproduce.

* create mdadm RAID1 arrays with a superblock at the end of the device 
(0.9 or 1.0):
mdadm -C /dev/md10 --metadata=1.0 -l 1 -n 2 /dev/sdc1 /dev/sdd1
mdadm -C /dev/md11 --metadata=1.0 -l 1 -n 2 /dev/sdc2 /dev/sdd2

* create your bcache device:
make-bcache -B /dev/md10 -C /dev/md11

* create a filesystem and mount it:
mkfs -t xfs /dev/bcache0
mount /dev/bcache0 /mnt

* enable writeback mode:
echo writeback > /sys/block/bcache0/bcache/cache_mode

* simulate writes on the filesystem:
iozone -a

* wait a bit, make sure data is being written on both the backing and 
cache device. At some point, simulate a failure on the disk:
mdadm -f /dev/md10 /dev/sdc1

* wait again until more data is written on the /dev/md10 backing device: 
/dev/sdc1 and /dev/sdd1 are now diverging.

* finally, reboot.

Here's what I get (relevant lines only):

[    9.759055] md: bind<sdc1>
[    9.759418] md: bind<sdd1>
[    9.759532] md: kicking non-fresh sdc1 from array!
[    9.759636] md: unbind<sdc1>
[    9.842171] md: export_rdev(sdc1)
[    9.844579] md/raid1:md10: active with 1 out of 2 mirrors
[    9.844708] md10: detected capacity change from 0 to 5379194880
[    9.844931] RAID1 conf printout:
[    9.844933]  --- wd:1 rd:2
[    9.844935]  disk 1, wo:0, o:1, dev:sdd1
[    9.845295] md: md11 stopped.
[    9.845729]  md10: unknown partition table
[    9.864962] md: bind<sdd2>
[    9.865332] md: bind<sdc2>
[    9.867314] md/raid1:md11: active with 2 out of 2 mirrors
[    9.867432] md11: detected capacity change from 0 to 1085669376
[    9.867611] RAID1 conf printout:
[    9.867613]  --- wd:2 rd:2
[    9.867615]  disk 0, wo:0, o:1, dev:sdc2
[    9.867617]  disk 1, wo:0, o:1, dev:sdd2
[    9.868148]  md11: unknown partition table

[   10.746609] bcache: register_bdev() registered backing device sdc1
[   10.855288] bcache: register_bdev() registered backing device md10
[   10.926637] bcache: bch_journal_replay() journal replay done, 547 
keys in 4 entries, seq 293
[   10.936029] bcache: bch_cached_dev_attach() Caching md10 as bcache1 
on set ba9ed0f7-c75c-401c-a841-7cc99c393b05

[   10.940232] ------------[ cut here ]------------
[   10.940342] WARNING: at fs/sysfs/dir.c:530 sysfs_add_one+0xbc/0xe0()
[   10.940450] sysfs: cannot create duplicate filename 
'/fs/bcache/ba9ed0f7-c75c-401c-a841-7cc99c393b05/bdev0'
[   10.941491] ehci-pci 0000:00:1d.7: USB 2.0 started, EHCI 1.00
[   10.940595] Modules linked in: bcache i2c_i801(+) ioatdma(+) 
ehci_pci(+) ehci_hcd coretemp i7core_edac edac_core usbhid uhci_hcd 
usbcore usb_common megaraid_sas netconsole ixgbe mdio igb hwmon dca 
i2c_algo_bit ptp pps_core
[   10.942312] CPU: 7 PID: 5126 Comm: bcache-register Not tainted 3.10.0 #1
[   10.942450] Hardware name: Intel Corporation S5520HC/S5520HC, BIOS 
S5500.86B.01.00.0059.082320111421 08/23/2011
[   10.942597]  0000000000000212 ffff880c2dd23848 ffffffff8169349e 
ffff880c2dd23888
[   10.942974]  ffffffff81044a1b ffff880c6ac38bc0 ffff880c2f37e978 
ffff880c2dd239c8
[   10.943351]  ffff880c31795000 00000000ffffffef ffff880c2f3233c8 
ffff880c2dd23968
[   10.943728] Call Trace:
[   10.943827]  [<ffffffff8169349e>] dump_stack+0x19/0x1b
[   10.943934]  [<ffffffff81044a1b>] warn_slowpath_common+0x6b/0xa0
[   10.944041]  [<ffffffff81044b49>] warn_slowpath_fmt+0x69/0x70
[   10.944150]  [<ffffffff81408d84>] ? strlcat+0x74/0x90
[   10.944255]  [<ffffffff81408d84>] ? strlcat+0x74/0x90
[   10.944359]  [<ffffffff81408d84>] ? strlcat+0x74/0x90
[   10.944463]  [<ffffffff81408d84>] ? strlcat+0x74/0x90
[   10.944567]  [<ffffffff811a1bbc>] sysfs_add_one+0xbc/0xe0
[   10.944674]  [<ffffffff811a2ccf>] sysfs_do_create_link_sd+0xbf/0x1f0
[   10.944783]  [<ffffffff811a2e5b>] sysfs_create_link+0x2b/0x30
[   10.944895]  [<ffffffffa0227586>] bcache_device_link+0xe6/0x100 [bcache]
[   10.945009]  [<ffffffffa0229390>] bch_cached_dev_attach+0x380/0x4b0 
[bcache]
[   10.945123]  [<ffffffffa0228b6d>] ? __write_super+0x15d/0x170 [bcache]
[   10.945236]  [<ffffffffa0228cd3>] ? bcache_write_super+0x153/0x170 
[bcache]
[   10.945350]  [<ffffffffa0229ced>] run_cache_set+0x82d/0x9d0 [bcache]
[   10.945463]  [<ffffffffa022a0aa>] register_cache_set+0x21a/0x2b0 [bcache]
[   10.945576]  [<ffffffffa022a803>] register_cache+0x6c3/0x710 [bcache]
[   10.945686]  [<ffffffff81121b68>] ? kmem_cache_alloc_trace+0xb8/0x1e0
[   10.945798]  [<ffffffffa022ae48>] register_bcache+0x5f8/0xb20 [bcache]
[   10.945908]  [<ffffffff81402987>] kobj_attr_store+0x17/0x20
[   10.946014]  [<ffffffff811a092c>] sysfs_write_file+0xcc/0x150
[   10.946123]  [<ffffffff81130a02>] vfs_write+0xe2/0x1e0
[   10.946228]  [<ffffffff81130c0c>] SyS_write+0x5c/0xa0
[   10.946335]  [<ffffffff8169ea12>] system_call_fastpath+0x16/0x1b
[   10.946442] ---[ end trace 1f31f40a281dfb68 ]---
[   10.946545] ------------[ cut here ]------------
[   10.946652] WARNING: at drivers/md/bcache/super.c:689 
bcache_device_link+0xc5/0x100 [bcache]()
[   10.947846] Couldn't create device <-> cache set symlinks
[   10.947930] Modules linked in:<7>[   10.948184] ioatdma 0000:00:16.2: 
irq 122 for MSI/MSI-X

[   10.948095]  bcache i2c_i801(+) ioatdma(+) ehci_pci ehci_hcd coretemp 
i7core_edac edac_core usbhid uhci_hcd usbcore usb_common megaraid_sas 
netconsole ixgbe mdio igb hwmon dca i2c_algo_bit ptp pps_core
[   10.949635] CPU: 7 PID: 5126 Comm: bcache-register Tainted: G 
W    3.10.0 #1
[   10.949777] Hardware name: Intel Corporation S5520HC/S5520HC, BIOS 
S5500.86B.01.00.0059.082320111421 08/23/2011
[   10.949818] ioatdma 0000:00:16.6: irq 126 for MSI/MSI-X
[   10.950212] ioatdma 0000:00:16.7: irq 127 for MSI/MSI-X
[   10.949924]  00000000000002b1 ffff880c2dd238f8 ffffffff8169349e 
ffff880c2dd23938
[   10.950304]  ffffffff81044a1b ffff880c2dd23928 ffff880c32260040 
ffff880c32260010
[   10.950655] i801_smbus 0000:00:1f.3: SMBus using PCI Interrupt
[   10.950803]  ffff880c31f00040 ffff880c3226008c ffff880c31f00000 
ffff880c2dd23a18
[   10.951208] Call Trace:
[   10.951306]  [<ffffffff8169349e>] dump_stack+0x19/0x1b
[   10.951411]  [<ffffffff81044a1b>] warn_slowpath_common+0x6b/0xa0
[   10.951528]  [<ffffffff81044b49>] warn_slowpath_fmt+0x69/0x70
[   10.951637]  [<ffffffff811a1d85>] ? release_sysfs_dirent+0x75/0x130
[   10.951746]  [<ffffffff811a2d86>] ? sysfs_do_create_link_sd+0x176/0x1f0
[   10.951857]  [<ffffffff811a2e5b>] ? sysfs_create_link+0x2b/0x30
[   10.951969]  [<ffffffffa0227565>] bcache_device_link+0xc5/0x100 [bcache]
[   10.952083]  [<ffffffffa0229390>] bch_cached_dev_attach+0x380/0x4b0 
[bcache]
[   10.952198]  [<ffffffffa0228b6d>] ? __write_super+0x15d/0x170 [bcache]
[   10.952312]  [<ffffffffa0228cd3>] ? bcache_write_super+0x153/0x170 
[bcache]
[   10.952426]  [<ffffffffa0229ced>] run_cache_set+0x82d/0x9d0 [bcache]
[   10.952540]  [<ffffffffa022a0aa>] register_cache_set+0x21a/0x2b0 [bcache]
[   10.952653]  [<ffffffffa022a803>] register_cache+0x6c3/0x710 [bcache]
[   10.952763]  [<ffffffff81121b68>] ? kmem_cache_alloc_trace+0xb8/0x1e0
[   10.952875]  [<ffffffffa022ae48>] register_bcache+0x5f8/0xb20 [bcache]
[   10.952986]  [<ffffffff81402987>] kobj_attr_store+0x17/0x20
[   10.953093]  [<ffffffff811a092c>] sysfs_write_file+0xcc/0x150
[   10.953200]  [<ffffffff81130a02>] vfs_write+0xe2/0x1e0
[   10.953305]  [<ffffffff81130c0c>] SyS_write+0x5c/0xa0
[   10.953410]  [<ffffffff8169ea12>] system_call_fastpath+0x16/0x1b
[   10.953517] ---[ end trace 1f31f40a281dfb69 ]---
[   10.953621] bcache: bch_cached_dev_attach() Caching sdc1 as bcache0 
on set ba9ed0f7-c75c-401c-a841-7cc99c393b05
[   10.953789] bcache: register_cache() registered cache device md11

And the filesystem is now obviously corrupted:

[   14.312553] XFS (bcache0): Mounting Filesystem
[   14.413270] XFS (bcache0): Starting recovery (logdev: internal)
[   14.414033] XFS (bcache0): Internal error xlog_valid_rec_header(1) at 
line 3610 of file fs/xfs/xfs_log_recover.c.  Caller 0xffffffff812f6da2
(XFS trace)
[   14.418639] XFS (bcache0): log mount/recovery failed: error 117
[   14.418800] XFS (bcache0): log mount failed

If the RAID arrays are created with a 1.1 or 1.2 superblock, the issue 
doesn't occur, as the kicked out disk is not recognized by bcache.

I'm not sure what bcache could do to prevent this. Preventing the second 
cache device from being created wouldn't solve the problem.

Thanks,
Cyril

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] Don't register if a non-bcache superblock is found as well
  2013-07-21 16:20 Filesystem corruption caused by disk from backing device kicked out of RAID1 array Cyril B.
@ 2013-07-21 18:17 ` Gabriel
       [not found]   ` <1374430637-12954-1-git-send-email-g2p.code-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Gabriel @ 2013-07-21 18:17 UTC (permalink / raw)
  To: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA, Cyril B .; +Cc: Gabriel

---
This should fix it.

Because the raid superblock is at the end of the device, bcache's
udev rules could detect a bcache device on the unassembled raid
member.  A better fix would be to merge probe-bcache into
util-linux's blkid.

 61-bcache.rules | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/61-bcache.rules b/61-bcache.rules
index 22c1a90..dd85e69 100644
--- a/61-bcache.rules
+++ b/61-bcache.rules
@@ -3,14 +3,19 @@
 
 SUBSYSTEM!="block", GOTO="bcache_end"
 ACTION=="remove", GOTO="bcache_end"
 
 # Backing devices: scan, symlink, register
+IMPORT{program}="/sbin/blkid -o udev $tempnode"
+# blkid and probe-bcache can disagree, in which case don't register
+ENV{ID_FS_TYPE}=="?*", ENV{ID_FS_TYPE}!="bcache", GOTO="bcache_backing_end"
+
 IMPORT{program}="/sbin/probe-bcache -o udev $tempnode"
 ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"
 SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="bcache", \
         RUN+="bcache-register $tempnode"
+LABEL="bcache_backing_end"
 
 # Cached devices: symlink
 DRIVER=="bcache", ENV{CACHED_UUID}=="?*", \
         SYMLINK+="bcache/by-uuid/$env{CACHED_UUID}"
 DRIVER=="bcache", ENV{CACHED_LABEL}=="?*", \
-- 
1.8.3.3.758.g90e98ff

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] Don't register if a non-bcache superblock is found as well
       [not found]   ` <1374430637-12954-1-git-send-email-g2p.code-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-07-22  1:59     ` Kent Overstreet
  2013-07-22 10:38     ` Cyril B.
  1 sibling, 0 replies; 6+ messages in thread
From: Kent Overstreet @ 2013-07-22  1:59 UTC (permalink / raw)
  To: Gabriel; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA, Cyril B .

On Sun, Jul 21, 2013 at 08:17:17PM +0200, Gabriel wrote:
> ---
> This should fix it.
> 
> Because the raid superblock is at the end of the device, bcache's
> udev rules could detect a bcache device on the unassembled raid
> member.  A better fix would be to merge probe-bcache into
> util-linux's blkid.

Thanks, applied.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Don't register if a non-bcache superblock is found as well
       [not found]   ` <1374430637-12954-1-git-send-email-g2p.code-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2013-07-22  1:59     ` Kent Overstreet
@ 2013-07-22 10:38     ` Cyril B.
       [not found]       ` <51ED0BAB.4040604-SxHCd5+OuqTrt3ojHgZu+w@public.gmane.org>
  1 sibling, 1 reply; 6+ messages in thread
From: Cyril B. @ 2013-07-22 10:38 UTC (permalink / raw)
  To: Gabriel; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Gabriel wrote:
> This should fix it.
>
> Because the raid superblock is at the end of the device, bcache's
> udev rules could detect a bcache device on the unassembled raid
> member.  A better fix would be to merge probe-bcache into
> util-linux's blkid.


I'm afraid that doesn't work: with that patch, my backing device doesn't 
even get registered anymore, since blkid will detect it as XFS:

# /sbin/blkid -o udev /dev/md10
ID_FS_UUID=08e8619f-6815-4380-b8c8-9eab18d1643c
ID_FS_UUID_ENC=08e8619f-6815-4380-b8c8-9eab18d1643c
ID_FS_TYPE=xfs

Cyril

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Don't register if a non-bcache superblock is found as well
       [not found]       ` <51ED0BAB.4040604-SxHCd5+OuqTrt3ojHgZu+w@public.gmane.org>
@ 2013-07-22 11:00         ` Gabriel de Perthuis
       [not found]           ` <51ED10C4.2050808-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Gabriel de Perthuis @ 2013-07-22 11:00 UTC (permalink / raw)
  To: Cyril B.; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

> Gabriel wrote:
>> This should fix it.
>>
>> Because the raid superblock is at the end of the device, bcache's
>> udev rules could detect a bcache device on the unassembled raid
>> member.  A better fix would be to merge probe-bcache into
>> util-linux's blkid.
>
> I'm afraid that doesn't work: with that patch, my backing device doesn't even get registered anymore, since blkid will detect it as XFS:

I just tried making an xfs filesystem on a bcache device,
and the xfs superblock isn't visible to the backing device.

Your problem is likely an old superblock; use wipefs before make-bcache.

If you do find a filesystem that doesn't put its blkid magic
at a fixed offset from the start of the device, we could
consider alternatives, like whitelisting them or porting to blkid.

For now, better safe than sorry.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Don't register if a non-bcache superblock is found as well
       [not found]           ` <51ED10C4.2050808-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-07-22 14:36             ` Cyril B.
  0 siblings, 0 replies; 6+ messages in thread
From: Cyril B. @ 2013-07-22 14:36 UTC (permalink / raw)
  To: Gabriel de Perthuis; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Gabriel de Perthuis wrote:
>> I'm afraid that doesn't work: with that patch, my backing device doesn't even get registered anymore, since blkid will detect it as XFS:
>
> I just tried making an xfs filesystem on a bcache device,
> and the xfs superblock isn't visible to the backing device.
>
> Your problem is likely an old superblock; use wipefs before make-bcache.
>
> If you do find a filesystem that doesn't put its blkid magic
> at a fixed offset from the start of the device, we could
> consider alternatives, like whitelisting them or porting to blkid.
>
> For now, better safe than sorry.


After zeroing my devices and trying again, I cannot reproduce the issue. 
I guess you're right, I must have had an old superblock or something.

Thanks for the fix.

Cyril

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-07-22 14:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-21 16:20 Filesystem corruption caused by disk from backing device kicked out of RAID1 array Cyril B.
2013-07-21 18:17 ` [PATCH] Don't register if a non-bcache superblock is found as well Gabriel
     [not found]   ` <1374430637-12954-1-git-send-email-g2p.code-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2013-07-22  1:59     ` Kent Overstreet
2013-07-22 10:38     ` Cyril B.
     [not found]       ` <51ED0BAB.4040604-SxHCd5+OuqTrt3ojHgZu+w@public.gmane.org>
2013-07-22 11:00         ` Gabriel de Perthuis
     [not found]           ` <51ED10C4.2050808-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2013-07-22 14:36             ` Cyril B.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.