* btrfsck: backpointer mismatch (and multiple other errors)
From: Kai Krakow @ 2016-03-31 20:44 UTC (permalink / raw)
To: linux-btrfs
Hello!
I already reported this in another thread, but it got confusing because
multiple volumes were intermixed. So let's start a new thread:
Since one of the last kernel upgrades, I'm experiencing one VDI file
(containing an NTFS image with Windows 7) getting damaged when running
the machine in VirtualBox. I became aware of this after hitting a
"duplicate object" error, upon which btrfs went RO. I fixed it by
deleting the VDI and restoring from backup - but now I get csum
errors as soon as some VM IO goes into the VDI file.
The FS is still usable. One effect is that after reading all files
with rsync (to copy them to my backup), every call of "du" or "df"
hangs; similar calls to "btrfs {sub|fi} ..." show the same effect. I
guess one consequence of this is that the FS does not unmount properly
during shutdown.
Kernel is 4.5.0 by now (the FS is much much older, dates back to 3.x
series, and never had problems), including Gentoo patch-set r1.
The device layout is:
$ lsblk -o NAME,MODEL,FSTYPE,LABEL,MOUNTPOINT
NAME MODEL FSTYPE LABEL MOUNTPOINT
sda Crucial_CT128MX1
├─sda1 vfat ESP /boot
├─sda2
└─sda3 bcache
├─bcache0 btrfs system
├─bcache1 btrfs system
└─bcache2 btrfs system /usr/src
sdb SAMSUNG HD103SJ
├─sdb1 swap swap0 [SWAP]
└─sdb2 bcache
└─bcache2 btrfs system /usr/src
sdc SAMSUNG HD103SJ
├─sdc1 swap swap1 [SWAP]
└─sdc2 bcache
└─bcache1 btrfs system
sdd SAMSUNG HD103UJ
├─sdd1 swap swap2 [SWAP]
└─sdd2 bcache
└─bcache0 btrfs system
Mount options are:
$ mount|fgrep btrfs
/dev/bcache2 on / type btrfs (rw,noatime,compress=lzo,nossd,discard,space_cache,autodefrag,subvolid=256,subvol=/gentoo/rootfs)
The FS uses mraid=1 and draid=0.
Output of btrfsck is:
(also available here:
https://gist.github.com/kakra/bfcce4af242f6548f4d6b45c8afb46ae)
$ btrfsck /dev/disk/by-label/system
checking extents
ref mismatch on [10443660537856 524288] extent item 1, found 2
Backref 10443660537856 root 256 owner 23536425 offset 1310720 num_refs 0 not found in extent tree
Incorrect local backref count on 10443660537856 root 256 owner 23536425 offset 1310720 found 1 wanted 0 back 0x4ceee750
Backref disk bytenr does not match extent record, bytenr=10443660537856, ref bytenr=10443660914688
Backref bytes do not match extent backref, bytenr=10443660537856, ref bytes=524288, backref bytes=69632
backpointer mismatch on [10443660537856 524288]
extent item 11271946579968 has multiple extent items
ref mismatch on [11271946579968 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271946579968, ref bytenr=11271946629120
backpointer mismatch on [11271946579968 110592]
extent item 11271946690560 has multiple extent items
ref mismatch on [11271946690560 114688] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271946690560, ref bytenr=11271946739712
Backref bytes do not match extent backref, bytenr=11271946690560, ref bytes=114688, backref bytes=110592
backpointer mismatch on [11271946690560 114688]
extent item 11271946805248 has multiple extent items
ref mismatch on [11271946805248 114688] extent item 1, found 3
Backref disk bytenr does not match extent record, bytenr=11271946805248, ref bytenr=11271946850304
Backref bytes do not match extent backref, bytenr=11271946805248, ref bytes=114688, backref bytes=53248
Backref disk bytenr does not match extent record, bytenr=11271946805248, ref bytenr=11271946903552
Backref bytes do not match extent backref, bytenr=11271946805248, ref bytes=114688, backref bytes=49152
backpointer mismatch on [11271946805248 114688]
extent item 11271946919936 has multiple extent items
ref mismatch on [11271946919936 61440] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271946919936, ref bytenr=11271946952704
Backref bytes do not match extent backref, bytenr=11271946919936, ref bytes=61440, backref bytes=110592
backpointer mismatch on [11271946919936 61440]
extent item 11271946981376 has multiple extent items
ref mismatch on [11271946981376 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271946981376, ref bytenr=11271947063296
backpointer mismatch on [11271946981376 110592]
extent item 11271947091968 has multiple extent items
ref mismatch on [11271947091968 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947091968, ref bytenr=11271947173888
Backref bytes do not match extent backref, bytenr=11271947091968, ref bytes=110592, backref bytes=114688
backpointer mismatch on [11271947091968 110592]
extent item 11271947202560 has multiple extent items
ref mismatch on [11271947202560 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947202560, ref bytenr=11271947288576
Backref bytes do not match extent backref, bytenr=11271947202560, ref bytes=110592, backref bytes=102400
backpointer mismatch on [11271947202560 110592]
extent item 11271947313152 has multiple extent items
ref mismatch on [11271947313152 114688] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947313152, ref bytenr=11271947390976
Backref bytes do not match extent backref, bytenr=11271947313152, ref bytes=114688, backref bytes=110592
backpointer mismatch on [11271947313152 114688]
extent item 11271947427840 has multiple extent items
ref mismatch on [11271947427840 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947427840, ref bytenr=11271947501568
backpointer mismatch on [11271947427840 110592]
extent item 11271947538432 has multiple extent items
ref mismatch on [11271947538432 86016] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947538432, ref bytenr=11271947612160
Backref bytes do not match extent backref, bytenr=11271947538432, ref bytes=86016, backref bytes=81920
backpointer mismatch on [11271947538432 86016]
extent item 11271947624448 has multiple extent items
ref mismatch on [11271947624448 77824] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947624448, ref bytenr=11271947694080
Backref bytes do not match extent backref, bytenr=11271947624448, ref bytes=77824, backref bytes=102400
backpointer mismatch on [11271947624448 77824]
ref mismatch on [11271947702272 102400] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947702272, ref bytenr=11271947796480
Backref bytes do not match extent backref, bytenr=11271947702272, ref bytes=102400, backref bytes=90112
backpointer mismatch on [11271947702272 102400]
extent item 11271947862016 has multiple extent items
extent item 11271947886592 has multiple extent items
ref mismatch on [11271947886592 131072] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947886592, ref bytenr=11271947948032
Backref bytes do not match extent backref, bytenr=11271947886592, ref bytes=131072, backref bytes=102400
backpointer mismatch on [11271947886592 131072]
extent item 11271948017664 has multiple extent items
ref mismatch on [11271948017664 49152] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948017664, ref bytenr=11271948050432
Backref bytes do not match extent backref, bytenr=11271948017664, ref bytes=49152, backref bytes=94208
backpointer mismatch on [11271948017664 49152]
extent item 11271948144640 has multiple extent items
ref mismatch on [11271948144640 73728] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948144640, ref bytenr=11271948148736
Backref bytes do not match extent backref, bytenr=11271948144640, ref bytes=73728, backref bytes=110592
backpointer mismatch on [11271948144640 73728]
extent item 11271948218368 has multiple extent items
ref mismatch on [11271948218368 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948218368, ref bytenr=11271948259328
Backref bytes do not match extent backref, bytenr=11271948218368, ref bytes=110592, backref bytes=102400
backpointer mismatch on [11271948218368 110592]
extent item 11271948328960 has multiple extent items
ref mismatch on [11271948328960 106496] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948328960, ref bytenr=11271948361728
Backref bytes do not match extent backref, bytenr=11271948328960, ref bytes=106496, backref bytes=110592
backpointer mismatch on [11271948328960 106496]
extent item 11271948435456 has multiple extent items
ref mismatch on [11271948435456 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948435456, ref bytenr=11271948472320
Backref bytes do not match extent backref, bytenr=11271948435456, ref bytes=110592, backref bytes=114688
backpointer mismatch on [11271948435456 110592]
extent item 11271948546048 has multiple extent items
ref mismatch on [11271948546048 110592] extent item 1, found 3
Backref disk bytenr does not match extent record, bytenr=11271948546048, ref bytenr=11271948587008
Backref bytes do not match extent backref, bytenr=11271948546048, ref bytes=110592, backref bytes=61440
Backref disk bytenr does not match extent record, bytenr=11271948546048, ref bytenr=11271948648448
Backref bytes do not match extent backref, bytenr=11271948546048, ref bytes=110592, backref bytes=73728
backpointer mismatch on [11271948546048 110592]
extent item 11271948656640 has multiple extent items
ref mismatch on [11271948656640 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948656640, ref bytenr=11271948722176
backpointer mismatch on [11271948656640 110592]
extent item 11271948767232 has multiple extent items
ref mismatch on [11271948767232 114688] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948767232, ref bytenr=11271948832768
Backref bytes do not match extent backref, bytenr=11271948767232, ref bytes=114688, backref bytes=73728
backpointer mismatch on [11271948767232 114688]
extent item 11271948881920 has multiple extent items
ref mismatch on [11271948881920 114688] extent item 1, found 3
Backref disk bytenr does not match extent record, bytenr=11271948881920, ref bytenr=11271948906496
Backref bytes do not match extent backref, bytenr=11271948881920, ref bytes=114688, backref bytes=12288
Backref disk bytenr does not match extent record, bytenr=11271948881920, ref bytenr=11271948926976
Backref bytes do not match extent backref, bytenr=11271948881920, ref bytes=114688, backref bytes=524288
backpointer mismatch on [11271948881920 114688]
extent item 11271949414400 has multiple extent items
ref mismatch on [11271949414400 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949414400, ref bytenr=11271949451264
Backref bytes do not match extent backref, bytenr=11271949414400, ref bytes=110592, backref bytes=81920
backpointer mismatch on [11271949414400 110592]
extent item 11271949524992 has multiple extent items
ref mismatch on [11271949524992 57344] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949524992, ref bytenr=11271949533184
Backref bytes do not match extent backref, bytenr=11271949524992, ref bytes=57344, backref bytes=94208
backpointer mismatch on [11271949524992 57344]
extent item 11271949582336 has multiple extent items
ref mismatch on [11271949582336 86016] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949582336, ref bytenr=11271949627392
Backref bytes do not match extent backref, bytenr=11271949582336, ref bytes=86016, backref bytes=81920
backpointer mismatch on [11271949582336 86016]
extent item 11271949668352 has multiple extent items
ref mismatch on [11271949668352 94208] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949668352, ref bytenr=11271949709312
Backref bytes do not match extent backref, bytenr=11271949668352, ref bytes=94208, backref bytes=98304
backpointer mismatch on [11271949668352 94208]
extent item 11271949762560 has multiple extent items
ref mismatch on [11271949762560 81920] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949762560, ref bytenr=11271949807616
Backref bytes do not match extent backref, bytenr=11271949762560, ref bytes=81920, backref bytes=94208
backpointer mismatch on [11271949762560 81920]
extent item 11271949844480 has multiple extent items
ref mismatch on [11271949844480 94208] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949844480, ref bytenr=11271949901824
backpointer mismatch on [11271949844480 94208]
extent item 11271949938688 has multiple extent items
ref mismatch on [11271949938688 81920] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949938688, ref bytenr=11271949996032
Backref bytes do not match extent backref, bytenr=11271949938688, ref bytes=81920, backref bytes=90112
backpointer mismatch on [11271949938688 81920]
extent item 11271950020608 has multiple extent items
ref mismatch on [11271950020608 81920] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950020608, ref bytenr=11271950086144
Backref bytes do not match extent backref, bytenr=11271950020608, ref bytes=81920, backref bytes=94208
backpointer mismatch on [11271950020608 81920]
extent item 11271950180352 has multiple extent items
ref mismatch on [11271950180352 81920] extent item 1, found 2
Backref bytes do not match extent backref, bytenr=11271950180352, ref bytes=81920, backref bytes=98304
backpointer mismatch on [11271950180352 81920]
ref mismatch on [11271950262272 81920] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950262272, ref bytenr=11271950278656
Backref bytes do not match extent backref, bytenr=11271950262272, ref bytes=81920, backref bytes=102400
backpointer mismatch on [11271950262272 81920]
extent item 11271950344192 has multiple extent items
ref mismatch on [11271950344192 77824] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950344192, ref bytenr=11271950381056
Backref bytes do not match extent backref, bytenr=11271950344192, ref bytes=77824, backref bytes=98304
backpointer mismatch on [11271950344192 77824]
extent item 11271950422016 has multiple extent items
ref mismatch on [11271950422016 81920] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950422016, ref bytenr=11271950479360
Backref bytes do not match extent backref, bytenr=11271950422016, ref bytes=81920, backref bytes=98304
backpointer mismatch on [11271950422016 81920]
extent item 11271950503936 has multiple extent items
ref mismatch on [11271950503936 86016] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950503936, ref bytenr=11271950577664
Backref bytes do not match extent backref, bytenr=11271950503936, ref bytes=86016, backref bytes=94208
backpointer mismatch on [11271950503936 86016]
extent item 11271950589952 has multiple extent items
ref mismatch on [11271950589952 86016] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950589952, ref bytenr=11271950671872
Backref bytes do not match extent backref, bytenr=11271950589952, ref bytes=86016, backref bytes=94208
backpointer mismatch on [11271950589952 86016]
extent item 11271950675968 has multiple extent items
ref mismatch on [11271950675968 98304] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950675968, ref bytenr=11271950766080
backpointer mismatch on [11271950675968 98304]
extent item 11271950774272 has multiple extent items
ref mismatch on [11271950774272 94208] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950774272, ref bytenr=11271950864384
Backref bytes do not match extent backref, bytenr=11271950774272, ref bytes=94208, backref bytes=98304
backpointer mismatch on [11271950774272 94208]
extent item 11271950954496 has multiple extent items
ref mismatch on [11271950954496 90112] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950954496, ref bytenr=11271950962688
Backref bytes do not match extent backref, bytenr=11271950954496, ref bytes=90112, backref bytes=61440
backpointer mismatch on [11271950954496 90112]
extent item 11271952793600 has multiple extent items
ref mismatch on [11271952793600 98304] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271952793600, ref bytenr=11271952879616
Backref bytes do not match extent backref, bytenr=11271952793600, ref bytes=98304, backref bytes=102400
backpointer mismatch on [11271952793600 98304]
extent item 11271952891904 has multiple extent items
ref mismatch on [11271952891904 262144] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271952891904, ref bytenr=11271952994304
Backref bytes do not match extent backref, bytenr=11271952891904, ref bytes=262144, backref bytes=1052672
backpointer mismatch on [11271952891904 262144]
extent item 11271953993728 has multiple extent items
ref mismatch on [11271953993728 114688] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271953993728, ref bytenr=11271954046976
Backref bytes do not match extent backref, bytenr=11271953993728, ref bytes=114688, backref bytes=1052672
backpointer mismatch on [11271953993728 114688]
ref mismatch on [11271954878464 393216] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271954878464, ref bytenr=11271955099648
Backref bytes do not match extent backref, bytenr=11271954878464, ref bytes=393216, backref bytes=3149824
backpointer mismatch on [11271954878464 393216]
extent item 11271956312064 has multiple extent items
ref mismatch on [11271958249472 2101248] extent item 0, found 1
Backref 11271958249472 parent 12160723820544 owner 0 offset 0 num_refs 0 not found in extent tree
Incorrect local backref count on 11271958249472 parent 12160723820544 owner 0 offset 0 found 1 wanted 0 back 0x14d56620
backpointer mismatch on [11271958249472 2101248]
extent item 11271960338432 has multiple extent items
ref mismatch on [11271960338432 57344] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271960338432, ref bytenr=11271960350720
Backref bytes do not match extent backref, bytenr=11271960338432, ref bytes=57344, backref bytes=1052672
backpointer mismatch on [11271960338432 57344]
extent item 11271961325568 has multiple extent items
ref mismatch on [11271961325568 81920] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271961325568, ref bytenr=11271961403392
Backref bytes do not match extent backref, bytenr=11271961325568, ref bytes=81920, backref bytes=1052672
backpointer mismatch on [11271961325568 81920]
extent item 11271962333184 has multiple extent items
ref mismatch on [11271962333184 524288] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271962333184, ref bytenr=11271962456064
Backref bytes do not match extent backref, bytenr=11271962333184, ref bytes=524288, backref bytes=1052672
backpointer mismatch on [11271962333184 524288]
extent item 11271963475968 has multiple extent items
ref mismatch on [11271963475968 393216] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271963475968, ref bytenr=11271963508736
Backref bytes do not match extent backref, bytenr=11271963475968, ref bytes=393216, backref bytes=1052672
backpointer mismatch on [11271963475968 393216]
extent item 11271964389376 has multiple extent items
ref mismatch on [11271964389376 524288] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271964389376, ref bytenr=11271964561408
Backref bytes do not match extent backref, bytenr=11271964389376, ref bytes=524288, backref bytes=1052672
backpointer mismatch on [11271964389376 524288]
extent item 11271965601792 has multiple extent items
ref mismatch on [11271965601792 90112] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271965601792, ref bytenr=11271965614080
Backref bytes do not match extent backref, bytenr=11271965601792, ref bytes=90112, backref bytes=1052672
backpointer mismatch on [11271965601792 90112]
extent item 11271968571392 has multiple extent items
ref mismatch on [11271968571392 1052672] extent item 1, found 3
Backref disk bytenr does not match extent record, bytenr=11271968571392, ref bytenr=11271969107968
Backref bytes do not match extent backref, bytenr=11271968571392, ref bytes=1052672, backref bytes=69632
Backref disk bytenr does not match extent record, bytenr=11271968571392, ref bytenr=11271969177600
Backref bytes do not match extent backref, bytenr=11271968571392, ref bytes=1052672, backref bytes=262144
backpointer mismatch on [11271968571392 1052672]
checking free space cache
checking fs roots
root 4336 inode 4284125 errors 1000, some csum missing
Checking filesystem on /dev/disk/by-label/system
UUID: d2bb232a-2e8f-4951-8bcc-97e237f1b536
found 1832931324360 bytes used err is 1
total csum bytes: 1730105656
total tree bytes: 6494474240
total fs tree bytes: 3789783040
total extent tree bytes: 608219136
btree space waste bytes: 1221460063
file data blocks allocated: 2406059724800
referenced 2040857763840
--
Regards,
Kai
Replies to list-only preferred.
* Re: btrfsck: backpointer mismatch (and multiple other errors)
From: Henk Slager @ 2016-03-31 23:27 UTC (permalink / raw)
To: linux-btrfs
On Thu, Mar 31, 2016 at 10:44 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
> Hello!
>
> I already reported this in another thread, but it got confusing because
> multiple volumes were intermixed. So let's start a new thread:
>
> Since one of the last kernel upgrades, I'm experiencing one VDI file
> (containing an NTFS image with Windows 7) getting damaged when running
> the machine in VirtualBox. I became aware of this after hitting a
> "duplicate object" error, upon which btrfs went RO. I fixed it by
> deleting the VDI and restoring from backup - but now I get csum
> errors as soon as some VM IO goes into the VDI file.
>
> The FS is still usable. One effect is that after reading all files
> with rsync (to copy them to my backup), every call of "du" or "df"
> hangs; similar calls to "btrfs {sub|fi} ..." show the same effect. I
> guess one consequence of this is that the FS does not unmount properly
> during shutdown.
>
> Kernel is 4.5.0 by now (the FS is much much older, dates back to 3.x
> series, and never had problems), including Gentoo patch-set r1.
One possibility could be that the vbox kernel modules somehow corrupt
the btrfs kernel area since kernel 4.5.
To make this reproducible for others (or at least attempt to), you
could unload the VirtualBox modules, restore the VDI file from backup
(or use whatever other big file), and then make pseudo-random but
reproducible writes to the file.
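A minimal sketch of such reproducible writes, assuming openssl and GNU
dd are available (the passphrase and the helper names are arbitrary
illustration values, not anything from this thread):

```shell
# Derive a deterministic pseudo-random byte stream from a fixed seed
# string, so two runs write bit-identical data.
seeded_stream() {  # usage: seeded_stream BYTES SEED
    head -c "$1" /dev/zero | \
        openssl enc -aes-128-ctr -nosalt -pass "pass:$2" 2>/dev/null
}

# Write SIZE MiB of that stream into FILE at OFFSET MiB, in place
# (conv=notrunc), mimicking VM I/O into a VDI image; the seed includes
# the offset so different chunks carry different, yet repeatable, data.
write_chunk() {  # usage: write_chunk FILE OFFSET_MIB SIZE_MIB
    seeded_stream "$(( $3 * 1024 * 1024 ))" "repro-$2" | \
        dd of="$1" bs=1M seek="$2" conv=notrunc status=none
}
```

Replaying the same sequence of write_chunk calls after each restore
from backup then exercises identical I/O, which should make any
resulting csum errors easier to compare across runs.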
It is not clear to me what 'Gentoo patch-set r1' is and does. So just
boot a vanilla v4.5 kernel from kernel.org and see if you get csum
errors in dmesg.
Also, where does 'duplicate object' come from? dmesg? If so, please
post its surroundings, straight from dmesg.
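For pulling those surroundings out of the log, a filter like the
following could help (a sketch; the exact message wording varies
between kernel versions, so the patterns are guesses, not canonical
formats):

```shell
# Keep only btrfs checksum/corruption complaints from kernel-log text;
# use as:  dmesg | btrfs_errors
btrfs_errors() {
    grep -iE 'btrfs.*(csum|checksum|duplicate object)'
}
```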
> The device layout is:
>
> $ lsblk -o NAME,MODEL,FSTYPE,LABEL,MOUNTPOINT
> NAME MODEL FSTYPE LABEL MOUNTPOINT
> sda Crucial_CT128MX1
> ├─sda1 vfat ESP /boot
> ├─sda2
> └─sda3 bcache
> ├─bcache0 btrfs system
> ├─bcache1 btrfs system
> └─bcache2 btrfs system /usr/src
> sdb SAMSUNG HD103SJ
> ├─sdb1 swap swap0 [SWAP]
> └─sdb2 bcache
> └─bcache2 btrfs system /usr/src
> sdc SAMSUNG HD103SJ
> ├─sdc1 swap swap1 [SWAP]
> └─sdc2 bcache
> └─bcache1 btrfs system
> sdd SAMSUNG HD103UJ
> ├─sdd1 swap swap2 [SWAP]
> └─sdd2 bcache
> └─bcache0 btrfs system
>
> Mount options are:
>
> $ mount|fgrep btrfs
> /dev/bcache2 on / type btrfs (rw,noatime,compress=lzo,nossd,discard,space_cache,autodefrag,subvolid=256,subvol=/gentoo/rootfs)
>
> The FS uses mraid=1 and draid=0.
>
> Output of btrfsck is:
> (also available here:
> https://gist.github.com/kakra/bfcce4af242f6548f4d6b45c8afb46ae)
>
> $ btrfsck /dev/disk/by-label/system
> checking extents
> ref mismatch on [10443660537856 524288] extent item 1, found 2
This 10443660537856 number is bigger than the 1832931324360 number
found for total bytes. AFAIK, this is already wrong.
[...]
> checking fs roots
> root 4336 inode 4284125 errors 1000, some csum missing
What is in this inode?
> Checking filesystem on /dev/disk/by-label/system
> UUID: d2bb232a-2e8f-4951-8bcc-97e237f1b536
> found 1832931324360 bytes used err is 1
> total csum bytes: 1730105656
> total tree bytes: 6494474240
> total fs tree bytes: 3789783040
> total extent tree bytes: 608219136
> btree space waste bytes: 1221460063
> file data blocks allocated: 2406059724800
> referenced 2040857763840
* Re: btrfsck: backpointer mismatch (and multiple other errors)
From: Qu Wenruo @ 2016-04-01 1:10 UTC (permalink / raw)
To: Henk Slager, linux-btrfs
Henk Slager wrote on 2016/04/01 01:27 +0200:
> On Thu, Mar 31, 2016 at 10:44 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
>> Hello!
>>
>> I already reported this in another thread, but it got confusing
>> because multiple volumes were intermixed. So let's start a new thread:
>>
>> Since one of the last kernel upgrades, I'm experiencing one VDI file
>> (containing an NTFS image with Windows 7) getting damaged when running
>> the machine in VirtualBox. I became aware of this after hitting a
>> "duplicate object" error, upon which btrfs went RO. I fixed it by
>> deleting the VDI and restoring from backup - but now I get csum
>> errors as soon as some VM IO goes into the VDI file.
>>
>> The FS is still usable. One effect is that after reading all files
>> with rsync (to copy them to my backup), every call of "du" or "df"
>> hangs; similar calls to "btrfs {sub|fi} ..." show the same effect. I
>> guess one consequence of this is that the FS does not unmount properly
>> during shutdown.
>>
>> Kernel is 4.5.0 by now (the FS is much much older, dates back to 3.x
>> series, and never had problems), including Gentoo patch-set r1.
>
> One possibility could be that the vbox kernel modules somehow corrupt
> the btrfs kernel area since kernel 4.5.
>
> To make this reproducible for others (or at least attempt to), you
> could unload the VirtualBox modules, restore the VDI file from backup
> (or use whatever other big file), and then make pseudo-random but
> reproducible writes to the file.
>
> It is not clear to me what 'Gentoo patch-set r1' is and does. So just
> boot a vanilla v4.5 kernel from kernel.org and see if you get csum
> errors in dmesg.
>
> Also, where does 'duplicate object' come from? dmesg? If so, please
> post its surroundings, straight from dmesg.
>
>> The device layout is:
>>
>> $ lsblk -o NAME,MODEL,FSTYPE,LABEL,MOUNTPOINT
>> NAME MODEL FSTYPE LABEL MOUNTPOINT
>> sda Crucial_CT128MX1
>> ├─sda1 vfat ESP /boot
>> ├─sda2
>> └─sda3 bcache
>> ├─bcache0 btrfs system
>> ├─bcache1 btrfs system
>> └─bcache2 btrfs system /usr/src
>> sdb SAMSUNG HD103SJ
>> ├─sdb1 swap swap0 [SWAP]
>> └─sdb2 bcache
>> └─bcache2 btrfs system /usr/src
>> sdc SAMSUNG HD103SJ
>> ├─sdc1 swap swap1 [SWAP]
>> └─sdc2 bcache
>> └─bcache1 btrfs system
>> sdd SAMSUNG HD103UJ
>> ├─sdd1 swap swap2 [SWAP]
>> └─sdd2 bcache
>> └─bcache0 btrfs system
>>
>> Mount options are:
>>
>> $ mount|fgrep btrfs
>> /dev/bcache2 on / type btrfs (rw,noatime,compress=lzo,nossd,discard,space_cache,autodefrag,subvolid=256,subvol=/gentoo/rootfs)
>>
>> The FS uses mraid=1 and draid=0.
>>
>> Output of btrfsck is:
>> (also available here:
>> https://gist.github.com/kakra/bfcce4af242f6548f4d6b45c8afb46ae)
>>
>> $ btrfsck /dev/disk/by-label/system
>> checking extents
>> ref mismatch on [10443660537856 524288] extent item 1, found 2
> This 10443660537856 number is bigger than the 1832931324360 number
> found for total bytes. AFAIK, this is already wrong.
Nope. That's a btrfs logical space address, which can be beyond the
real disk bytenr.
The easiest way to reproduce such a case is to write something into a
256M btrfs and balance the fs several times.
Then all chunks can end up at bytenrs beyond 256M.
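The reproduction described here might be sketched like this (an
untested outline needing root and btrfs-progs; the image path, mount
point, and function name are arbitrary placeholders):

```shell
# Build a tiny 256M btrfs, fill it, and balance repeatedly: each
# balance relocates chunks to fresh logical addresses, so chunk
# bytenrs quickly grow past the 256M device size.
demo_logical_beyond_size() {
    img=/tmp/tiny-btrfs.img
    mnt=/mnt/tiny
    truncate -s 256M "$img"
    mkfs.btrfs -f "$img"
    mkdir -p "$mnt"
    mount -o loop "$img" "$mnt"
    dd if=/dev/urandom of="$mnt/blob" bs=1M count=64
    sync
    for i in 1 2 3 4 5; do
        btrfs balance start -d -m "$mnt"   # full data+metadata balance
    done
    umount "$mnt"
    # Chunk logical offsets can then be inspected, e.g. with
    # btrfs-progs of that era:
    #   btrfs-debug-tree -t 3 "$img" | grep CHUNK_ITEM
}
```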
The real problem is that the extent has a mismatched reference.
Normally it can be fixed by the --init-extent-tree option, but it
usually means a bigger problem, especially as it has already caused a
kernel delayed-ref problem.
Not to mention the error "extent item 11271947091968 has multiple
extent items", which makes the problem more serious.
I assume some older kernel has already screwed up the extent tree;
although the delayed-ref code is bug-prone, it has improved in recent
years.
But it seems the fs tree is less damaged, so I assume the extent tree
corruption could be fixed by "--init-extent-tree".
For the only fs tree error (the missing csum), if "btrfsck
--init-extent-tree --repair" works without any problem, the simplest
fix would be to just remove the file.
Alternatively, you can spend a lot of CPU time and disk IO to rebuild
the whole csum tree by using the "--init-csum-tree" option.
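Wrapped up, the suggested order might look like this (a sketch, not a
vetted recipe: --repair with the --init-* options rewrites metadata, so
a btrfs-image dump and fresh file backups come first, and it should run
only against the unmounted filesystem; the device path is the one from
this thread, the image path is arbitrary):

```shell
repair_sequence() {
    dev=/dev/disk/by-label/system
    # Metadata image first, so the broken state stays inspectable
    btrfs-image "$dev" /root/system.btrfs-image
    # Rebuild the extent tree while repairing, as suggested above
    btrfsck --repair --init-extent-tree "$dev"
    # Re-check read-only; only if csum complaints remain, consider the
    # (slow) csum tree rebuild:
    #   btrfsck --repair --init-csum-tree "$dev"
    btrfsck "$dev"
}
```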
Thanks,
Qu
>
> [...]
>
>> checking fs roots
>> root 4336 inode 4284125 errors 1000, some csum missing
> What is in this inode?
>
>> Checking filesystem on /dev/disk/by-label/system
>> UUID: d2bb232a-2e8f-4951-8bcc-97e237f1b536
>> found 1832931324360 bytes used err is 1
>> total csum bytes: 1730105656
>> total tree bytes: 6494474240
>> total fs tree bytes: 3789783040
>> total extent tree bytes: 608219136
>> btree space waste bytes: 1221460063
>> file data blocks allocated: 2406059724800
>> referenced 2040857763840
* Re: btrfsck: backpointer mismatch (and multiple other errors)
From: Kai Krakow @ 2016-04-02 8:47 UTC (permalink / raw)
To: linux-btrfs
Am Fri, 1 Apr 2016 09:10:44 +0800
schrieb Qu Wenruo <quwenruo@cn.fujitsu.com>:
> The real problem is that the extent has a mismatched reference.
> Normally it can be fixed by the --init-extent-tree option, but it
> usually means a bigger problem, especially as it has already caused a
> kernel delayed-ref problem.
>
> Not to mention the error "extent item 11271947091968 has multiple
> extent items", which makes the problem more serious.
>
>
> I assume some older kernel has already screwed up the extent tree;
> although the delayed-ref code is bug-prone, it has improved in recent
> years.
>
> But it seems the fs tree is less damaged, so I assume the extent tree
> corruption could be fixed by "--init-extent-tree".
>
> For the only fs tree error (the missing csum), if "btrfsck
> --init-extent-tree --repair" works without any problem, the simplest
> fix would be to just remove the file.
> Alternatively, you can spend a lot of CPU time and disk IO to rebuild
> the whole csum tree by using the "--init-csum-tree" option.
Okay, so I'm going to inode-resolve the file with csum errors.
Actually, it's a file from Steam which has been there for ages and
never showed csum errors before, which makes me wonder whether csum
errors can sneak into long-existing files through other corruptions.
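For reference, the inode number printed in the kernel's csum-error
messages can be mapped back to a path with btrfs-progs; the inode
number and mount point below are placeholders:

```shell
# dmesg shows something like:
#   BTRFS warning (device bcache2): csum failed ino 257 off 4096 ...
# Resolve that inode number to file path(s) on the mounted fs:
btrfs inspect-internal inode-resolve 257 /
```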
I now removed this file and had to reboot because btrfs went RO. Here's
the backtrace:
https://gist.github.com/kakra/a7be40c23e08fc6e237f9108371afadf
[137619.835374] ------------[ cut here ]------------
[137619.835385] WARNING: CPU: 1 PID: 4840 at fs/btrfs/extent-tree.c:1625 lookup_inline_extent_backref+0x156/0x620()
[137619.835394] Modules linked in: nvidia_drm(PO) uas usb_storage vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia_modeset(PO) nvidia(PO)
[137619.835405] CPU: 1 PID: 4840 Comm: rm Tainted: P O 4.5.0-gentoo-r1 #1
[137619.835407] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3, BIOS L2.16A 02/22/2013
[137619.835409] 0000000000000000 ffffffff8159eae9 0000000000000000 ffffffff81ea1d08
[137619.835412] ffffffff810c6e37 ffff8803d56a4d20 ffff88040c7daa00 00000a4075114000
[137619.835415] 0000000000201000 0000000000000000 ffffffff81489836 0000001d00000000
[137619.835418] Call Trace:
[137619.835423] [<ffffffff8159eae9>] ? dump_stack+0x46/0x5d
[137619.835429] [<ffffffff810c6e37>] ? warn_slowpath_common+0x77/0xb0
[137619.835432] [<ffffffff81489836>] ? lookup_inline_extent_backref+0x156/0x620
[137619.835435] [<ffffffff814bdfce>] ? btrfs_get_token_32+0xee/0x110
[137619.835440] [<ffffffff8115de48>] ? __set_page_dirty_nobuffers+0xf8/0x150
[137619.835443] [<ffffffff81489d54>] ? insert_inline_extent_backref+0x54/0xe0
[137619.835450] [<ffffffff8119ebd8>] ? __slab_free+0x98/0x220
[137619.835453] [<ffffffff8119e6ad>] ? kmem_cache_alloc+0x14d/0x160
[137619.835456] [<ffffffff8148a1e9>] ? __btrfs_inc_extent_ref.isra.64+0x99/0x270
[137619.835459] [<ffffffff8148ecc3>] ? __btrfs_run_delayed_refs+0x673/0x1020
[137619.835463] [<ffffffff814c6e01>] ? btrfs_release_extent_buffer_page+0x71/0x120
[137619.835466] [<ffffffff814c6eef>] ? release_extent_buffer+0x3f/0x90
[137619.835469] [<ffffffff8149222f>] ? btrfs_run_delayed_refs+0x8f/0x2b0
[137619.835473] [<ffffffff814b0978>] ? btrfs_truncate_inode_items+0x8b8/0xdc0
[137619.835477] [<ffffffff814b1d4e>] ? btrfs_evict_inode+0x3fe/0x550
[137619.835481] [<ffffffff811cd4f7>] ? evict+0xb7/0x180
[137619.835484] [<ffffffff811c37cc>] ? do_unlinkat+0x12c/0x2d0
[137619.835488] [<ffffffff81bdb017>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[137619.835491] ---[ end trace 6e8061336c42ff93 ]---
[137619.835494] ------------[ cut here ]------------
[137619.835497] WARNING: CPU: 1 PID: 4840 at fs/btrfs/extent-tree.c:2946 btrfs_run_delayed_refs+0x279/0x2b0()
[137619.835499] BTRFS: Transaction aborted (error -5)
[137619.835500] Modules linked in: nvidia_drm(PO) uas usb_storage vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia_modeset(PO) nvidia(PO)
[137619.835506] CPU: 1 PID: 4840 Comm: rm Tainted: P W O 4.5.0-gentoo-r1 #1
[137619.835508] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3, BIOS L2.16A 02/22/2013
[137619.835509] 0000000000000000 ffffffff8159eae9 ffff880255d1bc98 ffffffff81ea1d08
[137619.835512] ffffffff810c6e37 ffff88040c7daa00 ffff880255d1bce8 00000000000001c6
[137619.835514] ffff8803211b4510 000000000000000b ffffffff810c6eb7 ffffffff81e8a0a0
[137619.835517] Call Trace:
[137619.835519] [<ffffffff8159eae9>] ? dump_stack+0x46/0x5d
[137619.835522] [<ffffffff810c6e37>] ? warn_slowpath_common+0x77/0xb0
[137619.835525] [<ffffffff810c6eb7>] ? warn_slowpath_fmt+0x47/0x50
[137619.835528] [<ffffffff81492419>] ? btrfs_run_delayed_refs+0x279/0x2b0
[137619.835531] [<ffffffff814b0978>] ? btrfs_truncate_inode_items+0x8b8/0xdc0
[137619.835535] [<ffffffff814b1d4e>] ? btrfs_evict_inode+0x3fe/0x550
[137619.835538] [<ffffffff811cd4f7>] ? evict+0xb7/0x180
[137619.835541] [<ffffffff811c37cc>] ? do_unlinkat+0x12c/0x2d0
[137619.835543] [<ffffffff81bdb017>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[137619.835545] ---[ end trace 6e8061336c42ff94 ]---
[137619.835547] BTRFS: error (device bcache2) in btrfs_run_delayed_refs:2946: errno=-5 IO failure
[137619.835550] BTRFS info (device bcache2): forced readonly
[137619.886069] pending csums is 410705920
So it looks like fixing one error introduces other errors. Should I try
--init-extent-tree after taking a backup?
BTW: "btrfsck --repair" does not work: it complains about unsupported
cases due to compressed extents and says I need to contact the
developers to get this case covered.
--
Regards,
Kai
Replies to list-only preferred.
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-03-31 23:27 ` Henk Slager
2016-04-01 1:10 ` Qu Wenruo
@ 2016-04-02 9:00 ` Kai Krakow
2016-04-02 17:17 ` Henk Slager
1 sibling, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2016-04-02 9:00 UTC (permalink / raw)
To: linux-btrfs
On Fri, 1 Apr 2016 01:27:21 +0200, Henk Slager <eye1tm@gmail.com> wrote:
> It is not clear to me what 'Gentoo patch-set r1' is and does. So just
> boot a vanilla v4.5 kernel from kernel.org and see if you get csum
> errors in dmesg.
It is the gentoo patchset, I don't think anything there relates to
btrfs:
https://dev.gentoo.org/~mpagano/genpatches/trunk/4.5/
> Also, where does 'duplicate object' come from? dmesg ? then please
> post its surroundings, straight from dmesg.
It was in dmesg. I already posted it in the other thread and Qu took
note of it. Apparently, I didn't manage to capture anything else than:
btrfs_run_delayed_refs:2927: errno=-17 Object already exists
It hit me unexpectedly. This was the first time btrfs went RO for me.
It was with kernel 4.4.5, I think.
I suspect this is the outcome of unnoticed corruptions that sneaked in
earlier over some period of time. The system had no problems until this
incident, and only then did I discover the huge pile of corruptions
when I ran btrfsck.
I'm also pretty convinced now that VirtualBox itself is not the problem
but only victim of these corruptions, that's why it primarily shows up
in the VDI file.
However, I now found csum errors in unrelated files (see other post in
this thread), even for files not touched in a long time.
--
Regards,
Kai
Replies to list-only preferred.
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-02 9:00 ` Kai Krakow
@ 2016-04-02 17:17 ` Henk Slager
2016-04-02 20:16 ` Kai Krakow
0 siblings, 1 reply; 22+ messages in thread
From: Henk Slager @ 2016-04-02 17:17 UTC (permalink / raw)
To: linux-btrfs
On Sat, Apr 2, 2016 at 11:00 AM, Kai Krakow <hurikhan77@gmail.com> wrote:
> On Fri, 1 Apr 2016 01:27:21 +0200, Henk Slager <eye1tm@gmail.com> wrote:
>
>> It is not clear to me what 'Gentoo patch-set r1' is and does. So just
>> boot a vanilla v4.5 kernel from kernel.org and see if you get csum
>> errors in dmesg.
>
> It is the gentoo patchset, I don't think anything there relates to
> btrfs:
> https://dev.gentoo.org/~mpagano/genpatches/trunk/4.5/
>
>> Also, where does 'duplicate object' come from? dmesg ? then please
>> post its surroundings, straight from dmesg.
>
> It was in dmesg. I already posted it in the other thread and Qu took
> note of it. Apparently, I didn't manage to capture anything else than:
>
> btrfs_run_delayed_refs:2927: errno=-17 Object already exists
>
> It hit me unexpected. This was the first time btrfs went RO for me. It
> was with kernel 4.4.5 I think.
>
> I suspect this is the outcome of unnoticed corruptions that sneaked in
> earlier over some period of time. The system had no problems until this
> incident, and only then I discovered the huge pile of corruptions when I
> ran btrfsck.
>
> I'm also pretty convinced now that VirtualBox itself is not the problem
> but only victim of these corruptions, that's why it primarily shows up
> in the VDI file.
>
> However, I now found csum errors in unrelated files (see other post in
> this thread), even for files not touched in a long time.
Ok, this is some good further status and background. That there are
more csum errors elsewhere is quite worrying, I would say. You said the
HW is tested, but are you sure there are no rare undetected failures,
due to overclocking, aging or whatever? It might just be that spurious
HW errors are only now starting to happen and are unrelated to the
kernel upgrade from 4.4.x to 4.5.
I once had a RAM module go bad; Windows 7 ran fine (at least no
crashes), but when I booted Linux/btrfs, all kinds of strange btrfs
errors started to appear, including csum errors.
The other thing you could think about is the SSD cache partition. I
don't remember whether blocks going from RAM to the SSD get an extra
CRC attached (independent of btrfs). But if data gets corrupted while
on the SSD, you could get very nasty errors; how nasty depends a bit on
the various bcache settings. It is not unthinkable that corrupted dirty
data gets written to the harddisks. But at least btrfs (scrub) can
detect that (the situation you are in now).
Maybe, to further isolate just btrfs, you could temporarily rule out
bcache by making sure the cache is clean, then increasing the start
sectors of the second partitions on the harddisks by 16 (8 KiB), and
then rebooting. Of course, after any write to the partitions, you'll
have to recreate all bcache devices.
But maybe it is just due to bugs in older kernels that the fs has been
silently corrupted and now kernel 4.5 cannot handle it anymore and any
use of the fs increases corruption.
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-02 17:17 ` Henk Slager
@ 2016-04-02 20:16 ` Kai Krakow
2016-04-03 0:14 ` Chris Murphy
0 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2016-04-02 20:16 UTC (permalink / raw)
To: linux-btrfs
On Sat, 2 Apr 2016 19:17:55 +0200, Henk Slager <eye1tm@gmail.com> wrote:
> On Sat, Apr 2, 2016 at 11:00 AM, Kai Krakow <hurikhan77@gmail.com>
> wrote:
> > On Fri, 1 Apr 2016 01:27:21 +0200, Henk Slager <eye1tm@gmail.com> wrote:
> >
> >> It is not clear to me what 'Gentoo patch-set r1' is and does. So
> >> just boot a vanilla v4.5 kernel from kernel.org and see if you get
> >> csum errors in dmesg.
> >
> > It is the gentoo patchset, I don't think anything there relates to
> > btrfs:
> > https://dev.gentoo.org/~mpagano/genpatches/trunk/4.5/
> >
> >> Also, where does 'duplicate object' come from? dmesg ? then please
> >> post its surroundings, straight from dmesg.
> >
> > It was in dmesg. I already posted it in the other thread and Qu took
> > note of it. Apparently, I didn't manage to capture anything else
> > than:
> >
> > btrfs_run_delayed_refs:2927: errno=-17 Object already exists
> >
> > It hit me unexpected. This was the first time btrfs went RO for me.
> > It was with kernel 4.4.5 I think.
> >
> > I suspect this is the outcome of unnoticed corruptions that sneaked
> > in earlier over some period of time. The system had no problems
> > until this incident, and only then I discovered the huge pile of
> > corruptions when I ran btrfsck.
> >
> > I'm also pretty convinced now that VirtualBox itself is not the
> > problem but only victim of these corruptions, that's why it
> > primarily shows up in the VDI file.
> >
> > However, I now found csum errors in unrelated files (see other post
> > in this thread), even for files not touched in a long time.
>
> Ok, this is some good further status and background. That there are
> more csum errors elsewhere is quite worrying I would say. You said HW
> is tested, are you sure there no rare undetected failures, like due to
> overclocking or just aging or whatever. It might just be that spurious
> HW errors just now start to happen and are unrelated to kernel upgrade
> from 4.4.x to 4.5.
> I had once a RAM module going bad; Windows7 ran fine (at least no
> crashes), but when I booted with Linux/btrfs, all kinds of strange
> btrfs errors started to appear including csum errors.
I'll go check the RAM for problems - though that would be the first
time in twenty years that a RAM module of mine developed errors that
weren't there from the beginning. Well, you never know. But I expect no
errors, since bad RAM usually means all sorts of different and random
problems, which I don't have. My problems are very specific, which is
atypical for RAM errors.
The hardware is not overclocked, every part was tested when installed.
> The other thing you could think about is the SSD cache partition. I
> don't remember if blocks from RAM to SSD get an extra CRC attached
> (independent of BTRFS). But if data gets corrupted while in the SSD,
> you could get very nasty errors, how nasty depends a bit on the
> various bcache settings. It is not unthinkable that dirty changed data
> gets written to the harddisks. But at least btrfs (scub) can detect
> that (the situation you are in now).
Well, the SSD could in fact soon become a problem. It's at 97% of its
lifetime according to SMART. I'm probably somewhere near 85 TB of
written data (the lifetime spec of the SSD) within one year, thanks to
an unfortunate disk replacement (btrfs replace) action with btrfs
through bcache, plus weekly scrubs (which do not just read, but also
write).
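The wear figures below come from SMART; a quick way to pull them (the
device name is an example, and attribute names vary by vendor):

```shell
# Dump the vendor-specific SMART attribute table for the caching SSD:
smartctl -A /dev/sda

# Filter for the wear indicator (attribute 202 on this Crucial model):
smartctl -A /dev/sda | grep -i lifetime
```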
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f 100   100   000    Pre-fail Always  -           1
  5 Reallocate_NAND_Blk_Cnt 0x0033 100   100   000    Pre-fail Always  -           0
  9 Power_On_Hours          0x0032 100   100   000    Old_age  Always  -           8705
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           286
171 Program_Fail_Count      0x0032 100   100   000    Old_age  Always  -           0
172 Erase_Fail_Count        0x0032 100   100   000    Old_age  Always  -           0
173 Ave_Block-Erase_Count   0x0032 003   003   000    Old_age  Always  -           2913
174 Unexpect_Power_Loss_Ct  0x0032 100   100   000    Old_age  Always  -           112
180 Unused_Reserve_NAND_Blk 0x0033 000   000   000    Pre-fail Always  -           1036
183 SATA_Interfac_Downshift 0x0032 100   100   000    Old_age  Always  -           0
184 Error_Correction_Count  0x0032 100   100   000    Old_age  Always  -           0
187 Reported_Uncorrect      0x0032 100   100   000    Old_age  Always  -           0
194 Temperature_Celsius     0x0022 067   057   000    Old_age  Always  -           33 (Min/Max 20/43)
196 Reallocated_Event_Count 0x0032 100   100   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032 100   100   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0030 100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032 100   100   000    Old_age  Always  -           0
202 Percent_Lifetime_Used   0x0031 003   003   000    Pre-fail Offline -           97
206 Write_Error_Rate        0x000e 100   100   000    Old_age  Always  -           0
210 Success_RAIN_Recov_Cnt  0x0032 100   100   000    Old_age  Always  -           0
246 Total_Host_Sector_Write 0x0032 100   100   000    Old_age  Always  -           42879382296
247 Host_Program_Page_Count 0x0032 100   100   000    Old_age  Always  -           1495038460
248 Bckgnd_Program_Page_Cnt 0x0032 100   100   000    Old_age  Always  -           42326578695
> Maybe to further isolate just btrfs, you could temporary rule out
> bcache by making sure the cache is clean and then increase the
> startsectors of second partitions on the harddisks by 16 (8KiB) and
> then reboot. Of course after any write to the partitions, you'll have
> to recreate all bcache.
Bcache had some patches lately for problems I never experienced. At
this point, I'd not rule out bcache as the culprit either. Though
bcache itself had no problems here, I have one other system where
bcache broke down after those patches were applied, resulting in a
broken bcache b-tree.
> But maybe it is just due to bugs in older kernels that the fs has been
> silently corrupted and now kernel 4.5 cannot handle it anymore and any
> use of the fs increases corruption.
I'm pretty sure the problems sneaked in while running older kernels,
and the FS going RO was only the tip of the iceberg.
My last "error free" rsync backup is from mid-March. By that time, I
probably had no csum errors in files with a young modification time -
but since I only in-place sync files with a changed mod-time, I cannot
rule out csum errors having already been there. My script only takes
snapshots of the backup scratch area when rsync was successful, so my
last snapshot from mid-March holds valid copies of the broken files,
while the scratch area has a current backup with some files broken (due
to in-place syncing). [1]
According to previous inspections, that backup FS is in good shape -
the only btrfsck errors have been false alerts which have been fixed by
Qu (thanks BTW).
Interesting thing is: as with the first file with csum errors (the VDI
file), the second file also has csum errors again after being
recreated. It's a game data file from Steam. I removed it (then the FS
went RO, as mentioned earlier in this thread). Now Steam has
re-downloaded the file to a temp directory - so obviously it's a
completely new file (unless Steam somehow magically recovered it from
somewhere else). But this new file has csum errors again. WTH? And
Steam forces the FS RO when working with this file.
So either the SSD (through bcache) or btrfs' compression algorithm
shows bugs with very specific data patterns (since I'm using
compress=lzo), or the other corruptions make btrfs destroy these new
files because it allocates space over and over again from affected
areas of the disk. I don't know how btrfs allocation works - but that
may be an explanation (wrt the backpointer errors).
BTW: Replacement SSD already ordered. At the current rate the old
one will reach 100% lifetime in about 4-6 weeks.
[1]: As a reference or if you're curious:
https://gist.github.com/kakra/5520370
--
Regards,
Kai
Replies to list-only preferred.
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-02 20:16 ` Kai Krakow
@ 2016-04-03 0:14 ` Chris Murphy
2016-04-03 4:02 ` Kai Krakow
0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2016-04-03 0:14 UTC (permalink / raw)
To: Btrfs BTRFS
On Sat, Apr 2, 2016 at 2:16 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
> I'll go checking the RAM for problems - tho that would be the first
> time in twenty years that a RAM module hadn't errors from the
> beginning. Well, you'll never know. But I expect no error since usually
> this would mean all sorts of different and random problems which I
> don't have. Problems are very specific, which is atypical for RAM
> errors.
Well, so far it's just the VDI that's experiencing csum mismatch
errors, right? So that's not bad RAM, which would affect other files
too. And the same goes for a failing SSD.
I think you've got a bug somewhere, and it's just hard to say where it
is based on the available information. I've already lost track of
whether others have exactly the same setup you do: bcache + nossd +
autodefrag + lzo + VirtualBox writing to a VDI on this Btrfs volume.
There are others who use some of those options, but I don't know if
there's anyone who has all of them going on.
Maybe Qu has some suggestions, but if it were me, I'd do this. Build
mainline 4.5.0; it's a known quantity to the Btrfs devs. Build the
kernel with BTRFS_FS_CHECK_INTEGRITY enabled in the kernel config. When
you mount the file system, don't use the check_int mount option at
first; just use your regular mount options and try to reproduce the VDI
corruption. If you can reproduce it, then start over, this time with
the check_int mount option included along with the others you're using,
and try to reproduce again. The kernel messages can be fairly verbose,
so use the boot parameter log_buf_len=1M; that way you can use dmesg
rather than depending on journalctl -k, which sometimes drops messages
if there are too many.
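Condensing that procedure into commands (the kernel config symbol is as
named in mainline; the mount point and option set are examples matching
this thread):

```shell
# Kernel config for the debug build:
#   CONFIG_BTRFS_FS_CHECK_INTEGRITY=y

# Round 1: regular options only, try to reproduce the corruption:
mount -o noatime,compress=lzo,nossd,autodefrag /dev/bcache2 /mnt

# Round 2: same options plus the integrity checker:
mount -o noatime,compress=lzo,nossd,autodefrag,check_int /dev/bcache2 /mnt

# Boot with log_buf_len=1M so the verbose output survives in dmesg.
```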
If you reproduce the corruption while check_int is enabled, the kernel
messages should have clues, and then you can put them in a file and
attach it to the list or open a bug. FWIW, I'm pretty sure your MUA is
wrapping poorly: when I look at this URL for your post with the
smartctl output, it wraps in a way that's essentially impossible to
sort out at a glance. Whether it's your MUA or my web browser pretty
much doesn't matter; it's not legible, so what I do is attach it as a
file to a bug report or, if small enough, onto the list itself.
http://www.spinics.net/lists/linux-btrfs/msg53790.html
Finally, I would retest yet again with check_int_data as a mount
option and try to reproduce. This is reported to be dirt slow, but it
might capture something that check_int doesn't. But I admit this is
throwing spaghetti on the wall, and is something of a goose chase just
because I don't know what else to recommend other than iterating all
of your mount options from none, adding just one at a time, and trying
to reproduce. That somehow sounds more tedious. But chances are you'd
find out what mount option is causing it; OR maybe you'd find out the
corruption always happens, even with defaults, even without bcache, in
which case that'd seem to implicate either a gentoo patch, or a
virtual box bug of some sort.
--
Chris Murphy
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-03 0:14 ` Chris Murphy
@ 2016-04-03 4:02 ` Kai Krakow
2016-04-03 5:06 ` Duncan
2016-04-03 19:03 ` Chris Murphy
0 siblings, 2 replies; 22+ messages in thread
From: Kai Krakow @ 2016-04-03 4:02 UTC (permalink / raw)
To: linux-btrfs
On Sat, 2 Apr 2016 18:14:17 -0600, Chris Murphy <lists@colorremedies.com> wrote:
> On Sat, Apr 2, 2016 at 2:16 PM, Kai Krakow <hurikhan77@gmail.com>
> wrote:
>
> > I'll go checking the RAM for problems - tho that would be the first
> > time in twenty years that a RAM module hadn't errors from the
> > beginning. Well, you'll never know. But I expect no error since
> > usually this would mean all sorts of different and random problems
> > which I don't have. Problems are very specific, which is atypical
> > for RAM errors.
>
> Well so far it's just the VDI that's experiencing csum mismatch
> errors, right? So that's not bad RAM, which would affect other files
> too. And same for a failing SSD.
No, other files are affected, too. And it looks like those files are
easily affected even when removed and recreated from whatever backup
source.
> I think you've got a bug somewhere and it's just hard to say where it
> is based on the available information. I've already lost track if
> others have all of the exact same setup you do: bcache + nossd +
> autodefrag + lzo + VirtualBox writing to VDI on this Btrfs volume.
> There are others who have some of those options, but I don't know if
> there's anyone who has all of those going on.
I haven't run VirtualBox since the incident, so I'd rule out
VirtualBox. Currently there seem to be no csum errors for the VDI file;
instead another file now gets corrupted, even after being recreated. I
think it is the result of another corruption and thus a side effect.
Also, I think having the options nossd+autodefrag+lzo shouldn't be
exotic or unsupported. Having this on top of bcache should just work.
Let's not rule out that bcache had a problem, although in that case I'd
usually expect bcache to freak out with internal btree corruption.
> Maybe Qu has some suggestions, but if it were me I'd do this. Build
> mainline 4.5.0, it's a known quantity by Btrfs devs.
4.5.0-gentoo is currently only a few patches so I could easily build
vanilla.
> Build the kernel
> with BTRFS_FS_CHECK_INTEGRITY enabled in kernel config. And when you
> mount the file system, don't use mount option check_int, just use your
> regular mount options and try to reproduce the VDI corruption. If you
> can reproduce it, then start over, this time with check_int mount
> option included along with the others you're using and try to
> reproduce. It's possible there will be fairly verbose kernel messages,
> so use boot parameter log_buf_len=1M and then that way you can use
> dmesg rather than depending on journalctl -k which sometimes drops
> messages if there are too many.
Does it make sense while I still have the corruptions in the FS? I'd
like to wait for Qu on whether I should recreate the FS, take an image,
or send info to improve btrfsck...
I'm pretty sure I don't have any reproducible corruptions that aren't
caused by another corruption - so check_int would probably be of little
use currently.
> If you reproduce the corruption while check_int is enabled, kernel
> messages should have clues and then you can put that in a file and
> attach to the list or open a bug. FWIW, I'm pretty sure your MUA is
> wrapping poorly, when I look at this URL for your post with smartctl
> output, it wraps in a way that's essentially impossible to sort out at
> a glance. Whether it's your MUA or my web browser pretty much doesn't
> matter, it's not legible so what I do is just attach as file to a bug
> report or if small enough onto the list itself.
> http://www.spinics.net/lists/linux-btrfs/msg53790.html
Claws Mail is just too smart for me... It showed up correctly in the
editor before I hit the send button. I wish I could go back to KNode
(which did its job right), but it's currently an unsupported orphan
project of KDE. :-(
> Finally, I would retest yet again with check_int_data as a mount
> option and try to reproduce. This is reported to be dirt slow, but it
> might capture something that check_int doesn't. But I admit this is
> throwing spaghetti on the wall, and is something of a goose chase just
> because I don't know what else to recommend other than iterating all
> of your mount options from none, adding just one at a time, and trying
> to reproduce. That somehow sounds more tedious. But chances are you'd
> find out what mount option is causing it; OR maybe you'd find out the
> corruption always happens, even with defaults, even without bcache, in
> which case that'd seem to implicate either a gentoo patch, or a
> virtual box bug of some sort.
I think the latter two are easily the least probable sorts of bugs. But
I'll give it a try. For the time being, I could switch bcache to
write-around mode - so at least it could not corrupt btrfs during
writes.
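Switching bcache to write-around is a sysfs operation; a sketch (run as
root; the paths only exist where bcache backing devices are present):

```shell
# Route writes around the SSD cache for every bcache backing device:
for dev in /sys/block/bcache[0-9]*; do
    [ -e "$dev/bcache/cache_mode" ] || continue   # skip if no bcache
    echo writearound > "$dev/bcache/cache_mode"
done
```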
--
Regards,
Kai
Replies to list-only preferred.
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-03 4:02 ` Kai Krakow
@ 2016-04-03 5:06 ` Duncan
2016-04-03 22:19 ` Kai Krakow
2016-04-03 19:03 ` Chris Murphy
1 sibling, 1 reply; 22+ messages in thread
From: Duncan @ 2016-04-03 5:06 UTC (permalink / raw)
To: linux-btrfs
Kai Krakow posted on Sun, 03 Apr 2016 06:02:02 +0200 as excerpted:
> No, other files are affected, too. And it looks like those files are
> easily affected even when removed and recreated from whatever backup
> source.
I've seen you say that several times now, I think. But none of those
times has it apparently occurred to you to double-check whether it's the
/same/ corruptions every time, or at least, if you checked it, I've not
seen it actually /reported/. (Note that I didn't say you didn't report
it, only that I've not seen it. A difference there is! =:^)
If I'm getting repeated corruptions of something, that's the first thing
I'd check, is there some repeating pattern to those corruptions, same
place in the file, same "wanted" value (expected), same "got" value, (not
expected if it's reporting corruption), etc.
Then I'd try different variations like renaming the file, putting it in a
different directory with all of the same other files, putting it in a
different directory with all different files, putting it in a different
directory by itself, putting it in the same directory but in a different
subvolume... you get the point.
Then I'd try different mount options, with and without compression, with
different kinds of compression, with compress-force and with simple
compress, with and without autodefrag...
I could try it with nocow enabled for the file (note that the file has to
be created with nocow before it gets content, for nocow to take effect),
tho of course that'll turn off btrfs checksumming, but I could still for
instance md5sum the original source and the nocowed test version and see
if it tests clean that way.
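Duncan's nocow-plus-checksum experiment can be sketched like this (all
paths are examples; note that chattr +C must be applied while the file
is still empty for nocow to take effect on btrfs):

```shell
# Create the target empty, mark it nocow, then fill it and compare
# checksums out-of-band (md5sum stands in for the btrfs csums that
# nocow disables):
touch /mnt/test/image.vdi
chattr +C /mnt/test/image.vdi                 # only effective on empty files
cp /backup/image.vdi /mnt/test/image.vdi
md5sum /backup/image.vdi /mnt/test/image.vdi  # the two sums should match
```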
I could try it with nocow on the file but with a bunch of snapshots
interwoven with writing changes to the file (obviously this will kill
comparison against the original, but I could arrange to write the same
changes to the test file on btrfs, and to a control copy of the file on
non-btrfs, and then md5sum or whatever compare them).
Then, if I had the devices available to do so, I'd try it in a different
btrfs of the same layout (same redundancy mode and number of devices),
both single and dup mode on a single device, etc.
And again if available, I'd try swapping the filesystem to different
machines...
OK, so trying /all/ the above might be a bit overboard but I think you
get the point. Try to find some pattern or common element in the whole
thing, and report back the results at least for the "simple" experiments
like whether the corruption appears to be the same (same got at the same
spot) or different, and whether putting the file in a different subdir or
using a different name for it matters at all. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-03 4:02 ` Kai Krakow
2016-04-03 5:06 ` Duncan
@ 2016-04-03 19:03 ` Chris Murphy
1 sibling, 0 replies; 22+ messages in thread
From: Chris Murphy @ 2016-04-03 19:03 UTC (permalink / raw)
To: Kai Krakow; +Cc: Btrfs BTRFS
On Sat, Apr 2, 2016 at 10:02 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
> On Sat, 2 Apr 2016 18:14:17 -0600
> Also I think, having options nossd+autodefrag+lzo shouldn't be an
> exotic or unsupported option. Having this on top of bcache should just
> work.
I'm not suggesting it shouldn't work. But in fact something isn't
working. Bugs happen. Regressions happen. This is a process of
elimination project to find out either why, or under what
condition(s), it doesn't work.
> Does it make sense while I still have the corruptions in the FS? I'd
> like to wait for Qu whether I should recreate the FS or whether I
> should take some image, or send info to improve btrfsck...
It's up to you. I think it's fair to say the file system should not be
corrupting files as long as it's willing to write to the volume. So
that's a problem in and of itself; it should go read-only sooner.
It's completely reasonable to take a btrfs-image, back everything up,
and then try a 'btrfs check --repair' and see if it can fix things up.
If not, that makes the btrfs-image more valuable.
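Taking the metadata image Chris mentions is a single command; the
filenames below are examples, and the filesystem should be unmounted:

```shell
# -c9: maximum compression, -t4: four threads; filenames in the image
# can additionally be sanitized with -s if privacy is a concern.
btrfs-image -c9 -t4 /dev/disk/by-label/system /backup/system.img
```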
> I think the latter two are easily the least probable sort of bugs. But
> I'll give it a try. For the time being, I could switch bcache to
> write-around mode - so it could at least not corrupt btrfs during
> writes.
I don't know enough about bcache to speculate what can happen if there
are already fs corruptions. Is it possible bcache makes things worse?
No idea.
--
Chris Murphy
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-03 5:06 ` Duncan
@ 2016-04-03 22:19 ` Kai Krakow
2016-04-04 0:51 ` Chris Murphy
2016-04-04 4:34 ` Duncan
0 siblings, 2 replies; 22+ messages in thread
From: Kai Krakow @ 2016-04-03 22:19 UTC (permalink / raw)
To: linux-btrfs
On Sun, 3 Apr 2016 05:06:19 +0000 (UTC), Duncan <1i5t5.duncan@cox.net> wrote:
> Kai Krakow posted on Sun, 03 Apr 2016 06:02:02 +0200 as excerpted:
>
> > No, other files are affected, too. And it looks like those files are
> > easily affected even when removed and recreated from whatever backup
> > source.
>
> I've seen you say that several times now, I think. But none of those
> times has it apparently occurred to you to double-check whether it's
> the /same/ corruptions every time, or at least, if you checked it,
> I've not seen it actually /reported/. (Note that I didn't say you
> didn't report it, only that I've not seen it. A difference there is!
> =:^)
Believe me, I would double-check... But this FS (and the affected
files) are just too big to create test cases, and backups, and copies,
and you know what...
So the only chance I see is to offer help improving "btrfsck --repair"
before I wipe and restore from backup - except in the unlikely case
that "--repair" improves to a point where it gets my FS back in
order. ;-)
I'll have to wait for my new bcache SSD to arrive. In its current state
(lifetime at 97%) I don't want to push all my file data through it.
Then I'll backup the current state (the damaged files are skipped
anyways because they haven't been "modified" according to mtime), so
I'll get a clean backup except for the VDI file and some big Steam
files (which actually can easily be downloaded again through the
client).
And yes, you are right that I didn't check whether it is the same
corruption every time. But that's also a bit difficult to do because
I'd need either enough spare disk space to keep copies of the files to
compare against, or need to setup some block-identifying checksumming
like a hash tree.
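For reference, a block-wise hash list is cheap to build with standard tools and avoids keeping full spare copies - a minimal sketch (block size, paths and the synthetic demo data are placeholders, not the real VDI):

```shell
# Hash a file in 1 MiB blocks; two corrupted restores can then be
# compared by diffing their hash lists instead of keeping full copies.
blockhash() {
    f=$1; bs=$((1024 * 1024))
    size=$(wc -c < "$f")
    blocks=$(( (size + bs - 1) / bs ))
    i=0
    while [ "$i" -lt "$blocks" ]; do
        dd if="$f" bs="$bs" skip="$i" count=1 2>/dev/null \
            | sha256sum | awk -v n="$i" '{print n, $1}'
        i=$((i + 1))
    done
}

# Demo on two synthetic 2 MiB copies differing in their second block:
head -c 2097152 /dev/zero > /tmp/copy1.bin
cp /tmp/copy1.bin /tmp/copy2.bin
printf 'X' | dd of=/tmp/copy2.bin bs=1 seek=1500000 conv=notrunc 2>/dev/null
blockhash /tmp/copy1.bin > /tmp/copy1.hashes
blockhash /tmp/copy2.bin > /tmp/copy2.hashes
diff /tmp/copy1.hashes /tmp/copy2.hashes \
    | awk '/^[<>]/ {print "block", $2, "differs"}' | sort -u
# → block 1 differs
```

Diffing the hash lists of two independently corrupted restores would show whether the damage hits the same blocks each time.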
> If I'm getting repeated corruptions of something, that's the first
> thing I'd check, is there some repeating pattern to those
> corruptions, same place in the file, same "wanted" value (expected),
> same "got" value, (not expected if it's reporting corruption), etc.
Way to go, usually...
> Then I'd try different variations like renaming the file, putting it
> in a different directory with all of the same other files, putting it
> in a different directory with all different files, putting it in a
> different directory by itself, putting it in the same directory but
> in a different subvolume... you get the point.
Here's the point: Shuffling files around should be done across different
filesystems. I neither have a spare filesystem to do that, nor can I
currently afford the time to shuffle such big files around - it takes
multiple hours to copy these. Already looking forward to restoring the
backup...
*sigh*
BTW: Is it possible to use my backup drive (it's btrfs single-data
dup-metadata, single device) as a seed device for my newly created
btrfs pool (raid0-data, raid1-metadata, three devices)? I guess the
seed source cannot be mounted or modified...
> Then I'd try different mount options, with and without compression,
> with different kinds of compression, with compress-force and with
> simple compress, with and without autodefrag...
As a first step I've switched bcache to write-around mode. It should
prevent (or at least reduce) more corruption if bcache is at fault. And
it's the safer choice anyway for a soon-to-die SSD.
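For reference, the mode switch is a single sysfs write - a sketch, assuming the cache device is bcache0 (substitute yours), guarded so it is a no-op elsewhere:

```shell
# Switch a bcache device to write-around so the SSD is no longer in the
# write path; reading the file back shows the active mode in [brackets].
dev=/sys/block/bcache0/bcache
if [ -w "$dev/cache_mode" ]; then
    echo writearound > "$dev/cache_mode"
    cat "$dev/cache_mode"
else
    echo "no writable bcache device at $dev"
fi
```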
> I could try it with nocow enabled for the file (note that the file
> has to be created with nocow before it gets content, for nocow to
> take effect), tho of course that'll turn off btrfs checksumming, but
> I could still for instance md5sum the original source and the nocowed
> test version and see if it tests clean that way.
I already thought about putting the VDI back to nocow... I had this
before. But in this sense, csum errors would go unnoticed. So I don't
think that is adequate.
But in consequence I could actually md5sum the files as you wrote
because there won't be read errors due to csum mismatch. And I could
detect corruption that way.
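A sketch of that approach - the +C attribute must be applied to the still-empty file before any data goes in, and it only has an effect on btrfs (paths and the stand-in data below are placeholders):

```shell
# Recreate a file with nocow, then verify it out-of-band with md5sum,
# since btrfs checksumming is off for nocow files.
src=/tmp/source.vdi
dst=/tmp/nocow.vdi
head -c 1048576 /dev/urandom > "$src"   # stand-in for the real image
: > "$dst"                              # create the file empty
chattr +C "$dst" 2>/dev/null || echo "chattr +C not supported here"
cat "$src" > "$dst"                     # only now copy the data in
md5sum "$src" "$dst"                    # sums must match
```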
> I could try it with nocow on the file but with a bunch of snapshots
> interwoven with writing changes to the file (obviously this will kill
> comparison against the original, but I could arrange to write the
> same changes to the test file on btrfs, and to a control copy of the
> file on non-btrfs, and then md5sum or whatever compare them).
That would probably work but I do not quite trust it due to the
corruptions already on disk which seemingly damage specific files or
areas on the disk.
> Then, if I had the devices available to do so, I'd try it in a
> different btrfs of the same layout (same redundancy mode and number
> of devices), both single and dup mode on a single device, etc.
In that sense: If I had the disks available I already would've taken a
block-by-block copy and then restored from backup.
> And again if available, I'd try swapping the filesystem to different
> machines...
Maybe another time... ;-)
Actually, I only have that one system here. I could do that with the
other system I have problems with - but that's another story and
currently low priority.
> OK, so trying /all/ the above might be a bit overboard but I think
> you get the point. Try to find some pattern or common element in the
> whole thing, and report back the results at least for the "simple"
> experiments like whether the corruption appears to be the same (same
> got at the same spot) or different, and whether putting the file in a
> different subdir or using a different name for it matters at all.
> =:^)
Your ideas are always welcome.
The corruptions seem to be different by the following observation:
While the VDI file was corrupted over and over again with a csum error,
I could simply remove it and restore from backup. The last thing I did
was ddrescue it from the damaged version to my backup device, then rsync
the file back to the originating device (which created a new file
side-by-side, so in a new area of disk space, then replace-by-renamed
the old one). I haven't run VirtualBox since then, but the file
hasn't become corrupted again either.
But now, according to btrfsck, a csum error instead came up in another
big file from Steam. This time, when I rm the file, the kernel
backtraces and sends btrfs to RO mode. The file cannot be removed. I'm
going to leave it that way for now; the file won't be used anyway.
And I can simply ignore it for backup and restore, it's not an
important one. Better have an "uncorrectable" csum error there than
having one jumping unpredictably across my files.
Before you ask: Yes, I'm still working productively with this broken
file system. I'm not sure if this is a point for or against btrfs,
tho. ;-) It works perfectly stable as long as I do not touch any of the
damaged files (which was and currently continues to be easy). Ah, well,
"perfectly" except that commands "df" and "du" tend to freeze and be
unkillable. I'm going to ignore that and take the opportunity to test
how far I can stress btrfs before it finally breaks down.
Thus, I'll leave it that way until it breaks down or I take the time
to restore from backup. Until then, I keep my last known-good
snapshot and a known-incomplete backup scratch storage where I at least
know which files are broken. My daily-business files are stored twice
anyways (offsite and local backup).
I hope I can add some value to improving btrfsck until I have to
restore from backup. I know that with my current setup I cannot give
any help in finding a possible btrfs kernel flaw - which I actually
think may have been in a previous kernel version and has been fixed by now.
--
Regards,
Kai
Replies to list-only preferred.
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-03 22:19 ` Kai Krakow
@ 2016-04-04 0:51 ` Chris Murphy
2016-04-04 19:36 ` Kai Krakow
2016-04-04 4:34 ` Duncan
1 sibling, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2016-04-04 0:51 UTC (permalink / raw)
To: Kai Krakow; +Cc: Btrfs BTRFS
On Sun, Apr 3, 2016 at 4:19 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
> BTW: Is it possible to use my backup drive (it's btrfs single-data
> dup-metadata, single device) as a seed device for my newly created
> btrfs pool (raid0-data, raid1-metadata, three devices)?
Yes.
I just tried doing the conversion to raid1 before and after seed
removal, but with the small amount of data (4GiB) I can't tell a
difference. It seems like -dconvert=raid1 with the seed still connected
makes two rw copies (i.e. there's a ro copy which is the original, and
then two rw copies on 2 of the 3 devices I added all at the same time
to the seed), and the 'btrfs dev remove' command to remove the seed
happened immediately, suggesting the prior balances had already
migrated copies off the seed. This may or may not be optimal for your
case.
Two gotchas.
I ran into this bug:
btrfs fi usage crash when volume contains seed device
https://bugzilla.kernel.org/show_bug.cgi?id=115851
And there is a phantom single chunk on one of the new rw devices that was added.
Data,single: Size:1.00GiB, Used:0.00B
/dev/dm-8 1.00GiB
It's still there after the -dconvert=raid1 and separate -mconvert=raid1,
and after seed device removal. A balance start without filters removes
it; chances are that had I used -dconvert=raid1,soft it would have
vanished as well, but I didn't retest that.
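The conversion and cleanup steps above roughly correspond to the following commands (the mountpoint and seed device path are assumptions; guarded so it does nothing on an unrelated machine):

```shell
# Convert a freshly sprouted filesystem to raid1, detach the seed, and
# clean up the leftover single chunk with an unfiltered balance.
mnt=/mnt/pool
if command -v btrfs >/dev/null 2>&1 && grep -q " $mnt btrfs" /proc/mounts
then
    btrfs balance start -dconvert=raid1 -mconvert=raid1 "$mnt"
    btrfs device remove /dev/disk/by-label/backup "$mnt"  # drop the seed
    btrfs balance start "$mnt"   # no filters: removes the phantom chunk
else
    echo "skipping: $mnt is not a mounted btrfs filesystem"
fi
```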
> I guess the
> seed source cannot be mounted or modified...
?
--
Chris Murphy
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-03 22:19 ` Kai Krakow
2016-04-04 0:51 ` Chris Murphy
@ 2016-04-04 4:34 ` Duncan
2016-04-04 19:26 ` Kai Krakow
1 sibling, 1 reply; 22+ messages in thread
From: Duncan @ 2016-04-04 4:34 UTC (permalink / raw)
To: linux-btrfs
Kai Krakow posted on Mon, 04 Apr 2016 00:19:25 +0200 as excerpted:
> The corruptions seem to be different by the following observation:
>
> While the VDI file was corrupted over and over again with a csum error,
> I could simply remove it and restore from backup. The last thing I did
> was ddescue it from the damaged version to my backup device, than rsync
> the file back to the originating device (which created a new file
> side-by-side, so in a new area of disk space, then replace-by-renamed
> the old one). I didn't run VirtualBox since back then but the file
> didn't become corrupted either since then.
>
> But now, according to btrfsck, a csum error instead came up in another
> big file from Steam. This time, when I rm the file, the kernel
> backtraces and sends btrfs to RO mode. The file cannot be removed. I'm
> going to leave it that way currently, the file won't be used currently.
> And I can simply ignore it for backup and restore, it's not an important
> one. Better have an "incorrectable" csum error there than having one
> jumping unpredictably across my files.
While my dying ssd experience was with btrfs raid1 direct on a pair of
ssds, extrapolating from what I learned about the ssd behavior to your
case with bcache caching to the ssd, then writing back to the spinning
rust backing store, presumably in btrfs single-device mode with single
data and either single or dup metadata (there's enough other cases
interwoven in this thread it's no longer clear to me which posted btrfs fi
show, etc, apply to this case, so I'm guessing, as I believe presenting
it as more than a single device at the btrfs level would require multiple
bcache devices, tho of course you could do that by partitioning the
ssd)...
Would lead me to predict very much the behavior you're seeing, if the
caching ssd was dying.
As bcache is running below btrfs, btrfs won't know anything about it, and
therefore, will behave, effectively, as if it's not there -- an error on
the ssd will look like an error on the btrfs, period. (As I'm assuming a
single btrfs device, which device of the btrfs doesn't come into
question, tho which copy of dup metadata might... but that's an entirely
different can of worms since I'm not sure whether the bcache would end up
deduping the dup metadata or not, and the ssd might do the same, and...)
And with bcache doing write-behind from the ssd to the backing store,
underneath the level at which btrfs could detect and track csum
corruption, if it's corrupt on the ssd, that corruption then transfers to
the backing store as btrfs won't know that transfer is happening at all
and thus won't be in the loop to detect the csum error at that stage.
Meanwhile, what I saw on the pair of ssds, one going bad, in btrfs raid1
mode, was that a btrfs scrub *WOULD* successfully detect the csum errors
on the bad ssd, and rewrite it from the remaining good copy.
Keep in mind that this is without snapshots, so that rewrite, while COW,
would then release the old copy back into the free space pool. In so
doing, it would trigger the ssd firmware to copy the rest of the erase-
block and erase it, and that in turn would trigger the firmware to detect
the bad sector and replace it with one from its spare-sectors list. As a
result, it would tick up the raw value of attribute #5,
Reallocated_Sector_Ct, as well as 182, Erase_Fail_Count_Total, in smartctl
-A (tho the two attributes didn't increase in numeric lock-step, both
were increasing over time, primarily when I ran scrubs).
But it was mostly (almost entirely) when I ran the scrubs and
consequently rewrote the corrupted sectors from the copy on the good
device, that it would trigger those erase-fails and sector reallocations.
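That scrub-then-check cycle looks roughly like this (device and mountpoint are placeholders; guarded so it only runs where it applies):

```shell
# Scrub rewrites blocks with bad csums from the good raid1/dup copy;
# afterwards the SMART reallocation counters show whether the drive
# retired sectors as a result.
mnt=/mnt/btrfs
disk=/dev/sda
if command -v btrfs >/dev/null 2>&1 && grep -q " $mnt btrfs" /proc/mounts
then
    btrfs scrub start -Bd "$mnt"   # -B: wait for completion, -d: per-device stats
    command -v smartctl >/dev/null 2>&1 && \
        smartctl -A "$disk" | grep -E 'Reallocated_Sector_Ct|Erase_Fail'
else
    echo "skipping: $mnt is not a mounted btrfs filesystem"
fi
```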
Anyway, the failing ssd's issues gradually got worse, until I was having
to scrub and trigger both filesystem recopy and bad ssd sector rewrites
any time I wrote anything major to the filesystem as well as at cold-boot
(leaving the system off for several hours apparently accelerated the
sector rot within stable data, while the powered-on state kept the flash
cells charged high enough they didn't rot so fast and it was mostly or
entirely new/changed data I had to worry about). Eventually I simply
decided I was tired of the now more or less constant hassle and I wasn't
learning much new any more from the decaying device's behavior, and I
replaced it.
Translating that to your case, if your caching ssd is dying and some
sectors are now corrupted, unless there's a second btrfs copy of that
block to copy over the bad version with, it's unlikely to trigger those
sector reallocations.
Tho actually rewriting them (or at the device firmware level, COWing them
and erasing the old erase-blocks), as bcache will be doing if it dumps
the current cache content and fills those blocks with something else,
should trigger the same thing, tho unless bcache can force-dump and
recache or something, I don't believe there's a systematic way to trigger
it over all cached data as btrfs scrub does.
Anyway, if I'm correct and as your ordering the new ssd indicates you may
suspect as well, the problem may indeed be that ssd, and a new ssd
(assuming /it/ isn't defective) should fix it, tho the existing damage on
the existing btrfs may or may not be fully recoverable once you get a new
ssd and thus don't have to worry about further damage from the old one.
Meanwhile, putting bcache into write-around mode, so it makes no further
changes to the ssd and only uses it for reads, is probably wise, and
should help limit further damage. Tho if in that mode bcache still does
writeback of existing dirty and cached data to the backing store, some
further damage could occur from that. But I don't know enough about
bcache to know what its behavior and level of available configuration in
that regard actually are. As long as it's not trying to write anything
from the ssd to the backing store, I think further damage should be very
limited.
But were you running btrfs raid1 without bcache, or with multiple devices
at the btrfs level, each bcached but to separate ssds so any rot wouldn't
be likely to transfer between them increasing the chances of both copies
being bad at once, I expect you'd be seeing behavior on your ssd very
close to what I saw on my failing one, and assuming your other device was
fine, you could still be scrubbing and recovering fine, as I was, tho
with the necessary frequency of scrubs increasing over time (and not
helped by the recently reported too many csum errors on compressed
content, even when they're on raid1 and should recover from the other
copy, crashing btrfs and the system, thus requiring more frequent scrubs
than would otherwise be required -- I ran into this too, but didn't
realize it only triggered on compressed content and was thus a specific
bug, and simply attributed it to btrfs not yet being fully stable and
believed that's what it always did with too many crc errors, even when
they should be recoverable from the good raid1 copy).
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-04 4:34 ` Duncan
@ 2016-04-04 19:26 ` Kai Krakow
2016-04-05 1:44 ` Duncan
0 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2016-04-04 19:26 UTC (permalink / raw)
To: linux-btrfs
Am Mon, 4 Apr 2016 04:34:54 +0000 (UTC)
schrieb Duncan <1i5t5.duncan@cox.net>:
> Meanwhile, putting bcache into write-around mode, so it makes no
> further changes to the ssd and only uses it for reads, is probably
> wise, and should help limit further damage. Tho if in that mode
> bcache still does writeback of existing dirty and cached data to the
> backing store, some further damage could occur from that. But I
> don't know enough about bcache to know what its behavior and level of
> available configuration in that regard actually are. As long as it's
> not trying to write anything from the ssd to the backing store, I
> think further damage should be very limited.
bcache has 0 for dirty data most of the time for me - even in write-back
mode. It does write back during idle time and at a reduced rate;
usually that finishes within a few minutes.
Switching the cache to write-around initiates instant write-back of all
dirty data, so within seconds it goes down to zero and the cache
becomes detachable.
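Both values are visible in sysfs - a sketch for watching the write-back drain (bcache0 is an assumption; substitute your device):

```shell
# Show cache state and remaining dirty data for a bcache device.
dev=/sys/block/bcache0/bcache
if [ -r "$dev/dirty_data" ]; then
    echo "state:      $(cat "$dev/state")"
    echo "dirty data: $(cat "$dev/dirty_data")"
else
    echo "no bcache device at $dev"
fi
```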
I'll go test the soon-to-die SSD as soon as it's replaced. I think it's
still far from failing with bitrot. It was overprovisioned by 30% most
of the time, with the spare space trimmed. It certainly should have a
lot of sectors for wear levelling. In addition, smartctl shows no
sector errors at all - except for one: raw_read_error_rate. I'm not
sure what all those sensors tell me, but that one I'm also seeing on
hard disks which show absolutely no data damage.
In fact, I see those counters on my hard disks, too. But dd'ing the
complete raw disk to /dev/null shows no sector errors, so they seem
fine. Then again, putting one and one together: I currently do see data
damage. But I guess that's unrelated.
Is there some documentation somewhere on what each of those attributes
technically means and how to read the raw and threshold values?
I'm also seeing multi_zone_error_rate on my spinning rust.
According to smartctl health check and smartctl extended selftest,
there's no problems at all - and the smart error log is empty. There
has never been an ATA error in dmesg... No relocated sectors... From my
naive view the drives still look good.
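The normalized columns are what the health check looks at; a rough sketch (/dev/sda is a placeholder, and attribute semantics beyond the common ones are vendor-specific, so treat raw values with care):

```shell
# VALUE/WORST are normalized scores; the drive is only flagged as
# failing when VALUE drops below THRESH. RAW_VALUE is a vendor-defined
# counter whose unit differs per attribute and per vendor.
disk=/dev/sda
if command -v smartctl >/dev/null 2>&1 && [ -e "$disk" ]; then
    smartctl -A "$disk"
else
    echo "smartctl or $disk not available"
fi
```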
--
Regards,
Kai
Replies to list-only preferred.
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-04 0:51 ` Chris Murphy
@ 2016-04-04 19:36 ` Kai Krakow
2016-04-04 19:57 ` Chris Murphy
0 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2016-04-04 19:36 UTC (permalink / raw)
To: linux-btrfs
Am Sun, 3 Apr 2016 18:51:07 -0600
schrieb Chris Murphy <lists@colorremedies.com>:
> > BTW: Is it possible to use my backup drive (it's btrfs single-data
> > dup-metadata, single device) as a seed device for my newly created
> > btrfs pool (raid0-data, raid1-metadata, three devices)?
>
> Yes.
>
> I just tried doing the conversion to raid1 before and after seed
> removal, but with the small amount of data (4GiB) I can't tell a
> difference. It seems like -dconvert=raid1 with the seed still connected
> makes two rw copies (i.e. there's a ro copy which is the original, and
> then two rw copies on 2 of the 3 devices I added all at the same time
> to the seed), and the 'btrfs dev remove' command to remove the seed
> happened immediately, suggesting the prior balances had already
> migrated copies off the seed. This may or may not be optimal for your
> case.
>
> Two gotchas.
>
> I ran into this bug:
> btrfs fi usage crash when volume contains seed device
> https://bugzilla.kernel.org/show_bug.cgi?id=115851
>
> And there is a phantom single chunk on one of the new rw devices that
> was added. Data,single: Size:1.00GiB, Used:0.00B
> /dev/dm-8 1.00GiB
>
> It's still there after the -dconvert=raid1 and separate -mconvert=raid1,
> and after seed device removal. A balance start without filters removes
> it; chances are that had I used -dconvert=raid1,soft it would have
> vanished as well, but I didn't retest that.
Good to know, thanks.
> > I guess the
> > seed source cannot be mounted or modified...
>
> ?
In the following sense: I should disable the automounter and backup job
for the seed device while I let my data migrate back to main storage in
the background...
My intention is to fully use my system while btrfs migrates the data
from seed to main storage. Then, afterwards I'd like to continue using
the seed device for backups.
I'd probably do the following:
1. create btrfs pool, attach seed
2. recreate my original subvolume structure by snapshotting the backup
scratch area multiple times into each subvolume
3. rearrange the files in each subvolume to match their intended use by
using rm and mv
4. reboot into full system
4. remove all left-over snapshots from the seed
5. remove (detach) the seed device
6. rebalance
7. switch bcache to write-back mode (or attach bcache only now)
--
Regards,
Kai
Replies to list-only preferred.
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-04 19:36 ` Kai Krakow
@ 2016-04-04 19:57 ` Chris Murphy
2016-04-04 20:50 ` Kai Krakow
0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2016-04-04 19:57 UTC (permalink / raw)
To: Kai Krakow; +Cc: Btrfs BTRFS
On Mon, Apr 4, 2016 at 1:36 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
>
>> > I guess the
>> > seed source cannot be mounted or modified...
>>
>> ?
>
> In the following sense: I should disable the automounter and backup job
> for the seed device while I let my data migrate back to main storage in
> the background...
The sprout can be written to just fine by the backup, just understand
that the seed and sprout volume UUIDs are different. Your automounter
is probably looking for the seed's UUID, and that seed can only be
mounted ro. The sprout UUID however can be mounted rw.
I would probably skip the automounter. Do the seed setup, mount it,
add all devices you're planning to add, then -o remount,rw,compress...
, and then activate the backup. But maybe your backup also is looking
for UUID? If so, that needs to be updated first. Once the balance
-dconvert=raid1 and -mconvert=raid1 is finished, then you can remove
the seed device. And now might be a good time to give the raid1 a new
label, I think it inherits the label of the seed but I'm not certain
of this.
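Put together, the workflow described above would look something like this (device names, mountpoint and label are assumptions; guarded so it does nothing unless the seed device exists):

```shell
# Seed the backup drive, sprout onto the new devices, convert, detach.
seed=/dev/sde1
mnt=/mnt/pool
if [ -b "$seed" ] && command -v btrfstune >/dev/null 2>&1; then
    btrfstune -S 1 "$seed"                    # mark as seed (fs unmounted)
    mount "$seed" "$mnt"                      # seeds mount read-only
    btrfs device add /dev/sdb2 /dev/sdc2 /dev/sdd2 "$mnt"
    mount -o remount,rw,compress=lzo "$mnt"
    btrfs balance start -dconvert=raid1 -mconvert=raid1 "$mnt"
    btrfs device remove "$seed" "$mnt"        # detach the seed
    btrfs filesystem label "$mnt" system      # relabel the sprout
else
    echo "seed device $seed not present; skipping"
fi
```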
> My intention is to use fully my system while btrfs migrates the data
> from seed to main storage. Then, afterwards I'd like to continue using
> the seed device for backups.
>
> I'd probably do the following:
>
> 1. create btrfs pool, attach seed
I don't understand that step in terms of commands. Sprouts are made
with btrfs dev add, not with mkfs. There is no pool creation. You make
a seed. You mount it. Add devices to it. Then remount it.
> 2. recreate my original subvolume structure by snapshotting the backup
> scratch area multiple times into each subvolume
> 3. rearrange the files in each subvolume to match their intended use by
> using rm and mv
> 4. reboot into full system
> 4. remove all left-over snapshots from the seed
> 5. remove (detach) the seed device
You have two 4's.
Anyway the 2nd 4 is not possible. The seed is ro by definition so you
can't remove snapshots from the seed. If you remove them from the
mounted rw sprout volume, they're removed from the sprout, not the
seed. If you want them on the sprout, but not on the seed, you need to
delete snapshots only after the seed is a.) removed from the sprout
and b.) made no longer a seed with btrfstune -S 0 and c.) mounted rw.
--
Chris Murphy
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-04 19:57 ` Chris Murphy
@ 2016-04-04 20:50 ` Kai Krakow
2016-04-04 21:00 ` Kai Krakow
2016-04-04 23:09 ` Chris Murphy
0 siblings, 2 replies; 22+ messages in thread
From: Kai Krakow @ 2016-04-04 20:50 UTC (permalink / raw)
To: linux-btrfs
Am Mon, 4 Apr 2016 13:57:50 -0600
schrieb Chris Murphy <lists@colorremedies.com>:
> On Mon, Apr 4, 2016 at 1:36 PM, Kai Krakow <hurikhan77@gmail.com>
> wrote:
>
> >
> [...]
> >>
> >> ?
> >
> > In the following sense: I should disable the automounter and backup
> > job for the seed device while I let my data migrate back to main
> > storage in the background...
>
> The sprout can be written to just fine by the backup, just understand
> that the seed and sprout volume UUID are different. Your automounter
> is probably looking for the seed's UUID, and that seed can only be
> mounted ro. The sprout UUID however can be mounted rw.
>
> I would probably skip the automounter. Do the seed setup, mount it,
> add all devices you're planning to add, then -o remount,rw,compress...
> , and then activate the backup. But maybe your backup also is looking
> for UUID? If so, that needs to be updated first. Once the balance
> -dconvert=raid1 and -mconvert=raid1 is finished, then you can remove
> the seed device. And now might be a good time to give the raid1 a new
> label, I think it inherits the label of the seed but I'm not certain
> of this.
>
>
> > My intention is to use fully my system while btrfs migrates the data
> > from seed to main storage. Then, afterwards I'd like to continue
> > using the seed device for backups.
> >
> > I'd probably do the following:
> >
> > 1. create btrfs pool, attach seed
>
> I don't understand that step in terms of commands. Sprouts are made
> with btrfs dev add, not with mkfs. There is no pool creation. You make
> a seed. You mount it. Add devices to it. Then remount it.
Hmm, yes. I hadn't thought this through in detail yet. It actually
works that way. I was referring more to the general approach.
But I think this answers my question... ;-)
> > 2. recreate my original subvolume structure by snapshotting the
> > backup scratch area multiple times into each subvolume
> > 3. rearrange the files in each subvolume to match their intended
> > use by using rm and mv
> > 4. reboot into full system
> > 4. remove all left-over snapshots from the seed
> > 5. remove (detach) the seed device
>
> You have two 4's.
Oh... Sorry... I think one week of 80 work hours, and another of 60 was
a bit too much... ;-)
> Anyway the 2nd 4 is not possible. The seed is ro by definition so you
> can't remove snapshots from the seed. If you remove them from the
> mounted rw sprout volume, they're removed from the sprout, not the
> seed. If you want them on the sprout, but not on the seed, you need to
> delete snapshots only after the seed is a.) removed from the sprout
> and b.) made no longer a seed with btrfstune -S 0 and c.) mounted rw.
If I understand right, the seed device won't change? So whatever action
I apply to the sprout pool, I can later remove the seed from the pool
and it will still be kind of untouched. Except, I'll have to return it
to non-seed mode (step b).
Why couldn't/shouldn't I remove snapshots before detaching the seed
device? I want to keep them on the seed but they are useless to me on
the sprout.
What happens to the UUIDs when I separate seed and sprout?
This is my layout:
/dev/sde1 contains my backup storage: btrfs with multiple weeks worth
of retention in form of ro snapshots, and one scratch area in which the
backup is performed. Snapshots are created from the scratch area. The
scratch area is one single subvolume updated by rsync.
I want to turn this into a seed for my newly created btrfs pool. This
one has subvolumes for /home, /home/my_user, /distribution_name/rootfs
and a few more (like var/log etc).
Since the backup is not split by those subvolumes but contains just the
single runtime view of my system rootfs, I'm planning to clone this
single subvolume back into each of my previously used subvolumes which
in turn of course now contain all the same complete filesystem tree.
Thus, in the next step, I'm planning to mv/rm the contents to get back
to the original subvolume structure - mv should be a fast operation
here, rm probably not, but that doesn't bother me. I could defer that until
later by moving those rm-candidates into some trash folder per
subvolume.
Now, I still have the ro-snapshots worth of multiple weeks of
retention. I only need those in my backup storage, not in the storage
proposed to become my bootable system. So I'd simply remove them. I
could also defer that until later easily.
This should get my system back into working state pretty fast and
easily if I didn't miss a point.
I'd now reboot into the system to see if it's working. By then, it's
time for some cleanup (remove the previously deferred "trashes" and
retention snapshots), then separate the seed from the sprout. During
that time, I could already use my system again while it's migrating for
me in the background.
I'd then return the seed back to non-seed, so it can take the role of
my backup storage again. I'd do a rebalance now.
During the whole process, the backup storage will still stay safe for
me. If something goes wrong, I could easily start over.
Did I miss something? Is it too much of an experimental kind of stuff?
BTW: The way it is arranged now, the backup storage is bootable by
setting the scratch area subvolume as the rootfs on the kernel cmdline;
USB drivers are built into the kernel, and it's tested and works. I guess
this isn't possible while the backup storage acts as a seed device? But
I have an initrd with the latest btrfs-progs on my boot device (which is a
UEFI ESP, so not related to btrfs at all), I should be able to use that
to revert changes preventing me from booting.
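For reference, booting straight from the scratch subvolume works with a cmdline along these lines (device name and subvolume path are placeholders for the actual ones):

```
root=/dev/sdX1 rootflags=subvol=scratch rootfstype=btrfs rw
```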
--
Regards,
Kai
Replies to list-only preferred.
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-04 20:50 ` Kai Krakow
@ 2016-04-04 21:00 ` Kai Krakow
2016-04-04 23:09 ` Chris Murphy
1 sibling, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2016-04-04 21:00 UTC (permalink / raw)
To: linux-btrfs
Am Mon, 4 Apr 2016 22:50:18 +0200
schrieb Kai Krakow <hurikhan77@gmail.com>:
> Am Mon, 4 Apr 2016 13:57:50 -0600
> schrieb Chris Murphy <lists@colorremedies.com>:
>
> > On Mon, Apr 4, 2016 at 1:36 PM, Kai Krakow <hurikhan77@gmail.com>
> > wrote:
> >
> > >
> > [...]
> [...]
> > >
> > > In the following sense: I should disable the automounter and
> > > backup job for the seed device while I let my data migrate back
> > > to main storage in the background...
> >
> > The sprout can be written to just fine by the backup, just
> > understand that the seed and sprout volume UUID are different. Your
> > automounter is probably looking for the seed's UUID, and that seed
> > can only be mounted ro. The sprout UUID however can be mounted rw.
> >
> > I would probably skip the automounter. Do the seed setup, mount it,
> > add all devices you're planning to add, then -o
> > remount,rw,compress... , and then activate the backup. But maybe
> > your backup also is looking for UUID? If so, that needs to be
> > updated first. Once the balance -dconvert=raid1 and -mconvert=raid1
> > is finished, then you can remove the seed device. And now might be
> > a good time to give the raid1 a new label, I think it inherits the
> > label of the seed but I'm not certain of this.
> >
> >
> > > My intention is to use fully my system while btrfs migrates the
> > > data from seed to main storage. Then, afterwards I'd like to
> > > continue using the seed device for backups.
> > >
> > > I'd probably do the following:
> > >
> > > 1. create btrfs pool, attach seed
> >
> > I don't understand that step in terms of commands. Sprouts are made
> > with btrfs dev add, not with mkfs. There is no pool creation. You
> > make a seed. You mount it. Add devices to it. Then remount it.
>
> Hmm, yes. I didn't think this through into detail yet. It actually
> works that way. I more commonly referenced to the general approach.
>
> But I think this answers my question... ;-)
>
> > > 2. recreate my original subvolume structure by snapshotting the
> > > backup scratch area multiple times into each subvolume
> > > 3. rearrange the files in each subvolume to match their intended
> > > use by using rm and mv
> > > 4. reboot into full system
> > > 4. remove all left-over snapshots from the seed
> > > 5. remove (detach) the seed device
> >
> > You have two 4's.
>
> Oh... Sorry... I think one week of 80 work hours, and another of 60
> was a bit too much... ;-)
>
> > Anyway the 2nd 4 is not possible. The seed is ro by definition so
> > you can't remove snapshots from the seed. If you remove them from
> > the mounted rw sprout volume, they're removed from the sprout, not
> > the seed. If you want them on the sprout, but not on the seed, you
> > need to delete snapshots only after the seed is a.) removed from
> > the sprout and b.) made no longer a seed with btrfstune -S 0 and
> > c.) mounted rw.
>
> If I understand right, the seed device won't change? So whatever
> action I apply to the sprout pool, I can later remove the seed from
> the pool and it will still be kind of untouched. Except, I'll have to
> return it to non-seed mode (step b).
>
> Why couldn't/shouldn't I remove snapshots before detaching the seed
> device? I want to keep them on the seed but they are useless to me on
> the sprout.
>
> What happens to the UUIDs when I separate seed and sprout?
>
> This is my layout:
>
> /dev/sde1 contains my backup storage: btrfs with multiple weeks'
> worth of retention in the form of ro snapshots, and one scratch area
> in which the backup is performed. Snapshots are created from the
> scratch area. The scratch area is one single subvolume updated by
> rsync.
>
> I want to turn this into a seed for my newly created btrfs pool. This
> one has subvolumes for /home, /home/my_user, /distribution_name/rootfs
> and a few more (like var/log etc).
>
> Since the backup is not split by those subvolumes but contains just
> the single runtime view of my system rootfs, I'm planning to clone
> this single subvolume back into each of my previously used subvolumes,
> which then of course all contain the same complete filesystem tree.
> Thus, in the next step, I'm planning to mv/rm the contents to get
> back to the original subvolume structure. mv should be a fast
> operation here, rm probably not, but I don't mind. I could defer that
> until later by moving the rm candidates into a trash folder per
> subvolume.
>
> Now, I still have the ro snapshots covering multiple weeks of
> retention. I only need those in my backup storage, not in the storage
> proposed to become my bootable system. So I'd simply remove them. I
> could also easily defer that until later.
>
> This should get my system back into working state pretty fast and
> easily if I didn't miss a point.
>
> I'd now reboot into the system to see if it's working. By then, it's
> time for some cleanup (remove the previously deferred "trashes" and
> retention snapshots), then separate the seed from the sprout. During
> that time, I could already use my system again while it's migrating
> for me in the background.
>
> I'd then return the seed back to non-seed, so it can take the role of
> my backup storage again. I'd do a rebalance now.
>
> During the whole process, the backup storage will still stay safe for
> me. If something goes wrong, I could easily start over.
>
> Did I miss something? Is it too much of an experimental kind of stuff?
>
> BTW: The way it is arranged now, the backup storage is bootable by
> setting the scratch area subvolume as the rootfs on the kernel
> cmdline; USB drivers are included in the kernel, and it's tested and
> works. I guess this isn't possible while the backup storage acts as a
> seed device? But I have an initrd with the latest btrfs-progs on my
> boot device (which is a UEFI ESP, so not related to btrfs at all), so
> I should be able to use that to revert changes preventing me from
> booting.
The whole idea is to think of this as sort of thin-provisioning my
system from the backup storage, then letting btrfs do the work for me.
It saves me 40 hours of copying data back while being unable to use
the system.
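In case it helps to see the plan spelled out, here's a rough command
sketch of the whole restore. Everything in it is a placeholder for my
actual layout (device names, mount point, subvolume names), and the
sequence is untested:

```shell
# Rough sketch of the planned restore; all names are placeholders.
# /dev/sde1 = backup storage that becomes the seed,
# /dev/sdX  = new main storage.

btrfstune -S 1 /dev/sde1               # turn the backup volume into a seed
mount /dev/sde1 /mnt
btrfs device add /dev/sdX /mnt         # attach the new main storage (sprout)
mount -o remount,rw /mnt

# Recreate the original subvolume structure by snapshotting the backup
# scratch area once per intended subvolume, then prune with mv/rm:
btrfs subvolume snapshot /mnt/scratch /mnt/gentoo/rootfs
btrfs subvolume snapshot /mnt/scratch /mnt/home
# ... rm/mv inside each snapshot until it matches its intended contents.

# Finally migrate the data off the seed and detach it:
btrfs device delete /dev/sde1 /mnt
btrfstune -S 0 /dev/sde1               # return it to a normal backup volume
```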
--
Regards,
Kai
Replies to list-only preferred.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-04 20:50 ` Kai Krakow
2016-04-04 21:00 ` Kai Krakow
@ 2016-04-04 23:09 ` Chris Murphy
2016-04-05 7:05 ` Kai Krakow
1 sibling, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2016-04-04 23:09 UTC (permalink / raw)
To: Kai Krakow; +Cc: Btrfs BTRFS
On Mon, Apr 4, 2016 at 2:50 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
>> Anyway the 2nd 4 is not possible. The seed is ro by definition so you
>> can't remove snapshots from the seed. If you remove them from the
>> mounted rw sprout volume, they're removed from the sprout, not the
>> seed. If you want them on the sprout, but not on the seed, you need to
>> delete snapshots only after the seed is a.) removed from the sprout
>> and b.) made no longer a seed with btrfstune -S 0 and c.) mounted rw.
>
> If I understand right, the seed device won't change? So whatever action
> I apply to the sprout pool, I can later remove the seed from the pool
> and it will still be kind of untouched. Except, I'll have to return it
> to non-seed mode (step b).
Correct. In a sense, making a volume a seed is like making it a
volume-wide read-only snapshot. Any changes are applied via COW only
to added device(s).
>
> Why couldn't/shouldn't I remove snapshots before detaching the seed
> device? I want to keep them on the seed but they are useless to me on
> the sprout.
You can remove snapshots before or after detaching the seed device, it
doesn't matter, but such snapshot removal only affects the sprout. You
wrote:
"remove all left-over snapshots from the seed"
The seed is read only, you can't modify the contents of the seed device.
What you should do is just delete the snapshots you don't want
migrated over to the sprout right away before you even do the balance
-dconvert -mconvert. That way you aren't wasting time moving things
over that you don't want. To be clear:
btrfstune -S 1 /dev/seed
mount /dev/seed /mnt/
btrfs dev add /dev/new1 /mnt/
btrfs dev add /dev/new2 /mnt/
mount -o remount,rw /mnt/
btrfs sub del /mnt/blah /mnt/blah2 /mnt/blah3 /mnt/blah4
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/
btrfs dev del /dev/seed /mnt/
If you're doing any backups once remounted rw, note those backups
will only be on the sprout. Backups will not be on the seed because
it's read-only.
>
> What happens to the UUIDs when I separate seed and sprout?
Nothing. They remain intact and unique, per volume.
>
> I'd now reboot into the system to see if it's working.
Note you'll need to change grub.cfg, possibly fstab, and possibly the
initramfs, all three of which may be referencing the old volume.
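For the fstab part, a minimal sketch of that fixup could look like the
following. The UUIDs are placeholders (the real ones come from `blkid`
or `btrfs filesystem show`), and I'd work on a copy before touching
/etc/fstab; grub.cfg and the initramfs need the analogous treatment:

```shell
# Swap the seed volume's UUID for the sprout's in a copy of fstab.
# Both UUIDs below are placeholders for illustration only.
OLD_UUID=11111111-1111-1111-1111-111111111111
NEW_UUID=22222222-2222-2222-2222-222222222222

# Stand-in for: cp /etc/fstab /tmp/fstab.new
printf 'UUID=%s / btrfs noatime,compress=lzo 0 0\n' "$OLD_UUID" > /tmp/fstab.new

sed -i "s/$OLD_UUID/$NEW_UUID/g" /tmp/fstab.new
grep "$NEW_UUID" /tmp/fstab.new    # verify before installing as /etc/fstab
```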
> By then, it's
> time for some cleanup (remove the previously deferred "trashes" and
> retention snapshots), then separate the seed from the sprout. During
> that time, I could already use my system again while it's migrating for
> me in the background.
>
> I'd then return the seed back to non-seed, so it can take the role of
> my backup storage again. I'd do a rebalance now.
OK? I don't know why you need to balance the seed at all, let alone
afterward, but it seems like it might be a more efficient replication
if you balanced before making it a seed?
>
> During the whole process, the backup storage will still stay safe for
> me. If something goes wrong, I could easily start over.
>
> Did I miss something? Is it too much of an experimental kind of stuff?
I'm not sure where all the bugs are. It's good to find bugs though and
get them squashed. I have an idea of making live media use Btrfs
instead of using a loop-mounted file to back a rw LVM snapshot device
(persistent overlay), which I think is really fragile and a lot more
complicated in the initramfs. It's also good to take advantage of
checksumming after having written an ISO to flash media, where users
often don't verify, or something can mount the USB stick rw and
immediately modify it in such a way that media verification will fail
anyway. So, a number of pluses; I'd like to see the seed device become
robust.
>
> BTW: The way it is arranged now, the backup storage is bootable by
> setting the scratch area subvolume as the rootfs on kernel cmdline,
> USB drivers are included in the kernel, it's tested and works. I guess,
> this isn't possible while the backup storage acts as a seed device? But
> I have an initrd with latest btrfs-progs on my boot device (which is an
> UEFI ESP, so not related to btrfs at all), I should be able to use that
> to revert changes preventing me from booting.
--
Chris Murphy
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-04 19:26 ` Kai Krakow
@ 2016-04-05 1:44 ` Duncan
0 siblings, 0 replies; 22+ messages in thread
From: Duncan @ 2016-04-05 1:44 UTC (permalink / raw)
To: linux-btrfs
Kai Krakow posted on Mon, 04 Apr 2016 21:26:28 +0200 as excerpted:
> I'll go test the soon-to-die SSD as soon as it replaced. I think it's
> still far from failing with bitrot. It was overprovisioned by 30% most
> of the time, with the spare space trimmed.
Same here, FWIW. In fact, I had expected to get ~128 GB SSDs and ended
up getting 256 GB, such that I was only using about 130 GiB, so
depending on what the overprovisioning percentage is calculated
against, I was and am near 50% or 100% overprovisioned.
So in my case I think the SSD was simply defective, such that the
overprovisioning and trim simply didn't help. Tho the other two
devices of identical brand and model, bought from the same store at
the same time and so very likely from the same manufacturing lot, were
and are just fine. (One of them shows a trivial non-zero raw value for
attribute 5, reallocated sector count, and attribute 182, erase fail
count total, but both remain at a "cooked" value of 100; the other,
actually the one of the original pair that wasn't replaced, shows no
issues at all.)
But based on that experience, while overprovisioning may help in terms of
normal wearout, it doesn't necessarily help at all if the device is
actually going bad.
> It certainly should have a
> lot of sectors for wear levelling. In addition, smartctl shows no sector
> errors at all - except for one: raw_read_error_rate. I'm not sure what
> all those sensors tell me, but that one I'm also seeing on hard disks
> which show absolutely no data damage.
>
> In fact, I see those counters for my hard disks. But dd to /dev/null
> of the complete raw hard disk shows no sector errors. It seems good.
> But well, putting 1 and 1 together: I currently see data damage. But
> I guess that's unrelated.
>
> Is there some documentation somewhere what each of those sensors
> technically mean and how to read the raw values and thresh values?
Nothing user/admin level that I'm aware of. I'm sure there are some
SMART docs somewhere that describe them as part of the standard, but
they could easily be effectively unavailable to those unwilling to pay
a big-corporate-sized consortium membership fee (as was the case with
one of the CompactDisc specs, Orange Book IIRC, at one point).
I know there's some discussion by allusion in the smartctl manpage and
docs, but many attributes appear to be manufacturer specific and/or to
have been reverse-engineered by the smartctl devs, meaning even /they/
don't really have access to proper documentation for at least some
attributes.
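For what it's worth, the raw values can at least be pulled out
mechanically from `smartctl -A` output; here's a rough sketch using
awk on a captured sample. The sample lines are illustrative, not from
any of the drives discussed here, and how to *interpret* a raw value
remains vendor-specific, which is exactly the problem:

```shell
# Extract NAME=RAW_VALUE pairs from smartctl's attribute table.
# The sample is a stand-in for the output of: smartctl -A /dev/sdX
sample='  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       137
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0'

# Field 2 is the attribute name; the last field is the raw value.
printf '%s\n' "$sample" | awk '{print $2 "=" $NF}'
# prints:
#   Raw_Read_Error_Rate=137
#   Reallocated_Sector_Ct=0
```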
Which is sad, but in a majority proprietary or at best don't-care
market...
> I'm also seeing multi_zone_error_rate on my spinning rust.
> According to the smartctl health check and the smartctl extended
> selftest, there are no problems at all, and the SMART error log is
> empty. There has never been an ATA error in dmesg... no reallocated
> sectors... From my naive view the drives still look good.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: btrfsck: backpointer mismatch (and multiple other errors)
2016-04-04 23:09 ` Chris Murphy
@ 2016-04-05 7:05 ` Kai Krakow
0 siblings, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2016-04-05 7:05 UTC (permalink / raw)
To: linux-btrfs
On Mon, 4 Apr 2016 17:09:14 -0600,
Chris Murphy <lists@colorremedies.com> wrote:
> > Why couldn't/shouldn't I remove snapshots before detaching the seed
> > device? I want to keep them on the seed but they are useless to me
> > on the sprout.
>
> You can remove snapshots before or after detaching the seed device, it
> doesn't matter, but such snapshot removal only affects the sprout. You
> wrote:
>
> "remove all left-over snapshots from the seed"
>
> The seed is read only, you can't modify the contents of the seed
> device.
Sorry, not a native speaker... What I actually meant was to remove the
snapshots that originated from the seed and which I don't need in the
sprout.
--
Regards,
Kai
Replies to list-only preferred.
end of thread
Thread overview: 22+ messages
2016-03-31 20:44 btrfsck: backpointer mismatch (and multiple other errors) Kai Krakow
2016-03-31 23:27 ` Henk Slager
2016-04-01 1:10 ` Qu Wenruo
2016-04-02 8:47 ` Kai Krakow
2016-04-02 9:00 ` Kai Krakow
2016-04-02 17:17 ` Henk Slager
2016-04-02 20:16 ` Kai Krakow
2016-04-03 0:14 ` Chris Murphy
2016-04-03 4:02 ` Kai Krakow
2016-04-03 5:06 ` Duncan
2016-04-03 22:19 ` Kai Krakow
2016-04-04 0:51 ` Chris Murphy
2016-04-04 19:36 ` Kai Krakow
2016-04-04 19:57 ` Chris Murphy
2016-04-04 20:50 ` Kai Krakow
2016-04-04 21:00 ` Kai Krakow
2016-04-04 23:09 ` Chris Murphy
2016-04-05 7:05 ` Kai Krakow
2016-04-04 4:34 ` Duncan
2016-04-04 19:26 ` Kai Krakow
2016-04-05 1:44 ` Duncan
2016-04-03 19:03 ` Chris Murphy