* btrfsck: backpointer mismatch (and multiple other errors)
@ 2016-03-31 20:44 Kai Krakow
  2016-03-31 23:27 ` Henk Slager
  0 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2016-03-31 20:44 UTC (permalink / raw)
  To: linux-btrfs

Hello!

I already reported this in another thread, but that got a bit
confusing because multiple volumes were intermixed. So let's start a
new thread:

Since one of the last kernel upgrades, I'm experiencing one VDI file
(containing an NTFS image with Windows 7) getting damaged when running
the machine in VirtualBox. I became aware of this after a "duplicate
object" error, upon which btrfs went RO. I fixed it by deleting the
VDI and restoring from backup - but now I get csum errors as soon as
some VM IO goes into the VDI file.

The FS is still usable. One effect is that after reading all files
with rsync (to copy to my backup), each call of "du" or "df" hangs;
similar calls to "btrfs {sub|fi} ..." show the same effect. I guess
one consequence of this is that the FS does not unmount properly
during shutdown.

The kernel is 4.5.0 by now, including the Gentoo patch-set r1 (the FS
is much older, dates back to the 3.x series, and never had problems).

The device layout is:

$ lsblk -o NAME,MODEL,FSTYPE,LABEL,MOUNTPOINT
NAME        MODEL            FSTYPE LABEL      MOUNTPOINT
sda         Crucial_CT128MX1
├─sda1                       vfat   ESP        /boot
├─sda2
└─sda3                       bcache
  ├─bcache0                  btrfs  system
  ├─bcache1                  btrfs  system
  └─bcache2                  btrfs  system     /usr/src
sdb         SAMSUNG HD103SJ
├─sdb1                       swap   swap0      [SWAP]
└─sdb2                       bcache
  └─bcache2                  btrfs  system     /usr/src
sdc         SAMSUNG HD103SJ
├─sdc1                       swap   swap1      [SWAP]
└─sdc2                       bcache
  └─bcache1                  btrfs  system
sdd         SAMSUNG HD103UJ
├─sdd1                       swap   swap2      [SWAP]
└─sdd2                       bcache
  └─bcache0                  btrfs  system

Mount options are:

$ mount|fgrep btrfs
/dev/bcache2 on / type btrfs (rw,noatime,compress=lzo,nossd,discard,space_cache,autodefrag,subvolid=256,subvol=/gentoo/rootfs)

The FS uses mraid=1 and draid=0.
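
(For reference: that is RAID1 metadata and RAID0 data. The profiles
can be confirmed with "btrfs fi df"; the sketch below only shows the
expected shape of the output, not values copied from this system.)

$ btrfs fi df /
Data, RAID0: total=..., used=...
System, RAID1: total=..., used=...
Metadata, RAID1: total=..., used=...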

Output of btrfsck is:
(also available here:
https://gist.github.com/kakra/bfcce4af242f6548f4d6b45c8afb46ae)

$ btrfsck /dev/disk/by-label/system
checking extents
ref mismatch on [10443660537856 524288] extent item 1, found 2
Backref 10443660537856 root 256 owner 23536425 offset 1310720 num_refs 0 not found in extent tree
Incorrect local backref count on 10443660537856 root 256 owner 23536425 offset 1310720 found 1 wanted 0 back 0x4ceee750
Backref disk bytenr does not match extent record, bytenr=10443660537856, ref bytenr=10443660914688
Backref bytes do not match extent backref, bytenr=10443660537856, ref bytes=524288, backref bytes=69632
backpointer mismatch on [10443660537856 524288]
extent item 11271946579968 has multiple extent items
ref mismatch on [11271946579968 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271946579968, ref bytenr=11271946629120
backpointer mismatch on [11271946579968 110592]
extent item 11271946690560 has multiple extent items
ref mismatch on [11271946690560 114688] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271946690560, ref bytenr=11271946739712
Backref bytes do not match extent backref, bytenr=11271946690560, ref bytes=114688, backref bytes=110592
backpointer mismatch on [11271946690560 114688]
extent item 11271946805248 has multiple extent items
ref mismatch on [11271946805248 114688] extent item 1, found 3
Backref disk bytenr does not match extent record, bytenr=11271946805248, ref bytenr=11271946850304
Backref bytes do not match extent backref, bytenr=11271946805248, ref bytes=114688, backref bytes=53248
Backref disk bytenr does not match extent record, bytenr=11271946805248, ref bytenr=11271946903552
Backref bytes do not match extent backref, bytenr=11271946805248, ref bytes=114688, backref bytes=49152
backpointer mismatch on [11271946805248 114688]
extent item 11271946919936 has multiple extent items
ref mismatch on [11271946919936 61440] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271946919936, ref bytenr=11271946952704
Backref bytes do not match extent backref, bytenr=11271946919936, ref bytes=61440, backref bytes=110592
backpointer mismatch on [11271946919936 61440]
extent item 11271946981376 has multiple extent items
ref mismatch on [11271946981376 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271946981376, ref bytenr=11271947063296
backpointer mismatch on [11271946981376 110592]
extent item 11271947091968 has multiple extent items
ref mismatch on [11271947091968 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947091968, ref bytenr=11271947173888
Backref bytes do not match extent backref, bytenr=11271947091968, ref bytes=110592, backref bytes=114688
backpointer mismatch on [11271947091968 110592]
extent item 11271947202560 has multiple extent items
ref mismatch on [11271947202560 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947202560, ref bytenr=11271947288576
Backref bytes do not match extent backref, bytenr=11271947202560, ref bytes=110592, backref bytes=102400
backpointer mismatch on [11271947202560 110592]
extent item 11271947313152 has multiple extent items
ref mismatch on [11271947313152 114688] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947313152, ref bytenr=11271947390976
Backref bytes do not match extent backref, bytenr=11271947313152, ref bytes=114688, backref bytes=110592
backpointer mismatch on [11271947313152 114688]
extent item 11271947427840 has multiple extent items
ref mismatch on [11271947427840 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947427840, ref bytenr=11271947501568
backpointer mismatch on [11271947427840 110592]
extent item 11271947538432 has multiple extent items
ref mismatch on [11271947538432 86016] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947538432, ref bytenr=11271947612160
Backref bytes do not match extent backref, bytenr=11271947538432, ref bytes=86016, backref bytes=81920
backpointer mismatch on [11271947538432 86016]
extent item 11271947624448 has multiple extent items
ref mismatch on [11271947624448 77824] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947624448, ref bytenr=11271947694080
Backref bytes do not match extent backref, bytenr=11271947624448, ref bytes=77824, backref bytes=102400
backpointer mismatch on [11271947624448 77824]
ref mismatch on [11271947702272 102400] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947702272, ref bytenr=11271947796480
Backref bytes do not match extent backref, bytenr=11271947702272, ref bytes=102400, backref bytes=90112
backpointer mismatch on [11271947702272 102400]
extent item 11271947862016 has multiple extent items
extent item 11271947886592 has multiple extent items
ref mismatch on [11271947886592 131072] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947886592, ref bytenr=11271947948032
Backref bytes do not match extent backref, bytenr=11271947886592, ref bytes=131072, backref bytes=102400
backpointer mismatch on [11271947886592 131072]
extent item 11271948017664 has multiple extent items
ref mismatch on [11271948017664 49152] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948017664, ref bytenr=11271948050432
Backref bytes do not match extent backref, bytenr=11271948017664, ref bytes=49152, backref bytes=94208
backpointer mismatch on [11271948017664 49152]
extent item 11271948144640 has multiple extent items
ref mismatch on [11271948144640 73728] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948144640, ref bytenr=11271948148736
Backref bytes do not match extent backref, bytenr=11271948144640, ref bytes=73728, backref bytes=110592
backpointer mismatch on [11271948144640 73728]
extent item 11271948218368 has multiple extent items
ref mismatch on [11271948218368 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948218368, ref bytenr=11271948259328
Backref bytes do not match extent backref, bytenr=11271948218368, ref bytes=110592, backref bytes=102400
backpointer mismatch on [11271948218368 110592]
extent item 11271948328960 has multiple extent items
ref mismatch on [11271948328960 106496] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948328960, ref bytenr=11271948361728
Backref bytes do not match extent backref, bytenr=11271948328960, ref bytes=106496, backref bytes=110592
backpointer mismatch on [11271948328960 106496]
extent item 11271948435456 has multiple extent items
ref mismatch on [11271948435456 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948435456, ref bytenr=11271948472320
Backref bytes do not match extent backref, bytenr=11271948435456, ref bytes=110592, backref bytes=114688
backpointer mismatch on [11271948435456 110592]
extent item 11271948546048 has multiple extent items
ref mismatch on [11271948546048 110592] extent item 1, found 3
Backref disk bytenr does not match extent record, bytenr=11271948546048, ref bytenr=11271948587008
Backref bytes do not match extent backref, bytenr=11271948546048, ref bytes=110592, backref bytes=61440
Backref disk bytenr does not match extent record, bytenr=11271948546048, ref bytenr=11271948648448
Backref bytes do not match extent backref, bytenr=11271948546048, ref bytes=110592, backref bytes=73728
backpointer mismatch on [11271948546048 110592]
extent item 11271948656640 has multiple extent items
ref mismatch on [11271948656640 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948656640, ref bytenr=11271948722176
backpointer mismatch on [11271948656640 110592]
extent item 11271948767232 has multiple extent items
ref mismatch on [11271948767232 114688] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271948767232, ref bytenr=11271948832768
Backref bytes do not match extent backref, bytenr=11271948767232, ref bytes=114688, backref bytes=73728
backpointer mismatch on [11271948767232 114688]
extent item 11271948881920 has multiple extent items
ref mismatch on [11271948881920 114688] extent item 1, found 3
Backref disk bytenr does not match extent record, bytenr=11271948881920, ref bytenr=11271948906496
Backref bytes do not match extent backref, bytenr=11271948881920, ref bytes=114688, backref bytes=12288
Backref disk bytenr does not match extent record, bytenr=11271948881920, ref bytenr=11271948926976
Backref bytes do not match extent backref, bytenr=11271948881920, ref bytes=114688, backref bytes=524288
backpointer mismatch on [11271948881920 114688]
extent item 11271949414400 has multiple extent items
ref mismatch on [11271949414400 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949414400, ref bytenr=11271949451264
Backref bytes do not match extent backref, bytenr=11271949414400, ref bytes=110592, backref bytes=81920
backpointer mismatch on [11271949414400 110592]
extent item 11271949524992 has multiple extent items
ref mismatch on [11271949524992 57344] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949524992, ref bytenr=11271949533184
Backref bytes do not match extent backref, bytenr=11271949524992, ref bytes=57344, backref bytes=94208
backpointer mismatch on [11271949524992 57344]
extent item 11271949582336 has multiple extent items
ref mismatch on [11271949582336 86016] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949582336, ref bytenr=11271949627392
Backref bytes do not match extent backref, bytenr=11271949582336, ref bytes=86016, backref bytes=81920
backpointer mismatch on [11271949582336 86016]
extent item 11271949668352 has multiple extent items
ref mismatch on [11271949668352 94208] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949668352, ref bytenr=11271949709312
Backref bytes do not match extent backref, bytenr=11271949668352, ref bytes=94208, backref bytes=98304
backpointer mismatch on [11271949668352 94208]
extent item 11271949762560 has multiple extent items
ref mismatch on [11271949762560 81920] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949762560, ref bytenr=11271949807616
Backref bytes do not match extent backref, bytenr=11271949762560, ref bytes=81920, backref bytes=94208
backpointer mismatch on [11271949762560 81920]
extent item 11271949844480 has multiple extent items
ref mismatch on [11271949844480 94208] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949844480, ref bytenr=11271949901824
backpointer mismatch on [11271949844480 94208]
extent item 11271949938688 has multiple extent items
ref mismatch on [11271949938688 81920] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271949938688, ref bytenr=11271949996032
Backref bytes do not match extent backref, bytenr=11271949938688, ref bytes=81920, backref bytes=90112
backpointer mismatch on [11271949938688 81920]
extent item 11271950020608 has multiple extent items
ref mismatch on [11271950020608 81920] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950020608, ref bytenr=11271950086144
Backref bytes do not match extent backref, bytenr=11271950020608, ref bytes=81920, backref bytes=94208
backpointer mismatch on [11271950020608 81920]
extent item 11271950180352 has multiple extent items
ref mismatch on [11271950180352 81920] extent item 1, found 2
Backref bytes do not match extent backref, bytenr=11271950180352, ref bytes=81920, backref bytes=98304
backpointer mismatch on [11271950180352 81920]
ref mismatch on [11271950262272 81920] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950262272, ref bytenr=11271950278656
Backref bytes do not match extent backref, bytenr=11271950262272, ref bytes=81920, backref bytes=102400
backpointer mismatch on [11271950262272 81920]
extent item 11271950344192 has multiple extent items
ref mismatch on [11271950344192 77824] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950344192, ref bytenr=11271950381056
Backref bytes do not match extent backref, bytenr=11271950344192, ref bytes=77824, backref bytes=98304
backpointer mismatch on [11271950344192 77824]
extent item 11271950422016 has multiple extent items
ref mismatch on [11271950422016 81920] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950422016, ref bytenr=11271950479360
Backref bytes do not match extent backref, bytenr=11271950422016, ref bytes=81920, backref bytes=98304
backpointer mismatch on [11271950422016 81920]
extent item 11271950503936 has multiple extent items
ref mismatch on [11271950503936 86016] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950503936, ref bytenr=11271950577664
Backref bytes do not match extent backref, bytenr=11271950503936, ref bytes=86016, backref bytes=94208
backpointer mismatch on [11271950503936 86016]
extent item 11271950589952 has multiple extent items
ref mismatch on [11271950589952 86016] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950589952, ref bytenr=11271950671872
Backref bytes do not match extent backref, bytenr=11271950589952, ref bytes=86016, backref bytes=94208
backpointer mismatch on [11271950589952 86016]
extent item 11271950675968 has multiple extent items
ref mismatch on [11271950675968 98304] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950675968, ref bytenr=11271950766080
backpointer mismatch on [11271950675968 98304]
extent item 11271950774272 has multiple extent items
ref mismatch on [11271950774272 94208] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950774272, ref bytenr=11271950864384
Backref bytes do not match extent backref, bytenr=11271950774272, ref bytes=94208, backref bytes=98304
backpointer mismatch on [11271950774272 94208]
extent item 11271950954496 has multiple extent items
ref mismatch on [11271950954496 90112] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271950954496, ref bytenr=11271950962688
Backref bytes do not match extent backref, bytenr=11271950954496, ref bytes=90112, backref bytes=61440
backpointer mismatch on [11271950954496 90112]
extent item 11271952793600 has multiple extent items
ref mismatch on [11271952793600 98304] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271952793600, ref bytenr=11271952879616
Backref bytes do not match extent backref, bytenr=11271952793600, ref bytes=98304, backref bytes=102400
backpointer mismatch on [11271952793600 98304]
extent item 11271952891904 has multiple extent items
ref mismatch on [11271952891904 262144] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271952891904, ref bytenr=11271952994304
Backref bytes do not match extent backref, bytenr=11271952891904, ref bytes=262144, backref bytes=1052672
backpointer mismatch on [11271952891904 262144]
extent item 11271953993728 has multiple extent items
ref mismatch on [11271953993728 114688] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271953993728, ref bytenr=11271954046976
Backref bytes do not match extent backref, bytenr=11271953993728, ref bytes=114688, backref bytes=1052672
backpointer mismatch on [11271953993728 114688]
ref mismatch on [11271954878464 393216] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271954878464, ref bytenr=11271955099648
Backref bytes do not match extent backref, bytenr=11271954878464, ref bytes=393216, backref bytes=3149824
backpointer mismatch on [11271954878464 393216]
extent item 11271956312064 has multiple extent items
ref mismatch on [11271958249472 2101248] extent item 0, found 1
Backref 11271958249472 parent 12160723820544 owner 0 offset 0 num_refs 0 not found in extent tree
Incorrect local backref count on 11271958249472 parent 12160723820544 owner 0 offset 0 found 1 wanted 0 back 0x14d56620
backpointer mismatch on [11271958249472 2101248]
extent item 11271960338432 has multiple extent items
ref mismatch on [11271960338432 57344] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271960338432, ref bytenr=11271960350720
Backref bytes do not match extent backref, bytenr=11271960338432, ref bytes=57344, backref bytes=1052672
backpointer mismatch on [11271960338432 57344]
extent item 11271961325568 has multiple extent items
ref mismatch on [11271961325568 81920] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271961325568, ref bytenr=11271961403392
Backref bytes do not match extent backref, bytenr=11271961325568, ref bytes=81920, backref bytes=1052672
backpointer mismatch on [11271961325568 81920]
extent item 11271962333184 has multiple extent items
ref mismatch on [11271962333184 524288] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271962333184, ref bytenr=11271962456064
Backref bytes do not match extent backref, bytenr=11271962333184, ref bytes=524288, backref bytes=1052672
backpointer mismatch on [11271962333184 524288]
extent item 11271963475968 has multiple extent items
ref mismatch on [11271963475968 393216] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271963475968, ref bytenr=11271963508736
Backref bytes do not match extent backref, bytenr=11271963475968, ref bytes=393216, backref bytes=1052672
backpointer mismatch on [11271963475968 393216]
extent item 11271964389376 has multiple extent items
ref mismatch on [11271964389376 524288] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271964389376, ref bytenr=11271964561408
Backref bytes do not match extent backref, bytenr=11271964389376, ref bytes=524288, backref bytes=1052672
backpointer mismatch on [11271964389376 524288]
extent item 11271965601792 has multiple extent items
ref mismatch on [11271965601792 90112] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271965601792, ref bytenr=11271965614080
Backref bytes do not match extent backref, bytenr=11271965601792, ref bytes=90112, backref bytes=1052672
backpointer mismatch on [11271965601792 90112]
extent item 11271968571392 has multiple extent items
ref mismatch on [11271968571392 1052672] extent item 1, found 3
Backref disk bytenr does not match extent record, bytenr=11271968571392, ref bytenr=11271969107968
Backref bytes do not match extent backref, bytenr=11271968571392, ref bytes=1052672, backref bytes=69632
Backref disk bytenr does not match extent record, bytenr=11271968571392, ref bytenr=11271969177600
Backref bytes do not match extent backref, bytenr=11271968571392, ref bytes=1052672, backref bytes=262144
backpointer mismatch on [11271968571392 1052672]
checking free space cache
checking fs roots
root 4336 inode 4284125 errors 1000, some csum missing
Checking filesystem on /dev/disk/by-label/system
UUID: d2bb232a-2e8f-4951-8bcc-97e237f1b536
found 1832931324360 bytes used err is 1
total csum bytes: 1730105656
total tree bytes: 6494474240
total fs tree bytes: 3789783040
total extent tree bytes: 608219136
btree space waste bytes: 1221460063
file data blocks allocated: 2406059724800
 referenced 2040857763840


-- 
Regards,
Kai

Replies to list-only preferred.




* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-03-31 20:44 btrfsck: backpointer mismatch (and multiple other errors) Kai Krakow
@ 2016-03-31 23:27 ` Henk Slager
  2016-04-01  1:10   ` Qu Wenruo
  2016-04-02  9:00   ` Kai Krakow
  0 siblings, 2 replies; 22+ messages in thread
From: Henk Slager @ 2016-03-31 23:27 UTC (permalink / raw)
  To: linux-btrfs

On Thu, Mar 31, 2016 at 10:44 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
> Hello!
>
> I already reported this in another thread but it was a bit confusing by
> intermixing multiple volumes. So let's start a new thread:
>
> Since one of the last kernel upgrades, I'm experiencing one VDI file
> (containing a NTFS image with Windows 7) getting damaged when running
> the machine in VirtualBox. I got knowledge about this after
> experiencing an error "duplicate object" and btrfs went RO. I fixed it
> by deleting the VDI and restoring from backup - but now I get csum
> errors as soon as some VM IO goes into the VDI file.
>
> The FS is still usable. One effect is, that after reading all files
> with rsync (to copy to my backup), each call of "du" or "df" hangs, also
> similar calls to "btrfs {sub|fi} ..." show the same effect. I guess one
> outcome of this is, that the FS does not properly unmount during
> shutdown.
>
> Kernel is 4.5.0 by now (the FS is much much older, dates back to 3.x
> series, and never had problems), including Gentoo patch-set r1.

One possibility could be that the vbox kernel modules somehow corrupt
btrfs kernel memory since kernel 4.5.

In order to make this reproducible (or an attempt to reproduce) for
others, you could unload VirtualBox stuff and restore the VDI file
from backup (or whatever big file) and then make pseudo-random, but
reproducible writes to the file.
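
Something along these lines would do as a reproducible write pattern
(just a rough sketch; the file name, seed, loop count and offset
formula are placeholders - the point is that the same seed always
produces the same writes):

seed=42
for i in $(seq 1 1000); do
  # deterministic 4 KiB block offset within the first 1 GiB of the file
  off=$(( (seed + i * 7919) % 262144 ))
  printf '%s-%s' "$seed" "$i" | sha256sum | cut -c1-64 \
    | dd of=testfile.vdi bs=4096 seek="$off" conv=notrunc 2>/dev/null
done
sync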

It is not clear to me what 'Gentoo patch-set r1' is and does. So just
boot a vanilla v4.5 kernel from kernel.org and see if you get csum
errors in dmesg.

Also, where does 'duplicate object' come from? dmesg? If so, please
post its surroundings, straight from dmesg.

> The device layout is:
>
> $ lsblk -o NAME,MODEL,FSTYPE,LABEL,MOUNTPOINT
> NAME        MODEL            FSTYPE LABEL      MOUNTPOINT
> sda         Crucial_CT128MX1
> ├─sda1                       vfat   ESP        /boot
> ├─sda2
> └─sda3                       bcache
>   ├─bcache0                  btrfs  system
>   ├─bcache1                  btrfs  system
>   └─bcache2                  btrfs  system     /usr/src
> sdb         SAMSUNG HD103SJ
> ├─sdb1                       swap   swap0      [SWAP]
> └─sdb2                       bcache
>   └─bcache2                  btrfs  system     /usr/src
> sdc         SAMSUNG HD103SJ
> ├─sdc1                       swap   swap1      [SWAP]
> └─sdc2                       bcache
>   └─bcache1                  btrfs  system
> sdd         SAMSUNG HD103UJ
> ├─sdd1                       swap   swap2      [SWAP]
> └─sdd2                       bcache
>   └─bcache0                  btrfs  system
>
> Mount options are:
>
> $ mount|fgrep btrfs
> /dev/bcache2 on / type btrfs (rw,noatime,compress=lzo,nossd,discard,space_cache,autodefrag,subvolid=256,subvol=/gentoo/rootfs)
>
> The FS uses mraid=1 and draid=0.
>
> Output of btrfsck is:
> (also available here:
> https://gist.github.com/kakra/bfcce4af242f6548f4d6b45c8afb46ae)
>
> $ btrfsck /dev/disk/by-label/system
> checking extents
> ref mismatch on [10443660537856 524288] extent item 1, found 2
This number, 10443660537856, is bigger than the 1832931324360 total
bytes found. AFAIK, this is already wrong.

[...]

> checking fs roots
> root 4336 inode 4284125 errors 1000, some csum missing
What is in this inode?

> Checking filesystem on /dev/disk/by-label/system
> UUID: d2bb232a-2e8f-4951-8bcc-97e237f1b536
> found 1832931324360 bytes used err is 1
> total csum bytes: 1730105656
> total tree bytes: 6494474240
> total fs tree bytes: 3789783040
> total extent tree bytes: 608219136
> btree space waste bytes: 1221460063
> file data blocks allocated: 2406059724800
>  referenced 2040857763840


* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-03-31 23:27 ` Henk Slager
@ 2016-04-01  1:10   ` Qu Wenruo
  2016-04-02  8:47     ` Kai Krakow
  2016-04-02  9:00   ` Kai Krakow
  1 sibling, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2016-04-01  1:10 UTC (permalink / raw)
  To: Henk Slager, linux-btrfs



Henk Slager wrote on 2016/04/01 01:27 +0200:
> On Thu, Mar 31, 2016 at 10:44 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
>> Hello!
>>
>> I already reported this in another thread but it was a bit confusing by
>> intermixing multiple volumes. So let's start a new thread:
>>
>> Since one of the last kernel upgrades, I'm experiencing one VDI file
>> (containing a NTFS image with Windows 7) getting damaged when running
>> the machine in VirtualBox. I got knowledge about this after
>> experiencing an error "duplicate object" and btrfs went RO. I fixed it
>> by deleting the VDI and restoring from backup - but now I get csum
>> errors as soon as some VM IO goes into the VDI file.
>>
>> The FS is still usable. One effect is, that after reading all files
>> with rsync (to copy to my backup), each call of "du" or "df" hangs, also
>> similar calls to "btrfs {sub|fi} ..." show the same effect. I guess one
>> outcome of this is, that the FS does not properly unmount during
>> shutdown.
>>
>> Kernel is 4.5.0 by now (the FS is much much older, dates back to 3.x
>> series, and never had problems), including Gentoo patch-set r1.
>
> One possibility could be that the vbox kernel modules somehow corrupt
> btrfs kernel area since kernel 4.5.
>
> In order to make this reproducible (or an attempt to reproduce) for
> others, you could unload VirtualBox stuff and restore the VDI file
> from backup (or whatever big file) and then make pseudo-random, but
> reproducible writes to the file.
>
> It is not clear to me what 'Gentoo patch-set r1' is and does. So just
> boot a vanilla v4.5 kernel from kernel.org and see if you get csum
> errors in dmesg.
>
> Also, where does 'duplicate object' come from? dmesg ? then please
> post its surroundings, straight from dmesg.
>
>> The device layout is:
>>
>> $ lsblk -o NAME,MODEL,FSTYPE,LABEL,MOUNTPOINT
>> NAME        MODEL            FSTYPE LABEL      MOUNTPOINT
>> sda         Crucial_CT128MX1
>> ├─sda1                       vfat   ESP        /boot
>> ├─sda2
>> └─sda3                       bcache
>>    ├─bcache0                  btrfs  system
>>    ├─bcache1                  btrfs  system
>>    └─bcache2                  btrfs  system     /usr/src
>> sdb         SAMSUNG HD103SJ
>> ├─sdb1                       swap   swap0      [SWAP]
>> └─sdb2                       bcache
>>    └─bcache2                  btrfs  system     /usr/src
>> sdc         SAMSUNG HD103SJ
>> ├─sdc1                       swap   swap1      [SWAP]
>> └─sdc2                       bcache
>>    └─bcache1                  btrfs  system
>> sdd         SAMSUNG HD103UJ
>> ├─sdd1                       swap   swap2      [SWAP]
>> └─sdd2                       bcache
>>    └─bcache0                  btrfs  system
>>
>> Mount options are:
>>
>> $ mount|fgrep btrfs
>> /dev/bcache2 on / type btrfs (rw,noatime,compress=lzo,nossd,discard,space_cache,autodefrag,subvolid=256,subvol=/gentoo/rootfs)
>>
>> The FS uses mraid=1 and draid=0.
>>
>> Output of btrfsck is:
>> (also available here:
>> https://gist.github.com/kakra/bfcce4af242f6548f4d6b45c8afb46ae)
>>
>> $ btrfsck /dev/disk/by-label/system
>> checking extents
>> ref mismatch on [10443660537856 524288] extent item 1, found 2
> This number, 10443660537856, is bigger than the 1832931324360 total
> bytes found. AFAIK, this is already wrong.

Nope. That's a btrfs logical space address, which can be beyond the
real disk bytenr.

The easiest way to reproduce such a case is to write something into a
256M btrfs and balance the fs several times.

Then all chunks can end up at a bytenr beyond 256M.
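
Roughly like this (only a sketch; image and mount point paths are
placeholders, and --mixed may be needed at such a small size):

$ truncate -s 256M /tmp/small.img
$ mkfs.btrfs --mixed /tmp/small.img
$ mount -o loop /tmp/small.img /mnt/tmp
$ dd if=/dev/urandom of=/mnt/tmp/f bs=1M count=100
$ for i in 1 2 3; do btrfs balance start /mnt/tmp; done
$ umount /mnt/tmp
# chunk logical addresses can now exceed 256M, visible e.g. with
# "btrfs inspect-internal dump-tree -t chunk" (btrfs-debug-tree in
# older progs)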

The real problem is that the extent has a mismatched reference.
Normally it can be fixed by the --init-extent-tree option, but it
usually indicates a bigger problem, especially since it has already
caused a kernel delayed-ref problem.

Not to mention the error "extent item 11271947091968 has multiple
extent items", which makes the problem more serious.


I assume some older kernel has already screwed up the extent tree;
although delayed-ref handling has been bug-prone, it has improved in
recent years.

But the fs tree seems less damaged, so I assume the extent tree
corruption could be fixed by "--init-extent-tree".

For the only fs tree error (the missing csum), if "btrfsck
--init-extent-tree --repair" works without any problem, the simplest
fix would be to just remove the file.
Or you can spend a lot of CPU time and disk IO to rebuild all
checksums by using the "--init-csum-tree" option.
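
A rough sketch of that sequence (run from a rescue environment with
the fs unmounted, and only after a verified backup; the device path is
the one used above):

$ btrfsck /dev/disk/by-label/system                  # read-only check
$ btrfsck --init-extent-tree --repair /dev/disk/by-label/system
$ btrfsck /dev/disk/by-label/system                  # re-check
# only if csum errors remain and deleting the file is not enough:
$ btrfsck --init-csum-tree --repair /dev/disk/by-label/system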

Thanks,
Qu

>
> [...]
>
>> checking fs roots
>> root 4336 inode 4284125 errors 1000, some csum missing
> What is in this inode?
>
>> Checking filesystem on /dev/disk/by-label/system
>> UUID: d2bb232a-2e8f-4951-8bcc-97e237f1b536
>> found 1832931324360 bytes used err is 1
>> total csum bytes: 1730105656
>> total tree bytes: 6494474240
>> total fs tree bytes: 3789783040
>> total extent tree bytes: 608219136
>> btree space waste bytes: 1221460063
>> file data blocks allocated: 2406059724800
>>   referenced 2040857763840




* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-01  1:10   ` Qu Wenruo
@ 2016-04-02  8:47     ` Kai Krakow
  0 siblings, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2016-04-02  8:47 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 1 Apr 2016 09:10:44 +0800, Qu Wenruo
<quwenruo@cn.fujitsu.com> wrote:

> The real problem is that the extent has a mismatched reference.
> Normally it can be fixed by the --init-extent-tree option, but it
> usually indicates a bigger problem, especially since it has already
> caused a kernel delayed-ref problem.
> 
> Not to mention the error "extent item 11271947091968 has multiple
> extent items", which makes the problem more serious.
> 
> 
> I assume some older kernel has already screwed up the extent tree;
> although delayed-ref handling has been bug-prone, it has improved in
> recent years.
> 
> But the fs tree seems less damaged, so I assume the extent tree
> corruption could be fixed by "--init-extent-tree".
> 
> For the only fs tree error (the missing csum), if "btrfsck
> --init-extent-tree --repair" works without any problem, the simplest
> fix would be to just remove the file. Or you can spend a lot of CPU
> time and disk IO to rebuild all checksums by using the
> "--init-csum-tree" option.

Okay, so I'm going to inode-resolve the file with csum errors.
Actually, it's a file from Steam which has been there for ages and
never showed csum errors before, which makes me wonder whether csum
errors may sneak into long-existing files through other corruptions.
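
(For anyone wanting to do the same lookup, it goes roughly like this;
the subvolume mount point below is a placeholder:)

$ btrfs subvolume list / | grep ' 4336 '  # which subvolume is root 4336
$ btrfs inspect-internal inode-resolve 4284125 /path/to/that/subvolume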

I now removed this file and had to reboot because btrfs went RO. Here's
the backtrace:

https://gist.github.com/kakra/a7be40c23e08fc6e237f9108371afadf

[137619.835374] ------------[ cut here ]------------
[137619.835385] WARNING: CPU: 1 PID: 4840 at fs/btrfs/extent-tree.c:1625 lookup_inline_extent_backref+0x156/0x620()
[137619.835394] Modules linked in: nvidia_drm(PO) uas usb_storage vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia_modeset(PO) nvidia(PO)
[137619.835405] CPU: 1 PID: 4840 Comm: rm Tainted: P           O    4.5.0-gentoo-r1 #1
[137619.835407] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3, BIOS L2.16A 02/22/2013
[137619.835409]  0000000000000000 ffffffff8159eae9 0000000000000000 ffffffff81ea1d08
[137619.835412]  ffffffff810c6e37 ffff8803d56a4d20 ffff88040c7daa00 00000a4075114000
[137619.835415]  0000000000201000 0000000000000000 ffffffff81489836 0000001d00000000
[137619.835418] Call Trace:
[137619.835423]  [<ffffffff8159eae9>] ? dump_stack+0x46/0x5d
[137619.835429]  [<ffffffff810c6e37>] ? warn_slowpath_common+0x77/0xb0
[137619.835432]  [<ffffffff81489836>] ? lookup_inline_extent_backref+0x156/0x620
[137619.835435]  [<ffffffff814bdfce>] ? btrfs_get_token_32+0xee/0x110
[137619.835440]  [<ffffffff8115de48>] ? __set_page_dirty_nobuffers+0xf8/0x150
[137619.835443]  [<ffffffff81489d54>] ? insert_inline_extent_backref+0x54/0xe0
[137619.835450]  [<ffffffff8119ebd8>] ? __slab_free+0x98/0x220
[137619.835453]  [<ffffffff8119e6ad>] ? kmem_cache_alloc+0x14d/0x160
[137619.835456]  [<ffffffff8148a1e9>] ? __btrfs_inc_extent_ref.isra.64+0x99/0x270
[137619.835459]  [<ffffffff8148ecc3>] ? __btrfs_run_delayed_refs+0x673/0x1020
[137619.835463]  [<ffffffff814c6e01>] ? btrfs_release_extent_buffer_page+0x71/0x120
[137619.835466]  [<ffffffff814c6eef>] ? release_extent_buffer+0x3f/0x90
[137619.835469]  [<ffffffff8149222f>] ? btrfs_run_delayed_refs+0x8f/0x2b0
[137619.835473]  [<ffffffff814b0978>] ? btrfs_truncate_inode_items+0x8b8/0xdc0
[137619.835477]  [<ffffffff814b1d4e>] ? btrfs_evict_inode+0x3fe/0x550
[137619.835481]  [<ffffffff811cd4f7>] ? evict+0xb7/0x180
[137619.835484]  [<ffffffff811c37cc>] ? do_unlinkat+0x12c/0x2d0
[137619.835488]  [<ffffffff81bdb017>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[137619.835491] ---[ end trace 6e8061336c42ff93 ]---
[137619.835494] ------------[ cut here ]------------
[137619.835497] WARNING: CPU: 1 PID: 4840 at fs/btrfs/extent-tree.c:2946 btrfs_run_delayed_refs+0x279/0x2b0()
[137619.835499] BTRFS: Transaction aborted (error -5)
[137619.835500] Modules linked in: nvidia_drm(PO) uas usb_storage vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia_modeset(PO) nvidia(PO)
[137619.835506] CPU: 1 PID: 4840 Comm: rm Tainted: P        W  O    4.5.0-gentoo-r1 #1
[137619.835508] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3, BIOS L2.16A 02/22/2013
[137619.835509]  0000000000000000 ffffffff8159eae9 ffff880255d1bc98 ffffffff81ea1d08
[137619.835512]  ffffffff810c6e37 ffff88040c7daa00 ffff880255d1bce8 00000000000001c6
[137619.835514]  ffff8803211b4510 000000000000000b ffffffff810c6eb7 ffffffff81e8a0a0
[137619.835517] Call Trace:
[137619.835519]  [<ffffffff8159eae9>] ? dump_stack+0x46/0x5d
[137619.835522]  [<ffffffff810c6e37>] ? warn_slowpath_common+0x77/0xb0
[137619.835525]  [<ffffffff810c6eb7>] ? warn_slowpath_fmt+0x47/0x50
[137619.835528]  [<ffffffff81492419>] ? btrfs_run_delayed_refs+0x279/0x2b0
[137619.835531]  [<ffffffff814b0978>] ? btrfs_truncate_inode_items+0x8b8/0xdc0
[137619.835535]  [<ffffffff814b1d4e>] ? btrfs_evict_inode+0x3fe/0x550
[137619.835538]  [<ffffffff811cd4f7>] ? evict+0xb7/0x180
[137619.835541]  [<ffffffff811c37cc>] ? do_unlinkat+0x12c/0x2d0
[137619.835543]  [<ffffffff81bdb017>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[137619.835545] ---[ end trace 6e8061336c42ff94 ]---
[137619.835547] BTRFS: error (device bcache2) in btrfs_run_delayed_refs:2946: errno=-5 IO failure
[137619.835550] BTRFS info (device bcache2): forced readonly
[137619.886069] pending csums is 410705920

So it looks like fixing one error introduces other errors. Should I try
init-extent-tree after taking a backup?

BTW: "btrfsck --repair" does not work: I complains about unsupported
cases due to compression of extents and that I need to contact the
developers for covering this case.

-- 
Regards,
Kai

Replies to list-only preferred.



* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-03-31 23:27 ` Henk Slager
  2016-04-01  1:10   ` Qu Wenruo
@ 2016-04-02  9:00   ` Kai Krakow
  2016-04-02 17:17     ` Henk Slager
  1 sibling, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2016-04-02  9:00 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 1 Apr 2016 01:27:21 +0200, Henk Slager <eye1tm@gmail.com>
wrote:

> It is not clear to me what 'Gentoo patch-set r1' is and does. So just
> boot a vanilla v4.5 kernel from kernel.org and see if you get csum
> errors in dmesg.

It is the Gentoo patchset; I don't think anything in there relates to
btrfs:
https://dev.gentoo.org/~mpagano/genpatches/trunk/4.5/

> Also, where does 'duplicate object' come from? dmesg ? then please
> post its surroundings, straight from dmesg.

It was in dmesg. I already posted it in the other thread and Qu took
note of it. Apparently, I didn't manage to capture anything other than:

btrfs_run_delayed_refs:2927: errno=-17 Object already exists

It hit me unexpectedly. This was the first time btrfs went RO for me.
It was with kernel 4.4.5, I think.

I suspect this is the outcome of unnoticed corruptions that sneaked in
earlier, over some period of time. The system had no problems until
this incident, and only then, when I ran btrfsck, did I discover the
huge pile of corruptions.

I'm also pretty convinced now that VirtualBox itself is not the
problem but only a victim of these corruptions; that's why it
primarily shows up in the VDI file.

However, I now found csum errors in unrelated files (see other post in
this thread), even for files not touched in a long time.

-- 
Regards,
Kai

Replies to list-only preferred.



* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-02  9:00   ` Kai Krakow
@ 2016-04-02 17:17     ` Henk Slager
  2016-04-02 20:16       ` Kai Krakow
  0 siblings, 1 reply; 22+ messages in thread
From: Henk Slager @ 2016-04-02 17:17 UTC (permalink / raw)
  To: linux-btrfs

On Sat, Apr 2, 2016 at 11:00 AM, Kai Krakow <hurikhan77@gmail.com> wrote:
> On Fri, 1 Apr 2016 01:27:21 +0200, Henk Slager <eye1tm@gmail.com>
> wrote:
>
>> It is not clear to me what 'Gentoo patch-set r1' is and does. So just
>> boot a vanilla v4.5 kernel from kernel.org and see if you get csum
>> errors in dmesg.
>
> It is the gentoo patchset, I don't think anything there relates to
> btrfs:
> https://dev.gentoo.org/~mpagano/genpatches/trunk/4.5/
>
>> Also, where does 'duplicate object' come from? dmesg ? then please
>> post its surroundings, straight from dmesg.
>
> It was in dmesg. I already posted it in the other thread and Qu took
> note of it. Apparently, I didn't manage to capture anything other than:
>
> btrfs_run_delayed_refs:2927: errno=-17 Object already exists
>
> It hit me unexpectedly. This was the first time btrfs went RO for me. It
> was with kernel 4.4.5 I think.
>
> I suspect this is the outcome of unnoticed corruptions that sneaked in
> earlier over some period of time. The system had no problems until this
> incident, and only then I discovered the huge pile of corruptions when I
> ran btrfsck.
>
> I'm also pretty convinced now that VirtualBox itself is not the problem
> but only a victim of these corruptions; that's why it primarily shows up
> in the VDI file.
>
> However, I now found csum errors in unrelated files (see other post in
> this thread), even for files not touched in a long time.

Ok, this is some good further status and background. That there are
more csum errors elsewhere is quite worrying, I would say. You said
the HW is tested, but are you sure there are no rare undetected
failures, e.g. due to overclocking or just aging? It might be that
spurious HW errors are only now starting to happen and are unrelated
to the kernel upgrade from 4.4.x to 4.5.
I once had a RAM module going bad; Windows 7 ran fine (at least no
crashes), but when I booted with Linux/btrfs, all kinds of strange
btrfs errors started to appear, including csum errors.

The other thing you could think about is the SSD cache partition. I
don't remember whether blocks going from RAM to SSD get an extra CRC
attached (independent of BTRFS). But if data gets corrupted while on
the SSD, you could get very nasty errors; how nasty depends a bit on
the various bcache settings. It is not unthinkable that corrupted
dirty data gets written to the harddisks. But at least btrfs (scrub)
can detect that (the situation you are in now).

Maybe, to further isolate btrfs, you could temporarily rule out bcache
by making sure the cache is clean and then increasing the start
sectors of the second partitions on the harddisks by 16 sectors
(8 KiB), then reboot. Of course, after any write to the partitions,
you'll have to recreate all the bcache devices.
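
Roughly (only a sketch - bcache2/sdb2 are taken from the lsblk output
above, and the 16-sector shift assumes the default 8 KiB bcache data
offset, so double-check before touching the partition table):

# stop new dirty data, wait for "clean", then detach the cache
$ echo writethrough > /sys/block/bcache2/bcache/cache_mode
$ cat /sys/block/bcache2/bcache/state
$ echo 1 > /sys/block/bcache2/bcache/detach
# then move the start of e.g. sdb2 forward by 16 sectors (8 KiB) with
# fdisk/gdisk, so the kernel sees the btrfs superblock directly
# instead of the bcache superblock, and reboot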

But maybe the fs has simply been silently corrupted by bugs in older
kernels, and now kernel 4.5 cannot handle it anymore and any use of
the fs increases the corruption.


* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-02 17:17     ` Henk Slager
@ 2016-04-02 20:16       ` Kai Krakow
  2016-04-03  0:14         ` Chris Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2016-04-02 20:16 UTC (permalink / raw)
  To: linux-btrfs

On Sat, 2 Apr 2016 19:17:55 +0200, Henk Slager <eye1tm@gmail.com>
wrote:

> On Sat, Apr 2, 2016 at 11:00 AM, Kai Krakow <hurikhan77@gmail.com>
> wrote:
> > On Fri, 1 Apr 2016 01:27:21 +0200, Henk Slager
> > <eye1tm@gmail.com> wrote:
> >  
> >> It is not clear to me what 'Gentoo patch-set r1' is and does. So
> >> just boot a vanilla v4.5 kernel from kernel.org and see if you get
> >> csum errors in dmesg.  
> >
> > It is the gentoo patchset, I don't think anything there relates to
> > btrfs:
> > https://dev.gentoo.org/~mpagano/genpatches/trunk/4.5/
> >  
> >> Also, where does 'duplicate object' come from? dmesg ? then please
> >> post its surroundings, straight from dmesg.  
> >
> > It was in dmesg. I already posted it in the other thread and Qu took
> > note of it. Apparently, I didn't manage to capture anything else
> > than:
> >
> > btrfs_run_delayed_refs:2927: errno=-17 Object already exists
> >
> > It hit me unexpected. This was the first time btrfs went RO for me.
> > It was with kernel 4.4.5 I think.
> >
> > I suspect this is the outcome of unnoticed corruptions that sneaked
> > in earlier over some period of time. The system had no problems
> > until this incident, and only then I discovered the huge pile of
> > corruptions when I ran btrfsck.
> >
> > I'm also pretty convinced now that VirtualBox itself is not the
> > problem but only victim of these corruptions, that's why it
> > primarily shows up in the VDI file.
> >
> > However, I now found csum errors in unrelated files (see other post
> > in this thread), even for files not touched in a long time.  
> 
> Ok, this is some good further status and background. That there are
> more csum errors elsewhere is quite worrying I would say. You said HW
> is tested, are you sure there are no rare undetected failures, like due to
> overclocking or just aging or whatever. It might just be that spurious
> HW errors just now start to happen and are unrelated to kernel upgrade
> from 4.4.x to 4.5.
> I had once a RAM module going bad; Windows7 ran fine (at least no
> crashes), but when I booted with Linux/btrfs, all kinds of strange
> btrfs errors started to appear including csum errors.

I'll go check the RAM for problems - though that would be the first
time in twenty years that a RAM module went bad later instead of
showing errors right from the beginning. Well, you never know. But I
expect no errors, since RAM problems usually cause all sorts of
different and random issues, which I don't have. My problems are very
specific, which is atypical for RAM errors.

The hardware is not overclocked, and every part was tested when it was
installed.

> The other thing you could think about is the SSD cache partition. I
> don't remember if blocks from RAM to SSD get an extra CRC attached
> (independent of BTRFS). But if data gets corrupted while in the SSD,
> you could get very nasty errors, how nasty depends a bit on the
> various bcache settings. It is not unthinkable that dirty changed data
> gets written to the harddisks. But at least btrfs (scrub) can detect
> that (the situation you are in now).

Well, the SSD could in fact soon become a problem. It's at 97% of its
lifetime according to SMART. I'm probably somewhere near 85 TB of
written data (that's the lifetime spec of the SSD) within one year,
thanks to an unfortunate disk replacement (btrfs replace) action done
with btrfs through bcache, and to weekly scrubs (which do not just
read, but also write).

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       1
  5 Reallocate_NAND_Blk_Cnt 0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       8705
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       286
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   003   003   000    Old_age   Always       -       2913
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       112
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       1036
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   067   057   000    Old_age   Always       -       33 (Min/Max 20/43)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Used   0x0031   003   003   000    Pre-fail  Offline      -       97
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       42879382296
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       1495038460
248 Bckgnd_Program_Page_Cnt 0x0032   100   100   000    Old_age   Always       -       42326578695
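
A rough conversion of those raw values (assuming 512-byte sectors and
a nominal ~3000 P/E cycle rating - my assumptions, not datasheet
numbers):

  42879382296 sectors * 512 B ~= 22 TB written by the host
  2913 avg erases / ~3000     ~= 97 %  (matches Percent_Lifetime_Used)

So most of the wear apparently comes from write amplification
(Bckgnd_Program_Page_Cnt is roughly 28x Host_Program_Page_Count)
rather than from host writes alone.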


> Maybe, to further isolate btrfs, you could temporarily rule out
> bcache by making sure the cache is clean and then increase the
> startsectors of second partitions on the harddisks by 16 (8KiB) and
> then reboot. Of course after any write to the partitions, you'll have
> to recreate all bcache.

Bcache has had some patches lately for problems I never experienced.
At this point, I'd not rule out bcache as the culprit either, though
bcache itself has shown no problems here (I have one other system
where bcache broke down after those patches were applied, resulting in
a broken bcache b-tree).

> But maybe it is just due to bugs in older kernels that the fs has been
> silently corrupted and now kernel 4.5 cannot handle it anymore and any
> use of the fs increases corruption.

I'm pretty sure the problems sneaked in while running older kernels,
and the FS going RO was only the tip of the iceberg.

My last "error free" rsync backup is from mid-March. By that time, I
probably had no csum errors in recently modified files - but since I
only in-place sync files with a changed mtime, I cannot rule out csum
errors having already been there. My script only takes snapshots of
the backup scratch area when rsync completes successfully, so my last
snapshot from mid-March holds valid copies of the broken files, while
the scratch area has a current backup with some files broken (due to
in-place sync). [1]

According to previous inspections, that backup FS is in good shape -
the only btrfsck errors have been false alerts which have been fixed by
Qu (thanks BTW).

Interesting thing is:

As with the first file with csum errors (the VDI file), the second
file also shows csum errors again when recreated. It's a game data
file from Steam. I removed it (whereupon the FS went RO, as mentioned
earlier in this thread). Now Steam has re-downloaded the file to a
temp directory - so it is obviously a completely new file (unless
Steam somehow magically recovered it from somewhere else). But this
new file has csum errors again. WTH? And Steam forces the FS RO when
working with this file.

So either the SSD (through bcache) or btrfs' compression code shows
bugs with very specific data patterns (since I'm using compress=lzo),
or the other corruptions make btrfs destroy those new files because it
allocates space over and over again from affected areas of the disk. I
don't know how btrfs allocation works - but that may be an explanation
(wrt the backpointer errors).

BTW: Replacement SSD already ordered. At the current rate the old
one will reach 100% lifetime in about 4-6 weeks.

[1]: As a reference or if you're curious:
https://gist.github.com/kakra/5520370

-- 
Regards,
Kai

Replies to list-only preferred.



* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-02 20:16       ` Kai Krakow
@ 2016-04-03  0:14         ` Chris Murphy
  2016-04-03  4:02           ` Kai Krakow
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2016-04-03  0:14 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sat, Apr 2, 2016 at 2:16 PM, Kai Krakow <hurikhan77@gmail.com> wrote:

> I'll go checking the RAM for problems - tho that would be the first
> time in twenty years that a RAM module hadn't errors from the
> beginning. Well, you'll never know. But I expect no error since usually
> this would mean all sorts of different and random problems which I
> don't have. Problems are very specific, which is atypical for RAM
> errors.

Well, so far it's just the VDI that's experiencing csum mismatch
errors, right? So that's not bad RAM, which would affect other files
too. And the same goes for a failing SSD.

I think you've got a bug somewhere, and it's just hard to say where it
is based on the available information. I've already lost track of
whether others have the exact same setup you do: bcache + nossd +
autodefrag + lzo + VirtualBox writing to a VDI on this Btrfs volume.
There are others who have some of those options, but I don't know if
there's anyone who has all of them going on.

Maybe Qu has some suggestions, but if it were me, I'd do this: build
mainline 4.5.0, which is a known quantity to the Btrfs devs, with
BTRFS_FS_CHECK_INTEGRITY enabled in the kernel config. When you mount
the file system, don't use the check_int mount option at first; just
use your regular mount options and try to reproduce the VDI
corruption. If you can reproduce it, start over, this time with the
check_int mount option included along with the others you're using,
and try to reproduce again. The kernel messages can be fairly verbose,
so use the boot parameter log_buf_len=1M; that way you can use dmesg
rather than depending on journalctl -k, which sometimes drops messages
if there are too many.
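
In other words, roughly this sequence (a sketch, using the mount
options from earlier in the thread; adjust device and paths as
needed):

# kernel config for the mainline 4.5.0 build:
CONFIG_BTRFS_FS_CHECK_INTEGRITY=y
# kernel command line:
log_buf_len=1M
# first pass: regular options only; second pass adds check_int:
$ mount -o noatime,compress=lzo,nossd,autodefrag,check_int \
    /dev/bcache2 /mnt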

If you reproduce the corruption while check_int is enabled, the kernel
messages should have clues, and then you can put them in a file and
attach it to the list or open a bug. FWIW, I'm pretty sure your MUA is
wrapping poorly: when I look at this URL for your post with the
smartctl output, it wraps in a way that's essentially impossible to
sort out at a glance. Whether it's your MUA or my web browser pretty
much doesn't matter; it's not legible, so what I do is just attach it
as a file to a bug report or, if small enough, put it onto the list
itself.

Finally, I would retest yet again with check_int_data as a mount
option and try to reproduce. This is reported to be dirt slow, but it
might capture something that check_int doesn't. But I admit this is
throwing spaghetti on the wall, and is something of a goose chase just
because I don't know what else to recommend other than iterating all
of your mount options from none, adding just one at a time, and trying
to reproduce. That somehow sounds more tedious. But chances are you'd
find out what mount option is causing it; OR maybe you'd find out the
corruption always happens, even with defaults, even without bcache, in
which case that'd seem to implicate either a Gentoo patch or a
VirtualBox bug of some sort.



-- 
Chris Murphy


* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-03  0:14         ` Chris Murphy
@ 2016-04-03  4:02           ` Kai Krakow
  2016-04-03  5:06             ` Duncan
  2016-04-03 19:03             ` Chris Murphy
  0 siblings, 2 replies; 22+ messages in thread
From: Kai Krakow @ 2016-04-03  4:02 UTC (permalink / raw)
  To: linux-btrfs

On Sat, 2 Apr 2016 18:14:17 -0600, Chris Murphy
<lists@colorremedies.com> wrote:

> On Sat, Apr 2, 2016 at 2:16 PM, Kai Krakow <hurikhan77@gmail.com>
> wrote:
> 
> > I'll go checking the RAM for problems - tho that would be the first
> > time in twenty years that a RAM module hadn't errors from the
> > beginning. Well, you'll never know. But I expect no error since
> > usually this would mean all sorts of different and random problems
> > which I don't have. Problems are very specific, which is atypical
> > for RAM errors.  
> 
> Well so far it's just the VDI that's experiencing csum mismatch
> errors, right? So that's not bad RAM, which would affect other files
> too. And same for a failing SSD.

No, other files are affected, too. And it looks like those files are
easily affected even when removed and recreated from whatever backup
source.

> I think you've got a bug somewhere and it's just hard to say where it
> is based on the available information. I've already lost track if
> others have all of the exact same setup you do: bcache + nossd +
> autodefrag + lzo + VirtualBox writing to VDI on this Btrfs volume.
> There are others who have some of those options, but I don't know if
> there's anyone who has all of those going on.

I haven't run VirtualBox since the incident, so I'd rule out
VirtualBox. Currently there seem to be no csum errors for the VDI
file; instead, another file now gets corrupted, even after being
recreated. I think it is the result of another corruption and thus a
side effect.

Also, I think the combination nossd+autodefrag+lzo shouldn't be an
exotic or unsupported configuration. Having this on top of bcache
should just work.

Let's not rule out that bcache had a problem, although I'd usually
expect bcache to freak out with internal btree corruption in that case.

> Maybe Qu has some suggestions, but if it were me I'd do this. Build
> mainline 4.5.0, it's a known quantity by Btrfs devs.

4.5.0-gentoo currently adds only a few patches, so I could easily
build vanilla.

> Build the kernel
> with BTRFS_FS_CHECK_INTEGRITY enabled in kernel config. And when you
> mount the file system, don't use mount option check_int, just use your
> regular mount options and try to reproduce the VDI corruption. If you
> can reproduce it, then start over, this time with check_int mount
> option included along with the others you're using and try to
> reproduce. It's possible there will be fairly verbose kernel messages,
> so use boot parameter log_buf_len=1M and then that way you can use
> dmesg rather than depending on journalctl -k which sometimes drops
> messages if there are too many.

Does it make sense while I still have the corruptions in the FS? I'd
like to wait for Qu to say whether I should recreate the FS, take some
image, or send info to improve btrfsck...

I'm pretty sure the corruptions I can reproduce are all caused by
corruption already on disk - so check_int would probably be of little
use currently.

> If you reproduce the corruption while check_int is enabled, kernel
> messages should have clues and then you can put that in a file and
> attach to the list or open a bug. FWIW, I'm pretty sure your MUA is
> wrapping poorly, when I look at this URL for your post with smartctl
> output, it wraps in a way that's essentially impossible to sort out at
> a glance. Whether it's your MUA or my web browser pretty much doesn't
> matter, it's not legible so what I do is just attach as file to a bug
> report or if small enough onto the list itself.
> http://www.spinics.net/lists/linux-btrfs/msg53790.html

Claws Mail is just too smart for me... It showed up correctly in the
editor before I hit the send button. I wish I could go back to knode
(that did its job right). But it's currently an unsupported orphan
project of KDE. :-(

> Finally, I would retest yet again with check_int_data as a mount
> option and try to reproduce. This is reported to be dirt slow, but it
> might capture something that check_int doesn't. But I admit this is
> throwing spaghetti on the wall, and is something of a goose chase just
> because I don't know what else to recommend other than iterating all
> of your mount options from none, adding just one at a time, and trying
> to reproduce. That somehow sounds more tedious. But chances are you'd
> find out what mount option is causing it; OR maybe you'd find out the
> corruption always happens, even with defaults, even without bcache, in
> which case that'd seem to implicate either a gentoo patch, or a
> virtual box bug of some sort.

I think the latter two are easily the least probable sorts of bugs.
But I'll give it a try. For the time being, I could switch bcache to
write-around mode - so it could at least not corrupt btrfs during
writes.
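
A minimal sketch of that switch, assuming the cache shows up as
bcache0 (adjust to the actual device):

echo writearound > /sys/block/bcache0/bcache/cache_mode  # stop caching writes on the SSD
cat /sys/block/bcache0/bcache/cache_mode                 # the active mode is shown in [brackets]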

-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-03  4:02           ` Kai Krakow
@ 2016-04-03  5:06             ` Duncan
  2016-04-03 22:19               ` Kai Krakow
  2016-04-03 19:03             ` Chris Murphy
  1 sibling, 1 reply; 22+ messages in thread
From: Duncan @ 2016-04-03  5:06 UTC (permalink / raw)
  To: linux-btrfs

Kai Krakow posted on Sun, 03 Apr 2016 06:02:02 +0200 as excerpted:

> No, other files are affected, too. And it looks like those files are
> easily affected even when removed and recreated from whatever backup
> source.

I've seen you say that several times now, I think.  But none of those 
times has it apparently occurred to you to double-check whether it's the 
/same/ corruptions every time, or at least, if you checked it, I've not 
seen it actually /reported/.  (Note that I didn't say you didn't report 
it, only that I've not seen it.  A difference there is! =:^)

If I'm getting repeated corruptions of something, that's the first thing 
I'd check: is there some repeating pattern to those corruptions, same 
place in the file, same "wanted" (expected) value, same "got" value (not 
the expected one, since it's reporting corruption), etc.

Then I'd try different variations like renaming the file, putting it in a 
different directory with all of the same other files, putting it in a 
different directory with all different files, putting it in a different 
directory by itself, putting it in the same directory but in a different 
subvolume... you get the point.

Then I'd try different mount options, with and without compression, with 
different kinds of compression, with compress-force and with simple 
compress, with and without autodefrag...

I could try it with nocow enabled for the file (note that the file has to 
be created with nocow before it gets content, for nocow to take effect), 
tho of course that'll turn off btrfs checksumming, but I could still for 
instance md5sum the original source and the nocowed test version and see 
if it tests clean that way.
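
As a rough sketch of that nocow test (the paths are placeholders, and
+C has to be set while the file is still empty):

touch /mnt/test/win7.vdi                    # create the file empty first
chattr +C /mnt/test/win7.vdi                # nocow only takes effect on an empty file
cat /backup/win7.vdi > /mnt/test/win7.vdi   # fill it without recreating the inode
md5sum /backup/win7.vdi /mnt/test/win7.vdi  # compare source and nocow copy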

I could try it with nocow on the file but with a bunch of snapshots 
interwoven with writing changes to the file (obviously this will kill 
comparison against the original, but I could arrange to write the same 
changes to the test file on btrfs, and to a control copy of the file on 
non-btrfs, and then md5sum or whatever compare them).

Then, if I had the devices available to do so, I'd try it in a different 
btrfs of the same layout (same redundancy mode and number of devices), 
both single and dup mode on a single device, etc.

And again if available, I'd try swapping the filesystem to different 
machines...

OK, so trying /all/ the above might be a bit overboard but I think you 
get the point.  Try to find some pattern or common element in the whole 
thing, and report back the results at least for the "simple" experiments 
like whether the corruption appears to be the same (same got at the same 
spot) or different, and whether putting the file in a different subdir or 
using a different name for it matters at all.  =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-03  4:02           ` Kai Krakow
  2016-04-03  5:06             ` Duncan
@ 2016-04-03 19:03             ` Chris Murphy
  1 sibling, 0 replies; 22+ messages in thread
From: Chris Murphy @ 2016-04-03 19:03 UTC (permalink / raw)
  To: Kai Krakow; +Cc: Btrfs BTRFS

On Sat, Apr 2, 2016 at 10:02 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
> Am Sat, 2 Apr 2016 18:14:17 -0600

> Also I think, having options nossd+autodefrag+lzo shouldn't be an
> exotic or unsupported option. Having this on top of bcache should just
> work.

I'm not suggesting it shouldn't work. But in fact something isn't
working. Bugs happen. Regressions happen. This is a process of
elimination project to find out either why, or under what
condition(s), it doesn't work.


> Does it make sense while I still have the corruptions in the FS? I'd
> like to wait for Qu whether I should recreate the FS or whether I
> should take some image, or send info to improve btrfsck...

It's up to you. I think it's fair to say the file system should not be
corrupting files so long as it's willing to write to the volume. So
that's a problem in and of itself; it should sooner go read only.

It's completely reasonable to take a btrfs-image, back everything up,
and then try a 'btrfs check --repair' and see if it can fix things up.
If not, that makes the btrfs-image more valuable.
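
Something like this, as a sketch (device and output paths are
placeholders; run it against the unmounted file system):

btrfs-image -c9 -t4 /dev/bcache0 /backup/system.btrfs-image  # compressed metadata-only image
btrfs check --repair /dev/bcache0                            # only after everything is backed up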



> I think the latter two are easily the least probable sort of bugs. But
> I'll give it a try. For the time being, I could switch bcache to
> write-around mode - so it could at least not corrupt btrfs during
> writes.

I don't know enough about bcache to speculate what can happen if there
are already fs corruptions. Is it possible bcache makes things worse?
No idea.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-03  5:06             ` Duncan
@ 2016-04-03 22:19               ` Kai Krakow
  2016-04-04  0:51                 ` Chris Murphy
  2016-04-04  4:34                 ` Duncan
  0 siblings, 2 replies; 22+ messages in thread
From: Kai Krakow @ 2016-04-03 22:19 UTC (permalink / raw)
  To: linux-btrfs

Am Sun, 3 Apr 2016 05:06:19 +0000 (UTC)
schrieb Duncan <1i5t5.duncan@cox.net>:

> Kai Krakow posted on Sun, 03 Apr 2016 06:02:02 +0200 as excerpted:
> 
> > No, other files are affected, too. And it looks like those files are
> > easily affected even when removed and recreated from whatever backup
> > source.  
> 
> I've seen you say that several times now, I think.  But none of those 
> times has it apparently occurred to you to double-check whether it's
> the /same/ corruptions every time, or at least, if you checked it,
> I've not seen it actually /reported/.  (Note that I didn't say you
> didn't report it, only that I've not seen it.  A difference there is!
> =:^)

Believe me, I would double check... But this FS is (and the affected
files are) just too big to create test cases, and backups, and copies,
and you know what...

So the only chance I see is to offer help improving "btrfsck --repair"
before I wipe and restore from backup. Except in the unlikely case
that "--repair" improves to a point where it gets my FS back in
order. ;-)

I'll have to wait for my new bcache SSD to arrive. In its current
state (lifetime at 97%) I don't want to push all my file data through
it.

Then I'll back up the current state (the damaged files are skipped
anyway because they haven't been "modified" according to mtime), so
I'll get a clean backup except for the VDI file and some big Steam
files (which can easily be downloaded again through the client).

And yes, you are right that I didn't check whether it is the same
corruption every time. But that's also a bit difficult to do because
I'd need either enough spare disk space to keep copies of the files to
compare against, or to set up some block-identifying checksumming
like a hash tree.
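
A crude sketch of what I mean (file path and chunk size are just
placeholders), so the same region can be re-checked later:

f=/mnt/vm/win7.vdi
bs=$((64*1024*1024))                # 64 MiB chunks
size=$(stat -c %s "$f")
for ((i=0; i*bs<size; i++)); do
  echo "chunk $i: $(dd if="$f" bs=$bs skip=$i count=1 2>/dev/null | sha256sum | cut -d' ' -f1)"
done > /tmp/win7.vdi.chunksums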

> If I'm getting repeated corruptions of something, that's the first
> thing I'd check, is there some repeating pattern to those
> corruptions, same place in the file, same "wanted" value (expected),
> same "got" value, (not expected if it's reporting corruption), etc.

Way to go, usually...

> Then I'd try different variations like renaming the file, putting it
> in a different directory with all of the same other files, putting it
> in a different directory with all different files, putting it in a
> different directory by itself, putting it in the same directory but
> in a different subvolume... you get the point.

Here's the point: shuffling files around should be done across
different filesystems. I neither have any spare filesystems to do
that, nor can I currently afford the time to shuffle such big files
around - it takes multiple hours to copy them. Already looking forward
to restoring the backup... *sigh*

BTW: Is it possible to use my backup drive (it's btrfs single-data
dup-metadata, single device) as a seed device for my newly created
btrfs pool (raid0-data, raid1-metadata, three devices)? I guess the
seed source cannot be mounted or modified...

> Then I'd try different mount options, with and without compression,
> with different kinds of compression, with compress-force and with
> simple compress, with and without autodefrag...

As a first step I've switched bcache to write-around mode. It should
prevent (or at least reduce) further corruption if bcache is at fault.
And it's the safer choice anyway for a soon-to-die SSD.

> I could try it with nocow enabled for the file (note that the file
> has to be created with nocow before it gets content, for nocow to
> take effect), tho of course that'll turn off btrfs checksumming, but
> I could still for instance md5sum the original source and the nocowed
> test version and see if it tests clean that way.

I already thought about putting the VDI back to nocow... I had this
before. But then csum errors would go unnoticed, so I don't think that
is adequate.

But as a consequence I could actually md5sum the files, as you wrote,
because there won't be read errors due to csum mismatches. And I could
detect corruption that way.

> I could try it with nocow on the file but with a bunch of snapshots 
> interwoven with writing changes to the file (obviously this will kill 
> comparison against the original, but I could arrange to write the
> same changes to the test file on btrfs, and to a control copy of the
> file on non-btrfs, and then md5sum or whatever compare them).

That would probably work but I do not quite trust it due to the
corruptions already on disk which seemingly damage specific files or
areas on the disk.

> Then, if I had the devices available to do so, I'd try it in a
> different btrfs of the same layout (same redundancy mode and number
> of devices), both single and dup mode on a single device, etc.

In that sense: If I had the disks available I already would've taken a
block-by-block copy and then restored from backup.

> And again if available, I'd try swapping the filesystem to different 
> machines...

Maybe another time... ;-)

Actually, I only have that one system here. I could do that with the
other system I have problems with - but that's another story and
currently low priority.

> OK, so trying /all/ the above might be a bit overboard but I think
> you get the point.  Try to find some pattern or common element in the
> whole thing, and report back the results at least for the "simple"
> experiments like whether the corruption appears to be the same (same
> got at the same spot) or different, and whether putting the file in a
> different subdir or using a different name for it matters at all.
> =:^)

Your ideas are always welcome.

The corruptions seem to be different, judging by the following observation:

While the VDI file was corrupted over and over again with a csum error,
I could simply remove it and restore it from backup. The last thing I
did was ddrescue it from the damaged version to my backup device, then
rsync the file back to the originating device (which created a new
file side-by-side, so in a new area of disk space, then replaced the
old one by rename). I haven't run VirtualBox since then, but the file
hasn't become corrupted since then either.

But now, according to btrfsck, a csum error came up in another big
file, one from Steam. This time, when I rm the file, the kernel
backtraces and sends btrfs to RO mode. The file cannot be removed. I'm
going to leave it that way for now; the file won't be used anyway. And
I can simply ignore it for backup and restore; it's not an important
one. Better to have an "uncorrectable" csum error there than one
jumping unpredictably across my files.

Before you ask: Yes, I'm still working productively with this broken
file system. I'm not sure if this is a point for or against btrfs,
tho. ;-) It works perfectly stably as long as I do not touch any of the
damaged files (which was, and continues to be, easy). Ah, well,
"perfectly" except that "df" and "du" tend to freeze and become
unkillable. I'm going to ignore that and take the opportunity to test
how far I can stress btrfs before it finally breaks down.

Thus, I'll leave it that way until it breaks down or I decide to
afford the time to restore from backup. Until then, I keep my last
known-good snapshot and a known-incomplete backup scratch storage
where I at least know which files are broken. My daily-business files
are stored twice anyway (offsite and local backup).

I hope I can add some value to improving btrfsck before I have to
restore from backup. I know that with my current setup I cannot give
much help in finding a possible btrfs kernel flaw - which I actually
think may have existed in a previous kernel version and has been fixed
by now.

-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-03 22:19               ` Kai Krakow
@ 2016-04-04  0:51                 ` Chris Murphy
  2016-04-04 19:36                   ` Kai Krakow
  2016-04-04  4:34                 ` Duncan
  1 sibling, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2016-04-04  0:51 UTC (permalink / raw)
  To: Kai Krakow; +Cc: Btrfs BTRFS

On Sun, Apr 3, 2016 at 4:19 PM, Kai Krakow <hurikhan77@gmail.com> wrote:

> BTW: Is it possible to use my backup drive (it's btrfs single-data
> dup-metadata, single device) as a seed device for my newly created
> btrfs pool (raid0-data, raid1-metadata, three devices)?

Yes.

I just tried doing the conversion to raid1 before and after seed
removal, but with the small amount of data (4GiB) I can't tell a
difference. It seems like -dconvert=raid1 with the seed still connected
makes two rw copies (i.e. there's a ro copy, which is the original, and
then two rw copies on 2 of the 3 devices I added to the seed all at the
same time), and the 'btrfs dev remove' command to remove the seed
completed immediately, suggesting the prior balances had already
migrated the copies off the seed. This may or may not be optimal for
your case.

Two gotchas.

I ran into this bug:
btrfs fi usage crash when volume contains seed device
https://bugzilla.kernel.org/show_bug.cgi?id=115851

And there is a phantom single chunk on one of the new rw devices that was added.
Data,single: Size:1.00GiB, Used:0.00B
   /dev/dm-8       1.00GiB

It's still there after the -dconvert=raid1 and the separate
-mconvert=raid1, and after seed device removal. A balance start without
filters removes it; chances are that had I used -dconvert=raid1,soft it
would have vanished too, but I didn't retest that.
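
In other words, something along these lines might avoid the leftover
chunk in the first place (untested, so treat it as a guess):

btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt
# or clean it up afterward with an unfiltered balance:
btrfs balance start /mnt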


> I guess the
> seed source cannot be mounted or modified...

?



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-03 22:19               ` Kai Krakow
  2016-04-04  0:51                 ` Chris Murphy
@ 2016-04-04  4:34                 ` Duncan
  2016-04-04 19:26                   ` Kai Krakow
  1 sibling, 1 reply; 22+ messages in thread
From: Duncan @ 2016-04-04  4:34 UTC (permalink / raw)
  To: linux-btrfs

Kai Krakow posted on Mon, 04 Apr 2016 00:19:25 +0200 as excerpted:

> The corruptions seem to be different by the following observation:
> 
> While the VDI file was corrupted over and over again with a csum error,
> I could simply remove it and restore from backup. The last thing I did
> was ddescue it from the damaged version to my backup device, than rsync
> the file back to the originating device (which created a new file
> side-by-side, so in a new area of disk space, then replace-by-renamed
> the old one). I didn't run VirtualBox since back then but the file
> didn't become corrupted either since then.
> 
> But now, according to btrfsck, a csum error instead came up in another
> big file from Steam. This time, when I rm the file, the kernel
> backtraces and sends btrfs to RO mode. The file cannot be removed. I'm
> going to leave it that way currently, the file won't be used currently.
> And I can simply ignore it for backup and restore, it's not an important
> one. Better have an "incorrectable" csum error there than having one
> jumping unpredictably across my files.

While my dying ssd experience was with btrfs raid1 direct on a pair of 
ssds, extrapolating from what I learned about the ssd behavior to your 
case with bcache caching to the ssd, then writing back to the spinning 
rust backing store, presumably in btrfs single-device mode with single 
data and either single or dup metadata (there's enough other cases 
interwoven on this thread it's no longer clear to me which posted btrfs fi 
show, etc, apply to this case, so I'm guessing, as I believe presenting 
it as more than a single device at the btrfs level would require multiple 
bcache devices, tho of course you could do that by partitioning the 
ssd)...

Would lead me to predict very much the behavior you're seeing, if the 
caching ssd was dying.

As bcache is running below btrfs, btrfs won't know anything about it, and 
therefore, will behave, effectively, as if it's not there -- an error on 
the ssd will look like an error on the btrfs, period.  (As I'm assuming a 
single btrfs device, which device of the btrfs doesn't come into 
question, tho which copy of dup metadata might... but that's an entirely 
different can of worms since I'm not sure whether the bcache would end up 
deduping the dup metadata or not, and the ssd might do the same, and...)

And with bcache doing write-behind from the ssd to the backing store, 
underneath the level at which btrfs could detect and track csum 
corruption, if it's corrupt on the ssd, that corruption then transfers to 
the backing store as btrfs won't know that transfer is happening at all 
and thus won't be in the loop to detect the csum error at that stage.


Meanwhile, what I saw on the pair of ssds, one going bad, in btrfs raid1 
mode, was that a btrfs scrub *WOULD* successfully detect the csum errors 
on the bad ssd, and rewrite it from the remaining good copy.

Keep in mind that this is without snapshots, so that rewrite, while COW, 
would then release the old copy back into the free space pool.  In so 
doing, it would trigger the ssd firmware to copy the rest of the erase-
block and erase it, and that in turn would trigger the firmware to detect 
the bad sector and replace it with one from its spare-sectors list.  As a 
result, it would tick up the raw value of attribute #5, 
Reallocated_Sector_Ct, as well as 182, Erase_Fail_Count_Total, in smartctl 
-A (tho the two attributes didn't increase in numeric lock-step, both 
were increasing over time, primarily when I ran scrubs).
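
In command terms that cycle was roughly (device name is a placeholder):

btrfs scrub start -Bd /mnt    # foreground scrub with per-device stats; rewrites bad copies
smartctl -A /dev/sdX | grep -E 'Reallocated_Sector_Ct|Erase_Fail_Count_Total'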


But it was mostly (almost entirely) when I ran the scrubs and 
consequently rewrote the corrupted sectors from the copy on the good 
device, that it would trigger those erase-fails and sector reallocations.

Anyway, the failing ssd's issues gradually got worse, until I was having 
to scrub and trigger both filesystem recopy and bad ssd sector rewrites 
any time I wrote anything major to the filesystem as well as at cold-boot 
(leaving the system off for several hours apparently accelerated the 
sector rot within stable data, while the powered-on state kept the flash 
cells charged high enough they didn't rot so fast and it was mostly or 
entirely new/changed data I had to worry about).  Eventually I simply 
decided I was tired of the now more or less constant hassle and I wasn't 
learning much new any more from the decaying device's behavior, and I 
replaced it.


Translating that to your case, if your caching ssd is dying and some 
sectors are now corrupted, unless there's a second btrfs copy of that 
block to copy over the bad version with, it's unlikely to trigger those 
sector reallocations.

Tho actually rewriting them (or at the device firmware level, COWing them 
and erasing the old erase-blocks), as bcache will be doing if it dumps 
the current cache content and fills those blocks with something else, 
should trigger the same thing, tho unless bcache can force-dump and 
recache or something, I don't believe there's a systematic way to trigger 
it over all cached data as btrfs scrub does.

Anyway, if I'm correct and as your ordering the new ssd indicates you may 
suspect as well, the problem may indeed be that ssd, and a new ssd 
(assuming /it/ isn't defective) should fix it, tho the existing damage on 
the existing btrfs may or may not be fully recoverable once you get a new 
ssd and thus don't have to worry about further damage from the old one.

Meanwhile, putting bcache into write-around mode, so it makes no further 
changes to the ssd and only uses it for reads, is probably wise, and 
should help limit further damage.  Tho if in that mode bcache still does 
writeback of existing dirty and cached data to the backing store, some 
further damage could occur from that.  But I don't know enough about 
bcache to know what its behavior and level of available configuration in 
that regard actually are.  As long as it's not trying to write anything 
from the ssd to the backing store, I think further damage should be very 
limited.

But were you running btrfs raid1 without bcache, or with multiple devices 
at the btrfs level, each bcached to a separate ssd so any rot wouldn't be 
likely to transfer between them and increase the chances of both copies 
being bad at once, I expect you'd be seeing behavior on your ssd very 
close to what I saw on my failing one.  And assuming your other device 
was fine, you could still be scrubbing and recovering fine, as I was, tho 
with the necessary frequency of scrubs increasing over time.  (That 
wasn't helped by the recently reported bug where too many csum errors on 
compressed content crash btrfs and the system, even when the data is on 
raid1 and should be recoverable from the other copy, thus requiring more 
frequent scrubs than would otherwise be needed.  I ran into this too, but 
didn't realize it only triggered on compressed content and was thus a 
specific bug; I simply attributed it to btrfs not yet being fully stable, 
believing that's what it always did with too many crc errors, even when 
they should be recoverable from the good raid1 copy.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-04  4:34                 ` Duncan
@ 2016-04-04 19:26                   ` Kai Krakow
  2016-04-05  1:44                     ` Duncan
  0 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2016-04-04 19:26 UTC (permalink / raw)
  To: linux-btrfs

Am Mon, 4 Apr 2016 04:34:54 +0000 (UTC)
schrieb Duncan <1i5t5.duncan@cox.net>:

> Meanwhile, putting bcache into write-around mode, so it makes no
> further changes to the ssd and only uses it for reads, is probably
> wise, and should help limit further damage.  Tho if in that mode
> bcache still does writeback of existing dirty and cached data to the
> backing store, some further damage could occur from that.  But I
> don't know enough about bcache to know what its behavior and level of
> available configuration in that regard actually are.  As long as it's
> not trying to write anything from the ssd to the backing store, I
> think further damage should be very limited.

bcache has 0 dirty data most of the time for me - even in write-back
mode. It does the write-back during idle time and at a reduced rate;
usually that finishes within a few minutes.

Switching the cache to write-around initiates instant write-back of all
dirty data, so within seconds it goes down to zero and the cache
becomes detachable.
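
For reference, that can be watched via sysfs, roughly (bcache0 being a
placeholder for the actual cache device):

cat /sys/block/bcache0/bcache/state       # clean / dirty / no cache
cat /sys/block/bcache0/bcache/dirty_data  # outstanding dirty data on the cache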

I'll go test the soon-to-die SSD as soon as it is replaced. I think
it's still far from failing with bitrot. It was overprovisioned by 30%
most of the time, with the spare space trimmed. It certainly should
have a lot of sectors left for wear levelling. In addition, smartctl
shows no sector errors at all - except for one attribute:
raw_read_error_rate. I'm not sure what all those counters tell me, but
that one I'm also seeing on hard disks which show absolutely no data
damage.

In fact, I see those counters on my hard disks too. But a dd of the
complete raw hard disk to /dev/null shows no sector errors. It seems
good. But well, putting 1 and 1 together: I currently see data damage.
But I guess that's unrelated.

Is there some documentation somewhere on what each of those attributes
technically means and how to read the raw values and threshold values?

I'm also seeing multi_zone_error_rate on my spinning rust.

According to the smartctl health check and the smartctl extended
selftest there are no problems at all - and the SMART error log is
empty. There has never been an ATA error in dmesg... No reallocated
sectors... From my naive view the drives still look good.

-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-04  0:51                 ` Chris Murphy
@ 2016-04-04 19:36                   ` Kai Krakow
  2016-04-04 19:57                     ` Chris Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2016-04-04 19:36 UTC (permalink / raw)
  To: linux-btrfs

Am Sun, 3 Apr 2016 18:51:07 -0600
schrieb Chris Murphy <lists@colorremedies.com>:

> > BTW: Is it possible to use my backup drive (it's btrfs single-data
> > dup-metadata, single device) as a seed device for my newly created
> > btrfs pool (raid0-data, raid1-metadata, three devices)?  
> 
> Yes.
> 
> I just tried doing the conversion to raid1 before and after seed
> removal, but with the small amount of data (4GiB) I can't tell a
> difference. It seems like -dconvert=raid with seed still connected
> makes two rw copies (i.e. there's a ro copy which is the original, and
> then two rw copies on 2 of the 3 devices I added all at the same time
> to the seed), and the 'btrfs dev remove' command to remove the seed
> happened immediately, suggested the prior balances had already
> migrated copies off the seed. This may or may not be optimal for your
> case.
> 
> Two gotchas.
> 
> I ran into this bug:
> btrfs fi usage crash when volume contains seed device
> https://bugzilla.kernel.org/show_bug.cgi?id=115851
> 
> And there is a phantom single chunk on one of the new rw devices that
> was added. Data,single: Size:1.00GiB, Used:0.00B
>    /dev/dm-8       1.00GiB
> 
> It's still there after the -dconvert=raid1 and separate -mconvert=raid
> and after seed device removal. A balance start without filters removes
> it, chances are had I used -dconvert=raid1,soft it would have vanished
> also but I didn't retest for that.

Good to know, thanks.

> > I guess the
> > seed source cannot be mounted or modified...  
> 
> ?

In the following sense: I should disable the automounter and backup job
for the seed device while I let my data migrate back to main storage in
the background...

My intention is to use fully my system while btrfs migrates the data
from seed to main storage. Then, afterwards I'd like to continue using
the seed device for backups.

I'd probably do the following:

1. create btrfs pool, attach seed
2. recreate my original subvolume structure by snapshotting the backup
   scratch area multiple times into each subvolume
3. rearrange the files in each subvolume to match their intended use by
   using rm and mv
4. reboot into full system
4. remove all left-over snapshots from the seed
5. remove (detach) the seed device
6. rebalance
7. switch bcache to write-back mode (or attach bcache only now)


-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-04 19:36                   ` Kai Krakow
@ 2016-04-04 19:57                     ` Chris Murphy
  2016-04-04 20:50                       ` Kai Krakow
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2016-04-04 19:57 UTC (permalink / raw)
  To: Kai Krakow; +Cc: Btrfs BTRFS

On Mon, Apr 4, 2016 at 1:36 PM, Kai Krakow <hurikhan77@gmail.com> wrote:

>
>> > I guess the
>> > seed source cannot be mounted or modified...
>>
>> ?
>
> In the following sense: I should disable the automounter and backup job
> for the seed device while I let my data migrate back to main storage in
> the background...

The sprout can be written to just fine by the backup; just understand
that the seed and sprout volume UUIDs are different. Your automounter
is probably looking for the seed's UUID, and the seed can only be
mounted ro. The sprout, however, can be mounted rw.

I would probably skip the automounter. Do the seed setup, mount it,
add all devices you're planning to add, then -o remount,rw,compress...,
and then activate the backup. But maybe your backup is also looking
for a UUID? If so, that needs to be updated first. Once the balance
-dconvert=raid1 and -mconvert=raid1 is finished, you can remove the
seed device. And now might be a good time to give the raid1 a new
label; I think it inherits the label of the seed, but I'm not certain
of this.
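
If it does inherit it, giving the sprout a new label afterwards is a
one-liner, e.g. (the label is a placeholder):

btrfs filesystem label /mnt newlabel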


> My intention is to use fully my system while btrfs migrates the data
> from seed to main storage. Then, afterwards I'd like to continue using
> the seed device for backups.
>
> I'd probably do the following:
>
> 1. create btrfs pool, attach seed

I don't understand that step in terms of commands. Sprouts are made
with btrfs dev add, not with mkfs. There is no pool creation. You make
a seed. You mount it. Add devices to it. Then remount it.
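
In command terms, roughly (device names and mount point are
placeholders):

btrfstune -S 1 /dev/backup    # set the seed flag on the (unmounted) backup device
mount /dev/backup /mnt        # a seed device mounts read-only
btrfs device add /dev/new1 /dev/new2 /dev/new3 /mnt
mount -o remount,rw /mnt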


> 2. recreate my original subvolume structure by snapshotting the backup
>    scratch area multiple times into each subvolume
> 3. rearrange the files in each subvolume to match their intended use by
>    using rm and mv
> 4. reboot into full system
> 4. remove all left-over snapshots from the seed
> 5. remove (detach) the seed device

You have two 4's.

Anyway the 2nd 4 is not possible. The seed is ro by definition so you
can't remove snapshots from the seed. If you remove them from the
mounted rw sprout volume, they're removed from the sprout, not the
seed. If you want them on the sprout, but not on the seed, you need to
delete snapshots only after the seed is a.) removed from the sprout
and b.) made no longer a seed with btrfstune -S 0 and c.) mounted rw.




-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-04 19:57                     ` Chris Murphy
@ 2016-04-04 20:50                       ` Kai Krakow
  2016-04-04 21:00                         ` Kai Krakow
  2016-04-04 23:09                         ` Chris Murphy
  0 siblings, 2 replies; 22+ messages in thread
From: Kai Krakow @ 2016-04-04 20:50 UTC (permalink / raw)
  To: linux-btrfs

Am Mon, 4 Apr 2016 13:57:50 -0600
schrieb Chris Murphy <lists@colorremedies.com>:

> On Mon, Apr 4, 2016 at 1:36 PM, Kai Krakow <hurikhan77@gmail.com>
> wrote:
> 
> >  
>  [...]  
> >>
> >> ?  
> >
> > In the following sense: I should disable the automounter and backup
> > job for the seed device while I let my data migrate back to main
> > storage in the background...  
> 
> The sprout can be written to just fine by the backup, just understand
> that the seed and sprout volume UUID are different. Your automounter
> is probably looking for the seed's UUID, and that seed can only be
> mounted ro. The sprout UUID however can be mounted rw.
> 
> I would probably skip the automounter. Do the seed setup, mount it,
> add all devices you're planning to add, then -o remount,rw,compress...
> , and then activate the backup. But maybe your backup also is looking
> for UUID? If so, that needs to be updated first. Once the balance
> -dconvert=raid1 and -mconvert=raid1 is finished, then you can remove
> the seed device. And now might be a good time to give the raid1 a new
> label, I think it inherits the label of the seed but I'm not certain
> of this.
> 
> 
> > My intention is to use fully my system while btrfs migrates the data
> > from seed to main storage. Then, afterwards I'd like to continue
> > using the seed device for backups.
> >
> > I'd probably do the following:
> >
> > 1. create btrfs pool, attach seed  
> 
> I don't understand that step in terms of commands. Sprouts are made
> with btrfs dev add, not with mkfs. There is no pool creation. You make
> a seed. You mount it. Add devices to it. Then remount it.

Hmm, yes. I haven't thought this through in detail yet. It actually
works that way. I was referring more to the general approach.

But I think this answers my question... ;-)

> > 2. recreate my original subvolume structure by snapshotting the
> > backup scratch area multiple times into each subvolume
> > 3. rearrange the files in each subvolume to match their intended
> > use by using rm and mv
> > 4. reboot into full system
> > 4. remove all left-over snapshots from the seed
> > 5. remove (detach) the seed device  
> 
> You have two 4's.

Oh... Sorry... I think one week of 80 work hours, and another of 60 was
a bit too much... ;-)

> Anyway the 2nd 4 is not possible. The seed is ro by definition so you
> can't remove snapshots from the seed. If you remove them from the
> mounted rw sprout volume, they're removed from the sprout, not the
> seed. If you want them on the sprout, but not on the seed, you need to
> delete snapshots only after the seed is a.) removed from the sprout
> and b.) made no longer a seed with btrfstune -S 0 and c.) mounted rw.

If I understand right, the seed device won't change? So whatever action
I apply to the sprout pool, I can later remove the seed from the pool
and it will still be essentially untouched. Except, I'll have to return
it to non-seed mode (step b).

Why couldn't/shouldn't I remove snapshots before detaching the seed
device? I want to keep them on the seed but they are useless to me on
the sprout.

What happens to the UUIDs when I separate seed and sprout?

This is my layout:

/dev/sde1 contains my backup storage: btrfs with multiple weeks' worth
of retention in the form of ro snapshots, and one scratch area in which
the backup is performed. Snapshots are created from the scratch area.
The scratch area is a single subvolume updated by rsync.

I want to turn this into a seed for my newly created btrfs pool. This
one has subvolumes for /home, /home/my_user, /distribution_name/rootfs
and a few more (like var/log etc).

Since the backup is not split by those subvolumes but contains just the
single runtime view of my system rootfs, I'm planning to clone this
single subvolume back into each of my previously used subvolumes, which
then of course all contain the same complete filesystem tree. Thus, in
the next step, I'm planning to mv/rm the contents to get back to the
original subvolume structure - mv should be a fast operation here, rm
probably not, but I don't mind. I could defer that until later by
moving the rm candidates into some trash folder per subvolume.
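
In commands, that's roughly (the subvolume paths are just the ones
described above, as placeholders):

mkdir -p /mnt/distribution_name
btrfs subvolume snapshot /mnt/scratch /mnt/distribution_name/rootfs
btrfs subvolume snapshot /mnt/scratch /mnt/home
btrfs subvolume snapshot /mnt/scratch /mnt/home/my_user
# then mv/rm inside each snapshot until it only holds its intended part of the tree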

Now, I still have the ro-snapshots worth of multiple weeks of
retention. I only need those in my backup storage, not in the storage
proposed to become my bootable system. So I'd simply remove them. I
could also defer that until later easily.

This should get my system back into a working state pretty quickly and
easily, if I didn't miss anything.

I'd now reboot into the system to see if it's working. By then, it's
time for some cleanup (remove the previously deferred "trashes" and
retention snapshots), then separate the seed from the sprout. During
that time, I could already use my system again while it's migrating for
me in the background.

I'd then return the seed back to non-seed, so it can take the role of
my backup storage again. I'd do a rebalance now.

During the whole process, the backup storage will still stay safe for
me. If something goes wrong, I could easily start over.

Did I miss something? Or is this too experimental?

BTW: The way it is arranged now, the backup storage is bootable by
setting the scratch area subvolume as the rootfs on the kernel cmdline;
USB drivers are built into the kernel, and it's tested and works. I
guess this isn't possible while the backup storage acts as a seed
device? But I have an initrd with the latest btrfs-progs on my boot
device (which is a UEFI ESP, so not related to btrfs at all), so I
should be able to use that to revert changes preventing me from
booting.

-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-04 20:50                       ` Kai Krakow
@ 2016-04-04 21:00                         ` Kai Krakow
  2016-04-04 23:09                         ` Chris Murphy
  1 sibling, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2016-04-04 21:00 UTC (permalink / raw)
  To: linux-btrfs

Am Mon, 4 Apr 2016 22:50:18 +0200
schrieb Kai Krakow <hurikhan77@gmail.com>:

> Am Mon, 4 Apr 2016 13:57:50 -0600
> schrieb Chris Murphy <lists@colorremedies.com>:
> 
> > On Mon, Apr 4, 2016 at 1:36 PM, Kai Krakow <hurikhan77@gmail.com>
> > wrote:
> >   
> > >    
> >  [...]    
>  [...]  
> > >
> > > In the following sense: I should disable the automounter and
> > > backup job for the seed device while I let my data migrate back
> > > to main storage in the background...    
> > 
> > The sprout can be written to just fine by the backup, just
> > understand that the seed and sprout volume UUID are different. Your
> > automounter is probably looking for the seed's UUID, and that seed
> > can only be mounted ro. The sprout UUID however can be mounted rw.
> > 
> > I would probably skip the automounter. Do the seed setup, mount it,
> > add all devices you're planning to add, then -o
> > remount,rw,compress... , and then activate the backup. But maybe
> > your backup also is looking for UUID? If so, that needs to be
> > updated first. Once the balance -dconvert=raid1 and -mconvert=raid1
> > is finished, then you can remove the seed device. And now might be
> > a good time to give the raid1 a new label, I think it inherits the
> > label of the seed but I'm not certain of this.
> > 
> >   
> > > My intention is to use fully my system while btrfs migrates the
> > > data from seed to main storage. Then, afterwards I'd like to
> > > continue using the seed device for backups.
> > >
> > > I'd probably do the following:
> > >
> > > 1. create btrfs pool, attach seed    
> > 
> > I don't understand that step in terms of commands. Sprouts are made
> > with btrfs dev add, not with mkfs. There is no pool creation. You
> > make a seed. You mount it. Add devices to it. Then remount it.  
> 
> Hmm, yes. I didn't think this through into detail yet. It actually
> works that way. I more commonly referenced to the general approach.
> 
> But I think this answers my question... ;-)
> 
> > > 2. recreate my original subvolume structure by snapshotting the
> > > backup scratch area multiple times into each subvolume
> > > 3. rearrange the files in each subvolume to match their intended
> > > use by using rm and mv
> > > 4. reboot into full system
> > > 4. remove all left-over snapshots from the seed
> > > 5. remove (detach) the seed device    
> > 
> > You have two 4's.  
> 
> Oh... Sorry... I think one week of 80 work hours, and another of 60
> was a bit too much... ;-)
> 
> > Anyway the 2nd 4 is not possible. The seed is ro by definition so
> > you can't remove snapshots from the seed. If you remove them from
> > the mounted rw sprout volume, they're removed from the sprout, not
> > the seed. If you want them on the sprout, but not on the seed, you
> > need to delete snapshots only after the seed is a.) removed from
> > the sprout and b.) made no longer a seed with btrfstune -S 0 and
> > c.) mounted rw.  
> 
> If I understand right, the seed device won't change? So whatever
> action I apply to the sprout pool, I can later remove the seed from
> the pool and it will still be kind of untouched. Except, I'll have to
> return it no non-seed mode (step b).
> 
> Why couldn't/shouldn't I remove snapshots before detaching the seed
> device? I want to keep them on the seed but they are useless to me on
> the sprout.
> 
> What happens to the UUIDs when I separate seed and sprout?
> 
> This is my layout:
> 
> /dev/sde1 contains my backup storage: btrfs with multiple weeks worth
> of retention in form of ro snapshots, and one scratch area in which
> the backup is performed. Snapshots are created from the scratch area.
> The scratch area is one single subvolume updated by rsync.
> 
> I want to turn this into a seed for my newly created btrfs pool. This
> one has subvolumes for /home, /home/my_user, /distribution_name/rootfs
> and a few more (like var/log etc).
> 
> Since the backup is not split by those subvolumes but contains just
> the single runtime view of my system rootfs, I'm planning to clone
> this single subvolume back into each of my previously used subvolumes
> which in turn of course now contain all the same complete filesystem
> tree. Thus, in the next step, I'm planning to mv/rm the contents to
> get back to the original subvolume structure - mv should be a fast
> operation here, rm probably not so but I don't bother. I could defer
> that until later by moving those rm-candidates into some trash folder
> per subvolume.
> 
> Now, I still have the ro-snapshots worth of multiple weeks of
> retention. I only need those in my backup storage, not in the storage
> proposed to become my bootable system. So I'd simply remove them. I
> could also defer that until later easily.
> 
> This should get my system back into working state pretty fast and
> easily if I didn't miss a point.
> 
> I'd now reboot into the system to see if it's working. By then, it's
> time for some cleanup (remove the previously deferred "trashes" and
> retention snapshots), then separate the seed from the sprout. During
> that time, I could already use my system again while it's migrating
> for me in the background.
> 
> I'd then return the seed back to non-seed, so it can take the role of
> my backup storage again. I'd do a rebalance now.
> 
> During the whole process, the backup storage will still stay safe for
> me. If something goes wrong, I could easily start over.
> 
> Did I miss something? Is it too much of an experimental kind of stuff?
> 
> BTW: The way it is arranged now, the backup storage is bootable by
> setting the scratch area subvolume as the rootfs on kernel cmdline,
> USB drivers are included in the kernel, it's tested and works. I
> guess, this isn't possible while the backup storage acts as a seed
> device? But I have an initrd with latest btrfs-progs on my boot
> device (which is an UEFI ESP, so not related to btrfs at all), I
> should be able to use that to revert changes preventing me from
> booting.

The whole idea of this is to think of it as a sort of thin provisioning
of my system from the backup storage, then letting btrfs do the work
for me. It saves me from spending 40 hours copying data back while
being unable to use the system.


-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-04 20:50                       ` Kai Krakow
  2016-04-04 21:00                         ` Kai Krakow
@ 2016-04-04 23:09                         ` Chris Murphy
  2016-04-05  7:05                           ` Kai Krakow
  1 sibling, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2016-04-04 23:09 UTC (permalink / raw)
  To: Kai Krakow; +Cc: Btrfs BTRFS

On Mon, Apr 4, 2016 at 2:50 PM, Kai Krakow <hurikhan77@gmail.com> wrote:

>> Anyway the 2nd 4 is not possible. The seed is ro by definition so you
>> can't remove snapshots from the seed. If you remove them from the
>> mounted rw sprout volume, they're removed from the sprout, not the
>> seed. If you want them on the sprout, but not on the seed, you need to
>> delete snapshots only after the seed is a.) removed from the sprout
>> and b.) made no longer a seed with btrfstune -S 0 and c.) mounted rw.
>
> If I understand right, the seed device won't change? So whatever action
> I apply to the sprout pool, I can later remove the seed from the pool
> and it will still be kind of untouched. Except, I'll have to return it
> to non-seed mode (step b).

Correct. In a sense, making a volume a seed is like making it a
volume-wide read-only snapshot. Any changes are applied via COW only
to added device(s).

>
> Why couldn't/shouldn't I remove snapshots before detaching the seed
> device? I want to keep them on the seed but they are useless to me on
> the sprout.

You can remove snapshots before or after detaching the seed device; it
doesn't matter, but such snapshot removal only affects the sprout. You
wrote:

"remove all left-over snapshots from the seed"

The seed is read only, you can't modify the contents of the seed device.

What you should do is just delete the snapshots you don't want
migrated over to the sprout right away before you even do the balance
-dconvert -mconvert. That way you aren't wasting time moving things
over that you don't want. To be clear:

btrfstune -S 1 /dev/seed      # set the seed flag (device must be unmounted)
mount /dev/seed /mnt/         # a seed device mounts read-only
btrfs dev add /dev/new1 /mnt/
btrfs dev add /dev/new2 /mnt/
mount -o remount,rw /mnt/
btrfs sub del /mnt/blah/ /mnt/blah2/ /mnt/blah3/ /mnt/blah4/
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/
btrfs dev del /dev/seed /mnt/

If you're doing any backups once remounting rw, note those backups
will only be on the sprout. Backups will not be on the seed because
it's read-only.


>
> What happens to the UUIDs when I separate seed and sprout?

Nothing. They remain intact and unique, per volume.




>
> I'd now reboot into the system to see if it's working.

Note you'll need to change grub.cfg, possibly fstab, and possibly the
initramfs, all three of which may be referencing the old volume.
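
E.g., roughly (the device is a placeholder; fstab and grub.cfg live
wherever your setup keeps them):

blkid -s UUID /dev/new1   # the sprout's UUID, which fstab/grub should now reference
grep UUID= /etc/fstab     # entries that may still point at the old volume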


> By then, it's
> time for some cleanup (remove the previously deferred "trashes" and
> retention snapshots), then separate the seed from the sprout. During
> that time, I could already use my system again while it's migrating for
> me in the background.
>
> I'd then return the seed back to non-seed, so it can take the role of
> my backup storage again. I'd do a rebalance now.

OK? I don't know why you need to balance the seed at all, let alone
afterward, but it seems like it might be a more efficient replication
if you balanced before making it a seed?


>
> During the whole process, the backup storage will still stay safe for
> me. If something goes wrong, I could easily start over.
>
> Did I miss something? Is it too much of an experimental kind of stuff?

I'm not sure where all the bugs are. It's good to find bugs though and
get them squashed. I have an idea of making live media use Btrfs
instead of using a loop mounted file to back a rw lvm snapshot device
(persistent overlay), which I think is really fragile and a lot more
complicated in the initramfs. It's also good to take advantage of
checksumming after having written an ISO to flash media; users often
don't verify, or something can mount the USB stick rw and immediately
modify it in such a way that media verification would fail anyway. So,
a number of plusses; I'd like to see the seed device be robust.


>
> BTW: The way it is arranged now, the backup storage is bootable by
> setting the scratch area subvolume as the rootfs on kernel cmdline,
> USB drivers are included in the kernel, it's tested and works. I guess,
> this isn't possible while the backup storage acts as a seed device? But
> I have an initrd with latest btrfs-progs on my boot device (which is an
> UEFI ESP, so not related to btrfs at all), I should be able to use that
> to revert changes preventing me from booting.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-04 19:26                   ` Kai Krakow
@ 2016-04-05  1:44                     ` Duncan
  0 siblings, 0 replies; 22+ messages in thread
From: Duncan @ 2016-04-05  1:44 UTC (permalink / raw)
  To: linux-btrfs

Kai Krakow posted on Mon, 04 Apr 2016 21:26:28 +0200 as excerpted:

> I'll go test the soon-to-die SSD as soon as it replaced. I think it's
> still far from failing with bitrot. It was overprovisioned by 30% most
> of the time, with the spare space trimmed.

Same here, FWIW.  In fact, I had expected to get ~128 GB SSDs and ended 
up getting 256 GB, such that I was only using about 130 GiB, so depending 
on what the overprovisioning percentage is calculated against, I was and 
am near 50% or 100% overprovisioned.

So in my case I think the SSD was simply defective, such that the 
overprovisioning and trim simply didn't help.  Tho the other two devices 
of identical brand and model, bought from the same store at the same time 
and thus very likely from the same manufacturing lot, were and are just 
fine.  (One of them is showing a trivial non-zero raw value for 5, 
reallocated sector count, and 182, erase fail count total, tho both 
attributes remain at 100% "cooked" value; the other one, actually the one 
of the original pair that wasn't replaced, shows absolutely no issues at 
all.)

But based on that experience, while overprovisioning may help in terms of 
normal wearout, it doesn't necessarily help at all if the device is 
actually going bad.

> It certainly should have a
> lot of sectors for wear levelling. In addition, smartctl shows no sector
> errors at all - except for one: raw_read_error_rate. I'm not sure what
> all those sensors tell me, but that one I'm also seeing on hard disks
> which show absolutely no data damage.
> 
> In fact, I see those counters for my hard disks. But dd to /dev/null of
> the complete raw hard disk shows no sector errors. It seems good. But
> well, counting 1+1 together: I currently see data damage. But I guess
> that's unrelated.
> 
> Is there some documentation somewhere what each of those sensors
> technically mean and how to read the raw values and thresh values?

Nothing user/admin level that I'm aware of.  I'm sure there's some smart 
docs somewhere that describe them as part of the standard, but they could 
easily be effectively unavailable for those unwilling to pay a big-
corporate-sized consortium membership fee (as was the case with one of 
the CompactDisc specs, Orange Book IIRC, at one point).

I know there's some discussion by allusion in the smartctl manpage and 
docs, but many attributes appear to be manufacturer specific and/or to 
have been reverse-engineered by the smartctl devs, meaning even /they/ 
don't really have access to proper documentation for at least some 
attributes.

Which is sad, but in a majority proprietary or at best don't-care 
market...

> I'm also seeing multi_zone_error_rate on my spinning rust.

> According to smartctl health check and smartctl extended selftest,
> there's no problems at all - and the smart error log is empty. There has
> never been an ATA error in dmesg... No relocated sectors... From my
> naive view the drives still look good.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: btrfsck: backpointer mismatch (and multiple other errors)
  2016-04-04 23:09                         ` Chris Murphy
@ 2016-04-05  7:05                           ` Kai Krakow
  0 siblings, 0 replies; 22+ messages in thread
From: Kai Krakow @ 2016-04-05  7:05 UTC (permalink / raw)
  To: linux-btrfs

Am Mon, 4 Apr 2016 17:09:14 -0600
schrieb Chris Murphy <lists@colorremedies.com>:

> > Why couldn't/shouldn't I remove snapshots before detaching the seed
> > device? I want to keep them on the seed but they are useless to me
> > on the sprout.  
> 
> You can remove snapshots before or after detaching the seed device, it
> doesn't matter, but such snapshot removal only affects the sprout. You
> wrote:
> 
> "remove all left-over snapshots from the seed"
> 
> The seed is read only, you can't modify the contents of the seed
> device.

Sorry, not a native speaker... What I actually meant was to remove the
snapshots that originated from the seed and which I don't need on the
sprout.

-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-04-05  7:05 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-31 20:44 btrfsck: backpointer mismatch (and multiple other errors) Kai Krakow
2016-03-31 23:27 ` Henk Slager
2016-04-01  1:10   ` Qu Wenruo
2016-04-02  8:47     ` Kai Krakow
2016-04-02  9:00   ` Kai Krakow
2016-04-02 17:17     ` Henk Slager
2016-04-02 20:16       ` Kai Krakow
2016-04-03  0:14         ` Chris Murphy
2016-04-03  4:02           ` Kai Krakow
2016-04-03  5:06             ` Duncan
2016-04-03 22:19               ` Kai Krakow
2016-04-04  0:51                 ` Chris Murphy
2016-04-04 19:36                   ` Kai Krakow
2016-04-04 19:57                     ` Chris Murphy
2016-04-04 20:50                       ` Kai Krakow
2016-04-04 21:00                         ` Kai Krakow
2016-04-04 23:09                         ` Chris Murphy
2016-04-05  7:05                           ` Kai Krakow
2016-04-04  4:34                 ` Duncan
2016-04-04 19:26                   ` Kai Krakow
2016-04-05  1:44                     ` Duncan
2016-04-03 19:03             ` Chris Murphy
