* corruption: yet another one after deleting a ro snapshot
From: Christoph Anton Mitterer @ 2017-01-12  1:07 UTC
  To: linux-btrfs


Hey.

Linux heisenberg 4.8.0-2-amd64 #1 SMP Debian 4.8.15-2 (2017-01-04)
x86_64 GNU/Linux
btrfs-progs v4.7.3

I've had this happen at least once before, about a year ago:

I was doing backups (incremental via send/receive).
After everything was copied, I unmounted the destination fs, ran a
fsck, and all was fine.
Then I mounted it again and did nothing but delete the old snapshot.
After that, another fsck gave the following errors:


Usually I have quite positive experiences with btrfs (things seem to be
fine even after a crash or accidental removal of the USB cable which
attaches the HDD)... but I'm shocked every time supposedly simple and
basic operations like this cause such corruptions.
It kinda gives one the feeling that quite deep bugs are still in place
everywhere, especially as such "hard to explain" errors happen every now
and then (take e.g. my mails "strange btrfs deadlock" and "csum errors
during btrfs check" from the last days... and I don't seem to be the
only one who suffers from such problems, even with the basic parts of
btrfs which are considered stable - I mean we're not talking about
RAID56 here)... sigh :-(


While these files are precious, I have copies of all of them in total:
3 on btrfs and 1 on ext4 (just to be on the safe side if btrfs gets
corrupted for no good reason :-( )... so I could do some debugging
here if some developer tells me what to do.


Anyway... what should I do to repair the fs? Or is it better to simply
re-create that backup from scratch?


Cheers,
Chris.



* Re: corruption: yet another one after deleting a ro snapshot
From: Christoph Anton Mitterer @ 2017-01-12  1:13 UTC
  To: linux-btrfs


Oops... I forgot to copy and paste the actual fsck output O:-)
# btrfs check /dev/mapper/data-a3 ; echo $?
Checking filesystem on /dev/mapper/data-a3
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ref mismatch on [37765120 16384] extent item 0, found 1
Backref 37765120 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [37765120 16384]
owner ref check failed [37765120 16384]
ref mismatch on [51200000 16384] extent item 0, found 1
Backref 51200000 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [51200000 16384]
owner ref check failed [51200000 16384]
ref mismatch on [78135296 16384] extent item 0, found 1
Backref 78135296 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [78135296 16384]
owner ref check failed [78135296 16384]
ref mismatch on [5960381235200 16384] extent item 0, found 1
Backref 5960381235200 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [5960381235200 16384]
checking free space cache
checking fs roots
checking csums
checking root refs
found 7483995824128 bytes used err is 0
total csum bytes: 7296183880
total tree bytes: 10875944960
total fs tree bytes: 2035286016
total extent tree bytes: 1015988224
btree space waste bytes: 920641324
file data blocks allocated: 8267656339456
 referenced 8389440876544
0



Also I've found the previous occasion of the apparently same issue:
https://www.spinics.net/lists/linux-btrfs/msg45190.html


What's the suggested way of reporting bugs? Here on the list?
kernel.org bugzilla?
It's a bit worrying that I myself alone have reported quite a number
of likely bugs here on the ML which never got a reaction from a
developer and thus likely still sleep under the hood :-/


Cheers,
Chris.



* Re: corruption: yet another one after deleting a ro snapshot
From: Qu Wenruo @ 2017-01-12  1:25 UTC
  To: Christoph Anton Mitterer, linux-btrfs



At 01/12/2017 09:07 AM, Christoph Anton Mitterer wrote:
> Hey.
>
> Linux heisenberg 4.8.0-2-amd64 #1 SMP Debian 4.8.15-2 (2017-01-04)
> x86_64 GNU/Linux
> btrfs-progs v4.7.3
>
> I've had this already at least once some year ago or so:
>
> I was doing backups (incremental via send/receive).
> After everything was copied, I unmounted the destination fs, made a
> fsck, all fine.
> Then I mounted it again and did nothing but deleting the old snapshot.
> After that, another fsck with the following errors:
>

According to the messages, some tree blocks have wrong extent backrefs.

And since you deleted a subvolume and unmounted soon afterwards, I
assume the btrfs is still doing background subvolume deletion; maybe
it's just a false alert from btrfsck.

Would you please try btrfs check --mode=lowmem using latest btrfs-progs?

Sometimes bugs in original mode are fixed in lowmem mode.

And it's also recommended to call btrfs fi sync, then wait for some time 
(depending on the subvolume size) to allow btrfs to fully delete the 
subvolume, then try btrfsck.
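
For example, something along these lines; the mount point here is just a
placeholder, and "btrfs subvolume sync" can replace the blind waiting if
your btrfs-progs version has it:

# btrfs check --mode=lowmem /dev/mapper/data-a3 ; echo $?
# mount /dev/mapper/data-a3 /mnt/backup
# btrfs filesystem sync /mnt/backup
# btrfs subvolume sync /mnt/backup
    (waits until deleted subvolumes are fully cleaned)
# umount /mnt/backup
# btrfs check /dev/mapper/data-a3 ; echo $?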

Thanks,
Qu
>
> Usually I have quite positive experiences with btrfs (things seem to be
> fine even after a crash or accidental removal of the USB cable which
> attaches the HDD)... but I'm every time shocked again, when supposedly
> simple and basic operations like this cause such corruptions.
> Kinda gives one the feeling as if quite deep bugs are still everywhere
> in place, especially as such "hard to explain" errors happens every now
> and then (take e.g. my mails "strange btrfs deadlock", "csum errors
> during btrfs check" from the last days... and I don't seem to be the
> only one who suffers from such problems, even with the basic parts of
> btrfs which are considered to be stable - I mean we're not talking
> about RAID56 here)... sigh :-(
>
>
> While these files are precious, I have in total copies of all these
> files, 3 on btrfs and 1 on ext4 (just to be on the safe side if btrfs
> gets corrupted for no good reason :-( ).... so I could do some
> debugging here if some developer tells me what to do.
>
>
> Anyway... what should I do to repair the fs? Or is it better to simply
> re-create that backup from scratch?
>
>
> Cheers,
> Chris.
>




* Re: corruption: yet another one after deleting a ro snapshot
From: Christoph Anton Mitterer @ 2017-01-12  2:28 UTC
  To: Qu Wenruo, linux-btrfs


Hey Qu,

On Thu, 2017-01-12 at 09:25 +0800, Qu Wenruo wrote:
> And since you just deleted a subvolume and unmount it soon
Indeed, I unmounted it pretty quickly afterwards...

I had mounted it (ro) in the meantime, and did a whole
find mountpoint > /dev/null
on it, just to see whether walking the file hierarchy already causes any
kernel errors.
There are about 1.2 million files on the fs (in now only one snapshot)
and that took some 3-5 mins...
Not sure whether it continues to delete the subvol when it's mounted
ro... if so, it would have had some time.

However, another fsck afterwards:
# btrfs check /dev/mapper/data-a3 ; echo $?
Checking filesystem on /dev/mapper/data-a3
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ref mismatch on [37765120 16384] extent item 0, found 1
Backref 37765120 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [37765120 16384]
owner ref check failed [37765120 16384]
ref mismatch on [51200000 16384] extent item 0, found 1
Backref 51200000 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [51200000 16384]
owner ref check failed [51200000 16384]
ref mismatch on [78135296 16384] extent item 0, found 1
Backref 78135296 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [78135296 16384]
owner ref check failed [78135296 16384]
ref mismatch on [5960381235200 16384] extent item 0, found 1
Backref 5960381235200 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [5960381235200 16384]
checking free space cache
checking fs roots
checking csums
checking root refs
found 7483995824128 bytes used err is 0
total csum bytes: 7296183880
total tree bytes: 10875944960
total fs tree bytes: 2035286016
total extent tree bytes: 1015988224
btree space waste bytes: 920641324
file data blocks allocated: 8267656339456
 referenced 8389440876544
0


> , I assume
> the 
> btrfs is still doing background subvolume deletion, maybe it's just
> a 
> false alert from btrfsck.
If one deletes a subvol and unmounts too quickly, will this already
cause corruption, or does btrfs simply continue the cleanup during the
next time(s) it's mounted?



> Would you please try btrfs check --mode=lowmem using latest btrfs-
> progs?
Here we go, though still with v4.7.3:

# btrfs check --mode=lowmem /dev/mapper/data-a3 ; echo $?
Checking filesystem on /dev/mapper/data-a3
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0
ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0
ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0
ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0
ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0
ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112
ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items used 1074266112
ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6997067956224 1073741824] used 1073741824 but extent items used 0
ERROR: block group[7702516334592 1073741824] used 1073741824 but extent items used 0
ERROR: block group[8051482427392 1073741824] used 1073741824 but extent items used 1084751872
ERROR: block group[8116980678656 1073741824] used 1073217536 but extent items used 0
ERROR: extent[5960381235200 16384] backref lost (owner: 6403, level: 1)
ERROR: check node failed root 6403 bytenr 5960381235200 level 1, force continue check
ERROR: extent[51200000 16384] backref lost (owner: 257, level: 1)
ERROR: check node failed root 6403 bytenr 51200000 level 1, force continue check
ERROR: extent[37765120 16384] backref lost (owner: 257, level: 1)
ERROR: check node failed root 6403 bytenr 37765120 level 1, force continue check
ERROR: extent[78135296 16384] backref lost (owner: 257, level: 1)
ERROR: check node failed root 6403 bytenr 78135296 level 1, force continue check
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
checking csums
checking root refs
found 7483995758592 bytes used err is 0
total csum bytes: 7296183880
total tree bytes: 11018780672
total fs tree bytes: 2178121728
total extent tree bytes: 1015988224
btree space waste bytes: 936782513
file data blocks allocated: 9157658292224
 referenced 9292573106176
0


btw: even if these may be false positives... shouldn't btrfs check
return non-zero whenever an error might have been found?! Seems like
another bug...



For policy reasons it's a bit problematic here to simply compile my own
btrfs-progs from git master... so I could either leave it at 4.7.3
(which is probably of little help to you) and mount the fs rw for a
while now, to see whether the errors still occur after, say, 15 mins
(by which time it should have had time to delete the subvol)... or we
shelve this until 4.9.something hits Debian.
What would you prefer?


> And it's also recommended to call btrfs fi sync, then wait for some
> time 
> (depending on the subvolume size) to allow btrfs to fully delete the 
> subvolume, then try btrfsck.

Shouldn't it do these things automatically on unmount (or probably even
on remount,ro)?!
I mean, a normal user will never know these steps are necessary... and
"some time" is also pretty unspecific, even if one knows about them.


Cheers,
Chris.



* Re: corruption: yet another one after deleting a ro snapshot
From: Qu Wenruo @ 2017-01-12  2:38 UTC
  To: Christoph Anton Mitterer, linux-btrfs



At 01/12/2017 10:28 AM, Christoph Anton Mitterer wrote:
> Hey Qu,
>
> On Thu, 2017-01-12 at 09:25 +0800, Qu Wenruo wrote:
>> And since you just deleted a subvolume and unmount it soon
> Indeed, I unmounted it pretty quickly afterwards...
>
> I had mounted it (ro) in the meantime, and did a whole
> find mntoint > /dev/null
> on it just to see whether going through the file hierarchy causes any
> kernel errors already.
> There are about 1,2 million files on the fs (in now only one snapshot)
> and that took some 3-5 mins...
> Not sure whether it continues to delete the subvol when it's mounted
> ro... if so, it would have had some time.

IIRC, RO mount won't continue background deletion.

So the fsck result won't change.
>
> However, another fsck afterwards:
> # btrfs check /dev/mapper/data-a3 ; echo $?
> Checking filesystem on /dev/mapper/data-a3
> UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
> checking extents
> ref mismatch on [37765120 16384] extent item 0, found 1
> Backref 37765120 parent 6403 root 6403 not found in extent tree
> backpointer mismatch on [37765120 16384]
> owner ref check failed [37765120 16384]
> ref mismatch on [51200000 16384] extent item 0, found 1
> Backref 51200000 parent 6403 root 6403 not found in extent tree
> backpointer mismatch on [51200000 16384]
> owner ref check failed [51200000 16384]
> ref mismatch on [78135296 16384] extent item 0, found 1
> Backref 78135296 parent 6403 root 6403 not found in extent tree
> backpointer mismatch on [78135296 16384]
> owner ref check failed [78135296 16384]
> ref mismatch on [5960381235200 16384] extent item 0, found 1
> Backref 5960381235200 parent 6403 root 6403 not found in extent tree
> backpointer mismatch on [5960381235200 16384]
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 7483995824128 bytes used err is 0
> total csum bytes: 7296183880
> total tree bytes: 10875944960
> total fs tree bytes: 2035286016
> total extent tree bytes: 1015988224
> btree space waste bytes: 920641324
> file data blocks allocated: 8267656339456
>  referenced 8389440876544
> 0
>
>
>> , I assume
>> the
>> btrfs is still doing background subvolume deletion, maybe it's just
>> a
>> false alert from btrfsck.
> If one deleted a subvol and unmounts too fast, will this already cause
> a corruption or does btrfs simply continue to cleanup during the next
> time(s) it's mounted?

It will continue the deletion on next RW mount.

But, I'm still not sure whether it's a false alert or a *REAL* corruption.

Even though it may cause problems and corrupt your data, I still hope
you could do a rw mount and trigger a btrfs fi sync.
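
If you want to see whether the deleted subvolume is still queued for
cleaning after the rw mount, something like this should work (mount
point is a placeholder; the -d option needs a reasonably recent progs):

# mount /dev/mapper/data-a3 /mnt/backup
# btrfs subvolume list -d /mnt/backup
    (lists deleted subvolumes that are not yet cleaned)
# btrfs filesystem sync /mnt/backup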

If it's a false alert, we can then fix it with ease.
Otherwise, it's a really big problem.

>
>
>
>> Would you please try btrfs check --mode=lowmem using latest btrfs-
>> progs?
> Here we go, however still with v4.7.3:
>
> # btrfs check --mode=lowmem /dev/mapper/data-a3 ; echo $?
> Checking filesystem on /dev/mapper/data-a3
> UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
> checking extents
> ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0

Errr, lowmem mode is much stricter in this case then.
Quite a few block groups have mismatched used space.

But given that the used numbers are all the same, I assume it's a lowmem mode bug.

Would you please try 4.9 btrfs-progs?

> ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112
> ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items used 1074266112
> ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[6997067956224 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[7702516334592 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[8051482427392 1073741824] used 1073741824 but extent items used 1084751872
> ERROR: block group[8116980678656 1073741824] used 1073217536 but extent items used 0
> ERROR: extent[5960381235200 16384] backref lost (owner: 6403, level: 1)

So it's the same missing backref.
Quite possibly a kernel bug then.

> ERROR: check node failed root 6403 bytenr 5960381235200 level 1, force continue check
> ERROR: extent[51200000 16384] backref lost (owner: 257, level: 1)
> ERROR: check node failed root 6403 bytenr 51200000 level 1, force continue check
> ERROR: extent[37765120 16384] backref lost (owner: 257, level: 1)
> ERROR: check node failed root 6403 bytenr 37765120 level 1, force continue check
> ERROR: extent[78135296 16384] backref lost (owner: 257, level: 1)
> ERROR: check node failed root 6403 bytenr 78135296 level 1, force continue check
> Errors found in extent allocation tree or chunk allocation
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 7483995758592 bytes used err is 0
> total csum bytes: 7296183880
> total tree bytes: 11018780672
> total fs tree bytes: 2178121728
> total extent tree bytes: 1015988224
> btree space waste bytes: 936782513
> file data blocks allocated: 9157658292224
>  referenced 9292573106176
> 0
>
>
> btw: even if these may be false positives... shouldn't btrfs-check
> return non-zero in any case an error might have been found?! Seems like
> another bug...

IIRC that's fixed in the latest btrfs-progs.
So it's always recommended to use the latest btrfs-progs.

Thanks,
Qu

>
>
>
> For policy reasons here it's a bit problematic to simply compile my own
>  btrfs-progs from git master... so I could either go leave it with just
> 4.7.3 (which is probably little helpful for you) and mount the fs now
> rw for a while, see whether the errors still occur after say 15 mins
> (where it should have had time to delete the subvol)... or  we shelve
> this until 4.9.something hit Debian.
> What would you prefer?
>
>
>> And it's also recommended to call btrfs fi sync, then wait for some
>> time
>> (depending on the subvolume size) to allow btrfs to fully delete the
>> subvolume, then try btrfsck.
>
> Shouldn't it do these things automatically on unmount (or probably even
> remount,ro)?!
> I mean a normal user will never know about the necessity of these
> steps,... and "some time" is also pretty unspecific even if one knows
> about it.
>
>
> Cheers,
> Chris.
>




* Re: corruption: yet another one after deleting a ro snapshot
From: Christoph Anton Mitterer @ 2017-01-15 17:04 UTC
  To: Qu Wenruo, linux-btrfs


On Thu, 2017-01-12 at 10:38 +0800, Qu Wenruo wrote:
> IIRC, RO mount won't continue background deletion.
I see.


> Would you please try 4.9 btrfs-progs?

Done now, see results (lowmem and original mode) below:

# btrfs version
btrfs-progs v4.9

# btrfs check /dev/nbd0 ; echo $?
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ref mismatch on [37765120 16384] extent item 0, found 1
Backref 37765120 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [37765120 16384]
owner ref check failed [37765120 16384]
ref mismatch on [51200000 16384] extent item 0, found 1
Backref 51200000 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [51200000 16384]
owner ref check failed [51200000 16384]
ref mismatch on [78135296 16384] extent item 0, found 1
Backref 78135296 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [78135296 16384]
owner ref check failed [78135296 16384]
ref mismatch on [5960381235200 16384] extent item 0, found 1
Backref 5960381235200 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [5960381235200 16384]
checking free space cache
checking fs roots
checking csums
checking root refs
found 7483995824128 bytes used err is 0
total csum bytes: 7296183880
total tree bytes: 10875944960
total fs tree bytes: 2035286016
total extent tree bytes: 1015988224
btree space waste bytes: 920641324
file data blocks allocated: 8267656339456
 referenced 8389440876544
0


# btrfs check --mode=lowmem /dev/nbd0 ; echo $?
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0
ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0
ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0
ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0
ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0
ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112
ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items used 1074266112
ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6997067956224 1073741824] used 1073741824 but extent items used 0
ERROR: block group[7702516334592 1073741824] used 1073741824 but extent items used 0
ERROR: block group[8051482427392 1073741824] used 1073741824 but extent items used 1084751872
ERROR: block group[8116980678656 1073741824] used 1073217536 but extent items used 0
ERROR: extent[5960381235200 16384] backref lost (owner: 6403, level: 1)
ERROR: check node failed root 6403 bytenr 5960381235200 level 1, force continue check
ERROR: extent[51200000 16384] backref lost (owner: 257, level: 1)
ERROR: check node failed root 6403 bytenr 51200000 level 1, force continue check
ERROR: extent[37765120 16384] backref lost (owner: 257, level: 1)
ERROR: check node failed root 6403 bytenr 37765120 level 1, force continue check
ERROR: extent[78135296 16384] backref lost (owner: 257, level: 1)
ERROR: check node failed root 6403 bytenr 78135296 level 1, force continue check
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
found 7483995758592 bytes used err is -5
total csum bytes: 7296183880
total tree bytes: 11018780672
total fs tree bytes: 2178121728
total extent tree bytes: 1015988224
btree space waste bytes: 936782513
file data blocks allocated: 9157658292224
 referenced 9292573106176
1


It's the same fs I was talking about before; I just had to access it via NBD.


> But, I'm still not sure whether it's a false alert or a *REAL*
> corruption.
> 
> Even it may cause problem and corrupt your data, I still hope you
> could 
> do a rw mount and trigger a btrfs fi sync.
> 
> If it's a false alert, we can fix it then with ease.
> Or, it's a really big problem.

So what should I do next in terms of debugging this issue? (As said,
this is just a backup for me, so while it would be annoying to recreate
it (nearly the full 8TiB), I could do it without any problem.)

Shall I rw-mount the fs, do a sync, wait and retry? Or is there
anything else that you want me to try beforehand in order to get the
kernel bug (if any) or btrfs-progs bug nailed down?


Cheers,
Chris.



* Re: corruption: yet another one after deleting a ro snapshot
From: Qu Wenruo @ 2017-01-16  1:38 UTC
  To: Christoph Anton Mitterer, linux-btrfs



At 01/16/2017 01:04 AM, Christoph Anton Mitterer wrote:
> On Thu, 2017-01-12 at 10:38 +0800, Qu Wenruo wrote:
>> IIRC, RO mount won't continue background deletion.
> I see.
>
>
>> Would you please try 4.9 btrfs-progs?
>
> Done now, see results (lowmem and original mode) below:
>
> # btrfs version
> btrfs-progs v4.9
>
> # btrfs check /dev/nbd0 ; echo $?
> Checking filesystem on /dev/nbd0
> UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
> checking extents
> ref mismatch on [37765120 16384] extent item 0, found 1
> Backref 37765120 parent 6403 root 6403 not found in extent tree
> backpointer mismatch on [37765120 16384]
> owner ref check failed [37765120 16384]
> ref mismatch on [51200000 16384] extent item 0, found 1
> Backref 51200000 parent 6403 root 6403 not found in extent tree
> backpointer mismatch on [51200000 16384]
> owner ref check failed [51200000 16384]
> ref mismatch on [78135296 16384] extent item 0, found 1
> Backref 78135296 parent 6403 root 6403 not found in extent tree
> backpointer mismatch on [78135296 16384]
> owner ref check failed [78135296 16384]
> ref mismatch on [5960381235200 16384] extent item 0, found 1
> Backref 5960381235200 parent 6403 root 6403 not found in extent tree
> backpointer mismatch on [5960381235200 16384]
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 7483995824128 bytes used err is 0
> total csum bytes: 7296183880
> total tree bytes: 10875944960
> total fs tree bytes: 2035286016
> total extent tree bytes: 1015988224
> btree space waste bytes: 920641324
> file data blocks allocated: 8267656339456
>  referenced 8389440876544
> 0
>
>
> # btrfs check --mode=lowmem /dev/nbd0 ; echo $?
> Checking filesystem on /dev/nbd0
> UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
> checking extents
> ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112
> ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items used 1074266112
> ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[6997067956224 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[7702516334592 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[8051482427392 1073741824] used 1073741824 but extent items used 1084751872
> ERROR: block group[8116980678656 1073741824] used 1073217536 but extent items used 0
> ERROR: extent[5960381235200 16384] backref lost (owner: 6403, level: 1)
> ERROR: check node failed root 6403 bytenr 5960381235200 level 1, force continue check
> ERROR: extent[51200000 16384] backref lost (owner: 257, level: 1)
> ERROR: check node failed root 6403 bytenr 51200000 level 1, force continue check
> ERROR: extent[37765120 16384] backref lost (owner: 257, level: 1)
> ERROR: check node failed root 6403 bytenr 37765120 level 1, force continue check
> ERROR: extent[78135296 16384] backref lost (owner: 257, level: 1)
> ERROR: check node failed root 6403 bytenr 78135296 level 1, force continue check
> ERROR: errors found in extent allocation tree or chunk allocation
> checking free space cache
> checking fs roots
> found 7483995758592 bytes used err is -5
> total csum bytes: 7296183880
> total tree bytes: 11018780672
> total fs tree bytes: 2178121728
> total extent tree bytes: 1015988224
> btree space waste bytes: 936782513
> file data blocks allocated: 9157658292224
>  referenced 9292573106176
> 1
>
>
> It's the same fs as I was talking before, I just had to do it via NBD.
>

So the fs is REALLY corrupted, although we still don't know whether it's
persistent.

BTW, lowmem mode seems to have a new false alert when checking the block 
group item.

>
>> But, I'm still not sure whether it's a false alert or a *REAL*
>> corruption.
>>
>> Even it may cause problem and corrupt your data, I still hope you
>> could
>> do a rw mount and trigger a btrfs fi sync.
>>
>> If it's a false alert, we can fix it then with ease.
>> Or, it's a really big problem.
>
> So what should I do next (as said, this is just a backup for me, so
> while it would be annoying to recreate it (because nearly the full
> 8TiB) I could do it without any problem)... in terms of debugging this
> issue?

Do you have any "lightweight" method to reproduce the bug?

For example, on a 1G btrfs fs with moderate operations, say 15 min or
so?

>
> Shall I rw-mount the fs and do sync and wait and retry? Or is there
> anything else that you want me to try before in order to get the kernel
> bug (if any) or btrfs-progs bug nailed down?

Personally speaking, a rw mount would help, to verify whether it's just
a bug that will disappear after the deletion is done.

But considering the size of your fs, it may not be a good idea, as we
don't have a reliable method to recover/rebuild the extent tree yet.

Thanks,
Qu

>
>
> Cheers,
> Chris.
>




* Re: corruption: yet another one after deleting a ro snapshot
From: Christoph Anton Mitterer @ 2017-01-16  2:56 UTC
  To: Qu Wenruo, linux-btrfs


On Mon, 2017-01-16 at 09:38 +0800, Qu Wenruo wrote:
> So the fs is REALLY corrupted.
*sigh* ... (not as in fuck-I'm-losing-my-data™ ... but as in *sigh*
another-possibly-deeply-hidden-bug-in-btrfs-that-might-eventually-
cause-data-loss...)

> BTW, lowmem mode seems to have a new false alert when checking the
> block 
> group item.

Anything you want me to check there?


> Did you have any "lightweight" method to reproduce the bug?
Nah, not at all... as I've said, this already happened to me once
before, and in both cases I was cleaning up old ro-snapshots.

At least in the current case the fs was only ever filled via
send/receive (well, apart from minor mkdirs or so)... so there shouldn't
have been any "extreme ways" of using it.

I think (but am not sure) that this was also the case on the other
occasion, which happened to me with a different fs (i.e. I think it was
also a backup 8TB disk).


> For example, on a 1G btrfs fs with moderate operations, for example 
> 15min or so, to reproduce the bug?
Well, I could try to reproduce it, but I guess you'd have far better
means to do so.

As I've said, I was mostly doing send (with -p) | receive to do
incremental backups... and after a while I was cleaning up the old
snapshots on the backup fs.
Of course the snapshot subvols are pretty huge... as I've said, close to
8TB (7.5 or so)... everything from quite big files (4GB) to very small
ones, symlinks (no devices/sockets/fifos)... perhaps some hardlinks...
Some refcopied files. The whole fs has compression enabled.
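
Schematically, the whole workflow is something like this (paths and
snapshot names here are just placeholders):

# btrfs send -p /source/snapshots/<previous> /source/snapshots/<current> \
      | btrfs receive /backup/snapshots/
... and after a while, the cleanup on the backup fs:
# btrfs subvolume delete /backup/snapshots/<old-snapshot>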


> > Shall I rw-mount the fs and do sync and wait and retry? Or is there
> > anything else that you want me to try before in order to get the
> > kernel
> > bug (if any) or btrfs-progs bug nailed down?
> 
> Personally speaking, rw mount would help, to verify if it's just a
> bug 
> that will disappear after the deletion is done.
Well, but then we might lose any chance to track it down further.

And even if it went away, it would still at least be a bug in the sense
of an fsck false positive... if not more (in the sense that corruptions
may happen if affected parts of the fs are used while not yet cleaned
up).


> But considering the size of your fs, it may not be a good idea as we 
> don't have reliable method to recover/rebuild extent tree yet.

So what do you effectively want now?
Wait and try something else?
RW mount and recheck to see whether it goes away with that? (And even
if it does, should I rather re-create/populate the fs from scratch just
to be sure?)

What I can also offer in addition... as mentioned a few times
previously, I do have full lists of the regular files/dirs/symlinks as
well as SHA512 sums of each of the regular files, as they are expected
to be on the fs respectively the snapshot.
So I can offer to do a full verification pass over these, to see whether
anything is missing or whether (file) data is actually corrupted.

Of course that will take a while, and even if everything verifies, I'm
still not really sure whether I'd trust that fs anymore ;-)


Cheers,
Chris.



* Re: corruption: yet another one after deleting a ro snapshot
From: Qu Wenruo @ 2017-01-16  3:16 UTC
  To: Christoph Anton Mitterer, linux-btrfs



At 01/16/2017 10:56 AM, Christoph Anton Mitterer wrote:
> On Mon, 2017-01-16 at 09:38 +0800, Qu Wenruo wrote:
>> So the fs is REALLY corrupted.
> *sigh* ... (not as in fuck-I'm-loosing-my-data™ ... but as in *sigh*
> another-possibly-deeply-hidden-bug-in-btrfs-that-might-eventually-
> cause-data-loss...)
>
>> BTW, lowmem mode seems to have a new false alert when checking the
>> block
>> group item.
>
> Anything you want to check me there?

It would be very nice if you could paste the output of
"btrfs-debug-tree -t extent <your_device>" and "btrfs-debug-tree -t root
<your_device>".

That would help us to fix the bug in lowmem mode.
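
For example (device path and output file names are just placeholders;
compressing helps, as the dumps can get quite large):

# btrfs-debug-tree -t extent /dev/<your_device> | gzip > extent-tree.txt.gz
# btrfs-debug-tree -t root   /dev/<your_device> | gzip > root-tree.txt.gz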

>
>
>> Did you have any "lightweight" method to reproduce the bug?
> Na, not at all... as I've said this already happened to me once before,
> and in both cases I was cleaning up old ro-snapshots.
>
> At least in the current case the fs was only ever filled via
> send/receive (well apart from minor mkdirs or so)... so there shouldn't
> have been any "extreme ways" of using it.

Since it's mostly populated by receive, and receive is completely sane
(it's done purely in user space), any way we find to reproduce it
shouldn't involve anything special.

BTW, if it's possible, would you please try to run btrfs check before
your next deletion of ro-snapshots?
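
I.e. before the next snapshot cleanup, roughly the following, on the
unmounted device (path is a placeholder):

# btrfs check /dev/<backup-device> ; echo $?
# btrfs check --mode=lowmem /dev/<backup-device> ; echo $?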

>
> I think (but not sure), that this was also the case on the other
> occasion that happened to me with a different fs (i.e. I think it was
> also a backup 8TB disk).
>
>
>> For example, on a 1G btrfs fs with moderate operations, for example
>> 15min or so, to reproduce the bug?
> Well I could try to produce it, but I guess you'd have far better means
> to do so.
>
> As I've said I was mostly doing send (with -p) | receive to do
> incremental backups... and after a while I was cleaning up the old
> snapshots on the backup fs.
> Of course the snapshot subvols are pretty huge.. as I've said close to
> 8TB (7.5 or so)... everything from quite big files (4GB) to very small,
> smylinks (no device/sockets/fifos)... perhaps some hardlinks...
> Some refcopied files. The whole fs has compression enabled.
>
>
>>> Shall I rw-mount the fs and do sync and wait and retry? Or is there
>>> anything else that you want me to try before in order to get the
>>> kernel
>>> bug (if any) or btrfs-progs bug nailed down?
>>
>> Personally speaking, rw mount would help, to verify if it's just a
>> bug
>> that will disappear after the deletion is done.
> Well but than we might loose any chance to further track it down.
>
> And even if it would go away, it would still at least be a bug in terms
> of fsck false positive.... if not more (in the sense of... corruptions
> may happen if some affect parts of the fs are used while not cleaned up
> again).
>
>
>> But considering the size of your fs, it may not be a good idea as we
>> don't have reliable method to recover/rebuild extent tree yet.
>
> So what do you effectively want now?
> Wait and try something else?
> RW mount and recheck to see whether it goes away with that? (And even
> if, should I rather re-create/populate the fs from scratch just to be
> sure?
>
> What I can also offer in addition... as mentioned some times
> previously, I do have full lists of the reg-files/dirs/symlinks as well
> as SHA512 sums of each of the reg-files, as they are expected to be on
> the fs respectively the snapshot.
> So I can offer to do a full verification pass of these, to see whether
> anything is missing or (file)data actually corrupted.

Not really needed: all the corruption happens on tree blocks of root
6403, which means that if it's a real corruption, it will only disturb
you (make the fs suddenly go RO) when you try to modify something
(leaves under that node) in that subvolume.

At least the data is good.

And I highly suspect that subvolume 6403 is the RO snapshot you just
removed.

If 'btrfs subvolume list' can't find that subvolume, then I think it's
mostly OK for you to RW mount and wait for the subvolume to be fully
deleted.

And I think you have already provided enough data for us to, at least 
try to, reproduce the bug.

Thanks,
Qu

>
> Of course that will take a while, and even if everything verifies, I'm
> still not really sure whether I'd trust that fs anymore ;-)
>
>
> Cheers,
> Chris.
>




* Re: corruption: yet another one after deleting a ro snapshot
From: Christoph Anton Mitterer @ 2017-01-16  4:53 UTC
  To: Qu Wenruo, linux-btrfs


On Mon, 2017-01-16 at 11:16 +0800, Qu Wenruo wrote:
> It would be very nice if you could paste the output of
> "btrfs-debug-tree -t extent <your_device>" and "btrfs-debug-tree -t
> root 
> <your device>"
> 
> That would help us to fix the bug in lowmem mode.
I'll send you the link in a private mail ... if any other developer
needs it, just ask me or Qu for the link.


> BTW, if it's possible, would you please try to run btrfs-check
> before 
> your next deletion on ro-snapshots?
You mean in general, when I do my next runs of backups and snapshot
cleanups?
Sure; actually I did this this time as well (in original mode, though),
and no error was found.

What should I look out for?


> Not really needed, as all corruption happens on tree block of root
> 6403,
> it means, if it's a real corruption, it will only disturb you(make
> fs 
> suddenly RO) when you try to modify something(leaves under that node)
> in 
> that subvolume.
Ah... and it couldn't cause corruption to the same data blocks if they
were used by another snapshot?



> And I highly suspect if the subvolume 6403 is the RO snapshot you
> just removed.
I guess there is no way to find out whether it was that snapshot, is
there?



> If 'btrfs subvolume list' can't find that subvolume, then I think
> it's 
> mostly OK for you to RW mount and wait the subvolume to be fully
> deleted.
>
> And I think you have already provided enough data for us to, at
> least 
> try to, reproduce the bug.

I won't do the remount,rw tonight, so you have the rest of your
day/night to think of anything further I should test or provide you
with from that fs... after that it will be "gone" (in the sense of
mounted RW).
Just give your veto if I should wait :)


Thanks,
Chris.



* Re: corruption: yet another one after deleting a ro snapshot
From: Qu Wenruo @ 2017-01-16  5:47 UTC
  To: Christoph Anton Mitterer, linux-btrfs



At 01/16/2017 12:53 PM, Christoph Anton Mitterer wrote:
> On Mon, 2017-01-16 at 11:16 +0800, Qu Wenruo wrote:
>> It would be very nice if you could paste the output of
>> "btrfs-debug-tree -t extent <your_device>" and "btrfs-debug-tree -t
>> root
>> <your device>"
>>
>> That would help us to fix the bug in lowmem mode.
> I'll send you the link in a private mail ... if any other developer
> needs it, just ask me or Qu for the link.
>
>
>> BTW, if it's possible, would you please try to run btrfs-check
>> before
>> your next deletion on ro-snapshots?
> You mean in general, when I do my next runs of backups respectively
> snaphot-cleanup?
> Sure, actually I did this this time as well (in original mode, though),
> and no error was found.
>
> For what should I look out?

Nothing special, just in case the fs is already corrupted.

>
>
>> Not really needed, as all corruption happens on tree block of root
>> 6403,
>> it means, if it's a real corruption, it will only disturb you(make
>> fs
>> suddenly RO) when you try to modify something(leaves under that node)
>> in
>> that subvolume.
> Ah... and it couldn't cause corruption to the same data blocks if they
> were used by another snaphshot?

No, it won't cause corruption to any data block, whether shared or not.

>
>
>
>> And I highly suspect if the subvolume 6403 is the RO snapshot you
>> just removed.
> I guess there is no way to find out whether it was that snapshot, is
> there?

"btrfs subvolume list" could do it."
If no output of 6403, then it's removed.

And "btrfs-debug-tree -t root" also has info for it.
A deleted subvolume won't have corresponding ROOT_BACKREF, and its 
ROOT_ITEM should have none-zero drop key.

And in your case, your subvolume is indeed undergoing deletion.
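
Roughly, you could check it like this (mount point and device are
placeholders; the grep is just a sketch of what to look for in the
dump):

# btrfs subvolume list <mountpoint> | grep 6403
    (no output means it's no longer listed)
# btrfs-debug-tree -t root <device> | grep -A 10 '(6403 ROOT_ITEM'
    (look for a non-zero drop key in that ROOT_ITEM)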


I also checked the extent tree; the result is a little interesting:
1) Most tree backrefs are good.
    In fact, 3 of the 4 errors reported are tree blocks shared by
    other subvolumes, like:

item 77 key (51200000 METADATA_ITEM 1) itemoff 13070 itemsize 42
		extent refs 2 gen 11 flags TREE_BLOCK|FULL_BACKREF
		tree block skinny level 1
		tree block backref root 7285
		tree block backref root 6572

This means the tree blocks are shared by 2 other subvolumes,
7285 and 6572.

Subvolume 7285 is completely OK, while 6572 is also undergoing subvolume
deletion (though the real deletion hasn't started yet).

And considering the generations, I assume 6403 was deleted before 6572.

So we're almost clear that btrfs (maybe only btrfsck) doesn't handle it
well if there are multiple subvolumes undergoing deletion.

This gives us enough info to try to build such an image by ourselves now
(although that's still quite hard to do).

That also explains why btrfs-progs test image 021 doesn't trigger the
problem: it has only one subvolume undergoing deletion and no
full-backref extents.



And as for the scary lowmem mode output, it's a false alert.

I manually checked the used size of a block group and it's OK.

BTW, most of your block groups are completely used, without any free
space. But interestingly, most data extents are just 512K: larger than
the compressed-extent upper limit, but still quite small.

In other words, your fs seems to be fragmented, considering the upper
limit of a data extent is 128M.
(Or is your case actually quite common in the real world?)

>
>
>
>> If 'btrfs subvolume list' can't find that subvolume, then I think
>> it's
>> mostly OK for you to RW mount and wait the subvolume to be fully
>> deleted.
>>
>> And I think you have already provided enough data for us to, at
>> least
>> try to, reproduce the bug.
>
> I won't do the remount,rw this night, so you have the rest of your
> day/night time to think of anything further I should test or provide
> you with from that fs... then it will be "gone" (in the sense of
> mounted RW).
> Just give your veto if I should wait :)

At least from the extent and root tree dumps, I found nothing wrong.

It's still possible that some full backref that needs to be checked from
a subvolume tree (considering your fs size, not really practical) is
wrong, but the possibility is quite low.
And in that case, there should be more than 4 extent tree errors
reported.

So you are mostly OK to mount it rw any time you want, and I have 
already downloaded the raw data.

The hard part remaining for us developers is to build a small image that
reproduces your situation.

Thanks,
Qu
>
>
> Thanks,
> Chris.
>




* Re: corruption: yet another one after deleting a ro snapshot
From: Christoph Anton Mitterer @ 2017-01-16 22:07 UTC
  To: Qu Wenruo, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 9032 bytes --]

On Mon, 2017-01-16 at 13:47 +0800, Qu Wenruo wrote:
> > > And I highly suspect if the subvolume 6403 is the RO snapshot you
> > > just removed.
> > 
> > I guess there is no way to find out whether it was that snapshot,
> > is
> > there?
> 
> "btrfs subvolume list" could do it."
Well that was clear,... but I rather meant something that also shows me
the path of the deleted subvol.
Anyway:
# btrfs subvolume list /data/data-a/3/
ID 6029 gen 2528 top level 5 path data
ID 6031 gen 3208 top level 5 path backups
ID 7285 gen 3409 top level 5 path snapshots/_external-fs/data-a1/data/2017-01-11_1

So since I only had two further snapshots in snapshots/_external-
fs/data-a1/data/, it must be the deleted one.

btw: data is empty, and backups actually contains some files (~25k,
~360GB)... these were not created via send/receive, but either via cp or
rsync.
And they are always in the same subvol (i.e. the backups svol isn't
deleted like the snapshots are)


> Also checked the extent tree, the result is a little interesting:
> 1) Most tree backref are good.
>     In fact, 3 of all the 4 errors reported are tree blocks shared by
>     other subvolumes, like:
> 
> item 77 key (51200000 METADATA_ITEM 1) itemoff 13070 itemsize 42
> 		extent refs 2 gen 11 flags TREE_BLOCK|FULL_BACKREF
> 		tree block skinny level 1
> 		tree block backref root 7285
> 		tree block backref root 6572
> 
> This means the tree blocks are shared by 2 other subvolumes,
> 7285 and 6572.
> 
> 7285 subvolume is completely OK, while 6572 is also undergoing
> subvolume 
> deletion (while the real deletion hasn't started yet).
Well, there were three snapshots in total: the still existing
snapshots/_external-fs/data-a1/data/2017-01-11_1
and two earlier ones,
one from around 2016-09-16_1 (= probably ID 6572?), and one even a bit
earlier from 2016-08-19_1 (probably ID 6403?).
Each one was created with
send -p | receive, using the respectively earlier one as parent.
So it's quite reasonable that they share the extents, and also that it's
by 2 subvols.
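
(I.e. roughly this kind of pipeline, with made-up paths:

  # initial full transfer
  btrfs send /src/snapshots/2016-08-19_1 | btrfs receive /dst/snapshots/
  # incremental transfers, each using the previous snapshot as parent
  btrfs send -p /src/snapshots/2016-08-19_1 /src/snapshots/2016-09-16_1 \
      | btrfs receive /dst/snapshots/
  btrfs send -p /src/snapshots/2016-09-16_1 /src/snapshots/2017-01-11_1 \
      | btrfs receive /dst/snapshots/
)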



> And considering the generation, I assume 6403 is deleted before 6572.
I don't remember which one of the 2 subvols from 2016 I deleted first,
the older or the more recent one... my bash history implies this
order:
 4203  btrfs subvolume delete 2016-08-19_1
 4204  btrfs subvolume delete 2016-09-16_1


> So we're almost clear that, btrfs (maybe only btrfsck) doesn't handle
> it 
> well if there are multiple subvolumes undergoing deletion.
> 
> This gives us enough info to try to build such image by ourselves
> now.
> (Although still quite hard to do though).
Well keep me informed if you actually find/fix something  :)


> And for the scary lowmem mode, it's a false alert.
> 
> I manually checked the used size of a block group and it's OK.
So you're going to fix this?


> BTW, most of your block groups are completely used, without any
> space.
> But interestingly, most data extents are just 512K, larger than the
> compressed extent upper limit, but still quite small.
Not sure if I understand this...


> In other words, your fs seems to be fragmented considering the upper 
> limit of a data extent is 128M.
> (Or your case is quite common in common world?)
No, I don't think I understand what you mean :D


> So you are mostly OK to mount it rw any time you want, and I have 
> already downloaded the raw data.
Okay, I've remounted it now RW, called btrfs filesystem sync, and
waited until the HDD became silent and showed no further activity.

(again 3.9)

# btrfs check /dev/nbd0 ; echo $?
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 7469206884352 bytes used err is 0
total csum bytes: 7281779252
total tree bytes: 10837262336
total fs tree bytes: 2011906048
total extent tree bytes: 1015349248
btree space waste bytes: 922444044
file data blocks allocated: 7458369622016
 referenced 7579485159424
0


=> as you can see, original mode pretends things would be fine now.


# btrfs check --mode=lowmem /dev/nbd0 ; echo $?
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0
ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0
ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0
ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0
ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0
ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112
ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items used 1074266112
ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6997067956224 1073741824] used 1073741824 but extent items used 0
ERROR: block group[7702516334592 1073741824] used 1073741824 but extent items used 0
ERROR: block group[8051482427392 1073741824] used 1073741824 but extent items used 1084751872
ERROR: block group[8116980678656 1073741824] used 1073217536 but extent items used 0
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
found 7469206884352 bytes used err is -5
total csum bytes: 7281779252
total tree bytes: 10837262336
total fs tree bytes: 2011906048
total extent tree bytes: 1015349248
btree space waste bytes: 922444044
file data blocks allocated: 7458369622016
 referenced 7579485159424
1

=> but lowmem mode finds quite some errors... actually it seems even
worse, normally (and before with 3.9 but without having it RW mounted
yet), the "checking fs roots" took ages (at least an hour or two... nbd
makes it even slower)... but this time, while checking extents also took
long, checking fs roots was extremely fast (possibly because of some
more grave error?), and checking csums didn't even occur.

What do you think... error in the fs or in fsck's lowmem mode?


> Hard part is remaining for us developers to build such small image
> to 
> reproduce your situation then.
Well... that's the life of a btrfs-dev ;-P


Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5930 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: corruption: yet another one after deleting a ro snapshot
  2017-01-16 22:07                   ` Christoph Anton Mitterer
@ 2017-01-17  8:53                     ` Qu Wenruo
  2017-01-17 10:39                       ` Christoph Anton Mitterer
  0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2017-01-17  8:53 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs



At 01/17/2017 06:07 AM, Christoph Anton Mitterer wrote:
> On Mon, 2017-01-16 at 13:47 +0800, Qu Wenruo wrote:
>>>> And I highly suspect if the subvolume 6403 is the RO snapshot you
>>>> just removed.
>>>
>>> I guess there is no way to find out whether it was that snapshot,
>>> is
>>> there?
>>
>> "btrfs subvolume list" could do it."
> Well that was clear,... but I rather meant something that also shows me
> the path of the deleted subvol.

A deleted subvol loses its ROOT_BACKREF, so there is no info about where 
that subvolume used to be.
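
(For a live subvolume, "btrfs-debug-tree -t root <dev>" shows a
ROOT_BACKREF item that records the parent directory and the name, roughly:

  item N key (6029 ROOT_BACKREF 5) itemoff ... itemsize ...
      root backref key dirid 256 sequence ... name data

and it is exactly that item which is gone once the subvolume is deleted,
so the old path can't be reconstructed.)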

> Anyway:
> # btrfs subvolume list /data/data-a/3/
> ID 6029 gen 2528 top level 5 path data
> ID 6031 gen 3208 top level 5 path backups
> ID 7285 gen 3409 top level 5 path snapshots/_external-fs/data-a1/data/2017-01-11_1
>
> So since I only had two further snapshots in snapshots/_external-
> fs/data-a1/data/ it must be the deleted one.
>
> btw: data is empty, and backup contains actually some files (~25k,
> ~360GB)... these are not created via send/receive, but either via cp or
> rsync.
> And they are always in the same subvol (i.e. the backups svol isn't
> deleted like the snapshots are)
>
>
>> Also checked the extent tree, the result is a little interesting:
>> 1) Most tree backref are good.
>>     In fact, 3 of all the 4 errors reported are tree blocks shared by
>>     other subvolumes, like:
>>
>> item 77 key (51200000 METADATA_ITEM 1) itemoff 13070 itemsize 42
>> 		extent refs 2 gen 11 flags TREE_BLOCK|FULL_BACKREF
>> 		tree block skinny level 1
>> 		tree block backref root 7285
>> 		tree block backref root 6572
>>
>> This means the tree blocks are shared by 2 other subvolumes,
>> 7285 and 6572.
>>
>> 7285 subvolume is completely OK, while 6572 is also undergoing
>> subvolume
>> deletion (while the real deletion hasn't started yet).
> Well there were in total three snapshots, the still existing:
> snapshots/_external-fs/data-a1/data/2017-01-11_1
> and two earlier ones,
> one from around 2016-09-16_1 (= probably ID 6572?), one even a bit
> earlier from 2016-08-19_1 (probably ID 6403?).
> Each one was created with
> send -p | receive, using the respectively earlier one as parent.
> So it's
> quite reasonable that they share the extents and also that it'sby 2
> subvols.
>
>
>
>> And considering the generation, I assume 6403 is deleted before 6572.
> Don't remember which one of the 2 subvols form 2016 I've deleted first,
> the older or the more recent one... my bash history implies in this
> order:
>  4203  btrfs subvolume delete 2016-08-19_1
>  4204  btrfs subvolume delete 2016-09-16_1
>
>
>> So we're almost clear that, btrfs (maybe only btrfsck) doesn't handle
>> it
>> well if there are multiple subvolumes undergoing deletion.
>>
>> This gives us enough info to try to build such image by ourselves
>> now.
>> (Although still quite hard to do though).
> Well keep me informed if you actually find/fix something  :)
>
>
>> And for the scary lowmem mode, it's a false alert.
>>
>> I manually checked the used size of a block group and it's OK.
> So you're going to fix this?

Yes, digging now.
The lowmem mode bug should be much easier to fix, compared to the lost 
backref false alert.

>
>
>> BTW, most of your block groups are completely used, without any
>> space.
>> But interestingly, most data extents are just 512K, larger than the
>> compressed extent upper limit, but still quite small.
> Not sure if I understand this...
>
>
>> In other words, your fs seems to be fragmented considering the upper
>> limit of a data extent is 128M.
>> (Or your case is quite common in common world?)
> No, I don't think I understand what you mean :D
>
>
>> So you are mostly OK to mount it rw any time you want, and I have
>> already downloaded the raw data.
> Okay, I've remounted it now RW, called btrfs filesystem sync, and
> waited until the HDD became silent and showed no further activity.
>
> (again 3.9)
>
> # btrfs check /dev/nbd0 ; echo $?
> Checking filesystem on /dev/nbd0
> UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 7469206884352 bytes used err is 0
> total csum bytes: 7281779252
> total tree bytes: 10837262336
> total fs tree bytes: 2011906048
> total extent tree bytes: 1015349248
> btree space waste bytes: 922444044
> file data blocks allocated: 7458369622016
>  referenced 7579485159424
> 0

Nice to see it.

>
>
> => as you can see, original mode pretends things would be fine now.
>
>
> # btrfs check --mode=lowmem /dev/nbd0 ; echo $?
> Checking filesystem on /dev/nbd0
> UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
> checking extents
> ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112
> ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items used 1207959552
> ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items used 1074266112
> ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[6997067956224 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[7702516334592 1073741824] used 1073741824 but extent items used 0
> ERROR: block group[8051482427392 1073741824] used 1073741824 but extent items used 1084751872
> ERROR: block group[8116980678656 1073741824] used 1073217536 but extent items used 0
> ERROR: errors found in extent allocation tree or chunk allocation
> checking free space cache
> checking fs roots
> found 7469206884352 bytes used err is -5
> total csum bytes: 7281779252
> total tree bytes: 10837262336
> total fs tree bytes: 2011906048
> total extent tree bytes: 1015349248
> btree space waste bytes: 922444044
> file data blocks allocated: 7458369622016
>  referenced 7579485159424
> 1
>
> => but lowmem mode finds quite some errors... actually it seems even
> worse, normally (and before with 3.9 but without having it RW mounted
> yet), the "checking fs roots" took ages (at least an hour or two... nbd
> makes it even slower)... but this time, while checking extents also took
> long, checking fs roots was extremely fast (possibly because of some
> more grave error?), and checking csums didn't even occur.
>
> What do you think... error in the fs or in fsck's lowmem mode?

Just a lowmem false alert, as the extent-tree dump shows a completely fine result.

I'll CC you and add your Reported-by tag when there is any update on 
this case.

Thanks,
Qu
>
>
>> Hard part is remaining for us developers to build such small image
>> to
>> reproduce your situation then.
> Well... that's the life of a btrfs-dev ;-P
>
>
> Cheers,
> Chris.
>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: corruption: yet another one after deleting a ro snapshot
  2017-01-17  8:53                     ` Qu Wenruo
@ 2017-01-17 10:39                       ` Christoph Anton Mitterer
  2017-01-18  0:41                         ` Qu Wenruo
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Anton Mitterer @ 2017-01-17 10:39 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 17 January 2017 09:53:19 CET, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>Just a lowmem false alert, as the extent-tree dump shows a completely fine
>result.
>
>I'll CC you and add your Reported-by tag when there is any update on 
>this case.

Fine, just one more thing left from my side on this issue:
Do you want me to leave the fs untouched until I could verify a lowmem mode fix?
Or is it ok to go on using it (and running backups on it)? 

Cheers,
Chris.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: corruption: yet another one after deleting a ro snapshot
  2017-01-17 10:39                       ` Christoph Anton Mitterer
@ 2017-01-18  0:41                         ` Qu Wenruo
  2017-01-18  1:20                           ` Christoph Anton Mitterer
  0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2017-01-18  0:41 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs



At 01/17/2017 06:39 PM, Christoph Anton Mitterer wrote:
> Am 17. Januar 2017 09:53:19 MEZ schrieb Qu Wenruo <quwenruo@cn.fujitsu.com>:
>> Just a lowmem false alert, as the extent-tree dump shows a completely fine
>> result.
>>
>> I'll CC you and add your Reported-by tag when there is any update on
>> this case.
>
> Fine, just one more thing left from my side on this issue:
> Do you want me to leave the fs untouched until I could verify a lowmem mode fix?
> Or is it ok to go on using it (and running backups on it)?
>
> Cheers,
> Chris.
>
>
Since we have your extent tree and root tree dump, I think we should be 
able to build an image to reproduce the case.

So you're OK to go on using it.
BTW, your fs is too large for us to really do some verification or other 
work.
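
(For reference, a metadata-only image would be far smaller than the real
fs; it could be created with something like

  # -c9 compresses the dump, -s sanitizes file names before sharing it;
  # the output file name here is just an example
  btrfs-image -c9 -s /dev/nbd0 data-a3.metadump

but the tree dumps we already have are enough for now.)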

Thanks,
Qu



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: corruption: yet another one after deleting a ro snapshot
  2017-01-18  0:41                         ` Qu Wenruo
@ 2017-01-18  1:20                           ` Christoph Anton Mitterer
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Anton Mitterer @ 2017-01-18  1:20 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 561 bytes --]

On Wed, 2017-01-18 at 08:41 +0800, Qu Wenruo wrote:
> Since we have your extent tree and root tree dump, I think we should
> be 
> able to build an image to reproduce the case.
+1

> BTW, your fs is too large for us to really do some verification or
> other 
> work.

Sure I know... but that's simply the one which I work the most with and
where I stumble over such things.

I have e.g. a smaller one (well, still 1TB in total, 500GB used) which is
the root-fs from my notebook... but no real issues with that so
far ^^


Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5930 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: corruption: yet another one after deleting a ro snapshot
@ 2017-01-16 11:06 Giuseppe Della Bianca
  0 siblings, 0 replies; 18+ messages in thread
From: Giuseppe Della Bianca @ 2017-01-16 11:06 UTC (permalink / raw)
  To: quwenruo, calestyo; +Cc: linux-btrfs

Hi.

If it can be helpful.

To double-check the status of my filesystem, I launched 'btrfs scrub /'
and/or 'du -sh /*'.
When the file system was corrupt, in my case, the commands aborted.
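
(That is, roughly:

  # scrub in the foreground and report at the end
  btrfs scrub start -B /mountpoint
  # or check on a background scrub later
  btrfs scrub status /mountpoint
  # walking the whole tree; in my case this aborted on a corrupt fs
  du -sh /mountpoint/*
)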


Regards.


gdb


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: corruption: yet another one after deleting a ro snapshot
@ 2017-01-12 10:27 Giuseppe Della Bianca
  0 siblings, 0 replies; 18+ messages in thread
From: Giuseppe Della Bianca @ 2017-01-12 10:27 UTC (permalink / raw)
  To: calestyo; +Cc: linux-btrfs

Hi.

I had issues with a case very similar to yours.

My experience is that deleting subvolumes and/or attempting to repair the
file system makes the situation worse.

To gain experience, and accepting the loss of the data on the file system,
I have always tried to recover the file system.

But then I always had to recreate the partition from scratch.


Cheers.

gdb


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-01-18  1:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-12  1:07 corruption: yet another one after deleting a ro snapshot Christoph Anton Mitterer
2017-01-12  1:13 ` Christoph Anton Mitterer
2017-01-12  1:25 ` Qu Wenruo
2017-01-12  2:28   ` Christoph Anton Mitterer
2017-01-12  2:38     ` Qu Wenruo
2017-01-15 17:04       ` Christoph Anton Mitterer
2017-01-16  1:38         ` Qu Wenruo
2017-01-16  2:56           ` Christoph Anton Mitterer
2017-01-16  3:16             ` Qu Wenruo
2017-01-16  4:53               ` Christoph Anton Mitterer
2017-01-16  5:47                 ` Qu Wenruo
2017-01-16 22:07                   ` Christoph Anton Mitterer
2017-01-17  8:53                     ` Qu Wenruo
2017-01-17 10:39                       ` Christoph Anton Mitterer
2017-01-18  0:41                         ` Qu Wenruo
2017-01-18  1:20                           ` Christoph Anton Mitterer
2017-01-12 10:27 Giuseppe Della Bianca
2017-01-16 11:06 Giuseppe Della Bianca

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.