* [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
@ 2019-07-13 20:48 Alexander Wetzel
  2019-07-14  1:30 ` Qu Wenruo
  2019-07-29 12:46 ` Swâmi Petaramesh
  0 siblings, 2 replies; 16+ messages in thread
From: Alexander Wetzel @ 2019-07-13 20:48 UTC (permalink / raw)
  To: linux-btrfs, wqu

Hello,

After updating one of my VMs from 5.1.16 to 5.2, btrfs is acting strangely:
The system uses btrfs as root (also for /boot) and has compression 
enabled. (It's a Gentoo virtual machine and does not use an initrd.) 
After rebooting into 5.2 the system is able to bring up openssh, but 
not other services (e.g. postfix).

Redirecting dmesg to a file also failed. The file was created but stayed 
empty, even when checked immediately. Rebooting the system into the old 
kernel fully restores functionality, and a btrfs scrub shows no errors.

When running a bad kernel the system is partially read-only, but since 
I'm using selinux in strict mode it could also be caused by selinux 
being unable to read some data.

Here is how a reboot via ssh looks with a broken kernel:
xar /home/alex # shutdown -r now
shutdown: warning: cannot open /var/run/shutdown.pid

But deleting the "bad" kernel and running "grub-mkconfig -o 
/boot/grub/grub.cfg" works as it should, and the system then uses the 
previous kernel... So it can still write some things...

I've bisected the problem to 496245cac57e ("btrfs: tree-checker: Verify 
inode item").

Filtering dmesg for btrfs and removing duplicate lines shows just three 
unique error messages:
  BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528 
slot=4 ino=259223, invalid inode generation: has 139737289170944 expect 
[0, 1425224]
  BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528 
slot=4 ino=259223, invalid inode generation: has 139737289170944 expect 
[0, 1425225]
  BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528 
slot=4 ino=259223, invalid inode generation: has 139737289170944 expect 
[0, 1425227]
  BTRFS error (device vda3): block=8645398528 read time tree block 
corruption detected
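As an aside, the dedup step described above can be sketched in Python (the helper name and the abbreviated sample lines are illustrative only):

```python
import re

# Abbreviated sample lines in the shape of the dmesg output above.
dmesg = [
    "[    8.963796] BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528 slot=4 ino=259223, invalid inode generation: has 139737289170944 expect [0, 1425224]",
    "[    8.963799] BTRFS error (device vda3): block=8645398528 read time tree block corruption detected",
    "[    9.100897] BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528 slot=4 ino=259223, invalid inode generation: has 139737289170944 expect [0, 1425224]",
    "[    9.100900] BTRFS error (device vda3): block=8645398528 read time tree block corruption detected",
]

def unique_btrfs_errors(lines):
    """Keep only BTRFS messages, dropping duplicates after stripping
    the kernel timestamp prefix."""
    seen, out = set(), []
    for line in lines:
        if "BTRFS" not in line:
            continue
        msg = re.sub(r"^\[\s*[\d.]+\]\s*", "", line)  # drop "[    8.963796] "
        if msg not in seen:
            seen.add(msg)
            out.append(msg)
    return out

for msg in unique_btrfs_errors(dmesg):
    print(msg)
```

Roughly the same thing on the command line: `dmesg | grep BTRFS | sed 's/^\[[^]]*\] //' | sort -u`.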

All these errors appear only with commit 496245cac57e; booting a kernel 
without that patch works normally again.

I tried to reproduce the issue by transferring the fs with btrfs 
send/receive to another system. I was able to mount the migrated fs with 
a 5.2 kernel, and a btrfs scrub was also OK...

I tried to revert the commit, but a simple revert does not apply. So 
I've just verified that a kernel built at 496245cac57e shows the 
symptoms while one built at the commit before it is fine.

I've not tried anything more, so I can still reproduce the issue when 
needed and gather more data. (The system is now back on 5.1.16.)

Now I guess I could have some corruption that is only detected/triggered 
by the patch, and btrfs check may fix it... shall I try that next?

Here is the dmesg from the affected system (the first few lines were 
still in the log buffer when I checked):
[    8.963796] BTRFS critical (device vda3): corrupt leaf: root=300 
block=8645398528 slot=4 ino=259223, invalid inode generation: has 
139737289170944 expect [0, 1425224]
[    8.963799] BTRFS error (device vda3): block=8645398528 read time 
tree block corruption detected
[    8.967487] audit: type=1400 audit(1563023702.540:19): avc:  denied 
{ write } for  pid=2154 comm="cp" name="localtime" dev="vda3" 
ino=1061039 scontext=system_u:system_r:initrc_t 
tcontext=system_u:object_r:locale_t tclass=file permissive=0
[    9.023608] audit: type=1400 audit(1563023702.590:20): avc:  denied 
{ mounton } for  pid=2194 comm="mount" path="/chroot/dns/run/named" 
dev="vda3" ino=1061002 scontext=system_u:system_r:mount_t 
tcontext=system_u:object_r:named_var_run_t tclass=dir permissive=0
[    9.038235] audit: type=1400 audit(1563023702.610:21): avc:  denied 
{ getattr } for  pid=2199 comm="start-stop-daem" path="pid:[4026531836]" 
dev="nsfs" ino=4026531836 scontext=system_u:system_r:initrc_t 
tcontext=system_u:object_r:nsfs_t tclass=file permissive=0
[    9.100897] BTRFS critical (device vda3): corrupt leaf: root=300 
block=8645398528 slot=4 ino=259223, invalid inode generation: has 
139737289170944 expect [0, 1425224]
[    9.100900] BTRFS error (device vda3): block=8645398528 read time 
tree block corruption detected
[    9.137974] BTRFS critical (device vda3): corrupt leaf: root=300 
block=8645398528 slot=4 ino=259223, invalid inode generation: has 
139737289170944 expect [0, 1425224]
[    9.137976] BTRFS error (device vda3): block=8645398528 read time 
tree block corruption detected
[    9.138095] audit: type=1400 audit(1563023702.710:22): avc:  denied 
{ getattr } for  pid=2237 comm="start-stop-daem" path="pid:[4026531836]" 
dev="nsfs" ino=4026531836 scontext=system_u:system_r:initrc_t 
tcontext=system_u:object_r:nsfs_t tclass=file permissive=0
[    9.138477] BTRFS critical (device vda3): corrupt leaf: root=300 
block=8645398528 slot=4 ino=259223, invalid inode generation: has 
139737289170944 expect [0, 1425224]
[    9.138480] BTRFS error (device vda3): block=8645398528 read time 
tree block corruption detected
[    9.161866] BTRFS critical (device vda3): corrupt leaf: root=300 
block=8645398528 slot=4 ino=259223, invalid inode generation: has 
139737289170944 expect [0, 1425224]
[    9.161868] BTRFS error (device vda3): block=8645398528 read time 
tree block corruption detected
[    9.170228] BTRFS critical (device vda3): corrupt leaf: root=300 
block=8645398528 slot=4 ino=259223, invalid inode generation: has 
139737289170944 expect [0, 1425224]
[    9.170230] BTRFS error (device vda3): block=8645398528 read time 
tree block corruption detected
[    9.214491] BTRFS critical (device vda3): corrupt leaf: root=300 
block=8645398528 slot=4 ino=259223, invalid inode generation: has 
139737289170944 expect [0, 1425224]
[    9.214494] BTRFS error (device vda3): block=8645398528 read time 
tree block corruption detected


Alexander


* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-13 20:48 [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e Alexander Wetzel
@ 2019-07-14  1:30 ` Qu Wenruo
  2019-07-14  9:25   ` Alexander Wetzel
  2019-07-29 12:46 ` Swâmi Petaramesh
  1 sibling, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2019-07-14  1:30 UTC (permalink / raw)
  To: Alexander Wetzel, linux-btrfs, wqu





On 2019/7/14 4:48 AM, Alexander Wetzel wrote:
> Hello,
> 
> After updating one of my VMs from 5.1.16 to 5.2, btrfs is acting strangely:
> The system uses btrfs as root (also for /boot) and has compression
> enabled. (It's a Gentoo virtual machine and does not use an initrd.)
> After rebooting into 5.2 the system is able to bring up openssh, but not
> other services (e.g. postfix).
> 
> Redirecting dmesg to a file also failed. The file was created but stayed
> empty, even when checked immediately. Rebooting the system into the old
> kernel fully restores functionality, and a btrfs scrub shows no errors.
> 
> When running a bad kernel the system is partially read-only, but since
> I'm using selinux in strict mode it could also be caused by selinux
> being unable to read some data.
> 
> Here is how a reboot via ssh looks with a broken kernel:
> xar /home/alex # shutdown -r now
> shutdown: warning: cannot open /var/run/shutdown.pid
> 
> But deleting the "bad" kernel and running "grub-mkconfig -o
> /boot/grub/grub.cfg" works as it should, and the system then uses the
> previous kernel... So it can still write some things...
> 
> I've bisected the problem to 496245cac57e (btrfs: tree-checker: Verify
> inode item)
> 
> Filtering dmesg for btrfs and removing duplicate lines shows just three
> unique error messages:
>  BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528
> slot=4 ino=259223, invalid inode generation: has 139737289170944 expect
> [0, 1425224]
>  BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528
> slot=4 ino=259223, invalid inode generation: has 139737289170944 expect
> [0, 1425225]
>  BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528
> slot=4 ino=259223, invalid inode generation: has 139737289170944 expect
> [0, 1425227]

The generation number is 0x7f171f7ba000; I see no way it could be a
valid generation.
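The decimal value from the log converts as follows:

```python
# The bogus generation from the dmesg output, in hex:
gen = 139737289170944
print(hex(gen))  # prints 0x7f171f7ba000
# Orders of magnitude beyond the expected range [0, 1425224]:
assert gen > 1425224
```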

I see no problem with rejecting such an obviously corrupted item.

The open questions are:
- Is the item really corrupted?
  At least to me, it looks corrupted just from the dmesg.

- How and when did this happen?
  It obviously happened on some older kernel.
  v5.2 reports such problems before writing corrupted data back to
  disk, which at least prevents further corruption.

Please provide the following dump:
 # btrfs ins dump-tree -b 8645398528 /dev/vda3

>  BTRFS error (device vda3): block=8645398528 read time tree block
> corruption detected
> 
> All these errors appear only with commit 496245cac57e; booting a
> kernel without that patch works normally again.
> 
> I tried to reproduce the issue by transferring the fs with btrfs
> send/receive to another system. I was able to mount the migrated fs with
> a 5.2 kernel, and a btrfs scrub was also OK...
> 
> I tried to revert the commit, but a simple revert does not apply. So
> I've just verified that a kernel built at 496245cac57e shows the
> symptoms while one built at the commit before it is fine.
> 
> I've not tried anything more, so I can still reproduce the issue when
> needed and gather more data. (The system is now back on 5.1.16.)
> 
> Now I guess I could have some corruption that is only detected/triggered
> by the patch, and btrfs check may fix it... shall I try that next?

Sorry, AFAIK btrfs check doesn't check as strictly as the kernel
tree-checker does, since corrupted data in kernel space can crash the
whole system, while in user space it would only crash btrfs check.

Thus I made the tree-checker very picky about irregular data; it
literally checks every member, even unused ones.
The dump mentioned above should help us determine whether btrfs check
can detect and fix this.
(I believe it shouldn't be that hard to fix in btrfs-progs.)
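The generation check that fires here can be modeled roughly like this (a simplified sketch, not the actual kernel code; the function name is made up and the +1 slack on the upper bound is inferred from the reported ranges):

```python
def check_inode_generation(inode_gen, super_gen):
    """Sketch of the read-time bound: an inode's generation may not
    exceed the superblock generation (plus one, for the transaction
    currently being committed)."""
    hi = super_gen + 1
    if not (0 <= inode_gen <= hi):
        return f"invalid inode generation: has {inode_gen} expect [0, {hi}]"
    return None  # item passes this particular check

# The value from the report, against superblock generation 1425223:
print(check_inode_generation(139737289170944, 1425223))
# prints: invalid inode generation: has 139737289170944 expect [0, 1425224]
```

Note how the moving upper bound (1425224, 1425225, 1425227 in the three messages) tracks the superblock generation as transactions commit.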

Thanks,
Qu

> 
> Here is the dmesg from the affected system (the first few lines were
> still in the log buffer when I checked):
> [    8.963796] BTRFS critical (device vda3): corrupt leaf: root=300
> block=8645398528 slot=4 ino=259223, invalid inode generation: has
> 139737289170944 expect [0, 1425224]
> [    8.963799] BTRFS error (device vda3): block=8645398528 read time
> tree block corruption detected
> [    8.967487] audit: type=1400 audit(1563023702.540:19): avc:  denied {
> write } for  pid=2154 comm="cp" name="localtime" dev="vda3" ino=1061039
> scontext=system_u:system_r:initrc_t tcontext=system_u:object_r:locale_t
> tclass=file permissive=0
> [    9.023608] audit: type=1400 audit(1563023702.590:20): avc:  denied {
> mounton } for  pid=2194 comm="mount" path="/chroot/dns/run/named"
> dev="vda3" ino=1061002 scontext=system_u:system_r:mount_t
> tcontext=system_u:object_r:named_var_run_t tclass=dir permissive=0
> [    9.038235] audit: type=1400 audit(1563023702.610:21): avc:  denied {
> getattr } for  pid=2199 comm="start-stop-daem" path="pid:[4026531836]"
> dev="nsfs" ino=4026531836 scontext=system_u:system_r:initrc_t
> tcontext=system_u:object_r:nsfs_t tclass=file permissive=0
> [    9.100897] BTRFS critical (device vda3): corrupt leaf: root=300
> block=8645398528 slot=4 ino=259223, invalid inode generation: has
> 139737289170944 expect [0, 1425224]
> [    9.100900] BTRFS error (device vda3): block=8645398528 read time
> tree block corruption detected
> [    9.137974] BTRFS critical (device vda3): corrupt leaf: root=300
> block=8645398528 slot=4 ino=259223, invalid inode generation: has
> 139737289170944 expect [0, 1425224]
> [    9.137976] BTRFS error (device vda3): block=8645398528 read time
> tree block corruption detected
> [    9.138095] audit: type=1400 audit(1563023702.710:22): avc:  denied {
> getattr } for  pid=2237 comm="start-stop-daem" path="pid:[4026531836]"
> dev="nsfs" ino=4026531836 scontext=system_u:system_r:initrc_t
> tcontext=system_u:object_r:nsfs_t tclass=file permissive=0
> [    9.138477] BTRFS critical (device vda3): corrupt leaf: root=300
> block=8645398528 slot=4 ino=259223, invalid inode generation: has
> 139737289170944 expect [0, 1425224]
> [    9.138480] BTRFS error (device vda3): block=8645398528 read time
> tree block corruption detected
> [    9.161866] BTRFS critical (device vda3): corrupt leaf: root=300
> block=8645398528 slot=4 ino=259223, invalid inode generation: has
> 139737289170944 expect [0, 1425224]
> [    9.161868] BTRFS error (device vda3): block=8645398528 read time
> tree block corruption detected
> [    9.170228] BTRFS critical (device vda3): corrupt leaf: root=300
> block=8645398528 slot=4 ino=259223, invalid inode generation: has
> 139737289170944 expect [0, 1425224]
> [    9.170230] BTRFS error (device vda3): block=8645398528 read time
> tree block corruption detected
> [    9.214491] BTRFS critical (device vda3): corrupt leaf: root=300
> block=8645398528 slot=4 ino=259223, invalid inode generation: has
> 139737289170944 expect [0, 1425224]
> [    9.214494] BTRFS error (device vda3): block=8645398528 read time
> tree block corruption detected
> 
> 
> Alexander




* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-14  1:30 ` Qu Wenruo
@ 2019-07-14  9:25   ` Alexander Wetzel
  2019-07-14  9:49     ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Wetzel @ 2019-07-14  9:25 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, wqu


>>
>> Filtering dmesg for btrfs and removing duplicate lines shows just three
>> unique error messages:
>>   BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528
>> slot=4 ino=259223, invalid inode generation: has 139737289170944 expect
>> [0, 1425224]
>>   BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528
>> slot=4 ino=259223, invalid inode generation: has 139737289170944 expect
>> [0, 1425225]
>>   BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528
>> slot=4 ino=259223, invalid inode generation: has 139737289170944 expect
>> [0, 1425227]
> 
> The generation number is 0x7f171f7ba000; I see no way it could be a
> valid generation.
> 
> I see no problem with rejecting such an obviously corrupted item.
> 
> The open questions are:
> - Is the item really corrupted?
>    At least to me, it looks corrupted just from the dmesg.
> 
> - How and when did this happen?
>    It obviously happened on some older kernel.
>    v5.2 reports such problems before writing corrupted data back to
>    disk, which at least prevents further corruption.

It's probably useless information at this point, but the FS was created 
with a boot image from Debian 8 around Dec 1st 2016, by migrating a 
likewise freshly created ext4 filesystem to btrfs.
I'm pretty sure the migration failed with the newer Gentoo kernel 
intended for operation - which was sys-kernel/hardened-sources-4.7.10 - 
and I used the Debian boot image for it instead. (I can piece together 
all kernel versions used from wtmp, but the Debian boot kernel would be 
a guess only.)

The timestamps like "2016-12-01 21:51:27" in the dump below match very 
well the time I was setting up the system, based on the little log 
evidence I still have.

> Please provide the following dump:
>   # btrfs ins dump-tree -b 8645398528 /dev/vda3
> 

xar /home/alex # btrfs ins dump-tree -b 8645398528 /dev/vda3
btrfs-progs v4.19
leaf 8645398528 items 48 free space 509 generation 1425074 owner 300
leaf 8645398528 flags 0x1(WRITTEN) backref revision 1
fs uuid 668c885e-50b9-41d0-a3ce-b653a4d3f87a
chunk uuid 54c6809b-e261-423f-b4a1-362304e887bd
         item 0 key (259222 DIR_ITEM 2504220146) itemoff 3960 itemsize 35
                 location key (259223 INODE_ITEM 0) type FILE
                 transid 8119256875011 data_len 0 name_len 5
                 name: .keep
         item 1 key (259222 DIR_INDEX 2) itemoff 3925 itemsize 35
                 location key (259223 INODE_ITEM 0) type FILE
                 transid 8119256875011 data_len 0 name_len 5
                 name: .keep
         item 2 key (259222 DIR_INDEX 3) itemoff 3888 itemsize 37
                 location key (258830 INODE_ITEM 0) type DIR
                 transid 2673440063491 data_len 0 name_len 7
                 name: portage
         item 3 key (259222 DIR_INDEX 4) itemoff 3851 itemsize 37
                 location key (3632036 INODE_ITEM 0) type DIR
                 transid 169620 data_len 0 name_len 7
                 name: binpkgs
         item 4 key (259223 INODE_ITEM 0) itemoff 3691 itemsize 160
                 generation 1 transid 139737289170944 size 0 nbytes 0
                 block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
                 sequence 139737289225400 flags 0x0(none)
                 atime 1480625487.0 (2016-12-01 21:51:27)
                 ctime 1480625487.0 (2016-12-01 21:51:27)
                 mtime 1480015482.0 (2016-11-24 20:24:42)
                 otime 0.0 (1970-01-01 01:00:00)
         item 5 key (259223 INODE_REF 259222) itemoff 3676 itemsize 15
                 index 2 namelen 5 name: .keep
         item 6 key (259224 INODE_ITEM 0) itemoff 3516 itemsize 160
                 generation 1 transid 1733 size 4 nbytes 5
                 block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
                 sequence 139737289225401 flags 0x0(none)
                 atime 1480626250.0 (2016-12-01 22:04:10)
                 ctime 1480688366.120000000 (2016-12-02 15:19:26)
                 mtime 1480015482.0 (2016-11-24 20:24:42)
                 otime 0.0 (1970-01-01 01:00:00)
         item 7 key (259224 INODE_REF 259207) itemoff 3503 itemsize 13
                 index 7 namelen 3 name: run
         item 8 key (259224 XATTR_ITEM 3817753667) itemoff 3429 itemsize 74
                 location key (0 UNKNOWN.0 0) type XATTR
                 transid 1733 data_len 28 name_len 16
                 name: security.selinux
                 data system_u:object_r:var_run_t
         item 9 key (259224 EXTENT_DATA 0) itemoff 3403 itemsize 26
                 generation 22 type 0 (inline)
                 inline extent data size 5 ram_bytes 5 compression 0 (none)
         item 10 key (259225 INODE_ITEM 0) itemoff 3243 itemsize 160
                 generation 1 transid 591302 size 186 nbytes 0
                 block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0
                 sequence 139737289227437 flags 0x0(none)
                 atime 1484937932.634171139 (2017-01-20 19:45:32)
                 ctime 1524992223.179247581 (2018-04-29 10:57:03)
                 mtime 1524992223.179247581 (2018-04-29 10:57:03)
                 otime 0.0 (1970-01-01 01:00:00)
         item 11 key (259225 INODE_REF 259207) itemoff 3230 itemsize 13
                 index 8 namelen 3 name: lib
         item 12 key (259225 XATTR_ITEM 3817753667) itemoff 3156 itemsize 74
                 location key (0 UNKNOWN.0 0) type XATTR
                 transid 1733 data_len 28 name_len 16
                 name: security.selinux
                 data system_u:object_r:var_lib_t
         item 13 key (259225 DIR_ITEM 73688767) itemoff 3122 itemsize 34
                 location key (341157 INODE_ITEM 0) type DIR
                 transid 128778240 data_len 0 name_len 4
                 name: misc
         item 14 key (259225 DIR_ITEM 300800368) itemoff 3085 itemsize 37
                 location key (785370 INODE_ITEM 0) type DIR
                 transid 1489 data_len 0 name_len 7
                 name: selinux
         item 15 key (259225 DIR_ITEM 1107045563) itemoff 3047 itemsize 38
                 location key (789129 INODE_ITEM 0) type DIR
                 transid 1494 data_len 0 name_len 8
                 name: sepolgen
         item 16 key (259225 DIR_ITEM 1111485758) itemoff 3008 itemsize 39
                 location key (1042909 INODE_ITEM 0) type DIR
                 transid 1860 data_len 0 name_len 9
                 name: syslog-ng
         item 17 key (259225 DIR_ITEM 1612599130) itemoff 2972 itemsize 36
                 location key (259233 INODE_ITEM 0) type DIR
                 transid 8439610736640 data_len 0 name_len 6
                 name: gentoo
         item 18 key (259225 DIR_ITEM 2116554129) itemoff 2934 itemsize 38
                 location key (1836819 INODE_ITEM 0) type DIR
                 transid 28448 data_len 0 name_len 8
                 name: openntpd
         item 19 key (259225 DIR_ITEM 2320516785) itemoff 2897 itemsize 37
                 location key (259226 INODE_ITEM 0) type DIR
                 transid 2160540348579840 data_len 0 name_len 7
                 name: portage
         item 20 key (259225 DIR_ITEM 2449158508) itemoff 2858 itemsize 39
                 location key (259239 INODE_ITEM 0) type DIR
                 transid 0 data_len 0 name_len 9
                 name: ip6tables
         item 21 key (259225 DIR_ITEM 2504220146) itemoff 2823 itemsize 35
                 location key (259238 INODE_ITEM 0) type FILE
                 transid 0 data_len 0 name_len 5
                 name: .keep
         item 22 key (259225 DIR_ITEM 2635879490) itemoff 2786 itemsize 37
                 location key (6762622 INODE_ITEM 0) type DIR
                 transid 591302 data_len 0 name_len 7
                 name: postfix
         item 23 key (259225 DIR_ITEM 2734009058) itemoff 2751 itemsize 35
                 location key (1838131 INODE_ITEM 0) type DIR
                 transid 28461 data_len 0 name_len 5
                 name: btrfs
         item 24 key (259225 DIR_ITEM 3421666276) itemoff 2717 itemsize 34
                 location key (259230 INODE_ITEM 0) type DIR
                 transid 504403158265495552 data_len 0 name_len 4
                 name: arpd
         item 25 key (259225 DIR_ITEM 3481791328) itemoff 2681 itemsize 36
                 location key (3820659 INODE_ITEM 0) type DIR
                 transid 211025 data_len 0 name_len 6
                 name: layman
         item 26 key (259225 DIR_ITEM 3968635316) itemoff 2643 itemsize 38
                 location key (259231 INODE_ITEM 0) type DIR
                 transid 68186368 data_len 0 name_len 8
                 name: iptables
         item 27 key (259225 DIR_INDEX 2) itemoff 2606 itemsize 37
                 location key (259226 INODE_ITEM 0) type DIR
                 transid 266353 data_len 0 name_len 7
                 name: portage
         item 28 key (259225 DIR_INDEX 3) itemoff 2572 itemsize 34
                 location key (259230 INODE_ITEM 0) type DIR
                 transid 2 data_len 0 name_len 4
                 name: arpd
         item 29 key (259225 DIR_INDEX 4) itemoff 2534 itemsize 38
                 location key (259231 INODE_ITEM 0) type DIR
                 transid 10359104371 data_len 0 name_len 8
                 name: iptables
         item 30 key (259225 DIR_INDEX 5) itemoff 2498 itemsize 36
                 location key (259233 INODE_ITEM 0) type DIR
                 transid 158067 data_len 0 name_len 6
                 name: gentoo
         item 31 key (259225 DIR_INDEX 6) itemoff 2463 itemsize 35
                 location key (259238 INODE_ITEM 0) type FILE
                 transid 617 data_len 0 name_len 5
                 name: .keep
         item 32 key (259225 DIR_INDEX 7) itemoff 2424 itemsize 39
                 location key (259239 INODE_ITEM 0) type DIR
                 transid 2651930718976 data_len 0 name_len 9
                 name: ip6tables
         item 33 key (259225 DIR_INDEX 8) itemoff 2390 itemsize 34
                 location key (341157 INODE_ITEM 0) type DIR
                 transid 2 data_len 0 name_len 4
                 name: misc
         item 34 key (259225 DIR_INDEX 9) itemoff 2353 itemsize 37
                 location key (785370 INODE_ITEM 0) type DIR
                 transid 1489 data_len 0 name_len 7
                 name: selinux
         item 35 key (259225 DIR_INDEX 10) itemoff 2315 itemsize 38
                 location key (789129 INODE_ITEM 0) type DIR
                 transid 1494 data_len 0 name_len 8
                 name: sepolgen
         item 36 key (259225 DIR_INDEX 11) itemoff 2276 itemsize 39
                 location key (1042909 INODE_ITEM 0) type DIR
                 transid 1860 data_len 0 name_len 9
                 name: syslog-ng
         item 37 key (259225 DIR_INDEX 154) itemoff 2238 itemsize 38
                 location key (1836819 INODE_ITEM 0) type DIR
                 transid 28448 data_len 0 name_len 8
                 name: openntpd
         item 38 key (259225 DIR_INDEX 155) itemoff 2203 itemsize 35
                 location key (1838131 INODE_ITEM 0) type DIR
                 transid 28461 data_len 0 name_len 5
                 name: btrfs
         item 39 key (259225 DIR_INDEX 590) itemoff 2167 itemsize 36
                 location key (3820659 INODE_ITEM 0) type DIR
                 transid 211025 data_len 0 name_len 6
                 name: layman
         item 40 key (259225 DIR_INDEX 591) itemoff 2130 itemsize 37
                 location key (6762622 INODE_ITEM 0) type DIR
                 transid 591302 data_len 0 name_len 7
                 name: postfix
         item 41 key (259226 INODE_ITEM 0) itemoff 1970 itemsize 160
                 generation 1 transid 1425074 size 88 nbytes 0
                 block group 0 mode 42755 links 1 uid 0 gid 250 rdev 0
                 sequence 139737289231301 flags 0x0(none)
                 atime 1484937932.634171139 (2017-01-20 19:45:32)
                 ctime 1563016286.555238083 (2019-07-13 13:11:26)
                 mtime 1563016286.555238083 (2019-07-13 13:11:26)
                 otime 0.0 (1970-01-01 01:00:00)
         item 42 key (259226 INODE_REF 259225) itemoff 1953 itemsize 17
                 index 2 namelen 7 name: portage
         item 43 key (259226 XATTR_ITEM 3817753667) itemoff 1873 itemsize 80
                 location key (0 UNKNOWN.0 0) type XATTR
                 transid 1733 data_len 34 name_len 16
                 name: security.selinux
                 data system_u:object_r:portage_cache_t
         item 44 key (259226 DIR_ITEM 310146024) itemoff 1820 itemsize 53
                 location key (11501213 INODE_ITEM 0) type FILE
                 transid 1328324 data_len 0 name_len 23
                 name: preserved_libs_registry
         item 45 key (259226 DIR_ITEM 2128402847) itemoff 1780 itemsize 40
                 location key (10719048 INODE_ITEM 0) type FILE
                 transid 1265971 data_len 0 name_len 10
                 name: world_sets
         item 46 key (259226 DIR_ITEM 3145042590) itemoff 1744 itemsize 36
                 location key (11426212 INODE_ITEM 0) type FILE
                 transid 1328203 data_len 0 name_len 6
                 name: config
         item 47 key (259226 DIR_ITEM 4131655965) itemoff 1709 itemsize 35
                 location key (12072460 INODE_ITEM 0) type FILE
                 transid 1401504 data_len 0 name_len 5
                 name: world
>>
>> Now I guess I could have some corruption that is only detected/triggered
>> by the patch, and btrfs check may fix it... shall I try that next?
> 
> Sorry, AFAIK btrfs check doesn't check as strictly as the kernel
> tree-checker does, since corrupted data in kernel space can crash the
> whole system, while in user space it would only crash btrfs check.
> 
> Thus I made the tree-checker very picky about irregular data; it
> literally checks every member, even unused ones.
> The dump mentioned above should help us determine whether btrfs check
> can detect and fix this.
> (I believe it shouldn't be that hard to fix in btrfs-progs.)
> 
> Thanks,
> Qu

Thank you for your support!

Alexander




* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-14  9:25   ` Alexander Wetzel
@ 2019-07-14  9:49     ` Qu Wenruo
  2019-07-14 12:07       ` Alexander Wetzel
  2019-07-14 15:40       ` Chris Murphy
  0 siblings, 2 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-07-14  9:49 UTC (permalink / raw)
  To: Alexander Wetzel, linux-btrfs, wqu





On 2019/7/14 5:25 PM, Alexander Wetzel wrote:
> 
>>>
>>> Filtering dmesg for btrfs and removing duplicate lines shows just three
>>> unique error messages:
>>>   BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528
>>> slot=4 ino=259223, invalid inode generation: has 139737289170944 expect
>>> [0, 1425224]
>>>   BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528
>>> slot=4 ino=259223, invalid inode generation: has 139737289170944 expect
>>> [0, 1425225]
>>>   BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528
>>> slot=4 ino=259223, invalid inode generation: has 139737289170944 expect
>>> [0, 1425227]
>>
>> The generation number is 0x7f171f7ba000; I see no way it could be a
>> valid generation.
>>
>> I see no problem with rejecting such an obviously corrupted item.
>>
>> The open questions are:
>> - Is the item really corrupted?
>>    At least to me, it looks corrupted just from the dmesg.
>>
>> - How and when did this happen?
>>    It obviously happened on some older kernel.
>>    v5.2 reports such problems before writing corrupted data back to
>>    disk, which at least prevents further corruption.
> 
> It's probably useless information at that point, but the FS was created
> with a boot image from Debian 8 around Dec 1st 2016 by migrating an also
> freshly created ext4 filesystem to btrfs.

A migrated image could contain something unexpected, but according to
the owner id this is definitely not the converted subvolume; it's a
newly created subvolume/snapshot.

> I'm pretty sure the migration failed with the newer gentoo kernel
> intended for operation - which was sys-kernel/hardened-sources-4.7.10 -
> and a used the Debian boot image for that. (I can piece together all
> kernel versions used from wtmp, but the Debian boot kernel would be
> "guess only".)
> 
> The time stamps like "2016-12-01 21:51:27" in the dump below are
> matching very well to the time I was setting up the system based on the
> few remaining log evidence I have.

I just did a quick grep and blame over the inode transid related code.
The latest direct modification to the inode transid is 6e17d30bfaf4
("Btrfs: fill ->last_trans for delayed inode in btrfs_fill_inode."),
which went upstream in v4.1.

Furthermore, at that time we didn't have good enough backporting
practice, so that commit lacks a Fixes tag and wasn't backported to
most stable branches.
I don't believe the Debian backport team would have picked it into their
kernels, so if the fs was modified by a kernel older than v4.1, that may
be the cause.

> 
>> Please provide the following dump:
>>   # btrfs ins dump-tree -b 8645398528 /dev/vda3
>>
> 
> xar /home/alex # btrfs ins dump-tree -b 8645398528 /dev/vda3
> btrfs-progs v4.19
> leaf 8645398528 items 48 free space 509 generation 1425074 owner 300
> leaf 8645398528 flags 0x1(WRITTEN) backref revision 1
> fs uuid 668c885e-50b9-41d0-a3ce-b653a4d3f87a
> chunk uuid 54c6809b-e261-423f-b4a1-362304e887bd
>         item 0 key (259222 DIR_ITEM 2504220146) itemoff 3960 itemsize 35
>                 location key (259223 INODE_ITEM 0) type FILE
>                 transid 8119256875011 data_len 0 name_len 5
>                 name: .keep

If we were also checking DIR_ITEM/DIR_INDEX transids, the kernel would
fail even earlier.

Those transids make no sense at all.

>         item 1 key (259222 DIR_INDEX 2) itemoff 3925 itemsize 35
>                 location key (259223 INODE_ITEM 0) type FILE
>                 transid 8119256875011 data_len 0 name_len 5
>                 name: .keep
>         item 2 key (259222 DIR_INDEX 3) itemoff 3888 itemsize 37
>                 location key (258830 INODE_ITEM 0) type DIR
>                 transid 2673440063491 data_len 0 name_len 7
>                 name: portage
>         item 3 key (259222 DIR_INDEX 4) itemoff 3851 itemsize 37
>                 location key (3632036 INODE_ITEM 0) type DIR
>                 transid 169620 data_len 0 name_len 7
>                 name: binpkgs
>         item 4 key (259223 INODE_ITEM 0) itemoff 3691 itemsize 160
>                 generation 1 transid 139737289170944 size 0 nbytes 0
>                 block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
>                 sequence 139737289225400 flags 0x0(none)

Again, the reported transid makes no sense.

>                 atime 1480625487.0 (2016-12-01 21:51:27)
>                 ctime 1480625487.0 (2016-12-01 21:51:27)
>                 mtime 1480015482.0 (2016-11-24 20:24:42)
>                 otime 0.0 (1970-01-01 01:00:00)
>         item 5 key (259223 INODE_REF 259222) itemoff 3676 itemsize 15
>                 index 2 namelen 5 name: .keep
>         item 6 key (259224 INODE_ITEM 0) itemoff 3516 itemsize 160
>                 generation 1 transid 1733 size 4 nbytes 5

This transid should be correct.

According to the leaf generation, any transid larger than 1425074 should
be incorrect.

So there are a lot of transid errors, not limited to the reported item 4.
There may be so many transid errors that most of your tree blocks would
need to be modified to update the transids.
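For reference, the bound the tree-checker enforces can be sketched like this. This is a simplified, hypothetical reimplementation for illustration only, not the actual code from fs/btrfs/tree-checker.c:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified sketch of the inode generation bound added by commit
 * 496245cac57e ("btrfs: tree-checker: Verify inode item").  The valid
 * range reported in dmesg, [0, super_gen + 1], allows one generation
 * beyond the superblock for the transaction currently in progress.
 */
static bool inode_generation_valid(uint64_t gen, uint64_t super_gen)
{
	return gen <= super_gen + 1;
}
```

With a superblock generation around 1425223 (as the dmesg range suggests), the bogus value 139737289170944 from the dump is rejected while any sane generation passes.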

To fix this, I believe it's possible to reset all these inodes' transids
to the leaf transid, but I'm not 100% sure such a fix wouldn't affect
things like send.


I totally understand that the solution I'm going to propose sounds
awful, but I'd recommend using a new enough kernel, though one without
that check, to copy all the data to another btrfs fs.

It could be safer than waiting for btrfs check to be able to repair it.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-14  9:49     ` Qu Wenruo
@ 2019-07-14 12:07       ` Alexander Wetzel
  2019-07-14 12:51         ` Qu Wenruo
  2019-07-14 15:40       ` Chris Murphy
  1 sibling, 1 reply; 16+ messages in thread
From: Alexander Wetzel @ 2019-07-14 12:07 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, wqu

On 14.07.19 at 11:49, Qu Wenruo wrote:
> 
> 
> On 2019/7/14 5:25 PM, Alexander Wetzel wrote:
>>
>>>>
>>>> filtering for btrfs and removing duplicate lines just shows three uniq
>>>> error messages:
>>>>    BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528
>>>> slot=4 ino=259223, invalid inode generation: has 139737289170944 expect
>>>> [0, 1425224]
>>>>    BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528
>>>> slot=4 ino=259223, invalid inode generation: has 139737289170944 expect
>>>> [0, 1425225]
>>>>    BTRFS critical (device vda3): corrupt leaf: root=300 block=8645398528
>>>> slot=4 ino=259223, invalid inode generation: has 139737289170944 expect
>>>> [0, 1425227]
>>>
>>> The generation number is 0x7f171f7ba000, I see no reason why it would
>>> make any sense.
>>>
>>> I see no problem rejecting an obviously corrupted item.
>>>
>>> The problem is:
>>> - Is that item corrupted?
>>>     At least to me, it looks corrupted just from the dmesg.
>>>
>>> - How and when did this happen?
>>>     It obviously happened on some older kernel.
>>>     v5.2 will report such problems before writing corrupted data back
>>>     to disk, at least preventing them from happening again.
>>
>> It's probably useless information at that point, but the FS was created
>> with a boot image from Debian 8 around Dec 1st 2016 by migrating an also
>> freshly created ext4 filesystem to btrfs.
> 
> A migrated image could have something unexpected, but according to the
> owner id, it's definitely not the converted subvolume but a newly
> created subvolume/snapshot.
> 

Yes, I'm using snapshots and I've added subvolumes after the migration. 
Basically I moved everything from root into a subvolume, but I'm not sure 
which procedure I used back then... I think I created a snapshot of the 
old root and then deleted all files/dirs in root with the exception of 
the snapshot. I would have said that I did that under the 4.7.10 kernel, 
but it was long ago and there may have been some complications.

>> I'm pretty sure the migration failed with the newer gentoo kernel
>> intended for operation - which was sys-kernel/hardened-sources-4.7.10 -
>> and I used the Debian boot image for that. (I can piece together all
>> kernel versions used from wtmp, but the Debian boot kernel would be
>> "guess only".)
>>
>> The time stamps like "2016-12-01 21:51:27" in the dump below match
>> very well the time I was setting up the system, based on the little
>> log evidence I still have.
> 
> I just did a quick grep and blame on the inode transid related code.
> The latest direct modification to inode_transid is 6e17d30bfaf4 ("Btrfs:
> fill ->last_trans for delayed inode in btrfs_fill_inode."), which was
> upstreamed in v4.1.
> 
> Furthermore, at that time we didn't have a good backporting practice,
> so that commit lacks a Fixes tag and wasn't backported to most stable
> branches.
> I don't believe the Debian backport team would have picked it into
> their kernels, so if the fs was modified by a kernel older than v4.1,
> that may be the cause.

Sounds plausible; the Debian 8 boot image used for the migration was 
probably running a 3.16 kernel. So that version did touch the FS, and 
this may be some leftover from it.
Especially since I have multiple very similar VMs which are working 
fine. (The broken one is the only one hosted in a VPS, forcing me to use 
whatever the provider had at the time to jump-start the system. All the 
other VMs - most of them older by some years - are working with 5.2.)

Long story short: I guess we can assume it was something I did to the FS 
while running a 3.16 kernel which the new checks are now complaining about.

> 
>>
>>> Please provide the following dump:
>>>    # btrfs ins dump-tree -b 8645398528 /dev/vda3
>>>
>>
>> xar /home/alex # btrfs ins dump-tree -b 8645398528 /dev/vda3
>> btrfs-progs v4.19
>> leaf 8645398528 items 48 free space 509 generation 1425074 owner 300
>> leaf 8645398528 flags 0x1(WRITTEN) backref revision 1
>> fs uuid 668c885e-50b9-41d0-a3ce-b653a4d3f87a
>> chunk uuid 54c6809b-e261-423f-b4a1-362304e887bd
>>          item 0 key (259222 DIR_ITEM 2504220146) itemoff 3960 itemsize 35
>>                  location key (259223 INODE_ITEM 0) type FILE
>>                  transid 8119256875011 data_len 0 name_len 5
>>                  name: .keep
> 
> If we were also checking DIR_ITEM/DIR_INDEX transids, the kernel would
> fail even earlier.
> 
> Those transids make no sense at all.
> 
>>          item 1 key (259222 DIR_INDEX 2) itemoff 3925 itemsize 35
>>                  location key (259223 INODE_ITEM 0) type FILE
>>                  transid 8119256875011 data_len 0 name_len 5
>>                  name: .keep
>>          item 2 key (259222 DIR_INDEX 3) itemoff 3888 itemsize 37
>>                  location key (258830 INODE_ITEM 0) type DIR
>>                  transid 2673440063491 data_len 0 name_len 7
>>                  name: portage
>>          item 3 key (259222 DIR_INDEX 4) itemoff 3851 itemsize 37
>>                  location key (3632036 INODE_ITEM 0) type DIR
>>                  transid 169620 data_len 0 name_len 7
>>                  name: binpkgs
>>          item 4 key (259223 INODE_ITEM 0) itemoff 3691 itemsize 160
>>                  generation 1 transid 139737289170944 size 0 nbytes 0
>>                  block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
>>                  sequence 139737289225400 flags 0x0(none)
> 
> Again, the reported transid makes no sense.
> 
>>                  atime 1480625487.0 (2016-12-01 21:51:27)
>>                  ctime 1480625487.0 (2016-12-01 21:51:27)
>>                  mtime 1480015482.0 (2016-11-24 20:24:42)
>>                  otime 0.0 (1970-01-01 01:00:00)
>>          item 5 key (259223 INODE_REF 259222) itemoff 3676 itemsize 15
>>                  index 2 namelen 5 name: .keep
>>          item 6 key (259224 INODE_ITEM 0) itemoff 3516 itemsize 160
>>                  generation 1 transid 1733 size 4 nbytes 5
> 
> This transid should be correct.
> 
> According to the leaf generation, any transid larger than 1425074 should
> be incorrect.
> 
> So there are a lot of transid errors, not limited to the reported item 4.
> There may be so many transid errors that most of your tree blocks would
> need to be modified to update the transids.
> 
> To fix this, I believe it's possible to reset all these inodes' transids
> to the leaf transid, but I'm not 100% sure such a fix wouldn't affect
> things like send.
> 
> 
> I totally understand that the solution I'm going to propose sounds
> awful, but I'd recommend using a new enough kernel, though one without
> that check, to copy all the data to another btrfs fs.
>
> It could be safer than waiting for btrfs check to be able to repair it.

No problem for me. This report here was created for science only :-)
I just wanted to get your attention prior to destroying the broken FS 
and shredding data potentially useful for tracking down what went wrong. 
With that now concluded I'll just do that!

But maybe one additional remark: The snapshots transferred via btrfs 
send/receive to another PC are working fine on a system using a 5.2 
kernel. Since the "moved" subvolume also does not have the block 
8645398528 I assume I don't really have to copy the files but restoring 
the subvolume with btrfs receive on a new btrfs image will also get rid 
of the errors.

Thanks for your time, the incredibly fast feedback and your help,

Alexander


* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-14 12:07       ` Alexander Wetzel
@ 2019-07-14 12:51         ` Qu Wenruo
  0 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-07-14 12:51 UTC (permalink / raw)
  To: Alexander Wetzel, linux-btrfs, wqu



[...]
>> I totally understand that the solution I'm going to propose sounds
>> awful, but I'd recommend using a new enough kernel, though one without
>> that check, to copy all the data to another btrfs fs.
>>
>> It could be safer than waiting for btrfs check to be able to repair it.
> 
> No problem for me. This report here was created for science only :-)

Thank you for your report!

It really reminds us how badly we did in the past, and gives me some
more hints on how to enhance the tree-checker to report more corruptions!

> I just wanted to get your attention prior to destroying the broken FS
> and shredding data potentially useful for tracking down what went wrong.
> With that now concluded I'll just do that!
> 
> But maybe one additional remark: The snapshots transferred via btrfs
> send/receive to another PC are working fine on a system using a 5.2
> kernel.

That depends on how you send.
If you send the subvolume alone (without -p or -c), the stream only
contains the data (obviously), inode modes (regular, dir, block, ...),
timestamps, and filenames.

No internal structures like transid/sequence are included, so
send/receive discards the corrupted internal structures, and since the
destination runs a 5.2 kernel, it will recreate them with correct values.
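That migration path can be sketched like this (device name and mount points are hypothetical; the source snapshot must be read-only for btrfs send):

```sh
# Create a fresh btrfs fs and replay the subvolume into it; the
# receiving kernel writes new, sane transid/sequence values.
mkfs.btrfs -f /dev/sdb1              # hypothetical target device
mount /dev/sdb1 /mnt/new
btrfs subvolume snapshot -r /mnt/old/rootvol /mnt/old/rootvol-ro
btrfs send /mnt/old/rootvol-ro | btrfs receive /mnt/new
```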

> Since the "moved" subvolume also does not have the block
> 8645398528, I assume I don't really have to copy the files; restoring
> the subvolume with btrfs receive onto a new btrfs image will also get
> rid of the errors.

No need to bother with that number at all; it's simply a tree block
bytenr.
You don't need to worry about tree blocks; they're just an internal
mechanism for storing things like the filenames mentioned above.

As long as the important parts are received correctly, there is nothing
you ever need to bother with, as these are all *internal* data
structures; only the kernel and developers need to care (and in this
case, the receive-side kernel will handle it, so even developers don't
need to care).

Thanks,
Qu

> 
> Thanks for your time, the incredibly fast feedback and your help,
> 
> Alexander




* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-14  9:49     ` Qu Wenruo
  2019-07-14 12:07       ` Alexander Wetzel
@ 2019-07-14 15:40       ` Chris Murphy
  2019-07-15  1:07         ` Qu Wenruo
  1 sibling, 1 reply; 16+ messages in thread
From: Chris Murphy @ 2019-07-14 15:40 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Alexander Wetzel, Btrfs BTRFS, wqu

On Sun, Jul 14, 2019 at 3:49 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> I totally understand that the solution I'm going to provide sounds
> aweful, but I'd recommend to use a newer enough kernel but without that
> check, to copy all the data to another btrfs fs.
>
> It could be more safe than waiting for a btrfs check to repair it.

Does the problem affect all trees? If so, then merely creating new
subvolumes, and then 'cp -a --reflink oldsubvol newsubvol', and then
delete old subvolumes, won't fix it.

I wonder where the ideas are for online or even out-of-band fsck.
Offline fsck is too slow and does not scale, a known problem. And both
copying an old file system to a new file system, as well as restoring
backups to a new file system, are astronomically slower because the data
must also be copied, not just the metadata. Also a known problem.

What about a variation on btrfs send/receive with --no-data option, to
read out all the old metadata and rewrite all new metadata to the same
file system, taking advantage of COW, but without having to copy out
the data? And then after all of that is done, delete the old file
subvolumes?

Or a variation on seed/sprout, without requiring additional devices.
The seed part "snapshots" the whole original file system (all trees)
and creates two read-write file systems: the current online mounted
volume, and an in-progress offline repair volume. If the repair fails,
it's straightforward to clean up everything while retaining the changes -
at least it's not worse off. If the repair succeeds, then there'd need
to be some means of merging the two read-write file systems - that
could be complicated. But even if in the short term that merge required
an unmount and performing the merge offline, that would be way more
tolerable than the way things are now.


-- 
Chris Murphy


* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-14 15:40       ` Chris Murphy
@ 2019-07-15  1:07         ` Qu Wenruo
  0 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-07-15  1:07 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Alexander Wetzel, Btrfs BTRFS, wqu





> On 2019/7/14 11:40 PM, Chris Murphy wrote:
> On Sun, Jul 14, 2019 at 3:49 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>> I totally understand that the solution I'm going to propose sounds
>> awful, but I'd recommend using a new enough kernel, though one without
>> that check, to copy all the data to another btrfs fs.
>>
>> It could be safer than waiting for btrfs check to be able to repair it.
> 
> Does the problem affect all trees? If so, then merely creating new
> subvolumes, and then 'cp -a --reflink oldsubvol newsubvol', and then
> delete old subvolumes, won't fix it.

Not 100% sure yet, but from the dump, it's affecting INODE_ITEM and
DIR_INDEX/DIR_ITEM items.

For other trees, only the root tree and the data reloc tree have
INODE_ITEMs.
They may get affected, but that shouldn't be a problem.

> 
> I wonder where the ideas are for online or even out of band fsck.
> Offline fsck is too slow and does not scale, a known problem. And both
> copying old file system to new file system; as well as restoring
> backups to a new file system, is astronomically slower because data
> must also be copied, not just metadata. Also a known problem.
> 
> What about a variation on btrfs send/receive with --no-data option, to
> read out all the old metadata and rewrite all new metadata to the same
> file system, taking advantage of COW, but without having to copy out
> the data? And then after all of that is done, delete the old file
> subvolumes?

It looks possible, but the use case also looks very limited.
It's just the case you're hitting, not a generic use case.

So I'm afraid it won't be possible in the short term.

Thanks,
Qu

> 
> Or a variation on seed/sprout, without requiring additional devices.
> The seed part "snapshots" the whole original file system (all trees)
> and creates two read-write file systems: the current online mounted
> volume, and an in-progress offline repair volume. If the repair fails,
> it's straightforward to clean up everything while retaining the changes -
> at least it's not worse off. If the repair succeeds, then there'd need
> to be some means of merging the two read-write file systems - that
> could be complicated. But even if in the short term that merge required
> an unmount and performing the merge offline, that would be way more
> tolerable than the way things are now.
> 
> 




* [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-13 20:48 [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e Alexander Wetzel
  2019-07-14  1:30 ` Qu Wenruo
@ 2019-07-29 12:46 ` Swâmi Petaramesh
  2019-07-29 13:33   ` Qu Wenruo
  1 sibling, 1 reply; 16+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 12:46 UTC (permalink / raw)
  To: alexander.wetzel; +Cc: linux-btrfs

Hi,

The corruption issue that you report just after upgrading to kernel 5.2
closely resembles what I saw on 2 filesystems after such an upgrade.

I think I'm gonna emergency-downgrade all my BTRFS machines to kernel
5.1 before they break ;-(


ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E



* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-29 12:46 ` Swâmi Petaramesh
@ 2019-07-29 13:33   ` Qu Wenruo
  2019-07-30  4:56     ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2019-07-29 13:33 UTC (permalink / raw)
  To: Swâmi Petaramesh, alexander.wetzel; +Cc: linux-btrfs



On 2019/7/29 8:46 PM, Swâmi Petaramesh wrote:
> Hi,
>
> The corruption issue that you report just after upgrading to kernel 5.2
> closely resembles what I saw on 2 filesystems after such an upgrade.
>
> I think I'm gonna emergency-downgrade all my BTRFS machines to kernel
> 5.1 before they break ;-(

Full kernel message please.

That commit is designed to detect corrupted inode items; we need more
info to determine whether it's a real corruption or not.

Thanks,
Qu

>
>
> ॐ
>


* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-29 13:33   ` Qu Wenruo
@ 2019-07-30  4:56     ` Qu Wenruo
  2019-07-30  6:44       ` Swâmi Petaramesh
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2019-07-30  4:56 UTC (permalink / raw)
  To: Swâmi Petaramesh, alexander.wetzel; +Cc: linux-btrfs



On 2019/7/29 9:33 PM, Qu Wenruo wrote:
>
>
> On 2019/7/29 8:46 PM, Swâmi Petaramesh wrote:
>> Hi,
>>
>> The corruption issue that you report just after upgrading to kernel 5.2
>> closely resembles what I saw on 2 filesystems after such an upgrade.
>>
>> I think I'm gonna emergency-downgrade all my BTRFS machines to kernel
>> 5.1 before they break ;-(
>
> Full kernel message please.
>
> That commit is designed to detect corrupted inode items; we need more
> info to determine whether it's a real corruption or not.
>
> Thanks,
> Qu

Ping? If you really want to solve the problem, please provide the full
kernel message.

Thanks,
Qu
>
>>
>>
>> ॐ
>>


* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-30  4:56     ` Qu Wenruo
@ 2019-07-30  6:44       ` Swâmi Petaramesh
  2019-07-30  7:21         ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: Swâmi Petaramesh @ 2019-07-30  6:44 UTC (permalink / raw)
  To: Qu Wenruo, alexander.wetzel; +Cc: linux-btrfs

On 30/07/2019 at 06:56, Qu Wenruo wrote:
>>>
>>> I think I'm gonna emergency-downgrade all my BTRFS machines to kernel
>>> 5.1 before they break ;-(
>>
>> Full kernel message please.
>>
>> That commit is designed to detect corrupted inode items; we need more
>> info to determine whether it's a real corruption or not.
>>
>> Thanks,
>> Qu
> 
> Ping? If you really want to solve the problem, please provide the full
> kernel message.

Hi,

I have emergency-downgraded my system to 5.1 so as not to take any risk
of crashing my SSD again (and if it crashes again anyway, then I will
know it is not kernel 5.2's fault and let you know...)

I don't have the « first kernel messages » for the SSD because I
restored it from a backup taken before it failed, so obviously the last
kernel messages were lost.

This morning, if I try to scrub the external HD that failed yesterday
(using kernel 5.1.16-arch1-1-ARCH), I get:

BTRFS info (device dm-3): scrub: started on devid 1
BTRFS error (device dm-3): parent transid verify failed on 2137144377344
wanted 7684 found 7499
BTRFS error (device dm-3): parent transid verify failed on 2137144377344
wanted 7684 found 7499
BTRFS error (device dm-3): parent transid verify failed on 2137144377344
wanted 7684 found 7499
BTRFS: error (device dm-3) in btrfs_drop_snapshot:9603: errno=-5 IO failure
BTRFS info (device dm-3): forced readonly
BTRFS warning (device dm-3): failed setting block group ro: -30
BTRFS info (device dm-3): scrub: not finished on devid 1 with status: -30

Hope this helps...

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E


* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-30  6:44       ` Swâmi Petaramesh
@ 2019-07-30  7:21         ` Qu Wenruo
  2019-07-30  8:02           ` Swâmi Petaramesh
  2019-07-30 13:57           ` Swâmi Petaramesh
  0 siblings, 2 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-07-30  7:21 UTC (permalink / raw)
  To: Swâmi Petaramesh, alexander.wetzel; +Cc: linux-btrfs



On 2019/7/30 2:44 PM, Swâmi Petaramesh wrote:
> On 30/07/2019 at 06:56, Qu Wenruo wrote:
>>>>
>>>> I think I'm gonna emergency-downgrade all my BTRFS machines to kernel
>>>> 5.1 before they break ;-(
>>>
>>> Full kernel message please.
>>>
>>> That commit is designed to detect corrupted inode items; we need more
>>> info to determine whether it's a real corruption or not.
>>>
>>> Thanks,
>>> Qu
>>
>> Ping? If you really want to solve the problem, please provide the full
>> kernel message.
>
> Hi,
>
> I have emergency-downgraded my system to 5.1 so as not to take any risk
> of crashing my SSD again (and if it crashes again anyway, then I will
> know it is not kernel 5.2's fault and let you know...)

That kernel check is there to *prevent* any further damage, by rejecting
any invalid metadata.
If it caused a mount failure, it shouldn't have written anything to the
disk.

The later transid errors don't really match the original report.

We really need the mount failure messages from that fs.

>
> I don't have the « first kernel messages » for the SSD because I
> restored it from a backup taken before it failed, so obviously the last
> kernel messages were lost.
>
> This morning, if I try to scrub the external HD that failed yesterday
> (using kernel 5.1.16-arch1-1-ARCH), I get:
>
> BTRFS info (device dm-3): scrub: started on devid 1
> BTRFS error (device dm-3): parent transid verify failed on 2137144377344
> wanted 7684 found 7499
> BTRFS error (device dm-3): parent transid verify failed on 2137144377344
> wanted 7684 found 7499
> BTRFS error (device dm-3): parent transid verify failed on 2137144377344
> wanted 7684 found 7499
> BTRFS: error (device dm-3) in btrfs_drop_snapshot:9603: errno=-5 IO failure
> BTRFS info (device dm-3): forced readonly
> BTRFS warning (device dm-3): failed setting block group ro: -30
> BTRFS info (device dm-3): scrub: not finished on devid 1 with status: -30
>

Unfortunately, the transid errors here don't help at all.

Thanks,
Qu

> Hope this helps...
>
> ॐ
>


* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-30  7:21         ` Qu Wenruo
@ 2019-07-30  8:02           ` Swâmi Petaramesh
  2019-07-30 13:57           ` Swâmi Petaramesh
  1 sibling, 0 replies; 16+ messages in thread
From: Swâmi Petaramesh @ 2019-07-30  8:02 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On 7/30/19 9:21 AM, Qu Wenruo wrote:
>
>> I have emergency-downgraded my system to 5.1 so as not to take any risk
>> of crashing my SSD again (and if it crashes again anyway, then I will
>> know it is not kernel 5.2's fault and let you know...)
> That kernel check is there to *prevent* any further damage, by rejecting
> any invalid metadata.
> If it caused a mount failure, it shouldn't have written anything to the
> disk.
>
> The later transid errors don't really match the original report.
>
> We really need the mount failure messages from that fs.

This message is what I just got from my external backup USB HD that
started acting up yesterday (before I reverted back to 5.1).

I may find more in the syslog (but the machine is at home and I'm at
work so it'll have to wait... I have the external disk with me however,
so I would be able to run anything to test it now.)

For the machine's system SSD, what happened is that mount never actually
*failed*, but the machine started displaying error messages at boot, and
these went along with BTRFS error messages on the console.

For example, the machine wouldn't boot to the GUI anymore, but I would
still be able to log in on the console and see that the log was crowded
with failing services and BTRFS errors.

Then I reformatted and reinstalled from backup, so the corresponding
logs are indeed lost.

ॐ

-- 

Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E




* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-30  7:21         ` Qu Wenruo
  2019-07-30  8:02           ` Swâmi Petaramesh
@ 2019-07-30 13:57           ` Swâmi Petaramesh
  2019-07-30 14:16             ` Qu Wenruo
  1 sibling, 1 reply; 16+ messages in thread
From: Swâmi Petaramesh @ 2019-07-30 13:57 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Hello,

On 30/07/2019 at 09:21, Qu Wenruo wrote:
> Unfortunately, transid error here helps nothing.

Now, each and every time I try to mount this disk, on the original
machine or another one, I get:

systemd[1]: run-media-xxxxxxxxxxxxxxxx.mount: Succeeded.
kernel: BTRFS info (device dm-2): disk space caching is enabled
kernel: BTRFS info (device dm-2): has skinny extents
kernel: BTRFS error (device dm-2): parent transid verify failed on
2137144377344 wanted 7684 found 7499
kernel: BTRFS error (device dm-2): parent transid verify failed on
2137144377344 wanted 7684 found 7499
kernel: BTRFS error (device dm-2): parent transid verify failed on
2137144377344 wanted 7684 found 7499
kernel: BTRFS: error (device dm-2) in btrfs_drop_snapshot:9465: errno=-5
IO failure
kernel: BTRFS info (device dm-2): forced readonly

(It first appears to mount OK, then the errors follow a few seconds
afterwards, and then it remounts read-only.)

The "7499" displayed seems to correspond to the most recent snapshot
created on the disk (according to btrfs su li).

Is there any way I could repair this several-TB FS, even if it implies
losing the latest (or a few of the latest) created subvols and snapshots?

TIA.

Kind regards.



* Re: [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e
  2019-07-30 13:57           ` Swâmi Petaramesh
@ 2019-07-30 14:16             ` Qu Wenruo
  0 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-07-30 14:16 UTC (permalink / raw)
  To: Swâmi Petaramesh; +Cc: linux-btrfs



On 2019/7/30 9:57 PM, Swâmi Petaramesh wrote:
> Hello,
>
> On 30/07/2019 at 09:21, Qu Wenruo wrote:
>> Unfortunately, the transid errors here don't help at all.
>
> Now, each and every time I try to mount this disk, on the original
> machine or another one, I get:

When the message shows up, it means the damage was already done *BEFORE*.

It really has nothing to do with your current kernel version.

If it's the fault of btrfs or device mapper, then it's related to the
kernel used for the last unmount (if unmounted cleanly) or the last
unclean shutdown before this error was hit.

If it's the hardware, you should check whether your disk has unreliable
flush/FUA behavior; different kernels may then differ in how they are
affected, but nothing can save you.

>
> systemd[1]: run-media-xxxxxxxxxxxxxxxx.mount: Succeeded.
> kernel: BTRFS info (device dm-2): disk space caching is enabled
> kernel: BTRFS info (device dm-2): has skinny extents
> kernel: BTRFS error (device dm-2): parent transid verify failed on
> 2137144377344 wanted 7684 found 7499
> kernel: BTRFS error (device dm-2): parent transid verify failed on
> 2137144377344 wanted 7684 found 7499
> kernel: BTRFS error (device dm-2): parent transid verify failed on
> 2137144377344 wanted 7684 found 7499
> kernel: BTRFS: error (device dm-2) in btrfs_drop_snapshot:9465: errno=-5
> IO failure
> kernel: BTRFS info (device dm-2): forced readonly
>
> (It first appears to mount OK, then the errors follow a few seconds
> afterwards, and then it remounts read-only.)
>
> The "7499" displayed seems to correspond to the most recent snapshot
> created on the disk (according to btrfs su li).
>
> Is there any way I could repair this several-TB FS, even if it implies
> losing the latest (or a few of the latest) created subvols and snapshots?

Short answer: no.

If you want to recover to a rw-mountable fs that passes btrfs check, the
chances are very low, as the corruption is in the extent tree, a tree
essential to write operations.
Corruption there is not easy to repair well enough to pass btrfs check
again.
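For pulling data off such a filesystem, the usual last-resort steps can be sketched like this (mount point and target directory are hypothetical, and neither step is guaranteed to work once the extent tree is damaged):

```sh
# Try a read-only mount using an older tree root kept by btrfs:
mount -o ro,usebackuproot /dev/dm-2 /mnt/rescue

# If that still fails, copy files out offline without mounting:
btrfs restore /dev/dm-2 /mnt/recovery-target
```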

Thanks,
Qu

>
> TIA.
>
> Kind regards.
>


end of thread, other threads:[~2019-07-30 14:16 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-13 20:48 [BUG] BTRFS critical corrupt leaf - bisected to 496245cac57e Alexander Wetzel
2019-07-14  1:30 ` Qu Wenruo
2019-07-14  9:25   ` Alexander Wetzel
2019-07-14  9:49     ` Qu Wenruo
2019-07-14 12:07       ` Alexander Wetzel
2019-07-14 12:51         ` Qu Wenruo
2019-07-14 15:40       ` Chris Murphy
2019-07-15  1:07         ` Qu Wenruo
2019-07-29 12:46 ` Swâmi Petaramesh
2019-07-29 13:33   ` Qu Wenruo
2019-07-30  4:56     ` Qu Wenruo
2019-07-30  6:44       ` Swâmi Petaramesh
2019-07-30  7:21         ` Qu Wenruo
2019-07-30  8:02           ` Swâmi Petaramesh
2019-07-30 13:57           ` Swâmi Petaramesh
2019-07-30 14:16             ` Qu Wenruo
