Linux-BTRFS Archive on lore.kernel.org
* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
@ 2019-08-24 17:44 Christoph Anton Mitterer
  2019-08-25 10:00 ` Swâmi Petaramesh
                   ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Christoph Anton Mitterer @ 2019-08-24 17:44 UTC (permalink / raw)
  To: linux-btrfs

Hey.

Anything new about the issue described here:
https://www.spinics.net/lists/linux-btrfs/msg91046.html

It was said that it might be a regression in 5.2 actually and not a
hardware thing... so I just wonder whether I can safely move to 5.2?


Cheers,
Chris.


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-24 17:44 Massive filesystem corruption since kernel 5.2 (ARCH) Christoph Anton Mitterer
@ 2019-08-25 10:00 ` Swâmi Petaramesh
  2019-08-27  0:00   ` Christoph Anton Mitterer
  2019-08-27 12:52 ` Michal Soltys
  2019-09-12  7:50 ` Filipe Manana
  2 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-25 10:00 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs

On 24/08/2019 at 19:44, Christoph Anton Mitterer wrote:
> 
> Anything new about the issue described here:
> https://www.spinics.net/lists/linux-btrfs/msg91046.html
> 
> It was said that it might be a regression in 5.2 actually and not a
> hardware thing... so I just wonder whether I can safely move to 5.2?

Hello,

I re-upgraded the system on which I had the corruption about 10
days ago, and am now running kernel 5.2.9-arch1-1-ARCH.

I haven't seen any filesystem issue since, but I haven't used the system
very much yet.

I was planning to wait a little more before reporting whether it looked
stable or not on my machine.

So I can only say that I have been able to use the system, upgraded
again to 5.2, for a few days: filesystem corruption did not, at least,
immediately follow the kernel upgrade, nor the several reboots and
snapshot creations/deletions since.

Hope this helps a bit.

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-25 10:00 ` Swâmi Petaramesh
@ 2019-08-27  0:00   ` Christoph Anton Mitterer
  2019-08-27  5:06     ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Christoph Anton Mitterer @ 2019-08-27  0:00 UTC (permalink / raw)
  To: Swâmi Petaramesh, linux-btrfs

Hey.


On Sun, 2019-08-25 at 12:00 +0200, Swâmi Petaramesh wrote:
> I haven't seen any filesystem issue since, but I haven't used the
> system very much yet.

Hmm strange... so could it have been a hardware issue?

Cheers,
Chris.


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  0:00   ` Christoph Anton Mitterer
@ 2019-08-27  5:06     ` Swâmi Petaramesh
  2019-08-27  6:13       ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27  5:06 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs

Hi,

On 27/08/2019 at 02:00, Christoph Anton Mitterer wrote:
> 
> On Sun, 2019-08-25 at 12:00 +0200, Swâmi Petaramesh wrote:
>> I haven't seen any filesystem issue since, but I haven't used the
>> system very much yet.
> 
> Hmm strange... so could it have been a hardware issue?

I really do not feel so.

This is a laptop that has been perfectly stable and reliable since 2014.

Hardware dies, but if it did I would expect problems to persist or
manifest again real soon. When RAM fails it usually doesn't feel better
later on.

It corrupted its internal SSD's BTRFS right after I first upgraded to
kernel 5.2, then it corrupted an external HD's BTRFS while I was trying to
back up whatever could still be saved...

Then the machine reverted to its usual fair and stable behaviour after I
restored it with a 5.1 kernel again.

Now the machine looks stable so far with a 5.2 Arch kernel, albeit a
more recent one: 5.2.9-arch1-1-ARCH.

I'm typing this email on it.

I cannot tell what happened, but this really doesn't feel like a
hardware issue to me...

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  5:06     ` Swâmi Petaramesh
@ 2019-08-27  6:13       ` Swâmi Petaramesh
  2019-08-27  6:21         ` Qu Wenruo
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27  6:13 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Christoph Anton Mitterer

On 27/08/2019 at 07:06, Swâmi Petaramesh wrote:
> 
> Now the machine looks stable so far with a 5.2, albeit more recent, Arch
> kernel : 5.2.9-arch1-1-ARCH.

As my 1st machine looks fairly stable now, I just upgraded another one,
which had always been running <= 5.1 before, to 5.2.

So I keep an eye on the syslog.

Right after rebooting into 5.2 I see:

kernel: BTRFS warning (device dm-1): block group 34390147072 has wrong amount of free space
kernel: BTRFS warning (device dm-1): failed to load free space cache for block group 34390147072, rebuilding it now

So it seems that the 5.2 kernel finds and tries to fix minor
inconsistencies that went unnoticed in previous kernel versions?

I wonder whether such things could be the cause of the corruption issues I
got: finding some inconsistencies with new checks right after a kernel
upgrade, trying to fix them and creating a mess instead?

(This 2nd machine has been rebooted twice after this and still looks
happy...)

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  6:13       ` Swâmi Petaramesh
@ 2019-08-27  6:21         ` Qu Wenruo
  2019-08-27  6:34           ` Swâmi Petaramesh
  2019-08-27 10:59           ` Swâmi Petaramesh
  0 siblings, 2 replies; 84+ messages in thread
From: Qu Wenruo @ 2019-08-27  6:21 UTC (permalink / raw)
  To: Swâmi Petaramesh, linux-btrfs; +Cc: Christoph Anton Mitterer



On 2019/8/27 2:13 PM, Swâmi Petaramesh wrote:
> On 27/08/2019 at 07:06, Swâmi Petaramesh wrote:
>>
>> Now the machine looks stable so far with a 5.2, albeit more recent, Arch
>> kernel : 5.2.9-arch1-1-ARCH.
>
> As my 1st machine looks fairly stable now, I just upgraded to 5.2
> another one that had always been running <= 5.1 before.
>
> So I keep an eye on the syslog.
>
> Right after reboot in 5.2 I see :
>
> kernel: BTRFS warning (device dm-1): block
> group 34390147072 has wrong amount of free space
> kernel: BTRFS warning (device dm-1):
> failed to load free space cache for block group 34390147072, rebuilding
> it now
>
> So it seems that the 5.2 kernel finds and tries to fix minor
> inconsistencies that were unnoticed in previous kernel versions ?

It means something had already gone wrong in a previous kernel.

The V1 space cache uses regular-file-like internal structures to record
free space. It doesn't use btrfs' regular csum tree, but uses its
own inline csum to protect its content.

If the free space cache is invalid but passes its csum check, it's
completely *possible* for it to break metadata CoW, thus leading to a transid mismatch.

You can switch to the v2 space cache, which uses metadata CoW to protect its space
cache, so in theory it should be a little safer than the V1 space cache.

Or you can just disable the space cache using the nospace_cache mount option, as
it's just an optimization. It's also recommended to clear the existing cache
by using "btrfs check --clear-space-cache v1".

I'd prefer to do a "btrfs check --readonly" anyway (which also checks the
free space cache), then go with nospace_cache if you're concerned.

Thanks,
Qu

>
> I wonder if such things could be the cause of the corruption issues I
> got : finding some inconsistencies with new checks right after a kernel
> upgrade, trying to fix them and creating a mess instead ?
>
> (This 2nd machine has been rebooted twice after this and still looks
> happy...)
>
> Kind regards.
>
> ॐ
>

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  6:21         ` Qu Wenruo
@ 2019-08-27  6:34           ` Swâmi Petaramesh
  2019-08-27  6:52             ` Qu Wenruo
  2019-08-27 10:59           ` Swâmi Petaramesh
  1 sibling, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27  6:34 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Christoph Anton Mitterer

Hi Qu,

On 27/08/2019 at 08:21, Qu Wenruo wrote:
> If free space cache is invalid but passes its csum check, it's
> completely *possible* to break metadata CoW, thus leads to transid mismatch.
> 
> You can go v2 space cache which uses metadata CoW to protect its space
> cache, thus in theory it should be a little safer than V1 space cache.
> 
> Or you can just disable space cache using nospace_cache mount option, as
> it's just an optimization. It's also recommended to clean existing cache
> by using "btrfs check --clear-space-cache v1".
> 
> I'd prefer to do a "btrfs check --readonly" anyway (which also checks
> free space cache), then go nospace_cache if you're concerned.

I'm leaving to travel shortly, so I won't be able to perform further
tests on this machine for a week, but I'll do so when I'm back.

Should I understand your statement as advice to clear the space cache
even though the kernel said it had rebuilt it, or to use the V2 space
cache generally speaking, on any machine that I use? (I had understood it
was only useful on multi-TB filesystems...)

Thanks.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  6:34           ` Swâmi Petaramesh
@ 2019-08-27  6:52             ` Qu Wenruo
  2019-08-27  9:14               ` Swâmi Petaramesh
       [not found]               ` <-z770dp-y45icx-naspi1dhhf7m-b1jjq3853x22lswnef-p5g363n8kd2f-vdijlg-jk4z4q-raec5-em5djr-et1h33i4xib8jxzw1zxyza74-miq3zn-e4azxaaeyo3abtrf6zj8nb18-hbhrrmnr1ww1.1566894946135@email.android.com>
  0 siblings, 2 replies; 84+ messages in thread
From: Qu Wenruo @ 2019-08-27  6:52 UTC (permalink / raw)
  To: Swâmi Petaramesh, linux-btrfs; +Cc: Christoph Anton Mitterer



On 2019/8/27 2:34 PM, Swâmi Petaramesh wrote:
> Hi Qu,
>
> On 27/08/2019 at 08:21, Qu Wenruo wrote:
>> If free space cache is invalid but passes its csum check, it's
>> completely *possible* to break metadata CoW, thus leads to transid mismatch.
>>
>> You can go v2 space cache which uses metadata CoW to protect its space
>> cache, thus in theory it should be a little safer than V1 space cache.
>>
>> Or you can just disable space cache using nospace_cache mount option, as
>> it's just an optimization. It's also recommended to clean existing cache
>> by using "btrfs check --clear-space-cache v1".
>>
>> I'd prefer to do a "btrfs check --readonly" anyway (which also checks
>> free space cache), then go nospace_cache if you're concerned.
>
> I will leave for travel shortly, so I will be unable to perform further
> tests on this machine for a week, but I'll do when I'm back.
>
> Should I understand your statement as an advice to clear the space cache
> even though the kernel said it has rebuilt it,

A rebuild only happens when the kernel detects such a mismatch by comparing
the block group's free space (recorded in the block group item) with the
free space cache.

If those numbers (along with other things like csum and generation)
match, we have no way to detect a wrong free space cache at all.

So if the kernel is already complaining about the free space cache, then
whatever the reason is, you'd better take extra care with the
free space cache.

Another possible cause, though, is that your extent tree is already
corrupted, so the free space number in the block group item is already
incorrect.
You can only determine that by running btrfs check --readonly.
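
To make the detection logic above concrete, here is a toy model in Python. It is a hypothetical sketch, not the kernel's code: the `V1CacheModel` type, its fields and `cache_is_trusted` are invented for illustration, and the real v1 loader performs more checks than this. It shows the point being made: when the inline csum, the generation and the total free space all agree, a cache that records free space at the wrong offsets is still accepted.

```python
from dataclasses import dataclass, field


@dataclass
class V1CacheModel:
    """Hypothetical, simplified stand-in for one block group's v1 cache."""
    csum_ok: bool        # the cache's own inline checksum verified
    generation: int      # generation stamped into the cache
    free_extents: list = field(default_factory=list)  # (offset, length) pairs

    def cached_free(self) -> int:
        # Total free bytes according to the cache contents.
        return sum(length for _offset, length in self.free_extents)


def cache_is_trusted(cache: V1CacheModel, expected_generation: int,
                     block_group_free: int) -> bool:
    """Return True if this toy model would load the cache, False if it would rebuild it.

    block_group_free stands in for the free space derived from the block
    group item; the only cross-check against it is the total, not the offsets.
    """
    if not cache.csum_ok or cache.generation != expected_generation:
        return False
    return cache.cached_free() == block_group_free


correct = V1CacheModel(True, 100, [(0, 4096), (8192, 4096)])
misplaced = V1CacheModel(True, 100, [(4096, 8192)])  # same total, wrong offsets
stale = V1CacheModel(True, 100, [(0, 4096)])         # total disagrees

print(cache_is_trusted(correct, 100, 8192))    # True
print(cache_is_trusted(misplaced, 100, 8192))  # True: the bad cache goes undetected
print(cache_is_trusted(stale, 100, 8192))      # False: rebuilt, as in the warning quoted earlier
```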

> or to use the V2 space
> cache generally speaking, on any machine that I use (I had understood it
> was useful only on multi-TB filesystems...)

10GiB is already enough to create block groups large enough to use the
free space cache.
So you can't really escape the free space cache.

Thanks,
Qu

>
> Thanks.
>
> ॐ
>

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  6:52             ` Qu Wenruo
@ 2019-08-27  9:14               ` Swâmi Petaramesh
  2019-08-27 12:40                 ` Hans van Kranenburg
       [not found]               ` <-z770dp-y45icx-naspi1dhhf7m-b1jjq3853x22lswnef-p5g363n8kd2f-vdijlg-jk4z4q-raec5-em5djr-et1h33i4xib8jxzw1zxyza74-miq3zn-e4azxaaeyo3abtrf6zj8nb18-hbhrrmnr1ww1.1566894946135@email.android.com>
  1 sibling, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27  9:14 UTC (permalink / raw)
  To: linux-btrfs

On 8/27/19 8:52 AM, Qu Wenruo wrote:
>> or to use the V2 space
>> cache generally speaking, on any machine that I use (I had understood it
>> was useful only on multi-TB filesystems...)
> 10GiB is enough to create large enough block groups to utilize free
> space cache.
> So you can't really escape from free space cache.

I meant that I had understood that the V2 space cache was preferable to
V1 only for multi-TB filesystems.

So would you advise using the V2 space cache for filesystems < 1 TB as well?

TIA.

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  6:21         ` Qu Wenruo
  2019-08-27  6:34           ` Swâmi Petaramesh
@ 2019-08-27 10:59           ` Swâmi Petaramesh
  2019-08-27 11:11             ` Alberto Bursi
  1 sibling, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27 10:59 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Christoph Anton Mitterer

Hi again,

On 27/08/2019 at 08:21, Qu Wenruo wrote:
> I'd prefer to do a "btrfs check --readonly" anyway (which also checks
> free space cache), then go nospace_cache if you're concerned.

Here's what I did, and here's what I got:

root@PartedMagic:~# uname -r
5.1.5-pmagic64

root@PartedMagic:~# btrfs --version
btrfs-progs v5.1

root@PartedMagic:~# btrfs check --readonly /dev/PPL_VG1/LINUX
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
block group 52677312512 has wrong amount of free space, free space cache has 266551296 block group has 266584064
failed to load free space cache for block group 52677312512
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
Opening filesystem to check...
Checking filesystem on /dev/PPL_VG1/LINUX
UUID: 25fede5a-d8c2-4c7e-9e7e-b19aad319044
found 87804731392 bytes used, no error found
total csum bytes: 79811080
total tree bytes: 2195832832
total fs tree bytes: 1992900608
total extent tree bytes: 101548032
btree space waste bytes: 380803707
file data blocks allocated: 626135830528
 referenced 124465221632

root@PartedMagic:~# mkdir /hd
root@PartedMagic:~# mount -t btrfs -o noatime,clear_cache /dev/PPL_VG1/LINUX /hd

(Waited until there was no disk activity and top showed no btrfs processes)

root@PartedMagic:~# umount /hd

root@PartedMagic:~# mount -t btrfs -o noatime /dev/PPL_VG1/LINUX /hd

root@PartedMagic:~# grep btrfs /proc/self/mountinfo
40 31 0:43 / /hd rw,noatime - btrfs /dev/mapper/PPL_VG1-LINUX rw,ssd,space_cache,subvolid=5,subvol=/

root@PartedMagic:~# umount /hd

root@PartedMagic:~# btrfs check --readonly /dev/PPL_VG1/LINUX
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
block group 52677312512 has wrong amount of free space, free space cache has 266551296 block group has 266584064
failed to load free space cache for block group 52677312512
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
Opening filesystem to check...
Checking filesystem on /dev/PPL_VG1/LINUX
UUID: 25fede5a-d8c2-4c7e-9e7e-b19aad319044
found 87804207104 bytes used, no error found
total csum bytes: 79811080
total tree bytes: 2195832832
total fs tree bytes: 1992900608
total extent tree bytes: 101548032
btree space waste bytes: 380804019
file data blocks allocated: 626135306240
 referenced 124464697344
root@PartedMagic:~#


So it seems that mounting with “clear_cache” did not actually clear the
cache and fix the issue?
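
Both check runs above report the same mismatch for block group 52677312512, so the mount with clear_cache evidently left that block group's cache as it was. A small parser makes the comparison explicit. This is a sketch assuming the exact message wording printed by btrfs-progs v5.1 in this session (other versions may word it differently); it collapses whitespace first, so a message wrapped across lines by a mail client still matches.

```python
import re

# Assumes the wording btrfs-progs v5.1 printed in the session above.
MISMATCH_RE = re.compile(
    r"block group (\d+) has wrong amount of free space, "
    r"free space cache has (\d+) block group has (\d+)"
)


def free_space_mismatches(check_output: str) -> list:
    """Return (block_group, cache_free, block_group_free, difference) tuples."""
    # Collapse all whitespace so messages wrapped across lines still match.
    text = " ".join(check_output.split())
    results = []
    for m in MISMATCH_RE.finditer(text):
        bg, cache_free, bg_free = (int(x) for x in m.groups())
        results.append((bg, cache_free, bg_free, bg_free - cache_free))
    return results


output = """[3/7] checking free space cache
block group 52677312512 has wrong amount of free space, free space cache
has 266551296 block group has 266584064
failed to load free space cache for block group 52677312512"""

for bg, cache_free, bg_free, diff in free_space_mismatches(output):
    print(f"block group {bg}: cache {cache_free}, block group {bg_free}, "
          f"difference {diff} bytes ({diff // 1024} KiB)")
# block group 52677312512: cache 266551296, block group 266584064, difference 32768 bytes (32 KiB)
```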

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 10:59           ` Swâmi Petaramesh
@ 2019-08-27 11:11             ` Alberto Bursi
  2019-08-27 11:20               ` Swâmi Petaramesh
                                 ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Alberto Bursi @ 2019-08-27 11:11 UTC (permalink / raw)
  To: Swâmi Petaramesh, Qu Wenruo, linux-btrfs; +Cc: Christoph Anton Mitterer


On 27/08/19 12:59, Swâmi Petaramesh wrote:
>
>
> So it seems that mounting with “clear_cache” did not actually clear the
> cache and fix the issue ?
>
> ॐ
>

mounting with clear_cache does not actually clear the whole v1 cache

only the cache of block groups that get modified while it is mounted that way.


If you want to fully clear the cache you need to use (on an unmounted 
filesystem)

btrfs check --clear-space-cache v1 /dev/sdX

or

btrfs check --clear-space-cache v2 /dev/sdX

depending on which space cache you used (v1 is the default)
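
To tell which variant applies, one can look at the options the kernel reports for the mounted filesystem, such as the space_cache entry in the /proc/self/mountinfo line shown earlier in this thread. Below is a hypothetical helper sketching that: it assumes the kernel reports `space_cache` for v1 and `space_cache=v2` for the free space tree, and it only builds the btrfs command line; actually running it, after unmounting, is left to you.

```python
from typing import List, Optional, Tuple


def parse_mountinfo_line(line: str) -> Tuple[str, str, str, List[str]]:
    """Split one /proc/self/mountinfo line into (mount_point, fstype, source, super_options)."""
    # Fields before the lone "-" separator are per-mount; the ones after are per-superblock.
    left, right = line.split(" - ", 1)
    mount_point = left.split()[4]
    fstype, source, super_opts = right.split()[:3]
    return mount_point, fstype, source, super_opts.split(",")


def clear_space_cache_command(super_opts: List[str], device: str) -> Optional[List[str]]:
    """Build (but do not run) the matching btrfs check command, or None if no cache is in use."""
    if "space_cache=v2" in super_opts:
        version = "v2"
    elif "space_cache" in super_opts or "space_cache=v1" in super_opts:
        version = "v1"
    else:
        return None
    return ["btrfs", "check", "--clear-space-cache", version, device]


line = ("40 31 0:43 / /hd rw,noatime - btrfs /dev/mapper/PPL_VG1-LINUX "
        "rw,ssd,space_cache,subvolid=5,subvol=/")
mount_point, fstype, source, opts = parse_mountinfo_line(line)
cmd = clear_space_cache_command(opts, source)
print(" ".join(cmd))  # btrfs check --clear-space-cache v1 /dev/mapper/PPL_VG1-LINUX
```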


-Alberto


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 11:11             ` Alberto Bursi
@ 2019-08-27 11:20               ` Swâmi Petaramesh
  2019-08-27 11:29                 ` Alberto Bursi
  2019-08-27 17:49               ` Swâmi Petaramesh
  2019-08-27 22:10               ` Chris Murphy
  2 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27 11:20 UTC (permalink / raw)
  To: Alberto Bursi, Qu Wenruo, linux-btrfs; +Cc: Christoph Anton Mitterer

On 27/08/2019 at 13:11, Alberto Bursi wrote:
> 
> 
> btrfs check --clear-space-cache v1 /dev/sdX

“Bad option” (even with _ instead of -, and between the option and v1 or v2...)

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 11:20               ` Swâmi Petaramesh
@ 2019-08-27 11:29                 ` Alberto Bursi
  2019-08-27 11:45                   ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Alberto Bursi @ 2019-08-27 11:29 UTC (permalink / raw)
  To: Swâmi Petaramesh, Qu Wenruo, linux-btrfs; +Cc: Christoph Anton Mitterer


On 27/08/19 13:20, Swâmi Petaramesh wrote:
> On 27/08/2019 at 13:11, Alberto Bursi wrote:
>>
>> btrfs check --clear-space-cache v1 /dev/sdX
> “Bad option” (even with _ instead of - and between option and v1 or V2...
>
> ॐ
>

Here on my up-to-date OpenSUSE Tumbleweed system it works.

(doing this on a random flash drive I just formatted)

hpprobook:/home/alby # btrfs check --clear-space-cache v1 /dev/sdd1
Opening filesystem to check...
Checking filesystem on /dev/sdd1
UUID: f69a86ce-7aaa-4c9d-a6dd-4c8ff092007f
Free space cache cleared

-Alberto


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 11:29                 ` Alberto Bursi
@ 2019-08-27 11:45                   ` Swâmi Petaramesh
  0 siblings, 0 replies; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27 11:45 UTC (permalink / raw)
  To: Alberto Bursi, Qu Wenruo, linux-btrfs; +Cc: Christoph Anton Mitterer

On 27/08/2019 at 13:29, Alberto Bursi wrote:
> hpprobook:/home/alby # btrfs check --clear-space-cache v1 /dev/sdd1

My mistake, I read it too fast and tried it as a mount option...

ॐ


-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

* Re: Re: Massive filesystem corruption since kernel 5.2 (ARCH)
       [not found]               ` <-z770dp-y45icx-naspi1dhhf7m-b1jjq3853x22lswnef-p5g363n8kd2f-vdijlg-jk4z4q-raec5-em5djr-et1h33i4xib8jxzw1zxyza74-miq3zn-e4azxaaeyo3abtrf6zj8nb18-hbhrrmnr1ww1.1566894946135@email.android.com>
@ 2019-08-27 12:34                 ` " Qu Wenruo
  0 siblings, 0 replies; 84+ messages in thread
From: Qu Wenruo @ 2019-08-27 12:34 UTC (permalink / raw)
  To: swami, linux-btrfs; +Cc: Christoph Anton Mitterer



On 2019/8/27 4:35 PM, swami@petaramesh.org wrote:
> Hi Qu,
>
> Sorry for top-posting (my phone, uh...)
>
> I meant that I had understood that the V2 space cache was preferable to
> V1 only for multi-TB filesystems.
>
> So would you advise to use V2 space cache also for filesystems < 1 TB ?

Sorry, I'm not familiar enough with the space cache/tree code to make any
recommendation based on the code.

My recommendation is: disable the space cache if in doubt. Simple and
straightforward.

Thanks,
Qu
>
> TIA.
>
> Kind regards.
>
>
>
> -------- Original message --------
> Subject: Re: Massive filesystem corruption since kernel 5.2 (ARCH)
> From: Qu Wenruo
> To: Swâmi Petaramesh, linux-btrfs@vger.kernel.org
> Cc: Christoph Anton Mitterer
>
>
>     > or to use the V2 space
>     > cache generally speaking, on any machine that I use (I had
>     understood it
>     > was useful only on multi-TB filesystems...)
>
>     10GiB is enough to create large enough block groups to utilize free
>     space cache.
>     So you can't really escape from free space cache.
>

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  9:14               ` Swâmi Petaramesh
@ 2019-08-27 12:40                 ` Hans van Kranenburg
  2019-08-29 12:46                   ` Oliver Freyermuth
  0 siblings, 1 reply; 84+ messages in thread
From: Hans van Kranenburg @ 2019-08-27 12:40 UTC (permalink / raw)
  To: Swâmi Petaramesh, linux-btrfs

On 8/27/19 11:14 AM, Swâmi Petaramesh wrote:
> On 8/27/19 8:52 AM, Qu Wenruo wrote:
>>> or to use the V2 space
>>> cache generally speaking, on any machine that I use (I had understood it
>>> was useful only on multi-TB filesystems...)
>> 10GiB is enough to create large enough block groups to utilize free
>> space cache.
>> So you can't really escape from free space cache.
> 
> I meant that I had understood that the V2 space cache was preferable to
> V1 only for multi-TB filesystems.
> 
> So would you advise to use V2 space cache also for filesystems < 1 TB ?

Yes.

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-24 17:44 Massive filesystem corruption since kernel 5.2 (ARCH) Christoph Anton Mitterer
  2019-08-25 10:00 ` Swâmi Petaramesh
@ 2019-08-27 12:52 ` Michal Soltys
  2019-09-12  7:50 ` Filipe Manana
  2 siblings, 0 replies; 84+ messages in thread
From: Michal Soltys @ 2019-08-27 12:52 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs

On 8/24/19 7:44 PM, Christoph Anton Mitterer wrote:
> Hey.
> 
> Anything new about the issue described here:
> https://www.spinics.net/lists/linux-btrfs/msg91046.html
> 
> It was said that it might be a regression in 5.2 actually and not a
> hardware thing... so I just wonder whether I can safely move to 5.2?
> 
> 
> Cheers,
> Chris.
> 
> 

FWIW, my laptop has been on btrfs since around the late 4.x kernel days, 
also using Arch Linux. And it went through all the kernels since then, up 
to the current one (5.2.9 as of this writing), without any issues. It 
survived some peculiar hangs and sudden power-offs without any lasting bad 
side effects throughout its time with btrfs as its main filesystem. It's on 
an old Samsung 850 Pro SSD (though I haven't tested whether the disk lies 
about cache flushes or not).

That is to say, I don't have any storage stacks underneath. On some other 
machines I use btrfs (on Arch) as well, often with its builtin raid1 
implementation - no issues observed so far.

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 11:11             ` Alberto Bursi
  2019-08-27 11:20               ` Swâmi Petaramesh
@ 2019-08-27 17:49               ` Swâmi Petaramesh
  2019-08-27 22:10               ` Chris Murphy
  2 siblings, 0 replies; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27 17:49 UTC (permalink / raw)
  To: Alberto Bursi, linux-btrfs; +Cc: Qu Wenruo, Christoph Anton Mitterer

Hi Alberto,

On 27/08/2019 at 13:11, Alberto Bursi wrote:
> If you want to fully clear cache you need to use (on an unmounted 
> filesystem)
> 
> btrfs check --clear-space-cache v1 /dev/sdX

It worked, thanks.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 11:11             ` Alberto Bursi
  2019-08-27 11:20               ` Swâmi Petaramesh
  2019-08-27 17:49               ` Swâmi Petaramesh
@ 2019-08-27 22:10               ` Chris Murphy
  2 siblings, 0 replies; 84+ messages in thread
From: Chris Murphy @ 2019-08-27 22:10 UTC (permalink / raw)
  To: Alberto Bursi
  Cc: Swâmi Petaramesh, Qu Wenruo, linux-btrfs, Christoph Anton Mitterer

On Tue, Aug 27, 2019 at 5:11 AM Alberto Bursi <alberto.bursi@outlook.it> wrote:

> If you want to fully clear cache you need to use (on an unmounted
> filesystem)
>
> btrfs check --clear-space-cache v1 /dev/sdX
>
> or
>
> btrfs check  --clear-space-cache v2 /dev/sdX

I recommend a minimum btrfs-progs version of 5.1 for either of these
commands. Before that version, a crash in the middle of writing an
extent tree update can cause file system corruption. In my case, all data
could be extracted merely by mounting -o ro, but I did have to
recreate that file system from scratch.
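
A script wrapping these commands could enforce that minimum before doing anything. A minimal sketch, assuming `btrfs --version` prints a line like `btrfs-progs v5.1` (as in the session earlier in this thread); the 5.1 threshold is the recommendation above, and `progs_new_enough` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

MIN_PROGS = (5, 1)  # minimum recommended above


def parse_progs_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, ...) from text such as 'btrfs-progs v5.1' or 'btrfs-progs v4.20.2'."""
    m = re.search(r"btrfs-progs v(\d+(?:\.\d+)*)", version_output)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


def progs_new_enough(version_output: str, minimum: Tuple[int, ...] = MIN_PROGS) -> bool:
    """True only if a version was found and it is at least `minimum` (tuple comparison)."""
    version = parse_progs_version(version_output)
    return version is not None and version >= minimum


print(progs_new_enough("btrfs-progs v5.1"))     # True
print(progs_new_enough("btrfs-progs v4.20.2"))  # False
```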



--
Chris Murphy

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 12:40                 ` Hans van Kranenburg
@ 2019-08-29 12:46                   ` Oliver Freyermuth
  2019-08-29 13:08                     ` Christoph Anton Mitterer
                                       ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Oliver Freyermuth @ 2019-08-29 12:46 UTC (permalink / raw)
  To: Hans van Kranenburg, Swâmi Petaramesh, linux-btrfs

On 27.08.19 at 14:40, Hans van Kranenburg wrote:
> On 8/27/19 11:14 AM, Swâmi Petaramesh wrote:
>> On 8/27/19 8:52 AM, Qu Wenruo wrote:
>>>> or to use the V2 space
>>>> cache generally speaking, on any machine that I use (I had understood it
>>>> was useful only on multi-TB filesystems...)
>>> 10GiB is enough to create large enough block groups to utilize free
>>> space cache.
>>> So you can't really escape from free space cache.
>>
>> I meant that I had understood that the V2 space cache was preferable to
>> V1 only for multi-TB filesystems.
>>
>> So would you advise to use V2 space cache also for filesystems < 1 TB ?
> 
> Yes.
> 

This makes me wonder: should it be the default? 

This thread made me check my various BTRFS volumes, and for almost all of them (on different machines) I find occurrences of
 failed to load free space cache for block group XXXX, rebuilding it now
at several points in my syslogs over the last months - and that's for machines without broken memory, with disks for which FUA should be working fine,
without any unsafe shutdowns over their lifetime, and with histories as short as having seen only 5.x kernels. 
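
The syslog check described above can be scripted. The sketch below counts the rebuild warnings per device in kernel log text; it assumes the message wording quoted in this thread, and how the text is obtained (dmesg, journalctl -k, or a syslog file) is left to the caller. `count_cache_rebuilds` is a hypothetical name.

```python
import re
from collections import Counter

# Assumes the wording of the kernel warning quoted in this thread.
CACHE_WARN_RE = re.compile(
    r"BTRFS warning \(device ([^)]+)\): "
    r"failed to load free space cache for block group (\d+)"
)


def count_cache_rebuilds(log_text: str) -> Counter:
    """Count 'failed to load free space cache' warnings per btrfs device."""
    return Counter(m.group(1) for m in CACHE_WARN_RE.finditer(log_text))


log = (
    "kernel: BTRFS warning (device dm-1): block group 34390147072 "
    "has wrong amount of free space\n"
    "kernel: BTRFS warning (device dm-1): failed to load free space cache "
    "for block group 34390147072, rebuilding it now\n"
)
print(count_cache_rebuilds(log))  # Counter({'dm-1': 1})
```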

So if this may cause harmful side effects, happens without clear origin, and v2 is safer due to being CoW, 
I guess I should switch all my nodes to v2 (or this should become the default in a future kernel?). 

Cheers,
	Oliver

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-29 12:46                   ` Oliver Freyermuth
@ 2019-08-29 13:08                     ` Christoph Anton Mitterer
  2019-08-29 13:09                     ` Swâmi Petaramesh
  2019-08-29 13:11                     ` Qu Wenruo
  2 siblings, 0 replies; 84+ messages in thread
From: Christoph Anton Mitterer @ 2019-08-29 13:08 UTC (permalink / raw)
  To: linux-btrfs

On Thu, 2019-08-29 at 14:46 +0200, Oliver Freyermuth wrote:
> This thread made me check on my various BTRFS volumes and for almost
> all of them (in different machines), I find cases of
>  failed to load free space cache for block group XXXX, rebuilding it
> now
> at several points during the last months in my syslogs - and that's
> for machines without broken memory, for disks for which FUA should be
> working fine,
> without any unsafe shutdowns over their lifetime, and with histories
> as short as only having seen 5.x kernels. 


I'm seeing the very same: machines that are most likely fine in terms
of hardware and had no crashes or the like, yet they still see v1 free
space cache issues every now and then, which sounds like a pointer
that something's wrong there.

Cheers,
Chris.


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-29 12:46                   ` Oliver Freyermuth
  2019-08-29 13:08                     ` Christoph Anton Mitterer
@ 2019-08-29 13:09                     ` Swâmi Petaramesh
  2019-08-29 13:11                     ` Qu Wenruo
  2 siblings, 0 replies; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-29 13:09 UTC (permalink / raw)
  To: Oliver Freyermuth, Hans van Kranenburg, linux-btrfs

On 8/29/19 2:46 PM, Oliver Freyermuth wrote:
> This thread made me check on my various BTRFS volumes and for almost all of them (in different machines), I find cases of
>  failed to load free space cache for block group XXXX, rebuilding it now
> at several points during the last months in my syslogs - and that's for machines without broken memory, for disks for which FUA should be working fine,
> without any unsafe shutdowns over their lifetime, and with histories as short as only having seen 5.x kernels. 

Wow. Thanks for the report. There's definitely some bug out there.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-29 12:46                   ` Oliver Freyermuth
  2019-08-29 13:08                     ` Christoph Anton Mitterer
  2019-08-29 13:09                     ` Swâmi Petaramesh
@ 2019-08-29 13:11                     ` Qu Wenruo
  2019-08-29 13:17                       ` Oliver Freyermuth
  2 siblings, 1 reply; 84+ messages in thread
From: Qu Wenruo @ 2019-08-29 13:11 UTC (permalink / raw)
  To: Oliver Freyermuth, Hans van Kranenburg, Swâmi Petaramesh,
	linux-btrfs



On 2019/8/29 8:46 PM, Oliver Freyermuth wrote:
> On 27.08.19 at 14:40, Hans van Kranenburg wrote:
>> On 8/27/19 11:14 AM, Swâmi Petaramesh wrote:
>>> On 8/27/19 8:52 AM, Qu Wenruo wrote:
>>>>> or to use the V2 space
>>>>> cache generally speaking, on any machine that I use (I had understood it
>>>>> was useful only on multi-TB filesystems...)
>>>> 10GiB is enough to create large enough block groups to utilize free
>>>> space cache.
>>>> So you can't really escape from free space cache.
>>>
>>> I meant that I had understood that the V2 space cache was preferable to
>>> V1 only for multi-TB filesystems.
>>>
>>> So would you advise to use V2 space cache also for filesystems < 1 TB ?
>>
>> Yes.
>>
>
> This makes me wonder if it should be the default?

It will be.

Just a spoiler: I believe features like no-holes and the v2 space cache will
become the default in the not-so-distant future.

>
> This thread made me check on my various BTRFS volumes and for almost all of them (in different machines), I find cases of
>  failed to load free space cache for block group XXXX, rebuilding it now
> at several points during the last months in my syslogs - and that's for machines without broken memory, for disks for which FUA should be working fine,
> without any unsafe shutdowns over their lifetime, and with histories as short as only having seen 5.x kernels.

That's interesting. In theory that shouldn't happen, especially without
unsafe shutdown.

But please also be aware that there is no concrete proof that a corrupted
v1 space cache is causing all the problems.
What I said is just that a corrupted v1 space cache may cause problems; I need
to at least craft an image to prove my assumption.

>
> So if this may cause harmful side effects, happens without clear origin, and v2 is safer due to being CoW,
> I guess I should switch all my nodes to v2 (or this should become the default in a future kernel?).

At least, your experience would definitely help the btrfs community.

Thanks,
Qu

>
> Cheers,
> 	Oliver
>


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-29 13:11                     ` Qu Wenruo
@ 2019-08-29 13:17                       ` Oliver Freyermuth
  2019-08-29 17:40                         ` Oliver Freyermuth
  0 siblings, 1 reply; 84+ messages in thread
From: Oliver Freyermuth @ 2019-08-29 13:17 UTC (permalink / raw)
  To: Qu Wenruo, Hans van Kranenburg, Swâmi Petaramesh, linux-btrfs

On 29.08.19 at 15:11, Qu Wenruo wrote:
> 
> 
> On 2019/8/29 8:46 PM, Oliver Freyermuth wrote:
>> On 27.08.19 at 14:40, Hans van Kranenburg wrote:
>>> On 8/27/19 11:14 AM, Swâmi Petaramesh wrote:
>>>> On 8/27/19 8:52 AM, Qu Wenruo wrote:
>>>>>> or to use the V2 space
>>>>>> cache generally speaking, on any machine that I use (I had understood it
>>>>>> was useful only on multi-TB filesystems...)
>>>>> 10GiB is enough to create large enough block groups to utilize free
>>>>> space cache.
>>>>> So you can't really escape from free space cache.
>>>>
>>>> I meant that I had understood that the V2 space cache was preferable to
>>>> V1 only for multi-TB filesystems.
>>>>
>>>> So would you advise to use V2 space cache also for filesystems < 1 TB ?
>>>
>>> Yes.
>>>
>>
>> This makes me wonder if it should be the default?
> 
> It will be.
> 
> Just a spoiler: I believe features like no-holes and the v2 space cache will
> become the default in the not-so-distant future.
> 
>>
>> This thread made me check on my various BTRFS volumes and for almost all of them (in different machines), I find cases of
>>  failed to load free space cache for block group XXXX, rebuilding it now
>> at several points during the last months in my syslogs - and that's for machines without broken memory, for disks for which FUA should be working fine,
>> without any unsafe shutdowns over their lifetime, and with histories as short as only having seen 5.x kernels.
> 
> That's interesting. In theory that shouldn't happen, especially without
> unsafe shutdown.

I also forgot to add that these machines have no mdraid / dm / LUKS layer in between (i.e. pure btrfs on the drives).
The messages _seem_ to be more prominent on spinning disks, but then again, my statistics cover just 5 devices in total.
So it really "feels" like a bug crawling around somewhere. However, the machines do not seem to have seen any actual corruption as a consequence.
I'm playing with "btrfs check --readonly" now to see if everything really is still fine, but I'm already running kernel 5.2 with the new checks without issues.
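For anyone wanting to repeat this kind of syslog check, here is a minimal Python sketch that tallies the "rebuilding it now" warnings per device. The "BTRFS warning (device X):" prefix and the helper name count_cache_rebuilds are assumptions for illustration; adjust the pattern if your kernel words the message differently.

```python
import re
from collections import Counter

# Assumed log format, e.g.:
# "BTRFS warning (device sda1): failed to load free space cache for block group 22020096, rebuilding it now"
REBUILD_RE = re.compile(
    r"BTRFS \w+ \(device (?P<dev>[^)]+)\): "
    r"failed to load free space cache for block group (?P<bg>\d+), rebuilding it now"
)


def count_cache_rebuilds(log_text):
    """Return a Counter mapping device name -> number of v1 cache rebuild warnings."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REBUILD_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "kernel: BTRFS warning (device sda1): failed to load free space cache for block group 22020096, rebuilding it now",
        "kernel: BTRFS info (device sda1): disk space caching is enabled",
        "kernel: BTRFS warning (device sdb2): failed to load free space cache for block group 1104150528, rebuilding it now",
        "kernel: BTRFS warning (device sda1): failed to load free space cache for block group 30408704, rebuilding it now",
    ])
    print(count_cache_rebuilds(sample))  # Counter({'sda1': 2, 'sdb2': 1})
```

Feeding it the text of `journalctl -k` (or a saved dmesg) instead of the sample gives a quick per-device count.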

> But please also be aware that there is no concrete proof that a corrupted
> v1 space cache is causing all the problems.
> What I said is just that a corrupted v1 space cache may cause problems; I need
> to at least craft an image to prove my assumption.

I see - that might be useful in any case to hopefully track down the issue. 

> 
>>
>> So if this may cause harmful side effects, happens without clear origin, and v2 is safer due to being CoW,
>> I guess I should switch all my nodes to v2 (or this should become the default in a future kernel?).
> 
> At least, your experience would definitely help the btrfs community.

Ok, then I will slowly switch the nodes one by one. If I don't come back crying to the list, all is well (but I'm only a small data point with 5 disks in three machines) ;-).

Cheers,
	Oliver

> 
> Thanks,
> Qu
> 
>>
>> Cheers,
>> 	Oliver
>>


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-29 13:17                       ` Oliver Freyermuth
@ 2019-08-29 17:40                         ` Oliver Freyermuth
  0 siblings, 0 replies; 84+ messages in thread
From: Oliver Freyermuth @ 2019-08-29 17:40 UTC (permalink / raw)
  To: Qu Wenruo, Hans van Kranenburg, Swâmi Petaramesh, linux-btrfs

On 29.08.19 at 15:17, Oliver Freyermuth wrote:
> On 29.08.19 at 15:11, Qu Wenruo wrote:
>>
>>
>> On 2019/8/29 8:46 PM, Oliver Freyermuth wrote:
>>> On 27.08.19 at 14:40, Hans van Kranenburg wrote:
>>>> On 8/27/19 11:14 AM, Swâmi Petaramesh wrote:
>>>>> On 8/27/19 8:52 AM, Qu Wenruo wrote:
>>>>>>> or to use the V2 space
>>>>>>> cache generally speaking, on any machine that I use (I had understood it
>>>>>>> was useful only on multi-TB filesystems...)
>>>>>> 10GiB is enough to create large enough block groups to utilize free
>>>>>> space cache.
>>>>>> So you can't really escape from free space cache.
>>>>>
>>>>> I meant that I had understood that the V2 space cache was preferable to
>>>>> V1 only for multi-TB filesystems.
>>>>>
>>>>> So would you advise to use V2 space cache also for filesystems < 1 TB ?
>>>>
>>>> Yes.
>>>>
>>>
>>> This makes me wonder if it should be the default?
>>
>> It will be.
>>
>> Just a spoiler: I believe features like no-holes and the v2 space cache will
>> become the default in the not-so-distant future.
>>
>>>
>>> This thread made me check on my various BTRFS volumes and for almost all of them (in different machines), I find cases of
>>>  failed to load free space cache for block group XXXX, rebuilding it now
>>> at several points during the last months in my syslogs - and that's for machines without broken memory, for disks for which FUA should be working fine,
>>> without any unsafe shutdowns over their lifetime, and with histories as short as only having seen 5.x kernels.
>>
>> That's interesting. In theory that shouldn't happen, especially without
>> unsafe shutdown.
> 
> I also forgot to add that these machines have no mdraid / dm / LUKS layer in between (i.e. pure btrfs on the drives).
> The messages _seem_ to be more prominent on spinning disks, but then again, my statistics cover just 5 devices in total.
> So it really "feels" like a bug crawling around somewhere. However, the machines do not seem to have seen any actual corruption as a consequence.
> I'm playing with "btrfs check --readonly" now to see if everything really is still fine, but I'm already running kernel 5.2 with the new checks without issues.

To reassure anybody still worried about the rebuilding warnings:
I have already checked two disks (including the most affected one) and they were perfectly healthy. So it seems that, at least in my case, the "rebuilding" warning did not (yet?) coincide with any corruption.
I have also converted the largest and most affected disk to space_cache=v2 now. If that works well over the next weeks, I will look at converting the rest,
and if not, I'll be back here ;-).
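To see which free space cache each mounted btrfs currently uses (e.g. after converting a disk to space_cache=v2), here is a small Python sketch that classifies the options listed in /proc/mounts. It assumes the kernel prints `space_cache=v2`, `space_cache` (or `space_cache=v1` on some kernels) and `nospace_cache` there; the helper names are hypothetical.

```python
def space_cache_mode(options):
    """Classify a btrfs option string (4th field of a /proc/mounts line)."""
    opts = options.split(",")
    if "space_cache=v2" in opts:
        return "v2"
    if "nospace_cache" in opts:
        return "none"
    if "space_cache" in opts or "space_cache=v1" in opts:
        return "v1"
    return "unknown"  # option not listed; check the kernel's defaults


def btrfs_space_cache_modes(mounts_text):
    """Map each btrfs mount point in /proc/mounts-style text to its cache mode."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "btrfs":
            modes[fields[1]] = space_cache_mode(fields[3])
    return modes


if __name__ == "__main__":
    sample = (
        "/dev/sda2 / btrfs rw,noatime,space_cache=v2,subvolid=256,subvol=/@ 0 0\n"
        "/dev/sdb1 /data btrfs rw,relatime,space_cache,subvolid=5,subvol=/ 0 0\n"
        "proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0\n"
    )
    print(btrfs_space_cache_modes(sample))  # {'/': 'v2', '/data': 'v1'}
```

On a live system, pass the contents of /proc/mounts instead of the sample string.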

Cheers,
	Oliver

> 
>> But please also be aware that there is no concrete proof that a corrupted
>> v1 space cache is causing all the problems.
>> What I said is just that a corrupted v1 space cache may cause problems; I need
>> to at least craft an image to prove my assumption.
> 
> I see - that might be useful in any case to hopefully track down the issue. 
> 
>>
>>>
>>> So if this may cause harmful side effects, happens without clear origin, and v2 is safer due to being CoW,
>>> I guess I should switch all my nodes to v2 (or this should become the default in a future kernel?).
>>
>> At least, your experience would definitely help the btrfs community.
> 
> Ok, then I will slowly switch the nodes one by one. If I don't come back crying to the list, all is well (but I'm only a small data point with 5 disks in three machines) ;-).
> 
> Cheers,
> 	Oliver
> 
>>
>> Thanks,
>> Qu
>>
>>>
>>> Cheers,
>>> 	Oliver
>>>



* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-24 17:44 Massive filesystem corruption since kernel 5.2 (ARCH) Christoph Anton Mitterer
  2019-08-25 10:00 ` Swâmi Petaramesh
  2019-08-27 12:52 ` Michal Soltys
@ 2019-09-12  7:50 ` Filipe Manana
  2019-09-12  8:24   ` James Harvey
                     ` (2 more replies)
  2 siblings, 3 replies; 84+ messages in thread
From: Filipe Manana @ 2019-09-12  7:50 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: linux-btrfs, David Sterba

On Sat, Aug 24, 2019 at 6:53 PM Christoph Anton Mitterer
<calestyo@scientia.net> wrote:
>
> Hey.
>
> Anything new about the issue described here:
> https://www.spinics.net/lists/linux-btrfs/msg91046.html
>
> It was said that it might be a regression in 5.2 actually and not a
> hardware thing... so I just wonder whether I can safely move to 5.2?

So we definitely have a serious regression introduced on 5.2.
I sent out a fix for it yesterday:  https://patchwork.kernel.org/patch/11141559/

Two things can happen:

1) either a hang when committing a transaction, reported by several
users recently and I hit it myself twice too when running fstests (test
cases generic/475 and generic/561) after I upgraded my development
branch from a 5.1.x kernel to a 5.3-rcX kernel. If this happens there is
no risk of corruption, but the hang is still very inconvenient of course,
as you have to reboot.

2) writeback for some btree nodes may never be started and we end up
committing a transaction without noticing that. This is really serious
and that will lead to the "parent transid verify failed on ..."
messages.

Until the fix gets merged to 5.2 kernels (and 5.3), I don't really
recommend running 5.2 or 5.3.

>
>
> Cheers,
> Chris.
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12  7:50 ` Filipe Manana
@ 2019-09-12  8:24   ` James Harvey
  2019-09-12  9:06     ` Filipe Manana
                       ` (2 more replies)
  2019-09-12  8:48   ` Swâmi Petaramesh
  2019-09-12 13:09   ` Christoph Anton Mitterer
  2 siblings, 3 replies; 84+ messages in thread
From: James Harvey @ 2019-09-12  8:24 UTC (permalink / raw)
  To: fdmanana; +Cc: Christoph Anton Mitterer, linux-btrfs, David Sterba

On Thu, Sep 12, 2019 at 3:51 AM Filipe Manana <fdmanana@gmail.com> wrote:
> ...
>
> Until the fix gets merged to 5.2 kernels (and 5.3), I don't really
> recommend running 5.2 or 5.3.

What is your recommendation for distributions that have been shipping
5.2.x for quite some time, where a distro-wide downgrade to 5.1.x
isn't really an option that will be considered, especially because
many users aren't using BTRFS?  Can/should your patch be backported to
5.2.13/5.2.14?  Or, does it really need to be applied to 5.3rc or git
master?  Or, is it possibly not the right fix for the corruption risk,
and should a flashing neon sign be given to users to just run 5.1.x
even though the distribution repos have 5.2.x?

What is your recommendation for users who have been running 5.2.x and
running into a lot of hangs?  Would you say to apply your patch to a
custom-compiled kernel, or to downgrade to 5.1.x?


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12  7:50 ` Filipe Manana
  2019-09-12  8:24   ` James Harvey
@ 2019-09-12  8:48   ` Swâmi Petaramesh
  2019-09-12 13:09   ` Christoph Anton Mitterer
  2 siblings, 0 replies; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-09-12  8:48 UTC (permalink / raw)
  To: fdmanana, Christoph Anton Mitterer; +Cc: linux-btrfs, David Sterba

Hi Filipe,

On 9/12/19 9:50 AM, Filipe Manana wrote:
> So we definitely have a serious regression introduced on 5.2.
> I sent out a fix for it yesterday:  https://patchwork.kernel.org/patch/11141559/

Many thanks for having found and patched it.

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E



* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12  8:24   ` James Harvey
@ 2019-09-12  9:06     ` Filipe Manana
  2019-09-12  9:09     ` Holger Hoffstätte
  2019-09-12 10:53     ` Swâmi Petaramesh
  2 siblings, 0 replies; 84+ messages in thread
From: Filipe Manana @ 2019-09-12  9:06 UTC (permalink / raw)
  To: James Harvey; +Cc: Christoph Anton Mitterer, linux-btrfs, David Sterba

On Thu, Sep 12, 2019 at 9:24 AM James Harvey <jamespharvey20@gmail.com> wrote:
>
> On Thu, Sep 12, 2019 at 3:51 AM Filipe Manana <fdmanana@gmail.com> wrote:
> > ...
> >
> > Until the fix gets merged to 5.2 kernels (and 5.3), I don't really
> > recommend running 5.2 or 5.3.
>
> What is your recommendation for distributions that have been shipping
> 5.2.x for quite some time, where a distro-wide downgrade to 5.1.x
> isn't really an option that will be considered, especially because
> many users aren't using BTRFS?  Can/should your patch be backported to
> 5.2.13/5.2.14?

It's meant to be backported to 5.2.x and 5.3.x (probably not into 5.3
itself, since we are at rc8 and too close to the merge window for 5.4).

> Or, does it really need to be applied to 5.3rc or git
> master?  Or, is it possibly not the right fix for the corruption risk,
> and should a flashing neon sign be given to users to just run 5.1.x
> even though the distribution repos have 5.2.x?
>
> What is your recommendation for users who have been running 5.2.x and
> running into a lot of hangs?  Would you say to apply your patch to a
> custom-compiled kernel, or to downgrade to 5.1.x?

Sorry, I can't advise on that. That depends a lot on the distro and user needs.
Going back to 5.1 might be ok for some, but not for others due to
important fixes or new features/drivers in 5.2 for example.

It's really up to the distro and user to choose according to their needs.



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12  8:24   ` James Harvey
  2019-09-12  9:06     ` Filipe Manana
@ 2019-09-12  9:09     ` Holger Hoffstätte
  2019-09-12 10:53     ` Swâmi Petaramesh
  2 siblings, 0 replies; 84+ messages in thread
From: Holger Hoffstätte @ 2019-09-12  9:09 UTC (permalink / raw)
  To: James Harvey; +Cc: linux-btrfs

On 9/12/19 10:24 AM, James Harvey wrote:
> On Thu, Sep 12, 2019 at 3:51 AM Filipe Manana <fdmanana@gmail.com> wrote:
>> ...
>>
>> Until the fix gets merged to 5.2 kernels (and 5.3), I don't really
>> recommend running 5.2 or 5.3.
> 
> What is your recommendation for distributions that have been shipping
> 5.2.x for quite some time, where a distro-wide downgrade to 5.1.x
> isn't really an option that will be considered, especially because
> many users aren't using BTRFS?  Can/should your patch be backported to
> 5.2.13/5.2.14?  Or, does it really need to be applied to 5.3rc or git
> master?  Or, is it possibly not the right fix for the corruption risk,
> and should a flashing neon sign be given to users to just run 5.1.x
> even though the distribution repos have 5.2.x?

It applies and works just fine on 5.2.x; I have it running on .14.
If your distribution doesn't apply patches or just ships a random
release-of-the-month kernel, well.. ¯\_(ツ)_/¯

> What is your recommendation for users who have been running 5.2.x and
> running into a lot of hangs?  Would you say to apply your patch to a
> custom-compiled kernel, or to downgrade to 5.1.x?

5.1.x is EOL upstream and you might be missing other critical things
like security fixes. Considering how easy it is to build a custom kernel
from an existing configuration, the former.

-h


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12  8:24   ` James Harvey
  2019-09-12  9:06     ` Filipe Manana
  2019-09-12  9:09     ` Holger Hoffstätte
@ 2019-09-12 10:53     ` Swâmi Petaramesh
  2019-09-12 12:58       ` Christoph Anton Mitterer
  2 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-09-12 10:53 UTC (permalink / raw)
  To: James Harvey, fdmanana
  Cc: Christoph Anton Mitterer, linux-btrfs, David Sterba

On 12/09/2019 at 10:24, James Harvey wrote:
> and should a flashing neon sign be given to users to just run 5.1.x
> even though the distribution repos have 5.2.x?
Yep, I assume that a big flashing red neon sign should be raised for a 
confirmed bug that can trash your filesystem into ashes, and actually 
did so for two of mine...

ॐ
-- 
Swâmi Petaramesh <swami@petaramesh.org> OpenPGP ID 0x1BFFD850



* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 10:53     ` Swâmi Petaramesh
@ 2019-09-12 12:58       ` Christoph Anton Mitterer
  2019-10-14  4:00         ` Nicholas D Steeves
  0 siblings, 1 reply; 84+ messages in thread
From: Christoph Anton Mitterer @ 2019-09-12 12:58 UTC (permalink / raw)
  To: Swâmi Petaramesh; +Cc: linux-btrfs

On Thu, 2019-09-12 at 12:53 +0200, Swâmi Petaramesh wrote:
> Yep, I assume that a big flashing red neon sign should be raised for
> a confirmed bug that can trash your filesystem into ashes, and
> actually did so for two of mine...

I doubt this will happen... I've asked for something like this to be
set up on the last corruption bugs but there seems to be little
interest for a warning system for users.


Cheers,
Chris.



* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12  7:50 ` Filipe Manana
  2019-09-12  8:24   ` James Harvey
  2019-09-12  8:48   ` Swâmi Petaramesh
@ 2019-09-12 13:09   ` Christoph Anton Mitterer
  2019-09-12 14:28     ` Filipe Manana
  2 siblings, 1 reply; 84+ messages in thread
From: Christoph Anton Mitterer @ 2019-09-12 13:09 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

Hi.

First, thanks for finding&fixing this :-)


On Thu, 2019-09-12 at 08:50 +0100, Filipe Manana wrote:
> 1) either a hang when committing a transaction, reported by several
> users recently and I hit it myself twice too when running fstests (test
> cases generic/475 and generic/561) after I upgraded my development
> branch from a 5.1.x kernel to a 5.3-rcX kernel. If this happens there is
> no risk of corruption, but the hang is still very inconvenient of course,
> as you have to reboot.

Okay inconvenient, but not so bad if there is no corruption risk.


> 2) writeback for some btree nodes may never be started and we end up
> committing a transaction without noticing that. This is really serious
> and that will lead to the "parent transid verify failed on ..."
> messages.

As some people have already pointed out, it will be infeasible for many
end users to downgrade (no security updates) or manually patch (well,
end-users).

Can you elaborate on the circumstances under which this problem occurs,
whether there are any intermediate workarounds, and whether it's always
noticed (i.e. no silent corruption)?


Thanks,
Chris.



* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 13:09   ` Christoph Anton Mitterer
@ 2019-09-12 14:28     ` Filipe Manana
  2019-09-12 14:39       ` Christoph Anton Mitterer
  2019-09-13 18:50       ` Pete
  0 siblings, 2 replies; 84+ messages in thread
From: Filipe Manana @ 2019-09-12 14:28 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: linux-btrfs

On Thu, Sep 12, 2019 at 2:09 PM Christoph Anton Mitterer
<calestyo@scientia.net> wrote:
>
> Hi.
>
> First, thanks for finding&fixing this :-)
>
>
> On Thu, 2019-09-12 at 08:50 +0100, Filipe Manana wrote:
> > 1) either a hang when committing a transaction, reported by several
> > users recently and I hit it myself twice too when running fstests (test
> > cases generic/475 and generic/561) after I upgraded my development
> > branch from a 5.1.x kernel to a 5.3-rcX kernel. If this happens there is
> > no risk of corruption, but the hang is still very inconvenient of course,
> > as you have to reboot.
>
> Okay inconvenient, but not so bad if there is no corruption risk.
>
>
> > 2) writeback for some btree nodes may never be started and we end up
> > committing a transaction without noticing that. This is really serious
> > and that will lead to the "parent transid verify failed on ..."
> > messages.
>
> As some people have already pointed out, it will be infeasible for many
> end users to downgrade (no security updates) or manually patch (well,
> end-users).

Yes, but I can't do anything about that. I'm not skilled enough to build a
time machine to go back in time :)

>
> Can you elaborate on the circumstances under which this problem occurs,
> whether there are any intermediate workarounds, and whether it's always
> noticed (i.e. no silent corruption)?

It can happen whenever a transaction is being committed (or the fsync
log is being committed).
Every fs is at risk, unless it's always mounted read-only and with
-o nologreplay.
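As a rough illustration of that rule, here is a hypothetical Python helper that lists btrfs mounts in /proc/mounts-style text that are not mounted both `ro` and `nologreplay`. It only inspects the current mount options, so it cannot tell whether a filesystem was *always* mounted that way, and it assumes `nologreplay` shows up among the listed options.

```python
def log_replay_free_ro(options):
    """True if the option string contains both 'ro' and 'nologreplay'."""
    opts = set(options.split(","))
    return "ro" in opts and "nologreplay" in opts


def btrfs_mounts_at_risk(mounts_text):
    """Return btrfs mount points that are not mounted ro with nologreplay."""
    at_risk = []
    for line in mounts_text.splitlines():
        fields = line.split()  # device, mount point, fs type, options, ...
        if len(fields) >= 4 and fields[2] == "btrfs" and not log_replay_free_ro(fields[3]):
            at_risk.append(fields[1])
    return at_risk


if __name__ == "__main__":
    sample = (
        "/dev/sda2 / btrfs rw,space_cache=v2 0 0\n"
        "/dev/sdb1 /archive btrfs ro,nologreplay,space_cache 0 0\n"
        "/dev/sdc1 /media btrfs ro,space_cache 0 0\n"
    )
    print(btrfs_mounts_at_risk(sample))  # ['/', '/media']
```

Note that a plain `ro` mount still counts as at risk here, since without nologreplay the fsync log can be replayed (and written) at mount time.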

When a btree node/leaf (extent buffer) is dirty in memory, it needs to be
written to disk. This always happens at transaction commit time,
but can also happen before that, if for some reason writeback on the
btree inode happens (due to reclaim, system under memory pressure,
etc).

If the writeback happens only at transaction commit time, and if
one of the node's pages is locked (not necessarily by btrfs; it
can happen anywhere in the memory management subsystem, page
migration for example), we end up skipping the writeback (starting
the process of writing what's in memory to disk) of that
node. This is case 2), the corruption with the error messages
"parent transid verify failed ..." in dmesg/syslog after mounting the
filesystem again.
This is very likely (as we can never rule out other bugs, be it in
btrfs or some other layer, or even hardware/firmware) what
Swâmi ran into, since he never had problems with 5.1 and older kernel
versions and has been using the same hardware for a long time.

Case 1), the hang, happens if writeback was also attempted before the
transaction commit. At transaction commit we trigger
writeback again for the same node(s), and here we hang because of the
previous attempt.
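The two failure modes can be illustrated with a toy Python model. This is not the kernel's code: the flag names, and the assumption that the flags are updated before the page lock is known to succeed and are not restored on bail-out, are simplifications chosen to reproduce the two symptoms described above (a node marked clean but never written, and a later writeback attempt that waits forever).

```python
from dataclasses import dataclass, field


class TransactionHang(Exception):
    """Stands in for a task that would wait forever on a flag nobody clears."""


@dataclass
class ExtentBuffer:
    """Toy btree node: its in-memory generation plus two state flags."""
    name: str
    generation: int
    dirty: bool = True
    writeback: bool = False


@dataclass
class Disk:
    blocks: dict = field(default_factory=dict)  # node name -> generation on disk


def try_write(eb, disk, page_locked_elsewhere):
    """One (buggy) writeback attempt; returns True if the node reached the disk."""
    if eb.writeback:
        # Case 1: an earlier attempt left the flag set, so waiting on it never ends.
        raise TransactionHang(f"waiting on writeback of {eb.name}")
    # Assumed bug: the flags change before we know the page can be locked...
    eb.dirty = False
    eb.writeback = True
    if page_locked_elsewhere:
        # ...and this early return neither writes the node nor restores the flags.
        return False
    disk.blocks[eb.name] = eb.generation
    eb.writeback = False
    return True


def commit(nodes, disk, locked):
    """Write out pending nodes, then return the generations the parents now expect."""
    for eb in nodes:
        if eb.dirty or eb.writeback:
            # The return value is ignored: a skipped node goes unnoticed.
            try_write(eb, disk, page_locked_elsewhere=eb.name in locked)
    return {eb.name: eb.generation for eb in nodes}


def verify(expected, disk):
    """Mount-time style check: flag nodes whose on-disk generation is not the expected one."""
    return [
        f"parent transid verify failed on {name} wanted {gen} found {disk.blocks.get(name)}"
        for name, gen in expected.items()
        if disk.blocks.get(name) != gen
    ]


if __name__ == "__main__":
    # Case 2: writeback happens only at commit, and leaf-B's page is locked elsewhere.
    disk = Disk({"leaf-A": 7, "leaf-B": 7})
    nodes = [ExtentBuffer("leaf-A", 8), ExtentBuffer("leaf-B", 8)]
    expected = commit(nodes, disk, locked={"leaf-B"})
    print(verify(expected, disk))
    # ['parent transid verify failed on leaf-B wanted 8 found 7']

    # Case 1: an earlier writeback (e.g. under memory pressure) hit a locked page.
    disk = Disk({"leaf-C": 7})
    leaf_c = ExtentBuffer("leaf-C", 8)
    try_write(leaf_c, disk, page_locked_elsewhere=True)
    try:
        commit([leaf_c], disk, locked=set())
    except TransactionHang as exc:
        print("hang:", exc)  # hang: waiting on writeback of leaf-C
```

In this model the same flaw yields both symptoms; which one shows up depends only on whether the failed attempt happens at commit time or earlier.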
Two people reported the hang yesterday here on the list, plus at least
one more some weeks ago.
I hit it myself once last week and once 2 evenings ago with test cases
from fstests after changing my development branch from 5.1 to 5.3-rcX.

To hit any of the problems, sure, you still need to have some bad
luck, but it's impossible to tell how likely you are to run into it.
It depends on so many things, from workloads to system configuration, etc.
No matter how likely (and how likely will not be the same for
everyone), it's serious because if it happens you can get a corrupt
filesystem.

>
>
> Thanks,
> Chris.
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 14:28     ` Filipe Manana
@ 2019-09-12 14:39       ` Christoph Anton Mitterer
  2019-09-12 14:57         ` Swâmi Petaramesh
  2019-09-13 18:50       ` Pete
  1 sibling, 1 reply; 84+ messages in thread
From: Christoph Anton Mitterer @ 2019-09-12 14:39 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Thu, 2019-09-12 at 15:28 +0100, Filipe Manana wrote:
> This is case 2), the corruption with the error messages
> "parent transid verify failed ..." in dmesg/syslog after mounting the
> filesystem again.

Hmm so "at least" it will never go unnoticed, right?

This is IMO a pretty important piece of advice, as people may want to compare
their current data with that of backups... if silent corruption had
been possible.
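One way to look for this symptom in saved logs is a hypothetical Python sketch that pulls the block address and the wanted/found generations out of such kernel messages. The message wording is assumed to be "parent transid verify failed on <bytenr> wanted <gen> found <gen>"; a found generation older than the wanted one means the on-disk node is stale, which fits a skipped write but could also come from lost writes lower in the stack.

```python
import re

# Assumed wording of the kernel message (without the "BTRFS error (device X): " prefix).
TRANSID_RE = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def transid_failures(log_text):
    """Return unique (bytenr, wanted, found) tuples in order of first appearance."""
    failures = []
    for match in TRANSID_RE.finditer(log_text):
        entry = (int(match["bytenr"]), int(match["wanted"]), int(match["found"]))
        if entry not in failures:  # the kernel may repeat the same message
            failures.append(entry)
    return failures


if __name__ == "__main__":
    sample = (
        "BTRFS error (device sda2): parent transid verify failed on 30408704 wanted 3214 found 3209\n"
        "BTRFS error (device sda2): parent transid verify failed on 30408704 wanted 3214 found 3209\n"
        "BTRFS info (device sda2): disk space caching is enabled\n"
    )
    for bytenr, wanted, found in transid_failures(sample):
        # found < wanted: the copy on disk is older than what its parent points to
        print(f"block {bytenr}: expected generation {wanted}, disk has {found}")
    # block 30408704: expected generation 3214, disk has 3209
```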


Cheers,
Chris.



* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 14:39       ` Christoph Anton Mitterer
@ 2019-09-12 14:57         ` Swâmi Petaramesh
  2019-09-12 16:21           ` Zdenek Kaspar
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-09-12 14:57 UTC (permalink / raw)
  To: Christoph Anton Mitterer, fdmanana; +Cc: linux-btrfs

On 9/12/19 4:39 PM, Christoph Anton Mitterer wrote:
> On Thu, 2019-09-12 at 15:28 +0100, Filipe Manana wrote:
>> This is case 2), the corruption with the error messages
>> "parent transid verify failed ..." in dmesg/syslog after mounting the
>> filesystem again.
> Hmm so "at least" it will never go unnoticed, right?
>
> This is IMO a pretty important piece of advice, as people may want to compare
> their current data with that of backups... if silent corruption had
> been possible.
>
>
> Cheers,
> Chris.

From my own experience hitting this bug, it definitely doesn't go
unnoticed. I haven't experienced the system hang, which obviously
cannot go unnoticed, but I did hit the "parent transid" failure.

The most annoying consequence is that the filesystem is then beyond
repair and needs to be completely recreated. In my case I could still
back up most of my files from the damaged FS, even though a few were lost
or damaged: files that were open or had been recently changed, while
old files were still there and healthy.

So for me most of the hassle was to first recreate my FS with all its
complexity (subvols, snapshots...) and restore from a backup made just
before, then check for missing files in another (luckily very recent)
backup and try to fix what got broken.

I have to say that I re-upgraded said machine to the latest Arch 5.2
kernel a couple of weeks ago, and a couple of other Manjaro machines to
the latest 5.2 kernel as well, and haven't been hit by the bug since.

However, having read that the bug has been diagnosed, confirmed and fixed by
Filipe, I am seriously considering downgrading my kernel back to 5.1 on the 2
Manjaro machines, as it is rather straightforward, and maybe my Arch as
well... until I'm sure that the fix has made it into said distro kernels.

Fortunately, other common, less “bleeding edge” distros that I use, such
as Debian stable or Mint/Ubuntu, still ship a kernel which is older
than 5.1, and I will stay away from 5.2 backports...

I'm quite concerned, however, that the FS on which I store all of my most
precious data, and which I considered “the safest native Linux FS available”,
can still suffer regressions that can plainly trash it to ruins.

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E




* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 14:57         ` Swâmi Petaramesh
@ 2019-09-12 16:21           ` Zdenek Kaspar
  2019-09-12 18:52             ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Zdenek Kaspar @ 2019-09-12 16:21 UTC (permalink / raw)
  To: Swâmi Petaramesh, Christoph Anton Mitterer, fdmanana; +Cc: linux-btrfs

On 9/12/19 4:57 PM, Swâmi Petaramesh wrote:

> However, having read that the bug has been diagnosed, confirmed and fixed by
> Filipe, I am seriously considering downgrading my kernel back to 5.1 on the 2
> Manjaro machines, as it is rather straightforward, and maybe my Arch as
> well... until I'm sure that the fix has made it into said distro kernels.

It's included in [testing] right now...

https://git.archlinux.org/linux.git/log/?h=v5.2.14-arch2

Z.


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 16:21           ` Zdenek Kaspar
@ 2019-09-12 18:52             ` Swâmi Petaramesh
  0 siblings, 0 replies; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-09-12 18:52 UTC (permalink / raw)
  To: Zdenek Kaspar, Christoph Anton Mitterer, fdmanana; +Cc: linux-btrfs

On 12/09/2019 at 18:21, Zdenek Kaspar wrote:
> On 9/12/19 4:57 PM, Swâmi Petaramesh wrote:
>
>> However, having read that the bug has been diagnosed, confirmed and fixed by
>> Filipe, I am seriously considering downgrading my kernel back to 5.1 on the 2
>> Manjaro machines, as it is rather straightforward, and maybe my Arch as
>> well... until I'm sure that the fix has made it into said distro kernels.
>
> It's included in [testing] right now...
>
> https://git.archlinux.org/linux.git/log/?h=v5.2.14-arch2
> Z.


:)


ॐ
-- 
Swâmi Petaramesh <swami@petaramesh.org> OpenPGP ID 0x1BFFD850



* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 14:28     ` Filipe Manana
  2019-09-12 14:39       ` Christoph Anton Mitterer
@ 2019-09-13 18:50       ` Pete
       [not found]         ` <CACzgC9gvhGwyQAKm5J1smZZjim-ecEix62ZQCY-wwJYVzMmJ3Q@mail.gmail.com>
  1 sibling, 1 reply; 84+ messages in thread
From: Pete @ 2019-09-13 18:50 UTC (permalink / raw)
  To: fdmanana, Christoph Anton Mitterer; +Cc: linux-btrfs

On 9/12/19 3:28 PM, Filipe Manana wrote:

>>> 2) writeback for some btree nodes may never be started and we end up
>>> committing a transaction without noticing that. This is really serious
>>> and that will lead to the "parent transid verify failed on ..."
>>> messages.

> Two people reported the hang yesterday here on the list, plus at least
> one more some weeks ago.

This was one of the messages I got when I reported an issue in the
thread 'Chasing IO errors', which occurred in mid to late August.


> I hit it myself once last week and once 2 evenings ago with test cases
> from fstests after changing my development branch from 5.1 to 5.3-rcX.
> 
> To hit any of the problems, sure, you still need to have some bad
> luck, but it's impossible to tell how likely you are to run into it.
> It depends on so many things, from workloads to system configuration, etc.
> No matter how likely (and how likely will not be the same for
> everyone), it's serious because if it happens you can get a corrupt
> filesystem.

I can't help you with any specific workloads causing it.  I just
noticed that my fs went read-only, that is all.




* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
       [not found]         ` <CACzgC9gvhGwyQAKm5J1smZZjim-ecEix62ZQCY-wwJYVzMmJ3Q@mail.gmail.com>
@ 2019-10-14  2:07           ` Adam Bahe
  2019-10-14  2:19             ` Qu Wenruo
  2019-10-14 17:54             ` Chris Murphy
  0 siblings, 2 replies; 84+ messages in thread
From: Adam Bahe @ 2019-10-14  2:07 UTC (permalink / raw)
  To: linux-btrfs

> Until the fix gets merged to 5.2 kernels (and 5.3), I don't really recommend running 5.2 or 5.3.

I know fixes went into distro-specific kernels, but I wanted to verify
whether the fix went into the vanilla kernel.org kernel. If so, which
version should be safe? E.g.:
https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.6

With 180 raw TB in raid1 I just want to be explicit. Thanks!


On Sun, Oct 13, 2019 at 9:01 PM Adam Bahe <adambahe@gmail.com> wrote:
>
> > Until the fix gets merged to 5.2 kernels (and 5.3), I don't really recommend running 5.2 or 5.3.
>
> I know fixes went into distro-specific kernels, but I wanted to verify whether the fix went into the vanilla kernel.org kernel. If so, which version should be safe?
>
> ex: https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.6
>
> With 180 raw TB in raid1 I just want to be explicit. Thanks!
>
> On Fri, Sep 13, 2019 at 11:16 PM Pete <pete@petezilla.co.uk> wrote:
>>
>> On 9/12/19 3:28 PM, Filipe Manana wrote:
>>
>> >>> 2) writeback for some btree nodes may never be started and we end up
>> >>> committing a transaction without noticing that. This is really
>> >>> serious
>> >>> and that will lead to the "parent transid verify failed on ..."
>> >>> messages.
>>
>> > Two people reported the hang yesterday here on the list, plus at least
>> > one more some weeks ago.
>>
>> This was one of my messages that I got when I reported an issue in the
>> thread 'Chasing IO errors' which occurred in mid to late August.
>>
>>
>> > I hit it myself once last week and once 2 evenings ago with test cases
>> > from fstests after changing my development branch from 5.1 to 5.3-rcX.
>> >
>> > To hit any of the problems, sure, you still need to have some bad
>> > luck, but it's impossible to tell how likely to run into it.
>> > It depends on so many things, from workloads, system configuration, etc.
>> > No matter how likely (and how likely will not be the same for
>> > everyone), it's serious because if it happens you can get a corrupt
>> > filesystem.
>>
>> I can't help you with any specifics workloads causing it.  I just
>> notices that my fs went read only, that is all.
>>
>>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-10-14  2:07           ` Adam Bahe
@ 2019-10-14  2:19             ` Qu Wenruo
  2019-10-14 17:54             ` Chris Murphy
  1 sibling, 0 replies; 84+ messages in thread
From: Qu Wenruo @ 2019-10-14  2:19 UTC (permalink / raw)
  To: Adam Bahe, linux-btrfs

[-- Attachment #1.1: Type: text/plain, Size: 2388 bytes --]



On 2019/10/14 10:07 AM, Adam Bahe wrote:
>> Until the fix gets merged to 5.2 kernels (and 5.3), I don't really recommend running 5.2 or 5.3.
> 
> I know fixes went in to distro specific kernels. But wanted to verify
> if the fix went into the vanilla kernel.org kernel? If so, what
> version should be safe? ex:
> https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.6
> 
> With 180 raw TB in raid1 I just want to be explicit. Thanks!

v5.2.15 and newer.
v5.3.0 and newer.

Kernels before v5.2 are not affected.
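To make the cut-off above concrete, here is a minimal Python sketch that classifies a kernel release string against it. It assumes vanilla upstream numbering only. It cannot judge distro kernels that backported the fix, or 5.3 release candidates (Filipe hit the problem on 5.3-rcX). The function names are hypothetical.

```python
import re

def parse_version(release: str) -> tuple:
    """Extract (major, minor, patch) from strings such as '5.2.15' or
    '5.2.1-arch1-1'; a missing patch level counts as 0."""
    m = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)

def has_writeback_bug(release: str) -> bool:
    """True for vanilla 5.2.0 through 5.2.14, per the versions listed above.
    Pre-5.2 kernels are unaffected; 5.2.15+ and 5.3.x carry the fix."""
    major, minor, patch = parse_version(release)
    return (major, minor) == (5, 2) and patch < 15

for r in ["5.1.16-arch1-1-ARCH", "5.2.1-arch1-1", "5.2.15", "5.3.6"]:
    print(r, "affected" if has_writeback_bug(r) else "not affected")
```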

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 12:58       ` Christoph Anton Mitterer
@ 2019-10-14  4:00         ` Nicholas D Steeves
  0 siblings, 0 replies; 84+ messages in thread
From: Nicholas D Steeves @ 2019-10-14  4:00 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Swâmi Petaramesh; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1540 bytes --]

Hi Christoph and Swâmi,

Christoph Anton Mitterer <calestyo@scientia.net> writes:

> On Thu, 2019-09-12 at 12:53 +0200, Swâmi Petaramesh wrote:
>> Yep, I assume that a big flashing red neon sign should be raised for
>> a 
>> confirmed bug that can trash your filesystem into ashes, and
>> actually 
>> did so for two of mine...
>
> I doubt this will happen... I've asked for something like this to be
> set up on the last corruption bugs but there seems to be little
> interest for a warning system for users.

I used to track such bugs on the Debian wiki page for Btrfs...but users
of Debian and derivatives continued to track
sid/testing/stable-backports kernel, which made me feel like that work
was a waste of time.  Now that page has a warning that reads something
along the lines of "sid/testing/backports kernels periodically have
grave dataloss bugs.  Please track the most recent upstream LTS kernel
if a kernel newer than 4.19.x is required.  That said, upstream
appreciates bug reports using the most recent kernel available to you".

If you'd like to maintain a section at the top of that page that tracks
this type of issue, please go ahead.  I'd rather work on getting boot
environments working properly, then making them easy to use, then
enabling staged upgrades in a rw snapshot before rotating that snapshot
onto the rootfs.

P.S. Do you want to co-found a BTRFS integration team in Debian?  We're
still quite a ways behind SUSE, and even Fedora is ahead of us now!

Regards,
Nicholas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-10-14  2:07           ` Adam Bahe
  2019-10-14  2:19             ` Qu Wenruo
@ 2019-10-14 17:54             ` Chris Murphy
  1 sibling, 0 replies; 84+ messages in thread
From: Chris Murphy @ 2019-10-14 17:54 UTC (permalink / raw)
  To: Adam Bahe; +Cc: Btrfs BTRFS

On Sun, Oct 13, 2019 at 8:07 PM Adam Bahe <adambahe@gmail.com> wrote:
>
> > Until the fix gets merged to 5.2 kernels (and 5.3), I don't really recommend running 5.2 or 5.3.
>
> I know fixes went in to distro specific kernels. But wanted to verify
> if the fix went into the vanilla kernel.org kernel? If so, what
> version should be safe? ex:
> https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.6
>
> With 180 raw TB in raid1 I just want to be explicit. Thanks!

It's fixed upstream stable since 5.2.15, and includes all 5.3.x series.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-08  9:55                           ` Swâmi Petaramesh
@ 2019-08-08 10:12                             ` Qu Wenruo
  0 siblings, 0 replies; 84+ messages in thread
From: Qu Wenruo @ 2019-08-08 10:12 UTC (permalink / raw)
  To: Swâmi Petaramesh, Anand Jain; +Cc: Lionel Bouton, linux-btrfs



On 2019/8/8 5:55 PM, Swâmi Petaramesh wrote:
> Hi Qu,
>
> On 8/8/19 10:46 AM, Qu Wenruo wrote:
>> Follow up questions about the corruption.
>>
>> Is there enough free space (not only unallocated, but allocated bg) for
>> metadata?
>>
>> As further digging into the case, it looks like btrfs is even harder to
>> get corrupted for tree blocks.
>>
>> If we have enough metadata free space, we will try to allocate tree
>> blocks at bytenr sequence, without reusing old bytenr until there is not
>> enough space or hit the end of the block group.
>>
>> This means, even we have something wrong implementing barrier, we still
>> won't write new data to old tree blocks (even several trans ago).
>
>
> It's kind of hard for me to say if the 2 filesystems that got corrupt
> lacked allocated metadata space at any time, and now both filesystems
> have been reformatted, so I cannot tell.
>
> What I can be 100% sure is that I never got any “No space left on
> device” ENOSPC on any of them.

You don't need to have hit ENOSPC. Determining this needs extra information
about metadata block group usage, so I didn't expect it to be easy to get.
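The "allocated but unused" metadata space can be read from `btrfs filesystem df`. The Python sketch below assumes the usual one-line-per-type output, for example `Metadata, DUP: total=1.00GiB, used=768.00MiB`. The sample numbers are invented, the function names are hypothetical, and the thread does not say how much slack counts as "enough".

```python
import re

# Binary unit suffixes as printed by btrfs-progs; "" covers raw byte output.
UNITS = {"": 1, "B": 1, "KiB": 1024, "MiB": 1024**2,
         "GiB": 1024**3, "TiB": 1024**4, "PiB": 1024**5}
LINE = re.compile(r"^(\w+), (\w+): total=([\d.]+)([A-Za-z]*), "
                  r"used=([\d.]+)([A-Za-z]*)$")

def to_bytes(value: str, unit: str) -> int:
    return int(float(value) * UNITS[unit])

def metadata_slack(df_output: str) -> int:
    """Bytes allocated to Metadata block groups but not yet used."""
    for line in df_output.splitlines():
        m = LINE.match(line.strip())
        if m and m.group(1) == "Metadata":
            total = to_bytes(m.group(3), m.group(4))
            used = to_bytes(m.group(5), m.group(6))
            return total - used
    raise ValueError("no Metadata line found")

# Invented sample in the assumed output format.
SAMPLE = """Data, single: total=8.00GiB, used=6.21GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=768.00MiB
GlobalReserve, single: total=16.00MiB, used=0.00B"""

print(metadata_slack(SAMPLE) // 1024**2, "MiB free in allocated metadata")
```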

>
> *BUT* the SSD on which the machine runs may have run close to full as I
> had copied a bunch of ISOs on it shortly before upgrading packages - and
> kernel.
>
> However the upgrade went seemingly good and I didn't see no ENOSPC at
> any time.
>
>
> On the external HD that went corrupt as well, I'm pretty sure it
> happened as follows :
>
> - I started a full backup onto it in an emergency ;
>
> - I asked myself « Will I have enough space » and checked with “df”.
>
> - There were still several dozens of GBs free but not enough for a full
> system backup. I cannot tell if these had been allocated or not in the past.
>
> - Noticing that I would miss HD space (but far before it actually
> happened) I deleted a high number of snapshots from the HD.
>
> - I thus assume that the deletion of snapshots would have freed a good
> amount of data AND metadata space.
>
> So the situation of the external HD was that a full backup was in
> progress and a vast number of snapshots have been deleted meanwhile.
>
> After that the FS got corrupt at some point.
>
>
> For the internal SSD, it looks like the kernel upgrade went good and the
> machine rebooted OK, then midnight came and with it probably the cron
> task that performs “snapper” timeline snapshots deletion.
>
>
> Then the machine was turned off and rebooted next day, and by that time
> the FS was corrupt.
>
>
> So I strongly suspect the issue has something to do with snapshots
> deletion, but I cannot tell more.

I have also been working on this in recent days but haven't found any
clue yet. (In fact, I just found that btrfs is harder to corrupt when
there is enough metadata space.)
But I will definitely keep digging.

>
>
> It may be worth noticing that the machine has been running a lot since I
> reverted back to kernel 5.1 and reformatted the filesystems, and that no
> corruption has occurred since, even though I performed quite a lot of
> backups on the external HD after it has been reformatted.
>
> Everything is in the exact same setup as before, except for the kernel.
>
> So I would definitely exclude an hardware problem on the machine : it's
> now running fine as it ever did.
>
> I plan to retry upgrading to Arch kernel 5.2 in the coming weeks after
> having performed a full disk binary clone in case it happens again.
>
> (However I've seen that Arch has released 3-4 kernel 5.2 package updates
> since, so it won't be the exact same kernel by the time I test again).

No problem. Not that many fixes get backported, and none of them are really
high priority, so I'd say it won't make much difference.

>
> I will be on vacation until August, 20, so I cannot perform this test
> before I'm back.
>
> But I'll be glad to help if I can and thank you very much for your help
> with this issue.

My pleasure, if we could finally pin down the cause, it would be a great
improvement for btrfs.

Thanks,
Qu
>
> Best regards.
>
> ॐ
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-08  8:46                         ` Qu Wenruo
@ 2019-08-08  9:55                           ` Swâmi Petaramesh
  2019-08-08 10:12                             ` Qu Wenruo
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-08  9:55 UTC (permalink / raw)
  To: Qu Wenruo, Anand Jain; +Cc: Lionel Bouton, linux-btrfs

Hi Qu,

On 8/8/19 10:46 AM, Qu Wenruo wrote:
> Follow up questions about the corruption.
>
> Is there enough free space (not only unallocated, but allocated bg) for
> metadata?
>
> As further digging into the case, it looks like btrfs is even harder to
> get corrupted for tree blocks.
>
> If we have enough metadata free space, we will try to allocate tree
> blocks at bytenr sequence, without reusing old bytenr until there is not
> enough space or hit the end of the block group.
>
> This means, even we have something wrong implementing barrier, we still
> won't write new data to old tree blocks (even several trans ago).


It's kind of hard for me to say if the 2 filesystems that got corrupt
lacked allocated metadata space at any time, and now both filesystems
have been reformatted, so I cannot tell.

What I can be 100% sure of is that I never got any “No space left on
device” ENOSPC on any of them.

*BUT* the SSD on which the machine runs may have run close to full as I
had copied a bunch of ISOs on it shortly before upgrading packages - and
kernel.

However, the upgrade seemingly went well and I didn't see any ENOSPC at
any time.


On the external HD that went corrupt as well, I'm pretty sure it
happened as follows :

- I started a full backup onto it in an emergency ;

- I asked myself « Will I have enough space » and checked with “df”.

- There were still several dozens of GBs free but not enough for a full
system backup. I cannot tell if these had been allocated or not in the past.

- Noticing that I would run out of HD space (well before it actually
happened), I deleted a large number of snapshots from the HD.

- I thus assume that the deletion of snapshots would have freed a good
amount of data AND metadata space.

So the situation of the external HD was that a full backup was in
progress and a large number of snapshots had been deleted in the meantime.

After that the FS got corrupt at some point.


For the internal SSD, it looks like the kernel upgrade went well and the
machine rebooted OK, then midnight came and with it probably the cron
task that performs “snapper” timeline snapshots deletion.


Then the machine was turned off and rebooted next day, and by that time
the FS was corrupt.


So I strongly suspect the issue has something to do with snapshots
deletion, but I cannot tell more.


It may be worth noticing that the machine has been running a lot since I
reverted back to kernel 5.1 and reformatted the filesystems, and that no
corruption has occurred since, even though I performed quite a lot of
backups on the external HD after it has been reformatted.

Everything is in the exact same setup as before, except for the kernel.

So I would definitely rule out a hardware problem on the machine : it's
now running as fine as it ever did.

I plan to retry upgrading to Arch kernel 5.2 in the coming weeks after
having performed a full disk binary clone in case it happens again.

(However I've seen that Arch has released 3-4 kernel 5.2 package updates
since, so it won't be the exact same kernel by the time I test again).

I will be on vacation until August 20, so I cannot perform this test
before I'm back.

But I'll be glad to help if I can and thank you very much for your help
with this issue.

Best regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-01 18:56                       ` Swâmi Petaramesh
@ 2019-08-08  8:46                         ` Qu Wenruo
  2019-08-08  9:55                           ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Qu Wenruo @ 2019-08-08  8:46 UTC (permalink / raw)
  To: Swâmi Petaramesh, Anand Jain; +Cc: Lionel Bouton, linux-btrfs



On 2019/8/2 2:56 AM, Swâmi Petaramesh wrote:
> On 01/08/2019 at 15:46, Anand Jain wrote:
>>  Swami, Do you have the kernel logs around this time frame?
>
> No, it really got lost.
>
> ॐ
>
Follow up questions about the corruption.

Is there enough free space (not only unallocated, but allocated bg) for
metadata?

Digging further into the case, it looks like btrfs tree blocks are even
harder to corrupt.

If we have enough free metadata space, we try to allocate tree blocks
in increasing bytenr order, without reusing old bytenrs until there is
not enough space or we hit the end of the block group.

This means that even if something is wrong in the barrier implementation,
we still won't write new data over old tree blocks (even ones from
several transactions ago).
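Here is a toy Python model of the allocation policy described above. It is not the btrfs extent allocator, which also handles free-space caching, clustering and many block groups. It only illustrates why a freed tree block location is not rewritten while unused space remains at the end of the block group. The class name and the 16 KiB default node size are assumptions for illustration.

```python
class ToyBlockGroup:
    """Hypothetical metadata block group: new tree blocks are handed out at
    increasing byte offsets; freed offsets are reused only once the cursor
    has reached the end of the group."""

    def __init__(self, start: int, length: int, nodesize: int = 16384):
        self.end = start + length
        self.nodesize = nodesize
        self.cursor = start   # next never-used bytenr
        self.freed = []       # bytenrs released by earlier transactions

    def free(self, bytenr: int) -> None:
        self.freed.append(bytenr)

    def alloc(self) -> int:
        if self.cursor + self.nodesize <= self.end:
            bytenr = self.cursor          # prefer fresh space
            self.cursor += self.nodesize
            return bytenr
        if self.freed:
            return self.freed.pop(0)      # only now is an old location reused
        raise RuntimeError("no space left in this block group")

bg = ToyBlockGroup(start=0, length=4 * 16384)
old = bg.alloc()      # tree block written in an earlier transaction
bg.free(old)          # later freed, e.g. after being COWed elsewhere
print([bg.alloc() for _ in range(4)])  # prints [16384, 32768, 49152, 0]
```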

Thanks,
Qu

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-01 13:46                     ` Anand Jain
@ 2019-08-01 18:56                       ` Swâmi Petaramesh
  2019-08-08  8:46                         ` Qu Wenruo
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-01 18:56 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo; +Cc: Lionel Bouton, linux-btrfs

On 01/08/2019 at 15:46, Anand Jain wrote:
>  Swami, Do you have the kernel logs around this time frame?

No, it really got lost.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-01  8:43                   ` Qu Wenruo
@ 2019-08-01 13:46                     ` Anand Jain
  2019-08-01 18:56                       ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Anand Jain @ 2019-08-01 13:46 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Swâmi Petaramesh, Lionel Bouton, linux-btrfs



> On 1 Aug 2019, at 4:43 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> 
> 
> 
> On 2019/8/1 4:07 PM, Swâmi Petaramesh wrote:
>> On 8/1/19 8:36 AM, Qu Wenruo wrote:
>>> Could you give more detailed history, including each reboot?
>>> Like:
>>> 
>>> CASE 1
>>> # Upgrade kernel (running 5.1)
>>> # Reboot
>>> # Kernel mount failure (running 5.2)
>> 
>> No, it never was a “kernel mount failure”, it was more of :
>> 
>> - Running 5.1 OK
>> 
>> - Upgrade to 5.2
>> 
>> - Reboot without noticing problem on kernel 5.2.1-arch1-1
>> 
>> - Performed usual remote rsync backup using kernel 5.2.1-arch1-1 WITHOUT
>> any error at 23:20 on july, 16
>> 
>> - Quite unfortunately I do not backup /var/log in frequent rsync backups...
>> 
>> - Machine does its usual cronned snapper snapshots auto-delete
>> 
>> - Turned off machine for the night
> 
> So it looks like damage is done at this point.
> 
> It looks like some data doesn't reach disk properly.
> 

 Yep I suspect the same.

 Swami, Do you have the kernel logs around this time frame?




^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-01  8:07                 ` Swâmi Petaramesh
@ 2019-08-01  8:43                   ` Qu Wenruo
  2019-08-01 13:46                     ` Anand Jain
  0 siblings, 1 reply; 84+ messages in thread
From: Qu Wenruo @ 2019-08-01  8:43 UTC (permalink / raw)
  To: Swâmi Petaramesh; +Cc: Anand Jain, Lionel Bouton, linux-btrfs



On 2019/8/1 4:07 PM, Swâmi Petaramesh wrote:
> On 8/1/19 8:36 AM, Qu Wenruo wrote:
>> Could you give more detailed history, including each reboot?
>> Like:
>>
>> CASE 1
>> # Upgrade kernel (running 5.1)
>> # Reboot
>> # Kernel mount failure (running 5.2)
>
> No, it never was a “kernel mount failure”, it was more of :
>
> - Running 5.1 OK
>
> - Upgrade to 5.2
>
> - Reboot without noticing problem on kernel 5.2.1-arch1-1
>
> - Performed usual remote rsync backup using kernel 5.2.1-arch1-1 WITHOUT
> any error at 23:20 on july, 16
>
> - Quite unfortunately I do not backup /var/log in frequent rsync backups...
>
> - Machine does its usual cronned snapper snapshots auto-delete
>
> - Turned off machine for the night

So it looks like the damage was done at this point.

It looks like some data didn't reach the disk properly.

>
> - Next days, boot machine as usual (without paying attention to
> scrolling messages)
>
> - Machine boots. Cinnamon GUI fails loading. Wonders. Reboot.
>
> - Notice BTRFS error messages on console at boot. Still no GUI.
>
> - Reboot in systemd rescue mode. Run "btrfs check -f" in read-only mode.
>
> - Get LOADS of error messages.
>
> - Tells myself « Jeez the damn thing screwed up ! »
>
> - Reboot in multi-user.target console mode
>
> - Notice BTRFS errors again.
>
> - Connect external USB HD for performing an emergency full backup of
> what can be.
>
> - Lack enough space on external USB HD. Delete a load of old snapshots
> to make enough space.

It does indeed look like a btrfs bug in 5.2 now.

So far, the common workload involves snapshot deletion and proper shutdown.

I need to double check the snapshot deletion with dm-log-writes.

Thanks for the detailed history, this helps more than the btrfs
check/log message.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-01  6:36               ` Qu Wenruo
@ 2019-08-01  8:07                 ` Swâmi Petaramesh
  2019-08-01  8:43                   ` Qu Wenruo
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-01  8:07 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Anand Jain, Lionel Bouton, linux-btrfs

On 8/1/19 8:36 AM, Qu Wenruo wrote:
> Could you give more detailed history, including each reboot?
> Like:
>
> CASE 1
> # Upgrade kernel (running 5.1)
> # Reboot
> # Kernel mount failure (running 5.2)

No, it never was a “kernel mount failure”, it was more of :

- Running 5.1 OK

- Upgrade to 5.2

- Reboot without noticing problem on kernel 5.2.1-arch1-1

- Performed usual remote rsync backup using kernel 5.2.1-arch1-1 WITHOUT
any error at 23:20 on july, 16

- Quite unfortunately I do not backup /var/log in frequent rsync backups...

- Machine does its usual cronned snapper snapshots auto-delete

- Turned off machine for the night

- Next day, boot machine as usual (without paying attention to
scrolling messages)

- Machine boots. Cinnamon GUI fails loading. Wonders. Reboot.

- Notice BTRFS error messages on console at boot. Still no GUI.

- Reboot in systemd rescue mode. Run "btrfs check -f" in read-only mode.

- Get LOADS of error messages.

- Tell myself « Jeez the damn thing screwed up ! »

- Reboot in multi-user.target console mode

- Notice BTRFS errors again.

- Connect external USB HD for performing an emergency full backup of
what can be.

- Not enough space on external USB HD. Delete a load of old snapshots
to make enough space.

- Perform full backup (rsync onto external HD). Everything goes well
except for a few recently modified files that fail. Either temp or cache
files I can live without, or files that are OK in the remote backup
performed the evening before.

- Wait a few days before restoring the machine - lack of time.

- Reformat and restore the machine, reverting to kernel 5.1.

- Want to perform more backups onto the external USB HD.

- Get BTRFS errors on the external HD (posted here previously).

- Eventually decide to reformat the external HD completely as the FS
seems to be beyond salvation by “btrfs check”.


- The machine and involved disks seem stable and have been checked
healthy now with kernel 5.1.

- As you can see, the damaged filesystems have been reformatted, and I'm
afraid I don't have useful logs available.


> (It's a really pity that the original corrupted leaf kernel message
> can't be preserved, that could really help a lot to detect memory
> corruption or things like that

Well I'm sorry..

Kind regards.

-- 

ॐ

Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-01  6:07             ` Swâmi Petaramesh
@ 2019-08-01  6:36               ` Qu Wenruo
  2019-08-01  8:07                 ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Qu Wenruo @ 2019-08-01  6:36 UTC (permalink / raw)
  To: Swâmi Petaramesh, Anand Jain; +Cc: Lionel Bouton, linux-btrfs

[...]
>
> So I am - for myself - positively sure that the 2 FS corruptions I met
> were related to Arch kernel 5.2 on this machine, as it happened right
> after I had upgraded the kernel, had never happened before, and doesn't
> happen since I downgraded the kernel.

Could you give more detailed history, including each reboot?

Like:

CASE 1
# Upgrade kernel (running 5.1)
# Reboot
# Kernel mount failure (running 5.2)

CASE 2
# Upgrade kernel (running 5.1)
# Reboot
# Kernel mount success (running 5.2)
# Doing some operations (running 5.2)
# Reboot
# Kernel mount failure (running 5.2)

For case 1, as already explained, the damage was done while running 5.1, not 5.2.
For case 2, it's indeed more likely 5.2's fault.

BTW, working cases don't tell us much here, as that's expected.
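Both cases follow one rule: blame the kernel that last wrote to the filesystem before the first failed mount. Here is a hypothetical Python sketch of that reasoning. The `Boot` record and `suspect_kernel` are made-up names, and "mount failed" stands in for whatever first symptom is observed.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Boot:
    kernel: str      # kernel release running during this boot
    mount_ok: bool   # did the filesystem mount without errors in this boot?

def suspect_kernel(boots: list[Boot]) -> str | None:
    """Return the kernel of the boot preceding the first failed mount, i.e.
    the last kernel that wrote to the filesystem. Return None if no mount
    failed, or if the very first recorded boot already failed."""
    for i, boot in enumerate(boots):
        if not boot.mount_ok:
            return boots[i - 1].kernel if i > 0 else None
    return None

case1 = [Boot("5.1", True), Boot("5.2", False)]
case2 = [Boot("5.1", True), Boot("5.2", True), Boot("5.2", False)]
print(suspect_kernel(case1), suspect_kernel(case2))  # prints: 5.1 5.2
```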

(It's a real pity that the original kernel message about the corrupted
leaf couldn't be preserved. It could really help a lot to detect memory
corruption or things like that.)

Thanks,
Qu


>
> I have to add however that I upgraded another little machine to Manjaro
> kernel 5.2 - after taking a full clone of the FS - and I don't have met
> any filesystem corruption so far.
>
> It is worth noting that Manjaro is the same family as Arch.
>
>
> So even though I have no better logs to provide, here is my experience :
>
> - Arch kernel 5.2 : BTRFS over LVM over LUKS on a SSD, and BTRFS over
> LUKS on an USB HD : 2 filesystem corruptions. Both using numerous
> snapshots, some were deleted (either by snapper or manually). Downgraded
> to 5.1 now OK.
>
>
> - Manjaro kernel 5.2 on a small laptop, BTRFS over LUKS on eMMC, no
> compression, no snapshots, no problem so far.
>
>
> - Manjaro kernel 5.2 on another laptop for a very short while before
> reverting to 5.1, BTRFS over LVM over LUKS on SSD, a few snapshots, I
> dunno if some were deleted (snapper) : Still OK.
>
> - Manjaro kernel 5.2 on a desktop for a very short while before
> reverting to 5.1, BTRFS RAID-1 over bcache over LUKS on a 2 HD + 1 SSD
> mix, a few snapshots, I dunno if some were deleted (snapper) : Still OK.
>
> So you see the setups can be a bit complex : Always a LUKS layer,
> compression used on mechanical HDs, sometimes LVM or bcache, some BTRFS
> RAID on one system...
>
> As far as I can tell, the issue doesn't relate to the most complex setups.
>
>
> I am under the unproved but strong feeling that the mess has something
> to do with snapshots deletion with kernel 5.2...
>
> Dunno if it can be of some help.
>
> Kind regards.
>
> ॐ
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-01  4:50           ` Anand Jain
@ 2019-08-01  6:07             ` Swâmi Petaramesh
  2019-08-01  6:36               ` Qu Wenruo
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-08-01  6:07 UTC (permalink / raw)
  To: Anand Jain; +Cc: Lionel Bouton, linux-btrfs

Hello,

On 01/08/2019 at 06:50, Anand Jain wrote:
>>
>> So, I've had the issue of 2 FSes so far :
>>
>> - BTRFS FS on LVM over LUKS on a SATA SSD.
>>
>> - BTRFS FS directly over LUKS on an USB-3 mechanical HD.
>>
>> (All this having been perfectly stable until upgrade to 5.2 kernel...)
>>
> 
>  What kind of btrfs chunk profiles were used here (I guess its either
> single or dup)?

Yes, it was the default profiles :

- Single data, single metadata on the internal SSD (mounted with the
"ssd,discard,noatime" options and no compression).

- Single data, DUP metadata on the external USB HD (mounted with the
"noatime,compress=zstd" options).


I downgraded the kernel to 5.1.16-arch1-1-ARCH when I restored the
machine (before rebooting it) and recreated the SSD BTRFS FS using the
latest "Parted Magic" (5.1 kernel).

The kernel was the ONLY package I downgraded.

The machine has been running like a charm since - as it always did - and
I'm typing this email on it.

(The SSD has passed extended self-tests, SMART tests, and BTRFS has been
successfully scrubbed since it was recreated)


So I am - for myself - positively sure that the 2 FS corruptions I met
were related to Arch kernel 5.2 on this machine, as they happened right
after I had upgraded the kernel, had never happened before, and haven't
happened since I downgraded the kernel.

I have to add, however, that I upgraded another small machine to Manjaro
kernel 5.2 - after taking a full clone of the FS - and I haven't run into
any filesystem corruption so far.

It is worth noting that Manjaro is the same family as Arch.


So even though I have no better logs to provide, here is my experience :

- Arch kernel 5.2 : BTRFS over LVM over LUKS on an SSD, and BTRFS over
LUKS on a USB HD : 2 filesystem corruptions. Both using numerous
snapshots, some were deleted (either by snapper or manually). Downgraded
to 5.1 now OK.


- Manjaro kernel 5.2 on a small laptop, BTRFS over LUKS on eMMC, no
compression, no snapshots, no problem so far.


- Manjaro kernel 5.2 on another laptop for a very short while before
reverting to 5.1, BTRFS over LVM over LUKS on SSD, a few snapshots, I
dunno if some were deleted (snapper) : Still OK.

- Manjaro kernel 5.2 on a desktop for a very short while before
reverting to 5.1, BTRFS RAID-1 over bcache over LUKS on a 2 HD + 1 SSD
mix, a few snapshots, I dunno if some were deleted (snapper) : Still OK.

So you see the setups can be a bit complex : Always a LUKS layer,
compression used on mechanical HDs, sometimes LVM or bcache, some BTRFS
RAID on one system...

As far as I can tell, the issue doesn't relate to the most complex setups.


I am under the unproved but strong feeling that the mess has something
to do with snapshots deletion with kernel 5.2...

Dunno if it can be of some help.

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 14:04         ` Swâmi Petaramesh
@ 2019-08-01  4:50           ` Anand Jain
  2019-08-01  6:07             ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Anand Jain @ 2019-08-01  4:50 UTC (permalink / raw)
  To: Swâmi Petaramesh; +Cc: Lionel Bouton, linux-btrfs

On 29/7/19 10:04 PM, Swâmi Petaramesh wrote:
> On 7/29/19 3:58 PM, Lionel Bouton wrote:
>> For example I suspected that your SSD is a SATA one and I remember
>> data corruption bugs where the root cause was wrong assumptions made
>> between the filesystem layer and the io scheduler. As NVMe devices
>> triggered major io scheduler rework it seemed worthwhile to mention
>> that my system might differ from yours on this.
> 
> So, I've had the issue of 2 FSes so far :
> 
> - BTRFS FS on LVM over LUKS on a SATA SSD.
> 
> - BTRFS FS directly over LUKS on an USB-3 mechanical HD.
> 
> (All this having been perfectly stable until upgrade to 5.2 kernel...)
> 

  What kind of btrfs chunk profiles were used here (I guess it's either
single or dup)?

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-30 23:13                               ` Graham Cobb
@ 2019-07-30 23:24                                 ` Chris Murphy
  0 siblings, 0 replies; 84+ messages in thread
From: Chris Murphy @ 2019-07-30 23:24 UTC (permalink / raw)
  To: Graham Cobb; +Cc: Swâmi Petaramesh, Btrfs BTRFS

On Tue, Jul 30, 2019 at 5:13 PM Graham Cobb <g.btrfs@cobb.uk.net> wrote:
>
> On 30/07/2019 23:44, Swâmi Petaramesh wrote:
> > Still, losing a given FS with subvols, snapshots etc, may be very
> > annoying and very time consuming rebuilding.
>
> I believe that in one of the earlier mails, Qu said that you can
> probably mount the corrupted fs readonly and read everything.
>
> If that is the case then, if I were in your position, I would probably
> buy another disk, create a new fs, and then use one of the subvol
> preserving btrfs clone utilities to clone the readonly disk onto the new
> disk.
>
> Not cheap, and would still take some time, but at least it could be
> automated.

btrfstune might allow the seeding flag to be changed on this volume;
I'm not sure what kind of checks are done to see if it's viable; but
also the seed feature is somewhere between tricky and unsupported in
multiple device contexts.

But yeah, if it's mountable read-only, it should be possible to script
`btrfs send`/`btrfs receive` based replication of the ro snapshots
anyway, preserving shared extents.
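A minimal Python sketch of such a script, assuming the read-only snapshots all belong to one subvolume, are listed oldest first, and live under hypothetical paths like `/mnt/old/snaps/...`; it has not been tried against a damaged, read-only-mounted filesystem. Each snapshot after the first is sent with `-p <previous snapshot>`, so only differences are transferred and extents shared with the parent stay shared on the destination. The function names are illustrative only.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple

Command = List[str]


def build_replication_plan(
    snapshots: Sequence[str], dest_dir: str
) -> List[Tuple[Command, Command]]:
    """Pair each read-only snapshot with a `btrfs send` / `btrfs receive` command.

    The first snapshot is sent in full; each later one is sent with
    `-p <previous snapshot>`, so only the differences travel and extents
    shared with the parent stay shared on the destination.
    """
    plan: List[Tuple[Command, Command]] = []
    parent: Optional[str] = None
    for snap in snapshots:
        send_cmd = ["btrfs", "send"]
        if parent is not None:
            send_cmd += ["-p", parent]
        send_cmd.append(snap)
        plan.append((send_cmd, ["btrfs", "receive", dest_dir]))
        parent = snap
    return plan


def run_plan(plan: Sequence[Tuple[Command, Command]]) -> None:
    """Run each pair as `send | receive`; requires root and real paths."""
    for send_cmd, recv_cmd in plan:
        sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
        assert sender.stdout is not None
        receiver = subprocess.Popen(recv_cmd, stdin=sender.stdout)
        sender.stdout.close()  # so send gets SIGPIPE if receive exits early
        recv_rc = receiver.wait()
        send_rc = sender.wait()
        if send_rc != 0 or recv_rc != 0:
            raise RuntimeError(f"replication stopped at {send_cmd[-1]}")


if __name__ == "__main__":
    # Hypothetical paths: old fs mounted ro at /mnt/old, new fs at /mnt/new.
    snaps = ["/mnt/old/snaps/2019-07-01", "/mnt/old/snaps/2019-07-15"]
    for send_cmd, recv_cmd in build_replication_plan(snaps, "/mnt/new/snaps"):
        print(" ".join(send_cmd), "|", " ".join(recv_cmd))
```

Sending oldest first matters: an incremental `-p` send only works when the parent snapshot already exists on the receiving side. `run_plan()` needs root and should only be pointed at paths that have been checked by hand.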

-- 
Chris Murphy

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-30 22:44                             ` Swâmi Petaramesh
@ 2019-07-30 23:13                               ` Graham Cobb
  2019-07-30 23:24                                 ` Chris Murphy
  0 siblings, 1 reply; 84+ messages in thread
From: Graham Cobb @ 2019-07-30 23:13 UTC (permalink / raw)
  To: Swâmi Petaramesh; +Cc: Btrfs BTRFS

On 30/07/2019 23:44, Swâmi Petaramesh wrote:
> Still, losing a given FS with subvols, snapshots etc, may be very
> annoying and very time consuming rebuilding.

I believe that in one of the earlier mails, Qu said that you can
probably mount the corrupted fs readonly and read everything.

If that is the case then, if I were in your position, I would probably
buy another disk, create a new fs, and then use one of the subvol
preserving btrfs clone utilities to clone the readonly disk onto the new
disk.

Not cheap, and would still take some time, but at least it could be
automated.

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-30 20:15                           ` Chris Murphy
@ 2019-07-30 22:44                             ` Swâmi Petaramesh
  2019-07-30 23:13                               ` Graham Cobb
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-30 22:44 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Qu Wenruo, Btrfs BTRFS

On 30/07/2019 at 22:15, Chris Murphy wrote:
> I sympathize with the lack of resources. But no full disk backup
> simply cannot be taken seriously in any computer science context. The
> data cannot be that important by the user's own estimation if there
> aren't backups. It's reasonable for resource limitations to have a
> subset of data backed up. But if none of it is *shrug* there just
> aren't that many people who will sympathize with data loss if there
> are no backups.

I do have backups of everything, and backups of backups, and offsite
backups, etc.

What I mean is that, e.g., I have a NAS machine that acts as a backup
server for all my other machines. Yes, the rest is also cross-backed-up,
scattered here and there. If I lose the NAS, I don't lose any
*unique data*.

Still, the exact organization of THIS machine's filesystems is unique
and would take a long time to redo; it may hold some timeline snapshots
that other storage doesn't have, or no longer has, etc.

I can lose any of my filesystems without losing critical data. I can
live and survive this way.

Still, losing a given FS with its subvols, snapshots, etc. can be very
annoying, and rebuilding it very time-consuming.

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-30  8:09                         ` Swâmi Petaramesh
@ 2019-07-30 20:15                           ` Chris Murphy
  2019-07-30 22:44                             ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Chris Murphy @ 2019-07-30 20:15 UTC (permalink / raw)
  To: Swâmi Petaramesh; +Cc: Chris Murphy, Qu Wenruo, Btrfs BTRFS

On Tue, Jul 30, 2019 at 2:09 AM Swâmi Petaramesh <swami@petaramesh.org> wrote:
>
> On 7/29/19 9:10 PM, Chris Murphy wrote:
> > We've discussed many times how both file system repair, and file
> > system restore from backup, simply are not scalable for big file
> > systems. It takes too long.
>
> So what would be the solution ?

There presently is no solution, and I'm not aware of the future plan
either. I think it's a problem.

>
> IMHO yes, having to full backup then reformat then full restore is
> impractical for big FSes. Especially if they have a lot of subvols.
>
> Also most private individuals do not have enough disks to perform a full
> backup of their RAID NAS, etc.

I sympathize with the lack of resources. But having no backups at all
simply cannot be taken seriously in any computer science context. The
data cannot be that important by the user's own estimation if there
aren't backups. With limited resources, it's reasonable to back up only
a subset of the data. But if none of it is backed up, *shrug*, there
just aren't that many people who will sympathize with data loss.

Backup+restore is for sure a Byzantine workaround for the data
storage problem, but you have no idea what will fail or when it will
fail. There's not a file system list on earth that will tell you it's
OK to not have backups.


> I believe that we should have a repair tool that can fix a filesystem
> metadata and make it clean and usable again even if this is at the cost
> of losing a whole directory tree or subvols or whatever.

So far that isn't how it works. I don't know if it's a limitation of
the on-disk format, or a limitation in reconstructing from incorrect
information even though the checksum is correct.


> But it would be better to lose clearly identified things and resume with
> a working FS and a list of files to be restored, rather than being
> unable to repair and having to reformat everything and restore everything...

Yep. That doesn't exist yet and I don't know if that's a design goal
of Btrfs eventually.

ZFS, meanwhile, has no repair tool. If it becomes inconsistent, that's
it: recreate the file system.

If your use case policy requires a repair tool, you really have to
disqualify both ZFS and Btrfs, because the Btrfs repair tool is still
marked as dangerous in the man page. I just cannot take repair of
Btrfs seriously when Btrfs developers consider it dangerous on a
case-by-case basis.

With any file system, a clean reproducer always has the best chance of
getting developer attention. This is not easy. Part of practical best
practice is keeping the bulk of your systems on a very stable operating
system with well-maintained stable, or actively maintained long-term,
kernels, and testing mainline kernels on a smaller percentage of
machines. Hitting a problem there is annoying and tedious, and
definitely bad and a bug, but at least the problem is restricted to
your test machines.

There isn't enough history here to piece together with any certainty
why you're experiencing what you're experiencing beyond what Qu has
already stated.

-- 
Chris Murphy

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-30  8:04     ` Henk Slager
@ 2019-07-30  8:17       ` Swâmi Petaramesh
  0 siblings, 0 replies; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-30  8:17 UTC (permalink / raw)
  To: Henk Slager; +Cc: linux-btrfs

Hi,

On 7/30/19 10:04 AM, Henk Slager wrote:
> Maybe you could zoom-in a bit more on the kernel (and btrfs-progs) binary.
> Does Arch do any changes to the kernel.org version 5.2.0 ?

I don't precisely know. Each and every distro out there applies some set
of patches to the kernel that they package.

However Arch is known to keep things simple and up-to-date and as close
as possible to the upstream.

> And what configuration is used?
> Or did you create/compile things by yourself?
> What compiler version is used?

I didn't compile this kernel, just used the latest kernel packages i.e.
https://www.archlinux.org/packages/core/x86_64/linux/

(There's been another upgrade since the first 5.2.x I installed, with
reported issues...)

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 19:10                       ` Chris Murphy
@ 2019-07-30  8:09                         ` Swâmi Petaramesh
  2019-07-30 20:15                           ` Chris Murphy
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-30  8:09 UTC (permalink / raw)
  To: Chris Murphy, Qu Wenruo; +Cc: Btrfs BTRFS

On 7/29/19 9:10 PM, Chris Murphy wrote:
> We've discussed many times how both file system repair, and file
> system restore from backup, simply are not scalable for big file
> systems. It takes too long.

So what would be the solution?

IMHO, yes, having to do a full backup, then reformat, then a full restore
is impractical for big FSes, especially if they have a lot of subvols.

Also most private individuals do not have enough disks to perform a full
backup of their RAID NAS, etc.

And an FS “repair” takes a long time and is often ineffective.

I believe that we should have a repair tool that can fix a filesystem's
metadata and make it clean and usable again, even if this is at the cost
of losing a whole directory tree or subvols or whatever.

But it would be better to lose clearly identified things and resume with
a working FS and a list of files to be restored, rather than being
unable to repair and having to reformat everything and restore everything...

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E



* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 13:35   ` Swâmi Petaramesh
@ 2019-07-30  8:04     ` Henk Slager
  2019-07-30  8:17       ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Henk Slager @ 2019-07-30  8:04 UTC (permalink / raw)
  To: Swâmi Petaramesh; +Cc: linux-btrfs

On Mon, Jul 29, 2019 at 4:17 PM Swâmi Petaramesh <swami@petaramesh.org> wrote:
>
> On 7/29/19 3:29 PM, Lionel Bouton wrote:
> > For another reference point, my personal laptop reports 17 days of
> > uptime on 5.2.0-arch2-1-ARCH.
> > I use BTRFS both over LUKS over LVM and directly over LVM. The system
> > is suspended during the night and running otherwise (probably more
> > than 16 hours a day).
> >
> > I don't have any problem so far. I'll reboot right away and reply to
> > this message (if you see it and not a reply shortly after, there might
> > be a bug affecting me too).
> >
> Well I had upgraded 3 machines to 5.2 (One Arch and 2 Manjaros).
>
> The Arch broke 2 BTRFS filesystems residing on 2 different disks that
> had been perfectly reliable ever before.
>
> The 2 Manjaros did not exhibit trouble so far but I use these 2 very
> little and I preferred to revert back to 5.1 in a hurry before I break
> my backup machines as badly as my main machine :-/
>
> My Arch first broke its BTRFS main FS and I told myself it was years
> old, so maybe some old corruption undetected by scrub so far...
Maybe you could zoom in a bit more on the kernel (and btrfs-progs) binary.
Does Arch make any changes to the kernel.org 5.2.0 version?
And what configuration is used?
Or did you create/compile things yourself?
What compiler version is used?

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 19:20                               ` Chris Murphy
@ 2019-07-30  6:47                                 ` Swâmi Petaramesh
  0 siblings, 0 replies; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-30  6:47 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Qu Wenruo, Btrfs BTRFS

On 29/07/2019 at 21:20, Chris Murphy wrote:
> I think it's totally reasonable to go back to 5.1 for a while and make
> certain the problems aren't happening there.

That's exactly what I did.

> If they are, then I start
> to wonder about noisy power since you have so many different drives
> and setups affected. Some of the strangest problems I have ever seen
> in computing were directly attributed to noise on the power line.

Yeah, but in this case it's a laptop, with mains power filtered by the
battery, so I wouldn't expect a power issue...

Best regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 15:05                             ` Swâmi Petaramesh
@ 2019-07-29 19:20                               ` Chris Murphy
  2019-07-30  6:47                                 ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Chris Murphy @ 2019-07-29 19:20 UTC (permalink / raw)
  To: Swâmi Petaramesh; +Cc: Qu Wenruo, Btrfs BTRFS

On Mon, Jul 29, 2019 at 9:05 AM Swâmi Petaramesh <swami@petaramesh.org> wrote:
>
> On 7/29/19 4:55 PM, Swâmi Petaramesh wrote:
> > Well  All the errors I detailed today happen on the SAME FS, and this fs
> > is a BTRFS that was created on a new HD with a recent kernel (surely >=
> > 4.19) only a few months ago.
> >
> > And the errors I have one this one, As far as I can tell, look exactly
> > like what happened on the same machines SSD as soons as I installer a
> > 5.2 kernel...
>
> Plus I just decided to “btrfs check” the SSD FS from my machine (not yet
> showing errors), which I completely reformatted using 5.2 3 days ago
> (after having fully tested the SSD error-free itself)...
>
> And btrfs check tells me that this FS is now completely corrupt as well
> :-(((
>
> The list of files in error has been scrolling for five minutes now :-(((

Without both dmesg and btrfs check output it's not very useful. I've
got a case where a file system scrubs fine and btrfs check complains,
but it turns out that's because of nocow files that were compressed via
the defrag path. The files are fine, there is no corruption, it's just
noise. But the only way to know that is to always include the full dmesg
and check output; I personally find snippets and trimmed logs
annoying. In this case we don't actually have anything to go on, so
the problem could be anything, and therefore we need all the
information available.

It seems unlikely to be drive related, as so many drives are involved.
Same for logic board or RAM. These days ext4 and XFS checksum their
metadata too, so I think that if this were device-mapper or blk-mq
related, they would see errors as well. And yet of course many people
are using kernel 5.2 with Btrfs and aren't having problems. So it's just
inherently tedious work to narrow down what's causing the problem.

I think it's totally reasonable to go back to 5.1 for a while and make
certain the problems aren't happening there. If they are, then I start
to wonder about noisy power since you have so many different drives
and setups affected. Some of the strangest problems I have ever seen
in computing were directly attributed to noise on the power line.




-- 
Chris Murphy

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 14:40                     ` Qu Wenruo
  2019-07-29 14:46                       ` Swâmi Petaramesh
@ 2019-07-29 19:10                       ` Chris Murphy
  2019-07-30  8:09                         ` Swâmi Petaramesh
  1 sibling, 1 reply; 84+ messages in thread
From: Chris Murphy @ 2019-07-29 19:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Swâmi Petaramesh, Btrfs BTRFS

On Mon, Jul 29, 2019 at 8:40 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2019/7/29 10:34 PM, Swâmi Petaramesh wrote:
> > On 29/07/2019 at 16:27, Qu Wenruo wrote:
> >> BTW, I'm more interesting in your other corrupted leaf report other than
> >> this transid error.
> >
> > Well I already broke 2 FSes including my most important computer with
> > this, took me 2 working days to restore and mostly fix my main computer
> > which I couldn't use for a week (because of lack of time for restoring
> > it) and now I lose my main backup disk.
>
> At least from what I see in this transid error, unless you ruled out the
> possibility of bad disk firmware and LVM/LUKS, it's hard to say it's
> btrfs causing the problem.

I've been using kernel 5.2.x since the early rcs on several systems:
NVMe, SSD, HDD; half on plain partitions, half on dmcrypt/LUKS. I can
report no problems. None are on LVM.

It comes down to:
a. workload-specific behavior is triggering a new bug in Btrfs or dm
or the blk layer, or a combination
b. new hardware issue

It seems to me that whenever weird stuff pops up with ext4 or XFS, their
call traces generally expose the problem, so I wonder whether Btrfs devs
still have the kernel debug information needed to assign blame, or
whether there needs to be some debug mode (mount option?) that does
extra checks to try to catch the problem. Is this a case for some kind
of metadata integrity checking, with Swâmi running this workload, either
on the problem file system or on a new Btrfs file system, just to
gather better-quality information?

But yeah, at least a complete current dmesg is needed. Possibly even
more helpful would be the kernel messages for the entire period since
switching to 5.2.0: it could be a big file, but it's easy to filter for
dm, libata, smartd, and btrfs messages. I'd leave the filtering up to a
developer; by default I always provide the entire dmesg, because it's
not always clear what the instigator is.
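On the developer side, a first pass over such a full log might look like the following Python sketch. The patterns are assumptions about typical prefixes (`BTRFS`, `device-mapper`/`dm-N`, libata's `ataN.NN:`, `smartd`), not an exhaustive list, and the sample lines are invented for illustration; the full, unfiltered log is still what belongs in a report.

```python
import re
from typing import Iterable, List

# Assumed message prefixes; extend as needed for the log at hand.
SUBSYSTEM_RE = re.compile(
    r"BTRFS|btrfs"               # btrfs kernel messages
    r"|device-mapper|\bdm-\d+"   # device-mapper / dm-crypt
    r"|\bata\d+(?:\.\d+)?:"      # libata, e.g. "ata1.00:"
    r"|smartd"                   # smartd, if it logs to the same journal
)


def filter_kernel_log(lines: Iterable[str]) -> List[str]:
    """Keep only lines mentioning btrfs, device-mapper, libata or smartd."""
    return [line for line in lines if SUBSYSTEM_RE.search(line)]


# Invented sample lines, not taken from this thread.
sample = [
    "[    1.0] ata1.00: failed command: WRITE FPDMA QUEUED",
    "[    2.0] usb 1-1: new high-speed USB device",
    "[    3.0] BTRFS error (device dm-1): parent transid verify failed",
    "Jul 29 16:00:00 host smartd[512]: Device: /dev/sda, SMART Usage Attribute",
]
for line in filter_kernel_log(sample):
    print(line)
```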

We've discussed many times how both file system repair, and file
system restore from backup, simply are not scalable for big file
systems. It takes too long.


-- 
Chris Murphy

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 14:55                           ` Swâmi Petaramesh
@ 2019-07-29 15:05                             ` Swâmi Petaramesh
  2019-07-29 19:20                               ` Chris Murphy
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 15:05 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 7/29/19 4:55 PM, Swâmi Petaramesh wrote:
> Well  All the errors I detailed today happen on the SAME FS, and this fs
> is a BTRFS that was created on a new HD with a recent kernel (surely >=
> 4.19) only a few months ago.
>
> And the errors I have one this one, As far as I can tell, look exactly
> like what happened on the same machines SSD as soons as I installer a
> 5.2 kernel...

Plus I just decided to “btrfs check” the SSD FS from my machine (not yet
showing errors), which I had completely reformatted under 5.2 three days
ago (after fully testing the SSD itself and finding it error-free)...

And btrfs check tells me that this FS is now completely corrupt as well
:-(((

The list of files in error has been scrolling for five minutes now :-(((

-- 
ॐ

Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 14:51                         ` Qu Wenruo
@ 2019-07-29 14:55                           ` Swâmi Petaramesh
  2019-07-29 15:05                             ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 14:55 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Hi again,

On 7/29/19 4:51 PM, Qu Wenruo wrote:
> If I understand your two threads correctly, one is this transid error,
> the other one is tree-checker, which is completely different from this one.
>
> The tree-checker one is mostly caused by older fs, and as you mentioned,
> reverting to 5.1 solves that problem.
>
> This transid won't be resolved whatever kernel version you use, it's a
> real corruption in extent tree, caused by incorrect metadata CoW.
>
> So they are two different problems, and this transid error can be
> completely unrelated to 5.2 kernel.


Well, all the errors I detailed today happen on the SAME FS, and this FS
is a BTRFS that was created on a new HD with a recent kernel (surely >=
4.19) only a few months ago.

And the errors I have on this one, as far as I can tell, look exactly
like what happened on the same machine's SSD as soon as I installed a
5.2 kernel...

Kind regards.

-- 
ॐ

Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 14:46                       ` Swâmi Petaramesh
@ 2019-07-29 14:51                         ` Qu Wenruo
  2019-07-29 14:55                           ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Qu Wenruo @ 2019-07-29 14:51 UTC (permalink / raw)
  To: Swâmi Petaramesh, linux-btrfs



On 2019/7/29 10:46 PM, Swâmi Petaramesh wrote:
> On 29/07/2019 at 16:40, Qu Wenruo wrote:
>> At least from what I see in this transid error, unless you ruled out the
>> possibility of bad disk firmware and LVM/LUKS, it's hard to say it's
>> btrfs causing the problem.
>
> Well it would be such a coincidence that a machine that has been 100%
> stable since 2014 suddently crashes 2 filesystems on 2 different disks
> justy after upgrading to 5.2 and both things would be unrelated.
>
> Can happen, but the chances...

If I understand your two threads correctly, one is this transid error,
the other one is tree-checker, which is completely different from this one.

The tree-checker one is mostly caused by an older fs, and as you mentioned,
reverting to 5.1 solves that problem.

This transid error won't be resolved whichever kernel version you use;
it's real corruption in the extent tree, caused by incorrect metadata CoW.

So they are two different problems, and this transid error may be
completely unrelated to the 5.2 kernel.
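A toy Python model of what a transid error means, assuming the usual btrfs design in which each pointer in a tree node records both the child block's address and the generation (transaction id) the child was written in, while the child's own header carries its generation as well. This is not btrfs code: the class and function names are made up, and the error string only mimics the kernel's "parent transid verify failed" message. With correct CoW, a modified block is written to a new location and the parent pointer is updated in the same transaction, so the two numbers agree; if a block is overwritten in place while an older parent still points at it, they diverge.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Block:
    bytenr: int       # where the tree block lives on disk
    generation: int   # transid recorded in the block's own header


@dataclass
class ChildPtr:
    blockptr: int     # child address stored in the parent node
    generation: int   # transid the parent expects the child to carry


def check_child(disk: Dict[int, Block], ptr: ChildPtr) -> Optional[str]:
    """Return an error if the child's header disagrees with its parent."""
    child = disk[ptr.blockptr]
    if child.generation == ptr.generation:
        return None
    return (f"parent transid verify failed on {ptr.blockptr} "
            f"wanted {ptr.generation} found {child.generation}")


# Correct CoW: the modified copy goes to a new address (8192) and the
# parent pointer is rewritten in the same transaction, so both agree.
disk = {4096: Block(4096, 100)}
disk[8192] = Block(8192, 101)
print(check_child(disk, ChildPtr(8192, 101)))  # None

# Broken CoW: block 4096 is overwritten in place in transaction 105
# while an older parent still points at it expecting generation 100.
disk[4096] = Block(4096, 105)
print(check_child(disk, ChildPtr(4096, 100)))
```

The same mismatch, with "found" older than "wanted", is what a lost child write would produce, which is why this class of error alone cannot tell a CoW bug apart from the Flush/FUA hardware problems discussed in this thread.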

Thanks,
Qu

>
>> In fact, we have a more experienced sysadmin, Zygo, sharing his
>> experience of bad *HARDWARE* causing various Flush/FUA problem, which is
>> not easy to hit in normal use case, but only after power loss.
>
>> So for your transid error, it's really hard to pin down the cause,
>> unless you have deployed hundreds btrfs...
>
> For the record I am a professional sysadmin with 20+ years Linux
> experience and I have deployed BTRFS on dozens of systems (OK maybe not
> hundreds, not sure)...
>
> Kind regards.
>
> ॐ
>

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 14:40                     ` Qu Wenruo
@ 2019-07-29 14:46                       ` Swâmi Petaramesh
  2019-07-29 14:51                         ` Qu Wenruo
  2019-07-29 19:10                       ` Chris Murphy
  1 sibling, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 14:46 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 29/07/2019 at 16:40, Qu Wenruo wrote:
> At least from what I see in this transid error, unless you ruled out the
> possibility of bad disk firmware and LVM/LUKS, it's hard to say it's
> btrfs causing the problem.

Well, it would be quite a coincidence for a machine that has been 100%
stable since 2014 to suddenly crash 2 filesystems on 2 different disks
just after upgrading to 5.2, with the two things being unrelated.

Can happen, but the chances...

> In fact, we have a more experienced sysadmin, Zygo, sharing his
> experience of bad *HARDWARE* causing various Flush/FUA problem, which is
> not easy to hit in normal use case, but only after power loss.

> So for your transid error, it's really hard to pin down the cause,
> unless you have deployed hundreds btrfs...

For the record, I am a professional sysadmin with 20+ years of Linux
experience, and I have deployed BTRFS on dozens of systems (OK, maybe
not hundreds, not sure)...

Kind regards.

ॐ
-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 14:34                   ` Swâmi Petaramesh
@ 2019-07-29 14:40                     ` Qu Wenruo
  2019-07-29 14:46                       ` Swâmi Petaramesh
  2019-07-29 19:10                       ` Chris Murphy
  0 siblings, 2 replies; 84+ messages in thread
From: Qu Wenruo @ 2019-07-29 14:40 UTC (permalink / raw)
  To: Swâmi Petaramesh, linux-btrfs



On 2019/7/29 10:34 PM, Swâmi Petaramesh wrote:
> On 29/07/2019 at 16:27, Qu Wenruo wrote:
>> BTW, I'm more interesting in your other corrupted leaf report other than
>> this transid error.
>
> Well I already broke 2 FSes including my most important computer with
> this, took me 2 working days to restore and mostly fix my main computer
> which I couldn't use for a week (because of lack of time for restoring
> it) and now I lose my main backup disk.

At least from what I see in this transid error, unless you rule out the
possibility of bad disk firmware and LVM/LUKS, it's hard to say it's
btrfs causing the problem.

In fact, a more experienced sysadmin, Zygo, has shared his experience
of bad *HARDWARE* causing various Flush/FUA problems, which are not
easy to hit in normal use, only after a power loss.

So for your transid error, it's really hard to pin down the cause,
unless you have deployed hundreds of btrfs filesystems...

>
> I'd really like to see this addressed, because I'm crying tears of blood...
>
>> The later one is either some real corruption from older fs, or some
>> false alerts needs to be addressed.
>
> So how could I help with this one ?

As already said in that thread: the full dmesg of that mount failure.

Thanks,
Qu

>
> TIA
>
> ॐ
>

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 14:27                 ` Qu Wenruo
@ 2019-07-29 14:34                   ` Swâmi Petaramesh
  2019-07-29 14:40                     ` Qu Wenruo
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 14:34 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Swâmi Petaramesh

On 29/07/2019 at 16:27, Qu Wenruo wrote:
> BTW, I'm more interesting in your other corrupted leaf report other than
> this transid error.

Well, I've already broken 2 FSes with this, including my most important
computer's; it took me 2 working days to restore and mostly fix my main
computer, which I couldn't use for a week (for lack of time to restore
it), and now I'm losing my main backup disk.

I'd really like to see this addressed, because I'm crying tears of blood...

> The later one is either some real corruption from older fs, or some
> false alerts needs to be addressed.

So how could I help with this one?

TIA

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 14:21               ` Swâmi Petaramesh
@ 2019-07-29 14:27                 ` Qu Wenruo
  2019-07-29 14:34                   ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Qu Wenruo @ 2019-07-29 14:27 UTC (permalink / raw)
  To: Swâmi Petaramesh, linux-btrfs



On 2019/7/29 10:21 PM, Swâmi Petaramesh wrote:
> On 29/07/2019 at 16:08, Qu Wenruo wrote:
>> You don't need to repair.
>>
>> The corruption is in extent tree, to read data out you don't need extent
>> tree at all.
>
>> But I'd say you would have a good chance to salvage a lot of data at least.
>>
>> BTW, --repair may help, but won't be any better than that skip_bg
>> rescue, you won't have much chance other than salvaging data.
>
>
> Basically my question is : Is there anyway I can turn this broken FS
> into a sane FS using « btrfs repair » EVEN if this causes data losses ?
>
> I can afford some data losses of this backup disk (next backup will fix
> missing files)
>
> But I DO NOT want to lose (or have to recreate) the complete FS with all
> its subvols and snapshots, which I have no other disk to copy to currently.
>
> So I can accept a « fix with losses », but not a « well you need to
> reformat the disk completely »...

Then you can try, but we can't guarantee anything. The problem is that
once CoW has been broken, especially when the extent tree is corrupted,
later writes can very easily break CoW again because of the corrupted
extent tree, making things worse.

The rescue method provides full access, including subvolumes and the
like; the only catch is that everything is RO.

To be clear again, btrfs check --repair is never guaranteed to make the
image usable (i.e. pass btrfs check after repair), especially when
extent tree corruption is involved.

BTW, I'm more interested in your other corrupted-leaf report than in
this transid error.
The latter is either real corruption from an older fs, or false
alerts that need to be addressed.

Thanks,
Qu

>
> Kind regards.
>
> ॐ
>

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 14:08             ` Qu Wenruo
@ 2019-07-29 14:21               ` Swâmi Petaramesh
  2019-07-29 14:27                 ` Qu Wenruo
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 14:21 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 29/07/2019 at 16:08, Qu Wenruo wrote:
> You don't need to repair.
> 
> The corruption is in extent tree, to read data out you don't need extent
> tree at all.

> But I'd say you would have a good chance to salvage a lot of data at least.
> 
> BTW, --repair may help, but won't be any better than that skip_bg
> rescue, you won't have much chance other than salvaging data.


Basically my question is: is there any way I can turn this broken FS
into a sane FS using “btrfs repair”, EVEN if this causes data loss?

I can afford some data loss on this backup disk (the next backup will
restore missing files).

But I DO NOT want to lose (or have to recreate) the complete FS with all
its subvols and snapshots; I currently have no other disk to copy them to.

So I can accept a “fix with losses”, but not a “well, you need to
reformat the disk completely”...

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 14:01           ` Swâmi Petaramesh
@ 2019-07-29 14:08             ` Qu Wenruo
  2019-07-29 14:21               ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Qu Wenruo @ 2019-07-29 14:08 UTC (permalink / raw)
  To: Swâmi Petaramesh, linux-btrfs



On 2019/7/29 10:01 PM, Swâmi Petaramesh wrote:
> On 29/07/2019 at 15:52, Swâmi Petaramesh wrote:
>>
>> Please tell me how I could help.
>
> Here is the complete output of btrfs check, which looks exactly like what
> I saw on the first disk that broke.
>
> QUESTION: if I run « btrfs check » in repair mode, is there any hope it
> will repair the FS properly? I can afford to lose the damaged files;
> that's not a problem for me.

You don't need to repair.

The corruption is in the extent tree; to read data out, you don't need the
extent tree at all.

You can experiment with the following patchset:
https://patchwork.kernel.org/project/linux-btrfs/list/?series=130637

>
>
...
> Wrong key of child node/leaf, wanted: (1797454, 96, 23), have:
> (18446744073709551606, 128, 2538887163904)

Unfortunately, part of your fs tree was also corrupted by that CoW
breakage.
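The two keys in the "Wrong key of child node/leaf" message can be decoded by hand. The sketch below is a hypothetical helper (not part of btrfs-progs) that covers only the few key types relevant here, using values from the btrfs on-disk format: reserved objectids are small negative numbers stored as u64, so 18446744073709551606 is -10, the checksum tree objectid. Read this way, the slot that should lead to a directory index item of inode 1797454 appears to point at a block that starts with a data checksum item instead.

```python
# Hypothetical helper, not part of btrfs-progs: decode the keys printed in
# "Wrong key of child node/leaf, wanted: (...), have: (...)".

# A few on-disk key type values from the btrfs format.
KEY_TYPES = {
    1: "INODE_ITEM",
    84: "DIR_ITEM",
    96: "DIR_INDEX",
    128: "EXTENT_CSUM",
}

# Reserved objectids are defined as negative numbers cast to u64.
SPECIAL_OBJECTIDS = {
    -10: "EXTENT_CSUM_OBJECTID",
}


def decode_key(objectid: int, key_type: int, offset: int) -> str:
    # Reinterpret the u64 objectid as signed to spot reserved ids.
    signed = objectid - (1 << 64) if objectid >= (1 << 63) else objectid
    obj = SPECIAL_OBJECTIDS.get(signed, str(objectid))
    kind = KEY_TYPES.get(key_type, f"type {key_type}")
    return f"({obj}, {kind}, {offset})"


wanted = decode_key(1797454, 96, 23)
have = decode_key(18446744073709551606, 128, 2538887163904)
print("wanted:", wanted)  # (1797454, DIR_INDEX, 23)
print("have:  ", have)    # (EXTENT_CSUM_OBJECTID, EXTENT_CSUM, 2538887163904)
```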

The following output lists your corrupted files.

> root 2815 inode 1797454 errors 200, dir isize wrong
> root 2815 inode 1797455 errors 2001, no inode item, link count wrong
> 	unresolved ref dir 1767611 index 5 namelen 2 name fs filetype 2 errors
> 4, no inode ref
[...]
> ERROR: errors found in fs roots

But I'd say you have a good chance of salvaging at least a lot of the data.

BTW, --repair may help, but it won't do any better than the skip_bg
rescue; you won't have much chance beyond salvaging the data.

Thanks,
Qu

> found 1681780068352 bytes used, error(s) found
> total csum bytes: 1634102612
> total tree bytes: 7675002880
> total fs tree bytes: 5528207360
> total extent tree bytes: 374669312
> btree space waste bytes: 1041372744
> file data blocks allocated: 2247597993984
>  referenced 1803060187136
> root:~#
>
> ॐ
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
       [not found]       ` <d8c571e4-718e-1241-66ab-176d091d6b48@bouton.name>
@ 2019-07-29 14:04         ` Swâmi Petaramesh
  2019-08-01  4:50           ` Anand Jain
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 14:04 UTC (permalink / raw)
  To: Lionel Bouton, linux-btrfs

On 7/29/19 3:58 PM, Lionel Bouton wrote:
> For example I suspected that your SSD is a SATA one and I remember
> data corruption bugs where the root cause was wrong assumptions made
> between the filesystem layer and the io scheduler. As NVMe devices
> triggered major io scheduler rework it seemed worthwhile to mention
> that my system might differ from yours on this.

So far, I've had the issue on 2 FSes:

- BTRFS FS on LVM over LUKS on a SATA SSD.

- BTRFS FS directly over LUKS on a USB-3 mechanical HD.

(All of this had been perfectly stable until the upgrade to the 5.2 kernel...)

-- 
ॐ

Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 13:52         ` Swâmi Petaramesh
  2019-07-29 13:59           ` Qu Wenruo
@ 2019-07-29 14:01           ` Swâmi Petaramesh
  2019-07-29 14:08             ` Qu Wenruo
  1 sibling, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 14:01 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 29/07/2019 at 15:52, Swâmi Petaramesh wrote:
> 
> Please tell me how I could help.

Here is the complete output of btrfs check, which looks exactly like what
I saw on the first disk that broke.

QUESTION: if I run « btrfs check » in repair mode, is there any hope it
will repair the FS properly? I can afford to lose the damaged files;
that's not a problem for me.


# btrfs check /dev/mapper/luks-UUID
Opening filesystem to check...
Checking filesystem on /dev/mapper/luks-UUID
UUID: ==UUID==
[1/7] checking root items
[2/7] checking extents
parent transid verify failed on 2137144377344 wanted 7684 found 7499
parent transid verify failed on 2137144377344 wanted 7684 found 7499
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
leaf parent key incorrect 2137144377344
bad block 2137144377344
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
[... same two lines repeated 56 more times ...]
Wrong key of child node/leaf, wanted: (1797454, 96, 23), have:
(18446744073709551606, 128, 2538887163904)
Wrong generation of child node/leaf, wanted: 7499, have: 7684
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
[... same two lines repeated many more times ...]
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
root 2815 inode 1797454 errors 200, dir isize wrong
root 2815 inode 1797455 errors 2001, no inode item, link count wrong
	unresolved ref dir 1767611 index 5 namelen 2 name fs filetype 2 errors 4, no inode ref
root 2815 inode 1797456 errors 2001, no inode item, link count wrong
	unresolved ref dir 1767611 index 6 namelen 3 name lib filetype 2 errors 4, no inode ref
root 2815 inode 1797457 errors 2001, no inode item, link count wrong
	unresolved ref dir 1767611 index 7 namelen 4 name misc filetype 2 errors 4, no inode ref
root 2815 inode 1797458 errors 2001, no inode item, link count wrong
	unresolved ref dir 1767611 index 8 namelen 2 name mm filetype 2 errors 4, no inode ref
root 2815 inode 1797459 errors 2001, no inode item, link count wrong
	unresolved ref dir 1767611 index 9 namelen 3 name net filetype 2 errors 4, no inode ref
	unresolved ref dir 1797454 index 23 namelen 8 name firmware filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 24 namelen 3 name fmc filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 25 namelen 4 name fpga filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 26 namelen 3 name fsi filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 27 namelen 4 name gnss filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 28 namelen 4 name gpio filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 29 namelen 3 name gpu filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 30 namelen 3 name hid filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 31 namelen 3 name hsi filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 32 namelen 2 name hv filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 33 namelen 5 name hwmon filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 34 namelen 9 name hwtracing filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 35 namelen 3 name i2c filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 36 namelen 3 name i3c filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 37 namelen 3 name iio filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 38 namelen 10 name infiniband filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 39 namelen 5 name input filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 40 namelen 12 name interconnect filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 41 namelen 5 name iommu filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 42 namelen 5 name ipack filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 43 namelen 7 name irqchip filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 44 namelen 4 name isdn filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 45 namelen 4 name leds filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 46 namelen 8 name lightnvm filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 47 namelen 9 name macintosh filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 48 namelen 7 name mailbox filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 49 namelen 3 name mcb filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 50 namelen 2 name md filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 51 namelen 5 name media filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 52 namelen 8 name memstick filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 53 namelen 7 name message filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 54 namelen 3 name mfd filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 55 namelen 4 name misc filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 56 namelen 3 name mmc filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 57 namelen 3 name mtd filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 58 namelen 3 name mux filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 59 namelen 3 name net filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 60 namelen 3 name nfc filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 61 namelen 3 name ntb filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 62 namelen 6 name nvdimm filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 63 namelen 4 name nvme filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 64 namelen 5 name nvmem filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 65 namelen 2 name of filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 66 namelen 7 name parport filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 67 namelen 3 name pci filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 68 namelen 6 name pcmcia filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 69 namelen 3 name phy filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 70 namelen 7 name pinctrl filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 71 namelen 8 name platform filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 72 namelen 5 name power filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 73 namelen 8 name powercap filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 74 namelen 3 name pps filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 75 namelen 3 name ptp filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 76 namelen 3 name pwm filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 77 namelen 7 name rapidio filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 78 namelen 9 name regulator filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 79 namelen 10 name remoteproc filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 80 namelen 5 name reset filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 81 namelen 5 name rpmsg filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 82 namelen 3 name rtc filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 83 namelen 4 name scsi filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 84 namelen 4 name siox filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 85 namelen 7 name slimbus filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 86 namelen 3 name soc filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 87 namelen 9 name soundwire filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 88 namelen 3 name spi filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 89 namelen 4 name spmi filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 90 namelen 3 name ssb filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 91 namelen 7 name staging filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 92 namelen 6 name target filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 93 namelen 7 name thermal filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 94 namelen 11 name thunderbolt filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 95 namelen 3 name tty filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 96 namelen 3 name uio filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 97 namelen 3 name usb filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 98 namelen 3 name uwb filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 99 namelen 4 name vfio filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 100 namelen 5 name vhost filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 101 namelen 5 name video filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 102 namelen 4 name virt filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 103 namelen 6 name virtio filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 104 namelen 8 name visorbus filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 105 namelen 3 name vme filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 106 namelen 2 name w1 filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 107 namelen 8 name watchdog filetype 2 errors 2, no dir index
	unresolved ref dir 1797454 index 108 namelen 3 name xen filetype 2 errors 2, no dir index
root 2815 inode 1802706 errors 2000, link count wrong
	unresolved ref dir 1797455 index 2 namelen 2 name 9p filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802707 errors 2000, link count wrong
	unresolved ref dir 1797455 index 3 namelen 4 name affs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802708 errors 2000, link count wrong
	unresolved ref dir 1797455 index 4 namelen 3 name afs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802709 errors 2000, link count wrong
	unresolved ref dir 1797455 index 5 namelen 4 name befs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802710 errors 2000, link count wrong
	unresolved ref dir 1797455 index 6 namelen 5 name btrfs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802711 errors 2000, link count wrong
	unresolved ref dir 1797455 index 7 namelen 10 name cachefiles filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802712 errors 2000, link count wrong
	unresolved ref dir 1797455 index 8 namelen 4 name ceph filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802713 errors 2000, link count wrong
	unresolved ref dir 1797455 index 9 namelen 4 name cifs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802714 errors 2000, link count wrong
	unresolved ref dir 1797455 index 10 namelen 4 name coda filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802715 errors 2000, link count wrong
	unresolved ref dir 1797455 index 11 namelen 6 name cramfs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802716 errors 2000, link count wrong
	unresolved ref dir 1797455 index 12 namelen 3 name dlm filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802717 errors 2000, link count wrong
	unresolved ref dir 1797455 index 13 namelen 8 name ecryptfs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802718 errors 2000, link count wrong
	unresolved ref dir 1797455 index 14 namelen 4 name ext4 filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802719 errors 2000, link count wrong
	unresolved ref dir 1797455 index 15 namelen 4 name f2fs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802720 errors 2000, link count wrong
	unresolved ref dir 1797455 index 16 namelen 3 name fat filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802721 errors 2000, link count wrong
	unresolved ref dir 1797455 index 17 namelen 7 name fscache filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802722 errors 2000, link count wrong
	unresolved ref dir 1797455 index 18 namelen 4 name fuse filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802723 errors 2000, link count wrong
	unresolved ref dir 1797455 index 19 namelen 4 name gfs2 filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802724 errors 2000, link count wrong
	unresolved ref dir 1797455 index 20 namelen 3 name hfs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802725 errors 2000, link count wrong
	unresolved ref dir 1797455 index 21 namelen 7 name hfsplus filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802726 errors 2000, link count wrong
	unresolved ref dir 1797455 index 22 namelen 5 name isofs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802727 errors 2000, link count wrong
	unresolved ref dir 1797455 index 23 namelen 4 name jbd2 filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802728 errors 2000, link count wrong
	unresolved ref dir 1797455 index 24 namelen 5 name jffs2 filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802729 errors 2000, link count wrong
	unresolved ref dir 1797455 index 25 namelen 3 name jfs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802730 errors 2000, link count wrong
	unresolved ref dir 1797455 index 26 namelen 5 name lockd filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802731 errors 2000, link count wrong
	unresolved ref dir 1797455 index 27 namelen 5 name minix filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802732 errors 2000, link count wrong
	unresolved ref dir 1797455 index 28 namelen 3 name nfs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802733 errors 2000, link count wrong
	unresolved ref dir 1797455 index 29 namelen 10 name nfs_common filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802734 errors 2000, link count wrong
	unresolved ref dir 1797455 index 30 namelen 4 name nfsd filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802735 errors 2000, link count wrong
	unresolved ref dir 1797455 index 31 namelen 6 name nilfs2 filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802736 errors 2000, link count wrong
	unresolved ref dir 1797455 index 32 namelen 3 name nls filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802737 errors 2000, link count wrong
	unresolved ref dir 1797455 index 33 namelen 4 name ntfs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802738 errors 2000, link count wrong
	unresolved ref dir 1797455 index 34 namelen 5 name ocfs2 filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802739 errors 2000, link count wrong
	unresolved ref dir 1797455 index 35 namelen 4 name omfs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802740 errors 2000, link count wrong
	unresolved ref dir 1797455 index 36 namelen 8 name orangefs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802741 errors 2000, link count wrong
	unresolved ref dir 1797455 index 37 namelen 9 name overlayfs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802742 errors 2000, link count wrong
	unresolved ref dir 1797455 index 38 namelen 5 name quota filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802743 errors 2000, link count wrong
	unresolved ref dir 1797455 index 39 namelen 8 name reiserfs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802744 errors 2000, link count wrong
	unresolved ref dir 1797455 index 40 namelen 5 name romfs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802745 errors 2000, link count wrong
	unresolved ref dir 1797455 index 41 namelen 8 name squashfs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802746 errors 2000, link count wrong
	unresolved ref dir 1797455 index 42 namelen 5 name ubifs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802748 errors 2000, link count wrong
	unresolved ref dir 1797455 index 43 namelen 3 name udf filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802749 errors 2000, link count wrong
	unresolved ref dir 1797455 index 44 namelen 3 name ufs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802750 errors 2000, link count wrong
	unresolved ref dir 1797455 index 45 namelen 3 name xfs filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802915 errors 2000, link count wrong
	unresolved ref dir 1797456 index 2 namelen 3 name 842 filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802916 errors 2000, link count wrong
	unresolved ref dir 1797456 index 3 namelen 3 name lz4 filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802917 errors 2000, link count wrong
	unresolved ref dir 1797456 index 4 namelen 4 name math filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802918 errors 2000, link count wrong
	unresolved ref dir 1797456 index 5 namelen 5 name raid6 filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1802945 errors 2000, link count wrong
	unresolved ref dir 1797459 index 2 namelen 7 name 6lowpan filetype 2 errors 1, no dir item
root 2815 inode 1802946 errors 2000, link count wrong
	unresolved ref dir 1797459 index 3 namelen 3 name 802 filetype 2 errors 1, no dir item
root 2815 inode 1802951 errors 2000, link count wrong
	unresolved ref dir 1797459 index 8 namelen 10 name batman-adv filetype 2 errors 1, no dir item
root 2815 inode 1802952 errors 2000, link count wrong
	unresolved ref dir 1797459 index 9 namelen 9 name bluetooth filetype 2 errors 1, no dir item
root 2815 inode 1802953 errors 2000, link count wrong
	unresolved ref dir 1797459 index 10 namelen 6 name bridge filetype 2 errors 1, no dir item
root 2815 inode 1802956 errors 2000, link count wrong
	unresolved ref dir 1797459 index 13 namelen 4 name ceph filetype 2 errors 1, no dir item
root 2815 inode 1802958 errors 2000, link count wrong
	unresolved ref dir 1797459 index 15 namelen 4 name dccp filetype 2 errors 1, no dir item
root 2815 inode 1802961 errors 2000, link count wrong
	unresolved ref dir 1797459 index 18 namelen 3 name hsr filetype 2 errors 1, no dir item
root 2815 inode 1802962 errors 2000, link count wrong
	unresolved ref dir 1797459 index 19 namelen 10 name ieee802154 filetype 2 errors 1, no dir item
root 2815 inode 1802963 errors 2000, link count wrong
	unresolved ref dir 1797459 index 20 namelen 3 name ife filetype 2 errors 1, no dir item
root 2815 inode 1802964 errors 2000, link count wrong
	unresolved ref dir 1797459 index 21 namelen 4 name ipv4 filetype 2 errors 1, no dir item
root 2815 inode 1802967 errors 2000, link count wrong
	unresolved ref dir 1797459 index 24 namelen 3 name key filetype 2 errors 1, no dir item
root 2815 inode 1802971 errors 2000, link count wrong
	unresolved ref dir 1797459 index 28 namelen 9 name mac802154 filetype 2 errors 1, no dir item
root 2815 inode 1802972 errors 2000, link count wrong
	unresolved ref dir 1797459 index 29 namelen 4 name mpls filetype 2 errors 1, no dir item
root 2815 inode 1802973 errors 2000, link count wrong
	unresolved ref dir 1797459 index 30 namelen 9 name netfilter filetype 2 errors 1, no dir item
root 2815 inode 1802974 errors 2000, link count wrong
	unresolved ref dir 1797459 index 31 namelen 7 name netlink filetype 2 errors 1, no dir item
root 2815 inode 1802975 errors 2000, link count wrong
	unresolved ref dir 1797459 index 32 namelen 6 name netrom filetype 2 errors 1, no dir item
root 2815 inode 1802976 errors 2000, link count wrong
	unresolved ref dir 1797459 index 33 namelen 3 name nfc filetype 2 errors 1, no dir item
root 2815 inode 1802977 errors 2000, link count wrong
	unresolved ref dir 1797459 index 34 namelen 3 name nsh filetype 2 errors 1, no dir item
root 2815 inode 1802978 errors 2000, link count wrong
	unresolved ref dir 1797459 index 35 namelen 11 name openvswitch filetype 2 errors 1, no dir item
root 2815 inode 1802981 errors 2000, link count wrong
	unresolved ref dir 1797459 index 38 namelen 3 name rds filetype 2 errors 1, no dir item
root 2815 inode 1802982 errors 2000, link count wrong
	unresolved ref dir 1797459 index 39 namelen 6 name rfkill filetype 2 errors 1, no dir item
root 2815 inode 1802985 errors 2000, link count wrong
	unresolved ref dir 1797459 index 42 namelen 5 name sched filetype 2 errors 1, no dir item
root 2815 inode 1802992 errors 2000, link count wrong
	unresolved ref dir 1797459 index 49 namelen 5 name wimax filetype 2 errors 1, no dir item
root 2815 inode 1802993 errors 2000, link count wrong
	unresolved ref dir 1797459 index 50 namelen 8 name wireless filetype 2 errors 1, no dir item
root 2815 inode 1803776 errors 2000, link count wrong
	unresolved ref dir 1797455 index 47 namelen 13 name mbcache.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803889 errors 2000, link count wrong
	unresolved ref dir 1797456 index 7 namelen 9 name bch.ko.xz filetype 0
errors 3, no dir item, no dir index
root 2815 inode 1803890 errors 2000, link count wrong
	unresolved ref dir 1797456 index 9 namelen 15 name crc-itu-t.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803891 errors 2000, link count wrong
	unresolved ref dir 1797456 index 11 namelen 11 name crc16.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803892 errors 2000, link count wrong
	unresolved ref dir 1797456 index 13 namelen 10 name crc4.ko.xz filetype
0 errors 3, no dir item, no dir index
root 2815 inode 1803893 errors 2000, link count wrong
	unresolved ref dir 1797456 index 15 namelen 11 name crc64.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803894 errors 2000, link count wrong
	unresolved ref dir 1797456 index 17 namelen 10 name crc7.ko.xz filetype
0 errors 3, no dir item, no dir index
root 2815 inode 1803901 errors 2000, link count wrong
	unresolved ref dir 1797456 index 19 namelen 10 name crc8.ko.xz filetype
0 errors 3, no dir item, no dir index
root 2815 inode 1803902 errors 2000, link count wrong
	unresolved ref dir 1797456 index 21 namelen 15 name libcrc32c.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803903 errors 2000, link count wrong
	unresolved ref dir 1797456 index 23 namelen 15 name lru_cache.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803904 errors 2000, link count wrong
	unresolved ref dir 1797456 index 25 namelen 12 name objagg.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803905 errors 2000, link count wrong
	unresolved ref dir 1797456 index 27 namelen 12 name parman.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803906 errors 2000, link count wrong
	unresolved ref dir 1797456 index 29 namelen 11 name ts_bm.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803907 errors 2000, link count wrong
	unresolved ref dir 1797456 index 31 namelen 12 name ts_fsm.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803908 errors 2000, link count wrong
	unresolved ref dir 1797456 index 33 namelen 12 name ts_kmp.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803915 errors 2000, link count wrong
	unresolved ref dir 1797457 index 3 namelen 13 name vboxdrv.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803916 errors 2000, link count wrong
	unresolved ref dir 1797457 index 5 namelen 16 name vboxnetadp.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803917 errors 2000, link count wrong
	unresolved ref dir 1797457 index 7 namelen 16 name vboxnetflt.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803918 errors 2000, link count wrong
	unresolved ref dir 1797457 index 9 namelen 13 name vboxpci.ko.xz
filetype 0 errors 3, no dir item, no dir index
root 2815 inode 1803919 errors 2000, link count wrong
	unresolved ref dir 1797458 index 3 namelen 21 name
hwpoison-inject.ko.xz filetype 0 errors 3, no dir item, no dir index
ERROR: errors found in fs roots
found 1681780068352 bytes used, error(s) found
total csum bytes: 1634102612
total tree bytes: 7675002880
total fs tree bytes: 5528207360
total extent tree bytes: 374669312
btree space waste bytes: 1041372744
file data blocks allocated: 2247597993984
 referenced 1803060187136
root:~#

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E
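The inode report in the `btrfs check` output above prints an error bitmask per inode (`errors 2000`) and per unresolved directory reference (`errors 1`, `errors 3`), followed by the decoded names. A minimal sketch of a parser for those lines follows, assuming the masks are printed in hex (0x2000 is bit 13, which matches "link count wrong") and assuming the bit values below correspond to btrfs-progs' `I_ERR_LINK_COUNT_WRONG`, `REF_ERR_NO_DIR_ITEM` and `REF_ERR_NO_DIR_INDEX`; only the bits that occur in this report are listed, and the helper names are hypothetical.

```python
import re

# Only the bits seen in this report. Values are an assumption based on
# btrfs-progs' original-mode checker; verify against your btrfs-progs sources.
INODE_ERR_BITS = {1 << 13: "link count wrong"}
REF_ERR_BITS = {1 << 0: "no dir item", 1 << 1: "no dir index"}

INODE_RE = re.compile(r"root (\d+) inode (\d+) errors ([0-9a-f]+)")
REF_RE = re.compile(
    r"unresolved ref dir (\d+) index (\d+) namelen \d+ name (\S+) "
    r"filetype (\d+) errors ([0-9a-f]+)"
)


def decode(mask, table):
    """Names of the known bits set in mask, plus any bits the table does not cover."""
    names = [name for bit, name in sorted(table.items()) if mask & bit]
    known = 0
    for bit in table:
        known |= bit
    if mask & ~known:
        names.append(f"unknown bits 0x{mask & ~known:x}")
    return names


def parse_report(text):
    """Parse inode and unresolved-ref lines; the masks are read as hex."""
    flat = " ".join(text.split())  # undo line wrapping introduced by the mail client
    inodes = [(int(root), int(ino), decode(int(m, 16), INODE_ERR_BITS))
              for root, ino, m in INODE_RE.findall(flat)]
    refs = [(int(d), int(idx), name, int(ft), decode(int(m, 16), REF_ERR_BITS))
            for d, idx, name, ft, m in REF_RE.findall(flat)]
    return inodes, refs


if __name__ == "__main__":
    sample = """root 2815 inode 1803919 errors 2000, link count wrong
\tunresolved ref dir 1797458 index 3 namelen 21 name
hwpoison-inject.ko.xz filetype 0 errors 3, no dir item, no dir index"""
    print(parse_report(sample))
```

Collapsing whitespace first means the parser accepts both the original single-line output and the wrapped form that email tends to produce.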


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 13:52         ` Swâmi Petaramesh
@ 2019-07-29 13:59           ` Qu Wenruo
  2019-07-29 14:01           ` Swâmi Petaramesh
  1 sibling, 0 replies; 84+ messages in thread
From: Qu Wenruo @ 2019-07-29 13:59 UTC
  To: Swâmi Petaramesh, linux-btrfs



On 2019/7/29 9:52 PM, Swâmi Petaramesh wrote:
> On 7/29/19 3:47 PM, Qu Wenruo wrote:
>> Although there are some bug fixes queued for stable, they don't look
>> related to such CoW breakage.
>>
>> Thus we need to rule out lower layer bugs to make sure it's btrfs
>> causing the problem.
>
> Please tell me how I could help.

Full history of the backup device please.

Including the mount/usage before and after 5.2 kernel.

>
> This machine was extremely stable (for years) before upgrading from
> kernel 5.1 to 5.2 so unless the hardware is failing, I can hardly
> imagine what else could be the problem...

You know, LUKS and LVM both use device mapper, which adds an extra layer
to the storage stack. It may affect how FLUSH/FUA is handled and break
the fragile CoW used in btrfs.

The device mapper code is also upgraded along with the kernel.
(Although I don't believe that's the case, we still need to rule out
all possibilities.)

Apart from that, testing btrfs without LUKS/LVM on the same backup disk
(after you have restored the needed data) would help us determine
whether the disk is to blame.

(It's possible the disk itself doesn't handle FUA/FLUSH correctly, in
which case it's just a matter of time before hitting such a problem.)

Thanks,
Qu

>
> Both FSes are BTRFS over LUKS (one using an LVM, the other not).
>
> Kind regards.
>
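Qu's point about FLUSH/FUA is about write ordering: roughly, a btrfs commit writes the new copy-on-write tree blocks, issues a FLUSH so they are on stable media, and only then writes the superblock that references them with FUA. The toy model below is a sketch of why a layer that silently drops the FLUSH can leave a superblock pointing at stale data. `ToyDisk`, `commit`, `check`, `scenario` and the address 4096 are all hypothetical; this is not device-mapper or btrfs code, and it only illustrates the failure mode, not what happened on this particular disk.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    """Hypothetical disk with a volatile write-back cache (not real driver code)."""
    honors_flush: bool = True
    media: dict = field(default_factory=dict)  # survives an unplug
    cache: dict = field(default_factory=dict)  # lost on unplug unless flushed

    def write(self, addr, block):
        # Ordinary write: lands in the volatile cache first.
        self.cache[addr] = block

    def flush(self):
        # FLUSH: persist everything cached so far (a buggy layer may drop it).
        if self.honors_flush:
            self.media.update(self.cache)
            self.cache.clear()

    def write_fua(self, addr, block):
        # FUA: this single write goes straight to stable media.
        self.media[addr] = block

    def unplug(self):
        # Anything still only in the cache is lost.
        self.cache.clear()


def commit(disk, generation, addr):
    """Simplified CoW commit: new tree block, barrier, then the superblock."""
    disk.write(addr, {"generation": generation})
    disk.flush()
    disk.write_fua("super", {"root": addr, "generation": generation})


def check(disk):
    """Compare the generation the superblock expects with what is on the media."""
    sb = disk.media["super"]
    found = disk.media[sb["root"]]["generation"]
    if found != sb["generation"]:
        return (f"parent transid verify failed on {sb['root']} "
                f"wanted {sb['generation']} found {found}")
    return "ok"


def scenario(honors_flush):
    disk = ToyDisk(honors_flush=honors_flush)
    # A freed block from generation 7499 sits where CoW allocates the new one.
    disk.media[4096] = {"generation": 7499}
    commit(disk, 7684, 4096)
    disk.unplug()
    return check(disk)


if __name__ == "__main__":
    print(scenario(honors_flush=True))
    print(scenario(honors_flush=False))
```

With the FLUSH honored the check passes; with it dropped, the FUA superblock survives the unplug while the tree block it references does not, which produces the same shape of message as the one in this thread.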


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 13:47       ` Qu Wenruo
@ 2019-07-29 13:52         ` Swâmi Petaramesh
  2019-07-29 13:59           ` Qu Wenruo
  2019-07-29 14:01           ` Swâmi Petaramesh
  0 siblings, 2 replies; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 13:52 UTC
  To: Qu Wenruo, linux-btrfs, swami

On 7/29/19 3:47 PM, Qu Wenruo wrote:
> Although there are some bug fixes queued for stable, they don't look
> related to such CoW breakage.
>
> Thus we need to rule out lower layer bugs to make sure it's btrfs
> causing the problem.

Please tell me how I could help.

This machine was extremely stable (for years) before upgrading from
kernel 5.1 to 5.2 so unless the hardware is failing, I can hardly
imagine what else could be the problem...

Both FSes are BTRFS over LUKS (one using an LVM, the other not).

Kind regards.

-- 
ॐ

Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E



* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 13:42     ` Swâmi Petaramesh
@ 2019-07-29 13:47       ` Qu Wenruo
  2019-07-29 13:52         ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Qu Wenruo @ 2019-07-29 13:47 UTC
  To: Swâmi Petaramesh, linux-btrfs



On 2019/7/29 9:42 PM, Swâmi Petaramesh wrote:
>
> On 29/07/2019 at 15:35, Qu Wenruo wrote:
>> This means btrfs metadata CoW is broken.
>
> I remember having had exactly the same kind of messages on the main
> machine's SSD a week ago (before I had to recreate, back up and restore it).
>
>> Did the system go through a power loss?
>
> No, *THIS* filesystem is an external backup. Its typical use is plug,
> back up, umount (properly), unplug.
>
> So there's very little reason such a filesystem would end up broken.
>
>> If not, then it's btrfs or a lower layer causing the problem.
>>
>> Did you have any btrfs without LUKS?
>
> Not much...
>
> Here's the rest of the (still running) btrfs check:
>
> # btrfs check /dev/mapper/luks-UUID
> Opening filesystem to check...
> Checking filesystem on /dev/mapper/luks-UUID
> UUID: ==something==
> [1/7] checking root items
> [2/7] checking extents
> parent transid verify failed on 2137144377344 wanted 7684 found 7499
> parent transid verify failed on 2137144377344 wanted 7684 found 7499
> parent transid verify failed on 2137144377344 wanted 7684 found 7499
> Ignoring transid failure
> leaf parent key incorrect 2137144377344
> bad block 2137144377344

All these are the same problem: one tree block didn't go through CoW
and overwrote some existing data.
>
> Uh I'm at a loss...

Although there are some bug fixes queued for stable, they don't look
related to such CoW breakage.

Thus we need to rule out lower layer bugs to make sure it's btrfs
causing the problem.

Thanks,
Qu

>
> ॐ
>


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 13:39   ` Lionel Bouton
@ 2019-07-29 13:45     ` Swâmi Petaramesh
       [not found]       ` <d8c571e4-718e-1241-66ab-176d091d6b48@bouton.name>
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 13:45 UTC
  To: Lionel Bouton, linux-btrfs

On 29/07/2019 at 15:39, Lionel Bouton wrote:
> My laptop rebooted without problems. Note: my system uses an NVMe
> device, not a SATA SSD.

Anyway, in computer science, "works here now" has never been, and will
never be, proof that something doesn't have deadly bugs waiting to bite...

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 13:35   ` Qu Wenruo
@ 2019-07-29 13:42     ` Swâmi Petaramesh
  2019-07-29 13:47       ` Qu Wenruo
  0 siblings, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 13:42 UTC
  To: Qu Wenruo, linux-btrfs


On 29/07/2019 at 15:35, Qu Wenruo wrote:
> This means btrfs metadata CoW is broken.

I remember having had exactly the same kind of messages on the main
machine's SSD a week ago (before I had to recreate, back up and restore it).

> Did the system go through a power loss?

No, *THIS* filesystem is an external backup. Its typical use is plug,
back up, umount (properly), unplug.

So there's very little reason such a filesystem would end up broken.

> If not, then it's btrfs or a lower layer causing the problem.
>
> Did you have any btrfs without LUKS?

Not much...

Here's the rest of the (still running) btrfs check:

# btrfs check /dev/mapper/luks-UUID
Opening filesystem to check...
Checking filesystem on /dev/mapper/luks-UUID
UUID: ==something==
[1/7] checking root items
[2/7] checking extents
parent transid verify failed on 2137144377344 wanted 7684 found 7499
parent transid verify failed on 2137144377344 wanted 7684 found 7499
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
leaf parent key incorrect 2137144377344
bad block 2137144377344
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
Wrong key of child node/leaf, wanted: (1797454, 96, 23), have: (18446744073709551606, 128, 2538887163904)
Wrong generation of child node/leaf, wanted: 7499, have: 7684


Uh I'm at a loss...

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E
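The "Wrong key of child node/leaf" line above prints btrfs keys as (objectid, type, offset) triples. Special objectids are defined as small negative numbers stored in an unsigned 64-bit field, so 18446744073709551606 is 2^64 - 10, i.e. -10, which is `BTRFS_EXTENT_CSUM_OBJECTID`; type 96 is `BTRFS_DIR_INDEX_KEY` and type 128 is `BTRFS_EXTENT_CSUM_KEY`. Read that way, the checker expected a directory-index item (which lives in a subvolume tree) but found a block that starts with an extent-checksum item (which belongs to the checksum tree), consistent with a tree block having been overwritten by unrelated metadata. The sketch below does that decoding; the constant tables are an assumption taken from the btrfs on-disk format headers and list only the values seen here.

```python
# Only the values seen in this report; assumed from the btrfs on-disk
# format headers (btrfs_tree.h). Check them against your kernel headers.
OBJECTIDS = {-10: "BTRFS_EXTENT_CSUM_OBJECTID"}
KEY_TYPES = {96: "BTRFS_DIR_INDEX_KEY", 128: "BTRFS_EXTENT_CSUM_KEY"}


def as_signed64(value):
    """Reinterpret an unsigned 64-bit objectid as signed (btrfs defines e.g. -10ULL)."""
    return value - (1 << 64) if value >= (1 << 63) else value


def describe_key(objectid, key_type, offset):
    """Render an (objectid, type, offset) key with symbolic names where known."""
    obj = OBJECTIDS.get(as_signed64(objectid), str(objectid))
    typ = KEY_TYPES.get(key_type, f"type {key_type}")
    return f"({obj}, {typ}, {offset})"


if __name__ == "__main__":
    print("wanted:", describe_key(1797454, 96, 23))
    print("have:  ", describe_key(18446744073709551606, 128, 2538887163904))
```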


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
       [not found] ` <f8b08aec-2c43-9545-906e-7e41953d9ed4@bouton.name>
  2019-07-29 13:35   ` Swâmi Petaramesh
@ 2019-07-29 13:39   ` Lionel Bouton
  2019-07-29 13:45     ` Swâmi Petaramesh
  1 sibling, 1 reply; 84+ messages in thread
From: Lionel Bouton @ 2019-07-29 13:39 UTC
  To: Swâmi Petaramesh, linux-btrfs

On 29/07/2019 at 15:29, Lionel Bouton wrote:
> Hi,
>
> On 29/07/2019 at 14:32, Swâmi Petaramesh wrote:
>> Hi list,
>>
>> I've been using BTRFS for years on a whole lot of machines with complex
>> configurations (in RAID, over LUKS, with bcache etc.) and was happy I
>> never lost ONE byte.
>>
>> Well, since I upgraded my main laptop to Arch Linux kernel 5.2, I
>> IMMEDIATELY got my laptop's SSD BTRFS FS (over LUKS) corrupted, and had
>> to rebuild and restore it.
>
> For another reference point, my personal laptop reports 17 days of
> uptime on 5.2.0-arch2-1-ARCH.
> I use BTRFS both over LUKS over LVM and directly over LVM. The system
> is suspended during the night and running otherwise (probably more
> than 16 hours a day).
>
> I don't have any problem so far. I'll reboot right away and reply to
> this message (if you see it and not a reply shortly after, there might
> be a bug affecting me too).

My laptop rebooted without problems. Note: my system uses an NVMe
device, not a SATA SSD.

Lionel

Edit: resent in plain text (Thunderbird seems to have forgotten that
linux-btrfs refuses HTML).


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
       [not found] ` <f8b08aec-2c43-9545-906e-7e41953d9ed4@bouton.name>
@ 2019-07-29 13:35   ` Swâmi Petaramesh
  2019-07-30  8:04     ` Henk Slager
  2019-07-29 13:39   ` Lionel Bouton
  1 sibling, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 13:35 UTC
  To: Lionel Bouton, linux-btrfs

On 7/29/19 3:29 PM, Lionel Bouton wrote:
> For another reference point, my personal laptop reports 17 days of
> uptime on 5.2.0-arch2-1-ARCH.
> I use BTRFS both over LUKS over LVM and directly over LVM. The system
> is suspended during the night and running otherwise (probably more
> than 16 hours a day).
>
> I don't have any problem so far. I'll reboot right away and reply to
> this message (if you see it and not a reply shortly after, there might
> be a bug affecting me too).
>
Well, I had upgraded 3 machines to 5.2 (one Arch and 2 Manjaros).

The Arch broke 2 BTRFS filesystems residing on 2 different disks that
had been perfectly reliable until then.

The 2 Manjaros have not shown any trouble so far, but I use them very
little, and I preferred to revert to 5.1 in a hurry before breaking
my backup machines as badly as my main machine :-/

My Arch first broke its main BTRFS FS, and I told myself it was years
old, so maybe it had some old corruption that scrub had not detected...

But the external HD that just broke is less than 6 months old, was
formatted with a 4.20 or newer kernel, and is used purely for backups.
So this I don't understand.

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E




* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 13:02 ` Swâmi Petaramesh
@ 2019-07-29 13:35   ` Qu Wenruo
  2019-07-29 13:42     ` Swâmi Petaramesh
  0 siblings, 1 reply; 84+ messages in thread
From: Qu Wenruo @ 2019-07-29 13:35 UTC
  To: Swâmi Petaramesh, linux-btrfs



On 2019/7/29 9:02 PM, Swâmi Petaramesh wrote:
> On 29/07/2019 at 14:32, Swâmi Petaramesh wrote:
>>
>> Today, same machine, but this time my external BTRFS (over LUKS) backup
>> USB HDD got corrupted in the same way.
>
> btrfs check reports as follows:
>
> # btrfs check /dev/mapper/luks-UUID
> Opening filesystem to check...
> Checking filesystem on /dev/mapper/luks-UUID
> UUID: ---Something---
> [1/7] checking root items
> [2/7] checking extents
> parent transid verify failed on 2137144377344 wanted 7684 found 7499
> parent transid verify failed on 2137144377344 wanted 7684 found 7499
> parent transid verify failed on 2137144377344 wanted 7684 found 7499

This means btrfs metadata CoW is broken.

Did the system go through a power loss?
If not, then it's btrfs or a lower layer causing the problem.

Did you have any btrfs without LUKS?

Thanks,
Qu


> Ignoring transid failure
> leaf parent key incorrect 2137144377344
> bad block 2137144377344
> ERROR: errors found in extent allocation tree or chunk allocation
> [3/7] checking free space cache
> [4/7] checking fs roots
>
> (Still running)
>
> ॐ
>
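Qu's reading of "parent transid verify failed on X wanted A found B" follows from how btrfs links tree blocks: each pointer in a parent node records both the child's address and the generation (transaction id) in which that child was written, and when the child is read its header generation must match. "wanted" is the parent's recorded generation, "found" is the child header's; here the block at 2137144377344 is an older version (7499) than its parent expects (7684). A minimal sketch of that comparison follows; the dataclasses are simplified stand-ins whose field names loosely follow `struct btrfs_key_ptr` and the tree-block header, not the real kernel structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyPtr:
    """Simplified parent-node slot: child address plus the generation that wrote it."""
    blockptr: int
    generation: int


@dataclass
class BlockHeader:
    """Simplified header of the child tree block as actually read from disk."""
    bytenr: int
    generation: int


def verify_parent_transid(ptr: KeyPtr, child: BlockHeader) -> Optional[str]:
    """Return an error string if the child is not the version the parent expects."""
    if child.generation == ptr.generation:
        return None
    return (f"parent transid verify failed on {child.bytenr} "
            f"wanted {ptr.generation} found {child.generation}")


if __name__ == "__main__":
    ptr = KeyPtr(blockptr=2137144377344, generation=7684)
    print(verify_parent_transid(ptr, BlockHeader(2137144377344, 7499)))
```

A mismatch in either direction means the parent and child were not written as one consistent CoW update, which is why it points at broken metadata CoW or at a lower layer losing or reordering writes.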


* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-07-29 12:32 Swâmi Petaramesh
@ 2019-07-29 13:02 ` Swâmi Petaramesh
  2019-07-29 13:35   ` Qu Wenruo
       [not found] ` <f8b08aec-2c43-9545-906e-7e41953d9ed4@bouton.name>
  1 sibling, 1 reply; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 13:02 UTC
  To: linux-btrfs

On 29/07/2019 at 14:32, Swâmi Petaramesh wrote:
> 
> Today, same machine, but this time my external BTRFS (over LUKS) backup
> USB HDD got corrupted in the same way.

btrfs check reports as follows:

# btrfs check /dev/mapper/luks-UUID
Opening filesystem to check...
Checking filesystem on /dev/mapper/luks-UUID
UUID: ---Something---
[1/7] checking root items
[2/7] checking extents
parent transid verify failed on 2137144377344 wanted 7684 found 7499
parent transid verify failed on 2137144377344 wanted 7684 found 7499
parent transid verify failed on 2137144377344 wanted 7684 found 7499
Ignoring transid failure
leaf parent key incorrect 2137144377344
bad block 2137144377344
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots

(Still running)

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E


* Massive filesystem corruption since kernel 5.2 (ARCH)
@ 2019-07-29 12:32 Swâmi Petaramesh
  2019-07-29 13:02 ` Swâmi Petaramesh
       [not found] ` <f8b08aec-2c43-9545-906e-7e41953d9ed4@bouton.name>
  0 siblings, 2 replies; 84+ messages in thread
From: Swâmi Petaramesh @ 2019-07-29 12:32 UTC
  To: linux-btrfs, swami

Hi list,

I've been using BTRFS for years on a whole lot of machines with complex
configurations (in RAID, over LUKS, with bcache etc.) and was happy I
never lost ONE byte.

Well, since I upgraded my main laptop to Arch Linux kernel 5.2, I
IMMEDIATELY got my laptop's SSD BTRFS FS (over LUKS) corrupted, and had
to rebuild and restore it.

Today, same machine, but this time my external BTRFS (over LUKS) backup
USB HDD got corrupted in the same way.

The first logged messages look the same in both cases: "Parent
transid verify failed on (huge number) wanted 7684 found 7499."

(These numbers are from this time; the earlier ones were similar.)

So I'm under the impression that either my laptop's RAM is dying, or
something is *very* broken in BTRFS in kernel 5.2.

Any hints or advice would be very much appreciated.

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E





Thread overview: 84+ messages
2019-08-24 17:44 Massive filesystem corruption since kernel 5.2 (ARCH) Christoph Anton Mitterer
2019-08-25 10:00 ` Swâmi Petaramesh
2019-08-27  0:00   ` Christoph Anton Mitterer
2019-08-27  5:06     ` Swâmi Petaramesh
2019-08-27  6:13       ` Swâmi Petaramesh
2019-08-27  6:21         ` Qu Wenruo
2019-08-27  6:34           ` Swâmi Petaramesh
2019-08-27  6:52             ` Qu Wenruo
2019-08-27  9:14               ` Swâmi Petaramesh
2019-08-27 12:40                 ` Hans van Kranenburg
2019-08-29 12:46                   ` Oliver Freyermuth
2019-08-29 13:08                     ` Christoph Anton Mitterer
2019-08-29 13:09                     ` Swâmi Petaramesh
2019-08-29 13:11                     ` Qu Wenruo
2019-08-29 13:17                       ` Oliver Freyermuth
2019-08-29 17:40                         ` Oliver Freyermuth
     [not found]               ` <-z770dp-y45icx-naspi1dhhf7m-b1jjq3853x22lswnef-p5g363n8kd2f-vdijlg-jk4z4q-raec5-em5djr-et1h33i4xib8jxzw1zxyza74-miq3zn-e4azxaaeyo3abtrf6zj8nb18-hbhrrmnr1ww1.1566894946135@email.android.com>
2019-08-27 12:34                 ` Re : " Qu Wenruo
2019-08-27 10:59           ` Swâmi Petaramesh
2019-08-27 11:11             ` Alberto Bursi
2019-08-27 11:20               ` Swâmi Petaramesh
2019-08-27 11:29                 ` Alberto Bursi
2019-08-27 11:45                   ` Swâmi Petaramesh
2019-08-27 17:49               ` Swâmi Petaramesh
2019-08-27 22:10               ` Chris Murphy
2019-08-27 12:52 ` Michal Soltys
2019-09-12  7:50 ` Filipe Manana
2019-09-12  8:24   ` James Harvey
2019-09-12  9:06     ` Filipe Manana
2019-09-12  9:09     ` Holger Hoffstätte
2019-09-12 10:53     ` Swâmi Petaramesh
2019-09-12 12:58       ` Christoph Anton Mitterer
2019-10-14  4:00         ` Nicholas D Steeves
2019-09-12  8:48   ` Swâmi Petaramesh
2019-09-12 13:09   ` Christoph Anton Mitterer
2019-09-12 14:28     ` Filipe Manana
2019-09-12 14:39       ` Christoph Anton Mitterer
2019-09-12 14:57         ` Swâmi Petaramesh
2019-09-12 16:21           ` Zdenek Kaspar
2019-09-12 18:52             ` Swâmi Petaramesh
2019-09-13 18:50       ` Pete
     [not found]         ` <CACzgC9gvhGwyQAKm5J1smZZjim-ecEix62ZQCY-wwJYVzMmJ3Q@mail.gmail.com>
2019-10-14  2:07           ` Adam Bahe
2019-10-14  2:19             ` Qu Wenruo
2019-10-14 17:54             ` Chris Murphy
  -- strict thread matches above, loose matches on Subject: below --
2019-07-29 12:32 Swâmi Petaramesh
2019-07-29 13:02 ` Swâmi Petaramesh
2019-07-29 13:35   ` Qu Wenruo
2019-07-29 13:42     ` Swâmi Petaramesh
2019-07-29 13:47       ` Qu Wenruo
2019-07-29 13:52         ` Swâmi Petaramesh
2019-07-29 13:59           ` Qu Wenruo
2019-07-29 14:01           ` Swâmi Petaramesh
2019-07-29 14:08             ` Qu Wenruo
2019-07-29 14:21               ` Swâmi Petaramesh
2019-07-29 14:27                 ` Qu Wenruo
2019-07-29 14:34                   ` Swâmi Petaramesh
2019-07-29 14:40                     ` Qu Wenruo
2019-07-29 14:46                       ` Swâmi Petaramesh
2019-07-29 14:51                         ` Qu Wenruo
2019-07-29 14:55                           ` Swâmi Petaramesh
2019-07-29 15:05                             ` Swâmi Petaramesh
2019-07-29 19:20                               ` Chris Murphy
2019-07-30  6:47                                 ` Swâmi Petaramesh
2019-07-29 19:10                       ` Chris Murphy
2019-07-30  8:09                         ` Swâmi Petaramesh
2019-07-30 20:15                           ` Chris Murphy
2019-07-30 22:44                             ` Swâmi Petaramesh
2019-07-30 23:13                               ` Graham Cobb
2019-07-30 23:24                                 ` Chris Murphy
     [not found] ` <f8b08aec-2c43-9545-906e-7e41953d9ed4@bouton.name>
2019-07-29 13:35   ` Swâmi Petaramesh
2019-07-30  8:04     ` Henk Slager
2019-07-30  8:17       ` Swâmi Petaramesh
2019-07-29 13:39   ` Lionel Bouton
2019-07-29 13:45     ` Swâmi Petaramesh
     [not found]       ` <d8c571e4-718e-1241-66ab-176d091d6b48@bouton.name>
2019-07-29 14:04         ` Swâmi Petaramesh
2019-08-01  4:50           ` Anand Jain
2019-08-01  6:07             ` Swâmi Petaramesh
2019-08-01  6:36               ` Qu Wenruo
2019-08-01  8:07                 ` Swâmi Petaramesh
2019-08-01  8:43                   ` Qu Wenruo
2019-08-01 13:46                     ` Anand Jain
2019-08-01 18:56                       ` Swâmi Petaramesh
2019-08-08  8:46                         ` Qu Wenruo
2019-08-08  9:55                           ` Swâmi Petaramesh
2019-08-08 10:12                             ` Qu Wenruo
