linux-btrfs.vger.kernel.org archive mirror
* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
@ 2019-08-24 17:44 Christoph Anton Mitterer
  2019-08-25 10:00 ` Swâmi Petaramesh
                   ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Christoph Anton Mitterer @ 2019-08-24 17:44 UTC (permalink / raw)
  To: linux-btrfs

Hey.

Anything new about the issue described here:
https://www.spinics.net/lists/linux-btrfs/msg91046.html

It was said that it might be a regression in 5.2 actually and not a
hardware thing... so I just wonder whether I can safely move to 5.2?


Cheers,
Chris.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-24 17:44 Massive filesystem corruption since kernel 5.2 (ARCH) Christoph Anton Mitterer
@ 2019-08-25 10:00 ` Swâmi Petaramesh
  2019-08-27  0:00   ` Christoph Anton Mitterer
  2019-08-27 12:52 ` Michal Soltys
  2019-09-12  7:50 ` Filipe Manana
  2 siblings, 1 reply; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-08-25 10:00 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs

On 24/08/2019 at 19:44, Christoph Anton Mitterer wrote:
> 
> Anything new about the issue described here:
> https://www.spinics.net/lists/linux-btrfs/msg91046.html
> 
> It was said that it might be a regression in 5.2 actually and not a
> hardware thing... so I just wonder whether I can safely move to 5.2?

Hello,

I have re-upgraded the system on which I had the corruption about 10
days ago, and it is now running kernel 5.2.9-arch1-1-ARCH.

I haven't seen any filesystem issue since, but I haven't used the system
very much yet.

I was planning to wait a little more before reporting whether it looked
stable or not on my machine.

So I can only say that I have been able to use the system, upgraded
again to 5.2, for a few days; at least, filesystem corruption did not
immediately follow the kernel upgrade, nor the number of system reboots
and snapshot creations/deletions since.

Hope this helps a bit.

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-25 10:00 ` Swâmi Petaramesh
@ 2019-08-27  0:00   ` Christoph Anton Mitterer
  2019-08-27  5:06     ` Swâmi Petaramesh
  0 siblings, 1 reply; 43+ messages in thread
From: Christoph Anton Mitterer @ 2019-08-27  0:00 UTC (permalink / raw)
  To: Swâmi Petaramesh, linux-btrfs

Hey.


On Sun, 2019-08-25 at 12:00 +0200, Swâmi Petaramesh wrote:
> I haven't seen any filesystem issue since, but I haven't used the
> system
> very much yet.

Hmm strange... so could it have been a hardware issue?

Cheers,
Chris.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  0:00   ` Christoph Anton Mitterer
@ 2019-08-27  5:06     ` Swâmi Petaramesh
  2019-08-27  6:13       ` Swâmi Petaramesh
  0 siblings, 1 reply; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27  5:06 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs

Hi,

On 27/08/2019 at 02:00, Christoph Anton Mitterer wrote:
> 
> On Sun, 2019-08-25 at 12:00 +0200, Swâmi Petaramesh wrote:
>> I haven't seen any filesystem issue since, but I haven't used the
>> system
>> very much yet.
> 
> Hmm strange... so could it have been a hardware issue?

I really do not think so.

This is a laptop that has been perfectly stable and reliable since 2014.

Hardware does die, but if it had, I would expect the problems to persist
or manifest again very soon. When RAM fails, it usually doesn't get
better later on.

It corrupted its internal SSD's BTRFS right after I first upgraded to
kernel 5.2, then it corrupted an external HD's BTRFS while I was trying
to back up what could still be saved...

Then the machine reverted to its usual fair and stable behaviour after I
restored it with a 5.1 kernel again.

Now the machine looks stable so far with a 5.2 kernel, albeit a more
recent Arch one: 5.2.9-arch1-1-ARCH.

I'm typing this email on it.

I cannot tell what happened, but this really doesn't feel like a
hardware issue to me...

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  5:06     ` Swâmi Petaramesh
@ 2019-08-27  6:13       ` Swâmi Petaramesh
  2019-08-27  6:21         ` Qu Wenruo
  0 siblings, 1 reply; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27  6:13 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Christoph Anton Mitterer

On 27/08/2019 at 07:06, Swâmi Petaramesh wrote:
> 
> Now the machine looks stable so far with a 5.2, albeit more recent, Arch
> kernel : 5.2.9-arch1-1-ARCH.

As my 1st machine looks fairly stable now, I just upgraded another one,
which had always been running <= 5.1 before, to 5.2.

So I keep an eye on the syslog.

Right after rebooting into 5.2 I see:

kernel: BTRFS warning (device dm-1): block group 34390147072 has wrong amount of free space
kernel: BTRFS warning (device dm-1): failed to load free space cache for block group 34390147072, rebuilding it now

So it seems that the 5.2 kernel finds, and tries to fix, minor
inconsistencies that went unnoticed in previous kernel versions?

I wonder if such things could be the cause of the corruption issues I
got: finding some inconsistencies with new checks right after a kernel
upgrade, trying to fix them, and creating a mess instead?

(This 2nd machine has been rebooted twice after this and still looks
happy...)

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  6:13       ` Swâmi Petaramesh
@ 2019-08-27  6:21         ` Qu Wenruo
  2019-08-27  6:34           ` Swâmi Petaramesh
  2019-08-27 10:59           ` Swâmi Petaramesh
  0 siblings, 2 replies; 43+ messages in thread
From: Qu Wenruo @ 2019-08-27  6:21 UTC (permalink / raw)
  To: Swâmi Petaramesh, linux-btrfs; +Cc: Christoph Anton Mitterer



On 2019/8/27 2:13 PM, Swâmi Petaramesh wrote:
> On 27/08/2019 at 07:06, Swâmi Petaramesh wrote:
>>
>> Now the machine looks stable so far with a 5.2, albeit more recent, Arch
>> kernel : 5.2.9-arch1-1-ARCH.
>
> As my 1st machine looks fairly stable now, I just upgraded to 5.2
> another one that had always been running <= 5.1 before.
>
> So I keep an eye on the syslog.
>
> Right after reboot in 5.2 I see :
>
> kernel: BTRFS warning (device dm-1): block
> group 34390147072 has wrong amount of free space
> kernel: BTRFS warning (device dm-1):
> failed to load free space cache for block group 34390147072, rebuilding
> it now
>
> So it seems that the 5.2 kernel finds and tries to fix minor
> inconsistencies that were unnoticed in previous kernel versions ?

It means something had already gone wrong under a previous kernel.

The v1 space cache uses regular-file-like internal structures to record
free space. It doesn't use btrfs' regular csum tree, but its own inline
csum to protect its content.

If the free space cache is invalid but passes its csum check, it's
completely *possible* to break metadata CoW, which leads to transid
mismatches.

You can switch to the v2 space cache, which uses metadata CoW to protect
its contents; in theory it should be a little safer than the v1 space
cache.

Or you can just disable the space cache using the nospace_cache mount
option, as it's just an optimization. It's also recommended to clear the
existing cache with "btrfs check --clear-space-cache v1".

I'd run a "btrfs check --readonly" first anyway (which also checks the
free space cache), then go nospace_cache if you're concerned.

Thanks,
Qu

>
> I wonder if such things could be the cause of the corruption issues I
> got : finding some inconsistencies with new checks right after a kernel
> upgrade, trying to fix them and creating a mess instead ?
>
> (This 2nd machine has been rebooted twice after this and still looks
> happy...)
>
> Kind regards.
>
> ॐ
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  6:21         ` Qu Wenruo
@ 2019-08-27  6:34           ` Swâmi Petaramesh
  2019-08-27  6:52             ` Qu Wenruo
  2019-08-27 10:59           ` Swâmi Petaramesh
  1 sibling, 1 reply; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27  6:34 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Christoph Anton Mitterer

Hi Qu,

On 27/08/2019 at 08:21, Qu Wenruo wrote:
> If free space cache is invalid but passes its csum check, it's
> completely *possible* to break metadata CoW, thus leads to transid mismatch.
> 
> You can go v2 space cache which uses metadata CoW to protect its space
> cache, thus in theory it should be a little safer than V1 space cache.
> 
> Or you can just disable space cache using nospace_cache mount option, as
> it's just an optimization. It's also recommended to clean existing cache
> by using "btrfs check --clear-space-cache v1".
> 
> I'd prefer to do a "btrfs check --readonly" anyway (which also checks
> free space cache), then go nospace_cache if you're concerned.

I will leave for travel shortly, so I will be unable to perform further
tests on this machine for a week, but I'll do so when I'm back.

Should I understand your statement as advice to clear the space cache
even though the kernel said it has rebuilt it, or to use the v2 space
cache generally speaking, on any machine that I use? (I had understood
it was useful only on multi-TB filesystems...)

Thanks.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  6:34           ` Swâmi Petaramesh
@ 2019-08-27  6:52             ` Qu Wenruo
  2019-08-27  9:14               ` Swâmi Petaramesh
       [not found]               ` <-z770dp-y45icx-naspi1dhhf7m-b1jjq3853x22lswnef-p5g363n8kd2f-vdijlg-jk4z4q-raec5-em5djr-et1h33i4xib8jxzw1zxyza74-miq3zn-e4azxaaeyo3abtrf6zj8nb18-hbhrrmnr1ww1.1566894946135@email.android.com>
  0 siblings, 2 replies; 43+ messages in thread
From: Qu Wenruo @ 2019-08-27  6:52 UTC (permalink / raw)
  To: Swâmi Petaramesh, linux-btrfs; +Cc: Christoph Anton Mitterer



On 2019/8/27 2:34 PM, Swâmi Petaramesh wrote:
> Hi Qu,
>
> On 27/08/2019 at 08:21, Qu Wenruo wrote:
>> If free space cache is invalid but passes its csum check, it's
>> completely *possible* to break metadata CoW, thus leads to transid mismatch.
>>
>> You can go v2 space cache which uses metadata CoW to protect its space
>> cache, thus in theory it should be a little safer than V1 space cache.
>>
>> Or you can just disable space cache using nospace_cache mount option, as
>> it's just an optimization. It's also recommended to clean existing cache
>> by using "btrfs check --clear-space-cache v1".
>>
>> I'd prefer to do a "btrfs check --readonly" anyway (which also checks
>> free space cache), then go nospace_cache if you're concerned.
>
> I will leave for travel shortly, so I will be unable to perform further
> tests on this machine for a week, but I'll do when I'm back.
>
> Should I understand your statement as an advice to clear the space cache
> even though the kernel said it has rebuilt it,

The rebuild only happens when the kernel detects such a mismatch by
comparing the block group's free space (recorded in the block group
item) with the free space cache.

If those numbers (along with other things like csum and generation)
match, we have no way to detect a wrong free space cache at all.

So if the kernel is already complaining about the free space cache,
then whatever the reason is, you'd better take extra care with it.

Another possible cause is that your extent tree is already corrupted,
so the free space number in the block group item is already incorrect.
You can only determine that by running btrfs check --readonly.

> or to use the V2 space
> cache generally speaking, on any machine that I use (I had understood it
> was useful only on multi-TB filesystems...)

10GiB is enough to create block groups large enough to utilize the free
space cache.
So you can't really escape the free space cache.

Thanks,
Qu

>
> Thanks.
>
> ॐ
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  6:52             ` Qu Wenruo
@ 2019-08-27  9:14               ` Swâmi Petaramesh
  2019-08-27 12:40                 ` Hans van Kranenburg
       [not found]               ` <-z770dp-y45icx-naspi1dhhf7m-b1jjq3853x22lswnef-p5g363n8kd2f-vdijlg-jk4z4q-raec5-em5djr-et1h33i4xib8jxzw1zxyza74-miq3zn-e4azxaaeyo3abtrf6zj8nb18-hbhrrmnr1ww1.1566894946135@email.android.com>
  1 sibling, 1 reply; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27  9:14 UTC (permalink / raw)
  To: linux-btrfs

On 8/27/19 8:52 AM, Qu Wenruo wrote:
>> or to use the V2 space
>> cache generally speaking, on any machine that I use (I had understood it
>> was useful only on multi-TB filesystems...)
> 10GiB is enough to create large enough block groups to utilize free
> space cache.
> So you can't really escape from free space cache.

I meant that I had understood that the V2 space cache was preferable to
V1 only for multi-TB filesystems.

So would you advise using the V2 space cache also for filesystems < 1 TB?

TIA.

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  6:21         ` Qu Wenruo
  2019-08-27  6:34           ` Swâmi Petaramesh
@ 2019-08-27 10:59           ` Swâmi Petaramesh
  2019-08-27 11:11             ` Alberto Bursi
  1 sibling, 1 reply; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27 10:59 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Christoph Anton Mitterer

Hi again,

On 27/08/2019 at 08:21, Qu Wenruo wrote:
> I'd prefer to do a "btrfs check --readonly" anyway (which also checks
> free space cache), then go nospace_cache if you're concerned.

Here's what I did, and here's what I got:

root@PartedMagic:~# uname -r
5.1.5-pmagic64

root@PartedMagic:~# btrfs --version
btrfs-progs v5.1

root@PartedMagic:~# btrfs check --readonly /dev/PPL_VG1/LINUX
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
block group 52677312512 has wrong amount of free space, free space cache has 266551296 block group has 266584064
failed to load free space cache for block group 52677312512
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
Opening filesystem to check...
Checking filesystem on /dev/PPL_VG1/LINUX
UUID: 25fede5a-d8c2-4c7e-9e7e-b19aad319044
found 87804731392 bytes used, no error found
total csum bytes: 79811080
total tree bytes: 2195832832
total fs tree bytes: 1992900608
total extent tree bytes: 101548032
btree space waste bytes: 380803707
file data blocks allocated: 626135830528
 referenced 124465221632

root@PartedMagic:~# mkdir /hd
root@PartedMagic:~# mount -t btrfs -o noatime,clear_cache /dev/PPL_VG1/LINUX /hd

(Waited for no disk activity and top showing no btrfs processes)

root@PartedMagic:~# umount /hd

root@PartedMagic:~# mount -t btrfs -o noatime /dev/PPL_VG1/LINUX /hd

root@PartedMagic:~# grep btrfs /proc/self/mountinfo
40 31 0:43 / /hd rw,noatime - btrfs /dev/mapper/PPL_VG1-LINUX rw,ssd,space_cache,subvolid=5,subvol=/

root@PartedMagic:~# umount /hd

root@PartedMagic:~# btrfs check --readonly /dev/PPL_VG1/LINUX
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
block group 52677312512 has wrong amount of free space, free space cache has 266551296 block group has 266584064
failed to load free space cache for block group 52677312512
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
Opening filesystem to check...
Checking filesystem on /dev/PPL_VG1/LINUX
UUID: 25fede5a-d8c2-4c7e-9e7e-b19aad319044
found 87804207104 bytes used, no error found
total csum bytes: 79811080
total tree bytes: 2195832832
total fs tree bytes: 1992900608
total extent tree bytes: 101548032
btree space waste bytes: 380804019
file data blocks allocated: 626135306240
 referenced 124464697344
root@PartedMagic:~#


So it seems that mounting with “clear_cache” did not actually clear the
cache and fix the issue?

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 10:59           ` Swâmi Petaramesh
@ 2019-08-27 11:11             ` Alberto Bursi
  2019-08-27 11:20               ` Swâmi Petaramesh
                                 ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Alberto Bursi @ 2019-08-27 11:11 UTC (permalink / raw)
  To: Swâmi Petaramesh, Qu Wenruo, linux-btrfs; +Cc: Christoph Anton Mitterer


On 27/08/19 12:59, Swâmi Petaramesh wrote:
>
>
> So it seems that mounting with “clear_cache” did not actually clear the
> cache and fix the issue ?
>
> ॐ
>

Mounting with clear_cache does not actually clear the cache unless it
is needed or modified, or something along those lines.

If you want to fully clear the cache you need to run (on an unmounted
filesystem):

btrfs check --clear-space-cache v1 /dev/sdX

or

btrfs check  --clear-space-cache v2 /dev/sdX

depending on which space cache you used (v1 is the default).


-Alberto


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 11:11             ` Alberto Bursi
@ 2019-08-27 11:20               ` Swâmi Petaramesh
  2019-08-27 11:29                 ` Alberto Bursi
  2019-08-27 17:49               ` Swâmi Petaramesh
  2019-08-27 22:10               ` Chris Murphy
  2 siblings, 1 reply; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27 11:20 UTC (permalink / raw)
  To: Alberto Bursi, Qu Wenruo, linux-btrfs; +Cc: Christoph Anton Mitterer

On 27/08/2019 at 13:11, Alberto Bursi wrote:
> 
> 
> btrfs check --clear-space-cache v1 /dev/sdX

“Bad option” (even with _ instead of -, and regardless of what I put
between the option and v1 or v2)...

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 11:20               ` Swâmi Petaramesh
@ 2019-08-27 11:29                 ` Alberto Bursi
  2019-08-27 11:45                   ` Swâmi Petaramesh
  0 siblings, 1 reply; 43+ messages in thread
From: Alberto Bursi @ 2019-08-27 11:29 UTC (permalink / raw)
  To: Swâmi Petaramesh, Qu Wenruo, linux-btrfs; +Cc: Christoph Anton Mitterer


On 27/08/19 13:20, Swâmi Petaramesh wrote:
> On 27/08/2019 at 13:11, Alberto Bursi wrote:
>>
>> btrfs check --clear-space-cache v1 /dev/sdX
> “Bad option” (even with _ instead of - and between option and v1 or V2...
>
> ॐ
>

Here on my up-to-date OpenSUSE Tumbleweed system it works.

(doing this on a random flash drive I just formatted)

hpprobook:/home/alby # btrfs check --clear-space-cache v1 /dev/sdd1
Opening filesystem to check...
Checking filesystem on /dev/sdd1
UUID: f69a86ce-7aaa-4c9d-a6dd-4c8ff092007f
Free space cache cleared

-Alberto


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 11:29                 ` Alberto Bursi
@ 2019-08-27 11:45                   ` Swâmi Petaramesh
  0 siblings, 0 replies; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27 11:45 UTC (permalink / raw)
  To: Alberto Bursi, Qu Wenruo, linux-btrfs; +Cc: Christoph Anton Mitterer

On 27/08/2019 at 13:29, Alberto Bursi wrote:
> hpprobook:/home/alby # btrfs check --clear-space-cache v1 /dev/sdd1

My mistake, I read it too fast and tried it as a mount option...

ॐ


-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Re : Massive filesystem corruption since kernel 5.2 (ARCH)
       [not found]               ` <-z770dp-y45icx-naspi1dhhf7m-b1jjq3853x22lswnef-p5g363n8kd2f-vdijlg-jk4z4q-raec5-em5djr-et1h33i4xib8jxzw1zxyza74-miq3zn-e4azxaaeyo3abtrf6zj8nb18-hbhrrmnr1ww1.1566894946135@email.android.com>
@ 2019-08-27 12:34                 ` Qu Wenruo
  0 siblings, 0 replies; 43+ messages in thread
From: Qu Wenruo @ 2019-08-27 12:34 UTC (permalink / raw)
  To: swami, linux-btrfs; +Cc: Christoph Anton Mitterer



On 2019/8/27 4:35 PM, swami@petaramesh.org wrote:
> Hi Qu,
>
> Sorry for top-posting (my phone, uh...)
>
> I meant that I had understood that the V2 space cache was preferable to
> V1 only for multi-TB filesystems.
>
> So would you advise to use V2 space cache also for filesystems < 1 TB ?

Sorry, I'm not familiar enough with the space cache/tree code to make a
recommendation based on it.

My recommendation is: disable the space cache if in doubt. Simple and
straightforward.

Thanks,
Qu
>
> TIA.
>
> Kind regards.
>
>
>
> -------- Original message --------
> Subject: Re: Massive filesystem corruption since kernel 5.2 (ARCH)
> From: Qu Wenruo
> To: Swâmi Petaramesh, linux-btrfs@vger.kernel.org
> Cc: Christoph Anton Mitterer
>
>
>     > or to use the V2 space
>     > cache generally speaking, on any machine that I use (I had
>     understood it
>     > was useful only on multi-TB filesystems...)
>
>     10GiB is enough to create large enough block groups to utilize free
>     space cache.
>     So you can't really escape from free space cache.
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27  9:14               ` Swâmi Petaramesh
@ 2019-08-27 12:40                 ` Hans van Kranenburg
  2019-08-29 12:46                   ` Oliver Freyermuth
  0 siblings, 1 reply; 43+ messages in thread
From: Hans van Kranenburg @ 2019-08-27 12:40 UTC (permalink / raw)
  To: Swâmi Petaramesh, linux-btrfs

On 8/27/19 11:14 AM, Swâmi Petaramesh wrote:
> On 8/27/19 8:52 AM, Qu Wenruo wrote:
>>> or to use the V2 space
>>> cache generally speaking, on any machine that I use (I had understood it
>>> was useful only on multi-TB filesystems...)
>> 10GiB is enough to create large enough block groups to utilize free
>> space cache.
>> So you can't really escape from free space cache.
> 
> I meant that I had understood that the V2 space cache was preferable to
> V1 only for multi-TB filesystems.
> 
> So would you advise to use V2 space cache also for filesystems < 1 TB ?

Yes.
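
Not spelled out in the thread, but for reference: switching an existing
filesystem to the v2 cache (the free space tree) can be done roughly as
follows - a sketch, assuming a reasonably recent kernel and btrfs-progs,
with /dev/sdX as a placeholder for the real device:

```shell
# Clear the stale v1 cache first (filesystem must be unmounted).
btrfs check --clear-space-cache v1 /dev/sdX

# Mounting once with space_cache=v2 builds the free space tree;
# later mounts then use it automatically.
mount -o space_cache=v2 /dev/sdX /mnt
```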

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-24 17:44 Massive filesystem corruption since kernel 5.2 (ARCH) Christoph Anton Mitterer
  2019-08-25 10:00 ` Swâmi Petaramesh
@ 2019-08-27 12:52 ` Michal Soltys
  2019-09-12  7:50 ` Filipe Manana
  2 siblings, 0 replies; 43+ messages in thread
From: Michal Soltys @ 2019-08-27 12:52 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs

On 8/24/19 7:44 PM, Christoph Anton Mitterer wrote:
> Hey.
> 
> Anything new about the issue described here:
> https://www.spinics.net/lists/linux-btrfs/msg91046.html
> 
> It was said that it might be a regression in 5.2 actually and not a
> hardware thing... so I just wonder whether I can safely move to 5.2?
> 
> 
> Cheers,
> Chris.
> 
> 

FWIW, my laptop has been on btrfs since around late 4.x kernel times -
also using Arch Linux. And it went without any issues through all the
kernels since then up to the current one (5.2.9 as of this writing). It
has survived some peculiar hangs and sudden power-offs without any
lasting side effects throughout btrfs's existence as its main
filesystem. It's on an old Samsung 850 Pro SSD (though I haven't tested
whether the disk lies about cache flushes).

That is to say, I don't have any storage stack underneath. On some
other machines I use btrfs (on Arch) as well, often with its built-in
raid1 implementation - no issues observed so far.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 11:11             ` Alberto Bursi
  2019-08-27 11:20               ` Swâmi Petaramesh
@ 2019-08-27 17:49               ` Swâmi Petaramesh
  2019-08-27 22:10               ` Chris Murphy
  2 siblings, 0 replies; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-08-27 17:49 UTC (permalink / raw)
  To: Alberto Bursi, linux-btrfs; +Cc: Qu Wenruo, Christoph Anton Mitterer

Hi Alberto,

On 27/08/2019 at 13:11, Alberto Bursi wrote:
> If you want to fully clear cache you need to use (on an unmounted 
> filesystem)
> 
> btrfs check --clear-space-cache v1 /dev/sdX

It worked, thanks.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 11:11             ` Alberto Bursi
  2019-08-27 11:20               ` Swâmi Petaramesh
  2019-08-27 17:49               ` Swâmi Petaramesh
@ 2019-08-27 22:10               ` Chris Murphy
  2 siblings, 0 replies; 43+ messages in thread
From: Chris Murphy @ 2019-08-27 22:10 UTC (permalink / raw)
  To: Alberto Bursi
  Cc: Swâmi Petaramesh, Qu Wenruo, linux-btrfs, Christoph Anton Mitterer

On Tue, Aug 27, 2019 at 5:11 AM Alberto Bursi <alberto.bursi@outlook.it> wrote:

> If you want to fully clear cache you need to use (on an unmounted
> filesystem)
>
> btrfs check --clear-space-cache v1 /dev/sdX
>
> or
>
> btrfs check  --clear-space-cache v2 /dev/sdX

I recommend a minimum version of btrfs-progs 5.1 for either of these
commands. Before that version, a crash mid-write while updating the
extent tree could cause filesystem corruption. In my case, all data
could be extracted merely by mounting -o ro, but I did have to recreate
that filesystem from scratch.



--
Chris Murphy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-27 12:40                 ` Hans van Kranenburg
@ 2019-08-29 12:46                   ` Oliver Freyermuth
  2019-08-29 13:08                     ` Christoph Anton Mitterer
                                       ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Oliver Freyermuth @ 2019-08-29 12:46 UTC (permalink / raw)
  To: Hans van Kranenburg, Swâmi Petaramesh, linux-btrfs

On 27.08.19 at 14:40, Hans van Kranenburg wrote:
> On 8/27/19 11:14 AM, Swâmi Petaramesh wrote:
>> On 8/27/19 8:52 AM, Qu Wenruo wrote:
>>>> or to use the V2 space
>>>> cache generally speaking, on any machine that I use (I had understood it
>>>> was useful only on multi-TB filesystems...)
>>> 10GiB is enough to create large enough block groups to utilize free
>>> space cache.
>>> So you can't really escape from free space cache.
>>
>> I meant that I had understood that the V2 space cache was preferable to
>> V1 only for multi-TB filesystems.
>>
>> So would you advise to use V2 space cache also for filesystems < 1 TB ?
> 
> Yes.
> 

This makes me wonder whether this should be the default.

This thread made me check my various BTRFS volumes, and for almost all of them (in different machines) I find cases of
 failed to load free space cache for block group XXXX, rebuilding it now
at several points during the last months in my syslogs - and that's on machines without broken memory, with disks for which FUA should be working fine, without any unsafe shutdowns over their lifetime, and with histories as short as having only seen 5.x kernels.

So if this may cause harmful side effects, happens without a clear origin, and v2 is safer due to being CoW, I guess I should switch all my nodes to v2 (or v2 should become the default in a future kernel?).
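
(A quick way to run the same check on your own logs, assuming a systemd
journal - adjust the pattern, or grep your syslog files instead:)

```shell
# Scan the kernel log for btrfs free-space-cache complaints.
journalctl -k --no-pager \
  | grep -E "free space cache|wrong amount of free space"
```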

Cheers,
	Oliver

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-29 12:46                   ` Oliver Freyermuth
@ 2019-08-29 13:08                     ` Christoph Anton Mitterer
  2019-08-29 13:09                     ` Swâmi Petaramesh
  2019-08-29 13:11                     ` Qu Wenruo
  2 siblings, 0 replies; 43+ messages in thread
From: Christoph Anton Mitterer @ 2019-08-29 13:08 UTC (permalink / raw)
  To: linux-btrfs

On Thu, 2019-08-29 at 14:46 +0200, Oliver Freyermuth wrote:
> This thread made me check on my various BTRFS volumes and for almost
> all of them (in different machines), I find cases of
>  failed to load free space cache for block group XXXX, rebuilding it
> now
> at several points during the last months in my syslogs - and that's
> for machines without broken memory, for disks for which FUA should be
> working fine,
> without any unsafe shutdowns over their lifetime, and with histories
> as short as only having seen 5.x kernels. 


I'm seeing the very same thing... machines that are most likely fine in
terms of hardware and have had no crashes or the like... yet they still
see v1 free space cache issues every now and then... which sounds like
a pointer that something's wrong there.

Cheers,
Chris.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-29 12:46                   ` Oliver Freyermuth
  2019-08-29 13:08                     ` Christoph Anton Mitterer
@ 2019-08-29 13:09                     ` Swâmi Petaramesh
  2019-08-29 13:11                     ` Qu Wenruo
  2 siblings, 0 replies; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-08-29 13:09 UTC (permalink / raw)
  To: Oliver Freyermuth, Hans van Kranenburg, linux-btrfs

On 8/29/19 2:46 PM, Oliver Freyermuth wrote:
> This thread made me check on my various BTRFS volumes and for almost all of them (in different machines), I find cases of
>  failed to load free space cache for block group XXXX, rebuilding it now
> at several points during the last months in my syslogs - and that's for machines without broken memory, for disks for which FUA should be working fine,
> without any unsafe shutdowns over their lifetime, and with histories as short as only having seen 5.x kernels. 

Wow. Thanks for the report. There's definitely some bug out there.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-29 12:46                   ` Oliver Freyermuth
  2019-08-29 13:08                     ` Christoph Anton Mitterer
  2019-08-29 13:09                     ` Swâmi Petaramesh
@ 2019-08-29 13:11                     ` Qu Wenruo
  2019-08-29 13:17                       ` Oliver Freyermuth
  2 siblings, 1 reply; 43+ messages in thread
From: Qu Wenruo @ 2019-08-29 13:11 UTC (permalink / raw)
  To: Oliver Freyermuth, Hans van Kranenburg, Swâmi Petaramesh,
	linux-btrfs



On 2019/8/29 8:46 PM, Oliver Freyermuth wrote:
> On 27.08.19 at 14:40, Hans van Kranenburg wrote:
>> On 8/27/19 11:14 AM, Swâmi Petaramesh wrote:
>>> On 8/27/19 8:52 AM, Qu Wenruo wrote:
>>>>> or to use the V2 space
>>>>> cache generally speaking, on any machine that I use (I had understood it
>>>>> was useful only on multi-TB filesystems...)
>>>> 10GiB is enough to create large enough block groups to utilize free
>>>> space cache.
>>>> So you can't really escape from free space cache.
>>>
>>> I meant that I had understood that the V2 space cache was preferable to
>>> V1 only for multi-TB filesystems.
>>>
>>> So would you advise to use V2 space cache also for filesystems < 1 TB ?
>>
>> Yes.
>>
>
> This makes me wonder if it should be the default?

It will be.

Just a spoiler, I believe features like no-holes and v2 space cache will
be default in not so far future.

>
> This thread made me check on my various BTRFS volumes and for almost all of them (in different machines), I find cases of
>  failed to load free space cache for block group XXXX, rebuilding it now
> at several points during the last months in my syslogs - and that's for machines without broken memory, for disks for which FUA should be working fine,
> without any unsafe shutdowns over their lifetime, and with histories as short as only having seen 5.x kernels.

That's interesting. In theory that shouldn't happen, especially without
unsafe shutdown.

But please also be aware that there is no concrete proof that corrupted
v1 space cache is causing all the problems.
What I said is just that a corrupted v1 space cache may cause problems; I
need to at least craft an image to prove my assumption.

>
> So if this may cause harmful side effects, happens without clear origin, and v2 is safer due to being CoW,
> I guess I should switch all my nodes to v2 (or this should become the default in a future kernel?).

At least, your experience would definitely help the btrfs community.

Thanks,
Qu

>
> Cheers,
> 	Oliver
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-29 13:11                     ` Qu Wenruo
@ 2019-08-29 13:17                       ` Oliver Freyermuth
  2019-08-29 17:40                         ` Oliver Freyermuth
  0 siblings, 1 reply; 43+ messages in thread
From: Oliver Freyermuth @ 2019-08-29 13:17 UTC (permalink / raw)
  To: Qu Wenruo, Hans van Kranenburg, Swâmi Petaramesh, linux-btrfs

Am 29.08.19 um 15:11 schrieb Qu Wenruo:
> 
> 
> On 2019/8/29 8:46 PM, Oliver Freyermuth wrote:
>> Am 27.08.19 um 14:40 schrieb Hans van Kranenburg:
>>> On 8/27/19 11:14 AM, Swâmi Petaramesh wrote:
>>>> On 8/27/19 8:52 AM, Qu Wenruo wrote:
>>>>>> or to use the V2 space
>>>>>> cache generally speaking, on any machine that I use (I had understood it
>>>>>> was useful only on multi-TB filesystems...)
>>>>> 10GiB is enough to create large enough block groups to utilize free
>>>>> space cache.
>>>>> So you can't really escape from free space cache.
>>>>
>>>> I meant that I had understood that the V2 space cache was preferable to
>>>> V1 only for multi-TB filesystems.
>>>>
>>>> So would you advise to use V2 space cache also for filesystems < 1 TB ?
>>>
>>> Yes.
>>>
>>
>> This makes me wonder if it should be the default?
> 
> It will be.
> 
> Just a spoiler, I believe features like no-holes and v2 space cache will
> be default in not so far future.
> 
>>
>> This thread made me check on my various BTRFS volumes and for almost all of them (in different machines), I find cases of
>>  failed to load free space cache for block group XXXX, rebuilding it now
>> at several points during the last months in my syslogs - and that's for machines without broken memory, for disks for which FUA should be working fine,
>> without any unsafe shutdowns over their lifetime, and with histories as short as only having seen 5.x kernels.
> 
> That's interesting. In theory that shouldn't happen, especially without
> unsafe shutdown.

I also forgot to add that on these machines there is no mdraid / dm / LUKS in between (i.e. purely btrfs directly on the drives). 
The messages _seem_ to be more prominent for spinning disks, but after all, my sample is just 5 devices in total. 
So it really "feels" like a bug crawling somewhere. However, the machines seem not to have seen any actual corruption as a consequence. 
I'm playing with "btrfs check --readonly" now to see whether everything is really still fine, but I'm already running kernel 5.2 with the new checks without issues. 
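For anyone wanting to run the same sanity check, a minimal sketch (the device path is a placeholder for your actual btrfs device, which should be unmounted or mounted read-only while checking):

```shell
# Read-only consistency check; --readonly guarantees no changes are
# made to the device. /dev/sdb1 is a placeholder.
btrfs check --readonly /dev/sdb1
```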

> But please also be aware that there is no concrete proof that corrupted
> v1 space cache is causing all the problems.
> What I said is just that a corrupted v1 space cache may cause problems; I
> need to at least craft an image to prove my assumption.

I see - that might be useful in any case to hopefully track down the issue. 

> 
>>
>> So if this may cause harmful side effects, happens without clear origin, and v2 is safer due to being CoW,
>> I guess I should switch all my nodes to v2 (or this should become the default in a future kernel?).
> 
> At least, your experience would definitely help the btrfs community.

Ok, then I will slowly switch the nodes one by one - if I do not come and cry on the list, that means all is well (but I'm only a small datapoint with 5 disks in three machines) ;-). 

Cheers,
	Oliver

> 
> Thanks,
> Qu
> 
>>
>> Cheers,
>> 	Oliver
>>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-29 13:17                       ` Oliver Freyermuth
@ 2019-08-29 17:40                         ` Oliver Freyermuth
  0 siblings, 0 replies; 43+ messages in thread
From: Oliver Freyermuth @ 2019-08-29 17:40 UTC (permalink / raw)
  To: Qu Wenruo, Hans van Kranenburg, Swâmi Petaramesh, linux-btrfs

Am 29.08.19 um 15:17 schrieb Oliver Freyermuth:
> Am 29.08.19 um 15:11 schrieb Qu Wenruo:
>>
>>
>> On 2019/8/29 8:46 PM, Oliver Freyermuth wrote:
>>> Am 27.08.19 um 14:40 schrieb Hans van Kranenburg:
>>>> On 8/27/19 11:14 AM, Swâmi Petaramesh wrote:
>>>>> On 8/27/19 8:52 AM, Qu Wenruo wrote:
>>>>>>> or to use the V2 space
>>>>>>> cache generally speaking, on any machine that I use (I had understood it
>>>>>>> was useful only on multi-TB filesystems...)
>>>>>> 10GiB is enough to create large enough block groups to utilize free
>>>>>> space cache.
>>>>>> So you can't really escape from free space cache.
>>>>>
>>>>> I meant that I had understood that the V2 space cache was preferable to
>>>>> V1 only for multi-TB filesystems.
>>>>>
>>>>> So would you advise to use V2 space cache also for filesystems < 1 TB ?
>>>>
>>>> Yes.
>>>>
>>>
>>> This makes me wonder if it should be the default?
>>
>> It will be.
>>
>> Just a spoiler, I believe features like no-holes and v2 space cache will
>> be default in not so far future.
>>
>>>
>>> This thread made me check on my various BTRFS volumes and for almost all of them (in different machines), I find cases of
>>>  failed to load free space cache for block group XXXX, rebuilding it now
>>> at several points during the last months in my syslogs - and that's for machines without broken memory, for disks for which FUA should be working fine,
>>> without any unsafe shutdowns over their lifetime, and with histories as short as only having seen 5.x kernels.
>>
>> That's interesting. In theory that shouldn't happen, especially without
>> unsafe shutdown.
> 
> I also forgot to add that on these machines there is no mdraid / dm / LUKS in between (i.e. purely btrfs directly on the drives). 
> The messages _seem_ to be more prominent for spinning disks, but after all, my sample is just 5 devices in total. 
> So it really "feels" like a bug crawling somewhere. However, the machines seem not to have seen any actual corruption as a consequence. 
> I'm playing with "btrfs check --readonly" now to see whether everything is really still fine, but I'm already running kernel 5.2 with the new checks without issues. 

To calm anybody still in fear of the rebuilding warnings:
I already checked two disks (including the most affected one) and they were perfectly healthy. So it seems that, at least in my case, the "rebuilding" warning did not (yet?) coincide with any corruption. 
I have also converted the largest and most affected disk to space_cache=v2 as of now. If that works well in the next few weeks, I will look at converting the rest,
and if not, I'll be back here ;-). 
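For reference, the conversion described here needs only mount options; a sketch, assuming the filesystem is currently unmounted (device and mountpoint are placeholders):

```shell
# One-time conversion: clear_cache drops the old v1 space cache and
# space_cache=v2 builds the free space tree during this mount.
# Once the tree exists, later mounts keep using v2 automatically.
mount -o clear_cache,space_cache=v2 /dev/sdX /mnt
```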

Cheers,
	Oliver

> 
>> But please also be aware that there is no concrete proof that corrupted
>> v1 space cache is causing all the problems.
>> What I said is just that a corrupted v1 space cache may cause problems; I
>> need to at least craft an image to prove my assumption.
> 
> I see - that might be useful in any case to hopefully track down the issue. 
> 
>>
>>>
>>> So if this may cause harmful side effects, happens without clear origin, and v2 is safer due to being CoW,
>>> I guess I should switch all my nodes to v2 (or this should become the default in a future kernel?).
>>
>> At least, your experience would definitely help the btrfs community.
> 
> Ok, then I will slowly switch the nodes one by one - if I do not come and cry on the list, that means all is well (but I'm only a small datapoint with 5 disks in three machines) ;-). 
> 
> Cheers,
> 	Oliver
> 
>>
>> Thanks,
>> Qu
>>
>>>
>>> Cheers,
>>> 	Oliver
>>>


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-08-24 17:44 Massive filesystem corruption since kernel 5.2 (ARCH) Christoph Anton Mitterer
  2019-08-25 10:00 ` Swâmi Petaramesh
  2019-08-27 12:52 ` Michal Soltys
@ 2019-09-12  7:50 ` Filipe Manana
  2019-09-12  8:24   ` James Harvey
                     ` (2 more replies)
  2 siblings, 3 replies; 43+ messages in thread
From: Filipe Manana @ 2019-09-12  7:50 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: linux-btrfs, David Sterba

On Sat, Aug 24, 2019 at 6:53 PM Christoph Anton Mitterer
<calestyo@scientia.net> wrote:
>
> Hey.
>
> Anything new about the issue described here:
> https://www.spinics.net/lists/linux-btrfs/msg91046.html
>
> It was said that it might be a regression in 5.2 actually and not a
> hardware thing... so I just wonder whether I can safely move to 5.2?

So we definitely have a serious regression introduced in 5.2.
I sent out a fix for it yesterday:  https://patchwork.kernel.org/patch/11141559/

Two things can happen:

1) either a hang when committing a transaction, reported by several
users recently and hit it myself too twice when running fstests (test
case generic/475 and generic/561) after I upgraded my development
branch from a 5.1.x kernel to a 5.3-rcX kernel. If this happens you
risk no corruption, still the hang is very inconvenient of course, as
you have to reboot.

2) writeback for some btree nodes may never be started and we end up
committing a transaction without noticing that. This is really serious
and that will lead to the "parent transid verify failed on ..."
messages.

Until the fix gets merged to 5.2 kernels (and 5.3), I don't really
recommend running 5.2 or 5.3.

>
>
> Cheers,
> Chris.
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12  7:50 ` Filipe Manana
@ 2019-09-12  8:24   ` James Harvey
  2019-09-12  9:06     ` Filipe Manana
                       ` (2 more replies)
  2019-09-12  8:48   ` Swâmi Petaramesh
  2019-09-12 13:09   ` Christoph Anton Mitterer
  2 siblings, 3 replies; 43+ messages in thread
From: James Harvey @ 2019-09-12  8:24 UTC (permalink / raw)
  To: fdmanana; +Cc: Christoph Anton Mitterer, linux-btrfs, David Sterba

On Thu, Sep 12, 2019 at 3:51 AM Filipe Manana <fdmanana@gmail.com> wrote:
> ...
>
> Until the fix gets merged to 5.2 kernels (and 5.3), I don't really
> recommend running 5.2 or 5.3.

What is your recommendation for distributions that have been shipping
5.2.x for quite some time, where a distro-wide downgrade to 5.1.x
isn't really an option that will be considered, especially because
many users aren't using BTRFS?  Can/should your patch be backported to
5.2.13/5.2.14?  Or, does it really need to be applied to 5.3rc or git
master?  Or, is it possibly not the right fix for the corruption risk,
and should a flashing neon sign be given to users to just run 5.1.x
even though the distribution repos have 5.2.x?

What is your recommendation for users who have been running 5.2.x and
running into a lot of hangs?  Would you say to apply your patch to a
custom-compiled kernel, or to downgrade to 5.1.x?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12  7:50 ` Filipe Manana
  2019-09-12  8:24   ` James Harvey
@ 2019-09-12  8:48   ` Swâmi Petaramesh
  2019-09-12 13:09   ` Christoph Anton Mitterer
  2 siblings, 0 replies; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-09-12  8:48 UTC (permalink / raw)
  To: fdmanana, Christoph Anton Mitterer; +Cc: linux-btrfs, David Sterba

Hi Filipe,

On 9/12/19 9:50 AM, Filipe Manana wrote:
> So we definitely have a serious regression introduced in 5.2.
> I sent out a fix for it yesterday:  https://patchwork.kernel.org/patch/11141559/

Many thanks for having found and patched it.

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12  8:24   ` James Harvey
@ 2019-09-12  9:06     ` Filipe Manana
  2019-09-12  9:09     ` Holger Hoffstätte
  2019-09-12 10:53     ` Swâmi Petaramesh
  2 siblings, 0 replies; 43+ messages in thread
From: Filipe Manana @ 2019-09-12  9:06 UTC (permalink / raw)
  To: James Harvey; +Cc: Christoph Anton Mitterer, linux-btrfs, David Sterba

On Thu, Sep 12, 2019 at 9:24 AM James Harvey <jamespharvey20@gmail.com> wrote:
>
> On Thu, Sep 12, 2019 at 3:51 AM Filipe Manana <fdmanana@gmail.com> wrote:
> > ...
> >
> > Until the fix gets merged to 5.2 kernels (and 5.3), I don't really
> > recommend running 5.2 or 5.3.
>
> What is your recommendation for distributions that have been shipping
> 5.2.x for quite some time, where a distro-wide downgrade to 5.1.x
> isn't really an option that will be considered, especially because
> many users aren't using BTRFS?  Can/should your patch be backported to
> 5.2.13/5.2.14?

It's meant to be backported to 5.2.x and 5.3.x (probably not to 5.3
since we are at rc8 and too close to merge window for 5.4).

> Or, does it really need to be applied to 5.3rc or git
> master?  Or, is it possibly not the right fix for the corruption risk,
> and should a flashing neon sign be given to users to just run 5.1.x
> even though the distribution repos have 5.2.x?
>
> What is your recommendation for users who have been running 5.2.x and
> running into a lot of hangs?  Would you say to apply your patch to a
> custom-compiled kernel, or to downgrade to 5.1.x?

Sorry, I can't advise on that. That depends a lot on the distro and user needs.
Going back to 5.1 might be ok for some, but not for others due to
important fixes or new features/drivers in 5.2 for example.

It's really up to the distro and user to choose according to their needs.



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12  8:24   ` James Harvey
  2019-09-12  9:06     ` Filipe Manana
@ 2019-09-12  9:09     ` Holger Hoffstätte
  2019-09-12 10:53     ` Swâmi Petaramesh
  2 siblings, 0 replies; 43+ messages in thread
From: Holger Hoffstätte @ 2019-09-12  9:09 UTC (permalink / raw)
  To: James Harvey; +Cc: linux-btrfs

On 9/12/19 10:24 AM, James Harvey wrote:
> On Thu, Sep 12, 2019 at 3:51 AM Filipe Manana <fdmanana@gmail.com> wrote:
>> ...
>>
>> Until the fix gets merged to 5.2 kernels (and 5.3), I don't really
>> recommend running 5.2 or 5.3.
> 
> What is your recommendation for distributions that have been shipping
> 5.2.x for quite some time, where a distro-wide downgrade to 5.1.x
> isn't really an option that will be considered, especially because
> many users aren't using BTRFS?  Can/should your patch be backported to
> 5.2.13/5.2.14?  Or, does it really need to be applied to 5.3rc or git
> master?  Or, is it possibly not the right fix for the corruption risk,
> and should a flashing neon sign be given to users to just run 5.1.x
> even though the distribution repos have 5.2.x?

It applies and works just fine in 5.2.x, I have it running in .14.
If your distribution doesn't apply patches or just ships a random
release-of-the-month kernel, well... ¯\(ツ)/¯

> What is your recommendation for users who have been running 5.2.x and
> running into a lot of hangs?  Would you say to apply your patch to a
> custom-compiled kernel, or to downgrade to 5.1.x?

5.1.x is EOL upstream and you might be missing other critical things
like security fixes. Considering how easy it is to build a custom kernel
from an existing configuration, the former.
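A rough sketch of that custom build; the patch file name, source tree path, and the /proc/config.gz route are assumptions about the reader's setup, not instructions from this thread:

```shell
# Apply the fix on top of a 5.2.x source tree and rebuild using the
# running kernel's existing configuration. File names are placeholders.
cd linux-5.2.14
patch -p1 < btrfs-writeback-fix.patch   # the patch posted on the list
zcat /proc/config.gz > .config          # reuse the distro config, if enabled
make olddefconfig                       # answer new options with defaults
make -j"$(nproc)"
sudo make modules_install install
```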

-h

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12  8:24   ` James Harvey
  2019-09-12  9:06     ` Filipe Manana
  2019-09-12  9:09     ` Holger Hoffstätte
@ 2019-09-12 10:53     ` Swâmi Petaramesh
  2019-09-12 12:58       ` Christoph Anton Mitterer
  2 siblings, 1 reply; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-09-12 10:53 UTC (permalink / raw)
  To: James Harvey, fdmanana
  Cc: Christoph Anton Mitterer, linux-btrfs, David Sterba

Le 12/09/2019 à 10:24, James Harvey a écrit :
> and should a flashing neon sign be given to users to just run 5.1.x
> even though the distribution repos have 5.2.x?
Yep, I assume that a big flashing red neon sign should be raised for a 
confirmed bug that can trash your filesystem into ashes, and actually 
did so for two of mine...

ॐ
-- 
Swâmi Petaramesh <swami@petaramesh.org> OpenPGP ID 0x1BFFD850


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 10:53     ` Swâmi Petaramesh
@ 2019-09-12 12:58       ` Christoph Anton Mitterer
  2019-10-14  4:00         ` Nicholas D Steeves
  0 siblings, 1 reply; 43+ messages in thread
From: Christoph Anton Mitterer @ 2019-09-12 12:58 UTC (permalink / raw)
  To: Swâmi Petaramesh; +Cc: linux-btrfs

On Thu, 2019-09-12 at 12:53 +0200, Swâmi Petaramesh wrote:
> Yep, I assume that a big flashing red neon sign should be raised for a
> confirmed bug that can trash your filesystem into ashes, and actually
> did so for two of mine...

I doubt this will happen... I've asked for something like this to be
set up on the last corruption bugs but there seems to be little
interest for a warning system for users.


Cheers,
Chris.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12  7:50 ` Filipe Manana
  2019-09-12  8:24   ` James Harvey
  2019-09-12  8:48   ` Swâmi Petaramesh
@ 2019-09-12 13:09   ` Christoph Anton Mitterer
  2019-09-12 14:28     ` Filipe Manana
  2 siblings, 1 reply; 43+ messages in thread
From: Christoph Anton Mitterer @ 2019-09-12 13:09 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

Hi.

First, thanks for finding&fixing this :-)


On Thu, 2019-09-12 at 08:50 +0100, Filipe Manana wrote:
> 1) either a hang when committing a transaction, reported by several
> users recently and hit it myself too twice when running fstests (test
> case generic/475 and generic/561) after I upgraded my development
> branch from a 5.1.x kernel to a 5.3-rcX kernel. If this happens you
> risk no corruption, still the hang is very inconvenient of course, as
> you have to reboot.

Okay inconvenient, but not so bad if there is no corruption risk.


> 2) writeback for some btree nodes may never be started and we end up
> committing a transaction without noticing that. This is really
> serious
> and that will lead to the "parent transid verify failed on ..."
> messages.

As some people have already pointed out, it will be infeasible for many
end users to downgrade (no security updates) or manually patch (well,
end-users).

Can you elaborate under which circumstances this problem occurs,
whether there are any intermediate workarounds, and whether it's always
noticed (i.e. no silent corruption)?


Thanks,
Chris.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 13:09   ` Christoph Anton Mitterer
@ 2019-09-12 14:28     ` Filipe Manana
  2019-09-12 14:39       ` Christoph Anton Mitterer
  2019-09-13 18:50       ` Pete
  0 siblings, 2 replies; 43+ messages in thread
From: Filipe Manana @ 2019-09-12 14:28 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: linux-btrfs

On Thu, Sep 12, 2019 at 2:09 PM Christoph Anton Mitterer
<calestyo@scientia.net> wrote:
>
> Hi.
>
> First, thanks for finding&fixing this :-)
>
>
> On Thu, 2019-09-12 at 08:50 +0100, Filipe Manana wrote:
> > 1) either a hang when committing a transaction, reported by several
> > users recently and hit it myself too twice when running fstests (test
> > case generic/475 and generic/561) after I upgraded my development
> > branch from a 5.1.x kernel to a 5.3-rcX kernel. If this happens you
> > risk no corruption, still the hang is very inconvenient of course, as
> > you have to reboot.
>
> Okay inconvenient, but not so bad if there is no corruption risk.
>
>
> > 2) writeback for some btree nodes may never be started and we end up
> > committing a transaction without noticing that. This is really
> > serious
> > and that will lead to the "parent transid verify failed on ..."
> > messages.
>
> As some people have already pointed out, it will be infeasible for many
> end users to downgrade (no security updates) or manually patch (well,
> end-users).

Yes, but I can't do anything about that. I'm not skilled enough to build
a time machine to go back in time :)

>
> Can you elaborate under which circumstances this problem occurs,
> whether there are any intermediate workarounds, and whether it's always
> noticed (i.e. no silent corruption)?

It can happen whenever a transaction is being committed (or committing
the fsync log).
Every fs is at risk, unless it's always mounted read-only and with
-o nologreplay.
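That read-only exception would look like this (device and mountpoint are placeholders); since nothing is ever written, the buggy writeback path cannot be reached:

```shell
# Fully read-only mount: nologreplay also skips log-tree replay, which
# would otherwise write to the device even on a ro mount.
mount -o ro,nologreplay /dev/sdX /mnt
```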

A btree node/leaf (extent buffer) is dirty in memory, needs to be
written to disk, this always happens at transaction commit time,
but can also happen before that, if for some reason writeback on the
btree inode happens (due to reclaim, system under memory pressure,
etc).

If the writeback happens only at transaction commit time, and if one of
the node's pages is locked (not necessarily by btrfs; it can happen
anywhere in the memory management subsystem, page migration for
example), we end up skipping the writeback (starting the process of
writing what's in memory to disk) of that node. This is case 2), the
corruption with the error messages
"parent transid verify failed ..." in dmesg/syslog after mounting the
filesystem again.
This is very likely (as we can never rule out other bugs, be it in
btrfs or some other layer, or even hardware/firmware) what
Swâmi ran into, since he never had problems with 5.1 and older kernel
versions and has been using the same hardware for a long time.
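As a toy illustration (this is not btrfs code, just a sketch of the invariant): a parent btree node records the generation it expects its child to carry on disk, so a skipped writeback leaves a stale child behind and the next traversal trips the check:

```shell
# check_transid EXPECTED ON_DISK - mimics the generation comparison
# btrfs performs when following a parent pointer to a child node.
check_transid() {
    expected="$1"; on_disk="$2"
    if [ "$on_disk" -eq "$expected" ]; then
        echo "ok"
    else
        echo "parent transid verify failed: wanted $expected, found $on_disk"
    fi
}

check_transid 1234 1234   # normal commit: child was written, generations match
check_transid 1234 1233   # skipped writeback: disk still holds the older copy
```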

For case 1), the hang, it happens if writeback happened before the
transaction commit as well. At transaction commit we trigger
writeback again for the same node(s), and here we hang because of the
previous attempt.
Two people reported the hang yesterday here on the list, plus at least
one more some weeks ago.
I hit it myself once last week and once 2 evenings ago with test cases
from fstests after changing my development branch from 5.1 to 5.3-rcX.

To hit any of the problems, sure, you still need to have some bad
luck, but it's impossible to tell how likely you are to run into it.
It depends on so many things, from workloads, system configuration, etc.
No matter how likely (and how likely will not be the same for
everyone), it's serious because if it happens you can get a corrupt
filesystem.

>
>
> Thanks,
> Chris.
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 14:28     ` Filipe Manana
@ 2019-09-12 14:39       ` Christoph Anton Mitterer
  2019-09-12 14:57         ` Swâmi Petaramesh
  2019-09-13 18:50       ` Pete
  1 sibling, 1 reply; 43+ messages in thread
From: Christoph Anton Mitterer @ 2019-09-12 14:39 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Thu, 2019-09-12 at 15:28 +0100, Filipe Manana wrote:
> This is case 2), the corruption with the error messages
> "parent transid verify failed ..." in dmesg/syslog after mounting the
> filesystem again.

Hmm so "at least" it will never go unnoticed, right?

This is IMO pretty important advice, as people may want to compare
their current data with that of backups... if silent corruption had
been possible.


Cheers,
Chris.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 14:39       ` Christoph Anton Mitterer
@ 2019-09-12 14:57         ` Swâmi Petaramesh
  2019-09-12 16:21           ` Zdenek Kaspar
  0 siblings, 1 reply; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-09-12 14:57 UTC (permalink / raw)
  To: Christoph Anton Mitterer, fdmanana; +Cc: linux-btrfs

On 9/12/19 4:39 PM, Christoph Anton Mitterer wrote:
> On Thu, 2019-09-12 at 15:28 +0100, Filipe Manana wrote:
>> This is case 2), the corruption with the error messages
>> "parent transid verify failed ..." in dmesg/syslog after mounting the
>> filesystem again.
> Hmm so "at least" it will never go unnoticed, right?
>
> This is IMO pretty important advice, as people may want to compare
> their current data with that of backups... if silent corruption had
> been possible.
>
>
> Cheers,
> Chris.

Per my own experience hitting this bug, it definitely doesn't go
unnoticed - I haven't experienced the system hang, which obviously
cannot go unnoticed, but I did hit the « parent transid » failure thing.

The most annoying consequence is that the filesystem is then beyond
repair and needs to be completely recreated. In my case I could still
back up most of my files from the damaged FS, even though a few were lost
or damaged - files that were open or had been recently changed - while
old files were still there and healthy.

So for me most of the hassle was to first recreate my FS with all its
complexity (subvols, snapshots...) and restore from a backup made just
before, then check for missing files against another (luckily very
recent) backup and try to fix what got broken.

I have to say that I re-upgraded said machine to the latest Arch 5.2
kernel a couple of weeks ago, and a couple of other Manjaro machines to
the latest 5.2 kernel as well, and haven't been hit by the bug since.

However having read that the bug is diagnosed, confirmed and fixed by
Filipe, I seriously consider downgrading my kernel back to 5.1 on the 2
Manjaro machines as it is rather straightforward, and maybe my Arch as
well... Until I'm sure that the fix made it to said distro kernels.

Fortunately other common, less “bleeding edge” distros that I use, such
as Debian stable or Mint/Ubuntu, still ship a kernel which is older
than 5.1, and I will stay away from 5.2 backports...

I'm however quite concerned that the FS on which I store all of my most
precious data, and which I considered “the safest native Linux FS
available”, can still suffer regressions that plainly trash it to ruins.

Kind regards.

>
ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 14:57         ` Swâmi Petaramesh
@ 2019-09-12 16:21           ` Zdenek Kaspar
  2019-09-12 18:52             ` Swâmi Petaramesh
  0 siblings, 1 reply; 43+ messages in thread
From: Zdenek Kaspar @ 2019-09-12 16:21 UTC (permalink / raw)
  To: Swâmi Petaramesh, Christoph Anton Mitterer, fdmanana; +Cc: linux-btrfs

On 9/12/19 4:57 PM, Swâmi Petaramesh wrote:

> However having read that the bug is diagnosed, confirmed and fixed by
> Filipe, I seriously consider downgrading my kernel back to 5.1 on the 2
> Manjaro machines as it is rather straightforward, and maybe my Arch as
> well... Until I'm sure that the fix made it to said distro kernels.

It's included in [testing] right now...

https://git.archlinux.org/linux.git/log/?h=v5.2.14-arch2

Z.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 16:21           ` Zdenek Kaspar
@ 2019-09-12 18:52             ` Swâmi Petaramesh
  0 siblings, 0 replies; 43+ messages in thread
From: Swâmi Petaramesh @ 2019-09-12 18:52 UTC (permalink / raw)
  To: Zdenek Kaspar, Christoph Anton Mitterer, fdmanana; +Cc: linux-btrfs

Le 12/09/2019 à 18:21, Zdenek Kaspar a écrit :
> On 9/12/19 4:57 PM, Swâmi Petaramesh wrote:
>
>> However having read that the bug is diagnosed, confirmed and fixed by
>> Filipe, I seriously consider downgrading my kernel back to 5.1 on the 2
>> Manjaro machines as it is rather straightforward, and maybe my Arch as
>> well... Until I'm sure that the fix made it to said distro kernels.
>
> It's included in [testing] right now...
>
> https://git.archlinux.org/linux.git/log/?h=v5.2.14-arch2
> Z.


:)


ॐ
-- 
Swâmi Petaramesh <swami@petaramesh.org> OpenPGP ID 0x1BFFD850


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 14:28     ` Filipe Manana
  2019-09-12 14:39       ` Christoph Anton Mitterer
@ 2019-09-13 18:50       ` Pete
       [not found]         ` <CACzgC9gvhGwyQAKm5J1smZZjim-ecEix62ZQCY-wwJYVzMmJ3Q@mail.gmail.com>
  1 sibling, 1 reply; 43+ messages in thread
From: Pete @ 2019-09-13 18:50 UTC (permalink / raw)
  To: fdmanana, Christoph Anton Mitterer; +Cc: linux-btrfs

On 9/12/19 3:28 PM, Filipe Manana wrote:

>>> 2) writeback for some btree nodes may never be started and we end up
>>> committing a transaction without noticing that. This is really
>>> serious
>>> and that will lead to the "parent transid verify failed on ..."
>>> messages.

> Two people reported the hang yesterday here on the list, plus at least
> one more some weeks ago.

This was one of my messages that I got when I reported an issue in the
thread 'Chasing IO errors' which occurred in mid to late August.


> I hit it myself once last week and once 2 evenings ago with test cases
> from fstests after changing my development branch from 5.1 to 5.3-rcX.
> 
> To hit any of the problems, sure, you still need to have some bad
> luck, but it's impossible to tell how likely you are to run into it.
> It depends on so many things, from workloads, system configuration, etc.
> No matter how likely (and how likely will not be the same for
> everyone), it's serious because if it happens you can get a corrupt
> filesystem.

I can't help you with any specific workloads causing it.  I just
noticed that my fs went read-only, that is all.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
       [not found]         ` <CACzgC9gvhGwyQAKm5J1smZZjim-ecEix62ZQCY-wwJYVzMmJ3Q@mail.gmail.com>
@ 2019-10-14  2:07           ` Adam Bahe
  2019-10-14  2:19             ` Qu Wenruo
  2019-10-14 17:54             ` Chris Murphy
  0 siblings, 2 replies; 43+ messages in thread
From: Adam Bahe @ 2019-10-14  2:07 UTC (permalink / raw)
  To: linux-btrfs

> Until the fix gets merged to 5.2 kernels (and 5.3), I don't really recommend running 5.2 or 5.3.

I know fixes went into distro-specific kernels, but I wanted to verify
whether the fix went into the vanilla kernel.org kernel. If so, what
version should be safe? ex:
https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.6

With 180 raw TB in raid1 I just want to be explicit. Thanks!


On Sun, Oct 13, 2019 at 9:01 PM Adam Bahe <adambahe@gmail.com> wrote:
>
> > Until the fix gets merged to 5.2 kernels (and 5.3), I don't really recommend running 5.2 or 5.3.
>
> I know fixes went in to distro specific kernels. But wanted to verify if the fix went into the vanilla kernel.org kernel? If so, what version should be safe?
>
> ex: https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.6
>
> With 180 raw TB in raid1 I just want to be explicit. Thanks!
>
> On Fri, Sep 13, 2019 at 11:16 PM Pete <pete@petezilla.co.uk> wrote:
>>
>> On 9/12/19 3:28 PM, Filipe Manana wrote:
>>
>> >>> 2) writeback for some btree nodes may never be started and we end up
>> >>> committing a transaction without noticing that. This is really
>> >>> serious
>> >>> and that will lead to the "parent transid verify failed on ..."
>> >>> messages.
>>
>> > Two people reported the hang yesterday here on the list, plus at least
>> > one more some weeks ago.
>>
>> This was one of my messages that I got when I reported an issue in the
>> thread 'Chasing IO errors' which occurred in mid to late August.
>>
>>
>> > I hit it myself once last week and once 2 evenings ago with test cases
>> > from fstests after changing my development branch from 5.1 to 5.3-rcX.
>> >
>> > To hit any of the problems, sure, you still need to have some bad
>> > luck, but it's impossible to tell how likely to run into it.
>> > It depends on so many things, from workloads, system configuration, etc.
>> > No matter how likely (and how likely will not be the same for
>> > everyone), it's serious because if it happens you can get a corrupt
>> > filesystem.
>>
>> I can't help you with any specific workloads causing it.  I just
>> noticed that my fs went read-only, that is all.
>>
>>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-10-14  2:07           ` Adam Bahe
@ 2019-10-14  2:19             ` Qu Wenruo
  2019-10-14 17:54             ` Chris Murphy
  1 sibling, 0 replies; 43+ messages in thread
From: Qu Wenruo @ 2019-10-14  2:19 UTC (permalink / raw)
  To: Adam Bahe, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2388 bytes --]



On 2019/10/14 10:07 AM, Adam Bahe wrote:
>> Until the fix gets merged to 5.2 kernels (and 5.3), I don't really recommend running 5.2 or 5.3.
> 
> I know fixes went in to distro specific kernels. But wanted to verify
> if the fix went into the vanilla kernel.org kernel? If so, what
> version should be safe? ex:
> https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.6
> 
> With 180 raw TB in raid1 I just want to be explicit. Thanks!

v5.2.15 and newer.
v5.3.0 and newer.

Kernels before v5.2 are not affected.
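
For anyone who wants to double-check a particular machine, here is a
minimal POSIX sh sketch (my addition, not part of Qu's reply; the
`affected` helper is hypothetical, and the cutoffs 5.2.0-5.2.14 are
taken only from the version statements above, so it covers just this
particular bug):

```shell
# Report whether a kernel version string falls in the affected range
# for this bug: 5.2.0 through 5.2.14. Fixed in 5.2.15+ and 5.3.0+;
# kernels before 5.2 were never affected.
affected() {
    maj=${1%%.*}                 # major version, e.g. "5"
    rest=${1#*.}                 # remainder, e.g. "2.9"
    min=${rest%%.*}              # minor version, e.g. "2"
    pat=${rest#*.}               # patch level, e.g. "9"
    [ "$pat" = "$rest" ] && pat=0  # handle "5.3" with no patch level
    if [ "$maj" -eq 5 ] && [ "$min" -eq 2 ] && [ "$pat" -lt 15 ]; then
        echo affected
    else
        echo not-affected
    fi
}

affected "5.2.9"    # → affected (the version mentioned earlier in the thread)
affected "5.2.15"   # → not-affected
affected "5.3.6"    # → not-affected
```

In practice you would feed it the running kernel, e.g.
`affected "$(uname -r | cut -d- -f1)"`, stripping any distro suffix first.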

Thanks,
Qu
> 
> 
> On Sun, Oct 13, 2019 at 9:01 PM Adam Bahe <adambahe@gmail.com> wrote:
>>
>>> Until the fix gets merged to 5.2 kernels (and 5.3), I don't really recommend running 5.2 or 5.3.
>>
>> I know fixes went in to distro specific kernels. But wanted to verify if the fix went into the vanilla kernel.org kernel? If so, what version should be safe?
>>
>> ex: https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.6
>>
>> With 180 raw TB in raid1 I just want to be explicit. Thanks!
>>
>> On Fri, Sep 13, 2019 at 11:16 PM Pete <pete@petezilla.co.uk> wrote:
>>>
>>> On 9/12/19 3:28 PM, Filipe Manana wrote:
>>>
>>>>>> 2) writeback for some btree nodes may never be started and we end up
>>>>>> committing a transaction without noticing that. This is really
>>>>>> serious
>>>>>> and that will lead to the "parent transid verify failed on ..."
>>>>>> messages.
>>>
>>>> Two people reported the hang yesterday here on the list, plus at least
>>>> one more some weeks ago.
>>>
>>> This was one of my messages that I got when I reported an issue in the
>>> thread 'Chasing IO errors' which occurred in mid to late August.
>>>
>>>
>>>> I hit it myself once last week and once 2 evenings ago with test cases
>>>> from fstests after changing my development branch from 5.1 to 5.3-rcX.
>>>>
>>>> To hit any of the problems, sure, you still need to have some bad
>>>> luck, but it's impossible to tell how likely to run into it.
>>>> It depends on so many things, from workloads, system configuration, etc.
>>>> No matter how likely (and how likely will not be the same for
>>>> everyone), it's serious because if it happens you can get a corrupt
>>>> filesystem.
>>>
>>> I can't help you with any specific workloads causing it.  I just
>>> noticed that my fs went read-only, that is all.
>>>
>>>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-09-12 12:58       ` Christoph Anton Mitterer
@ 2019-10-14  4:00         ` Nicholas D Steeves
  0 siblings, 0 replies; 43+ messages in thread
From: Nicholas D Steeves @ 2019-10-14  4:00 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Swâmi Petaramesh; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1540 bytes --]

Hi Christoph and Swâmi,

Christoph Anton Mitterer <calestyo@scientia.net> writes:

> On Thu, 2019-09-12 at 12:53 +0200, Swâmi Petaramesh wrote:
>> Yep, I assume that a big flashing red neon sign should be raised for
>> a 
>> confirmed bug that can trash your filesystem into ashes, and
>> actually 
>> did so for two of mine...
>
> I doubt this will happen... I've asked for something like this to be
> set up on the last corruption bugs but there seems to be little
> interest for a warning system for users.

I used to track such bugs on the Debian wiki page for Btrfs...but users
of Debian and derivatives continued to track
sid/testing/stable-backports kernel, which made me feel like that work
was a waste of time.  Now that page has a warning that reads something
along the lines of "sid/testing/backports kernels periodically have
grave dataloss bugs.  Please track the most recent upstream LTS kernel
if a kernel newer than 4.19.x is required.  That said, upstream
appreciates bug reports using the most recent kernel available to you".

If you'd like to maintain a section at the top of that page that tracks
this type of issue, please go ahead.  I'd rather work on getting boot
environments working properly, then making them easy to use, then
enabling staged upgrades in a rw snapshot before rotating that snapshot
onto the rootfs.

P.S. Do you want to co-found a BTRFS integration team in Debian?  We're
still quite a ways behind SUSE, and even Fedora is ahead of us now!

Regards,
Nicholas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Massive filesystem corruption since kernel 5.2 (ARCH)
  2019-10-14  2:07           ` Adam Bahe
  2019-10-14  2:19             ` Qu Wenruo
@ 2019-10-14 17:54             ` Chris Murphy
  1 sibling, 0 replies; 43+ messages in thread
From: Chris Murphy @ 2019-10-14 17:54 UTC (permalink / raw)
  To: Adam Bahe; +Cc: Btrfs BTRFS

On Sun, Oct 13, 2019 at 8:07 PM Adam Bahe <adambahe@gmail.com> wrote:
>
> > Until the fix gets merged to 5.2 kernels (and 5.3), I don't really recommend running 5.2 or 5.3.
>
> I know fixes went in to distro specific kernels. But wanted to verify
> if the fix went into the vanilla kernel.org kernel? If so, what
> version should be safe? ex:
> https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.6
>
> With 180 raw TB in raid1 I just want to be explicit. Thanks!

It's fixed in upstream stable since 5.2.15, and the fix is included in
the entire 5.3.x series.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2019-10-14 17:54 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-24 17:44 Massive filesystem corruption since kernel 5.2 (ARCH) Christoph Anton Mitterer
2019-08-25 10:00 ` Swâmi Petaramesh
2019-08-27  0:00   ` Christoph Anton Mitterer
2019-08-27  5:06     ` Swâmi Petaramesh
2019-08-27  6:13       ` Swâmi Petaramesh
2019-08-27  6:21         ` Qu Wenruo
2019-08-27  6:34           ` Swâmi Petaramesh
2019-08-27  6:52             ` Qu Wenruo
2019-08-27  9:14               ` Swâmi Petaramesh
2019-08-27 12:40                 ` Hans van Kranenburg
2019-08-29 12:46                   ` Oliver Freyermuth
2019-08-29 13:08                     ` Christoph Anton Mitterer
2019-08-29 13:09                     ` Swâmi Petaramesh
2019-08-29 13:11                     ` Qu Wenruo
2019-08-29 13:17                       ` Oliver Freyermuth
2019-08-29 17:40                         ` Oliver Freyermuth
     [not found]               ` <-z770dp-y45icx-naspi1dhhf7m-b1jjq3853x22lswnef-p5g363n8kd2f-vdijlg-jk4z4q-raec5-em5djr-et1h33i4xib8jxzw1zxyza74-miq3zn-e4azxaaeyo3abtrf6zj8nb18-hbhrrmnr1ww1.1566894946135@email.android.com>
2019-08-27 12:34                 ` Re : " Qu Wenruo
2019-08-27 10:59           ` Swâmi Petaramesh
2019-08-27 11:11             ` Alberto Bursi
2019-08-27 11:20               ` Swâmi Petaramesh
2019-08-27 11:29                 ` Alberto Bursi
2019-08-27 11:45                   ` Swâmi Petaramesh
2019-08-27 17:49               ` Swâmi Petaramesh
2019-08-27 22:10               ` Chris Murphy
2019-08-27 12:52 ` Michal Soltys
2019-09-12  7:50 ` Filipe Manana
2019-09-12  8:24   ` James Harvey
2019-09-12  9:06     ` Filipe Manana
2019-09-12  9:09     ` Holger Hoffstätte
2019-09-12 10:53     ` Swâmi Petaramesh
2019-09-12 12:58       ` Christoph Anton Mitterer
2019-10-14  4:00         ` Nicholas D Steeves
2019-09-12  8:48   ` Swâmi Petaramesh
2019-09-12 13:09   ` Christoph Anton Mitterer
2019-09-12 14:28     ` Filipe Manana
2019-09-12 14:39       ` Christoph Anton Mitterer
2019-09-12 14:57         ` Swâmi Petaramesh
2019-09-12 16:21           ` Zdenek Kaspar
2019-09-12 18:52             ` Swâmi Petaramesh
2019-09-13 18:50       ` Pete
     [not found]         ` <CACzgC9gvhGwyQAKm5J1smZZjim-ecEix62ZQCY-wwJYVzMmJ3Q@mail.gmail.com>
2019-10-14  2:07           ` Adam Bahe
2019-10-14  2:19             ` Qu Wenruo
2019-10-14 17:54             ` Chris Murphy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).