On 2020/3/16 1:19 PM, Tomasz Chmielewski wrote:
> On 2020-03-16 14:06, Qu Wenruo wrote:
>> On 2020/3/16 11:13 AM, Tomasz Chmielewski wrote:
>>> After upgrading to Linux 5.5 (tried 5.5.6, 5.5.9, also 5.6.0-rc5),
>>> the system panics shortly after mounting and starting to use a btrfs
>>> filesystem. Here is a dmesg - please advise how to deal with it.
>>> It has since crashed several times, because of the panic=10
>>> parameter (the system boots, runs for a while, crashes, boots again,
>>> and so on).
>>>
>>> Mount options:
>>>
>>> noatime,ssd,space_cache=v2,user_subvol_rm_allowed
>>>
>>>
>>> [   65.777428] BTRFS info (device sda2): enabling ssd optimizations
>>> [   65.777435] BTRFS info (device sda2): using free space tree
>>> [   65.777436] BTRFS info (device sda2): has skinny extents
>>> [   98.225099] BTRFS error (device sda2): parent transid verify failed
>>> on 19718118866944 wanted 664218442 found 674530371
>>> [   98.225594] BTRFS error (device sda2): parent transid verify failed
>>> on 19718118866944 wanted 664218442 found 674530371
>>
>> This is the root cause, not quota.
>>
>> The metadata is already corrupted, and quota is just the first thing
>> to complain about it.
>
> Still, should it crash the server, putting it into a cycle of
> crash-boot-crash-boot, possibly breaking the filesystem even more?

The transid mismatch is the root cause here, and I'm not sure how it
happened. Do you have a history of the kernels used on that server?

One potential source of this kind of corruption is kernels v5.2.0
through v5.2.14, which had a bug that could leave some tree blocks
unwritten to disk.

>
> Also, how do I fix that corruption?
>
> This server had a drive added, a full balance (to RAID-10 for data and
> metadata) and a scrub a few weeks ago, with no errors. Running scrub
> now to see if it shows anything.

Then at least at that time it was not corrupted.

Was there any sudden power loss in recent days?

Another potential cause is out-of-spec FLUSH/FUA behavior, meaning the
disk controller does not correctly report when a FLUSH/FUA has
completed.

That means if you use the same disks and controller and manually cause
power losses, it should fail after just a few cycles (a rough sketch of
such a test is appended below).

Thanks,
Qu

>
> btrfs device stats also shows no errors:
>
> # btrfs device stats /data/lxd
> [/dev/sda2].write_io_errs    0
> [/dev/sda2].read_io_errs     0
> [/dev/sda2].flush_io_errs    0
> [/dev/sda2].corruption_errs  0
> [/dev/sda2].generation_errs  0
> [/dev/sdd2].write_io_errs    0
> [/dev/sdd2].read_io_errs     0
> [/dev/sdd2].flush_io_errs    0
> [/dev/sdd2].corruption_errs  0
> [/dev/sdd2].generation_errs  0
> [/dev/sdc2].write_io_errs    0
> [/dev/sdc2].read_io_errs     0
> [/dev/sdc2].flush_io_errs    0
> [/dev/sdc2].corruption_errs  0
> [/dev/sdc2].generation_errs  0
> [/dev/sdb2].write_io_errs    0
> [/dev/sdb2].read_io_errs     0
> [/dev/sdb2].flush_io_errs    0
> [/dev/sdb2].corruption_errs  0
> [/dev/sdb2].generation_errs  0
>
>
> Tomasz Chmielewski
> https://lxadm.com
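
For assessing the reported transid damage without modifying anything, a
minimal read-only sketch. This is only a diagnostic suggestion, not a
prescription from the thread; the device path is the one reported above,
and btrfs check must run on an unmounted filesystem:

# Dump the superblock to see which generations the tree roots point at
# (safe to run even while the filesystem is mounted).
btrfs inspect-internal dump-super -f /dev/sda2

# Read-only consistency check; reports problems without changing
# anything on disk. Unmount the filesystem first.
btrfs check --readonly /dev/sda2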
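
And a rough sketch of the manual power-loss test described above,
assuming an external way to cut power (a smart PDU or similar). The
mount point /mnt/test and the marker file name are hypothetical
placeholders, not from the thread:

#!/bin/sh
# Power-loss torture sketch: write a marker, force it to stable media,
# then have the power cut externally. After each reboot, compare the
# marker on disk with the last cycle number logged on the console; if
# the controller lies about FLUSH/FUA completion, the on-disk marker
# will lag behind what sync claimed was already persisted.
MNT=/mnt/test            # hypothetical test mount point
MARKER="$MNT/marker"

i=0
while :; do
    i=$((i + 1))
    echo "cycle $i" > "$MARKER"
    sync                 # should not return before data is on stable media
    echo "cycle $i synced; safe point - cut power any time now"
    sleep 60             # window for the external power cut
done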