On 2018/10/11 1:25 AM, Larkin Lowrey wrote:
> On 10/10/2018 12:04 PM, Holger Hoffstätte wrote:
>> On 10/10/18 17:44, Larkin Lowrey wrote:
>> (..)
>>> About once a week, or so, I'm running into the above situation where
>>> FS seems to deadlock. All IO to the FS blocks, there is no IO
>>> activity at all. I have to hard reboot the system to recover. There
>>> are no error indications except for the following which occurs well
>>> before the FS freezes up:
>>>
>>> BTRFS warning (device dm-3): block group 78691883286528 has wrong
>>> amount of free space
>>> BTRFS warning (device dm-3): failed to load free space cache for
>>> block group 78691883286528, rebuilding it now
>>>
>>> Do I have any options other the nuking the FS and starting over?
>>
>> Unmount cleanly & mount again with -o space_cache=v2.
> 
> It froze while unmounting. The attached zip is a stack dump captured via
> 'echo t > /proc/sysrq-trigger'. A second attempt after a hard reboot
> worked.

The trace shows it's indeed the free space cache writeback code causing
the problem.

It may be a deadlock caused by nested tree locks taken by the extent
allocator and the free space writeback code.

To avoid such problems, you could completely disable the v1 free space
cache or switch to the v2 cache.

Chris Murphy's guide should be pretty good.


Personally speaking, if your use case is not performance critical,
the following features can be disabled to avoid possible bugs:

1) free space cache
   It only speeds up free space lookup.
2) tree log
   It only speeds up fsync() calls. Without it we just fall back to
   sync()

So I'd recommend the following mount options:
nospace_cache,notreelog
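
To check whether a mounted filesystem actually carries these options,
something like the rough sketch below could be used. It only parses text
in /proc/mounts format; the function names are made up for illustration,
and it assumes the kernel reports the options there under the same names
as given at mount time.

```python
def btrfs_mount_options(mounts_text):
    """Map each btrfs mount point to the set of options listed for it.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and non-btrfs filesystems.
        if len(fields) < 4 or fields[2] != "btrfs":
            continue
        result[fields[1]] = set(fields[3].split(","))
    return result


def missing_recommended(options, recommended=("nospace_cache", "notreelog")):
    """Return the recommended options that are absent from `options`."""
    return [opt for opt in recommended if opt not in options]


# Hypothetical sample line, not taken from the reporter's system.
sample = (
    "/dev/mapper/vg-data /data btrfs "
    "rw,relatime,space_cache,subvolid=5,subvol=/ 0 0\n"
    "proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0\n"
)

for mountpoint, opts in btrfs_mount_options(sample).items():
    print(mountpoint, "missing:", missing_recommended(opts))
```

On a real system, pass the contents of /proc/mounts instead of the
sample string.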

Thanks,
Qu

> 
> --Larkin