On 2018/10/11 1:25 AM, Larkin Lowrey wrote:
> On 10/10/2018 12:04 PM, Holger Hoffstätte wrote:
>> On 10/10/18 17:44, Larkin Lowrey wrote:
>> (..)
>>> About once a week, or so, I'm running into the above situation where
>>> FS seems to deadlock. All IO to the FS blocks, there is no IO
>>> activity at all. I have to hard reboot the system to recover. There
>>> are no error indications except for the following which occurs well
>>> before the FS freezes up:
>>>
>>> BTRFS warning (device dm-3): block group 78691883286528 has wrong
>>> amount of free space
>>> BTRFS warning (device dm-3): failed to load free space cache for
>>> block group 78691883286528, rebuilding it now
>>>
>>> Do I have any options other than nuking the FS and starting over?
>>
>> Unmount cleanly & mount again with -o space_cache=v2.
>
> It froze while unmounting. The attached zip is a stack dump captured via
> 'echo t > /proc/sysrq-trigger'. A second attempt after a hard reboot
> worked.

The trace shows that the free space cache writeback code is indeed
causing the problem.
It may be a deadlock from nested tree locks taken by the extent
allocator and the free space writeback code.

To avoid the problem, you could disable the v1 free space cache
completely or switch to the v2 cache.
Chris Murphy's guide should be pretty good.

Personally speaking, if your use case is not performance critical, the
following features can be disabled to avoid possible bugs:

1) free space cache
   It only speeds up free space lookup.

2) tree log
   It only speeds up fsync() calls. Without it we just fall back to
   sync().

So I'd recommend the following mount options:
nospace_cache,notreelog

Thanks,
Qu

>
> --Larkin
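
As a rough sketch of how the mount options discussed above might be applied: the device path and mount point below are placeholders, not taken from the thread, so substitute your own. Run as root.

```shell
# Hypothetical device and mount point; adjust for your system.
DEV=/dev/mapper/example
MNT=/mnt/data

# Unmount cleanly first, then mount with the v1 free space cache
# and the tree log disabled, as recommended in the reply.
umount "$MNT"
mount -o nospace_cache,notreelog "$DEV" "$MNT"

# Alternative suggested earlier in the thread: switch to the v2
# free space cache instead of disabling the cache entirely.
# mount -o space_cache=v2 "$DEV" "$MNT"
```

To keep the options across reboots, the same comma-separated string would go into the options field of the filesystem's /etc/fstab entry.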