On 2020/1/17 10:02 PM, David Sterba wrote:
> On Fri, Jan 17, 2020 at 08:54:35AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2020/1/16 10:29 PM, David Sterba wrote:
>>> On Wed, Jan 15, 2020 at 11:41:28AM +0800, Qu Wenruo wrote:
>>>> [BUG]
>>>> When there is a lot of metadata space reserved, e.g. after balancing a
>>>> data block group with many extents, vanilla df reports 0 available space.
>>>>
>>>> [CAUSE]
>>>> btrfs_statfs() reports 0 available space if its metadata space is
>>>> exhausted.
>>>> The calculation is based on currently reserved space vs on-disk
>>>> available space, with a small headroom as buffer.
>>>> When there is not enough headroom, btrfs_statfs() will report 0
>>>> available space.
>>>>
>>>> The problem is, since commit ef1317a1b9a3 ("btrfs: do not allow
>>>> reservations if we have pending tickets"), we allow btrfs to over-commit
>>>> metadata space, as long as we have enough space to allocate new metadata
>>>> chunks.
>>>>
>>>> This makes the old calculation unreliable and reports a false 0
>>>> available space.
>>>>
>>>> [FIX]
>>>> Don't do such a naive check in btrfs_statfs() anymore.
>>>> Also remove the comment about "0 available space when metadata is
>>>> exhausted".
>>>
>>> This is intentional and was added to prevent a situation where 'df'
>>> reports available space but exhausted metadata doesn't allow creating
>>> a new inode.
>>
>> But this behavior itself is not accurate.
>>
>> We have the global reservation, which is normally always larger than the
>> hardcoded 4M.
>
> The global block reserve is subtracted from the metadata accounted from
> the block groups. And after that, if there's only a little space left, the
> check triggers. Because at this point any new metadata reservation
> cannot be satisfied from the remaining space, yet there's >0 reported.

OK, then we need to do the over-commit calculation here instead of the
plain 4M headroom check.

The quick solution I can think of would be to go back to Josef's
solution of exporting can_overcommit() to do the calculation.

But my biggest question is, do we really need to go through all this
hassle?

My argument is that other filesystems like ext4/xfs still have their
inode number limits, and they don't report 0 avail when those get
exhausted.
(Although statfs() does have such a reporting mechanism for them.)

If it's a different resource making us unable to write data, I believe
it should be reported in a different way.

Thanks,
Qu

>
>> So that check will never really be triggered.
>>
>> Thus invalidating most of your argument.
>
> Please read the current comment and code in statfs again.
>
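
P.S. To make the two calculations concrete, below is a minimal userspace
sketch of the difference, not the real kernel code. The struct, the helper
names and all of the numbers are made up for illustration; the real logic
lives in btrfs_statfs() in fs/btrfs/super.c and in the can_overcommit()
helper, and works on struct btrfs_space_info and the global block reserve.

/*
 * Userspace model of the two checks discussed above.  All sizes are in
 * bytes; nothing here is the actual btrfs implementation.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SZ_4M (4ULL * 1024 * 1024)

struct meta_space {
    uint64_t total;        /* metadata bytes in allocated chunks */
    uint64_t used;         /* metadata bytes used or reserved */
    uint64_t global_rsv;   /* global block reserve size */
    uint64_t unallocated;  /* raw device bytes not in any chunk */
};

/*
 * Current statfs-style check: free metadata minus the global reserve must
 * leave roughly 4M of headroom, otherwise report 0 available space.
 */
static bool statfs_reports_zero(const struct meta_space *m)
{
    uint64_t free_meta = m->total - m->used;

    return free_meta < m->global_rsv + SZ_4M;
}

/*
 * Over-commit-style check (roughly what exporting can_overcommit() would
 * give us): even if the allocated metadata chunks are nearly full, new
 * reservations can still be satisfied as long as a new metadata chunk can
 * be allocated from unallocated device space.
 */
static bool overcommit_reports_zero(const struct meta_space *m)
{
    uint64_t free_meta = m->total - m->used;

    if (free_meta >= m->global_rsv + SZ_4M)
        return false;
    /* Assume, for this model only, a new metadata chunk needs 256M raw. */
    return m->unallocated < 256ULL * 1024 * 1024;
}

int main(void)
{
    /* Nearly full metadata chunks, but plenty of unallocated space. */
    struct meta_space m = {
        .total = 1024ULL * 1024 * 1024,
        .used = 1000ULL * 1024 * 1024,
        .global_rsv = 512ULL * 1024 * 1024,
        .unallocated = 10ULL * 1024 * 1024 * 1024,
    };

    printf("statfs check reports zero:      %d\n", statfs_reports_zero(&m));
    printf("over-commit check reports zero: %d\n", overcommit_reports_zero(&m));
    return 0;
}

With these made-up numbers the first check reports 0 available while the
second one does not, because there is still plenty of unallocated space to
grow metadata into, which is exactly the disagreement in this thread.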