On 2020/1/17 10:02 PM, David Sterba wrote:
> On Fri, Jan 17, 2020 at 08:54:35AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2020/1/16 10:29 PM, David Sterba wrote:
>>> On Wed, Jan 15, 2020 at 11:41:28AM +0800, Qu Wenruo wrote:
>>>> [BUG]
>>>> When there is a lot of metadata space reserved, e.g. after balancing a
>>>> data block group with many extents, vanilla df reports 0 available space.
>>>>
>>>> [CAUSE]
>>>> btrfs_statfs() reports 0 available space if its metadata space is
>>>> exhausted.
>>>> The calculation is based on currently reserved space vs on-disk
>>>> available space, with a small headroom as buffer.
>>>> When there is not enough headroom, btrfs_statfs() will report 0
>>>> available space.
>>>>
>>>> The problem is, since commit ef1317a1b9a3 ("btrfs: do not allow
>>>> reservations if we have pending tickets"), we allow btrfs to over-commit
>>>> metadata space, as long as we have enough space to allocate new metadata
>>>> chunks.
>>>>
>>>> This makes the old calculation unreliable and reports a false 0
>>>> available space.
>>>>
>>>> [FIX]
>>>> Don't do such a naive check in btrfs_statfs() anymore.
>>>> Also remove the comment about "0 available space when metadata is
>>>> exhausted".
>>>
>>> This is intentional and was added to prevent a situation where 'df'
>>> reports available space but exhausted metadata doesn't allow creating
>>> a new inode.
>>
>> But this behavior itself is not accurate.
>>
>> We have the global reservation, which is normally always larger than the
>> hardcoded 4M.
>
> The global block reserve is subtracted from the metadata accounted from
> the block groups. And after that, if there's only a little space left, the
> check triggers. Because at this point any new metadata reservation
> cannot be satisfied from the remaining space, yet there's >0 reported.

OK, then we need to do the over-commit calculation here instead of the
plain 4M headroom check.

The quick solution I can think of would be to go back to Josef's
solution of exporting can_overcommit() to do the calculation.

But my biggest question is, do we really need to go through all this
hassle?

My argument is that other filesystems like ext4/xfs still have their
inode number limits, and they don't report 0 avail when those get
exhausted.
(Although statfs() does have such a reporting mechanism for them.)

If it's a different resource making us unable to write data, I believe
it should be reported in a different way.

Thanks,
Qu

>
>> So that check will never really be triggered.
>>
>> Thus invalidating most of your argument.
>
> Please read the current comment and code in statfs again.
>
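
P.S. To make the two calculations concrete, below is a minimal userspace
sketch of the difference, not the real kernel code. The struct, the helper
names and all of the numbers are made up for illustration; the real logic
lives in btrfs_statfs() in fs/btrfs/super.c and in the can_overcommit()
helper, and works on struct btrfs_space_info and the global block reserve.

/*
 * Userspace model of the two checks discussed above.  All sizes are in
 * bytes; nothing here is the actual btrfs implementation.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SZ_4M (4ULL * 1024 * 1024)

struct meta_space {
    uint64_t total;        /* metadata bytes in allocated chunks */
    uint64_t used;         /* metadata bytes used or reserved */
    uint64_t global_rsv;   /* global block reserve size */
    uint64_t unallocated;  /* raw device bytes not in any chunk */
};

/*
 * Current statfs-style check: free metadata minus the global reserve must
 * leave roughly 4M of headroom, otherwise report 0 available space.
 */
static bool statfs_reports_zero(const struct meta_space *m)
{
    uint64_t free_meta = m->total - m->used;

    return free_meta < m->global_rsv + SZ_4M;
}

/*
 * Over-commit-style check (roughly what exporting can_overcommit() would
 * give us): even if the allocated metadata chunks are nearly full, new
 * reservations can still be satisfied as long as a new metadata chunk can
 * be allocated from unallocated device space.
 */
static bool overcommit_reports_zero(const struct meta_space *m)
{
    uint64_t free_meta = m->total - m->used;

    if (free_meta >= m->global_rsv + SZ_4M)
        return false;
    /* Assume, for this model only, a new metadata chunk needs 256M raw. */
    return m->unallocated < 256ULL * 1024 * 1024;
}

int main(void)
{
    /* Nearly full metadata chunks, but plenty of unallocated space. */
    struct meta_space m = {
        .total = 1024ULL * 1024 * 1024,
        .used = 1000ULL * 1024 * 1024,
        .global_rsv = 512ULL * 1024 * 1024,
        .unallocated = 10ULL * 1024 * 1024 * 1024,
    };

    printf("statfs check reports zero:      %d\n", statfs_reports_zero(&m));
    printf("over-commit check reports zero: %d\n", overcommit_reports_zero(&m));
    return 0;
}

With these made-up numbers the first check reports 0 available while the
second one does not, because there is still plenty of unallocated space to
grow metadata into, which is exactly the disagreement in this thread.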