On 2019/6/5 1:43 AM, Josef Bacik wrote:
> On Tue, Jun 04, 2019 at 08:31:23AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/6/4 1:36 AM, Josef Bacik wrote:
>>> On Mon, Jun 03, 2019 at 02:53:00PM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2019/2/13 12:03 AM, David Sterba wrote:
>>>>> On Thu, Jan 24, 2019 at 09:31:43AM -0500, Josef Bacik wrote:
>>>>>> Previously callers to btrfs_end_transaction_throttle() would commit the
>>>>>> transaction if there wasn't enough delayed refs space.  This happens in
>>>>>> relocation, and if the fs is relatively empty we'll run out of delayed
>>>>>> refs space basically immediately, so we'll just be stuck in this loop of
>>>>>> committing the transaction over and over again.
>>>>>>
>>>>>> This code existed because we didn't have a good feedback mechanism for
>>>>>> running delayed refs, but with the delayed refs rsv we do now.  Delete
>>>>>> this throttling code and let the btrfs_start_transaction() in relocation
>>>>>> deal with putting pressure on the delayed refs infrastructure.  With
>>>>>> this patch we no longer take 5 minutes to balance a metadata only fs.
>>>>>>
>>>>>> Signed-off-by: Josef Bacik
>>>>>
>>>>> For the record, this has been merged to 5.0-rc5.
>>>>>
>>>>
>>>> Bisecting leads me to this patch for a strange balance ENOSPC.
>>>>
>>>> It can be reproduced by btrfs/156, or the following small script:
>>>> ------
>>>> #!/bin/bash
>>>> dev="/dev/test/test"
>>>> mnt="/mnt/btrfs"
>>>>
>>>> _fail()
>>>> {
>>>> 	echo "!!! FAILED: $@ !!!"
>>>> 	exit 1
>>>> }
>>>>
>>>> do_work()
>>>> {
>>>> 	umount $dev &> /dev/null
>>>> 	umount $mnt &> /dev/null
>>>>
>>>> 	mkfs.btrfs -b 1G -m single -d single $dev -f > /dev/null
>>>>
>>>> 	mount $dev $mnt
>>>>
>>>> 	for i in $(seq -w 0 511); do
>>>> 		# xfs_io -f -c "falloc 0 1m" $mnt/file_$i > /dev/null
>>>> 		xfs_io -f -c "pwrite 0 1m" $mnt/inline_$i > /dev/null
>>>> 	done
>>>> 	sync
>>>>
>>>> 	btrfs balance start --full $mnt || return 1
>>>> 	sync
>>>>
>>>> 	btrfs balance start --full $mnt || return 1
>>>> 	umount $mnt
>>>> }
>>>>
>>>> failed=0
>>>> for i in $(seq -w 0 24); do
>>>> 	echo "=== run $i ==="
>>>> 	do_work
>>>> 	if [ $? -eq 1 ]; then
>>>> 		failed=$(($failed + 1))
>>>> 	fi
>>>> done
>>>> if [ $failed -ne 0 ]; then
>>>> 	echo "!!! failed $failed/25 !!!"
>>>> else
>>>> 	echo "=== all passes ==="
>>>> fi
>>>> ------
>>>>
>>>> For v4.20, it fails at a rate of around 0/25 ~ 2/25 (very rare).
>>>> But with this patch (upstream commit
>>>> 302167c50b32e7fccc98994a91d40ddbbab04e52), the failure rate rises to
>>>> 25/25.
>>>>
>>>> Any idea about that ENOSPC problem?
>>>> It looks really weird for the 2nd full balance to fail even when we
>>>> have enough unallocated space.
>>>>
>>>
>>> I've been running this all morning on kdave's misc-next and not had a single
>>> failure.  I ran it a few times on spinning rust and a few times on my nvme
>>> drive.  I wouldn't doubt that it's failing for you, but I can't reproduce.  It
>>> would be helpful to know where the ENOSPC was coming from so I can think of
>>> where the problem might be.  Thanks,
>>>
>>> Josef
>>>
>>
>> Since v5.2-rc2 has a lot of enospc debug output merged, here is the
>> debug info, obtained just by mounting with enospc_debug:
>>
>
> Ah ok, sorry, I'm travelling so I can't easily test a patch right now,
> but change the btrfs_join_transaction() in btrfs_inc_block_group_ro to
> btrfs_start_transaction(root, 0).  This will trigger the delayed ref
> flushing if we need it and likely will fix the problem.

Unfortunately, it doesn't work as expected.
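For reference, the suggested change amounts to roughly the following; the context is sketched from memory of the v5.2-era fs/btrfs/extent-tree.c, and the root argument shown (fs_info->extent_root) is my assumption about what the call site used, so treat this as an illustration rather than a tested patch:

```diff
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ btrfs_inc_block_group_ro() (context approximate)
-	trans = btrfs_join_transaction(fs_info->extent_root);
+	trans = btrfs_start_transaction(fs_info->extent_root, 0);
```

The idea is that btrfs_start_transaction() goes through the reservation path and can flush delayed refs under pressure, while btrfs_join_transaction() simply attaches to the running transaction without reserving anything.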
False ENOSPC still gets triggered.

The same problem remains for the system chunk: pinned/reserved/may_use
are all zero, and the min_allocable floor of 1M alone makes the check
fail.

For metadata, btrfs_start_transaction(root, 0) doesn't solve the
problem either, as reserved and may_use are still relatively high.

I'll dig further to find a different way to solve it.

Thanks,
Qu

> There's so much random cruft built into the relocation enospc stuff
> that we're likely to keep finding problems like this; we just need to
> rework it so it goes through the normal reservation path.  Thanks,
>
> Josef
>