On 2019/6/5 1:43 AM, Josef Bacik wrote:
> On Tue, Jun 04, 2019 at 08:31:23AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/6/4 1:36 AM, Josef Bacik wrote:
>>> On Mon, Jun 03, 2019 at 02:53:00PM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2019/2/13 12:03 AM, David Sterba wrote:
>>>>> On Thu, Jan 24, 2019 at 09:31:43AM -0500, Josef Bacik wrote:
>>>>>> Previously callers to btrfs_end_transaction_throttle() would commit the
>>>>>> transaction if there wasn't enough delayed refs space.  This happens in
>>>>>> relocation, and if the fs is relatively empty we'll run out of delayed
>>>>>> refs space basically immediately, so we'll just be stuck in this loop of
>>>>>> committing the transaction over and over again.
>>>>>>
>>>>>> This code existed because we didn't have a good feedback mechanism for
>>>>>> running delayed refs, but with the delayed refs rsv we do now.  Delete
>>>>>> this throttling code and let the btrfs_start_transaction() in relocation
>>>>>> deal with putting pressure on the delayed refs infrastructure.  With
>>>>>> this patch we no longer take 5 minutes to balance a metadata only fs.
>>>>>>
>>>>>> Signed-off-by: Josef Bacik
>>>>>
>>>>> For the record, this has been merged to 5.0-rc5.
>>>>>
>>>>
>>>> Bisecting leads me to this patch for a strange balance ENOSPC.
>>>>
>>>> It can be reproduced by btrfs/156, or the following small script:
>>>> ------
>>>> #!/bin/bash
>>>> dev="/dev/test/test"
>>>> mnt="/mnt/btrfs"
>>>>
>>>> _fail()
>>>> {
>>>> 	echo "!!! FAILED: $@ !!!"
>>>> 	exit 1
>>>> }
>>>>
>>>> do_work()
>>>> {
>>>> 	umount $dev &> /dev/null
>>>> 	umount $mnt &> /dev/null
>>>>
>>>> 	mkfs.btrfs -b 1G -m single -d single $dev -f > /dev/null
>>>>
>>>> 	mount $dev $mnt
>>>>
>>>> 	for i in $(seq -w 0 511); do
>>>> 		# xfs_io -f -c "falloc 0 1m" $mnt/file_$i > /dev/null
>>>> 		xfs_io -f -c "pwrite 0 1m" $mnt/inline_$i > /dev/null
>>>> 	done
>>>> 	sync
>>>>
>>>> 	btrfs balance start --full $mnt || return 1
>>>> 	sync
>>>>
>>>> 	btrfs balance start --full $mnt || return 1
>>>> 	umount $mnt
>>>> }
>>>>
>>>> failed=0
>>>> for i in $(seq -w 0 24); do
>>>> 	echo "=== run $i ==="
>>>> 	do_work
>>>> 	if [ $? -eq 1 ]; then
>>>> 		failed=$(($failed + 1))
>>>> 	fi
>>>> done
>>>> if [ $failed -ne 0 ]; then
>>>> 	echo "!!! failed $failed/25 !!!"
>>>> else
>>>> 	echo "=== all passes ==="
>>>> fi
>>>> ------
>>>>
>>>> For v4.20, it fails at a rate of around 0/25 ~ 2/25 (very rare).
>>>> But with this patch (upstream commit
>>>> 302167c50b32e7fccc98994a91d40ddbbab04e52), the failure rate rises to
>>>> 25/25.
>>>>
>>>> Any idea about that ENOSPC problem?
>>>> It looks really weird for the 2nd full balance to fail even when we
>>>> have enough unallocated space.
>>>>
>>>
>>> I've been running this all morning on kdave's misc-next and not had a single
>>> failure.  I ran it a few times on spinning rust and a few times on my nvme
>>> drive.  I wouldn't doubt that it's failing for you, but I can't reproduce.  It
>>> would be helpful to know where the ENOSPC was coming from so I can think of
>>> where the problem might be.  Thanks,
>>>
>>> Josef
>>>
>>
>> Since v5.2-rc2 has a lot of enospc debug output merged, here is the
>> debug info, obtained just by mounting with enospc_debug:
>>
>
> Ah ok, sorry, I'm travelling so I can't easily test a patch right now,
> but change the btrfs_join_transaction() in btrfs_inc_block_group_ro to
> btrfs_start_transaction(root, 0).  This will trigger the delayed ref
> flushing if we need it and likely will fix the problem.

Unfortunately, it doesn't work as expected.
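For reference, the suggested change amounts to roughly the following; the context is sketched from memory of the v5.2-era fs/btrfs/extent-tree.c, and the root argument shown (fs_info->extent_root) is my assumption about what the call site used, so treat this as an illustration rather than a tested patch:

```diff
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ btrfs_inc_block_group_ro() (context approximate)
-	trans = btrfs_join_transaction(fs_info->extent_root);
+	trans = btrfs_start_transaction(fs_info->extent_root, 0);
```

The idea is that btrfs_start_transaction() goes through the reservation path and can flush delayed refs under pressure, while btrfs_join_transaction() simply attaches to the running transaction without reserving anything.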
False ENOSPC still gets triggered.

The same problem remains for the system chunk: pinned/reserved/may_use
are all zero, and the min_allocable floor of 1M alone makes the check
fail.

For metadata, btrfs_start_transaction(root, 0) doesn't solve the
problem either, as reserved and may_use are still relatively high.

I'll dig further to find a different way to solve it.

Thanks,
Qu

> There's so much random cruft built into the relocation enospc stuff
> that we're likely to keep finding problems like this; we just need to
> rework it so it goes through the normal reservation path.  Thanks,
>
> Josef
>