On 2019/12/5 12:39 AM, David Sterba wrote:
> On Tue, Dec 03, 2019 at 02:42:50PM +0800, Qu Wenruo wrote:
>> [PROBLEM]
>> There are quite a few users reporting that 'btrfs balance cancel' is slow
>> to cancel the currently running balance, or even doesn't work for certain
>> dead balance loops.
>>
>> The following script shows how long it takes to fully stop a
>> balance:
>> #!/bin/bash
>> dev=/dev/test/test
>> mnt=/mnt/btrfs
>>
>> umount $mnt &> /dev/null
>> umount $dev &> /dev/null
>>
>> mkfs.btrfs -f $dev
>> mount $dev -o nospace_cache $mnt
>>
>> dd if=/dev/zero bs=1M of=$mnt/large &
>> dd_pid=$!
>>
>> sleep 3
>> kill -KILL $dd_pid
>> sync
>>
>> btrfs balance start --bg --full $mnt &
>> sleep 1
>>
>> echo "cancel request" >> /dev/kmsg
>> time btrfs balance cancel $mnt
>> umount $mnt
>>
>> It takes around 7~10s to cancel the running balance in my test
>> environment.
>>
>> [CAUSE]
>> Btrfs uses btrfs_fs_info::balance_cancel_req to record how many cancel
>> requests are queued.
>> However, that cancel request is only checked after relocating a block
>> group.
>
> Yes that's the reason why it takes so long to cancel. Adding more
> cancellation points is fine, but I don't know what exactly happens when
> the block group relocation is not finished. There's code to merge the
> reloc inode and commit that, but that's only a high-level view of the
> thing.

When cancelled, we still merge the reloc roots with their sources (if
possible, as we still do the check for last_snapshot generation).

That means if balance is cancelled halfway, we still merge what has been
relocated, then do the regular cleanup (clean up the reloc tree).

I see no problem with doing faster cancelling here.
Or do you have any extra concerns?

Thanks,
Qu