On 2019/10/8 5:14 PM, Johannes Thumshirn wrote:
>> [[Benchmark]]
>> Since I have upgraded my rig to all-NVMe storage, there is no HDD
>> test result.
>>
>> Physical device: NVMe SSD
>> VM device: VirtIO block device, backed by a sparse file
>> Nodesize: 4K (to bump up tree height)
>> Extent data size: 4M
>> Fs size used: 1T
>>
>> All file extents on disk are 4M in size, preallocated to reduce space
>> usage (as the VM uses a loopback block device backed by a sparse file).
>
> Do you have some additional details about the test setup? I tried to
> do the same (testing) for a bug Felix (added to Cc) reported to me at
> the ALPSS Conference, and I couldn't reproduce the issue.
>
> My testing was a 100TB sparse file passed into a VM and running this
> script to touch all block groups:

Here is my test script:
---
#!/bin/bash

dev="/dev/vdb"
mnt="/mnt/btrfs"

nr_subv=16
nr_extents=16384
extent_size=$((4 * 1024 * 1024))	# 4M

_fail() {
	echo "!!! FAILED: $* !!!"
	exit 1
}

# Create one subvolume and preallocate $nr_extents 4M extents in it,
# one file extent (and thus one EXTENT_ITEM) at a time.
fill_one_subv() {
	path=$1
	if [ -z "$path" ]; then
		_fail "wrong parameter for fill_one_subv"
	fi

	btrfs subv create "$path" || _fail "create subv"
	for i in $(seq 0 $((nr_extents - 1))); do
		fallocate -o $((i * extent_size)) -l $extent_size "$path/file" || _fail "fallocate"
	done
}

declare -a pids

umount $mnt &> /dev/null
umount $dev &> /dev/null

#~/btrfs-progs/mkfs.btrfs -f -n 4k $dev -O bg-tree
mkfs.btrfs -f -n 4k $dev
mount $dev $mnt -o nospace_cache

# Fill all subvolumes in parallel, then wait for them all to finish.
for i in $(seq 1 $nr_subv); do
	fill_one_subv $mnt/subv_${i} &
	pids[$i]=$!
done

for i in $(seq 1 $nr_subv); do
	wait ${pids[$i]}
done
sync
umount $dev
---

>
> #!/bin/sh
>
> FILE=/mnt/test
>
> add_dirty_bg() {
>         off="$1"
>         len="$2"
>         touch $FILE
>         xfs_io -c "falloc $off $len" $FILE
>         rm $FILE
> }
>
> mkfs.btrfs /dev/vda
> mount /dev/vda /mnt
>
> for ((i = 1; i < 100000; i++)); do
>         add_dirty_bg $i"G" "1G"
> done

This won't really build a good enough extent tree layout to show the
problem. A 1G fallocate only creates 8 file extents of 128M each (128M
is the maximum size of a data extent), thus just 8 EXTENT_ITEMs, so a
leaf (16K nodesize by default) can still hold a lot of
BLOCK_GROUP_ITEMs together.

To build a case that really shows the problem, you need a lot of
EXTENT_ITEMs/METADATA_ITEMs to fill the gaps between the
BLOCK_GROUP_ITEMs.

My test script does that, but it may still not represent the real
world, as real-world usage can create even smaller extents due to
snapshots.

Thanks,
Qu

>
> umount /mnt
>
>
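
P.S. If you want to verify the resulting extent tree layout, something
like the sketch below can count how many EXTENT_ITEMs end up packed
around the BLOCK_GROUP_ITEMs (untested sketch; it assumes btrfs-progs'
dump-tree and the same /dev/vdb device as my script above, and should
be run against the unmounted device):

---
#!/bin/bash

dev="/dev/vdb"	# adjust to your test device
tmp=$(mktemp)

# Dump the extent tree once, then count leaves and item types.
# With the 4M-extent workload above, EXTENT_ITEMs should vastly
# outnumber BLOCK_GROUP_ITEMs, spreading the block group items
# across many different leaves.
btrfs inspect-internal dump-tree -t extent "$dev" > "$tmp"

echo "leaves:            $(grep -c '^leaf ' "$tmp")"
echo "BLOCK_GROUP_ITEMs: $(grep -c 'BLOCK_GROUP_ITEM' "$tmp")"
echo "EXTENT_ITEMs:      $(grep -c ' EXTENT_ITEM ' "$tmp")"

rm -f "$tmp"
---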