On 08/10/2019 11:26, Qu Wenruo wrote:
>
>
> On 2019/10/8 5:14 PM, Johannes Thumshirn wrote:
>>> [[Benchmark]]
>>> Since I have upgraded my rig to all-NVMe storage, there is no HDD
>>> test result.
>>>
>>> Physical device: NVMe SSD
>>> VM device: VirtIO block device, backed by a sparse file
>>> Nodesize: 4K (to bump up tree height)
>>> Extent data size: 4M
>>> Fs size used: 1T
>>>
>>> All file extents on disk are 4M in size, preallocated to reduce space
>>> usage (as the VM uses a loopback block device backed by a sparse file).
>>
>> Do you have some additional details about the test setup? I tried to do
>> the same (testing) for a bug Felix (added to Cc) reported to me at the
>> ALPSS Conference, and I couldn't reproduce the issue.
>>
>> My testing was done with a 100TB sparse file passed into a VM, running
>> this script to touch all block groups:
>
> Here is my test script:
> ---
> #!/bin/bash
>
> dev="/dev/vdb"
> mnt="/mnt/btrfs"
>
> nr_subv=16
> nr_extents=16384
> extent_size=$((4 * 1024 * 1024)) # 4M
>
> _fail()
> {
>         echo "!!! FAILED: $@ !!!"
>         exit 1
> }
>
> fill_one_subv()
> {
>         path=$1
>         if [ -z "$path" ]; then
>                 _fail "wrong parameter for fill_one_subv"
>         fi
>         btrfs subv create $path || _fail "create subv"
>
>         for i in $(seq 0 $((nr_extents - 1))); do
>                 fallocate -o $((i * $extent_size)) -l $extent_size \
>                         $path/file || _fail "fallocate"
>         done
> }
>
> declare -a pids
> umount $mnt &> /dev/null
> umount $dev &> /dev/null
>
> #~/btrfs-progs/mkfs.btrfs -f -n 4k $dev -O bg-tree
> mkfs.btrfs -f -n 4k $dev
> mount $dev $mnt -o nospace_cache
>
> for i in $(seq 1 $nr_subv); do
>         fill_one_subv $mnt/subv_${i} &
>         pids[$i]=$!
> done
>
> for i in $(seq 1 $nr_subv); do
>         wait ${pids[$i]}
> done
> sync
> umount $dev
>
> ---
>
>>
>> #!/bin/sh
>>
>> FILE=/mnt/test
>>
>> add_dirty_bg() {
>>         off="$1"
>>         len="$2"
>>         touch $FILE
>>         xfs_io -c "falloc $off $len" $FILE
>>         rm $FILE
>> }
>>
>> mkfs.btrfs /dev/vda
>> mount /dev/vda /mnt
>>
>> for ((i = 1; i < 100000; i++)); do
>>         add_dirty_bg $i"G" "1G"
>> done
>
> This won't really build a good enough extent tree layout.
>
> A 1G fallocate will only cause 8 128M file extents, thus 8 EXTENT_ITEMs.
>
> Thus a leaf (16K by default) can still contain a lot of BLOCK_GROUP_ITEMs
> all together.
>
> To build a case that really shows the problem, you'll need a lot of
> EXTENT_ITEMs/METADATA_ITEMs to fill the gaps between BLOCK_GROUPs.
>
> My test script did that, but it may still not represent the real world,
> as the real world can cause even smaller extents due to snapshots.
>

Ah, thanks for the explanation. I'll give your test script a try.

--
Johannes Thumshirn
SUSE Labs Filesystems
jthumshirn@suse.de
+49 911 74053 689
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
(HRB 247165, AG München)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
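
A rough way to check how the resulting extent tree actually looks (and whether
the BLOCK_GROUP_ITEMs really end up spread across many leaves) is sketched
below. This is not part of the thread: it assumes a btrfs-progs build that
provides "btrfs inspect-internal dump-tree", and the awk patterns are tuned to
the usual dump-tree output, so they may need adjusting for other versions. Run
it against the device after the fill script above has finished and unmounted it.

---
#!/bin/bash
# Sketch only: summarize the extent tree layout produced by the fill script.
# Assumes "btrfs inspect-internal dump-tree" is available; output format
# differences between btrfs-progs versions may require tweaking the patterns.

dev="/dev/vdb"   # same device as in Qu's fill script above (illustrative)

btrfs inspect-internal dump-tree -t extent "$dev" | awk '
        # leaf header lines start at column 0: "leaf <bytenr> items <n> ..."
        /^leaf /                                           { leaves++ }
        # item key lines: "item <n> key (<objectid> <type> <offset>) ..."
        /item [0-9]+ key \(.*BLOCK_GROUP_ITEM/             { bg++; bg_leaves[leaves] = 1 }
        /item [0-9]+ key \(.*(EXTENT_ITEM|METADATA_ITEM)/  { ext++ }
        END {
                for (l in bg_leaves)
                        spread++
                printf "extent tree leaves:                  %d\n", leaves
                printf "EXTENT_ITEM/METADATA_ITEM entries:   %d\n", ext
                printf "BLOCK_GROUP_ITEM entries:            %d\n", bg
                printf "leaves containing BLOCK_GROUP_ITEMs: %d\n", spread
        }'
---

With the layout from Qu's script (16 subvolumes x 16384 fallocated 4M extents,
i.e. 1T of data and on the order of 262144 EXTENT_ITEMs), consecutive
BLOCK_GROUP_ITEMs should be separated by many leaves, whereas the 1G-fallocate
workload leaves them packed into comparatively few leaves.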