regressions.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Bruno Damasceno Freire <bdamasceno@hotmail.com.br>
To: Thorsten Leemhuis <regressions@leemhuis.info>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"regressions@lists.linux.dev" <regressions@lists.linux.dev>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [regression] 5.15 kernel triggering 100x more inode evictions
Date: Mon, 4 Apr 2022 05:29:18 +0000	[thread overview]
Message-ID: <MN2PR20MB251205164078C367C3FC4166D2E59@MN2PR20MB2512.namprd20.prod.outlook.com> (raw)
In-Reply-To: <35b62998-e386-2032-5a7a-07e3413b3bc1@leemhuis.info>

On 28.03.22 13:28, Thorsten Leemhuis wrote:
> Hi Btrfs developers, this is your Linux kernel regression tracker.
> Top-posting for once, to make this easily accessible to everyone.
> 
> Are there any known regressions in 5.15 that cause more inode evictions?
> There is a user reporting such problems, see the msg quoted below, which
> is the last from this thread:
> 
> https://lore.kernel.org/all/MN2PR20MB2512314446801B92562E26B5D2169@MN2PR20MB2512.namprd20.prod.outlook.com/
> 
> @Bruno: sorry, you report is very hard to follow at least for me. Maybe
> the btrfs developers have a idea what's wrong here, but if not you
> likely need to make this easier for all us:
> 
> Write a really simple testcase and run it on those vanilla kernels that
> matter, which are in this case: The latest 5.17.y and 5.15.y releases to
> check if it was already solved -- and if not, on 5.14 and 5.15. Maybe
> you need to bisect the issue to bring us closer the the root of the
> problem. But with a bit of luck the btrfs developers might have an idea
> already.
> 
> Ciao, Thosten

Hi Thorsten,
Thanks for the feedback.
I tried my best to make it simple this time.
I hope it is, at least, better than before.
Grazie, Bruno

*** Test case about high inode eviction on the 5.15 kernel ***

This regression was first observed during rpm operations with specific packages that became A LOT slower to update ranging from 4 to 30 minutes [1].
The symptoms are: high cpu usage, high inode eviction and much slower I/O performance.
Analyzing the rpm's strace report and making some experiments I could replace the rpm with a script that do 3 thing:
- rename a file, unlink the renamed file and create a new file.

This test case is designed to trigger the following regression on the 5.15 kernel:
* repeated renameat2, unlink and openat system calls reach files with btrfs compression property enable.
* the combination of these system calls and the btrfs compression property triggers the high inode eviction.
* the high inode eviction causes to much work for the btrfs directory logging.
* the overloaded btrfs directory logging causes the slower I/O performance.

A simplified script is supplied.
For a more capable script, more information and more test results please refer to my github account [2].

[1] https://bugzilla.opensuse.org/show_bug.cgi?id=1193549
[2] https://github.com/bdamascen0/s3e

Index:
1.   Vanilla kernels
2.   k5.14.21 x k5.15.32
2.1. Detailed test report for compressed files
2.2. Comparison results for compressed files
2.3. Detailed test report for uncompressed files
2.4. Comparison results for uncompressed files
2.5. Cpu usage
3.   k5.17.1 and k5.16.15
3.1. Basic test report
4.   Simplified test script


1.   Vanilla kernels

This test case mainly covers the following vanilla kernels [3]: 5.15.32 and 5.14.21.
The 5.15.32 vanilla kernel produced abnormal results:
* high cpu usage, high inode eviction and much slower I/O performance for compressed files.
* double inode eviction and slightly worse I/O performance for uncompressed files.
The 5.14.21 vanilla kernel produced normal results and is used as a reference.
The 5.17.1 and 5.16.15 vanilla kernels [4] produced normal results which can be found at the end.

[3] https://wiki.ubuntu.com/Kernel/MainlineBuilds 
[4] https://software.opensuse.org/package/kernel-vanilla


2.   k5.14.21 x k5.15.32

2.1  Detailed test report for compressed files

ubuntu jammy jellyfish -- kernel 5.14.21 --- vanilla --- (kvm)
...updating   50 files on /mnt/inode-ev/zstd: Job took    226 ms @inode_evictions: 51
...updating   50 files on /mnt/inode-ev/lzo:  Job took    222 ms @inode_evictions: 51
...updating  100 files on /mnt/inode-ev/zstd: Job took    384 ms @inode_evictions: 101
...updating  100 files on /mnt/inode-ev/lzo:  Job took    462 ms @inode_evictions: 101
...updating  150 files on /mnt/inode-ev/zstd: Job took    493 ms @inode_evictions: 151
...updating  150 files on /mnt/inode-ev/lzo:  Job took    554 ms @inode_evictions: 151
...updating  200 files on /mnt/inode-ev/zstd: Job took    804 ms @inode_evictions: 201
...updating  200 files on /mnt/inode-ev/lzo:  Job took    725 ms @inode_evictions: 201
...updating  250 files on /mnt/inode-ev/zstd: Job took    745 ms @inode_evictions: 251
...updating  250 files on /mnt/inode-ev/lzo:  Job took    758 ms @inode_evictions: 251
...updating 1000 files on /mnt/inode-ev/zstd: Job took   3452 ms @inode_evictions: 1001
...updating 1000 files on /mnt/inode-ev/lzo:  Job took   2979 ms @inode_evictions: 1001
ubuntu jammy jellyfish -- kernel 5.15.32 --- vanilla --- (kvm)
...updating   50 files on /mnt/inode-ev/zstd: Job took    420 ms @inode_evictions: 1275
...updating   50 files on /mnt/inode-ev/lzo:  Job took    488 ms @inode_evictions: 1275
...updating  100 files on /mnt/inode-ev/zstd: Job took   1649 ms @inode_evictions: 5050
...updating  100 files on /mnt/inode-ev/lzo:  Job took   1566 ms @inode_evictions: 5050
...updating  150 files on /mnt/inode-ev/zstd: Job took   4448 ms @inode_evictions: 11325
...updating  150 files on /mnt/inode-ev/lzo:  Job took   4136 ms @inode_evictions: 11325
...updating  200 files on /mnt/inode-ev/zstd: Job took   9177 ms @inode_evictions: 20100
...updating  200 files on /mnt/inode-ev/lzo:  Job took   9070 ms @inode_evictions: 20100
...updating  250 files on /mnt/inode-ev/zstd: Job took  16191 ms @inode_evictions: 31375
...updating  250 files on /mnt/inode-ev/lzo:  Job took  16062 ms @inode_evictions: 31375
...updating 1000 files on /mnt/inode-ev/zstd: Job took 132865 ms @inode_evictions: 104195
...updating 1000 files on /mnt/inode-ev/lzo:  Job took 131979 ms @inode_evictions: 106639

2.2. Comparison results for compressed files

k5.15.32 vanilla (compared to: k5.14.21 vanilla)
50   files gives aprox.  1.8 x more time and aprox.  25 x more inode evictions 
100  files gives aprox.  3.3 x more time and aprox.  50 x more inode evictions 
150  files gives aprox.  7.4 x more time and aprox.  75 x more inode evictions 
200  files gives aprox. 11.4 x more time and aprox. 100 x more inode evictions 
250  files gives aprox. 21.1 x more time and aprox. 125 x more inode evictions 
1000 files gives aprox. 38.4 x more time and aprox. 100 x more inode evictions 

2.3  Detailed test report for uncompressed files

ubuntu jammy jellyfish -- kernel 5.14.21 --- vanilla --- (kvm)
...updating   50 files on /mnt/inode-ev/uncompressed: Job took  214 ms @inode_evictions: 51
...updating  100 files on /mnt/inode-ev/uncompressed: Job took  402 ms @inode_evictions: 101
...updating  150 files on /mnt/inode-ev/uncompressed: Job took  543 ms @inode_evictions: 151
...updating  200 files on /mnt/inode-ev/uncompressed: Job took  694 ms @inode_evictions: 201
...updating  250 files on /mnt/inode-ev/uncompressed: Job took  835 ms @inode_evictions: 251
...updating 1000 files on /mnt/inode-ev/uncompressed: Job took 3162 ms @inode_evictions: 1001
ubuntu jammy jellyfish -- kernel 5.15.32 --- vanilla --- (kvm)
...updating   50 files on /mnt/inode-ev/uncompressed: Job took  269 ms @inode_evictions: 99
...updating  100 files on /mnt/inode-ev/uncompressed: Job took  359 ms @inode_evictions: 199
...updating  150 files on /mnt/inode-ev/uncompressed: Job took  675 ms @inode_evictions: 299
...updating  200 files on /mnt/inode-ev/uncompressed: Job took  752 ms @inode_evictions: 399
...updating  250 files on /mnt/inode-ev/uncompressed: Job took 1149 ms @inode_evictions: 499
...updating 1000 files on /mnt/inode-ev/uncompressed: Job took 7333 ms @inode_evictions: 1999

2.4. Comparison results for uncompressed files

k5.15.32 vanilla (compared to: k5.14.21 vanilla)
50   files gives aprox. 1.2 x more time and aprox. 2 x more inode evictions 
100  files gives aprox. 0.8 x more time and aprox. 2 x more inode evictions 
150  files gives aprox. 1.2 x more time and aprox. 2 x more inode evictions 
200  files gives aprox. 1.0 x more time and aprox. 2 x more inode evictions 
250  files gives aprox. 1.3 x more time and aprox. 2 x more inode evictions 
1000 files gives aprox. 2.3 x more time and aprox. 2 x more inode evictions 

2.5. Cpu usage

ubuntu jammy jellyfish -- kernel 5.15.32 --- vanilla --- (kvm)
...updating 1000 files on /mnt/inode-ev/zstd:         Job took 132691 ms - real 2m12,731s sys 2m 7,134s
...updating 1000 files on /mnt/inode-ev/lzo:          Job took 134130 ms - real 2m14,149s sys 2m 8,447s
...updating 1000 files on /mnt/inode-ev/uncompressed: Job took   7241 ms - real 0m 7,256s sys 0m 4,732s


3    k5.17.1 and k5.16.15

Just for the record, the 5.16 kernel never reproduced the regression.
The real life workload to trigger the regression, the rpm package updates, were verified to work fine since 5.16~rc6 [1].
It was expected that the synthetic workload from the script also produced normal results on the 5.16 and 5.17 kernels.

[1] https://bugzilla.opensuse.org/show_bug.cgi?id=1193549


3.1  Basic test report

opensuse tumbleweed ----- kernel 5.16.15 --- vanilla --- (kvm)
...updating 250 files on /mnt/inode-ev/zstd:         Job took 910 ms @inode_evictions: 250
...updating 250 files on /mnt/inode-ev/lzo:          Job took 740 ms @inode_evictions: 250
...updating 250 files on /mnt/inode-ev/uncompressed: Job took 717 ms @inode_evictions: 250
opensuse tumbleweed ----- kernel 5.17.1 ---- vanilla --- (kvm)
...updating 250 files on /mnt/inode-ev/zstd:         Job took 701 ms @inode_evictions: 250
...updating 250 files on /mnt/inode-ev/lzo:          Job took 695 ms @inode_evictions: 250
...updating 250 files on /mnt/inode-ev/uncompressed: Job took 954 ms @inode_evictions: 250


4.   Simplified test script

This simplified script tries to setup, format and mount a ramdisk block device.
It creates 3 testing folders (zstd, lzo, uncompressed) and set its btrfs compression property.
For each time the script is executed, 3 tests are done and the bpftrace is started right before each test.

#!/bin/bash
# s3e_t3.sh (based on s3e.sh version 4.7)
# Simple Syscall Signature Emulator (test3)
# test3: populate + test. test renameat2/openat + unlink syscalls w/ empty files (3x)
# Copyright (c) 2022 Bruno Damasceno <bdamasceno@hotmail.com.br>
# Warning: no safety checks

dir1=zstd
dir2=lzo
dir3=uncompressed
DIR=zzz
NUM_FILES=250
DEV=/dev/ram0
MNT=/mnt/inode-ev
DIR_1="$MNT/$dir1"
DIR_2="$MNT/$dir2"
DIR_3="$MNT/$dir3"

populate() {
    DIR=$1
    echo "...populating 1st generation of files on $DIR:"
    for ((i = 1; i <= $NUM_FILES; i++)); do
        echo -n > $DIR/file_$i
    done
    }

run_test() {
    DIR=$1
    sync
    xfs_io -c "fsync" $DIR
    echo -e "\n...updating $NUM_FILES files on $DIR:"
    #dumb pause so bpftrace has time to atach its probe
    sleep 3s
    start=$(date +%s%N)
    for ((i = 1; i <= $NUM_FILES; i++)); do
        mv $DIR/file_$i $DIR/file_$i-RPMDELETE
        unlink $DIR/file_$i-RPMDELETE
        echo -n > $DIR/file_$i
        echo -n "_$i"
        [ $i != $NUM_FILES ] && echo -ne "\r"
    done
    end=$(date +%s%N)
    dur=$(( (end - start) / 1000000 ))
    echo -ne "\r"
    echo "Job took $dur milliseconds"
    }

modprobe brd rd_size=128000 max_part=1 rd_nr=1
mkfs.btrfs --label inode-ev --force $DEV > /dev/null
mkdir $MNT
mount $DEV $MNT
mkdir $MNT/{$dir1,$dir2,$dir3}
btrfs property set $DIR_1 compression zstd:1
btrfs property set $DIR_2 compression lzo
btrfs property set $DIR_3 compression none

for dir in "$DIR_1" "$DIR_2" "$DIR_3"
    do
        populate "$dir"
        bpftrace -e 'kprobe:btrfs_evict_inode { @inode_evictions = count(); }' & run_test "$dir"
        pkill bpftrace
        #dumb pause to wait the bpftrace report
        sleep 2s
    done

umount $MNT
rm --dir $MNT
rmmod brd

  reply	other threads:[~2022-04-04  5:29 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <MN2PR20MB2512314446801B92562E26B5D2169@MN2PR20MB2512.namprd20.prod.outlook.com>
2022-03-24 12:18 ` [regression] 5.15 kernel triggering 100x more inode evictions Thorsten Leemhuis
2022-03-28  1:29   ` Bruno Damasceno Freire
2022-03-28  3:12   ` Bruno Damasceno Freire
2022-03-28 11:28     ` Thorsten Leemhuis
2022-04-04  5:29       ` Bruno Damasceno Freire [this message]
2022-04-08 10:32         ` Thorsten Leemhuis
2022-04-08 14:52           ` David Sterba
2022-04-08 15:55             ` Filipe Manana
2022-04-08 16:50               ` Thorsten Leemhuis
2022-04-09 17:12                 ` Bruno Damasceno Freire
2022-04-10  7:27                   ` Thorsten Leemhuis
2022-04-13  0:15                     ` Nicholas D Steeves
2022-04-09 17:07               ` Bruno Damasceno Freire

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MN2PR20MB251205164078C367C3FC4166D2E59@MN2PR20MB2512.namprd20.prod.outlook.com \
    --to=bdamasceno@hotmail.com.br \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=regressions@leemhuis.info \
    --cc=regressions@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).