From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============1148427481888897552==" MIME-Version: 1.0 From: Yu Kuai To: lkp@lists.01.org Subject: Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression Date: Sat, 08 Oct 2022 16:00:10 +0800 Message-ID: In-Reply-To: <202210081045.77ddf59b-yujie.liu@intel.com> List-Id: --===============1148427481888897552== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Hi, =E5=9C=A8 2022/10/08 10:50, kernel test robot =E5=86=99=E9=81=93: > Greeting, > = > FYI, we noticed a -10.6% regression of fio.read_iops due to commit: I don't know how this is working but I'm *sure* this commit won't affect performance. Please take a look at the commit, only wbt initialization is touched, which is done while creating the device: device_add_disk blk_register_queue wbt_enable_default wbt_init And io path is the same with or without this commit. By the way, wbt should only work for write. Thanks, Kuai > = > commit: 8c5035dfbb9475b67c82b3fdb7351236525bf52b ("blk-wbt: call rq_qos_a= dd() after wb_normal is initialized") > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > = > in testcase: fio-basic > on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU= @ 2.30GHz (Cascade Lake) with 192G memory > with following parameters: > = > runtime: 300s > nr_task: 8t > disk: 1SSD > fs: btrfs > rw: randread > bs: 2M > ioengine: sync > test_size: 256g > cpufreq_governor: performance > = > test-description: Fio is a tool that will spawn a number of threads or pr= ocesses doing a particular type of I/O action as specified by the user. > test-url: https://github.com/axboe/fio > = > = > Details are as below: > = > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runt= ime/rw/tbox_group/test_size/testcase: > 2M/gcc-11/performance/1SSD/btrfs/sync/x86_64-rhel-8.3/8t/debian-11.1-x= 86_64-20220510.cgz/300s/randread/lkp-csl-2ap4/256g/fio-basic > = > commit: > f7de4886fe ("rnbd-srv: remove struct rnbd_dev") > 8c5035dfbb ("blk-wbt: call rq_qos_add() after wb_normal is initialized= ") > = > f7de4886fe8f008a 8c5035dfbb9475b67c82b3fdb73 > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 0.03 =C2=B1106% +0.2 0.22 =C2=B1 80% fio.latency_20ms% > 0.02 =C2=B1 33% -0.0 0.01 =C2=B1 12% fio.latency_4ms% > 2508 -10.6% 2243 fio.read_bw_MBps > 6717440 +17.6% 7897088 fio.read_clat_90%_us > 6892202 +19.0% 8202922 fio.read_clat_95%_us > 7602176 =C2=B1 4% +18.4% 9000277 =C2=B1 3% fio.read_clat_99= %_us > 6374238 +11.8% 7127450 fio.read_clat_mean_us > 363825 =C2=B1 10% +74.9% 636378 =C2=B1 5% fio.read_clat_st= ddev > 1254 -10.6% 1121 fio.read_iops > 104.97 +11.8% 117.32 fio.time.elapsed_time > 104.97 +11.8% 117.32 fio.time.elapsed_time.max > 13731 +5.6% 14498 =C2=B1 4% fio.time.maximum_resi= dent_set_size > 116.00 -8.2% 106.50 fio.time.percent_of_cpu_th= is_job_got > 1.998e+10 +11.4% 2.226e+10 cpuidle..time > 3.27 =C2=B1 3% +4.6% 3.42 iostat.cpu.iowait > 4.49 =C2=B1 68% -2.1 2.38 =C2=B1152% perf-profile.chi= ldren.cycles-pp.number > 4.49 =C2=B1 68% -2.5 1.98 =C2=B1175% perf-profile.sel= f.cycles-pp.number > 557763 +5.4% 587781 proc-vmstat.pgfault > 25488 +3.1% 26274 proc-vmstat.pgreuse > 2459048 -10.1% 2209482 vmstat.io.bi > 184649 =C2=B1 5% -10.4% 165526 =C2=B1 7% vmstat.system.cs > 111733 =C2=B1 30% +61.8% 180770 =C2=B1 21% numa-meminfo.nod= e0.AnonPages > 113221 =C2=B1 30% +60.2% 181416 =C2=B1 21% numa-meminfo.nod= e0.Inactive(anon) > 11301 =C2=B1 24% +164.5% 29888 =C2=B1117% numa-meminfo.nod= e2.Active(file) > 104911 =C2=B1 39% -80.5% 20456 =C2=B1100% numa-meminfo.nod= e3.AnonHugePages > 131666 =C2=B1 27% -67.9% 42297 =C2=B1 82% numa-meminfo.nod= e3.AnonPages > 132698 =C2=B1 26% -67.5% 43158 =C2=B1 81% numa-meminfo.nod= e3.Inactive(anon) > 27934 =C2=B1 30% +61.8% 45196 =C2=B1 21% numa-vmstat.node= 0.nr_anon_pages > 28306 =C2=B1 30% +60.2% 45358 =C2=B1 21% numa-vmstat.node= 0.nr_inactive_anon > 28305 =C2=B1 30% +60.2% 45357 =C2=B1 21% numa-vmstat.node= 0.nr_zone_inactive_anon > 6291 =C2=B1 24% +68.0% 10567 =C2=B1 26% numa-vmstat.node= 2.workingset_nodes > 32925 =C2=B1 27% -67.9% 10571 =C2=B1 82% numa-vmstat.node= 3.nr_anon_pages > 33182 =C2=B1 26% -67.5% 10786 =C2=B1 81% numa-vmstat.node= 3.nr_inactive_anon > 33182 =C2=B1 26% -67.5% 10786 =C2=B1 81% numa-vmstat.node= 3.nr_zone_inactive_anon > 161.78 =C2=B1 4% -28.2% 116.10 =C2=B1 30% sched_debug.cfs_= rq:/.runnable_avg.avg > 161.46 =C2=B1 4% -28.2% 115.85 =C2=B1 30% sched_debug.cfs_= rq:/.util_avg.avg > 426382 +11.0% 473345 =C2=B1 6% sched_debug.cpu.clock= .avg > 426394 +11.0% 473357 =C2=B1 6% sched_debug.cpu.clock= .max > 426370 +11.0% 473331 =C2=B1 6% sched_debug.cpu.clock= .min > 426139 +10.9% 472586 =C2=B1 6% sched_debug.cpu.clock= _task.avg > 426368 +11.0% 473130 =C2=B1 6% sched_debug.cpu.clock= _task.max > 416196 +11.1% 462228 =C2=B1 6% sched_debug.cpu.clock= _task.min > 1156 =C2=B1 7% -10.8% 1031 =C2=B1 6% sched_debug.cpu.= curr->pid.stddev > 426372 +11.0% 473334 =C2=B1 6% sched_debug.cpu_clk > 425355 +11.0% 472318 =C2=B1 6% sched_debug.ktime > 426826 +11.0% 473787 =C2=B1 6% sched_debug.sched_clk > 1.263e+09 -7.9% 1.164e+09 =C2=B1 3% perf-stat.i.branch-in= structions > 190886 =C2=B1 5% -10.8% 170290 =C2=B1 7% perf-stat.i.cont= ext-switches > 1.979e+09 -8.8% 1.804e+09 =C2=B1 2% perf-stat.i.dTLB-loads > 8.998e+08 -8.2% 8.257e+08 =C2=B1 2% perf-stat.i.dTLB-stor= es > 6.455e+09 -8.0% 5.938e+09 =C2=B1 3% perf-stat.i.instructi= ons > 21.78 -8.4% 19.95 perf-stat.i.metric.M/sec > 7045315 =C2=B1 4% -14.0% 6057863 =C2=B1 6% perf-stat.i.node= -load-misses > 2658563 =C2=B1 7% -21.9% 2077647 =C2=B1 12% perf-stat.i.node= -loads > 414822 =C2=B1 4% -12.9% 361455 =C2=B1 3% perf-stat.i.node= -store-misses > 1.251e+09 -7.8% 1.154e+09 =C2=B1 3% perf-stat.ps.branch-i= nstructions > 189082 =C2=B1 5% -10.7% 168849 =C2=B1 7% perf-stat.ps.con= text-switches > 1.96e+09 -8.8% 1.789e+09 =C2=B1 2% perf-stat.ps.dTLB-loa= ds > 8.912e+08 -8.1% 8.187e+08 =C2=B1 2% perf-stat.ps.dTLB-sto= res > 6.393e+09 -7.9% 5.888e+09 =C2=B1 3% perf-stat.ps.instruct= ions > 6978485 =C2=B1 4% -13.9% 6006510 =C2=B1 6% perf-stat.ps.nod= e-load-misses > 2633627 =C2=B1 7% -21.8% 2060033 =C2=B1 12% perf-stat.ps.nod= e-loads > 410822 =C2=B1 4% -12.8% 358289 =C2=B1 3% perf-stat.ps.nod= e-store-misses > = > = > If you fix the issue, kindly add following tag > | Reported-by: kernel test robot > | Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu(a)intel= .com > = > = > To reproduce: > = > git clone https://github.com/intel/lkp-tests.git > cd lkp-tests > sudo bin/lkp install job.yaml # job file is attached i= n this email > bin/lkp split-job --compatible job.yaml # generate the yaml file= for lkp run > sudo bin/lkp run generated-yaml-file > = > # if come across any failure that blocks the test, > # please remove ~/.lkp and /lkp dir to run from a clean state. > = > = > Disclaimer: > Results have been estimated based on internal Intel analysis and are prov= ided > for informational purposes only. Any difference in system hardware or sof= tware > design or configuration may affect actual performance. > = >=20 --===============1148427481888897552==--