From: Yu Kuai <yukuai1@huaweicloud.com>
To: Yin Fengwei <fengwei.yin@intel.com>, Yu Kuai <yukuai1@huaweicloud.com>, kernel test robot <yujie.liu@intel.com>
Cc: lkp@lists.01.org, lkp@intel.com, Jens Axboe <axboe@kernel.dk>, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [LKP] Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
Date: Sun, 9 Oct 2022 14:14:05 +0800	[thread overview]
Message-ID: <e6d191e7-eda6-2511-1764-221b6f868cc8@huaweicloud.com> (raw)
In-Reply-To: <de2cc259-727a-3707-6738-1c5ab41075a2@intel.com>

Hi,

On 2022/10/09 13:47, Yin Fengwei wrote:
> Hi Kuai,
>
> On 10/8/22 16:00, Yu Kuai wrote:
>> Hi,
>>
>> On 2022/10/08 10:50, kernel test robot wrote:
>>> Greeting,
>>>
>>> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
>>
>> I don't know how this can happen, but I'm *sure* this commit won't
>> affect performance. Please take a look at the commit: only wbt
>> initialization is touched, which is done while creating the device:
>>
>> device_add_disk
>>  blk_register_queue
>>   wbt_enable_default
>>    wbt_init
>>
>> And the I/O path is the same with or without this commit.
>>
>> By the way, wbt should only throttle writes.
> Some information here:
> It looks like the line
> 	wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
> matters.
>
> If I move only this line back to its original position, on top of
> 8c5035dfbb, the regression is gone.
>
> If I move only this line before ret = rq_qos_add() (just like your
> patch did, but with only this line), on top of 8c5035dfbb, the
> regression can be reproduced.
>
Thanks for the information, but I still don't understand whether there
is any difference after wbt_init() is done, or how reads are affected
by wbt.
🙁
>
> Regards
> Yin, Fengwei
>
>>
>> Thanks,
>> Kuai
>>>
>>> commit: 8c5035dfbb9475b67c82b3fdb7351236525bf52b ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> in testcase: fio-basic
>>> on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 192G memory
>>> with following parameters:
>>>
>>> 	runtime: 300s
>>> 	nr_task: 8t
>>> 	disk: 1SSD
>>> 	fs: btrfs
>>> 	rw: randread
>>> 	bs: 2M
>>> 	ioengine: sync
>>> 	test_size: 256g
>>> 	cpufreq_governor: performance
>>>
>>> test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
>>> test-url: https://github.com/axboe/fio
>>>
>>>
>>> Details are as below:
>>>
>>> =========================================================================================
>>> bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
>>>   2M/gcc-11/performance/1SSD/btrfs/sync/x86_64-rhel-8.3/8t/debian-11.1-x86_64-20220510.cgz/300s/randread/lkp-csl-2ap4/256g/fio-basic
>>>
>>> commit:
>>>   f7de4886fe ("rnbd-srv: remove struct rnbd_dev")
>>>   8c5035dfbb ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
>>>
>>> f7de4886fe8f008a 8c5035dfbb9475b67c82b3fdb73
>>> ---------------- ---------------------------
>>>       %stddev     %change       %stddev
>>>           \          |              \
>>>      0.03 ±106%      +0.2       0.22 ± 80%  fio.latency_20ms%
>>>      0.02 ± 33%      -0.0       0.01 ± 12%  fio.latency_4ms%
>>>      2508           -10.6%      2243        fio.read_bw_MBps
>>>   6717440           +17.6%   7897088        fio.read_clat_90%_us
>>>   6892202           +19.0%   8202922        fio.read_clat_95%_us
>>>   7602176 ±  4%     +18.4%   9000277 ±  3%  fio.read_clat_99%_us
>>>   6374238           +11.8%   7127450        fio.read_clat_mean_us
>>>    363825 ± 10%     +74.9%    636378 ±  5%  fio.read_clat_stddev
>>>      1254           -10.6%      1121        fio.read_iops
>>>    104.97           +11.8%    117.32        fio.time.elapsed_time
>>>    104.97           +11.8%    117.32        fio.time.elapsed_time.max
>>>     13731            +5.6%     14498 ±  4%  fio.time.maximum_resident_set_size
>>>    116.00            -8.2%    106.50        fio.time.percent_of_cpu_this_job_got
>>> 1.998e+10           +11.4% 2.226e+10        cpuidle..time
>>>      3.27 ±  3%      +4.6%      3.42        iostat.cpu.iowait
>>>      4.49 ± 68%      -2.1       2.38 ±152%  perf-profile.children.cycles-pp.number
>>>      4.49 ± 68%      -2.5       1.98 ±175%  perf-profile.self.cycles-pp.number
>>>    557763            +5.4%    587781        proc-vmstat.pgfault
>>>     25488            +3.1%     26274        proc-vmstat.pgreuse
>>>   2459048           -10.1%   2209482        vmstat.io.bi
>>>    184649 ±  5%     -10.4%    165526 ±  7%  vmstat.system.cs
>>>    111733 ± 30%     +61.8%    180770 ± 21%  numa-meminfo.node0.AnonPages
>>>    113221 ± 30%     +60.2%    181416 ± 21%  numa-meminfo.node0.Inactive(anon)
>>>     11301 ± 24%    +164.5%     29888 ±117%  numa-meminfo.node2.Active(file)
>>>    104911 ± 39%     -80.5%     20456 ±100%  numa-meminfo.node3.AnonHugePages
>>>    131666 ± 27%     -67.9%     42297 ± 82%  numa-meminfo.node3.AnonPages
>>>    132698 ± 26%     -67.5%     43158 ± 81%  numa-meminfo.node3.Inactive(anon)
>>>     27934 ± 30%     +61.8%     45196 ± 21%  numa-vmstat.node0.nr_anon_pages
>>>     28306 ± 30%     +60.2%     45358 ± 21%  numa-vmstat.node0.nr_inactive_anon
>>>     28305 ± 30%     +60.2%     45357 ± 21%  numa-vmstat.node0.nr_zone_inactive_anon
>>>      6291 ± 24%     +68.0%     10567 ± 26%  numa-vmstat.node2.workingset_nodes
>>>     32925 ± 27%     -67.9%     10571 ± 82%  numa-vmstat.node3.nr_anon_pages
>>>     33182 ± 26%     -67.5%     10786 ± 81%  numa-vmstat.node3.nr_inactive_anon
>>>     33182 ± 26%     -67.5%     10786 ± 81%  numa-vmstat.node3.nr_zone_inactive_anon
>>>    161.78 ±  4%     -28.2%    116.10 ± 30%  sched_debug.cfs_rq:/.runnable_avg.avg
>>>    161.46 ±  4%     -28.2%    115.85 ± 30%  sched_debug.cfs_rq:/.util_avg.avg
>>>    426382           +11.0%    473345 ±  6%  sched_debug.cpu.clock.avg
>>>    426394           +11.0%    473357 ±  6%  sched_debug.cpu.clock.max
>>>    426370           +11.0%    473331 ±  6%  sched_debug.cpu.clock.min
>>>    426139           +10.9%    472586 ±  6%  sched_debug.cpu.clock_task.avg
>>>    426368           +11.0%    473130 ±  6%  sched_debug.cpu.clock_task.max
>>>    416196           +11.1%    462228 ±  6%  sched_debug.cpu.clock_task.min
>>>      1156 ±  7%     -10.8%      1031 ±  6%  sched_debug.cpu.curr->pid.stddev
>>>    426372           +11.0%    473334 ±  6%  sched_debug.cpu_clk
>>>    425355           +11.0%    472318 ±  6%  sched_debug.ktime
>>>    426826           +11.0%    473787 ±  6%  sched_debug.sched_clk
>>> 1.263e+09            -7.9% 1.164e+09 ±  3%  perf-stat.i.branch-instructions
>>>    190886 ±  5%     -10.8%    170290 ±  7%  perf-stat.i.context-switches
>>> 1.979e+09            -8.8% 1.804e+09 ±  2%  perf-stat.i.dTLB-loads
>>> 8.998e+08            -8.2% 8.257e+08 ±  2%  perf-stat.i.dTLB-stores
>>> 6.455e+09            -8.0% 5.938e+09 ±  3%  perf-stat.i.instructions
>>>     21.78            -8.4%     19.95        perf-stat.i.metric.M/sec
>>>   7045315 ±  4%     -14.0%   6057863 ±  6%  perf-stat.i.node-load-misses
>>>   2658563 ±  7%     -21.9%   2077647 ± 12%  perf-stat.i.node-loads
>>>    414822 ±  4%     -12.9%    361455 ±  3%  perf-stat.i.node-store-misses
>>> 1.251e+09            -7.8% 1.154e+09 ±  3%  perf-stat.ps.branch-instructions
>>>    189082 ±  5%     -10.7%    168849 ±  7%  perf-stat.ps.context-switches
>>>  1.96e+09            -8.8% 1.789e+09 ±  2%  perf-stat.ps.dTLB-loads
>>> 8.912e+08            -8.1% 8.187e+08 ±  2%  perf-stat.ps.dTLB-stores
>>> 6.393e+09            -7.9% 5.888e+09 ±  3%  perf-stat.ps.instructions
>>>   6978485 ±  4%     -13.9%   6006510 ±  6%  perf-stat.ps.node-load-misses
>>>   2633627 ±  7%     -21.8%   2060033 ± 12%  perf-stat.ps.node-loads
>>>    410822 ±  4%     -12.8%    358289 ±  3%  perf-stat.ps.node-store-misses
>>>
>>>
>>> If you fix the issue, kindly add following tag
>>> | Reported-by: kernel test robot <yujie.liu@intel.com>
>>> | Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.com
>>>
>>>
>>> To reproduce:
>>>
>>> 	git clone https://github.com/intel/lkp-tests.git
>>> 	cd lkp-tests
>>> 	sudo bin/lkp install job.yaml           # job file is attached in this email
>>> 	bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>>> 	sudo bin/lkp run generated-yaml-file
>>>
>>> 	# if come across any failure that blocks the test,
>>> 	# please remove ~/.lkp and /lkp dir to run from a clean state.
>>>
>>>
>>> Disclaimer:
>>> Results have been estimated based on internal Intel analysis and are provided
>>> for informational purposes only. Any difference in system hardware or software
>>> design or configuration may affect actual performance.
>>>
>>>
>> _______________________________________________
>> LKP mailing list -- lkp@lists.01.org
>> To unsubscribe send an email to lkp-leave@lists.01.org
>
> .
>