From: Yu Kuai <yukuai1@huaweicloud.com>
To: kernel test robot <yujie.liu@intel.com>
Cc: lkp@lists.01.org, lkp@intel.com, Jens Axboe <axboe@kernel.dk>,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	ying.huang@intel.com, feng.tang@intel.com,
	zhengjun.xing@linux.intel.com, fengwei.yin@intel.com,
	"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression
Date: Sat, 8 Oct 2022 16:00:10 +0800	[thread overview]
Message-ID: <d5279fc2-38b3-6d20-4404-604d5c7277e2@huaweicloud.com> (raw)
In-Reply-To: <202210081045.77ddf59b-yujie.liu@intel.com>

Hi,

On 2022/10/08 10:50, kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:

I don't know how this result was produced, but I'm *sure* this commit
shouldn't affect performance. Please take a look at the commit: it only
touches wbt initialization, which happens while the device is created:

device_add_disk
  blk_register_queue
    wbt_enable_default
      wbt_init

And the I/O path is identical with or without this commit.
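
For reference, here is a simplified sketch of what the commit changes in
wbt_init() (abridged from my reading of block/blk-wbt.c, not the literal
diff):

	int wbt_init(struct request_queue *q)
	{
		struct rq_wb *rwb;

		/* ... allocate rwb, set up rwb->rqos and the stats callback ... */

		rwb->min_lat_nsec = wbt_default_latency_nsec(q);
		wbt_queue_depth_changed(&rwb->rqos);	/* computes wb_normal */

		/*
		 * Before the commit, rq_qos_add() ran before the two lines
		 * above, so the I/O path could briefly observe the rq_qos
		 * with wb_normal still zero; the commit publishes it only
		 * after initialization is complete.
		 */
		rq_qos_add(q, &rwb->rqos);
		blk_stat_add_callback(q, rwb->cb);

		return 0;
	}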

By the way, wbt should only throttle writes, while this test is doing
random reads.
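
(For context, the throttling decision in block/blk-wbt.c looks roughly
like the following; this is a paraphrase of the mainline source at the
time, not an exact copy:)

	static bool wbt_should_throttle(struct bio *bio)
	{
		switch (bio_op(bio)) {
		case REQ_OP_WRITE:
			/* sync+idle (O_DIRECT-style) writes are not throttled */
			if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) ==
			    (REQ_SYNC | REQ_IDLE))
				return false;
			fallthrough;
		case REQ_OP_DISCARD:
			return true;
		default:
			/* reads are never throttled by wbt */
			return false;
		}
	}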

Thanks,
Kuai
> 
> commit: 8c5035dfbb9475b67c82b3fdb7351236525bf52b ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: fio-basic
> on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 192G memory
> with following parameters:
> 
> 	runtime: 300s
> 	nr_task: 8t
> 	disk: 1SSD
> 	fs: btrfs
> 	rw: randread
> 	bs: 2M
> 	ioengine: sync
> 	test_size: 256g
> 	cpufreq_governor: performance
> 
> test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
> test-url: https://github.com/axboe/fio
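> 
> For illustration, a standalone fio invocation approximating these
> parameters might be (the target directory is a placeholder; the actual
> job is generated by lkp from the attached job.yaml):
> 
>          fio --name=randread --directory=/mnt/btrfs-test \
>              --rw=randread --bs=2M --ioengine=sync --numjobs=8 \
>              --size=256g --runtime=300 --time_based --group_reporting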
> 
> 
> Details are as below:
> 
> =========================================================================================
> bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
>    2M/gcc-11/performance/1SSD/btrfs/sync/x86_64-rhel-8.3/8t/debian-11.1-x86_64-20220510.cgz/300s/randread/lkp-csl-2ap4/256g/fio-basic
> 
> commit:
>    f7de4886fe ("rnbd-srv: remove struct rnbd_dev")
>    8c5035dfbb ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
> 
> f7de4886fe8f008a 8c5035dfbb9475b67c82b3fdb73
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>        0.03 ±106%      +0.2        0.22 ± 80%  fio.latency_20ms%
>        0.02 ± 33%      -0.0        0.01 ± 12%  fio.latency_4ms%
>        2508           -10.6%       2243        fio.read_bw_MBps
>     6717440           +17.6%    7897088        fio.read_clat_90%_us
>     6892202           +19.0%    8202922        fio.read_clat_95%_us
>     7602176 ±  4%     +18.4%    9000277 ±  3%  fio.read_clat_99%_us
>     6374238           +11.8%    7127450        fio.read_clat_mean_us
>      363825 ± 10%     +74.9%     636378 ±  5%  fio.read_clat_stddev
>        1254           -10.6%       1121        fio.read_iops
>      104.97           +11.8%     117.32        fio.time.elapsed_time
>      104.97           +11.8%     117.32        fio.time.elapsed_time.max
>       13731            +5.6%      14498 ±  4%  fio.time.maximum_resident_set_size
>      116.00            -8.2%     106.50        fio.time.percent_of_cpu_this_job_got
>   1.998e+10           +11.4%  2.226e+10        cpuidle..time
>        3.27 ±  3%      +4.6%       3.42        iostat.cpu.iowait
>        4.49 ± 68%      -2.1        2.38 ±152%  perf-profile.children.cycles-pp.number
>        4.49 ± 68%      -2.5        1.98 ±175%  perf-profile.self.cycles-pp.number
>      557763            +5.4%     587781        proc-vmstat.pgfault
>       25488            +3.1%      26274        proc-vmstat.pgreuse
>     2459048           -10.1%    2209482        vmstat.io.bi
>      184649 ±  5%     -10.4%     165526 ±  7%  vmstat.system.cs
>      111733 ± 30%     +61.8%     180770 ± 21%  numa-meminfo.node0.AnonPages
>      113221 ± 30%     +60.2%     181416 ± 21%  numa-meminfo.node0.Inactive(anon)
>       11301 ± 24%    +164.5%      29888 ±117%  numa-meminfo.node2.Active(file)
>      104911 ± 39%     -80.5%      20456 ±100%  numa-meminfo.node3.AnonHugePages
>      131666 ± 27%     -67.9%      42297 ± 82%  numa-meminfo.node3.AnonPages
>      132698 ± 26%     -67.5%      43158 ± 81%  numa-meminfo.node3.Inactive(anon)
>       27934 ± 30%     +61.8%      45196 ± 21%  numa-vmstat.node0.nr_anon_pages
>       28306 ± 30%     +60.2%      45358 ± 21%  numa-vmstat.node0.nr_inactive_anon
>       28305 ± 30%     +60.2%      45357 ± 21%  numa-vmstat.node0.nr_zone_inactive_anon
>        6291 ± 24%     +68.0%      10567 ± 26%  numa-vmstat.node2.workingset_nodes
>       32925 ± 27%     -67.9%      10571 ± 82%  numa-vmstat.node3.nr_anon_pages
>       33182 ± 26%     -67.5%      10786 ± 81%  numa-vmstat.node3.nr_inactive_anon
>       33182 ± 26%     -67.5%      10786 ± 81%  numa-vmstat.node3.nr_zone_inactive_anon
>      161.78 ±  4%     -28.2%     116.10 ± 30%  sched_debug.cfs_rq:/.runnable_avg.avg
>      161.46 ±  4%     -28.2%     115.85 ± 30%  sched_debug.cfs_rq:/.util_avg.avg
>      426382           +11.0%     473345 ±  6%  sched_debug.cpu.clock.avg
>      426394           +11.0%     473357 ±  6%  sched_debug.cpu.clock.max
>      426370           +11.0%     473331 ±  6%  sched_debug.cpu.clock.min
>      426139           +10.9%     472586 ±  6%  sched_debug.cpu.clock_task.avg
>      426368           +11.0%     473130 ±  6%  sched_debug.cpu.clock_task.max
>      416196           +11.1%     462228 ±  6%  sched_debug.cpu.clock_task.min
>        1156 ±  7%     -10.8%       1031 ±  6%  sched_debug.cpu.curr->pid.stddev
>      426372           +11.0%     473334 ±  6%  sched_debug.cpu_clk
>      425355           +11.0%     472318 ±  6%  sched_debug.ktime
>      426826           +11.0%     473787 ±  6%  sched_debug.sched_clk
>   1.263e+09            -7.9%  1.164e+09 ±  3%  perf-stat.i.branch-instructions
>      190886 ±  5%     -10.8%     170290 ±  7%  perf-stat.i.context-switches
>   1.979e+09            -8.8%  1.804e+09 ±  2%  perf-stat.i.dTLB-loads
>   8.998e+08            -8.2%  8.257e+08 ±  2%  perf-stat.i.dTLB-stores
>   6.455e+09            -8.0%  5.938e+09 ±  3%  perf-stat.i.instructions
>       21.78            -8.4%      19.95        perf-stat.i.metric.M/sec
>     7045315 ±  4%     -14.0%    6057863 ±  6%  perf-stat.i.node-load-misses
>     2658563 ±  7%     -21.9%    2077647 ± 12%  perf-stat.i.node-loads
>      414822 ±  4%     -12.9%     361455 ±  3%  perf-stat.i.node-store-misses
>   1.251e+09            -7.8%  1.154e+09 ±  3%  perf-stat.ps.branch-instructions
>      189082 ±  5%     -10.7%     168849 ±  7%  perf-stat.ps.context-switches
>    1.96e+09            -8.8%  1.789e+09 ±  2%  perf-stat.ps.dTLB-loads
>   8.912e+08            -8.1%  8.187e+08 ±  2%  perf-stat.ps.dTLB-stores
>   6.393e+09            -7.9%  5.888e+09 ±  3%  perf-stat.ps.instructions
>     6978485 ±  4%     -13.9%    6006510 ±  6%  perf-stat.ps.node-load-misses
>     2633627 ±  7%     -21.8%    2060033 ± 12%  perf-stat.ps.node-loads
>      410822 ±  4%     -12.8%     358289 ±  3%  perf-stat.ps.node-store-misses
> 
> 
> If you fix the issue, kindly add the following tags:
> | Reported-by: kernel test robot <yujie.liu@intel.com>
> | Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.com
> 
> 
> To reproduce:
> 
>          git clone https://github.com/intel/lkp-tests.git
>          cd lkp-tests
>          sudo bin/lkp install job.yaml           # job file is attached in this email
>          bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>          sudo bin/lkp run generated-yaml-file
> 
>          # If you come across any failure that blocks the test,
>          # please remove the ~/.lkp and /lkp directories to run from a clean state.
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 


Thread overview:

2022-10-08  2:50 [blk] 8c5035dfbb: fio.read_iops -10.6% regression  kernel test robot
2022-10-08  8:00 ` Yu Kuai [this message]
2022-10-09  5:47   ` [LKP] " Yin Fengwei
2022-10-09  6:14     ` [LKP] " Yu Kuai
2022-10-09  8:43   ` Ming Lei
2022-10-09  9:32     ` Yu Kuai
2022-10-13  7:42       ` Feng Tang