From: kernel test robot <oliver.sang@intel.com>
To: Raghavendra K T <raghavendra.kt@amd.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	Bharata B Rao <bharata@amd.com>, <linux-kernel@vger.kernel.org>,
	<ying.huang@intel.com>, <feng.tang@intel.com>,
	<fengwei.yin@intel.com>, <aubrey.li@linux.intel.com>,
	<yu.c.chen@intel.com>, <linux-mm@kvack.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Mel Gorman <mgorman@suse.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>, <rppt@kernel.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Aithal Srikanth <sraithal@amd.com>,
	"kernel test robot" <oliver.sang@intel.com>,
	Raghavendra K T <raghavendra.kt@amd.com>,
	Sapkal Swapnil <Swapnil.Sapkal@amd.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>
Subject: Re: [RFC PATCH V1 5/6] sched/numa: Allow recently accessed VMAs to be scanned
Date: Sun, 10 Sep 2023 23:29:28 +0800	[thread overview]
Message-ID: <202309102311.84b42068-oliver.sang@intel.com> (raw)
In-Reply-To: <109ca1ea59b9dd6f2daf7b7fbc74e83ae074fbdf.1693287931.git.raghavendra.kt@amd.com>



Hello,

kernel test robot noticed a -33.6% improvement of autonuma-benchmark.numa02.seconds on:


commit: af46f3c9ca2d16485912f8b9c896ef48bbfe1388 ("[RFC PATCH V1 5/6] sched/numa: Allow recently accessed VMAs to be scanned")
url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805
patch link: https://lore.kernel.org/all/109ca1ea59b9dd6f2daf7b7fbc74e83ae074fbdf.1693287931.git.raghavendra.kt@amd.com/
patch subject: [RFC PATCH V1 5/6] sched/numa: Allow recently accessed VMAs to be scanned

testcase: autonuma-benchmark
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

	iterations: 4x
	test: numa01_THREAD_ALLOC
	cpufreq_governor: performance



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230910/202309102311.84b42068-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark

commit: 
  167773d1dd ("sched/numa: Increase tasks' access history")
  af46f3c9ca ("sched/numa: Allow recently accessed VMAs to be scanned")

167773d1ddb5ffdd af46f3c9ca2d16485912f8b9c89 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 2.534e+10 ± 10%     -13.0%  2.204e+10 ±  7%  cpuidle..time
  26431366 ± 10%     -13.2%   22948978 ±  7%  cpuidle..usage
      0.15 ±  4%      -0.0        0.12 ±  3%  mpstat.cpu.all.soft%
      2.92 ±  3%      +0.4        3.32 ±  4%  mpstat.cpu.all.sys%
      2243 ±  2%     -12.7%       1957 ±  3%  uptime.boot
     29811 ±  8%     -11.1%      26507 ±  6%  uptime.idle
      5.32 ± 79%     -64.2%       1.91 ± 60%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
      2.70 ± 18%     +37.8%       3.72 ±  9%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
      0.64 ±137%  +26644.2%     169.91 ±220%  perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
      0.08 ± 20%      +0.0        0.12 ± 10%  perf-profile.children.cycles-pp.terminate_walk
      0.10 ± 25%      +0.0        0.14 ± 10%  perf-profile.children.cycles-pp.wake_up_q
      0.06 ± 50%      +0.0        0.10 ± 10%  perf-profile.children.cycles-pp.vfs_readlink
      0.15 ± 36%      +0.1        0.22 ± 13%  perf-profile.children.cycles-pp.readlink
      1.31 ± 19%      +0.4        1.69 ± 12%  perf-profile.children.cycles-pp.unmap_vmas
      2.46 ± 19%      +0.5        2.99 ±  4%  perf-profile.children.cycles-pp.exit_mmap
    311653 ± 10%     -23.7%     237884 ±  9%  turbostat.C1E
  26018024 ± 10%     -13.1%   22597563 ±  7%  turbostat.C6
      6.41 ±  9%     -13.6%       5.54 ±  8%  turbostat.CPU%c1
      2.47 ± 11%     +36.0%       3.36 ±  6%  turbostat.CPU%c6
 2.881e+08 ±  2%     -12.8%  2.513e+08 ±  3%  turbostat.IRQ
    212.86            +2.8%     218.84        turbostat.RAMWatt
    341.49            -4.1%     327.42 ±  2%  autonuma-benchmark.numa01.seconds
    186.67 ±  6%     -27.1%     136.12 ±  7%  autonuma-benchmark.numa01_THREAD_ALLOC.seconds
     21.17 ±  7%     -33.6%      14.05        autonuma-benchmark.numa02.seconds
      2200 ±  2%     -13.0%       1913 ±  3%  autonuma-benchmark.time.elapsed_time
      2200 ±  2%     -13.0%       1913 ±  3%  autonuma-benchmark.time.elapsed_time.max
   1159380 ±  2%     -12.0%    1019969 ±  3%  autonuma-benchmark.time.involuntary_context_switches
   3363550            -5.0%    3194802        autonuma-benchmark.time.minor_page_faults
    243046 ±  2%     -13.3%     210725 ±  3%  autonuma-benchmark.time.user_time
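The headline -33.6% figure is the relative change in mean numa02 runtime between the two commits. As a quick sanity check (assuming, as is usual for these lkp tables, that the two columns are per-commit mean values), it can be recomputed from the table:

```python
def pct_change(base, new):
    """Relative change of `new` vs. `base`, in percent (negative = faster)."""
    return (new - base) / base * 100.0

# autonuma-benchmark.numa02.seconds: base commit 167773d1dd vs. patched af46f3c9ca
base, patched = 21.17, 14.05
print(f"{pct_change(base, patched):+.1f}%")  # matches the reported -33.6%
```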
   7494239            -6.8%    6984234        proc-vmstat.numa_hit
    118829 ±  6%     +13.7%     135136 ±  6%  proc-vmstat.numa_huge_pte_updates
   6207618            -8.4%    5686795 ±  2%  proc-vmstat.numa_local
   8834573 ±  3%     +20.2%   10616944 ±  4%  proc-vmstat.numa_pages_migrated
  61094857 ±  6%     +13.6%   69409875 ±  6%  proc-vmstat.numa_pte_updates
   8602789            -9.0%    7827793 ±  2%  proc-vmstat.pgfault
   8834573 ±  3%     +20.2%   10616944 ±  4%  proc-vmstat.pgmigrate_success
    371818           -10.1%     334391 ±  2%  proc-vmstat.pgreuse
     17200 ±  3%     +20.3%      20686 ±  4%  proc-vmstat.thp_migration_success
  16401792 ±  2%     -12.7%   14322816 ±  3%  proc-vmstat.unevictable_pgs_scanned
 1.606e+08 ±  2%     -13.8%  1.385e+08 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.avg
 1.666e+08 ±  2%     -14.0%  1.433e+08 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.max
 1.364e+08 ±  2%     -11.7%  1.204e+08 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.min
   4795327 ±  7%     -17.5%    3956991 ±  7%  sched_debug.cfs_rq:/.avg_vruntime.stddev
 1.606e+08 ±  2%     -13.8%  1.385e+08 ±  3%  sched_debug.cfs_rq:/.min_vruntime.avg
 1.666e+08 ±  2%     -14.0%  1.433e+08 ±  3%  sched_debug.cfs_rq:/.min_vruntime.max
 1.364e+08 ±  2%     -11.7%  1.204e+08 ±  3%  sched_debug.cfs_rq:/.min_vruntime.min
   4795327 ±  7%     -17.5%    3956991 ±  7%  sched_debug.cfs_rq:/.min_vruntime.stddev
    364.96 ±  6%     +16.6%     425.70 ±  5%  sched_debug.cfs_rq:/.util_est_enqueued.avg
   1099114           -13.0%     956021 ±  2%  sched_debug.cpu.clock.avg
   1099477           -13.0%     956344 ±  2%  sched_debug.cpu.clock.max
   1098702           -13.0%     955643 ±  2%  sched_debug.cpu.clock.min
   1080712           -13.0%     940415 ±  2%  sched_debug.cpu.clock_task.avg
   1085309           -13.1%     943557 ±  2%  sched_debug.cpu.clock_task.max
   1064613           -13.0%     925993 ±  2%  sched_debug.cpu.clock_task.min
     28890 ±  3%     -11.7%      25504 ±  3%  sched_debug.cpu.curr->pid.avg
     35200           -11.0%      31344        sched_debug.cpu.curr->pid.max
    862245 ±  3%      -8.7%     786984        sched_debug.cpu.max_idle_balance_cost.max
     74019 ±  9%     -28.2%      53158 ±  7%  sched_debug.cpu.max_idle_balance_cost.stddev
     15507           -11.9%      13667 ±  2%  sched_debug.cpu.nr_switches.avg
     57616 ±  6%     -19.0%      46642 ±  8%  sched_debug.cpu.nr_switches.max
      8460 ±  6%     -12.9%       7368 ±  5%  sched_debug.cpu.nr_switches.stddev
   1098689           -13.0%     955631 ±  2%  sched_debug.cpu_clk
   1097964           -13.0%     954907 ±  2%  sched_debug.ktime
      0.00           +15.0%       0.00 ±  2%  sched_debug.rt_rq:.rt_nr_migratory.avg
      0.03           +15.0%       0.03 ±  2%  sched_debug.rt_rq:.rt_nr_migratory.max
      0.00           +15.0%       0.00 ±  2%  sched_debug.rt_rq:.rt_nr_migratory.stddev
      0.00           +15.0%       0.00 ±  2%  sched_debug.rt_rq:.rt_nr_running.avg
      0.03           +15.0%       0.03 ±  2%  sched_debug.rt_rq:.rt_nr_running.max
      0.00           +15.0%       0.00 ±  2%  sched_debug.rt_rq:.rt_nr_running.stddev
   1099511           -13.0%     956501 ±  2%  sched_debug.sched_clk
      1162 ±  2%     +15.2%       1339 ±  3%  perf-stat.i.MPKI
 1.656e+08            +3.6%  1.716e+08        perf-stat.i.branch-instructions
      0.95 ±  4%      +0.1        1.03        perf-stat.i.branch-miss-rate%
   1538367 ±  6%     +11.0%    1707146 ±  2%  perf-stat.i.branch-misses
 6.327e+08 ±  3%     +18.7%  7.513e+08 ±  4%  perf-stat.i.cache-misses
 8.282e+08 ±  2%     +15.2%  9.542e+08 ±  3%  perf-stat.i.cache-references
    658.12 ±  3%     -11.4%     582.98 ±  6%  perf-stat.i.cycles-between-cache-misses
 2.201e+08            +2.8%  2.263e+08        perf-stat.i.dTLB-loads
    579771            +0.9%     584915        perf-stat.i.dTLB-store-misses
 1.122e+08            +1.4%  1.138e+08        perf-stat.i.dTLB-stores
 8.278e+08            +3.1%  8.538e+08        perf-stat.i.instructions
     13.98 ±  2%     +14.3%      15.98 ±  3%  perf-stat.i.metric.M/sec
      3797            +4.3%       3958        perf-stat.i.minor-faults
    258749            +8.0%     279391 ±  2%  perf-stat.i.node-load-misses
    261169 ±  2%      +7.4%     280417 ±  5%  perf-stat.i.node-loads
     40.91 ±  3%      -3.0       37.89 ±  3%  perf-stat.i.node-store-miss-rate%
 3.841e+08 ±  6%     +27.6%  4.902e+08 ±  7%  perf-stat.i.node-stores
      3797            +4.3%       3958        perf-stat.i.page-faults
    998.24 ±  2%     +11.8%       1116 ±  2%  perf-stat.overall.MPKI
    463.91            -3.2%     448.99        perf-stat.overall.cpi
    604.23 ±  3%     -15.9%     508.08 ±  4%  perf-stat.overall.cycles-between-cache-misses
      0.00            +3.3%       0.00        perf-stat.overall.ipc
     39.20 ±  5%      -4.5       34.70 ±  6%  perf-stat.overall.node-store-miss-rate%
 1.636e+08            +3.8%  1.698e+08        perf-stat.ps.branch-instructions
   1499760 ±  6%     +11.1%    1665855 ±  2%  perf-stat.ps.branch-misses
 6.296e+08 ±  3%     +19.0%  7.489e+08 ±  4%  perf-stat.ps.cache-misses
 8.178e+08 ±  2%     +15.5%  9.447e+08 ±  3%  perf-stat.ps.cache-references
  2.18e+08            +2.9%  2.244e+08        perf-stat.ps.dTLB-loads
    578148            +0.9%     583328        perf-stat.ps.dTLB-store-misses
 1.117e+08            +1.4%  1.132e+08        perf-stat.ps.dTLB-stores
 8.192e+08            +3.3%   8.46e+08        perf-stat.ps.instructions
      3744            +4.3%       3906        perf-stat.ps.minor-faults
    255974            +8.2%     276924 ±  2%  perf-stat.ps.node-load-misses
    263796 ±  2%      +7.7%     284110 ±  5%  perf-stat.ps.node-loads
  3.82e+08 ±  6%     +27.7%  4.879e+08 ±  7%  perf-stat.ps.node-stores
      3744            +4.3%       3906        perf-stat.ps.page-faults
 1.805e+12 ±  2%     -10.1%  1.622e+12 ±  2%  perf-stat.total.instructions
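The derived overall.MPKI row is consistent with cache references per thousand instructions computed from the per-second (ps.*) counters above; that derivation is an assumption about how lkp builds the metric, but the arithmetic checks out for the base-commit column:

```python
# Cross-check of the derived overall.MPKI value (base-commit column),
# assuming it is cache references per thousand instructions.
cache_references = 8.178e8  # perf-stat.ps.cache-references
instructions     = 8.192e8  # perf-stat.ps.instructions

mpki = cache_references / instructions * 1000
print(f"{mpki:.1f}")  # close to the reported 998.24 overall.MPKI
```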




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Thread overview: 25+ messages
2023-08-29  6:06 [RFC PATCH V1 0/6] sched/numa: Enhance disjoint VMA scanning Raghavendra K T
2023-08-29  6:06 ` [RFC PATCH V1 1/6] sched/numa: Move up the access pid reset logic Raghavendra K T
2023-08-29  6:06 ` [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic Raghavendra K T
2023-09-12  7:50   ` kernel test robot
2023-09-13  6:21     ` Raghavendra K T
2023-08-29  6:06 ` [RFC PATCH V1 3/6] sched/numa: Remove unconditional scan logic using mm numa_scan_seq Raghavendra K T
2023-08-29  6:06 ` [RFC PATCH V1 4/6] sched/numa: Increase tasks' access history Raghavendra K T
2023-09-12 14:24   ` kernel test robot
2023-09-13  6:15     ` Raghavendra K T
2023-09-13  7:34       ` Oliver Sang
2023-08-29  6:06 ` [RFC PATCH V1 5/6] sched/numa: Allow recently accessed VMAs to be scanned Raghavendra K T
2023-09-10 15:29   ` kernel test robot [this message]
2023-09-11 11:25     ` Raghavendra K T
2023-09-12  2:22       ` Oliver Sang
2023-09-12  6:43         ` Raghavendra K T
2023-08-29  6:06 ` [RFC PATCH V1 6/6] sched/numa: Allow scanning of shared VMAs Raghavendra K T
2023-09-13  5:28 ` [RFC PATCH V1 0/6] sched/numa: Enhance disjoint VMA scanning Swapnil Sapkal
2023-09-13  6:24   ` Raghavendra K T
2023-09-19  6:30 ` Raghavendra K T
2023-09-19  7:15   ` Ingo Molnar
2023-09-19  8:06     ` Raghavendra K T
2023-09-19  9:28 ` Peter Zijlstra
2023-09-19 16:22   ` Mel Gorman
2023-09-19 19:11     ` Peter Zijlstra
2023-09-20 10:42     ` Raghavendra K T
