Re: [LKP] [lkp] [mm] 5c0a85fad9: unixbench.score -6.3% regression

From: "Huang, Ying" <ying.huang@intel.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: kernel test robot <xiaolong.ye@intel.com>,
	Rik van Riel <riel@redhat.com>, Michal Hocko <mhocko@suse.com>,
	<lkp@01.org>, LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	Minchan Kim <minchan@kernel.org>,
	Vinayak Menon <vinmenon@codeaurora.org>,
	Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [LKP] [lkp] [mm] 5c0a85fad9: unixbench.score -6.3% regression
Date: Wed, 08 Jun 2016 15:21:56 +0800
Message-ID: <87a8iw5enf.fsf@yhuang-dev.intel.com>
In-Reply-To: <20160606095136.GA79951@black.fi.intel.com> (Kirill A. Shutemov's message of "Mon, 6 Jun 2016 12:51:36 +0300")

"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:

> On Mon, Jun 06, 2016 at 10:27:24AM +0800, kernel test robot wrote:
>> 
>> FYI, we noticed a -6.3% regression of unixbench.score due to commit:
>> 
>> commit 5c0a85fad949212b3e059692deecdeed74ae7ec7 ("mm: make faultaround produce old ptes")
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>> 
>> in testcase: unixbench
>> on test machine: lituya: 16 threads Haswell High-end Desktop (i7-5960X 3.0G) with 16G memory
>> with following parameters: cpufreq_governor=performance/nr_task=1/test=shell8
>> 
>> 
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>> 
>> 
>> =========================================================================================
>> compiler/cpufreq_governor/kconfig/nr_task/rootfs/tbox_group/test/testcase:
>>   gcc-4.9/performance/x86_64-rhel/1/debian-x86_64-2015-02-07.cgz/lituya/shell8/unixbench
>> 
>> commit: 
>>   4b50bcc7eda4d3cc9e3f2a0aa60e590fedf728c5
>>   5c0a85fad949212b3e059692deecdeed74ae7ec7
>> 
>> 4b50bcc7eda4d3cc 5c0a85fad949212b3e059692de 
>> ---------------- -------------------------- 
>>        fail:runs  %reproduction    fail:runs
>>            |             |             |    
>>           3:4          -75%            :4     kmsg.DHCP/BOOTP:Reply_not_for_us,op[#]xid[#]
>>          %stddev     %change         %stddev
>>              \          |                \  
>>      14321 ±  0%      -6.3%      13425 ±  0%  unixbench.score
>>    1996897 ±  0%      -6.1%    1874635 ±  0%  unixbench.time.involuntary_context_switches
>>  1.721e+08 ±  0%      -6.2%  1.613e+08 ±  0%  unixbench.time.minor_page_faults
>>     758.65 ±  0%      -3.0%     735.86 ±  0%  unixbench.time.system_time
>>     387.66 ±  0%      +5.4%     408.49 ±  0%  unixbench.time.user_time
>>    5950278 ±  0%      -6.2%    5583456 ±  0%  unixbench.time.voluntary_context_switches
>
> That's weird.
>
> I don't understand why the change would reduce the number of minor faults.
> It should stay the same on x86-64. Rise of user_time is puzzling too.

unixbench runs in fixed-time mode.  That is, the total time to run
unixbench is fixed, but the amount of work done varies.  So the drop in
minor_page_faults may simply reflect less work being done in that time.
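
As a rough sanity check of the fixed-time argument, the fault count can
be divided by the score.  The sketch below is only an illustration: it
uses the figures from the quoted table, and it assumes that
unixbench.score is proportional to the work done, which is an assumption
made here for the example and not something LKP reports.

```python
# Figures copied from the quoted report, as (parent 4b50bcc7eda4, commit 5c0a85fad949).
score = (14321, 13425)                # unixbench.score
minor_faults = (1.721e08, 1.613e08)   # unixbench.time.minor_page_faults


def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Faults per unit of score: a rough proxy for faults per unit of work.
# Assumption: the score scales linearly with the work completed.
faults_per_score = tuple(f / s for f, s in zip(minor_faults, score))

raw_change = pct_change(*minor_faults)
normalized_change = pct_change(*faults_per_score)

print(f"minor faults, raw:       {raw_change:+.1f}%")
print(f"minor faults, per score: {normalized_change:+.2f}%")
```

With these inputs the raw count drops by about 6%, while the per-score
figure changes by well under 0.1%, which is consistent with the fault
count simply tracking the work done.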

> Hm. Is it reproducible? Across reboot?

Yes.  LKP runs every benchmark after a reboot via kexec.  We run 3
times for both the commit and its parent, and the results are quite
stable: the standard deviation, in percent, is near 0 across the
different runs.  Here is another comparison with profile data.

=========================================================================================
compiler/cpufreq_governor/debug-setup/kconfig/nr_task/rootfs/tbox_group/test/testcase:
  gcc-4.9/performance/profile/x86_64-rhel/1/debian-x86_64-2015-02-07.cgz/lituya/shell8/unixbench

commit: 
  4b50bcc7eda4d3cc9e3f2a0aa60e590fedf728c5
  5c0a85fad949212b3e059692deecdeed74ae7ec7

4b50bcc7eda4d3cc 5c0a85fad949212b3e059692de 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     14056 ±  0%      -6.3%      13172 ±  0%  unixbench.score
   6464046 ±  0%      -6.1%    6071922 ±  0%  unixbench.time.involuntary_context_switches
 5.555e+08 ±  0%      -6.2%  5.211e+08 ±  0%  unixbench.time.minor_page_faults
      2537 ±  0%      -3.2%       2455 ±  0%  unixbench.time.system_time
      1284 ±  0%      +5.8%       1359 ±  0%  unixbench.time.user_time
  19192611 ±  0%      -6.2%   18010830 ±  0%  unixbench.time.voluntary_context_switches
   7709931 ±  0%     -11.0%    6860574 ±  0%  cpuidle.C1-HSW.usage
      6900 ±  1%     -43.9%       3871 ±  0%  proc-vmstat.nr_active_file
     40813 ±  1%     -77.9%       9015 ±114%  softirqs.NET_RX
    111331 ±  1%     -13.3%      96503 ±  0%  meminfo.Active
     27603 ±  1%     -43.9%      15486 ±  0%  meminfo.Active(file)
     93169 ±  0%      -5.8%      87766 ±  0%  vmstat.system.cs
     19768 ±  0%      -1.7%      19437 ±  0%  vmstat.system.in
      6.22 ±  0%     +10.3%       6.86 ±  0%  turbostat.CPU%c3
      0.02 ± 20%     -85.7%       0.00 ±141%  turbostat.Pkg%pc3
     68.99 ±  0%      -1.7%      67.84 ±  0%  turbostat.PkgWatt
      1.38 ±  5%     -42.0%       0.80 ±  5%  perf-profile.cycles-pp.page_remove_rmap.unmap_page_range.unmap_single_vma.unmap_vmas.exit_mmap
      0.83 ±  4%     +28.8%       1.07 ± 21%  perf-profile.cycles-pp.release_pages.free_pages_and_swap_cache.tlb_flush_mmu_free.tlb_finish_mmu.exit_mmap
      1.55 ±  3%     -10.6%       1.38 ±  2%  perf-profile.cycles-pp.unmap_single_vma.unmap_vmas.exit_mmap.mmput.flush_old_exec
      1.59 ±  3%      -9.8%       1.44 ±  3%  perf-profile.cycles-pp.unmap_vmas.exit_mmap.mmput.flush_old_exec.load_elf_binary
    389.00 ±  0%     +32.1%     514.00 ±  8%  slabinfo.file_lock_cache.active_objs
    389.00 ±  0%     +32.1%     514.00 ±  8%  slabinfo.file_lock_cache.num_objs
      7075 ±  3%     -17.7%       5823 ±  7%  slabinfo.pid.active_objs
      7075 ±  3%     -17.7%       5823 ±  7%  slabinfo.pid.num_objs
      0.67 ± 34%     +86.4%       1.24 ± 30%  sched_debug.cfs_rq:/.runnable_load_avg.min
     -9013 ± -1%     +14.4%     -10315 ± -9%  sched_debug.cfs_rq:/.spread0.avg
     83127 ±  5%     +16.9%      97163 ±  8%  sched_debug.cpu.avg_idle.min
     17777 ± 16%     +66.6%      29608 ± 22%  sched_debug.cpu.curr->pid.avg
     50223 ± 10%     +49.3%      74974 ±  0%  sched_debug.cpu.curr->pid.max
     22281 ± 13%     +51.8%      33816 ±  6%  sched_debug.cpu.curr->pid.stddev
    251.79 ±  5%     -13.8%     217.15 ±  5%  sched_debug.cpu.nr_uninterruptible.max
   -261.12 ± -2%     -13.4%    -226.03 ± -1%  sched_debug.cpu.nr_uninterruptible.min
    221.14 ±  3%     -14.7%     188.60 ±  1%  sched_debug.cpu.nr_uninterruptible.stddev
  1.94e+11 ±  0%      -5.8%  1.827e+11 ±  0%  perf-stat.L1-dcache-load-misses
 3.496e+12 ±  0%      -6.5%  3.268e+12 ±  0%  perf-stat.L1-dcache-loads
 2.262e+12 ±  1%      -5.5%  2.137e+12 ±  0%  perf-stat.L1-dcache-stores
 9.711e+10 ±  0%      -3.7%  9.353e+10 ±  0%  perf-stat.L1-icache-load-misses
 8.051e+08 ±  0%      -8.8%  7.343e+08 ±  1%  perf-stat.LLC-load-misses
 7.184e+10 ±  1%      -5.6%   6.78e+10 ±  0%  perf-stat.LLC-loads
 5.867e+08 ±  2%      -7.0%  5.456e+08 ±  0%  perf-stat.LLC-store-misses
 1.524e+10 ±  1%      -5.6%  1.438e+10 ±  0%  perf-stat.LLC-stores
 2.711e+12 ±  0%      -6.3%  2.539e+12 ±  0%  perf-stat.branch-instructions
 5.948e+10 ±  0%      -3.9%  5.715e+10 ±  0%  perf-stat.branch-load-misses
 2.715e+12 ±  0%      -6.4%  2.542e+12 ±  0%  perf-stat.branch-loads
 5.947e+10 ±  0%      -3.9%  5.713e+10 ±  0%  perf-stat.branch-misses
 1.448e+09 ±  0%      -9.3%  1.313e+09 ±  1%  perf-stat.cache-misses
 1.931e+11 ±  0%      -5.8%  1.818e+11 ±  0%  perf-stat.cache-references
  58882705 ±  0%      -5.8%   55467522 ±  0%  perf-stat.context-switches
  17037466 ±  0%      -6.1%   15999111 ±  0%  perf-stat.cpu-migrations
 6.732e+09 ±  1%     +90.7%  1.284e+10 ±  0%  perf-stat.dTLB-load-misses
 3.474e+12 ±  0%      -6.6%  3.245e+12 ±  0%  perf-stat.dTLB-loads
 1.215e+09 ±  0%      -5.5%  1.149e+09 ±  0%  perf-stat.dTLB-store-misses
 2.286e+12 ±  0%      -5.8%  2.153e+12 ±  0%  perf-stat.dTLB-stores
 3.511e+09 ±  0%     +20.4%  4.226e+09 ±  0%  perf-stat.iTLB-load-misses
 2.317e+09 ±  0%      -6.8%   2.16e+09 ±  0%  perf-stat.iTLB-loads
 1.343e+13 ±  0%      -6.0%  1.263e+13 ±  0%  perf-stat.instructions
 5.504e+08 ±  0%      -6.2%  5.163e+08 ±  0%  perf-stat.minor-faults
  8.09e+08 ±  1%      -9.0%   7.36e+08 ±  1%  perf-stat.node-loads
 5.932e+08 ±  0%      -8.7%  5.417e+08 ±  1%  perf-stat.node-stores
 5.504e+08 ±  0%      -6.2%  5.163e+08 ±  0%  perf-stat.page-faults
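
Because the run time is fixed, most perf-stat totals above fall by about
6% along with the work done, so the raw %change column understates the
counters that actually moved.  The sketch below is one way to look at
them: it normalizes a few counters by perf-stat.instructions.  Using the
instruction count as a proxy for work done is an assumption made here
for illustration, and the choice of counters is arbitrary.

```python
# perf-stat totals copied from the table above, as (parent, commit).
instructions = (1.343e13, 1.263e13)
counters = {
    "dTLB-load-misses": (6.732e09, 1.284e10),
    "iTLB-load-misses": (3.511e09, 4.226e09),
    "cache-misses": (1.448e09, 1.313e09),
}


def per_kilo_instructions(count, instr):
    """Events per 1000 retired instructions."""
    return count / instr * 1000.0


def normalized_change(name):
    """Percent change of a counter's per-instruction rate, parent -> commit."""
    before, after = (
        per_kilo_instructions(c, i) for c, i in zip(counters[name], instructions)
    )
    return (after - before) / before * 100.0


for name, (before, after) in counters.items():
    rates = [per_kilo_instructions(c, i) for c, i in zip((before, after), instructions)]
    print(f"{name:18s} {rates[0]:.3f} -> {rates[1]:.3f} per 1k instructions "
          f"({normalized_change(name):+.1f}%)")
```

Normalized this way, the dTLB load-miss rate roughly doubles and the
iTLB load-miss rate rises by more than a quarter, while the cache-miss
rate moves by only a few percent.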

Best Regards,
Huang, Ying