From: Michal Hocko <mhocko@kernel.org>
To: Ye Xiaolong <xiaolong.ye@intel.com>, Mel Gorman <mgorman@suse.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
	Minchan Kim <minchan@kernel.org>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	lkp@01.org
Subject: Re: [lkp-robot] [mm, vmscan]  5e56dfbd83:  fsmark.files_per_sec -11.1% regression
Date: Thu, 23 Feb 2017 08:35:45 +0100	[thread overview]
Message-ID: <20170223073544.uiy6rvw3d44irixf@dhcp22.suse.cz> (raw)
In-Reply-To: <20170223012734.GB31776@yexl-desktop>

On Thu 23-02-17 09:27:34, Ye Xiaolong wrote:
> Hi, Michal
> 
> On 02/07, Michal Hocko wrote:
> [snip]
> >Could you retest with a single NUMA node? I am not familiar enough with
> >the benchmark to judge whether it was set up properly for a NUMA machine.
> 
> I've retested the commit with a single NUMA node via "numactl -m 0 fs_mark xxx",
> and it did help recover the lost performance.

Thanks for retesting!
> 
> Here is the comparison:
> 
> commit/compiler/cpufreq_governor/disk/filesize/fs/iterations/kconfig/md/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
>   5e56dfbd837421b7fa3c6c06018c6701e2704917/gcc-6/performance/3HDD/4M/btrfs/1/x86_64-rhel-7.2/RAID5/64/debian-x86_64-2016-08-31.cgz/NoSync/ivb44/130G/fsmark
>    
> (with a single NUMA node)	    (2 NUMA nodes)
> -------------------------------------------------------------------- 
>        fail:runs   %reproduction    fail:runs
>            |              |             |    
>          %stddev      %change         %stddev
>              \           |                \  
>      57.60 ±  0%      -11.1%      51.20 ±  0%  fsmark.files_per_sec
>     607.84 ±  0%       +9.0%     662.24 ±  1%  fsmark.time.elapsed_time
>     607.84 ±  0%       +9.0%     662.24 ±  1%  fsmark.time.elapsed_time.max
>      14317 ±  6%      -12.2%      12568 ±  7%  fsmark.time.involuntary_context_switches
>       1864 ±  0%       +0.5%       1873 ±  0%  fsmark.time.maximum_resident_set_size
>      12425 ±  0%      +23.3%      15320 ±  3%  fsmark.time.minor_page_faults
>      33.00 ±  3%      -33.9%      21.80 ±  1%  fsmark.time.percent_of_cpu_this_job_got
>     203.49 ±  3%      -28.1%     146.31 ±  1%  fsmark.time.system_time
>     605701 ±  0%       +3.6%     627486 ±  0%  fsmark.time.voluntary_context_switches
>     307106 ±  2%      +20.2%     368992 ±  9%  interrupts.CAL:Function_call_interrupts
>     183040 ±  0%      +23.2%     225559 ±  3%  softirqs.BLOCK
>      12203 ± 57%     +236.4%      41056 ±103%  softirqs.NET_RX
>     186118 ±  0%      +21.9%     226922 ±  2%  softirqs.TASKLET
>      14317 ±  6%      -12.2%      12568 ±  7%  time.involuntary_context_switches
>      12425 ±  0%      +23.3%      15320 ±  3%  time.minor_page_faults
>      33.00 ±  3%      -33.9%      21.80 ±  1%  time.percent_of_cpu_this_job_got
>     203.49 ±  3%      -28.1%     146.31 ±  1%  time.system_time
>       3.47 ±  3%      -13.0%       3.02 ±  1%  turbostat.%Busy
>      99.60 ±  1%       -9.6%      90.00 ±  1%  turbostat.Avg_MHz
>      78.69 ±  1%       +1.7%      80.01 ±  0%  turbostat.CorWatt
>       3.56 ± 61%      -91.7%       0.30 ± 76%  turbostat.Pkg%pc2
>     207790 ±  0%       -8.2%     190654 ±  1%  vmstat.io.bo
>   30667691 ±  0%      +65.9%   50890669 ±  1%  vmstat.memory.cache
>   34549892 ±  0%      -58.4%   14378939 ±  4%  vmstat.memory.free
>       6768 ±  0%       -1.3%       6681 ±  1%  vmstat.system.cs
>  1.089e+10 ±  2%      +13.4%  1.236e+10 ±  3%  cpuidle.C1E-IVT.time
>   11475304 ±  2%      +13.4%   13007849 ±  3%  cpuidle.C1E-IVT.usage
>    2.7e+09 ±  6%      +13.2%  3.057e+09 ±  3%  cpuidle.C3-IVT.time
>    2954294 ±  6%      +14.3%    3375966 ±  3%  cpuidle.C3-IVT.usage
>   96963295 ± 14%      +17.5%  1.139e+08 ± 12%  cpuidle.POLL.time
>       8761 ±  7%      +17.6%      10299 ±  9%  cpuidle.POLL.usage
>   30454483 ±  0%      +66.4%   50666102 ±  1%  meminfo.Cached
> 
> Do you see what's happening?

Not really. All I could see in the previous data was that the memory
locality was different (and better) with my patch, which I cannot
explain either, because get_scan_count is always a per-node operation.
Moreover, the change shouldn't make any difference for normal GFP_KERNEL
requests on 64-bit systems, because the reclaim index covers all zones,
so there is nothing to skip over.
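To put that in concrete terms, here is a toy sketch (plain userspace C,
not the actual vmscan code; the zone layout and per-zone page counts are
invented for illustration). It only shows why summing LRU pages over
"zones <= reclaim_idx" gives the same number as the full per-node sum
when the reclaim index already covers every zone that actually holds
LRU pages, which is the GFP_KERNEL-on-64-bit case discussed above:

	/*
	 * Toy illustration only -- not kernel code.  Zone names and
	 * page counts are made up for one hypothetical NUMA node.
	 */
	#include <stdio.h>

	enum toy_zone { TOY_DMA, TOY_DMA32, TOY_NORMAL, TOY_MOVABLE, TOY_MAX_ZONES };

	/* hypothetical per-zone LRU page counts for one node */
	static const unsigned long lru_pages[TOY_MAX_ZONES] = {
		[TOY_DMA]     = 100,
		[TOY_DMA32]   = 5000,
		[TOY_NORMAL]  = 90000,
		[TOY_MOVABLE] = 0,	/* not populated on this toy node */
	};

	/* sum LRU pages over the zones a reclaim request may use */
	static unsigned long eligible_lru_size(enum toy_zone reclaim_idx)
	{
		unsigned long size = 0;
		int zid;

		for (zid = 0; zid <= (int)reclaim_idx; zid++)
			size += lru_pages[zid];
		return size;
	}

	int main(void)
	{
		/* whole node vs. what a GFP_KERNEL-style request may reclaim */
		printf("all zones:        %lu\n", eligible_lru_size(TOY_MOVABLE));
		printf("up to TOY_NORMAL: %lu\n", eligible_lru_size(TOY_NORMAL));
		return 0;
	}

Both sums come out identical here, which is why restricting the
calculation to eligible zones should be invisible to this workload
unless some LRU pages actually sit above the reclaim index.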

> Or is there anything we can do to improve the fsmark benchmark setup to
> make it more reasonable?

Unfortunately I am not an expert on this benchmark. Maybe Mel knows
better.
-- 
Michal Hocko
SUSE Labs

Thread overview: 25+ messages
2017-01-23  1:26 [lkp-robot] [mm, vmscan] 5e56dfbd83: fsmark.files_per_sec -11.1% regression kernel test robot
2017-01-24 13:44 ` Michal Hocko
2017-01-25  4:27   ` Ye Xiaolong
2017-01-26  9:13     ` Michal Hocko
2017-02-04  8:16       ` Ye Xiaolong
2017-02-06  8:12         ` Michal Hocko
2017-02-07  2:22           ` Ye Xiaolong
2017-02-07 14:43             ` Michal Hocko
2017-02-23  1:27               ` Ye Xiaolong
2017-02-23  7:35                 ` Michal Hocko [this message]
2017-02-23 15:19                   ` Mel Gorman
     [not found] <20170219150431.GC24890@dhcp22.suse.cz>
2017-02-20  1:43 ` Ye Xiaolong
2017-02-20 10:19   ` Michal Hocko
2017-02-21  1:10     ` Ye Xiaolong
