From: "Huang\, Ying" <ying.huang@intel.com>
To: Minchan Kim <minchan@kernel.org>
Cc: "Huang\, Ying" <ying.huang@intel.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Rik van Riel <riel@redhat.com>, "Michal Hocko" <mhocko@suse.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>,
	"Vinayak Menon" <vinmenon@codeaurora.org>,
	Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>, <lkp@01.org>
Subject: Re: [LKP] [lkp] [mm] 5c0a85fad9: unixbench.score -6.3% regression
Date: Fri, 17 Jun 2016 12:26:51 -0700	[thread overview]
Message-ID: <87y463hb0k.fsf@yhuang-mobile.sh.intel.com> (raw)
In-Reply-To: <20160617054156.GB2374@bbox> (Minchan Kim's message of "Fri, 17 Jun 2016 14:41:56 +0900")

Minchan Kim <minchan@kernel.org> writes:

> On Thu, Jun 16, 2016 at 03:27:44PM -0700, Huang, Ying wrote:
>> Minchan Kim <minchan@kernel.org> writes:
>> 
>> > On Thu, Jun 16, 2016 at 07:52:26AM +0800, Huang, Ying wrote:
>> >> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
>> >> 
>> >> > On Tue, Jun 14, 2016 at 05:57:28PM +0900, Minchan Kim wrote:
>> >> >> On Wed, Jun 08, 2016 at 11:58:11AM +0300, Kirill A. Shutemov wrote:
>> >> >> > On Wed, Jun 08, 2016 at 04:41:37PM +0800, Huang, Ying wrote:
>> >> >> > > "Huang, Ying" <ying.huang@intel.com> writes:
>> >> >> > > 
>> >> >> > > > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
>> >> >> > > >
>> >> >> > > >> On Mon, Jun 06, 2016 at 10:27:24AM +0800, kernel test robot wrote:
>> >> >> > > >>> 
>> >> >> > > >>> FYI, we noticed a -6.3% regression of unixbench.score due to commit:
>> >> >> > > >>> 
>> >> >> > > >>> commit 5c0a85fad949212b3e059692deecdeed74ae7ec7 ("mm: make faultaround produce old ptes")
>> >> >> > > >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>> >> >> > > >>> 
>> >> >> > > >>> in testcase: unixbench
>> >> >> > > >>> on test machine: lituya: 16 threads Haswell High-end Desktop (i7-5960X 3.0G) with 16G memory
>> >> >> > > >>> with following parameters: cpufreq_governor=performance/nr_task=1/test=shell8
>> >> >> > > >>> 
>> >> >> > > >>> 
>> >> >> > > >>> Details are as below:
>> >> >> > > >>> -------------------------------------------------------------------------------------------------->
>> >> >> > > >>> 
>> >> >> > > >>> 
>> >> >> > > >>> =========================================================================================
>> >> >> > > >>> compiler/cpufreq_governor/kconfig/nr_task/rootfs/tbox_group/test/testcase:
>> >> >> > > >>>   gcc-4.9/performance/x86_64-rhel/1/debian-x86_64-2015-02-07.cgz/lituya/shell8/unixbench
>> >> >> > > >>> 
>> >> >> > > >>> commit: 
>> >> >> > > >>>   4b50bcc7eda4d3cc9e3f2a0aa60e590fedf728c5
>> >> >> > > >>>   5c0a85fad949212b3e059692deecdeed74ae7ec7
>> >> >> > > >>> 
>> >> >> > > >>> 4b50bcc7eda4d3cc 5c0a85fad949212b3e059692de 
>> >> >> > > >>> ---------------- -------------------------- 
>> >> >> > > >>>        fail:runs  %reproduction    fail:runs
>> >> >> > > >>>            |             |             |    
>> >> >> > > >>>           3:4          -75%            :4     kmsg.DHCP/BOOTP:Reply_not_for_us,op[#]xid[#]
>> >> >> > > >>>          %stddev     %change         %stddev
>> >> >> > > >>>              \          |                \  
>> >> >> > > >>>      14321 ±  0%      -6.3%      13425 ±  0%  unixbench.score
>> >> >> > > >>>    1996897 ±  0%      -6.1%    1874635 ±  0%  unixbench.time.involuntary_context_switches
>> >> >> > > >>>  1.721e+08 ±  0%      -6.2%  1.613e+08 ±  0%  unixbench.time.minor_page_faults
>> >> >> > > >>>     758.65 ±  0%      -3.0%     735.86 ±  0%  unixbench.time.system_time
>> >> >> > > >>>     387.66 ±  0%      +5.4%     408.49 ±  0%  unixbench.time.user_time
>> >> >> > > >>>    5950278 ±  0%      -6.2%    5583456 ±  0%  unixbench.time.voluntary_context_switches
>> >> >> > > >>
>> >> >> > > >> That's weird.
>> >> >> > > >>
>> >> >> > > >> I don't understand why the change would reduce the number of minor faults.
>> >> >> > > >> It should stay the same on x86-64. The rise of user_time is puzzling too.
>> >> >> > > >
>> >> >> > > > unixbench runs in fixed time mode.  That is, the total time to run
>> >> >> > > > unixbench is fixed, but the work done varies.  So the minor_page_faults
>> >> >> > > > change may reflect only the work done.
>> >> >> > > >
>> >> >> > > >> Hm. Is it reproducible? Across reboots?
>> >> >> > > >
>> >> >> > > 
>> >> >> > > And FYI, there is no swap set up for the test, and the whole root file
>> >> >> > > system, including the benchmark files, is in tmpfs, so no real page
>> >> >> > > reclaim will be triggered.  But it appears that the active file cache
>> >> >> > > shrank after the commit.
>> >> >> > > 
>> >> >> > >     111331 ±  1%     -13.3%      96503 ±  0%  meminfo.Active
>> >> >> > >      27603 ±  1%     -43.9%      15486 ±  0%  meminfo.Active(file)
>> >> >> > > 
>> >> >> > > I think this is the expected behavior of the commit?
>> >> >> > 
>> >> >> > Yes, it's expected.
>> >> >> > 
>> >> >> > After the change, faultaround produces old PTEs. That means there's more
>> >> >> > chance for these pages to end up on the inactive LRU, unless somebody
>> >> >> > actually touches them and flips the accessed bit.
>> >> >> 
>> >> >> Hmm, tmpfs pages should be on the anonymous LRU list, and the VM shouldn't
>> >> >> scan the anonymous LRU list on a swapless system, so I really wonder why
>> >> >> the active file LRU shrank.
>> >> >
>> >> > Hm. Good point. I don't know why we have anything on the file LRU if
>> >> > there are no filesystems except tmpfs.
>> >> >
>> >> > Ying, how do you get stuff into the tmpfs?
>> >> 
>> >> We put the root file system and the benchmark into a set of compressed
>> >> cpio archives, then concatenate them into one initrd, and finally the
>> >> kernel uses that initrd as its initramfs.
>> >
>> > I see.
>> >
>> > Could you share your four full vmstat (/proc/vmstat) files?
>> >
>> > old:
>> >
>> > cat /proc/vmstat > before.old.vmstat
>> > do benchmark
>> > cat /proc/vmstat > after.old.vmstat
>> >
>> > new:
>> >
>> > cat /proc/vmstat > before.new.vmstat
>> > do benchmark
>> > cat /proc/vmstat > after.new.vmstat
>> >
>> > IOW, I want to see stats related to reclaim.
>> 
>> Hi,
>> 
>> The /proc/vmstat for the parent commit (parent-proc-vmstat.gz) and first
>> bad commit (fbc-proc-vmstat.gz) are attached with the email.
>> 
>> The files contain more than just the vmstat from before and after the
>> benchmark run: /proc/vmstat is sampled every second, and every sample
>> begins with "time: <time>".  You can check the first and last samples.
>> The first /proc/vmstat capture is started at the same time as the
>> benchmark, so it is not exactly the vmstat from before the benchmark run.
>> 
>
> Thanks for the testing!
>
> nr_active_file shrank by 48%, but the value itself is not huge, so
> I don't think it affects performance a lot.
>
> There was no reclaim activity during the test. :(
>
> pgfault is reduced by 6%. Given that, pgalloc/pgfree are reduced by 6%
> too; since unixbench runs in fixed-time mode and the score regressed by
> 6%, that is no surprise.
>
> No interesting data.
>
> It seems you tested it with THP enabled, maybe in always mode?

Yes.  With the following in the kconfig.

CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y

> I'm sorry, but could you test it again with CONFIG_TRANSPARENT_HUGEPAGE
> disabled? Maybe you already did.
> Is it still a 6% regression with THP disabled?

Yes.  I disabled THP via

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
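
The setting can be read back to confirm it took effect; the bracketed
entry is the active mode:

cat /sys/kernel/mm/transparent_hugepage/enabled
# prints, e.g.: always madvise [never]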

The regression is the same as before.

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/tbox_group/test/testcase/thp_defrag/thp_enabled:
  gcc-4.9/performance/x86_64-rhel/1/debian-x86_64-2015-02-07.cgz/lituya/shell8/unixbench/never/never

commit: 
  4b50bcc7eda4d3cc9e3f2a0aa60e590fedf728c5
  5c0a85fad949212b3e059692deecdeed74ae7ec7

4b50bcc7eda4d3cc 5c0a85fad949212b3e059692de 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     14332 ±  0%      -6.2%      13438 ±  0%  unixbench.score
   6662206 ±  0%      -6.2%    6252260 ±  0%  unixbench.time.involuntary_context_switches
 5.734e+08 ±  0%      -6.2%  5.376e+08 ±  0%  unixbench.time.minor_page_faults
      2527 ±  0%      -3.2%       2446 ±  0%  unixbench.time.system_time
      1291 ±  0%      +5.4%       1361 ±  0%  unixbench.time.user_time
  19875455 ±  0%      -6.3%   18622488 ±  0%  unixbench.time.voluntary_context_switches
   6570355 ±  0%     -11.9%    5787517 ±  0%  cpuidle.C1-HSW.usage
     17257 ± 34%     -59.1%       7055 ±  7%  latency_stats.sum.ep_poll.SyS_epoll_wait.entry_SYSCALL_64_fastpath
      5976 ±  0%     -43.0%       3404 ±  0%  proc-vmstat.nr_active_file
     45729 ±  1%     -22.5%      35439 ±  1%  meminfo.Active
     23905 ±  0%     -43.0%      13619 ±  0%  meminfo.Active(file)
      8465 ±  3%     -29.8%       5940 ±  3%  slabinfo.pid.active_objs
      8476 ±  3%     -29.9%       5940 ±  3%  slabinfo.pid.num_objs
      3.46 ±  0%     +12.5%       3.89 ±  0%  turbostat.CPU%c3
     67.09 ±  0%      -2.1%      65.65 ±  0%  turbostat.PkgWatt
     96090 ±  0%      -5.8%      90479 ±  0%  vmstat.system.cs
      9083 ±  0%      -2.7%       8833 ±  0%  vmstat.system.in
    467.35 ± 78%    +416.7%       2414 ± 45%  sched_debug.cfs_rq:/.MIN_vruntime.avg
      7477 ± 78%    +327.7%      31981 ± 39%  sched_debug.cfs_rq:/.MIN_vruntime.max
      1810 ± 78%    +360.1%       8327 ± 40%  sched_debug.cfs_rq:/.MIN_vruntime.stddev
    467.35 ± 78%    +416.7%       2414 ± 45%  sched_debug.cfs_rq:/.max_vruntime.avg
      7477 ± 78%    +327.7%      31981 ± 39%  sched_debug.cfs_rq:/.max_vruntime.max
      1810 ± 78%    +360.1%       8327 ± 40%  sched_debug.cfs_rq:/.max_vruntime.stddev
    -10724 ± -7%     -12.0%      -9433 ± -3%  sched_debug.cfs_rq:/.spread0.avg
    -17721 ± -4%      -9.8%     -15978 ± -2%  sched_debug.cfs_rq:/.spread0.min
     90355 ±  9%     +14.1%     103099 ±  5%  sched_debug.cpu.avg_idle.min
      0.12 ± 35%    +325.0%       0.52 ± 46%  sched_debug.cpu.cpu_load[0].min
     21913 ±  2%     +29.1%      28288 ± 14%  sched_debug.cpu.curr->pid.avg
     49953 ±  3%     +30.2%      65038 ±  0%  sched_debug.cpu.curr->pid.max
     23062 ±  2%     +30.1%      29996 ±  4%  sched_debug.cpu.curr->pid.stddev
    274.39 ±  5%     -10.2%     246.27 ±  3%  sched_debug.cpu.nr_uninterruptible.max
    242.73 ±  4%     -13.5%     209.90 ±  2%  sched_debug.cpu.nr_uninterruptible.stddev

Best Regards,
Huang, Ying

>                  nr_free_pages      -6663      -6461    96.97%
>                 nr_alloc_batch       2594       4013   154.70%
>               nr_inactive_anon        112        112   100.00%
>                 nr_active_anon       2536       2159    85.13%
>               nr_inactive_file       -567       -227    40.04%
>                 nr_active_file        648        315    48.61%
>                 nr_unevictable          0          0     0.00%
>                       nr_mlock          0          0     0.00%
>                  nr_anon_pages       2634       2161    82.04%
>                      nr_mapped        511        530   103.72%
>                  nr_file_pages        207        215   103.86%
>                       nr_dirty         -7         -6    85.71%
>                   nr_writeback          0          0     0.00%
>            nr_slab_reclaimable        158        328   207.59%
>          nr_slab_unreclaimable       2208       2115    95.79%
>            nr_page_table_pages        268        247    92.16%
>                nr_kernel_stack        143         80    55.94%
>                    nr_unstable          1          1   100.00%
>                      nr_bounce          0          0     0.00%
>                nr_vmscan_write          0          0     0.00%
>    nr_vmscan_immediate_reclaim          0          0     0.00%
>              nr_writeback_temp          0          0     0.00%
>               nr_isolated_anon          0          0     0.00%
>               nr_isolated_file          0          0     0.00%
>                       nr_shmem        131        131   100.00%
>                     nr_dirtied         67         78   116.42%
>                     nr_written         74         84   113.51%
>               nr_pages_scanned          0          0     0.00%
>                       numa_hit  483752446  453696304    93.79%
>                      numa_miss          0          0     0.00%
>                   numa_foreign          0          0     0.00%
>                numa_interleave          0          0     0.00%
>                     numa_local  483752445  453696304    93.79%
>                     numa_other          1          0     0.00%
>             workingset_refault          0          0     0.00%
>            workingset_activate          0          0     0.00%
>         workingset_nodereclaim          0          0     0.00%
>  nr_anon_transparent_hugepages          1          0     0.00%
>                    nr_free_cma          0          0     0.00%
>             nr_dirty_threshold      -1316      -1274    96.81%
>  nr_dirty_background_threshold       -658       -637    96.81%
>                         pgpgin          0          0     0.00%
>                        pgpgout          0          0     0.00%
>                         pswpin          0          0     0.00%
>                        pswpout          0          0     0.00%
>                    pgalloc_dma          0          0     0.00%
>                  pgalloc_dma32   60130977   56323630    93.67%
>                 pgalloc_normal  457203182  428863437    93.80%
>                pgalloc_movable          0          0     0.00%
>                         pgfree  517327743  485181251    93.79%
>                     pgactivate    2059556    1930950    93.76%
>                   pgdeactivate          0          0     0.00%
>                        pgfault  572723351  537107146    93.78%
>                     pgmajfault          0          0     0.00%
>                    pglazyfreed          0          0     0.00%
>                   pgrefill_dma          0          0     0.00%
>                 pgrefill_dma32          0          0     0.00%
>                pgrefill_normal          0          0     0.00%
>               pgrefill_movable          0          0     0.00%
>             pgsteal_kswapd_dma          0          0     0.00%
>           pgsteal_kswapd_dma32          0          0     0.00%
>          pgsteal_kswapd_normal          0          0     0.00%
>         pgsteal_kswapd_movable          0          0     0.00%
>             pgsteal_direct_dma          0          0     0.00%
>           pgsteal_direct_dma32          0          0     0.00%
>          pgsteal_direct_normal          0          0     0.00%
>         pgsteal_direct_movable          0          0     0.00%
>              pgscan_kswapd_dma          0          0     0.00%
>            pgscan_kswapd_dma32          0          0     0.00%
>           pgscan_kswapd_normal          0          0     0.00%
>          pgscan_kswapd_movable          0          0     0.00%
>              pgscan_direct_dma          0          0     0.00%
>            pgscan_direct_dma32          0          0     0.00%
>           pgscan_direct_normal          0          0     0.00%
>          pgscan_direct_movable          0          0     0.00%
>         pgscan_direct_throttle          0          0     0.00%
>            zone_reclaim_failed          0          0     0.00%
>                   pginodesteal          0          0     0.00%
>                  slabs_scanned          0          0     0.00%
>              kswapd_inodesteal          0          0     0.00%
>   kswapd_low_wmark_hit_quickly          0          0     0.00%
>  kswapd_high_wmark_hit_quickly          0          0     0.00%
>                     pageoutrun          0          0     0.00%
>                     allocstall          0          0     0.00%
>                      pgrotated          0          0     0.00%
>                 drop_pagecache          0          0     0.00%
>                      drop_slab          0          0     0.00%
>               numa_pte_updates          0          0     0.00%
>          numa_huge_pte_updates          0          0     0.00%
>               numa_hint_faults          0          0     0.00%
>         numa_hint_faults_local          0          0     0.00%
>            numa_pages_migrated          0          0     0.00%
>              pgmigrate_success          0          0     0.00%
>                 pgmigrate_fail          0          0     0.00%
>        compact_migrate_scanned          0          0     0.00%
>           compact_free_scanned          0          0     0.00%
>               compact_isolated          0          0     0.00%
>                  compact_stall          0          0     0.00%
>                   compact_fail          0          0     0.00%
>                compact_success          0          0     0.00%
>            compact_daemon_wake          0          0     0.00%
>       htlb_buddy_alloc_success          0          0     0.00%
>          htlb_buddy_alloc_fail          0          0     0.00%
>         unevictable_pgs_culled          0          0     0.00%
>        unevictable_pgs_scanned          0          0     0.00%
>        unevictable_pgs_rescued          0          0     0.00%
>        unevictable_pgs_mlocked          0          0     0.00%
>      unevictable_pgs_munlocked          0          0     0.00%
>        unevictable_pgs_cleared          0          0     0.00%
>       unevictable_pgs_stranded          0          0     0.00%
>                thp_fault_alloc      22731      21604    95.04%
>             thp_fault_fallback          0          0     0.00%
>             thp_collapse_alloc          1          0     0.00%
>      thp_collapse_alloc_failed          0          0     0.00%
>                 thp_split_page          0          0     0.00%
>          thp_split_page_failed          0          0     0.00%
>        thp_deferred_split_page      22731      21604    95.04%
>                  thp_split_pmd          0          0     0.00%
>            thp_zero_page_alloc          0          0     0.00%
>     thp_zero_page_alloc_failed          0          0     0.00%
>                balloon_inflate          0          0     0.00%
>                balloon_deflate          0          0     0.00%
>                balloon_migrate          0          0     0.00%
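
For reference, the per-counter deltas in a comparison like the one above
can be recomputed from the sampled files.  A minimal sketch, assuming the
attached parent-proc-vmstat.gz has been gunzipped first and that each
sample begins with a "time: <time>" line followed by "name value" pairs:

awk '
  /^time:/    { sample++; next }    # a "time:" line starts a new sample
  sample == 1 { first[$1] = $2 }    # remember counters from the first sample
              { last[$1]  = $2 }    # keep overwriting; holds the last sample at END
  END {
    for (name in first)
      printf "%32s %14d\n", name, last[name] - first[name]
  }
' parent-proc-vmstat | sort

Running the same command over the fbc file and joining on the counter
name yields the two delta columns and their ratio.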
