All of lore.kernel.org
 help / color / mirror / Atom feed
From: Aaron Lu <aaron.lu@intel.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@kernel.org>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 5/6] mm/page_alloc: Free pages in a single pass during bulk free
Date: Wed, 23 Feb 2022 19:30:52 +0800	[thread overview]
Message-ID: <YhYa7PzhzEqRYXHp@ziqianlu-nuc9qn> (raw)
In-Reply-To: <YhOVzktqtWIJFjiJ@ziqianlu-nuc9qn>

On Mon, Feb 21, 2022 at 09:38:22PM +0800, Aaron Lu wrote:
> On Fri, Feb 18, 2022 at 12:20:03PM +0800, Aaron Lu wrote:
> > On Thu, Feb 17, 2022 at 09:31:13AM +0000, Mel Gorman wrote:
> > > On Thu, Feb 17, 2022 at 09:53:08AM +0800, Aaron Lu wrote:
> > > > > 2-socket CascadeLake (40 cores, 80 CPUs HT enabled)
> > > > >                                                     5.17.0-rc3                 5.17.0-rc3
> > > > >                                                        vanilla           mm-highpcpopt-v2
> > > > > Hmean     page_fault1-processes-2        2694662.26 (   0.00%)      2695780.35 (   0.04%)
> > > > > Hmean     page_fault1-processes-5        6425819.34 (   0.00%)      6435544.57 *   0.15%*
> > > > > Hmean     page_fault1-processes-8        9642169.10 (   0.00%)      9658962.39 (   0.17%)
> > > > > Hmean     page_fault1-processes-12      12167502.10 (   0.00%)     12190163.79 (   0.19%)
> > > > > Hmean     page_fault1-processes-21      15636859.03 (   0.00%)     15612447.26 (  -0.16%)
> > > > > Hmean     page_fault1-processes-30      25157348.61 (   0.00%)     25169456.65 (   0.05%)
> > > > > Hmean     page_fault1-processes-48      27694013.85 (   0.00%)     27671111.46 (  -0.08%)
> > > > > Hmean     page_fault1-processes-79      25928742.64 (   0.00%)     25934202.02 (   0.02%) <--
> > > > > Hmean     page_fault1-processes-110     25730869.75 (   0.00%)     25671880.65 *  -0.23%*
> > > > > Hmean     page_fault1-processes-141     25626992.42 (   0.00%)     25629551.61 (   0.01%)
> > > > > Hmean     page_fault1-processes-172     25611651.35 (   0.00%)     25614927.99 (   0.01%)
> > > > > Hmean     page_fault1-processes-203     25577298.75 (   0.00%)     25583445.59 (   0.02%)
> > > > > Hmean     page_fault1-processes-234     25580686.07 (   0.00%)     25608240.71 (   0.11%)
> > > > > Hmean     page_fault1-processes-265     25570215.47 (   0.00%)     25568647.58 (  -0.01%)
> > > > > Hmean     page_fault1-processes-296     25549488.62 (   0.00%)     25543935.00 (  -0.02%)
> > > > > Hmean     page_fault1-processes-320     25555149.05 (   0.00%)     25575696.74 (   0.08%)
> > > > > 
> > > > > The differences are mostly within the noise and the difference close to
> > > > > $nr_cpus is negligible.
> > > > 
> > > > I have queued will-it-scale/page_fault1/processes/$nr_cpu on 2 4-sockets
> > > > servers: CascadeLake and CooperLaker and will let you know the result
> > > > once it's out.
> > > > 
> > > 
> > > Thanks, 4 sockets and a later generation would be nice to cover.
> > > 
> > > > I'm using 'https://github.com/hnaz/linux-mm master' and doing the
> > > > comparison with commit c000d687ce22("mm/page_alloc: simplify how many
> > > > pages are selected per pcp list during bulk free") and commit 8391e0a7e172
> > > > ("mm/page_alloc: free pages in a single pass during bulk free") there.
> > > > 
> > > 
> > > The baseline looks fine. It's different to what I used but the page_alloc
> > > shouldn't have much impact.
> > > 
> > > When looking at will-it-scale, please pay attention to lower CPU counts
> > > as well and take account changes in standard deviation. Looking at the
> > 
> > I'll also test nr_task=4/16/64 on the 4sockets CooperLake(nr_cpu=144) then.
> > 
> 
> For the record, these tests don't show any visible performance changes
> on CooperLake.

One thing I just noticed is that, zone lock contention increased to some
extent. I'm not sure if this is worrisome so I suppose I should at least
mention it here.

The nr_task=100% test on the 4 sockets Cooper Lake showed that zone lock
contention increased from 13.56% to 20.16% and for nr_task=16, it
increased from 4.75% to 6.18%.

The reason is probably due to more code are now inside the lock and when
there is contention, it will make things worse. I'm aware of that
nr_task=100% is a rare case and this patchset is meant to improve things
when there is very little contention, which should be the common case.
So I guess that's just the tradeoff we have to make...

Here are the results on performance metric and zone lock metrics:

nr_task=100%
=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/thp_enabled/cpufreq_governor:
  lkp-cpl-4sp1/will-it-scale/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/100%/process/page_fault1/never/performance

commit/ucode:
  8391e0a7e1728d74faecebf096b446ac5d0a5709/0x7002302 (mm/page_alloc: free pages in a single pass during bulk free)
  c000d687ce22252c8ea96e47b4a2add592fbad6c/0x7002302 (mm/page_alloc: simplify how many pages are selected per pcp list during bulk free)
  7decb609034044e56cffd1c9971738878467ee96/0x7002402 (mm/page_alloc: Do not prefetch buddies during bulk free)

8391e0a7e1728d74 c000d687ce22252c8ea96e47b4a 7decb609034044e56cffd1c9971
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
  11807831            -0.5%   11750578            -0.3%   11778047        will-it-scale.144.processes
     15.44 ± 10%      -4.9       10.58 ±  8%      +0.6       16.01 ±  5%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.rmqueue_bulk.get_page_from_freelist.__alloc_pages
      4.72 ±  8%      -1.7        2.98            -0.1        4.63 ±  3%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.free_unref_page_list.release_pages


nr_task=16
=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/thp_enabled/cpufreq_governor/ucode:
  lkp-cpl-4sp1/will-it-scale/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/16/process/page_fault1/never/performance/0x7002402

commit:
  8391e0a7e1728d74faecebf096b446ac5d0a5709 (mm/page_alloc: free pages in a single pass during bulk free)
  c000d687ce22252c8ea96e47b4a2add592fbad6c (mm/page_alloc: simplify how many pages are selected per pcp list during bulk free)
  7decb609034044e56cffd1c9971738878467ee96 (mm/page_alloc: Do not prefetch buddies during bulk free)

8391e0a7e1728d74 c000d687ce22252c8ea96e47b4a 7decb609034044e56cffd1c9971
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   3410615            +0.2%    3416565            +0.2%    3415846        will-it-scale.16.processes
      4.83 ±  3%      -1.1        3.76 ±  9%      -0.4        4.40 ±  4%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.rmqueue_bulk.get_page_from_freelist.__alloc_pages
      1.35 ±  9%      -0.4        0.99 ± 14%      -0.2        1.17 ±  3%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.free_unref_page_list.release_pages

Regards,
Aaron

  reply	other threads:[~2022-02-23 11:31 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-17  0:22 [PATCH v2 0/6] Follow-up on high-order PCP caching Mel Gorman
2022-02-17  0:22 ` [PATCH 1/6] mm/page_alloc: Fetch the correct pcp buddy during bulk free Mel Gorman
2022-02-17  1:43   ` Aaron Lu
2022-02-17  0:22 ` [PATCH 2/6] mm/page_alloc: Track range of active PCP lists " Mel Gorman
2022-02-17  9:41   ` Vlastimil Babka
2022-02-17  0:22 ` [PATCH 3/6] mm/page_alloc: Simplify how many pages are selected per pcp list " Mel Gorman
2022-02-17  0:22 ` [PATCH 4/6] mm/page_alloc: Drain the requested list first " Mel Gorman
2022-02-17  9:42   ` Vlastimil Babka
2022-02-17  0:22 ` [PATCH 5/6] mm/page_alloc: Free pages in a single pass " Mel Gorman
2022-02-17  1:53   ` Aaron Lu
2022-02-17  8:49     ` Aaron Lu
2022-02-17  9:31     ` Mel Gorman
2022-02-18  4:20       ` Aaron Lu
2022-02-18  9:20         ` Mel Gorman
2022-02-21 13:38         ` Aaron Lu
2022-02-23 11:30           ` Aaron Lu [this message]
2022-02-23 13:05             ` Mel Gorman
2022-02-24  1:34               ` Lu, Aaron
2022-02-18  6:07   ` Aaron Lu
2022-02-18  9:47     ` Mel Gorman
2022-02-18 12:13       ` Aaron Lu
2022-02-17  0:22 ` [PATCH 6/6] mm/page_alloc: Limit number of high-order pages on PCP " Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YhYa7PzhzEqRYXHp@ziqianlu-nuc9qn \
    --to=aaron.lu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=brouer@redhat.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.