linux-kernel.vger.kernel.org archive mirror
From: Aaron Lu <aaron.lu@intel.com>
To: "ying.huang@intel.com" <ying.huang@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	kernel test robot <oliver.sang@intel.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Michal Hocko <mhocko@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>, <lkp@lists.01.org>,
	<lkp@intel.com>, <feng.tang@intel.com>,
	<zhengjun.xing@linux.intel.com>, <fengwei.yin@intel.com>
Subject: Re: [mm/page_alloc]  f26b3fa046:  netperf.Throughput_Mbps -18.0% regression
Date: Fri, 6 May 2022 20:17:11 +0800	[thread overview]
Message-ID: <YnURx04+hE0sQ3v3@ziqianlu-desk1> (raw)
In-Reply-To: <bd3db4de223a010d1e06013e93b09879fc9b36a8.camel@intel.com>

On Fri, May 06, 2022 at 04:40:45PM +0800, ying.huang@intel.com wrote:
> On Fri, 2022-04-29 at 19:29 +0800, Aaron Lu wrote:
> > Hi Mel,
> > 
> > On Wed, Apr 20, 2022 at 09:35:26AM +0800, kernel test robot wrote:
> > > 
> > > (please note we reported
> > > "[mm/page_alloc]  39907a939a:  netperf.Throughput_Mbps -18.1% regression"
> > > on
> > > https://lore.kernel.org/all/20220228155733.GF1643@xsang-OptiPlex-9020/
> > > while the commit was on a branch.
> > > now we still observe a similar regression with it on mainline, and we also
> > > observe a 13.2% improvement on another netperf subtest.
> > > so reporting again for information)
> > > 
> > > Greeting,
> > > 
> > > FYI, we noticed a -18.0% regression of netperf.Throughput_Mbps due to commit:
> > > 
> > > 
> > > commit: f26b3fa046116a7dedcaafe30083402113941451 ("mm/page_alloc: limit number of high-order pages on PCP during bulk free")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > 
> > 
> > So what this commit did is: if a CPU is always doing frees (pcp->free_factor > 0)
> 
> IMHO, this means the consumer and producer are running on different
> CPUs.
>

Right.

> > and if the order of the high-order page being freed is <= PAGE_ALLOC_COSTLY_ORDER,
> > then do not use the PCP but free the page directly to the buddy allocator.
> > 
> > The rationale as explained in the commit's changelog is:
> > "
> > Netperf running on localhost exhibits this pattern and while it does not
> > matter for some machines, it does matter for others with smaller caches
> > where cache misses cause problems due to reduced page reuse. Pages
> > freed directly to the buddy list may be reused quickly while still cache
> > hot where as storing on the PCP lists may be cold by the time
> > free_pcppages_bulk() is called.
> > "
> > 
> > This regression occurred on a machine that has large caches, so this
> > optimization brings it no value, only overhead (the skipped PCP); I
> > guess this is the reason why there is a regression.
> 
> Per my understanding, not only the cache size is larger, but also the L2
> cache (1MB) is per-core on this machine.  So if the consumer and
> producer are running on different cores, the cache-hot page may cause
> more core-to-core cache transfer.  This may hurt performance too.
>

The client side allocates the skb (page) and the server side recvfrom()s
it. recvfrom() copies the page data into the server's own buffer and then
releases the page associated with the skb. The client does all the
allocation and the server does all the freeing; page reuse happens on the
client side. So I think core-to-core cache transfer due to page reuse can
only occur when the client task migrates.

I have modified the job so that the client and server are each bound to a
specific CPU on different cores of the same node, and tested it on the
same Icelake 2-socket server. The result is

  kernel      throughput
8b10b465d0e1     125168
f26b3fa04611     102039 -18%

It's also an 18% drop. I think this means core-to-core transfer is not a factor?

> > I have also tested this case on a small machine: a skylake desktop and
> > this commit shows improvement:
> > 8b10b465d0e1: "netperf.Throughput_Mbps": 72288.76,
> > f26b3fa04611: "netperf.Throughput_Mbps": 90784.4,  +25.6%
> >
> > So this means those directly freed pages get reused by allocator side
> > and that brings performance improvement for machines with smaller cache.
> 
> Per my understanding, the L2 cache on this desktop machine is shared
> among cores.
> 

The said CPU is an i7-6700 and, according to this Wikipedia page,
its L2 is per core:
https://en.wikipedia.org/wiki/Skylake_(microarchitecture)#Mainstream_desktop_processors

> > I wonder if we should still use the PCP a little under the above-said
> > condition, for the purpose of:
> > 1) reduced overhead in the free path for machines with large caches;
> > 2) still keeping the benefit of reused pages for machines with smaller caches.
> > 
> > For this reason, I tested increasing nr_pcp_high() from returning 0 to
> > either returning pcp->batch or (pcp->batch << 2):
> > machine \ nr_pcp_high() ret:  pcp->high       0   pcp->batch   pcp->batch << 2
> > skylake desktop:                  72288   90784        92219             91528
> > icelake 2 sockets:               120956   99177        98251            116108
> > 
> > note: nr_pcp_high() returning pcp->high is the behaviour of this commit's
> > parent; returning 0 is the behaviour of this commit.
> > 
> > The result shows that if we effectively use a PCP high of (pcp->batch << 2)
> > for the described condition, then this workload's performance on the
> > small machine is retained while the regression on the large machine is
> > greatly reduced (from -18% to -4%).
> > 
> 
> Can we use cache size and topology information directly?

That can be complicated by the fact that the system can have multiple
producers (CPUs doing the freeing) running at the same time, so deriving
the perfect number can be a difficult job.
