From: kemi <kemi.wang@intel.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Dave <dave.hansen@linux.intel.com>,
	Andi Kleen <andi.kleen@intel.com>,
	Ying Huang <ying.huang@intel.com>, Aaron Lu <aaron.lu@intel.com>,
	Tim Chen <tim.c.chen@intel.com>, Linux MM <linux-mm@kvack.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/2] Separate NUMA statistics from zone statistics
Date: Wed, 16 Aug 2017 11:23:46 +0800
Message-ID: <7efb3a61-071b-b3cd-2f8a-a264ece9ab44@intel.com>
In-Reply-To: <20170815123636.3788230c@redhat.com>



On 2017-08-15 18:36, Jesper Dangaard Brouer wrote:
> On Tue, 15 Aug 2017 16:45:34 +0800
> Kemi Wang <kemi.wang@intel.com> wrote:
> 
>> Each page allocation updates a set of per-zone statistics with a call to
>> zone_statistics(). As discussed in 2017 MM submit, these are a substantial
>                                              ^^^^^^ should be "summit"

Hi Jesper,

Thanks for reporting this issue and for providing the benchmark to test the raw
performance of page allocation. It was really helpful in figuring out the
root cause.

>> source of overhead in the page allocator and are very rarely consumed. This
>> significant overhead in cache bouncing caused by zone counters (NUMA
>> associated counters) update in parallel in multi-threaded page allocation
>> (pointed out by Dave Hansen).
> 
> Hi Kemi
> 
> Thanks a lot for following up on this work. A link to the MM summit slides:
>  http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf
> 
Thanks for adding the link here. I should have included it in the cover letter.

>> To mitigate this overhead, this patchset separates NUMA statistics from
>> zone statistics framework, and update NUMA counter threshold to a fixed
>> size of 32765, as a small threshold greatly increases the update frequency
>> of the global counter from local per cpu counter (suggested by Ying Huang).
>> The rationality is that these statistics counters don't need to be read
>> often, unlike other VM counters, so it's not a problem to use a large
>> threshold and make readers more expensive.
>>
>> With this patchset, we see 26.6% drop of CPU cycles(537-->394, see below)
>> for per single page allocation and reclaim on Jesper's page_bench03
>> benchmark. Meanwhile, this patchset keeps the same style of virtual memory
>> statistics with little end-user-visible effects (see the first patch for
>> details), except that the number of NUMA items in each cpu
>> (vm_numa_stat_diff[]) is added to zone->vm_numa_stat[] when a user *reads*
>> the value of NUMA counter to eliminate deviation.
> 
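The batching described in the quoted paragraphs above can be sketched in
userspace C. This is only an illustration under stated assumptions, not the
code from the patches: the struct, the function names, NR_CPUS, and the
global_writes bookkeeping field are all hypothetical, and the model is
single-threaded. It shows per-CPU deltas being folded into a shared total once
they reach a threshold, and a reader folding the leftovers to get an exact
value.

```c
#include <assert.h>

#define NR_CPUS 4 /* hypothetical CPU count, for illustration only */

/* Hypothetical userspace model of one NUMA counter; not the kernel structs. */
struct numa_counter {
	long global;                 /* shared total; the update we want to make rare */
	int diff[NR_CPUS];           /* per-CPU pending deltas */
	int threshold;               /* fold a delta once it reaches this value */
	unsigned long global_writes; /* how many times the shared total was written */
};

/* Count one event on the given CPU, folding into the shared total if needed. */
static void numa_counter_inc(struct numa_counter *c, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (++c->diff[cpu] >= c->threshold) {
		c->global += c->diff[cpu];
		c->diff[cpu] = 0;
		c->global_writes++;
	}
}

/* Read an exact value by folding every per-CPU delta into the shared total. */
static long numa_counter_read(struct numa_counter *c)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		c->global += c->diff[cpu];
		c->diff[cpu] = 0;
	}
	return c->global;
}
```

In this model, 1,000,000 increments on one CPU write the shared total 8000
times with a threshold of 125, but only 30 times with a threshold of 32765;
the remainder stays in the per-CPU delta until a reader folds it in.
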
> I'm very happy to see that you found my kernel module for benchmarking useful :-)
> 
>> I did an experiment of single page allocation and reclaim concurrently
>> using Jesper's page_bench03 benchmark on a 2-Socket Broadwell-based server
>> (88 processors with 126G memory) with different size of threshold of pcp
>> counter.
>>
>> Benchmark provided by Jesper D Broucer(increase loop times to 10000000):
>                                  ^^^^^^^
> You mis-spelled my last name, it is "Brouer".
> 
Dear Jesper, I am so sorry about that. Please forgive me :)

>> https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
>>
>>    Threshold   CPU cycles    Throughput(88 threads)
>>       32        799         241760478
>>       64        640         301628829
>>       125       537         358906028 <==> system by default
>>       256       468         412397590
>>       512       428         450550704
>>       4096      399         482520943
>>       20000     394         489009617
>>       30000     395         488017817
>>       32765     394(-26.6%) 488932078(+36.2%) <==> with this patchset
>>       N/A       342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics
>>
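The percentages in the table above are relative changes against the default
threshold of 125 (537 cycles, 358906028 throughput). A minimal sketch of that
arithmetic follows; the helper name pct_change is hypothetical and not part of
the benchmark.

```c
#include <assert.h>

/* Relative change from base to value, in percent. */
static double pct_change(double base, double value)
{
	assert(base != 0.0);
	return 100.0 * (value - base) / base;
}
```

For example, pct_change(537, 394) is about -26.6 and
pct_change(358906028, 488932078) is about +36.2, matching the row for
threshold 32765.
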
>> Kemi Wang (2):
>>   mm: Change the call sites of numa statistics items
>>   mm: Update NUMA counter threshold size
>>
>>  drivers/base/node.c    |  22 ++++---
>>  include/linux/mmzone.h |  25 +++++---
>>  include/linux/vmstat.h |  33 ++++++++++
>>  mm/page_alloc.c        |  10 +--
>>  mm/vmstat.c            | 162 +++++++++++++++++++++++++++++++++++++++++++++++--
>>  5 files changed, 227 insertions(+), 25 deletions(-)
>>
> 
> 
> 

