From: Feng Tang <feng.tang@intel.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	andi.kleen@intel.com, tim.c.chen@intel.com,
	dave.hansen@intel.com, ying.huang@intel.com,
	Shakeel Butt <shakeelb@google.com>, Roman Gushchin <guro@fb.com>,
	songmuchun@bytedance.com
Subject: Re: [PATCH 2/2] mm: memcg: add a new MEMCG_UPDATE_BATCH
Date: Tue, 5 Jan 2021 09:57:38 +0800
Message-ID: <20210105015738.GC101866@shbuild999.sh.intel.com>
In-Reply-To: <20210104131540.GG13207@dhcp22.suse.cz>

On Mon, Jan 04, 2021 at 02:15:40PM +0100, Michal Hocko wrote:
> On Tue 29-12-20 22:35:14, Feng Tang wrote:
> > When profiling memory cgroup involved benchmarking, status update
> > sometimes take quite some CPU cycles. Current MEMCG_CHARGE_BATCH
> > is used for both charging and statistics/events updating, and is
> > set to 32, which may be good for accuracy of memcg charging, but
> > too small for stats update which causes concurrent access to global
> > stats data instead of per-cpu ones.
> > 
> > So handle them differently, by adding a new bigger batch number
> > for stats updating, while keeping the value for charging (though
> > comments in memcontrol.h suggests to consider a bigger value too)
> > 
> > The new batch is set to 512, which considers 2MB huge pages (512
> > pages), as the check logic mostly is:
> > 
> >     if (x > BATCH), then skip updating global data
> > 
> > so it will save 50% global data updating for 2MB pages
> 
> Please note that there is a patch set to change THP accounting to be per
> page based (http://lkml.kernel.org/r/20201228164110.2838-1-songmuchun@bytedance.com)
> which will change the current behavior already.

Thanks for the note! Muchun also mentioned that his patchset will
bring extra overhead, which may cause noticeable performance
changes.

The performance numbers in the commit log are not tied to huge
pages, as the benchmarks don't involve many THPs (according to
vmstat). I tried 128, 256, 512, 1024, etc., which all show similar
test results; 512 was picked specifically for huge pages.
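
As a rough illustration of why the batch value changes how often the
global data is touched, here is a minimal userspace sketch modeled on
the threshold check in __mod_memcg_state(). It is not the kernel code:
struct batched_counter, mod_state() and count_global_updates() are
hypothetical names, a single "CPU" is simulated, and the global counter
is a plain long rather than an atomic.

```c
#include <stdlib.h>

/* Hypothetical model: one CPU's pending delta plus the shared counter. */
struct batched_counter {
	long percpu;		/* pending delta, local to one CPU */
	long global;		/* shared counter (an atomic in the kernel) */
	long global_updates;	/* times the shared counter was written */
};

/*
 * Accumulate val locally and fold the pending delta into the shared
 * counter only once its absolute value exceeds the threshold.
 */
static void mod_state(struct batched_counter *c, long val, long threshold)
{
	long x = c->percpu + val;

	if (labs(x) > threshold) {
		c->global += x;
		c->global_updates++;
		x = 0;
	}
	c->percpu = x;
}

/* Apply n updates of size val and report how many reached the global data. */
static long count_global_updates(long val, long n, long threshold)
{
	struct batched_counter c = { 0, 0, 0 };
	long i;

	for (i = 0; i < n; i++)
		mod_state(&c, val, threshold);

	return c.global_updates;
}
```

In this model, 1000 updates of 512 pages each reach the global counter
1000 times with a threshold of 32 and 500 times with a threshold of
512, which is the 50% saving for 2MB pages mentioned in the commit log.
For 1024 single-page updates the counts are 31 and 1 respectively.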

> Our batch size (MEMCG_CHARGE_BATCH) is quite arbitrary. I do not think
> anybody has ever seriously benchmarked the effect of the size. I am not
> opposed to changing that but I have to say I dislike the charge to
> diverge from counters in that respect. This just opens doors to weird
> effects IMO. Those two are quite related already.

Yes, separating them into two batch numbers is the last resort I
could think of, to avoid hurting the accuracy of charging.

Thanks,
Feng

> > Following are some performance data with the patch, against
> > v5.11-rc1, on several generations of Xeon platforms. Each category
> > below has several subcases run on different platform, and only the
> > worst and best scores are listed:
> > 
> > fio:				 +2.0% ~  +6.8%
> > will-it-scale/malloc:		 -0.9% ~  +6.2%
> > will-it-scale/page_fault1:	 no change
> > will-it-scale/page_fault2:	+13.7% ~ +26.2%
> > 
> > One thought is it could be dynamically calculated according to
> > memcg limit and number of CPUs, and another is to add a periodic
> > syncing of the data for accuracy reason similar to vmstat, as
> > suggested by Ying.
> > 
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > Cc: Shakeel Butt <shakeelb@google.com>
> > Cc: Roman Gushchin <guro@fb.com>
> > ---
> >  include/linux/memcontrol.h | 2 ++
> >  mm/memcontrol.c            | 6 +++---
> >  2 files changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index d827bd7..d58bf28 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -335,6 +335,8 @@ struct mem_cgroup {
> >   */
> >  #define MEMCG_CHARGE_BATCH 32U
> >  
> > +#define MEMCG_UPDATE_BATCH 512U
> > +
> >  extern struct mem_cgroup *root_mem_cgroup;
> >  
> >  enum page_memcg_data_flags {
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 605f671..01ca85d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -760,7 +760,7 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
> >   */
> >  void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
> >  {
> > -	long x, threshold = MEMCG_CHARGE_BATCH;
> > +	long x, threshold = MEMCG_UPDATE_BATCH;
> >  
> >  	if (mem_cgroup_disabled())
> >  		return;
> > @@ -800,7 +800,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
> >  {
> >  	struct mem_cgroup_per_node *pn;
> >  	struct mem_cgroup *memcg;
> > -	long x, threshold = MEMCG_CHARGE_BATCH;
> > +	long x, threshold = MEMCG_UPDATE_BATCH;
> >  
> >  	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> >  	memcg = pn->memcg;
> > @@ -905,7 +905,7 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
> >  		return;
> >  
> >  	x = count + __this_cpu_read(memcg->vmstats_percpu->events[idx]);
> > -	if (unlikely(x > MEMCG_CHARGE_BATCH)) {
> > +	if (unlikely(x > MEMCG_UPDATE_BATCH)) {
> >  		struct mem_cgroup *mi;
> >  
> >  		/*
> > -- 
> > 2.7.4
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs
