linux-mm.kvack.org archive mirror
* [PATCH] mm: memcontrol: optimize per-lruvec stats counter memory usage
@ 2020-12-06  8:56 Muchun Song
  2020-12-07 12:36 ` Michal Hocko
  0 siblings, 1 reply; 5+ messages in thread
From: Muchun Song @ 2020-12-06  8:56 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro, sfr,
	alexander.h.duyck, chris, laoar.shao, richard.weiyang
  Cc: linux-kernel, cgroups, linux-mm, Muchun Song

The vmstat threshold is 32 (MEMCG_CHARGE_BATCH), so an s32 is wide
enough for the per-CPU lruvec_stat_cpu counters. Introduce struct
per_cpu_lruvec_stat with s32 counters to reduce memory usage.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/memcontrol.h | 6 +++++-
 mm/memcontrol.c            | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f9a496c4eac7..34cf119976b1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -92,6 +92,10 @@ struct lruvec_stat {
 	long count[NR_VM_NODE_STAT_ITEMS];
 };
 
+struct per_cpu_lruvec_stat {
+	s32 count[NR_VM_NODE_STAT_ITEMS];
+};
+
 /*
  * Bitmap of shrinker::id corresponding to memcg-aware shrinkers,
  * which have elements charged to this memcg.
@@ -111,7 +115,7 @@ struct mem_cgroup_per_node {
 	struct lruvec_stat __percpu *lruvec_stat_local;
 
 	/* Subtree VM stats (batched updates) */
-	struct lruvec_stat __percpu *lruvec_stat_cpu;
+	struct per_cpu_lruvec_stat __percpu *lruvec_stat_cpu;
 	atomic_long_t		lruvec_stat[NR_VM_NODE_STAT_ITEMS];
 
 	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 49fbcf003bf5..c874ea37b05d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5184,7 +5184,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 		return 1;
 	}
 
-	pn->lruvec_stat_cpu = alloc_percpu_gfp(struct lruvec_stat,
+	pn->lruvec_stat_cpu = alloc_percpu_gfp(struct per_cpu_lruvec_stat,
 					       GFP_KERNEL_ACCOUNT);
 	if (!pn->lruvec_stat_cpu) {
 		free_percpu(pn->lruvec_stat_local);
-- 
2.11.0




* Re: [PATCH] mm: memcontrol: optimize per-lruvec stats counter memory usage
  2020-12-06  8:56 [PATCH] mm: memcontrol: optimize per-lruvec stats counter memory usage Muchun Song
@ 2020-12-07 12:36 ` Michal Hocko
  2020-12-07 12:56   ` [External] " Muchun Song
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2020-12-07 12:36 UTC (permalink / raw)
  To: Muchun Song
  Cc: hannes, vdavydov.dev, akpm, shakeelb, guro, sfr,
	alexander.h.duyck, chris, laoar.shao, richard.weiyang,
	linux-kernel, cgroups, linux-mm

On Sun 06-12-20 16:56:39, Muchun Song wrote:
> The vmstat threshold is 32 (MEMCG_CHARGE_BATCH), so an s32 is wide
> enough for the per-CPU lruvec_stat_cpu counters. Introduce struct
> per_cpu_lruvec_stat with s32 counters to reduce memory usage.

How much savings are we talking about here? I am not deeply familiar
with the pcp allocator but can it compact smaller data types much
better?

> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  include/linux/memcontrol.h | 6 +++++-
>  mm/memcontrol.c            | 2 +-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index f9a496c4eac7..34cf119976b1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -92,6 +92,10 @@ struct lruvec_stat {
>  	long count[NR_VM_NODE_STAT_ITEMS];
>  };
>  
> +struct per_cpu_lruvec_stat {
> +	s32 count[NR_VM_NODE_STAT_ITEMS];
> +};
> +
>  /*
>   * Bitmap of shrinker::id corresponding to memcg-aware shrinkers,
>   * which have elements charged to this memcg.
> @@ -111,7 +115,7 @@ struct mem_cgroup_per_node {
>  	struct lruvec_stat __percpu *lruvec_stat_local;
>  
>  	/* Subtree VM stats (batched updates) */
> -	struct lruvec_stat __percpu *lruvec_stat_cpu;
> +	struct per_cpu_lruvec_stat __percpu *lruvec_stat_cpu;
>  	atomic_long_t		lruvec_stat[NR_VM_NODE_STAT_ITEMS];
>  
>  	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 49fbcf003bf5..c874ea37b05d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5184,7 +5184,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>  		return 1;
>  	}
>  
> -	pn->lruvec_stat_cpu = alloc_percpu_gfp(struct lruvec_stat,
> +	pn->lruvec_stat_cpu = alloc_percpu_gfp(struct per_cpu_lruvec_stat,
>  					       GFP_KERNEL_ACCOUNT);
>  	if (!pn->lruvec_stat_cpu) {
>  		free_percpu(pn->lruvec_stat_local);
> -- 
> 2.11.0

-- 
Michal Hocko
SUSE Labs



* Re: [External] Re: [PATCH] mm: memcontrol: optimize per-lruvec stats counter memory usage
  2020-12-07 12:36 ` Michal Hocko
@ 2020-12-07 12:56   ` Muchun Song
  2020-12-07 15:09     ` Michal Hocko
  0 siblings, 1 reply; 5+ messages in thread
From: Muchun Song @ 2020-12-07 12:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Shakeel Butt,
	Roman Gushchin, Stephen Rothwell, alexander.h.duyck, Chris Down,
	Yafang Shao, richard.weiyang, LKML, Cgroups,
	Linux Memory Management List

On Mon, Dec 7, 2020 at 8:36 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Sun 06-12-20 16:56:39, Muchun Song wrote:
> > The vmstat threshold is 32 (MEMCG_CHARGE_BATCH), so an s32 is wide
> > enough for the per-CPU lruvec_stat_cpu counters. Introduce struct
> > per_cpu_lruvec_stat with s32 counters to reduce memory usage.
>
> How much savings are we talking about here? I am not deeply familiar
> with the pcp allocator but can it compact smaller data types much
> better?

It is a percpu struct. The size of struct lruvec_stat is 304 bytes
(measured on linux-5.5), so switching the counters from long to s32
saves 304 / 2 * nproc bytes per memcg, where nproc is the number of
possible CPUs. With n memory cgroups in the system, the total saving
is 152 * nproc * n bytes. In some configurations nproc can be 512, and
with a lot of dying cgroups n can reach 100,000 (I once saw that on my
server).
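
To make that arithmetic concrete, here is a small userspace sketch of
the calculation (plain C, only illustrative, not kernel code; the
figures are the example numbers above: a 304-byte struct, i.e. 38 long
counters, 512 possible CPUs and 100,000 cgroups):

#include <stdio.h>

int main(void)
{
	const long long item_old = 8;			/* sizeof(long) on 64-bit */
	const long long item_new = 4;			/* sizeof(s32) */
	const long long nr_items = 304 / item_old;	/* 38 counters on linux-5.5 */
	const long long nproc = 512;			/* example: possible CPUs */
	const long long n_memcg = 100000;		/* example: incl. dying cgroups */

	long long per_cpu = (item_old - item_new) * nr_items;	/* 152 bytes */
	long long total = per_cpu * nproc * n_memcg;

	printf("saved per memcg per cpu: %lld bytes\n", per_cpu);
	printf("saved in total: %lld bytes (~%lld MiB)\n", total, total >> 20);

	return 0;
}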

The type cannot be any smaller, though: the vmstat threshold is 32,
and some vmstat counters are accounted in bytes, so the maximum value
is 32 * 4k = 131072, which does not fit in an s16. Thanks.

>
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  include/linux/memcontrol.h | 6 +++++-
> >  mm/memcontrol.c            | 2 +-
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index f9a496c4eac7..34cf119976b1 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -92,6 +92,10 @@ struct lruvec_stat {
> >       long count[NR_VM_NODE_STAT_ITEMS];
> >  };
> >
> > +struct per_cpu_lruvec_stat {
> > +     s32 count[NR_VM_NODE_STAT_ITEMS];
> > +};
> > +
> >  /*
> >   * Bitmap of shrinker::id corresponding to memcg-aware shrinkers,
> >   * which have elements charged to this memcg.
> > @@ -111,7 +115,7 @@ struct mem_cgroup_per_node {
> >       struct lruvec_stat __percpu *lruvec_stat_local;
> >
> >       /* Subtree VM stats (batched updates) */
> > -     struct lruvec_stat __percpu *lruvec_stat_cpu;
> > +     struct per_cpu_lruvec_stat __percpu *lruvec_stat_cpu;
> >       atomic_long_t           lruvec_stat[NR_VM_NODE_STAT_ITEMS];
> >
> >       unsigned long           lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 49fbcf003bf5..c874ea37b05d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5184,7 +5184,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> >               return 1;
> >       }
> >
> > -     pn->lruvec_stat_cpu = alloc_percpu_gfp(struct lruvec_stat,
> > +     pn->lruvec_stat_cpu = alloc_percpu_gfp(struct per_cpu_lruvec_stat,
> >                                              GFP_KERNEL_ACCOUNT);
> >       if (!pn->lruvec_stat_cpu) {
> >               free_percpu(pn->lruvec_stat_local);
> > --
> > 2.11.0
>
> --
> Michal Hocko
> SUSE Labs



-- 
Yours,
Muchun



* Re: [External] Re: [PATCH] mm: memcontrol: optimize per-lruvec stats counter memory usage
  2020-12-07 12:56   ` [External] " Muchun Song
@ 2020-12-07 15:09     ` Michal Hocko
  2020-12-07 15:19       ` Muchun Song
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2020-12-07 15:09 UTC (permalink / raw)
  To: Muchun Song
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Shakeel Butt,
	Roman Gushchin, Stephen Rothwell, alexander.h.duyck, Chris Down,
	Yafang Shao, richard.weiyang, LKML, Cgroups,
	Linux Memory Management List

On Mon 07-12-20 20:56:58, Muchun Song wrote:
> On Mon, Dec 7, 2020 at 8:36 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Sun 06-12-20 16:56:39, Muchun Song wrote:
> > > The vmstat threshold is 32 (MEMCG_CHARGE_BATCH), so an s32 is wide
> > > enough for the per-CPU lruvec_stat_cpu counters. Introduce struct
> > > per_cpu_lruvec_stat with s32 counters to reduce memory usage.
> >
> > How much savings are we talking about here? I am not deeply familiar
> > with the pcp allocator but can it compact smaller data types much
> > better?
> 
> It is a percpu struct. The size of struct lruvec_stat is 304 bytes
> (measured on linux-5.5), so switching the counters from long to s32
> saves 304 / 2 * nproc bytes per memcg, where nproc is the number of
> possible CPUs. With n memory cgroups in the system, the total saving
> is 152 * nproc * n bytes. In some configurations nproc can be 512, and
> with a lot of dying cgroups n can reach 100,000 (I once saw that on my
> server).

This should be part of the changelog. In general, any optimization
should come with some numbers showing the effect of the optimization.

As I've said, I am not really familiar with pcp internals and how
efficiently it can organize smaller objects. Maybe it can really halve
the memory consumption.

My only concern is that using smaller types for these counters can
backfire later on, because we have an indirect dependency between the
batch size and the data type. In general I do not really object to the
patch as long as the savings are non-trivial, so that we are not
creating a potential trap for what is practically a minuscule
micro-optimization.
-- 
Michal Hocko
SUSE Labs



* Re: [External] Re: [PATCH] mm: memcontrol: optimize per-lruvec stats counter memory usage
  2020-12-07 15:09     ` Michal Hocko
@ 2020-12-07 15:19       ` Muchun Song
  0 siblings, 0 replies; 5+ messages in thread
From: Muchun Song @ 2020-12-07 15:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Shakeel Butt,
	Roman Gushchin, Stephen Rothwell, Chris Down, Yafang Shao,
	richard.weiyang, LKML, Cgroups, Linux Memory Management List

On Mon, Dec 7, 2020 at 11:09 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 07-12-20 20:56:58, Muchun Song wrote:
> > On Mon, Dec 7, 2020 at 8:36 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Sun 06-12-20 16:56:39, Muchun Song wrote:
> > > > The vmstat threshold is 32 (MEMCG_CHARGE_BATCH), so an s32 is wide
> > > > enough for the per-CPU lruvec_stat_cpu counters. Introduce struct
> > > > per_cpu_lruvec_stat with s32 counters to reduce memory usage.
> > >
> > > How much savings are we talking about here? I am not deeply familiar
> > > with the pcp allocator but can it compact smaller data types much
> > > better?
> >
> > It is a percpu struct. The size of struct lruvec_stat is 304 bytes
> > (measured on linux-5.5), so switching the counters from long to s32
> > saves 304 / 2 * nproc bytes per memcg, where nproc is the number of
> > possible CPUs. With n memory cgroups in the system, the total saving
> > is 152 * nproc * n bytes. In some configurations nproc can be 512, and
> > with a lot of dying cgroups n can reach 100,000 (I once saw that on my
> > server).
>
> This should be part of the changelog. In general, any optimization
> should come with some numbers showing the effect of the optimization.
>
> As I've said, I am not really familiar with pcp internals and how
> efficiently it can organize smaller objects. Maybe it can really halve
> the memory consumption.
>
> My only concern is that using smaller types for these counters can
> backfire later on, because we have an indirect dependency between the
> batch size and the data type. In general I do not really object to the
> patch as long as the savings are non-trivial, so that we are not
> creating a potential trap for what is practically a minuscule
> micro-optimization.

There is a similar structure, struct per_cpu_nodestat:

struct per_cpu_nodestat {
        s8 stat_threshold;
        s8 vm_node_stat_diff[NR_VM_NODE_STAT_ITEMS];
};

An s8 is enough for the per-node vmstat counters, and that choice also
depends on the batch size; it has been s8 for a long time. Why would an
s32 not be suitable for the per-memcg vmstat counters? The two cases
are very similar, right?
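
The dependency between the batch size and the per-cpu type comes from
the flush-on-threshold pattern: deltas accumulate in the narrow per-cpu
counter and are folded into a wide shared counter once they cross the
threshold, and for byte-based counters the threshold is scaled by the
page size, which is where the 32 * 4k = 131072 figure above comes from.
A userspace toy model of the pattern (only a sketch of the idea under
those assumptions, not the kernel code, which uses this_cpu ops and
atomic_long_add()):

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdint.h>

#define BATCH	32	/* stands in for MEMCG_CHARGE_BATCH */
#define PAGE_SZ	4096	/* example: 4 KiB pages */

/* Accumulate a delta in the narrow per-cpu counter and flush it into
 * the wide shared counter once it crosses the threshold.  The resident
 * per-cpu value therefore stays within the threshold: up to 32 for
 * page-based counters, up to 32 * 4k = 131072 for byte-based ones,
 * which rules out s16 but fits comfortably in s32. */
static void mod_stat(int32_t *pcpu, int64_t *shared, int32_t val, bool in_bytes)
{
	int32_t threshold = in_bytes ? BATCH * PAGE_SZ : BATCH;
	int32_t x = *pcpu + val;

	if (abs(x) > threshold) {
		*shared += x;
		x = 0;
	}
	*pcpu = x;
}

int main(void)
{
	int32_t pcpu = 0;
	int64_t shared = 0;

	/* byte-based counter updated in page-sized steps */
	for (int i = 0; i < 1000; i++)
		mod_stat(&pcpu, &shared, PAGE_SZ, true);

	printf("pcpu=%d shared=%lld\n", pcpu, (long long)shared);
	return 0;
}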

Thanks.

> --
> Michal Hocko
> SUSE Labs



-- 
Yours,
Muchun


