* [PATCH] mm, memcg: reduce size of struct mem_cgroup by using bit field
@ 2019-12-27 12:43 Yafang Shao
  2019-12-27 12:43 ` [PATCH] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself Yafang Shao
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Yafang Shao @ 2019-12-27 12:43 UTC (permalink / raw)
  To: guro, hannes, mhocko, vdavydov.dev, akpm; +Cc: linux-mm, Yafang Shao

Some members of struct mem_cgroup can only be either 0 (false) or
1 (true), so we can define them as bit fields to reduce the size. With
this patch, the size of struct mem_cgroup can be reduced by 64 bytes in
theory, but as there are some MEMCG_PADDING()s, the real number may
differ, depending on the cacheline size. In any case, this patch reduces
the size of struct mem_cgroup at least somewhat.
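
For illustration, the saving comes from packing adjacent flags into a
single word. A minimal user-space sketch (a made-up struct, not the
kernel one):

    #include <stdio.h>

    /* five boolean flags as plain _Bool members ... */
    struct with_bools {
            _Bool a, b, c, d, e;
            int x;
    };

    /* ... vs. the same five flags packed into one word as bit fields */
    struct with_bits {
            unsigned int a : 1, b : 1, c : 1, d : 1, e : 1;
            int x;
    };

    int main(void)
    {
            /* typically prints "12 8" on x86-64; exact sizes are ABI-dependent */
            printf("%zu %zu\n", sizeof(struct with_bools),
                   sizeof(struct with_bits));
            return 0;
    }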

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
---
 include/linux/memcontrol.h | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a7a0a1a5..f68a9ef 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -229,20 +229,26 @@ struct mem_cgroup {
 	/*
 	 * Should the accounting and control be hierarchical, per subtree?
 	 */
-	bool use_hierarchy;
+	unsigned int use_hierarchy : 1;
+
+	/* Legacy tcp memory accounting */
+	unsigned int tcpmem_active : 1;
+	unsigned int tcpmem_pressure : 1;
 
 	/*
 	 * Should the OOM killer kill all belonging tasks, had it kill one?
 	 */
-	bool oom_group;
+	unsigned int  oom_group : 1;
 
 	/* protected by memcg_oom_lock */
-	bool		oom_lock;
-	int		under_oom;
+	unsigned int oom_lock : 1;
 
-	int	swappiness;
 	/* OOM-Killer disable */
-	int		oom_kill_disable;
+	unsigned int oom_kill_disable : 1;
+
+	int under_oom;
+
+	int	swappiness;
 
 	/* memory.events and memory.events.local */
 	struct cgroup_file events_file;
@@ -297,9 +303,6 @@ struct mem_cgroup {
 
 	unsigned long		socket_pressure;
 
-	/* Legacy tcp memory accounting */
-	bool			tcpmem_active;
-	int			tcpmem_pressure;
 
 #ifdef CONFIG_MEMCG_KMEM
         /* Index in the kmem_cache->memcg_params.memcg_caches array */
-- 
1.8.3.1




* [PATCH] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself
  2019-12-27 12:43 [PATCH] mm, memcg: reduce size of struct mem_cgroup by using bit field Yafang Shao
@ 2019-12-27 12:43 ` Yafang Shao
  2019-12-27 23:49   ` Roman Gushchin
  2019-12-27 23:55 ` [PATCH] mm, memcg: reduce size of struct mem_cgroup by using bit field Roman Gushchin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Yafang Shao @ 2019-12-27 12:43 UTC (permalink / raw)
  To: guro, hannes, mhocko, vdavydov.dev, akpm
  Cc: linux-mm, Yafang Shao, Chris Down, stable

memory.{emin, elow} are set in mem_cgroup_protected(), and their values
won't change until the next recalculation in this function. After either
or both of them are set, the next reclaimer to reclaim this memcg may be
a different one, e.g. one for which this memcg is the root of the
reclaim; mem_cgroup_protection(), called from get_scan_count(), will
then use the stale values to calculate the scan count, which is not
correct. We should reset them to zero in this case.

Here's an example of this issue.

    root_mem_cgroup
         /
        A   memory.max=1024M memory.min=512M memory.current=800M

Once kswapd is woken up, it will try to scan all memcgs, including
this A, and it will set A's memory.emin to 512M.
After that, A may reach its hard limit (memory.max) and then do memcg
reclaim. Because A is the root of this reclaim, its memory.emin is not
recalculated, so it keeps the stale value of 512M, and this value will
be used by mem_cgroup_protection(), called from get_scan_count(), to
compute the scan count. That is not correct.
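
For reference, the consumer of these fields looks roughly like this (a
paraphrased sketch of the v5.5-era helper; details may differ from the
exact source):

    static inline unsigned long mem_cgroup_protection(struct mem_cgroup *memcg,
                                                      bool in_low_reclaim)
    {
            /* get_scan_count() uses whatever emin/elow currently hold */
            if (in_low_reclaim)
                    return memcg->memory.emin;

            return max(memcg->memory.emin, memcg->memory.elow);
    }

so a stale emin/elow directly skews the number of pages scanned.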

Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Roman Gushchin <guro@fb.com>
Cc: stable@vger.kernel.org
---
 mm/memcontrol.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 601405b..bb3925d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6287,8 +6287,17 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 
 	if (!root)
 		root = root_mem_cgroup;
-	if (memcg == root)
+	if (memcg == root) {
+		/*
+		 * Reset memory.(emin, elow) for reclaiming the memcg
+		 * itself.
+		 */
+		if (memcg != root_mem_cgroup) {
+			memcg->memory.emin = 0;
+			memcg->memory.elow = 0;
+		}
 		return MEMCG_PROT_NONE;
+	}
 
 	usage = page_counter_read(&memcg->memory);
 	if (!usage)
-- 
1.8.3.1



* Re: [PATCH] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself
  2019-12-27 12:43 ` [PATCH] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself Yafang Shao
@ 2019-12-27 23:49   ` Roman Gushchin
  2019-12-28  1:45       ` Yafang Shao
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2019-12-27 23:49 UTC (permalink / raw)
  To: Yafang Shao
  Cc: hannes, mhocko, vdavydov.dev, akpm, linux-mm, Chris Down, stable

On Fri, Dec 27, 2019 at 07:43:53AM -0500, Yafang Shao wrote:
> memory.{emin, elow} are set in mem_cgroup_protected(), and their values
> won't change until the next recalculation in this function. After either
> or both of them are set, the next reclaimer to reclaim this memcg may be
> a different one, e.g. one for which this memcg is the root of the
> reclaim; mem_cgroup_protection(), called from get_scan_count(), will
> then use the stale values to calculate the scan count, which is not
> correct. We should reset them to zero in this case.
> 
> Here's an example of this issue.
> 
>     root_mem_cgroup
>          /
>         A   memory.max=1024M memory.min=512M memory.current=800M
> 
> Once kswapd is woken up, it will try to scan all memcgs, including
> this A, and it will set A's memory.emin to 512M.
> After that, A may reach its hard limit (memory.max) and then do memcg
> reclaim. Because A is the root of this reclaim, its memory.emin is not
> recalculated, so it keeps the stale value of 512M, and this value will
> be used by mem_cgroup_protection(), called from get_scan_count(), to
> compute the scan count. That is not correct.
> 
> Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Chris Down <chris@chrisdown.name>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: stable@vger.kernel.org
> ---
>  mm/memcontrol.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 601405b..bb3925d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6287,8 +6287,17 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
>  
>  	if (!root)
>  		root = root_mem_cgroup;
> -	if (memcg == root)
> +	if (memcg == root) {
> +		/*
> +		 * Reset memory.(emin, elow) for reclaiming the memcg
> +		 * itself.
> +		 */
> +		if (memcg != root_mem_cgroup) {
> +			memcg->memory.emin = 0;
> +			memcg->memory.elow = 0;
> +		}

I'm sorry, I didn't dig into this from scratch, but I doubt that zeroing
the effective protection is correct. Imagine a simple config: a large
cgroup subtree with memory.max set at the top level. Reaching this limit
doesn't mean that all protection configuration inside the tree can be
ignored.

Instead we should respect memory.low/max set by a user on this level
(look at the parent == root case), maybe clamped by memory.high/max.

Thanks!


* Re: [PATCH] mm, memcg: reduce size of struct mem_cgroup by using bit field
  2019-12-27 12:43 [PATCH] mm, memcg: reduce size of struct mem_cgroup by using bit field Yafang Shao
  2019-12-27 12:43 ` [PATCH] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself Yafang Shao
@ 2019-12-27 23:55 ` Roman Gushchin
  2019-12-28  4:22   ` Yafang Shao
  2019-12-31 22:31 ` Andrew Morton
  2020-01-06 10:19 ` Michal Hocko
  3 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2019-12-27 23:55 UTC (permalink / raw)
  To: Yafang Shao; +Cc: hannes, mhocko, vdavydov.dev, akpm, linux-mm

On Fri, Dec 27, 2019 at 07:43:52AM -0500, Yafang Shao wrote:
> Some members of struct mem_cgroup can only be either 0 (false) or
> 1 (true), so we can define them as bit fields to reduce the size. With
> this patch, the size of struct mem_cgroup can be reduced by 64 bytes in
> theory, but as there are some MEMCG_PADDING()s, the real number may
> differ, depending on the cacheline size. In any case, this patch reduces
> the size of struct mem_cgroup at least somewhat.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Roman Gushchin <guro@fb.com>
> ---
>  include/linux/memcontrol.h | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index a7a0a1a5..f68a9ef 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -229,20 +229,26 @@ struct mem_cgroup {
>  	/*
>  	 * Should the accounting and control be hierarchical, per subtree?
>  	 */
> -	bool use_hierarchy;
> +	unsigned int use_hierarchy : 1;
> +
> +	/* Legacy tcp memory accounting */
> +	unsigned int tcpmem_active : 1;
> +	unsigned int tcpmem_pressure : 1;
>  
>  	/*
>  	 * Should the OOM killer kill all belonging tasks, had it kill one?
>  	 */
> -	bool oom_group;
> +	unsigned int  oom_group : 1;
>  
>  	/* protected by memcg_oom_lock */
> -	bool		oom_lock;
> -	int		under_oom;
> +	unsigned int oom_lock : 1;

Hm, in the original code it was clear that oom_lock and under_oom are
protected by memcg_oom_lock, but oom_kill_disable is not.

This information seems to be lost.
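
One way to keep that information would be to group the fields under one
comment again, e.g. (a sketch of the layout, not a tested patch):

    /* oom_lock and under_oom are protected by memcg_oom_lock */
    unsigned int oom_lock : 1;
    int under_oom;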

Also, I'd look at the actual memory savings. Is it worth it,
or is it all eaten by the padding?

Thanks!

>  
> -	int	swappiness;
>  	/* OOM-Killer disable */
> -	int		oom_kill_disable;
> +	unsigned int oom_kill_disable : 1;
> +
> +	int under_oom;
> +
> +	int	swappiness;
>  
>  	/* memory.events and memory.events.local */
>  	struct cgroup_file events_file;
> @@ -297,9 +303,6 @@ struct mem_cgroup {
>  
>  	unsigned long		socket_pressure;
>  
> -	/* Legacy tcp memory accounting */
> -	bool			tcpmem_active;
> -	int			tcpmem_pressure;
>  
>  #ifdef CONFIG_MEMCG_KMEM
>          /* Index in the kmem_cache->memcg_params.memcg_caches array */
> -- 
> 1.8.3.1
> 



* Re: [PATCH] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself
  2019-12-27 23:49   ` Roman Gushchin
@ 2019-12-28  1:45       ` Yafang Shao
  0 siblings, 0 replies; 13+ messages in thread
From: Yafang Shao @ 2019-12-28  1:45 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: hannes, mhocko, vdavydov.dev, akpm, linux-mm, Chris Down, stable

On Sat, Dec 28, 2019 at 7:49 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Fri, Dec 27, 2019 at 07:43:53AM -0500, Yafang Shao wrote:
> > memory.{emin, elow} are set in mem_cgroup_protected(), and their values
> > won't change until the next recalculation in this function. After either
> > or both of them are set, the next reclaimer to reclaim this memcg may be
> > a different one, e.g. one for which this memcg is the root of the
> > reclaim; mem_cgroup_protection(), called from get_scan_count(), will
> > then use the stale values to calculate the scan count, which is not
> > correct. We should reset them to zero in this case.
> >
> > Here's an example of this issue.
> >
> >     root_mem_cgroup
> >          /
> >         A   memory.max=1024M memory.min=512M memory.current=800M
> >
> > Once kswapd is woken up, it will try to scan all memcgs, including
> > this A, and it will set A's memory.emin to 512M.
> > After that, A may reach its hard limit (memory.max) and then do memcg
> > reclaim. Because A is the root of this reclaim, its memory.emin is not
> > recalculated, so it keeps the stale value of 512M, and this value will
> > be used by mem_cgroup_protection(), called from get_scan_count(), to
> > compute the scan count. That is not correct.
> >
> > Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Chris Down <chris@chrisdown.name>
> > Cc: Roman Gushchin <guro@fb.com>
> > Cc: stable@vger.kernel.org
> > ---
> >  mm/memcontrol.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 601405b..bb3925d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -6287,8 +6287,17 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
> >
> >       if (!root)
> >               root = root_mem_cgroup;
> > -     if (memcg == root)
> > +     if (memcg == root) {
> > +             /*
> > +              * Reset memory.(emin, elow) for reclaiming the memcg
> > +              * itself.
> > +              */
> > +             if (memcg != root_mem_cgroup) {
> > +                     memcg->memory.emin = 0;
> > +                     memcg->memory.elow = 0;
> > +             }
>
> I'm sorry, I didn't dig into this from scratch, but I doubt that zeroing
> the effective protection is correct. Imagine a simple config: a large
> cgroup subtree with memory.max set at the top level. Reaching this limit
> doesn't mean that all protection configuration inside the tree can be
> ignored.
>

No, they won't be ignored.
Please see the logic in mem_cgroup_protected(); it recalculates the
effective min and low of all its children.

> Instead we should respect memory.low/max set by a user on this level
> (look at the parent == root case), maybe clamped by memory.high/max.
>

Let's look at the parent == root case.
What if the parent is the root_mem_cgroup?
The memory.{emin, elow} of root_mem_cgroup is always 0, right?
So what is the problem?

Thanks
Yafang


* Re: [PATCH] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself
@ 2019-12-28  1:45       ` Yafang Shao
  0 siblings, 0 replies; 13+ messages in thread
From: Yafang Shao @ 2019-12-28  1:45 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: hannes, mhocko, vdavydov.dev, akpm, linux-mm, Chris Down, stable

On Sat, Dec 28, 2019 at 7:49 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Fri, Dec 27, 2019 at 07:43:53AM -0500, Yafang Shao wrote:
> > memory.{emin, elow} are set in mem_cgroup_protected(), and the values of
> > them won't be changed until next recalculation in this function. After
> > either or both of them are set, the next reclaimer to relcaim this memcg
> > may be a different reclaimer, e.g. this memcg is also the root memcg of
> > the new reclaimer, and then in mem_cgroup_protection() in get_scan_count()
> > the old values of them will be used to calculate scan count, that is not
> > proper. We should reset them to zero in this case.
> >
> > Here's an example of this issue.
> >
> >     root_mem_cgroup
> >          /
> >         A   memory.max=1024M memory.min=512M memory.current=800M
> >
> > Once kswapd is waked up, it will try to scan all MEMCGs, including
> > this A, and it will assign memory.emin of A with 512M.
> > After that, A may reach its hard limit(memory.max), and then it will
> > do memcg reclaim. Because A is the root of this reclaimer, so it will
> > not calculate its memory.emin. So the memory.emin is the old value
> > 512M, and then this old value will be used in
> > mem_cgroup_protection() in get_scan_count() to get the scan count.
> > That is not proper.
> >
> > Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Chris Down <chris@chrisdown.name>
> > Cc: Roman Gushchin <guro@fb.com>
> > Cc: stable@vger.kernel.org
> > ---
> >  mm/memcontrol.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 601405b..bb3925d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -6287,8 +6287,17 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
> >
> >       if (!root)
> >               root = root_mem_cgroup;
> > -     if (memcg == root)
> > +     if (memcg == root) {
> > +             /*
> > +              * Reset memory.(emin, elow) for reclaiming the memcg
> > +              * itself.
> > +              */
> > +             if (memcg != root_mem_cgroup) {
> > +                     memcg->memory.emin = 0;
> > +                     memcg->memory.elow = 0;
> > +             }
>
> I'm sorry, that didn't bring it from scratch, but I doubt that zeroing effecting
> protection is correct. Imagine a simple config: a large cgroup subtree with memory.max
> set on the top level. Reaching this limit doesn't mean that all protection
> configuration inside the tree can be ignored.
>

No, they won't be ignored.
Pls. see the logic in mem_cgroup_protected(), it will re-calculate all
its children's effective min and low.

> Instead we should respect memory.low/max set by a user on this level
> (look at the parent == root case), maybe clamped by memory.high/max.
>

Let's look at the parent == root case.
What if the parent is the root_mem_cgroup?
The memory.{emin, elow} of root_mem_cgroup is always 0 right ?
So what's your problem ?

Thanks
Yafang


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself
  2019-12-28  1:45       ` Yafang Shao
@ 2019-12-28  2:59       ` Roman Gushchin
  2019-12-28  4:24           ` Yafang Shao
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2019-12-28  2:59 UTC (permalink / raw)
  To: Yafang Shao
  Cc: hannes, mhocko, vdavydov.dev, akpm, linux-mm, Chris Down, stable

On Sat, Dec 28, 2019 at 09:45:11AM +0800, Yafang Shao wrote:
> On Sat, Dec 28, 2019 at 7:49 AM Roman Gushchin <guro@fb.com> wrote:
> >
> > On Fri, Dec 27, 2019 at 07:43:53AM -0500, Yafang Shao wrote:
> > > memory.{emin, elow} are set in mem_cgroup_protected(), and their values
> > > won't change until the next recalculation in this function. After either
> > > or both of them are set, the next reclaimer to reclaim this memcg may be
> > > a different one, e.g. one for which this memcg is the root of the
> > > reclaim; mem_cgroup_protection(), called from get_scan_count(), will
> > > then use the stale values to calculate the scan count, which is not
> > > correct. We should reset them to zero in this case.
> > >
> > > Here's an example of this issue.
> > >
> > >     root_mem_cgroup
> > >          /
> > >         A   memory.max=1024M memory.min=512M memory.current=800M
> > >
> > > Once kswapd is woken up, it will try to scan all memcgs, including
> > > this A, and it will set A's memory.emin to 512M.
> > > After that, A may reach its hard limit (memory.max) and then do memcg
> > > reclaim. Because A is the root of this reclaim, its memory.emin is not
> > > recalculated, so it keeps the stale value of 512M, and this value will
> > > be used by mem_cgroup_protection(), called from get_scan_count(), to
> > > compute the scan count. That is not correct.
> > >
> > > Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > Cc: Chris Down <chris@chrisdown.name>
> > > Cc: Roman Gushchin <guro@fb.com>
> > > Cc: stable@vger.kernel.org
> > > ---
> > >  mm/memcontrol.c | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 601405b..bb3925d 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -6287,8 +6287,17 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
> > >
> > >       if (!root)
> > >               root = root_mem_cgroup;
> > > -     if (memcg == root)
> > > +     if (memcg == root) {
> > > +             /*
> > > +              * Reset memory.(emin, elow) for reclaiming the memcg
> > > +              * itself.
> > > +              */
> > > +             if (memcg != root_mem_cgroup) {
> > > +                     memcg->memory.emin = 0;
> > > +                     memcg->memory.elow = 0;
> > > +             }
> >
> > I'm sorry, I didn't dig into this from scratch, but I doubt that zeroing
> > the effective protection is correct. Imagine a simple config: a large
> > cgroup subtree with memory.max set at the top level. Reaching this limit
> > doesn't mean that all protection configuration inside the tree can be
> > ignored.
> >
> 
> No, they won't be ignored.
> Please see the logic in mem_cgroup_protected(); it recalculates the
> effective min and low of all its children.

Ah, you're right. I forgot about this
    if (parent == root)
	goto exit;

which saves elow/emin from being truncated to 0. Sorry.
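
For context, the surrounding logic is roughly (simplified from memory,
not the exact source):

    parent = parent_mem_cgroup(memcg);
    /* no parent means a non-hierarchical mode on cgroup v1 */
    if (!parent)
            return MEMCG_PROT_NONE;

    /* children of the reclaim root keep their configured protection */
    if (parent == root)
            goto exit;
    ...
    exit:
            memcg->memory.emin = emin;
            memcg->memory.elow = elow;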

Please, feel free to add
Acked-by: Roman Gushchin <guro@fb.com>

Thanks!


* Re: [PATCH] mm, memcg: reduce size of struct mem_cgroup by using bit field
  2019-12-27 23:55 ` [PATCH] mm, memcg: reduce size of struct mem_cgroup by using bit field Roman Gushchin
@ 2019-12-28  4:22   ` Yafang Shao
  0 siblings, 0 replies; 13+ messages in thread
From: Yafang Shao @ 2019-12-28  4:22 UTC (permalink / raw)
  To: Roman Gushchin; +Cc: hannes, mhocko, vdavydov.dev, akpm, linux-mm

On Sat, Dec 28, 2019 at 7:55 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Fri, Dec 27, 2019 at 07:43:52AM -0500, Yafang Shao wrote:
> > Some members of struct mem_cgroup can only be either 0 (false) or
> > 1 (true), so we can define them as bit fields to reduce the size. With
> > this patch, the size of struct mem_cgroup can be reduced by 64 bytes in
> > theory, but as there are some MEMCG_PADDING()s, the real number may
> > differ, depending on the cacheline size. In any case, this patch reduces
> > the size of struct mem_cgroup at least somewhat.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Roman Gushchin <guro@fb.com>
> > ---
> >  include/linux/memcontrol.h | 21 ++++++++++++---------
> >  1 file changed, 12 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index a7a0a1a5..f68a9ef 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -229,20 +229,26 @@ struct mem_cgroup {
> >       /*
> >        * Should the accounting and control be hierarchical, per subtree?
> >        */
> > -     bool use_hierarchy;
> > +     unsigned int use_hierarchy : 1;
> > +
> > +     /* Legacy tcp memory accounting */
> > +     unsigned int tcpmem_active : 1;
> > +     unsigned int tcpmem_pressure : 1;
> >
> >       /*
> >        * Should the OOM killer kill all belonging tasks, had it kill one?
> >        */
> > -     bool oom_group;
> > +     unsigned int  oom_group : 1;
> >
> >       /* protected by memcg_oom_lock */
> > -     bool            oom_lock;
> > -     int             under_oom;
> > +     unsigned int oom_lock : 1;
>
> Hm, in the original code it was clear that oom_lock and under_oom are
> protected by memcg_oom_lock, but oom_kill_disable is not.
>
> This information seems to be lost.
>

I should add that comment back. Thanks for pointing this out.

> Also, I'd look at the actual memory savings. Is it worth it,
> or is it all eaten by the padding?
>

As explained in the commit log, the real size depends on the cacheline size,
and in the future we may introduce other new bool members.
I have verified it on my server with a 64-byte cacheline, and the saving is 0.

Actually, there's no strong reason to make this minor optimization.

> Thanks!
>
> >
> > -     int     swappiness;
> >       /* OOM-Killer disable */
> > -     int             oom_kill_disable;
> > +     unsigned int oom_kill_disable : 1;
> > +
> > +     int under_oom;
> > +
> > +     int     swappiness;
> >
> >       /* memory.events and memory.events.local */
> >       struct cgroup_file events_file;
> > @@ -297,9 +303,6 @@ struct mem_cgroup {
> >
> >       unsigned long           socket_pressure;
> >
> > -     /* Legacy tcp memory accounting */
> > -     bool                    tcpmem_active;
> > -     int                     tcpmem_pressure;
> >
> >  #ifdef CONFIG_MEMCG_KMEM
> >          /* Index in the kmem_cache->memcg_params.memcg_caches array */
> > --
> > 1.8.3.1
> >



* Re: [PATCH] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself
  2019-12-28  2:59       ` Roman Gushchin
@ 2019-12-28  4:24           ` Yafang Shao
  0 siblings, 0 replies; 13+ messages in thread
From: Yafang Shao @ 2019-12-28  4:24 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: hannes, mhocko, vdavydov.dev, akpm, linux-mm, Chris Down, stable

On Sat, Dec 28, 2019 at 11:00 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Sat, Dec 28, 2019 at 09:45:11AM +0800, Yafang Shao wrote:
> > On Sat, Dec 28, 2019 at 7:49 AM Roman Gushchin <guro@fb.com> wrote:
> > >
> > > On Fri, Dec 27, 2019 at 07:43:53AM -0500, Yafang Shao wrote:
> > > > memory.{emin, elow} are set in mem_cgroup_protected(), and their values
> > > > won't change until the next recalculation in this function. After either
> > > > or both of them are set, the next reclaimer to reclaim this memcg may be
> > > > a different one, e.g. one for which this memcg is the root of the
> > > > reclaim; mem_cgroup_protection(), called from get_scan_count(), will
> > > > then use the stale values to calculate the scan count, which is not
> > > > correct. We should reset them to zero in this case.
> > > >
> > > > Here's an example of this issue.
> > > >
> > > >     root_mem_cgroup
> > > >          /
> > > >         A   memory.max=1024M memory.min=512M memory.current=800M
> > > >
> > > > Once kswapd is woken up, it will try to scan all memcgs, including
> > > > this A, and it will set A's memory.emin to 512M.
> > > > After that, A may reach its hard limit (memory.max) and then do memcg
> > > > reclaim. Because A is the root of this reclaim, its memory.emin is not
> > > > recalculated, so it keeps the stale value of 512M, and this value will
> > > > be used by mem_cgroup_protection(), called from get_scan_count(), to
> > > > compute the scan count. That is not correct.
> > > >
> > > > Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > Cc: Chris Down <chris@chrisdown.name>
> > > > Cc: Roman Gushchin <guro@fb.com>
> > > > Cc: stable@vger.kernel.org
> > > > ---
> > > >  mm/memcontrol.c | 11 ++++++++++-
> > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index 601405b..bb3925d 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -6287,8 +6287,17 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
> > > >
> > > >       if (!root)
> > > >               root = root_mem_cgroup;
> > > > -     if (memcg == root)
> > > > +     if (memcg == root) {
> > > > +             /*
> > > > +              * Reset memory.(emin, elow) for reclaiming the memcg
> > > > +              * itself.
> > > > +              */
> > > > +             if (memcg != root_mem_cgroup) {
> > > > +                     memcg->memory.emin = 0;
> > > > +                     memcg->memory.elow = 0;
> > > > +             }
> > >
> > > I'm sorry, I didn't dig into this from scratch, but I doubt that zeroing
> > > the effective protection is correct. Imagine a simple config: a large
> > > cgroup subtree with memory.max set at the top level. Reaching this limit
> > > doesn't mean that all protection configuration inside the tree can be
> > > ignored.
> > >
> >
> > No, they won't be ignored.
> > Please see the logic in mem_cgroup_protected(); it recalculates the
> > effective min and low of all its children.
>
> Ah, you're right. I forgot about this
>     if (parent == root)
>         goto exit;
>
> which saves elow/emin from being truncated to 0. Sorry.
>
> Please, feel free to add
> Acked-by: Roman Gushchin <guro@fb.com>
>

Thanks for your review.

Thanks
Yafang


* Re: [PATCH] mm, memcg: reduce size of struct mem_cgroup by using bit field
  2019-12-27 12:43 [PATCH] mm, memcg: reduce size of struct mem_cgroup by using bit field Yafang Shao
  2019-12-27 12:43 ` [PATCH] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself Yafang Shao
  2019-12-27 23:55 ` [PATCH] mm, memcg: reduce size of struct mem_cgroup by using bit field Roman Gushchin
@ 2019-12-31 22:31 ` Andrew Morton
  2020-01-02  5:43   ` Yafang Shao
  2020-01-06 10:19 ` Michal Hocko
  3 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2019-12-31 22:31 UTC (permalink / raw)
  To: Yafang Shao; +Cc: guro, hannes, mhocko, vdavydov.dev, linux-mm

On Fri, 27 Dec 2019 07:43:52 -0500 Yafang Shao <laoar.shao@gmail.com> wrote:

> Some members of struct mem_cgroup can only be either 0 (false) or
> 1 (true), so we can define them as bit fields to reduce the size. With
> this patch, the size of struct mem_cgroup can be reduced by 64 bytes in
> theory, but as there are some MEMCG_PADDING()s, the real number may
> differ, depending on the cacheline size. In any case, this patch reduces
> the size of struct mem_cgroup at least somewhat.
> 
> ...
>
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -229,20 +229,26 @@ struct mem_cgroup {
>  	/*
>  	 * Should the accounting and control be hierarchical, per subtree?
>  	 */
> -	bool use_hierarchy;
> +	unsigned int use_hierarchy : 1;
> +
> +	/* Legacy tcp memory accounting */
> +	unsigned int tcpmem_active : 1;
> +	unsigned int tcpmem_pressure : 1;

Kernel coding style for this is normally no-spaces:

	bool foo:1;



More significantly...  Now that these fields share the same word of
memory, what prevents races when two CPUs do read-modify-write
operations on adjacent bitfields?

This:

struct foo {
	int a;
	int b;
};

doesn't need locking to prevent modifications of `a' from scribbling on
`b'.  But with this:

struct foo {
	int a:1;
	int b:1;
}

a simple `a = 1' on CPU1 could race with a `b = 1' on CPU2.

I think.  Maybe the compiler can take care of this in some fashion, but
it would require atomic bitops and I doubt if gcc does that for us?
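
Concretely, the store `a = 1' on the bit-field version may be compiled
to a read-modify-write of the whole shared word, roughly (a hand-written
equivalent, not actual compiler output):

    struct foo f;
    unsigned int tmp;

    tmp = *(unsigned int *)&f;      /* load the shared word */
    tmp |= 0x1;                     /* set the bit for 'a' */
    *(unsigned int *)&f = tmp;      /* store it back; a concurrent
                                       `b = 1' on another CPU, compiled
                                       the same way, can be lost */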





* Re: [PATCH] mm, memcg: reduce size of struct mem_cgroup by using bit field
  2019-12-31 22:31 ` Andrew Morton
@ 2020-01-02  5:43   ` Yafang Shao
  0 siblings, 0 replies; 13+ messages in thread
From: Yafang Shao @ 2020-01-02  5:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Roman Gushchin, Johannes Weiner, Michal Hocko, Vladimir Davydov,
	Linux MM

On Wed, Jan 1, 2020 at 6:31 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 27 Dec 2019 07:43:52 -0500 Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > Some members of struct mem_cgroup can only be either 0 (false) or
> > 1 (true), so we can define them as bit fields to reduce the size. With
> > this patch, the size of struct mem_cgroup can be reduced by 64 bytes in
> > theory, but as there are some MEMCG_PADDING()s, the real number may
> > differ, depending on the cacheline size. In any case, this patch reduces
> > the size of struct mem_cgroup at least somewhat.
> >
> > ...
> >
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -229,20 +229,26 @@ struct mem_cgroup {
> >       /*
> >        * Should the accounting and control be hierarchical, per subtree?
> >        */
> > -     bool use_hierarchy;
> > +     unsigned int use_hierarchy : 1;
> > +
> > +     /* Legacy tcp memory accounting */
> > +     unsigned int tcpmem_active : 1;
> > +     unsigned int tcpmem_pressure : 1;
>
> Kernel coding style for this is normally no-spaces:
>
>         bool foo:1;
>

I always learn the kernel coding style from the kernel source itself.
Before defining these members as bit fields, I checked what the kernel
had done in the past.
I found that some places define them with spaces[1] and some places
without spaces[2].
I ended up picking the style with spaces, which unfortunately was the
wrong one. Anyway, I know the right style now, thanks.

[1]. https://elixir.bootlin.com/linux/v5.5-rc4/source/include/linux/tcp.h#L87
[2]. https://elixir.bootlin.com/linux/v5.5-rc4/source/include/linux/tcp.h#L213

>
>
> More significantly...  Now that these fields share the same word of
> memory, what prevents races when two CPUs do read-modify-write
> operations on adjacent bitfields?
>
> This:
>
> struct foo {
>         int a;
>         int b;
> };
>
> doesn't need locking to prevent modifications of `a' from scribbling on
> `b'.  But with this:
>
> struct foo {
>         int a:1;
>         int b:1;
> }
>
> a simple `a = 1' on CPU1 could race with a `b = 1' on CPU2.
>
> I think.  Maybe the compiler can take care of this in some fashion, but
> it would require atomic bitops and I doubt if gcc does that for us?
>
>

GCC guarantees that accesses to distinct structure members (which
aren't part of a bit field) are independent, but it can't guarantee
that for bit fields.
I thought there was some synchronization mechanism protecting the
memcg against concurrent access.
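
If these flags ever did need to share a word safely, the usual kernel
pattern would be an unsigned long of flags updated with the atomic
bitops; a hypothetical sketch (MEMCG_OOM_LOCK is a made-up bit name):

    unsigned long flags;            /* would live in struct mem_cgroup */

    #define MEMCG_OOM_LOCK  0       /* hypothetical bit number */

    set_bit(MEMCG_OOM_LOCK, &memcg->flags);         /* atomic RMW */
    if (test_bit(MEMCG_OOM_LOCK, &memcg->flags))
            clear_bit(MEMCG_OOM_LOCK, &memcg->flags);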

Thanks
Yafang



* Re: [PATCH] mm, memcg: reduce size of struct mem_cgroup by using bit field
  2019-12-27 12:43 [PATCH] mm, memcg: reduce size of struct mem_cgroup by using bit field Yafang Shao
                   ` (2 preceding siblings ...)
  2019-12-31 22:31 ` Andrew Morton
@ 2020-01-06 10:19 ` Michal Hocko
  3 siblings, 0 replies; 13+ messages in thread
From: Michal Hocko @ 2020-01-06 10:19 UTC (permalink / raw)
  To: Yafang Shao; +Cc: guro, hannes, vdavydov.dev, akpm, linux-mm

On Fri 27-12-19 07:43:52, Yafang Shao wrote:
> Some members of struct mem_cgroup can only be either 0 (false) or
> 1 (true), so we can define them as bit fields to reduce the size. With
> this patch, the size of struct mem_cgroup can be reduced by 64 bytes in
> theory, but as there are some MEMCG_PADDING()s, the real number may
> differ, depending on the cacheline size. In any case, this patch reduces
> the size of struct mem_cgroup at least somewhat.

Pahole says
        /* size: 2496, cachelines: 39, members: 48 */
        /* sum members: 2364, holes: 8, sum holes: 108 */
        /* padding: 24 */
        /* forced alignments: 3, forced holes: 2, sum forced holes: 88 */

after the change
        /* size: 2496, cachelines: 39, members: 48 */
        /* sum members: 2352, holes: 7, sum holes: 108 */
        /* sum bitfield members: 6 bits, bit holes: 1, sum bit holes: 26 bits */
        /* padding: 32 */
        /* forced alignments: 3, forced holes: 2, sum forced holes: 88 */

so exactly 0 change. I do not think this reshuffling is worth it.
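
For anyone reproducing these numbers: given a vmlinux built with
CONFIG_DEBUG_INFO, something like

    $ pahole -C mem_cgroup vmlinux

prints the full struct layout along with the hole and padding summary
quoted above.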

> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Roman Gushchin <guro@fb.com>
> ---
>  include/linux/memcontrol.h | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index a7a0a1a5..f68a9ef 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -229,20 +229,26 @@ struct mem_cgroup {
>  	/*
>  	 * Should the accounting and control be hierarchical, per subtree?
>  	 */
> -	bool use_hierarchy;
> +	unsigned int use_hierarchy : 1;
> +
> +	/* Legacy tcp memory accounting */
> +	unsigned int tcpmem_active : 1;
> +	unsigned int tcpmem_pressure : 1;
>  
>  	/*
>  	 * Should the OOM killer kill all belonging tasks, had it kill one?
>  	 */
> -	bool oom_group;
> +	unsigned int  oom_group : 1;
>  
>  	/* protected by memcg_oom_lock */
> -	bool		oom_lock;
> -	int		under_oom;
> +	unsigned int oom_lock : 1;
>  
> -	int	swappiness;
>  	/* OOM-Killer disable */
> -	int		oom_kill_disable;
> +	unsigned int oom_kill_disable : 1;
> +
> +	int under_oom;
> +
> +	int	swappiness;
>  
>  	/* memory.events and memory.events.local */
>  	struct cgroup_file events_file;
> @@ -297,9 +303,6 @@ struct mem_cgroup {
>  
>  	unsigned long		socket_pressure;
>  
> -	/* Legacy tcp memory accounting */
> -	bool			tcpmem_active;
> -	int			tcpmem_pressure;
>  
>  #ifdef CONFIG_MEMCG_KMEM
>          /* Index in the kmem_cache->memcg_params.memcg_caches array */
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs


