linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH] cgroup: introduce proportional protection on memcg
@ 2022-03-24  9:22 zhaoyang.huang
  2022-03-24 14:27 ` Chris Down
  0 siblings, 1 reply; 7+ messages in thread
From: zhaoyang.huang @ 2022-03-24  9:22 UTC (permalink / raw)
  To: Andrew Morton, Johannes Weiner, Michal Hocko, Vladimir Davydov,
	ke wang, Zhaoyang Huang, linux-mm, linux-kernel, cgroups

From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

The current memcg protection via min, low and high requires an evaluation
of the protected entity, which can be hard on some systems. Furthermore,
usage can vary widely across scenarios (imagine still protecting 50M when
usage changes from 100M to 300M), which makes the protection less
meaningful. So introduce proportional protection based on the memcg's
highest-ever usage (watermark) to overcome the above constraints.

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 include/linux/page_counter.h |  3 +++
 mm/memcontrol.c              | 17 +++++++++++++----
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index 6795913..7762629 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -27,6 +27,9 @@ struct page_counter {
 	unsigned long watermark;
 	unsigned long failcnt;
 
+	/* proportional protection */
+	unsigned long min_prop;
+	unsigned long low_prop;
 	/*
 	 * 'parent' is placed here to be far from 'usage' to reduce
 	 * cache false sharing, as 'usage' is written mostly while
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 508bcea..937c6ce 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6616,6 +6616,7 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 {
 	unsigned long usage, parent_usage;
 	struct mem_cgroup *parent;
+	unsigned long memcg_emin, memcg_elow, parent_emin, parent_elow;
 
 	if (mem_cgroup_disabled())
 		return;
@@ -6650,14 +6651,22 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 
 	parent_usage = page_counter_read(&parent->memory);
 
+	/* use proportional protection first; 1024 stands for 100% */
+	memcg_emin = READ_ONCE(memcg->memory.min_prop) ?
+		READ_ONCE(memcg->memory.min_prop) * READ_ONCE(memcg->memory.watermark) / 1024 : READ_ONCE(memcg->memory.min);
+	memcg_elow = READ_ONCE(memcg->memory.low_prop) ?
+		READ_ONCE(memcg->memory.low_prop) * READ_ONCE(memcg->memory.watermark) / 1024 : READ_ONCE(memcg->memory.low);
+	parent_emin = READ_ONCE(parent->memory.min_prop) ?
+		READ_ONCE(parent->memory.min_prop) * READ_ONCE(parent->memory.watermark) / 1024 : READ_ONCE(parent->memory.emin);
+	parent_elow = READ_ONCE(parent->memory.low_prop) ?
+		READ_ONCE(parent->memory.low_prop) * READ_ONCE(parent->memory.watermark) / 1024 : READ_ONCE(parent->memory.elow);
+
 	WRITE_ONCE(memcg->memory.emin, effective_protection(usage, parent_usage,
-			READ_ONCE(memcg->memory.min),
-			READ_ONCE(parent->memory.emin),
+			memcg_emin, parent_emin,
 			atomic_long_read(&parent->memory.children_min_usage)));
 
 	WRITE_ONCE(memcg->memory.elow, effective_protection(usage, parent_usage,
-			READ_ONCE(memcg->memory.low),
-			READ_ONCE(parent->memory.elow),
+			memcg_elow, parent_elow,
 			atomic_long_read(&parent->memory.children_low_usage)));
 }
 
-- 
1.9.1
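
The selection logic in the hunk above can be read as a small helper. What
follows is a minimal sketch, not the patch's code: prop_or_fixed() and
PROP_SCALE are hypothetical names, all quantities are assumed to be in pages,
and the multiplication is widened to 64 bits, which the patch itself does not
do.

```c
#include <stdint.h>

#define PROP_SCALE 1024UL	/* the patch treats 1024 as 100% */

/*
 * Hypothetical helper mirroring the ternaries in the patch: use the
 * proportional value when one is set, otherwise fall back to the fixed
 * min/low value. The 64-bit multiply is an addition of this sketch to
 * avoid overflow where unsigned long is 32 bits wide.
 */
static unsigned long prop_or_fixed(unsigned long prop,
				   unsigned long watermark,
				   unsigned long fixed)
{
	if (!prop)
		return fixed;
	return (unsigned long)((uint64_t)prop * watermark / PROP_SCALE);
}
```

Assuming 4 KiB pages, a watermark of 100M is 25600 pages and 300M is 76800
pages. With prop = 512 (50%), the helper yields 12800 pages (50M) at the
first watermark and 38400 pages (150M) at the second, whereas a fixed low of
12800 pages stays at 50M in both cases. Because the watermark records the
highest usage ever seen, the proportional value in this sketch does not
shrink when usage later drops.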



* Re: [RFC PATCH] cgroup: introduce proportional protection on memcg
  2022-03-24  9:22 [RFC PATCH] cgroup: introduce proportional protection on memcg zhaoyang.huang
@ 2022-03-24 14:27 ` Chris Down
  2022-03-24 16:23   ` Roman Gushchin
  2022-03-25  3:02   ` Zhaoyang Huang
  0 siblings, 2 replies; 7+ messages in thread
From: Chris Down @ 2022-03-24 14:27 UTC (permalink / raw)
  To: zhaoyang.huang
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Vladimir Davydov,
	ke wang, Zhaoyang Huang, linux-mm, linux-kernel, cgroups

I'm confused by the aims of this patch. We already have proportional reclaim 
for memory.min and memory.low, and memory.high is already "proportional" by its 
nature to drive memory back down behind the configured threshold.

Could you please be more clear about what you're trying to achieve and in what 
way the existing proportional reclaim mechanisms are insufficient for you?


* Re: [RFC PATCH] cgroup: introduce proportional protection on memcg
  2022-03-24 14:27 ` Chris Down
@ 2022-03-24 16:23   ` Roman Gushchin
  2022-03-25  3:10     ` Zhaoyang Huang
  2022-03-25  3:02   ` Zhaoyang Huang
  1 sibling, 1 reply; 7+ messages in thread
From: Roman Gushchin @ 2022-03-24 16:23 UTC (permalink / raw)
  To: Chris Down
  Cc: zhaoyang.huang, Andrew Morton, Johannes Weiner, Michal Hocko,
	Vladimir Davydov, ke wang, Zhaoyang Huang, linux-mm,
	linux-kernel, cgroups

It seems like what’s being proposed is an ability to express the protection in % of the current usage rather than an absolute number.
It’s an equivalent for something like a memory (reclaim) priority: e.g. a cgroup with 80% protection is _always_ reclaimed less aggressively than one with a 20% protection.

That said, I’m not a fan of this idea.
It might make sense in some reasonable range of usages, but if your workload is simply leaking memory and growing indefinitely, protecting it seems like a bad idea. And the first part can be easily achieved using a userspace tool.

Thanks!

> On Mar 24, 2022, at 7:33 AM, Chris Down <chris@chrisdown.name> wrote:
> 
> I'm confused by the aims of this patch. We already have proportional reclaim for memory.min and memory.low, and memory.high is already "proportional" by its nature to drive memory back down behind the configured threshold.
> 
> Could you please be more clear about what you're trying to achieve and in what way the existing proportional reclaim mechanisms are insufficient for you?
> 
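
The userspace approach mentioned above could look roughly like the sketch
below. It assumes cgroup v2, where memory.current and memory.low are
per-cgroup files holding byte values. The helper names, the percentage
granularity and the optional cap are illustrative assumptions, not an
existing tool. The cap is one possible way to keep a leaking workload from
earning ever more protection.

```c
#include <stdio.h>

/* Read one unsigned value (bytes) from a file such as memory.current. */
static int read_ull(const char *path, unsigned long long *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%llu", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Write one unsigned value (bytes) to a file such as memory.low. */
static int write_ull(const char *path, unsigned long long val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%llu\n", val) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Set the protection to pct percent of the current usage.
 * cap_bytes == 0 means "no cap" (a convention of this sketch).
 */
static int set_low_percent(const char *current_path, const char *low_path,
			   unsigned int pct, unsigned long long cap_bytes)
{
	unsigned long long cur, low;

	if (pct > 100 || read_ull(current_path, &cur))
		return -1;
	low = cur / 100 * pct;	/* divide first to avoid overflow */
	if (cap_bytes && low > cap_bytes)
		low = cap_bytes;
	return write_ull(low_path, low);
}
```

A daemon would call set_low_percent() periodically with the two file paths
of the cgroup it manages, for example the memory.current and memory.low
files under a hypothetical /sys/fs/cgroup/<group>/ directory.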


* Re: [RFC PATCH] cgroup: introduce proportional protection on memcg
  2022-03-24 14:27 ` Chris Down
  2022-03-24 16:23   ` Roman Gushchin
@ 2022-03-25  3:02   ` Zhaoyang Huang
  2022-03-25  3:08     ` Zhaoyang Huang
  1 sibling, 1 reply; 7+ messages in thread
From: Zhaoyang Huang @ 2022-03-25  3:02 UTC (permalink / raw)
  To: Chris Down
  Cc: zhaoyang.huang, Andrew Morton, Johannes Weiner, Michal Hocko,
	Vladimir Davydov, ke wang, open list:MEMORY MANAGEMENT, LKML,
	cgroups

On Thu, Mar 24, 2022 at 10:27 PM Chris Down <chris@chrisdown.name> wrote:
>
> I'm confused by the aims of this patch. We already have proportional reclaim
> for memory.min and memory.low, and memory.high is already "proportional" by its
> nature to drive memory back down behind the configured threshold.
>
> Could you please be more clear about what you're trying to achieve and in what
> way the existing proportional reclaim mechanisms are insufficient for you?
What I am trying to solve is that, in the current design, the memcg's
protection check [1] is based on a set of fixed values, while the actual
scan and reclaim amount [2] is based on min/low taken proportionally to
the real memory usage, as you mentioned above. A fixed-value setting has
some constraints:
1. It is an empirical value based on observation, which could be inaccurate.
2. The workload varies across scenarios.
3. The fixed value from [1] can conflict with the dynamic cgroup_size in [2].

shrink_node_memcgs
     [1] check if the memcg is protected based on fixed min/low value
     mem_cgroup_calculate_protection(target_memcg, memcg);
     if (mem_cgroup_below_min(memcg))
     ...
     else if (mem_cgroup_below_low(memcg))
     ...

     [2] calculate the number of scan size proportionally
     shrink_lruvec
            get_scan_count
                   mem_cgroup_protection
                   scan = lruvec_size - lruvec_size * protection / (cgroup_size + 1);


* Re: [RFC PATCH] cgroup: introduce proportional protection on memcg
  2022-03-25  3:02   ` Zhaoyang Huang
@ 2022-03-25  3:08     ` Zhaoyang Huang
  2022-03-25 12:49       ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Zhaoyang Huang @ 2022-03-25  3:08 UTC (permalink / raw)
  To: Chris Down
  Cc: zhaoyang.huang, Andrew Morton, Johannes Weiner, Michal Hocko,
	Vladimir Davydov, ke wang, open list:MEMORY MANAGEMENT, LKML,
	cgroups

On Fri, Mar 25, 2022 at 11:02 AM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
>
> On Thu, Mar 24, 2022 at 10:27 PM Chris Down <chris@chrisdown.name> wrote:
> >
> > I'm confused by the aims of this patch. We already have proportional reclaim
> > for memory.min and memory.low, and memory.high is already "proportional" by its
> > nature to drive memory back down behind the configured threshold.
> >
> > Could you please be more clear about what you're trying to achieve and in what
> > way the existing proportional reclaim mechanisms are insufficient for you?

sorry for the bad formatting of previous reply, resend it in new format

 What I am trying to solve is that, in the current design, the memcg's
 protection check [1] is based on a set of fixed values, while the actual
 scan and reclaim amount [2] is based on min/low taken proportionally to
 the real memory usage, as you mentioned above. A fixed-value setting has
 some constraints:
 1. It is an empirical value based on observation, which could be inaccurate.
 2. The workload varies across scenarios.
 3. The fixed value from [1] can conflict with the dynamic cgroup_size in [2].

 shrink_node_memcgs
[1] check if the memcg is protected based on fixed min/low value
     mem_cgroup_calculate_protection(target_memcg, memcg);
      if (mem_cgroup_below_min(memcg))
      ...
      else if (mem_cgroup_below_low(memcg))
      ...

[2] calculate the number of scan size proportionally
     shrink_lruvec
             get_scan_count
                    mem_cgroup_protection
                    scan = lruvec_size - lruvec_size * protection / (cgroup_size + 1);
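
To make the contrast between [1] and [2] concrete, here is a small
standalone sketch of the two calculations. The function names are
hypothetical stand-ins rather than the kernel's own code, all values are in
pages, and the clamp of cgroup_size is a guard added so that the
subtraction cannot underflow in this sketch.

```c
#include <stdbool.h>

/* [1] threshold check: the group counts as protected while usage <= protection. */
static bool below_protection(unsigned long usage, unsigned long protection)
{
	return usage <= protection;
}

/* [2] the scan formula quoted in the mail, with 64-bit arithmetic. */
static unsigned long scan_target(unsigned long lruvec_size,
				 unsigned long protection,
				 unsigned long cgroup_size)
{
	unsigned long long size = cgroup_size;

	/* guard for this sketch: never let protection exceed the size */
	if (size < protection)
		size = protection;
	return lruvec_size -
	       (unsigned long)((unsigned long long)lruvec_size * protection /
			       (size + 1));
}
```

With a fixed protection of 12800 pages and an lruvec of 1000 pages,
scan_target() returns 501 when cgroup_size is 25600 and 834 when it is
76800, so the share that is shielded from scanning shrinks as usage grows.
If the protection instead tracks 50% of the size (38400 at 76800 pages), the
result stays at 501. In all three cases below_protection() is false, since
usage is above the threshold.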


* Re: [RFC PATCH] cgroup: introduce proportional protection on memcg
  2022-03-24 16:23   ` Roman Gushchin
@ 2022-03-25  3:10     ` Zhaoyang Huang
  0 siblings, 0 replies; 7+ messages in thread
From: Zhaoyang Huang @ 2022-03-25  3:10 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Chris Down, zhaoyang.huang, Andrew Morton, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, ke wang,
	open list:MEMORY MANAGEMENT, LKML, cgroups

On Fri, Mar 25, 2022 at 12:23 AM Roman Gushchin
<roman.gushchin@linux.dev> wrote:
>
> It seems like what’s being proposed is an ability to express the protection in % of the current usage rather than an absolute number.
> It’s an equivalent for something like a memory (reclaim) priority: e.g. a cgroup with 80% protection is _always_ reclaimed less aggressively than one with a 20% protection.
>
> That said, I’m not a fan of this idea.
> It might make sense in some reasonable range of usages, but if your workload is simply leaking memory and growing indefinitely, protecting it seems like a bad idea. And the first part can be easily achieved using a userspace tool.
>
> Thanks!
>
> > On Mar 24, 2022, at 7:33 AM, Chris Down <chris@chrisdown.name> wrote:
> >
> > I'm confused by the aims of this patch. We already have proportional reclaim for memory.min and memory.low, and memory.high is already "proportional" by its nature to drive memory back down behind the configured threshold.
> >
> > Could you please be more clear about what you're trying to achieve and in what way the existing proportional reclaim mechanisms are insufficient for you?
OK, I think the memory leak concern is fixable. Please refer to my reply
to Chris's comment for more explanation.


* Re: [RFC PATCH] cgroup: introduce proportional protection on memcg
  2022-03-25  3:08     ` Zhaoyang Huang
@ 2022-03-25 12:49       ` Michal Hocko
  0 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2022-03-25 12:49 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Chris Down, zhaoyang.huang, Andrew Morton, Johannes Weiner,
	Vladimir Davydov, ke wang, open list:MEMORY MANAGEMENT, LKML,
	cgroups

On Fri 25-03-22 11:08:00, Zhaoyang Huang wrote:
> On Fri, Mar 25, 2022 at 11:02 AM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> >
> > On Thu, Mar 24, 2022 at 10:27 PM Chris Down <chris@chrisdown.name> wrote:
> > >
> > > I'm confused by the aims of this patch. We already have proportional reclaim
> > > for memory.min and memory.low, and memory.high is already "proportional" by its
> > > nature to drive memory back down behind the configured threshold.
> > >
> > > Could you please be more clear about what you're trying to achieve and in what
> > > way the existing proportional reclaim mechanisms are insufficient for you?
> 
> sorry for the bad formatting of previous reply, resend it in new format
> 
>  What I am trying to solve is that, in the current design, the memcg's
>  protection check [1] is based on a set of fixed values, while the actual
>  scan and reclaim amount [2] is based on min/low taken proportionally to
>  the real memory usage, as you mentioned above. A fixed-value setting has
>  some constraints:
>  1. It is an empirical value based on observation, which could be inaccurate.
>  2. The workload varies across scenarios.
>  3. The fixed value from [1] can conflict with the dynamic cgroup_size in [2].

Could you elaborate some more on those points? I guess providing an
example of how you are using the new interface would be helpful.
-- 
Michal Hocko
SUSE Labs


Thread overview: 7+ messages
2022-03-24  9:22 [RFC PATCH] cgroup: introduce proportional protection on memcg zhaoyang.huang
2022-03-24 14:27 ` Chris Down
2022-03-24 16:23   ` Roman Gushchin
2022-03-25  3:10     ` Zhaoyang Huang
2022-03-25  3:02   ` Zhaoyang Huang
2022-03-25  3:08     ` Zhaoyang Huang
2022-03-25 12:49       ` Michal Hocko
