linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] memcg: Ignore unprotected parent in mem_cgroup_protected()
@ 2019-06-15 11:17 Xunlei Pang
  2019-06-15 16:08 ` Chris Down
  0 siblings, 1 reply; 5+ messages in thread
From: Xunlei Pang @ 2019-06-15 11:17 UTC (permalink / raw)
  To: Roman Gushchin, Michal Hocko, Johannes Weiner; +Cc: linux-kernel, linux-mm

Currently memory.min|low implementation requires the whole
hierarchy has the settings, otherwise the protection will
be broken.

Our hierarchy is kind of like(memory.min value in brackets),

               root
                |
             docker(0)
              /    \
         c1(max)   c2(0)

Note that "docker" doesn't set memory.min. When kswapd runs,
mem_cgroup_protected() returns "0" emin for "c1" due to "0"
@parent_emin of "docker", as a result "c1" gets reclaimed.

But it's hard to maintain parent's "memory.min" when there're
uncertain protected children because only some important types
of containers need the protection.  Further, control tasks
belonging to parent constantly reproduce trivial memory which
should not be protected at all.  It makes sense to ignore
unprotected parent in this scenario to achieve the flexibility.

In order not to break previous hierarchical behaviour, only
ignore the parent when there's no protected ancestor upwards
the hierarchy.

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
---
 include/linux/page_counter.h |  2 ++
 mm/memcontrol.c              |  5 +++++
 mm/page_counter.c            | 24 ++++++++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index bab7e57f659b..aed7ed28b458 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -55,6 +55,8 @@ bool page_counter_try_charge(struct page_counter *counter,
 void page_counter_uncharge(struct page_counter *counter, unsigned long nr_pages);
 void page_counter_set_min(struct page_counter *counter, unsigned long nr_pages);
 void page_counter_set_low(struct page_counter *counter, unsigned long nr_pages);
+bool page_counter_has_min(struct page_counter *counter);
+bool page_counter_has_low(struct page_counter *counter);
 int page_counter_set_max(struct page_counter *counter, unsigned long nr_pages);
 int page_counter_memparse(const char *buf, const char *max,
 			  unsigned long *nr_pages);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ca0bc6e6be13..f1dfa651f55d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5917,6 +5917,8 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 	if (parent == root)
 		goto exit;
 
+	if (!page_counter_has_min(&parent->memory))
+		goto elow;
 	parent_emin = READ_ONCE(parent->memory.emin);
 	emin = min(emin, parent_emin);
 	if (emin && parent_emin) {
@@ -5931,6 +5933,9 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 				   siblings_min_usage);
 	}
 
+elow:
+	if (!page_counter_has_low(&parent->memory))
+		goto exit;
 	parent_elow = READ_ONCE(parent->memory.elow);
 	elow = min(elow, parent_elow);
 	if (elow && parent_elow) {
diff --git a/mm/page_counter.c b/mm/page_counter.c
index de31470655f6..8c668eae2af5 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -202,6 +202,30 @@ int page_counter_set_max(struct page_counter *counter, unsigned long nr_pages)
 	}
 }
 
+bool page_counter_has_min(struct page_counter *counter)
+{
+	struct page_counter *c;
+
+	for (c = counter; c; c = c->parent) {
+		if (counter->min)
+			return true;
+	}
+
+	return false;
+}
+
+bool page_counter_has_low(struct page_counter *counter)
+{
+	struct page_counter *c;
+
+	for (c = counter; c; c = c->parent) {
+		if (counter->low)
+			return true;
+	}
+
+	return false;
+}
+
 /**
  * page_counter_set_min - set the amount of protected memory
  * @counter: counter
-- 
2.14.4.44.g2045bb6


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] memcg: Ignore unprotected parent in mem_cgroup_protected()
  2019-06-15 11:17 [PATCH] memcg: Ignore unprotected parent in mem_cgroup_protected() Xunlei Pang
@ 2019-06-15 16:08 ` Chris Down
  2019-06-16  6:30   ` Xunlei Pang
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Down @ 2019-06-15 16:08 UTC (permalink / raw)
  To: Xunlei Pang
  Cc: Roman Gushchin, Michal Hocko, Johannes Weiner, linux-kernel, linux-mm

Hi Xunlei,

Xunlei Pang writes:
>Currently memory.min|low implementation requires the whole
>hierarchy has the settings, otherwise the protection will
>be broken.
>
>Our hierarchy is kind of like(memory.min value in brackets),
>
>               root
>                |
>             docker(0)
>              /    \
>         c1(max)   c2(0)
>
>Note that "docker" doesn't set memory.min. When kswapd runs,
>mem_cgroup_protected() returns "0" emin for "c1" due to "0"
>@parent_emin of "docker", as a result "c1" gets reclaimed.
>
>But it's hard to maintain parent's "memory.min" when there're
>uncertain protected children because only some important types
>of containers need the protection.  Further, control tasks
>belonging to parent constantly reproduce trivial memory which
>should not be protected at all.  It makes sense to ignore
>unprotected parent in this scenario to achieve the flexibility.

I'm really confused by this, why don't you just set memory.{min,low} in the 
docker cgroup and only propagate it to the children that want it?

If you only want some children to have the protection, only request it in those 
children, or create an additional intermediate layer of the cgroup hierarchy 
with protections further limited if you don't trust the task to request the 
right amount.

Breaking the requirement for hierarchical propagation of protections seems like 
a really questionable API change, not least because it makes it harder to set 
systemwide policies about the constraints of protections within a subtree.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] memcg: Ignore unprotected parent in mem_cgroup_protected()
  2019-06-15 16:08 ` Chris Down
@ 2019-06-16  6:30   ` Xunlei Pang
  2019-06-16 10:37     ` Chris Down
  0 siblings, 1 reply; 5+ messages in thread
From: Xunlei Pang @ 2019-06-16  6:30 UTC (permalink / raw)
  To: Chris Down
  Cc: Roman Gushchin, Michal Hocko, Johannes Weiner, linux-kernel, linux-mm

Hi Chirs,

On 2019/6/16 AM 12:08, Chris Down wrote:
> Hi Xunlei,
> 
> Xunlei Pang writes:
>> Currently memory.min|low implementation requires the whole
>> hierarchy has the settings, otherwise the protection will
>> be broken.
>>
>> Our hierarchy is kind of like(memory.min value in brackets),
>>
>>               root
>>                |
>>             docker(0)
>>              /    \
>>         c1(max)   c2(0)
>>
>> Note that "docker" doesn't set memory.min. When kswapd runs,
>> mem_cgroup_protected() returns "0" emin for "c1" due to "0"
>> @parent_emin of "docker", as a result "c1" gets reclaimed.
>>
>> But it's hard to maintain parent's "memory.min" when there're
>> uncertain protected children because only some important types
>> of containers need the protection.  Further, control tasks
>> belonging to parent constantly reproduce trivial memory which
>> should not be protected at all.  It makes sense to ignore
>> unprotected parent in this scenario to achieve the flexibility.
> 
> I'm really confused by this, why don't you just set memory.{min,low} in
> the docker cgroup and only propagate it to the children that want it?
> 
> If you only want some children to have the protection, only request it
> in those children, or create an additional intermediate layer of the
> cgroup hierarchy with protections further limited if you don't trust the
> task to request the right amount.
> 
> Breaking the requirement for hierarchical propagation of protections
> seems like a really questionable API change, not least because it makes
> it harder to set systemwide policies about the constraints of
> protections within a subtree.

docker and various types(different memory capacity) of containers
are managed by k8s, it's a burden for k8s to maintain those dynamic
figures, simply set "max" to key containers is always welcome.

Set "max" to docker also protects docker cgroup memory(as docker
itself has tasks) unnecessarily.

This patch doesn't take effect on any intermediate layer with
positive memory.min set, it requires all the ancestors having
0 memory.min to work.

Nothing special change, but more flexible to business deployment...

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] memcg: Ignore unprotected parent in mem_cgroup_protected()
  2019-06-16  6:30   ` Xunlei Pang
@ 2019-06-16 10:37     ` Chris Down
  2019-06-16 11:57       ` Xunlei Pang
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Down @ 2019-06-16 10:37 UTC (permalink / raw)
  To: Xunlei Pang
  Cc: Roman Gushchin, Michal Hocko, Johannes Weiner, linux-kernel, linux-mm

Hi Xunlei,

Xunlei Pang writes:
>docker and various types(different memory capacity) of containers
>are managed by k8s, it's a burden for k8s to maintain those dynamic
>figures, simply set "max" to key containers is always welcome.

Right, setting "max" is generally a fine way of going about it.

>Set "max" to docker also protects docker cgroup memory(as docker
>itself has tasks) unnecessarily.

That's not correct -- leaf memcgs have to _explicitly_ request memory 
protection. From the documentation:

    memory.low

    [...]

    Best-effort memory protection.  If the memory usages of a
    cgroup and all its ancestors are below their low boundaries,
    the cgroup's memory won't be reclaimed unless memory can be
    reclaimed from unprotected cgroups.

Note the part that the cgroup itself also must be within its low boundary, 
which is not implied simply by having ancestors that would permit propagation 
of protections.

In this case, Docker just shouldn't request it for those Docker-related tasks, 
and they won't get any. That seems a lot simpler and more intuitive than 
special casing "0" in ancestors.

>This patch doesn't take effect on any intermediate layer with
>positive memory.min set, it requires all the ancestors having
>0 memory.min to work.
>
>Nothing special change, but more flexible to business deployment...

Not so, this change is extremely "special". It violates the basic expectation 
that 0 means no possibility of propagation of protection, and I still don't see 
a compelling argument why Docker can't just set "max" in the intermediate 
cgroup and not accept any protection in leaf memcgs that it doesn't want 
protection for.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] memcg: Ignore unprotected parent in mem_cgroup_protected()
  2019-06-16 10:37     ` Chris Down
@ 2019-06-16 11:57       ` Xunlei Pang
  0 siblings, 0 replies; 5+ messages in thread
From: Xunlei Pang @ 2019-06-16 11:57 UTC (permalink / raw)
  To: Chris Down
  Cc: Roman Gushchin, Michal Hocko, Johannes Weiner, linux-kernel, linux-mm

Hi Chris,

On 2019/6/16 PM 6:37, Chris Down wrote:
> Hi Xunlei,
> 
> Xunlei Pang writes:
>> docker and various types(different memory capacity) of containers
>> are managed by k8s, it's a burden for k8s to maintain those dynamic
>> figures, simply set "max" to key containers is always welcome.
> 
> Right, setting "max" is generally a fine way of going about it.
> 
>> Set "max" to docker also protects docker cgroup memory(as docker
>> itself has tasks) unnecessarily.
> 
> That's not correct -- leaf memcgs have to _explicitly_ request memory
> protection. From the documentation:
> 
>    memory.low
> 
>    [...]
> 
>    Best-effort memory protection.  If the memory usages of a
>    cgroup and all its ancestors are below their low boundaries,
>    the cgroup's memory won't be reclaimed unless memory can be
>    reclaimed from unprotected cgroups.
> 
> Note the part that the cgroup itself also must be within its low
> boundary, which is not implied simply by having ancestors that would
> permit propagation of protections.
> 
> In this case, Docker just shouldn't request it for those Docker-related
> tasks, and they won't get any. That seems a lot simpler and more
> intuitive than special casing "0" in ancestors.
> 
>> This patch doesn't take effect on any intermediate layer with
>> positive memory.min set, it requires all the ancestors having
>> 0 memory.min to work.
>>
>> Nothing special change, but more flexible to business deployment...
> 
> Not so, this change is extremely "special". It violates the basic
> expectation that 0 means no possibility of propagation of protection,
> and I still don't see a compelling argument why Docker can't just set
> "max" in the intermediate cgroup and not accept any protection in leaf
> memcgs that it doesn't want protection for.

I got the reason, I'm using cgroup v1(with memory.min backport)
which permits tasks existent in "docker" cgroup.procs.

For cgroup v2, it's not a problem.

Thanks,
Xunlei

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-06-16 11:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-15 11:17 [PATCH] memcg: Ignore unprotected parent in mem_cgroup_protected() Xunlei Pang
2019-06-15 16:08 ` Chris Down
2019-06-16  6:30   ` Xunlei Pang
2019-06-16 10:37     ` Chris Down
2019-06-16 11:57       ` Xunlei Pang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).