Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Roman Gushchin <guro@fb.com>
To: "Michal Koutný" <mkoutny@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Richard Palethorpe <rpalethorpe@suse.com>, <ltp@lists.linux.it>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shakeel Butt <shakeelb@google.com>,
	Christoph Lameter <cl@linux.com>,
	Michal Hocko <mhocko@kernel.org>, Tejun Heo <tj@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, Michal Hocko <mhocko@suse.com>
Subject: Re: [RFC PATCH] mm: memcg/slab: Stop reparented obj_cgroups from charging root
Date: Mon, 19 Oct 2020 15:28:45 -0700
Message-ID: <20201019222845.GA64774@carbon.dhcp.thefacebook.com> (raw)
In-Reply-To: <20201016171502.GA102311@blackbook>

On Fri, Oct 16, 2020 at 07:15:02PM +0200, Michal Koutny wrote:
> On Fri, Oct 16, 2020 at 10:53:08AM -0400, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > The central try_charge() function charges recursively all the way up
> > to and including the root.
> Except for use_hiearchy=0 (which is the case here as Richard
> wrote). The reparenting is hence somewhat incompatible with
> new_parent.use_hiearchy=0 :-/
> 
> > We should clean this up one way or another: either charge the root or
> > don't, but do it consistently.
> I agree this'd be good to unify. One upside of excluding root memcg from
> charging is that users are spared from the charging overhead when memcg
> tree is not created.  (Actually, I thought that was the reason for this
> exception.)

Yeah, I'm completely on the same page. Moving a process to the root memory
cgroup is currently a good way to estimate the memory cgroup overhead.

How about the patch below, which consistently avoids charging the root
memory cgroup? It seems like it doesn't add too many checks.

Thanks!

--

From f50ea74d8f118b9121da3754acdde630ddc060a7 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro@fb.com>
Date: Mon, 19 Oct 2020 14:37:35 -0700
Subject: [PATCH RFC] mm: memcontrol: do not charge the root memory cgroup

Currently the root memory cgroup is never charged directly, but
if an ancestor cgroup is charged, the charge is propagated up to the
root memory cgroup. The root memory cgroup doesn't show the charge
to a user, neither it does allow to set any limits/protections.
So the information about the current charge is completely useless.

Avoiding to charge the root memory cgroup allows to:
1) simplify the model and the code, so, hopefully, fewer bugs will
   be introduced in the future;
2) avoid unnecessary atomic operations, which are used to (un)charge
   corresponding root page counters.

In the default hierarchy case or if use_hiearchy == true, it's very
straightforward: when the page counters tree is traversed to the root,
the root page counter (the one with parent == NULL), should be
skipped. To avoid multiple identical checks over the page counters
code, for_each_nonroot_ancestor() macro is introduced.

To handle the use_hierarchy == false case without adding custom
checks, let's make page counters of all non-root memory cgroup
direct ascendants of the corresponding root memory cgroup's page
counters. In this case for_each_nonroot_ancestor() will work correctly
as well.

Please, note, that cgroup v1 provides root level memory.usage_in_bytes.
However, it's not based on page counters (refer to mem_cgroup_usage()).

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/memcontrol.c   | 21 ++++++++++++++++-----
 mm/page_counter.c | 21 ++++++++++++---------
 2 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2636f8bad908..34cac7522e74 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5339,17 +5339,28 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		memcg->swappiness = mem_cgroup_swappiness(parent);
 		memcg->oom_kill_disable = parent->oom_kill_disable;
 	}
-	if (parent && parent->use_hierarchy) {
+	if (!parent) {
+		/* root memory cgroup */
+		page_counter_init(&memcg->memory, NULL);
+		page_counter_init(&memcg->swap, NULL);
+		page_counter_init(&memcg->kmem, NULL);
+		page_counter_init(&memcg->tcpmem, NULL);
+	} else if (parent->use_hierarchy) {
 		memcg->use_hierarchy = true;
 		page_counter_init(&memcg->memory, &parent->memory);
 		page_counter_init(&memcg->swap, &parent->swap);
 		page_counter_init(&memcg->kmem, &parent->kmem);
 		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
 	} else {
-		page_counter_init(&memcg->memory, NULL);
-		page_counter_init(&memcg->swap, NULL);
-		page_counter_init(&memcg->kmem, NULL);
-		page_counter_init(&memcg->tcpmem, NULL);
+		/*
+		 * If use_hierarchy == false, consider all page counters direct
+		 * descendants of the corresponding root level counters.
+		 */
+		page_counter_init(&memcg->memory, &root_mem_cgroup->memory);
+		page_counter_init(&memcg->swap, &root_mem_cgroup->swap);
+		page_counter_init(&memcg->kmem, &root_mem_cgroup->kmem);
+		page_counter_init(&memcg->tcpmem, &root_mem_cgroup->tcpmem);
+
 		/*
 		 * Deeper hierachy with use_hierarchy == false doesn't make
 		 * much sense so let cgroup subsystem know about this
diff --git a/mm/page_counter.c b/mm/page_counter.c
index b24a60b28bb0..8901b184b9d5 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -13,6 +13,9 @@
 #include <linux/bug.h>
 #include <asm/page.h>
 
+#define for_each_nonroot_ancestor(c, counter) \
+	for ((c) = (counter); ((c) && ((c)->parent)); (c) = (c)->parent)
+
 static void propagate_protected_usage(struct page_counter *c,
 				      unsigned long usage)
 {
@@ -20,9 +23,6 @@ static void propagate_protected_usage(struct page_counter *c,
 	unsigned long low, min;
 	long delta;
 
-	if (!c->parent)
-		return;
-
 	min = READ_ONCE(c->min);
 	if (min || atomic_long_read(&c->min_usage)) {
 		protected = min(usage, min);
@@ -68,7 +68,7 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages)
 {
 	struct page_counter *c;
 
-	for (c = counter; c; c = c->parent) {
+	for_each_nonroot_ancestor(c, counter) {
 		long new;
 
 		new = atomic_long_add_return(nr_pages, &c->usage);
@@ -97,7 +97,7 @@ bool page_counter_try_charge(struct page_counter *counter,
 {
 	struct page_counter *c;
 
-	for (c = counter; c; c = c->parent) {
+	for_each_nonroot_ancestor(c, counter) {
 		long new;
 		/*
 		 * Charge speculatively to avoid an expensive CAS.  If
@@ -137,8 +137,11 @@ bool page_counter_try_charge(struct page_counter *counter,
 	return true;
 
 failed:
-	for (c = counter; c != *fail; c = c->parent)
+	for_each_nonroot_ancestor(c, counter) {
+		if (c == *fail)
+			break;
 		page_counter_cancel(c, nr_pages);
+	}
 
 	return false;
 }
@@ -152,7 +155,7 @@ void page_counter_uncharge(struct page_counter *counter, unsigned long nr_pages)
 {
 	struct page_counter *c;
 
-	for (c = counter; c; c = c->parent)
+	for_each_nonroot_ancestor(c, counter)
 		page_counter_cancel(c, nr_pages);
 }
 
@@ -211,7 +214,7 @@ void page_counter_set_min(struct page_counter *counter, unsigned long nr_pages)
 
 	WRITE_ONCE(counter->min, nr_pages);
 
-	for (c = counter; c; c = c->parent)
+	for_each_nonroot_ancestor(c, counter)
 		propagate_protected_usage(c, atomic_long_read(&c->usage));
 }
 
@@ -228,7 +231,7 @@ void page_counter_set_low(struct page_counter *counter, unsigned long nr_pages)
 
 	WRITE_ONCE(counter->low, nr_pages);
 
-	for (c = counter; c; c = c->parent)
+	for_each_nonroot_ancestor(c, counter)
 		propagate_protected_usage(c, atomic_long_read(&c->usage));
 }
 
-- 
2.26.2



  parent reply index

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-14 19:07 Richard Palethorpe
2020-10-14 20:08 ` Roman Gushchin
2020-10-16  5:40   ` Richard Palethorpe
2020-10-16  9:47 ` Michal Koutný
2020-10-16 10:41   ` Richard Palethorpe
2020-10-16 15:05     ` Richard Palethorpe
2020-10-16 17:26       ` Michal Koutný
2020-10-16 14:53   ` Johannes Weiner
2020-10-16 17:02     ` Roman Gushchin
2020-10-16 17:15     ` Michal Koutný
2020-10-19  8:45       ` Richard Palethorpe
2020-10-19  9:58         ` [PATCH v3] " Richard Palethorpe
2020-10-19 16:58           ` Shakeel Butt
2020-10-20  5:52             ` Richard Palethorpe
2020-10-20 13:49               ` Richard Palethorpe
2020-10-20 16:56                 ` Shakeel Butt
2020-10-21 20:32                   ` Roman Gushchin
2020-10-20 17:24               ` Michal Koutný
2020-10-22  7:04                 ` Richard Palethorpe
2020-10-22 12:28                   ` [PATCH v4] " Richard Palethorpe
2020-10-22 16:37                     ` Shakeel Butt
2020-10-22 17:25                       ` Roman Gushchin
2020-10-22 23:59                         ` Shakeel Butt
2020-10-23  0:40                           ` Roman Gushchin
2020-10-23 15:44                             ` Johannes Weiner
2020-10-23 16:41                             ` Shakeel Butt
2020-10-26  7:32                             ` Richard Palethorpe
2020-10-26 23:14                               ` Roman Gushchin
2020-10-19 22:28       ` Roman Gushchin [this message]
2020-10-20  6:04         ` [RFC PATCH] " Richard Palethorpe
2020-10-20 12:02           ` Richard Palethorpe
2020-10-20 14:48         ` Richard Palethorpe
2020-10-20 16:27         ` Michal Koutný
2020-10-20 17:07           ` Roman Gushchin
2020-10-20 18:18             ` Johannes Weiner
2020-10-21 19:33               ` Roman Gushchin
2020-10-23 16:30                 ` Johannes Weiner
2020-11-10  1:27                   ` Roman Gushchin
2020-11-10 15:11                     ` Shakeel Butt
2020-11-10 19:13                       ` Roman Gushchin
2020-11-20 17:46                       ` Michal Koutný
2020-11-03 13:22                 ` Michal Hocko
2020-11-03 21:30                   ` Roman Gushchin
2020-10-20 16:55         ` Shakeel Butt
2020-10-20 17:17           ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201019222845.GA64774@carbon.dhcp.thefacebook.com \
    --to=guro@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ltp@lists.linux.it \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mkoutny@suse.com \
    --cc=rpalethorpe@suse.com \
    --cc=shakeelb@google.com \
    --cc=tj@kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git