linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<kernel-team@fb.com>, Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>,
	Waiman Long <longman@redhat.com>, Roman Gushchin <guro@fb.com>
Subject: [PATCH v7 10/10] mm: reparent memcg kmem_caches on cgroup removal
Date: Tue, 11 Jun 2019 16:18:13 -0700	[thread overview]
Message-ID: <20190611231813.3148843-11-guro@fb.com> (raw)
In-Reply-To: <20190611231813.3148843-1-guro@fb.com>

Let's reparent non-root kmem_caches on memcg offlining. This allows us
to release the memory cgroup without waiting for the last outstanding
kernel object (e.g. dentry used by another application).

Since the parent cgroup is already charged, everything we need to do
is to splice the list of kmem_caches to the parent's kmem_caches list,
swap the memcg pointer, drop the css refcounter for each kmem_cache
and adjust the parent's css refcounter.

Please, note that kmem_cache->memcg_params.memcg isn't a stable
pointer anymore. It's safe to read it under rcu_read_lock(),
cgroup_mutex held, or any other way that protects the memory cgroup
from being released.

We can race with the slab allocation and deallocation paths. It's not
a big problem: parent's charge and slab global stats are always
correct, and we don't care anymore about the child usage and global
stats. The child cgroup is already offline, so we don't use or show it
anywhere.

Local slab stats (NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE)
aren't used anywhere except count_shadow_nodes(). But even there it
won't break anything: after reparenting "nodes" will be 0 on child
level (because we're already reparenting shrinker lists), and on
parent level page stats always were 0, and this patch won't change
anything.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/slab.h |  2 +-
 mm/memcontrol.c      | 14 ++++++++------
 mm/slab.h            | 26 ++++++++++++++++++++------
 mm/slab_common.c     | 19 +++++++++++++++++--
 4 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 1b54e5f83342..fecf40b7be69 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -152,7 +152,7 @@ void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
 
 void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *);
-void memcg_deactivate_kmem_caches(struct mem_cgroup *);
+void memcg_deactivate_kmem_caches(struct mem_cgroup *, struct mem_cgroup *);
 
 /*
  * Please use this macro to create slab caches. Simply specify the
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 25e72779fd33..db46a9dc37ab 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3289,15 +3289,15 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
 	 */
 	memcg->kmem_state = KMEM_ALLOCATED;
 
-	memcg_deactivate_kmem_caches(memcg);
-
-	kmemcg_id = memcg->kmemcg_id;
-	BUG_ON(kmemcg_id < 0);
-
 	parent = parent_mem_cgroup(memcg);
 	if (!parent)
 		parent = root_mem_cgroup;
 
+	memcg_deactivate_kmem_caches(memcg, parent);
+
+	kmemcg_id = memcg->kmemcg_id;
+	BUG_ON(kmemcg_id < 0);
+
 	/*
 	 * Change kmemcg_id of this cgroup and all its descendants to the
 	 * parent's id, and then move all entries from this cgroup's list_lrus
@@ -3330,7 +3330,6 @@ static void memcg_free_kmem(struct mem_cgroup *memcg)
 	if (memcg->kmem_state == KMEM_ALLOCATED) {
 		WARN_ON(!list_empty(&memcg->kmem_caches));
 		static_branch_dec(&memcg_kmem_enabled_key);
-		WARN_ON(page_counter_read(&memcg->kmem));
 	}
 }
 #else
@@ -4777,6 +4776,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 
 	/* The following stuff does not apply to the root */
 	if (!parent) {
+#ifdef CONFIG_MEMCG_KMEM
+		INIT_LIST_HEAD(&memcg->kmem_caches);
+#endif
 		root_mem_cgroup = memcg;
 		return &memcg->css;
 	}
diff --git a/mm/slab.h b/mm/slab.h
index 7ead47cb9338..a4c9b9d042de 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -261,6 +261,9 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s)
  * which do not have slab_cache pointer set.
  * So this function assumes that the page can pass PageHead() and PageSlab()
  * checks.
+ *
+ * The kmem_cache can be reparented asynchronously. The caller must ensure
+ * the memcg lifetime, e.g. by taking rcu_read_lock() or cgroup_mutex.
  */
 static inline struct mem_cgroup *memcg_from_slab_page(struct page *page)
 {
@@ -268,7 +271,7 @@ static inline struct mem_cgroup *memcg_from_slab_page(struct page *page)
 
 	s = READ_ONCE(page->slab_cache);
 	if (s && !is_root_cache(s))
-		return s->memcg_params.memcg;
+		return READ_ONCE(s->memcg_params.memcg);
 
 	return NULL;
 }
@@ -285,10 +288,18 @@ static __always_inline int memcg_charge_slab(struct page *page,
 	struct lruvec *lruvec;
 	int ret;
 
-	memcg = s->memcg_params.memcg;
+	rcu_read_lock();
+	memcg = READ_ONCE(s->memcg_params.memcg);
+	while (memcg && !css_tryget_online(&memcg->css))
+		memcg = parent_mem_cgroup(memcg);
+	rcu_read_unlock();
+
+	if (unlikely(!memcg))
+		return true;
+
 	ret = memcg_kmem_charge_memcg(page, gfp, order, memcg);
 	if (ret)
-		return ret;
+		goto out;
 
 	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
 	mod_lruvec_state(lruvec, cache_vmstat_idx(s), 1 << order);
@@ -296,8 +307,9 @@ static __always_inline int memcg_charge_slab(struct page *page,
 	/* transer try_charge() page references to kmem_cache */
 	percpu_ref_get_many(&s->memcg_params.refcnt, 1 << order);
 	css_put_many(&memcg->css, 1 << order);
-
-	return 0;
+out:
+	css_put(&memcg->css);
+	return ret;
 }
 
 /*
@@ -310,10 +322,12 @@ static __always_inline void memcg_uncharge_slab(struct page *page, int order,
 	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
 
-	memcg = s->memcg_params.memcg;
+	rcu_read_lock();
+	memcg = READ_ONCE(s->memcg_params.memcg);
 	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
 	mod_lruvec_state(lruvec, cache_vmstat_idx(s), -(1 << order));
 	memcg_kmem_uncharge_memcg(page, order, memcg);
+	rcu_read_unlock();
 
 	percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order);
 }
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 6b7750f7ea33..91e8c739dc97 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -252,7 +252,8 @@ static void memcg_unlink_cache(struct kmem_cache *s)
 	} else {
 		list_del(&s->memcg_params.children_node);
 		list_del(&s->memcg_params.kmem_caches_node);
-		css_put(&s->memcg_params.memcg->css);
+		mem_cgroup_put(s->memcg_params.memcg);
+		WRITE_ONCE(s->memcg_params.memcg, NULL);
 	}
 }
 #else
@@ -790,11 +791,13 @@ static void kmemcg_cache_deactivate(struct kmem_cache *s)
 	spin_unlock_irq(&memcg_kmem_wq_lock);
 }
 
-void memcg_deactivate_kmem_caches(struct mem_cgroup *memcg)
+void memcg_deactivate_kmem_caches(struct mem_cgroup *memcg,
+				  struct mem_cgroup *parent)
 {
 	int idx;
 	struct memcg_cache_array *arr;
 	struct kmem_cache *s, *c;
+	unsigned int nr_reparented;
 
 	idx = memcg_cache_id(memcg);
 
@@ -812,6 +815,18 @@ void memcg_deactivate_kmem_caches(struct mem_cgroup *memcg)
 		kmemcg_cache_deactivate(c);
 		arr->entries[idx] = NULL;
 	}
+	nr_reparented = 0;
+	list_for_each_entry(s, &memcg->kmem_caches,
+			    memcg_params.kmem_caches_node) {
+		WRITE_ONCE(s->memcg_params.memcg, parent);
+		css_put(&memcg->css);
+		nr_reparented++;
+	}
+	if (nr_reparented) {
+		list_splice_init(&memcg->kmem_caches,
+				 &parent->kmem_caches);
+		css_get_many(&parent->css, nr_reparented);
+	}
 	mutex_unlock(&slab_mutex);
 
 	put_online_mems();
-- 
2.21.0


  parent reply	other threads:[~2019-06-11 23:18 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-11 23:18 [PATCH v7 00/10] mm: reparent slab memory on cgroup removal Roman Gushchin
2019-06-11 23:18 ` [PATCH v7 01/10] mm: postpone kmem_cache memcg pointer initialization to memcg_link_cache() Roman Gushchin
2019-06-13  2:04   ` Andrew Morton
2019-06-13 16:25     ` Roman Gushchin
2019-06-11 23:18 ` [PATCH v7 02/10] mm: rename slab delayed deactivation functions and fields Roman Gushchin
2019-06-25 18:17   ` Shakeel Butt
2019-06-11 23:18 ` [PATCH v7 03/10] mm: generalize postponed non-root kmem_cache deactivation Roman Gushchin
2019-06-25 18:17   ` Shakeel Butt
2019-06-11 23:18 ` [PATCH v7 04/10] mm: introduce __memcg_kmem_uncharge_memcg() Roman Gushchin
2019-06-11 23:18 ` [PATCH v7 05/10] mm: unify SLAB and SLUB page accounting Roman Gushchin
2019-06-11 23:18 ` [PATCH v7 06/10] mm: don't check the dying flag on kmem_cache creation Roman Gushchin
2019-06-16 16:26   ` Vladimir Davydov
2019-06-25 18:31   ` Shakeel Butt
2019-06-11 23:18 ` [PATCH v7 07/10] mm: synchronize access to kmem_cache dying flag using a spinlock Roman Gushchin
2019-06-16 16:27   ` Vladimir Davydov
2019-06-25 18:33   ` Shakeel Butt
2019-06-11 23:18 ` [PATCH v7 08/10] mm: rework non-root kmem_cache lifecycle management Roman Gushchin
2019-06-25 23:57   ` Shakeel Butt
2019-11-21 11:17   ` WARNING bisected (was Re: [PATCH v7 08/10] mm: rework non-root kmem_cache lifecycle management) Christian Borntraeger
2019-11-21 13:08     ` Christian Borntraeger
2019-11-21 16:58     ` Roman Gushchin
2019-11-21 16:59       ` Christian Borntraeger
2019-11-21 18:45         ` Roman Gushchin
2019-11-21 20:43           ` Rik van Riel
2019-11-21 20:55             ` Roman Gushchin
2019-11-21 22:09               ` Roman Gushchin
2019-11-22 13:00                 ` Christian Borntraeger
2019-11-22 16:28           ` Christian Borntraeger
2019-11-24  0:39             ` Roman Gushchin
2019-11-25  8:00               ` Christian Borntraeger
2019-11-25 18:07                 ` Roman Gushchin
2019-06-11 23:18 ` [PATCH v7 09/10] mm: stop setting page->mem_cgroup pointer for slab pages Roman Gushchin
2019-06-26  0:15   ` Shakeel Butt
2019-06-11 23:18 ` Roman Gushchin [this message]
2019-06-16 16:29   ` [PATCH v7 10/10] mm: reparent memcg kmem_caches on cgroup removal Vladimir Davydov
2019-06-26  0:15   ` Shakeel Butt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190611231813.3148843-11-guro@fb.com \
    --to=guro@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=shakeelb@google.com \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).