linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/7] Use obj_cgroup APIs to charge kmem pages
@ 2021-03-19 16:38 Muchun Song
  2021-03-19 16:38 ` [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg Muchun Song
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Muchun Song @ 2021-03-19 16:38 UTC (permalink / raw)
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, Muchun Song

Since Roman series "The new cgroup slab memory controller" applied. All
slab objects are charged with the new APIs of obj_cgroup. The new APIs
introduce a struct obj_cgroup to charge slab objects. It prevents
long-living objects from pinning the original memory cgroup in the memory.
But there are still some corner objects (e.g. allocations larger than
order-1 page on SLUB) which are not charged with the new APIs. Those
objects (include the pages which are allocated from buddy allocator
directly) are charged as kmem pages which still hold a reference to
the memory cgroup.

E.g. We know that the kernel stack is charged as kmem pages because the
size of the kernel stack can be greater than 2 pages (e.g. 16KB on x86_64
or arm64). If we create a thread (suppose the thread stack is charged to
memory cgroup A) and then move it from memory cgroup A to memory cgroup
B. Because the kernel stack of the thread hold a reference to the memory
cgroup A. The thread can pin the memory cgroup A in the memory even if
we remove the cgroup A. If we want to see this scenario by using the
following script. We can see that the system has added 500 dying cgroups
(This is not a real world issue, just a script to show that the large
kmallocs are charged as kmem pages which can pin the memory cgroup in the
memory).

	#!/bin/bash

	cat /proc/cgroups | grep memory

	cd /sys/fs/cgroup/memory
	echo 1 > memory.move_charge_at_immigrate

	for i in range{1..500}
	do
		mkdir kmem_test
		echo $$ > kmem_test/cgroup.procs
		sleep 3600 &
		echo $$ > cgroup.procs
		echo `cat kmem_test/cgroup.procs` > cgroup.procs
		rmdir kmem_test
	done

	cat /proc/cgroups | grep memory

This patchset aims to make those kmem pages to drop the reference to memory
cgroup by using the APIs of obj_cgroup. Finally, we can see that the number
of the dying cgroups will not increase if we run the above test script.

Changlogs in v5:
  1. Add a new patch (1st) to fix a potential issue.
  2. Rename get_obj_cgroup_memcg() to get_mem_cgroup_from_objcg().
  3. Remove get_mem_cgroup_from_current() and get_active_memcg().
  4. Add a comment to uncharge_page().
  5. Add a separate patch to inline __memcg_kmem_{un}charge() into
     obj_cgroup_{un}charge_pages().
  6. Collect Acked-by and Reviewed-by tags.

  Thanks to Johannes and Shakeel's review and suggestions.

Changlogs in v4:
  1. Do not change behavior of page_memcg() and page_memcg_rcu().
  2. Rework uncharge_page() and uncharge_batch().
  3. Add two patches (patch #2 and patch #3).

  Thanks to Johannes and Shakeel and Roman's review and suggestions.

Changlogs in v3:
  1. Drop "remote objcg charging APIs" patch.
  2. Rename obj_cgroup_{un}charge_page to obj_cgroup_{un}charge_pages.
  3. Make page_memcg/page_memcg_rcu safe for adding new memcg_data flags.
  4. Reuse the ug infrastructure to uncharge the kmem pages.
  5. Add a new patch to move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM.

  Thanks to Roman's review and suggestions.

Changlogs in v2:
  1. Fix some types in the commit log (Thanks Roman).
  2. Do not introduce page_memcg_kmem helper (Thanks to Johannes and Shakeel).
  3. Reduce the CC list to mm/memcg folks (Thanks to Johannes).
  4. Introduce remote objcg charging APIs instead of convert "remote memcg
     charging APIs" to "remote objcg charging APIs".

Muchun Song (7):
  mm: memcontrol: slab: fix obtain a reference to a freeing memcg
  mm: memcontrol: introduce obj_cgroup_{un}charge_pages
  mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c
  mm: memcontrol: change ug->dummy_page only if memcg changed
  mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  mm: memcontrol: inline __memcg_kmem_{un}charge() into
    obj_cgroup_{un}charge_pages()
  mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM

 include/linux/memcontrol.h | 123 +++++++++++++++++++++------
 mm/memcontrol.c            | 206 +++++++++++++++++++++++----------------------
 mm/page_alloc.c            |   4 +-
 3 files changed, 207 insertions(+), 126 deletions(-)

-- 
2.11.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg
  2021-03-19 16:38 [PATCH v5 0/7] Use obj_cgroup APIs to charge kmem pages Muchun Song
@ 2021-03-19 16:38 ` Muchun Song
  2021-03-19 18:26   ` Shakeel Butt
                     ` (2 more replies)
  2021-03-19 16:38 ` [PATCH v5 2/7] mm: memcontrol: introduce obj_cgroup_{un}charge_pages Muchun Song
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 17+ messages in thread
From: Muchun Song @ 2021-03-19 16:38 UTC (permalink / raw)
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, Muchun Song

The rcu_read_lock/unlock only can guarantee that the memcg will not be
freed, but it cannot guarantee the success of css_get (which is in the
refill_stock when cached memcg changed) to memcg.

  rcu_read_lock()
  memcg = obj_cgroup_memcg(old)
  __memcg_kmem_uncharge(memcg)
      refill_stock(memcg)
          if (stock->cached != memcg)
              // css_get can change the ref counter from 0 back to 1.
              css_get(&memcg->css)
  rcu_read_unlock()

This fix is very like the commit:

  eefbfa7fd678 ("mm: memcg/slab: fix use after free in obj_cgroup_charge")

Fix this by holding a reference to the memcg which is passed to the
__memcg_kmem_uncharge() before calling __memcg_kmem_uncharge().

Fixes: 3de7d4f25a74 ("mm: memcg/slab: optimize objcg stock draining")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/memcontrol.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 845eec01ef9d..2cda76ff0629 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3181,9 +3181,17 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 		unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
 
 		if (nr_pages) {
+			struct mem_cgroup *memcg;
+
 			rcu_read_lock();
-			__memcg_kmem_uncharge(obj_cgroup_memcg(old), nr_pages);
+retry:
+			memcg = obj_cgroup_memcg(old);
+			if (unlikely(!css_tryget(&memcg->css)))
+				goto retry;
 			rcu_read_unlock();
+
+			__memcg_kmem_uncharge(memcg, nr_pages);
+			css_put(&memcg->css);
 		}
 
 		/*
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 2/7] mm: memcontrol: introduce obj_cgroup_{un}charge_pages
  2021-03-19 16:38 [PATCH v5 0/7] Use obj_cgroup APIs to charge kmem pages Muchun Song
  2021-03-19 16:38 ` [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg Muchun Song
@ 2021-03-19 16:38 ` Muchun Song
  2021-03-19 16:38 ` [PATCH v5 3/7] mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c Muchun Song
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Muchun Song @ 2021-03-19 16:38 UTC (permalink / raw)
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, Muchun Song

We know that the unit of slab object charging is bytes, the unit of
kmem page charging is PAGE_SIZE. If we want to reuse obj_cgroup APIs
to charge the kmem pages, we should pass PAGE_SIZE (as third parameter)
to obj_cgroup_charge(). Because the size is already PAGE_SIZE, we can
skip touch the objcg stock. And obj_cgroup_{un}charge_pages() are
introduced to charge in units of page level.

In the later patch, we also can reuse those two helpers to charge or
uncharge a number of kernel pages to a object cgroup. This is just
a code movement without any functional changes.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
---
 mm/memcontrol.c | 63 ++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 40 insertions(+), 23 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2cda76ff0629..9489c7fc1e04 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2905,6 +2905,20 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg)
 	page->memcg_data = (unsigned long)memcg;
 }
 
+static struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg)
+{
+	struct mem_cgroup *memcg;
+
+	rcu_read_lock();
+retry:
+	memcg = obj_cgroup_memcg(objcg);
+	if (unlikely(!css_tryget(&memcg->css)))
+		goto retry;
+	rcu_read_unlock();
+
+	return memcg;
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 				 gfp_t gfp, bool new_page)
@@ -3056,6 +3070,29 @@ static void memcg_free_cache_id(int id)
 	ida_simple_remove(&memcg_cache_ida, id);
 }
 
+static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
+				      unsigned int nr_pages)
+{
+	struct mem_cgroup *memcg;
+
+	memcg = get_mem_cgroup_from_objcg(objcg);
+	__memcg_kmem_uncharge(memcg, nr_pages);
+	css_put(&memcg->css);
+}
+
+static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
+				   unsigned int nr_pages)
+{
+	struct mem_cgroup *memcg;
+	int ret;
+
+	memcg = get_mem_cgroup_from_objcg(objcg);
+	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+	css_put(&memcg->css);
+
+	return ret;
+}
+
 /**
  * __memcg_kmem_charge: charge a number of kernel pages to a memcg
  * @memcg: memory cgroup to charge
@@ -3180,19 +3217,8 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 		unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
 		unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
 
-		if (nr_pages) {
-			struct mem_cgroup *memcg;
-
-			rcu_read_lock();
-retry:
-			memcg = obj_cgroup_memcg(old);
-			if (unlikely(!css_tryget(&memcg->css)))
-				goto retry;
-			rcu_read_unlock();
-
-			__memcg_kmem_uncharge(memcg, nr_pages);
-			css_put(&memcg->css);
-		}
+		if (nr_pages)
+			obj_cgroup_uncharge_pages(old, nr_pages);
 
 		/*
 		 * The leftover is flushed to the centralized per-memcg value.
@@ -3250,7 +3276,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 
 int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 {
-	struct mem_cgroup *memcg;
 	unsigned int nr_pages, nr_bytes;
 	int ret;
 
@@ -3267,24 +3292,16 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 	 * refill_obj_stock(), called from this function or
 	 * independently later.
 	 */
-	rcu_read_lock();
-retry:
-	memcg = obj_cgroup_memcg(objcg);
-	if (unlikely(!css_tryget(&memcg->css)))
-		goto retry;
-	rcu_read_unlock();
-
 	nr_pages = size >> PAGE_SHIFT;
 	nr_bytes = size & (PAGE_SIZE - 1);
 
 	if (nr_bytes)
 		nr_pages += 1;
 
-	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+	ret = obj_cgroup_charge_pages(objcg, gfp, nr_pages);
 	if (!ret && nr_bytes)
 		refill_obj_stock(objcg, PAGE_SIZE - nr_bytes);
 
-	css_put(&memcg->css);
 	return ret;
 }
 
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 3/7] mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c
  2021-03-19 16:38 [PATCH v5 0/7] Use obj_cgroup APIs to charge kmem pages Muchun Song
  2021-03-19 16:38 ` [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg Muchun Song
  2021-03-19 16:38 ` [PATCH v5 2/7] mm: memcontrol: introduce obj_cgroup_{un}charge_pages Muchun Song
@ 2021-03-19 16:38 ` Muchun Song
  2021-03-19 16:38 ` [PATCH v5 4/7] mm: memcontrol: change ug->dummy_page only if memcg changed Muchun Song
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Muchun Song @ 2021-03-19 16:38 UTC (permalink / raw)
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, Muchun Song

The page_memcg() is not suitable for use by page_expected_state() and
page_bad_reason(). Because it can BUG_ON() for the slab pages when
CONFIG_DEBUG_VM is enabled. As neither lru, nor kmem, nor slab page
should have anything left in there by the time the page is freed, what
we care about is whether the value of page->memcg_data is 0. So just
directly access page->memcg_data here.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f10966e3b4a5..e5454b85a106 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1124,7 +1124,7 @@ static inline bool page_expected_state(struct page *page,
 	if (unlikely((unsigned long)page->mapping |
 			page_ref_count(page) |
 #ifdef CONFIG_MEMCG
-			(unsigned long)page_memcg(page) |
+			page->memcg_data |
 #endif
 			(page->flags & check_flags)))
 		return false;
@@ -1149,7 +1149,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
 			bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
 	}
 #ifdef CONFIG_MEMCG
-	if (unlikely(page_memcg(page)))
+	if (unlikely(page->memcg_data))
 		bad_reason = "page still charged to cgroup";
 #endif
 	return bad_reason;
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 4/7] mm: memcontrol: change ug->dummy_page only if memcg changed
  2021-03-19 16:38 [PATCH v5 0/7] Use obj_cgroup APIs to charge kmem pages Muchun Song
                   ` (2 preceding siblings ...)
  2021-03-19 16:38 ` [PATCH v5 3/7] mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c Muchun Song
@ 2021-03-19 16:38 ` Muchun Song
  2021-03-19 16:38 ` [PATCH v5 5/7] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Muchun Song
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Muchun Song @ 2021-03-19 16:38 UTC (permalink / raw)
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, Muchun Song

Just like assignment to ug->memcg, we only need to update ug->dummy_page
if memcg changed. So move it to there. This is a very small optimization.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9489c7fc1e04..8d28a5a2ee58 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6854,6 +6854,7 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 			uncharge_gather_clear(ug);
 		}
 		ug->memcg = page_memcg(page);
+		ug->dummy_page = page;
 
 		/* pairs with css_put in uncharge_batch */
 		css_get(&ug->memcg->css);
@@ -6867,7 +6868,6 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 	else
 		ug->pgpgout++;
 
-	ug->dummy_page = page;
 	page->memcg_data = 0;
 	css_put(&ug->memcg->css);
 }
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 5/7] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  2021-03-19 16:38 [PATCH v5 0/7] Use obj_cgroup APIs to charge kmem pages Muchun Song
                   ` (3 preceding siblings ...)
  2021-03-19 16:38 ` [PATCH v5 4/7] mm: memcontrol: change ug->dummy_page only if memcg changed Muchun Song
@ 2021-03-19 16:38 ` Muchun Song
  2021-03-19 18:27   ` Shakeel Butt
  2021-03-22 18:13   ` Roman Gushchin
  2021-03-19 16:38 ` [PATCH v5 6/7] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages() Muchun Song
  2021-03-19 16:38 ` [PATCH v5 7/7] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM Muchun Song
  6 siblings, 2 replies; 17+ messages in thread
From: Muchun Song @ 2021-03-19 16:38 UTC (permalink / raw)
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, Muchun Song

Since Roman series "The new cgroup slab memory controller" applied. All
slab objects are charged via the new APIs of obj_cgroup. The new APIs
introduce a struct obj_cgroup to charge slab objects. It prevents
long-living objects from pinning the original memory cgroup in the memory.
But there are still some corner objects (e.g. allocations larger than
order-1 page on SLUB) which are not charged via the new APIs. Those
objects (include the pages which are allocated from buddy allocator
directly) are charged as kmem pages which still hold a reference to
the memory cgroup.

We want to reuse the obj_cgroup APIs to charge the kmem pages.
If we do that, we should store an object cgroup pointer to
page->memcg_data for the kmem pages.

Finally, page->memcg_data will have 3 different meanings.

  1) For the slab pages, page->memcg_data points to an object cgroups
     vector.

  2) For the kmem pages (exclude the slab pages), page->memcg_data
     points to an object cgroup.

  3) For the user pages (e.g. the LRU pages), page->memcg_data points
     to a memory cgroup.

We do not change the behavior of page_memcg() and page_memcg_rcu().
They are also suitable for LRU pages and kmem pages. Why?

Because memory allocations pinning memcgs for a long time - it exists
at a larger scale and is causing recurring problems in the real world:
page cache doesn't get reclaimed for a long time, or is used by the
second, third, fourth, ... instance of the same job that was restarted
into a new cgroup every time. Unreclaimable dying cgroups pile up,
waste memory, and make page reclaim very inefficient.

We can convert LRU pages and most other raw memcg pins to the objcg
direction to fix this problem, and then the page->memcg will always
point to an object cgroup pointer. At that time, LRU pages and kmem
pages will be treated the same. The implementation of page_memcg()
will remove the kmem page check.

This patch aims to charge the kmem pages by using the new APIs of
obj_cgroup. Finally, the page->memcg_data of the kmem page points to
an object cgroup. We can use the __page_objcg() to get the object
cgroup associated with a kmem page. Or we can use page_memcg()
to get the memory cgroup associated with a kmem page, but caller must
ensure that the returned memcg won't be released (e.g. acquire the
rcu_read_lock or css_set_lock).

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 116 +++++++++++++++++++++++++++++++++++----------
 mm/memcontrol.c            | 110 +++++++++++++++++++++---------------------
 2 files changed, 145 insertions(+), 81 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e6dc793d587d..395a113e4a3b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -358,6 +358,62 @@ enum page_memcg_data_flags {
 
 #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
 
+static inline bool PageMemcgKmem(struct page *page);
+
+/*
+ * After the initialization objcg->memcg is always pointing at
+ * a valid memcg, but can be atomically swapped to the parent memcg.
+ *
+ * The caller must ensure that the returned memcg won't be released:
+ * e.g. acquire the rcu_read_lock or css_set_lock.
+ */
+static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
+{
+	return READ_ONCE(objcg->memcg);
+}
+
+/*
+ * __page_memcg - get the memory cgroup associated with a non-kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the memory cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper memory cgroup pointer. It's not safe to call this function
+ * against some type of pages, e.g. slab pages or ex-slab pages or
+ * kmem pages.
+ */
+static inline struct mem_cgroup *__page_memcg(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
+
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
+
+/*
+ * __page_objcg - get the object cgroup associated with a kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper object cgroup pointer. It's not safe to call this function
+ * against some type of pages, e.g. slab pages or ex-slab pages or
+ * LRU pages.
+ */
+static inline struct obj_cgroup *__page_objcg(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
+
+	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
+
 /*
  * page_memcg - get the memory cgroup associated with a page
  * @page: a pointer to the page struct
@@ -367,20 +423,23 @@ enum page_memcg_data_flags {
  * proper memory cgroup pointer. It's not safe to call this function
  * against some type of pages, e.g. slab pages or ex-slab pages.
  *
- * Any of the following ensures page and memcg binding stability:
+ * For a non-kmem page any of the following ensures page and memcg binding
+ * stability:
+ *
  * - the page lock
  * - LRU isolation
  * - lock_page_memcg()
  * - exclusive reference
+ *
+ * For a kmem page a caller should hold an rcu read lock to protect memcg
+ * associated with a kmem page from being released.
  */
 static inline struct mem_cgroup *page_memcg(struct page *page)
 {
-	unsigned long memcg_data = page->memcg_data;
-
-	VM_BUG_ON_PAGE(PageSlab(page), page);
-	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
-
-	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	if (PageMemcgKmem(page))
+		return obj_cgroup_memcg(__page_objcg(page));
+	else
+		return __page_memcg(page);
 }
 
 /*
@@ -394,11 +453,19 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
  */
 static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
 {
+	unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
 	VM_BUG_ON_PAGE(PageSlab(page), page);
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	return (struct mem_cgroup *)(READ_ONCE(page->memcg_data) &
-				     ~MEMCG_DATA_FLAGS_MASK);
+	if (memcg_data & MEMCG_DATA_KMEM) {
+		struct obj_cgroup *objcg;
+
+		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		return obj_cgroup_memcg(objcg);
+	}
+
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
 /*
@@ -406,15 +473,21 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
- * or NULL. This function unlike page_memcg() can take any  page
+ * or NULL. This function unlike page_memcg() can take any page
  * as an argument. It has to be used in cases when it's not known if a page
- * has an associated memory cgroup pointer or an object cgroups vector.
+ * has an associated memory cgroup pointer or an object cgroups vector or
+ * an object cgroup.
+ *
+ * For a non-kmem page any of the following ensures page and memcg binding
+ * stability:
  *
- * Any of the following ensures page and memcg binding stability:
  * - the page lock
  * - LRU isolation
  * - lock_page_memcg()
  * - exclusive reference
+ *
+ * For a kmem page a caller should hold an rcu read lock to protect memcg
+ * associated with a kmem page from being released.
  */
 static inline struct mem_cgroup *page_memcg_check(struct page *page)
 {
@@ -427,6 +500,13 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	if (memcg_data & MEMCG_DATA_OBJCGS)
 		return NULL;
 
+	if (memcg_data & MEMCG_DATA_KMEM) {
+		struct obj_cgroup *objcg;
+
+		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		return obj_cgroup_memcg(objcg);
+	}
+
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
@@ -713,18 +793,6 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 	percpu_ref_put(&objcg->refcnt);
 }
 
-/*
- * After the initialization objcg->memcg is always pointing at
- * a valid memcg, but can be atomically swapped to the parent memcg.
- *
- * The caller must ensure that the returned memcg won't be released:
- * e.g. acquire the rcu_read_lock or css_set_lock.
- */
-static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
-{
-	return READ_ONCE(objcg->memcg);
-}
-
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 	if (memcg)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8d28a5a2ee58..962499542531 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -855,18 +855,22 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
 			     int val)
 {
 	struct page *head = compound_head(page); /* rmap on tail pages */
-	struct mem_cgroup *memcg = page_memcg(head);
+	struct mem_cgroup *memcg;
 	pg_data_t *pgdat = page_pgdat(page);
 	struct lruvec *lruvec;
 
+	rcu_read_lock();
+	memcg = page_memcg(head);
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
 	if (!memcg) {
+		rcu_read_unlock();
 		__mod_node_page_state(pgdat, idx, val);
 		return;
 	}
 
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	__mod_lruvec_state(lruvec, idx, val);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(__mod_lruvec_page_state);
 
@@ -1055,20 +1059,6 @@ static __always_inline struct mem_cgroup *active_memcg(void)
 		return current->active_memcg;
 }
 
-static __always_inline struct mem_cgroup *get_active_memcg(void)
-{
-	struct mem_cgroup *memcg;
-
-	rcu_read_lock();
-	memcg = active_memcg();
-	/* remote memcg must hold a ref. */
-	if (memcg && WARN_ON_ONCE(!css_tryget(&memcg->css)))
-		memcg = root_mem_cgroup;
-	rcu_read_unlock();
-
-	return memcg;
-}
-
 static __always_inline bool memcg_kmem_bypass(void)
 {
 	/* Allow remote memcg charging from any context. */
@@ -1083,20 +1073,6 @@ static __always_inline bool memcg_kmem_bypass(void)
 }
 
 /**
- * If active memcg is set, do not fallback to current->mm->memcg.
- */
-static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
-{
-	if (memcg_kmem_bypass())
-		return NULL;
-
-	if (unlikely(active_memcg()))
-		return get_active_memcg();
-
-	return get_mem_cgroup_from_mm(current->mm);
-}
-
-/**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
  * @prev: previously returned memcg, NULL on first invocation
@@ -3152,18 +3128,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
  */
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	int ret = 0;
 
-	memcg = get_mem_cgroup_from_current();
-	if (memcg && !mem_cgroup_is_root(memcg)) {
-		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
+	objcg = get_obj_cgroup_from_current();
+	if (objcg) {
+		ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
 		if (!ret) {
-			page->memcg_data = (unsigned long)memcg |
+			page->memcg_data = (unsigned long)objcg |
 				MEMCG_DATA_KMEM;
 			return 0;
 		}
-		css_put(&memcg->css);
+		obj_cgroup_put(objcg);
 	}
 	return ret;
 }
@@ -3175,16 +3151,16 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-	struct mem_cgroup *memcg = page_memcg(page);
+	struct obj_cgroup *objcg;
 	unsigned int nr_pages = 1 << order;
 
-	if (!memcg)
+	if (!PageMemcgKmem(page))
 		return;
 
-	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
-	__memcg_kmem_uncharge(memcg, nr_pages);
+	objcg = __page_objcg(page);
+	obj_cgroup_uncharge_pages(objcg, nr_pages);
 	page->memcg_data = 0;
-	css_put(&memcg->css);
+	obj_cgroup_put(objcg);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -6799,7 +6775,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 
 struct uncharge_gather {
 	struct mem_cgroup *memcg;
-	unsigned long nr_pages;
+	unsigned long nr_memory;
 	unsigned long pgpgout;
 	unsigned long nr_kmem;
 	struct page *dummy_page;
@@ -6814,10 +6790,10 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 {
 	unsigned long flags;
 
-	if (!mem_cgroup_is_root(ug->memcg)) {
-		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
+	if (ug->nr_memory) {
+		page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
 		if (do_memsw_account())
-			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
+			page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
 		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
 			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
 		memcg_oom_recover(ug->memcg);
@@ -6825,7 +6801,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 
 	local_irq_save(flags);
 	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
-	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
+	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_memory);
 	memcg_check_events(ug->memcg, ug->dummy_page);
 	local_irq_restore(flags);
 
@@ -6836,40 +6812,60 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 {
 	unsigned long nr_pages;
+	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
-	if (!page_memcg(page))
-		return;
-
 	/*
 	 * Nobody should be changing or seriously looking at
-	 * page_memcg(page) at this point, we have fully
+	 * page memcg or objcg at this point, we have fully
 	 * exclusive access to the page.
 	 */
+	if (PageMemcgKmem(page)) {
+		objcg = __page_objcg(page);
+		/*
+		 * This get matches the put at the end of the function and
+		 * kmem pages do not hold memcg references anymore.
+		 */
+		memcg = get_mem_cgroup_from_objcg(objcg);
+	} else {
+		memcg = __page_memcg(page);
+	}
 
-	if (ug->memcg != page_memcg(page)) {
+	if (!memcg)
+		return;
+
+	if (ug->memcg != memcg) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
 			uncharge_gather_clear(ug);
 		}
-		ug->memcg = page_memcg(page);
+		ug->memcg = memcg;
 		ug->dummy_page = page;
 
 		/* pairs with css_put in uncharge_batch */
-		css_get(&ug->memcg->css);
+		css_get(&memcg->css);
 	}
 
 	nr_pages = compound_nr(page);
-	ug->nr_pages += nr_pages;
 
-	if (PageMemcgKmem(page))
+	if (PageMemcgKmem(page)) {
+		ug->nr_memory += nr_pages;
 		ug->nr_kmem += nr_pages;
-	else
+
+		page->memcg_data = 0;
+		obj_cgroup_put(objcg);
+	} else {
+		/* LRU pages aren't accounted at the root level */
+		if (!mem_cgroup_is_root(memcg))
+			ug->nr_memory += nr_pages;
 		ug->pgpgout++;
 
-	page->memcg_data = 0;
-	css_put(&ug->memcg->css);
+		page->memcg_data = 0;
+	}
+
+	css_put(&memcg->css);
 }
 
 /**
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 6/7] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages()
  2021-03-19 16:38 [PATCH v5 0/7] Use obj_cgroup APIs to charge kmem pages Muchun Song
                   ` (4 preceding siblings ...)
  2021-03-19 16:38 ` [PATCH v5 5/7] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Muchun Song
@ 2021-03-19 16:38 ` Muchun Song
  2021-03-19 18:41   ` Shakeel Butt
                     ` (2 more replies)
  2021-03-19 16:38 ` [PATCH v5 7/7] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM Muchun Song
  6 siblings, 3 replies; 17+ messages in thread
From: Muchun Song @ 2021-03-19 16:38 UTC (permalink / raw)
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, Muchun Song

There is only one user of __memcg_kmem_charge(), so manually inline
__memcg_kmem_charge() to obj_cgroup_charge_pages(). Similarly manually
inline __memcg_kmem_uncharge() into obj_cgroup_uncharge_pages() and
call obj_cgroup_uncharge_pages() in obj_cgroup_release().

This is just code cleanup without any functionality changes.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/memcontrol.c | 63 +++++++++++++++++++++++----------------------------------
 1 file changed, 25 insertions(+), 38 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 962499542531..249bf6b4d94c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -255,10 +255,8 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
 #ifdef CONFIG_MEMCG_KMEM
 extern spinlock_t css_set_lock;
 
-static int __memcg_kmem_charge(struct mem_cgroup *memcg, gfp_t gfp,
-			       unsigned int nr_pages);
-static void __memcg_kmem_uncharge(struct mem_cgroup *memcg,
-				  unsigned int nr_pages);
+static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
+				      unsigned int nr_pages);
 
 static void obj_cgroup_release(struct percpu_ref *ref)
 {
@@ -295,7 +293,7 @@ static void obj_cgroup_release(struct percpu_ref *ref)
 	spin_lock_irqsave(&css_set_lock, flags);
 	memcg = obj_cgroup_memcg(objcg);
 	if (nr_pages)
-		__memcg_kmem_uncharge(memcg, nr_pages);
+		obj_cgroup_uncharge_pages(objcg, nr_pages);
 	list_del(&objcg->list);
 	mem_cgroup_put(memcg);
 	spin_unlock_irqrestore(&css_set_lock, flags);
@@ -3046,46 +3044,45 @@ static void memcg_free_cache_id(int id)
 	ida_simple_remove(&memcg_cache_ida, id);
 }
 
+/*
+ * obj_cgroup_uncharge_pages: uncharge a number of kernel pages from a objcg
+ * @objcg: object cgroup to uncharge
+ * @nr_pages: number of pages to uncharge
+ */
 static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
 				      unsigned int nr_pages)
 {
 	struct mem_cgroup *memcg;
 
 	memcg = get_mem_cgroup_from_objcg(objcg);
-	__memcg_kmem_uncharge(memcg, nr_pages);
-	css_put(&memcg->css);
-}
 
-static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
-				   unsigned int nr_pages)
-{
-	struct mem_cgroup *memcg;
-	int ret;
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+		page_counter_uncharge(&memcg->kmem, nr_pages);
+	refill_stock(memcg, nr_pages);
 
-	memcg = get_mem_cgroup_from_objcg(objcg);
-	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
 	css_put(&memcg->css);
-
-	return ret;
 }
 
-/**
- * __memcg_kmem_charge: charge a number of kernel pages to a memcg
- * @memcg: memory cgroup to charge
+/*
+ * obj_cgroup_charge_pages: charge a number of kernel pages to a objcg
+ * @objcg: object cgroup to charge
  * @gfp: reclaim mode
  * @nr_pages: number of pages to charge
  *
  * Returns 0 on success, an error code on failure.
  */
-static int __memcg_kmem_charge(struct mem_cgroup *memcg, gfp_t gfp,
-			       unsigned int nr_pages)
+static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
+				   unsigned int nr_pages)
 {
 	struct page_counter *counter;
+	struct mem_cgroup *memcg;
 	int ret;
 
+	memcg = get_mem_cgroup_from_objcg(objcg);
+
 	ret = try_charge(memcg, gfp, nr_pages);
 	if (ret)
-		return ret;
+		goto out;
 
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
 	    !page_counter_try_charge(&memcg->kmem, nr_pages, &counter)) {
@@ -3097,25 +3094,15 @@ static int __memcg_kmem_charge(struct mem_cgroup *memcg, gfp_t gfp,
 		 */
 		if (gfp & __GFP_NOFAIL) {
 			page_counter_charge(&memcg->kmem, nr_pages);
-			return 0;
+			goto out;
 		}
 		cancel_charge(memcg, nr_pages);
-		return -ENOMEM;
+		ret = -ENOMEM;
 	}
-	return 0;
-}
-
-/**
- * __memcg_kmem_uncharge: uncharge a number of kernel pages from a memcg
- * @memcg: memcg to uncharge
- * @nr_pages: number of pages to uncharge
- */
-static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
-		page_counter_uncharge(&memcg->kmem, nr_pages);
+out:
+	css_put(&memcg->css);
 
-	refill_stock(memcg, nr_pages);
+	return ret;
 }
 
 /**
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 7/7] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM
  2021-03-19 16:38 [PATCH v5 0/7] Use obj_cgroup APIs to charge kmem pages Muchun Song
                   ` (5 preceding siblings ...)
  2021-03-19 16:38 ` [PATCH v5 6/7] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages() Muchun Song
@ 2021-03-19 16:38 ` Muchun Song
  6 siblings, 0 replies; 17+ messages in thread
From: Muchun Song @ 2021-03-19 16:38 UTC (permalink / raw)
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, Muchun Song

The page only can be marked as kmem when CONFIG_MEMCG_KMEM is enabled.
So move PageMemcgKmem() to the scope of the CONFIG_MEMCG_KMEM.

As a bonus, on !CONFIG_MEMCG_KMEM build some code can be compiled out.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 395a113e4a3b..7fdc92e1983e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -510,6 +510,7 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
+#ifdef CONFIG_MEMCG_KMEM
 /*
  * PageMemcgKmem - check if the page has MemcgKmem flag set
  * @page: a pointer to the page struct
@@ -524,7 +525,6 @@ static inline bool PageMemcgKmem(struct page *page)
 	return page->memcg_data & MEMCG_DATA_KMEM;
 }
 
-#ifdef CONFIG_MEMCG_KMEM
 /*
  * page_objcgs - get the object cgroups vector associated with a page
  * @page: a pointer to the page struct
@@ -566,6 +566,11 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 }
 
 #else
+static inline bool PageMemcgKmem(struct page *page)
+{
+	return false;
+}
+
 static inline struct obj_cgroup **page_objcgs(struct page *page)
 {
 	return NULL;
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg
  2021-03-19 16:38 ` [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg Muchun Song
@ 2021-03-19 18:26   ` Shakeel Butt
  2021-03-22 14:46   ` Johannes Weiner
  2021-03-22 18:17   ` Roman Gushchin
  2 siblings, 0 replies; 17+ messages in thread
From: Shakeel Butt @ 2021-03-19 18:26 UTC (permalink / raw)
  To: Muchun Song
  Cc: Roman Gushchin, Johannes Weiner, Michal Hocko, Andrew Morton,
	Vladimir Davydov, LKML, Linux MM, Xiongchun duan

On Fri, Mar 19, 2021 at 9:38 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> The rcu_read_lock/unlock only can guarantee that the memcg will not be
> freed, but it cannot guarantee the success of css_get (which is in the
> refill_stock when cached memcg changed) to memcg.
>
>   rcu_read_lock()
>   memcg = obj_cgroup_memcg(old)
>   __memcg_kmem_uncharge(memcg)
>       refill_stock(memcg)
>           if (stock->cached != memcg)
>               // css_get can change the ref counter from 0 back to 1.
>               css_get(&memcg->css)
>   rcu_read_unlock()
>
> This fix is very like the commit:
>
>   eefbfa7fd678 ("mm: memcg/slab: fix use after free in obj_cgroup_charge")
>
> Fix this by holding a reference to the memcg which is passed to the
> __memcg_kmem_uncharge() before calling __memcg_kmem_uncharge().
>
> Fixes: 3de7d4f25a74 ("mm: memcg/slab: optimize objcg stock draining")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Good catch.

Reviewed-by: Shakeel Butt <shakeelb@google.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/7] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  2021-03-19 16:38 ` [PATCH v5 5/7] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Muchun Song
@ 2021-03-19 18:27   ` Shakeel Butt
  2021-03-22 18:13   ` Roman Gushchin
  1 sibling, 0 replies; 17+ messages in thread
From: Shakeel Butt @ 2021-03-19 18:27 UTC (permalink / raw)
  To: Muchun Song
  Cc: Roman Gushchin, Johannes Weiner, Michal Hocko, Andrew Morton,
	Vladimir Davydov, LKML, Linux MM, Xiongchun duan

On Fri, Mar 19, 2021 at 9:39 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> Since Roman series "The new cgroup slab memory controller" applied. All
> slab objects are charged via the new APIs of obj_cgroup. The new APIs
> introduce a struct obj_cgroup to charge slab objects. It prevents
> long-living objects from pinning the original memory cgroup in the memory.
> But there are still some corner objects (e.g. allocations larger than
> order-1 page on SLUB) which are not charged via the new APIs. Those
> objects (include the pages which are allocated from buddy allocator
> directly) are charged as kmem pages which still hold a reference to
> the memory cgroup.
>
> We want to reuse the obj_cgroup APIs to charge the kmem pages.
> If we do that, we should store an object cgroup pointer to
> page->memcg_data for the kmem pages.
>
> Finally, page->memcg_data will have 3 different meanings.
>
>   1) For the slab pages, page->memcg_data points to an object cgroups
>      vector.
>
>   2) For the kmem pages (exclude the slab pages), page->memcg_data
>      points to an object cgroup.
>
>   3) For the user pages (e.g. the LRU pages), page->memcg_data points
>      to a memory cgroup.
>
> We do not change the behavior of page_memcg() and page_memcg_rcu().
> They are also suitable for LRU pages and kmem pages. Why?
>
> Because memory allocations pinning memcgs for a long time - it exists
> at a larger scale and is causing recurring problems in the real world:
> page cache doesn't get reclaimed for a long time, or is used by the
> second, third, fourth, ... instance of the same job that was restarted
> into a new cgroup every time. Unreclaimable dying cgroups pile up,
> waste memory, and make page reclaim very inefficient.
>
> We can convert LRU pages and most other raw memcg pins to the objcg
> direction to fix this problem, and then the page->memcg will always
> point to an object cgroup pointer. At that time, LRU pages and kmem
> pages will be treated the same. The implementation of page_memcg()
> will remove the kmem page check.
>
> This patch aims to charge the kmem pages by using the new APIs of
> obj_cgroup. Finally, the page->memcg_data of the kmem page points to
> an object cgroup. We can use the __page_objcg() to get the object
> cgroup associated with a kmem page. Or we can use page_memcg()
> to get the memory cgroup associated with a kmem page, but caller must
> ensure that the returned memcg won't be released (e.g. acquire the
> rcu_read_lock or css_set_lock).
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Shakeel Butt <shakeelb@google.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 6/7] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages()
  2021-03-19 16:38 ` [PATCH v5 6/7] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages() Muchun Song
@ 2021-03-19 18:41   ` Shakeel Butt
  2021-03-22 14:34   ` Johannes Weiner
  2021-03-22 18:14   ` Roman Gushchin
  2 siblings, 0 replies; 17+ messages in thread
From: Shakeel Butt @ 2021-03-19 18:41 UTC (permalink / raw)
  To: Muchun Song
  Cc: Roman Gushchin, Johannes Weiner, Michal Hocko, Andrew Morton,
	Vladimir Davydov, LKML, Linux MM, Xiongchun duan

On Fri, Mar 19, 2021 at 9:39 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> There is only one user of __memcg_kmem_charge(), so manually inline
> __memcg_kmem_charge() to obj_cgroup_charge_pages(). Similarly manually
> inline __memcg_kmem_uncharge() into obj_cgroup_uncharge_pages() and
> call obj_cgroup_uncharge_pages() in obj_cgroup_release().
>
> This is just code cleanup without any functionality changes.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Reviewed-by: Shakeel Butt <shakeelb@google.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 6/7] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages()
  2021-03-19 16:38 ` [PATCH v5 6/7] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages() Muchun Song
  2021-03-19 18:41   ` Shakeel Butt
@ 2021-03-22 14:34   ` Johannes Weiner
  2021-03-22 18:14   ` Roman Gushchin
  2 siblings, 0 replies; 17+ messages in thread
From: Johannes Weiner @ 2021-03-22 14:34 UTC (permalink / raw)
  To: Muchun Song
  Cc: guro, mhocko, akpm, shakeelb, vdavydov.dev, linux-kernel,
	linux-mm, duanxiongchun

On Sat, Mar 20, 2021 at 12:38:19AM +0800, Muchun Song wrote:
> There is only one user of __memcg_kmem_charge(), so manually inline
> __memcg_kmem_charge() to obj_cgroup_charge_pages(). Similarly manually
> inline __memcg_kmem_uncharge() into obj_cgroup_uncharge_pages() and
> call obj_cgroup_uncharge_pages() in obj_cgroup_release().
> 
> This is just code cleanup without any functionality changes.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg
  2021-03-19 16:38 ` [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg Muchun Song
  2021-03-19 18:26   ` Shakeel Butt
@ 2021-03-22 14:46   ` Johannes Weiner
  2021-03-23  9:18     ` [External] " Muchun Song
  2021-03-22 18:17   ` Roman Gushchin
  2 siblings, 1 reply; 17+ messages in thread
From: Johannes Weiner @ 2021-03-22 14:46 UTC (permalink / raw)
  To: Muchun Song
  Cc: guro, mhocko, akpm, shakeelb, vdavydov.dev, linux-kernel,
	linux-mm, duanxiongchun

On Sat, Mar 20, 2021 at 12:38:14AM +0800, Muchun Song wrote:
> The rcu_read_lock/unlock only can guarantee that the memcg will not be
> freed, but it cannot guarantee the success of css_get (which is in the
> refill_stock when cached memcg changed) to memcg.
> 
>   rcu_read_lock()
>   memcg = obj_cgroup_memcg(old)
>   __memcg_kmem_uncharge(memcg)
>       refill_stock(memcg)
>           if (stock->cached != memcg)
>               // css_get can change the ref counter from 0 back to 1.
>               css_get(&memcg->css)
>   rcu_read_unlock()
> 
> This fix is very like the commit:
> 
>   eefbfa7fd678 ("mm: memcg/slab: fix use after free in obj_cgroup_charge")
> 
> Fix this by holding a reference to the memcg which is passed to the
> __memcg_kmem_uncharge() before calling __memcg_kmem_uncharge().
> 
> Fixes: 3de7d4f25a74 ("mm: memcg/slab: optimize objcg stock draining")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Good catch! Did you trigger the WARN_ON() in
percpu_ref_kill_and_confirm() during testing?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/7] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  2021-03-19 16:38 ` [PATCH v5 5/7] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Muchun Song
  2021-03-19 18:27   ` Shakeel Butt
@ 2021-03-22 18:13   ` Roman Gushchin
  1 sibling, 0 replies; 17+ messages in thread
From: Roman Gushchin @ 2021-03-22 18:13 UTC (permalink / raw)
  To: Muchun Song
  Cc: hannes, mhocko, akpm, shakeelb, vdavydov.dev, linux-kernel,
	linux-mm, duanxiongchun

On Sat, Mar 20, 2021 at 12:38:18AM +0800, Muchun Song wrote:
> Since Roman series "The new cgroup slab memory controller" applied. All
> slab objects are charged via the new APIs of obj_cgroup. The new APIs
> introduce a struct obj_cgroup to charge slab objects. It prevents
> long-living objects from pinning the original memory cgroup in the memory.
> But there are still some corner objects (e.g. allocations larger than
> order-1 page on SLUB) which are not charged via the new APIs. Those
> objects (include the pages which are allocated from buddy allocator
> directly) are charged as kmem pages which still hold a reference to
> the memory cgroup.
> 
> We want to reuse the obj_cgroup APIs to charge the kmem pages.
> If we do that, we should store an object cgroup pointer to
> page->memcg_data for the kmem pages.
> 
> Finally, page->memcg_data will have 3 different meanings.
> 
>   1) For the slab pages, page->memcg_data points to an object cgroups
>      vector.
> 
>   2) For the kmem pages (exclude the slab pages), page->memcg_data
>      points to an object cgroup.
> 
>   3) For the user pages (e.g. the LRU pages), page->memcg_data points
>      to a memory cgroup.
> 
> We do not change the behavior of page_memcg() and page_memcg_rcu().
> They are also suitable for LRU pages and kmem pages. Why?
> 
> Because memory allocations pinning memcgs for a long time - it exists
> at a larger scale and is causing recurring problems in the real world:
> page cache doesn't get reclaimed for a long time, or is used by the
> second, third, fourth, ... instance of the same job that was restarted
> into a new cgroup every time. Unreclaimable dying cgroups pile up,
> waste memory, and make page reclaim very inefficient.
> 
> We can convert LRU pages and most other raw memcg pins to the objcg
> direction to fix this problem, and then the page->memcg will always
> point to an object cgroup pointer. At that time, LRU pages and kmem
> pages will be treated the same. The implementation of page_memcg()
> will remove the kmem page check.
> 
> This patch aims to charge the kmem pages by using the new APIs of
> obj_cgroup. Finally, the page->memcg_data of the kmem page points to
> an object cgroup. We can use the __page_objcg() to get the object
> cgroup associated with a kmem page. Or we can use page_memcg()
> to get the memory cgroup associated with a kmem page, but caller must
> ensure that the returned memcg won't be released (e.g. acquire the
> rcu_read_lock or css_set_lock).
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Roman Gushchin <guro@fb.com>

Thanks!

> ---
>  include/linux/memcontrol.h | 116 +++++++++++++++++++++++++++++++++++----------
>  mm/memcontrol.c            | 110 +++++++++++++++++++++---------------------
>  2 files changed, 145 insertions(+), 81 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index e6dc793d587d..395a113e4a3b 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -358,6 +358,62 @@ enum page_memcg_data_flags {
>  
>  #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
>  
> +static inline bool PageMemcgKmem(struct page *page);
> +
> +/*
> + * After the initialization objcg->memcg is always pointing at
> + * a valid memcg, but can be atomically swapped to the parent memcg.
> + *
> + * The caller must ensure that the returned memcg won't be released:
> + * e.g. acquire the rcu_read_lock or css_set_lock.
> + */
> +static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
> +{
> +	return READ_ONCE(objcg->memcg);
> +}
> +
> +/*
> + * __page_memcg - get the memory cgroup associated with a non-kmem page
> + * @page: a pointer to the page struct
> + *
> + * Returns a pointer to the memory cgroup associated with the page,
> + * or NULL. This function assumes that the page is known to have a
> + * proper memory cgroup pointer. It's not safe to call this function
> + * against some type of pages, e.g. slab pages or ex-slab pages or
> + * kmem pages.
> + */
> +static inline struct mem_cgroup *__page_memcg(struct page *page)
> +{
> +	unsigned long memcg_data = page->memcg_data;
> +
> +	VM_BUG_ON_PAGE(PageSlab(page), page);
> +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
> +
> +	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> +}
> +
> +/*
> + * __page_objcg - get the object cgroup associated with a kmem page
> + * @page: a pointer to the page struct
> + *
> + * Returns a pointer to the object cgroup associated with the page,
> + * or NULL. This function assumes that the page is known to have a
> + * proper object cgroup pointer. It's not safe to call this function
> + * against some type of pages, e.g. slab pages or ex-slab pages or
> + * LRU pages.
> + */
> +static inline struct obj_cgroup *__page_objcg(struct page *page)
> +{
> +	unsigned long memcg_data = page->memcg_data;
> +
> +	VM_BUG_ON_PAGE(PageSlab(page), page);
> +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> +	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
> +
> +	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> +}
> +
>  /*
>   * page_memcg - get the memory cgroup associated with a page
>   * @page: a pointer to the page struct
> @@ -367,20 +423,23 @@ enum page_memcg_data_flags {
>   * proper memory cgroup pointer. It's not safe to call this function
>   * against some type of pages, e.g. slab pages or ex-slab pages.
>   *
> - * Any of the following ensures page and memcg binding stability:
> + * For a non-kmem page any of the following ensures page and memcg binding
> + * stability:
> + *
>   * - the page lock
>   * - LRU isolation
>   * - lock_page_memcg()
>   * - exclusive reference
> + *
> + * For a kmem page a caller should hold an rcu read lock to protect memcg
> + * associated with a kmem page from being released.
>   */
>  static inline struct mem_cgroup *page_memcg(struct page *page)
>  {
> -	unsigned long memcg_data = page->memcg_data;
> -
> -	VM_BUG_ON_PAGE(PageSlab(page), page);
> -	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> -
> -	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> +	if (PageMemcgKmem(page))
> +		return obj_cgroup_memcg(__page_objcg(page));
> +	else
> +		return __page_memcg(page);
>  }
>  
>  /*
> @@ -394,11 +453,19 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
>   */
>  static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
>  {
> +	unsigned long memcg_data = READ_ONCE(page->memcg_data);
> +
>  	VM_BUG_ON_PAGE(PageSlab(page), page);
>  	WARN_ON_ONCE(!rcu_read_lock_held());
>  
> -	return (struct mem_cgroup *)(READ_ONCE(page->memcg_data) &
> -				     ~MEMCG_DATA_FLAGS_MASK);
> +	if (memcg_data & MEMCG_DATA_KMEM) {
> +		struct obj_cgroup *objcg;
> +
> +		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> +		return obj_cgroup_memcg(objcg);
> +	}
> +
> +	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
>  }
>  
>  /*
> @@ -406,15 +473,21 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
>   * @page: a pointer to the page struct
>   *
>   * Returns a pointer to the memory cgroup associated with the page,
> - * or NULL. This function unlike page_memcg() can take any  page
> + * or NULL. This function unlike page_memcg() can take any page
>   * as an argument. It has to be used in cases when it's not known if a page
> - * has an associated memory cgroup pointer or an object cgroups vector.
> + * has an associated memory cgroup pointer or an object cgroups vector or
> + * an object cgroup.
> + *
> + * For a non-kmem page any of the following ensures page and memcg binding
> + * stability:
>   *
> - * Any of the following ensures page and memcg binding stability:
>   * - the page lock
>   * - LRU isolation
>   * - lock_page_memcg()
>   * - exclusive reference
> + *
> + * For a kmem page a caller should hold an rcu read lock to protect memcg
> + * associated with a kmem page from being released.
>   */
>  static inline struct mem_cgroup *page_memcg_check(struct page *page)
>  {
> @@ -427,6 +500,13 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
>  	if (memcg_data & MEMCG_DATA_OBJCGS)
>  		return NULL;
>  
> +	if (memcg_data & MEMCG_DATA_KMEM) {
> +		struct obj_cgroup *objcg;
> +
> +		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> +		return obj_cgroup_memcg(objcg);
> +	}
> +
>  	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
>  }
>  
> @@ -713,18 +793,6 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
>  	percpu_ref_put(&objcg->refcnt);
>  }
>  
> -/*
> - * After the initialization objcg->memcg is always pointing at
> - * a valid memcg, but can be atomically swapped to the parent memcg.
> - *
> - * The caller must ensure that the returned memcg won't be released:
> - * e.g. acquire the rcu_read_lock or css_set_lock.
> - */
> -static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
> -{
> -	return READ_ONCE(objcg->memcg);
> -}
> -
>  static inline void mem_cgroup_put(struct mem_cgroup *memcg)
>  {
>  	if (memcg)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8d28a5a2ee58..962499542531 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -855,18 +855,22 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
>  			     int val)
>  {
>  	struct page *head = compound_head(page); /* rmap on tail pages */
> -	struct mem_cgroup *memcg = page_memcg(head);
> +	struct mem_cgroup *memcg;
>  	pg_data_t *pgdat = page_pgdat(page);
>  	struct lruvec *lruvec;
>  
> +	rcu_read_lock();
> +	memcg = page_memcg(head);
>  	/* Untracked pages have no memcg, no lruvec. Update only the node */
>  	if (!memcg) {
> +		rcu_read_unlock();
>  		__mod_node_page_state(pgdat, idx, val);
>  		return;
>  	}
>  
>  	lruvec = mem_cgroup_lruvec(memcg, pgdat);
>  	__mod_lruvec_state(lruvec, idx, val);
> +	rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(__mod_lruvec_page_state);
>  
> @@ -1055,20 +1059,6 @@ static __always_inline struct mem_cgroup *active_memcg(void)
>  		return current->active_memcg;
>  }
>  
> -static __always_inline struct mem_cgroup *get_active_memcg(void)
> -{
> -	struct mem_cgroup *memcg;
> -
> -	rcu_read_lock();
> -	memcg = active_memcg();
> -	/* remote memcg must hold a ref. */
> -	if (memcg && WARN_ON_ONCE(!css_tryget(&memcg->css)))
> -		memcg = root_mem_cgroup;
> -	rcu_read_unlock();
> -
> -	return memcg;
> -}
> -
>  static __always_inline bool memcg_kmem_bypass(void)
>  {
>  	/* Allow remote memcg charging from any context. */
> @@ -1083,20 +1073,6 @@ static __always_inline bool memcg_kmem_bypass(void)
>  }
>  
>  /**
> - * If active memcg is set, do not fallback to current->mm->memcg.
> - */
> -static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
> -{
> -	if (memcg_kmem_bypass())
> -		return NULL;
> -
> -	if (unlikely(active_memcg()))
> -		return get_active_memcg();
> -
> -	return get_mem_cgroup_from_mm(current->mm);
> -}
> -
> -/**
>   * mem_cgroup_iter - iterate over memory cgroup hierarchy
>   * @root: hierarchy root
>   * @prev: previously returned memcg, NULL on first invocation
> @@ -3152,18 +3128,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
>   */
>  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
>  {
> -	struct mem_cgroup *memcg;
> +	struct obj_cgroup *objcg;
>  	int ret = 0;
>  
> -	memcg = get_mem_cgroup_from_current();
> -	if (memcg && !mem_cgroup_is_root(memcg)) {
> -		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
> +	objcg = get_obj_cgroup_from_current();
> +	if (objcg) {
> +		ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
>  		if (!ret) {
> -			page->memcg_data = (unsigned long)memcg |
> +			page->memcg_data = (unsigned long)objcg |
>  				MEMCG_DATA_KMEM;
>  			return 0;
>  		}
> -		css_put(&memcg->css);
> +		obj_cgroup_put(objcg);
>  	}
>  	return ret;
>  }
> @@ -3175,16 +3151,16 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
>   */
>  void __memcg_kmem_uncharge_page(struct page *page, int order)
>  {
> -	struct mem_cgroup *memcg = page_memcg(page);
> +	struct obj_cgroup *objcg;
>  	unsigned int nr_pages = 1 << order;
>  
> -	if (!memcg)
> +	if (!PageMemcgKmem(page))
>  		return;
>  
> -	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
> -	__memcg_kmem_uncharge(memcg, nr_pages);
> +	objcg = __page_objcg(page);
> +	obj_cgroup_uncharge_pages(objcg, nr_pages);
>  	page->memcg_data = 0;
> -	css_put(&memcg->css);
> +	obj_cgroup_put(objcg);
>  }
>  
>  static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
> @@ -6799,7 +6775,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
>  
>  struct uncharge_gather {
>  	struct mem_cgroup *memcg;
> -	unsigned long nr_pages;
> +	unsigned long nr_memory;
>  	unsigned long pgpgout;
>  	unsigned long nr_kmem;
>  	struct page *dummy_page;
> @@ -6814,10 +6790,10 @@ static void uncharge_batch(const struct uncharge_gather *ug)
>  {
>  	unsigned long flags;
>  
> -	if (!mem_cgroup_is_root(ug->memcg)) {
> -		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
> +	if (ug->nr_memory) {
> +		page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
>  		if (do_memsw_account())
> -			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
> +			page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
>  		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
>  			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
>  		memcg_oom_recover(ug->memcg);
> @@ -6825,7 +6801,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
>  
>  	local_irq_save(flags);
>  	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
> -	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
> +	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_memory);
>  	memcg_check_events(ug->memcg, ug->dummy_page);
>  	local_irq_restore(flags);
>  
> @@ -6836,40 +6812,60 @@ static void uncharge_batch(const struct uncharge_gather *ug)
>  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  {
>  	unsigned long nr_pages;
> +	struct mem_cgroup *memcg;
> +	struct obj_cgroup *objcg;
>  
>  	VM_BUG_ON_PAGE(PageLRU(page), page);
>  
> -	if (!page_memcg(page))
> -		return;
> -
>  	/*
>  	 * Nobody should be changing or seriously looking at
> -	 * page_memcg(page) at this point, we have fully
> +	 * page memcg or objcg at this point, we have fully
>  	 * exclusive access to the page.
>  	 */
> +	if (PageMemcgKmem(page)) {
> +		objcg = __page_objcg(page);
> +		/*
> +		 * This get matches the put at the end of the function and
> +		 * kmem pages do not hold memcg references anymore.
> +		 */
> +		memcg = get_mem_cgroup_from_objcg(objcg);
> +	} else {
> +		memcg = __page_memcg(page);
> +	}
>  
> -	if (ug->memcg != page_memcg(page)) {
> +	if (!memcg)
> +		return;
> +
> +	if (ug->memcg != memcg) {
>  		if (ug->memcg) {
>  			uncharge_batch(ug);
>  			uncharge_gather_clear(ug);
>  		}
> -		ug->memcg = page_memcg(page);
> +		ug->memcg = memcg;
>  		ug->dummy_page = page;
>  
>  		/* pairs with css_put in uncharge_batch */
> -		css_get(&ug->memcg->css);
> +		css_get(&memcg->css);
>  	}
>  
>  	nr_pages = compound_nr(page);
> -	ug->nr_pages += nr_pages;
>  
> -	if (PageMemcgKmem(page))
> +	if (PageMemcgKmem(page)) {
> +		ug->nr_memory += nr_pages;
>  		ug->nr_kmem += nr_pages;
> -	else
> +
> +		page->memcg_data = 0;
> +		obj_cgroup_put(objcg);
> +	} else {
> +		/* LRU pages aren't accounted at the root level */
> +		if (!mem_cgroup_is_root(memcg))
> +			ug->nr_memory += nr_pages;
>  		ug->pgpgout++;
>  
> -	page->memcg_data = 0;
> -	css_put(&ug->memcg->css);
> +		page->memcg_data = 0;
> +	}
> +
> +	css_put(&memcg->css);
>  }
>  
>  /**
> -- 
> 2.11.0
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 6/7] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages()
  2021-03-19 16:38 ` [PATCH v5 6/7] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages() Muchun Song
  2021-03-19 18:41   ` Shakeel Butt
  2021-03-22 14:34   ` Johannes Weiner
@ 2021-03-22 18:14   ` Roman Gushchin
  2 siblings, 0 replies; 17+ messages in thread
From: Roman Gushchin @ 2021-03-22 18:14 UTC (permalink / raw)
  To: Muchun Song
  Cc: hannes, mhocko, akpm, shakeelb, vdavydov.dev, linux-kernel,
	linux-mm, duanxiongchun

On Sat, Mar 20, 2021 at 12:38:19AM +0800, Muchun Song wrote:
> There is only one user of __memcg_kmem_charge(), so manually inline
> __memcg_kmem_charge() to obj_cgroup_charge_pages(). Similarly manually
> inline __memcg_kmem_uncharge() into obj_cgroup_uncharge_pages() and
> call obj_cgroup_uncharge_pages() in obj_cgroup_release().
> 
> This is just code cleanup without any functionality changes.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Roman Gushchin <guro@fb.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg
  2021-03-19 16:38 ` [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg Muchun Song
  2021-03-19 18:26   ` Shakeel Butt
  2021-03-22 14:46   ` Johannes Weiner
@ 2021-03-22 18:17   ` Roman Gushchin
  2 siblings, 0 replies; 17+ messages in thread
From: Roman Gushchin @ 2021-03-22 18:17 UTC (permalink / raw)
  To: Muchun Song
  Cc: hannes, mhocko, akpm, shakeelb, vdavydov.dev, linux-kernel,
	linux-mm, duanxiongchun

On Sat, Mar 20, 2021 at 12:38:14AM +0800, Muchun Song wrote:
> The rcu_read_lock/unlock only can guarantee that the memcg will not be
> freed, but it cannot guarantee the success of css_get (which is in the
> refill_stock when cached memcg changed) to memcg.
> 
>   rcu_read_lock()
>   memcg = obj_cgroup_memcg(old)
>   __memcg_kmem_uncharge(memcg)
>       refill_stock(memcg)
>           if (stock->cached != memcg)
>               // css_get can change the ref counter from 0 back to 1.
>               css_get(&memcg->css)
>   rcu_read_unlock()
> 
> This fix is very like the commit:
> 
>   eefbfa7fd678 ("mm: memcg/slab: fix use after free in obj_cgroup_charge")
> 
> Fix this by holding a reference to the memcg which is passed to the
> __memcg_kmem_uncharge() before calling __memcg_kmem_uncharge().
> 
> Fixes: 3de7d4f25a74 ("mm: memcg/slab: optimize objcg stock draining")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Roman Gushchin <guro@fb.com>

Thanks!


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [External] Re: [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg
  2021-03-22 14:46   ` Johannes Weiner
@ 2021-03-23  9:18     ` Muchun Song
  0 siblings, 0 replies; 17+ messages in thread
From: Muchun Song @ 2021-03-23  9:18 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Roman Gushchin, Michal Hocko, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan

On Mon, Mar 22, 2021 at 10:46 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Sat, Mar 20, 2021 at 12:38:14AM +0800, Muchun Song wrote:
> > The rcu_read_lock/unlock only can guarantee that the memcg will not be
> > freed, but it cannot guarantee the success of css_get (which is in the
> > refill_stock when cached memcg changed) to memcg.
> >
> >   rcu_read_lock()
> >   memcg = obj_cgroup_memcg(old)
> >   __memcg_kmem_uncharge(memcg)
> >       refill_stock(memcg)
> >           if (stock->cached != memcg)
> >               // css_get can change the ref counter from 0 back to 1.
> >               css_get(&memcg->css)
> >   rcu_read_unlock()
> >
> > This fix is very like the commit:
> >
> >   eefbfa7fd678 ("mm: memcg/slab: fix use after free in obj_cgroup_charge")
> >
> > Fix this by holding a reference to the memcg which is passed to the
> > __memcg_kmem_uncharge() before calling __memcg_kmem_uncharge().
> >
> > Fixes: 3de7d4f25a74 ("mm: memcg/slab: optimize objcg stock draining")
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
> Good catch! Did you trigger the WARN_ON() in
> percpu_ref_kill_and_confirm() during testing?

No. The race window is very small, it should be difficult to trigger.
When I reviewed the code here, I suddenly realized that there
might be a problem here. Very coincidental.

Thanks.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-03-23  9:18 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-19 16:38 [PATCH v5 0/7] Use obj_cgroup APIs to charge kmem pages Muchun Song
2021-03-19 16:38 ` [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg Muchun Song
2021-03-19 18:26   ` Shakeel Butt
2021-03-22 14:46   ` Johannes Weiner
2021-03-23  9:18     ` [External] " Muchun Song
2021-03-22 18:17   ` Roman Gushchin
2021-03-19 16:38 ` [PATCH v5 2/7] mm: memcontrol: introduce obj_cgroup_{un}charge_pages Muchun Song
2021-03-19 16:38 ` [PATCH v5 3/7] mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c Muchun Song
2021-03-19 16:38 ` [PATCH v5 4/7] mm: memcontrol: change ug->dummy_page only if memcg changed Muchun Song
2021-03-19 16:38 ` [PATCH v5 5/7] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Muchun Song
2021-03-19 18:27   ` Shakeel Butt
2021-03-22 18:13   ` Roman Gushchin
2021-03-19 16:38 ` [PATCH v5 6/7] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages() Muchun Song
2021-03-19 18:41   ` Shakeel Butt
2021-03-22 14:34   ` Johannes Weiner
2021-03-22 18:14   ` Roman Gushchin
2021-03-19 16:38 ` [PATCH v5 7/7] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM Muchun Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).